Software development is a mature field, built on decades of experience. Organizations have sophisticated methodologies for developing, testing, debugging, and putting apps into production as stable products consumers can rely on.
Machine learning is a lot newer. While many of the elements are the same, building and training an ML tool that customers can trust involves additional layers of work.
As data scientist Nicole Janeway Bills noted in an excellent article on why machine learning models fail, 87% of data science projects don't make it into production. Surprisingly, most of the reasons aren't complex or esoteric. Rather, they're basic things many companies are failing at. Here are our key takeaways.
Lack of Agility
Agile development is a methodology that enables developers to keep up with the rapid rate of business and technology change. As computers started to fill the workplace in the 1990s, businesses faced a crisis: their needs were developing faster than software companies could meet them. Back then, it often took about three years to develop software to meet a new business need. Not only was that too long, but by the time the project was well underway, the needs of the client base had usually changed.
Agile software development was formalized in 2001, with the publication of the Agile Manifesto, to cope with the need for rapid development. It offered an alternative to the top-down Waterfall method that had dominated in the 90s: an iterative process focused on teamwork, collaboration, and responding to change. Rather than building everything one stage at a time from a master plan, developers learned to collaborate with a diverse group of stakeholders with different perspectives, get frequent customer feedback, and build in short cycles, changing the project as needed along the way.
Many ML companies (and quite a few software developers) still struggle to work this way, and a lack of diversity in the industry makes the problem worse. But when ML developers succeed at recruiting people from a range of backgrounds and collaborating closely as a team, the result is a better product for the customer.
Lack of Sufficient Customer Involvement
When we ask for customer feedback, it's not just pro forma. Working closely with customers is always a good idea, but in ML it's mission-critical, for several reasons. ML models depend on data, and your dataset is never perfect. Small issues with the data chosen can lead to big problems in providing customers with useful and actionable insights. Even when your model is accurate, it may not provide customers with the type of insight they want, or it could provide the right insight in an unintuitive and user-unfriendly way. The longer your team spends working in seclusion without customer feedback, the further your product is likely to drift from customer needs.
To satisfy customers' needs, ML providers need to meet with customers regularly and incorporate their feedback into the next stage of development. Over time, fulfilling even minor customer requests adds up to a significant improvement in your ML product.
Losing Sight of Deployment
Good data scientists are intellectually curious people who love tinkering and solving complex problems. But creating a good model from the data is only one part of the task; you still have to put the software into production. That means making a scalable program that can cope with fluctuating customer demand without becoming slow and unstable.
When your data science team neglects containerization, deploying a usable product becomes much more difficult. Containers are essentially self-contained packages of an app or service and its dependencies, which share the host server's OS kernel rather than each running a full operating system. Not only do they make development environments more consistent, but they also enable companies to rapidly spin up new instances of the software in production, responding to customer demand spikes. By incorporating containerization into the process from the beginning, you can speed development, eliminate bottlenecks and help ensure a stable, scalable end product.
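To make the idea of a container-ready service a little more concrete, here is a minimal sketch, not a description of any particular production setup. It assumes a stateless prediction service that takes its configuration from environment variables and exposes a health check, which are common traits of software that is easy to run in many identical container instances. The `predict` function, the `/healthz` and `/predict` paths, and the `HOST` and `PORT` variable names are all hypothetical choices made for illustration.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: returns the mean of the inputs."""
    return {"score": sum(features) / len(features) if features else 0.0}


def load_settings(env=None):
    """Read configuration from the environment so each instance is configured from outside."""
    env = os.environ if env is None else env
    return {"host": env.get("HOST", "0.0.0.0"), "port": int(env.get("PORT", "8080"))}


class Handler(BaseHTTPRequestHandler):
    def _send(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # A health endpoint lets an orchestrator check whether this instance is alive.
        if self.path == "/healthz":
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/predict":
            self._send(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))["features"]
            self._send(200, predict(features))
        except (ValueError, KeyError, TypeError):
            self._send(400, {"error": "expected a JSON body with a 'features' list"})


def main():
    settings = load_settings()
    # No state is kept between requests, so any number of copies can run side by side.
    HTTPServer((settings["host"], settings["port"]), Handler).serve_forever()


if __name__ == "__main__":
    main()
```

Because the service holds no per-request state and reads its port from the environment, the same image could in principle be started several times behind a load balancer when demand spikes. A real deployment would swap in an actual model and a production-grade server.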
Failing to Take Care of Your Data
Selecting, cleaning and safeguarding your data is a complex and crucial task that many machine learning companies fail at. Not only is there the data you feed into the machine to account for, but there are also many iterations and tests as you develop the ML model. There's a lot of data to take care of and many opportunities for mistakes before the project becomes a finished product.
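As a rough illustration of what taking care of data can look like in practice, the sketch below shows two small habits: validating records before they reach training, and fingerprinting a dataset so each model iteration can be tied to the exact data it used. It is only an example under stated assumptions. The `id`, `text`, and `label` fields are a hypothetical schema, and the function names are made up for this sketch.

```python
import hashlib
import json

# Hypothetical schema: every record is expected to carry these fields.
REQUIRED_FIELDS = ("id", "text", "label")


def validate_records(records):
    """Split records into clean rows and a list of (index, reason) problems."""
    clean, problems, seen_ids = [], [], set()
    for index, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
        if missing:
            problems.append((index, "missing: " + ", ".join(missing)))
        elif record["id"] in seen_ids:
            problems.append((index, "duplicate id"))
        else:
            seen_ids.add(record["id"])
            clean.append(record)
    return clean, problems


def dataset_fingerprint(records):
    """Return a SHA-256 hex digest that changes whenever the data changes."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    sample = [
        {"id": 1, "text": "first example", "label": "positive"},
        {"id": 1, "text": "repeated id", "label": "negative"},
        {"id": 2, "text": "", "label": "neutral"},
    ]
    clean, problems = validate_records(sample)
    print(len(clean), "clean record(s)")
    for index, reason in problems:
        print("record", index, "rejected:", reason)
    print("fingerprint:", dataset_fingerprint(clean))
```

Storing the fingerprint next to each trained model is one simple way to tell, later on, which version of the data a given test result came from.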
A lot of companies leave these tasks to overburdened data scientists, which poses additional risks that we see as unnecessary. That's why we have dedicated DevSecOps and DataOps teams and let our data scientists focus on data science problems. By bringing a diverse range of skills and specializations together in one cohesive team, we deliver a reliable product that gets better with every iteration.
Read our white paper below to learn more about what Bitvore can do for you.