The market for data science and artificial intelligence isn’t mature—not by a long shot. However, what we may be seeing is what Winston Churchill once termed “the end of the beginning.” In many industries and for many applications, artificial intelligence is a new and exciting tool, and its depths have barely been plumbed. In a (much) smaller number of industries and applications, AI is a proven tool, and bringing it up to scale is more important than iterating new features.
Imagine a small startup, for example. AI helps power their business, but it’s not the core of their business model. They’re not going to spend much time reinventing or improving the wheel. Instead, they want to make sure that their AI module is reliable enough to support their minimum viable product. What does this mean?
- It means that companies will prioritize scale over accuracy when it comes to AI models.
- Cleaning and preparing data will be more important than creating new kinds of analytics.
- ETL pipelines will become more important than new statistical models.
- Data engineers will become more important than data scientists.
As AI technologies continue to mature, there’s a possibility that fewer companies will innovate in AI, and more companies will focus on operationalizing AI. This might spell a sea change in how companies hire and deploy data scientists—and a shift towards the more uncool side of AI.
Deemphasizing Data Science in Artificial Intelligence
Research from Mihail Eric (Founder of Confetti AI and a Senior Machine Learning Scientist at Amazon) shows there are 70% more open roles for data engineering than for data science. This jibes with the anecdotal experience of the team at Bitvore.
First, the broader discipline of artificial intelligence is like an iceberg: the cool parts (image recognition, data visualization, predictive analytics, and so on) sit above the surface. The 90% below the surface consists of annotating data, cleaning data, moving data, and optimizing those cleaning and moving processes.
Second, data scientists mostly like dealing with the part of artificial intelligence that’s above the surface. Yet research shows that data scientists spend about 45% of their time dealing with data loading and cleaning.
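To make that 45% concrete, here is a minimal sketch of the routine loading-and-cleaning work it describes, assuming pandas; the toy records are invented for illustration, standing in for the messy input that typically arrives before any modeling can start:

```python
import pandas as pd

# Hypothetical raw records with the usual problems: a missing ID,
# a duplicate row, inconsistent casing, and non-numeric revenue strings.
raw = pd.DataFrame({
    "customer_id": ["001", "002", "002", None],
    "sector":      [" Tech", "FINANCE", "finance ", "tech"],
    "revenue":     ["1,200", "950", "950", "n/a"],
})

# Drop rows with no identifier, then keep one row per customer.
clean = raw.dropna(subset=["customer_id"]).drop_duplicates(subset=["customer_id"]).copy()

# Normalize the inconsistently cased and padded category labels.
clean["sector"] = clean["sector"].str.strip().str.lower()

# Strip thousands separators and coerce revenue to numbers,
# turning unparseable values into NaN instead of raising.
clean["revenue"] = pd.to_numeric(clean["revenue"].str.replace(",", ""), errors="coerce")

print(clean)
```

None of this is analytically interesting, which is exactly the point: it is necessary, repetitive work that a data engineer is trained to systematize.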
Meanwhile, data engineers are specifically trained for data preparation and loading. This suggests a few things to us.
First, a data scientist and a data engineer will probably get more work done than two data scientists because a data engineer can prepare the data that the scientist will analyze.
Second, companies are generating increasing amounts of data. The amount of data produced in 2025 will be ten times greater than the amount produced in 2017. If previous trends hold, most of this data will be the kind that’s hard to categorize—emails, images, Twitter posts, and so on. In short, it will need cleaning (and so will the rest of the data).
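As an illustration of what "cleaning" hard-to-categorize text can mean in practice, here is a small sketch of normalizing a social-media post before analysis; the specific regex rules and the sample tweet are invented for the example, not a prescribed pipeline:

```python
import re

def normalize(text: str) -> str:
    """Minimal cleanup of the kind unstructured text needs before it
    can feed a model: strip links, handles, and stray whitespace."""
    text = re.sub(r"https?://\S+", "", text)   # drop URLs
    text = re.sub(r"[@#]\w+", "", text)        # drop handles and hashtags
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text.lower()

print(normalize("RT @user Great read on data engineering https://example.com #AI"))
```

Multiply rules like these across emails, images, and every other messy source a company collects, and the scale of the preparation problem becomes clear.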
Therefore, our need for data engineers is going to scale faster than our need for data scientists. And that's before we factor in the increasing maturity of AI technology and the advances in self-service AI tools that have led to what Gartner calls "citizen data scientists."
Driving More Advances on the Uncool Side of AI
All the information above converges on a single paradox: to drive further advances in AI, operationalize AI in more companies, and bring the benefits of AI to more people, we need to focus more on engineering and infrastructure.
At the end of the day, AI doesn’t just depend on new and interesting statistical models. It depends on data. The more data that an AI can access, the more it can learn and the better its predictions become. Data is the bottleneck for AI implementation, especially for companies whose business model doesn’t continuously improve AI models or find new capabilities.
As the industry slowly begins to mature, we may find that companies that commit to adopting innovations in data preparation out-compete those that commit to improving data models. Innovation on the more "boring" side of AI may lead to more scalable, reliable, and approachable AI.