AI Stumbles Without Good Data, Despite Advances in Techniques


Data is important to AI—and probably more important than you think. Data scientists and developers like to tout their AI techniques and models, but there's no such thing as a model that doesn't fall victim to GIGO—Garbage In, Garbage Out. No matter how good your model is, you must have good data to get good results.

Don't Trust Claims to the Contrary

 

We're highlighting this because, even though AI has matured in recent years (progressing further to the right along the Gartner Hype Cycle), organizations are still prone to making and believing misleading claims about artificial intelligence capabilities.

 

According to Gartner, 30% of organizations are increasing their AI investments despite the pandemic, and only 7% are decreasing them. However, there's an increased risk involved with doubling down on AI investment in an economically uncertain time. This means that companies must ensure that they're investing in a product that performs as advertised.

 

Despite this increased level of investment, and despite the need for due diligence, many companies fail to assess AI's limits correctly. For example, a survey of European companies shows that 40% of self-described AI startups don't use AI in a meaningful way. Even startups that merely claim to work in AI, without producing any evidence that their product is AI-based, still receive up to 50% more investment than startups in other fields.

 

In addition, almost 50% of the AI companies surveyed were concentrated in one of two areas: chatbots or fraud detection. These are both well-trodden paths for AI, and there's no good evidence to suggest that these two use cases do much more than generate cost savings for an organization. In other words, companies that pursue AI strategies have a good chance of stumbling into either AI vaporware (AI products that don't use AI) or AI products that tout themselves as revolutionary when they are in fact just more of the same.

 

Again, the Data Must Inform the Algorithm

 

If an AI company tells you that their product will transform your business but says nothing about their data, don't trust it. Data is hard to get right. Let's explain.

 

Data needs to resist decay. Imagine an AI product that goes through the names, addresses, and buying habits of real people and tells you which ones are most likely to buy your product. That's valuable. And AI products like this certainly exist. One problem that these products must overcome, however, is something called data decay. People move, change companies, and abandon their email addresses. If you start with a list of 100% accurate customer information, 70% of it will be useless after a year. If your prospective AI partner can't tell you how they deal with data decay, find someone else.
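To make the 70% figure concrete, here is a rough back-of-the-envelope sketch in Python. It assumes records go stale at a constant monthly rate, which is our simplification and not something the figure itself claims. The function names are purely illustrative.

```python
import math

# Annual decay figure quoted in the paragraph above.
ANNUAL_DECAY = 0.70


def monthly_retention(annual_decay: float = ANNUAL_DECAY) -> float:
    """Fraction of records still valid after one month.

    Assumes a constant monthly rate (exponential decay), which is a
    simplification for illustration only.
    """
    return (1.0 - annual_decay) ** (1.0 / 12.0)


def usable_fraction(months: float, annual_decay: float = ANNUAL_DECAY) -> float:
    """Fraction of an initially accurate list still usable after `months`."""
    return monthly_retention(annual_decay) ** months


def half_life_months(annual_decay: float = ANNUAL_DECAY) -> float:
    """Months until half of the list has gone stale, under the same assumption."""
    return math.log(0.5) / math.log(monthly_retention(annual_decay))


if __name__ == "__main__":
    for month in (1, 3, 6, 12):
        print(f"after {month:>2} months: {usable_fraction(month):.0%} usable")
    print(f"half-life: {half_life_months():.1f} months")
```

Under this assumption, roughly one record in ten goes bad every month, which is why a one-time cleanup doesn't solve the problem.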

 

Data needs to be clean. No database is perfect. Data decays (see above), but there are also duplicate entries, missing entries, spelling mistakes, and data that's ingested without any organizing scheme. Data scientists routinely spend up to 60% of their time cleaning data before it's ever used to inform an AI. If your prospective AI partner can't tell you how they prepare data before using it in their algorithm, find someone who can.
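As a small illustration of what "cleaning" can mean in practice, the sketch below normalizes whitespace and email casing, drops records with missing required fields, and removes duplicates. The field names (`name`, `email`) and the `clean_records` helper are hypothetical; real pipelines are considerably more involved.

```python
def clean_records(records, key="email", required=("name", "email")):
    """Return normalized, de-duplicated records with all required fields present.

    `key` is the field used to detect duplicates (hypothetical choice).
    """
    seen = set()
    cleaned = []
    for record in records:
        # Collapse stray whitespace; treat None as a missing value.
        norm = {
            field: " ".join(str(value).split())
            for field, value in record.items()
            if value is not None
        }
        # Compare the de-duplication key case-insensitively.
        if key in norm:
            norm[key] = norm[key].lower()
        # Skip records with missing or empty required fields.
        if any(not norm.get(field) for field in (*required, key)):
            continue
        # Skip duplicates of a record we have already kept.
        if norm[key] in seen:
            continue
        seen.add(norm[key])
        cleaned.append(norm)
    return cleaned


raw = [
    {"name": "Ada  Lovelace", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},   # duplicate
    {"name": "", "email": "nobody@example.com"},            # missing name
    {"name": "Grace Hopper", "email": None},                # missing email
    {"name": "Alan Turing", "email": "alan@example.com"},
]
cleaned = clean_records(raw)
```

Here five raw rows shrink to two usable ones, which hints at why cleaning eats so much of a data scientist's time.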

 

Data needs to be targeted. Large datasets can be their own enemy. An AI algorithm is designed to predict or analyze just a few attributes, such as housing prices, customer sentiment, or oil futures. Adding too many variables is a good way for an algorithm to find correlations that don't exist. You see this often in AI startups that are attempting to break into new markets but don't have experts in those markets on staff. If your prospective AI partner doesn't have domain experts who can speak directly to your industry, look elsewhere.
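The "correlations that don't exist" problem is easy to demonstrate. The sketch below is a toy simulation, not a description of any real product: it generates a random target and a growing pile of random, unrelated variables, then reports the strongest correlation found so far. The row count, feature counts, and function names are arbitrary choices for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(n_rows, feature_counts, seed=0):
    """Map each feature count to the largest |correlation| seen so far.

    Every feature is pure noise, so any correlation with the target
    is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    results = {}
    for count in range(1, max(feature_counts) + 1):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
        if count in feature_counts:
            results[count] = best
    return results


if __name__ == "__main__":
    for count, corr in strongest_spurious_correlation(20, [5, 50, 500]).items():
        print(f"{count:>3} random variables: strongest |r| = {corr:.2f}")
```

The more unrelated variables you throw in, the stronger the best "signal" looks, even though none of it is real. Domain experts are the ones who can tell you which variables deserve to be there in the first place.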

 

Bitvore: Good Data, Good AI

 

Here at Bitvore, we call our solutions Precision Intelligence for a reason. Starting with a wide variety of high-quality, unstructured data (press releases, journalism, earnings calls, and more) at one end, we're able to generate unprecedented business insights at the other.

 

Want to learn more about Bitvore? Download our white paper below and learn how to identify emerging risk!

Sentiment Analysis on Unstructured Data White Paper download