Garbage In/Garbage Out applies more than ever to AI data

Whenever I talk to people who are working with enterprise customers about AI, they inevitably need to talk about data first. That’s because the quality of the models depends on the quality of the data used to train them. The phrase “garbage in, garbage out” goes back decades for a reason.
Data quality has long been an issue for companies trying to make data-driven decisions. These days, conventional wisdom suggests that if you want to make large language models really useful, you train them on your company data. That makes sense in theory, but if your data is messy, inconsistent and inaccurate, then the LLM might not give you the results you expect.
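To make that concrete, here is a minimal sketch of the kind of basic data-quality checks a team might run before pointing a model at company data. It assumes Python and pandas, and the column names (“customer_id”, “region”, “revenue”) are invented for illustration; none of this comes from the newsletter itself.

```python
# A hypothetical sketch of pre-training data-quality checks.
# Column names are invented for illustration.
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Summarize the kinds of problems that make data 'garbage in'."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_by_column": df.isna().sum().to_dict(),
        # Inconsistent labels (e.g. "US" vs. "U.S.") quietly inflate
        # category counts and confuse anything trained on them.
        "distinct_regions": df["region"].nunique(),
    }

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "region": ["US", "U.S.", "U.S.", None],
    "revenue": [100.0, None, None, 250.0],
})
print(quality_report(df))
```

Running this on the toy data surfaces exactly the messiness the paragraph above describes: a duplicate row, missing values, and two spellings of the same region.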
Lots of companies are struggling to build AI applications that truly propel the business forward, and a big part of that is the state of their data. Mike Mason, chief AI officer at technology consultancy Thoughtworks, says you can’t just jump into AI and think you can point your model at your company data and it magically makes you better, stronger and more efficient. Nothing is that simple, unfortunately.
“Organizations quickly realize that in most cases they need to do some AI readiness work, some platform building, data cleansing, all of that kind of stuff,” Mason told me last year. And while it’s important to get your data house in order, he emphasized that you can get going faster with your AI projects by concentrating on high-value data that has a direct impact on your most important business metrics.
“While organizations can be behind, I think the failure would be to say, ‘Oh, we’re going to take two years off [to get all our data perfect] before we can even do any of these AI use cases.’ You need to challenge yourself to take that incremental approach,” he said. That means finding a smaller set of data that is really important to your business, and concentrating on getting that right to train your models on.
In other words, don’t let perfect get in the way of good enough. It’s a lesson we hear with every technology cycle, and with generative AI, it’s important to start experimenting with your data and different LLMs to figure out what works. Many companies have been in that experimental state for some time, trying to find those use cases where they can see some worthwhile results.
In fact, in a report that came out this week from Deloitte, the firm found that many companies are still in the experimental stage, so if that’s you, you have a lot of company. “We found organizations are still heavily experimenting with GenAI, and scaling tends to be a longer-term goal. Over two-thirds of respondents said that 30% or fewer of their current experiments will be fully scaled in the next three to six months,” the report stated.
One more thing to consider, as you go through those experiments, especially if you are playing with customer-facing applications, is that ideas that work in the lab don’t always scale in the outside world, something that my friend Jon Reed of Diginomica drove home in a presentation this week.
“One of the hard lessons of AI is that what you test in the lab isn't always really built for the wild conditions and outdoors,” Reed said. And that means you have to think about your risk tolerance, and how bad the fallout will be if things go awry.
Nobody is suggesting that these challenges, as daunting as they may seem, are in and of themselves reasons not to try, but as you play with generative AI (and AI agents), it’s important to keep in mind that there will be mountains to climb. This will not be a smooth, straight line from idea to implementation. Whether it’s issues with your underlying data or something else, you just have to be realistic about what you’re up against and take that into consideration as you build your AI strategy.
-Ron
Photo by Pawel Czerwinski on Unsplash