Walter Sun is helping SAP build AI that understands tabular data, not just language

As AI moves further into business, it’s clear that companies still struggle to use it effectively. As we noted in a recent article, the share of AI projects making it into production has been stubbornly stuck at 25% for months. Part of the problem is a data challenge: finding the right data to train effective models. One company sitting on a treasure trove of operational data is SAP.
Walter Sun, who leads AI at SAP, sees a major opportunity to put that data to work. He’s tackling a two-fold challenge: helping internal teams and customers alike choose the right model for the job, while ensuring that they can handle the kind of data SAP specializes in.
One of the key issues is that large language models tend to like, well, language, rather than tabular data. That presents a problem for SAP data, which tends to be tabular in nature. The company is attempting to address this by building custom offerings that understand tables better. In addition, it has built a model hub that lets both customers and employees find the strongest choice for a particular job, weighing trade-offs like cost and accuracy.
Easing model selection
Sun has been around technology for a long time, including almost two decades at Microsoft before joining SAP in 2023. “I am responsible for the AI product engineering across the entire company,” he told FastForward. “So we're building technologies which are reusable, which can be used by different lines of businesses, as well as for customers.”
The first step was building an AI hub, which gives internal and external users a central place to find the right model for whatever they are trying to do. It’s not unlike Amazon Bedrock, a similar product introduced by AWS two years ago, which also lets users assess the best offering for a particular job.

“Our goal is to make it as easy as possible for our businesses to onboard and use the large language generative AI technologies,” Sun said. “We show you which models perform best for a particular task.” The hub then offers recommendations based on the individual use case.
He offers the example of creating a question-and-answer bot, a task that doesn't require a lot of compute and that perhaps 23 of the models in the hub could handle. What other criteria might you use to narrow down that list of choices? “We can find them and sort them by cost and give you the model with the best ROI. So basically, if you could use any of 23 possibilities, we'll identify the three or four that are cheapest.”
As he points out, that not only saves you money, it helps reduce the environmental impact of your choice.
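The filter-then-sort idea Sun describes can be sketched in a few lines of Python. This is only an illustration, not SAP's implementation: the catalog entries, prices, task labels and the `cheapest_capable` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    # Hypothetical catalog record; field names are assumptions for illustration.
    name: str
    cost_per_1k_tokens: float
    supported_tasks: frozenset


def cheapest_capable(catalog, task, top_k=3):
    """Keep only models that can handle the task, then return the top_k cheapest."""
    capable = [m for m in catalog if task in m.supported_tasks]
    return sorted(capable, key=lambda m: m.cost_per_1k_tokens)[:top_k]


# Made-up catalog: four of the five models can handle a Q&A bot.
CATALOG = [
    ModelEntry("model-a", 0.50, frozenset({"qa_bot", "summarize"})),
    ModelEntry("model-b", 0.10, frozenset({"qa_bot"})),
    ModelEntry("model-c", 2.00, frozenset({"qa_bot", "code"})),
    ModelEntry("model-d", 0.05, frozenset({"summarize"})),
    ModelEntry("model-e", 0.25, frozenset({"qa_bot"})),
]

shortlist = cheapest_capable(CATALOG, "qa_bot", top_k=3)
print([m.name for m in shortlist])
```

In this sketch the cheapest model overall, model-d, is excluded because it cannot do the task, which is why capability is checked before cost.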
The challenge of tabular data
One of Sun’s biggest challenges has been finding a way to put SAP table data to work with LLMs. “The most important business data are tabular, if you will, basically Excel spreadsheets or comma separated value tables,” Sun said.
The problem is that language models have trouble with this type of data. Ask one a question and it is adept at answering in grammatically correct English (or whatever language you choose), but when it comes to business use cases involving tables, it can struggle. “What makes it difficult? I think it's first of all, language models are trained on language, and so that's a different type of training data set,” he said.
An April paper, A Survey on Table Mining with Large Language Models: Challenges, Advancements and Prospects by Mingyue Cheng et al., bears this out. The authors found that while large language models are capable of processing tabular data, they often get tripped up by more complex tables. “While LLMs can handle a certain degree of structural information after training on massive textual corpus, they may struggle with parsing tables with highly complex structures. This limitation arises partly from the one-dimensional text-form encoding of two-dimensional tabular data (especially with multiple hierarchical indexes, merging cells), leading to comprehension gaps.”
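To make the "one-dimensional encoding" point concrete, here is a small Python sketch with a made-up table. It flattens the rows into a single sequence, roughly the way a table reaches a text model, and shows that a value's column can then be recovered only by counting positions. The table contents and the `serialize_rows` helper are illustrative assumptions, not taken from the paper.

```python
def serialize_rows(header, rows, sep=" | "):
    """Turn a 2-D table into plain text, one line per row."""
    lines = [sep.join(header)]
    for row in rows:
        lines.append(sep.join(str(cell) for cell in row))
    return "\n".join(lines)


# Made-up table: quarter and measure are fused into each column label.
header = ["Region", "Q1 Revenue", "Q1 Cost", "Q2 Revenue", "Q2 Cost"]
rows = [
    ["North", 120, 80, 135, 90],
    ["South", 95, 70, 110, 75],
]

text = serialize_rows(header, rows)
print(text)

# In two dimensions, a cell is addressed directly by (row, column).
row_idx, col_idx = 1, header.index("Q2 Cost")
cell = rows[row_idx][col_idx]

# In the flattened sequence, the same cell sits many positions away from its
# header, and the link between them survives only as an offset.
flat = header + [c for row in rows for c in row]
flat_pos = (row_idx + 1) * len(header) + col_idx
print(cell, flat[flat_pos], flat_pos - col_idx)
```

With merged cells or multi-level headers the offset is no longer a fixed multiple of the column count, which is one way to picture the comprehension gaps the authors describe.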
While there are other companies working on the tabular challenge, it is a problem SAP is particularly motivated to solve. “We're building an SAP foundation model, which is different from large language models,” Sun said. “It's going to supplement LLMs and allow SAP customers to access and leverage business data. This business data, which we consider mostly tabular data, is more difficult to process.”
The company is also building an internal knowledge graph on top of this specialized model to help connect information inside it with different business objects across lines of business and product. “The goal is to make the model understand, if you will, one's business practices and the dependencies that exist across the different lines of businesses,” he said.
For example, if you were a construction company working on a spreadsheet of project deliverables listing materials and deliverable dates, the model and knowledge graph could work together to help with planning and avoid delays.
That’s because the knowledge graph can connect data points and provide a logical understanding of the dependencies, helping to explain why certain dates might or might not work, and how different business processes are interconnected.
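As a rough illustration of that kind of dependency reasoning, the Python sketch below models the construction example as a tiny graph and reports which upstream item makes a planned date infeasible. Everything here (the materials, tasks, dates and the `blockers` helper) is hypothetical and far simpler than a real knowledge graph.

```python
from datetime import date

# Hypothetical material delivery dates.
DELIVERY = {
    "concrete": date(2025, 3, 10),
    "steel": date(2025, 3, 20),
}

# Hypothetical dependency edges: each task lists what must be ready first.
DEPENDS_ON = {
    "foundation": ["concrete"],
    "framing": ["steel", "foundation"],
}

# Hypothetical planned start dates for each task.
PLANNED = {
    "foundation": date(2025, 3, 12),
    "framing": date(2025, 3, 15),
}


def blockers(task):
    """Return (dependency, date) pairs that fall after the task's planned date."""
    found = []
    for dep in DEPENDS_ON.get(task, []):
        # A dependency is either a delivered material or another planned task.
        dep_date = DELIVERY[dep] if dep in DELIVERY else PLANNED[dep]
        if dep_date > PLANNED[task]:
            found.append((dep, dep_date))
    return found


for task in PLANNED:
    for dep, dep_date in blockers(task):
        print(f"{task} planned {PLANNED[task]} but {dep} is not ready until {dep_date}")
```

Here the framing date fails because steel arrives five days after the planned start, which is the kind of explanation a plain spreadsheet cannot give on its own.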
Pushing AI with training
As companies like SAP make an AI push, one of the things holding people back is a lack of training. SAP has been working with employees, as far back as last year, to get them more comfortable with AI so they can take better advantage of it in their jobs.
“We actually hosted AI days throughout 2023 to educate people on what language models were. I think that this technology is so revolutionary that we want to make sure people understand how to use it,” he said. “I kind of joke about people becoming a computer whisperer. You need to know how to speak to machines, and you speak to each model differently.”
Sun knows that making AI useful inside SAP, or any enterprise, isn’t just about building AI-fueled tools. It’s about making them accessible, understandable and tuned to real-world business data, including data found in tabular formats. That means training the people as much as the models, and giving both the context they need to succeed.
Featured photo courtesy of SAP