Improving Generative AI Outputs by Using Structured Data

Much has been written about what Generative (Gen) AI can do, and about the hallucinations that are inherent in how it works. Gen AI can produce different results each time it is asked to be creative, and there are many situations where that creativity is exactly what is required.

There are other times when you want consistency rather than creativity, for example where facts matter and need to be correct. Conventional wisdom would suggest using Traditional AI instead of Gen AI in those cases. Shumaila Handoo, Director Consulting Services - CGI, India, put it well in her article on Synergized Artificial Intelligence: "Traditional AI places a stronger emphasis on effectiveness, predictability, and consistency, whereas Generative AI thrives on creativity and diversity." Although this is widely accepted to be true, there is a way of getting the best of both worlds, namely repeatable results using Gen AI. The key is to use Structured Data.

For those who don’t know, structured data is data that has a specific format and structure. For example, a database table for customers might contain Customer Number, Customer Name, Address, etc., with each data element represented in columns and rows. In contrast, unstructured data is one free-form stream, such as the text of a chat. With unstructured data, we cannot easily determine the context of the data: a name in a chat may represent a customer name or a company name.

Practical Example of using Customer Information

When working with structured data like CSV files containing customer information, generative AI models can be constrained to produce highly accurate results with far fewer hallucinations. For instance, consider a simple CSV file with three columns: Customer ID, Customer First Name, and Customer Last Name.

When a Gen AI model is prompted to analyze this data, it can ground its responses directly in the structured information provided. For example, when asked "What is Jane Doe’s Customer ID?" the model can extract the precise value from the CSV rather than generating a plausible but potentially incorrect response. Another example is asking for a list of all customers with a last name of “Smith”. Retrieving results this way enables reliable, repeatable outputs even when the Gen AI tool is otherwise given a great deal of creative freedom.
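As a rough illustration, the sketch below assumes a small, made-up customer file and uses only Python's standard library. The sample rows, the column names, and the helper names (find_customer_ids, customers_with_last_name, build_grounded_prompt) are all hypothetical and not taken from any particular tool. It shows the two lookups described above done deterministically, plus one possible way to embed the same data in a prompt so that a model is asked to answer from the file rather than from memory.

```python
import csv
import io

# Hypothetical sample data, invented purely for illustration.
CUSTOMER_CSV = """customer_id,first_name,last_name
1001,Jane,Doe
1002,John,Smith
1003,Anna,Smith
"""


def load_customers(csv_text):
    """Parse CSV text into a list of dicts, one per customer row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def find_customer_ids(rows, first_name, last_name):
    """Return the IDs of every customer matching the full name."""
    return [
        row["customer_id"]
        for row in rows
        if row["first_name"] == first_name and row["last_name"] == last_name
    ]


def customers_with_last_name(rows, last_name):
    """Return full names of all customers with the given last name."""
    return [
        f'{row["first_name"]} {row["last_name"]}'
        for row in rows
        if row["last_name"] == last_name
    ]


def build_grounded_prompt(csv_text, question):
    """Embed the CSV in the prompt and tell the model to answer only from it."""
    return (
        "Answer using only the CSV data below. "
        "If the answer is not in the data, reply 'not found'.\n\n"
        f"{csv_text}\n"
        f"Question: {question}"
    )


rows = load_customers(CUSTOMER_CSV)
print(find_customer_ids(rows, "Jane", "Doe"))
print(customers_with_last_name(rows, "Smith"))
print(build_grounded_prompt(CUSTOMER_CSV, "What is Jane Doe's Customer ID?"))
```

The deterministic lookups double as a reference answer: whatever a model returns for the grounded prompt can be compared against them.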

Preventing or Minimizing Hallucinations

Gen AI is far less likely to hallucinate when using structured data because it has a concrete, unambiguous reference point to anchor its responses. With unstructured input, the model must synthesize information from its training data, potentially introducing errors or fabrications. Structured data, by contrast, provides an explicit ground truth against which responses can be verified.

The structured format removes much of the ambiguity, providing clear parameters for what constitutes a valid response. When processing a CSV file with customer data, the model can simply retrieve the value from the appropriate row and column rather than having to generate information based on statistical patterns in its training data. When properly implemented and verified against the source, this can bring the probability of hallucination close to zero.

Furthermore, structured data enables deterministic processing pipelines, where the conversion from raw data to final output follows clear, verifiable steps. The data at each step in this pipeline can be validated, ensuring that the final output is traceable back to source data rather than being generated through opaque statistical inference. Now that we've established how structured data grounds Gen AI, let's examine its advantages over traditional AI systems.
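One way to picture such a validation step is sketched below. It assumes the model has been asked to return its answer as JSON with a customer_id field and a value field. That response format, the sample rows, and the validate_answer helper are assumptions made for illustration, not part of any specific product.

```python
import json

# Hypothetical source rows keyed by customer ID (illustrative data only).
SOURCE_ROWS = {
    "1001": {"first_name": "Jane", "last_name": "Doe"},
    "1002": {"first_name": "John", "last_name": "Smith"},
}


def validate_answer(raw_answer, field):
    """Check a model's JSON answer against the source data.

    Returns (is_valid, reason). The answer is accepted only if it parses,
    names a row that exists, and its value matches that row exactly.
    """
    try:
        answer = json.loads(raw_answer)
    except json.JSONDecodeError:
        return False, "answer is not valid JSON"

    if not isinstance(answer, dict):
        return False, "answer is not a JSON object"

    row = SOURCE_ROWS.get(str(answer.get("customer_id")))
    if row is None:
        return False, "customer_id not found in source data"

    if field not in row:
        return False, "unknown field"

    if answer.get("value") != row[field]:
        return False, "value does not match source data"

    return True, "traceable to source row"


# A correct answer and a fabricated one, as a model might return them.
print(validate_answer('{"customer_id": "1001", "value": "Doe"}', "last_name"))
print(validate_answer('{"customer_id": "1001", "value": "Dough"}', "last_name"))
```

Rejecting anything that cannot be matched to a source row is a deliberately strict choice: it trades some flexibility for the traceability described above.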

Advantages of Gen AI Over Traditional AI

This approach of grounding Gen AI in structured data offers several advantages over traditional AI approaches:

Flexibility with reliability: Unlike traditional AI systems that often require rigid, predefined schemas, Gen AI can understand and adapt to various structured data formats without extensive reprogramming. In other words, you can change the structure of the input data without having to retrain the model. It can also intuitively grasp relationships between data points while maintaining factual accuracy.
Data synthesis capabilities: Gen AI can combine information from multiple structured sources, generating insights that might not be apparent from examining individual datasets in isolation. With traditional AI, one would need to provide context on how several models relate to each other. Gen AI can often work that out on its own because it is dealing with structured data.
Explanation generation: Beyond providing raw data, Gen AI can better explain trends, anomalies, and relationships within the structured data in human-understandable terms.
Lower technical barriers: This approach simplifies data analysis, allowing non-technical users to extract meaningful insights from structured data without specialized programming skills.

Disadvantages of Gen AI Compared to Traditional AI

Despite its advantages, using Gen AI with structured data also has limitations:

Processing overhead: Gen AI models typically require more computational resources than traditional database systems when processing structured data, potentially resulting in higher latency and operational costs, though this gap is expected to narrow as Large Language Model (LLM) optimization advances. This is similar to how compiler optimization (e.g. for C++ or C#) has improved markedly over the last 30 years.
Potential for misinterpretation: When given only the data, without accompanying metadata, Gen AI might occasionally misinterpret structured data formats or relationships if they differ significantly from common patterns in its training data. However, this can be substantially mitigated by providing metadata that describes the data, along with domain values where needed and available.
Prompt sensitivity: Results can be highly dependent on how queries are formulated, with subtle changes in wording potentially leading to different interpretations of the structured data. This can also be mitigated with a well-written, unambiguous prompt and with testing to ensure consistent results. Complex prompts can also be split into smaller tasks, which allows for better tuning.
Possibly worse performance with Gen AI: Because Gen AI relies on large neural networks, a highly tuned, purpose-built machine learning model may handle certain data processing tasks just as well while using fewer resources.
Versioning challenges: As Gen AI models are updated, their interpretation of the same structured data might change, potentially compromising reproducibility of results. This can be mitigated by retesting prompts each time there is a version change to ensure they continue to work as expected. In the data world this would be considered “regression testing.”
Privacy and security concerns: Processing sensitive structured data through Gen AI systems may raise questions about data governance, particularly when using cloud-based services. However, if we are comparing traditional AI in the cloud versus Gen AI in the cloud, this is a concern for both. To mitigate these problems, one can use Gen AI models that can be deployed on premises, for example open-weight models such as Llama from Meta, in which case the data governance issues still apply but they stay within one’s organization.
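Two of the mitigations in the list above, supplying metadata with domain values and retesting prompts after a model update, can be sketched roughly as follows. Everything named here is hypothetical: the column metadata, the describe_columns and run_regression helpers, and the stub_model function, which stands in for a real model call so the example runs offline.

```python
# Hypothetical column metadata, including allowed domain values where known.
COLUMN_METADATA = {
    "customer_id": {"type": "integer", "description": "Unique customer number"},
    "status": {
        "type": "string",
        "description": "Account status",
        "allowed_values": ["active", "closed"],
    },
}


def describe_columns(metadata):
    """Render the metadata as plain text that can be prepended to a prompt."""
    lines = []
    for name, info in metadata.items():
        line = f'- {name} ({info["type"]}): {info["description"]}'
        if "allowed_values" in info:
            line += f'; allowed values: {", ".join(info["allowed_values"])}'
        lines.append(line)
    return "\n".join(lines)


def run_regression(ask_model, cases):
    """Run each (prompt, expected) pair and return the prompts that failed.

    `ask_model` is any callable that takes a prompt string and returns the
    model's answer as a string. It stands in for a real model client.
    """
    return [
        prompt for prompt, expected in cases
        if ask_model(prompt).strip() != expected
    ]


def stub_model(prompt):
    """Stand-in for a real model call so the sketch runs offline."""
    canned = {"How many customers are active?": "2"}
    return canned.get(prompt, "not found")


# Hypothetical test cases: each pairs a prompt with its known-good answer.
CASES = [
    ("How many customers are active?", "2"),
    ("How many customers are closed?", "1"),
]

print(describe_columns(COLUMN_METADATA))
print(run_regression(stub_model, CASES))
```

In practice the same list of cases would be rerun whenever the model version or the prompt wording changes, and any prompt that shows up in the failure list would be reviewed before release.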

A skeptical reader might ask: "Why use Gen AI for structured data instead of traditional AI models that are purpose-built for it?" While traditional AI models excel at processing structured data, they often lack the adaptability and natural language capabilities that Gen AI provides. Gen AI allows non-technical users to interact with structured data conversationally, unlocking new ways to extract insights. In addition, when the model is grounded in the structured data as described earlier, the results can be highly accurate.

Making Gen AI Shine with Structured Data

Despite the challenges, numerous applications stand to benefit significantly from combining Gen AI with structured data:

Healthcare analytics: Processing structured electronic health records to identify patterns, predict outcomes, and generate personalized treatment recommendations while maintaining factual accuracy.
Financial reporting: Transforming structured financial data into accessible narrative reports that explain complex trends to stakeholders without financial expertise.
Supply chain optimization: Analyzing structured inventory, logistics, and demand data to generate actionable insights and recommendations for improving efficiency.
Customer experience personalization: Using structured customer interaction data to generate highly personalized communications that remain factually accurate about products, services, and account details.
Scientific research: Processing structured experimental data to generate hypotheses, identify patterns, and explain findings in natural language accessible to broader audiences.
Educational assessment: Analyzing structured student performance data to generate personalized feedback and learning recommendations tailored to individual needs.
Public sector services: Processing structured government data to provide citizens with clear, accurate information about services, eligibility, and procedures through conversational interfaces.

Conclusion

By combining Gen AI with Structured Data, we can get the best of both worlds: the usability of Gen AI together with the accuracy of the underlying structured data. Organizations can transform Gen AI from an unpredictable creative tool into a reliable, data-driven assistant. As AI continues to evolve, this hybrid approach may set a new standard for balancing innovation with accuracy.