The Narrative Compiler Framework: Fixing LLM Hallucination & Tokenomics

Chapter 1: Setting the Stage - Deloitte AI Scandal

In December 2024, the Australian government commissioned a report from Deloitte for roughly US$290,000 (about AU$440,000). The report appeared complete and professionally written but contained fabricated material throughout. Several citations referred to sources that do not exist, some quotations were attributed to judges who never made them, and multiple references pointed to academic work that cannot be found in any database. The content was generated using GPT-4o and delivered to the client without these issues being identified during internal review. A university researcher discovered the problems only after the report had been submitted, which led Deloitte to issue a corrected version and refund the final payment.

The failure originates in how current systems handle data-to-text generation. A single prompt is expected to read structured data, compute derived values, apply classification logic, organize content, and produce readable prose while preserving exact numerical and factual accuracy. These steps require different forms of reasoning, yet they are executed inside one probabilistic generation process with no separation or verification between them. The result is text that is coherent on the surface but unreliable when checked against the underlying data.

This is a scaling problem rather than a one-off mistake. When document production relies on this approach, teams must allocate time to verify outputs, reconcile inconsistencies, and correct numerical or factual errors. As volume increases, the cost of review grows in proportion, often offsetting the time saved during generation. Attempts to improve reliability by adding more prompts or introducing agent-based workflows tend to repeat the same operations more often without establishing a stable mechanism for verification.

The approach presented in this series replaces that structure with a defined pipeline in which data processing, classification, generation, and validation are separated into distinct stages. Each stage has a fixed role, and outputs from earlier stages are treated as immutable inputs for later ones. The model is limited to producing language from already verified inputs; it takes no part in computation or in decisions about the data itself.
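
The staged structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the series: the names `Metrics`, `Classified`, `process`, `classify`, `generate`, and `validate`, the revenue figures, and the 5% thresholds are all hypothetical. In particular, `generate` is a plain template standing in for a language model call, so that the example runs without any external service.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: later stages cannot modify earlier outputs
class Metrics:
    region: str
    revenue_prev: int
    revenue_curr: int
    growth_pct: float


@dataclass(frozen=True)
class Classified:
    metrics: Metrics
    trend: str


def process(region: str, revenue_prev: int, revenue_curr: int) -> Metrics:
    """Stage 1: compute derived values once, in ordinary code."""
    growth = round((revenue_curr - revenue_prev) / revenue_prev * 100, 1)
    return Metrics(region, revenue_prev, revenue_curr, growth)


def classify(m: Metrics) -> Classified:
    """Stage 2: apply fixed rules (the thresholds here are illustrative)."""
    if m.growth_pct >= 5.0:
        trend = "growth"
    elif m.growth_pct <= -5.0:
        trend = "decline"
    else:
        trend = "stable"
    return Classified(m, trend)


def generate(c: Classified) -> str:
    """Stage 3: wording only. A template stands in for the model call."""
    m = c.metrics
    return (f"{m.region} revenue went from {m.revenue_prev} to "
            f"{m.revenue_curr}, a change of {m.growth_pct}% ({c.trend}).")


def validate(text: str, c: Classified) -> list[str]:
    """Stage 4: return every number in the text that is not a verified value."""
    m = c.metrics
    allowed = {str(m.revenue_prev), str(m.revenue_curr), str(m.growth_pct)}
    return [n for n in re.findall(r"-?\d+(?:\.\d+)?", text) if n not in allowed]


classified = classify(process("North", 1200, 1500))
draft = generate(classified)
print(draft)
print(validate(draft, classified))                          # no unverified numbers
print(validate(draft.replace("25.0", "26.0"), classified))  # flags the altered value
```

The design point is that the growth figure is computed in stage 1 and never recomputed: the generation step can only phrase it, and the validation step can reject any draft whose numbers do not match the frozen record.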

Chapter 2: Why Agents, MCP, and RAG Fail for Data-to-Text

The current default approach to generating documents from data combines agents, multi-chain prompting (abbreviated MCP here), and retrieval-augmented generation (RAG). These methods are often grouped together in practice, and they share the same structural issue: the model repeatedly interprets and transforms the same data without a fixed, verifiable intermediate state.

Start with agent workflows. A typical setup assigns roles such as writer, reviewer, and editor. Each role operates on text produced by the previous step while also referencing the original data. The data is not processed once and stored as a stable representation; it is re-read and reinterpreted at every stage. Derived values are recomputed multiple times, sometimes with small differences. The final document depends on a chain of generated text rather than a single transformation from source data. When a number is incorrect, there is no clear point in the process where the error can be isolated, because each stage mixes interpretation with generation.

Multi-chain prompting attempts to impose order by splitting the task into explicit steps within a single workflow. One step extracts information, another computes metrics, another organizes structure, and a final step generates the document. This looks closer to a pipeline, but the boundaries are not enforced. Each step still depends on the model to preserve exact values from the previous step, so intermediate outputs remain probabilistic. A value that is slightly altered during extraction becomes the input for all subsequent steps. The system accumulates small inconsistencies rather than preventing them.

Retrieval-augmented generation changes how data is accessed, not how it is processed. Relevant documents or records are retrieved and inserted into the prompt, and the model then reads and synthesizes them. For data-to-text tasks, this means the model is responsible for selecting, combining, and expressing values from the retrieved sources. If multiple sources contain overlapping or conflicting information, the model resolves the conflict implicitly during generation. Nothing requires the output to match any single source exactly. Retrieval improves coverage but does not enforce consistency.

These methods are often combined. A system may retrieve data, process it through multiple prompting steps, and coordinate the process with agents. The number of transformations applied to the same data increases, and each transformation introduces another opportunity for deviation. Token usage grows because the same information is processed repeatedly. The final output reflects a sequence of interpretations rather than a controlled mapping from input to output.

Data-to-text generation requires a different structure. Numerical values must remain exact. Classifications must follow defined rules. Every statement must be traceable to a source. These requirements assume that data is processed once, stored in a stable form, and then used consistently throughout the pipeline. Agents, MCP, and RAG do not provide this property because they rely on iterative interpretation.

They remain useful in earlier stages where the goal is to gather information, explore alternatives, or synthesize unstructured inputs. In those contexts, variation is acceptable and often necessary. Once the data is fixed and the task is to produce a document that must align exactly with that data, the process must shift to a deterministic pipeline where computation, classification, and generation are separated and verified.
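
One way to picture "processed once, stored in a stable form" is a read-only snapshot with a content fingerprint that downstream steps check before using it. The Python sketch below is an assumption-laden illustration, not the mechanism from the series: the helper names `fingerprint`, `freeze`, `render`, and `Statement`, and the sample figures, are hypothetical. It uses only the standard library (`hashlib`, `json`, `types.MappingProxyType`).

```python
import hashlib
import json
from dataclasses import dataclass
from types import MappingProxyType


def fingerprint(facts) -> str:
    """Hash a canonical JSON form so any changed value changes the digest."""
    canonical = json.dumps(dict(facts), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def freeze(facts: dict):
    """Process once: return a read-only view and the fingerprint to check later."""
    snapshot = MappingProxyType(dict(facts))
    return snapshot, fingerprint(snapshot)


@dataclass(frozen=True)
class Statement:
    template: str             # wording, with named placeholders
    sources: tuple[str, ...]  # keys in the snapshot that the wording may use


def render(stmt: Statement, facts, digest: str) -> str:
    """Fill a statement only from the verified snapshot."""
    if fingerprint(facts) != digest:
        raise ValueError("facts changed since they were verified")
    # A placeholder that is not listed in `sources` raises KeyError here,
    # so every value in the sentence maps back to a declared source key.
    return stmt.template.format(**{key: facts[key] for key in stmt.sources})


facts, digest = freeze({"revenue_q1": 1200, "revenue_q2": 1500, "growth_pct": 25.0})
stmt = Statement("Revenue rose {growth_pct}% to {revenue_q2}.",
                 ("growth_pct", "revenue_q2"))
print(render(stmt, facts, digest))
```

In this sketch a value that drifts between steps is caught by the fingerprint check instead of propagating silently, and each sentence carries the list of source keys it was built from, which is the traceability property the chapter asks for.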