1: Chapter 1: Setting The Stage- Deloitte AI Scandal 135
In December 2024, the Australian government paid Deloitte $290,000 for a report that appeared complete and professionally written but contained fabricated material throughout. Several citations referred to sources that do not exist, some quotations were attributed to judges who never made them, and multiple references pointed to academic work that cannot be found in any database. The content was generated using GPT-4o and delivered to the client without these issues being identified during internal review. The problems were later discovered by a university researcher after the report had already been submitted, which led Deloitte to issue a corrected version and return the final payment.
The failure originates from how current systems handle data-to-text generation. A single prompt is expected to read structured data, compute derived values, apply classification logic, organize content, and produce readable prose while preserving exact numerical and factual accuracy. These steps require different forms of reasoning, yet they are executed inside one probabilistic generation process without separation or verification between them. The result is text that is coherent at the surface level but unreliable when examined against the underlying data.
This becomes a scaling problem rather than a one-off mistake. When document production relies on this approach, teams must allocate time to verify outputs, reconcile inconsistencies, and correct numerical or factual errors. As volume increases, the cost of review grows in proportion, often offsetting the time saved during generation. Attempts to improve reliability by adding more prompts or introducing agent-based workflows tend to increase repetition of the same operations without establishing a stable mechanism for verification.
The approach presented in this series replaces that structure with a defined pipeline in which data processing, classification, generation, and validation are separated into distinct stages. Each stage has a fixed role, and outputs from earlier stages are treated as immutable inputs for later ones. The model is limited to producing language from already verified inputs rather than participating in computation or decision-making about the data itself.