AI Dev 26 x SF | Jerry Liu: My Agent Can’t Read a PDF?

Descriptions:

Jerry Liu, co-founder and CEO of LlamaIndex, delivers a conference talk at AI Dev 26 in San Francisco explaining why document parsing remains one of the most underestimated bottlenecks in production agentic AI systems. Drawing on more than one billion pages processed and 300,000 users on the LlamaParse platform, Liu argues that most agent failures trace back not to reasoning capability but to low-quality document context: garbage in, garbage out at enterprise scale.

Liu breaks down why 20 years of OCR progress still leaves major gaps for AI workflows: complex tables, multi-column layouts, embedded charts, and fine-grained financial data confuse even frontier vision-language models like Claude Opus 4.7. He notes that naively screenshotting pages and feeding them into a VLM works for interactive assistants where users absorb token costs, but becomes economically unworkable when processing millions of documents. LlamaParse’s approach combines specialized layout detection with bounding box grounding to achieve accuracy levels that general-purpose VLMs cannot match at comparable cost.

Beyond raw extraction accuracy, Liu emphasizes citation infrastructure as a critical design requirement for enterprise agent workflows: financial analysts, legal teams, and insurance processors need to trace an agent’s conclusion back to a specific region in the source document. This grounding capability (knowing not just what the text says but exactly where on the page it appears) does not come out of the box with standard VLM API calls; implementing it reliably requires dedicated layout modeling.
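
To make the grounding idea concrete, here is a minimal sketch of how extracted text might be paired with its location on the page so that a conclusion can be traced back to a region. The names `BoundingBox`, `GroundedSpan`, and `cite`, the normalized coordinate convention, and the sample values are all assumptions made for illustration; this is not LlamaParse’s actual output schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical page region, in normalized [0, 1] page coordinates."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class GroundedSpan:
    """A piece of extracted text plus the page region it came from."""
    text: str
    box: BoundingBox


def cite(spans: list[GroundedSpan], needle: str) -> list[BoundingBox]:
    """Return the source region of every span whose text contains `needle`."""
    needle = needle.lower()
    return [s.box for s in spans if needle in s.text.lower()]


# Made-up parser output for one page of a financial document.
spans = [
    GroundedSpan("Total revenue: $4.2M", BoundingBox(3, 0.10, 0.42, 0.55, 0.46)),
    GroundedSpan("Operating costs: $1.9M", BoundingBox(3, 0.10, 0.47, 0.55, 0.51)),
]

# An agent quoting "total revenue" can point a reviewer at page 3 and the
# exact rectangle that contains the figure.
hits = cite(spans, "total revenue")
```

Keeping the box alongside the text, rather than discarding it after extraction, is what would let a reviewer highlight the cited region in the original document.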


📺 Source: DeepLearningAI · Published May 22, 2026
🏷️ Format: Deep Dive
