
Content Summary
Discussion & Opinion
Multimodal Evals: 🦄 #34 • Boundary
TL;DR
This episode demonstrates how to build a reliable multimodal eval system for receipt data extraction without a golden data set, using runtime invariant checks (like verifying subtotals add up to grand totals) instead of labeled ground truth. Kevin Gregory shows that by designing self-validating evals, iterating on data models based on discovered edge cases, and switching to Gemini Flash (3:20), he built a high-accuracy receipt extraction pipeline in roughly 3-4 hours. The key thesis is that understanding your data deeply and designing good eval infrastructure upfront makes multimodal AI problems dramatically easier than expected.
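To make the idea of a runtime invariant check concrete, here is a minimal sketch in Python. It is not the code from the episode: the `Receipt` fields, the `check_invariants` name, and the one-cent tolerance are assumptions chosen for illustration. It shows how an extracted receipt can be validated against its own arithmetic, with no labeled answer to compare to.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Receipt:
    # Hypothetical data model; the episode's actual fields may differ.
    item_prices: list[Decimal]
    subtotal: Decimal
    tax: Decimal
    grand_total: Decimal


def check_invariants(
    receipt: Receipt, tolerance: Decimal = Decimal("0.01")
) -> list[str]:
    """Return the names of the invariants that the extracted receipt violates."""
    failures = []

    # Invariant 1: line items should add up to the subtotal.
    items_sum = sum(receipt.item_prices, Decimal("0"))
    if abs(items_sum - receipt.subtotal) > tolerance:
        failures.append("items_do_not_sum_to_subtotal")

    # Invariant 2: subtotal plus tax should equal the grand total.
    if abs(receipt.subtotal + receipt.tax - receipt.grand_total) > tolerance:
        failures.append("subtotal_plus_tax_not_grand_total")

    return failures
```

An empty list means the extraction is internally consistent. It does not prove every field is correct, but a violated invariant reliably flags a receipt worth inspecting, which is the point of a self-validating eval.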
ELI5
Imagine you're checking your friend's math homework, but you don't have the answer key. You can still check if they're right by adding up all the small numbers to see if they equal the big number at the bottom! That's what Kevin did with receipts: he had the computer read each receipt and then checked whether the numbers added up, without ever needing someone to supply the right answers first.
Quick Actions
- Design runtime evals based on mathematical invariants rather than requiring golden labeled data
- Always look at your raw data before writing any extraction code
- Use Gemini Flash for multimodal OCR/extraction tasks
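
One way to act on the first two items together is a small harness that runs invariant checks over a batch of extractions and lists the receipts that fail, so you know which raw images to look at next. The sketch below is an assumption-laden illustration, not the pipeline from the episode: `run_eval`, the check names, and the `*_cents` field names are all hypothetical, and the extraction step (the model call) is left out.

```python
from collections import Counter
from typing import Callable

# A check takes the extracted fields of one receipt and returns True if it passes.
Check = Callable[[dict], bool]


def run_eval(
    extractions: dict[str, dict], checks: dict[str, Check]
) -> tuple[float, Counter, list[str]]:
    """Run every check on every extraction.

    Returns (pass_rate, failure counts per check, ids of failing receipts).
    """
    failures_by_check: Counter = Counter()
    failed_ids: list[str] = []

    for receipt_id, fields in extractions.items():
        failed = [name for name, check in checks.items() if not check(fields)]
        if failed:
            failed_ids.append(receipt_id)
            failures_by_check.update(failed)

    total = len(extractions)
    pass_rate = (total - len(failed_ids)) / total if total else 0.0
    return pass_rate, failures_by_check, failed_ids


# Hypothetical checks; amounts are integer cents to avoid float rounding.
example_checks: dict[str, Check] = {
    "totals_add_up": lambda f: f["subtotal_cents"] + f["tax_cents"] == f["total_cents"],
    "total_is_positive": lambda f: f["total_cents"] > 0,
}
```

The per-check counts show which invariant breaks most often, which can hint at whether the data model needs another field (for example a tip or discount) rather than the extraction being wrong.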