The problem
The client is a general contractor in Florida — a small ops team running dozens of active jobs and dealing with hundreds of suppliers a year. Their Google Sheet had every supplier name in it. Almost no actual prices.
Every supplier sends data differently: a printed receipt from the lumber yard, a PDF quote attached to an email, a phone snap of a Home Depot invoice, a forwarded spec sheet from Sika or GAF. Their PM was retyping these one by one — when she had time, which was rarely. Most rows just never got priced. So bidding new jobs ran on last quarter's pricing and gut feel.
The pain wasn't lack of software. They'd already tried two SaaS tools. The pain was: no tool actually does the boring work of reading a crumpled receipt and turning it into structured rows in their sheet, with their item naming and their supplier IDs.
What we built (working demo)
A document pipeline that lives where their team already works — Google Drive and Google Sheets. No new app to learn. The pieces below are all working in the demo we walked the client through; the paid pilot is being scoped now.
- Document ingestion. Drop a PDF, JPG, HEIC, or PNG into a watched Drive folder and the pipeline picks it up within seconds. Email forwarding to a dedicated intake address is wired up but still being hardened for the paid pilot.
- AI extraction. A routing layer picks the right vision-language model per document type (cheap model for clean printed receipts, stronger model for handwritten or crumpled images, the strongest for complex PDFs with multi-page line items).
- Structured output. 20 columns extracted per row — supplier, item name, SKU, size, color, thickness, unit price, quantity, tax, total, document type (bid vs quote vs invoice vs receipt), date, and more.
- Supplier fuzzy matching. New documents are matched against the client's existing 123-supplier list (from their Excel file), each match with a confidence score. No more "HOME DEPOT" / "Home Depot, Inc." / "HD #4321" living as three separate vendors.
- Deduplication. Composite key on company | item | size, so a repeat receipt updates the existing row instead of doubling it.
- Source-doc archive. Every extracted row links back to the original receipt or PDF (private Drive). When the PM asks "where did this $11.42 SIKA price come from?", it's one click.
- Review-first UI. The PM sees the original document side by side with the extracted rows. She approves what's right, rejects noise (subtotals, taxes, freight), and only signed-off rows land in the sheet.
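Here is a minimal sketch of supplier matching with a confidence score. It is not the production matcher. It uses Python's standard-library difflib.SequenceMatcher as the similarity score. The normalization rules, the ALIASES table, and the 0.85 threshold are all hypothetical. The alias step matters because no string-similarity score will connect "HD" to "Home Depot" on its own.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical alias table for abbreviations that string similarity cannot catch.
ALIASES = {"hd": "home depot"}

# Illustrative cleanup rules: store numbers like "#4321" and common legal suffixes.
STORE_NUMBER = re.compile(r"#\s*\d+")
LEGAL_SUFFIX = re.compile(r"\b(inc|llc|co|corp|ltd)\b")


def normalize(name: str) -> str:
    """Lowercase, strip store numbers, punctuation and legal suffixes, then apply aliases."""
    s = STORE_NUMBER.sub(" ", name.lower())
    s = re.sub(r"[^a-z0-9 ]", " ", s)
    s = LEGAL_SUFFIX.sub(" ", s)
    s = " ".join(s.split())
    return ALIASES.get(s, s)


def match_supplier(raw: str, known: list[str], threshold: float = 0.85) -> tuple[Optional[str], float]:
    """Return (best known supplier, or None if below threshold) and its score in [0, 1]."""
    target = normalize(raw)
    best, best_score = None, 0.0
    for candidate in known:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return (best if best_score >= threshold else None), round(best_score, 2)


if __name__ == "__main__":
    known = ["Home Depot", "Sika Corporation", "GAF Materials"]  # sample supplier list
    for raw in ["HOME DEPOT", "Home Depot, Inc.", "HD #4321", "Acme Lumber"]:
        print(raw, "->", match_supplier(raw, known))
```

The first three inputs all resolve to "Home Depot". "Acme Lumber" falls below the threshold and returns None. In a setup like this, a None result is a natural candidate for the PM's review step rather than an automatic assignment.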
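The composite-key deduplication can be sketched as an upsert into a plain dict, which stands in for the Sheet here. The LineItem fields are a hypothetical subset of the 20 extracted columns. Ignoring case and extra whitespace inside the key is an assumption of this sketch; the write-up does not specify it.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    # Hypothetical subset of the extracted columns.
    company: str
    item: str
    size: str
    unit_price: float
    quantity: float
    source_link: str  # link back to the archived receipt or PDF


def dedup_key(row: LineItem) -> str:
    """Build the company | item | size key, ignoring case and extra whitespace."""
    parts = (row.company, row.item, row.size)
    return " | ".join(" ".join(p.lower().split()) for p in parts)


def upsert(sheet: dict[str, LineItem], row: LineItem) -> bool:
    """Insert a new row or overwrite the existing one; return True if the key was new."""
    key = dedup_key(row)
    is_new = key not in sheet
    sheet[key] = row  # a repeat receipt replaces the row instead of adding a second one
    return is_new


if __name__ == "__main__":
    sheet: dict[str, LineItem] = {}
    # Illustrative values only.
    upsert(sheet, LineItem("Home Depot", "2x4 Stud", "8 ft", 3.50, 10, "receipt-1"))
    upsert(sheet, LineItem("HOME DEPOT", "2x4  stud", "8 ft", 3.75, 6, "receipt-2"))
    print(len(sheet), list(sheet))
```

The second receipt maps to the same key, so the sheet still holds one row, now carrying the newer price and source link.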
What we deliberately did not build (yet)
We made a few sharp decisions to keep the MVP cheap and useful instead of overengineered:
- No custom DB in v1. Their Google Sheet stays the source of truth. Postgres comes in Phase 2, when multi-user audit logs and history start to matter.
- No QuickBooks integration yet. They're sheet-first today; QB sync becomes Phase 2 once the upstream data is clean.
- No mobile app. A photo from the crew's phone works: they email it or drop it in Drive. A native app would have added three weeks for zero new value.
The architecture, in one breath
Watched-folder ingestion → document classifier → model router (OpenRouter, 6 models, picked per document type) → structured extraction → fuzzy supplier match → dedup → review queue → approved rows written to the client's Sheet via an Apps Script webhook with a secret token. Source docs are uploaded to private Drive, and every Sheet row holds a link back.
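The router step can be sketched as a small rule table over the classifier's output. The three tiers mirror the routing described under AI extraction: cheap for clean printed receipts, stronger for handwritten or crumpled images, and strongest for multi-page PDFs. The Tier labels, the DocInfo fields, and the rule order are hypothetical. A real deployment would map each tier to one of the six OpenRouter model IDs, which this write-up does not name.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tiers; each would map to a concrete OpenRouter model ID.
    CHEAP = "cheap"
    STRONG = "strong"
    STRONGEST = "strongest"


@dataclass
class DocInfo:
    # Hypothetical classifier output.
    mime_type: str     # e.g. "application/pdf", "image/jpeg", "image/heic"
    page_count: int
    handwritten: bool  # classifier's guess
    degraded: bool     # crumpled, blurry, or skewed photo


def route(doc: DocInfo) -> Tier:
    """Apply the rules in order; the first match wins."""
    if doc.mime_type == "application/pdf" and doc.page_count > 1:
        return Tier.STRONGEST  # multi-page PDFs with line items spread across pages
    if doc.handwritten or doc.degraded:
        return Tier.STRONG     # hard-to-read documents
    return Tier.CHEAP          # clean printed receipts
```

Keeping the rules in one small function makes it cheap to re-tune which documents earn the expensive model as real cost and accuracy numbers come in.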
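The final hop, pushing approved rows to the Sheet, can be sketched with the standard library. The payload shape (a token field next to a rows array), the function name, and the URL are hypothetical placeholders. The Apps Script side would need a doPost handler that checks the token before appending anything. This sketch puts the token in the JSON body rather than a header because the event object that Apps Script passes to doPost does not include request headers.

```python
import json
import urllib.request


def build_sheet_request(webhook_url: str, token: str, rows: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a POST carrying approved rows and the shared secret."""
    body = json.dumps({"token": token, "rows": rows}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_sheet_request(
        "https://script.google.com/macros/s/DEPLOYMENT_ID/exec",  # placeholder URL
        "SECRET_FROM_ENV",  # placeholder; load the real token from config, never hard-code it
        [{"company": "Home Depot", "item": "2x4 Stud", "size": "8 ft"}],
    )
    # urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
    print(req.get_method(), req.full_url)
```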
Where it stands
Demo signed off by the client's PM, NDA in place, paid pilot being scoped. The pilot will move it off the founder's laptop onto real hosting, add proper multi-user auth, harden email ingestion, and ship a mobile photo upload for the field crew. Phase 2 (Postgres, an audit log, and real-time price-anomaly alerts such as "SIKA went up 15% this month") sits on the roadmap after that.
Why this matters beyond one client
This pattern — turning unstructured supplier docs into structured rows — is the same shape that hits accountants, law firms, logistics, insurance, property management, and medical billing. We took the slowest, most expensive piece of someone's week and made it disappear. If that's a process you recognize in your own business, the discovery call is 15 minutes.