June 20, 2025

Waiting for AI to Catch Up

I've seen a lot of X posts about "waiting for the AI to catch up" - namely, companies that spend heavily on marketing a half-baked product while waiting for the models to catch up to it. My own first company had a similar problem.

My first company, Leafpress, was focused on AI energy data management. A huge part of our product involved collecting and parsing utility data.

Utility data is one of the hardest document types for AI to parse, because of the wildly different formats and charge structures. A "total due" could be interpreted differently depending on the provider, and account numbers, meter numbers, costs, and units all needed to be grouped together. On top of that, the bills were extremely long. We had to chunk the documents as they came in - but we didn't always know where one bill ended and the next started. Chunking them incorrectly often made things worse.
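The boundary problem above can be sketched with a simple heuristic: scan each page's text for markers that usually open a new bill, and start a new chunk there. This is a hypothetical illustration, not Leafpress's actual pipeline; the marker patterns are assumptions.

```python
import re

# Hypothetical sketch: group a multi-bill document's page texts into per-bill
# chunks by looking for markers that typically open a new bill. The patterns
# here are illustrative, not Leafpress's actual rules.
BILL_START = re.compile(
    r"(Account\s+(?:Number|No\.?)\s*[:#])|(Page\s+1\s+of\s+\d+)",
    re.IGNORECASE,
)

def chunk_bills(pages: list[str]) -> list[list[str]]:
    """Group page texts into bills; a page matching BILL_START opens a new bill."""
    bills: list[list[str]] = []
    for page in pages:
        if BILL_START.search(page) or not bills:
            bills.append([page])    # start a new bill
        else:
            bills[-1].append(page)  # continuation of the current bill
    return bills
```

The weakness is exactly what we hit: when a provider's bills lack a consistent opening marker, pages get glued to the wrong bill, and every downstream extraction step inherits the error.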

Our first strategy was to spend a ton of time refining our prompting and model configuration, but we were working with GPT-3 and quickly realized that no matter how we worded the prompt, the output was unreliable and highly nondeterministic. Eventually, we hired a data QA person to help validate and manage our evaluations. We didn't start building evals until embarrassingly late - we hadn't realized how important they were - but we pulled together a support platform and a library of our worst bills to test against. Even that wasn't particularly useful, since the model never actually improved.
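A "worst bills" library like ours can be run as a simple regression loop: extract the key fields from each hard bill and compare them against hand-labeled answers. This is a minimal sketch under assumptions; `extract_bill`, `Expected`, and the field names are hypothetical, not our actual schema.

```python
# Hypothetical sketch of a "worst bills" eval loop: run an extractor over a
# library of hard bills and compare key fields to hand-labeled answers.
# `extract_bill` and the field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Expected:
    total_due: float
    account_number: str

def run_evals(extract_bill, cases: dict[str, Expected]) -> dict[str, bool]:
    """Return pass/fail per bill; totals compared with a one-cent tolerance."""
    results = {}
    for bill_text, want in cases.items():
        got = extract_bill(bill_text)
        results[bill_text] = (
            abs(got["total_due"] - want.total_due) <= 0.01
            and got["account_number"] == want.account_number
        )
    return results
```

Tracking the pass rate across prompt and model changes is what makes an upgrade measurable - which is also why a suite like this shows flat numbers when the underlying model simply isn't improving.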

We tried to optimize every part of the process: we switched to better OCR engines, experimented with different chunking methods, added examples to the prompt, and even had the model mechanically highlight bolded items from the bills. Another issue? Utility bills needed to be 100% accurate - some carried million-dollar charges, and misreading them could be a huge problem. They also had to be read fast: most had a one-week window to be collected and paid, so our processing time was tight, leaving no room for manual review.

We spent all that time trying to perfect something - and no, we never fine-tuned the model. We didn't have enough data from a diverse range of providers, and we were worried that fine-tuning would overfit the model and distort its behavior in ways that caused new problems. Plus, as a seed-stage team, we just didn't have the resources.

And then the miracle happened.

GPT-4o

All those months of work navigating prompts? Gone. The new model could read utility bills with incredible accuracy. It grouped charges correctly, handled formatting issues - all of it. Our customers immediately saw improvements in both accuracy and turnaround time. Our team felt a massive wave of relief.

Which made me ask: Was waiting actually the right strategy?

We had customers, including recognizable brands, in the energy space. We were one of the budding startups in the sector, part of YC S23 and listed as one of the top startups to watch on Demo Day. By the time the model was ready, we had the engineering talent, market fit, and real customers (Lindt, Johnson Controls) ready to use our product.

In the AI era, where things move fast, a lot of people wonder if waiting is a real strategy. It wasn't our intention - but we did end up benefiting from it. And now we're seeing other teams (like Icon, Cluely) leaning into that same bet.

But there's a big downside to this approach: it makes it really hard to prove user demand or iterate. Banking on new tech to "save" you removes your ability to build something scrappy and earn user trust. If your product doesn't meet people's needs - or if it feels like the founders aren't trying to make customers happy - retention will suffer, and trust fades fast.

Even though our product had challenges, we implemented evals and built support around the product so customers could still experience the "aha" moment - even if the model wasn't perfect. I think that mindset needs to be applied to any team "waiting on the next model." How can you still deliver real value and a clear "aha" moment now? There's always an answer to that question - and it usually comes back to first principles.

Products that users don't want won't get you anywhere.
