Applied AI Thinking for Operators · Part 4 of 4

From Demo to Production

Three iterations, two complete rebuilds, and what I learned about picking the right stack for the right stage of a project.

The previous post covered how I thought about the agent architecture: the orchestration structure, the search strategy that replaced event-anchored discovery, and the qualification rubric. This post is the infrastructure companion: how the system is deployed, what I rebuilt and why, and the trade-offs I navigated across the deployment options I tested.

In this post I walk through the permutations of deployment methods I tested, and share my reflections on which AI tools work best for which kinds of production requirements. There are a lot of tools in this space now — Vercel, Replit, Lovable, Claude Code, Streamlit, to name a few — and they don't all serve the same function. I wanted to try as many of these as practical and figure out which stack works best for which situation.

The Full Stack

Before getting into what I tried and discarded, here is what the final architecture looks like end to end:

The pipeline and the dashboard share a database and nothing else. There is no shared runtime state between them. The diagram below shows the full architecture from browser to database to pipeline.

[Architecture diagram: USER (browser / sales rep) → HTTPS → VERCEL frontend (Next.js dashboard: React, TypeScript, shadcn/ui) and backend (API routes, serverless, stateless, auto-deploy from GitHub) → reads from DATABASE: Supabase Postgres (UUID, JSONB, pgvector-ready, RLS) ← writes from PYTHON PIPELINE: Agent 1 Research (Exa findSimilar), Agent 2 Enrichment (Claude, Phases A-D), Agent 3 Outreach (Claude, per decision-maker)]
Full stack from browser to pipeline. The dashboard (top) and the Python pipeline (bottom) share only the Supabase database. The dashboard reads via Vercel API routes; the pipeline writes directly to Supabase at the end of each run.

Iteration 1: Streamlit + JSON

The right move for v0 is almost always the fastest path to something I can look at. Streamlit is a Python library that turns a script into an interactive web app with almost no additional code. It's Python-native, requires no frontend knowledge, and the gap from "script that produces output" to "app with a UI" is a matter of hours. I had a working demo within a day.

Streamlit works on a simple principle: every time the user interacts with the app (clicks a button, moves a slider, selects a dropdown), Streamlit reruns the entire Python script from top to bottom and re-renders the output. This is genuinely useful for certain kinds of work. If you're building a prototype of an ML model and you want to let stakeholders adjust a confidence threshold and immediately see how it affects the output, full-page reruns are a feature, not a bug. The same applies to data exploration dashboards, quick internal tools for running queries, or anything where the user's interaction is essentially "change a parameter, see the new result." A good example would be a tool that lets a non-technical team member upload a CSV and preview the output of a model at different settings, without the overhead of building a full frontend.
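The "change a parameter, see the new result" pattern can be sketched as a pure function plus a thin Streamlit wrapper. This is an illustrative sketch, not code from the project; the widget names in the comment are Streamlit's real API, but the data shape is assumed:

```python
# A pure "filter at threshold" function is all the app logic such a tool
# needs, because Streamlit reruns the whole script on every widget change.
def preview_at_threshold(rows: list[dict], threshold: float) -> list[dict]:
    """Return only the rows whose model score clears the threshold."""
    return [r for r in rows if r["score"] >= threshold]

# The Streamlit wrapper would be roughly:
#   import streamlit as st
#   threshold = st.slider("Confidence threshold", 0.0, 1.0, 0.5)
#   st.dataframe(preview_at_threshold(rows, threshold))
# Moving the slider reruns the script top to bottom and re-renders the table.
```

For this kind of stateless parameter-sweep tool, the full rerun is exactly the behaviour you want.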

The problem is that this model breaks down entirely for anything that needs to hold state across multiple interactions. A CRM-style interface, the kind I was building, involves actions like: updating a lead's status without losing the rest of the view, copying an outreach email while keeping the panel open, marking a record as "contacted" and having that change persist without triggering a full reload. Each of these requires independent, localised state updates. Streamlit's full-page rerun makes all of them feel janky because the UI can't hold partial state between interactions. The tool I was building was closer to a lightweight CRM than a data visualisation dashboard, and Streamlit is designed for the latter.

The second problem was JSON as a persistence layer. A JSON file is a natural first choice for storing results from a local Python script: zero setup, human-readable, trivial to write and parse, and perfectly adequate when you're the only user running the script on one machine. For a quick prototype where you run the script, inspect the output file, and move on, it works fine. The limitations surface when you need concurrent access, partial record updates without rewriting the whole file, or queries across multiple runs without loading everything into memory. All of these are solved by an actual database.
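A reconstructed sketch of the JSON-as-database pattern makes the limitation concrete (field names here are assumptions, not the project's actual schema): updating one record means reading and rewriting the entire file, which is fine for one local process and unsafe the moment two processes write at once.

```python
import json
from pathlib import Path

DB = Path("leads.json")

def update_status(lead_id: str, status: str) -> None:
    """Update one lead's status -- by rewriting the whole file."""
    leads = json.loads(DB.read_text())          # load every record
    for lead in leads:
        if lead["id"] == lead_id:
            lead["status"] = status
    DB.write_text(json.dumps(leads, indent=2))  # write every record back
```

Two concurrent callers of `update_status` can interleave their read-modify-write cycles and silently lose one of the updates; a database makes this a single atomic `UPDATE`.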

The pattern here: Streamlit and JSON are both good choices at the prototype stage, on a single machine, for a single user. They become the wrong choice the moment you need the app to run somewhere else, hold state across user interactions, or be updated by more than one process at a time.

Iteration 2: Choosing a Production Stack

Once I decided Streamlit wasn't the right foundation, I had to pick a new stack. There were several combinations I seriously considered. Each represented a different set of trade-offs across frontend, hosting, and persistence:

Option A: Lovable with JSON as DB. Lovable generates complete React web applications from natural language prompts, which I could host on Lovable's own platform. It's the fastest path from zero to a working, shareable UI. The problem is persistence: Lovable-hosted apps don't have a reliable filesystem for writing JSON between deployments, and I'd be building entirely within their managed ecosystem. For a quick prototype I never intended to extend, this would be fine. For something I want to own and iterate on, it creates long-term friction from the start.

Option B: Export Lovable code, host on Replit, JSON as DB. Replit does have a persistent filesystem, so JSON-as-persistence would technically work here. The Python pipeline could also live on Replit. The limitation is the same ecosystem concern: Replit's managed environment is excellent for getting something running fast, but the runtime, pricing tiers, and infrastructure choices are all Replit's to control. The more the project grows, the harder it becomes to detach. Both Lovable and Replit are good choices for rapid prototyping; neither is ideal as a long-term production foundation for something you want to customise and own.

Option C: Next.js (built with Claude Code) + Vercel + Supabase. This is what I went with. The table summarises how the three options compare:

| Option | Stack | Good For | Why I Didn't Use It |
| --- | --- | --- | --- |
| A | Lovable + Lovable hosting + JSON | Fastest prototype; no-code UI generation; shareable demo in hours | Ecosystem lock-in; no reliable persistence across deployments; harder to customise long-term |
| B | Lovable (exported) + Replit + JSON | Persistent filesystem; Python and frontend in one environment; fast to spin up | Still inside Replit's managed ecosystem; runtime and pricing controlled by Replit; harder to own and extend |
| C (chosen) | Next.js (Claude Code) + Vercel + Supabase | Full ownership; dynamic React UI; serverless deployment; structured DB with pgvector upside | Steeper setup than A or B; Vercel cold start on first request after idle (1-3s) |

On Vercel's Cold Start Problem

Vercel runs Next.js API routes as serverless functions. When a function hasn't been called recently, the first invocation takes longer (typically 1-3 seconds) because a new instance needs to spin up. For a dashboard opened first thing in the morning after being idle overnight, that delay is noticeable.

The mitigations I'd apply for a production deployment: use Vercel's Edge Runtime for lightweight read endpoints (no cold start), and have the most latency-sensitive queries hit the Supabase JS client directly rather than going through an API route. Neither requires a major architecture change; they just need to be planned for rather than discovered after launch.

The Search Problem

The search architecture went through its own evolution in parallel with the infrastructure decisions. Before getting into what changed and why, here's a quick reference for how the three agents and Agent 2's four phases are structured, since the discussion below refers to them directly:

| Agent | Job | Tool |
| --- | --- | --- |
| Agent 1 — Research | Discover 15-25 ICP-matching companies per run | Exa findSimilar |
| Agent 2 — Enrichment | Four sequential phases of enrichment per company (see below) | Claude + web_search |
| Agent 3 — Outreach | Draft personalised subject line and email per decision-maker | Claude |

| Agent 2 Phase | What It Does |
| --- | --- |
| Phase A | Company profile: revenue estimate, employee count, core business description |
| Phase B | Contact discovery: decision-makers, LinkedIn profiles, email addresses |
| Phase C | Industry engagement: trade show attendance and association presence |
| Phase D | Qualification scoring: 4-criterion ICP rubric, score 0-100, High/Medium/Low label |

API Choices: Exa vs. Claude's web_search

I use both Exa and Claude's built-in web_search tool, but for different jobs.

Exa's findSimilar is a semantic similarity search: given a set of seed URLs (companies I know are in the ICP), it returns other companies that are conceptually similar across the web. It's designed for discovery: surfacing companies I don't already know about, fast and consistently, returning structured company-level results.

Claude's web_search is better for enrichment: answering specific factual questions about a known company. "Does Company X exhibit at ISA Sign Expo?" is a question with a potentially verifiable answer that a targeted web search can find. "Which companies are similar to Company X?" is not a question web search handles well; that's where Exa's semantic model earns its place.

The pipeline uses Exa for Agent 1 (discovery) and Claude's web_search within Agent 2's Phase C (industry engagement enrichment). They're complementary, not competing.
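To make the discovery step concrete, here's a hedged sketch of fanning findSimilar out over several seed URLs and deduplicating the combined results by domain. The `exa_py` usage in the trailing comment follows the SDK's documented shape but is not the project's actual code, and the dedup helper is illustrative.

```python
from urllib.parse import urlparse

def dedupe_by_domain(result_batches: list[list[dict]]) -> list[dict]:
    """Merge result batches, keeping the highest-scoring hit per domain."""
    best: dict[str, dict] = {}
    for batch in result_batches:
        for r in batch:
            domain = urlparse(r["url"]).netloc.removeprefix("www.")
            if domain not in best or r["score"] > best[domain]["score"]:
                best[domain] = r
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)

# Against the live API, the batches would come from something like:
#   from exa_py import Exa
#   exa = Exa(api_key=os.environ["EXA_API_KEY"])
#   batches = [[{"url": r.url, "score": r.score}
#               for r in exa.find_similar(seed, num_results=10).results]
#              for seed in SEED_URLS]
```

Deduping by domain rather than exact URL matters because findSimilar can return several pages from the same company's site.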

Filter vs. Qualifier

The first version of the pipeline used qualification as a filter: enrich each discovered company, score it, and only pass leads above a minimum threshold to Agent 3. This felt clean. It also produced too few results — sometimes as few as 5 to 8 leads per run — because the confidence intervals on inferred data are wide enough that a strict filter cuts too aggressively.

The change: qualification became a ranking signal, not a gate. Every discovered lead gets enriched and scored; all of them reach the dashboard; the score determines position in the list. The user decides which threshold is meaningful for their context, not the pipeline.
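The difference between the two designs fits in a few lines. This is a schematic sketch with illustrative field names and threshold, not the pipeline's actual code:

```python
def filter_leads(leads: list[dict], threshold: int = 70) -> list[dict]:
    """v0 behaviour: the pipeline gates on a fixed score threshold."""
    return [l for l in leads if l["score"] >= threshold]

def rank_leads(leads: list[dict]) -> list[dict]:
    """v1 behaviour: every enriched lead reaches the dashboard, ordered by score."""
    return sorted(leads, key=lambda l: l["score"], reverse=True)
```

With `filter_leads`, a borderline 65-score lead is silently discarded; with `rank_leads`, it appears lower in the list and the user decides whether 65 is good enough for their context.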

Trade-off

Enriching all discovered leads rather than pre-filtering means more Claude API calls per run: roughly 3 to 4 times the compute compared to a strict filter. Pipeline runs that used to complete in 3 to 4 minutes now take 8 to 12 minutes. This is the efficiency problem I haven't fully solved, and it's the most consequential open issue in the current architecture.

For context, here's a rough comparison of search architecture patterns:

| Architecture | Reliability | Complexity | Typical Use |
| --- | --- | --- | --- |
| Basic RAG | Medium | Low | Chatbots, document Q&A, simple retrieval |
| Retrieval + extraction | High | Medium | Research pipelines, structured data extraction |
| Agentic search | Very high (with fallbacks) | High | Autonomous agents, multi-step discovery tasks |

This pipeline is in the third category. The agent doesn't just execute a search query: it decides what to search, reads the results, decides whether to search again or move on, and synthesises across multiple sources per company. That's why the runs take longer and why parallelisation matters more than it would for a simpler pipeline.

What I Haven't Solved Yet

Sequential execution

As a reminder, Agent 2's four phases are: Phase A (company profile), Phase B (contact discovery), Phase C (industry engagement), and Phase D (qualification scoring). Currently these run one at a time, sequentially, for each company in the list: all four phases complete for Company 1 before Company 2 begins. There's no technical reason Phases A and B can't run in parallel across all 20-plus companies simultaneously using Python's asyncio. Parallelising at the company level would reduce enrichment time from roughly 8 minutes to closer to 3. Supabase handles concurrent writes without issue. This is the highest-leverage engineering improvement for the next version and is purely an implementation gap, not an architectural one.
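A minimal sketch of what company-level parallelism would look like, assuming each phase is an async function (the function names and semaphore limit are assumptions, not the project's actual code). Phases still run in order within a company; companies no longer wait on each other, and the semaphore caps concurrent API calls.

```python
import asyncio

async def enrich_company(company: dict, phases, sem: asyncio.Semaphore) -> dict:
    """Run Phases A-D in order for one company, under a concurrency cap."""
    async with sem:
        for phase in phases:
            company = await phase(company)
        return company

async def enrich_all(companies: list[dict], phases, max_concurrent: int = 5) -> list[dict]:
    """Enrich every company concurrently, limited to max_concurrent at a time."""
    sem = asyncio.Semaphore(max_concurrent)
    return list(await asyncio.gather(
        *(enrich_company(c, phases, sem) for c in companies)
    ))
```

With a cap of 5, twenty companies finish in roughly four "waves" instead of twenty sequential passes.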

Caching and incremental search

Every pipeline run currently starts from scratch: Exa re-discovers the same ICP pool, Claude re-enriches the same companies, and deduplication happens at the Supabase write step. I'm spending most of each run's compute budget on leads already in the database.

The better approach: cache the discovered company pool and only pass net-new companies through enrichment. Move the deduplication check to before enrichment rather than after. This would make each run significantly faster and cheaper as the database grows, since I'd only be enriching companies I haven't seen before.
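The pre-enrichment dedup check is simple once the known domains are loaded up front. This is an illustrative sketch; in practice `known_domains` would come from a single Supabase query at the start of the run, and the column name is an assumption.

```python
from urllib.parse import urlparse

def net_new(discovered_urls: list[str], known_domains: set[str]) -> list[str]:
    """Keep only URLs whose domain is not already in the database."""
    fresh = []
    for url in discovered_urls:
        domain = urlparse(url).netloc.removeprefix("www.")
        if domain not in known_domains:
            fresh.append(url)
            known_domains.add(domain)  # also dedupes within this run
    return fresh
```

Only the URLs this returns would proceed to Agent 2, so per-run enrichment cost shrinks as the database grows.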

Qualify-first ordering

A cheaper first-pass qualification (lightweight scoring based on company name, URL, and a brief description, with no deep enrichment) could filter the discovery pool before spending API calls on full enrichment. A company that's clearly outside the ICP on basic signals doesn't need Phase B contact enrichment. Implementing this as a pre-enrichment filter would reduce cost without compromising list quality.
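A sketch of what that cheap first pass could look like. The keyword lists and weights below are purely illustrative, not the project's actual 4-criterion rubric; the point is that the score uses only discovery-time signals, with no API calls.

```python
# Hypothetical signal lists -- a real version would derive these from the ICP.
ICP_KEYWORDS = {"signage", "sign", "graphics", "wrap", "printing"}
DISQUALIFIERS = {"software", "agency", "consulting"}

def cheap_score(name: str, description: str) -> int:
    """Zero-cost first-pass score from name and description alone."""
    text = f"{name} {description}".lower()
    score = sum(20 for kw in ICP_KEYWORDS if kw in text)
    score -= sum(30 for kw in DISQUALIFIERS if kw in text)
    return max(0, min(100, score))

def prefilter(candidates: list[dict], floor: int = 20) -> list[dict]:
    """Only candidates clearing the floor proceed to full enrichment."""
    return [c for c in candidates if cheap_score(c["name"], c["description"]) >= floor]
```

A low floor keeps this conservative: it should only drop companies that are clearly outside the ICP on basic signals, leaving borderline cases for full enrichment.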

Pipeline scheduling

The pipeline currently runs on demand. For production, I want it running on a nightly schedule so the dashboard shows fresh results each morning without a manual trigger. A GitHub Actions cron job is the right infrastructure for this — stateless, free at this scale, and easy to configure. I used the same pattern in the daily digest project covered in Parts 1 and 2 of this series.
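A nightly trigger in GitHub Actions is a few lines of workflow config. This is a hedged sketch: the file paths, job name, and secret names are assumptions, not the project's actual workflow.

```yaml
# .github/workflows/nightly.yml (illustrative)
name: nightly-pipeline
on:
  schedule:
    - cron: "0 6 * * *"   # 06:00 UTC daily
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python pipeline.py
        env:
          SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

The runner is stateless, which fits the architecture: all state lives in Supabase, so the job needs nothing but secrets.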

The Architecture I'm Building Toward

The current architecture is a pull model: run the pipeline, get results. The v2 model is a push model: continuously populate Supabase with a growing candidate pool, and let users query it in real time.

In practice, this means running a nightly wide-net discovery job that adds new ICP-matching companies to a candidates table without enriching them immediately. Enrichment runs on demand when a user opens a lead, or on a schedule for candidates that have been in the pool long enough. The dashboard becomes a query interface over a pre-built index rather than the output of a synchronous pipeline run.

This is how production sales intelligence tools work. ZoomInfo and Apollo don't run a search when you click "find leads": they query a pre-built database that has been continuously enriched in the background. I'm building a smaller, more targeted version of that model for a specific ICP.

The enabling technology for the semantic query layer is pgvector, Postgres's vector extension, which Supabase supports natively. This is one of the main reasons I chose Supabase: the upgrade path from structured relational queries to semantic search doesn't require a new infrastructure layer. A "save search intent" feature would let a user define ICP criteria once and receive net-new matching candidates every morning without triggering a manual run.

V2 Vision

Nightly wide-net discovery feeds a growing Supabase candidate pool. Users query it via structured filters plus semantic search (pgvector). Saved search intent becomes a recurring enrichment job. The shift is from "generate a list on demand" to "surface relevant signals continuously."

What This Taught Me

The biggest practical lesson from this project is about the relationship between tool choice and stage of work. Streamlit, Lovable, and Replit are all genuinely good tools; they're just optimised for an earlier stage than the one I was building toward. Streamlit is right for fast iteration when I'm the only user. Lovable and Replit are right when I need a shareable prototype quickly and don't yet need to own the infrastructure. The moment I decided this needed to be production-grade, ownable, and extensible, the stack choice became more constrained: Next.js on Vercel with Supabase is the right combination for those requirements, even though it takes longer to set up.

Three decisions that defined this project in retrospect:

Stateless infrastructure forced persistent storage discipline. Vercel's serverless model felt like a constraint when I first hit it. In practice it was a guardrail: it made the right architecture (Supabase as the integration layer between pipeline and dashboard) the obvious choice rather than an optional upgrade.

Switching from filter to qualifier shifted the design philosophy. What started as a compute problem (too few leads, too many API calls) resolved into a product principle: the pipeline's job in v1 is to surface signals with transparent reasoning, not to make decisions on behalf of the user.

Naming the unsolved problems is part of the architecture. Sequential execution, caching, qualify-first ordering: these aren't failures of the current build. They're the roadmap for the next one. Understanding where a system is inefficient and why is as important as understanding where it works.

The code is on GitHub. If you're building something similar — lead generation, research automation, or any pipeline where the search space is the hard problem — I'm happy to compare notes.

AI Agents Lead Generation Solutions Architecture Next.js Supabase Vercel Exa API

About the author: I'm Linus, a Singaporean Product Manager currently based in San Francisco. I write about building practical AI systems from the perspective of someone who's learning by doing. This is part of my ongoing series, Applied AI Thinking for Operators.

Email: seah.linus@gmail.com
GitHub: linusseah

References:
· Part 3 — When Your User Isn't You (design thinking companion to this post)
· Exa API — findSimilar and neural search documentation
· Supabase documentation — Postgres, RLS, and pgvector
· Vercel — serverless functions and Edge Runtime
· Next.js documentation — App Router and server components
· Anthropic Claude API documentation
· Live dashboard — instalily-lead-gen.vercel.app
· instalily-lead-gen — public GitHub repository