Articles
These days I spend most of my time thinking about the intersection of business strategy, product thinking, and AI. I'm particularly interested in how organizations can adopt AI tools practically. Beyond the technical side, I'm curious about what AI means for the world at large: how it will transform work, how future generations will learn, whether it narrows or widens inequality, and what it ultimately says about the value of human judgment and our responsibility in guiding its development.
I learn by building. My current focus is on understanding agentic AI systems: what they can actually do, where they break down, and how to evaluate the trade-offs between complexity and value. I write about what I learn so that others navigating the same questions can benefit from the experiments I've already run.
-
The technology is the easy part. Here is what actually determines whether enterprise AI adoption succeeds or fails. From organizational readiness to workflow redesign, unpacking the four fronts that matter most.
-
Two questions keep coming up in every conversation I have about AI and work: what should I study, and what skills should I be building? Here is what current research actually says, and what to do about it. Includes full research paper on AI's labor market impact.
-
Reflections from working with the research team at Genmo: what open world models actually are, how they differ from physical AI and VLAs, why the LLM playbook doesn't transfer, and what needs to be true to push this frontier forward.
-
Scaling from a single personal agent to a multi-agent production pipeline, and what that progression forced me to rethink. From ICP discovery with Exa's findSimilar to 3-tier engagement modeling to qualification as explainability, not filtering.
-
Three iterations, two complete rebuilds, and what I learned about picking the right stack for the right stage of a project. Why Streamlit → Next.js, JSON → Supabase, and what filter-vs-qualifier taught me about product philosophy.
-
The gap between benchmark scores and real-world performance is wider than the leaderboards suggest. Understanding the three layers of model evaluation (general capability, safety, and task-specific performance), why benchmarks degrade over time, and the case for building your own evaluation framework.
-
Using a language model to score another language model's output sounds circular. An eight-dimension custom rubric, Pearson calibration to align the judge with my taste, what the scores revealed about source diversity issues, and why evaluation costs more than generation.
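The calibration idea can be sketched in a few lines. This is a minimal illustration, not the actual rubric: the score arrays below are made-up stand-ins for my own ratings and the judge's on the same outputs.

```python
import numpy as np

# Hypothetical paired scores (1-10) on the same eight outputs:
# my own ratings vs. the LLM judge's. Values are illustrative only.
human = np.array([7, 4, 8, 6, 9, 3, 5, 8])
judge = np.array([8, 5, 9, 6, 9, 4, 6, 9])

# Pearson r measures how well the judge tracks my taste;
# a high r means the judge ranks outputs roughly the way I do.
r = np.corrcoef(human, judge)[0, 1]

# A simple linear fit then maps the judge's scale onto mine,
# correcting systematic over- or under-scoring.
slope, intercept = np.polyfit(judge, human, 1)
calibrated = slope * judge + intercept
print(f"Pearson r = {r:.2f}")
```

The point of the linear remap is that a judge can be well-correlated yet consistently generous; calibration fixes the offset without discarding the ranking signal.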
-
I built the same thing three times — each with a different level of "agency." Here's what I learned about what that word actually means, why automatic doesn't equal autonomous, and how to think about the spectrum of agency in AI systems.
-
The implementation details: Claude Agent SDK, Exa Search, free vs. paid models, every bug I hit, cost analysis, and the fallback pattern that makes agentic systems production-ready. Full code walkthroughs included.
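The fallback pattern itself is model-agnostic. A minimal sketch, assuming hypothetical zero-argument wrappers around a paid and a free model rather than the actual Claude Agent SDK calls:

```python
import time

def with_fallback(primary, fallback, retries=2, delay=0.0):
    """Try the primary model call with retries; on repeated
    failure, fall back to the cheaper/free model.

    `primary` and `fallback` are zero-arg callables wrapping
    model requests (hypothetical names, not a real SDK API).
    """
    for attempt in range(retries):
        try:
            return primary()
        except Exception:
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    return fallback()

def flaky_paid_model():
    raise RuntimeError("rate limited")  # simulate a provider outage

def free_model():
    return "fallback answer"

result = with_fallback(flaky_paid_model, free_model)
print(result)
```

Passing the calls as thunks keeps the wrapper independent of any particular provider, which is what makes the pattern reusable across agents in a pipeline.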
-
A Flask web application that analyses textual input to classify emotions at the sentence level, combining a Logistic Regression model (trained on ~500k tagged observations) with VADER valence scoring and TextBlob phrase extraction. Deployed on Heroku.
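A minimal sketch of the Logistic Regression half of that pipeline, with a toy corpus standing in for the ~500k tagged observations (VADER scoring and TextBlob extraction omitted; labels and texts are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus; the real model trains on ~500k rows.
texts = [
    "I am so happy today", "This is wonderful news",
    "I feel terrible about this", "That was a sad ending",
    "I'm furious right now", "This makes me so angry",
]
labels = ["joy", "joy", "sadness", "sadness", "anger", "anger"]

# TF-IDF features feeding a Logistic Regression classifier,
# applied per sentence at inference time.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

pred = model.predict(["I feel angry"])[0]
print(pred)
```

In the deployed app, this classifier's label would be combined with VADER's valence score per sentence, giving both a discrete emotion and a continuous intensity.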
-
Web scraping posts from two subreddits via the Pushshift API and applying NLP and classification modelling (Logistic Regression, Naive Bayes, Random Forest) to accurately distinguish between communities.
-
Using the Ames housing dataset to estimate sale prices and identify features that predict abnormal sales (foreclosures). Covers the full ML project framework: EDA, feature engineering, Lasso/Ridge regression, and model evaluation.
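Lasso's role as a feature selector, not just a regularizer, can be shown on synthetic data standing in for the Ames features (all numbers illustrative; the real dataset has ~80 columns):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Three synthetic predictors standing in for e.g. living area,
# overall quality, and an irrelevant column.
X = rng.normal(size=(200, 3))
true_coefs = np.array([40_000.0, 25_000.0, 0.0])  # third feature has no effect
y = X @ true_coefs + rng.normal(scale=5_000.0, size=200)

# The L1 penalty shrinks weak predictors to exactly zero,
# which is why Lasso doubles as feature selection on wide data.
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=1_000.0).fit(X_scaled, y)
print(lasso.coef_)
```

Ridge, by contrast, shrinks all coefficients toward zero without eliminating any, so comparing the two penalties is itself informative about which features carry signal.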