News from March 2026
The video explains why large language models frequently hallucinate—making factual errors, fabricating entities, and ignoring provided context—and offers practical ways to reduce these failures by supplying sources and using search tools.
The video explains TurboQuant, a technique that dramatically shrinks KV-cache memory to enable 4–8x larger context windows on consumer hardware with little to no accuracy loss, and why this unlocks more capable local AI workflows.
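As a rough illustration of why KV-cache quantization matters, here is back-of-envelope arithmetic with illustrative model dimensions (an assumed 8B-class model shape, not TurboQuant's actual algorithm):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value):
    # One K and one V tensor per layer, each seq_len x kv_heads x head_dim
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative 8B-class model with grouped-query attention (assumed shape)
layers, kv_heads, head_dim = 32, 8, 128
ctx = 32_768

fp16 = kv_cache_bytes(layers, kv_heads, head_dim, ctx, 2)    # 16-bit values
int4 = kv_cache_bytes(layers, kv_heads, head_dim, ctx, 0.5)  # 4-bit values

print(f"fp16 KV cache:  {fp16 / 2**30:.1f} GiB")  # 4.0 GiB
print(f"4-bit KV cache: {int4 / 2**30:.1f} GiB")  # 1.0 GiB
# The same memory budget fits a ~4x longer context at 4-bit precision.
```

With these assumed dimensions, quantizing the cache from 16-bit to 4-bit frees enough memory for roughly a 4x longer context, which is where headline ranges like "4–8x" come from once other optimizations stack on top.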
A practical walkthrough of running local AI on a 128 GB unified-memory mini PC, covering hardware choices, VRAM needs, quantization, Linux setup, and real-world results for chat and coding workflows.
The video benchmarks nine AI code review bots on a Convex + React stack across realistic PR tests—covering indexing, auth, performance, schema design, and optimistic concurrency control (OCC)—to reveal which tools catch real issues versus noisy false positives.
A quick walkthrough showing how to generate images with TanStack AI using OpenRouter, including the prompt, adapter setup, and rendering the returned images.
A hands-on first look at the Tiiny AI Pocket Lab—a portable 80GB-RAM device that runs large language and image models locally, showcasing setup, model store, agent integrations, image generation, coding with GLM 4.7 Flash, and real-world speed tests.
Overview of Claude Code’s updated Skill Creator, showing how to build, evaluate, optimize, and reliably trigger skills, capped with a live end-to-end skill build and report demo.
Using concrete cost breakdowns and licensing caveats, the video demonstrates that self‑hosting large and even smaller open‑weight LLMs is far more expensive and riskier than using today's heavily subsidized AI APIs, which are 10–30× cheaper for comparable throughput.
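The shape of that cost gap can be sketched with simple per-token arithmetic; every number below is a hypothetical placeholder chosen for illustration, not a figure from the video:

```python
# Hypothetical self-hosting numbers (placeholders, not real quotes)
gpu_cost_per_hour = 2.50   # assumed rental rate for one inference GPU
tokens_per_second = 60     # assumed sustained generation throughput
utilization = 0.7          # real workloads rarely keep the GPU saturated

effective_tokens_per_hour = tokens_per_second * 3600 * utilization
self_hosted_per_m = gpu_cost_per_hour / effective_tokens_per_hour * 1_000_000

api_per_m = 0.60           # assumed subsidized API price per 1M output tokens

print(f"self-hosted: ${self_hosted_per_m:.2f} per 1M tokens")
print(f"API:         ${api_per_m:.2f} per 1M tokens")
print(f"ratio:       {self_hosted_per_m / api_per_m:.0f}x")
```

Under these assumptions the self-hosted cost lands in the 10–30× range the video cites; the ratio is driven almost entirely by utilization, since an idle GPU bills the same as a busy one while an API bills only per token.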
A concise walkthrough of seven phases—idea, research, prototype, PRD, implementation planning, execution, and QA—to reliably ship software with AI coding agents.
A practical guide to building a local AI PC that prioritizes GPU VRAM, with clear budget tiers, model quantization tips, and tooling choices like Ollama vs LM Studio.
A practical walkthrough of six core LLM generation controls—temperature, top‑p, top‑k, stop sequences, frequency penalty, and presence penalty—showing how to tune one model for consistent agents, creative writing, and precise code docs.
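A minimal sketch of tuning those six controls per task, expressed as an OpenAI-style request payload (the parameter names follow the common chat-completions convention; `top_k` is only honored by some local servers such as llama.cpp or Ollama, and all specific values here are illustrative assumptions):

```python
def gen_params(task: str) -> dict:
    """Return sampling settings tuned per task (illustrative values)."""
    if task == "agent":       # consistent tool-calling: near-deterministic
        return {"temperature": 0.1, "top_p": 1.0,
                "frequency_penalty": 0.0, "presence_penalty": 0.0}
    if task == "creative":    # varied prose: hotter sampling, discourage repeats
        return {"temperature": 1.0, "top_p": 0.95, "top_k": 40,
                "frequency_penalty": 0.5, "presence_penalty": 0.6}
    if task == "code_docs":   # precise docstrings: cold, stop at the delimiter
        return {"temperature": 0.0, "top_p": 1.0, "stop": ['"""'],
                "frequency_penalty": 0.0, "presence_penalty": 0.0}
    raise ValueError(f"unknown task: {task}")

# Merge the per-task settings into a chat request body
payload = {"model": "local-model",
           "messages": [{"role": "user", "content": "Hi"}],
           **gen_params("agent")}
```

Keeping the settings in one function like this makes the "one model, three behaviors" idea concrete: the model never changes, only the sampling knobs attached to each request.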
Theo argues that many AI-coded dev tools like Cursor and Claude Code feel inconsistent and sloppy because they were built too early with weaker models, and he proposes strict code quality, aggressive refactoring, and even maintaining a prototype “slop” codebase alongside a clean production one to fix it.