August 2025

August 31, 2025 • Ray Fernando

A live stress test of Claude Sonnet 4’s 1M-token context in Cursor: the session builds a transcript editor, runs into crashes on incomplete phases, and reveals how a single MCP tool call can silently consume ~800k tokens and tank performance.

August 30, 2025 • Ray Fernando

Demo of OpenAI’s new real-time model showing natural, sassy voice interactions, unified speech processing, and developer features like image input, SIP calling, MCP tool integration, and async function calling.

Tom Ballinger demonstrates how Convex’s end-to-end types and an MCP-powered chat app enable safer, more predictable agent workflows while highlighting risks like prompt injection, token flow, and permission design.

Web Dev Cody builds a YouTube‑like full‑stack app using agentic coding with Claude and Cursor, integrating Cloudinary for uploads, chapters, transcripts, previews, and adding features like profiles, tags, subscriptions, comments, likes, notifications, and related videos while demonstrating an AI‑assisted workflow.

August 29, 2025 • GosuCoder

A hands-on review of GPT-5 with Codex IDE, Remote Agents, and CLI finds faster performance, high-quality code generation, and seamless local–cloud workflows, while noting missing features and UX annoyances like mandatory file approvals and noisy CLI output.

The creator tests whether local LLMs can handle daily coding by comparing GLM 4.5 Air, Qwen 3 Coder, GPT OSS 120B, and others on a Framework Desktop and RTX 5090, concluding that a hybrid workflow—small fast models for grunt work and larger slow models for planning without agent loops—works best.

August 25, 2025 • Alex Ziskind

Hands-on with Framework Desktop boards using AMD Ryzen AI Max+ 395 to run large local AI models quietly, benchmarking memory modes, Vulkan vs ROCm performance, and comparing against Apple M4/M4 Max and GMKTEC Evo X2.

August 24, 2025 • Ray Fernando

A practical demo showing how Claude's specialized sub-agents are created and orchestrated to refactor a real app's UI, run iterative reviews, and automate fixes across desktop and mobile with separate context windows.

August 24, 2025 • GosuCoder • 16m 48s

A hands-on review of DeepSeek v3.1 shows major gains in structured tool calling and coding workflows (especially via Claude Code), faster agentic capabilities and better benchmarks, but with slow throughput and occasional issues like unexpected Chinese strings in code.

August 22, 2025 • Theo - t3.gg • 46m 44s

Theo explains why GPT‑5’s rocky launch felt underwhelming—arguing the model is strong but hamstrung by bad routing and UX layers like ChatGPT and Cursor—and compares its real capabilities against rivals in coding and long‑running, tool‑using tasks.

August 21, 2025 • Debbie O'Brien • 6m 53s

Demo of the Playwright MCP browser extension showing how to connect to an existing logged‑in Chrome/Edge profile so an agent can run tests against authenticated sessions and even perform profile changes without sharing credentials.

August 18, 2025 • Grafikart.fr • 30m 34s

A French-language walkthrough testing GPT‑5 in three real-world dev tasks—Laravel CRUD with guidelines, a React word-search grid, and a Lacuna board game prototype—highlighting strengths, pitfalls, and agent workflows in JetBrains.

August 18, 2025 • GosuCoder • 24m 1s

The creator benchmarks Qwen 3 Coder 30B against DevStral Small and GPT OSS 20B, showing strong tool-calling reliability, high tokens-per-second, and practical coding demos on an RTX 5090.

August 15, 2025 • Web Dev Cody • 24m 17s

Web Dev Cody compares GPT-5 and Claude Opus for agentic coding by implementing an early-access feature flag and landing page, discussing speed, reliability, and prompting strategies.

August 14, 2025 • Theo - t3.gg • 31m 27s

Theo explains how his early positive experience with GPT-5 differed from the public rollout, detailing launch missteps, degraded performance in tools, and clarifying his unpaid involvement.

August 14, 2025 • GosuCoder • 14m 57s

A hands-on review of GLM 4.5 for coding shows it’s fast, capable, and great for small, UI-focused tasks, but constrained by its limited context window and potential costs on longer chains.

August 13, 2025 • Web Dev Cody • 5m 28s

A quick demo shows how to use Claude Code hooks to trigger a custom, AI‑generated voice notification when an agent run finishes, using OpenAI for text and ElevenLabs for TTS, plus a brief tour of hook events and matchers.

August 12, 2025 • GosuCoder • 19m 40s

After burning ~50M tokens testing GPT‑5, the creator shows that using low reasoning and low verbosity dramatically speeds up coding workflows compared to medium reasoning, while contrasting GPT‑5’s strengths (following precise specs, debugging) and weaknesses (ideation, vague refactors) against Sonnet and others.

August 11, 2025 • Theo - t3.gg • 20m 25s

A critical take on Anthropic’s practices around access restrictions, open source, pricing, and developer relations, arguing their edge is fading amid new competition.

August 10, 2025 • Theo - t3.gg • 43m 19s

Theo argues that code was never the bottleneck and shows how AI should be used to rapidly prototype, iterate, and validate ideas to improve team understanding and product outcomes rather than to churn out production code.

August 9, 2025 • Convex • 27m 55s

A Convex engineer compares GPT-5 and Claude Sonnet by building a multiplayer Tic-Tac-Toe app in TypeScript with a Convex backend, revealing strengths in code generation, tool-calling quirks, UI differences, and mixed results when adding authentication.

August 8, 2025 • GosuCoder • 18m 29s

A hands-on benchmark of GPT‑5 across 10 coding assistants shows it’s a strong, affordable coding model that scores in the 25k range but falls short of the top spot, with notable quirks in long agent loops, environment handling, and occasional tool-call oddities.

A hands-on review of OpenAI’s 120B and 20B open-weight MoE models finds great speed and decent chat reasoning but inconsistent, unreliable performance for agentic coding and tool use across providers and temperatures.

August 6, 2025 • Theo - t3.gg • 30m 35s

Theo breaks down OpenAI’s newly released open‑weights 120B and 20B models, testing local and cloud performance, tooling reliability, benchmarks, and practical trade‑offs for developers.

How to run very large LLMs on AMD Strix Halo systems under Linux using unified memory, with practical setup steps, Vulkan/ROCm trade‑offs, and benchmarks on an HP Z2 Mini G1a.

August 5, 2025 • Alex so yes • 39m 4s

A French-language masterclass on installing, configuring, and using Claude Code in VS Code and the CLI, covering commands, memory, MCP, parallel agents, and safe YOLO mode to speed up real-world dev workflows.

August 4, 2025 • Convex • 20m 26s

A talk showing how to build real-time, code-first AI agent workflows on Convex with TypeScript, covering threads, message streaming, context fetching, RAG, rate limiting, and durable workflows that can pause, resume, and scale.

A hands-on review tests Cerebras’ Qwen 3 Coder subscription, finding solid tool-calling and minor FP8 quality loss but real-world throughput far below the advertised 2,000 tokens/s and daily token limits that shape usability.

August 2, 2025 • James Q Quick • 1h 59s

A live session where Warp's agentic terminal is used to scaffold a Chrome extension and a TypeScript/Express backend, showcasing parallel agent workflows, inline diffs, task tracking, and real-time tone-translation features.

August 1, 2025 • Grafikart.fr • 35m 37s

A practical walkthrough that explains the Model Context Protocol (MCP) and shows how to implement a server (HTTP + JSON-RPC) with resources, tools, and prompts, then test it with an inspector, VS Code Copilot, and Gemini.