News from August 2025

Claude's 1M Context in Cursor: Dream to Disaster

Video

August 31, 2025 • Ray Fernando

A live stress test of Claude Sonnet 4’s 1M-token context in Cursor: the session builds a transcript editor, crashes on incomplete phases, and reveals how a single MCP tool call can silently consume ~800k tokens and tank performance.

OpenAI Accidentally Made AI Too Human

Video

August 30, 2025 • Ray Fernando

Demo of OpenAI’s new real-time model showing natural, sassy voice interactions, unified speech processing, and developer features like image input, SIP calling, MCP tool integration, and async function calling.

Build safer agent workflows with Convex types and MCP

Video

August 29, 2025 • Convex

Tom Ballinger demonstrates how Convex’s end-to-end types and an MCP-powered chat app enable safer, more predictable agent workflows while highlighting risks like prompt injection, token flow, and permission design.

I Tried Building a Full Stack App with Agentic Coding (Claude + Cursor)

Video

August 29, 2025 • Web Dev Cody

Web Dev Cody builds a YouTube‑like full‑stack app using agentic coding with Claude and Cursor, integrating Cloudinary for uploads, chapters, transcripts, and previews, then adding profiles, tags, subscriptions, comments, likes, notifications, and related videos, all while demonstrating an AI‑assisted workflow.

Is GPT-5 and Codex worth using now?

Video

August 29, 2025 • GosuCoder

A hands-on review of GPT-5 with Codex IDE, Remote Agents, and CLI finds faster performance, high-quality code generation, and seamless local–cloud workflows, while noting missing features and UX annoyances like mandatory file approvals and noisy CLI output.

Can a Local LLM REALLY be your daily coder? Framework Desktop with GLM 4.5 Air and Qwen 3 Coder

Video

August 27, 2025 • GosuCoder

The creator tests whether local LLMs can handle daily coding by comparing GLM 4.5 Air, Qwen 3 Coder, GPT OSS 120B, and others on a Framework Desktop and RTX 5090, concluding that a hybrid workflow—small fast models for grunt work and larger slow models for planning without agent loops—works best.

Near silent LLM Monster... NVIDIA, take notes

Video

August 25, 2025 • Alex Ziskind

Hands-on with Framework Desktop boards using AMD Ryzen AI Max+ 395 to run large local AI models quietly, benchmarking memory modes, Vulkan vs ROCm performance, and comparing against Apple M4/M4 Max and GMKtec EVO-X2.

Claude Sub-Agents Workflow (Full Demo)

Video

August 24, 2025 • Ray Fernando

A practical demo showing how Claude's specialized sub-agents are created and orchestrated to refactor a real app's UI, run iterative reviews, and automate fixes across desktop and mobile with separate context windows.

DeepSeek v3.1 Update is Better than I Expected... BUT?

Video

August 24, 2025 • GosuCoder • 16m 48s

A hands-on review of DeepSeek v3.1 shows major gains in structured tool calling and coding workflows (especially via Claude Code), faster agentic capabilities and better benchmarks, but with slow throughput and occasional issues like unexpected Chinese strings in code.

The current state of gpt-5

Video

August 22, 2025 • Theo - t3.gg • 46m 44s

Theo explains why GPT‑5’s rocky launch felt underwhelming—arguing the model is strong but hamstrung by bad routing and UX layers like ChatGPT and Cursor—and compares its real capabilities against rivals in coding and long‑running, tool‑using tasks.

Playwright MCP + Chrome Extension: Testing with Logged-In Profiles

Video

August 21, 2025 • Debbie O'Brien • 6m 53s

Demo of the Playwright MCP browser extension showing how to connect to an existing logged‑in Chrome/Edge profile so an agent can run tests against authenticated sessions and even perform profile changes without sharing credentials.

Testing ChatGPT5 with Junie (JetBrains’ agent)

Video

August 18, 2025 • Grafikart.fr • 30m 34s

A French-language walkthrough testing GPT‑5 in three real-world dev tasks—Laravel CRUD with guidelines, a React word-search grid, and a Lacuna board game prototype—highlighting strengths, pitfalls, and agent workflows in JetBrains.

Are Local LLMs finally good at coding now... Qwen 3 Coder 30B

Video

August 18, 2025 • GosuCoder • 24m 1s

The creator benchmarks Qwen 3 Coder 30B against DevStral Small and GPT OSS 20B, showing strong tool-calling reliability, high tokens-per-second, and practical coding demos on an RTX 5090.

This model might be my new favorite (for agentic coding)

Video

August 15, 2025 • Web Dev Cody • 24m 17s

Web Dev Cody compares GPT-5 and Claude Opus for agentic coding by implementing an early-access feature flag and landing page, discussing speed, reliability, and prompting strategies.

I was wrong about GPT-5

Video

August 14, 2025 • Theo - t3.gg • 31m 27s

Theo explains how his early positive experience with GPT-5 differed from the public rollout, detailing launch missteps, degraded performance in tools, and clarifying his unpaid involvement.

This open weight model is a coding beast.. GLM 4.5

Video

August 14, 2025 • GosuCoder • 14m 57s

A hands-on review of GLM 4.5 for coding shows it’s fast, capable, and great for small, UI-focused tasks, but constrained by its limited context window and potential costs on longer chains.

Claude Code hooks are Officially Awesome

Video

August 13, 2025 • Web Dev Cody • 5m 28s

A quick demo shows how to use Claude Code hooks to trigger a custom, AI‑generated voice notification when an agent run finishes, using OpenAI for text and ElevenLabs for TTS, plus a brief tour of hook events and matchers.

GPT 5 is confusing.... it took me too long to figure this thing out

Video

August 12, 2025 • GosuCoder • 19m 40s

After burning ~50M tokens testing GPT‑5, the creator shows that using low reasoning and low verbosity dramatically speeds up coding workflows compared to medium reasoning, while contrasting GPT‑5’s strengths (following precise specs, debugging) and weaknesses (ideation, vague refactors) against Sonnet and others.

Anthropic has weird vibes

Video

August 11, 2025 • Theo - t3.gg • 20m 25s

A critical take on Anthropic’s practices around access restrictions, open source, pricing, and developer relations, arguing their edge is fading amid new competition.

You're using AI coding tools wrong

Video

August 10, 2025 • Theo - t3.gg • 43m 19s

Theo argues that code was never the bottleneck and shows how AI should be used to rapidly prototype, iterate, and validate ideas to improve team understanding and product outcomes rather than to churn out production code.

Does GPT-5 dethrone Claude Sonnet?

Video

August 9, 2025 • Convex • 27m 55s

A Convex engineer compares GPT-5 and Claude Sonnet by building a multiplayer Tic-Tac-Toe app in TypeScript with a Convex backend, revealing strengths in code generation, tool-calling quirks, UI differences, and mixed results when adding authentication.

The Results are in for GPT 5...

Video

August 8, 2025 • GosuCoder • 18m 29s

A hands-on benchmark of GPT‑5 across 10 coding assistants shows it’s a strong, affordable coding model that scores in the 25k range but falls short of the top spot, with notable quirks in long agent loops, environment handling, and occasional tool-call oddities.

OpenAI FINALLY releases open weight models, but can they actually code?

Video

August 7, 2025 • GosuCoder • 14m 31s

A hands-on review of OpenAI’s 120B and 20B open-weight MoE models finds great speed and decent chat reasoning but inconsistent, unreliable performance for agentic coding and tool use across providers and temperatures.

OpenAI’s open source models are finally here

Video

August 6, 2025 • Theo - t3.gg • 30m 35s

Theo breaks down OpenAI’s newly released open‑weights 120B and 20B models, testing local and cloud performance, tooling reliability, benchmarks, and practical trade‑offs for developers.

GLM 4.5-Air-106B and Qwen3-235B on AMD "Strix Halo" AI Ryzen MAX+ 395 (HP Z2 G1a Mini Workstation)

Video

August 5, 2025 • Donato Capitella

How to run very large LLMs on AMD Strix Halo systems under Linux using unified memory, with practical setup steps, Vulkan/ROCm trade‑offs, and benchmarks on an HP Z2 Mini G1a.

Claude Code: the guide 95% of devs should watch

Video

August 5, 2025 • Alex so yes • 39m 4s

A French masterclass shows how to install, configure, and use Claude Code in VS Code and the CLI—covering commands, memory, MCP, parallel agents, and safe YOLO mode to speed up real-world dev workflows.

How to Build Realtime AI Agents with Convex Components

Video

August 4, 2025 • Convex • 20m 26s

A talk showing how to build real-time, code-first AI agent workflows on Convex with TypeScript, covering threads, message streaming, context fetching, RAG, rate limiting, and durable workflows that can pause, resume, and scale.

Qwen 3 Coder at 2000 tokens per second and a reasonable price, too good to be true?

Video

August 4, 2025 • GosuCoder • 13m 19s

A hands-on review tests Cerebras’ Qwen 3 Coder subscription, finding solid tool-calling and minor FP8 quality loss but real-world throughput far below the advertised 2,000 tokens/s and daily token limits that shape usability.

Live Coding with Warp

Video

August 2, 2025 • James Q Quick • 1h 59s

A live session where Warp's agentic terminal is used to scaffold a Chrome extension and a TypeScript/Express backend, showcasing parallel agent workflows, inline diffs, task tracking, and real-time tone-translation features.

How to create an MCP server?

Video

August 1, 2025 • Grafikart.fr • 35m 37s

A practical walkthrough that explains MCP (Model Context Protocol) and shows how to implement a server (HTTP + JSON-RPC) with resources, tools, and prompts, then test it with an inspector, VS Code Copilot, and Gemini.

Jacky THIERRY