Gerson

What's New in Claude 4.6 and the Agent SDK

Claude 4.6 brings a 1M-token context window on Opus, faster Haiku, and a proper Agent SDK that handles the loop, tool execution, and memory. A hands-on look at what actually changed and patterns for building production agents.

The Claude 4.6 family dropped in early 2026 with three models worth rebuilding around: Opus 4.6 with an optional 1M-token context window, Sonnet 4.6 as the balanced daily driver, and Haiku 4.5 for fast, cheap calls. The bigger shift is the Claude Agent SDK, which Anthropic released alongside — a proper framework for the things every team has been reinventing.

This is a focused tour of what changed and how the Agent SDK's patterns land in real code. I have been running Sonnet 4.6 as the primary model on a production agent for a few weeks, and the day-two ergonomics are a real step up from hand-rolled loops.

The Three Models at a Glance

  • Opus 4.6 (1M) — frontier reasoning and long-context work. The 1M variant (model ID claude-opus-4-6[1m]) is gated behind a feature flag and priced higher per token; the standard 200K context version is the default. Good for whole-codebase analysis, long transcripts, and document ingestion.
  • Sonnet 4.6 — the pragmatic choice for most production work. Matches or beats Opus 4.5 on most evals at a fraction of the cost.
  • Haiku 4.5 — surprisingly capable. I use it for classification, extraction, and tool-routing steps inside a larger agent where Sonnet is overkill.

Prompt caching now has a 5-minute TTL by default and extended cache lookups are more aggressive at matching partial prefixes, which matters a lot for agent loops where the system prompt and tool definitions repeat across every turn.
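To make the caching concrete, here is a sketch of where the cache breakpoint goes in a Messages API request body. Everything up to and including the block marked with cache_control is eligible for caching, so repeated agent turns reuse the system prompt instead of re-ingesting it. The prompt text is illustrative.

```typescript
// Request body sketch: mark the system prompt as a cache breakpoint so
// repeated agent turns within the TTL hit the prompt cache.
const request = {
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: 'You are a code-review assistant. Be concise.',
      cache_control: { type: 'ephemeral' }, // cache breakpoint
    },
  ],
  messages: [{ role: 'user', content: 'Review the auth flow.' }],
};
```

Pass this to `client.messages.create`; subsequent calls inside the TTL that share the same prefix read from cache at a reduced per-token rate.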

The Agent SDK: What It Replaces

If you have built agents with the raw Messages API, you know the drill: a while loop, parse stop_reason: "tool_use", execute the tool, append the result, continue. It works, but it ends up being the same 80 lines in every codebase, and getting cancellation, retries, and streaming right is finicky.
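For comparison, here is the shape of that hand-rolled loop, condensed. The API call and the tool dispatcher are injected as functions so the control flow stands on its own; the block types are trimmed to just the fields the loop touches, not the full Messages API schema.

```typescript
// Minimal shapes for the parts of a Messages API response the loop reads.
type Block = {
  type: string;         // 'text' | 'tool_use' | 'tool_result'
  id?: string;          // set on tool_use blocks
  name?: string;
  input?: unknown;
  tool_use_id?: string; // set on tool_result blocks
  content?: string;
};
type Msg = { role: 'user' | 'assistant'; content: string | Block[] };
type Response = { stop_reason: string; content: Block[] };

// The loop: call the model, and while it asks for tools,
// execute them, append the results, and go again.
async function agentLoop(
  callModel: (messages: Msg[]) => Promise<Response>,
  runTool: (name: string, input: unknown) => Promise<unknown>,
  messages: Msg[],
): Promise<Response> {
  for (;;) {
    const res = await callModel(messages);
    if (res.stop_reason !== 'tool_use') return res; // final answer
    messages.push({ role: 'assistant', content: res.content });
    const results: Block[] = [];
    for (const b of res.content) {
      if (b.type !== 'tool_use') continue;
      results.push({
        type: 'tool_result',
        tool_use_id: b.id,
        content: JSON.stringify(await runTool(b.name!, b.input)),
      });
    }
    messages.push({ role: 'user', content: results });
  }
}
```

Note what this still omits: cancellation, retries, rate-limit backoff, and streaming, which is exactly the finicky part.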

The Claude Agent SDK handles all of that. You define tools, pass them in, and the SDK runs the loop until the model produces a final response or you hit a stopping condition.

agents/repo-inspector.ts

import { Agent, tool } from '@anthropic-ai/agent-sdk';
import { z } from 'zod';
import { readFile, glob } from 'node:fs/promises'; // fs glob requires Node 22+

const readRepoFile = tool({
  name: 'read_file',
  description: 'Read a file from the repository',
  input_schema: z.object({
    path: z.string().describe('Repo-relative path'),
  }),
  handler: async ({ path }) => {
    return { content: await readFile(path, 'utf8') };
  },
});

const findFiles = tool({
  name: 'glob',
  description: 'Find files matching a glob pattern',
  input_schema: z.object({ pattern: z.string() }),
  handler: async ({ pattern }) => {
    // glob() returns an async iterator, not an array — collect it first
    return { matches: await Array.fromAsync(glob(pattern)) };
  },
});

export const repoInspector = new Agent({
  model: 'claude-sonnet-4-6',
  system: 'You are a code-review assistant. Be concise.',
  tools: [readRepoFile, findFiles],
  max_steps: 10,
});

const result = await repoInspector.run({
  input: 'Summarize the authentication flow in this repo',
});
console.log(result.final_message);

The SDK streams intermediate steps by default, so for await (const event of agent.stream(input)) gives you tool calls, tool results, and text deltas as they happen — which is what you want to render in a UI.
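A small renderer makes the stream easy to wire into a terminal UI. The event type names below are assumptions for illustration, not the SDK's actual exported types; check those before relying on the shapes.

```typescript
// Hypothetical event shapes — names here are assumptions for illustration;
// the SDK's exported types are the source of truth.
type AgentEvent =
  | { type: 'text_delta'; text: string }
  | { type: 'tool_call'; name: string }
  | { type: 'tool_result'; name: string };

// Turn one stream event into something printable.
function renderEvent(e: AgentEvent): string {
  switch (e.type) {
    case 'text_delta':
      return e.text;
    case 'tool_call':
      return `\n> calling ${e.name}\n`;
    case 'tool_result':
      return `> ${e.name} returned\n`;
  }
}

// Usage (sketch):
// for await (const event of agent.stream(input)) {
//   process.stdout.write(renderEvent(event));
// }
```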

Sub-agents

One pattern the SDK makes clean is delegating work to a cheaper model. Your main agent runs Sonnet; when it needs to extract a structured summary of a long document, it calls a sub-agent running Haiku. The sub-agent is just another tool from the parent's perspective.

agents/delegator.ts

import { Agent, tool } from '@anthropic-ai/agent-sdk';
import { z } from 'zod';

const summarizer = new Agent({
  model: 'claude-haiku-4-5',
  system: 'Return exactly three bullet points.',
  tools: [],
  max_steps: 1,
});

const summarizeDoc = tool({
  name: 'summarize_document',
  description: 'Summarize a long document using a fast model',
  input_schema: z.object({ text: z.string() }),
  handler: async ({ text }) => {
    const res = await summarizer.run({ input: text });
    return { summary: res.final_message };
  },
});

export const researcher = new Agent({
  model: 'claude-sonnet-4-6',
  system: 'You research topics and cite sources.',
  tools: [summarizeDoc /*, searchWeb, fetchUrl */],
  max_steps: 8,
});

Cost Tip: In a recent project, wrapping a summarization step as a Haiku sub-agent dropped the per-query cost from about $0.14 to $0.06 while keeping answer quality identical. The delegation overhead is marginal — the sub-agent call is one more round-trip, not a full turn.

Memory and Long Context

The Agent SDK ships with a pluggable memory interface. The default in-process implementation is fine for short-lived agents, but for long-running sessions you can drop in a vector store, a file-backed key-value store, or something like MemPalace for structured memory.
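The backend contract is small. The interface below is a guess at its shape, not the SDK's actual types; the file-backed variant shows roughly how little a drop-in replacement needs.

```typescript
import { readFile, writeFile } from 'node:fs/promises';

// Hypothetical shape of a pluggable memory backend — the SDK's real
// interface may differ, but a replacement needs roughly this contract.
interface AgentMemory {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// File-backed implementation: one JSON file, read and rewritten per call.
// Fine for a single long-running session; not safe for concurrent writers.
class FileMemory implements AgentMemory {
  constructor(private path: string) {}

  private async load(): Promise<Record<string, string>> {
    try {
      return JSON.parse(await readFile(this.path, 'utf8'));
    } catch {
      return {}; // a missing or unreadable file starts empty
    }
  }

  async get(key: string) {
    return (await this.load())[key];
  }

  async set(key: string, value: string) {
    const data = await this.load();
    data[key] = value;
    await writeFile(this.path, JSON.stringify(data));
  }
}
```

Swapping in a vector store means implementing the same two methods against an embeddings index instead of a file.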

With Opus 4.6's 1M context, "just keep everything in the window" is suddenly a viable strategy for many session types. A full codebase at ~400K tokens fits comfortably with room for transcripts and tool output. Prompt caching makes the repeated ingestion economical; the 5-minute TTL is long enough to cover a multi-turn conversation, and an explicit cache_control: 'ephemeral' on the system block keeps per-turn costs predictable.

Model IDs and Where to Set Them

  • claude-opus-4-6 and claude-opus-4-6[1m] — 200K and 1M context variants
  • claude-sonnet-4-6
  • claude-haiku-4-5-20251001

Route these through the Vercel AI Gateway (anthropic/claude-sonnet-4-6) if you want failover and unified billing, or call the Messages API directly when you need prompt caching or the raw thinking block.
