A mental model for agent harnesses

Most confusion around AI coding tools comes from mixing layers.

The same name often points at different layers. "Claude" might mean a model such as Claude Opus, the Claude Code CLI, a Slack bot built on Claude, or an Anthropic API call. "Codex" can mean a model family, a product surface, or a coding-agent experience. MCP is neither a model nor an agent; it is a protocol for exposing tools and context. Once you split the stack into inference runtime, provider protocol, auth surface, and agent harness, the system becomes much easier to reason about.

The shortest useful model is this:

Model runtime

Predicts tokens.

Provider API

Accepts structured requests and returns structured events.

Harness

Decides context, tools, loop, approvals, memory, and execution.

Auth surface

Decides which endpoint and entitlement path you can use.

Where each thing sits

Lower layers are raw capability. Upper layers decide behavior, context, tools, safety, session shape, and UX.

User surface

Product UI

The native app, web UI, CLI, or IDE surface where users see sessions, logs, diffs, files, tools, approvals, and final output.

Agent harness

Provider binaries (sealed harness)

Claude Code, Codex, and similar binaries bundle auth, prompt, tools, session behavior, and execution semantics.

OpenCode (borrowed harness)

OpenCode is a third-party harness with many providers. More open, but your product still adapts to its prompt/tool/session model.

Pi (editable harness)

Pi is a minimal customizable harness. Prompt, tools, extensions, compaction, sessions, RPC, and SDK are meant to be changed.

Custom harness (owned loop)

Your application owns prompt, tools, workers, compaction, session tree, and event stream. Providers become transports.

Auth & billing

Subscriptions / OAuth

Entitlement path for ChatGPT Plus/Pro, Claude Pro/Max, Copilot, etc. Useful when users already pay, but product dialect constraints leak into adapters.

API keys / direct billing

Clean for custom harnesses, but usually metered token billing. This path has fewer product-protocol constraints.

Inference runtime

Inference endpoints

OpenAI-hosted GPT/Codex, Anthropic-hosted Claude, Copilot-routed models, local Ollama, vLLM, or llama.cpp. For text/chat/tool paths, structured inputs eventually become tokenized text/control tokens.

Two loops, not one

One loop generates tokens. The other loop decides what to do with model output. Tool use, MCP, files, shell, approvals, and retries live in the agent loop.

Inference loop

Owned by OpenAI, Anthropic, Ollama, vLLM, llama.cpp, or another model runtime. It is one model turn.

tokens = tokenize(rendered_prompt)
new_tokens = []
while len(new_tokens) < max_tokens:
  logits = model.forward(tokens)
  next_token = sampler(logits)
  if next_token == EOS:
    break
  tokens.append(next_token)
  new_tokens.append(next_token)
  if ends_with_stop_sequence(new_tokens):
    break
return detokenize(new_tokens)
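
As a concrete but heavily simplified sketch, the same loop can be run in TypeScript against a stand-in model. Everything here is hypothetical: `toyForward` returns hand-written logits instead of running a network, the vocabulary has five entries, and sampling is greedy (argmax) rather than probabilistic.

```typescript
// Toy inference loop: a lookup rule stands in for model.forward().
// The vocabulary, toyForward, and the EOS id are all made up for illustration.
const EOS = 0;
const vocab = ["<eos>", "The", " answer", " is", " 4"];

// Pretend model: strongly prefers the vocabulary entry after the last token,
// and prefers EOS once the vocabulary is exhausted.
function toyForward(tokens: number[]): number[] {
  const last = tokens[tokens.length - 1] ?? EOS;
  const preferred = last + 1 < vocab.length ? last + 1 : EOS;
  return vocab.map((_, id) => (id === preferred ? 5 : 0));
}

// Greedy sampler: pick the index of the largest logit.
function argmax(logits: number[]): number {
  let best = 0;
  for (let i = 1; i < logits.length; i += 1) {
    if (logits[i]! > logits[best]!) best = i;
  }
  return best;
}

function generate(promptTokens: number[], maxTokens: number): number[] {
  const tokens = [...promptTokens];
  const newTokens: number[] = [];
  while (newTokens.length < maxTokens) {
    const nextToken = argmax(toyForward(tokens));
    if (nextToken === EOS) break; // stop condition: end of sequence
    tokens.push(nextToken);
    newTokens.push(nextToken);
  }
  return newTokens;
}

// Prompt is the single token "The"; the loop appends until EOS.
const completion = generate([1], 10).map((id) => vocab[id]).join("");
```

Note that the loop has no notion of tools or files. It only extends a token sequence until a stop condition fires.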

Agent loop

Owned by the harness: OpenCode, Claude Code, Codex, TanStack AI, Pi, or your own app. It is many model turns plus effects.

messages = [user_task]
for turn in 1..max_turns:
  output = run_inference(messages, tools)
  if output.tool_call:
    result = execute_tool(output.tool_call)
    messages += [output.tool_call, result]
    continue
  return output.final_text
return "stopped: max_turns reached"

Providers can blur this boundary with server-side tools, managed agents, or built-in code execution. But client-side tool execution still needs a harness loop.
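
A minimal sketch of a client-side agent loop, under stated assumptions: `runInference` is a scripted stand-in for a provider call, the registry holds one fake `read_file` tool, and everything is synchronous for brevity. A real harness would await network and tool calls.

```typescript
// Toy agent loop. All names here are hypothetical stand-ins.
type ToolCall = { name: string; args: Record<string, string> };
type ModelOutput = { toolCall?: ToolCall; finalText?: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };

const toolRegistry = new Map<string, (args: Record<string, string>) => string>([
  ["read_file", (args) => `contents of ${args.path ?? "<missing path>"}`],
]);

// Scripted model: requests a tool first, answers once a tool result exists.
function runInference(messages: Message[]): ModelOutput {
  const toolResult = messages.find((m) => m.role === "tool");
  if (!toolResult) {
    return { toolCall: { name: "read_file", args: { path: "a.txt" } } };
  }
  return { finalText: `Summary: ${toolResult.content}` };
}

function agentLoop(task: string, maxTurns: number): string {
  const messages: Message[] = [{ role: "user", content: task }];
  for (let turn = 0; turn < maxTurns; turn += 1) {
    const output = runInference(messages);
    if (output.toolCall) {
      const tool = toolRegistry.get(output.toolCall.name);
      // Unknown tools are reported back to the model instead of throwing.
      const result = tool
        ? tool(output.toolCall.args)
        : `error: unknown tool ${output.toolCall.name}`;
      messages.push({ role: "assistant", content: JSON.stringify(output.toolCall) });
      messages.push({ role: "tool", content: result });
      continue;
    }
    return output.finalText ?? "";
  }
  return "stopped: turn budget exhausted";
}

const answer = agentLoop("Summarize a.txt", 5);
```

The turn budget matters: with `maxTurns` set to 1, the scripted model only gets to request the tool, and the loop stops before any final answer is produced.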

Everything becomes tokens, eventually

For the text path, a model does not directly see JavaScript objects, JSON objects, or HTTP requests. The runtime first turns text into tokens: small chunks such as words, word pieces, punctuation, or special control markers. Each token has a numeric ID in the model's vocabulary. Those numbers are what the model receives.

Text:
"The capital of France is"

One possible tokenization:
["The", " capital", " of", " France", " is"]

Token IDs sent to the model:
[791, 6864, 315, 9822, 374]
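
To make the text-to-ID mapping concrete, here is a toy tokenizer sketch. It is an assumption-laden illustration: the vocabulary contains only the five pieces from the example above, and it uses greedy longest-match lookup, whereas real tokenizers use learned subword vocabularies with tens of thousands of entries.

```typescript
// Toy tokenizer: greedy longest-match against a tiny hand-written vocabulary.
// The IDs reuse the illustrative values from the example above.
const toyVocab = new Map<string, number>([
  ["The", 791],
  [" capital", 6864],
  [" of", 315],
  [" France", 9822],
  [" is", 374],
]);

function encode(text: string): number[] {
  const ids: number[] = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = false;
    // Try the longest remaining substring first, then shrink.
    for (let end = text.length; end > pos; end -= 1) {
      const id = toyVocab.get(text.slice(pos, end));
      if (id !== undefined) {
        ids.push(id);
        pos = end;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`no token for text at position ${pos}`);
  }
  return ids;
}

function decode(ids: number[]): string {
  // Invert the vocabulary so IDs map back to text pieces.
  const pieces = new Map<number, string>();
  toyVocab.forEach((id, piece) => pieces.set(id, piece));
  return ids.map((id) => pieces.get(id) ?? "<unk>").join("");
}

const tokenIds = encode("The capital of France is");
const roundTrip = decode(tokenIds);
```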

An API request may start as structured data:

{
  "role": "tool",
  "content": "test failed: expected 4 got 5"
}

Before inference, the provider or local runtime renders that into a model-specific format, conceptually something like:

<|tool|>
test failed: expected 4 got 5
<|end|>
<|assistant|>
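
The rendering step can be sketched as a small function. This assumes the conceptual `<|role|> ... <|end|>` markers shown above, which are not any specific model's real chat template; `renderPrompt` and `ChatMessage` are hypothetical names.

```typescript
// Render structured messages into one flat prompt string.
// The marker format mirrors the conceptual example above only.
type ChatMessage = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
};

function renderPrompt(messages: ChatMessage[]): string {
  const rendered = messages.map((m) => `<|${m.role}|>\n${m.content}\n<|end|>\n`);
  // The trailing assistant marker cues the model to write the next turn.
  return rendered.join("") + "<|assistant|>\n";
}

const renderedPrompt = renderPrompt([
  { role: "tool", content: "test failed: expected 4 got 5" },
]);
```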

Then a tokenizer converts it into token IDs. The model computes raw scores for the next token. Those scores are called logits.

Prompt: "The capital of France is"

Raw next-token logits:
Paris:  8.2
London: 2.1
banana: -4.0
.:      0.3

The runtime turns logits into probabilities, samples a token, appends it, and repeats.
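
A minimal sketch of that step, using the example logits above. Softmax is the standard way to turn logits into probabilities; the injectable `random` parameter is an assumption added here only so the sampling step is deterministic in a demo.

```typescript
// Softmax: exponentiate each logit and normalize so the results sum to 1.
function softmax(logits: number[]): number[] {
  const maxLogit = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((l) => Math.exp(l - maxLogit));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// Sample an index by walking the cumulative distribution until it
// passes a random draw in [0, 1).
function sampleIndex(probs: number[], random: () => number = Math.random): number {
  const draw = random();
  let cumulative = 0;
  for (let i = 0; i < probs.length; i += 1) {
    cumulative += probs[i]!;
    if (draw < cumulative) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

const candidates = ["Paris", "London", "banana", "."];
const probs = softmax([8.2, 2.1, -4.0, 0.3]);
const picked = candidates[sampleIndex(probs, () => 0.5)];
```

With these logits, "Paris" receives well over 99% of the probability mass, so almost any random draw selects it. The other candidates remain possible, just unlikely.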

There are caveats. Images and audio may become embeddings or multimodal tokens rather than plain text. Cloud providers may add hidden system/tool scaffolding. Strict structured outputs can use constrained decoding, where invalid next tokens are masked. But for chat, tool calls, and reasoning text, the useful mental model remains: every next token the model predicts depends on the token sequence it has received so far.
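
The constrained-decoding caveat can be illustrated with a small masking sketch. The token set and the "grammar" here are hypothetical: a real implementation derives the allowed set from a schema or grammar state at every step.

```typescript
// Constrained decoding sketch: disallowed tokens get a logit of -Infinity,
// so they can never win argmax and get zero probability under softmax.
function maskLogits(logits: number[], allowed: Set<number>): number[] {
  return logits.map((logit, id) => (allowed.has(id) ? logit : -Infinity));
}

function argmaxIndex(logits: number[]): number {
  let best = 0;
  for (let i = 1; i < logits.length; i += 1) {
    if (logits[i]! > logits[best]!) best = i;
  }
  return best;
}

const tokenPieces = ["{", "}", "hello", "\""];
const rawLogits = [1.0, 0.5, 3.0, 0.2];

// Unconstrained, the model would prefer "hello".
const unconstrained = tokenPieces[argmaxIndex(rawLogits)];

// Suppose the output must be a JSON object, so only "{" may come first.
const constrained = tokenPieces[argmaxIndex(maskLogits(rawLogits, new Set([0])))];
```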

What actually happens on a prompt

Example: a harness sends requests to an OpenAI-hosted model and exposes MCP tools to it. The model sees rendered tokens; the harness owns execution and reprompting. The loop shape below is distilled from TanStack AI, but the important part is runtime behavior, not library names.

1. User sends a prompt

The product user only sends the prompt. The harness, written by the product developer, decides which tools are available.

const messages = [
  {
    role: 'user',
    content: 'Read src/app/page.tsx and summarize it.',
  },
]

await runAgentLoop({
  model: 'gpt-5.5',
  messages,
  tools: [readFileTool],
})

2. MCP tools become executable harness tools

For this read_file example, the schema tells the model that a valid request needs a path, such as "src/app/page.tsx". The execute function is not shown to the model; the harness runs it after validating the tool request.

const mcpTool = {
  name: 'read_file',
  description: 'Read a UTF-8 file from the workspace.',
  inputSchema: {
    type: 'object',
    properties: { path: { type: 'string' } },
    required: ['path'],
  },
}

const readFileTool = {
  name: mcpTool.name,
  description: mcpTool.description,
  inputSchema: mcpTool.inputSchema,
  execute: async ({ path }, { abortSignal } = {}) => {
    return await mcp.callTool(
      { name: 'read_file', arguments: { path } },
      { signal: abortSignal },
    )
  },
}
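
The validation step mentioned above could look like the following sketch. It is a hypothetical helper that only understands schemas whose properties are all strings, like the `read_file` schema; it is not a full JSON Schema validator.

```typescript
// Minimal argument check for a tool schema with string properties only.
type ToolSchema = {
  type: "object";
  properties: Record<string, { type: "string" }>;
  required: string[];
};

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Returns a list of problems; an empty list means the input is acceptable.
function validateToolInput(schema: ToolSchema, input: unknown): string[] {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["input must be an object"];
  }
  const record = input as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of schema.required) {
    if (!hasOwn(record, key)) errors.push(`missing required property: ${key}`);
  }
  for (const key of Object.keys(record)) {
    if (!hasOwn(schema.properties, key)) continue; // unknown keys are ignored here
    if (typeof record[key] !== "string") errors.push(`${key} must be a string`);
  }
  return errors;
}

const readFileSchema: ToolSchema = {
  type: "object",
  properties: { path: { type: "string" } },
  required: ["path"],
};

const okErrors = validateToolInput(readFileSchema, { path: "src/app/page.tsx" });
const badErrors = validateToolInput(readFileSchema, { path: 42 });
```

Returning errors rather than throwing lets the harness send the problem back to the model as a tool result, so the model can correct its own request.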

3. Chat engine starts the agent loop

The harness alternates tool execution and model turns. The call to askModel({ messages, tools }) is the handoff to the next step.

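const maxTurns = 5

async function runAgentLoop({ messages, tools }) {
  let pendingToolCalls = []

  for (let turn = 0; turn < maxTurns; turn += 1) {
    if (pendingToolCalls.length > 0) {
      // Execute the requested tools and append their results to messages.
      await runToolsAndAppendResults(pendingToolCalls, messages)
    }

    const modelTurn = await askModel({ messages, tools })

    if (modelTurn.finalAnswer) return modelTurn.finalAnswer
    pendingToolCalls = modelTurn.toolCalls ?? []
  }

  throw new Error(`Agent loop stopped after ${maxTurns} turns without a final answer`)
}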

4. Adapter prepares one provider turn

askModel does not stream immediately. It first converts harness state into the provider's request shape: messages plus provider-formatted tool contracts.

async function askModel({ messages, tools }) {
  const providerTools = tools.map(toProviderTool)
  const request = toProviderRequest({
    model: 'gpt-5.5',
    messages,
    tools: providerTools,
  })

  return streamProviderTurn(request)
}

5. Adapter converts the tool contract

This is the useful adapter move: an executable local tool becomes a provider-visible contract. The model can request read_file with arguments matching the schema; later, the harness maps that request back to readFileTool.

function toProviderTool(tool) {
  return {
    type: 'function',
    name: tool.name,
    description: tool.description,
    parameters: tool.inputSchema,
  }
}

const openAITool = toProviderTool(readFileTool)

// openAITool now has this shape (note that execute is not included):
// {
//   type: 'function',
//   name: 'read_file',
//   description: 'Read a UTF-8 file from the workspace.',
//   parameters: {
//     type: 'object',
//     properties: { path: { type: 'string' } },
//     required: ['path'],
//   },
// }
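
The reverse mapping, from a model-emitted request back to a local tool, can be sketched as a lookup by name. The types and `dispatchToolRequest` are hypothetical, and the tool is synchronous for brevity; the sketch assumes arguments arrive as JSON text, as they do in the stream events shown later.

```typescript
// Map a model-emitted tool request back to a locally registered tool.
type LocalTool = {
  name: string;
  execute: (input: Record<string, unknown>) => string;
};
type ToolRequest = { id: string; name: string; arguments: string };
type ToolResult = { toolCallId: string; content: string };

function dispatchToolRequest(tools: LocalTool[], request: ToolRequest): ToolResult {
  const tool = tools.find((t) => t.name === request.name);
  if (!tool) {
    // Report the failure back to the model instead of crashing the loop.
    return { toolCallId: request.id, content: `error: unknown tool "${request.name}"` };
  }
  let input: Record<string, unknown>;
  try {
    input = JSON.parse(request.arguments) as Record<string, unknown>;
  } catch {
    return { toolCallId: request.id, content: "error: arguments were not valid JSON" };
  }
  return { toolCallId: request.id, content: tool.execute(input) };
}

// Fake tool standing in for readFileTool.
const fakeReadFile: LocalTool = {
  name: "read_file",
  execute: (input) => `fake contents of ${String(input.path)}`,
};

const dispatched = dispatchToolRequest([fakeReadFile], {
  id: "call_123",
  name: "read_file",
  arguments: '{"path":"src/app/page.tsx"}',
});
```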

6. Request streams to the provider

Now the request leaves the harness. The provider runs one inference turn. Internally, structured fields become model-specific token/control-token context, plus any hidden provider scaffolding.

const request = {
  model: 'gpt-5.5',
  instructions: 'system/developer prompts...',
  input: [
    { role: 'user', content: 'Read src/app/page.tsx and summarize it.' }
  ],
  tools: [
    {
      type: 'function',
      name: 'read_file',
      description: 'Read a UTF-8 file from the workspace.',
      parameters: {
        type: 'object',
        properties: { path: { type: 'string' } },
        required: ['path'],
      }
    }
  ],
  stream: true,
}

for await (const event of provider.stream(request)) {
  handleProviderEvent(event)
}

7. Provider adapter normalizes stream events

This happens in the adapter/harness layer, not inside the model. The provider emits raw stream events; the adapter maps them into the harness's event vocabulary.

// chunk is one raw provider stream event. metadata is assumed to be adapter
// state captured earlier in the stream, such as the function name for this item.
if (chunk.type === 'response.output_text.delta') {
  yield { type: EventType.TEXT_MESSAGE_CONTENT, delta: chunk.delta }
}

if (chunk.type === 'response.function_call_arguments.done') {
  yield {
    type: EventType.TOOL_CALL_END,
    toolCallId: chunk.item_id,
    toolName: metadata.name,
    input: JSON.parse(chunk.arguments),
  }
}

if (chunk.type === 'response.reasoning_text.delta') {
  yield { type: EventType.STEP_FINISHED, stepType: 'thinking', delta: chunk.delta }
}
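
Tool-call arguments usually arrive in fragments, so the adapter has to keep per-call state across events. The sketch below shows one way to do that. The event shapes are simplified assumptions modeled on the event names above; check the provider's streaming reference for the exact fields.

```typescript
// Simplified, assumed event shapes; not a complete provider event model.
type StreamChunk =
  | { type: "response.output_item.added"; item: { id: string; type: string; name?: string } }
  | { type: "response.function_call_arguments.delta"; item_id: string; delta: string }
  | { type: "response.function_call_arguments.done"; item_id: string };

type ToolCallEnd = {
  type: "TOOL_CALL_END";
  toolCallId: string;
  toolName: string;
  input: unknown;
};

function collectToolCalls(chunks: StreamChunk[]): ToolCallEnd[] {
  // Per-call state: the function name and the argument text seen so far.
  const pending = new Map<string, { name: string; args: string }>();
  const finished: ToolCallEnd[] = [];
  for (const chunk of chunks) {
    switch (chunk.type) {
      case "response.output_item.added":
        if (chunk.item.type === "function_call") {
          pending.set(chunk.item.id, { name: chunk.item.name ?? "", args: "" });
        }
        break;
      case "response.function_call_arguments.delta": {
        const call = pending.get(chunk.item_id);
        if (call) call.args += chunk.delta;
        break;
      }
      case "response.function_call_arguments.done": {
        const call = pending.get(chunk.item_id);
        if (!call) break;
        finished.push({
          type: "TOOL_CALL_END",
          toolCallId: chunk.item_id,
          toolName: call.name,
          input: JSON.parse(call.args), // parse only once the text is complete
        });
        pending.delete(chunk.item_id);
        break;
      }
    }
  }
  return finished;
}
```

Parsing only on the `done` event avoids calling `JSON.parse` on a half-streamed argument string.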

8. Harness executes tool and reprompts

If the model requested a tool, the harness appends the assistant tool-call message, executes the matching tool, appends the trusted result, and starts another inference turn.

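const toolCall = {
  id: 'call_123',
  name: 'read_file',
  arguments: { path: 'src/app/page.tsx' },
}

// 1. Record the model's tool request in the transcript.
messages.push({ role: 'assistant', toolCall })

// 2. Execute the matching tool; the controller lets the harness cancel it.
const controller = new AbortController()
const result = await readFileTool.execute(toolCall.arguments, {
  abortSignal: controller.signal,
})

// 3. Append the result, linked to the request by id.
messages.push({
  role: 'tool',
  toolCallId: toolCall.id,
  content: result,
})

// 4. Start the next inference turn.
await runNextModelTurn({ messages, tools })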

Here is the same flow summarized as a sequence diagram:

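Participants: User, Harness, Provider, Tool runtime

User -> Harness: prompt
Harness -> Provider: messages + provider tool schema
Provider -> Harness: text delta or tool request
Harness -> Tool runtime: execute requested tool
Tool runtime -> Harness: tool result
Harness -> Provider: next model turn with tool result
Provider -> Harness: final answer
Harness -> User: render response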

Where chain of thought lives

Chain of thought is not one clean layer. There is private model reasoning, visible reasoning text, provider summaries, and harness policy about what to store or show.

Local model / Ollama

If the model is trained to emit reasoning tags, the inference loop can generate those tokens like any other text. The runtime may stream, split, or hide them.

<|user|>
Solve 2+2.
<|assistant|>
<think>
Need add 2 and 2. Result 4.
</think>
The answer is 4.
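
A runtime or harness that wants to hide or separately display this reasoning has to split it out of the generated text. A minimal sketch, assuming well-formed, non-nested `<think>` tags as in the example above; `splitReasoning` is a hypothetical helper.

```typescript
// Separate <think>...</think> reasoning from the visible answer.
function splitReasoning(output: string): { reasoning: string; answer: string } {
  const pattern = /<think>([\s\S]*?)<\/think>/g;
  const reasoningParts: string[] = [];
  // Remove each tagged span from the answer and collect its contents.
  const answer = output.replace(pattern, (_match: string, inner: string) => {
    reasoningParts.push(inner.trim());
    return "";
  });
  return { reasoning: reasoningParts.join("\n"), answer: answer.trim() };
}

const parsed = splitReasoning(
  "<think>\nNeed add 2 and 2. Result 4.\n</think>\nThe answer is 4.",
);
```

This works on a completed string. Splitting a live token stream is harder, because a tag can be cut across two chunks.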

Cloud provider

Providers often keep full chain of thought private. APIs may expose summaries, reasoning token counts, encrypted continuity blocks, or no reasoning.

request.reasoning = { effort: 'high', summary: 'auto' }

stream events may include:
- response.reasoning_summary_text.delta
- encrypted reasoning continuity
- usage.reasoning_tokens

full hidden chain of thought: provider-private

External deliberation

The harness can create visible planning by asking the model to plan, critique, call tools, revise, and verify across multiple turns.

messages.push({ role: 'system', content: 'First produce a short plan.' })
const plan = await runInference(messages)

messages.push({ role: 'assistant', content: plan })
messages.push({ role: 'system', content: 'Critique the plan against repo evidence.' })
const critique = await runInference(messages)

messages.push({ role: 'assistant', content: critique })
messages.push({ role: 'system', content: 'Now revise and execute with tools.' })
const final = await agentLoop(messages, tools)

Subscription auth is a product dialect

One more layer matters: how you are authenticated.

Direct API-key access usually means using the public developer endpoint documented for integrations, with metered billing. Subscription-based access often means speaking the product protocol used by the official app or CLI.

Some harnesses implement product-compatible OAuth and stream directly to provider/product endpoints instead of shelling out to provider binaries. The catch is that the product protocol differs from the public API-key endpoint in its endpoints, headers, identity strings, tool names, and rate and feature rules.

OpenAI / Codex

OpenAI API-key billing and ChatGPT/Codex subscription access use different request targets and headers.

// direct API-key style
POST https://api.openai.com/v1/responses
Authorization: Bearer sk-...

// ChatGPT/Codex subscription style
POST https://chatgpt.com/backend-api/codex/responses
Authorization: Bearer <ChatGPT OAuth access token>
chatgpt-account-id: <account id from access token>
originator: <client identifier>
OpenAI-Beta: responses=experimental

Anthropic / Claude OAuth

A Claude Code-compatible subscription adapter speaks a product dialect: bearer OAuth token, beta flags, CLI-ish headers, Claude Code identity preamble, and canonical Claude Code tool names.

new Anthropic({
  apiKey: null,
  authToken: oauthAccessToken,
  baseURL: 'https://api.anthropic.com',
  defaultHeaders: {
    'anthropic-beta': 'claude-code-20250219,oauth-2025-04-20,...',
    'user-agent': 'claude-cli/2.1.75',
    'x-app': 'cli',
  },
})

params.system = [
  { type: 'text', text: "You are Claude Code, Anthropic's official CLI for Claude." },
  { type: 'text', text: customSystemPrompt },
]

toClaudeCodeName('bash') // 'Bash'
toClaudeCodeName('read') // 'Read'
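
One way such a name translation could be implemented is a two-way lookup table. This is a hypothetical sketch: it contains only the two names shown above, and names it does not know pass through unchanged.

```typescript
// Two-way mapping between local tool names and product-canonical names.
// Only the two pairs from the example above are included; a real adapter
// would need the full list used by the product it is compatible with.
const LOCAL_TO_CANONICAL = new Map<string, string>([
  ["bash", "Bash"],
  ["read", "Read"],
]);

const CANONICAL_TO_LOCAL = new Map<string, string>();
LOCAL_TO_CANONICAL.forEach((canonical, local) => CANONICAL_TO_LOCAL.set(canonical, local));

// Outbound: rename tools before sending the request.
function toClaudeCodeName(localName: string): string {
  return LOCAL_TO_CANONICAL.get(localName) ?? localName;
}

// Inbound: map the model's tool request back to the local tool name.
function fromClaudeCodeName(canonicalName: string): string {
  return CANONICAL_TO_LOCAL.get(canonicalName) ?? canonicalName;
}
```

The inverse direction matters as much as the forward one: the model will request tools by their canonical names, and the harness must find its own tool from that.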

GitHub Copilot

A Copilot subscription adapter can use GitHub device login, exchange for a Copilot token, derive the API base URL, and send VS Code Copilot-style headers.

GET https://api.github.com/copilot_internal/v2/token
Authorization: Bearer <GitHub device-flow access token>

proxy-ep=proxy.individual.githubcopilot.com
baseUrl = 'https://api.individual.githubcopilot.com'

User-Agent: GitHubCopilotChat/0.35.0
Editor-Version: vscode/1.107.0
Copilot-Integration-Id: vscode-chat
X-Initiator: user | agent
Openai-Intent: conversation-edits
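
The base-URL derivation can be sketched as follows. Two assumptions are baked in and should be checked against real tokens: that the Copilot token is a semicolon-separated list of `key=value` fields, and that replacing the leading `proxy.` with `api.` yields the API host, as the example values above suggest. The sample token fields other than `proxy-ep` are made up.

```typescript
// Derive an API base URL from the proxy-ep field of a Copilot token.
function deriveCopilotBaseUrl(token: string, fallback: string): string {
  for (const field of token.split(";")) {
    const separator = field.indexOf("=");
    if (separator === -1) continue; // skip fields that are not key=value
    if (field.slice(0, separator) !== "proxy-ep") continue;
    const proxyHost = field.slice(separator + 1);
    return `https://${proxyHost.replace(/^proxy\./, "api.")}`;
  }
  return fallback; // no proxy-ep field found
}

const derivedBaseUrl = deriveCopilotBaseUrl(
  "tid=abc;exp=1;proxy-ep=proxy.individual.githubcopilot.com",
  "https://fallback.invalid",
);
```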

What dialect means

Product-compatible integration means matching protocol shape, not just replacing an API key with an OAuth token.

type ProductDialect = {
  endpoint: string
  auth: 'api-key' | 'chatgpt-oauth' | 'claude-oauth' | 'copilot-token'
  headers: Record<string, string>
  toolFormat: 'openai-functions' | 'anthropic-tools' | 'claude-code-compatible-tools'
  identity?: 'raw-api-client' | 'claude-code-compatible-cli' | 'vscode-copilot-chat'
}
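
As a usage sketch, a dialect value can drive request construction so that the rest of the harness never branches on auth mode. The type is repeated so the snippet stands alone. `buildRequest` is hypothetical, and this sketch sends every credential as a bearer token; a provider that expects a different auth header would need its own branch. The endpoint and header values are the ones shown in the OpenAI / Codex example above, with placeholders left as placeholders.

```typescript
type ProductDialect = {
  endpoint: string;
  auth: "api-key" | "chatgpt-oauth" | "claude-oauth" | "copilot-token";
  headers: Record<string, string>;
  toolFormat: "openai-functions" | "anthropic-tools" | "claude-code-compatible-tools";
  identity?: "raw-api-client" | "claude-code-compatible-cli" | "vscode-copilot-chat";
};

type OutgoingRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Combine dialect-specific headers with the credential and payload.
function buildRequest(dialect: ProductDialect, credential: string, payload: unknown): OutgoingRequest {
  return {
    url: dialect.endpoint,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      ...dialect.headers,
      Authorization: `Bearer ${credential}`,
    },
    body: JSON.stringify(payload),
  };
}

const directApi: ProductDialect = {
  endpoint: "https://api.openai.com/v1/responses",
  auth: "api-key",
  headers: {},
  toolFormat: "openai-functions",
  identity: "raw-api-client",
};

const codexSubscription: ProductDialect = {
  endpoint: "https://chatgpt.com/backend-api/codex/responses",
  auth: "chatgpt-oauth",
  headers: {
    "OpenAI-Beta": "responses=experimental",
    originator: "<client identifier>", // placeholder, as in the example above
  },
  toolFormat: "openai-functions",
};

const directRequest = buildRequest(directApi, "sk-...", { input: "hi" });
const subscriptionRequest = buildRequest(codexSubscription, "<ChatGPT OAuth access token>", { input: "hi" });
```

The same payload produces two different wire requests. That difference is the dialect.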