x402 Streaming Payment Architecture
Overview
NOFX calls AI models (DeepSeek, GPT, Claude, etc.) through the claw402 gateway, using the x402 protocol to pay per request with USDC on Base L2.
This document describes the full implementation of the SSE streaming call mode, including client, server, and billing logic.
Why Streaming Is Needed
NOFX (client) ──→ Cloudflare (100s idle limit) ──→ claw402 (gateway) ──→ AI upstream
- DeepSeek inference typically takes 60–180 seconds and can take up to 5 minutes
- Cloudflare enforces a 100-second hard limit on idle connections, returning 520/EOF on timeout
- Non-streaming mode: the client receives no data until inference completes, so Cloudflare disconnects after 100 seconds
- Streaming mode: the first byte arrives within seconds and subsequent chunks flow continuously, keeping the connection through Cloudflare alive
End-to-End Request Flow
NOFX Client claw402 Gateway AI Upstream
│ │ │
── Phase 1: Payment ──────────────────────────────────────────────────────────────────────────────
│ │ │
1. POST /api/v1/ai/... │ ─── body + stream:true ──────────→ │ │
(no payment header) │ │ │
│ ←── 402 + Payment-Required ────── │ │
│ (base64 JSON: price/chain/asset) │
│ │ │
2. EIP-712 signing │ │ │
(USDC TransferWithAuth)│ │ │
│ │ │
3. POST + X-Payment hdr │ ─── body + signature ────────────→ │ │
│ │ ── verify signature → Facilitator
│ │ ←── OK ──────────── Facilitator
│ │ ── settle USDC ───→ Facilitator
│ │ ←── tx hash ─────── Facilitator
│ │ │
── Phase 2: Streaming Response ───────────────────────────────────────────────────────────────────
│ │ │
│ ←── 200 OK ────────────────────── │ ─── POST stream:true ────────→ │
│ ←── data: {"choices":[...]} ───── │ ←── SSE chunk ──────────────── │
│ ←── data: {"choices":[...]} ───── │ ←── SSE chunk ──────────────── │
│ ←── ... (continuous) ──────────── │ ←── ... ─────────────────────── │
│ ←── data: [DONE] ──────────────── │ ←── data: [DONE] ────────────── │
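
The two-phase exchange in the diagram can be sketched in a few lines of Go. This is a minimal illustration rather than the NOFX implementation: `doPaidRequest`, `signFunc`, and `demoHandshake` are hypothetical names, the signer is a stub, and the gateway is an in-process fake that only imitates the 402 challenge.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// signFunc is a hypothetical stand-in for the EIP-712 signer: it receives the
// decoded payment requirements and returns the X-Payment header value.
type signFunc func(requirements []byte) (string, error)

// doPaidRequest sends the request once without payment. On a 402 it decodes
// the Payment-Required header, signs, and retries with X-Payment set.
func doPaidRequest(c *http.Client, url string, body []byte, sign signFunc) (*http.Response, error) {
	send := func(payment string) (*http.Response, error) {
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Content-Type", "application/json")
		if payment != "" {
			req.Header.Set("X-Payment", payment)
		}
		return c.Do(req)
	}

	resp, err := send("")
	if err != nil || resp.StatusCode != http.StatusPaymentRequired {
		return resp, err
	}
	header := resp.Header.Get("Payment-Required")
	io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	resp.Body.Close()

	requirements, err := base64.StdEncoding.DecodeString(header)
	if err != nil {
		return nil, fmt.Errorf("decode Payment-Required: %w", err)
	}
	payment, err := sign(requirements)
	if err != nil {
		return nil, fmt.Errorf("sign payment: %w", err)
	}
	return send(payment) // the caller reads the open SSE body and closes it
}

// demoHandshake runs the flow against an in-process fake gateway.
func demoHandshake() (status int, body string, err error) {
	gateway := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Payment") == "" {
			reqs := base64.StdEncoding.EncodeToString([]byte(`{"amount":"3000"}`))
			w.Header().Set("Payment-Required", reqs)
			w.WriteHeader(http.StatusPaymentRequired)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		fmt.Fprint(w, "data: [DONE]\n\n")
	}))
	defer gateway.Close()

	stubSigner := func(reqs []byte) (string, error) { return "signed:" + string(reqs), nil }
	resp, err := doPaidRequest(gateway.Client(), gateway.URL, []byte(`{"stream":true}`), stubSigner)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(b), err
}

func main() {
	fmt.Println(demoHandshake())
}
```

The request body is resent unchanged on the second attempt; only the payment header differs between the two phases.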
Client Implementation (NOFX)
File Structure
| File | Responsibility |
|---|---|
| mcp/payment/claw402.go | Claw402Client: model routing, wallet management |
| mcp/payment/x402.go | x402 payment flow core: DoX402RequestStream, X402CallStream |
| mcp/client.go | ParseSSEStream: shared SSE parsing function |
Call Chain
Claw402Client.Call()
  └→ X402CallStream()                       // x402.go:380
       ├→ Build request body + inject stream:true
       ├→ DoX402RequestStream()             // x402.go:239
       │    ├→ Send initial request (no payment header)
       │    ├→ Receive 402 → parse Payment-Required header
       │    ├→ signFn() → EIP-712 signature
       │    └→ Send retry request with X-Payment header → return open *http.Response
       │
       ├→ Start idle timeout watchdog (90s with no data → disconnect)
       ├→ TeeReader: simultaneous SSE parsing + raw byte buffering
       ├→ ParseSSEStream()                  // client.go:703
       │    ├→ bufio.Scanner line-by-line read
       │    ├→ Parse "data: {...}" → OpenAI chunk format
       │    └→ Accumulate text + call onChunk callback
       │
       └→ Fallback: if SSE yields nothing, try JSON parsing on buffered bodyBuf
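
The SSE parsing step in the chain above can be approximated as follows. This is a sketch of the general technique (scanner, `data:` prefix, OpenAI-style delta accumulation), not the actual `ParseSSEStream`; the `chunk` struct, `parseSSE`, and `demo` are hypothetical, and skipping malformed chunks is an assumed policy.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk mirrors only the fields of an OpenAI-style streaming chunk used here.
type chunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// parseSSE reads "data: ..." lines, accumulates delta content, and calls
// onChunk for every non-empty fragment. It stops at "data: [DONE]".
func parseSSE(r io.Reader, onChunk func(string)) (string, error) {
	var text strings.Builder
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long data lines
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators, comments, event:/id: fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			continue // assumed policy: skip malformed chunks, keep the stream
		}
		for _, ch := range c.Choices {
			if ch.Delta.Content == "" {
				continue
			}
			text.WriteString(ch.Delta.Content)
			if onChunk != nil {
				onChunk(ch.Delta.Content)
			}
		}
	}
	return text.String(), sc.Err()
}

const sample = `data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
`

// demo parses the sample stream and counts onChunk invocations.
func demo() (string, int, error) {
	calls := 0
	text, err := parseSSE(strings.NewReader(sample), func(string) { calls++ })
	return text, calls, err
}

func main() {
	text, calls, err := demo()
	fmt.Printf("text=%q chunks=%d err=%v\n", text, calls, err)
}
```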
Request Identification
Every request carries an X-Client-ID: nofx header (x402.go:473), allowing claw402 to identify the request source for logging and monitoring.
Model Routing
claw402ModelEndpoints maps user-friendly model names to API paths:
"deepseek" → "/api/v1/ai/deepseek/chat"
"gpt-5.4" → "/api/v1/ai/openai/chat/5.4"
"claude-opus" → "/api/v1/ai/anthropic/messages/opus"
"qwen-max" → "/api/v1/ai/qwen/chat/max"
// ... more
Anthropic endpoints (containing /anthropic/) automatically switch to the Messages API wire format.
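
A lookup along these lines would cover both the routing table and the wire-format switch. The map entries are copied from the list above; `resolveEndpoint` is a hypothetical helper, and returning an error for unknown models is an assumption about how misses are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Entries copied from the list above; the real table contains more models.
var claw402ModelEndpoints = map[string]string{
	"deepseek":    "/api/v1/ai/deepseek/chat",
	"gpt-5.4":     "/api/v1/ai/openai/chat/5.4",
	"claude-opus": "/api/v1/ai/anthropic/messages/opus",
	"qwen-max":    "/api/v1/ai/qwen/chat/max",
}

// resolveEndpoint returns the API path for a model name and reports whether
// the Anthropic Messages wire format applies (path contains "/anthropic/").
func resolveEndpoint(model string) (path string, anthropic bool, err error) {
	path, ok := claw402ModelEndpoints[model]
	if !ok {
		return "", false, fmt.Errorf("unknown model %q", model)
	}
	return path, strings.Contains(path, "/anthropic/"), nil
}

func main() {
	for _, m := range []string{"deepseek", "claude-opus", "unknown-model"} {
		path, anthropic, err := resolveEndpoint(m)
		fmt.Printf("%s: path=%q anthropic=%v err=%v\n", m, path, anthropic, err)
	}
}
```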
Server Implementation (claw402)
Core Problem: ginmw Is Incompatible with SSE
Coinbase's standard Gin middleware ginmw.PaymentMiddlewareFromConfig internally works as follows:
1. Wrap c.Writer with responseCapture (all writes go to buffer)
2. c.Next() — handler runs, SSE chunks all go into buffer
3. Settle payment after handler completes
4. Write buffered content to client only after successful settlement
Problems:
- SSE chunks are buffered — the client receives no data for minutes
- Cloudflare disconnects after 100 seconds → 520 error
- The handler can run for up to 5 minutes, so the settlement context expires before settlement starts
Solution: streamAwareX402Middleware
Dual-path design (internal/gateway/x402.go):
func streamAwareX402Middleware(streamServer, standardMW) {
    return func(c *gin.Context) {
        if !isStreamingBody(c) {
            standardMW(c) // Non-streaming → standard ginmw (battle-tested)
            return
        }
        // Streaming → custom path
    }
}
Non-Streaming Path
Delegates entirely to ginmw.PaymentMiddlewareFromConfig with no custom logic.
Streaming Path (Pre-Settlement)
1. isStreamingBody(c) — read body to check for {"stream": true}, restore body
2. streamServer.RequiresPayment(reqCtx) — does this route require payment?
3. streamServer.ProcessHTTPRequest() — verify X-Payment signature
4. handleStreamingPayment():
a. ProcessSettlement() — settle USDC on-chain (collect payment first)
b. c.Next() — pass to HandleAPIKeyStream
c. SSE chunks write directly to c.Writer (no responseCapture buffer)
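
Step 1 depends on reading the body without consuming it for later handlers. A minimal sketch of that read-and-restore pattern is shown below, written against plain `net/http` instead of Gin so that it runs with the standard library alone. The real `isStreamingBody` takes a `*gin.Context`, and its exact behavior is not shown in this document.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// isStreamingBody reports whether the JSON body contains "stream": true.
// It reads the whole body and then restores it so later handlers can read it.
func isStreamingBody(r *http.Request) bool {
	if r.Body == nil {
		return false
	}
	raw, err := io.ReadAll(r.Body)
	r.Body.Close()
	r.Body = io.NopCloser(bytes.NewReader(raw)) // restore for downstream handlers
	if err != nil {
		return false
	}
	var probe struct {
		Stream bool `json:"stream"`
	}
	if json.Unmarshal(raw, &probe) != nil {
		return false // not JSON, or "stream" is not a boolean
	}
	return probe.Stream
}

// demo returns the detection result and the body as seen after the check.
func demo(body string) (bool, string) {
	req := httptest.NewRequest(http.MethodPost, "/api/v1/ai/deepseek/chat", bytes.NewBufferString(body))
	streaming := isStreamingBody(req)
	rest, _ := io.ReadAll(req.Body)
	return streaming, string(rest)
}

func main() {
	fmt.Println(demo(`{"model":"deepseek","stream":true}`))
	fmt.Println(demo(`{"model":"deepseek"}`))
}
```

A production version would likely cap the body size (for example with `http.MaxBytesReader`) before buffering it in memory.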
Key differences:
| | Standard ginmw (non-streaming) | Custom path (streaming) |
|---|---|---|
| Settlement timing | After handler completes | Before handler starts |
| Response buffer | responseCapture buffers everything | No buffer, writes directly to client |
| Timeout risk | Slow handler causes context expiry | Settlement uses context.Background() |
| SSE compatible | No | Yes |
Billing Logic
x402 Protocol Flow
x402 is an HTTP 402 payment protocol proposed by Coinbase. Core roles:
- Resource Server (claw402) — provides paid APIs
- Client (NOFX) — consumer, holds an EVM wallet
- Facilitator (Coinbase CDP) — verifies signatures, executes on-chain settlement
Payment Signing (EIP-712)
Client signature type: USDC TransferWithAuthorization
1. Receive Payment-Required header from 402 response (base64 JSON)
2. Decode to get:
- scheme: "exact"
- network: "eip155:8453" (Base L2)
- amount: USDC amount (e.g., "3000" = $0.003)
- asset: USDC contract address
- payTo: claw402 recipient address
3. Sign with wallet private key using EIP-712, authorizing USDC transfer from user wallet to payTo
4. Place signature in X-Payment + Payment-Signature headers
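
Steps 1 and 2 can be illustrated with a small decoder. The flat JSON layout below is an assumption based only on the field names listed above, and the real header may nest these fields differently. The `asset` and `payTo` values in the demo are placeholders, and `decodePaymentRequired` and `usd` are hypothetical helpers. The conversion relies on USDC using 6 decimal places, which is why "3000" corresponds to $0.003.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"math/big"
)

// paymentRequirements holds the fields listed above.
// Assumption: they appear as top-level keys of the decoded JSON.
type paymentRequirements struct {
	Scheme  string `json:"scheme"`
	Network string `json:"network"`
	Amount  string `json:"amount"`
	Asset   string `json:"asset"`
	PayTo   string `json:"payTo"`
}

// decodePaymentRequired base64-decodes the header value and parses the JSON.
func decodePaymentRequired(header string) (paymentRequirements, error) {
	var pr paymentRequirements
	raw, err := base64.StdEncoding.DecodeString(header)
	if err != nil {
		return pr, fmt.Errorf("base64: %w", err)
	}
	if err := json.Unmarshal(raw, &pr); err != nil {
		return pr, fmt.Errorf("json: %w", err)
	}
	return pr, nil
}

// usd converts atomic USDC units (6 decimals) into a dollar string.
func usd(amount string) (string, error) {
	n, ok := new(big.Int).SetString(amount, 10)
	if !ok || n.Sign() < 0 {
		return "", fmt.Errorf("invalid amount %q", amount)
	}
	r := new(big.Rat).SetFrac(n, big.NewInt(1_000_000))
	return "$" + r.FloatString(6), nil
}

// demo encodes a sample header, decodes it again, and formats the price.
func demo() (paymentRequirements, string, error) {
	sample := `{"scheme":"exact","network":"eip155:8453","amount":"3000",` +
		`"asset":"0xPLACEHOLDER_USDC","payTo":"0xPLACEHOLDER_PAYTO"}`
	header := base64.StdEncoding.EncodeToString([]byte(sample))
	pr, err := decodePaymentRequired(header)
	if err != nil {
		return pr, "", err
	}
	price, err := usd(pr.Amount)
	return pr, price, err
}

func main() {
	pr, price, err := demo()
	fmt.Printf("%+v price=%s err=%v\n", pr, price, err)
}
```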
Pricing Models
Each AI model route has its own price configured in claw402:
| Mode | Description | Example |
|---|---|---|
| Fixed price | Specified directly via user_price field | $0.003 per request |
| Token-based dynamic pricing | Calculated from request token count | $0.001 per 1K tokens |
| Dispatch fallback | Default price for SDK-compatible routes | $0.01 per request |
// Fixed price
price := fmt.Sprintf("$%s", route.UserPrice)
// Dynamic pricing
price = DynamicPriceFunc(func(ctx, reqCtx) (Price, error) {
return resolveDynamicPrice(ctx, reqCtx, rule)
})
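
For the token-based mode, the arithmetic might look like the sketch below. The logic inside `resolveDynamicPrice` is not shown in this document, so everything here is an assumption: `dynamicPriceAtomic` and `formatUSD` are hypothetical, prices are kept as integer atomic USDC units (6 decimals), and partial units are rounded up.

```go
package main

import "fmt"

// dynamicPriceAtomic returns the price in atomic USDC units for a request of
// `tokens` tokens at `ratePer1K` atomic units per 1,000 tokens, rounding up.
// At $0.001 per 1K tokens, ratePer1K is 1000.
func dynamicPriceAtomic(tokens, ratePer1K int64) int64 {
	if tokens <= 0 {
		return 0
	}
	return (tokens*ratePer1K + 999) / 1000
}

// formatUSD renders atomic USDC units as a dollar amount with 6 decimals.
func formatUSD(atomic int64) string {
	return fmt.Sprintf("$%d.%06d", atomic/1_000_000, atomic%1_000_000)
}

func main() {
	const ratePer1K = 1000 // $0.001 per 1K tokens
	for _, tokens := range []int64{1, 2500, 120000} {
		fmt.Println(tokens, formatUSD(dynamicPriceAtomic(tokens, ratePer1K)))
	}
}
```

Integer atomic units avoid floating-point rounding drift and match the string form of the `amount` field in the payment requirements.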
Retry Logic and Double-Charge Prevention
const X402MaxPaymentRetries = 5
const X402RetryBaseWait = 3 * time.Second
- 5xx errors → retry with an increasing backoff (3s, 6s, 9s, ...), no re-signing (same payment authorization)
- Another 402 → previous signature expired, re-sign and retry (on-chain authorization auto-invalidates, no double charge)
- 4xx (non-402) → non-retryable, fail immediately
- The outer retry count is set to 1 (WithMaxRetries(1)) to prevent outer retries from causing duplicate payments
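
The status handling in the list above reduces to a small decision function. This sketch covers only the classification and the wait schedule. `retryAction`, `classify`, and `backoff` are hypothetical names, and the schedule assumes the wait is the attempt number times `X402RetryBaseWait`, which reproduces the 3s, 6s, 9s sequence.

```go
package main

import (
	"fmt"
	"time"
)

const (
	X402MaxPaymentRetries = 5
	X402RetryBaseWait     = 3 * time.Second
)

type retryAction int

const (
	actDone      retryAction = iota // success: hand the response to the caller
	actRetrySame                    // 5xx: wait, resend the same signed payment
	actReSign                       // 402 again: sign a new authorization, resend
	actFail                         // other 4xx: not retryable
)

// classify maps an HTTP status from the paid request to the next step.
func classify(status int) retryAction {
	switch {
	case status == 402:
		return actReSign
	case status >= 500:
		return actRetrySame
	case status >= 400:
		return actFail
	default:
		return actDone
	}
}

// backoff returns the wait before retry number `attempt` (1-based).
func backoff(attempt int) time.Duration {
	return time.Duration(attempt) * X402RetryBaseWait
}

func main() {
	for _, status := range []int{200, 402, 404, 503} {
		fmt.Printf("status %d -> action %d\n", status, classify(status))
	}
	for attempt := 1; attempt <= X402MaxPaymentRetries; attempt++ {
		fmt.Printf("retry %d waits %v\n", attempt, backoff(attempt))
	}
}
```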
Settlement Timing: Streaming vs Non-Streaming
| | Non-Streaming | Streaming |
|---|---|---|
| Settlement timing | After receiving full response | Before streaming begins |
| Risk | Low (content confirmed before charge) | Slightly higher (charge before seeing content) |
| Necessity | Standard mode | Must charge first, otherwise SSE is buffered |
Timeout Configuration
| Location | Timeout | Purpose |
|---|---|---|
| NOFX X402Timeout | 5 min | HTTP client overall timeout |
| NOFX x402StreamIdleTimeout | 90s | SSE idle disconnect (prevent hangs) |
| NOFX CallWithRequestStream idle | 60s | Idle timeout for non-x402 streaming |
| claw402 ResponseHeaderTimeout | 120s | Wait for first byte from AI upstream |
| claw402 streamingHTTP.Timeout | 0 (unlimited) | SSE stream can last indefinitely |
| claw402 standardMW WithTimeout | 10 min | Non-streaming ginmw overall timeout |
| claw402 x402PaymentTimeout | 30s | Payment verification/settlement timeout |
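
As an illustration of the two claw402 upstream settings, a Go HTTP client with a bounded header wait and no overall deadline could be built as follows. `newStreamingClient` is a hypothetical constructor; only the two timeout values come from the table.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newStreamingClient builds a client for SSE upstream calls: it waits at most
// 120s for the response headers and sets no overall deadline, so an
// established stream is never cut off by the client itself.
func newStreamingClient() (*http.Client, *http.Transport) {
	tr := &http.Transport{
		ResponseHeaderTimeout: 120 * time.Second,
	}
	client := &http.Client{
		Transport: tr,
		Timeout:   0, // 0 means no limit on the total request duration
	}
	return client, tr
}

func main() {
	client, tr := newStreamingClient()
	fmt.Println("overall timeout:", client.Timeout)
	fmt.Println("header timeout:", tr.ResponseHeaderTimeout)
}
```

Setting `http.Client.Timeout` to a non-zero value would also bound the time spent reading the body, which is why streaming relies on the idle watchdog instead of an overall deadline.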
SSE Fault Tolerance
TeeReader Dual Parsing
var bodyBuf bytes.Buffer
tee := io.TeeReader(resp.Body, &bodyBuf)
text, sseErr := ParseSSEStream(tee, onChunk, onLine)
if text != "" {
    return text, nil // SSE succeeded
}
// SSE yielded nothing → try JSON parsing on bodyBuf (server may have returned non-streaming JSON)
jsonText, _ := ParseMCPResponse(bodyBuf.Bytes())
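
A complete, runnable version of the same fallback pattern is sketched below. `sseText` is a deliberately tiny stand-in for `ParseSSEStream`, and the JSON shape with a `text` field is a placeholder for whatever `ParseMCPResponse` actually expects; both are assumptions made so that the example runs on its own.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// sseText concatenates the raw payload of every "data:" line except [DONE].
// Stand-in for ParseSSEStream; it does not decode OpenAI chunks.
func sseText(r io.Reader) (string, error) {
	var sb strings.Builder
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		if p := strings.TrimSpace(strings.TrimPrefix(line, "data:")); p != "[DONE]" {
			sb.WriteString(p)
		}
	}
	return sb.String(), sc.Err()
}

// readBody tries SSE first. If that yields no text, it re-parses the bytes
// captured by the TeeReader as a single JSON document.
func readBody(body io.Reader) (string, error) {
	var bodyBuf bytes.Buffer
	text, sseErr := sseText(io.TeeReader(body, &bodyBuf))
	if text != "" {
		return text, nil // SSE succeeded
	}
	var doc struct {
		Text string `json:"text"` // hypothetical non-streaming response shape
	}
	if err := json.Unmarshal(bodyBuf.Bytes(), &doc); err != nil || doc.Text == "" {
		if sseErr != nil {
			return "", sseErr
		}
		return "", fmt.Errorf("no SSE data and no JSON text in %d-byte body", bodyBuf.Len())
	}
	return doc.Text, nil
}

// demo runs readBody over an in-memory body.
func demo(s string) (string, error) {
	return readBody(strings.NewReader(s))
}

func main() {
	fmt.Println(demo("data: hello\n\ndata: [DONE]\n"))
	fmt.Println(demo(`{"text":"non-streaming reply"}`))
}
```

The TeeReader only captures bytes that the SSE parser actually read, so the fallback sees the full body only when the parser consumed the stream to EOF.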
Idle Timeout Watchdog
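go func() {
    t := time.NewTimer(90 * time.Second)
    defer t.Stop()
    for {
        select {
        case <-t.C:
            cancel() // timeout → cancel context → close TCP → body.Read() returns error
            return   // stop the watchdog once it has fired
        case <-resetCh:
            // received SSE line → stop, drain, and restart the timer
            if !t.Stop() {
                select {
                case <-t.C:
                default:
                }
            }
            t.Reset(90 * time.Second)
        }
    }
}()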
Every incoming SSE line resets the timer. If no data arrives for 90 seconds, the context is cancelled and the TCP connection is closed, preventing indefinite blocking.
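
A self-contained variant of the watchdog, with the timeout as a parameter so that it can be exercised in milliseconds, might look like this. `startIdleWatchdog` and `demo` are hypothetical names. The buffered reset channel and the extra `ctx.Done()` case, which lets the goroutine exit when the stream ends normally, are assumptions that go beyond the snippet above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// startIdleWatchdog cancels the context when reset is not called for `idle`.
// It returns the reset function; the goroutine exits when ctx is done.
func startIdleWatchdog(ctx context.Context, cancel context.CancelFunc, idle time.Duration) (reset func()) {
	resetCh := make(chan struct{}, 1)
	go func() {
		t := time.NewTimer(idle)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				cancel() // idle for too long: abort the in-flight read
				return
			case <-resetCh:
				if !t.Stop() {
					select { // drain a tick that fired concurrently
					case <-t.C:
					default:
					}
				}
				t.Reset(idle)
			case <-ctx.Done():
				return // stream finished or was cancelled elsewhere
			}
		}
	}()
	return func() {
		select {
		case resetCh <- struct{}{}:
		default: // a reset is already pending; dropping this one is safe
		}
	}
}

// demo simulates n SSE lines arriving every intervalMs milliseconds and
// reports whether the context was still alive after the last line.
func demo(idleMs, intervalMs, n int) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	reset := startIdleWatchdog(ctx, cancel, time.Duration(idleMs)*time.Millisecond)
	for i := 0; i < n; i++ {
		time.Sleep(time.Duration(intervalMs) * time.Millisecond)
		reset()
	}
	return ctx.Err() == nil
}

func main() {
	fmt.Println("steady stream survives:", demo(300, 20, 5))
	fmt.Println("stalled stream survives:", demo(30, 200, 2))
}
```

The non-blocking send in `reset` keeps the SSE read loop from stalling if the watchdog goroutine is momentarily busy.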
Related Files
NOFX (Client)
- mcp/payment/claw402.go: Claw402Client entry point
- mcp/payment/x402.go: x402 payment flow (DoX402Request, DoX402RequestStream, X402CallStream)
- mcp/payment/x402_sign.go: EIP-712 signing implementation
- mcp/client.go: ParseSSEStream, CallWithRequestStream
claw402 (Server)
- internal/gateway/x402.go: x402 middleware (streamAwareX402Middleware)
- internal/gateway/proxy/stream.go: SSE proxy (HandleAPIKeyStream)
- internal/config/: Route configuration (pricing, model mapping)