
# NOFXi Agent Memory And Planning Design

## Purpose

This document explains how the current NOFXi agent handles:

- short-term conversation memory
- durable task memory
- durable execution / planning state
- planner execution and replanning
- state reset and resume behavior

The implementation described here is primarily in:

- `agent/history.go`
- `agent/memory.go`
- `agent/execution_state.go`
- `agent/planner_runtime.go`
- `agent/agent.go`

## High-Level Model

The current agent uses three different layers of state:

1. `chatHistory`
Recent in-memory user/assistant turns for the live conversation.

2. `TaskState`
Durable summarized context that should survive beyond recent turns.

3. `ExecutionState`
Durable workflow state for the currently running or recently blocked plan.

These three layers serve different purposes and should not be treated as the same thing.

## State Layers

### 1. `chatHistory`

Defined in `agent/history.go`.

Role:

- stores recent `user` / `assistant` messages in memory
- keyed by `userID`
- used as short-term conversational context
- acts as the source material for later compression into `TaskState`

Characteristics:

- in-memory only
- capped by `maxTurns`
- cleared by `/clear`
- not suitable as durable truth

Typical contents:

- the last few user questions
- the last few assistant replies
- temporary conversational wording
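
As a rough illustration of the characteristics above, here is a minimal sketch of a per-user, capped, in-memory history. It is not the code in `agent/history.go`: the type `historyStore`, its methods, the string `userID`, and the reading of `maxTurns` as a cap on stored messages are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// chatMessage is a hypothetical shape for one stored turn.
type chatMessage struct {
	Role    string // "user" or "assistant"
	Content string
}

// historyStore is an illustrative stand-in for chatHistory:
// in-memory only, keyed by userID, capped at maxTurns messages per user.
type historyStore struct {
	mu       sync.Mutex
	maxTurns int
	byUser   map[string][]chatMessage
}

func newHistoryStore(maxTurns int) *historyStore {
	return &historyStore{maxTurns: maxTurns, byUser: make(map[string][]chatMessage)}
}

// Append adds a message and drops the oldest entries beyond the cap.
func (h *historyStore) Append(userID, role, content string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	msgs := append(h.byUser[userID], chatMessage{Role: role, Content: content})
	if len(msgs) > h.maxTurns {
		msgs = msgs[len(msgs)-h.maxTurns:]
	}
	h.byUser[userID] = msgs
}

// Get returns a copy so callers cannot mutate the stored slice.
func (h *historyStore) Get(userID string) []chatMessage {
	h.mu.Lock()
	defer h.mu.Unlock()
	return append([]chatMessage(nil), h.byUser[userID]...)
}

// Clear mirrors what /clear does for this layer.
func (h *historyStore) Clear(userID string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	delete(h.byUser, userID)
}

func main() {
	h := newHistoryStore(2)
	h.Append("u1", "user", "hello")
	h.Append("u1", "assistant", "hi")
	h.Append("u1", "user", "what is my goal?")
	fmt.Println(len(h.Get("u1"))) // prints 2: the oldest message was dropped
	h.Clear("u1")
	fmt.Println(len(h.Get("u1"))) // prints 0
}
```

Because nothing here is persisted, a process restart loses the history, which is why this layer is described as unsuitable for durable truth.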

### 2. `TaskState`

Defined in `agent/memory.go`.

Role:

- stores durable, structured, non-derivable context
- persisted through `system_config`
- injected into planning and reasoning prompts

Storage key:

- `agent_task_state_<userID>`

Fields:

- `CurrentGoal`
- `ActiveFlow`
- `OpenLoops`
- `ImportantFacts`
- `LastDecision`
- `UpdatedAt`

Intended contents:

- user goal that still matters across turns
- high-level unresolved issues that still matter across turns
- facts that tools cannot cheaply re-fetch
- latest important decision summary

Explicitly not intended for:

- step-level pending items such as "wait for API key"
- execution actions such as "call get_exchange_configs"
- live balances
- current positions
- current market prices
- mutable configuration availability

Those should be checked from tools at planning time instead of being trusted from old summaries.
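
To make the shape concrete, the following sketch shows one way a `TaskState` record could look and how its storage key is built. Only the field names and the `agent_task_state_<userID>` key format come from this document; the Go types, JSON tags, and the helpers `taskStateKey`, `encodeTaskState`, and `decodeTaskState` are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// TaskState mirrors the field names listed above. The types and JSON tags
// are assumptions for this sketch, not copied from agent/memory.go.
type TaskState struct {
	CurrentGoal    string    `json:"current_goal"`
	ActiveFlow     string    `json:"active_flow"`
	OpenLoops      []string  `json:"open_loops"`
	ImportantFacts []string  `json:"important_facts"`
	LastDecision   string    `json:"last_decision"`
	UpdatedAt      time.Time `json:"updated_at"`
}

// taskStateKey builds the system_config key in the documented form.
func taskStateKey(userID string) string {
	return "agent_task_state_" + userID
}

// encodeTaskState serializes the state for storage as a config value.
func encodeTaskState(ts TaskState) (string, error) {
	raw, err := json.Marshal(ts)
	return string(raw), err
}

// decodeTaskState parses a stored config value back into a TaskState.
func decodeTaskState(raw string) (TaskState, error) {
	var ts TaskState
	err := json.Unmarshal([]byte(raw), &ts)
	return ts, err
}

func main() {
	ts := TaskState{
		CurrentGoal:    "create a first trader",
		OpenLoops:      []string{"user has not decided on a strategy"},
		ImportantFacts: []string{"user prefers conservative settings"},
		UpdatedAt:      time.Now().UTC(),
	}
	raw, err := encodeTaskState(ts)
	if err != nil {
		panic(err)
	}
	fmt.Println(taskStateKey("42"))
	fmt.Println(raw)
}
```

Note that the example values are goals and stated preferences, not balances, positions, or config availability, in line with the exclusion list above.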

### 3. `ExecutionState`

Defined in `agent/execution_state.go`.

Role:

- stores the current execution workflow
- allows the agent to resume after `ask_user`
- persists plan steps, observations, and completion status

Storage key:

- `agent_execution_state_<userID>`

Fields:

- `SessionID`
- `UserID`
- `Goal`
- `Status`
- `PlanID`
- `Steps`
- `CurrentStepID`
- `Observations`
- `FinalAnswer`
- `LastError`
- `UpdatedAt`

This is the planner's working state, not a general memory store.
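
The sketch below illustrates how such a record might be saved and loaded under the documented key. The top-level field names follow the list above; everything else is hypothetical, including the `PlanStep` and `Observation` shapes, the helper names, and the `configStore` map that stands in for `system_config`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// PlanStep and Observation are assumed shapes; the real definitions live
// in agent/execution_state.go and may differ.
type PlanStep struct {
	ID     string `json:"id"`
	Type   string `json:"type"` // tool, reason, ask_user, respond
	Status string `json:"status"`
}

type Observation struct {
	StepID  string `json:"step_id"`
	Kind    string `json:"kind"`
	Content string `json:"content"`
}

// ExecutionState uses the documented field names with assumed types.
type ExecutionState struct {
	SessionID     string        `json:"session_id"`
	UserID        string        `json:"user_id"`
	Goal          string        `json:"goal"`
	Status        string        `json:"status"`
	PlanID        string        `json:"plan_id"`
	Steps         []PlanStep    `json:"steps"`
	CurrentStepID string        `json:"current_step_id"`
	Observations  []Observation `json:"observations"`
	FinalAnswer   string        `json:"final_answer"`
	LastError     string        `json:"last_error"`
	UpdatedAt     time.Time     `json:"updated_at"`
}

// configStore is a hypothetical in-memory stand-in for system_config.
type configStore map[string]string

func executionStateKey(userID string) string {
	return "agent_execution_state_" + userID
}

// saveExecutionState writes the state as JSON under the per-user key.
func saveExecutionState(store configStore, st ExecutionState) error {
	raw, err := json.Marshal(st)
	if err != nil {
		return err
	}
	store[executionStateKey(st.UserID)] = string(raw)
	return nil
}

// loadExecutionState reports found=false when the user has no stored state.
func loadExecutionState(store configStore, userID string) (st ExecutionState, found bool, err error) {
	raw, ok := store[executionStateKey(userID)]
	if !ok {
		return ExecutionState{}, false, nil
	}
	if err := json.Unmarshal([]byte(raw), &st); err != nil {
		return ExecutionState{}, false, err
	}
	return st, true, nil
}

func main() {
	store := configStore{}
	st := ExecutionState{UserID: "42", Goal: "create a trader", Status: "waiting_user",
		Steps: []PlanStep{{ID: "s1", Type: "ask_user", Status: "pending"}}, CurrentStepID: "s1"}
	if err := saveExecutionState(store, st); err != nil {
		panic(err)
	}
	loaded, found, err := loadExecutionState(store, "42")
	fmt.Println(found, err, loaded.Status, loaded.CurrentStepID)
}
```

One key per user is what makes the state single-track: saving a new workflow overwrites the previous one.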

## Data Flow

### Request Entry

Entry points:

- `HandleMessage(...)`
- `HandleMessageStream(...)`

Flow:

1. user message enters `agent`
2. slash commands and explicit direct branches are handled first
3. all other requests go into planner flow via `thinkAndAct(...)` / `thinkAndActStream(...)`

### Planner Flow

The planner pipeline in `agent/planner_runtime.go` is:

1. append user message into `chatHistory`
2. emit `planning` SSE event
3. load `ExecutionState`
4. optionally reset stale `ExecutionState`
5. optionally refresh dynamic configuration snapshots
6. create a fresh execution plan with the LLM
7. execute steps one by one
8. persist `ExecutionState` after important transitions
9. append assistant answer into `chatHistory`
10. maybe compress old conversation into `TaskState`

## Short-Term vs Durable Memory

### What lives in `chatHistory`

Good fits:

- raw recent messages
- conversational wording
- latest assistant phrasing

Bad fits:

- long-lived truths
- current external system state

### What lives in `TaskState`

Good fits:

- durable goal
- high-level unfinished work that remains relevant across turns
- important facts the user stated
- previous decisions and why they were made

Bad fits:

- pending steps inside the current plan
- execution-level reminders such as "wait for a field" or "call a tool"
- old conclusions about whether tools exist
- old conclusions about whether model/exchange config is present
- live operational state that can change outside the chat

### What lives in `ExecutionState`

Good fits:

- current plan steps
- observations from tool calls
- blocked-on-user-input status
- exact current workflow state
- step-level pending work and block reasons

Bad fits:

- evergreen user profile
- long-term semantic memory

## Planning Logic

### Plan Creation

`createExecutionPlan(...)` sends the following into the planner model:

- available tool definitions
- persistent preferences
- `TaskState` context
- `ExecutionState` JSON
- current user request

The planner must return JSON only, and each step must use one of these types:

- `tool`
- `reason`
- `ask_user`
- `respond`
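
Since the planner output is model-generated, it is worth validating before execution. The following is a minimal sketch of such a check; the JSON layout (`goal`, `steps`, `id`, `type`, `tool`, `question`) and the function `parsePlan` are assumptions, and only the four step types come from this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// planStep is an assumed wire format for one planner step.
type planStep struct {
	ID       string `json:"id"`
	Type     string `json:"type"`
	Tool     string `json:"tool,omitempty"`     // hypothetical: used by tool steps
	Question string `json:"question,omitempty"` // hypothetical: used by ask_user steps
}

type executionPlan struct {
	Goal  string     `json:"goal"`
	Steps []planStep `json:"steps"`
}

// allowedStepTypes lists the four step types the planner may emit.
var allowedStepTypes = map[string]bool{
	"tool": true, "reason": true, "ask_user": true, "respond": true,
}

// parsePlan rejects non-JSON output, empty plans, and unknown step types.
func parsePlan(raw string) (executionPlan, error) {
	var plan executionPlan
	if err := json.Unmarshal([]byte(raw), &plan); err != nil {
		return executionPlan{}, fmt.Errorf("planner output is not valid JSON: %w", err)
	}
	if len(plan.Steps) == 0 {
		return executionPlan{}, fmt.Errorf("plan has no steps")
	}
	for i, s := range plan.Steps {
		if !allowedStepTypes[s.Type] {
			return executionPlan{}, fmt.Errorf("step %d has unknown type %q", i, s.Type)
		}
	}
	return plan, nil
}

func main() {
	raw := `{"goal":"list exchanges","steps":[
		{"id":"s1","type":"tool","tool":"get_exchange_configs"},
		{"id":"s2","type":"respond"}]}`
	plan, err := parsePlan(raw)
	fmt.Println(len(plan.Steps), err)

	_, err = parsePlan(`{"steps":[{"id":"s1","type":"shell"}]}`)
	fmt.Println(err)
}
```

Rejecting unknown step types up front keeps the execution loop's dispatch simple, at the cost of failing the whole plan when the model drifts from the schema.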

### Step Execution

`executePlan(...)` executes the plan loop:

- `tool`
  call tool and append observation
- `reason`
  run reasoning sub-call and append observation
- `ask_user`
  save `waiting_user` state and return question
- `respond`
  generate final answer and mark completed

After each completed step, `replanAfterStep(...)` may:

- continue
- replace remaining steps
- ask user
- finish
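
The loop and the four replan outcomes can be sketched as follows. This is an illustration of the control flow only, not the body of `executePlan(...)` or `replanAfterStep(...)`: the types `step`, `execState`, and `decision`, the injected `callTool` and `replan` hooks, and the `maxSteps` budget are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

type step struct {
	Type string // "tool", "reason", "ask_user", or "respond"
	Text string // tool name, reasoning prompt, question, or answer
}

type execState struct {
	Status       string
	Observations []string
	FinalAnswer  string
	LastError    string
}

// decision is a hypothetical replan result.
type decision struct {
	Action string // "continue", "replace", "ask_user", or "finish"
	Steps  []step // new remaining steps when Action is "replace"
	Text   string // question or final answer
}

// maxSteps is an assumed safety budget so repeated replacement cannot loop forever.
const maxSteps = 20

func fail(st *execState, err error) string {
	st.Status, st.LastError = "failed", err.Error()
	return "step failed: " + err.Error()
}

// runPlan executes steps in order and returns the text shown to the user.
func runPlan(st *execState, steps []step,
	callTool func(name string) (string, error),
	replan func(st *execState, remaining []step) decision) string {

	st.Status = "running"
	for executed := 0; len(steps) > 0 && executed < maxSteps; executed++ {
		cur := steps[0]
		steps = steps[1:]
		switch cur.Type {
		case "tool":
			out, err := callTool(cur.Text)
			if err != nil {
				return fail(st, err)
			}
			st.Observations = append(st.Observations, "tool "+cur.Text+": "+out)
		case "reason":
			st.Observations = append(st.Observations, "reason: "+cur.Text)
		case "ask_user":
			st.Status = "waiting_user"
			return cur.Text
		case "respond":
			st.Status, st.FinalAnswer = "completed", cur.Text
			return cur.Text
		default:
			return fail(st, fmt.Errorf("unknown step type %q", cur.Type))
		}
		// Only tool and reason steps reach this point; give the planner a
		// chance to adjust the remaining steps.
		switch d := replan(st, steps); d.Action {
		case "replace":
			steps = d.Steps
		case "ask_user":
			st.Status = "waiting_user"
			return d.Text
		case "finish":
			st.Status, st.FinalAnswer = "completed", d.Text
			return d.Text
		}
	}
	return fail(st, errors.New("plan ended or step budget ran out before a respond step"))
}

func main() {
	st := &execState{}
	plan := []step{{"tool", "get_exchange_configs"}, {"respond", "Here are your exchanges."}}
	tool := func(name string) (string, error) { return "[]", nil }
	keepGoing := func(*execState, []step) decision { return decision{Action: "continue"} }
	fmt.Println(runPlan(st, plan, tool, keepGoing), "|", st.Status)
}
```

In a real runtime the state would also be persisted at the `waiting_user`, `completed`, and `failed` transitions; the sketch leaves persistence out to keep the control flow visible.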

## Resume Behavior

When `ExecutionState.Status == waiting_user`, the next user turn is treated as a reply to the pending question.

Current safeguards:

- latest asked question is extracted from the stored plan
- the user reply is appended as a `user_reply` observation
- planner prompt receives explicit `Resume context`

This makes short replies such as `是` ("yes") less likely to be misread as unrelated new intents.
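
The three safeguards can be sketched as a single resume helper. The types, the function names `latestQuestion` and `resumeFromReply`, and the exact wording of the resume context are assumptions; the `waiting_user` status, the `user_reply` observation kind, and the `Resume context` label come from this document.

```go
package main

import "fmt"

type planStep struct {
	Type     string
	Question string // hypothetical field holding the ask_user text
}

type observation struct {
	Kind    string
	Content string
}

type executionState struct {
	Status       string
	Steps        []planStep
	Observations []observation
}

// latestQuestion scans the stored plan backwards for the most recent ask_user step.
func latestQuestion(st *executionState) string {
	for i := len(st.Steps) - 1; i >= 0; i-- {
		if st.Steps[i].Type == "ask_user" {
			return st.Steps[i].Question
		}
	}
	return ""
}

// resumeFromReply treats the message as an answer to the pending question.
// It returns the resume context for the planner prompt, or ok=false when
// the state is not waiting on the user.
func resumeFromReply(st *executionState, reply string) (resumeContext string, ok bool) {
	if st.Status != "waiting_user" {
		return "", false
	}
	question := latestQuestion(st)
	st.Observations = append(st.Observations, observation{Kind: "user_reply", Content: reply})
	st.Status = "planning"
	return fmt.Sprintf("Resume context:\n- pending question: %s\n- user reply: %s", question, reply), true
}

func main() {
	st := &executionState{
		Status: "waiting_user",
		Steps:  []planStep{{Type: "tool"}, {Type: "ask_user", Question: "Create the trader now?"}},
	}
	ctx, ok := resumeFromReply(st, "是")
	fmt.Println(ok)
	fmt.Println(ctx)
}
```

Pairing the reply with the question it answers is what gives a one-character reply enough context for the planner to interpret it.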

## Dynamic State Refresh

Configuration and trader management requests are dynamic by nature. Their truth can change outside the current chat, for example:

- user configures exchange in the UI
- user adds model in another tab
- user creates trader elsewhere

Because of that, configuration/trader requests should not trust stale model conclusions.

Current protection in `planner_runtime.go`:

- detects config / trader intent with `isConfigOrTraderIntent(...)`
- omits `TaskState` context from the planner prompt for these requests (the stored `TaskState` itself is not deleted)
- refreshes `ExecutionState.Observations` with fresh snapshots from:
  - `toolGetModelConfigs(...)`
  - `toolGetExchangeConfigs(...)`
  - `toolListTraders(...)`

This makes the planner rely more on current system state and less on older narrative memory.
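
A minimal sketch of this protection follows. It is not the logic in `planner_runtime.go`: the keyword list, the function names `looksLikeConfigOrTraderIntent` and `preparePlannerInput`, the `snapshotSource` type that stands in for the three tool calls, and the choice to append snapshots after prior observations are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// configTraderKeywords is an assumed keyword list; the real heuristic may differ.
var configTraderKeywords = []string{"exchange", "model", "trader", "api key"}

// looksLikeConfigOrTraderIntent is an illustrative keyword-based detector.
func looksLikeConfigOrTraderIntent(msg string) bool {
	lower := strings.ToLower(msg)
	for _, kw := range configTraderKeywords {
		if strings.Contains(lower, kw) {
			return true
		}
	}
	return false
}

// snapshotSource is a hypothetical wrapper around a live-state tool call.
type snapshotSource struct {
	Name  string
	Fetch func() string
}

type plannerInput struct {
	TaskStateContext string
	Observations     []string
}

// preparePlannerInput drops the TaskState context and adds fresh snapshots
// when the request concerns configuration or traders.
func preparePlannerInput(msg, taskStateContext string, prior []string, sources []snapshotSource) plannerInput {
	in := plannerInput{
		TaskStateContext: taskStateContext,
		Observations:     append([]string(nil), prior...),
	}
	if !looksLikeConfigOrTraderIntent(msg) {
		return in
	}
	in.TaskStateContext = "" // ignore older narrative memory for this request
	for _, s := range sources {
		in.Observations = append(in.Observations, s.Name+": "+s.Fetch())
	}
	return in
}

func main() {
	sources := []snapshotSource{
		{Name: "model_configs", Fetch: func() string { return "[]" }},
		{Name: "exchange_configs", Fetch: func() string { return "[]" }},
		{Name: "traders", Fetch: func() string { return "[]" }},
	}
	in := preparePlannerInput("create a trader for me", "user had no exchange configured", nil, sources)
	fmt.Printf("%q %v\n", in.TaskStateContext, in.Observations)
}
```

A substring match like this will produce false positives (for example, "model" in an unrelated question), which is the weakness noted for heuristic intent detection.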

## Reset Strategy

The system currently resets or weakens stale execution state when:

- the user sends a retry-like phrase such as `再试` ("try again"), `继续` ("continue"), `try again`, or `continue`
- request is config / trader related and old execution state is failed / completed / waiting

Reset scope:

- `ExecutionState` may be cleared
- `TaskState` is not globally deleted, but it is intentionally ignored for config/trader planning
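
The two reset conditions can be expressed as one predicate. In this sketch the function name `shouldResetExecutionState`, the case-insensitive substring matching, and the status string `waiting_user` for "waiting" are assumptions; the phrases and the status set come from the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// retryPhrases are the retry-like phrases named in this document.
var retryPhrases = []string{"再试", "继续", "try again", "continue"}

// shouldResetExecutionState reports whether old execution state should be
// discarded before planning. configOrTrader is the result of the separate
// intent check.
func shouldResetExecutionState(msg, status string, configOrTrader bool) bool {
	lower := strings.ToLower(strings.TrimSpace(msg))
	for _, p := range retryPhrases {
		if strings.Contains(lower, p) {
			return true
		}
	}
	if configOrTrader {
		switch status {
		case "failed", "completed", "waiting_user":
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldResetExecutionState("再试", "failed", false))                // true
	fmt.Println(shouldResetExecutionState("add my exchange", "completed", true)) // true
	fmt.Println(shouldResetExecutionState("add my exchange", "running", true))   // false
}
```

Only `ExecutionState` is affected by this predicate; `TaskState` stays in storage and is merely left out of the prompt for config/trader planning.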

Manual reset:

- `/clear`

This clears:

- short-term chat history
- task state
- execution state

## Compression Design

`maybeCompressHistory(...)` moves older short-term chat content into `TaskState` when:

- recent message count exceeds the configured window
- estimated token count exceeds the threshold

Compression strategy:

1. keep recent conversation in `chatHistory`
2. summarize older turns into structured `TaskState`
3. persist new `TaskState`
4. replace `chatHistory` with recent slice

Important design rule:

- `TaskState` should keep durable context only
- it should not become a stale copy of mutable operational state
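
The four-step strategy can be sketched as below. This is not `maybeCompressHistory(...)` itself: the function signature, the rough four-characters-per-token estimate, the string summary returned by the injected `summarize` hook, and the choice to trigger when either threshold is exceeded are assumptions.

```go
package main

import "fmt"

type message struct {
	Role    string
	Content string
}

// estimateTokens uses a crude length-based estimate (an assumption).
func estimateTokens(msgs []message) int {
	total := 0
	for _, m := range msgs {
		total += len(m.Content)/4 + 1
	}
	return total
}

// maybeCompress keeps the newest keepRecent messages and summarizes the rest
// when the history exceeds maxMessages or maxTokens. It returns the retained
// slice, the summary of the older turns, and whether compression ran.
func maybeCompress(msgs []message, keepRecent, maxMessages, maxTokens int,
	summarize func(old []message) string) (recent []message, summary string, compressed bool) {

	overCount := len(msgs) > maxMessages
	overTokens := estimateTokens(msgs) > maxTokens
	if (!overCount && !overTokens) || len(msgs) <= keepRecent {
		return msgs, "", false
	}
	cut := len(msgs) - keepRecent
	summary = summarize(msgs[:cut])                // step 2: summarize older turns
	recent = append([]message(nil), msgs[cut:]...) // steps 1 and 4: keep the recent slice
	return recent, summary, true                   // caller persists the summary (step 3)
}

func main() {
	var msgs []message
	for i := 0; i < 6; i++ {
		msgs = append(msgs, message{Role: "user", Content: fmt.Sprintf("message %d", i)})
	}
	summarize := func(old []message) string {
		return fmt.Sprintf("summary of %d older messages", len(old))
	}
	recent, summary, ok := maybeCompress(msgs, 2, 4, 1000, summarize)
	fmt.Println(ok, len(recent), summary)
}
```

In the real flow the summarizer is expected to produce structured `TaskState` fields and to leave out mutable operational state, per the design rule above.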

## Current Architecture Diagram

```mermaid
flowchart TD
    U[User Message] --> A[HandleMessage / HandleMessageStream]
    A --> B{Direct command?}
    B -->|Yes| C[Direct branch or slash command]
    B -->|No| D[thinkAndAct / thinkAndActStream]

    D --> E[Append user turn to chatHistory]
    E --> F[Load ExecutionState]
    F --> G{waiting_user?}
    G -->|Yes| H[Attach user_reply observation]
    G -->|No| I[Create fresh ExecutionState]

    H --> J[Refresh dynamic snapshots if config/trader intent]
    I --> J
    J --> K[createExecutionPlan via LLM]
    K --> L[Execution plan]
    L --> M[executePlan loop]

    M --> N[tool step]
    M --> O[reason step]
    M --> P[ask_user step]
    M --> Q[respond step]

    N --> R[Append Observation]
    O --> R
    R --> S[replanAfterStep]
    S --> M

    P --> T[Persist waiting_user ExecutionState]
    T --> UQ[Return question to user]

    Q --> V[Persist completed ExecutionState]
    V --> W[Append assistant turn to chatHistory]
    W --> X[maybeCompressHistory]
    X --> Y[Persist TaskState]
    Y --> Z[Final response]
```

## Memory Relationship Diagram

```mermaid
flowchart LR
    CH[chatHistory\nin-memory\nrecent turns]
    TS[TaskState\npersisted summary\nsystem_config]
    ES[ExecutionState\npersisted workflow\nsystem_config]
    PL[Planner Prompt]

    CH -->|recent raw turns| PL
    ES -->|current workflow JSON| PL
    TS -->|durable structured context| PL

    CH -->|old turns compressed| TS
    PL -->|plan / observations / status| ES
```

## State Transition Diagram

```mermaid
stateDiagram-v2
    [*] --> planning
    planning --> running: plan created
    running --> waiting_user: ask_user step
    waiting_user --> planning: user replies
    running --> completed: respond step finished
    running --> failed: step error
    failed --> planning: retry / continue / config-trader reset
    completed --> planning: new relevant request or retry flow
```
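
The transitions in the diagram can also be encoded as a lookup table, which is useful for asserting that a status change is legal before persisting it. The status names and edges come from the diagram; the `allowedTransitions` table and `canTransition` helper are hypothetical and not part of the current code.

```go
package main

import "fmt"

// allowedTransitions encodes the edges of the state diagram above.
var allowedTransitions = map[string][]string{
	"planning":     {"running"},
	"running":      {"waiting_user", "completed", "failed"},
	"waiting_user": {"planning"},
	"failed":       {"planning"},
	"completed":    {"planning"},
}

// canTransition reports whether the diagram contains an edge from -> to.
func canTransition(from, to string) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("running", "waiting_user"))   // true
	fmt.Println(canTransition("waiting_user", "completed")) // false: must replan first
}
```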

## Known Design Tradeoffs

### Strengths

- separates short-term chat from durable task summary
- allows blocked flows to resume
- supports replanning after every meaningful step
- can recover from stale assumptions better for dynamic config/trader requests

### Weaknesses

- `TaskState` is still summary-driven, so summarization quality matters
- planner still depends on model compliance for some transitions
- `ExecutionState` is single-track per user, not multiple concurrent workflows
- config/trader intent detection is heuristic and keyword-based

## Practical Guidance

### When to trust `TaskState`

Trust it for:

- user intent continuity
- open loops
- durable facts

Do not trust it for:

- whether current exchange/model/trader config exists now
- whether a specific operational action is currently possible

### When to trust `ExecutionState`

Trust it for:

- current plan continuity
- exact blocked step
- latest observation chain

Do not trust it blindly when:

- user has changed configuration outside the chat
- the system capabilities changed after deployment

### When to fetch live state again

Always prefer fresh tool snapshots before answering about:

- existing model configs
- existing exchange configs
- existing traders
- whether trader creation can proceed

## Suggested Future Improvements

- add workflow versioning so capability changes invalidate stale `ExecutionState`
- separate `waiting_user_confirmation` from generic `waiting_user`
- introduce code-level handling for short confirmations such as `是` ("yes"), `好` ("OK"), and `继续` ("continue")
- move dynamic state refresh from a keyword heuristic to an explicit planner preflight stage
- support multiple concurrent execution sessions per user if needed