nofx/docs/architecture/AGENT_MEMORY_AND_PLANNING.md
lky-spec 3ca95b294d feat: port NOFXi agent module onto latest dev base (#1485)
* feat: integrate NOFXi agent into dev

* Enhance NOFXi agent workflow and diagnostics
2026-04-21 23:47:55 +08:00


# NOFXi Agent Memory And Planning Design
## Purpose
This document explains how the current NOFXi agent handles:
- short-term conversation memory
- durable task memory
- durable execution / planning state
- planner execution and replanning
- state reset and resume behavior
The implementation described here is primarily in:
- `agent/history.go`
- `agent/memory.go`
- `agent/execution_state.go`
- `agent/planner_runtime.go`
- `agent/agent.go`
## High-Level Model
The current agent uses three different layers of state:
1. `chatHistory`
   Recent in-memory user/assistant turns for the live conversation.
2. `TaskState`
   Durable summarized context that should survive beyond recent turns.
3. `ExecutionState`
   Durable workflow state for the currently running or recently blocked plan.
These three layers serve different purposes and should not be treated as interchangeable.
## State Layers
### 1. `chatHistory`
Defined in `agent/history.go`.
Role:
- stores recent `user` / `assistant` messages in memory
- keyed by `userID`
- used as short-term conversational context
- acts as the source material for later compression into `TaskState`
Characteristics:
- in-memory only
- capped by `maxTurns`
- cleared by `/clear`
- not suitable as durable truth
Typical contents:
- the last few user questions
- the last few assistant replies
- temporary conversational wording
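The characteristics above can be illustrated with a minimal sketch. This is not the code in `agent/history.go`: the `Message` type, the method names, the `string` type for `userID`, and the choice to count each message as one turn against `maxTurns` are all assumptions made for illustration.

```go
package main

import "sync"

// Message is one conversational turn; Role is "user" or "assistant".
type Message struct {
	Role    string
	Content string
}

// chatHistory is a hypothetical capped, in-memory store keyed by userID.
type chatHistory struct {
	mu       sync.Mutex
	maxTurns int
	byUser   map[string][]Message
}

func newChatHistory(maxTurns int) *chatHistory {
	if maxTurns < 1 {
		maxTurns = 1 // keep at least the latest message
	}
	return &chatHistory{maxTurns: maxTurns, byUser: make(map[string][]Message)}
}

// Append stores a message and drops the oldest ones beyond maxTurns.
func (h *chatHistory) Append(userID, role, content string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	msgs := append(h.byUser[userID], Message{Role: role, Content: content})
	if over := len(msgs) - h.maxTurns; over > 0 {
		// Copy the tail so the trimmed prefix can be garbage collected.
		msgs = append([]Message(nil), msgs[over:]...)
	}
	h.byUser[userID] = msgs
}

// Get returns a copy of the user's recent messages.
func (h *chatHistory) Get(userID string) []Message {
	h.mu.Lock()
	defer h.mu.Unlock()
	return append([]Message(nil), h.byUser[userID]...)
}

// Clear drops all history for the user, which is what `/clear` needs.
func (h *chatHistory) Clear(userID string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	delete(h.byUser, userID)
}
```

Because nothing here touches storage, a process restart loses every entry, which is why this layer is not suitable as durable truth.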
### 2. `TaskState`
Defined in `agent/memory.go`.
Role:
- stores durable, structured, non-derivable context
- persisted through `system_config`
- injected into planning and reasoning prompts
Storage key:
- `agent_task_state_<userID>`
Fields:
- `CurrentGoal`
- `ActiveFlow`
- `OpenLoops`
- `ImportantFacts`
- `LastDecision`
- `UpdatedAt`
Intended contents:
- user goal that still matters across turns
- high-level unresolved issues that still matter across turns
- facts that tools cannot cheaply re-fetch
- latest important decision summary
Explicitly not intended for:
- step-level pending items such as "wait for API key"
- execution actions such as "call get_exchange_configs"
- live balances
- current positions
- current market prices
- mutable configuration availability
Those should be checked from tools at planning time instead of being trusted from old summaries.
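As a rough sketch of the shape described above: only the field names and the storage key pattern come from this document. The Go types, the JSON tags, and the `configStore` map standing in for `system_config` are assumptions, and the real definitions in `agent/memory.go` may differ.

```go
package main

import (
	"encoding/json"
	"time"
)

// TaskState mirrors the documented field names; the Go types and JSON
// tags are assumptions made for this sketch.
type TaskState struct {
	CurrentGoal    string    `json:"current_goal"`
	ActiveFlow     string    `json:"active_flow"`
	OpenLoops      []string  `json:"open_loops"`
	ImportantFacts []string  `json:"important_facts"`
	LastDecision   string    `json:"last_decision"`
	UpdatedAt      time.Time `json:"updated_at"`
}

// taskStateKey builds the documented per-user storage key.
func taskStateKey(userID string) string {
	return "agent_task_state_" + userID
}

// configStore is a hypothetical in-memory stand-in for system_config.
type configStore map[string]string

// saveTaskState stamps UpdatedAt and stores the state as JSON.
func saveTaskState(store configStore, userID string, ts TaskState) error {
	ts.UpdatedAt = time.Now().UTC()
	raw, err := json.Marshal(ts)
	if err != nil {
		return err
	}
	store[taskStateKey(userID)] = string(raw)
	return nil
}

// loadTaskState returns the zero TaskState when nothing is stored yet.
func loadTaskState(store configStore, userID string) (TaskState, error) {
	var ts TaskState
	raw, ok := store[taskStateKey(userID)]
	if !ok {
		return ts, nil
	}
	err := json.Unmarshal([]byte(raw), &ts)
	return ts, err
}
```

Note that nothing in the struct holds balances, positions, or prices; keeping those out of the type is one way to enforce the "not intended for" list.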
### 3. `ExecutionState`
Defined in `agent/execution_state.go`.
Role:
- stores the current execution workflow
- allows the agent to resume after `ask_user`
- persists plan steps, observations, and completion status
Storage key:
- `agent_execution_state_<userID>`
Fields:
- `SessionID`
- `UserID`
- `Goal`
- `Status`
- `PlanID`
- `Steps`
- `CurrentStepID`
- `Observations`
- `FinalAnswer`
- `LastError`
- `UpdatedAt`
This is the planner's working state, not a general memory store.
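A sketch of how these fields might be laid out follows. The `ExecutionState` field names and the storage key pattern come from this document, and the status strings match the state transition diagram. The `PlanStep` and `Observation` shapes, the JSON tags, and the `CurrentStep` helper are assumptions and may not match `agent/execution_state.go`.

```go
package main

import "time"

// Status values as they appear in the state transition diagram.
const (
	StatusPlanning    = "planning"
	StatusRunning     = "running"
	StatusWaitingUser = "waiting_user"
	StatusCompleted   = "completed"
	StatusFailed      = "failed"
)

// PlanStep is an assumed shape for one entry in Steps.
type PlanStep struct {
	ID   string `json:"id"`
	Type string `json:"type"` // tool | reason | ask_user | respond
	Text string `json:"text,omitempty"`
}

// Observation is an assumed shape for one entry in Observations.
type Observation struct {
	StepID  string `json:"step_id"`
	Kind    string `json:"kind"` // e.g. "user_reply"
	Content string `json:"content"`
}

// ExecutionState uses the documented field names with assumed types.
type ExecutionState struct {
	SessionID     string        `json:"session_id"`
	UserID        string        `json:"user_id"`
	Goal          string        `json:"goal"`
	Status        string        `json:"status"`
	PlanID        string        `json:"plan_id"`
	Steps         []PlanStep    `json:"steps"`
	CurrentStepID string        `json:"current_step_id"`
	Observations  []Observation `json:"observations"`
	FinalAnswer   string        `json:"final_answer"`
	LastError     string        `json:"last_error"`
	UpdatedAt     time.Time     `json:"updated_at"`
}

// executionStateKey builds the documented per-user storage key.
func executionStateKey(userID string) string {
	return "agent_execution_state_" + userID
}

// CurrentStep returns the step named by CurrentStepID, or nil if none matches.
func (e *ExecutionState) CurrentStep() *PlanStep {
	for i := range e.Steps {
		if e.Steps[i].ID == e.CurrentStepID {
			return &e.Steps[i]
		}
	}
	return nil
}
```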
## Data Flow
### Request Entry
Entry points:
- `HandleMessage(...)`
- `HandleMessageStream(...)`
Flow:
1. user message enters `agent`
2. slash commands and explicit direct branches are handled first
3. all other requests go into planner flow via `thinkAndAct(...)` / `thinkAndActStream(...)`
### Planner Flow
The planner pipeline in `agent/planner_runtime.go` is:
1. append user message into `chatHistory`
2. emit `planning` SSE event
3. load `ExecutionState`
4. optionally reset stale `ExecutionState`
5. optionally refresh dynamic configuration snapshots
6. create a fresh execution plan with the LLM
7. execute steps one by one
8. persist `ExecutionState` after important transitions
9. append assistant answer into `chatHistory`
10. maybe compress old conversation into `TaskState`
## Short-Term vs Durable Memory
### What lives in `chatHistory`
Good fits:
- raw recent messages
- conversational wording
- latest assistant phrasing
Bad fits:
- long-lived truths
- current external system state
### What lives in `TaskState`
Good fits:
- durable goal
- high-level unfinished work that remains relevant across turns
- important facts the user stated
- previous decisions and why they were made
Bad fits:
- pending steps inside the current plan
- execution-level reminders such as "wait for a field" or "call a tool"
- old conclusions about whether tools exist
- old conclusions about whether model/exchange config is present
- live operational state that can change outside the chat
### What lives in `ExecutionState`
Good fits:
- current plan steps
- observations from tool calls
- blocked-on-user-input status
- exact current workflow state
- step-level pending work and block reasons
Bad fits:
- evergreen user profile
- long-term semantic memory
## Planning Logic
### Plan Creation
`createExecutionPlan(...)` sends the following into the planner model:
- available tool definitions
- persistent preferences
- `TaskState` context
- `ExecutionState` JSON
- current user request
The planner must return JSON only, and each step must use one of these types:
- `tool`
- `reason`
- `ask_user`
- `respond`
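Since the planner is required to return JSON only, the caller can reject anything else before execution starts. The following is a minimal validation sketch, not the project's parser: the `steps` / `id` / `type` JSON layout and the `parsePlan` name are assumptions, and only the four step types come from this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// planStep and executionPlan describe an assumed JSON layout.
type planStep struct {
	ID   string `json:"id"`
	Type string `json:"type"`
}

type executionPlan struct {
	Steps []planStep `json:"steps"`
}

// allowedStepTypes lists the four documented step types.
var allowedStepTypes = map[string]bool{
	"tool":     true,
	"reason":   true,
	"ask_user": true,
	"respond":  true,
}

// parsePlan rejects output that is not pure JSON, has no steps,
// or contains a step type outside the documented set.
func parsePlan(raw string) (executionPlan, error) {
	var plan executionPlan
	// json.Unmarshal fails on leading or trailing prose around the object.
	if err := json.Unmarshal([]byte(raw), &plan); err != nil {
		return plan, fmt.Errorf("planner output is not pure JSON: %w", err)
	}
	if len(plan.Steps) == 0 {
		return plan, fmt.Errorf("plan has no steps")
	}
	for i, s := range plan.Steps {
		if !allowedStepTypes[s.Type] {
			return plan, fmt.Errorf("step %d has unsupported type %q", i, s.Type)
		}
	}
	return plan, nil
}
```

A failed parse is a natural point to re-prompt the model or mark the execution as failed, rather than trying to run a partially understood plan.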
### Step Execution
`executePlan(...)` executes the plan loop:
- `tool`
  call tool and append observation
- `reason`
  run reasoning sub-call and append observation
- `ask_user`
  save `waiting_user` state and return question
- `respond`
  generate final answer and mark completed
After each completed step, `replanAfterStep(...)` may:
- continue
- replace remaining steps
- ask user
- finish
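The loop and its replanning hook can be sketched as below. This is a simplified illustration under stated assumptions, not `executePlan(...)` itself: the `runner` struct with injected callbacks, the `maxSteps` guard, and the convention that the replan hook returns `nil` to continue or a replacement tail (which may be a single `ask_user` or `respond` step) are all hypothetical.

```go
package main

import "fmt"

type step struct {
	Type  string // tool | reason | ask_user | respond
	Input string // tool name, reasoning prompt, or question
}

type execState struct {
	Status       string
	Steps        []step
	Observations []string
	FinalAnswer  string
}

// runner holds injected dependencies so the loop can run without an LLM.
type runner struct {
	callTool func(name string) (string, error)
	reason   func(prompt string, obs []string) string
	respond  func(obs []string) string
	// replan returns nil to continue, or new steps that replace the remainder.
	replan func(s *execState, remaining []step) []step
}

const maxSteps = 32 // hypothetical guard against endless replanning

// executePlan runs steps in order and returns the text to send to the user.
func (r runner) executePlan(s *execState) (string, error) {
	s.Status = "running"
	for i := 0; i < len(s.Steps) && i < maxSteps; i++ {
		st := s.Steps[i]
		switch st.Type {
		case "tool":
			out, err := r.callTool(st.Input)
			if err != nil {
				s.Status = "failed"
				return "", fmt.Errorf("tool %q: %w", st.Input, err)
			}
			s.Observations = append(s.Observations, out)
		case "reason":
			s.Observations = append(s.Observations, r.reason(st.Input, s.Observations))
		case "ask_user":
			s.Status = "waiting_user" // caller persists state, then returns the question
			return st.Input, nil
		case "respond":
			s.FinalAnswer = r.respond(s.Observations)
			s.Status = "completed"
			return s.FinalAnswer, nil
		default:
			s.Status = "failed"
			return "", fmt.Errorf("unknown step type %q", st.Type)
		}
		if r.replan != nil {
			if next := r.replan(s, s.Steps[i+1:]); next != nil {
				s.Steps = append(append([]step(nil), s.Steps[:i+1]...), next...)
			}
		}
	}
	s.Status = "failed"
	return "", fmt.Errorf("plan ended or hit the step limit without a respond step")
}
```

Replanning only runs after `tool` and `reason` steps in this sketch, because `ask_user` and `respond` both end the current pass through the loop.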
## Resume Behavior
When `ExecutionState.Status == waiting_user`, the next user turn is treated as a reply to the pending question.
Current safeguards:
- latest asked question is extracted from the stored plan
- the user reply is appended as a `user_reply` observation
- planner prompt receives explicit `Resume context`
This reduces how often short replies such as `是` ("yes") are misread as unrelated new intents.
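The three safeguards can be sketched together. The types, the `pendingQuestion` lookup order, and the exact wording of the resume context string are assumptions for illustration; only the `waiting_user` status, the `user_reply` observation kind, and the `Resume context` label come from this document.

```go
package main

import "fmt"

type planStep struct {
	ID       string
	Type     string
	Question string // set for ask_user steps
}

type observation struct {
	Kind    string
	Content string
}

type executionState struct {
	Status        string
	Steps         []planStep
	CurrentStepID string
	Observations  []observation
}

// pendingQuestion prefers the ask_user step named by CurrentStepID and
// falls back to the last ask_user step in the stored plan.
func pendingQuestion(es *executionState) string {
	last := ""
	for _, s := range es.Steps {
		if s.Type != "ask_user" {
			continue
		}
		if s.ID == es.CurrentStepID {
			return s.Question
		}
		last = s.Question
	}
	return last
}

// resumeFromReply records the reply as a user_reply observation and builds
// the resume context for the planner prompt. It reports false when no
// question is pending, so the caller can treat the message as a new request.
func resumeFromReply(es *executionState, reply string) (string, bool) {
	if es == nil || es.Status != "waiting_user" {
		return "", false
	}
	question := pendingQuestion(es)
	es.Observations = append(es.Observations, observation{Kind: "user_reply", Content: reply})
	es.Status = "planning"
	ctx := fmt.Sprintf("Resume context: the agent asked %q and the user replied %q.", question, reply)
	return ctx, true
}
```

Pairing the question with the reply in one sentence gives the planner the missing referent for a one-word answer.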
## Dynamic State Refresh
Configuration and trader management requests are dynamic by nature. Their truth can change outside the current chat, for example:
- user configures exchange in the UI
- user adds model in another tab
- user creates trader elsewhere
Because of that, configuration/trader requests should not trust stale model conclusions.
Current protection in `planner_runtime.go`:
- detects config / trader intent with `isConfigOrTraderIntent(...)`
- clears `TaskState` context from the planner prompt for these requests
- refreshes `ExecutionState.Observations` with fresh snapshots from:
  - `toolGetModelConfigs(...)`
  - `toolGetExchangeConfigs(...)`
  - `toolListTraders(...)`
This makes the planner rely more on current system state and less on older narrative memory.
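A minimal sketch of this protection follows. The keyword list, the `snapshotSource` type, and the `plannerInputs` helper are hypothetical, and this version replaces old observations with the fresh snapshots, which may differ from how `planner_runtime.go` merges them.

```go
package main

import "strings"

// configTraderKeywords is a hypothetical list; the real heuristic may differ.
var configTraderKeywords = []string{
	"exchange", "model", "trader", "api key", "config",
	"交易所", "模型", "交易员", "配置",
}

// isConfigOrTraderIntent is a keyword match on the lowercased message.
func isConfigOrTraderIntent(msg string) bool {
	m := strings.ToLower(msg)
	for _, kw := range configTraderKeywords {
		if strings.Contains(m, kw) {
			return true
		}
	}
	return false
}

// snapshotSource wraps one live lookup, such as listing exchange configs.
type snapshotSource struct {
	Name  string
	Fetch func() (string, error)
}

// plannerInputs returns the TaskState context and observations to hand to
// the planner. For config/trader requests it drops the TaskState context
// and swaps in fresh snapshots; otherwise both pass through unchanged.
func plannerInputs(msg, taskContext string, observations []string, sources []snapshotSource) (string, []string) {
	if !isConfigOrTraderIntent(msg) {
		return taskContext, observations
	}
	fresh := make([]string, 0, len(sources))
	for _, src := range sources {
		out, err := src.Fetch()
		if err != nil {
			out = "unavailable: " + err.Error()
		}
		fresh = append(fresh, src.Name+": "+out)
	}
	return "", fresh
}
```

A failed fetch is recorded as an observation rather than dropped, so the planner can tell "no configs exist" apart from "the lookup failed".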
## Reset Strategy
The system currently resets or weakens stale execution state when:
- user says retry-like phrases such as `再试`, `继续`, `try again`, `continue`
- request is config / trader related and old execution state is failed / completed / waiting
Reset scope:
- `ExecutionState` may be cleared
- `TaskState` is not globally deleted, but it is intentionally ignored for config/trader planning
Manual reset:
- `/clear`
This clears:
- short-term chat history
- task state
- execution state
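The reset rules can be expressed as a small decision function. This sketch assumes substring matching for the retry phrases and a `userStores` struct that groups the layers; both are illustrative, and the storage keys are the ones documented earlier.

```go
package main

import "strings"

// retryPhrases are the examples given in this document.
var retryPhrases = []string{"再试", "继续", "try again", "continue"}

// isRetryPhrase uses substring matching, which is an assumption.
func isRetryPhrase(msg string) bool {
	m := strings.ToLower(strings.TrimSpace(msg))
	for _, p := range retryPhrases {
		if strings.Contains(m, p) {
			return true
		}
	}
	return false
}

// shouldResetExecution reports whether stored ExecutionState should be
// discarded before planning. An empty status means nothing is stored.
func shouldResetExecution(status, msg string, configOrTraderIntent bool) bool {
	if status == "" {
		return false
	}
	if isRetryPhrase(msg) {
		return true
	}
	if !configOrTraderIntent {
		return false
	}
	switch status {
	case "failed", "completed", "waiting_user":
		return true
	}
	return false
}

// userStores is a hypothetical grouping of the three layers for one user.
type userStores struct {
	History []string          // stand-in for chatHistory
	Config  map[string]string // stand-in for system_config
}

// clearAll mirrors `/clear`: it wipes all three layers for the user.
func clearAll(u *userStores, userID string) {
	u.History = nil
	delete(u.Config, "agent_task_state_"+userID)
	delete(u.Config, "agent_execution_state_"+userID)
}
```

Only `clearAll` removes `TaskState`; the automatic path leaves it stored and relies on the planner ignoring it for config/trader requests.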
## Compression Design
`maybeCompressHistory(...)` moves older short-term chat content into `TaskState` when:
- recent message count exceeds the configured window
- estimated token count exceeds the threshold
Compression strategy:
1. keep recent conversation in `chatHistory`
2. summarize older turns into structured `TaskState`
3. persist new `TaskState`
4. replace `chatHistory` with recent slice
Important design rule:
- `TaskState` should keep durable context only
- it should not become a stale copy of mutable operational state
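The compression steps can be sketched as follows. Several details are assumptions rather than the behavior of `maybeCompressHistory(...)`: either trigger alone starts compression, tokens are estimated at roughly one per four characters, and the summarizer is injected as a callback so the example runs without an LLM.

```go
package main

import "unicode/utf8"

type message struct {
	Role    string
	Content string
}

// estimateTokens is a crude stand-in: about one token per four characters.
func estimateTokens(msgs []message) int {
	total := 0
	for _, m := range msgs {
		total += utf8.RuneCountInString(m.Content)/4 + 1
	}
	return total
}

// compressLimits holds hypothetical thresholds.
type compressLimits struct {
	MaxMessages int // trigger: more messages than this
	MaxTokens   int // trigger: estimated tokens above this
	KeepRecent  int // messages left in chatHistory afterwards
}

// maybeCompress returns the recent slice to keep, a summary of the older
// turns, and whether compression happened. The caller would merge the
// summary into TaskState, persist it, and replace chatHistory.
func maybeCompress(msgs []message, lim compressLimits, summarize func(old []message) string) ([]message, string, bool) {
	overCount := len(msgs) > lim.MaxMessages
	overTokens := estimateTokens(msgs) > lim.MaxTokens
	cut := len(msgs) - lim.KeepRecent
	if (!overCount && !overTokens) || cut <= 0 {
		return msgs, "", false
	}
	if cut > len(msgs) {
		cut = len(msgs) // guards against a negative KeepRecent
	}
	summary := summarize(msgs[:cut])
	recent := append([]message(nil), msgs[cut:]...)
	return recent, summary, true
}
```

The design rule above applies to the summarizer's prompt: it should extract goals, open loops, and stated facts, and skip balances, positions, or config availability mentioned in the old turns.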
## Current Architecture Diagram
```mermaid
flowchart TD
U[User Message] --> A[HandleMessage / HandleMessageStream]
A --> B{Direct command?}
B -->|Yes| C[Direct branch or slash command]
B -->|No| D[thinkAndAct / thinkAndActStream]
D --> E[Append user turn to chatHistory]
D --> F[Load ExecutionState]
F --> G{waiting_user?}
G -->|Yes| H[Attach user_reply observation]
G -->|No| I[Create fresh ExecutionState]
H --> J[Refresh dynamic snapshots if config/trader intent]
I --> J
J --> K[createExecutionPlan via LLM]
K --> L[Execution plan]
L --> M[executePlan loop]
M --> N[tool step]
M --> O[reason step]
M --> P[ask_user step]
M --> Q[respond step]
N --> R[Append Observation]
O --> R
R --> S[replanAfterStep]
S --> M
P --> T[Persist waiting_user ExecutionState]
T --> UQ[Return question to user]
Q --> V[Persist completed ExecutionState]
V --> W[Append assistant turn to chatHistory]
W --> X[maybeCompressHistory]
X --> Y[Persist TaskState]
Y --> Z[Final response]
```
## Memory Relationship Diagram
```mermaid
flowchart LR
CH[chatHistory\nin-memory\nrecent turns]
TS[TaskState\npersisted summary\nsystem_config]
ES[ExecutionState\npersisted workflow\nsystem_config]
PL[Planner Prompt]
CH -->|recent raw turns| PL
ES -->|current workflow JSON| PL
TS -->|durable structured context| PL
CH -->|old turns compressed| TS
PL -->|plan / observations / status| ES
```
## State Transition Diagram
```mermaid
stateDiagram-v2
[*] --> planning
planning --> running: plan created
running --> waiting_user: ask_user step
waiting_user --> planning: user replies
running --> completed: respond step finished
running --> failed: step error
failed --> planning: retry / continue / config-trader reset
completed --> planning: new relevant request or retry flow
```
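The transitions in the diagram can be encoded as a lookup table and checked before persisting a status change. This is a sketch for illustration; the `allowedTransitions` and `canTransition` names are hypothetical, and the document does not say whether the agent validates transitions this way.

```go
package main

// allowedTransitions encodes the edges of the state diagram above.
var allowedTransitions = map[string][]string{
	"planning":     {"running"},
	"running":      {"waiting_user", "completed", "failed"},
	"waiting_user": {"planning"},
	"failed":       {"planning"},
	"completed":    {"planning"},
}

// canTransition reports whether the diagram has an edge from -> to.
func canTransition(from, to string) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```

For example, `canTransition("waiting_user", "running")` is false: a user reply always goes back through `planning` first.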
## Known Design Tradeoffs
### Strengths
- separates short-term chat from durable task summary
- allows blocked flows to resume
- supports replanning after every meaningful step
- recovers better from stale assumptions on dynamic config/trader requests
### Weaknesses
- `TaskState` is still summary-driven, so summarization quality matters
- planner still depends on model compliance for some transitions
- `ExecutionState` is single-track per user, not multiple concurrent workflows
- config/trader intent detection is heuristic and keyword-based
## Practical Guidance
### When to trust `TaskState`
Trust it for:
- user intent continuity
- open loops
- durable facts
Do not trust it for:
- whether current exchange/model/trader config exists now
- whether a specific operational action is currently possible
### When to trust `ExecutionState`
Trust it for:
- current plan continuity
- exact blocked step
- latest observation chain
Do not trust it blindly when:
- user has changed configuration outside the chat
- the system capabilities changed after deployment
### When to fetch live state again
Always prefer fresh tool snapshots before answering about:
- existing model configs
- existing exchange configs
- existing traders
- whether trader creation can proceed
## Suggested Future Improvements
- add workflow versioning so capability changes invalidate stale `ExecutionState`
- separate `waiting_user_confirmation` from generic `waiting_user`
- introduce code-level handling for short confirmations such as `是` ("yes"), `好` ("OK"), `继续` ("continue")
- move dynamic state refresh from heuristic to explicit planner preflight stage
- support multiple concurrent execution sessions per user if needed