nofx/docs/architecture/AGENT_MEMORY_AND_PLANNING.md
lky-spec 3ca95b294d feat: port NOFXi agent module onto latest dev base (#1485)
* feat: integrate NOFXi agent into dev

* Enhance NOFXi agent workflow and diagnostics
2026-04-21 23:47:55 +08:00

NOFXi Agent Memory And Planning Design

Purpose

This document explains how the current NOFXi agent handles:

  • short-term conversation memory
  • durable task memory
  • durable execution / planning state
  • planner execution and replanning
  • state reset and resume behavior

The implementation described here is primarily in:

  • agent/history.go
  • agent/memory.go
  • agent/execution_state.go
  • agent/planner_runtime.go
  • agent/agent.go

High-Level Model

The current agent uses three different layers of state:

  1. chatHistory: recent in-memory user/assistant turns for the live conversation.

  2. TaskState: durable summarized context that should survive beyond recent turns.

  3. ExecutionState: durable workflow state for the currently running or recently blocked plan.

These three layers serve different purposes and should not be treated as the same thing.

State Layers

1. chatHistory

Defined in agent/history.go.

Role:

  • stores recent user / assistant messages in memory
  • keyed by userID
  • used as short-term conversational context
  • acts as the source material for later compression into TaskState

Characteristics:

  • in-memory only
  • capped by maxTurns
  • cleared by /clear
  • not suitable as durable truth

Typical contents:

  • the last few user questions
  • the last few assistant replies
  • temporary conversational wording
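
The properties above (per-user, in-memory, capped, cleared on /clear) can be sketched as follows. This is a minimal illustration, not the code in agent/history.go: the History and Message types, the int64 user ID, and the reading of one "turn" as two messages are all assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is one user or assistant message (hypothetical shape).
type Message struct {
	Role    string // "user" or "assistant"
	Content string
}

// History keeps recent messages per user in memory only.
type History struct {
	mu       sync.Mutex
	maxTurns int
	byUser   map[int64][]Message
}

func NewHistory(maxTurns int) *History {
	return &History{maxTurns: maxTurns, byUser: make(map[int64][]Message)}
}

// Append adds a message and drops the oldest ones beyond the cap.
// Assumption: one turn is a user message plus an assistant reply,
// so the cap is 2*maxTurns messages.
func (h *History) Append(userID int64, m Message) {
	h.mu.Lock()
	defer h.mu.Unlock()
	msgs := append(h.byUser[userID], m)
	if limit := 2 * h.maxTurns; len(msgs) > limit {
		msgs = msgs[len(msgs)-limit:]
	}
	h.byUser[userID] = msgs
}

// Get returns a copy so callers cannot mutate the stored slice.
func (h *History) Get(userID int64) []Message {
	h.mu.Lock()
	defer h.mu.Unlock()
	return append([]Message(nil), h.byUser[userID]...)
}

// Clear drops everything for one user, as /clear would.
func (h *History) Clear(userID int64) {
	h.mu.Lock()
	defer h.mu.Unlock()
	delete(h.byUser, userID)
}

func main() {
	h := NewHistory(2)
	for i := 1; i <= 6; i++ {
		h.Append(42, Message{Role: "user", Content: fmt.Sprintf("msg %d", i)})
	}
	fmt.Println(len(h.Get(42))) // 4: only the newest 2*maxTurns messages remain
	h.Clear(42)
	fmt.Println(len(h.Get(42))) // 0
}
```

Because nothing here touches storage, a process restart loses the history, which is why it is not suitable as durable truth.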

2. TaskState

Defined in agent/memory.go.

Role:

  • stores durable, structured, non-derivable context
  • persisted through system_config
  • injected into planning and reasoning prompts

Storage key:

  • agent_task_state_<userID>

Fields:

  • CurrentGoal
  • ActiveFlow
  • OpenLoops
  • ImportantFacts
  • LastDecision
  • UpdatedAt

Intended contents:

  • user goal that still matters across turns
  • high-level unresolved issues that still matter across turns
  • facts that tools cannot cheaply re-fetch
  • latest important decision summary

Explicitly not intended for:

  • step-level pending items such as "wait for API key"
  • execution actions such as "call get_exchange_configs"
  • live balances
  • current positions
  • current market prices
  • mutable configuration availability

Those should be checked from tools at planning time instead of being trusted from old summaries.
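
As an illustration of the record described above, the sketch below declares the listed fields and derives the storage key. The field types, JSON tags, int64 user ID, and helper names are assumptions; only the field names and the key pattern come from this document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// TaskState mirrors the field list above. Types and JSON tags are guesses.
type TaskState struct {
	CurrentGoal    string    `json:"current_goal"`
	ActiveFlow     string    `json:"active_flow"`
	OpenLoops      []string  `json:"open_loops"`
	ImportantFacts []string  `json:"important_facts"`
	LastDecision   string    `json:"last_decision"`
	UpdatedAt      time.Time `json:"updated_at"`
}

// taskStateKey builds the system_config key agent_task_state_<userID>.
func taskStateKey(userID int64) string {
	return fmt.Sprintf("agent_task_state_%d", userID)
}

// encodeTaskState and decodeTaskState are hypothetical helpers that
// serialize the record as JSON for storage.
func encodeTaskState(ts TaskState) (string, error) {
	raw, err := json.Marshal(ts)
	return string(raw), err
}

func decodeTaskState(raw string) (TaskState, error) {
	var ts TaskState
	err := json.Unmarshal([]byte(raw), &ts)
	return ts, err
}

func main() {
	ts := TaskState{
		CurrentGoal:    "set up a first trader",
		OpenLoops:      []string{"user has not chosen a strategy yet"},
		ImportantFacts: []string{"user said they want low leverage"},
		LastDecision:   "start with a single exchange",
		UpdatedAt:      time.Now().UTC(),
	}
	raw, err := encodeTaskState(ts)
	if err != nil {
		panic(err)
	}
	fmt.Println(taskStateKey(42))
	fmt.Println(raw)
}
```

Note what the example values leave out: no balances, positions, prices, or "config X exists" claims, since those belong to live tool calls.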

3. ExecutionState

Defined in agent/execution_state.go.

Role:

  • stores the current execution workflow
  • allows the agent to resume after ask_user
  • persists plan steps, observations, and completion status

Storage key:

  • agent_execution_state_<userID>

Fields:

  • SessionID
  • UserID
  • Goal
  • Status
  • PlanID
  • Steps
  • CurrentStepID
  • Observations
  • FinalAnswer
  • LastError
  • UpdatedAt

This is the planner's working state, not a general memory store.
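
A possible shape for this record and its persistence is sketched below. The nested PlanStep and Observation types, all field types, and the map-backed configStore standing in for system_config are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// PlanStep and Observation are hypothetical nested types.
type PlanStep struct {
	ID   string `json:"id"`
	Type string `json:"type"` // tool, reason, ask_user, respond
	Text string `json:"text,omitempty"`
}

type Observation struct {
	StepID  string `json:"step_id"`
	Kind    string `json:"kind"`
	Content string `json:"content"`
}

// ExecutionState mirrors the field list above. Types are guesses.
type ExecutionState struct {
	SessionID     string        `json:"session_id"`
	UserID        int64         `json:"user_id"`
	Goal          string        `json:"goal"`
	Status        string        `json:"status"`
	PlanID        string        `json:"plan_id"`
	Steps         []PlanStep    `json:"steps"`
	CurrentStepID string        `json:"current_step_id"`
	Observations  []Observation `json:"observations"`
	FinalAnswer   string        `json:"final_answer"`
	LastError     string        `json:"last_error"`
	UpdatedAt     time.Time     `json:"updated_at"`
}

// configStore stands in for the system_config table.
type configStore map[string]string

func executionStateKey(userID int64) string {
	return fmt.Sprintf("agent_execution_state_%d", userID)
}

func saveExecutionState(store configStore, st ExecutionState) error {
	raw, err := json.Marshal(st)
	if err != nil {
		return err
	}
	store[executionStateKey(st.UserID)] = string(raw)
	return nil
}

// loadExecutionState reports false when no state is stored for the user.
func loadExecutionState(store configStore, userID int64) (ExecutionState, bool, error) {
	raw, ok := store[executionStateKey(userID)]
	if !ok {
		return ExecutionState{}, false, nil
	}
	var st ExecutionState
	if err := json.Unmarshal([]byte(raw), &st); err != nil {
		return ExecutionState{}, false, err
	}
	return st, true, nil
}

func main() {
	store := configStore{}
	st := ExecutionState{
		UserID:        42,
		Goal:          "create a trader",
		Status:        "waiting_user",
		Steps:         []PlanStep{{ID: "s1", Type: "ask_user", Text: "Which exchange?"}},
		CurrentStepID: "s1",
		UpdatedAt:     time.Now().UTC(),
	}
	if err := saveExecutionState(store, st); err != nil {
		panic(err)
	}
	loaded, ok, err := loadExecutionState(store, 42)
	fmt.Println(loaded.Status, ok, err)
}
```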

Data Flow

Request Entry

Entry points:

  • HandleMessage(...)
  • HandleMessageStream(...)

Flow:

  1. the user message enters the agent
  2. slash commands and explicit direct branches are handled first
  3. all other requests enter the planner flow via thinkAndAct(...) / thinkAndActStream(...)

Planner Flow

The planner pipeline in agent/planner_runtime.go is:

  1. append user message into chatHistory
  2. emit planning SSE event
  3. load ExecutionState
  4. optionally reset stale ExecutionState
  5. optionally refresh dynamic configuration snapshots
  6. create a fresh execution plan with the LLM
  7. execute steps one by one
  8. persist ExecutionState after important transitions
  9. append assistant answer into chatHistory
  10. maybe compress old conversation into TaskState

Short-Term vs Durable Memory

What lives in chatHistory

Good fits:

  • raw recent messages
  • conversational wording
  • latest assistant phrasing

Bad fits:

  • long-lived truths
  • current external system state

What lives in TaskState

Good fits:

  • durable goal
  • high-level unfinished work that remains relevant across turns
  • important facts the user stated
  • previous decisions and why they were made

Bad fits:

  • pending steps inside the current plan
  • execution-level reminders such as "wait for a field" or "call a tool"
  • old conclusions about whether tools exist
  • old conclusions about whether model/exchange config is present
  • live operational state that can change outside the chat

What lives in ExecutionState

Good fits:

  • current plan steps
  • observations from tool calls
  • blocked-on-user-input status
  • exact current workflow state
  • step-level pending work and block reasons

Bad fits:

  • evergreen user profile
  • long-term semantic memory

Planning Logic

Plan Creation

createExecutionPlan(...) sends the following into the planner model:

  • available tool definitions
  • persistent preferences
  • TaskState context
  • ExecutionState JSON
  • current user request

The planner must return JSON only. Each step in the plan has one of these types:

  • tool
  • reason
  • ask_user
  • respond
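
Since the planner output is model-generated, it is worth validating before execution. The sketch below parses a plan and rejects unknown step types. The JSON layout (a top-level steps array with id, type, tool, and text fields) and the parsePlan helper are assumptions; only the four step types come from this document.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Step and Plan describe an assumed JSON layout for the planner reply.
type Step struct {
	ID   string `json:"id"`
	Type string `json:"type"`
	Tool string `json:"tool,omitempty"`
	Text string `json:"text,omitempty"`
}

type Plan struct {
	Steps []Step `json:"steps"`
}

// allowedStepTypes lists the four step types the planner may emit.
var allowedStepTypes = map[string]bool{
	"tool":     true,
	"reason":   true,
	"ask_user": true,
	"respond":  true,
}

// parsePlan decodes the planner reply and checks every step type.
func parsePlan(raw string) (Plan, error) {
	var p Plan
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return Plan{}, fmt.Errorf("planner did not return valid JSON: %w", err)
	}
	if len(p.Steps) == 0 {
		return Plan{}, errors.New("plan has no steps")
	}
	for i, s := range p.Steps {
		if !allowedStepTypes[s.Type] {
			return Plan{}, fmt.Errorf("step %d: unknown type %q", i, s.Type)
		}
	}
	return p, nil
}

func main() {
	raw := `{"steps":[
		{"id":"s1","type":"tool","tool":"get_exchange_configs"},
		{"id":"s2","type":"respond","text":"summarize the configs"}
	]}`
	plan, err := parsePlan(raw)
	fmt.Println(len(plan.Steps), err)

	_, err = parsePlan(`{"steps":[{"id":"s1","type":"shell"}]}`)
	fmt.Println(err)
}
```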

Step Execution

executePlan(...) executes the plan loop:

  • tool: call the tool and append an observation
  • reason: run a reasoning sub-call and append an observation
  • ask_user: save the waiting_user state and return the question
  • respond: generate the final answer and mark the plan completed

After each completed step, replanAfterStep(...) may:

  • continue
  • replace remaining steps
  • ask user
  • finish
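
The loop can be pictured roughly as below. This is a simplified sketch, not the code in agent/planner_runtime.go: the Step and State types are hypothetical, tool and reasoning calls are stubbed out as string observations, the Replan hook stands in for replanAfterStep(...), and treating a plan that runs out of steps as failed is an assumption.

```go
package main

import "fmt"

type Step struct {
	Type  string // "tool", "reason", "ask_user", "respond"
	Input string
}

type State struct {
	Status       string
	Observations []string
	FinalAnswer  string
}

// Replan is a hypothetical stand-in for replanAfterStep(...). It returns the
// steps to run next: the same slice to continue, a new slice to replace the
// remaining steps, or a single ask_user / respond step to ask or finish.
type Replan func(st *State, remaining []Step) []Step

// executePlan runs steps in order and returns the text shown to the user.
func executePlan(st *State, steps []Step, replan Replan) string {
	st.Status = "running"
	for len(steps) > 0 {
		step := steps[0]
		steps = steps[1:]
		switch step.Type {
		case "tool", "reason":
			// Stub: a real agent would call the tool or the model here.
			st.Observations = append(st.Observations, step.Type+": "+step.Input)
			steps = replan(st, steps)
		case "ask_user":
			st.Status = "waiting_user"
			return step.Input
		case "respond":
			st.Status = "completed"
			st.FinalAnswer = step.Input
			return st.FinalAnswer
		default:
			st.Status = "failed"
			return "unknown step type: " + step.Type
		}
	}
	st.Status = "failed"
	return "plan ended without a respond step"
}

func main() {
	st := &State{}
	plan := []Step{
		{Type: "tool", Input: "get_exchange_configs"},
		{Type: "respond", Input: "done"},
	}
	// After the first observation, replace the remaining steps with a question.
	replan := func(s *State, remaining []Step) []Step {
		if len(s.Observations) == 1 {
			return []Step{{Type: "ask_user", Input: "Which exchange should I use?"}}
		}
		return remaining
	}
	out := executePlan(st, plan, replan)
	fmt.Println(st.Status, "-", out)
}
```

In the real flow the state would also be persisted at the ask_user and respond transitions so that a later turn can resume.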

Resume Behavior

When ExecutionState.Status == waiting_user, the next user turn is treated as a reply to the pending question.

Current safeguards:

  • latest asked question is extracted from the stored plan
  • the user reply is appended as a user_reply observation
  • planner prompt receives explicit Resume context

This makes short replies less likely to be misread as unrelated fresh intents.
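
The three safeguards can be sketched together as one helper. Everything here is illustrative: the types, the resumeContext name, the wording of the context block, and the choice to locate the pending question through CurrentStepID are assumptions rather than the actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

type Step struct {
	ID   string
	Type string
	Text string
}

type Observation struct {
	Kind    string
	Content string
}

type State struct {
	Status        string
	Steps         []Step
	CurrentStepID string
	Observations  []Observation
}

// pendingQuestion finds the ask_user step the plan is currently blocked on.
func pendingQuestion(st *State) (string, bool) {
	for _, s := range st.Steps {
		if s.ID == st.CurrentStepID && s.Type == "ask_user" {
			return s.Text, true
		}
	}
	return "", false
}

// resumeContext records the reply as a user_reply observation and returns
// text for the planner prompt. It does nothing unless the state is waiting.
func resumeContext(st *State, reply string) (string, bool) {
	if st.Status != "waiting_user" {
		return "", false
	}
	st.Observations = append(st.Observations, Observation{Kind: "user_reply", Content: reply})
	var b strings.Builder
	b.WriteString("Resume context:\n")
	if q, ok := pendingQuestion(st); ok {
		fmt.Fprintf(&b, "- pending question: %s\n", q)
	}
	fmt.Fprintf(&b, "- user reply: %s\n", reply)
	return b.String(), true
}

func main() {
	st := &State{
		Status:        "waiting_user",
		CurrentStepID: "s2",
		Steps: []Step{
			{ID: "s1", Type: "tool", Text: "get_exchange_configs"},
			{ID: "s2", Type: "ask_user", Text: "Which exchange should the trader use?"},
		},
	}
	if ctx, ok := resumeContext(st, "the first one"); ok {
		fmt.Print(ctx)
	}
}
```

Pairing the reply with the question it answers is what lets the planner interpret a two-word reply correctly.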

Dynamic State Refresh

Configuration and trader management requests are dynamic by nature. Their truth can change outside the current chat, for example:

  • user configures exchange in the UI
  • user adds model in another tab
  • user creates trader elsewhere

Because of that, configuration/trader requests should not trust stale model conclusions.

Current protection in planner_runtime.go:

  • detects config / trader intent with isConfigOrTraderIntent(...)
  • clears TaskState context from the planner prompt for these requests
  • refreshes ExecutionState.Observations with fresh snapshots from:
    • toolGetModelConfigs(...)
    • toolGetExchangeConfigs(...)
    • toolListTraders(...)

This makes the planner rely more on current system state and less on older narrative memory.
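
A rough sketch of this protection follows. The keyword list, the function and type names, and the decision to pass the snapshot fetchers in as plain functions are all assumptions made for illustration; the real isConfigOrTraderIntent(...) and tool functions may look quite different.

```go
package main

import (
	"fmt"
	"strings"
)

// intentKeywords is a made-up keyword list for the heuristic.
var intentKeywords = []string{"exchange", "model", "trader", "config"}

// looksLikeConfigOrTraderIntent is a keyword check standing in for
// isConfigOrTraderIntent(...).
func looksLikeConfigOrTraderIntent(msg string) bool {
	lower := strings.ToLower(msg)
	for _, kw := range intentKeywords {
		if strings.Contains(lower, kw) {
			return true
		}
	}
	return false
}

type Observation struct {
	Kind    string
	Content string
}

// Snapshot names a live lookup such as listing traders.
type Snapshot struct {
	Kind  string
	Fetch func() (string, error)
}

type PlannerInput struct {
	TaskContext  string
	Observations []Observation
}

// preparePlannerInput drops the TaskState context and attaches fresh
// snapshots when the request is about configuration or traders.
func preparePlannerInput(msg, taskContext string, snaps []Snapshot) PlannerInput {
	in := PlannerInput{TaskContext: taskContext}
	if !looksLikeConfigOrTraderIntent(msg) {
		return in
	}
	in.TaskContext = "" // do not let older narrative memory steer this plan
	for _, s := range snaps {
		content, err := s.Fetch()
		if err != nil {
			content = "snapshot unavailable: " + err.Error()
		}
		in.Observations = append(in.Observations, Observation{Kind: s.Kind, Content: content})
	}
	return in
}

func main() {
	snaps := []Snapshot{
		{Kind: "model_configs", Fetch: func() (string, error) { return "[]", nil }},
		{Kind: "exchange_configs", Fetch: func() (string, error) { return "[]", nil }},
		{Kind: "traders", Fetch: func() (string, error) { return "[]", nil }},
	}
	in := preparePlannerInput("create a trader for me", "user wanted to add a model last week", snaps)
	fmt.Printf("context=%q observations=%d\n", in.TaskContext, len(in.Observations))
}
```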

Reset Strategy

The system currently resets or weakens stale execution state when:

  • the user says retry-like phrases such as 再试 ("try again"), 继续 ("continue"), try again, continue
  • the request is config / trader related and the old execution state is failed, completed, or waiting_user

Reset scope:

  • ExecutionState may be cleared
  • TaskState is not globally deleted, but it is intentionally ignored for config/trader planning

Manual reset:

  • /clear

This clears:

  • short-term chat history
  • task state
  • execution state
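
The reset rules can be condensed into a small decision function, sketched below. It collapses "resets or weakens" into a single yes/no answer, matches phrases by substring, and models the three stores as plain maps; those simplifications and all names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// retryPhrases are the examples named in the list above.
var retryPhrases = []string{"再试", "继续", "try again", "continue"}

// isRetryLike uses substring matching, which is deliberately crude.
func isRetryLike(msg string) bool {
	lower := strings.ToLower(strings.TrimSpace(msg))
	for _, p := range retryPhrases {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

// shouldResetExecution reports whether stale ExecutionState should be dropped.
func shouldResetExecution(msg, status string, configOrTrader bool) bool {
	if isRetryLike(msg) {
		return true
	}
	if configOrTrader {
		switch status {
		case "failed", "completed", "waiting_user":
			return true
		}
	}
	return false
}

// userStores stands in for the in-memory history and system_config.
type userStores struct {
	history map[int64][]string
	config  map[string]string
}

// clearAll mimics /clear: it removes all three layers for one user.
func (s *userStores) clearAll(userID int64) {
	delete(s.history, userID)
	delete(s.config, fmt.Sprintf("agent_task_state_%d", userID))
	delete(s.config, fmt.Sprintf("agent_execution_state_%d", userID))
}

func main() {
	fmt.Println(shouldResetExecution("try again", "failed", false))          // true
	fmt.Println(shouldResetExecution("add an exchange", "completed", true))  // true
	fmt.Println(shouldResetExecution("what can you do?", "completed", false)) // false
}
```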

Compression Design

maybeCompressHistory(...) moves older short-term chat content into TaskState when:

  • recent message count exceeds the configured window
  • estimated token count exceeds the threshold

Compression strategy:

  1. keep recent conversation in chatHistory
  2. summarize older turns into structured TaskState
  3. persist new TaskState
  4. replace chatHistory with recent slice

Important design rule:

  • TaskState should keep durable context only
  • it should not become a stale copy of mutable operational state
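
The split between "older turns to summarize" and "recent turns to keep" might look like the sketch below. It assumes that either trigger alone is enough, estimates tokens as roughly one per four characters, and uses hypothetical parameter names (window, keepRecent, maxTokens). The summarization call itself is omitted because it depends on the model.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

type Message struct {
	Role    string
	Content string
}

// estimateTokens is a crude guess: about one token per four characters.
func estimateTokens(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += (utf8.RuneCountInString(m.Content) + 3) / 4
	}
	return total
}

// splitForCompression returns the older messages to summarize and the
// recent ones to keep. older is nil when no compression is needed.
func splitForCompression(msgs []Message, window, keepRecent, maxTokens int) (older, recent []Message) {
	overWindow := len(msgs) > window
	overBudget := estimateTokens(msgs) > maxTokens
	if !(overWindow || overBudget) || len(msgs) <= keepRecent {
		return nil, msgs
	}
	cut := len(msgs) - keepRecent
	return msgs[:cut], msgs[cut:]
}

func main() {
	var msgs []Message
	for i := 0; i < 10; i++ {
		msgs = append(msgs, Message{Role: "user", Content: "hello"})
	}
	older, recent := splitForCompression(msgs, 6, 4, 1000)
	fmt.Println(len(older), len(recent)) // 6 4
}
```

After the older slice is summarized into TaskState and persisted, the recent slice would replace the stored chatHistory.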

Current Architecture Diagram

flowchart TD
    U[User Message] --> A[HandleMessage / HandleMessageStream]
    A --> B{Direct command?}
    B -->|Yes| C[Direct branch or slash command]
    B -->|No| D[thinkAndAct / thinkAndActStream]

    D --> E[Append user turn to chatHistory]
    D --> F[Load ExecutionState]
    F --> G{waiting_user?}
    G -->|Yes| H[Attach user_reply observation]
    G -->|No| I[Create fresh ExecutionState]

    H --> J[Refresh dynamic snapshots if config/trader intent]
    I --> J
    J --> K[createExecutionPlan via LLM]
    K --> L[Execution plan]
    L --> M[executePlan loop]

    M --> N[tool step]
    M --> O[reason step]
    M --> P[ask_user step]
    M --> Q[respond step]

    N --> R[Append Observation]
    O --> R
    R --> S[replanAfterStep]
    S --> M

    P --> T[Persist waiting_user ExecutionState]
    T --> UQ[Return question to user]

    Q --> V[Persist completed ExecutionState]
    V --> W[Append assistant turn to chatHistory]
    W --> X[maybeCompressHistory]
    X --> Y[Persist TaskState]
    Y --> Z[Final response]

Memory Relationship Diagram

flowchart LR
    CH[chatHistory\nin-memory\nrecent turns]
    TS[TaskState\npersisted summary\nsystem_config]
    ES[ExecutionState\npersisted workflow\nsystem_config]
    PL[Planner Prompt]

    CH -->|recent raw turns| PL
    ES -->|current workflow JSON| PL
    TS -->|durable structured context| PL

    CH -->|old turns compressed| TS
    PL -->|plan / observations / status| ES

State Transition Diagram

stateDiagram-v2
    [*] --> planning
    planning --> running: plan created
    running --> waiting_user: ask_user step
    waiting_user --> planning: user replies
    running --> completed: respond step finished
    running --> failed: step error
    failed --> planning: retry / continue / config-trader reset
    completed --> planning: new relevant request or retry flow
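
The transitions in the diagram above can also be written as a lookup table, which is handy for asserting that a status change is legal before persisting it. The table below copies the edges of the diagram; the canTransition helper is a hypothetical addition and not something the current code is known to contain.

```go
package main

import "fmt"

// transitions lists, for each status, the statuses it may move to.
var transitions = map[string][]string{
	"planning":     {"running"},
	"running":      {"waiting_user", "completed", "failed"},
	"waiting_user": {"planning"},
	"failed":       {"planning"},
	"completed":    {"planning"},
}

// canTransition reports whether the diagram has an edge from one status to another.
func canTransition(from, to string) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("running", "waiting_user")) // true
	fmt.Println(canTransition("completed", "running"))    // false: must replan first
}
```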

Known Design Tradeoffs

Strengths

  • separates short-term chat from durable task summary
  • allows blocked flows to resume
  • supports replanning after every meaningful step
  • recovers better from stale assumptions on dynamic config/trader requests

Weaknesses

  • TaskState is still summary-driven, so summarization quality matters
  • planner still depends on model compliance for some transitions
  • ExecutionState is single-track per user, not multiple concurrent workflows
  • config/trader intent detection is heuristic and keyword-based

Practical Guidance

When to trust TaskState

Trust it for:

  • user intent continuity
  • open loops
  • durable facts

Do not trust it for:

  • whether current exchange/model/trader config exists now
  • whether a specific operational action is currently possible

When to trust ExecutionState

Trust it for:

  • current plan continuity
  • exact blocked step
  • latest observation chain

Do not trust it blindly when:

  • user has changed configuration outside the chat
  • system capabilities have changed since the state was saved, for example after a deployment

When to fetch live state again

Always prefer fresh tool snapshots before answering about:

  • existing model configs
  • existing exchange configs
  • existing traders
  • whether trader creation can proceed

Suggested Future Improvements

  • add workflow versioning so capability changes invalidate stale ExecutionState
  • separate waiting_user_confirmation from generic waiting_user
  • introduce code-level handling for short confirmations such as 继续 ("continue")
  • move dynamic state refresh from a heuristic to an explicit planner preflight stage
  • support multiple concurrent execution sessions per user if needed