Soulter 565c371e5c feat: enhance Live Mode with text input functionality and UI improvements
- Added a text input panel to allow users to send plain text messages while in Live Mode.
- Updated the LiveMode.vue component to handle text input and integrate it with WebSocket communication.
- Improved the layout and styling of the Live Mode interface for better user experience.
- Documented the new `text_input` message type in the Live API README.
2026-03-16 22:36:29 +08:00


AstrBot Live API Protocol

This document describes the current WebSocket protocol for the AstrBot Live API.

Endpoints

  • Legacy JWT endpoint: /api/live_chat/ws
  • Legacy unified JWT endpoint: /api/unified_chat/ws
  • Open API endpoint: /api/v1/live/ws

Authentication

Legacy dashboard endpoints

Pass a dashboard JWT in the token query parameter.

Example:

ws://localhost:6185/api/live_chat/ws?token=<dashboard_jwt>

Open API endpoint

Pass an API key in the api_key query parameter and the user name in the username query parameter.

Examples:

ws://localhost:6185/api/v1/live/ws?api_key=<api_key>&username=alice
ws://localhost:6185/api/v1/live/ws?api_key=<api_key>&username=alice&ct=chat

ct values:

  • live: voice conversation mode
  • chat: unified chat mode over the same WebSocket transport

The Open API endpoint reuses the chat API key scope.
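The following is a minimal sketch of building both kinds of connection URL in Python, using only the standard library. The helper names, the default base URL (taken from the localhost examples above), and the client-side check on ct are illustrative assumptions, not part of AstrBot. urlencode also escapes keys or usernames that contain reserved characters.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: a local AstrBot instance, as in the examples above.
DEFAULT_BASE = "ws://localhost:6185"


def legacy_live_url(dashboard_jwt: str, base: str = DEFAULT_BASE) -> str:
    """Legacy dashboard endpoint: the JWT goes in the `token` query parameter."""
    return f"{base}/api/live_chat/ws?" + urlencode({"token": dashboard_jwt})


def open_api_url(api_key: str, username: str, ct: Optional[str] = None,
                 base: str = DEFAULT_BASE) -> str:
    """Open API endpoint: api_key and username are always sent; ct only if given."""
    if ct is not None and ct not in ("live", "chat"):
        raise ValueError(f"unsupported ct value: {ct!r}")
    params = {"api_key": api_key, "username": username}
    if ct is not None:
        params["ct"] = ct
    return f"{base}/api/v1/live/ws?" + urlencode(params)
```

For example, open_api_url("KEY", "alice", "chat") yields the second Open API URL shown above, with KEY in place of <api_key>.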

Transport

  • Protocol: WebSocket
  • Payload format: UTF-8 JSON text frames
  • Audio upload format in live mode:
    • the client sends raw PCM frames encoded as Base64
    • sample rate: 16000 Hz
    • channels: 1 (mono)
    • sample width: 16-bit
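These parameters fix the audio byte rate: 16000 samples/s × 1 channel × 2 bytes = 32000 bytes per second. The sketch below splits a PCM buffer into fixed-duration chunks and Base64-encodes each one for a speaking_part frame. The 20 ms frame duration and the helper names are assumptions for illustration; this document does not specify a frame size.

```python
import base64
from typing import List

SAMPLE_RATE = 16000  # Hz, from the transport spec above
CHANNELS = 1
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)


def bytes_per_frame(frame_ms: int) -> int:
    """Number of PCM bytes covering frame_ms milliseconds of audio."""
    return SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH * frame_ms // 1000


def pcm_to_base64_frames(pcm: bytes, frame_ms: int = 20) -> List[str]:
    """Split raw PCM into fixed-duration chunks and Base64-encode each one.

    The chunk size is always an even number of bytes, so no 16-bit sample is
    split across frames. The last chunk may be shorter than the others.
    """
    size = bytes_per_frame(frame_ms)
    return [
        base64.b64encode(pcm[i:i + size]).decode("ascii")
        for i in range(0, len(pcm), size)
    ]
```

With the assumed 20 ms frames, one second of audio becomes 50 frames of 640 bytes each.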

Top-Level Envelope

Client to server

{
  "t": "message_type",
  "...": "message specific fields"
}

When using the unified socket, the client can also include:

{
  "ct": "live|chat",
  "t": "message_type"
}

Server to client

Legacy live mode uses:

{
  "t": "message_type",
  "data": {}
}

Unified chat mode uses:

{
  "ct": "chat",
  "type": "message_type",
  "data": {}
}

Some forwarded chat frames may also contain t, streaming, chain_type, message_id, or session_id.
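Because legacy live frames carry the message type in t while unified chat frames use type, a client that handles both may want one normalizing step. The sketch below prefers type and falls back to t, since some chat frames (such as the chat error frame described later) use t. The function name is hypothetical.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_server_frame(raw: str) -> Tuple[Optional[str], Optional[str], Dict[str, Any]]:
    """Return (ct, message_type, frame) for one server text frame.

    Legacy live frames have no "ct" and carry the type in "t".
    Unified chat frames carry it in "type"; fall back to "t" when absent.
    """
    frame = json.loads(raw)
    message_type = frame.get("type", frame.get("t"))
    return frame.get("ct"), message_type, frame
```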

Live Mode

Client messages

start_speaking

Start a voice capture segment.

{
  "t": "start_speaking",
  "stamp": "seg_001"
}

speaking_part

Send one audio frame (Base64-encoded PCM in the upload format described under Transport).

{
  "t": "speaking_part",
  "data": "<base64_pcm_bytes>"
}

end_speaking

Finish the current voice capture segment.

{
  "t": "end_speaking",
  "stamp": "seg_001"
}
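The sketch below shows the client-side order for one capture segment: start_speaking, any number of speaking_part frames, then end_speaking. It assumes stamp is a client-chosen label that should match between the start and end frames, as in the seg_001 examples above. The generator name is hypothetical.

```python
import base64
import json
from typing import Iterable, Iterator


def speaking_segment(stamp: str, pcm_chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the JSON text frames for one voice capture segment, in send order."""
    yield json.dumps({"t": "start_speaking", "stamp": stamp})
    for chunk in pcm_chunks:
        # Each chunk is raw 16 kHz mono 16-bit PCM, Base64-encoded per the spec.
        yield json.dumps({
            "t": "speaking_part",
            "data": base64.b64encode(chunk).decode("ascii"),
        })
    yield json.dumps({"t": "end_speaking", "stamp": stamp})
```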

text_input

Send plain text instead of audio while using ct=live. The server still routes the text through Live mode, including TTS and interrupt handling.

{
  "t": "text_input",
  "text": "Hello, what is the weather today?"
}
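The following is a small sketch of building a text_input frame. The include_ct option covers the unified socket, where each client frame may carry ct. Rejecting blank text on the client is an assumption, because server handling of empty input is not documented here. Passing ensure_ascii=False keeps non-ASCII text readable, and the frame is sent as UTF-8 text.

```python
import json


def text_input_frame(text: str, include_ct: bool = False) -> str:
    """Build a text_input frame for a live-mode connection (hypothetical helper)."""
    if not text.strip():
        # Assumption: blank input is rejected client-side.
        raise ValueError("text must not be empty")
    frame = {"t": "text_input", "text": text}
    if include_ct:
        # Unified socket: tag the frame explicitly as live mode.
        frame["ct"] = "live"
    return json.dumps(frame, ensure_ascii=False)
```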

interrupt

Interrupt the current model or TTS response.

{
  "t": "interrupt"
}

Server messages

metrics

Performance and provider metadata.

Example:

{
  "t": "metrics",
  "data": {
    "wav_assemble_time": 0.12,
    "stt": "whisper_api",
    "llm_ttft": 0.84,
    "tts_total_time": 1.72
  }
}

user_msg

STT result from the uploaded audio.

{
  "t": "user_msg",
  "data": {
    "text": "Hello there",
    "ts": 1710000000000
  }
}

bot_delta_chunk

Raw model text delta. This is the token- or chunk-level stream and is not sentence-segmented.

{
  "t": "bot_delta_chunk",
  "data": {
    "text": "Hel"
  }
}

Notes:

  • This event is generated directly from the model streaming path.
  • It is independent of TTS chunking.
  • Consumers should append data.text to a local buffer.

bot_text_chunk

Text associated with the current TTS chunk. This is usually segmented by sentence or phrase.

{
  "t": "bot_text_chunk",
  "data": {
    "text": "Hello there."
  }
}

Notes:

  • This event is aligned to TTS output, not raw token streaming.
  • It may be coarser than bot_delta_chunk.

response

One TTS audio chunk, Base64 encoded.

{
  "t": "response",
  "data": "<base64_audio_bytes>"
}

bot_msg

Final bot text, sent when the response completes without audio streaming.

{
  "t": "bot_msg",
  "data": {
    "text": "Final reply text",
    "ts": 1710000001234
  }
}

stop_play

Stop client-side audio playback because the response was interrupted.

{
  "t": "stop_play"
}

end

Marks the end of the current response turn.

{
  "t": "end"
}

error

Recoverable or terminal processing error.

{
  "t": "error",
  "data": "error message"
}

Unified Chat Mode

Set ct=chat in the query string of the Open API endpoint, or include "ct": "chat" in each client frame when using /api/unified_chat/ws.

Client messages

bind

Subscribe to an existing webchat session.

{
  "ct": "chat",
  "t": "bind",
  "session_id": "session_001"
}

send

Send a chat request.

{
  "ct": "chat",
  "t": "send",
  "username": "alice",
  "session_id": "session_001",
  "message_id": "msg_001",
  "message": [
    {
      "type": "plain",
      "text": "Please summarize this"
    }
  ],
  "selected_provider": "openai_chat_completion",
  "selected_model": "gpt-4.1-mini",
  "enable_streaming": true
}

message uses the same message-part schema as POST /api/v1/chat.
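The following is a sketch of building a send frame with a single plain-text message part. Two details are assumptions, not documented behavior. First, that the client chooses message_id (a random one is generated when omitted). Second, that selected_provider and selected_model are optional (the keys are left out when unset). The helper name is hypothetical.

```python
import json
import uuid
from typing import Optional


def build_send_frame(username: str, session_id: str, text: str,
                     provider: Optional[str] = None, model: Optional[str] = None,
                     streaming: bool = True, message_id: Optional[str] = None) -> str:
    """Build a chat-mode send frame carrying one plain-text message part."""
    frame = {
        "ct": "chat",
        "t": "send",
        "username": username,
        "session_id": session_id,
        # Assumption: message_id is client-chosen; generate one if not supplied.
        "message_id": message_id or f"msg_{uuid.uuid4().hex[:8]}",
        "message": [{"type": "plain", "text": text}],
        "enable_streaming": streaming,
    }
    # Assumption: provider/model selection is optional.
    if provider is not None:
        frame["selected_provider"] = provider
    if model is not None:
        frame["selected_model"] = model
    return json.dumps(frame, ensure_ascii=False)
```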

interrupt

Interrupt the current chat response.

{
  "ct": "chat",
  "t": "interrupt"
}

Server messages

session_bound

Acknowledges a successful bind.

{
  "ct": "chat",
  "type": "session_bound",
  "session_id": "session_001",
  "message_id": "ws_sub_xxx"
}
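The bind handshake can be expressed as two pure helpers. One builds the bind frame, and the other checks whether an incoming frame is the matching session_bound acknowledgement. The helper names are hypothetical. Waiting for the acknowledgement before sending other frames is a client design choice, not a documented requirement.

```python
import json


def bind_frame(session_id: str) -> str:
    """Client frame that subscribes this socket to an existing webchat session."""
    return json.dumps({"ct": "chat", "t": "bind", "session_id": session_id})


def is_bind_ack(raw: str, session_id: str) -> bool:
    """True if raw is the session_bound acknowledgement for session_id."""
    frame = json.loads(raw)
    return (
        frame.get("ct") == "chat"
        and frame.get("type") == "session_bound"
        and frame.get("session_id") == session_id
    )
```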

Forwarded streaming events

The server forwards the normal webchat queue payloads. Common examples:

{
  "ct": "chat",
  "type": "plain",
  "data": "Hello",
  "streaming": true,
  "chain_type": null,
  "message_id": "msg_001"
}
{
  "ct": "chat",
  "type": "image",
  "data": "[IMAGE]file.jpg",
  "streaming": false,
  "message_id": "msg_001"
}
{
  "ct": "chat",
  "type": "agent_stats",
  "data": {
    "time_to_first_token": 0.8
  }
}
{
  "ct": "chat",
  "type": "message_saved",
  "data": {
    "id": 123,
    "created_at": "2026-03-16T10:00:00Z"
  }
}
{
  "ct": "chat",
  "type": "end",
  "data": "",
  "streaming": false,
  "message_id": "msg_001"
}

Chat errors

{
  "ct": "chat",
  "t": "error",
  "code": "INVALID_MESSAGE_FORMAT",
  "data": "message must be list"
}

Recommended Client Handling

For live mode:

  1. Append every bot_delta_chunk.data.text to a raw transcript buffer.
  2. Use bot_text_chunk only when you need text aligned with audio playback.
  3. Decode and play each response audio chunk in arrival order.
  4. Reset per-turn buffers after receiving end.
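The live-mode steps above can be sketched as a small per-turn buffer that consumes already-parsed server frames. The play_audio and stop_audio callbacks are hypothetical hooks into whatever audio player the client uses, and the class name is illustrative. The sketch also makes an assumption: when a bot_msg arrives, its text is taken as the final transcript for the turn.

```python
import base64
from typing import Any, Callable, Dict, List, Optional


class LiveTurnBuffer:
    """Per-turn state for a live-mode client."""

    def __init__(self, play_audio: Callable[[bytes], None],
                 stop_audio: Callable[[], None]) -> None:
        self.play_audio = play_audio
        self.stop_audio = stop_audio
        self.raw_text: List[str] = []          # every bot_delta_chunk, for real-time rendering
        self.tts_text: List[str] = []          # bot_text_chunk, aligned with audio playback
        self.final_text: Optional[str] = None  # bot_msg, if the turn had no audio streaming
        self.completed: List[str] = []         # transcripts of finished turns

    def handle(self, frame: Dict[str, Any]) -> None:
        t, data = frame.get("t"), frame.get("data")
        if t == "bot_delta_chunk":
            self.raw_text.append(data["text"])
        elif t == "bot_text_chunk":
            self.tts_text.append(data["text"])
        elif t == "response":
            # Decode each audio chunk and hand it to the player in arrival order.
            self.play_audio(base64.b64decode(data))
        elif t == "bot_msg":
            self.final_text = data["text"]
        elif t == "stop_play":
            self.stop_audio()
        elif t == "end":
            # Assumption: prefer bot_msg text when present, else the joined deltas.
            text = self.final_text if self.final_text is not None else "".join(self.raw_text)
            self.completed.append(text)
            self.raw_text.clear()
            self.tts_text.clear()
            self.final_text = None
```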

For chat mode:

  1. Treat plain frames with streaming=true as incremental text.
  2. Treat a complete or end frame as the end of a response turn.
  3. Persist message_saved metadata if you need server-side history IDs.
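The chat-mode steps above can be sketched the same way. The class name is hypothetical. Treating a non-streaming plain frame as the whole reply text is an assumption, because this document does not describe that case. Error frames are matched on t, as in the chat error example.

```python
from typing import Any, Dict, List


class ChatTurnBuffer:
    """Accumulates one chat-mode response per turn."""

    def __init__(self) -> None:
        self.pending: List[str] = []           # incremental text of the current turn
        self.turns: List[str] = []             # completed response texts
        self.saved: List[Dict[str, Any]] = []  # message_saved metadata (history IDs)
        self.errors: List[str] = []

    def handle(self, frame: Dict[str, Any]) -> None:
        kind = frame.get("type", frame.get("t"))  # chat error frames use "t"
        if kind == "plain":
            if frame.get("streaming"):
                self.pending.append(frame["data"])
            else:
                # Assumption: a non-streaming plain frame carries the whole text.
                self.pending = [frame["data"]]
        elif kind == "message_saved":
            self.saved.append(frame["data"])
        elif kind == "error":
            self.errors.append(f"{frame.get('code')}: {frame.get('data')}")
        elif kind in ("complete", "end"):
            # Skip empty turns so a complete followed by end is recorded once.
            if self.pending:
                self.turns.append("".join(self.pending))
            self.pending = []
```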

Compatibility Notes

  • bot_text_chunk remains sentence or phrase segmented for TTS compatibility.
  • bot_delta_chunk is the new delta-level text event for real-time rendering.
  • The legacy JWT endpoints and the new Open API endpoint share the same runtime behavior after authentication.