Soulter 565c371e5c feat: enhance Live Mode with text input functionality and UI improvements

- Added a text input panel to allow users to send plain text messages while in Live Mode.
- Updated the LiveMode.vue component to handle text input and integrate it with WebSocket communication.
- Improved the layout and styling of the Live Mode interface for better user experience.
- Documented the new `text_input` message type in the Live API README.

2026-03-16 22:36:29 +08:00

AstrBot Live API Protocol

This document describes the current WebSocket protocol for AstrBot Live API.

Endpoint

- Legacy JWT endpoint: /api/live_chat/ws
- Legacy unified JWT endpoint: /api/unified_chat/ws
- Open API endpoint: /api/v1/live/ws

Authentication

Legacy dashboard endpoints

Pass a dashboard JWT in the token query parameter.

Example:

ws://localhost:6185/api/live_chat/ws?token=<dashboard_jwt>

Open API endpoint

Pass an API key in the api_key query parameter and a username in the username query parameter.

Examples:

ws://localhost:6185/api/v1/live/ws?api_key=<api_key>&username=alice
ws://localhost:6185/api/v1/live/ws?api_key=<api_key>&username=alice&ct=chat

ct values:

- live: voice conversation mode
- chat: unified chat mode over the same WebSocket transport

The Open API endpoint reuses the chat API key scope.
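
The query string above can be assembled programmatically. This is a minimal sketch using only the Python standard library; `build_live_url` and its `host` argument are hypothetical names chosen for illustration, and the sketch always sends `ct` explicitly rather than relying on a server default.

```python
from urllib.parse import urlencode


def build_live_url(host: str, api_key: str, username: str, ct: str = "live") -> str:
    """Build an Open API WebSocket URL; host is e.g. 'localhost:6185'."""
    if ct not in ("live", "chat"):
        raise ValueError(f"unsupported ct value: {ct!r}")
    # urlencode percent-escapes values, so usernames with spaces or '&' stay intact.
    query = urlencode({"api_key": api_key, "username": username, "ct": ct})
    return f"ws://{host}/api/v1/live/ws?{query}"


print(build_live_url("localhost:6185", "my-key", "alice", ct="chat"))
```

Escaping the values matters mostly for `username`, which is user-controlled and may contain characters that would otherwise break the query string.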

Transport

- Protocol: WebSocket
- Payload format: UTF-8 JSON text frames
- Audio upload format in live mode:
  - client sends raw PCM frames encoded as Base64
  - sample rate: 16000 Hz
  - channels: 1 (mono)
  - sample width: 16-bit
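
To make the upload format concrete, here is a rough sketch that produces 16 kHz, mono, 16-bit PCM and wraps it in `speaking_part` frames. Two things are assumptions that this document does not specify: little-endian sample byte order and the 20 ms frame size. `make_tone` and `speaking_part_frames` are illustrative helper names.

```python
import base64
import json
import math
import struct

SAMPLE_RATE = 16000  # Hz; mono, 16-bit samples as listed above


def make_tone(duration_s: float, freq_hz: float = 440.0) -> bytes:
    """Generate a sine tone as raw 16-bit mono PCM (little-endian is an assumption)."""
    n = int(SAMPLE_RATE * duration_s)
    samples = (
        int(12000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)) for i in range(n)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def speaking_part_frames(pcm: bytes, frame_ms: int = 20):
    """Yield one JSON text frame per fixed-size slice of PCM (frame size is an assumption)."""
    frame_bytes = SAMPLE_RATE * 2 * frame_ms // 1000  # 2 bytes per sample
    for offset in range(0, len(pcm), frame_bytes):
        chunk = pcm[offset:offset + frame_bytes]
        yield json.dumps(
            {"t": "speaking_part", "data": base64.b64encode(chunk).decode("ascii")}
        )
```

In a real client the PCM would come from a microphone capture API rather than a generated tone.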

Top-Level Envelope

Client to server

{
  "t": "message_type",
  "...": "message specific fields"
}

When using the unified socket, the client can also include:

{
  "ct": "live|chat",
  "t": "message_type"
}

Server to client

Legacy live mode uses:

{
  "t": "message_type",
  "data": {}
}

Unified chat mode uses:

{
  "ct": "chat",
  "type": "message_type",
  "data": {}
}

Some forwarded chat frames may also contain t, streaming, chain_type, message_id, or session_id.
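
Because the message type can arrive under either `t` or `type`, a client may want to normalize frames before dispatching them. The following is a sketch only: `parse_server_frame` is a hypothetical helper, and treating frames without `ct` as live mode is an assumption based on the envelopes shown above.

```python
import json


def parse_server_frame(raw: str) -> dict:
    """Normalize a server frame into {'ct', 'kind', 'data', 'extra'}."""
    frame = json.loads(raw)
    # Chat frames carry 'type'; live frames (and chat error frames) carry 't'.
    kind = frame.get("type") or frame.get("t")
    if kind is None:
        raise ValueError("frame has neither 'type' nor 't'")
    ct = frame.get("ct", "live")  # assumption: frames without ct are live mode
    extra = {k: v for k, v in frame.items() if k not in ("ct", "type", "t", "data")}
    return {"ct": ct, "kind": kind, "data": frame.get("data"), "extra": extra}


print(parse_server_frame('{"ct": "chat", "type": "plain", "data": "Hello", "streaming": true}'))
```

Optional fields such as `streaming`, `chain_type`, `message_id`, and `session_id` end up in `extra`, so handlers can read them without caring which envelope variant was used.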

Live Mode

Client messages

`start_speaking`

Start a voice capture segment.

{
  "t": "start_speaking",
  "stamp": "seg_001"
}

`speaking_part`

Send one audio frame.

{
  "t": "speaking_part",
  "data": "<base64_pcm_bytes>"
}

`end_speaking`

Finish the current voice capture segment.

{
  "t": "end_speaking",
  "stamp": "seg_001"
}
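
Taken together, the three messages above bracket one voice segment. The sketch below emits them in order for a list of PCM chunks. It assumes the same `stamp` value is meant to appear in both `start_speaking` and `end_speaking`, as in the examples; `segment_frames` and the generated stamp format are hypothetical.

```python
import base64
import json
import uuid


def segment_frames(pcm_chunks, stamp=None):
    """Yield start_speaking, one speaking_part per chunk, then end_speaking."""
    # Assumption: the stamp identifies the segment and is reused to close it.
    stamp = stamp or f"seg_{uuid.uuid4().hex[:8]}"
    yield json.dumps({"t": "start_speaking", "stamp": stamp})
    for chunk in pcm_chunks:
        encoded = base64.b64encode(chunk).decode("ascii")
        yield json.dumps({"t": "speaking_part", "data": encoded})
    yield json.dumps({"t": "end_speaking", "stamp": stamp})


for frame in segment_frames([b"\x00\x01", b"\x02\x03"], stamp="seg_001"):
    print(frame)
```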

`text_input`

Send plain text directly while using ct=live. The server still routes the request through the Live mode pipeline, including TTS and interrupt handling.

{
  "t": "text_input",
  "text": "Hello, what is the weather today?"
}

`interrupt`

Interrupt the current model or TTS response.

{
  "t": "interrupt"
}

Server messages

`metrics`

Performance and provider metadata.

Example:

{
  "t": "metrics",
  "data": {
    "wav_assemble_time": 0.12,
    "stt": "whisper_api",
    "llm_ttft": 0.84,
    "tts_total_time": 1.72
  }
}

`user_msg`

STT result from the uploaded audio.

{
  "t": "user_msg",
  "data": {
    "text": "Hello there",
    "ts": 1710000000000
  }
}

`bot_delta_chunk`

Raw model text delta. This is the token-level or chunk-level stream; it is not sentence-segmented.

{
  "t": "bot_delta_chunk",
  "data": {
    "text": "Hel"
  }
}

Notes:

- This event is generated directly from the model streaming path.
- It is independent from TTS chunking.
- Consumers should append data.text to a local buffer.

`bot_text_chunk`

Text associated with the current TTS chunk. This is usually sentence or phrase segmented.

{
  "t": "bot_text_chunk",
  "data": {
    "text": "Hello there."
  }
}

Notes:

- This event is aligned to TTS output, not raw token streaming.
- It may be coarser than bot_delta_chunk.

`response`

One TTS audio chunk, Base64 encoded.

{
  "t": "response",
  "data": "<base64_audio_bytes>"
}

`bot_msg`

Final bot text, sent when the response completes without audio streaming.

{
  "t": "bot_msg",
  "data": {
    "text": "Final reply text",
    "ts": 1710000001234
  }
}

`stop_play`

Stop client-side audio playback because the response was interrupted.

{
  "t": "stop_play"
}

`end`

Marks the end of the current response turn.

{
  "t": "end"
}

`error`

Recoverable or terminal processing error.

{
  "t": "error",
  "data": "error message"
}

Unified Chat Mode

Set ct=chat on the Open API endpoint or include "ct": "chat" in each client frame when using /api/unified_chat/ws.

Client messages

`bind`

Subscribe to an existing webchat session.

{
  "ct": "chat",
  "t": "bind",
  "session_id": "session_001"
}

`send`

Send a chat request.

{
  "ct": "chat",
  "t": "send",
  "username": "alice",
  "session_id": "session_001",
  "message_id": "msg_001",
  "message": [
    {
      "type": "plain",
      "text": "Please summarize this"
    }
  ],
  "selected_provider": "openai_chat_completion",
  "selected_model": "gpt-4.1-mini",
  "enable_streaming": true
}

message uses the same message-part schema as POST /api/v1/chat.
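
A `send` frame for a plain-text request could be built as follows. This is a sketch under assumptions: `build_send_frame` is a hypothetical helper, the generated `message_id` format is made up for illustration, and treating `selected_provider` and `selected_model` as optional is an assumption that this document does not confirm.

```python
import json
import uuid


def build_send_frame(session_id, username, text, *, provider=None, model=None, streaming=True):
    """Serialize a chat-mode 'send' frame carrying a single plain-text part."""
    frame = {
        "ct": "chat",
        "t": "send",
        "username": username,
        "session_id": session_id,
        "message_id": f"msg_{uuid.uuid4().hex[:12]}",  # hypothetical ID format
        "message": [{"type": "plain", "text": text}],
        "enable_streaming": streaming,
    }
    # Assumption: provider and model may be omitted to use server defaults.
    if provider is not None:
        frame["selected_provider"] = provider
    if model is not None:
        frame["selected_model"] = model
    return json.dumps(frame)


print(build_send_frame("session_001", "alice", "Please summarize this"))
```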

`interrupt`

Interrupt the current chat response.

{
  "ct": "chat",
  "t": "interrupt"
}

Server messages

`session_bound`

Acknowledges a successful bind.

{
  "ct": "chat",
  "type": "session_bound",
  "session_id": "session_001",
  "message_id": "ws_sub_xxx"
}
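
A client would typically send `bind` and then wait for the matching `session_bound` frame before relying on the subscription. The helpers below sketch that check; `bind_frame` and `is_bind_ack` are hypothetical names, and matching on `session_id` is an assumption drawn from the examples above.

```python
import json


def bind_frame(session_id: str) -> str:
    """Serialize a chat-mode 'bind' request for an existing session."""
    return json.dumps({"ct": "chat", "t": "bind", "session_id": session_id})


def is_bind_ack(raw: str, session_id: str) -> bool:
    """Return True if the raw frame acknowledges a bind for this session."""
    frame = json.loads(raw)
    return (
        frame.get("ct") == "chat"
        and frame.get("type") == "session_bound"
        and frame.get("session_id") == session_id
    )
```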

Forwarded streaming events

The server forwards the normal webchat queue payloads. Common examples:

{
  "ct": "chat",
  "type": "plain",
  "data": "Hello",
  "streaming": true,
  "chain_type": null,
  "message_id": "msg_001"
}

{
  "ct": "chat",
  "type": "image",
  "data": "[IMAGE]file.jpg",
  "streaming": false,
  "message_id": "msg_001"
}

{
  "ct": "chat",
  "type": "agent_stats",
  "data": {
    "time_to_first_token": 0.8
  }
}

{
  "ct": "chat",
  "type": "message_saved",
  "data": {
    "id": 123,
    "created_at": "2026-03-16T10:00:00Z"
  }
}

{
  "ct": "chat",
  "type": "end",
  "data": "",
  "streaming": false,
  "message_id": "msg_001"
}

Chat errors

{
  "ct": "chat",
  "t": "error",
  "code": "INVALID_MESSAGE_FORMAT",
  "data": "message must be list"
}
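
Note that the error frame above carries its type under `t` rather than `type`. One possible way to surface such frames in client code is to convert them into exceptions. `ChatProtocolError` and `raise_on_chat_error` are hypothetical names used only for this sketch.

```python
class ChatProtocolError(Exception):
    """Raised when the server reports a chat-mode error frame."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_on_chat_error(frame: dict) -> dict:
    """Return the parsed frame unchanged, or raise if it is a chat error."""
    # Check both keys, since error frames use 't' and other chat frames use 'type'.
    kind = frame.get("t") or frame.get("type")
    if frame.get("ct") == "chat" and kind == "error":
        raise ChatProtocolError(frame.get("code"), frame.get("data"))
    return frame
```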

Recommended Client Strategy

For live mode:

- Append every bot_delta_chunk.data.text into a raw transcript buffer.
- Use bot_text_chunk only when you need text aligned with audio playback.
- Decode and play each response audio chunk in arrival order.
- Reset per-turn buffers after end.
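
The live-mode steps above can be sketched as a small state holder that consumes raw server frames. `LiveTurn` and `LiveClientState` are hypothetical names; the sketch collects decoded audio in a list instead of playing it, and dropping queued audio on `stop_play` is one possible interpretation of that event.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class LiveTurn:
    raw_text: str = ""                                # concatenated bot_delta_chunk text
    spoken_text: list = field(default_factory=list)   # bot_text_chunk entries
    audio: list = field(default_factory=list)         # decoded response chunks, in order


class LiveClientState:
    def __init__(self):
        self.turn = LiveTurn()
        self.finished = []

    def handle(self, raw: str) -> None:
        frame = json.loads(raw)
        kind, data = frame.get("t"), frame.get("data")
        if kind == "bot_delta_chunk":
            self.turn.raw_text += data["text"]
        elif kind == "bot_text_chunk":
            self.turn.spoken_text.append(data["text"])
        elif kind == "response":
            self.turn.audio.append(base64.b64decode(data))
        elif kind == "stop_play":
            self.turn.audio.clear()  # assumption: discard audio not yet played
        elif kind == "end":
            # Archive the finished turn and reset the per-turn buffers.
            self.finished.append(self.turn)
            self.turn = LiveTurn()
```

Keeping `raw_text` and `spoken_text` separate lets the UI render text as soon as it streams in while captions stay aligned with audio playback.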

For chat mode:

- Treat frames with type plain and streaming=true as incremental text.
- Treat a frame with type complete or end as the end of a response turn.
- Persist message_saved metadata if you need server-side history IDs.
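
A comparable sketch for chat mode follows. `ChatTurn` and `apply_chat_frame` are hypothetical names, and the handling of a non-streaming `plain` frame (treated here as the complete text, replacing the buffer) is an assumption that this document does not spell out.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatTurn:
    text: str = ""
    saved_id: Optional[int] = None
    done: bool = False


def apply_chat_frame(turn: ChatTurn, raw: str) -> ChatTurn:
    """Update the turn in place from one raw chat-mode server frame."""
    frame = json.loads(raw)
    kind = frame.get("type") or frame.get("t")
    if kind == "plain" and frame.get("streaming"):
        turn.text += frame["data"]        # incremental text
    elif kind == "plain":
        turn.text = frame["data"]         # assumption: non-streaming plain is the full text
    elif kind == "message_saved":
        turn.saved_id = frame["data"]["id"]
    elif kind in ("complete", "end"):
        turn.done = True
    return turn
```

Other forwarded types, such as `image` or `agent_stats`, are ignored here and would need their own branches in a real client.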

Compatibility Notes

- bot_text_chunk remains sentence or phrase segmented for TTS compatibility.
- bot_delta_chunk is the new delta-level text event for real-time rendering.
- The legacy JWT endpoints and the new Open API endpoint share the same runtime behavior after authentication.
