Soulter 565c371e5c feat: enhance Live Mode with text input functionality and UI improvements
- Added a text input panel to allow users to send plain text messages while in Live Mode.
- Updated the LiveMode.vue component to handle text input and integrate it with WebSocket communication.
- Improved the layout and styling of the Live Mode interface for better user experience.
- Documented the new `text_input` message type in the Live API README.
2026-03-16 22:36:29 +08:00
# AstrBot Live API Protocol
This document describes the current WebSocket protocol for AstrBot Live API.
## Endpoint
- Legacy JWT endpoint: `/api/live_chat/ws`
- Legacy unified JWT endpoint: `/api/unified_chat/ws`
- Open API endpoint: `/api/v1/live/ws`
## Authentication
### Legacy dashboard endpoints
Pass a dashboard JWT in the `token` query parameter.
Example:
```text
ws://localhost:6185/api/live_chat/ws?token=<dashboard_jwt>
```
### Open API endpoint
Use an API key and provide `username` in the query string.
Examples:
```text
ws://localhost:6185/api/v1/live/ws?api_key=<api_key>&username=alice
ws://localhost:6185/api/v1/live/ws?api_key=<api_key>&username=alice&ct=chat
```
`ct` values:
- `live`: voice conversation mode
- `chat`: unified chat mode over the same WebSocket transport
The Open API endpoint reuses the `chat` API key scope.
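The query-string rules above can be sketched as a small URL builder. This is only an illustration: the function name `build_live_ws_url` and its `host` and `secure` parameters are hypothetical, and `ct` is simply left out when not given, as in the first example URL.

```python
from typing import Optional
from urllib.parse import urlencode


def build_live_ws_url(
    host: str,
    api_key: str,
    username: str,
    ct: Optional[str] = None,
    secure: bool = False,
) -> str:
    """Build an Open API WebSocket URL (hypothetical helper, not part of AstrBot)."""
    if ct is not None and ct not in ("live", "chat"):
        raise ValueError("ct must be 'live' or 'chat'")
    params = [("api_key", api_key), ("username", username)]
    if ct is not None:
        params.append(("ct", ct))
    scheme = "wss" if secure else "ws"
    # urlencode percent-escapes values, so usernames with special characters stay valid.
    return f"{scheme}://{host}/api/v1/live/ws?{urlencode(params)}"
```

For example, `build_live_ws_url("localhost:6185", "<api_key>", "alice", ct="chat")` yields a URL of the same shape as the second example above, with the key value percent-encoded.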
## Transport
- Protocol: WebSocket
- Payload format: UTF-8 JSON text frames
- Audio upload format in `live` mode:
  - client sends raw PCM frames encoded as Base64
  - sample rate: `16000` Hz
  - channels: `1`
  - sample width: `16-bit`
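The audio format above fixes the byte rate of the upload stream. A minimal sketch of the arithmetic and the Base64 step follows; the helper names and the 20 ms frame duration used in the example are assumptions, since this document does not specify a frame size.

```python
import base64
from typing import List

SAMPLE_RATE = 16000      # Hz, from the transport section
CHANNELS = 1             # mono
SAMPLE_WIDTH_BYTES = 2   # 16-bit samples


def frame_size_bytes(duration_ms: int) -> int:
    """Bytes of raw PCM needed for a frame of the given duration."""
    return SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH_BYTES * duration_ms // 1000


def split_pcm(pcm: bytes, frame_bytes: int) -> List[bytes]:
    """Split a PCM buffer into frames; the last frame may be shorter."""
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]


def encode_frame(frame: bytes) -> str:
    """Base64-encode one raw PCM frame for use in a JSON text frame."""
    return base64.b64encode(frame).decode("ascii")
```

With these numbers, one second of audio is 32000 bytes and a 20 ms frame (an assumed duration) is 640 bytes.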
## Top-Level Envelope
### Client to server
```json
{
  "t": "message_type",
  "...": "message specific fields"
}
```
When using the unified socket, the client can also include:
```json
{
  "ct": "live|chat",
  "t": "message_type"
}
```
### Server to client
Legacy `live` mode uses:
```json
{
  "t": "message_type",
  "data": {}
}
```
Unified `chat` mode uses:
```json
{
  "ct": "chat",
  "type": "message_type",
  "data": {}
}
```
Some forwarded `chat` frames may also contain `t`, `streaming`, `chain_type`, `message_id`, or `session_id`.
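Because the two modes name the message-type field differently (`t` versus `type`), a client may want to normalize incoming frames before dispatching. The sketch below is one way to do that; `parse_server_frame` is a hypothetical helper, and treating a frame without `ct` as a `live` frame is an assumption based on the legacy envelope shown above.

```python
import json
from typing import Any, Optional, Tuple


def parse_server_frame(raw: str) -> Tuple[str, Optional[str], Any]:
    """Return (mode, message_type, data) for one server text frame."""
    frame = json.loads(raw)
    # Assumption: frames without "ct" use the legacy live envelope.
    mode = frame.get("ct", "live")
    # Chat frames normally carry "type"; live frames and some chat frames carry "t".
    message_type = frame.get("type") or frame.get("t")
    return mode, message_type, frame.get("data")
```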
## Live Mode
### Client messages
#### `start_speaking`
Start a voice capture segment.
```json
{
  "t": "start_speaking",
  "stamp": "seg_001"
}
```
#### `speaking_part`
Send one audio frame.
```json
{
  "t": "speaking_part",
  "data": "<base64_pcm_bytes>"
}
```
#### `end_speaking`
Finish the current voice capture segment.
```json
{
  "t": "end_speaking",
  "stamp": "seg_001"
}
```
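The three messages above form one voice segment: `start_speaking`, any number of `speaking_part` frames, then `end_speaking` with the same `stamp`. A minimal sketch of a generator producing that sequence is shown here; `segment_messages` is a hypothetical helper, and the default of 640 bytes per frame (20 ms of 16 kHz mono 16-bit PCM) is an assumption rather than a protocol requirement.

```python
import base64
import json
from typing import Iterator


def segment_messages(pcm: bytes, stamp: str, frame_bytes: int = 640) -> Iterator[str]:
    """Yield the JSON text frames for one voice capture segment, in send order."""
    yield json.dumps({"t": "start_speaking", "stamp": stamp})
    for offset in range(0, len(pcm), frame_bytes):
        chunk = pcm[offset:offset + frame_bytes]
        # Each audio frame is raw PCM, Base64-encoded into the "data" field.
        yield json.dumps({
            "t": "speaking_part",
            "data": base64.b64encode(chunk).decode("ascii"),
        })
    yield json.dumps({"t": "end_speaking", "stamp": stamp})
```

Each yielded string can be passed to whatever WebSocket client is in use as a single text frame.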
#### `text_input`
Send plain text directly while using `ct=live`. The server still routes the input through the Live mode pipeline, including TTS and interrupt handling.
```json
{
  "t": "text_input",
  "text": "Hello, what is the weather today?"
}
```
#### `interrupt`
Interrupt the current model or TTS response.
```json
{
  "t": "interrupt"
}
```
### Server messages
#### `metrics`
Performance and provider metadata.
Example:
```json
{
  "t": "metrics",
  "data": {
    "wav_assemble_time": 0.12,
    "stt": "whisper_api",
    "llm_ttft": 0.84,
    "tts_total_time": 1.72
  }
}
```
#### `user_msg`
STT result from the uploaded audio.
```json
{
  "t": "user_msg",
  "data": {
    "text": "Hello there",
    "ts": 1710000000000
  }
}
```
#### `bot_delta_chunk`
Raw model text delta. This is the token-level or chunk-level stream and is not sentence-segmented.
```json
{
  "t": "bot_delta_chunk",
  "data": {
    "text": "Hel"
  }
}
}
```
Notes:
- This event is generated directly from the model streaming path.
- It is independent of TTS chunking.
- Consumers should append `data.text` to a local buffer.
#### `bot_text_chunk`
Text associated with the current TTS chunk. This is usually sentence or phrase segmented.
```json
{
  "t": "bot_text_chunk",
  "data": {
    "text": "Hello there."
  }
}
}
```
Notes:
- This event is aligned to TTS output, not raw token streaming.
- It may be coarser than `bot_delta_chunk`.
#### `response`
One TTS audio chunk, Base64 encoded.
```json
{
  "t": "response",
  "data": "<base64_audio_bytes>"
}
```
#### `bot_msg`
Final bot text, sent when the response completes without audio streaming.
```json
{
  "t": "bot_msg",
  "data": {
    "text": "Final reply text",
    "ts": 1710000001234
  }
}
```
#### `stop_play`
Stop client-side audio playback because the response was interrupted.
```json
{
  "t": "stop_play"
}
```
#### `end`
Marks the end of the current response turn.
```json
{
  "t": "end"
}
```
#### `error`
Recoverable or terminal processing error.
```json
{
  "t": "error",
  "data": "error message"
}
```
## Unified Chat Mode
Set `ct=chat` on the Open API endpoint or include `"ct": "chat"` in each client frame when using `/api/unified_chat/ws`.
### Client messages
#### `bind`
Subscribe to an existing webchat session.
```json
{
  "ct": "chat",
  "t": "bind",
  "session_id": "session_001"
}
```
#### `send`
Send a chat request.
```json
{
  "ct": "chat",
  "t": "send",
  "username": "alice",
  "session_id": "session_001",
  "message_id": "msg_001",
  "message": [
    {
      "type": "plain",
      "text": "Please summarize this"
    }
  ],
  "selected_provider": "openai_chat_completion",
  "selected_model": "gpt-4.1-mini",
  "enable_streaming": true
}
```
`message` uses the same message-part schema as `POST /api/v1/chat`.
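As an illustration, a `send` frame with a single plain-text part could be assembled as below. `build_send_frame` is a hypothetical helper; generating `message_id` from a UUID and omitting `selected_provider` and `selected_model` when they are not supplied are assumptions, since this document does not say which fields are optional.

```python
import json
import uuid
from typing import Optional


def build_send_frame(
    username: str,
    session_id: str,
    text: str,
    message_id: Optional[str] = None,
    selected_provider: Optional[str] = None,
    selected_model: Optional[str] = None,
    enable_streaming: bool = True,
) -> str:
    """Serialize a chat `send` frame carrying one plain-text message part."""
    frame = {
        "ct": "chat",
        "t": "send",
        "username": username,
        "session_id": session_id,
        # Assumption: any unique client-chosen string is acceptable as message_id.
        "message_id": message_id or f"msg_{uuid.uuid4().hex}",
        "message": [{"type": "plain", "text": text}],
        "enable_streaming": enable_streaming,
    }
    # Assumption: provider and model selection may be left out.
    if selected_provider is not None:
        frame["selected_provider"] = selected_provider
    if selected_model is not None:
        frame["selected_model"] = selected_model
    return json.dumps(frame)
```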
#### `interrupt`
Interrupt the current chat response.
```json
{
  "ct": "chat",
  "t": "interrupt"
}
```
### Server messages
#### `session_bound`
Acknowledges a successful `bind`.
```json
{
  "ct": "chat",
  "type": "session_bound",
  "session_id": "session_001",
  "message_id": "ws_sub_xxx"
}
```
#### Forwarded streaming events
The server forwards the normal webchat queue payloads. Common examples:
```json
{
  "ct": "chat",
  "type": "plain",
  "data": "Hello",
  "streaming": true,
  "chain_type": null,
  "message_id": "msg_001"
}
```
```json
{
  "ct": "chat",
  "type": "image",
  "data": "[IMAGE]file.jpg",
  "streaming": false,
  "message_id": "msg_001"
}
```
```json
{
  "ct": "chat",
  "type": "agent_stats",
  "data": {
    "time_to_first_token": 0.8
  }
}
```
```json
{
  "ct": "chat",
  "type": "message_saved",
  "data": {
    "id": 123,
    "created_at": "2026-03-16T10:00:00Z"
  }
}
```
```json
{
  "ct": "chat",
  "type": "end",
  "data": "",
  "streaming": false,
  "message_id": "msg_001"
}
```
#### Chat errors
```json
{
  "ct": "chat",
  "t": "error",
  "code": "INVALID_MESSAGE_FORMAT",
  "data": "message must be list"
}
```
## Recommended Client Strategy
For `live` mode:
1. Append every `bot_delta_chunk.data.text` into a raw transcript buffer.
2. Use `bot_text_chunk` only when you need text aligned with audio playback.
3. Decode and play each `response` audio chunk in arrival order.
4. Reset per-turn buffers after `end`.
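The four `live` mode steps above can be sketched as a small per-turn state object. The class name `LiveTurn` and its fields are hypothetical; audio is only decoded and queued here, since actual playback depends on the client platform, and dropping queued audio on `stop_play` is one possible reading of that event.

```python
import base64
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class LiveTurn:
    transcript: str = ""                                    # raw text from bot_delta_chunk
    spoken_text: List[str] = field(default_factory=list)    # text aligned with TTS chunks
    audio_chunks: List[bytes] = field(default_factory=list) # decoded audio, arrival order

    def handle(self, frame: Dict[str, Any]) -> bool:
        """Apply one parsed live frame; return True when the turn has ended."""
        kind = frame.get("t")
        data = frame.get("data")
        if kind == "bot_delta_chunk":
            self.transcript += data["text"]
        elif kind == "bot_text_chunk":
            self.spoken_text.append(data["text"])
        elif kind == "response":
            self.audio_chunks.append(base64.b64decode(data))
        elif kind == "stop_play":
            # Assumption: an interrupted response means queued audio is discarded.
            self.audio_chunks.clear()
        elif kind == "end":
            return True
        return False

    def reset(self) -> None:
        """Clear per-turn buffers after `end`."""
        self.transcript = ""
        self.spoken_text.clear()
        self.audio_chunks.clear()
```

A receive loop would call `handle` for every frame and call `reset` once it returns `True`.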
For `chat` mode:
1. Treat `plain + streaming=true` as incremental text.
2. Treat `complete` or `end` as the end of a response turn.
3. Persist `message_saved` metadata if you need server-side history IDs.
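The `chat` mode steps can be sketched the same way. `ChatTurn` is a hypothetical class; it reads the message type from `type` and falls back to `t`, and it keeps only the `id` from `message_saved`, which is an illustrative choice.

```python
from typing import Any, Dict, List, Optional


class ChatTurn:
    def __init__(self) -> None:
        self.text_parts: List[str] = []
        self.saved_id: Optional[int] = None
        self.done = False

    def handle(self, frame: Dict[str, Any]) -> None:
        """Apply one parsed chat frame to the current turn."""
        kind = frame.get("type") or frame.get("t")
        if kind == "plain" and frame.get("streaming"):
            # Incremental text: append in arrival order.
            self.text_parts.append(frame["data"])
        elif kind == "message_saved":
            self.saved_id = frame["data"]["id"]
        elif kind in ("complete", "end"):
            self.done = True

    @property
    def text(self) -> str:
        return "".join(self.text_parts)
```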
## Compatibility Notes
- `bot_text_chunk` remains sentence or phrase segmented for TTS compatibility.
- `bot_delta_chunk` is the new delta-level text event for real-time rendering.
- The legacy JWT endpoints and the new Open API endpoint share the same runtime behavior after authentication.