483048e3dc
* feat: add bocha web search tool (#4902) * add bocha web search tool * Revert "add bocha web search tool" This reverts commit 1b36d75a17. * add bocha web search tool * fix: correct temporary_cache spelling and update supported tools for web search * ruff --------- Co-authored-by: Soulter <905617992@qq.com> * fix: messages[x] assistant content must contain at least one part (#4928) * fix: messages[x] assistant content must contain at least one part fixes: #4876 * ruff format * chore: bump version to 4.14.5 (#4930) * feat: implement feishu / lark media file handling utilities for file, audio and video processing (#4938) * feat: implement media file handling utilities for audio and video processing * feat: refactor file upload handling for audio and video in LarkMessageEvent * feat: add cleanup for failed audio and video conversion outputs in media_utils * feat: add utility methods for sending messages and uploading files in LarkMessageEvent * fix: correct spelling of 'temporary' in SharedPreferences class * perf: optimize webchat and wecom ai queue lifecycle (#4941) * perf: optimize webchat and wecom ai queue lifecycle * perf: enhance webchat back queue management with conversation ID support * fix: localize provider source config UI (#4933) * fix: localize provider source ui * feat: localize provider metadata keys * chore: add provider metadata translations * chore: format provider i18n changes * fix: preserve metadata fields in i18n conversion * fix: internationalize platform config and dialog * fix: add Weixin official account platform icon --------- Co-authored-by: Soulter <905617992@qq.com> * chore: bump version to 4.14.6 * feat: add provider-source-level proxy (#4949) * feat: add provider-level proxy support and request-failure logging * refactor: simplify provider source configuration structure * refactor: move env proxy fallback logic to log_connection_failure * refactor: update client proxy handling and add terminate method for cleanup * refactor: update no_proxy configuration to remove
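redundant subnet --------- Co-authored-by: Soulter <905617992@qq.com> * feat(ComponentPanel): implement permission management for dashboard (#4887) * feat(backend): add permission update api * feat(useCommandActions): add updatePermission action and translations * feat(dashboard): implement permission editing ui * style: fix import sorting in command.py * refactor(backend): extract permission update logic to service * feat(i18n): add success and failure messages for command updates --------- Co-authored-by: Soulter <905617992@qq.com> * feat: allow the LLM to preview images returned by tools and decide on its own whether to send them (#4895) * feat: allow the LLM to preview images returned by tools and decide on its own whether to send them * reuse send_message_to_user instead of a separate image-sending tool * feat: implement _HandleFunctionToolsResult class for improved tool response handling * docs: add path handling guidelines to AGENTS.md --------- Co-authored-by: Soulter <905617992@qq.com> * feat(telegram): add media group (album) support (#4893) * feat(telegram): add media group (album) support ## Feature description Supports Telegram media group messages (albums): multiple images/videos are merged and handled as a single message instead of being scattered across several messages. ## Main changes ### 1. Initialize the media group cache (__init__) - Add a `media_group_cache` dict that stores pending media group messages - Collect media group messages with a 2.5-second timeout (based on community best practice) - Maximum wait of 10 seconds (prevents waiting forever) ### 2. Message handling flow (message_handler) - Check `media_group_id` to decide whether a message belongs to a media group - Media group messages take a dedicated path so they are not processed piecemeal ### 3. Media group message caching (handle_media_group_message) - Cache incoming media group messages - Implement a debounce mechanism with APScheduler - Reset the timeout timer whenever a new message arrives - Trigger unified processing once the timeout expires ### 4. Media group merging (process_media_group) - Take all media items out of the cache - Use the first message as the base (keeps text, reply, and similar information) - Append all images, videos, and documents to the message chain in order - Send the merged message into the processing pipeline ## Technical rationale Design limitations of the Telegram Bot API when handling media groups: 1. Each message of a media group is sent as a separate update 2. Each update carries the same `media_group_id` 3.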
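There is **no** mechanism that provides the group's total count, an end marker, or the complete group in one go. The bot therefore has to collect the messages itself and wait, via a hard-coded timeout/delay, for messages that may arrive late. This is currently the only reliable approach and is widely adopted by the official implementation, mainstream frameworks, and the developer community. ### Official and community evidence: - **Telegram Bot API server implementation (tdlib)**: explicitly notes the lack of an end marker or total count https://github.com/tdlib/telegram-bot-api/issues/643 - **Telegram Bot API server issue**: discusses the inconvenience of media group handling and recommends a timeout mechanism https://github.com/tdlib/telegram-bot-api/issues/339 - **Telegraf (Node.js framework)**: a dedicated media group middleware uses a timeout to control the wait https://github.com/DieTime/telegraf-media-group - **StackOverflow discussion**: all files of a media group cannot be fetched at once and must be collected manually https://stackoverflow.com/questions/50180048/telegram-api-get-all-uploaded-photos-by-media-group-id - **python-telegram-bot community**: confirms that media group messages arrive individually and need manual handling https://github.com/python-telegram-bot/python-telegram-bot/discussions/3143 - **Telegram Bot API official documentation**: only defines `media_group_id` as an optional field and offers no interface for fetching a complete group https://core.telegram.org/bots/api#message ## Implementation details - Collect media group messages with a 2.5-second timeout (based on community best practice) - Maximum wait of 10 seconds (prevents waiting forever) - Debounce mechanism: the timer is reset on every new message - Use APScheduler for delayed processing and job scheduling ## Test verification - ✅ Sent a 5-image album; it was merged into a single message - ✅ Original caption text and reply information are preserved - ✅ Media groups mixing images, videos, and documents are supported - ✅ Log shows Processing media group <media_group_id> with 5 items ## Code changes - File: astrbot/core/platform/sources/telegram/tg_adapter.py - Lines added: 124 - New methods: handle_media_group_message(), process_media_group() Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor(telegram): improve media group handling performance and reliability Improvements based on code review feedback: 1. Implement media_group_max_wait to prevent unbounded delay - Track the media group's creation time and process immediately once the maximum wait is exceeded - In the worst case the group is processed within 10 seconds, so a steady stream of messages cannot delay it indefinitely 2. Remove the manual job lookup to improve performance - Drop the O(N) get_jobs() loop scan - Rely on replace_existing=True to replace the job automatically 3.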
Reuse convert_message to reduce code duplication - Unify the conversion logic for all media types - Adding a new media type in the future requires changing only one place Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(telegram): handle missing message in media group processing and improve logging messages --------- Co-authored-by: Ubuntu <ubuntu@localhost.localdomain> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Soulter <905617992@qq.com> * feat: add welcome feature with localized content and onboarding steps * fix: correct height attribute to max-height for dialog component * feat: supports electron app (#4952) * feat: add desktop wrapper with frontend-only packaging * docs: add desktop build docs and track dashboard lockfile * fix: track desktop lockfile for npm ci * fix: allow custom install directory for windows installer * chore: migrate desktop workflow to pnpm * fix(desktop): build AppImage only on Linux * fix(desktop): harden packaged startup and backend bundling * fix(desktop): adapt packaged restart and plugin dependency flow * fix(desktop): prevent backend respawn race on quit * fix(desktop): prefer pyproject version for desktop packaging * fix(desktop): improve startup loading UX and reduce flicker * ci: add desktop multi-platform release workflow * ci: fix desktop release build and mac runner labels * ci: disable electron-builder auto publish in desktop build * ci: avoid electron-builder publish path in build matrix * ci: normalize desktop release artifact names * ci: exclude blockmap files from desktop release assets * ci: prefix desktop release assets with AstrBot and purge blockmaps * feat: add electron bridge types and expose backend control methods in preload script * Update startup screen assets and styles - Changed the icon from PNG to SVG format for better scalability. - Updated the border color from #d0d0d0 to #eeeeee for a softer appearance. - Adjusted the width of the startup screen from 460px to 360px for improved responsiveness.
* Update .gitignore to include package.json * chore: remove desktop gitkeep ignore exceptions * docs: update desktop troubleshooting for current runtime behavior * refactor(desktop): modularize runtime and harden startup flow --------- Co-authored-by: Soulter <905617992@qq.com> Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> * fix: dedupe preset messages (#4961) * feat: enhance package.json with resource filters and compression settings * chore: update Python version requirements to 3.12 (#4963) * chore: bump version to 4.14.7 * feat: refactor release workflow and add special update handling for electron app (#4969) * chore: bump version to 4.14.8 and bump faiss-cpu version up to date * chore: auto ann fix by ruff (#4903) * chore: auto fix by ruff * refactor: correct return type annotations to None/bool to match the implementations * refactor: make _get_next_page async and remove the redundant request-error raise * refactor: change the return type of get_client to object * style: add None return type annotations to the relevant LarkMessageEvent methods --------- Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> * fix: prepare OpenSSL via vcpkg for Windows ARM64 * ci: change ghcr namespace * chore: update pydantic dependency version (#4980) * feat: add delete button to persona management dialog (#4978) * Initial plan * feat: add delete button to persona management dialog - Added delete button to PersonaForm dialog (only visible when editing) - Implemented deletePersona method with confirmation dialog - Connected delete event to PersonaManager for proper handling - Button positioned on left side of dialog actions for clear separation - Uses existing i18n translations for delete button and messages Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> * fix: use finally block to ensure saving state is reset - Moved `this.saving = false` to finally block in deletePersona - Ensures UI doesn't stay in saving state after errors - Follows best practices for state management Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> ---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> * feat: enhance Dingtalk adapter with active push message and image, video, audio message type (#4986) * fix: handle pip install execution in frozen runtime (#4985) * fix: handle pip install execution in frozen runtime * fix: harden pip subprocess fallback handling * fix: collect certifi data in desktop backend build (#4995) * feat: support proactive message push for WeCom apps, and improve audio handling for WeCom apps, WeChat Official Accounts, and WeChat Customer Service (#4998) * feat: support proactive message push for the WeCom AI bot, plus sending video, file, and other message types (#4999) * feat: enhance WecomAIBotAdapter and WecomAIBotMessageEvent for improved streaming message handling (#5000) fixes: #3965 * feat: enhance persona tool management and update UI localization for subagent orchestration (#4990) * feat: enhance persona tool management and update UI localization for subagent orchestration * fix: remove debug logging for final ProviderRequest in build_main_agent function * perf: stabilize pip install behavior in source and Electron-packaged environments, and fix the redirect dialog shown when clicking the WebUI update button outside Electron (#4996) * fix: handle pip install execution in frozen runtime * fix: harden pip subprocess fallback handling * fix: scope global data root to packaged electron runtime * refactor: inline frozen runtime check for electron guard * fix: prefer current interpreter for source pip installs * fix: avoid resolving venv python symlink for pip * refactor: share runtime environment detection utilities * fix: improve error message when pip module is unavailable * fix: raise ImportError when pip module is unavailable * fix: preserve ImportError semantics for missing pip * fix: stop showing the Electron update dialog when updating outside the Electron app --------- Co-authored-by: Soulter <905617992@qq.com> * fix: 'HandoffTool' object has no attribute 'agent' (#5005) * fix: move the agent assignment to after super().__init__ * add: add a one-line comment * chore(deps): bump the github-actions group with 2 updates (#5006) Bumps the github-actions group with 2 updates:
[astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) and [actions/download-artifact](https://github.com/actions/download-artifact). Updates `astral-sh/setup-uv` from 6 to 7 - [Release notes](https://github.com/astral-sh/setup-uv/releases) - [Commits](https://github.com/astral-sh/setup-uv/compare/v6...v7) Updates `actions/download-artifact` from 6 to 7 - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/v6...v7) --- updated-dependencies: - dependency-name: astral-sh/setup-uv dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions - dependency-name: actions/download-artifact dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix: stabilize packaged runtime pip/ssl behavior and mac font fallback (#5007) * fix: patch pip distlib finder for frozen electron runtime * fix: use certifi CA bundle for runtime SSL requests * fix: configure certifi CA before core imports * fix: improve mac font fallback for dashboard text * fix: harden frozen pip patch and unify TLS connector * refactor: centralize dashboard CJK font fallback stacks * perf: reuse TLS context and avoid repeated frozen pip patch * refactor: bootstrap TLS setup before core imports * fix: use async confirm dialog for provider deletions * fix: replace native confirm dialogs in dashboard - Add shared confirm helper in dashboard/src/utils/confirmDialog.ts for async dialog usage with safe fallback. - Migrate provider, chat, config, session, platform, persona, MCP, backup, and knowledge-base delete/close confirmations to use the shared helper. 
- Remove scattered inline confirm handling to keep behavior consistent and avoid native blocking dialog focus/caret issues in Electron. * fix: capture runtime bootstrap logs after logger init - Add bootstrap record buffer in runtime_bootstrap for early TLS patch logs before logger is ready. - Flush buffered bootstrap logs to astrbot logger at process startup in main.py. - Include concrete exception details for TLS bootstrap failures to improve diagnosis. * fix: harden runtime bootstrap and unify confirm handling - Simplify bootstrap log buffering and add a public initialize hook for non-main startup paths. - Guard aiohttp TLS patching with feature/type checks and keep graceful fallback when internals are unavailable. - Standardize dashboard confirmation flow via shared confirm helpers across composition and options API components. * refactor: simplify runtime tls bootstrap and tighten confirm typing * refactor: align ssl helper namespace and confirm usage * fix: fix backend restart failure in the packaged Windows build (#5009) * fix: patch pip distlib finder for frozen electron runtime * fix: use certifi CA bundle for runtime SSL requests * fix: configure certifi CA before core imports * fix: improve mac font fallback for dashboard text * fix: harden frozen pip patch and unify TLS connector * refactor: centralize dashboard CJK font fallback stacks * perf: reuse TLS context and avoid repeated frozen pip patch * refactor: bootstrap TLS setup before core imports * fix: use async confirm dialog for provider deletions * fix: replace native confirm dialogs in dashboard - Add shared confirm helper in dashboard/src/utils/confirmDialog.ts for async dialog usage with safe fallback. - Migrate provider, chat, config, session, platform, persona, MCP, backup, and knowledge-base delete/close confirmations to use the shared helper. - Remove scattered inline confirm handling to keep behavior consistent and avoid native blocking dialog focus/caret issues in Electron.
* fix: capture runtime bootstrap logs after logger init - Add bootstrap record buffer in runtime_bootstrap for early TLS patch logs before logger is ready. - Flush buffered bootstrap logs to astrbot logger at process startup in main.py. - Include concrete exception details for TLS bootstrap failures to improve diagnosis. * fix: harden runtime bootstrap and unify confirm handling - Simplify bootstrap log buffering and add a public initialize hook for non-main startup paths. - Guard aiohttp TLS patching with feature/type checks and keep graceful fallback when internals are unavailable. - Standardize dashboard confirmation flow via shared confirm helpers across composition and options API components. * refactor: simplify runtime tls bootstrap and tighten confirm typing * refactor: align ssl helper namespace and confirm usage * fix: avoid frozen restart crash from multiprocessing import * fix: include missing frozen dependencies for windows backend * fix: use execv for stable backend reboot args * Revert "fix: use execv for stable backend reboot args" This reverts commit 9cc27becff. * Revert "fix: include missing frozen dependencies for windows backend" This reverts commit 52554bea1f. * Revert "fix: avoid frozen restart crash from multiprocessing import" This reverts commit 10548645b0.
* fix: reset pyinstaller onefile env before reboot * fix: unify electron restart path and tray-exit backend cleanup * fix: stabilize desktop restart detection and frozen reboot args * fix: make dashboard restart wait detection robust * fix: revert dashboard restart waiting interaction tweaks * fix: pass auth token for desktop graceful restart * fix: avoid false failure during graceful restart wait * fix: start restart waiting before electron restart call * fix: harden restart waiting and reboot arg parsing * fix: parse start_time as numeric timestamp * fix: fix in-app restart errors, the missing immediate restart prompt when clicking restart in the app, and the UI not refreshing promptly once the backend is ready (#5013) * fix: patch pip distlib finder for frozen electron runtime * fix: use certifi CA bundle for runtime SSL requests * fix: configure certifi CA before core imports * fix: improve mac font fallback for dashboard text * fix: harden frozen pip patch and unify TLS connector * refactor: centralize dashboard CJK font fallback stacks * perf: reuse TLS context and avoid repeated frozen pip patch * refactor: bootstrap TLS setup before core imports * fix: use async confirm dialog for provider deletions * fix: replace native confirm dialogs in dashboard - Add shared confirm helper in dashboard/src/utils/confirmDialog.ts for async dialog usage with safe fallback. - Migrate provider, chat, config, session, platform, persona, MCP, backup, and knowledge-base delete/close confirmations to use the shared helper. - Remove scattered inline confirm handling to keep behavior consistent and avoid native blocking dialog focus/caret issues in Electron. * fix: capture runtime bootstrap logs after logger init - Add bootstrap record buffer in runtime_bootstrap for early TLS patch logs before logger is ready. - Flush buffered bootstrap logs to astrbot logger at process startup in main.py. - Include concrete exception details for TLS bootstrap failures to improve diagnosis.
* fix: harden runtime bootstrap and unify confirm handling - Simplify bootstrap log buffering and add a public initialize hook for non-main startup paths. - Guard aiohttp TLS patching with feature/type checks and keep graceful fallback when internals are unavailable. - Standardize dashboard confirmation flow via shared confirm helpers across composition and options API components. * refactor: simplify runtime tls bootstrap and tighten confirm typing * refactor: align ssl helper namespace and confirm usage * fix: avoid frozen restart crash from multiprocessing import * fix: include missing frozen dependencies for windows backend * fix: use execv for stable backend reboot args * Revert "fix: use execv for stable backend reboot args" This reverts commit 9cc27becff. * Revert "fix: include missing frozen dependencies for windows backend" This reverts commit 52554bea1f. * Revert "fix: avoid frozen restart crash from multiprocessing import" This reverts commit 10548645b0. * fix: reset pyinstaller onefile env before reboot * fix: unify electron restart path and tray-exit backend cleanup * fix: stabilize desktop restart detection and frozen reboot args * fix: make dashboard restart wait detection robust * fix: revert dashboard restart waiting interaction tweaks * fix: pass auth token for desktop graceful restart * fix: avoid false failure during graceful restart wait * fix: start restart waiting before electron restart call * fix: harden restart waiting and reboot arg parsing * fix: parse start_time as numeric timestamp * fix: preserve windows frozen reboot argv quoting * fix: align restart waiting with electron restart timing * fix: tighten graceful restart and unmanaged kill safety * chore: bump version to 4.15.0 (#5003) * fix: add reminder for v4.14.8 users regarding manual redeployment due to a bug * fix: harden plugin dependency loading in frozen app runtime (#5015) * fix: compare plugin versions semantically in market updates * fix: prioritize plugin site-packages for
in-process pip * fix: reload starlette from plugin target site-packages * fix: harden plugin dependency import precedence in frozen runtime * fix: improve plugin dependency conflict handling * refactor: simplify plugin conflict checks and version utils * fix: expand transitive plugin dependencies for conflict checks * fix: recover conflicting plugin dependencies during module prefer * fix: reuse renderer restart flow for tray backend restart * fix: add recoverable plugin dependency conflict handling * revert: remove plugin version comparison changes * fix: add missing tray restart backend labels * feat: adding support for media and quoted message attachments for feishu (#5018) * docs: add AUR installation method (#4879) * docs: sync system package manager installation instructions to all languages * Update README.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update README.md Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> * fix/typo * refactor: update system package manager installation instructions for Arch Linux across multiple language README files * feat: add installation command for AstrBot in multiple language README files --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> Co-authored-by: Soulter <905617992@qq.com> * fix(desktop): add size-based rotation for Electron and backend logs (#5029) * fix(desktop): rotate electron and backend logs * refactor(desktop): centralize log rotation defaults and debug fs errors * fix(desktop): harden rotation fs ops and buffer backend log writes * refactor(desktop): extract buffered logger and reduce sync stat calls * refactor(desktop): simplify rotation flow and harden logger config * fix(desktop): make app logging async and flush-safe * fix: harden app log path switching and debug-gated rotation errors * fix:
cap buffered log chunk size during path switch * feat: add first notice feature with multilingual support and UI integration * fix: improve startup stability of the packaged desktop app and refine plugin dependency handling (#5031) * fix(desktop): rotate electron and backend logs * refactor(desktop): centralize log rotation defaults and debug fs errors * fix(desktop): harden rotation fs ops and buffer backend log writes * refactor(desktop): extract buffered logger and reduce sync stat calls * refactor(desktop): simplify rotation flow and harden logger config * fix(desktop): make app logging async and flush-safe * fix: harden app log path switching and debug-gated rotation errors * fix: cap buffered log chunk size during path switch * fix: avoid redundant plugin reinstall and upgrade electron * fix: stop webchat tasks cleanly and bind packaged backend to localhost * fix: unify platform shutdown and await webchat listener cleanup * fix: improve startup logs for dashboard and onebot listeners * fix: revert extra startup service logs * fix: harden plugin import recovery and webchat listener cleanup * fix: pin dashboard ci node version to 24.13.0 * fix: avoid duplicate webchat listener cleanup on terminate * refactor: clarify platform task lifecycle management * fix: continue platform shutdown when terminate fails * feat: temporary file handling and introduce TempDirCleaner (#5026) * feat: temporary file handling and introduce TempDirCleaner - Updated various modules to use `get_astrbot_temp_path()` instead of `get_astrbot_data_path()` for temporary file storage. - Renamed temporary files for better identification and organization. - Introduced `TempDirCleaner` to manage the size of the temporary directory, ensuring it does not exceed a specified limit by deleting the oldest files. - Added configuration option for maximum temporary directory size in the dashboard. - Implemented tests for `TempDirCleaner` to verify cleanup functionality and size management.
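The oldest-first cleanup described for `TempDirCleaner` above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `trim_temp_dir`, its signature, and the use of modification time to define "oldest" are hypothetical and not taken from AstrBot's actual implementation.

```python
import os
from pathlib import Path


def trim_temp_dir(root: Path, max_bytes: int) -> list[Path]:
    """Delete the oldest files under root until the total size fits max_bytes.

    Hypothetical helper: "oldest" is assumed to mean lowest modification time.
    Returns the paths that were removed, in deletion order.
    """
    files = [p for p in root.rglob("*") if p.is_file()]
    # Oldest first, by modification time.
    files.sort(key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)

    removed: list[Path] = []
    for path in files:
        if total <= max_bytes:
            break  # already within the limit, keep the newer files
        size = path.stat().st_size
        path.unlink(missing_ok=True)
        total -= size
        removed.append(path)
    return removed
```

Calling `trim_temp_dir(Path("/some/temp/dir"), 1024 * 1024 * 1024)` would keep the newest files that fit within 1 GiB; the path and limit here are placeholders.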
* ruff * fix: close unawaited reset coroutine on early return (#5033) When an OnLLMRequestEvent hook stops event propagation, the reset_coro created by build_main_agent was never awaited, causing a RuntimeWarning. Close the coroutine explicitly before returning. Fixes #5032 Co-authored-by: Limitless2023 <limitless@users.noreply.github.com> * fix: update error logging message for connection failures * docs: clean and sync README (#5014) * fix: close missing div in README * fix: sync README_zh-TW with README * fix: sync README * fix: correct typo correct url in README_en README_fr README_ru * docs: sync README_en with README * Update README_en.md Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> --------- Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> * fix: provider extra param dialog key display error * chore: ruff format * feat: add send_chat_action for Telegram platform adapter (#5037) * feat: add send_chat_action for Telegram platform adapter Add typing/upload indicator when sending messages via Telegram. 
- Added _send_chat_action helper method for sending chat actions - Send appropriate action (typing, upload_photo, upload_document, upload_voice) before sending different message types - Support streaming mode with typing indicator - Support supergroup with message_thread_id * refactor(telegram): extract chat action helpers and add throttling - Add ACTION_BY_TYPE mapping for message type to action priority - Add _get_chat_action_for_chain() to determine action from message chain - Add _send_media_with_action() for upload → send → restore typing pattern - Add _ensure_typing() helper for typing status - Add chat action throttling (0.5s) in streaming mode to avoid rate limits - Update type annotation to ChatAction | str for better static checking * feat(telegram): implement send_typing method for Telegram platform --------- Co-authored-by: Soulter <905617992@qq.com> * fix: fix double scrollbars in the changelog and official documentation dialogs (#5060) * docs: sync and fix readme typo (#5055) * docs: fix index typo * docs: fix typo in README_en.md - remove Russian text that accidentally appeared in the English README and replace it with English * docs: fix html typo - remove unused '</p>' * docs: sync table with README * docs: sync README header format - keep the README header format consistent * doc: sync key features * style: format files - Fix formatting issues from previous PR * fix: correct md anchor link * docs: correct typo in README_fr.md Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> * docs: correct typo in README_zh-TW.md Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> --------- Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> * fix: add the persona folder mapping that was missing from backups (#5042) * feat: QQ official bot platform supports proactive message push and receiving files in private chats (#5066) * feat: QQ official bot platform supports proactive message push and receiving files in private chats * feat: enhance QQOfficialWebhook to remember session scenes for group, channel, and friend messages * perf: optimize initialization of the segmented reply interval (#5068) fixes: #5059 * fix: chunk err when using openrouter deepseek (#5069) * feat: add i18n supports
for custom platform adapters (#5045) * Feat: provide a data path for metadata and i18n of plugin-provided adapters * chore: update docstrings with pull request references Added references to pull request 5045 in docstrings. --------- Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> * fix: improve forwarded/quoted message parsing and image fallback, with configurable control (#5054) * feat: support fallback image parsing for quoted messages * fix: fallback parse quoted images when reply chain has placeholders * style: format network utils with ruff * test: expand quoted parser coverage and improve fallback diagnostics * fix: fallback to text-only retry when image requests fail * fix: tighten image fallback and resolve nested quoted forwards * refactor: simplify quoted message extraction and dedupe images * fix: harden quoted parsing and openai error candidates * fix: harden quoted image ref normalization * refactor: organize quoted parser settings and logging * fix: cap quoted fallback images and avoid retry loops * refactor: split quoted message parser into focused modules * refactor: share onebot segment parsing logic * refactor: unify quoted message parsing flow * feat: move quoted parser tuning to provider settings * fix: add missing i18n metadata for quoted parser settings * chore: refine forwarded message setting labels * fix: add config tabs and routing for normal and system configurations * chore: bump version to 4.16.0 (#5074) * feat: add LINE platform support with adapter and configuration (#5085) * fix-correct-FIRST_NOTICE.md-locale-path-resolution (#5083) (#5082) * fix: change the config file directory * fix: add a fallback FIRST_NOTICE.zh-CN.md for compatibility * fix: remove unnecessary frozen flag from requirements export in Dockerfile fixes: #5089 * fix #5089: add uv lock step in Dockerfile before export (#5091) Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> * feat: support hot reload after plugin load failure (#5043) * add :Support hot reload after plugin load failure * Apply suggestions from code review Co-authored-by: sourcery-ai[bot]
<58596630+sourcery-ai[bot]@users.noreply.github.com> * fix:reformat code * fix:reformat code --------- Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> * feat: add fallback chat model chain in tool loop runner (#5109) * feat: implement fallback provider support for chat models and update configuration * feat: enhance provider selection display with count and chips for selected providers * feat: update fallback chat providers to use provider settings and add warning for non-list fallback models * feat: add Afdian support card to resources section in WelcomePage * feat: replace colorlog with loguru for enhanced logging support (#5115) * feat: add SSL configuration options for WebUI and update related logging (#5117) * chore: bump version to 4.17.0 * fix: handle list format content from OpenAI-compatible APIs (#5128) * fix: handle list format content from OpenAI-compatible APIs Some LLM providers (e.g., GLM-4.5V via SiliconFlow) return content as list[dict] format like [{'type': 'text', 'text': '...'}] instead of plain string. This causes the raw list representation to be displayed to users. 
Changes: - Add _normalize_content() helper to extract text from various content formats - Use json.loads instead of ast.literal_eval for safer parsing - Add size limit check (8KB) before attempting JSON parsing - Only convert lists that match OpenAI content-part schema (has 'type': 'text') to avoid collapsing legitimate list-literal replies like ['foo', 'bar'] - Add strip parameter to preserve whitespace in streaming chunks - Clean up orphan </think> tags that may leak from some models Fixes #5124 * fix: improve content normalization safety - Try json.loads first, fallback to ast.literal_eval for single-quoted Python literals to avoid corrupting apostrophes (e.g., "don't") - Coerce text values to str to handle null or non-string text fields * fix: update retention logic in LogManager to handle backup count correctly * chore: bump version to 4.17.1 * docs: Added instructions for deploying AstrBot using AstrBot Launcher. (#5136) Added instructions for deploying AstrBot using AstrBot Launcher. 
* fix: add MCP tools to function tool set in _plugin_tool_fix (#5144) * fix: add support for collecting data from builtin stars in electron pyinstaller build (#5145) * chore: bump version to 4.17.1 * chore: ruff format * fix: prevent updates for AstrBot launched via launcher * fix(desktop): include runtime deps for builtin plugins in backend build (#5146) * fix: 'Plain' object has no attribute 'text' when using python 3.14 (#5154) * fix: enhance plugin metadata handling by injecting attributes before instantiation (#5155) * fix: enhance handle_result to support event context and webchat image sending * chore: bump version to 4.17.3 * chore: ruff format * feat: add NVIDIA provider template (#5157) fixes: #5156 * feat: enhance provider sources panel with styled menu and mobile support * fix: improve permission denied message for local execution in Python and shell tools * feat: enhance PersonaForm component with responsive design and improved styling (#5162) fix: #5159 * ui(CronJobPage): fix action column buttons overlapping in CronJobPage (#5163) - Before: the action column container used only `d-flex`; when the page narrowed, its children (the switch and the delete button) were squeezed until they visually overlapped or even stacked on top of each other. - After: 1. Added `flex-nowrap` to the container to prevent the children from wrapping. 2. Set `min-width: 140px` so the column keeps a guaranteed minimum space and is not squeezed by other columns with long text. 3.
增加了 `gap: 12px` 间距,提升了操作辨识度并优化了点击体验。 * feat: add unsaved changes notice to configuration page and update messages * feat: implement search functionality in configuration components and update UI (#5168) * feat: add FAQ link to vertical sidebar and update navigation for localization * feat: add announcement section to WelcomePage and localize announcement title * chore: bump version to 4.17.4 * feat: supports send markdown message in qqofficial (#5173) * feat: supports send markdown message in qqofficial closes: #1093 #918 #4180 #4264 * ruff format * fix: prevent duplicate error message when all LLM providers fail (#5183) * fix: 修复选择配置文件进入配置文件管理弹窗直接关闭弹窗显示的配置文件不正确 (#5174) * feat: add MarketPluginCard component and integrate random plugin feature in ExtensionPage (#5190) * feat: add MarketPluginCard component and integrate random plugin feature in ExtensionPage * feat: update random plugin selection logic to use pluginMarketData and refresh on relevant events * feat: supports aihubmix * docs: update readme * chore: ruff format * feat: add LINE support to multiple language README files * feat(core): add plugin error hook for custom error routing (#5192) * feat(core): add plugin error hook for custom error routing * fix(core): align plugin error suppression with event stop state * refactor: extract Voice_messages_forbidden fallback into shared helper with typed BadRequest exception (#5204) - Add _send_voice_with_fallback helper to deduplicate voice forbidden handling - Catch telegram.error.BadRequest instead of bare Exception with string matching - Add text field to Record component to preserve TTS source text - Store original text in Record during TTS conversion for use as document caption - Skip _send_chat_action when chat_id is empty to avoid unnecessary warnings * chore: bump version to 4.17.5 * feat: add admin permission checks for Python and Shell execution (#5214) * fix: 改进微信公众号被动回复处理机制,引入缓冲与分片回复,并优化超时行为 (#5224) * 修复wechat official 被动回复功能 * ruff format --------- 
Co-authored-by: Soulter <905617992@qq.com> * fix: 修复仅发送 JSON 消息段时的空消息回复报错 (#5208) * Fix Register_Stage · 补全 JSON 消息判断,修复发送 JSON 消息时遇到 “消息为空,跳过发送阶段” 的问题。 · 顺带补全其它消息类型判断。 Co-authored-by: Pizero <zhaory200707@outlook.com> * Fix formatting and comments in stage.py * Format stage.py --------- Co-authored-by: Pizero <zhaory200707@outlook.com> * docs: update related repo links * fix(core): terminate active events on reset/new/del to prevent stale responses (#5225) * fix(core): terminate active events on reset/new/del to prevent stale responses Closes #5222 * style: fix import sorting in scheduler.py * chore: remove Electron desktop pipeline and switch to tauri repo (#5226) * ci: remove Electron desktop build from release pipeline * chore: remove electron desktop and switch to tauri release trigger * ci: remove desktop workflow dispatch trigger * refactor: migrate data paths to astrbot_path helpers * fix: point desktop update prompt to AstrBot-desktop releases * fix: update feature request template for clarity and consistency in English and Chinese * Feat/config leave confirm (#5249) * feat: 配置文件增加未保存提示弹窗 * fix: 移除unsavedChangesDialog插件使用组件方式实现弹窗 * feat: add support for plugin astrbot-version and platform requirement checks (#5235) * feat: add support for plugin astrbot-version and platform requirement checks * fix: remove unsupported platform and version constraints from metadata.yaml * fix: remove restriction on 'v' in astrbot_version specification format * ruff format * feat: add password confirmation when changing password (#5247) * feat: add password confirmation when changing password Fixes #5177 Adds a password confirmation field to prevent accidental password typos. 
Changes: - Backend: validate confirm_password matches new_password - Frontend: add confirmation input with validation - i18n: add labels and error messages for password mismatch Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(auth): improve error message for password confirmation mismatch * fix(auth): update password hashing logic and improve confirmation validation --------- Co-authored-by: whatevertogo <whatevertogo@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(provider): 修复 dict 格式 content 导致的 JSON 残留问题 (#5250) * fix(provider): 修复 dict 格式 content 导致的 JSON 残留问题 修复 _normalize_content 函数未处理 dict 类型 content 的问题。 当 LLM 返回 {"type": "text", "text": "..."} 格式的 content 时, 现在会正确提取 text 字段而非直接转为字符串。 同时改进 fallback 行为,对 None 值返回空字符串。 Fixes #5244 * Update warning message for unexpected dict format --------- Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> * chore: remove outdated heihe.md documentation file * fix: all mcp tools exposed to main agent (#5252) * fix: enhance PersonaForm layout and improve tool selection display * fix: update tool status display and add localization for inactive tools * fix: remove additionalProperties from tool schema properties (#5253) fixes: #5217 * fix: simplify error messages for account edit validation * fix: streamline error response for empty new username and password in account edit * chore: bump vertion to 4.17.6 * feat: add OpenRouter provider support and icon * chore: ruff format * refactor(dashboard): replace legacy isElectron bridge fields with isDesktop (#5269) * refactor dashboard desktop bridge fields from isElectron to isDesktop * refactor dashboard runtime detection into shared helper * fix: update contributor avatar image URL to include max size and columns (#5268) * feat: astrbot http api (#5280) * feat: astrbot http api * Potential fix for code scanning alert no. 
34: Use of a broken or weak cryptographic hashing algorithm on sensitive data Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * fix: improve error handling for missing attachment path in file upload * feat: implement paginated retrieval of platform sessions for creators * feat: refactor attachment directory handling in ChatRoute * feat: update API endpoint paths for file and message handling * feat: add documentation link to API key management section in settings * feat: update API key scopes and related configurations in API routes and tests * feat: enhance API key expiration options and add warning for permanent keys * feat: add UTC normalization and serialization for API key timestamps * feat: implement chat session management and validation for usernames * feat: ignore session_id type chunks in message processing --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * feat(dashboard): improve plugin platform support display and mobile accessibility (#5271) * feat(dashboard): improve plugin platform support display and mobile accessibility - Replace hover-based tooltips with interactive click menus for platform support information. - Fix mobile touch issues by introducing explicit state control for status capsules. - Enhance UI aesthetics with platform-specific icons and a structured vertical list layout. - Add dynamic chevron icons to provide clear visual cues for expandable content. * refactor(dashboard): refactor market card with computed properties for performance * refactor(dashboard): unify plugin platform support UI with new reusable chip component - Create shared 'PluginPlatformChip' component to encapsulate platform meta display. - Fix mobile interaction bugs by simplifying menu triggers and event handling. - Add stacked platform icon previews and dynamic chevron indicators within capsules. 
- Improve information hierarchy using structured vertical lists for platform details. - Optimize rendering efficiency with computed properties across both card views. * fix: qq official guild message send error (#5287) * fix: qq official guild message send error * Update astrbot/core/platform/sources/qqofficial/qqofficial_message_event.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * 更新readme文档,补充桌面app说明,并向前移动位置 (#5297) * docs: update desktop deployment section in README * docs: refine desktop and launcher deployment descriptions * Update README.md * feat: add Anthropic Claude Code OAuth provider and adaptive thinking support (#5209) * feat: add Anthropic Claude Code OAuth provider and adaptive thinking support * fix: add defensive guard for metadata overrides and align budget condition with docs * refactor: adopt sourcery-ai suggestions for OAuth provider - Use use_api_key=False in OAuth subclass to avoid redundant API-key client construction before replacing with auth_token client - Generalize metadata override helper to merge all dict keys instead of only handling 'limit', improving extensibility * Feat/telegram command alias register #5233 (#5234) * feat: support registering command aliases for Telegram Now when registering commands with aliases, all aliases will be registered as Telegram bot commands in addition to the main command. Example: @register_command(command_name="draw", alias={"画", "gen"}) Now /draw, /画, and /gen will all appear in the Telegram command menu. * feat(telegram): add duplicate command name warning when registering commands Log a warning when duplicate command names are detected during Telegram command registration to help identify configuration conflicts. 
* refactor: remove Anthropic OAuth provider implementation and related metadata overrides * fix: 修复新建对话时因缺少会话ID导致配置绑定失败的问题 (#5292) * fix:尝试修改 * fix:添加详细日志 * fix:进行详细修改,并添加日志 * fix:删除所有日志 * fix: 增加安全访问函数 - 给 localStorage 访问加了 try/catch + 可用性判断:dashboard/src/utils/chatConfigBinding.ts:13 - 新增 getFromLocalStorage/setToLocalStorage(在受限存储/无痕模式下异常时回退/忽略) - getStoredDashboardUsername() / getStoredSelectedChatConfigId() 改为走安全读取:dashboard/src/utils/chatConfigBinding.ts:36 - 新增 setStoredSelectedChatConfigId(),写入失败静默忽略:dashboard/src/utils/chatConfigBinding.ts:44 - 把 ConfigSelector.vue 里直接 localStorage.getItem/setItem 全部替换为上述安全方法:dashboard/src/components/chat/ConfigSelector.vue:81 - 已重新跑过 pnpm run typecheck,通过。 * rm:删除个人用的文档文件 * Revert "rm:删除个人用的文档文件" This reverts commit0fceee0543. * rm:删除个人用的文档文件 * rm:删除个人用的文档文件 * chore: bump version to 4.18.0 * fix(SubAgentPage): 当中间的介绍文本非常长时,Flex 布局会自动挤压右侧的控制按钮区域 (#5306) * fix: 修复新版本插件市场出现插件显示为空白的 bug;纠正已安装插件卡片的排版,统一大小 (#5309) * fix(ExtensionCard): 解决插件卡片大小不统一的问题 * fix(MarketPluginCard): 解决插件市场不加载插件的问题 (#5303) * feat: supports spawn subagent as a background task that not block the main agent workflow (#5081) * feat:为subagent添加后台任务参数 * ruff * fix: update terminology from 'handoff mission' to 'background task' and refactor related logic * fix: update terminology from 'background_mission' to 'background_task' in HandoffTool and related logic * fix(HandoffTool): update background_task description for clarity on usage --------- Co-authored-by: Soulter <905617992@qq.com> * cho * fix: 修复 aiohttp 版本过新导致 qq-botpy 报错的问题 (#5316) * chore: ruff format * fix: remove hard-coded 6s timeout from tavily request * fix: remove changelogs directory from .dockerignore * feat(dashboard): make release redirect base URL configurable (#5330) * feat(dashboard): make desktop release base URL configurable * refactor(dashboard): use generic release base URL env with upstream default * fix(dashboard): guard release base URL normalization when env is unset * 
refactor(dashboard): use generic release URL helpers and avoid latest suffix duplication * feat: add stop functionality for active agent sessions and improve handling of stop requests (#5380) * feat: add stop functionality for active agent sessions and improve handling of stop requests * feat: update stop button icon and tooltip in ChatInput component * fix: correct indentation in tool call handling within ChatRoute class * fix: chatui cannot persist file segment (#5386) * fix(plugin): update plugin directory handling for reserved plugins (#5369) * fix(plugin): update plugin directory handling for reserved plugins * fix(plugin): add warning logs for missing plugin name, object, directory, and changelog * chore(README): updated with README.md (#5375) * chore(README): updated with README.md * Update README_fr.md Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> * Update README_zh-TW.md Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> --------- Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> * feat: add image urls / paths supports for subagent (#5348) * fix: 修复5081号PR在子代理执行后台任务时,未正确使用系统配置的流式/非流请求的问题(#5081) * feat:为子代理增加远程图片URL参数支持 * fix: update description for image_urls parameter in HandoffTool to clarify usage in multimodal tasks * ruff format --------- Co-authored-by: Soulter <905617992@qq.com> * feat: add hot reload when failed to load plugins (#5334) * feat:add hot reload when failed to load plugins * apply bot suggestions * fix(chatui): add copy rollback path and error message. (#5352) * fix(chatui): add copy rollback path and error message. * fix(chatui): fixed textarea leak in the copy button. * fix(chatui): use color styles from the component library. 
* fix: 处理配置文件中的 UTF-8 BOM 编码问题 (#5376) * fix(config): handle UTF-8 BOM in configuration file loading Problem: On Windows, some text editors (like Notepad) automatically add UTF-8 BOM to JSON files when saving. This causes json.decoder.JSONDecodeError: "Unexpected UTF-8 BOM" and AstrBot fails to start when cmd_config.json contains BOM. Solution: Add defensive check to strip UTF-8 BOM (\ufeff) if present before parsing JSON configuration file. Impact: - Improves robustness and cross-platform compatibility - No breaking changes to existing functionality - Fixes startup failure when configuration file has UTF-8 BOM encoding Relates-to: Windows editor compatibility issues * style: fix code formatting with ruff Fix single quote to double quote to comply with project code style. * feat: add plugin load&unload hook (#5331) * 添加了插件的加载完成和卸载完成的钩子事件 * 添加了插件的加载完成和卸载完成的钩子事件 * format code with ruff * ruff format --------- Co-authored-by: Soulter <905617992@qq.com> * test: enhance test framework with comprehensive fixtures and mocks (#5354) * test: enhance test framework with comprehensive fixtures and mocks - Add shared mock builders for aiocqhttp, discord, telegram - Add test helpers for platform configs and mock objects - Expand conftest.py with test profile support - Update coverage test workflow configuration Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(tests): 移动并重构模拟 LLM 响应和消息组件函数 * fix(tests): 优化 pytest_runtest_setup 中的标记检查逻辑 --------- Co-authored-by: whatevertogo <whatevertogo@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * test: add comprehensive tests for message event handling (#5355) * test: add comprehensive tests for message event handling - Add AstrMessageEvent unit tests (688 lines) - Add AstrBotMessage unit tests - Enhance smoke tests with message event scenarios Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: improve message type handling and add defensive tests --------- 
Co-authored-by: whatevertogo <whatevertogo@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add support for showing tool call results in agent execution (#5388) closes: #5329 * fix: resolve pipeline and star import cycles (#5353) * fix: resolve pipeline and star import cycles - Add bootstrap.py and stage_order.py to break circular dependencies - Export Context, PluginManager, StarTools from star module - Update pipeline __init__ to defer imports - Split pipeline initialization into separate bootstrap module Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: add logging for get_config() failure in Star class * fix: reorder logger initialization in base.py --------- Co-authored-by: whatevertogo <whatevertogo@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: enable computer-use tools for subagent handoff (#5399) * fix: enforce admin guard for sandbox file transfer tools (#5402) * fix: enforce admin guard for sandbox file transfer tools * refactor: deduplicate computer tools admin permission checks * fix: add missing space in permission error message * fix(core): 优化 File 组件处理逻辑并增强 OneBot 驱动层路径兼容性 (#5391) * fix(core): 优化 File 组件处理逻辑并增强 OneBot 驱动层路径兼容性 原因 (Necessity): 1. 内核一致性:AstrBot 内核的 Record 和 Video 组件均具备识别 `file:///` 协议头的逻辑,但 File 组件此前缺失此功能,导致行为不统一。 2. OneBot 协议合规:OneBot 11 标准要求本地文件路径必须使用 `file:///` 协议头。此前驱动层未对裸路径进行自动转换,导致发送本地文件时常触发 retcode 1200 (识别URL失败) 错误。 3. 容器环境适配:在 Docker 等路径隔离环境下,裸路径更容易因驱动或协议端的解析歧义而失效。 更改 (Changes): - [astrbot/core/message/components.py]: - 在 File.get_file() 中增加对 `file:///` 前缀的识别与剥离逻辑,使其与 Record/Video 组件行为对齐。 - [astrbot/core/platform/sources/aiocqhttp/aiocqhttp_message_event.py]: - 在发送文件前增加自动修正逻辑:若路径为绝对路径且未包含协议头,驱动层将自动补全 `file:///` 前缀。 - 对 http、base64 及已有协议头,确保不干扰原有的正常传输逻辑。 影响 (Impact): - 以完全兼容的方式增强了文件发送的鲁棒性。 - 解决了插件在发送日志等本地生成的压缩包时,因路径格式不规范导致的发送失败问题。 * refactor(core): 根据 cr 建议,规范化文件 URI 生成与解析逻辑,优化跨平台兼容性 原因 (Necessity): 1. 
修复原生路径与 URI 转换在 Windows 下的不对称问题。 2. 规范化 file: 协议头处理,确保符合 RFC 标准并能在 Linux/Windows 间稳健切换。 3. 增强协议判定准确度,防止对普通绝对路径的误处理。 更改 (Changes): - [astrbot/core/platform/sources/aiocqhttp]: - 弃用手动拼接,改用 `pathlib.Path.as_uri()` 生成标准 URI。 - 将协议检测逻辑从前缀匹配优化为包含性检测 ("://")。 - [astrbot/core/message/components]: - 重构 `File.get_file` 解析逻辑,支持对称处理 2/3 斜杠格式。 - 针对 Windows 环境增加了对 `file:///C:/` 格式的自动修正,避免 `os.path` 识别失效。 - [data/plugins/astrbot_plugin_logplus]: - 在直接 API 调用中同步应用 URI 规范化处理。 影响 (Impact): - 解决 Docker 环境中因路径不规范导致的 "识别URL失败" 报错。 - 提升了本体框架在 Windows 系统下的文件操作鲁棒性。 * i18n(SubAgentPage): complete internationalization for subagent orchestration page (#5400) * i18n: complete internationalization for subagent orchestration page - Replace hardcoded English strings in [SubAgentPage.vue] with i18n keys. - Update `en-US` and `zh-CN` locales with missing hints, validation messages, and empty state translations. - Fix translation typos and improve consistency across the SubAgent orchestration UI. * fix(bug_risk): 避免在模板中的翻译调用上使用 || 'Close' 作为回退值。 * fix(aiocqhttp): enhance shutdown process for aiocqhttp adapter (#5412) * fix: pass embedding dimensions to provider apis (#5411) * fix(context): log warning when platform not found for session * fix(context): improve logging for platform not found in session * chore: bump version to 4.18.2 * chore: bump version to 4.18.2 * chore: bump version to 4.18.2 * fix: Telegram voice message format (OGG instead of WAV) causing issues with OpenAI STT API (#5389) * chore: ruff format * feat(dashboard): add generic desktop app updater bridge (#5424) * feat(dashboard): add generic desktop app updater bridge * fix(dashboard): address updater bridge review feedback * fix(dashboard): unify updater bridge types and error logging * fix(dashboard): consolidate updater bridge typings * fix(conversation): retain existing persona_id when updating conversation * fix(dashboard): 修复设置页新建 API Key 后复制失败问题 (#5439) * Fix: GitHub proxy not displaying correctly in WebUI (#5438) * 
fix(dashboard): preserve custom GitHub proxy setting on reload * fix(dashboard): keep github proxy selection persisted in settings * fix(persona): enhance persona resolution logic for conversations and sessions * fix: ensure tool call/response pairing in context truncation (#5417) * fix: ensure tool call/response pairing in context truncation * refactor: simplify fix_messages to single-pass state machine * perf(cron): enhance future task session isolation fixes: #5392 * feat: add useExtensionPage composable for managing plugin extensions - Implemented a new composable `useExtensionPage` to handle various functionalities related to plugin management, including fetching extensions, handling updates, and managing UI states. - Added support for conflict checking, plugin installation, and custom source management. - Integrated search and filtering capabilities for plugins in the market. - Enhanced user experience with dialogs for confirmations and notifications. - Included pagination and sorting features for better plugin visibility. 
* fix: clear markdown field when sending media messages via QQ Official Platform (#5445) * fix: clear markdown field when sending media messages via QQ Official API * refactor: use pop() to remove markdown key instead of setting None * fix: cannot automatically get embedding dim when create embedding provider (#5442) * fix(dashboard): 强化 API Key 复制临时节点清理逻辑 * fix(embedding): 自动检测改为探测 OpenAI embedding 最大可用维度 * fix: normalize openai embedding base url and add hint key * i18n: add embedding_api_base hint translations * i18n: localize provider embedding/proxy metadata hints * fix: show provider-specific embedding API Base URL hint as field subtitle * fix(embedding): cap OpenAI detect_dim probes with early short-circuit * fix(dashboard): return generic error on provider adapter import failure * 回退检测逻辑 * fix: 修复Pyright静态类型检查报错 (#5437) * refactor: 修正 Sqlite 查询、下载回调、接口重构与类型调整 * feat: 为 OneBotClient 增加 CallAction 协议与异步调用支持 * fix(telegram): avoid duplicate message_thread_id in streaming (#5430) * perf: batch metadata query in KB retrieval to fix N+1 problem (#5463) * perf: batch metadata query in KB retrieval to fix N+1 problem Replace N sequential get_document_with_metadata() calls with a single get_documents_with_metadata_batch() call using SQL IN clause. 
Benchmark results (local SQLite): - 10 docs: 10.67ms → 1.47ms (7.3x faster) - 20 docs: 26.00ms → 2.68ms (9.7x faster) - 50 docs: 63.87ms → 2.79ms (22.9x faster) * refactor: use set[str] param type and chunk IN clause for SQLite safety Address review feedback: - Change doc_ids param from list[str] to set[str] to avoid unnecessary conversion - Chunk IN clause into batches of 900 to stay under SQLite's 999 parameter limit - Remove list() wrapping at call site, pass set directly * fix:fix the issue where incomplete cleanup of residual plugins occurs… (#5462) * fix:fix the issue where incomplete cleanup of residual plugins occurs in the failed loading of plugins * fix:ruff format,apply bot suggestions * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * chore: 为类型检查添加 TYPE_CHECKING 的导入与阶段类型引用 (#5474) * fix(line): line adapter does not appear in the add platform dialog fixes: #5477 * [bug]查看介入教程line前往错误界面的问题 (#5479) Fixes #5478 * chore: bump version to 4.18.3 * feat: implement follow-up message handling in ToolLoopAgentRunner (#5484) * feat: implement follow-up message handling in ToolLoopAgentRunner * fix: correct import path for follow-up module in InternalAgentSubStage * feat: implement websockets transport mode selection for chat (#5410) * feat: implement websockets transport mode selection for chat - Added transport mode selection (SSE/WebSocket) in the chat component. - Updated conversation sidebar to include transport mode options. - Integrated transport mode handling in message sending logic. - Refactored message sending functions to support both SSE and WebSocket. - Enhanced WebSocket connection management and message handling. - Updated localization files for transport mode labels. - Configured Vite to support WebSocket proxying. 
* feat(webchat): refactor message parsing logic and integrate new parsing function * feat(chat): add websocket API key extraction and scope validation * Revert "可选后端,实现前后端分离" (#5536) --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: can <51474963+weijintaocode@users.noreply.github.com> Co-authored-by: Soulter <905617992@qq.com> Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com> Co-authored-by: letr <123731298+letr007@users.noreply.github.com> Co-authored-by: 搁浅 <id6543156918@gmail.com> Co-authored-by: Helian Nuits <sxp20061207@163.com> Co-authored-by: Gao Jinzhe <2968474907@qq.com> Co-authored-by: DD斩首 <155905740+DDZS987@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@localhost.localdomain> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: エイカク <62183434+zouyonghe@users.noreply.github.com> Co-authored-by: 鸦羽 <Raven95676@gmail.com> Co-authored-by: Dt8333 <25431943+Dt8333@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Li-shi-ling <114913764+Li-shi-ling@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> Co-authored-by: Limitless <127183162+Limitless2023@users.noreply.github.com> Co-authored-by: Limitless2023 <limitless@users.noreply.github.com> Co-authored-by: evpeople <54983536+evpeople@users.noreply.github.com> Co-authored-by: SnowNightt <127504703+SnowNightt@users.noreply.github.com> Co-authored-by: xzj0898 <62733743+xzj0898@users.noreply.github.com> Co-authored-by: stevessr <89645372+stevessr@users.noreply.github.com> Co-authored-by: Waterwzy <2916963017@qq.com> Co-authored-by: NayukiMeko <MekoNayuki@outlook.com> Co-authored-by: 時壹 <137363396+KBVsent@users.noreply.github.com> Co-authored-by: sanyekana 
<Clhikari@qq.com> Co-authored-by: Chiu Chun-Hsien <95356121+911218sky@users.noreply.github.com> Co-authored-by: Dream Tokenizer <60459821+Trance-0@users.noreply.github.com> Co-authored-by: NanoRocky <76585834+NanoRocky@users.noreply.github.com> Co-authored-by: Pizero <zhaory200707@outlook.com> Co-authored-by: 雪語 <167516635+YukiRa1n@users.noreply.github.com> Co-authored-by: whatevertogo <1879483647@qq.com> Co-authored-by: whatevertogo <whatevertogo@users.noreply.github.com> Co-authored-by: 香草味的纳西妲喵 <151599587+VanillaNahida@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Co-authored-by: Lovely Moe Moli <44719954+moemoli@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Minidoracat <minidora0702@gmail.com> Co-authored-by: Chen <42998804+a61995987@users.noreply.github.com> Co-authored-by: hanbings <hanbings@hanbings.io> Co-authored-by: tangsenfei <155090747+tangsenfei@users.noreply.github.com> Co-authored-by: PyuraMazo <1605025385@qq.com> Co-authored-by: Axi404 <118950647+Axi404@users.noreply.github.com> Co-authored-by: 氕氙 <2014440212@qq.com> Co-authored-by: Yunhao Cao <18230652+realquantumcookie@users.noreply.github.com> Co-authored-by: exynos <110159911+exynos967@users.noreply.github.com> Co-authored-by: Luna_Dol <86590429+Luna-channel@users.noreply.github.com> Co-authored-by: CCCCCCTV <64309817+CCCCCCTV@users.noreply.github.com> Co-authored-by: CAICAII <3360776475@qq.com> Co-authored-by: 圣达生物多 <qq3258819795@163.com>
1269 lines
45 KiB
Python
1269 lines
45 KiB
Python
"""知识库管理 API 路由"""
|
|
|
|
import asyncio
|
|
import os
|
|
import traceback
|
|
import uuid
|
|
from typing import Any
|
|
|
|
import aiofiles
|
|
from quart import request
|
|
|
|
from astrbot.core import logger
|
|
from astrbot.core.core_lifecycle import AstrBotCoreLifecycle
|
|
from astrbot.core.provider.provider import EmbeddingProvider, RerankProvider
|
|
from astrbot.core.utils.astrbot_path import get_astrbot_temp_path
|
|
|
|
from ..utils import generate_tsne_visualization
|
|
from .route import Response, Route, RouteContext
|
|
|
|
|
|
class KnowledgeBaseRoute(Route):
|
|
"""知识库管理路由
|
|
|
|
提供知识库、文档、检索、会话配置等 API 接口
|
|
"""
|
|
|
|
def __init__(
|
|
self,
|
|
context: RouteContext,
|
|
core_lifecycle: AstrBotCoreLifecycle,
|
|
) -> None:
|
|
super().__init__(context)
|
|
self.core_lifecycle = core_lifecycle
|
|
self.kb_manager = None # 延迟初始化
|
|
self.kb_db = None
|
|
self.session_config_db = None # 会话配置数据库
|
|
self.retrieval_manager = None
|
|
self.upload_progress = {} # 存储上传进度 {task_id: {status, file_index, file_total, stage, current, total}}
|
|
self.upload_tasks = {} # 存储后台上传任务 {task_id: {"status", "result", "error"}}
|
|
|
|
# 注册路由
|
|
self.routes = {
|
|
# 知识库管理
|
|
"/kb/list": ("GET", self.list_kbs),
|
|
"/kb/create": ("POST", self.create_kb),
|
|
"/kb/get": ("GET", self.get_kb),
|
|
"/kb/update": ("POST", self.update_kb),
|
|
"/kb/delete": ("POST", self.delete_kb),
|
|
"/kb/stats": ("GET", self.get_kb_stats),
|
|
# 文档管理
|
|
"/kb/document/list": ("GET", self.list_documents),
|
|
"/kb/document/upload": ("POST", self.upload_document),
|
|
"/kb/document/import": ("POST", self.import_documents),
|
|
"/kb/document/upload/url": ("POST", self.upload_document_from_url),
|
|
"/kb/document/upload/progress": ("GET", self.get_upload_progress),
|
|
"/kb/document/get": ("GET", self.get_document),
|
|
"/kb/document/delete": ("POST", self.delete_document),
|
|
# # 块管理
|
|
"/kb/chunk/list": ("GET", self.list_chunks),
|
|
"/kb/chunk/delete": ("POST", self.delete_chunk),
|
|
# # 多媒体管理
|
|
# "/kb/media/list": ("GET", self.list_media),
|
|
# "/kb/media/delete": ("POST", self.delete_media),
|
|
# 检索
|
|
"/kb/retrieve": ("POST", self.retrieve),
|
|
}
|
|
self.register_routes()
|
|
|
|
def _get_kb_manager(self):
|
|
return self.core_lifecycle.kb_manager
|
|
|
|
def _init_task(self, task_id: str, status: str = "pending") -> None:
|
|
self.upload_tasks[task_id] = {
|
|
"status": status,
|
|
"result": None,
|
|
"error": None,
|
|
}
|
|
|
|
def _set_task_result(
|
|
self, task_id: str, status: str, result: Any = None, error: str | None = None
|
|
) -> None:
|
|
self.upload_tasks[task_id] = {
|
|
"status": status,
|
|
"result": result,
|
|
"error": error,
|
|
}
|
|
if task_id in self.upload_progress:
|
|
self.upload_progress[task_id]["status"] = status
|
|
|
|
def _update_progress(
|
|
self,
|
|
task_id: str,
|
|
*,
|
|
status: str | None = None,
|
|
file_index: int | None = None,
|
|
file_name: str | None = None,
|
|
stage: str | None = None,
|
|
current: int | None = None,
|
|
total: int | None = None,
|
|
) -> None:
|
|
if task_id not in self.upload_progress:
|
|
return
|
|
p = self.upload_progress[task_id]
|
|
if status is not None:
|
|
p["status"] = status
|
|
if file_index is not None:
|
|
p["file_index"] = file_index
|
|
if file_name is not None:
|
|
p["file_name"] = file_name
|
|
if stage is not None:
|
|
p["stage"] = stage
|
|
if current is not None:
|
|
p["current"] = current
|
|
if total is not None:
|
|
p["total"] = total
|
|
|
|
def _make_progress_callback(self, task_id: str, file_idx: int, file_name: str):
|
|
async def _callback(stage: str, current: int, total: int) -> None:
|
|
self._update_progress(
|
|
task_id,
|
|
status="processing",
|
|
file_index=file_idx,
|
|
file_name=file_name,
|
|
stage=stage,
|
|
current=current,
|
|
total=total,
|
|
)
|
|
|
|
return _callback
|
|
|
|
    async def _background_upload_task(
        self,
        task_id: str,
        kb_helper,
        files_to_upload: list,
        chunk_size: int,
        chunk_overlap: int,
        batch_size: int,
        tasks_limit: int,
        max_retries: int,
    ) -> None:
        """Background upload task."""
        try:
            # Initialize the task state
            self._init_task(task_id, status="processing")
            self.upload_progress[task_id] = {
                "status": "processing",
                "file_index": 0,
                "file_total": len(files_to_upload),
                "stage": "waiting",
                "current": 0,
                "total": 100,
            }

            uploaded_docs = []
            failed_docs = []

            for file_idx, file_info in enumerate(files_to_upload):
                try:
                    # Update the overall progress
                    self._update_progress(
                        task_id,
                        status="processing",
                        file_index=file_idx,
                        file_name=file_info["file_name"],
                        stage="parsing",
                        current=0,
                        total=100,
                    )

                    # Create the progress callback
                    progress_callback = self._make_progress_callback(
                        task_id, file_idx, file_info["file_name"]
                    )

                    doc = await kb_helper.upload_document(
                        file_name=file_info["file_name"],
                        file_content=file_info["file_content"],
                        file_type=file_info["file_type"],
                        chunk_size=chunk_size,
                        chunk_overlap=chunk_overlap,
                        batch_size=batch_size,
                        tasks_limit=tasks_limit,
                        max_retries=max_retries,
                        progress_callback=progress_callback,
                    )

                    uploaded_docs.append(doc.model_dump())
                except Exception as e:
                    # One failed file does not abort the remaining files.
                    logger.error(
                        f"Failed to upload document {file_info['file_name']}: {e}"
                    )
                    failed_docs.append(
                        {"file_name": file_info["file_name"], "error": str(e)},
                    )

            # Record the final task state
            result = {
                "task_id": task_id,
                "uploaded": uploaded_docs,
                "failed": failed_docs,
                "total": len(files_to_upload),
                "success_count": len(uploaded_docs),
                "failed_count": len(failed_docs),
            }

            self._set_task_result(task_id, "completed", result=result)

        except Exception as e:
            logger.error(f"Background upload task {task_id} failed: {e}")
            logger.error(traceback.format_exc())
            self._set_task_result(task_id, "failed", error=str(e))

    async def _background_import_task(
        self,
        task_id: str,
        kb_helper,
        documents: list,
        batch_size: int,
        tasks_limit: int,
        max_retries: int,
    ) -> None:
        """Background task that imports pre-chunked documents."""
        try:
            # Initialize the task state
            self._init_task(task_id, status="processing")
            self.upload_progress[task_id] = {
                "status": "processing",
                "file_index": 0,
                "file_total": len(documents),
                "stage": "waiting",
                "current": 0,
                "total": 100,
            }

            uploaded_docs = []
            failed_docs = []

            for file_idx, doc_info in enumerate(documents):
                file_name = doc_info.get("file_name", f"imported_doc_{file_idx}")
                chunks = doc_info.get("chunks", [])

                try:
                    # Update the overall progress
                    self._update_progress(
                        task_id,
                        status="processing",
                        file_index=file_idx,
                        file_name=file_name,
                        stage="importing",
                        current=0,
                        total=100,
                    )

                    # Create the progress callback
                    progress_callback = self._make_progress_callback(
                        task_id, file_idx, file_name
                    )

                    # Call upload_document, passing pre_chunked_text
                    doc = await kb_helper.upload_document(
                        file_name=file_name,
                        file_content=None,  # Raw content is not needed in pre-chunked mode
                        file_type=doc_info.get("file_type")
                        or (
                            file_name.rsplit(".", 1)[-1].lower()
                            if "." in file_name
                            else "txt"
                        ),
                        batch_size=batch_size,
                        tasks_limit=tasks_limit,
                        max_retries=max_retries,
                        progress_callback=progress_callback,
                        pre_chunked_text=chunks,
                    )

                    uploaded_docs.append(doc.model_dump())
                except Exception as e:
                    # One failed document does not abort the remaining documents.
                    logger.error(f"Failed to import document {file_name}: {e}")
                    failed_docs.append(
                        {"file_name": file_name, "error": str(e)},
                    )

            # Record the final task state
            result = {
                "task_id": task_id,
                "uploaded": uploaded_docs,
                "failed": failed_docs,
                "total": len(documents),
                "success_count": len(uploaded_docs),
                "failed_count": len(failed_docs),
            }

            self._set_task_result(task_id, "completed", result=result)

        except Exception as e:
            logger.error(f"Background import task {task_id} failed: {e}")
            logger.error(traceback.format_exc())
            self._set_task_result(task_id, "failed", error=str(e))

    async def list_kbs(self):
        """List knowledge bases.

        Query parameters:
        - page: page number (default 1)
        - page_size: items per page (default 20)
        - refresh_stats: whether to refresh statistics (default false; may be
          set to true on first load). Not read by this handler as written.
        """
        try:
            kb_manager = self._get_kb_manager()
            page = request.args.get("page", 1, type=int)
            page_size = request.args.get("page_size", 20, type=int)

            kbs = await kb_manager.list_kbs()

            # Convert to a list of dicts
            kb_list = [kb.model_dump() for kb in kbs]

            return (
                Response()
                .ok({"items": kb_list, "page": page, "page_size": page_size})
                .__dict__
            )
        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to list knowledge bases: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to list knowledge bases: {e!s}").__dict__

    async def create_kb(self):
        """Create a knowledge base.

        Body:
        - kb_name: knowledge base name (required)
        - description: description (optional)
        - emoji: icon (optional)
        - embedding_provider_id: embedding provider ID (required; the handler
          rejects the request when it is missing)
        - rerank_provider_id: rerank provider ID (optional)
        - chunk_size: chunk size (optional, default 512)
        - chunk_overlap: chunk overlap (optional, default 50)
        - top_k_dense: number of dense retrieval results (optional, default 50)
        - top_k_sparse: number of sparse retrieval results (optional, default 50)
        - top_m_final: number of final results (optional, default 5)
        """
        try:
            kb_manager = self._get_kb_manager()
            data = await request.json
            kb_name = data.get("kb_name")
            if not kb_name:
                return Response().error("Knowledge base name must not be empty").__dict__

            description = data.get("description")
            emoji = data.get("emoji")
            embedding_provider_id = data.get("embedding_provider_id")
            rerank_provider_id = data.get("rerank_provider_id")
            chunk_size = data.get("chunk_size")
            chunk_overlap = data.get("chunk_overlap")
            top_k_dense = data.get("top_k_dense")
            top_k_sparse = data.get("top_k_sparse")
            top_m_final = data.get("top_m_final")

            # Pre-check the embedding dimension
            if not embedding_provider_id:
                return Response().error("Missing parameter embedding_provider_id").__dict__
            prv = await kb_manager.provider_manager.get_provider_by_id(
                embedding_provider_id,
            )  # type: ignore
            if not prv or not isinstance(prv, EmbeddingProvider):
                return (
                    Response()
                    .error(
                        f"Embedding provider not found or has the wrong type ({type(prv)})"
                    )
                    .__dict__
                )
            try:
                vec = await prv.get_embedding("astrbot")
                if len(vec) != prv.get_dim():
                    raise ValueError(
                        f"Embedding dimension mismatch: got {len(vec)}, "
                        f"but the configured dimension is {prv.get_dim()}",
                    )
            except Exception as e:
                return Response().error(f"Embedding provider test failed: {e!s}").__dict__
            # Pre-check the rerank provider
            if rerank_provider_id:
                rerank_prv: RerankProvider = (
                    await kb_manager.provider_manager.get_provider_by_id(
                        rerank_provider_id,
                    )
                )  # type: ignore
                if not rerank_prv:
                    return Response().error("Rerank provider not found").__dict__
                # Check that the rerank provider is usable
                try:
                    res = await rerank_prv.rerank(
                        query="astrbot",
                        documents=["astrbot knowledge base"],
                    )
                    if not res:
                        raise ValueError("Rerank provider returned an unexpected result")
                except Exception as e:
                    return (
                        Response()
                        .error(
                            f"Rerank provider test failed: {e!s}. "
                            "Check the platform log output."
                        )
                        .__dict__
                    )

            kb_helper = await kb_manager.create_kb(
                kb_name=kb_name,
                description=description,
                emoji=emoji,
                embedding_provider_id=embedding_provider_id,
                rerank_provider_id=rerank_provider_id,
                chunk_size=chunk_size,
                chunk_overlap=chunk_overlap,
                top_k_dense=top_k_dense,
                top_k_sparse=top_k_sparse,
                top_m_final=top_m_final,
            )
            kb = kb_helper.kb

            return Response().ok(kb.model_dump(), "Knowledge base created").__dict__

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to create knowledge base: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to create knowledge base: {e!s}").__dict__

    async def get_kb(self):
        """Get knowledge base details.

        Query parameters:
        - kb_id: knowledge base ID (required)
        """
        try:
            kb_manager = self._get_kb_manager()
            kb_id = request.args.get("kb_id")
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__

            kb_helper = await kb_manager.get_kb(kb_id)
            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__
            kb = kb_helper.kb

            return Response().ok(kb.model_dump()).__dict__

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to get knowledge base details: {e}")
            logger.error(traceback.format_exc())
            return (
                Response().error(f"Failed to get knowledge base details: {e!s}").__dict__
            )

    async def update_kb(self):
        """Update a knowledge base.

        Body:
        - kb_id: knowledge base ID (required)
        - kb_name: new knowledge base name (optional)
        - description: new description (optional)
        - emoji: new icon (optional)
        - embedding_provider_id: new embedding provider ID (optional)
        - rerank_provider_id: new rerank provider ID (optional)
        - chunk_size: chunk size (optional)
        - chunk_overlap: chunk overlap (optional)
        - top_k_dense: number of dense retrieval results (optional)
        - top_k_sparse: number of sparse retrieval results (optional)
        - top_m_final: number of final results (optional)
        """
        try:
            kb_manager = self._get_kb_manager()
            data = await request.json

            kb_id = data.get("kb_id")
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__

            kb_name = data.get("kb_name")
            description = data.get("description")
            emoji = data.get("emoji")
            embedding_provider_id = data.get("embedding_provider_id")
            rerank_provider_id = data.get("rerank_provider_id")
            chunk_size = data.get("chunk_size")
            chunk_overlap = data.get("chunk_overlap")
            top_k_dense = data.get("top_k_dense")
            top_k_sparse = data.get("top_k_sparse")
            top_m_final = data.get("top_m_final")

            # Require at least one field to update
            if all(
                v is None
                for v in [
                    kb_name,
                    description,
                    emoji,
                    embedding_provider_id,
                    rerank_provider_id,
                    chunk_size,
                    chunk_overlap,
                    top_k_dense,
                    top_k_sparse,
                    top_m_final,
                ]
            ):
                return Response().error("At least one field to update is required").__dict__

            kb_helper = await kb_manager.update_kb(
                kb_id=kb_id,
                kb_name=kb_name,
                description=description,
                emoji=emoji,
                embedding_provider_id=embedding_provider_id,
                rerank_provider_id=rerank_provider_id,
                chunk_size=chunk_size,
                chunk_overlap=chunk_overlap,
                top_k_dense=top_k_dense,
                top_k_sparse=top_k_sparse,
                top_m_final=top_m_final,
            )

            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__

            kb = kb_helper.kb
            return Response().ok(kb.model_dump(), "Knowledge base updated").__dict__

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to update knowledge base: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to update knowledge base: {e!s}").__dict__

    async def delete_kb(self):
        """Delete a knowledge base.

        Body:
        - kb_id: knowledge base ID (required)
        """
        try:
            kb_manager = self._get_kb_manager()
            data = await request.json

            kb_id = data.get("kb_id")
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__

            success = await kb_manager.delete_kb(kb_id)
            if not success:
                return Response().error("Knowledge base not found").__dict__

            return Response().ok(message="Knowledge base deleted").__dict__

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to delete knowledge base: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to delete knowledge base: {e!s}").__dict__

    async def get_kb_stats(self):
        """Get knowledge base statistics.

        Query parameters:
        - kb_id: knowledge base ID (required)
        """
        try:
            kb_manager = self._get_kb_manager()
            kb_id = request.args.get("kb_id")
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__

            kb_helper = await kb_manager.get_kb(kb_id)
            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__
            kb = kb_helper.kb

            stats = {
                "kb_id": kb.kb_id,
                "kb_name": kb.kb_name,
                "doc_count": kb.doc_count,
                "chunk_count": kb.chunk_count,
                "created_at": kb.created_at.isoformat(),
                "updated_at": kb.updated_at.isoformat(),
            }

            return Response().ok(stats).__dict__

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to get knowledge base statistics: {e}")
            logger.error(traceback.format_exc())
            return (
                Response()
                .error(f"Failed to get knowledge base statistics: {e!s}")
                .__dict__
            )

    # ===== Document management API =====

    async def list_documents(self):
        """List the documents in a knowledge base.

        Query parameters:
        - kb_id: knowledge base ID (required)
        - page: page number (default 1)
        - page_size: items per page (default 100)
        """
        try:
            kb_manager = self._get_kb_manager()
            kb_id = request.args.get("kb_id")
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__
            kb_helper = await kb_manager.get_kb(kb_id)
            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__

            page = request.args.get("page", 1, type=int)
            page_size = request.args.get("page_size", 100, type=int)

            # Pages are 1-based, so page 1 starts at offset 0.
            offset = (page - 1) * page_size
            limit = page_size

            doc_list = await kb_helper.list_documents(offset=offset, limit=limit)

            doc_list = [doc.model_dump() for doc in doc_list]

            return (
                Response()
                .ok({"items": doc_list, "page": page, "page_size": page_size})
                .__dict__
            )

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to list documents: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to list documents: {e!s}").__dict__

    async def upload_document(self):
        """Upload documents.

        Only multipart/form-data uploads are handled here (multiple files,
        at most 10 per request). A request with any other Content-Type, such
        as a JSON body with base64 content, is rejected by the check below.

        Form data (multipart/form-data):
        - kb_id: knowledge base ID (required)
        - file: file object (required, may repeat; field names file, file1,
          file2, ... or files[])
        - chunk_size: chunk size (optional, default 512)
        - chunk_overlap: chunk overlap (optional, default 50)
        - batch_size: batch size (optional, default 32)
        - tasks_limit: concurrent task limit (optional, default 3)
        - max_retries: maximum number of retries (optional, default 3)

        Returns:
        - task_id: task ID used to query upload progress and results
        """
        try:
            kb_manager = self._get_kb_manager()

            # Check the Content-Type
            content_type = request.content_type
            kb_id = None
            chunk_size = None
            chunk_overlap = None
            batch_size = 32
            tasks_limit = 3
            max_retries = 3
            files_to_upload = []  # Info for each file awaiting upload

            if content_type and "multipart/form-data" not in content_type:
                return (
                    Response().error("Content-Type must be multipart/form-data").__dict__
                )
            form_data = await request.form
            files = await request.files

            kb_id = form_data.get("kb_id")
            # A non-numeric value raises ValueError, which is reported below.
            chunk_size = int(form_data.get("chunk_size", 512))
            chunk_overlap = int(form_data.get("chunk_overlap", 50))
            batch_size = int(form_data.get("batch_size", 32))
            tasks_limit = int(form_data.get("tasks_limit", 3))
            max_retries = int(form_data.get("max_retries", 3))
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__

            # Collect all files
            file_list = []
            # Accepts file, file1, file2, ... and files[]; every one of these
            # field names starts with "file".
            for key in files.keys():
                if key.startswith("file"):
                    file_items = files.getlist(key)
                    file_list.extend(file_items)

            if not file_list:
                return Response().error("No files provided").__dict__

            # Limit the number of files
            if len(file_list) > 10:
                return Response().error("At most 10 files can be uploaded").__dict__

            # Process each file
            for file in file_list:
                file_name = file.filename

                # Save to a temporary file
                temp_file_path = os.path.join(
                    get_astrbot_temp_path(),
                    f"kb_upload_{uuid.uuid4()}_{file_name}",
                )
                await file.save(temp_file_path)

                try:
                    # Read the file content asynchronously
                    async with aiofiles.open(temp_file_path, "rb") as f:
                        file_content = await f.read()

                    # Derive the file type from the extension
                    file_type = (
                        file_name.rsplit(".", 1)[-1].lower() if "." in file_name else ""
                    )

                    files_to_upload.append(
                        {
                            "file_name": file_name,
                            "file_content": file_content,
                            "file_type": file_type,
                        },
                    )
                finally:
                    # Remove the temporary file
                    if os.path.exists(temp_file_path):
                        os.remove(temp_file_path)

            # Look up the knowledge base
            kb_helper = await kb_manager.get_kb(kb_id)
            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__

            # Generate the task ID
            task_id = str(uuid.uuid4())

            # Initialize the task state
            self._init_task(task_id, status="pending")

            # Start the background task
            asyncio.create_task(
                self._background_upload_task(
                    task_id=task_id,
                    kb_helper=kb_helper,
                    files_to_upload=files_to_upload,
                    chunk_size=chunk_size,
                    chunk_overlap=chunk_overlap,
                    batch_size=batch_size,
                    tasks_limit=tasks_limit,
                    max_retries=max_retries,
                ),
            )

            return (
                Response()
                .ok(
                    {
                        "task_id": task_id,
                        "file_count": len(files_to_upload),
                        "message": "task created, processing in background",
                    },
                )
                .__dict__
            )

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to upload documents: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to upload documents: {e!s}").__dict__

    def _validate_import_request(self, data: dict):
        kb_id = data.get("kb_id")
        if not kb_id:
            raise ValueError("Missing parameter kb_id")

        documents = data.get("documents")
        if not documents or not isinstance(documents, list):
            raise ValueError("Parameter documents is missing or malformed")

        for doc in documents:
            if "file_name" not in doc or "chunks" not in doc:
                raise ValueError(
                    "Malformed document: file_name and chunks are required"
                )
            if not isinstance(doc["chunks"], list):
                raise ValueError("chunks must be a list")
            if not all(
                isinstance(chunk, str) and chunk.strip() for chunk in doc["chunks"]
            ):
                raise ValueError("chunks must be a list of non-empty strings")

        batch_size = data.get("batch_size", 32)
        tasks_limit = data.get("tasks_limit", 3)
        max_retries = data.get("max_retries", 3)
        return kb_id, documents, batch_size, tasks_limit, max_retries

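As a rough illustration of the checks in `_validate_import_request`, the sketch below restates them as a free function and runs it on a sample payload. `validate_import_payload`, `rejects`, and the sample values (`kb-123`, `guide.md`) are hypothetical and exist only for this example.

```python
from typing import Any


def validate_import_payload(
    data: dict[str, Any],
) -> tuple[str, list[dict[str, Any]], int, int, int]:
    """Hypothetical free-function restatement of the validation rules."""
    kb_id = data.get("kb_id")
    if not kb_id:
        raise ValueError("missing kb_id")
    documents = data.get("documents")
    if not documents or not isinstance(documents, list):
        raise ValueError("documents must be a non-empty list")
    for doc in documents:
        if "file_name" not in doc or "chunks" not in doc:
            raise ValueError("each document needs file_name and chunks")
        chunks = doc["chunks"]
        if not isinstance(chunks, list):
            raise ValueError("chunks must be a list")
        # Whitespace-only strings count as empty.
        if not all(isinstance(c, str) and c.strip() for c in chunks):
            raise ValueError("chunks must be non-empty strings")
    return (
        kb_id,
        documents,
        data.get("batch_size", 32),
        data.get("tasks_limit", 3),
        data.get("max_retries", 3),
    )


def rejects(data: dict[str, Any]) -> bool:
    """Return True when the payload fails validation."""
    try:
        validate_import_payload(data)
    except ValueError:
        return True
    return False


payload = {
    "kb_id": "kb-123",
    "documents": [
        {"file_name": "guide.md", "chunks": ["Install the bot.", "Configure it."]},
    ],
}
result = validate_import_payload(payload)
```

One consequence of using `all()` is that a document with an empty `chunks` list passes validation, because `all()` over an empty iterable is true.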
    async def import_documents(self):
        """Import pre-chunked documents.

        Body:
        - kb_id: knowledge base ID (required)
        - documents: list of documents (required)
            - file_name: file name (required)
            - chunks: list of chunks (required, list[str])
            - file_type: file type (optional; inferred from the file name,
              otherwise txt)
        - batch_size: batch size (optional, default 32)
        - tasks_limit: concurrent task limit (optional, default 3)
        - max_retries: maximum number of retries (optional, default 3)
        """
        try:
            kb_manager = self._get_kb_manager()
            data = await request.json

            kb_id, documents, batch_size, tasks_limit, max_retries = (
                self._validate_import_request(data)
            )

            # Look up the knowledge base
            kb_helper = await kb_manager.get_kb(kb_id)
            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__

            # Generate the task ID
            task_id = str(uuid.uuid4())

            # Initialize the task state
            self._init_task(task_id, status="pending")

            # Start the background task
            asyncio.create_task(
                self._background_import_task(
                    task_id=task_id,
                    kb_helper=kb_helper,
                    documents=documents,
                    batch_size=batch_size,
                    tasks_limit=tasks_limit,
                    max_retries=max_retries,
                ),
            )

            return (
                Response()
                .ok(
                    {
                        "task_id": task_id,
                        "doc_count": len(documents),
                        "message": "import task created, processing in background",
                    },
                )
                .__dict__
            )

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to import documents: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to import documents: {e!s}").__dict__

    async def get_upload_progress(self):
        """Get upload progress and results.

        Query parameters:
        - task_id: task ID (required)

        Returned status:
        - pending: the task is waiting to be processed
        - processing: the task is being processed
        - completed: the task has finished
        - failed: the task has failed
        """
        try:
            task_id = request.args.get("task_id")
            if not task_id:
                return Response().error("Missing parameter task_id").__dict__

            # Check that the task exists
            if task_id not in self.upload_tasks:
                return Response().error("Task not found").__dict__

            task_info = self.upload_tasks[task_id]
            status = task_info["status"]

            # Build the response payload
            response_data = {
                "task_id": task_id,
                "status": status,
            }

            # While the task is processing, include progress information
            if status == "processing" and task_id in self.upload_progress:
                response_data["progress"] = self.upload_progress[task_id]

            # Once the task has completed, include the result
            if status == "completed":
                response_data["result"] = task_info["result"]
                # Clean up the completed task (currently disabled)
                # del self.upload_tasks[task_id]
                # if task_id in self.upload_progress:
                #     del self.upload_progress[task_id]

            # If the task failed, include the error message
            if status == "failed":
                response_data["error"] = task_info["error"]

            return Response().ok(response_data).__dict__

        except Exception as e:
            logger.error(f"Failed to get upload progress: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to get upload progress: {e!s}").__dict__

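A client would typically poll this endpoint until the status becomes `completed` or `failed`. The sketch below shows one way to do that, assuming a caller-supplied `fetch_status` coroutine that returns the handler's inner payload (`task_id`, `status`, and optionally `progress`, `result`, or `error`). `wait_for_task`, `make_fake_fetch`, and the canned replies are hypothetical; no real HTTP client or route is implied.

```python
import asyncio
from typing import Any, Awaitable, Callable

FetchStatus = Callable[[str], Awaitable[dict[str, Any]]]


async def wait_for_task(
    fetch_status: FetchStatus,
    task_id: str,
    interval: float = 0.5,
    max_polls: int = 120,
) -> dict[str, Any]:
    """Poll until the task completes, fails, or the poll budget runs out."""
    for _ in range(max_polls):
        data = await fetch_status(task_id)
        if data["status"] == "completed":
            return data["result"]
        if data["status"] == "failed":
            raise RuntimeError(data["error"])
        # "pending" and "processing" both mean: wait and ask again.
        await asyncio.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")


def make_fake_fetch() -> FetchStatus:
    """Canned replies standing in for real requests to the progress endpoint."""
    replies = iter(
        [
            {"task_id": "t1", "status": "pending"},
            {
                "task_id": "t1",
                "status": "processing",
                "progress": {"stage": "parsing", "current": 0, "total": 100},
            },
            {
                "task_id": "t1",
                "status": "completed",
                "result": {"success_count": 1, "failed_count": 0},
            },
        ]
    )

    async def fetch(task_id: str) -> dict[str, Any]:
        return next(replies)

    return fetch


outcome = asyncio.run(wait_for_task(make_fake_fetch(), "t1", interval=0))
```

Bounding the number of polls keeps a client from waiting forever if a task never leaves the `pending` state.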
    async def get_document(self):
        """Get document details.

        Query parameters:
        - kb_id: knowledge base ID (required)
        - doc_id: document ID (required)
        """
        try:
            kb_manager = self._get_kb_manager()
            kb_id = request.args.get("kb_id")
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__
            doc_id = request.args.get("doc_id")
            if not doc_id:
                return Response().error("Missing parameter doc_id").__dict__
            kb_helper = await kb_manager.get_kb(kb_id)
            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__

            doc = await kb_helper.get_document(doc_id)
            if not doc:
                return Response().error("Document not found").__dict__

            return Response().ok(doc.model_dump()).__dict__

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to get document details: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to get document details: {e!s}").__dict__

    async def delete_document(self):
        """Delete a document.

        Body:
        - kb_id: knowledge base ID (required)
        - doc_id: document ID (required)
        """
        try:
            kb_manager = self._get_kb_manager()
            data = await request.json

            kb_id = data.get("kb_id")
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__
            doc_id = data.get("doc_id")
            if not doc_id:
                return Response().error("Missing parameter doc_id").__dict__

            kb_helper = await kb_manager.get_kb(kb_id)
            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__

            await kb_helper.delete_document(doc_id)
            return Response().ok(message="Document deleted").__dict__

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to delete document: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to delete document: {e!s}").__dict__

    async def delete_chunk(self):
        """Delete a text chunk.

        Body:
        - kb_id: knowledge base ID (required)
        - chunk_id: chunk ID (required)
        - doc_id: ID of the document that owns the chunk (required)
        """
        try:
            kb_manager = self._get_kb_manager()
            data = await request.json

            kb_id = data.get("kb_id")
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__
            chunk_id = data.get("chunk_id")
            if not chunk_id:
                return Response().error("Missing parameter chunk_id").__dict__
            doc_id = data.get("doc_id")
            if not doc_id:
                return Response().error("Missing parameter doc_id").__dict__

            kb_helper = await kb_manager.get_kb(kb_id)
            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__

            await kb_helper.delete_chunk(chunk_id, doc_id)
            return Response().ok(message="Chunk deleted").__dict__

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to delete chunk: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to delete chunk: {e!s}").__dict__

    async def list_chunks(self):
        """List the chunks of a document.

        Query parameters:
        - kb_id: knowledge base ID (required)
        - doc_id: document ID (required)
        - page: page number (default 1)
        - page_size: items per page (default 100)
        """
        try:
            kb_manager = self._get_kb_manager()
            kb_id = request.args.get("kb_id")
            doc_id = request.args.get("doc_id")
            page = request.args.get("page", 1, type=int)
            page_size = request.args.get("page_size", 100, type=int)
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__
            if not doc_id:
                return Response().error("Missing parameter doc_id").__dict__
            kb_helper = await kb_manager.get_kb(kb_id)
            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__
            # Pages are 1-based, so page 1 starts at offset 0.
            offset = (page - 1) * page_size
            limit = page_size
            chunk_list = await kb_helper.get_chunks_by_doc_id(
                doc_id=doc_id,
                offset=offset,
                limit=limit,
            )
            return (
                Response()
                .ok(
                    data={
                        "items": chunk_list,
                        "page": page,
                        "page_size": page_size,
                        "total": await kb_helper.get_chunk_count_by_doc_id(doc_id),
                    },
                )
                .__dict__
            )
        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to list chunks: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Failed to list chunks: {e!s}").__dict__

    # ===== Retrieval API =====

    async def retrieve(self):
        """Retrieve from knowledge bases.

        Body:
        - query: query text (required)
        - kb_names: list of knowledge base names (required)
        - top_k: number of results to return (optional, default 5)
        - debug: enable debug mode and return a t-SNE visualization image
          (optional, default False)
        """
        try:
            kb_manager = self._get_kb_manager()
            data = await request.json

            query = data.get("query")
            kb_names = data.get("kb_names")
            debug = data.get("debug", False)

            if not query:
                return Response().error("Missing parameter query").__dict__
            if not kb_names or not isinstance(kb_names, list):
                return (
                    Response().error("Parameter kb_names is missing or malformed").__dict__
                )

            top_k = data.get("top_k", 5)

            results = await kb_manager.retrieve(
                query=query,
                kb_names=kb_names,
                top_m_final=top_k,
            )
            result_list = []
            if results:
                result_list = results["results"]

            response_data = {
                "results": result_list,
                "total": len(result_list),
                "query": query,
            }

            # Debug mode: generate the t-SNE visualization
            if debug:
                try:
                    img_base64 = await generate_tsne_visualization(
                        query,
                        kb_names,
                        kb_manager,
                    )
                    if img_base64:
                        response_data["visualization"] = img_base64
                except Exception as e:
                    # A visualization failure is reported without failing the request.
                    logger.error(f"Failed to generate t-SNE visualization: {e}")
                    logger.error(traceback.format_exc())
                    response_data["visualization_error"] = str(e)

            return Response().ok(response_data).__dict__

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Retrieval failed: {e}")
            logger.error(traceback.format_exc())
            return Response().error(f"Retrieval failed: {e!s}").__dict__

    async def upload_document_from_url(self):
        """Upload a document from a URL.

        Body:
        - kb_id: knowledge base ID (required)
        - url: URL of the web page to extract content from (required)
        - chunk_size: chunk size (optional, default 512)
        - chunk_overlap: chunk overlap (optional, default 50)
        - batch_size: batch size (optional, default 32)
        - tasks_limit: concurrent task limit (optional, default 3)
        - max_retries: maximum number of retries (optional, default 3)
        - enable_cleaning: passed through to kb_helper.upload_from_url
          (optional, default False)
        - cleaning_provider_id: passed through to kb_helper.upload_from_url
          (optional)

        Returns:
        - task_id: task ID used to query upload progress and results
        """
        try:
            kb_manager = self._get_kb_manager()
            data = await request.json

            kb_id = data.get("kb_id")
            if not kb_id:
                return Response().error("Missing parameter kb_id").__dict__

            url = data.get("url")
            if not url:
                return Response().error("Missing parameter url").__dict__

            chunk_size = data.get("chunk_size", 512)
            chunk_overlap = data.get("chunk_overlap", 50)
            batch_size = data.get("batch_size", 32)
            tasks_limit = data.get("tasks_limit", 3)
            max_retries = data.get("max_retries", 3)
            enable_cleaning = data.get("enable_cleaning", False)
            cleaning_provider_id = data.get("cleaning_provider_id")

            # Look up the knowledge base
            kb_helper = await kb_manager.get_kb(kb_id)
            if not kb_helper:
                return Response().error("Knowledge base not found").__dict__

            # Generate the task ID
            task_id = str(uuid.uuid4())

            # Initialize the task state
            self._init_task(task_id, status="pending")

            # Start the background task
            asyncio.create_task(
                self._background_upload_from_url_task(
                    task_id=task_id,
                    kb_helper=kb_helper,
                    url=url,
                    chunk_size=chunk_size,
                    chunk_overlap=chunk_overlap,
                    batch_size=batch_size,
                    tasks_limit=tasks_limit,
                    max_retries=max_retries,
                    enable_cleaning=enable_cleaning,
                    cleaning_provider_id=cleaning_provider_id,
                ),
            )

            return (
                Response()
                .ok(
                    {
                        "task_id": task_id,
                        "url": url,
                        "message": "URL upload task created, processing in background",
                    },
                )
                .__dict__
            )

        except ValueError as e:
            return Response().error(str(e)).__dict__
        except Exception as e:
            logger.error(f"Failed to upload document from URL: {e}")
            logger.error(traceback.format_exc())
            return (
                Response().error(f"Failed to upload document from URL: {e!s}").__dict__
            )

    async def _background_upload_from_url_task(
        self,
        task_id: str,
        kb_helper,
        url: str,
        chunk_size: int,
        chunk_overlap: int,
        batch_size: int,
        tasks_limit: int,
        max_retries: int,
        enable_cleaning: bool,
        cleaning_provider_id: str | None,
    ) -> None:
        """Background task that uploads a document from a URL."""
        try:
            # Initialize the task state
            self._init_task(task_id, status="processing")
            self.upload_progress[task_id] = {
                "status": "processing",
                "file_index": 0,
                "file_total": 1,
                "file_name": f"URL: {url}",
                "stage": "extracting",
                "current": 0,
                "total": 100,
            }

            # Create the progress callback
            progress_callback = self._make_progress_callback(task_id, 0, f"URL: {url}")

            # Upload the document
            doc = await kb_helper.upload_from_url(
                url=url,
                chunk_size=chunk_size,
                chunk_overlap=chunk_overlap,
                batch_size=batch_size,
                tasks_limit=tasks_limit,
                max_retries=max_retries,
                progress_callback=progress_callback,
                enable_cleaning=enable_cleaning,
                cleaning_provider_id=cleaning_provider_id,
            )

            # Record the final task state
            result = {
                "task_id": task_id,
                "uploaded": [doc.model_dump()],
                "failed": [],
                "total": 1,
                "success_count": 1,
                "failed_count": 0,
            }

            self._set_task_result(task_id, "completed", result=result)

        except Exception as e:
            logger.error(f"Background URL upload task {task_id} failed: {e}")
            logger.error(traceback.format_exc())
            self._set_task_result(task_id, "failed", error=str(e))