CAICAII
39b9e55434
perf: batch metadata query in KB retrieval to fix N+1 problem (#5463)
...
* perf: batch metadata query in KB retrieval to fix N+1 problem
Replace N sequential get_document_with_metadata() calls with a single
get_documents_with_metadata_batch() call using SQL IN clause.
Benchmark results (local SQLite):
- 10 docs: 10.67ms → 1.47ms (7.3x faster)
- 20 docs: 26.00ms → 2.68ms (9.7x faster)
- 50 docs: 63.87ms → 2.79ms (22.9x faster)
* refactor: use set[str] param type and chunk IN clause for SQLite safety
Address review feedback:
- Change doc_ids param from list[str] to set[str] to avoid unnecessary conversion
- Chunk IN clause into batches of 900 to stay under SQLite's 999 parameter limit
- Remove list() wrapping at call site, pass set directly
2026-02-26 09:59:37 +08:00
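The batching described in this commit can be sketched roughly as follows. This is a minimal illustration using the standard-library sqlite3 module, not the project's actual get_documents_with_metadata_batch(); the table name, column names, and the helper names get_documents_batch and build_demo_db are assumptions made for the example. Only the chunk size of 900 comes from the commit message.

```python
import sqlite3

# Chunk size taken from the commit message; keeps each query under
# SQLite's historical default limit of 999 bound parameters.
CHUNK_SIZE = 900


def get_documents_batch(conn, doc_ids, chunk_size=CHUNK_SIZE):
    """Fetch many rows with a few IN queries instead of one query per id.

    Hypothetical schema: documents(doc_id TEXT, doc_name TEXT).
    """
    ids = list(doc_ids)  # a set is accepted; order does not matter here
    results = {}
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ",".join("?" for _ in chunk)
        query = (
            "SELECT doc_id, doc_name FROM documents "
            f"WHERE doc_id IN ({placeholders})"
        )
        for doc_id, doc_name in conn.execute(query, chunk):
            results[doc_id] = doc_name
    return results


def build_demo_db(n):
    """Create an in-memory database with n demo rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (doc_id TEXT PRIMARY KEY, doc_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [(f"d{i}", f"name-{i}") for i in range(n)],
    )
    conn.commit()
    return conn
```

With 2000 ids this issues three queries (900 + 900 + 200) rather than 2000, which is the N+1 reduction the benchmark above measures.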
エイカク
9c691b2266
chore: remove Electron desktop pipeline and switch to tauri repo (#5226)
...
* ci: remove Electron desktop build from release pipeline
* chore: remove electron desktop and switch to tauri release trigger
* ci: remove desktop workflow dispatch trigger
* refactor: migrate data paths to astrbot_path helpers
* fix: point desktop update prompt to AstrBot-desktop releases
2026-02-19 23:04:18 +09:00
Dt8333
7dd95d8a59
chore: auto ann fix by ruff (#4903)
...
* chore: auto fix by ruff
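* refactor: unify return type annotations to None/bool to match the implementation
* refactor: make _get_next_page async and remove the redundant request error raise
* refactor: change the return type of get_client to object
* style: add None return type annotations to the relevant LarkMessageEvent methods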
---------
Co-authored-by: Soulter <37870767+Soulter@users.noreply.github.com>
2026-02-09 00:22:24 +08:00
Li-shi-ling
9a91f2fb11
fix: ensure atomic creation of knowledge base with proper cleanup on failure (#4406)
...
* fix: ensure atomic creation of knowledge base with proper cleanup on failure
- Added pre-validation for embedding_provider_id parameter
- Added check for existing knowledge base with same name
- Implemented proper rollback mechanism when KBHelper initialization fails
- Uses same session for cleanup to ensure data consistency
- Fixes #4403
* fix: ensure atomic KB creation with session.flush() to remove race condition risks
* fix: change the annotation back
2026-01-11 14:24:26 +08:00
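The create-then-roll-back pattern this commit describes might look like the sketch below. It swaps in the standard-library sqlite3 module for the project's ORM session, so the INSERT inside an open transaction stands in for session.flush(); the table layout and the names create_kb, make_db, and init_helper are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory database with a hypothetical knowledge_bases table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE knowledge_bases ("
        "kb_name TEXT PRIMARY KEY, "
        "embedding_provider_id TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def create_kb(conn, name, embedding_provider_id, init_helper):
    """Insert a KB row and initialize its helper as one atomic step."""
    # Pre-validate before touching the database.
    if not embedding_provider_id:
        raise ValueError("embedding_provider_id is required")
    try:
        # The PRIMARY KEY constraint rejects a duplicate name inside the
        # same transaction, so there is no check-then-insert race.
        conn.execute(
            "INSERT INTO knowledge_bases VALUES (?, ?)",
            (name, embedding_provider_id),
        )
        # Stand-in for helper initialization; may raise.
        init_helper(name)
        conn.commit()
    except Exception:
        # Undo the pending insert on the same connection, then re-raise.
        conn.rollback()
        raise
```

If init_helper raises, the pending row is discarded and no half-created knowledge base is left behind.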
Soulter
792fb69d6d
perf: allow zero chunk overlap in recursive chunker (#4258)
...
* Allow zero chunk overlap
* Validate recursive chunking bounds
2025-12-30 15:23:05 +08:00
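As a rough illustration of the bounds this commit validates, the sketch below accepts a zero overlap and rejects values that would stall the window. It uses a plain sliding window rather than the project's recursive chunker, and the function names validate_chunk_bounds and sliding_chunks are assumptions for the example.

```python
def validate_chunk_bounds(chunk_size: int, chunk_overlap: int) -> None:
    """Reject sizes that would produce empty or non-advancing chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if chunk_overlap < 0:
        raise ValueError("chunk_overlap must be zero or positive")
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int = 0):
    """Split text into windows; zero overlap yields disjoint chunks."""
    validate_chunk_bounds(chunk_size, chunk_overlap)
    step = chunk_size - chunk_overlap  # always >= 1 after validation
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap must stay strictly below the chunk size because the step is their difference; a step of zero would never advance through the text.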
Dt8333
f624971613
chore: fix a bunch of type checking errors (#3213)
...
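* chore(core.utils): 🚨 fix lint errors
* chore(core.provider): 🚨 fix lint errors in base classes
* chore(core.utils): add the missing overloads for session_get()
* chore(core.provider): 🚨 fix lint errors in implementations
* chore(core.platform): 🚨 fix lint errors in the platform base class and webchat
* chore(core.platform): fix lint errors caused by incorrect implementations
* fix(core.provider): fix circular call and incorrect assert
* chore(core.platform): fix lint errors in some implementations
* chore(core.provider): add parameter types for Dify.text_chat_stream
* chore(core.pipeline): 🚨 fix lint errors
* fix(core.slack): add missing import
* chore(core.utils): fix incorrect session_get declaration
* chore(core.platform): remove wildcard from Lark adapter imports
* chore(core.db): fix declarations and some logic
* chore(core.db): add typings so that faiss parameters are recognized correctly
* chore(core): fix declarations
* chore(core): update declarations
* chore: add faiss declarations
* chore(dashboard): adjust implementation to reduce reported errors
* chore(package): adjust some declarations and implementations to reduce reported errors
* chore(core): add overloads for Handler to remove some asserts while still passing type checking
* chore(core.pipeline): change Pipeline Scheduler's execute to check the type instead of an attribute so it passes static type checking
* chore(core.config): add type annotations to pass type checking
* chore(core.message): add a check to File._download_file to pass type checking
* fix: replace assertions with conditional checks to make graceful shutdown fault-tolerant
* refactor: remove asserts from the discord client; check for None with if and raise an exception instead
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* fix: DiscordPlatformAdapter logs and returns when self.client.user is None; remove assertion
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* fix: strengthen Lark null/exception checks and improve log output
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* refactor: replace assertions with conditional checks and add logging and error handling
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* chore: remove useless LLM-generated comments
* refactor: replace download logic with File.get_file, remove assert, and provide a default filename
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* fix: raise a runtime exception when the Slack Socket is uninitialized; change image URL empty check to a non-empty check
* refactor: replace WeChatPadProAdapter assertions with None checks and add logging
* refactor: use isinstance instead of assertions for type checks to aid static checking
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* fix: remove cast, use direct field and dict access, fix port parsing
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* refactor: refactor ProviderManager loading with match-case and raise TypeError via type checks
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* fix: in group_name_display, log an error and return if the group object is empty
* fix: replace the assert in _get_current_persona_id with an if guard that returns None
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* fix: improve plugin directory existence check and image URL non-empty validation; update JSON sorting config
* fix: replace the assert on datetime_str with an explicit check that raises an exception
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* refactor: remove cast; use a runtime check and skip when the scheduler is not found
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* refactor: remove cast; check FaissVecDB with isinstance and log a warning
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* fix: remove the typing.cast import and validate file_ before resolving the file's absolute path
* refactor: remove typing.cast and simplify the content safety check call
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* refactor: make PlatformMetadata.id required and pass id at registration; remove cast
* refactor: remove cast; initialize using HasInitialize and isinstance
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* fix: add an ID type check to ProviderManager.initialize to prevent get from failing on None
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* refactor: introduce _client and client attributes in OTTSProvider and AzureNativeProvider to improve context management
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* fix: add a model-not-initialized check to the self-hosted Whisper source and call transcribe directly
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* refactor: remove unused cast import and simplify platform_name assignment
* refactor: import cast and use cast(str, ...) on id to improve type safety
* fix: make _id_to_sid return str, returning an empty string for empty values; use cast on id and message_id
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* refactor: refactor Discord handling logic: force type conversion, prioritize slash commands, and improve mention detection
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* fix: consistently apply cast when getting id, and raise an error when WeChat message parsing fails
* Revert "fix: remove cast, use direct field and dict access, fix port parsing"
This reverts commit 1cbfdf9d1b.
* fix: Bailian Rerank returns an empty result when the session is closed; initialize request.prompt to avoid concatenating None
* fix: consistently handle search result links as strings; add _get_url helper and adapt Bing/Sogo
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
* refactor: adjust call_handler generics, Discord channel annotations, and FishAudioTTS API request types
* refactor: use col(...) instead of column references and cast results to CursorResult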
* chore: ruff format
---------
Co-authored-by: aider (openai/gemini-3-pro-high) <aider@aider.chat>
Co-authored-by: Soulter <905617992@qq.com>
2025-12-09 14:13:47 +08:00
RC-CHN
270c89c12f
feat: Add URL document parser for knowledge base (#3622)
...
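* feat: add uploading documents from a URL, with progress callbacks and error handling
* feat: add frontend for uploading documents from a URL
* chore: add a warning for the URL upload feature to make sure users configure it correctly
* feat: add content cleaning, with cleaning settings and provider selection when uploading documents from a URL
* feat: update the content cleaning system prompt and strengthen information extraction rules; add a beta badge to the URL upload feature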
* style: format code
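* perf: improve upload settings; strengthen disable logic and cleaning provider validation for URL uploads
* refactor: use the built-in chunking module
* refactor: extract prompt into a separate file
* feat: add Tavily API Key configuration dialog to improve the web search configuration experience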
* fix: update URL hint and warning messages for clarity in knowledge base upload settings
* fix: fix hot reload issue when setting tavily_key
---------
Co-authored-by: Soulter <905617992@qq.com>
2025-11-17 19:05:14 +08:00
LIghtJUNction
0b7fc29ac4
style: add ruff lint module of isort and pyupgrade, and some ruff check fix (#3214)
...
Co-authored-by: Dt8333 <25431943+Dt8333@users.noreply.github.com>
Co-authored-by: Soulter <905617992@qq.com>
2025-11-01 13:26:19 +08:00
Soulter
0823f7aa48
Use a set when checking membership in a literal collection
...
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
2025-10-25 22:04:17 +08:00
Soulter
eb201c0420
feat: refactor knowledge base parsers and add MarkitdownParser for docx, xls, xlsx support
2025-10-25 22:00:54 +08:00
lxfight
57f868cab1
Merge branch 'feature/knowledge-base' of https://github.com/lxfight/AstrBot into feature/knowledge-base
2025-10-25 13:53:03 +08:00
Soulter
016783a1e5
feat: implement RecursiveCharacterChunker and update KnowledgeBaseManager to use it
2025-10-25 13:46:06 +08:00
lxfight
594ccff9c8
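fix: add database connection check and knowledge base termination, improve error handling and cleanup logic, and fix knowledge bases that could not be deleted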
2025-10-25 11:56:37 +08:00
Soulter
8f021eb35a
feat: refactor document storage to use SQLModel and enhance database operations
2025-10-24 23:17:37 +08:00
Soulter
4cedc6d3c8
feat: add t-SNE visualization for FAISS index and enhance knowledge base retrieval with debug mode
2025-10-24 21:22:46 +08:00
Soulter
4e9cce76da
feat: add timing logs for dense and sparse retrieval processes and adjust top K results in sparse retriever
2025-10-24 17:51:30 +08:00
Soulter
9b004f3d2f
feat: update document retrieval to include limit and offset parameters
2025-10-24 17:38:22 +08:00
Soulter
9430e3090d
feat: add progress callback for document upload and enhance upload progress tracking
2025-10-24 17:13:44 +08:00
Soulter
ba44f9117b
feat: enhance document upload process with batch settings and improved chunk handling
2025-10-24 16:37:37 +08:00
Soulter
38e3f27899
feat: update knowledge base retrieval configuration and UI adjustments
2025-10-24 15:06:07 +08:00
Soulter
a6be0cc135
feat: refresh knowledge base and document after uploading a document
2025-10-24 14:28:27 +08:00
Soulter
a53510bc41
refactor: comment out file path handling in KBHelper and search input in DocumentDetail
2025-10-24 14:27:01 +08:00
Soulter
1fd482e899
feat: update chunk deletion to include document ID and refresh metadata
2025-10-24 14:18:32 +08:00
Soulter
2f130ba009
feat: delete chunk and delete document
2025-10-24 13:59:17 +08:00
Soulter
e0ac743cdb
perf: remove rerank functionality from settings tab and related form data
2025-10-24 12:13:51 +08:00
Soulter
7e0a50fbf2
feat: enhance knowledge base retrieval with chunk metadata and pagination support; remove unused chunk model
2025-10-24 00:44:40 +08:00
Soulter
59df244173
improve
2025-10-23 21:20:41 +08:00
Soulter
e3aa1315ae
stage
2025-10-23 00:31:15 +08:00
Soulter
65bc5efa19
feat: integrate the knowledge base manager, streamline the knowledge base context injection flow, and remove redundant code
2025-10-22 21:59:00 +08:00
lxfight
a05868cc45
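feat: update the knowledge base manager to support rerank model providers; adjust default configuration and hint messages of related components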
2025-10-20 22:38:06 +08:00
lxfight
2fc77aed15
feat: add knowledge base retrieval, with support for listing related sessions by knowledge base ID; update related UI and i18n text
2025-10-20 22:23:35 +08:00
lxfight
beccae933f
fix: fix the KBSessionConfig import issue
2025-10-19 21:36:01 +08:00
lxfight
a0254ed817
refactor: tidy code formatting of the knowledge base manager and database operations
2025-10-19 19:36:26 +08:00
lxfight
ad96d676e6
feat: implement core knowledge base backend modules
...
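- Implement the complete knowledge base data model (knowledge bases, documents, document chunks, session configuration)
- Implement SQLite-based vector database storage and retrieval
- Implement document parsers (PDF, TXT) and a fixed-size chunker
- Implement a hybrid retrieval system (dense vector retrieval + BM25 sparse retrieval + RRF fusion)
- Implement knowledge base lifecycle management and a message injector
- Support session-level knowledge base configuration and association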
2025-10-19 18:40:55 +08:00
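The RRF fusion step named in this commit can be sketched as below. This is the generic reciprocal rank fusion formula, score(d) = sum over rankings of 1 / (k + rank), and not necessarily the project's implementation; the function name rrf_fuse and the constant k=60 (a commonly used default) are assumptions.

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked id lists with reciprocal rank fusion.

    rankings: iterable of lists, each ordered best-first, e.g. one from
    dense vector retrieval and one from BM25 sparse retrieval.
    Returns ids sorted by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]   # hypothetical dense retrieval order
sparse_hits = ["b", "c", "d"]  # hypothetical BM25 order
fused = rrf_fuse([dense_hits, sparse_hits])
```

Documents that appear in both lists accumulate two terms, so they tend to outrank documents that only one retriever returned.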