V1.0.0: Index-based knowledge retrieval system

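Core features:
- Document indexing: an LLM analyzes each document and extracts keywords, a summary, topics, and entities
- Query processing: an LLM analyzes the query intent and expands the query keywords
- BM25 retrieval: relevance ranking over an inverted index
- RAG Q&A: retrieval-augmented generation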
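The BM25 retrieval step can be sketched roughly as follows. This is a minimal illustration, not the code from this commit. It assumes each chunk is already a list of tokens (for example, the keywords the LLM extracts at indexing time) and uses the common defaults k1 = 1.5 and b = 0.75. It also uses the Lucene-style IDF, log(1 + (N - df + 0.5) / (df + 0.5)), which never goes negative. The function names `build_inverted_index` and `bm25_search` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Build term -> {doc_id: term_frequency} postings and per-document lengths.

    `docs` is a hypothetical input format: dict of doc_id -> list of tokens.
    """
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, tokens in docs.items():
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25_search(query_terms, index, doc_len, k1=1.5, b=0.75, top_k=5):
    """Rank documents with Okapi BM25, visiting only the postings of query terms."""
    n_docs = len(doc_len)
    avgdl = sum(doc_len.values()) / n_docs if n_docs else 0.0
    scores = defaultdict(float)
    for term in set(query_terms):
        postings = index.get(term)  # .get() does not insert into the defaultdict
        if not postings:
            continue
        df = len(postings)
        # Lucene-style IDF: the +1 inside the log keeps it non-negative
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: longer-than-average documents are penalized
            norm = k1 * (1 - b + b * doc_len[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    docs = {
        "d1": ["bm25", "ranking", "inverted", "index"],
        "d2": ["vector", "embedding", "model"],
        "d3": ["bm25", "bm25", "keyword", "retrieval"],
    }
    index, doc_len = build_inverted_index(docs)
    # d1 ranks above d3; d2 shares no term with the query and is not returned
    print(bm25_search(["bm25", "index"], index, doc_len))
```

Scoring only the postings lists of the query terms is what allows ranking without an embedding model or a vector store. In the system described here, the LLM-expanded query keywords would presumably be passed as `query_terms`.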

Tech stack:
- Flask + SQLAlchemy
- LLM accessed through an OpenAI-compatible API
- BM25 algorithm

Key property: no dependence on embedding models or vector databases
2026-04-07 23:48:06 +08:00
commit cdaadef10c
10 changed files with 2079 additions and 0 deletions

templates/index.html

@@ -0,0 +1,162 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>LLM Index RAG - Index-Based Knowledge Retrieval System</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.10.0/font/bootstrap-icons.css" rel="stylesheet">
<style>
body { background-color: #f8f9fa; }
.hero { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 60px 0; }
.hero h1 { font-size: 2.5rem; margin-bottom: 1rem; }
.search-box { background: white; border-radius: 10px; padding: 20px; box-shadow: 0 5px 20px rgba(0,0,0,0.1); }
.stat-card { border-radius: 10px; border: none; transition: transform 0.3s; }
.stat-card:hover { transform: translateY(-5px); }
.result-item { border-left: 4px solid #667eea; padding-left: 15px; margin-bottom: 20px; }
.source-tag { font-size: 0.8rem; background: #e9ecef; padding: 2px 8px; border-radius: 4px; }
</style>
</head>
<body>
<!-- Navigation bar -->
<nav class="navbar navbar-expand-lg navbar-dark bg-dark">
<div class="container">
<a class="navbar-brand" href="/">
<i class="bi bi-search"></i> LLM Index RAG
</a>
<div class="navbar-nav ms-auto">
<a class="nav-link" href="/">Home</a>
<a class="nav-link" href="/documents">Documents</a>
<a class="nav-link" href="/search">Search</a>
</div>
</div>
</nav>
<!-- Hero section -->
<div class="hero">
<div class="container text-center">
<h1><i class="bi bi-diagram-3"></i> LLM Index RAG</h1>
<p class="lead">A knowledge retrieval system built on indexing and search (no embedding models)</p>
<p class="small">LLM-built index • Keyword retrieval • BM25 ranking</p>
</div>
</div>
<!-- Search area -->
<div class="container" style="margin-top: -30px;">
<div class="search-box">
<form id="searchForm">
<div class="input-group input-group-lg">
<input type="text" class="form-control" id="queryInput"
placeholder="Enter your question..." autocomplete="off">
<button class="btn btn-primary" type="submit">
<i class="bi bi-search"></i> Search
</button>
</div>
<div class="mt-2">
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" name="mode" id="modeSearch" value="search" checked>
<label class="form-check-label" for="modeSearch">Document search</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="radio" name="mode" id="modeRAG" value="rag">
<label class="form-check-label" for="modeRAG">AI Q&amp;A</label>
</div>
</div>
</form>
</div>
</div>
<!-- Statistics -->
<div class="container mt-5">
<div class="row">
<div class="col-md-3">
<div class="card stat-card text-center p-4">
<i class="bi bi-file-earmark-text display-4 text-primary"></i>
<h3 class="mt-2" id="statDocs">{{ stats.total_documents or 0 }}</h3>
<small class="text-muted">Documents</small>
</div>
</div>
<div class="col-md-3">
<div class="card stat-card text-center p-4">
<i class="bi bi-puzzle display-4 text-success"></i>
<h3 class="mt-2" id="statChunks">{{ stats.total_chunks or 0 }}</h3>
<small class="text-muted">Chunks</small>
</div>
</div>
<div class="col-md-3">
<div class="card stat-card text-center p-4">
<i class="bi bi-key display-4 text-warning"></i>
<h3 class="mt-2" id="statTerms">{{ stats.total_terms or 0 }}</h3>
<small class="text-muted">Index terms</small>
</div>
</div>
<div class="col-md-3">
<div class="card stat-card text-center p-4">
<i class="bi bi-file-word display-4 text-info"></i>
<h3 class="mt-2" id="statWords">{{ "{:,}".format(stats.total_words or 0) }}</h3>
<small class="text-muted">Total words</small>
</div>
</div>
</div>
</div>
<!-- Search results -->
<div class="container mt-4">
<div id="resultsSection" style="display:none;">
<h5><i class="bi bi-list-ul"></i> Search results <span id="resultCount" class="badge bg-secondary"></span></h5>
<div id="resultsContainer"></div>
</div>
<!-- RAG answer -->
<div id="ragSection" style="display:none;">
<h5><i class="bi bi-chat-dots"></i> AI answer</h5>
<div id="ragAnswer" class="card p-4 mb-4"></div>
<h6><i class="bi bi-book"></i> Sources</h6>
<div id="ragSources"></div>
</div>
</div>
<!-- How it works -->
<div class="container mt-5 mb-5">
<h4 class="text-center mb-4"><i class="bi bi-gear"></i> How it works</h4>
<div class="row">
<div class="col-md-4">
<div class="card h-100">
<div class="card-body text-center">
<i class="bi bi-file-earmark-plus display-4 text-primary"></i>
<h5 class="mt-3">1. Document indexing</h5>
<p class="text-muted">An LLM analyzes each document and extracts keywords, a summary, topics, and entities to build the index</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100">
<div class="card-body text-center">
<i class="bi bi-search-heart display-4 text-success"></i>
<h5 class="mt-3">2. Query processing</h5>
<p class="text-muted">An LLM analyzes the query intent, extracts keywords, and expands the query</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100">
<div class="card-body text-center">
<i class="bi bi-sort-numeric-up display-4 text-warning"></i>
<h5 class="mt-3">3. BM25 retrieval</h5>
<p class="text-muted">BM25 relevance scores are computed over the inverted index, and the most relevant documents are returned</p>
</div>
</div>
</div>
</div>
</div>
<footer class="bg-dark text-white py-4 mt-5">
<div class="container text-center">
<p>LLM Index RAG v1.0.0 | Index-Based Knowledge Retrieval System</p>
</div>
</footer>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/main.js"></script>
</body>
</html>