
docs: improve project documentation; add a complete README and SKILL guide

Daily Deploy Bot, 20 hours ago
parent
commit
0bcb85666c
2 files changed, with 333 additions and 66 deletions
  1. README.md (+248, -0)
  2. arxiv-digest/SKILL.md (+85, -66)

+ 248 - 0
README.md

@@ -0,0 +1,248 @@
+# RobotDaily
+
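+> 🤖 Daily curated digest of AI and robotics papers
+
+RobotDaily is an automated paper-curation system. Every day it fetches the latest arXiv papers on embodied intelligence, representation learning, and reinforcement learning, filters and annotates them with an LLM, then pushes the result to a Discord channel and archives it to a Hugo blog.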
+
+---
+
+## Project structure
+
+```
+skills/robdaily/
+├── arxiv-digest/              # Core module: paper fetching and processing
+│   ├── scripts/               # Python scripts
+│   │   ├── run_daily.py       # Main entry point: runs the full pipeline in one command
+│   │   ├── fetch_arxiv.py     # Fetch papers from the arXiv RSS feed
+│   │   ├── search_arxiv_papers.py  # arXiv API search
+│   │   ├── select_papers.py   # Paper selection and scoring
+│   │   ├── get_daily_papers.py        # Combined fetching tool
+│   │   ├── complete_llm_pipeline.py   # Full LLM-enrichment pipeline
+│   │   ├── render_digest.py           # Render the HTML/Markdown digest
+│   │   ├── enhanced_translation.py    # High-quality Chinese translation
+│   │   ├── translate_abstract.py      # Abstract translation
+│   │   ├── llm_translation_extraction.py  # LLM extraction and commentary
+│   │   ├── publish_discord.py         # Discord publishing
+│   │   ├── publish_hugo.py            # Hugo sync
+│   │   ├── format_telegram_card.py    # Telegram card formatting (deprecated)
+│   │   └── install_system_cron.py     # System cron job installer
+│   ├── output/                # Daily output directory (auto-generated)
+│   │   └── YYYY-MM-DD/
+│   │       ├── candidates.json        # Candidate paper list
+│   │       ├── selected.json          # Selected paper list
+│   │       ├── enriched.json          # Data after LLM enrichment
+│   │       ├── robotdaily.html        # Mobile-friendly HTML digest cards
+│   │       ├── robotdaily.md          # Markdown archive version
+│   │       └── manifest.json          # Metadata manifest
+│   ├── logs/                # Run logs
+│   ├── references/          # Reference documents
+│   │   └── selection-and-delivery.md  # Selection and delivery strategy
+│   ├── assets/              # Static assets (templates, styles)
+│   ├── .env                 # Environment variable configuration
+│   └── .env.example         # Configuration template
+│
+├── site/                    # Hugo blog site
+│   ├── content/             # Markdown content
+│   │   ├── _index.md        # Home page
+│   │   ├── ai-daily/        # Daily AI briefs (auto-generated)
+│   │   │   ├── _index.md
+│   │   │   └── YYYY-MM-DD.md
+│   │   ├── blog/            # Personal blog posts
+│   │   ├── projects/        # Project documentation
+│   │   │   └── robotdaily/
+│   │   │       ├── architecture.md  # Architecture notes
+│   │   │       ├── roadmap.md       # Roadmap
+│   │   │       ├── ops.md           # Operations guide
+│   │   │       └── changelog.md     # Changelog
+│   │   └── resume/          # Resume page
+│   ├── layouts/             # Hugo templates
+│   │   ├── _default/
+│   │   │   ├── baseof.html
+│   │   │   ├── list.html
+│   │   │   └── single.html
+│   │   └── index.html       # Home page template
+│   ├── static/              # Static assets
+│   │   └── css/
+│   │       └── site.css
+│   ├── hugo.yaml            # Hugo configuration
+│   └── README.md            # Site notes
+│
+├── deploy/                    # Docker deployment
+│   ├── docker-compose.yml     # Container orchestration
+│   ├── .env.example           # Environment variable template
+│   └── README.md              # Deployment notes
+│
+├── node_modules/              # Node.js dependencies
+├── package.json               # Node.js configuration
+├── generate_arxiv_digest.js   # Node.js entry script (legacy)
+└── README.md                  # This file
+```
+
+---
+
+## Quick start
+
+### 1. Configure environment variables
+
+```bash
+cd skills/robdaily/arxiv-digest
+cp .env.example .env
+# Edit .env to set the Ollama model, Discord token, etc.
+```
+
+### 2. Run the daily digest
+
+```bash
+# Generate the digest only (dry run)
+python3 scripts/run_daily.py
+
+# Generate and publish to Discord
+python3 scripts/run_daily.py --publish-discord
+
+# Generate and sync to Hugo
+python3 scripts/run_daily.py --publish-hugo
+
+# Generate and publish to both
+python3 scripts/run_daily.py --publish-discord --publish-hugo
+
+# Skip LLM enrichment (quick test)
+python3 scripts/run_daily.py --skip-enrich
+```
+
+### 3. Start the Hugo site (local development)
+
+```bash
+cd skills/robdaily/site
+hugo server -D
+# Visit http://localhost:1313
+```
+
+### 4. Docker deployment
+
+```bash
+cd skills/robdaily/deploy
+cp .env.example .env
+docker compose up -d
+# Visit http://localhost:9080
+```
+```
+
+---
+
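+## Core features
+
+### Paper fetching and selection
+
+- **Automatic fetching**: pulls the latest papers from arXiv RSS/API
+- **Domain focus**: embodied intelligence, representation learning, reinforcement learning
+- **Scoring**: ranks papers automatically by innovation, applied value, and recency
+- **Curated picks**: selects the 2-3 most valuable papers per domain
+
+### LLM-enriched commentary
+
+- **Chinese abstract translation**: high-quality translation with a local Ollama model
+- **Value commentary**: generates a short summary of the key technical points
+- **Tagging**: extracts keywords and tags automatically
+- **Mobile-friendly**: renders cards styled for reading on a phone
+
+### Multi-channel publishing
+
+- **Discord**: supports several delivery modes, including thread, channel, and fixed channel
+- **Hugo blog**: generates the daily brief page automatically
+- **Scheduling**: runs under system cron or OpenClaw cron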
+
+---
+
+## Configuration
+
+### Key environment variables
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `INSIGHT_MODELS` | Ollama model name | `qwen3.5:27b` |
+| `ROBOTDAILY_OUTPUT_DIR` | Output directory | `./output` |
+| `HUGO_CONTENT_DIR` | Hugo content directory | `../site/content/ai-daily` |
+| `DISCORD_DELIVERY_MODE` | Delivery mode | `thread` |
+| `DISCORD_BOT_TOKEN` | Discord Bot Token | - |
+| `DISCORD_TARGET_CHANNEL_ID` | Target channel ID | - |
+
+### Delivery modes
+
+- `thread`: post in a Discord thread (default)
+- `channel`: create a new channel each day
+- `fixed-channel`: use the fixed channel name `robotdaily`
+- `existing-channel`: use the specified channel ID
+
+---
+
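+## Maintenance guide
+
+### Routine operations
+
+1. **Check the log**: `cat arxiv-digest/logs/robotdaily.log`
+2. **Inspect the output**: `ls arxiv-digest/output/YYYY-MM-DD/`
+3. **Validate Hugo**: `cd site && hugo --quiet`
+
+### Troubleshooting
+
+- **Discord publishing fails**: check `DISCORD_BOT_TOKEN` and the channel permissions
+- **LLM translation fails**: confirm that the Ollama service is running
+- **Hugo build fails**: check the Markdown formatting and front matter
+
+### Scheduling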
+
+**OpenClaw Cron (recommended)**:
+
+```json
+{
+  "name": "RobotDaily 每日推送",
+  "schedule": {"expr": "30 10 * * *", "kind": "cron", "tz": "Asia/Shanghai"},
+  "payload": {
+    "kind": "agentTurn",
+    "message": "运行 RobotDaily 每日简报"
+  }
+}
+```
+
+**System cron**:
+
+```cron
+30 10 * * * cd /path/to/robdaily/arxiv-digest && python3 scripts/run_daily.py --publish-discord >> logs/robotdaily.log 2>&1
+```
+
+---
+
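+## Development notes
+
+### Adding a new paper source
+
+Edit `scripts/fetch_arxiv.py` or `scripts/search_arxiv_papers.py` and add the new query conditions.
+
+### Tuning the scoring algorithm
+
+Edit `scripts/select_papers.py` and change the weight parameters in the `score_paper()` function.
+
+### Customizing delivery templates
+
+- Discord cards: `scripts/publish_discord.py`
+- Hugo templates: `site/layouts/_default/`
+- HTML styles: `site/static/css/site.css`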
+
+---
+
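+## Version history
+
+- **2026-03-12**: Unified branches on `master`; removed `main`
+- **2026-03-10**: Discord delivery switched to embed cards
+- **2026-03-08**: Established the two-stage delivery strategy (fallback version + revised version)
+- **2026-03-06**: Initial release
+
+---
+
+## Related projects
+
+- [MathLab](../../mathlab/) - compiler for deep-learning math textbooks
+- [OpenClaw](https://github.com/openclaw/openclaw) - AI assistant framework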
+
+---
+
+*Last updated: 2026-03-12*
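
For orientation, the README attributes ranking to `score_paper()` in `scripts/select_papers.py` and names three signals (innovation, applied value, recency), with 2-3 papers kept per domain. The function itself is not part of this commit, so what follows is only a minimal sketch under assumptions: the `WEIGHTS` values, the `Paper` fields, and the `select_top()` helper are hypothetical names chosen for illustration, not the repository's code.

```python
from dataclasses import dataclass

# Hypothetical weights; the real values live in scripts/select_papers.py.
WEIGHTS = {"innovation": 0.4, "applied_value": 0.4, "recency": 0.2}


@dataclass
class Paper:
    title: str
    domain: str
    innovation: float      # each signal is assumed to be normalised to [0, 1]
    applied_value: float
    recency: float


def score_paper(paper, weights=WEIGHTS):
    """Weighted sum of the three signals named in the README."""
    return sum(w * getattr(paper, name) for name, w in weights.items())


def select_top(papers, per_domain=3):
    """Group papers by domain and keep the highest-scoring ones in each."""
    by_domain = {}
    for paper in papers:
        by_domain.setdefault(paper.domain, []).append(paper)
    return {
        domain: sorted(group, key=score_paper, reverse=True)[:per_domain]
        for domain, group in by_domain.items()
    }
```

Under this sketch, tuning the ranking means editing the `WEIGHTS` mapping only; the selection step stays unchanged.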

+ 85 - 66
arxiv-digest/SKILL.md

@@ -1,98 +1,117 @@
 ---
 name: arxiv-digest
-description: "Daily arXiv digest generation for embodied intelligence, representation learning, and reinforcement learning. Use when Codex needs to: (1) fetch recent papers from arXiv, (2) rank them with an applied-research bias, (3) pick 2-3 papers per domain, (4) translate abstracts into Chinese, add short explanations and tag keywords, (5) render mobile-friendly digest cards, or (6) publish the digest to Discord threads/channels on a schedule."
+description: "Daily arXiv digest generation for embodied intelligence, representation learning, and reinforcement learning. Use when Codex needs to: (1) fetch recent papers from arXiv, (2) rank them with an applied-research bias, (3) pick 2-3 papers per domain, (4) translate abstracts into Chinese, add short explanations and tag keywords, (5) render mobile-friendly digest cards, or (6) publish the digest to Discord on a schedule."
 ---
 
 # arXiv Digest
 
-Use `scripts/run_daily.py` as the single entry point.
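+Fetches the latest arXiv papers on embodied intelligence, representation learning, and reinforcement learning every day, filters and annotates them with an LLM, then pushes them to Discord and archives them to the Hugo blog.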
 
-## Workflow
+## Entry point
 
-1. Fetch recent arXiv papers with `scripts/fetch_arxiv.py`.
-2. Score papers for domain fit, applied value, innovation, and recency.
-3. Select 2-3 papers for each domain:
-   - embodied intelligence
-   - representation learning
-   - reinforcement learning
-4. Use the local Ollama model to produce:
-   - Chinese abstract translation
-   - short value commentary
-   - card tags
-5. Render two outputs:
-   - mobile-friendly HTML digest with expandable cards
-   - Markdown archive for Discord / quick search / Hugo import
-6. Publish to Hugo (optional):
-   - convert the Markdown digest into `site/content/ai-daily/YYYY-MM-DD.md`
-   - keep daily briefs separate from personal blog and resume content
-7. Publish to Discord:
-   - `thread` mode: OpenClaw-native daily thread/forum post
-   - `channel` mode: create one dated text channel per day via Discord REST + OpenClaw posting
-   - `fixed-channel` mode: reuse one stable channel name such as `robotdaily`, and create it if missing
-   - `existing-channel` mode: reuse a fixed channel id (best for already-known target channels)
-
-## Run commands
-
-Dry run without Discord:
+Use `scripts/run_daily.py` as the single entry point:
 
 ```bash
+# Generate the digest only (dry run)
 python3 scripts/run_daily.py
-```
-
-Generate digest and publish to Discord:
 
-```bash
+# Generate and publish to Discord
 python3 scripts/run_daily.py --publish-discord
-```
 
-Generate digest and sync into Hugo content:
-
-```bash
+# Generate and sync to Hugo
 python3 scripts/run_daily.py --publish-hugo
-```
 
-Generate digest but skip LLM enrichment:
+# Generate and publish to both
+python3 scripts/run_daily.py --publish-discord --publish-hugo
 
-```bash
+# Skip LLM enrichment (quick test)
 python3 scripts/run_daily.py --skip-enrich
 ```
 
-## Config
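+## Workflow
+
+1. **Fetch papers**: get the latest papers from arXiv RSS/API
+2. **Score**: rank automatically by innovation, applied value, and recency
+3. **Select per domain**: pick 2-3 papers for each domain (embodied intelligence, representation learning, reinforcement learning)
+4. **LLM enrichment**: use the local Ollama model to generate:
+   - Chinese abstract translation
+   - Short value commentary
+   - Card tags
+5. **Render outputs**:
+   - `robotdaily.html` - mobile-friendly HTML cards
+   - `robotdaily.md` - Markdown archive version
+6. **Multi-channel publishing** (optional):
+   - Discord: post to the RobotDaily channel
+   - Hugo: sync to `site/content/ai-daily/YYYY-MM-DD.md`
+
+## Output files
+
+Each run writes the following to the `output/YYYY-MM-DD/` directory:
+
+- `candidates.json` - candidate paper list
+- `selected.json` - selected paper list
+- `enriched.json` - data after LLM enrichment
+- `robotdaily.html` - mobile-friendly HTML digest cards
+- `robotdaily.md` - Markdown archive version
+- `manifest.json` - metadata manifest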
 
-Read `references/selection-and-delivery.md` when you need to tune scoring or choose the Discord delivery mode.
+## Configuration
 
-Common env vars in `arxiv-digest/.env`:
+Environment variables are set in `arxiv-digest/.env`:
 
-- `INSIGHT_MODELS=qwen3.5:27b`
-- `ROBOTDAILY_OUTPUT_DIR=/path/to/output`
-- `HUGO_CONTENT_DIR=/path/to/robdaily/site/content/ai-daily`
-- `DISCORD_DELIVERY_MODE=thread|channel|fixed-channel|existing-channel`
-- `DISCORD_ACCOUNT_ID=codex`
-- `DISCORD_GUILD_ID=...`
-- `DISCORD_PARENT_CHANNEL_ID=...`
-- `DISCORD_TARGET_CHANNEL_ID=...`
-- `DISCORD_TARGET_CHANNEL_NAME=robotdaily`
-- `DISCORD_CATEGORY_ID=...`
-- `DISCORD_BOT_TOKEN=...` (needed when a missing channel must be created)
-- `DISCORD_THREAD_AUTO_ARCHIVE_MIN=10080`
+```bash
+# LLM model
+INSIGHT_MODELS=qwen3.5:27b
+
+# Output directory
+ROBOTDAILY_OUTPUT_DIR=/home/zhn/.openclaw/workspace/skills/robdaily/arxiv-digest/output
 
-## Output
+# Hugo content directory
+HUGO_CONTENT_DIR=/home/zhn/.openclaw/workspace/skills/robdaily/site/content/ai-daily
 
-Each run writes a dated bundle containing:
+# Discord delivery mode: thread | channel | fixed-channel | existing-channel
+DISCORD_DELIVERY_MODE=existing-channel
 
-- `candidates.json`
-- `selected.json`
-- `enriched.json`
-- `robotdaily.html`
-- `robotdaily.md`
-- `manifest.json`
+# Discord settings
+DISCORD_ACCOUNT_ID=codex
+DISCORD_GUILD_ID=1474063117875937340
+DISCORD_TARGET_CHANNEL_ID=1481632217930141697
+DISCORD_BOT_TOKEN=your-bot-token-here
 
-## Scheduling
+# Thread auto-archive duration (minutes)
+DISCORD_THREAD_AUTO_ARCHIVE_MIN=10080
+```
 
-The pipeline is designed for a daily 10:30 run in Asia/Shanghai.
+## Scheduling
 
-Recommended cron entry example:
+Runs automatically every day at 10:30 (Asia/Shanghai):
 
 ```cron
-30 10 * * * cd /path/to/robdaily/arxiv-digest && python3 scripts/run_daily.py --publish-discord >> logs/robotdaily.log 2>&1
+30 10 * * * cd /home/zhn/.openclaw/workspace/skills/robdaily/arxiv-digest && python3 scripts/run_daily.py --publish-discord --publish-hugo >> logs/robotdaily.log 2>&1
+```
+
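+## Related documents
+
+- [Project structure](../../README.md) - structure of the whole RobotDaily project
+- [Selection and delivery strategy](references/selection-and-delivery.md) - paper scoring and delivery rules
+- [Hugo deployment notes](../../deploy/README.md) - Docker deployment
+
+## Maintenance
+
+### View the log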
+
+```bash
+cat arxiv-digest/logs/robotdaily.log
+```
+
+### Check the output
+
+```bash
+ls -la arxiv-digest/output/$(date +%Y-%m-%d)/
+```
+
+### Validate the Hugo site
+
+```bash
+cd site && hugo --quiet
 ```