@@ -1,98 +1,117 @@
---
name: arxiv-digest
-description: "Daily arXiv digest generation for embodied intelligence, representation learning, and reinforcement learning. Use when Codex needs to: (1) fetch recent papers from arXiv, (2) rank them with an applied-research bias, (3) pick 2-3 papers per domain, (4) translate abstracts into Chinese, add short explanations and tag keywords, (5) render mobile-friendly digest cards, or (6) publish the digest to Discord threads/channels on a schedule."
+description: "Daily arXiv digest generation for embodied intelligence, representation learning, and reinforcement learning. Use when Codex needs to: (1) fetch recent papers from arXiv, (2) rank them with an applied-research bias, (3) pick 2-3 papers per domain, (4) translate abstracts into Chinese, add short explanations and tag keywords, (5) render mobile-friendly digest cards, or (6) publish the digest to Discord on a schedule."
---
# arXiv Digest
-Use `scripts/run_daily.py` as the single entry point.
+Every day, fetch the latest arXiv papers on embodied intelligence, representation learning, and reinforcement learning, screen and explain them with an LLM, then push them to Discord and archive them on the Hugo blog.
-## Workflow
+## Entry point
-1. Fetch recent arXiv papers with `scripts/fetch_arxiv.py`.
-2. Score papers for domain fit, applied value, innovation, and recency.
-3. Select 2-3 papers for each domain:
- - Embodied intelligence
- - Representation learning
- - Reinforcement learning
-4. Use the local Ollama model to produce:
- - Chinese translation of the abstract
- - Short explanation of the paper's value
- - Card tags
-5. Render two outputs:
- - mobile-friendly HTML digest with expandable cards
- - Markdown archive for Discord / quick search / Hugo import
-6. Publish to Hugo (optional):
- - convert the Markdown digest into `site/content/ai-daily/YYYY-MM-DD.md`
- - keep daily briefs separate from personal blog and resume content
-7. Publish to Discord:
- - `thread` mode: OpenClaw-native daily thread/forum post
- - `channel` mode: create one dated text channel per day via Discord REST + OpenClaw posting
- - `fixed-channel` mode: reuse one stable channel name such as `robotdaily`, and create it if missing
- - `existing-channel` mode: reuse a fixed channel id (best for already-known target channels)
-
-## Run commands
-
-Dry run without Discord:
+Use `scripts/run_daily.py` as the single entry point:
```bash
+# Generate the digest only (dry run)
python3 scripts/run_daily.py
-```
-
-Generate digest and publish to Discord:
-```bash
+# Generate and publish to Discord
python3 scripts/run_daily.py --publish-discord
-```
-Generate digest and sync into Hugo content:
-
-```bash
+# Generate and sync to Hugo
python3 scripts/run_daily.py --publish-hugo
-```
-Generate digest but skip LLM enrichment:
+# Generate and publish to both
+python3 scripts/run_daily.py --publish-discord --publish-hugo
-```bash
+# Skip LLM enrichment (quick test)
python3 scripts/run_daily.py --skip-enrich
```
-## Config
+## Workflow
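+
+1. **Fetch papers**: pull the latest papers from the arXiv RSS feed/API
+2. **Score**: rank papers automatically by innovation, applied value, and recency
+3. **Select per domain**: pick 2-3 papers for each domain (embodied intelligence, representation learning, reinforcement learning)
+4. **LLM enrichment**: use the local Ollama model to produce:
+ - Chinese translation of the abstract
+ - Short explanation of the paper's value
+ - Card tags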
+5. **Render outputs**:
+ - `robotdaily.html` - mobile-friendly HTML cards
+ - `robotdaily.md` - Markdown archive
+6. **Multi-channel publishing** (optional):
+ - Discord: post to the RobotDaily channel
+ - Hugo: sync to `site/content/ai-daily/YYYY-MM-DD.md`
+
+## Output files
+
+Each run writes the following to `output/YYYY-MM-DD/`:
+
+- `candidates.json` - candidate papers
+- `selected.json` - selected papers
+- `enriched.json` - LLM-enriched data
+- `robotdaily.html` - mobile-friendly HTML digest cards
+- `robotdaily.md` - Markdown archive
+- `manifest.json` - metadata manifest
-Read `references/selection-and-delivery.md` when you need to tune scoring or choose the Discord delivery mode.
+## Configuration
-Common env vars in `arxiv-digest/.env`:
+Environment variables are set in `arxiv-digest/.env`:
-- `INSIGHT_MODELS=qwen3.5:27b`
-- `ROBOTDAILY_OUTPUT_DIR=/path/to/output`
-- `HUGO_CONTENT_DIR=/path/to/robdaily/site/content/ai-daily`
-- `DISCORD_DELIVERY_MODE=thread|channel|fixed-channel|existing-channel`
-- `DISCORD_ACCOUNT_ID=codex`
-- `DISCORD_GUILD_ID=...`
-- `DISCORD_PARENT_CHANNEL_ID=...`
-- `DISCORD_TARGET_CHANNEL_ID=...`
-- `DISCORD_TARGET_CHANNEL_NAME=robotdaily`
-- `DISCORD_CATEGORY_ID=...`
-- `DISCORD_BOT_TOKEN=...` (needed when a missing channel must be created)
-- `DISCORD_THREAD_AUTO_ARCHIVE_MIN=10080`
+```bash
+# LLM model
+INSIGHT_MODELS=qwen3.5:27b
+
+# Output directory
+ROBOTDAILY_OUTPUT_DIR=/home/zhn/.openclaw/workspace/skills/robdaily/arxiv-digest/output
-## Output
+# Hugo content directory
+HUGO_CONTENT_DIR=/home/zhn/.openclaw/workspace/skills/robdaily/site/content/ai-daily
-Each run writes a dated bundle containing:
+# Discord delivery mode: thread | channel | fixed-channel | existing-channel
+DISCORD_DELIVERY_MODE=existing-channel
-- `candidates.json`
-- `selected.json`
-- `enriched.json`
-- `robotdaily.html`
-- `robotdaily.md`
-- `manifest.json`
+# Discord settings
+DISCORD_ACCOUNT_ID=codex
+DISCORD_GUILD_ID=1474063117875937340
+DISCORD_TARGET_CHANNEL_ID=1481632217930141697
+DISCORD_BOT_TOKEN=your-bot-token-here
-## Scheduling
+# Thread auto-archive time (minutes)
+DISCORD_THREAD_AUTO_ARCHIVE_MIN=10080
+```
-The pipeline is designed for a daily 10:30 run in Asia/Shanghai.
+## Scheduling
-Recommended cron entry example:
+Runs automatically every day at 10:30 (Asia/Shanghai):
```cron
-30 10 * * * cd /path/to/robdaily/arxiv-digest && python3 scripts/run_daily.py --publish-discord >> logs/robotdaily.log 2>&1
+30 10 * * * cd /home/zhn/.openclaw/workspace/skills/robdaily/arxiv-digest && python3 scripts/run_daily.py --publish-discord --publish-hugo >> logs/robotdaily.log 2>&1
+```
+
+## Related docs
+
+- [Project structure](../../README.md) - layout of the whole RobotDaily project
+- [Selection and delivery strategy](references/selection-and-delivery.md) - paper scoring and delivery rules
+- [Hugo deployment](../../deploy/README.md) - Docker deployment setup
+
+## Maintenance
+
+### View logs
+
+```bash
+cat arxiv-digest/logs/robotdaily.log
+```
+
+### Check output
+
+```bash
+ls -la arxiv-digest/output/$(date +%Y-%m-%d)/
+```
+
+### Verify the Hugo site
+
+```bash
+cd site && hugo --quiet
```
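The workflow in this diff scores papers and then keeps 2-3 per domain. One way to model those two steps is a weighted sum followed by a per-domain top-k. This is a minimal sketch, not the project's code. The weights, the `Paper` fields, the domain keys, and the rule for admitting a third paper (`bonus_threshold`) are all hypothetical. The real scoring rules are documented in `references/selection-and-delivery.md`.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical weights over the four signals named in the old workflow text.
# They sum to 1.0, so a paper with every signal at 1.0 scores 1.0.
WEIGHTS = {"domain_fit": 0.35, "applied_value": 0.30, "innovation": 0.20, "recency": 0.15}

# Hypothetical keys for embodied intelligence, representation learning and RL.
DOMAINS = ("embodied", "representation", "rl")


@dataclass
class Paper:
    arxiv_id: str
    domain: str
    # Signals are assumed to be pre-normalized to [0, 1]; missing ones count as 0.
    signals: dict[str, float] = field(default_factory=dict)


def score(paper: Paper) -> float:
    """Weighted sum of the paper's signals."""
    return sum(w * paper.signals.get(name, 0.0) for name, w in WEIGHTS.items())


def select_per_domain(papers, k_min: int = 2, k_max: int = 3,
                      bonus_threshold: float = 0.6) -> dict[str, list[Paper]]:
    """Keep the top k_min papers per domain, plus extras up to k_max above the threshold."""
    picked: dict[str, list[Paper]] = {}
    for domain in DOMAINS:
        ranked = sorted((p for p in papers if p.domain == domain), key=score, reverse=True)
        extras = [p for p in ranked[k_min:k_max] if score(p) >= bonus_threshold]
        picked[domain] = ranked[:k_min] + extras
    return picked
```

Making the third slot conditional is one reading of "2-3 papers per domain". A fixed `k_max` cut-off is the simpler alternative if every domain should always get three papers when enough candidates exist.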
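The LLM enrichment step uses a local Ollama model. Ollama serves a REST endpoint at `http://localhost:11434/api/generate` by default. A non-streaming request takes `model`, `prompt` and `stream` fields and returns JSON whose `response` field holds the generated text. The sketch below shows what one translation request might look like. The prompt wording and the function names are hypothetical. It takes a single model name, such as the `qwen3.5:27b` value from `INSIGHT_MODELS`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_translate_request(model: str, abstract: str,
                            url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request (hypothetical prompt wording)."""
    prompt = ("Translate the following arXiv abstract into Chinese, "
              "then give a one-sentence note on its practical value.\n\n" + abstract)
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enrich(model: str, abstract: str, timeout: float = 300.0) -> str:
    """Send the request and return the model's text from the `response` field."""
    req = build_translate_request(model, abstract)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With `"stream": False` the server returns one JSON object instead of a stream of partial chunks, which keeps the client to a single `json.loads`. The generous timeout reflects that a 27B model on local hardware can take a while per abstract.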
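Each run writes a dated bundle, so the "Check output" step can also be done programmatically. The sketch below assumes only what the diff states: a `YYYY-MM-DD` subdirectory under the output root, containing the six listed files. `bundle_dir` and `missing_files` are hypothetical helpers, not part of `scripts/`.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path

# File names taken from the output file list in the diff.
EXPECTED_FILES = (
    "candidates.json",
    "selected.json",
    "enriched.json",
    "robotdaily.html",
    "robotdaily.md",
    "manifest.json",
)


def bundle_dir(output_root: str | Path, day: date) -> Path:
    """Return <output_root>/YYYY-MM-DD; date.isoformat() yields exactly that format."""
    return Path(output_root) / day.isoformat()


def missing_files(output_root: str | Path, day: date) -> list[str]:
    """List the expected bundle files that do not exist for the given day."""
    directory = bundle_dir(output_root, day)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]
```

For example, `missing_files(os.environ["ROBOTDAILY_OUTPUT_DIR"], date.today())` returns an empty list after a complete run. Like `date +%Y-%m-%d`, `date.today()` uses the host's local date.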
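The Hugo step writes each digest to `site/content/ai-daily/YYYY-MM-DD.md` under `HUGO_CONTENT_DIR`. This diff does not show how `run_daily.py` actually converts the Markdown. The sketch below assumes the simplest approach: copy `robotdaily.md` and prepend YAML front matter with `title` and `date`, which are standard Hugo front-matter keys. The function name and the title format are hypothetical.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def sync_to_hugo(digest_md: Path, hugo_content_dir: Path, day: date) -> Path:
    """Copy a Markdown digest into Hugo content as <hugo_content_dir>/YYYY-MM-DD.md.

    Hypothetical conversion: prepend YAML front matter; the body is copied unchanged.
    """
    stamp = day.isoformat()
    front_matter = f'---\ntitle: "RobotDaily {stamp}"\ndate: {stamp}\n---\n\n'
    hugo_content_dir.mkdir(parents=True, exist_ok=True)
    target = hugo_content_dir / f"{stamp}.md"
    body = digest_md.read_text(encoding="utf-8")
    target.write_text(front_matter + body, encoding="utf-8")
    return target
```

Naming the file after the date means a second run on the same day overwrites the page instead of creating a duplicate. It also keeps daily briefs in their own `ai-daily` section, apart from other blog content.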
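The `arxiv-digest/.env` file consists of plain `KEY=VALUE` lines with `#` comments. This diff does not say whether the scripts load it with a library such as python-dotenv or with their own code. The stdlib-only parser below is a minimal sketch. It assumes values contain no inline comments or escape sequences.

```python
from __future__ import annotations


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line; ignore it
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env
```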
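The four `DISCORD_DELIVERY_MODE` values need different settings. The mapping below is inferred from the removed mode descriptions and the variable notes in this diff. For example, `DISCORD_BOT_TOKEN` is needed when a missing channel must be created. Treat the mapping as an assumption, not the scripts' real validation logic. In particular, tying `thread` mode to `DISCORD_PARENT_CHANNEL_ID` is a guess. `fixed-channel` conservatively requires the token because it creates the channel if missing.

```python
from __future__ import annotations

# Assumed per-mode requirements, inferred from the mode descriptions in this diff.
REQUIRED_BY_MODE = {
    "thread": ("DISCORD_PARENT_CHANNEL_ID",),  # guess: the channel the thread is posted under
    "channel": ("DISCORD_GUILD_ID", "DISCORD_BOT_TOKEN"),  # creates a dated channel via REST
    "fixed-channel": ("DISCORD_GUILD_ID", "DISCORD_TARGET_CHANNEL_NAME", "DISCORD_BOT_TOKEN"),
    "existing-channel": ("DISCORD_TARGET_CHANNEL_ID",),
}

# Placeholder values seen in the docs that should count as "not configured".
PLACEHOLDERS = {"", "...", "your-bot-token-here"}


def check_discord_config(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    mode = env.get("DISCORD_DELIVERY_MODE", "")
    if mode not in REQUIRED_BY_MODE:
        return [f"unknown DISCORD_DELIVERY_MODE: {mode!r}"]
    problems = []
    for key in REQUIRED_BY_MODE[mode]:
        if env.get(key, "").strip() in PLACEHOLDERS:
            problems.append(f"{mode} mode needs {key}")
    return problems
```

Running a check like this before publishing turns a half-configured `.env` into a clear error message, instead of a failed Discord call at 10:30.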
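The cron entry `30 10 * * *` fires at 10:30 in the host's local timezone. That matches the stated Asia/Shanghai schedule only if the host runs on that zone. Some cron implementations, such as cronie, accept a `CRON_TZ=Asia/Shanghai` line. Where that is not available, the hour field has to be shifted by hand. The sketch below (function name hypothetical) converts a Shanghai run time into cron minute/hour fields for a host with a fixed UTC offset. Asia/Shanghai is UTC+8 with no daylight saving time, so a fixed offset is exact for it. For a host zone that observes DST, the result holds only for part of the year.

```python
from __future__ import annotations

from datetime import datetime, time, timedelta, timezone

# Asia/Shanghai (China Standard Time) is a fixed UTC+8 with no DST.
SHANGHAI = timezone(timedelta(hours=8))


def host_cron_fields(host_utc_offset_hours: float,
                     run_at: time = time(10, 30)) -> tuple[int, int]:
    """Return (minute, hour) cron fields on the host for `run_at` Shanghai time."""
    host_tz = timezone(timedelta(hours=host_utc_offset_hours))
    # The calendar date is irrelevant because both offsets are fixed.
    shanghai_dt = datetime(2000, 1, 1, run_at.hour, run_at.minute, tzinfo=SHANGHAI)
    host_dt = shanghai_dt.astimezone(host_tz)
    return host_dt.minute, host_dt.hour
```

For example, on a UTC host `host_cron_fields(0)` gives `(30, 2)`, so the entry becomes `30 2 * * *`.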