
feat: ship RobotDaily arxiv digest pipeline

Daily Deploy Bot, 1 week ago
Commit c47908f5b6

+ 7 - 0
.gitignore

@@ -0,0 +1,7 @@
+node_modules/
+__pycache__/
+*.pyc
+arxiv-digest/output*/
+arxiv-digest/logs/
+.env
+arxiv-digest/.env

+ 10 - 0
arxiv-digest/.env.example

@@ -0,0 +1,10 @@
+OPENCLAW_BIN=/home/zhn/.nvm/versions/node/v22.22.0/bin/openclaw
+INSIGHT_MODELS=glm-4.7:cloud,qwen3.5:cloud,qwen3.5:27b,glm-4.7-flash-64k:latest
+ROBOTDAILY_OUTPUT_DIR=/home/zhn/.openclaw/workspace/skills/robdaily/arxiv-digest/output
+DISCORD_DELIVERY_MODE=fixed-channel
+DISCORD_ACCOUNT_ID=codex
+DISCORD_GUILD_ID=<guild id>
+DISCORD_TARGET_CHANNEL_NAME=robotdaily
+# DISCORD_CATEGORY_ID=<optional category id>
+# DISCORD_BOT_TOKEN=<required only when a missing channel must be created>
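+# 10080 minutes = 7 days (Discord's maximum thread auto-archive window)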
+DISCORD_THREAD_AUTO_ARCHIVE_MIN=10080

+ 61 - 59
arxiv-digest/SKILL.md

@@ -1,86 +1,88 @@
 ---
 name: arxiv-digest
-description: 'Daily ArXiv Paper Digest. Use when: (1) Searching latest papers from arXiv RSS feeds, (2) Filtering papers in embodied AI/representation learning/RL, (3) Translating abstracts to Chinese, (4) Formatting Telegram-friendly paper cards, (5) Scheduling daily 10:30 AM digest delivery'
+description: 'Daily arXiv digest generation for embodied intelligence, representation learning, and reinforcement learning. Use when Codex needs to: (1) fetch recent papers from arXiv, (2) rank them with an applied-research bias, (3) pick 2-3 papers per domain, (4) translate abstracts into Chinese, add short explanations and tag keywords, (5) render mobile-friendly digest cards, or (6) publish the digest to Discord threads/channels on a schedule.'
 ---
 
-# ArXiv Daily Digest
+# arXiv Digest
 
-## 🎯 Purpose
+Use `scripts/run_daily.py` as the single entry point.
 
-Automatically curate and deliver daily research papers from ArXiv in:
-- **Embodied AI** (具身智能)
-- **Representation Learning** (表征学习)
-- **Reinforcement Learning** (强化学习)
+## Workflow
 
-## ⏰ Schedule
+1. Fetch recent arXiv papers with `scripts/fetch_arxiv.py`.
+2. Score papers for domain fit, applied value, innovation, and recency.
+3. Select 2-3 papers for each domain:
+   - 具身智能 (embodied intelligence)
+   - 表征学习 (representation learning)
+   - 强化学习 (reinforcement learning)
+4. Use the configured Ollama models (tried in order, first success wins) to produce:
+   - a Chinese translation of the abstract (中文摘要翻译)
+   - a short value-oriented explanation (简短价值解读)
+   - card tags (卡片标签)
+5. Render two outputs:
+   - mobile-friendly HTML digest with expandable cards
+   - Markdown archive for Discord / quick search
+6. Publish to Discord:
+   - `thread` mode: OpenClaw-native daily thread/forum post
+   - `channel` mode: create one dated text channel per day via Discord REST + OpenClaw posting
+   - `fixed-channel` mode: reuse one stable channel name such as `robotdaily`, and create it if missing
+   - `existing-channel` mode: reuse a fixed channel id (best for already-known target channels)
 
-- **Daily**: 10:30 AM (Asia/Shanghai)
-- **Cron**: `30 10 * * *`
-- **Delivery**: Telegram with HTML attachment
+## Run commands
 
-## 📋 Workflow
+Dry run without Discord:
 
-1. **Search**: Query arXiv RSS feeds (cs.RO, cs.AI, cs.LG)
-2. **Filter**: Keyword-based filtering (representation, learning, embodied, RL)
-3. **Select**: Top 5 most promising papers
-4. **Translate**: Abstracts to Chinese with brief explanations
-5. **Format**: Mobile-friendly HTML digest
-6. **Deliver**: Telegram message with attachment
-
-## 🗂️ Output
-
-- **Location**: `~/arxiv-digests/`
-- **Filename**: `arxiv-digest-YYYY-MM-DD.html`
-- **Format**: HTML optimized for mobile reading
-
-## 🤖 Cron Job
-
-```json
-{
-  "id": "a0511036-c75b-493d-bc43-3da9685faacf",
-  "name": "Daily ArXiv Digest with Attachment",
-  "schedule": "30 10 * * *",
-  "payload": {
-    "kind": "agentTurn",
-    "message": "Generate daily ArXiv digest..."
-  }
-}
+```bash
+python3 scripts/run_daily.py
 ```
 
-## 📦 Scripts
-
-### rss_arxiv_search.py
-Search arXiv using RSS feeds.
+Generate digest and publish to Discord:
 
 ```bash
-python3 scripts/rss_arxiv_search.py
+python3 scripts/run_daily.py --publish-discord
 ```
 
-### translate_abstract.py
-Translate abstracts to Chinese.
+Generate digest but skip LLM enrichment:
 
 ```bash
-echo '[papers_json]' | python3 scripts/translate_abstract.py
+python3 scripts/run_daily.py --skip-enrich
 ```
 
-### format_telegram_card.py
-Format papers as Telegram cards.
+## Config
 
-```bash
-echo '[processed_papers_json]' | python3 scripts/format_telegram_card.py
-```
+Read `references/selection-and-delivery.md` when you need to tune scoring or choose the Discord delivery mode.
 
-## 🔧 Configuration
+Common env vars in `arxiv-digest/.env`:
 
-- **Telegram User ID**: 5573886389
-- **Output Directory**: `~/arxiv-digests/`
-- **Template**: Use HTML template for mobile reading
+- `INSIGHT_MODELS=glm-4.7:cloud,qwen3.5:cloud,qwen3.5:27b,glm-4.7-flash-64k:latest`
+- `ROBOTDAILY_OUTPUT_DIR=/path/to/output`
+- `DISCORD_DELIVERY_MODE=thread|channel|fixed-channel|existing-channel`
+- `DISCORD_ACCOUNT_ID=codex`
+- `DISCORD_GUILD_ID=...`
+- `DISCORD_PARENT_CHANNEL_ID=...`
+- `DISCORD_TARGET_CHANNEL_ID=...`
+- `DISCORD_TARGET_CHANNEL_NAME=robotdaily`
+- `DISCORD_CATEGORY_ID=...`
+- `DISCORD_BOT_TOKEN=...` (needed when a missing channel must be created)
+- `DISCORD_THREAD_AUTO_ARCHIVE_MIN=10080`
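+
+For orientation, `scripts/run_daily.py` reads these with `load_env()` and hands them straight to the publisher; the snippet below is condensed from that script:
+
+```python
+from publish_discord import DiscordPublisher
+from utils import load_env
+
+env = load_env()
+publisher = DiscordPublisher(
+    openclaw_bin=env.get("OPENCLAW_BIN", "openclaw"),
+    account_id=env.get("DISCORD_ACCOUNT_ID", "codex"),
+    mode=env.get("DISCORD_DELIVERY_MODE", "thread"),
+    guild_id=env.get("DISCORD_GUILD_ID", ""),
+    parent_channel_id=env.get("DISCORD_PARENT_CHANNEL_ID", ""),
+    target_channel_id=env.get("DISCORD_TARGET_CHANNEL_ID", ""),
+    target_channel_name=env.get("DISCORD_TARGET_CHANNEL_NAME", ""),
+    category_id=env.get("DISCORD_CATEGORY_ID", ""),
+    bot_token=env.get("DISCORD_BOT_TOKEN", ""),
+    thread_auto_archive_min=int(env.get("DISCORD_THREAD_AUTO_ARCHIVE_MIN", "10080")),
+    dry_run=False,
+)
+```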
 
-## 📝 Message Template
+## Output
 
-```
-✨ 早安!你的专属 AI 论文简报新鲜出炉啦~ 📚
+Each run writes a dated bundle containing:
+
+- `candidates.json`
+- `selected.json`
+- `enriched.json`
+- `robotdaily.html`
+- `robotdaily.md`
+- `manifest.json`
+
+## Scheduling
+
+The pipeline is designed for a daily 10:30 run in Asia/Shanghai.
+
+A recommended cron entry:
 
-今日份的学术干货已打包完毕,包含 5 篇最新最热的 AI 前沿论文!
-快来看看今天有哪些有趣的发现吧~ 🤗
+```cron
+30 10 * * * cd /path/to/robdaily/arxiv-digest && python3 scripts/run_daily.py --publish-discord >> logs/robotdaily.log 2>&1
 ```

+ 237 - 0
arxiv-digest/assets/mobile_digest_template.html

@@ -0,0 +1,237 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+  <meta charset="UTF-8" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0, viewport-fit=cover" />
+  <title>RobotDaily {{date}}</title>
+  <style>
+    :root {
+      color-scheme: light;
+      --bg: #f4f6fb;
+      --card: #ffffff;
+      --text: #142033;
+      --muted: #5e6c84;
+      --line: #d7ddea;
+      --brand: #4f46e5;
+      --brand-soft: #eef0ff;
+      --accent: #0f766e;
+      --accent-soft: #ecfdf5;
+      --warn: #b45309;
+      --warn-soft: #fff7ed;
+      --shadow: 0 10px 28px rgba(20, 32, 51, 0.08);
+      --radius: 18px;
+    }
+
+    * { box-sizing: border-box; }
+    html, body { margin: 0; padding: 0; }
+    body {
+      font-family: Inter, -apple-system, BlinkMacSystemFont, "Segoe UI", "PingFang SC", sans-serif;
+      background:
+        radial-gradient(circle at top right, rgba(79, 70, 229, 0.10), transparent 30%),
+        radial-gradient(circle at top left, rgba(15, 118, 110, 0.12), transparent 28%),
+        var(--bg);
+      color: var(--text);
+    }
+    a { color: inherit; }
+    .page {
+      max-width: 980px;
+      margin: 0 auto;
+      padding: 18px 14px 40px;
+    }
+    .hero {
+      background: linear-gradient(135deg, #312e81 0%, #4f46e5 44%, #0f766e 100%);
+      color: #fff;
+      border-radius: 24px;
+      padding: 20px 18px;
+      box-shadow: 0 16px 36px rgba(49, 46, 129, 0.22);
+    }
+    .hero h1 {
+      margin: 0 0 8px;
+      font-size: 24px;
+      line-height: 1.2;
+    }
+    .hero p {
+      margin: 0;
+      color: rgba(255,255,255,0.9);
+      line-height: 1.55;
+      font-size: 14px;
+    }
+    .nav {
+      display: flex;
+      gap: 10px;
+      flex-wrap: wrap;
+      margin-top: 14px;
+    }
+    .nav a {
+      text-decoration: none;
+      color: #fff;
+      background: rgba(255,255,255,0.15);
+      padding: 8px 12px;
+      border-radius: 999px;
+      font-size: 13px;
+      backdrop-filter: blur(6px);
+    }
+    .section {
+      margin-top: 18px;
+    }
+    .section-header {
+      display: flex;
+      align-items: center;
+      justify-content: space-between;
+      gap: 12px;
+      margin: 0 0 10px;
+    }
+    .section-header h2 {
+      margin: 0;
+      font-size: 18px;
+    }
+    .section-header .count {
+      color: var(--muted);
+      font-size: 13px;
+    }
+    .cards {
+      display: grid;
+      gap: 12px;
+    }
+    .paper-card {
+      background: var(--card);
+      border-radius: var(--radius);
+      border: 1px solid rgba(215, 221, 234, 0.9);
+      box-shadow: var(--shadow);
+      overflow: hidden;
+    }
+    .paper-card summary {
+      list-style: none;
+      cursor: pointer;
+      padding: 16px;
+      display: block;
+    }
+    .paper-card summary::-webkit-details-marker { display: none; }
+    .paper-card[open] summary {
+      border-bottom: 1px solid var(--line);
+      background: #fbfcff;
+    }
+    .meta-row {
+      display: flex;
+      flex-wrap: wrap;
+      gap: 8px;
+      align-items: center;
+      margin-bottom: 10px;
+    }
+    .pill {
+      display: inline-flex;
+      align-items: center;
+      gap: 6px;
+      border-radius: 999px;
+      padding: 5px 10px;
+      font-size: 12px;
+      line-height: 1;
+      white-space: nowrap;
+    }
+    .pill.domain {
+      color: var(--brand);
+      background: var(--brand-soft);
+      font-weight: 700;
+    }
+    .pill.score {
+      color: var(--accent);
+      background: var(--accent-soft);
+    }
+    .pill.date {
+      color: var(--warn);
+      background: var(--warn-soft);
+    }
+    .paper-title {
+      margin: 0;
+      font-size: 17px;
+      line-height: 1.4;
+      word-break: break-word;
+    }
+    .teaser {
+      margin: 10px 0 0;
+      font-size: 14px;
+      line-height: 1.7;
+      color: #23304a;
+    }
+    .tag-row {
+      display: flex;
+      flex-wrap: wrap;
+      gap: 8px;
+      margin-top: 12px;
+    }
+    .tag {
+      border-radius: 999px;
+      padding: 6px 10px;
+      font-size: 12px;
+      color: #334155;
+      background: #f1f5f9;
+      border: 1px solid #e2e8f0;
+    }
+    .paper-body {
+      padding: 16px;
+      display: grid;
+      gap: 12px;
+    }
+    .info {
+      font-size: 13px;
+      color: var(--muted);
+      line-height: 1.6;
+    }
+    .block {
+      background: #f8fafc;
+      border-radius: 14px;
+      padding: 13px 14px;
+    }
+    .block h3 {
+      margin: 0 0 8px;
+      font-size: 14px;
+    }
+    .block p {
+      margin: 0;
+      font-size: 14px;
+      line-height: 1.72;
+      color: #1f2937;
+      white-space: pre-wrap;
+      word-break: break-word;
+    }
+    .links {
+      display: flex;
+      flex-wrap: wrap;
+      gap: 10px;
+    }
+    .link-btn {
+      text-decoration: none;
+      color: var(--brand);
+      border: 1px solid rgba(79, 70, 229, 0.22);
+      background: rgba(79, 70, 229, 0.06);
+      border-radius: 12px;
+      padding: 10px 12px;
+      font-size: 13px;
+      font-weight: 600;
+    }
+    .footer {
+      margin-top: 22px;
+      color: var(--muted);
+      font-size: 13px;
+      text-align: center;
+    }
+    @media (min-width: 860px) {
+      .cards {
+        grid-template-columns: repeat(2, minmax(0, 1fr));
+      }
+      .hero h1 { font-size: 28px; }
+    }
+  </style>
+</head>
+<body>
+  <main class="page">
+    <section class="hero">
+      <h1>RobotDaily 晨读</h1>
+      <p>{{intro}}</p>
+      <div class="nav">{{nav}}</div>
+    </section>
+    {{sections}}
+    <p class="footer">生成时间:{{generated_at}} · DOI 默认优先展示官方 DOI;若原文未提供,则回退为 arXiv 的 DataCite DOI。</p>
+  </main>
+</body>
+</html>

+ 82 - 0
arxiv-digest/references/selection-and-delivery.md

@@ -0,0 +1,82 @@
+# RobotDaily Selection & Delivery Notes
+
+## Selection bias
+
+RobotDaily is intentionally biased toward papers that are:
+
+- recent (default 2-day lookback, fallback 4-day lookback)
+- practical or system-oriented
+- likely to matter for real robots / real deployment / data engines / robust representation learning
+- not pure survey or tutorial content
+
+## Domain buckets
+
+### 具身智能
+
+Prefer papers mentioning robot, embodied, humanoid, manipulation, navigation, locomotion, grasping, sim2real, or real-robot evaluation.
+
+### 表征学习
+
+Prefer papers mentioning representation learning, latent space, embeddings, world models, self-supervised learning, object-centric features, or pretraining that supports downstream control.
+
+### 强化学习
+
+Prefer papers mentioning reinforcement learning, offline RL, policy optimization, reward design, imitation learning, decision making, or control policies.
+
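+Bucket matching is plain substring scoring over title, abstract, and comment text. A minimal sketch of the idea, simplified from `score_terms` in `scripts/fetch_arxiv.py` (the real function also returns the matched terms):
+
+```python
+def score_terms(text: str, weights: dict[str, float]) -> float:
+    """Sum the weight of every keyword that appears in the lowercased text."""
+    lowered = text.lower()
+    return sum(weight for term, weight in weights.items() if term in lowered)
+
+# Example: an abstract mentioning "robot" (2.5) and "sim2real" (1.6)
+# contributes 4.1 keyword points toward the 具身智能 bucket.
+```
+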
+## Delivery modes
+
+### thread
+
+- Uses OpenClaw CLI only.
+- Creates one new Discord thread/forum post per day.
+- Best option when you already have OpenClaw Discord configured and want a thread-style mobile reading flow.
+
+Required env:
+
+- `DISCORD_DELIVERY_MODE=thread`
+- `DISCORD_ACCOUNT_ID=codex` (or the configured account id)
+- `DISCORD_GUILD_ID=<guild id>`
+- `DISCORD_PARENT_CHANNEL_ID=<forum or text channel id>`
+
+### channel
+
+- Creates one new dated text channel per day.
+- Current OpenClaw build exposes channel-create in capabilities, but not as a public CLI subcommand.
+- This mode therefore uses raw Discord REST for channel creation, then OpenClaw CLI for posting.
+
+Required env:
+
+- `DISCORD_DELIVERY_MODE=channel`
+- `DISCORD_ACCOUNT_ID=codex`
+- `DISCORD_GUILD_ID=<guild id>`
+- `DISCORD_TARGET_CHANNEL_NAME=robotdaily` (optional prefix; defaults to `robotdaily`)
+- `DISCORD_CATEGORY_ID=<category id>` (optional but recommended)
+- `DISCORD_BOT_TOKEN=<bot token>`
+
+### fixed-channel (best match for `RobotDaily`)
+
+- Reuses one stable channel name such as `robotdaily`.
+- If the channel already exists, it posts there directly.
+- If the channel is missing, it creates it through Discord REST and then posts into it.
+
+Required env:
+
+- `DISCORD_DELIVERY_MODE=fixed-channel`
+- `DISCORD_ACCOUNT_ID=codex`
+- `DISCORD_GUILD_ID=<guild id>`
+- `DISCORD_TARGET_CHANNEL_NAME=robotdaily`
+- `DISCORD_BOT_TOKEN=<bot token>` only when the channel may need to be created
+
+### existing-channel
+
+- Reuses one fixed channel id.
+- Best option when the target channel already exists and you know its id.
+
+Required env:
+
+- `DISCORD_DELIVERY_MODE=existing-channel`
+- `DISCORD_TARGET_CHANNEL_ID=<channel id>`
+
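+How the mode maps to a post target, condensed from `create_or_resolve_target` in `scripts/publish_discord.py` (env validation and channel topics omitted):
+
+```python
+def resolve_target(publisher, date_slug: str) -> str:
+    # Condensed dispatch over DISCORD_DELIVERY_MODE.
+    if publisher.mode == "existing-channel":
+        return publisher.target_channel_id                    # reuse a known id
+    if publisher.mode == "fixed-channel":
+        return publisher.create_fixed_channel("RobotDaily")   # find or create by name
+    if publisher.mode == "thread":
+        return publisher.create_thread(f"RobotDaily {date_slug}", "opening message")
+    if publisher.mode == "channel":
+        return publisher.create_channel_via_rest(f"robotdaily-{date_slug}")
+    raise ValueError(f"unknown delivery mode: {publisher.mode}")
+```
+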
+## Entry point
+
+Use `scripts/run_daily.py` as the only orchestration entry point.

+ 115 - 0
arxiv-digest/scripts/enrich_papers.py

@@ -0,0 +1,115 @@
+#!/usr/bin/env python3
+"""Translate abstracts, generate tags, and produce short explanations."""
+
+from __future__ import annotations
+
+import argparse
+import json
+from typing import Any, Dict, List
+
+from fetch_arxiv import DOMAIN_CONFIGS
+from utils import log, normalize_space, ollama_generate_json, read_json, truncate, write_json
+
+FALLBACK_TAGS = {
+    "embodied": ["具身智能", "机器人", "真实部署", "操控", "导航"],
+    "representation": ["表征学习", "潜在空间", "世界模型", "预训练", "对象中心"],
+    "reinforcement": ["强化学习", "策略优化", "奖励设计", "离线RL", "模仿学习"],
+}
+
+
+def build_prompt(paper: Dict[str, Any]) -> str:
+    domain_label = DOMAIN_CONFIGS[paper["domain"]]["label_zh"]
+    return f"""
+你是 RobotDaily 的论文晨报编辑。请根据给定的英文标题与英文摘要,输出严格 JSON。
+
+只输出一个 JSON 对象,结构如下:
+{{
+  "translated_abstract_zh": "...",
+  "brief_explanation_zh": "...",
+  "tags": ["标签1", "标签2", "标签3", "标签4", "标签5"]
+}}
+
+要求:
+1. translated_abstract_zh:忠实翻译原摘要,不要增加原文没有的实验结果;控制在 180-320 个中文字符。
+2. brief_explanation_zh:40-90 个中文字符,说明为什么值得读,尽量偏应用价值和创新点。
+3. tags:给 4-6 个适合直接贴在移动端卡片上的简短标签;尽量用中文,必要时保留通用英文术语,如 World Model、Offline RL。
+4. 语气务实、技术导向,不要夸张。
+5. 不要输出 Markdown,不要输出代码块。
+
+领域:{domain_label}
+标题:{paper['title']}
+英文摘要:{paper['summary']}
+""".strip()
+
+
+def fallback_enrichment(paper: Dict[str, Any]) -> Dict[str, Any]:
+    tags = FALLBACK_TAGS.get(paper["domain"], ["AI论文", "机器学习", "应用研究"])
+    matched = paper.get("matched_applied_terms", [])[:2] + paper.get("matched_innovation_terms", [])[:2]
+    reason = paper.get("selection_reason", "偏应用且具创新性")
+    if matched:
+        reason = f"关键词命中 {', '.join(matched)},{reason}"
+    return {
+        "translated_abstract_zh": f"【LLM 暂不可用,先保留英文摘要要点】{truncate(paper.get('summary', ''), 220)}",
+        "brief_explanation_zh": truncate(reason, 86),
+        "tags": tags[:5],
+    }
+
+
+def enrich_paper(paper: Dict[str, Any], model_names: List[str]) -> Dict[str, Any]:
+    prompt = build_prompt(paper)
+    result = None
+    for model in model_names:
+        model = normalize_space(model)
+        if not model:
+            continue
+        log(f"Enriching {paper['arxiv_id']} with {model}")
+        result = ollama_generate_json(prompt, model=model, timeout=150)
+        if result:
+            break
+
+    enriched = dict(paper)
+    fallback = fallback_enrichment(paper)
+    payload = result or fallback
+    tags = [normalize_space(tag).lstrip("#") for tag in payload.get("tags", []) if normalize_space(tag)]
+    if not tags:
+        tags = FALLBACK_TAGS.get(paper["domain"], [])[:5]
+
+    enriched["translated_abstract_zh"] = normalize_space(payload.get("translated_abstract_zh", "")) or fallback["translated_abstract_zh"]
+    enriched["brief_explanation_zh"] = normalize_space(payload.get("brief_explanation_zh", "")) or fallback["brief_explanation_zh"]
+    enriched["tags"] = tags[:6]
+    return enriched
+
+
+def enrich_selection(selection_payload: Dict[str, Any], model_names: List[str]) -> Dict[str, Any]:
+    papers = selection_payload.get("papers", [])
+    enriched_papers = [enrich_paper(paper, model_names=model_names) for paper in papers]
+
+    by_domain: Dict[str, List[Dict[str, Any]]] = {domain: [] for domain in selection_payload.get("selected_by_domain", {})}
+    for paper in enriched_papers:
+        by_domain.setdefault(paper["domain"], []).append(paper)
+
+    output = dict(selection_payload)
+    output["papers"] = enriched_papers
+    output["selected_by_domain"] = by_domain
+    output["models_used"] = model_names
+    return output
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Enrich RobotDaily papers with zh translation and tags")
+    parser.add_argument("--input", required=True)
+    parser.add_argument("--output", default="")
+    parser.add_argument("--models", default="glm-4.7:cloud,qwen3.5:cloud,qwen3.5:27b,glm-4.7-flash-64k:latest")
+    args = parser.parse_args()
+
+    payload = read_json(args.input, default={}) or {}
+    models = [item.strip() for item in args.models.split(",") if item.strip()]
+    enriched = enrich_selection(payload, model_names=models)
+
+    if args.output:
+        write_json(args.output, enriched)
+    else:
+        print(json.dumps(enriched, ensure_ascii=False, indent=2))
+
+
+if __name__ == "__main__":
+    main()

+ 377 - 0
arxiv-digest/scripts/fetch_arxiv.py

@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+"""Fetch recent arXiv papers for RobotDaily domains."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import xml.etree.ElementTree as ET
+from datetime import datetime, timedelta, timezone
+from typing import Any, Dict, List
+from urllib.parse import urlencode
+from urllib.request import urlopen
+
+from utils import (
+    LOCAL_TZ,
+    build_arxiv_urls,
+    canonical_arxiv_id,
+    canonical_doi,
+    canonical_doi_url,
+    log,
+    normalize_space,
+    now_local,
+    write_json,
+)
+
+ATOM_NS = {"atom": "http://www.w3.org/2005/Atom", "arxiv": "http://arxiv.org/schemas/atom"}
+API_URL = "https://export.arxiv.org/api/query"
+
+DOMAIN_CONFIGS: Dict[str, Dict[str, Any]] = {
+    "embodied": {
+        "label_zh": "具身智能",
+        "query": "(cat:cs.RO OR cat:cs.AI OR cat:cs.LG OR cat:cs.CV) AND (all:robot OR all:embodied OR all:humanoid OR all:manipulation OR all:navigation OR all:locomotion OR all:grasp)",
+        "categories": {"cs.RO": 3.0, "cs.AI": 1.2, "cs.LG": 0.8, "cs.CV": 0.5},
+        "keywords": {
+            "embodied": 2.5,
+            "robot": 2.5,
+            "robotics": 2.0,
+            "humanoid": 2.0,
+            "manipulation": 2.0,
+            "navigation": 1.8,
+            "locomotion": 1.8,
+            "grasp": 1.8,
+            "grasping": 1.8,
+            "sim2real": 1.6,
+            "physical": 1.0,
+            "contact-rich": 1.2,
+            "real robot": 2.0,
+        },
+    },
+    "representation": {
+        "label_zh": "表征学习",
+        "query": "(cat:cs.LG OR cat:cs.CV OR cat:cs.AI OR cat:cs.RO) AND (all:\"representation learning\" OR all:representation OR all:latent OR all:embedding OR all:\"world model\" OR all:\"self-supervised\")",
+        "categories": {"cs.LG": 2.5, "cs.CV": 1.2, "cs.AI": 1.0, "cs.RO": 0.8},
+        "keywords": {
+            "representation": 2.5,
+            "representations": 2.5,
+            "latent": 2.0,
+            "embedding": 2.0,
+            "feature": 1.3,
+            "state space": 1.4,
+            "world model": 2.0,
+            "self-supervised": 1.8,
+            "pretraining": 1.2,
+            "tokenizer": 1.0,
+            "object-centric": 1.4,
+        },
+    },
+    "reinforcement": {
+        "label_zh": "强化学习",
+        "query": "(cat:cs.LG OR cat:cs.AI OR cat:cs.RO) AND (all:\"reinforcement learning\" OR all:\"offline reinforcement learning\" OR all:\"offline rl\" OR all:\"imitation learning\" OR all:\"policy optimization\" OR all:\"multi-agent reinforcement learning\")",
+        "categories": {"cs.LG": 2.0, "cs.AI": 1.8, "cs.RO": 1.0},
+        "keywords": {
+            "reinforcement learning": 2.8,
+            "offline reinforcement learning": 2.6,
+            "offline rl": 2.4,
+            "policy optimization": 2.0,
+            "policy gradient": 1.8,
+            "actor-critic": 1.8,
+            "imitation learning": 2.0,
+            "multi-agent reinforcement learning": 2.2,
+            "decision-making": 1.4,
+            "control": 0.8,
+            "trajectory": 0.8,
+            "q-learning": 1.8,
+        },
+    },
+}
+
+APPLIED_KEYWORDS = {
+    "real-world": 2.2,
+    "real world": 2.2,
+    "deployment": 2.0,
+    "deployed": 1.6,
+    "robot": 1.5,
+    "robotic": 1.4,
+    "system": 1.0,
+    "benchmark": 0.9,
+    "dataset": 0.9,
+    "controller": 1.0,
+    "hardware": 1.4,
+    "field": 1.2,
+    "navigation": 1.2,
+    "manipulation": 1.2,
+    "autonomous": 1.2,
+    "assistive": 1.4,
+    "human-robot": 1.6,
+    "sim2real": 1.8,
+    "simulation-to-real": 1.8,
+    "real robot": 2.0,
+    "open-world": 1.2,
+}
+
+INNOVATION_KEYWORDS = {
+    "foundation model": 2.2,
+    "world model": 1.8,
+    "unified": 1.3,
+    "generalist": 1.5,
+    "scalable": 1.2,
+    "multimodal": 1.2,
+    "diffusion": 1.2,
+    "cross-embodiment": 1.8,
+    "self-supervised": 1.1,
+    "zero-shot": 1.2,
+    "few-shot": 1.0,
+    "novel": 0.8,
+    "first": 0.8,
+    "new benchmark": 1.0,
+    "data engine": 1.4,
+    "reasoning": 1.0,
+}
+
+NEGATIVE_KEYWORDS = {
+    "survey": -2.4,
+    "review": -2.1,
+    "tutorial": -2.4,
+    "perspective": -1.6,
+}
+
+
+def build_date_clause(lookback_days: int) -> str:
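+    # arXiv's API filters on submittedDate:[START TO END] with UTC timestamps in YYYYMMDDHHMM form.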
+    now = now_local()
+    start_local = (now - timedelta(days=max(lookback_days, 1) - 1)).replace(hour=0, minute=0, second=0, microsecond=0)
+    end_local = now.replace(hour=23, minute=59, second=0, microsecond=0)
+    start_utc = start_local.astimezone(timezone.utc)
+    end_utc = end_local.astimezone(timezone.utc)
+    return f"submittedDate:[{start_utc.strftime('%Y%m%d%H%M')} TO {end_utc.strftime('%Y%m%d%H%M')}]"
+
+
+def build_query(domain: str, lookback_days: int, with_date: bool = True) -> str:
+    base = DOMAIN_CONFIGS[domain]["query"]
+    if not with_date:
+        return base
+    return f"({base}) AND {build_date_clause(lookback_days)}"
+
+
+def request_feed(query: str, start: int, max_results: int) -> str:
+    params = urlencode(
+        {
+            "search_query": query,
+            "sortBy": "submittedDate",
+            "sortOrder": "descending",
+            "start": start,
+            "max_results": max_results,
+        }
+    )
+    url = f"{API_URL}?{params}"
+    with urlopen(url, timeout=45) as response:
+        return response.read().decode("utf-8", errors="ignore")
+
+
+def parse_entry(entry: ET.Element, query_domain: str) -> Dict[str, Any]:
+    raw_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
+    arxiv_id = canonical_arxiv_id(raw_id)
+    title = normalize_space(entry.findtext("atom:title", default="", namespaces=ATOM_NS))
+    summary = normalize_space(entry.findtext("atom:summary", default="", namespaces=ATOM_NS))
+    published = normalize_space(entry.findtext("atom:published", default="", namespaces=ATOM_NS))
+    updated = normalize_space(entry.findtext("atom:updated", default="", namespaces=ATOM_NS))
+    comment = normalize_space(entry.findtext("arxiv:comment", default="", namespaces=ATOM_NS))
+    journal_ref = normalize_space(entry.findtext("arxiv:journal_ref", default="", namespaces=ATOM_NS))
+    doi = canonical_doi(arxiv_id, entry.findtext("arxiv:doi", default="", namespaces=ATOM_NS))
+
+    authors = [normalize_space(author.findtext("atom:name", default="", namespaces=ATOM_NS)) for author in entry.findall("atom:author", ATOM_NS)]
+    authors = [author for author in authors if author]
+
+    categories = [cat.attrib.get("term", "") for cat in entry.findall("atom:category", ATOM_NS) if cat.attrib.get("term")]
+    primary_category = normalize_space(entry.findtext("arxiv:primary_category", default="", namespaces=ATOM_NS))
+    if not primary_category:
+        primary_el = entry.find("arxiv:primary_category", ATOM_NS)
+        primary_category = primary_el.attrib.get("term", "") if primary_el is not None else ""
+
+    abs_url = ""
+    pdf_url = ""
+    for link in entry.findall("atom:link", ATOM_NS):
+        href = link.attrib.get("href", "")
+        title_attr = link.attrib.get("title", "")
+        rel = link.attrib.get("rel", "")
+        if title_attr == "pdf" or link.attrib.get("type") == "application/pdf":
+            pdf_url = href
+        elif rel == "alternate" and href:
+            abs_url = href
+
+    if not abs_url or not pdf_url:
+        urls = build_arxiv_urls(arxiv_id)
+        abs_url = abs_url or urls["abs_url"]
+        pdf_url = pdf_url or urls["pdf_url"]
+
+    published_dt = datetime.fromisoformat(published.replace("Z", "+00:00")) if published else None
+    updated_dt = datetime.fromisoformat(updated.replace("Z", "+00:00")) if updated else None
+
+    paper = {
+        "arxiv_id": arxiv_id,
+        "title": title,
+        "summary": summary,
+        "authors": authors,
+        "published": published,
+        "updated": updated,
+        "published_local": published_dt.astimezone(LOCAL_TZ).strftime("%Y-%m-%d %H:%M") if published_dt else "",
+        "updated_local": updated_dt.astimezone(LOCAL_TZ).strftime("%Y-%m-%d %H:%M") if updated_dt else "",
+        "abs_url": abs_url,
+        "pdf_url": pdf_url,
+        "doi": doi,
+        "doi_url": canonical_doi_url(arxiv_id, doi),
+        "comment": comment,
+        "journal_ref": journal_ref,
+        "categories": categories,
+        "primary_category": primary_category,
+        "query_domains": [query_domain],
+    }
+    paper.update(score_paper(paper))
+    return paper
+
+
+def score_terms(text: str, weights: Dict[str, float]) -> Dict[str, Any]:
+    matched: List[str] = []
+    score = 0.0
+    lowered = text.lower()
+    for term, weight in weights.items():
+        if term in lowered:
+            matched.append(term)
+            score += weight
+    return {"score": round(score, 3), "matched": matched}
+
+
+def score_domain_fit(paper: Dict[str, Any]) -> Dict[str, Any]:
+    text = f"{paper.get('title', '')} {paper.get('summary', '')} {paper.get('comment', '')}".lower()
+    domain_scores: Dict[str, float] = {}
+    domain_matches: Dict[str, List[str]] = {}
+
+    for domain, cfg in DOMAIN_CONFIGS.items():
+        keyword_result = score_terms(text, cfg["keywords"])
+        category_score = sum(cfg["categories"].get(category, 0.0) for category in paper.get("categories", []))
+        query_boost = 1.2 if domain in paper.get("query_domains", []) else 0.0
+        total = keyword_result["score"] + category_score + query_boost
+        domain_scores[domain] = round(total, 3)
+        domain_matches[domain] = keyword_result["matched"]
+
+    best_domain = max(domain_scores.items(), key=lambda item: item[1])[0]
+    return {
+        "domain": best_domain,
+        "domain_scores": domain_scores,
+        "domain_matches": domain_matches,
+        "score_domain_fit": round(domain_scores[best_domain], 3),
+    }
+
+
+def score_paper(paper: Dict[str, Any]) -> Dict[str, Any]:
+    text = f"{paper.get('title', '')} {paper.get('summary', '')} {paper.get('comment', '')} {paper.get('journal_ref', '')}".lower()
+    domain_result = score_domain_fit(paper)
+    applied_result = score_terms(text, APPLIED_KEYWORDS)
+    innovation_result = score_terms(text, INNOVATION_KEYWORDS)
+    negative_result = score_terms(text, NEGATIVE_KEYWORDS)
+
+    recency_score = 0.0
+    published = paper.get("published")
+    if published:
+        try:
+            published_dt = datetime.fromisoformat(published.replace("Z", "+00:00")).astimezone(LOCAL_TZ)
+            age_hours = max((now_local() - published_dt).total_seconds() / 3600.0, 0.0)
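+            # Linear freshness bonus: 1.5 for a brand-new paper, decaying to 0 at 36 hours old.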
+            recency_score = max(0.0, 1.5 - min(age_hours / 24.0, 1.5))
+        except Exception:
+            recency_score = 0.0
+
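+    # Weighted blend: domain fit (x1.35) and applied value (x1.25) dominate,
+    # survey/tutorial matches subtract via negative scores, recency adds up to 1.5.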
+    total_score = (
+        domain_result["score_domain_fit"] * 1.35
+        + applied_result["score"] * 1.25
+        + innovation_result["score"]
+        + negative_result["score"]
+        + recency_score
+    )
+
+    return {
+        **domain_result,
+        "score_applied": round(applied_result["score"], 3),
+        "score_innovation": round(innovation_result["score"], 3),
+        "score_recency": round(recency_score, 3),
+        "score_penalty": round(negative_result["score"], 3),
+        "score_total": round(total_score, 3),
+        "matched_applied_terms": applied_result["matched"],
+        "matched_innovation_terms": innovation_result["matched"],
+        "matched_negative_terms": negative_result["matched"],
+    }
+
+
+def merge_papers(existing: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Any]:
+    merged = dict(existing)
+    merged["query_domains"] = sorted(set(existing.get("query_domains", [])) | set(incoming.get("query_domains", [])))
+    merged["categories"] = sorted(set(existing.get("categories", [])) | set(incoming.get("categories", [])))
+    if incoming.get("comment") and len(incoming["comment"]) > len(existing.get("comment", "")):
+        merged["comment"] = incoming["comment"]
+    if incoming.get("journal_ref") and not existing.get("journal_ref"):
+        merged["journal_ref"] = incoming["journal_ref"]
+    rescored = score_paper(merged)
+    merged.update(rescored)
+    return merged
+
+
+def fetch_candidates(lookback_days: int = 2, max_results_per_domain: int = 40) -> List[Dict[str, Any]]:
+    papers_by_id: Dict[str, Dict[str, Any]] = {}
+
+    for domain in DOMAIN_CONFIGS:
+        query = build_query(domain, lookback_days, with_date=True)
+        log(f"Fetching {domain} candidates from arXiv")
+        feed = request_feed(query, start=0, max_results=max_results_per_domain)
+        root = ET.fromstring(feed)
+        entries = root.findall("atom:entry", ATOM_NS)
+
+        if not entries:
+            log(f"No dated results for {domain}; falling back to latest recent results without date filter")
+            query = build_query(domain, lookback_days, with_date=False)
+            feed = request_feed(query, start=0, max_results=max_results_per_domain)
+            root = ET.fromstring(feed)
+            entries = root.findall("atom:entry", ATOM_NS)
+
+        for entry in entries:
+            paper = parse_entry(entry, query_domain=domain)
+            arxiv_id = paper["arxiv_id"]
+            if not arxiv_id:
+                continue
+            if arxiv_id in papers_by_id:
+                papers_by_id[arxiv_id] = merge_papers(papers_by_id[arxiv_id], paper)
+            else:
+                papers_by_id[arxiv_id] = paper
+
+    papers = list(papers_by_id.values())
+    papers.sort(key=lambda item: (item.get("score_total", 0.0), item.get("published", "")), reverse=True)
+    return papers
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Fetch daily arXiv candidates for RobotDaily")
+    parser.add_argument("--lookback-days", type=int, default=2)
+    parser.add_argument("--max-results-per-domain", type=int, default=40)
+    parser.add_argument("--output", type=str, default="")
+    args = parser.parse_args()
+
+    papers = fetch_candidates(
+        lookback_days=args.lookback_days,
+        max_results_per_domain=args.max_results_per_domain,
+    )
+
+    payload = {
+        "generated_at": now_local().isoformat(),
+        "count": len(papers),
+        "papers": papers,
+    }
+
+    if args.output:
+        write_json(args.output, payload)
+        log(f"Saved {len(papers)} candidates to {args.output}")
+    else:
+        print(json.dumps(payload, ensure_ascii=False, indent=2))
+
+
+if __name__ == "__main__":
+    main()

+ 56 - 0
arxiv-digest/scripts/install_system_cron.py

@@ -0,0 +1,56 @@
+#!/usr/bin/env python3
+"""Install or remove a system cron entry for RobotDaily."""
+
+from __future__ import annotations
+
+import argparse
+import subprocess
+
+from utils import SKILL_DIR, log
+
+MARKER = "# robotdaily-arxiv-digest"
+
+
+def read_crontab() -> str:
+    result = subprocess.run(["crontab", "-l"], capture_output=True, text=True, check=False)
+    if result.returncode != 0:
+        return ""
+    return result.stdout
+
+
+def write_crontab(content: str) -> None:
+    result = subprocess.run(["crontab", "-"], input=content, text=True, check=False)
+    if result.returncode != 0:
+        raise SystemExit("Failed to write crontab")
+
+
+def build_line(python_bin: str) -> str:
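+    # The trailing MARKER sits after `#` on the command line, so the shell ignores it
+    # at runtime while install/--remove use it to find this managed entry.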
+    skill_dir = SKILL_DIR
+    log_dir = skill_dir / "logs"
+    log_file = log_dir / "robotdaily.log"
+    return (
+        f"30 10 * * * mkdir -p {log_dir} && cd {skill_dir} && "
+        f"{python_bin} scripts/run_daily.py --publish-discord >> {log_file} 2>&1 {MARKER}"
+    )
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Install RobotDaily daily cron")
+    parser.add_argument("--python-bin", default="python3")
+    parser.add_argument("--remove", action="store_true")
+    args = parser.parse_args()
+
+    existing_lines = [line for line in read_crontab().splitlines() if MARKER not in line]
+    if args.remove:
+        write_crontab("\n".join(existing_lines) + ("\n" if existing_lines else ""))
+        log("Removed RobotDaily cron entry")
+        return
+
+    existing_lines.append(build_line(args.python_bin))
+    write_crontab("\n".join(existing_lines) + "\n")
+    log("Installed RobotDaily cron entry for 10:30 Asia/Shanghai")
+
+
+if __name__ == "__main__":
+    main()

+ 318 - 0
arxiv-digest/scripts/publish_discord.py

@@ -0,0 +1,318 @@
+#!/usr/bin/env python3
+"""Publish RobotDaily digests to Discord via OpenClaw and optional Discord REST."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import re
+from typing import Any, Dict, List, Optional
+from urllib.error import HTTPError
+from urllib.request import Request, urlopen
+
+from fetch_arxiv import DOMAIN_CONFIGS
+from utils import (
+    format_authors,
+    log,
+    normalize_space,
+    now_local,
+    read_json,
+    run_command_json,
+    truncate,
+)
+
+DOMAIN_ORDER = ["embodied", "representation", "reinforcement"]
+DISCORD_API = "https://discord.com/api/v10"
+
+
+class PublishError(RuntimeError):
+    pass
+
+
+def normalize_channel_name(name: str) -> str:
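+    # Discord channel names are lowercase; strip a leading '#' and hyphenate whitespace.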
+    text = normalize_space(name).lstrip("#").lower()
+    text = re.sub(r"\s+", "-", text)
+    return text
+
+
+class DiscordPublisher:
+    def __init__(
+        self,
+        *,
+        openclaw_bin: str,
+        account_id: str,
+        mode: str,
+        guild_id: str,
+        parent_channel_id: str,
+        target_channel_id: str,
+        target_channel_name: str,
+        category_id: str,
+        bot_token: str,
+        thread_auto_archive_min: int,
+        dry_run: bool,
+    ) -> None:
+        self.openclaw_bin = openclaw_bin
+        self.account_id = account_id
+        self.mode = mode
+        self.guild_id = guild_id
+        self.parent_channel_id = parent_channel_id
+        self.target_channel_id = target_channel_id
+        self.target_channel_name = target_channel_name
+        self.category_id = category_id
+        self.bot_token = bot_token
+        self.thread_auto_archive_min = thread_auto_archive_min
+        self.dry_run = dry_run
+
+    def openclaw(self, *args: str) -> Dict[str, Any]:
+        command = [self.openclaw_bin, "message", *args, "--channel", "discord", "--account", self.account_id, "--json"]
+        if self.dry_run:
+            command.append("--dry-run")
+        return run_command_json(command)
+
+    def list_channels(self) -> List[Dict[str, Any]]:
+        if not self.guild_id:
+            return []
+        payload = self.openclaw("channel", "list", "--guild-id", self.guild_id)
+        return payload.get("payload", {}).get("channels", [])
+
+    def list_threads(self) -> List[Dict[str, Any]]:
+        if not self.guild_id or not self.parent_channel_id:
+            return []
+        payload = self.openclaw(
+            "thread",
+            "list",
+            "--guild-id",
+            self.guild_id,
+            "--channel-id",
+            self.parent_channel_id,
+            "--include-archived",
+            "--limit",
+            "100",
+        )
+        return payload.get("payload", {}).get("threads", {}).get("threads", [])
+
+    def find_existing_channel(self, name: str) -> Optional[str]:
+        wanted_exact = normalize_space(name)
+        wanted_normalized = normalize_channel_name(name)
+        for channel in self.list_channels():
+            current_name = str(channel.get("name", ""))
+            if current_name == wanted_exact or normalize_channel_name(current_name) == wanted_normalized:
+                return str(channel.get("id", ""))
+        return None
+
+    def find_existing_thread(self, name: str) -> Optional[str]:
+        for thread in self.list_threads():
+            if thread.get("name") == name:
+                return str(thread.get("id", ""))
+        return None
+
+    def create_channel_via_rest(self, name: str, topic: str = "") -> str:
+        if not self.guild_id:
+            raise PublishError("channel 模式需要 DISCORD_GUILD_ID")
+        if self.dry_run:
+            return f"dry-run-channel-{normalize_channel_name(name) or 'robotdaily'}"
+        if not self.bot_token:
+            raise PublishError("创建 Discord 频道需要 DISCORD_BOT_TOKEN;当前 OpenClaw CLI 版本没有公开 channel create 子命令")
+
+        body: Dict[str, Any] = {"name": normalize_channel_name(name) or "robotdaily", "type": 0}
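+        # type 0 = GUILD_TEXT in the Discord REST API.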
+        if self.category_id:
+            body["parent_id"] = self.category_id
+        if topic:
+            body["topic"] = topic
+
+        request = Request(
+            url=f"{DISCORD_API}/guilds/{self.guild_id}/channels",
+            method="POST",
+            data=json.dumps(body).encode("utf-8"),
+            headers={
+                "Authorization": f"Bot {self.bot_token}",
+                "Content-Type": "application/json",
+            },
+        )
+        try:
+            with urlopen(request, timeout=30) as response:
+                payload = json.loads(response.read().decode("utf-8", errors="ignore"))
+        except HTTPError as exc:
+            detail = exc.read().decode("utf-8", errors="ignore")
+            raise PublishError(f"Discord channel create failed: {exc.code} {detail}") from exc
+        return str(payload.get("id", ""))
+
+    def create_thread(self, thread_name: str, opening_message: str) -> str:
+        if not self.parent_channel_id:
+            raise PublishError("thread 模式需要 DISCORD_PARENT_CHANNEL_ID")
+        existing = self.find_existing_thread(thread_name)
+        if existing:
+            return existing
+        payload = self.openclaw(
+            "thread",
+            "create",
+            "--target",
+            f"channel:{self.parent_channel_id}",
+            "--thread-name",
+            thread_name,
+            "--message",
+            opening_message,
+            "--auto-archive-min",
+            str(self.thread_auto_archive_min),
+        )
+        thread = payload.get("payload", {}).get("thread", {})
+        thread_id = str(thread.get("id", ""))
+        if not thread_id and self.dry_run:
+            return f"dry-run-thread-{thread_name}"
+        if not thread_id:
+            raise PublishError("OpenClaw thread create 没有返回 thread id")
+        return thread_id
+
+    def create_fixed_channel(self, title: str) -> str:
+        channel_name = normalize_channel_name(self.target_channel_name or "robotdaily") or "robotdaily"
+        existing = self.find_existing_channel(channel_name)
+        if existing:
+            return existing
+        topic = truncate(title, 180)
+        channel_id = self.create_channel_via_rest(channel_name, topic=topic)
+        if not channel_id:
+            raise PublishError("Discord fixed channel create 返回空 id")
+        return channel_id
+
+    def create_or_resolve_target(self, title: str, opening_message: str) -> str:
+        date_slug = now_local().strftime("%Y-%m-%d")
+        if self.mode == "existing-channel":
+            if not self.target_channel_id:
+                raise PublishError("existing-channel 模式需要 DISCORD_TARGET_CHANNEL_ID")
+            return self.target_channel_id
+
+        if self.mode == "fixed-channel":
+            return self.create_fixed_channel(title)
+
+        if self.mode == "thread":
+            thread_name = f"RobotDaily {date_slug}"
+            return self.create_thread(thread_name, opening_message)
+
+        if self.mode == "channel":
+            prefix = normalize_channel_name(self.target_channel_name or "robotdaily") or "robotdaily"
+            channel_name = f"{prefix}-{date_slug}"
+            existing = self.find_existing_channel(channel_name)
+            if existing:
+                return existing
+            topic = truncate(title, 180)
+            channel_id = self.create_channel_via_rest(channel_name, topic=topic)
+            if not channel_id:
+                raise PublishError("Discord channel create 返回空 id")
+            return channel_id
+
+        raise PublishError(f"未知的投递模式: {self.mode}")
+
+    def send_message(self, target_channel_id: str, message: str, media: str = "") -> Dict[str, Any]:
+        args = ["send", "--target", f"channel:{target_channel_id}", "--message", message]
+        if media:
+            args.extend(["--media", media])
+        return self.openclaw(*args)
+
+
+def build_opening_message(payload: Dict[str, Any]) -> str:
+    total = len(payload.get("papers", []))
+    counts = payload.get("counts", {})
+    parts = [f"老大早安~今天给你挑了 {total} 篇偏应用论文。"]
+    for domain in DOMAIN_ORDER:
+        count = counts.get(domain, 0)
+        if count:
+            parts.append(f"{DOMAIN_CONFIGS[domain]['label_zh']} {count} 篇")
+    parts.append("下面每张卡片都带 DOI / arXiv / PDF,可直接点开读。")
+    return " | ".join(parts)
+
+
+def build_domain_header(domain: str, count: int) -> str:
+    return f"## {DOMAIN_CONFIGS[domain]['label_zh']}({count} 篇)"
+
+
+def build_paper_message(paper: Dict[str, Any]) -> str:
+    tags = " ".join(f"`{tag}`" for tag in paper.get("tags", [])[:6])
+    lines = [
+        f"### {paper.get('domain_rank', '?')}. {paper.get('title', '')}",
+        f"作者:{format_authors(paper.get('authors', []), limit=4)}",
+        f"关键词:{tags}" if tags else "关键词:暂无",
+        f"简析:{paper.get('brief_explanation_zh', '')}",
+        f"摘要中译:{truncate(paper.get('translated_abstract_zh', ''), 700)}",
+        f"DOI:{paper.get('doi_url', '')}",
+        f"arXiv:{paper.get('abs_url', '')}",
+        f"PDF:{paper.get('pdf_url', '')}",
+    ]
+    return "\n".join(lines)
+
+
+def publish_digest(
+    payload: Dict[str, Any],
+    *,
+    html_path: str,
+    markdown_path: str,
+    publisher: DiscordPublisher,
+) -> str:
+    opening_message = build_opening_message(payload)
+    target_channel_id = publisher.create_or_resolve_target(opening_message, opening_message)
+
+    attached_message = opening_message + "\n\n已附上移动端 HTML 晨读版,点开卡片能直接看中译摘要。"
+    publisher.send_message(target_channel_id, attached_message, media=html_path)
+
+    grouped: Dict[str, List[Dict[str, Any]]] = {domain: [] for domain in DOMAIN_ORDER}
+    for paper in payload.get("papers", []):
+        grouped.setdefault(paper["domain"], []).append(paper)
+
+    for domain in DOMAIN_ORDER:
+        papers = grouped.get(domain, [])
+        if not papers:
+            continue
+        publisher.send_message(target_channel_id, build_domain_header(domain, len(papers)))
+        for paper in papers:
+            publisher.send_message(target_channel_id, build_paper_message(paper))
+
+    if markdown_path:
+        publisher.send_message(target_channel_id, "附一份 Markdown 归档版,桌面端检索会更方便。", media=markdown_path)
+
+    return target_channel_id
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Publish RobotDaily digest to Discord")
+    parser.add_argument("--input", required=True)
+    parser.add_argument("--html", required=True)
+    parser.add_argument("--markdown", default="")
+    parser.add_argument("--mode", default="thread")
+    parser.add_argument("--openclaw-bin", default="openclaw")
+    parser.add_argument("--account-id", default="codex")
+    parser.add_argument("--guild-id", default="")
+    parser.add_argument("--parent-channel-id", default="")
+    parser.add_argument("--target-channel-id", default="")
+    parser.add_argument("--target-channel-name", default="")
+    parser.add_argument("--category-id", default="")
+    parser.add_argument("--bot-token", default="")
+    parser.add_argument("--thread-auto-archive-min", type=int, default=10080)
+    parser.add_argument("--dry-run", action="store_true")
+    args = parser.parse_args()
+
+    payload = read_json(args.input, default={}) or {}
+    publisher = DiscordPublisher(
+        openclaw_bin=args.openclaw_bin,
+        account_id=args.account_id,
+        mode=args.mode,
+        guild_id=args.guild_id,
+        parent_channel_id=args.parent_channel_id,
+        target_channel_id=args.target_channel_id,
+        target_channel_name=args.target_channel_name,
+        category_id=args.category_id,
+        bot_token=args.bot_token,
+        thread_auto_archive_min=args.thread_auto_archive_min,
+        dry_run=args.dry_run,
+    )
+
+    target = publish_digest(
+        payload,
+        html_path=args.html,
+        markdown_path=args.markdown,
+        publisher=publisher,
+    )
+    log(f"Digest published to Discord target: {target}")
+
+
+if __name__ == "__main__":
+    main()

+ 172 - 0
arxiv-digest/scripts/render_digest.py

@@ -0,0 +1,172 @@
+#!/usr/bin/env python3
+"""Render a mobile-friendly HTML digest and a Discord-friendly markdown digest."""
+
+from __future__ import annotations
+
+import argparse
+from collections import defaultdict
+from typing import Any, Dict, List
+
+from fetch_arxiv import DOMAIN_CONFIGS
+from utils import SKILL_DIR, format_authors, html_escape, now_local, read_json, write_text
+
+DOMAIN_ORDER = ["embodied", "representation", "reinforcement"]
+TEMPLATE_PATH = SKILL_DIR / "assets" / "mobile_digest_template.html"
+
+
+def render_tag(tag: str) -> str:
+    return f'<span class="tag">#{html_escape(tag)}</span>'
+
+
+def render_link(label: str, url: str) -> str:
+    if not url:
+        return ""
+    safe_label = html_escape(label)
+    safe_url = html_escape(url)
+    return f'<a class="link-btn" href="{safe_url}" target="_blank" rel="noopener noreferrer">{safe_label}</a>'
+
+
+def render_paper_card(paper: Dict[str, Any]) -> str:
+    domain_label = DOMAIN_CONFIGS[paper["domain"]]["label_zh"]
+    tags_html = "".join(render_tag(tag) for tag in paper.get("tags", []))
+    links_html = "".join(
+        item
+        for item in [
+            render_link("打开 DOI", paper.get("doi_url", "")),
+            render_link("打开 arXiv", paper.get("abs_url", "")),
+            render_link("打开 PDF", paper.get("pdf_url", "")),
+        ]
+        if item
+    )
+    authors = html_escape(format_authors(paper.get("authors", []), limit=4))
+    return f"""
+<details class=\"paper-card\">
+  <summary>
+    <div class=\"meta-row\">
+      <span class=\"pill domain\">{html_escape(domain_label)}</span>
+      <span class=\"pill score\">综合分 {paper.get('score_total', 0):.1f}</span>
+      <span class=\"pill date\">{html_escape(paper.get('published_local', '')[:10])}</span>
+    </div>
+    <h3 class=\"paper-title\">{html_escape(paper.get('title', ''))}</h3>
+    <p class=\"teaser\">{html_escape(paper.get('brief_explanation_zh', ''))}</p>
+    <div class=\"tag-row\">{tags_html}</div>
+  </summary>
+  <div class=\"paper-body\">
+    <div class=\"info\">
+      <strong>作者:</strong>{authors}<br />
+      <strong>arXiv:</strong>{html_escape(paper.get('arxiv_id', ''))}<br />
+      <strong>入选原因:</strong>{html_escape(paper.get('selection_reason', ''))}
+    </div>
+    <div class=\"block\">
+      <h3>中文摘要</h3>
+      <p>{html_escape(paper.get('translated_abstract_zh', ''))}</p>
+    </div>
+    <div class=\"block\">
+      <h3>原文摘要</h3>
+      <p>{html_escape(paper.get('summary', ''))}</p>
+    </div>
+    <div class=\"links\">{links_html}</div>
+  </div>
+</details>
+""".strip()
+
+
+def render_sections(papers: List[Dict[str, Any]]) -> Dict[str, str]:
+    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
+    for paper in papers:
+        grouped[paper["domain"]].append(paper)
+
+    nav_parts: List[str] = []
+    section_parts: List[str] = []
+    for domain in DOMAIN_ORDER:
+        domain_papers = grouped.get(domain, [])
+        if not domain_papers:
+            continue
+        label = DOMAIN_CONFIGS[domain]["label_zh"]
+        nav_parts.append(f'<a href="#{domain}">{html_escape(label)} · {len(domain_papers)} 篇</a>')
+        cards_html = "\n".join(render_paper_card(paper) for paper in domain_papers)
+        section_parts.append(
+            f"""
+<section class=\"section\" id=\"{domain}\">
+  <div class=\"section-header\">
+    <h2>{html_escape(label)}</h2>
+    <span class=\"count\">{len(domain_papers)} 篇</span>
+  </div>
+  <div class=\"cards\">{cards_html}</div>
+</section>
+""".strip()
+        )
+    return {"nav": "".join(nav_parts), "sections": "\n".join(section_parts)}
+
+
+def render_html(payload: Dict[str, Any]) -> str:
+    template = TEMPLATE_PATH.read_text(encoding="utf-8")
+    papers = payload.get("papers", [])
+    rendered = render_sections(papers)
+    intro = f"{now_local().strftime('%Y-%m-%d')} · 具身智能 / 表征学习 / 强化学习 · 每个方向 2-3 篇偏应用候选。点开卡片即可看中文摘要、原文摘要与 DOI 链接。"
+    replacements = {
+        "{{date}}": now_local().strftime("%Y-%m-%d"),
+        "{{intro}}": html_escape(intro),
+        "{{nav}}": rendered["nav"],
+        "{{sections}}": rendered["sections"],
+        "{{generated_at}}": html_escape(now_local().strftime("%Y-%m-%d %H:%M %Z")),
+    }
+    html = template
+    for placeholder, value in replacements.items():
+        html = html.replace(placeholder, value)
+    return html
+
+
+def render_markdown(payload: Dict[str, Any]) -> str:
+    lines: List[str] = []
+    lines.append(f"# RobotDaily | {now_local().strftime('%Y-%m-%d')}")
+    lines.append("")
+    lines.append("具身智能 / 表征学习 / 强化学习,每个方向 2-3 篇偏应用候选。")
+    lines.append("")
+    for domain in DOMAIN_ORDER:
+        papers = [paper for paper in payload.get("papers", []) if paper.get("domain") == domain]
+        if not papers:
+            continue
+        lines.append(f"## {DOMAIN_CONFIGS[domain]['label_zh']}({len(papers)} 篇)")
+        lines.append("")
+        for idx, paper in enumerate(papers, start=1):
+            tags = " ".join(f"`{tag}`" for tag in paper.get("tags", []))
+            lines.extend(
+                [
+                    f"### {idx}. {paper.get('title', '')}",
+                    f"- 作者:{format_authors(paper.get('authors', []), limit=4)}",
+                    f"- 亮点:{paper.get('brief_explanation_zh', '')}",
+                    f"- 标签:{tags}",
+                    f"- DOI:{paper.get('doi_url', '')}",
+                    f"- arXiv:{paper.get('abs_url', '')}",
+                    f"- PDF:{paper.get('pdf_url', '')}",
+                    "",
+                ]
+            )
+    return "\n".join(lines).strip() + "\n"
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Render RobotDaily digest HTML/markdown")
+    parser.add_argument("--input", required=True)
+    parser.add_argument("--html-output", default="")
+    parser.add_argument("--md-output", default="")
+    args = parser.parse_args()
+
+    payload = read_json(args.input, default={}) or {}
+    html = render_html(payload)
+    markdown = render_markdown(payload)
+
+    if args.html_output:
+        write_text(args.html_output, html)
+    else:
+        print(html)
+
+    if args.md_output:
+        write_text(args.md_output, markdown)
+
+
+if __name__ == "__main__":
+    main()

+ 130 - 0
arxiv-digest/scripts/run_daily.py

@@ -0,0 +1,130 @@
+#!/usr/bin/env python3
+"""End-to-end RobotDaily pipeline."""
+
+from __future__ import annotations
+
+import argparse
+import json
+from pathlib import Path
+from typing import Any, Dict, List
+
+from enrich_papers import enrich_selection
+from fetch_arxiv import fetch_candidates
+from publish_discord import DiscordPublisher, publish_digest
+from render_digest import render_html, render_markdown
+from select_papers import select_papers
+from utils import DEFAULT_OUTPUT_DIR, ensure_dir, load_env, log, now_local, write_json, write_text
+
+
+def parse_models(raw: str) -> List[str]:
+    return [item.strip() for item in str(raw or "").split(",") if item.strip()]
+
+
+def choose_selection(lookback_days: int, fallback_lookback_days: int, max_results_per_domain: int) -> Dict[str, Any]:
+    candidates = fetch_candidates(lookback_days=lookback_days, max_results_per_domain=max_results_per_domain)
+    selection = select_papers(candidates)
+
+    counts = selection.get("counts", {})
+    if any(counts.get(domain, 0) < 2 for domain in ["embodied", "representation", "reinforcement"]) and fallback_lookback_days > lookback_days:
+        log(f"Some domains are sparse with lookback={lookback_days}; retrying with lookback={fallback_lookback_days}")
+        candidates = fetch_candidates(lookback_days=fallback_lookback_days, max_results_per_domain=max_results_per_domain)
+        selection = select_papers(candidates)
+
+    selection["candidate_count"] = len(candidates)
+    selection["candidates"] = candidates
+    return selection
+
+
+def build_output_paths(root: Path, date_slug: str) -> Dict[str, Path]:
+    bundle_dir = ensure_dir(root / date_slug)
+    return {
+        "bundle_dir": bundle_dir,
+        "candidates_json": bundle_dir / "candidates.json",
+        "selected_json": bundle_dir / "selected.json",
+        "enriched_json": bundle_dir / "enriched.json",
+        "digest_html": bundle_dir / "robotdaily.html",
+        "digest_md": bundle_dir / "robotdaily.md",
+        "manifest_json": bundle_dir / "manifest.json",
+    }
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run RobotDaily daily digest pipeline")
+    parser.add_argument("--output-root", default="")
+    parser.add_argument("--lookback-days", type=int, default=2)
+    parser.add_argument("--fallback-lookback-days", type=int, default=4)
+    parser.add_argument("--max-results-per-domain", type=int, default=40)
+    parser.add_argument("--models", default="")
+    parser.add_argument("--skip-enrich", action="store_true")
+    parser.add_argument("--publish-discord", action="store_true")
+    parser.add_argument("--dry-run", action="store_true")
+    args = parser.parse_args()
+
+    env = load_env()
+    date_slug = now_local().strftime("%Y-%m-%d")
+    output_root = Path(args.output_root or env.get("ROBOTDAILY_OUTPUT_DIR", str(DEFAULT_OUTPUT_DIR)))
+    paths = build_output_paths(output_root, date_slug)
+
+    selection = choose_selection(
+        lookback_days=args.lookback_days,
+        fallback_lookback_days=args.fallback_lookback_days,
+        max_results_per_domain=args.max_results_per_domain,
+    )
+    write_json(paths["candidates_json"], {"generated_at": now_local().isoformat(), "papers": selection.get("candidates", [])})
+    write_json(paths["selected_json"], {k: v for k, v in selection.items() if k != "candidates"})
+
+    models = parse_models(args.models or env.get("INSIGHT_MODELS", "glm-4.7:cloud,qwen3.5:cloud,qwen3.5:27b,glm-4.7-flash-64k:latest"))
+    if args.skip_enrich:
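+        # Without Ollama enrichment, fall back to the raw summary, the selection reason, and empty tags.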
+        enriched = {k: v for k, v in selection.items() if k != "candidates"}
+        for paper in enriched.get("papers", []):
+            paper.setdefault("translated_abstract_zh", paper.get("summary", ""))
+            paper.setdefault("brief_explanation_zh", paper.get("selection_reason", ""))
+            paper.setdefault("tags", [])
+    else:
+        enriched = enrich_selection({k: v for k, v in selection.items() if k != "candidates"}, model_names=models)
+    write_json(paths["enriched_json"], enriched)
+
+    html = render_html(enriched)
+    markdown = render_markdown(enriched)
+    write_text(paths["digest_html"], html)
+    write_text(paths["digest_md"], markdown)
+
+    manifest = {
+        "generated_at": now_local().isoformat(),
+        "date": date_slug,
+        "candidate_count": selection.get("candidate_count", 0),
+        "selected_count": len(enriched.get("papers", [])),
+        "counts": enriched.get("counts", {}),
+        "models": models,
+        "paths": {name: str(path) for name, path in paths.items() if name != "bundle_dir"},
+    }
+    write_json(paths["manifest_json"], manifest)
+
+    if args.publish_discord:
+        publisher = DiscordPublisher(
+            openclaw_bin=env.get("OPENCLAW_BIN", "openclaw"),
+            account_id=env.get("DISCORD_ACCOUNT_ID", "codex"),
+            mode=env.get("DISCORD_DELIVERY_MODE", "thread"),
+            guild_id=env.get("DISCORD_GUILD_ID", ""),
+            parent_channel_id=env.get("DISCORD_PARENT_CHANNEL_ID", ""),
+            target_channel_id=env.get("DISCORD_TARGET_CHANNEL_ID", ""),
+            target_channel_name=env.get("DISCORD_TARGET_CHANNEL_NAME", ""),
+            category_id=env.get("DISCORD_CATEGORY_ID", ""),
+            bot_token=env.get("DISCORD_BOT_TOKEN", ""),
+            thread_auto_archive_min=int(env.get("DISCORD_THREAD_AUTO_ARCHIVE_MIN", "10080")),
+            dry_run=args.dry_run,
+        )
+        target = publish_digest(
+            enriched,
+            html_path=str(paths["digest_html"]),
+            markdown_path=str(paths["digest_md"]),
+            publisher=publisher,
+        )
+        manifest["discord_target"] = target
+        write_json(paths["manifest_json"], manifest)
+
+    print(json.dumps(manifest, ensure_ascii=False, indent=2))
+
+
+if __name__ == "__main__":
+    main()

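Before scheduling the pipeline, a quick local smoke test confirms the fetch, select, and render stages end to end. A minimal sketch, assuming it is run from the repository root with `python3` available: `--skip-enrich` sidesteps the Ollama dependency, and omitting `--publish-discord` keeps the run offline (logging goes to stderr, so stdout should carry only the manifest JSON).

```python
#!/usr/bin/env python3
"""Smoke-test run_daily.py without Ollama or Discord (a sketch; the path is an assumption)."""

import json
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "arxiv-digest/scripts/run_daily.py", "--skip-enrich", "--lookback-days", "3"],
    capture_output=True,
    text=True,
    check=True,  # raise if the pipeline exits non-zero
)

# run_daily.py prints the manifest as JSON on stdout.
manifest = json.loads(result.stdout)
print(manifest["date"], manifest["selected_count"], manifest["counts"])
```

The printed counts should show 2-3 papers per domain once the lookback window contains fresh submissions.
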
+ 112 - 0
arxiv-digest/scripts/select_papers.py

@@ -0,0 +1,112 @@
+#!/usr/bin/env python3
+"""Select 2-3 promising papers per domain for RobotDaily."""
+
+from __future__ import annotations
+
+import argparse
+import json
+from collections import defaultdict
+from typing import Any, Dict, List
+
+from fetch_arxiv import DOMAIN_CONFIGS
+from utils import log, now_local, read_json, write_json
+
+DOMAIN_ORDER = ["embodied", "representation", "reinforcement"]
+
+
+def paper_sort_key(paper: Dict[str, Any]) -> Any:
+    return (
+        paper.get("score_total", 0.0),
+        paper.get("score_applied", 0.0),
+        paper.get("score_innovation", 0.0),
+        paper.get("published", ""),
+    )
+
+
+def selection_reason(paper: Dict[str, Any]) -> str:
+    reasons: List[str] = []
+    applied = paper.get("matched_applied_terms", [])[:3]
+    innovation = paper.get("matched_innovation_terms", [])[:3]
+    if applied:
+        reasons.append("应用信号: " + ", ".join(applied))
+    if innovation:
+        reasons.append("创新信号: " + ", ".join(innovation))
+    domain_matches = paper.get("domain_matches", {}).get(paper.get("domain", ""), [])[:3]
+    if domain_matches:
+        reasons.append("领域匹配: " + ", ".join(domain_matches))
+    if not reasons:
+        reasons.append("综合得分较高,且发布时间较新")
+    return ";".join(reasons)
+
+
+def choose_domain_papers(papers: List[Dict[str, Any]], min_per_domain: int = 2, max_per_domain: int = 3) -> List[Dict[str, Any]]:
+    ranked = sorted(papers, key=paper_sort_key, reverse=True)
+    if not ranked:
+        return []
+
+    selected = ranked[:max_per_domain]
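+    # Heuristic trim: drop the third pick when it trails the runner-up by
+    # more than 1.2 points or falls below the 4.2 quality floor.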
+    if len(selected) >= 3:
+        score_gap = selected[1].get("score_total", 0.0) - selected[2].get("score_total", 0.0)
+        if score_gap > 1.2 or selected[2].get("score_total", 0.0) < 4.2:
+            selected = selected[:2]
+
+    if len(selected) < min_per_domain:
+        selected = ranked[: min(min_per_domain, len(ranked))]
+
+    output: List[Dict[str, Any]] = []
+    for index, paper in enumerate(selected, start=1):
+        enriched = dict(paper)
+        enriched["domain_rank"] = index
+        enriched["selection_reason"] = selection_reason(paper)
+        output.append(enriched)
+    return output
+
+
+def select_papers(candidates: List[Dict[str, Any]], min_per_domain: int = 2, max_per_domain: int = 3) -> Dict[str, Any]:
+    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
+    for paper in candidates:
+        grouped[paper.get("domain", "representation")].append(paper)
+
+    selected_by_domain: Dict[str, List[Dict[str, Any]]] = {}
+    flat_selected: List[Dict[str, Any]] = []
+
+    for domain in DOMAIN_ORDER:
+        picked = choose_domain_papers(grouped.get(domain, []), min_per_domain=min_per_domain, max_per_domain=max_per_domain)
+        selected_by_domain[domain] = picked
+        flat_selected.extend(picked)
+
+    flat_selected.sort(key=lambda item: (DOMAIN_ORDER.index(item.get("domain", "representation")), item.get("domain_rank", 0)))
+
+    return {
+        "generated_at": now_local().isoformat(),
+        "counts": {domain: len(selected_by_domain[domain]) for domain in DOMAIN_ORDER},
+        "selected_by_domain": selected_by_domain,
+        "papers": flat_selected,
+    }
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Select daily papers for RobotDaily")
+    parser.add_argument("--input", required=True)
+    parser.add_argument("--output", default="")
+    parser.add_argument("--min-per-domain", type=int, default=2)
+    parser.add_argument("--max-per-domain", type=int, default=3)
+    args = parser.parse_args()
+
+    payload = read_json(args.input, default={}) or {}
+    candidates = payload.get("papers", []) if isinstance(payload, dict) else []
+    selected = select_papers(
+        candidates,
+        min_per_domain=args.min_per_domain,
+        max_per_domain=args.max_per_domain,
+    )
+
+    log("Selected papers per domain: " + json.dumps(selected["counts"], ensure_ascii=False))
+    if args.output:
+        write_json(args.output, selected)
+    else:
+        print(json.dumps(selected, ensure_ascii=False, indent=2))
+
+
+if __name__ == "__main__":
+    main()

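The trim heuristic above hinges on two constants: a third pick is dropped when it trails the runner-up by more than 1.2 points or scores below 4.2 overall. A toy example of that behavior, assuming `arxiv-digest/scripts` is on `sys.path` (the module imports `fetch_arxiv` and `utils` as siblings):

```python
from select_papers import choose_domain_papers  # assumes arxiv-digest/scripts is on sys.path

papers = [
    {"domain": "embodied", "score_total": 7.0, "published": "2024-06-02"},
    {"domain": "embodied", "score_total": 6.5, "published": "2024-06-02"},
    {"domain": "embodied", "score_total": 4.0, "published": "2024-06-01"},
]

picked = choose_domain_papers(papers)
# The runner-up leads the third paper by 2.5 points (> 1.2), and 4.0 sits
# below the 4.2 floor, so the selection is trimmed from three papers to two.
assert [p["domain_rank"] for p in picked] == [1, 2]
assert all(p["selection_reason"] for p in picked)
```
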
+ 249 - 0
arxiv-digest/scripts/utils.py

@@ -0,0 +1,249 @@
+#!/usr/bin/env python3
+"""Shared helpers for the RobotDaily arXiv digest skill."""
+
+from __future__ import annotations
+
+import json
+import os
+import re
+import subprocess
+import sys
+from datetime import datetime
+from pathlib import Path
+from typing import Any, Dict, Iterable, List, Optional
+from urllib.error import HTTPError, URLError
+from urllib.request import Request, urlopen
+from zoneinfo import ZoneInfo
+
+SKILL_DIR = Path(__file__).resolve().parents[1]
+ROOT_DIR = SKILL_DIR.parent
+DEFAULT_OUTPUT_DIR = SKILL_DIR / "output"
+DEFAULT_LOG_DIR = SKILL_DIR / "logs"
+LOCAL_TZ = ZoneInfo("Asia/Shanghai")
+OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://127.0.0.1:11434/api/generate")
+
+
+def log(message: str) -> None:
+    timestamp = datetime.now(LOCAL_TZ).strftime("%H:%M:%S")
+    print(f"[{timestamp}] {message}", file=sys.stderr)
+
+
+def now_local() -> datetime:
+    return datetime.now(LOCAL_TZ)
+
+
+def ensure_dir(path: Path | str) -> Path:
+    path_obj = Path(path)
+    path_obj.mkdir(parents=True, exist_ok=True)
+    return path_obj
+
+
+def normalize_space(text: str) -> str:
+    return re.sub(r"\s+", " ", str(text or "")).strip()
+
+
+def slugify(text: str) -> str:
+    slug = re.sub(r"[^a-zA-Z0-9]+", "-", str(text or "").strip().lower()).strip("-")
+    return slug or "digest"
+
+
+def canonical_arxiv_id(raw: str) -> str:
+    text = normalize_space(raw)
+    if not text:
+        return ""
+    text = text.rsplit("/", 1)[-1]
+    text = text.replace("arXiv:", "")
+    return re.sub(r"v\d+$", "", text)
+
+
+def canonical_doi(arxiv_id: str, doi: str = "") -> str:
+    clean = normalize_space(doi)
+    if clean:
+        clean = clean.replace("https://doi.org/", "").replace("http://doi.org/", "")
+        clean = clean.replace("doi:", "")
+        return clean.strip()
+    arxiv_clean = canonical_arxiv_id(arxiv_id)
+    return f"10.48550/arXiv.{arxiv_clean}" if arxiv_clean else ""
+
+
+def canonical_doi_url(arxiv_id: str, doi: str = "") -> str:
+    clean_doi = canonical_doi(arxiv_id, doi)
+    return f"https://doi.org/{clean_doi}" if clean_doi else ""
+
+
+def build_arxiv_urls(arxiv_id: str) -> Dict[str, str]:
+    clean = canonical_arxiv_id(arxiv_id)
+    if not clean:
+        return {"abs_url": "", "pdf_url": ""}
+    return {
+        "abs_url": f"https://arxiv.org/abs/{clean}",
+        "pdf_url": f"https://arxiv.org/pdf/{clean}.pdf",
+    }
+
+
+def read_json(path: Path | str, default: Any = None) -> Any:
+    path_obj = Path(path)
+    if not path_obj.exists():
+        return default
+    with path_obj.open("r", encoding="utf-8") as handle:
+        return json.load(handle)
+
+
+def write_json(path: Path | str, data: Any) -> Path:
+    path_obj = Path(path)
+    ensure_dir(path_obj.parent)
+    with path_obj.open("w", encoding="utf-8") as handle:
+        json.dump(data, handle, ensure_ascii=False, indent=2)
+    return path_obj
+
+
+def write_text(path: Path | str, content: str) -> Path:
+    path_obj = Path(path)
+    ensure_dir(path_obj.parent)
+    path_obj.write_text(content, encoding="utf-8")
+    return path_obj
+
+
+def load_env(env_file: Path | str | None = None) -> Dict[str, str]:
+    env = dict(os.environ)
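+    # Real environment variables win: .env values are applied with setdefault below.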
+    env_path = Path(env_file) if env_file else SKILL_DIR / ".env"
+    if env_path.exists():
+        for line in env_path.read_text(encoding="utf-8").splitlines():
+            line = line.strip()
+            if not line or line.startswith("#") or "=" not in line:
+                continue
+            key, value = line.split("=", 1)
+            key = key.strip()
+            value = value.strip().strip('"').strip("'")
+            env.setdefault(key, value)
+    return env
+
+
+def extract_json_object(text: str) -> Optional[Dict[str, Any]]:
+    if not text:
+        return None
+    match = re.search(r"\{.*\}", text, re.DOTALL)
+    if not match:
+        return None
+    try:
+        payload = json.loads(match.group(0))
+    except Exception:
+        return None
+    return payload if isinstance(payload, dict) else None
+
+
+def ollama_generate_json(prompt: str, model: str, timeout: int = 120) -> Optional[Dict[str, Any]]:
+    body = json.dumps(
+        {
+            "model": model,
+            "prompt": prompt,
+            "stream": False,
+            "format": "json",
+            "think": False,
+            "options": {"temperature": 0.1, "num_predict": 800},
+        }
+    ).encode("utf-8")
+    request = Request(url=OLLAMA_URL, data=body, method="POST")
+    request.add_header("Content-Type", "application/json")
+
+    try:
+        with urlopen(request, timeout=timeout) as response:
+            payload = json.loads(response.read().decode("utf-8", errors="ignore"))
+        return extract_json_object(payload.get("response", ""))
+    except HTTPError as exc:
+        detail = ""
+        try:
+            detail = exc.read().decode("utf-8", errors="ignore")
+        except Exception:
+            detail = ""
+        log(f"Ollama request failed: {exc} {detail}".strip())
+        return None
+    except (URLError, TimeoutError) as exc:
+        log(f"Ollama request failed: {exc}")
+        return None
+    except Exception as exc:
+        log(f"Ollama parse failed: {exc}")
+        return None
+
+
+class CommandError(RuntimeError):
+    pass
+
+
+def run_command(args: List[str], cwd: Path | str | None = None) -> subprocess.CompletedProcess[str]:
+    result = subprocess.run(
+        args,
+        cwd=str(cwd) if cwd else None,
+        capture_output=True,
+        text=True,
+        check=False,
+    )
+    if result.returncode != 0:
+        stderr = result.stderr.strip()
+        stdout = result.stdout.strip()
+        detail = stderr or stdout or f"exit code {result.returncode}"
+        raise CommandError(detail)
+    return result
+
+
+def run_command_json(args: List[str], cwd: Path | str | None = None) -> Dict[str, Any]:
+    result = run_command(args, cwd=cwd)
+    stdout = result.stdout.strip()
+    if not stdout:
+        return {}
+    try:
+        return json.loads(stdout)
+    except json.JSONDecodeError:
+        start = stdout.find("{")
+        end = stdout.rfind("}")
+        if start != -1 and end != -1 and end > start:
+            snippet = stdout[start : end + 1]
+            try:
+                return json.loads(snippet)
+            except json.JSONDecodeError as exc:
+                raise CommandError(f"Invalid JSON output: {exc}: {stdout[:300]}") from exc
+        raise CommandError(f"Invalid JSON output: {stdout[:300]}")
+
+
+def chunk_lines(lines: Iterable[str], limit: int = 1800) -> List[str]:
+    chunks: List[str] = []
+    current: List[str] = []
+    current_len = 0
+    for line in lines:
+        safe_line = str(line)
+        extra = len(safe_line) + (1 if current else 0)
+        if current and current_len + extra > limit:
+            chunks.append("\n".join(current))
+            current = [safe_line]
+            current_len = len(safe_line)
+            continue
+        current.append(safe_line)
+        current_len += extra
+    if current:
+        chunks.append("\n".join(current))
+    return chunks
+
+
+def format_authors(authors: List[str], limit: int = 4) -> str:
+    items = [normalize_space(author) for author in authors if normalize_space(author)]
+    if len(items) <= limit:
+        return ", ".join(items)
+    hidden = len(items) - limit
+    return f"{', '.join(items[:limit])} 等另外{hidden}人"
+
+
+def truncate(text: str, limit: int) -> str:
+    clean = normalize_space(text)
+    if len(clean) <= limit:
+        return clean
+    return clean[: limit - 1].rstrip() + "…"
+
+
+def html_escape(text: str) -> str:
+    return (
+        str(text or "")
+        .replace("&", "&amp;")
+        .replace("<", "&lt;")
+        .replace(">", "&gt;")
+        .replace('"', "&quot;")
+        .replace("'", "&#39;")
+    )

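Two of these helpers encode conventions used throughout the skill: `chunk_lines` packs lines to at most 1800 characters, leaving headroom below Discord's 2000-character message cap, and `ollama_generate_json` posts to the local `/api/generate` endpoint with `format: json`. A short usage sketch, assuming the scripts directory is on `sys.path` and, for the last call, a running local Ollama server (the model name is taken from the defaults in `.env.example`):

```python
from utils import chunk_lines, format_authors, ollama_generate_json

lines = [f"paper {i}: " + "x" * 120 for i in range(40)]
chunks = chunk_lines(lines, limit=1800)
# Chunks join whole lines and never exceed the limit, so each one fits
# in a single Discord message.
assert all(len(chunk) <= 1800 for chunk in chunks)

print(format_authors(["Alice", "Bob", "Carol", "Dave", "Eve", "Frank"]))
# -> Alice, Bob, Carol, Dave 等另外2人

# Requires a local Ollama server; the model name is one of the defaults
# from .env.example and may need adjusting.
print(ollama_generate_json('Reply with {"ok": true}.', model="qwen3.5:27b"))
```
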
+ 15 - 203
generate_arxiv_digest.js

@@ -1,208 +1,20 @@
-const fs = require('fs');
-const https = require('https');
-const Mustache = require('mustache');
+#!/usr/bin/env node
+const { spawnSync } = require('child_process');
+const path = require('path');
 
-// Function to fetch and parse RSS feed
-function fetchRSS(url) {
-    return new Promise((resolve, reject) => {
-        https.get(url, (res) => {
-            let data = '';
-            res.on('data', (chunk) => {
-                data += chunk;
-            });
-            res.on('end', () => {
-                resolve(data);
-            });
-        }).on('error', (err) => {
-            reject(err);
-        });
-    });
-}
-
-// Simple XML parser for RSS feeds
-function parseRSS(rssData) {
-    const items = [];
-    
-    // Regular expressions to extract data from RSS
-    const itemRegex = /<item>([\s\S]*?)<\/item>/g;
-    const titleRegex = /<title><!\[CDATA\[(.*?)\]\]><\/title>/;
-    const descRegex = /<description><!\[CDATA\[(.*?)\]\]><\/description>/;
-    const linkRegex = /<guid[^>]*>(.*?)<\/guid>/;
-    const authorRegex = /<dc:creator>(.*?)<\/dc:creator>/g;
-    
-    let match;
-    while ((match = itemRegex.exec(rssData)) !== null) {
-        const itemData = match[1];
-        
-        const titleMatch = itemData.match(titleRegex);
-        const descMatch = itemData.match(descRegex);
-        const linkMatch = itemData.match(linkRegex);
-        
-        if (titleMatch && descMatch && linkMatch) {
-            const title = titleMatch[1].replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>');
-            const description = descMatch[1].replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>');
-            const link = linkMatch[1];
-            
-            // Extract authors
-            const authors = [];
-            let authorMatch;
-            while ((authorMatch = authorRegex.exec(itemData)) !== null) {
-                authors.push(authorMatch[1]);
-            }
-            
-            // Extract arXiv ID from link
-            const arxivId = link.split('/').pop();
-            
-            items.push({
-                title,
-                description,
-                link,
-                authors: authors.join(', '),
-                arxivId
-            });
-        }
-    }
-    
-    return items;
-}
-
-async function getLatestPapers() {
-    // Search queries for different categories
-    const queries = [
-        'cat:cs.RO+OR+cat:cs.AI+OR+cat:cs.CV+OR+cat:cs.LG+OR+cat:cs.CL+OR+cat:cs.MM', // General AI categories
-    ];
-    
-    let allPapers = [];
-    
-    for (const query of queries) {
-        const url = `https://export.arxiv.org/api/query?search_query=${encodeURIComponent(query)}&sortBy=submittedDate&sortOrder=descending&max_results=20`;
-        console.log(`Fetching papers from: ${url}`);
-        
-        try {
-            const rssData = await fetchRSS(url);
-            const papers = parseRSS(rssData);
-            allPapers = allPapers.concat(papers);
-        } catch (error) {
-            console.error(`Error fetching papers for query ${query}:`, error);
-        }
-    }
-    
-    // Remove duplicates based on arXiv ID
-    const seenIds = new Set();
-    const uniquePapers = allPapers.filter(paper => {
-        if (seenIds.has(paper.arxivId)) {
-            return false;
-        }
-        seenIds.add(paper.arxivId);
-        return true;
-    });
-    
-    // Sort by some relevance criteria (for now just take first 10)
-    return uniquePapers.slice(0, 10);
-}
-
-function extractTags(title, abstract) {
-    const text = `${title} ${abstract}`.toLowerCase();
-    const tags = [];
-    
-    if (text.includes('embodied') || text.includes('robot')) {
-        tags.push('embodied');
-    }
-    if (text.includes('representation') || text.includes('representations') || text.includes('learning representation')) {
-        tags.push('representation');
-    }
-    if (text.includes('reinforcement learning') || text.includes('rl ') || text.includes(' rl')) {
-        tags.push('rl');
-    }
-    if (text.includes('vision') || text.includes('visual')) {
-        tags.push('vision');
-    }
-    if (text.includes('language')) {
-        tags.push('language');
-    }
-    if (text.includes('multimodal')) {
-        tags.push('multimodal');
-    }
-    if (text.includes('manipulation')) {
-        tags.push('manipulation');
-    }
-    if (text.includes('navigation')) {
-        tags.push('navigation');
-    }
-    if (text.includes('world model') || text.includes('world-model')) {
-        tags.push('world-model');
-    }
-    
-    return [...new Set(tags)]; // Remove duplicate tags
-}
+const repoRoot = __dirname;
+const script = path.join(repoRoot, 'arxiv-digest', 'scripts', 'run_daily.py');
+const args = [script, ...process.argv.slice(2)];
 
-function generateSummary(title, abstract) {
-    // This is a placeholder for a more sophisticated summary
-    // In a real implementation, this could use an LLM to generate insights
-    const insights = [
-        "This paper introduces novel approaches to the problem.",
-        "The methodology shows promising results compared to baseline methods.",
-        "The findings have implications for future research directions."
-    ];
-    
-    return insights[Math.floor(Math.random() * insights.length)];
-}
+const result = spawnSync('python3', args, {
+  cwd: repoRoot,
+  stdio: 'inherit',
+  env: process.env,
+});
 
-async function generateDigest() {
-    console.log("Starting ArXiv digest generation...");
-    
-    const papers = await getLatestPapers();
-    console.log(`Found ${papers.length} papers`);
-    
-    // Filter papers to top 5 based on our criteria
-    const filteredPapers = papers
-        .map(paper => {
-            const tags = extractTags(paper.title, paper.description);
-            return { ...paper, tags };
-        })
-        .filter(paper => paper.tags.length > 0) // Only papers with relevant tags
-        .slice(0, 5); // Take top 5
-    
-    console.log(`Filtered to ${filteredPapers.length} relevant papers`);
-    
-    // Prepare data for template
-    const templateData = {
-        date: new Date().toISOString().split('T')[0],
-        category: 'AI Research',
-        time: new Date().toLocaleTimeString('zh-CN'),
-        papers: filteredPapers.map(paper => ({
-            title: paper.title,
-            authors: paper.authors,
-            arxiv_id: paper.arxivId,
-            arxiv_url: paper.link,
-            tags: paper.tags,
-            summary: generateSummary(paper.title, paper.description)
-        }))
-    };
-    
-    // Read the template
-    const template = fs.readFileSync('/home/zhn/.nvm/versions/node/v22.22.0/lib/node_modules/openclaw/skills/arxiv-digest/assets/template.html', 'utf8');
-    
-    // Render the template
-    const output = Mustache.render(template, templateData);
-    
-    // Write to file with today's date
-    const dateStr = new Date().toISOString().split('T')[0].replace(/-/g, '-');
-    const filename = `/home/zhn/arxiv-digests/arxiv-digest-${dateStr}.html`;
-    
-    fs.writeFileSync(filename, output);
-    
-    console.log(`Digest generated successfully: ${filename}`);
-    return filename;
+if (result.error) {
+  console.error(result.error);
+  process.exit(1);
 }
 
-// Run the generator
-generateDigest()
-    .then(filename => {
-        console.log('ArXiv digest generation completed:', filename);
-        process.exit(0);
-    })
-    .catch(error => {
-        console.error('Error generating digest:', error);
-        process.exit(1);
-    });
+// status is null when the child is killed by a signal; treat that as failure.
+process.exit(result.status ?? 1);

+ 25 - 0
package-lock.json

@@ -0,0 +1,25 @@
+{
+  "name": "robdaily",
+  "version": "1.0.0",
+  "lockfileVersion": 3,
+  "requires": true,
+  "packages": {
+    "": {
+      "name": "robdaily",
+      "version": "1.0.0",
+      "license": "ISC",
+      "dependencies": {
+        "mustache": "^4.2.0"
+      }
+    },
+    "node_modules/mustache": {
+      "version": "4.2.0",
+      "resolved": "https://registry.npmjs.org/mustache/-/mustache-4.2.0.tgz",
+      "integrity": "sha512-71ippSywq5Yb7/tVYyGbkBggbU8H3u5Rz56fH60jGFgr8uHwxs+aSKeqmluIVzM0m0kB7xQjKS6qPfd0b2ZoqQ==",
+      "license": "MIT",
+      "bin": {
+        "mustache": "bin/mustache"
+      }
+    }
+  }
+}

+ 19 - 0
package.json

@@ -0,0 +1,19 @@
+{
+  "name": "robdaily",
+  "version": "1.0.0",
+  "description": "",
+  "main": "generate_arxiv_digest.js",
+  "scripts": {
+    "test": "echo \"Error: no test specified\" && exit 1"
+  },
+  "repository": {
+    "type": "git",
+    "url": "https://code.indigofloyd.space/ClawLab/RobotDaily.git"
+  },
+  "keywords": [],
+  "author": "",
+  "license": "ISC",
+  "dependencies": {
+    "mustache": "^4.2.0"
+  }
+}