# GLM-4.7-flash-128k 测试报告（代码场景）

时间：2026-02-22

## 1) 已实施配置变更

- Provider API：`openai-completions` -> `ollama`（原生）
- Base URL：`http://127.0.0.1:11434/v1` -> `http://127.0.0.1:11434`
- 模型参数：
  - `contextWindow`: 131072 -> 65536
  - `maxTokens`: 16384 -> 8192
  - `agents.defaults.models[ollama/glm-4.7-flash-128k].params`：
    - `temperature`: 0.2
    - `num_ctx`: 65536
    - `num_predict`: 4096

## 2) 环境与硬件快照

- CPU: i3-12100F (4C/8T)
- RAM: 15GiB
- GPU:
  - RTX 2080 Ti 22GB
  - Tesla P100 16GB
- Ollama: 0.16.3
- OpenClaw: 2026.2.19-2

## 3) 压测结果（代码编写多轮）

测试文件：`reports/ollama-coding-bench.json`

三组配置（ctx32k / ctx64k / ctx96k），每组 5 轮代码任务。

结果：
- ctx32k: 第1轮超时
- ctx64k: 第1轮超时
- ctx96k: 第1轮超时

额外单轮短任务验证（ctx64k, num_predict=256）：
- 成功返回，耗时约 13.19s

## 4) 结论

1. 你的模型“能工作”，但在“长输出+代码多轮”下非常容易触发超时。
2. 当前主要瓶颈不是消息通道，而是推理吞吐（长响应生成速度不足）。
3. 5 轮代码压测失败说明：当前参数对该硬件+模型规模来说仍偏激进。

## 5) 推荐稳定参数（优先稳定）

建议改成：
- `num_ctx`: 32768
- `num_predict`: 1024（必要时 768）
- `temperature`: 0.2

使用策略：
- 代码场景默认先短答，必要时再“继续”生成下一段
- 避免一次性超长代码块

## 6) 可观测性（你能确认我是否在工作）

建议固定用：
- `openclaw status`
- `openclaw models status`
- `ollama ps`
- `tail -f /tmp/openclaw/openclaw-$(date +%F).log | grep -Ei "embedded run (start|done|timeout)|FailoverError|timed out"`

这样你可以实时看到：是否在跑、是否超时、是否切换fallback。