<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>2 - K-Nearest Neighbors</title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.16.9/dist/katex.min.css">
<style>
body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif; line-height: 1.6; max-width: 800px; margin: 0 auto; padding: 20px; background: #1a1a2e; color: #eaeaea; }
h1 { color: #e94560; border-bottom: 2px solid #e94560; padding-bottom: 10px; }
h2 { color: #0f3460; background: #16213e; padding: 10px; border-left: 4px solid #e94560; margin-top: 30px; }
.module { background: #0f3460; padding: 15px; margin: 20px 0; border-radius: 5px; }
.module-title { color: #e94560; font-weight: bold; margin-bottom: 10px; }
code { background: #1a1a2e; padding: 2px 6px; border-radius: 3px; color: #f0f6f6; }
pre { background: #1a1a2e; padding: 15px; border-radius: 5px; overflow-x: auto; }
.symbol-map { background: #16213e; padding: 10px; margin: 5px 0; border-left: 3px solid #0f3460; }
.warning { background: #e94560; color: #fff; padding: 10px; border-radius: 5px; margin: 10px 0; }
.youtube { background: #ff0000; color: #fff; padding: 10px; border-radius: 5px; display: inline-block; margin: 10px 0; }
</style>
</head>
<body>
<h1>👾 Day 2: K-Nearest Neighbors</h1>

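<div class="module">
<div class="module-title">1️⃣ The Technical Debt & Evolution</div>
Model-based algorithms learn parameters in an explicit training phase, so incorporating new samples usually means retraining. K-nearest neighbors (KNN) is instance-based (lazy) learning: fitting amounts to storing the training data, and all the real computation is deferred to prediction time.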
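
A minimal sketch of what "lazy" means in code. The class `KNNClassifier` and its `add_sample` method are hypothetical illustrations, with attribute names borrowed from the symbol table (`self.X_train`, `self.y_train`, `self.k`): `fit` learns nothing and only memorizes the data, so a new labeled sample becomes usable the moment it is appended.

```python
from typing import List, Sequence


class KNNClassifier:
    """Hypothetical lazy learner: fitting only memorizes the training data."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.X_train: List[List[float]] = []
        self.y_train: List[int] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "KNNClassifier":
        # No parameters are learned: the stored samples themselves are the model.
        if len(X) != len(y):
            raise ValueError("X and y must have the same length N")
        self.X_train = [list(row) for row in X]
        self.y_train = list(y)
        return self

    def add_sample(self, x: Sequence[float], label: int) -> None:
        # Hypothetical helper: a new labeled sample is usable at once, no retraining.
        self.X_train.append(list(x))
        self.y_train.append(label)


model = KNNClassifier(k=3).fit([[0.0, 0.0], [1.0, 1.0]], [0, 1])
model.add_sample([2.0, 2.0], 1)
print(len(model.X_train), model.y_train)  # 3 [0, 1, 1]
```

The trade-off is that every prediction must now search the stored samples, which is the bottleneck discussed in section 5.
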
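</div>
<div class="module">
<div class="module-title">2️⃣ Visual Intuition</div>
Picture labeled points on a plane. If most of the k nearest neighbors of a new point are red, the new point is predicted to be red as well. In the plain majority vote each of the k neighbors counts equally; the distance-weighted variant lets closer neighbors carry more influence.
<div class="youtube">🎬 Search on Bilibili: <code>K 近邻算法 直观解释</code> ("KNN algorithm intuitive explanation")</div>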
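
The sketch below contrasts the two voting schemes on a handful of made-up 2-D points. The coordinates, the 1/distance weighting, and the helper names `k_nearest`, `majority_vote` and `weighted_vote` are illustrative assumptions. With k = 5 the two nearby red points are outvoted by three distant blue ones under the plain vote, while the weighted vote follows the closer red points.

```python
import math
from collections import Counter, defaultdict

# Toy 2-D points made up for illustration: (coordinates, color label).
points = [((0.0, 0.0), "red"), ((0.2, 0.1), "red"), ((2.0, 2.0), "blue"),
          ((2.1, 1.9), "blue"), ((1.9, 2.2), "blue")]


def k_nearest(query, data, k):
    # Sort every stored point by Euclidean distance to the query and keep k.
    return sorted(data, key=lambda item: math.dist(query, item[0]))[:k]


def majority_vote(neighbors):
    # Each neighbor contributes exactly one vote.
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def weighted_vote(query, neighbors, eps=1e-9):
    # Each neighbor votes with weight 1 / distance, so closer points count more.
    scores = defaultdict(float)
    for point, label in neighbors:
        scores[label] += 1.0 / (math.dist(query, point) + eps)
    return max(scores, key=scores.get)


query = (0.3, 0.3)
nearest = k_nearest(query, points, k=5)
print(majority_vote(nearest))         # blue
print(weighted_vote(query, nearest))  # red
```

The small `eps` only guards against division by zero when the query coincides with a stored point.
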
</div>
<div class="module">
<div class="module-title">3️⃣ The Symbol Decoder</div>

<div class="symbol-map"><strong>$x$</strong> → <code>input_tensor</code> (sample to classify, shape: [d])</div>
<div class="symbol-map"><strong>$X$</strong> → <code>self.X_train</code> (training samples, shape: [N, d]; row $i$ is $x^{(i)}$)</div>
<div class="symbol-map"><strong>$y$</strong> → <code>label</code> (the label to predict for $x$)</div>
<div class="symbol-map"><strong>$Y$</strong> → <code>self.y_train</code> (training labels, shape: [N]; entry $i$ is $y^{(i)}$)</div>
<div class="symbol-map"><strong>$k$</strong> → <code>self.k</code> (number of neighbors)</div>
<div class="symbol-map"><strong>$p$</strong> → <code>p</code> (order of the $L_p$ norm; $p=2$ gives the Euclidean distance)</div>

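
To make the shapes concrete, here is a sketch using plain nested lists; the helper `check_shapes` is hypothetical, and the real exercise may use NumPy arrays instead. It also shows the one pitfall in the mapping: the formulas index from 1, Python indexes from 0.

```python
from typing import Sequence, Tuple


def check_shapes(input_tensor: Sequence[float],
                 X_train: Sequence[Sequence[float]],
                 y_train: Sequence[int]) -> Tuple[int, int]:
    """Hypothetical helper: verify x is [d], X is [N, d] and Y is [N]."""
    N, d = len(X_train), len(input_tensor)
    if len(y_train) != N:
        raise ValueError(f"y_train has {len(y_train)} labels, expected N={N}")
    for i, row in enumerate(X_train):
        if len(row) != d:
            raise ValueError(f"X_train[{i}] has {len(row)} features, expected d={d}")
    return N, d


X_train = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # row i (0-based) holds x^(i+1)
y_train = [0, 1, 1]                              # entry i (0-based) holds y^(i+1)
input_tensor = [2.5, 3.5]                        # the query x

print(check_shapes(input_tensor, X_train, y_train))  # (3, 2)
# The math is 1-based, Python is 0-based: x^(2)_1 is X_train[1][0].
print(X_train[1][0])  # 3.0
```
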
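</div>
<div class="module">
<div class="module-title">4️⃣ The Math</div>

### Distance ($L_p$ norm)
$$d(x, x^{(i)}) = \left( \sum_{j=1}^{d} \left| x_j - x^{(i)}_j \right|^p \right)^{1/p}, \quad p \ge 1$$
where $x_j$ is the $j$-th feature of $x$ and $x^{(i)}_j$ is the $j$-th feature of the $i$-th training sample $x^{(i)}$. When $p=2$ this is the Euclidean distance:
$$d(x, x^{(i)}) = \sqrt{\sum_{j=1}^{d} \left( x_j - x^{(i)}_j \right)^2}$$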
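
A direct transcription of the $L_p$ formula in plain Python, as a sketch: the names `lp_distance` and `distances_to_all` are hypothetical, and the check $p \ge 1$ mirrors the condition under which the formula defines a metric. Computing one distance per training row is exactly the brute-force $O(N \cdot d)$ step.

```python
from typing import List, Sequence


def lp_distance(x: Sequence[float], x_i: Sequence[float], p: float = 2.0) -> float:
    """d(x, x^(i)) = (sum_j |x_j - x^(i)_j|^p)^(1/p)."""
    if len(x) != len(x_i):
        raise ValueError("x and x_i must have the same number of features d")
    if p < 1:
        raise ValueError("p must be >= 1 for the distance to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, x_i)) ** (1.0 / p)


def distances_to_all(x: Sequence[float], X_train: Sequence[Sequence[float]],
                     p: float = 2.0) -> List[float]:
    # Brute force: one distance per training row x^(i), O(N * d) in total.
    return [lp_distance(x, row, p) for row in X_train]


x = [0.0, 0.0]
X_train = [[3.0, 4.0], [1.0, 1.0]]
print(distances_to_all(x, X_train, p=2))  # [5.0, 1.4142135623730951]
print(distances_to_all(x, X_train, p=1))  # [7.0, 2.0]
```

For $p=1$ the same function gives the Manhattan distance, which is why the example prints $3 + 4 = 7$ for the first row.
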
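### Voting rule
$$\hat{y} = \arg\max_{c} \sum_{i \in N_k(x)} \mathbb{I}\left(y^{(i)} = c\right)$$
where $N_k(x)$ is the index set of the $k$ training samples closest to $x$ under $d$, and $\mathbb{I}(\cdot)$ is the indicator function (1 if its argument is true, 0 otherwise). The sum counts how many of the $k$ neighbors carry class $c$, and the most frequent class is the prediction.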
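
The voting rule as a sketch: `Counter` computes the indicator sums for every class at once, and the arg max picks the winner. The formula does not say how to break ties, so the rule used here (the tied class owning the nearest neighbor wins, assuming labels are passed nearest-first) is an assumption, as is the function name `vote`.

```python
from collections import Counter
from typing import Hashable, Sequence


def vote(neighbor_labels: Sequence[Hashable]) -> Hashable:
    """Return argmax_c of the sum over the k neighbors of I(y^(i) = c)."""
    if not neighbor_labels:
        raise ValueError("need at least one neighbor (k >= 1)")
    counts = Counter(neighbor_labels)  # counts[c] == number of neighbors with label c
    best = max(counts.values())
    # Tie-breaking is an assumption (the formula leaves it open): with labels
    # ordered nearest-first, the tied class that owns the nearest neighbor wins.
    for label in neighbor_labels:
        if counts[label] == best:
            return label


# Labels of the k nearest neighbors, ordered from nearest to farthest.
print(vote([2, 7, 7, 2, 7]))  # 7
print(vote([3, 1, 1, 3]))     # 3 (tie between 1 and 3; the nearest neighbor is a 3)
```
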
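</div>
<div class="module">
<div class="module-title">5️⃣ The Optimization Bottleneck</div>
Brute-force prediction computes the distance from the query to every training sample, which costs $O(N \cdot d)$ per query. Space-partitioning indexes such as KD-trees or ball trees, built once in advance, can cut the average query cost to roughly $O(\log N)$ when $d$ is small; as $d$ grows their pruning becomes less effective, and in high dimensions queries can approach brute-force cost.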
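
A minimal pure-Python KD-tree sketch: median split with the axis cycling by depth, plus a bounded max-heap holding the current k best candidates. It illustrates the pruning idea and is not a tuned index (rebuilding sorted lists at every level is slower than necessary); in practice a library implementation such as `scipy.spatial.KDTree` or scikit-learn's `KDTree` / `BallTree` would be the usual choice. The names `Node`, `build` and `knn_search` are hypothetical.

```python
import heapq
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    label: int
    axis: int  # feature index this node splits on
    left: Optional["Node"]
    right: Optional["Node"]


def build(items: Sequence[Tuple[Point, int]], depth: int = 0) -> Optional[Node]:
    """Build a KD-tree by splitting at the median, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(point, label, axis,
                build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))


def knn_search(root: Optional[Node], query: Point, k: int) -> List[Tuple[float, int]]:
    """Return the k nearest (distance, label) pairs, nearest first."""
    heap: List[Tuple[float, int]] = []  # max-heap via negated distances

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        dist = math.dist(query, node.point)
        if len(heap) < k:
            heapq.heappush(heap, (-dist, node.label))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, node.label))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Prune: the far side can only help if the splitting plane is closer
        # than the current k-th best distance.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg_dist, label) for neg_dist, label in heap)


rng = random.Random(0)
data = [((rng.random(), rng.random()), rng.randrange(3)) for _ in range(200)]
tree = build(data)
query = (0.5, 0.5)
fast = knn_search(tree, query, k=5)
brute = sorted((math.dist(query, p), label) for p, label in data)[:5]
print([d for d, _ in fast] == [d for d, _ in brute])  # True
```

The pruning test is what buys the speed-up: a whole subtree is skipped whenever the splitting plane is farther away than the current k-th best neighbor. In high dimensions that test rarely succeeds, which is why the gain fades.
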
</div>
<div class="module">
<div class="module-title">6️⃣ The OJ Mission</div>
<div class="warning">🎯 Task: <code>cd exercises/ &amp;&amp; python3 day2_task.py</code></div>
Implement the KNN <code>distance</code> and <code>predict</code> functions, then measure on the handwritten-digit dataset how the choice of k affects accuracy.
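
One way to structure the experiment, as a sketch only. The signatures of `distance` and `predict` are assumptions (the real `day2_task.py` may define them differently), and the data are two synthetic Gaussian blobs standing in for the handwritten-digit dataset, so the printed accuracies say nothing about the real task.

```python
import math
import random
from collections import Counter


def distance(x, x_i):
    # Euclidean distance (p = 2); the exercise's real signature may differ.
    return math.dist(x, x_i)


def predict(X_train, y_train, x, k):
    # Brute-force k-NN: sort training indices by distance, then majority-vote.
    nearest = sorted(range(len(X_train)), key=lambda i: distance(x, X_train[i]))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def accuracy(X_train, y_train, X_test, y_test, k):
    hits = sum(predict(X_train, y_train, x, k) == y for x, y in zip(X_test, y_test))
    return hits / len(y_test)


# Synthetic stand-in data (two noisy 2-D blobs), NOT the handwritten-digit dataset.
rng = random.Random(42)


def blob(cx, cy, label, n):
    return [((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label) for _ in range(n)]


data = blob(0.0, 0.0, 0, 60) + blob(2.0, 2.0, 1, 60)
rng.shuffle(data)
train, test = data[:90], data[90:]
X_train, y_train = [p for p, _ in train], [c for _, c in train]
X_test, y_test = [p for p, _ in test], [c for _, c in test]

# Sweep k on a held-out split; odd values avoid ties with two classes.
for k in (1, 3, 5, 15, 45):
    print(k, round(accuracy(X_train, y_train, X_test, y_test, k), 3))
```

Typically a very small k follows noise in the training set (high variance), while a very large k drifts toward the overall majority class (high bias); the sweep makes that trade-off visible on held-out data.
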
</div>
<script src="https://cdn.jsdelivr.net/npm/katex@0.16.9/dist/katex.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/katex@0.16.9/dist/contrib/auto-render.min.js"></script>
<script>
renderMathInElement(document.body, {
  delimiters: [
    {left: '$$', right: '$$', display: true},
    {left: '$', right: '$', display: false}
  ],
  throwOnError: false
});
</script>
</body>
</html>