- <!DOCTYPE html>
- <html lang="zh-CN">
- <head>
- <meta charset="UTF-8">
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
- <title>3 - Naive Bayes</title>
- <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.16.9/dist/katex.min.css">
- <style>
- body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif; line-height: 1.6; max-width: 800px; margin: 0 auto; padding: 20px; background: #1a1a2e; color: #eaeaea; }
- h1 { color: #e94560; border-bottom: 2px solid #e94560; padding-bottom: 10px; }
- h2 { color: #0f3460; background: #16213e; padding: 10px; border-left: 4px solid #e94560; margin-top: 30px; }
- .module { background: #0f3460; padding: 15px; margin: 20px 0; border-radius: 5px; }
- .module-title { color: #e94560; font-weight: bold; margin-bottom: 10px; }
- code { background: #1a1a2e; padding: 2px 6px; border-radius: 3px; color: #f0f6f6; }
- pre { background: #1a1a2e; padding: 15px; border-radius: 5px; overflow-x: auto; }
- .symbol-map { background: #16213e; padding: 10px; margin: 5px 0; border-left: 3px solid #0f3460; }
- .warning { background: #e94560; color: #fff; padding: 10px; border-radius: 5px; margin: 10px 0; }
- .youtube { background: #ff0000; color: #fff; padding: 10px; border-radius: 5px; display: inline-block; margin: 10px 0; }
- </style>
- </head>
- <body>
- <h1>👾 Day 3: Naive Bayes</h1>
-
- <div class="module">
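- <div class="module-title">1️⃣ The Technical Debt & Evolution</div>
- A generative classifier that models the class-conditional joint distribution $P(x^{(1)}, \ldots, x^{(d)} | c_k)$ directly has to estimate one probability for every combination of feature values. With $K$ classes and $d$ features that each take $V$ values, that is on the order of $K V^d$ parameters, far more than a realistic training set can support, so the estimates overfit. Naive Bayes assumes the features are conditionally independent given the class, which shrinks the count to about $K d V$.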
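A quick count makes the gap concrete. This is a sketch under assumed settings: the helper names and the 2-class, 30-binary-keyword setup are hypothetical, not taken from the exercise. It counts only the free parameters of the class-conditional distributions and leaves out the $K - 1$ prior parameters that both models share.

```python
def full_joint_params(num_classes: int, num_features: int, num_values: int) -> int:
    # Full table P(x^(1), ..., x^(d) | c_k): one distribution over V**d joint
    # outcomes per class, i.e. V**d - 1 free parameters per class.
    return num_classes * (num_values ** num_features - 1)


def naive_bayes_params(num_classes: int, num_features: int, num_values: int) -> int:
    # Under conditional independence: one distribution over V values per
    # (class, feature) pair, i.e. V - 1 free parameters each.
    return num_classes * num_features * (num_values - 1)


# Hypothetical setting: spam vs. ham, 30 binary keyword indicators.
print(full_joint_params(2, 30, 2))   # 2147483646
print(naive_bayes_params(2, 30, 2))  # 60
```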
- </div>
- <div class="module">
- <div class="module-title">2️⃣【直觉建立】Visual Intuition</div>
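- Imagine deciding whether an email is spam. Start from a prior (how common spam is overall), then update it with how likely each keyword in the email is under spam versus normal mail, and pick the class with the higher posterior probability.
- <div class="youtube">🎬 Search on Bilibili: <code>朴素贝叶斯 直观解释</code> ("Naive Bayes intuitive explanation")</div>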
- </div>
- <div class="module">
- <div class="module-title">3️⃣ The Symbol Decoder</div>
-
- <div class="symbol-map"><strong>$P(c_k)$</strong> → <code>self.class_priors_[k]</code> (class prior probability)</div>
- <div class="symbol-map"><strong>$P(x^{(j)} | c_k)$</strong> → <code>self.feature_probs_[k, j]</code> (class-conditional probability of feature $j$)</div>
- <div class="symbol-map"><strong>$P(c_k | x)$</strong> → <code>posterior</code> (posterior probability)</div>
- <div class="symbol-map"><strong>$\alpha$</strong> → <code>alpha</code> (Laplace smoothing coefficient)</div>
-
- </div>
- <div class="module">
- <div class="module-title">4️⃣ The Math</div>
-
- ### Bayes' Theorem
- $$P(c_k | x) = \frac{P(x | c_k) P(c_k)}{P(x)}$$
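Here is a minimal numeric sketch of the formula, using hypothetical spam-filter numbers. The priors and keyword rates below are made up for illustration, and the helper name `posterior` is ours. The denominator $P(x)$ comes from the law of total probability, $P(x) = \sum_k P(x | c_k) P(c_k)$.

```python
def posterior(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    # Numerator of Bayes' theorem for each class: P(x | c_k) * P(c_k).
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # Evidence P(x) via the law of total probability: sum over all classes.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers: 30% of mail is spam; the word "free" appears in
# 20% of spam and 1% of ham.
post = posterior({"spam": 0.3, "ham": 0.7}, {"spam": 0.2, "ham": 0.01})
print(round(post["spam"], 4))  # 0.8955
```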
- ### The Naive Conditional Independence Assumption
- $$P(x | c_k) = P(x^{(1)}, x^{(2)}, ..., x^{(d)} | c_k) = \prod_{j=1}^{d} P(x^{(j)} | c_k)$$
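Under this assumption, a class's likelihood is just a product of per-feature factors. The sketch below assumes binary keyword-presence features. That is a Bernoulli-style model, one common choice; the text does not fix the feature type. The probabilities and the helper `naive_likelihood` are hypothetical.

```python
import math

# Hypothetical per-class probabilities that each keyword appears, P(x^(j) = 1 | c_k).
word_probs = {
    "spam": {"free": 0.20, "winner": 0.10, "meeting": 0.01},
    "ham": {"free": 0.01, "winner": 0.002, "meeting": 0.15},
}


def naive_likelihood(present: dict[str, bool], probs: dict[str, float]) -> float:
    # Conditional independence: P(x | c_k) is the product of per-feature terms.
    # A present word contributes p; an absent word contributes 1 - p.
    return math.prod(probs[w] if present[w] else 1.0 - probs[w] for w in probs)


email = {"free": True, "winner": True, "meeting": False}
print(naive_likelihood(email, word_probs["spam"]))  # 0.2 * 0.1 * 0.99
print(naive_likelihood(email, word_probs["ham"]))   # 0.01 * 0.002 * 0.85
```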
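- ### Maximum A Posteriori Decision Rule
- $$\hat{y} = \arg\max_{c_k} P(c_k) \prod_{j=1}^{d} P(x^{(j)} | c_k)$$
- The evidence $P(x)$ is the same for every class, so it can be dropped from the $\arg\max$.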
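Multiplying thousands of probabilities that are each well below 1 underflows to 0.0 in floating point. Implementations therefore usually maximize the sum of logarithms instead, and because $\log$ is monotonic the $\arg\max$ is unchanged. The sketch below uses hypothetical numbers and a hypothetical helper `predict_log_space`.

```python
import math


def predict_log_space(log_priors: dict[str, float],
                      log_likelihoods: dict[str, list[float]]) -> str:
    # Score each class by log P(c_k) + sum_j log P(x^(j) | c_k).
    # log is monotonic, so the argmax matches the product form, but summing
    # logs avoids floating-point underflow.
    scores = {c: log_priors[c] + sum(log_likelihoods[c]) for c in log_priors}
    return max(scores, key=scores.get)


# Hypothetical: 2000 features, each with likelihood 0.01 under "spam"
# and 0.009 under "ham".
d = 2000
log_priors = {"spam": math.log(0.3), "ham": math.log(0.7)}
log_liks = {"spam": [math.log(0.01)] * d, "ham": [math.log(0.009)] * d}
print(0.01 ** d)                                # 0.0: the raw product underflows
print(predict_log_space(log_priors, log_liks))  # spam
```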
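- ### Laplace Smoothing
- $$P(x^{(j)} = v | c_k) = \frac{\sum_{i=1}^{N} \mathbb{I}(x_i^{(j)} = v, y_i = c_k) + \alpha}{\sum_{i=1}^{N} \mathbb{I}(y_i = c_k) + \alpha V_j}$$
- Here $x_i^{(j)}$ is feature $j$ of training sample $i$, $V_j$ is the number of distinct values feature $j$ can take, and $\alpha \ge 0$ is the smoothing coefficient ($\alpha = 1$ gives Laplace smoothing). Without smoothing, a feature value never seen with class $c_k$ in training gets probability 0, which zeroes out the whole product in the decision rule.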
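The following sketches the smoothed estimate for integer-coded categorical features. It assumes NumPy and a hypothetical helper `smoothed_feature_probs`, which is not the exercise's API. Because $V_j$ can differ from feature to feature, it returns one $K \times V_j$ table per feature rather than a single array.

```python
import numpy as np


def smoothed_feature_probs(X, y, n_classes, n_values, alpha=1.0):
    """Additive-smoothed estimates of P(x^(j) = v | c_k).

    X: (N, d) integer array, X[i, j] in {0, ..., V_j - 1}
    y: (N,) integer array of class indices in {0, ..., K - 1}
    n_values: list with V_j for each feature j
    Returns a list whose entry j is a (K, V_j) array: probs[j][k, v].
    """
    probs = []
    for j, v_j in enumerate(n_values):
        table = np.zeros((n_classes, v_j))
        for k in range(n_classes):
            xs = X[y == k, j]                        # feature j of class-k samples
            counts = np.bincount(xs, minlength=v_j)  # numerator counts per value
            table[k] = (counts + alpha) / (len(xs) + alpha * v_j)
        probs.append(table)
    return probs


# Hypothetical toy data: 4 samples, 2 features (V_0 = 3, V_1 = 2), 2 classes.
X = np.array([[0, 1], [0, 0], [1, 1], [2, 1]])
y = np.array([0, 0, 1, 1])
probs = smoothed_feature_probs(X, y, n_classes=2, n_values=[3, 2], alpha=1.0)
print(probs[0][0])  # class 0, feature 0: (2+1)/5, (0+1)/5, (0+1)/5
```

With `alpha=0` the same code reduces to the raw frequency estimate. In that case the class-0 row of feature 0 becomes zero for values 1 and 2.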
- </div>
- <div class="module">
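- <div class="module-title">5️⃣ The Optimization Bottleneck</div>
- Training is a single counting pass over the data. However, the table of (class, feature, value) frequencies has $K \sum_j V_j$ cells, and for high-dimensional sparse inputs such as bag-of-words text almost all of them are zero. Storing only the observed combinations in a hash table or sparse matrix keeps memory and time proportional to the number of non-zero entries.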
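Here is a sketch of the hash-table approach for bag-of-words text, using only the standard library (`collections.Counter`). The token lists, labels and the helper `count_sparse` are hypothetical. It counts word frequencies per class, which are the statistics a multinomial Naive Bayes model would use.

```python
from collections import Counter, defaultdict


def count_sparse(docs, labels):
    """Count class frequencies and per-class word frequencies.

    Only (class, word) pairs that actually occur get an entry, so memory grows
    with the number of distinct observed pairs rather than K * vocabulary size.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> Counter(word -> count)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    return class_counts, word_counts


# Hypothetical tokenized emails and labels.
docs = [["free", "winner", "free"], ["meeting", "agenda"], ["free", "meeting"]]
labels = ["spam", "ham", "ham"]
class_counts, word_counts = count_sparse(docs, labels)
print(class_counts["ham"])            # 2
print(word_counts["spam"]["free"])    # 2
print(word_counts["spam"]["agenda"])  # 0: a missing key reads as 0 and is not stored
```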
- </div>
- <div class="module">
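- <div class="module-title">6️⃣ The OJ Mission</div>
- <div class="warning">🎯 Task: <code>cd exercises/ && python3 day3_task.py</code></div>
- Implement the <code>fit</code> and <code>predict</code> functions of a Gaussian Naive Bayes classifier and check its classification accuracy on the Iris dataset. Because the Iris features are continuous measurements, the counted $P(x^{(j)} | c_k)$ from the smoothing formula is replaced by a normal density whose mean and variance are estimated separately for each class and feature.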
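For reference, here is the Gaussian class-conditional density in the same notation. It writes $x_i^{(j)}$ for feature $j$ of training sample $i$ and $N_k$ for the number of training samples in class $c_k$. The estimates below are the maximum-likelihood ones. In practice a small constant is often added to $\sigma_{kj}^2$ to avoid division by zero; scikit-learn's `GaussianNB` does this through its `var_smoothing` parameter.

$$P(x^{(j)} | c_k) = \frac{1}{\sqrt{2\pi\sigma_{kj}^2}} \exp\left(-\frac{(x^{(j)} - \mu_{kj})^2}{2\sigma_{kj}^2}\right)$$
$$\mu_{kj} = \frac{1}{N_k} \sum_{i:\, y_i = c_k} x_i^{(j)}, \qquad \sigma_{kj}^2 = \frac{1}{N_k} \sum_{i:\, y_i = c_k} \left(x_i^{(j)} - \mu_{kj}\right)^2$$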
- </div>
- <script src="https://cdn.jsdelivr.net/npm/katex@0.16.9/dist/katex.min.js"></script>
- <script src="https://cdn.jsdelivr.net/npm/katex@0.16.9/dist/contrib/auto-render.min.js"></script>
- <script>
- renderMathInElement(document.body, {
- delimiters: [
- {left: '$$', right: '$$', display: true},
- {left: '$', right: '$', display: false}
- ],
- throwOnError: false
- });
- </script>
- </body>
- </html>