<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>3 - Naive Bayes</title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.16.9/dist/katex.min.css">
<style>
body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif; line-height: 1.6; max-width: 800px; margin: 0 auto; padding: 20px; background: #1a1a2e; color: #eaeaea; }
h1 { color: #e94560; border-bottom: 2px solid #e94560; padding-bottom: 10px; }
h2 { color: #0f3460; background: #16213e; padding: 10px; border-left: 4px solid #e94560; margin-top: 30px; }
.module { background: #0f3460; padding: 15px; margin: 20px 0; border-radius: 5px; }
.module-title { color: #e94560; font-weight: bold; margin-bottom: 10px; }
code { background: #1a1a2e; padding: 2px 6px; border-radius: 3px; color: #f0f6f6; }
pre { background: #1a1a2e; padding: 15px; border-radius: 5px; overflow-x: auto; }
.symbol-map { background: #16213e; padding: 10px; margin: 5px 0; border-left: 3px solid #0f3460; }
.warning { background: #e94560; color: #fff; padding: 10px; border-radius: 5px; margin: 10px 0; }
.youtube { background: #ff0000; color: #fff; padding: 10px; border-radius: 5px; display: inline-block; margin: 10px 0; }
</style>
</head>
<body>
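<h1>👾 Day 3: Naive Bayes</h1>
<div class="module">
<div class="module-title">1️⃣ [Technical Debt and Motivation] The Technical Debt & Evolution</div>
Estimating the full joint distribution $P(x | c_k)$ of all features within each class takes a number of parameters that grows exponentially with the number of discrete features, so the model overfits easily. Naive Bayes simplifies the model by assuming the features are conditionally independent given the class.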
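<p>To make the parameter blow-up concrete, here is a small sketch that counts free parameters for discrete features. The function names and the example sizes (2 classes, 10 binary features) are illustrative assumptions, not part of the course code, and class priors are left out of both counts.</p>

```python
def joint_param_count(num_classes, num_features, values_per_feature):
    """Free parameters needed to tabulate P(x | c_k) with no independence assumption.

    Each class needs one probability per joint outcome, minus one because
    the probabilities must sum to 1.
    """
    return num_classes * (values_per_feature ** num_features - 1)


def naive_param_count(num_classes, num_features, values_per_feature):
    """Free parameters under the naive conditional independence assumption.

    Each class needs one small table per feature, again minus one per table.
    """
    return num_classes * num_features * (values_per_feature - 1)


if __name__ == "__main__":
    # Hypothetical problem size: 2 classes, 10 binary features.
    print("full joint:", joint_param_count(2, 10, 2))
    print("naive Bayes:", naive_param_count(2, 10, 2))
```

<p>With these sizes the full joint table needs 2046 parameters, while the naive model needs 20.</p>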
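</div>
<div class="module">
<div class="module-title">2️⃣ [Building Intuition] Visual Intuition</div>
Picture deciding whether an email is spam: combine how likely its keywords are to appear in spam versus normal mail with your prior belief about how common spam is, and you get the posterior probability that this particular email is spam.
<div class="youtube">🎬 Search on Bilibili: <code>朴素贝叶斯 直观解释</code> (naive Bayes, intuitive explanation)</div>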
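<p>The spam intuition can be worked through numerically for a single keyword. This is a minimal sketch; the function name and all probabilities below are made-up values chosen only for illustration.</p>

```python
def spam_posterior(prior_spam, p_word_given_spam, p_word_given_ham):
    """Posterior P(spam | keyword present) from Bayes' theorem, for one keyword."""
    prior_ham = 1.0 - prior_spam
    joint_spam = p_word_given_spam * prior_spam  # P(word | spam) * P(spam)
    joint_ham = p_word_given_ham * prior_ham     # P(word | ham) * P(ham)
    # The evidence P(word) is the sum of the two joint terms.
    return joint_spam / (joint_spam + joint_ham)


# Hypothetical numbers: 20% of mail is spam, the keyword shows up in
# 50% of spam and in 5% of normal mail.
posterior = spam_posterior(prior_spam=0.2, p_word_given_spam=0.5, p_word_given_ham=0.05)
print(round(posterior, 3))  # 0.714
```

<p>Even though only 20% of mail is assumed to be spam, seeing the keyword raises the probability to about 71%, because the keyword is ten times more likely in spam than in normal mail.</p>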
</div>
<div class="module">
<div class="module-title">3️⃣ [Symbol Decoding Dictionary] The Symbol Decoder</div>
<div class="symbol-map"><strong>$P(c_k)$</strong> → <code>self.class_priors_[k]</code> (class prior probability)</div>
<div class="symbol-map"><strong>$P(x^{(j)} | c_k)$</strong> → <code>self.feature_probs_[k, j]</code> (conditional probability of feature $j$ given class $c_k$)</div>
<div class="symbol-map"><strong>$P(c_k | x)$</strong> → <code>posterior</code> (posterior probability)</div>
<div class="symbol-map"><strong>$\alpha$</strong> → <code>alpha</code> (Laplace smoothing coefficient)</div>
</div>
<div class="module">
<div class="module-title">4️⃣ [Core Derivation] The Math</div>
### Bayes' theorem
$$P(c_k | x) = \frac{P(x | c_k) P(c_k)}{P(x)}$$
### Naive conditional independence assumption
$$P(x | c_k) = P(x^{(1)}, x^{(2)}, \ldots, x^{(d)} | c_k) = \prod_{j=1}^{d} P(x^{(j)} | c_k)$$
### Maximizing the posterior
The denominator $P(x)$ is the same for every class, so it can be dropped when comparing classes:
$$\hat{y} = \text{argmax}_{c_k} P(c_k) \prod_{j=1}^{d} P(x^{(j)} | c_k)$$
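<p>In code, the product over features is usually replaced by a sum of logarithms: multiplying many small probabilities underflows, and the logarithm is monotonic, so the argmax is unchanged. The sketch below assumes integer-coded features and a hypothetical probability table of shape <code>(K, d, V)</code>; the function name and the toy numbers are illustrative only.</p>

```python
import numpy as np


def predict_log_space(x, class_priors, feature_probs):
    """Return the index k that maximizes log P(c_k) + sum_j log P(x^(j) | c_k).

    x:             length-d sequence of integer feature values
    class_priors:  array of shape (K,)
    feature_probs: array of shape (K, d, V); entry [k, j, v] is P(x^(j) = v | c_k)
    """
    class_priors = np.asarray(class_priors, dtype=float)
    feature_probs = np.asarray(feature_probs, dtype=float)
    log_posterior = np.log(class_priors)  # new array, safe to update in place
    for j, v in enumerate(x):
        log_posterior += np.log(feature_probs[:, j, v])
    return int(np.argmax(log_posterior))


# Toy model with K = 2 classes and d = 2 binary features (made-up numbers).
priors = [0.5, 0.5]
probs = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0: mostly value 0
    [[0.3, 0.7], [0.4, 0.6]],  # class 1: mostly value 1
]
print(predict_log_space([0, 0], priors, probs))  # 0
print(predict_log_space([1, 1], priors, probs))  # 1
```

<p>Note that a zero entry in the table would give a log of minus infinity, which is one practical reason for smoothing the estimates.</p>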
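### Laplace smoothing
$$P(x^{(j)} = v | c_k) = \frac{\sum_{i=1}^{N} \mathbb{I}(x^{(i)(j)} = v, y^{(i)} = c_k) + \alpha}{\sum_{i=1}^{N} \mathbb{I}(y^{(i)} = c_k) + \alpha \cdot V}$$
Here $V$ is the number of distinct values that feature $x^{(j)}$ can take and $\alpha$ is the smoothing coefficient; $\alpha = 1$ gives classic Laplace smoothing, and $\alpha = 0$ reduces to the plain maximum likelihood estimate.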
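<p>The smoothed estimate can be sketched with NumPy as follows. This is one possible implementation under simplifying assumptions: features are integer-coded, every feature shares the same number of values <code>num_values</code>, and the function name and return layout are hypothetical rather than taken from the course code.</p>

```python
import numpy as np


def fit_categorical_nb(X, y, num_classes, num_values, alpha=1.0):
    """Estimate class priors and smoothed conditional probabilities.

    X: (N, d) integer array with values in range(num_values)
    y: (N,) integer array with values in range(num_classes)
    Returns (class_priors, feature_probs) with shapes (K,) and (K, d, V).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n_samples, n_features = X.shape

    class_counts = np.bincount(y, minlength=num_classes).astype(float)
    class_priors = class_counts / n_samples

    # counts[k, j, v] = number of samples of class k whose feature j equals v
    counts = np.zeros((num_classes, n_features, num_values))
    for k in range(num_classes):
        X_k = X[y == k]
        for j in range(n_features):
            counts[k, j] = np.bincount(X_k[:, j], minlength=num_values)

    # Add alpha to every count and alpha * V to every class total.
    feature_probs = (counts + alpha) / (class_counts[:, None, None] + alpha * num_values)
    return class_priors, feature_probs


# Tiny made-up dataset: one binary feature, two classes.
X_demo = [[0], [0], [1]]
y_demo = [0, 0, 1]
demo_priors, demo_probs = fit_categorical_nb(X_demo, y_demo, num_classes=2, num_values=2)
print(demo_probs[0, 0])  # class 0 never saw value 1, yet its probability is not zero
```

<p>In this toy run class 0 has counts (2, 0) for the feature, which smoothing with $\alpha = 1$ turns into probabilities 0.75 and 0.25 instead of 1 and 0.</p>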
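</div>
<div class="module">
<div class="module-title">5️⃣ [Engineering Optimization] The Optimization Bottleneck</div>
Training has to count the frequency of every (feature value, class) combination, which becomes expensive in both time and memory on high-dimensional data. Storing the counts in a hash table or a sparse matrix helps, because only the combinations that actually occur take up space.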
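<p>A hash-table version of the counting step might look like the sketch below. It assumes binary "keyword present" features, so each sample is just the list of features that are active; the function names and the toy emails are hypothetical.</p>

```python
from collections import Counter


def fit_sparse_counts(samples, labels):
    """Count classes and (class, feature) pairs, storing only pairs that occur.

    samples: iterable of iterables of active feature ids (e.g. words in an email)
    labels:  iterable of class labels, one per sample
    """
    class_counts = Counter()
    pair_counts = Counter()  # key: (label, feature)
    for features, label in zip(samples, labels):
        class_counts[label] += 1
        # set() so a word repeated inside one sample is counted once
        pair_counts.update((label, f) for f in set(features))
    return class_counts, pair_counts


def smoothed_prob(feature, label, class_counts, pair_counts, alpha=1.0):
    """Smoothed P(feature present | label); V = 2 because a feature is present or absent."""
    count = pair_counts[(label, feature)]  # Counter returns 0 for unseen pairs
    return (count + alpha) / (class_counts[label] + alpha * 2)


# Toy data (made up): three emails, two classes.
emails = [["free", "win"], ["free"], ["meeting"]]
tags = ["spam", "spam", "ham"]
class_counts, pair_counts = fit_sparse_counts(emails, tags)
print(len(pair_counts))  # 3 stored pairs instead of a dense 2 x 3 table
print(smoothed_prob("free", "spam", class_counts, pair_counts))  # 0.75
```

<p>Unseen pairs never get an entry; their count is read as zero at lookup time and smoothing supplies the non-zero probability.</p>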
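</div>
<div class="module">
<div class="module-title">6️⃣ [Today's Target] The OJ Mission</div>
<div class="warning">🎯 Task: <code>cd exercises/ && python3 day3_task.py</code></div>
Implement the <code>fit</code> and <code>predict</code> functions of Gaussian naive Bayes and verify the classification results on the Iris dataset. Gaussian naive Bayes handles continuous features by replacing the counted conditional probabilities with a per-class normal density for each feature.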
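<p>As a starting point, here is a minimal sketch of one way Gaussian naive Bayes could be structured. It is not the reference solution for <code>day3_task.py</code>, whose interface is not shown here: apart from <code>class_priors_</code>, the attribute names, the <code>var_smoothing</code> parameter, and the synthetic two-cluster data used in place of Iris are all assumptions made for illustration.</p>

```python
import numpy as np


class GaussianNaiveBayes:
    """Sketch: one normal density per (class, feature) pair."""

    def __init__(self, var_smoothing=1e-9):
        # Small fraction of the largest feature variance added to every
        # variance so that no variance is exactly zero.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.class_priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_])
        self.vars_ += self.var_smoothing * X.var(axis=0).max()
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # diff has shape (N, K, d): every sample against every class mean.
        diff = X[:, None, :] - self.means_[None, :, :]
        # Sum of per-feature Gaussian log densities, shape (N, K).
        log_likelihood = -0.5 * np.sum(
            np.log(2.0 * np.pi * self.vars_)[None, :, :] + diff ** 2 / self.vars_[None, :, :],
            axis=2,
        )
        log_posterior = np.log(self.class_priors_)[None, :] + log_likelihood
        return self.classes_[np.argmax(log_posterior, axis=1)]


# Synthetic stand-in for a real dataset: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(10.0, 1.0, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

model = GaussianNaiveBayes().fit(X_train, y_train)
print(model.predict([[0.0, 0.0], [10.0, 10.0]]))
```

<p>The prediction works in log space for the same reason as in the discrete case: the sum of log densities stays numerically stable where the product of densities would underflow.</p>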
</div>
<script src="https://cdn.jsdelivr.net/npm/katex@0.16.9/dist/katex.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/katex@0.16.9/dist/contrib/auto-render.min.js"></script>
<script>
renderMathInElement(document.body, {
delimiters: [
{left: '$$', right: '$$', display: true},
{left: '$', right: '$', display: false}
],
throwOnError: false
});
</script>
</body>
</html>