How Spam Filters Work: Coding Naive Bayes from Scratch
Building a Spam Detector from Scratch: The Naive Bayes Classifier

Three billion emails are sent every hour, and nearly half of them are spam.

"Congratulations! You won free money. Click here now."

If we look at this email, we immediately realize it is spam. But how does a machine realize it? It's an engineering problem. An email can have thousands of words, and each word changes the probability that the email is spam. Modeling how all of these words relate to each other is computationally infeasible. We need to make a naive assumption to solve it. This assumption leads us to one of the most effective classifiers: the Naive Bayes Classifier.

The "Naive" Assumption

The Naive Bayes classifier assumes that all features are conditionally independent given the class. What does this mean in simple terms? It assumes that the presence of one word does not affect the probability of another wo...