Spam detection is one of the classical applications of classification algorithms. It simply consists of assigning a received email one of two labels: spam or not spam.
By classifying incoming emails automatically, email services give users a cleaner and safer Inbox.
The training data is obtained by collecting two kinds of samples: emails that users kept in the Inbox or archived, which are treated as not spam, and emails that users manually labeled as spam.
Using NLP techniques such as bag-of-words, it’s possible to map the content of each email to a vector of features and then apply a machine learning classification algorithm to obtain a model capable of distinguishing spam from non-spam emails.
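To make the feature-mapping and training step concrete, here is a minimal sketch in plain Python. It assumes a multinomial Naive Bayes classifier with add-one smoothing, which is only one possible choice of algorithm; the class name `NaiveBayesSpamFilter`, the helper functions, and the tiny training set are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text, vocabulary):
    """Map a text to a vector of word counts, one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative only)."""

    def fit(self, texts, labels):
        self.vocabulary = sorted({w for t in texts for w in tokenize(t)})
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            docs = [t for t, y in zip(texts, labels) if y == c]
            self.log_prior[c] = math.log(len(docs) / len(texts))
            # Sum the word-count vectors of all emails in this class.
            totals = [0] * len(self.vocabulary)
            for doc in docs:
                for i, n in enumerate(bag_of_words(doc, self.vocabulary)):
                    totals[i] += n
            # Add-one smoothing keeps unseen words from zeroing out a class.
            denom = sum(totals) + len(self.vocabulary)
            self.log_likelihood[c] = [math.log((n + 1) / denom) for n in totals]
        return self

    def predict(self, text):
        """Return the class with the highest log-probability score."""
        vector = bag_of_words(text, self.vocabulary)
        scores = {
            c: self.log_prior[c]
            + sum(n * lp for n, lp in zip(vector, self.log_likelihood[c]))
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Tiny, made-up training set for illustration only.
train_texts = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting agenda for monday",
    "please review the project report",
    "lunch on monday with the team",
]
train_labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

model = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(model.predict("free prize inside"))
print(model.predict("project meeting on monday"))
```

Words that never appeared during training, such as "inside" above, are simply ignored by this sketch. A real system would train on far more data and would likely rely on an established machine learning library rather than hand-written code.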
Once the model is deployed and exposed as an API, the email software can request a classification for each new email and place it in the Inbox or the Spam folder accordingly.
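As a rough illustration of what such an API might look like, the sketch below uses only Python's standard library. Everything in it is an assumption made for the example: the `/classify` path, the JSON fields `subject` and `body`, the response shape, and the `predict` function, which is a keyword-based placeholder standing in for the trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(text):
    """Hypothetical stand-in for the trained model (keyword check only)."""
    return "spam" if "free prize" in text.lower() else "not spam"


def classify_request(body):
    """Turn a JSON request body into a JSON response naming the target folder."""
    email = json.loads(body)
    label = predict(email["subject"] + " " + email["body"])
    folder = "Spam" if label == "spam" else "Inbox"
    return json.dumps({"label": label, "folder": folder})


class ClassifyHandler(BaseHTTPRequestHandler):
    """Handles POST /classify requests (assumed endpoint name)."""

    def do_POST(self):
        if self.path != "/classify":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        response = classify_request(self.rfile.read(length)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def run_server(host="127.0.0.1", port=8000):
    """Start the service; blocks until interrupted."""
    HTTPServer((host, port), ClassifyHandler).serve_forever()


# Calling the core function directly, without starting the server.
print(classify_request('{"subject": "Hello", "body": "Claim your free prize"}'))
```

Calling `run_server()` would start the service locally, after which the email software could POST each new message to it and file the message in the folder named in the response. Keeping `classify_request` separate from the HTTP handler makes the routing logic easy to test on its own.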
Finally, the model will inevitably misclassify some emails. When that happens, we can collect the correct labels, use them to monitor and measure the model's performance, and add them to the training data so that the model can be retrained if necessary.
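One way to turn the collected corrections into a monitoring signal is to compute precision and recall for the spam class and compare them against minimum acceptable values. The sketch below assumes feedback is stored as (predicted, actual) label pairs; the function names, the sample data, and the thresholds are illustrative assumptions, not recommended values.

```python
def spam_metrics(feedback):
    """Compute precision and recall for the spam class.

    `feedback` is a list of (predicted_label, actual_label) pairs.
    """
    tp = sum(1 for p, a in feedback if p == "spam" and a == "spam")
    fp = sum(1 for p, a in feedback if p == "spam" and a != "spam")
    fn = sum(1 for p, a in feedback if p != "spam" and a == "spam")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def needs_retraining(feedback, min_precision=0.95, min_recall=0.80):
    """Flag the model for retraining when either metric drops below its threshold.

    The default thresholds are arbitrary placeholders.
    """
    precision, recall = spam_metrics(feedback)
    return precision < min_precision or recall < min_recall


# Made-up feedback: (what the model predicted, what the user confirmed).
feedback = [
    ("spam", "spam"),
    ("spam", "spam"),
    ("spam", "not spam"),      # legitimate email wrongly sent to Spam
    ("not spam", "spam"),      # spam that reached the Inbox
    ("not spam", "not spam"),
]

precision, recall = spam_metrics(feedback)
print(f"precision={precision:.2f} recall={recall:.2f}")
print("retrain:", needs_retraining(feedback))
```

Precision is often the metric to watch most closely in this setting, since a legitimate email landing in the Spam folder is usually more costly than a spam message reaching the Inbox. The corrected pairs themselves can be added to the training set for the next retraining run.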