Classification Applications: Spam Detection

Spam detection is one of the classical applications of classification algorithms. It is a binary classification task: each received email is assigned one of two labels, spam or not spam.

By automatically classifying received emails as spam or not spam, email services provide a cleaner and safer Inbox.

The training data is obtained by collecting two kinds of samples: emails that users kept in the Inbox or archived, which serve as non-spam examples, and emails that users manually labeled as spam.

Using NLP techniques such as bag-of-words, it’s possible to map the email content to a feature vector and then apply a machine learning classification algorithm to obtain a model capable of distinguishing spam from non-spam emails.
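
To make the bag-of-words idea concrete, here is a minimal sketch in plain Python. The post does not name a specific algorithm, so the choice of multinomial Naive Bayes with add-one smoothing is an assumption made for illustration, and the four training emails are made up. The names tokenize, bag_of_words and NaiveBayesSpamFilter are hypothetical helpers, not part of any library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text):
    """Map an email body to a bag-of-words: token -> number of occurrences."""
    return Counter(tokenize(text))


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Assumption: the post only says "classification algorithms";
    Naive Bayes is picked here as one simple option.
    """

    def fit(self, emails, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)  # emails seen per label
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(emails, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        features = bag_of_words(text)
        scores = {}
        for label in self.labels:
            # Start from the log prior of the label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token, count in features.items():
                if token not in self.vocabulary:
                    continue  # words never seen in training are ignored
                likelihood = (self.word_counts[label][token] + 1) / (
                    total_words + vocab_size
                )
                score += count * math.log(likelihood)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training data, only to show the flow end to end.
train_emails = [
    "Win a free prize now, click here",
    "Cheap pills, limited offer, click now",
    "Meeting moved to Monday at ten",
    "Please review the attached project report",
]
train_labels = ["spam", "spam", "not spam", "not spam"]

model = NaiveBayesSpamFilter().fit(train_emails, train_labels)
print(model.predict("Click now to claim your free prize"))
print(model.predict("Can we review the report on Monday"))
```

A real system would train on far more data and would likely rely on an established library rather than hand-written counting, but the two steps are the same: turn text into counts, then fit a classifier on those counts.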

Once the model is deployed and exposed as an API, the email software can request a classification for each new email and place it in the Inbox or Spam folder.
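
As a rough sketch of what such an API could look like, the example below uses only Python's standard library. Everything specific in it is an assumption: the /classify path, the JSON field names text, label and folder, and the port are invented for illustration, and predict_label is a keyword-based stand-in for a trained model so that the example stays self-contained.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a trained model: a hypothetical keyword list.
SPAM_HINTS = {"free", "prize", "winner", "click"}


def predict_label(text):
    """Hypothetical placeholder for the trained model's prediction."""
    tokens = set(text.lower().split())
    return "spam" if len(tokens & SPAM_HINTS) >= 2 else "not spam"


def handle_classify(body):
    """Turn a JSON request body into (status code, JSON response bytes)."""
    try:
        text = json.loads(body)["text"]
    except (ValueError, KeyError, TypeError):
        text = None
    if not isinstance(text, str):
        error = {"error": "expected a JSON object with a string 'text' field"}
        return 400, json.dumps(error).encode()
    label = predict_label(text)
    folder = "Spam" if label == "spam" else "Inbox"
    return 200, json.dumps({"label": label, "folder": folder}).encode()


class ClassifyHandler(BaseHTTPRequestHandler):
    """Serves POST /classify (an assumed endpoint name)."""

    def do_POST(self):
        if self.path != "/classify":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_classify(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer(("localhost", port), ClassifyHandler).serve_forever()
```

Calling serve() starts the server locally. The email software would then POST a body such as {"text": "..."} to /classify and move the message to the folder named in the response. Keeping the request logic in handle_classify, separate from the HTTP handler, makes it easy to test without opening a socket.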

Finally, the model will misclassify some emails. When that happens, we can collect the correct classification, for example when a user moves a message out of the Spam folder. These corrections let us monitor and measure the performance of our model, and they also provide new labeled data, so we can retrain the model if necessary.
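
The monitoring step can be sketched as follows. This assumes we have a log of emails for which both the model's prediction and the confirmed label are known, including correctly classified ones; in practice such a log may be biased toward the emails users bothered to correct. The Feedback record, the function names and the thresholds are hypothetical choices for illustration, and the log entries are made up.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    text: str
    predicted: str  # label returned by the model
    actual: str     # label confirmed by the user


def spam_metrics(feedback):
    """Compute accuracy, plus precision and recall for the spam class."""
    tp = sum(1 for f in feedback if f.predicted == "spam" and f.actual == "spam")
    fp = sum(1 for f in feedback if f.predicted == "spam" and f.actual != "spam")
    fn = sum(1 for f in feedback if f.predicted != "spam" and f.actual == "spam")
    correct = sum(1 for f in feedback if f.predicted == f.actual)
    return {
        "accuracy": correct / len(feedback) if feedback else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def needs_retraining(metrics, min_precision=0.95, min_recall=0.80):
    """Flag the model when it drops below the (assumed) thresholds."""
    return metrics["precision"] < min_precision or metrics["recall"] < min_recall


# Made-up feedback log.
log = [
    Feedback("Win a free prize", "spam", "spam"),
    Feedback("Cheap pills offer", "spam", "spam"),
    Feedback("Your invoice for March", "spam", "not spam"),  # false positive
    Feedback("You are our lucky winner", "not spam", "spam"),  # false negative
    Feedback("Lunch on Friday?", "not spam", "not spam"),
]

metrics = spam_metrics(log)
print(metrics)
print(needs_retraining(metrics))

# The confirmed labels double as new training examples.
new_training_data = [(f.text, f.actual) for f in log]
```

Precision matters here because a legitimate email sent to the Spam folder is usually more costly than a spam email reaching the Inbox, which is why the assumed precision threshold is the stricter of the two.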
