Machine Learning Applications: Topic Discovery with Clustering

Given a set of documents, a common task is to group them according to topic or subject. A human agent can create a hierarchy of subjects and assign each document to the subject it belongs to.

However, a clustering algorithm can create this structure automatically and often more precisely. Hierarchical clustering algorithms group the documents and build the hierarchy among them in the same pass.

We don’t need to define the subjects or the topic hierarchy in advance. Instead, the clustering process discovers them.

However, a human agent still needs to supply some inputs to the algorithm:

  • a criterion to measure the similarity between documents (a distance function)
  • a level at which to cut the hierarchy, or a minimum cluster size
  • labels for the discovered clusters
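
To make the three inputs above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the distance function is assumed to be cosine distance over raw word counts, clusters are merged by average linkage, and the names cosine_distance, cluster_documents, and cut_level, along with the sample documents and labels, are hypothetical. A minimum cluster size could be used as the stopping rule instead of a cut level; that variant is not shown. In practice, a library such as SciPy (scipy.cluster.hierarchy) would usually replace the hand-written loop.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(a: Counter, b: Counter) -> float:
    """Distance function: 1 - cosine similarity of two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty document is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cluster_documents(docs, cut_level):
    """Agglomerative (bottom-up) clustering with average linkage.

    Merging stops once the closest pair of clusters is farther apart
    than cut_level, which plays the role of the hierarchy cut.
    Returns a sorted list of clusters, each a sorted list of document indices.
    """
    vectors = [Counter(doc.lower().split()) for doc in docs]
    clusters = [[i] for i in range(len(docs))]  # start with one cluster per document

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[x], clusters[y]) > cut_level:
            break  # the remaining clusters are too dissimilar to merge
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)


# Hypothetical sample documents, chosen only for illustration.
docs = [
    "the cat chased the mouse",
    "a cat and a mouse played",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]

clusters = cluster_documents(docs, cut_level=0.8)

# The algorithm discovers the groups; a human still supplies the labels.
labels = {0: "animals", 1: "finance"}
for cluster_id, members in enumerate(clusters):
    print(labels[cluster_id], [docs[i] for i in members])
```

Raising cut_level merges more aggressively and yields fewer, broader topics; lowering it keeps documents apart. The quadratic search for the closest pair is acceptable for a handful of documents but would need a more efficient approach for a real corpus.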
