What are Topics and Topic Modeling?

When considering the full scope of Conversation AI, a fundamental component is the role of Topics and Topic Modeling. Topics refer to the most important keywords or phrases used in a conversation. As we learned in a previous post, topics are an important part of sentiment analysis.

Symbl’s topic model is based on the internal structure of a conversation, meaning how concepts are interrelated in a discussion. This goes beyond traditional topic classification, which relies on frequency, probability distributions, and a supervised training algorithm.

Before we dive further into Symbl’s advanced topic modeling capabilities, let’s take a look at traditional topic modeling, in order to understand how far we have come.

What is Traditional Topic Modeling?

Traditional topic modeling is a natural language processing (NLP) task for detecting keywords that have significance in a given document. Unlike Symbl’s advanced topic modeling, traditional topic modeling was built specifically for processing sets of documents, not spoken conversations. Even though topic modeling is applied to just about anything today, from standalone docs to chatbot conversations, the most widely used approach is an unsupervised model best suited to analyzing multiple documents, detecting word patterns within them, and automatically clustering the word groups and similar expressions that best characterize the entire set.

Traditional topic modeling performs mostly keyword-based analysis like inverse normalization (doing the reverse sorting of keywords). This more rudimentary approach delivers results based on complex mathematical systems that work well when there is a clear structure and distribution of information in a document, for example when analyzing news articles or formal documents.
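As a point of reference, frequency-based keyword scoring of the kind traditional approaches rely on can be sketched with TF-IDF. This is one common technique chosen for illustration, not necessarily the exact method described above; the sample documents and the `top_keywords` helper are made up for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_keywords(documents, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency times inverse document frequency.
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by score (descending), then alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results

docs = [
    "The central bank raised interest rates as inflation stayed high.",
    "The team won the final after the striker scored in extra time.",
    "The bank reported profits as interest income grew.",
]
keywords_per_doc = top_keywords(docs)
for doc_keywords in keywords_per_doc:
    print(doc_keywords)
```

A word such as "the" appears in every document, so its inverse document frequency is zero and it never ranks as a keyword. The scoring depends entirely on how words are distributed across the document set, which is why it suits well-structured collections better than a single free-flowing conversation.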

Advanced topic modeling goes beyond documents and provides topic analysis for spoken conversations. This works with both real-time and offline conversations.

Consider a free-flowing conversation where someone starts a thread on one topic and then switches to another topic during the same conversation. Topic modeling and topic classification systems that are not context-aware will underperform here. They treat the unstructured nature of conversation data as data sparsity and ignore the order of words and the semantic relationships between them, because context doesn’t exist in those systems without intervention.

Each time a context switch happens in the conversation, Symbl’s topic algorithm can detect the change in the context and extract the most important topics out of it.
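To give a rough feel for what detecting a context switch involves, here is a deliberately crude sketch that flags a switch whenever an utterance shares almost no vocabulary with the one before it. It is a lexical stand-in for illustration only and says nothing about how Symbl’s algorithm works; the stopword list, the threshold, and the function names are all assumptions.

```python
import re

# Small, hypothetical stopword list used only for this example.
STOPWORDS = {"the", "a", "an", "is", "to", "we", "for", "of", "and", "on", "in", "it"}

def content_words(utterance):
    """Return the set of lowercase non-stopword tokens in an utterance."""
    words = re.findall(r"[a-z]+", utterance.lower())
    return {w for w in words if w not in STOPWORDS}

def find_context_switches(utterances, threshold=0.1):
    """Return indexes of utterances that share little vocabulary with the previous one."""
    switches = []
    for i in range(1, len(utterances)):
        prev = content_words(utterances[i - 1])
        curr = content_words(utterances[i])
        union = prev | curr
        # Jaccard similarity: shared words divided by all words in either utterance.
        overlap = len(prev & curr) / len(union) if union else 1.0
        if overlap < threshold:
            switches.append(i)
    return switches

conversation = [
    "The budget review is due on Friday.",
    "Finance wants the budget review split by team.",
    "Separately, the new intern starts next week.",
    "The intern needs a laptop next week.",
]
switch_points = find_context_switches(conversation)
print(switch_points)
```

In this sample the only flagged index is the third utterance, where the talk moves from the budget review to the new intern. Word overlap breaks down quickly on real speech, where people rephrase the same idea in different words, which is the gap a context-aware model is meant to close.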

Using Added Intelligence to Improve Topic Modeling

The topics algorithm provides a framework for users to calibrate and precisely model the relationships among concepts. Analyzing certain fundamental features of the conversation graph makes it possible to abstract and derive the most relevant topics.

As mentioned earlier, when you try to use traditional topic modeling for spoken conversations, the clear structure found in documents no longer exists. There is no steady flow of information or coherent pattern you can mathematically model like you can in documents. So you need a more sophisticated system that doesn’t rely on the regular structure of a document or on the distribution of the words used. To do this, you want to understand the goal of the conversation: what concepts the speakers are talking about and how they are talking about them.

What Sets Symbl Topic Modeling Apart?

Symbl works off a graph-based modeling system that analyzes the dialog and converts it into a graph of the conversation. This graph shows the relationships between different concepts and how they interact with each other.

Graph analyses on top of this layer identify which combinations of concepts (and which phases of each concept) carry the most significance. The graph favors the concepts that have more diverse links in the conversational patterns.
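To make the idea concrete, here is a minimal sketch of a concept graph in which concepts mentioned in the same utterance are linked, and concepts are ranked by how many distinct neighbors they have. This is an illustration only, not Symbl’s actual algorithm; the concept list, the utterances, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_concept_graph(utterances, concepts):
    """Link every pair of known concepts that co-occur in one utterance."""
    graph = defaultdict(set)
    for utterance in utterances:
        text = utterance.lower()
        # Simple substring matching stands in for real concept extraction.
        present = sorted(c for c in concepts if c in text)
        for a, b in combinations(present, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def rank_by_link_diversity(graph):
    """Rank concepts by number of distinct neighbors, highest first."""
    return sorted(graph, key=lambda c: (-len(graph[c]), c))

concepts = {"budget", "hiring", "launch", "marketing"}
utterances = [
    "The launch date depends on the marketing plan.",
    "We need more budget before the launch.",
    "Hiring is blocked until the budget is approved.",
    "Can marketing share the launch assets?",
]
graph = build_concept_graph(utterances, concepts)
ranking = rank_by_link_diversity(graph)
print(ranking)
```

Here "launch" is mentioned in three utterances and "budget" in only two, yet they tie at the top because each links to two distinct concepts. Counting distinct links rather than raw mentions is what separates this kind of ranking from a frequency count.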

Symbl not only identifies the top keywords in a conversation but also assigns each a contextual score based on the graph intelligence that models the structure of the conversation. You can see this keyword ranking score in the Topics API response.

Parent Topics

Parent Topics are the highest level of abstraction in a discussion: the key aspects that the speakers talked about and expanded on in the meeting. You can see the Parent Topics of a conversation in the Topics API response.

Scope

The scope of a topic defines the sentences and information in the conversation that are directly linked to the topic of discussion. You can see the scope of the topic in the Topics API response.
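The snippet below sketches how a client might read the score, parent topics, and scope from a Topics API response. The payload is a hand-written sample, and the field names (`score`, `parentRefs`, `messageIds`) are assumptions about the response shape, so check the Symbl documentation before relying on them.

```python
import json

# Hand-written sample payload; field names are assumed, not authoritative.
sample_response = json.loads("""
{
  "topics": [
    {"text": "interns", "score": 0.72,
     "parentRefs": [{"text": "company-wide hiring"}],
     "messageIds": ["m1", "m4"]},
    {"text": "salary bands", "score": 0.55,
     "parentRefs": [{"text": "company-wide hiring"}],
     "messageIds": ["m2"]},
    {"text": "office move", "score": 0.31,
     "parentRefs": [],
     "messageIds": ["m3"]}
  ]
}
""")

def summarize_topics(response):
    """Return (text, score, parent topics, scope message ids), best score first."""
    rows = []
    for topic in response.get("topics", []):
        parents = [ref["text"] for ref in topic.get("parentRefs", [])]
        scope = topic.get("messageIds", [])
        rows.append((topic["text"], topic["score"], parents, scope))
    return sorted(rows, key=lambda row: row[1], reverse=True)

topic_rows = summarize_topics(sample_response)
for text, score, parents, scope in topic_rows:
    print(f"{text} (score={score:.2f}) parents={parents} scope={scope}")
```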

This is the baseline engine that we use for Symbl’s topic modeling, but we can take this topic modeling and create different variations of it.

Flat and Hierarchical Topics

One variation is the flat topics API, which returns just a list of topics.

Going a level deeper, instead of flattening the topics, we can pick up the top level of the topic hierarchy to create hierarchical topics.
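The difference between the two variations can be sketched with a small data structure. The topic names and the `(topic, parent)` representation below are hypothetical and only illustrate the shape of each result, not how the API builds it.

```python
# Hypothetical (topic, parent) pairs; None marks a topic with no parent.
topic_parents = [
    ("interns", "hiring"),
    ("salary bands", "hiring"),
    ("launch date", "product roadmap"),
    ("office move", None),
]

def flat_topics(pairs):
    """Flat variation: just the list of topic names."""
    return [topic for topic, _ in pairs]

def hierarchical_topics(pairs):
    """Hierarchical variation: group topics under their top-level parent."""
    tree = {}
    for topic, parent in pairs:
        if parent is None:
            # A topic without a parent becomes its own top-level entry.
            tree.setdefault(topic, [])
        else:
            tree.setdefault(parent, []).append(topic)
    return tree

flat = flat_topics(topic_parents)
tree = hierarchical_topics(topic_parents)
print(flat)
print(tree)
```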

Abstract Topics

Another problem with the traditional topic modeling system is that it doesn’t generalize. It only looks at a single document or conversation, so all it has to work with are the words inside that document.

To solve this problem, Symbl came up with abstract topics. The idea is to take all the elements of a graph and use deep learning to generalize them and find the abstract concepts that map to the specific thing being talked about. That way, a person who was not in the conversation can easily grasp what it was about.

Custom Topics

Custom topics are essentially a layer that allows users to bias the way we analyze a graph. By letting you assign topic detection to specific keywords, the Symbl API changes its calibrating function to favor topics higher in the hierarchy list. That way you can introduce a deliberate bias into the deep-learning model.
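As a rough illustration of what biasing the ranking means, the sketch below boosts the score of any topic found in a user-supplied vocabulary and then re-ranks. The multiplier, the scores, and the `apply_custom_bias` function are assumptions made for the example; the actual calibrating function is not described here.

```python
def apply_custom_bias(topic_scores, custom_vocabulary, boost=1.5):
    """Multiply the score of topics in the custom vocabulary, then re-rank."""
    vocabulary = {term.lower() for term in custom_vocabulary}
    biased = {
        topic: score * boost if topic.lower() in vocabulary else score
        for topic, score in topic_scores.items()
    }
    # Highest biased score first.
    return sorted(biased.items(), key=lambda item: item[1], reverse=True)

scores = {"pricing": 0.60, "onboarding": 0.50, "churn": 0.45}
ranked = apply_custom_bias(scores, ["churn"])
print(ranked)
```

Without the bias, "churn" ranks last. With it in the custom vocabulary, it moves to the top even though nothing in the conversation itself changed.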

Next Steps

With Symbl, you have the capability to customize topics with your own topic vocabulary. You can also use trackers to find specific things in the conversation. Learn more about trackers here.

Ready to try Symbl.ai? Get started with a free account.

Visit our documentation page to learn more about Symbl topics.

Joshua Molina

Director of Content, Symbl.ai

Joshua is a former journalist and veteran tech writer. He currently leads content strategy at Symbl.ai.
