How we approach the problem of understanding natural human conversation and extracting information from unstructured dialogue has a significant impact on when, and how well, we get there.
Before jumping into Deep Learning and NLP solutions for this problem, let’s establish the difficulties inherent in conversation intelligence.
Compared to news articles and other free-flowing text available on the internet, conversations pose a distinct set of engineering, mathematical, and scientific challenges.
Conversations are complex, and so is the data. Why?
There are many reasons why data derived from conversations is difficult to collect and to extract information from. Here are a few:
- Conversations are random and dependent on state and user context
- Meaning in conversations can be hierarchical, communicated in bits and pieces at irregular intervals, with no explicit links connecting them
- Conversations consist of topics that break at uneven points and must be disentangled. For example, a speaker may discuss a trip to India for a while and then switch to another experience. This creates many problems for co-reference resolution and for structuring the conversation.
- Conversational data is heavily subjective and specific to the participants in the meeting. A third party witnessing the conversation may need considerable time and effort to comprehend it.
- Errors can propagate due to heavy bias in, and dependency on, upstream speech recognition
- Context is an integral part of conversation and is highly uncertain. Like a non-zero-sum game, it cannot be modeled as a fixed vector at any point in time; the modeling always depends on uncertain conditions.
- The inference model has to learn from only a small amount of initialization data
Conversation intelligence stands apart from traditional NLP, where the hypothesis, situation, and inference communicated by the user are clearly stated in a given flow. Conversational data is all over the place, and it mismatches the embeddings and pre-trained models used at scale by the NLP community. This makes conversation intelligence more than a curve-fitting problem; it is better framed as decision-making under uncertainty and incomplete information.
Deep learning models trained with advanced mechanisms like attention and multi-head attention tend to perform well on certain NLP tasks. But they don’t really work for conversations: first, they fall prey to a train-test mismatch, being trained on articles and documents but applied to conversations. Second, injecting context is computation-intensive and costly.
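To make the cost argument concrete, here is a minimal NumPy sketch of standard scaled dot-product self-attention (the building block of multi-head attention). The shapes and data here are illustrative assumptions, not from any specific model: the point is that the score matrix is n × n in the number of tokens, so injecting more conversational context grows compute quadratically.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, n): quadratic in sequence length
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d = 8, 16                             # 8 tokens, 16-dimensional embeddings
X = rng.normal(size=(n, d))
out, w = scaled_dot_product_attention(X, X, X)  # self-attention over one sequence
```

Doubling the context from n to 2n tokens quadruples the size of `scores`, which is one intuition for why long conversational context is expensive to inject.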
Context can be intuitively understood as “What led to this point in the conversation and where may it lead next?”
As we look for answers, let’s dig a little deeper into what both ends of the spectrum can offer, and into a few of the approaches through which hybrid learning might prove worthwhile.