All About Enterprise Reference Implementation for Conversation Aggregation

In case you missed my previous blog post entitled Understanding Enterprise Architecture for Conversation Aggregation, this blog builds on top of the eye-opening capabilities of an enterprise-type conversation application. It’s clear that large companies with deep pockets are heavily investing in technologies related to AI/ML and conversation analytics, such as the Symbl.ai Platform, OpenAI’s GPT-3, Amazon SageMaker, Kubeflow, etc.

Many of these enterprise companies are looking to solve problems such as predicting human decision-making, observing patterns related to behavior, and even strategically guiding people toward desired outcomes. My previous blog explained why we care about this. Today, we will delve into the “how” through examples using this Enterprise Reference Implementation repo on GitHub.

Breaking Down the Problem

If we are interested in segmenting conversations, a minimal set of requirements is necessary to achieve a predictive or associative form of conversation analytics.

The minimal set of requirements for higher-order conversation perception is listed below:

Conversation ingress (i.e. the input)
Extracting insights from conversations
Historical conversation and associated insights
Conversation aggregation and analytics
Action (i.e. the output or what we hope to achieve)

These topics are all important, and we can (and will in subsequent blog posts) do a deep dive into the nitty-gritty details of each one. For now, it’s essential to look at the ten-thousand-foot views of these requirements to level set on what we are talking about. We’re doing this because some of these topics, at face value, seem trivial but in fact have a great deal of nuance to them.

First, let’s take a look at a block-style architectural diagram.

We Love Architecture Diagrams

Personally, I love architecture diagrams. I like diagrams or pictures of any kind because I’m a visual learner. When discussing software systems and complex platforms, these diagrams are a great way to see all the components that are involved and to visualize the control and data paths for these systems. Let’s look at the architecture for a traditional or simple system for deriving conversation insights, and then compare that with an architecture attempting complex conversation aggregation.

Before we begin, this blog will focus on real-time conversation analytics in a streaming-type capacity, even though asynchronous conversation would look reasonably similar in terms of how the conversation data is ingested by the system.

Simplistic Architecture

The architecture diagram below represents how conversation insights are derived today across many applications. The design is simplistic and, more often than not, meets the needs of your typical real-time conversation use cases. Some goals are to coach, prompt individuals, or perform actions based on what is being said in a given conversation.

**Simple Conversation Architecture Diagram**

That last point, that the analytics are scoped to one specific conversation, is frequently overlooked (and, at times, intentionally). Analytics across multiple conversations is typically out of scope. Why is that?

First, the system that does the conversation aggregation over time is complex to build. Second, to offset that design and implementation complexity, we fall back on using humans to accomplish complex aggregation. As you might realize, relying on humans to perform this task can lead to inconsistent and error-prone results despite the best efforts to mitigate this through reporting hierarchies, generated reports, search tools, spreadsheets, etc. However, it generally gets the job done.

It turns out that these limitations are perfectly OK, because this model is able to address a good portion of in-demand use cases. These include systems such as chatbots, (simple) call center enablement, and applications creating simple triggers—to name just a few. Because these applications are simpler, most of them run client-side and, in turn, that’s the biggest reason one sees so many React, Angular, and Vue SDKs for these CPaaS platforms.

In summation:

It’s simple to build this type of application
They are transactional type applications (input mapped to output)
They offer Point-in-Time conversation analysis
The conversation insights are usually processed client-side
The conversations are isolated from other conversations
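
To make the transactional, point-in-time nature of this design concrete, here is a minimal sketch in Python. The `TRIGGERS` table, its phrases and prompts, and the `handle_utterance` function are hypothetical names made up for illustration; they are not part of any SDK. Notice that the function keeps no state between calls, which is why each conversation stays isolated from every other one.

```python
# Hypothetical trigger table: phrase -> prompt shown to the support technician.
TRIGGERS = {
    "flashing light": "Suggest rebooting the cable modem.",
    "cancel my service": "Offer to connect the customer with retention.",
}


def handle_utterance(text: str) -> list[str]:
    """Map one utterance to zero or more prompts (input mapped to output)."""
    lowered = text.lower()
    # No history is consulted: the result depends only on this one utterance.
    return [prompt for phrase, prompt in TRIGGERS.items() if phrase in lowered]


if __name__ == "__main__":
    print(handle_utterance("There is a flashing light on my modem"))
```

Logic this small can comfortably live client-side, which fits the summary above.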

Conversation Aggregation with Enterprise Scale

If you examine the use cases in the previous architecture, you are looking at extracting conversation insights from a finite point in time. Those conversations are typically more transactional in nature. Let’s look at the call center enablement use case for an internet service provider; these conversations are framed in a very predictable and finite way.

In almost all cases, a customer initiates this transaction via phone, chat, etc., and then contacts the support technician with a particular problem. Let’s use “my internet connection is no longer working” as an example. The customer will provide some details about what they are experiencing, and the technical support staff might ask additional questions to refine remediation steps. Upon completing the conversation, the customer receives the desired output, which is being reconnected to the internet.

For more complex use cases where we are looking to make connections, establish patterns, and aggregate insights over many conversations throughout time, we need to address a critical difference between these two architectures: persisting conversation data. It is a straightforward, logical, and natural step if you want to make connections between what is said in real time and what was said in prior conversations.

This very simple realization has very significant implications. How do we store this conversation data? What kind of data storage do we use? What’s more important, access speed for our insights or association via putting the dots close enough together to aggregate insights intelligently? What mechanisms do we want to use to detect patterns in conversations?

To understand the complexities in this architecture captured via the Enterprise Reference Implementation repo, we must dive into and understand the minimal set of requirements we have been dancing around.

In summation:

More effort is required to build these types of applications
Conversations can be aggregated
One can build applications with a historical conversation context
One can have more control over the conversation data
They are better for building scalable conversation applications
The company’s business rules/logic are pushed into backend server microservices

If you’d like to get more detailed about both of these architectures, watch the informational video below:

Ingress Data for Conversations

This topic is the most trivial aspect of any conversation processing design, but it also happens to be the most incomplete or overlooked element because it seems simple enough.

These days, we often think of our conversation data coming from an audio stream in a communication platform such as Zoom, Twilio, Vonage, etc. Still, in reality, there are many more forms of real-time conversation sources that we should remember. These include Telephony via Session Initiation Protocol (SIP) and Public Switched Telephone Network (PSTN), team collaboration applications such as Slack, the unending sea of video/chat platforms including Discord, the one friend who uses email like it’s an instant messenger application, and many more.

These data streams feed into this Analyzer component, which effectively takes the conversation that is embedded in these streams and extracts the contextual insights, as well as enacts some pre-processing of discovered insights. It so happens that the Symbl.ai platform plays a huge role in this component, and, because of that role, there are several Symbl.ai SDKs that can assist with ingesting these data streams in the form of Websocket SDKs for Streaming APIs, Telephony SDKs, and even Asynchronous APIs for handling text and other inputs.
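
One way to keep the many ingress sources manageable is to normalize every input into a single event shape before it reaches the Analyzer. The sketch below is only an illustration of that idea in Python. The `ConversationEvent` type, both adapter functions, and the payload field names (`channel`, `ts`, `call_id`, `started_at`, and so on) are assumptions made up for this example and do not reflect any real platform's payload or the repo's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConversationEvent:
    """Hypothetical common shape for anything said in any channel."""
    source: str
    conversation_id: str
    speaker: str
    text: str
    timestamp: datetime


def from_chat_message(msg: dict) -> ConversationEvent:
    """Adapt an assumed chat payload whose `ts` is epoch seconds."""
    return ConversationEvent(
        source="chat",
        conversation_id=msg["channel"],
        speaker=msg["user"],
        text=msg["text"],
        timestamp=datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc),
    )


def from_transcript_segment(seg: dict) -> ConversationEvent:
    """Adapt an assumed speech-to-text payload with an ISO 8601 start time."""
    return ConversationEvent(
        source="telephony",
        conversation_id=seg["call_id"],
        speaker=seg["speaker"],
        text=seg["transcript"],
        timestamp=datetime.fromisoformat(seg["started_at"]),
    )
```

With adapters like these, adding a new source (email, for that one friend) means writing one more small function rather than touching the downstream components.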

Mining Conversation Data

Many complex details are associated with extracting conversation insights from these various real-time forms of communication. This Analyzer component aims to ingest the data, create any additional metadata to associate with these insights, and then save the context to recall later.
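
The three responsibilities just described (ingest, attach metadata, save) can be sketched roughly as follows. This is a simplified illustration, not the code in the reference repo: `InsightRecord`, `Analyzer`, the shape of `raw_insights`, and the in-memory `saved` list standing in for a database are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InsightRecord:
    conversation_id: str
    kind: str    # e.g. "tracker", "entity", "topic"
    label: str   # e.g. "flashing light"
    metadata: dict = field(default_factory=dict)


class Analyzer:
    """Hypothetical Analyzer: enriches raw insights and saves them."""

    def __init__(self) -> None:
        self.saved: list[InsightRecord] = []  # stand-in for a real database

    def process(self, conversation_id: str, raw_insights: list[dict], context: dict) -> list[InsightRecord]:
        """Attach metadata to each raw insight, save it, and return the records."""
        records = []
        for raw in raw_insights:
            record = InsightRecord(
                conversation_id=conversation_id,
                kind=raw["kind"],
                label=raw["label"],
                # Merge caller-supplied context with a timestamp of when we saw it.
                metadata={**context, "recorded_at": datetime.now(timezone.utc).isoformat()},
            )
            self.saved.append(record)
            records.append(record)
        return records
```

The metadata is what later makes association possible: a label on its own says little, while a label plus a region and a timestamp can be compared against other conversations.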

There happens to be an excellent platform that does all of the heavy lifting for us. We can extract these conversation insights without having to train models, have expertise in artificial intelligence or machine learning, or require a team of data scientists. Of course, I’m talking about using the real-time streaming capabilities on the Symbl.ai platform.

Some capabilities that would be invaluable to leverage within your application would be:

Trackers for homing in on topics specific to your business
Custom Entity Detection to see how your products, capabilities, and perhaps even feature gates are discussed and utilized
Conversation Groups, which is a new feature on the Symbl.ai platform, and could be used to process lower-priority batch-style analytics
Summarization for distilling down larger conversations and creating tiered topics, because this would remove less relevant subjects

The second feature we alluded to with regard to this Analyzer component is being able to save all these insights and metadata. Because this is a vast topic, we will cover it in the section below.

Preserving Conversation Insights

Preserving insights represents the first of two significant pieces of work in this design. In order to aggregate conversation insights from external conversation sources and through historical data, we need to have a method for persisting this data to recall and make associations to conversations happening now.

This requirement naturally lends itself to using some form of database, but what kind? If we are talking about the aggregation of millions or even billions of conversations, we need scalable backend storage. This means that more storage or database nodes can be brought online to expand capacity. Because we are talking about enterprise applications, this form of expansion needs to be completed without interrupting the application’s availability (or performance), without (much) human intervention, and, as always, in as simple a manner as possible.

We want to note that, in terms of the application’s performance, storage is just one aspect of any data storage system. Another equally crucial dimension is access performance. There are different flavors of databases out there offering their unique take or capability for persisting and recalling data. Still, we need to break down the requirements even further to make better storage choices.

Processing millions of conversations at scale necessitates the ability to quickly store (AKA write) contextual insights in order to keep pace with each conversation. The querying or reading of these insights occurs at the same frequency as the database writes them down. This makes sense, because each new insight invokes a write operation to record what has been discovered, and it will also trigger a read operation to see if there are any prior incidents of the newly discovered insight.
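
The write-then-read pattern described above can be sketched in a few lines of Python. SQLite is used here purely as a convenient stand-in so the example runs anywhere; it is not a recommendation for the scalable backend this section calls for. The table layout and the function names are assumptions for illustration.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with an index on the insight label."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE insights (conversation_id TEXT, label TEXT, seen_at REAL)"
    )
    conn.execute("CREATE INDEX idx_insights_label ON insights (label)")
    return conn


def record_and_lookup(conn: sqlite3.Connection, conversation_id: str, label: str, seen_at: float) -> list[str]:
    """Write the new insight, then read which other conversations share its label."""
    # One write per newly discovered insight...
    conn.execute(
        "INSERT INTO insights VALUES (?, ?, ?)", (conversation_id, label, seen_at)
    )
    # ...paired with one read to look for prior incidents of the same insight.
    rows = conn.execute(
        "SELECT DISTINCT conversation_id FROM insights "
        "WHERE label = ? AND conversation_id != ? ORDER BY conversation_id",
        (label, conversation_id),
    ).fetchall()
    return [row[0] for row in rows]
```

Every call performs exactly one insert and one query, which is where the roughly even split between reads and writes comes from.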

Another data access requirement unique to this application and the conversation intelligence domain we are working with is that the data storage platform needs to be able to quickly and efficiently query for relationships between data points. It would make sense to either select a data storage platform that provides users with this capability natively, or build something that farms out the work to reduce the complexity of the search.

Below are several key takeaways from a storage perspective:

One needs extensible and scalable backend storage that doesn’t impact availability
Performant data access must sustain heavy read and write traffic at a ratio of roughly 50/50
The ability to query for relationships between data points is a must
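
To illustrate what "querying for relationships" means in practice, here is a tiny in-memory sketch in Python. It models conversations and insight labels as two sides of a graph and asks which other conversations share labels with a given one. The `InsightGraph` class and its methods are hypothetical, and a real deployment would more likely rely on a storage platform that offers this kind of traversal natively.

```python
from collections import defaultdict


class InsightGraph:
    """Tiny in-memory bipartite index: conversations <-> insight labels."""

    def __init__(self) -> None:
        self._by_label = defaultdict(set)         # label -> conversation ids
        self._by_conversation = defaultdict(set)  # conversation id -> labels

    def link(self, conversation_id: str, label: str) -> None:
        """Record that an insight label was seen in a conversation."""
        self._by_label[label].add(conversation_id)
        self._by_conversation[conversation_id].add(label)

    def related(self, conversation_id: str) -> dict[str, set[str]]:
        """Map each other conversation to the labels it shares with this one."""
        shared = defaultdict(set)
        for label in self._by_conversation.get(conversation_id, ()):
            for other in self._by_label[label]:
                if other != conversation_id:
                    shared[other].add(label)
        return dict(shared)
```

The lookup walks two hops (conversation to label, label to conversation), which is the kind of query that gets expensive in a store not designed for relationships.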

Performing Real-time Analytics

The previous section discussed the need to archive conversation insights that can be recalled by real-time conversations happening in the present moment. This section expands on the final, but extremely significant, feature required in this Enterprise Architecture for Conversation Analysis: making associations or defining the relationships between contextual insights. That functionality happens in this Middleware component in the Enterprise Architecture diagram above.

The best way to visualize this Middleware component is via specific use cases. If we go back to our Internet Service Provider scenario, let’s say in this particular conversation a Tracker insight relating to a “flashing light on cable modem” is recognized and triggered by the Symbl.ai Platform. This information, a “flashing light”, isn’t surprising and is probably even expected in this technical support call involving a customer without internet access.

In an application based on our Simple Architecture definition, we could display a popup to the support technician advising the customer to reboot the cable modem by unplugging and plugging the modem back in. However, in an application based on our Enterprise Architecture that considers historical data, we could query to see which other conversations are associated with Tracker insight “flashing light”.

It could be that there has been a large number of conversations that have triggered this Tracker insight from users originating from Long Beach, California, in the past 30 minutes. Situations like this could indicate a local outage, and our application could dispatch a higher-tier support technician to look into the problem and notify the technician speaking with the customer within the application’s user interface that there is a possibility of a general outage in the area.

In the above example, the Tracker insight “flashing light,” or the data itself, wasn’t significant. However, the relationship or association with that particular Tracker to other conversations that had taken place recently is the noteworthy piece of information. That’s the value proposition for this type of application architecture.
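
The Long Beach scenario boils down to counting how many distinct conversations triggered the same Tracker from the same location within a recent window. A minimal Python sketch of that check follows. The 30-minute window comes from the example above, while the `TrackerWindow` class, its method names, and the threshold of three conversations are assumptions chosen only for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class TrackerWindow:
    """Counts distinct recent conversations per (tracker, location) pair."""

    def __init__(self, window: timedelta = timedelta(minutes=30), threshold: int = 3):
        self.window = window
        self.threshold = threshold  # assumed value; tune to your business
        self._hits = defaultdict(deque)  # (tracker, location) -> (time, conversation id)

    def observe(self, tracker: str, location: str, conversation_id: str, at: datetime) -> bool:
        """Record a hit; return True if the pair now looks like a possible outage."""
        hits = self._hits[(tracker, location)]
        hits.append((at, conversation_id))
        # Drop hits older than the window (assumes hits arrive in time order).
        while hits and at - hits[0][0] > self.window:
            hits.popleft()
        distinct = {cid for _, cid in hits}
        return len(distinct) >= self.threshold
```

A single hit returns `False`, matching the point above that the data by itself is unremarkable; only the accumulation of related hits crosses the threshold.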

As you can see, this Middleware component is deeply tied to what your business cares about. This component, either in code or interfacing with another external system, captures your company’s specific business rules. These business rules can then be used to notify others within the company to take action, create events that you might want to pass along to other software systems, or trigger actions you want to perform directly in this component.
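
One way to picture business rules captured in code is as a list of condition and action pairs that the Middleware evaluates against each finding. The Python sketch below is only that, a picture: the `Rule` type, the `evaluate` function, the fields of the `finding` dictionary, and the event strings are all hypothetical and would be replaced by whatever your organization actually cares about.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # decides whether the rule applies
    action: Callable[[dict], str]    # produces an event to hand off elsewhere


def evaluate(rules: list[Rule], finding: dict) -> list[str]:
    """Run every matching rule, in order, and collect the events they emit."""
    return [rule.action(finding) for rule in rules if rule.matches(finding)]


# Hypothetical rules for the Internet Service Provider scenario.
RULES = [
    Rule(
        name="possible-outage",
        matches=lambda f: f["tracker"] == "flashing light" and f["recent_conversations"] >= 3,
        action=lambda f: f"notify:tier2:possible outage in {f['location']}",
    ),
    Rule(
        name="reboot-prompt",
        matches=lambda f: f["tracker"] == "flashing light",
        action=lambda f: "prompt:agent:suggest modem reboot",
    ),
]
```

Keeping the rules as data separate from the evaluation loop makes it easier to swap in your own rules without touching the plumbing around them.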

Although there is a generic implementation provided in this Middleware component, the intent of this Enterprise Reference Implementation is to be just that: a reference. At minimum, the Middleware component contained in the repo should be modified to capture your business rules; in practice, it will likely be re-implemented to fit your specific business needs.

Should you choose to use this Reference Implementation as a starting point, the interfaces into and out of this Middleware component use an industry-standard system, which means this Middleware component can be implemented in whichever language your organization has the most expertise with.

Next Up: A Deep Dive into Data Storage for Conversations

This blog post has shown you solid comparisons between a simple design for a conversation application and what an enterprise conversation-based application architecture would look like. The Enterprise Reference Implementation cited in this blog post is open source and open for use to serve as a template, fire off the creative energy to build your own implementation, and even be used as-is with no strings attached.

The next topic in this series will be a deep dive into the storage or archival aspects of the Analyzer component. Although this blog post has been an incredible start to describing the requirements and functionality necessary for this component, there are far more details that are beneficial to discuss. Those learnings will enable others to make highly informed decisions in terms of designing and selecting an intelligent storage platform to meet their needs.

I hope this has been an enlightening discussion. The big takeaway of this article is to understand the purpose of each component within this higher-level block diagram, and be able to extrapolate your own implementation to put the dots of knowledge close enough together to predict and create desired outcomes for your business. Cheers!

David vonThenen

Developer Advocate

David is a self-described Tech geek and Developer Advocate enabling others to process communications to derive conversation intelligence. David talks Kubernetes/containers, VMware virtualization, backup recovery/replication solutions, adaptors in hardware storage connectivity, and everything else that is tech!
