In our ongoing series on the Symbl.ai Go SDK, this post focuses on real-time processing of conversation insights using WebSockets. Part 1 discussed Async APIs, introducing methods to derive conversation insights asynchronously using the Symbl.ai Go SDK. The first thing I should mention is that we recently pushed out a new v0.1.1 release of the SDK.

This release includes a number of usability enhancements that make the SDK easier to consume from other projects, specifically around creating named structs. We also fixed an issue where Trackers weren’t exposed in the Streaming configuration; previously, the feature was only available via managed Trackers through the Management API.

Why Streaming API?

The Streaming API is ideal for situations where a real-time conversation is occurring and insights are required with low latency. By leveraging the WebSocket protocol, there is no need to poll the server for updates; events are streamed directly to your client as Symbl.ai processes the real-time conversation. Implement the Streaming API in your web app to provide your users with active support from Symbl.ai:

  • Entity detection for custom and managed entities such as PHI and PII
  • Trackers to automatically recognize intents, phrases, and their meaning in conversations
  • Sentiment analysis
  • Speaker analytics (pace, silence time, and talk time)
  • Real-time interim transcripts

The Streaming API, alongside our other products such as Nebula LLM & Embeddings, can enable real-time generative AI use cases such as Real-Time Assist for sales, customer service, and other frontline operations. For instance, it can detect objections raised by a customer and assist the agent in responding to them, or detect moments of customer frustration and suggest script changes.

A Little More About WebSockets

For those who might not be familiar with WebSockets, it’s an interesting internet protocol that allows for the bi-directional exchange of information between a client and a server. Typically, a small amount of information is exchanged upfront to set up what this bi-directional exchange will look like; after that, data is exchanged asynchronously. If I drew a diagram of this process, it would look something like the picture below.

There are two types of exchanges in the protocol: a configuration exchange (or update), and the raw data flowing between the client and server. What the client sends differs depending on the server type; in the case of the Symbl.ai Platform, it’s an audio data stream. In return, we get back conversational insights in the form of transcription and results for topics, trackers, and so on.
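To make those two phases concrete, here is a minimal sketch using the popular gorilla/websocket package. The endpoint URL and the shape of the configuration message are illustrative placeholders rather than the exact Symbl.ai wire format; the SDK handles all of this plumbing for you.

package main

import (
   "fmt"
   "log"

   "github.com/gorilla/websocket"
)

func main() {
   // placeholder URL; the SDK resolves the real endpoint and handles
   // authentication for you
   url := "wss://example.com/v1/streaming?access_token=YOUR-TOKEN"

   conn, _, err := websocket.DefaultDialer.Dial(url, nil)
   if err != nil {
      log.Fatalf("dial failed: %v", err)
   }
   defer conn.Close()

   // phase 1: a small configuration exchange up front
   start := map[string]interface{}{
      "type": "start_request", // illustrative message shape
      "config": map[string]interface{}{
         "speechRecognition": map[string]interface{}{
            "encoding":        "LINEAR16",
            "sampleRateHertz": 16000,
         },
      },
   }
   if err := conn.WriteJSON(start); err != nil {
      log.Fatalf("start request failed: %v", err)
   }

   // phase 2: asynchronous data exchange; audio goes up as binary
   // frames, insights come back as JSON text frames
   chunk := make([]byte, 3200) // ~100ms of 16kHz 16-bit mono audio
   if err := conn.WriteMessage(websocket.BinaryMessage, chunk); err != nil {
      log.Fatalf("audio write failed: %v", err)
   }

   _, msg, err := conn.ReadMessage()
   if err != nil {
      log.Fatalf("read failed: %v", err)
   }
   fmt.Printf("insight payload: %s\n", msg)
}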

How to Use?

  1. Explore the Repository Examples:
    • The symbl-go-sdk repository ships with complete working examples, including the streaming example we walk through later in this post.
  2. Install Required Libraries:
    • The examples utilize a microphone package that depends on the PortAudio library, a cross-platform open source audio library.
    • Linux Users: Install PortAudio using your system’s package manager (yum, apt, etc.).
    • macOS Users: Install PortAudio using Homebrew.
  3. Sign Up for Symbl.ai:
    • If you don’t already have a Symbl.ai account, sign up here for free, no credit card required.
  4. Obtain API Keys:
    • After signing up, obtain the API keys from your Symbl.ai account.
  5. Configure Environment Variables:
    • Add your API keys to your environment:
export APP_ID=YOUR-APP-ID-HERE
export APP_SECRET=YOUR-APP-SECRET-HERE
    • Using environment variables is beneficial as they are easy to configure, support Platform as a Service (PaaS) deployments, and work effectively in containerized environments like Docker and Kubernetes. (A quick way to verify they are set is shown just below this list.)

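If you want to double-check that the credentials are visible to your program before running the examples, a minimal sanity check in Go looks like this (nothing Symbl.ai-specific, just reading the environment):

package main

import (
   "fmt"
   "os"
)

func main() {
   // confirm the Symbl.ai credentials are present in the environment
   for _, name := range []string{"APP_ID", "APP_SECRET"} {
      if os.Getenv(name) == "" {
         fmt.Printf("missing environment variable: %s\n", name)
         os.Exit(1)
      }
   }
   fmt.Println("Symbl.ai credentials found")
}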

Let’s Start Streaming Using WebSockets

As I mentioned in the previous section, the first step is to log into the Symbl.ai platform (this is taken care of for you under the covers of the SDK), and the second step is to pass a configuration that sets up the WebSocket session. You can do that by building a StreamingConfig object.

import (
   cfginterfaces "github.com/symblai/symbl-go-sdk/pkg/client/interfaces"
)
 
config := &cfginterfaces.StreamingConfig{
   InsightTypes: []string{"topic", "question", "action_item", "follow_up"},
   Config: cfginterfaces.Config{
       MeetingTitle:        "my-meeting",
       ConfidenceThreshold: 0.7,
       SpeechRecognition: cfginterfaces.SpeechRecognition{
           Encoding:        "LINEAR16",
           SampleRateHertz: 16000,
       },
   },
   Speaker: cfginterfaces.Speaker{
       Name:   "Jane Doe",
       UserID: "jane.doe@example.com",
   },
}
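One thing worth noting: the Encoding and SampleRateHertz you declare here describe the audio you intend to send, so they need to match your actual input source. Later on, we will configure the microphone for the same 16,000 Hz sample rate.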

The next thing you need to do is define a struct that implements the InsightCallback interface.

type InsightCallback interface {
   // called with live recognition results as speech is processed
   RecognitionResultMessage(rr *RecognitionResult) error
   // called when a finalized transcript message is available
   MessageResponseMessage(mr *MessageResponse) error
   // called for insights such as questions, action items, and follow-ups
   InsightResponseMessage(ir *InsightResponse) error
   // called when topics are detected
   TopicResponseMessage(tr *TopicResponse) error
   // called when trackers are matched
   TrackerResponseMessage(tr *TrackerResponse) error
   // catch-all for any message type not handled above
   UnhandledMessage(byMsg []byte) error
}

Your struct will receive all of the conversational insights that the Streaming API defines. If you simply want to see which messages arrive from the Symbl.ai platform, you can always pass in the DefaultMessageRouter, which prints the conversation insight structs to the console.
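If you would rather handle the messages yourself, a minimal implementation might look like the sketch below. The import path for the message types is an assumption based on the repo layout, so check the streaming example for the exact package.

import (
   "fmt"

   // assumed import path for the streaming message types
   sdkinterfaces "github.com/symblai/symbl-go-sdk/pkg/api/streaming/v1/interfaces"
)

// MyCallback is a minimal InsightCallback implementation that prints
// finalized messages and trackers and ignores everything else.
type MyCallback struct{}

func (c MyCallback) RecognitionResultMessage(rr *sdkinterfaces.RecognitionResult) error {
   return nil // skip interim results
}

func (c MyCallback) MessageResponseMessage(mr *sdkinterfaces.MessageResponse) error {
   fmt.Printf("message: %v\n", mr)
   return nil
}

func (c MyCallback) InsightResponseMessage(ir *sdkinterfaces.InsightResponse) error {
   return nil
}

func (c MyCallback) TopicResponseMessage(tr *sdkinterfaces.TopicResponse) error {
   return nil
}

func (c MyCallback) TrackerResponseMessage(tr *sdkinterfaces.TrackerResponse) error {
   fmt.Printf("tracker: %v\n", tr)
   return nil
}

func (c MyCallback) UnhandledMessage(byMsg []byte) error {
   fmt.Printf("unhandled: %s\n", string(byMsg))
   return nil
}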

Input Source What?!

The next thing we need to do is provide an input source for our audio or conversation. In this case, we’re going to use that PortAudio-based microphone library. To initialize and then start the microphone, just provide an AudioConfig and do the following:

import (
   "fmt"
   "os"
   "os/signal"

   // PortAudio-based microphone helper; this import path is assumed
   // from the SDK repo layout, so check the streaming example for
   // the exact package
   "github.com/symblai/symbl-go-sdk/pkg/audio/microphone"
)

// mic stuff: trap Ctrl+C so the example can stop the mic cleanly
sig := make(chan os.Signal, 1)
signal.Notify(sig, os.Interrupt, os.Kill)

mic, err := microphone.Initialize(microphone.AudioConfig{
   InputChannels: 1,
   SamplingRate:  16000, // matches SampleRateHertz in StreamingConfig
})
if err != nil {
   fmt.Printf("Initialize failed. Err: %v\n", err)
   os.Exit(1)
}

// start the mic
err = mic.Start()
if err != nil {
   fmt.Printf("mic.Start failed. Err: %v\n", err)
   os.Exit(1)
}

Microphone Meet WebSocket

Then, we need to pass the microphone to the Streaming interface, which is beyond simple because the microphone library implements a Go streaming interface.

// this is a blocking call
mic.Stream(client)

That’s really it! When you connect the two, simply start talking into your microphone and you should begin seeing conversational insights being passed back to you.
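For completeness, the wiring between the config, the callback, and the microphone looks roughly like the sketch below. The client constructor name here is a hypothetical stand-in for illustration; examples/streaming/streaming.go in the repo shows the exact call.

// hypothetical constructor name; see examples/streaming/streaming.go
// in the repo for the real call that takes your config and callback
client, err := symbl.NewStreamingClient(ctx, config, MyCallback{})
if err != nil {
   fmt.Printf("client creation failed. Err: %v\n", err)
   os.Exit(1)
}

// hand the mic to the client: audio flows up to the platform, and
// insights flow back through your callback methods
mic.Stream(client)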

If you would like to run the Streaming example in the repo, you can install PortAudio, add the Symbl.ai API environment variables, and then run the following commands:

$ cd examples/streaming/
$ go run streaming.go

If you want to see this process in action, watch the quick video below:

What’s Next?

To prove out the Symbl.ai Go SDK, I created a project that consumes it. This project just so happens to be the demo that was used in my API World presentation on Nov. 2, 2022, as well as the subject of the next Symbl.ai Go SDK blog.

In the meantime, please give the Symbl.ai Go SDK a try and provide feedback via issues, whether they are feature requests, enhancements, or bugs. The SDK is only as good as the feedback and ideas we receive from the people consuming it—so, please give it a try!

David vonThenen
Developer Advocate

David is a self-described tech geek and Developer Advocate enabling others to process communications to derive conversation intelligence. David talks Kubernetes/containers, VMware virtualization, backup recovery/replication solutions, adaptors in hardware storage connectivity, and everything else that is tech!