Continuing our ongoing series on the Symbl.ai Go SDK, in this blog post I am going to give you an overview of processing conversation insights in real time using WebSockets. The first thing I should mention is that we recently pushed out a new v0.1.1 release of the SDK.

This release includes a number of usability enhancements that make the SDK easier to consume from other projects, specifically around creating named structs. We also fixed an issue where Trackers weren't exposed in the Streaming configuration; previously, that feature was only available through managed Trackers via the Management API.

Let’s Dive Into the Streaming API

There are several examples in the repo, including one that uses the Streaming API to send audio to the Symbl.ai platform over WebSockets. The example makes use of a microphone package contained within the repository, which in turn relies on PortAudio, a cross-platform, open source audio library. If you are on Linux, you can install PortAudio using whatever package manager is available on your distribution (yum, apt, etc.). If you are on macOS, you can install it using brew (brew install portaudio).

Once you have PortAudio installed, the next thing you need to do before using the Streaming API is to obtain the API keys for your account. If you don't have a Symbl.ai account, you can sign up here for free without providing a credit card. Once you have your keys, export your APP_ID and APP_SECRET as environment variables:

export APP_ID=YOUR-APP-ID-HERE
export APP_SECRET=YOUR-APP-SECRET-HERE

We use environment variables because they are easy to configure, they support PaaS-style deployments, and they work very well in containerized environments such as Docker and Kubernetes.
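The SDK handles login for you under the covers (more on that below), so having these variables set is typically all you need. If you want to sanity-check them yourself before streaming, a minimal sketch (using a hypothetical helper name, credentialsFromEnv) looks like this:

import (
    "log"
    "os"
)

// credentialsFromEnv is a hypothetical helper that simply verifies the
// variables the example expects are present before you start streaming.
func credentialsFromEnv() (appID, appSecret string) {
    appID = os.Getenv("APP_ID")
    appSecret = os.Getenv("APP_SECRET")
    if appID == "" || appSecret == "" {
        log.Fatal("APP_ID and APP_SECRET must be set")
    }
    return appID, appSecret
}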

A Little More About WebSockets

For those who might not be familiar with WebSockets, WebSocket is an internet protocol that allows for the bi-directional exchange of information between a client and a server. Typically, a small amount of information is exchanged up front to establish what this bi-directional exchange will look like; after that, data is exchanged asynchronously. If I drew a diagram of this process, it would look something like the picture below.

There are two types of exchanges in the protocol: a configuration exchange (or update), and the raw “data” flowing between the client and server. What the client sends differs depending on the type of server. In the case of the Symbl.ai Platform, the input is an audio data stream, and in return we get back conversational insights in the form of transcription, topics, trackers, and so on.
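To make that flow concrete, here is a minimal, generic sketch of the handshake-then-stream pattern using the popular gorilla/websocket package. The endpoint URL and message fields below are made up purely for illustration; the SDK implements the actual Symbl.ai protocol for you, so you never have to write this yourself.

package main

import (
    "fmt"
    "log"

    "github.com/gorilla/websocket"
)

func main() {
    // Open the WebSocket: an HTTP handshake that upgrades the connection.
    conn, _, err := websocket.DefaultDialer.Dial("wss://example.com/v1/realtime", nil)
    if err != nil {
        log.Fatalf("dial failed: %v", err)
    }
    defer conn.Close()

    // 1) Configuration exchange: a small JSON message sent up front.
    start := map[string]interface{}{
        "type":            "start_request",
        "sampleRateHertz": 16000,
    }
    if err := conn.WriteJSON(start); err != nil {
        log.Fatalf("write config failed: %v", err)
    }

    // 2) Asynchronous data exchange: the client streams binary audio frames
    //    with conn.WriteMessage(websocket.BinaryMessage, audioFrame), while
    //    the server pushes back insight messages, which we read in a loop.
    for {
        _, msg, err := conn.ReadMessage()
        if err != nil {
            log.Printf("read failed: %v", err)
            return
        }
        fmt.Printf("received: %s\n", msg)
    }
}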

Let’s Start Streaming Using WebSockets

As I mentioned in the previous section, the first step is to log into the Symbl.ai platform (the SDK takes care of this for you under the covers), and the second step is to pass a configuration that sets up the WebSocket session. You can do that by building a StreamingConfig object.

import (
   cfginterfaces "github.com/dvonthenen/symbl-go-sdk/pkg/client/interfaces"
)
 
config := &cfginterfaces.StreamingConfig{
   InsightTypes: []string{"topic", "question", "action_item", "follow_up"},
   Config: cfginterfaces.Config{
       MeetingTitle:        "my-meeting",
       ConfidenceThreshold: 0.7,
       SpeechRecognition: cfginterfaces.SpeechRecognition{
           Encoding:        "LINEAR16",
           SampleRateHertz: 16000,
       },
   },
   Speaker: cfginterfaces.Speaker{
       Name:   "Jane Doe",
       UserID: "jane.doe@example.com",
   },
}

The next thing you need to do is define a struct that implements the InsightCallback interface.

type InsightCallback interface {
   RecognitionResultMessage(rr *RecognitionResult) error
   MessageResponseMessage(mr *MessageResponse) error
   InsightResponseMessage(ir *InsightResponse) error
   TopicResponseMessage(tr *TopicResponse) error
   TrackerResponseMessage(tr *TrackerResponse) error
   UnhandledMessage(byMsg []byte) error
}

This struct will receive all of the conversational insights that the Streaming API defines. If you are just interested in seeing which messages you receive from the Symbl.ai platform, you can always pass in the DefaultMessageRouter, which prints the conversation insight structs to the console.
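If you want to handle the messages yourself, a minimal implementation can look like the sketch below. One assumption here: the response types (RecognitionResult, MessageResponse, and so on) are imported from the same interfaces package used earlier; adjust the import path to wherever the SDK actually defines them.

import (
    "fmt"

    cfginterfaces "github.com/dvonthenen/symbl-go-sdk/pkg/client/interfaces"
)

// MyHandler is a sketch of an InsightCallback implementation that simply
// prints a line for each message type it receives.
type MyHandler struct{}

func (h MyHandler) RecognitionResultMessage(rr *cfginterfaces.RecognitionResult) error {
    fmt.Println("recognition result (interim transcription) received")
    return nil
}

func (h MyHandler) MessageResponseMessage(mr *cfginterfaces.MessageResponse) error {
    fmt.Println("finalized message received")
    return nil
}

func (h MyHandler) InsightResponseMessage(ir *cfginterfaces.InsightResponse) error {
    fmt.Println("insight (question, action item, follow-up) received")
    return nil
}

func (h MyHandler) TopicResponseMessage(tr *cfginterfaces.TopicResponse) error {
    fmt.Println("topic received")
    return nil
}

func (h MyHandler) TrackerResponseMessage(tr *cfginterfaces.TrackerResponse) error {
    fmt.Println("tracker received")
    return nil
}

func (h MyHandler) UnhandledMessage(byMsg []byte) error {
    fmt.Printf("unhandled message: %s\n", byMsg)
    return nil
}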

Input Source What?!

The next thing we need to do is provide an input source for our audio or conversation. In this case, we're going to use the PortAudio-based microphone library mentioned earlier. To initialize and start the microphone, provide an AudioConfig and do the following:

// mic stuff: set up a signal channel so we can catch Ctrl+C later
sig := make(chan os.Signal, 1)
signal.Notify(sig, os.Interrupt, os.Kill)
 
mic, err := microphone.Initialize(microphone.AudioConfig{
   InputChannels: 1,
   SamplingRate:  16000,
})
if err != nil {
   fmt.Printf("Initialize failed. Err: %v\n", err)
   os.Exit(1)
}
 
// start the mic
err = mic.Start()
if err != nil {
   fmt.Printf("mic.Start failed. Err: %v\n", err)
   os.Exit(1)
}

Microphone, Meet WebSocket

Then we need to connect the microphone to the streaming client, which is beyond simple because the microphone library implements a Go streaming interface.

// this is a blocking call
mic.Stream(client)

That’s really it! When you connect the two, simply start talking into your microphone and you should begin seeing conversational insights being passed back to you.
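Because mic.Stream(client) blocks, one common pattern is to run it in a goroutine and use the sig channel we created earlier to wait for Ctrl+C before tearing things down. The Stop calls in this sketch are assumptions; check the microphone package and the streaming client for the exact cleanup API.

// run the blocking stream in the background
go func() {
    mic.Stream(client)
}()

// block until Ctrl+C arrives on the sig channel created earlier
<-sig

// hypothetical cleanup calls; consult the SDK for the exact methods
mic.Stop()
client.Stop()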

If you would like to run the Streaming example in the repo, you can install PortAudio, add the Symbl.ai API environment variables, and then run the following commands:

$ cd examples/streaming/
$ go run streaming.go

If you want to see this process in action, watch the quick video below:

What’s Next?

To prove out the Symbl.ai Go SDK, I created a project that consumes it. That project happens to be the demo used in my API World presentation on Nov. 2, 2022, and it will also be the subject of the next Symbl.ai Go SDK blog post.

In the meantime, please give the Symbl.ai Go SDK a try and provide feedback via issues, whether they are feature requests, enhancements, or bugs. The SDK is only as good as the feedback and ideas we receive from the people consuming it—so, please give it a try!

David vonThenen
Developer Advocate

David is a self-described Tech geek and Developer Advocate enabling others to process communications to derive conversation intelligence. David talks Kubernetes/containers, VMware virtualization, backup recovery/replication solutions, adaptors in hardware storage connectivity, and everything else that is tech!