Process Audio Recording with Symbl Async API

In this blog post, you’ll learn how to process multi-channel audio with the Async Audio URL API. Once the asynchronous job completes, you can call the Conversation APIs to get insights such as Topics, Follow-Ups, Questions, Action Items, and Trackers. To process a multi-channel file, you need the audio channel count and the metadata for each channel, such as the speaker’s name and email.

Let’s first look at what multi-channel audio is and how it’s built. An audio file can be either a mono or a stereo recording. A recording that consists of more than one audio channel is called a stereo or multi-channel recording. There are various tools for building multi-channel audio; this post uses FFmpeg, an open-source tool. Please follow this tutorial to build a proper multi-channel recording with FFmpeg.
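As a rough illustration of what building a two-channel file can look like, the Python sketch below assembles an FFmpeg command that joins two mono recordings into one stereo file using FFmpeg's `join` audio filter. The file names and the helper name `build_stereo_merge_command` are hypothetical, and the linked tutorial may use a different approach.

```python
import shlex


def build_stereo_merge_command(left_path, right_path, output_path):
    """Return an FFmpeg argument list that puts each mono input on its own channel."""
    return [
        "ffmpeg",
        "-i", left_path,    # first input becomes the first channel
        "-i", right_path,   # second input becomes the second channel
        "-filter_complex", "[0:a][1:a]join=inputs=2:channel_layout=stereo[a]",
        "-map", "[a]",      # write only the joined stream to the output
        output_path,
    ]


# Hypothetical file names, one mono recording per speaker.
command = build_stereo_merge_command("speaker1.wav", "speaker2.wav", "call_stereo.wav")
print(shlex.join(command))

# To actually run it (requires FFmpeg on your PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```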

Before sending an Async Audio URL request to Symbl, you can add a few parameters to the request body to expand the insights and increase transcript accuracy.

Transcript accuracy: How can this be achieved?

1. If the recording was made on two or more channels, you can enable Speaker Separated Channel audio processing and attach the speaker name and details to each channel with the channelMetadata parameter (see the cURL example later in this post).

2. Add a custom vocabulary, which is a list of words and phrases that provide hints to the speech recognition task.

3. If the recorded audio has a sample rate of 8 kHz, adding "mode": "phone" to the request body can improve the transcript quality.

Note: If your recording’s sample rate is higher than 8 kHz, there is no need to add this parameter.
Note: More details on the parameters can be found at this link.
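The three accuracy options above can be combined in a single request body. The following is a minimal Python sketch: `channelMetadata` and `mode` are taken from the cURL example in this post, while `enableSeparateRecognitionPerChannel` and `customVocabulary` are parameter names assumed here and should be checked against the Async API documentation. The helper name `build_accuracy_params` and the vocabulary entries are hypothetical.

```python
import json


def build_accuracy_params(speakers, vocabulary=None, sample_rate_hz=None):
    """Build the accuracy-related part of an Async Audio URL request body.

    speakers: list of (name, email) tuples, one per audio channel, in channel order.
    """
    params = {
        # Assumed parameter name: turns on per-channel speech recognition.
        "enableSeparateRecognitionPerChannel": True,
        "channelMetadata": [
            {"channel": index, "speaker": {"name": name, "email": email}}
            for index, (name, email) in enumerate(speakers, start=1)
        ],
    }
    if vocabulary:
        # Assumed parameter name: hints for the speech recognition task.
        params["customVocabulary"] = list(vocabulary)
    if sample_rate_hz == 8000:
        # Phone mode is only added for 8 kHz recordings.
        params["mode"] = "phone"
    return params


accuracy_params = build_accuracy_params(
    speakers=[
        ("Robert Bartheon", "robertbartheon@example.com"),
        ("John Snow", "johnny@example.com"),
    ],
    vocabulary=["Symbl", "Async API"],  # hypothetical vocabulary
    sample_rate_hz=8000,
)
print(json.dumps(accuracy_params, indent=2))
```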

More insights options: How can this be achieved?

  • Entities – Adding the parameter “detectEntities”: true to the request body finds the entities in the conversation, such as location, person, date, number, organization, datetime, and daterange.
  • Trackers – Trackers find the themes and business insights that you want to trace in the conversation. The simplest way to use them is to add the trackers parameter to the request body as a list of dictionaries, each containing a tracker’s name and vocabulary. For example:
"trackers": [
    {
        "name": "Hire interns",
        "vocabulary": [
            "would like to interview",
            "hire our candidates",
            "hire high school interns"
        ]
    },
    {
        "name": "c-level",
        "vocabulary": [
            "CEO",
            "Co-founder",
            "CTO",
            "CFO"
        ]
    }
]

Note: You can also enable trackers by first creating them with the Tracker Management API and then either using all of them (enableAllTrackers) or selecting a few of them by passing a list of tracker IDs in the trackers option. More details can be found in the Trackers API documentation.
For example:
"enableAllTrackers": true
Or, alternatively, use the IDs of trackers created with the Tracker Management API (replace each “id” value with your own tracker ID):

"trackers": [
    { "id": "6581143257219072" },
    { "id": "5044262090571776" },
    { "id": "6191012855676928" },
    { "id": "5512700349120512" }
]

  • Summarization (Beta) – This allows you to generate conversation summaries using the Summary API.
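To make the tracker options above concrete, here is a small Python sketch that produces the request-body fragment for each of the three styles described: inline names with vocabulary, tracker IDs, or `enableAllTrackers`. The helper name `build_tracker_params` is hypothetical, and the sketch deliberately accepts only one style at a time for clarity; whether the API allows combining them is not covered in this post.

```python
def build_tracker_params(vocabularies=None, tracker_ids=None, enable_all=False):
    """Return the tracker-related fragment of the request body.

    vocabularies: dict mapping a tracker name to its list of phrases.
    tracker_ids:  list of IDs of trackers created beforehand.
    enable_all:   use every tracker already created on the account.
    """
    chosen = sum([bool(vocabularies), bool(tracker_ids), bool(enable_all)])
    if chosen != 1:
        raise ValueError("Choose exactly one way of enabling trackers")
    if enable_all:
        return {"enableAllTrackers": True}
    if tracker_ids:
        return {"trackers": [{"id": tracker_id} for tracker_id in tracker_ids]}
    return {
        "trackers": [
            {"name": name, "vocabulary": list(phrases)}
            for name, phrases in vocabularies.items()
        ]
    }


# Entities plus one inline tracker, merged into a single fragment.
insight_params = {"detectEntities": True}
insight_params.update(
    build_tracker_params(vocabularies={"c-level": ["CEO", "Co-founder", "CTO", "CFO"]})
)
print(insight_params)
```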

Async POST Audio URL, cURL example:

curl --location --request POST 'https://api.symbl.ai/v1/process/audio/url' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "",
    "confidenceThreshold": 0.6,
    "timezoneOffset": 0,
    "name": "",
    "mode": "phone",
    "channelMetadata": [
        {
            "channel": 1,
            "speaker": {
                "name": "Robert Bartheon",
                "email": "robertbartheon@example.com"
            }
        },
        {
            "channel": 2,
            "speaker": {
                "name": "John Snow",
                "email": "johnny@example.com"
            }
        }
    ],
    "trackers": [
        {
            "name": "Hire interns",
            "vocabulary": [
                "would like to interview",
                "hire our candidates",
                "hire high school interns"
            ]
        },
        {
            "name": "c-level",
            "vocabulary": [
                "CEO",
                "Co-founder",
                "CTO",
                "CFO"
            ]
        }
    ],
    "detectEntities": true
}'
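If you prefer Python over cURL, the same request can be sketched with the standard library. The endpoint, header names, and body fields below mirror the cURL example; the API key, audio URL, conversation name, and the helper name `build_async_audio_request` are placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.symbl.ai/v1/process/audio/url"


def build_async_audio_request(api_key, audio_url, name, extra_params=None):
    """Build (but do not send) the Async Audio URL POST request."""
    body = {
        "url": audio_url,
        "name": name,
        "confidenceThreshold": 0.6,
        "timezoneOffset": 0,
    }
    # Optional parameters such as mode, channelMetadata, trackers, detectEntities.
    body.update(extra_params or {})
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_async_audio_request(
    api_key="YOUR_API_KEY",                          # placeholder
    audio_url="https://example.com/recording.wav",   # placeholder
    name="Example call",                             # placeholder
    extra_params={"mode": "phone", "detectEntities": True},
)
print(request.get_method(), request.full_url)

# To send it:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```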

Getting Conversation Insights once the Async API job is completed:

Below are the parts of the Conversation API that you can use to get the conversation insights.
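Because the insights are only available once the job has finished, a client usually polls the job status first. The sketch below shows one way to do that in Python. It assumes a job status endpoint of the form `https://api.symbl.ai/v1/job/{jobId}` and the status strings `completed` and `failed`, neither of which is stated in this post; the HTTP call itself is passed in as the hypothetical `fetch_status` callable so that the polling logic stays easy to test.

```python
import time


def wait_for_job(fetch_status, job_id, interval_seconds=5.0, max_attempts=60,
                 sleep=time.sleep):
    """Poll until the job reports a terminal status.

    fetch_status: callable that takes a job ID and returns its status string
                  (for example by calling the assumed job status endpoint).
    """
    for _ in range(max_attempts):
        status = fetch_status(job_id)
        if status == "completed":   # assumed status string
            return status
        if status == "failed":      # assumed status string
            raise RuntimeError(f"Job {job_id} failed")
        sleep(interval_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_attempts} checks")


# Simulated status sequence instead of a real HTTP call.
fake_statuses = iter(["scheduled", "in_progress", "completed"])
result = wait_for_job(
    lambda job_id: next(fake_statuses),
    "example-job-id",            # placeholder job ID
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result)
```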

POST Formatted Transcript

The API returns a formatted transcript in Markdown and SRT format.
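As a rough sketch, a formatted transcript request could be built in Python as follows. The path `/v1/conversations/{conversationId}/transcript`, the `contentType` body field, and its values `text/markdown` and `text/srt` are assumptions not given in this post, so verify them in the Conversation API documentation. The `x-api-key` header follows the cURL example above, and the conversation ID is a placeholder.

```python
import json
import urllib.request

# Assumed content type values for the two formats.
SUPPORTED_CONTENT_TYPES = ("text/markdown", "text/srt")


def build_transcript_request(api_key, conversation_id, content_type="text/markdown"):
    """Build (but do not send) a POST request for a formatted transcript."""
    if content_type not in SUPPORTED_CONTENT_TYPES:
        raise ValueError(f"Unsupported content type: {content_type}")
    # Assumed endpoint path.
    url = f"https://api.symbl.ai/v1/conversations/{conversation_id}/transcript"
    body = {"contentType": content_type}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


transcript_request = build_transcript_request(
    "YOUR_API_KEY", "example-conversation-id", content_type="text/srt"
)
print(transcript_request.full_url)
```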

Messages

The Messages API returns a list of all the messages in a conversation, with an option to get the sentiment of each message and a verbose option to get the word-level timestamps for each message.
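The sentiment and verbose options are typically passed as query parameters. The Python sketch below builds Conversation API URLs under the assumption that each insight lives at `https://api.symbl.ai/v1/conversations/{conversationId}/<insight>` and that the query parameters are named `sentiment` and `verbose`; the path segments, the helper name `conversation_url`, and the conversation ID are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.symbl.ai/v1"

# Assumed path segments for the insights covered in this post.
INSIGHT_PATHS = {
    "messages": "messages",
    "topics": "topics",
    "follow_ups": "follow-ups",
    "action_items": "action-items",
    "questions": "questions",
    "trackers": "trackers",
    "analytics": "analytics",
    "entities": "entities",
    "summary": "summary",
}


def conversation_url(conversation_id, insight, **query):
    """Return the URL for one insight of a conversation, with optional query parameters."""
    url = f"{BASE_URL}/conversations/{conversation_id}/{INSIGHT_PATHS[insight]}"
    if query:
        url += "?" + urlencode(query)
    return url


messages_url = conversation_url(
    "example-conversation-id", "messages", sentiment="true", verbose="true"
)
print(messages_url)
print(conversation_url("example-conversation-id", "follow_ups"))
```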

Follow-Ups

This is a category of action items with a connotation of following up on a request or a task, such as sending an email, making a phone call, booking an appointment, or setting up a meeting.

Action Items

This API returns a list of all the action items generated from the conversation. An action item is a specific outcome recognized in the conversation that requires one or more people in the conversation to act in the future.

Questions

This API helps you find explicit questions or requests for information that come up during the conversation, whether answered or not.

Trackers

With this API, you can define and modify named groups of key phrases and words of your choice and detect specific or “contextually similar” occurrences of them in any conversation. Trackers can be added to Async or real-time calls. The result is a single JSON structure for the whole conversation that contains all the matches found, so you can locate the relevant information at any point in the conversation.

Sentiments

Provides a measure of sentiment in the transcript at the message and topic level: positive, negative, or neutral.
Note: Topic sentiments are available post-conversation.
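To illustrate how message-level sentiment might be consumed, the Python sketch below tallies messages into positive, negative, and neutral buckets. The response layout (`sentiment.polarity.score` per message), the score range of -1.0 to 1.0, and the cut-offs of -0.3 and 0.3 are all assumptions made for this example rather than values taken from this post, and the sample messages are invented.

```python
def classify_polarity(score, negative_cutoff=-0.3, positive_cutoff=0.3):
    """Map a polarity score to a label using assumed cut-off values."""
    if score <= negative_cutoff:
        return "negative"
    if score >= positive_cutoff:
        return "positive"
    return "neutral"


def sentiment_counts(messages):
    """Count messages per sentiment label."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for message in messages:
        # Assumed field layout of a message returned with sentiment enabled.
        score = message["sentiment"]["polarity"]["score"]
        counts[classify_polarity(score)] += 1
    return counts


# Invented sample data in the assumed layout.
sample_messages = [
    {"text": "That sounds great.", "sentiment": {"polarity": {"score": 0.8}}},
    {"text": "I will send the email.", "sentiment": {"polarity": {"score": 0.1}}},
    {"text": "This is not working.", "sentiment": {"polarity": {"score": -0.6}}},
]
print(sentiment_counts(sample_messages))
```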

More post-conversation insights and UI:

Analytics

Provides functionality for finding the speaker ratio, talk time, silence, pace, and overlap in a conversation, and is not limited by the number of participants per channel.

Note: Relevant for conversations with speaker separation.

Experience API

Use this API to create a pre-built Summary UI. This is a great way to review some of the features without diving deep into every single one.

Entities

This API lets you extract entities from the conversation. Each entity belongs to a category specified by the entity’s associated type. The platform generates entities related to the insight types for datetime and person.

Summarization (Beta)

Summarization captures the key discussion points in a conversation, which shortens the time needed to grasp its contents. Using the Summary API, you can create summaries that are succinct and context-based.

Comprehensive Action Item (Labs)

Similar to Action Items, but with added details such as speaker identification and context.

Conclusion

In this blog post, you have seen how to process audio recordings using the Symbl Async Audio URL API and how to get insights (Topics, Questions, Action Items, Follow-Ups, Trackers, etc.) using the Conversation APIs. In addition to the regular insights, you have seen how to use Entities and Trackers to get additional insights that serve your business needs.

Resources

Async Audio API
Trackers
Summary
Conversation API

Janhavi Sathe
Associate Solutions Engineer