Basic Code for Live Conversation Intelligence with Javascript WebSocket API

WebSocket is a protocol for establishing two-way communication streams over the Internet. By using a conversation intelligence API over WebSockets, you can capture insights from conversations in real time. Symbl.ai’s Streaming API integrates directly into JavaScript WebSocket software, giving you access to conversational AI during live conversations.

WebSocket is a protocol that Real-Time Communications (RTC) and Real-Time Engagement (RTE) companies use to establish two-way or multi-member voice, video, messaging, or broadcast streams over the internet. Many of the best-known RTC/RTE companies, such as Twilio, Vonage, and Agora, use WebSockets in their mobile and browser Software Development Kits (SDKs). Zoom, for example, employs WebSockets as the underlying technology that transfers voice and video to all participants on a call simultaneously.

Adding real-time insights to your app with an API

If you have a WebSocket stream set up with one of those companies but don’t know how to record a call, perform live transcription, or run algorithms on the results of a conversation, then you need to use an API. An API for WebSockets enables developers to build voice, video, chat, or broadcast applications with the ability to capture insights in real time using artificial intelligence. One option is Symbl.ai’s Telephony API; however, Symbl.ai’s WebSocket-based Streaming API integrates directly into JavaScript software to enable real-time conversation intelligence.

WebSockets for Symbl.ai


Basic representation of how WebSockets transfer data.

WebSockets facilitate real-time communication between clients and servers without the sluggish, high-latency, bandwidth-intensive overhead of repeated HTTP API calls. Specified as ws (unencrypted) or wss (encrypted), a WebSocket endpoint is a persistent, bi-directional communication channel free of HTTP residuals such as headers, cookies, or artifacts. Here’s Symbl.ai’s API endpoint for WebSockets: wss://api.symbl.ai/v1/realtime/. Because the protocol is full-duplex with millisecond-accurate state synchronization, many APIs model WebSocket messages as flowing from publishers to subscribers; TokBox, for example, names its channel software’s methods entirely from this publisher/subscriber perspective.

Best Practices

Although you can already find a detailed guide on best practices for audio integrations with Symbl, here’s the 101 on adopting Symbl.ai’s real-time WebSocket API.

  • Separate channels per person: Make sure the audio for each person is on a separate channel or over separate WebSocket connections for optimal results.
  • Lossless Codecs: Use FLAC or LINEAR16 codecs if bandwidth is not an issue; otherwise, use the Opus, AMR_WB, or Speex codecs.
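As a sketch, these practices map directly onto the start_request message used later in this post: one WebSocket connection (and therefore one start_request) per speaker, with the codec declared in speechRecognition. The exact encoding strings (e.g. 'OPUS') are assumptions to verify against Symbl's docs.

```javascript
// Sketch: one start_request per speaker, following the best practices above.
// Field names mirror the start_request message shown later in this post;
// encoding strings like 'OPUS' are assumptions to check against Symbl's docs.
function buildStartRequest(speaker, encoding = 'LINEAR16', sampleRateHertz = 44100) {
  return {
    type: 'start_request',
    insightTypes: ['question', 'action_item'],
    config: {
      languageCode: 'en-US',
      speechRecognition: { encoding, sampleRateHertz },
    },
    speaker, // e.g. { userId: 'alice@example.com', name: 'Alice' }
  };
}

// One connection per participant keeps each person's audio on its own channel:
const speakers = [
  { userId: 'alice@example.com', name: 'Alice' },
  { userId: 'bob@example.com', name: 'Bob' },
];
const requests = speakers.map((s) => buildStartRequest(s, 'OPUS', 48000));
```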

How to use Symbl’s JavaScript WebSockets API

To achieve live transcription with Symbl.ai’s WebSocket API in the browser, all you have to do is:

  1. Set up your account
  2. Configure the WebSocket endpoint
  3. Create a WebSocket instance
  4. Run the code in your browser.

We’ll go through each step.

1. Set up

Register for an account at Symbl (https://platform.symbl.ai/). Grab both your appId and your appSecret. With both of these, authenticate either with a cURL command or with Postman to get your x-api-key. Here’s an example with cURL:

curl -k -X POST "https://api.symbl.ai/oauth2/token:generate" \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -d "{ \"type\": \"application\", \"appId\": \"\", \"appSecret\": \"\"}"

Ideally, a token server would handle authentication (with code that makes RESTful API calls for generating a token), so neither the appSecret nor the appId are ever exposed. However, cURL sets you up immediately. With the x-api-key handy you’re now ready to establish a WebSocket endpoint for performing live transcription.
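The same token request can be made from JavaScript itself. Here is a minimal sketch using the global fetch (Node 18+ or the browser); the accessToken field name in the response is an assumption to verify against Symbl's token documentation.

```javascript
// Sketch: the cURL token request above, expressed in JavaScript.
// buildTokenRequest is a pure helper; fetchSymblToken performs the call.
function buildTokenRequest(appId, appSecret) {
  return {
    url: 'https://api.symbl.ai/oauth2/token:generate',
    options: {
      method: 'POST',
      headers: { accept: 'application/json', 'Content-Type': 'application/json' },
      body: JSON.stringify({ type: 'application', appId, appSecret }),
    },
  };
}

async function fetchSymblToken(appId, appSecret) {
  const { url, options } = buildTokenRequest(appId, appSecret);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  // Assumed response field name; check Symbl's token docs.
  return (await res.json()).accessToken;
}
```

In production this code would live on a token server so the appSecret never reaches the browser.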

2. Configure the WebSocket endpoint

To enable the WebSocket, configure two values that are fed as query parameters directly into the WebSocket API’s endpoint. In turn, you feed that endpoint into JavaScript’s native WebSocket constructor.

Here are the first two values:

const uniqueMeetingId = btoa('EMAIL@ADDRESS.COM');
const accessToken = '';

With these two values set, you feed them directly into the WebSocket API’s endpoint:

const symblEndpoint = `wss://api.symbl.ai/v1/realtime/insights/${uniqueMeetingId}?access_token=${accessToken}`;

Suppose you want to test your JavaScript WebSocket before any further integration. In that case, load the endpoint, with the uniqueMeetingId and the accessToken filled in, into Hoppscotch.io, a free, fast, sleek API request builder.
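As a quick sketch, the endpoint construction above can be wrapped in a helper that works both in the browser (where btoa is available) and in Node.js (using Buffer):

```javascript
// Sketch: build the Symbl streaming endpoint from an email and access token.
// btoa() exists in browsers; Buffer is the Node.js fallback for base64.
function buildSymblEndpoint(email, accessToken) {
  const uniqueMeetingId =
    typeof btoa === 'function'
      ? btoa(email)
      : Buffer.from(email, 'utf8').toString('base64');
  return `wss://api.symbl.ai/v1/realtime/insights/${uniqueMeetingId}?access_token=${accessToken}`;
}
```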


3. Create an Instance of JavaScript’s native WebSocket API

The next step is to create an instance of JavaScript’s native WebSocket API:

const ws = new WebSocket(symblEndpoint);

Next, attach handlers for the connection’s lifecycle events and for incoming transcription messages.

// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
 console.log(event);
};
// Fired when an error occurs on the WebSocket connection
ws.onerror  = (err) => {
 console.error(err);
};
// Fired when the WebSocket connection has been closed
ws.onclose = (event) => {
 console.info('Connection to websocket closed');
};
// Fired when the connection succeeds.
ws.onopen = (event) => {
 ws.send(JSON.stringify({
   type: 'start_request',
   meetingTitle: 'Websockets How-to', // Conversation name
   insightTypes: ['question', 'action_item'], // Will enable insight generation
   config: {
     confidenceThreshold: 0.5,
     languageCode: 'en-US',
     speechRecognition: {
       encoding: 'LINEAR16',
       sampleRateHertz: 44100,
     }
   },
   speaker: {
     userId: 'example@symbl.ai',
     name: 'Example Sample',
   }
 }));
};

To set up the stream for accessing a user’s media devices, such as their laptop’s microphone, you’ll need to program the browser’s navigator accordingly (note that top-level await requires an ES module or an enclosing async function):

const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });

The code above asks the user for permission to access their device through a pop-up. In mobile applications, permission for device access is declared in the application binary’s controlling file, such as the manifest for Android or the property list for iOS. Browser JavaScript has no such binary, so permission is requested from the user at runtime instead.
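A minimal sketch of feature-detecting getUserMedia before prompting; the navigator object is passed in as a parameter purely so the check can be exercised outside a browser:

```javascript
// Sketch: verify the environment supports audio capture before prompting.
function canCaptureAudio(nav) {
  return !!(nav && nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === 'function');
}

async function requestMicrophone(nav) {
  if (!canCaptureAudio(nav)) {
    throw new Error('getUserMedia is not supported in this environment');
  }
  // Triggers the browser's permission pop-up on first call.
  return nav.mediaDevices.getUserMedia({ audio: true, video: false });
}
```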

To set up the stream to handle events, program the following:

const handleSuccess = (stream) => {
 const AudioContext = window.AudioContext;
 const context = new AudioContext();
 const source = context.createMediaStreamSource(stream);
 const processor = context.createScriptProcessor(1024, 1, 1);
 const gainNode = context.createGain();
 source.connect(gainNode);
 gainNode.connect(processor);
 processor.connect(context.destination);
 processor.onaudioprocess = (e) => {
   // Convert the Float32 samples to a 16-bit PCM payload
   const inputData = e.inputBuffer.getChannelData(0);
   const targetBuffer = new Int16Array(inputData.length);
   for (let index = 0; index < inputData.length; index++) {
       targetBuffer[index] = 32767 * Math.max(-1, Math.min(1, inputData[index]));
   }
   // Send to websocket
   if (ws.readyState === WebSocket.OPEN) {
     ws.send(targetBuffer.buffer);
   }
 };
};
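The conversion inside onaudioprocess can be pulled out as a pure function, which makes the clamping and scaling easier to see (a sketch for illustration, not part of Symbl's SDK): each Float32 sample in [-1, 1] is clamped and scaled to the signed 16-bit range that LINEAR16 expects.

```javascript
// Sketch: Float32 audio samples -> 16-bit PCM, as done in onaudioprocess.
function floatTo16BitPCM(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, float32Samples[i]));
    out[i] = clamped * 32767;
  }
  return out;
}
```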

Here’s the complete code:

const uniqueMeetingId = btoa('email@address.com');
const accessToken = '';
const symblEndpoint = `wss://api.symbl.ai/v1/realtime/insights/${uniqueMeetingId}?access_token=${accessToken}`;
const ws = new WebSocket(symblEndpoint);
// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
 console.log(event);
};
// Fired when an error occurs on the WebSocket connection
ws.onerror  = (err) => {
 console.error(err);
};
// Fired when the WebSocket connection has been closed
ws.onclose = (event) => {
 console.info('Connection to websocket closed');
};
// Fired when the connection succeeds.
ws.onopen = (event) => {
 ws.send(JSON.stringify({
   type: 'start_request',
   meetingTitle: 'Websockets How-to', // Conversation name
   insightTypes: ['question', 'action_item'], // Will enable insight generation
   config: {
     confidenceThreshold: 0.5,
     languageCode: 'en-US',
     speechRecognition: {
       encoding: 'LINEAR16',
       sampleRateHertz: 44100,
     }
   },
   speaker: {
     userId: 'example@symbl.ai',
     name: 'Example Sample',
   }
 }));
};
const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });
const handleSuccess = (stream) => {
 const AudioContext = window.AudioContext;
 const context = new AudioContext();
 const source = context.createMediaStreamSource(stream);
 const processor = context.createScriptProcessor(1024, 1, 1);
 const gainNode = context.createGain();
 source.connect(gainNode);
 gainNode.connect(processor);
 processor.connect(context.destination);
 processor.onaudioprocess = (e) => {
   // Convert the Float32 samples to a 16-bit PCM payload
   const inputData = e.inputBuffer.getChannelData(0);
   const targetBuffer = new Int16Array(inputData.length);
   for (let index = 0; index < inputData.length; index++) {
       targetBuffer[index] = 32767 * Math.max(-1, Math.min(1, inputData[index]));
   }
   // Send to websocket
   if (ws.readyState === WebSocket.OPEN) {
     ws.send(targetBuffer.buffer);
   }
 };
};
handleSuccess(stream);

4. Run the code in your browser

If you implemented your JavaScript right, you can now run your code directly in the browser.


To stop your live transcription, close your browser or send a “stop_request” directly to the server.

ws.send(JSON.stringify({
 "type": "stop_request"
}));
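A small sketch wrapping that shutdown sequence in a helper, guarding on readyState the way the earlier send code does; `ws` can be a real WebSocket or any object with the same three members:

```javascript
// Sketch: send stop_request only while the socket is open, then close it.
const OPEN = 1; // value of WebSocket.OPEN

function stopTranscription(ws) {
  if (ws.readyState === OPEN) {
    ws.send(JSON.stringify({ type: 'stop_request' }));
  }
  ws.close();
}
```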

Conclusion

You should now be able to integrate Symbl’s API directly into your JavaScript to transcribe a conversation live from the browser. Congratulations!

If you look closely at the data the server returns, you’ll find a conversationId that can be applied to new API calls to access AI insights such as action items and topics. You can hit these API endpoints with cURL commands or Postman, or integrate Symbl’s WebSocket API with major RTC/RTE companies like Vonage, Agora, Twilio, Dolby, and Getstream.
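As a sketch, those follow-up calls share a common URL shape; treat the exact resource names below as assumptions to verify against Symbl's Conversation API docs:

```javascript
// Sketch: build a Conversation API URL from a conversationId.
// Resource names ('messages', 'topics', 'action-items', ...) are assumptions
// to check against Symbl's Conversation API reference.
function conversationUrl(conversationId, resource) {
  return `https://api.symbl.ai/v1/conversations/${conversationId}/${resource}`;
}

// Usage: fetch(conversationUrl(id, 'topics'), { headers: { 'x-api-key': token } })
```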

If you get stuck, send your burning questions to our Slack Channel, or drop us an email at devrelations@symbl.ai.
