How to Use WebSockets in Your Voice/Video App

WebSockets are the most popular protocol for real-time communication within a browser, like instant messaging, multiplayer games and voice/video calls. You can use open-source libraries to simplify your WebSocket setup, or go a step further with Symbl.ai’s WebSocket API to add real-time conversation intelligence to your app.

In the age of lightning-fast internet speeds and real-time interactions, there’s simply no room (or tolerance) for delays — particularly with voice/video applications.

Enter WebSocket with its low-latency, bidirectional messaging ability that suits any application designed for real-time communication within a browser. To get you up and running with this popular protocol, here’s a brief overview of what you can use to implement it, and a few clever ways you can take your voice/video app the extra mile.

The why and where of WebSockets

The general-use WebSocket protocol starts from the simplicity of HTTP's request/response model and upgrades the connection into a highly reactive, event-driven, full-duplex channel. This allows end-users to send and receive messages without delay and lets savvy developers bring real-time communication to anyone with a web browser and an internet connection.
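
To make the event-driven model concrete, here is a minimal sketch of a duplex channel built on the standard browser WebSocket API. The `openChannel` helper, its handler names, and the example URL are assumptions made up for illustration; the `WebSocketImpl` parameter exists only so the sketch can be exercised without a live server.

```javascript
// Minimal sketch of an event-driven, duplex channel over the standard WebSocket API.
// `openChannel` and its option names are hypothetical, not part of any library.
function openChannel(url, { onMessage, onClose } = {}, WebSocketImpl = globalThis.WebSocket) {
  const socket = new WebSocketImpl(url);
  const pending = []; // messages queued until the connection is open

  socket.onopen = () => {
    // Flush anything the caller tried to send before the handshake finished.
    while (pending.length > 0) socket.send(pending.shift());
  };
  socket.onmessage = (event) => {
    // The server can push data at any time; no request has to come first.
    if (onMessage) onMessage(event.data);
  };
  socket.onclose = (event) => {
    if (onClose) onClose(event);
  };

  return {
    send(text) {
      // readyState 1 means OPEN in the WebSocket API.
      if (socket.readyState === 1) socket.send(text);
      else pending.push(text);
    },
    close() {
      socket.close();
    },
  };
}

// Example usage in a browser (the URL is a placeholder):
// const channel = openChannel('wss://example.com/chat', { onMessage: (d) => console.log(d) });
// channel.send('hello');
```

Both directions share one long-lived connection: `send` can be called whenever the app has something to say, and `onMessage` fires whenever the server does.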

The type of apps that benefit the most from WebSockets typically involve real-time communication between multiple users and/or have to relay constantly-changing server-side data. This includes:

  • Instant messaging apps
  • Real-time collaboration
  • Multiplayer games
  • Real-time feeds (sports, social, finance, etc.)
  • Location-based apps
  • Audio/video calls

If you want to look under the hood and learn how WebSockets actually work, check out our post on WebSockets vs. SIP.

Using WebSockets with open-source libraries

Implementing WebSockets can be tricky if your app needs more than just basic support. Luckily, you can abstract away much of the setup using open-source libraries, including: 

Socket.io

This popular app framework consists of a Node.js server and a JavaScript client library for the browser. Unlike vanilla WebSockets, this library can establish a connection even if there's a personal firewall or antivirus software in the way, and it will handle a dropped connection by reconnecting automatically.

It also has the benefit of falling back on other transports, such as HTTP long polling (older releases also offered Flash sockets, among others), if WebSockets aren't supported, helping you create real-time apps that work everywhere.
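
For a sense of what that automatic handling of dropped connections saves you, here is a rough sketch of the retry logic you would otherwise write by hand around a plain WebSocket. It is not Socket.io's actual implementation; `backoffDelay`, `connectWithRetry`, and the delay values are assumptions chosen for illustration.

```javascript
// Hand-rolled reconnection sketch for a plain WebSocket (NOT Socket.io's internals).
// All names and timing values here are illustrative assumptions.

// Exponential backoff: 1s, 2s, 4s, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function connectWithRetry(url, onMessage, WebSocketImpl = globalThis.WebSocket, schedule = setTimeout) {
  let attempt = 0;
  let stopped = false;
  let current = null;

  const open = () => {
    if (stopped) return;
    const socket = new WebSocketImpl(url);
    current = socket;
    socket.onopen = () => { attempt = 0; }; // reset once a connection succeeds
    socket.onmessage = (event) => onMessage(event.data);
    socket.onclose = () => {
      if (stopped) return;
      // Wait a little longer after each failed attempt before reconnecting.
      schedule(open, backoffDelay(attempt));
      attempt += 1;
    };
  };

  open();
  return {
    // Stops reconnecting and closes the current socket.
    stop() {
      stopped = true;
      if (current) current.close();
    },
  };
}
```

A library such as Socket.io bundles this kind of bookkeeping, plus the transport fallback, behind a single connection call.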

SocketCluster

This is a real-time, client-server messaging framework for Node.js that's ideal for building chat systems. It can spread load across many processes and hosts, and is easy to scale once you've deployed your app to a Kubernetes cluster. This library takes slightly more effort to install than the easy-breezy Socket.io, but it's still reasonably simple to set up.

By the way, if you're building new real-time features over WebSockets with Node.js, check out Symbl.ai's Node.js SDK to make it happen right out of the box. To make your job even easier, Symbl.ai already works with popular platforms like Twilio, Agora.io, and AWS.

SockJS

SockJS mimics the WebSocket API but uses a SockJS JavaScript object instead. Some devs may find the need for this WebSocket emulator questionable, considering most modern browsers natively support WebSockets. But certain environments, such as networks behind corporate proxies, might not support the WebSocket protocol.

µWebSockets

µWebSockets is described as ‘the obvious starting point for any real-time web project with high demands.’ It’s generally considered the fastest WebSocket server implementation available and is actually used by SocketCluster for its performance and stability.

Taking your app up a notch with APIs

With so many voice/video apps already on the market, it’s worth thinking about elevating yours with dedicated APIs that work over WebSockets. This would give you all the advantages of real-time, two-way communication, while squeezing the most value out of your data.

For example, Symbl.ai’s Real-Time WebSocket API lets you plug in conversation intelligence using a suite of APIs to swiftly upgrade your app with features like:

Real-time intelligence: Analyze audio and surface useful insights when you need them — like whether a customer has called before, what their previous interaction was about and what information to give them next. You can analyze audio either in real time or asynchronously, and also access other useful data like speaker ratios, silence and overlap through Symbl.ai’s Analytics API.

Sentiment analysis: Pick up on people’s tone and word choices to determine their emotional state. Then use that information to help steer the conversation toward a positive outcome. Symbl.ai does this by transcribing speech in real time and analyzing those messages to pull sentiments.

Accurate transcriptions: Capture conversations (from a recording or in real time) with human-level comprehension. With advanced speech recognition and contextual AI, you can get highly accurate notes and summaries with added information like action items, topics, questions, and intents.

Automated actions: Translate insights into action, like automatically scheduling a follow-up meeting for specific team members or sending an email summary to a customer after a call.

Of course, you could build these capabilities yourself using open-source libraries, but the trade-off is a slower time to market and more effort needed to scale. Your best bet is to harness existing APIs, like Symbl.ai’s, that have already done the heavy lifting for you. Then all you need to do is focus on turning accurate, real-time insights into actionable data.

Implementing WebSockets with Symbl.ai

To try out Symbl.ai's Streaming API in the browser with your microphone, sign up for a free account. Once you've done that, authenticate with cURL or Postman to get a bearer access token, then paste that token into the accessToken constant in the snippet below. Lastly, open Chrome, open the DevTools console (Control + Shift + J on Windows and Linux, Command + Option + J on macOS), paste in the code, and hit Enter!
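
If you would rather script the token step than use cURL or Postman, it could look roughly like the sketch below. The endpoint URL, the request body fields, and the `accessToken` response field are assumptions based on Symbl.ai's token API, so verify them against the Authentication documentation; `buildTokenRequest` and `fetchAccessToken` are hypothetical helper names.

```javascript
// Sketch of the authentication step. ASSUMPTIONS: the token endpoint is
// POST https://api.symbl.ai/oauth2/token:generate, it accepts an appId/appSecret
// pair, and it responds with an `accessToken` field. Check the Authentication docs.
const TOKEN_URL = 'https://api.symbl.ai/oauth2/token:generate'; // assumed endpoint

// Builds the request options without sending anything.
function buildTokenRequest(appId, appSecret) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: 'application', appId, appSecret }),
  };
}

// Sends the request and resolves to the token string.
async function fetchAccessToken(appId, appSecret, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(TOKEN_URL, buildTokenRequest(appId, appSecret));
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  const { accessToken } = await response.json(); // assumed response field name
  return accessToken;
}

// Example usage (placeholders, not real credentials):
// fetchAccessToken('YOUR_APP_ID', 'YOUR_APP_SECRET').then((token) => console.log(token));
```

Run this from a server or a local script rather than shipping it in client-side code, since the app secret should stay private. With a token in hand, the streaming code looks like this: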

/**
 * The JWT token you get after authenticating with our API.
 * Check the Authentication section of the documentation for more details.
 */
const accessToken = ""
const uniqueMeetingId = btoa("user@example.com")
const symblEndpoint = `wss://api.symbl.ai/v1/realtime/insights/${uniqueMeetingId}?access_token=${accessToken}`;
const ws = new WebSocket(symblEndpoint);
// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
  // After parsing, the conversationId is available at data.message.data.conversationId
  const data = JSON.parse(event.data);
  if (data.type === 'message' && data.message.hasOwnProperty('data')) {
    console.log('conversationId', data.message.data.conversationId);
  }
  if (data.type === 'message_response') {
    for (let message of data.messages) {
      console.log('Transcript (more accurate): ', message.payload.content);
    }
  }
  if (data.type === 'topic_response') {
    for (let topic of data.topics) {
      console.log('Topic detected: ', topic.phrases)
    }
  }
  if (data.type === 'insight_response') {
    for (let insight of data.insights) {
      console.log('Insight detected: ', insight.payload.content);
    }
  }
  if (data.type === 'message' && data.message.hasOwnProperty('punctuated')) {
    console.log('Live transcript (less accurate): ', data.message.punctuated.transcript)
  }
  console.log(`Response type: ${data.type}. Object: `, data);
};
// Fired when the WebSocket closes unexpectedly due to an error or lost connection
ws.onerror = (err) => {
  console.error(err);
};
// Fired when the WebSocket connection has been closed
ws.onclose = (event) => {
  console.info('Connection to websocket closed');
};
// Fired when the connection succeeds.
ws.onopen = (event) => {
  ws.send(JSON.stringify({
    type: 'start_request',
    meetingTitle: 'Websockets How-to', // Conversation name
    insightTypes: ['question', 'action_item'], // Will enable insight generation
    config: {
      confidenceThreshold: 0.5,
      languageCode: 'en-US',
      speechRecognition: {
        encoding: 'LINEAR16',
        sampleRateHertz: 44100, // Must match the AudioContext's actual sampleRate, which varies by device
      }
    },
    speaker: {
      userId: 'example@symbl.ai',
      name: 'Example Sample',
    }
  }));
};
const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });
/**
 * The callback function which fires after a user gives the browser permission to use
 * the computer's microphone. Starts a recording session which sends the audio stream to
 * the WebSocket endpoint for processing.
 */
const handleSuccess = (stream) => {
  const AudioContext = window.AudioContext;
  const context = new AudioContext();
  const source = context.createMediaStreamSource(stream);
  const processor = context.createScriptProcessor(1024, 1, 1);
  const gainNode = context.createGain();
  source.connect(gainNode);
  gainNode.connect(processor);
  processor.connect(context.destination);
  processor.onaudioprocess = (e) => {
    // Convert the float samples (range -1.0 to 1.0) to a 16-bit PCM payload
    const inputData = e.inputBuffer.getChannelData(0);
    const targetBuffer = new Int16Array(inputData.length);
    for (let index = 0; index < inputData.length; index++) {
      // Clamp to [-1, 1] before scaling so out-of-range samples don't wrap around
      targetBuffer[index] = 32767 * Math.max(-1, Math.min(1, inputData[index]));
    }
    // Send audio stream to websocket.
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(targetBuffer.buffer);
    }
  };
};
handleSuccess(stream);
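
The 16-bit conversion inside onaudioprocess is the step most likely to go wrong, so here it is isolated as a standalone sketch that can be checked on its own. `floatTo16BitPCM` is a hypothetical helper name; the scaling by 32767 mirrors the snippet above and is one common convention for LINEAR16 audio, not the only one.

```javascript
// Converts Web Audio float samples (nominally -1.0 to 1.0) to 16-bit signed PCM.
// `floatTo16BitPCM` is a hypothetical helper, not part of any Symbl.ai SDK.
function floatTo16BitPCM(input) {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first: values outside [-1, 1] would otherwise wrap around in an Int16Array.
    const clamped = Math.max(-1, Math.min(1, input[i]));
    // Int16Array drops the fractional part on assignment.
    output[i] = clamped * 32767;
  }
  return output;
}

// Example usage inside the audio callback:
// ws.send(floatTo16BitPCM(e.inputBuffer.getChannelData(0)).buffer);
```

Keeping the conversion in a small pure function like this makes it easy to verify the edge cases (silence, full scale, clipping) without a microphone or a live connection.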

Additional reading

For more information on WebSockets, how to use them and how they compare to similar protocols, check out these sources:

How to use Symbl’s voice API over WebSocket to generate real-time insights

WebSocket or SIP: which is better for your app?

WebRTC or WebSockets: which is right for your app?

How come I can’t cURL a WebSocket?

How to secure your WebSocket connections