If you're building a communication experience with native voice or video, you're probably building with an open source communication stack like Jitsi, a cloud API like Twilio or Agora, or a communication platform like Zoom or Microsoft Teams.
In any case, building a communication experience doesn't end with adding voice or video; it goes beyond enablement. The 2.0 of communication experiences is defined by conversation intelligence, applied in real time during a call, before a communication starts, and after a particular instance has ended. Using the characteristics, content, and tone of conversations, and leveraging the intelligence from single or multiple calls, is redefining the next generation of digital communications.
Conversation content is an untapped data source for everything from finding the most effective ways to improve your product to learning about your customers' pain points and concerns. However, meaningful insights are missed or go undocumented when these conversations aren't captured and contextually analyzed. If you don't take the time to pass information from calls on to other team members, or if there's no process in place to record and distribute feedback to stakeholders, knowledge that could have improved your company and product is never saved or utilized.
Conversational intelligence refers to software that analyzes audio or text using artificial intelligence (AI) to obtain data-driven insights from communication between employees and consumers.
Real-World Applications for Intelligent Communication Experiences
In order to act on data in real time, conversation data needs to be streamed across various platforms like CRMs, ad platforms, data analytics, attribution systems, and digital experience platforms. It’s then used by revenue teams in marketing, sales, customer service, and e-commerce to improve purchasing experiences and increase conversions and revenue.
Some examples of applications across the customer experience lifecycle:
- Retrieve meaningful insights directly from customers by obtaining detailed information from conversations with them.
- Better understand customer behavior and map it out in order to improve your services.
- Predict customer behavior and provide them with exceptional service.
Other use cases include:
- Management of pipelines. Sales managers can review late-stage conversations so that they can forecast more accurately.
- Coaching for sales. Managers can examine successful sales calls to learn from top performers and provide coaching/feedback to those who need it.
- Sharing tribal knowledge. Sales conversations provide a wealth of data that teams can use to influence product roadmaps, messaging, and competitive market intelligence.
- Improvements to the sales process. Organizations can detect bottlenecks in their sales process and make adjustments.
- Sales onboarding. To reduce training time, new sales personnel can listen to successful sales calls.
In this blog, we'll walk through an example of how you can set up Symbl.ai and Jitsi, along with the communication workflow, to enable conversation intelligence and analytics in your application. Note that for demo purposes we're using simple JavaScript code that runs in the browser; if you're building a web application with communication APIs, please use the WebSDK to build the integration. Drop a note to [email protected] if you have any questions about using the WebSDK for your app.
Now let's take a quick look at a test run with Jitsi and Symbl.ai.
What Is Jitsi?
Jitsi is open source video conferencing software that allows you to easily build and deploy secure video conferencing solutions. At the heart of Jitsi are Jitsi Videobridge and Jitsi Meet, which let you hold conferences on the internet. Other projects in the community enable features like audio, dial-in, recording, and simulcasting.
What Is Symbl.ai?
Symbl.ai is a conversation intelligence platform that allows developers to natively integrate conversation intelligence into their voice or video applications without building machine learning models. It’s an AI-powered, API first, conversation intelligence platform for natural human conversations that works on audio, video, and textual content in real-time or recorded files.
Symbl.ai's APIs let you generate highly accurate, contextually relevant real-time sentiment analysis, questions, action items, topics, trackers, and summaries in your applications.
Setting Up Jitsi
For this tutorial, you'll need Node.js version 14 or greater and npm version 7 or greater.
Download and install Node.js, then clone the Jitsi Meet repository and install its dependencies:
```
# Clone the repository
git clone https://github.com/jitsi/jitsi-meet
cd ./jitsi-meet
npm install

# To build the Jitsi Meet application, just type
make
```
For development, run the app with webpack-dev-server. In your terminal, type the `make dev` command, and the app will be served at `localhost:8080` in the browser. The default backend deployment is `alpha.jitsi.net`.
If you plan to use a different server, you can use a proxy server to point the Jitsi Meet app to a different backend. To accomplish this, set the `WEBPACK_DEV_SERVER_PROXY_TARGET` variable:
```
export WEBPACK_DEV_SERVER_PROXY_TARGET=https://your-example-server.com
make dev
```
Now the app should be running at `https://localhost:8080/`.
Note that the development certificate is self-signed and browsers may display a certificate error. It’s safe to ignore these warnings and proceed to your website.
Setting Up Symbl.ai
To set up Symbl.ai,
create a free account or
log in at Symbl.ai.
Then, copy your API keys. Using the Symbl.ai credentials, generate an authentication token for making API queries.
!
Symbl.ai API credentials
To send a recorded conversation or to make a live connection, you send conversation data in real time or after the call has ended using one of the following APIs:
- Async APIs allow you to send text, audio, or video conversations in recorded format (a request sketch follows this list).
- Streaming APIs allow you to connect Symbl.ai to a live call over the WebSocket protocol.
- Telephony APIs allow you to connect Symbl.ai to a live audio conversation via the Session Initiation Protocol (SIP) or the Public Switched Telephone Network (PSTN).
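For instance, here's a minimal sketch of submitting a recorded file to the Async API from JavaScript, assuming the `accessToken` you generated above; the audio URL and name are placeholder assumptions:
```
// Minimal sketch: submit a recorded audio file URL to the Async API.
// The file URL and name below are placeholders; replace them with your own.
const response = await fetch('https://api.symbl.ai/v1/process/audio/url', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${accessToken}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/recordings/sales-call.mp3',
    name: 'Sales call recording',
  }),
});

// The response includes a conversationId for the Conversation API
// and a jobId you can poll for processing status.
const { conversationId, jobId } = await response.json();
console.log(conversationId, jobId);
```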
Finally, you need to get the conversation intelligence. A `conversationId` should have been returned to you in the previous step. It can now be used with the Conversation API to generate any of the following (a request sketch follows the list):
- Speech-to-text (transcripts)
- Topics
- Sentiment analysis
- Action items
- Follow-ups
- Questions
- Trackers
- Conversation analytics
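Each of these is a simple GET request against the conversation. As a minimal sketch, here's how you might fetch topics, assuming the `accessToken` and `conversationId` from the previous steps:
```
// Minimal sketch: fetch the topics detected in a conversation.
const res = await fetch(
  `https://api.symbl.ai/v1/conversations/${conversationId}/topics`,
  { headers: { 'Authorization': `Bearer ${accessToken}` } }
);

// The response body contains a `topics` array of detected topics.
const { topics } = await res.json();
topics.forEach((topic) => console.log(topic.text, topic.score));
```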
Integrating Symbl.ai with Jitsi
Jitsi doesn't support building apps and SDKs on Windows, so you'll need to use Debian or Ubuntu. Before integrating Symbl.ai with Jitsi, make sure you have Jitsi running locally on your machine. Then log in to your Symbl.ai dashboard and copy your App ID and App Secret.
Make a POST request to `https://api.symbl.ai/oauth2/token:generate` with a tool like cURL or Postman. Use the following as the POST body:
```
{
  "type": "application",
  "appId": "YOUR APP ID",
  "appSecret": "YOUR APP SECRET"
}
```
Once you've entered your credentials in the body, click Send to generate an `accessToken`.
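If you'd rather do this from code, here's a minimal JavaScript sketch of the same request; the credential values are placeholders you replace with your own:
```
// Minimal sketch: generate a Symbl.ai access token programmatically.
// Replace the placeholder credentials with your App ID and App Secret.
const response = await fetch('https://api.symbl.ai/oauth2/token:generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    type: 'application',
    appId: 'YOUR APP ID',
    appSecret: 'YOUR APP SECRET',
  }),
});

const { accessToken } = await response.json();
console.log('accessToken:', accessToken);
```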
Now you need to integrate live speech-to-text and AI insights in your browser, within your Jitsi meeting, using WebSockets. You'll use WebSockets for this integration rather than the PSTN (Public Switched Telephone Network), which gets expensive at scale.
Navigate to your Jitsi webpage, open the console, and paste the following:
```
/**
 * The JWT token you get after authenticating with our API.
 * Check the Authentication section of the documentation for more details.
 */
const accessToken = ''; // your access token from Symbl.ai
const uniqueMeetingId = btoa('[email protected]');
const symblEndpoint = `wss://api.symbl.ai/v1/realtime/insights/${uniqueMeetingId}?access_token=${accessToken}`;

const ws = new WebSocket(symblEndpoint);

// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
  // You can find the conversationId in event.message.data.conversationId;
  const data = JSON.parse(event.data);
  if (data.type === 'message' && data.message.hasOwnProperty('data')) {
    console.log('conversationId', data.message.data.conversationId);
  }
  if (data.type === 'message_response') {
    for (let message of data.messages) {
      console.log('Transcript (more accurate): ', message.payload.content);
    }
  }
  if (data.type === 'topic_response') {
    for (let topic of data.topics) {
      console.log('Topic detected: ', topic.phrases);
    }
  }
  if (data.type === 'insight_response') {
    for (let insight of data.insights) {
      console.log('Insight detected: ', insight.payload.content);
    }
  }
  if (data.type === 'message' && data.message.hasOwnProperty('punctuated')) {
    console.log('Live transcript (less accurate): ', data.message.punctuated.transcript);
  }
  console.log(`Response type: ${data.type}. Object: `, data);
};

// Fired when the WebSocket closes unexpectedly due to an error or lost connection
ws.onerror = (err) => {
  console.error(err);
};

// Fired when the WebSocket connection has been closed
ws.onclose = (event) => {
  console.info('Connection to websocket closed');
};

// Fired when the connection succeeds.
ws.onopen = (event) => {
  ws.send(JSON.stringify({
    type: 'start_request',
    meetingTitle: 'Websockets How-to', // Conversation name
    insightTypes: ['question', 'action_item'], // Will enable insight generation
    config: {
      confidenceThreshold: 0.5,
      languageCode: 'en-US',
      speechRecognition: {
        encoding: 'LINEAR16',
        // Should match the sample rate of the audio you send below
        sampleRateHertz: 44100,
      },
    },
    speaker: {
      userId: '[email protected]',
      name: 'Example Sample',
    },
  }));
};

const stream = await navigator.mediaDevices.getUserMedia({
  audio: true,
  video: false,
});

/**
 * The callback function which fires after a user gives the browser permission to use
 * the computer's microphone. Starts a recording session which sends the audio stream to
 * the WebSocket endpoint for processing.
 */
const handleSuccess = (stream) => {
  const AudioContext = window.AudioContext;
  const context = new AudioContext();
  const source = context.createMediaStreamSource(stream);
  const processor = context.createScriptProcessor(1024, 1, 1);
  const gainNode = context.createGain();
  source.connect(gainNode);
  gainNode.connect(processor);
  processor.connect(context.destination);
  processor.onaudioprocess = (e) => {
    // Convert the 32-bit float samples to a 16-bit integer payload
    const inputData = e.inputBuffer.getChannelData(0);
    const targetBuffer = new Int16Array(inputData.length);
    for (let index = 0; index < inputData.length; index++) {
      // Clamp to [-1, 1] before scaling to the 16-bit range
      targetBuffer[index] = 32767 * Math.max(-1, Math.min(1, inputData[index]));
    }
    // Send audio stream to websocket.
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(targetBuffer.buffer);
    }
  };
};

handleSuccess(stream);
```
Run this code in your browser's developer console or embed it in an HTML document's `<script>` element.
Press Enter. Whatever is said in the meeting will now be transcribed, along with detected topics.
You should now be able to retrieve conversation insights using the conversation ID, which you can retrieve from the `onmessage` handler. With it, you can also view conversation topics, action items, and follow-ups, as sketched below.
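Here's a minimal sketch of pulling action items and follow-ups after the call, assuming the `accessToken` and the `conversationId` logged by the `onmessage` handler:
```
// Minimal sketch: retrieve action items and follow-ups for the finished call.
const headers = { 'Authorization': `Bearer ${accessToken}` };
const base = `https://api.symbl.ai/v1/conversations/${conversationId}`;

const [actionItems, followUps] = await Promise.all([
  fetch(`${base}/action-items`, { headers }).then((r) => r.json()),
  fetch(`${base}/follow-ups`, { headers }).then((r) => r.json()),
]);

console.log('Action items:', actionItems.actionItems);
console.log('Follow-ups:', followUps.followUps);
```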
To end the connection, just close your browser, or send an explicit stop request as sketched below. If you wish to automate the process, you can add your email so that at the end of a call the insights are emailed directly to you.
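Ending the session explicitly lets Symbl.ai finalize the conversation before the socket closes; a minimal sketch, using the `ws` connection from the integration code above:
```
// Minimal sketch: explicitly stop the Symbl.ai session so the conversation
// is finalized before the WebSocket connection closes.
ws.send(JSON.stringify({ type: 'stop_request' }));
```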
Conclusion
In this article, you looked at the need for conversation intelligence, the 2.0 of your communication experience, and the importance of using conversation data from voice or video calls in your product or business. Conversation intelligence is a critical part of your communication stack and can boost your growth on all fronts of communication.
You were also introduced to Jitsi, an open source communication stack, which you can use in combination with Symbl.ai's conversation intelligence APIs. By utilizing Jitsi and Symbl.ai together, you can retrieve meaningful insights, understand customer behavior, and predict customer behavior in order to improve both your product and your sales.
If you're building a web-based communication app, please refer to the Web SDK tutorial with Symbl. You can also extend the existing experience with additional intelligence, or build the same integration using cloud APIs for voice and video instead of Jitsi.
Please share any feedback in the developer Slack community.
Associate Solutions Engineer