Intelligent transcription uses contextual understanding to capture valuable business insights and make your content more searchable and accessible. Transcription Plus offers real-time and async transcription with multiple inputs and outputs. You can use it to optimize your video and audio content, and it handles formatting, accurate punctuation, entity recognition, custom vocabulary, and speaker separation across multiple languages.

What is intelligent transcription?

Standard transcription is the process of converting audio and voice into a text transcript. This is also known as speech-to-text, or automatic speech recognition (ASR). Intelligent transcription adds contextual understanding for an almost “human” grasp of a conversation, including its sentiment. It can run in real time or on a recording (async). It's also domain agnostic, which means it can understand words from different domains and apply that knowledge to contexts it hasn't specifically been trained on. Making transcripts intelligent adds real value to communication: it enhances note-taking and compliance, and it makes your content more searchable and accessible. Intelligent transcripts also allow for better knowledge sharing, can summarize conversations to surface valuable business insights, and help reduce the costs of recording and storing conversation data.

What’s intelligent transcription used for?

Intelligent transcription has many everyday applications. Real-world examples include live captioning, accessibility, and transcribing whole video and audio files into text.

Live captioning

Live captions help with focus, engagement, and information retention. They can even boost your SEO, because search engines index text but not audio. Live captioning is often used in online learning because it lets students engage in real time. It's also invaluable in sound-sensitive environments, such as a noisy home or a room with a sleeping child. In fact, studies have shown that the majority of people watch videos with the sound off!

Accessibility

Intelligent transcription makes videos accessible to viewers who are hearing-impaired. Not only does this inclusion increase the potential audience, it also supports compliance with accessibility laws, for example by ensuring that local governments don't discriminate against groups of people when live streaming town hall or board meetings. The same applies to businesses sharing content or holding live meetings: an estimated 30% of the workforce has a disability, so captions remove a real barrier.

Transcribing whole video and audio files into text

Intelligent transcription has the advantage of making your video and audio content quicker and easier to distribute, giving you the ability to reach a wider audience. For example, you can take the transcription of your live webinar and create written content or thought leadership material from it, or broadcast the contents of a business meeting beyond the team that attended.

How is Transcription Plus different?

Transcription Plus, the platform's speech-to-text intelligent transcription product, adds conversation intelligence to your speech-to-text. It is specifically designed to reduce engineering requirements for developers. Let's take a closer look at its key features and capabilities:

  1. Real-time and async, with multiple inputs: Whether your input is video, audio, text, or a phone call, Transcription Plus can process it asynchronously or synchronously (via telephony or WebSockets).
  2. Multiple outputs: You can choose SRT format (standardized for video captions and subtitles), Markdown (easy to publish in an HTML paragraph structure), JSON, and text.
  3. Optimization of your video and audio content: Detect insights, action items, questions, topics, and trackers.
  4. Formatting: Automatic paragraph generation (delivered in the Markdown output).
  5. Accurate punctuation and entity recognition: You can benefit from features such as punctuation (making it more readable with sentence separation) and entity recognition (to identify names, times, dates, and places).
  6. Custom vocabulary: You can customize the model if you need specialist understanding or use certain words frequently. For example, you might want the word “sell” to be transcribed as “sell” more often than “cell.” In that case, you would use speech adaptation to bias the transcription toward “sell.”
  7. Speaker separation (diarization): You can separate multiple speakers in a transcript. If speakers can't be identified automatically, you can emulate separation by using different audio channels and speaker events.
  8. Supports multiple languages and accents: The platform supports more than 20 languages, including English, Russian, French, Italian, Hindi, Japanese, and Spanish. It also offers speech recognition models fine-tuned for different accents, for example, to understand the differences between spoken American and British English.
  9. Sentence-level sentiment: Sentiment analysis at the sentence level lets you determine whether the speech is positive, negative, or neutral.
  10. Unlimited streaming length: Live transcription has no time limit. Some other standard transcription providers, like Google, only offer transcription in five-minute chunks, which means you have to start a new five-minute session just before the previous one ends to preserve the flow.
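
Transcription Plus can export SRT directly, so you won't normally build it yourself. Still, a small sketch helps show what SRT captions are made of. The Python below is a minimal illustration, assuming messages shaped like the sample API response in the sentiment section (startTime, endTime, from.name, text). It turns each timed message into a numbered cue whose times are measured from the first message. The function names are hypothetical and not part of any platform SDK.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def parse_iso(ts: str) -> datetime:
    # Python < 3.11 fromisoformat() rejects a trailing "Z", so map it to +00:00.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def srt_timestamp(offset: timedelta) -> str:
    # SRT timestamps are HH:MM:SS,mmm (comma before the milliseconds).
    total_ms = offset // timedelta(milliseconds=1)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(messages: list[dict]) -> str:
    """Render timed messages as SRT cues, timed relative to the earliest message."""
    if not messages:
        return ""
    ordered = sorted(messages, key=lambda m: parse_iso(m["startTime"]))
    origin = parse_iso(ordered[0]["startTime"])
    cues = []
    for index, msg in enumerate(ordered, start=1):
        start = srt_timestamp(parse_iso(msg["startTime"]) - origin)
        end = srt_timestamp(parse_iso(msg["endTime"]) - origin)
        speaker = msg.get("from", {}).get("name")
        text = f"{speaker}: {msg['text']}" if speaker else msg["text"]
        # Each cue: sequence number, time range, text; cues are separated by a blank line.
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Prefixing each cue with the speaker name is one simple way to carry speaker separation into captions. Real exports may handle this differently.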

How to get started with Transcription Plus

Transcription Plus offers Messages, an endpoint of the Conversation API dedicated to speech-to-text transcription. A “message” is a continuous spoken sentence. There are two ways to hit the ground running: you can ingest and process conversations in real time or asynchronously, and each option has its own API endpoint. Let's take a look at each:


With real-time processing, you can get your speech-to-text transcription as the conversation happens, using the Streaming API (WebSocket protocol), the Telephony API (SIP/PSTN), or the SDKs for JavaScript and Python.


For async processing, use the Async API. It provides a REST interface for submitting any recorded or saved conversation for processing.

The unique ConversationId

Whenever you process a conversation through the platform, whether via the Async API, the JavaScript SDK, the Python SDK, the Telephony API, or the Streaming API, you receive a conversation identifier called the “ConversationId”. It consists of numerical digits and is unique to your conversation. The ConversationId is the key to retrieving conversational insights from any processed conversation (async, real-time, or text). For example, you use it to fetch the conversation's speech-to-text transcription. You can process a text payload with the Async Text API, or an audio file with the Async Audio API.
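
To make the ConversationId workflow concrete, here is a minimal Python sketch. It builds a request URL for a conversation's messages and then flattens the JSON response into a readable transcript. BASE_URL and the /conversations/{id}/messages path are hypothetical placeholders rather than the documented endpoint, and authentication is left out entirely, so check the platform's API reference before using anything like this. The response field names (messages, text, from.name) come from the sample response in the sentiment section.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

# Hypothetical placeholder: the real API base URL is not given in this article.
BASE_URL = "https://api.example.com/v1"


def messages_url(conversation_id: str, sentiment: bool = False) -> str:
    """Build an (assumed) Messages endpoint URL for a ConversationId."""
    url = f"{BASE_URL}/conversations/{quote(conversation_id, safe='')}/messages"
    if sentiment:
        # Sentence-level sentiment is requested with the query parameter sentiment=true.
        url += "?" + urlencode({"sentiment": "true"})
    return url


def transcript_lines(response: dict) -> list[str]:
    """Flatten a Messages response into 'Speaker: text' lines."""
    lines = []
    for msg in response.get("messages", []):
        speaker = msg.get("from", {}).get("name", "Unknown")
        lines.append(f"{speaker}: {msg['text']}")
    return lines
```

Keeping URL building and response parsing separate from the HTTP call makes both easy to test without network access. You can then plug in whichever HTTP client and authentication scheme the API actually requires.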

Sentiment analysis with the ConversationId

You can also get sentiment analysis (the interpretation of the general thought, feeling, or sense of an object or situation) by using your ConversationId with the Sentiment API. All you need to do is pass the query parameter sentiment=true. Here's an example of a sentiment analysis API response. “Polarity” shows the intensity of the sentiment: its score ranges from -1.0, the most negative sentiment, to 1.0, the most positive.

    {
      "messages": [
        {
          "id": "6412283618000896",
          "text": "Best package for you is $69.99 per month.",
          "from": {
            "name": "Roger",
            "email": "[email protected]"
          },
          "startTime": "2020-07-10T11:16:21.024Z",
          "endTime": "2020-07-10T11:16:26.724Z",
          "conversationId": "6749556955938816",
          "phrases": [
            {
              "type": "action_phrase",
              "text": "$69.99 per month"
            }
          ],
          "sentiment": {
            "polarity": {
              "score": 0.6
            }
          }
        }
      ]
    }
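
The response above only gives a numeric polarity score. To show it as positive, negative, or neutral, you need cut-off values. The Python sketch below reads each message's sentiment.polarity.score and attaches a label. The thresholds of -0.3 and 0.3 are an assumption made for illustration, since the article only states the -1.0 to 1.0 range. Adjust them to match the platform's documentation or your own needs.

```python
from __future__ import annotations

# Assumed cut-offs: the article gives only the -1.0 to 1.0 range, not label thresholds.
NEGATIVE_BELOW = -0.3
POSITIVE_FROM = 0.3


def label_polarity(score: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to negative / neutral / positive."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity score out of range: {score}")
    if score < NEGATIVE_BELOW:
        return "negative"
    if score >= POSITIVE_FROM:
        return "positive"
    return "neutral"


def sentence_sentiments(response: dict) -> list[tuple[str, float, str]]:
    """Return (text, score, label) for each message that carries a sentiment."""
    results = []
    for msg in response.get("messages", []):
        sentiment = msg.get("sentiment")
        if sentiment is None:
            # Skip messages without a sentiment field (e.g. if sentiment=true was not passed).
            continue
        score = sentiment["polarity"]["score"]
        results.append((msg["text"], score, label_polarity(score)))
    return results
```

Under these assumed thresholds, the sample message with a score of 0.6 would be labeled positive.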

The platform offers APIs for all your intelligent transcription needs. Get in touch today to get started.
