Summarizing conversations manually is a tedious, labor-intensive process. Using artificial intelligence to automate summary generation can free up resources for use elsewhere in your organization. We will discuss the pros and cons of using Symbl.ai’s Summary API versus OpenAI’s GPT-3 model to do just that.
What is abstractive summarization?
Abstractive summarization is an approach used to generate a summary in natural language without copying sentences directly from the original source. Simply put, this method of summarization produces a summary written in natural language just as a human could if given the same instructions.
The summaries provided by abstractive summarization are more than a templated list of key sentences or phrases which occurred over the course of a conversation or meeting; they are a true synopsis of the content of that conversation.
Option 1: Symbl.ai
The Symbl.ai platform provides developers with the ability to generate many conversational intelligence data points, including an abstractive summary. While all of these conversation APIs are included on the platform, this blog will focus only on summarization.
Symbl.ai supports processing conversations on the audio, video, and text channels. All three channels are supported for asynchronous processing, while the audio channel also supports streaming via WebSocket.
Symbl.ai’s summarization solution currently supports six dialects of English on all channels: United States, United Kingdom, Australia, Ireland, India, and South Africa. Note that the language code must be provided as a parameter at processing time; if it is omitted, processing defaults to English (United States).
Symbl.ai offers ASR services for the video and audio channels, and supports a few methods of speaker separation. When using the asynchronous API, speaker separation can be achieved with speaker-separated audio, where each speaker is on a different audio channel. Speaker diarization can also be used for single-channel audio, though at some cost to separation accuracy.

The streaming API supports an unlimited number of connections to the same conversation, which allows speaker separation for any number of speakers. This is particularly helpful in today’s world of remote meetings, which can have any number of participants. The streaming API also supports speaker diarization if all audio is provided on a single connection. Speaker separation makes an abstractive summary much more effective, as the model can pinpoint specific speakers and use that information to better structure the summary. Unfortunately, the Summary API does not yet support customizations such as length or detail level, though support for these options is currently in the works.
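As a rough sketch, requesting single-channel diarization on the asynchronous audio channel amounts to adding a couple of fields to the request. The endpoint and field names below reflect Symbl.ai’s Async API at the time of writing and are illustrative; consult the current docs for whether these belong in the JSON body or the query string.

```python
# Sketch of building a Symbl.ai asynchronous audio request with optional
# single-channel speaker diarization. Parameter names are illustrative.
SYMBL_AUDIO_URL_ENDPOINT = "https://api.symbl.ai/v1/process/audio/url"

def build_audio_request(audio_url, language_code="en-US", speaker_count=None):
    """Build the payload for an async audio request.

    If speaker_count is given, single-channel diarization is requested;
    otherwise Symbl.ai uses its default behavior.
    """
    body = {
        "url": audio_url,
        # Processing defaults to en-US if the language code is omitted.
        "languageCode": language_code,
    }
    if speaker_count is not None:
        body["enableSpeakerDiarization"] = True
        body["diarizationSpeakerCount"] = speaker_count
    return body
```

The payload would then be sent with an authenticated POST to the endpoint above, returning a conversation ID used by the other conversation APIs.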
Option 2: GPT-3
GPT-3 is a language model created and maintained by OpenAI. The model is made available to developers via API, and it is capable of performing a seemingly endless list of tasks such as classification, summarization, translation, and even generating code from natural language.
GPT-3 only understands text and offers no ASR capabilities for the video or audio channels. A third-party ASR service is needed to produce a speech-to-text transcript before GPT-3 can offer any summarization capabilities on those channels.
OpenAI does not publish a specific number of supported languages for GPT-3, as this solution’s model is intended to be “all encompassing.” This means, in theory, GPT-3 supports all languages. In reality, this solution supports any language its model was trained on, but that data is proprietary and unknown to the general public.
GPT-3 is an incredibly impressive language model capable of understanding nearly anything in natural language. Instead of having different API endpoints to generate conversational intelligence, GPT-3 attempts to understand the user’s input, and generate an output based on what was provided to the model.
For a summarization example, you could provide the model with “Summarize the following text: …” followed by the text to be summarized. The model would then comprehend your request and output a summary. You can even provide more guidance to the model such as “Summarize the following text in five sentences for a 5th grade reader: …”
In this scenario, the model does its best to write five simple sentences that a 5th grader could read. This understanding is not limited to just summarization, however. The user could use natural language to request the model to write a story, create a recipe, or even translate text from one language to another.
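The prompt-then-text pattern described above can be sketched against OpenAI’s Completions API. The model name and parameter values below are illustrative choices, not the only ones; at the time of writing, `text-davinci-002` was a commonly used GPT-3 model.

```python
# Minimal sketch of guided summarization with GPT-3 via the OpenAI
# Completions API. Model name and parameters are illustrative.
import os

def build_summary_prompt(text, sentences=5, grade=5):
    """Compose the natural-language instruction followed by the text."""
    return (
        f"Summarize the following text in {sentences} sentences "
        f"for a {grade}th grade reader:\n\n{text}"
    )

def summarize(text):
    import openai  # pip install openai
    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.Completion.create(
        engine="text-davinci-002",  # a GPT-3 model; swap in your choice
        prompt=build_summary_prompt(text),
        max_tokens=256,             # counts against the 4,000-token limit
        temperature=0.3,            # lower values give more focused output
    )
    return response["choices"][0]["text"].strip()
```

Because the instruction is just part of the prompt, swapping “Summarize” for “Translate” or “Write a story about” changes the task without changing the code.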
The real limitation of GPT-3’s publicly available API is that it can only process so much text at once: a single request is limited to 4,000 tokens. This limitation is made even tighter because the output counts against the limit, making GPT-3 poorly suited for large-scale summarization.
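One common workaround is to split a long transcript into chunks that leave headroom for the prompt and the generated summary. The sketch below uses the rough rule of thumb of about four English characters per token; for exact counts you would use OpenAI’s own tokenizer (for example, the tiktoken package) instead.

```python
# Heuristic token budgeting for GPT-3's 4,000-token request limit, which
# covers prompt + input + output together. ~4 chars/token is a rough
# English-language estimate, not an exact count.
CHARS_PER_TOKEN = 4

def estimate_tokens(text):
    return max(1, len(text) // CHARS_PER_TOKEN)

def chunk_text(text, max_tokens=3000):
    """Split text on sentence-ish boundaries so each chunk fits the budget."""
    budget = max_tokens * CHARS_PER_TOKEN
    chunks, current, size = [], [], 0
    for sentence in text.replace("\n", " ").split(". "):
        piece = sentence if sentence.endswith(".") else sentence + "."
        if size + len(piece) > budget and current:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(piece)
        size += len(piece) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk could then be summarized separately and the partial summaries combined, though that adds requests, cost, and a loss of cross-chunk context.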
Side by Side Test
A side-by-side comparison was done with both solutions using a podcast video. The video chosen lasts just over twenty minutes. The video was downloaded in MP4 format as neither solution can process a video from video platforms such as YouTube.
Symbl.ai makes its solution easy to use by providing a Postman collection to fork here. The collection gives developers pre-built Postman requests for every supported feature of the platform, so evaluating the solution took minutes. In total, five API methods were used, including the asynchronous POST Video method to process the MP4 file and receive the conversation ID associated with the video.
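Outside of Postman, the same workflow can be sketched in a few calls: submit the video by URL, poll the job until it completes, then fetch the abstractive summary. The endpoints below reflect Symbl.ai’s Async and Summary APIs at the time of writing, and the access token is assumed to come from Symbl.ai’s OAuth flow.

```python
# Sketch of the minimal Symbl.ai call sequence: submit video -> poll job
# -> fetch summary. Endpoints are illustrative of the API at the time of
# writing; ACCESS_TOKEN is assumed to come from Symbl.ai's auth endpoint.
import time

BASE = "https://api.symbl.ai/v1"

def extract_summary_text(summary_response):
    """Pull the plain-text summary items out of a Summary API response."""
    return [item["text"] for item in summary_response.get("summary", [])]

def summarize_video(video_url, access_token):
    import requests  # pip install requests
    headers = {"Authorization": f"Bearer {access_token}"}

    # 1. Submit the video for asynchronous processing.
    job = requests.post(
        f"{BASE}/process/video/url", headers=headers, json={"url": video_url}
    ).json()
    conversation_id, job_id = job["conversationId"], job["jobId"]

    # 2. Poll until the processing job reports completion.
    while True:
        status = requests.get(f"{BASE}/job/{job_id}", headers=headers).json()
        if status["status"] == "completed":
            break
        time.sleep(10)

    # 3. Retrieve the abstractive summary for the conversation.
    summary = requests.get(
        f"{BASE}/conversations/{conversation_id}/summary", headers=headers
    ).json()
    return extract_summary_text(summary)
```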
Symbl.ai took 7 minutes and 16 seconds to produce this summary:
Speaker is looking for a job offer national will help with that. Speaker is excited about the future of work-life balance between employers and employees. Speaker thinks that setting up an entity in singapore opens up more opportunity for employers and employees. Speaker believes that companies that are global are more flexible and able to adapt to changes in the economy Speaker is the author of a book. Speaker is back on the tech talks daily podcast they are going to talk about the challenges of working from anywhere in the global economy, the shortage of tech skills and different salaries for the same job. Speaker invited rick hamill, founder and ceo of atlas, to join them in the discussion about all they said today rick will join Speaker tonight in chicago, illinois. Speaker is going to spin the wheel of fortune tomorrow to decide what topic they are going to talk about. Speaker asks guests to leave them a song that has inspired them elias’ song ” revolution ” by elias is one of them Speaker is inspired by the organization they work for. Speaker will add before i let you go to their playlist. Speaker is writing a book called ” the good, the bad and the ugly ” about their experience as an entrepreneur. Speaker has a platform that allows them to hire the best developer without setting up an entity in a different country. Speaker would like to be able to onboard talent as fast as possible Speaker was reading yesterday that there will be a shortfall of three million people in cyber security. Speaker will mention cyber secure and evelyn to address the critical tech skills gap. Rick hamill is the founder and ceo of atlas Speaker created opportunities for companies to expand their business and connect them to talent in 160 countries. 
Speaker’s team in china can onboard an employee as soon as two weeks in china it takes 18 months to set up an entity 250,000 dollars to actually and share capital once the entities established where with Speaker, they can onboard employees as fast as 2 weeks andre. Speaker was a head of hr for a government contractor in saudi arabia before they started working for atlas Speaker helps companies on their growth journeys with software and solutions that enable global talent management. Speaker found a provider that could be the employer of their services, but later found out that they were outsourcing their services they need to set up their own entities and transition the employees off the outsourcing provider. Speaker started working for a direct-to-direct business model two or three years ago now they work for a company that owns and operates in over 160 countries. Speaker developed a platform based on the experience of doing business in 160 countries and rebranded from element. Speaker launched a new version of their platform, atlas it’s a service-focused platform they manage global operations, global people operations, but they still have a service component. Speaker is talking about the three pillars that make up their company: clients, employees and the world. Speaker believes that a lot of the reports are based on what’s going on in certain countries Speaker wants to be able to hire more flexible in terms of the type of talent that they are looking for. Employee retention in the tech industry should be a high priority for businesses businesses should create a work environment where employees feel like they have a part in the mission and they have an opportunity to contribute to the mission they should also benchmark their wages to make sure they are relevant. Speaker makes sure that people have a seat at the table and are part of the vision. 
Speaker wants to know how businesses navigate around different salaries, depending on where their employees live for the exact same job Speaker believes in regional salary benchmarking. Businesses and employees have to evolve and adapt to change the future of work because there’s no going back to the old ways Speaker believes there’s a war on talent where employers are competing for the best talent. Speaker recommends employers to use learning, development, tools, education and assistance to improve their internal talent Speaker partner with coursera to give their employees access to the platform to learn and improve their skills. Speaker is looking at how to improve the quality of their team’s learning. Speaker thanks neil for listening to the tech talks daily podcast. Speaker will have a live chat with henry x-ray martin today they will talk about the business model of direct employer of record, business model and the rise of digital nomads.
GPT-3 provides a playground GUI for its solution, as well as an open API and limited SDKs for NodeJS and Python. The GUI was used exclusively for this test, as there is only one API call to be made. The output is determined by a natural language prompt, followed by the text to be summarized. The limit of 4,000 tokens was reached quickly, as the twenty-minute podcast’s transcript was over 5,000 tokens. This limit includes the prompt (“Summarize this transcript:”), the text to be summarized, and the summary response, making it impossible to summarize long-form media in a single request. After using Symbl.ai to generate a transcript (7 minutes and 24 seconds of processing time), the first attempt returned an error detailing the token overage. Once the transcript was trimmed to allow for processing, GPT-3 provided this summary:
This text discusses the challenges that businesses and employees face as the world of work changes and becomes more globalized. It describes how companies need to be more flexible in order to compete for the best talent, and how they can use tools and services to reskill and retain employees.
As is clearly seen here, GPT-3 provided a much less detailed summary of the podcast than Symbl.ai.
In conclusion, GPT-3 is an incredible display of the power of artificial intelligence and natural language understanding. However, it is not as well suited for creating an abstractive summary for large-scale conversational intelligence efforts.
Symbl.ai’s platform offers greater flexibility for channel support, as well as no text, audio, or video length limits for processing. If you want to get started with Symbl.ai, you can sign up for the platform for free and use this guide to begin generating summaries from your conversations today.