Choosing the right integration approach and providing speech data to a Voice or Audio API is straightforward if you take the time to understand the complexities and plan accordingly. These guidelines cover the key considerations so that decision-makers can move forward with a strategy that works for the long term.
Choosing the right API
Here’s a quick decision flow to help you choose the right API for your business.
General Best Practices
After choosing the best API for your needs, the next considerations for accuracy and efficiency are:
- Capture the audio at the source with a sampling rate of 16,000 Hz or above when integrating over SIP.
- Lower sampling rates may lead to reduced accuracy.
- If you cannot capture audio at the source at 16,000 Hz or higher, do not resample the original audio to raise the sampling rate, because resampling can reduce accuracy.
- Retain the original sampling rate even if it is lower than 16,000 Hz. For example, the native rate in telephony is commonly 8,000 Hz.
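As a quick sanity check before sending audio, you can read the native sampling rate of a recording and decide whether it meets the recommendation. The sketch below is a minimal illustration using Python's standard `wave` module; the function name and the messages are hypothetical, and it assumes the audio is available as a WAV file.

```python
import wave

RECOMMENDED_RATE_HZ = 16_000  # threshold taken from the guideline above


def describe_sample_rate(wav_file) -> str:
    """Report the native sampling rate of a WAV file without modifying the audio.

    `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as wav:
        rate = wav.getframerate()
    if rate >= RECOMMENDED_RATE_HZ:
        return f"{rate} Hz: meets the recommended rate"
    # Below the recommendation: keep the native rate rather than upsampling.
    return f"{rate} Hz: below the recommended rate; send as-is, do not upsample"
```

The check only inspects the file header. It deliberately never resamples, in line with the advice to keep the original rate.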
Audio Chunk (Buffer) Size
- For live audio streaming use cases (Telephony with SIP and the Real-time WebSocket API), use an audio chunk (buffer) size close to 100 milliseconds for a balanced trade-off between latency and efficiency.
- A larger chunk size is better for accuracy but adds latency.
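To turn the 100-millisecond guideline into a buffer size, multiply the sampling rate by the chunk duration and the bytes per sample. The sketch below assumes uncompressed 16-bit mono PCM, which is an assumption of this example rather than a requirement stated above; the helper names are hypothetical.

```python
def chunk_size_bytes(sample_rate_hz: int, chunk_ms: int = 100,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes of raw PCM covering `chunk_ms` milliseconds of audio."""
    frames = sample_rate_hz * chunk_ms // 1000
    return frames * bytes_per_sample * channels


def iter_chunks(pcm: bytes, size: int):
    """Yield consecutive slices of `pcm`; the final slice may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Under these assumptions, 16,000 Hz audio gives 1,600 frames, or 3,200 bytes, per 100 ms chunk, and 8,000 Hz telephony audio gives 1,600 bytes.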
- It’s best to provide audio that is as clean as possible
- Excessive background noise and echoes can reduce accuracy
- When possible, position the user close to the microphone
- If you are considering noise cancellation techniques, be aware they may result in information loss and reduced accuracy. If unsure, avoid noise cancellation.
- Don’t use Automatic Gain Control (AGC)
- Avoid audio clipping
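One simple way to spot clipping is to count how many samples sit at the extreme values of the sample format. The sketch below assumes 16-bit little-endian PCM and uses a hypothetical helper name; what fraction counts as "too much" is left to you, since the guidelines above do not give a threshold.

```python
import struct

INT16_MAX = 32767
INT16_MIN = -32768


def clipped_fraction(pcm: bytes) -> float:
    """Fraction of 16-bit little-endian samples at the positive or negative limit."""
    if not pcm:
        return 0.0
    # iter_unpack requires len(pcm) to be a multiple of 2 bytes.
    samples = [s for (s,) in struct.iter_unpack("<h", pcm)]
    clipped = sum(1 for s in samples if s >= INT16_MAX or s <= INT16_MIN)
    return clipped / len(samples)
```

A value near zero suggests the recording level is safe. A noticeably higher value suggests the input gain is too high at the source.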
Multiple People in a Single Channel
- Ensure audio volume for each person is the same. Differing audio levels for speakers can be misinterpreted as background noise and ignored.
- Where possible, avoid multiple speakers talking at the same time
- Push Speaker Events to indicate the start and stop times for each person in the meeting or call.
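To check whether speakers are at comparable levels, you can compare the RMS level of a short segment from each person. The sketch below is only an illustration: it assumes 16-bit little-endian PCM, assumes you have already cut out one segment per speaker (for example, using the start and stop times you track for speaker events), and uses hypothetical function names.

```python
import math
import struct

FULL_SCALE = 32768  # magnitude of the largest 16-bit sample


def rms_dbfs(pcm: bytes) -> float:
    """RMS level of 16-bit little-endian PCM in dB relative to full scale."""
    if not pcm:
        return float("-inf")
    samples = [s for (s,) in struct.iter_unpack("<h", pcm)]
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # pure silence
    return 10 * math.log10(mean_square / FULL_SCALE ** 2)


def level_spread_db(segments: dict[str, bytes]) -> float:
    """Difference in dB between the loudest and quietest non-silent speaker."""
    levels = [rms_dbfs(pcm) for pcm in segments.values()]
    finite = [level for level in levels if math.isfinite(level)]
    return max(finite) - min(finite) if finite else 0.0
```

A large spread suggests one speaker is much quieter than another and may need to move closer to the microphone.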
For optimal results, consider using the Real-time WebSocket API with speaker-separated audio.
- Symbl provides an optional calibration phase that helps fine-tune the overall system to fit your preferences. Contact us to learn more.
You can learn about best practices for each API in our Documentation.
Sign up for Symbl to get 100 minutes in free trial credits so you can put these best practices to work.