Gateway Options: Text-to-Speech
When a phone call is answered,
the system needs to play the message a) immediately for live human pickup, and
b) until a beep sound is detected for answering machine or voice mail system.
Based on the answering party, the system then decides whether to play or wait.
Unfortunately, there is no tone or signal over the phone line that indicates the
pickup situation. Hence the system has to analyze the audio stream over the
phone line in order to make a decision.
What can complicate the detection process
Strong background noise can be
hard to filter out, especially it is from a human voice. A loud TV in the
background, a second person talking to other person, etc, can mislead the system
to make an answering machine prediction for human pickup.
On the other hand, a very weak
answering machine volume can be diagnosed as background noise. So the system
might think it is a live human pickup and play the message too early.
Another factor is that people
answer the phone differently. There are really no fixed patterns or even regular
patterns to follow. People from different countries, from different ethnic
groups also answer the phone differently.
Answering machine messages and
voicemail prompts are very different too.
Phone company messages are also
complicated. Some announcements start with a disconnected beep sound, others
with a different beep, and the rest do not even have one.
The system has to make a decision in real
time
The system has to analyze the
audio stream in real time so it can respond as soon as possible. This in certain
degree limits what kind of algorithm the system can deploy. It is much harder to
do it in real time with partial audio streams.
The system does make mistakes
As you can see, the prediction
algorithm is based on statistics data. There is no guarantee the prediction is
correct. We can always improve the accuracy and make the system more
intelligent, but to reach 100% accuracy is impossible using today’s technology.
The best system for prediction is the human perception system. But even humans
make mistakes from time to time.
The choice is
based on your application
In order to make the system more responsive to human
pickup, something has to give. And this something is the accuracy for answering
machine detection. If you care more about live human pickup and you can
tolerate more mistakes for answering machine, then you can make the algorithm
more responsive to humans.

Make it most responsive for live human pickup
On an extreme case, you can instruct the system not to do
any answering machine vs. human analysis. Whenever the system hears a voice, it
can start playing the message right away. Of cause, the system still needs some
short time to recognize the human voice, filtering out the background noises,
etc. The drawback is of this approach is that all answering machines will be
treated as humans. The message will be played immediately after an answering
machine answers the call. There will be no message, or only partial message,
left on answering machines.
To set this option, please select Setup > Options…
from the gateway main menu, then choose the Detection tab. Move the
sliding control bar to the position marked “Most aggressive”.
The biggest problem of this approach is that no answering
machine will be recognized. For general usage, this setting is not recommended.
Make it more aggressive more human pickup
The default setting is the balanced approach for humans vs. answering machines.
This is the default setting and it is recommended for general usage. The system
will try to make a prediction as soon as possible.
You can make the system to be
more aggressive on human pickup without totally losing the ability of answering
machine detection. You can set it prediction to be “more aggressive” on humans.
In this setting, the system will try to make a prediction as soon as the audio
is likely to be humans. But there will be more mistakes for answering machines.
Previous: Text-to-Speech
Next: Advanced
|