What types of voices can I implement?
a) Recorded: The original version of Voice can use a human voice recorded by an announcer. This option is more expensive and less scalable, because the content is customized and the same announcer must record every update to keep the voice consistent whenever new knowledge is added to the bot.
Recorded voices require support for SSML (Speech Synthesis Markup Language).
SSML support will soon be included in Voice 2.0 to enable recorded voices.
b) Dynamic: A voice generated by a speech synthesizer (from Google, Amazon, or Microsoft). It is the default option in Voice, and it is simple and scalable because the content can be updated easily.
Dynamic voices include neural voices, which are available in version 2.0 of the Voice solution.
Neural voices offer a much clearer interaction experience, with higher audio quality and a more natural sound, thanks to the use of multiple deep neural networks (DNNs). These networks are trained on how people express themselves orally, and they generate audio by predicting pitch, prosody, spectral structure, and the speech waveform.
With this option, you can choose a voice from the different voice banks available. At Aivo, we work with Amazon Polly, IBM Watson, Google WaveNet, and Microsoft Azure.
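
To illustrate why the recorded option in a) depends on SSML, here is a minimal sketch of SSML that plays a recorded clip, built with Python's standard library. It relies on the `<audio>` element from the W3C SSML specification, whose inner text is spoken by the synthesizer only if the clip cannot be played. The function name `build_recorded_prompt` and the audio URL are hypothetical, and this is not Aivo's actual integration. Support for `<audio>` varies by speech provider, so check the provider's SSML documentation before relying on it.

```python
import xml.etree.ElementTree as ET


def build_recorded_prompt(audio_url: str, fallback_text: str) -> str:
    """Return an SSML string that plays a recorded clip.

    The text inside <audio> is the fallback that the synthesizer
    speaks if the recording cannot be fetched or played.
    """
    speak = ET.Element("speak")
    audio = ET.SubElement(speak, "audio", {"src": audio_url})
    audio.text = fallback_text
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(speak, encoding="unicode")


# Hypothetical URL, used only for illustration.
ssml = build_recorded_prompt(
    "https://example.com/audio/welcome.mp3",
    "Welcome! How can I help you today?",
)
print(ssml)
```

The fallback text shows the trade-off between the two options: the recording keeps the announcer's voice, while the synthesized text still works when a clip is missing or has not been re-recorded yet.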