


The great thing about automatic speech recognition is that a model can be built for any language; all that is needed is the right dataset. In practice, this means that building a model for a given language requires thousands of hours of audio in that language, as well as hundreds of hours of accurate transcripts in that language. Using the audio data, engineers can build an acoustic model that contains the language's specific sounds, and using the transcript data, they can build a lexicon that contains its specific words. Together, these two make up the language model, and by applying artificial intelligence and running multiple training iterations over this data, the language model becomes better and better at making the right connections between sounds and words. No vendor currently supports all the languages and dialects of the world, but in theory this is possible as long as the model can be trained on the right datasets. There are many options for automatic speech-to-text software, from paid services to free and open-source tools.
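To make the relationship between sounds and words more concrete, here is a toy sketch in Python. It assumes the acoustic model has already turned audio into a sequence of sound labels (phonemes), and it uses a tiny hand-written lexicon to map those sounds to words. The `LEXICON` entries and the `decode` function are invented for illustration; real speech recognition systems learn statistical models from data rather than using exact lookups like this.

```python
# Toy lexicon: maps a sequence of sound labels (phonemes) to a word.
# These entries are made up for illustration only.
LEXICON = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
}


def decode(phonemes, lexicon=LEXICON):
    """Greedily match the longest known phoneme sequence at each position."""
    words = []
    i = 0
    max_len = max(len(key) for key in lexicon)
    while i < len(phonemes):
        # Try the longest possible chunk first, then shorter ones.
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in lexicon:
                words.append(lexicon[chunk])
                i += length
                break
        else:
            # No lexicon entry starts here: mark the sound as unknown.
            words.append("<unk>")
            i += 1
    return words


if __name__ == "__main__":
    sounds = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
    print(decode(sounds))
```

A sound that is not covered by the lexicon comes out as `<unk>`, which hints at why a model needs enough audio and transcript data in a language before it can recognize that language well.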

Step 2: Convert speech to text in different languages with an API
