How are audio-to-text transcripts made?


Transcription is required when we have an audio or video recording and need to convert its content to text, sometimes so that the text can be translated at a later stage.

Traditionally, the transcriber listened to the audio and typed at the same time. Nowadays there are tools that help us carry out this task more quickly when we already have an audio file, although they work best for English. If the audio is in another language, the result is usually unusable, but we always test it to see whether it can speed up the work and therefore lower the cost. Even very popular platforms, such as YouTube or Google, generate a transcript for free with a single click. We must be cautious with free tools, however, because our data is uploaded to the “cloud” and its confidentiality is not guaranteed.

If we choose to entrust the work to a professional service, the most common approach is to use a local tool that produces a draft of the transcription. Depending on the system used, the quality of this draft determines how much the transcription process can be sped up. Thanks to investment in these tools, transcriptions are becoming more and more affordable.

How long does it take to transcribe a minute of audio?

The average time is between four and six minutes of work per minute of audio.
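As a rough illustration of this ratio, the short Python sketch below estimates the working time for a recording. The function name and its parameters are assumptions made for this example; only the ratios of four and six come from the figure above.

```python
def estimate_work_minutes(audio_minutes: float,
                          low_ratio: float = 4.0,
                          high_ratio: float = 6.0) -> tuple[float, float]:
    """Return (minimum, maximum) minutes of work for a recording.

    The default ratios are the four-to-six minutes of work per minute
    of audio quoted above; real projects may fall outside this range.
    """
    return audio_minutes * low_ratio, audio_minutes * high_ratio


if __name__ == "__main__":
    low, high = estimate_work_minutes(30)
    print(f"A 30-minute recording needs about {low:.0f} to {high:.0f} minutes of work.")
```

For a 30-minute recording, this gives roughly two to three hours of work.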

What does this time depend on?

Several factors influence how long it takes to obtain a final transcription of an audio file, and they may force us to listen to the file several times.

  • Volume: If it is very low, some parts may even be inaudible.
  • Sound quality: The environment or surrounding sound can affect how well the dialogue is “captured”.
  • Diction of the people involved: Do they have a strong local accent? Do they have a foreign accent?

A general distinction is made between two types of transcription:

  • phonetic, which uses its own system of symbols to represent the sounds of human speech, and
  • orthographic, which uses the spelling and conventions of a language.

This last type is what we mean when we speak of a transcription service. Within it, we distinguish between two kinds of transcription: edited and verbatim. Edited transcription removes all expressions that do not contribute to or change the meaning of what is heard, for example “uhm”, “eh”, or repetitions. Verbatim transcription, on the contrary, includes everything that is heard, including unfinished phrases or words.
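To make the difference concrete, here is a minimal Python sketch that turns a verbatim line into a rough edited one. The filler list and the function name are assumptions chosen for illustration, not part of any particular transcription tool, and the result would still need review by a person.

```python
import re

# Hypothetical filler list; a real style guide would define its own.
FILLERS = ("uhm", "um", "uh", "eh")


def edit_transcript(verbatim: str) -> str:
    """Produce a rough edited version of a verbatim transcript line."""
    # Remove filler words together with a trailing comma or period.
    filler_pattern = r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*"
    text = re.sub(filler_pattern, "", verbatim, flags=re.IGNORECASE)
    # Collapse an immediately repeated word ("we we" becomes "we").
    text = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Tidy up any leftover runs of whitespace.
    return re.sub(r"\s{2,}", " ", text).strip()


if __name__ == "__main__":
    print(edit_transcript("Uhm, I think, eh, we we should start"))
```

A simple rule like this also collapses intentional repetitions (for example “that that”), which is one reason an edited transcription still requires human judgment.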