How to Transcribe Audio to Text

September 18, 2020
I think transcription is one of those tasks that many people assume is quick and easy… until they try it. While the concept is simple – you just sit and type to transcribe audio to text – the task actually requires a great deal of concentration and focus. Not only that, but you have to work at a high speed if you want to transcribe audio at the kind of pace that means you can make a living from it. And that’s at the same time as ensuring there are no mistakes within the copy you’ve just typed. 

If you want to know how to transcribe, read on. We’ll run through how to transcribe audio to text accurately and efficiently, look at potential pitfalls and what to do about them, and consider how to deal with technical and sensitive content. Ready? Then let’s get cracking.

Key Considerations with Audio to Text Transcription

Audio to text transcription is surprisingly different to video transcription. When I transcribe video, it’s never hard to recognise who is speaking, and having the visual element can do much to make up for poor sound quality. 

Audio file transcription is a very different matter, presenting its own unique set of challenges. These include:

  • Audio quality
  • Transcribing standards
  • Technical subject matters
  • Multiple speakers 
  • Resources and project size
  • Sensitive content 

Let’s take a closer look at each of these. In addition, if you want to explore what transcription services are in more detail, and why your business needs them, you can click the link below. 

Read more: What Are Transcription Services and Why Does Your Business Need Them?


Audio Quality

When I talk about transcription, a lot of people imagine me sitting and typing away happily as one person speaks clearly and slowly on the recording. If only that were the case! More often than not, transcribing audio to text means dealing with a whole host of inaudible words, mumbling, non-words (er, um, hm), background noise (including entire conversations), periods of silence, and stutters and gaps in the recording itself.

Thankfully, none of this is insurmountable – it’s all just part of the fun of undertaking audio file transcription. 

One of the trickiest challenges is when a word or phrase is inaudible. I’ll find myself listening repeatedly and tweaking the volume to try and catch what is said, but sometimes it’s simply impossible. How to deal with it depends on the context:

  • If the speaker is mumbling, you can put “(inaudible)” in the transcript, along with a timestamp. 
  • If the speaker’s voice is overridden by another sound, you can indicate what that sound was, using “(crosstalk)” or “(laughter)” or similar. 
  • If the recording itself was interrupted (I’ve had many a Zoom call with stutters in recently) then you can put “(audio gap)” to indicate the reason for the inaudible content.
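
To make those tags concrete, here is a minimal Python sketch of how you might generate them consistently. The tag wording comes from the list above; the function names, the reason keys and the bracketed HH:MM:SS timestamp layout are my own assumptions, so swap in whatever format your client asks for.

```python
def format_timestamp(seconds: int) -> str:
    """Render a position in the recording as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Tags from the list above, keyed by why the words could not be heard.
# The keys themselves are hypothetical labels chosen for this sketch.
TAGS = {
    "mumbling": "(inaudible)",
    "crosstalk": "(crosstalk)",
    "laughter": "(laughter)",
    "recording_gap": "(audio gap)",
}


def inaudible_marker(reason: str, seconds: int) -> str:
    """Build the tag plus a timestamp for one inaudible passage.

    The "tag [HH:MM:SS]" layout is an assumption, not a standard.
    """
    return f"{TAGS[reason]} [{format_timestamp(seconds)}]"


print(inaudible_marker("mumbling", 765))  # (inaudible) [00:12:45]
```

Keeping the tags in one place like this means every passage you can’t make out is marked the same way throughout the transcript.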

As with so many elements of the audio file transcription process, there’s a solution to hand; it just takes care and attention to ensure you use the right one.

Transcribing Standards

When you transcribe audio to text for a living, you quickly become used to working with different standards for different clients. I find this is just a question of understanding why the client needs the transcription and what they are looking for in it. That can then guide you as to how to transcribe it for them. 

I find these four main transcribing standards cover most bases: 

  • Verbatim transcription: includes every single “um” and “er” that’s on the audio file.
  • Edited transcription: the meaning is preserved without paraphrasing, but filler words (like, you know) and non-words are edited out.
  • Intelligent transcription: the meaning of the speech is delivered, but with paraphrasing and summarising along the way. 
  • Phonetic transcription: a specialist form of transcription that records the speaker’s pronunciation and tone. You’ll need training on how to transcribe in this way if phonetic transcription is a service you plan to offer. 

A quick check with each client as to what they need before you start to transcribe audio to text means that you can always adhere to the right transcribing standard for the job. 
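
If you already have a verbatim transcript and the client wants an edited one, a rough first pass over the non-words can be scripted. The Python sketch below is only an illustration under my own assumptions: the function name is hypothetical, and it strips just the non-words mentioned in this article (um, er, hm). Filler words such as “like” and “you know” depend on context, so I’d leave those to a human read-through.

```python
import re

# Non-words named in the article; extend this tuple to suit the recording.
NON_WORDS = ("um", "er", "hm")

# Match a whole non-word, an optional trailing comma and any following spaces.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(NON_WORDS) + r")\b,?\s*",
    re.IGNORECASE,
)


def strip_non_words(verbatim: str) -> str:
    """Remove non-words from a verbatim transcript (rough first pass only)."""
    cleaned = _PATTERN.sub("", verbatim)
    # Collapse any doubled spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_non_words("Um, I think, er, we should start."))
# I think, we should start.
```

The word boundaries mean that “her” or “summer” are left alone, but the output still needs a proofread: capitalisation and punctuation around the removed non-words won’t always be right.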

Technical Transcription

Do you have specialist legal or medical knowledge? Are you familiar with the latest tech sector jargon? If you specialise in a certain area, then you are well positioned to transcribe sound to text when it relates to that subject matter. 

The terminology included within technical transcriptions can tie you in knots if you’re not familiar with the topic. Aside from the potential for you to misunderstand a term or simply spell it wrong, you can end up wasting valuable time looking phrases up to check you’ve got them right. 

I would advise against taking on a technical transcription job if you don’t have the required skills. By all means set yourself a goal of learning to become a specialist transcriber – just don’t over-promise a client until you really do have all the knowledge you’re going to need in order to deliver on such a project. 

Multiple Speakers

This can be a really tricky area when you transcribe an audio file to text, rather than a video. In order to produce an accurate transcription, you need to identify who is speaking for each element of the recording. Otherwise, you could pair parts of the discussion with the wrong speaker, making for an inaccurate and confusing transcription. 

How can you identify each speaker? I think it’s one of those tasks that becomes easier with practice. You need to attune your ear to each individual’s pace and style of speech. There will be contextual clues to help you out too, from the format of the conversation to the content being delivered. 


Resources and Project Size

You’ve got a client who wants an enormous transcription job. Excellent! 

Or is it?

One of the challenges with large-scale transcription jobs is working out how to transcribe it all within the client’s deadline. When you transcribe audio to text, it takes about an hour to transcribe 15 minutes of the recording. That means that for every audio hour, you need to allow four hours for transcription.  

For large projects, you’re likely to need some support from your fellow transcribers. You’ll need to work out how many transcriptionists it will take in order to deliver everything in line with the client’s timescales. For multi-staff projects like this, you also need to allow time for managing the process. That excellent-seeming large job can quickly turn into a headache if you misjudge the resources that you’re going to need! 
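
The arithmetic above is easy to turn into a quick planning aid. This Python sketch uses the four-to-one ratio from this article; everything else (the function names, six productive hours per transcriber per day, and a 10% allowance for managing the process) is an assumption of mine that you should adjust to your own team.

```python
import math

# From the article: one audio hour takes about four hours to transcribe.
TRANSCRIPTION_RATIO = 4


def transcription_hours(audio_hours: float) -> float:
    """Estimate the working hours needed to transcribe a recording."""
    return audio_hours * TRANSCRIPTION_RATIO


def transcribers_needed(
    audio_hours: float,
    days_until_deadline: int,
    hours_per_day: float = 6.0,         # assumed productive hours per person
    management_overhead: float = 0.10,  # assumed share of time spent coordinating
) -> int:
    """Estimate how many transcribers are needed to hit the deadline."""
    total_hours = transcription_hours(audio_hours) * (1 + management_overhead)
    hours_per_person = days_until_deadline * hours_per_day
    return math.ceil(total_hours / hours_per_person)


print(transcription_hours(1))      # 4
print(transcribers_needed(30, 5))  # 5
```

In the second example, 30 audio hours means roughly 120 hours of typing, or about 132 with the management allowance. Each person can offer 30 hours across five days, so the job needs five transcribers rather than four.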

Sensitive Content

When you transcribe audio to text for a living, it won’t be long before you find yourself working on sensitive content. You may know how to transcribe, but are you also familiar with IT security requirements and the standard wording of a non-disclosure agreement? If not, I suggest brushing up on these fast, as clients will want to know that their data is safe when you’re working on it. 

How to Transcribe Audio to Text – Step by Step Guide

If you’re ready to transcribe audio to text, simply follow this step-by-step process:

  • Set up your tech
  • Find the perfect text expander
  • Type
  • Review and proof
  • Think about file type

Set Up Your Tech

You can’t transcribe audio to text easily or quickly without the right setup. You don’t need much, but you do need the right equipment if you want to transcribe audio files to text efficiently. 

You’ll need a PC or laptop with a word processor and software to play the audio file. You’ll also need a keyboard with a play/pause button on it or (even better) a transcription foot pedal that allows you to start and stop the audio without taking your fingers off the keyboard. 

Find the Perfect Text Expander

If you want to learn how to transcribe fast, you’ll also need a text expander programme. These nifty little pieces of software allow you to enter your own abbreviations, which the software then expands for you automatically. 

Personally, I use TypeIt4Me. The interface is super simple, and I can enter any abbreviations and terms that I need. I use it across a huge range of tasks, from entering “BW” at the end of an email to produce “Best wishes,” to signing off with “HALE” at the end of instant messaging conversations (this expands into “Have a lovely evening”). 

When you transcribe audio to text, you can programme the speakers’ names and any common words or phrases that keep cropping up into the text expander. This provides you with your own version of shorthand, with the software typing out in full what you’re entering in note form. As I say, if knowing how to transcribe fast is your priority, this is a real bonus. 
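
To show the idea behind a text expander, here is a minimal Python sketch. It is not how TypeIt4Me works internally; it only illustrates the principle of swapping abbreviations for full text. “BW” and “HALE” are the examples from this article, while the speaker shortcuts and the function name are hypothetical.

```python
import re

# Abbreviation -> expansion. The first two come from the article;
# the speaker shortcuts are made-up examples for a two-person interview.
ABBREVIATIONS = {
    "BW": "Best wishes,",
    "HALE": "Have a lovely evening",
    "sp1": "Interviewer:",  # hypothetical speaker label
    "sp2": "Dr Patel:",     # hypothetical speaker label
}


def expand(text: str, abbreviations: dict = ABBREVIATIONS) -> str:
    """Replace each whole-word abbreviation with its full expansion."""
    # Longest keys first, so a short key never pre-empts a longer one.
    keys = sorted(abbreviations, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda match: abbreviations[match.group(1)], text)


print(expand("sp1 Thanks for joining. HALE"))
# Interviewer: Thanks for joining. Have a lovely evening
```

Matching is case-sensitive and limited to whole words, so typing “bw” or a word that merely contains an abbreviation leaves the text untouched. Pick abbreviations that would never appear as real words in the recording.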


Type

With your text expander in place, it’s time to start typing. The faster you can type, the better. Aim for a minimum typing speed of 50 words per minute if you plan to transcribe audio to text for a living and upwards of 65-70 words per minute if you intend to provide rapid services for time-sensitive jobs. 

If you don’t already touch type, I recommend Mavis Beacon Teaches Typing. This software gamifies learning to touch type and will have you doing so extremely fast, meaning you can boost your typing speed and, ultimately, earn more from your transcription career by fitting more work into each day. 

Review and Proof

Once you’ve finished typing, stop and take a breath. Make a cup of tea, even. Then sit back down and read through what you’ve typed at the same time as listening to the audio file. 

This review stage is really important if you want to transcribe audio to text accurately. It’s your opportunity to proofread and tidy up your transcription, so that you can be confident it’s word-perfect by the time you hand it over to the client. I can’t emphasise enough how important this part of the process is if you want to earn a reputation for knowing how to transcribe accurately. 

Think About File Type

Finally, you need to think about the file format that the client needs. Will a Word document suffice, or would the client prefer a PDF? Or do they need the content in a multimedia format that they can then work with at their end? Get it right if you want to keep your clients happy! 

How to Transcribe Audio to Text with Tomedes’ Transcription Services

If you don’t want to transcribe audio to text yourself, Tomedes is here to help. Our multilingual transcription services take care of all of your transcription needs. We use a simple, four-stage process: 

  • File upload 
  • Transcription by our qualified experts
  • Quality assurance
  • Transcript delivery 

I’ll go through each of these in turn below. In the meantime, if you want to find out more about why Tomedes is one of the best transcription services in 2021, you can click the following link. 

Read more: The Best Transcription Services in 2021

File Upload

You can use the form on the transcription services page of our website to upload your file and let us know the type of transcription you would like. 

Transcription by Our Qualified Experts

Sit back and relax while our experienced, qualified experts transcribe audio to text for you accurately and efficiently. 

Quality Assurance

We’ll review and proof your transcription to ensure that every I is dotted and every T crossed. 

Delivery of Transcript

We’ll then deliver your transcript on time, in whichever file format you need. Job done! 


When you need to transcribe audio to text, you have the option to do it yourself or to engage a professional transcriptionist. If you can listen well and type fast, a successful career in transcription could await you. You just need to follow our pointers on how to transcribe well: 

  • Set up your tech, including a text expander
  • Type, review and proof
  • Deliver the correct file format

If you’re looking for someone else to transcribe audio to text on your behalf, you can engage a company to:

  • Transcribe from a file you upload
  • Undertake quality assurance on your transcript
  • Provide the file format you need 

However you prefer to meet your transcription needs, it’s worth talking to Tomedes. We work with talented transcriptionists, transcribing for business clients around the world. Contact us today to find out more! 


By Ofer Tirosh

Ofer Tirosh is the founder and CEO of Tomedes, a language technology and translation company that supports business growth through a range of innovative localization strategies. He has been helping companies reach their global goals since 2007.
