Transcription Style Guide Reference

Welcome to the Transcription Style Guide!

The Transcription Style Guide explains Rev's expectations for transcript quality. In addition to this guide, which covers the fundamental elements of a quality transcript, we have a robust Help Center for additional guidance and examples. 

We trust you to deliver high-quality work. Customers rely on your accurate and timely transcription as a crucial part of their daily work.


Browser Compatibility

Rev recommends that you use the most up-to-date version of Google Chrome when working in the transcription editors.




Always transcribe the audio as spoken. Remember, the transcript should accurately reflect what was actually said in the audio file. Although spoken word is not always grammatically correct, your transcript must preserve the integrity of the original speech. Do not type what you think the speaker meant to say.
Always attribute what is being said to the correct speaker.

  • Do not omit content.
    • There are allowable exceptions.
  • Never add content or paraphrase.
  • Never censor or edit expletives/profanity if the word is spoken.
    • If the word is censored with a sound or silenced in the audio, use a notation such as (beep) or (censored) to indicate the censored word.
  • Egregious phonetic and pronunciation errors that inhibit readability or understanding may be corrected.
    • Example: if a speaker pronounces “refrigerator, washer and dryer” as “refrigurator, warshar and dryear”, please use the correct word and spelling based on the context of the audio.
  • Informal contractions may be corrected in non-verbatim projects to help readability.
    • Informal contractions are short forms of words that people use while speaking casually.
    • You may change these to the formal form when applicable if it would help with readability.
      • ’cause ➜ because
      • ’em ➜ them
      • doin’ ➜ doing
      • gonna ➜ going to
      • gotta ➜ got to
      • kinda ➜ kind of
      • wanna ➜ want to
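The substitutions listed above can be sketched as a simple lookup table. This is a minimal Python illustration, not part of Rev's tooling: the names `INFORMAL_CONTRACTIONS` and `expand_informal_contractions` are hypothetical, and the mapping contains only the seven pairs shown in this guide. Because the guide says these changes may be made when they help readability, treat the output as a suggestion to review, not an automatic correction.

```python
import re

# The seven pairs listed in the guide, keyed in lowercase with straight apostrophes.
INFORMAL_CONTRACTIONS = {
    "'cause": "because",
    "'em": "them",
    "doin'": "doing",
    "gonna": "going to",
    "gotta": "got to",
    "kinda": "kind of",
    "wanna": "want to",
}

# Accept straight and curly apostrophes in the transcript text.
APOSTROPHES = "'\u2018\u2019"

# Keys contain only letters and apostrophes, so no regex escaping is needed.
_alternatives = "|".join(
    key.replace("'", f"[{APOSTROPHES}]") for key in INFORMAL_CONTRACTIONS
)
# Match whole words only: no letter, digit, or apostrophe on either side.
_PATTERN = re.compile(
    rf"(?<![\w{APOSTROPHES}])({_alternatives})(?![\w{APOSTROPHES}])",
    re.IGNORECASE,
)


def _swap(match):
    spoken = match.group(1)
    key = spoken.lower().replace("\u2018", "'").replace("\u2019", "'")
    formal = INFORMAL_CONTRACTIONS[key]
    # Preserve a leading capital, e.g. at the start of a sentence.
    if spoken.strip(APOSTROPHES)[:1].isupper():
        formal = formal[0].upper() + formal[1:]
    return formal


def expand_informal_contractions(text):
    """Replace the listed informal contractions with their formal forms."""
    return _PATTERN.sub(_swap, text)


print(expand_informal_contractions("I'm gonna call \u2019em \u2019cause they\u2019re doin\u2019 fine."))
```

Standard contractions such as "I'm" and "they’re" are left untouched, since only the informal forms in the table are matched.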



Wrong Words

Always use context clues in the audio to type the correct word or phrase. If you are unsure of a word or phrase, try researching, using Lend An Ear, or asking for a second opinion on the forum.


“aerospace” vs “arrow space”
“Botox” vs “boat ox”

Always use context clues to write down the appropriate word. This is especially important for proper nouns or industry terminology.

“looked” vs “loved”
“kissed” vs “killed”

Take your time while transcribing—a changed word could result in a drastic change in the meaning of a sentence.

“than” vs “then”

Be mindful to use the correct form of a homophone.

“desert” vs “dessert”

Sometimes even a single letter can completely change the meaning.



Spelling & Grammar

  • Use standard U.S. spelling.
  • Always research words, phrases and proper nouns (names, companies, titles, etc.) you are unfamiliar with. 
    • If you cannot confirm the spelling of a proper noun through research, use your best guess and keep it consistent throughout the project.
  • Always reference glossary terms when provided. If a customer has provided glossary terms, they will display in the left-hand menu of the editor.
  • Run a spell check to catch spelling and typographical errors.*
  • Use English grammar conventions while maintaining the integrity of what was spoken.
    • We are unable to cover and address specific guidelines regarding grammar.
    • We expect you to have prior knowledge of, or be able to research, American English grammar, capitalization, and punctuation guidelines.


* The spellcheck in the Editor helps catch errors, but it is ultimately up to you to proof your document for spelling errors and incorrect word swaps.



Verbatim vs. Non-Verbatim

In verbatim projects, transcribe exactly what you hear, including filler words, stutters, interjections (active listening) and repetitions.

You will be able to tell if a project is verbatim in Find Work (indicated in the TYPE column) and in the editor (listed next to TYPE in the upper right corner, above the playback controls).


If the project was requested Verbatim but was not completed as such, the project will be graded 1/1 for accuracy/formatting.


Non-Verbatim (default style)

In non-verbatim projects, you should lightly edit for readability. You should not change the structure or meaning of the speech. Non-verbatim projects will not have an indicator in the TYPE column in Find Work and are listed as NON-VERBATIM in the editor.


Each item below notes whether it is captured in verbatim and in non-verbatim projects.

  • Non-speech sounds: (laughs) and (laughing) are the only non-speech sounds we capture, and only in verbatim projects.
    • Verbatim: yes. Non-verbatim: no.
  • All OTHER non-speech sounds (e.g. coughs, sneezes, clapping, paper rustling, dog barks, car honks) should not be transcribed.
    • Verbatim: no. Non-verbatim: no.
  • Interjections or signs of active listening that interrupt a speaker (e.g. Okay, Yeah, Mm-hmm).
    • Verbatim: yes. Non-verbatim: no.
  • Filler words (um, uh): Also known as “verbal pauses”; other words such as like or you know may also be used like this.
    • Verbatim: yes. Non-verbatim: no.
  • False starts / self-corrections that are quickly reworded, unless they provide additional context. A complete sentence is not a false start.
    • Verbatim: yes. Non-verbatim: no.
  • Stutters and repetitions (e.g. I think we should go to the, the m- m- movies.)
    • Verbatim: yes. Non-verbatim: no.
  • Explicit content or profanity should be captured as spoken (or as censored) in both default and verbatim projects.
    • Verbatim: yes. Non-verbatim: yes.
  • Singing should be noted only as (singing) in both default and verbatim projects; do not transcribe the lyrics to a song in a transcript.
    • Verbatim: yes. Non-verbatim: yes.


Examples: Verbatim vs Non-Verbatim

Example 1
  • Verbatim: And so, um, I guess… I think we should go to the, the m- m- movies tonight ’cause of the discount (laughs).
  • Non-verbatim: I think we should go to the movies tonight because of the discount.

Example 2
  • Verbatim: I like, you know, called her, like, yesterday and, um, like, she was, like, sleeping. Probably, she was just like, really tired.
  • Non-verbatim: I called her yesterday and she was sleeping. Probably, she was just really tired.

Example 3
Leave the false start in default style because it provides context as to who called.
  • Verbatim: My mom was (laughs)… I forgot to tell you, she called me yesterday.
  • Non-verbatim: My mom was… I forgot to tell you, she called me yesterday.

Example 4
Remove the false start in default style because “My mom” is introduced later.
  • Verbatim: My mom… I forgot to tell you, my mom called me yesterday.
  • Non-verbatim: I forgot to tell you, my mom called me yesterday.




An inaudible tag should be used when unintelligible or inaudible words are spoken. This may happen due to difficult audio quality, a sound (such as a car horn) obscuring the main speaker, or recording issues. This tag should never be used in place of research when you are unfamiliar with a term.



  • Excessive Inaudibles: If you are using an excessive number of inaudibles in a transcript (to the point where the transcript would be unusable to the customer), unclaim and report the file as difficult audio.
  • Incorrect use of the inaudible tag is an error.
    • Using the tag when the word can be identified is an accuracy error. 
    • Incorrectly formatting the tag is a formatting error, as explained under Notation Tags.




Provided Speaker Labels

If a customer has provided speaker labels, they will appear in the information pane in the left-hand panel of the editor. You must use them if:

  • The speaker is self-identified in the audio or video. 
    • “My name is Arnold”
  • You can reasonably infer who is speaking if another speaker introduces the name.

    •  “What do you think, Gustav?”

  • There is only one speaker and one name is provided.

  • You can use the process of elimination to assign the correct speaker names (e.g. one male name and one female name match up with one male speaker and one female speaker).


If you cannot assign the provided speaker labels, follow the guidelines below for Inferred Speaker Labels.



Inferred Speaker Labels

A reasonable effort must be made to distinguish speakers using the rules below: 

  • Never create your own descriptive speaker labels (e.g. “Old man” or “Blue shirt guy”).
    • This is extremely unprofessional and will result in a 1 in Formatting.


  • Please make every effort to not use gender in any format for speaker labels.
    • This can be considered offensive in some scenarios, and other options must be explored.
    • While this would not qualify as an automatic 1 in formatting, this will result in a reduction in score.
    • Speaker + number, roles/titles, or group-type labels can be used, depending on the scenario in the project.


Speaker label types:

  • Speaker + Number
    • Examples: Speaker 1, Speaker 2
    • When to use: Default and most common way of labeling speakers when the speaker’s name cannot be reasonably inferred from the audio or video.
  • Speaker’s Name
    • Examples: John Smith, Sara, Professor Lee
    • When to use: If the speaker’s name can be reasonably inferred from the audio or video. If labels were not provided by the customer, Speaker + Number is also acceptable in this scenario.
  • Professional Role or Title
    • Examples: Interviewer, Doctor, Translator
    • When to use: (Optional) If the speaker’s name cannot be reasonably inferred from the transcript. Using Speaker + Number is also acceptable.
  • Group Label
    • Examples: Students, Audience, Camera Crew, Speaker X
    • When to use: Only when there are too many speakers to consistently track who says what (e.g. classroom discussion, focus group). Do not use as a substitute for reasonable speaker identification.

Customer-provided speaker labels must be used whenever possible according to the guidelines in the previous section.



Notation Tags

If you encounter difficult or non-English audio, use one of the bracketed notation tags below, including a timestamp of the audio location. Also take note of the parenthetical tags used for singing, laughter, and censored content. Do not create a notation tag not listed below. 


Notation tags and when to use them:

  • [inaudible hh:mm:ss]: Use when unintelligible or inaudible words are stated. Equivalent to a “blank” in medical transcription.
  • [foreign language hh:mm:ss]: For any non-English portions of audio, indicate where they begin with a timestamp and either the name of the language (if known) or simply “foreign language”. DO NOT transcribe non-English audio.
    • If a translator is speaking on a respondent's behalf, there is no need to denote [foreign language hh:mm:ss] every time that the respondent speaks.
  • (singing): Used only if the lyrics cannot be clearly discerned because of challenging audio or unclear singing.
  • (laughs) or (laughing): Used to indicate laughter in verbatim files only.
  • (beep), (censored): Used to indicate words that have been intentionally censored in the audio (usually profanity or redacted content). DO NOT censor content if it is spoken in the audio.
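As a rough self-check, the notation tags listed above can be expressed as patterns. The following Python sketch is only an illustration and is not a Rev tool: the function name `find_tag_problems` is hypothetical, and it assumes that any text in square brackets or parentheses is meant to be a notation tag. It accepts only the literal tags shown in this guide, so a tag that names a specific language in place of “foreign language” would be flagged and the pattern would need to be extended. The (music) tag for instrumental-only projects is not covered.

```python
import re

# hh:mm:ss with two digits in each position.
TIMESTAMP = r"\d{2}:[0-5]\d:[0-5]\d"

# Bracketed tags must carry a timestamp.
BRACKET_TAG = re.compile(rf"\[(?:inaudible|foreign language) {TIMESTAMP}\]")

# Parenthetical tags listed in the guide.
PAREN_TAGS = {"(singing)", "(laughs)", "(laughing)", "(beep)", "(censored)"}
VERBATIM_ONLY = {"(laughs)", "(laughing)"}


def find_tag_problems(transcript, verbatim):
    """Return (tag, reason) pairs for tags that do not match the guide."""
    problems = []
    # Assumption: every [...] or (...) span in the transcript is a notation tag.
    for tag in re.findall(r"\[[^\]]*\]|\([^)]*\)", transcript):
        if tag.startswith("["):
            if not BRACKET_TAG.fullmatch(tag):
                problems.append((tag, "unrecognized or badly formatted bracketed tag"))
        elif tag not in PAREN_TAGS:
            problems.append((tag, "parenthetical tag not listed in the guide"))
        elif tag in VERBATIM_ONLY and not verbatim:
            problems.append((tag, "laughter is only captured in verbatim projects"))
    return problems


sample = "I went to [inaudible 00:01:15] and (laughs) then [crosstalk 00:02:00] (coughs)."
for tag, reason in find_tag_problems(sample, verbatim=False):
    print(tag, "->", reason)
```

A check like this can only catch formatting problems. Whether an inaudible tag was justified in the first place still depends on listening to the audio.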




Occasionally, customers dictate instructions to format the transcription while they are speaking. These instructions should be followed when possible but never transcribed.

  • Follow customer requests for spoken directions such as “new paragraph”, “comma”, “period” or “bullet point” (use a dash). Do not type out the instruction.
  • If a customer has clearly missed an instruction (e.g. “period” after a sentence has obviously concluded), it’s acceptable to add it in to aid readability.
  • As Rev does not support text formatting in the editor, ignore requests such as “bold”, “italics”, “underline” or “strikethrough”.



Lyrics and Singing

Transcribe lyrics when there is no spoken dialogue occurring at the same time, even if there are pre-existing captions or subtitles on the video. When there are no spoken words, the lyrics become the dialogue to be transcribed.

  • Omitting clearly heard lyrics will result in a reduction in score and can be scored as low as 1/1.
  • Tip: Googling portions of the lyrics can be helpful.


How to notate lyrics

  • Use the speaker label MUSIC for lyrics
    • If there are multiple singers and/or background vocalists, all content should remain under a single MUSIC label
  • Each lyrics line should be on a new line, or in a new paragraph in the editor
  • Capitalize the beginning of a line of lyrics
  • Each lyrics line should end with a period
  • Background vocals, if present, should be transcribed and included on a new line
    • Do not use parentheses for background vocals
  • All lyrics need to be transcribed as sung, including repeated lines
    • Do not use elements like (repeat x2)
  • When words repeat at the end of a song and fade out, ellipses may be used to represent the fadeout
  • Filler words in lyrics, like oh, ah, etc., should be transcribed sparingly if they add to the song's content stylistically
    • When these are background vocals, they can be omitted
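The mechanical parts of the lyric rules above (one MUSIC label, one lyric line per line, a capital at the start, a period at the end) can be sketched in Python as follows. This is a hedged illustration only: `format_lyrics` is a hypothetical helper, not part of the editor, and leaving lines that already end in an ellipsis, a question mark, or an exclamation point unchanged is an assumption, since the guide only mentions periods and fade-out ellipses.

```python
MUSIC_LABEL = "MUSIC"

# Assumption: a line ending in one of these is already terminated.
TERMINATORS = (".", "\u2026", "?", "!")


def format_lyrics(raw_lines):
    """Return the speaker label and the cleaned lyric lines."""
    formatted = []
    for raw in raw_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Capitalize the beginning of each line of lyrics.
        line = line[0].upper() + line[1:]
        # End each line with a period unless it is already terminated.
        if not line.endswith(TERMINATORS):
            line += "."
        formatted.append(line)
    return MUSIC_LABEL, formatted


label, lines = format_lyrics([
    "row, row, row your boat",
    "  gently down the stream.",
    "",
    "merrily, merrily...",
])
print(label)
print("\n".join(lines))
```

Repeated lines are passed through unchanged, which matches the rule that lyrics are transcribed as sung and never collapsed into a note such as (repeat x2).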


Instrumental Music Only

Music should only be noted if the project only contains instrumental music. Music should not be noted in a project with dialogue or singing. 

A project with only instrumental music will contain music and no dialogue, speaking, singing, or lyrics. If a project has only instrumental music, it can be submitted with a single (music) notation tag with MUSIC as the speaker label.



Please note that we strongly recommend listening to the full length of the project to ensure that there is no dialogue or singing later in the file. Omitted dialogue or singing can result in a score as low as 1/1.



Additional Guidelines

Unworkable Projects

Certain types of projects are considered “unworkable” and should not be completed. You can unclaim a project by selecting Unclaim in the Project dropdown in Line. Unclaim projects if they meet the criteria below.




  • The only audible content is in a non-English language: unclaim the project as “No English audio present”.
  • The content violates our Terms of Service (pornography, excessively violent, hate speech, etc.): unclaim the project as “Contains explicit or disturbing content”.


If you submit a project that is 100% foreign language, you may receive a grade of 1/1 and have the project pay removed.



Project-Specific Instructions

Occasionally a project may have approved special instructions that deviate from our normal guidelines. These instructions will be clearly marked as Special Instructions in the editor with either a yellow banner or in a designated Special Instructions section in the left-hand menu. They will also appear on the Find Work page. 

Customers will sometimes include separate instructions that go against our Style Guide in the glossary or speaker name section. Any customer-provided requests that do not appear in the designated Special Instructions section or banner and that go against our style guidelines should be ignored.


Not following official special instructions is considered an error and may result in a score of 1/1.



Grading Scale

In transcription, a grade consists of scores in two categories: Accuracy and Formatting. This scoring rubric is used by graders when assessing overall project quality.


  • 5 - Excellent (customer ready): Transcript contains very few errors and is accurate and high quality.
  • 4 - Fair (customer ready): Transcript contains occasional errors but is generally accurate and acceptable quality.
  • 3 - Needs Improvement (not customer ready): Transcript contains frequent errors, and multiple edits would be needed before this is considered customer-ready.
  • 2 - Poor (not customer ready): Transcript contains very frequent errors, and significant edits would be needed before this is considered customer-ready.
  • 1 - Unusable (not customer ready): Transcript appears incomplete, partially unedited, or of such poor quality to be unusable.*



Accuracy Rubric

The overall accuracy quality of the graded sections can be described as…

  • 5 - Excellent
    • Spoken audio is accurately represented.
    • Contains very few accuracy and/or punctuation errors that minimally impact meaning or readability.
    • Speech is almost always attributed to the correct speaker, though there may be very rare misattributions.
  • 4 - Fair
    • Spoken audio is generally well-represented.
    • Contains occasional accuracy and/or punctuation errors that moderately impact meaning or readability.
    • Speech is very often attributed to the correct speaker, though there may be rare misattributions.
  • 3 - Needs Improvement
    • Spoken audio is sometimes misrepresented.
    • Contains frequent accuracy and/or punctuation errors that regularly impact meaning or readability.
    • Speech is usually attributed to the correct speaker, though there may be occasional misattributions.
  • 2 - Poor
    • Spoken audio is often misrepresented.
    • Contains very frequent accuracy and/or punctuation errors that significantly impact meaning or readability.
    • Speech is sometimes attributed to the correct speaker, though there may be frequent misattributions.
  • 1 - Unusable
    • Appears to be incomplete, unedited, or of such poor quality that the final deliverable is unusable.
    • This includes verbatim projects captured in the default style.


Formatting Rubric

The overall formatting quality of the graded sections can be described as…

  • 5 - Excellent: Contains very few notation tag or labelling errors that minimally impact readability.
  • 4 - Fair: Contains few notation tag or labelling errors that moderately impact readability.
  • 3 - Needs Improvement: Contains regular notation tag or labelling errors that impact readability.
  • 2 - Poor: Contains frequent notation tag or labelling errors that significantly impact readability.
  • 1 - Unusable: Appears to be incomplete, partially unedited, or with poor adherence to the formatting guidelines.
    • This includes unformatted dictation projects and verbatim projects captured in the default style.


