What's the Difference Between Captions and Transcription?


Transcription and Captions


Figure: Chart comparing transcription and captions


Transcriptionists aim to create an accurate transcript of the English speech they hear. Captioners aim to recreate the full audio experience for viewers who cannot hear the audio: they capture the English speech and any singing, and they use atmospherics to describe the music and sounds that are integral to the story. The ultimate goal of captions is to give these viewers an experience as close as possible to that of a hearing viewer. Captioners also sync caption groups to the audio.


Speaker labeling

Dashes indicate speaker changes. In general, use only a dash and a space ("- ") when visual cues make it obvious who is speaking. Add a speaker label in brackets, such as [Name], when viewers cannot visually identify who is speaking before another speaker interrupts. If the speaker's name is not known, use the most appropriate role descriptor instead, such as Instructor, Narrator, or Announcer.
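The speaker-labeling rule can be expressed as a small formatting function. This is a minimal Python sketch, not part of Dash: the function name `format_speaker_change` and its parameters are hypothetical, and it assumes the bracketed label follows the dash (for example, "- [Instructor] Turn to page ten."), a layout the guideline does not spell out.

```python
from typing import Optional


def format_speaker_change(text: str, visually_identified: bool,
                          name: Optional[str] = None,
                          role: Optional[str] = None) -> str:
    """Prefix the first caption line of a new speaker (hypothetical helper).

    Assumption: a bracketed label follows the dash, for example
    "- [Instructor] Turn to page ten." The guideline does not fix this layout.
    """
    if visually_identified:
        # Visual cues show who is talking: a dash and a space suffice.
        return f"- {text}"
    # Prefer the speaker's name; fall back to a role descriptor.
    label = name or role
    if not label:
        raise ValueError("a name or role is required when the speaker "
                         "cannot be visually identified")
    return f"- [{label}] {text}"


print(format_speaker_change("Turn to page ten.", visually_identified=False,
                            role="Instructor"))
# - [Instructor] Turn to page ten.
```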


Proper caption breaking

Create caption groups for the best readability. Start a new caption group after terminal punctuation (a period, question mark, or exclamation mark) or after a double dash (--) that marks an abrupt interruption by another speaker or a relevant sound. A caption group cannot exceed five seconds in duration or 60 characters in length. Break before pronouns, adverbs, and prepositional phrases such as that, who, in order to, not only, as we, in which, where, with, what, how, for, through, until, to, as, of, yet, so, by; also break before conjunctions such as and, nor, but, or, because. Here's an example of good caption breaking:

It's invaluable as far as what it's going to do

for my job security and my options when I get out

of school and start looking for full-time work.

I don't miss school appointments or school plays.

Those are benefits that you can't get in an office.

I'm not sure how it can get much better than that.
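The breaking rules can be approximated with a greedy algorithm: start a new group after each sentence or double dash, fill each group up to 60 characters, and when a group would overflow, cut it before the latest listed break word. The Python sketch below is an illustration under simplifying assumptions, not Dash's implementation: `break_captions` and `within_limits` are hypothetical names, multi-word phrases such as "in order to" are matched by their first word only, and abbreviations such as "Dr." will be mistaken for sentence ends. The five-second limit needs timestamps, so it is checked separately.

```python
import re

MAX_CHARS = 60      # maximum characters per caption group
MAX_SECONDS = 5.0   # maximum on-screen duration per caption group

# First words of the phrases the guideline lists as good break points.
# Multi-word phrases ("in order to", "not only", "as we", "in which")
# are matched by their first word only, which is a simplification.
BREAK_BEFORE = {
    "that", "who", "in", "not", "as", "where", "with", "what", "how",
    "for", "through", "until", "to", "of", "yet", "so", "by",
    "and", "nor", "but", "or", "because",
}


def split_sentences(text: str) -> list[str]:
    """Split after ., ?, ! or a double dash, keeping the punctuation."""
    parts = re.split(r"(?<=[.?!])\s+|(?<=--)\s+", text.strip())
    return [p for p in parts if p]


def break_captions(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack words into caption groups of at most max_chars."""
    groups: list[str] = []
    for sentence in split_sentences(text):
        current: list[str] = []
        for word in sentence.split():
            # While adding the next word would overflow, emit a group.
            while current and len(" ".join(current + [word])) > max_chars:
                cut = len(current)  # fallback: break right before `word`
                # Prefer the latest listed break word (never index 0,
                # which would leave an empty group).
                for i in range(len(current) - 1, 0, -1):
                    if current[i].lower().strip(",;:") in BREAK_BEFORE:
                        cut = i
                        break
                groups.append(" ".join(current[:cut]))
                current = current[cut:]
            current.append(word)
        if current:
            groups.append(" ".join(current))
    return groups


def within_limits(group: str, start: float, end: float) -> bool:
    """Check one timed caption group against both limits."""
    return len(group) <= MAX_CHARS and (end - start) <= MAX_SECONDS
```

Run on the example text, this sketch happens to produce the same six caption groups shown in the example. A captioner also weighs meaning and timing, which a character count alone cannot capture.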


Syncing caption groups

In the sync stage, use the Up or Down Arrow key to sync each caption group so that it appears on screen when its audio begins. The start time must align with the beginning of the sound, for both atmospherics and speech. Aim for precision, but a start time up to half a second early or late relative to the start of the sound is acceptable.
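The half-second tolerance amounts to a simple comparison. This Python sketch assumes you already know when each sound begins (for example, from a reviewer's notes); `is_in_sync` and `out_of_sync` are hypothetical helpers, not Dash features, and all times are in seconds.

```python
SYNC_TOLERANCE = 0.5  # seconds early or late, per the guideline


def is_in_sync(caption_start: float, sound_start: float,
               tolerance: float = SYNC_TOLERANCE) -> bool:
    """True if the caption starts within `tolerance` seconds of the sound."""
    # A tiny epsilon absorbs floating-point error at exactly 0.5 s.
    return abs(caption_start - sound_start) <= tolerance + 1e-9


def out_of_sync(pairs: list[tuple[float, float]]) -> list[int]:
    """Return indexes of (caption_start, sound_start) pairs to re-sync."""
    return [i for i, (cap, snd) in enumerate(pairs) if not is_in_sync(cap, snd)]


# The second group starts 0.9 s late, so it is flagged.
print(out_of_sync([(1.0, 1.2), (4.9, 4.0), (7.5, 7.5)]))  # [1]
```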


Lyrics

Caption song lyrics when no spoken dialogue occurs at the same time; in the absence of spoken words, the lyrics become the dialogue to be captioned. Add a musical eighth note (♪) at the start of every caption group that contains lyrics. In Dash, type ## followed by a space to insert the note.
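The lyric rule can be sketched as a filter over caption groups. This Python example is illustrative only: the `CaptionGroup` fields `is_lyrics` and `overlaps_dialogue` are hypothetical, and in Dash the note is inserted by typing ## and a space rather than by code. It also assumes that lyrics overlapping dialogue are simply left uncaptioned and that the note is followed by a space.

```python
from dataclasses import dataclass

EIGHTH_NOTE = "\u266a"  # the musical eighth note, ♪


@dataclass
class CaptionGroup:
    text: str
    is_lyrics: bool = False          # hypothetical flag: sung, not spoken
    overlaps_dialogue: bool = False  # hypothetical flag: speech at same time


def caption_lyrics(groups: list[CaptionGroup]) -> list[str]:
    """Return caption text, prefixing lyric groups with an eighth note."""
    out: list[str] = []
    for g in groups:
        if g.is_lyrics:
            if g.overlaps_dialogue:
                # Lyrics are captioned only when no dialogue competes;
                # this sketch assumes they are skipped otherwise.
                continue
            # Assumed layout: note, a space, then the lyric text.
            out.append(f"{EIGHTH_NOTE} {g.text}")
        else:
            out.append(g.text)
    return out
```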


Atmospherics

Captions need to indicate the sounds heard in the video. We call these identifiers atmospherics; they give viewers a visual indication of non-verbal sounds. Use adjectives to describe mood music, for example (bright piano music), and use active verbs to describe relevant sounds, for example (jet engine roaring) or (audience cheering). Keep these points in mind:

  • Use parentheses ( ) and lowercase unless a proper noun is used
  • Use a noun plus a descriptor or verb, with verbs in the present tense
  • For music, include adjectives describing the type of music
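The atmospheric conventions can be captured in two small formatters, one for sounds (noun plus action) and one for music (adjectives plus "music"). This is a hypothetical Python sketch: the function names are illustrative, only single-word proper nouns are recognized, and the caller is responsible for supplying present-tense action words such as roaring or cheering.

```python
def _format(phrase: str, proper_nouns: tuple[str, ...] = ()) -> str:
    """Lowercase each word except listed proper nouns, then wrap in ( )."""
    keep = {p.lower(): p for p in proper_nouns}
    words = [keep.get(w.lower(), w.lower()) for w in phrase.split()]
    return "(" + " ".join(words) + ")"


def sound_atmospheric(source: str, action: str,
                      proper_nouns: tuple[str, ...] = ()) -> str:
    """Noun plus present-tense action, e.g. (jet engine roaring)."""
    return _format(f"{source} {action}", proper_nouns)


def music_atmospheric(*adjectives: str,
                      proper_nouns: tuple[str, ...] = ()) -> str:
    """Adjectives describing the music, e.g. (bright piano music)."""
    return _format(" ".join(adjectives) + " music", proper_nouns)


print(sound_atmospheric("Audience", "cheering"))                   # (audience cheering)
print(music_atmospheric("bright", "piano"))                        # (bright piano music)
print(sound_atmospheric("Rex", "barking", proper_nouns=("Rex",)))  # (Rex barking)
```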