Welcome to the Captions Style Guide!
This guide explains Rev’s expectations for captions quality. In addition to this guide, which covers the fundamental elements of a quality captions project, we have a robust Help Center for additional guidance.
Captions are used as part of accessibility services. As such, we need to ensure that a deaf or hard-of-hearing viewer receives a similar experience to that of a hearing viewer.
If you are ordering Premium Service Captions, you may be interested in our Premium Service Captions Supplemental Style Guide Reference.
Browser Compatibility
Rev recommends that you use the most up-to-date version of Google Chrome when working in the Dash editor.
Capturing Content
Spoken Content
RULE: Caption all spoken English words, even if there are pre-existing captions or subtitles on the video. Lightly edit when necessary for readability. Use US spelling.
WHY: To provide a deaf or hard-of-hearing viewer the same experience of watching a video as anyone else.
Rules of thumb for caption accuracy:
- Maintain the integrity of the spoken words.
- Do not paraphrase, rearrange, or change the speaker's words.
- Caption contractions, formal and informal, as spoken.
- Most projects will have an auto-generated speech recognition draft. Do not overly rely on the draft.
- Lightly edit unscripted productions, but do not omit intentionally spoken words.
- You are expected to research proper nouns and terminology for representation and proper spelling.
- Watching for terms on screen can be helpful.
- Googling with a bit of context from your video/audio is also helpful.
- URLs, hashtags, and social media tags should be captioned using common conventions: www.rev.com / #revcaptions / @rev
- Include proper punctuation and capitalization per common English grammar rules.
- Never type out a censored, silenced, obscured, or beeped/bleeped out word.
- Use an appropriate atmospheric for the sound heard when the word is censored, e.g. (beep)
Speaker Identification
RULE: Indicate speaker changes by using a dash and a space at the beginning of each speaker’s dialogue. This includes the first speaker.
RULE: When the speaker cannot be obviously identified, include a bracketed identifier after the dash and space, also called a speaker ID. Always use appropriate language for speaker IDs.
WHY: So that a deaf or hard-of-hearing viewer will know someone different has started speaking.
When the speaker CAN be visually identified:
Use a dash and space, or speaker label, at the beginning of the speaker’s dialogue.
WHY? So a deaf or hard-of-hearing viewer will know someone different has started speaking.
When the speaker CANNOT be visually identified:
Use a dash, space, and bracketed identifier, or speaker ID, at the beginning of the speaker’s dialogue.
WHY? So a deaf or hard-of-hearing viewer will know who is speaking.
EXCEPTION: For audio-only projects, only a dash and space is needed for speaker changes.
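The speaker-change rules above can be sketched as a small decision function. This is only an illustration, not part of the Dash editor: the function name, its parameters, the use of square brackets for the speaker ID, and the single space after the ID are all assumptions made for the example.

```python
def format_speaker_change(text, speaker_id=None, visible=True, audio_only=False):
    """Prefix the first caption of a new speaker (hypothetical helper).

    Assumes square brackets for the speaker ID; the guide only says "bracketed".
    """
    # Audio-only projects and visually identifiable speakers: dash and space only.
    if audio_only or visible:
        return f"- {text}"
    # Speaker cannot be visually identified: a speaker ID is required.
    if not speaker_id:
        raise ValueError("an off-screen speaker needs a speaker ID")
    return f"- [{speaker_id}] {text}"


print(format_speaker_change("Welcome back."))
print(format_speaker_change("Meanwhile, across town.", speaker_id="Narrator", visible=False))
```

The audio-only exception is checked first, so a speaker ID passed for an audio-only project is simply dropped.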
Atmospherics
RULE: Captions need to indicate sounds heard on screen. We call these identifiers atmospherics.
WHY: Atmospherics provide visual indicators of non-verbal sounds to the viewer. This allows the deaf or hard-of-hearing audience to pick up on sounds that are important to the content of the video.
How to create atmospherics:
Do | Don’t |
Lyrics and Music
RULE: Caption music and lyrics when there is no spoken dialogue occurring at the same time, even if there are pre-existing captions or subtitles on the video.
WHY: When there are no spoken words, the lyrics become the dialogue to be captioned. When lyrics are heard, it is important that they be captioned on-screen for the deaf or hard-of-hearing audience to experience.
Lyrics
- When there is no spoken dialogue occurring at the same time, lyrics should be captioned, even if there are pre-existing captions or subtitles on the video.
- Tip: Googling portions of the lyrics can be helpful.
- Omitting clearly heard lyrics will result in a reduction in score and can be scored as low as 1/1/1.
How to notate lyrics:
Include a musical note at the beginning of the caption group. In Dash, use ## followed by a space to create the musical note. Dash will add a second music note at the end of the caption group after the project is submitted.
Music Atmospherics
- When a project also contains spoken words, only include a background music atmospheric if there’s a significant time gap and it would benefit the viewer to include.
- A common format is a descriptor followed by the word “music.” You can indicate the progression of music with words like begins and continues. E.g. (orchestral music begins)
- Introductory music is a common use case. E.g. (bells chiming)
Atmospheric-Only Projects
RULE: For atmospheric-only projects, accurately caption all sounds that are heard using atmospherics.
WHY: Even when there is no spoken audio in the project, atmospherics tell the story for the deaf or hard-of-hearing viewer.
What is an atmospherics-only project?
- Contains no dialogue or lyrics but does contain sounds
- Contains no dialogue or lyrics but does contain music
- Contains only cartoon gibberish
- Contains no sound
- If the entire file is blank or silent, the project can be submitted with a single (silence) atmospheric
- Please note that we strongly recommend listening to the full length of the project to ensure that there is no dialogue, sound, or singing later in the file. Omitted dialogue, atmospherics, or singing can result in a score as low as 1/1/1
What is NOT an atmospherics-only project?
- Project with dialogue where none of the dialogue is in English
- Project with singing where none of the singing is in English
Projects that are 100% in a non-English language are considered unworkable. These projects should be unclaimed and reported as having no English content.
How to caption an atmospherics-only project:
- Include atmospherics more frequently than you would in a normal project.
- Caption all environmental sounds, action sounds, character noises or gibberish using an atmospheric.
- Use detailed atmospherics to capture music.
- Does the instrument convey a tone?
- Does the volume or tempo increase or drop off?
Atmospherics are inherently subjective, and we understand that this isn't always a well-defined area. Please do your best to caption these projects in a way that creates valuable content for hard-of-hearing viewers.
Foreign Language
RULE: For foreign language (non-English) in a project, follow the guidelines below.
WHY: Some customers place orders with foreign languages spoken within the content. We want to be sure we are delivering a product the customer expects, so we’ve put together guidelines on how to handle foreign language. Noting when non-English dialogue is being spoken lets a deaf or hard-of-hearing viewer know that a conversation is taking place.
Difficult or Challenging Content
RULE: You should do your best to caption all spoken words and/or lyrics. For extremely challenging content, follow the guidelines below.
WHY: When dealing with challenging content, consider the best interest of the customer and the final product that will be seen by the deaf or hard-of-hearing viewer.
If your project is entirely too challenging to accurately caption the spoken words:
- Unclaim the project and select “difficult audio” as the reason for unclaiming.
If your project contains both challenging content and clear content:
- Accurately caption the spoken words that can be clearly heard.
- For certain challenging sections of audio where it’s not crucial to understand every word, use an atmospheric to provide context to the viewer.
- e.g. (friends conversing quietly), (group chattering)
- If an occasional word cannot be understood, use (indistinct) in place of the word.
- (indistinct) should only be used if you absolutely cannot determine the word; excess or inappropriate tags could result in a lower grade.
Glossary Terms and Resource Files
RULE: Customers may provide additional information in the form of glossary terms. Customers may choose to upload a resource file with glossary information.
WHY: This additional information is provided to help create accurate captions.
What is a glossary?
The glossary is a section where customers can provide clarification on the spelling of words and terms that may be presented in the video. These can include but are not limited to niche terminology, company names, and speaker names.
Important things to note about glossary terms and resource files:
- Always check projects for included glossary terms. Misspelling a term found in the glossary will result in a lower score.
- If a customer provides instructions in a glossary or resource file that go against the guidelines found in this guide, they should be ignored. Caption the project as usual.
- These types of instructions are not approved by Rev.
- Following instructions that go against our guidelines can result in scores as low as 1/1/1.
Remember
Glossaries and resource files should only clarify spelling. Any additional instructions are not approved and should not be followed.
Scripts
RULE: Scripts should be used as resources to help create accurate captions. Regular Rev guidelines still apply to projects submitted with scripts.
WHY: When a customer provides a script, their expectation is that we will use their document to create the captions for their video.
General guidelines when following scripts:
- Follow scripts as closely as possible to ensure that all dialogue is captioned correctly. This will ensure that names and important terms are spelled correctly.
- For some projects, it may be possible to use the scripts exactly as provided by copying and pasting the content into Dash.
- If copying and pasting isn’t an option, the script should be used as a resource to ensure accuracy.
- Some scripts may not be complete and we may need to add in captions for spoken dialogue and/or atmospherics that have not been included.
- Scripts can provide valuable information about characters and scenes. This may be included as notes, separate from the dialogue.
Remember
A script should only be provided to assist with captioning dialogue, names, terms, and sounds. Any additional instructions that do not follow Rev guidelines are not approved and should not be followed.
Formatting Captions
Caption Length
RULE: Individual caption groups must always contain 60 or fewer characters.
WHY: Captions that fall within certain character limits are easier to read quickly. This ensures that viewers do not miss spoken content.
The typing area turns green and then yellow as you add characters and approach the 60-character limit.
It’s perfectly acceptable to submit captions with a white, green, or yellow caption group.
The typing box turns red when you are over the 60-character limit. Split the text across multiple caption groups until the red color disappears.
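As a rough illustration of the limit, the check below flags any caption group longer than 60 characters. It is a sketch, not how Dash counts: it assumes every character counts toward the limit, including the leading dash and spaces, and the function name `over_limit` is made up for this example.

```python
MAX_CHARS = 60  # limit stated in the rule above


def over_limit(groups, limit=MAX_CHARS):
    """Return (index, length) for every caption group longer than the limit."""
    return [(i, len(g)) for i, g in enumerate(groups) if len(g) > limit]


groups = [
    "- Welcome back to the show.",
    "- Thanks for having me, it's great to be here with all of you today.",
]
for index, length in over_limit(groups):
    print(f"caption group {index} has {length} characters; split it")
```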
Caption Grouping
RULE: Split captions into individual caption groups so that whole phrases, nouns, sentences, and the flow of dialogue are interrupted as little as possible, while abiding by the 60-character limit.
WHY: This allows the captions to be easily read by the viewer, with logical breaks and enough time on screen.
Split captions...
- after terminal punctuation
- before conjunctions
- before prepositions
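The split preferences above can be approximated in code. The sketch below is one possible greedy approach, not Rev's method: it fills a group up to 60 characters, then backs up to the latest break after terminal punctuation, else before a conjunction, else before a preposition. The word lists are small illustrative samples, and abbreviations such as "Mr." would wrongly be treated as sentence endings, so good grouping still needs human judgment.

```python
MAX_CHARS = 60
TERMINAL = (".", "?", "!")
# Small illustrative word lists, not an exhaustive or official set.
CONJUNCTIONS = {"and", "but", "or", "so", "because"}
PREPOSITIONS = {"in", "on", "at", "to", "for", "with", "from", "of", "by"}


def _best_cut(words, fit):
    """Pick a cut index in 1..fit, preferring the breaks listed in the guide."""
    candidates = range(fit, 0, -1)  # try the latest possible break first
    for cut in candidates:
        if words[cut - 1].endswith(TERMINAL):  # break after terminal punctuation
            return cut
    for wordset in (CONJUNCTIONS, PREPOSITIONS):  # then before these words
        for cut in candidates:
            if words[cut].lower().strip(",.") in wordset:
                return cut
    return fit  # no preferred break found: fill the group


def split_caption(text, limit=MAX_CHARS):
    """Split text into caption groups of at most `limit` characters."""
    words = text.split()
    groups = []
    while words:
        fit, length = 0, 0
        for w in words:  # count how many words fit within the limit
            extra = len(w) + (1 if fit else 0)
            if length + extra > limit:
                break
            length += extra
            fit += 1
        fit = max(fit, 1)  # a single over-long word gets its own group
        if fit == len(words):
            groups.append(" ".join(words))
            break
        cut = _best_cut(words, fit)
        groups.append(" ".join(words[:cut]))
        words = words[cut:]
    return groups


text = ("We reviewed the numbers yesterday. Then we met with the team "
        "and agreed on a plan for next quarter.")
for group in split_caption(text):
    print(group)
```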
Syncing Captions
Caption Timing (Alignment)
RULE: Sync each caption group (atmospherics and speech) so it appears on-screen when the audio begins.
WHY: The deaf or hard-of-hearing viewer should see the text on screen at the same time it would have been heard by anyone else.
The start time needs to align with the beginning of the sound.
- Aim for precision, but it’s ok for the start time to be up to ½ second early or late from the true beginning of the sound.
Do not worry about the end time.
- Rev automatically calculates the end time for a caption group after you submit the project (post-processing).
- NEVER add extra spaces to a caption group OR double up the captions in an attempt to adjust the amount of time the caption group is on-screen. This causes errors in the file format for customers.
Keep in mind the readability of the caption groups.
- Split the caption groups if a speaker is talking very slowly (more than 5 seconds to say a sentence) or there is a long pause. This maintains proper timing with the speech and ensures the caption group does not end too early.
- Use advanced caption format if multiple speakers are talking very quickly. This combines two quick phrases into the same caption group for readability.
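The half-second tolerance on start times can be expressed as a simple check. This is an illustrative sketch only: it assumes you already have, for each caption group, the caption start time and the true start of the sound in seconds, and the function name `misaligned` is invented for the example.

```python
TOLERANCE_S = 0.5  # the guide allows up to half a second early or late


def misaligned(pairs, tolerance=TOLERANCE_S):
    """Return indexes of caption groups whose start is outside the tolerance.

    pairs: list of (caption_start, audio_start) tuples, both in seconds.
    """
    return [
        i for i, (caption_start, audio_start) in enumerate(pairs)
        if abs(caption_start - audio_start) > tolerance
    ]


timings = [(0.0, 0.25), (5.0, 5.5), (20.0, 21.0)]
print(misaligned(timings))
```

Only start times are compared, since end times are calculated automatically after submission.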
Grading Information
Grading Scale
A grade consists of scores in three categories: Accuracy, Formatting, and Alignment. You can view the rubric below to see how graders will assess your work.
5 - Excellent | Customer ready – Captions contain very few errors and are accurate and high quality. |
4 - Fair | Customer ready – Captions contain occasional errors but are generally accurate and acceptable quality. |
3 - Needs Improvement | Not customer ready – Captions contain frequent errors, and multiple edits would be needed before this is considered customer-ready. |
2 - Poor | Not customer ready – Captions contain very frequent errors, and significant edits would be needed before this is considered customer-ready. |
1 - Unusable | Not customer ready – Captions appear incomplete, partially unedited, or of such poor quality to be unusable.* |
Keep in mind that both the severity and frequency of errors are taken into consideration when assigning a score.
* If you submit incomplete or unedited projects, your pay for the project will be removed, the project will be graded 1/1/1, and your account may be closed without warning.
Accuracy Rubric
Does the text accurately reflect the spoken audio?
Score | Quality Description |
5 - Excellent | Spoken audio is accurately represented throughout. While there may be minor mishears, they do not impact readability or change the meaning of what was said. AND Punctuation is correctly used. Captions are readable. |
4 - Fair | Spoken audio is accurately represented in the majority of the project. There may be some errors that impact readability or change the meaning of what was said, but they are infrequent. AND/OR Punctuation errors that impact readability may be present but are infrequent. |
3 - Needs Improvement | Spoken audio is sometimes misrepresented. Multiple errors are present that moderately impact readability or change the meaning of what was said, though the overall capture shows intentional effort. AND/OR Punctuation errors that impact readability may be present and are noticeable. |
2 - Poor | Spoken audio is frequently misrepresented. Multiple errors are present that significantly impact readability or change the meaning of what was said, though some intentional effort was made. AND/OR Punctuation errors make readability difficult. |
1 - Unusable | Project appears to be incomplete, unedited, or capture is so poor that the final deliverable is unusable. |
Formatting Rubric
Are the captions styled according to Rev's guidelines?
Score | Quality Description |
5 - Excellent | Captions are well-formatted, with no egregious style guide violations, though a few errors that don't impact readability may be present. AND Speaker changes have been correctly indicated. |
4 - Fair | Captions are well-formatted, with no egregious style guide violations, though a few errors that somewhat impact readability may be present. AND/OR Speaker changes have been correctly indicated, though there may be an occasional error. |
3 - Needs Improvement | Captions are sometimes poorly formatted. Egregious style guide violations are present that moderately impact readability, though the overall capture shows intentional effort. AND/OR Speaker changes are mostly correctly indicated, though there may be more frequent errors. |
2 - Poor | Captions are poorly formatted. Egregious style guide violations that significantly impact readability are present throughout, though some intentional effort was made. AND/OR Speaker changes are not correctly indicated, with several errors. |
1 - Unusable | Project appears to be incomplete, unedited, or no attempt was made to follow formatting guidelines. |
Alignment Rubric
Do the caption groups align with the audio?
Score | Quality Description |
5 - Excellent | Caption groups are synced within 0.5 seconds of speech throughout the project, with no significant timing errors. |
4 - Fair | The majority of caption groups are synced within 0.5 seconds of speech, but there are occasional timing errors. Overall readability is not impacted. |
3 - Needs Improvement | Several caption groups are timed incorrectly, somewhat impacting the readability of the captions. |
2 - Poor | The majority of caption groups are timed incorrectly, significantly impacting the readability of the captions. |
1 - Unusable | Project appears to be incomplete or unedited, or no reasonable attempt was made to sync the project. |