CTRMAXXING ∕∕ SIGNAL DROP · MAY ’26
SCRIPTS · May 26, 2026 · 7 min read

How long should a YouTube script be by video format

Words-per-minute math for spoken narration, target script lengths for every faceless format, and how script length ties to retention and re-hook cadence.

Script length is not arbitrary. It follows from three things: the video length you are targeting, the words-per-minute pace of your narration, and the retention structure the script has to support. Most operators either underwrite (thin scripts that force the editor to pad with filler B-roll) or overwrite (dense scripts that push the video past the point where average view duration, or AVD, drops off).

Below is the math we use across the channels we operate, broken down by format.

The baseline: spoken words per minute

The standard range for clear, narrated YouTube voiceover is 130-150 words per minute (WPM). Faster than 150 WPM and the audience has to work too hard to follow. Slower than 130 WPM and the pacing feels labored.

Most AI voice tools, including the ElevenLabs Asher preset that we use for most of our channels, default to around 135-140 WPM with natural sentence pauses accounted for. If you are recording your own voiceover, you are probably reading faster than you think and landing closer to 145-155 WPM before any post-processing.

For planning purposes, use 140 WPM as your working number. This gives you a simple formula:

Target video length in minutes × 140 = target script word count

A 10-minute video targets 1,400 words of narrated script. That is not the total document length (you will have production notes, chapter markers, and B-roll directions in the working draft), but it is the narrated word count that the final voiceover should hit.
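The formula above can be sketched in a few lines of Python. The function name `target_word_count` and its default pace are illustrative assumptions, not part of any tool mentioned in this article.

```python
def target_word_count(video_minutes: float, wpm: int = 140) -> int:
    """Narrated word count for a target video length.

    Uses the 140 WPM planning pace unless a different pace is passed in.
    """
    if video_minutes <= 0 or wpm <= 0:
        raise ValueError("video_minutes and wpm must be positive")
    return round(video_minutes * wpm)


if __name__ == "__main__":
    # Print the narrated word target for a few common video lengths.
    for minutes in (1, 5, 10, 15):
        print(f"{minutes:>2} min -> {target_word_count(minutes)} words")
```

If your measured narration pace differs from 140 WPM, pass it as `wpm` rather than changing the targets by hand.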

Shorts: 100-150 words

A 60-second short at 140 WPM is 140 words of narration. That is a hook, two or three supporting sentences, and a close. There is no room for a slow build, no room for context-setting, and no room for a re-hook. The entire script is a single-unit hook.

The mistake we see on faceless shorts is writing 200-word scripts and then speeding up the narration to fit. The result sounds rushed and the viewer cannot follow the argument. If a short requires more than 150 words to make the point, the point is not right for a short. Either narrow the topic or save it for long-form.
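A quick calculation shows why the speed-up fails. The sketch below assumes the 140 WPM planning pace; the helper names are hypothetical. A 200-word script runs about 86 seconds at 140 WPM, and squeezing it into 60 seconds requires 200 WPM, well above the 150 WPM upper bound for clear narration.

```python
def narration_seconds(word_count: int, wpm: float = 140) -> float:
    """Seconds of narration a script takes at a given pace."""
    return word_count * 60 / wpm


def required_wpm(word_count: int, target_seconds: float = 60) -> float:
    """Pace needed to fit a script into a fixed runtime."""
    return word_count * 60 / target_seconds


if __name__ == "__main__":
    words = 200
    print(f"{words} words at 140 WPM: {narration_seconds(words):.1f} s")
    print(f"Pace needed to fit 60 s: {required_wpm(words):.0f} WPM")
```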

One structural note: because there is no space for a re-hook in a short, the 90-second re-hook cadence does not apply. The entire short has to work as a single-tension arc from the first sentence.

Explainer format (4-6 minutes): 560-840 words

A 5-minute explainer at 140 WPM is 700 words. This is the most common faceless format for channels in business, science, history, and curiosity niches. It is also the format where script length errors are most costly.

At 560-700 words, the script has room for:

  • A cold open (5-25 seconds, up to roughly 60 words at 140 WPM)
  • A pivot sentence at the 30-second mark
  • Two to three substantive sections with a re-hook landing around the 90-second mark
  • A close

If you write to 840 words for a 6-minute target, the narration alone fills the runtime, so you have material to trim if the edit runs long. If you write 560 words for a 5-minute target, you are a full minute of narration short (560 words is 4 minutes at 140 WPM), and you will feel the padding pressure in post.

For explainers, the first-30-seconds structure is especially important because the format does not give you long enough to recover from a weak cold open. The 5-minute mark is where AVD data typically shows a sharp decision point. Viewers who are still watching at 4:30 will usually finish the video. Viewers who dropped before 2 minutes rarely make it back.

Narrative deep-dives (8-12 minutes): 1,120-1,680 words

A 10-minute narrative at 140 WPM targets 1,400 words. This format is used for business post-mortems, historical investigations, and industry-focused stories where the topic has enough depth to sustain a longer arc.

The structural demands at this length are higher. A 1,400-word script needs at least two re-hooks to hold AVD through the video. The first re-hook lands around the 90-second mark (roughly 210 words in). The second lands around the 4-minute mark (roughly 560 words in). Without both, the retention graph shows a predictable step-down at each of those points.
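The word offsets quoted above are the time marks converted at 140 WPM. A minimal sketch of that conversion follows; the function name `words_at` is a hypothetical helper for illustration.

```python
def words_at(seconds: float, wpm: float = 140) -> int:
    """Approximate narrated word offset reached at a given time mark."""
    return round(seconds * wpm / 60)


if __name__ == "__main__":
    # Re-hook time marks for a 10-minute narrative, in seconds.
    rehook_marks = {"first re-hook": 90, "second re-hook": 240}
    for label, seconds in rehook_marks.items():
        print(f"{label}: {seconds} s -> about word {words_at(seconds)}")
```

Marking these offsets in the working draft makes it easy to see whether a re-hook has drifted after a rewrite.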

At this length, chapter structure becomes important for both retention and SEO. Five or six chapters with specific-noun titles keep the viewer oriented and help the video rank for sub-topic queries. Read the chapter timestamps guide for the formatting rules that get YouTube to recognize them.

One practical note: operators who write at the low end of this range (1,120 words for an 8-minute target) often find the final edit runs 9-10 minutes after B-roll transitions and natural pacing. Build the script to the low end of the word count and let post-production fill the time, rather than writing long and cutting for time in the edit.

Long-form deep-dives (13-17 minutes): 1,820-2,380 words

A 15-minute narrative at 140 WPM is 2,100 words. This is the format used for corporate-collapse stories, complex multi-act historical narratives, and investigative formats where the payoff requires building a case over time.

At this length the script is doing a significant amount of structural work. The minimum re-hook count is three: at 90 seconds, at 5 minutes, and at 9-10 minutes. Each re-hook should use fresh language and re-state the stakes from a slightly different angle. Repeating the same forward-tease phrasing across multiple re-hooks trains viewers to tune them out.

Chapter structure is essential at this length. A 15-minute video without chapters has a higher drop-off rate because viewers who lose the thread at the 6-minute mark have no on-ramp back into the narrative. With 6-8 well-titled chapters, those viewers can skip to the next beat and remain engaged. The chapter timestamps guide covers the specific formatting rules.

At 2,100+ words, the script is also long enough that AI-tell density becomes a real risk. Models generating a 2,100-word script in a single pass will cluster certain phrases and patterns that are not visible at the paragraph level but are obvious at the full-script level. Running a deterministic AI-tell check before finishing the script is worth doing. See the AI tells guide for the specific patterns to catch.
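As one illustration of a deterministic full-script check, the sketch below counts word sequences that recur across the whole script. It is a generic repetition counter, not the check from the AI tells guide; the function name, the 3-word window, and the threshold of three occurrences are all assumptions to tune.

```python
import re
from collections import Counter


def repeated_phrases(script: str, n: int = 3, min_count: int = 3) -> dict[str, int]:
    """Return n-word phrases that occur at least min_count times in the script."""
    # Lowercase and keep only word-like tokens so punctuation does not split matches.
    words = re.findall(r"[a-z0-9']+", script.lower())
    grams = Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return {gram: count for gram, count in grams.items() if count >= min_count}


if __name__ == "__main__":
    # Placeholder text; replace with the full narrated script.
    sample = "But here is the thing. " * 3 + "The story ends."
    for phrase, count in repeated_phrases(sample).items():
        print(f"{count}x  {phrase}")
```

Because the count runs over the full script rather than one paragraph at a time, it surfaces clustering that is invisible at the paragraph level.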

Script length and re-hook placement together

The re-hook cadence is not a separate decision from script length. It is a structural feature of the script that should be planned at the same time as the word count target.

A practical way to think about it: place a re-hook at every 200-250 words of narration for long-form content, starting at the 200-word mark. For a 2,100-word script, that means re-hooks at roughly 200, 425, 650, 875, and so on at the same spacing, eight to ten in total. Not every one of those has to be a full re-hook structure. The timed re-hooks in the table below are the full ones; the rest can be a single forward-look sentence. But the rhythm of re-commitment every 200-250 words keeps the AVD curve flat across the second half of the video.
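Taking the stated cadence literally (a mark every 200-250 words, starting at word 200), the positions can be generated rather than eyeballed. This is a sketch only: the function name is hypothetical and the 225-word default spacing is an assumed midpoint of the range.

```python
def rehook_positions(total_words: int, first: int = 200, spacing: int = 225) -> list[int]:
    """Word offsets for re-commitment beats at a fixed spacing."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    positions = []
    mark = first
    # Stop before the end of the script; the close handles the final beat.
    while mark < total_words:
        positions.append(mark)
        mark += spacing
    return positions


if __name__ == "__main__":
    marks = rehook_positions(2100)
    print(f"{len(marks)} beats at words: {marks}")
```

With the defaults, a 2,100-word script gets nine marks from word 200 through word 2,000. Which of them become full re-hooks and which stay single forward-look sentences remains an editorial call.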

The table below summarizes the target ranges:

| Format | Video length | Target word count | Re-hooks needed |
|---|---|---|---|
| Short | ~60 seconds | 100-150 words | None |
| Explainer | 4-6 minutes | 560-840 words | 1 (at ~90 seconds) |
| Narrative | 8-12 minutes | 1,120-1,680 words | 2 (90 seconds, 4 minutes) |
| Deep-dive | 13-17 minutes | 1,820-2,380 words | 3+ (90 seconds, 5 minutes, 9-10 minutes) |

The overwriting trap

The instinct when writing for a longer video is to write more. The error this produces is scripts that are padded with context the viewer did not ask for, context-setting that arrives too late, and transitions that restate what was just said rather than promising what comes next.

A 2,100-word deep-dive script that is tight is harder to write than a 2,800-word script that rambles. The discipline of hitting the word count target without padding is what forces the cuts that improve retention. If you are consistently writing 20-30% over the target word count, the script is doing repetitive work that the viewer will fast-forward through. That fast-forwarding registers as a drop in the AVD even if the viewer never actually leaves.
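The overage is easy to measure per draft. The sketch below assumes the 20% figure from the paragraph above as the warning threshold; the function names are hypothetical.

```python
def overage_ratio(actual_words: int, target_words: int) -> float:
    """Fraction by which a draft exceeds its target (negative if under)."""
    if target_words <= 0:
        raise ValueError("target_words must be positive")
    return (actual_words - target_words) / target_words


def is_overwritten(actual_words: int, target_words: int, threshold: float = 0.20) -> bool:
    """True when the draft runs at or past the overage threshold."""
    return overage_ratio(actual_words, target_words) >= threshold


if __name__ == "__main__":
    target = 2100
    for draft in (2150, 2600, 2800):
        pct = overage_ratio(draft, target) * 100
        flag = "cut" if is_overwritten(draft, target) else "ok"
        print(f"{draft} words: {pct:+.1f}% vs target ({flag})")
```

The 2,800-word draft in the example above is about 33% over a 2,100-word target, past the 20-30% band where repetition is the likely cause.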

Word count targets are a floor and a ceiling, not just a ceiling. Clearing the floor means the narration fills its runtime with every section earning its time. Staying under the ceiling means the pacing never drags.