How to Make WebVTT Subtitle Files

Updated: September 12, 2025 · 6 min read

Two production-ready workflows for WebVTT captions in QuickLRC.

WebVTT (Web Video Text Tracks) is the HTML5-native caption format for browsers, streaming platforms, and OTT apps. QuickLRC pairs a hands-on VTT Maker with an AI-driven VTT Generator so you can cover precise editorial work and fast automation in a single workflow. This guide highlights both approaches while keeping your exports compliant with the WebVTT specification.

Understand the WebVTT Essentials

A WebVTT file starts with a WEBVTT header, followed by cue blocks that look familiar to SRT users but use a period instead of a comma before the milliseconds. Keep these structural rules in mind while editing:

  • Header: Begin the document with WEBVTT on the first line.
  • Cue identifier: Optional line for numbering or labeling each cue.
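  • Timestamp range: Use HH:MM:SS.mmm --> HH:MM:SS.mmm with periods as decimal separators. The specification also accepts MM:SS.mmm when the time is under one hour.
  • Caption text: One or more lines, supporting basic formatting like italics (<i>), bold (<b>), and voice tags (<v Speaker>) for speaker labels.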
  • Blank line: Leave an empty line between cue blocks.

A minimal file with three cues looks like this:

WEBVTT

1
00:00:01.200 --> 00:00:04.500
Welcome to our video tutorial

2
00:00:05.000 --> 00:00:08.300
WebVTT files keep subtitles
perfectly in sync

3
00:00:09.100 --> 00:00:12.800
Stay consistent with cue labels
and include a blank line
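If you ever need to produce the same structure from a script, the rules above can be sketched in a few lines of Python. This is an illustrative sketch, not QuickLRC's exporter: the function names and the (start_seconds, end_seconds, text) tuple shape are assumptions made for the example.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm, the long form of a WebVTT timestamp."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    # A period (not a comma) separates seconds from milliseconds.
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(cues) -> str:
    """Build a WebVTT document.

    cues: iterable of (start_seconds, end_seconds, text) tuples
    (a hypothetical shape chosen for this sketch).
    """
    blocks = ["WEBVTT"]  # header on the first line
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        # Optional numeric identifier, timing line, then the caption text.
        blocks.append(f"{index}\n{timing}\n{text}")
    # Blank lines separate the header and each cue block.
    return "\n\n".join(blocks) + "\n"
```

Calling build_vtt([(1.2, 4.5, "Welcome to our video tutorial")]) yields the header, a blank line, and one numbered cue in the same layout as the sample above.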

Manual Workflow with VTT Maker

Reach for the VTT Maker whenever you need detailed control over timing, speaker labeling, and formatting cues. The editor mirrors the WebVTT rule set while letting you preview captions against the media timeline.

1. Prepare Your Assets

  • Gather the source audio or video and a clean transcript or script.
  • Note speaker changes and sound effects you plan to caption.
  • Organize sentences or phrases the way you expect them to appear on-screen.

2. Upload and Segment in VTT Maker

Open /manual-sync, load your media, and paste the transcript. Break long paragraphs into cues that stay under two lines to protect readability on small screens.
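If you prefer to pre-segment a transcript before pasting it in, a rough first pass can be scripted. The sketch below is one possible approach using Python's standard textwrap module; the 42-character default and the function name are assumptions for illustration, and it breaks purely on width, so you would still adjust breaks to natural pauses by hand.

```python
import textwrap


def segment_transcript(text: str, max_chars: int = 42, max_lines: int = 2):
    """Split a transcript into cue texts of at most max_lines lines each.

    max_chars is an assumed per-line limit; pick the value your platform expects.
    """
    # Wrap the whole transcript into lines no longer than max_chars.
    lines = textwrap.wrap(text, width=max_chars)
    # Group consecutive lines so every cue holds at most max_lines lines.
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]
```

Each returned string is the text for one cue; timing still has to be set against the media.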

3. Mark Timings and Styling

Use the playback controls to set start and end times, then add optional styling tags like <i> for emphasis or <v Speaker> for voice labels. Leave a gap of at least one frame between cues (roughly 33–42 ms at 24–30 fps, since WebVTT times are expressed in milliseconds) to avoid overlap warnings.

WebVTT requires a period before the milliseconds. QuickLRC enforces this so your cues stay standards-compliant.
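For files you adjust outside the editor, the gap rule can be applied mechanically. The following is a minimal sketch under stated assumptions: cues are (start_ms, end_ms, text) tuples sorted by start time, and the 40 ms default corresponds to one frame at 25 fps. Neither the function name nor the default comes from QuickLRC.

```python
def enforce_gap(cues, min_gap_ms: int = 40):
    """Trim cue end times so each cue ends min_gap_ms before the next starts.

    cues: list of (start_ms, end_ms, text) tuples sorted by start time
    (a hypothetical shape for this sketch). 40 ms is one frame at 25 fps.
    """
    adjusted = []
    for i, (start, end, text) in enumerate(cues):
        if i + 1 < len(cues):
            next_start = cues[i + 1][0]
            # Pull the end time back only if it intrudes on the gap.
            end = min(end, next_start - min_gap_ms)
        if end <= start:
            # Trimming would leave no duration; this needs a manual fix.
            raise ValueError(f"cue {i + 1} is too short to trim")
        adjusted.append((start, end, text))
    return adjusted
```

Only end times move, so the moment each caption appears stays where you marked it.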

4. Export and Review

Download the `.vtt` file and preview it in your browser or HTML5 player. Run it through the WebVTT Validator to confirm cue order, spacing, and timing.

AI Workflow with VTT Generator

Let the VTT Generator handle speech recognition and cue timing when you're working on longer programs, rapid turnarounds, or multi-language deliverables.

  1. Upload media: Visit /auto-sync, choose your file, and select the correct language model for the spoken audio.
  2. AI transcription: QuickLRC transcribes speech, assigns timestamps, and structures each cue following WebVTT syntax automatically.
  3. Proofread the text: Correct names, numbers, or specialized terminology. The editor preserves cue timing while you edit.
  4. Fine-tune cues: Split lengthy captions, adjust cue padding, or add speaker labels for clarity.
  5. Export the VTT file: Download, preview, and validate before publishing or handing off to your platform.

Manual vs. AI: Pick the Right Workflow

VTT Maker (Manual)

  • Best for short or premium content where every line and styling choice matters.
  • Supports custom cue identifiers, comments, and styling tags.
  • Requires real-time playback and detailed editorial attention.

VTT Generator (AI)

  • Ideal for long-form episodes, webinars, or batches of localized tracks.
  • Handles timing and line length automatically, so you only need to apply final edits.
  • Delivers multiple language tracks quickly from the same source media.

Quality Checks Before Publishing

Keep these checkpoints on your finishing checklist to ensure platform compatibility:

  • Header integrity: Confirm the file begins with `WEBVTT` followed by a blank line.
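  • Timestamp precision: Verify period separators, two-digit minutes and seconds, and three-digit milliseconds for every cue. Hours are optional under one hour, but use at least two digits when they are present.
  • Readability: Cap each line at 32–42 characters, depending on your platform's limit, and break cues on natural pauses.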
  • Validation: Drop the export into the WebVTT Validator to catch overlapping cues, malformed tags, or trailing whitespace.
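The checklist above can also be approximated with a short script when you want a quick local pass before using a full validator. This sketch covers only the checks listed here (header, blank line, timestamp format, overlaps, trailing whitespace); the function name and message wording are assumptions, and it does not replace the WebVTT Validator.

```python
import re

# One timestamp: optional hours (2+ digits), then MM:SS.mmm.
STAMP = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
# Timing line: start --> end, optionally followed by cue settings.
TIMING = re.compile(rf"^{STAMP}[ \t]+-->[ \t]+{STAMP}(?:[ \t]+.*)?$")


def to_ms(hours, minutes, seconds, millis) -> int:
    """Convert the captured timestamp fields to milliseconds."""
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def check_vtt(text: str):
    """Return a list of problem descriptions; an empty list means no findings."""
    problems = []
    lines = text.splitlines() or [""]
    if not lines[0].startswith("WEBVTT"):
        problems.append("line 1: missing WEBVTT header")
    if len(lines) > 1 and lines[1].strip():
        problems.append("line 2: expected a blank line after the header")
    previous_end = None
    for number, line in enumerate(lines, start=1):
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
        if "-->" not in line:
            continue  # not a timing line
        match = TIMING.match(line.rstrip())
        if not match:
            problems.append(f"line {number}: malformed timestamp")
            continue
        fields = match.groups()
        start, end = to_ms(*fields[:4]), to_ms(*fields[4:])
        if end <= start:
            problems.append(f"line {number}: end time is not after start time")
        if previous_end is not None and start < previous_end:
            problems.append(f"line {number}: overlaps the previous cue")
        previous_end = end
    return problems
```

A file with comma separators, such as one converted carelessly from SRT, is reported as a malformed timestamp on the offending line.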
