Transcribe Command

The transcribe command extracts audio from a video file or YouTube video and transcribes it using Whisper. This is useful for generating text content from videos or for accessibility purposes.

Usage

transfer-learning transcribe PATH [OPTIONS]

Arguments

Argument  Description
PATH      Path to video file or YouTube URL

Options

Option             Description                                              Default
--output-dir TEXT  Output directory for transcripts                         transcripts/
--model TEXT       Whisper model to use (tiny, base, small, medium, large)  base
--device TEXT      Device to use for transcription (cpu, cuda)              cpu
--language TEXT    Language code (auto for auto-detection)                  auto
--format TEXT      Output format (txt, srt, vtt, json)                      txt
--help             Show help message and exit                               -
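
When the command is driven from a script, it can help to assemble the options programmatically. The following is a minimal sketch, not part of the tool itself: the helper names build_transcribe_command and run_transcribe are hypothetical, only the flags listed above are used, and it assumes the CLI is on PATH and exits with a non-zero status on failure.

```python
import shlex
import subprocess
from typing import List, Optional


def build_transcribe_command(
    path: str,
    output_dir: Optional[str] = None,
    model: Optional[str] = None,
    device: Optional[str] = None,
    language: Optional[str] = None,
    fmt: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `transfer-learning transcribe`.

    Options left as None are omitted so the CLI falls back to its defaults.
    """
    cmd = ["transfer-learning", "transcribe", path]
    options = [
        ("--output-dir", output_dir),
        ("--model", model),
        ("--device", device),
        ("--language", language),
        ("--format", fmt),
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, value]
    return cmd


def run_transcribe(path: str, **options: Optional[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    cmd = build_transcribe_command(path, **options)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, run_transcribe("path/to/video.mp4", model="medium", fmt="srt") would invoke the command with --model medium --format srt. Passing the arguments as a list avoids shell quoting problems with YouTube URLs.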

Examples

Transcribe a local video file

transfer-learning transcribe path/to/video.mp4

Transcribe a YouTube video

transfer-learning transcribe "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

Transcribe with a specific model

transfer-learning transcribe path/to/video.mp4 --model medium

Transcribe with GPU acceleration

transfer-learning transcribe path/to/video.mp4 --device cuda

Transcribe to a specific format

transfer-learning transcribe path/to/video.mp4 --format srt

Output

The command generates a transcript file in the specified format in the output directory:
output_dir/
└── video_name_transcript.txt
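
To pick up the generated files from a script, one option is to search the output directory by name. This sketch assumes the _transcript suffix shown above is also used for the other output formats, which this page does not confirm; find_transcripts is a hypothetical helper, and the default directory is taken from the --output-dir default.

```python
from pathlib import Path
from typing import List


def find_transcripts(output_dir: str = "transcripts") -> List[Path]:
    """Return files in output_dir named like <name>_transcript.<ext>, sorted by path.

    Assumption: every output format uses the `_transcript` suffix.
    """
    return sorted(Path(output_dir).glob("*_transcript.*"))
```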

Whisper Models

The command supports the following Whisper models:
Model   Size (parameters)  Memory Required  Relative Speed
tiny    39M                ~1GB             ~32x
base    74M                ~1GB             ~16x
small   244M               ~2GB             ~6x
medium  769M               ~5GB             ~2x
large   1550M              ~10GB            1x
The larger models provide better accuracy but require more memory and processing time.
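
The trade-off in the table can be turned into a simple selection rule: choose the largest model whose approximate memory requirement fits the memory you have. The sketch below is one way to do that; pick_model is a hypothetical helper, and the figures are the approximate values from the table, so treat the result as a starting point rather than a guarantee.

```python
from typing import Optional

# (model name, approximate memory required in GB), smallest to largest,
# copied from the table above.
WHISPER_MODELS = [
    ("tiny", 1.0),
    ("base", 1.0),
    ("small", 2.0),
    ("medium", 5.0),
    ("large", 10.0),
]


def pick_model(available_gb: float) -> Optional[str]:
    """Return the largest model whose approximate requirement fits, or None."""
    best = None
    for name, required_gb in WHISPER_MODELS:
        if required_gb <= available_gb:
            best = name  # later entries are larger, so keep overwriting
    return best
```

The returned name can be passed directly to --model. With 6 GB available, for instance, the rule selects medium.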

Output Formats

The command supports the following output formats:
  • txt: Plain text transcript
  • srt: SubRip subtitle format
  • vtt: WebVTT subtitle format
  • json: JSON format with timestamps and confidence scores
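
To illustrate how the subtitle formats relate to timestamped segments, here is a minimal sketch that renders segments as SubRip cues. It is not the tool's own implementation; srt_timestamp and segments_to_srt are hypothetical helpers, and each segment is assumed to be a dictionary with start, end (both in seconds), and text keys, as in the JSON output of this command.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments as SubRip cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):  # SRT cues are numbered from 1
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```

WebVTT is similar in shape: the file begins with a WEBVTT header line, and timestamps use a period instead of a comma before the milliseconds.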

JSON Format Example

When using the json format, the output will look like this:
{
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 5.0,
      "text": "Hello, welcome to this video.",
      "confidence": 0.95
    },
    {
      "id": 1,
      "start": 5.0,
      "end": 10.0,
      "text": "Today we're going to talk about...",
      "confidence": 0.92
    }
  ],
  "language": "en",
  "duration": 120.5,
  "word_count": 150,
  "processing_time": 45.2
}
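
A file in this shape is straightforward to post-process. The sketch below assumes the structure shown in the example above; the helper names and the 0.9 default threshold are illustrative choices, not part of the tool.

```python
import json
from typing import Dict, List


def load_transcript(path: str) -> Dict:
    """Load a transcript written with --format json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def low_confidence_segments(transcript: Dict, threshold: float = 0.9) -> List[Dict]:
    """Return segments whose confidence is below the threshold, for manual review."""
    return [seg for seg in transcript["segments"] if seg["confidence"] < threshold]


def full_text(transcript: Dict) -> str:
    """Join all segment texts into a single string."""
    return " ".join(seg["text"].strip() for seg in transcript["segments"])
```

For the example above, low_confidence_segments(transcript, threshold=0.93) would return only the segment with id 1, since its confidence is 0.92.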