The Subtitle Generation Module uses OpenAI's open-source Whisper series (via the faster-whisper implementation) as its recognition core. It automatically transcribes most common video and audio formats (such as MP4, MKV, MP3, M4A, WAV, FLAC, AAC, and OGG) into time-stamped subtitles (SRT/VTT).
Beyond basic subtitle generation, the module also integrates Auto-Translation, a Bilingual Subtitles layout, and one-click Auto Burn-in, so you can produce finished, subtitled videos without opening other editing software.
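To make the "time-stamped subtitles (SRT)" output concrete, here is a minimal sketch of how transcribed segments can be written in SRT form. The `Cue` class and the `srt_timestamp` and `to_srt` helpers are hypothetical names chosen for illustration; they are not the module's actual code.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle entry (hypothetical helper type for this sketch)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SRT text: index, time range, text, blank line between entries."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_range = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{time_range}\n{cue.text.strip()}\n")
    return "\n".join(blocks)
```

VTT output differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds.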
Four model tiers are available. Choose one based on your PC's performance and your accuracy needs:
Specify the language spoken in the video. While setting it to Auto is usually accurate, if
the video has very sparse dialogue or loud background music, manually specifying the language (e.g.,
English) can significantly improve accuracy.
If checked, the AI first detects which parts of the audio contain human speech
(voice activity detection) before transcribing.
Purpose: Prevents the AI from hallucinating subtitles (like meaningless symbols or
repeated words) during purely musical or silent segments.
Recommendation: Enable it manually when the audio has long musical or silent
stretches. It is off by default in the latest version so that soft-spoken
or whispered dialogue is not accidentally dropped.
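The language and voice-detection settings described above could map onto a transcription call roughly as follows. This is a sketch that assumes the recognition core is the `faster-whisper` Python package. `build_transcribe_options` and `transcribe` are hypothetical helper names, and the 500 ms silence threshold is an illustrative value rather than the application's setting.

```python
def build_transcribe_options(language: str = "auto", use_vad: bool = False) -> dict:
    """Map the two UI settings onto keyword arguments for WhisperModel.transcribe()."""
    options = {
        # language=None asks Whisper to detect the language itself ("Auto").
        "language": None if language.lower() == "auto" else language,
        # vad_filter=True skips audio in which no speech is detected.
        "vad_filter": use_vad,
    }
    if use_vad:
        # Illustrative threshold (assumption): minimum silence length to cut at.
        options["vad_parameters"] = {"min_silence_duration_ms": 500}
    return options


def transcribe(path: str, model_size: str = "medium", **ui_settings):
    """Transcribe a media file and return (segments, detected_language)."""
    # Imported lazily so the option helper works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    segments, info = model.transcribe(path, **build_transcribe_options(**ui_settings))
    return [(s.start, s.end, s.text) for s in segments], info.language
```

For example, `transcribe("talk.mp4", language="en", use_vad=True)` would force English and filter out non-speech audio, which matches the recommendation for videos with sparse dialogue or loud music.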
Uses Pyannote AI for speaker diarization (separating the audio by speaker). It
automatically detects how many people are speaking in the video and prepends
speaker tags (e.g., [SPEAKER_00]:, [SPEAKER_01]:) to each subtitle.
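As an illustration of how speaker tags can be attached, the sketch below gives each subtitle the speaker whose turn overlaps it the longest. It assumes diarization has already produced a list of `(start, end, speaker)` turns. The function names and data shapes are hypothetical and do not represent Pyannote's API or the module's actual logic.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time ranges (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def tag_speakers(cues, turns):
    """Prefix each cue with the speaker whose turn overlaps it the most.

    cues:  list of (start, end, text)
    turns: list of (start, end, speaker), e.g. (0.0, 2.5, "SPEAKER_00")
    """
    tagged = []
    for start, end, text in cues:
        # Pick the turn with the largest overlap; None if there are no turns.
        best = max(turns, key=lambda t: overlap(start, end, t[0], t[1]), default=None)
        if best is not None and overlap(start, end, best[0], best[1]) > 0:
            text = f"[{best[2]}]: {text}"
        tagged.append((start, end, text))
    return tagged
```

Cues that overlap no speaker turn are left untagged in this sketch.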
hf_). This is a free service. If you encounter authorization errors, please follow these steps
to verify:
Our subtitle generation core uses a Dual-Layer Overlay Dynamic Calculation Engine. It is designed to work with all video resolutions, and it lets the "Primary Subtitle" and the "Secondary Subtitle (Original Text)" have completely independent visual designs that do not interfere with each other.
On the right side of the interface, we provide an intuitive Tabview:
Within each tab, you can independently configure the following parameters. The settings sync in real time to the Preview above:
PrimaryStyle / SecondaryStyle calculation tech, even if you overlap two
lines with different colored boxes, the layout will never break!
Cause: Insufficient PC memory to load the large model.
Solution: Switch the model down to medium or
large-v2 (Good). The difference in results is usually minor, and the
program will run smoothly.
Solution: This issue has been fixed in the latest version. The system now forces extraction of the original audio track and transcodes it to AAC format, ensuring audio is fully preserved.
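For readers curious what a burn-in that keeps the audio might look like, here is a sketch that builds an ffmpeg command which renders subtitles onto the video and encodes the audio as AAC. It is an assumption about one reasonable approach, not the application's actual command. `build_burn_in_command` is a hypothetical helper, and `libx264` is an assumed video encoder.

```python
def build_burn_in_command(video: str, subtitles: str, output: str) -> list[str]:
    """Build an ffmpeg argument list that burns subtitles in and keeps the audio."""
    return [
        "ffmpeg",
        "-y",                             # overwrite the output file if it exists
        "-i", video,                      # input video (with its original audio)
        "-vf", f"subtitles={subtitles}",  # render the subtitle file onto the frames
        "-c:v", "libx264",                # re-encode video (required when filtering)
        "-c:a", "aac",                    # transcode the audio track to AAC
        output,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`. Subtitle paths containing characters such as colons, quotes, or backslashes need escaping for the `subtitles` filter, which this sketch does not handle.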
A: This is an incredible tool for learning foreign languages! When checked, the system simultaneously displays both the "Translated/Target Language" and the "Original", formatting them automatically (Translated on top, Original on bottom).
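The stacked layout described in this answer can be sketched as a small helper that places the translated line above the original within one subtitle entry. `bilingual_text` is a hypothetical name used only for illustration; the real module styles the two layers independently.

```python
def bilingual_text(translated: str, original: str) -> str:
    """Stack the translated line on top of the original line for one subtitle."""
    # Skip empty parts so a missing translation does not leave a blank line.
    lines = [part.strip() for part in (translated, original) if part and part.strip()]
    return "\n".join(lines)
```

If the translation is missing, the helper falls back to showing only the original text.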