RVC (Retrieval-based Voice Conversion) is a powerful AI voice conversion technology. Unlike text-to-speech "Voice Cloning" (text in, speech out), RVC takes an existing audio clip as input: the AI keeps the original speaking or singing intonation, emotion, and rhythm, but replaces the timbre with that of the target model you specify.
This technology is most commonly used to create "AI Covers", such as having a famous singer sing someone else's song, or to hide one's real voice for live streaming and video dubbing.
This is your original sound source to be converted:
Load the voice of the person you want to "become" here:
RVC's greatest strength is its highly flexible parameters. Proper adjustment can fix voice cracking and dropped audio, or make cross-gender covers sound seamless.
| Parameter | Description |
|---|---|
| Pitch Shift (Pitch) | Function: Shifts the fundamental frequency (pitch) baseline of the input sound, measured in semitones. Usage: To convert a male voice with a female model, set Pitch to +12 (up one octave). Conversely, for female to male, set it to -12. Keep it at 0 for same-gender conversion. If you need to change the song's key itself to fit the model's vocal range, you can also fine-tune with values like `+1` or `-2`. |
| F0 Prediction Algorithm | Function: The method the AI uses to track the pitch curve of your original vocal delivery. Options: |
| Index Rate | Function: Controls how strongly the voice characteristics lean towards the model (range 0 to 1). Requires a .index file to take effect. Usage: Higher values make articulation and tone more like the model itself, but if the original audio quality is poor, an excessively high index rate causes odd artifacts such as buzzing. Lower values keep more traces of your own original voice. Recommendation: Default 0.75. If odd noises appear, lower it to 0.3 ~ 0.5. |
| Filter Radius | Function: When the pitch (F0) trajectory fluctuates wildly (e.g., prediction errors caused by breathiness or voice cracking), this value sets the median filter used to smooth it. Effective only when set to 3 or greater. Usage: If the converted voice suddenly goes off-key or produces unnatural popping sounds in certain sections, increase this value to soften the abrupt changes. Recommendation: Default 3. Increase if you encounter breathiness or popping. |
| RMS Mix Rate | Function: Determines how closely the volume changes of the output follow the volume of the original input. Usage: The default of 0.25 means only a quarter of the output's volume fluctuation follows your input, and three-quarters is determined by the model. If you check this (or raise the value closer to 1), wherever the AI shouts or whispers, the volume changes will more closely match the levels of your original recording. Recommendation: Check it if you want obvious emotional dynamics (a large difference between loud and soft); uncheck it (set to 0) if you want the volume of every sentence to be very even. |
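
The arithmetic behind Pitch Shift and Filter Radius can be sketched in a few lines of Python. This is only an illustration, not RVC's actual code: the function names `semitones_to_ratio` and `median_smooth` are hypothetical, and the sketch assumes the filter value is used as an odd window length, which may differ from how a given RVC build interprets it.

```python
import statistics


def semitones_to_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift; 12 semitones is one octave."""
    return 2.0 ** (semitones / 12.0)


def median_smooth(f0: list[float], window: int = 3) -> list[float]:
    """Median-filter a pitch track (Hz per frame) with an odd window.

    The window shrinks at both ends of the track instead of padding.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(f0)):
        lo = max(0, i - half)
        hi = min(len(f0), i + half + 1)
        smoothed.append(statistics.median(f0[lo:hi]))
    return smoothed


# A 220 Hz note shifted by +12 semitones doubles in frequency.
shifted = 220.0 * semitones_to_ratio(12)

# One frame jumps an octave (a typical tracking error); the filter removes it.
track = [220.0, 221.0, 440.0, 222.0, 223.0]
smoothed = median_smooth(track, window=3)
print(shifted, smoothed)
```

A larger window suppresses longer glitches but also flattens genuine fast pitch movement such as vibrato, which is why the table suggests raising the value only when popping or breathiness appears.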
Solution:
Make sure that the filenames of your .pth and .index files, and the names of the folders
they are in, contain absolutely no spaces, Chinese characters, Japanese
characters, or special symbols!
The Python code underlying RVC is very strict about path parsing. Rename folders and models
using only English letters and numbers.
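
A quick way to spot offending names before loading a model is a small check like the one below. It is a minimal sketch, not part of RVC: `unsafe_parts` is a hypothetical helper, and allowing `_`, `-`, and `.` alongside letters and digits is an assumption (the dot is needed for the file extension).

```python
import re

# Assumed safe set: ASCII letters, digits, underscore, hyphen, dot.
SAFE_PART = re.compile(r"[A-Za-z0-9_.-]+")
# Windows drive prefixes such as "C:" are skipped rather than flagged.
DRIVE = re.compile(r"[A-Za-z]:")


def unsafe_parts(path: str) -> list[str]:
    """Return the folder and file names in `path` that contain risky characters."""
    parts = [p for p in re.split(r"[\\/]+", path) if p]
    return [
        p for p in parts
        if not DRIVE.fullmatch(p) and not SAFE_PART.fullmatch(p)
    ]


print(unsafe_parts("models/my voice/歌手.pth"))        # ['my voice', '歌手.pth']
print(unsafe_parts("models/singer01/singer01.index"))  # []
```

Any name the function returns should be renamed before the model is loaded.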
Solution:
This indicates that the underlying Demucs vocal separation module failed.
The likely cause is insufficient graphics card memory (VRAM) to run these two resource-heavy AI
processes at the same time. It's recommended to first go to the "Vocal Separation" menu and manually extract
a pure vocal file, then bring that file here for voice conversion, and finally use video editing
software to layer it with the accompaniment yourself.
Solution:
This happens because RVC's pitch tracking locked onto the wrong vocal range. The most effective solution is to change the
F0 Prediction Algorithm!
If you originally used rmvpe, change it to fcpe or crepe and run
the conversion again; this usually brings a large improvement. Also check whether your
Pitch shift is set in the wrong direction (for example, -12 where +12 was intended).