RVC Model Cloud Training

Prologue: Creating Your Exclusive AI Voice

This guide shows how to train high-quality RVC voice conversion models on Google Colab's free GPU runtime. To get consistent results, we split the process into three stages:

  1. Data Preparation: Use this application and UVR5 to produce clean, isolated vocals.
  2. Cloud Training: Use Google Colab's powerful GPUs for training.
  3. Model Import: Place the trained model into this application for use.

Stage 1: Data Preparation (The Most Important Step)

The quality of a model depends far more on the dataset than on any training setting. Make sure your training audio is a clean, dry voice with no background music (BGM) and no reverberation (reverb/echo).

1. Initial Separation (Using this application)

Use this application's "Vocal Separation" feature:
  1. Input your audio/video source.
  2. Check Save Clean Vocals As (WAV).
  3. Run the separation and collect the resulting vocals.wav.

2. Removing Reverb and Noise (Using UVR5, Crucial)

Vocals separated by this application usually still contain room reverberation. Training on them gives poor results: the converted voice sounds muddy. Always run the vocals through UVR5 (Ultimate Vocal Remover) as a second pass.

UVR5 Recommended Settings (De-Reverb):
  • Process Method: VR Architecture
  • Window Size: 320 (Default) or 512
  • Aggression Setting: 10
  • Model: Select 5_HP-Karaoke-UVR.pth or VR - DeEcho-DeReverb.pth
  • Result: after processing, you should have clean, dry, close-miked vocals, which is exactly what training needs.

3. Packaging the Dataset (Detailed Instructions)

File Preparation Principles:
  • Quantity: Please prepare multiple short audio files (10~15 seconds each recommended), totaling about 10~30 minutes.
  • Format: WAV (PCM_16, i.e. 16-bit PCM), sample rate 44100 Hz or 48000 Hz, mono preferred.
  • Naming: Please use English or numbers for filenames (e.g., 001.wav, 002.wav), avoiding special symbols.
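
The principles above can be checked automatically before packaging. The following is a minimal sketch using only Python's standard wave module; the function names check_wav and check_dataset are made up for this example and are not part of this application. Note that the wave module only reads PCM WAV files, so a file in another encoding raises wave.Error instead of being reported.

```python
import wave
from pathlib import Path

ALLOWED_RATES = {44100, 48000}   # sample rates named in this guide
CLIP_SECONDS = (10.0, 15.0)      # recommended length of each file
TOTAL_MINUTES = (10.0, 30.0)     # recommended total length


def check_wav(path):
    """Return (duration_seconds, problems) for one PCM WAV file."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append(f"{wav.getsampwidth() * 8}-bit samples, expected 16-bit")
        if rate not in ALLOWED_RATES:
            problems.append(f"sample rate {rate} Hz, expected 44100 or 48000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, mono preferred")
    if not CLIP_SECONDS[0] <= duration <= CLIP_SECONDS[1]:
        problems.append(f"{duration:.1f} s long, 10 to 15 s recommended")
    return duration, problems


def check_dataset(folder):
    """Check every .wav in folder; return (total_minutes, {filename: problems})."""
    report = {}
    total_seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        duration, problems = check_wav(path)
        total_seconds += duration
        if problems:
            report[path.name] = problems
    return total_seconds / 60.0, report


if __name__ == "__main__":
    minutes, report = check_dataset("dataset")
    for name, problems in report.items():
        print(f"{name}: " + "; ".join(problems))
    if not TOTAL_MINUTES[0] <= minutes <= TOTAL_MINUTES[1]:
        print(f"Total length is {minutes:.1f} min, 10 to 30 min recommended")
```

Files that appear in the report are not necessarily unusable; the checks only mirror the recommendations listed above.
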

Packaging Steps:
  1. Create a folder and name it dataset.
  2. Place all prepared .wav files into this folder.
  3. Right-click the folder and compress it into a ZIP archive named dataset.zip (for example, "Add to archive" in 7-Zip or WinRAR, with ZIP selected as the archive format).

Structure Example:
dataset.zip
└── dataset/
    ├── 001.wav
    ├── 002.wav
    └── ...
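
If you prefer to build the archive with a script, the layout above can be reproduced with Python's standard zipfile module. This is only a sketch: pack_dataset and list_archive are hypothetical helper names, and the default paths assume the dataset folder sits in the current working directory.

```python
import zipfile
from pathlib import Path


def pack_dataset(folder="dataset", archive="dataset.zip"):
    """Zip every .wav in folder, keeping the folder name as the top-level entry."""
    folder = Path(folder)
    wav_files = sorted(folder.glob("*.wav"))
    if not wav_files:
        raise FileNotFoundError(f"no .wav files found in {folder}")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for wav in wav_files:
            # arcname places each file under dataset/ inside the archive
            zf.write(wav, arcname=f"{folder.name}/{wav.name}")
    return Path(archive)


def list_archive(archive="dataset.zip"):
    """Return the entry names stored in the archive, for a quick visual check."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()
```

Calling list_archive afterwards lets you confirm that every entry starts with dataset/ before uploading.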

Stage 2: Google Colab Cloud Training

We recommend the Applio Colab notebook (Applio is an RVC fork that we currently consider the most capable).

1. Prerequisite Setup

After entering the Colab page:
  1. In the top menu, choose "Runtime" -> "Change runtime type".
  2. Under "Hardware accelerator", select T4 GPU.
  3. Click "Connect" in the top right corner.
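
To confirm that the runtime really has a GPU attached, you can run a quick check in a Colab code cell. The sketch below calls the nvidia-smi command-line tool, which we assume is present on GPU runtimes; gpu_name is a hypothetical helper name, not part of the Applio notebook.

```python
import shutil
import subprocess


def gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if no GPU is visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # tool not installed, so most likely a CPU-only runtime
    result = subprocess.run(
        [exe, "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    names = result.stdout.strip().splitlines()
    return names[0] if names else None


print(gpu_name() or "No GPU detected: check Runtime -> Change runtime type")
```

If the printed name does not mention T4, revisit the runtime settings before starting a long training run.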

2. Starting Execution and Entering WebUI

Run the notebook cells in order, from top to bottom, by clicking the play button (▶) to the left of each one.

3. Training Parameter Recommendations

In the WebUI's Train tab:
  • Experiment Name: Choose an English name (e.g., my_voice).
  • Sample Rate: 40k or 48k.
  • Process Data: Upload your dataset.zip.
  • Extract Features: set F0 Method to rmvpe (generally gives the best results).
  • Train Model:
    • Total Epochs: 100 ~ 300 (use more epochs if you have less data).
    • Batch Size: a T4 GPU can handle 8 ~ 12.

Click Train Model to begin training!
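
To get a feel for how epochs and batch size interact, the arithmetic below estimates how many optimizer steps a run performs. It is a rough illustration only: it assumes one training sample per audio clip and that a final partial batch still counts as a step, neither of which is confirmed for the actual preprocessing and training code. optimizer_steps is a hypothetical helper name.

```python
import math


def optimizer_steps(num_samples, batch_size, epochs):
    """Estimate total steps: batches per epoch (rounded up) times epochs."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


# Illustrative numbers: 20 minutes of audio cut into 12-second clips -> 100 clips
clips = (20 * 60) // 12

print(optimizer_steps(clips, batch_size=8, epochs=200))   # 13 steps/epoch -> 2600
print(optimizer_steps(clips, batch_size=12, epochs=200))  # 9 steps/epoch  -> 1800
```

A smaller dataset yields fewer steps per epoch, which is one way to read the advice to raise the epoch count when you have less data.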

Stage 3: Import into This Application

Once training is complete, download the .pth model file and .index index file from Colab.

  1. Open the models/RVC/ directory inside this application's folder.
  2. Create a new folder (e.g., My_AI_Voice).
  3. Place the two downloaded files into this folder.
  4. Open this application -> Real-time VC.
  5. Load your new model in the model field, and start voice conversion!
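
Steps 1 to 3 can also be scripted. The sketch below copies the two downloaded files into a new model folder; install_model and all of its arguments are hypothetical names for illustration, and app_dir must point at wherever this application is installed on your machine.

```python
import shutil
from pathlib import Path


def install_model(pth_file, index_file, app_dir, model_name):
    """Copy a .pth/.index pair into <app_dir>/models/RVC/<model_name>/."""
    sources = [Path(pth_file), Path(index_file)]
    for src in sources:
        if not src.is_file():
            raise FileNotFoundError(src)
    target = Path(app_dir) / "models" / "RVC" / model_name
    target.mkdir(parents=True, exist_ok=True)  # create the model folder if missing
    for src in sources:
        shutil.copy2(src, target / src.name)   # keep the original file names
    return target
```

For example, install_model("my_voice.pth", "my_voice.index", "path/to/app", "My_AI_Voice") would create models/RVC/My_AI_Voice/ under the given application folder; the file and folder names here are placeholders.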