Audio Samples from "Noise-Robust Voice Conversion by Conditional Denoising Training (CDT) Using Latent Variables of Speech Quality and Recording Environment"

Voice Conversion (VC) samples

Source (noisy) | Target (clean) | Baseline | (uw, uw) | (uw, fw) | (fw, uw) | (fw, fw)
Sample 1 (jvs091-to-jvs100)
Sample 2 (jvs092-to-jvs099)
Sample 3 (jvs093-to-jvs092)
Sample 4 (jvs095-to-jvs100)
Sample 5 (jvs096-to-jvs100)
Sample 6 (jvs098-to-jvs096)
Sample 7 (jvs099-to-jvs093)
Sample 8 (jvs100-to-jvs096)