SRC4VC: Smartphone-Recorded Corpus for Voice Conversion

Download link:
SRC4VC_ver1.zip (version 1, released Feb. 29, 2024; 3.4 GB)

Summary:

The SRC4VC corpus consists of smartphone-recorded speech uttered by 100 native Japanese speakers.
It was built with the aim of realizing high-quality voice conversion (VC) from end users' degraded speech input, that is, speech recorded on the real devices that end users own.
The text was borrowed from existing corpora, and the speech was collected through crowdsourcing on Lancers.
In addition to the recorded speech (48000 Hz/16-bit wav), the corpus includes speech restored with an unofficial implementation of Miipher (22050 Hz/16-bit wav).
The corpus may be used free of charge for research purposes, but please refrain from redistributing it or using it in ways contrary to public order and morals.
If you use this corpus in a paper, please cite one of the following:

Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, and Hiroshi Saruwatari, "SRC4VC: Smartphone-recorded corpus for voice conversion benchmark," Proc. INTERSPEECH, pp. 1825-1829, Kos, Greece, Sep. 2024. (Paper)

Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, and Hiroshi Saruwatari, "SRC4VC dataset: A speech corpus recorded on real devices for benchmarking multi-speaker voice conversion models," IEICE Technical Report, 2024-02-SIP-SP-EA-SLP, Feb. 2024. (in Japanese)

Contents:

wav & txt: smartphone-recorded speech samples and text

RECITATION subset: reading style, text taken from the ITA corpus
JVNV subset: emotional speech, text taken from the JVNV corpus
CALLS subset: conversational style, text taken from the CALLS corpus
STUDIES subset: conversational style, text taken from the STUDIES corpus
SONGS subset: singing voice, one phrase each from the children's song "Katatsumuri" and Maou Damashii's "Shining Star"

wav-R: speech restored with Miipher (Miiphered speech)
emo (for the CALLS, JVNV, and STUDIES subsets): crowdsourced emotion labels
speaker-wise recording quality scores: crowdsourced recording quality scores (MOS) for each speaker

Sample 1 (SRC4VC001, MOS = 4.13):
Sample 2 (SRC4VC042, MOS = 3.66):
Sample 3 (SRC4VC099, MOS = 3.33):
Sample 4 (SRC4VC068, MOS = 3.01):
Sample 5 (SRC4VC064, MOS = 1.69):
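
The two audio formats listed above (48000 Hz/16-bit for the recorded speech, 22050 Hz/16-bit for the Miipher-restored speech) can be checked on any extracted file with Python's standard `wave` module. The following is a minimal sketch, not part of the corpus distribution: the helper names `describe_wav` and `guess_kind` are hypothetical, and the file path in the usage note is a placeholder, since the archive's internal directory layout is not described here.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_s) of a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        bits = wf.getsampwidth() * 8       # sample width is reported in bytes
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate  # frames divided by frames per second
    return rate, bits, channels, duration


def guess_kind(path):
    """Label a file by format, assuming the two formats stated on this page.

    48000 Hz/16-bit -> "recorded", 22050 Hz/16-bit -> "restored",
    anything else -> "unknown".
    """
    rate, bits, _, _ = describe_wav(path)
    if bits == 16 and rate == 48000:
        return "recorded"
    if bits == 16 and rate == 22050:
        return "restored"
    return "unknown"
```

For example, `guess_kind("path/to/some_utterance.wav")` (a placeholder path) would be expected to return `"recorded"` for a file from wav and `"restored"` for a file from wav-R, provided the files match the formats stated above.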

Demonstrations:

Update information:

Version 1 is available online (Feb. 29, 2024)

Main developers:

Yuki Saito (Graduate School of Information Science and Technology, The University of Tokyo, Japan)
Takuto Igarashi (Graduate School of Information Science and Technology, The University of Tokyo, Japan)
Kentaro Seki (Graduate School of Information Science and Technology, The University of Tokyo, Japan)
Shinnosuke Takamichi (Graduate School of Information Science and Technology, The University of Tokyo, Japan)
Ryuichi Yamamoto (LY Corp., Japan)
Kentaro Tachibana (LY Corp., Japan)
Hiroshi Saruwatari (Graduate School of Information Science and Technology, The University of Tokyo, Japan)

Acknowledgements:

This research was conducted as a joint research project between LY Corp. and the Saruwatari-Takamichi Lab. at The University of Tokyo, Japan.