Weighted generative adversarial network for many-to-many voice conversion

Main Authors: Paul, Dipjyoti; Pantazis, Yannis; Stylianou, Yannis
Format: Proceeding eJournal
Language: eng
Published: Deutsche Gesellschaft für Akustik (DEGA e. V.) & RWTH Publications, 2019
Online Access: https://zenodo.org/record/3528013
Table of Contents:
  • The goal of voice conversion (VC) is to convert speech from a source speaker to that of a target speaker without changing the phonetic content. VC usually relies on parallel data for training, which limits its practical applications. Existing approaches are also limited in handling multiple speakers, since a separate model must be built independently for every speaker pair. To tackle this, a variant of the Generative Adversarial Network (StarGAN-VC) was introduced that allows many-to-many mapping instead of learning all the pairwise transformations. Moreover, StarGAN-VC can handle non-parallel data, i.e., speakers do not need to utter the same sentences. In this paper, we suggest an algorithmic variation of StarGAN training in which suitable weights are introduced. The weights, which modify the Generator's gradient, aim to put more emphasis on fake samples that fool the Discriminator. The suggested algorithm results in a stronger Generator. We refer to this variation as weighted-StarGAN (weStarGAN). In weStarGAN, training converges faster. More importantly, the proposed algorithm achieves a significant improvement over the baseline StarGAN-VC in subjective evaluations of both speech quality and speaker similarity.
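
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the weight formula, so the rule used here (each weight proportional to the Discriminator's output on the fake sample) is an assumption chosen only for illustration, and the function names `sample_weights` and `generator_loss` are hypothetical. The sketch shows how per-sample weights shift the Generator loss, and hence its gradient, toward fake samples that the Discriminator already rates as real.

```python
import math


def sample_weights(d_fake):
    """Return normalized per-sample weights for a batch of fake samples.

    d_fake: Discriminator outputs D(G(z_i)) in (0, 1), one per fake sample.
    ASSUMPTION (not from the source): w_i is proportional to D(G(z_i)),
    so samples that fool the Discriminator more get a larger weight.
    """
    total = sum(d_fake)
    return [d / total for d in d_fake]


def generator_loss(d_fake, weights=None):
    """Non-saturating Generator loss: -sum_i w_i * log D(G(z_i)).

    With weights=None every sample gets weight 1/n (the unweighted baseline).
    The weights are treated as constants, so the gradient contribution of
    sample i with respect to D(G(z_i)) is -w_i / D(G(z_i)).
    """
    n = len(d_fake)
    if weights is None:
        weights = [1.0 / n] * n
    return -sum(w * math.log(d) for w, d in zip(weights, d_fake))


# Illustrative batch: the third fake sample fools the Discriminator the most.
d_fake = [0.1, 0.5, 0.9]
w = sample_weights(d_fake)
baseline = generator_loss(d_fake)
weighted = generator_loss(d_fake, w)
```

In a real training loop the weights would be computed from the Discriminator outputs at each step and excluded from backpropagation; the exact weighting used by weStarGAN should be taken from the paper itself.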