Abstract: Current voice cloning models are only trained by approximating the generated speech to match the reference speech, without directly supervising the speaker timbre information during training ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results