License

#11

by mrfakename - opened Feb 10

Feb 10

Hi @steveheh,
Congrats on the launch! Thanks for releasing this. Might it be possible to switch the license to a more permissive one?
Thanks!

nithinraok

NVIDIA org Feb 11

This is the best we could do for this model; however, we are working on new models with a more permissive license.

nithinraok changed discussion status to closed Feb 11

mrfakename

Feb 11

Hi,
Thanks for the response! Is it the training data that restricts usage?
Thanks!

halbefn

Apr 19

As far as I can tell, all of the training data is freely available. Most of it was used to train earlier CC-BY NeMo models.
Has anyone tried to replicate the training yet?

@nithinraok is there a timeline for the new models?

halbefn

15 days ago

It turns out that some training data is not freely available.

"The Canary-1B model is trained on a total of 85k hrs of speech data. It consists of 31k hrs of public data, 20k hrs collected by Suno, and 34k hrs of in-house data."

This commit even changes the hour counts for the public data: https://huggingface.co/nvidia/canary-1b/commit/e2ec44628860649c9fee47ea2f591c4ebb542c02

So it is actually not possible to replicate the training, as far as I can tell.
