Cedille is a project to bring large language models to non-English languages.
Anna is a 6B parameter autoregressive language model based on the GPT-J architecture and trained using the mesh-transformer-jax codebase.
Anna was trained on German text with a methodology similar to that of Boris, our French model. We started training from GPT-J, which was trained on The Pile. As a consequence, the model still performs well in English. Anna uses the unmodified GPT-2 tokenizer.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
    model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna")
By default, loading a model with Hugging Face Transformers holds two copies of the weights in memory, so a GPT-J model in float32 precision needs 48+ GB of RAM. A first trick is to pass the argument shown below, which loads only one copy of the weights.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
    # low_cpu_mem_usage=True avoids holding a second copy of the weights while loading
    model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna", low_cpu_mem_usage=True)
We plan to add an fp16 branch soon. Combined with the lower-memory loading above, the model could then be loaded with 12.1 GB of RAM.
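The RAM figures quoted above follow from simple arithmetic on the parameter count. The sketch below is only a back-of-the-envelope estimate: the parameter count of about 6.05 billion is an assumption (the card only says "6B"), `weights_gb` is a hypothetical helper, and real memory use will be somewhat higher because of activations and framework overhead.

```python
# Rough RAM estimate for holding the model weights in memory.
# ASSUMPTION: ~6.05e9 parameters for a GPT-J-sized model; the card only says "6B".
N_PARAMS = 6.05e9

# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_gb(n_params, dtype, copies=1):
    """Approximate weight memory in GB (1 GB = 1e9 bytes). Hypothetical helper."""
    return n_params * BYTES_PER_PARAM[dtype] * copies / 1e9


default_fp32 = weights_gb(N_PARAMS, "float32", copies=2)  # default loading
low_mem_fp32 = weights_gb(N_PARAMS, "float32", copies=1)  # single copy of the weights
low_mem_fp16 = weights_gb(N_PARAMS, "float16", copies=1)  # single copy, half precision

print(f"float32, two copies: {default_fp32:.1f} GB")
print(f"float32, one copy:   {low_mem_fp32:.1f} GB")
print(f"float16, one copy:   {low_mem_fp16:.1f} GB")
```

Under these assumptions, the estimate gives roughly 48 GB for default float32 loading, about half of that with a single copy, and about 12.1 GB for a single float16 copy, which is consistent with the numbers given above.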
    model.eval()

    input_sentence = "Wo hast du unsere Sprache gelernt?"
    input_ids = tokenizer.encode(input_sentence, return_tensors="pt")

    # Sample one continuation with top-k / top-p sampling
    sample_outputs = model.generate(
        input_ids,
        max_length=100,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        num_return_sequences=1,
    )

    # generate() returns a batch of sequences; decode the first (and only) one
    print(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))
For any custom development, please contact us at email@example.com.