---
license: mit
datasets:
- cerebras/SlimPajama-627B
- oscar-corpus/OSCAR-2301
- bigcode/starcoderdata
language:
- fr
- en
pipeline_tag: text-generation
tags:
- legal
- art
- code
- finance
- medical
- text-generation-inference
---

# CroissantLLM: A not so flaky bilingual 1.3B model

An experimental model trained on a small subset of the final data.

### Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "croissantllm/base_50k"

# Load the checkpoint in half precision; device_map="auto" requires the `accelerate` package
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Few-shot English -> French translation prompt
inputs = tokenizer("His name is Bob. -> Il s'appelle Bob.\nHe is heading to the market. -> Il va au marché.\nWe are heading to the beach, let's go together. ->", return_tensors="pt").to(model.device)
tokens = model.generate(**inputs, max_length=100, do_sample=True, top_p=0.95, top_k=60, temperature=0.5)
print(tokenizer.decode(tokens[0]))

# Tokenize without special tokens so that no BOS token is prepended
inputs = tokenizer("France -> Paris, Italie -> Rome, Allemagne -> Berlin, Espagne ->", return_tensors="pt", add_special_tokens=False).to(model.device)
tokens = model.generate(**inputs, max_length=250, do_sample=True, top_p=0.95, top_k=60)
print(tokenizer.decode(tokens[0]))
```
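
Both prompts in the usage snippet follow the same few-shot pattern: completed `source -> target` pairs followed by an open-ended query ending in `->`. As a minimal sketch, such prompts could be assembled with a small helper. The name `build_few_shot_prompt` and its parameters are hypothetical and not part of the model or of `transformers`.

```python
def build_few_shot_prompt(examples, query, pair_sep=" -> ", example_sep="\n"):
    """Join (source, target) pairs, then append the query left open for completion.

    Hypothetical helper for illustration only.
    """
    shots = [f"{source}{pair_sep}{target}" for source, target in examples]
    # Strip the trailing space so the prompt ends with "->", as in the prompts above
    shots.append(f"{query}{pair_sep}".rstrip())
    return example_sep.join(shots)


# Rebuilds the translation prompt from the usage snippet
prompt = build_few_shot_prompt(
    [
        ("His name is Bob.", "Il s'appelle Bob."),
        ("He is heading to the market.", "Il va au marché."),
    ],
    "We are heading to the beach, let's go together.",
)
print(prompt)
```

The resulting string can be passed to `tokenizer(...)` in place of the hard-coded prompt. Passing `example_sep=", "` produces the comma-separated style of the second prompt.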