Text Generation · Transformers · Safetensors · French · English · llama · legal · code · text-generation-inference · art · conversational · Inference Endpoints
manu committed
Commit f05665c
1 Parent(s): df360ce

Update README.md

Files changed (1)
  1. README.md +18 -36
README.md CHANGED
@@ -68,61 +68,43 @@ Our work can be cited as:
 
 This model is a Chat model, that is, it is finetuned for Chat function and works best with the provided template.
 
- #### With pipeline
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
-
-
- model_name = "croissantllm/CroissantLLMChat-v0.1"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
-
- messages = [
-     {"role": "user", "content": "Qui est le président francais ?"},
- ]
-
- pipe = pipeline(
-     "text-generation",
-     model=model,
-     tokenizer=tokenizer,
- )
-
- generation_args = {
-     "max_new_tokens": 50,
-     "return_full_text": False,
-     "temperature": 0.2,
-     "do_sample": True,
- }
-
- output = pipe(messages, **generation_args)
- print(output[0]['generated_text'])
- ```
 
 #### With generate
 
 This might require a stopping criteria on <|im_end|> token.
 
 ```python
-
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 
 model_name = "croissantllm/CroissantLLMChat-v0.1"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+
+
+ generation_args = {
+     "max_new_tokens": 256,
+     "do_sample": True,
+     "temperature": 0.3,
+     "top_p": 0.90,
+     "top_k": 40,
+     "repetition_penalty": 1.05,
+     "eos_token_id": [tokenizer.eos_token_id, 32000],
+ }
 
 chat = [
-     {"role": "user", "content": "Que puis-je faire à Marseille en hiver?"},
+     {"role": "user", "content": "Qui est le président francais actuel ?"},
 ]
 
 chat_input = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
 
- inputs = tokenizer(chat_input, return_tensors="pt", add_special_tokens=True).to(model.device)
- tokens = model.generate(**inputs, max_new_tokens=150, do_sample=True, top_p=0.95, top_k=60, temperature=0.3)
+ inputs = tokenizer(chat_input, return_tensors="pt").to(model.device)
+ tokens = model.generate(**inputs, **generation_args)
+
 print(tokenizer.decode(tokens[0]))
+ # print tokens individually
+ print([(tokenizer.decode([tok]), tok) for tok in tokens[0].tolist()])
 ```
 
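The updated snippet stops generation by listing the <|im_end|> id (32000) in `eos_token_id`. As a hedged sketch of the alternative the card alludes to ("a stopping criteria on <|im_end|> token"), the example below wires an explicit `StoppingCriteria` into `generate`; the `StopOnTokens` class and the sampling settings are illustrative assumptions, not part of this commit.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

model_name = "croissantllm/CroissantLLMChat-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)


class StopOnTokens(StoppingCriteria):
    """Stop as soon as the last generated token is one of the given ids."""

    def __init__(self, stop_token_ids):
        self.stop_token_ids = set(stop_token_ids)

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return input_ids[0, -1].item() in self.stop_token_ids


# <|im_end|> is assumed to resolve to id 32000, matching the eos_token_id list in the commit above;
# here it is looked up from the tokenizer instead of being hard-coded.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
stopping = StoppingCriteriaList([StopOnTokens([tokenizer.eos_token_id, im_end_id])])

chat = [{"role": "user", "content": "Qui est le président français actuel ?"}]
chat_input = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat_input, return_tensors="pt").to(model.device)

tokens = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3,
    stopping_criteria=stopping,
)
print(tokenizer.decode(tokens[0]))
```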