---
language: fr
pipeline_tag: text-generation
inference: false
tags:
- LLM
- finetuned
- llama
- llama-2
---
<p>
<img src="https://huggingface.co/bofenghuang/vigogne-2-7b-chat/resolve/v2.0/logo_v2.jpg" alt="Vigogne" style="width: 30%; min-width: 300px; display: block; margin: auto;">
</p>

# Vigogne-2-7B-Chat-V2.0: A Llama-2-based French Chat LLM

Vigogne-2-7B-Chat-V2.0 is a French chat LLM, based on [LLaMA-2-7B](https://ai.meta.com/llama), optimized to generate helpful and coherent responses in conversations with users.

Check out our [release blog](https://github.com/bofenghuang/vigogne/blob/main/blogs/2023-08-17-vigogne-chat-v2_0.md) and [GitHub repository](https://github.com/bofenghuang/vigogne) for more information.

**Usage and License Notices**: Vigogne-2-7B-Chat-V2.0 follows Llama-2's [usage policy](https://ai.meta.com/llama/use-policy). A significant portion of the training data is distilled from GPT-3.5-Turbo and GPT-4, so please use it cautiously to avoid any violation of OpenAI's [terms of use](https://openai.com/policies/terms-of-use).

## Changelog

All previous versions are accessible through branches; you can load any of them by pinning the corresponding branch with the `revision` argument, as sketched after the list.

- **V1.0**: Trained on 420K chat samples.
- **V2.0**: Trained on 520K samples. Check out our [release blog](https://github.com/bofenghuang/vigogne/blob/main/blogs/2023-08-17-vigogne-chat-v2_0.md) for more details.
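
For illustration, a minimal sketch of loading an earlier release by branch. The `v2.0` branch name appears in the usage example below; `v1.0` is assumed to follow the same naming convention.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pin the branch of the release you want via the `revision` argument
# ("v1.0" here is assumed to match the changelog entry above).
tokenizer = AutoTokenizer.from_pretrained("bofenghuang/vigogne-2-7b-chat", revision="v1.0")
model = AutoModelForCausalLM.from_pretrained("bofenghuang/vigogne-2-7b-chat", revision="v1.0")
```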

## Quantized Models

The quantized versions of this model are generously provided by [TheBloke](https://huggingface.co/TheBloke)! See the loading sketch after the list.

- AWQ: [TheBloke/Vigogne-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-AWQ)
- GPTQ: [TheBloke/Vigogne-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-GPTQ)
- GGUF: [TheBloke/Vigogne-2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-GGUF)
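
As a minimal sketch, the GPTQ variant can be loaded with the regular `transformers` API, assuming a GPTQ backend (`optimum` plus `auto-gptq`) is installed in your environment; refer to TheBloke's model cards for the exact requirements and options:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPTQ checkpoints load through the standard from_pretrained() call
# once a GPTQ backend is available in the environment.
model_id = "TheBloke/Vigogne-2-7B-Chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
```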

## Prompt Template

We use the prefix tokens `<|user|>:` and `<|assistant|>:` to distinguish between user and assistant utterances.

You can apply this formatting with the [chat template](https://huggingface.co/docs/transformers/main/chat_templating) through the `apply_chat_template()` method.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bofenghuang/vigogne-2-7b-chat")

conversation = [
    {"role": "user", "content": "Bonjour ! Comment ça va aujourd'hui ?"},
    {"role": "assistant", "content": "Bonjour ! Je suis une IA, donc je n'ai pas de sentiments, mais je suis prêt à vous aider. Comment puis-je vous assister aujourd'hui ?"},
    {"role": "user", "content": "Quelle est la hauteur de la Tour Eiffel ?"},
    {"role": "assistant", "content": "La Tour Eiffel mesure environ 330 mètres de hauteur."},
    {"role": "user", "content": "Comment monter en haut ?"},
]

print(tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True))
```

You will get:

```
<s><|system|>: Vous êtes l'assistant IA nommé Vigogne, créé par Zaion Lab (https://zaion.ai). Vous suivez extrêmement bien les instructions. Aidez autant que vous le pouvez.
<|user|>: Bonjour ! Comment ça va aujourd'hui ?
<|assistant|>: Bonjour ! Je suis une IA, donc je n'ai pas de sentiments, mais je suis prêt à vous aider. Comment puis-je vous assister aujourd'hui ?</s>
<|user|>: Quelle est la hauteur de la Tour Eiffel ?
<|assistant|>: La Tour Eiffel mesure environ 330 mètres de hauteur.</s>
<|user|>: Comment monter en haut ?
<|assistant|>:
```

## Usage

```python
from typing import Dict, List, Optional

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, TextStreamer

model_name_or_path = "bofenghuang/vigogne-2-7b-chat"
revision = "v2.0"

# Load the tokenizer and model, pinning the v2.0 branch
# (adjust dtype and device_map to your hardware)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, revision=revision, padding_side="right", use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, revision=revision, torch_dtype=torch.float16, device_map="auto")

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)


def chat(
    query: str,
    history: Optional[List[Dict]] = None,
    temperature: float = 0.7,
    top_p: float = 1.0,
    top_k: int = 0,
    repetition_penalty: float = 1.1,
    max_new_tokens: int = 1024,
    **kwargs,
):
    if history is None:
        history = []

    # Append the new user turn, then format the whole conversation
    # with the model's chat template
    history.append({"role": "user", "content": query})

    input_ids = tokenizer.apply_chat_template(history, add_generation_prompt=True, return_tensors="pt").to(model.device)
    input_length = input_ids.shape[1]

    generated_outputs = model.generate(
        input_ids=input_ids,
        generation_config=GenerationConfig(
            do_sample=temperature > 0.0,  # sample only when temperature is positive
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            repetition_penalty=repetition_penalty,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
            **kwargs,
        ),
        streamer=streamer,
        return_dict_in_generate=True,
    )

    # Keep only the newly generated tokens, excluding the prompt
    generated_tokens = generated_outputs.sequences[0, input_length:]
    generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)

    history.append({"role": "assistant", "content": generated_text})

    return generated_text, history


# 1st round
response, history = chat("Un escargot parcourt 100 mètres en 5 heures. Quelle est sa vitesse ?", history=None)

# 2nd round
response, history = chat("Quand il peut dépasser le lapin ?", history=history)

# 3rd round
response, history = chat("Écris une histoire imaginative qui met en scène une compétition de course entre un escargot et un lapin.", history=history)
```

You can also use the Google Colab notebook below to run inference with the Vigogne chat models.

<a href="https://colab.research.google.com/github/bofenghuang/vigogne/blob/main/notebooks/infer_chat.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>