maddes8cht committed
Commit 2eedde8
1 Parent(s): e22b841

"Update README.md"

Files changed (1): README.md (+114, -0)
README.md ADDED
@@ -0,0 +1,114 @@
---
license: apache-2.0
language:
- fr
pipeline_tag: text-generation
library_name: transformers
tags:
- LLM
inference: false
---
[![banner](https://maddes8cht.github.io/assets/buttons/Huggingface-banner.jpg)]()

## I am still building the structure of these descriptions

These descriptions will gradually contain more content to help you find the best model for your purpose.

# vigogne-falcon-7b-instruct - GGUF
- Model creator: [bofenghuang](https://huggingface.co/bofenghuang)
- Original model: [vigogne-falcon-7b-instruct](https://huggingface.co/bofenghuang/vigogne-falcon-7b-instruct)

Vigogne-Falcon-7B-Instruct is a Falcon-7B model fine-tuned to follow French instructions.

# About GGUF format

`gguf` is the current file format used by the [`ggml`](https://github.com/ggerganov/ggml) library.
A growing list of software uses it and can therefore load this model.
The core project making use of the ggml library is the [llama.cpp](https://github.com/ggerganov/llama.cpp) project by Georgi Gerganov.
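
As a minimal sketch of what loading a GGUF file looks like in practice, the `llama-cpp-python` bindings (`pip install llama-cpp-python`) can open it directly. The file name below is an assumption for illustration; substitute whichever quantized file you actually download from this repository.

```python
# Minimal sketch: load a GGUF file with the llama-cpp-python bindings.
# The file name is an assumption; replace it with the quantized file
# you actually downloaded from this repository.
from llama_cpp import Llama

llm = Llama(model_path="vigogne-falcon-7b-instruct.Q5_1.gguf", n_ctx=2048)

# Run a plain completion against the loaded model.
output = llm("Expliquez la différence entre DoS et phishing.", max_tokens=256)
print(output["choices"][0]["text"])
```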

# Quantization variants

A number of quantized files are available. Here is how to choose the one that best fits your needs:

# Legacy quants

Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are `legacy` quantization types.
Nevertheless, they are fully supported, as there are several circumstances that cause certain models not to be compatible with the modern K-quants.
Falcon 7B models cannot be quantized to K-quants.
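
To fetch a single quantized file rather than cloning the whole repository, the `huggingface_hub` library can download it by name; since Falcon 7B only gets legacy quants, the sketch below picks a Q5_1 file. Both the repository id and the file name here are assumptions, so check this repository's file list for the actual names.

```python
# Sketch: download one quantized variant via huggingface_hub.
# Repository id and file name are assumptions; verify both against
# the "Files and versions" tab of this repository.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="maddes8cht/bofenghuang-vigogne-falcon-7b-instruct-gguf",
    filename="vigogne-falcon-7b-instruct.Q5_1.gguf",
)
print(model_path)  # local cache path of the downloaded file
```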

# K-quants

K-quants are based on the idea that quantizing certain parts of the model affects quality in different ways. If you quantize certain parts more and others less, you get a more capable model at the same file size, or a smaller file size and lower memory load with comparable performance.
So, if possible, use K-quants.
With a Q6_K you should find it really hard to detect any quality difference from the original model. In fact, asking the model the same question twice may produce bigger differences between the two answers than between Q6_K and the original.
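
For a rough sense of the size side of this trade-off, file size scales with the average number of bits per weight. The back-of-the-envelope sketch below uses nominal per-weight costs of each format as I understand the block layouts (including the per-block scale factors); real GGUF files deviate somewhat, since some tensors are kept at higher precision.

```python
# Back-of-the-envelope file sizes for a 7B-parameter model at the
# nominal bits-per-weight of each quantization format. Actual GGUF
# files differ, since some tensors are stored at higher precision.
n_params = 7e9

for name, bits_per_weight in [("Q4_0", 4.5), ("Q5_1", 6.0), ("Q8_0", 8.5), ("Q6_K", 6.56)]:
    size_gb = n_params * bits_per_weight / 8 / 1e9
    print(f"{name}: ~{size_gb:.1f} GB")
```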

# Original Model Card:
<p align="center" width="100%">
<img src="https://huggingface.co/bofenghuang/vigogne-falcon-7b-instruct/resolve/main/vigogne_logo.png" alt="Vigogne" style="width: 40%; min-width: 300px; display: block; margin: auto;">
</p>

# Vigogne-Falcon-7B-Instruct: A French Instruction-following Falcon Model

Vigogne-Falcon-7B-Instruct is a [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) model fine-tuned to follow French instructions.

For more information, please visit the GitHub repository: https://github.com/bofenghuang/vigogne

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
from vigogne.preprocess import generate_instruct_prompt

model_name_or_path = "bofenghuang/vigogne-falcon-7b-instruct"

# Load the tokenizer and reuse the EOS token for padding.
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, padding_side="right", use_fast=False)
tokenizer.pad_token = tokenizer.eos_token

# Load the model in float16, spread across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Wrap the user query in the Vigogne instruction prompt template.
user_query = "Expliquez la différence entre DoS et phishing."
prompt = generate_instruct_prompt(user_query)
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(model.device)
input_length = input_ids.shape[1]

# Generate with low-temperature sampling, up to 512 new tokens.
generated_outputs = model.generate(
    input_ids=input_ids,
    generation_config=GenerationConfig(
        temperature=0.1,
        do_sample=True,
        repetition_penalty=1.0,
        max_new_tokens=512,
    ),
    return_dict_in_generate=True,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
generated_tokens = generated_outputs.sequences[0, input_length:]
generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(generated_text)
```

You can also run inference with this model using the following Google Colab notebook.

<a href="https://colab.research.google.com/github/bofenghuang/vigogne/blob/main/notebooks/infer_instruct.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Limitations

Vigogne is still under development, and there are many limitations that have to be addressed. Please note that the model may generate harmful or biased content, incorrect information, or generally unhelpful answers.

<center>

[![GitHub](https://maddes8cht.github.io/assets/buttons/github-io-button.png)](https://maddes8cht.github.io)
[![Stack Exchange](https://stackexchange.com/users/flair/26485911.png)](https://stackexchange.com/users/26485911)
[![GitHub](https://maddes8cht.github.io/assets/buttons/github-button.png)](https://github.com/maddes8cht)
[![HuggingFace](https://maddes8cht.github.io/assets/buttons/huggingface-button.png)](https://huggingface.co/maddes8cht)
[![Twitter](https://maddes8cht.github.io/assets/buttons/twitter-button.png)](https://twitter.com/maddes1966)

</center>