---
model_name: Trendyol-LLM-7b-chat-v1.0-gguf
model_creator: Trendyol
base_model: Trendyol/Trendyol-LLM-7b-chat-v1.0
language:
- tr
- en
pipeline_tag: text-generation
license: apache-2.0
model_type: llama
library_name: transformers
inference: false
tags:
- trendyol
- llama-2
- turkish
quantized_by: tolgadev
---
## Trendyol-LLM-7b-chat-v1.0-gguf models
----
## Description

This repo contains GGUF-format model files, in all common quantization types, for [Trendyol-LLM-7b-chat-v1.0](https://huggingface.co/Trendyol/Trendyol-LLM-7b-chat-v1.0).

<img src="https://huggingface.co/Trendyol/Trendyol-LLM-7b-chat-v1.0/resolve/main/trendyol-llm-mistral.jpg"
alt="drawing" width="400"/>

## Quantized LLM models and methods
| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [Trendyol-LLM-7b-chat-v1.0.Q2_K.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q2_K.gguf) | Q2_K | 2 | 2.59 GB | 4.88 GB | smallest, significant quality loss - not recommended for most purposes |
| [Trendyol-LLM-7b-chat-v1.0.Q3_K_S.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q3_K_S.gguf) | Q3_K_S | 3 | 3.01 GB | 5.56 GB | very small, high quality loss |
| [Trendyol-LLM-7b-chat-v1.0.Q3_K_M.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q3_K_M.gguf) | Q3_K_M | 3 | 3.36 GB | 5.91 GB | very small, high quality loss |
| [Trendyol-LLM-7b-chat-v1.0.Q3_K_L.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q3_K_L.gguf) | Q3_K_L | 3 | 3.66 GB | 6.20 GB | small, substantial quality loss |
| [Trendyol-LLM-7b-chat-v1.0.Q4_0.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q4_0.gguf) | Q4_0 | 4 | 3.90 GB | 6.45 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| [Trendyol-LLM-7b-chat-v1.0.Q4_K_S.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q4_K_S.gguf) | Q4_K_S | 4 | 3.93 GB | 6.48 GB | small, greater quality loss |
| [Trendyol-LLM-7b-chat-v1.0.Q4_K_M.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q4_K_M.gguf) | Q4_K_M | 4 | 4.15 GB | 6.69 GB | medium, balanced quality - recommended |
| [Trendyol-LLM-7b-chat-v1.0.Q5_0.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q5_0.gguf) | Q5_0 | 5 | 4.73 GB | 7.15 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| [Trendyol-LLM-7b-chat-v1.0.Q5_K_S.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q5_K_S.gguf) | Q5_K_S | 5 | 4.75 GB | 7.27 GB | large, low quality loss - recommended |
| [Trendyol-LLM-7b-chat-v1.0.Q5_K_M.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q5_K_M.gguf) | Q5_K_M | 5 | 4.86 GB | 7.40 GB | large, very low quality loss - recommended |
| [Trendyol-LLM-7b-chat-v1.0.Q6_K.gguf](https://huggingface.co/tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF/blob/main/trendyol-llm-7b-chat-v1.0.Q6_K.gguf) | Q6_K | 6 | 5.61 GB | 8.15 GB | very large, extremely low quality loss |

The quantization method names follow the convention "q" + the number of bits + the variant used (detailed below). The list below describes each method and its corresponding use case, based on the model cards made by [TheBloke](https://huggingface.co/TheBloke/):

* `q2_k`: Uses Q4_K for the attention.wv and feed_forward.w2 tensors, Q2_K for the other tensors.
* `q3_k_l`: Uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K.
* `q3_k_m`: Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K.
* `q3_k_s`: Uses Q3_K for all tensors.
* `q4_0`: Original quant method, 4-bit.
* `q4_1`: Higher accuracy than q4_0 but not as high as q5_0; however, it offers quicker inference than the q5 models.
* `q4_k_m`: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q4_k_s`: Uses Q4_K for all tensors.
* `q5_0`: Higher accuracy, higher resource usage, and slower inference.
* `q5_1`: Even higher accuracy and resource usage, with slower inference still.
* `q5_k_m`: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
* `q5_k_s`: Uses Q5_K for all tensors.
* `q6_k`: Uses Q8_K for all tensors.

**TheBloke recommends using Q5_K_M** as it preserves most of the model's performance.
Alternatively, you can use Q4_K_M if you want to save some memory.
In general, the K_M versions are better than the K_S versions.
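As a quick aid for choosing a file, the "Max RAM required" column above can be encoded in a small helper that picks the largest quant fitting a given memory budget. This is only a sketch based on the table, not part of the repo:

```python
# Max RAM requirements (GB) copied from the table above.
MAX_RAM_GB = {
    "Q2_K": 4.88, "Q3_K_S": 5.56, "Q3_K_M": 5.91, "Q3_K_L": 6.20,
    "Q4_0": 6.45, "Q4_K_S": 6.48, "Q4_K_M": 6.69, "Q5_0": 7.15,
    "Q5_K_S": 7.27, "Q5_K_M": 7.40, "Q6_K": 8.15,
}

def pick_quant(ram_budget_gb):
    """Return the quant method with the highest RAM need that still fits,
    or None if even Q2_K does not fit the budget."""
    fitting = [(ram, q) for q, ram in MAX_RAM_GB.items() if ram <= ram_budget_gb]
    return max(fitting)[1] if fitting else None
```

For example, with 8 GB of free RAM this picks Q5_K_M, matching TheBloke's general recommendation.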
## How to download GGUF files

**Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.

The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
- LM Studio
- LoLLMS Web UI
- Faraday.dev

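Alternatively, a single file can be fetched programmatically with the `huggingface_hub` library instead of cloning the repo. The helper below is a sketch that simply rebuilds the file names used in the table above:

```python
def gguf_filename(quant):
    """Build the file name this repo uses for a given quant method."""
    return f"trendyol-llm-7b-chat-v1.0.{quant}.gguf"

# Example (requires `pip install huggingface_hub`; downloads ~4 GB for Q4_K_M):
#   from huggingface_hub import hf_hub_download
#   path = hf_hub_download(
#       repo_id="tolgadev/Trendyol-LLM-7b-chat-v1.0-GGUF",
#       filename=gguf_filename("Q4_K_M"),
#   )
```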
## Special thanks to [TheBloke on Huggingface](https://huggingface.co/TheBloke) and [Maxime Labonne on Github](https://github.com/mlabonne/llm-course)

-----


# **Trendyol LLM v1.0 - DPO**
Trendyol LLM v1.0 - DPO is a generative model based on the Mistral 7B model and fine-tuned with DPO. This is the repository for the chat model.

## Model Details

**Model Developers** Trendyol

**Variations** [base](https://huggingface.co/Trendyol/Trendyol-LLM-7b-base-v1.0), [chat](https://huggingface.co/Trendyol/Trendyol-LLM-7b-chat-v1.0), and dpo variations.

**Input** Models input text only.

**Output** Models generate text only.

**Model Architecture** Trendyol LLM is an auto-regressive language model (based on Mistral 7b) that uses an optimized transformer architecture. The Huggingface TRL library was used for training. The DPO version was fine-tuned on 11K preference triples (prompt, chosen, rejected) using LoRA, with the following hyperparameters:

- **lr**=5e-6
- **lora_rank**=64
- **lora_alpha**=128
- **lora_trainable**=q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj
- **lora_dropout**=0.05
- **bf16**=True
- **beta**=0.01
- **max_length**=1024
- **max_prompt_length**=512
- **lr_scheduler_type**=cosine
- **torch_dtype**=bfloat16

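For illustration, the hyperparameters above map roughly onto the public `peft`/`trl` APIs as follows. This is an assumption-laden sketch, not the actual Trendyol training script:

```python
# LoRA settings in the shape expected by peft.LoraConfig.
lora_kwargs = {
    "r": 64,                 # lora_rank
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj",
                       "gate_proj", "down_proj", "up_proj"],  # lora_trainable
}

# DPO training settings (names follow common TRL argument naming).
dpo_kwargs = {
    "learning_rate": 5e-6,   # lr
    "beta": 0.01,
    "max_length": 1024,
    "max_prompt_length": 512,
    "lr_scheduler_type": "cosine",
    "bf16": True,
}

# These would be passed as peft.LoraConfig(**lora_kwargs) and into the DPO
# trainer's configuration (e.g. trl.DPOTrainer(..., peft_config=...)).
```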
<img src="https://camo.githubusercontent.com/3e61ca080778f62988b459c7321726fa35bb3776ceb07ecaabf71ebca44f95a7/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f74726c2d696e7465726e616c2d74657374696e672f6578616d706c652d696d616765732f7265736f6c76652f6d61696e2f696d616765732f74726c5f62616e6e65725f6461726b2e706e67"
alt="drawing" width="600"/>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_diagram.png"
alt="drawing" width="600"/>

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "Trendyol/Trendyol-LLM-7b-chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 8-bit loading requires the bitsandbytes package.
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             device_map='auto',
                                             load_in_8bit=True)

sampling_params = dict(do_sample=True, temperature=0.3, top_k=50, top_p=0.9)

pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                device_map="auto",
                max_new_tokens=1024,
                return_full_text=True,
                repetition_penalty=1.1
                )

DEFAULT_SYSTEM_PROMPT = "Sen yardımcı bir asistansın ve sana verilen talimatlar doğrultusunda en iyi cevabı üretmeye çalışacaksın.\n"

TEMPLATE = (
    "[INST] {system_prompt}\n\n"
    "{instruction} [/INST]"
)

def generate_prompt(instruction, system_prompt=DEFAULT_SYSTEM_PROMPT):
    return TEMPLATE.format_map({'instruction': instruction, 'system_prompt': system_prompt})

def generate_output(user_query, sys_prompt=DEFAULT_SYSTEM_PROMPT):
    prompt = generate_prompt(user_query, sys_prompt)
    outputs = pipe(prompt, **sampling_params)
    # Keep only the model's answer, i.e. everything after the final [/INST].
    return outputs[0]["generated_text"].split("[/INST]")[-1]

user_query = "Türkiye'de kaç il var?"
response = generate_output(user_query)
print(response)
```

With the chat template:
```python
pipe = pipeline("conversational",
                model=model,
                tokenizer=tokenizer,
                device_map="auto",
                max_new_tokens=1024,
                repetition_penalty=1.1
                )

messages = [
    {"role": "user", "content": "Türkiye'de kaç il var?"}
]

outputs = pipe(messages, **sampling_params)
print(outputs)
```

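Note that recent `transformers` releases deprecated and then removed the `"conversational"` pipeline. A minimal fallback, sketched here for the single-turn case, is to render the `[INST]` prompt yourself (mirroring the `TEMPLATE` from the Usage section) and reuse the `"text-generation"` pipeline instead:

```python
# Single-turn fallback for the removed "conversational" pipeline.
DEFAULT_SYSTEM_PROMPT = ("Sen yardımcı bir asistansın ve sana verilen talimatlar "
                         "doğrultusunda en iyi cevabı üretmeye çalışacaksın.\n")

def render_single_turn(messages, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Render a one-user-message conversation into the [INST] prompt format."""
    user = next(m["content"] for m in messages if m["role"] == "user")
    return f"[INST] {system_prompt}\n\n{user} [/INST]"

prompt = render_single_turn([{"role": "user", "content": "Türkiye'de kaç il var?"}])
# `prompt` can now be passed to the text-generation pipeline defined above:
#   outputs = pipe(prompt, **sampling_params)
```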
## Limitations, Risks, Bias, and Ethical Considerations

### Limitations and Known Biases

- **Primary Function and Application:** Trendyol LLM, an autoregressive language model, is primarily designed to predict the next token in a text string. While often used for various applications, it is important to note that it has not undergone extensive real-world application testing. Its effectiveness and reliability across diverse scenarios remain largely unverified.

- **Language Comprehension and Generation:** The model is primarily trained in standard English and Turkish. Its performance in understanding and generating slang, informal language, or other languages may be limited, leading to potential errors or misinterpretations.

- **Generation of False Information:** Users should be aware that Trendyol LLM may produce inaccurate or misleading information. Outputs should be considered as starting points or suggestions rather than definitive answers.

### Risks and Ethical Considerations

- **Potential for Harmful Use:** There is a risk that Trendyol LLM could be used to generate offensive or harmful language. We strongly discourage its use for any such purposes and emphasize the need for application-specific safety and fairness evaluations before deployment.

- **Unintended Content and Bias:** The model was trained on a large corpus of text data, which was not explicitly checked for offensive content or existing biases. Consequently, it may inadvertently produce content that reflects these biases or inaccuracies.

- **Toxicity:** Despite efforts to select appropriate training data, the model is capable of generating harmful content, especially when prompted explicitly. We encourage the open-source community to engage in developing strategies to minimize such risks.

### Recommendations for Safe and Ethical Usage

- **Human Oversight:** We recommend incorporating a human curation layer or using filters to manage and improve the quality of outputs, especially in public-facing applications. This approach can help mitigate the risk of generating objectionable content unexpectedly.

- **Application-Specific Testing:** Developers intending to use Trendyol LLM should conduct thorough safety testing and optimization tailored to their specific applications. This is crucial, as the model's responses can be unpredictable and may occasionally be biased, inaccurate, or offensive.

- **Responsible Development and Deployment:** It is the responsibility of developers and users of Trendyol LLM to ensure its ethical and safe application. We urge users to be mindful of the model's limitations and to employ appropriate safeguards to prevent misuse or harmful consequences.