afrideva committed
Commit 79e8363
1 Parent(s): aa7e436

Upload README.md with huggingface_hub

Files changed (1): README.md +46 -42
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-base_model: TinyLlama/TinyLlama-1.1B-Chat-v0.6
 datasets:
 - cerebras/SlimPajama-627B
 - bigcode/starcoderdata
@@ -9,7 +9,7 @@ language:
 - en
 license: apache-2.0
 model_creator: TinyLlama
-model_name: TinyLlama-1.1B-Chat-v0.6
 pipeline_tag: text-generation
 quantized_by: afrideva
 tags:
@@ -23,19 +23,19 @@ tags:
 - q6_k
 - q8_0
 ---
-# TinyLlama/TinyLlama-1.1B-Chat-v0.6-GGUF

-Quantized GGUF model files for [TinyLlama-1.1B-Chat-v0.6](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6) from [TinyLlama](https://huggingface.co/TinyLlama)

 | Name | Quant method | Size |
 | ---- | ---- | ---- |
-| [tinyllama-1.1b-chat-v0.6.q2_k.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.6-GGUF/resolve/main/tinyllama-1.1b-chat-v0.6.q2_k.gguf) | q2_k | 482.14 MB |
-| [tinyllama-1.1b-chat-v0.6.q3_k_m.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.6-GGUF/resolve/main/tinyllama-1.1b-chat-v0.6.q3_k_m.gguf) | q3_k_m | 549.85 MB |
-| [tinyllama-1.1b-chat-v0.6.q4_k_m.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.6-GGUF/resolve/main/tinyllama-1.1b-chat-v0.6.q4_k_m.gguf) | q4_k_m | 667.81 MB |
-| [tinyllama-1.1b-chat-v0.6.q5_k_m.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.6-GGUF/resolve/main/tinyllama-1.1b-chat-v0.6.q5_k_m.gguf) | q5_k_m | 782.04 MB |
-| [tinyllama-1.1b-chat-v0.6.q6_k.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.6-GGUF/resolve/main/tinyllama-1.1b-chat-v0.6.q6_k.gguf) | q6_k | 903.41 MB |
-| [tinyllama-1.1b-chat-v0.6.q8_0.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.6-GGUF/resolve/main/tinyllama-1.1b-chat-v0.6.q8_0.gguf) | q8_0 | 1.17 GB |

@@ -53,39 +53,43 @@ The TinyLlama project aims to **pretrain** a **1.1B Llama model on 3 trillion to
 We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint.

 #### This Model
-This is the chat model finetuned on top of [TinyLlama/TinyLlama-1.1B-intermediate-step-955k-2T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-955k-token-2T). **We follow [HF's Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)'s training recipe.** The model was "initially fine-tuned on a variant of the [`UltraChat`](https://huggingface.co/datasets/stingning/ultrachat) dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT.
-We then further aligned the model with [🤗 TRL's](https://github.com/huggingface/trl) `DPOTrainer` on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contains 64k prompts and model completions that are ranked by GPT-4."

 #### How to use
-You will need transformers>=4.34.
 Do check the [TinyLlama](https://github.com/jzhang38/TinyLlama) GitHub page for more information.
-```python
-# Install transformers from source - only needed for versions <= v4.34
-# pip install git+https://github.com/huggingface/transformers.git
-# pip install accelerate
-
 import torch
-from transformers import pipeline
-
-pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v0.6", torch_dtype=torch.bfloat16, device_map="auto")
-
-# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
-messages = [
-    {
-        "role": "system",
-        "content": "You are a friendly chatbot who always responds in the style of a pirate",
-    },
-    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
-]
-prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
-print(outputs[0]["generated_text"])
-# <|system|>
-# You are a friendly chatbot who always responds in the style of a pirate.</s>
-# <|user|>
-# How many helicopters can a human eat in one sitting?</s>
-# <|assistant|>
-# ...
  ```
 
 ---
+base_model: TinyLlama/TinyLlama-1.1B-Chat-v0.5
 datasets:
 - cerebras/SlimPajama-627B
 - bigcode/starcoderdata
 
 - en
 license: apache-2.0
 model_creator: TinyLlama
+model_name: TinyLlama-1.1B-Chat-v0.5
 pipeline_tag: text-generation
 quantized_by: afrideva
 tags:
 
 - q6_k
 - q8_0
 ---
+# TinyLlama/TinyLlama-1.1B-Chat-v0.5-GGUF

+Quantized GGUF model files for [TinyLlama-1.1B-Chat-v0.5](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.5) from [TinyLlama](https://huggingface.co/TinyLlama)

 | Name | Quant method | Size |
 | ---- | ---- | ---- |
+| [tinyllama-1.1b-chat-v0.5.q2_k.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.5-GGUF/resolve/main/tinyllama-1.1b-chat-v0.5.q2_k.gguf) | q2_k | 482.15 MB |
+| [tinyllama-1.1b-chat-v0.5.q3_k_m.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.5-GGUF/resolve/main/tinyllama-1.1b-chat-v0.5.q3_k_m.gguf) | q3_k_m | 549.85 MB |
+| [tinyllama-1.1b-chat-v0.5.q4_k_m.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.5-GGUF/resolve/main/tinyllama-1.1b-chat-v0.5.q4_k_m.gguf) | q4_k_m | 667.82 MB |
+| [tinyllama-1.1b-chat-v0.5.q5_k_m.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.5-GGUF/resolve/main/tinyllama-1.1b-chat-v0.5.q5_k_m.gguf) | q5_k_m | 782.05 MB |
+| [tinyllama-1.1b-chat-v0.5.q6_k.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.5-GGUF/resolve/main/tinyllama-1.1b-chat-v0.5.q6_k.gguf) | q6_k | 903.42 MB |
+| [tinyllama-1.1b-chat-v0.5.q8_0.gguf](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.5-GGUF/resolve/main/tinyllama-1.1b-chat-v0.5.q8_0.gguf) | q8_0 | 1.17 GB |
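
The files above work with any GGUF-compatible runtime. As a minimal sketch (assuming `huggingface_hub` and `llama-cpp-python` are installed; neither tool is prescribed by this card), one way to fetch and run the q4_k_m file:

```python
# Sketch only: llama-cpp-python is one of several GGUF runtimes.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download a single quant from this repo.
model_path = hf_hub_download(
    repo_id="afrideva/TinyLlama-1.1B-Chat-v0.5-GGUF",
    filename="tinyllama-1.1b-chat-v0.5.q4_k_m.gguf",
)

llm = Llama(model_path=model_path, n_ctx=2048)

# v0.5 is tuned on ChatML-formatted data, so prompt in that format.
prompt = "<|im_start|>user\nHow to get in a good university?<|im_end|>\n<|im_start|>assistant\n"
out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```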
  We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint.
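
Because the graph and tokenizer match Llama 2 exactly, the stock Llama classes in `transformers` load TinyLlama with no custom code; a minimal sketch (standard `transformers` API, nothing model-specific assumed):

```python
from transformers import AutoTokenizer, LlamaForCausalLM

# TinyLlama is architecturally a Llama 2 checkpoint, so the generic
# Llama classes in transformers handle it directly.
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v0.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)

inputs = tokenizer("TinyLlama is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```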

 #### This Model
+This is the chat model finetuned on top of [TinyLlama/TinyLlama-1.1B-intermediate-step-955k-2T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-955k-token-2T).
+The dataset used is [OpenAssistant/oasst_top1_2023-08-25](https://huggingface.co/datasets/OpenAssistant/oasst_top1_2023-08-25), following the [ChatML](https://github.com/openai/openai-python/blob/main/chatml.md) format.
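
For reference, ChatML wraps every message in `<|im_start|>`/`<|im_end|>` markers with the speaker role on the first line, so a single exchange is serialized as:

```
<|im_start|>user
How to get in a good university?<|im_end|>
<|im_start|>assistant
...<|im_end|>
```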
 
 
 #### How to use
+You will need transformers>=4.31.
 Do check the [TinyLlama](https://github.com/jzhang38/TinyLlama) GitHub page for more information.
+```python
+from transformers import AutoTokenizer
+import transformers
 import torch
+
+model = "PY007/TinyLlama-1.1B-Chat-v0.5"
+tokenizer = AutoTokenizer.from_pretrained(model)
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model,
+    torch_dtype=torch.float16,
+    device_map="auto",
+)
+
+# Id of the ChatML end-of-turn marker (<|im_end|>) in this model's
+# tokenizer; passing it as eos_token_id stops generation at the end
+# of the assistant's reply.
+CHAT_EOS_TOKEN_ID = 32002
+
+prompt = "How to get in a good university?"
+formatted_prompt = (
+    f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
+)
+
+sequences = pipeline(
+    formatted_prompt,
+    do_sample=True,
+    top_k=50,
+    top_p=0.9,
+    num_return_sequences=1,
+    repetition_penalty=1.1,
+    max_new_tokens=1024,
+    eos_token_id=CHAT_EOS_TOKEN_ID,
+)
+
+for seq in sequences:
+    print(f"Result: {seq['generated_text']}")
  ```
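
Note that this snippet runs the full-precision checkpoint via `transformers`; to use the quantized files from the table above instead, load them with a GGUF runtime as sketched earlier.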