afrideva committed on
Commit 25b64bf
1 Parent(s): a89b430

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +117 -0
README.md ADDED
@@ -0,0 +1,117 @@
+ ---
+ base_model: ahxt/llama2_xs_460M_experimental
+ datasets:
+ - Redpajama
+ inference: false
+ language:
+ - en
+ metrics:
+ - MMLU
+ model_creator: ahxt
+ model_name: llama2_xs_460M_experimental
+ pipeline_tag: text-generation
+ quantized_by: afrideva
+ tags:
+ - llama2
+ - llama-2
+ - llama
+ - llama2 architecture
+ - gguf
+ - ggml
+ - quantized
+ - q2_k
+ - q3_k_m
+ - q4_k_m
+ - q5_k_m
+ - q6_k
+ - q8_0
+ ---
+ # ahxt/llama2_xs_460M_experimental-GGUF
+
+ Quantized GGUF model files for [llama2_xs_460M_experimental](https://huggingface.co/ahxt/llama2_xs_460M_experimental) from [ahxt](https://huggingface.co/ahxt).
+
+
+ | Name | Quant method | Size |
+ | ---- | ---- | ---- |
+ | [llama2_xs_460m_experimental.fp16.gguf](https://huggingface.co/afrideva/llama2_xs_460M_experimental-GGUF/resolve/main/llama2_xs_460m_experimental.fp16.gguf) | fp16 | 925.45 MB |
+ | [llama2_xs_460m_experimental.q2_k.gguf](https://huggingface.co/afrideva/llama2_xs_460M_experimental-GGUF/resolve/main/llama2_xs_460m_experimental.q2_k.gguf) | q2_k | 212.56 MB |
+ | [llama2_xs_460m_experimental.q3_k_m.gguf](https://huggingface.co/afrideva/llama2_xs_460M_experimental-GGUF/resolve/main/llama2_xs_460m_experimental.q3_k_m.gguf) | q3_k_m | 238.87 MB |
+ | [llama2_xs_460m_experimental.q4_k_m.gguf](https://huggingface.co/afrideva/llama2_xs_460M_experimental-GGUF/resolve/main/llama2_xs_460m_experimental.q4_k_m.gguf) | q4_k_m | 288.51 MB |
+ | [llama2_xs_460m_experimental.q5_k_m.gguf](https://huggingface.co/afrideva/llama2_xs_460M_experimental-GGUF/resolve/main/llama2_xs_460m_experimental.q5_k_m.gguf) | q5_k_m | 333.29 MB |
+ | [llama2_xs_460m_experimental.q6_k.gguf](https://huggingface.co/afrideva/llama2_xs_460M_experimental-GGUF/resolve/main/llama2_xs_460m_experimental.q6_k.gguf) | q6_k | 380.87 MB |
+ | [llama2_xs_460m_experimental.q8_0.gguf](https://huggingface.co/afrideva/llama2_xs_460M_experimental-GGUF/resolve/main/llama2_xs_460m_experimental.q8_0.gguf) | q8_0 | 492.67 MB |
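
To help compare the files in the table, here is a minimal sketch that rebuilds each download URL from the link pattern used in the table and estimates an effective bits-per-weight figure from the file sizes. The helper names `file_url` and `approx_bits_per_weight` are hypothetical, and the estimate is an assumption: it treats the fp16 file as exactly 16 bits per weight and ignores file metadata and any tensors a quant keeps at higher precision.

```python
# Sizes in MB, copied from the table above.
SIZES_MB = {
    "fp16": 925.45,
    "q2_k": 212.56,
    "q3_k_m": 238.87,
    "q4_k_m": 288.51,
    "q5_k_m": 333.29,
    "q6_k": 380.87,
    "q8_0": 492.67,
}

REPO_URL = "https://huggingface.co/afrideva/llama2_xs_460M_experimental-GGUF"


def file_url(quant: str) -> str:
    """Build the download URL for one quant, following the table's link pattern."""
    if quant not in SIZES_MB:
        raise KeyError(f"unknown quant method: {quant}")
    return f"{REPO_URL}/resolve/main/llama2_xs_460m_experimental.{quant}.gguf"


def approx_bits_per_weight(quant: str) -> float:
    """Rough estimate: scale 16 bits by the size ratio to the fp16 file.

    Assumption (not from the source): the fp16 file stores every weight
    in 16 bits, and metadata overhead is negligible.
    """
    return 16.0 * SIZES_MB[quant] / SIZES_MB["fp16"]


if __name__ == "__main__":
    for quant in SIZES_MB:
        bits = approx_bits_per_weight(quant)
        print(f"{quant:8s} {bits:5.2f} bits/weight  {file_url(quant)}")
```

Under these assumptions the estimates come out somewhat above the nominal bit width in each quant name, which is plausible given per-block scale data, but the figures should be read as rough guides only.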
+
+
+
+ ## Original Model Card:
+ # LLaMa Lite: Reduced-Scale, Experimental Versions of LLaMA and LLaMa 2
+
+ In this series of repos, we present an open-source reproduction of Meta AI's [LLaMA](https://ai.meta.com/blog/large-language-model-llama-meta-ai/) and [LLaMa 2](https://ai.meta.com/llama/) large language models, but with significantly reduced model sizes: the experimental version of [llama1_s](https://huggingface.co/ahxt/llama1_s_1.8B_experimental) has 1.8B parameters, and the experimental version of [llama2_xs](https://huggingface.co/ahxt/llama2_xs_460M_experimental) has 460M parameters ('s' stands for small, while 'xs' denotes extra small).
+
+
+ ## Dataset and Tokenization
+ We train our models on part of the [RedPajama](https://www.together.xyz/blog/redpajama) dataset. We use the [GPT2Tokenizer](https://huggingface.co/docs/transformers/v4.31.0/en/model_doc/gpt2#transformers.GPT2Tokenizer) to tokenize the text.
+
+
+ ### Using with HuggingFace Transformers
+ The experimental checkpoints can be loaded directly with the [Transformers](https://huggingface.co/transformers/) library. The following code snippet shows how to load our experimental model and generate text with it.
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Pick one of the two experimental checkpoints.
+ # model_path = 'ahxt/llama2_xs_460M_experimental'
+ model_path = 'ahxt/llama1_s_1.8B_experimental'
+
+ model = AutoModelForCausalLM.from_pretrained(model_path)
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ model.eval()  # inference mode: disables dropout
+
+ prompt = 'Q: What is the largest bird?\nA:'
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+ with torch.no_grad():
+     # max_length counts the prompt tokens plus the generated tokens
+     tokens = model.generate(input_ids, max_length=20)
+ print(tokenizer.decode(tokens[0].tolist(), skip_special_tokens=True))
+ # Q: What is the largest bird?\nA: The largest bird is the bald eagle.
+ ```
+
+ ## Evaluation
+
+ We evaluate our models on the MMLU task.
+
+ | Models | #parameters | zero-shot | 5-shot |
+ | --- | --- | --- | --- |
+ | llama | 7B | 28.46 | 35.05 |
+ | openllama | 3B | 24.90 | 26.71 |
+ | TinyLlama-1.1B-step-50K-105b | 1.1B | 19.00 | 26.53 |
+ | llama2_xs_460M | 0.46B | 21.13 | 26.39 |
+
+
+
+
+ ## Contact
+ This experimental version is developed by:
+ [Xiaotian Han](https://ahxt.github.io/) from Texas A&M University. These experimental versions are for research only.
+
+
+
+
+
+
+
+
+
+
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ahxt__llama2_xs_460M_experimental).
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 26.65 |
+ | ARC (25-shot) | 24.91 |
+ | HellaSwag (10-shot) | 38.47 |
+ | MMLU (5-shot) | 26.17 |
+ | TruthfulQA (0-shot) | 41.59 |
+ | Winogrande (5-shot) | 49.88 |
+ | GSM8K (5-shot) | 0.0 |
+ | DROP (3-shot) | 5.51 |
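
As a quick sanity check on the table above, the sketch below recomputes the `Avg.` row. It assumes the average is the unweighted mean of the seven benchmark scores; that is inferred from the numbers themselves, not stated in the source. The function name `leaderboard_average` is hypothetical.

```python
from statistics import mean

# Scores copied from the leaderboard table above.
SCORES = {
    "ARC (25-shot)": 24.91,
    "HellaSwag (10-shot)": 38.47,
    "MMLU (5-shot)": 26.17,
    "TruthfulQA (0-shot)": 41.59,
    "Winogrande (5-shot)": 49.88,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 5.51,
}

REPORTED_AVG = 26.65  # the "Avg." row of the table


def leaderboard_average(scores: dict[str, float]) -> float:
    """Unweighted mean of all benchmark scores, rounded to two decimals."""
    return round(mean(scores.values()), 2)


if __name__ == "__main__":
    computed = leaderboard_average(SCORES)
    print(f"computed={computed:.2f} reported={REPORTED_AVG:.2f}")
```

Under that assumption the recomputed mean agrees with the reported value to two decimal places.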