Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


tiny_starcoder_py - bnb 8bits
- Model creator: https://huggingface.co/bigcode/
- Original model: https://huggingface.co/bigcode/tiny_starcoder_py/

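
This repository provides a bitsandbytes 8-bit quantization of the original checkpoint. A minimal loading sketch, assuming a recent `transformers` with `accelerate` and `bitsandbytes` installed and a CUDA GPU; it uses the generic `BitsAndBytesConfig(load_in_8bit=True)` route against the original `bigcode/tiny_starcoder_py` id rather than restating this repository's exact id:

```python
# pip install -q transformers accelerate bitsandbytes
# Sketch: load the model in bnb 8-bit. The checkpoint below is the original
# bigcode repo; substitute this quantized repository's id to use the
# pre-quantized weights instead. bitsandbytes requires a CUDA GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "bigcode/tiny_starcoder_py"  # or this 8-bit upload's repo id

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

inputs = tokenizer("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```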


Original model description:
---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/the-stack-dedup
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: Tiny-StarCoder-Py
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 7.84%
      verified: false
---

# TinyStarCoderPy

This is a 164M parameter model with the same architecture as [StarCoder](https://huggingface.co/bigcode/starcoder) (8k context length, MQA & FIM). It was trained on the Python data from [StarCoderData](https://huggingface.co/datasets/bigcode/starcoderdata) for ~6 epochs, which amounts to 100B tokens.
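
The figures above (164M parameters, 8k context, multi-query attention) can be checked directly against the checkpoint. A quick sanity-check sketch, assuming only the standard `transformers` API; the model is small enough to load on CPU:

```python
# Inspect the config and count parameters to confirm the model card's numbers.
from transformers import AutoConfig, AutoModelForCausalLM

checkpoint = "bigcode/tiny_starcoder_py"

config = AutoConfig.from_pretrained(checkpoint)
print(config)  # shows context length and the multi-query attention flag

model = AutoModelForCausalLM.from_pretrained(checkpoint)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```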


## Use

### Intended use

The model was trained on GitHub code to assist with tasks such as [Assisted Generation](https://huggingface.co/blog/assisted-generation). For pure code completion, we advise using our larger 15B models, [StarCoder](https://huggingface.co/bigcode/starcoder) or [StarCoderBase](https://huggingface.co/bigcode/starcoderbase).

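
As a sketch of the assisted-generation use case: in `transformers`, a small model can act as the draft ("assistant") model for a larger one through the `assistant_model` argument of `generate`. The pairing below is illustrative, not prescribed by the card, and assumes enough GPU memory for the 15B target model plus a tokenizer shared between the two checkpoints:

```python
# Illustrative only: tiny_starcoder_py drafting tokens for a larger StarCoder
# model via assisted generation. The target checkpoint and dtype are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_checkpoint = "bigcode/starcoderbase"         # large model being sped up
assistant_checkpoint = "bigcode/tiny_starcoder_py"  # small draft model (this card)

tokenizer = AutoTokenizer.from_pretrained(target_checkpoint)
target = AutoModelForCausalLM.from_pretrained(
    target_checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)
assistant = AutoModelForCausalLM.from_pretrained(assistant_checkpoint).to(target.device)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(target.device)
outputs = target.generate(**inputs, assistant_model=assistant, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```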

### Generation
```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/tiny_starcoder_py"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

### Fill-in-the-middle
Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:

```python
input_text = "<fim_prefix>def print_one_two_three():\n    print('one')\n    <fim_suffix>\n    print('three')<fim_middle>"
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
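
The decoded output repeats the prompt (special tokens included) and then appends the generated middle. A small post-processing sketch, assuming the token layout shown above:

```python
# Keep only the generated middle: everything after <fim_middle>, with the
# tokenizer's end-of-sequence token (if present) stripped.
decoded = tokenizer.decode(outputs[0])
middle = decoded.split("<fim_middle>")[-1].replace(tokenizer.eos_token, "").rstrip()
print(middle)
```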

# Training

## Model

- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- **Pretraining steps:** 50k
- **Pretraining tokens:** 100 billion
- **Precision:** bfloat16

## Hardware

- **GPUs:** 32 Tesla A100
- **Training time:** 18 hours


## Software

- **Orchestration:** [Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
- **BF16 if applicable:** [apex](https://github.com/NVIDIA/apex)

# License
The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).