pirroh committed on
Commit
4e7312f
1 Parent(s): 2c7d17d

Update README.md

Files changed (1)
  1. README.md +40 -4
README.md CHANGED
@@ -4,18 +4,54 @@ datasets:
  - bigcode/the-stack-dedup
  tags:
  - code
+ language:
+ - code
+ programming_language:
+ - Markdown
+ - Java
+ - JavaScript
+ - Python
+ - TypeScript
+ - PHP
+ - SQL
+ - JSX
+ - reStructuredText
+ - Rust
+ - C
+ - CSS
+ - Go
+ - C++
+ - HTML
+ - Vue
+ - Ruby
+ - Jupyter Notebook
+ - R
+ - Shell
+ model-index:
+ - name: replit-code-v1-3b
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       type: openai_humaneval
+       name: HumanEval (Python)
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value: 0.219
+       verified: false
  ---


- # replit-code-v1-3b
+ # replit-code-v1-3b [Test it ]

- `replit-code-v1-3b` is a 2.7B Causal Language Model focused on Code Completion. The model has been trained on a subset of the Stack Dedup v1.2 dataset.
+ `replit-code-v1-3b` is a 2.7B Causal Language Model focused on **Code Completion**. The model has been trained on a subset of the Stack Dedup v1.2 dataset.

- The training mixture includes 20 different languages, listed here in descending order of number of tokens:
+ The training mixture includes **20 different languages**, listed here in descending order of number of tokens:
  <br/>
  `Markdown`, `Java`, `JavaScript`, `Python`, `TypeScript`, `PHP`, `SQL`, `JSX`, `reStructuredText`, `Rust`, `C`, `CSS`, `Go`, `C++`, `HTML`, `Vue`, `Ruby`, `Jupyter Notebook`, `R`, `Shell`

- In total, the training dataset contains 175B tokens, which were repeated over 3 epochs -- in total, `replit-code-v1-3b` has been trained on 525B tokens (~195 tokens per parameter).
+ In total, the training dataset contains 175B tokens, which were repeated over 3 epochs -- in total, `replit-code-v1-3b` has been trained on **525B** tokens (~195 tokens per parameter).


  ## How to use the model
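
A quick sanity check on the training figures quoted in the README text of this diff (175B dataset tokens, 3 epochs, 2.7B parameters). This is only a sketch of the arithmetic; the function name `training_budget` and the variable names are illustrative and do not come from the repository.

```python
# Figures as quoted in the README text of this commit.
dataset_tokens = 175e9   # tokens in the training dataset
epochs = 3               # number of passes over the dataset
parameters = 2.7e9       # model size stated in the README


def training_budget(dataset_tokens, epochs, parameters):
    """Return (total tokens seen in training, tokens per parameter)."""
    total = dataset_tokens * epochs
    return total, total / parameters


total_tokens, tokens_per_param = training_budget(dataset_tokens, epochs, parameters)
print(f"total tokens: {total_tokens / 1e9:.0f}B")
print(f"tokens per parameter: {tokens_per_param:.1f}")
```

With these inputs the total comes out at 525B tokens, matching the README, and the ratio is about 194.4 tokens per parameter, which the README states as ~195.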