acf15626vp committed
Commit 7eff671
1 Parent(s): 6291643

upload model

README.md CHANGED
@@ -1,3 +1,61 @@
  ---
  license: mit
+ language:
+ - ja
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - gpt_neox
+ - gpt-neox
+ - japanese
+ inference:
+   parameters:
+     max_new_tokens: 128
+     do_sample: false
+     repetition_penalty: 1.1
  ---
+
+ # stockmark/gpt-neox-japanese-1.4b
+
+ This repository provides a GPT-NeoX based model with 1.4B parameters pre-trained on a Japanese corpus of about 20B tokens. This model was developed by [Stockmark Inc.](https://stockmark.co.jp/)
+
+ ## How to use
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Use torch.bfloat16 for A100 GPUs and torch.float16 for older-generation GPUs
+ torch_dtype = torch.bfloat16 if torch.cuda.is_available() and hasattr(torch.cuda, "is_bf16_supported") and torch.cuda.is_bf16_supported() else torch.float16
+
+ model = AutoModelForCausalLM.from_pretrained("stockmark/gpt-neox-japanese-1.4b", device_map="auto", torch_dtype=torch_dtype)
+ tokenizer = AutoTokenizer.from_pretrained("stockmark/gpt-neox-japanese-1.4b")
+
+ inputs = tokenizer("自然言語処理とは", return_tensors="pt").to(model.device)  # prompt: "Natural language processing is"
+ with torch.no_grad():
+     tokens = model.generate(
+         **inputs,
+         max_new_tokens=128,
+         repetition_penalty=1.1
+     )
+
+ output = tokenizer.decode(tokens[0], skip_special_tokens=True)
+ print(output)
+ ```
+
+ ## Training dataset
+ - Japanese Web Corpus (ja): 8.6B tokens (This dataset will not be released.)
+ - Wikipedia (ja): 0.88B tokens
+ - CC100 (ja): 10.5B tokens
+
+ ## Training settings
+ - Trained using HuggingFace Trainer and DeepSpeed (ZeRO-2)
+ - 8 A100 GPUs (40GB) at ABCI
+ - Mixed Precision (BF16)
+
+ ## License
+ [The MIT license](https://opensource.org/licenses/MIT)
+
+ ## Developed by
+ Stockmark Inc.
+ https://stockmark.co.jp/
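
The training-settings list above names the tooling (HuggingFace Trainer, DeepSpeed ZeRO-2, BF16 on 8×A100) but this commit includes no training configuration. Below is a minimal, hypothetical sketch of how such a run could be wired together; the batch sizes, the DeepSpeed values, and the `train_dataset` placeholder are illustrative assumptions, not Stockmark's actual setup.

```python
# Hypothetical Trainer + DeepSpeed ZeRO-2 sketch (not the authors' actual script).
from transformers import AutoConfig, AutoModelForCausalLM, Trainer, TrainingArguments

# ZeRO stage 2 shards optimizer states and gradients across the GPUs;
# "auto" lets the HF integration fill values in from TrainingArguments.
ds_config = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="gpt-neox-japanese-1.4b-run",  # illustrative
    per_device_train_batch_size=4,            # illustrative
    gradient_accumulation_steps=8,            # illustrative
    bf16=True,                                # BF16 mixed precision, as stated above
    deepspeed=ds_config,                      # accepts a dict or a path to a JSON file
)

# Pre-training starts from freshly initialized weights with the published architecture.
config = AutoConfig.from_pretrained("stockmark/gpt-neox-japanese-1.4b")
model = AutoModelForCausalLM.from_config(config)

# `train_dataset` is assumed to be a tokenized dataset prepared elsewhere:
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()   # typically launched with e.g. `torchrun --nproc_per_node=8 train.py`
```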
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "architectures": [
+     "GPTNeoXForCausalLM"
+   ],
+   "bos_token_id": 0,
+   "classifier_dropout": 0.1,
+   "eos_token_id": 0,
+   "hidden_act": "gelu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 8192,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 1024,
+   "model_type": "gpt_neox",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 1,
+   "rotary_emb_base": 10000,
+   "rotary_pct": 0.25,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.30.2",
+   "use_cache": true,
+   "use_parallel_residual": false,
+   "vocab_size": 50000
+ }
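
As a quick sanity check, the "1.4b" in the model name can be reproduced from the configuration above. The sketch below (an illustration, not part of the repository) recomputes the GPT-NeoX parameter count from `hidden_size`, `intermediate_size`, `num_hidden_layers`, and `vocab_size`, counting the input embedding, the untied LM head, and the 24 transformer blocks.

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("stockmark/gpt-neox-japanese-1.4b")
h, ff, L, V = cfg.hidden_size, cfg.intermediate_size, cfg.num_hidden_layers, cfg.vocab_size

embed_in = V * h                            # input embedding: 50000 x 2048
embed_out = V * h                           # untied LM head ("tie_word_embeddings": false)
attn = (h * 3 * h + 3 * h) + (h * h + h)    # fused QKV projection + output projection (with biases)
mlp = (h * ff + ff) + (ff * h + h)          # dense_h_to_4h + dense_4h_to_h (with biases)
norms = 2 * 2 * h                           # two LayerNorms per block (weight and bias)
per_block = attn + mlp + norms

total = embed_in + embed_out + L * per_block + 2 * h   # + final LayerNorm
print(f"{total / 1e9:.2f}B parameters")                # ~1.41B
```

At two bytes per parameter in bfloat16 this comes to roughly 2.8 GB, consistent with the ≈2.85 GB checkpoint files added later in this commit (the small remainder presumably being non-parameter buffers and serialization overhead).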
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": 0,
+   "pad_token_id": 1,
+   "transformers_version": "4.30.2"
+ }
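
The generation config only pins the special-token IDs and the transformers version; no decoding strategy is stored, so `model.generate()` falls back to the library default of greedy decoding unless parameters such as `max_new_tokens` or `repetition_penalty` are passed at call time, as the README example does. A small illustrative check:

```python
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("stockmark/gpt-neox-japanese-1.4b")
print(gen_cfg.bos_token_id, gen_cfg.eos_token_id, gen_cfg.pad_token_id)  # 0 0 1
print(gen_cfg.do_sample)  # False -> greedy decoding unless overridden at call time
```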
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2ce45432f8538cee7b698f847dcb7f31cea2e27d0322bcf2fa7bf12796246baf
+ size 2852015168
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2039be3d49e59904559ab49df85fa8ed58a35967e535d19f174b296a8695b300
+ size 2852090557
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "pad_token": "<|padding|>",
+   "unk_token": "<|endoftext|>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|endoftext|>",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<|padding|>",
+   "tokenizer_class": "GPTNeoXTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
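
Two details here are easy to misread: `model_max_length` is transformers' "no limit" sentinel (int(1e30)) rather than a real context length, so the practical limit is the model's `max_position_embeddings` of 1024, and `<|endoftext|>` serves as BOS, EOS, and UNK while `<|padding|>` is a distinct pad token. A small illustrative check (the printed IDs are expected to match `config.json` and `generation_config.json`):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("stockmark/gpt-neox-japanese-1.4b")
print(tok.bos_token, tok.eos_token, tok.unk_token, tok.pad_token)  # <|endoftext|> (x3) and <|padding|>
print(tok.eos_token_id, tok.pad_token_id)   # expected: 0 and 1
print(tok.model_max_length)                 # 1000000000000000019884624838656, i.e. effectively unbounded
```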