jacobthebanana committed on
Commit
18c2785
1 Parent(s): 7db7d07

Added JAX weights of galactica-1.3b converted from PyTorch

README.md CHANGED
@@ -1,3 +1,29 @@
  ---
  license: cc-by-nc-4.0
  ---
+
+ JAX weights converted from the PyTorch checkpoint at `facebook/galactica-1.3b`:
+
+ ```python
+ (env) ubuntu@vm:~$ JAX_PLATFORM_NAME=cpu python3
+ >>> import jax
+ >>> print(jax.devices())
+ [CpuDevice(id=0)]  # Ensure that model weights are loaded into CPU RAM, not accelerator memory.
+ >>> from transformers import FlaxOPTForCausalLM
+ >>> model = FlaxOPTForCausalLM.from_pretrained("facebook/galactica-1.3b", from_pt=True)
+ >>> model.push_to_hub(hf_model_repo)
+ ```
+
+ ## Citation and Attribution
+
+ The citation from the original repo is reproduced below, as per the cc-by-nc-4.0 license.
+
+ ```bibtex
+ @inproceedings{GALACTICA,
+     title={GALACTICA: A Large Language Model for Science},
+     author={Ross Taylor and Marcin Kardas and Guillem Cucurull and Thomas Scialom and Anthony Hartshorn and Elvis Saravia and Andrew Poulton and Viktor Kerkez and Robert Stojnic},
+     year={2022}
+ }
+ ```
+
+ > Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC).
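Once pushed, the converted Flax weights can be loaded from the Hub directly, with no `from_pt` round-trip. A minimal sketch — the repo id below is a placeholder for wherever `flax_model.msgpack` was pushed, and loading downloads the full ~2.6 GB checkpoint:

```python
from transformers import AutoTokenizer, FlaxOPTForCausalLM

# Placeholder repo id; substitute the actual Hub repo hosting flax_model.msgpack.
repo_id = "your-username/galactica-1.3b-flax"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = FlaxOPTForCausalLM.from_pretrained(repo_id)  # reads the Flax msgpack, no PyTorch needed

inputs = tokenizer("The Schwarzschild radius is", return_tensors="np")
outputs = model.generate(inputs["input_ids"], max_length=30)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```

As in the conversion transcript above, setting `JAX_PLATFORM_NAME=cpu` before launching Python keeps the weights in host RAM rather than accelerator memory.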
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "/content/base",
+   "_remove_final_layer_norm": false,
+   "activation_dropout": 0.0,
+   "activation_function": "gelu",
+   "architectures": [
+     "OPTForCausalLM"
+   ],
+   "attention_dropout": 0.1,
+   "bos_token_id": 0,
+   "do_layer_norm_before": true,
+   "dropout": 0.1,
+   "enable_bias": true,
+   "eos_token_id": 2,
+   "ffn_dim": 8192,
+   "hidden_size": 2048,
+   "init_std": 0.02,
+   "layer_norm_elementwise_affine": true,
+   "layerdrop": 0.0,
+   "learned_embeddings": true,
+   "max_position_embeddings": 2048,
+   "model_type": "opt",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 24,
+   "pad_token_id": 1,
+   "scale_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.25.1",
+   "use_cache": true,
+   "vocab_size": 50000,
+   "word_embed_proj_dim": 2048
+ }
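As a sanity check, the hyperparameters in this config roughly account for the advertised 1.3B parameters. A back-of-envelope estimate (assuming the standard OPT layout: tied input/output embeddings, positions offset by 2, biased linear layers, two layer norms per block):

```python
# Estimate the parameter count from the config values above.
vocab_size = 50000
hidden = 2048
ffn = 8192
layers = 24
max_pos = 2048

embed = vocab_size * hidden                        # token embeddings (tied with LM head)
pos = (max_pos + 2) * hidden                       # OPT offsets position ids by 2
attn = 4 * (hidden * hidden + hidden)              # q, k, v, out projections + biases
mlp = hidden * ffn + ffn + ffn * hidden + hidden   # fc1 + fc2 with biases
lns = 2 * 2 * hidden                               # two layer norms per block (weight + bias)
per_layer = attn + mlp + lns
total = embed + pos + layers * per_layer + 2 * hidden  # + final layer norm
print(f"{total / 1e9:.2f}B parameters")  # → 1.32B parameters
```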
flax_model.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:055d597d7985d15a748fda64122e254f01103ee8c99562c79d2fc23352a8b589
+ size 2630415559
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "bos_token": "</s>",
+   "eos_token": "</s>",
+   "pad_token": "<pad>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "model_max_length": 1000000000000000019884624838656,
+   "name_or_path": "facebook/galactica-1.3b",
+   "special_tokens_map_file": "/content/tokenizer/special_tokens_map.json",
+   "tokenizer_class": "PreTrainedTokenizerFast"
+ }
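The eye-catching `model_max_length` above is not corruption: it is the sentinel `transformers` writes when a tokenizer has no recorded length limit, `int(1e30)`. The float `1e30` is not exactly representable as a double, so converting it to an integer yields this particular value:

```python
# model_max_length in tokenizer_config.json is int(1e30): the nearest
# double to 1e30 sits 19884624838656 above the exact power of ten.
sentinel = int(1e30)
print(sentinel)  # → 1000000000000000019884624838656
```

In practice the effective context limit comes from the model config's `max_position_embeddings` (2048), not from the tokenizer.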