TearGosling committed
Commit: ee26d26
Parent(s): 15a0bd6
Create README.md
README.md
ADDED
@@ -0,0 +1,5 @@
This is a test of converting the architecture of [GPT-J 6B](https://huggingface.co/EleutherAI/gpt-j-6b) into a mixture-of-experts model. It is initialized with 4 experts (2 active per token) but otherwise the same configuration as GPT-J-6B, making this a 17B-parameter model in total.
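
For a rough sense of where the 17B figure comes from, here is a back-of-the-envelope parameter count. It assumes the conversion simply replicates each GPT-J MLP block into 4 experts and adds a small linear router per layer; the actual modeling code in this repo may differ in the details.

```python
# Rough parameter count for GPT-J-6B vs. the 4-expert MoE variant.
# Assumption: each layer's MLP is duplicated into 4 experts plus a small
# linear router; layer norms are ignored as negligible.
n_layer, d_model, d_ff, vocab = 28, 4096, 16384, 50400  # GPT-J-6B config
n_experts = 4

attn = 4 * d_model * d_model                                # q/k/v/out projections (no bias)
mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # fc_in + fc_out with biases
router = d_model * n_experts                                # assumed per-layer gating weights

embed = vocab * d_model                                     # token embeddings
lm_head = d_model * vocab + vocab                           # output head

dense = n_layer * (attn + mlp) + embed + lm_head
moe = n_layer * (attn + n_experts * mlp + router) + embed + lm_head

print(f"GPT-J-6B:     ~{dense / 1e9:.2f}B parameters")      # ~6.05B
print(f"4-expert MoE: ~{moe / 1e9:.2f}B parameters")        # ~17.3B
```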

The model weights were initialized *randomly* - not loaded from the pretrained GPT-J - for testing purposes. This model is not usable for any downstream purpose unless you're trying to generate absolute schizo babble - in which case, this model is perfect for your use case. You have been warned.

Be sure to pass `trust_remote_code=True` into `AutoModelForCausalLM.from_pretrained` if you still want to use this model for some god-forsaken reason.
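
A minimal usage sketch follows. The repo id below is a placeholder for this model's actual path on the Hub, and the tokenizer is taken from the original GPT-J-6B since this model shares its vocabulary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TearGosling/<this-repo>"  # placeholder: substitute this model's actual Hub path

# trust_remote_code=True lets transformers load the custom MoE modeling code
# shipped with this repository instead of the stock GPT-J implementation.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")  # same vocab as GPT-J-6B

inputs = tokenizer("The meaning of life is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expect pure babble
```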