Text Generation · Transformers · Safetensors · English · mega · Inference Endpoints
pszemraj committed on
Commit 9f5c251
1 Parent(s): 6f5f5ae

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -58,8 +58,6 @@ pipeline_tag: text-generation
 
 # BEE-spoke-data/mega-ar-126m-4k
 
-> model card WIP, more details to come
-
 
 This may not be the _best_ language model, but it is a language model! It's interesting for a few reasons, not the least of which is that it's technically not a transformer.
 
@@ -73,6 +71,11 @@ Details:
 - train-from-scratch
 
 
+For more info on MEGA (_& what some of the params above mean_), check out the [model docs](https://huggingface.co/docs/transformers/main/en/model_doc/mega#mega) or the [original paper](https://arxiv.org/abs/2209.10655).
+
+## Usage
+
+Use it as you would any small text-generation model. Given the model's small size and architecture, it's probably best to take advantage of the longer context length by providing the model with additional text/context so it can "see more" rather than "generate more".
 
 ## evals
 
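The "## Usage" paragraph added in this commit can be sketched with the standard `transformers` text-generation pipeline. This is a minimal sketch, not the card author's canonical snippet; the checkpoint name is taken from the README heading, and the prompt text is an illustrative placeholder.

```python
from transformers import pipeline

# Load the checkpoint named in the card's heading.
pipe = pipeline("text-generation", model="BEE-spoke-data/mega-ar-126m-4k")

# Per the Usage note: give the model plenty of context ("see more")
# rather than asking it for a long completion ("generate more").
prompt = (
    "The model card explains that this 126M-parameter model is technically "
    "not a transformer, and that its longer context length is best used by"
)
result = pipe(prompt, max_new_tokens=64, do_sample=True, temperature=0.8)
print(result[0]["generated_text"])
```

Keeping `max_new_tokens` modest while padding the prompt with relevant context plays to the model's 4k context window rather than its (limited) generation quality.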