Update README.md

README.md CHANGED
@@ -5,4 +5,115 @@ datasets:
language:
- en
pipeline_tag: text-generation
tags:
- research
- convolutional
- fft
- transformer-alternative
- causal-lm
---

# GCLM — Global Convolutional Language Model

## Model Summary

**GCLM (Global Convolutional Language Model)** is an experimental causal language model that replaces traditional self-attention with a hybrid **local + global convolutional architecture**.

Instead of attention heads, GCLM uses:
- **Local depthwise convolutions** for short-range context
- **FFT-based global convolutions** for long-range sequence modeling

This design explores whether **global receptive fields** can be achieved efficiently *without* quadratic attention, while remaining compatible with standard autoregressive language modeling.

> GCLM is a transformer alternative — not a transformer replacement.
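
For intuition, here is a minimal sketch of the FFT-based global convolution idea in PyTorch (an assumption; the repo's actual implementation and API may differ). Zero-padding to twice the sequence length turns the FFT's circular convolution into an ordinary linear one, which is what enforces causality, and the cost is O(L log L) in sequence length L rather than the O(L²) of attention. The name `fft_causal_conv` is illustrative:

```python
import torch

def fft_causal_conv(x: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Causal convolution of x (batch, seq, dim) with a per-channel kernel
    (seq, dim) via FFT. Padding to 2*seq avoids wrap-around, so output
    position t only mixes in input positions <= t."""
    seq_len = x.shape[1]
    n = 2 * seq_len                               # pad: circular -> linear conv
    x_f = torch.fft.rfft(x, n=n, dim=1)           # (batch, n//2+1, dim)
    k_f = torch.fft.rfft(kernel, n=n, dim=0)      # (n//2+1, dim), broadcasts
    y = torch.fft.irfft(x_f * k_f, n=n, dim=1)    # (batch, n, dim)
    return y[:, :seq_len]                         # keep only the causal part
```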

---

## Architecture Overview

- Token + learned positional embeddings
- Stacked convolutional blocks:
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every *N* layers
  - Feedforward MLP
  - Residual connections + LayerNorm
- Causal language modeling head
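
A hedged skeleton of one such block, reusing `fft_causal_conv` from the sketch above; the layer layout and hyperparameters here are assumptions for illustration, not the released configuration:

```python
import torch
import torch.nn as nn

class GCLMBlock(nn.Module):
    """One block: causal local depthwise + pointwise conv, an optional
    FFT-based global conv, and an MLP, each behind a pre-norm residual."""

    def __init__(self, dim: int, local_kernel: int = 7,
                 max_len: int = 8192, use_global: bool = False):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Padding by (kernel - 1) and trimming the tail keeps the conv causal.
        self.local = nn.Conv1d(dim, dim, local_kernel, groups=dim,
                               padding=local_kernel - 1)
        self.point = nn.Conv1d(dim, dim, 1)   # pointwise channel mixing
        self.use_global = use_global
        if use_global:
            # One learned full-length kernel per channel (illustrative).
            self.global_kernel = nn.Parameter(0.02 * torch.randn(max_len, dim))
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        seq_len = x.shape[1]
        h = self.norm1(x).transpose(1, 2)                 # (batch, dim, seq)
        h = self.local(h)[..., :seq_len]                  # trim to stay causal
        h = self.point(h).transpose(1, 2)                 # back to (B, L, D)
        if self.use_global:
            h = h + fft_causal_conv(h, self.global_kernel[:seq_len])
        x = x + h
        return x + self.mlp(self.norm2(x))
```

Stacking such blocks, with `use_global=True` every *N* layers, plus the embeddings and LM head, would give the overall model.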

**Key properties:**
- No attention mechanism
- No KV cache
- Linear memory scaling with sequence length
- Long-context friendly (tested up to 8k+ tokens)
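
One practical consequence of having no KV cache: autoregressive decoding re-runs the full forward pass at every step instead of appending to a cache. A minimal greedy-decoding sketch under that assumption (the `model` callable and shapes are illustrative):

```python
import torch

@torch.no_grad()
def greedy_generate(model, input_ids: torch.Tensor, max_new_tokens: int = 50):
    """Greedy decoding without a KV cache: each step re-runs the model on
    the full sequence, so memory stays linear in sequence length."""
    for _ in range(max_new_tokens):
        logits = model(input_ids)                        # (batch, seq, vocab)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids
```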

---

## Training Data

The model was trained on:
- **Skylion007/openwebtext**

This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.

---

## Intended Use

**Primary use cases:**
- Research into transformer alternatives
- Long-context modeling experiments
- Architectural ablation studies
- Educational exploration of non-attention sequence models

**Not intended for:**
- Safety-critical applications
- Medical, legal, or financial advice
- Deployment as a production chatbot without additional alignment work

---

## Limitations

- This model is **research-grade**, not instruction-tuned
- Outputs may be:
  - Incoherent
  - Factually incorrect
  - Biased or unsafe
- Performance characteristics differ significantly from transformer LMs
- No reinforcement learning or alignment tuning applied

---

## Ethical Considerations

GCLM was trained on publicly available web data and may reflect societal biases present in that data.

Users are responsible for:
- Applying appropriate filtering
- Avoiding harmful or misleading use cases
- Evaluating outputs critically

---

## License

This model is released under the **Apache License 2.0**.

You are free to:
- Use
- Modify
- Distribute
- Use commercially

Attribution and license preservation are required.
Patent rights are explicitly granted under this license.

---

## Citation

If you use GCLM in your research, please cite or reference the project.

## Important

The model weights will not be uploaded to this repository until training has finished.