umm-dev committed
Commit c517fa7 · verified · 1 Parent(s): b57c451

Update README.md

Files changed (1): README.md (+112 −1)
---
datasets:
- Skylion007/openwebtext
language:
- en
pipeline_tag: text-generation
tags:
- research
- convolutional
- fft
- transformer-alternative
- causal-lm
---

# GCLM: Global Convolutional Language Model

## Model Summary

**GCLM (Global Convolutional Language Model)** is an experimental causal language model that replaces traditional self-attention with a hybrid **local + global convolutional architecture**.

Instead of attention heads, GCLM uses:
- **Local depthwise convolutions** for short-range context
- **FFT-based global convolutions** for long-range sequence modeling

This design explores whether **global receptive fields** can be achieved efficiently *without* quadratic attention, while remaining compatible with standard autoregressive language modeling: an FFT-based convolution can apply a kernel as long as the sequence itself in O(n log n) time, versus the O(n²) cost of full self-attention.

> GCLM is a transformer alternative, not a transformer replacement.

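To make the second bullet concrete, here is a minimal sketch of how a causal global convolution can be computed with FFTs in PyTorch. This illustrates the general technique only; it is not GCLM's actual code, and the function name and tensor shapes are assumptions.

```python
import torch

def causal_fft_conv(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal global convolution via FFT (illustrative sketch).

    x: (batch, seq_len, channels) activations
    k: (seq_len, channels) per-channel kernel as long as the sequence
    """
    seq_len = x.shape[1]
    n = 2 * seq_len  # zero-pad so the circular FFT product acts as a linear conv
    x_f = torch.fft.rfft(x, n=n, dim=1)
    k_f = torch.fft.rfft(k, n=n, dim=0)
    y = torch.fft.irfft(x_f * k_f.unsqueeze(0), n=n, dim=1)
    # Keeping only the first seq_len outputs preserves causality:
    # position t depends only on inputs at positions <= t.
    return y[:, :seq_len]  # O(n log n) rather than O(n^2)
```

Because the kernel spans the whole sequence, every output position can in principle see the entire prefix, which is what "global receptive field" means here.
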
---

## Architecture Overview

- Token + learned positional embeddings
- Stacked convolutional blocks (sketched below), each containing:
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every *N* layers
  - Feedforward MLP
- Residual connections + LayerNorm
- Causal language modeling head

**Key properties:**
- No attention mechanism
- No KV cache
- Linear memory scaling with sequence length
- Long-context friendly (tested up to 8k+ tokens)

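The following is a speculative sketch of one such block, reusing `causal_fft_conv` from above. The module layout, `kernel_size`, maximum sequence length, and pre-norm residual placement are all assumptions; the actual implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCLMBlockSketch(nn.Module):
    """Hypothetical GCLM-style block: local causal conv (+ optional FFT conv) + MLP."""

    def __init__(self, dim: int, kernel_size: int = 4, use_global: bool = False,
                 max_len: int = 8192):
        super().__init__()
        self.kernel_size = kernel_size
        self.use_global = use_global
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Depthwise conv (groups=dim) mixes a short window per channel;
        # the pointwise 1x1 conv then mixes information across channels.
        self.local = nn.Conv1d(dim, dim, kernel_size, groups=dim)
        self.point = nn.Conv1d(dim, dim, 1)
        # One learnable global kernel per channel, up to max_len positions.
        self.global_kernel = nn.Parameter(torch.randn(max_len, dim) * 0.02)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, L, D)
        h = self.norm1(x).transpose(1, 2)                # (B, D, L)
        h = F.pad(h, (self.kernel_size - 1, 0))          # left-pad only => causal
        h = self.point(self.local(h)).transpose(1, 2)    # back to (B, L, D)
        if self.use_global:
            h = h + causal_fft_conv(h, self.global_kernel[: h.shape[1]])
        x = x + h                                        # residual around the convs
        return x + self.mlp(self.norm2(x))               # residual around the MLP
```

Since nothing in the block stores per-token state the way a KV cache does, memory grows linearly with sequence length, matching the properties listed above.
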
---

## Training Data

The model was trained on:
- **Skylion007/openwebtext**

This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.

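For reference, the corpus can be pulled from the Hugging Face Hub with the `datasets` library. Streaming is assumed here to avoid downloading the full corpus up front; depending on your `datasets` version, this script-based dataset may also require `trust_remote_code=True`.

```python
from datasets import load_dataset

# Stream examples instead of materializing the whole corpus on disk.
ds = load_dataset("Skylion007/openwebtext", split="train", streaming=True)
print(next(iter(ds))["text"][:200])  # each record has a single "text" field
```
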
---

## Intended Use

**Primary use cases:**
- Research into transformer alternatives
- Long-context modeling experiments
- Architectural ablation studies
- Educational exploration of non-attention sequence models

**Not intended for:**
- Safety-critical applications
- Medical, legal, or financial advice
- Deployment as a production chatbot without additional alignment work

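Once weights are published (see the note at the end of this card), loading will depend on how the architecture is packaged. If it ships as a custom `transformers`-compatible model, usage would presumably follow the standard pattern below; the repo id `umm-dev/gclm` is a placeholder, not a confirmed location.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "umm-dev/gclm"  # placeholder repo id; the card does not state the final one
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tok("Global convolutions can", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```
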
---

## Limitations

- This model is **research-grade**, not instruction-tuned
- Outputs may be:
  - Incoherent
  - Factually incorrect
  - Biased or unsafe
- Performance characteristics differ significantly from transformer LMs
- No reinforcement learning or alignment tuning applied

---

## Ethical Considerations

GCLM was trained on publicly available web data and may reflect societal biases present in that data.

Users are responsible for:
- Applying appropriate filtering
- Avoiding harmful or misleading use cases
- Evaluating outputs critically

---

## License

This model is released under the **Apache License 2.0**.

You are free to:
- Use
- Modify
- Distribute
- Use commercially

Attribution and license preservation are required.
Patent rights are explicitly granted under this license.

---

## Citation

If you use GCLM in your research, please cite or reference this project.

---

## Important

The model weights will not be uploaded to this repository until training has finished.