umm-dev committed
Commit c517fa7 · verified · 1 Parent(s): b57c451

Update README.md

Files changed (1): README.md (+112 −1)
---
datasets:
- Skylion007/openwebtext
language:
- en
pipeline_tag: text-generation
tags:
- research
- convolutional
- fft
- transformer-alternative
- causal-lm
---

# GCLM: Global Convolutional Language Model

## Model Summary

**GCLM (Global Convolutional Language Model)** is an experimental causal language model that replaces traditional self-attention with a hybrid **local + global convolutional architecture**.

Instead of attention heads, GCLM uses:
- **Local depthwise convolutions** for short-range context
- **FFT-based global convolutions** for long-range sequence modeling

This design explores whether **global receptive fields** can be achieved efficiently *without* quadratic attention, while remaining compatible with standard autoregressive language modeling: an FFT-based convolution can apply a kernel as long as the sequence itself in O(n log n) time, versus the O(n²) cost of full self-attention.

> GCLM is a transformer alternative, not a transformer replacement.

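To make the second bullet concrete, here is a minimal sketch of how a causal global convolution can be computed with FFTs in PyTorch. This illustrates the general technique only; it is not GCLM's actual code, and the function name and tensor shapes are assumptions.

```python
import torch

def causal_fft_conv(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal global convolution via FFT (illustrative sketch).

    x: (batch, seq_len, channels) activations
    k: (seq_len, channels) per-channel kernel as long as the sequence
    """
    seq_len = x.shape[1]
    n = 2 * seq_len  # zero-pad so the circular FFT product acts as a linear conv
    x_f = torch.fft.rfft(x, n=n, dim=1)
    k_f = torch.fft.rfft(k, n=n, dim=0)
    y = torch.fft.irfft(x_f * k_f.unsqueeze(0), n=n, dim=1)
    # Keeping only the first seq_len outputs preserves causality:
    # position t depends only on inputs at positions <= t.
    return y[:, :seq_len]  # O(n log n) rather than O(n^2)
```

Because the kernel spans the whole sequence, every output position can in principle see the entire prefix, which is what "global receptive field" means here.
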
---

## Architecture Overview

- Token + learned positional embeddings
- Stacked convolutional blocks (sketched below), each containing:
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every *N* layers
  - Feedforward MLP
- Residual connections + LayerNorm
- Causal language modeling head

**Key properties:**
- No attention mechanism
- No KV cache
- Linear memory scaling with sequence length
- Long-context friendly (tested up to 8k+ tokens)

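The following is a speculative sketch of one such block, reusing `causal_fft_conv` from above. The module layout, `kernel_size`, maximum sequence length, and pre-norm residual placement are all assumptions; the actual implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCLMBlockSketch(nn.Module):
    """Hypothetical GCLM-style block: local causal conv (+ optional FFT conv) + MLP."""

    def __init__(self, dim: int, kernel_size: int = 4, use_global: bool = False,
                 max_len: int = 8192):
        super().__init__()
        self.kernel_size = kernel_size
        self.use_global = use_global
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Depthwise conv (groups=dim) mixes a short window per channel;
        # the pointwise 1x1 conv then mixes information across channels.
        self.local = nn.Conv1d(dim, dim, kernel_size, groups=dim)
        self.point = nn.Conv1d(dim, dim, 1)
        # One learnable global kernel per channel, up to max_len positions.
        self.global_kernel = nn.Parameter(torch.randn(max_len, dim) * 0.02)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, L, D)
        h = self.norm1(x).transpose(1, 2)                # (B, D, L)
        h = F.pad(h, (self.kernel_size - 1, 0))          # left-pad only => causal
        h = self.point(self.local(h)).transpose(1, 2)    # back to (B, L, D)
        if self.use_global:
            h = h + causal_fft_conv(h, self.global_kernel[: h.shape[1]])
        x = x + h                                        # residual around the convs
        return x + self.mlp(self.norm2(x))               # residual around the MLP
```

Since nothing in the block stores per-token state the way a KV cache does, memory grows linearly with sequence length, matching the properties listed above.
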
---

## Training Data

The model was trained on:
- **Skylion007/openwebtext**

This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.

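For reference, the corpus can be pulled from the Hugging Face Hub with the `datasets` library. Streaming is assumed here to avoid downloading the full corpus up front; depending on your `datasets` version, this script-based dataset may also require `trust_remote_code=True`.

```python
from datasets import load_dataset

# Stream examples instead of materializing the whole corpus on disk.
ds = load_dataset("Skylion007/openwebtext", split="train", streaming=True)
print(next(iter(ds))["text"][:200])  # each record has a single "text" field
```
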
---

## Intended Use

**Primary use cases:**
- Research into transformer alternatives
- Long-context modeling experiments
- Architectural ablation studies
- Educational exploration of non-attention sequence models

**Not intended for:**
- Safety-critical applications
- Medical, legal, or financial advice
- Deployment as a production chatbot without additional alignment work

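Once weights are published (see the note at the end of this card), loading will depend on how the architecture is packaged. If it ships as a custom `transformers`-compatible model, usage would presumably follow the standard pattern below; the repo id `umm-dev/gclm` is a placeholder, not a confirmed location.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "umm-dev/gclm"  # placeholder repo id; the card does not state the final one
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tok("Global convolutions can", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```
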
---

## Limitations

- This model is **research-grade**, not instruction-tuned
- Outputs may be:
  - Incoherent
  - Factually incorrect
  - Biased or unsafe
- Performance characteristics differ significantly from transformer LMs
- No reinforcement learning or alignment tuning applied

---

## Ethical Considerations

GCLM was trained on publicly available web data and may reflect societal biases present in that data.

Users are responsible for:
- Applying appropriate filtering
- Avoiding harmful or misleading use cases
- Evaluating outputs critically

---

## License

This model is released under the **Apache License 2.0**.

You are free to:
- Use
- Modify
- Distribute
- Use commercially

Attribution and license preservation are required.
Patent rights are explicitly granted under this license.

---

## Citation

If you use GCLM in your research, please cite or reference this project.

---

## Important

The model weights will not be uploaded to this repository until training has finished.