---
license: mit
tags:
- pytorch
- gpt2
- text-generation
- fin-ai
- experimental
- in-training
- from-scratch
- automated-training
language:
- en
datasets:
- wikitext
- roneneldan/TinyStories
- openai/gsm8k
- squad
- imdb
- ag_news
- yelp_review_full
- cnn_dailymail
- billsum
- commonsense_qa
- hellaswag
- winogrande
- boolq
- race
- stanfordnlp/coqa
- allenai/c4
- Skylion007/openwebtext
- trivia_qa
- hotpot_qa
- microsoft/ms_marco
- duorc
- amazon_polarity
- zeroshot/twitter-financial-news-sentiment
- sciq
- quail
- wiki_qa
- paws
- medical_questions_pairs
- app_reviews
- rotten_tomatoes
metrics:
- perplexity
library_name: pytorch
pipeline_tag: text-generation
---

# EXPERIMENTAL MODEL - Training from scratch

[GitHub](https://github.com/MeridianAlgo/FinAI) • [Training Logs](https://wandb.ai/meridianalgo-meridianalgo/fin-ai) • [Report Issue](https://github.com/MeridianAlgo/FinAI/issues)

## Important Notice

This model is being trained from scratch, and its outputs will be gibberish initially.

- Brand new model: starting from random weights
- Training time needed: 2-4 weeks for basic coherence
- Automated training: every 1 hour 10 minutes via GitHub Actions
- Current quality: expect complete nonsense initially
- Purpose: research/experimental continuous learning

## Model Overview

| Specification | Value |
|---|---|
| Architecture | GPT-2 style Transformer |
| Parameters | 30,142,848 (~30M) |
| Layers | 6 |
| Attention Heads | 6 |
| Embedding Dimension | 384 |
| Feed-Forward Dimension | 1,536 |
| Max Sequence Length | 512 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |
| Position Encoding | Rotary (RoPE) |
| Activation | GELU |
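
The specification table maps onto a small set of hyperparameters. The sketch below shows how such a GPT-2-style decoder configuration could be expressed in Python; `FinAIConfig` and its field names are illustrative stand-ins, not the actual classes in the FinAI codebase.

```python
# Hypothetical config mirroring the specification table; not the project's actual class.
from dataclasses import dataclass

@dataclass
class FinAIConfig:
    vocab_size: int = 50_257   # GPT-2 tokenizer vocabulary
    max_seq_len: int = 512     # maximum context length in tokens
    n_layers: int = 6          # transformer decoder blocks
    n_heads: int = 6           # attention heads per block
    embed_dim: int = 384       # model / embedding width
    ff_dim: int = 1_536        # feed-forward hidden size (4 x embed_dim)
    rope: bool = True          # rotary position embeddings (RoPE) instead of learned ones
    activation: str = "gelu"

config = FinAIConfig()
assert config.embed_dim % config.n_heads == 0  # head_dim = 384 / 6 = 64
```
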

## Training Details

### Training Schedule

- Frequency: Every 1 hour 10 minutes (~20 cycles/day)
- Steps per cycle: 800 steps
- Daily steps: ~16,500 steps
- Weekly steps: ~115,200 steps
- Batch size: 8 (effective: 32 with gradient accumulation)
- Learning rate: 3e-4 with cosine decay
- Warmup steps: 100
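
These hyperparameters correspond to a fairly standard PyTorch training step. The sketch below is illustrative rather than the project's actual loop: it assumes a `model` whose forward pass returns logits of shape `(batch, seq, vocab)` and a `dataloader` yielding `(8, 512)` token batches (both assumptions), and uses `get_cosine_schedule_with_warmup` from `transformers` for the warmup plus cosine decay.

```python
# Illustrative training step matching the listed hyperparameters (not the project's code).
import torch
import torch.nn.functional as F
from transformers import get_cosine_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=800
)

accum_steps = 4  # micro-batch 8 x 4 accumulation steps = effective batch size 32
model.train()
for step, input_ids in enumerate(dataloader):   # input_ids: (8, 512) token IDs (assumed)
    logits = model(input_ids)                   # assumed forward signature
    loss = F.cross_entropy(                     # next-token prediction loss
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        # Example clipping threshold; the card only mentions gradient-norm monitoring.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```
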

### Training Infrastructure

- Platform: GitHub Actions (free tier)
- Hardware: CPU only
- Training time: ~15-20 minutes per cycle
- Automatic upload: To Hugging Face after each cycle
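
The per-cycle upload can be done with the `huggingface_hub` client. A minimal sketch, assuming the checkpoint files are written to `./fin_ai_model/` and that a write token is available to the workflow (for example via an `HF_TOKEN` secret); neither detail is confirmed by this card.

```python
# Hypothetical upload step run at the end of each training cycle.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from the environment or a cached login
for filename in ("model.pt", "config.json"):
    api.upload_file(
        path_or_fileobj=f"./fin_ai_model/{filename}",
        path_in_repo=filename,
        repo_id="MeridianAlgo/Fin.AI",
        commit_message="Automated training cycle checkpoint",
    )
```
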

### Datasets (30 total, rotating per cycle)

The model trains on a diverse set of 30 datasets, cycling through one dataset per training cycle; a minimal rotation sketch follows the category list below.

**Knowledge & Reference**
- WikiText-2, OpenWebText, C4

**Creative Writing**
- TinyStories

**News & Articles**
- CNN/DailyMail, AG News, BillSum

**Question Answering**
- SQuAD, CoQA, TriviaQA, HotpotQA, MS MARCO, WikiQA, QuAIL

**Reasoning & Logic**
- GSM8K (math), CommonsenseQA, HellaSwag, WinoGrande, BoolQ

**Reading Comprehension**
- RACE, DuoRC

**Reviews & Sentiment**
- IMDB, Yelp, Amazon Polarity, Rotten Tomatoes, App Reviews

**Scientific & Medical**
- SciQ, Medical Question Pairs

**Financial**
- Twitter Financial News Sentiment

**Paraphrase & Similarity**
- PAWS
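
One simple way to implement such a rotation, shown purely as a sketch rather than the FinAI workflow's actual selection logic, is to index the dataset list by the current cycle number (the list here is truncated for brevity):

```python
# Hypothetical dataset rotation: pick one dataset per training cycle.
import time
from datasets import load_dataset

DATASETS = [
    ("wikitext", "wikitext-2-raw-v1"),
    ("roneneldan/TinyStories", None),
    ("imdb", None),
    # ... remaining entries from the 30-dataset list above
]

cycle_index = int(time.time() // (70 * 60))           # one cycle every 70 minutes
name, config = DATASETS[cycle_index % len(DATASETS)]  # rotate through the full list
dataset = load_dataset(name, config, split="train")
```
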

## Training Progress

### Current Status

- Version: v2.0.0
- Training started: December 28, 2024
- Model type: fresh_init
- Total parameters: 30,142,848

### Expected Timeline

| Week | Expected Quality | Description |
|---|---|---|
| 1 | Gibberish | Random weights, no coherence |
| 2 | Patterns | Some token patterns emerging |
| 3-4 | Basic | Simple word sequences |
| 5-8 | Improving | Short coherent phrases |
| 9-12 | Decent | Usable for simple tasks |

### Monitoring

- GitHub Actions: [View Training Runs](https://github.com/MeridianAlgo/FinAI/actions)
- Wandb Dashboard: [View Metrics](https://wandb.ai/meridianalgo-meridianalgo/fin-ai)
- Model Updates: This page updates automatically

## Usage

### Installation

```bash
pip install torch transformers huggingface-hub
```

### Download Model

```python
from huggingface_hub import hf_hub_download
import os

# Create a local directory for the checkpoint
os.makedirs("./fin_ai_model", exist_ok=True)

# Download model files from the Hub
hf_hub_download("MeridianAlgo/Fin.AI", "model.pt", local_dir="./fin_ai_model")
hf_hub_download("MeridianAlgo/Fin.AI", "config.json", local_dir="./fin_ai_model")
```

### Generate Text (Experimental)

```python
import torch
from transformers import AutoTokenizer

from fin_ai.model import FinAIModel

# Load model
model = FinAIModel.from_pretrained("./fin_ai_model")
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Generate text (expect poor quality initially)
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=50,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
    )

generated_text = tokenizer.decode(output[0])
print(generated_text)

# Note: output quality is poor initially and improves over weeks of training
```

## Technical Details

### Architecture Improvements (v2.0)

Compared to v1.x:
- 3x more parameters (10M → 30M)
- Deeper architecture (4 → 6 layers)
- Larger embeddings (256 → 384 dimensions)
- More attention heads (4 → 6)
- More training steps per cycle (600 → 800)

### Training Configuration

```yaml
model:
  size_preset: "small"
  n_layers: 6
  n_heads: 6
  embed_dim: 384
  ff_dim: 1536
  max_seq_len: 512

training:
  batch_size: 8
  gradient_accumulation_steps: 4
  learning_rate: 3.0e-4
  weight_decay: 0.01
  warmup_steps: 100
  max_steps: 800
```
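
If this configuration is stored as a YAML file, it can be loaded with PyYAML as sketched below; the filename `config.yaml` and the PyYAML dependency are assumptions for illustration.

```python
# Illustrative: load the training configuration from a YAML file (filename assumed).
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["n_layers"])      # 6
print(cfg["training"]["max_steps"])  # 800
```
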

## Evaluation

### Metrics Tracked

- Training Loss: Cross-entropy loss
- Perplexity: exp(loss)
- Tokens/Second: Training throughput
- Learning Rate: Cosine schedule with warmup
- Gradient Norm: For stability monitoring
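
Perplexity follows directly from the mean cross-entropy loss. A minimal sketch of the computation (the function below is illustrative, not the project's logging code):

```python
# Illustrative metric computation: perplexity = exp(mean cross-entropy loss).
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    # logits: (batch, seq_len, vocab_size), targets: (batch, seq_len) token IDs
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
    return torch.exp(loss).item()
```
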

### Benchmarks (Coming Soon)

Once the model reaches basic coherence, we'll evaluate on:
- HellaSwag (common sense)
- LAMBADA (reading comprehension)
- WikiText perplexity
- Custom generation quality tests

## Limitations

- Early Training: Model is in very early training stages
- Output Quality: Expect gibberish for several weeks
- CPU Training: Slower than GPU training
- Small Model: 30M parameters is relatively small
- Limited Context: 512 token context window
- No Fine-tuning: Base model only, not instruction-tuned
- English Only: Trained primarily on English text

## Contributing

This is an open research project! Contributions welcome:

- Code: [GitHub Repository](https://github.com/MeridianAlgo/FinAI)
- Issues: [Report Problems](https://github.com/MeridianAlgo/FinAI/issues)
- Discussions: Join Discussion

## License

MIT License - See LICENSE

## Links

- Repository: https://github.com/MeridianAlgo/FinAI
- Training Logs: https://wandb.ai/meridianalgo-meridianalgo/fin-ai
- GitHub Actions: https://github.com/MeridianAlgo/FinAI/actions
- Issues: https://github.com/MeridianAlgo/FinAI/issues
Last Updated: 2025-12-28 17:54 UTC
Status: Training from Scratch
Quality: Expect gibberish (2-4 weeks needed)