Commit History

feat: add 760M inference samples and quality audit
9db74d9

lzwjava committed on

feat: add 760M training logs from MI300X run
a553558

lzwjava committed on

docs: update README
8977f51

lzwjava committed on

Add working directory note to AMD download script docstring
941c05c

lzwjava committed on

Add FineWeb download script for AMD/US environment (direct HuggingFace, no mirror)
eb61d41

lzwjava committed on

Add deepseek run_lite and fineweb extract/tokenize scripts
108bc6b

lzwjava committed on

Update the download plan script to handle .tar.gz files (for Llama 3 models) in addition to the previously supported .tar files. The code checks whether the target filename ends with .tar.gz and, if so, adjusts the extraction command to use tar's gzip mode (tar xzf) for gzip-compressed archives. This ensures compatibility with Llama 3 models, which are distributed in the .tar.gz format.
ce535a0

lzwjava committed on

feat(download): track shard progress in progress.json for resumability
9b45962

lzwjava Claude Opus 4.7 (1M context) committed on

refactor(download): hardcode hf-mirror endpoint for China access
ca6fdcb

lzwjava Claude Opus 4.7 (1M context) committed on

chore: add ruff pre-commit hooks and apply formatting
468f6c2

lzwjava Claude Opus 4.7 (1M context) committed on

feat(download): add hardcoded 100B-token GPT-3 ablation downloader
693fe79

lzwjava Claude Opus 4.7 (1M context) committed on

feat(download): add token-budgeted FineWeb shard planner/downloader
8af1a22

lzwjava Claude Opus 4.7 (1M context) committed on

docs: add pending RAG and knowledge base features to README
5b13aa2

lzwjava Claude Sonnet 4.6 committed on

chore(logs): add gen1.txt log file
684f4e3

lzwjava committed on

build(logs): add FineWeb training logs
f41516d

lzwjava committed on

chore: add datasets directory with gitignore rules
f4a514a

lzwjava Claude Opus 4.6 (1M context) committed on

chore: restore training logs to logs directory
93494d5

lzwjava Claude Opus 4.6 (1M context) committed on

refactor: reorganize project structure
4f685ca

lzwjava Claude Opus 4.6 (1M context) committed on

docs: add README.md with project overview and usage
7edf0a2

lzwjava Claude Opus 4.6 (1M context) committed on

Add FineWeb download and extraction scripts (pyarrow iter_batches)
bcefcfb

lzwjava committed on

feat(deps): add requirements.txt for Python dependencies
f45aed1

lzwjava Claude Opus 4.6 (1M context) committed on

feat(train): add training duration calculation script
0060c88

lzwjava committed on