Diaries of Open Source. Part 6!
🏎️xAI releases Grok-1, a 314B MoE
Blog: https://x.ai/blog/grok-os
GH repo: https://github.com/xai-org/grok-1
Model: xai-org/grok-1
🕺MusicLang, a model for controllable music generation
Demo: musiclang/musiclang-predict
GH repo: https://github.com/musiclang/musiclang_predict
🔬BioT5: a family of models for biology and chemical text tasks
Base model: QizhiPei/biot5-base
Models for molecule captioning and design: QizhiPei/biot5-base-mol2text and QizhiPei/biot5-base-text2mol
GH Repo: https://github.com/QizhiPei/BioT5
Paper: BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations (2310.07276)
🤏Check out the official AQLM and QMoE weights from the ISTA-DASLab
Org: https://hf.co/ISTA-DASLab
Papers: Extreme Compression of Large Language Models via Additive Quantization (2401.06118) and QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (2310.16795)
🚀Community releases
Einstein-v4-7B, a Mistral fine-tune on high-quality data Weyaxi/Einstein-v4-7B
IL-7B, a Mistral fine-tune merge for rheumatology cmcmaster/il_7b
Caselaw Access Project, a collaboration to digitize 40 million US court decisions from 6.7 million cases spanning 360 years https://hf.co/datasets/TeraflopAI/Caselaw_Access_Project
🌍Data and models around the world
HPLT Monolingual, a dataset covering 75 languages with over 40TB of data HPLT/hplt_monolingual_v1_2
OpenLLM Turkish Benchmarks & Leaderboard malhajar/openllmturkishleadboard-datasets-65e5854490a87c0f2670ec18 and malhajar/OpenLLMTurkishLeaderboard
Occiglot, a collaborative effort for European LLMs with an initial release of 7B models for French, German, Spanish, and Italian occiglot/occiglot-eu5-7b-v01-65dbed502a6348b052695e01
Guftagoo, a Hindi+Hinglish multi-turn conversational dataset https://hf.co/datasets/Tensoic/gooftagoo
AryaBhatta-Orca-Maths-Hindi dataset https://hf.co/datasets/GenVRadmin/Aryabhatta-Orca-Maths-Hindi