Daniel Han-Chen

danielhanchen

AI & ML interests

None yet

Recent Activity

updated a model 40 minutes ago
unsloth/gemma-2-9b-bnb-4bit
updated a model 40 minutes ago
unsloth/Phi-3.5-mini-instruct-bnb-4bit
updated a model 40 minutes ago
unsloth/gemma-2-2b-bnb-4bit

danielhanchen's activity

posted an update about 14 hours ago
replied to louisbrulenaudet's post 2 months ago
reacted to louisbrulenaudet's post with πŸ”₯β€οΈπŸ‘€ 2 months ago
The Romulus model series has been released on Hugging Face, continually pre-trained on 34,864,949 tokens of French laws and intended to serve as a foundation for fine-tuning on labeled data πŸ€—

The training code, dataset, and model weights are open and freely available on HF, and training was run on an H100 provided by Microsoft for Startups, using Unsloth AI by @danielhanchen and @shimmyshimmer πŸ¦₯

Link to the base model: louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1

Link to the instruct model: louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1-Instruct

Link to the dataset: louisbrulenaudet/Romulus-cpt-fr

Please note that these models have not been aligned to produce usable text as they stand, and they will need to be fine-tuned on the desired tasks to produce satisfactory results.
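A minimal sketch (not from the post) of loading the Romulus base model with Unsloth for such fine-tuning on labeled data; the sequence length, 4-bit loading, and LoRA hyperparameters below are placeholder assumptions:

```python
# Minimal sketch: load the Romulus base model with Unsloth for further fine-tuning.
# max_seq_length, load_in_4bit, and the LoRA settings are assumptions, not values from the post.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters before supervised fine-tuning on labeled data.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```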
replied to their post 7 months ago
replied to their post 7 months ago
posted an update 7 months ago
Yay we got 500K+ monthly HF downloads on our Unsloth HF repo! :) Super appreciate everyone in the OSS community - and thanks for using Unsloth!!
reacted to loubnabnl's post with πŸ€— 9 months ago
⭐ Today we’re releasing The Stack v2 & StarCoder2: a series of 3B, 7B & 15B code generation models trained on 3.3 to 4.5 trillion tokens of code:

- StarCoder2-15B matches or outperforms CodeLlama 34B, and approaches DeepSeek-33B on multiple benchmarks.
- StarCoder2-3B outperforms StarCoderBase-15B and similar sized models.
- The Stack v2 is a 4x larger dataset than The Stack v1, resulting in 900B unique code tokens πŸš€
As always, we released everything from models and datasets to curation code. Enjoy!

πŸ”— StarCoder2 collection: bigcode/starcoder2-65de6da6e87db3383572be1a
πŸ”— Paper: https://drive.google.com/file/d/17iGn3c-sYNiLyRSY-A85QOzgzGnGiVI3/view
πŸ”— BlogPost: https://huggingface.co/blog/starcoder2
πŸ”— Code Leaderboard: bigcode/bigcode-models-leaderboard
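A minimal sketch (not from the post) of running one of the released checkpoints for code completion with transformers; the 3B checkpoint ID below is from the linked collection, but verify it on the Hub:

```python
# Minimal sketch: code completion with StarCoder2-3B via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```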
reacted to akhaliq's post with πŸ‘ 9 months ago
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (2402.17764)

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
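An illustrative sketch (not the authors' code) of the absmean ternary quantization the paper describes, where weights are scaled by their mean absolute value and rounded into {-1, 0, 1}:

```python
# Illustrative sketch of absmean ternary quantization for BitNet b1.58-style weights.
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    scale = w.abs().mean().clamp(min=eps)          # per-tensor absmean scale
    w_ternary = (w / scale).round().clamp(-1, 1)   # every weight becomes -1, 0, or 1
    return w_ternary, scale                        # approximate dequantization: w_ternary * scale
```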
reacted to FremyCompany's post with πŸ‘ 9 months ago
πŸ”₯ What's that biomedical model that got 170,763 downloads last month on HuggingFace?! Well, the paper is finally published! #BioLORD

πŸ“° Read our article in the Journal of the American Medical Informatics Association:
https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocae029/7614965

πŸ“TLDR: BioLORD-2023 is a series of semantic language models for the biomedical domain, capable of representing clinical concepts and sentences in a semantic space aligned with human preferences. Our new multilingual version supports 50+ languages and is further finetuned on 7 European languages. These models were trained contrastively and through distillations, using a corpus unifying in the same latent space the concept names of biomedical concepts and their descriptions. For concepts which didn't have a description written by humans in UMLS, we use information contained in the SnomedCT knowledge graph and the capabilities of ChatGPT to generate synthetic data and improve our results.

πŸ€— Access our models from the HuggingFace hub, including the new 2023-C and 2023-S variants:
FremyCompany/BioLORD-2023
FremyCompany/BioLORD-2023-M
FremyCompany/BioLORD-2023-S
FremyCompany/BioLORD-2023-C
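A minimal sketch (assuming the standard sentence-transformers interface; check the model card) of embedding clinical phrases with BioLORD-2023 and comparing them by cosine similarity:

```python
# Minimal sketch: embed clinical phrases with BioLORD-2023 and compare them.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("FremyCompany/BioLORD-2023")
sentences = ["myocardial infarction", "heart attack", "fractured femur"]
embeddings = model.encode(sentences, normalize_embeddings=True)

# The synonym pair (myocardial infarction vs. heart attack) should score highest.
print(util.cos_sim(embeddings[0:1], embeddings[1:]))
```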
reacted to mvaloatto's post with ❀️ 9 months ago
Want more β€œgood machine learning” in your X feed? Here is a new Space for you:
πŸ”” Top HF Users To Follow On X - https://huggingface.co/spaces/mvaloatto/HF2X

Ever since I fell down the AI rabbit hole, it hasn’t been super easy to spot and follow the most impactful Hugging Face contributors on X. So, inspired by @Weyaxi leaderboards, I decided to create a list just for this purpose.

Why, you ask?

First, it’s quite surprising how so many talented AI pioneers and independent contributors on X don't get the visibility/reach you might expect. Sad but true: follower count doesn't always match up with the value or innovation an individual brings to the table (just stating the obvious here).

Open source AI, in particular, thrives not just on innovation but also on the collective spirit of its believers and builders. With Hugging Face standing out as a prime hub for top AI engineers and contributors, compiling a directory of X profiles from influential figures on this platform felt like a natural step.

This Space aims to not only connect these top contributors but also guide open AI enthusiasts and newcomers towards the field's leading lights.

I put this modest page together using some web scraping and what I remember from my web dev class ages ago! Suggestions/likes are welcome - I’m hoping to keep tweaking/upgrading it, especially if you all find it useful.

Now, let’s follow each other! It’s time to accelerate the dissemination of our ideas, encourage collaboration within our community, and ensure that open AI developments receive the attention and recognition they deserve. πŸ”₯
reacted to clem's post with ❀️ 9 months ago
Terribly excited about open-source + on-device AI these days! Great to see @qualcomm release 80+ models optimized and curated for their devices and chips on HF: https://huggingface.co/qualcomm

reacted to harpreetsahota's post with ❀️ 9 months ago
google/gemma-7b-it is super good!

I wasn't convinced at first, but after vibe-checking it...I'm quite impressed.

I've got a notebook here, which is kind of a framework for vibe-checking LLMs.

In this notebook, I take Gemma for a spin on a variety of prompts:
β€’ [nonsensical tokens](harpreetsahota/diverse-token-sampler)
β€’ [conversation where I try to get some PII](harpreetsahota/red-team-prompts-questions)
β€’ [summarization ability](lighteval/summarization)
β€’ [instruction following](harpreetsahota/Instruction-Following-Evaluation-for-Large-Language-Models)
β€’ [chain of thought reasoning](ssbuild/alaca_chain-of-thought)

I then used LangChain evaluators (GPT-4 as judge) and tracked everything in LangSmith. I made public links to the traces so you can inspect the runs.

I hope you find this helpful, and I am certainly open to feedback, criticisms, or ways to improve.

Cheers:

You can find the notebook here: https://colab.research.google.com/drive/1RHzg0FD46kKbiGfTdZw9Fo-DqWzajuoi?usp=sharing
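A minimal sketch (an assumed setup, not taken from the notebook) of prompting gemma-7b-it through its chat template with transformers:

```python
# Minimal sketch: prompt google/gemma-7b-it via its chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```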
reacted to vladbogo's post with πŸ‘ 9 months ago
Genie is a new method from Google DeepMind that generates interactive, action-controllable virtual worlds from unlabelled internet videos.

Keypoints:
* Genie leverages a spatiotemporal video tokenizer, an autoregressive dynamics model, and a latent action model to generate controllable video environments.
* The model is trained on video data alone, without requiring action labels, using unsupervised learning to infer latent actions between frames.
* The method restricts the size of the action vocabulary to 8 to ensure that the number of possible latent actions remains small.
* The dataset used for training is generated by filtering publicly available internet videos with specific criteria related to 2D platformer games, yielding a total of 6.8M videos.

Paper: Genie: Generative Interactive Environments (2402.15391)
Project page: https://sites.google.com/view/genie-2024/
More detailed overview in my blog: https://huggingface.co/blog/vladbogo/genie-generative-interactive-environments

Congrats to the authors for their work!
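An illustrative sketch (not the authors' code) of the idea behind the small latent action vocabulary: continuous action embeddings inferred between frames are snapped to the nearest entry in a codebook of only 8 codes:

```python
# Illustrative sketch: snapping continuous action embeddings to an 8-entry codebook.
import torch
import torch.nn as nn

class LatentActionCodebook(nn.Module):
    def __init__(self, num_actions: int = 8, dim: int = 32):
        super().__init__()
        self.codebook = nn.Embedding(num_actions, dim)

    def forward(self, action_embedding: torch.Tensor):
        # action_embedding: (batch, dim) continuous vector inferred between frames
        distances = torch.cdist(action_embedding, self.codebook.weight)  # (batch, num_actions)
        indices = distances.argmin(dim=-1)                               # discrete action id
        quantized = self.codebook(indices)
        # Straight-through estimator so gradients still reach the encoder.
        quantized = action_embedding + (quantized - action_embedding).detach()
        return quantized, indices
```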
reacted to DmitryRyumin's post with ❀️ 9 months ago
πŸš€πŸ”₯🌟 New Research Alert - ICLR 2024! 🌟πŸ”₯πŸš€
πŸ“„ Title: FuseChat: Revolutionizing Chat Model Fusion πŸŒŸπŸš€

πŸ‘₯ Authors: @Wanfq , @passerqxj et al.

πŸ“… Conference: ICLR, May 7-11, 2024 | Vienna, Austria πŸ‡¦πŸ‡Ή

πŸ”— Paper: FuseChat: Knowledge Fusion of Chat Models (2402.16107)
πŸ”— Repository: https://github.com/fanqiwan/FuseLLM

πŸ”₯ Models πŸ€–:
1️⃣ FuseChat-7B-VaRM: FuseAI/FuseChat-7B-VaRM
2️⃣ FuseChat-7B-Slerp: FuseAI/FuseChat-7B-Slerp
3️⃣ OpenChat-3.5-7B-Solar: FuseAI/OpenChat-3.5-7B-Solar
4️⃣ FuseChat-7B-TA: FuseAI/FuseChat-7B-TA
5️⃣ OpenChat-3.5-7B-Mixtral: FuseAI/OpenChat-3.5-7B-Mixtral

πŸ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

πŸ” Keywords: #FuseChat #ChatModels #KnowledgeFusion #ICLR2024 #AI #Innovation #FuseLLM
reacted to merve's post with πŸ€πŸ‘ 9 months ago
I've tried DoRA (https://arxiv.org/abs/2402.09353) with SDXL using PEFT, and the outputs are quite detailed 🀩🌟
As usual I trained on a LEGO dataset I compiled, and compared the results with a previously trained pivotal-tuned model and the normal DreamBooth model before that 😊

Notebook by @linoyts https://colab.research.google.com/drive/134mt7bCMKtCYyYzETfEGKXT1J6J50ydT?usp=sharing
Integration to PEFT by @BenjaminB https://github.com/huggingface/peft/pull/1474 (more info in the PR)
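A minimal sketch of enabling DoRA through PEFT's `LoraConfig` (the `use_dora` flag added in the PR above); the rank and target modules shown are typical SDXL attention projections but are assumptions, not values from the post:

```python
# Minimal sketch: a DoRA adapter config via PEFT (use_dora=True switches LoRA layers to DoRA).
from peft import LoraConfig

dora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    use_dora=True,
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],  # assumed SDXL UNet attention modules
)
# Pass this config to the DreamBooth/LoRA training setup in place of a plain LoRA config.
```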
reacted to akhaliq's post with β€οΈπŸ‘ 9 months ago
ChatMusician

Understanding and Generating Music Intrinsically with LLM

ChatMusician: Understanding and Generating Music Intrinsically with LLM (2402.16153)

While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc, surpassing GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 on zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B token music-language corpora MusicPile, the collected MusicTheoryBench, code, model and demo in GitHub.