muhtasham (Muhtasham Oblokulov)

upvoted an article 1 day ago

Article

Introducing the Open Arabic LLM Leaderboard

2 days ago

• 29

upvoted a collection 7 days ago

Granite Code Models

Collection

A series of code models trained by IBM and licensed under the Apache 2.0 license. We release both the base pretrained and instruct models. • 10 items • Updated 4 days ago • 115

upvoted a collection 11 days ago

Llama3-ChatQA-1.5

Collection

Llama3-ChatQA-1.5 models excel at conversational question answering (QA) and retrieval-augmented generation (RAG). • 6 items • Updated 12 days ago • 34

upvoted 3 collections 21 days ago

upvoted a collection 27 days ago

Llama 3

Collection

8 items • Updated 27 days ago • 11

upvoted a collection 28 days ago

gazelle v0.2

Collection

2 items • Updated Mar 19 • 11

upvoted an article about 1 month ago

Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

about 1 month ago

• 124

upvoted a collection about 1 month ago

CodeGemma Release

Collection

16 items • Updated 1 day ago • 57

upvoted 2 articles about 1 month ago

Article

CodeGemma - an official Google release for code LLMs

Apr 9

• 95

Article

Fine-tune Llama 2 with DPO

Aug 8, 2023

• 11

upvoted 3 collections about 1 month ago

Aurora-M models

Collection

Aurora-M models (base, biden-harris redteams and instruct) • 5 items • Updated 10 days ago • 15

A little guide to building Large Language Models in 2024

Collection

Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757 • 19 items • Updated Apr 1 • 13

The SPRIGHT T2I collection

Collection

This collection contains the datasets, model, paper, and demo associated with the SPRIGHT (SPatially RIGHT) release. • 5 items • Updated Apr 2 • 3

upvoted a paper about 1 month ago

The Case for Co-Designing Model Architectures with Hardware

Paper • 2401.14489 • Published Jan 25 • 2

upvoted 2 collections about 2 months ago

Qwen1.5

Collection

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 3 days ago • 165

DBRX

Collection

DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27 • 88

upvoted a paper about 2 months ago

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

Paper • 2403.09055 • Published Mar 14 • 23

upvoted 6 collections about 2 months ago

Wav2Vec 2.0

Collection

A collection for the first release of Wav2Vec 2.0, a speech encoder that learns powerful representations from unlabelled audio data. • 8 items • Updated Jan 16 • 12

Load 4bit models 4x faster

Collection

Native bitsandbytes 4bit pre quantized models • 16 items • Updated 25 days ago • 21

WhisperKit

Collection

Datasets, models and evaluation results for WhisperKit • 1 item • Updated Mar 23 • 5

Long-Form Test Sets

Collection

A collection of long-form (samples > 30s) datasets used to evaluate the Distil-Whisper models. • 5 items • Updated Mar 21 • 5

Training Datasets

Collection

A collection of pseudo-labelled datasets used to train the Distil-Whisper model. • 9 items • Updated Mar 21 • 12

distil-large-v3

Collection

This collection contains the model repositories for distil-large-v3, which provides support for the most popular Whisper libraries. • 4 items • Updated Mar 21 • 4

upvoted a paper about 2 months ago

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 53

upvoted a paper 2 months ago

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Paper • 2403.08764 • Published Mar 13 • 34

upvoted 3 collections 2 months ago

Awesome Document AI

Collection

A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11 • 38

ise-uiuc's Papers

Collection

7 items • Updated Mar 31 • 5

OpenCodeInterpreter

Collection

18 items • Updated Mar 3 • 72

upvoted 4 papers 2 months ago

Humanoid Locomotion as Next Token Prediction

Paper • 2402.19469 • Published Feb 29 • 25

Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29 • 44

Nemotron-4 15B Technical Report

Paper • 2402.16819 • Published Feb 26 • 40

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27 • 182

upvoted a paper 3 months ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29 • 123

upvoted 4 collections 3 months ago

💫 StarCoder2

Collection

StarCoder2 models and datasets! • 8 items • Updated Mar 1 • 72

Matryoshka Embedding Models

Collection

https://huggingface.co/blog/matryoshka • 12 items • Updated about 12 hours ago • 10

OpenMath

Collection

A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items • Updated Feb 19 • 28

InstructRetro

Collection

InstructRetro is an autoregressive decoder-only language model (LM) with retrieval-augmented pretraining and instruction tuning. • 4 items • Updated 17 days ago • 7

upvoted 6 collections 4 months ago

ML for Tools

Collection

Collection of papers about ML for using tools! • 25 items • Updated Jan 17 • 9

AIM

Collection

AIM: Autoregressive Image Models • 5 items • Updated Jan 29 • 43

MoEs papers reading list

Collection

41 items • Updated 3 days ago • 122

Comparing DPO with IPO and KTO

Collection

A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated Jan 9 • 31

Model Merging

Collection

Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 28 items • Updated Mar 23 • 178

Zeroshot Classifiers

Collection

These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated Apr 3 • 76

upvoted a paper 5 months ago

Magicoder: Source Code Is All You Need

Paper • 2312.02120 • Published Dec 4, 2023 • 78

upvoted 2 collections 6 months ago

Switch-Transformers release

Collection

This release included various MoE (Mixture of Experts) models based on the T5 architecture. The base models use from 8 to 256 experts. • 9 items • Updated 1 day ago • 11

Nemotron 3 8B

Collection

The Nemotron 3 8B Family of models is optimized for building production-ready generative AI applications for the enterprise. • 5 items • Updated Feb 19 • 37

upvoted 2 papers 6 months ago

Contrastive Chain-of-Thought Prompting

Paper • 2311.09277 • Published Nov 15, 2023 • 31

Retrieve Anything To Augment Large Language Models

Paper • 2310.07554 • Published Oct 11, 2023 • 6

upvoted 2 collections 6 months ago

Tajik Language Models

Collection

8 items • Updated Nov 11, 2023 • 1

zephyr story

Collection

sources mentioned by hf.co/thomwolf tweet: x.com/Thom_Wolf/status/1720503998518640703 • 8 items • Updated Jan 24 • 15

upvoted a paper 7 months ago

AgentTuning: Enabling Generalized Agent Abilities for LLMs

Paper • 2310.12823 • Published Oct 19, 2023 • 33

upvoted 6 papers 8 months ago

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 79

Accelerating LLM Inference with Staged Speculative Decoding

Paper • 2308.04623 • Published Aug 8, 2023 • 20

Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 37

Large Language Models for Compiler Optimization

Paper • 2309.07062 • Published Sep 11, 2023 • 22

Large-Scale Automatic Audiobook Creation

Paper • 2309.03926 • Published Sep 7, 2023 • 52

YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 57

upvoted a paper 9 months ago

Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning

Paper • 2307.02053 • Published Jul 5, 2023 • 23

