mishig (Mishig Davaadorj)

upvoted an article about 19 hours ago

Article

Orchestration of Experts: The First-Principle Multi-Model System

1 day ago

• 12

upvoted an article 15 days ago

Article

Improving Prompt Consistency with Structured Generations

Apr 30

• 46

upvoted a paper 17 days ago

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Paper • 2304.13705 • Published Apr 23, 2023 • 1

upvoted a paper about 1 month ago

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30 • 96

upvoted an article about 1 month ago

Article

LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!)

Apr 24

• 48

upvoted a paper about 1 month ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 238

upvoted an article about 1 month ago

Article

Fine-tune Llama 3 with ORPO

Apr 22

• 192

upvoted a collection about 1 month ago

Meta Llama 3

Collection

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18 • 553

upvoted 2 papers about 1 month ago

The Illusion of State in State-Space Models

Paper • 2404.08819 • Published Apr 12 • 1

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

Paper • 2402.04347 • Published Feb 6 • 13

upvoted 6 papers about 2 months ago

LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 24

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 62

Tutorial on Diffusion Models for Imaging and Vision

Paper • 2403.18103 • Published Mar 26 • 1

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Paper • 2404.07544 • Published Apr 11 • 15

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11 • 39

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 93

upvoted 2 articles about 2 months ago

Article

History of State Space Models (SSM) in 2022

Apr 11

• 6

Article

It's raining diffusion personalization techniques☔️🎭🖼️

Apr 11

• 16

upvoted 2 papers about 2 months ago

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Paper • 2404.05719 • Published Apr 8 • 57

Formal Aspects of Language Modeling

Paper • 2311.04329 • Published Nov 7, 2023 • 1

upvoted 4 articles about 2 months ago

Article

RFDiffusion Potentials

17 days ago

• 9

Article

Deploying 🤗 Hub models in Vertex AI

Feb 27

• 3

Article

Pollen-Vision: Unified interface for Zero-Shot vision models in robotics

Mar 25

• 6

Article

Diffusion Models

13 days ago

• 12

upvoted a paper about 2 months ago

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 102

upvoted 3 papers 3 months ago

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6 • 176

DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models

Paper • 2403.00818 • Published Feb 26 • 13

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 567

upvoted a collection 4 months ago

Qwen1.5

Collection

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 19 days ago • 183

upvoted 4 papers 4 months ago

CroissantLLM: A Truly Bilingual French-English Language Model

Paper • 2402.00786 • Published Feb 1 • 22

MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24 • 47

Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access

Paper • 2401.09967 • Published Jan 18 • 1

WARM: On the Benefits of Weight Averaged Reward Models

Paper • 2401.12187 • Published Jan 22 • 17

upvoted a collection 4 months ago

AIM

Collection

AIM: Autoregressive Image Models • 5 items • Updated Jan 29 • 43

upvoted 2 collections 5 months ago

Zeroshot Classifiers

Collection

These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated Apr 3 • 79

Mongolian Speech Models 🇲🇳

Collection

STT and TTS • 6 items • Updated Sep 8, 2023 • 2

upvoted 3 papers 5 months ago

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 152

Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Paper • 2401.02117 • Published Jan 4 • 25

Possible Meissner effect near room temperature in copper-substituted lead apatite

Paper • 2401.00999 • Published Jan 2 • 5

upvoted 2 papers 6 months ago

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

Paper • 2312.07987 • Published Dec 13, 2023 • 39

Gated Linear Attention Transformers with Hardware-Efficient Training

Paper • 2312.06635 • Published Dec 11, 2023 • 3

upvoted a collection 6 months ago

MoE

Collection

135 items • Updated about 3 hours ago • 17

upvoted 3 papers 6 months ago

upvoted a collection 6 months ago

Tulu V2 Suite

Collection

The set of models associated with the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2" • 19 items • Updated Feb 1 • 43

upvoted 4 papers 6 months ago

Scalable AI Safety via Doubly-Efficient Debate

Paper • 2311.14125 • Published Nov 23, 2023 • 1

Simplifying Transformer Blocks

Paper • 2311.01906 • Published Nov 3, 2023 • 1

System 2 Attention (is something you might need too)

Paper • 2311.11829 • Published Nov 20, 2023 • 38

Transformers learn in-context by gradient descent

Paper • 2212.07677 • Published Dec 15, 2022 • 1

upvoted 2 collections 7 months ago

Zephyr 7B

Collection

Models, datasets, and demos associated with Zephyr 7B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 9 items • Updated Apr 12 • 138

Reward models on the hub

Collection

UNMAINTAINED: See RewardBench... A place to collect reward models, an often not released artifact of RLHF. • 18 items • Updated Apr 13 • 24

upvoted 2 papers 7 months ago

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Paper • 2310.04378 • Published Oct 6, 2023 • 19

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 116

upvoted a collection 8 months ago

Contra (Bottleneck T5)

Collection

Text autoencoders capable of embedding and generating text in a fixed-size latent space, useful for embeddings and latent space text editing. • 4 items • Updated Oct 3, 2023 • 27

upvoted a paper 8 months ago

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Paper • 2310.03502 • Published Oct 5, 2023 • 74

upvoted a collection 8 months ago

Recent models: last 100 repos, sorted by creation date

Collection

The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 447

upvoted a paper 9 months ago

Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 20

upvoted 2 papers 10 months ago

Neural signature kernels as infinite-width-depth-limits of controlled ResNets

Paper • 2303.17671 • Published Mar 30, 2023 • 1

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

Paper • 2110.07205 • Published Oct 14, 2021 • 4

Mishig Davaadorj

Articles

CodeGemma - an official Google release for code LLMs
