ucyang (Unchun Yang)

upvoted a paper 1 day ago

Yuan 2.0-M32: Mixture of Experts with Attention Router

Paper • 2405.17976 • Published 5 days ago • 15

upvoted 2 collections 1 day ago

Synthetic (text) Dataset Generation

Collection

Papers about synthetic dataset generation • 9 items • Updated 3 days ago • 3

sentence-transformers-from-synthetic-data

Collection

Example of using distilabel to generate synthetic triplets data for fine-tuning a Sentence Transformer model • 3 items • Updated 1 day ago • 15

upvoted an article 1 day ago

Article

⚗️ 🔥 Building High-Quality Datasets with distilabel and Prometheus 2

4 days ago

• 20

upvoted a paper 1 day ago

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Paper • 2405.19327 • Published 3 days ago • 34

upvoted 2 papers 3 days ago

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1 • 20

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Paper • 2306.02858 • Published Jun 5, 2023 • 14

upvoted 2 articles 4 days ago

Article

Mixture of Experts Explained

Dec 11, 2023

• 81

Article

Unlocking Longer Generation with Key-Value Cache Quantization

17 days ago

• 12

upvoted a collection 7 days ago

SimPO

Collection

This collection contains the list of models trained and evaluated in the preprint: SimPO: Simple Preference Optimization with a Reference-Free R • 25 items • Updated 8 days ago • 9

upvoted an article 7 days ago

Article

Training MoE on AWS Trainium

9 days ago

• 3

upvoted a paper 8 days ago

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Paper • 2405.11273 • Published 14 days ago • 15

upvoted a collection 8 days ago

C4AI Aya 23

Collection

Aya 23 is an open weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 3 items • Updated 9 days ago • 34

upvoted an article 9 days ago

Article

A Dive into Text-to-Video Models

May 8, 2023

• 4

upvoted 2 articles 10 days ago

Article

Let's talk about LLM evaluation

9 days ago

• 82

Article

Enjoy the Power of Phi-3 with ONNX Runtime on your device

11 days ago

• 19

upvoted a paper 13 days ago

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published 19 days ago • 57

upvoted a paper 15 days ago

LIMA: Less Is More for Alignment

Paper • 2305.11206 • Published May 18, 2023 • 18

upvoted a collection 17 days ago

AkaLlama

Collection

Korean adaptation of Llama-3 LLM suites, developed by MIR Lab @ Yonsei University • 3 items • Updated 15 days ago • 1

upvoted a paper 18 days ago

Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean

Paper • 2403.10882 • Published Mar 16 • 5

upvoted a collection 18 days ago

PaliGemma Release

Collection

Pretrained and mix checkpoints for PaliGemma • 11 items • Updated 16 days ago • 103

upvoted an article 18 days ago

Article

Hugging Face x LangChain : A new partner package in LangChain

19 days ago

• 70

upvoted a collection 18 days ago

NuNerZero - Zero Shot NER

Collection

The best compact Zero-Shot NER models with MIT license • 4 items • Updated 22 days ago • 13

upvoted 3 articles 19 days ago

Article

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

25 days ago

• 25

Article

quanto: a pytorch quantization toolkit

Mar 18

• 15

Article

LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!)

Apr 24

• 48

upvoted a paper 20 days ago

Iterative Reasoning Preference Optimization

Paper • 2404.19733 • Published Apr 30 • 41

upvoted a collection 20 days ago

Yi-1.5 (2024/05)

Collection

10 items • Updated 13 days ago • 76

upvoted an article 20 days ago

Article

Llama 2 is here - get it on Hugging Face

Jul 18, 2023

• 15

upvoted a collection 20 days ago

Llama 2 Family

Collection

This collection hosts the transformers and original repos of the Llama 2 and Llama Guard releases • 13 items • Updated Apr 18 • 36

upvoted a paper 21 days ago

xLSTM: Extended Long Short-Term Memory

Paper • 2405.04517 • Published 25 days ago • 8

upvoted a collection 22 days ago

Granite Time Series Models

Collection

A collection of time series models trained by IBM licensed under CDLA-permissive-2.0 license. • 3 items • Updated 25 days ago • 5

upvoted a collection 24 days ago

Granite Code Models

Collection

A series of code models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 18 items • Updated 2 days ago • 135

upvoted a paper 28 days ago

WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published about 1 month ago • 53

upvoted an article 28 days ago

Article

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

Apr 29

• 69

upvoted 2 articles 29 days ago

Article

Overview of natively supported quantization schemes in 🤗 Transformers

Sep 12, 2023

• 8

Article

Optimizing your LLM in production

Sep 15, 2023

• 5

upvoted a paper about 1 month ago

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30 • 61

upvoted 4 articles about 1 month ago

Article

Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints

May 1

• 53

Article

What Makes a Dialog Agent Useful?

Jan 24, 2023

• 1

Article

Can We Train Chat Models with Raw Data?

Apr 25

• 17

Article

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

May 24, 2023

• 40

upvoted a collection about 1 month ago

Korean Datasets I've released so far.

Collection

A collection of the Korean datasets I have uploaded so far. • 8 items • Updated 8 days ago • 14

upvoted a paper about 1 month ago

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

Paper • 2309.10400 • Published Sep 19, 2023 • 22

upvoted 4 collections about 1 month ago

upvoted a paper about 1 month ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 238

upvoted 3 collections about 1 month ago

LayoutLM

Collection

The LayoutLM series are Transformer encoders useful for document AI tasks such as invoice parsing, document image classification and DocVQA. • 5 items • Updated 11 days ago • 9

Table Transformer

Collection

The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images. • 5 items • Updated 11 days ago • 12

Phi-3

Collection

Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 22 items • Updated 2 days ago • 299

upvoted 4 articles about 1 month ago

Article

Faster fine-tuning using TRL & Unsloth

Jan 10

• 20

Article

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

Apr 22

• 73

Article

Fine-tune Llama 3 with ORPO

Apr 22

• 193

Article

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

Mar 9, 2023

• 14

upvoted a collection about 1 month ago

Meta Llama 3

Collection

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18 • 557

upvoted an article about 1 month ago

Article

Welcome Llama 3 - Meta's new open LLM

Apr 18

• 245

upvoted an article about 2 months ago

Article

Synthetic data: save money, time and carbon with open source

Feb 16

• 28

upvoted a paper about 2 months ago

Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 235
