sugatoray (SUGATO RAY)

upvoted 3 papers 1 day ago

Visualizing the Loss Landscape of Neural Nets

Paper • 1712.09913 • Published Dec 28, 2017 • 1

tinyBenchmarks: evaluating LLMs with fewer examples

Paper • 2402.14992 • Published Feb 22 • 11

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published 3 days ago • 18

upvoted a paper 2 days ago

Stylus: Automatic Adapter Selection for Diffusion Models

Paper • 2404.18928 • Published 18 days ago • 14

upvoted a collection 2 days ago

PaliGemma Release

Collection

Pretrained and mix checkpoints for PaliGemma • 11 items • Updated about 2 hours ago • 83

upvoted an article 2 days ago

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

4 days ago

• 90

upvoted an article 3 days ago

Article

License to Call: Introducing Transformers Agents 2.0

5 days ago

• 63

upvoted a collection 3 days ago

NuNerZero - Zero Shot NER

Collection

The best compact Zero-Shot NER models with MIT license • 4 items • Updated 7 days ago • 11

upvoted 2 articles 5 days ago

Article

Introducing the Open Leaderboard for Hebrew LLMs!

13 days ago

• 23

Article

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

19 days ago

• 68

upvoted a paper 6 days ago

Automating the Enterprise with Foundation Models

Paper • 2405.03710 • Published 14 days ago • 1

upvoted an article 6 days ago

Article

🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets

21 days ago

• 54

upvoted a paper 8 days ago

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Paper • 2405.04324 • Published 10 days ago • 11

upvoted a paper 9 days ago

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published 17 days ago • 61

upvoted an article 9 days ago

Article

SeeMoE: Implementing a MoE Vision Language Model from Scratch

11 days ago

• 24

upvoted a paper 10 days ago

WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published 15 days ago • 52

upvoted a collection 10 days ago

Granite Code Models

Collection

A series of code models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 10 items • Updated 5 days ago • 116

upvoted an article 12 days ago

Article

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

15 days ago

• 12

upvoted an article 13 days ago

Article

Open-source LLMs as LangChain Agents

Jan 24

• 9

upvoted a collection 13 days ago

Llama3-ChatQA-1.5

Collection

Llama3-ChatQA-1.5 models excel at conversational question answering (QA) and retrieval-augmented generation (RAG). • 6 items • Updated 14 days ago • 35

upvoted a paper 14 days ago

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published 15 days ago • 92

upvoted a collection 15 days ago

ZeroGPU Spaces

Collection

ZeroGPU Spaces made by the community • 16 items • Updated about 3 hours ago • 122

upvoted a paper 16 days ago

Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications

Paper • 2404.13506 • Published 26 days ago • 1

upvoted a paper 19 days ago

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4 • 57

upvoted 2 collections 23 days ago

OpenELM Instruct Models

Collection

4 items • Updated Apr 12 • 96

Arctic

Collection

A collection of pre-trained dense-MoE Hybrid transformer models • 2 items • Updated 23 days ago • 18

upvoted 2 articles 24 days ago

Article

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

26 days ago

• 71

Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Apr 15

• 125

upvoted 2 papers 25 days ago

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 34

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 7

upvoted a collection 29 days ago

Meta Llama 3

Collection

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 29 days ago • 516

upvoted an article 29 days ago

Article

Welcome Llama 3 - Meta's new open LLM

30 days ago

• 238

upvoted a collection 30 days ago

WizardLM

Collection

0 items • Updated 9 days ago • 95

upvoted 2 papers about 1 month ago

A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)

Paper • 2404.00579 • Published Mar 31 • 1

Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15 • 79

upvoted an article about 1 month ago

Article

DS-MoE: Making MoE Models More Efficient and Less Memory-Intensive

Apr 9

• 26

upvoted a paper about 1 month ago

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Paper • 2404.00456 • Published Mar 30 • 3

upvoted 2 articles about 1 month ago

Article

CodeGemma - an official Google release for code LLMs

Apr 9

• 95

Article

Making thousands of open LLMs bloom in the Vertex AI Model Garden

Apr 10

• 16

upvoted 2 papers about 1 month ago

More Agents Is All You Need

Paper • 2402.05120 • Published Feb 3 • 46

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Paper • 2404.05961 • Published Apr 9 • 62

upvoted 2 collections about 1 month ago

LLMs

Collection

Collection of LLMs • 100 items • Updated 2 days ago • 1

Papers-LLMEval

Collection

5 items • Updated 1 day ago • 1

upvoted 2 papers about 1 month ago

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30 • 39

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 100

upvoted 3 articles about 1 month ago

Article

RAG Empowerment: Cohere C4AI Command-R and Transformers Unveiled

Apr 7

• 9

Article

Mixture of Experts Explained

Dec 11, 2023

• 65

Article

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

10 days ago

• 21

upvoted a collection about 1 month ago

MoE

Collection

131 items • Updated 12 days ago • 16

upvoted 2 papers about 1 month ago

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29 • 46

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Paper • 2402.01739 • Published Jan 29 • 26

upvoted a collection about 1 month ago

Papers-MoE

Collection

Papers on Mixture of Experts (MoE) • 4 items • Updated Apr 8 • 1

upvoted 3 papers about 1 month ago

upvoted an article about 1 month ago

Article

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

Apr 4

• 20

upvoted 4 papers about 2 months ago

Model Stock: All we need is just a few fine-tuned models

Paper • 2403.19522 • Published Mar 28 • 9

sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 31

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Paper • 2403.14610 • Published Mar 21 • 1

Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model

Paper • 2206.14371 • Published Jun 29, 2022 • 3
