ucalyptus (Sayantan Das)

upvoted an article 5 days ago

Article

Unlocking Longer Generation with Key-Value Cache Quantization

13 days ago

• 11

upvoted an article 6 days ago

Article

Synthetic dataset generation techniques: generating custom sentence similarity data

6 days ago

• 11

upvoted a collection 8 days ago

🚀GGUF

Collection

Llama.cpp compatible models, can be used on CPUs and GPUs! • 663 items • Updated 3 days ago • 23

upvoted 2 articles 10 days ago

Article

Hugging Face x LangChain : A new partner package in LangChain

15 days ago

• 68

Article

Train custom AI models with the trainer API and adapt them to 🤗

4 days ago

• 19

upvoted an article 12 days ago

Article

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

May 24, 2023

• 38

upvoted an article 13 days ago

Article

Hugging Face + Google Visual Blocks

13 days ago

• 17

upvoted a collection 17 days ago

Prem

Collection

Finetunes and Quantizations of the Prem LLMs • 11 items • Updated 16 days ago • 2

upvoted an article 18 days ago

Article

Merge Large Language Models with mergekit

Jan 9

• 20

upvoted a collection 19 days ago

fuck quadratic attention

Collection

11 items • Updated Apr 24 • 19

upvoted a paper 20 days ago

CLLMs: Consistency Large Language Models

Paper • 2403.00835 • Published Feb 28 • 3

upvoted a paper 26 days ago

A Primer on the Inner Workings of Transformer-based Language Models

Paper • 2405.00208 • Published 29 days ago • 7

upvoted a paper 27 days ago

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate

Paper • 2401.16788 • Published Jan 30 • 1

upvoted a paper 28 days ago

Octopus v4: Graph of language models

Paper • 2404.19296 • Published 29 days ago • 94

upvoted an article about 1 month ago

Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Apr 15

• 133

upvoted 2 papers about 1 month ago

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

Paper • 2309.10400 • Published Sep 19, 2023 • 22

Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 235

upvoted an article about 1 month ago

Article

LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!)

Apr 24

• 48

upvoted a paper about 1 month ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 238

upvoted 3 articles about 1 month ago

Article

AI Apps in a Flash with Gradio's Reload Mode

Apr 16

• 16

Article

Towards Encrypted Large Language Models with FHE

Aug 2, 2023

• 5

Article

Fine-tune Llama 3 with ORPO

Apr 22

• 191

upvoted a paper about 2 months ago

Learning to Route Among Specialized Experts for Zero-Shot Generalization

Paper • 2402.05859 • Published Feb 8 • 4

upvoted 2 collections about 2 months ago

Zephyr ORPO

Collection

Models and datasets to align LLMs with Odds Ratio Preference Optimisation (ORPO). Recipes here: https://github.com/huggingface/alignment-handbook • 3 items • Updated Apr 12 • 14

SDXL LoRA

Collection

A few SDXL-LoRA models I've trained • 20 items • Updated Jan 22 • 6

upvoted an article about 2 months ago

Article

CodeGemma - an official Google release for code LLMs

Apr 9

• 97

upvoted 9 papers about 2 months ago

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 102

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3 • 60

LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29 • 22

H2O-Danube-1.8B Technical Report

Paper • 2401.16818 • Published Jan 30 • 16

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 75

upvoted a paper 2 months ago

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Paper • 2403.10704 • Published Mar 15 • 55

upvoted a collection 2 months ago

DRAGON Models

Collection

Production-grade RAG-optimized 6-7B parameter models - "Delivering RAG on ..." the leading foundation base models • 11 items • Updated Feb 3 • 41

upvoted 3 papers 2 months ago

RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15 • 63

KTO: Model Alignment as Prospect Theoretic Optimization

Paper • 2402.01306 • Published Feb 2 • 11

Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19 • 45

upvoted 6 papers 3 months ago

WebArena: A Realistic Web Environment for Building Autonomous Agents

Paper • 2307.13854 • Published Jul 25, 2023 • 20

IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages

Paper • 2403.01926 • Published Mar 4 • 1

Unifying Vision, Text, and Layout for Universal Document Processing

Paper • 2212.02623 • Published Dec 5, 2022 • 10

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 567

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 94

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 87

upvoted a collection 3 months ago

Data-efficient LLMs

Collection

dataset pruning for advancing the capabilities of LLMs • 24 items • Updated 5 days ago • 1

upvoted 6 papers 3 months ago

Datasets for Large Language Models: A Comprehensive Survey

Paper • 2402.18041 • Published Feb 28 • 2

Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection

Paper • 2306.06881 • Published Jun 12, 2023 • 1

FaceForensics++: Learning to Detect Manipulated Facial Images

Paper • 1901.08971 • Published Jan 25, 2019 • 1

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Paper • 2203.05482 • Published Mar 10, 2022 • 5

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Paper • 2402.13064 • Published Feb 20 • 45

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 104

upvoted 2 collections 3 months ago

Papers about model merging

Collection

referenced in the mergekit repo: https://github.com/cg123/mergekit • 4 items • Updated Feb 13 • 13

Model Merging Papers

Collection

Collection of relevant papers about model merging • 13 items • Updated Apr 2 • 5

upvoted a paper 3 months ago

Editing Models with Task Arithmetic

Paper • 2212.04089 • Published Dec 8, 2022 • 4

upvoted 2 collections 3 months ago

SLIM Models

Collection

Structured Language Instruction Models (SLIMs) • 21 items • Updated 2 days ago • 24

WebML

Collection

Machine Learning on the Web • 13 items • Updated Feb 7 • 8

upvoted 2 papers 3 months ago

One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation

Paper • 2402.11909 • Published Feb 19 • 1

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 91
