Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2, 2024 • 101
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 175
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 58
Meta Llama 3 Collection This collection hosts the Transformers-format and original repositories for the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18, 2024 • 549
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9, 2024 • 62
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models Paper • 2402.19427 • Published Feb 29, 2024 • 50
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows Paper • 2403.11322 • Published Mar 17, 2024 • 1
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 73
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting Paper • 2309.04269 • Published Sep 8, 2023 • 28
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning Paper • 2307.08691 • Published Jul 17, 2023 • 6
Meta-Transformer: A Unified Framework for Multimodal Learning Paper • 2307.10802 • Published Jul 20, 2023 • 40
Challenges and Applications of Large Language Models Paper • 2307.10169 • Published Jul 19, 2023 • 46
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness Paper • 2205.14135 • Published May 27, 2022 • 8
System 2 Attention (is something you might need too) Paper • 2311.11829 • Published Nov 20, 2023 • 38
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Paper • 2201.11903 • Published Jan 28, 2022 • 7
Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks Paper • 1901.00032 • Published Dec 31, 2018 • 1
SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking Paper • 2107.05720 • Published Jul 12, 2021 • 1
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 37
Prefix-Tuning: Optimizing Continuous Prompts for Generation Paper • 2101.00190 • Published Jan 1, 2021 • 3
Memory-assisted prompt editing to improve GPT-3 after deployment Paper • 2201.06009 • Published Jan 16, 2022 • 1
RAGAS: Automated Evaluation of Retrieval Augmented Generation Paper • 2309.15217 • Published Sep 26, 2023 • 3
Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7, 2024 • 43
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning Paper • 2205.00445 • Published May 1, 2022 • 1
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? Paper • 2202.12837 • Published Feb 25, 2022 • 1
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 567