Recurrent Context Compression: Efficiently Expanding the Context Window of LLM Paper • 2406.06110 • Published 5 days ago
SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models Paper • 2406.05678 • Published 6 days ago
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference Paper • 2405.17755 • Published 18 days ago
Length Generalization of Causal Transformers without Position Encoding Paper • 2404.12224 • Published Apr 18
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory Paper • 2404.11163 • Published Apr 17
Universal In-Context Approximation By Prompting Fully Recurrent Models Paper • 2406.01424 • Published 11 days ago
A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models Paper • 2405.16504 • Published 20 days ago
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences Paper • 2406.08128 • Published 3 days ago
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks Paper • 2405.15731 • Published 21 days ago
State-Free Inference of State-Space Models: The Transfer Function Approach Paper • 2405.06147 • Published May 10
LoCoCo: Dropping In Convolutions for Long Context Compression Paper • 2406.05317 • Published 7 days ago
Parallelizing Linear Transformers with the Delta Rule over Sequence Length Paper • 2406.06484 • Published 4 days ago
LongSSM: On the Length Extension of State-space Models in Language Modelling Paper • 2406.02080 • Published 11 days ago
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published 3 days ago
Multilingual Large Language Models Are Not (Yet) Code-Switchers Paper • 2305.14235 • Published May 23, 2023
Do Llamas Work in English? On the Latent Language of Multilingual Transformers Paper • 2402.10588 • Published Feb 16
Integrating Multi-scale Contextualized Information for Byte-based Neural Machine Translation Paper • 2405.19290 • Published 16 days ago
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Paper • 1905.11946 • Published May 28, 2019
SpaceByte: Towards Deleting Tokenization from Large Language Modeling Paper • 2404.14408 • Published Apr 22
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters Paper • 2405.16287 • Published 20 days ago
D'OH: Decoder-Only random Hypernetworks for Implicit Neural Representations Paper • 2403.19163 • Published Mar 28
Byte-Level Recursive Convolutional Auto-Encoder for Text Paper • 1802.01817 • Published Feb 6, 2018
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023
Prompting-based Synthetic Data Generation for Few-Shot Question Answering Paper • 2405.09335 • Published about 1 month ago
Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting Paper • 2310.09716 • Published Oct 15, 2023
TarGEN: Targeted Data Generation with Large Language Models Paper • 2310.17876 • Published Oct 27, 2023
CrossTune: Black-Box Few-Shot Classification with Label Enhancement Paper • 2403.12468 • Published Mar 19
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation Paper • 2405.17057 • Published 19 days ago
SemCoder: Training Code Language Models with Comprehensive Semantics Paper • 2406.01006 • Published 12 days ago
NExT: Teaching Large Language Models to Reason about Code Execution Paper • 2404.14662 • Published Apr 23
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs Paper • 2405.16325 • Published 20 days ago
VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks Paper • 2405.15179 • Published 22 days ago
ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections Paper • 2405.20271 • Published 15 days ago
SLTrain: A Sparse Plus Low-Rank Approach for Parameter and Memory Efficient Pretraining Paper • 2406.02214 • Published 11 days ago
SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors Paper • 2405.19597 • Published 16 days ago
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters Paper • 2405.17604 • Published 18 days ago
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform Paper • 2405.03003 • Published May 5
NOLA: Networks as Linear Combination of Low Rank Random Basis Paper • 2310.02556 • Published Oct 4, 2023
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Paper • 2405.20222 • Published 15 days ago
CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers Paper • 2405.13195 • Published 24 days ago
CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory Paper • 2402.13449 • Published Feb 21
Self-Selected Attention Span for Accelerating Large Language Model Inference Paper • 2404.09336 • Published Apr 14