Akarshan Biswas

qnixsynapse

AI & ML interests

NLP, models, quantization

Recent Activity

updated a model 3 days ago

qnixsynapse/pythia-1.4b-Q4_0-GGUF

published a model 3 days ago

qnixsynapse/pythia-1.4b-Q4_0-GGUF

Organizations

None yet

qnixsynapse's activity

upvoted a paper 10 days ago

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 19

upvoted a collection 15 days ago

Gemma 3 Release

17 items • Updated about 9 hours ago • 297

upvoted a paper about 1 month ago

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 130

upvoted a paper 4 months ago

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 57

upvoted a paper 6 months ago

Kolmogorov-Arnold Transformer

Paper • 2409.10594 • Published Sep 16, 2024 • 44

upvoted a paper 7 months ago

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

upvoted an article 8 months ago

Tool Use, Unified

Article • Aug 12, 2024 • 93

upvoted 2 papers 8 months ago

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 41

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 114

upvoted a collection 8 months ago

Gemma 2 2B Release

The 2.6B parameter version of Gemma 2. • 6 items • Updated about 9 hours ago • 79

upvoted 3 papers 9 months ago

Human-like Episodic Memory for Infinite Context LLMs

Paper • 2407.09450 • Published Jul 12, 2024 • 62

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Paper • 2407.03963 • Published Jul 4, 2024 • 19

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Paper • 2407.02490 • Published Jul 2, 2024 • 25

upvoted a collection 10 months ago

SSMs

A collection of Mamba-2-based research models with 8B parameters trained on 3.5T tokens for comparison with Transformers. • 5 items • Updated 1 day ago • 27

upvoted a paper 10 months ago

The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

Paper • 2404.05904 • Published Apr 8, 2024 • 9

upvoted 3 papers 11 months ago

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30, 2024 • 111

SpaceByte: Towards Deleting Tokenization from Large Language Modeling

Paper • 2404.14408 • Published Apr 22, 2024 • 8

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19, 2024 • 39

upvoted a collection 11 months ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 728

upvoted a paper 12 months ago

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11, 2024 • 46