Dattu Sharma

imdatta0

https://datta0.github.io/

AI & ML interests

Everything ML. Specifically Deep Learning.

Recent Activity

updated a model 37 minutes ago

imdatta0/llama8b_dmath_acconly_repeat_const_unsloth

updated a model about 2 hours ago

imdatta0/llama_openthoughts_sorted_sft

updated a model about 3 hours ago

imdatta0/thinkllama_r1math_grpo

imdatta0's activity

New activity in open-thoughts/OpenThoughts-114k 2 months ago

Typo in system prompt

#7 opened 2 months ago by

imdatta0

New activity in meta-llama/Llama-3.3-70B-Instruct 4 months ago

Tokenizer doesn't load with transformers 4.34.4

#21 opened 4 months ago by

imdatta0

New activity in imdatta0/wikipedia_en_sample 5 months ago

Librarian Bot: Add language metadata for dataset

#2 opened 5 months ago by

librarian-bot

commented 5 papers 5 months ago

Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling

Paper • 2410.07145 • Published Oct 9, 2024 • 2 •

Round and Round We Go! What makes Rotary Positional Encodings useful?

Paper • 2410.06205 • Published Oct 8, 2024 • 1 •

commented 5 papers 6 months ago

TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices

Paper • 2410.00531 • Published Oct 1, 2024 • 32 •

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8, 2024 • 111 •

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 176 •

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 147 •

commented 7 papers 7 months ago

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27, 2024 • 41 •

KTO: Model Alignment as Prospect Theoretic Optimization

Paper • 2402.01306 • Published Feb 2, 2024 • 16 •

Planning In Natural Language Improves LLM Search For Code Generation

Paper • 2409.03733 • Published Sep 5, 2024 •

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3, 2024 • 79 •

FocusLLM: Scaling LLM's Context by Parallel Decoding

Paper • 2408.11745 • Published Aug 21, 2024 • 25 •

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Paper • 2408.12570 • Published Aug 22, 2024 • 33 •

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 58 •