Shubham Toshniwal

stoshniwal

https://shtoshni.github.io/

shtoshni

AI & ML interests

NLP, LLM

Recent Activity

new activity about 2 months ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B:Tokenizer config is wrong

liked a model 3 months ago

Qwen/Qwen2.5-Math-7B-Instruct

liked a model 4 months ago

Qwen/QwQ-32B-Preview

stoshniwal's activity

New activity in deepseek-ai/DeepSeek-R1-Distill-Qwen-32B about 2 months ago

Tokenizer config is wrong

#10 opened about 2 months ago by

stoshniwal

liked a model 3 months ago

Qwen/Qwen2.5-Math-7B-Instruct

Text Generation • Updated Sep 23, 2024 • 88.5k • 63

liked a model 4 months ago

Qwen/QwQ-32B-Preview

Text Generation • Updated Jan 12 • 249k • 1.72k

upvoted a paper 4 months ago

Star Attention: Efficient LLM Inference over Long Sequences

Paper • 2411.17116 • Published Nov 26, 2024 • 52

updated 4 models 4 months ago

updated a dataset 4 months ago

nvidia/OpenMathInstruct-2

Viewer • Updated Nov 25, 2024 • 22M • 6.43k • 159

upvoted a collection 4 months ago

Qwen2.5-Math

Collection

Math-specific model series based on Qwen2.5 • 11 items • Updated Jan 14 • 78

liked a model 4 months ago

nvidia/Cosmos-0.1-Tokenizer-DV4x8x8

Updated Nov 11, 2024 • 812 • 12

upvoted an article 5 months ago

Fixing Gradient Accumulation

Article • Oct 16, 2024 • 51

upvoted a collection 5 months ago

Llama-3.1-Nemotron-70B

Collection

SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated Jan 17 • 153

New activity in nvidia/OpenMathInstruct-2 5 months ago

Upload scaling_plot.jpg

#4 opened 5 months ago by

shtoshni

Unable to load dataset

#3 opened 5 months ago by

minyichen

Dataset Viewer issue: JobManagerCrashedError

#2 opened 5 months ago by

stoshniwal

liked a model 5 months ago

nvidia/NVLM-D-72B

Image-Text-to-Text • Updated Jan 14 • 22.3k • 764

upvoted a collection 5 months ago

OpenMath-2

Collection

A collection of models and datasets introduced in "OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data" • 7 items • Updated Jan 17 • 13

upvoted 2 papers 5 months ago

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data

Paper • 2410.01560 • Published Oct 2, 2024 • 4

Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

Paper • 2410.02749 • Published Oct 3, 2024 • 12