AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
Paper • 2502.14669 • Published • 14
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 25
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 39
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 21
Abhranil Chandra
abhranil14
AI & ML interests
Reinforcement Learning, Deep Unsupervised Learning, NLP and Bayesian Deep Learning
Recent Activity
Updated a model about 4 hours ago: abhranil14/llama_on_wrong_soln_wrt_human_1_soln_per_qs_6076_FF_batch64_lr10e-6_warmup100
Updated a model about 5 hours ago: abhranil14/Gemma_FF_on_gemma_gold_6319_FF_batch64_lr10e-6_warmup100
Collections: 8
Models: 23
abhranil14/llama_on_wrong_soln_wrt_human_1_soln_per_qs_6076_FF_batch64_lr10e-6_warmup100
abhranil14/Gemma_FF_on_gemma_gold_6319_FF_batch64_lr10e-6_warmup100
abhranil14/Gemma_FF_on_gemma_gold_6319_FF_batch256_lr10e-6_warmup100
abhranil14/Qwen_wrong_soln_wrt_human_1_soln_per_qs_6076_PEFT_batch256_lr10e-6_warmup100
abhranil14/Qwen_wrong_soln_wrt_human_1_soln_per_qs_6076_PEFT_batch64_lr10e-6_warmup100
abhranil14/llama_on_wrong_soln_wrt_human_1_soln_per_qs_6076_FF_batch256_lr10e-6_warmup100
abhranil14/Math_gemma9b_ver_gen_75_25_full_finetune
abhranil14/llama3.1_8B_gemma_gold_batch_256
abhranil14/llama3.1_8B_human_gold_batch_256
abhranil14/llama3.1_8B_gemma_gold_batch_64