Song

Hwanjun

AI & ML interests

None yet

Recent Activity

reacted to Kseniase's post with 👍 8 days ago

16 new studies on inference-time scaling:

Over the last couple of weeks, a large number of studies on inference-time scaling have emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without needing to scale the models' parameter count. So here are 13 new methods + 3 comprehensive studies on test-time scaling:

1. https://huggingface.co/papers/2504.02495
Probably the most popular study. It proposes to boost inference-time scalability by improving reward modeling. To enhance performance, DeepSeek-GRM uses adaptive critiques, parallel sampling, pointwise generative RM, and Self-Principled Critique Tuning (SPCT).

2. https://huggingface.co/papers/2504.04718
Allows small models to use external tools, like code interpreters and calculators, to enhance self-verification.

3. https://huggingface.co/papers/2504.00810
Proposes to train LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window.

4. https://huggingface.co/papers/2504.00891
Introduces GenPRM, a generative PRM that uses CoT reasoning and code verification for step-by-step judgment. With only 23K training examples, GenPRM outperforms prior PRMs and larger models.

5. https://huggingface.co/papers/2503.24320
The SWIFT test-time scaling framework improves World Models' performance without retraining, using strategies like fast tokenization, Top-K pruning, and efficient beam search.

6. https://huggingface.co/papers/2504.07104
Proposes REBEL for scaling RAG systems, which uses multi-criteria optimization with CoT prompting for better performance-speed tradeoffs as inference compute increases.

7. https://huggingface.co/papers/2503.13288
Proposes a φ-Decoding strategy that uses foresight sampling, clustering, and adaptive pruning to estimate and select optimal reasoning steps.

Read further below 👇

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

upvoted a paper 8 days ago

Inference-Time Scaling for Generalist Reward Modeling

upvoted a paper 22 days ago

ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback

Hwanjun's activity

liked 2 models about 1 month ago

DISLab/Gen-8B-R2

Question Answering • Updated Mar 19 • 38 • 2

DISLab/Ext2Gen-8B-R2

Question Answering • Updated Mar 19 • 20 • 4

liked a dataset 3 months ago

DISLab/FeedSum

Viewer • Updated Jan 25 • 127k • 59 • 3

liked a model 5 months ago

DevQuasar/DISLab.SummLlama3.2-3B-GGUF

Text Generation • Updated Feb 1 • 49 • 3

liked 5 models 6 months ago

DISLab/SummLlama3-70B

Summarization • Updated Nov 13, 2024 • 12 • 7

DISLab/SummLlama3.1-8B

Summarization • Updated Nov 13, 2024 • 148 • 10

DISLab/SummLlama3.1-70B

Summarization • Updated Nov 13, 2024 • 3 • 7

DISLab/SummLlama3.2-3B

Summarization • Updated Dec 10, 2024 • 602 • 36

DISLab/SummLlama3-8B

Summarization • Updated Nov 13, 2024 • 17 • 14

liked a model 12 months ago

allenai/led-base-16384

Text2Text Generation • Updated Jan 24, 2023 • 28.3k • 44