I was chatting with @peakji, one of the cofounders of Manus AI, who told me he was on Hugging Face (very cool!).
He shared an interesting insight: agentic capabilities might be more of an alignment problem than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response, regardless of the complexity of the question'; after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.
As a thank-you to the community, he shared 100 invite codes on a first-come, first-served basis: just use “HUGGINGFACE” to get access!
Chain-of-Thought (CoT) prompting enhances reasoning in AI models by breaking complex problems into step-by-step logical sequences. It continues to prove effective, especially in top-performing reasoning models. But there are other related methods that expand on CoT and serve different purposes. Here are 9 of them (a minimal CoT prompting sketch follows the list):
4. Chain-of-RAG -> https://huggingface.co/papers/2501.14342 Builds retrieval chains instead of retrieving all information at once, and can dynamically adjust its search process and parameters such as the number of steps
9. Chain(s)-of-Knowledge -> https://www.turingpost.com/p/cok Enhances LLMs by dynamically pulling in external knowledge to improve accuracy and reduce errors
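If you haven't tried CoT prompting yourself, here is a minimal zero-shot sketch of the basic idea: the same question is asked once directly and once with a step-by-step instruction, so you can compare the answers. The model id, client choice, and example question are my own assumptions, not from the post.

```python
# Minimal zero-shot CoT sketch. Any chat-capable LLM works;
# the model id below is a hypothetical choice.
from huggingface_hub import InferenceClient

client = InferenceClient(model="Qwen/Qwen2.5-7B-Instruct")

question = (
    "A bat and a ball cost $1.10 together. The bat costs $1.00 "
    "more than the ball. How much does the ball cost?"
)

def ask(prompt: str) -> str:
    # Send a single-turn chat request and return the model's reply.
    resp = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return resp.choices[0].message.content

direct = ask(question)                               # one-shot answer
cot = ask(question + "\nLet's think step by step.")  # CoT-elicited answer
print(direct, cot, sep="\n---\n")
```

The CoT variants above replace that single step-by-step pass with more structured loops (e.g. interleaving retrieval or external knowledge between reasoning steps).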
✨ Apache 2.0
✨ 8.19GB VRAM, runs on most GPUs
✨ Multi-Tasking: T2V, I2V, Video Editing, T2I, V2A
✨ Text Generation: Supports Chinese & English
✨ Powerful Video VAE: Encode/decode 1080P with temporal precision
This week we are releasing the first framework unit in the course and it’s on smolagents. This is what the unit covers:
- why should you use smolagents vs another library?
- how to build agents that use code (a minimal sketch follows this list)
- how to build multi-agent systems
- how to use vision language models for browser use
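As a taste of what the unit covers, here is a minimal code-agent sketch in the spirit of the smolagents quickstart; which model HfApiModel resolves to by default is an assumption on my part.

```python
# Minimal smolagents code agent: the agent writes and executes
# Python snippets to solve the task, calling its tools along the way.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# HfApiModel defaults to a hosted model on the HF Inference API
# (exact default model is an assumption here).
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())

agent.run(
    "How many seconds would it take for a leopard at full speed "
    "to run through Pont des Arts?"
)
```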
The team has been working flat out on this for a few weeks, led by @sergiopaniego and supported by smolagents author @m-ric.
It started with us evaluating them on our own university-math benchmarks: U-MATH for problem-solving and μ-MATH for judging solution correctness (see the HF leaderboard: toloka/u-math-leaderboard)
tl;dr: R1 sure is amazing, but we find that it lags behind in novelty adaptation and reliability:
* performance drops when benchmarks are updated with fresh, unseen tasks (e.g. AIME 2024 -> 2025)
* the R1-o1 gap widens when evaluating niche subdomains (e.g. university-specific math instead of the more common Olympiad-style contests)
* the same holds when going into altogether unconventional domains (e.g. chess) or skills (e.g. judgment instead of problem-solving)
* R1 also runs into failure modes far more often (e.g. making illegal chess moves or falling into endless generation loops)
Our point here is not to bash DeepSeek: they've done exceptional work, and R1 is a game-changer that we have no intention of downplaying. R1's release is a perfect opportunity to study where all these models differ and to understand how to move forward from here.