Ashvini Kumar Jindal
akjindal53244
58 followers · 4 following
AI & ML interests
NLP
Recent Activity

Reacted to albertvillanova's post with 👍, about 1 month ago:
🚨 We've just released a new tool to compare the performance of models on the 🤗 Open LLM Leaderboard: the Comparator 🎉 https://huggingface.co/spaces/open-llm-leaderboard/comparator

Want to see how two different versions of LLaMA stack up? Let's walk through a step-by-step comparison of LLaMA-3.1 and LLaMA-3.2. 🦙🧵👇

1/ Load the models' results
- Go to the 🤗 Open LLM Leaderboard Comparator: https://huggingface.co/spaces/open-llm-leaderboard/comparator
- Search for "LLaMA-3.1" and "LLaMA-3.2" in the model dropdowns.
- Press the Load button. Ready to dive into the results!

2/ Compare metric results in the Results tab 📊
- Head over to the Results tab.
- Here you'll see the performance metrics for each model, color-coded with a gradient to highlight differences: greener is better! 🌟
- Want to focus on a specific task? Use the Task filter to home in on comparisons for tasks like BBH or MMLU-Pro.

3/ Check config alignment in the Configs tab ⚙️
- To make sure you're comparing apples to apples, head to the Configs tab.
- Review both models' evaluation configurations: metrics, datasets, prompts, few-shot settings...
- If something looks off, it's good to know before drawing conclusions! ✅

4/ Compare predictions by sample in the Details tab 🔍
- Curious how each model responds to specific inputs? The Details tab is your go-to!
- Select a Task (e.g., MuSR), then a Subtask (e.g., Murder Mystery), then press the Load Details button.
- Check out the side-by-side predictions and dive into the nuances of each model's outputs.

5/ With this tool, it's never been easier to explore how small changes between model versions affect performance across a wide range of tasks. Whether you're a researcher or an enthusiast, you can instantly visualize improvements and dive into detailed comparisons. 🚀

Try the 🤗 Open LLM Leaderboard Comparator now and take your model evaluations to the next level!
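The color-coded diff in step 2 boils down to per-task score deltas between two models. A minimal sketch of that comparison in plain Python, using made-up placeholder scores for illustration (the real Comparator loads actual evaluation results from the leaderboard):

```python
# Sketch of the per-task comparison behind the Results tab.
# The scores below are illustrative placeholders, not real leaderboard numbers.
llama_3_1 = {"BBH": 48.2, "MMLU-Pro": 37.5, "MuSR": 41.0}
llama_3_2 = {"BBH": 46.9, "MMLU-Pro": 39.1, "MuSR": 41.0}

def compare(a: dict, b: dict) -> dict:
    """Return per-task score deltas (b - a) for tasks both models were evaluated on."""
    return {task: round(b[task] - a[task], 2) for task in a if task in b}

deltas = compare(llama_3_1, llama_3_2)
for task, delta in sorted(deltas.items()):
    # "Greener is better": a positive delta means the second model scores higher.
    marker = "greener" if delta > 0 else ("redder" if delta < 0 else "even")
    print(f"{task}: {delta:+.2f} ({marker})")
```

Restricting the diff to tasks present in both result sets mirrors the config-alignment check in step 3: a delta is only meaningful when both models were evaluated the same way.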
Articles

Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging · Aug 19 · 73
akjindal53244's activity
New activity in akjindal53244/Llama-3.1-Storm-8B (2 months ago)
- "Languages report?" · 1 · #7 opened 2 months ago by nicolollo

New activity in akjindal53244/Llama-3.1-Storm-8B (3 months ago)
- "too smart?" · 2 · #4 opened 3 months ago by Daemontatox

New activity in akjindal53244/Llama-3.1-Storm-8B-GGUF (3 months ago)
- "is it uncensored?" · 1 · #2 opened 3 months ago by Shehab007

New activity in akjindal53244/Llama-3.1-Storm-8B (3 months ago)
- "Ollama?" · 7 · #2 opened 3 months ago by Apdouglas

New activity in blog-explorers/README (3 months ago)
- "[Support] Community Articles" · 65 · #5 opened 8 months ago by victor

New activity in arcee-ai/Llama-Spark (4 months ago)
- "Not able to reproduce benchmark metrics" · 3 · #2 opened 4 months ago by akjindal53244

New activity in akjindal53244/Arithmo-Mistral-7B (10 months ago)
- "qlora weights" · 1 · #1 opened about 1 year ago by baconnier

New activity in akjindal53244/Mistral-7B-v0.1-Open-Platypus (about 1 year ago)
- "Adding Evaluation Results" · #1 opened about 1 year ago by leaderboard-pr-bot

New activity in open-llm-leaderboard-old/requests (about 1 year ago)
- "Update model type from 'fine-tuned' to 'instruction-tuned'" · 3 · #13 opened about 1 year ago by akjindal53244

New activity in open-llm-leaderboard/open_llm_leaderboard (about 1 year ago)
- "Update Model Type from fine-tuned to Instruction-Tuned" · 3 · #312 opened about 1 year ago by akjindal53244
- "Mistral Finetunes Are Failing" · 10 · #311 opened about 1 year ago by Weyaxi