Nandan Thakur
nthakur
14 followers · 17 following
https://thakur-nandan.github.io
nandan__thakur
thakur-nandan
AI & ML interests
NLP, IR, QA
Recent Activity
Reacted to clem's post with 🔥 (3 days ago):
Before 2020, most of the AI field was open and collaborative. For me, that was the key factor that accelerated scientific progress and made the impossible possible—just look at the “T” in ChatGPT, which comes from the Transformer architecture openly shared by Google. Then came the myth that AI was too dangerous to share, and companies started optimizing for short-term revenue. That led many major AI labs and researchers to stop sharing and collaborating. With OAI and sama now saying they're willing to share open weights again, we have a real chance to return to a golden age of AI progress and democratization—powered by openness and collaboration, in the US and around the world. This is incredibly exciting. Let’s go, open science and open-source AI!
Reacted to their own post with 🔥 (3 days ago):
Last year, I curated and generated a few multilingual SFT and DPO datasets by translating English SFT/DPO datasets into 9-10 languages using the https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2 model. I hope they help the community with pretraining or instruction-tuning multilingual LLMs! I added a small diagram briefly describing which datasets are included and their sources. Happy to collaborate on either using these datasets for instruction fine-tuning or extending the translations to newer English SFT/DPO datasets! https://huggingface.co/collections/nthakur/multilingual-sft-and-dpo-datasets-67eaf56fe3feca5a57cf7d74
Updated a collection (3 days ago): 🏜️MIRAGE-Bench [NAACL'25]
nthakur's activity
Liked 5 datasets (3 days ago)
nthakur/mirage-bench-sft-teacher-gpt-4o • Updated Aug 9, 2024 • 29.2k rows • 7 downloads • 2 likes
nthakur/mirage-bench • Updated Jun 10, 2024 • 13k rows • 245 downloads • 2 likes
nthakur/mirage-bench-output • Updated 13 days ago • 10k rows • 84 downloads • 1 like
nthakur/mirage-bench-pairwise-judgments • Updated 13 days ago • 299k rows • 52 downloads • 1 like
nthakur/mirage-bench-instruct • Updated 3 days ago • 39.8k rows • 11 downloads • 1 like
Liked a dataset (22 days ago)
habedi/stack-exchange-dataset • Updated Nov 29, 2023 • 82.2k rows • 344 downloads • 6 likes
Liked a dataset (25 days ago)
nthakur/bge-retrieval-data • Updated 22 days ago • 680k rows • 332 downloads • 1 like
Liked 5 models (about 1 month ago)
facebook/drama-base • Sentence Similarity • Updated about 1 month ago • 1.07k downloads • 15 likes
Qwen/Qwen2.5-7B • Text Generation • Updated Sep 25, 2024 • 419k downloads • 162 likes
intfloat/e5-mistral-7b-instruct • Feature Extraction • Updated Apr 23, 2024 • 164k downloads • 504 likes
Qwen/Qwen2.5-3B • Text Generation • Updated Sep 20, 2024 • 294k downloads • 97 likes
nthakur/contriever-base-msmarco • Sentence Similarity • Updated Jun 9, 2022 • 2.26k downloads • 2 likes
Liked a dataset (2 months ago)
nthakur/bge-full-data • Updated Feb 4 • 1.6M rows • 209 downloads • 1 like
Liked 2 models (2 months ago)
meta-llama/Llama-3.2-1B • Text Generation • Updated Oct 24, 2024 • 3.42M downloads • 1.78k likes
Alibaba-NLP/gte-modernbert-base • Sentence Similarity • Updated Jan 24 • 106k downloads • 126 likes
Liked a dataset (2 months ago)
cfli/bge-full-data • Updated Oct 11, 2024 • 541 downloads • 34 likes
Liked a dataset (5 months ago)
google/frames-benchmark • Updated Oct 15, 2024 • 824 rows • 1.95k downloads • 194 likes
Liked a dataset (7 months ago)
princeton-nlp/SWE-bench • Updated Mar 3 • 21.5k rows • 53.1k downloads • 108 likes
Liked a model (8 months ago)
BAAI/bge-reranker-v2-m3 • Text Classification • Updated Jun 24, 2024 • 1.17M downloads • 587 likes
Liked a dataset (8 months ago)
argilla/distilabel-intel-orca-dpo-pairs • Updated 16 days ago • 12.9k rows • 4.13k downloads • 172 likes