Data Is Better Together

community

Activity Feed

AI & ML interests

Building better datasets together

Recent Activity

dvilasuero updated a dataset about 22 hours ago

data-is-better-together/fineweb-c-progress

davidberenstein1957 updated a collection 1 day ago

Open Image Preferences

nataliaElv updated a Space 3 days ago

data-is-better-together/fineweb-communications-pack


data-is-better-together's activity

burtenshaw

posted an update about 19 hours ago

Post

1082

People are flexing their end-of-year stats, so I made this app to show Hub stats in a tidy design!

Thanks @Ameeeee and @jfcalvo for the feature from Argilla!
burtenshaw/recap

1 reply

dvilasuero

updated a dataset about 22 hours ago

data-is-better-together/fineweb-c-progress

Viewer • Updated about 22 hours ago • 281 • 414 • 1

davidberenstein1957

posted an update 1 day ago

Post

846

🐇 Tumble down the AI rabbit hole without any technical knowledge!

Explore AI models on the Hub with a simple, quick search.

Demo: davidberenstein1957/transformers-pipeline-playground

davidberenstein1957

updated a collection 1 day ago

Open Image Preferences

Collection

Containing all artifacts for the Stable Diffusion 3.5L vs Flux Dev image preference community sprint. • 14 items • Updated 1 day ago • 4

sayakpaul

posted an update 2 days ago

Post

1377

In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.

1 reply

nataliaElv

posted an update 3 days ago

Post

1550

If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video!

I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!

https://www.youtube.com/watch?v=_-ORB4WAVGU

nataliaElv

updated a Space 3 days ago

Running

🌐📢

FineWeb 2 Communications Pack

davidberenstein1957

updated a collection 3 days ago

Open Image Preferences

Collection

Containing all artifacts for the Stable Diffusion 3.5L vs Flux Dev image preference community sprint. • 14 items • Updated 1 day ago • 4

davidberenstein1957

in data-is-better-together/open-image-preferences-v1-results 4 days ago

170'000 additional annotations

#3 opened 4 days ago by jasoncorkill

davidberenstein1957

posted an update 4 days ago

Post

4006

Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: a simple step-by-step process that makes dataset creation a non-technical breeze, so anyone can create datasets and models in minutes without writing any code.

Blog: https://huggingface.co/blog/synthetic-data-generator
Space: argilla/synthetic-data-generator

4 replies

davidberenstein1957

updated a collection 8 days ago

Open Image Preferences

Collection

Containing all artifacts for the Stable Diffusion 3.5L vs Flux Dev image preference community sprint. • 14 items • Updated 1 day ago • 4

librarian-bot

updated a dataset 8 days ago

data-is-better-together/open-image-preferences-v1-binarized

Viewer • Updated 11 days ago • 7.46k • 1.43k • 36

librarian-bot

in data-is-better-together/open-image-preferences-v1-binarized 8 days ago

Librarian Bot: Add language metadata for dataset

#2 opened 8 days ago by librarian-bot

nataliaElv

posted an update 9 days ago

Post

1228

How do your annotations for FineWeb2 compare to your teammates'?

I started contributing some annotations to the FineWeb2 collaborative annotation sprint and I wanted to know if my labelling trends were similar to those of my teammates.

I did some analysis and I wasn't surprised to see that I'm being a bit harsher in my evaluations than my mates 😂

Do you want to see how your annotations compare to others?
👉 Go to this Gradio space: nataliaElv/fineweb2_compare_my_annotations
✍️ Enter the dataset that you've contributed to and your Hugging Face username.

How were your results?
- Contribute some annotations: data-is-better-together/fineweb-c
- Join your language channel in Rocket.Chat: HuggingFaceFW/discussion

frascuchon

updated a Space 9 days ago

Running on CPU Upgrade

🌐

FineWeb-c - Annotation

dvilasuero

updated a Space 9 days ago

Running

🌐📊

FineWeb 2 - Community Leaderboard

davidberenstein1957

updated a model 10 days ago

data-is-better-together/open-image-preferences-v1-flux-dev-lora

Text-to-Image • Updated 10 days ago • 364 • 17

davidberenstein1957

in data-is-better-together/open-image-preferences-v1-flux-dev-lora 10 days ago

Add generated example

#10 opened 10 days ago by davidberenstein1957

dvilasuero

updated a Space 10 days ago

Running

🌐📢

FineWeb 2 Communications Pack

burtenshaw

posted an update 10 days ago

Post

2327

Quick update from week 1 of smol course. The community is taking the driving seat and using the material for their own projects. If you want to do the same, join in!

- We have ongoing translation projects in Korean, Vietnamese, Portuguese, and Spanish.
- 3 chapters are ready for students, on topics like instruction tuning, preference alignment, and parameter-efficient fine-tuning.
- 3 chapters are in progress, on evaluation, vision language models, and synthetic data.
- Around 780 people have forked the repo to use it for learning, teaching, and sharing.

⏭️ The next step is to support people who want to use the course for teaching, content creation, internal knowledge sharing, or anything else. If you're into this, drop an issue or PR.

Repo: https://buff.ly/3ZCMKX2
Discord channel: https://buff.ly/4f9F8jA

Team members 15
