Yacine Jernite

yjernite

https://yjernite.github.io/

YJernite

yjernite

AI & ML interests

Technical, community, and regulatory tools of AI governance @HuggingFace

Articles

📚 Training Data Transparency in AI: Tools, Trends, and Policy Recommendations 🗳️

Dec 5, 2023

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023 • 11

AI Policy @🤗: Open ML Considerations in the EU AI Act

Jul 24, 2023

AI Policy @🤗: Response to the U.S. NTIA's Request for Comment on AI Accountability

Jun 20, 2023

Hugging Face Selected for the French Data Protection Agency Enhanced Support Program

May 15, 2023

Ethics and Society Newsletter #3: Ethical Openness at Hugging Face

Mar 30, 2023

Ethics and Society Newsletter #2: Let's talk about bias!

Dec 15, 2022

Putting ethical principles at the core of research lifecycle

May 19, 2022

Introducing the Data Measurements Tool: an Interactive Tool for Looking at Datasets

Nov 29, 2021

yjernite's activity

upvoted an article 8 days ago

AI has a problem with objectifying women

Article • 8 days ago • 52

upvoted an article 10 days ago

Let's talk about LLM evaluation

Article • 9 days ago • 82

upvoted 2 collections 15 days ago

CommonCanvas

Collection

Collection of models trained on the CommonCatalogue datasets • 8 items • Updated 15 days ago • 6

CommonCatalog

Collection

Common Catalog, a dataset with Creative Commons licensed images and machine-generated caption pairs • 8 items • Updated 15 days ago • 7

upvoted a collection 16 days ago

Wikimedia Datasets

Collection

Wikimedia datasets, across languages and modalities, from different Wikimedia projects, on the hub. Not all tested. • 19 items • Updated 16 days ago • 9

upvoted an article 17 days ago

Energy Star Ratings for AI Models

Article • 23 days ago • 15

upvoted an article 29 days ago

⚗️ 🧑🏼‍🌾 Let's grow some Domain Specific Datasets together

Article • Apr 29 • 27

upvoted an article about 1 month ago

Releasing Youtube-Commons: a massive open corpus for conversational and multimodal data

Article • Apr 18 • 20

upvoted a paper about 2 months ago

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 26

upvoted 3 articles about 2 months ago

Vision Language Models Explained

Article • Apr 11 • 90

Policy Questions Blog 1: AI Data Transparency Remarks for NAIAC Panel 📚🔍⚖️

Article • Mar 27 • 1

Public Policy at Hugging Face

Article • Apr 8 • 17

upvoted 2 collections 2 months ago

Creación de corpus en comunidad

Collection

A collection of collaborative efforts to create high-quality corpora in Spanish. Any Spanish speaker can contribute :) • 7 items • Updated 25 days ago • 6

Common Corpus

Collection

The largest public domain dataset for training LLMs. • 26 items • Updated Mar 20 • 103

upvoted 3 collections 3 months ago

Chronos Models

Collection

Chronos: Pretrained (language) models for time series forecasting based on the T5 architecture. • 6 items • Updated Mar 18 • 25

MetricX-23

Collection

A collection of MetricX-23 models (https://aclanthology.org/2023.wmt-1.63/) • 6 items • Updated 17 days ago • 13

Awesome Document AI

Collection

A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11 • 38

upvoted a paper 3 months ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29 • 125

upvoted 3 collections 3 months ago

🇮🇹 Italian NLP Resources

Collection

Collection of models, datasets and demos relevant to Italian NLP 🇮🇹 • 182 items • Updated about 17 hours ago • 18

Nomic Embed

Collection

Open Source Long Context Text Embedders • 8 items • Updated Feb 14 • 8

Sora Reference Papers

Collection

A collection of all papers referenced in OpenAI's "Video generation models as world simulators" technical report • openai.com/sora • 30 items • Updated Feb 20 • 50

upvoted 5 collections 4 months ago

GritLM

Collection

Generative Representational Instruction Tuning (GRIT) • 64 items • Updated Apr 17 • 4

⛔️🔦 Provenance, Watermarking & Deepfake Detection

Collection

Technical tools for more control over non-consensual synthetic content • 14 items • Updated Apr 1 • 36

Historic Newspaper Datasets

Collection

Historic Newspaper Datasets on the Hub • 13 items • Updated 30 days ago • 3

🔍 Daily Picks in Interpretability & Analysis of LMs

Collection

Outstanding research in interpretability and evaluation of language models, summarized • 51 items • Updated about 17 hours ago • 54

WAVES

Collection

Benchmarking the Robustness of Image Watermarks. Under development. Data will be released soon. • 2 items • Updated Jan 24 • 2

upvoted a collection 5 months ago

Zeroshot Classifiers

Collection

These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated Apr 3 • 79

upvoted 2 collections 6 months ago

Korean Datasets I've released so far.

Collection

A collection of the Korean datasets I have uploaded so far. • 8 items • Updated 7 days ago • 14

Tulu V2 Suite

Collection

The set of models associated with the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2" • 19 items • Updated Feb 1 • 43

upvoted 6 collections 7 months ago

Radiology

Collection

7 items • Updated Nov 7, 2023 • 5

Custom Components ✨

Collection

Awesome Gradio custom components to get you started building your own! • 7 items • Updated Nov 20, 2023 • 31

Reward models on the hub

Collection

UNMAINTAINED: See RewardBench... A place to collect reward models, an often not released artifact of RLHF. • 18 items • Updated Apr 13 • 24

Biomedical Demos

Collection

Some of my favorite biomedical demos • 8 items • Updated Dec 1, 2023 • 2

Medical QA Datasets

Collection

A collection of medical question answering (QA) datasets • 19 items • Updated Oct 31, 2023 • 16

Leaderboards and benchmarks ✨

Collection

Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 62 items • Updated 11 days ago • 62

upvoted a paper 7 months ago

Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness

Paper • 2302.10893 • Published Feb 7, 2023 • 5

upvoted 2 collections 7 months ago

AI Ethics projects in Spanish

Collection

Datasets, models and spaces related to hate speech detection and bias evaluation in Spanish. • 17 items • Updated Apr 13 • 6

Resources: Bias, Stereotypes, and Representational Harms

Collection

Linking collected resources for this category that have a dataset, model, or demo on Hugging Face or a paper on arXiv (linked through Hugging Face) • 20 items • Updated Feb 17 • 1

upvoted 2 collections 8 months ago

Sourced from Wikimedia

Collection

Wikimedia collections, i.e. Wikipedia, are heavily used in ML research. This collection highlights some prominent examples of these datasets. • 9 items • Updated 3 days ago • 2

DIY AI For Journalists

Collection

Compiling resources useful for journalists building prototypes with AI • 8 items • Updated Sep 18, 2023 • 10

upvoted 2 papers 8 months ago

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks

Paper • 2309.17410 • Published Sep 29, 2023 • 4

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Paper • 2309.01219 • Published Sep 3, 2023 • 2

upvoted 13 collections 8 months ago

Domain specific data and model documentation

Collection

There is a growing number of datasheets or model card frameworks being proposed for particular domains. This collection tries to capture some of these • 5 items • Updated Oct 5, 2023 • 2

Domain specific data and model documentation

Collection

There is a growing number of datasheets or model card frameworks being proposed for particular domains. This collection tries to capture some of these • 6 items • Updated Oct 5, 2023 • 1

Christopher

Collection

You can find the best of Christopher's work here • 13 items • Updated Nov 8, 2023 • 1

Ceyda

Collection

You can find the best of Ceyda's work here • 12 items • Updated Nov 8, 2023 • 1

Aritra

Collection

You can find the best of Aritra's work here • 9 items • Updated Nov 8, 2023 • 1

🏛️📚🖼️ Open Data: Public Domain and Open Licenses

Collection

9 items • Updated Feb 9 • 4

💻🔍 Understanding Models

Collection

23 items • Updated Mar 26 • 5

📚🔍 Understanding Datasets

Collection

9 items • Updated Feb 9 • 5

📊 Benchmarks and Leaderboards

Collection

33 items • Updated 9 days ago • 5

🤬⛔ Hate Speech and Filtering

Collection

3 items • Updated Feb 9 • 3

🔒☂️🧑‍🤝‍🧑 Privacy and AI

Collection

8 items • Updated Apr 4 • 5

⚖️ Showing Biases in ML Systems

Collection

9 items • Updated Feb 9 • 4

🗳️ AI for Policymakers

Collection

AI systems have much to offer to policymakers, both as a tool to support their work and as a technology that can improve access to public services. • 13 items • Updated Mar 8 • 7

Articles

Public Policy at Hugging Face

Policy Questions Blog 1: AI Data Transparency Remarks for NAIAC Panel 📚🔍⚖️

AI Watermarking 101: Tools and Techniques
