VictorSanh
commited on
Commit
·
a6e5964
1
Parent(s):
40ba628
integrate Giada's comments
Browse files
README.md
CHANGED
@@ -26,7 +26,9 @@ IDEFICS is on par with the original model on various image-text benchmarks, incl
|
|
26 |
|
27 |
We also fine-tune these base models on a mixture of supervised and instruction fine-tuning datasets, which boosts the downstream performance while making the models more usable in conversational settings: [idefics-80b-instruct](https://huggingface.co/HuggingFaceM4/idefics-80b-instruct) and [idefics-9b-instruct](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct). As they reach higher performance, we recommend using these instructed versions first.
|
28 |
|
29 |
-
Read more about some of the technical challenges encountered during training IDEFICS [here](https://github.com/huggingface/m4-logs/blob/master/memos/README.md).
|
|
|
|
|
30 |
|
31 |
# Model Details
|
32 |
|
@@ -356,9 +358,9 @@ When looking at the response to the arrest prompt for the FairFace dataset, the
|
|
356 |
Comparing generated responses to the resume prompt by gender across both datasets, we see for FairFace that the terms `financial`, `development`, `product` and `software` appear more frequently for `man`. For StableBias, the terms `data` and `science` appear more frequently for `non-binary`.
|
357 |
|
358 |
![Notebook Screenshot](https://huggingface.co/spaces/HuggingFaceM4/m4-bias-eval/resolve/main/bias_nb_screenshot.png)
|
359 |
-
The [notebook](https://huggingface.co/spaces/HuggingFaceM4/m4-bias-eval/blob/main/m4_bias_eval.ipynb) used to carry out this evaluation gives a more detailed overview of the evaluation.
|
360 |
-
You can access a [demo](https://huggingface.co/spaces/HuggingFaceM4/IDEFICS-bias-eval) to explore the outputs generated by the model for this evaluation.
|
361 |
-
You can also access the generations produced in this evaluation at [HuggingFaceM4/m4-bias-eval-stable-bias](https://huggingface.co/datasets/HuggingFaceM4/m4-bias-eval-stable-bias) and [HuggingFaceM4/m4-bias-eval-fair-face](https://huggingface.co/datasets/HuggingFaceM4/m4-bias-eval-fair-face). We hope sharing these generations will make it easier for other people to build on our initial evaluation work.
|
362 |
|
363 |
Alongside this evaluation, we also computed the classification accuracy on FairFace for both the base and instructed models:
|
364 |
|
@@ -372,7 +374,23 @@ Alongside this evaluation, we also computed the classification accuracy on FairF
|
|
372 |
*Per bucket standard deviation. Each bucket represents a combination of race and gender from the [FairFace](https://huggingface.co/datasets/HuggingFaceM4/FairFace) dataset.
|
373 |
## Other limitations
|
374 |
|
375 |
-
- The model currently will offer medical diagnosis when prompted to do so. For example, the prompt `Does this X-ray show any medical problems?` along with an image of a chest X-ray returns `Yes, the X-ray shows a medical problem, which appears to be a collapsed lung
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
376 |
|
377 |
# License
|
378 |
|
@@ -401,4 +419,4 @@ Stas Bekman, Léo Tronchon, Hugo Laurençon, Lucile Saulnier, Amanpreet Singh, A
|
|
401 |
|
402 |
# Model Card Contact
|
403 |
|
404 |
-
Please open a discussion on the Community tab!
|
|
|
26 |
|
27 |
We also fine-tune these base models on a mixture of supervised and instruction fine-tuning datasets, which boosts the downstream performance while making the models more usable in conversational settings: [idefics-80b-instruct](https://huggingface.co/HuggingFaceM4/idefics-80b-instruct) and [idefics-9b-instruct](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct). As they reach higher performance, we recommend using these instructed versions first.
|
28 |
|
29 |
+
Read more about some of the technical challenges we encountered during training IDEFICS [here](https://github.com/huggingface/m4-logs/blob/master/memos/README.md).
|
30 |
+
|
31 |
+
*How do I pronounce the name? [Youtube tutorial](https://www.youtube.com/watch?v=YKO0rWnPN2I&ab_channel=FrenchPronunciationGuide)*
|
32 |
|
33 |
# Model Details
|
34 |
|
|
|
358 |
Comparing generated responses to the resume prompt by gender across both datasets, we see for FairFace that the terms `financial`, `development`, `product` and `software` appear more frequently for `man`. For StableBias, the terms `data` and `science` appear more frequently for `non-binary`.
|
359 |
|
360 |
![Notebook Screenshot](https://huggingface.co/spaces/HuggingFaceM4/m4-bias-eval/resolve/main/bias_nb_screenshot.png)
|
361 |
+
The [notebook](https://huggingface.co/spaces/HuggingFaceM4/m4-bias-eval/blob/main/m4_bias_eval.ipynb) used to carry out this evaluation gives a more detailed overview of the evaluation.
|
362 |
+
You can access a [demo](https://huggingface.co/spaces/HuggingFaceM4/IDEFICS-bias-eval) to explore the outputs generated by the model for this evaluation.
|
363 |
+
You can also access the generations produced in this evaluation at [HuggingFaceM4/m4-bias-eval-stable-bias](https://huggingface.co/datasets/HuggingFaceM4/m4-bias-eval-stable-bias) and [HuggingFaceM4/m4-bias-eval-fair-face](https://huggingface.co/datasets/HuggingFaceM4/m4-bias-eval-fair-face). We hope sharing these generations will make it easier for other people to build on our initial evaluation work.
|
364 |
|
365 |
Alongside this evaluation, we also computed the classification accuracy on FairFace for both the base and instructed models:
|
366 |
|
|
|
374 |
*Per bucket standard deviation. Each bucket represents a combination of race and gender from the [FairFace](https://huggingface.co/datasets/HuggingFaceM4/FairFace) dataset.
|
375 |
## Other limitations
|
376 |
|
377 |
+
- The model currently will offer medical diagnosis when prompted to do so. For example, the prompt `Does this X-ray show any medical problems?` along with an image of a chest X-ray returns `Yes, the X-ray shows a medical problem, which appears to be a collapsed lung.`. We strongly discourage users to use the model on medical applications without proper adaptation and evaluation.
|
378 |
+
- Despite our efforts on filtering the training data, we found a small proportion of content that is not suitable for all audience. This includes pornographic content and reports of violent shootings and is prevalent in the OBELICS portion of the data (see [here](https://huggingface.co/datasets/HuggingFaceM4/OBELICS#content-warnings) for more details). As such, the model is suceptible to generate text that resemble these content.
|
379 |
+
|
380 |
+
# Misuse and Out-of-scope use
|
381 |
+
|
382 |
+
Using the model in [high-stakes](https://huggingface.co/bigscience/bloom/blob/main/README.md#glossary-and-calculations) settings is out of scope for this model. The model is not designed for [critical decisions](https://huggingface.co/bigscience/bloom/blob/main/README.md#glossary-and-calculations) nor uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but may not be correct. Out-of-scope uses include:
|
383 |
+
- Usage for evaluating or scoring individuals, such as for employment, education, or credit
|
384 |
+
- Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct
|
385 |
+
|
386 |
+
Intentionally using the model for harm, violating [human rights](https://huggingface.co/bigscience/bloom/blob/main/README.md#glossary-and-calculations), or other kinds of malicious activities, is a misuse of this model. This includes:
|
387 |
+
- Spam generation
|
388 |
+
- Disinformation and influence operations
|
389 |
+
- Disparagement and defamation
|
390 |
+
- Harassment and abuse
|
391 |
+
- [Deception](https://huggingface.co/bigscience/bloom/blob/main/README.md#glossary-and-calculations)
|
392 |
+
- Unconsented impersonation and imitation
|
393 |
+
- Unconsented surveillance
|
394 |
|
395 |
# License
|
396 |
|
|
|
419 |
|
420 |
# Model Card Contact
|
421 |
|
422 |
+
Please open a discussion on the Community tab!
|