argilla/alpaca-garbage-collector-multilingual

dvilasuero HF staff committed on Apr 3, 2023

Commit 9f2d78b • 1 Parent(s): f057b32

Update README.md

Files changed (1)

README.md +35 -5

@@ -10,10 +10,13 @@ datasets:
 ---
 # 😵‍💫🦙 Alpaca HalluciHunter
-<img src="front-image.png" alt="Alpaca Cleaned" width="200" height="150" >
-This is a cross-lingual SetFit model [SetFit model](https://github.com/huggingface/setfit) to detect potentially bad instructions from Alpaca (and likely other synthetically generated instruction datasets).
 The model has been fine-tuned with 1,000 labeled examples from the AlpacaCleaned dataset. It leverages a multilingual sentence transformer `paraphrase-multilingual-mpnet-base-v2`, inspired by the findings from the SetFit paper (Section 6. Multilingual experiments.), where they trained models in English that performed well across languages.
@@ -23,8 +26,6 @@ It's a binary classifier with two labels:
 - `BAD INSTRUCTION`, there's an issue with the instruction, and/or input and output.
-This model can greatly speed up the validation of Alpaca Datasets, flagging examples that need to be fixed or simply discarded.
 ## Usage
 To use this model for inference, first install the SetFit library:
@@ -79,7 +80,7 @@ def get_predictions(texts):
 ds = ds.map(lambda batch: {"prediction": list(get_predictions(batch["text"]))}, batched=True)
 ```
-Load the data into Argilla for exploration and validation. You [need to launch Argilla](https://www.argilla.io/blog/launching-argilla-huggingface-hub):
 ```python
 # Replace api_url with your HF Spaces URL if using Spaces
 # Replace api_key if you configured a custom API key
@@ -92,8 +93,37 @@ rg_dataset = rg.DatasetForTextClassification().from_datasets(ds)
 rg.log(records=rg_dataset, name="alpaca_to_clean")
 ```
 ## Examples
 ## BibTeX entry and citation info

 ---
 # 😵‍💫🦙 Alpaca HalluciHunter
+This is a cross-lingual [SetFit model](https://github.com/huggingface/setfit) that detects potentially bad instructions from Alpaca. It can greatly speed up the validation of Alpaca datasets by flagging examples that need to be fixed or simply discarded.
+<div style="text-align:center;width:50%">
+    <img src="https://huggingface.co/argilla/alpaca-hallucihunter-multilingual/resolve/main/front-image.png" alt="Alpaca Cleaned">
+</div>
 The model has been fine-tuned with 1,000 labeled examples from the AlpacaCleaned dataset. It leverages a multilingual sentence transformer `paraphrase-multilingual-mpnet-base-v2`, inspired by the findings from the SetFit paper (Section 6. Multilingual experiments.), where they trained models in English that performed well across languages.
 - `BAD INSTRUCTION`, there's an issue with the instruction, and/or input and output.
 ## Usage
 To use this model for inference, first install the SetFit library:
 ds = ds.map(lambda batch: {"prediction": list(get_predictions(batch["text"]))}, batched=True)
 ```
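The body of `get_predictions` is elided in this diff, so the following is only a minimal sketch of the general idea: turning per-class probabilities into label/score records that a batched `map` call can attach as a `prediction` column. The names `ID2LABEL`, `to_label_scores`, and `fake_predict_proba` are hypothetical, and `"OTHER"` is a placeholder because only the `BAD INSTRUCTION` label appears in this diff. A stub stands in for the real classifier so the example runs without downloading the model.

```python
from typing import Dict, Iterable, Iterator, List, Sequence

# Hypothetical label order. Only `BAD INSTRUCTION` appears in this diff,
# so "OTHER" is a placeholder for the second label of the binary classifier.
ID2LABEL: Dict[int, str] = {0: "OTHER", 1: "BAD INSTRUCTION"}


def to_label_scores(
    probas: Iterable[Sequence[float]],
    id2label: Dict[int, str] = ID2LABEL,
) -> Iterator[List[Dict[str, object]]]:
    """Yield one list of {"label", "score"} dicts per row of probabilities."""
    for row in probas:
        yield [
            {"label": id2label[i], "score": float(score)}
            for i, score in enumerate(row)
        ]


def fake_predict_proba(texts: Sequence[str]) -> List[List[float]]:
    """Stand-in for a real classifier (an assumption, not the model's API)."""
    return [[0.2, 0.8] if "???" in text else [0.9, 0.1] for text in texts]


texts = ["Translate ???", "Add 2 and 2"]
predictions = list(to_label_scores(fake_predict_proba(texts)))
for text, prediction in zip(texts, predictions):
    print(text, prediction)
```

In a real run, the stub would be replaced by the loaded model's probability output, and the generator would be consumed inside the batched `map` call shown above.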
+Load the data into Argilla for exploration and validation. First, you [need to launch Argilla](https://www.argilla.io/blog/launching-argilla-huggingface-hub). Then run:
 ```python
 # Replace api_url with your HF Spaces URL if using Spaces
 # Replace api_key if you configured a custom API key
 rg.log(records=rg_dataset, name="alpaca_to_clean")
 ```
+## Live demo
+You can explore the dataset using this Space (credentials: `argilla` / `1234`):
+[https://huggingface.co/spaces/argilla/alpaca-hallucihunter](https://huggingface.co/spaces/argilla/alpaca-hallucihunter)
 ## Examples
+This model has been tested with English, German, and Spanish. This approach will be used by ongoing efforts for improving the quality of Alpaca-based datasets, and updates will be reflected here.
+Here are some of the highest-scored examples of `BAD INSTRUCTION`.
+### English
+<div style="text-align:center;width:50%">
+    <img src="https://huggingface.co/argilla/alpaca-hallucihunter-multilingual/resolve/main/front-image.png" alt="Alpaca Cleaned">
+</div>
+### German
+<div style="text-align:center;width:50%">
+    <img src="https://huggingface.co/argilla/alpaca-hallucihunter-multilingual/resolve/main/german-alpaca.png" alt="Alpaca Cleaned">
+</div>
+### Spanish
+<div style="text-align:center;width:50%">
+    <img src="https://huggingface.co/argilla/alpaca-hallucihunter-multilingual/resolve/main/spanish-alpaca.png" alt="Alpaca Cleaned">
+</div>
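One way to pull out the highest-scored `BAD INSTRUCTION` examples like those shown above is to rank records by that label's score. This is a sketch under assumptions: each record is taken to carry a `prediction` list of label/score dicts, the helper names `bad_score` and `top_bad_instructions` are hypothetical, and the sample records are made up for illustration.

```python
from typing import Dict, List

BAD_LABEL = "BAD INSTRUCTION"


def bad_score(record: Dict) -> float:
    """Return the score for BAD_LABEL, or 0.0 if the label is missing."""
    return next(
        (p["score"] for p in record["prediction"] if p["label"] == BAD_LABEL),
        0.0,
    )


def top_bad_instructions(records: List[Dict], k: int = 2) -> List[Dict]:
    """Return the k records with the highest BAD_LABEL score, best first."""
    return sorted(records, key=bad_score, reverse=True)[:k]


# Made-up records; "OTHER" is a placeholder for the second label.
records = [
    {"text": "a", "prediction": [{"label": "OTHER", "score": 0.7},
                                 {"label": BAD_LABEL, "score": 0.3}]},
    {"text": "b", "prediction": [{"label": "OTHER", "score": 0.05},
                                 {"label": BAD_LABEL, "score": 0.95}]},
    {"text": "c", "prediction": [{"label": "OTHER", "score": 0.4},
                                 {"label": BAD_LABEL, "score": 0.6}]},
]

top = top_bad_instructions(records, k=2)
print([r["text"] for r in top])
```

Sorting by score rather than thresholding keeps the review queue ordered, so the most suspicious instructions can be validated first.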
 ## BibTeX entry and citation info