identrics/wasper_propaganda_detection_en

Nikola299 committed on Aug 19, 2024

Commit c68b60f (verified) · 1 Parent(s): 3e7e156

Update README.md

Files changed (1)

README.md +46 -36

@@ -1,57 +1,67 @@
 ---
 license: apache-2.0
-base_model: google-bert/bert-base-cased
 tags:
-- generated_from_trainer
-metrics:
-- accuracy
-model-index:
-- name: bert-base-cased_v3
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# bert-base-cased_v3
-This model is a fine-tuned version of [google-bert/bert-base-cased](https://huggingface.co/google-bert/bert-base-cased) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 1.2443
-- Accuracy: 0.7326
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 32
-- eval_batch_size: 16
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- num_epochs: 11.0
-### Training results
-### Framework versions
-- Transformers 4.43.0.dev0
-- Pytorch 2.0.1+cu117
-- Datasets 2.20.0
-- Tokenizers 0.19.1

 ---
+base_model: INSAIT-Institute/BgGPT-7B-Instruct-v0.2
+library_name: peft
 license: apache-2.0
+language:
+- en
 tags:
+- propaganda
 ---
+# Model Card for identrics/BG_propaganda_detector
+## Model Description
+- **Developed by:** Identrics
+- **Language:** English
+- **License:** apache-2.0
+- **Finetuned from model:** google-bert/bert-base-cased
+- **Context window:** 512 tokens
+## Model Description
+This model is a fine-tuned version of google-bert/bert-base-cased for a propaganda detection task. It is effectively a binary classifier that determines whether propaganda is present in the input string.
+This model was created by [`Identrics`](https://identrics.ai/), in the scope of the Wasper project.
+## Uses
+Intended for use as a binary classifier that identifies whether propaganda is present in a string containing a comment from a social media site.
+### Example
+First install direct dependencies:
+```
+pip install transformers torch accelerate
+```
+Then the model can be downloaded and used for inference:
+```py
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
+model = AutoModelForSequenceClassification.from_pretrained("identrics/EN_propaganda_detector", num_labels=2)
+tokenizer = AutoTokenizer.from_pretrained("identrics/EN_propaganda_detector")
+# Tokenize a single comment and return PyTorch tensors
+tokens = tokenizer("Our country is the most powerful country in the world!", return_tensors="pt")
+# Forward pass; the logits hold one raw score per class
+output = model(**tokens)
+print(output.logits)
+```
+## Training Details
+Trained on a corpus of 200 human-generated comments, augmented with 200 more synthetic comments...
+Achieved an F1 score of x%
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+- PEFT 0.11.1
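
The README example in this diff prints raw logits, which are not directly readable as a decision. The following is a minimal sketch, not part of the commit, of how two logits could be turned into class probabilities with a softmax and a predicted class index. It uses plain Python so it runs without the model; the example logit values are made up, and which index corresponds to "propaganda" is an assumption the README does not confirm.

```python
import math


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (index of the most probable class, its probability)."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs[index]


# Hypothetical logits for one comment, e.g. output.logits[0].tolist()
# from the README example. The values here are illustrative only.
example_logits = [-1.2, 0.8]

index, confidence = predict(example_logits)
# The meaning of each index (which one is "propaganda") is an assumption;
# check the model's config (id2label) before relying on it.
print(f"predicted class index: {index}, probability: {confidence:.3f}")
```

With the actual model, the same conversion is usually done with `torch.softmax(output.logits, dim=-1)`; the pure-Python version above only illustrates the arithmetic.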