Model V0 Datacard Update #2
opened by Tihsrah-CD

README.md (CHANGED)
@@ -9,13 +9,13 @@ pipeline_tag: text-classification
 
 # Topic Classifier
 
-This repository contains the
+This repository contains the Pebblo Classifier model developed by DAXA.AI. The Pebblo Classifier is a machine learning model designed to categorize text documents across various domains, such as corporate documents, financial texts, harmful content, and medical documents.
 
 ## Model Details
 
 ### Model Description
 
-The
+The Pebblo Classifier is a BERT-based model, fine-tuned from the distilbert-base-uncased model. It is intended for categorizing text into specific topics, including "CORPORATE_DOCUMENTS," "FINANCIAL," "HARMFUL," and "MEDICAL." This model streamlines text classification tasks across multiple sectors, making it suitable for various business use cases.
 
 - **Developed by:** DAXA.AI
 - **Funded by:** Open Source
@@ -26,14 +26,14 @@ The Topic Classifier is a BERT-based model, fine-tuned from the `distilbert-base
 
 ### Model Sources
 
-- **Repository:** [https://huggingface.co/daxa-ai/
+- **Repository:** [https://huggingface.co/daxa-ai/pebblo-classifier-v2](https://huggingface.co/daxa-ai/pebblo-classifier-v2)
 - **Demo:** [https://huggingface.co/spaces/daxa-ai/Topic-Classifier-2](https://huggingface.co/spaces/daxa-ai/Topic-Classifier-2)
 
 ## Usage
 
 ### How to Get Started with the Model
 
-To use the
+To use the Pebblo Classifier in your Python project, you can follow the steps below:
 
 ```python
 # Import necessary libraries
@@ -43,8 +43,8 @@ import joblib
 from huggingface_hub import hf_hub_url, cached_download
 
 # Load the tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained("daxa-ai/
-model = AutoModelForSequenceClassification.from_pretrained("daxa-ai/
+tokenizer = AutoTokenizer.from_pretrained("daxa-ai/pebblo-classifier-v2")
+model = AutoModelForSequenceClassification.from_pretrained("daxa-ai/pebblo-classifier-v2")
 
 # Example text
 text = "Please enter your text here."
@@ -58,7 +58,7 @@ probabilities = torch.nn.functional.softmax(output.logits, dim=-1)
 predicted_label = torch.argmax(probabilities, dim=-1)
 
 # URL of your Hugging Face model repository
-REPO_NAME = "daxa-ai/
+REPO_NAME = "daxa-ai/pebblo-classifier-v2"
 
 # Path to the label encoder file in the repository
 LABEL_ENCODER_FILE = "label_encoder.joblib"
@@ -161,6 +161,6 @@ def predict_fn(data, model_and_tokenizer):
 
 ## Conclusion
 
-The
+The Pebblo Classifier achieves high accuracy, precision, recall, and F1-score, making it a reliable model for categorizing text across the domains of corporate documents, financial content, harmful content, and medical texts. The model is optimized for immediate deployment and works efficiently in real-world applications.
 
 For more information or to try the model yourself, check out the public space [here](https://huggingface.co/spaces/daxa-ai/Topic-Classifier-2).
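The usage snippet in the updated README only appears here as diff fragments, so the full flow is not visible in this view. Below is a minimal sketch that assembles the visible pieces into one runnable example; it assumes the label encoder is a scikit-learn `LabelEncoder` serialized with joblib, and it substitutes `hf_hub_download` for the older `hf_hub_url`/`cached_download` pair imported in the README snippet. The repository name, filename, and tensor operations come from the diff; the tokenizer arguments and the final `inverse_transform` step are assumptions, not the repository's exact code.

```python
# Minimal sketch assembled from the diff fragments above (not the repo's exact README code).
import joblib
import torch
from huggingface_hub import hf_hub_download  # substitute for hf_hub_url/cached_download
from transformers import AutoModelForSequenceClassification, AutoTokenizer

REPO_NAME = "daxa-ai/pebblo-classifier-v2"
LABEL_ENCODER_FILE = "label_encoder.joblib"

# Load the tokenizer and model from the Hub
tokenizer = AutoTokenizer.from_pretrained(REPO_NAME)
model = AutoModelForSequenceClassification.from_pretrained(REPO_NAME)

# Tokenize an example text and run a forward pass (arguments assumed, not shown in the diff)
text = "Please enter your text here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    output = model(**inputs)

# Convert logits to probabilities and take the most likely class index
probabilities = torch.nn.functional.softmax(output.logits, dim=-1)
predicted_label = torch.argmax(probabilities, dim=-1)

# Download the label encoder and map the class index back to a topic name
# (assumes a scikit-learn LabelEncoder saved with joblib, as the filename suggests)
encoder_path = hf_hub_download(repo_id=REPO_NAME, filename=LABEL_ENCODER_FILE)
label_encoder = joblib.load(encoder_path)
print(label_encoder.inverse_transform(predicted_label.numpy()))
```

If the label encoder matches the topics listed in the model description, the printed value should be one of "CORPORATE_DOCUMENTS", "FINANCIAL", "HARMFUL", or "MEDICAL".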