---
tags:
- autotrain
- text-classification
widget:
- text: "I love AutoTrain"
datasets:
- few-shot-learning-classification-bert-sm-5K-32/autotrain-data
---

# Model Trained Using AutoTrain

- Problem type: Text Classification
  
# Publisher Info
- Publisher: PRAVIN SURESH TAWADE
- Co-Publisher: Dr. JAYA KRISHNA GUTHA
  
## Validation Metrics
loss: 0.25288185477256775

f1_macro: 0.9137712253628689

f1_micro: 0.914

f1_weighted: 0.9137712253628689

precision_macro: 0.9140401620479054

precision_micro: 0.914

precision_weighted: 0.9140401620479053

recall_macro: 0.9140000000000001

recall_micro: 0.914

recall_weighted: 0.914

accuracy: 0.914
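
The micro-averaged scores above all equal the accuracy (0.914), which is what single-label multiclass classification predicts: every misclassification counts as one false positive and one false negative. The following is a minimal pure-Python sketch of that relationship, not the code AutoTrain runs. The function names and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred, labels):
    tp, fp, fn = per_class_counts(y_true, y_pred)
    # Macro: average the per-class F1 scores, each class weighted equally.
    macro = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

# Toy labels (hypothetical, not taken from the validation set):
# 0 = World, 1 = Sports, 2 = Business, 3 = Sci/Tech
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]

macro, micro = macro_micro_f1(y_true, y_pred, labels=[0, 1, 2, 3])
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"macro F1={macro:.4f}  micro F1={micro:.4f}  accuracy={acc:.4f}")
```

The weighted scores reported above also match the macro scores almost exactly, which would be expected if the validation classes are balanced; that is an inference from the numbers, not something the card states.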

## Data in depth
Media and content companies are one potential business application of few-shot text classification with the AG News dataset. They could use this technology to categorize news articles into topics such as world, sports, business, and technology with minimal labeled data. This would make news content easier to manage and retrieve, and could improve user satisfaction through personalized news feeds. Such a model would also let these companies promptly adapt their classification to new categories or rapidly emerging topics.

Because reusing familiar source material could skew how the results of my adaptation are perceived, I preferred not to work with the same data I encountered during the course. Instead, I wanted a diverse text dataset in which only a limited number of labeled examples is used for each class. To evaluate the model's effectiveness, I would also consider varying the domains and document types. The dataset I selected is AG’s News Corpus, which is available on Hugging Face. This collection of news articles is divided into four classes: World, Sports, Business, and Sci/Tech. Each class has 30,000 training samples and 1,900 test samples.

- Dataset size: 31.3 MB
- Total rows across splits: 127,600 (120,000 training and 7,600 test)

- Data Fields:
   - `text`: The news article text, stored as a string.
   - `label`: The class label, one of World (0), Sports (1), Business (2), or Sci/Tech (3).
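
As a rough illustration of the few-shot setup described above, the sketch below draws a balanced subset with the same number of examples per class and checks the split arithmetic. It is an assumption-laden example rather than the procedure used to train this model: the helper `sample_k_per_class`, the toy rows, and the value `k=3` are all hypothetical, and the rows only mimic the text/label structure listed above.

```python
import random
from collections import Counter, defaultdict

# Label ids and names as listed in the data fields above.
LABEL_NAMES = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}

# Per-class sizes stated in the card; the totals follow by arithmetic.
TRAIN_PER_CLASS = 30_000
TEST_PER_CLASS = 1_900
total_rows = len(LABEL_NAMES) * (TRAIN_PER_CLASS + TEST_PER_CLASS)

def sample_k_per_class(rows, k, seed=0):
    """Return k randomly chosen rows for every label (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < k:
            raise ValueError(f"label {label} has only {len(pool)} rows, need {k}")
        subset.extend(rng.sample(pool, k))
    return subset

# Toy stand-in for the training split: 10 fake rows per class.
toy_rows = [
    {"text": f"{name} headline {i}", "label": label}
    for label, name in LABEL_NAMES.items()
    for i in range(10)
]

few_shot = sample_k_per_class(toy_rows, k=3)
counts = Counter(row["label"] for row in few_shot)
print(total_rows, dict(counts))
```

Sampling per class rather than from the whole pool keeps the subset balanced, which matters when only a handful of labeled examples is used.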