---
license: cc-by-4.0
language:
- en
tags:
- bert
datasets:
- L3Cube-IndicNews 
widget:
- text: 'BCCI took action against Mumbai Indians batter Tim David and batting coach Kieron Pollard after they were found guilty of breaching the IPL Code of Conduct during their match against the Punjab Kings in Mullanpur on Thursday. "Mumbai Indians batter Tim David and batting coach Kieron Pollard have been fined for breaching the IPL’s Code of Conduct during their team’s Tata Indian Premier League (IPL) 2024 match against Punjab Kings at the PCA New International Cricket Stadium, Mullanpur on April 18," BCCI said.'

---

## English-Doc-Topic-BERT
English-Doc-Topic-BERT is a BERT-Base-uncased model fine-tuned on English documents from the L3Cube-IndicNews Corpus ([dataset link](https://github.com/l3cube-pune/indic-nlp)). <br>
The dataset consists of three sub-datasets with different document lengths: LDC (Long Document Classification), LPC (Long Paragraph Classification), and SHC (Short Headlines Classification). <br>
This model is trained on a combination of all three variants and works well across different document sizes.
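
A minimal usage sketch, assuming the model loads through the standard `transformers` text-classification pipeline under the repo id `l3cube-pune/english-topic-all-doc` (the id used in the model list on this card). The helper names `load_classifier`, `classify`, and `best_label` are illustrative and not part of any library, and the label names returned depend on the model's own configuration.

```python
# Repo id taken from the model list on this card.
MODEL_ID = "l3cube-pune/english-topic-all-doc"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so the pure-Python helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def best_label(predictions):
    """Return (label, score) of the highest-scoring entry in a pipeline result."""
    top = max(predictions, key=lambda p: p["score"])
    return top["label"], top["score"]


def classify(text: str, classifier=None):
    """Classify one document and return its most likely topic with its score."""
    classifier = classifier or load_classifier()
    # top_k=None requests a score for every label; truncation=True clips
    # inputs that exceed the model's maximum sequence length.
    predictions = classifier(text, top_k=None, truncation=True)
    return best_label(predictions)
```

Calling `classify("BCCI took action against Mumbai Indians batter Tim David ...")` would return a `(label, score)` tuple. Passing an existing pipeline as `classifier` avoids reloading the model for every document.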

More details on the dataset, models, and baseline results can be found in our [paper](https://arxiv.org/abs/2401.02254).

Citing:
```
@article{mirashi2024l3cube,
  title={L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages},
  author={Mirashi, Aishwarya and Sonavane, Srushti and Lingayat, Purva and Padhiyar, Tejas and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2401.02254},
  year={2024}
}
```

Document topic models for the supported languages are listed below: <br>
<a href='https://huggingface.co/l3cube-pune/hindi-topic-all-doc'> Hindi-Doc-Topic-BERT </a> <br>
<a href='https://huggingface.co/l3cube-pune/marathi-topic-all-doc-v2'> Marathi-Doc-Topic-BERT </a> <br>
<a href='https://huggingface.co/l3cube-pune/bengali-topic-all-doc'> Bengali-Doc-Topic-BERT </a> <br>
<a href='https://huggingface.co/l3cube-pune/telugu-topic-all-doc'> Telugu-Doc-Topic-BERT </a> <br>
<a href='https://huggingface.co/l3cube-pune/tamil-topic-all-doc'> Tamil-Doc-Topic-BERT </a> <br>
<a href='https://huggingface.co/l3cube-pune/gujarati-topic-all-doc'> Gujarati-Doc-Topic-BERT </a> <br>
<a href='https://huggingface.co/l3cube-pune/kannada-topic-all-doc'> Kannada-Doc-Topic-BERT </a> <br>
<a href='https://huggingface.co/l3cube-pune/odia-topic-all-doc'> Odia-Doc-Topic-BERT </a> <br>
<a href='https://huggingface.co/l3cube-pune/malayalam-topic-all-doc'> Malayalam-Doc-Topic-BERT </a> <br>
<a href='https://huggingface.co/l3cube-pune/punjabi-topic-all-doc'> Punjabi-Doc-Topic-BERT </a> <br>
<a href='https://huggingface.co/l3cube-pune/english-topic-all-doc'> English-Doc-Topic-BERT </a> <br>