Commit 3d0e234 by tamang0000
Parent(s): 3250b7d

Update README.md

README.md CHANGED
@@ -13,8 +13,6 @@ tags:
 
 # Assamese Tokenizer (50K Vocabulary)
 
-[![Downloads](https://img.shields.io/github/downloads/tamang0000/assamese-tokenizer-50k/total.svg)](https://github.com/tamang0000/assamese-tokenizer-50k/releases)
-
 ## Model Details
 
 This repository contains a custom tokenizer for the Assamese language with a vocabulary size of 50,000 tokens. The tokenizer was trained on the Assamese language subset of the CC-100 multilingual dataset. This tokenizer can be used for various Natural Language Processing (NLP) tasks involving the Assamese language.