	---
	language: wo
	tags:
	- bert
	- language-model
	- wo
	- wolof
	---

	# Soraberta: Unsupervised Language Model Pre-training for Wolof

	bert-base-wolof is a BERT-base model pretrained on the Wolof language.

	## Soraberta models

	| Model name | Number of layers | Attention heads | Embedding dimension | Total parameters |
	| :------: | :---: | :---: | :---: | :---: |
	| `bert-base` | 6 | 12 | 514 | 56,931,622 (≈57 M) |




	## Using Soraberta with Hugging Face's Transformers


	```python
	>>> from transformers import pipeline
	>>> unmasker = pipeline('fill-mask', model='abdouaziiz/soraberta')
	>>> unmasker("juroom naari jullit man nanoo boole jend aw nag walla <mask>.")

	[{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla gileem.',
	'score': 0.9783930778503418,
	'token': 4621,
	'token_str': ' gileem'},
	{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla jend.',
	'score': 0.009271537885069847,
	'token': 2155,
	'token_str': ' jend'},
	{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla aw.',
	'score': 0.0027585660573095083,
	'token': 704,
	'token_str': ' aw'},
	{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla pel.',
	'score': 0.001120452769100666,
	'token': 1171,
	'token_str': ' pel'},
	{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla juum.',
	'score': 0.0005133090307936072,
	'token': 5820,
	'token_str': ' juum'}]
	```
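
The pipeline returns a plain list of dictionaries, so its predictions can be post-processed with ordinary Python. The following is a minimal sketch, assuming the output shape shown above; `top_predictions` and `min_score` are hypothetical names chosen for illustration and are not part of Transformers. The sample entries are copied from the output above (first three only), so the snippet runs without downloading the model.

```python
# Sample entries copied from the fill-mask output above (first three only).
results = [
    {"sequence": "juroom naari jullit man nanoo boole jend aw nag walla gileem.",
     "score": 0.9783930778503418, "token": 4621, "token_str": " gileem"},
    {"sequence": "juroom naari jullit man nanoo boole jend aw nag walla jend.",
     "score": 0.009271537885069847, "token": 2155, "token_str": " jend"},
    {"sequence": "juroom naari jullit man nanoo boole jend aw nag walla aw.",
     "score": 0.0027585660573095083, "token": 704, "token_str": " aw"},
]


def top_predictions(results, min_score=0.005):
    """Return (token, rounded score) pairs with score >= min_score.

    Hypothetical helper: strips the leading space from `token_str`
    and keeps the order produced by the pipeline (highest score first).
    """
    return [
        (r["token_str"].strip(), round(r["score"], 4))
        for r in results
        if r["score"] >= min_score
    ]


print(top_predictions(results))
# [('gileem', 0.9784), ('jend', 0.0093)]
```

Raising `min_score` keeps only confident completions; for this sentence, any threshold above roughly 0.01 leaves `gileem` alone.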

	## Training data
	The data sources are [Bible OT](http://biblewolof.com/), [WOLOF-ONLINE](http://www.wolof-online.com/), and
	[ALFFA_PUBLIC](https://github.com/getalp/ALFFA_PUBLIC/tree/master/ASR/WOLOF).



	## Contact

	Please contact abdouaziz@gmail.com with any questions, feedback, or requests.