abdouaziiz
/

soraberta

Inference Endpoints

Model card Files Files and versions Community

soraberta / README.md

abdouaziiz's picture

Update README.md

dbaf258 over 2 years ago

|

raw history blame contribute delete

No virus

1.72 kB

	---
	language: wo
	tags:
	- roberta
	- language-model
	- wo
	- wolof
	---

	# Soraberta: Unsupervised Language Model Pre-training for Wolof

	Soraberta is pretrained roberta-base model on wolof language . Roberta was introduced in [this paper](https://arxiv.org/abs/1907.11692)

	## Soraberta models

	\| Model name \| Number of layers \| Attention Heads \| Embedding Dimension \| Total Parameters \|
	\| :------: \| :---: \| :---: \| :---: \| :---: \|
	\| `soraberta-base` \| 6 \| 12 \| 514 \| 83 M \|




	## Using Soraberta with Hugging Face's Transformers


	```python
	>>> from transformers import pipeline
	>>> unmasker = pipeline('fill-mask', model='abdouaziiz/soraberta')
	>>> unmasker("juroom naari jullit man nanoo boole jend aw nag walla <mask>.")

	[{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla gileem.',
	'score': 0.9783930778503418,
	'token': 4621,
	'token_str': ' gileem'},
	{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla jend.',
	'score': 0.009271537885069847,
	'token': 2155,
	'token_str': ' jend'},
	{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla aw.',
	'score': 0.0027585660573095083,
	'token': 704,
	'token_str': ' aw'},
	{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla pel.',
	'score': 0.001120452769100666,
	'token': 1171,
	'token_str': ' pel'},
	{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla juum.',
	'score': 0.0005133090307936072,
	'token': 5820,
	'token_str': ' juum'}]
	```

	## Training data
	The data sources are [Bible OT](http://biblewolof.com/) , [WOLOF-ONLINE](http://www.wolof-online.com/)



	## Contact

	Please contact abdouaziz@gmail.com for any question, feedback or request.