	---
	license: mit
	base_model: xlm-roberta-large
	tags:
	- generated_from_trainer
	model-index:
	- name: EthioLLM-l-250K
	  results: []
	language:
	- am
	- om
	- ti
	- so
	- gez
	---


	# EthioLLM-l-250K

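	This model is a fine-tuned version of [xlm-roberta-large](https://huggingface.co/xlm-roberta-large). The training dataset is not recorded in this card; the card metadata lists Amharic (am), Oromo (om), Tigrinya (ti), Somali (so), and Ge'ez (gez) as the model's languages.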
	It achieves the following results on the evaluation set:
	- Loss: 3.0552
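
For intuition, a cross-entropy loss can be converted to a perplexity by exponentiation. The sketch below assumes the reported value is a mean token-level cross-entropy in nats, as is typical for a masked-language-modeling loss; the card does not state the training objective, so treat the result as a rough indicator only. The function name `loss_to_perplexity` is illustrative.

```python
import math

# Evaluation loss reported in this card.
EVAL_LOSS = 3.0552


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to a perplexity."""
    return math.exp(loss)


if __name__ == "__main__":
    print(f"perplexity ~ {loss_to_perplexity(EVAL_LOSS):.1f}")
```

Under that assumption, a loss of 3.0552 corresponds to a perplexity of about 21.2 over the predicted tokens.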

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed
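
A minimal sketch of masked-token prediction with this checkpoint is given here. It assumes the Hub repository id `EthioNLP/EthioLLM-l-250K`, that the checkpoint keeps the masked-language-modeling head of XLM-RoBERTa, and that the mask token is still `<mask>` as in the base model; none of these points is confirmed by the card. `mask_word` and `predict_masked` are illustrative helpers, not part of any library.

```python
MODEL_ID = "EthioNLP/EthioLLM-l-250K"  # assumed Hub repository id
MASK_TOKEN = "<mask>"  # XLM-RoBERTa mask token (assumed unchanged by fine-tuning)


def mask_word(sentence: str, word: str, mask_token: str = MASK_TOKEN) -> str:
    """Replace the first occurrence of `word` in `sentence` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, mask_token, 1)


def predict_masked(text: str, top_k: int = 5):
    """Return the top-k fill-mask predictions (downloads the model on first use)."""
    # Imported lazily so mask_word works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    return fill_mask(text, top_k=top_k)


if __name__ == "__main__":
    # Example Amharic sentence ("Addis Ababa is the capital city of Ethiopia.")
    # with the word for "city" masked; any sentence in a listed language works.
    text = mask_word("አዲስ አበባ የኢትዮጵያ ዋና ከተማ ናት።", "ከተማ")
    for candidate in predict_masked(text):
        print(candidate["token_str"], round(candidate["score"], 4))
```

Because XLM-RoBERTa is an encoder-only model, this kind of checkpoint is normally used for fill-mask queries or as a starting point for further fine-tuning, not for free-form text generation.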

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 10
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 10.0
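
To illustrate what `lr_scheduler_type: linear` implies, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup (no warmup setting is listed above) and a linear decay to zero at the final step. `total_steps` is a hypothetical value, since the dataset size is not recorded, and `linear_lr` is an illustrative helper, not a library function.

```python
def linear_lr(
    step: int,
    total_steps: int,
    base_lr: float = 5e-05,
    warmup_steps: int = 0,
) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; depends on dataset size and batch size
    for step in (0, 250, 500, 750, 1000):
        print(step, linear_lr(step, total_steps))
```

With these assumptions the rate starts at 5e-05, reaches half that value midway through training, and ends at zero.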

	### Training results



	### Framework versions

	- Transformers 4.33.3
	- Pytorch 2.0.1+cu117
	- Datasets 2.14.5
	- Tokenizers 0.13.3
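
When reproducing results, it can help to check that the local environment matches the versions listed above. The following is a minimal sketch using only the standard library; the keys of `EXPECTED` are the assumed PyPI distribution names (`torch` for Pytorch), and `installed_versions` and `version_mismatches` are illustrative helpers.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.33.3",
    "torch": "2.0.1+cu117",
    "datasets": "2.14.5",
    "tokenizers": "0.13.3",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(installed, expected=EXPECTED):
    """Return {name: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    for name, (have, want) in version_mismatches(installed_versions(EXPECTED)).items():
        print(f"{name}: installed {have}, card lists {want}")
```

A mismatch does not necessarily prevent the model from loading; it only flags a difference from the environment recorded at training time.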

	### Citation Information

	@article{tonja2024ethiollm,
	  title={EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation},
	  author={Tonja, Atnafu Lambebo and Azime, Israel Abebe and Belay, Tadesse Destaw and Yigezu, Mesay Gemeda and Mehamed, Moges Ahmed and Ayele, Abinew Ali and Jibril, Ebrahim Chekol and Woldeyohannis, Michael Melese and Kolesnikova, Olga and Slusallek, Philipp and others},
	  journal={arXiv preprint arXiv:2403.13737},
	  year={2024}
	}