FINGU-AI/QWEN2.5-7B-Bnk-3e

	---
	language:
	- ko
	- uz
	- en
	- ru
	- zh
	- ja
	- km
	- my
	- si
	- tl
	- th
	- vi
	- bn
	- mn
	- id
	- ne
	- pt
	tags:
	- translation
	- multilingual
	- korean
	- uzbek
	datasets:
	- custom_parallel_corpus
	license: mit
	---

	# QWEN2.5-7B-Bnk-5e

	## Model Description

	QWEN2.5-7B-Bnk-5e is a 7-billion-parameter multilingual translation model based on the QWEN 2.5 architecture. It specializes in translating from multiple languages into Korean and Uzbek.

	## Intended Uses & Limitations

	The model is designed for translating text from various Asian and European languages to Korean and Uzbek. It can be used for tasks such as:

	- Multilingual document translation
	- Cross-lingual information retrieval
	- Language learning applications
	- International communication assistance

	Please note that while the model strives for accuracy, it may not always produce perfect translations, especially for idiomatic expressions or highly context-dependent content.

	## Training and Evaluation Data

	The model was fine-tuned on a diverse dataset of parallel texts covering the supported languages. Evaluation was performed on held-out test sets for each language pair.
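
The card does not describe how the held-out test sets were built. As an illustration only, one common way to hold out a test set per language pair is to group the parallel examples by pair and split each group separately. The function name `split_per_pair`, the tuple layout, and the 10% test fraction below are assumptions, not details from the actual training setup:

```python
import random
from collections import defaultdict


def split_per_pair(examples, test_fraction=0.1, seed=0):
    """Split parallel examples into a training list and one held-out list per language pair.

    examples: iterable of (source_lang, target_lang, source_text, target_text) tuples.
    Returns (train, test) where test maps (source_lang, target_lang) -> list of examples.
    """
    # Group examples by their (source, target) language pair
    by_pair = defaultdict(list)
    for ex in examples:
        by_pair[(ex[0], ex[1])].append(ex)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], {}
    for pair in sorted(by_pair):
        items = by_pair[pair][:]
        rng.shuffle(items)
        # Hold out at least one example for every pair
        n_test = max(1, int(len(items) * test_fraction))
        test[pair] = items[:n_test]
        train.extend(items[n_test:])
    return train, test
```

Splitting within each pair, rather than over the pooled corpus, keeps low-resource pairs from ending up with an empty test set.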

	## Training Procedure

	Fine-tuning was performed on the QWEN 2.5 7B base model using custom datasets for the specific language pairs.

	## Supported Languages

	The model supports translation from the following languages to Korean and Uzbek:

	- Uzbek (uz)
	- Russian (ru)
	- Thai (th)
	- Chinese (Simplified) (zh)
	- Chinese (Traditional) (zh-tw, zh-hant)
	- Bengali (bn)
	- Mongolian (mn)
	- Indonesian (id)
	- Nepali (ne)
	- English (en)
	- Khmer (km)
	- Portuguese (pt)
	- Sinhala (si)
	- Korean (ko)
	- Tagalog (tl)
	- Myanmar (my)
	- Vietnamese (vi)
	- Japanese (ja)
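
The codes in the list above can be kept in a small lookup table so that a caller can validate a language pair before building a prompt. This is a minimal sketch, assuming the codes listed above are the ones the model expects. The names `SUPPORTED_SOURCES`, `TARGETS`, and `check_pair` are hypothetical and not part of the model's API:

```python
# Hypothetical lookup table: source-language codes copied from the list above.
SUPPORTED_SOURCES = {
    "uz": "Uzbek",
    "ru": "Russian",
    "th": "Thai",
    "zh": "Chinese (Simplified)",
    "zh-tw": "Chinese (Traditional)",
    "zh-hant": "Chinese (Traditional)",
    "bn": "Bengali",
    "mn": "Mongolian",
    "id": "Indonesian",
    "ne": "Nepali",
    "en": "English",
    "km": "Khmer",
    "pt": "Portuguese",
    "si": "Sinhala",
    "ko": "Korean",
    "tl": "Tagalog",
    "my": "Myanmar",
    "vi": "Vietnamese",
    "ja": "Japanese",
}

# The card names Korean and Uzbek as the only target languages.
TARGETS = {"ko": "Korean", "uz": "Uzbek"}


def check_pair(source_lang, target_lang):
    """Return (source_name, target_name), or raise ValueError for an unsupported pair."""
    src = source_lang.lower()
    tgt = target_lang.lower()
    if src not in SUPPORTED_SOURCES:
        raise ValueError(f"unsupported source language: {source_lang!r}")
    if tgt not in TARGETS:
        raise ValueError(f"unsupported target language: {target_lang!r}")
    if src == tgt:
        raise ValueError("source and target languages are the same")
    return SUPPORTED_SOURCES[src], TARGETS[tgt]
```

For example, `check_pair("en", "ko")` returns `("English", "Korean")`, while `check_pair("en", "ja")` raises `ValueError` because Japanese is listed only as a source language.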



	## How to Use

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_name = "FINGU-AI/QWEN2.5-7B-Bnk-5e"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	# Qwen2.5 is a decoder-only model, so it is loaded as a causal LM.
	# The model must be on the same device as the input tensors.
	model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto").to("cuda")

	# Example usage
	source_text = "Hello, how are you?"
	source_lang = "en"
	target_lang = "ko"  # or "uz" for Uzbek

	messages = [
	    {"role": "system", "content": f"Translate {source_lang} to {target_lang} word by word correctly."},
	    {"role": "user", "content": source_text},
	]
	# Apply chat template
	input_ids = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_tensors="pt"
	).to(model.device)

	# max_new_tokens limits only the generated tokens, not prompt + output
	outputs = model.generate(input_ids, max_new_tokens=100)
	# Drop the prompt tokens and decode only the generated continuation
	response = outputs[0][input_ids.shape[-1]:]
	translated_text = tokenizer.decode(response, skip_special_tokens=True)
	print(translated_text)
	```
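
When translating more than one sentence, it can help to wrap the steps above in two small functions: one that builds the chat messages and one that runs generation and strips the prompt tokens. This is a sketch rather than an official API. The names `build_messages` and `translate` are hypothetical, the system prompt wording is copied from the example above, and `tokenizer` and `model` are assumed to be already loaded by the caller:

```python
def build_messages(source_text, source_lang, target_lang):
    """Build the chat messages for one translation request."""
    if not source_text.strip():
        raise ValueError("source_text is empty")
    return [
        {
            "role": "system",
            "content": f"Translate {source_lang} to {target_lang} word by word correctly.",
        },
        {"role": "user", "content": source_text},
    ]


def translate(tokenizer, model, source_text, source_lang, target_lang, max_new_tokens=100):
    """Translate one text with an already-loaded tokenizer and causal LM."""
    messages = build_messages(source_text, source_lang, target_lang)
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # The output contains prompt + continuation; keep only the continuation
    generated = outputs[0][input_ids.shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)
```

A call such as `translate(tokenizer, model, "Hello, how are you?", "en", "uz")` would then return the Uzbek translation as a string.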
	## Performance


	## Limitations

	- The model's performance may vary across different language pairs and domains.
	- It may struggle with very colloquial or highly specialized text.
	- The model may not always capture cultural nuances or context-dependent meanings accurately.

	## Ethical Considerations

	- The model should not be used for generating or propagating harmful, biased, or misleading content.
	- Users should be aware of potential biases in the training data that may affect translations.
	- The model's outputs should not be considered as certified translations for official or legal purposes without human verification.


	## Citation


	```bibtex
	@misc{fingu2023qwen25,
	author = {FINGU AI and AI Team},
	title = {QWEN2.5-7B-Bnk-5e: A Multilingual Translation Model},
	year = {2024},
	publisher = {Hugging Face},
	journal = {Hugging Face Model Hub},
	howpublished = {\url{https://huggingface.co/FINGU-AI/QWEN2.5-7B-Bnk-5e}}
	}
	```