---
language:
- ko
- uz
- en
- ru
- zh
- ja
- km
- my
- si
- tl
- th
- vi
- bn
- mn
- id
- ne
- pt
tags:
- translation
- multilingual
- korean
- uzbek
datasets:
- custom_parallel_corpus
license: mit
---

# QWEN2.5-7B-Bnk-5e

## Model Description

QWEN2.5-7B-Bnk-5e is a multilingual translation model built on the 7-billion-parameter QWEN 2.5 architecture. It specializes in translating from multiple languages into Korean and Uzbek.

## Intended Uses & Limitations

The model is designed for translating text from various Asian and European languages to Korean and Uzbek. It can be used for tasks such as:

- Multilingual document translation
- Cross-lingual information retrieval
- Language learning applications
- International communication assistance

Please note that while the model strives for accuracy, it may not always produce perfect translations, especially for idiomatic expressions or highly context-dependent content.

## Training and Evaluation Data

The model was fine-tuned on a diverse dataset of parallel texts covering the supported languages. Evaluation was performed on held-out test sets for each language pair.
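
The exact corpus schema is not published. Purely as an illustrative sketch, one record of a JSONL parallel corpus for this kind of fine-tuning might look like the following (all field names here are assumptions, not the actual dataset format):

```python
import json

# Hypothetical JSONL record for one parallel sentence pair.
record = {
    "source_lang": "en",
    "target_lang": "ko",
    "source_text": "Hello, how are you?",
    "target_text": "안녕하세요, 어떻게 지내세요?",
}

# One record per line; ensure_ascii=False keeps non-Latin scripts readable.
line = json.dumps(record, ensure_ascii=False)
parsed = json.loads(line)
```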

## Training Procedure

Fine-tuning was performed on the QWEN 2.5 7B base model using custom datasets for the specific language pairs.

## Supported Languages

The model supports translation from the following languages to Korean and Uzbek:

- Uzbek (uz)
- Russian (ru)
- Thai (th)
- Chinese (Simplified) (zh)
- Chinese (Traditional) (zh-tw, zh-hant)
- Bengali (bn)
- Mongolian (mn)
- Indonesian (id)
- Nepali (ne)
- English (en)
- Khmer (km)
- Portuguese (pt)
- Sinhala (si)
- Korean (ko)
- Tagalog (tl)
- Burmese (my)
- Vietnamese (vi)
- Japanese (ja)
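
When constructing prompts programmatically, the ISO 639-1 codes above can be mapped to display names. A minimal sketch (the mapping follows the list above; using display names in prompts is a convention assumed here, not a documented requirement of the model):

```python
# Display names for the supported source-language codes.
LANG_NAMES = {
    "uz": "Uzbek", "ru": "Russian", "th": "Thai", "zh": "Chinese",
    "bn": "Bengali", "mn": "Mongolian", "id": "Indonesian", "ne": "Nepali",
    "en": "English", "km": "Khmer", "pt": "Portuguese", "si": "Sinhala",
    "ko": "Korean", "tl": "Tagalog", "my": "Burmese", "vi": "Vietnamese",
    "ja": "Japanese",
}

def lang_name(code: str) -> str:
    """Return the display name for a supported language code.

    Region subtags (e.g. 'zh-tw') fall back to the base language.
    """
    return LANG_NAMES[code.lower().split("-")[0]]
```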
## How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "FINGU-AI/QWEN2.5-7B-Bnk-5e"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example usage
source_text = "Hello, how are you?"
source_lang = "en"
target_lang = "ko"  # or "uz" for Uzbek

messages = [
    {"role": "system", "content": f"Translate {source_lang} to {target_lang} word by word correctly."},
    {"role": "user", "content": source_text},
]

# Apply the chat template and move the inputs to the model's device
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=100)
response = outputs[0][input_ids.shape[-1]:]
translated_text = tokenizer.decode(response, skip_special_tokens=True)
print(translated_text)
```
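
For repeated calls, the prompt construction can be factored into a small helper. This is a convenience sketch, not part of the model's API; the system-prompt wording mirrors the example above:

```python
def build_translation_messages(source_text: str, source_lang: str, target_lang: str) -> list:
    """Build the chat messages for one translation request."""
    return [
        {"role": "system",
         "content": f"Translate {source_lang} to {target_lang} word by word correctly."},
        {"role": "user", "content": source_text},
    ]

# The result can be passed directly to tokenizer.apply_chat_template(...)
msgs = build_translation_messages("Hello, how are you?", "en", "uz")
```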
## Limitations

- The model's performance may vary across different language pairs and domains.
- It may struggle with very colloquial or highly specialized text.
- The model may not always capture cultural nuances or context-dependent meanings accurately.

## Ethical Considerations

- The model should not be used for generating or propagating harmful, biased, or misleading content.
- Users should be aware of potential biases in the training data that may affect translations.
- The model's outputs should not be considered as certified translations for official or legal purposes without human verification.


## Citation


```bibtex
@misc{fingu2024qwen25,
  author = {{FINGU AI Team}},
  title = {QWEN2.5-7B-Bnk-5e: A Multilingual Translation Model},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINGU-AI/QWEN2.5-7B-Bnk-5e}},
}
```