Update README.md
README.md CHANGED
@@ -10,12 +10,14 @@ inference: False
# Randeng-Deltalm-362M-Zh-En

- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM/blob/main/fengshen/examples/)
- Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/zh/latest/docs/%E7%87%83%E7%81%AF%E7%B3%BB%E5%88%97/)

## 简介 Brief Introduction

Using the Fengshen-LM framework, we finetuned Deltalm-base on a collected Chinese-English dataset (about 30 million sentence pairs) together with the IWSLT Chinese-English parallel corpus (about 200 thousand pairs), obtaining a Chinese -> English translation model.

## 模型分类 Model Taxonomy

@@ -36,27 +38,26 @@ Using the Fengshen-LM framework, on the collected Chinese-English dataset, finet

## 使用 Usage

```python
# You need to download modeling_deltalm.py from the Fengshenbang-LM GitHub repo in advance,
# or you can download modeling_deltalm.py separately.
# Strongly recommended: git clone the Fengshenbang-LM repo:
# 1. git clone https://github.com/IDEA-CCNL/Fengshenbang-LM
# 2. cd Fengshenbang-LM/fengshen/examples/deltalm/
# There you will find the modeling_deltalm.py that the Deltalm model needs; run this
# script from that directory (or add it to your PYTHONPATH) so the import below works.

from modeling_deltalm import DeltalmForConditionalGeneration
from transformers import AutoTokenizer

model = DeltalmForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Deltalm-362M-Zh-En")
tokenizer = AutoTokenizer.from_pretrained("microsoft/infoxlm-base")

text = "尤其在夏天,如果你决定徒步穿越雨林,就需要小心蚊子。"
inputs = tokenizer(text, max_length=512, return_tensors="pt")

generate_ids = model.generate(inputs["input_ids"], max_length=512)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])

# model Output:
```
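
If you want to reuse the model for more than one sentence, a small wrapper can keep the boilerplate in one place. The sketch below is illustrative and not part of the original card: the `translate` helper, the `num_beams` value, and the GPU handling are assumptions layered on top of the calls shown above.

```python
import torch
from modeling_deltalm import DeltalmForConditionalGeneration
from transformers import AutoTokenizer

# Hypothetical convenience wrapper around the usage shown above; not from the original card.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = DeltalmForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Deltalm-362M-Zh-En").to(device)
tokenizer = AutoTokenizer.from_pretrained("microsoft/infoxlm-base")

def translate(text: str, max_length: int = 512, num_beams: int = 4) -> str:
    """Translate a single Chinese sentence into English."""
    inputs = tokenizer(text, max_length=max_length, truncation=True, return_tensors="pt").to(device)
    generate_ids = model.generate(inputs["input_ids"], max_length=max_length, num_beams=num_beams)
    return tokenizer.batch_decode(
        generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]

print(translate("尤其在夏天,如果你决定徒步穿越雨林,就需要小心蚊子。"))
```

Beam search usually gives slightly better translations than greedy decoding at the cost of speed; adjust `num_beams` to taste.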

## 引用 Citation