NanoTranslator-XXL2 / README_zh-CN.md

upload

f6abc3d about 2 months ago

4.35 kB

	# NanoTranslator-XXL2

	[English](README.md) \| 简体中文

	## Introduction

	这是 NanoTranslator 的 XX-Large-2 型号，目前仅支持英译中。仓库中同时提供了 ONNX 版本的模型。

	所有模型均收录于 [NanoTranslator Collection](https://huggingface.co/collections/Mxode/nanotranslator-66e1de2ba352e926ae865bd2) 中。

	\| \| P. \| Arch. \| Act. \| V. \| H. \| I. \| L. \| A.H. \| K.H. \| Tie \|
	\| :--: \| :-----: \| :--: \| :--: \| :--: \| :-----: \| :---: \| :------: \| :--: \| :--: \| :--: \|
	\| [XXL2](https://huggingface.co/Mxode/NanoTranslator-XXL2) \| 102 \| LLaMA \| SwiGLU \| 16K \| 1120 \| 3072 \| 6 \| 16 \| 8 \| True \|
	\| [XXL](https://huggingface.co/Mxode/NanoTranslator-XXL) \| 100 \| LLaMA \| SwiGLU \| 16K \| 768 \| 4096 \| 8 \| 24 \| 8 \| True \|
	\| [XL](https://huggingface.co/Mxode/NanoTranslator-XL) \| 78 \| LLaMA \| GeGLU \| 16K \| 768 \| 4096 \| 6 \| 24 \| 8 \| True \|
	\| [L](https://huggingface.co/Mxode/NanoTranslator-L) \| 49 \| LLaMA \| GeGLU \| 16K \| 512 \| 2816 \| 8 \| 16 \| 8 \| True \|
	\| [M2](https://huggingface.co/Mxode/NanoTranslator-M2) \| 22 \| Qwen2 \| GeGLU \| 4K \| 432 \| 2304 \| 6 \| 24 \| 8 \| True \|
	\| [M](https://huggingface.co/Mxode/NanoTranslator-M) \| 22 \| LLaMA \| SwiGLU \| 8K \| 256 \| 1408 \| 16 \| 16 \| 4 \| True \|
	\| [S](https://huggingface.co/Mxode/NanoTranslator-S) \| 9 \| LLaMA \| SwiGLU \| 4K \| 168 \| 896 \| 16 \| 12 \| 4 \| True \|
	\| [XS](https://huggingface.co/Mxode/NanoTranslator-XS) \| 2 \| LLaMA \| SwiGLU \| 2K \| 96 \| 512 \| 12 \| 12 \| 4 \| True \|

	- P. - Parameters (in million)
	- V. - vocab size
	- H. - hidden size
	- I. - intermediate size
	- L. - num layers
	- A.H. - num attention heads
	- K.H. - num kv heads
	- Tie - tie word embeddings



	## How to use

	Prompt 格式如下：

	```
	<\|im_start\|> {English Text} <\|endoftext\|>
	```

	### Directly using transformers

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_path = 'Mxode/NanoTranslator-XXL2'

	tokenizer = AutoTokenizer.from_pretrained(model_path)
	model = AutoModelForCausalLM.from_pretrained(model_path)

	def translate(text: str, model, **kwargs):
	generation_args = dict(
	max_new_tokens = kwargs.pop("max_new_tokens", 512),
	do_sample = kwargs.pop("do_sample", True),
	temperature = kwargs.pop("temperature", 0.55),
	top_p = kwargs.pop("top_p", 0.8),
	top_k = kwargs.pop("top_k", 40),
	**kwargs
	)

	prompt = "<\|im_start\|>" + text + "<\|endoftext\|>"
	model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

	generated_ids = model.generate(model_inputs.input_ids, **generation_args)
	generated_ids = [
	output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	return response

	text = "Each step of the cell cycle is monitored by internal."

	response = translate(text, model, max_new_tokens=64, do_sample=False)
	print(response)
	```


	### ONNX

	根据实际测试，使用 ONNX 模型推理会比直接使用 transformers 推理要快 2～10 倍。

	如果希望使用 ONNX 模型，那么你需要手动切换到 [onnx 分支](https://huggingface.co/Mxode/NanoTranslator-XXL2/tree/onnx)并从本地加载。

	参考文档：

	- [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
	- [Inference pipelines with the ONNX Runtime accelerator](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines)

	Using ORTModelForCausalLM

	```python
	from optimum.onnxruntime import ORTModelForCausalLM
	from transformers import AutoTokenizer

	model_path = "your/folder/to/onnx_model"

	ort_model = ORTModelForCausalLM.from_pretrained(model_path)
	tokenizer = AutoTokenizer.from_pretrained(model_path)

	text = "Each step of the cell cycle is monitored by internal."

	response = translate(text, ort_model, max_new_tokens=64, do_sample=False)
	print(response)
	```

	Using pipeline

	```python
	from optimum.pipelines import pipeline

	model_path = "your/folder/to/onnx_model"
	pipe = pipeline("text-generation", model=model_path, accelerator="ort")

	text = "Each step of the cell cycle is monitored by internal."

	response = pipe(text, max_new_tokens=64, do_sample=False)
	response
	```