---
license: llama2
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: DataLinguistic-34B-V1.0
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.701
      verified: false


---

# DataLinguistic-34B-V1.0 Chinese-English Question Answering Model

## Model Overview

DataLinguistic-34B-V1.0 is a Chinese-English question answering model, fine-tuned with 4-bit quantization from the CodeLlama-34b model (hosted on Hugging Face) on DataLinguistic's curated and proprietary question-answering datasets.

## Model Architecture 

DataLinguistic-34B-V1.0 inherits the decoder-only Transformer architecture of CodeLlama, with 34B parameters.
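
The architecture can be inspected directly from the published configuration. A minimal sketch with `transformers`, assuming the checkpoint is hosted on the Hugging Face Hub under a repo id such as `DataLinguistic/DataLinguistic-34B-V1.0` (the repo id is an assumption, not a confirmed path):

```python
# Illustrative sketch only; the repo id below is assumed.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("DataLinguistic/DataLinguistic-34B-V1.0")

print(config.architectures)      # expected: ['LlamaForCausalLM'] (decoder-only)
print(config.num_hidden_layers)  # transformer depth (48 for the CodeLlama-34b base)
print(config.hidden_size)        # model width (8192 for the CodeLlama-34b base)
```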

## Training Datasets

The model was trained on the following datasets:

- Data_OpenSet: Chinese-English question-answering dataset curated from junelee/wizard_vicuna_70k
- Data_OpenSet2: Chinese-English question-answering dataset curated from garage-bAInd/Open-Platypus
- Proprietary Chinese-English question-answering dataset collected internally by DataLinguistic (not open-sourced)

The training data follows this prompt template:

\<s>please answer my question in datalynn model and Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: {question}\</s>
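
For illustration, a small helper that assembles this template programmatically. The function and variable names below are hypothetical and only mirror the placeholders in the template; they are not taken from any released training code.

```python
# Hypothetical helper that fills the prompt template shown above.
def build_prompt(instruction: str, question: str = "") -> str:
    return (
        "<s>please answer my question in datalynn model and "
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response: {question}</s>"
    )

print(build_prompt("Explain the difference between a list and a tuple in Python."))
```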


## Use Cases

The model can be used for a wide range of Chinese-English question answering and chatbot applications.

## Model Advantages

- Built on the large CodeLlama-34b base model with 34B parameters
- Fine-tuned on large-scale Chinese-English QA datasets for higher answer quality

## Usage

1. Download the model from the Hugging Face Hub
2. Import and initialize the model and tokenizer
3. Format your question with the prompt template above and generate an answer (see the sketch below)
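
A minimal sketch of these steps with the `transformers` library, loading the weights in 4-bit via `bitsandbytes`. The repo id is an assumption about where the checkpoint would be published, and the generation settings are illustrative:

```python
# Illustrative sketch only; the repo id is assumed, not a confirmed Hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_id = "DataLinguistic/DataLinguistic-34B-V1.0"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit loading
)

# Build the prompt from the template in "Training Datasets". The tokenizer adds
# the <s> (BOS) token itself, so it is not written out here.
instruction = "Write a Python function that reverses a string."
prompt = (
    "please answer my question in datalynn model and "
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n### Response: "
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```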

## Version

Current version: DataLinguistic-34B-V1.0

## Author

Tang Zhengzheng

## Contributors

DataLinguistic team

# DataLinguistic-34B-V1.0 Chinese-English Question Answering Model

## Model Overview

DataLinguistic-34B-V1.0 is a Chinese-English question answering model fine-tuned from the CodeLlama-34b model (hosted on Hugging Face) on datasets built by DataLinguistic.

## Model Architecture

DataLinguistic-34B-V1.0 inherits the decoder-only Transformer architecture of CodeLlama, with 34B parameters.

## Training Datasets

The model was trained on the following datasets:

- Data_OpenSet: a Chinese-English question-answering dataset curated from junelee/wizard_vicuna_70k
- Data_OpenSet2: a Chinese-English question-answering dataset curated from garage-bAInd/Open-Platypus
- A proprietary Chinese-English question-answering dataset collected internally by DataLinguistic (not open-sourced)

The data follows this prompt template:

\<s>please answer my question in datalynn model and Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: {question}\</s>

## Use Cases

The model can be applied broadly to Chinese-English question answering, chatbots, and similar scenarios.

## Model Advantages

- Built on the large CodeLlama-34b base model with 34B parameters
- Fine-tuned on large-scale Chinese-English QA datasets for higher answer quality

## Usage

1. Download the model from the Hugging Face Hub
2. Import and initialize the model and tokenizer
3. Input a question and generate an answer

## Version

Current version: DataLinguistic-34B-V1.0

## Author

Tang Zhengzheng

## Contributors

DataLinguistic team