---
license: llama2
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: DataLinguistic-34B-V1.0
results:
- task:
type: text-generation
dataset:
type: openai_humaneval
name: HumanEval
metrics:
- name: pass@1
type: pass@1
value: 0.701
verified: false
---
# DataLinguistic-34B-V1.0 Chinese-English Question Answering Model
## Model Overview
DataLinguistic-34B-V1.0 is a Chinese-English question answering model fine-tuned with 4-bit quantization from Meta's CodeLlama-34b model on datasets curated by DataLinguistic.
## Model Architecture
DataLinguistic-34B-V1.0 inherits the decoder-only Transformer architecture of CodeLlama, with 34B parameters.
## Training Datasets
The model was trained on the following datasets:
- Data_OpenSet: Chinese-English question-answering dataset curated from junelee/wizard_vicuna_70k
- Data_OpenSet2: Chinese-English question-answering dataset curated from garage-bAInd/Open-Platypus
- Proprietary Chinese-English question-answering dataset collected internally by DataLinguistic (not open-sourced)
The data is formatted as:

```
<s>please answer my question in datalynn model and Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: {question}</s>
```
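Assuming inference prompts reuse the template above up to the `### Response:` marker (the `{question}` slot holds the target answer during training), a minimal Python helper to build a prompt could look like this; the helper name is illustrative, not part of this release:

```python
# Hypothetical helper: fills the documented training template for inference.
# Assumption: at inference time the prompt stops at "### Response: " and the
# model completes the answer.
PROMPT_TEMPLATE = (
    "<s>please answer my question in datalynn model and Below is an instruction "
    "that describes a task. Write a response that appropriately completes the "
    "request.\n\n### Instruction:\n{instruction}\n\n### Response: "
)

def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the template."""
    return PROMPT_TEMPLATE.format(instruction=instruction)
```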
## Use Cases
The model can be used for a wide range of Chinese-English question answering and chatbot applications.
## Model Advantages
- Built on the 34B-parameter CodeLlama-34b base model
- Fine-tuned on large-scale Chinese-English QA datasets for higher answer quality
## Usage
1. Install the model from Hugging Face
2. Import and initialize the model
3. Input a question and generate an answer
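The steps above can be sketched with the `transformers` library as follows. This is a sketch, not a confirmed recipe: the repo id is a placeholder assumption, and 4-bit loading additionally requires the `bitsandbytes` package.

```python
def extract_answer(generated: str, prompt: str) -> str:
    """Drop the echoed prompt and any trailing </s> marker from model output."""
    text = generated[len(prompt):] if generated.startswith(prompt) else generated
    return text.replace("</s>", "").strip()

def answer(question: str) -> str:
    # Heavy imports kept local so the pure helper above stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "DataLinguistic/DataLinguistic-34B-V1.0"  # hypothetical repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        load_in_4bit=True,  # requires bitsandbytes
    )
    # Fill the card's training template up to the Response marker.
    prompt = (
        "<s>please answer my question in datalynn model and Below is an "
        "instruction that describes a task. Write a response that appropriately "
        "completes the request.\n\n### Instruction:\n"
        f"{question}\n\n### Response: "
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=False)
    return extract_answer(decoded, prompt)
```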
## Version
Current version: DataLinguistic-34B-V1.0
## Author
Tang Zhengzheng
## Contributors
DataLinguistic team
# DataLinguistic-34B-V1.0 Chinese-English Question Answering Model
## Model Overview
DataLinguistic-34B-V1.0 is a Chinese-English question answering model fine-tuned from the CodeLlama-34b model on DataLinguistic's in-house datasets.
## Model Architecture
DataLinguistic-34B-V1.0 inherits the decoder-only Transformer architecture of CodeLlama, with 34B parameters.
## Training Datasets
The model was trained on the following datasets:
- Data_OpenSet: a Chinese-English question-answering dataset curated from junelee/wizard_vicuna_70k
- Data_OpenSet2: a Chinese-English question-answering dataset curated from garage-bAInd/Open-Platypus
- A proprietary Chinese-English question-answering dataset collected internally by DataLinguistic (not open-sourced)

The data is formatted as:

```
<s>please answer my question in datalynn model and Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: {question}</s>
```

## Use Cases
The model can be applied broadly to Chinese-English question answering, chatbots, and similar scenarios.
## Model Advantages
- Built on the 34B-parameter CodeLlama-34b base model
- Fine-tuned on large-scale Chinese-English QA datasets for higher answer quality
## Usage
1. Install the model from Hugging Face
2. Import and initialize the model
3. Input a question and generate an answer
## Version
Current version: DataLinguistic-34B-V1.0
## Author
Tang Zhengzheng
## Contributors
DataLinguistic team