---
license: llama2
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: DataLinguistic-34B-V1.0
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.701
      verified: false
---

# DataLinguistic-34B-V1.0 Chinese-English Question Answering Model

## Model Overview

DataLinguistic-34B-V1.0 is a Chinese-English question answering model. It was fine-tuned with 4-bit quantization from Meta's CodeLlama-34b model (available on Hugging Face), using datasets built by DataLinguistic.
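To put the 4-bit figure in perspective, the sketch below estimates how much memory the weights alone occupy at different precisions. It is a rough calculation under stated assumptions: the nominal 34B parameter count, 1 GB = 10^9 bytes, and no allowance for activations, the KV cache, or quantization metadata such as per-block scale factors. The function name is hypothetical.

```python
# Rough weight-memory estimate. Assumptions: nominal 34B parameters,
# 1 GB = 1e9 bytes, and no overhead for activations, KV cache, or
# quantization metadata, so real usage will be somewhat higher.


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the storage needed for the weights alone, in gigabytes."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    n = 34e9  # nominal parameter count from the model name
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: ~{weight_memory_gb(n, bits):.0f} GB")
```

Under these assumptions the weights take about 68 GB at 16-bit precision and about 17 GB at 4-bit, a fourfold reduction.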

## Model Architecture

DataLinguistic-34B-V1.0 inherits the decoder-only Transformer architecture of CodeLlama-34b, which is itself based on Llama 2, and has 34B parameters.
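As a sanity check on the 34B figure, the sketch below counts the parameters of a Llama-style decoder-only stack (grouped-query attention, SwiGLU MLP, RMSNorm, untied output head). The configuration values are our reading of the public CodeLlama-34b `config.json` and should be treated as an assumption to verify against that file; the fine-tuned model is assumed to keep the base architecture unchanged. `LlamaConfig` here is a hypothetical local dataclass, not the `transformers` class of the same name.

```python
# Parameter count for a Llama-style decoder-only model.
# CODELLAMA_34B_ASSUMED holds assumed values; check them against config.json.
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaConfig:
    vocab_size: int
    hidden_size: int
    intermediate_size: int
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int


def count_params(cfg: LlamaConfig) -> int:
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    kv_dim = cfg.num_key_value_heads * head_dim  # grouped-query attention
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attn = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    # SwiGLU MLP: gate_proj, up_proj, down_proj (Llama uses no biases).
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size
    norms = 2 * cfg.hidden_size  # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embeddings = cfg.vocab_size * cfg.hidden_size
    lm_head = cfg.vocab_size * cfg.hidden_size  # not tied to the embeddings
    final_norm = cfg.hidden_size
    return cfg.num_hidden_layers * per_layer + embeddings + lm_head + final_norm


CODELLAMA_34B_ASSUMED = LlamaConfig(
    vocab_size=32000,
    hidden_size=8192,
    intermediate_size=22016,
    num_hidden_layers=48,
    num_attention_heads=64,
    num_key_value_heads=8,
)

if __name__ == "__main__":
    total = count_params(CODELLAMA_34B_ASSUMED)
    print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

With these values the count is 33,743,970,304, about 33.7B, which the model name rounds to 34B.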

## Training Datasets

The model was fine-tuned on the following datasets; the first two are derived from open-source data:

- Data_OpenSet: a Chinese-English question-answering dataset curated from junelee/wizard_vicuna_70k
- Data_OpenSet2: a Chinese-English question-answering dataset curated from garage-bAInd/Open-Platypus
- A proprietary Chinese-English question-answering dataset collected internally by DataLinguistic (not open-sourced)

The data is formatted as:

```
<s>please answer my question in datalynn model and Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: {question}</s>
```
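A minimal Python sketch of filling the template above. It assumes that each `\n` in the template stands for a real newline character and that `<s>` and `</s>` are written out literally as sequence markers. `PREFIX` and `format_example` are hypothetical names, and the slot names follow the card, including `{question}` in the position after `### Response:`.

```python
# Assumptions: "\n" in the card's template means a newline character, and
# "<s>"/"</s>" are literal start/end-of-sequence markers in the training text.

PREFIX = (
    "please answer my question in datalynn model and Below is an instruction "
    "that describes a task. Write a response that appropriately completes the request."
)


def format_example(instruction: str, question: str) -> str:
    """Render one training record. Slot names mirror the card's template,
    where the text after "### Response:" is filled from {question}."""
    return (
        f"<s>{PREFIX}\n\n### Instruction:\n{instruction}"
        f"\n\n### Response: {question}</s>"
    )
```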


## Use Cases

The model can be used for a wide range of Chinese-English question answering and chatbot applications.

## Model Advantages

- Built on CodeLlama-34b, a base model with 34B parameters
- Fine-tuned on large-scale Chinese-English QA datasets to improve answer quality

## Usage

1. Download the model from Hugging Face.
2. Load and initialize the model.
3. Enter a question and generate an answer.
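The three steps above might look like the following with the transformers library. This is a sketch, not the authors' published code. Assumptions: the repository id `DataLinguistic/DataLinguistic-34B-V1.0` (taken from the Hub page the card came from); an inference prompt that reuses the training template but ends after `### Response:` so the model writes the answer; and optional 4-bit loading through bitsandbytes, which needs a CUDA GPU plus the `accelerate` and `bitsandbytes` packages. `build_prompt`, `load_model`, and `generate_answer` are hypothetical helper names.

```python
# Sketch only: repo id, prompt shape, and 4-bit loading are assumptions.

REPO_ID = "DataLinguistic/DataLinguistic-34B-V1.0"  # assumed Hub repository id

PREFIX = (
    "please answer my question in datalynn model and Below is an instruction "
    "that describes a task. Write a response that appropriately completes the request."
)


def build_prompt(instruction: str) -> str:
    """Training template up to "### Response:", leaving the answer to the model.

    "<s>" is omitted because the Llama tokenizer prepends the BOS token itself.
    """
    return f"{PREFIX}\n\n### Instruction:\n{instruction}\n\n### Response:"


def load_model(repo_id: str = REPO_ID):
    # Heavy imports are deferred so build_prompt() works without them installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        device_map="auto",  # needs the accelerate package
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,  # optional; needs bitsandbytes and a CUDA GPU
            bnb_4bit_compute_dtype=torch.float16,
        ),
    )
    return tokenizer, model


def generate_answer(tokenizer, model, question: str, max_new_tokens: int = 256) -> str:
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns prompt + continuation; keep only the new tokens.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Call `load_model()` once and then `generate_answer(tokenizer, model, "...")` for each question; loading the weights is the slow step, so it is kept separate from generation.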

## Version

Current version: DataLinguistic-34B-V1.0

## Author

Tang Zhengzheng (唐正正)

## Contributors

DataLinguistic team
