DataLinguistic-34B-V1.0 Chinese-English Question Answering Model

Model Overview

DataLinguistic-34B-V1.0 is a Chinese-English question answering model fine-tuned, with 4-bit quantization, from Meta's CodeLlama-34b model (as hosted on Hugging Face) on datasets built by DataLinguistic.

Model Architecture

DataLinguistic-34B-V1.0 inherits the decoder-only Transformer architecture of Llama, with 34B parameters.

Training Datasets

The model was trained on the following datasets, two derived from open-source collections and one proprietary:

  • Data_OpenSet: Chinese-English question-answering dataset curated from junelee/wizard_vicuna_70k
  • Data_OpenSet2: Chinese-English question-answering dataset curated from garage-bAInd/Open-Platypus
  • Proprietary Chinese-English question-answering dataset collected internally by DataLinguistic (not open-sourced)

Each training example is formatted as:

<s>please answer my question in datalynn model and Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: {question}</s>
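The template above can be filled programmatically. The following is a minimal sketch, not the authors' preprocessing code: the function names `build_prompt` and `build_training_example` are hypothetical, the literal `\n` sequences in the card are assumed to stand for real newlines, and the slot the card labels `{question}` after `### Response:` is assumed to hold the answer text.

```python
# Template text copied from the model card. Assumption: the "\n" sequences
# shown in the card represent real newline characters.
PROMPT_TEMPLATE = (
    "please answer my question in datalynn model and "
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response: "
)


def build_prompt(instruction: str) -> str:
    """Return the prompt up to and including '### Response: '.

    The leading <s> token is omitted here because tokenizers usually
    add the beginning-of-sequence token themselves.
    """
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def build_training_example(instruction: str, response: str,
                           bos: str = "<s>", eos: str = "</s>") -> str:
    """Assemble one full training string in the card's format.

    Assumption: the final slot of the template (named {question} in the
    card) carries the answer text.
    """
    return f"{bos}{build_prompt(instruction)}{response.strip()}{eos}"
```

At inference time only `build_prompt` is needed; the model is then expected to continue the text after `### Response: `.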

Use Cases

The model can be used for a wide range of Chinese-English question answering and chatbot applications.

Model Advantages

  • Built on CodeLlama-34b, a 34B-parameter base model
  • Fine-tuned on large-scale Chinese-English QA datasets to improve answer quality in both languages

Usage

  1. Download the model from Hugging Face
  2. Import and initialize the model
  3. Input a question and generate an answer
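The three steps above might look like the following with the Hugging Face `transformers` library. This is a hedged sketch rather than an official recipe: the repository ID `DataLinguistic/DataLinguistic-34B-V1.0` is assumed, the helper names are hypothetical, and `device_map="auto"` additionally requires the `accelerate` package. Whether the published weights are already quantized is not stated in the card, so no quantization options are passed here.

```python
# Assumed Hub repository ID; adjust if the model lives elsewhere.
MODEL_ID = "DataLinguistic/DataLinguistic-34B-V1.0"

# Inference prompt: the card's training template up to "### Response: ".
PROMPT = (
    "please answer my question in datalynn model and "
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response: "
)


def format_question(question: str) -> str:
    """Wrap a user question in the card's prompt format."""
    return PROMPT.format(instruction=question.strip())


def load_model(model_id: str = MODEL_ID):
    """Steps 1 and 2: download (on first use) and initialize the model."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",   # spread layers over available devices
        torch_dtype="auto",  # use the dtype stored in the checkpoint
    )
    return tokenizer, model


def answer(question: str, tokenizer, model, max_new_tokens: int = 256) -> str:
    """Step 3: generate an answer and return only the newly generated text."""
    inputs = tokenizer(format_question(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Example (downloads a large checkpoint on first use):
# tokenizer, model = load_model()
# print(answer("What is the capital of France?", tokenizer, model))
```

A 34B-parameter model needs substantial GPU memory even at reduced precision, so `load_model` may fail on small machines.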

Version

Current version: DataLinguistic-34B-V1.0

Author

Tang Zhengzheng

Contributors

DataLinguistic team

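DataLinguistic-34B-V1.0 Chinese-English Question Answering Model

Model Overview

DataLinguistic-34B-V1.0 is a Chinese-English question answering model fine-tuned from the CodeLlama-34b model (as hosted on Hugging Face) on datasets built by DataLinguistic.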

Model Architecture

DataLinguistic-34B-V1.0 inherits the decoder-only Transformer architecture of CodeLlama, with 34B parameters.

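Training Datasets

The model was trained on the following datasets, two derived from open-source collections and one proprietary:

  • Data_OpenSet: Chinese-English question-answering dataset curated from junelee/wizard_vicuna_70k
  • Data_OpenSet2: Chinese-English question-answering dataset curated from garage-bAInd/Open-Platypus
  • Proprietary Chinese-English question-answering dataset collected internally by DataLinguistic (not open-sourced)

The datasets use the following format: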

<s>please answer my question in datalynn model and Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: {question}</s>

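Use Cases

The model can be used for a wide range of Chinese-English question answering and chatbot applications.

Model Advantages

  • Built on the CodeLlama-34b base model, with 34B parameters
  • Fine-tuned on large-scale Chinese-English QA datasets to improve answer quality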

Usage

  1. Download the model from Hugging Face
  2. Import and initialize the model
  3. Input a question and generate an answer

Version

Current version: DataLinguistic-34B-V1.0

Author

Tang Zhengzheng

Contributors

DataLinguistic team
