---
license: llama2
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: DataLinguistic-34B-V1.0
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.701
      verified: false
---

# DataLinguistic-34B-V1.0 Chinese-English Question Answering Model

## Model Overview

DataLinguistic-34B-V1.0 is a Chinese-English question answering model. It was fine-tuned with 4-bit quantization from the CodeLlama-34b model available on Hugging Face, using datasets curated by DataLinguistic.

## Model Architecture

DataLinguistic-34B-V1.0 inherits the decoder-only Transformer architecture of CodeLlama (itself built on Llama 2), with 34B parameters.

## Training Datasets

The model was trained on the following datasets, two curated from open-source data and one proprietary:

- Data_OpenSet: Chinese-English question-answering dataset curated from junelee/wizard_vicuna_70k
- Data_OpenSet2: Chinese-English question-answering dataset curated from garage-bAInd/Open-Platypus
- Proprietary Chinese-English question-answering dataset collected internally by DataLinguistic (not open-sourced)

The data is formatted as:

`\please answer my question in datalynn model and Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: {question}\`

## Use Cases

The model can be used for Chinese-English question answering, chatbots, and similar applications.

## Model Advantages

- Built on CodeLlama-34b, a large model with 34B parameters
- Fine-tuned on large-scale Chinese-English QA datasets to improve answer quality

## Usage

1. Download the model from Hugging Face
2. Import and initialize the model
3. Input a question and generate an answer

## Version

Current version: DataLinguistic-34B-V1.0

## Author

Tang Zhengzheng (唐正正)

## Contributors

DataLinguistic team
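
## Usage Example

The three usage steps and the data format described above can be sketched roughly as follows. This is a minimal sketch under several assumptions, not an official recipe. The names `PROMPT_TEMPLATE`, `build_prompt`, and `generate_answer` are hypothetical helpers. The `model_id` argument is a placeholder that must be set to the model's actual Hugging Face repository path, which this card does not state. The sketch also assumes that the stray backslashes around the documented template are formatting debris, and that at inference time the prompt ends right after `### Response:` so the model fills in the answer.

```python
# Prompt template reconstructed from the "data is formatted as" description.
# Assumption: the leading/trailing backslashes in the card are not part of the
# prompt, and the slot after "### Response:" is left empty for the model.
PROMPT_TEMPLATE = (
    "please answer my question in datalynn model and "
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
    "\n\n### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Wrap a user question in the instruction/response template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def generate_answer(model_id: str, instruction: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate an answer for one question.

    `model_id` is a placeholder: pass the real repository path or a local
    directory containing the downloaded weights.
    """
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" requires the `accelerate` package.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


if __name__ == "__main__":
    # Prints the prompt only; no model is downloaded here.
    print(build_prompt("What is the capital of France?"))
```

Calling `generate_answer("<your-model-path>", "What is the capital of France?")` would run the full pipeline. A 34B-parameter model needs substantial GPU memory even when quantized, so it is worth checking the prompt with `build_prompt` alone before loading any weights.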