
Model Card for MindLLM

Model Details

Model Description

MindLLM 1.3B is a Transformer model with 1.3 billion parameters, developed by the Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications and the Beijing Institute of Technology Southeast Academy of Information Technology.

It was trained on bilingual data sources, including the Pile, Wudao, CBooks, and other self-collected data consisting of websites filtered for safety and educational value. On benchmarks testing common sense, language understanding, and logical reasoning, MindLLM performed strongly and even surpassed some models with fewer than 13 billion parameters.

Our model has been fine-tuned on an instruction dataset in chat format but has not been fine-tuned through reinforcement learning from human feedback (RLHF). The intention behind this open-source model is to give the research community an unrestricted small model for exploring vital safety challenges and adapting to domain-specific applications.

  • Developed by: Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications & Beijing Institute of Technology Southeast Academy of Information Technology
  • Model type: Pretrained Causal Language Model
  • Language(s) (NLP): Chinese & English
  • License: apache-2.0
  • Trained from scratch

Model Sources

To cite this model, please use:

  title={MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications},
  author={Yang, Yizhe and Sun, Huashan and Li, Jiawei and Liu, Runheng and Li, Yinghao and Liu, Yuhang and Huang, Heyan and Gao, Yang},
  journal={arXiv preprint arXiv:2310.15777},


Direct Use

Because the model has been fine-tuned with supervision on instruction data in a special chat format, you can use it directly with a text-generation pipeline. This example generates a different sequence each time it is run:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline

# Replace 'mindllm_path' with the location of the MindLLM checkpoint.
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained('mindllm_path')
tokenizer.max_length = 1024
model = AutoModelForCausalLM.from_pretrained('mindllm_path').to(device)
generator = TextGenerationPipeline(model=model, tokenizer=tokenizer, device=device)
# Prompt in English: "Do you know what advantages electric vehicles have over traditional gasoline cars?"
context = "<user>\n你知道电动车相对传统汽油车有哪些优点吗?\n<assistant>\n"
outputs = generator(context, max_new_tokens=1024, do_sample=True, num_beams=4, repetition_penalty=0.5, no_repeat_ngram_size=5, return_full_text=False)
print(outputs)
# Example output (varies between runs):
# [{'generated_text': '电动车相对传统汽油车的优点包括:\n1. 更低的排放和更高的能源效率 - 电动车所产生的有害排放物质远少于汽油车,并且它们的能源利用效率更高。\n2. 更低的维护成本 - 电动车需要更少的保养和通常拥有较少的运动部件,从而降低了总体维护成本。\n3. 更低的燃料成本 - 电动车需要比汽油车少得多的燃料,因此随着时间的推移,可以节省成本。\n4. 更长的续航里程 - 电动车单次充电可以行驶比汽油车更远的距离,非常适合长途通勤。\n5. 更为安静的运行 - 电动车比汽油车要安静得多,使驾驶更加愉悦。'}]

Chat Template

To get the expected features and performance from the chat version, follow the specific prompt format, including the <user> and <assistant> tags, the BOS and EOS tokens, and the whitespace and line breaks in between (we recommend calling strip() on inputs to avoid double spaces). Here are some examples:

  1. single-turn
# prompt
# return 
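电动车相对传统汽油车的优点包括:\n1. 更低的排放和更高的能源效率 - 电动车所产生的有害排放物质远少于汽油车,并且它们的能源利用效率更高。\n2. 更低的维护成本 - 电动车需要更少的保养和通常拥有较少的运动部件,从而降低了总体维护成本。\n3. 更低的燃料成本 - 电动车需要比汽油车少得多的燃料,因此随着时间的推移,可以节省成本。\n4. 更长的续航里程 - 电动车单次充电可以行驶比汽油车更远的距离,非常适合长途通勤。\n5. 更为安静的运行 - 电动车比汽油车要安静得多,使驾驶更加愉悦。
(English translation of the model output: "The advantages of electric vehicles over traditional gasoline cars include: 1. Lower emissions and higher energy efficiency - electric vehicles produce far fewer harmful emissions than gasoline cars, and they use energy more efficiently. 2. Lower maintenance costs - electric vehicles need less servicing and usually have fewer moving parts, which lowers overall maintenance costs. 3. Lower fuel costs - electric vehicles need far less fuel than gasoline cars, so they save money over time. 4. Longer range - an electric vehicle can travel farther on a single charge than a gasoline car, making it well suited to long commutes. 5. Quieter operation - electric vehicles are much quieter than gasoline cars, making driving more pleasant.")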
  2. multi-turn
# prompt
# return 
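
The single-turn layout used in the Direct Use example (`<user>\n...\n<assistant>\n`) can be assembled programmatically. The following is a minimal sketch, not the authors' tooling: `build_prompt` is a hypothetical helper, and the way earlier turns are concatenated for a multi-turn prompt is an assumption. BOS and EOS tokens are left to the tokenizer, so check the result against the tokenizer's special tokens before relying on it.

```python
from typing import List, Tuple

USER_TAG = "<user>"
ASSISTANT_TAG = "<assistant>"


def build_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Build a prompt from past (user, assistant) turns plus a new user message.

    Hypothetical helper: the multi-turn layout (tagged blocks joined in order)
    is an assumption, and no BOS/EOS strings are added here.
    """
    parts = []
    for user_text, assistant_text in history:
        # strip() avoids doubled whitespace around the tags
        parts.append(
            f"{USER_TAG}\n{user_text.strip()}\n{ASSISTANT_TAG}\n{assistant_text.strip()}\n"
        )
    # The prompt ends with an open assistant tag so the model writes the reply.
    parts.append(f"{USER_TAG}\n{user_message.strip()}\n{ASSISTANT_TAG}\n")
    return "".join(parts)


# Single turn: reproduces the context string from the Direct Use example.
single = build_prompt([], "  你知道电动车相对传统汽油车有哪些优点吗?  ")

# Multi turn (assumed layout): one earlier exchange, then a new question.
multi = build_prompt(
    [("你好", "你好,我是 MindLLM,一个人工智能助手。")],
    "你能做什么?",
)
print(repr(single))
print(repr(multi))
```

Keeping the tags in constants makes it easy to adjust the helper if the checkpoint's tokenizer expects a different layout.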

Other Examples

Self Awareness
assistant:你好,我是 MindLLM,一个人工智能助手。 (English: "Hello, I am MindLLM, an AI assistant.")






Small Talk





def median(numbers):
    sorted_numbers = sorted(numbers)
    length = len(numbers)
    mid_index = length // 2
    if length % 2 == 0:
        return (sorted_numbers[mid_index - 1] + sorted_numbers[mid_index]) / 2
    else:
        return sorted_numbers[mid_index]
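首先,定义了一个名为 median 的函数,它接受一个数字列表作为参数。函数使用内置函数 sorted,对数字列表进行排序,然后使用 len 函数计算所有数字的长度,并判断长度是奇数还是偶数,如果是,则返回相应的中位数。如果长度为奇数,则返回中位数。否则,将数组分成两半,找到中间的数,然后返回它。
(English translation of the model output: "First, a function named median is defined that takes a list of numbers as its argument. The function uses the built-in sorted function to sort the list, then uses len to compute its length and checks whether the length is odd or even; if so, it returns the corresponding median. If the length is odd, it returns the median. Otherwise, it splits the array into two halves, finds the middle number, and returns it.")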

Deployment resource consumption

| Precision | Minimum GPU memory (inference) | Minimum GPU memory (full-parameter fine-tuning) |
|---|---|---|
| float32 | 6.08G | 32.65G |
| float16 (unquantized) | 3.45G | - (36.94G*) |
| bfloat16 (unquantized) | 3.45G | 20.47G (33.93G*) |

  • * indicates use of mixed precision
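
As a rough sanity check on the inference figures, the memory needed for the weights alone can be estimated from the parameter count and the bytes per parameter. This is a back-of-the-envelope sketch: it assumes exactly 1.3e9 parameters and 1 GB = 1e9 bytes, and `weight_memory_gb` is a hypothetical helper. The reported minimums (6.08G and 3.45G) are higher than these estimates, plausibly because inference also needs room for activations, the key-value cache, and framework overhead.

```python
PARAMS = 1.3e9  # nominal parameter count (assumed exact for this estimate)

# Storage size of one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(dtype: str, n_params: float = PARAMS) -> float:
    """Memory for the weights alone, in GB (assuming 1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(dtype):.1f} GB for weights only")
```

Full-parameter fine-tuning needs considerably more than this because gradients and optimizer state are stored alongside the weights.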

Training Details

Training Data

Our training corpus is a diverse blend of both English and Chinese language data sources. The English component originates from the Pile dataset, and the Chinese component comprises data from Wudao, CBooks, and data meticulously gathered through web crawling.

To ensure data quality, we run a thorough preprocessing pipeline: data cleaning to remove special tags, deduplication using Locality-Sensitive Hashing (LSH), and filtering to eliminate low-quality content, which comes predominantly from advertisements or inappropriate material. We also examine the relationship between data volume and model capacity, assess the impact of different data types on how well the model fits, and evaluate training stability when handling mixed data sources. This analysis offers insights into the role of pre-training data and the complexities of processing it. Finally, we tune the mixture of data sources when constructing the training data, guided by data engineering and experience.
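
To illustrate the deduplication step, here is a minimal MinHash-based LSH sketch in plain Python. It is not the authors' pipeline: the shingle size, signature length, band layout, similarity threshold, and all function names are illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NUM_HASHES = 64             # signature length (illustrative)
BANDS = 16                  # number of LSH bands
ROWS = NUM_HASHES // BANDS  # signature rows per band


def shingles(text: str, k: int = 5) -> Set[str]:
    """Character k-grams; character level works for English and Chinese."""
    text = " ".join(text.split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def minhash(text: str) -> Tuple[int, ...]:
    """MinHash signature: the minimum hash of the shingles for each seed."""
    grams = shingles(text)
    return tuple(min(_hash(seed, g) for g in grams) for seed in range(NUM_HASHES))


def deduplicate(docs: List[str], threshold: float = 0.8) -> List[int]:
    """Return indices of documents kept after near-duplicate removal."""
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[int]] = defaultdict(list)
    signatures: List[Tuple[int, ...]] = []
    kept: List[int] = []
    for idx, doc in enumerate(docs):
        sig = minhash(doc)
        signatures.append(sig)
        # Candidates are kept documents that share at least one band.
        keys = [(b, sig[b * ROWS:(b + 1) * ROWS]) for b in range(BANDS)]
        candidates = {j for key in keys for j in buckets[key]}
        is_duplicate = any(
            sum(a == c for a, c in zip(sig, signatures[j])) / NUM_HASHES >= threshold
            for j in candidates
        )
        if not is_duplicate:
            kept.append(idx)
            for key in keys:
                buckets[key].append(idx)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog near the river bank today.",
    "The quick brown fox jumps over the lazy dog near the river bank today.",
    "Completely different content about training language models from scratch.",
]
print(deduplicate(docs))
```

Banding means only documents that collide in at least one bucket are compared, which avoids comparing every pair; the exact copy in `docs` is dropped while the unrelated text is kept.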

Training Procedure

This version of the model was trained on about 241 billion English tokens and 82 billion Chinese tokens with a two-stage training strategy. It was trained as an autoregressive language model, using cross-entropy loss.
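
For readers unfamiliar with the objective, the sketch below computes the autoregressive cross-entropy loss for one toy sequence in plain Python: the logits at position t are scored against the token at position t+1, and the negative log-probabilities are averaged. The function names and toy data are illustrative assumptions; real training uses a tensor library over batches.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def causal_lm_loss(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Mean cross-entropy of predicting token t+1 from the logits at position t.

    logits[t] holds the vocabulary scores produced after reading token_ids[:t+1].
    """
    total = 0.0
    count = 0
    for t in range(len(token_ids) - 1):
        log_probs = log_softmax(logits[t])
        total -= log_probs[token_ids[t + 1]]  # negative log-likelihood of the next token
        count += 1
    return total / count


# Toy example: vocabulary of 4 tokens and uniform logits at every position.
token_ids = [0, 2, 1]
logits = [[0.0, 0.0, 0.0, 0.0] for _ in token_ids]
loss = causal_lm_loss(logits, token_ids)
print(loss)  # equals log(4) for a uniform prediction over 4 tokens
```

A model that guesses uniformly over a vocabulary of size V has a loss of log(V); training drives the loss below that baseline.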

This version of the model was also fine-tuned on 4 million Chinese instruction samples collected from open-source instruction-tuning datasets. The instruction-tuning stage enables the model to answer questions and hold multi-turn conversations in Chinese.

For more detailed information, please refer to the paper.
