DeciLM 6B

DeciLM 6B is a 5.7-billion-parameter, decoder-only text generation model with a context window of 4096 tokens. To balance output quality against computational cost, it uses variable Grouped-Query Attention (GQA), in which the number of key/value heads differs from layer to layer. The model's architecture was generated using AutoNAC, Deci's proprietary technology based on Neural Architecture Search.
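As a rough illustration of the attention scheme named above, the pure-Python sketch below implements causal grouped-query attention for a single layer: every group of `n_q // n_kv` query heads reads the same key/value head. It is not DeciLM's implementation (that ships as custom code in the model repository); the function names, the nested-list tensor layout, and the toy inputs are hypothetical. In variable GQA, each layer would simply be built with its own `n_kv`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_query_attention(q, k, v):
    """Causal grouped-query attention for one layer (hypothetical helper).

    q:    [n_q][seq_len][head_dim]   one entry per query head
    k, v: [n_kv][seq_len][head_dim]  one entry per shared key/value head
    Returns a nested list shaped like q.
    """
    n_q, n_kv = len(q), len(k)
    assert n_q % n_kv == 0, "query heads must split evenly into K/V groups"
    group_size = n_q // n_kv
    head_dim = len(q[0][0])
    scale = 1.0 / math.sqrt(head_dim)

    out = []
    for h in range(n_q):
        kv = h // group_size  # all query heads in this group share K/V head `kv`
        head_out = []
        for i, q_i in enumerate(q[h]):
            # Causal mask: position i attends only to positions 0..i
            scores = [scale * sum(a * b for a, b in zip(q_i, k[kv][j])) for j in range(i + 1)]
            weights = softmax(scores)
            head_out.append(
                [sum(w * v[kv][j][d] for j, w in enumerate(weights)) for d in range(head_dim)]
            )
        out.append(head_out)
    return out


# Toy example: 4 query heads share 2 K/V heads; sequence length 3, head_dim 2
q = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] for _ in range(4)]
k = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] for _ in range(2)]
v = [[[float(g), 0.0], [0.0, float(g)], [1.0, 1.0]] for g in range(2)]
out = grouped_query_attention(q, k, v)
print(out[0] == out[1], out[2][0])  # prints: True [1.0, 0.0]
```

Setting `n_kv == n_q` recovers standard multi-head attention and `n_kv == 1` gives multi-query attention; GQA sits between the two, sharing key/value heads to shrink the key/value cache.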

Model Details

Model Description

Deci developed and publicly released the DeciLM 6B large language model, a pretrained, high-efficiency generative text model with 5.7 billion parameters. DeciLM 6B outpaces other pretrained models in its class in throughput, reaching up to 15 times the throughput of Llama 2 7B. DeciLM 6B was further fine-tuned with LoRA for instruction following on a subset of the OpenOrca dataset, producing DeciLM 6B-Instruct.

  • Developed by: Deci
  • Model type: DeciLM is an auto-regressive language model using an optimized transformer decoder architecture that includes variable Grouped-Query Attention.
  • Language(s) (NLP): English
  • License: Llama 2 Community License Agreement, with an extension by Deci regarding hosting service providers.

Model Architecture

  • Parameters: 5.7B
  • Layers: 32
  • Heads: 32
  • Sequence length: 4096
  • GQA num_key_value_heads*: Variable
  • Hidden size: 4096

*AutoNAC was employed to optimize the selection of the GQA num_key_value_heads for each layer of the model.
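One way to see why the per-layer choice of num_key_value_heads matters is the size of the key/value cache kept during generation. The sketch below assumes a head dimension of 128 (hidden size 4096 divided evenly across 32 heads), BF16 storage (2 bytes per value), and the full 4096-token context. The per-layer head counts in `hypothetical` are made up for illustration; this card does not list DeciLM's actual values.

```python
MIB = 1024 ** 2


def kv_cache_bytes(num_kv_heads_per_layer, head_dim=128, seq_len=4096,
                   batch_size=1, bytes_per_value=2):
    """Total bytes of cached keys and values across all layers.

    Each layer stores one K and one V tensor of shape
    [batch_size, num_kv_heads, seq_len, head_dim].
    """
    return sum(
        2 * n_kv * head_dim * seq_len * batch_size * bytes_per_value
        for n_kv in num_kv_heads_per_layer
    )


# Standard multi-head attention: all 32 layers keep 32 K/V heads
mha = kv_cache_bytes([32] * 32)
# Hypothetical variable-GQA mix (NOT DeciLM's real per-layer values)
hypothetical = kv_cache_bytes([4] * 16 + [8] * 16)
print(mha // MIB, hypothetical // MIB)  # prints: 2048 384
```

Because the cache grows linearly with batch size and with the number of K/V heads, lowering num_key_value_heads in some layers leaves more memory for larger batches.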

  • Decoder layer: Variable Grouped-Query Attention. Grouped-Query Attention (GQA) was introduced in Ainslie et al., 2023.
  • Position embeddings: Rotary Position Embeddings (RoPE; Su et al., 2021) with dynamic NTK scaling.
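The sketch below shows the idea behind dynamic NTK scaling: RoPE rotates each pair of query/key coordinates by an angle proportional to the token position, and when the sequence grows past the trained context length, the rotary base is enlarged so that the rotation frequencies shrink. The scaling formula follows the dynamic NTK variant in Hugging Face `transformers`' Llama implementation; whether DeciLM uses exactly this formula, and with which `scaling_factor`, are assumptions (the value 2.0 here is hypothetical).

```python
import math


def rope_inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies base^(-2i/dim) for i = 0 .. dim/2 - 1."""
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


def dynamic_ntk_inv_freq(dim, seq_len, max_position_embeddings=4096,
                         base=10000.0, scaling_factor=2.0):
    """Enlarge the RoPE base only when seq_len exceeds the trained context length.

    scaling_factor=2.0 is a hypothetical value, not taken from DeciLM's config.
    """
    if seq_len > max_position_embeddings:
        base = base * (
            scaling_factor * seq_len / max_position_embeddings - (scaling_factor - 1)
        ) ** (dim / (dim - 2))
    return rope_inv_freq(dim, base)


def rotate_pair(x0, x1, position, inv_freq):
    """Rotate one (x0, x1) coordinate pair by the angle position * inv_freq."""
    angle = position * inv_freq
    return (x0 * math.cos(angle) - x1 * math.sin(angle),
            x0 * math.sin(angle) + x1 * math.cos(angle))


head_dim = 128  # 4096 hidden size / 32 heads
short = dynamic_ntk_inv_freq(head_dim, seq_len=2048)     # within context: unchanged
extended = dynamic_ntk_inv_freq(head_dim, seq_len=8192)  # beyond context: slower rotations
print(short == rope_inv_freq(head_dim), extended[-1] < short[-1])  # prints: True True
```

In this sketch, sequences within the 4096-token context use plain RoPE frequencies; only longer ones trigger the larger base. Rotation preserves vector length, so scaling changes only how quickly the angles advance with position.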

Model Sources

Uses

The model is intended for commercial and research use in English and can be fine-tuned for use in other languages.

How to Get Started with the Model

Use the code below to get started with the model.

# pip install -q torch transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciLM-6b"
device = "cuda" if torch.cuda.is_available() else "cpu"  # use the GPU when one is available

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# trust_remote_code=True is required because DeciLM's modeling code lives in the model repository
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device)

inputs = tokenizer.encode("In a shocking finding, scientists discovered a herd of unicorns living in", return_tensors="pt").to(device)
# Sample up to 100 new tokens with nucleus (top-p) sampling
outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
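The `do_sample=True, top_p=0.95` arguments above enable nucleus (top-p) sampling. As a minimal sketch of the concept (not the `transformers` implementation), the functions below keep the smallest set of most probable tokens whose cumulative probability reaches `top_p`, renormalize, and sample from that set. The function names and toy probabilities are hypothetical.

```python
import random


def top_p_filter(probs, top_p=0.95):
    """Return {token_id: renormalized_prob} for the smallest set of most likely
    tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_top_p(probs, top_p=0.95, rng=random):
    """Draw one token id from the top-p filtered distribution."""
    filtered = top_p_filter(probs, top_p)
    token_ids = list(filtered)
    return rng.choices(token_ids, weights=[filtered[t] for t in token_ids], k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]  # toy next-token distribution over 4 tokens
print(sorted(top_p_filter(probs, top_p=0.9)))  # prints: [0, 1, 2]
print(sample_top_p(probs, top_p=0.9, rng=random.Random(0)))
```

A lower `top_p` cuts off more of the low-probability tail, making generations more conservative; `top_p=1.0` keeps every token.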

Training Details

DeciLM 6B was trained on a subset of the SlimPajama dataset, using proprietary methods that enable fast training.

Evaluation

Below are the evaluation results for DeciLM 6B.

  • Average: 60.33
  • ARC Challenge*: 42.06
  • ARC Easy*: 70.02
  • BoolQ: 71.01
  • HellaSwag*: 74.58
  • LAMBADA OpenAI: 69.78
  • OpenBookQA: 34
  • PIQA: 77.09
  • TruthfulQA: 36.19
  • Winogrande: 68.03

*Normalized accuracy (accuracy-norm) score.
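Assuming the Average is the unweighted mean of the nine task scores (the card does not state how it is computed), it can be recomputed from the listed values. The listed per-task values give about 60.31 rather than the reported 60.33; the small gap may come from the per-task scores having been rounded for display.

```python
# Per-task scores as listed above (OpenBookQA is listed as 34)
scores = {
    "ARC Challenge": 42.06,
    "ARC Easy": 70.02,
    "BoolQ": 71.01,
    "HellaSwag": 74.58,
    "LAMBADA OpenAI": 69.78,
    "OpenBookQA": 34.0,
    "PIQA": 77.09,
    "TruthfulQA": 36.19,
    "Winogrande": 68.03,
}

# Unweighted mean across the nine tasks (an assumption about how Average is defined)
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints: 60.31
```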

Runtime Benchmarks

Throughput on an NVIDIA A10 GPU (tokens/sec), by inference tool:

  • PyTorch: 652.49
  • Infery LLM: 2,029.6

  • Throughput (tokens/sec) was measured at the optimal batch size for each tool: batch size 64 for PyTorch and 128 for Infery LLM.
  • To replicate the PyTorch benchmark results, use this code example.
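The two throughput figures can be compared directly, with one caveat: they were measured at different batch sizes, so the ratio reflects both the inference tool and the batch size. The sketch below also divides each figure by its batch size to get an approximate per-sequence generation rate, assuming the reported tokens/sec counts tokens generated across the whole batch.

```python
# (tokens/sec on an A10, batch size) as reported in the benchmark results
results = {
    "PyTorch": (652.49, 64),
    "Infery LLM": (2029.6, 128),
}

speedup = results["Infery LLM"][0] / results["PyTorch"][0]

# Approximate per-sequence rate, assuming tokens/sec counts tokens for the whole batch
per_sequence = {tool: tps / batch for tool, (tps, batch) in results.items()}

print(f"Infery LLM / PyTorch: {speedup:.2f}x")  # prints: Infery LLM / PyTorch: 3.11x
for tool, rate in per_sequence.items():
    print(f"{tool}: {rate:.2f} tokens/sec per sequence")
```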

How to Cite

Please cite this model using the following format.

@misc{DeciFoundationModels,
title = {DeciLM 6B},
author = {DeciAI Research Team},
year = {2023},
url = {https://huggingface.co/Deci/DeciLM-6b},
}