CRIA v1.3

💡 Article | 💻 GitHub | 📔 Colab 1, 2

What is CRIA?

krē-ə, plural crias: a baby llama, alpaca, vicuña, or guanaco.

Cria Logo
Or, as ChatGPT suggests: "Crafting a Rapid prototype of an Intelligent llm App using open source resources".

The initial objective of the CRIA project is to develop a comprehensive end-to-end chatbot system, starting from the instruction-tuning of a large language model and extending to its deployment on the web using frameworks such as Next.js.

Specifically, we have fine-tuned the llama-2-7b-chat-hf model with QLoRA (4-bit precision) using the mlabonne/CodeLlama-2-20k dataset. This fine-tuned model serves as the backbone for the CRIA chat platform.
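
To see what the fine-tune consumed, here is a minimal sketch (not part of the original card) that pulls the instruction dataset from the Hugging Face Hub and inspects one example:

# pip install datasets
from datasets import load_dataset

# Load the instruction-tuning dataset used for CRIA's QLoRA fine-tune.
dataset = load_dataset("mlabonne/CodeLlama-2-20k", split="train")
print(dataset)     # row count and column names
print(dataset[0])  # one raw training example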

📦 Model Release

CRIA v1.3 comes with several variants.

🔧 Training

The model was trained in a Google Colab notebook on a T4 GPU with a high-RAM runtime.

Training procedure

The following bitsandbytes quantization config was used during training (see the BitsAndBytesConfig sketch after this list):

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: float16
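
For reference, here is a minimal sketch (assumed, not taken from the original training notebook) of how the settings above map onto transformers.BitsAndBytesConfig:

import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with float16 compute, mirroring the list above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,
)

Passing this config to AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config) reproduces the 4-bit loading behaviour described by the flags above.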

Framework versions

  • PEFT 0.4.0

💻 Usage

# pip install transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "davzoku/cria-llama2-7b-v1.3"
prompt = "What is a cria?"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,  # half precision keeps the 7B model within a single GPU's memory
    device_map="auto",          # place model layers on available devices automatically
)

# Llama-2-chat models expect prompts wrapped in the [INST] ... [/INST] template.
sequences = pipeline(
    f"<s>[INST] {prompt} [/INST]",
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

References

We'd like to thank:

  • mlabonne for his article and resources on the implementation of instruction tuning.
  • TheBloke for his script for LLM quantization.