
Model Card for Llama2-7b-qlora-chat-support-bot-faq

I've fine-tuned a state-of-the-art generative AI model using Hugging Face for customer-support FAQ chat applications. The model is designed to provide accurate, helpful responses to frequently asked questions, and its specialized training lets it understand and address a wide range of customer queries. This makes it well suited to automating customer-support tasks and improving overall efficiency.

Model Details

I have implemented the sharded model TinyPixel/Llama-2-7B-bf16-sharded. Sharding divides a large neural network model into multiple smaller pieces (more than 14 in this case), a strategy that has proven highly beneficial when combined with the ‘accelerate’ framework.

When a model is sharded, each shard holds a portion of the overall model’s parameters. Accelerate can then manage these shards efficiently, distributing them across GPU memory and CPU memory as needed. This dynamic allocation lets us work with very large models without requiring an excessive amount of memory.
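As a minimal sketch of what this looks like in practice (the model id comes from this card; the rest is standard `transformers`/`accelerate` usage, not the exact training code):

```python
# Minimal sketch: load the sharded base model and let Accelerate
# place the shards across available GPU and CPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "TinyPixel/Llama-2-7B-bf16-sharded"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# device_map="auto" asks Accelerate to distribute the checkpoint's
# shards automatically, spilling to CPU RAM when GPU memory runs out.
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
)
```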

Model Description

  • Developed by: Tony Esposito
  • Model type: Llama 2 family
  • License: Apache 2.0
  • Finetuned from model: TinyPixel/Llama-2-7B-bf16-sharded

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Training procedure

The following bitsandbytes quantization config was used during training (a Python equivalent is sketched after the list):

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: bfloat16
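For reference, the same settings can be expressed as a `transformers` BitsAndBytesConfig. This is a reconstruction from the bullets above, not the original training script; the llm_int8_* values listed appear to be the library defaults and are therefore omitted.

```python
# Sketch: the quantization config above, reconstructed as a
# transformers BitsAndBytesConfig (not the original training script).
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: True
    bnb_4bit_quant_type="nf4",              # bnb_4bit_quant_type: nf4
    bnb_4bit_use_double_quant=False,        # bnb_4bit_use_double_quant: False
    bnb_4bit_compute_dtype=torch.bfloat16,  # bnb_4bit_compute_dtype: bfloat16
)
```

Passed as `quantization_config=bnb_config` to `AutoModelForCausalLM.from_pretrained`, this loads the base weights in 4-bit NF4 while performing compute in bfloat16, which is what makes QLoRA fine-tuning of a 7B model feasible on modest GPU memory.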

Framework versions

  • PEFT 0.7.0.dev0

Inference Examples

The serverless Inference API does not yet support PEFT models for this pipeline type.
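Until then, the adapter can be run locally with peft. A minimal sketch, assuming the repo ids on this card; the prompt and generation settings are illustrative only, not taken from the training data:

```python
# Sketch: local inference by attaching the QLoRA adapter to the
# sharded base model. Requires transformers, peft, and accelerate.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "TinyPixel/Llama-2-7B-bf16-sharded"
adapter_id = "fbanespo/Llama2-7b-qlora-chat-support-bot-faq"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative FAQ-style prompt (the training prompt format is not
# documented on this card).
prompt = "How do I reset my password?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```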

Model tree for fbanespo/Llama2-7b-qlora-chat-support-bot-faq

This model is a PEFT adapter for the base model TinyPixel/Llama-2-7B-bf16-sharded.