
Model details

  • Model Name: Llama-3.2-1B-finetuned-amazon-reviews-QA-peft-4bit
  • Author: Chryslerx10
  • Base Model: Llama-3.2-1B
  • Task: Product Question Answering (QA) based on customer reviews
  • Framework: Hugging Face Transformers
  • Fine-tuning: PEFT with LoRA adapters
  • Quantization: 4-bit via BitsAndBytes for efficient deployment

Model Description

This model is fine-tuned on a dataset of product-related QA pairs generated from customer reviews. It is designed to assist users by answering questions about products based on relevant review data, using LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning and 4-bit quantization for deployment on resource-constrained devices.

Key features

  • Instruction Tuning: Fine-tuned with clear step-by-step instructions for generating accurate, user-friendly answers.
  • 4-bit Quantization: Optimized for efficient inference with low memory usage.
  • LoRA Fine-tuning: Enables effective fine-tuning with fewer resources while maintaining performance.
  • Conversational Tone: Provides responses that are conversational, relevant, and easy to understand.

Limitations

  • Product-related Questions Only: The model is trained to answer questions about products; when a question is unrelated, it politely tells the user it can only help with product-related queries.
  • Review Dependency: When sufficient review data is unavailable, the model acknowledges this limitation in its response.

Training Details

Dataset

The model was trained on the Amazon Reviews dataset, restricted to product reviews from the beauty category. The dataset was preprocessed into a structured format with the following fields:

  • Question: The user question, derived by running sentiment analysis on review segments and assigning a matching question from a predefined pool.
  • Related Reviews: The context passed to the model: a set of reviews that share the same sentiment and question.
  • Answer: The target response, generated by summarizing the related reviews with a BART model (see the record sketch below).
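
For illustration, a single preprocessed record might look like the sketch below; the field names are assumptions mirroring the Question / Related Reviews / Answer structure above, and the values are invented:

    record = {
        # Question assigned from the predefined pool based on review sentiment
        "question": "How is the scent of this product?",
        # Reviews sharing the same sentiment and question, joined as context
        "review": "Smells amazing and lasts all day. | The fragrance is subtle but pleasant.",
        # BART-generated summary of the related reviews, used as the target answer
        "summary": "Reviewers consistently praise the scent as pleasant and long-lasting.",
    }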

Fine-tuning configuration

  • Batch Size: 2 per device (train/eval)
  • Epochs: 30
  • Learning Rate: 2e-5
  • Evaluation Strategy: Per epoch
  • Save Strategy: Epoch-based, with a limit of 2 checkpoints.
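
As a reference, these hyperparameters map onto Hugging Face TrainingArguments roughly as follows; the output directory is a placeholder, and argument names can vary slightly across transformers versions:

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./llama-3.2-1b-reviews-qa",  # placeholder path
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        num_train_epochs=30,
        learning_rate=2e-5,
        eval_strategy="epoch",   # "evaluation_strategy" in older transformers releases
        save_strategy="epoch",
        save_total_limit=2,      # keep at most 2 checkpoints
    )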

PEFT configuration

  • Adapter Type: LoRA
  • LoRA Rank (r): 8
  • Alpha: 16
  • Dropout: 0.5
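
Expressed as a peft LoraConfig, this looks roughly like the sketch below; task_type and target_modules are assumptions, since the card does not list them (the attention projections are a common choice for Llama-family models):

    from peft import LoraConfig

    lora_config = LoraConfig(
        r=8,               # LoRA rank
        lora_alpha=16,     # scaling factor
        lora_dropout=0.5,  # dropout applied inside the adapter layers
        task_type="CAUSAL_LM",                # assumed task type
        target_modules=["q_proj", "v_proj"],  # assumed target layers
    )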

Quantization configuration

  • Quantization Type: NF4
  • Compute Dtype: Float16
  • Double Quantization: Enabled
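
These settings correspond to the following BitsAndBytesConfig, which can be passed to from_pretrained when loading the base model in 4-bit (a sketch; the base-model repo id is an assumption):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # NF4 quantization
        bnb_4bit_compute_dtype=torch.float16,  # float16 compute dtype
        bnb_4bit_use_double_quant=True,        # double quantization enabled
    )

    # Example: load the base model with this quantization config
    # model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B",
    #                                              quantization_config=bnb_config)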

Inference

Loading the model

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel, PeftConfig

    peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-amazon-reviews-QA-peft-4bit"

    # Read the adapter config to find the base model it was trained from
    config = PeftConfig.from_pretrained(peft_model_id)

    # Load the base model; to load it in 4-bit, additionally pass
    # quantization_config=bnb_config (see the quantization section above)
    model = AutoModelForCausalLM.from_pretrained(
        config.base_model_name_or_path,
        device_map='auto',
        return_dict=True
    )

    tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
    tokenizer.pad_token = tokenizer.eos_token

    # Attach the LoRA adapter weights on top of the base model
    peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')

Generating outputs

    def create_chat_template(question, context):
        # Build the instruction-style prompt; the [Answer] section is left
        # empty so the model fills it in at generation time
        text = f"""[Instruction] You are a question-answering agent specialized in helping users with their queries about products based on relevant customer reviews. Your job is to analyze the reviews provided in the context and generate an accurate, helpful, and informative response to the question asked.

    1. Read the user's question carefully.
    2. Use the reviews given in the context to formulate your answer.
    3. If the product reviews don't contain enough information or are missing, inform the user that there aren't sufficient reviews to answer the question.
    4. If the question is unrelated to products, politely inform the user that you can only assist with product-related queries.
    5. Structure your response in a conversational and user-friendly manner.

    Your goal is to provide helpful and contextually relevant answers to product-related questions.

    [Question]\n {question}

    [Related Reviews]\n {context if context else ''}

    [Answer]\n"""
        return text

    from transformers import GenerationConfig

    def generate_response(question, context):
        text = create_chat_template(question, context)
        inputs = tokenizer([text], return_tensors='pt', padding=True,
                           truncation=True).to(peft_loaded_model.device)

        config = GenerationConfig(
            max_length=256,
            temperature=0.5,
            top_k=5,
            top_p=0.95,
            repetition_penalty=1.2,
            do_sample=True,
            penalty_alpha=0.6  # only used by contrastive search; ignored when do_sample=True
        )

        response = peft_loaded_model.generate(**inputs, generation_config=config)
        output = tokenizer.decode(response[0], skip_special_tokens=True)
        return output

    # Example usage (the reviews below are invented for illustration)
    question = "How is the battery life of this product?"
    context = "Battery easily lasts two days. | Still going strong after a week of light use."
    response = generate_response(question, context)
    print(response)