EffiLLaMA

EffiLLaMA is a fine-tuned version of the LLaMA 3.2-1B Instruct model, designed for generating Harry Potter-themed text. Fine-tuning was performed with LoRA (Low-Rank Adaptation) and QLoRA, enabling parameter-efficient adaptation for causal language modeling.
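
As a rough illustration, LoRA fine-tuning of this kind injects small low-rank adapter matrices into selected projection layers while the base weights stay frozen. The configuration below is a minimal sketch using the Hugging Face peft library; the rank, alpha, and target modules are illustrative assumptions, not the exact values used for EffiLLaMA.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative values only -- the actual EffiLLaMA hyperparameters may differ.
lora_config = LoraConfig(
    r=16,                                 # low-rank dimension of the adapter matrices
    lora_alpha=32,                        # scaling factor applied to the adapter output
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable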


Model Details

Model Description

EffiLLaMA was created by fine-tuning the LLaMA 3.2-1B Instruct model on a dataset derived from text extracted from the Harry Potter book series. The goal was a model that generates responses grounded in the Harry Potter universe while staying consistent with the canon.

  • Developed by: Anantha Padmanaban Krishna Kumar (fine-tuning) and Meta (base model development)
  • Model type: Causal Language Model with LoRA/QLoRA fine-tuning applied to the base model meta-llama/Llama-3.2-1B-Instruct
  • Language(s) (NLP): English
  • License: MIT for the fine-tuned adapter; use of the base model remains subject to Meta's licensing terms for Llama 3.2
  • Finetuned from model: meta-llama/Llama-3.2-1B-Instruct (developed by Meta)

Uses

Direct Use

EffiLLaMA can generate Harry Potter-themed text for entertainment, storytelling, and educational purposes. It is best used in settings where adherence to the Harry Potter canon is important.

Downstream Use

EffiLLaMA can be fine-tuned further for more specific tasks or integrated into larger systems that require Harry Potter-related text generation.
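
For integration into larger systems, one common option is to merge the LoRA adapter into the base weights so the result can be served like an ordinary checkpoint. The snippet below is a minimal sketch using peft's merge_and_unload; the output directory name is a placeholder.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct", torch_dtype=torch.float16
)
adapted = PeftModel.from_pretrained(base, "AIAlbus/EffiLLaMA")

# Fold the LoRA weights into the base model so no peft dependency is needed at serving time.
merged = adapted.merge_and_unload()
merged.save_pretrained("effillama-merged")  # placeholder output path
AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct").save_pretrained("effillama-merged")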

Out-of-Scope Use

This model should not be used for generating harmful, malicious, or misleading content. It is not designed to handle tasks unrelated to its fine-tuning domain.


Bias, Risks, and Limitations

EffiLLaMA is fine-tuned on text extracted from the Harry Potter book series, which may introduce biases present in the source material. It might also fail to handle out-of-domain inputs effectively or misinterpret questions that require broader contextual knowledge.

Recommendations

Users should ensure the generated text aligns with their requirements and double-check the content for accuracy if used in critical applications.


How to Get Started with the Model

Here’s an example inference script that loads the LoRA adapter on top of the base model and generates responses interactively:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel


def load_model_and_tokenizer(base_model_name: str, lora_checkpoint: str):
    """
    Load the tokenizer and fine-tuned LoRA model.

    Args:
        base_model_name (str): Name of the base model from Hugging Face.
        lora_checkpoint (str): Path or Hugging Face repository of the LoRA fine-tuned model.

    Returns:
        model: The LoRA fine-tuned model.
        tokenizer: Tokenizer for the model.
    """
    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        device_map="auto",
        torch_dtype=torch.float16,
    )
    model = PeftModel.from_pretrained(base_model, lora_checkpoint)
    
    # Set the padding token to the EOS token if not defined
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        model.config.pad_token_id = tokenizer.pad_token_id

    model.eval()
    return model, tokenizer


def generate_response(
    model,
    tokenizer,
    input_text: str,
    prompt_template: str = None,
    max_length: int = 512,
    num_beams: int = 4,
    temperature: float = 0.7,
    top_k: int = 50,
    top_p: float = 0.9,
    repetition_penalty: float = 1.2,
    do_sample: bool = True,
):
    """
    Generate a response from the model given an input prompt.

    Args:
        model: The LoRA fine-tuned model.
        tokenizer: Tokenizer for the model.
        input_text (str): User input or prompt.
        prompt_template (str): Template for generating responses (optional).
        max_length (int): Maximum length of the response.
        num_beams (int): Number of beams for beam search.
        temperature (float): Sampling temperature.
        top_k (int): Top-k sampling parameter.
        top_p (float): Top-p sampling parameter.
        repetition_penalty (float): Penalty for word repetition.
        do_sample (bool): Whether to enable sampling.

    Returns:
        str: Generated response from the model.
    """
    if prompt_template:
        input_text = prompt_template.format(input_text=input_text)

    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )

    with torch.no_grad():
        output = model.generate(
            input_ids=inputs["input_ids"].to(model.device),
            attention_mask=inputs["attention_mask"].to(model.device),
            max_length=max_length,
            num_beams=num_beams,
            temperature=temperature if do_sample else None,
            top_k=top_k if do_sample else None,
            top_p=top_p if do_sample else None,
            repetition_penalty=repetition_penalty,
            do_sample=do_sample,
            early_stopping=True,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    response = tokenizer.decode(output[0], skip_special_tokens=True)
    return response.strip()


def main():
    """
    Main function to load the model, provide user prompts, and generate responses.
    """
    # Configuration
    BASE_MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"
    LORA_CHECKPOINT = "AIAlbus/EffiLLaMA"  # LoRA adapter repo on the Hugging Face Hub (or a local checkpoint path)
    PROMPT_TEMPLATE = """Analyze the given question based on facts established in the Harry Potter series canon.

Rules:
1. Use only information from the books, films, or official sources like interviews with J.K. Rowling.
2. Avoid inventing details, characters, or events not present in canon.
3. If analysis or interpretation is provided, explicitly state it as such.

Question: {input_text}

Factual analysis:
"""

    print("Loading model and tokenizer...")
    model, tokenizer = load_model_and_tokenizer(BASE_MODEL_NAME, LORA_CHECKPOINT)
    print("Model and tokenizer loaded successfully!")

    print("\n--- Welcome to EffiLLaMA Inference Script ---")
    print("Enter your prompt below (type 'exit' to quit):\n")

    while True:
        user_input = input("Your Prompt: ").strip()
        if user_input.lower() == "exit":
            print("Exiting the inference script. Goodbye!")
            break

        print("\nGenerating response...\n")
        response = generate_response(
            model=model,
            tokenizer=tokenizer,
            input_text=user_input,
            prompt_template=PROMPT_TEMPLATE,
        )
        print(f"Response:\n{response}")
        print("-" * 80)


if __name__ == "__main__":
    main()

Training Details

For full training details, including dataset preparation, fine-tuning scripts, and configuration, please refer to the EffiLLaMA GitHub repository.
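
For orientation, a QLoRA setup for a model of this size typically loads the base model in 4-bit NF4 precision before attaching the LoRA adapters. The sketch below shows what that usually looks like with bitsandbytes and peft; the quantization and adapter hyperparameters here are assumptions for illustration, and the authoritative configuration lives in the GitHub repository.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Illustrative QLoRA quantization settings -- see the repository for the real configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
# Prepare the quantized model for training (casts norms, enables input grads for checkpointing).
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    ),
)
# From here, the quantized model plus adapters can be passed to a standard Trainer/SFTTrainer loop.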
