# EffiLLaMA
EffiLLaMA is a fine-tuned version of the LLaMA 3.2-1B Instruct model, designed for generating Harry Potter-themed text. The fine-tuning was conducted with LoRA (Low-Rank Adaptation) and QLoRA (LoRA on a quantized base model), enabling parameter-efficient fine-tuning for causal language modeling.
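As a rough sketch of the approach (illustrative hyperparameters only; the actual rank, alpha, and target modules used for EffiLLaMA may differ and live in the project's training code), LoRA freezes the base model and injects small trainable low-rank adapter matrices via the `peft` library:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative LoRA configuration; not the confirmed EffiLLaMA training values
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the adapter output
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct", torch_dtype=torch.float16
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

Only the adapter matrices are updated during training, which is what makes the approach parameter-efficient.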
## Model Details

### Model Description
EffiLLaMA was developed by fine-tuning the LLaMA 3.2-1B Instruct model on a dataset of text extracted from the Harry Potter book series. The goal was a model that generates responses grounded in the Harry Potter universe while staying consistent with the canon.
- Developed by: Anantha Padmanaban Krishna Kumar (fine-tuning) and Meta (base model development)
- Model type: Causal language model with LoRA/QLoRA fine-tuning applied to the base model meta-llama/Llama-3.2-1B-Instruct
- Language(s) (NLP): English
- License: MIT for the fine-tuned adapter weights; the base model remains subject to Meta's Llama 3.2 license terms
- Finetuned from model: meta-llama/Llama-3.2-1B-Instruct (developed by Meta)
## Uses

### Direct Use
EffiLLaMA can generate Harry Potter-themed text for entertainment, storytelling, and educational purposes. It is best used in settings where adherence to the Harry Potter canon is important.
### Downstream Use
EffiLLaMA can be fine-tuned further for more specific tasks or integrated into larger systems that require Harry Potter-related text generation.
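For example, the published adapter can be reloaded in trainable mode so fine-tuning can continue on a new dataset. A minimal sketch using `peft` (the repository ID is the one from this card):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct", torch_dtype=torch.float16
)
# is_trainable=True keeps the adapter weights unfrozen for further training
model = PeftModel.from_pretrained(base, "AIAlbus/EffiLLaMA", is_trainable=True)
```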
### Out-of-Scope Use
This model should not be used for generating harmful, malicious, or misleading content. It is not designed to handle tasks unrelated to its fine-tuning domain.
## Bias, Risks, and Limitations
EffiLLaMA is fine-tuned on text extracted from the Harry Potter book series, which may introduce biases present in the source material. It might also fail to handle out-of-domain inputs effectively or misinterpret questions that require broader contextual knowledge.
### Recommendations
Users should ensure the generated text aligns with their requirements and double-check the content for accuracy if used in critical applications.
## How to Get Started with the Model
Here’s an example to get started with EffiLLaMA:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
def load_model_and_tokenizer(base_model_name: str, lora_checkpoint: str):
    """
    Load the tokenizer and fine-tuned LoRA model.

    Args:
        base_model_name (str): Name of the base model on Hugging Face.
        lora_checkpoint (str): Path or Hugging Face repository of the LoRA fine-tuned model.

    Returns:
        model: The LoRA fine-tuned model.
        tokenizer: Tokenizer for the model.
    """
    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        device_map="auto",
        torch_dtype=torch.float16,
    )
    # Attach the LoRA adapter weights to the frozen base model
    model = PeftModel.from_pretrained(base_model, lora_checkpoint)

    # Set the padding token to the EOS token if not defined
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        model.config.pad_token_id = tokenizer.pad_token_id

    model.eval()
    return model, tokenizer
def generate_response(
    model,
    tokenizer,
    input_text: str,
    prompt_template: str = None,
    max_new_tokens: int = 512,
    num_beams: int = 4,
    temperature: float = 0.7,
    top_k: int = 50,
    top_p: float = 0.9,
    repetition_penalty: float = 1.2,
    do_sample: bool = True,
):
    """
    Generate a response from the model given an input prompt.

    Args:
        model: The LoRA fine-tuned model.
        tokenizer: Tokenizer for the model.
        input_text (str): User input or prompt.
        prompt_template (str): Template for wrapping the input (optional).
        max_new_tokens (int): Maximum number of new tokens to generate.
        num_beams (int): Number of beams for beam search.
        temperature (float): Sampling temperature.
        top_k (int): Top-k sampling parameter.
        top_p (float): Top-p (nucleus) sampling parameter.
        repetition_penalty (float): Penalty for repeated tokens.
        do_sample (bool): Whether to enable sampling.

    Returns:
        str: Generated response from the model.
    """
    if prompt_template:
        input_text = prompt_template.format(input_text=input_text)

    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )

    generation_kwargs = dict(
        max_new_tokens=max_new_tokens,
        num_beams=num_beams,
        repetition_penalty=repetition_penalty,
        do_sample=do_sample,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    if do_sample:
        # Sampling parameters are only valid when sampling is enabled
        generation_kwargs.update(temperature=temperature, top_k=top_k, top_p=top_p)

    with torch.no_grad():
        output = model.generate(
            input_ids=inputs["input_ids"].to(model.device),
            attention_mask=inputs["attention_mask"].to(model.device),
            **generation_kwargs,
        )

    # Decode only the newly generated tokens so the prompt is not echoed back
    generated_tokens = output[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return response.strip()
def main():
    """
    Main function to load the model, provide user prompts, and generate responses.
    """
    # Configuration
    BASE_MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"
    LORA_CHECKPOINT = "AIAlbus/EffiLLaMA"  # Replace with your Hugging Face model repo or local path
    PROMPT_TEMPLATE = """Analyze the given question based on facts established in the Harry Potter series canon.
Rules:
1. Use only information from the books, films, or official sources like interviews with J.K. Rowling.
2. Avoid inventing details, characters, or events not present in canon.
3. If analysis or interpretation is provided, explicitly state it as such.
Question: {input_text}
Factual analysis:
"""

    print("Loading model and tokenizer...")
    model, tokenizer = load_model_and_tokenizer(BASE_MODEL_NAME, LORA_CHECKPOINT)
    print("Model and tokenizer loaded successfully!")

    print("\n--- Welcome to EffiLLaMA Inference Script ---")
    print("Enter your prompt below (type 'exit' to quit):\n")

    while True:
        user_input = input("Your Prompt: ").strip()
        if user_input.lower() == "exit":
            print("Exiting the inference script. Goodbye!")
            break

        print("\nGenerating response...\n")
        response = generate_response(
            model=model,
            tokenizer=tokenizer,
            input_text=user_input,
            prompt_template=PROMPT_TEMPLATE,
        )
        print(f"Response:\n{response}")
        print("-" * 80)


if __name__ == "__main__":
    main()
```
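For deployment, it can be convenient to merge the adapter into the base weights so inference no longer requires `peft`. A minimal sketch, continuing from the `model` and `tokenizer` returned by `load_model_and_tokenizer` above (the output path is hypothetical):

```python
# Fold the LoRA weights into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("effillama-merged")
tokenizer.save_pretrained("effillama-merged")
```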
## Training Details
For full training details, including dataset preparation, fine-tuning scripts, and configuration, please refer to the EffiLLaMA GitHub repository.
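The QLoRA side of the setup quantizes the frozen base model to 4-bit before the LoRA adapters are attached, which keeps memory usage low during training. A sketch of a typical configuration with `bitsandbytes` (illustrative settings, not the confirmed training values):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Load the frozen base model in 4-bit NF4 precision (illustrative settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
# Prepares the quantized model for training (casts norms, enables input grads)
base = prepare_model_for_kbit_training(base)
```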