GPT-2 Small — LoRA Screenplay Adapter (Apple Silicon / MPS)
Study Context: This is the second model in a dual-architecture comparative study on screenplay generation using GPT-2 Small. The first model was a full-parameter fine-tune executed on a cloud NVIDIA T4 GPU. This adapter was trained entirely on consumer edge hardware — an Apple Silicon MacBook Air — using Low-Rank Adaptation (LoRA) to operate within the hard constraints of a fanless, unified-memory device.
Loading Notice — This is a PEFT Adapter
This repository contains LoRA adapter weights only, not a standalone model. You cannot load it with AutoModelForCausalLM.from_pretrained() directly. You must load the base gpt2 model and wrap it using peft.PeftModel. See the Usage section for the correct loading pattern.
Model Description
This model is a LoRA (Low-Rank Adaptation) fine-tune of OpenAI's GPT-2 Small (124M parameters), targeting causal language modeling for professional screenplay generation. Rather than updating all 124M parameters, LoRA injects trainable rank-decomposition matrices exclusively into the fused query/key/value projection (c_attn) of each attention block, leaving the base model frozen.
| Property | Value |
|---|---|
| Base Model | GPT-2 Small (openai-community/gpt2) |
| Fine-tune Method | PEFT / LoRA (Low-Rank Adaptation) |
| Total Parameters | 124,439,808 (base, frozen) |
| Trainable Parameters | 294,912 |
| % of Network Updated | 0.2364% (trainable / base + adapter parameters) |
| Target Modules | c_attn (attention projection layers) |
| Task | Causal Language Modeling / Screenplay Generation |
| Training Backend | MPS (Metal Performance Shaders) via PyTorch |
By updating only 294,912 parameters instead of 124 million, the entire training run was made feasible on hardware that would otherwise fail within minutes under a full-parameter regime.
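The 294,912 figure can be reproduced with simple arithmetic. The sketch below assumes a LoRA rank of 8, which this card does not state; it is simply the rank that reproduces the reported count when adapters are attached to c_attn in each of GPT-2 Small's 12 blocks. The percentage matches the table only when the adapter weights are included in the denominator.

```python
# Back-of-envelope check of the trainable-parameter count.
# ASSUMPTION: LoRA rank r = 8 (not stated in this card).
HIDDEN = 768          # GPT-2 Small hidden size
QKV_OUT = 3 * HIDDEN  # c_attn is the fused query/key/value projection
N_LAYERS = 12         # transformer blocks in GPT-2 Small
RANK = 8              # assumed, see note above

# Each adapted layer adds A (RANK x HIDDEN) and B (QKV_OUT x RANK).
per_layer = RANK * HIDDEN + QKV_OUT * RANK
trainable = per_layer * N_LAYERS

BASE_PARAMS = 124_439_808  # frozen base parameters reported in this card
pct_of_total = 100 * trainable / (BASE_PARAMS + trainable)

print(trainable)               # 294912
print(round(pct_of_total, 4))  # 0.2364
```

A different rank or additional target modules would change the count, so the match is suggestive rather than conclusive.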
Hardware & MLOps Optimizations
The Constraint Problem
A full-parameter fine-tuning attempt on the same MacBook Air was abandoned early. With gradient accumulation and a full optimizer state spanning all 124M parameters, the pipeline averaged 103 seconds per step — a pace that would have required over 133 hours to complete the same 4,700-step schedule, making it operationally non-viable on a thermally passive device.
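The projected duration follows directly from the reported step time and schedule length. A minimal sketch of that arithmetic, using only the two numbers quoted above:

```python
# Projected wall-clock time for full fine-tuning on the MacBook Air.
SECONDS_PER_STEP = 103  # reported average for the abandoned full-parameter run
TOTAL_STEPS = 4_700     # length of the step schedule

projected_hours = SECONDS_PER_STEP * TOTAL_STEPS / 3600
print(round(projected_hours, 1))  # 134.5
```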
The LoRA Solution
Switching to LoRA with the following configuration resolved all three critical constraints simultaneously:
| Constraint | Full-Parameter Outcome | LoRA Outcome |
|---|---|---|
| Memory (OOM) | Frequent crashes | Stable — optimizer state ~2.3MB |
| Thermal throttling | Sustained throttle >30min | No throttling across 7h 51m run |
| Step throughput | ~103 seconds/step | ~6.01 seconds/step (17× faster) |
The LoRA adapter's optimizer state is proportional only to trainable parameters (294,912), not the full network — this is what enabled the MPS backend to maintain sustained throughput on unified memory without page faults or thermal shutdown.
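To make the optimizer-state argument concrete, the sketch below estimates AdamW's memory footprint for both regimes. It assumes float32 state and two moment tensors per trainable parameter; the card lists only "default MPS precision", so the byte width is an assumption.

```python
# Rough AdamW optimizer-state estimate.
# ASSUMPTIONS: float32 state (4 bytes per value) and two moment
# tensors (first and second moment) per trainable parameter.
BYTES_PER_VALUE = 4
MOMENTS_PER_PARAM = 2

def adamw_state_mb(n_trainable: int) -> float:
    """Optimizer-state size in megabytes (1 MB = 1e6 bytes)."""
    return n_trainable * MOMENTS_PER_PARAM * BYTES_PER_VALUE / 1e6

lora_mb = adamw_state_mb(294_912)      # adapter-only training
full_mb = adamw_state_mb(124_439_808)  # full-parameter training

# Step-time ratio between the two regimes reported in the table.
speedup = 103 / 6.01

print(round(lora_mb, 2), round(full_mb, 1), round(speedup, 1))
```

Under these assumptions the adapter's state comes to about 2.36 MB (2.25 MiB), in line with the ~2.3MB figure, against roughly 1 GB for the full network.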
Compute Profile
| Property | Value |
|---|---|
| Hardware | Apple MacBook Air M2 Base (Unified Memory) |
| Compute Backend | PyTorch MPS (Metal Performance Shaders) |
| Precision | Default MPS precision |
| Optimizer | AdamW |
| Batch Size | per_device_train_batch_size = 4 |
| Gradient Accumulation | Disabled (memory constraint) |
| Avg. Step Throughput | ~6.01 seconds/step |
| Total Training Time | 7 hours, 51 minutes, 2 seconds |
Training Metrics
Dataset Coverage
The full screenplay corpus used in this study contains approximately 94 million tokens. Due to the step budget constraint of a local run, this adapter was trained on approximately 51% of the corpus (0.51 epoch coverage), compared to the full-parameter cloud model which completed a full epoch.
| Property | Value |
|---|---|
| Total Corpus Size | ~94 million tokens |
| Epoch Coverage | 0.51 (51% of corpus) |
| Total Steps | 4,700 |
Loss Convergence
| Metric | Value |
|---|---|
| Final Training Loss | 1.9806 |
| Final Evaluation Loss | 2.4017 |
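These losses can also be read as perplexities. The sketch below assumes the reported values are mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated explicitly in this card.

```python
import math

# ASSUMPTION: losses are mean per-token cross-entropy in nats.
losses = {"train": 1.9806, "eval": 2.4017}

# Perplexity is the exponential of the cross-entropy loss.
perplexity = {split: math.exp(loss) for split, loss in losses.items()}

for split, ppl in perplexity.items():
    print(f"{split}: {ppl:.2f}")
```

Under that assumption the training perplexity is roughly 7.2 and the evaluation perplexity roughly 11.0.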
MLOps Trade-off Assessment
The train/eval loss gap (1.98 vs. 2.40), together with the distance to the full-parameter model's validation loss, reflects two compounding constraints inherent to this training configuration:
- Partial corpus coverage. At 0.51 epochs, the adapter saw only about half of the corpus and has not converged on the full vocabulary and structural distribution of the screenplay data. The full-parameter cloud model, which completed a full epoch, achieved a final validation loss of 1.3194, a gap of ~1.08 loss units attributable to both architecture and data coverage.
- LoRA's intentional frozen-base trade-off. LoRA achieves its memory efficiency by keeping 99.76% of the network frozen. This is architecturally correct for adapter-based transfer learning, but it limits how deeply the model can reshape its internal representations compared to a full-parameter overwrite.
This is not a model failure. The adapter successfully acquired structural screenplay formatting conventions — scene sluglines, character cues, dialogue block structure — within a training envelope that would be impossible for full fine-tuning on the same device. It represents a calibrated, deliberate engineering trade-off: edge-feasibility over depth of convergence.
Usage & Inference
Installation
pip install transformers peft torch
Loading the Adapter
Because this is a PEFT adapter, loading requires two steps: initialize the frozen base model, then wrap it with the adapter weights.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from peft import PeftModel

base_model_id = "openai-community/gpt2"
adapter_id = "raghavnimbalkar/gpt2-screenplay-mac-lora"

# --- Device Selection ---
if torch.backends.mps.is_available():
    device = torch.device("mps")   # Apple Silicon
elif torch.cuda.is_available():
    device = torch.device("cuda")  # NVIDIA GPU
else:
    device = torch.device("cpu")
print(f"Using device: {device}")

# --- Load Base Model + Adapter ---
tokenizer = GPT2Tokenizer.from_pretrained(base_model_id)
base_model = GPT2LMHeadModel.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
model = model.to(device)
model.eval()
Running Inference
prompt = "INT. ABANDONED WAREHOUSE - NIGHT\n\nRAIN hammers the corrugated roof. DETECTIVE COLE moves through the dark, flashlight cutting the shadows."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_length=512,
        temperature=0.85,
        top_p=0.92,
        repetition_penalty=1.15,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
Recommended Sampling Parameters
| Parameter | Recommended Value | Notes |
|---|---|---|
| max_length | Up to 512 | Counts prompt plus generated tokens; GPT-2's context window is 1,024 tokens |
| temperature | 0.85 | Allows moderate creative variance |
| top_p | 0.92 | Nucleus sampling threshold |
| repetition_penalty | 1.15 | Essential: prevents screenplay boilerplate looping |
| do_sample | True | Required for temperature/top_p sampling to activate |
Optional: Merging Adapter Weights
If you want a single standalone model file (e.g., for faster inference without the PEFT library dependency), you can merge the adapter into the base model and save:
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./screenplay-gpt2-lora-merged")
tokenizer.save_pretrained("./screenplay-gpt2-lora-merged")
Note: The merged model will be ~500MB (full GPT-2 Small size) rather than the ~1.2MB adapter. The merged weights are mathematically identical to using PeftModel; this is purely a deployment convenience.
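The equivalence between merged and wrapped weights can be seen on a toy example. The sketch below uses made-up 2x3 matrices (not this adapter's weights) and assumes the standard LoRA formulation from Hu et al., in which the adapted layer computes W x + (alpha / r) B A x, so folding the update into W yields the same output.

```python
# Toy illustration of why merging a LoRA update preserves outputs.
# All values are hypothetical; they are not taken from this adapter.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add_scaled(a, b, scale):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]  # frozen weight, shape (2, 3)
A = [[0.5, 0.0, 1.0]]                    # LoRA down-projection, shape (1, 3)
B = [[2.0], [-1.0]]                      # LoRA up-projection, shape (2, 1)
scale = 2.0                              # alpha / r
x = [[1.0], [2.0], [3.0]]                # input column vector, shape (3, 1)

# Wrapped path: base output plus the scaled low-rank correction.
adapter_out = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Merged path: fold the correction into the weight once, then apply it.
W_merged = add_scaled(W, matmul(B, A), scale)
merged_out = matmul(W_merged, x)

print(adapter_out, merged_out)
```

In floating-point practice the two paths can differ by rounding error, but the underlying algebra is the same.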
Comparison with Full-Parameter Model
This adapter is one half of an ongoing comparative study. The table below summarizes the key architectural and performance differences between both trained models.
| Property | Full-Parameter (Cloud) | LoRA Adapter (Local) |
|---|---|---|
| Hardware | NVIDIA T4 (Cloud) | Apple Silicon MacBook Air (MPS) |
| Trainable Params | 124,439,808 (100%) | 294,912 (0.24%) |
| Epoch Coverage | 1.0 (full corpus) | 0.51 (half corpus) |
| Total Steps | 9,272 | 4,700 |
| Training Time | 7h 43m 30s | 7h 51m 02s |
| Final Eval Loss | 1.3194 | 2.4017 |
| Step Throughput | ~3.0s/step (T4) | ~6.01s/step (MPS) |
| MLOps Event | Hardware preemption + hot-resume | 17× speedup via LoRA optimization |
Both models spent approximately the same wall-clock time training. The divergence in final loss therefore reflects full-parameter depth and full-epoch coverage versus adapter-based efficiency on half the corpus, not a difference in training time.
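The throughput figures in the table can be cross-checked from the reported training times and step counts. A minimal sketch using only numbers from this card:

```python
# Cross-check step throughput from total training time and step count.

def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    return hours * 3600 + minutes * 60 + seconds

runs = {
    "full-parameter (T4)": (to_seconds(7, 43, 30), 9_272),
    "LoRA adapter (MPS)": (to_seconds(7, 51, 2), 4_700),
}

throughput = {name: total / steps for name, (total, steps) in runs.items()}

for name, sec_per_step in throughput.items():
    print(f"{name}: {sec_per_step:.2f} s/step")
```

Both results land on the reported ~3.0 and ~6.01 seconds per step.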
Intended Use
Intended uses:
- Screenplay drafting assistance and scene continuation on consumer hardware
- Comparative reference point for PEFT vs. full fine-tuning studies on GPT-2
- Offline, locally runnable script generation (no cloud dependency after download)
- Research into LoRA effectiveness on structured, domain-specific creative text
Out-of-scope uses:
- Production script generation without editorial review
- Factual or knowledge-retrieval tasks
- Any application requiring output truthfulness or citation
Bias, Risks, and Limitations
- Trained on an unfiltered screenplay corpus; outputs may reflect mature themes, stereotypes, or biases present in the training data.
- At 0.51 epoch coverage, the model's understanding of the full screenplay vocabulary distribution is incomplete. Long-form coherence is limited.
- The train/eval loss gap suggests moderate overfitting on seen structural patterns. Outputs are more formulaic than the full-parameter counterpart.
- No RLHF or safety fine-tuning has been applied.
Citation
@article{radford2019language,
  title = {Language Models are Unsupervised Multitask Learners},
  author = {Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year = {2019}
}
@article{hu2021lora,
  title = {LoRA: Low-Rank Adaptation of Large Language Models},
  author = {Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
  year = {2021},
  journal = {arXiv preprint arXiv:2106.09685}
}
Model Card Contact
For questions about methodology, training configuration, or the broader comparative study, please open an issue in this repository.