ATC-LLAMA-LORA: ICAO-Compliant Pilot Response Generator

Model Summary

ATC-LLAMA-LORA is a LoRA fine-tuned version of Meta Llama 3.1 8B Instruct trained to generate ICAO-compliant pilot radiotelephony responses to Air Traffic Control (ATC) transmissions. It is designed for use in UAV ground control stations and human-autonomy teaming research in aviation.

The model was developed as part of the SCOPE (Structured Constraint Optimization for Grammar-informed gEneration) framework at the AIDA³ Research Center, Purdue University, within its research on safe human-autonomy teaming in aviation.


Intended Use

Primary Use

  • Generating real-time pilot readback responses to ATC instructions in UAV ground control stations
  • Evaluating LLM-based ATC communication compliance against ICAO Doc 9432 standards
  • Research on language model adaptation for safety-critical aviation communication

Out-of-Scope Use

  • This model is not certified for use in real-world flight operations
  • Not intended for manned aircraft communication systems
  • Not a substitute for certified avionics or licensed ATC systems

Training Details

Base Model

  • meta-llama/Llama-3.1-8B-Instruct

Training Data

  • Source: LDC ATC Corpus (Linguistic Data Consortium)
  • Raw pairs: 1,326 ATC–pilot exchange pairs
  • After cleaning: 238 pairs (179 gold + 59 silver)
    • Gold: Human-annotated readback_correct=True pairs
    • Silver: Heuristic-filtered pairs from instruction_readback, checkin_response, and question_answer exchange types
  • Excluded: the 115 pairs annotated as non-compliant (readback_correct=False), plus pairs containing filler words or non-ICAO phrases and exchanges shorter than 3 or longer than 40 tokens
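Because training used TRL's SFTTrainer, each retained pair has to end up as one training example. TRL accepts conversational datasets whose rows hold a "messages" list of role/content dicts. The sketch below shows one plausible layout under that assumption: the function name `to_chat_example` is hypothetical, the "ATC: " prefix mirrors the user turn in the Usage example, and the sample pair is illustrative rather than taken from the corpus.

```python
# Hypothetical conversion of one ATC-pilot pair into TRL's conversational
# "messages" format; the card does not publish its exact preprocessing.
SYSTEM_PROMPT = (
    "You are a pilot responding to Air Traffic Control (ATC) transmissions. "
    "Respond concisely using ICAO radiotelephony phraseology. "
    "Always readback key instructions and values exactly as given. "
    "End every transmission with your callsign."
)


def to_chat_example(atc_text: str, pilot_text: str) -> dict:
    """Wrap one pair as system / user (ATC) / assistant (pilot) turns."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"ATC: {atc_text}"},
            {"role": "assistant", "content": pilot_text},
        ]
    }


# Illustrative pair, not a corpus record.
example = to_chat_example(
    "Ultra One Two Three, turn left heading two four zero.",
    "Left heading two four zero, Ultra One Two Three.",
)
print(example["messages"][1]["content"])
```

Keeping the system prompt in every example means the adapter always sees the same instruction it will receive at inference time.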

Cleaning Criteria

Pairs were retained if they:

  • Were annotated as readback_correct=True by human experts, OR
  • Belonged to one of the silver exchange types (instruction_readback, checkin_response, question_answer) and passed heuristic filters:
    • No filler words (UH, UM, THANKS, SIR, OKAY, etc.)
    • No informal expressions (GONNA, THAT'S, VERY GOOD, etc.)
    • Response length between 3 and 40 tokens
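The retention rule above can be written as a single predicate. This is a minimal sketch under stated assumptions. The word lists contain only the examples the card names (its lists end in "etc."), so they are incomplete. "Token" is taken to mean a whitespace-separated word with trailing punctuation stripped. The function `keep_pair` and its arguments are hypothetical.

```python
from typing import Optional

# Partial, illustrative lists: the card names only examples ("etc."),
# so these are NOT the complete lists used to clean the corpus.
FILLER_WORDS = {"UH", "UM", "THANKS", "SIR", "OKAY"}
INFORMAL_PHRASES = ("GONNA", "THAT'S", "VERY GOOD")
SILVER_TYPES = {"instruction_readback", "checkin_response", "question_answer"}


def keep_pair(response: str, exchange_type: str, readback_correct: Optional[bool]) -> bool:
    """Hypothetical gold-or-silver retention rule for one ATC-pilot pair."""
    if readback_correct is True:   # gold: human-annotated correct readback
        return True
    if readback_correct is False:  # annotated non-compliant pairs are excluded
        return False
    if exchange_type not in SILVER_TYPES:
        return False
    # Assumption: "token" means a whitespace-separated word, punctuation stripped.
    words = [w.strip(".,") for w in response.upper().split()]
    words = [w for w in words if w]
    if not 3 <= len(words) <= 40:
        return False
    if any(w in FILLER_WORDS for w in words):
        return False
    # Pad with spaces so multi-word phrases only match on word boundaries.
    padded = " " + " ".join(words) + " "
    return not any(f" {phrase} " in padded for phrase in INFORMAL_PHRASES)


print(keep_pair("Left heading two four zero, Ultra One Two Three.", "instruction_readback", None))
```

Following the criteria list literally, gold pairs bypass the heuristic filters, and only unannotated pairs of the silver exchange types are filtered.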

Fine-Tuning Method

  • Method: LoRA (Low-Rank Adaptation) via PEFT
  • Trainer: TRL SFTTrainer
  • LoRA rank: 16
  • LoRA alpha: 32
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Dropout: 0.05
  • Epochs: 5
  • Batch size: 2 (effective 8 with gradient accumulation)
  • Learning rate: 2e-4 with cosine scheduler
  • Precision: bfloat16
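The hyperparameters above can be sanity-checked with a little arithmetic. The sketch assumes the public Llama 3.1 8B configuration: hidden size 4096, MLP size 14336, 32 decoder layers, and 8 key/value heads of dimension 128. It also assumes PEFT's default LoRA scaling of alpha / r. Finally, it assumes the "effective 8" batch comes from a single device, so the accumulation-step count is an inference rather than a reported setting.

```python
# Llama 3.1 8B dimensions (public model config); the single-device
# assumption behind grad_accum_steps is not stated in the card.
HIDDEN, INTERMEDIATE, LAYERS, KV_DIM = 4096, 14336, 32, 8 * 128
RANK, ALPHA = 16, 32

# (in_features, out_features) of each targeted linear layer in one decoder block.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_f: int, out_f: int, r: int) -> int:
    """A is r x in_f and B is out_f x r, so the adapter adds r * (in_f + out_f)."""
    return r * (in_f + out_f)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * LAYERS
scaling = ALPHA / RANK          # the low-rank update B @ A is multiplied by alpha / r
grad_accum_steps = 8 // 2       # effective batch / per-step batch, one device assumed

print(f"trainable LoRA params: {total:,}")                  # 41,943,040
print(f"scaling alpha/r: {scaling}")                        # 2.0
print(f"gradient accumulation steps: {grad_accum_steps}")   # 4
```

Under these assumptions the adapter trains about 41.9M parameters, roughly half a percent of the base model, and its update is scaled by 2.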

Training Results

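| Epoch | Train loss | Eval loss | Token accuracy |
|-------|------------|-----------|----------------|
| 1     | 0.862      | 0.385     | 90.6%          |
| 2     | 0.369      | 0.327     | 91.2%          |
| 3     | 0.274      | 0.302     | 91.7%          |
| 4     | 0.193      | 0.302     | 92.6%          |
| 5     | 0.136      | 0.315     | 92.6%          |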

The best checkpoint was selected at epoch 3, based on minimum eval loss (0.302) on the gold-only evaluation set. Epoch 4 ties at three decimal places, and eval loss rises to 0.315 by epoch 5.
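Epochs 3 and 4 report the same rounded eval loss, so the tie-breaking rule determines which checkpoint is kept. A minimal sketch of minimum-eval-loss selection that keeps the earliest epoch on a tie is shown below. It uses the rounded values from the training-results table, so it illustrates the rule rather than reproducing the actual run, whose unrounded losses may differ. The function `best_epoch` is hypothetical.

```python
# Eval losses from the training-results table (rounded to three decimals).
EVAL_LOSS = {1: 0.385, 2: 0.327, 3: 0.302, 4: 0.302, 5: 0.315}


def best_epoch(losses: dict[int, float]) -> int:
    """Earliest epoch with the minimum eval loss (strict '<' keeps the first on ties)."""
    best = None
    for epoch in sorted(losses):
        if best is None or losses[epoch] < losses[best]:
            best = epoch
    return best


print(best_epoch(EVAL_LOSS))  # 3
```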


Evaluation

The model is evaluated against:

  • Phraseology standards from ICAO Doc 9432 (Manual of Radiotelephony)
  • Human expert annotations from the LDC ATC Corpus (readback_correct field)
  • SCOPE whitelist n-gram compliance metric for ICAO-standard vocabulary coverage
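The card does not define the SCOPE whitelist metric, so the following is only a generic sketch of a whitelist n-gram coverage score. It assumes the score is the share of a response's n-grams that appear in a whitelist of ICAO-standard phrases. The n-gram order, tokenization, and the toy whitelist are all hypothetical and are not taken from SCOPE.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def whitelist_coverage(response: str, whitelist: set[tuple[str, ...]], n: int = 2) -> float:
    """Share of the response's n-grams that appear in a phrase whitelist (assumed definition)."""
    words = (w.strip(".,").lower() for w in response.split())
    tokens = [w for w in words if w]
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return sum(g in whitelist for g in grams) / len(grams)


# Toy whitelist for illustration only; it is not the SCOPE whitelist.
TOY_WHITELIST = {("descend", "and"), ("and", "maintain")}
print(whitelist_coverage("Descend and maintain three thousand.", TOY_WHITELIST))  # 0.5
```

A real whitelist would also have to cover spoken numbers and callsigns. Otherwise, correct readbacks of values are penalized, as the toy example shows.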

System Prompt

The model was trained with the following system prompt, and it should be prompted with the same text at inference time:

You are a pilot responding to Air Traffic Control (ATC) transmissions.
Respond concisely using ICAO radiotelephony phraseology.
Always readback key instructions and values exactly as given.
End every transmission with your callsign.
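The last two rules of this prompt can be spot-checked on generated output. The sketch below is a hypothetical post-generation check and is not part of the model or of SCOPE. It assumes the callsign is everything before the first comma of the ATC transmission and that values are spoken as number words. It checks only that those values reappear verbatim and that the response ends with the callsign. It does not verify instruction words such as "left".

```python
import re

# Assumed spoken-number vocabulary for illustration; not an official ICAO list.
NUMBER_WORDS = {"zero", "one", "two", "three", "four", "five", "six", "seven",
                "eight", "nine", "niner", "hundred", "thousand", "decimal"}


def words(text: str) -> list[str]:
    """Lower-case alphabetic tokens, punctuation dropped."""
    return re.findall(r"[a-z]+", text.lower())


def callsign_of(atc: str) -> list[str]:
    """Hypothetical heuristic: the callsign is everything before the first comma."""
    return words(atc.split(",", 1)[0])


def number_runs(tokens: list[str]) -> list[tuple[str, ...]]:
    """Group consecutive spoken-number words, e.g. ('two', 'four', 'zero')."""
    runs, current = [], []
    for tok in tokens + [""]:  # empty sentinel flushes the final run
        if tok in NUMBER_WORDS:
            current.append(tok)
        elif current:
            runs.append(tuple(current))
            current = []
    return runs


def contains_seq(tokens: list[str], seq: tuple[str, ...]) -> bool:
    """True if seq occurs as a contiguous slice of tokens."""
    n = len(seq)
    return any(tuple(tokens[i:i + n]) == seq for i in range(len(tokens) - n + 1))


def check_readback(atc: str, response: str) -> dict[str, bool]:
    resp = words(response)
    callsign = callsign_of(atc)
    return {
        # Every spoken value in the ATC message must reappear verbatim.
        "values_read_back": all(contains_seq(resp, run) for run in number_runs(words(atc))),
        # The response must end with the callsign.
        "ends_with_callsign": bool(callsign) and resp[-len(callsign):] == callsign,
    }


atc = "Ultra One Two Three, turn left heading two four zero, descend and maintain three thousand."
print(check_readback(atc, "Left heading Two Four Zero, descend and maintain Three Thousand. Ultra One Two Three."))
```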

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel
import torch

BASE_MODEL = "meta-llama/Llama-3.1-8B-Instruct"
LORA_REPO  = "Sabine-Brunswicker/ATC-LLAMA-LORA"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base      = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, LORA_REPO)
model = model.merge_and_unload()   # fuse adapter for faster inference

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

SYSTEM_PROMPT = (
    "You are a pilot responding to Air Traffic Control (ATC) transmissions. "
    "Respond concisely using ICAO radiotelephony phraseology. "
    "Always readback key instructions and values exactly as given. "
    "End every transmission with your callsign."
)

messages = [
    {"role": "system",    "content": SYSTEM_PROMPT},
    {"role": "user",      "content": "ATC: Ultra One Two Three, turn left heading two four zero, descend and maintain three thousand."},
]

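# Pass the message list straight to the pipeline so it applies the Llama 3.1 chat
# template itself. Re-tokenizing a pre-rendered template string can prepend a
# duplicate <|begin_of_text|> token. (Chat input needs a recent transformers release.)
output = pipe(messages, max_new_tokens=100, temperature=0.1, do_sample=True, return_full_text=False)
print(output[0]["generated_text"])
# Illustrative output (sampling is enabled, so the exact wording can vary):
# "Left heading Two Four Zero, descend and maintain Three Thousand. Ultra One Two Three."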

Limitations

  • Small training set (238 pairs): The model has learned ATC phraseology patterns but may not generalize to all ATC instruction types. Coverage is strongest for heading, altitude, and speed instructions.
  • Corpus bias: The LDC ATC Corpus covers Boston TRACON operations. Performance on other ATC facilities, accents, or phraseology variants may vary.
  • No real-time telemetry grounding during training: The model was trained on transcript pairs without live avionics context. Telemetry-conditioned responses are handled via system prompt injection at inference time in the target application.
  • Not safety-certified: This model has not undergone DO-178C or equivalent aviation software certification and must not be used in certified flight systems.
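The telemetry limitation above states that live aircraft state is supplied by injecting it into the system prompt at inference time. The target application's format is not given, so the sketch below is only one hypothetical way to do it. It appends a plain "key: value" block to the trained system prompt. The function `build_messages`, the header line, and the telemetry fields are assumptions. Because the model never saw such a block during training, its effect on output should be evaluated rather than assumed.

```python
BASE_SYSTEM_PROMPT = (
    "You are a pilot responding to Air Traffic Control (ATC) transmissions. "
    "Respond concisely using ICAO radiotelephony phraseology. "
    "Always readback key instructions and values exactly as given. "
    "End every transmission with your callsign."
)


def build_messages(atc_text: str, telemetry: dict[str, str]) -> list[dict[str, str]]:
    """Append a hypothetical telemetry block to the trained system prompt."""
    lines = [f"{key}: {value}" for key, value in sorted(telemetry.items())]
    system = BASE_SYSTEM_PROMPT + "\nCurrent aircraft state:\n" + "\n".join(lines)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"ATC: {atc_text}"},
    ]


msgs = build_messages(
    "Ultra One Two Three, say altitude.",
    {"altitude": "three thousand feet", "heading": "two four zero"},
)
print(msgs[0]["content"])
```

The resulting list can be passed to the text-generation pipeline in place of the static `messages` in the Usage example.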

Citation

If you use this model in your research, please cite:

@misc{atc-llama-lora-2026,
  author    = {Awoyera, Oluwafemi O. and Brunswicker, Sabine and others},
  title     = {ATC-LLAMA-LORA: ICAO-Compliant Pilot Response Generation via LoRA Fine-Tuning},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Sabine-Brunswicker/ATC-LLAMA-LORA}
}

Acknowledgements

This work was conducted at the AIDA³ Research Center (Center on Artificial Intelligence for Digital, Autonomous and Augmented Aviation) at Purdue University.


License

This model is built on Meta Llama 3.1 and is subject to the Meta Llama 3.1 Community License. Training data from the LDC ATC Corpus is subject to LDC licensing terms.
