Legal Case Summarizer: Fine-tuned LLaMA-3.2-3B-Instruct

This model is a fine-tuned version of Meta's LLaMA-3.2-3B-Instruct, specifically optimized for legal case summarization with bilingual (Arabic-English) capabilities. It generates structured JSON summaries of legal cases, maintaining English keys with Arabic values.

Model Details

Base Model: meta-llama/Llama-3.2-3B-Instruct
Task: Legal Case Summarization
Language Support: Bilingual (Arabic content, English structure)
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Framework: 🤗 Transformers + DeepSpeed + PEFT
License: Same as the base model (Llama 3.2 Community License)

Intended Use

This model is designed for:

Summarizing legal cases in a structured JSON format
Extracting key information from legal documents
Generating bilingual summaries (Arabic content with English structure)
Supporting legal research and document analysis

Training Details

Training Data

The model was trained on a specialized dataset of legal cases, where each example follows a structured format including:

Case Information (numbers, dates, courts)
Persons Involved
Case Background
Key Issues
Arguments Presented
Court Findings
Outcomes
Additional Notes

Training Procedure

Fine-tuning Method: LoRA (Low-Rank Adaptation)
LoRA Configuration:
- Rank: 64
- Alpha: 16
- Target Modules: All attention and MLP projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- Dropout: 0.05
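The sketch below makes the LoRA settings above concrete. It estimates how many trainable adapter parameters they add and the scaling applied to each low-rank update. It uses layer dimensions from the published config of meta-llama/Llama-3.2-3B-Instruct: hidden size 3072, MLP size 8192, 28 decoder layers, and 8 key/value heads of dimension 128. These numbers are not stated in this card. In PEFT, the same settings would be written roughly as `LoraConfig(r=64, lora_alpha=16, lora_dropout=0.05, target_modules=[...])`. Dropout does not change the parameter count.

```python
# Sketch: count LoRA adapter parameters for the configuration above.
# Assumption: dimensions taken from the public Llama-3.2-3B config, not from this card.
HIDDEN = 3072
INTERMEDIATE = 8192
KV_DIM = 8 * 128  # grouped-query attention: 8 key/value heads x head_dim 128
NUM_LAYERS = 28

RANK = 64
ALPHA = 16

# (in_features, out_features) of each targeted linear layer in one decoder block
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) beside a frozen weight matrix."""
    return rank * (in_features + out_features)


def total_lora_params(rank: int = RANK) -> int:
    """Trainable adapter parameters summed over all targeted layers and blocks."""
    per_layer = sum(lora_params(i, o, rank) for i, o in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


# The low-rank update B @ A is multiplied by alpha / r before it is added to W x.
scaling = ALPHA / RANK

if __name__ == "__main__":
    print(f"trainable LoRA parameters: {total_lora_params():,}")
    print(f"update scaling alpha/r: {scaling}")
```

Run as a script, it prints 97,255,424 trainable parameters and a scaling of 0.25. Because alpha is a quarter of the rank here, each adapter's contribution is scaled down before it is added to the frozen weights.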

Training Hyperparameters

{
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 32,
    "num_train_epochs": 3,
    "learning_rate": 2e-4,
    "bf16": true,
    "max_seq_length": 10500,
    "evaluation_strategy": "steps",
    "eval_steps": 500,
    "save_steps": 500
}
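With a per-device batch of 1 and 32 accumulation steps, each optimizer update sees 32 examples per GPU. This card does not state the number of GPUs or the dataset size, so the sketch below takes both as inputs. The step count is approximate, because trainers can round a final partial accumulation differently.

```python
import math

# Values copied from the hyperparameters above.
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 32
EPOCHS = 3
EVAL_STEPS = 500


def effective_batch_size(num_gpus: int) -> int:
    """Examples contributing to one optimizer update across all devices."""
    return PER_DEVICE_BATCH * GRAD_ACCUM * num_gpus


def optimizer_steps(num_examples: int, num_gpus: int) -> int:
    """Approximate total optimizer updates over all epochs."""
    per_epoch = math.ceil(num_examples / effective_batch_size(num_gpus))
    return per_epoch * EPOCHS


def num_evaluations(num_examples: int, num_gpus: int) -> int:
    """How many times an evaluation every EVAL_STEPS updates would run."""
    return optimizer_steps(num_examples, num_gpus) // EVAL_STEPS
```

Take a hypothetical dataset of 5,000 examples on 2 GPUs. `effective_batch_size(2)` is 64, `optimizer_steps(5000, 2)` is 237, and `num_evaluations(5000, 2)` is 0. With `eval_steps` set to 500, a run this small would never reach a step-based evaluation.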

Performance and Limitations

Strengths

Generates well-structured JSON summaries
Keeps JSON keys in English and values in Arabic, as the prompt template requires
Maintains consistent formatting
Extracts key legal information systematically

Limitations

Maximum sequence length: 10500 tokens (the training max_seq_length), shared by the prompt, the case text, and the generated summary
Limited to Arabic-English legal content
Requires well-formatted input following the template
May not handle complex legal terminology outside its training domain
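The 10500-token figure matches the training `max_seq_length`, which counts the whole sequence. The system prompt, the template, the case text, and the generated summary therefore all share that budget. The sketch below is a minimal budget check. It assumes the 1500-token `max_new_tokens` from the Direct Use example. The prompt's token count would come from the tokenizer, for example `len(tokenizer(prompt)["input_ids"])`.

```python
# Sketch: keep prompt + generated summary within the training sequence length.
MAX_SEQ_LEN = 10500            # training max_seq_length from this card
DEFAULT_MAX_NEW_TOKENS = 1500  # max_new_tokens used in the Direct Use example


def max_prompt_tokens(max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS,
                      limit: int = MAX_SEQ_LEN) -> int:
    """Largest prompt (system + template + case text) that leaves room for the output."""
    return limit - max_new_tokens


def tokens_over_budget(prompt_tokens: int,
                       max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS,
                       limit: int = MAX_SEQ_LEN) -> int:
    """How many prompt tokens must be cut; 0 means the request fits."""
    return max(0, prompt_tokens + max_new_tokens - limit)


def fits_budget(prompt_tokens: int,
                max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS,
                limit: int = MAX_SEQ_LEN) -> bool:
    """True when the prompt plus the requested output stays within the limit."""
    return tokens_over_budget(prompt_tokens, max_new_tokens, limit) == 0
```

With the defaults, `max_prompt_tokens()` is 9000. A 9,200-token prompt gives `tokens_over_budget(9200) == 200`. In that case, shorten the case text by about that many tokens or lower `max_new_tokens`.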

Example Output

{
  "case_information": {
    "case_number": "حالة رقم ١٢٣٤",
    "date_of_ruling": "٢٠٢٣/٠١/١٥",
    "court": "المحكمة العليا",
    "main_case_topic": "نزاع تجاري",
    "parties_involved": "شركة أ ضد شركة ب"
  },
  "persons_involved": [
    {
      "name": "محمد أحمد",
      "role": "المدعي"
    }
  ],
  "background_of_the_case": {
    "overview": "نزاع تجاري حول عقد توريد..."
  }
  // Additional fields omitted for brevity
}
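The output contract is English keys with Arabic values across the template's eight top-level sections. A small post-check can flag outputs that drift from it. The sketch below uses only the standard library. The key list is copied from the JSON template in the Direct Use prompt. The Arabic test is an assumption: it only checks the basic Arabic Unicode block (U+0600 to U+06FF). Values written only in Western digits, such as some case numbers, will also be flagged, so treat the result as advisory.

```python
import re

ARABIC = re.compile(r"[\u0600-\u06FF]")  # basic Arabic block only (assumption)
EXPECTED_TOP_LEVEL_KEYS = [
    "case_information",
    "persons_involved",
    "background_of_the_case",
    "key_issues",
    "arguments_presented",
    "courts_findings",
    "outcome",
    "additional_notes",
]


def missing_keys(summary: dict) -> list[str]:
    """Top-level template sections absent from a parsed summary."""
    return [k for k in EXPECTED_TOP_LEVEL_KEYS if k not in summary]


def language_issues(node, path="$"):
    """Yield (path, problem) for non-ASCII keys and non-empty strings without Arabic script."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if not key.isascii():
                yield child, "key is not English/ASCII"
            yield from language_issues(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from language_issues(item, f"{path}[{i}]")
    elif isinstance(node, str) and node.strip() and not ARABIC.search(node):
        yield path, "value contains no Arabic script"
```

For example, a summary whose `persons_involved[0].role` is `"plaintiff"` would be reported as `("$.persons_involved[0].role", "value contains no Arabic script")`.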

Direct Use

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "ahmadsakor/Llama3.2-3B-Instruct-Legal-Summarization"

# Set the device map based on GPU availability
device_map = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token  # Use EOS token as padding token
tokenizer.padding_side = 'left'  # Left padding for batch alignment


# Load the model with the appropriate dtype and device mapping
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=dtype,
    device_map=device_map
)

# Create the text generation pipeline (the model is already loaded with its dtype and device)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# System prompt for the AI assistant's task
system_prompt = """
You are a legal assistant AI that summarizes legal cases in JSON format following a specific template. 
Please ensure all outputs are structured and all keys are in English while the values are in Arabic. 
Be concise, informative, and follow the template strictly.
"""

# Template prompt for the legal text summary
template_prompt="""
###
Legal Text Summary Template
1. Case Information

Case Number: [Insert case number]
Date of Ruling: [Insert date of ruling]
Court: [Insert court name]
Main Case Topic: [Mention the main topic of the case]
Parties Involved: [Insert names of parties]

2. Persons involved including their:
 [List the Persons in the text including their roles in a structured format (Name, Role)]

3. Background of the Case

Overview: [Briefly describe the nature of the case and context]
List of Relevant Dates with corresponding events in Arabic (Date, Event).


4. Key Issues

[List the main legal issues or disputes in the case]

5. Arguments Presented

Claimant’s Arguments:
[Summarize the arguments made by the claimant]
Defendant’s Arguments:
[Summarize the arguments made by the defendant]

6. Court's Findings

Evidence Reviewed: [Mention the evidence the court relied on]
Rulings Made: [Summarize the rulings made by the court]
Legal Principles Applied: [List any relevant legal principles or statutes cited]

7. Outcome

Final Decision: [Describe the court's final decision]
Implications: [Discuss any implications of the ruling]

8. Additional Notes

[Any additional observations or relevant information that should be noted]
#####
Example of output json format:
{
  "case_information": {
    "case_number": "",
    "date_of_ruling": "",
    "court": "",
    "main_case_topic": "",
    "parties_involved": ""
  },
  "persons_involved": [
    {
      "name": "",
      "role": ""
    }
  ],
  "background_of_the_case": {
    "overview": "",
    "relevant_dates": [
      {
        "date": "",
        "event": ""
      }
    ]
  },
  "key_issues": [
  ],
  "arguments_presented": {
    "claimants_arguments": "",
    "defendants_arguments": ""
  },
  "courts_findings": {
    "evidence_reviewed": "",
    "rulings_made": "",
    "legal_principles_applied": [
    ]
  },
  "outcome": {
    "final_decision": "",
    "implications": ""
  },
  "additional_notes": {
    "observations": ""
  }
}
###
Input:\n
"""

# Sample input: a Moroccan Court of Cassation decision (no. 1530, 17 November 2022), in Arabic
full_text = """قرار محكمة النقض رقم 1530 بتاريخ 17 نوفمبر 2022 في القضية الجنحية رقم 20201162213 استئناف - عدم أداء القسط الجزافي - أثره. لم يظهر من وثائق الملف ما يفيد أداء القسط الجزافي أثناء المرحلة الاستئنافية، فإن المحكمة لما رتبت على ذلك عدم قبول الاستئناف تكون قد طبقت القانون تطبيقاً سليماً وأن ما أثير بهذا الخصوص يبقى غير مؤسس. رفض الطلب باسم الطاعن وفق تصريح أفضى به بواسطة الأستاذ (ه.ب) بتاريخ 17. تم استئناف القضية المرفوعة أمام المحكمة الابتدائية بتازة والرامي إلى نقض القرار الصادر عن غرفة الاستئناف، في القضية ذات العدد 7 القاضي بإلغاء الحكم الابتدائي بشأن تعويض مدني قدره 4500 درهم بعد مؤاخذة المطلوبين في النقض (ع-ر١) و(م.1) من أجل جنحة انتزاع عقار بمنزاله الغير والحكم على كل واحد منهم بشهر واحد حبسا موقوف التنفيذ وغرامة نافذة قدرها 500 درهم. إن محكمة النقض، بعد أن تلا المستشار السيد المحفوظ سندالي التقرير المكلف به في القضية، وبعد الإنصات إلى المحامي العام السيد محمد جعبة في مستنتجاته، وبعد المداولة طبقاً للقانون. نظراً للمذكرة المدلى بها من لدن الطاعن بواسطة الأستاذ (ع.) المحامي بمدينة مكناس والمقبول للترافع أمام محكمة النقض والمستوفية للشروط الشكلية المتطلبة قانوناً. في شأن وسيلتي النقض مجتمعتين المستدل بها على النقض والمتخذتين في مجموعها من حرق مقتضيات الفقرة السابعة من المادة 365 من قانون المسطرة الجنائية ونقصان التعليل المنزلي معللاً، حيث انعدامه ذلك أن المحكمة مصدرة القرار المطعون فيه حالفت مقتضيات المادة أعلاه عندما لم تعمل على استدعاء الشاهد (ل.ط) المستمع إليه الوحيد ابتدائياً بعد صرف باقي الشهود الحاضرين من القاعة، والذين لم يتم الاستماع إليهم بدون توضيح السبب في ذلك. كما أن القرار موضوع النقض عندما قضى بعدم قبول استئناف العارض بعلة عدم أداء القسط الجزافي، فإنه استند على علة مخالفة للقانون بحيث إن العارض أدى القسط الجزافي أمامها. كما أن المحكمة باقتصارها على تبني النقاش الذي راج أمام محكمة البداية دون إتمامها بأي إجراء من إجراءات التحقيق للتأكد من صحة المعطيات المستقاة ابتدائياً والتي مكنت من بلوغ النتيجة التي وصلت إليها. 
علماً أن التقاضي هو على مستوى درجتين قانونيتين، كما أن المحكمة عندما اعتبرت تصريح الشاهد (ل.ط) بمثابة إقرار بمفهوم المخالفة على عدم ثبوت الحيازة علماً أنها لم تستمع إليه أمامها حتى تتمكن من بسط رقابتها على تصريحاته والبحث في باقي أوجه التصرف والاستغلال التي دفع بها العارض والتأكد والتدقيق في الحيازة المادية المطلوبة، وهو ما يجعلها مخالفة لمقتضيات الفصل 570 من القانون الجنائي، ما يكون معه القرار في مجمله على غير أساس وعرضة للنقض والإبطال. لكن حيث من جهة أولى فإنه ينفي في الملف ما يفيد أداء القسط الجنائي أثناء المرحلة الاستئنافية، فإن المحكمة لما رتبت على عدم قبول الاستئناف، تكون قد طبقت القانون تطبيقاً سليماً وأن ما أثير بهذا الخصوص يبقى غير مقبول. المحكمة النقض لصالحها برفض الطلب ورد مبلغ الضمانة لمودعه بعد استخلاص المصاريف القضائية. وبه صدر القرار وتلي في الجلسة العلنية المنعقدة بالتاريخ المذكور أعلاه بقاعة الجلسات العادية بمحكمة النقض الكائنة بشارع النخيل حي الرياض بالرباط، وكانت الهيئة الحاكمة متركبة من السادة: عبد الحكيم إدريسي قيطون رئيساً والمستشارين: المحفوظ سندالي مقرراً والمصطفى بارز ومحمد الغزاوي وفتيحة غزال، وبحضور المحامي العام السيد محمد جعبة الذي كان يمثل النيابة العامة وعاونه كاتبة الضبط السيدة سعاد عزيزي."""
user_part = template_prompt + '\n' + full_text
prompt = (
    f"<|start_header_id|>system<|end_header_id|>\n{system_prompt}\n"
    f"<|start_header_id|>user<|end_header_id|>\n{user_part}\n"
    f"<|start_header_id|>assistant<|end_header_id|>\n"
)

# Generate the summary using the pipeline
generated_outputs = pipe(
    prompt,
    max_new_tokens=1500,  # Limit the number of tokens in the output
    num_return_sequences=1,  # Return a single sequence
    pad_token_id=pipe.tokenizer.eos_token_id,
    padding=True,
    return_full_text=False,
)

print(generated_outputs[0]["generated_text"])
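The pipeline returns plain text, so the summary still has to be parsed. The sketch below uses only the standard library. It decodes the first JSON object in the output and ignores surrounding text such as a Markdown fence. It assumes no `{` appears in any text before the JSON. A truncated object, for example one cut off when `max_new_tokens` is reached, is reported as an error instead of being returned as a fragment.

```python
import json


def extract_json(generated_text: str) -> dict:
    """Parse the first JSON object in the model output; trailing text is ignored.

    Raises ValueError when no object starts in the text or when the object
    is incomplete or malformed (e.g. generation stopped mid-object).
    """
    start = generated_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in generated text")
    try:
        # raw_decode parses one JSON value and returns (value, end_index),
        # so anything after the closing brace is left untouched.
        obj, _ = json.JSONDecoder().raw_decode(generated_text[start:])
    except json.JSONDecodeError as err:
        raise ValueError(f"JSON starting at offset {start} is incomplete or malformed: {err}") from err
    return obj
```

For example, `summary = extract_json(generated_outputs[0]["generated_text"])` yields a dict. You can save it with `json.dump(summary, f, ensure_ascii=False)` to keep the Arabic values readable in the file.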

Contact

For questions and feedback about the model, please create an issue in the model repository on Hugging Face.

Acknowledgments

This model was fine-tuned using:

🤗 Transformers
DeepSpeed
PEFT (Parameter-Efficient Fine-Tuning)
Weights & Biases for experiment tracking

Thanks also to:

Moroccan Judicial Portal for providing access to legal cases
Meta AI for the base LLaMA model