hades-2-4b

Model Summary

davidkim205/hades-2-4b is a fine-tuned version of unsloth/gemma-4-E4B-it, optimized for internal news article classification and company name extraction tasks.

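In internal evaluation, this fine-tuned model outperformed the base model overall (70.92% vs. 52.94% across 1,207 task-level examples) and on each of the three supported tasks. Results vary by language, task, and subset, and two economic-news subsets (Korean section and English category) show small regressions.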

  • Model ID: davidkim205/hades-2-4b
  • Base model: unsloth/gemma-4-E4B-it
  • Parameter count: 4.5B active parameters; approximately 8B total parameters including embeddings, following the base model configuration
  • Base maximum context length: 131,072 tokens
  • Recommended context length: Up to 16,384 tokens (the training sequence length; longer inputs have not been fully evaluated)
  • Tensor type: BF16

Intended Use

davidkim205/hades-2-4b is intended for internal news article classification and company name extraction tasks.

The model is designed to process news article text, typically including fields such as title and body, and perform the following tasks:

  • Company name extraction (stock): Extract company names from the article.
  • Section classification (section): Classify the article into one or more predefined news sections.
  • Category classification (category): Classify the article into exactly one predefined economic or non-economic category label.

Each task is expected to return a JSON-formatted response. The exact schema depends on the task-specific instruction.

Training Details

This model was fine-tuned using ORPO on task-specific preference pairs.

  • Fine-tuning algorithm: ORPO
  • Training objective: Preference optimization using chosen and rejected responses
  • Training format: Task-specific instruction plus news article input, typically including title and body
  • Output format: JSON-formatted response
  • Model weights: Merged 16-bit weights
  • Precision: BF16
  • Training sequence length: 16,384 tokens
  • Training tasks: stock_extraction, section_classification, category_classification

Dataset

This model was fine-tuned on a small internal preference dataset for news classification and company name extraction. Labels were produced with assistance from GPT-5.4 as part of an internal annotation workflow. Stock labels were subsequently reviewed by human annotators as part of the quality-control process.

The dataset contains approximately 1.6K news article examples in Korean and English. Each example consists of news article text, typically including title and body fields, and is associated with one of the supported tasks: stock_extraction, section_classification, or category_classification.

The data was formatted as preference pairs for ORPO training, with a chosen response and a rejected response.
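
As a rough illustration of what one such preference pair might look like, the sketch below packs an instruction, an article, and two candidate answers into a single record. The field names `prompt`, `chosen`, and `rejected`, the helper `build_preference_pair`, and the sample article are assumptions made for illustration; the actual internal schema is not published.

```python
import json


def build_preference_pair(instruction: str, article: dict, chosen: dict, rejected: dict) -> dict:
    """Pack one example as a prompt plus chosen/rejected JSON strings.

    Hypothetical layout: the field names are illustrative, not the
    schema of the internal dataset.
    """
    prompt = f"{instruction}\n\n{json.dumps(article, ensure_ascii=False)}"
    return {
        "prompt": prompt,
        # The preferred (reference-quality) response.
        "chosen": json.dumps(chosen, ensure_ascii=False),
        # A dispreferred response, e.g. a wrong label.
        "rejected": json.dumps(rejected, ensure_ascii=False),
    }


pair = build_preference_pair(
    instruction="Classify the article into exactly one category. Return JSON only.",
    article={"title": "Example title", "body": "Example body."},
    chosen={"category": "economic", "confidence": 0.9},
    rejected={"category": "non_economic", "confidence": 0.9},
)
print(pair["chosen"])
```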

The dataset is not publicly released because it contains internal annotation data.

Evaluation

The model was evaluated on an internal held-out news article evaluation set containing 1,207 task-level examples across Korean and English data.

The evaluation set was constructed from 400 news articles from April and May 2026: 100 Korean and 100 English articles per month. Because each article can be evaluated across multiple supported tasks, this resulted in 1,207 task-level evaluation examples.

The evaluation covers the three supported tasks: stock, section, and category.

Evaluation Method

For category, the model returns one category label, and accuracy is calculated by exact match between the predicted top-1 label and the reference top-1 label.

For section, the model may return multiple section objects, but this evaluation uses top-1 label accuracy: the predicted top-1 section label must exactly match the reference top-1 section label.
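
The top-1 accuracy described for category and section could be computed along the following lines. This is a minimal sketch, not the evaluation code itself: it assumes section predictions are lists of objects with hypothetical `label` and `confidence` fields, and that "top-1" means the object with the highest confidence. The label names in the example are placeholders.

```python
def top1_label(prediction):
    """Return the top-1 label of a prediction.

    Assumption: a prediction is either a single label string (category) or a
    list of {"label": ..., "confidence": ...} objects (section), where the
    top-1 entry is the one with the highest confidence.
    """
    if isinstance(prediction, str):
        return prediction
    if not prediction:
        return None
    return max(prediction, key=lambda item: item["confidence"])["label"]


def top1_accuracy(predictions, references):
    """Fraction of examples whose predicted top-1 label equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(top1_label(p) == top1_label(r) for p, r in zip(predictions, references))
    return hits / len(references)


category_preds = ["economic", "non_economic"]
category_refs = ["economic", "economic"]
print(top1_accuracy(category_preds, category_refs))  # 0.5
```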

For stock, company objects with confidence < 0.7 are removed from the prediction output. Exact-match accuracy is then calculated by comparing the normalized reference and prediction company sets after filtering. Entity order is not considered in the exact-match criterion.
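
The stock criterion can be sketched as a filter followed by a set comparison. The 0.7 threshold comes from the description above; the `name` and `confidence` field names and the normalization rule (Unicode NFKC, collapsed whitespace, case folding) are assumptions, since the exact schema and normalization are not specified here.

```python
import unicodedata

CONFIDENCE_THRESHOLD = 0.7  # objects with confidence below this are removed


def normalize_name(name: str) -> str:
    """Assumed normalization: NFKC, collapsed whitespace, case-folded."""
    return " ".join(unicodedata.normalize("NFKC", name).split()).casefold()


def filtered_company_set(companies, threshold: float = CONFIDENCE_THRESHOLD) -> set:
    """Keep companies at or above the threshold and return normalized names as a set."""
    return {
        normalize_name(company["name"])
        for company in companies
        if company["confidence"] >= threshold
    }


def stock_exact_match(predicted_companies, reference_names) -> bool:
    """True when the filtered prediction set equals the reference set (order ignored)."""
    reference = {normalize_name(name) for name in reference_names}
    return filtered_company_set(predicted_companies) == reference


predicted = [
    {"name": "Samsung Electronics", "confidence": 0.95},
    {"name": "SK hynix", "confidence": 0.5},  # dropped: below 0.7
]
print(stock_exact_match(predicted, ["samsung electronics"]))  # True
```

Comparing sets rather than lists is what makes the check independent of entity order.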

Overall Results

| Task / Split | davidkim205/hades-2-4b | unsloth/gemma-4-E4B-it | Improvement |
|--------------|------------------------|------------------------|-------------|
| total        | 70.92% (856/1207)      | 52.94% (639/1207)      | +17.98 pp   |
| ko           | 63.76% (387/607)       | 40.69% (247/607)       | +23.07 pp   |
| us           | 78.17% (469/600)       | 65.33% (392/600)       | +12.84 pp   |
| section      | 69.50% (278/400)       | 53.50% (214/400)       | +16.00 pp   |
| category     | 70.75% (283/400)       | 50.50% (202/400)       | +20.25 pp   |
| stock        | 72.48% (295/407)       | 54.79% (223/407)       | +17.69 pp   |
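
For readers who want to re-derive the figures, the percentage-point improvement follows directly from the reported counts. The helper names below are illustrative. Note that computing from raw counts can differ by 0.01 pp from subtracting the already-rounded percentages (for the ko row, 23.06 from counts versus 23.07 from rounded values).

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage."""
    return 100.0 * correct / total


def improvement_pp(finetuned_correct: int, base_correct: int, total: int) -> float:
    """Difference between two accuracies in percentage points."""
    return accuracy_pct(finetuned_correct, total) - accuracy_pct(base_correct, total)


# "total" row: 856 vs. 639 correct out of 1,207 task-level examples.
print(f"{accuracy_pct(856, 1207):.2f}%")           # 70.92%
print(f"{improvement_pp(856, 639, 1207):+.2f} pp")  # +17.98 pp
```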

Economic News Subset

The economic news subset is calculated by aggregating the April 2026 and May 2026 economic-news rows for each language and task. For section and category, it reports recall on examples whose reference labels indicate economic relevance. For stock, it reports exact-match company name extraction accuracy on economic-news examples.

| Task     | Split   | Metric               | davidkim205/hades-2-4b | unsloth/gemma-4-E4B-it | Improvement |
|----------|---------|----------------------|------------------------|------------------------|-------------|
| Section  | Korean  | Economic recall      | 33.33% (13/39)         | 35.90% (14/39)         | -2.56 pp    |
| Section  | English | Economic recall      | 70.00% (42/60)         | 63.33% (38/60)         | +6.67 pp    |
| Category | Korean  | Economic recall      | 91.18% (93/102)        | 58.82% (60/102)        | +32.35 pp   |
| Category | English | Economic recall      | 90.38% (94/104)        | 94.23% (98/104)        | -3.85 pp    |
| Stock    | Korean  | Exact-match accuracy | 74.12% (63/85)         | 55.29% (47/85)         | +18.82 pp   |
| Stock    | English | Exact-match accuracy | 67.04% (120/179)       | 45.25% (81/179)        | +21.79 pp   |

Usage

Inference Example

The following code is an illustrative example. In production, the model should be prompted with the task-specific instruction defined for each supported task.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "davidkim205/hades-2-4b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

task_instruction = """
Read the following news article and classify it into exactly one of the following categories:

- economic_incident
- economic_crime
- economic
- non_economic_incident
- non_economic_crime
- non_economic

Return the result as JSON only, with the following format:
{"category": "economic", "confidence": 0.0}
"""

article = {
    "title": "금융위, ‘회계기준 위반’ 국보에 과징금 6500만원",
    "body": "금융위원회가 회계처리기준을 위반해 재무제표를 허위 작성·공시한 국보에 과징금을 부과했다.",
}

messages = [
    {
        "role": "user",
        "content": f"""{task_instruction}

{article}
""",
    }
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

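# The chat template already inserts special tokens (such as BOS),
# so do not let the tokenizer add them a second time.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)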

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
)

# Decode only the newly generated tokens, skipping the prompt tokens.
generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))

vLLM Serving Example

The model can be served with vLLM using the following command:

vllm serve davidkim205/hades-2-4b \
  --dtype bfloat16 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.90
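
Once the server is running, it exposes an OpenAI-compatible HTTP API. The sketch below shows one way a client might call the chat completions endpoint using only the Python standard library. It assumes the server listens on vLLM's default `http://localhost:8000`; the helper names `build_chat_request` and `classify` are illustrative, and the prompt layout simply mirrors the inference example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed: vLLM's default host and port


def build_chat_request(task_instruction: str, article: dict) -> dict:
    """Build the JSON payload for the OpenAI-compatible chat completions API."""
    return {
        "model": "davidkim205/hades-2-4b",
        "messages": [
            {"role": "user", "content": f"{task_instruction}\n\n{article}"}
        ],
        "temperature": 0.0,  # greedy decoding, matching do_sample=False
        "max_tokens": 512,
    }


def classify(task_instruction: str, article: dict,
             base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send one request to the server and return the raw model output text."""
    payload = json.dumps(build_chat_request(task_instruction, article)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `classify(task_instruction, article)` requires the server to be up; the returned string should still be validated as JSON before use.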

Limitations

This model was fine-tuned on a small internal dataset for news classification and company name extraction. Performance may vary on article formats, languages, label schemas, or domains that differ from the fine-tuning and evaluation data.

Although the base model configuration supports up to 131,072 tokens, this fine-tuned model was trained with a maximum sequence length of 16,384 tokens. Performance beyond 16,384 tokens has not been fully evaluated and should be validated before production use.
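
A simple guard can flag inputs that would exceed the trained length before they reach the model. This is a minimal sketch under the assumption that the prompt and the generated tokens together should stay within 16,384 tokens; the function name and the default of 512 new tokens are illustrative.

```python
TRAINED_MAX_TOKENS = 16_384  # training sequence length stated in this card


def fits_trained_context(prompt_token_count: int,
                         max_new_tokens: int = 512,
                         limit: int = TRAINED_MAX_TOKENS) -> bool:
    """True when prompt plus generation budget stays within the trained length."""
    return prompt_token_count + max_new_tokens <= limit


print(fits_trained_context(15_000))  # True
print(fits_trained_context(16_000))  # False
```

With the Transformers example above, `prompt_token_count` could be taken from `inputs["input_ids"].shape[1]`.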

The model is prompted to return JSON-formatted responses, but valid JSON output is not guaranteed in all cases. Downstream systems should apply JSON validation and error handling.
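
One possible shape for that validation, shown here for the category task, is sketched below. The allowed labels and the `category`/`confidence` fields are taken from the illustrative prompt in the Usage section; the assumption that confidence lies in [0, 1] and the function name `parse_category_response` are not confirmed by this card.

```python
import json

# Labels from the illustrative category prompt in the Usage section.
ALLOWED_CATEGORIES = {
    "economic_incident", "economic_crime", "economic",
    "non_economic_incident", "non_economic_crime", "non_economic",
}


def parse_category_response(text: str):
    """Return a validated {"category", "confidence"} dict, or None if invalid."""
    # The model may wrap the JSON in extra text, so take the outermost braces.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    category = data.get("category")
    confidence = data.get("confidence")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0.0 <= confidence <= 1.0:  # assumed range
        return None
    return {"category": category, "confidence": float(confidence)}


print(parse_category_response('{"category": "economic", "confidence": 0.82}'))
```

Returning `None` instead of raising lets the caller decide whether to retry, fall back to a default label, or log the raw output.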

Users should evaluate the model on their own target data before using it in production.
