You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

hades-2-4b

Model Summary

davidkim205/hades-2-4b is a fine-tuned version of unsloth/gemma-4-E4B-it, optimized for internal news article classification and company name extraction tasks.

In internal evaluation, the fine-tuned model scored 70.92% overall against 52.94% for the base model and improved on all three supported tasks. Gains vary by language, task, and subset, and the base model remains ahead on two economic-news subsets: Korean section recall and English category recall.

Model ID: davidkim205/hades-2-4b
Base model: unsloth/gemma-4-E4B-it
Parameter count: 4.5B active parameters; approximately 8B total parameters including embeddings, following the base model configuration
Base maximum context length: 131,072 tokens
Recommended evaluated context length: Up to 16,384 tokens
Tensor type: BF16

Intended Use

davidkim205/hades-2-4b is intended for internal news article classification and company name extraction tasks.

The model is designed to process news article text, typically including fields such as title and body, and perform the following tasks:

Company name extraction (stock): Extract company names from the article.
Section classification (section): Classify the article into one or more predefined news sections.
Category classification (category): Classify the article into exactly one predefined economic or non-economic category label.

Each task is expected to return a JSON-formatted response. The exact schema depends on the task-specific instruction.

Training Details

This model was fine-tuned using ORPO on task-specific preference pairs.

Fine-tuning algorithm: ORPO
Training objective: Preference optimization using chosen and rejected responses
Training format: Task-specific instruction plus news article input, typically including title and body
Output format: JSON-formatted response
Model weights: Merged 16-bit weights
Precision: BF16
Training sequence length: 16,384 tokens
Training tasks: stock_extraction, section_classification, category_classification

Dataset

This model was fine-tuned on a small internal preference dataset for news classification and company name extraction. Labels were produced with assistance from GPT-5.4 as part of an internal annotation workflow. Stock labels were subsequently reviewed by human annotators as part of the quality-control process.

The dataset contains approximately 1.6K news article examples in Korean and English. Each example consists of news article text, typically including title and body fields, and is associated with one of the supported tasks: stock_extraction, section_classification, or category_classification.

The data was formatted as preference pairs for ORPO training, with a chosen response and a rejected response.
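
To make the pair format concrete, the sketch below assembles one preference record. The field names `prompt`, `chosen`, and `rejected`, the helper `build_preference_pair`, and the example article and labels are illustrative assumptions; the actual internal schema is not published.

```python
import json


def build_preference_pair(task_instruction, article, chosen, rejected):
    """Assemble one preference record (hypothetical schema, not the internal one)."""
    # Assumption: the prompt is the task instruction followed by the article as JSON.
    prompt = f"{task_instruction}\n\n{json.dumps(article, ensure_ascii=False)}"
    return {
        "prompt": prompt,
        # Both responses are serialized as JSON strings, matching the JSON output format.
        "chosen": json.dumps(chosen, ensure_ascii=False),
        "rejected": json.dumps(rejected, ensure_ascii=False),
    }


pair = build_preference_pair(
    task_instruction="Classify the article into exactly one category. Return JSON only.",
    article={"title": "Example title", "body": "Example body."},
    chosen={"category": "economic", "confidence": 0.9},
    rejected={"category": "non_economic", "confidence": 0.9},
)
print(sorted(pair))
```

Each record pairs the same prompt with a preferred and a dispreferred response, which is the input shape that preference optimization expects.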

The dataset is not publicly released because it contains internal annotation data.

Evaluation

The model was evaluated on an internal held-out news article evaluation set containing 1,207 task-level examples across Korean and English data.

The evaluation set was constructed from 400 news articles from April and May 2026: 100 Korean and 100 English articles per month. Because each article can be evaluated across multiple supported tasks, this resulted in 1,207 task-level evaluation examples.

The evaluation covers the three supported tasks: stock, section, and category.

Evaluation Method

For category, the model returns one category label, and accuracy is calculated by exact match between the predicted top-1 label and the reference top-1 label.

For section, the model may return multiple section objects, but this evaluation uses top-1 label accuracy: the predicted top-1 section label must exactly match the reference top-1 section label.

For stock, company objects with confidence < 0.7 are removed from the prediction output. Exact-match accuracy is then calculated by comparing the normalized reference and prediction company sets after filtering. Entity order is not considered in the exact-match criterion.
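
The scoring rules above can be sketched as follows. This is a minimal illustration, not the internal evaluation code: the field names `name` and `confidence`, the normalization rule (trim, collapse whitespace, ignore case), and the treatment of a missing confidence as 0.0 are all assumptions.

```python
def normalize_name(name):
    """Assumed normalization: trim, collapse inner whitespace, ignore case."""
    return " ".join(name.split()).casefold()


def filter_companies(companies, threshold=0.7):
    """Drop company objects below the confidence threshold; return a normalized set."""
    return {
        normalize_name(company["name"])
        for company in companies
        # Assumption: an object without a confidence value is treated as 0.0.
        if company.get("confidence", 0.0) >= threshold
    }


def stock_exact_match(predicted, reference_names, threshold=0.7):
    """True when the filtered prediction set equals the reference set (order ignored)."""
    reference = {normalize_name(name) for name in reference_names}
    return filter_companies(predicted, threshold) == reference


def top1_match(predicted_label, reference_label):
    """Exact match between top-1 labels, as used for category and section."""
    return predicted_label == reference_label


predicted = [
    {"name": "Acme Corp", "confidence": 0.95},
    {"name": "  Globex ", "confidence": 0.40},  # removed: confidence < 0.7
]
print(stock_exact_match(predicted, ["acme corp"]))  # True
print(top1_match("economic", "economic_crime"))     # False
```

Comparing sets rather than lists is what makes the stock metric independent of entity order.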

Overall Results

| Task / Split | `davidkim205/hades-2-4b` | `unsloth/gemma-4-E4B-it` | Improvement |
| --- | --- | --- | --- |
| total | 70.92% (856/1207) | 52.94% (639/1207) | +17.98 pp |
| ko | 63.76% (387/607) | 40.69% (247/607) | +23.07 pp |
| us | 78.17% (469/600) | 65.33% (392/600) | +12.84 pp |
| section | 69.50% (278/400) | 53.50% (214/400) | +16.00 pp |
| category | 70.75% (283/400) | 50.50% (202/400) | +20.25 pp |
| stock | 72.48% (295/407) | 54.79% (223/407) | +17.69 pp |

Economic News Subset

The economic news subset is calculated by aggregating the April 2026 and May 2026 economic-news rows for each language and task. For section and category, it reports recall on examples whose reference labels indicate economic relevance. For stock, it reports exact-match company name extraction accuracy on economic-news examples.
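
As a rough illustration of the recall metric for the category task, the sketch below restricts scoring to examples whose reference label is economic. It assumes that the three `economic*` category labels mark economic relevance and that a prediction counts as a hit when it also falls in that group; the exact hit criterion used internally is not stated, so treat both as assumptions.

```python
# Assumption: these category labels indicate economic relevance.
ECONOMIC_LABELS = {"economic_incident", "economic_crime", "economic"}


def economic_recall(pairs):
    """Recall over (reference_label, predicted_label) pairs with an economic reference."""
    relevant = [(ref, pred) for ref, pred in pairs if ref in ECONOMIC_LABELS]
    if not relevant:
        return None  # recall is undefined without economic reference examples
    # Assumed hit criterion: the prediction is also an economic label.
    hits = sum(pred in ECONOMIC_LABELS for _, pred in relevant)
    return hits / len(relevant)


pairs = [
    ("economic", "economic"),
    ("economic_crime", "non_economic_crime"),
    ("non_economic", "economic"),  # ignored: reference is not economic
    ("economic_incident", "economic"),
]
print(f"{economic_recall(pairs):.2%}")  # 66.67%
```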

| Task | Split | Metric | `davidkim205/hades-2-4b` | `unsloth/gemma-4-E4B-it` | Improvement |
| --- | --- | --- | --- | --- | --- |
| Section | Korean | Economic recall | 33.33% (13/39) | 35.90% (14/39) | -2.56 pp |
| Section | English | Economic recall | 70.00% (42/60) | 63.33% (38/60) | +6.67 pp |
| Category | Korean | Economic recall | 91.18% (93/102) | 58.82% (60/102) | +32.35 pp |
| Category | English | Economic recall | 90.38% (94/104) | 94.23% (98/104) | -3.85 pp |
| Stock | Korean | Exact-match accuracy | 74.12% (63/85) | 55.29% (47/85) | +18.82 pp |
| Stock | English | Exact-match accuracy | 67.04% (120/179) | 45.25% (81/179) | +21.79 pp |

Usage

Inference Example

The following code is an illustrative example. In production, the model should be prompted with the task-specific instruction defined for each supported task.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "davidkim205/hades-2-4b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

task_instruction = """
Read the following news article and classify it into exactly one of the following categories:

- economic_incident
- economic_crime
- economic
- non_economic_incident
- non_economic_crime
- non_economic

Return the result as JSON only, with the following format:
{"category": "economic", "confidence": 0.0}
"""

article = {
    "title": "금융위, ‘회계기준 위반’ 국보에 과징금 6500만원",
    "body": "금융위원회가 회계처리기준을 위반해 재무제표를 허위 작성·공시한 국보에 과징금을 부과했다.",
}

messages = [
    {
        "role": "user",
        "content": f"""{task_instruction}

{article}
""",
    }
]

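# Render the chat template and tokenize in one step, so special tokens
# emitted by the template are not added a second time by the tokenizer.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding keeps the JSON output deterministic.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
)

# Decode only the newly generated tokens, skipping the prompt.
generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))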

vLLM Serving Example

The model can be served with vLLM using the following command:

vllm serve davidkim205/hades-2-4b \
  --dtype bfloat16 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.90
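
Once the server is running, it can be queried through vLLM's OpenAI-compatible chat completions endpoint. The sketch below uses only the Python standard library and assumes the server listens on vLLM's default port 8000 on localhost. The helper names `build_chat_payload` and `classify`, the placeholder instruction and article, and the `temperature` and `max_tokens` values are illustrative choices, not settings from the model authors.

```python
import json
import urllib.request


def build_chat_payload(task_instruction, article, model="davidkim205/hades-2-4b"):
    """Build an OpenAI-style chat completion request body."""
    # Mirrors the inference example: instruction, blank line, then the article.
    user_content = f"{task_instruction}\n\n{article}\n"
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "temperature": 0.0,  # assumption: deterministic output is preferred
        "max_tokens": 512,
    }


def classify(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload(
    "Classify the article into exactly one category. Return JSON only.",
    {"title": "Example title", "body": "Example body."},
)
# reply = classify(payload)  # requires the vLLM server above to be running
```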

Limitations

This model was fine-tuned on a small internal dataset for news classification and company name extraction. Performance may vary on article formats, languages, label schemas, or domains that differ from the fine-tuning and evaluation data.

Although the base model configuration supports up to 131,072 tokens, this fine-tuned model was trained with a maximum sequence length of 16,384 tokens. Performance beyond 16,384 tokens has not been fully evaluated and should be validated before production use.
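
One way to stay inside the evaluated range is to check the token budget before each request. The helper below is a hypothetical sketch: it assumes the 16,384-token limit should cover the prompt and the generated tokens together, as it does for a training sequence.

```python
EVALUATED_CONTEXT_TOKENS = 16_384


def fits_evaluated_context(prompt_token_count, max_new_tokens,
                           limit=EVALUATED_CONTEXT_TOKENS):
    """True when prompt plus generation budget stays within the evaluated length."""
    return prompt_token_count + max_new_tokens <= limit


# With a tokenized prompt, prompt_token_count would be inputs["input_ids"].shape[1].
print(fits_evaluated_context(15_000, 512))  # True
print(fits_evaluated_context(16_000, 512))  # False
```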

The model is prompted to return JSON-formatted responses, but valid JSON output is not guaranteed in all cases. Downstream systems should apply JSON validation and error handling.
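
A minimal validation sketch for the category task is shown below. It assumes the response schema from the inference example (`category` and `confidence`), assumes confidence lies between 0 and 1, and tolerates stray text around the JSON object; `parse_category_response` is a hypothetical helper, and the other tasks would need their own schema checks.

```python
import json

ALLOWED_CATEGORIES = {
    "economic_incident", "economic_crime", "economic",
    "non_economic_incident", "non_economic_crime", "non_economic",
}


def parse_category_response(raw_text):
    """Return (result, error); exactly one of the two is None."""
    text = raw_text.strip()
    # Assumption: keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None, "no JSON object found"
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    category = data.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        return None, f"unexpected category: {category!r}"
    confidence = data.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        return None, "confidence must be a number between 0 and 1"
    return {"category": category, "confidence": float(confidence)}, None


print(parse_category_response('{"category": "economic_crime", "confidence": 0.92}'))
print(parse_category_response("Sorry, I cannot classify this article."))
```

Returning an error string instead of raising lets a batch pipeline log the failure and retry or skip the article without interrupting the run.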

Users should evaluate the model on their own target data before using it in production.