---
base_model: ibm-granite/granite-3.1-2b-instruct
library_name: transformers
model_name: Stefan-Zweig-Granite-2B
tags:
- generated_from_trainer
- trl
- sft
datasets:
- Chan-Y/Stefan-Zweig-Chat
---

# Model Card for Stefan Zweig Language Model

This model is a fine-tuned version of [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct). It has been trained using [TRL](https://github.com/huggingface/trl).

## Model Details

This model is designed to emulate Stefan Zweig's distinctive writing and conversational style in chat format. The fine-tuning approach is based on the methodology described in the DeepSeek-V3 technical report: a two-stage training process consisting of Supervised Fine-Tuning (SFT) followed by Group Relative Policy Optimization (GRPO).

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained("Chan-Y/Stefan-Zweig-Granite", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Chan-Y/Stefan-Zweig-Granite")

input_text = "As an experienced and famous writer Stefan Zweig, what's your opinion on artificial intelligence?"
inputs = tokenizer(input_text, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=512,
        num_return_sequences=1,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

# Decode the generated text and strip the prompt from the output
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text.split(input_text)[-1])
```

## Training procedure

![train-loss](train-loss.png)

- **Dataset:** Custom synthetic dataset generated using argilla/synthetic-data-generator with Qwen2.5:14b
- **Data Format:** Structured conversations with specific role markers and custom tokens
- **Data Processing:** Implementation of special tokens for style consistency
- **Training Type:** Two-stage training pipeline (see the sketch at the end of this card)
  1. Supervised Fine-Tuning (SFT)
  2. Group Relative Policy Optimization (GRPO)

### Framework versions

- TRL: 0.14.0.dev0
- Transformers: 4.48.1
- Pytorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.21.0
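
## Training pipeline sketch

Below is a minimal sketch of how the two-stage pipeline might be reproduced with TRL's `SFTTrainer` and `GRPOTrainer` (0.14.x APIs, matching the versions listed above). The hyperparameters, output paths, and the `style_reward` function are illustrative assumptions, not the exact configuration used to train this model.

```python
# Sketch of the two-stage pipeline with TRL; all values are illustrative.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer, SFTConfig, SFTTrainer

dataset = load_dataset("Chan-Y/Stefan-Zweig-Chat", split="train")

# Stage 1: Supervised Fine-Tuning on the Zweig chat dataset.
sft_trainer = SFTTrainer(
    model="ibm-granite/granite-3.1-2b-instruct",
    args=SFTConfig(output_dir="zweig-sft", num_train_epochs=1),  # illustrative values
    train_dataset=dataset,
)
sft_trainer.train()

# Stage 2: GRPO on top of the SFT checkpoint.
# Hypothetical reward: favors longer, more elaborate completions as a crude
# proxy for stylistic richness; a real run would use a proper style reward.
def style_reward(completions, **kwargs):
    texts = [c if isinstance(c, str) else c[0]["content"] for c in completions]
    return [min(len(t.split()) / 200.0, 1.0) for t in texts]

grpo_trainer = GRPOTrainer(
    model="zweig-sft",
    reward_funcs=style_reward,
    args=GRPOConfig(output_dir="zweig-grpo", num_generations=4),  # illustrative values
    train_dataset=dataset,  # GRPO expects a "prompt" column
)
grpo_trainer.train()
```

In practice the GRPO stage would run on a prompt-only view of the dataset and score completions with a reward tuned to Zweig's style; the heuristic above is only a placeholder to make the sketch self-contained.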