---
base_model: ibm-granite/granite-3.1-2b-instruct
library_name: transformers
model_name: Stefan-Zweig-Granite-2B
tags:
- generated_from_trainer
- trl
- sft
datasets:
- Chan-Y/Stefan-Zweig-Chat
---

# Model Card for Stefan Zweig Language Model

This model is a fine-tuned version of [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct). It has been trained using [TRL](https://github.com/huggingface/trl).

## Model Details

This model is designed to emulate Stefan Zweig's distinctive writing and conversational style in chat format. The fine-tuning approach is based on the methodology described in the DeepSeek-V3 technical report: a two-stage training process consisting of Supervised Fine-Tuning (SFT) followed by Group Relative Policy Optimization (GRPO).

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained("Chan-Y/Stefan-Zweig-Granite", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Chan-Y/Stefan-Zweig-Granite")

input_text = "As an experienced and famous writer Stefan Zweig, what's your opinion on artificial intelligence?"
inputs = tokenizer(input_text, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=512,
        num_return_sequences=1,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

# Decode the generated text and strip the prompt from the output
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text.split(input_text)[-1])
```

## Training procedure

![train-loss](train-loss.png)

- **Dataset:** Custom synthetic dataset generated using argilla/synthetic-data-generator with Qwen2.5:14b
- **Data Format:** Structured conversations with specific role markers and custom tokens
- **Data Processing:** Implementation of special tokens for style consistency
- **Training Type:** Two-stage training pipeline (see the sketch at the end of this card)
  1. Supervised Fine-Tuning (SFT)
  2. Group Relative Policy Optimization (GRPO)

### Framework versions

- TRL: 0.14.0.dev0
- Transformers: 4.48.1
- Pytorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.21.0
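
## Training pipeline sketch

Below is a minimal sketch of how the two-stage pipeline might be reproduced with TRL's `SFTTrainer` and `GRPOTrainer` (0.14.x APIs, matching the versions listed above). The hyperparameters, output paths, and the `style_reward` function are illustrative assumptions, not the exact configuration used to train this model.

```python
# Sketch of the two-stage pipeline with TRL; all values are illustrative.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer, SFTConfig, SFTTrainer

dataset = load_dataset("Chan-Y/Stefan-Zweig-Chat", split="train")

# Stage 1: Supervised Fine-Tuning on the Zweig chat dataset.
sft_trainer = SFTTrainer(
    model="ibm-granite/granite-3.1-2b-instruct",
    args=SFTConfig(output_dir="zweig-sft", num_train_epochs=1),  # illustrative values
    train_dataset=dataset,
)
sft_trainer.train()

# Stage 2: GRPO on top of the SFT checkpoint.
# Hypothetical reward: favors longer, more elaborate completions as a crude
# proxy for stylistic richness; a real run would use a proper style reward.
def style_reward(completions, **kwargs):
    texts = [c if isinstance(c, str) else c[0]["content"] for c in completions]
    return [min(len(t.split()) / 200.0, 1.0) for t in texts]

grpo_trainer = GRPOTrainer(
    model="zweig-sft",
    reward_funcs=style_reward,
    args=GRPOConfig(output_dir="zweig-grpo", num_generations=4),  # illustrative values
    train_dataset=dataset,  # GRPO expects a "prompt" column
)
grpo_trainer.train()
```

In practice the GRPO stage would run on a prompt-only view of the dataset and score completions with a reward tuned to Zweig's style; the heuristic above is only a placeholder to make the sketch self-contained.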