Instructions to use asphyxiation112/gemma4-it-kaltsit-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use asphyxiation112/gemma4-it-kaltsit-lora with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("./models/google/gemma-4-E4B-it")
model = PeftModel.from_pretrained(base_model, "asphyxiation112/gemma4-it-kaltsit-lora")

Transformers

How to use asphyxiation112/gemma4-it-kaltsit-lora with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="asphyxiation112/gemma4-it-kaltsit-lora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("asphyxiation112/gemma4-it-kaltsit-lora", dtype="auto")

Local Apps

vLLM

How to use asphyxiation112/gemma4-it-kaltsit-lora with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "asphyxiation112/gemma4-it-kaltsit-lora"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "asphyxiation112/gemma4-it-kaltsit-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/asphyxiation112/gemma4-it-kaltsit-lora

SGLang

How to use asphyxiation112/gemma4-it-kaltsit-lora with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "asphyxiation112/gemma4-it-kaltsit-lora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "asphyxiation112/gemma4-it-kaltsit-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "asphyxiation112/gemma4-it-kaltsit-lora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "asphyxiation112/gemma4-it-kaltsit-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


Gemma 4 LoRA Fine-Tuning for Kal'tsit-Style Dialogue Generation

Project Overview

This project fine-tunes google/gemma-4-E4B-it with LoRA to generate Chinese dialogue in the style of Kal'tsit from Arknights. The goal is not to train a full model from scratch, but to adapt a large instruction-tuned base model with a lightweight PEFT adapter so that it better follows a specific character voice: calm, restrained, analytical, and context-aware.

The project covers the full workflow: story data collection, text cleaning, character-specific dataset construction, prompt design, SFT formatting, LoRA training, validation monitoring, test generation, and optional adapter merging. The main training workflow is documented in gemma4_emotion_lora_arknights.ipynb.

Data Collection and Processing

The raw data was collected from Arknights story pages through ASTR story reader URLs. The URL list is stored in urls.txt and contains 263 story links. The data collection script is scripts/download_arknights_story.py.

The collection pipeline works as follows:

Parse the language code and story file path from each ASTR page URL.
Convert the page route into a raw JSON URL from the ArknightsStoryJson repository, using the pattern zh_CN/gamedata/story/{story_path}.json.
Download each story JSON with requests.
Read storyList and extract attributes.name as the speaker and attributes.content as the dialogue text.
Mark lines without a speaker as narration, and preserve Sticker text as on-screen text.
Clean color tags, HTML-like tags, escaped newlines, and redundant whitespace.
Save each story as both readable .txt and structured .jsonl files.

Each structured line uses the following format:

{"speaker": "凯尔希", "text": "dialogue text"}
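The extraction and cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/download_arknights_story.py: the helper names `clean_text` and `parse_story`, the `NARRATION` label, and the exact regular expressions are assumptions, and the Sticker handling is left out.

```python
import json
import re

NARRATION = "narration"  # hypothetical label for lines without a speaker

TAG_RE = re.compile(r"<[^>]+>")  # color tags and other HTML-like tags
WS_RE = re.compile(r"\s+")       # runs of redundant whitespace


def clean_text(raw: str) -> str:
    """Strip tags, turn escaped newlines into spaces, collapse whitespace."""
    text = TAG_RE.sub("", raw)
    text = text.replace("\\n", " ")  # a literal backslash followed by "n"
    return WS_RE.sub(" ", text).strip()


def parse_story(story: dict) -> list[dict]:
    """Turn one story JSON object into {"speaker", "text"} records."""
    records = []
    for item in story.get("storyList", []):
        attrs = item.get("attributes") or {}
        text = clean_text(attrs.get("content") or "")
        if not text:
            continue  # entries without dialogue text are skipped
        speaker = (attrs.get("name") or "").strip() or NARRATION
        records.append({"speaker": speaker, "text": text})
    return records


# Tiny made-up story object, only to show the record shape.
sample = {"storyList": [
    {"attributes": {"name": "凯尔希",
                    "content": "<color=#ffffff>dialogue</color>\\ntext"}},
    {"attributes": {"content": "  A quiet   corridor. "}},
    {"attributes": {}},
]}

lines = [json.dumps(r, ensure_ascii=False) for r in parse_story(sample)]
print("\n".join(lines))
```

Passing `ensure_ascii=False` keeps speaker names such as 凯尔希 readable in the .jsonl output instead of escaping them.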

The character dataset is then built with scripts/build_character_dataset.py:

Load all .jsonl files from the result folder in sorted order.
Add source file names and line indices to every record for traceability.
Select only records whose speaker exactly matches 凯尔希.
Filter out responses that are very short or very long. The default range that is kept is 2 to 300 Chinese characters.
Use the previous 3 story lines as the dialogue context for each target response.
Convert each sample into an instruction/input/output SFT record.
Shuffle with seed 42 and split the dataset into train, validation, and test sets with an 80/10/10 ratio.
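A minimal sketch of the selection, context, and split steps above, under stated assumptions: `build_samples` and `split_dataset` are hypothetical names, the context is taken from within a single story's ordered records, and the instruction text is borrowed from the user prompt template. The actual logic in scripts/build_character_dataset.py may differ.

```python
import random


def build_samples(records, character="凯尔希", context_size=3,
                  min_len=2, max_len=300):
    """Build instruction/input/output samples from one story's ordered records."""
    samples = []
    for i, rec in enumerate(records):
        if rec["speaker"] != character:            # exact speaker match only
            continue
        if not (min_len <= len(rec["text"]) <= max_len):
            continue                               # drop very short or very long replies
        context = records[max(0, i - context_size):i]
        samples.append({
            "instruction": "请根据上下文，以凯尔希的说话风格进行回复。",
            "input": "\n".join(f"[{c['speaker']}]：{c['text']}" for c in context),
            "output": rec["text"],
        })
    return samples


def split_dataset(samples, seed=42, train_pct=80, val_pct=10):
    """Shuffle with a fixed seed, then cut into train/validation/test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# With 3350 samples this reproduces the reported 2680 / 335 / 335 split sizes.
train, val, test = split_dataset(list(range(3350)))
print(len(train), len(val), len(test))
```

Integer arithmetic is used for the split points so the sizes do not depend on floating-point rounding.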

Final dataset size:

Split	Samples
Train	2680
Validation	335
Test	335
Total	3350

Prompt Design

Each training sample is converted into a chat-style prompt and assistant completion. The notebook uses the official tokenizer chat template to format the data for Gemma.

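System prompt (used verbatim in Chinese; it tells the model to role-play Kal'tsit from Arknights, output only her reply with no explanation and no "凯尔希：" prefix, keep a calm, restrained, rational tone with sentences that may run long, and stay close to the context rather than mechanically repeating existing lines):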

你正在扮演《明日方舟》中的凯尔希。
请根据用户给出的上下文进行回复。

要求：
1. 只输出凯尔希的回复内容。
2. 不要解释你为什么这样回复。
3. 不要输出“凯尔希：”这个角色名前缀。
4. 语气应冷静、克制、理性，句子可以偏长。
5. 回复应尽量贴合上下文，而不是机械复述已有台词。

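User prompt template (the Chinese text asks for a reply in Kal'tsit's speaking style based on the context, followed by the "上下文" (context) block):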

请根据上下文，以凯尔希的说话风格进行回复。

上下文：
[Character A]：previous line
[Character B]：previous line
[凯尔希]：previous line

The assistant completion is the target Kal'tsit response. In other words, the model is trained to generate the next character-style reply from context, rather than to classify text into labels.
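One way to picture the conversion from an instruction/input/output record to a chat-style prompt and completion is the sketch below. The function name `to_prompt_completion` is hypothetical, and the conversational prompt/completion layout shown is an assumption about how the notebook structures its examples; only the prompt texts come from this page.

```python
SYSTEM_PROMPT = """你正在扮演《明日方舟》中的凯尔希。
请根据用户给出的上下文进行回复。

要求：
1. 只输出凯尔希的回复内容。
2. 不要解释你为什么这样回复。
3. 不要输出“凯尔希：”这个角色名前缀。
4. 语气应冷静、克制、理性，句子可以偏长。
5. 回复应尽量贴合上下文，而不是机械复述已有台词。"""


def to_prompt_completion(sample: dict) -> dict:
    """Map one SFT record to system/user prompt messages plus an assistant completion."""
    user_text = f"{sample['instruction']}\n\n上下文：\n{sample['input']}"
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
        "completion": [
            {"role": "assistant", "content": sample["output"]},
        ],
    }


example = to_prompt_completion({
    "instruction": "请根据上下文，以凯尔希的说话风格进行回复。",
    "input": "[Character A]：previous line\n[Character B]：previous line",
    "output": "target reply",
})
print(example["prompt"][1]["content"])

# With a loaded tokenizer, the prompt could then be rendered with
# tokenizer.apply_chat_template(example["prompt"], tokenize=False,
#                               add_generation_prompt=True)
```

Keeping the completion separate from the prompt is what allows the loss to be computed on the assistant reply only.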

Fine-Tuning Setup

The base model is google/gemma-4-E4B-it, downloaded from ModelScope and loaded from a local directory. Training uses single-GPU BF16 LoRA fine-tuning with PEFT and TRL SFTTrainer. The notebook loads the model with AutoModelForCausalLM and saves the final adapter and tokenizer.

Core training configuration:

Item	Value
Base model	`google/gemma-4-E4B-it`
Fine-tuning method	LoRA / PEFT
Trainer	TRL `SFTTrainer`
LoRA rank	8
LoRA alpha	16
LoRA dropout	0.05
Target modules	`all-linear`
Trainable parameters	25,249,792
Total parameters during PEFT training	7,966,350,624
Trainable ratio	0.3170%
Epochs	2
Global steps	336
Per-device batch size	2
Gradient accumulation	8
Learning rate	5e-5
Scheduler	cosine
Warmup steps	10
Max sequence length	512
Loss mode	completion-only loss
Precision	BF16
Evaluation interval	every 50 steps
Checkpoint selection	best `eval_loss`
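The derived values in the table can be cross-checked with a few lines of arithmetic. The sketch below keeps the LoRA settings in a plain dictionary so it runs without PEFT installed; the variable names are illustrative, and the step count assumes the last partial gradient-accumulation step of each epoch is rounded up, which is consistent with the reported 336 steps.

```python
import math

# LoRA settings from the table; with PEFT installed these could be passed
# as LoraConfig(**lora_kwargs).
lora_kwargs = dict(r=8, lora_alpha=16, lora_dropout=0.05,
                   target_modules="all-linear")

train_samples = 2680
per_device_batch = 2
grad_accum = 8
epochs = 2

effective_batch = per_device_batch * grad_accum              # single GPU
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

trainable, total = 25_249_792, 7_966_350_624
trainable_pct = 100 * trainable / total

# Standard LoRA scaling factor applied to the low-rank update: alpha / r.
lora_scale = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

print(effective_batch, steps_per_epoch, total_steps,
      f"{trainable_pct:.4f}%", lora_scale)
```

An effective batch of 16 sequences over 2680 training samples gives 168 optimizer steps per epoch, so two epochs end at step 336.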

Training workflow:

Load local JSONL files into a DatasetDict.
Convert instruction/input/output records into chat-style prompt/completion examples.
Load the Gemma 4 tokenizer and base model.
Configure LoRA and verify that trainable parameters are correctly attached.
Run supervised fine-tuning for 2 epochs with SFTTrainer.
Evaluate on the validation set every 50 steps and save checkpoints.
Generate responses for all 335 test examples.
Save the LoRA adapter, tokenizer, training metrics, and test generations.

Results

Training completed successfully at global_step=336, and the best checkpoint was checkpoint-336, which is also the final step. Total training time was about 1993 seconds, or 33.2 minutes.

Validation metrics:

Step	Eval loss	Eval token accuracy
50	3.1132	0.4541
100	2.9867	0.4716
150	2.9440	0.4739
200	2.9127	0.4758
250	2.8917	0.4769
300	2.8853	0.4801
336	2.8843	0.4788
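The checkpoint choice can be reproduced from the table itself. The short sketch below selects the step with the lowest eval loss and, assuming eval loss is the mean per-token cross-entropy in nats, converts it to a perplexity; the variable names are illustrative.

```python
import math

# (step, eval_loss, eval_token_accuracy) copied from the table above.
eval_log = [
    (50, 3.1132, 0.4541),
    (100, 2.9867, 0.4716),
    (150, 2.9440, 0.4739),
    (200, 2.9127, 0.4758),
    (250, 2.8917, 0.4769),
    (300, 2.8853, 0.4801),
    (336, 2.8843, 0.4788),
]

best_by_loss = min(eval_log, key=lambda row: row[1])
best_by_acc = max(eval_log, key=lambda row: row[2])
perplexity = math.exp(best_by_loss[1])

print("lowest eval loss at step", best_by_loss[0])
print("highest token accuracy at step", best_by_acc[0])
print(f"perplexity at best loss: {perplexity:.2f}")
```

Note that the two criteria disagree slightly: eval loss is lowest at step 336, while token accuracy peaks at step 300. Selecting on eval loss therefore keeps the final checkpoint.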

The final test generation file is kaltsit_test_generations.csv, with 335 generated responses. There were no empty outputs and no generated responses with the unwanted 凯尔希： role prefix. The average target response length was 23.42 Chinese characters, while the average generated response length was 16.53 characters.

Qualitatively, the model learned part of the target style, especially restrained phrasing, concise responses, and role-prefix control. However, the test set also shows limitations. The response ...... appears 38 times (about 11% of the 335 test outputs), and 20 generated responses (about 6%) are 4 characters or shorter. This suggests that the LoRA adapter is valid and learned useful stylistic behavior, but it is not yet a high-quality story continuation model.
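Checks of this kind are easy to script. The sketch below is not the notebook's evaluation code: `summarize_generations` is a hypothetical helper that works on a plain list of generated strings, since the column layout of kaltsit_test_generations.csv is not described here.

```python
def summarize_generations(generations, prefix="凯尔希：", short_len=4):
    """Count empty, prefixed, ellipsis-only, and very short generations."""
    stripped = [g.strip() for g in generations]
    n = len(stripped)
    return {
        "count": n,
        "empty": sum(1 for g in stripped if not g),
        "with_prefix": sum(1 for g in stripped if g.startswith(prefix)),
        "ellipsis_only": sum(1 for g in stripped if g == "......"),
        "short": sum(1 for g in stripped if 0 < len(g) <= short_len),
        "avg_len": sum(len(g) for g in stripped) / n if n else 0.0,
    }


# Made-up outputs, only to exercise the helper.
demo = ["......", "我明白了。", "凯尔希：不。", "", "嗯。"]
stats = summarize_generations(demo)
print(stats)

# Shares implied by the reported counts on the 335-example test set.
ellipsis_share = f"{38 / 335:.1%}"
short_share = f"{20 / 335:.1%}"
print(ellipsis_share, short_share)
```

Lengths are counted in Unicode characters with `len`, which matches the character-based lengths reported above for Chinese text.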

My interpretation of the result:

The training run completed successfully, and the adapter files are valid.
The language model LoRA weights were updated.
The model shows measurable style-control behavior.
Contextual reasoning and narrative continuation still need improvement.
Since the training data is text-only, this experiment should be viewed as Chinese character-style text fine-tuning, not multimodal capability fine-tuning.

Repository Artifacts

Main artifacts:

File	Description
`adapter_model.safetensors`	LoRA adapter weights
`adapter_config.json`	PEFT/LoRA configuration
`tokenizer.json` / `tokenizer_config.json`	Tokenizer files
`chat_template.jinja`	Gemma 4 chat template
`train_metrics.json`	Training summary metrics
`kaltsit_test_generations.csv`	Test-set generations
`checkpoint-300` / `checkpoint-336`	Training checkpoints

This repository currently contains the LoRA adapter, not a fully merged model. To deploy a merged model, load the matching google/gemma-4-E4B-it base model, attach the adapter with PeftModel.from_pretrained, and then merge the weights with merge_and_unload(). Save the full processor alongside the merged model as well.
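The merge procedure could look roughly like the sketch below. It is an outline under assumptions rather than a tested recipe for this adapter: `merge_adapter` and the output directory are hypothetical, the base model is assumed to load through AutoModelForCausalLM as in the notebook, and the processor is assumed to be available from the same local base directory.

```python
def merge_adapter(base_path: str, adapter_id: str, output_dir: str) -> None:
    """Load the base model, attach the LoRA adapter, merge, and save."""
    # Heavy imports are deferred so this module imports without GPU libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoProcessor

    base = AutoModelForCausalLM.from_pretrained(base_path, dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_id)

    # Fold the low-rank updates into the base weights and drop the PEFT wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir)

    # Save the processor next to the merged weights.
    AutoProcessor.from_pretrained(base_path).save_pretrained(output_dir)


# Example call (paths are placeholders; "./merged" is a hypothetical output dir):
# merge_adapter("./models/google/gemma-4-E4B-it",
#               "asphyxiation112/gemma4-it-kaltsit-lora",
#               "./merged")
```

After merging, the saved directory no longer depends on PEFT at load time, at the cost of storing a full copy of the model weights.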

Data pipeline files:

File	Description
`scripts/download_arknights_story.py`	Downloads and parses raw Arknights story JSON files from ASTR URLs
`scripts/build_character_dataset.py`	Builds the Kal'tsit SFT dataset and creates train/validation/test splits
`scripts/count_speakers.py`	Counts speaker frequencies in the parsed story JSONL files
`requirements.txt`	Minimal dependency list for the data collection scripts
`urls.txt`	Source ASTR story URL list used for data collection

Minimal data pipeline reproduction:

pip install -r requirements.txt
python scripts/download_arknights_story.py --url-file urls.txt --jsonl
python scripts/build_character_dataset.py --input-dir result --character 凯尔希 --output-dir dataset --context-size 3
python scripts/count_speakers.py
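For orientation, the speaker-counting step can be approximated with a few lines of standard-library Python. This is a sketch, not the contents of scripts/count_speakers.py; the function name and the temporary demo file are assumptions.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_speakers(input_dir: str) -> Counter:
    """Count how often each speaker appears across all .jsonl files in a folder."""
    counts = Counter()
    for path in sorted(Path(input_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    counts[json.loads(line)["speaker"]] += 1
    return counts


# Small self-contained demo on a temporary folder with made-up lines.
with tempfile.TemporaryDirectory() as tmp:
    demo_lines = [
        {"speaker": "凯尔希", "text": "first line"},
        {"speaker": "narration", "text": "second line"},
        {"speaker": "凯尔希", "text": "third line"},
    ]
    Path(tmp, "story.jsonl").write_text(
        "\n".join(json.dumps(r, ensure_ascii=False) for r in demo_lines) + "\n",
        encoding="utf-8",
    )
    demo_counts = count_speakers(tmp)

print(demo_counts.most_common())
```

A count like this is a quick way to confirm that the exact speaker string used for filtering, 凯尔希, actually appears in the parsed data.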

Future Improvements

Apply LoRA only to the language model modules instead of all linear modules, since the dataset is text-only.
Filter or downweight very short target responses such as ...... and ——.
Add a small manually curated evaluation set for character consistency, contextual relevance, and naturalness.
Use longer context windows or scene-level samples to improve narrative continuity.
Compare multiple LoRA configurations, including different ranks, target modules, and data filtering strategies.
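As an illustration of the second item, a filter for low-information targets might look like the sketch below. The helper names, the punctuation set, and the minimum length are assumptions chosen for the example, not settings used in this project.

```python
import re

# Targets made up only of whitespace, dots, ellipses, dashes, or similar marks.
PUNCT_ONLY_RE = re.compile(r"^[\s.。…—\-–?？!！,，、~～]+$")


def is_low_information(text: str, min_chars: int = 2) -> bool:
    """Return True for targets that are too short or punctuation-only."""
    stripped = text.strip()
    return len(stripped) < min_chars or bool(PUNCT_ONLY_RE.match(stripped))


def filter_samples(samples):
    """Keep only samples whose target reply carries some content."""
    return [s for s in samples if not is_low_information(s["output"])]


demo_samples = [
    {"output": "......"},
    {"output": "——"},
    {"output": "我明白了。"},
]
kept = filter_samples(demo_samples)
print([s["output"] for s in kept])
```

Downweighting instead of dropping such samples would keep them available as valid in-character replies while reducing their share of the training signal.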

Model tree for asphyxiation112/gemma4-it-kaltsit-lora

Base model: google/gemma-4-E4B
Finetuned: google/gemma-4-E4B-it
Adapter: this model