Libraries

How to use argo11/0399-tv-full-thinking-fp with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="argo11/0399-tv-full-thinking-fp")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the output is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("argo11/0399-tv-full-thinking-fp")
model = AutoModelForCausalLM.from_pretrained("argo11/0399-tv-full-thinking-fp")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use argo11/0399-tv-full-thinking-fp with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "argo11/0399-tv-full-thinking-fp"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "argo11/0399-tv-full-thinking-fp",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/argo11/0399-tv-full-thinking-fp

SGLang

How to use argo11/0399-tv-full-thinking-fp with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "argo11/0399-tv-full-thinking-fp" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "argo11/0399-tv-full-thinking-fp",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "argo11/0399-tv-full-thinking-fp" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "argo11/0399-tv-full-thinking-fp",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

0399 Team Victory Full-Parameter SFT Checkpoint

This repository contains intermediate full-parameter SFT checkpoints from LLM-jp experiment 0399, trained on the Team Victory valid-clean (TV_valid_clean) math reasoning data.

Model Variants

Two model repositories were trained with the same data and training recipe; they differ only in initialization:

| Repo | Initialization | W&B Run |
| --- | --- | --- |
| `argo11/0399-tv-full-thinking-fp` | `llm-jp/llm-jp-4-8b-thinking` | `0399_tv-full-thinking_fp` |
| `argo11/0399-tv-full-base-fp` | `llm-jp/llm-jp-4-8b-base` | `0399_tv-full-base_fp` |

Current uploaded checkpoint:

checkpoint-500/
The upload excludes the large optimizer and duplicate FSDP state files.
The full local training state, including optimizer state, is retained under the ABCI experiment directory.

Experiment Links

GitHub issue: llm-jp/experiments#399
W&B project: argo-lab/llmjp4-8b-teamvictory-sft-difficulty-20260629
Thinking W&B run: 0399_tv-full-thinking_fp
Base W&B run: 0399_tv-full-base_fp
Tokenized dataset: argo11/0399-tv-valid-clean-sft-tokenized-llmjp4-8b

Intended Use

These checkpoints are intended for research on Japanese/English mathematical reasoning and reinforcement-learning initialization.

Primary intended uses:

Compare thinking initialization vs base initialization after identical Team Victory SFT.
Use as candidate initial checkpoints for later RL / GRPO-style math reasoning experiments.
Evaluate whether Team Victory SFT improves AIME-style pass@1 while preserving broader ability.
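
The pass@1 metric named above is usually estimated by sampling n completions per problem and counting the c correct ones. The sketch below uses the standard unbiased pass@k estimator, which reduces to c / n for k = 1. The function names and the sample counts are illustrative assumptions, not the evaluation harness of this experiment.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples are wrong, every size-k subset has a correct one.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: two problems, 8 samples each.
score = mean_pass_at_k([(8, 2), (8, 4)], k=1)
```

Averaging per-problem estimates keeps each problem equally weighted, which matters on small sets such as a single AIME year.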

Out-of-scope uses:

Production deployment without additional evaluation.
Safety-critical mathematical, financial, legal, medical, or educational grading decisions.
Claims of benchmark superiority before full evaluation is complete.

Training Data

Source dataset:

HayatoHongoEveryonesAI/qa_verify_cot_new_6M_v6

Filtered dataset:

TV_valid_clean
Rows: 5,461,079
Clean parquet SHA-256: b1fbc4d5c05dbacbf9366055200a034551a46d70bfb2bff4c6432f2175a10d9b
Filter rule:
- is_valid == 1
- contamination quarantine applied against target benchmark problems
Tokenized reusable dataset:
- argo11/0399-tv-valid-clean-sft-tokenized-llmjp4-8b
- Rows: 5,461,079
- Shards: 110
- Max length: 4096
- Columns: input_ids, attention_mask, labels, prompt_len, seq_len
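
The filter rule above can be sketched as follows. This card does not say how contamination was detected, so the normalized exact-match check and the `question` field name are assumptions made for illustration; only `is_valid` comes from the card.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial formatting differences still match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_valid_clean(rows, benchmark_problems, question_key="question"):
    """Keep rows with is_valid == 1 whose question is not a quarantined benchmark problem.

    question_key is a hypothetical column name; the matching rule is an assumption.
    """
    quarantined = {normalize(p) for p in benchmark_problems}
    kept = []
    for row in rows:
        if row.get("is_valid") != 1:
            continue  # fails the validity rule
        if normalize(row[question_key]) in quarantined:
            continue  # quarantined as a possible benchmark leak
        kept.append(row)
    return kept
```

A production quarantine would likely need fuzzier matching (n-gram overlap, for instance), since exact matching misses paraphrased problems.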

The tokenized dataset was prepared on CPU and uploaded to Hugging Face to avoid repeated expensive tokenization on GPU nodes.
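
The SHA-256 digest listed for the clean parquet file can be checked before reuse. A minimal sketch, assuming the file has been copied locally; the local path is not given in this card, so the one in the usage comment is hypothetical.

```python
import hashlib

# Digest of the clean parquet file as listed in this card.
EXPECTED_SHA256 = "b1fbc4d5c05dbacbf9366055200a034551a46d70bfb2bff4c6432f2175a10d9b"


def sha256_of_file(path, chunk_size=1 << 20) -> str:
    """Stream the file in chunks so a multi-gigabyte parquet never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256) -> bool:
    """True when the file on disk has the expected digest."""
    return sha256_of_file(path) == expected.lower()


# Usage (hypothetical local path):
# matches_expected("TV_valid_clean.parquet")
```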

Training Procedure

Training mode:

Full-parameter SFT
No LoRA / PEFT
Hugging Face Trainer
FSDP: full_shard auto_wrap
Loss is computed on assistant response tokens only.
Prompt tokens are masked with the label -100.
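
The masking rule above can be illustrated with a small sketch. It assumes each example carries `input_ids` and `prompt_len` as in the tokenized dataset columns; the helper names are hypothetical. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so masked positions contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking the first prompt_len (prompt) positions."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must be between 0 and len(input_ids)")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


def count_trained_tokens(labels):
    """Number of positions that actually contribute to the loss."""
    return sum(1 for token in labels if token != IGNORE_INDEX)


# Toy example: a 2-token prompt followed by a 3-token assistant response.
labels = build_labels([5, 6, 7, 8, 9], prompt_len=2)
```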

Core hyperparameters:

| Parameter | Value |
| --- | --- |
| Epochs | `1` |
| Max length | `4096` |
| Per-device train batch size | `1` |
| Gradient accumulation steps | `16` |
| Learning rate | `2.0e-5` |
| Warmup ratio | `0.03` |
| LR scheduler | `cosine` |
| Precision | `bf16` |
| Optimizer | `adamw_torch` |
| Save steps | `500` |
| Save total limit | `3` |
| Seed | `20260629` |
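
To put the hyperparameters above in context, the sketch below computes the sequences per optimizer step and a linear-warmup cosine-decay learning rate. The number of devices is not stated in this card, so `num_devices` is a free parameter and the value 8 is purely hypothetical. The schedule follows the common warmup-then-cosine shape and is an approximation, not a copy of the Trainer's internals.

```python
import math


def sequences_per_step(per_device_batch=1, grad_accum=16, num_devices=1):
    """Sequences consumed per optimizer step (effective batch size)."""
    return per_device_batch * grad_accum * num_devices


def optimizer_steps(num_rows, per_device_batch=1, grad_accum=16, num_devices=1):
    """Approximate optimizer steps in one epoch over num_rows examples."""
    return math.ceil(num_rows / sequences_per_step(per_device_batch, grad_accum, num_devices))


def lr_at_step(step, total_steps, base_lr=2.0e-5, warmup_ratio=0.03):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical device count; 5,461,079 is the row count from this card.
steps = optimizer_steps(5_461_079, num_devices=8)
```

Under that hypothetical device count, checkpoint-500 would sit well inside the first few percent of the epoch, which is consistent with the card describing it as an early checkpoint.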

Infrastructure:

ABCI 3.0
Group: gcg51557
Reserved queue: R9920261000
Experiment directory: /groups/gcg51557/experiments/0399_tv_sft
SFT jobs:
- Thinking: 2004401.pbs1
- Base: 2004402.pbs1
HF checkpoint sync job:
- 2004479.pbs1

Monitoring Status

The initial production SFT run was monitored past checkpoint-500.

Observed stability:

| Run | Step observed | Memory plateau | Error status |
| --- | --- | --- | --- |
| Thinking | 600+ | ~303 GB / 1.92 TB | No OOM/NCCL/Traceback observed |
| Base | 590+ | ~294 GB / 1.92 TB | No OOM/NCCL/Traceback observed |

Notes:

Both runs skipped GPU-side JSONL regeneration.
Both runs used the uploaded tokenized dataset.
W&B online logging was confirmed.
checkpoint-500 was written locally and uploaded to Hugging Face Hub.
The base run showed grad_norm=inf in early logs while the loss remained finite; this should be taken into account during downstream quality review.
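
One way to follow up on the grad_norm=inf observation is to scan the `log_history` list in the uploaded `trainer_state.json`. This is a sketch: it assumes the log entries carry `step`, `loss`, and `grad_norm` keys, which recent Hugging Face Trainer versions write but which has not been verified for this checkpoint.

```python
import math


def nonfinite_grad_steps(log_history):
    """Return the log entries whose grad_norm is inf or NaN, noting whether loss was finite."""
    flagged = []
    for entry in log_history:
        grad_norm = entry.get("grad_norm")
        if grad_norm is None:
            continue  # e.g. evaluation or summary entries
        if not math.isfinite(grad_norm):
            loss = entry.get("loss")
            flagged.append({
                "step": entry.get("step"),
                "grad_norm": grad_norm,
                "loss_finite": loss is not None and math.isfinite(loss),
            })
    return flagged


# Usage (path relative to a downloaded checkpoint directory):
# import json
# with open("checkpoint-500/trainer_state.json") as handle:
#     flagged = nonfinite_grad_steps(json.load(handle)["log_history"])
```

Python's `json` module parses the `Infinity` and `NaN` literals by default, so such values in the state file load as floats without extra handling.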

Uploaded Checkpoint Contents

Each checkpoint-500/ directory on Hub contains:

model.safetensors
config.json
generation_config.json
tokenizer files
trainer_state.json
training_args.bin
RNG states
scheduler.pt

The following large local training-state files are intentionally not uploaded to Hub:

optimizer.bin
pytorch_model_fsdp.bin

They are retained in the ABCI experiment directory for local recovery/debugging.
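
The split between uploaded and locally retained files can be expressed as a simple pattern filter. This is a sketch of the rule described above, not the actual sync job, whose implementation is not shown in this card; the helper name is hypothetical.

```python
from fnmatch import fnmatch

# File names this card lists as intentionally kept off the Hub.
EXCLUDED_PATTERNS = ("optimizer.bin", "pytorch_model_fsdp.bin")


def split_upload(file_names, excluded_patterns=EXCLUDED_PATTERNS):
    """Partition checkpoint files into (upload, keep_local) by base-name pattern."""
    upload, keep_local = [], []
    for name in file_names:
        base = name.rsplit("/", 1)[-1]
        if any(fnmatch(base, pattern) for pattern in excluded_patterns):
            keep_local.append(name)
        else:
            upload.append(name)
    return upload, keep_local


upload, keep_local = split_upload([
    "checkpoint-500/model.safetensors",
    "checkpoint-500/optimizer.bin",
    "checkpoint-500/pytorch_model_fsdp.bin",
    "checkpoint-500/config.json",
])
```

When uploading with `huggingface_hub`, the same exclusions could be passed as `ignore_patterns` to `upload_folder`; whether the sync job did so is an assumption.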

Evaluation

Full benchmark evaluation is not yet included in this model card.

Planned gates for experiment 0399:

Primary math gates:

AIME 2024
AIME 2025
AIME 2026
MATH-500

Regression gates:

LiveCodeBench
IFEval
MT-Bench

Final-candidate-only gates:

GPQA Diamond
BBH
MMLU-Pro

Do not treat this checkpoint as validated until these evaluations are complete.

Limitations

This is an intermediate full-parameter SFT checkpoint from an active experiment.
The checkpoint is optimized for math reasoning style data and may regress in non-math tasks.
Training data may contain long chain-of-thought style solutions; generated outputs may be verbose.
Benchmark contamination mitigation was applied, but no decontamination process is perfect.
The uploaded checkpoint-500 is early in a longer training run and should not be interpreted as final model quality.
Safety alignment was not the primary target of this experiment.

Citation / Attribution

Base models:

@misc{llmjp4,
  title = {LLM-jp-4 8B Models},
  author = {LLM-jp},
  year = {2026},
  url = {https://huggingface.co/llm-jp}
}

Experiment tracking:

GitHub issue: https://github.com/llm-jp/experiments/issues/399
W&B project: https://wandb.ai/argo-lab/llmjp4-8b-teamvictory-sft-difficulty-20260629

Reproducibility Metadata

Experiment ID: 0399

Experiment slug: tv_sft

Canonical experiment directory:

/groups/gcg51557/experiments/0399_tv_sft

Key manifests:

/groups/gcg51557/experiments/0399_tv_sft/manifests/tokenized_sft_tv_valid_clean_llmjp4_8b.json
/groups/gcg51557/experiments/0399_tv_sft/manifests/sft_stability_monitor_20260701.json
/groups/gcg51557/experiments/0399_tv_sft/manifests/hf_checkpoint_sync_2004479.pbs1.json

Training configs:

/groups/gcg51557/experiments/0399_tv_sft/configs/sft_full_thinking.yaml
/groups/gcg51557/experiments/0399_tv_sft/configs/sft_full_base.yaml

Safetensors

Model size: 9B params
Tensor type: F32
