Assignment 3 Forward LoRA Adapter

This repository contains the final instruction-tuning adapter from Part 4 of Assignment 3: a LoRA adapter trained on top of Qwen/Qwen3-1.7B. It is a compact, submission-ready artifact for the classroom pipeline, trained on self-generated and self-curated instruction-response pairs.

Model Details

Developed by: sunming-giegie
Model type: Causal language model with LoRA adapter
Base model: Qwen/Qwen3-1.7B
Language: English
License: Apache-2.0 for the base model; adapter release follows course-project use
Finetuning method: PEFT LoRA

Intended Use

This adapter is meant for:

course assignment demonstration
lightweight instruction-following experiments
reproducing the final SFT stage of the assignment pipeline

It is not intended as a production-ready general assistant.

Limitations

This model was trained on a small curated dataset and remains sensitive to prompt style and topic domain. In internal inspection during the assignment, it was more reliable on concise factual or technical questions than on creative, open-ended, or multi-step reasoning prompts.

Known limitations:

may generate generic or over-explanatory answers
may fail on broad open-ended prompts
may still underperform the base model on difficult reasoning tasks
evaluation here is qualitative and assignment-oriented, not benchmark-complete

How to Use

Load the base model first, then attach this adapter with PEFT.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen3-1.7B"
adapter_path = "sunming-giegie/assignment3-part4-qwen3-1.7b-lora"

# Tokenizer files are shipped with the adapter repo, so load them from there.
tokenizer = AutoTokenizer.from_pretrained(adapter_path, trust_remote_code=True)

# Load the base model weights first.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter_path)
```
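Once the adapter is attached, generation could look like the minimal sketch below. The helper names `build_messages` and `generate_answer` are illustrative and not part of this repository. `enable_thinking=False` is the Qwen3 chat-template switch for non-thinking mode, and `max_new_tokens=256` is an arbitrary choice.

```python
from typing import Dict, List


def build_messages(question: str) -> List[Dict[str, str]]:
    """Wrap a single user question in chat-message format (hypothetical helper)."""
    return [{"role": "user", "content": question.strip()}]


def generate_answer(model, tokenizer, question: str, max_new_tokens: int = 256) -> str:
    """Generate one reply from an already loaded model and tokenizer."""
    import torch  # imported lazily so build_messages works without torch installed

    prompt = tokenizer.apply_chat_template(
        build_messages(question),
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # Qwen3 template flag: skip the reasoning block
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

With the `model` and `tokenizer` loaded as above, a call such as `generate_answer(model, tokenizer, "What does the LoRA rank control?")` would return the decoded reply as a string.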

Training Data

The training data comes from the Part 3 curated dataset:

Sample 150 single-turn LIMA responses.
Use a backward model to infer instructions from responses.
Score each generated (instruction, response) pair with Qwen/Qwen3-1.7B.
Keep higher-quality pairs for forward supervised fine-tuning.
Build a compact subset emphasizing cleaner, shorter, and more technical examples.

The Part 3 curated dataset repo is expected to live alongside this model release.
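The filtering and subset-building steps above can be sketched roughly as follows. This is not the assignment's actual code: the `ScoredPair` type, the 1-5 score scale, the `min_score` threshold, and the `max_chars` cap are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredPair:
    instruction: str  # inferred by the backward model
    response: str     # original LIMA response
    score: int        # quality rating from the scoring model (1-5 scale assumed)


def keep_high_quality(pairs: List[ScoredPair], min_score: int = 4) -> List[ScoredPair]:
    """Keep pairs whose score meets the threshold (the threshold is an assumption)."""
    return [p for p in pairs if p.score >= min_score]


def compact_subset(pairs: List[ScoredPair], max_chars: int = 1200) -> List[ScoredPair]:
    """Drop long responses and order the rest; the character cap is illustrative."""
    short = [p for p in pairs if len(p.response) <= max_chars]
    # Highest score first, then shorter responses first.
    return sorted(short, key=lambda p: (-p.score, len(p.response)))
```

A selection for "more technical" examples is not modeled here, since the source does not say how that property was measured.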

Training Procedure

This adapter corresponds to the compact forward model variant selected as the most submission-ready version after multiple remediation rounds.

Hyperparameters

Precision: bf16
LoRA rank (r): 16
LoRA alpha: 32
LoRA dropout: 0.05
Target modules: q_proj, k_proj, v_proj, o_proj
Learning rate: 5e-5
Epochs: 4
Per-device batch size: 2
Gradient accumulation: 8
Max sequence length: 1536
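As a rough illustration, the values above could be collected in one place and passed to PEFT as shown below. The dictionary keys and helper names are hypothetical, `task_type="CAUSAL_LM"` is assumed from the model type, and the device count is not stated in this card, so it defaults to 1 here.

```python
HYPERPARAMS = {
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "learning_rate": 5e-5,
    "num_epochs": 4,
    "per_device_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "max_seq_length": 1536,
    "bf16": True,
}


def effective_batch_size(hp: dict, num_devices: int = 1) -> int:
    """Sequences per optimizer step; num_devices=1 is an assumption."""
    return hp["per_device_batch_size"] * hp["gradient_accumulation_steps"] * num_devices


def lora_scaling(hp: dict) -> float:
    """Standard LoRA scaling factor alpha / r applied to the low-rank update."""
    return hp["lora_alpha"] / hp["lora_r"]


def make_lora_config(hp: dict):
    """Build a peft.LoraConfig from the values above (peft is imported lazily)."""
    from peft import LoraConfig

    return LoraConfig(
        r=hp["lora_r"],
        lora_alpha=hp["lora_alpha"],
        lora_dropout=hp["lora_dropout"],
        target_modules=hp["target_modules"],
        task_type="CAUSAL_LM",  # assumed from "Causal language model with LoRA adapter"
    )
```

Under these assumptions, each optimizer step sees 2 × 8 = 16 sequences per device, and the adapter update is scaled by 32 / 16 = 2.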

Prompting

Training and inference used a direct-answer prompt style: the /no_think soft switch was added to the prompt, together with explicit constraints to avoid chain-of-thought style output. Response cleaning and token suppression were also added during the assignment to reduce stray special-token leakage.
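A minimal sketch of this prompt style and of post-hoc response cleaning is shown below. The system-prompt wording, the helper names, and the regular expressions are assumptions for illustration; the exact prompt and cleaning rules used in the assignment are not given in this card.

```python
import re
from typing import Dict, List

# Hypothetical wording; the assignment's actual constraint text is not documented here.
SYSTEM_PROMPT = "Answer directly and concisely. Do not show your reasoning."

# Matches Qwen-style special tokens such as <|im_end|> or <|endoftext|>.
SPECIAL_TOKEN_PATTERN = re.compile(r"<\|[^|>]*\|>")
# Matches a (possibly empty) reasoning block.
THINK_BLOCK_PATTERN = re.compile(r"<think>.*?</think>", re.DOTALL)


def build_prompt_messages(instruction: str) -> List[Dict[str, str]]:
    """Build chat messages with the /no_think switch appended to the user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{instruction.strip()} /no_think"},
    ]


def clean_response(text: str) -> str:
    """Remove reasoning blocks and leaked special tokens from a raw generation."""
    text = THINK_BLOCK_PATTERN.sub("", text)
    text = SPECIAL_TOKEN_PATTERN.sub("", text)
    return text.strip()
```

The sketch covers only cleaning after generation; it does not reproduce the token-suppression step, which would be applied at decoding time.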

Evaluation

Evaluation for the assignment was primarily example-based and qualitative:

generate held-out sample responses
compare fluency, relevance, and format cleanliness
prefer the checkpoint that minimizes prompt leakage and malformed outputs

Among the explored variants, the compact adapter was selected for submission because it produced the most stable direct answers on short factual and technical prompts.
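The selection criterion above (fewest leaked or malformed outputs) could be approximated with a simple check like the following. This is a sketch under assumptions: the leak pattern, the definition of "malformed", and the checkpoint names in the usage example are hypothetical, and the actual comparison in the assignment was done by qualitative inspection.

```python
import re
from typing import Dict, List

# Markers that should not appear in a clean answer (assumed list).
LEAK_PATTERN = re.compile(r"<\|[^|>]*\|>|</?think>|/no_think")


def is_malformed(output: str) -> bool:
    """Flag empty outputs or outputs containing leaked prompt or special markers."""
    return not output.strip() or bool(LEAK_PATTERN.search(output))


def malformed_rate(outputs: List[str]) -> float:
    """Fraction of outputs flagged as malformed (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(is_malformed(o) for o in outputs) / len(outputs)


def pick_checkpoint(outputs_by_checkpoint: Dict[str, List[str]]) -> str:
    """Return the checkpoint name with the lowest malformed-output rate."""
    return min(
        outputs_by_checkpoint,
        key=lambda name: malformed_rate(outputs_by_checkpoint[name]),
    )
```

For example, given `{"compact": [...], "full": [...]}` with held-out generations per variant, `pick_checkpoint` returns the name whose outputs contain the fewest flagged responses. Fluency and relevance still need to be judged by reading the outputs.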

Files

adapter_model.safetensors: LoRA adapter weights
adapter_config.json: LoRA configuration
tokenizer files copied for easier loading

Framework Versions

PEFT 0.18.1
