Assignment 3 Forward LoRA Adapter
This repository contains the Part 4 final instruction-tuning adapter for Assignment 3, built on top of Qwen/Qwen3-1.7B with LoRA. The adapter is a compact, submission-ready artifact of the classroom pipeline, trained on self-generated and self-curated instruction-response pairs.
Model Details
- Developed by: sunming-giegie
- Model type: Causal language model with LoRA adapter
- Base model: Qwen/Qwen3-1.7B
- Language: English
- License: Apache-2.0 for the base model; adapter release follows course-project use
- Finetuning method: PEFT LoRA
Intended Use
This adapter is meant for:
- course assignment demonstration
- lightweight instruction-following experiments
- reproducing the final SFT stage of the assignment pipeline
It is not intended as a production-ready general assistant.
Limitations
This model was trained on a small curated dataset and remains noticeably sensitive to prompt style and topic domain. In internal inspection during the assignment, the model was more reliable on concise factual or technical questions than on creative, open-ended, or multi-step reasoning prompts.
Known limitations:
- may generate generic or over-explanatory answers
- may fail on broad open-ended prompts
- may still underperform the base model on difficult reasoning tasks
- evaluation here is qualitative and assignment-oriented, not benchmark-complete
How to Use
Load the base model first, then attach this adapter with PEFT.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen3-1.7B"
adapter_path = "sunming-giegie/assignment3-part4-qwen3-1.7b-lora"

# Tokenizer files are shipped with the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(adapter_path, trust_remote_code=True)

# Load the base model, then attach the LoRA adapter weights.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_path)
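Once `model` and `tokenizer` are loaded as above, a single answer can be generated roughly as follows. This is a minimal sketch, not the assignment's actual inference script: the helper name `generate_answer` and the default `max_new_tokens=256` are assumptions chosen for illustration.

```python
def generate_answer(model, tokenizer, question, max_new_tokens=256):
    """Generate one answer from an already loaded model and tokenizer.

    Hypothetical helper: the name and the default token budget are
    illustrative and do not come from the assignment code.
    """
    messages = [{"role": "user", "content": question}]
    # Render the chat template to text and append the assistant prefix.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated answer is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `generate_answer(model, tokenizer, "What does LoRA change in a transformer layer?")` returns the decoded answer as a string. Short factual or technical questions are the cases the Limitations section reports as most reliable.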
Training Data
The training data comes from the Part 3 curated dataset:
- Sample 150 single-turn LIMA responses.
- Use a backward model to infer instructions from responses.
- Score each generated (instruction, response) pair with Qwen/Qwen3-1.7B.
- Keep higher-quality pairs for forward supervised fine-tuning.
- Build a compact subset emphasizing cleaner, shorter, and more technical examples.
The Part 3 curated dataset repo is expected to live alongside this model release.
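The filtering steps above can be sketched in plain Python. The card does not state the scoring scale, the score threshold, or the length cutoff, so the 1 to 5 scale, `min_score=4`, and `max_response_chars=2000` below are assumptions, and `ScoredPair` and `curate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScoredPair:
    instruction: str  # instruction inferred by the backward model
    response: str     # original LIMA response
    score: int        # assumed 1-5 quality rating from the scoring model


def curate(pairs, min_score=4, max_response_chars=2000):
    """Keep higher-scoring, shorter pairs (thresholds are illustrative)."""
    kept = [
        p for p in pairs
        if p.score >= min_score and len(p.response) <= max_response_chars
    ]
    # Best score first; among equal scores, shorter responses first.
    return sorted(kept, key=lambda p: (-p.score, len(p.response)))
```

Sorting by score and then by length is one simple way to bias the subset toward cleaner, shorter examples. Selecting for technical content would need an additional topic filter that is not shown here.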
Training Procedure
This adapter corresponds to the compact forward model variant selected as the most submission-ready version after multiple remediation rounds.
Hyperparameters
- Precision: bf16
- LoRA rank (r): 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj
- Learning rate: 5e-5
- Epochs: 4
- Per-device batch size: 2
- Gradient accumulation: 8
- Max sequence length: 1536
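The LoRA settings above map onto a PEFT `LoraConfig` roughly as follows. This is a sketch rather than the training script: `task_type` is not stated in the card and is assumed to be `CAUSAL_LM`, the number of devices is unknown, and the helper names are hypothetical.

```python
# Values copied from the hyperparameter list above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    # Assumption: not stated in the card; matches a causal language model.
    "task_type": "CAUSAL_LM",
}

PER_DEVICE_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8


def effective_batch_size(num_devices=1):
    """Sequences per optimizer step (device count is an assumption)."""
    return PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def lora_scaling():
    """Standard LoRA scaling lora_alpha / r (assumes rsLoRA is not enabled)."""
    return LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Build the PEFT config; PEFT is imported lazily so the helpers above run without it."""
    from peft import LoraConfig
    return LoraConfig(**LORA_KWARGS)
```

With these values, each device accumulates 2 × 8 = 16 sequences per optimizer step, and the adapter update is scaled by 32 / 16 = 2.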
Prompting
Training and inference used a direct-answer prompt style with a /no_think control token and explicit constraints to avoid chain-of-thought style output. Additional response cleaning and token suppression were added during the assignment to reduce stray special-token leakage.
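A minimal sketch of this prompt style and response cleaning is shown below. The exact system prompt, the placement of `/no_think`, and the cleaning rules used in the assignment are not given in this card, so the wording and regular expressions here are assumptions, and `build_messages` and `clean_response` are hypothetical names.

```python
import re

NO_THINK = "/no_think"


def build_messages(question):
    """Build a direct-answer chat prompt (system wording is illustrative)."""
    system = "Answer directly and concisely. Do not show step-by-step reasoning."
    return [
        {"role": "system", "content": system},
        # Assumption: the switch is appended to the end of the user turn.
        {"role": "user", "content": f"{question.strip()} {NO_THINK}"},
    ]


def clean_response(text):
    """Strip reasoning blocks and leaked chat special tokens from an output."""
    # Remove any <think>...</think> block, including empty ones.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Remove tokens of the form <|name|>, such as <|im_end|>.
    text = re.sub(r"<\|[^|>]+\|>", "", text)
    return text.strip()
```

The messages can be passed to `tokenizer.apply_chat_template`, and `clean_response` applied to the decoded output. Token suppression at generation time is a separate mechanism and is not shown here.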
Evaluation
Evaluation for the assignment was primarily example-based and qualitative:
- generate held-out sample responses
- compare fluency, relevance, and format cleanliness
- prefer the checkpoint that minimizes prompt leakage and malformed outputs
Among the explored variants, the compact adapter was selected for submission because it produced the most stable direct answers on short factual and technical prompts.
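The format-cleanliness criterion can be approximated with a simple automated check. The sketch below is not the assignment's evaluation code: the leak pattern and all function names are assumptions, and fluency and relevance still need manual review.

```python
import re

# Assumed leak markers: chat special tokens, think tags, and the control switch.
LEAK_PATTERN = re.compile(r"<\|[^|>]+\|>|</?think>|/no_think")


def is_malformed(output):
    """Flag empty outputs and outputs that leak prompt or special tokens."""
    return not output.strip() or bool(LEAK_PATTERN.search(output))


def malformed_rate(outputs):
    """Fraction of outputs flagged as malformed (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(is_malformed(o) for o in outputs) / len(outputs)


def pick_checkpoint(outputs_by_checkpoint):
    """Return the checkpoint name with the lowest malformed rate."""
    return min(
        outputs_by_checkpoint,
        key=lambda name: malformed_rate(outputs_by_checkpoint[name]),
    )
```

Given a dictionary that maps each checkpoint name to its held-out sample responses, `pick_checkpoint` returns the name with the fewest flagged outputs, which mirrors the stated preference for minimal prompt leakage and malformed outputs.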
Files
- adapter_model.safetensors: LoRA adapter weights
- adapter_config.json: LoRA configuration
- tokenizer files copied for easier loading
Framework Versions
- PEFT 0.18.1