How to use mrm8488/limstral-7B-v0.1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="mrm8488/limstral-7B-v0.1")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("mrm8488/limstral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mrm8488/limstral-7B-v0.1")
How to use mrm8488/limstral-7B-v0.1 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "mrm8488/limstral-7B-v0.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "mrm8488/limstral-7B-v0.1",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
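The same completion request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_payload` and `complete` are illustrative and not part of vLLM, and the default base URL assumes the vLLM server started above is listening on `localhost:8000`.

```python
import json
import urllib.request

MODEL_ID = "mrm8488/limstral-7B-v0.1"


def build_completion_payload(prompt, max_tokens=512, temperature=0.5):
    """Build the JSON body for the OpenAI-compatible /v1/completions endpoint."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, base_url="http://localhost:8000", **kwargs):
    """POST the payload to a running server and return the generated text.

    Hypothetical helper: it assumes a server is already up at base_url.
    """
    body = json.dumps(build_completion_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # OpenAI-style completion responses carry the text under choices[0].text.
    return result["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` should return the continuation as a string. Passing `base_url="http://localhost:30000"` would target a server on port 30000 instead.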
How to use mrm8488/limstral-7B-v0.1 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "mrm8488/limstral-7B-v0.1" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "mrm8488/limstral-7B-v0.1",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "mrm8488/limstral-7B-v0.1" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "mrm8488/limstral-7B-v0.1",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use mrm8488/limstral-7B-v0.1 with Docker Model Runner:
docker model run hf.co/mrm8488/limstral-7B-v0.1
This model is a fine-tuned version of mistralai/Mistral-7B-v0.1, trained on the LIMA dataset for instruction following.
The base model was loaded in 8-bit precision and fine-tuned on LIMA with the LoRA PEFT technique, using the huggingface/peft library and the SFTTrainer from trl, for 2 epochs on a single A100 (40GB) GPU.
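To give a rough sense of why 8-bit loading is comfortable on a 40GB card, the sketch below estimates the memory taken by the base weights alone. The architecture values are assumptions taken from the published Mistral-7B-v0.1 configuration, not from this card, and the estimate ignores activations, optimizer state, the LoRA weights, and any modules that an 8-bit loader keeps in higher precision.

```python
# Architecture values assumed from the published Mistral-7B-v0.1 config
# (not stated in this card).
HIDDEN = 4096
INTERMEDIATE = 14336
LAYERS = 32
KV_DIM = 1024  # 8 key/value heads x 128 head dimension
VOCAB = 32000


def count_base_parameters():
    """Count the weights of the base model from its layer shapes."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o and k, v
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up, down projections
    norms = 2 * HIDDEN  # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings and untied LM head
    return LAYERS * per_layer + embeddings + HIDDEN  # plus the final norm


def weight_memory_gib(n_params, bytes_per_param):
    """Memory needed to store n_params weights, in GiB."""
    return n_params * bytes_per_param / 1024**3


n_params = count_base_parameters()
print(f"parameters: {n_params:,}")
print(f"bf16 weights: {weight_memory_gib(n_params, 2):.1f} GiB")
print(f"int8 weights: {weight_memory_gib(n_params, 1):.1f} GiB")
```

Under these assumptions the 8-bit weights take roughly half the memory of a bf16 copy, which leaves most of the GPU free for activations and the adapter's optimizer state.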
SFT Trainer params:
trainer = SFTTrainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=2048,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)
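Because `dataset_text_field="text"` is set, each training example needs a single string under a `text` key. The card does not show how that column was built, so the following is only a sketch: it assumes each example stores alternating user and assistant turns as a list under a `conversations` key, and it assumes the Mistral-style `[INST] ... [/INST]` markers. The function name `to_text` is hypothetical.

```python
def to_text(example):
    """Flatten alternating user/assistant turns into one training string.

    Assumes example["conversations"] is a list of strings that starts with
    a user turn; the [INST] template is an assumption, not the card's recipe.
    """
    turns = example["conversations"]
    parts = []
    for i in range(0, len(turns), 2):
        parts.append(f"[INST] {turns[i].strip()} [/INST]")
        if i + 1 < len(turns):
            # The assistant reply follows the closing marker, if present.
            parts.append(turns[i + 1].strip())
    return {"text": " ".join(parts)}


sample = {"conversations": ["What is LoRA?", "A low-rank adaptation method."]}
print(to_text(sample)["text"])
```

With the `datasets` library, a function of this shape could be applied through `train_ds.map(to_text)` to add the `text` column before the dataset is passed to the trainer.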
LoRA config:
config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['q_proj', 'k_proj', 'down_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj'],
)
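To make the size of this adapter concrete, the sketch below counts the parameters that LoRA adds with `r=64` on the seven target modules, and computes the standard LoRA scaling factor `lora_alpha / r`. The projection shapes and layer count are assumptions taken from the published Mistral-7B-v0.1 configuration, not from this card.

```python
# (in_features, out_features) of each targeted projection in one decoder
# layer; assumed from the published Mistral-7B-v0.1 config.
PROJECTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # assumed number of decoder layers
R = 64
LORA_ALPHA = 16


def lora_parameters(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) per module."""
    return r * (in_features + out_features)


per_layer = sum(lora_parameters(i, o, R) for i, o in PROJECTION_SHAPES.values())
trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (lora_alpha / r): {scaling}")
```

Since `bias="none"`, no bias terms are trained, so under these assumptions the count above covers everything the optimizer updates.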
Training and validation loss recorded during training:
| Step | Training Loss | Validation Loss |
|---|---|---|
| 5 | 1.802800 | 1.848371 |
| 10 | 1.605800 | 1.803416 |
| 15 | 1.844800 | 1.762276 |
| 20 | 1.752600 | 1.754042 |
| 25 | 1.512400 | 1.750550 |
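The logged losses can be read more easily as perplexities. The sketch below copies the rows of the table, finds the step with the lowest validation loss, and converts that loss to a perplexity. It assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# (step, training loss, validation loss), copied from the table above.
LOG = [
    (5, 1.802800, 1.848371),
    (10, 1.605800, 1.803416),
    (15, 1.844800, 1.762276),
    (20, 1.752600, 1.754042),
    (25, 1.512400, 1.750550),
]

# Pick the row with the smallest validation loss.
best_step, _, best_val = min(LOG, key=lambda row: row[2])

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats.
best_ppl = math.exp(best_val)

print(f"lowest validation loss: {best_val} at step {best_step}")
print(f"validation perplexity: {best_ppl:.2f}")
```

The validation loss decreases at every logged step, so the last checkpoint is also the best one by this measure.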
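Example of usage:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

repo_id = "mrm8488/limstral-7B-v0.1"
# Load the weights in bfloat16 to halve memory use compared with float32
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# device=0 places the model on the first GPU
gen = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)

# The instruction is wrapped in [INST] ... [/INST] markers
instruction = "[INST] Write an email to say goodbye to my boss [/INST]"
# do_sample=True is needed for temperature, top_p and top_k to take effect
res = gen(instruction, max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.75, top_k=40, repetition_penalty=1.2)
print(res[0]['generated_text'])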
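By default the text-generation pipeline returns the prompt followed by the completion, so `res[0]['generated_text']` starts with the instruction itself. The helpers below are a small sketch for building the prompt and keeping only the model's answer; `build_prompt` and `extract_response` are hypothetical names, not part of Transformers.

```python
def build_prompt(instruction):
    """Wrap a plain instruction in [INST] ... [/INST] markers."""
    return f"[INST] {instruction.strip()} [/INST]"


def extract_response(prompt, generated_text):
    """Drop the echoed prompt from the pipeline output, if it is present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


prompt = build_prompt("Write an email to say goodbye to my boss")
# Stand-in for res[0]['generated_text']; the real text comes from the model.
fake_output = prompt + " Dear boss, it has been a pleasure working with you."
print(extract_response(prompt, fake_output))
```

Passing `return_full_text=False` when calling the pipeline is another way to get only the newly generated text.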