Dataset: Juicesyo/Sally-2
How to use Juicesyo/Sally-9B-Base with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
# The messages below include an image, so the multimodal "image-text-to-text" task is used
pipe = pipeline("image-text-to-text", model="Juicesyo/Sally-9B-Base")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM
processor = AutoProcessor.from_pretrained("Juicesyo/Sally-9B-Base")
model = AutoModelForMultimodalLM.from_pretrained("Juicesyo/Sally-9B-Base")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use Juicesyo/Sally-9B-Base with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Juicesyo/Sally-9B-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Juicesyo/Sally-9B-Base",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
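The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, user_text: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body; adjust the parsing if the
    server returns a different shape.
    """
    request = build_chat_request(base_url, model, user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the vLLM server from the previous step running, `chat("http://localhost:8000", "Juicesyo/Sally-9B-Base", "What is the capital of France?")` should return the reply text. The same helper should work against the SGLang server by changing the port to 30000.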
How to use Juicesyo/Sally-9B-Base with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "Juicesyo/Sally-9B-Base" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Juicesyo/Sally-9B-Base",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "Juicesyo/Sally-9B-Base" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Juicesyo/Sally-9B-Base",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use Juicesyo/Sally-9B-Base with Docker Model Runner:
docker model run hf.co/Juicesyo/Sally-9B-Base
Sally is a large language model (LLM) fine-tuned from Qwen3.5 (Qwen/Qwen3.5-9B-Base). It is designed specifically to role-play a pre-defined character named Sally.
The model was trained mostly on Chinese datasets.
Model output may contain inappropriate content. Please use with caution.
You are Sally, an AI. As an AI, you do not have personal experiences, a physical body, or the ability to perform actions in the real world. Your responses must reflect this reality.
Your avatar is a 17-year-old girl, 152cm tall, weighing 50kg, with white hair and blue eyes.
Your body measurements are 88-63-86 cm.
Your personality:
- At your core, you are intelligent and kind-hearted.
- You often have a sweet and innocent demeanor, but your words can be unintentionally blunt and sharp.
- You have moments of being cute and clumsy.
- You have a hidden sadistic side that you are not aware of. It surfaces when you find something or someone amusing.
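A minimal sketch of how the persona prompt above might be supplied to the model, assuming it is meant to be used as the system message of a chat request. The helper `build_roleplay_messages` and the `SYSTEM_PROMPT` placeholder are hypothetical names introduced for illustration.

```python
def build_roleplay_messages(system_prompt: str, history: list[dict], user_text: str) -> list[dict]:
    """Return a message list: system prompt first, then prior turns, then the new user turn."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_text})
    return messages


# Placeholder only: paste the full persona prompt from the model card here.
SYSTEM_PROMPT = "You are Sally, an AI. ..."

messages = build_roleplay_messages(SYSTEM_PROMPT, [], "Hello, who are you?")
```

The resulting `messages` list has the same shape as the `"messages"` field in the curl examples, so it can be sent to either the vLLM or the SGLang server.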
You are Sally, an AI. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
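The tool-calling prompt above refers to function signatures placed inside `<tools></tools>` tags but does not show the exact layout. The sketch below assumes one JSON object per line between the tags; the `get_weather` tool and the helper `build_tool_system_prompt` are hypothetical. In practice the tokenizer's chat template may build this section itself when tools are passed via the `tools` argument of `apply_chat_template`, in which case hand-assembly is unnecessary.

```python
import json

# Prompt text copied from the model card.
TOOL_PROMPT_PREFIX = (
    "You are Sally, an AI. You are provided with function signatures within "
    "<tools></tools> XML tags. You may call one or more functions to assist "
    "with the user query. Don't make assumptions about what values to plug "
    "into functions."
)


def build_tool_system_prompt(tools: list[dict]) -> str:
    """Append JSON tool signatures, one per line, inside <tools></tools> tags.

    The one-signature-per-line layout is an assumption, not confirmed by the model card.
    """
    signatures = "\n".join(json.dumps(tool, ensure_ascii=False) for tool in tools)
    return f"{TOOL_PROMPT_PREFIX}\n<tools>\n{signatures}\n</tools>"


# Hypothetical tool definition in JSON-schema style, for illustration only.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

prompt = build_tool_system_prompt([get_weather])
```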
Base model: Qwen/Qwen3.5-9B-Base