Cebor
Cebor is an end-side (on-device) LLM developed by cebor ai.
Limitations
- Because of its small size, the model may hallucinate, producing fluent but factually incorrect answers.
Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
torch.manual_seed(0)
path = '<model_download_path>'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)
# model.chat returns the reply together with the updated chat history
responds, history = model.chat(tokenizer, "What is the distance between the Earth and the Moon?", temperature=0.8, top_p=0.8)
print(responds)
FastAPI Deployment
Environment Preparation
pip install modelscope transformers sentencepiece accelerate fastapi uvicorn requests streamlit
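After installing, it can help to confirm that every package is importable before moving on. The following is a minimal sketch, not part of the original instructions: the helper name `missing_packages` is hypothetical, and it assumes each package's import name matches the name used in the pip command above.

```python
import importlib.util

# Names taken from the pip command above; the import name is assumed
# to be identical to the distribution name for each of them.
REQUIRED = ["modelscope", "transformers", "sentencepiece", "accelerate",
            "fastapi", "uvicorn", "requests", "streamlit"]

def missing_packages(names):
    """Return the names that cannot be located by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```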
Model Download
from modelscope import snapshot_download
# Download the model snapshot into a local directory and keep its path
model_dir = snapshot_download('zxiaodong/Cerbo-2B-dpo-bf16', local_dir='<your local path>', revision='master')
Code Preparation
Create a file named api.py with the following content:
from fastapi import FastAPI, Request
from transformers import AutoTokenizer, AutoModelForCausalLM
import uvicorn
import datetime
import torch

DEVICE = "cuda"
DEVICE_ID = "0"
CUDA_DEVICE = f"{DEVICE}:{DEVICE_ID}" if DEVICE_ID else DEVICE

# Release cached GPU memory after each request
def torch_gc():
    if torch.cuda.is_available():
        with torch.cuda.device(CUDA_DEVICE):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()

# Create the FastAPI app
app = FastAPI()

@app.post("/")
async def create_item(request: Request):
    global model, tokenizer
    # request.json() already returns the parsed JSON body
    json_post = await request.json()
    prompt = json_post.get('prompt')
    responds, history = model.chat(tokenizer, prompt, temperature=0.5, top_p=0.8, repetition_penalty=1.02)
    now = datetime.datetime.now()
    timestamp = now.strftime("%Y-%m-%d %H:%M:%S")
    answer = {
        "response": responds,
        "status": 200,
        "time": timestamp
    }
    # Log the prompt and the reply with a timestamp
    log = f'[{timestamp}] prompt: "{prompt}", response: {responds!r}'
    print(log)
    torch_gc()
    return answer

if __name__ == '__main__':
    torch.manual_seed(0)
    path = '<your model path>'
    # Load the tokenizer and model once at startup; the handler uses them as module globals
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)
    model.eval()
    uvicorn.run(app, host='0.0.0.0', port=8080, workers=1)
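The handler reads the prompt with `.get('prompt')`, which yields `None` when the key is absent, and that value is then passed straight to `model.chat`. One way to guard against this is to validate the body first. The helper below is a sketch under that assumption; `extract_prompt` is a hypothetical name and is not part of the original api.py.

```python
def extract_prompt(body):
    """Return the prompt from a parsed JSON body, or raise ValueError."""
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    prompt = body.get("prompt")
    # Reject missing, non-string, and whitespace-only prompts
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("'prompt' must be a non-empty string")
    return prompt
```

Inside the handler, a `ValueError` from this helper could be turned into a client error with `raise HTTPException(status_code=400, detail=str(exc))`, using `HTTPException` imported from `fastapi`.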
Start the API
Run the following command in a terminal to start the API service:
python api.py
By default, the service listens on port 8080 and accepts POST requests with a JSON body containing a prompt field. You can call it with curl:
curl -X POST "http://127.0.0.1:8080" \
-H 'Content-Type: application/json' \
-d '{"prompt": "What is the distance between the Earth and the Moon?"}'
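The same call can be made from Python. The sketch below uses only the standard library so that it runs without extra dependencies (the `requests` package installed earlier would work equally well). The names `build_request` and `ask` are hypothetical helpers, and the URL assumes the default host and port from api.py.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8080"  # assumed default address from api.py

def build_request(prompt, url=API_URL):
    """Build a POST request carrying {"prompt": ...} as a JSON body."""
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(prompt, url=API_URL, timeout=60):
    """Send the prompt to a running service and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With the service running:
# print(ask("What is the distance between the Earth and the Moon?")["response"])
```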
Response
{
    "response": "The average distance between the Earth and the Moon is approximately 238,855 miles (384,400 kilometers). This distance is known as the lunar distance or lunar separation. It's important to note that this distance can vary due to various factors such as the Moon's orbit around Earth and the Earth's elliptical orbit around the Sun.",
    "status": 200,
    "time": "2024-07-18 18:14:56"
}
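A client will usually want the reply text and the timestamp rather than the raw JSON. The following sketch parses a body of the shape shown above; `parse_reply` is a hypothetical helper. Note that `status` here is a field in the JSON body set by api.py, not the HTTP status code.

```python
import json
from datetime import datetime

def parse_reply(raw):
    """Validate a reply body and return (response_text, timestamp)."""
    payload = json.loads(raw)
    if payload.get("status") != 200:
        raise RuntimeError(f"unexpected status: {payload.get('status')!r}")
    # api.py formats the time as "%Y-%m-%d %H:%M:%S"
    stamp = datetime.strptime(payload["time"], "%Y-%m-%d %H:%M:%S")
    return payload["response"], stamp
```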
license: apache-2.0