Cebor

Cebor is an end-side (on-device) LLM developed by cebor ai.

Limitations

  • Because of its limited size, the model may hallucinate, that is, produce fluent but factually incorrect output.

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
torch.manual_seed(0)

path = '<model_download_path>'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

# model.chat is unpacked as a (response, history) pair, as in the API code further down
responds, history = model.chat(tokenizer, "What is the distance between the Earth and the Moon?", temperature=0.8, top_p=0.8)
print(responds)

FastAPI Deployment

Environment Preparation

pip install modelscope transformers sentencepiece accelerate fastapi uvicorn requests streamlit

Model Download

from modelscope import snapshot_download

# Download the model files; snapshot_download returns the local directory path
model_dir = snapshot_download('zxiaodong/Cerbo-2B-dpo-bf16', local_dir='<your local path>', revision='master')
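Before pointing `from_pretrained` at the downloaded directory, it can help to confirm that the download actually produced the files the loader needs. The following is a minimal sketch, not part of the original instructions: the helper name `missing_files` and the default list of required files are assumptions. `config.json` is the file `transformers` reads first; weight and tokenizer file names vary by model, so extend the list as needed.

```python
from pathlib import Path


def missing_files(model_dir, required=("config.json",)):
    """Return the names in `required` that are not present in model_dir.

    Hypothetical helper for illustration; adjust `required` to your model.
    """
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]
```

Calling `missing_files(model_dir)` on the path returned by `snapshot_download` should give an empty list when the download is complete.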

Code Preparation

Create a file named api.py with the following content:

from fastapi import FastAPI, Request
from transformers import AutoTokenizer, AutoModelForCausalLM
import uvicorn
import datetime
import torch


# Device settings used when clearing the CUDA cache
DEVICE = "cuda"
DEVICE_ID = "0"
CUDA_DEVICE = f"{DEVICE}:{DEVICE_ID}" if DEVICE_ID else DEVICE

# Release cached GPU memory
def torch_gc():
    if torch.cuda.is_available():
        with torch.cuda.device(CUDA_DEVICE):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()

# Create FastAPI app
app = FastAPI()


@app.post("/")
async def create_item(request: Request):
    global model, tokenizer  # loaded in the __main__ block below
    json_post = await request.json()  # request body parsed into a dict
    prompt = json_post.get('prompt')

    # model.chat is unpacked here as a (response, history) pair
    responds, history = model.chat(tokenizer, prompt, temperature=0.5, top_p=0.8, repetition_penalty=1.02)
    now = datetime.datetime.now()
    time = now.strftime("%Y-%m-%d %H:%M:%S")

    answer = {
        "response": responds,
        "status": 200,
        "time": time
    }

    log = "[" + time + "] " + 'prompt:"' + prompt + '", response:"' + repr(responds) + '"'
    print(log)
    torch_gc()  # free cached GPU memory after each request
    return answer


if __name__ == '__main__':
    torch.manual_seed(0)  # fixed seed for reproducible sampling
    path = '<your model path>'
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)
    model.eval()  # inference mode: disables dropout
    uvicorn.run(app, host='0.0.0.0', port=8080, workers=1)
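The handler in api.py reads `prompt` with `.get`, which yields `None` when the key is missing; the string concatenation that builds the log line would then raise a `TypeError`. A minimal sketch of a validation step is shown below. The helper name `extract_prompt` and its error messages are assumptions made for illustration and are not part of the original api.py.

```python
def extract_prompt(payload):
    """Return the prompt string from a parsed JSON body, or raise ValueError.

    Hypothetical helper; the name and messages are illustrative assumptions.
    """
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("'prompt' must be a non-empty string")
    return prompt
```

Inside the handler, one option is to call this on the parsed body and translate a `ValueError` into `fastapi.HTTPException(status_code=400, detail=...)` so that clients receive a clear error instead of a server-side traceback.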

API Deployment

Run the following command in a terminal to start the API service:

python api.py

By default, the service listens on port 8080 and accepts POST requests. You can call it with curl:

curl -X POST "http://127.0.0.1:8080" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "What is the distance between the Earth and the Moon?"}'
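The same request can be sent from Python. The sketch below uses only the standard library so that it runs without extra packages; the function names `build_request` and `ask` and the `timeout` default are assumptions, and the URL is the default address used above. With the `requests` package from the install step, `requests.post(url, json={"prompt": ...})` would do the same job.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8080"  # default address of the service started by api.py


def build_request(prompt, url=API_URL):
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, url=API_URL, timeout=60):
    """Send the prompt to the running service and return its "response" field."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the service running, `print(ask("What is the distance between the Earth and the Moon?"))` prints the model's answer.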

Response

{
    "response": "The average distance between the Earth and the Moon is approximately 238,855 miles (384,400 kilometers). This distance is known as the lunar distance or lunar separation. It's important to note that this distance can vary due to various factors such as the Moon's orbit around Earth and the Earth's elliptical orbit around the Sun.",
    "status": 200,
    "time": "2024-07-18 18:14:56"
}
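A client usually wants the answer text and, sometimes, the timestamp as a real `datetime`. The following sketch parses a reply of the shape shown above; the function name `parse_reply` is an assumption, and the response text in `raw` is abbreviated for brevity. The time format string matches the one api.py uses in `strftime`. Since the server produces the value with `datetime.datetime.now()`, it is the server's local time and carries no timezone.

```python
import json
from datetime import datetime

# Abbreviated copy of the example reply (response text shortened)
raw = '{"response": "The average distance between the Earth and the Moon is approximately 238,855 miles (384,400 kilometers).", "status": 200, "time": "2024-07-18 18:14:56"}'


def parse_reply(raw_json):
    """Return (text, timestamp) from a reply; raise if "status" is not 200."""
    reply = json.loads(raw_json)
    if reply.get("status") != 200:
        raise RuntimeError(f"unexpected status: {reply.get('status')}")
    stamp = datetime.strptime(reply["time"], "%Y-%m-%d %H:%M:%S")
    return reply["response"], stamp


text, stamp = parse_reply(raw)
print(stamp.isoformat())  # prints 2024-07-18T18:14:56
```

The status check is defensive: api.py as written always sets `"status": 200` in the body, so a failure would normally surface as an HTTP error rather than as a different value in this field.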


license: apache-2.0

Model size: 2.72B params (Safetensors)
Tensor type: BF16