
CodeLlama 34B base model fine-tuned on the raw text chunks from the OpenAssistant-Guanaco dataset rather than on Q&A pairs, so it struggles to determine where an answer ends. It is recommended to use a stop string such as "### Human:" to prevent the model from talking to itself.

Prompt template:

### Human: {prompt}
### Assistant:
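As a reference for how the template and the stop string fit together, here is a minimal TypeScript sketch (the helper names are illustrative; the full client further below does the equivalent when calling the server):

// Wrap a user message in the Guanaco-style prompt format.
const buildPrompt = (userMessage: string): string =>
    `### Human: ${userMessage}\n### Assistant:`;

// Fallback: if the backend did not apply the stop sequence, truncate the
// generated text at the first occurrence of the stop string.
const trimAtStop = (generated: string, stop = '### Human:'): string => {
    const idx = generated.indexOf(stop);
    return idx === -1 ? generated : generated.slice(0, idx).trimEnd();
};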

Run the model via text-generation-inference

One GPU:

sudo docker run --gpus all --shm-size 1g -p 5000:80 -v $PWD/models:/data ghcr.io/huggingface/text-generation-inference:latest --max-total-tokens 4096 --quantize gptq --model-id mzbac/CodeLlama-34b-guanaco-gptq

Two GPUs:

docker run --gpus all --shm-size 1g -p 5000:80 -v $PWD/models:/data ghcr.io/huggingface/text-generation-inference:latest --max-total-tokens 4096 --max-input-length 4000 --max-batch-prefill-tokens 4096 --quantize gptq --num-shard 2 --model-id mzbac/CodeLlama-34b-guanaco-gptq
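Loading and dequantizing the weights can take a while, so it helps to wait for the server before querying it. A minimal sketch in TypeScript, assuming the host port mapping above (5000) and the /health route that text-generation-inference exposes; requires Node 18+ for the built-in fetch:

// Poll the server's health route until the model shards have loaded.
const waitUntilReady = async (baseUrl = 'http://127.0.0.1:5000'): Promise<void> => {
    for (let attempt = 0; attempt < 60; attempt++) {
        const res = await fetch(`${baseUrl}/health`).catch(() => undefined);
        if (res?.ok) return; // 200 once the model is ready to serve
        await new Promise((resolve) => setTimeout(resolve, 5000));
    }
    throw new Error('text-generation-inference did not become ready in time');
};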

Query the model via curl

curl 127.0.0.1:5000/generate \
    -X POST \
    -d '{"inputs":"### Human: Plan a trip to Japan for me\n### Assistant:", "parameters":{"max_new_tokens":2048, "stop": [
      "### Human:"
    ]}}' \
    -H 'Content-Type: application/json'
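The server responds with a JSON body whose generated_text field contains the completion; the TypeScript client below reads that field directly.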

Or from TypeScript:

npm install axios

import axios from 'axios';
import readline from 'readline';

const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

// Flatten the conversation into the "### Human: / ### Assistant:" format and
// send it to the text-generation-inference /generate endpoint.
const makeRequest = async (conversation: Array<{ role: string, content: string }>) => {
    const url = "http://127.0.0.1:5000/generate";
    const conversationString = conversation.map(
        (entry) => `### ${entry.role}: ${entry.content}`
    ).join('\n');

    const conversationWithPrompt = `${conversationString}\n### Assistant:`;

    const payload = {
        inputs: conversationWithPrompt,
        parameters: {
            temperature: 0.7,
            max_new_tokens: 2000,
            repetition_penalty: 1.03,
            top_k: 45,
            top_p: 0.95,
            typical_p: 0.95,
            stop: ['### Human']
        }
    };
    const headers = { 'Content-Type': 'application/json' };
    try {
        const response = await axios.post(url, payload, { headers });
        return response.data;
    } catch (error) {
        // Axios errors carry the HTTP status on error.response; anything else has no status.
        const status = axios.isAxiosError(error) ? error.response?.status : undefined;
        return `Failed to make request, status code: ${status}`;
    }
};

const main = async () => {
    console.log("Welcome to the CLI tool. Type 'exit' to quit.");

    const conversation: Array<{ role: string, content: string }> = [];

    const question = (prompt: string) => {
        return new Promise<string>(resolve => {
            rl.question(prompt, (answer) => {
                resolve(answer);
            });
        });
    };

    while (true) {
        const humanInput = await question("### Human: ");

        if (humanInput.toLowerCase() === 'exit') {
            console.log("Exiting...");
            rl.close();
            break;
        }

        conversation.push({ role: 'Human', content: humanInput });

        console.log("### Assistant:");
        const assistantOutput = await makeRequest(conversation);

        const assistantText = assistantOutput.generated_text || 'Failed to get a valid response.';

        console.log(assistantText);
        conversation.push({ role: 'Assistant', content: assistantText });
    }
};

main();
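To try it out, save the script under any name you like (for example chat.ts) and run it with ts-node, e.g. npx ts-node chat.ts; type exit at the prompt to quit.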