Instructions for using Cyrema/Llama-2-7b-Cesspit with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Cyrema/Llama-2-7b-Cesspit with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Cyrema/Llama-2-7b-Cesspit")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Cyrema/Llama-2-7b-Cesspit")
model = AutoModelForCausalLM.from_pretrained("Cyrema/Llama-2-7b-Cesspit")

Local Apps

vLLM

How to use Cyrema/Llama-2-7b-Cesspit with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Cyrema/Llama-2-7b-Cesspit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Cyrema/Llama-2-7b-Cesspit",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
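
For callers who prefer Python to curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names `build_completion_payload` and `request_completion` are hypothetical, and it assumes the server answers in the usual OpenAI-style completions shape with a `choices` list whose entries carry a `text` field.

```python
import json
import urllib.request

MODEL_ID = "Cyrema/Llama-2-7b-Cesspit"


def build_completion_payload(prompt, max_tokens=512, temperature=0.5):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,  # raw text only, no chat template
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def request_completion(prompt, base_url="http://localhost:8000", timeout=120.0):
    """POST the prompt to /v1/completions and return the completed text.

    Assumption: the reply follows the OpenAI-style completions format.
    """
    body = json.dumps(build_completion_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["text"]
```

With a server running, `request_completion("Once upon a time,")` would return the continuation. For a server listening on another port, such as the SGLang examples on port 30000, pass `base_url="http://localhost:30000"`.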

Use Docker

docker model run hf.co/Cyrema/Llama-2-7b-Cesspit

SGLang

How to use Cyrema/Llama-2-7b-Cesspit with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Cyrema/Llama-2-7b-Cesspit" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Cyrema/Llama-2-7b-Cesspit",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Cyrema/Llama-2-7b-Cesspit" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Cyrema/Llama-2-7b-Cesspit",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Cyrema/Llama-2-7b-Cesspit with Docker Model Runner:
```
docker model run hf.co/Cyrema/Llama-2-7b-Cesspit
```

LLaMa-2-7b The Pit Project/Cesspit.

Model Details

Backbone Model: LLaMA-2
Language(s): English
Library: HuggingFace Transformers
License: Use of this model is governed by the Meta license. To download the model weights and tokenizer, please visit the website and accept their License first.

Datasets Details

Scraped posts of a particular subject within an image board.
The dataset was heavily filtered in several ways to improve its coherence and its relevance to the origin and our goals.
The dataset used for our Cesspit model contains 272,637 entries.

Prompt Template

The model was not trained in an instruction or chat-style format. Please ensure your inference program sends only your raw input and does not inject anything else around it. Simply type whatever comes to mind and the model will attempt to complete it.
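
In Transformers, one way to keep the input raw is to pass a plain string to the text-generation pipeline rather than a list of chat messages. The sketch below is illustrative only: `load_generator` and `complete` are hypothetical helper names, and the sampling settings (`temperature=0.5`, `max_new_tokens=64`) are assumptions, not values recommended by the model authors.

```python
MODEL_ID = "Cyrema/Llama-2-7b-Cesspit"


def load_generator():
    """Create a text-generation pipeline for the model.

    transformers is imported lazily so that complete() can be tried
    with any compatible callable, without loading the weights.
    """
    from transformers import pipeline

    return pipeline("text-generation", model=MODEL_ID)


def complete(generator, prompt, max_new_tokens=64):
    """Send the prompt as a plain string and return the generated text."""
    if not isinstance(prompt, str):
        # A list of role/content messages would be treated as a chat,
        # which this model was not trained for.
        raise TypeError("prompt must be a plain string, not chat messages")
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,  # assumed value, borrowed from the curl examples
    )
    return outputs[0]["generated_text"]
```

A typical call would be `complete(load_generator(), "Once upon a time,")`. By default the returned text includes the prompt followed by the model's continuation.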

Hardware and Software

Hardware: Training our model took 3.8 GPU-hours on Nvidia RTX 4090 hardware.
Training Factors: We created this model using Axolotl.

Training details

We used a rank of 128 and an alpha of 16.
Our learning rate was 4e-4, with 10 warmup steps and a cosine scheduler, for 3 epochs.
Our micro-batch size was 5.
Sample packing was used.
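
To make these numbers concrete, the sketch below computes the learning rate at a given step for a linear warmup followed by cosine decay to zero, which is a common form of "cosine scheduler with warmup". Several things here are assumptions rather than facts from this card: the exact schedule shape, the total step count (not stated, so it is a parameter), and the reading of rank and alpha as LoRA hyperparameters, in which case the conventional adapter scaling is alpha divided by rank.

```python
import math

PEAK_LR = 4e-4      # learning rate stated above
WARMUP_STEPS = 10   # warmup steps stated above


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup to peak_lr, then cosine decay to zero.

    total_steps is hypothetical: the card does not state it.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def lora_scaling(rank, alpha):
    """Conventional LoRA scaling factor, assuming rank/alpha are LoRA settings."""
    return alpha / rank
```

Under these assumptions, `lora_scaling(128, 16)` gives 0.125, and `lr_at_step(10, total_steps)` equals the peak rate of 4e-4 for any `total_steps` larger than the warmup.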

Limitations

It is strongly recommended not to deploy this model into a real-world environment unless its behavior is well understood and explicit, strict limitations on the scope, impact, and duration of the deployment are enforced.

Safetensors

Model size: 7B params
Tensor type: F16