---
library_name: transformers
tags: []
---

# CapyLake-7B-v2-laser

This model is a finetune of [cognitivecomputations/WestLake-7B-v2-Laser](https://huggingface.co/cognitivecomputations/WestLake-7B-v2-laser) on [argilla/distilabel-capybara-dpo-7k-binarized](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized).

![image/webp](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/kx2uwS_kZ-rTAJiusSrAW.webp)

[Built with Distilabel](https://github.com/argilla-io/distilabel)

## Process

+ Realigned the chat template to ChatML
+ Trained for 1 epoch at a 5e-05 learning rate
+ Training took about 2 hours on a single H100
+ Cost was ~$8

An illustrative training sketch is included at the end of this card.

## Code Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/CapyLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Create an idea for a TV show and write a short pilot script"
inputs = tokenizer(text, return_tensors="pt")

# Generation hyperparameters
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,      # Maximum number of new tokens to generate
    do_sample=True,           # Enable sampling so temperature/top_k/top_p take effect
    temperature=0.7,          # Adjust for creativity (lower is less random)
    top_k=50,                 # Keep only the top-k tokens for sampling
    top_p=0.95,               # Nucleus sampling with this cumulative probability
    num_return_sequences=1,   # Number of sequences to generate
    no_repeat_ngram_size=2,   # Block exact repetition of any 2-gram
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Other Capy Models

SOLAR-10.7B-Capy-v1.0 is also on the way. There could be more depending on performance!

## Evaluations

TODO
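
## Training Sketch

The DPO finetune described under Process was not released with a training script. The following is a minimal, hypothetical sketch of a comparable setup using `trl`'s `DPOTrainer`: the base model, dataset, epoch count, and learning rate come from the notes above, while the batch size, gradient accumulation, precision settings, and output directory are assumptions. Depending on the `trl` version, the conversational `chosen`/`rejected` columns may also need to be flattened into plain-text `prompt`/`chosen`/`rejected` columns first.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "cognitivecomputations/WestLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Assumes the dataset is (or has been mapped to) the prompt/chosen/rejected
# format that DPOTrainer expects.
dataset = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")

config = DPOConfig(
    output_dir="capylake-7b-v2-laser-dpo",  # assumption, not from the card
    num_train_epochs=1,                     # matches the single epoch noted above
    learning_rate=5e-5,                     # matches the reported learning rate
    per_device_train_batch_size=1,          # assumption, not from the card
    gradient_accumulation_steps=8,          # assumption, not from the card
    bf16=True,                              # assumption: H100 supports bf16
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions take tokenizer= instead
)
trainer.train()
```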