---
pipeline_tag: text-generation
tags:
- text-generation-inference
- backpack
- backpackmodel
library_name: transformers
license: apache-2.0
datasets:
- openwebtext
language:
- en
---

# Model Card for Levanter-Backpack-1.4B
This is 1.4B parameter version of [Backpack architecture](https://arxiv.org/abs/2305.16765), intended to combine strong modeling performance 
with an interface for interpretability and control.

# Training Details

## Training Data
This model was trained on the [OpenWebText](https://huggingface.co/datasets/openwebtext) corpus.
## Training Procedure

This model was trained for 450k gradient steps and  cosine decaying learning rate from 1e-4 to zero, with a linear warmup of 5k steps.

# Environmental Impact

- **Hardware Type:**  v3-128 TPU (128 cores, 2TB Memory)
- **Hours used:** Roughly 8.6 days.
- **Cloud Provider:** Google Cloud Patform
- **Compute Region:** North America.

## Model Architecture and Objective

This model was trained to minimize the cross-entropy loss, and is a [Backpack language model](https://arxiv.org/pdf/2305.16765.pdf).

### Software

This model was trained with [Levanter](https://github.com/stanford-crfm/levanter/) and [Jax](https://github.com/google/jax).

### Loss Curve
![Loss Curve](assets/train_loss.png)

# How to Get Started with the Model

Please install `transformers`, `safetensors` and `torch` to use this model.

```bash
pip install transformers safetensors torch
```

Run the following Python code:

```python
import torch
import transformers
from transformers import AutoModelForCausalLM


model_id = "stanford-crfm/levanter-backpack-1b"
config = transformers.AutoConfig.from_pretrained(model_id, trust_remote_code=True)
torch_model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    config=config, 
    trust_remote_code=True
)
torch_model.eval()

input = torch.randint(0, 50264, (1, 512), dtype=torch.long)
torch_out = torch_model(input, position_ids=None,)
torch_out = torch.nn.functional.softmax(torch_out.logits, dim=-1)
print(torch_out.shape)
```