forgetting_gate_4_6_384_/ops/layer_with_visualization.py

Lanni-ni

add remote code + model files

5bd7474 verified 6 months ago


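import torch
import torch.nn
from typing import Any, Dict, Optional


class LayerWithVisualization(torch.nn.Module):
    """Base class for layers that can report plots of their internal state."""

    def __init__(self):
        super().__init__()
        # Toggled by LayerVisualizer; subclasses can check it before recording data.
        self.visualization_enabled = False

    def prepare(self):
        # Should be called before the training step
        pass

    def plot(self, options: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError()


class LayerVisualizer:
    """Collects every LayerWithVisualization inside a module and plots them together."""

    def __init__(self, module: torch.nn.Module, options: Optional[Dict[str, Any]] = None):
        self.modules = []
        # None instead of {} avoids sharing one mutable default between instances.
        self.options = options if options is not None else {}
        self.curr_options = None
        for n, m in module.named_modules():
            if isinstance(m, LayerWithVisualization):
                self.modules.append((n, m))

    def plot(self) -> Dict[str, Any]:
        # Merge the plots of all layers, prefixing each key with the module name.
        res = {}
        for n, m in self.modules:
            res.update({f"{n}/{k}": v for k, v in m.plot(self.curr_options).items()})
            m.visualization_enabled = False

        self.curr_options = None
        return res

    def prepare(self, options: Optional[Dict[str, Any]] = None):
        # Per-call options override the defaults given to the constructor.
        self.curr_options = self.options.copy()
        self.curr_options.update(options or {})

        for _, m in self.modules:
            m.prepare()
            m.visualization_enabled = True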
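
The intended call order appears to be `prepare()`, then a forward pass, then `plot()`. Below is a minimal sketch of that flow. It repeats condensed copies of the two classes so it runs on its own; `MeanTrackingLinear` and its `mean_activation` key are hypothetical names invented for illustration and are not part of this repository.

```python
from typing import Any, Dict

import torch


# Condensed copies of the two classes above, so this sketch runs standalone.
class LayerWithVisualization(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.visualization_enabled = False

    def prepare(self):
        pass

    def plot(self, options: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError()


class LayerVisualizer:
    def __init__(self, module, options=None):
        self.options = dict(options or {})
        self.curr_options = None
        self.modules = [(n, m) for n, m in module.named_modules()
                        if isinstance(m, LayerWithVisualization)]

    def prepare(self, options=None):
        self.curr_options = {**self.options, **(options or {})}
        for _, m in self.modules:
            m.prepare()
            m.visualization_enabled = True

    def plot(self):
        res = {}
        for n, m in self.modules:
            res.update({f"{n}/{k}": v for k, v in m.plot(self.curr_options).items()})
            m.visualization_enabled = False
        self.curr_options = None
        return res


# Hypothetical layer: records its mean output while visualization is enabled.
class MeanTrackingLinear(LayerWithVisualization):
    def __init__(self, n_in: int, n_out: int):
        super().__init__()
        self.linear = torch.nn.Linear(n_in, n_out)
        self.last_mean = None

    def prepare(self):
        # Drop any value left over from a previous visualization step.
        self.last_mean = None

    def forward(self, x):
        y = self.linear(x)
        if self.visualization_enabled:
            self.last_mean = y.detach().mean().item()
        return y

    def plot(self, options):
        return {"mean_activation": self.last_mean}


model = torch.nn.Sequential(
    MeanTrackingLinear(4, 8), torch.nn.ReLU(), MeanTrackingLinear(8, 2)
)
vis = LayerVisualizer(model)

vis.prepare()               # enables recording on both tracked layers
model(torch.randn(3, 4))    # forward pass fills last_mean
plots = vis.plot()          # collects results and disables recording again
print(sorted(plots))        # ['0/mean_activation', '2/mean_activation']
```

Keys are prefixed with the name reported by `named_modules()`, which is why the two layers in the `Sequential` show up as `0/...` and `2/...`.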