Instructions to use MebinThattil/tiny-llama-q4_0 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MebinThattil/tiny-llama-q4_0 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="MebinThattil/tiny-llama-q4_0",
	filename="tinyllama-1.1B-q4.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use MebinThattil/tiny-llama-q4_0 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MebinThattil/tiny-llama-q4_0
# Run inference directly in the terminal:
llama-cli -hf MebinThattil/tiny-llama-q4_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MebinThattil/tiny-llama-q4_0
# Run inference directly in the terminal:
llama-cli -hf MebinThattil/tiny-llama-q4_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MebinThattil/tiny-llama-q4_0
# Run inference directly in the terminal:
./llama-cli -hf MebinThattil/tiny-llama-q4_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MebinThattil/tiny-llama-q4_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf MebinThattil/tiny-llama-q4_0

Use Docker

docker model run hf.co/MebinThattil/tiny-llama-q4_0

LM Studio
Jan
Ollama
How to use MebinThattil/tiny-llama-q4_0 with Ollama:
```
ollama run hf.co/MebinThattil/tiny-llama-q4_0
```

Unsloth Studio new

How to use MebinThattil/tiny-llama-q4_0 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MebinThattil/tiny-llama-q4_0 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MebinThattil/tiny-llama-q4_0 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MebinThattil/tiny-llama-q4_0 to start chatting

Docker Model Runner
How to use MebinThattil/tiny-llama-q4_0 with Docker Model Runner:
```
docker model run hf.co/MebinThattil/tiny-llama-q4_0
```

Lemonade

How to use MebinThattil/tiny-llama-q4_0 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull MebinThattil/tiny-llama-q4_0

Run and chat with the model

lemonade run user.tiny-llama-q4_0-{{QUANT_TAG}}

List all available models

lemonade list

tiny-llama-q4_0 / llama_cpp /server /cli.py

MebinThattil

Upload folder using huggingface_hub

5d62acd verified 11 months ago

raw

history blame

3.27 kB

	from __future__ import annotations

	import argparse

	from typing import List, Literal, Union, Any, Type, TypeVar

	from pydantic import BaseModel


	def _get_base_type(annotation: Type[Any]) -> Type[Any]:
	if getattr(annotation, "__origin__", None) is Literal:
	assert hasattr(annotation, "__args__") and len(annotation.__args__) >= 1 # type: ignore
	return type(annotation.__args__[0]) # type: ignore
	elif getattr(annotation, "__origin__", None) is Union:
	assert hasattr(annotation, "__args__") and len(annotation.__args__) >= 1 # type: ignore
	non_optional_args: List[Type[Any]] = [
	arg for arg in annotation.__args__ if arg is not type(None) # type: ignore
	]
	if non_optional_args:
	return _get_base_type(non_optional_args[0])
	elif (
	getattr(annotation, "__origin__", None) is list
	or getattr(annotation, "__origin__", None) is List
	):
	assert hasattr(annotation, "__args__") and len(annotation.__args__) >= 1 # type: ignore
	return _get_base_type(annotation.__args__[0]) # type: ignore
	return annotation


	def _contains_list_type(annotation: Type[Any] \| None) -> bool:
	origin = getattr(annotation, "__origin__", None)

	if origin is list or origin is List:
	return True
	elif origin in (Literal, Union):
	return any(_contains_list_type(arg) for arg in annotation.__args__) # type: ignore
	else:
	return False


	def _parse_bool_arg(arg: str \| bytes \| bool) -> bool:
	if isinstance(arg, bytes):
	arg = arg.decode("utf-8")

	true_values = {"1", "on", "t", "true", "y", "yes"}
	false_values = {"0", "off", "f", "false", "n", "no"}

	arg_str = str(arg).lower().strip()

	if arg_str in true_values:
	return True
	elif arg_str in false_values:
	return False
	else:
	raise ValueError(f"Invalid boolean argument: {arg}")


	def add_args_from_model(parser: argparse.ArgumentParser, model: Type[BaseModel]):
	"""Add arguments from a pydantic model to an argparse parser."""

	for name, field in model.model_fields.items():
	description = field.description
	if field.default and description and not field.is_required():
	description += f" (default: {field.default})"
	base_type = (
	_get_base_type(field.annotation) if field.annotation is not None else str
	)
	list_type = _contains_list_type(field.annotation)
	if base_type is not bool:
	parser.add_argument(
	f"--{name}",
	dest=name,
	nargs="*" if list_type else None,
	type=base_type,
	help=description,
	)
	if base_type is bool:
	parser.add_argument(
	f"--{name}",
	dest=name,
	type=_parse_bool_arg,
	help=f"{description}",
	)


	T = TypeVar("T", bound=Type[BaseModel])


	def parse_model_from_args(model: T, args: argparse.Namespace) -> T:
	"""Parse a pydantic model from an argparse namespace."""
	return model(
	**{
	k: v
	for k, v in vars(args).items()
	if v is not None and k in model.model_fields
	}
	)