Metadata-Version: 2.1
Name: ctransformers
Version: 0.2.11
Summary: Python bindings for the Transformer models implemented in C/C++ using GGML library.
Home-page: https://github.com/marella/ctransformers
Author: Ravindra Marella
Author-email: mv.ravindra007@gmail.com
License: MIT
Keywords: ctransformers transformers ai llm
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
Provides-Extra: tests
License-File: LICENSE
# [C Transformers](https://github.com/marella/ctransformers) [![PyPI](https://img.shields.io/pypi/v/ctransformers)](https://pypi.org/project/ctransformers/) [![tests](https://github.com/marella/ctransformers/actions/workflows/tests.yml/badge.svg)](https://github.com/marella/ctransformers/actions/workflows/tests.yml) [![build](https://github.com/marella/ctransformers/actions/workflows/build.yml/badge.svg)](https://github.com/marella/ctransformers/actions/workflows/build.yml)

Python bindings for the Transformer models implemented in C/C++ using the [GGML](https://github.com/ggerganov/ggml) library.

> Also see [ChatDocs](https://github.com/marella/chatdocs)

- [Supported Models](#supported-models)
- [Installation](#installation)
- [Usage](#usage)
  - [Hugging Face Hub](#hugging-face-hub)
  - [LangChain](#langchain)
  - [GPU](#gpu)
- [Documentation](#documentation)
- [License](#license)
## Supported Models

| Models                | Model Type  |
| :-------------------- | ----------- |
| GPT-2                 | `gpt2`      |
| GPT-J, GPT4All-J      | `gptj`      |
| GPT-NeoX, StableLM    | `gpt_neox`  |
| LLaMA                 | `llama`     |
| MPT                   | `mpt`       |
| Dolly V2              | `dolly-v2`  |
| StarCoder, StarChat   | `starcoder` |
| Falcon (Experimental) | `falcon`    |
## Installation

```sh
pip install ctransformers
```

For GPU (CUDA) support, set the environment variable `CT_CUBLAS=1` and install from source using:

```sh
CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
```

<details>
<summary><strong>Show commands for Windows</strong></summary><br>

On Windows PowerShell run:

```sh
$env:CT_CUBLAS=1
pip install ctransformers --no-binary ctransformers
```

On Windows Command Prompt run:

```sh
set CT_CUBLAS=1
pip install ctransformers --no-binary ctransformers
```

</details>
## Usage

It provides a unified interface for all models:

```py
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-gpt-2.bin', model_type='gpt2')

print(llm('AI is going to'))
```

[Run in Google Colab](https://colab.research.google.com/drive/1GMhYMUAv_TyZkpfvUI1NirM8-9mCXQyL)
If you are getting an `illegal instruction` error, try using `lib='avx'` or `lib='basic'`:
```py
llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-gpt-2.bin', model_type='gpt2', lib='avx')
```
It provides a generator interface for more control:

```py
tokens = llm.tokenize('AI is going to')

for token in llm.generate(tokens):
    print(llm.detokenize(token))
```

It can be used with a custom or Hugging Face tokenizer:

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')

tokens = tokenizer.encode('AI is going to')

for token in llm.generate(tokens):
    print(tokenizer.decode(token))
```

It also provides access to the low-level C API. See the [Documentation](#documentation) section below.
### Hugging Face Hub

It can be used with models hosted on the Hub:

```py
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
```

If a model repo has multiple model files (`.bin` files), specify a model file using:

```py
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml', model_file='ggml-model.bin')
```
It can be used with your own models uploaded to the Hub. For a better user experience, upload only one model per repo.

To use it with your own model, add a `config.json` file to your model repo specifying the `model_type`:
```json
{
  "model_type": "gpt2"
}
```

You can also specify additional parameters under `task_specific_params.text-generation`.
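For illustration, a config that pins a few generation defaults could look like the following sketch; the recognized parameter names are the generation parameters listed under [Config](#config) below:

```json
{
  "model_type": "gpt2",
  "task_specific_params": {
    "text-generation": {
      "temperature": 0.8,
      "top_k": 40,
      "top_p": 0.95,
      "max_new_tokens": 256
    }
  }
}
```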
See [marella/gpt-2-ggml](https://huggingface.co/marella/gpt-2-ggml/blob/main/config.json) for a minimal example and [marella/gpt-2-ggml-example](https://huggingface.co/marella/gpt-2-ggml-example/blob/main/config.json) for a full example.
### LangChain

It is integrated into LangChain. See [LangChain docs](https://python.langchain.com/docs/ecosystem/integrations/ctransformers).
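For reference, a minimal sketch of loading a model through LangChain's `CTransformers` wrapper (see the linked docs for the authoritative, up-to-date usage):

```py
from langchain.llms import CTransformers

# Load a GGML model from the Hugging Face Hub via LangChain's wrapper.
llm = CTransformers(model='marella/gpt-2-ggml')

print(llm('AI is going to'))
```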
### GPU

> **Note:** Currently only LLaMA models have GPU support.

To run some of the model layers on GPU, set the `gpu_layers` parameter:

```py
llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-llama.bin', model_type='llama', gpu_layers=50)
```

[Run in Google Colab](https://colab.research.google.com/drive/1Ihn7iPCYiqlTotpkqa1tOhUIpJBrJ1Tp)
## Documentation

<!-- API_DOCS -->

### Config

| Parameter            | Type        | Description                                               | Default |
| :------------------- | :---------- | :-------------------------------------------------------- | :------ |
| `top_k`              | `int`       | The top-k value to use for sampling.                      | `40`    |
| `top_p`              | `float`     | The top-p value to use for sampling.                      | `0.95`  |
| `temperature`        | `float`     | The temperature to use for sampling.                      | `0.8`   |
| `repetition_penalty` | `float`     | The repetition penalty to use for sampling.               | `1.1`   |
| `last_n_tokens`      | `int`       | The number of last tokens to use for repetition penalty.  | `64`    |
| `seed`               | `int`       | The seed value to use for sampling tokens.                | `-1`    |
| `max_new_tokens`     | `int`       | The maximum number of new tokens to generate.             | `256`   |
| `stop`               | `List[str]` | A list of sequences to stop generation when encountered.  | `None`  |
| `stream`             | `bool`      | Whether to stream the generated text.                     | `False` |
| `reset`              | `bool`      | Whether to reset the model state before generating text.  | `True`  |
| `batch_size`         | `int`       | The batch size to use for evaluating tokens.              | `8`     |
| `threads`            | `int`       | The number of threads to use for evaluating tokens.       | `-1`    |
| `context_length`     | `int`       | The maximum context length to use.                        | `-1`    |
| `gpu_layers`         | `int`       | The number of layers to run on GPU.                       | `0`     |

> **Note:** Currently only LLaMA and MPT models support the `context_length` parameter and only LLaMA models support the `gpu_layers` parameter.
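These parameters can be set when loading a model by passing them as keyword arguments to `from_pretrained` (as with `gpu_layers` in the GPU section above), and most of them can also be overridden per call (see `LLM.__call__` below). A minimal sketch:

```py
from ctransformers import AutoModelForCausalLM

# Set config parameters at load time.
llm = AutoModelForCausalLM.from_pretrained(
    'marella/gpt-2-ggml',
    max_new_tokens=256,
    temperature=0.8,
    threads=4,
)

# Override generation parameters for a single call.
print(llm('AI is going to', temperature=0.2, max_new_tokens=64))
```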
### <kbd>class</kbd> `AutoModelForCausalLM`

---

#### <kbd>classmethod</kbd> `AutoModelForCausalLM.from_pretrained`

```python
from_pretrained(
    model_path_or_repo_id: str,
    model_type: Optional[str] = None,
    model_file: Optional[str] = None,
    config: Optional[ctransformers.hub.AutoConfig] = None,
    lib: Optional[str] = None,
    local_files_only: bool = False,
    **kwargs
) → LLM
```

Loads the language model from a local file or remote repo.

**Args:**

- <b>`model_path_or_repo_id`</b>: The path to a model file or directory or the name of a Hugging Face Hub model repo.
- <b>`model_type`</b>: The model type.
- <b>`model_file`</b>: The name of the model file in repo or directory.
- <b>`config`</b>: `AutoConfig` object.
- <b>`lib`</b>: The path to a shared library or one of `avx2`, `avx`, `basic`.
- <b>`local_files_only`</b>: Whether or not to only look at local files (i.e., do not try to download the model).

**Returns:**
`LLM` object.
### <kbd>class</kbd> `LLM`

### <kbd>method</kbd> `LLM.__init__`

```python
__init__(
    model_path: str,
    model_type: str,
    config: Optional[ctransformers.llm.Config] = None,
    lib: Optional[str] = None
)
```

Loads the language model from a local file.

**Args:**

- <b>`model_path`</b>: The path to a model file.
- <b>`model_type`</b>: The model type.
- <b>`config`</b>: `Config` object.
- <b>`lib`</b>: The path to a shared library or one of `avx2`, `avx`, `basic`.
---

##### <kbd>property</kbd> LLM.config

The config object.

---

##### <kbd>property</kbd> LLM.context_length

The context length of the model.

---

##### <kbd>property</kbd> LLM.embeddings

The input embeddings.

---

##### <kbd>property</kbd> LLM.eos_token_id

The end-of-sequence token ID.

---

##### <kbd>property</kbd> LLM.logits

The unnormalized log probabilities.

---

##### <kbd>property</kbd> LLM.model_path

The path to the model file.

---

##### <kbd>property</kbd> LLM.model_type

The model type.

---

##### <kbd>property</kbd> LLM.vocab_size

The number of tokens in the vocabulary.
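These are read-only attributes on a loaded model; for example, continuing from the `llm` loaded in the Usage section:

```py
# Inspect a loaded model.
print(llm.model_type)      # e.g. 'gpt2'
print(llm.vocab_size)      # number of tokens in the vocabulary
print(llm.context_length)  # context length of the model
print(llm.eos_token_id)    # end-of-sequence token ID
```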
---

#### <kbd>method</kbd> `LLM.detokenize`

```python
detokenize(tokens: Sequence[int], decode: bool = True) → Union[str, bytes]
```

Converts a list of tokens to text.

**Args:**

- <b>`tokens`</b>: The list of tokens.
- <b>`decode`</b>: Whether to decode the text as a UTF-8 string.

**Returns:**
The combined text of all tokens.
---

#### <kbd>method</kbd> `LLM.embed`

```python
embed(
    input: Union[str, Sequence[int]],
    batch_size: Optional[int] = None,
    threads: Optional[int] = None
) → List[float]
```

Computes embeddings for a text or list of tokens.

> **Note:** Currently only LLaMA models support embeddings.

**Args:**

- <b>`input`</b>: The input text or list of tokens to get embeddings for.
- <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`

**Returns:**
The input embeddings.
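A minimal sketch, assuming a local LLaMA GGML model (the path below is a placeholder):

```py
from ctransformers import AutoModelForCausalLM

# Embeddings are currently only supported for LLaMA models.
llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-llama.bin', model_type='llama')

embedding = llm.embed('AI is going to')
print(len(embedding))  # one float per embedding dimension
```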
---

#### <kbd>method</kbd> `LLM.eval`

```python
eval(
    tokens: Sequence[int],
    batch_size: Optional[int] = None,
    threads: Optional[int] = None
) → None
```

Evaluates a list of tokens.

**Args:**

- <b>`tokens`</b>: The list of tokens to evaluate.
- <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
---

#### <kbd>method</kbd> `LLM.generate`

```python
generate(
    tokens: Sequence[int],
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    batch_size: Optional[int] = None,
    threads: Optional[int] = None,
    reset: Optional[bool] = None
) → Generator[int, NoneType, NoneType]
```

Generates new tokens from a list of tokens.

**Args:**

- <b>`tokens`</b>: The list of tokens to generate tokens from.
- <b>`top_k`</b>: The top-k value to use for sampling. Default: `40`
- <b>`top_p`</b>: The top-p value to use for sampling. Default: `0.95`
- <b>`temperature`</b>: The temperature to use for sampling. Default: `0.8`
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`
- <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
- <b>`reset`</b>: Whether to reset the model state before generating text. Default: `True`

**Returns:**
The generated tokens.
---

#### <kbd>method</kbd> `LLM.is_eos_token`

```python
is_eos_token(token: int) → bool
```

Checks if a token is an end-of-sequence token.

**Args:**

- <b>`token`</b>: The token to check.

**Returns:**
`True` if the token is an end-of-sequence token, else `False`.
---

#### <kbd>method</kbd> `LLM.reset`

```python
reset() → None
```

Resets the model state.

---

#### <kbd>method</kbd> `LLM.sample`

```python
sample(
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None
) → int
```

Samples a token from the model.

**Args:**

- <b>`top_k`</b>: The top-k value to use for sampling. Default: `40`
- <b>`top_p`</b>: The top-p value to use for sampling. Default: `0.95`
- <b>`temperature`</b>: The temperature to use for sampling. Default: `0.8`
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`

**Returns:**
The sampled token.
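Together, `tokenize`, `eval`, `sample`, `is_eos_token`, and `detokenize` are enough to write a generation loop by hand; a minimal sketch, with `llm` loaded as in the Usage section:

```py
# A hand-rolled generation loop built from the low-level methods above.
tokens = llm.tokenize('AI is going to')

for _ in range(20):  # generate at most 20 new tokens
    llm.eval(tokens)      # feed the pending tokens to the model
    token = llm.sample()  # sample the next token from the current logits
    if llm.is_eos_token(token):
        break
    print(llm.detokenize([token]), end='', flush=True)
    tokens = [token]      # only the new token needs to be evaluated next
```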
---

#### <kbd>method</kbd> `LLM.tokenize`

```python
tokenize(text: str) → List[int]
```

Converts a text into a list of tokens.

**Args:**

- <b>`text`</b>: The text to tokenize.

**Returns:**
The list of tokens.
---

#### <kbd>method</kbd> `LLM.__call__`

```python
__call__(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    batch_size: Optional[int] = None,
    threads: Optional[int] = None,
    stop: Optional[Sequence[str]] = None,
    stream: Optional[bool] = None,
    reset: Optional[bool] = None
) → Union[str, Generator[str, NoneType, NoneType]]
```

Generates text from a prompt.

**Args:**

- <b>`prompt`</b>: The prompt to generate text from.
- <b>`max_new_tokens`</b>: The maximum number of new tokens to generate. Default: `256`
- <b>`top_k`</b>: The top-k value to use for sampling. Default: `40`
- <b>`top_p`</b>: The top-p value to use for sampling. Default: `0.95`
- <b>`temperature`</b>: The temperature to use for sampling. Default: `0.8`
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`
- <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
- <b>`stop`</b>: A list of sequences to stop generation when encountered. Default: `None`
- <b>`stream`</b>: Whether to stream the generated text. Default: `False`
- <b>`reset`</b>: Whether to reset the model state before generating text. Default: `True`

**Returns:**
The generated text.
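The return type depends on `stream`: with `stream=True` the call returns a generator of text pieces instead of a single string. For example, with `llm` loaded as in the Usage section:

```py
# Stream the generated text piece by piece.
for text in llm('AI is going to', stream=True, max_new_tokens=64):
    print(text, end='', flush=True)
```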
<!-- API_DOCS -->

## License

[MIT](https://github.com/marella/ctransformers/blob/main/LICENSE)