
Quantized version of Universal-NER/UniNER-7B-all

Universal-NER/UniNER-7B-all quantized to 4-bit with GPTQ and stored with a 1GB shard size.

Model Description

The model Universal-NER/UniNER-7B-all was quantized to 4-bit with group_size 128 and act-order=True, using the auto-gptq integration in transformers (https://huggingface.co/blog/gptq-integration).
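As a rough illustration, the settings above could be expressed with the `GPTQConfig` class from transformers as sketched below. This is not the exact script used for this model: the calibration dataset (`"c4"`), the output directory name, and the helper name `quantize_and_save` are assumptions made for the example. Running it would also need a GPU and the `optimum` and `auto-gptq` packages.

```python
# Settings stated in this model card. act-order corresponds to desc_act in auto-gptq.
QUANT_SETTINGS = {"bits": 4, "group_size": 128, "desc_act": True}


def quantize_and_save(
    model_id="Universal-NER/UniNER-7B-all",
    out_dir="UniNER-7B-all-GPTQ",  # hypothetical output directory
    dataset="c4",                  # assumption: calibration data is not stated in the card
):
    """Quantize the full-precision model with GPTQ and save it in 1GB shards."""
    # Imported lazily so the settings above can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    config = GPTQConfig(tokenizer=tokenizer, dataset=dataset, **QUANT_SETTINGS)

    # Quantization runs while the model is loaded with a GPTQConfig.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=config, device_map="auto"
    )

    model.save_pretrained(out_dir, max_shard_size="1GB")
    tokenizer.save_pretrained(out_dir)
    return out_dir
```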

Evaluation

TODO

Prompt template

The prompt template is the same as for the full-precision model:

prompt_template = """A virtual assistant answers questions from a user based on the provided text.
USER: Text: {input_text}
ASSISTANT: I’ve read this text.
USER: What describes {entity_name} in the text?
ASSISTANT:
"""

Usage

For best results, format the input according to the prompt template above during inference:

prompt = prompt_template.format_map({"input_text": "Cologne is a great city in Germany - maybe even the greatest ;)", "entity_name": "city"})
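Because the template asks about a single entity type, querying several types for the same text means building one prompt per type. The sketch below shows one way to do that. The helper name `build_prompts` and the second entity type (`"country"`) are illustrative assumptions, not part of the original card.

```python
# Same template as in the "Prompt template" section, repeated so the example is self-contained.
prompt_template = """A virtual assistant answers questions from a user based on the provided text.
USER: Text: {input_text}
ASSISTANT: I’ve read this text.
USER: What describes {entity_name} in the text?
ASSISTANT:
"""


def build_prompts(input_text, entity_names):
    """Return a dict mapping each entity type to its formatted prompt."""
    return {
        name: prompt_template.format_map({"input_text": input_text, "entity_name": name})
        for name in entity_names
    }


prompts = build_prompts(
    "Cologne is a great city in Germany - maybe even the greatest ;)",
    ["city", "country"],  # "country" is an illustrative second entity type
)
print(prompts["city"])
```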

The model is small enough to be loaded in free-tier Colab with a T4 GPU: https://gist.github.com/sebastianschramm/9903b2714e30d870d7e1e097c6b5c9e3
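For readers who prefer not to open the gist, a minimal loading and generation sketch might look like the following. It is not a copy of the gist. It assumes `transformers`, `optimum` and `auto-gptq` are installed and a CUDA GPU is available. The helper names, `max_new_tokens=64`, and the assumption that the model replies with a JSON-style list of strings are illustrative choices, so the parser falls back to an empty list when the reply is not in that form.

```python
import json

MODEL_ID = "SebastianSchramm/UniNER-7B-all-GPTQ-4bit-128g-actorder_True"


def parse_entities(reply):
    """Parse a reply assumed to be a JSON list; return [] if it is anything else."""
    try:
        parsed = json.loads(reply.strip())
    except json.JSONDecodeError:
        return []
    return [str(item) for item in parsed] if isinstance(parsed, list) else []


def extract_entities(prompt, max_new_tokens=64):
    """Load the quantized model, generate a reply for `prompt`, and parse it."""
    # Imported lazily so parse_entities can be used without the GPU stack installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return parse_entities(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Calling `extract_entities(prompt)` with a prompt built from the template downloads the weights on first use, and in a real application the model and tokenizer would be loaded once and reused across prompts.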

License

The original full-precision model and its associated data are released under the CC BY-NC 4.0 license. Hence, the same license applies to the 4-bit version.
