philschmid (HF staff) committed
Commit eca067e
1 Parent(s): bf38c53
1_Pooling/config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false
+ }
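This pooling config selects masked mean pooling over the 768-dimensional token embeddings (CLS and max pooling disabled). As a minimal illustrative sketch, not part of this repo, mean pooling amounts to:

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).type_as(token_embeddings)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)   # sum only over real (non-padding) tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)        # number of real tokens per sequence
    return summed / counts                          # (batch, 768) sentence embedding
```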
README.md ADDED
@@ -0,0 +1,77 @@
+ ---
+ license: mit
+ tags:
+ - setfit
+ - endpoints-template
+ - text-classification
+ ---
+
+ # SetFit AG News
+
+ This is a [SetFit](https://github.com/huggingface/setfit/tree/main) classifier fine-tuned on the [AG News](https://huggingface.co/datasets/ag_news) dataset.
+ The model was created following the [Outperform OpenAI GPT-3 with SetFit for text-classification](https://www.philschmid.de/getting-started-setfit) blog post by [Philipp Schmid](https://www.linkedin.com/in/philipp-schmid-a6a2bb196/).
+
+ The model achieves an accuracy of 0.87 on the test set and was trained with only `32` examples in total (8 per class).
+
+ ```bash
+ ***** Running evaluation *****
+ model used: sentence-transformers/all-mpnet-base-v2
+ train dataset: 32 samples
+ accuracy: 0.8731578947368421
+ ```
+
+ #### What is SetFit?
+
+ [SetFit](https://arxiv.org/abs/2209.11055) is a new approach for building highly accurate text-classification models from limited labeled data. SetFit outperforms GPT-3 on 7 out of 11 tasks while being 1600x smaller.
+ Check out the blog post to learn more: [Outperform OpenAI GPT-3 with SetFit for text-classification](https://www.philschmid.de/getting-started-setfit)
+
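+ You can also run the classifier locally with the [`setfit`](https://github.com/huggingface/setfit) library. A minimal sketch (the repo id is taken from the handler link below):
+
+ ```python
+ from setfit import SetFitModel
+
+ # load the fine-tuned SetFit classifier from the Hub
+ model = SetFitModel.from_pretrained("philschmid/setfit-ag-news-endpoint")
+
+ # class probabilities for a sample headline (order: World, Sports, Business, Sci/Tech)
+ probs = model.predict_proba(["Stocks rally as quarterly earnings beat expectations"])
+ print(probs)
+ ```
+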
+ # Inference Endpoints
+
+ The model repository also implements a generic custom `handler.py` as an example of how to use `SetFit` models with [inference-endpoints](https://hf.co/inference-endpoints).
+
+ Code: https://huggingface.co/philschmid/setfit-ag-news-endpoint/blob/main/handler.py
+
+ ![result](res.png)
+
+ ## Send requests with Python
+
+ We are going to use the `requests` library to send our requests (make sure you have it installed: `pip install requests`).
+
+ ```python
+ import requests as r
+
+ ENDPOINT_URL = ""  # url of your endpoint
+ HF_TOKEN = ""  # your Hugging Face access token
+
+ # payload sample
+ regular_payload = { "inputs": "The New Customers Are In Town Today's customers are increasingly demanding, in Asia as elsewhere in the world. Henry Astorga describes the complex reality faced by today's marketers, which includes much higher expectations than we have been used to. Today's customers want performance, and they want it now!"}
+
+ # HTTP headers for authorization
+ headers = {
+     "Authorization": f"Bearer {HF_TOKEN}",
+     "Content-Type": "application/json"
+ }
+
+ # send request
+ response = r.post(ENDPOINT_URL, headers=headers, json=regular_payload)
+ classified = response.json()
+
+ print(classified)
+ ```
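+
+ The response contains one score per AG News class (format per `handler.py`; the numbers below are illustrative):
+
+ ```json
+ [
+   {"label": "World", "score": 0.03},
+   {"label": "Sports", "score": 0.02},
+   {"label": "Business", "score": 0.89},
+   {"label": "Sci/Tech", "score": 0.06}
+ ]
+ ```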
+
+ **curl example**
+
+ ```bash
+ curl https://ak7gduay2ypyr9vp.us-east-1.aws.endpoints.huggingface.cloud \
+ -X POST \
+ -d '{"inputs": "New customers are increasingly demanding, in Asia as elsewhere in the world."}' \
+ -H "Authorization: Bearer XXX" \
+ -H "Content-Type: application/json"
+ ```
__pycache__/handler.cpython-39.pyc ADDED
Binary file (1.37 kB).
 
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "/home/ubuntu/.cache/torch/sentence_transformers/sentence-transformers_all-mpnet-base-v2/",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.23.1",
+   "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.0.0",
+     "transformers": "4.6.1",
+     "pytorch": "1.8.1"
+   }
+ }
handler.py ADDED
@@ -0,0 +1,27 @@
+ from typing import Dict, List, Any
+ from setfit import SetFitModel
+
+
+ class EndpointHandler:
+     def __init__(self, path=""):
+         # load model
+         self.model = SetFitModel.from_pretrained(path)
+         # ag_news id to label mapping
+         self.id2label = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}
+
+     def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
+         """
+         data args:
+             inputs (:obj:`str`)
+         Return:
+             A :obj:`list` | `dict`: will be serialized and returned
+         """
+         # get inputs
+         inputs = data.pop("inputs", data)
+         if isinstance(inputs, str):
+             inputs = [inputs]
+
+         # run normal prediction; note the [0] keeps only the first input's scores
+         scores = self.model.predict_proba(inputs)[0]
+
+         return [{"label": self.id2label[i], "score": score.item()} for i, score in enumerate(scores)]
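For local testing, the handler can be driven directly without deploying an endpoint. A minimal sketch (it assumes the repository has been cloned to the current directory and `setfit` is installed):

```python
from handler import EndpointHandler

# point the handler at the cloned repository directory
my_handler = EndpointHandler(path=".")

# simulate an inference-endpoints request body
payload = {"inputs": "Manchester United snatch a late winner in the derby"}

# returns one {"label", "score"} dict per AG News class
print(my_handler(payload))
```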
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70d5c2cb3d62b4ed52b0fceee866879cb89f76a6fb9541c1870786037eb150aa
+ size 25399
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
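This `modules.json` wires the sentence-transformers body as Transformer → Pooling → Normalize, so the encoder emits L2-normalized 768-dimensional embeddings, which the SetFit head in `model_head.pkl` then classifies. A minimal sketch of loading just the encoder (repo id assumed from the README):

```python
from sentence_transformers import SentenceTransformer

# loads the Transformer -> Pooling -> Normalize pipeline declared in modules.json
encoder = SentenceTransformer("philschmid/setfit-ag-news-endpoint")

embeddings = encoder.encode(["A short news headline"])
print(embeddings.shape)  # (1, 768); rows have unit norm thanks to the Normalize module
```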
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:393137b7f36cac72e781dc14923ecf49a05927805b148d519dd4bab662ed5b4d
+ size 438014769
requirements.txt ADDED
File without changes
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 384,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "name_or_path": "/home/ubuntu/.cache/torch/sentence_transformers/sentence-transformers_all-mpnet-base-v2/",
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "special_tokens_map_file": null,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render.