Matthijs Hollemans committed on
Commit
277b711
1 Parent(s): 3fba4cf

add model card

Files changed (1)
  1. README.md +63 -0
README.md ADDED
@@ -0,0 +1,63 @@
---
language: en
license: apache-2.0
datasets:
- sst-2
---

# DistilBERT optimized for Apple Neural Engine

This is the [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) model, optimized for the Apple Neural Engine (ANE) as described in the article [Deploying Transformers on the Apple Neural Engine](https://machinelearning.apple.com/research/neural-engine-transformers).

The source code is taken from Apple's [ml-ane-transformers](https://github.com/apple/ml-ane-transformers) GitHub repo, modified slightly to make it usable from the 🤗 Transformers library.

## How to use

Usage example:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_checkpoint = "apple/ane-distilbert-base-uncased-finetuned-sst-2-english"

# trust_remote_code=True is needed because the ANE-optimized model code
# lives in the model repository rather than in the Transformers library.
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    model_checkpoint, trust_remote_code=True, return_dict=False,
)

# Pad every input to a fixed length of 128 tokens.
inputs = tokenizer(
    ["The Neural Engine is really fast"],
    return_tensors="pt",
    max_length=128,
    padding="max_length",
)

# Inference only, so gradient tracking is switched off.
with torch.no_grad():
    outputs = model(**inputs)
```
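
The example above stops at the raw model outputs. As a minimal sketch of the usual next step, the snippet below turns one example's logits into a label and a probability with a softmax. The label order in `ID2LABEL` is an assumption based on the original SST-2 checkpoint and should be checked against `model.config.id2label`; `classify` and the stand-in logits are hypothetical names and values used only for illustration.

```python
import math

# Assumed label order for the SST-2 checkpoint; verify it against
# model.config.id2label before relying on it.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, probability) for one example's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Stand-in logits. With the model above, and assuming the first element of
# `outputs` holds the logits, outputs[0].flatten().tolist() would supply
# real values.
label, prob = classify([-2.0, 3.0])
print(label, round(prob, 3))
```

Because the model is loaded with `return_dict=False`, `outputs` is a plain tuple rather than an object with a `.logits` attribute, which is why the comment indexes into it.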

## Using the model with Core ML

PyTorch does not utilize the ANE, and running this version of the model with PyTorch on the CPU or GPU may actually be slower than the original. To take advantage of the hardware acceleration of the ANE, use the Core ML version of the model, **DistilBERT_fp16.mlpackage**.

Core ML usage example from Python:

```python
import coremltools as ct
import numpy as np

mlmodel = ct.models.MLModel("DistilBERT_fp16.mlpackage")

# `tokenizer` is the AutoTokenizer created in the PyTorch example above.
inputs = tokenizer(
    ["The Neural Engine is really fast"],
    return_tensors="np",
    max_length=128,
    padding="max_length",
)

# The tokenizer returns int64 arrays; cast them to int32 for Core ML.
outputs_coreml = mlmodel.predict({
    "input_ids": inputs["input_ids"].astype(np.int32),
    "attention_mask": inputs["attention_mask"].astype(np.int32),
})
```

To use the model from Swift, you will need to tokenize the input yourself according to the BERT rules. You can find a Swift implementation of the [BERT tokenizer here](https://github.com/huggingface/swift-coreml-transformers).
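
To make "the BERT rules" concrete, here is a simplified sketch of what a hand-written tokenizer has to do: lowercase the text, split it into words and punctuation, break each word into WordPiece sub-words by greedy longest match, wrap the result in `[CLS]` and `[SEP]`, and pad to a fixed length with a matching attention mask. It is written in Python to match the examples above. The tiny `VOCAB` and its ids are made up for illustration, and the helper names `wordpiece` and `encode` are hypothetical; a real implementation would load the model's own vocabulary file and also handle details this sketch skips, such as accent stripping.

```python
import re

# Toy vocabulary for illustration only; the real model uses its own vocab
# file with different ids.
VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "the": 4, "neural": 5, "engine": 6, "is": 7, "fast": 8,
    "real": 9, "##ly": 10, "!": 11,
}


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces get "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched, so the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def encode(text, vocab, max_length=128):
    """Return (input_ids, attention_mask), both padded to max_length."""
    words = re.findall(r"\w+|[^\w\s]", text.lower())
    tokens = ["[CLS]"]
    for word in words:
        tokens.extend(wordpiece(word, vocab))
    tokens = tokens[: max_length - 1] + ["[SEP]"]  # truncate, keep [SEP]
    input_ids = [vocab[t] for t in tokens]
    attention_mask = [1] * len(input_ids)
    padding = max_length - len(input_ids)
    input_ids += [vocab["[PAD]"]] * padding
    attention_mask += [0] * padding
    return input_ids, attention_mask


ids, mask = encode("The Neural Engine is really fast!", VOCAB, max_length=16)
print(ids)
print(mask)
```

The fixed `max_length` mirrors the `padding="max_length"` setting in the Python examples, on the assumption that the Core ML model expects inputs of a fixed length of 128 tokens.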