joaopn committed
Commit b39a7be
1 Parent(s): cd231ec

Update README.md

  1. README.md +106 -3
README.md CHANGED
@@ -1,3 +1,106 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - google-research-datasets/go_emotions
+ language:
+ - en
+ tags:
+ - text-classification
+ - onnx
+ - fp16
+ - roberta
+ - emotions
+ - multi-class-classification
+ - multi-label-classification
+ - optimum
+ inference: false
+ ---
+
+ This model is an FP16-optimized version of [SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions). It runs exclusively on the GPU.
+
+ On an RTX 4090, it is about **2x faster** than the base ONNX version ([SamLowe/roberta-base-go_emotions-onnx](https://huggingface.co/SamLowe/roberta-base-go_emotions-onnx)) and **3x faster** than the PyTorch version. The speedup depends chiefly on your GPU's FP16:FP32 throughput ratio. For more benchmarks and sample code, see [https://github.com/joaopn/gpu_benchmark_goemotions](https://github.com/joaopn/gpu_benchmark_goemotions).
+
+ **Accuracy**: On a test set of 10K Reddit comments, the mean label probability difference from the PyTorch version was ~1e-4. Metrics (accuracy, F1) are essentially identical to those of the original model.
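
The README does not say exactly how the "mean label probability difference" was computed. One plausible reading is the mean absolute difference between the two models' per-label probabilities. The sketch below assumes that reading; the function name and argument names are hypothetical, and both inputs are taken to be arrays of shape `(n_samples, n_labels)` holding sigmoid probabilities.

```python
import numpy as np

def mean_probability_difference(probs_ref, probs_fp16):
    """Mean absolute per-label probability difference (assumed metric)."""
    # Compare in float64 so the comparison itself adds no rounding error
    ref = np.asarray(probs_ref, dtype=np.float64)
    fp16 = np.asarray(probs_fp16, dtype=np.float64)
    if ref.shape != fp16.shape:
        raise ValueError("Both arrays must have shape (n_samples, n_labels)")
    return float(np.mean(np.abs(ref - fp16)))
```

Under this definition, a value near 1e-4 would mean the FP16 probabilities deviate from the reference by about one part in ten thousand on average.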
+
+ ### Usage
+
+ The model was generated with:
+
+ ```python
+ from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification, AutoOptimizationConfig
+
+ # Load the base ONNX model on the GPU
+ model_id_onnx = "SamLowe/roberta-base-go_emotions-onnx"
+ file_name = "onnx/model.onnx"
+ model = ORTModelForSequenceClassification.from_pretrained(model_id_onnx, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': 0})
+
+ # O4 applies the O3 graph optimizations plus FP16 mixed precision (GPU only)
+ optimizer = ORTOptimizer.from_pretrained(model)
+ optimization_config = AutoOptimizationConfig.O4()
+ optimizer.optimize(save_dir='roberta-base-go_emotions-onnx-fp16', optimization_config=optimization_config)
+ ```
+
+ You will need the GPU version of ONNX Runtime. It can be installed with:
+
+ ```
+ pip install optimum[onnxruntime-gpu] --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
+ ```
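
Because the model runs only on the GPU, it may help to confirm after installation that ONNX Runtime actually exposes the CUDA execution provider. The following is a minimal sketch, not part of the original README: `missing_providers` is a hypothetical helper, while `onnxruntime.get_available_providers()` is the ONNX Runtime call that lists the providers compiled into the installed package.

```python
def missing_providers(available, required=("CUDAExecutionProvider",)):
    """Return the required providers that are absent from `available`."""
    return [name for name in required if name not in available]

try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed at all, so no providers are available
    available = []

print("Missing providers:", missing_providers(available))
```

An empty list means the GPU build is present. If `CUDAExecutionProvider` is reported missing, the CPU-only `onnxruntime` package is probably installed instead of the GPU one.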
+
+ For convenience, the [benchmark repo](https://github.com/joaopn/gpu_benchmark_goemotions) provides an `environment.yml` file to create a conda env with all the requirements. Below is an optimized, batched usage example:
+
+ ```python
+ import pandas as pd
+ import torch
+ from tqdm import tqdm
+ from transformers import AutoTokenizer
+ from optimum.onnxruntime import ORTModelForSequenceClassification
+
+ def sentiment_analysis_batched(df, batch_size, field_name):
+
+     model_id = 'joaopn/roberta-base-go_emotions-onnx-fp16'
+     file_name = 'model.onnx'
+     gpu_id = 0
+
+     model = ORTModelForSequenceClassification.from_pretrained(model_id, file_name=file_name, provider="CUDAExecutionProvider", provider_options={'device_id': gpu_id})
+     device = torch.device(f"cuda:{gpu_id}")
+
+     tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+     results = []
+
+     # Precompute id2label mapping
+     id2label = model.config.id2label
+
+     total_samples = len(df)
+     with tqdm(total=total_samples, desc="Processing samples") as pbar:
+         for start_idx in range(0, total_samples, batch_size):
+             # Clamp the last batch so the progress bar does not overshoot
+             end_idx = min(start_idx + batch_size, total_samples)
+             texts = df[field_name].iloc[start_idx:end_idx].tolist()
+
+             inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt", max_length=512)
+             input_ids = inputs['input_ids'].to(device)
+             attention_mask = inputs['attention_mask'].to(device)
+
+             with torch.no_grad():
+                 outputs = model(input_ids, attention_mask=attention_mask)
+                 predictions = torch.sigmoid(outputs.logits)  # Use sigmoid for multi-label classification
+
+             # Collect predictions on GPU
+             results.append(predictions)
+
+             pbar.update(end_idx - start_idx)
+
+     # Concatenate all results on GPU, then move them to the CPU once
+     all_predictions = torch.cat(results, dim=0).cpu().numpy()
+
+     # Convert to DataFrame, one probability column per label
+     predictions_df = pd.DataFrame(all_predictions, columns=[id2label[i] for i in range(all_predictions.shape[1])])
+
+     # Add prediction columns to the original DataFrame
+     combined_df = pd.concat([df.reset_index(drop=True), predictions_df], axis=1)
+
+     return combined_df
+
+ df = pd.read_csv('https://github.com/joaopn/gpu_benchmark_goemotions/raw/main/data/random_sample_10k.csv.gz')
+ df = sentiment_analysis_batched(df, batch_size=8, field_name='body')
+ ```
+
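
The batched example returns one probability column per label. To turn those probabilities into discrete labels, one option is to keep every label whose probability reaches a threshold. The sketch below is an illustration only: `labels_above_threshold` is a hypothetical helper, and the 0.5 cutoff is an assumption, not a value given in the README.

```python
import pandas as pd

def labels_above_threshold(df, label_columns, threshold=0.5):
    """Return a Series holding, for each row, the labels with probability >= threshold."""
    label_columns = list(label_columns)
    probs = df[label_columns].to_numpy()
    picked = [
        [label for label, p in zip(label_columns, row) if p >= threshold]
        for row in probs
    ]
    return pd.Series(picked, index=df.index, name="labels")

# Toy frame standing in for the output of the batched example above
toy = pd.DataFrame({
    "body": ["great news", "this is awful", "ok"],
    "joy": [0.9, 0.1, 0.1],
    "anger": [0.2, 0.7, 0.2],
})
toy["labels"] = labels_above_threshold(toy, ["joy", "anger"])
```

With the real output, the label columns could be taken from `model.config.id2label.values()`. Because the task is multi-label, a row may receive several labels or none.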