Commit 0d6e45a · Parent: 05a3ef0 · Update README.md
widget:
- context: "auth.key location.city location.city_id location.country location.lat location.lon location.postal_code state units"
  example_title: "Weather API"
---
# RESTBERTa

RESTBERTa is a fine-tuned Transformer encoder model that helps machines process the structured syntax and the unstructured natural language descriptions of semantics found in Web API documentation.

In detail, we use question answering to solve the generic task of identifying the Web API syntax element (answer) within a syntax structure (paragraph) that matches the semantics described by a natural language query (question).

# RESTBERTa for Semantic Parameter Matching

This repository contains the weights of a fine-tuned RESTBERTa model for the task of semantic parameter matching in Web APIs. For this, we formulate question answering as a multiple-choice task: given a natural language query that describes the purpose and behavior of the target parameter, i.e., its semantics, the model should choose this parameter from a given schema, which consists of hierarchically organized parameters, e.g., a JSON or XML schema.

Note: BERT models are optimized for linear text input. We therefore serialize a schema, which is commonly a tree structure of hierarchically organized parameters, into linear text by converting its parameters into an XPath-like notation, e.g., "users[\*].name" for a parameter "name" that is part of an object of the array "users". The result is a list of alphabetically sorted XPaths, e.g., "link.href link.rel users[\*].id users[\*].name users[\*].surname".
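The serialization described above can be sketched in a few lines of Python. `flatten_schema` is an illustrative helper (not part of this repository), assuming objects are represented as dicts and arrays as lists:

```python
def flatten_schema(schema, prefix=""):
    """Recursively flatten a nested schema (dicts for objects, lists for
    arrays) into alphabetically sorted XPath-like parameter paths."""
    paths = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            path = f"{prefix}.{key}" if prefix else key
            sub = flatten_schema(value, path)
            paths.extend(sub if sub else [path])  # leaves become paths
    elif isinstance(schema, list):
        # Array items get the "[*]" suffix, as in "users[*].name".
        for item in schema:
            paths.extend(flatten_schema(item, prefix + "[*]"))
    return sorted(paths)

# Example: a "link" object and a "users" array of objects.
schema = {
    "link": {"href": {}, "rel": {}},
    "users": [{"id": {}, "name": {}, "surname": {}}],
}
serialized = " ".join(flatten_schema(schema))
print(serialized)
# link.href link.rel users[*].id users[*].name users[*].surname
```

The joined string matches the example paragraph used throughout this README.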
# Fine-tuning

We fine-tuned the pre-trained microsoft/codebert-base model on the downstream task of question answering with 1,085,051 question answering samples extracted from 2,321 real-world OpenAPI documentations. Each sample consists of:
- Question: The natural language description of the parameter, e.g., "The name of a user"
- Answer: The parameter in XPath-like notation, e.g., "users[\*].name"
- Paragraph: The schema that contains the parameter, i.e., a list of parameters in XPath-like notation, e.g., "link.href link.rel users[\*].id users[\*].name users[\*].surname"
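To make this sample structure concrete, here is a sketch (not the actual training pipeline) of one such sample, with the answer located as a character span inside the paragraph, as is common in extractive question answering:

```python
# One question answering sample, following the structure listed above.
sample = {
    "question": "The name of a user",
    "paragraph": "link.href link.rel users[*].id users[*].name users[*].surname",
    "answer": "users[*].name",
}

# For extractive QA, the answer is typically localized as a character
# span within the paragraph (here via a simple substring search).
start = sample["paragraph"].index(sample["answer"])
end = start + len(sample["answer"])
assert sample["paragraph"][start:end] == sample["answer"]
print(start, end)  # prints: 31 44
```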
# Inference

RESTBERTa requires a special output interpreter that processes the predictions made by the model in order to determine the suggested parameter (see the paper for more details).
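The interpreter itself is defined in the paper; as a simplified sketch of the underlying idea, a raw predicted character span can be snapped to the candidate XPath it overlaps most (the function name and logic are illustrative assumptions, not this repository's actual implementation):

```python
def snap_to_parameter(paragraph, span_start, span_end):
    """Map a raw predicted character span onto the candidate XPath in the
    space-separated paragraph with the largest character overlap."""
    best, best_overlap = None, 0
    pos = 0
    for xpath in paragraph.split(" "):
        start, end = pos, pos + len(xpath)
        # Length of the intersection of [start, end) and [span_start, span_end).
        overlap = max(0, min(end, span_end) - max(start, span_start))
        if overlap > best_overlap:
            best, best_overlap = xpath, overlap
        pos = end + 1  # skip the separating space
    return best

paragraph = "link.href link.rel users[*].id users[*].name users[*].surname"
# A hypothetical model prediction that covers "users[*].name" only partially:
print(snap_to_parameter(paragraph, 33, 42))  # → users[*].name
```

Snapping to whole candidates guarantees the suggested parameter is always a valid entry of the schema, even when the raw span boundaries are slightly off.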
# Hyperparameters

The model was fine-tuned for ten epochs with a batch size of 16 on an Nvidia RTX 3090 GPU with 24 GB of VRAM. This repository contains the model checkpoint (weights) after five epochs of fine-tuning, which achieved the highest accuracy on our validation set.
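For reference, the stated setup can be summarized as a plain configuration dict (the keys are illustrative, not an API; only the values come from this document):

```python
# Fine-tuning setup as stated above; selected_checkpoint_epoch is the
# epoch whose weights this repository ships (best validation accuracy).
finetuning_config = {
    "base_model": "microsoft/codebert-base",
    "epochs": 10,
    "batch_size": 16,
    "selected_checkpoint_epoch": 5,
}
```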
# References

- https://github.com/SebastianKotstein/Parameter-Matching-Web-APIs
- https://zenodo.org/record/8019625
# Citation

"Semantic Parameter Matching in Web APIs with Transformer-based Question Answering" (accepted for publication, coming soon)