  context: "auth.key location.city location.city_id location.country location.lat location.lon location.postal_code state units"
  example_title: "Weather API"
---
# RESTBERTa
RESTBERTa is a fine-tuned Transformer encoder model that supports machines in processing the structured syntax and the unstructured natural language descriptions of semantics found in Web API documentation.
In detail, we use question answering to solve the generic task of identifying a Web API syntax element (answer) in a syntax structure (paragraph) that matches the semantics described in a natural language query (question).

# RESTBERTa for Semantic Parameter Matching
This repository contains the weights of a fine-tuned RESTBERTa model for the task of semantic parameter matching in Web APIs. For this, we frame question answering as a multiple-choice task:
given a natural language query that describes the purpose and behavior of the target parameter, i.e., its semantics, the model should choose the matching parameter from
a given schema, which consists of hierarchically organized parameters, e.g., a JSON or XML schema.

Note: BERT models are optimized for linear text input. We therefore serialize a schema, which is commonly a tree structure of hierarchically organized parameters, into linear text
by converting each parameter into an XPath-like notation, e.g., "users[\*].name" for a parameter "name" that is part of an object in the array "users". The result is a list of alphabetically
sorted XPaths, e.g., "link.href link.rel users[\*].id users[\*].name users[\*].surname".
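
For illustration, the minimal Python sketch below performs such a serialization step. The helper `serialize_schema` and the toy schema are purely illustrative and are not taken from the actual preprocessing pipeline.

```python
# Illustrative sketch (not the original preprocessing code): flatten a nested schema
# into the alphabetically sorted, XPath-like notation described above.
def serialize_schema(schema, prefix=""):
    paths = []
    for name, value in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):  # nested object
            paths.extend(serialize_schema(value, path))
        elif isinstance(value, list) and value and isinstance(value[0], dict):  # array of objects
            paths.extend(serialize_schema(value[0], f"{path}[*]"))
        else:  # primitive parameter
            paths.append(path)
    return sorted(paths)

schema = {
    "link": {"href": None, "rel": None},
    "users": [{"id": None, "name": None, "surname": None}],
}
print(" ".join(serialize_schema(schema)))
# -> link.href link.rel users[*].id users[*].name users[*].surname
```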

# Fine-tuning
We fine-tuned the pre-trained microsoft/codebert-base model on the downstream task of question answering with 1,085,051 question-answering samples derived from 2,321
real-world OpenAPI documents. Each sample consists of the following (see the illustrative sketch below):
- Question: the natural language description of the parameter, e.g., "The name of a user"
- Answer: the parameter in XPath-like notation, e.g., "users[\*].name"
- Paragraph: the schema that contains the parameter, given as a list of parameters in XPath-like notation, e.g., "link.href link.rel users[\*].id users[\*].name users[\*].surname"
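
Laid out as an extractive question-answering example, a single sample could look like the following sketch. The field names mirror a common SQuAD-style layout and are only an assumption here; the exact format used for fine-tuning is documented in the GitHub repository referenced below.

```python
# Illustrative only: one fine-tuning sample in a SQuAD-style layout (field names assumed).
sample = {
    "question": "The name of a user",
    "context": "link.href link.rel users[*].id users[*].name users[*].surname",
    "answer": "users[*].name",
}

# The answer is always one of the whitespace-separated XPaths in the paragraph,
# so its character span inside the context can be derived directly:
start = sample["context"].index(sample["answer"])
end = start + len(sample["answer"])
print(start, end)  # character span of "users[*].name" within the paragraph
```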

# Inference
RESTBERTa requires a special output interpreter that processes the predictions made by the model in order to determine the suggested parameter (see the paper for more details).
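
The simplified snippet below is only an illustration of this idea and not the actual output interpreter: it runs the checkpoint as an extractive question-answering pipeline and snaps the predicted span to one of the whitespace-separated XPaths. The model ID is a placeholder for this repository.

```python
# Simplified sketch (not the paper's output interpreter): run the checkpoint as an
# extractive-QA pipeline and snap the predicted span to a candidate XPath.
from transformers import pipeline

qa = pipeline("question-answering", model="<id-of-this-repository>")  # placeholder model ID

question = "The name of a user"
context = "link.href link.rel users[*].id users[*].name users[*].surname"
prediction = qa(question=question, context=context)  # dict with "start", "end", "answer", "score"

# Map the predicted character span to the XPath it overlaps.
offset = 0
for xpath in context.split():
    if prediction["start"] < offset + len(xpath) and prediction["end"] > offset:
        print(xpath)  # suggested parameter
        break
    offset += len(xpath) + 1  # account for the separating space
```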

# Hyperparameters
The model was fine-tuned for ten epochs with a batch size of 16 on an Nvidia RTX 3090 GPU with 24 GB of memory. This repository contains the model checkpoint (weights) after five epochs of fine-tuning, which achieved the highest accuracy on our validation set.
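
A comparable configuration could be expressed with the Hugging Face Trainer as sketched below; this is an assumption based on the stated hyperparameters, not the original training script.

```python
# Assumed configuration matching the stated hyperparameters (not the original training script).
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForQuestionAnswering.from_pretrained("microsoft/codebert-base")

args = TrainingArguments(
    output_dir="restberta-parameter-matching",
    num_train_epochs=10,             # ten epochs in total
    per_device_train_batch_size=16,  # batch size of 16
    save_strategy="epoch",           # keep one checkpoint per epoch; the epoch-5 checkpoint
                                     # performed best on the validation set and is published here
)
```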

# References
- https://github.com/SebastianKotstein/Parameter-Matching-Web-APIs
- https://zenodo.org/record/8019625

# Citation
"Semantic Parameter Matching in Web APIs with Transformer-based Question Answering" (accepted for publication, coming soon)