  context: "auth.key location.city location.city_id location.country location.lat location.lon location.postal_code state units"
  example_title: "Weather API"
---
# RESTBERTa
RESTBERTa is a fine-tuned Transformer encoder model that supports machines in processing the structured syntax and the unstructured natural language descriptions of semantics found in Web API documentation.
In detail, we use question answering to solve the generic task of identifying a Web API syntax element (answer) in a syntax structure (paragraph) that matches the semantics described in a natural language query (question).

# RESTBERTa for Semantic Parameter Matching
This repository contains the weights of a fine-tuned RESTBERTa model for the task of semantic parameter matching in Web APIs. For this, we frame question answering as a multiple-choice task:
given a natural language query that describes the purpose and behavior of the target parameter, i.e., its semantics, the model should choose the matching parameter from
a given schema, which consists of hierarchically organized parameters, e.g., a JSON or XML schema.

Note: BERT models are optimized for linear text input. We therefore serialize a schema, which is commonly a tree structure of hierarchically organized parameters, into linear text
by converting each parameter into an XPath-like notation, e.g., "users[\*].name" for a parameter "name" that is part of an object in the array "users". The result is a list of alphabetically
sorted XPaths, e.g., "link.href link.rel users[\*].id users[\*].name users[\*].surname".
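
For illustration, the minimal Python sketch below performs such a serialization step. The helper `serialize_schema` and the toy schema are purely illustrative and are not taken from the actual preprocessing pipeline.

```python
# Illustrative sketch (not the original preprocessing code): flatten a nested schema
# into the alphabetically sorted, XPath-like notation described above.
def serialize_schema(schema, prefix=""):
    paths = []
    for name, value in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):  # nested object
            paths.extend(serialize_schema(value, path))
        elif isinstance(value, list) and value and isinstance(value[0], dict):  # array of objects
            paths.extend(serialize_schema(value[0], f"{path}[*]"))
        else:  # primitive parameter
            paths.append(path)
    return sorted(paths)

schema = {
    "link": {"href": None, "rel": None},
    "users": [{"id": None, "name": None, "surname": None}],
}
print(" ".join(serialize_schema(schema)))
# -> link.href link.rel users[*].id users[*].name users[*].surname
```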

# Fine-tuning
We fine-tuned the pre-trained microsoft/codebert-base model on the downstream task of question answering with 1,085,051 question-answering samples derived from 2,321
real-world OpenAPI documents. Each sample consists of the following (see the illustrative sketch below):
- Question: the natural language description of the parameter, e.g., "The name of a user"
- Answer: the parameter in XPath-like notation, e.g., "users[\*].name"
- Paragraph: the schema that contains the parameter, given as a list of parameters in XPath-like notation, e.g., "link.href link.rel users[\*].id users[\*].name users[\*].surname"
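
Laid out as an extractive question-answering example, a single sample could look like the following sketch. The field names mirror a common SQuAD-style layout and are only an assumption here; the exact format used for fine-tuning is documented in the GitHub repository referenced below.

```python
# Illustrative only: one fine-tuning sample in a SQuAD-style layout (field names assumed).
sample = {
    "question": "The name of a user",
    "context": "link.href link.rel users[*].id users[*].name users[*].surname",
    "answer": "users[*].name",
}

# The answer is always one of the whitespace-separated XPaths in the paragraph,
# so its character span inside the context can be derived directly:
start = sample["context"].index(sample["answer"])
end = start + len(sample["answer"])
print(start, end)  # character span of "users[*].name" within the paragraph
```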

# Inference
RESTBERTa requires a special output interpreter that processes the predictions made by the model in order to determine the suggested parameter (see the paper for more details).
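
The simplified snippet below is only an illustration of this idea and not the actual output interpreter: it runs the checkpoint as an extractive question-answering pipeline and snaps the predicted span to one of the whitespace-separated XPaths. The model ID is a placeholder for this repository.

```python
# Simplified sketch (not the paper's output interpreter): run the checkpoint as an
# extractive-QA pipeline and snap the predicted span to a candidate XPath.
from transformers import pipeline

qa = pipeline("question-answering", model="<id-of-this-repository>")  # placeholder model ID

question = "The name of a user"
context = "link.href link.rel users[*].id users[*].name users[*].surname"
prediction = qa(question=question, context=context)  # dict with "start", "end", "answer", "score"

# Map the predicted character span to the XPath it overlaps.
offset = 0
for xpath in context.split():
    if prediction["start"] < offset + len(xpath) and prediction["end"] > offset:
        print(xpath)  # suggested parameter
        break
    offset += len(xpath) + 1  # account for the separating space
```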

# Hyperparameters
The model was fine-tuned for ten epochs with a batch size of 16 on an Nvidia RTX 3090 GPU with 24 GB of memory. This repository contains the model checkpoint (weights) after five epochs of fine-tuning, which achieved the highest accuracy on our validation set.
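
A comparable configuration could be expressed with the Hugging Face Trainer as sketched below; this is an assumption based on the stated hyperparameters, not the original training script.

```python
# Assumed configuration matching the stated hyperparameters (not the original training script).
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForQuestionAnswering.from_pretrained("microsoft/codebert-base")

args = TrainingArguments(
    output_dir="restberta-parameter-matching",
    num_train_epochs=10,             # ten epochs in total
    per_device_train_batch_size=16,  # batch size of 16
    save_strategy="epoch",           # keep one checkpoint per epoch; the epoch-5 checkpoint
                                     # performed best on the validation set and is published here
)
```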

# References
- https://github.com/SebastianKotstein/Parameter-Matching-Web-APIs
- https://zenodo.org/record/8019625

# Citation
"Semantic Parameter Matching in Web APIs with Transformer-based Question Answering" (accepted for publication, coming soon)