
RESTBERTa

RESTBERTa (Representational State Transfer on Bidirectional Encoder Representations from Transformers) is an approach intended to support machines in processing the structured syntax and unstructured natural language descriptions of semantics in Web API documentation. In detail, we use question answering to solve the generic task of identifying the Web API syntax element (answer) within a syntax structure (paragraph) that matches the semantics described in a natural language query (question). Identifying and extracting Web API syntax elements from Web API documentation is a common subtask of many Web API integration tasks, such as parameter matching and endpoint discovery, and RESTBERTa may thus serve as a foundation for several Web API integration tasks. Technically, RESTBERTa covers the concepts for fine-tuning a Transformer encoder model, i.e., a pre-trained BERT model, to question answering with task-specific samples, thereby preparing the model for a specific Web API integration task.

The paper "RESTBERTa: a Transformer-based question answering approach for semantic search in Web API documentation" demonstrates the application of RESTBERTa to semantic parameter matching and endpoint discovery:

RESTBERTa for Parameter Matching and Endpoint Discovery

This repository contains the weights of a CodeBERT base model that has been fine-tuned for the tasks of parameter matching and endpoint discovery. For this, we formulate question answering as a multiple-choice task: given a natural language query that describes the purpose and behavior of the target parameter or endpoint, i.e., its semantics, the model should choose that parameter or endpoint from a hierarchical structure of parameters or endpoints, i.e., a schema or a URI model.

Note: BERT models are optimized for linear text input. We therefore serialize schemas and URI models into linear text by converting parameters and endpoints into an XPath-like notation, e.g., "users.{userId}.get" for the endpoint "GET /users/{userId}". The result is an alphabetically sorted list of XPaths, e.g., "users.get users.post users.{userId}.address.get users.{userId}.address.put users.{userId}.delete users.{userId}.get users.{userId}.put".
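
The following Python sketch illustrates this serialization scheme. The function names and exact normalization rules are illustrative assumptions, not the reference implementation from the paper:

```python
# Sketch of the XPath-like serialization described above. Function names
# and normalization details are illustrative assumptions.
def serialize_endpoint(method: str, path: str) -> str:
    """Convert e.g. 'GET /users/{userId}' into 'users.{userId}.get'."""
    segments = [s for s in path.split("/") if s]
    return ".".join(segments + [method.lower()])

def serialize_uri_model(endpoints: list[tuple[str, str]]) -> str:
    """Serialize (method, path) pairs into a sorted, space-separated paragraph."""
    return " ".join(sorted(serialize_endpoint(m, p) for m, p in endpoints))

endpoints = [
    ("GET", "/users"), ("POST", "/users"),
    ("GET", "/users/{userId}"), ("PUT", "/users/{userId}"),
    ("DELETE", "/users/{userId}"),
    ("GET", "/users/{userId}/address"), ("PUT", "/users/{userId}/address"),
]
print(serialize_uri_model(endpoints))
# users.get users.post users.{userId}.address.get users.{userId}.address.put
# users.{userId}.delete users.{userId}.get users.{userId}.put
```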

Fine-tuning

We fine-tuned the pre-trained microsoft/codebert-base model on the downstream task of question answering with 909,089 question-answering samples extracted from 2,321 real-world OpenAPI documents. Each sample consists of the following parts (one possible encoding is sketched after the list):

  • Question: The natural language description of the parameter or endpoint, e.g., "Creates a new user"
  • Answer: The parameter or endpoint in an XPath-like notation, e.g., "users.post"
  • Paragraph: The hierarchical structure containing the parameter or endpoint, which is a list of parameters/endpoints in XPath-like notation, e.g., "users.get users.post users.{userId}.address.get users.{userId}.address.put users.{userId}.delete users.{userId}.get users.{userId}.put".
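
For illustration, such a sample could be encoded in the common SQuAD-style (question, context, answers) format. The field names below follow that convention and are an assumption, not necessarily the authors' exact training format:

```python
# Hypothetical SQuAD-style encoding of the sample above; field names follow
# the common (question, context, answers) convention, not necessarily the
# authors' exact training format.
paragraph = ("users.get users.post users.{userId}.address.get "
             "users.{userId}.address.put users.{userId}.delete "
             "users.{userId}.get users.{userId}.put")
answer = "users.post"
sample = {
    "question": "Creates a new user",
    "context": paragraph,
    "answers": {
        "text": [answer],
        # character offset of the answer span within the paragraph
        "answer_start": [paragraph.index(answer)],
    },
}
```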

Inference

RESTBERTa requires a special output interpreter that processes the predictions made by the model to determine the suggested parameter or endpoint. We discuss the details in the paper.
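
As a minimal usage sketch, the raw span prediction (without the paper's output interpreter) can be obtained with the standard Hugging Face question-answering pipeline. The repository ID below is a placeholder for this model's actual ID:

```python
# Minimal sketch of raw span prediction with this checkpoint, assuming the
# standard Hugging Face question-answering head. The output interpreter
# from the paper is not reproduced here.
from transformers import pipeline

model_id = "<this-repository>"  # placeholder: replace with this model's repo ID
qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

paragraph = ("users.get users.post users.{userId}.address.get "
             "users.{userId}.address.put users.{userId}.delete "
             "users.{userId}.get users.{userId}.put")

prediction = qa(question="Creates a new user", context=paragraph)
print(prediction)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': 'users.post'}
```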

Hyperparameters

The model was fine-tuned for ten epochs with a batch size of 16 on an NVIDIA Ampere GPU. This repository contains the model checkpoint (weights) after five epochs of fine-tuning, which achieved the highest accuracy on our parameter-matching validation set.
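
A hedged sketch of this setup with the Hugging Face Trainer API is shown below; only the epoch count and batch size are taken from this card, and all other arguments (output directory, checkpointing, data preprocessing) are assumptions:

```python
# Hedged sketch of the stated fine-tuning configuration (10 epochs, batch
# size 16) using the Hugging Face Trainer API. All other settings are
# assumptions, not reported values.
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForQuestionAnswering.from_pretrained("microsoft/codebert-base")

args = TrainingArguments(
    output_dir="restberta-qa",       # assumed output directory
    num_train_epochs=10,             # stated in this card
    per_device_train_batch_size=16,  # stated in this card
    save_strategy="epoch",           # keep per-epoch checkpoints, as done here
)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=..., eval_dataset=...)  # tokenized QA samples
# trainer.train()
```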

Citation

@ARTICLE{10.1007/s10586-023-04237-x,
  author={Kotstein, Sebastian and Decker, Christian},
  journal={Cluster Computing},
  title={RESTBERTa: a Transformer-based question answering approach for semantic search in Web API documentation},
  year={2024},
  publisher={Springer},
  doi={10.1007/s10586-023-04237-x}}