
RESTBERTa

RESTBERTa stands for Representational State Transfer on Bidirectional Encoder Representations from Transformers. It is an approach that supports machines in processing the structured syntax and the unstructured natural language descriptions (semantics) found in Web API documentation. In detail, we use question answering to solve the generic task of identifying a Web API syntax element (answer) within a syntax structure (paragraph) that matches the semantics described in a natural language query (question). Identifying and extracting Web API syntax elements from Web API documentation is a common subtask of many Web API integration tasks, such as parameter matching and endpoint discovery; RESTBERTa can therefore serve as a foundation for several of them. Technically, RESTBERTa covers the concepts for fine-tuning a Transformer encoder model, i.e., a pre-trained BERT model, to question answering with task-specific samples, thereby preparing the model for a specific Web API integration task.

The paper "RESTBERTa: a Transformer-based question answering approach for semantic search in Web API documentation" demonstrates the application of RESTBERTa to endpoint discovery, as well as semantic parameter matching:

RESTBERTa for Endpoint Discovery

This repository contains the weights of a fine-tuned CodeBERT base model for the task of endpoint discovery in Web APIs. For this, we formulate question answering as a multiple-choice task: given a natural language query that describes the purpose and behavior of the target endpoint, i.e., its semantics, the model chooses the matching endpoint from a given URI model, which is a tree structure.

Note: BERT models are optimized for linear text input. We therefore serialize a URI model into linear text by converting each endpoint into an XPath-like notation, e.g., "users.{userId}.get" for the endpoint "GET /users/{userId}". The result is an alphabetically sorted list of XPaths, e.g., "users.get users.post users.{userId}.address.get users.{userId}.address.put users.{userId}.delete users.{userId}.get users.{userId}.put".
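
To make this serialization concrete, here is a minimal Python sketch (not from the paper's codebase) that converts (method, path) pairs into the XPath-like notation and sorts them as described above:

```python
# Illustrative sketch: serializing a URI model into the alphabetically
# sorted, XPath-like notation described above.

def serialize_endpoint(method: str, path: str) -> str:
    """Convert e.g. ("GET", "/users/{userId}") into "users.{userId}.get"."""
    segments = [s for s in path.split("/") if s]  # drop empty segments
    return ".".join(segments + [method.lower()])

def serialize_uri_model(endpoints: list[tuple[str, str]]) -> str:
    """Serialize a list of (method, path) pairs into one linear paragraph."""
    return " ".join(sorted(serialize_endpoint(m, p) for m, p in endpoints))

print(serialize_uri_model([
    ("GET", "/users"), ("POST", "/users"),
    ("GET", "/users/{userId}"), ("PUT", "/users/{userId}"),
    ("DELETE", "/users/{userId}"),
    ("GET", "/users/{userId}/address"), ("PUT", "/users/{userId}/address"),
]))
# users.get users.post users.{userId}.address.get users.{userId}.address.put
# users.{userId}.delete users.{userId}.get users.{userId}.put
```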

Fine-tuning

We fine-tuned the pre-trained microsoft/codebert-base model for the downstream task of question answering with 44,595 question-answering samples from 2,321 real-world OpenAPI documents. Each sample consists of:

  • Question: The natural language description of the endpoint, e.g., "Creates a new user"
  • Answer: The endpoint in an XPath-like notation, e.g., "users.post"
  • Paragraph: The URI model the endpoint is part of, which is a list of endpoints in XPath-like notation, e.g., "users.get users.post users.{userId}.address.get users.{userId}.address.put users.{userId}.delete users.{userId}.get users.{userId}.put".
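
As a hedged illustration, such a sample might be assembled in SQuAD-style form as follows; the field names are illustrative, not taken from the paper's dataset:

```python
# Sketch of one fine-tuning sample in SQuAD-style extractive-QA form.
paragraph = ("users.get users.post users.{userId}.address.get "
             "users.{userId}.address.put users.{userId}.delete "
             "users.{userId}.get users.{userId}.put")
answer = "users.post"
sample = {
    "question": "Creates a new user",
    "context": paragraph,
    "answers": {
        "text": [answer],
        "answer_start": [paragraph.index(answer)],  # character offset of the span
    },
}
print(sample)
```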

Inference

RESTBERTa requires a special output interpreter that processes the predictions made by the model in order to determine the suggested endpoint. We discuss the details in the paper.
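For a rough impression, the following sketch runs the model as a plain extractive-QA pipeline with Hugging Face transformers. This is only an approximation: the paper's output interpreter performs additional post-processing to map the predicted span onto a valid endpoint, and the model identifier below is a placeholder for this repository's ID.

```python
# Simplified inference sketch; the paper's output interpreter is more involved.
from transformers import pipeline

qa = pipeline("question-answering", model="<this-repository-id>")  # placeholder

paragraph = ("users.get users.post users.{userId}.address.get "
             "users.{userId}.address.put users.{userId}.delete "
             "users.{userId}.get users.{userId}.put")

result = qa(question="Creates a new user", context=paragraph)
print(result["answer"])  # ideally "users.post"
```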

Hyperparameters

The model was fine-tuned for ten epochs with a batch size of 16 on an NVIDIA Ampere GPU. This repository contains the model checkpoint (weights) after ten epochs of fine-tuning, which achieved the highest accuracy on our validation set.
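
Expressed with Hugging Face TrainingArguments, the reported configuration might look like the sketch below; the output directory is hypothetical, dataset preparation and the actual training call are omitted, and all unspecified settings are library defaults.

```python
# Sketch of the reported fine-tuning configuration (ten epochs, batch size 16).
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForQuestionAnswering.from_pretrained("microsoft/codebert-base")

args = TrainingArguments(
    output_dir="restberta-endpoint-discovery",  # hypothetical output path
    num_train_epochs=10,                        # ten epochs, as reported
    per_device_train_batch_size=16,             # batch size 16, as reported
)
```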

Citation

@ARTICLE{10.1007/s10586-023-04237-x,
  author={Kotstein, Sebastian and Decker, Christian},
  journal={Cluster Computing},
  title={RESTBERTa: a Transformer-based question answering approach for semantic search in Web API documentation},
  year={2024},
  publisher={Springer},
  doi={10.1007/s10586-023-04237-x}}