ReSyn — Set2Regex

This repository contains the pre-trained Set2Regex model presented in the paper ReSyn: A Generalized Recursive Regular Expression Synthesis Framework.

ReSyn is a synthesizer-agnostic divide-and-conquer framework that decomposes complex regular expression synthesis problems into manageable sub-problems by adaptively predicting whether to split examples sequentially (Concatenation) or group them by structural similarity (Union).
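
As a rough illustration of the two decomposition modes, the sketch below shows how sub-regexes synthesized independently could be recombined: concatenation joins the pieces in order, and union joins them as alternatives. The helper names `combine_concat` and `combine_union` are hypothetical, and Python `re` syntax is assumed for the sub-patterns. This is not the repository's implementation.

```python
import re


def combine_concat(parts):
    """Join sub-regexes in sequence (hypothetical helper).

    Each part is wrapped in a non-capturing group so that a top-level
    alternation inside one part cannot leak into its neighbours.
    """
    return "".join(f"(?:{p})" for p in parts)


def combine_union(parts):
    """Join sub-regexes as alternatives (hypothetical helper)."""
    return "(?:" + "|".join(f"(?:{p})" for p in parts) + ")"


# Concatenation: a digit block followed by a lowercase block.
concat_pattern = combine_concat([r"\d+", r"[a-z]+"])

# Union: either an all-digit string or an all-lowercase string.
union_pattern = combine_union([r"\d+", r"[a-z]+"])

print(bool(re.fullmatch(concat_pattern, "123abc")))
print(bool(re.fullmatch(union_pattern, "abc")))
```

Wrapping every part in `(?:...)` is a defensive choice: it keeps the combined pattern correct even when a sub-regex contains its own `|`.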

Set2Regex is the core neural synthesizer. Given a set of positive and negative example strings, it autoregressively generates a regular expression that matches every positive string and rejects every negative string. It encodes the example set with a hierarchical Transformer (character level first, then string level) and decodes the regex with set- and string-conditioned Transformer decoders. Greedy decoding, top-k/top-p sampling, and beam search are supported (see predict).
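
To make the difference between greedy decoding and top-k/top-p sampling concrete, here is a minimal, framework-free sketch of how one decoding step can pick the next token from a probability distribution. The function names (`greedy`, `filter_top_k_top_p`, `sample`) are illustrative assumptions and do not reflect the signature or internals of `predict`.

```python
import random


def greedy(probs):
    """Return the index of the most probable token."""
    return max(range(len(probs)), key=probs.__getitem__)


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the top_k most probable tokens, then the smallest prefix of
    them whose cumulative probability reaches top_p, and renormalize.

    top_k=0 disables the top-k cut; top_p=1.0 disables the nucleus cut.
    Returns a dict mapping token index to renormalized probability.
    """
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


def sample(probs, top_k=0, top_p=1.0, rng=random):
    """Draw one token index from the filtered distribution."""
    kept = filter_top_k_top_p(probs, top_k, top_p)
    ids = list(kept)
    return rng.choices(ids, weights=[kept[i] for i in ids], k=1)[0]


# Toy next-token distribution over a four-symbol vocabulary.
step_probs = [0.5, 0.3, 0.15, 0.05]
print(greedy(step_probs))
print(filter_top_k_top_p(step_probs, top_k=2))
print(sample(step_probs, top_p=0.7))
```

Beam search differs from both: instead of committing to one token per step, it keeps several partial regexes and extends the highest-scoring ones.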

Usage

Set2Regex is a custom PyTorch model that uses PyTorchModelHubMixin. The model class is defined in the GitHub repository, so clone it first to make the ReSyn package importable, then load the pre-trained weights:

from ReSyn.model import Set2Regex

model = Set2Regex.from_pretrained("mrseongminkim/ReSyn-Set2Regex").eval()

See ReSyn/server.py for the full input encoding / output decoding used at inference time.
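
Once a regex string has been decoded, it can be checked against the original examples independently of the model. The sketch below assumes the output has already been converted to Python `re` syntax (the model's own regex notation may differ), and `is_consistent` is a hypothetical helper, not part of the ReSyn package.

```python
import re


def is_consistent(pattern, positives, negatives):
    """Return True if pattern fully matches every positive example
    and fully matches none of the negative examples."""
    compiled = re.compile(pattern)
    accepts_all_positives = all(compiled.fullmatch(s) for s in positives)
    rejects_all_negatives = not any(compiled.fullmatch(s) for s in negatives)
    return accepts_all_positives and rejects_all_negatives


# Hand-written candidate standing in for a model prediction.
candidate = r"[0-9]+"
positives = ["1", "42", "2026"]
negatives = ["", "a1", "4 2"]

print(is_consistent(candidate, positives, negatives))
```

`fullmatch` is used rather than `search` because synthesis from examples treats a regex as accepting or rejecting the whole string.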

Citation

If you find this work useful, please cite:

@inproceedings{kim2026resyn,
  title={ReSyn: A Generalized Recursive Regular Expression Synthesis Framework},
  author={Kim, Seongmin and Cheon, Hyunjoon and Kim, Su-Hyeon and Han, Yo-Sub and Ko, Sang-Ki},
  booktitle={Proceedings of the Thirty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-26)},
  year={2026}
}

Model size: 10.1M parameters (Safetensors, F32)