ReSyn — Set2Regex
This repository contains the pre-trained Set2Regex model presented in the paper ReSyn: A Generalized Recursive Regular Expression Synthesis Framework.
ReSyn is a synthesizer-agnostic divide-and-conquer framework that decomposes complex regular expression synthesis problems into manageable sub-problems by adaptively predicting whether to split examples sequentially (Concatenation) or group them by structural similarity (Union).
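To make the two decomposition modes concrete, the following is a minimal sketch of how solved sub-problems could be recombined into one regex. It uses Python's `re` syntax, and the helper names `combine_concat` and `combine_union` are hypothetical. It does not show how ReSyn predicts which split to apply, and the framework's own regex syntax may differ.

```python
import re

def combine_concat(parts):
    """Join sub-regexes that describe consecutive segments of each example."""
    return "".join(f"(?:{p})" for p in parts)

def combine_union(alternatives):
    """Join sub-regexes that each cover one group of similar examples."""
    return "|".join(f"(?:{a})" for a in alternatives)

# Concatenation: every example splits into a letter segment and a digit segment.
concat_regex = combine_concat(["[a-z]+", "[0-9]{2}"])

# Union: the examples fall into two structurally different groups.
union_regex = combine_union(["[0-9]+", "[a-z]+@[a-z]+"])

print(all(re.fullmatch(concat_regex, s) for s in ["ab12", "x99"]))  # True
print(all(re.fullmatch(union_regex, s) for s in ["2024", "a@b"]))   # True
```

Wrapping each sub-regex in a non-capturing group keeps operator precedence intact when the pieces are joined.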
Set2Regex is the core neural synthesizer. Given a set of positive and negative example strings, it autoregressively generates a regular expression that matches every positive string and rejects every negative string. It encodes the example set with a hierarchical (character-level then string-level) Transformer and decodes the regex with set- and string-conditioned Transformer decoders. Greedy, top-k/top-p sampling, and beam search decoding are supported (see predict).
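The synthesis goal stated above (match every positive string, reject every negative string) can be checked independently of the model. The sketch below does this with Python's `re.fullmatch`. The function name `is_consistent` and the example strings are illustrative, and it assumes the generated regex is expressible in Python's `re` syntax, which may require translation from the model's output format.

```python
import re

def is_consistent(pattern, positives, negatives):
    """Return True if pattern fully matches all positives and no negatives."""
    compiled = re.compile(pattern)
    accepts_all = all(compiled.fullmatch(s) is not None for s in positives)
    rejects_all = all(compiled.fullmatch(s) is None for s in negatives)
    return accepts_all and rejects_all

positives = ["ab", "aab", "aaab"]
negatives = ["b", "ba", "abb"]

print(is_consistent("a+b", positives, negatives))  # True
print(is_consistent("a*b", positives, negatives))  # False: "b" is accepted
```

A check like this is useful for filtering candidates when sampling or beam search returns more than one regex.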
Links
- Paper: ReSyn: A Generalized Recursive Regular Expression Synthesis Framework
- GitHub Repository: mrseongminkim/ReSyn
- Dataset: mrseongminkim/ReSyn
Usage
Set2Regex is a custom PyTorch model that uses PyTorchModelHubMixin. The model class is defined in the GitHub repository, so clone the repository first to make the ReSyn package importable, then:
from ReSyn.model import Set2Regex
model = Set2Regex.from_pretrained("mrseongminkim/ReSyn-Set2Regex").eval()
See ReSyn/server.py for the full input encoding / output decoding used at inference time.
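If the clone is not on Python's import path, the import above fails. The sketch below is one way to handle that, under stated assumptions: `add_package_parent`, `is_importable`, and `load_set2regex` are hypothetical helper names, and the directory you pass must be whichever folder contains the `ReSyn` package (the clone's parent or the clone root, depending on the repository layout, which is not stated here).

```python
import importlib
import importlib.util
import sys
from pathlib import Path

def add_package_parent(directory):
    """Prepend `directory` (the folder that contains the package) to sys.path."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    importlib.invalidate_caches()
    return resolved

def is_importable(package_name):
    """Return True if `package_name` can be found on the current sys.path."""
    return importlib.util.find_spec(package_name) is not None

def load_set2regex(package_parent):
    """Load the pre-trained model using the same two calls as the usage above."""
    add_package_parent(package_parent)
    if not is_importable("ReSyn"):
        raise ImportError(f"ReSyn package not found under {package_parent}")
    from ReSyn.model import Set2Regex  # requires the cloned repository
    return Set2Regex.from_pretrained("mrseongminkim/ReSyn-Set2Regex").eval()
```

Calling `load_set2regex` with that directory returns the model in evaluation mode. Input encoding and output decoding still follow ReSyn/server.py.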
Citation
If you find this work useful, please cite:
@inproceedings{kim2026resyn,
title={ReSyn: A Generalized Recursive Regular Expression Synthesis Framework},
author={Kim, Seongmin and Cheon, Hyunjoon and Kim, Su-Hyeon and Han, Yo-Sub and Ko, Sang-Ki},
booktitle={Proceedings of the Thirty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-26)},
year={2026}
}