AnReu/math_albert · Could you explain the details of the fine-tuning dataset?

In your paper, the section on Fine-Tuning Data reads as follows:
In order to fine-tune our models, we paired each question with up to $N$ correct answers and the same number of incorrect answers. Up to $N$ correct answers were randomly chosen from the answers of the question. Each question in the corpus comes along with tags, i.e. categories indicating the topic of a question such as sequences-and-series or limits. As an incorrect answer for each question, we picked a random answer from one question sharing at least one tag with the original question by chance. This way, we chose up to $N$ incorrect answers independently from another.

This procedure yields 1.9 million examples for N=1 and 2.8 million examples for N=10, of which 90% were used as training data for the fine-tuning task. We presented to the model the entire text of the questions and answers using the structure introduced in the previous section. In addition, we pre-trained an ALBERT Model on MathSE (1) and fine-tuned it on N=1. We then let this model predict 1,000 answers to the 2021 test set. We evaluated the answers against the publicly available test set from last year and paired each correct answer with a randomly selected incorrect answer from the model's results. These question-answer pairs were used as an additional fine-tuning set which we denote by ANNOTATED.
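To check my understanding of the pairing step quoted above, here is a minimal sketch of how I read it. It is not taken from your code: the function name `build_pairs`, the dictionary layout of `questions`, and the choice to skip a question when no other question shares a tag with it are all my own assumptions.

```python
import random
from collections import defaultdict


def build_pairs(questions, n, seed=0):
    """Pair each question with up to n correct and as many incorrect answers.

    questions: dict mapping a question id (str) to
               {"tags": set of str, "answers": list of str}  (assumed layout)
    Returns a list of (question_id, answer_text, label) with label 1 or 0.
    """
    rng = random.Random(seed)

    # Index question ids by tag so candidates sharing a tag are cheap to find.
    by_tag = defaultdict(list)
    for qid, q in questions.items():
        for tag in q["tags"]:
            by_tag[tag].append(qid)

    examples = []
    for qid, q in questions.items():
        answers = q["answers"]
        # Up to n correct answers, chosen at random from the question's own answers.
        positives = rng.sample(answers, min(n, len(answers)))

        # Other questions that share at least one tag and have at least one answer.
        candidates = sorted({
            other
            for tag in q["tags"]
            for other in by_tag[tag]
            if other != qid and questions[other]["answers"]
        })
        if not candidates:
            continue  # assumption: skip questions without a tag-sharing neighbour

        for pos in positives:
            examples.append((qid, pos, 1))
            # Each incorrect answer is drawn independently of the others.
            other = rng.choice(candidates)
            examples.append((qid, rng.choice(questions[other]["answers"]), 0))
    return examples
```

If this matches the procedure, a question with three answers and `n=2` would contribute two positive and two negative examples.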

My question is about the selection of answers here, in particular how the left and right boundaries of mathematical formulas are determined. Is this done by detecting delimiter symbols such as $, for example with a regular expression?
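To make the question concrete, this is roughly the kind of regular-expression approach I have in mind. It is only my guess, not the method from the paper: the pattern and the helper name `find_formulas` are my own assumptions, and the sketch only handles `$...$` and `$$...$$` delimiters while treating `\$` as an escaped dollar sign.

```python
import re

# Display math ($$...$$) is tried before inline math ($...$) so that a
# display formula is not split into two empty inline matches.
# (?<!\\) rejects a dollar sign that is escaped with a backslash.
MATH_PATTERN = re.compile(
    r"(?<!\\)\$\$(.+?)(?<!\\)\$\$"  # display math
    r"|(?<!\\)\$(.+?)(?<!\\)\$",    # inline math
    re.DOTALL,
)


def find_formulas(text):
    """Return (start, end, body) for every formula found in text.

    start and end are character offsets of the left and right delimiters'
    outer edges; body is the formula without its delimiters.
    """
    spans = []
    for match in MATH_PATTERN.finditer(text):
        body = match.group(1) if match.group(1) is not None else match.group(2)
        spans.append((match.start(), match.end(), body))
    return spans
```

Is the formula detection in your pipeline similar to this, or does it rely on the markup already present in the corpus?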