{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cleaning Math Abstracts"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Does the arXiv API pull AMS subject tags?\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Look at the metadata from Gyu Eun's paper"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: arxiv in c:\\users\\leems\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (1.4.3)\n",
      "Requirement already satisfied: feedparser in c:\\users\\leems\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from arxiv) (6.0.10)\n",
      "Requirement already satisfied: sgmllib3k in c:\\users\\leems\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from feedparser->arxiv) (1.0.0)\n"
     ]
    }
   ],
   "source": [
    "!pip install arxiv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import arxiv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[['math.AP', '35Q55 (35B30 35B33)'], ['math.AP', '35Q55 (35B30 35B40)'], ['math.AP', 'math-ph', 'math.MP', '35Q55 (Primary) 35Q40, 35B30, 35B40 (Secondary)'], ['math.NT', 'math.DS', '37P30, 11G50'], ['math.GM'], ['math.GM'], ['math.AG', 'math.NT', '11G50, 14G40'], ['math.NT', 'math.DS'], ['math.NT', 'math.CV', 'math.DS', '11G50, 14G50, 32H50, 37P05, 37P30'], ['eess.SY', 'cs.SY']]\n"
     ]
    }
   ],
   "source": [
    "## Create an arxiv search for Gyu Eun's papers\n",
    "\n",
    "query = 'au:Gyu Eun Lee'\n",
    "\n",
    "search = arxiv.Search(query=query,max_results=10)\n",
    "results = search.results()\n",
    "\n",
    "categories = []\n",
    "\n",
    "for result in results:\n",
    "    categories.append(result.categories)\n",
    "\n",
    "print(categories)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Yes, at least the scraper sometimes has MSC classification within the category tag."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Are these tags present int he arXiv kaggle dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>abstract</th>\n",
       "      <th>cat</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>109618</th>\n",
       "      <td>Let $f: \\mathbb{C} \\to X$ be a transcendental holomorphic curve into a complex projective manifold $X$. Let $L$ be a very ample line bundle on $X$. Let $s$ be a very generic holomorphic section of $L$ and $D$ the zero divisor given by $s$. We prove that the \\emph{geometric} defect of $D$ (defect of truncation $1$) with respect to $f$ is zero. We also prove that $f$ almost misses general enough analytic subsets on $X$ of codimension $2$.</td>\n",
       "      <td>[math.CV, math.AG, math.DS]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98189</th>\n",
       "      <td>In this paper we prove two approximation results for divergence free vector fields. The first is a form of an assertion of J. Bourgain and H. Brezis concerning the approximation of solenoidal charges in the strict topology: Given $F \\in M_b(\\mathbb{R}^d;\\mathbb{R}^d)$ such that $\\operatorname*{div} F=0$ in the sense of distributions, there exist $C^1$ closed curves $\\{\\Gamma_{i,l}\\}_{\\{1,\\ldots,n_l\\}\\times \\mathbb{N}}$, with parameterization by arclength $\\gamma_{i,l} \\in C^1([0,L_{i,l}];\\mathbb{R}^d)$, $l \\leq L_{i,l} \\leq 2l$, for which \\[ F= \\lim_{l \\to \\infty} \\frac{\\|F\\|_{M_b(\\mathbb{R}^d;\\mathbb{R}^d)}}{n_l \\cdot l} \\sum_{i=1}^{n_l} \\dot{\\gamma}_{i,l} \\left.\\mathcal{H}^1\\right\\vert_{\\Gamma_{i,l}} \\] weakly-star as measures and \\begin{align*} \\lim_{l \\to \\infty} \\frac{1}{n_l \\cdot l} \\sum_{i=1}^{n_l} |\\Gamma_{i,l}| = 1. \\end{align*} The second, which is an almost immediate consequence of the first, is that smooth compactly supported functions are dense in \\[ \\left\\{ F \\in M_b(\\mathbb{R}^d;\\mathbb{R}^d): \\operatorname*{div}F=0 \\right\\} \\] with respect to the strict topology.</td>\n",
       "      <td>[math.AP, math.FA]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79941</th>\n",
       "      <td>Ion transport in biological tissues is crucial in the study of many biological and pathological problems. Some multi-cellular structures, like smooth muscles on the vessel walls, could be treated as periodic bi-domain structures, which consist of intracellular space and extracellular space with semipermeable membranes in between. With the aid of two-scale homogenization theory, macro-scale models are proposed based on an electro-neutral (EN) microscale model with nonlinear interface conditions, where membranes are treated as combinations of capacitors and resistors. The connectivity of intracellular space is also taken into consideration. If the intracellular space is fully connected and forms a syncytium, then the macroscale model is a bidomain nonlinear coupled partial differential equations system. Otherwise, when the intracellular cells are not connected, the macroscale model for intracellular space is an ordinary differential system with source/sink terms from the connected extracellular space.</td>\n",
       "      <td>[math.AP, math.DS]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57526</th>\n",
       "      <td>Quantum error-correcting codes are used to protect qubits involved in quantum computation. This process requires logical operators, acting on protected qubits, to be translated into physical operators (circuits) acting on physical quantum states. We propose a mathematical framework for synthesizing physical circuits that implement logical Clifford operators for stabilizer codes. Circuit synthesis is enabled by representing the desired physical Clifford operator in $\\mathbb{C}^{N \\times N}$ as a partial $2m \\times 2m$ binary symplectic matrix, where $N = 2^m$. We state and prove two theorems that use symplectic transvections to efficiently enumerate all binary symplectic matrices that satisfy a system of linear equations. As a corollary of these results, we prove that for an $[\\![ m,k ]\\!]$ stabilizer code every logical Clifford operator has $2^{r(r+1)/2}$ symplectic solutions, where $r = m-k$, up to stabilizer degeneracy. The desired physical circuits are then obtained by decomposing each solution into a product of elementary symplectic matrices, that correspond to elementary circuits. This enumeration of all physical realizations enables optimization over the ensemble with respect to a suitable metric. Furthermore, we show that any circuit that normalizes the stabilizer of the code can be transformed into a circuit that centralizes the stabilizer, while realizing the same logical operation. Our method of circuit synthesis can be applied to any stabilizer code, and this paper discusses a proof of concept synthesis for the $[\\![ 6,4,2 ]\\!]$ CSS code. Programs implementing the algorithms in this paper, which includes routines to solve for binary symplectic solutions of general linear systems and our overall LCS (logical circuit synthesis) algorithm, can be found at: https://github.com/nrenga/symplectic-arxiv18a</td>\n",
       "      <td>[quant-ph, cs.IT, math.IT]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>122545</th>\n",
       "      <td>We study the properties of a leave-node-out jackknife procedure for network data. Under the sparse graphon model, we prove an Efron-Stein-type inequality, showing that the network jackknife leads to conservative estimates of the variance (in expectation) for any network functional that is invariant to node permutation. For a general class of count functionals, we also establish consistency of the network jackknife. We complement our theoretical analysis with a range of simulated and real-data examples and show that the network jackknife offers competitive performance in cases where other resampling methods are known to be valid. In fact, for several network statistics, we see that the jackknife provides more accurate inferences compared to related methods such as subsampling.</td>\n",
       "      <td>[math.ST, stat.ME, stat.ML, stat.TH]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41275</th>\n",
       "      <td>There has been a lot of effort to construct good quantum codes from the classical error correcting codes. Constructing new quantum codes, using Hermitian self-orthogonal codes, seems to be a difficult problem in general. In this paper, Hermitian self-orthogonal codes are studied from algebraic function fields. Sufficient conditions for the Hermitian self-orthogonality of an algebraic geometry code are presented. New Hermitian self-orthogonal codes are constructed from projective lines, elliptic curves, hyper-elliptic curves, Hermitian curves, and Artin-Schreier curves. In addition, over the projective lines, we construct new families of MDS quantum codes with parameters $[[N,N-2K,K+1]]_q$ under the following conditions: i) $N=t(q-1)+1$ or $t(q-1)+2$ with $t|(q+1)$ and $K=\\lfloor\\frac{t(q-1)+1}{2t}\\rfloor+1$; ii) $(n-1)|(q^2-1)$, $N=n$ or $N=n+1$, $K_0=\\lfloor\\frac{n+q-1}{q+1}\\rfloor$, and $K\\ge K_0+1$; iii) $N=tq+1$, $\\forall~1\\le t\\le q$ and $K=\\lfloor\\frac{tq+q-1}{q+1}\\rfloor+1$; iv) $n|(q^2-1)$, $n_2=\\frac{n}{\\gcd (n,q+1)}$, $\\forall~ 1\\le t\\le \\frac{q-1}{n_2}-1$, $N=(t+1)n+2$ and $K=\\lfloor \\frac{(t+1)n+1+q-1}{q+1}\\rfloor+1$.</td>\n",
       "      <td>[cs.IT, math.IT]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>69961</th>\n",
       "      <td>With G=GL(n,C), let $\\mathcal{X}_{\\Gamma}G$ be the G-character variety of a given finitely presented group $\\Gamma$, and let $\\mathcal{X}^{irr}_{\\Gamma}G \\subset \\mathcal{X}_{\\Gamma}G$ be the locus of irreducible representation conjugacy classes. We provide a concrete relation, in terms of plethystic functions, between the generating series for E- polynomials of $\\mathcal{X}_{\\Gamma}G$ and the one for $\\mathcal{X}^{irr}_{\\Gamma}G$, generalizing a formula of Mozgovoy-Reineke [MR]. The proof uses a natural stratification of $\\mathcal{X}_{\\Gamma}G$ coming from affine GIT, the combinatorics of partitions, and the formula of MacDonald-Cheah for symmetric products; we also adapt it to the so-called Cartan brane in the moduli space of Higgs bundles. Combining our methods with arithmetic ones yields explicit expressions for the E-polynomials of the irreducible stratum of GL(n,C)-character varieties of some groups $\\Gamma$, including surface groups, free groups, and torus knot groups, for low values of $n$.</td>\n",
       "      <td>[math.AG, math.RT]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>127482</th>\n",
       "      <td>Let $m, n$ be positive integers such that $m&gt;1$ divides $n$. In this paper, we introduce a special class of piecewise-affine permutations of the finite set $[1, n]:=\\{1, \\ldots, n\\}$ with the property that the reduction $\\pmod m$ of $m$ consecutive elements in any of its cycles is, up to a cyclic shift, a fixed permutation of $[1, m]$. Our main result provides the cycle decomposition of such permutations. We further show that such permutations give rise to permutations of finite fields. In particular, we explicitly obtain classes of permutation polynomials of finite fields whose cycle decomposition and its inverse are explicitly given.</td>\n",
       "      <td>[math.NT, math.CO]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13100</th>\n",
       "      <td>We study one-parameter deformations of Calabi-Yau type Fermat polynomial singularities along degree-one directions. We show that twisted sectors in the vanishing cohomology are components of automorphic forms for certain triangular groups. We prove consequentially that genus zero Gromov-Witten generating series of the corresponding Fermat Calabi-Yau varieties are components of automorphic forms. The main tools we use are mixed Hodge structures for quasi-homogeneous polynomial singularities, Riemann-Hilbert correspondence, and genus zero mirror symmetry.</td>\n",
       "      <td>[math.AG, math-ph, math.CA, math.MP]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>56062</th>\n",
       "      <td>We consider the \\textit{phase retrieval} problem of recovering a sparse signal $\\mathbf{x}$ in $\\mathbb{R}^d$ from intensity-only measurements in dimension $d \\geq 2$. Phase retrieval can be equivalently formulated as the problem of recovering a signal from its autocorrelation, which is in turn directly related to the combinatorial problem of recovering a set from its pairwise differences. In one spatial dimension, this problem is well studied and known as the \\textit{turnpike problem}. In this work, we present MISTR (Multidimensional Intersection Sparse supporT Recovery), an algorithm which exploits this formulation to recover the support of a multidimensional signal from magnitude-only measurements. MISTR takes advantage of the structure of multiple dimensions to provably achieve the same accuracy as the best one-dimensional algorithms in dramatically less time. We prove theoretically that MISTR correctly recovers the support of signals distributed as a Gaussian point process with high probability as long as sparsity is at most $\\mathcal{O}\\left(n^{d\\theta}\\right)$ for any $\\theta &lt; 1/2$, where $n^d$ represents pixel size in a fixed image window. In the case that magnitude measurements are corrupted by noise, we provide a thresholding scheme with theoretical guarantees for sparsity at most $\\mathcal{O}\\left(n^{d\\theta}\\right)$ for $\\theta &lt; 1/4$ that obviates the need for MISTR to explicitly handle noisy autocorrelation data. Detailed and reproducible numerical experiments demonstrate the effectiveness of our algorithm, showing that in practice MISTR enjoys time complexity which is nearly linear in the size of the input.</td>\n",
       "      <td>[math.CO, cs.NA, eess.SP, math.NA, math.PR]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   abstract  \\\n",
       "109618    Let $f: \\mathbb{C} \\to X$ be a transcendental holomorphic curve into a complex projective manifold $X$. Let $L$ be a very ample line bundle on $X$. Let $s$ be a very generic holomorphic section of $L$ and $D$ the zero divisor given by $s$. We prove that the \\emph{geometric} defect of $D$ (defect of truncation $1$) with respect to $f$ is zero. We also prove that $f$ almost misses general enough analytic subsets on $X$ of codimension $2$.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "98189     In this paper we prove two approximation results for divergence free vector fields. The first is a form of an assertion of J. Bourgain and H. Brezis concerning the approximation of solenoidal charges in the strict topology: Given $F \\in M_b(\\mathbb{R}^d;\\mathbb{R}^d)$ such that $\\operatorname*{div} F=0$ in the sense of distributions, there exist $C^1$ closed curves $\\{\\Gamma_{i,l}\\}_{\\{1,\\ldots,n_l\\}\\times \\mathbb{N}}$, with parameterization by arclength $\\gamma_{i,l} \\in C^1([0,L_{i,l}];\\mathbb{R}^d)$, $l \\leq L_{i,l} \\leq 2l$, for which \\[ F= \\lim_{l \\to \\infty} \\frac{\\|F\\|_{M_b(\\mathbb{R}^d;\\mathbb{R}^d)}}{n_l \\cdot l} \\sum_{i=1}^{n_l} \\dot{\\gamma}_{i,l} \\left.\\mathcal{H}^1\\right\\vert_{\\Gamma_{i,l}} \\] weakly-star as measures and \\begin{align*} \\lim_{l \\to \\infty} \\frac{1}{n_l \\cdot l} \\sum_{i=1}^{n_l} |\\Gamma_{i,l}| = 1. \\end{align*} The second, which is an almost immediate consequence of the first, is that smooth compactly supported functions are dense in \\[ \\left\\{ F \\in M_b(\\mathbb{R}^d;\\mathbb{R}^d): \\operatorname*{div}F=0 \\right\\} \\] with respect to the strict topology.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "79941     Ion transport in biological tissues is crucial in the study of many biological and pathological problems. Some multi-cellular structures, like smooth muscles on the vessel walls, could be treated as periodic bi-domain structures, which consist of intracellular space and extracellular space with semipermeable membranes in between. With the aid of two-scale homogenization theory, macro-scale models are proposed based on an electro-neutral (EN) microscale model with nonlinear interface conditions, where membranes are treated as combinations of capacitors and resistors. The connectivity of intracellular space is also taken into consideration. If the intracellular space is fully connected and forms a syncytium, then the macroscale model is a bidomain nonlinear coupled partial differential equations system. Otherwise, when the intracellular cells are not connected, the macroscale model for intracellular space is an ordinary differential system with source/sink terms from the connected extracellular space.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              \n",
       "57526     Quantum error-correcting codes are used to protect qubits involved in quantum computation. This process requires logical operators, acting on protected qubits, to be translated into physical operators (circuits) acting on physical quantum states. We propose a mathematical framework for synthesizing physical circuits that implement logical Clifford operators for stabilizer codes. Circuit synthesis is enabled by representing the desired physical Clifford operator in $\\mathbb{C}^{N \\times N}$ as a partial $2m \\times 2m$ binary symplectic matrix, where $N = 2^m$. We state and prove two theorems that use symplectic transvections to efficiently enumerate all binary symplectic matrices that satisfy a system of linear equations. As a corollary of these results, we prove that for an $[\\![ m,k ]\\!]$ stabilizer code every logical Clifford operator has $2^{r(r+1)/2}$ symplectic solutions, where $r = m-k$, up to stabilizer degeneracy. The desired physical circuits are then obtained by decomposing each solution into a product of elementary symplectic matrices, that correspond to elementary circuits. This enumeration of all physical realizations enables optimization over the ensemble with respect to a suitable metric. Furthermore, we show that any circuit that normalizes the stabilizer of the code can be transformed into a circuit that centralizes the stabilizer, while realizing the same logical operation. Our method of circuit synthesis can be applied to any stabilizer code, and this paper discusses a proof of concept synthesis for the $[\\![ 6,4,2 ]\\!]$ CSS code. Programs implementing the algorithms in this paper, which includes routines to solve for binary symplectic solutions of general linear systems and our overall LCS (logical circuit synthesis) algorithm, can be found at: https://github.com/nrenga/symplectic-arxiv18a    \n",
       "122545    We study the properties of a leave-node-out jackknife procedure for network data. Under the sparse graphon model, we prove an Efron-Stein-type inequality, showing that the network jackknife leads to conservative estimates of the variance (in expectation) for any network functional that is invariant to node permutation. For a general class of count functionals, we also establish consistency of the network jackknife. We complement our theoretical analysis with a range of simulated and real-data examples and show that the network jackknife offers competitive performance in cases where other resampling methods are known to be valid. In fact, for several network statistics, we see that the jackknife provides more accurate inferences compared to related methods such as subsampling.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "41275     There has been a lot of effort to construct good quantum codes from the classical error correcting codes. Constructing new quantum codes, using Hermitian self-orthogonal codes, seems to be a difficult problem in general. In this paper, Hermitian self-orthogonal codes are studied from algebraic function fields. Sufficient conditions for the Hermitian self-orthogonality of an algebraic geometry code are presented. New Hermitian self-orthogonal codes are constructed from projective lines, elliptic curves, hyper-elliptic curves, Hermitian curves, and Artin-Schreier curves. In addition, over the projective lines, we construct new families of MDS quantum codes with parameters $[[N,N-2K,K+1]]_q$ under the following conditions: i) $N=t(q-1)+1$ or $t(q-1)+2$ with $t|(q+1)$ and $K=\\lfloor\\frac{t(q-1)+1}{2t}\\rfloor+1$; ii) $(n-1)|(q^2-1)$, $N=n$ or $N=n+1$, $K_0=\\lfloor\\frac{n+q-1}{q+1}\\rfloor$, and $K\\ge K_0+1$; iii) $N=tq+1$, $\\forall~1\\le t\\le q$ and $K=\\lfloor\\frac{tq+q-1}{q+1}\\rfloor+1$; iv) $n|(q^2-1)$, $n_2=\\frac{n}{\\gcd (n,q+1)}$, $\\forall~ 1\\le t\\le \\frac{q-1}{n_2}-1$, $N=(t+1)n+2$ and $K=\\lfloor \\frac{(t+1)n+1+q-1}{q+1}\\rfloor+1$.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         \n",
       "69961     With G=GL(n,C), let $\\mathcal{X}_{\\Gamma}G$ be the G-character variety of a given finitely presented group $\\Gamma$, and let $\\mathcal{X}^{irr}_{\\Gamma}G \\subset \\mathcal{X}_{\\Gamma}G$ be the locus of irreducible representation conjugacy classes. We provide a concrete relation, in terms of plethystic functions, between the generating series for E- polynomials of $\\mathcal{X}_{\\Gamma}G$ and the one for $\\mathcal{X}^{irr}_{\\Gamma}G$, generalizing a formula of Mozgovoy-Reineke [MR]. The proof uses a natural stratification of $\\mathcal{X}_{\\Gamma}G$ coming from affine GIT, the combinatorics of partitions, and the formula of MacDonald-Cheah for symmetric products; we also adapt it to the so-called Cartan brane in the moduli space of Higgs bundles. Combining our methods with arithmetic ones yields explicit expressions for the E-polynomials of the irreducible stratum of GL(n,C)-character varieties of some groups $\\Gamma$, including surface groups, free groups, and torus knot groups, for low values of $n$.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               \n",
       "127482    Let $m, n$ be positive integers such that $m>1$ divides $n$. In this paper, we introduce a special class of piecewise-affine permutations of the finite set $[1, n]:=\\{1, \\ldots, n\\}$ with the property that the reduction $\\pmod m$ of $m$ consecutive elements in any of its cycles is, up to a cyclic shift, a fixed permutation of $[1, m]$. Our main result provides the cycle decomposition of such permutations. We further show that such permutations give rise to permutations of finite fields. In particular, we explicitly obtain classes of permutation polynomials of finite fields whose cycle decomposition and its inverse are explicitly given.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "13100     We study one-parameter deformations of Calabi-Yau type Fermat polynomial singularities along degree-one directions. We show that twisted sectors in the vanishing cohomology are components of automorphic forms for certain triangular groups. We prove consequentially that genus zero Gromov-Witten generating series of the corresponding Fermat Calabi-Yau varieties are components of automorphic forms. The main tools we use are mixed Hodge structures for quasi-homogeneous polynomial singularities, Riemann-Hilbert correspondence, and genus zero mirror symmetry.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "56062     We consider the \\textit{phase retrieval} problem of recovering a sparse signal $\\mathbf{x}$ in $\\mathbb{R}^d$ from intensity-only measurements in dimension $d \\geq 2$. Phase retrieval can be equivalently formulated as the problem of recovering a signal from its autocorrelation, which is in turn directly related to the combinatorial problem of recovering a set from its pairwise differences. In one spatial dimension, this problem is well studied and known as the \\textit{turnpike problem}. In this work, we present MISTR (Multidimensional Intersection Sparse supporT Recovery), an algorithm which exploits this formulation to recover the support of a multidimensional signal from magnitude-only measurements. MISTR takes advantage of the structure of multiple dimensions to provably achieve the same accuracy as the best one-dimensional algorithms in dramatically less time. We prove theoretically that MISTR correctly recovers the support of signals distributed as a Gaussian point process with high probability as long as sparsity is at most $\\mathcal{O}\\left(n^{d\\theta}\\right)$ for any $\\theta < 1/2$, where $n^d$ represents pixel size in a fixed image window. In the case that magnitude measurements are corrupted by noise, we provide a thresholding scheme with theoretical guarantees for sparsity at most $\\mathcal{O}\\left(n^{d\\theta}\\right)$ for $\\theta < 1/4$ that obviates the need for MISTR to explicitly handle noisy autocorrelation data. Detailed and reproducible numerical experiments demonstrate the effectiveness of our algorithm, showing that in practice MISTR enjoys time complexity which is nearly linear in the size of the input.                                                                                                                                                                                                  \n",
       "\n",
       "                                                cat  \n",
       "109618  [math.CV, math.AG, math.DS]                  \n",
       "98189   [math.AP, math.FA]                           \n",
       "79941   [math.AP, math.DS]                           \n",
       "57526   [quant-ph, cs.IT, math.IT]                   \n",
       "122545  [math.ST, stat.ME, stat.ML, stat.TH]         \n",
       "41275   [cs.IT, math.IT]                             \n",
       "69961   [math.AG, math.RT]                           \n",
       "127482  [math.NT, math.CO]                           \n",
       "13100   [math.AG, math-ph, math.CA, math.MP]         \n",
       "56062   [math.CO, cs.NA, eess.SP, math.NA, math.PR]  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Re-load the dataset and look at some categories and abstracts.\n",
    "import pandas as pd\n",
    "pd.set_option('display.max_colwidth', 0)\n",
    "\n",
    "data = pd.read_parquet('./data/arXiv.parquet')\n",
    "short_data = pd.read_parquet('./data/arXiv.parquet',columns=['abstract','cat'])\n",
    "short_data['abstract'] = short_data['abstract'].str.replace('\\n',' ')\n",
    "\n",
    "short_sample = short_data.sample(10)\n",
    "\n",
    "short_sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Record some complicated examples of latex present in abstracts\n",
    "\n",
    "indices = [139098,50283,169377,32935,38604,132354]\n",
    "\n",
    "## One idea for fixing K\\\"ahler and related. Find patterns of the form \\\" etc and replace them by the letter\n",
    "## That follows the \". Hence K\\\"ahler -> Kahler. HOWEVER, sometimes they encase the letter in {}:\n",
    "## K\\\"{a}hler for instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "      <th>cat</th>\n",
       "      <th>authors_parsed</th>\n",
       "      <th>update_date</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>139098</th>\n",
       "      <td>Telgarsky's conjecture may fail</td>\n",
       "      <td>Telg\\'arsky's conjecture states that for each $k \\in \\mathbb N$, there is a\\ntopological space $X_k$ such that in the Banach-Mazur game on $X_k$, the player\\n{\\scriptsize NONEMPTY} has a winning $(k+1)$-tactic but no winning $k$-tactic.\\nWe prove that this statement is consistently false.\\n  More specifically, we prove, assuming $\\mathsf{GCH}+\\square$, that if\\n{\\scriptsize NONEMPTY} has a winning strategy for the Banach-Mazur game on a\\n$T_3$ space $X$, then she has a winning $2$-tactic. The proof uses a coding\\nargument due to Galvin, whereby if $X$ has a $\\pi$-base with certain nice\\nproperties, then {\\scriptsize NONEMPTY} is able to encode, in each consecutive\\npair of her opponent's moves, all essential information about the play of the\\ngame before the current move. Our proof shows that under\\n$\\mathsf{GCH}+\\square$, every $T_3$ space has a sufficiently nice $\\pi$-base\\nthat enables this coding strategy.\\n  Translated into the language of partially ordered sets, what we really show\\nis that $\\mathsf{GCH}+\\square$ implies the following statement, which is\\nequivalent to the existence of the \"nice'' $\\pi$-bases mentioned above:\\n\\emph{Every separative poset $\\mathbb P$ with the $\\kappa$-cc contains a dense\\nsub-poset $\\mathbb D$ such that $|\\{ q \\in \\mathbb D \\,:\\, p \\text{ extends } q\\n\\}| &lt; \\kappa$ for every $p \\in \\mathbb P$.} We prove that this statement is\\nindependent of $\\mathsf{ZFC}$: while it holds under $\\mathsf{GCH}+\\square$, it\\nis false even for ccc posets if $\\mathfrak{b} &gt; \\aleph_1$. We also show that if\\n$|\\mathbb P| &lt; \\aleph_\\omega$, then \\axiom-for-$\\mathbb P$ is a consequence of\\n$\\mathsf{GCH}$ holding below $|\\mathbb P|$.\\n</td>\n",
       "      <td>[math.LO, math.GN]</td>\n",
       "      <td>[['Brian', 'Will', ''], ['Dow', 'Alan', ''], ['Milovich', 'David', ''], ['Yengulalp', 'Lynne', '']]</td>\n",
       "      <td>2019-12-10</td>\n",
       "      <td>1912.03327</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50283</th>\n",
       "      <td>Large Deviation Principle for the Greedy Exploration Algorithm over\\n  Erd\\\"os-R\\'enyi Graphs</td>\n",
       "      <td>We prove a large deviation principle for a greedy exploration process on an\\nErd\\\"os-R\\'enyi (ER) graph when the number of nodes goes to infinity. To prove\\nour main result, we use the general strategy to study large deviations of\\nprocesses proposed by Feng and Kurtz, based on the convergence of non-linear\\nsemigroups. The rate function can be expressed in a closed-form formula, and\\nassociated optimization problems can be solved explicitly, providing the large\\ndeviation trajectory. Also, we derive an LDP for the size of the maximum\\nindependent set discovered by such an algorithm and analyze the probability\\nthat it exceeds known bounds for the maximal independent set. We also analyze\\nthe link between these results and the landscape complexity of the independent\\nset and the exploration dynamic.\\n</td>\n",
       "      <td>[math.PR]</td>\n",
       "      <td>[['Bermolen', 'P.', ''], ['Goicoechea', 'V.', ''], ['Jonckheere', 'M.', ''], ['Mordecki', 'E.', '']]</td>\n",
       "      <td>2021-10-11</td>\n",
       "      <td>2007.04753</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169377</th>\n",
       "      <td>Orthogonal expansions related to compact Gelfand pairs</td>\n",
       "      <td>Given a compact Gelfand pair (G,K) and a locally compact group L, we\\ncharacterize the class P_K^\\sharp(G,L) of continuous positive definite\\nfunctions f:G\\times L\\to \\C which are bi-invariant in the G-variable with\\nrespect to K. The functions of this class are the functions having a uniformly\\nconvergent expansion \\sum_{\\varphi\\in Z} B(\\varphi)(u)\\varphi(x) for x\\in\\nG,u\\in L, where the sum is over the space Z of positive definite spherical\\nfunctions \\varphi:G\\to\\C for the Gelfand pair, and (B(\\varphi))_{\\varphi\\in Z}\\nis a family of continuous positive definite functions on L such that\\n\\sum_{\\varphi\\in Z}B(\\varphi)(e_L)&lt;\\infty. Here e_L is the neutral element of\\nthe group L. For a compact abelian group G considered as a Gelfand pair (G,K)\\nwith trivial K=\\{e_G\\}, we obtain a characterization of P(G\\times L) in terms\\nof Fourier expansions on the dual group \\widehat{G}.\\n  The result is described in detail for the case of the Gelfand pairs\\n(O(d+1),O(d)) and (U(q),U(q-1)) as well as for the product of these Gelfand\\npairs.\\n  The result generalizes recent theorems of Berg-Porcu (2016) and\\nGuella-Menegatto (2016)\\n</td>\n",
       "      <td>[math.CA]</td>\n",
       "      <td>[['Berg', 'Christian', '', 'University of Copenhagen'], ['Peron', 'Ana P.', '', 'ICMC-USP-São Carlos'], ['Porcu', 'Emilio', '', 'University Federico Santa Maria']]</td>\n",
       "      <td>2019-03-20</td>\n",
       "      <td>1612.03718</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32935</th>\n",
       "      <td>Congruent numbers, elliptic curves, and the passage from the local to\\n  the global: an update</td>\n",
       "      <td>This update to my article on Congruent numbers, elliptic curves, and the\\npassage from the local to the global, which appeared in Resonance, December\\n2009, pp. 1183--1205\\n(https://www.ias.ac.in/describe/article/reso/014/12/1183-1205) and was posted\\nhere as arXiv:0704.3783, covers a few recent advances in the arithmetic of\\nelliptic curves with special reference to the congruent number problem.\\n</td>\n",
       "      <td>[math.NT]</td>\n",
       "      <td>[['Dalawat', 'Chandan Singh', '']]</td>\n",
       "      <td>2022-02-09</td>\n",
       "      <td>2201.11071</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38604</th>\n",
       "      <td>Around the nonlinear Ryll-Nardzewski theorem</td>\n",
       "      <td>Suppose that $Q$ is a weak$^{\\ast }$ compact convex subset of a dual Banach\\nspace with the Radon-Nikod\\'{y}m property. We show that if $(S,Q)$ is a\\nnonexpansive and norm-distal dynamical system, then there is a fixed point of\\n$S$ in $Q$ and the set of fixed points is a nonexpansive retract of $Q.$ As a\\nconsequence we obtain a nonlinear extension of the Bader-Gelander-Monod theorem\\nconcerning isometries in $L$-embedded Banach spaces. A similar statement is\\nproved for weakly compact convex subsets of a locally convex space, thus giving\\nthe nonlinear counterpart of the Ryll-Nardzewski theorem.\\n</td>\n",
       "      <td>[math.DS, math.FA, math.GR]</td>\n",
       "      <td>[['Wiśnicki', 'Andrzej', '']]</td>\n",
       "      <td>2022-01-03</td>\n",
       "      <td>1903.12123</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>132354</th>\n",
       "      <td>New upper bounds for the bondage number of a graph in terms of its\\n  maximum degree and Euler characteristic</td>\n",
       "      <td>The bondage number $b(G)$ of a graph $G$ is the smallest number of edges\\nwhose removal from $G$ results in a graph with larger domination number. Let\\n$G$ be embeddable on a surface whose Euler characteristic $\\chi$ is as large as\\npossible, and assume $\\chi\\leq0$. Gagarin-Zverovich and Huang have recently\\nfound upper bounds of $b(G)$ in terms of the maximum degree $\\Delta(G)$ and the\\nEuler characteristic $\\chi(G)=\\chi$. In this paper we prove a better upper\\nbound $b(G)\\leq\\Delta(G)+\\lfloor t\\rfloor$ where $t$ is the largest real root\\nof the cubic equation $z^3 + z^2 + (3\\chi - 8)z + 9\\chi - 12=0$; this upper\\nbound is asymptotically equivalent to $b(G)\\leq\\Delta(G)+1+\\lfloor\\n\\sqrt{4-3\\chi} \\rfloor$. We also establish further improved upper bounds for\\n$b(G)$ when the girth, order, or size of the graph $G$ is large compared with\\nits Euler characteristic $\\chi$.\\n</td>\n",
       "      <td>[math.CO]</td>\n",
       "      <td>[['Huang', 'Jia', ''], ['Shen', 'Jian', '']]</td>\n",
       "      <td>2020-02-04</td>\n",
       "      <td>2002.00765</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                title  \\\n",
       "139098  Telgarsky's conjecture may fail                                                                                 \n",
       "50283   Large Deviation Principle for the Greedy Exploration Algorithm over\\n  Erd\\\"os-R\\'enyi Graphs                   \n",
       "169377  Orthogonal expansions related to compact Gelfand pairs                                                          \n",
       "32935   Congruent numbers, elliptic curves, and the passage from the local to\\n  the global: an update                  \n",
       "38604   Around the nonlinear Ryll-Nardzewski theorem                                                                    \n",
       "132354  New upper bounds for the bondage number of a graph in terms of its\\n  maximum degree and Euler characteristic   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             abstract  \\\n",
       "139098    Telg\\'arsky's conjecture states that for each $k \\in \\mathbb N$, there is a\\ntopological space $X_k$ such that in the Banach-Mazur game on $X_k$, the player\\n{\\scriptsize NONEMPTY} has a winning $(k+1)$-tactic but no winning $k$-tactic.\\nWe prove that this statement is consistently false.\\n  More specifically, we prove, assuming $\\mathsf{GCH}+\\square$, that if\\n{\\scriptsize NONEMPTY} has a winning strategy for the Banach-Mazur game on a\\n$T_3$ space $X$, then she has a winning $2$-tactic. The proof uses a coding\\nargument due to Galvin, whereby if $X$ has a $\\pi$-base with certain nice\\nproperties, then {\\scriptsize NONEMPTY} is able to encode, in each consecutive\\npair of her opponent's moves, all essential information about the play of the\\ngame before the current move. Our proof shows that under\\n$\\mathsf{GCH}+\\square$, every $T_3$ space has a sufficiently nice $\\pi$-base\\nthat enables this coding strategy.\\n  Translated into the language of partially ordered sets, what we really show\\nis that $\\mathsf{GCH}+\\square$ implies the following statement, which is\\nequivalent to the existence of the \"nice'' $\\pi$-bases mentioned above:\\n\\emph{Every separative poset $\\mathbb P$ with the $\\kappa$-cc contains a dense\\nsub-poset $\\mathbb D$ such that $|\\{ q \\in \\mathbb D \\,:\\, p \\text{ extends } q\\n\\}| < \\kappa$ for every $p \\in \\mathbb P$.} We prove that this statement is\\nindependent of $\\mathsf{ZFC}$: while it holds under $\\mathsf{GCH}+\\square$, it\\nis false even for ccc posets if $\\mathfrak{b} > \\aleph_1$. We also show that if\\n$|\\mathbb P| < \\aleph_\\omega$, then \\axiom-for-$\\mathbb P$ is a consequence of\\n$\\mathsf{GCH}$ holding below $|\\mathbb P|$.\\n   \n",
       "50283     We prove a large deviation principle for a greedy exploration process on an\\nErd\\\"os-R\\'enyi (ER) graph when the number of nodes goes to infinity. To prove\\nour main result, we use the general strategy to study large deviations of\\nprocesses proposed by Feng and Kurtz, based on the convergence of non-linear\\nsemigroups. The rate function can be expressed in a closed-form formula, and\\nassociated optimization problems can be solved explicitly, providing the large\\ndeviation trajectory. Also, we derive an LDP for the size of the maximum\\nindependent set discovered by such an algorithm and analyze the probability\\nthat it exceeds known bounds for the maximal independent set. We also analyze\\nthe link between these results and the landscape complexity of the independent\\nset and the exploration dynamic.\\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "169377    Given a compact Gelfand pair (G,K) and a locally compact group L, we\\ncharacterize the class P_K^\\sharp(G,L) of continuous positive definite\\nfunctions f:G\\times L\\to \\C which are bi-invariant in the G-variable with\\nrespect to K. The functions of this class are the functions having a uniformly\\nconvergent expansion \\sum_{\\varphi\\in Z} B(\\varphi)(u)\\varphi(x) for x\\in\\nG,u\\in L, where the sum is over the space Z of positive definite spherical\\nfunctions \\varphi:G\\to\\C for the Gelfand pair, and (B(\\varphi))_{\\varphi\\in Z}\\nis a family of continuous positive definite functions on L such that\\n\\sum_{\\varphi\\in Z}B(\\varphi)(e_L)<\\infty. Here e_L is the neutral element of\\nthe group L. For a compact abelian group G considered as a Gelfand pair (G,K)\\nwith trivial K=\\{e_G\\}, we obtain a characterization of P(G\\times L) in terms\\nof Fourier expansions on the dual group \\widehat{G}.\\n  The result is described in detail for the case of the Gelfand pairs\\n(O(d+1),O(d)) and (U(q),U(q-1)) as well as for the product of these Gelfand\\npairs.\\n  The result generalizes recent theorems of Berg-Porcu (2016) and\\nGuella-Menegatto (2016)\\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "32935     This update to my article on Congruent numbers, elliptic curves, and the\\npassage from the local to the global, which appeared in Resonance, December\\n2009, pp. 1183--1205\\n(https://www.ias.ac.in/describe/article/reso/014/12/1183-1205) and was posted\\nhere as arXiv:0704.3783, covers a few recent advances in the arithmetic of\\nelliptic curves with special reference to the congruent number problem.\\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "38604     Suppose that $Q$ is a weak$^{\\ast }$ compact convex subset of a dual Banach\\nspace with the Radon-Nikod\\'{y}m property. We show that if $(S,Q)$ is a\\nnonexpansive and norm-distal dynamical system, then there is a fixed point of\\n$S$ in $Q$ and the set of fixed points is a nonexpansive retract of $Q.$ As a\\nconsequence we obtain a nonlinear extension of the Bader-Gelander-Monod theorem\\nconcerning isometries in $L$-embedded Banach spaces. A similar statement is\\nproved for weakly compact convex subsets of a locally convex space, thus giving\\nthe nonlinear counterpart of the Ryll-Nardzewski theorem.\\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                \n",
       "132354    The bondage number $b(G)$ of a graph $G$ is the smallest number of edges\\nwhose removal from $G$ results in a graph with larger domination number. Let\\n$G$ be embeddable on a surface whose Euler characteristic $\\chi$ is as large as\\npossible, and assume $\\chi\\leq0$. Gagarin-Zverovich and Huang have recently\\nfound upper bounds of $b(G)$ in terms of the maximum degree $\\Delta(G)$ and the\\nEuler characteristic $\\chi(G)=\\chi$. In this paper we prove a better upper\\nbound $b(G)\\leq\\Delta(G)+\\lfloor t\\rfloor$ where $t$ is the largest real root\\nof the cubic equation $z^3 + z^2 + (3\\chi - 8)z + 9\\chi - 12=0$; this upper\\nbound is asymptotically equivalent to $b(G)\\leq\\Delta(G)+1+\\lfloor\\n\\sqrt{4-3\\chi} \\rfloor$. We also establish further improved upper bounds for\\n$b(G)$ when the girth, order, or size of the graph $G$ is large compared with\\nits Euler characteristic $\\chi$.\\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "\n",
       "                                cat  \\\n",
       "139098  [math.LO, math.GN]            \n",
       "50283   [math.PR]                     \n",
       "169377  [math.CA]                     \n",
       "32935   [math.NT]                     \n",
       "38604   [math.DS, math.FA, math.GR]   \n",
       "132354  [math.CO]                     \n",
       "\n",
       "                                                                                                                                                             authors_parsed  \\\n",
       "139098  [['Brian', 'Will', ''], ['Dow', 'Alan', ''], ['Milovich', 'David', ''], ['Yengulalp', 'Lynne', '']]                                                                   \n",
       "50283   [['Bermolen', 'P.', ''], ['Goicoechea', 'V.', ''], ['Jonckheere', 'M.', ''], ['Mordecki', 'E.', '']]                                                                  \n",
       "169377  [['Berg', 'Christian', '', 'University of Copenhagen'], ['Peron', 'Ana P.', '', 'ICMC-USP-São Carlos'], ['Porcu', 'Emilio', '', 'University Federico Santa Maria']]   \n",
       "32935   [['Dalawat', 'Chandan Singh', '']]                                                                                                                                    \n",
       "38604   [['Wiśnicki', 'Andrzej', '']]                                                                                                                                         \n",
       "132354  [['Huang', 'Jia', ''], ['Shen', 'Jian', '']]                                                                                                                          \n",
       "\n",
       "       update_date          id  \n",
       "139098 2019-12-10   1912.03327  \n",
       "50283  2021-10-11   2007.04753  \n",
       "169377 2019-03-20   1612.03718  \n",
       "32935  2022-02-09   2201.11071  \n",
       "38604  2022-01-03   1903.12123  \n",
       "132354 2020-02-04   2002.00765  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Let's isolate these papers \n",
    "\n",
    "examples = data.iloc[indices]\n",
    "examples"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Cleaning"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Re-construct the categories as a list object and re-arrange the columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "ename": "ArrowInvalid",
     "evalue": "No match for FieldRef.Name(new_cat) in title: string\nabstract: string\ncat: list<item: string>\nauthors_parsed: string\nupdate_date: timestamp[us]\nid: string\n__fragment_index: int32\n__batch_index: int32\n__last_in_fragment: bool\n__filename: string",
     "output_type": "error",
     "traceback": [
      "ArrowInvalid: No match for FieldRef.Name(new_cat) in title: string\nabstract: string\ncat: list<item: string>\nauthors_parsed: string\nupdate_date: timestamp[us]\nid: string\n__fragment_index: int32\n__batch_index: int32\n__last_in_fragment: bool\n__filename: string"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
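    "## Load the raw data. This cell needs the original 'new_cat' column (a stringified list);\n",
    "## once the next cell has overwritten the file with the 'cat' column, re-running it fails.\n",
    "df = pd.read_parquet('./data/arXiv.parquet', columns=['title', 'abstract', 'new_cat'])\n",
    "cat = df.new_cat.iloc[0]\n",
    "\n",
    "## First we need to convert the 'stringified list' back into a list.\n",
    "\n",
    "def to_list(string):\n",
    "    out = []\n",
    "    ## Characters to strip: brackets and both kinds of quote marks\n",
    "    cs = ['[', ']', \"'\", '\"']\n",
    "    for item in string.split(', '):\n",
    "        ## Remove brackets and quote marks\n",
    "        for char in cs:\n",
    "            item = item.replace(char, '')\n",
    "        ## Add to output\n",
    "        out.append(item)\n",
    "    return out\n",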
    "\n",
    "\n",
    "test = to_list(cat)\n",
    "for x in test:\n",
    "    print(x)"
   ]
  },
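  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of an alternative, assuming each `new_cat` entry is a valid Python list literal such as `['math.QA', 'hep-th']`: the standard library's `ast.literal_eval` parses the string directly, so no character stripping is needed. The helper name `to_list_literal` is hypothetical and not part of the original notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ast\n",
    "\n",
    "## Hypothetical helper (not in the original notebook): parse a stringified list\n",
    "def to_list_literal(string):\n",
    "    ## literal_eval only evaluates Python literals, so it will not run arbitrary code\n",
    "    return list(ast.literal_eval(string))\n",
    "\n",
    "## Toy input in the assumed format\n",
    "print(to_list_literal(\"['math.QA', 'hep-th', 'math.RT']\"))"
   ]
  },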
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "      <th>cat</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Vertex representations via finite groups and t...</td>\n",
       "      <td>Given a finite group $\\Gamma$ and a virtual ...</td>\n",
       "      <td>[math.QA, hep-th, math.RT]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Categoricity and amalgamation for AEC and $ \\k...</td>\n",
       "      <td>In the original version of this paper, we as...</td>\n",
       "      <td>[math.LO]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>From Loop Groups to 2-Groups</td>\n",
       "      <td>We describe an interesting relation between ...</td>\n",
       "      <td>[math.QA, hep-th, math.DG]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Finite Supersymmetry Transformations</td>\n",
       "      <td>We investigate simple examples of supersymme...</td>\n",
       "      <td>[quant-ph, hep-th, math-ph, math.MP]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Super black box (formerly: Middle diamond)</td>\n",
       "      <td>This is a slightly corrected version of an o...</td>\n",
       "      <td>[math.LO]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               title  \\\n",
       "0  Vertex representations via finite groups and t...   \n",
       "1  Categoricity and amalgamation for AEC and $ \\k...   \n",
       "2                       From Loop Groups to 2-Groups   \n",
       "3               Finite Supersymmetry Transformations   \n",
       "4         Super black box (formerly: Middle diamond)   \n",
       "\n",
       "                                            abstract  \\\n",
       "0    Given a finite group $\\Gamma$ and a virtual ...   \n",
       "1    In the original version of this paper, we as...   \n",
       "2    We describe an interesting relation between ...   \n",
       "3    We investigate simple examples of supersymme...   \n",
       "4    This is a slightly corrected version of an o...   \n",
       "\n",
       "                                    cat  \n",
       "0            [math.QA, hep-th, math.RT]  \n",
       "1                             [math.LO]  \n",
       "2            [math.QA, hep-th, math.DG]  \n",
       "3  [quant-ph, hep-th, math-ph, math.MP]  \n",
       "4                             [math.LO]  "
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "##  Re-load\n",
    "df = pd.read_parquet('./data/arXiv.parquet')\n",
    "\n",
    "## Create cat column with categories as a list object\n",
    "df['cat'] = df.new_cat.apply(to_list)\n",
    "\n",
    "## Drop the old 'new_cat' column\n",
    "df = df.drop('new_cat',axis=1)\n",
    "\n",
    "## Re-arrange the columns\n",
    "df = df[['title','abstract','cat','authors_parsed','update_date','id']]\n",
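    "\n",
    "## Overwrite the original file in place; after this the 'new_cat' column no longer exists on disk\n",
    "df.to_parquet('./data/arXiv.parquet')\n",
    "\n",
    "## Note: when the file is read back, the 'cat' column comes in as a numpy array of strings rather than a Python list (this is how pandas loads list columns from parquet)."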
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Remove newline characters from the abstract."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "data['abstract'] = data['abstract'].str.replace('\\n', ' ', regex=False)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Translate the subject classifications to English and one-hot-encode them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "      <th>cat</th>\n",
       "      <th>authors_parsed</th>\n",
       "      <th>update_date</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Vertex representations via finite groups and the McKay correspondence</td>\n",
       "      <td>Given a finite group $\\Gamma$ and a virtual character $\\wt$ on it, we construct a Fock space and associated vertex operators in terms of representation ring of wreath products $\\Gamma\\sim S_n$. We recover the character tables of wreath products $\\Gamma\\sim S_n$ by vertex operator calculus. When $\\Gamma$ is a finite subgroup of $SU_2$, our construction yields a group theoretic realization of the basic representations of the affine and toroidal Lie algebras of $ADE$ type, which can be regarded as a new form of McKay correspondence.</td>\n",
       "      <td>[Quantum Algebra, High Energy Physics - Theory, Representation Theory]</td>\n",
       "      <td>[['Frenkel', 'Igor', ''], ['Jing', 'Naihuan', ''], ['Wang', 'Weiqiang', '']]</td>\n",
       "      <td>2023-05-19</td>\n",
       "      <td>math/9907166</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Categoricity and amalgamation for AEC and $ \\kappa $ measurable</td>\n",
       "      <td>In the original version of this paper, we assume a theory $T$ that the logic $\\mathbb L _{\\kappa, \\aleph_{0}}$ is categorical in a cardinal $\\lambda &gt; \\kappa$, and $\\kappa$ is a measurable cardinal. There we prove that the class of model of $T$ of cardinality $&lt;\\lambda$ (but $\\geq |T|+\\kappa$) has the amalgamation property; this is a step toward understanding the character of such classes of models.   In this revised version we replaced the class of models of $T$ by $\\mathfrak k$, an AEC (abstract elementary class) which has LS-number ${&lt;} \\, \\kappa,$ or at least which behave nicely for ultrapowers by $D$, a normal ultra-filter on $\\kappa$.   Presently sub-section \\S1A deals with $T \\subseteq \\mathbb L_{\\kappa^{+}, \\aleph_{0}}$ (and so does a large part of the introduction and little in the rest of \\S1), but otherwise, all is done in the context of AEC.</td>\n",
       "      <td>[Logic]</td>\n",
       "      <td>[['Kolman', 'Oren', ''], ['Shelah', 'Saharon', '']]</td>\n",
       "      <td>2023-05-19</td>\n",
       "      <td>math/9602216</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>From Loop Groups to 2-Groups</td>\n",
       "      <td>We describe an interesting relation between Lie 2-algebras, the Kac-Moody central extensions of loop groups, and the group $\\mathrm{String}(n)$. A Lie 2-algebra is a categorified version of a Lie algebra where the Jacobi identity holds up to a natural isomorphism called the \"Jacobiator\". Similarly, a Lie 2-group is a categorified version of a Lie group. If $G$ is a simply-connected compact simple Lie group, there is a 1-parameter family of Lie 2-algebras $\\mathfrak{g}_k$ each having $\\mathrm{Lie}(G)$ as its Lie algebra of objects, but with a Jacobiator built from the canonical 3-form on $G$. There appears to be no Lie 2-group having $\\mathfrak{g}_k$ as its Lie 2-algebra, except when $k = 0$. Here, however, we construct for integral k an infinite-dimensional Lie 2-group whose Lie 2-algebra is equivalent to $\\mathfrak{g}_k$. The objects of this 2-group are based paths in $G$, while the automorphisms of any object form the level-$k$ Kac-Moody central extension of the loop group of $G$. This 2-group is closely related to the $k$th power of the canonical gerbe over $G$. Its nerve gives a topological group that is an extension of $G$ by $K(\\mathbb{Z},2)$. When $k = \\pm 1$, this topological group can also be obtained by killing the third homotopy group of $G$. Thus, when $G = \\mathrm{Spin}(n)$, it is none other than $\\mathrm{String}(n)$.</td>\n",
       "      <td>[Quantum Algebra, High Energy Physics - Theory, Differential Geometry]</td>\n",
       "      <td>[['Baez', 'John C.', ''], ['Crans', 'Alissa S.', ''], ['Stevenson', 'Danny', ''], ['Schreiber', 'Urs', '']]</td>\n",
       "      <td>2023-05-16</td>\n",
       "      <td>math/0504123</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Finite Supersymmetry Transformations</td>\n",
       "      <td>We investigate simple examples of supersymmetry algebras with real and Grassmann parameters. Special attention is payed to the finite supertransformations and their probability interpretation. Furthermore we look for combinations of bosons and fermions which are invariant under supertransformations. These combinations correspond to states that are highly entangled.</td>\n",
       "      <td>[Quantum Physics, High Energy Physics - Theory, Mathematical Physics, Mathematical Physics]</td>\n",
       "      <td>[['Ilieva', 'Nevena', ''], ['Narnhofer', 'Heide', ''], ['Thirring', 'Walter', '']]</td>\n",
       "      <td>2023-05-09</td>\n",
       "      <td>quant-ph/0401139</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Super black box (formerly: Middle diamond)</td>\n",
       "      <td>This is a slightly corrected version of an old work.   Under certain cardinal arithmetic assumptions, we prove that for every large enough regular $\\lambda$ cardinal, for many regular $\\kappa &lt; \\lambda$, many stationary subsets of $\\lambda$ concentrating on cofinality $\\kappa$ have super BB. In particular, we have the super BB on $\\{\\delta &lt; \\lambda \\colon cf(\\delta) = \\kappa\\}$. This is a strong negation of uniformization.   We have added some details. Works continuing it are [Sh:898] and [Sh:1028]. We thank Ari Brodski and Adi Jarden for their helpful comments.   In this paper we had earlier used the notion ``middle diamond\" which is now replaced by ``super BB'', that is, ``super black box'', in order to be consistent with other papers (see [Sh:898]).</td>\n",
       "      <td>[Logic]</td>\n",
       "      <td>[['Shelah', 'Saharon', '']]</td>\n",
       "      <td>2023-05-04</td>\n",
       "      <td>math/0212249</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                   title  \\\n",
       "0  Vertex representations via finite groups and the McKay correspondence   \n",
       "1  Categoricity and amalgamation for AEC and $ \\kappa $ measurable         \n",
       "2  From Loop Groups to 2-Groups                                            \n",
       "3  Finite Supersymmetry Transformations                                    \n",
       "4  Super black box (formerly: Middle diamond)                              \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      abstract  \\\n",
       "0    Given a finite group $\\Gamma$ and a virtual character $\\wt$ on it, we construct a Fock space and associated vertex operators in terms of representation ring of wreath products $\\Gamma\\sim S_n$. We recover the character tables of wreath products $\\Gamma\\sim S_n$ by vertex operator calculus. When $\\Gamma$ is a finite subgroup of $SU_2$, our construction yields a group theoretic realization of the basic representations of the affine and toroidal Lie algebras of $ADE$ type, which can be regarded as a new form of McKay correspondence.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "1    In the original version of this paper, we assume a theory $T$ that the logic $\\mathbb L _{\\kappa, \\aleph_{0}}$ is categorical in a cardinal $\\lambda > \\kappa$, and $\\kappa$ is a measurable cardinal. There we prove that the class of model of $T$ of cardinality $<\\lambda$ (but $\\geq |T|+\\kappa$) has the amalgamation property; this is a step toward understanding the character of such classes of models.   In this revised version we replaced the class of models of $T$ by $\\mathfrak k$, an AEC (abstract elementary class) which has LS-number ${<} \\, \\kappa,$ or at least which behave nicely for ultrapowers by $D$, a normal ultra-filter on $\\kappa$.   Presently sub-section \\S1A deals with $T \\subseteq \\mathbb L_{\\kappa^{+}, \\aleph_{0}}$ (and so does a large part of the introduction and little in the rest of \\S1), but otherwise, all is done in the context of AEC.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "2    We describe an interesting relation between Lie 2-algebras, the Kac-Moody central extensions of loop groups, and the group $\\mathrm{String}(n)$. A Lie 2-algebra is a categorified version of a Lie algebra where the Jacobi identity holds up to a natural isomorphism called the \"Jacobiator\". Similarly, a Lie 2-group is a categorified version of a Lie group. If $G$ is a simply-connected compact simple Lie group, there is a 1-parameter family of Lie 2-algebras $\\mathfrak{g}_k$ each having $\\mathrm{Lie}(G)$ as its Lie algebra of objects, but with a Jacobiator built from the canonical 3-form on $G$. There appears to be no Lie 2-group having $\\mathfrak{g}_k$ as its Lie 2-algebra, except when $k = 0$. Here, however, we construct for integral k an infinite-dimensional Lie 2-group whose Lie 2-algebra is equivalent to $\\mathfrak{g}_k$. The objects of this 2-group are based paths in $G$, while the automorphisms of any object form the level-$k$ Kac-Moody central extension of the loop group of $G$. This 2-group is closely related to the $k$th power of the canonical gerbe over $G$. Its nerve gives a topological group that is an extension of $G$ by $K(\\mathbb{Z},2)$. When $k = \\pm 1$, this topological group can also be obtained by killing the third homotopy group of $G$. Thus, when $G = \\mathrm{Spin}(n)$, it is none other than $\\mathrm{String}(n)$.    \n",
       "3    We investigate simple examples of supersymmetry algebras with real and Grassmann parameters. Special attention is payed to the finite supertransformations and their probability interpretation. Furthermore we look for combinations of bosons and fermions which are invariant under supertransformations. These combinations correspond to states that are highly entangled.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "4    This is a slightly corrected version of an old work.   Under certain cardinal arithmetic assumptions, we prove that for every large enough regular $\\lambda$ cardinal, for many regular $\\kappa < \\lambda$, many stationary subsets of $\\lambda$ concentrating on cofinality $\\kappa$ have super BB. In particular, we have the super BB on $\\{\\delta < \\lambda \\colon cf(\\delta) = \\kappa\\}$. This is a strong negation of uniformization.   We have added some details. Works continuing it are [Sh:898] and [Sh:1028]. We thank Ari Brodski and Adi Jarden for their helpful comments.   In this paper we had earlier used the notion ``middle diamond\" which is now replaced by ``super BB'', that is, ``super black box'', in order to be consistent with other papers (see [Sh:898]).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "\n",
       "                                                                                           cat  \\\n",
       "0  [Quantum Algebra, High Energy Physics - Theory, Representation Theory]                        \n",
       "1  [Logic]                                                                                       \n",
       "2  [Quantum Algebra, High Energy Physics - Theory, Differential Geometry]                        \n",
       "3  [Quantum Physics, High Energy Physics - Theory, Mathematical Physics, Mathematical Physics]   \n",
       "4  [Logic]                                                                                       \n",
       "\n",
       "                                                                                                authors_parsed  \\\n",
       "0  [['Frenkel', 'Igor', ''], ['Jing', 'Naihuan', ''], ['Wang', 'Weiqiang', '']]                                  \n",
       "1  [['Kolman', 'Oren', ''], ['Shelah', 'Saharon', '']]                                                           \n",
       "2  [['Baez', 'John C.', ''], ['Crans', 'Alissa S.', ''], ['Stevenson', 'Danny', ''], ['Schreiber', 'Urs', '']]   \n",
       "3  [['Ilieva', 'Nevena', ''], ['Narnhofer', 'Heide', ''], ['Thirring', 'Walter', '']]                            \n",
       "4  [['Shelah', 'Saharon', '']]                                                                                   \n",
       "\n",
       "  update_date                id  \n",
       "0 2023-05-19   math/9907166      \n",
       "1 2023-05-19   math/9602216      \n",
       "2 2023-05-16   math/0504123      \n",
       "3 2023-05-09   quant-ph/0401139  \n",
       "4 2023-05-04   math/0212249      "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.preprocessing import MultiLabelBinarizer\n",
    "import util\n",
    "\n",
    "# Build the tag -> English name lookup once instead of on every row,\n",
    "# and avoid shadowing the built-in `map`.\n",
    "category_names = util.category_map()\n",
    "\n",
    "def name(cat_list):\n",
    "    \"\"\"Translate arXiv category tags to English names; unmapped tags become 'UNK'.\"\"\"\n",
    "    return [category_names.get(tag, 'UNK') for tag in cat_list]\n",
    "\n",
    "data['cat'] = data['cat'].apply(name)\n",
    "data.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Accelerator Physics</th>\n",
       "      <th>Adaptation and Self-Organizing Systems</th>\n",
       "      <th>Algebraic Geometry</th>\n",
       "      <th>Algebraic Topology</th>\n",
       "      <th>Analysis of PDEs</th>\n",
       "      <th>Applications</th>\n",
       "      <th>Applied Physics</th>\n",
       "      <th>Artificial Intelligence</th>\n",
       "      <th>Astrophysics</th>\n",
       "      <th>Astrophysics of Galaxies</th>\n",
       "      <th>...</th>\n",
       "      <th>Strongly Correlated Electrons</th>\n",
       "      <th>Subcellular Processes</th>\n",
       "      <th>Superconductivity</th>\n",
       "      <th>Symbolic Computation</th>\n",
       "      <th>Symplectic Geometry</th>\n",
       "      <th>Systems and Control</th>\n",
       "      <th>Theoretical Economics</th>\n",
       "      <th>Tissues and Organs</th>\n",
       "      <th>Trading and Market Microstructure</th>\n",
       "      <th>UNK</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 150 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Accelerator Physics  Adaptation and Self-Organizing Systems  \\\n",
       "0  0                    0                                        \n",
       "1  0                    0                                        \n",
       "2  0                    0                                        \n",
       "3  0                    0                                        \n",
       "4  0                    0                                        \n",
       "\n",
       "   Algebraic Geometry  Algebraic Topology  Analysis of PDEs  Applications  \\\n",
       "0  0                   0                   0                 0              \n",
       "1  0                   0                   0                 0              \n",
       "2  0                   0                   0                 0              \n",
       "3  0                   0                   0                 0              \n",
       "4  0                   0                   0                 0              \n",
       "\n",
       "   Applied Physics  Artificial Intelligence  Astrophysics  \\\n",
       "0  0                0                        0              \n",
       "1  0                0                        0              \n",
       "2  0                0                        0              \n",
       "3  0                0                        0              \n",
       "4  0                0                        0              \n",
       "\n",
       "   Astrophysics of Galaxies  ...  Strongly Correlated Electrons  \\\n",
       "0  0                         ...  0                               \n",
       "1  0                         ...  0                               \n",
       "2  0                         ...  0                               \n",
       "3  0                         ...  0                               \n",
       "4  0                         ...  0                               \n",
       "\n",
       "   Subcellular Processes  Superconductivity  Symbolic Computation  \\\n",
       "0  0                      0                  0                      \n",
       "1  0                      0                  0                      \n",
       "2  0                      0                  0                      \n",
       "3  0                      0                  0                      \n",
       "4  0                      0                  0                      \n",
       "\n",
       "   Symplectic Geometry  Systems and Control  Theoretical Economics  \\\n",
       "0  0                    0                    0                       \n",
       "1  0                    0                    0                       \n",
       "2  0                    0                    0                       \n",
       "3  0                    0                    0                       \n",
       "4  0                    0                    0                       \n",
       "\n",
       "   Tissues and Organs  Trading and Market Microstructure  UNK  \n",
       "0  0                   0                                  0    \n",
       "1  0                   0                                  0    \n",
       "2  0                   0                                  0    \n",
       "3  0                   0                                  0    \n",
       "4  0                   0                                  0    \n",
       "\n",
       "[5 rows x 150 columns]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# One-hot encode the English category names\n",
    "\n",
    "mlb = MultiLabelBinarizer()\n",
    "OHE_cat_array = mlb.fit_transform(data['cat'])\n",
    "OHE_cat_data = pd.DataFrame(OHE_cat_array, columns=mlb.classes_)\n",
    "OHE_cat_data.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Accelerator Physics</th>\n",
       "      <th>Adaptation and Self-Organizing Systems</th>\n",
       "      <th>Algebraic Geometry</th>\n",
       "      <th>Algebraic Topology</th>\n",
       "      <th>Analysis of PDEs</th>\n",
       "      <th>Applications</th>\n",
       "      <th>Applied Physics</th>\n",
       "      <th>Artificial Intelligence</th>\n",
       "      <th>Astrophysics</th>\n",
       "      <th>Astrophysics of Galaxies</th>\n",
       "      <th>...</th>\n",
       "      <th>Strongly Correlated Electrons</th>\n",
       "      <th>Subcellular Processes</th>\n",
       "      <th>Superconductivity</th>\n",
       "      <th>Symbolic Computation</th>\n",
       "      <th>Symplectic Geometry</th>\n",
       "      <th>Systems and Control</th>\n",
       "      <th>Theoretical Economics</th>\n",
       "      <th>Tissues and Organs</th>\n",
       "      <th>Trading and Market Microstructure</th>\n",
       "      <th>UNK</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 150 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Accelerator Physics  Adaptation and Self-Organizing Systems  \\\n",
       "0  False                False                                    \n",
       "1  False                False                                    \n",
       "2  False                False                                    \n",
       "3  False                False                                    \n",
       "4  False                False                                    \n",
       "\n",
       "   Algebraic Geometry  Algebraic Topology  Analysis of PDEs  Applications  \\\n",
       "0  False               False               False             False          \n",
       "1  False               False               False             False          \n",
       "2  False               False               False             False          \n",
       "3  False               False               False             False          \n",
       "4  False               False               False             False          \n",
       "\n",
       "   Applied Physics  Artificial Intelligence  Astrophysics  \\\n",
       "0  False            False                    False          \n",
       "1  False            False                    False          \n",
       "2  False            False                    False          \n",
       "3  False            False                    False          \n",
       "4  False            False                    False          \n",
       "\n",
       "   Astrophysics of Galaxies  ...  Strongly Correlated Electrons  \\\n",
       "0  False                     ...  False                           \n",
       "1  False                     ...  False                           \n",
       "2  False                     ...  False                           \n",
       "3  False                     ...  False                           \n",
       "4  False                     ...  False                           \n",
       "\n",
       "   Subcellular Processes  Superconductivity  Symbolic Computation  \\\n",
       "0  False                  False              False                  \n",
       "1  False                  False              False                  \n",
       "2  False                  False              False                  \n",
       "3  False                  False              False                  \n",
       "4  False                  False              False                  \n",
       "\n",
       "   Symplectic Geometry  Systems and Control  Theoretical Economics  \\\n",
       "0  False                False                False                   \n",
       "1  False                False                False                   \n",
       "2  False                False                False                   \n",
       "3  False                False                False                   \n",
       "4  False                False                False                   \n",
       "\n",
       "   Tissues and Organs  Trading and Market Microstructure    UNK  \n",
       "0  False               False                              False  \n",
       "1  False               False                              False  \n",
       "2  False               False                              False  \n",
       "3  False               False                              False  \n",
       "4  False               False                              False  \n",
       "\n",
       "[5 rows x 150 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Store the one-hot columns as booleans\n",
    "\n",
    "OHE_cat_data = OHE_cat_data.astype(bool)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Accelerator Physics</th>\n",
       "      <th>Adaptation and Self-Organizing Systems</th>\n",
       "      <th>Algebraic Geometry</th>\n",
       "      <th>Algebraic Topology</th>\n",
       "      <th>Analysis of PDEs</th>\n",
       "      <th>Applications</th>\n",
       "      <th>Applied Physics</th>\n",
       "      <th>Artificial Intelligence</th>\n",
       "      <th>Astrophysics</th>\n",
       "      <th>Astrophysics of Galaxies</th>\n",
       "      <th>...</th>\n",
       "      <th>Strongly Correlated Electrons</th>\n",
       "      <th>Subcellular Processes</th>\n",
       "      <th>Superconductivity</th>\n",
       "      <th>Symbolic Computation</th>\n",
       "      <th>Symplectic Geometry</th>\n",
       "      <th>Systems and Control</th>\n",
       "      <th>Theoretical Economics</th>\n",
       "      <th>Tissues and Organs</th>\n",
       "      <th>Trading and Market Microstructure</th>\n",
       "      <th>UNK</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>79265</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59743</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>94748</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36055</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34908</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>125441</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139894</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17363</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>73015</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>121373</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100029</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>158077</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55299</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>106954</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>156511</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38431</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>174699</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>158460</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44682</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>134265</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>157521</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84402</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149114</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>132594</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34793</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>133942</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81413</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>170547</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77366</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67126</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15773</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167140</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>134908</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>130866</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>124993</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21799</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55986</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>133618</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>76319</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55552</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70969</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42821</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98648</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99294</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>80464</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>126915</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>136317</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55150</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139772</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25873</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>50 rows × 150 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        Accelerator Physics  Adaptation and Self-Organizing Systems  \\\n",
       "79265   False                False                                    \n",
       "59743   False                False                                    \n",
       "94748   False                False                                    \n",
       "36055   False                False                                    \n",
       "34908   False                False                                    \n",
       "125441  False                False                                    \n",
       "139894  False                False                                    \n",
       "17363   False                False                                    \n",
       "73015   False                False                                    \n",
       "121373  False                False                                    \n",
       "100029  False                False                                    \n",
       "158077  False                False                                    \n",
       "55299   False                False                                    \n",
       "106954  False                False                                    \n",
       "156511  False                False                                    \n",
       "38431   False                False                                    \n",
       "174699  False                False                                    \n",
       "158460  False                False                                    \n",
       "44682   False                False                                    \n",
       "134265  False                False                                    \n",
       "157521  False                False                                    \n",
       "84402   False                False                                    \n",
       "149114  False                False                                    \n",
       "132594  False                False                                    \n",
       "34793   False                False                                    \n",
       "133942  False                False                                    \n",
       "81413   False                False                                    \n",
       "170547  False                False                                    \n",
       "77366   False                False                                    \n",
       "67126   False                False                                    \n",
       "15773   False                False                                    \n",
       "167140  False                False                                    \n",
       "134908  False                False                                    \n",
       "130866  False                False                                    \n",
       "124993  False                False                                    \n",
       "21799   False                False                                    \n",
       "55986   False                False                                    \n",
       "133618  False                False                                    \n",
       "76319   False                False                                    \n",
       "55552   False                False                                    \n",
       "70969   False                False                                    \n",
       "42821   False                False                                    \n",
       "98648   False                False                                    \n",
       "99294   False                False                                    \n",
       "80464   False                False                                    \n",
       "126915  False                False                                    \n",
       "136317  False                False                                    \n",
       "55150   False                False                                    \n",
       "139772  False                False                                    \n",
       "25873   False                False                                    \n",
       "\n",
       "        Algebraic Geometry  Algebraic Topology  Analysis of PDEs  \\\n",
       "79265   False               False               False              \n",
       "59743   False               False               False              \n",
       "94748   False               True                False              \n",
       "36055   False               False               False              \n",
       "34908   False               False               False              \n",
       "125441  False               False               False              \n",
       "139894  False               False               True               \n",
       "17363   False               False               False              \n",
       "73015   False               False               False              \n",
       "121373  False               False               False              \n",
       "100029  False               False               False              \n",
       "158077  False               False               False              \n",
       "55299   False               False               False              \n",
       "106954  False               False               True               \n",
       "156511  False               False               False              \n",
       "38431   False               False               True               \n",
       "174699  False               False               False              \n",
       "158460  False               False               False              \n",
       "44682   False               False               False              \n",
       "134265  False               True                False              \n",
       "157521  False               False               False              \n",
       "84402   False               False               False              \n",
       "149114  False               False               False              \n",
       "132594  False               False               False              \n",
       "34793   False               False               False              \n",
       "133942  False               False               False              \n",
       "81413   False               False               False              \n",
       "170547  False               False               False              \n",
       "77366   False               False               False              \n",
       "67126   False               False               False              \n",
       "15773   False               False               False              \n",
       "167140  False               False               False              \n",
       "134908  False               False               False              \n",
       "130866  False               False               True               \n",
       "124993  False               False               False              \n",
       "21799   False               False               False              \n",
       "55986   False               True                False              \n",
       "133618  False               False               False              \n",
       "76319   False               False               False              \n",
       "55552   False               False               False              \n",
       "70969   False               False               False              \n",
       "42821   False               False               False              \n",
       "98648   False               False               False              \n",
       "99294   False               False               False              \n",
       "80464   False               False               False              \n",
       "126915  False               False               False              \n",
       "136317  True                False               False              \n",
       "55150   False               False               False              \n",
       "139772  True                False               False              \n",
       "25873   False               False               False              \n",
       "\n",
       "        Applications  Applied Physics  Artificial Intelligence  Astrophysics  \\\n",
       "79265   False         False            False                    False          \n",
       "59743   False         False            False                    False          \n",
       "94748   False         False            False                    False          \n",
       "36055   False         False            False                    False          \n",
       "34908   False         False            False                    False          \n",
       "125441  False         False            False                    False          \n",
       "139894  False         False            False                    False          \n",
       "17363   False         False            False                    False          \n",
       "73015   False         False            False                    False          \n",
       "121373  False         False            False                    False          \n",
       "100029  False         False            False                    False          \n",
       "158077  False         False            False                    False          \n",
       "55299   False         False            False                    False          \n",
       "106954  False         False            False                    False          \n",
       "156511  False         False            False                    False          \n",
       "38431   False         False            False                    False          \n",
       "174699  False         False            False                    False          \n",
       "158460  False         False            False                    False          \n",
       "44682   False         False            False                    False          \n",
       "134265  False         False            False                    False          \n",
       "157521  False         False            False                    False          \n",
       "84402   False         False            False                    False          \n",
       "149114  False         False            False                    False          \n",
       "132594  False         False            False                    False          \n",
       "34793   False         False            False                    False          \n",
       "133942  False         False            False                    False          \n",
       "81413   False         False            False                    False          \n",
       "170547  False         False            False                    False          \n",
       "77366   False         False            False                    False          \n",
       "67126   False         False            False                    False          \n",
       "15773   False         False            False                    False          \n",
       "167140  False         False            False                    False          \n",
       "134908  False         False            False                    False          \n",
       "130866  False         False            False                    False          \n",
       "124993  False         False            False                    False          \n",
       "21799   False         False            False                    False          \n",
       "55986   False         False            False                    False          \n",
       "133618  False         False            False                    False          \n",
       "76319   False         False            False                    False          \n",
       "55552   False         False            False                    False          \n",
       "70969   False         False            False                    False          \n",
       "42821   False         False            False                    False          \n",
       "98648   False         False            False                    False          \n",
       "99294   False         False            False                    False          \n",
       "80464   False         False            False                    False          \n",
       "126915  False         False            False                    False          \n",
       "136317  False         False            False                    False          \n",
       "55150   False         False            False                    False          \n",
       "139772  False         False            False                    False          \n",
       "25873   False         False            False                    False          \n",
       "\n",
       "        Astrophysics of Galaxies  ...  Strongly Correlated Electrons  \\\n",
       "79265   False                     ...  False                           \n",
       "59743   False                     ...  False                           \n",
       "94748   False                     ...  False                           \n",
       "36055   False                     ...  False                           \n",
       "34908   False                     ...  False                           \n",
       "125441  False                     ...  False                           \n",
       "139894  False                     ...  False                           \n",
       "17363   False                     ...  False                           \n",
       "73015   False                     ...  False                           \n",
       "121373  False                     ...  False                           \n",
       "100029  False                     ...  False                           \n",
       "158077  False                     ...  False                           \n",
       "55299   False                     ...  False                           \n",
       "106954  False                     ...  False                           \n",
       "156511  False                     ...  False                           \n",
       "38431   False                     ...  False                           \n",
       "174699  False                     ...  False                           \n",
       "158460  False                     ...  False                           \n",
       "44682   False                     ...  False                           \n",
       "134265  False                     ...  False                           \n",
       "157521  False                     ...  False                           \n",
       "84402   False                     ...  False                           \n",
       "149114  False                     ...  False                           \n",
       "132594  False                     ...  False                           \n",
       "34793   False                     ...  False                           \n",
       "133942  False                     ...  False                           \n",
       "81413   False                     ...  False                           \n",
       "170547  False                     ...  False                           \n",
       "77366   False                     ...  False                           \n",
       "67126   False                     ...  False                           \n",
       "15773   False                     ...  False                           \n",
       "167140  False                     ...  False                           \n",
       "134908  False                     ...  False                           \n",
       "130866  False                     ...  False                           \n",
       "124993  False                     ...  False                           \n",
       "21799   False                     ...  False                           \n",
       "55986   False                     ...  False                           \n",
       "133618  False                     ...  False                           \n",
       "76319   False                     ...  False                           \n",
       "55552   False                     ...  False                           \n",
       "70969   False                     ...  False                           \n",
       "42821   False                     ...  False                           \n",
       "98648   False                     ...  False                           \n",
       "99294   False                     ...  False                           \n",
       "80464   False                     ...  False                           \n",
       "126915  False                     ...  False                           \n",
       "136317  False                     ...  False                           \n",
       "55150   False                     ...  False                           \n",
       "139772  False                     ...  False                           \n",
       "25873   False                     ...  False                           \n",
       "\n",
       "        Subcellular Processes  Superconductivity  Symbolic Computation  \\\n",
       "79265   False                  False              False                  \n",
       "59743   False                  False              False                  \n",
       "94748   False                  False              False                  \n",
       "36055   False                  False              False                  \n",
       "34908   False                  False              False                  \n",
       "125441  False                  False              False                  \n",
       "139894  False                  False              False                  \n",
       "17363   False                  False              False                  \n",
       "73015   False                  False              False                  \n",
       "121373  False                  False              False                  \n",
       "100029  False                  False              False                  \n",
       "158077  False                  False              False                  \n",
       "55299   False                  False              False                  \n",
       "106954  False                  False              False                  \n",
       "156511  False                  False              False                  \n",
       "38431   False                  False              False                  \n",
       "174699  False                  False              False                  \n",
       "158460  False                  False              False                  \n",
       "44682   False                  False              False                  \n",
       "134265  False                  False              False                  \n",
       "157521  False                  False              False                  \n",
       "84402   False                  False              False                  \n",
       "149114  False                  False              False                  \n",
       "132594  False                  False              False                  \n",
       "34793   False                  False              False                  \n",
       "133942  False                  False              False                  \n",
       "81413   False                  False              False                  \n",
       "170547  False                  False              False                  \n",
       "77366   False                  False              False                  \n",
       "67126   False                  False              False                  \n",
       "15773   False                  False              False                  \n",
       "167140  False                  False              False                  \n",
       "134908  False                  False              False                  \n",
       "130866  False                  False              False                  \n",
       "124993  False                  False              False                  \n",
       "21799   False                  False              False                  \n",
       "55986   False                  False              False                  \n",
       "133618  False                  False              False                  \n",
       "76319   False                  False              False                  \n",
       "55552   False                  False              False                  \n",
       "70969   False                  False              False                  \n",
       "42821   False                  False              False                  \n",
       "98648   False                  False              False                  \n",
       "99294   False                  False              False                  \n",
       "80464   False                  False              False                  \n",
       "126915  False                  False              False                  \n",
       "136317  False                  False              False                  \n",
       "55150   False                  False              False                  \n",
       "139772  False                  False              False                  \n",
       "25873   False                  False              False                  \n",
       "\n",
       "        Symplectic Geometry  Systems and Control  Theoretical Economics  \\\n",
       "79265   False                False                False                   \n",
       "59743   False                False                False                   \n",
       "94748   False                False                False                   \n",
       "36055   False                True                 False                   \n",
       "34908   False                False                False                   \n",
       "125441  False                False                False                   \n",
       "139894  False                False                False                   \n",
       "17363   False                False                False                   \n",
       "73015   False                False                False                   \n",
       "121373  False                True                 False                   \n",
       "100029  False                False                False                   \n",
       "158077  False                False                False                   \n",
       "55299   False                False                False                   \n",
       "106954  False                False                False                   \n",
       "156511  False                False                False                   \n",
       "38431   False                False                False                   \n",
       "174699  False                False                False                   \n",
       "158460  False                False                False                   \n",
       "44682   False                False                False                   \n",
       "134265  False                False                False                   \n",
       "157521  False                False                False                   \n",
       "84402   False                False                False                   \n",
       "149114  False                False                False                   \n",
       "132594  False                False                False                   \n",
       "34793   False                False                False                   \n",
       "133942  False                False                False                   \n",
       "81413   False                False                False                   \n",
       "170547  False                False                False                   \n",
       "77366   False                False                False                   \n",
       "67126   False                False                False                   \n",
       "15773   False                False                False                   \n",
       "167140  False                False                False                   \n",
       "134908  False                False                False                   \n",
       "130866  False                False                False                   \n",
       "124993  False                False                False                   \n",
       "21799   False                False                False                   \n",
       "55986   False                False                False                   \n",
       "133618  False                False                False                   \n",
       "76319   False                False                False                   \n",
       "55552   False                False                False                   \n",
       "70969   False                False                False                   \n",
       "42821   False                False                False                   \n",
       "98648   False                False                False                   \n",
       "99294   False                False                False                   \n",
       "80464   False                False                False                   \n",
       "126915  False                False                False                   \n",
       "136317  False                False                False                   \n",
       "55150   False                False                False                   \n",
       "139772  False                False                False                   \n",
       "25873   False                False                False                   \n",
       "\n",
       "        Tissues and Organs  Trading and Market Microstructure    UNK  \n",
       "79265   False               False                              False  \n",
       "59743   False               False                              False  \n",
       "94748   False               False                              False  \n",
       "36055   False               False                              False  \n",
       "34908   False               False                              False  \n",
       "125441  False               False                              False  \n",
       "139894  False               False                              False  \n",
       "17363   False               False                              False  \n",
       "73015   False               False                              False  \n",
       "121373  False               False                              False  \n",
       "100029  False               False                              False  \n",
       "158077  False               False                              False  \n",
       "55299   False               False                              False  \n",
       "106954  False               False                              False  \n",
       "156511  False               False                              False  \n",
       "38431   False               False                              False  \n",
       "174699  False               False                              False  \n",
       "158460  False               False                              False  \n",
       "44682   False               False                              False  \n",
       "134265  False               False                              False  \n",
       "157521  False               False                              False  \n",
       "84402   False               False                              False  \n",
       "149114  False               False                              False  \n",
       "132594  False               False                              False  \n",
       "34793   False               False                              False  \n",
       "133942  False               False                              False  \n",
       "81413   False               False                              False  \n",
       "170547  False               False                              False  \n",
       "77366   False               False                              False  \n",
       "67126   False               False                              False  \n",
       "15773   False               False                              False  \n",
       "167140  False               False                              False  \n",
       "134908  False               False                              False  \n",
       "130866  False               False                              False  \n",
       "124993  False               False                              False  \n",
       "21799   False               False                              False  \n",
       "55986   False               False                              False  \n",
       "133618  False               False                              False  \n",
       "76319   False               False                              False  \n",
       "55552   False               False                              False  \n",
       "70969   False               False                              False  \n",
       "42821   False               False                              False  \n",
       "98648   False               False                              False  \n",
       "99294   False               False                              False  \n",
       "80464   False               False                              False  \n",
       "126915  False               False                              False  \n",
       "136317  False               False                              False  \n",
       "55150   False               False                              False  \n",
       "139772  False               False                              False  \n",
       "25873   False               False                              False  \n",
       "\n",
       "[50 rows x 150 columns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Sanity check: inspect 50 random rows of the one-hot-encoded categories\n",
    "OHE_cat_data.sample(50)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Store the one-hot-encoded category dataframe separately. It shares its index with the original\n",
    "## dataframe, so the two can be re-joined on the index later.\n",
    "\n",
    "OHE_cat_data.to_parquet('./data/arXiv_cat.parquet')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "      <th>authors_parsed</th>\n",
       "      <th>update_date</th>\n",
       "      <th>id</th>\n",
       "      <th>Accelerator Physics</th>\n",
       "      <th>Adaptation and Self-Organizing Systems</th>\n",
       "      <th>Algebraic Geometry</th>\n",
       "      <th>Algebraic Topology</th>\n",
       "      <th>Analysis of PDEs</th>\n",
       "      <th>...</th>\n",
       "      <th>Strongly Correlated Electrons</th>\n",
       "      <th>Subcellular Processes</th>\n",
       "      <th>Superconductivity</th>\n",
       "      <th>Symbolic Computation</th>\n",
       "      <th>Symplectic Geometry</th>\n",
       "      <th>Systems and Control</th>\n",
       "      <th>Theoretical Economics</th>\n",
       "      <th>Tissues and Organs</th>\n",
       "      <th>Trading and Market Microstructure</th>\n",
       "      <th>UNK</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Vertex representations via finite groups and the McKay correspondence</td>\n",
       "      <td>Given a finite group $\\Gamma$ and a virtual character $\\wt$ on it, we\\nconstruct a Fock space and associated vertex operators in terms of\\nrepresentation ring of wreath products $\\Gamma\\sim S_n$. We recover the\\ncharacter tables of wreath products $\\Gamma\\sim S_n$ by vertex operator\\ncalculus. When $\\Gamma$ is a finite subgroup of $SU_2$, our construction yields\\na group theoretic realization of the basic representations of the affine and\\ntoroidal Lie algebras of $ADE$ type, which can be regarded as a new form of\\nMcKay correspondence.\\n</td>\n",
       "      <td>[['Frenkel', 'Igor', ''], ['Jing', 'Naihuan', ''], ['Wang', 'Weiqiang', '']]</td>\n",
       "      <td>2023-05-19</td>\n",
       "      <td>math/9907166</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Categoricity and amalgamation for AEC and $ \\kappa $ measurable</td>\n",
       "      <td>In the original version of this paper, we assume a theory $T$ that the logic\\n$\\mathbb L _{\\kappa, \\aleph_{0}}$ is categorical in a cardinal $\\lambda &gt;\\n\\kappa$, and $\\kappa$ is a measurable cardinal. There we prove that the class\\nof model of $T$ of cardinality $&lt;\\lambda$ (but $\\geq |T|+\\kappa$) has the\\namalgamation property; this is a step toward understanding the character of\\nsuch classes of models.\\n  In this revised version we replaced the class of models of $T$ by $\\mathfrak\\nk$, an AEC (abstract elementary class) which has LS-number ${&lt;} \\, \\kappa,$ or\\nat least which behave nicely for ultrapowers by $D$, a normal ultra-filter on\\n$\\kappa$.\\n  Presently sub-section \\S1A deals with $T \\subseteq \\mathbb L_{\\kappa^{+},\\n\\aleph_{0}}$ (and so does a large part of the introduction and little in the\\nrest of \\S1), but otherwise, all is done in the context of AEC.\\n</td>\n",
       "      <td>[['Kolman', 'Oren', ''], ['Shelah', 'Saharon', '']]</td>\n",
       "      <td>2023-05-19</td>\n",
       "      <td>math/9602216</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>From Loop Groups to 2-Groups</td>\n",
       "      <td>We describe an interesting relation between Lie 2-algebras, the Kac-Moody\\ncentral extensions of loop groups, and the group $\\mathrm{String}(n)$. A Lie\\n2-algebra is a categorified version of a Lie algebra where the Jacobi identity\\nholds up to a natural isomorphism called the \"Jacobiator\". Similarly, a Lie\\n2-group is a categorified version of a Lie group. If $G$ is a simply-connected\\ncompact simple Lie group, there is a 1-parameter family of Lie 2-algebras\\n$\\mathfrak{g}_k$ each having $\\mathrm{Lie}(G)$ as its Lie algebra of objects,\\nbut with a Jacobiator built from the canonical 3-form on $G$. There appears to\\nbe no Lie 2-group having $\\mathfrak{g}_k$ as its Lie 2-algebra, except when $k\\n= 0$. Here, however, we construct for integral k an infinite-dimensional Lie\\n2-group whose Lie 2-algebra is equivalent to $\\mathfrak{g}_k$. The objects of\\nthis 2-group are based paths in $G$, while the automorphisms of any object form\\nthe level-$k$ Kac-Moody central extension of the loop group of $G$. This\\n2-group is closely related to the $k$th power of the canonical gerbe over $G$.\\nIts nerve gives a topological group that is an extension of $G$ by\\n$K(\\mathbb{Z},2)$. When $k = \\pm 1$, this topological group can also be\\nobtained by killing the third homotopy group of $G$. Thus, when $G =\\n\\mathrm{Spin}(n)$, it is none other than $\\mathrm{String}(n)$.\\n</td>\n",
       "      <td>[['Baez', 'John C.', ''], ['Crans', 'Alissa S.', ''], ['Stevenson', 'Danny', ''], ['Schreiber', 'Urs', '']]</td>\n",
       "      <td>2023-05-16</td>\n",
       "      <td>math/0504123</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Finite Supersymmetry Transformations</td>\n",
       "      <td>We investigate simple examples of supersymmetry algebras with real and\\nGrassmann parameters. Special attention is payed to the finite\\nsupertransformations and their probability interpretation. Furthermore we look\\nfor combinations of bosons and fermions which are invariant under\\nsupertransformations. These combinations correspond to states that are highly\\nentangled.\\n</td>\n",
       "      <td>[['Ilieva', 'Nevena', ''], ['Narnhofer', 'Heide', ''], ['Thirring', 'Walter', '']]</td>\n",
       "      <td>2023-05-09</td>\n",
       "      <td>quant-ph/0401139</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Super black box (formerly: Middle diamond)</td>\n",
       "      <td>This is a slightly corrected version of an old work.\\n  Under certain cardinal arithmetic assumptions, we prove that for every large\\nenough regular $\\lambda$ cardinal, for many regular $\\kappa &lt; \\lambda$, many\\nstationary subsets of $\\lambda$ concentrating on cofinality $\\kappa$ have super\\nBB. In particular, we have the super BB on $\\{\\delta &lt; \\lambda \\colon\\ncf(\\delta) = \\kappa\\}$. This is a strong negation of uniformization.\\n  We have added some details. Works continuing it are [Sh:898] and [Sh:1028].\\nWe thank Ari Brodski and Adi Jarden for their helpful comments.\\n  In this paper we had earlier used the notion ``middle diamond\" which is now\\nreplaced by ``super BB'', that is, ``super black box'', in order to be\\nconsistent with other papers (see [Sh:898]).\\n</td>\n",
       "      <td>[['Shelah', 'Saharon', '']]</td>\n",
       "      <td>2023-05-04</td>\n",
       "      <td>math/0212249</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>...</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 155 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                   title  \\\n",
       "0  Vertex representations via finite groups and the McKay correspondence   \n",
       "1  Categoricity and amalgamation for AEC and $ \\kappa $ measurable         \n",
       "2  From Loop Groups to 2-Groups                                            \n",
       "3  Finite Supersymmetry Transformations                                    \n",
       "4  Super black box (formerly: Middle diamond)                              \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        abstract  \\\n",
       "0    Given a finite group $\\Gamma$ and a virtual character $\\wt$ on it, we\\nconstruct a Fock space and associated vertex operators in terms of\\nrepresentation ring of wreath products $\\Gamma\\sim S_n$. We recover the\\ncharacter tables of wreath products $\\Gamma\\sim S_n$ by vertex operator\\ncalculus. When $\\Gamma$ is a finite subgroup of $SU_2$, our construction yields\\na group theoretic realization of the basic representations of the affine and\\ntoroidal Lie algebras of $ADE$ type, which can be regarded as a new form of\\nMcKay correspondence.\\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              \n",
       "1    In the original version of this paper, we assume a theory $T$ that the logic\\n$\\mathbb L _{\\kappa, \\aleph_{0}}$ is categorical in a cardinal $\\lambda >\\n\\kappa$, and $\\kappa$ is a measurable cardinal. There we prove that the class\\nof model of $T$ of cardinality $<\\lambda$ (but $\\geq |T|+\\kappa$) has the\\namalgamation property; this is a step toward understanding the character of\\nsuch classes of models.\\n  In this revised version we replaced the class of models of $T$ by $\\mathfrak\\nk$, an AEC (abstract elementary class) which has LS-number ${<} \\, \\kappa,$ or\\nat least which behave nicely for ultrapowers by $D$, a normal ultra-filter on\\n$\\kappa$.\\n  Presently sub-section \\S1A deals with $T \\subseteq \\mathbb L_{\\kappa^{+},\\n\\aleph_{0}}$ (and so does a large part of the introduction and little in the\\nrest of \\S1), but otherwise, all is done in the context of AEC.\\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               \n",
       "2    We describe an interesting relation between Lie 2-algebras, the Kac-Moody\\ncentral extensions of loop groups, and the group $\\mathrm{String}(n)$. A Lie\\n2-algebra is a categorified version of a Lie algebra where the Jacobi identity\\nholds up to a natural isomorphism called the \"Jacobiator\". Similarly, a Lie\\n2-group is a categorified version of a Lie group. If $G$ is a simply-connected\\ncompact simple Lie group, there is a 1-parameter family of Lie 2-algebras\\n$\\mathfrak{g}_k$ each having $\\mathrm{Lie}(G)$ as its Lie algebra of objects,\\nbut with a Jacobiator built from the canonical 3-form on $G$. There appears to\\nbe no Lie 2-group having $\\mathfrak{g}_k$ as its Lie 2-algebra, except when $k\\n= 0$. Here, however, we construct for integral k an infinite-dimensional Lie\\n2-group whose Lie 2-algebra is equivalent to $\\mathfrak{g}_k$. The objects of\\nthis 2-group are based paths in $G$, while the automorphisms of any object form\\nthe level-$k$ Kac-Moody central extension of the loop group of $G$. This\\n2-group is closely related to the $k$th power of the canonical gerbe over $G$.\\nIts nerve gives a topological group that is an extension of $G$ by\\n$K(\\mathbb{Z},2)$. When $k = \\pm 1$, this topological group can also be\\nobtained by killing the third homotopy group of $G$. Thus, when $G =\\n\\mathrm{Spin}(n)$, it is none other than $\\mathrm{String}(n)$.\\n   \n",
       "3    We investigate simple examples of supersymmetry algebras with real and\\nGrassmann parameters. Special attention is payed to the finite\\nsupertransformations and their probability interpretation. Furthermore we look\\nfor combinations of bosons and fermions which are invariant under\\nsupertransformations. These combinations correspond to states that are highly\\nentangled.\\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        \n",
       "4    This is a slightly corrected version of an old work.\\n  Under certain cardinal arithmetic assumptions, we prove that for every large\\nenough regular $\\lambda$ cardinal, for many regular $\\kappa < \\lambda$, many\\nstationary subsets of $\\lambda$ concentrating on cofinality $\\kappa$ have super\\nBB. In particular, we have the super BB on $\\{\\delta < \\lambda \\colon\\ncf(\\delta) = \\kappa\\}$. This is a strong negation of uniformization.\\n  We have added some details. Works continuing it are [Sh:898] and [Sh:1028].\\nWe thank Ari Brodski and Adi Jarden for their helpful comments.\\n  In this paper we had earlier used the notion ``middle diamond\" which is now\\nreplaced by ``super BB'', that is, ``super black box'', in order to be\\nconsistent with other papers (see [Sh:898]).\\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "\n",
       "                                                                                                authors_parsed  \\\n",
       "0  [['Frenkel', 'Igor', ''], ['Jing', 'Naihuan', ''], ['Wang', 'Weiqiang', '']]                                  \n",
       "1  [['Kolman', 'Oren', ''], ['Shelah', 'Saharon', '']]                                                           \n",
       "2  [['Baez', 'John C.', ''], ['Crans', 'Alissa S.', ''], ['Stevenson', 'Danny', ''], ['Schreiber', 'Urs', '']]   \n",
       "3  [['Ilieva', 'Nevena', ''], ['Narnhofer', 'Heide', ''], ['Thirring', 'Walter', '']]                            \n",
       "4  [['Shelah', 'Saharon', '']]                                                                                   \n",
       "\n",
       "  update_date                id  Accelerator Physics  \\\n",
       "0 2023-05-19   math/9907166      False                 \n",
       "1 2023-05-19   math/9602216      False                 \n",
       "2 2023-05-16   math/0504123      False                 \n",
       "3 2023-05-09   quant-ph/0401139  False                 \n",
       "4 2023-05-04   math/0212249      False                 \n",
       "\n",
       "   Adaptation and Self-Organizing Systems  Algebraic Geometry  \\\n",
       "0  False                                   False                \n",
       "1  False                                   False                \n",
       "2  False                                   False                \n",
       "3  False                                   False                \n",
       "4  False                                   False                \n",
       "\n",
       "   Algebraic Topology  Analysis of PDEs  ...  Strongly Correlated Electrons  \\\n",
       "0  False               False             ...  False                           \n",
       "1  False               False             ...  False                           \n",
       "2  False               False             ...  False                           \n",
       "3  False               False             ...  False                           \n",
       "4  False               False             ...  False                           \n",
       "\n",
       "   Subcellular Processes  Superconductivity  Symbolic Computation  \\\n",
       "0  False                  False              False                  \n",
       "1  False                  False              False                  \n",
       "2  False                  False              False                  \n",
       "3  False                  False              False                  \n",
       "4  False                  False              False                  \n",
       "\n",
       "   Symplectic Geometry  Systems and Control  Theoretical Economics  \\\n",
       "0  False                False                False                   \n",
       "1  False                False                False                   \n",
       "2  False                False                False                   \n",
       "3  False                False                False                   \n",
       "4  False                False                False                   \n",
       "\n",
       "   Tissues and Organs  Trading and Market Microstructure    UNK  \n",
       "0  False               False                              False  \n",
       "1  False               False                              False  \n",
       "2  False               False                              False  \n",
       "3  False               False                              False  \n",
       "4  False               False                              False  \n",
       "\n",
       "[5 rows x 155 columns]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## As an example we reconstruct the full dataframe from the two parts:\n",
    "papers = pd.read_parquet('./data/arXiv.parquet')\n",
    "papers = papers.drop('cat',axis=1)\n",
    "papers_cat = pd.read_parquet('./data/arXiv_cat.parquet')\n",
    "\n",
    "full_papers = papers.join(papers_cat,how='left')\n",
    "full_papers.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 1527 geometric PDE papers in this dataset.\n",
      "There are 175000 papers total in this dataset.\n"
     ]
    }
   ],
   "source": [
    "## Another example: retrieve all articles tagged with both Analysis of PDEs and Differential Geometry.\n",
    "\n",
    "# The one-hot columns are boolean, so they combine directly into a row mask.\n",
    "geo_pde = papers.loc[papers_cat['Analysis of PDEs'] & papers_cat['Differential Geometry']]\n",
    "print(f'There are {len(geo_pde)} geometric PDE papers in this dataset.')\n",
    "print(f'There are {len(papers)} papers total in this dataset.')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "## How much smaller is the original dataframe on disk if we drop the category information entirely?\n",
    "\n",
    "data.head()\n",
    "test = data.drop('cat',axis=1)\n",
    "test.to_parquet('./data/arXiv_no_cat.parquet')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point we have created and saved a separate parquet file containing the boolean one-hot encoded (OHE) categories.\n",
    "Dropping the category information from the original dataframe only shrinks it by about 8 MB on disk, so there is no need to bother."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Naive LaTeX removal and some pitfall examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Now we look at some consequences of removing LaTeX.\n",
    "## Study the examples from above.\n",
    "import pandas as pd\n",
    "pd.set_option('display.max_colwidth', 0)\n",
    "\n",
    "indices = [139098,50283,169377,32935,38604,132354]\n",
    "examples = pd.DataFrame(pd.read_parquet('./data/arXiv.parquet',columns=['abstract']).iloc[indices])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "      <th>cat</th>\n",
       "      <th>authors_parsed</th>\n",
       "      <th>update_date</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Vertex representations via finite groups and the McKay correspondence</td>\n",
       "      <td>Given a finite group $\\Gamma$ and a virtual character $\\wt$ on it, we construct a Fock space and associated vertex operators in terms of representation ring of wreath products $\\Gamma\\sim S_n$. We recover the character tables of wreath products $\\Gamma\\sim S_n$ by vertex operator calculus. When $\\Gamma$ is a finite subgroup of $SU_2$, our construction yields a group theoretic realization of the basic representations of the affine and toroidal Lie algebras of $ADE$ type, which can be regarded as a new form of McKay correspondence.</td>\n",
       "      <td>[math.QA, hep-th, math.RT]</td>\n",
       "      <td>[['Frenkel', 'Igor', ''], ['Jing', 'Naihuan', ''], ['Wang', 'Weiqiang', '']]</td>\n",
       "      <td>2023-05-19</td>\n",
       "      <td>math/9907166</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Categoricity and amalgamation for AEC and $ \\kappa $ measurable</td>\n",
       "      <td>In the original version of this paper, we assume a theory $T$ that the logic $\\mathbb L _{\\kappa, \\aleph_{0}}$ is categorical in a cardinal $\\lambda &gt; \\kappa$, and $\\kappa$ is a measurable cardinal. There we prove that the class of model of $T$ of cardinality $&lt;\\lambda$ (but $\\geq |T|+\\kappa$) has the amalgamation property; this is a step toward understanding the character of such classes of models.   In this revised version we replaced the class of models of $T$ by $\\mathfrak k$, an AEC (abstract elementary class) which has LS-number ${&lt;} \\, \\kappa,$ or at least which behave nicely for ultrapowers by $D$, a normal ultra-filter on $\\kappa$.   Presently sub-section \\S1A deals with $T \\subseteq \\mathbb L_{\\kappa^{+}, \\aleph_{0}}$ (and so does a large part of the introduction and little in the rest of \\S1), but otherwise, all is done in the context of AEC.</td>\n",
       "      <td>[math.LO]</td>\n",
       "      <td>[['Kolman', 'Oren', ''], ['Shelah', 'Saharon', '']]</td>\n",
       "      <td>2023-05-19</td>\n",
       "      <td>math/9602216</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>From Loop Groups to 2-Groups</td>\n",
       "      <td>We describe an interesting relation between Lie 2-algebras, the Kac-Moody central extensions of loop groups, and the group $\\mathrm{String}(n)$. A Lie 2-algebra is a categorified version of a Lie algebra where the Jacobi identity holds up to a natural isomorphism called the \"Jacobiator\". Similarly, a Lie 2-group is a categorified version of a Lie group. If $G$ is a simply-connected compact simple Lie group, there is a 1-parameter family of Lie 2-algebras $\\mathfrak{g}_k$ each having $\\mathrm{Lie}(G)$ as its Lie algebra of objects, but with a Jacobiator built from the canonical 3-form on $G$. There appears to be no Lie 2-group having $\\mathfrak{g}_k$ as its Lie 2-algebra, except when $k = 0$. Here, however, we construct for integral k an infinite-dimensional Lie 2-group whose Lie 2-algebra is equivalent to $\\mathfrak{g}_k$. The objects of this 2-group are based paths in $G$, while the automorphisms of any object form the level-$k$ Kac-Moody central extension of the loop group of $G$. This 2-group is closely related to the $k$th power of the canonical gerbe over $G$. Its nerve gives a topological group that is an extension of $G$ by $K(\\mathbb{Z},2)$. When $k = \\pm 1$, this topological group can also be obtained by killing the third homotopy group of $G$. Thus, when $G = \\mathrm{Spin}(n)$, it is none other than $\\mathrm{String}(n)$.</td>\n",
       "      <td>[math.QA, hep-th, math.DG]</td>\n",
       "      <td>[['Baez', 'John C.', ''], ['Crans', 'Alissa S.', ''], ['Stevenson', 'Danny', ''], ['Schreiber', 'Urs', '']]</td>\n",
       "      <td>2023-05-16</td>\n",
       "      <td>math/0504123</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Finite Supersymmetry Transformations</td>\n",
       "      <td>We investigate simple examples of supersymmetry algebras with real and Grassmann parameters. Special attention is payed to the finite supertransformations and their probability interpretation. Furthermore we look for combinations of bosons and fermions which are invariant under supertransformations. These combinations correspond to states that are highly entangled.</td>\n",
       "      <td>[quant-ph, hep-th, math-ph, math.MP]</td>\n",
       "      <td>[['Ilieva', 'Nevena', ''], ['Narnhofer', 'Heide', ''], ['Thirring', 'Walter', '']]</td>\n",
       "      <td>2023-05-09</td>\n",
       "      <td>quant-ph/0401139</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Super black box (formerly: Middle diamond)</td>\n",
       "      <td>This is a slightly corrected version of an old work.   Under certain cardinal arithmetic assumptions, we prove that for every large enough regular $\\lambda$ cardinal, for many regular $\\kappa &lt; \\lambda$, many stationary subsets of $\\lambda$ concentrating on cofinality $\\kappa$ have super BB. In particular, we have the super BB on $\\{\\delta &lt; \\lambda \\colon cf(\\delta) = \\kappa\\}$. This is a strong negation of uniformization.   We have added some details. Works continuing it are [Sh:898] and [Sh:1028]. We thank Ari Brodski and Adi Jarden for their helpful comments.   In this paper we had earlier used the notion ``middle diamond\" which is now replaced by ``super BB'', that is, ``super black box'', in order to be consistent with other papers (see [Sh:898]).</td>\n",
       "      <td>[math.LO]</td>\n",
       "      <td>[['Shelah', 'Saharon', '']]</td>\n",
       "      <td>2023-05-04</td>\n",
       "      <td>math/0212249</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                   title  \\\n",
       "0  Vertex representations via finite groups and the McKay correspondence   \n",
       "1  Categoricity and amalgamation for AEC and $ \\kappa $ measurable         \n",
       "2  From Loop Groups to 2-Groups                                            \n",
       "3  Finite Supersymmetry Transformations                                    \n",
       "4  Super black box (formerly: Middle diamond)                              \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      abstract  \\\n",
       "0    Given a finite group $\\Gamma$ and a virtual character $\\wt$ on it, we construct a Fock space and associated vertex operators in terms of representation ring of wreath products $\\Gamma\\sim S_n$. We recover the character tables of wreath products $\\Gamma\\sim S_n$ by vertex operator calculus. When $\\Gamma$ is a finite subgroup of $SU_2$, our construction yields a group theoretic realization of the basic representations of the affine and toroidal Lie algebras of $ADE$ type, which can be regarded as a new form of McKay correspondence.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "1    In the original version of this paper, we assume a theory $T$ that the logic $\\mathbb L _{\\kappa, \\aleph_{0}}$ is categorical in a cardinal $\\lambda > \\kappa$, and $\\kappa$ is a measurable cardinal. There we prove that the class of model of $T$ of cardinality $<\\lambda$ (but $\\geq |T|+\\kappa$) has the amalgamation property; this is a step toward understanding the character of such classes of models.   In this revised version we replaced the class of models of $T$ by $\\mathfrak k$, an AEC (abstract elementary class) which has LS-number ${<} \\, \\kappa,$ or at least which behave nicely for ultrapowers by $D$, a normal ultra-filter on $\\kappa$.   Presently sub-section \\S1A deals with $T \\subseteq \\mathbb L_{\\kappa^{+}, \\aleph_{0}}$ (and so does a large part of the introduction and little in the rest of \\S1), but otherwise, all is done in the context of AEC.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "2    We describe an interesting relation between Lie 2-algebras, the Kac-Moody central extensions of loop groups, and the group $\\mathrm{String}(n)$. A Lie 2-algebra is a categorified version of a Lie algebra where the Jacobi identity holds up to a natural isomorphism called the \"Jacobiator\". Similarly, a Lie 2-group is a categorified version of a Lie group. If $G$ is a simply-connected compact simple Lie group, there is a 1-parameter family of Lie 2-algebras $\\mathfrak{g}_k$ each having $\\mathrm{Lie}(G)$ as its Lie algebra of objects, but with a Jacobiator built from the canonical 3-form on $G$. There appears to be no Lie 2-group having $\\mathfrak{g}_k$ as its Lie 2-algebra, except when $k = 0$. Here, however, we construct for integral k an infinite-dimensional Lie 2-group whose Lie 2-algebra is equivalent to $\\mathfrak{g}_k$. The objects of this 2-group are based paths in $G$, while the automorphisms of any object form the level-$k$ Kac-Moody central extension of the loop group of $G$. This 2-group is closely related to the $k$th power of the canonical gerbe over $G$. Its nerve gives a topological group that is an extension of $G$ by $K(\\mathbb{Z},2)$. When $k = \\pm 1$, this topological group can also be obtained by killing the third homotopy group of $G$. Thus, when $G = \\mathrm{Spin}(n)$, it is none other than $\\mathrm{String}(n)$.    \n",
       "3    We investigate simple examples of supersymmetry algebras with real and Grassmann parameters. Special attention is payed to the finite supertransformations and their probability interpretation. Furthermore we look for combinations of bosons and fermions which are invariant under supertransformations. These combinations correspond to states that are highly entangled.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "4    This is a slightly corrected version of an old work.   Under certain cardinal arithmetic assumptions, we prove that for every large enough regular $\\lambda$ cardinal, for many regular $\\kappa < \\lambda$, many stationary subsets of $\\lambda$ concentrating on cofinality $\\kappa$ have super BB. In particular, we have the super BB on $\\{\\delta < \\lambda \\colon cf(\\delta) = \\kappa\\}$. This is a strong negation of uniformization.   We have added some details. Works continuing it are [Sh:898] and [Sh:1028]. We thank Ari Brodski and Adi Jarden for their helpful comments.   In this paper we had earlier used the notion ``middle diamond\" which is now replaced by ``super BB'', that is, ``super black box'', in order to be consistent with other papers (see [Sh:898]).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "\n",
       "                                    cat  \\\n",
       "0  [math.QA, hep-th, math.RT]             \n",
       "1  [math.LO]                              \n",
       "2  [math.QA, hep-th, math.DG]             \n",
       "3  [quant-ph, hep-th, math-ph, math.MP]   \n",
       "4  [math.LO]                              \n",
       "\n",
       "                                                                                                authors_parsed  \\\n",
       "0  [['Frenkel', 'Igor', ''], ['Jing', 'Naihuan', ''], ['Wang', 'Weiqiang', '']]                                  \n",
       "1  [['Kolman', 'Oren', ''], ['Shelah', 'Saharon', '']]                                                           \n",
       "2  [['Baez', 'John C.', ''], ['Crans', 'Alissa S.', ''], ['Stevenson', 'Danny', ''], ['Schreiber', 'Urs', '']]   \n",
       "3  [['Ilieva', 'Nevena', ''], ['Narnhofer', 'Heide', ''], ['Thirring', 'Walter', '']]                            \n",
       "4  [['Shelah', 'Saharon', '']]                                                                                   \n",
       "\n",
       "  update_date                id  \n",
       "0 2023-05-19   math/9907166      \n",
       "1 2023-05-19   math/9602216      \n",
       "2 2023-05-16   math/0504123      \n",
       "3 2023-05-09   quant-ph/0401139  \n",
       "4 2023-05-04   math/0212249      "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
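    "## Replace the newline characters in the abstracts with spaces (missed earlier).\n",
    "## Only the abstract column is cleaned here; titles keep their line breaks.\n",
    "\n",
    "data = pd.read_parquet('./data/arXiv.parquet')\n",
    "data['abstract'] = data['abstract'].str.replace('\\n', ' ', regex=False)\n",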
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Overwrite the parquet file with the newline-free abstracts\n",
    "data.to_parquet('./data/arXiv.parquet')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>139098</th>\n",
       "      <td>Telgarsky's conjecture may fail</td>\n",
       "      <td>Telg\\'arsky's conjecture states that for each $k \\in \\mathbb N$, there is a topological space $X_k$ such that in the Banach-Mazur game on $X_k$, the player {\\scriptsize NONEMPTY} has a winning $(k+1)$-tactic but no winning $k$-tactic. We prove that this statement is consistently false.   More specifically, we prove, assuming $\\mathsf{GCH}+\\square$, that if {\\scriptsize NONEMPTY} has a winning strategy for the Banach-Mazur game on a $T_3$ space $X$, then she has a winning $2$-tactic. The proof uses a coding argument due to Galvin, whereby if $X$ has a $\\pi$-base with certain nice properties, then {\\scriptsize NONEMPTY} is able to encode, in each consecutive pair of her opponent's moves, all essential information about the play of the game before the current move. Our proof shows that under $\\mathsf{GCH}+\\square$, every $T_3$ space has a sufficiently nice $\\pi$-base that enables this coding strategy.   Translated into the language of partially ordered sets, what we really show is that $\\mathsf{GCH}+\\square$ implies the following statement, which is equivalent to the existence of the \"nice'' $\\pi$-bases mentioned above: \\emph{Every separative poset $\\mathbb P$ with the $\\kappa$-cc contains a dense sub-poset $\\mathbb D$ such that $|\\{ q \\in \\mathbb D \\,:\\, p \\text{ extends } q \\}| &lt; \\kappa$ for every $p \\in \\mathbb P$.} We prove that this statement is independent of $\\mathsf{ZFC}$: while it holds under $\\mathsf{GCH}+\\square$, it is false even for ccc posets if $\\mathfrak{b} &gt; \\aleph_1$. We also show that if $|\\mathbb P| &lt; \\aleph_\\omega$, then \\axiom-for-$\\mathbb P$ is a consequence of $\\mathsf{GCH}$ holding below $|\\mathbb P|$.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50283</th>\n",
       "      <td>Large Deviation Principle for the Greedy Exploration Algorithm over\\n  Erd\\\"os-R\\'enyi Graphs</td>\n",
       "      <td>We prove a large deviation principle for a greedy exploration process on an Erd\\\"os-R\\'enyi (ER) graph when the number of nodes goes to infinity. To prove our main result, we use the general strategy to study large deviations of processes proposed by Feng and Kurtz, based on the convergence of non-linear semigroups. The rate function can be expressed in a closed-form formula, and associated optimization problems can be solved explicitly, providing the large deviation trajectory. Also, we derive an LDP for the size of the maximum independent set discovered by such an algorithm and analyze the probability that it exceeds known bounds for the maximal independent set. We also analyze the link between these results and the landscape complexity of the independent set and the exploration dynamic.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169377</th>\n",
       "      <td>Orthogonal expansions related to compact Gelfand pairs</td>\n",
       "      <td>Given a compact Gelfand pair (G,K) and a locally compact group L, we characterize the class P_K^\\sharp(G,L) of continuous positive definite functions f:G\\times L\\to \\C which are bi-invariant in the G-variable with respect to K. The functions of this class are the functions having a uniformly convergent expansion \\sum_{\\varphi\\in Z} B(\\varphi)(u)\\varphi(x) for x\\in G,u\\in L, where the sum is over the space Z of positive definite spherical functions \\varphi:G\\to\\C for the Gelfand pair, and (B(\\varphi))_{\\varphi\\in Z} is a family of continuous positive definite functions on L such that \\sum_{\\varphi\\in Z}B(\\varphi)(e_L)&lt;\\infty. Here e_L is the neutral element of the group L. For a compact abelian group G considered as a Gelfand pair (G,K) with trivial K=\\{e_G\\}, we obtain a characterization of P(G\\times L) in terms of Fourier expansions on the dual group \\widehat{G}.   The result is described in detail for the case of the Gelfand pairs (O(d+1),O(d)) and (U(q),U(q-1)) as well as for the product of these Gelfand pairs.   The result generalizes recent theorems of Berg-Porcu (2016) and Guella-Menegatto (2016)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32935</th>\n",
       "      <td>Congruent numbers, elliptic curves, and the passage from the local to\\n  the global: an update</td>\n",
       "      <td>This update to my article on Congruent numbers, elliptic curves, and the passage from the local to the global, which appeared in Resonance, December 2009, pp. 1183--1205 (https://www.ias.ac.in/describe/article/reso/014/12/1183-1205) and was posted here as arXiv:0704.3783, covers a few recent advances in the arithmetic of elliptic curves with special reference to the congruent number problem.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38604</th>\n",
       "      <td>Around the nonlinear Ryll-Nardzewski theorem</td>\n",
       "      <td>Suppose that $Q$ is a weak$^{\\ast }$ compact convex subset of a dual Banach space with the Radon-Nikod\\'{y}m property. We show that if $(S,Q)$ is a nonexpansive and norm-distal dynamical system, then there is a fixed point of $S$ in $Q$ and the set of fixed points is a nonexpansive retract of $Q.$ As a consequence we obtain a nonlinear extension of the Bader-Gelander-Monod theorem concerning isometries in $L$-embedded Banach spaces. A similar statement is proved for weakly compact convex subsets of a locally convex space, thus giving the nonlinear counterpart of the Ryll-Nardzewski theorem.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>132354</th>\n",
       "      <td>New upper bounds for the bondage number of a graph in terms of its\\n  maximum degree and Euler characteristic</td>\n",
       "      <td>The bondage number $b(G)$ of a graph $G$ is the smallest number of edges whose removal from $G$ results in a graph with larger domination number. Let $G$ be embeddable on a surface whose Euler characteristic $\\chi$ is as large as possible, and assume $\\chi\\leq0$. Gagarin-Zverovich and Huang have recently found upper bounds of $b(G)$ in terms of the maximum degree $\\Delta(G)$ and the Euler characteristic $\\chi(G)=\\chi$. In this paper we prove a better upper bound $b(G)\\leq\\Delta(G)+\\lfloor t\\rfloor$ where $t$ is the largest real root of the cubic equation $z^3 + z^2 + (3\\chi - 8)z + 9\\chi - 12=0$; this upper bound is asymptotically equivalent to $b(G)\\leq\\Delta(G)+1+\\lfloor \\sqrt{4-3\\chi} \\rfloor$. We also establish further improved upper bounds for $b(G)$ when the girth, order, or size of the graph $G$ is large compared with its Euler characteristic $\\chi$.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                title  \\\n",
       "139098  Telgarsky's conjecture may fail                                                                                 \n",
       "50283   Large Deviation Principle for the Greedy Exploration Algorithm over\\n  Erd\\\"os-R\\'enyi Graphs                   \n",
       "169377  Orthogonal expansions related to compact Gelfand pairs                                                          \n",
       "32935   Congruent numbers, elliptic curves, and the passage from the local to\\n  the global: an update                  \n",
       "38604   Around the nonlinear Ryll-Nardzewski theorem                                                                    \n",
       "132354  New upper bounds for the bondage number of a graph in terms of its\\n  maximum degree and Euler characteristic   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      abstract  \n",
       "139098    Telg\\'arsky's conjecture states that for each $k \\in \\mathbb N$, there is a topological space $X_k$ such that in the Banach-Mazur game on $X_k$, the player {\\scriptsize NONEMPTY} has a winning $(k+1)$-tactic but no winning $k$-tactic. We prove that this statement is consistently false.   More specifically, we prove, assuming $\\mathsf{GCH}+\\square$, that if {\\scriptsize NONEMPTY} has a winning strategy for the Banach-Mazur game on a $T_3$ space $X$, then she has a winning $2$-tactic. The proof uses a coding argument due to Galvin, whereby if $X$ has a $\\pi$-base with certain nice properties, then {\\scriptsize NONEMPTY} is able to encode, in each consecutive pair of her opponent's moves, all essential information about the play of the game before the current move. Our proof shows that under $\\mathsf{GCH}+\\square$, every $T_3$ space has a sufficiently nice $\\pi$-base that enables this coding strategy.   Translated into the language of partially ordered sets, what we really show is that $\\mathsf{GCH}+\\square$ implies the following statement, which is equivalent to the existence of the \"nice'' $\\pi$-bases mentioned above: \\emph{Every separative poset $\\mathbb P$ with the $\\kappa$-cc contains a dense sub-poset $\\mathbb D$ such that $|\\{ q \\in \\mathbb D \\,:\\, p \\text{ extends } q \\}| < \\kappa$ for every $p \\in \\mathbb P$.} We prove that this statement is independent of $\\mathsf{ZFC}$: while it holds under $\\mathsf{GCH}+\\square$, it is false even for ccc posets if $\\mathfrak{b} > \\aleph_1$. We also show that if $|\\mathbb P| < \\aleph_\\omega$, then \\axiom-for-$\\mathbb P$ is a consequence of $\\mathsf{GCH}$ holding below $|\\mathbb P|$.   \n",
       "50283     We prove a large deviation principle for a greedy exploration process on an Erd\\\"os-R\\'enyi (ER) graph when the number of nodes goes to infinity. To prove our main result, we use the general strategy to study large deviations of processes proposed by Feng and Kurtz, based on the convergence of non-linear semigroups. The rate function can be expressed in a closed-form formula, and associated optimization problems can be solved explicitly, providing the large deviation trajectory. Also, we derive an LDP for the size of the maximum independent set discovered by such an algorithm and analyze the probability that it exceeds known bounds for the maximal independent set. We also analyze the link between these results and the landscape complexity of the independent set and the exploration dynamic.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "169377    Given a compact Gelfand pair (G,K) and a locally compact group L, we characterize the class P_K^\\sharp(G,L) of continuous positive definite functions f:G\\times L\\to \\C which are bi-invariant in the G-variable with respect to K. The functions of this class are the functions having a uniformly convergent expansion \\sum_{\\varphi\\in Z} B(\\varphi)(u)\\varphi(x) for x\\in G,u\\in L, where the sum is over the space Z of positive definite spherical functions \\varphi:G\\to\\C for the Gelfand pair, and (B(\\varphi))_{\\varphi\\in Z} is a family of continuous positive definite functions on L such that \\sum_{\\varphi\\in Z}B(\\varphi)(e_L)<\\infty. Here e_L is the neutral element of the group L. For a compact abelian group G considered as a Gelfand pair (G,K) with trivial K=\\{e_G\\}, we obtain a characterization of P(G\\times L) in terms of Fourier expansions on the dual group \\widehat{G}.   The result is described in detail for the case of the Gelfand pairs (O(d+1),O(d)) and (U(q),U(q-1)) as well as for the product of these Gelfand pairs.   The result generalizes recent theorems of Berg-Porcu (2016) and Guella-Menegatto (2016)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "32935     This update to my article on Congruent numbers, elliptic curves, and the passage from the local to the global, which appeared in Resonance, December 2009, pp. 1183--1205 (https://www.ias.ac.in/describe/article/reso/014/12/1183-1205) and was posted here as arXiv:0704.3783, covers a few recent advances in the arithmetic of elliptic curves with special reference to the congruent number problem.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "38604     Suppose that $Q$ is a weak$^{\\ast }$ compact convex subset of a dual Banach space with the Radon-Nikod\\'{y}m property. We show that if $(S,Q)$ is a nonexpansive and norm-distal dynamical system, then there is a fixed point of $S$ in $Q$ and the set of fixed points is a nonexpansive retract of $Q.$ As a consequence we obtain a nonlinear extension of the Bader-Gelander-Monod theorem concerning isometries in $L$-embedded Banach spaces. A similar statement is proved for weakly compact convex subsets of a locally convex space, thus giving the nonlinear counterpart of the Ryll-Nardzewski theorem.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "132354    The bondage number $b(G)$ of a graph $G$ is the smallest number of edges whose removal from $G$ results in a graph with larger domination number. Let $G$ be embeddable on a surface whose Euler characteristic $\\chi$ is as large as possible, and assume $\\chi\\leq0$. Gagarin-Zverovich and Huang have recently found upper bounds of $b(G)$ in terms of the maximum degree $\\Delta(G)$ and the Euler characteristic $\\chi(G)=\\chi$. In this paper we prove a better upper bound $b(G)\\leq\\Delta(G)+\\lfloor t\\rfloor$ where $t$ is the largest real root of the cubic equation $z^3 + z^2 + (3\\chi - 8)z + 9\\chi - 12=0$; this upper bound is asymptotically equivalent to $b(G)\\leq\\Delta(G)+1+\\lfloor \\sqrt{4-3\\chi} \\rfloor$. We also establish further improved upper bounds for $b(G)$ when the girth, order, or size of the graph $G$ is large compared with its Euler characteristic $\\chi$.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
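    "## Reload only the title and abstract columns and select the sample rows listed in `indices`\n",
    "examples = pd.read_parquet('./data/arXiv.parquet', columns=['title', 'abstract']).iloc[indices]\n",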
    "examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>abstract</th>\n",
       "      <th>inline_removed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>139098</th>\n",
       "      <td>Telg\\'arsky's conjecture states that for each $k \\in \\mathbb N$, there is a topological space $X_k$ such that in the Banach-Mazur game on $X_k$, the player {\\scriptsize NONEMPTY} has a winning $(k+1)$-tactic but no winning $k$-tactic. We prove that this statement is consistently false.   More specifically, we prove, assuming $\\mathsf{GCH}+\\square$, that if {\\scriptsize NONEMPTY} has a winning strategy for the Banach-Mazur game on a $T_3$ space $X$, then she has a winning $2$-tactic. The proof uses a coding argument due to Galvin, whereby if $X$ has a $\\pi$-base with certain nice properties, then {\\scriptsize NONEMPTY} is able to encode, in each consecutive pair of her opponent's moves, all essential information about the play of the game before the current move. Our proof shows that under $\\mathsf{GCH}+\\square$, every $T_3$ space has a sufficiently nice $\\pi$-base that enables this coding strategy.   Translated into the language of partially ordered sets, what we really show is that $\\mathsf{GCH}+\\square$ implies the following statement, which is equivalent to the existence of the \"nice'' $\\pi$-bases mentioned above: \\emph{Every separative poset $\\mathbb P$ with the $\\kappa$-cc contains a dense sub-poset $\\mathbb D$ such that $|\\{ q \\in \\mathbb D \\,:\\, p \\text{ extends } q \\}| &lt; \\kappa$ for every $p \\in \\mathbb P$.} We prove that this statement is independent of $\\mathsf{ZFC}$: while it holds under $\\mathsf{GCH}+\\square$, it is false even for ccc posets if $\\mathfrak{b} &gt; \\aleph_1$. We also show that if $|\\mathbb P| &lt; \\aleph_\\omega$, then \\axiom-for-$\\mathbb P$ is a consequence of $\\mathsf{GCH}$ holding below $|\\mathbb P|$.</td>\n",
       "      <td>Telg\\'arsky's conjecture states that for each MATH, there is a topological space MATH such that in the Banach-Mazur game on MATH, the player {\\scriptsize NONEMPTY} has a winning MATH-tactic but no winning MATH-tactic. We prove that this statement is consistently false.   More specifically, we prove, assuming MATH, that if {\\scriptsize NONEMPTY} has a winning strategy for the Banach-Mazur game on a MATH space MATH, then she has a winning MATH-tactic. The proof uses a coding argument due to Galvin, whereby if MATH has a MATH-base with certain nice properties, then {\\scriptsize NONEMPTY} is able to encode, in each consecutive pair of her opponent's moves, all essential information about the play of the game before the current move. Our proof shows that under MATH, every MATH space has a sufficiently nice MATH-base that enables this coding strategy.   Translated into the language of partially ordered sets, what we really show is that MATH implies the following statement, which is equivalent to the existence of the \"nice'' MATH-bases mentioned above: \\emph{Every separative poset MATH with the MATH-cc contains a dense sub-poset MATH such that MATH for every MATH.} We prove that this statement is independent of MATH: while it holds under MATH, it is false even for ccc posets if MATH. We also show that if MATH, then \\axiom-for-MATH is a consequence of MATH holding below MATH.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50283</th>\n",
       "      <td>We prove a large deviation principle for a greedy exploration process on an Erd\\\"os-R\\'enyi (ER) graph when the number of nodes goes to infinity. To prove our main result, we use the general strategy to study large deviations of processes proposed by Feng and Kurtz, based on the convergence of non-linear semigroups. The rate function can be expressed in a closed-form formula, and associated optimization problems can be solved explicitly, providing the large deviation trajectory. Also, we derive an LDP for the size of the maximum independent set discovered by such an algorithm and analyze the probability that it exceeds known bounds for the maximal independent set. We also analyze the link between these results and the landscape complexity of the independent set and the exploration dynamic.</td>\n",
       "      <td>We prove a large deviation principle for a greedy exploration process on an Erd\\\"os-R\\'enyi (ER) graph when the number of nodes goes to infinity. To prove our main result, we use the general strategy to study large deviations of processes proposed by Feng and Kurtz, based on the convergence of non-linear semigroups. The rate function can be expressed in a closed-form formula, and associated optimization problems can be solved explicitly, providing the large deviation trajectory. Also, we derive an LDP for the size of the maximum independent set discovered by such an algorithm and analyze the probability that it exceeds known bounds for the maximal independent set. We also analyze the link between these results and the landscape complexity of the independent set and the exploration dynamic.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169377</th>\n",
       "      <td>Given a compact Gelfand pair (G,K) and a locally compact group L, we characterize the class P_K^\\sharp(G,L) of continuous positive definite functions f:G\\times L\\to \\C which are bi-invariant in the G-variable with respect to K. The functions of this class are the functions having a uniformly convergent expansion \\sum_{\\varphi\\in Z} B(\\varphi)(u)\\varphi(x) for x\\in G,u\\in L, where the sum is over the space Z of positive definite spherical functions \\varphi:G\\to\\C for the Gelfand pair, and (B(\\varphi))_{\\varphi\\in Z} is a family of continuous positive definite functions on L such that \\sum_{\\varphi\\in Z}B(\\varphi)(e_L)&lt;\\infty. Here e_L is the neutral element of the group L. For a compact abelian group G considered as a Gelfand pair (G,K) with trivial K=\\{e_G\\}, we obtain a characterization of P(G\\times L) in terms of Fourier expansions on the dual group \\widehat{G}.   The result is described in detail for the case of the Gelfand pairs (O(d+1),O(d)) and (U(q),U(q-1)) as well as for the product of these Gelfand pairs.   The result generalizes recent theorems of Berg-Porcu (2016) and Guella-Menegatto (2016)</td>\n",
       "      <td>Given a compact Gelfand pair (G,K) and a locally compact group L, we characterize the class P_K^\\sharp(G,L) of continuous positive definite functions f:G\\times L\\to \\C which are bi-invariant in the G-variable with respect to K. The functions of this class are the functions having a uniformly convergent expansion \\sum_{\\varphi\\in Z} B(\\varphi)(u)\\varphi(x) for x\\in G,u\\in L, where the sum is over the space Z of positive definite spherical functions \\varphi:G\\to\\C for the Gelfand pair, and (B(\\varphi))_{\\varphi\\in Z} is a family of continuous positive definite functions on L such that \\sum_{\\varphi\\in Z}B(\\varphi)(e_L)&lt;\\infty. Here e_L is the neutral element of the group L. For a compact abelian group G considered as a Gelfand pair (G,K) with trivial K=\\{e_G\\}, we obtain a characterization of P(G\\times L) in terms of Fourier expansions on the dual group \\widehat{G}.   The result is described in detail for the case of the Gelfand pairs (O(d+1),O(d)) and (U(q),U(q-1)) as well as for the product of these Gelfand pairs.   The result generalizes recent theorems of Berg-Porcu (2016) and Guella-Menegatto (2016)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32935</th>\n",
       "      <td>This update to my article on Congruent numbers, elliptic curves, and the passage from the local to the global, which appeared in Resonance, December 2009, pp. 1183--1205 (https://www.ias.ac.in/describe/article/reso/014/12/1183-1205) and was posted here as arXiv:0704.3783, covers a few recent advances in the arithmetic of elliptic curves with special reference to the congruent number problem.</td>\n",
       "      <td>This update to my article on Congruent numbers, elliptic curves, and the passage from the local to the global, which appeared in Resonance, December 2009, pp. 1183--1205 (https://www.ias.ac.in/describe/article/reso/014/12/1183-1205) and was posted here as arXiv:0704.3783, covers a few recent advances in the arithmetic of elliptic curves with special reference to the congruent number problem.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38604</th>\n",
       "      <td>Suppose that $Q$ is a weak$^{\\ast }$ compact convex subset of a dual Banach space with the Radon-Nikod\\'{y}m property. We show that if $(S,Q)$ is a nonexpansive and norm-distal dynamical system, then there is a fixed point of $S$ in $Q$ and the set of fixed points is a nonexpansive retract of $Q.$ As a consequence we obtain a nonlinear extension of the Bader-Gelander-Monod theorem concerning isometries in $L$-embedded Banach spaces. A similar statement is proved for weakly compact convex subsets of a locally convex space, thus giving the nonlinear counterpart of the Ryll-Nardzewski theorem.</td>\n",
       "      <td>Suppose that MATH is a weakMATH compact convex subset of a dual Banach space with the Radon-Nikod\\'{y}m property. We show that if MATH is a nonexpansive and norm-distal dynamical system, then there is a fixed point of MATH in MATH and the set of fixed points is a nonexpansive retract of MATH As a consequence we obtain a nonlinear extension of the Bader-Gelander-Monod theorem concerning isometries in MATH-embedded Banach spaces. A similar statement is proved for weakly compact convex subsets of a locally convex space, thus giving the nonlinear counterpart of the Ryll-Nardzewski theorem.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>132354</th>\n",
       "      <td>The bondage number $b(G)$ of a graph $G$ is the smallest number of edges whose removal from $G$ results in a graph with larger domination number. Let $G$ be embeddable on a surface whose Euler characteristic $\\chi$ is as large as possible, and assume $\\chi\\leq0$. Gagarin-Zverovich and Huang have recently found upper bounds of $b(G)$ in terms of the maximum degree $\\Delta(G)$ and the Euler characteristic $\\chi(G)=\\chi$. In this paper we prove a better upper bound $b(G)\\leq\\Delta(G)+\\lfloor t\\rfloor$ where $t$ is the largest real root of the cubic equation $z^3 + z^2 + (3\\chi - 8)z + 9\\chi - 12=0$; this upper bound is asymptotically equivalent to $b(G)\\leq\\Delta(G)+1+\\lfloor \\sqrt{4-3\\chi} \\rfloor$. We also establish further improved upper bounds for $b(G)$ when the girth, order, or size of the graph $G$ is large compared with its Euler characteristic $\\chi$.</td>\n",
       "      <td>The bondage number MATH of a graph MATH is the smallest number of edges whose removal from MATH results in a graph with larger domination number. Let MATH be embeddable on a surface whose Euler characteristic MATH is as large as possible, and assume MATH. Gagarin-Zverovich and Huang have recently found upper bounds of MATH in terms of the maximum degree MATH and the Euler characteristic MATH. In this paper we prove a better upper bound MATH where MATH is the largest real root of the cubic equation MATH; this upper bound is asymptotically equivalent to MATH. We also establish further improved upper bounds for MATH when the girth, order, or size of the graph MATH is large compared with its Euler characteristic MATH.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      abstract  \\\n",
       "139098    Telg\\'arsky's conjecture states that for each $k \\in \\mathbb N$, there is a topological space $X_k$ such that in the Banach-Mazur game on $X_k$, the player {\\scriptsize NONEMPTY} has a winning $(k+1)$-tactic but no winning $k$-tactic. We prove that this statement is consistently false.   More specifically, we prove, assuming $\\mathsf{GCH}+\\square$, that if {\\scriptsize NONEMPTY} has a winning strategy for the Banach-Mazur game on a $T_3$ space $X$, then she has a winning $2$-tactic. The proof uses a coding argument due to Galvin, whereby if $X$ has a $\\pi$-base with certain nice properties, then {\\scriptsize NONEMPTY} is able to encode, in each consecutive pair of her opponent's moves, all essential information about the play of the game before the current move. Our proof shows that under $\\mathsf{GCH}+\\square$, every $T_3$ space has a sufficiently nice $\\pi$-base that enables this coding strategy.   Translated into the language of partially ordered sets, what we really show is that $\\mathsf{GCH}+\\square$ implies the following statement, which is equivalent to the existence of the \"nice'' $\\pi$-bases mentioned above: \\emph{Every separative poset $\\mathbb P$ with the $\\kappa$-cc contains a dense sub-poset $\\mathbb D$ such that $|\\{ q \\in \\mathbb D \\,:\\, p \\text{ extends } q \\}| < \\kappa$ for every $p \\in \\mathbb P$.} We prove that this statement is independent of $\\mathsf{ZFC}$: while it holds under $\\mathsf{GCH}+\\square$, it is false even for ccc posets if $\\mathfrak{b} > \\aleph_1$. We also show that if $|\\mathbb P| < \\aleph_\\omega$, then \\axiom-for-$\\mathbb P$ is a consequence of $\\mathsf{GCH}$ holding below $|\\mathbb P|$.    \n",
       "50283     We prove a large deviation principle for a greedy exploration process on an Erd\\\"os-R\\'enyi (ER) graph when the number of nodes goes to infinity. To prove our main result, we use the general strategy to study large deviations of processes proposed by Feng and Kurtz, based on the convergence of non-linear semigroups. The rate function can be expressed in a closed-form formula, and associated optimization problems can be solved explicitly, providing the large deviation trajectory. Also, we derive an LDP for the size of the maximum independent set discovered by such an algorithm and analyze the probability that it exceeds known bounds for the maximal independent set. We also analyze the link between these results and the landscape complexity of the independent set and the exploration dynamic.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "169377    Given a compact Gelfand pair (G,K) and a locally compact group L, we characterize the class P_K^\\sharp(G,L) of continuous positive definite functions f:G\\times L\\to \\C which are bi-invariant in the G-variable with respect to K. The functions of this class are the functions having a uniformly convergent expansion \\sum_{\\varphi\\in Z} B(\\varphi)(u)\\varphi(x) for x\\in G,u\\in L, where the sum is over the space Z of positive definite spherical functions \\varphi:G\\to\\C for the Gelfand pair, and (B(\\varphi))_{\\varphi\\in Z} is a family of continuous positive definite functions on L such that \\sum_{\\varphi\\in Z}B(\\varphi)(e_L)<\\infty. Here e_L is the neutral element of the group L. For a compact abelian group G considered as a Gelfand pair (G,K) with trivial K=\\{e_G\\}, we obtain a characterization of P(G\\times L) in terms of Fourier expansions on the dual group \\widehat{G}.   The result is described in detail for the case of the Gelfand pairs (O(d+1),O(d)) and (U(q),U(q-1)) as well as for the product of these Gelfand pairs.   The result generalizes recent theorems of Berg-Porcu (2016) and Guella-Menegatto (2016)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        \n",
       "32935     This update to my article on Congruent numbers, elliptic curves, and the passage from the local to the global, which appeared in Resonance, December 2009, pp. 1183--1205 (https://www.ias.ac.in/describe/article/reso/014/12/1183-1205) and was posted here as arXiv:0704.3783, covers a few recent advances in the arithmetic of elliptic curves with special reference to the congruent number problem.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "38604     Suppose that $Q$ is a weak$^{\\ast }$ compact convex subset of a dual Banach space with the Radon-Nikod\\'{y}m property. We show that if $(S,Q)$ is a nonexpansive and norm-distal dynamical system, then there is a fixed point of $S$ in $Q$ and the set of fixed points is a nonexpansive retract of $Q.$ As a consequence we obtain a nonlinear extension of the Bader-Gelander-Monod theorem concerning isometries in $L$-embedded Banach spaces. A similar statement is proved for weakly compact convex subsets of a locally convex space, thus giving the nonlinear counterpart of the Ryll-Nardzewski theorem.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "132354    The bondage number $b(G)$ of a graph $G$ is the smallest number of edges whose removal from $G$ results in a graph with larger domination number. Let $G$ be embeddable on a surface whose Euler characteristic $\\chi$ is as large as possible, and assume $\\chi\\leq0$. Gagarin-Zverovich and Huang have recently found upper bounds of $b(G)$ in terms of the maximum degree $\\Delta(G)$ and the Euler characteristic $\\chi(G)=\\chi$. In this paper we prove a better upper bound $b(G)\\leq\\Delta(G)+\\lfloor t\\rfloor$ where $t$ is the largest real root of the cubic equation $z^3 + z^2 + (3\\chi - 8)z + 9\\chi - 12=0$; this upper bound is asymptotically equivalent to $b(G)\\leq\\Delta(G)+1+\\lfloor \\sqrt{4-3\\chi} \\rfloor$. We also establish further improved upper bounds for $b(G)$ when the girth, order, or size of the graph $G$ is large compared with its Euler characteristic $\\chi$.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           inline_removed  \n",
       "139098    Telg\\'arsky's conjecture states that for each MATH, there is a topological space MATH such that in the Banach-Mazur game on MATH, the player {\\scriptsize NONEMPTY} has a winning MATH-tactic but no winning MATH-tactic. We prove that this statement is consistently false.   More specifically, we prove, assuming MATH, that if {\\scriptsize NONEMPTY} has a winning strategy for the Banach-Mazur game on a MATH space MATH, then she has a winning MATH-tactic. The proof uses a coding argument due to Galvin, whereby if MATH has a MATH-base with certain nice properties, then {\\scriptsize NONEMPTY} is able to encode, in each consecutive pair of her opponent's moves, all essential information about the play of the game before the current move. Our proof shows that under MATH, every MATH space has a sufficiently nice MATH-base that enables this coding strategy.   Translated into the language of partially ordered sets, what we really show is that MATH implies the following statement, which is equivalent to the existence of the \"nice'' MATH-bases mentioned above: \\emph{Every separative poset MATH with the MATH-cc contains a dense sub-poset MATH such that MATH for every MATH.} We prove that this statement is independent of MATH: while it holds under MATH, it is false even for ccc posets if MATH. We also show that if MATH, then \\axiom-for-MATH is a consequence of MATH holding below MATH.   \n",
       "50283     We prove a large deviation principle for a greedy exploration process on an Erd\\\"os-R\\'enyi (ER) graph when the number of nodes goes to infinity. To prove our main result, we use the general strategy to study large deviations of processes proposed by Feng and Kurtz, based on the convergence of non-linear semigroups. The rate function can be expressed in a closed-form formula, and associated optimization problems can be solved explicitly, providing the large deviation trajectory. Also, we derive an LDP for the size of the maximum independent set discovered by such an algorithm and analyze the probability that it exceeds known bounds for the maximal independent set. We also analyze the link between these results and the landscape complexity of the independent set and the exploration dynamic.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "169377    Given a compact Gelfand pair (G,K) and a locally compact group L, we characterize the class P_K^\\sharp(G,L) of continuous positive definite functions f:G\\times L\\to \\C which are bi-invariant in the G-variable with respect to K. The functions of this class are the functions having a uniformly convergent expansion \\sum_{\\varphi\\in Z} B(\\varphi)(u)\\varphi(x) for x\\in G,u\\in L, where the sum is over the space Z of positive definite spherical functions \\varphi:G\\to\\C for the Gelfand pair, and (B(\\varphi))_{\\varphi\\in Z} is a family of continuous positive definite functions on L such that \\sum_{\\varphi\\in Z}B(\\varphi)(e_L)<\\infty. Here e_L is the neutral element of the group L. For a compact abelian group G considered as a Gelfand pair (G,K) with trivial K=\\{e_G\\}, we obtain a characterization of P(G\\times L) in terms of Fourier expansions on the dual group \\widehat{G}.   The result is described in detail for the case of the Gelfand pairs (O(d+1),O(d)) and (U(q),U(q-1)) as well as for the product of these Gelfand pairs.   The result generalizes recent theorems of Berg-Porcu (2016) and Guella-Menegatto (2016)                                                                                                                                                                                                                                                                                  \n",
       "32935     This update to my article on Congruent numbers, elliptic curves, and the passage from the local to the global, which appeared in Resonance, December 2009, pp. 1183--1205 (https://www.ias.ac.in/describe/article/reso/014/12/1183-1205) and was posted here as arXiv:0704.3783, covers a few recent advances in the arithmetic of elliptic curves with special reference to the congruent number problem.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "38604     Suppose that MATH is a weakMATH compact convex subset of a dual Banach space with the Radon-Nikod\\'{y}m property. We show that if MATH is a nonexpansive and norm-distal dynamical system, then there is a fixed point of MATH in MATH and the set of fixed points is a nonexpansive retract of MATH As a consequence we obtain a nonlinear extension of the Bader-Gelander-Monod theorem concerning isometries in MATH-embedded Banach spaces. A similar statement is proved for weakly compact convex subsets of a locally convex space, thus giving the nonlinear counterpart of the Ryll-Nardzewski theorem.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "132354    The bondage number MATH of a graph MATH is the smallest number of edges whose removal from MATH results in a graph with larger domination number. Let MATH be embeddable on a surface whose Euler characteristic MATH is as large as possible, and assume MATH. Gagarin-Zverovich and Huang have recently found upper bounds of MATH in terms of the maximum degree MATH and the Euler characteristic MATH. In this paper we prove a better upper bound MATH where MATH is the largest real root of the cubic equation MATH; this upper bound is asymptotically equivalent to MATH. We also establish further improved upper bounds for MATH when the girth, order, or size of the graph MATH is large compared with its Euler characteristic MATH.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Naive latex removal: We are first going to only remove latex math that is a 'word' i.e.\n",
    "## for which there is a space on either side of it.\n",
    "\n",
    "import regex\n",
    "\n",
    "## in-line math pattern separated by white space.\n",
    "pattern = r'\\$[^\\$]+?\\$'\n",
    "examples['inline_removed'] = pd.Series([regex.sub(pattern,'MATH',abstract) for abstract in examples.abstract],\n",
    "                                       index=indices)\n",
    "examples"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some decoration it would be nice to remove:\n",
    "\\emph{TEXT} - > TEXT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Teg\\'arsky's conjecture\n",
      "Tegarsky's conjecture\n"
     ]
    }
   ],
   "source": [
    "t = \"Teg\\\\'arsky's conjecture\"\n",
    "import regex\n",
    "\n",
    "## Goal: Replace with 'Tegarsky's conjecture'\n",
    "\n",
    "pattern = r\"\\\\\\'(.)\"\n",
    "results = regex.sub(pattern,r'\\1',t)\n",
    "print(t)\n",
    "print(results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "We study C\\^ech cohomology...\n",
      "We study Cech cohomology...\n"
     ]
    }
   ],
   "source": [
    "## Test Cech cohomology\n",
    "\n",
    "t = 'We study C\\^ech cohomology...'\n",
    "pattern = r'\\\\\\^(.)'\n",
    "result = regex.sub(pattern,r'\\1',t)\n",
    "print(t)\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Here is an example. We really want to say \\emph{This is fucking stupid}. How do we do this?\n",
      "Here is an example. We really want to say This is fucking stupid. How do we do this?\n"
     ]
    }
   ],
   "source": [
    "## Now how to do get rid of tex style formatting?\n",
    "t = 'Here is an example. We really want to say \\emph{This is fucking stupid}. How do we do this?'\n",
    "pattern = r'\\\\[^{}]*?{([^{}]*)}'\n",
    "result = regex.sub(pattern,r'\\1',t)\n",
    "print(t)\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Test out on more examples. Make this into a function\n",
    "\n",
    "def remove_env(string):\n",
    "    pattern = r'\\\\[^{}]*?{([^{}]*)}'\n",
    "    return regex.sub(pattern, r'\\1', string)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'We are going to blah blah DG and then yeah we do that.'"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = r'We are going to blah blah \\cite{DG} and then yeah we do that.'\n",
    "remove_env(a)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. More precise regex based cleaning.\n",
    "\n",
    "#### List of patterns we will substitute:\n",
    "\n",
    "Order of operations:\n",
    "Remove \\cite \\bf{} \\emph{} and replace with just what's inside the {}.\n",
    "\n",
    "1. Remove TeX accents \n",
    "    - r'\\\\\\A(.)'  -> r'\\1'\n",
    "    - Here A is the accent character\n",
    "    - A \\in {' , \" , ^ , `, H, ~, c, k, l, =, b, d, r, u, v, t, o, i}\n",
    "    See https://en.wikibooks.org/wiki/LaTeX/Special_Characters\n",
    "\n",
    "    - We also have to deal with accents written like \\A{letter}\n",
    "    - We ALSO have to deal with the fact that there is latex formatting that can begin with the same\n",
    "    characters -- e.g. \\b(.) will also match the \\bf in \\bf{text}. One way to do this is to remove these\n",
    "    environments first, before cleaning the accents.\n",
    "\n",
    "2. As we mentioned, this will get caught on \\cite{} or \\emph{} or \\bf{} BUT a further complication--\n",
    "We don't want to REMOVE the pattern Schr\\\"{o}dinger; we want to replace it with Schrodinger. However, we DO\n",
    "want to remove the environmen \\begin{}, \\end{} etc\n",
    "\n",
    "    - Maybe we can think of it like this: Use look-aheads. We know that accent characters will always be of the\n",
    "    form \\(SINGLE CHARACTER){}. Whereas no environments are defined by a single character?\n",
    "\n",
    "3. Specific character sequences\n",
    "    - \\begin{}\n",
    "    - \\end{}\n",
    "    - \\item\n",
    "    - \\\\[ .... \\\\]\n",
    "    - \\$$ ....\\$$\n",
    "4. IDEA: First match \\A{o} type accents and replace those.\n",
    "5. Now we only have ..a\\ce and \\cite. But we can differentiate these by looking ahead for a {}, since we have already removed accents that enclose the recieving char in {}.\n",
    "6. Latex envs are all of the form \\(LETTERS{...}) so, we can search for a pattern like\n",
    "r'\\\\[a-z]+'\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "## For cleaning tests, we make a copy of the clean data to experiment on \n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "pd.set_option('display.max_colwidth', 0)\n",
    "\n",
    "np.random.seed(420)\n",
    "test = pd.read_parquet('./data/arXiv.parquet',columns=['title','abstract']).sample(100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Define the pipeline\n",
    "import regex\n",
    "\n",
    "## 1. Latin-ize latex accents enclosed in brackets\n",
    "def remove_latex_accents(string):\n",
    "    accent = r'\\\\[\\'\\\"\\^\\`H\\~ckl=bdruvtoi]\\{([a-z])\\}'\n",
    "    replacement = r'\\1'\n",
    "\n",
    "    string = regex.sub(accent,replacement, string)\n",
    "    return string\n",
    "\n",
    "## 2. Remove latex environments\n",
    "def remove_env(string):\n",
    "    env = r'\\\\[a-z]{2,}{[^{}]+?}'\n",
    "\n",
    "    string = regex.sub(env,'',string)\n",
    "    return string\n",
    "\n",
    "## 3. Latin-ize non-{} enclosed latex accents:\n",
    "def remove_accents(string):\n",
    "    accent = r'\\\\[\\'\\\"\\^\\`H\\~ckl=bdruvtoi]([a-z])'\n",
    "    replacement = r'\\1'\n",
    "\n",
    "    string = regex.sub(accent,replacement,string)\n",
    "    return string \n",
    "\n",
    "## 4. ONLY remove latex'd math that is separated as a 'word' i.e. has space characters on either side of it.\n",
    "\n",
    "def remove_latex(string):\n",
    "    latex = r'\\s(\\$\\$?)[^\\$]*?\\1\\S*'\n",
    "    string = regex.sub(latex,' LATEX ',string)\n",
    "    return string \n",
    "     "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Create the cleaning pipeline:\n",
    "\n",
    "def cleanse(string):\n",
    "    string = remove_latex_accents(string)\n",
    "    string = remove_env(string)\n",
    "    string = remove_accents(string)\n",
    "    string = remove_latex(string)\n",
    "    return string\n",
    "        \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "## See the results of these 4 ste\n",
    "test['abstract'] = test.abstract.apply(cleanse)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>90755</th>\n",
       "      <td>Symmetric Ideal Magnetofluidostatic Equilibria with Non-Vanishing\\n  Pressure Gradients in Asymmetric Confinement Vessels</td>\n",
       "      <td>We study the possibility of constructing steady magnetic fields satisfying the force balance equation of ideal magnetohydrodynamics with tangential boundary conditions in asymmetric confinement vessels, i.e. bounded regions that are not invariant under continuous Euclidean isometries (translations, rotations, or their combination). This problem is often encountered in the design of next-generation fusion reactors. We show that such configurations are possible if one relaxes the standard assumption that the vessel boundary corresponds to a pressure isosurface. We exhibit a smooth solution that possesses an Euclidean symmetry and yet solves the boundary value problem in an asymmetric ellipsoidal domain while sustaining a non-vanishing pressure gradient. This result provides a definitive answer to the problem of existence of regular ideal magnetofluidostatic equilibria in asymmetric bounded domains. The question remains open whether regular asymmetric solutions of the boundary value problem exist.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18805</th>\n",
       "      <td>Improved Sample Complexity Bounds for Branch-and-Cut</td>\n",
       "      <td>Branch-and-cut is the most widely used algorithm for solving integer programs, employed by commercial solvers like CPLEX and Gurobi. Branch-and-cut has a wide variety of tunable parameters that have a huge impact on the size of the search tree that it builds, but are challenging to tune by hand. An increasingly popular approach is to use machine learning to tune these parameters: using a training set of integer programs from the application domain at hand, the goal is to find a configuration with strong predicted performance on future, unseen integer programs from the same domain. If the training set is too small, a configuration may have good performance over the training set but poor performance on future integer programs. In this paper, we prove sample complexity guarantees for this procedure, which bound how large the training set should be to ensure that for any configuration, its average performance over the training set is close to its expected future performance. Our guarantees apply to parameters that control the most important aspects of branch-and-cut: node selection, branching constraint selection, and cutting plane selection, and are sharper and more general than those found in prior research.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>165414</th>\n",
       "      <td>Yorioka's characterization of the cofinality of the strong measure zero\\n  ideal and its independency from the continuum</td>\n",
       "      <td>In this paper we present a simpler proof of the fact that no inequality between LATEX and LATEX can be decided in ZFC by using well-known tecniques and results.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139352</th>\n",
       "      <td>The Janson inequalities for general up-sets</td>\n",
       "      <td>Janson and Janson, Luczak and Rucinski proved several inequalities for the lower tail of the distribution of the number of events that hold, when all the events are up-sets (increasing events) of a special form - each event is the intersection of some subset of a single set of independent events (i.e., a principal up-set). We show that these inequalities in fact hold for arbitrary up-sets, by modifying existing proofs to use only positive correlation, avoiding the need to assume positive correlation conditioned on one of the events.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>168337</th>\n",
       "      <td>Priority Maps for Surveillance and Intervention of Wildfires and other\\n  Spreading Processes</td>\n",
       "      <td>Unmanned Aerial Vehicle (UAV) path planning algorithms often assume a knowledge reward function or priority map, indicating the most important areas to visit. In this paper we propose a method to create priority maps for monitoring or intervention of dynamic spreading processes such as wildfires. The presented optimization framework utilizes the properties of positive systems, in particular the separable structure of value (cost-to-go) functions, to provide scalable algorithms for surveillance and intervention. We present results obtained for a 16 and 1000 node example and convey how the priority map responds to changes in the dynamics of the system. The larger example of 1000 nodes, representing a fictional landscape, shows how the method can integrate bushfire spreading dynamics, landscape and wind conditions. Finally, we give an example of combining the proposed method with a travelling salesman problem for UAV path planning for wildfire intervention.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85379</th>\n",
       "      <td>The Hart-Shelah example, in stronger logics</td>\n",
       "      <td>We generalize the Hart-Shelah example  to higher infinitary logics. We build, for each natural number LATEX and for each infinite cardinal LATEX  a sentence LATEX of the logic LATEX that (modulo mild set theoretical hypotheses around LATEX and assuming LATEX  is categorical in LATEX but not in LATEX (or beyond); we study the dimensional encoding of combinatorics involved in the construction of this sentence and study various model-theoretic properties of the resulting abstract elementary class LATEX in the finite interval of cardinals LATEX</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>122358</th>\n",
       "      <td>Relative non-pluripolar products of currents</td>\n",
       "      <td>Given a closed positive current T on a compact Kahler manifold X, we introduce the notion of non-pluripolar product relative to T of closed positive (1,1)-currents. We recover the well-known non-pluripolar product when T is the current of integration along X. Our main results are a monotonicity property of relative non-pluripolar products, a necessary condition for currents to be of relative full mass intersection in terms of Lelong numbers, and the convexity of weighted classes of currents of relative full mass intersection. The former two results are new even when T is the current of integration along X.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>153436</th>\n",
       "      <td>Schemes supported on the singular locus of a hyperplane arrangement in\\n  $\\mathbb P^n$</td>\n",
       "      <td>We introduce the use of liaison addition to the study of hyperplane arrangements. For an arrangement, LATEX  of hyperplanes in LATEX  LATEX is free if LATEX is Cohen-Macaulay, where LATEX is the Jacobian ideal of LATEX  Terao's conjecture says that freeness of LATEX is determined by the combinatorics of the intersection lattice of LATEX  We study the Cohen-Macaulayness of three other ideals, all unmixed, that are closely related to LATEX  Let LATEX be the intersection of height two primary components of LATEX and LATEX be the radical of LATEX  Our third ideal is LATEX for suitable LATEX  With a mild hypothesis we use liaison addition to show that all of these ideals are Cohen-Macaulay. When our hypothesis does not hold, we show that these ideals are not necessarily Cohen-Macaulay, and that Cohen-Macaulayness of any of these ideals does not imply Cohen-Macaulayness of any of the others. While we do not study the freeness of LATEX  we show by example that the Betti diagrams can vary even for arrangements with the same combinatorics.   We then study the situation when the hypothesis does not hold. For equidimensional curves in LATEX  the Hartshorne-Rao module from liaison theory measures the failure of an ideal to be Cohen-Macaulay, degree by degree, and also determines the even liaison class of such a curve. We show that for any positive integer LATEX there is an arrangement LATEX for which LATEX fails to be Cohen-Macaulay in only one degree, and this failure is by LATEX  we also give an analogous result for LATEX</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>62339</th>\n",
       "      <td>Clifford deformations of Koszul Frobenius algebras and noncommutative\\n  quadrics</td>\n",
       "      <td>Let LATEX be a Koszul Frobenius algebra. A Clifford deformation of LATEX is a finite dimensional LATEX  algebra LATEX  which corresponds to a noncommutative quadric hypersurface LATEX  for some central regular element LATEX  It turns out that the bounded derived category LATEX is equivalent to the stable category of the maximal Cohen-Macaulay modules over LATEX provided that LATEX is noetherian. As a consequence, LATEX is a noncommutative isolated singularity if and only if the corresponding Clifford deformation LATEX is a semisimple LATEX  algebra. The preceding equivalence of triangulated categories also indicates that Clifford deformations of trivial extensions of a Koszul Frobenius algebra are related to the Knorrer Periodicity Theorem for quadric hypersurfaces. As an application, we recover Knorrer Periodicity Theorem without using of matrix factorizations.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>117210</th>\n",
       "      <td>Causal Factorization and Linear Feedback</td>\n",
       "      <td>An algebraic framework for the investigation of linear dynamic output feedback is introduced. Pivotal in the present theory is the problem of causal factorization, i.e. the problem of factoring two systems over each other through a causal factor. The basic issues are resolved with the aid of the new concept of latency kernels.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>100 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                            title  \\\n",
       "90755   Symmetric Ideal Magnetofluidostatic Equilibria with Non-Vanishing\\n  Pressure Gradients in Asymmetric Confinement Vessels   \n",
       "18805   Improved Sample Complexity Bounds for Branch-and-Cut                                                                        \n",
       "165414  Yorioka's characterization of the cofinality of the strong measure zero\\n  ideal and its independency from the continuum    \n",
       "139352  The Janson inequalities for general up-sets                                                                                 \n",
       "168337  Priority Maps for Surveillance and Intervention of Wildfires and other\\n  Spreading Processes                               \n",
       "...                                                                                               ...                               \n",
       "85379   The Hart-Shelah example, in stronger logics                                                                                 \n",
       "122358  Relative non-pluripolar products of currents                                                                                \n",
       "153436  Schemes supported on the singular locus of a hyperplane arrangement in\\n  $\\mathbb P^n$                                     \n",
       "62339   Clifford deformations of Koszul Frobenius algebras and noncommutative\\n  quadrics                                           \n",
       "117210  Causal Factorization and Linear Feedback                                                                                    \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     abstract  \n",
       "90755     We study the possibility of constructing steady magnetic fields satisfying the force balance equation of ideal magnetohydrodynamics with tangential boundary conditions in asymmetric confinement vessels, i.e. bounded regions that are not invariant under continuous Euclidean isometries (translations, rotations, or their combination). This problem is often encountered in the design of next-generation fusion reactors. We show that such configurations are possible if one relaxes the standard assumption that the vessel boundary corresponds to a pressure isosurface. We exhibit a smooth solution that possesses an Euclidean symmetry and yet solves the boundary value problem in an asymmetric ellipsoidal domain while sustaining a non-vanishing pressure gradient. This result provides a definitive answer to the problem of existence of regular ideal magnetofluidostatic equilibria in asymmetric bounded domains. The question remains open whether regular asymmetric solutions of the boundary value problem exist.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    \n",
       "18805     Branch-and-cut is the most widely used algorithm for solving integer programs, employed by commercial solvers like CPLEX and Gurobi. Branch-and-cut has a wide variety of tunable parameters that have a huge impact on the size of the search tree that it builds, but are challenging to tune by hand. An increasingly popular approach is to use machine learning to tune these parameters: using a training set of integer programs from the application domain at hand, the goal is to find a configuration with strong predicted performance on future, unseen integer programs from the same domain. If the training set is too small, a configuration may have good performance over the training set but poor performance on future integer programs. In this paper, we prove sample complexity guarantees for this procedure, which bound how large the training set should be to ensure that for any configuration, its average performance over the training set is close to its expected future performance. Our guarantees apply to parameters that control the most important aspects of branch-and-cut: node selection, branching constraint selection, and cutting plane selection, and are sharper and more general than those found in prior research.                                                                                                                                                                                                                                                                                                                            \n",
       "165414    In this paper we present a simpler proof of the fact that no inequality between LATEX and LATEX can be decided in ZFC by using well-known tecniques and results.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "139352    Janson and Janson, Luczak and Rucinski proved several inequalities for the lower tail of the distribution of the number of events that hold, when all the events are up-sets (increasing events) of a special form - each event is the intersection of some subset of a single set of independent events (i.e., a principal up-set). We show that these inequalities in fact hold for arbitrary up-sets, by modifying existing proofs to use only positive correlation, avoiding the need to assume positive correlation conditioned on one of the events.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "168337    Unmanned Aerial Vehicle (UAV) path planning algorithms often assume a knowledge reward function or priority map, indicating the most important areas to visit. In this paper we propose a method to create priority maps for monitoring or intervention of dynamic spreading processes such as wildfires. The presented optimization framework utilizes the properties of positive systems, in particular the separable structure of value (cost-to-go) functions, to provide scalable algorithms for surveillance and intervention. We present results obtained for a 16 and 1000 node example and convey how the priority map responds to changes in the dynamics of the system. The larger example of 1000 nodes, representing a fictional landscape, shows how the method can integrate bushfire spreading dynamics, landscape and wind conditions. Finally, we give an example of combining the proposed method with a travelling salesman problem for UAV path planning for wildfire intervention.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             ...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "85379     We generalize the Hart-Shelah example  to higher infinitary logics. We build, for each natural number LATEX and for each infinite cardinal LATEX  a sentence LATEX of the logic LATEX that (modulo mild set theoretical hypotheses around LATEX and assuming LATEX  is categorical in LATEX but not in LATEX (or beyond); we study the dimensional encoding of combinatorics involved in the construction of this sentence and study various model-theoretic properties of the resulting abstract elementary class LATEX in the finite interval of cardinals LATEX                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "122358    Given a closed positive current T on a compact Kahler manifold X, we introduce the notion of non-pluripolar product relative to T of closed positive (1,1)-currents. We recover the well-known non-pluripolar product when T is the current of integration along X. Our main results are a monotonicity property of relative non-pluripolar products, a necessary condition for currents to be of relative full mass intersection in terms of Lelong numbers, and the convexity of weighted classes of currents of relative full mass intersection. The former two results are new even when T is the current of integration along X.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                \n",
       "153436    We introduce the use of liaison addition to the study of hyperplane arrangements. For an arrangement, LATEX  of hyperplanes in LATEX  LATEX is free if LATEX is Cohen-Macaulay, where LATEX is the Jacobian ideal of LATEX  Terao's conjecture says that freeness of LATEX is determined by the combinatorics of the intersection lattice of LATEX  We study the Cohen-Macaulayness of three other ideals, all unmixed, that are closely related to LATEX  Let LATEX be the intersection of height two primary components of LATEX and LATEX be the radical of LATEX  Our third ideal is LATEX for suitable LATEX  With a mild hypothesis we use liaison addition to show that all of these ideals are Cohen-Macaulay. When our hypothesis does not hold, we show that these ideals are not necessarily Cohen-Macaulay, and that Cohen-Macaulayness of any of these ideals does not imply Cohen-Macaulayness of any of the others. While we do not study the freeness of LATEX  we show by example that the Betti diagrams can vary even for arrangements with the same combinatorics.   We then study the situation when the hypothesis does not hold. For equidimensional curves in LATEX  the Hartshorne-Rao module from liaison theory measures the failure of an ideal to be Cohen-Macaulay, degree by degree, and also determines the even liaison class of such a curve. We show that for any positive integer LATEX there is an arrangement LATEX for which LATEX fails to be Cohen-Macaulay in only one degree, and this failure is by LATEX  we also give an analogous result for LATEX    \n",
       "62339     Let LATEX be a Koszul Frobenius algebra. A Clifford deformation of LATEX is a finite dimensional LATEX  algebra LATEX  which corresponds to a noncommutative quadric hypersurface LATEX  for some central regular element LATEX  It turns out that the bounded derived category LATEX is equivalent to the stable category of the maximal Cohen-Macaulay modules over LATEX provided that LATEX is noetherian. As a consequence, LATEX is a noncommutative isolated singularity if and only if the corresponding Clifford deformation LATEX is a semisimple LATEX  algebra. The preceding equivalence of triangulated categories also indicates that Clifford deformations of trivial extensions of a Koszul Frobenius algebra are related to the Knorrer Periodicity Theorem for quadric hypersurfaces. As an application, we recover Knorrer Periodicity Theorem without using of matrix factorizations.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "117210    An algebraic framework for the investigation of linear dynamic output feedback is introduced. Pivotal in the present theory is the problem of causal factorization, i.e. the problem of factoring two systems over each other through a causal factor. The basic issues are resolved with the aid of the new concept of latency kernels.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "\n",
       "[100 rows x 2 columns]"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sample():\n",
    "    test = pd.read_parquet('./data/arXiv.parquet',columns=['title','abstract']).sample(10)\n",
    "    test['abstract'] = test.abstract.apply(cleanse)\n",
    "    return test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>172804</th>\n",
       "      <td>First order convergence and roots</td>\n",
       "      <td>Nesetril and Ossona de Mendez introduced the notion of first order convergence, which unifies the notions of convergence for sparse and dense graphs. They asked whether if G_i is a sequence of graphs with M being their first order limit and v is a vertex of M, then there exists a sequence v_i of vertices such that the graphs G_i rooted at v_i converge to M rooted at v. We show that this holds for almost all vertices v of M and we give an example showing that the statement need not hold for all vertices.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>117253</th>\n",
       "      <td>Uniqueness for contagious McKean--Vlasov systems in the weak feedback\\n  regime</td>\n",
       "      <td>We present a simple uniqueness argument for a collection of McKean-Vlasov problems that have seen recent interest. Our first result shows that, in the weak feedback regime, there is global uniqueness for a very general class of random drivers. By weak feedback we mean the case where the contagion parameters are small enough to prevent blow-ups in solutions. Next, we specialise to a Brownian driver and show how the same techniques can be extended to give short-time uniqueness after blow-ups, regardless of the feedback strength. The heart of our approach is a surprisingly simple probabilistic comparison argument that is robust in the sense that it does not ask for any regularity of the solutions.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23755</th>\n",
       "      <td>$n$-Kazhdan groups and higher spectral expanders</td>\n",
       "      <td>Let LATEX  be a group of type LATEX  and let LATEX  be the LATEX  skeleton of the universal cover of a LATEX  simplicial complex with finite LATEX  skeleton. We show that if LATEX  is strongly LATEX  then for any family of finite index subgroups LATEX  the family of simplicial complexes LATEX  are bounded degree LATEX  spectral expanders. Using this we construct new examples of LATEX  dimensional spectral expanders.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2822</th>\n",
       "      <td>A 1-dimensional formal group over the prismatization of Spf Z_p</td>\n",
       "      <td>Let Sigma denote the prismatization of Spf (Z_p). The multiplicative group over Sigma maps to the prismatization of the multiplicative group over Spf (Z_p). We prove that the kernel of this map is the Cartier dual of some 1-dimensional formal group over Sigma. We obtain some results about this formal group (e.g., we describe its Lie algebra). We give a very explicit description of the pullback of the formal group to the quotient of the q-de Rham prism by the action of the multiplicative group of Z_p.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8528</th>\n",
       "      <td>Strong replica symmetry in high-dimensional optimal Bayesian inference</td>\n",
       "      <td>We consider generic optimal Bayesian inference, namely, models of signal reconstruction where the posterior distribution and all hyperparameters are known. Under a standard assumption on the concentration of the free energy, we show how replica symmetry in the strong sense of concentration of all multioverlaps can be established as a consequence of the Franz-de Sanctis identities; the identities themselves in the current setting are obtained via a novel perturbation coming from exponentially distributed \"side-observations\" of the signal. Concentration of multioverlaps means that asymptotically the posterior distribution has a particularly simple structure encoded by a random probability measure (or, in the case of binary signal, a non-random probability measure). We believe that such strong control of the model should be key in the study of inference problems with underlying sparse graphical structure (error correcting codes, block models, etc) and, in particular, in the rigorous derivation of replica symmetric formulas for the free energy and mutual information in this context.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>142660</th>\n",
       "      <td>Infinite dimensional systems of particles with interactions given by\\n  Dunkl operators</td>\n",
       "      <td>Firstly we consider a finite dimensional Markov semigroup generated by Dunkl laplacian with drift terms. Using gradient bounds we show that for small coefficients this semigroup has an invariant measure. We then extend this analysis to an infinite dimensional semigroup on LATEX  which we construct using gradient bounds, and finally we study the existence of invariant measures and ergodicity properties.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>106924</th>\n",
       "      <td>The Erd\\H{o}s-Ko-Rado theorem for $2$-intersecting families of perfect\\n  matchings</td>\n",
       "      <td>A perfect matching in the complete graph on LATEX  vertices is a set of edges such that no two edges have a vertex in common and every vertex is covered exactly once. Two perfect matchings are said to be LATEX  if they have at least LATEX  edges in common. The main result in this paper is an extension of the famous Erdos-Ko-Rado (EKR) theorem  to 2-intersecting families of perfect matchings for all values of LATEX  Specifically, for LATEX  a set of 2-intersecting perfect matchings in LATEX  of maximum size has LATEX  perfect matchings.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99296</th>\n",
       "      <td>On the Yau-Tian-Donaldson conjecture for generalized K\\\"ahler-Ricci\\n  soliton equations</td>\n",
       "      <td>Let LATEX  be a log variety with an effective holomorphic torus action, and LATEX  be a closed positive LATEX  For any smooth positive function LATEX  defined on the moment polytope of the torus action, we study the Monge-Ampere equations that correspond to generalized and twisted Kahler-Ricci LATEX  We prove a version of Yau-Tian-Donaldson (YTD) conjecture for these general equations, showing that the existence of solutions is always equivalent to an equivariantly uniform LATEX  LATEX  When LATEX  is a current associated to a torus invariant linear system, we further show that equivariant special test configurations suffice for testing the stability. Our results allow arbitrary klt singularities and generalize most of previous results on (uniform) YTD conjecture for (twisted) Kahler-Ricci/Mabuchi solitons or Kahler-Einstein metrics.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18873</th>\n",
       "      <td>Equivariant log-concavity and equivariant K\\\"ahler packages</td>\n",
       "      <td>We show that the exterior algebra LATEX  which is the cohomology of the torus LATEX  and the polynomial ring LATEX  which is the cohomology of the classifying space LATEX  are LATEX  log-concave. We do so by explicitly giving the LATEX  maps on the appropriate sequences of tensor products of polynomials or exterior powers and proving that these maps satisfy the hard Lefschetz theorem. Furthermore, we prove that the whole Kahler package, including algebraic analogies of the Poincare duality, hard Lefschetz, and Hodge-Riemann bilinear relations, holds on the corresponding sequences in an equivariant setting.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70252</th>\n",
       "      <td>Nullstellensatz for relative existentially closed groups</td>\n",
       "      <td>We prove that in every variety of LATEX  every LATEX  closed element satisfies nullstellensatz for finite consistent systems of equations. This will generalize {f Theorem G} of . As a result we see that every pair of LATEX  closed elements in an arbitrary variety of LATEX  generate the same quasi-variety and if both of them are LATEX  they are geometrically equivalent.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                           title  \\\n",
       "172804  First order convergence and roots                                                          \n",
       "117253  Uniqueness for contagious McKean--Vlasov systems in the weak feedback\\n  regime            \n",
       "23755   $n$-Kazhdan groups and higher spectral expanders                                           \n",
       "2822    A 1-dimensional formal group over the prismatization of Spf Z_p                            \n",
       "8528    Strong replica symmetry in high-dimensional optimal Bayesian inference                     \n",
       "142660  Infinite dimensional systems of particles with interactions given by\\n  Dunkl operators    \n",
       "106924  The Erd\\H{o}s-Ko-Rado theorem for $2$-intersecting families of perfect\\n  matchings        \n",
       "99296   On the Yau-Tian-Donaldson conjecture for generalized K\\\"ahler-Ricci\\n  soliton equations   \n",
       "18873   Equivariant log-concavity and equivariant K\\\"ahler packages                                \n",
       "70252   Nullstellensatz for relative existentially closed groups                                   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          abstract  \n",
       "172804    Nesetril and Ossona de Mendez introduced the notion of first order convergence, which unifies the notions of convergence for sparse and dense graphs. They asked whether if G_i is a sequence of graphs with M being their first order limit and v is a vertex of M, then there exists a sequence v_i of vertices such that the graphs G_i rooted at v_i converge to M rooted at v. We show that this holds for almost all vertices v of M and we give an example showing that the statement need not hold for all vertices.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              \n",
       "117253    We present a simple uniqueness argument for a collection of McKean-Vlasov problems that have seen recent interest. Our first result shows that, in the weak feedback regime, there is global uniqueness for a very general class of random drivers. By weak feedback we mean the case where the contagion parameters are small enough to prevent blow-ups in solutions. Next, we specialise to a Brownian driver and show how the same techniques can be extended to give short-time uniqueness after blow-ups, regardless of the feedback strength. The heart of our approach is a surprisingly simple probabilistic comparison argument that is robust in the sense that it does not ask for any regularity of the solutions.                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "23755     Let LATEX  be a group of type LATEX  and let LATEX  be the LATEX  skeleton of the universal cover of a LATEX  simplicial complex with finite LATEX  skeleton. We show that if LATEX  is strongly LATEX  then for any family of finite index subgroups LATEX  the family of simplicial complexes LATEX  are bounded degree LATEX  spectral expanders. Using this we construct new examples of LATEX  dimensional spectral expanders.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "2822      Let Sigma denote the prismatization of Spf (Z_p). The multiplicative group over Sigma maps to the prismatization of the multiplicative group over Spf (Z_p). We prove that the kernel of this map is the Cartier dual of some 1-dimensional formal group over Sigma. We obtain some results about this formal group (e.g., we describe its Lie algebra). We give a very explicit description of the pullback of the formal group to the quotient of the q-de Rham prism by the action of the multiplicative group of Z_p.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "8528      We consider generic optimal Bayesian inference, namely, models of signal reconstruction where the posterior distribution and all hyperparameters are known. Under a standard assumption on the concentration of the free energy, we show how replica symmetry in the strong sense of concentration of all multioverlaps can be established as a consequence of the Franz-de Sanctis identities; the identities themselves in the current setting are obtained via a novel perturbation coming from exponentially distributed \"side-observations\" of the signal. Concentration of multioverlaps means that asymptotically the posterior distribution has a particularly simple structure encoded by a random probability measure (or, in the case of binary signal, a non-random probability measure). We believe that such strong control of the model should be key in the study of inference problems with underlying sparse graphical structure (error correcting codes, block models, etc) and, in particular, in the rigorous derivation of replica symmetric formulas for the free energy and mutual information in this context.   \n",
       "142660    Firstly we consider a finite dimensional Markov semigroup generated by Dunkl laplacian with drift terms. Using gradient bounds we show that for small coefficients this semigroup has an invariant measure. We then extend this analysis to an infinite dimensional semigroup on LATEX  which we construct using gradient bounds, and finally we study the existence of invariant measures and ergodicity properties.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "106924    A perfect matching in the complete graph on LATEX  vertices is a set of edges such that no two edges have a vertex in common and every vertex is covered exactly once. Two perfect matchings are said to be LATEX  if they have at least LATEX  edges in common. The main result in this paper is an extension of the famous Erdos-Ko-Rado (EKR) theorem  to 2-intersecting families of perfect matchings for all values of LATEX  Specifically, for LATEX  a set of 2-intersecting perfect matchings in LATEX  of maximum size has LATEX  perfect matchings.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "99296     Let LATEX  be a log variety with an effective holomorphic torus action, and LATEX  be a closed positive LATEX  For any smooth positive function LATEX  defined on the moment polytope of the torus action, we study the Monge-Ampere equations that correspond to generalized and twisted Kahler-Ricci LATEX  We prove a version of Yau-Tian-Donaldson (YTD) conjecture for these general equations, showing that the existence of solutions is always equivalent to an equivariantly uniform LATEX  LATEX  When LATEX  is a current associated to a torus invariant linear system, we further show that equivariant special test configurations suffice for testing the stability. Our results allow arbitrary klt singularities and generalize most of previous results on (uniform) YTD conjecture for (twisted) Kahler-Ricci/Mabuchi solitons or Kahler-Einstein metrics.                                                                                                                                                                                                                                                             \n",
       "18873     We show that the exterior algebra LATEX  which is the cohomology of the torus LATEX  and the polynomial ring LATEX  which is the cohomology of the classifying space LATEX  are LATEX  log-concave. We do so by explicitly giving the LATEX  maps on the appropriate sequences of tensor products of polynomials or exterior powers and proving that these maps satisfy the hard Lefschetz theorem. Furthermore, we prove that the whole Kahler package, including algebraic analogies of the Poincare duality, hard Lefschetz, and Hodge-Riemann bilinear relations, holds on the corresponding sequences in an equivariant setting.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "70252     We prove that in every variety of LATEX  every LATEX  closed element satisfies nullstellensatz for finite consistent systems of equations. This will generalize {f Theorem G} of . As a result we see that every pair of LATEX  closed elements in an arbitrary variety of LATEX  generate the same quasi-variety and if both of them are LATEX  they are geometrically equivalent.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       "
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sample()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>70252</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>title</th>\n",
       "      <td>Nullstellensatz for relative existentially closed groups</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>abstract</th>\n",
       "      <td>We prove that in every variety of $G$-groups, every $G$-existentially closed element satisfies nullstellensatz for finite consistent systems of equations. This will generalize {\\bf Theorem G} of \\cite{BMR1}. As a result we see that every pair of $G$-existentially closed elements in an arbitrary variety of $G$-groups generate the same quasi-variety and if both of them are $q_{\\omega}$-compact, they are geometrically equivalent.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                      70252\n",
       "title     Nullstellensatz for relative existentially closed groups                                                                                                                                                                                                                                                                                                                                                                                         \n",
       "abstract    We prove that in every variety of $G$-groups, every $G$-existentially closed element satisfies nullstellensatz for finite consistent systems of equations. This will generalize {\\bf Theorem G} of \\cite{BMR1}. As a result we see that every pair of $G$-existentially closed elements in an arbitrary variety of $G$-groups generate the same quasi-variety and if both of them are $q_{\\omega}$-compact, they are geometrically equivalent. "
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Paper 70252 seems to be performing strangely\n",
    "\n",
    "original = pd.read_parquet('./data/arXiv.parquet',\n",
    "                           columns=['title','abstract']).iloc[70252]\n",
    "\n",
    "pd.DataFrame(original)\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Maybe we want to remove all text in between curly braces? {}\n",
    "Also add a cleaning function on the very end that gets rid of any leftover punctuation like \n",
    "\n",
    "asdf \\cite{}. asdfa -> asdf . asdfa -> asdf asdfs\n",
    "\n",
    "could this be as simple as replace ' . ' with '. '?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>40538</th>\n",
       "      <td>Online Coloring of Short Intervals</td>\n",
       "      <td>We study the online graph coloring problem restricted to the intersection graphs of intervals with lengths in LATEX  For LATEX  it is the class of unit interval graphs, and for LATEX  the class of all interval graphs. Our focus is on intermediary classes.   We present a LATEX  algorithm, which beats the state of the art for LATEX  and proves that the problem we study can be strictly easier than online coloring of general interval graphs.   On the lower bound side, we prove that no algorithm is better than LATEX  for any LATEX  nor better than LATEX  for any LATEX  and that no algorithm beats the LATEX  asymptotic competitive ratio for all, arbitrarily large, values of LATEX  That last result shows that the problem we study can be strictly harder than unit interval coloring. Our main technical contribution is a recursive composition of strategies, which seems essential to prove any lower bound higher than LATEX</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47680</th>\n",
       "      <td>How Data Augmentation affects Optimization for Linear Regression</td>\n",
       "      <td>Though data augmentation has rapidly emerged as a key tool for optimization in modern machine learning, a clear picture of how augmentation schedules affect optimization and interact with optimization hyperparameters such as learning rate is nascent. In the spirit of classical convex optimization and recent work on implicit bias, the present work analyzes the effect of augmentation on optimization in the simple convex setting of linear regression with MSE loss.   We find joint schedules for learning rate and data augmentation scheme under which augmented gradient descent provably converges and characterize the resulting minimum. Our results apply to arbitrary augmentation schemes, revealing complex interactions between learning rates and augmentations even in the convex setting. Our approach interprets augmented (S)GD as a stochastic optimization method for a time-varying sequence of proxy losses. This gives a unified way to analyze learning rate, batch size, and augmentations ranging from additive noise to random projections. From this perspective, our results, which also give rates of convergence, can be viewed as Monro-Robbins type conditions for augmented (S)GD.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81002</th>\n",
       "      <td>Rationally weighted Hurwitz numbers, Meijer $G$-functions and matrix\\n  integrals</td>\n",
       "      <td>The quantum spectral curve equation associated to KP LATEX  of hypergeometric type serving as generating functions for rationally weighted Hurwitz numbers is solved by generalized hypergeometric series. The basis elements spanning the corresponding Sato Grassmannian element are shown to be Meijer LATEX  or their asymptotic series. Using their Mellin integral representation the LATEX  evaluated at the trace invariants of an externally coupled matrix, is expressed as a matrix integral.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>137982</th>\n",
       "      <td>Global well-posedness of the energy critical Nonlinear Schr\\\"odinger\\n  equation with small initial data in H^1(T^3)</td>\n",
       "      <td>A refined trilinear Strichartz estimate for solutions to the Schrodinger equation on the flat rational torus T^3 is derived. By a suitable modification of critical function space theory this is applied to prove a small data global well-posedness result for the quintic Nonlinear Schrodinger Equation in H^s(T^3) for all s \\geq 1. This is the first energy-critical global well-posedness result in the setting of compact manifolds.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40719</th>\n",
       "      <td>A complete answer to the Gaveau--Brockett problem</td>\n",
       "      <td>The note is dedicated to provide a satisfying and complete answer to the long-standing Gaveau--Brockett open problem. More precisely, we determine the exact formula of the Carnot--Caratheodory distance on arbitrary step-two groups. The basic idea of the proof is combining Varadhan's formulas with the explicit expression for the associated heat kernel LATEX  and the method of stationary phase. However, we have to introduce a number of original new methods, especially the usage of the concept of \"Operator convexity\". Next, new integral expressions for LATEX  by means of properties of Bessel functions will be presented. An unexpected direct proof for the well-known positivity of LATEX  via its original integral formula, will play an important role. Furthermore, all normal geodesics joining the identity element LATEX  to any given LATEX  as well as the cut locus can be characterized on every step-two groups. Finally, the corresponding results in Riemannian geometry on step-two groups will be briefly presented as well.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77056</th>\n",
       "      <td>An analog of perfect numbers involving the unitary totient function</td>\n",
       "      <td>We shall give some results for an integer divisible by its unitary totient.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1355</th>\n",
       "      <td>$\\mathbb{Z}_p\\mathbb{Z}_{p^2}\\dots\\mathbb{Z}_{p^s}$-Additive Generalized\\n  Hadamard Codes</td>\n",
       "      <td>The LATEX  codes are subgroups of LATEX  and can be seen as linear codes over LATEX  when LATEX  for all LATEX  a LATEX  code when LATEX  for all LATEX  , or a LATEX  code when LATEX  or LATEX  codes when LATEX  and LATEX  A LATEX  generalized Hadamard (GH) code is a GH code over LATEX  which is the Gray map image of a LATEX  code. In this paper, we generalize some known results for LATEX  GH codes with LATEX  prime and LATEX  First, we give a recursive construction of LATEX  GH codes of type LATEX  with LATEX  and LATEX  Then, we show for which types the corresponding LATEX  GH codes are nonlinear over LATEX  We also compute the kernel and its dimension whenever they are nonlinear.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>146820</th>\n",
       "      <td>Momentum and Position Representations for the q-deformed Euclidean\\n  Quantum Space</td>\n",
       "      <td>We summarize some basics about mathematical tools of analysis for the q-deformed Euclidean space. We use the new tools to examine q-deformed eigenfunctions of the momentum or position operator within the framework of the star product formalism. We show that these two systems of functions are complete and orthonormal. With the q-deformed momentum or position eigenfunctions, we calculate matrix elements of the momentum or position operator. Considerations about expectation values and probability densities conclude the studies.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17050</th>\n",
       "      <td>Gacs-Korner Common Information Variational Autoencoder</td>\n",
       "      <td>We propose a notion of common information that allows one to quantify and separate the information that is shared between two random variables from the information that is unique to each. Our notion of common information is a variational relaxation of the Gacs-Korner common information, which we recover as a special case, but is more amenable to optimization and can be approximated empirically using samples from the underlying distribution. We then provide a method to partition and quantify the common and unique information using a simple modification of a traditional variational auto-encoder. Empirically, we demonstrate that our formulation allows us to learn semantically meaningful common and unique factors of variation even on high-dimensional data such as images and videos. Moreover, on datasets where ground-truth latent factors are known, we show that we can accurately quantify the common information between the random variables. Additionally, we show that the auto-encoder that we learn recovers semantically meaningful disentangled factors of variation, even though we do not explicitly optimize for it.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13817</th>\n",
       "      <td>Filtrations and torsion pairs in Abramovich Polishchuk's heart</td>\n",
       "      <td>We study some abelian subcategories and torsion pairs in Abramovich Polishchuk's heart. And we construct stability conditions on a full triangulated subcategory LATEX  in LATEX  for an arbitrary smooth projective variety S. We also define a notion of LATEX  level stability, which is a generalization of the slope stability and the Gieseker stability. We show that for any object E in Abramovich Polishchuk's heart, there is a unique filtration whose factors are LATEX  level semistable, and the phase vectors are decreasing in a lexicographic order.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                       title  \\\n",
       "40538   Online Coloring of Short Intervals                                                                                     \n",
       "47680   How Data Augmentation affects Optimization for Linear Regression                                                       \n",
       "81002   Rationally weighted Hurwitz numbers, Meijer $G$-functions and matrix\\n  integrals                                      \n",
       "137982  Global well-posedness of the energy critical Nonlinear Schr\\\"odinger\\n  equation with small initial data in H^1(T^3)   \n",
       "40719   A complete answer to the Gaveau--Brockett problem                                                                      \n",
       "77056   An analog of perfect numbers involving the unitary totient function                                                    \n",
       "1355    $\\mathbb{Z}_p\\mathbb{Z}_{p^2}\\dots\\mathbb{Z}_{p^s}$-Additive Generalized\\n  Hadamard Codes                             \n",
       "146820  Momentum and Position Representations for the q-deformed Euclidean\\n  Quantum Space                                    \n",
       "17050   Gacs-Korner Common Information Variational Autoencoder                                                                 \n",
       "13817   Filtrations and torsion pairs in Abramovich Polishchuk's heart                                                         \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   abstract  \n",
       "40538     We study the online graph coloring problem restricted to the intersection graphs of intervals with lengths in LATEX  For LATEX  it is the class of unit interval graphs, and for LATEX  the class of all interval graphs. Our focus is on intermediary classes.   We present a LATEX  algorithm, which beats the state of the art for LATEX  and proves that the problem we study can be strictly easier than online coloring of general interval graphs.   On the lower bound side, we prove that no algorithm is better than LATEX  for any LATEX  nor better than LATEX  for any LATEX  and that no algorithm beats the LATEX  asymptotic competitive ratio for all, arbitrarily large, values of LATEX  That last result shows that the problem we study can be strictly harder than unit interval coloring. Our main technical contribution is a recursive composition of strategies, which seems essential to prove any lower bound higher than LATEX                                                                                                                                                                                                                                                                        \n",
       "47680     Though data augmentation has rapidly emerged as a key tool for optimization in modern machine learning, a clear picture of how augmentation schedules affect optimization and interact with optimization hyperparameters such as learning rate is nascent. In the spirit of classical convex optimization and recent work on implicit bias, the present work analyzes the effect of augmentation on optimization in the simple convex setting of linear regression with MSE loss.   We find joint schedules for learning rate and data augmentation scheme under which augmented gradient descent provably converges and characterize the resulting minimum. Our results apply to arbitrary augmentation schemes, revealing complex interactions between learning rates and augmentations even in the convex setting. Our approach interprets augmented (S)GD as a stochastic optimization method for a time-varying sequence of proxy losses. This gives a unified way to analyze learning rate, batch size, and augmentations ranging from additive noise to random projections. From this perspective, our results, which also give rates of convergence, can be viewed as Monro-Robbins type conditions for augmented (S)GD.   \n",
       "81002     The quantum spectral curve equation associated to KP LATEX  of hypergeometric type serving as generating functions for rationally weighted Hurwitz numbers is solved by generalized hypergeometric series. The basis elements spanning the corresponding Sato Grassmannian element are shown to be Meijer LATEX  or their asymptotic series. Using their Mellin integral representation the LATEX  evaluated at the trace invariants of an externally coupled matrix, is expressed as a matrix integral.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "137982    A refined trilinear Strichartz estimate for solutions to the Schrodinger equation on the flat rational torus T^3 is derived. By a suitable modification of critical function space theory this is applied to prove a small data global well-posedness result for the quintic Nonlinear Schrodinger Equation in H^s(T^3) for all s \\geq 1. This is the first energy-critical global well-posedness result in the setting of compact manifolds.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "40719     The note is dedicated to provide a satisfying and complete answer to the long-standing Gaveau--Brockett open problem. More precisely, we determine the exact formula of the Carnot--Caratheodory distance on arbitrary step-two groups. The basic idea of the proof is combining Varadhan's formulas with the explicit expression for the associated heat kernel LATEX  and the method of stationary phase. However, we have to introduce a number of original new methods, especially the usage of the concept of \"Operator convexity\". Next, new integral expressions for LATEX  by means of properties of Bessel functions will be presented. An unexpected direct proof for the well-known positivity of LATEX  via its original integral formula, will play an important role. Furthermore, all normal geodesics joining the identity element LATEX  to any given LATEX  as well as the cut locus can be characterized on every step-two groups. Finally, the corresponding results in Riemannian geometry on step-two groups will be briefly presented as well.                                                                                                                                                              \n",
       "77056     We shall give some results for an integer divisible by its unitary totient.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        \n",
       "1355      The LATEX  codes are subgroups of LATEX  and can be seen as linear codes over LATEX  when LATEX  for all LATEX  a LATEX  code when LATEX  for all LATEX  , or a LATEX  code when LATEX  or LATEX  codes when LATEX  and LATEX  A LATEX  generalized Hadamard (GH) code is a GH code over LATEX  which is the Gray map image of a LATEX  code. In this paper, we generalize some known results for LATEX  GH codes with LATEX  prime and LATEX  First, we give a recursive construction of LATEX  GH codes of type LATEX  with LATEX  and LATEX  Then, we show for which types the corresponding LATEX  GH codes are nonlinear over LATEX  We also compute the kernel and its dimension whenever they are nonlinear.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                \n",
       "146820    We summarize some basics about mathematical tools of analysis for the q-deformed Euclidean space. We use the new tools to examine q-deformed eigenfunctions of the momentum or position operator within the framework of the star product formalism. We show that these two systems of functions are complete and orthonormal. With the q-deformed momentum or position eigenfunctions, we calculate matrix elements of the momentum or position operator. Considerations about expectation values and probability densities conclude the studies.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "17050     We propose a notion of common information that allows one to quantify and separate the information that is shared between two random variables from the information that is unique to each. Our notion of common information is a variational relaxation of the Gacs-Korner common information, which we recover as a special case, but is more amenable to optimization and can be approximated empirically using samples from the underlying distribution. We then provide a method to partition and quantify the common and unique information using a simple modification of a traditional variational auto-encoder. Empirically, we demonstrate that our formulation allows us to learn semantically meaningful common and unique factors of variation even on high-dimensional data such as images and videos. Moreover, on datasets where ground-truth latent factors are known, we show that we can accurately quantify the common information between the random variables. Additionally, we show that the auto-encoder that we learn recovers semantically meaningful disentangled factors of variation, even though we do not explicitly optimize for it.                                                               \n",
       "13817     We study some abelian subcategories and torsion pairs in Abramovich Polishchuk's heart. And we construct stability conditions on a full triangulated subcategory LATEX  in LATEX  for an arbitrary smooth projective variety S. We also define a notion of LATEX  level stability, which is a generalization of the slope stability and the Gieseker stability. We show that for any object E in Abramovich Polishchuk's heart, there is a unique filtration whose factors are LATEX  level semistable, and the phase vectors are decreasing in a lexicographic order.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             "
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sample()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['aardvark']\n",
      "['sandwich']\n",
      "[]\n",
      "['haaagra']\n"
     ]
    }
   ],
   "source": [
    "import regex\n",
    "## Experiment: can a regex find every hyphenated 'word', i.e. a run of alphanumeric characters\n",
    "## containing at least one hyphen '-'? The hyphen must be interior: not at the start or end of the\n",
    "## word.\n",
    "\n",
    "## Multi-hyphen names (several surnames joined by hyphens) should match as a single word, and\n",
    "## a match at the very beginning of the string should be found as well.\n",
    "\n",
    "\n",
    "## Step-by-step investigation: how do we match a word containing at least one instance of a given character?\n",
    "\n",
    "## First, suppose we want to match any lowercase word containing at least one 'a'.\n",
    "## What does [a-z]*a+[a-z]* match?\n",
    "\n",
    "pattern = r'[a-z]*a+[a-z]*'\n",
    "t = 'aardvark'\n",
    "r = 'sandwich'\n",
    "s = 'mount'\n",
    "u = 'haaagra'\n",
    "print(regex.findall(pattern,t))\n",
    "print(regex.findall(pattern,r))\n",
    "print(regex.findall(pattern,s))\n",
    "print(regex.findall(pattern,u))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Naber-Cheeger']\n",
      "['Minta-Geis', '-Rabinowitz']\n",
      "[]\n",
      "['Dog--shit', '-']\n",
      "['-yes']\n"
     ]
    }
   ],
   "source": [
    "## What happens if we replace a with -?\n",
    "\n",
    "pattern = r'[A-Za-z]*-+[A-Za-z]*'\n",
    "s = 'Naber-Cheeger'\n",
    "t = 'Minta-Geis-Rabinowitz'\n",
    "u = 'Dogshit fuck'\n",
    "v = 'Dog--shit-'\n",
    "w = '-yes'\n",
    "print(regex.findall(pattern,s))\n",
    "print(regex.findall(pattern,t))\n",
    "print(regex.findall(pattern,u))\n",
    "print(regex.findall(pattern,v))\n",
    "print(regex.findall(pattern,w))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Naber-Cheeger']\n",
      "['Minta-Geis-Rabinowitz']\n",
      "[]\n",
      "['Dog--shit-']\n",
      "['-yes']\n"
     ]
    }
   ],
   "source": [
    "## Problems with the previous pattern:\n",
    "## 1. It cannot grab the entire hyphenated string in t: the second [A-Za-z]* greedily matches\n",
    "## 'Geis' and stops at the next '-', which it cannot match. The search then restarts from that '-'.\n",
    "## 2. The same issue occurs in v.\n",
    "## 3. w shows that the first [A-Za-z]* can indeed match zero characters, in which case the match\n",
    "## starts with a '-'.\n",
    "\n",
    "## Now try including '-' in the character classes.\n",
    "pattern = r'[A-Za-z\\-]*-+[A-Za-z\\-]*'\n",
    "s = 'Naber-Cheeger'\n",
    "t = 'Minta-Geis-Rabinowitz'\n",
    "u = 'Dogshit fuck'\n",
    "v = 'Dog--shit-'\n",
    "w = '-yes'\n",
    "print(regex.findall(pattern,s))\n",
    "print(regex.findall(pattern,t))\n",
    "print(regex.findall(pattern,u))\n",
    "print(regex.findall(pattern,v))\n",
    "print(regex.findall(pattern,w))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('-Geis', '')]\n",
      "[('', 'Minta-')]\n",
      "[('', 'Minta-')]\n",
      "[('', 'Minta-'), ('', 'Geis-')]\n",
      "[('-yes', '')]\n"
     ]
    }
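   ],
   "source": [
    "## This matches too much: doubled hyphens ('--') and leading or trailing hyphens are accepted.\n",
    "\n",
    "## Because '-' is in both character classes, they can absorb any run of letters and hyphens; the\n",
    "## explicit -+ only guarantees that at least one hyphen appears somewhere. The pattern as written\n",
    "## therefore puts no constraint on where the hyphens sit.\n",
    "\n",
    "## Instead, build the pattern from pieces of the form -TEXT or TEXT- and chain several of them in a row.\n",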
    "\n",
    "pattern = r'(-[A-Za-z]+)|([A-Za-z]+-)'\n",
    "s = '-Geis'\n",
    "t = 'Minta-'\n",
    "u = 'Minta-Geis'\n",
    "v = 'Minta-Geis-Rabinowitz'\n",
    "\n",
    "print(regex.findall(pattern,s))\n",
    "print(regex.findall(pattern,t))\n",
    "print(regex.findall(pattern,u))\n",
    "print(regex.findall(pattern,v))\n",
    "print(regex.findall(pattern,w))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('-Geis', '')]\n",
      "[('', 'Minta')]\n",
      "[('', 'Minta'), ('-Geis', '')]\n",
      "[('', 'Minta'), ('-Geis', ''), ('-Rabinowitz', '')]\n",
      "[('-yes', '')]\n"
     ]
    }
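   ],
   "source": [
    "## Why do the results contain empty strings? The pattern has two capturing groups, so findall returns\n",
    "## one tuple per match, with '' for the group whose alternative did not take part in the match.\n",
    "## u does not split into 'Minta-' and '-Geis' because the hyphen is already consumed by the first match.\n",
    "\n",
    "## Use a lookahead so the hyphen is checked but not consumed.\n",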
    "\n",
    "pattern = r'(-[A-Za-z]+)|([A-Za-z]+(?=-))'\n",
    "s = '-Geis'\n",
    "t = 'Minta-'\n",
    "u = 'Minta-Geis'\n",
    "v = 'Minta-Geis-Rabinowitz'\n",
    "\n",
    "print(regex.findall(pattern,s))\n",
    "print(regex.findall(pattern,t))\n",
    "print(regex.findall(pattern,u))\n",
    "print(regex.findall(pattern,v))\n",
    "print(regex.findall(pattern,w))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[]\n",
      "[]\n",
      "[('-Geis-',)]\n",
      "[('-Geis-',)]\n"
     ]
    }
   ],
   "source": [
    "## Put this together\n",
    "\n",
    "pattern = r'Minta(-[A-Za-z]+-)+'\n",
    "\n",
    "s = '-Geis'\n",
    "t = 'Minta-'\n",
    "u = 'Minta-Geis-'\n",
    "v = 'Minta-Geis-Rabinowitz-'\n",
    "\n",
    "print(regex.findall(pattern,s))\n",
    "print(regex.findall(pattern,t))\n",
    "print(regex.findall(pattern,u))\n",
    "print(regex.findall(pattern,v))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['-Geis-']\n"
     ]
    }
   ],
   "source": [
    "## What is going on? Check the -TEXT- building block on its own.\n",
    "\n",
    "pattern = r'-\\w+-'\n",
    "t = '-Geis-'\n",
    "print(regex.findall(pattern,t))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Naber-Cheeger']\n",
      "['Minta-Geis-Rabin']\n",
      "['Donaldson-Tian']\n",
      "['McCleerey-Chinese-guy', 'Demailley-Yau']\n"
     ]
    }
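   ],
   "source": [
    "## Final pattern: a run of word characters followed by one or more '-TEXT' pieces.\n",
    "## (?<!-)\\b and (?!-)\\b reject a hyphen immediately before or after the match. The (?=-) and (?=\\w)\n",
    "## lookaheads are redundant with the tokens that follow them, so they only document intent.\n",
    "## Note that \\w also matches digits and underscores, not just letters.\n",
    "pattern = r'(?<!-)\\b(?:\\w+)(?=-)(?:-(?=\\w)\\w+)+(?!-)\\b'\n",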
    "s = 'Naber-Cheeger'\n",
    "t = 'Minta-Geis-Rabin'\n",
    "u = 'My fucking dog can prove the Donaldson-Tian conjecture you doughnut!'\n",
    "v = 'Fuckface McCleerey-Chinese-guy did it again. They fucking proved the Demailley-Yau!'\n",
    "\n",
    "print(regex.findall(pattern,s))\n",
    "print(regex.findall(pattern,t))\n",
    "print(regex.findall(pattern,u))\n",
    "print(regex.findall(pattern,v))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "      <th>cat</th>\n",
       "      <th>authors_parsed</th>\n",
       "      <th>update_date</th>\n",
       "      <th>id</th>\n",
       "      <th>clean_abstract</th>\n",
       "      <th>keywords</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>125174</th>\n",
       "      <td>Decentralized Charging Control of Electric Vehicles in Residential\\n  Distribution Networks</td>\n",
       "      <td>Electric vehicle (EV) charging can negatively impact electric distribution networks by exceeding equipment thermal ratings and causing voltages to drop below standard ranges. In this paper, we develop a decentralized EV charging control scheme to achieve \"valley-filling\" (i.e., flattening demand profile during overnight charging), meanwhile meeting heterogeneous individual charging requirements and satisfying distribution network constraints. The formulated problem is an optimization problem with a non-separable objective function and strongly coupled inequality constraints. We propose a novel shrunken primal-dual subgradient (SPDS) algorithm to support the decentralized control scheme, derive conditions guaranteeing its convergence, and verify its efficacy and convergence with a representative distribution network model.</td>\n",
       "      <td>[math.OC]</td>\n",
       "      <td>[['Liu', 'Mingxi', ''], ['Phanivong', 'Phillippe K.', ''], ['Shi', 'Yang', ''], ['Callaway', 'Duncan S.', '']]</td>\n",
       "      <td>2020-04-02</td>\n",
       "      <td>1710.05533</td>\n",
       "      <td>Electric vehicle (EV) charging can negatively impact electric distribution networks by exceeding equipment thermal ratings and causing voltages to drop below standard ranges. In this paper, we develop a decentralized EV charging control scheme to achieve \"valley-filling\" (i.e., flattening demand profile during overnight charging), meanwhile meeting heterogeneous individual charging requirements and satisfying distribution network constraints. The formulated problem is an optimization problem with a non-separable objective function and strongly coupled inequality constraints. We propose a novel shrunken primal-dual subgradient (SPDS) algorithm to support the decentralized control scheme, derive conditions guaranteeing its convergence, and verify its efficacy and convergence with a representative distribution network model.</td>\n",
       "      <td>[non-separable, primal-dual, valley-filling]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>91765</th>\n",
       "      <td>Geodesic orbit metrics on homogeneous spaces constructed by strongly\\n  isotropy irreducible spaces</td>\n",
       "      <td>In this paper, we focus on homogeneous spaces which are constructed from two strongly isotropy irreducible spaces, and prove that any geodesic orbit metric on these spaces is naturally reductive.</td>\n",
       "      <td>[math.DG]</td>\n",
       "      <td>[['Chen', 'Huibin', ''], ['Chen', 'Zhiqi', ''], ['Zhu', 'Fuhai', '']]</td>\n",
       "      <td>2020-12-15</td>\n",
       "      <td>2012.07015</td>\n",
       "      <td>In this paper, we focus on homogeneous spaces which are constructed from two strongly isotropy irreducible spaces, and prove that any geodesic orbit metric on these spaces is naturally reductive.</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>154012</th>\n",
       "      <td>ISS Property with Respect to Boundary Disturbances for a Class of\\n  Riesz-Spectral Boundary Control Systems</td>\n",
       "      <td>This paper deals with the establishment of Input-to-State Stability (ISS) estimates for infinite dimensional systems with respect to both boundary and distributed disturbances. First, a new approach is developed for the establishment of ISS estimates for a class of Riesz-spectral boundary control systems satisfying certain eigenvalue constraints. Second, a concept of weak solutions is introduced in order to relax the disturbances regularity assumptions required to ensure the existence of classical solutions. The proposed concept of weak solutions, that applies to a large class of boundary control systems which is not limited to the Riesz-spectral ones, provides a natural extension of the concept of both classical and mild solutions. Assuming that an ISS estimate holds true for classical solutions, we show the existence, the uniqueness, and the ISS property of the weak solutions.</td>\n",
       "      <td>[math.OC, cs.SY]</td>\n",
       "      <td>[['Lhachemi', 'Hugo', ''], ['Shorten', 'Robert', '']]</td>\n",
       "      <td>2019-08-07</td>\n",
       "      <td>1810.03553</td>\n",
       "      <td>This paper deals with the establishment of Input-to-State Stability (ISS) estimates for infinite dimensional systems with respect to both boundary and distributed disturbances. First, a new approach is developed for the establishment of ISS estimates for a class of Riesz-spectral boundary control systems satisfying certain eigenvalue constraints. Second, a concept of weak solutions is introduced in order to relax the disturbances regularity assumptions required to ensure the existence of classical solutions. The proposed concept of weak solutions, that applies to a large class of boundary control systems which is not limited to the Riesz-spectral ones, provides a natural extension of the concept of both classical and mild solutions. Assuming that an ISS estimate holds true for classical solutions, we show the existence, the uniqueness, and the ISS property of the weak solutions.</td>\n",
       "      <td>[Input-to-State, Riesz-spectral]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11152</th>\n",
       "      <td>A new class of higher-ordered/extended Boussinesq system for efficient\\n  numerical simulations by splitting operators</td>\n",
       "      <td>In this work, we numerically study the higher-ordered/extended Boussinesq system describing the propagation of water-waves over flat topography. A reformulation of the same order of precision that avoids the calculation of high order derivatives on the surface deformation is proposed. We show that this formulation enjoys an extended range of applicability while remaining stable. Moreover, a significant improvement in terms of linear dispersive properties in high frequency regime is made due to the suitable adjustment of a dispersion correction parameter. We develop a second order splitting scheme where the hyperbolic part of the system is treated with a high-order finite volume scheme and the dispersive part is treated with a finite difference approach. Numerical simulations are then performed under two main goals: validating the model and the numerical methods and assessing the potential need of such higher-order model. \\red{The applicability of the proposed model and numerical method in practical problems is illustrated by a comparison with experimental data.}</td>\n",
       "      <td>[math.AP]</td>\n",
       "      <td>[['Lteif', 'Ralph', '', 'LAMA'], ['Gerbi', 'Stéphane', '', 'LAMA']]</td>\n",
       "      <td>2022-07-04</td>\n",
       "      <td>2102.09849</td>\n",
       "      <td>In this work, we numerically study the higher-ordered/extended Boussinesq system describing the propagation of water-waves over flat topography. A reformulation of the same order of precision that avoids the calculation of high order derivatives on the surface deformation is proposed. We show that this formulation enjoys an extended range of applicability while remaining stable. Moreover, a significant improvement in terms of linear dispersive properties in high frequency regime is made due to the suitable adjustment of a dispersion correction parameter. We develop a second order splitting scheme where the hyperbolic part of the system is treated with a high-order finite volume scheme and the dispersive part is treated with a finite difference approach. Numerical simulations are then performed under two main goals: validating the model and the numerical methods and assessing the potential need of such higher-order model.</td>\n",
       "      <td>[higher-order, high-order, water-waves, higher-ordered]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67426</th>\n",
       "      <td>Localized Reduced Basis Additive Schwarz Methods</td>\n",
       "      <td>Reduced basis methods build low-rank approximation spaces for the solution sets of parameterized PDEs by computing solutions of the given PDE for appropriately selected snapshot parameters. Localized reduced basis methods reduce the offline cost of computing these snapshot solutions by instead constructing a global space from spatially localized less expensive problems. In the case of online enrichment, these local problems are iteratively solved in regions of high residual and correspond to subdomain solves in domain decomposition methods. We show in this note that indeed there is a close relationship between online-enriched localized reduced basis and domain decomposition methods by introducing a Localized Reduced Basis Additive Schwarz method (LRBAS), which can be interpreted as a locally adaptive multi-preconditioning scheme for the CG method.</td>\n",
       "      <td>[math.NA, cs.NA]</td>\n",
       "      <td>[['Gander', 'Martin J.', ''], ['Rave', 'Stephan', '']]</td>\n",
       "      <td>2021-06-09</td>\n",
       "      <td>2103.10884</td>\n",
       "      <td>Reduced basis methods build low-rank approximation spaces for the solution sets of parameterized PDEs by computing solutions of the given PDE for appropriately selected snapshot parameters. Localized reduced basis methods reduce the offline cost of computing these snapshot solutions by instead constructing a global space from spatially localized less expensive problems. In the case of online enrichment, these local problems are iteratively solved in regions of high residual and correspond to subdomain solves in domain decomposition methods. We show in this note that indeed there is a close relationship between online-enriched localized reduced basis and domain decomposition methods by introducing a Localized Reduced Basis Additive Schwarz method (LRBAS), which can be interpreted as a locally adaptive multi-preconditioning scheme for the CG method.</td>\n",
       "      <td>[low-rank, multi-preconditioning, online-enriched]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>113435</th>\n",
       "      <td>Some connections of complex dynamics</td>\n",
       "      <td>We survey some of the connections linking complex dynamics to other fields of mathematics and science. We hope to show that complex dynamics is not just interesting on its own but also has value as an applicable theory.</td>\n",
       "      <td>[math.DS, math.CV]</td>\n",
       "      <td>[['DeZotti', 'Alexandre', '']]</td>\n",
       "      <td>2020-07-01</td>\n",
       "      <td>2006.16386</td>\n",
       "      <td>We survey some of the connections linking complex dynamics to other fields of mathematics and science. We hope to show that complex dynamics is not just interesting on its own but also has value as an applicable theory.</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101715</th>\n",
       "      <td>$L^2$ decay for the linearized Landau equation with the specular\\n  boundary condition</td>\n",
       "      <td>In this paper, we develop an alternative approach to establish the $L^2$ decay estimate for the linearized Landau equation in a bounded domain with specular boundary condition. The proof is based on the methodology of proof by contradiction motivated by [Guo, Comm. Pure Appl. Math., 55(9):1104-1135, 2002] and [Guo, Arch. Ration. Mech. Anal., 197(3):713-809, 2010].</td>\n",
       "      <td>[math.AP]</td>\n",
       "      <td>[['Guo', 'Yan', ''], ['Hwang', 'Hyung Ju', ''], ['Jang', 'Jin Woo', ''], ['Ouyang', 'Zhimeng', '']]</td>\n",
       "      <td>2020-09-30</td>\n",
       "      <td>2009.01391</td>\n",
       "      <td>In this paper, we develop an alternative approach to establish the LATEX  decay estimate for the linearized Landau equation in a bounded domain with specular boundary condition. The proof is based on the methodology of proof by contradiction motivated by [Guo, Comm. Pure Appl. Math., 55(9):1104-1135, 2002] and [Guo, Arch. Ration. Mech. Anal., 197(3):713-809, 2010].</td>\n",
       "      <td>[713-809, 1104-1135]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21170</th>\n",
       "      <td>Information in probability: Another information-theoretic proof of a\\n  finite de Finetti theorem</td>\n",
       "      <td>We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'</td>\n",
       "      <td>[math.PR, cs.IT, math.IT]</td>\n",
       "      <td>[['Gavalakis', 'Lampros', ''], ['Kontoyiannis', 'Ioannis', '']]</td>\n",
       "      <td>2022-04-28</td>\n",
       "      <td>2204.05033</td>\n",
       "      <td>We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first LATEX  in a sequence of LATEX  exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'</td>\n",
       "      <td>[information-theoretic, finite-valued]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70247</th>\n",
       "      <td>Stated skein algebras and their representations</td>\n",
       "      <td>This is a survey on stated skein algebras and their representations.</td>\n",
       "      <td>[math.GT, math.QA]</td>\n",
       "      <td>[['Korinman', 'Julien', '']]</td>\n",
       "      <td>2021-05-21</td>\n",
       "      <td>2105.09563</td>\n",
       "      <td>This is a survey on stated skein algebras and their representations.</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>157070</th>\n",
       "      <td>On mixture representations for the generalized Linnik distribution and\\n  their applications in limit theorems</td>\n",
       "      <td>We present new mixture representations for the generalized Linnik distribution in terms of normal, Laplace, exponential and stable laws and establish the relationship between the mixing distributions in these representations. Based on these representations, we prove some limit theorems for a wide class of rather simple statistics constructed from samples with random sized including, e. g., random sums of independent random variables with finite variances and maximum random sums, in which the generalized Linnik distribution plays the role of the limit law. Thus we demonstrate that the scheme of geometric (or, in general, negative binomial) summation is far not the only asymptotic setting (even for sums of independent random variables) in which the generalized Linnik law appears as the limit distribution.</td>\n",
       "      <td>[math.PR]</td>\n",
       "      <td>[['Korolev', 'V. Yu.', ''], ['Gorshenin', 'A. K.', ''], ['Zeifman', 'A. I.', '']]</td>\n",
       "      <td>2019-07-10</td>\n",
       "      <td>1810.06389</td>\n",
       "      <td>We present new mixture representations for the generalized Linnik distribution in terms of normal, Laplace, exponential and stable laws and establish the relationship between the mixing distributions in these representations. Based on these representations, we prove some limit theorems for a wide class of rather simple statistics constructed from samples with random sized including, e. g., random sums of independent random variables with finite variances and maximum random sums, in which the generalized Linnik distribution plays the role of the limit law. Thus we demonstrate that the scheme of geometric (or, in general, negative binomial) summation is far not the only asymptotic setting (even for sums of independent random variables) in which the generalized Linnik law appears as the limit distribution.</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25237</th>\n",
       "      <td>Shifted Witten classes and topological recursion</td>\n",
       "      <td>The Witten $r$-spin class defines a non-semisimple cohomological field theory. Pandharipande, Pixton and Zvonkine studied two special shifts of the Witten class along two semisimple directions of the associated Dubrovin--Frobenius manifold using the Givental--Teleman reconstruction theorem. We show that the $R$-matrix and the translation of these two specific shifts can be constructed from the solutions of two differential equations that generalise the classical Airy differential equation. Using this, we prove that the descendant intersection theory of the shifted Witten classes satisfies topological recursion on two $1$-parameter families of spectral curves. By taking the limit as the parameter goes to zero for these families of spectral curves, we prove that the descendant intersection theory of the Witten $r$-spin class can be computed by topological recursion on the $r$-Airy spectral curve. We finally show that this proof suffices to deduce Witten's $r$-spin conjecture, already proved by Faber, Shadrin and Zvonkine, which claims that the generating series of $r$-spin intersection numbers is the tau function of the $r$-KdV hierarchy that satisfies the string equation.</td>\n",
       "      <td>[math.AG, math-ph, math.CA, math.MP]</td>\n",
       "      <td>[['Charbonnier', 'Séverin', ''], ['Chidambaram', 'Nitin Kumar', ''], ['Garcia-Failde', 'Elba', ''], ['Giacchetto', 'Alessandro', '']]</td>\n",
       "      <td>2022-03-31</td>\n",
       "      <td>2203.16523</td>\n",
       "      <td>The Witten LATEX  class defines a non-semisimple cohomological field theory. Pandharipande, Pixton and Zvonkine studied two special shifts of the Witten class along two semisimple directions of the associated Dubrovin--Frobenius manifold using the Givental--Teleman reconstruction theorem. We show that the LATEX  and the translation of these two specific shifts can be constructed from the solutions of two differential equations that generalise the classical Airy differential equation. Using this, we prove that the descendant intersection theory of the shifted Witten classes satisfies topological recursion on two LATEX  families of spectral curves. By taking the limit as the parameter goes to zero for these families of spectral curves, we prove that the descendant intersection theory of the Witten LATEX  class can be computed by topological recursion on the LATEX  spectral curve. We finally show that this proof suffices to deduce Witten's LATEX  conjecture, already proved by Faber, Shadrin and Zvonkine, which claims that the generating series of LATEX  intersection numbers is the tau function of the LATEX  hierarchy that satisfies the string equation.</td>\n",
       "      <td>[non-semisimple]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36435</th>\n",
       "      <td>Weighted $L^2$-contractivity of Langevin dynamics with singular\\n  potentials</td>\n",
       "      <td>Convergence to equilibrium of underdamped Langevin dynamics is studied under general assumptions on the potential $U$ allowing for singularities. By modifying the direct approach to convergence in $L^2$ pioneered by F. H\\'erau and developped by Dolbeault, Mouhot and Schmeiser, we show that the dynamics converges exponentially fast to equilibrium in the topologies $L^2(d\\mu)$ and $L^2(W^* d\\mu)$, where $\\mu$ denotes the invariant probability measure and $W^*$ is a suitable Lyapunov weight. In both norms, we make precise how the exponential convergence rate depends on the friction parameter $\\gamma$ in Langevin dynamics, by providing a lower bound scaling as $\\min(\\gamma, \\gamma^{-1})$. The results hold for usual polynomial-type potentials as well as potentials with singularities such as those arising from pairwise Lennard-Jones interactions between particles.</td>\n",
       "      <td>[math.PR, math-ph, math.AP, math.MP]</td>\n",
       "      <td>[['Camrud', 'Evan', ''], ['Herzog', 'David P.', ''], ['Stoltz', 'Gabriel', ''], ['Gordina', 'Maria', '']]</td>\n",
       "      <td>2022-01-19</td>\n",
       "      <td>2104.10574</td>\n",
       "      <td>Convergence to equilibrium of underdamped Langevin dynamics is studied under general assumptions on the potential LATEX  allowing for singularities. By modifying the direct approach to convergence in LATEX  pioneered by F. Herau and developped by Dolbeault, Mouhot and Schmeiser, we show that the dynamics converges exponentially fast to equilibrium in the topologies LATEX  and LATEX  where LATEX  denotes the invariant probability measure and LATEX  is a suitable Lyapunov weight. In both norms, we make precise how the exponential convergence rate depends on the friction parameter LATEX  in Langevin dynamics, by providing a lower bound scaling as LATEX  The results hold for usual polynomial-type potentials as well as potentials with singularities such as those arising from pairwise Lennard-Jones interactions between particles.</td>\n",
       "      <td>[polynomial-type, Lennard-Jones]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>105352</th>\n",
       "      <td>A note on extremely primitive affine groups</td>\n",
       "      <td>Let $G$ be a finite primitive permutation group on a set $\\Omega$ with nontrivial point stabilizer $G_{\\alpha}$. We say that $G$ is extremely primitive if $G_{\\alpha}$ acts primitively on each of its orbits in $\\Omega \\setminus \\{\\alpha\\}$. In earlier work, Mann, Praeger and Seress have proved that every extremely primitive group is either almost simple or of affine type and they have classified the affine groups up to the possibility of at most finitely many exceptions. More recently, the almost simple extremely primitive groups have been completely determined. If one assumes Wall's conjecture on the number of maximal subgroups of almost simple groups, then the results of Mann et al. show that it just remains to eliminate an explicit list of affine groups in order to complete the classification of the extremely primitive groups. Mann et al. have conjectured that none of these affine candidates are extremely primitive and our main result confirms this conjecture.</td>\n",
       "      <td>[math.GR]</td>\n",
       "      <td>[['Burness', 'Timothy C.', ''], ['Thomas', 'Adam R.', '']]</td>\n",
       "      <td>2020-09-01</td>\n",
       "      <td>2005.11554</td>\n",
       "      <td>Let LATEX  be a finite primitive permutation group on a set LATEX  with nontrivial point stabilizer LATEX  We say that LATEX  is extremely primitive if LATEX  acts primitively on each of its orbits in LATEX  In earlier work, Mann, Praeger and Seress have proved that every extremely primitive group is either almost simple or of affine type and they have classified the affine groups up to the possibility of at most finitely many exceptions. More recently, the almost simple extremely primitive groups have been completely determined. If one assumes Wall's conjecture on the number of maximal subgroups of almost simple groups, then the results of Mann et al. show that it just remains to eliminate an explicit list of affine groups in order to complete the classification of the extremely primitive groups. Mann et al. have conjectured that none of these affine candidates are extremely primitive and our main result confirms this conjecture.</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>92397</th>\n",
       "      <td>Inexact Derivative-Free Optimization for Bilevel Learning</td>\n",
       "      <td>Variational regularization techniques are dominant in the field of mathematical imaging. A drawback of these techniques is that they are dependent on a number of parameters which have to be set by the user. A by now common strategy to resolve this issue is to learn these parameters from data. While mathematically appealing this strategy leads to a nested optimization problem (known as bilevel optimization) which is computationally very difficult to handle. It is common when solving the upper-level problem to assume access to exact solutions of the lower-level problem, which is practically infeasible. In this work we propose to solve these problems using inexact derivative-free optimization algorithms which never require exact lower-level problem solutions, but instead assume access to approximate solutions with controllable accuracy, which is achievable in practice. We prove global convergence and a worstcase complexity bound for our approach. We test our proposed framework on ROFdenoising and learning MRI sampling patterns. Dynamically adjusting the lower-level accuracy yields learned parameters with similar reconstruction quality as highaccuracy evaluations but with dramatic reductions in computational work (up to 100 times faster in some cases).</td>\n",
       "      <td>[math.OC, cs.CV, cs.LG, cs.NA, math.NA, stat.ML]</td>\n",
       "      <td>[['Ehrhardt', 'Matthias J.', ''], ['Roberts', 'Lindon', '']]</td>\n",
       "      <td>2020-12-10</td>\n",
       "      <td>2006.12674</td>\n",
       "      <td>Variational regularization techniques are dominant in the field of mathematical imaging. A drawback of these techniques is that they are dependent on a number of parameters which have to be set by the user. A by now common strategy to resolve this issue is to learn these parameters from data. While mathematically appealing this strategy leads to a nested optimization problem (known as bilevel optimization) which is computationally very difficult to handle. It is common when solving the upper-level problem to assume access to exact solutions of the lower-level problem, which is practically infeasible. In this work we propose to solve these problems using inexact derivative-free optimization algorithms which never require exact lower-level problem solutions, but instead assume access to approximate solutions with controllable accuracy, which is achievable in practice. We prove global convergence and a worstcase complexity bound for our approach. We test our proposed framework on ROFdenoising and learning MRI sampling patterns. Dynamically adjusting the lower-level accuracy yields learned parameters with similar reconstruction quality as highaccuracy evaluations but with dramatic reductions in computational work (up to 100 times faster in some cases).</td>\n",
       "      <td>[lower-level, derivative-free, upper-level]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36360</th>\n",
       "      <td>Interpolation for analytic families of multilinear operators on metric\\n  measure spaces</td>\n",
       "      <td>Let (X j , d j , $\\mu$ j), j = 0, 1,. .. , m be metric measure spaces. Given 0 &lt; p $\\kappa$ $\\le$ $\\infty$ for $\\kappa$ = 1,. .. , m and an analytic family of multilinear operators T z : L p 1 (X 1) x $\\bullet$ $\\bullet$ $\\bullet$ L p m (X m) $\\rightarrow$ L 1 loc (X 0), for z in the complex unit strip, we prove a theorem in the spirit of Stein's complex interpolation for analytic families. Analyticity and our admissibility condition are defined in the weak (integral) sense and relax the pointwise definitions given in [9]. Continuous functions with compact support are natural dense subspaces of Lebesgue spaces over metric measure spaces and we assume the operators T z are initially defined on them. Our main lemma concerns the approximation of continuous functions with compact support by similar functions that depend analytically in an auxiliary parameter z. An application of the main theorem concerning bilinear estimates for Schr{\\\"o}dinger operators on L p is included.</td>\n",
       "      <td>[math.AP, math.FA]</td>\n",
       "      <td>[['Grafakos', 'Loukas', '', 'IMB'], ['Ouhabaz', 'El Maati', '', 'IMB']]</td>\n",
       "      <td>2022-01-19</td>\n",
       "      <td>2107.00290</td>\n",
       "      <td>Let (X j , d j , LATEX  j), j = 0, 1,. .. , m be metric measure spaces. Given 0 &lt; p LATEX  LATEX  LATEX  for LATEX  = 1,. .. , m and an analytic family of multilinear operators T z : L p 1 (X 1) x LATEX  LATEX  LATEX  L p m (X m) LATEX  L 1 loc (X 0), for z in the complex unit strip, we prove a theorem in the spirit of Stein's complex interpolation for analytic families. Analyticity and our admissibility condition are defined in the weak (integral) sense and relax the pointwise definitions given in [9]. Continuous functions with compact support are natural dense subspaces of Lebesgue spaces over metric measure spaces and we assume the operators T z are initially defined on them. Our main lemma concerns the approximation of continuous functions with compact support by similar functions that depend analytically in an auxiliary parameter z. An application of the main theorem concerning bilinear estimates for Schr{o}dinger operators on L p is included.</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37936</th>\n",
       "      <td>Matrix versions of real and quaternionic nullstellensatz</td>\n",
       "      <td>Real Nullstellensatz is a classical result from Real Algebraic Geometry. It has recently been extended to quaternionic polynomials by Alon and Paran. The aim of this paper is to extend their Quaternionic Nullstellensatz to matrix polynomials. We also obtain an improvement of the Real Nullstellensatz for matrix polynomials in the sense that we simplify the definition of a real left ideal. We use the methods from the proof of the matrix version of Hilbert's Nullstellensatz and we obtain their extensions to a mildly non-commutative case and to the real case.</td>\n",
       "      <td>[math.RA]</td>\n",
       "      <td>[['Cimprič', 'J.', '']]</td>\n",
       "      <td>2022-01-06</td>\n",
       "      <td>2201.01345</td>\n",
       "      <td>Real Nullstellensatz is a classical result from Real Algebraic Geometry. It has recently been extended to quaternionic polynomials by Alon and Paran. The aim of this paper is to extend their Quaternionic Nullstellensatz to matrix polynomials. We also obtain an improvement of the Real Nullstellensatz for matrix polynomials in the sense that we simplify the definition of a real left ideal. We use the methods from the proof of the matrix version of Hilbert's Nullstellensatz and we obtain their extensions to a mildly non-commutative case and to the real case.</td>\n",
       "      <td>[non-commutative]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29384</th>\n",
       "      <td>On the mathematics of beauty: beautiful music</td>\n",
       "      <td>In this paper, we will study the simplest kind of beauty that can be found in a simple piece of music and can be appreciated universally. The proposed approach shows that aesthetically appealing patterns deliver higher amount of information over multiple levels in comparison with less aesthetically appealing patterns when the same amount of energy is used. The proposed model is tested on a set of beautiful music pieces.</td>\n",
       "      <td>[cs.IT, math.IT]</td>\n",
       "      <td>[['Khalili', 'A. M.', '']]</td>\n",
       "      <td>2022-03-04</td>\n",
       "      <td>1707.06510</td>\n",
       "      <td>In this paper, we will study the simplest kind of beauty that can be found in a simple piece of music and can be appreciated universally. The proposed approach shows that aesthetically appealing patterns deliver higher amount of information over multiple levels in comparison with less aesthetically appealing patterns when the same amount of energy is used. The proposed model is tested on a set of beautiful music pieces.</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>63951</th>\n",
       "      <td>Interplay between opers, quantum curves, WKB analysis, and Higgs bundles</td>\n",
       "      <td>Quantum curves were introduced in the physics literature. We develop a mathematical framework for the case associated with Hitchin spectral curves. In this context, a quantum curve is a Rees $\\mathcal{D}$-module on a smooth projective algebraic curve, whose semi-classical limit produces the Hitchin spectral curve of a Higgs bundle. We give a method of quantization of Hitchin spectral curves by concretely constructing one-parameter deformation families of opers.   We propose a variant of the topological recursion of Eynard--Orantin and Mirzakhani for the context of singular Hitchin spectral curves. We show that a PDE version of topological recursion provides all-order WKB analysis for the Rees $\\mathcal{D}$-modules, defined as the quantization of Hitchin spectral curves associated with meromorphic $SL(2,\\mathbb{C})$-Higgs bundles. Topological recursion can be considered as a process of quantization of Hitchin spectral curves. We prove that these two quantizations, one via the construction of families of opers, and the other via the PDE recursion of topological type, agree for holomorphic and meromorphic $SL(2,\\mathbb{C})$-Higgs bundles. Classical differential equations such as the Airy differential equation provides a typical example. Through these classical examples, we see that quantum curves relate Higgs bundles, opers, a conjecture of Gaiotto, and quantum invariants, such as Gromov--Witten invariants</td>\n",
       "      <td>[math.AG, math-ph, math.MP]</td>\n",
       "      <td>[['Dumitrescu', 'Olivia', ''], ['Mulase', 'Motohico', '']]</td>\n",
       "      <td>2021-07-05</td>\n",
       "      <td>1702.00511</td>\n",
       "      <td>Quantum curves were introduced in the physics literature. We develop a mathematical framework for the case associated with Hitchin spectral curves. In this context, a quantum curve is a Rees LATEX  defined as the quantization of Hitchin spectral curves associated with meromorphic LATEX  bundles. Topological recursion can be considered as a process of quantization of Hitchin spectral curves. We prove that these two quantizations, one via the construction of families of opers, and the other via the PDE recursion of topological type, agree for holomorphic and meromorphic LATEX  bundles. Classical differential equations such as the Airy differential equation provides a typical example. Through these classical examples, we see that quantum curves relate Higgs bundles, opers, a conjecture of Gaiotto, and quantum invariants, such as Gromov--Witten invariants</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>60746</th>\n",
       "      <td>Computation of generalized matrix functions with rational Krylov methods</td>\n",
       "      <td>We present a class of algorithms based on rational Krylov methods to compute the action of a generalized matrix function on a vector. These algorithms incorporate existing methods based on the Golub-Kahan bidiagonalization as a special case. By exploiting the quasiseparable structure of the projected matrices, we show that the basis vectors can be updated using a short recurrence, which can be seen as a generalization to the rational case of the Golub-Kahan bidiagonalization. We also prove error bounds that relate the error of these methods to uniform rational approximation. The effectiveness of the algorithms and the accuracy of the bounds is illustrated with numerical experiments.</td>\n",
       "      <td>[math.NA, cs.NA]</td>\n",
       "      <td>[['Casulli', 'Angelo Alberto', ''], ['Simunec', 'Igor', '']]</td>\n",
       "      <td>2021-07-27</td>\n",
       "      <td>2107.12074</td>\n",
       "      <td>We present a class of algorithms based on rational Krylov methods to compute the action of a generalized matrix function on a vector. These algorithms incorporate existing methods based on the Golub-Kahan bidiagonalization as a special case. By exploiting the quasiseparable structure of the projected matrices, we show that the basis vectors can be updated using a short recurrence, which can be seen as a generalization to the rational case of the Golub-Kahan bidiagonalization. We also prove error bounds that relate the error of these methods to uniform rational approximation. The effectiveness of the algorithms and the accuracy of the bounds is illustrated with numerical experiments.</td>\n",
       "      <td>[Golub-Kahan]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26478</th>\n",
       "      <td>Spatial ecology, optimal control and game theoretical fishing problems</td>\n",
       "      <td>Of paramount importance in both ecological systems and economic policies are the problems of harvesting of natural resources. A paradigmatic situation where this question is raised is that of fishing strategies. Indeed, overfishing is a well-known problem in the management of live-stocks, as being too greedy may lead to an overall dramatic depletion of the population we are harvesting. A closely related topic is that of Nash equilibria in the context of fishing policies. Namely, two players being in competition for the same pool of resources, is it possible for them to find an equilibrium situation? The goal of this paper is to provide a detailed analysis of these two queries (\\emph{i.e} optimal fishing strategies for single-player models and study of Nash equilibria for multiple players games) by using a basic yet instructive mathematical model, the logistic-diffusive equation. In this framework, the underlying model simply reads $-\\mu\\Delta \\theta=\\theta(K(x)-\\alpha(x)-\\theta)$ where $K$ accounts for natural resources, $\\theta$ for the density of the population that is being harvested and $\\alpha=\\alpha(x)$ encodes either the single player fishing strategy or, when dealing with Nash equilibria, a combination of the fishing strategies of both players. This article consists of two main parts. The first one gives a very fine characterisation of the optimisers for the single-player game. In the case where two players are involved, we aim at finding a Nash equilibrium. We prove the existence of Nash equilibria in several different regimes \\textcolor{black}{and investigate several related qualitative queries}.Our study is completed by a variety of numerical simulations that illustrate our results and allow us to formulate open questions and conjectures.</td>\n",
       "      <td>[math.OC, math.AP]</td>\n",
       "      <td>[['Mazari', 'Idriss', ''], ['Ruiz-Balet', 'Domènec', '']]</td>\n",
       "      <td>2022-03-23</td>\n",
       "      <td>2203.11844</td>\n",
       "      <td>Of paramount importance in both ecological systems and economic policies are the problems of harvesting of natural resources. A paradigmatic situation where this question is raised is that of fishing strategies. Indeed, overfishing is a well-known problem in the management of live-stocks, as being too greedy may lead to an overall dramatic depletion of the population we are harvesting. A closely related topic is that of Nash equilibria in the context of fishing policies. Namely, two players being in competition for the same pool of resources, is it possible for them to find an equilibrium situation? The goal of this paper is to provide a detailed analysis of these two queries ( optimal fishing strategies for single-player models and study of Nash equilibria for multiple players games) by using a basic yet instructive mathematical model, the logistic-diffusive equation. In this framework, the underlying model simply reads LATEX  where LATEX  accounts for natural resources, LATEX  for the density of the population that is being harvested and LATEX  encodes either the single player fishing strategy or, when dealing with Nash equilibria, a combination of the fishing strategies of both players. This article consists of two main parts. The first one gives a very fine characterisation of the optimisers for the single-player game. In the case where two players are involved, we aim at finding a Nash equilibrium. We prove the existence of Nash equilibria in several different regimes {and investigate several related qualitative queries}.Our study is completed by a variety of numerical simulations that illustrate our results and allow us to formulate open questions and conjectures.</td>\n",
       "      <td>[logistic-diffusive, live-stocks, single-player, well-known]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                         title  \\\n",
       "125174  Decentralized Charging Control of Electric Vehicles in Residential\\n  Distribution Networks                              \n",
       "91765   Geodesic orbit metrics on homogeneous spaces constructed by strongly\\n  isotropy irreducible spaces                      \n",
       "154012  ISS Property with Respect to Boundary Disturbances for a Class of\\n  Riesz-Spectral Boundary Control Systems             \n",
       "11152   A new class of higher-ordered/extended Boussinesq system for efficient\\n  numerical simulations by splitting operators   \n",
       "67426   Localized Reduced Basis Additive Schwarz Methods                                                                         \n",
       "113435  Some connections of complex dynamics                                                                                     \n",
       "101715  $L^2$ decay for the linearized Landau equation with the specular\\n  boundary condition                                   \n",
       "21170   Information in probability: Another information-theoretic proof of a\\n  finite de Finetti theorem                        \n",
       "70247   Stated skein algebras and their representations                                                                          \n",
       "157070  On mixture representations for the generalized Linnik distribution and\\n  their applications in limit theorems           \n",
       "25237   Shifted Witten classes and topological recursion                                                                         \n",
       "36435   Weighted $L^2$-contractivity of Langevin dynamics with singular\\n  potentials                                            \n",
       "105352  A note on extremely primitive affine groups                                                                              \n",
       "92397   Inexact Derivative-Free Optimization for Bilevel Learning                                                                \n",
       "36360   Interpolation for analytic families of multilinear operators on metric\\n  measure spaces                                 \n",
       "37936   Matrix versions of real and quaternionic nullstellensatz                                                                 \n",
       "29384   On the mathematics of beauty: beautiful music                                                                            \n",
       "63951   Interplay between opers, quantum curves, WKB analysis, and Higgs bundles                                                 \n",
       "60746   Computation of generalized matrix functions with rational Krylov methods                                                 \n",
       "26478   Spatial ecology, optimal control and game theoretical fishing problems                                                   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      abstract  \\\n",
       "125174    Electric vehicle (EV) charging can negatively impact electric distribution networks by exceeding equipment thermal ratings and causing voltages to drop below standard ranges. In this paper, we develop a decentralized EV charging control scheme to achieve \"valley-filling\" (i.e., flattening demand profile during overnight charging), meanwhile meeting heterogeneous individual charging requirements and satisfying distribution network constraints. The formulated problem is an optimization problem with a non-separable objective function and strongly coupled inequality constraints. We propose a novel shrunken primal-dual subgradient (SPDS) algorithm to support the decentralized control scheme, derive conditions guaranteeing its convergence, and verify its efficacy and convergence with a representative distribution network model.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "91765     In this paper, we focus on homogeneous spaces which are constructed from two strongly isotropy irreducible spaces, and prove that any geodesic orbit metric on these spaces is naturally reductive.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    \n",
       "154012    This paper deals with the establishment of Input-to-State Stability (ISS) estimates for infinite dimensional systems with respect to both boundary and distributed disturbances. First, a new approach is developed for the establishment of ISS estimates for a class of Riesz-spectral boundary control systems satisfying certain eigenvalue constraints. Second, a concept of weak solutions is introduced in order to relax the disturbances regularity assumptions required to ensure the existence of classical solutions. The proposed concept of weak solutions, that applies to a large class of boundary control systems which is not limited to the Riesz-spectral ones, provides a natural extension of the concept of both classical and mild solutions. Assuming that an ISS estimate holds true for classical solutions, we show the existence, the uniqueness, and the ISS property of the weak solutions.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "11152     In this work, we numerically study the higher-ordered/extended Boussinesq system describing the propagation of water-waves over flat topography. A reformulation of the same order of precision that avoids the calculation of high order derivatives on the surface deformation is proposed. We show that this formulation enjoys an extended range of applicability while remaining stable. Moreover, a significant improvement in terms of linear dispersive properties in high frequency regime is made due to the suitable adjustment of a dispersion correction parameter. We develop a second order splitting scheme where the hyperbolic part of the system is treated with a high-order finite volume scheme and the dispersive part is treated with a finite difference approach. Numerical simulations are then performed under two main goals: validating the model and the numerical methods and assessing the potential need of such higher-order model. \\red{The applicability of the proposed model and numerical method in practical problems is illustrated by a comparison with experimental data.}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "67426     Reduced basis methods build low-rank approximation spaces for the solution sets of parameterized PDEs by computing solutions of the given PDE for appropriately selected snapshot parameters. Localized reduced basis methods reduce the offline cost of computing these snapshot solutions by instead constructing a global space from spatially localized less expensive problems. In the case of online enrichment, these local problems are iteratively solved in regions of high residual and correspond to subdomain solves in domain decomposition methods. We show in this note that indeed there is a close relationship between online-enriched localized reduced basis and domain decomposition methods by introducing a Localized Reduced Basis Additive Schwarz method (LRBAS), which can be interpreted as a locally adaptive multi-preconditioning scheme for the CG method.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "113435    We survey some of the connections linking complex dynamics to other fields of mathematics and science. We hope to show that complex dynamics is not just interesting on its own but also has value as an applicable theory.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "101715    In this paper, we develop an alternative approach to establish the $L^2$ decay estimate for the linearized Landau equation in a bounded domain with specular boundary condition. The proof is based on the methodology of proof by contradiction motivated by [Guo, Comm. Pure Appl. Math., 55(9):1104-1135, 2002] and [Guo, Arch. Ration. Mech. Anal., 197(3):713-809, 2010].                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         \n",
       "21170     We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "70247     This is a survey on stated skein algebras and their representations.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "157070    We present new mixture representations for the generalized Linnik distribution in terms of normal, Laplace, exponential and stable laws and establish the relationship between the mixing distributions in these representations. Based on these representations, we prove some limit theorems for a wide class of rather simple statistics constructed from samples with random sized including, e. g., random sums of independent random variables with finite variances and maximum random sums, in which the generalized Linnik distribution plays the role of the limit law. Thus we demonstrate that the scheme of geometric (or, in general, negative binomial) summation is far not the only asymptotic setting (even for sums of independent random variables) in which the generalized Linnik law appears as the limit distribution.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         \n",
       "25237     The Witten $r$-spin class defines a non-semisimple cohomological field theory. Pandharipande, Pixton and Zvonkine studied two special shifts of the Witten class along two semisimple directions of the associated Dubrovin--Frobenius manifold using the Givental--Teleman reconstruction theorem. We show that the $R$-matrix and the translation of these two specific shifts can be constructed from the solutions of two differential equations that generalise the classical Airy differential equation. Using this, we prove that the descendant intersection theory of the shifted Witten classes satisfies topological recursion on two $1$-parameter families of spectral curves. By taking the limit as the parameter goes to zero for these families of spectral curves, we prove that the descendant intersection theory of the Witten $r$-spin class can be computed by topological recursion on the $r$-Airy spectral curve. We finally show that this proof suffices to deduce Witten's $r$-spin conjecture, already proved by Faber, Shadrin and Zvonkine, which claims that the generating series of $r$-spin intersection numbers is the tau function of the $r$-KdV hierarchy that satisfies the string equation.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "36435     Convergence to equilibrium of underdamped Langevin dynamics is studied under general assumptions on the potential $U$ allowing for singularities. By modifying the direct approach to convergence in $L^2$ pioneered by F. H\\'erau and developped by Dolbeault, Mouhot and Schmeiser, we show that the dynamics converges exponentially fast to equilibrium in the topologies $L^2(d\\mu)$ and $L^2(W^* d\\mu)$, where $\\mu$ denotes the invariant probability measure and $W^*$ is a suitable Lyapunov weight. In both norms, we make precise how the exponential convergence rate depends on the friction parameter $\\gamma$ in Langevin dynamics, by providing a lower bound scaling as $\\min(\\gamma, \\gamma^{-1})$. The results hold for usual polynomial-type potentials as well as potentials with singularities such as those arising from pairwise Lennard-Jones interactions between particles.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "105352    Let $G$ be a finite primitive permutation group on a set $\\Omega$ with nontrivial point stabilizer $G_{\\alpha}$. We say that $G$ is extremely primitive if $G_{\\alpha}$ acts primitively on each of its orbits in $\\Omega \\setminus \\{\\alpha\\}$. In earlier work, Mann, Praeger and Seress have proved that every extremely primitive group is either almost simple or of affine type and they have classified the affine groups up to the possibility of at most finitely many exceptions. More recently, the almost simple extremely primitive groups have been completely determined. If one assumes Wall's conjecture on the number of maximal subgroups of almost simple groups, then the results of Mann et al. show that it just remains to eliminate an explicit list of affine groups in order to complete the classification of the extremely primitive groups. Mann et al. have conjectured that none of these affine candidates are extremely primitive and our main result confirms this conjecture.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "92397     Variational regularization techniques are dominant in the field of mathematical imaging. A drawback of these techniques is that they are dependent on a number of parameters which have to be set by the user. A by now common strategy to resolve this issue is to learn these parameters from data. While mathematically appealing this strategy leads to a nested optimization problem (known as bilevel optimization) which is computationally very difficult to handle. It is common when solving the upper-level problem to assume access to exact solutions of the lower-level problem, which is practically infeasible. In this work we propose to solve these problems using inexact derivative-free optimization algorithms which never require exact lower-level problem solutions, but instead assume access to approximate solutions with controllable accuracy, which is achievable in practice. We prove global convergence and a worstcase complexity bound for our approach. We test our proposed framework on ROFdenoising and learning MRI sampling patterns. Dynamically adjusting the lower-level accuracy yields learned parameters with similar reconstruction quality as highaccuracy evaluations but with dramatic reductions in computational work (up to 100 times faster in some cases).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "36360     Let (X j , d j , $\\mu$ j), j = 0, 1,. .. , m be metric measure spaces. Given 0 < p $\\kappa$ $\\le$ $\\infty$ for $\\kappa$ = 1,. .. , m and an analytic family of multilinear operators T z : L p 1 (X 1) x $\\bullet$ $\\bullet$ $\\bullet$ L p m (X m) $\\rightarrow$ L 1 loc (X 0), for z in the complex unit strip, we prove a theorem in the spirit of Stein's complex interpolation for analytic families. Analyticity and our admissibility condition are defined in the weak (integral) sense and relax the pointwise definitions given in [9]. Continuous functions with compact support are natural dense subspaces of Lebesgue spaces over metric measure spaces and we assume the operators T z are initially defined on them. Our main lemma concerns the approximation of continuous functions with compact support by similar functions that depend analytically in an auxiliary parameter z. An application of the main theorem concerning bilinear estimates for Schr{\\\"o}dinger operators on L p is included.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               \n",
       "37936     Real Nullstellensatz is a classical result from Real Algebraic Geometry. It has recently been extended to quaternionic polynomials by Alon and Paran. The aim of this paper is to extend their Quaternionic Nullstellensatz to matrix polynomials. We also obtain an improvement of the Real Nullstellensatz for matrix polynomials in the sense that we simplify the definition of a real left ideal. We use the methods from the proof of the matrix version of Hilbert's Nullstellensatz and we obtain their extensions to a mildly non-commutative case and to the real case.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "29384     In this paper, we will study the simplest kind of beauty that can be found in a simple piece of music and can be appreciated universally. The proposed approach shows that aesthetically appealing patterns deliver higher amount of information over multiple levels in comparison with less aesthetically appealing patterns when the same amount of energy is used. The proposed model is tested on a set of beautiful music pieces.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                \n",
       "63951     Quantum curves were introduced in the physics literature. We develop a mathematical framework for the case associated with Hitchin spectral curves. In this context, a quantum curve is a Rees $\\mathcal{D}$-module on a smooth projective algebraic curve, whose semi-classical limit produces the Hitchin spectral curve of a Higgs bundle. We give a method of quantization of Hitchin spectral curves by concretely constructing one-parameter deformation families of opers.   We propose a variant of the topological recursion of Eynard--Orantin and Mirzakhani for the context of singular Hitchin spectral curves. We show that a PDE version of topological recursion provides all-order WKB analysis for the Rees $\\mathcal{D}$-modules, defined as the quantization of Hitchin spectral curves associated with meromorphic $SL(2,\\mathbb{C})$-Higgs bundles. Topological recursion can be considered as a process of quantization of Hitchin spectral curves. We prove that these two quantizations, one via the construction of families of opers, and the other via the PDE recursion of topological type, agree for holomorphic and meromorphic $SL(2,\\mathbb{C})$-Higgs bundles. Classical differential equations such as the Airy differential equation provides a typical example. Through these classical examples, we see that quantum curves relate Higgs bundles, opers, a conjecture of Gaiotto, and quantum invariants, such as Gromov--Witten invariants                                                                                                                                                                                                                                                                                                                                                                     \n",
       "60746     We present a class of algorithms based on rational Krylov methods to compute the action of a generalized matrix function on a vector. These algorithms incorporate existing methods based on the Golub-Kahan bidiagonalization as a special case. By exploiting the quasiseparable structure of the projected matrices, we show that the basis vectors can be updated using a short recurrence, which can be seen as a generalization to the rational case of the Golub-Kahan bidiagonalization. We also prove error bounds that relate the error of these methods to uniform rational approximation. The effectiveness of the algorithms and the accuracy of the bounds is illustrated with numerical experiments.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    \n",
       "26478     Of paramount importance in both ecological systems and economic policies are the problems of harvesting of natural resources. A paradigmatic situation where this question is raised is that of fishing strategies. Indeed, overfishing is a well-known problem in the management of live-stocks, as being too greedy may lead to an overall dramatic depletion of the population we are harvesting. A closely related topic is that of Nash equilibria in the context of fishing policies. Namely, two players being in competition for the same pool of resources, is it possible for them to find an equilibrium situation? The goal of this paper is to provide a detailed analysis of these two queries (\\emph{i.e} optimal fishing strategies for single-player models and study of Nash equilibria for multiple players games) by using a basic yet instructive mathematical model, the logistic-diffusive equation. In this framework, the underlying model simply reads $-\\mu\\Delta \\theta=\\theta(K(x)-\\alpha(x)-\\theta)$ where $K$ accounts for natural resources, $\\theta$ for the density of the population that is being harvested and $\\alpha=\\alpha(x)$ encodes either the single player fishing strategy or, when dealing with Nash equilibria, a combination of the fishing strategies of both players. This article consists of two main parts. The first one gives a very fine characterisation of the optimisers for the single-player game. In the case where two players are involved, we aim at finding a Nash equilibrium. We prove the existence of Nash equilibria in several different regimes \\textcolor{black}{and investigate several related qualitative queries}.Our study is completed by a variety of numerical simulations that illustrate our results and allow us to formulate open questions and conjectures.    \n",
       "\n",
       "                                                     cat  \\\n",
       "125174  [math.OC]                                          \n",
       "91765   [math.DG]                                          \n",
       "154012  [math.OC, cs.SY]                                   \n",
       "11152   [math.AP]                                          \n",
       "67426   [math.NA, cs.NA]                                   \n",
       "113435  [math.DS, math.CV]                                 \n",
       "101715  [math.AP]                                          \n",
       "21170   [math.PR, cs.IT, math.IT]                          \n",
       "70247   [math.GT, math.QA]                                 \n",
       "157070  [math.PR]                                          \n",
       "25237   [math.AG, math-ph, math.CA, math.MP]               \n",
       "36435   [math.PR, math-ph, math.AP, math.MP]               \n",
       "105352  [math.GR]                                          \n",
       "92397   [math.OC, cs.CV, cs.LG, cs.NA, math.NA, stat.ML]   \n",
       "36360   [math.AP, math.FA]                                 \n",
       "37936   [math.RA]                                          \n",
       "29384   [cs.IT, math.IT]                                   \n",
       "63951   [math.AG, math-ph, math.MP]                        \n",
       "60746   [math.NA, cs.NA]                                   \n",
       "26478   [math.OC, math.AP]                                 \n",
       "\n",
       "                                                                                                                               authors_parsed  \\\n",
       "125174  [['Liu', 'Mingxi', ''], ['Phanivong', 'Phillippe K.', ''], ['Shi', 'Yang', ''], ['Callaway', 'Duncan S.', '']]                          \n",
       "91765   [['Chen', 'Huibin', ''], ['Chen', 'Zhiqi', ''], ['Zhu', 'Fuhai', '']]                                                                   \n",
       "154012  [['Lhachemi', 'Hugo', ''], ['Shorten', 'Robert', '']]                                                                                   \n",
       "11152   [['Lteif', 'Ralph', '', 'LAMA'], ['Gerbi', 'Stéphane', '', 'LAMA']]                                                                     \n",
       "67426   [['Gander', 'Martin J.', ''], ['Rave', 'Stephan', '']]                                                                                  \n",
       "113435  [['DeZotti', 'Alexandre', '']]                                                                                                          \n",
       "101715  [['Guo', 'Yan', ''], ['Hwang', 'Hyung Ju', ''], ['Jang', 'Jin Woo', ''], ['Ouyang', 'Zhimeng', '']]                                     \n",
       "21170   [['Gavalakis', 'Lampros', ''], ['Kontoyiannis', 'Ioannis', '']]                                                                         \n",
       "70247   [['Korinman', 'Julien', '']]                                                                                                            \n",
       "157070  [['Korolev', 'V. Yu.', ''], ['Gorshenin', 'A. K.', ''], ['Zeifman', 'A. I.', '']]                                                       \n",
       "25237   [['Charbonnier', 'Séverin', ''], ['Chidambaram', 'Nitin Kumar', ''], ['Garcia-Failde', 'Elba', ''], ['Giacchetto', 'Alessandro', '']]   \n",
       "36435   [['Camrud', 'Evan', ''], ['Herzog', 'David P.', ''], ['Stoltz', 'Gabriel', ''], ['Gordina', 'Maria', '']]                               \n",
       "105352  [['Burness', 'Timothy C.', ''], ['Thomas', 'Adam R.', '']]                                                                              \n",
       "92397   [['Ehrhardt', 'Matthias J.', ''], ['Roberts', 'Lindon', '']]                                                                            \n",
       "36360   [['Grafakos', 'Loukas', '', 'IMB'], ['Ouhabaz', 'El Maati', '', 'IMB']]                                                                 \n",
       "37936   [['Cimprič', 'J.', '']]                                                                                                                 \n",
       "29384   [['Khalili', 'A. M.', '']]                                                                                                              \n",
       "63951   [['Dumitrescu', 'Olivia', ''], ['Mulase', 'Motohico', '']]                                                                              \n",
       "60746   [['Casulli', 'Angelo Alberto', ''], ['Simunec', 'Igor', '']]                                                                            \n",
       "26478   [['Mazari', 'Idriss', ''], ['Ruiz-Balet', 'Domènec', '']]                                                                               \n",
       "\n",
       "       update_date          id  \\\n",
       "125174 2020-04-02   1710.05533   \n",
       "91765  2020-12-15   2012.07015   \n",
       "154012 2019-08-07   1810.03553   \n",
       "11152  2022-07-04   2102.09849   \n",
       "67426  2021-06-09   2103.10884   \n",
       "113435 2020-07-01   2006.16386   \n",
       "101715 2020-09-30   2009.01391   \n",
       "21170  2022-04-28   2204.05033   \n",
       "70247  2021-05-21   2105.09563   \n",
       "157070 2019-07-10   1810.06389   \n",
       "25237  2022-03-31   2203.16523   \n",
       "36435  2022-01-19   2104.10574   \n",
       "105352 2020-09-01   2005.11554   \n",
       "92397  2020-12-10   2006.12674   \n",
       "36360  2022-01-19   2107.00290   \n",
       "37936  2022-01-06   2201.01345   \n",
       "29384  2022-03-04   1707.06510   \n",
       "63951  2021-07-05   1702.00511   \n",
       "60746  2021-07-27   2107.12074   \n",
       "26478  2022-03-23   2203.11844   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               clean_abstract  \\\n",
       "125174    Electric vehicle (EV) charging can negatively impact electric distribution networks by exceeding equipment thermal ratings and causing voltages to drop below standard ranges. In this paper, we develop a decentralized EV charging control scheme to achieve \"valley-filling\" (i.e., flattening demand profile during overnight charging), meanwhile meeting heterogeneous individual charging requirements and satisfying distribution network constraints. The formulated problem is an optimization problem with a non-separable objective function and strongly coupled inequality constraints. We propose a novel shrunken primal-dual subgradient (SPDS) algorithm to support the decentralized control scheme, derive conditions guaranteeing its convergence, and verify its efficacy and convergence with a representative distribution network model.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "91765     In this paper, we focus on homogeneous spaces which are constructed from two strongly isotropy irreducible spaces, and prove that any geodesic orbit metric on these spaces is naturally reductive.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "154012    This paper deals with the establishment of Input-to-State Stability (ISS) estimates for infinite dimensional systems with respect to both boundary and distributed disturbances. First, a new approach is developed for the establishment of ISS estimates for a class of Riesz-spectral boundary control systems satisfying certain eigenvalue constraints. Second, a concept of weak solutions is introduced in order to relax the disturbances regularity assumptions required to ensure the existence of classical solutions. The proposed concept of weak solutions, that applies to a large class of boundary control systems which is not limited to the Riesz-spectral ones, provides a natural extension of the concept of both classical and mild solutions. Assuming that an ISS estimate holds true for classical solutions, we show the existence, the uniqueness, and the ISS property of the weak solutions.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "11152     In this work, we numerically study the higher-ordered/extended Boussinesq system describing the propagation of water-waves over flat topography. A reformulation of the same order of precision that avoids the calculation of high order derivatives on the surface deformation is proposed. We show that this formulation enjoys an extended range of applicability while remaining stable. Moreover, a significant improvement in terms of linear dispersive properties in high frequency regime is made due to the suitable adjustment of a dispersion correction parameter. We develop a second order splitting scheme where the hyperbolic part of the system is treated with a high-order finite volume scheme and the dispersive part is treated with a finite difference approach. Numerical simulations are then performed under two main goals: validating the model and the numerical methods and assessing the potential need of such higher-order model.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                \n",
       "67426     Reduced basis methods build low-rank approximation spaces for the solution sets of parameterized PDEs by computing solutions of the given PDE for appropriately selected snapshot parameters. Localized reduced basis methods reduce the offline cost of computing these snapshot solutions by instead constructing a global space from spatially localized less expensive problems. In the case of online enrichment, these local problems are iteratively solved in regions of high residual and correspond to subdomain solves in domain decomposition methods. We show in this note that indeed there is a close relationship between online-enriched localized reduced basis and domain decomposition methods by introducing a Localized Reduced Basis Additive Schwarz method (LRBAS), which can be interpreted as a locally adaptive multi-preconditioning scheme for the CG method.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "113435    We survey some of the connections linking complex dynamics to other fields of mathematics and science. We hope to show that complex dynamics is not just interesting on its own but also has value as an applicable theory.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "101715    In this paper, we develop an alternative approach to establish the LATEX  decay estimate for the linearized Landau equation in a bounded domain with specular boundary condition. The proof is based on the methodology of proof by contradiction motivated by [Guo, Comm. Pure Appl. Math., 55(9):1104-1135, 2002] and [Guo, Arch. Ration. Mech. Anal., 197(3):713-809, 2010].                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "21170     We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first LATEX  in a sequence of LATEX  exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "70247     This is a survey on stated skein algebras and their representations.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "157070    We present new mixture representations for the generalized Linnik distribution in terms of normal, Laplace, exponential and stable laws and establish the relationship between the mixing distributions in these representations. Based on these representations, we prove some limit theorems for a wide class of rather simple statistics constructed from samples with random sized including, e. g., random sums of independent random variables with finite variances and maximum random sums, in which the generalized Linnik distribution plays the role of the limit law. Thus we demonstrate that the scheme of geometric (or, in general, negative binomial) summation is far not the only asymptotic setting (even for sums of independent random variables) in which the generalized Linnik law appears as the limit distribution.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        \n",
       "25237     The Witten LATEX  class defines a non-semisimple cohomological field theory. Pandharipande, Pixton and Zvonkine studied two special shifts of the Witten class along two semisimple directions of the associated Dubrovin--Frobenius manifold using the Givental--Teleman reconstruction theorem. We show that the LATEX  and the translation of these two specific shifts can be constructed from the solutions of two differential equations that generalise the classical Airy differential equation. Using this, we prove that the descendant intersection theory of the shifted Witten classes satisfies topological recursion on two LATEX  families of spectral curves. By taking the limit as the parameter goes to zero for these families of spectral curves, we prove that the descendant intersection theory of the Witten LATEX  class can be computed by topological recursion on the LATEX  spectral curve. We finally show that this proof suffices to deduce Witten's LATEX  conjecture, already proved by Faber, Shadrin and Zvonkine, which claims that the generating series of LATEX  intersection numbers is the tau function of the LATEX  hierarchy that satisfies the string equation.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "36435     Convergence to equilibrium of underdamped Langevin dynamics is studied under general assumptions on the potential LATEX  allowing for singularities. By modifying the direct approach to convergence in LATEX  pioneered by F. Herau and developped by Dolbeault, Mouhot and Schmeiser, we show that the dynamics converges exponentially fast to equilibrium in the topologies LATEX  and LATEX  where LATEX  denotes the invariant probability measure and LATEX  is a suitable Lyapunov weight. In both norms, we make precise how the exponential convergence rate depends on the friction parameter LATEX  in Langevin dynamics, by providing a lower bound scaling as LATEX  The results hold for usual polynomial-type potentials as well as potentials with singularities such as those arising from pairwise Lennard-Jones interactions between particles.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "105352    Let LATEX  be a finite primitive permutation group on a set LATEX  with nontrivial point stabilizer LATEX  We say that LATEX  is extremely primitive if LATEX  acts primitively on each of its orbits in LATEX  In earlier work, Mann, Praeger and Seress have proved that every extremely primitive group is either almost simple or of affine type and they have classified the affine groups up to the possibility of at most finitely many exceptions. More recently, the almost simple extremely primitive groups have been completely determined. If one assumes Wall's conjecture on the number of maximal subgroups of almost simple groups, then the results of Mann et al. show that it just remains to eliminate an explicit list of affine groups in order to complete the classification of the extremely primitive groups. Mann et al. have conjectured that none of these affine candidates are extremely primitive and our main result confirms this conjecture.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "92397     Variational regularization techniques are dominant in the field of mathematical imaging. A drawback of these techniques is that they are dependent on a number of parameters which have to be set by the user. A by now common strategy to resolve this issue is to learn these parameters from data. While mathematically appealing this strategy leads to a nested optimization problem (known as bilevel optimization) which is computationally very difficult to handle. It is common when solving the upper-level problem to assume access to exact solutions of the lower-level problem, which is practically infeasible. In this work we propose to solve these problems using inexact derivative-free optimization algorithms which never require exact lower-level problem solutions, but instead assume access to approximate solutions with controllable accuracy, which is achievable in practice. We prove global convergence and a worstcase complexity bound for our approach. We test our proposed framework on ROFdenoising and learning MRI sampling patterns. Dynamically adjusting the lower-level accuracy yields learned parameters with similar reconstruction quality as highaccuracy evaluations but with dramatic reductions in computational work (up to 100 times faster in some cases).                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "36360     Let (X j , d j , LATEX  j), j = 0, 1,. .. , m be metric measure spaces. Given 0 < p LATEX  LATEX  LATEX  for LATEX  = 1,. .. , m and an analytic family of multilinear operators T z : L p 1 (X 1) x LATEX  LATEX  LATEX  L p m (X m) LATEX  L 1 loc (X 0), for z in the complex unit strip, we prove a theorem in the spirit of Stein's complex interpolation for analytic families. Analyticity and our admissibility condition are defined in the weak (integral) sense and relax the pointwise definitions given in [9]. Continuous functions with compact support are natural dense subspaces of Lebesgue spaces over metric measure spaces and we assume the operators T z are initially defined on them. Our main lemma concerns the approximation of continuous functions with compact support by similar functions that depend analytically in an auxiliary parameter z. An application of the main theorem concerning bilinear estimates for Schr{o}dinger operators on L p is included.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    \n",
       "37936     Real Nullstellensatz is a classical result from Real Algebraic Geometry. It has recently been extended to quaternionic polynomials by Alon and Paran. The aim of this paper is to extend their Quaternionic Nullstellensatz to matrix polynomials. We also obtain an improvement of the Real Nullstellensatz for matrix polynomials in the sense that we simplify the definition of a real left ideal. We use the methods from the proof of the matrix version of Hilbert's Nullstellensatz and we obtain their extensions to a mildly non-commutative case and to the real case.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "29384     In this paper, we will study the simplest kind of beauty that can be found in a simple piece of music and can be appreciated universally. The proposed approach shows that aesthetically appealing patterns deliver higher amount of information over multiple levels in comparison with less aesthetically appealing patterns when the same amount of energy is used. The proposed model is tested on a set of beautiful music pieces.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               \n",
       "63951     Quantum curves were introduced in the physics literature. We develop a mathematical framework for the case associated with Hitchin spectral curves. In this context, a quantum curve is a Rees LATEX  defined as the quantization of Hitchin spectral curves associated with meromorphic LATEX  bundles. Topological recursion can be considered as a process of quantization of Hitchin spectral curves. We prove that these two quantizations, one via the construction of families of opers, and the other via the PDE recursion of topological type, agree for holomorphic and meromorphic LATEX  bundles. Classical differential equations such as the Airy differential equation provides a typical example. Through these classical examples, we see that quantum curves relate Higgs bundles, opers, a conjecture of Gaiotto, and quantum invariants, such as Gromov--Witten invariants                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "60746     We present a class of algorithms based on rational Krylov methods to compute the action of a generalized matrix function on a vector. These algorithms incorporate existing methods based on the Golub-Kahan bidiagonalization as a special case. By exploiting the quasiseparable structure of the projected matrices, we show that the basis vectors can be updated using a short recurrence, which can be seen as a generalization to the rational case of the Golub-Kahan bidiagonalization. We also prove error bounds that relate the error of these methods to uniform rational approximation. The effectiveness of the algorithms and the accuracy of the bounds is illustrated with numerical experiments.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "26478     Of paramount importance in both ecological systems and economic policies are the problems of harvesting of natural resources. A paradigmatic situation where this question is raised is that of fishing strategies. Indeed, overfishing is a well-known problem in the management of live-stocks, as being too greedy may lead to an overall dramatic depletion of the population we are harvesting. A closely related topic is that of Nash equilibria in the context of fishing policies. Namely, two players being in competition for the same pool of resources, is it possible for them to find an equilibrium situation? The goal of this paper is to provide a detailed analysis of these two queries ( optimal fishing strategies for single-player models and study of Nash equilibria for multiple players games) by using a basic yet instructive mathematical model, the logistic-diffusive equation. In this framework, the underlying model simply reads LATEX  where LATEX  accounts for natural resources, LATEX  for the density of the population that is being harvested and LATEX  encodes either the single player fishing strategy or, when dealing with Nash equilibria, a combination of the fishing strategies of both players. This article consists of two main parts. The first one gives a very fine characterisation of the optimisers for the single-player game. In the case where two players are involved, we aim at finding a Nash equilibrium. We prove the existence of Nash equilibria in several different regimes {and investigate several related qualitative queries}.Our study is completed by a variety of numerical simulations that illustrate our results and allow us to formulate open questions and conjectures.    \n",
       "\n",
       "                                                            keywords  \n",
       "125174  [non-separable, primal-dual, valley-filling]                  \n",
       "91765   None                                                          \n",
       "154012  [Input-to-State, Riesz-spectral]                              \n",
       "11152   [higher-order, high-order, water-waves, higher-ordered]       \n",
       "67426   [low-rank, multi-preconditioning, online-enriched]            \n",
       "113435  None                                                          \n",
       "101715  [713-809, 1104-1135]                                          \n",
       "21170   [information-theoretic, finite-valued]                        \n",
       "70247   None                                                          \n",
       "157070  None                                                          \n",
       "25237   [non-semisimple]                                              \n",
       "36435   [polynomial-type, Lennard-Jones]                              \n",
       "105352  None                                                          \n",
       "92397   [lower-level, derivative-free, upper-level]                   \n",
       "36360   None                                                          \n",
       "37936   [non-commutative]                                             \n",
       "29384   None                                                          \n",
       "63951   None                                                          \n",
       "60746   [Golub-Kahan]                                                 \n",
       "26478   [logistic-diffusive, live-stocks, single-player, well-known]  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Now we will search through the clean abstracts for hyphenated words and extract them in a new column called 'hyphenated'\n",
    "\n",
    "pattern = r'(?<!-)\\b(?:\\w+)(?=-)(?:-(?=\\w)\\w+)+(?!-)\\b'\n",
    "\n",
    "def find_hyph(text):\n",
    "    keywords = regex.findall(pattern,text)\n",
    "    if keywords == []:\n",
    "        return None\n",
    "    else:\n",
    "        return list(set(keywords))\n",
    "\n",
    "data = pd.read_parquet('./data/arXiv.parquet')\n",
    "data['clean_abstract'] = data.abstract.apply(cleanse)\n",
    "data['keywords'] = data.clean_abstract.apply(find_hyph)\n",
    "                                             \n",
    "data.sample(20)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "      <th>cat</th>\n",
       "      <th>authors_parsed</th>\n",
       "      <th>update_date</th>\n",
       "      <th>id</th>\n",
       "      <th>clean_abstract</th>\n",
       "      <th>keywords</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>30101</th>\n",
       "      <td>Coexistence of D2D Communications and Cell-Free Massive MIMO Systems\\n  With Low Resolution ADC for Improved Throughput in Beyond-5G Networks</td>\n",
       "      <td>In this paper, uplink transmission of a cell-free massive multiple-input multiple-output (CF-mMIMO) system coexisting with device-to-device (D2D) communication links is investigated, under the assumption that access points (APs) are equipped with low-resolution analog-to-digital converters (ADCs). Lower bounds of achievable rates for both D2D users (DUEs) and CF-mMIMO users (CFUEs) are derived in closed-form, with perfect and imperfect channel state information. Next, in order to reduce pilot contamination, greedy and graph coloring-based pilot allocation algorithms are proposed and analyzed for the considered scenario. Furthermore, to control interference and improve the performance, two power control strategies are designed and their complexity and convergence are also discussed. The first power control strategy aims at maximizing CFUEs' sum spectral efficiency (SE) subject to quality of service constraints on DUEs, while the second one maximizes the weighted product of CFUEs' and DUEs' signal-to-interference-plus-noise-ratios (SINRs). Numerical results show that the proposed pilot and power allocations bring a considerable improvement to the network SE. Also, it is revealed that the activation of D2D links has a positive effect on the system throughput, i.e. the network offloading ensured by the D2D links overcomes the increased interference brought by D2D communications.</td>\n",
       "      <td>[cs.IT, math.IT]</td>\n",
       "      <td>[['Masoumi', 'Hamed', ''], ['Emadi', 'Mohammad Javad', ''], ['Buzzi', 'Stefano', '']]</td>\n",
       "      <td>2022-03-01</td>\n",
       "      <td>2005.10068</td>\n",
       "      <td>In this paper, uplink transmission of a cell-free massive multiple-input multiple-output (CF-mMIMO) system coexisting with device-to-device (D2D) communication links is investigated, under the assumption that access points (APs) are equipped with low-resolution analog-to-digital converters (ADCs). Lower bounds of achievable rates for both D2D users (DUEs) and CF-mMIMO users (CFUEs) are derived in closed-form, with perfect and imperfect channel state information. Next, in order to reduce pilot contamination, greedy and graph coloring-based pilot allocation algorithms are proposed and analyzed for the considered scenario. Furthermore, to control interference and improve the performance, two power control strategies are designed and their complexity and convergence are also discussed. The first power control strategy aims at maximizing CFUEs' sum spectral efficiency (SE) subject to quality of service constraints on DUEs, while the second one maximizes the weighted product of CFUEs' and DUEs' signal-to-interference-plus-noise-ratios (SINRs). Numerical results show that the proposed pilot and power allocations bring a considerable improvement to the network SE. Also, it is revealed that the activation of D2D links has a positive effect on the system throughput, i.e. the network offloading ensured by the D2D links overcomes the increased interference brought by D2D communications.</td>\n",
       "      <td>[CF-mMIMO, coloring-based, multiple-input, multiple-output, analog-to-digital, cell-free, closed-form, signal-to-interference-plus-noise-ratios, low-resolution, device-to-device]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44092</th>\n",
       "      <td>Non-asymptotic Identification of Linear Dynamical Systems Using Multiple\\n  Trajectories</td>\n",
       "      <td>This paper considers the problem of linear time-invariant (LTI) system identification using input/output data. Recent work has provided non-asymptotic results on partially observed LTI system identification using a single trajectory but is only suitable for stable systems. We provide finite-time analysis for learning Markov parameters based on the ordinary least-squares (OLS) estimator using multiple trajectories, which covers both stable and unstable systems. For unstable systems, our results suggest that the Markov parameters are harder to estimate in the presence of process noise. Without process noise, our upper bound on the estimation error is independent of the spectral radius of system dynamics with high probability. These two features are different from fully observed LTI systems for which recent work has shown that unstable systems with a bigger spectral radius are easier to estimate. Extensive numerical experiments demonstrate the performance of our OLS estimator.</td>\n",
       "      <td>[math.OC, cs.SY, eess.SY, math.DS]</td>\n",
       "      <td>[['Zheng', 'Yang', ''], ['Li', 'Na', '']]</td>\n",
       "      <td>2021-11-23</td>\n",
       "      <td>2009.00739</td>\n",
       "      <td>This paper considers the problem of linear time-invariant (LTI) system identification using input/output data. Recent work has provided non-asymptotic results on partially observed LTI system identification using a single trajectory but is only suitable for stable systems. We provide finite-time analysis for learning Markov parameters based on the ordinary least-squares (OLS) estimator using multiple trajectories, which covers both stable and unstable systems. For unstable systems, our results suggest that the Markov parameters are harder to estimate in the presence of process noise. Without process noise, our upper bound on the estimation error is independent of the spectral radius of system dynamics with high probability. These two features are different from fully observed LTI systems for which recent work has shown that unstable systems with a bigger spectral radius are easier to estimate. Extensive numerical experiments demonstrate the performance of our OLS estimator.</td>\n",
       "      <td>[finite-time, non-asymptotic, time-invariant, least-squares]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95950</th>\n",
       "      <td>On the first non-trivial strand of syzygies of projective schemes and\\n  Condition ${\\mathrm ND}(l)$</td>\n",
       "      <td>Let $X\\subset\\mathbb{P}^{n+e}$ be any $n$-dimensional closed subscheme. In this paper, we are mainly interested in two notions related to syzygies: one is the property $\\mathbf{N}_{d,p}~(d\\ge 2, ~p\\geq 1)$, which means that $X$ is $d$-regular up to $p$-th step in the minimal free resolution and the other is a new notion $\\mathrm{ND}(l)$ which generalizes the classical \"being nondegenerate\" to the condition that requires a general finite linear section not to be contained in any hypersurface of degree $l$.   First, we introduce condition $\\mathrm{ND}(l)$ and consider examples and basic properties deduced from the notion. Next we prove sharp upper bounds on the graded Betti numbers of the first non-trivial strand of syzygies, which generalize results in the quadratic case to higher degree case, and provide characterizations for the extremal cases. Further, after regarding some consequences of property $\\mathbf{N}_{d,p}$, we characterize the resolution of $X$ to be $d$-linear arithemetically Cohen-Macaulay as having property $\\mathbf{N}_{d,e}$ and condition $\\mathrm{ND}(d-1)$ at the same time. From this result, we obtain a syzygetic rigidity theorem which suggests a natural generalization of syzygetic rigidity on $2$-regularity due to Eisenbud-Green-Hulek-Popescu to a general $d$-regularity.</td>\n",
       "      <td>[math.AG, math.AC]</td>\n",
       "      <td>[['Ahn', 'Jeaman', ''], ['Han', 'Kangjin', ''], ['Kwak', 'Sijong', '']]</td>\n",
       "      <td>2020-11-16</td>\n",
       "      <td>2011.06785</td>\n",
       "      <td>Let LATEX  be any LATEX  closed subscheme. In this paper, we are mainly interested in two notions related to syzygies: one is the property LATEX  which means that LATEX  is LATEX  up to LATEX  step in the minimal free resolution and the other is a new notion LATEX  which generalizes the classical \"being nondegenerate\" to the condition that requires a general finite linear section not to be contained in any hypersurface of degree LATEX    First, we introduce condition LATEX  and consider examples and basic properties deduced from the notion. Next we prove sharp upper bounds on the graded Betti numbers of the first non-trivial strand of syzygies, which generalize results in the quadratic case to higher degree case, and provide characterizations for the extremal cases. Further, after regarding some consequences of property LATEX  we characterize the resolution of LATEX  to be LATEX  arithemetically Cohen-Macaulay as having property LATEX  and condition LATEX  at the same time. From this result, we obtain a syzygetic rigidity theorem which suggests a natural generalization of syzygetic rigidity on LATEX  due to Eisenbud-Green-Hulek-Popescu to a general LATEX</td>\n",
       "      <td>[Eisenbud-Green-Hulek-Popescu, non-trivial, Cohen-Macaulay]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87091</th>\n",
       "      <td>Frobenius test exponent for ideals generated by filter regular sequences</td>\n",
       "      <td>Let $(R,\\frak m)$ be a Noetherian local ring of prime characteristic $p&gt;0$, and $t$ an integer such that $H_{\\frak m}^j(R)/0^F_{H^j_{\\frak m}(R)}$ has finite length for all $j&lt;t$. The aim of this paper is to show that there exists an uniform bound for Frobenius test exponents of ideals generated by filter regular sequences of length at most $t$.</td>\n",
       "      <td>[math.AC]</td>\n",
       "      <td>[['Huong', 'Duong Thi', ''], ['Quy', 'Pham Hung', '']]</td>\n",
       "      <td>2021-01-21</td>\n",
       "      <td>2101.00475</td>\n",
       "      <td>Let LATEX  be a Noetherian local ring of prime characteristic LATEX  and LATEX  an integer such that LATEX  has finite length for all LATEX  The aim of this paper is to show that there exists an uniform bound for Frobenius test exponents of ideals generated by filter regular sequences of length at most LATEX</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86195</th>\n",
       "      <td>XXL type Artin groups are CAT(0) and acylindrically hyperbolic</td>\n",
       "      <td>We describe a simple locally CAT(0) classifying space for extra extra large type Artin groups (with all labels at least 5). Furthermore, when the Artin group is not dihedral, we describe a rank 1 periodic geodesic, thus proving that extra large type Artin groups are acylindrically hyperbolic. Together with Property RD proved by Ciabonu, Holt and Rees, the CAT(0) property implies the Baum-Connes conjecture for all XXL type Artin groups.</td>\n",
       "      <td>[math.MG, math.GT]</td>\n",
       "      <td>[['Haettel', 'Thomas', '']]</td>\n",
       "      <td>2021-01-27</td>\n",
       "      <td>1905.11032</td>\n",
       "      <td>We describe a simple locally CAT(0) classifying space for extra extra large type Artin groups (with all labels at least 5). Furthermore, when the Artin group is not dihedral, we describe a rank 1 periodic geodesic, thus proving that extra large type Artin groups are acylindrically hyperbolic. Together with Property RD proved by Ciabonu, Holt and Rees, the CAT(0) property implies the Baum-Connes conjecture for all XXL type Artin groups.</td>\n",
       "      <td>[Baum-Connes]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74105</th>\n",
       "      <td>Robust Model Predictive Control for Nonlinear Systems Using Convex\\n  Restriction</td>\n",
       "      <td>We present an algorithm for robust model predictive control with consideration of uncertainty and safety constraints. Our framework considers a nonlinear dynamical system subject to disturbances from an unknown but bounded uncertainty set. By viewing the system as a fixed point of an operator acting over trajectories, we propose a convex condition on control actions that guarantee safety against the uncertainty set. The proposed condition guarantees that all realizations of the state trajectories satisfy safety constraints. Our algorithm solves a sequence of convex quadratic constrained optimization problems of size n*N, where n is the number of states, and N is the prediction horizon in the model predictive control problem. Compared to existing methods, our approach solves convex problems while guaranteeing that all realizations of uncertainty set do not violate safety constraints. Moreover, we consider the implicit time-discretization of system dynamics to increase the prediction horizon and enhance computational accuracy. Numerical simulations for vehicle navigation demonstrate the effectiveness of our approach.</td>\n",
       "      <td>[math.OC]</td>\n",
       "      <td>[['Lee', 'Dongchan', ''], ['Turitsyn', 'Konstantin', ''], ['Slotine', 'Jean-Jacques', '']]</td>\n",
       "      <td>2021-04-23</td>\n",
       "      <td>2003.00345</td>\n",
       "      <td>We present an algorithm for robust model predictive control with consideration of uncertainty and safety constraints. Our framework considers a nonlinear dynamical system subject to disturbances from an unknown but bounded uncertainty set. By viewing the system as a fixed point of an operator acting over trajectories, we propose a convex condition on control actions that guarantee safety against the uncertainty set. The proposed condition guarantees that all realizations of the state trajectories satisfy safety constraints. Our algorithm solves a sequence of convex quadratic constrained optimization problems of size n*N, where n is the number of states, and N is the prediction horizon in the model predictive control problem. Compared to existing methods, our approach solves convex problems while guaranteeing that all realizations of uncertainty set do not violate safety constraints. Moreover, we consider the implicit time-discretization of system dynamics to increase the prediction horizon and enhance computational accuracy. Numerical simulations for vehicle navigation demonstrate the effectiveness of our approach.</td>\n",
       "      <td>[time-discretization]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>155131</th>\n",
       "      <td>An Achievement Game on a Cycle</td>\n",
       "      <td>Consider the following game played by Maker and Breaker on the vertices of the cycle $C_{n}$, with first move given to Breaker. The aim of Maker is to maximise the number of adjacent pairs of vertices that are both claimed by her, and the aim of Breaker is to minimise this number. The aim of this paper is to find this number exactly for all $n$ when both players play optimally, answering a related question of Dowden, Kang, Mikala\\v{c}ki and Stojakovi\\'{c}.</td>\n",
       "      <td>[math.CO]</td>\n",
       "      <td>[['Raty', 'Eero', '']]</td>\n",
       "      <td>2019-07-26</td>\n",
       "      <td>1907.11152</td>\n",
       "      <td>Consider the following game played by Maker and Breaker on the vertices of the cycle LATEX  with first move given to Breaker. The aim of Maker is to maximise the number of adjacent pairs of vertices that are both claimed by her, and the aim of Breaker is to minimise this number. The aim of this paper is to find this number exactly for all LATEX  when both players play optimally, answering a related question of Dowden, Kang, Mikalacki and Stojakovic.</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20183</th>\n",
       "      <td>Mismatched Disturbance Rejection Control for Second-Order Discrete-Time\\n  Systems</td>\n",
       "      <td>This paper is concerned with mismatched disturbance rejection control for the second-order discrete-time systems.Different from previous work, the controllability of the system is applied to design the disturbance compensation gain, which does not require any coordinate transformations. Via this new idea, it is shown that disturbance in the regulated output is immediately and directly compensated in the case that the disturbance is known. When the disturbance is unknown, an extra generalized extended state observer is applied to design the controller. Two examples are given to show the effectiveness of the proposed methods. Numerical simulation shows that the designed controller has excellent disturbance rejection effect when the disturbance is known. The example with respect to the permanent-magnet direct current motor illustrates that the proposed control method for unknown disturbance rejection is effective.</td>\n",
       "      <td>[math.OC]</td>\n",
       "      <td>[['Lv', 'Shichao', ''], ['Peng', 'Kai', ''], ['Wang', 'Hongxia', ''], ['Zhang', 'Huanshui', '']]</td>\n",
       "      <td>2022-05-04</td>\n",
       "      <td>2205.01261</td>\n",
       "      <td>This paper is concerned with mismatched disturbance rejection control for the second-order discrete-time systems.Different from previous work, the controllability of the system is applied to design the disturbance compensation gain, which does not require any coordinate transformations. Via this new idea, it is shown that disturbance in the regulated output is immediately and directly compensated in the case that the disturbance is known. When the disturbance is unknown, an extra generalized extended state observer is applied to design the controller. Two examples are given to show the effectiveness of the proposed methods. Numerical simulation shows that the designed controller has excellent disturbance rejection effect when the disturbance is known. The example with respect to the permanent-magnet direct current motor illustrates that the proposed control method for unknown disturbance rejection is effective.</td>\n",
       "      <td>[discrete-time, second-order, permanent-magnet]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>140472</th>\n",
       "      <td>Generalized shift operator of certain encodings of real numbers</td>\n",
       "      <td>The present article is devoted to the investigation of some properties of the generalized shift operator of numbers represented in terms of numeral systems with a variable alphabet.</td>\n",
       "      <td>[math.GM]</td>\n",
       "      <td>[['Serbenyuk', 'Symon', '']]</td>\n",
       "      <td>2019-11-28</td>\n",
       "      <td>1911.12140</td>\n",
       "      <td>The present article is devoted to the investigation of some properties of the generalized shift operator of numbers represented in terms of numeral systems with a variable alphabet.</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167763</th>\n",
       "      <td>Properly-weighted graph Laplacian for semi-supervised learning</td>\n",
       "      <td>The performance of traditional graph Laplacian methods for semi-supervised learning degrades substantially as the ratio of labeled to unlabeled data decreases, due to a degeneracy in the graph Laplacian. Several approaches have been proposed recently to address this, however we show that some of them remain ill-posed in the large-data limit.   In this paper, we show a way to correctly set the weights in Laplacian regularization so that the estimator remains well posed and stable in the large-sample limit. We prove that our semi-supervised learning algorithm converges, in the infinite sample size limit, to the smooth solution of a continuum variational problem that attains the labeled values continuously. Our method is fast and easy to implement.</td>\n",
       "      <td>[math.AP, cs.LG, math.NA, math.PR]</td>\n",
       "      <td>[['Calder', 'Jeff', ''], ['Slepcev', 'Dejan', '']]</td>\n",
       "      <td>2019-04-03</td>\n",
       "      <td>1810.04351</td>\n",
       "      <td>The performance of traditional graph Laplacian methods for semi-supervised learning degrades substantially as the ratio of labeled to unlabeled data decreases, due to a degeneracy in the graph Laplacian. Several approaches have been proposed recently to address this, however we show that some of them remain ill-posed in the large-data limit.   In this paper, we show a way to correctly set the weights in Laplacian regularization so that the estimator remains well posed and stable in the large-sample limit. We prove that our semi-supervised learning algorithm converges, in the infinite sample size limit, to the smooth solution of a continuum variational problem that attains the labeled values continuously. Our method is fast and easy to implement.</td>\n",
       "      <td>[semi-supervised, ill-posed, large-sample, large-data]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167061</th>\n",
       "      <td>Topological Bijections for Oriented Matroids</td>\n",
       "      <td>In previous work by the first and third author with Matthew Baker, a family of bijections between bases of a regular matroid and the Jacobian group of the matroid was given. The core of the work is a geometric construction using zonotopal tilings that produces bijections between the bases of a realizable oriented matroid and the set of $(\\sigma,\\sigma^*)$-compatible orientations with respect to some acyclic circuit (respectively, cocircuit) signature $\\sigma$ (respectively, $\\sigma^*$). In this work, we extend this construction to general oriented matroids and circuit (respectively, cocircuit) signatures coming from generic single-element liftings (respectively, extensions). As a corollary, when both signatures are induced by the same lexicographic data, we give a new (bijective) proof of the interpretation of $T_M(1,1)$ using orientation activity due to Gioan and Las Vergnas. Here $T_M(x,y)$ is the Tutte polynomial of the matroid.</td>\n",
       "      <td>[math.CO]</td>\n",
       "      <td>[['Backman', 'Spencer', ''], ['Santos', 'Francisco', ''], ['Yuen', 'Chi Ho', '']]</td>\n",
       "      <td>2019-04-09</td>\n",
       "      <td>1904.03562</td>\n",
       "      <td>In previous work by the first and third author with Matthew Baker, a family of bijections between bases of a regular matroid and the Jacobian group of the matroid was given. The core of the work is a geometric construction using zonotopal tilings that produces bijections between the bases of a realizable oriented matroid and the set of LATEX  orientations with respect to some acyclic circuit (respectively, cocircuit) signature LATEX  (respectively, LATEX  In this work, we extend this construction to general oriented matroids and circuit (respectively, cocircuit) signatures coming from generic single-element liftings (respectively, extensions). As a corollary, when both signatures are induced by the same lexicographic data, we give a new (bijective) proof of the interpretation of LATEX  using orientation activity due to Gioan and Las Vergnas. Here LATEX  is the Tutte polynomial of the matroid.</td>\n",
       "      <td>[single-element]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>109755</th>\n",
       "      <td>Carleson measure estimates and $\\epsilon$-approximation of bounded\\n  harmonic functions, without Ahlfors regularity assumptions</td>\n",
       "      <td>Let $\\Omega$ be a domain in $\\mathbb{R}^{d+1}$, $d \\geq 1$. In the paper's references [HMM2] and [GMT] it was proved that if $\\Omega$ satisfies a corkscrew condition and if $\\partial \\Omega$ is $d$-Ahlfors regular, i.e. Hausdorff measure $\\mathcal{H}^d(B(x,r) \\cap \\partial \\Omega) \\sim r^d$ for all $x \\in \\partial \\Omega$ and $0 &lt; r &lt; {\\rm diam}(\\partial \\Omega)$, then $\\partial \\Omega$ is uniformly rectifiable if and only if (a) a square function Carleson measure estimate holds for every bounded harmonic function on $\\Omega$ or (b) an $\\varepsilon$-approximation property for all $0 &lt; \\varepsilon &lt;1$ for every such function. Here we explore (a) and (b) when $\\partial \\Omega$ is not required to be Ahlfors regular. We first prove that (a) and (b) hold for any domain $\\Omega$ for which there exists a domain $\\widetilde \\Omega \\subset \\Omega$ such that $\\partial \\Omega \\subset \\partial \\widetilde \\Omega$ and $\\partial \\widetilde \\Omega$ is uniformly rectifiable. We next assume $\\Omega$ satisfies a corkscrew condition and $\\partial \\Omega$ satisfies a capacity density condition. Under these assumptions we prove conversely that the existence of such $\\widetilde \\Omega$ implies (a) and (b) hold on $\\Omega$ and give further characterizations of domains for which (a) or (b) holds. One is that harmonic measure satisfies a Carleson packing condition for diameters similar to the corona decompositionm proved equivalent to uniform rectifiability in [GMT]. The second characterization is reminiscent of the Carleson measure description of $H^{\\infty}$ interpolating sequences in the unit disc.</td>\n",
       "      <td>[math.CA]</td>\n",
       "      <td>[['Garnett', 'John', '']]</td>\n",
       "      <td>2020-07-28</td>\n",
       "      <td>2006.10682</td>\n",
       "      <td>Let LATEX  be a domain in LATEX  LATEX  In the paper's references [HMM2] and [GMT] it was proved that if LATEX  satisfies a corkscrew condition and if LATEX  is LATEX  regular, i.e. Hausdorff measure LATEX  for all LATEX  and LATEX  then LATEX  is uniformly rectifiable if and only if (a) a square function Carleson measure estimate holds for every bounded harmonic function on LATEX  or (b) an LATEX  property for all LATEX  for every such function. Here we explore (a) and (b) when LATEX  is not required to be Ahlfors regular. We first prove that (a) and (b) hold for any domain LATEX  for which there exists a domain LATEX  such that LATEX  and LATEX  is uniformly rectifiable. We next assume LATEX  satisfies a corkscrew condition and LATEX  satisfies a capacity density condition. Under these assumptions we prove conversely that the existence of such LATEX  implies (a) and (b) hold on LATEX  and give further characterizations of domains for which (a) or (b) holds. One is that harmonic measure satisfies a Carleson packing condition for diameters similar to the corona decompositionm proved equivalent to uniform rectifiability in [GMT]. The second characterization is reminiscent of the Carleson measure description of LATEX  interpolating sequences in the unit disc.</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17024</th>\n",
       "      <td>On skew partial derivatives and a Hermite-type interpolation problem</td>\n",
       "      <td>Let $\\mathcal{R}:=\\mathbb{F}[{\\bf x};\\sigma,\\delta]$ be a multivariate skew polynomial ring over a division ring $\\mathbb{F}$. In this paper, we introduce the notion of right and left $(\\sigma,\\delta)$-partial derivatives of polynomials in $\\mathcal{R}$ and we prove some of their main properties. As an application of these results, we solve in $\\mathcal{R}$ a Hermite-type multivariate skew polynomial interpolation problem. The main technical tools and results used here are of constructive type, showing methods and algorithms to construct a polynomial in $\\mathcal{R}$ which satisfies the above Hermite-type interpolation problem and its relative Lagrange-type version.</td>\n",
       "      <td>[math.RA]</td>\n",
       "      <td>[['Donoso', 'Jonathan Armando Briones', ''], ['Tironi', 'Andrea Luigi', '']]</td>\n",
       "      <td>2022-05-25</td>\n",
       "      <td>2205.12222</td>\n",
       "      <td>Let LATEX  be a multivariate skew polynomial ring over a division ring LATEX  In this paper, we introduce the notion of right and left LATEX  derivatives of polynomials in LATEX  a Hermite-type multivariate skew polynomial interpolation problem. The main technical tools and results used here are of constructive type, showing methods and algorithms to construct a polynomial in LATEX  which satisfies the above Hermite-type interpolation problem and its relative Lagrange-type version.</td>\n",
       "      <td>[Hermite-type, Lagrange-type]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>131098</th>\n",
       "      <td>Geometric Rescaling Algorithms for Submodular Function Minimization</td>\n",
       "      <td>We present a new class of polynomial-time algorithms for submodular function minimization (SFM), as well as a unified framework to obtain strongly polynomial SFM algorithms. Our algorithms are based on simple iterative methods for the minimum-norm problem, such as the conditional gradient and Fujishige-Wolfe algorithms. We exhibit two techniques to turn simple iterative methods into polynomial-time algorithms.   Firstly, we adapt the geometric rescaling technique, which has recently gained attention in linear programming, to SFM and obtain a weakly polynomial bound $O(({n}^4\\cdot \\textrm{EO} + {n}^5)\\log ({n} L))$.   Secondly, we exhibit a general combinatorial black-box approach to turn $\\varepsilon L$-approximate SFM oracles into strongly polynomial exact SFM algorithms. This framework can be applied to a wide range of combinatorial and continuous algorithms, including pseudo-polynomial ones. In particular, we can obtain strongly polynomial algorithms by a repeated application of the conditional gradient or of the Fujishige-Wolfe algorithm. Combined with the geometric rescaling technique, the black-box approach provides an $O(({n}^5\\cdot \\textrm{EO} +{n}^6)\\log^2{n})$ algorithm.   Finally, we show that one of the techniques we develop in the paper can also be combined with the cutting-plane method of Lee, Sidford, and Wong \\cite{LSW}, yielding a simplified variant of their $O(n^3 \\log^2 n \\cdot \\textrm{EO} + n^4\\log^{O(1)} n)$ algorithm.</td>\n",
       "      <td>[math.OC, cs.DS]</td>\n",
       "      <td>[['Dadush', 'Daniel', ''], ['Végh', 'László A.', ''], ['Zambelli', 'Giacomo', '']]</td>\n",
       "      <td>2020-02-14</td>\n",
       "      <td>1707.05065</td>\n",
       "      <td>We present a new class of polynomial-time algorithms for submodular function minimization (SFM), as well as a unified framework to obtain strongly polynomial SFM algorithms. Our algorithms are based on simple iterative methods for the minimum-norm problem, such as the conditional gradient and Fujishige-Wolfe algorithms. We exhibit two techniques to turn simple iterative methods into polynomial-time algorithms.   Firstly, we adapt the geometric rescaling technique, which has recently gained attention in linear programming, to SFM and obtain a weakly polynomial bound LATEX    Secondly, we exhibit a general combinatorial black-box approach to turn LATEX  SFM oracles into strongly polynomial exact SFM algorithms. This framework can be applied to a wide range of combinatorial and continuous algorithms, including pseudo-polynomial ones. In particular, we can obtain strongly polynomial algorithms by a repeated application of the conditional gradient or of the Fujishige-Wolfe algorithm. Combined with the geometric rescaling technique, the black-box approach provides an LATEX  algorithm.   Finally, we show that one of the techniques we develop in the paper can also be combined with the cutting-plane method of Lee, Sidford, and Wong , yielding a simplified variant of their LATEX  algorithm.</td>\n",
       "      <td>[black-box, pseudo-polynomial, minimum-norm, Fujishige-Wolfe, polynomial-time, cutting-plane]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17981</th>\n",
       "      <td>Reducing Linear Hadwiger's Conjecture to Coloring Small Graphs</td>\n",
       "      <td>In 1943, Hadwiger conjectured that every graph with no $K_t$ minor is $(t-1)$-colorable for every $t\\ge 1$. In the 1980s, Kostochka and Thomason independently proved that every graph with no $K_t$ minor has average degree $O(t\\sqrt{\\log t})$ and hence is $O(t\\sqrt{\\log t})$-colorable. Recently, Norin, Song and the second author showed that every graph with no $K_t$ minor is $O(t(\\log t)^{\\beta})$-colorable for every $\\beta &gt; 1/4$, making the first improvement on the order of magnitude of the $O(t\\sqrt{\\log t})$ bound. The first main result of this paper is that every graph with no $K_t$ minor is $O(t\\log\\log t)$-colorable.   This is a corollary of our main technical result that the chromatic number of a $K_t$-minor-free graph is bounded by $O(t(1+f(G,t)))$ where $f(G,t)$ is the maximum of $\\frac{\\chi(H)}{a}$ over all $a\\ge \\frac{t}{\\sqrt{\\log t}}$ and $K_a$-minor-free subgraphs $H$ of $G$ that are small (i.e. $O(a\\log^4 a)$ vertices). This has a number of interesting corollaries. First as mentioned, using the current best-known bounds on coloring small $K_t$-minor-free graphs, we show that $K_t$-minor-free graphs are $O(t\\log\\log t)$-colorable. Second, it shows that proving Linear Hadwiger's Conjecture (that $K_t$-minor-free graphs are $O(t)$-colorable) reduces to proving it for small graphs. Third, we prove that $K_t$-minor-free graphs with clique number at most $\\sqrt{\\log t}/ (\\log \\log t)^2$ are $O(t)$-colorable. This implies our final corollary that Linear Hadwiger's Conjecture holds for $K_r$-free graphs for every fixed $r$.   One key to proving the main theorem is a new standalone result that every $K_t$-minor-free graph of average degree $d=\\Omega(t)$ has a subgraph on $O(t \\log^3 t)$ vertices with average degree $\\Omega(d)$.</td>\n",
       "      <td>[math.CO, cs.DM]</td>\n",
       "      <td>[['Delcourt', 'Michelle', ''], ['Postle', 'Luke', '']]</td>\n",
       "      <td>2022-05-19</td>\n",
       "      <td>2108.01633</td>\n",
       "      <td>In 1943, Hadwiger conjectured that every graph with no LATEX  minor is LATEX  for every LATEX  In the 1980s, Kostochka and Thomason independently proved that every graph with no LATEX  minor has average degree LATEX  and hence is LATEX  Recently, Norin, Song and the second author showed that every graph with no LATEX  minor is LATEX  for every LATEX  making the first improvement on the order of magnitude of the LATEX  bound. The first main result of this paper is that every graph with no LATEX  minor is LATEX    This is a corollary of our main technical result that the chromatic number of a LATEX  graph is bounded by LATEX  where LATEX  is the maximum of LATEX  over all LATEX  and LATEX  subgraphs LATEX  of LATEX  that are small (i.e. LATEX  vertices). This has a number of interesting corollaries. First as mentioned, using the current best-known bounds on coloring small LATEX  graphs, we show that LATEX  graphs are LATEX  Second, it shows that proving Linear Hadwiger's Conjecture (that LATEX  graphs are LATEX  reduces to proving it for small graphs. Third, we prove that LATEX  graphs with clique number at most LATEX  are LATEX  This implies our final corollary that Linear Hadwiger's Conjecture holds for LATEX  graphs for every fixed LATEX    One key to proving the main theorem is a new standalone result that every LATEX  graph of average degree LATEX  has a subgraph on LATEX  vertices with average degree LATEX</td>\n",
       "      <td>[best-known]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54604</th>\n",
       "      <td>Entropy as a Topological Operad Derivation</td>\n",
       "      <td>We share a small connection between information theory, algebra, and topology - namely, a correspondence between Shannon entropy and derivations of the operad of topological simplices. We begin with a brief review of operads and their representations with topological simplices and the real line as the main example. We then give a general definition for a derivation of an operad in any category with values in an abelian bimodule over the operad. The main result is that Shannon entropy defines a derivation of the operad of topological simplices, and that for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy. We show this is compatible with, and relies heavily on, a well-known characterization of entropy given by Faddeev in 1956 and a recent variation given by Leinster.</td>\n",
       "      <td>[math.AT, cs.IT, math.CT, math.IT]</td>\n",
       "      <td>[['Bradley', 'Tai-Danae', '']]</td>\n",
       "      <td>2021-09-13</td>\n",
       "      <td>2107.09581</td>\n",
       "      <td>We share a small connection between information theory, algebra, and topology - namely, a correspondence between Shannon entropy and derivations of the operad of topological simplices. We begin with a brief review of operads and their representations with topological simplices and the real line as the main example. We then give a general definition for a derivation of an operad in any category with values in an abelian bimodule over the operad. The main result is that Shannon entropy defines a derivation of the operad of topological simplices, and that for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy. We show this is compatible with, and relies heavily on, a well-known characterization of entropy given by Faddeev in 1956 and a recent variation given by Leinster.</td>\n",
       "      <td>[well-known]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19308</th>\n",
       "      <td>Learning to Continuously Optimize Wireless Resource in a Dynamic\\n  Environment: A Bilevel Optimization Perspective</td>\n",
       "      <td>There has been a growing interest in developing data-driven, and in particular deep neural network (DNN) based methods for modern communication tasks. For a few popular tasks such as power control, beamforming, and MIMO detection, these methods achieve state-of-the-art performance while requiring less computational efforts, less resources for acquiring channel state information (CSI), etc. However, it is often challenging for these approaches to learn in a dynamic environment.   This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment. Specifically, we consider an ``episodically dynamic\" setting where the environment statistics change in ``episodes\", and in each episode the environment is stationary. We propose to build the notion of continual learning (CL) into wireless system design, so that the learning model can incrementally adapt to the new episodes, {\\it without forgetting} knowledge learned from the previous episodes. Our design is based on a novel bilevel optimization formulation which ensures certain ``fairness\" across different data samples. We demonstrate the effectiveness of the CL approach by integrating it with two popular DNN based models for power control and beamforming, respectively, and testing using both synthetic and ray-tracing based data sets. These numerical results show that the proposed CL approach is not only able to adapt to the new scenarios quickly and seamlessly, but importantly, it also maintains high performance over the previously encountered scenarios as well.</td>\n",
       "      <td>[eess.SP, cs.IT, cs.LG, math.IT]</td>\n",
       "      <td>[['Sun', 'Haoran', ''], ['Pu', 'Wenqiang', ''], ['Fu', 'Xiao', ''], ['Chang', 'Tsung-Hui', ''], ['Hong', 'Mingyi', '']]</td>\n",
       "      <td>2022-05-11</td>\n",
       "      <td>2105.01696</td>\n",
       "      <td>There has been a growing interest in developing data-driven, and in particular deep neural network (DNN) based methods for modern communication tasks. For a few popular tasks such as power control, beamforming, and MIMO detection, these methods achieve state-of-the-art performance while requiring less computational efforts, less resources for acquiring channel state information (CSI), etc. However, it is often challenging for these approaches to learn in a dynamic environment.   This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment. Specifically, we consider an ``episodically dynamic\" setting where the environment statistics change in ``episodes\", and in each episode the environment is stationary. We propose to build the notion of continual learning (CL) into wireless system design, so that the learning model can incrementally adapt to the new episodes, {t without forgetting} knowledge learned from the previous episodes. Our design is based on a novel bilevel optimization formulation which ensures certain ``fairness\" across different data samples. We demonstrate the effectiveness of the CL approach by integrating it with two popular DNN based models for power control and beamforming, respectively, and testing using both synthetic and ray-tracing based data sets. These numerical results show that the proposed CL approach is not only able to adapt to the new scenarios quickly and seamlessly, but importantly, it also maintains high performance over the previously encountered scenarios as well.</td>\n",
       "      <td>[data-driven, state-of-the-art, ray-tracing]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44864</th>\n",
       "      <td>Analysis of finite-volume discrete adjoint fields for two-dimensional\\n  compressible Euler flows</td>\n",
       "      <td>This work deals with a number of questions relative to the discrete and continuous adjoint fields associated with the compressible Euler equations and classical aerodynamic functions. The consistency of the discrete adjoint equations with the corresponding continuous adjoint partial differential equation is one of them. It is has been established or at least discussed only for a handful of numerical schemes and a contribution of this article is to give the adjoint consistency conditions for the 2D Jameson-Schmidt-Turkel scheme in cell-centred finite-volume formulation. The consistency issue is also studied here from a new heuristic point of view by discretizing the continuous adjoint equation for the discrete flow and adjoint fields. Both points of view prove to provide useful information. Besides, it has been often noted that discrete or continuous inviscid lift and drag adjoint exhibit numerical divergence close to the wall and stagnation streamline for a wide range of subsonic and transonic flow conditions. This is analyzed here using the physical source term perturbation method introduced in reference [Giles and Pierce, AIAA Paper 97-1850, 1997]. With this point of view, the fourth physical source term of appears to be the only one responsible for this behavior. It is also demonstrated that the numerical divergence of the adjoint variables corresponds to the response of the flow to the convected increment of stagnation pressure and diminution of entropy created at the source and the resulting change in lift and drag.</td>\n",
       "      <td>[physics.comp-ph, cs.NA, math.NA]</td>\n",
       "      <td>[['Peter', 'Jacques', ''], ['Renac', 'Florent', ''], ['Labbé', 'Clément', '']]</td>\n",
       "      <td>2021-11-17</td>\n",
       "      <td>2009.07096</td>\n",
       "      <td>This work deals with a number of questions relative to the discrete and continuous adjoint fields associated with the compressible Euler equations and classical aerodynamic functions. The consistency of the discrete adjoint equations with the corresponding continuous adjoint partial differential equation is one of them. It is has been established or at least discussed only for a handful of numerical schemes and a contribution of this article is to give the adjoint consistency conditions for the 2D Jameson-Schmidt-Turkel scheme in cell-centred finite-volume formulation. The consistency issue is also studied here from a new heuristic point of view by discretizing the continuous adjoint equation for the discrete flow and adjoint fields. Both points of view prove to provide useful information. Besides, it has been often noted that discrete or continuous inviscid lift and drag adjoint exhibit numerical divergence close to the wall and stagnation streamline for a wide range of subsonic and transonic flow conditions. This is analyzed here using the physical source term perturbation method introduced in reference [Giles and Pierce, AIAA Paper 97-1850, 1997]. With this point of view, the fourth physical source term of appears to be the only one responsible for this behavior. It is also demonstrated that the numerical divergence of the adjoint variables corresponds to the response of the flow to the convected increment of stagnation pressure and diminution of entropy created at the source and the resulting change in lift and drag.</td>\n",
       "      <td>[Jameson-Schmidt-Turkel, cell-centred, finite-volume, 97-1850]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26270</th>\n",
       "      <td>Towards constructivising the Freyd-Mitchell embedding theorem</td>\n",
       "      <td>The aim of the paper is to first point out that the classical proof of the Freyd-Mitchell Embedding Theorem does not work in CZF; then, to propose an alternative embedding of a small abelian category into the category of sheaves of modules over a ringed space, which works constructively. It is necessary to mention that this work has been initially inspired by Erik Palmgren, who unexpectedly passed away in November 2019: I'm very grateful to him for having shared with me his intuitions, and for having supervised the realization of the first half of the paper.</td>\n",
       "      <td>[math.CT, math.LO]</td>\n",
       "      <td>[['Montaruli', 'Anna Giulia', '']]</td>\n",
       "      <td>2022-03-24</td>\n",
       "      <td>2203.12490</td>\n",
       "      <td>The aim of the paper is to first point out that the classical proof of the Freyd-Mitchell Embedding Theorem does not work in CZF; then, to propose an alternative embedding of a small abelian category into the category of sheaves of modules over a ringed space, which works constructively. It is necessary to mention that this work has been initially inspired by Erik Palmgren, who unexpectedly passed away in November 2019: I'm very grateful to him for having shared with me his intuitions, and for having supervised the realization of the first half of the paper.</td>\n",
       "      <td>[Freyd-Mitchell]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>78234</th>\n",
       "      <td>Selectors and orderings of coarse spaces</td>\n",
       "      <td>Given a coarse space $(X, \\mathcal{E})$, we consider linear orders on $X$ compatible with the coarse structure $\\mathcal E$ and explore interplays between these orders and macro-uniform selectors of $(X, \\mathcal{E})$.</td>\n",
       "      <td>[math.GN]</td>\n",
       "      <td>[['Protasov', 'Igor', '']]</td>\n",
       "      <td>2021-03-24</td>\n",
       "      <td>2102.02053</td>\n",
       "      <td>Given a coarse space LATEX  we consider linear orders on LATEX  compatible with the coarse structure LATEX  and explore interplays between these orders and macro-uniform selectors of LATEX</td>\n",
       "      <td>[macro-uniform]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                                title  \\\n",
       "30101   Coexistence of D2D Communications and Cell-Free Massive MIMO Systems\\n  With Low Resolution ADC for Improved Throughput in Beyond-5G Networks   \n",
       "44092   Non-asymptotic Identification of Linear Dynamical Systems Using Multiple\\n  Trajectories                                                        \n",
       "95950   On the first non-trivial strand of syzygies of projective schemes and\\n  Condition ${\\mathrm ND}(l)$                                            \n",
       "87091   Frobenius test exponent for ideals generated by filter regular sequences                                                                        \n",
       "86195   XXL type Artin groups are CAT(0) and acylindrically hyperbolic                                                                                  \n",
       "74105   Robust Model Predictive Control for Nonlinear Systems Using Convex\\n  Restriction                                                               \n",
       "155131  An Achievement Game on a Cycle                                                                                                                  \n",
       "20183   Mismatched Disturbance Rejection Control for Second-Order Discrete-Time\\n  Systems                                                              \n",
       "140472  Generalized shift operator of certain encodings of real numbers                                                                                 \n",
       "167763  Properly-weighted graph Laplacian for semi-supervised learning                                                                                  \n",
       "167061  Topological Bijections for Oriented Matroids                                                                                                    \n",
       "109755  Carleson measure estimates and $\\epsilon$-approximation of bounded\\n  harmonic functions, without Ahlfors regularity assumptions                \n",
       "17024   On skew partial derivatives and a Hermite-type interpolation problem                                                                            \n",
       "131098  Geometric Rescaling Algorithms for Submodular Function Minimization                                                                             \n",
       "17981   Reducing Linear Hadwiger's Conjecture to Coloring Small Graphs                                                                                  \n",
       "54604   Entropy as a Topological Operad Derivation                                                                                                      \n",
       "19308   Learning to Continuously Optimize Wireless Resource in a Dynamic\\n  Environment: A Bilevel Optimization Perspective                             \n",
       "44864   Analysis of finite-volume discrete adjoint fields for two-dimensional\\n  compressible Euler flows                                               \n",
       "26270   Towards constructivising the Freyd-Mitchell embedding theorem                                                                                   \n",
       "78234   Selectors and orderings of coarse spaces                                                                                                        \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      abstract  \\\n",
       "30101     In this paper, uplink transmission of a cell-free massive multiple-input multiple-output (CF-mMIMO) system coexisting with device-to-device (D2D) communication links is investigated, under the assumption that access points (APs) are equipped with low-resolution analog-to-digital converters (ADCs). Lower bounds of achievable rates for both D2D users (DUEs) and CF-mMIMO users (CFUEs) are derived in closed-form, with perfect and imperfect channel state information. Next, in order to reduce pilot contamination, greedy and graph coloring-based pilot allocation algorithms are proposed and analyzed for the considered scenario. Furthermore, to control interference and improve the performance, two power control strategies are designed and their complexity and convergence are also discussed. The first power control strategy aims at maximizing CFUEs' sum spectral efficiency (SE) subject to quality of service constraints on DUEs, while the second one maximizes the weighted product of CFUEs' and DUEs' signal-to-interference-plus-noise-ratios (SINRs). Numerical results show that the proposed pilot and power allocations bring a considerable improvement to the network SE. Also, it is revealed that the activation of D2D links has a positive effect on the system throughput, i.e. the network offloading ensured by the D2D links overcomes the increased interference brought by D2D communications.                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "44092     This paper considers the problem of linear time-invariant (LTI) system identification using input/output data. Recent work has provided non-asymptotic results on partially observed LTI system identification using a single trajectory but is only suitable for stable systems. We provide finite-time analysis for learning Markov parameters based on the ordinary least-squares (OLS) estimator using multiple trajectories, which covers both stable and unstable systems. For unstable systems, our results suggest that the Markov parameters are harder to estimate in the presence of process noise. Without process noise, our upper bound on the estimation error is independent of the spectral radius of system dynamics with high probability. These two features are different from fully observed LTI systems for which recent work has shown that unstable systems with a bigger spectral radius are easier to estimate. Extensive numerical experiments demonstrate the performance of our OLS estimator.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "95950     Let $X\\subset\\mathbb{P}^{n+e}$ be any $n$-dimensional closed subscheme. In this paper, we are mainly interested in two notions related to syzygies: one is the property $\\mathbf{N}_{d,p}~(d\\ge 2, ~p\\geq 1)$, which means that $X$ is $d$-regular up to $p$-th step in the minimal free resolution and the other is a new notion $\\mathrm{ND}(l)$ which generalizes the classical \"being nondegenerate\" to the condition that requires a general finite linear section not to be contained in any hypersurface of degree $l$.   First, we introduce condition $\\mathrm{ND}(l)$ and consider examples and basic properties deduced from the notion. Next we prove sharp upper bounds on the graded Betti numbers of the first non-trivial strand of syzygies, which generalize results in the quadratic case to higher degree case, and provide characterizations for the extremal cases. Further, after regarding some consequences of property $\\mathbf{N}_{d,p}$, we characterize the resolution of $X$ to be $d$-linear arithemetically Cohen-Macaulay as having property $\\mathbf{N}_{d,e}$ and condition $\\mathrm{ND}(d-1)$ at the same time. From this result, we obtain a syzygetic rigidity theorem which suggests a natural generalization of syzygetic rigidity on $2$-regularity due to Eisenbud-Green-Hulek-Popescu to a general $d$-regularity.                                                                                                                                                                                                                                                                                                                                                                                                                                                                          \n",
       "87091     Let $(R,\\frak m)$ be a Noetherian local ring of prime characteristic $p>0$, and $t$ an integer such that $H_{\\frak m}^j(R)/0^F_{H^j_{\\frak m}(R)}$ has finite length for all $j<t$. The aim of this paper is to show that there exists an uniform bound for Frobenius test exponents of ideals generated by filter regular sequences of length at most $t$.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "86195     We describe a simple locally CAT(0) classifying space for extra extra large type Artin groups (with all labels at least 5). Furthermore, when the Artin group is not dihedral, we describe a rank 1 periodic geodesic, thus proving that extra large type Artin groups are acylindrically hyperbolic. Together with Property RD proved by Ciabonu, Holt and Rees, the CAT(0) property implies the Baum-Connes conjecture for all XXL type Artin groups.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                \n",
       "74105     We present an algorithm for robust model predictive control with consideration of uncertainty and safety constraints. Our framework considers a nonlinear dynamical system subject to disturbances from an unknown but bounded uncertainty set. By viewing the system as a fixed point of an operator acting over trajectories, we propose a convex condition on control actions that guarantee safety against the uncertainty set. The proposed condition guarantees that all realizations of the state trajectories satisfy safety constraints. Our algorithm solves a sequence of convex quadratic constrained optimization problems of size n*N, where n is the number of states, and N is the prediction horizon in the model predictive control problem. Compared to existing methods, our approach solves convex problems while guaranteeing that all realizations of uncertainty set do not violate safety constraints. Moreover, we consider the implicit time-discretization of system dynamics to increase the prediction horizon and enhance computational accuracy. Numerical simulations for vehicle navigation demonstrate the effectiveness of our approach.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "155131    Consider the following game played by Maker and Breaker on the vertices of the cycle $C_{n}$, with first move given to Breaker. The aim of Maker is to maximise the number of adjacent pairs of vertices that are both claimed by her, and the aim of Breaker is to minimise this number. The aim of this paper is to find this number exactly for all $n$ when both players play optimally, answering a related question of Dowden, Kang, Mikala\\v{c}ki and Stojakovi\\'{c}.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "20183     This paper is concerned with mismatched disturbance rejection control for the second-order discrete-time systems.Different from previous work, the controllability of the system is applied to design the disturbance compensation gain, which does not require any coordinate transformations. Via this new idea, it is shown that disturbance in the regulated output is immediately and directly compensated in the case that the disturbance is known. When the disturbance is unknown, an extra generalized extended state observer is applied to design the controller. Two examples are given to show the effectiveness of the proposed methods. Numerical simulation shows that the designed controller has excellent disturbance rejection effect when the disturbance is known. The example with respect to the permanent-magnet direct current motor illustrates that the proposed control method for unknown disturbance rejection is effective.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "140472    The present article is devoted to the investigation of some properties of the generalized shift operator of numbers represented in terms of numeral systems with a variable alphabet.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "167763    The performance of traditional graph Laplacian methods for semi-supervised learning degrades substantially as the ratio of labeled to unlabeled data decreases, due to a degeneracy in the graph Laplacian. Several approaches have been proposed recently to address this, however we show that some of them remain ill-posed in the large-data limit.   In this paper, we show a way to correctly set the weights in Laplacian regularization so that the estimator remains well posed and stable in the large-sample limit. We prove that our semi-supervised learning algorithm converges, in the infinite sample size limit, to the smooth solution of a continuum variational problem that attains the labeled values continuously. Our method is fast and easy to implement.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    \n",
       "167061    In previous work by the first and third author with Matthew Baker, a family of bijections between bases of a regular matroid and the Jacobian group of the matroid was given. The core of the work is a geometric construction using zonotopal tilings that produces bijections between the bases of a realizable oriented matroid and the set of $(\\sigma,\\sigma^*)$-compatible orientations with respect to some acyclic circuit (respectively, cocircuit) signature $\\sigma$ (respectively, $\\sigma^*$). In this work, we extend this construction to general oriented matroids and circuit (respectively, cocircuit) signatures coming from generic single-element liftings (respectively, extensions). As a corollary, when both signatures are induced by the same lexicographic data, we give a new (bijective) proof of the interpretation of $T_M(1,1)$ using orientation activity due to Gioan and Las Vergnas. Here $T_M(x,y)$ is the Tutte polynomial of the matroid.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "109755    Let $\\Omega$ be a domain in $\\mathbb{R}^{d+1}$, $d \\geq 1$. In the paper's references [HMM2] and [GMT] it was proved that if $\\Omega$ satisfies a corkscrew condition and if $\\partial \\Omega$ is $d$-Ahlfors regular, i.e. Hausdorff measure $\\mathcal{H}^d(B(x,r) \\cap \\partial \\Omega) \\sim r^d$ for all $x \\in \\partial \\Omega$ and $0 < r < {\\rm diam}(\\partial \\Omega)$, then $\\partial \\Omega$ is uniformly rectifiable if and only if (a) a square function Carleson measure estimate holds for every bounded harmonic function on $\\Omega$ or (b) an $\\varepsilon$-approximation property for all $0 < \\varepsilon <1$ for every such function. Here we explore (a) and (b) when $\\partial \\Omega$ is not required to be Ahlfors regular. We first prove that (a) and (b) hold for any domain $\\Omega$ for which there exists a domain $\\widetilde \\Omega \\subset \\Omega$ such that $\\partial \\Omega \\subset \\partial \\widetilde \\Omega$ and $\\partial \\widetilde \\Omega$ is uniformly rectifiable. We next assume $\\Omega$ satisfies a corkscrew condition and $\\partial \\Omega$ satisfies a capacity density condition. Under these assumptions we prove conversely that the existence of such $\\widetilde \\Omega$ implies (a) and (b) hold on $\\Omega$ and give further characterizations of domains for which (a) or (b) holds. One is that harmonic measure satisfies a Carleson packing condition for diameters similar to the corona decompositionm proved equivalent to uniform rectifiability in [GMT]. The second characterization is reminiscent of the Carleson measure description of $H^{\\infty}$ interpolating sequences in the unit disc.                                                                                                                                                                     \n",
       "17024     Let $\\mathcal{R}:=\\mathbb{F}[{\\bf x};\\sigma,\\delta]$ be a multivariate skew polynomial ring over a division ring $\\mathbb{F}$. In this paper, we introduce the notion of right and left $(\\sigma,\\delta)$-partial derivatives of polynomials in $\\mathcal{R}$ and we prove some of their main properties. As an application of these results, we solve in $\\mathcal{R}$ a Hermite-type multivariate skew polynomial interpolation problem. The main technical tools and results used here are of constructive type, showing methods and algorithms to construct a polynomial in $\\mathcal{R}$ which satisfies the above Hermite-type interpolation problem and its relative Lagrange-type version.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "131098    We present a new class of polynomial-time algorithms for submodular function minimization (SFM), as well as a unified framework to obtain strongly polynomial SFM algorithms. Our algorithms are based on simple iterative methods for the minimum-norm problem, such as the conditional gradient and Fujishige-Wolfe algorithms. We exhibit two techniques to turn simple iterative methods into polynomial-time algorithms.   Firstly, we adapt the geometric rescaling technique, which has recently gained attention in linear programming, to SFM and obtain a weakly polynomial bound $O(({n}^4\\cdot \\textrm{EO} + {n}^5)\\log ({n} L))$.   Secondly, we exhibit a general combinatorial black-box approach to turn $\\varepsilon L$-approximate SFM oracles into strongly polynomial exact SFM algorithms. This framework can be applied to a wide range of combinatorial and continuous algorithms, including pseudo-polynomial ones. In particular, we can obtain strongly polynomial algorithms by a repeated application of the conditional gradient or of the Fujishige-Wolfe algorithm. Combined with the geometric rescaling technique, the black-box approach provides an $O(({n}^5\\cdot \\textrm{EO} +{n}^6)\\log^2{n})$ algorithm.   Finally, we show that one of the techniques we develop in the paper can also be combined with the cutting-plane method of Lee, Sidford, and Wong \\cite{LSW}, yielding a simplified variant of their $O(n^3 \\log^2 n \\cdot \\textrm{EO} + n^4\\log^{O(1)} n)$ algorithm.                                                                                                                                                                                                                                                                                                                \n",
       "17981     In 1943, Hadwiger conjectured that every graph with no $K_t$ minor is $(t-1)$-colorable for every $t\\ge 1$. In the 1980s, Kostochka and Thomason independently proved that every graph with no $K_t$ minor has average degree $O(t\\sqrt{\\log t})$ and hence is $O(t\\sqrt{\\log t})$-colorable. Recently, Norin, Song and the second author showed that every graph with no $K_t$ minor is $O(t(\\log t)^{\\beta})$-colorable for every $\\beta > 1/4$, making the first improvement on the order of magnitude of the $O(t\\sqrt{\\log t})$ bound. The first main result of this paper is that every graph with no $K_t$ minor is $O(t\\log\\log t)$-colorable.   This is a corollary of our main technical result that the chromatic number of a $K_t$-minor-free graph is bounded by $O(t(1+f(G,t)))$ where $f(G,t)$ is the maximum of $\\frac{\\chi(H)}{a}$ over all $a\\ge \\frac{t}{\\sqrt{\\log t}}$ and $K_a$-minor-free subgraphs $H$ of $G$ that are small (i.e. $O(a\\log^4 a)$ vertices). This has a number of interesting corollaries. First as mentioned, using the current best-known bounds on coloring small $K_t$-minor-free graphs, we show that $K_t$-minor-free graphs are $O(t\\log\\log t)$-colorable. Second, it shows that proving Linear Hadwiger's Conjecture (that $K_t$-minor-free graphs are $O(t)$-colorable) reduces to proving it for small graphs. Third, we prove that $K_t$-minor-free graphs with clique number at most $\\sqrt{\\log t}/ (\\log \\log t)^2$ are $O(t)$-colorable. This implies our final corollary that Linear Hadwiger's Conjecture holds for $K_r$-free graphs for every fixed $r$.   One key to proving the main theorem is a new standalone result that every $K_t$-minor-free graph of average degree $d=\\Omega(t)$ has a subgraph on $O(t \\log^3 t)$ vertices with average degree $\\Omega(d)$.    \n",
       "54604     We share a small connection between information theory, algebra, and topology - namely, a correspondence between Shannon entropy and derivations of the operad of topological simplices. We begin with a brief review of operads and their representations with topological simplices and the real line as the main example. We then give a general definition for a derivation of an operad in any category with values in an abelian bimodule over the operad. The main result is that Shannon entropy defines a derivation of the operad of topological simplices, and that for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy. We show this is compatible with, and relies heavily on, a well-known characterization of entropy given by Faddeev in 1956 and a recent variation given by Leinster.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "19308     There has been a growing interest in developing data-driven, and in particular deep neural network (DNN) based methods for modern communication tasks. For a few popular tasks such as power control, beamforming, and MIMO detection, these methods achieve state-of-the-art performance while requiring less computational efforts, less resources for acquiring channel state information (CSI), etc. However, it is often challenging for these approaches to learn in a dynamic environment.   This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment. Specifically, we consider an ``episodically dynamic\" setting where the environment statistics change in ``episodes\", and in each episode the environment is stationary. We propose to build the notion of continual learning (CL) into wireless system design, so that the learning model can incrementally adapt to the new episodes, {\\it without forgetting} knowledge learned from the previous episodes. Our design is based on a novel bilevel optimization formulation which ensures certain ``fairness\" across different data samples. We demonstrate the effectiveness of the CL approach by integrating it with two popular DNN based models for power control and beamforming, respectively, and testing using both synthetic and ray-tracing based data sets. These numerical results show that the proposed CL approach is not only able to adapt to the new scenarios quickly and seamlessly, but importantly, it also maintains high performance over the previously encountered scenarios as well.                                                                                                                                                  \n",
       "44864     This work deals with a number of questions relative to the discrete and continuous adjoint fields associated with the compressible Euler equations and classical aerodynamic functions. The consistency of the discrete adjoint equations with the corresponding continuous adjoint partial differential equation is one of them. It is has been established or at least discussed only for a handful of numerical schemes and a contribution of this article is to give the adjoint consistency conditions for the 2D Jameson-Schmidt-Turkel scheme in cell-centred finite-volume formulation. The consistency issue is also studied here from a new heuristic point of view by discretizing the continuous adjoint equation for the discrete flow and adjoint fields. Both points of view prove to provide useful information. Besides, it has been often noted that discrete or continuous inviscid lift and drag adjoint exhibit numerical divergence close to the wall and stagnation streamline for a wide range of subsonic and transonic flow conditions. This is analyzed here using the physical source term perturbation method introduced in reference [Giles and Pierce, AIAA Paper 97-1850, 1997]. With this point of view, the fourth physical source term of appears to be the only one responsible for this behavior. It is also demonstrated that the numerical divergence of the adjoint variables corresponds to the response of the flow to the convected increment of stagnation pressure and diminution of entropy created at the source and the resulting change in lift and drag.                                                                                                                                                                                                                             \n",
       "26270     The aim of the paper is to first point out that the classical proof of the Freyd-Mitchell Embedding Theorem does not work in CZF; then, to propose an alternative embedding of a small abelian category into the category of sheaves of modules over a ringed space, which works constructively. It is necessary to mention that this work has been initially inspired by Erik Palmgren, who unexpectedly passed away in November 2019: I'm very grateful to him for having shared with me his intuitions, and for having supervised the realization of the first half of the paper.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "78234     Given a coarse space $(X, \\mathcal{E})$, we consider linear orders on $X$ compatible with the coarse structure $\\mathcal E$ and explore interplays between these orders and macro-uniform selectors of $(X, \\mathcal{E})$.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "\n",
       "                                       cat  \\\n",
       "30101   [cs.IT, math.IT]                     \n",
       "44092   [math.OC, cs.SY, eess.SY, math.DS]   \n",
       "95950   [math.AG, math.AC]                   \n",
       "87091   [math.AC]                            \n",
       "86195   [math.MG, math.GT]                   \n",
       "74105   [math.OC]                            \n",
       "155131  [math.CO]                            \n",
       "20183   [math.OC]                            \n",
       "140472  [math.GM]                            \n",
       "167763  [math.AP, cs.LG, math.NA, math.PR]   \n",
       "167061  [math.CO]                            \n",
       "109755  [math.CA]                            \n",
       "17024   [math.RA]                            \n",
       "131098  [math.OC, cs.DS]                     \n",
       "17981   [math.CO, cs.DM]                     \n",
       "54604   [math.AT, cs.IT, math.CT, math.IT]   \n",
       "19308   [eess.SP, cs.IT, cs.LG, math.IT]     \n",
       "44864   [physics.comp-ph, cs.NA, math.NA]    \n",
       "26270   [math.CT, math.LO]                   \n",
       "78234   [math.GN]                            \n",
       "\n",
       "                                                                                                                 authors_parsed  \\\n",
       "30101   [['Masoumi', 'Hamed', ''], ['Emadi', 'Mohammad Javad', ''], ['Buzzi', 'Stefano', '']]                                     \n",
       "44092   [['Zheng', 'Yang', ''], ['Li', 'Na', '']]                                                                                 \n",
       "95950   [['Ahn', 'Jeaman', ''], ['Han', 'Kangjin', ''], ['Kwak', 'Sijong', '']]                                                   \n",
       "87091   [['Huong', 'Duong Thi', ''], ['Quy', 'Pham Hung', '']]                                                                    \n",
       "86195   [['Haettel', 'Thomas', '']]                                                                                               \n",
       "74105   [['Lee', 'Dongchan', ''], ['Turitsyn', 'Konstantin', ''], ['Slotine', 'Jean-Jacques', '']]                                \n",
       "155131  [['Raty', 'Eero', '']]                                                                                                    \n",
       "20183   [['Lv', 'Shichao', ''], ['Peng', 'Kai', ''], ['Wang', 'Hongxia', ''], ['Zhang', 'Huanshui', '']]                          \n",
       "140472  [['Serbenyuk', 'Symon', '']]                                                                                              \n",
       "167763  [['Calder', 'Jeff', ''], ['Slepcev', 'Dejan', '']]                                                                        \n",
       "167061  [['Backman', 'Spencer', ''], ['Santos', 'Francisco', ''], ['Yuen', 'Chi Ho', '']]                                         \n",
       "109755  [['Garnett', 'John', '']]                                                                                                 \n",
       "17024   [['Donoso', 'Jonathan Armando Briones', ''], ['Tironi', 'Andrea Luigi', '']]                                              \n",
       "131098  [['Dadush', 'Daniel', ''], ['Végh', 'László A.', ''], ['Zambelli', 'Giacomo', '']]                                        \n",
       "17981   [['Delcourt', 'Michelle', ''], ['Postle', 'Luke', '']]                                                                    \n",
       "54604   [['Bradley', 'Tai-Danae', '']]                                                                                            \n",
       "19308   [['Sun', 'Haoran', ''], ['Pu', 'Wenqiang', ''], ['Fu', 'Xiao', ''], ['Chang', 'Tsung-Hui', ''], ['Hong', 'Mingyi', '']]   \n",
       "44864   [['Peter', 'Jacques', ''], ['Renac', 'Florent', ''], ['Labbé', 'Clément', '']]                                            \n",
       "26270   [['Montaruli', 'Anna Giulia', '']]                                                                                        \n",
       "78234   [['Protasov', 'Igor', '']]                                                                                                \n",
       "\n",
       "       update_date          id  \\\n",
       "30101  2022-03-01   2005.10068   \n",
       "44092  2021-11-23   2009.00739   \n",
       "95950  2020-11-16   2011.06785   \n",
       "87091  2021-01-21   2101.00475   \n",
       "86195  2021-01-27   1905.11032   \n",
       "74105  2021-04-23   2003.00345   \n",
       "155131 2019-07-26   1907.11152   \n",
       "20183  2022-05-04   2205.01261   \n",
       "140472 2019-11-28   1911.12140   \n",
       "167763 2019-04-03   1810.04351   \n",
       "167061 2019-04-09   1904.03562   \n",
       "109755 2020-07-28   2006.10682   \n",
       "17024  2022-05-25   2205.12222   \n",
       "131098 2020-02-14   1707.05065   \n",
       "17981  2022-05-19   2108.01633   \n",
       "54604  2021-09-13   2107.09581   \n",
       "19308  2022-05-11   2105.01696   \n",
       "44864  2021-11-17   2009.07096   \n",
       "26270  2022-03-24   2203.12490   \n",
       "78234  2021-03-24   2102.02053   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                clean_abstract  \\\n",
       "30101     In this paper, uplink transmission of a cell-free massive multiple-input multiple-output (CF-mMIMO) system coexisting with device-to-device (D2D) communication links is investigated, under the assumption that access points (APs) are equipped with low-resolution analog-to-digital converters (ADCs). Lower bounds of achievable rates for both D2D users (DUEs) and CF-mMIMO users (CFUEs) are derived in closed-form, with perfect and imperfect channel state information. Next, in order to reduce pilot contamination, greedy and graph coloring-based pilot allocation algorithms are proposed and analyzed for the considered scenario. Furthermore, to control interference and improve the performance, two power control strategies are designed and their complexity and convergence are also discussed. The first power control strategy aims at maximizing CFUEs' sum spectral efficiency (SE) subject to quality of service constraints on DUEs, while the second one maximizes the weighted product of CFUEs' and DUEs' signal-to-interference-plus-noise-ratios (SINRs). Numerical results show that the proposed pilot and power allocations bring a considerable improvement to the network SE. Also, it is revealed that the activation of D2D links has a positive effect on the system throughput, i.e. the network offloading ensured by the D2D links overcomes the increased interference brought by D2D communications.                                                                                                                                                                                                                                  \n",
       "44092     This paper considers the problem of linear time-invariant (LTI) system identification using input/output data. Recent work has provided non-asymptotic results on partially observed LTI system identification using a single trajectory but is only suitable for stable systems. We provide finite-time analysis for learning Markov parameters based on the ordinary least-squares (OLS) estimator using multiple trajectories, which covers both stable and unstable systems. For unstable systems, our results suggest that the Markov parameters are harder to estimate in the presence of process noise. Without process noise, our upper bound on the estimation error is independent of the spectral radius of system dynamics with high probability. These two features are different from fully observed LTI systems for which recent work has shown that unstable systems with a bigger spectral radius are easier to estimate. Extensive numerical experiments demonstrate the performance of our OLS estimator.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "95950     Let LATEX  be any LATEX  closed subscheme. In this paper, we are mainly interested in two notions related to syzygies: one is the property LATEX  which means that LATEX  is LATEX  up to LATEX  step in the minimal free resolution and the other is a new notion LATEX  which generalizes the classical \"being nondegenerate\" to the condition that requires a general finite linear section not to be contained in any hypersurface of degree LATEX    First, we introduce condition LATEX  and consider examples and basic properties deduced from the notion. Next we prove sharp upper bounds on the graded Betti numbers of the first non-trivial strand of syzygies, which generalize results in the quadratic case to higher degree case, and provide characterizations for the extremal cases. Further, after regarding some consequences of property LATEX  we characterize the resolution of LATEX  to be LATEX  arithemetically Cohen-Macaulay as having property LATEX  and condition LATEX  at the same time. From this result, we obtain a syzygetic rigidity theorem which suggests a natural generalization of syzygetic rigidity on LATEX  due to Eisenbud-Green-Hulek-Popescu to a general LATEX                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "87091     Let LATEX  be a Noetherian local ring of prime characteristic LATEX  and LATEX  an integer such that LATEX  has finite length for all LATEX  The aim of this paper is to show that there exists an uniform bound for Frobenius test exponents of ideals generated by filter regular sequences of length at most LATEX                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "86195     We describe a simple locally CAT(0) classifying space for extra extra large type Artin groups (with all labels at least 5). Furthermore, when the Artin group is not dihedral, we describe a rank 1 periodic geodesic, thus proving that extra large type Artin groups are acylindrically hyperbolic. Together with Property RD proved by Ciabonu, Holt and Rees, the CAT(0) property implies the Baum-Connes conjecture for all XXL type Artin groups.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                \n",
       "74105     We present an algorithm for robust model predictive control with consideration of uncertainty and safety constraints. Our framework considers a nonlinear dynamical system subject to disturbances from an unknown but bounded uncertainty set. By viewing the system as a fixed point of an operator acting over trajectories, we propose a convex condition on control actions that guarantee safety against the uncertainty set. The proposed condition guarantees that all realizations of the state trajectories satisfy safety constraints. Our algorithm solves a sequence of convex quadratic constrained optimization problems of size n*N, where n is the number of states, and N is the prediction horizon in the model predictive control problem. Compared to existing methods, our approach solves convex problems while guaranteeing that all realizations of uncertainty set do not violate safety constraints. Moreover, we consider the implicit time-discretization of system dynamics to increase the prediction horizon and enhance computational accuracy. Numerical simulations for vehicle navigation demonstrate the effectiveness of our approach.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "155131    Consider the following game played by Maker and Breaker on the vertices of the cycle LATEX  with first move given to Breaker. The aim of Maker is to maximise the number of adjacent pairs of vertices that are both claimed by her, and the aim of Breaker is to minimise this number. The aim of this paper is to find this number exactly for all LATEX  when both players play optimally, answering a related question of Dowden, Kang, Mikalacki and Stojakovic.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "20183     This paper is concerned with mismatched disturbance rejection control for the second-order discrete-time systems.Different from previous work, the controllability of the system is applied to design the disturbance compensation gain, which does not require any coordinate transformations. Via this new idea, it is shown that disturbance in the regulated output is immediately and directly compensated in the case that the disturbance is known. When the disturbance is unknown, an extra generalized extended state observer is applied to design the controller. Two examples are given to show the effectiveness of the proposed methods. Numerical simulation shows that the designed controller has excellent disturbance rejection effect when the disturbance is known. The example with respect to the permanent-magnet direct current motor illustrates that the proposed control method for unknown disturbance rejection is effective.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "140472    The present article is devoted to the investigation of some properties of the generalized shift operator of numbers represented in terms of numeral systems with a variable alphabet.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "167763    The performance of traditional graph Laplacian methods for semi-supervised learning degrades substantially as the ratio of labeled to unlabeled data decreases, due to a degeneracy in the graph Laplacian. Several approaches have been proposed recently to address this, however we show that some of them remain ill-posed in the large-data limit.   In this paper, we show a way to correctly set the weights in Laplacian regularization so that the estimator remains well posed and stable in the large-sample limit. We prove that our semi-supervised learning algorithm converges, in the infinite sample size limit, to the smooth solution of a continuum variational problem that attains the labeled values continuously. Our method is fast and easy to implement.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    \n",
       "167061    In previous work by the first and third author with Matthew Baker, a family of bijections between bases of a regular matroid and the Jacobian group of the matroid was given. The core of the work is a geometric construction using zonotopal tilings that produces bijections between the bases of a realizable oriented matroid and the set of LATEX  orientations with respect to some acyclic circuit (respectively, cocircuit) signature LATEX  (respectively, LATEX  In this work, we extend this construction to general oriented matroids and circuit (respectively, cocircuit) signatures coming from generic single-element liftings (respectively, extensions). As a corollary, when both signatures are induced by the same lexicographic data, we give a new (bijective) proof of the interpretation of LATEX  using orientation activity due to Gioan and Las Vergnas. Here LATEX  is the Tutte polynomial of the matroid.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              \n",
       "109755    Let LATEX  be a domain in LATEX  LATEX  In the paper's references [HMM2] and [GMT] it was proved that if LATEX  satisfies a corkscrew condition and if LATEX  is LATEX  regular, i.e. Hausdorff measure LATEX  for all LATEX  and LATEX  then LATEX  is uniformly rectifiable if and only if (a) a square function Carleson measure estimate holds for every bounded harmonic function on LATEX  or (b) an LATEX  property for all LATEX  for every such function. Here we explore (a) and (b) when LATEX  is not required to be Ahlfors regular. We first prove that (a) and (b) hold for any domain LATEX  for which there exists a domain LATEX  such that LATEX  and LATEX  is uniformly rectifiable. We next assume LATEX  satisfies a corkscrew condition and LATEX  satisfies a capacity density condition. Under these assumptions we prove conversely that the existence of such LATEX  implies (a) and (b) hold on LATEX  and give further characterizations of domains for which (a) or (b) holds. One is that harmonic measure satisfies a Carleson packing condition for diameters similar to the corona decompositionm proved equivalent to uniform rectifiability in [GMT]. The second characterization is reminiscent of the Carleson measure description of LATEX  interpolating sequences in the unit disc.                                                                                                                                                                                                                                                                                                                                                          \n",
       "17024     Let LATEX  be a multivariate skew polynomial ring over a division ring LATEX  In this paper, we introduce the notion of right and left LATEX  derivatives of polynomials in LATEX  a Hermite-type multivariate skew polynomial interpolation problem. The main technical tools and results used here are of constructive type, showing methods and algorithms to construct a polynomial in LATEX  which satisfies the above Hermite-type interpolation problem and its relative Lagrange-type version.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "131098    We present a new class of polynomial-time algorithms for submodular function minimization (SFM), as well as a unified framework to obtain strongly polynomial SFM algorithms. Our algorithms are based on simple iterative methods for the minimum-norm problem, such as the conditional gradient and Fujishige-Wolfe algorithms. We exhibit two techniques to turn simple iterative methods into polynomial-time algorithms.   Firstly, we adapt the geometric rescaling technique, which has recently gained attention in linear programming, to SFM and obtain a weakly polynomial bound LATEX    Secondly, we exhibit a general combinatorial black-box approach to turn LATEX  SFM oracles into strongly polynomial exact SFM algorithms. This framework can be applied to a wide range of combinatorial and continuous algorithms, including pseudo-polynomial ones. In particular, we can obtain strongly polynomial algorithms by a repeated application of the conditional gradient or of the Fujishige-Wolfe algorithm. Combined with the geometric rescaling technique, the black-box approach provides an LATEX  algorithm.   Finally, we show that one of the techniques we develop in the paper can also be combined with the cutting-plane method of Lee, Sidford, and Wong , yielding a simplified variant of their LATEX  algorithm.                                                                                                                                                                                                                                                                                                                                  \n",
       "17981     In 1943, Hadwiger conjectured that every graph with no LATEX  minor is LATEX  for every LATEX  In the 1980s, Kostochka and Thomason independently proved that every graph with no LATEX  minor has average degree LATEX  and hence is LATEX  Recently, Norin, Song and the second author showed that every graph with no LATEX  minor is LATEX  for every LATEX  making the first improvement on the order of magnitude of the LATEX  bound. The first main result of this paper is that every graph with no LATEX  minor is LATEX    This is a corollary of our main technical result that the chromatic number of a LATEX  graph is bounded by LATEX  where LATEX  is the maximum of LATEX  over all LATEX  and LATEX  subgraphs LATEX  of LATEX  that are small (i.e. LATEX  vertices). This has a number of interesting corollaries. First as mentioned, using the current best-known bounds on coloring small LATEX  graphs, we show that LATEX  graphs are LATEX  Second, it shows that proving Linear Hadwiger's Conjecture (that LATEX  graphs are LATEX  reduces to proving it for small graphs. Third, we prove that LATEX  graphs with clique number at most LATEX  are LATEX  This implies our final corollary that Linear Hadwiger's Conjecture holds for LATEX  graphs for every fixed LATEX    One key to proving the main theorem is a new standalone result that every LATEX  graph of average degree LATEX  has a subgraph on LATEX  vertices with average degree LATEX                                                                                                                                                                                              \n",
       "54604     We share a small connection between information theory, algebra, and topology - namely, a correspondence between Shannon entropy and derivations of the operad of topological simplices. We begin with a brief review of operads and their representations with topological simplices and the real line as the main example. We then give a general definition for a derivation of an operad in any category with values in an abelian bimodule over the operad. The main result is that Shannon entropy defines a derivation of the operad of topological simplices, and that for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy. We show this is compatible with, and relies heavily on, a well-known characterization of entropy given by Faddeev in 1956 and a recent variation given by Leinster.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "19308     There has been a growing interest in developing data-driven, and in particular deep neural network (DNN) based methods for modern communication tasks. For a few popular tasks such as power control, beamforming, and MIMO detection, these methods achieve state-of-the-art performance while requiring less computational efforts, less resources for acquiring channel state information (CSI), etc. However, it is often challenging for these approaches to learn in a dynamic environment.   This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment. Specifically, we consider an ``episodically dynamic\" setting where the environment statistics change in ``episodes\", and in each episode the environment is stationary. We propose to build the notion of continual learning (CL) into wireless system design, so that the learning model can incrementally adapt to the new episodes, {t without forgetting} knowledge learned from the previous episodes. Our design is based on a novel bilevel optimization formulation which ensures certain ``fairness\" across different data samples. We demonstrate the effectiveness of the CL approach by integrating it with two popular DNN based models for power control and beamforming, respectively, and testing using both synthetic and ray-tracing based data sets. These numerical results show that the proposed CL approach is not only able to adapt to the new scenarios quickly and seamlessly, but importantly, it also maintains high performance over the previously encountered scenarios as well.    \n",
       "44864     This work deals with a number of questions relative to the discrete and continuous adjoint fields associated with the compressible Euler equations and classical aerodynamic functions. The consistency of the discrete adjoint equations with the corresponding continuous adjoint partial differential equation is one of them. It is has been established or at least discussed only for a handful of numerical schemes and a contribution of this article is to give the adjoint consistency conditions for the 2D Jameson-Schmidt-Turkel scheme in cell-centred finite-volume formulation. The consistency issue is also studied here from a new heuristic point of view by discretizing the continuous adjoint equation for the discrete flow and adjoint fields. Both points of view prove to provide useful information. Besides, it has been often noted that discrete or continuous inviscid lift and drag adjoint exhibit numerical divergence close to the wall and stagnation streamline for a wide range of subsonic and transonic flow conditions. This is analyzed here using the physical source term perturbation method introduced in reference [Giles and Pierce, AIAA Paper 97-1850, 1997]. With this point of view, the fourth physical source term of appears to be the only one responsible for this behavior. It is also demonstrated that the numerical divergence of the adjoint variables corresponds to the response of the flow to the convected increment of stagnation pressure and diminution of entropy created at the source and the resulting change in lift and drag.                                                                             \n",
       "26270     The aim of the paper is to first point out that the classical proof of the Freyd-Mitchell Embedding Theorem does not work in CZF; then, to propose an alternative embedding of a small abelian category into the category of sheaves of modules over a ringed space, which works constructively. It is necessary to mention that this work has been initially inspired by Erik Palmgren, who unexpectedly passed away in November 2019: I'm very grateful to him for having shared with me his intuitions, and for having supervised the realization of the first half of the paper.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "78234     Given a coarse space LATEX  we consider linear orders on LATEX  compatible with the coarse structure LATEX  and explore interplays between these orders and macro-uniform selectors of LATEX                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "\n",
       "                                                                                                                                                                                  keywords  \n",
       "30101   [CF-mMIMO, coloring-based, multiple-input, multiple-output, analog-to-digital, cell-free, closed-form, signal-to-interference-plus-noise-ratios, low-resolution, device-to-device]  \n",
       "44092   [finite-time, non-asymptotic, time-invariant, least-squares]                                                                                                                        \n",
       "95950   [Eisenbud-Green-Hulek-Popescu, non-trivial, Cohen-Macaulay]                                                                                                                         \n",
       "87091   None                                                                                                                                                                                \n",
       "86195   [Baum-Connes]                                                                                                                                                                       \n",
       "74105   [time-discretization]                                                                                                                                                               \n",
       "155131  None                                                                                                                                                                                \n",
       "20183   [discrete-time, second-order, permanent-magnet]                                                                                                                                     \n",
       "140472  None                                                                                                                                                                                \n",
       "167763  [semi-supervised, ill-posed, large-sample, large-data]                                                                                                                              \n",
       "167061  [single-element]                                                                                                                                                                    \n",
       "109755  None                                                                                                                                                                                \n",
       "17024   [Hermite-type, Lagrange-type]                                                                                                                                                       \n",
       "131098  [black-box, pseudo-polynomial, minimum-norm, Fujishige-Wolfe, polynomial-time, cutting-plane]                                                                                       \n",
       "17981   [best-known]                                                                                                                                                                        \n",
       "54604   [well-known]                                                                                                                                                                        \n",
       "19308   [data-driven, state-of-the-art, ray-tracing]                                                                                                                                        \n",
       "44864   [Jameson-Schmidt-Turkel, cell-centred, finite-volume, 97-1850]                                                                                                                      \n",
       "26270   [Freyd-Mitchell]                                                                                                                                                                    \n",
       "78234   [macro-uniform]                                                                                                                                                                     "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.sample(20)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 06/10/2023\n",
    "\n",
    "1. What is the state of the saved datafile?\n",
    "1. Clean the titles.\n",
    "1. Can we modify the latex cleaning so that any single character $C$ is replaced by C?\n",
    "    - Yes but unlikely this will matter, as single character words will mostly be notation and will not convey much meaning\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "pd.set_option('display.max_colwidth', 0)\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "      <th>cat</th>\n",
       "      <th>authors_parsed</th>\n",
       "      <th>update_date</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>83715</th>\n",
       "      <td>Existence and uniqueness for the mild solution of the stochastic heat\\n  equation with non-Lipschitz drift on an unbounded spatial domain</td>\n",
       "      <td>We prove the existence and uniqueness of the mild solution for a nonlinear stochastic heat equation defined on an unbounded spatial domain. The nonlinearity is not assumed to be globally, or even locally, Lipschitz continuous. Instead the nonlinearity is assumed to satisfy a one-sided Lipschitz condition. First, a strengthened version of the Kolmogorov continuity theorem is introduced to prove that the stochastic convolutions of the fundamental solution of the heat equation and a spatially homogeneous noise grow no faster than polynomially. Second, a deterministic mapping that maps the stochastic convolution to the solution of the stochastic heat equation is proven to be Lipschitz continuous on polynomially weighted spaces of continuous functions. These two ingredients enable the formulation of a Picard iteration scheme to prove the existence and uniqueness of the mild solution.</td>\n",
       "      <td>[math.PR]</td>\n",
       "      <td>[['Salins', 'Michael', '']]</td>\n",
       "      <td>2021-02-12</td>\n",
       "      <td>2002.02016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88667</th>\n",
       "      <td>Subdivisional spaces and graph braid groups</td>\n",
       "      <td>We study the problem of computing the homology of the configuration spaces of a finite cell complex $X$. We proceed by viewing $X$, together with its subdivisions, as a subdivisional space--a kind of diagram object in a category of cell complexes. After developing a version of Morse theory for subdivisional spaces, we decompose $X$ and show that the homology of the configuration spaces of $X$ is computed by the derived tensor product of the Morse complexes of the pieces of the decomposition, an analogue of the monoidal excision property of factorization homology.   Applying this theory to the configuration spaces of a graph, we recover a cellular chain model due to \\'{S}wi\\k{a}tkowski. Our method of deriving this model enhances it with various convenient functorialities, exact sequences, and module structures, which we exploit in numerous computations, old and new.</td>\n",
       "      <td>[math.AT, math.GT]</td>\n",
       "      <td>[['An', 'Byung Hee', ''], ['Drummond-Cole', 'Gabriel C.', ''], ['Knudsen', 'Ben', '']]</td>\n",
       "      <td>2021-01-11</td>\n",
       "      <td>1708.02351</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43949</th>\n",
       "      <td>Fluctuations for linear eigenvalue statistics of sample covariance\\n  matrices</td>\n",
       "      <td>We prove a central limit theorem for the difference of linear eigenvalue statistics of a sample covariance matrix $\\widetilde{W}$ and its minor $W$. We find that the fluctuation of this difference is much smaller than those of the individual linear statistics, as a consequence of the strong correlation between the eigenvalues of $\\widetilde{W}$ and $W$. Our result identifies the fluctuation of the spatial derivative of the approximate Gaussian field in the recent paper by Dumitru and Paquette. Unlike in a similar result for Wigner matrices, for sample covariance matrices the fluctuation may entirely vanish.</td>\n",
       "      <td>[math.PR, math-ph, math.MP]</td>\n",
       "      <td>[['Cipolloni', 'Giorgio', ''], ['Erdős', 'László', '']]</td>\n",
       "      <td>2021-11-23</td>\n",
       "      <td>1806.08751</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18718</th>\n",
       "      <td>Existence of real algebraic hypersurfaces with many prescribed\\n  components</td>\n",
       "      <td>Given a real algebraic variety $X$ of dimension $n$, a very ample divisor $D$ on $X$ and a smooth closed hypersurface $\\Sigma$ of $\\mathbf{R}^n$, we construct real algebraic hypersurfaces in the linear system $|mD|$ whose real locus contains many connected components diffeomorphic to $\\Sigma$. As a consequence, we show the existence of real algebraic hypersurfaces in the linear system $|mD|$ whose Betti numbers grow by the maximal order, as $m$ goes to infinity. As another application, we recover a result by D. Gayet on the existence of many disjoint lagrangians with prescribed topology in any smooth complex hypersurface of $\\mathbf{C}\\mathbf{P}^n$. The results in the paper are proved more generally for complete intersections. The proof of our main result uses probabilistic tools.</td>\n",
       "      <td>[math.AG]</td>\n",
       "      <td>[['Ancona', 'Michele', '']]</td>\n",
       "      <td>2022-05-16</td>\n",
       "      <td>2205.06617</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>60483</th>\n",
       "      <td>Two lives: Compositions of unimodular rows</td>\n",
       "      <td>The paper lays the foundation for the study of unimodular rows using Spin groups. We show that elementary orbits of unimodular rows (of any length $n\\geq 3$) are equivalent to elementary Spin orbits on the unit sphere. (This bijection is true over all commutative rings). In the special case $n=3$, we get an interpretation of the Vaserstein symbol using Spin groups.   In addition, we introduce a new composition law that operates on certain subspaces of the underlying quadratic space (using the multiplication in composition algebras). In particular, the special case of split-quaternions leads to the composition of unimodular rows (discovered by L. Vaserstein and later generalized by W. van der Kallen). Strikingly, with this approach, we now see the possibility of new orbit structures not only for unimodular rows (using octonion multiplication) but also for more general quadratic spaces.</td>\n",
       "      <td>[math.RA, math.AC, math.RT]</td>\n",
       "      <td>[['Chintala', 'Vineeth', '']]</td>\n",
       "      <td>2021-07-28</td>\n",
       "      <td>2101.03862</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                           title  \\\n",
       "83715  Existence and uniqueness for the mild solution of the stochastic heat\\n  equation with non-Lipschitz drift on an unbounded spatial domain   \n",
       "88667  Subdivisional spaces and graph braid groups                                                                                                 \n",
       "43949  Fluctuations for linear eigenvalue statistics of sample covariance\\n  matrices                                                              \n",
       "18718  Existence of real algebraic hypersurfaces with many prescribed\\n  components                                                                \n",
       "60483  Two lives: Compositions of unimodular rows                                                                                                  \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   abstract  \\\n",
       "83715    We prove the existence and uniqueness of the mild solution for a nonlinear stochastic heat equation defined on an unbounded spatial domain. The nonlinearity is not assumed to be globally, or even locally, Lipschitz continuous. Instead the nonlinearity is assumed to satisfy a one-sided Lipschitz condition. First, a strengthened version of the Kolmogorov continuity theorem is introduced to prove that the stochastic convolutions of the fundamental solution of the heat equation and a spatially homogeneous noise grow no faster than polynomially. Second, a deterministic mapping that maps the stochastic convolution to the solution of the stochastic heat equation is proven to be Lipschitz continuous on polynomially weighted spaces of continuous functions. These two ingredients enable the formulation of a Picard iteration scheme to prove the existence and uniqueness of the mild solution.          \n",
       "88667    We study the problem of computing the homology of the configuration spaces of a finite cell complex $X$. We proceed by viewing $X$, together with its subdivisions, as a subdivisional space--a kind of diagram object in a category of cell complexes. After developing a version of Morse theory for subdivisional spaces, we decompose $X$ and show that the homology of the configuration spaces of $X$ is computed by the derived tensor product of the Morse complexes of the pieces of the decomposition, an analogue of the monoidal excision property of factorization homology.   Applying this theory to the configuration spaces of a graph, we recover a cellular chain model due to \\'{S}wi\\k{a}tkowski. Our method of deriving this model enhances it with various convenient functorialities, exact sequences, and module structures, which we exploit in numerous computations, old and new.                        \n",
       "43949    We prove a central limit theorem for the difference of linear eigenvalue statistics of a sample covariance matrix $\\widetilde{W}$ and its minor $W$. We find that the fluctuation of this difference is much smaller than those of the individual linear statistics, as a consequence of the strong correlation between the eigenvalues of $\\widetilde{W}$ and $W$. Our result identifies the fluctuation of the spatial derivative of the approximate Gaussian field in the recent paper by Dumitru and Paquette. Unlike in a similar result for Wigner matrices, for sample covariance matrices the fluctuation may entirely vanish.                                                                                                                                                                                                                                                                                               \n",
       "18718    Given a real algebraic variety $X$ of dimension $n$, a very ample divisor $D$ on $X$ and a smooth closed hypersurface $\\Sigma$ of $\\mathbf{R}^n$, we construct real algebraic hypersurfaces in the linear system $|mD|$ whose real locus contains many connected components diffeomorphic to $\\Sigma$. As a consequence, we show the existence of real algebraic hypersurfaces in the linear system $|mD|$ whose Betti numbers grow by the maximal order, as $m$ goes to infinity. As another application, we recover a result by D. Gayet on the existence of many disjoint lagrangians with prescribed topology in any smooth complex hypersurface of $\\mathbf{C}\\mathbf{P}^n$. The results in the paper are proved more generally for complete intersections. The proof of our main result uses probabilistic tools.                                                                                                              \n",
       "60483    The paper lays the foundation for the study of unimodular rows using Spin groups. We show that elementary orbits of unimodular rows (of any length $n\\geq 3$) are equivalent to elementary Spin orbits on the unit sphere. (This bijection is true over all commutative rings). In the special case $n=3$, we get an interpretation of the Vaserstein symbol using Spin groups.   In addition, we introduce a new composition law that operates on certain subspaces of the underlying quadratic space (using the multiplication in composition algebras). In particular, the special case of split-quaternions leads to the composition of unimodular rows (discovered by L. Vaserstein and later generalized by W. van der Kallen). Strikingly, with this approach, we now see the possibility of new orbit structures not only for unimodular rows (using octonion multiplication) but also for more general quadratic spaces.    \n",
       "\n",
       "                               cat  \\\n",
       "83715  [math.PR]                     \n",
       "88667  [math.AT, math.GT]            \n",
       "43949  [math.PR, math-ph, math.MP]   \n",
       "18718  [math.AG]                     \n",
       "60483  [math.RA, math.AC, math.RT]   \n",
       "\n",
       "                                                                               authors_parsed  \\\n",
       "83715  [['Salins', 'Michael', '']]                                                              \n",
       "88667  [['An', 'Byung Hee', ''], ['Drummond-Cole', 'Gabriel C.', ''], ['Knudsen', 'Ben', '']]   \n",
       "43949  [['Cipolloni', 'Giorgio', ''], ['Erdős', 'László', '']]                                  \n",
       "18718  [['Ancona', 'Michele', '']]                                                              \n",
       "60483  [['Chintala', 'Vineeth', '']]                                                            \n",
       "\n",
       "      update_date          id  \n",
       "83715 2021-02-12   2002.02016  \n",
       "88667 2021-01-11   1708.02351  \n",
       "43949 2021-11-23   1806.08751  \n",
       "18718 2022-05-16   2205.06617  \n",
       "60483 2021-07-28   2101.03862  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## 1. What is the state of the saved data?\n",
    "\n",
    "# Load the saved arXiv dataset and inspect five random rows.\n",
    "data = pd.read_parquet('./data/arXiv.parquet')\n",
    "data.sample(5)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Abstracts have had `\\n` removed, but are otherwise unmodified.\n",
    "- Titles have not been modified at all.\n",
    "- Categories have been one-hot encoded (OHE) in another file.\n",
    "\n",
    "Below: clean the dataset and add a new column containing all hyphenated keywords found in the cleaned titles and cleaned abstracts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "      <th>cat</th>\n",
       "      <th>authors_parsed</th>\n",
       "      <th>update_date</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>165063</th>\n",
       "      <td>Injectivity, crossed products, and amenable group actions</td>\n",
       "      <td>This paper is motivated primarily by the question of when the maximal and reduced crossed products of a $G$-$C^*$-algebra agree (particularly inspired by results of Matsumura and Suzuki), and the relationships with various notions of amenability and injectivity. We give new connections between these notions. Key tools in this include the natural equivariant analogues of injectivity, and of Lance's weak expectation property: we also give complete characterizations of these equivariant properties, and some connections with injective envelopes in the sense of Hamana.</td>\n",
       "      <td>[math.OA]</td>\n",
       "      <td>[['Buss', 'Alcides', ''], ['Echterhoff', 'Siegfried', ''], ['Willett', 'Rufus', '']]</td>\n",
       "      <td>2019-04-30</td>\n",
       "      <td>1904.06771</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47016</th>\n",
       "      <td>Online Optimization with Feedback Delay and Nonlinear Switching Cost</td>\n",
       "      <td>We study a variant of online optimization in which the learner receives $k$-round $\\textit{delayed feedback}$ about hitting cost and there is a multi-step nonlinear switching cost, i.e., costs depend on multiple previous actions in a nonlinear manner. Our main result shows that a novel Iterative Regularized Online Balanced Descent (iROBD) algorithm has a constant, dimension-free competitive ratio that is $O(L^{2k})$, where $L$ is the Lipschitz constant of the switching cost. Additionally, we provide lower bounds that illustrate the Lipschitz condition is required and the dependencies on $k$ and $L$ are tight. Finally, via reductions, we show that this setting is closely related to online control problems with delay, nonlinear dynamics, and adversarial disturbances, where iROBD directly offers constant-competitive online policies.</td>\n",
       "      <td>[cs.LG, cs.SY, eess.SY, math.OC]</td>\n",
       "      <td>[['Pan', 'Weici', ''], ['Shi', 'Guanya', ''], ['Lin', 'Yiheng', ''], ['Wierman', 'Adam', '']]</td>\n",
       "      <td>2021-11-02</td>\n",
       "      <td>2111.00095</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>146120</th>\n",
       "      <td>Integrable systems and Special K\\\"ahler metrics</td>\n",
       "      <td>We describe the Special K\\\"ahler structure on the base of the so-called Hitchin system in terms of the geometry of the space of spectral curves. It yields a simple formula for the K\\\"ahler potential. This extends to the case of a singular spectral curve and we show that this defines the Special K\\\"ahler structure on certain natural integrable subsystems. Examples include the extreme case where the metric is flat.</td>\n",
       "      <td>[math.DG]</td>\n",
       "      <td>[['Hitchin', 'Nigel', '']]</td>\n",
       "      <td>2019-10-14</td>\n",
       "      <td>1910.05170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75448</th>\n",
       "      <td>Regularity theorem for totally nonnegative flag varieties</td>\n",
       "      <td>We show that the totally nonnegative part of a partial flag variety $G/P$ (in the sense of Lusztig) is a regular CW complex, confirming a conjecture of Williams. In particular, the closure of each positroid cell inside the totally nonnegative Grassmannian is homeomorphic to a ball, confirming a conjecture of Postnikov.</td>\n",
       "      <td>[math.CO, math.AG, math.GT, math.RT]</td>\n",
       "      <td>[['Galashin', 'Pavel', ''], ['Karp', 'Steven N.', ''], ['Lam', 'Thomas', '']]</td>\n",
       "      <td>2021-04-13</td>\n",
       "      <td>1904.00527</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98744</th>\n",
       "      <td>Federated Principal Component Analysis</td>\n",
       "      <td>We present a federated, asynchronous, and $(\\varepsilon, \\delta)$-differentially private algorithm for PCA in the memory-limited setting. Our algorithm incrementally computes local model updates using a streaming procedure and adaptively estimates its $r$ leading principal components when only $\\mathcal{O}(dr)$ memory is available with $d$ being the dimensionality of the data. We guarantee differential privacy via an input-perturbation scheme in which the covariance matrix of a dataset $\\mathbf{X} \\in \\mathbb{R}^{d \\times n}$ is perturbed with a non-symmetric random Gaussian matrix with variance in $\\mathcal{O}\\left(\\left(\\frac{d}{n}\\right)^2 \\log d \\right)$, thus improving upon the state-of-the-art. Furthermore, contrary to previous federated or distributed algorithms for PCA, our algorithm is also invariant to permutations in the incoming data, which provides robustness against straggler or failed nodes. Numerical simulations show that, while using limited-memory, our algorithm exhibits performance that closely matches or outperforms traditional non-federated algorithms, and in the absence of communication latency, it exhibits attractive horizontal scalability.</td>\n",
       "      <td>[cs.LG, cs.IT, math.IT, stat.ML]</td>\n",
       "      <td>[['Grammenos', 'Andreas', ''], ['Mendoza-Smith', 'Rodrigo', ''], ['Crowcroft', 'Jon', ''], ['Mascolo', 'Cecilia', '']]</td>\n",
       "      <td>2020-10-26</td>\n",
       "      <td>1907.08059</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72148</th>\n",
       "      <td>Approaching optimality in blow-up results for Keller-Segel systems with   logistic-type dampening</td>\n",
       "      <td>Nonnegative solutions of the Neumann initial-boundary value problem for the chemotaxis system \\begin{align}\\label{prob:star}\\tag{$\\star$} \\begin{cases} u_t = \\Delta u - \\nabla \\cdot (u \\nabla v) + \\lambda u - \\mu u^\\kappa, \\\\\\\\ 0 = \\Delta v - \\overline m(t) + u, \\quad \\overline m(t) = \\frac1{|\\Omega|} \\int_\\Omega u(\\cdot, t) \\end{cases} \\end{align} in smooth bounded domains $\\Omega \\subset \\mathbb R^n$, $n \\ge 1$, are known to be global-in-time if $\\lambda \\geq 0$, $\\mu &gt; 0$ and $\\kappa &gt; 2$.   In the present work, we show that the exponent $\\kappa = 2$ is actually critical in the four- and higher dimensional setting. More precisely, if \\begin{alignat*}{3} \\qquad n &amp;\\geq 4, &amp;&amp;\\quad \\kappa \\in (1, 2) \\quad &amp;&amp;\\text{and} \\quad \\mu &gt; 0 \\\\\\\\ \\text{or}\\qquad n &amp;\\geq 5, &amp;&amp;\\quad \\kappa = 2 \\quad &amp;&amp;\\text{and} \\quad \\mu \\in \\left(0, \\frac{n-4}{n}\\right), \\end{alignat*} for balls $\\Omega \\subset \\mathbb R^n$ and parameters $\\lambda \\geq 0$, $m_0 &gt; 0$, we construct a nonnegative initial datum $u_0 \\in C^0(\\overline \\Omega)$ with $\\int_\\Omega u_0 = m_0$ for which the corresponding solution $(u, v)$ of \\eqref{prob:star} blows up in finite time. Moreover, in 3D, we obtain finite-time blow-up for $\\kappa \\in (1, \\frac32)$ (and $\\lambda \\geq 0$, $\\mu &gt; 0$).   As the corner stone of our analysis, for certain initial data, we prove that the mass accumulation function $w(s, t) = \\int_0^{\\sqrt[n]{s}} \\rho^{n-1} u(\\rho, t) \\,\\mathrm d\\rho$ fulfills the estimate $w_s \\le \\frac{w}{s}$. Using this information, we then obtain finite-time blow-up of $u$ by showing that for suitably chosen initial data, $s_0$ and $\\gamma$, the function $\\phi(t) = \\int_0^{s_0} s^{-\\gamma} (s_0 - s) w(s, t)$ cannot exist globally.</td>\n",
       "      <td>[math.AP]</td>\n",
       "      <td>[['Fuest', 'Mario', '']]</td>\n",
       "      <td>2021-05-10</td>\n",
       "      <td>2007.01184</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95154</th>\n",
       "      <td>A Whitney type theorem for surfaces: characterising graphs with locally   planar embeddings</td>\n",
       "      <td>We prove that for any parameter r an r-locally 2-connected graph G embeds r-locally planarly in a surface if and only if a certain matroid associated to the graph G is co-graphic.   This extends Whitney's abstract planar duality theorem from 1932.</td>\n",
       "      <td>[math.CO]</td>\n",
       "      <td>[['Carmesin', 'Johannes', '']]</td>\n",
       "      <td>2020-11-20</td>\n",
       "      <td>2008.03027</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>130579</th>\n",
       "      <td>Rigorous bounds on the heat transport of rotating convection with Ekman   pumping</td>\n",
       "      <td>We establish rigorous upper bounds on the time-averaged heat transport for a model of rotating Rayleigh-Benard convection between no-slip boundaries at infinite Prandtl number and with Ekman pumping. The analysis is based on the asymptotically reduced equations derived for rotationally constrained dynamics with no-slip boundaries, and hence includes a lower order correction that accounts for the Ekman layer and corresponding Ekman pumping into the bulk. Using the auxiliary functional method we find that, to leading order, the temporally averaged heat transport is bounded above as a function of the Rayleigh and Ekman numbers Ra and Ek according to $Nu \\leq 0.3704 Ra^2 Ek^2$. Dependent on the relative values of the thermal forcing represented by $Ra$ and the effects of rotation represented by $Ek$, this bound is both an improvement on earlier rigorous upper bounds, and provides a partial explanation of recent numerical and experimental results that were consistent yet surprising relative to the previously derived upper bound of $Nu \\lesssim Ra^3 k^4$.</td>\n",
       "      <td>[math-ph, math.MP, physics.flu-dyn]</td>\n",
       "      <td>[['Pachev', 'B.', ''], ['Whitehead', 'J. P.', ''], ['Fantuzzi', 'G.', ''], ['Grooms', 'I.', '']]</td>\n",
       "      <td>2020-02-19</td>\n",
       "      <td>1910.13588</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>115850</th>\n",
       "      <td>Massey products in the homology of the loopspace of a p-completed   classifying space: finite groups with cyclic Sylow p-subgroups</td>\n",
       "      <td>Let G be a finite group with cyclic Sylow p-subgroup, and let k be a field of characteristic p. Then H^*(BG;k) and H_*(\\Omega BG\\phat;k) are A_{\\infty} algebras whose structure we determine up to quasi-isomorphism.</td>\n",
       "      <td>[math.RT]</td>\n",
       "      <td>[['Greenlees', 'John', ''], ['Benson', 'Dave', '']]</td>\n",
       "      <td>2020-06-15</td>\n",
       "      <td>2006.07160</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95087</th>\n",
       "      <td>List-Decodable Mean Estimation in Nearly-PCA Time</td>\n",
       "      <td>Traditionally, robust statistics has focused on designing estimators tolerant to a minority of contaminated data. Robust list-decodable learning focuses on the more challenging regime where only a minority $\\frac 1 k$ fraction of the dataset is drawn from the distribution of interest, and no assumptions are made on the remaining data. We study the fundamental task of list-decodable mean estimation in high dimensions. Our main result is a new list-decodable mean estimation algorithm for bounded covariance distributions with optimal sample complexity and error rate, running in nearly-PCA time. Assuming the ground truth distribution on $\\mathbb{R}^d$ has bounded covariance, our algorithm outputs a list of $O(k)$ candidate means, one of which is within distance $O(\\sqrt{k})$ from the truth. Our algorithm runs in time $\\widetilde{O}(ndk)$ for all $k = O(\\sqrt{d}) \\cup \\Omega(d)$, where $n$ is the size of the dataset. We also show that a variant of our algorithm has runtime $\\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\\sqrt{\\log k})$ factor in the recovery guarantee. This runtime matches up to logarithmic factors the cost of performing a single $k$-PCA on the data, which is a natural bottleneck of known algorithms for (very) special cases of our problem, such as clustering well-separated mixtures. Prior to our work, the fastest list-decodable mean estimation algorithms had runtimes $\\widetilde{O}(n^2 d k^2)$ and $\\widetilde{O}(nd k^{\\ge 6})$.   Our approach builds on a novel soft downweighting method, $\\mathsf{SIFT}$, which is arguably the simplest known polynomial-time mean estimation technique in the list-decodable learning setting. To develop our fast algorithms, we boost the computational cost of $\\mathsf{SIFT}$ via a careful \"win-win-win\" analysis of an approximate Ky Fan matrix multiplicative weights procedure we develop, which we believe may be of independent interest.</td>\n",
       "      <td>[cs.DS, cs.LG, math.OC, stat.ML]</td>\n",
       "      <td>[['Diakonikolas', 'Ilias', ''], ['Kane', 'Daniel M.', ''], ['Kongsgaard', 'Daniel', ''], ['Li', 'Jerry', ''], ['Tian', 'Kevin', '']]</td>\n",
       "      <td>2020-11-20</td>\n",
       "      <td>2011.09973</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                     title  \\\n",
       "165063  Injectivity, crossed products, and amenable group actions                                                                            \n",
       "47016   Online Optimization with Feedback Delay and Nonlinear Switching Cost                                                                 \n",
       "146120  Integrable systems and Special K\\\"ahler metrics                                                                                      \n",
       "75448   Regularity theorem for totally nonnegative flag varieties                                                                            \n",
       "98744   Federated Principal Component Analysis                                                                                               \n",
       "72148   Approaching optimality in blow-up results for Keller-Segel systems with   logistic-type dampening                                    \n",
       "95154   A Whitney type theorem for surfaces: characterising graphs with locally   planar embeddings                                          \n",
       "130579  Rigorous bounds on the heat transport of rotating convection with Ekman   pumping                                                    \n",
       "115850  Massey products in the homology of the loopspace of a p-completed   classifying space: finite groups with cyclic Sylow p-subgroups   \n",
       "95087   List-Decodable Mean Estimation in Nearly-PCA Time                                                                                    \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              abstract  \\\n",
       "165063    This paper is motivated primarily by the question of when the maximal and reduced crossed products of a $G$-$C^*$-algebra agree (particularly inspired by results of Matsumura and Suzuki), and the relationships with various notions of amenability and injectivity. We give new connections between these notions. Key tools in this include the natural equivariant analogues of injectivity, and of Lance's weak expectation property: we also give complete characterizations of these equivariant properties, and some connections with injective envelopes in the sense of Hamana.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "47016     We study a variant of online optimization in which the learner receives $k$-round $\\textit{delayed feedback}$ about hitting cost and there is a multi-step nonlinear switching cost, i.e., costs depend on multiple previous actions in a nonlinear manner. Our main result shows that a novel Iterative Regularized Online Balanced Descent (iROBD) algorithm has a constant, dimension-free competitive ratio that is $O(L^{2k})$, where $L$ is the Lipschitz constant of the switching cost. Additionally, we provide lower bounds that illustrate the Lipschitz condition is required and the dependencies on $k$ and $L$ are tight. Finally, via reductions, we show that this setting is closely related to online control problems with delay, nonlinear dynamics, and adversarial disturbances, where iROBD directly offers constant-competitive online policies.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "146120    We describe the Special K\\\"ahler structure on the base of the so-called Hitchin system in terms of the geometry of the space of spectral curves. It yields a simple formula for the K\\\"ahler potential. This extends to the case of a singular spectral curve and we show that this defines the Special K\\\"ahler structure on certain natural integrable subsystems. Examples include the extreme case where the metric is flat.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               \n",
       "75448     We show that the totally nonnegative part of a partial flag variety $G/P$ (in the sense of Lusztig) is a regular CW complex, confirming a conjecture of Williams. In particular, the closure of each positroid cell inside the totally nonnegative Grassmannian is homeomorphic to a ball, confirming a conjecture of Postnikov.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               \n",
       "98744     We present a federated, asynchronous, and $(\\varepsilon, \\delta)$-differentially private algorithm for PCA in the memory-limited setting. Our algorithm incrementally computes local model updates using a streaming procedure and adaptively estimates its $r$ leading principal components when only $\\mathcal{O}(dr)$ memory is available with $d$ being the dimensionality of the data. We guarantee differential privacy via an input-perturbation scheme in which the covariance matrix of a dataset $\\mathbf{X} \\in \\mathbb{R}^{d \\times n}$ is perturbed with a non-symmetric random Gaussian matrix with variance in $\\mathcal{O}\\left(\\left(\\frac{d}{n}\\right)^2 \\log d \\right)$, thus improving upon the state-of-the-art. Furthermore, contrary to previous federated or distributed algorithms for PCA, our algorithm is also invariant to permutations in the incoming data, which provides robustness against straggler or failed nodes. Numerical simulations show that, while using limited-memory, our algorithm exhibits performance that closely matches or outperforms traditional non-federated algorithms, and in the absence of communication latency, it exhibits attractive horizontal scalability.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "72148     Nonnegative solutions of the Neumann initial-boundary value problem for the chemotaxis system \\begin{align}\\label{prob:star}\\tag{$\\star$} \\begin{cases} u_t = \\Delta u - \\nabla \\cdot (u \\nabla v) + \\lambda u - \\mu u^\\kappa, \\\\\\\\ 0 = \\Delta v - \\overline m(t) + u, \\quad \\overline m(t) = \\frac1{|\\Omega|} \\int_\\Omega u(\\cdot, t) \\end{cases} \\end{align} in smooth bounded domains $\\Omega \\subset \\mathbb R^n$, $n \\ge 1$, are known to be global-in-time if $\\lambda \\geq 0$, $\\mu > 0$ and $\\kappa > 2$.   In the present work, we show that the exponent $\\kappa = 2$ is actually critical in the four- and higher dimensional setting. More precisely, if \\begin{alignat*}{3} \\qquad n &\\geq 4, &&\\quad \\kappa \\in (1, 2) \\quad &&\\text{and} \\quad \\mu > 0 \\\\\\\\ \\text{or}\\qquad n &\\geq 5, &&\\quad \\kappa = 2 \\quad &&\\text{and} \\quad \\mu \\in \\left(0, \\frac{n-4}{n}\\right), \\end{alignat*} for balls $\\Omega \\subset \\mathbb R^n$ and parameters $\\lambda \\geq 0$, $m_0 > 0$, we construct a nonnegative initial datum $u_0 \\in C^0(\\overline \\Omega)$ with $\\int_\\Omega u_0 = m_0$ for which the corresponding solution $(u, v)$ of \\eqref{prob:star} blows up in finite time. Moreover, in 3D, we obtain finite-time blow-up for $\\kappa \\in (1, \\frac32)$ (and $\\lambda \\geq 0$, $\\mu > 0$).   As the corner stone of our analysis, for certain initial data, we prove that the mass accumulation function $w(s, t) = \\int_0^{\\sqrt[n]{s}} \\rho^{n-1} u(\\rho, t) \\,\\mathrm d\\rho$ fulfills the estimate $w_s \\le \\frac{w}{s}$. Using this information, we then obtain finite-time blow-up of $u$ by showing that for suitably chosen initial data, $s_0$ and $\\gamma$, the function $\\phi(t) = \\int_0^{s_0} s^{-\\gamma} (s_0 - s) w(s, t)$ cannot exist globally.                                                                                                                                                                               
                               \n",
       "95154     We prove that for any parameter r an r-locally 2-connected graph G embeds r-locally planarly in a surface if and only if a certain matroid associated to the graph G is co-graphic.   This extends Whitney's abstract planar duality theorem from 1932.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        \n",
       "130579    We establish rigorous upper bounds on the time-averaged heat transport for a model of rotating Rayleigh-Benard convection between no-slip boundaries at infinite Prandtl number and with Ekman pumping. The analysis is based on the asymptotically reduced equations derived for rotationally constrained dynamics with no-slip boundaries, and hence includes a lower order correction that accounts for the Ekman layer and corresponding Ekman pumping into the bulk. Using the auxiliary functional method we find that, to leading order, the temporally averaged heat transport is bounded above as a function of the Rayleigh and Ekman numbers Ra and Ek according to $Nu \\leq 0.3704 Ra^2 Ek^2$. Dependent on the relative values of the thermal forcing represented by $Ra$ and the effects of rotation represented by $Ek$, this bound is both an improvement on earlier rigorous upper bounds, and provides a partial explanation of recent numerical and experimental results that were consistent yet surprising relative to the previously derived upper bound of $Nu \\lesssim Ra^3 k^4$.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "115850    Let G be a finite group with cyclic Sylow p-subgroup, and let k be a field of characteristic p. Then H^*(BG;k) and H_*(\\Omega BG\\phat;k) are A_{\\infty} algebras whose structure we determine up to quasi-isomorphism.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         \n",
       "95087     Traditionally, robust statistics has focused on designing estimators tolerant to a minority of contaminated data. Robust list-decodable learning focuses on the more challenging regime where only a minority $\\frac 1 k$ fraction of the dataset is drawn from the distribution of interest, and no assumptions are made on the remaining data. We study the fundamental task of list-decodable mean estimation in high dimensions. Our main result is a new list-decodable mean estimation algorithm for bounded covariance distributions with optimal sample complexity and error rate, running in nearly-PCA time. Assuming the ground truth distribution on $\\mathbb{R}^d$ has bounded covariance, our algorithm outputs a list of $O(k)$ candidate means, one of which is within distance $O(\\sqrt{k})$ from the truth. Our algorithm runs in time $\\widetilde{O}(ndk)$ for all $k = O(\\sqrt{d}) \\cup \\Omega(d)$, where $n$ is the size of the dataset. We also show that a variant of our algorithm has runtime $\\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\\sqrt{\\log k})$ factor in the recovery guarantee. This runtime matches up to logarithmic factors the cost of performing a single $k$-PCA on the data, which is a natural bottleneck of known algorithms for (very) special cases of our problem, such as clustering well-separated mixtures. Prior to our work, the fastest list-decodable mean estimation algorithms had runtimes $\\widetilde{O}(n^2 d k^2)$ and $\\widetilde{O}(nd k^{\\ge 6})$.   Our approach builds on a novel soft downweighting method, $\\mathsf{SIFT}$, which is arguably the simplest known polynomial-time mean estimation technique in the list-decodable learning setting. To develop our fast algorithms, we boost the computational cost of $\\mathsf{SIFT}$ via a careful \"win-win-win\" analysis of an approximate Ky Fan matrix multiplicative weights procedure we develop, which we believe may be of independent interest.    \n",
       "\n",
       "                                         cat  \\\n",
       "165063  [math.OA]                              \n",
       "47016   [cs.LG, cs.SY, eess.SY, math.OC]       \n",
       "146120  [math.DG]                              \n",
       "75448   [math.CO, math.AG, math.GT, math.RT]   \n",
       "98744   [cs.LG, cs.IT, math.IT, stat.ML]       \n",
       "72148   [math.AP]                              \n",
       "95154   [math.CO]                              \n",
       "130579  [math-ph, math.MP, physics.flu-dyn]    \n",
       "115850  [math.RT]                              \n",
       "95087   [cs.DS, cs.LG, math.OC, stat.ML]       \n",
       "\n",
       "                                                                                                                              authors_parsed  \\\n",
       "165063  [['Buss', 'Alcides', ''], ['Echterhoff', 'Siegfried', ''], ['Willett', 'Rufus', '']]                                                   \n",
       "47016   [['Pan', 'Weici', ''], ['Shi', 'Guanya', ''], ['Lin', 'Yiheng', ''], ['Wierman', 'Adam', '']]                                          \n",
       "146120  [['Hitchin', 'Nigel', '']]                                                                                                             \n",
       "75448   [['Galashin', 'Pavel', ''], ['Karp', 'Steven N.', ''], ['Lam', 'Thomas', '']]                                                          \n",
       "98744   [['Grammenos', 'Andreas', ''], ['Mendoza-Smith', 'Rodrigo', ''], ['Crowcroft', 'Jon', ''], ['Mascolo', 'Cecilia', '']]                 \n",
       "72148   [['Fuest', 'Mario', '']]                                                                                                               \n",
       "95154   [['Carmesin', 'Johannes', '']]                                                                                                         \n",
       "130579  [['Pachev', 'B.', ''], ['Whitehead', 'J. P.', ''], ['Fantuzzi', 'G.', ''], ['Grooms', 'I.', '']]                                       \n",
       "115850  [['Greenlees', 'John', ''], ['Benson', 'Dave', '']]                                                                                    \n",
       "95087   [['Diakonikolas', 'Ilias', ''], ['Kane', 'Daniel M.', ''], ['Kongsgaard', 'Daniel', ''], ['Li', 'Jerry', ''], ['Tian', 'Kevin', '']]   \n",
       "\n",
       "       update_date          id  \n",
       "165063 2019-04-30   1904.06771  \n",
       "47016  2021-11-02   2111.00095  \n",
       "146120 2019-10-14   1910.05170  \n",
       "75448  2021-04-13   1904.00527  \n",
       "98744  2020-10-26   1907.08059  \n",
       "72148  2021-05-10   2007.01184  \n",
       "95154  2020-11-20   2008.03027  \n",
       "130579 2020-02-19   1910.13588  \n",
       "115850 2020-06-15   2006.07160  \n",
       "95087  2020-11-20   2011.09973  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Replace newline characters in titles with spaces.\n",
    "\n",
    "data['title'] = data.title.str.replace('\\n',' ')\n",
    "data.sample(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Run the cleaning pipeline (see above) on the title and abstract columns.\n",
    "\n",
    "data['clean_title'] = data.title.apply(cleanse)\n",
    "data['clean_abstract'] = data.abstract.apply(cleanse)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Find hyphenated keywords in the titles and abstracts\n",
    "\n",
    "# Two or more runs of word characters joined by single hyphens (e.g. off-the-shelf),\n",
    "# with no hyphen immediately before or after the match.\n",
    "pattern = r'(?<!-)\\b(?:\\w+)(?=-)(?:-(?=\\w)\\w+)+(?!-)\\b'\n",
    "\n",
    "def find_hyph(text):\n",
    "    \"\"\"Return the hyphenated words found in text, or None if there are none.\"\"\"\n",
    "    keywords = regex.findall(pattern, text)\n",
    "    return keywords or None\n",
    "\n",
    "data['hyph_in_title'] = data.clean_title.apply(find_hyph)\n",
    "data['hyph_in_abstract'] = data.clean_abstract.apply(find_hyph)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>abstract</th>\n",
       "      <th>cat</th>\n",
       "      <th>authors_parsed</th>\n",
       "      <th>update_date</th>\n",
       "      <th>id</th>\n",
       "      <th>clean_title</th>\n",
       "      <th>clean_abstract</th>\n",
       "      <th>hyph_in_title</th>\n",
       "      <th>hyph_in_abstract</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>109208</th>\n",
       "      <td>Consistency of Variational Bayes Inference for Estimation and Model   Selection in Mixtures</td>\n",
       "      <td>Mixture models are widely used in Bayesian statistics and machine learning, in particular in computational biology, natural language processing and many other fields. Variational inference, a technique for approximating intractable posteriors thanks to optimization algorithms, is extremely popular in practice when dealing with complex models such as mixtures. The contribution of this paper is two-fold. First, we study the concentration of variational approximations of posteriors, which is still an open problem for general mixtures, and we derive consistency and rates of convergence. We also tackle the problem of model selection for the number of components: we study the approach already used in practice, which consists in maximizing a numerical criterion (the Evidence Lower Bound). We prove that this strategy indeed leads to strong oracle inequalities. We illustrate our theoretical results by applications to Gaussian and multinomial mixtures.</td>\n",
       "      <td>[math.ST, stat.CO, stat.ME, stat.TH]</td>\n",
       "      <td>[['Chérief-Abdellatif', 'Badr-Eddine', ''], ['Alquier', 'Pierre', '']]</td>\n",
       "      <td>2020-08-03</td>\n",
       "      <td>1805.05054</td>\n",
       "      <td>Consistency of Variational Bayes Inference for Estimation and Model   Selection in Mixtures</td>\n",
       "      <td>Mixture models are widely used in Bayesian statistics and machine learning, in particular in computational biology, natural language processing and many other fields. Variational inference, a technique for approximating intractable posteriors thanks to optimization algorithms, is extremely popular in practice when dealing with complex models such as mixtures. The contribution of this paper is two-fold. First, we study the concentration of variational approximations of posteriors, which is still an open problem for general mixtures, and we derive consistency and rates of convergence. We also tackle the problem of model selection for the number of components: we study the approach already used in practice, which consists in maximizing a numerical criterion (the Evidence Lower Bound). We prove that this strategy indeed leads to strong oracle inequalities. We illustrate our theoretical results by applications to Gaussian and multinomial mixtures.</td>\n",
       "      <td>None</td>\n",
       "      <td>[two-fold]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167970</th>\n",
       "      <td>Data Amplification: A Unified and Competitive Approach to Property   Estimation</td>\n",
       "      <td>Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying distributions uses just $2n$ samples to achieve the performance attained by the empirical estimator with $n\\sqrt{\\log n}$ samples. This provides off-the-shelf, distribution-independent, \"amplification\" of the amount of data available relative to common-practice estimators.   We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. In most cases, its performance with $n$ samples is even as good as that of the empirical estimator with $n\\log n$ samples, and for essentially all properties, its performance is comparable to that of the best existing estimator designed specifically for that property.</td>\n",
       "      <td>[stat.ML, cs.LG, math.ST, stat.TH]</td>\n",
       "      <td>[['Hao', 'Yi', ''], ['Orlitsky', 'Alon', ''], ['Suresh', 'Ananda T.', ''], ['Wu', 'Yihong', '']]</td>\n",
       "      <td>2019-04-02</td>\n",
       "      <td>1904.00070</td>\n",
       "      <td>Data Amplification: A Unified and Competitive Approach to Property   Estimation</td>\n",
       "      <td>Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying distributions uses just LATEX  samples to achieve the performance attained by the empirical estimator with LATEX  samples. This provides off-the-shelf, distribution-independent, \"amplification\" of the amount of data available relative to common-practice estimators.   We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. In most cases, its performance with LATEX  samples is even as good as that of the empirical estimator with LATEX  samples, and for essentially all properties, its performance is comparable to that of the best existing estimator designed specifically for that property.</td>\n",
       "      <td>None</td>\n",
       "      <td>[linear-time, off-the-shelf, distribution-independent, common-practice]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83749</th>\n",
       "      <td>A Novel Trick to Overcome the Phase Space Volume Change and the Use of   Hamiltonian Trajectories with an emphasis on the Free Expansion</td>\n",
       "      <td>We extend and successfully apply a recently proposed microstate nonequilibrium thermodynamics to study expansion/contraction processes. Here, the numbers of initial and final microstates are different so they cannot be connected by unique Hamiltonian trajectories. This commonly happens when the phase space volume changes, and has not been studied so far using Hamiltonian trajectories that can be inverted to yield an identity mapping between initial and final microstates as the parameter in the Hamiltonian is changed. We propose a trick to overcome this hurdle with a focus on free expansion in an isolated system, where the concept of dissipated work is not clear. The trick is shown to be thermodynamically consistent and can be extremely useful in simulation. We justify that it is the thermodynamic average of the internal microwork done by a microstate that is dissipated; this microwork is different from the exchange microwork with the vacuum, which vanishes. We also establish that the microwork is nonnegative for free expansion, which is remarkable, since its sign is not fixed in a general process.</td>\n",
       "      <td>[cond-mat.stat-mech, cond-mat.mes-hall, math-ph, math.MP, physics.comp-ph]</td>\n",
       "      <td>[['Gujrati', 'P. D.', '']]</td>\n",
       "      <td>2021-02-12</td>\n",
       "      <td>2102.06122</td>\n",
       "      <td>A Novel Trick to Overcome the Phase Space Volume Change and the Use of   Hamiltonian Trajectories with an emphasis on the Free Expansion</td>\n",
       "      <td>We extend and successfully apply a recently proposed microstate nonequilibrium thermodynamics to study expansion/contraction processes. Here, the numbers of initial and final microstates are different so they cannot be connected by unique Hamiltonian trajectories. This commonly happens when the phase space volume changes, and has not been studied so far using Hamiltonian trajectories that can be inverted to yield an identity mapping between initial and final microstates as the parameter in the Hamiltonian is changed. We propose a trick to overcome this hurdle with a focus on free expansion in an isolated system, where the concept of dissipated work is not clear. The trick is shown to be thermodynamically consistent and can be extremely useful in simulation. We justify that it is the thermodynamic average of the internal microwork done by a microstate that is dissipated; this microwork is different from the exchange microwork with the vacuum, which vanishes. We also establish that the microwork is nonnegative for free expansion, which is remarkable, since its sign is not fixed in a general process.</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>127567</th>\n",
       "      <td>2-Local derivations on the W-algebra W(2,2)</td>\n",
       "      <td>The present paper is devoted to study 2-local derivations on W-algebra $W(2,2)$ which is an infinite-dimensional Lie algebras with some out derivations. We prove that all 2-local derivations on the W-algebra $W(2,2)$ are derivation. We also give a complete classification of the 2-local derivation on the so called thin Lie algebra and prove that it admits a lots of 2-local derivations which are not derivations.</td>\n",
       "      <td>[math.RA]</td>\n",
       "      <td>[['Tang', 'Xiaomin', '']]</td>\n",
       "      <td>2020-03-13</td>\n",
       "      <td>2003.05627</td>\n",
       "      <td>2-Local derivations on the W-algebra W(2,2)</td>\n",
       "      <td>The present paper is devoted to study 2-local derivations on W-algebra LATEX  which is an infinite-dimensional Lie algebras with some out derivations. We prove that all 2-local derivations on the W-algebra LATEX  are derivation. We also give a complete classification of the 2-local derivation on the so called thin Lie algebra and prove that it admits a lots of 2-local derivations which are not derivations.</td>\n",
       "      <td>[2-Local, W-algebra]</td>\n",
       "      <td>[2-local, W-algebra, infinite-dimensional, 2-local, W-algebra, 2-local, 2-local]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42123</th>\n",
       "      <td>Rees algebra and special fiber ring of binomial edge ideals of closed   graphs</td>\n",
       "      <td>In this article, we compute the regularity of Rees algebra of binomial edge ideals of closed graphs. We obtain a lower bound for the regularity of Rees algebra of binomial edge ideals. We also study some algebraic properties of the Rees algebra and special fiber ring of binomial edge ideals of closed graphs via algebraic properties of their initial algebra and Sagbi basis theory. We obtain an upper bound for the regularity of the special fiber ring of binomial edge ideals of closed graphs.</td>\n",
       "      <td>[math.AC]</td>\n",
       "      <td>[['Kumar', 'Arvind', '']]</td>\n",
       "      <td>2021-12-07</td>\n",
       "      <td>2102.03348</td>\n",
       "      <td>Rees algebra and special fiber ring of binomial edge ideals of closed   graphs</td>\n",
       "      <td>In this article, we compute the regularity of Rees algebra of binomial edge ideals of closed graphs. We obtain a lower bound for the regularity of Rees algebra of binomial edge ideals. We also study some algebraic properties of the Rees algebra and special fiber ring of binomial edge ideals of closed graphs via algebraic properties of their initial algebra and Sagbi basis theory. We obtain an upper bound for the regularity of the special fiber ring of binomial edge ideals of closed graphs.</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45450</th>\n",
       "      <td>Unbalanced spanning subgraphs in edge labeled complete graphs</td>\n",
       "      <td>Let $K$ be a complete graph of order $n$. For $d\\in (0,1)$, let $c$ be a $\\pm 1$-edge labeling of $K$ such that there are $d{n\\choose 2}$ edges with label $+1$, and let $G$ be a spanning subgraph of $K$ of maximum degree at most $\\Delta$. We prove the existence of an isomorphic copy $G'$ of $G$ in $K$ such that the number of edges with label $+1$ in $G'$ is at least $\\left(c_{d,\\Delta}-O\\left(\\frac{1}{n}\\right)\\right)m(G)$, where $c_{d,\\Delta}=d+\\Omega\\left(\\frac{1}{\\Delta}\\right)$ for fixed $d$, that is, this number visibly deviates from its expected value when considering a uniformly random copy of $G$ in $K$. For $d=\\frac{1}{2}$, and $\\Delta\\leq 2$, we present more detailed results.</td>\n",
       "      <td>[math.CO]</td>\n",
       "      <td>[['Bessy', 'Stéphane', ''], ['Pardey', 'Johannes', ''], ['Picasarri-Arrieta', 'Lucas', ''], ['Rautenbach', 'Dieter', '']]</td>\n",
       "      <td>2021-11-12</td>\n",
       "      <td>2107.09290</td>\n",
       "      <td>Unbalanced spanning subgraphs in edge labeled complete graphs</td>\n",
       "      <td>Let LATEX  be a complete graph of order LATEX  For LATEX  let LATEX  be a LATEX  labeling of LATEX  such that there are LATEX  edges with label LATEX  and let LATEX  be a spanning subgraph of LATEX  of maximum degree at most LATEX  We prove the existence of an isomorphic copy LATEX  of LATEX  in LATEX  such that the number of edges with label LATEX  in LATEX  is at least LATEX  where LATEX  for fixed LATEX  that is, this number visibly deviates from its expected value when considering a uniformly random copy of LATEX  in LATEX  For LATEX  and LATEX  we present more detailed results.</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26834</th>\n",
       "      <td>Monotone metric tensors in Quantum Information Geometry</td>\n",
       "      <td>We review some geometrical aspects pertaining to the world of monotone quantum metrics in finite dimensions. Particular emphasis is given to an unfolded perspective for quantum states that is built out of the spectral theorem and is naturally suited to investigate the comparison with the classical case of probability distributions.</td>\n",
       "      <td>[quant-ph, math-ph, math.MP]</td>\n",
       "      <td>[['Ciaglia', 'Florio M.', ''], ['Di Cosmo', 'Fabio', ''], ['Di Nocera', 'Fabio', ''], ['Vitale', 'Patrizia', '']]</td>\n",
       "      <td>2022-03-22</td>\n",
       "      <td>2203.10857</td>\n",
       "      <td>Monotone metric tensors in Quantum Information Geometry</td>\n",
       "      <td>We review some geometrical aspects pertaining to the world of monotone quantum metrics in finite dimensions. Particular emphasis is given to an unfolded perspective for quantum states that is built out of the spectral theorem and is naturally suited to investigate the comparison with the classical case of probability distributions.</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14369</th>\n",
       "      <td>A new distance measurement and its application in K-Means Algorithm</td>\n",
       "      <td>K-Means clustering algorithm is one of the most commonly used clustering algorithms because of its simplicity and efficiency. K-Means clustering algorithm based on Euclidean distance only pays attention to the linear distance between samples, but ignores the overall distribution structure of the dataset (i.e. the fluid structure of dataset). Since it is difficult to describe the internal structure of two data points by Euclidean distance in high-dimensional data space, we propose a new distance measurement, namely, view-distance, and apply it to the K-Means algorithm. On the classical manifold learning datasets, S-curve and Swiss roll datasets, not only this new distance can cluster the data according to the structure of the data itself, but also the boundaries between categories are neat dividing lines. Moreover, we also tested the classification accuracy and clustering effect of the K-Means algorithm based on view-distance on some real-world datasets. The experimental results show that, on most datasets, the K-Means algorithm based on view-distance has a certain degree of improvement in classification accuracy and clustering effect.</td>\n",
       "      <td>[cs.LG, cs.NA, math.NA]</td>\n",
       "      <td>[['Zhang', 'Yiqun', ''], ['Li', 'Houbiao', '']]</td>\n",
       "      <td>2022-06-13</td>\n",
       "      <td>2206.05215</td>\n",
       "      <td>A new distance measurement and its application in K-Means Algorithm</td>\n",
       "      <td>K-Means clustering algorithm is one of the most commonly used clustering algorithms because of its simplicity and efficiency. K-Means clustering algorithm based on Euclidean distance only pays attention to the linear distance between samples, but ignores the overall distribution structure of the dataset (i.e. the fluid structure of dataset). Since it is difficult to describe the internal structure of two data points by Euclidean distance in high-dimensional data space, we propose a new distance measurement, namely, view-distance, and apply it to the K-Means algorithm. On the classical manifold learning datasets, S-curve and Swiss roll datasets, not only this new distance can cluster the data according to the structure of the data itself, but also the boundaries between categories are neat dividing lines. Moreover, we also tested the classification accuracy and clustering effect of the K-Means algorithm based on view-distance on some real-world datasets. The experimental results show that, on most datasets, the K-Means algorithm based on view-distance has a certain degree of improvement in classification accuracy and clustering effect.</td>\n",
       "      <td>[K-Means]</td>\n",
       "      <td>[K-Means, K-Means, high-dimensional, view-distance, K-Means, S-curve, K-Means, view-distance, real-world, K-Means, view-distance]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>174903</th>\n",
       "      <td>A numerical model based on the curvilinear coordinate system for the MAC   method simplified</td>\n",
       "      <td>In this paper we developed a numerical methodology to study some incompressible fluid flows without free surface, using the curvilinear coordinate system and whose edge geometry is constructed via parametrized spline. First, we discussed the representation of the Navier-Stokes and continuity equations on the curvilinear coordinate system, along with the auxiliary conditions. Then, we presented the numerical method -- a simplified version of MAC (\\textit{Marker and Cell}) method -- along with the discretization of the governing equations, which is carried out using the finite differences method and the implementation of the FOU (\\textit{First Order Upwind}) scheme. Finally, we applied the numerical methodology to the parallel plates problem, lid-driven cavity problem and atherosclerosis problem, and then we compare the results obtained with those presented in the literature.   Keywords: finite differences, simplified MAC, curvilinear coordinates, parallel plates, did-driven cavity, atherosclerosis.</td>\n",
       "      <td>[math.NA, physics.flu-dyn]</td>\n",
       "      <td>[['Cirilo', 'Eliandro Rodrigues', ''], ['Barba', 'Alessandra Negrini Dalla', ''], ['Romeiro', 'Neyva Maria Lopes', ''], ['Natti', 'Paulo Laerte', '']]</td>\n",
       "      <td>2019-02-11</td>\n",
       "      <td>1902.03032</td>\n",
       "      <td>A numerical model based on the curvilinear coordinate system for the MAC   method simplified</td>\n",
       "      <td>In this paper we developed a numerical methodology to study some incompressible fluid flows without free surface, using the curvilinear coordinate system and whose edge geometry is constructed via parametrized spline. First, we discussed the representation of the Navier-Stokes and continuity equations on the curvilinear coordinate system, along with the auxiliary conditions. Then, we presented the numerical method -- a simplified version of MAC () method -- along with the discretization of the governing equations, which is carried out using the finite differences method and the implementation of the FOU () scheme. Finally, we applied the numerical methodology to the parallel plates problem, lid-driven cavity problem and atherosclerosis problem, and then we compare the results obtained with those presented in the literature.   Keywords: finite differences, simplified MAC, curvilinear coordinates, parallel plates, did-driven cavity, atherosclerosis.</td>\n",
       "      <td>None</td>\n",
       "      <td>[Navier-Stokes, lid-driven, did-driven]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>133557</th>\n",
       "      <td>Weil-Petersson translation length and manifolds with many fibered   fillings</td>\n",
       "      <td>We prove that any mapping torus of a pseudo-Anosov mapping class with bounded normalized Weil-Petersson translation length contains a finite set of transverse and level closed curves, and drilling out this set of curves results in one of a finite number of cusped hyperbolic 3-manifolds. The number of manifolds in the finite list depends only on the bound for normalized translation length. We also prove a complementary result that explains the necessity of removing level curves by producing new estimates for the Weil-Petersson translation length of compositions of pseudo-Anosov mapping classes and arbitrary powers of a Dehn twist.</td>\n",
       "      <td>[math.GT, math.CV, math.DG]</td>\n",
       "      <td>[['Leininger', 'Christopher J.', ''], ['Minsky', 'Yair N.', ''], ['Souto', 'Juan', ''], ['Taylor', 'Samuel J.', '']]</td>\n",
       "      <td>2020-01-27</td>\n",
       "      <td>1910.01169</td>\n",
       "      <td>Weil-Petersson translation length and manifolds with many fibered   fillings</td>\n",
       "      <td>We prove that any mapping torus of a pseudo-Anosov mapping class with bounded normalized Weil-Petersson translation length contains a finite set of transverse and level closed curves, and drilling out this set of curves results in one of a finite number of cusped hyperbolic 3-manifolds. The number of manifolds in the finite list depends only on the bound for normalized translation length. We also prove a complementary result that explains the necessity of removing level curves by producing new estimates for the Weil-Petersson translation length of compositions of pseudo-Anosov mapping classes and arbitrary powers of a Dehn twist.</td>\n",
       "      <td>[Weil-Petersson]</td>\n",
       "      <td>[pseudo-Anosov, Weil-Petersson, 3-manifolds, Weil-Petersson, pseudo-Anosov]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32750</th>\n",
       "      <td>Invariant measures for large automorphism groups of projective surfaces</td>\n",
       "      <td>We classify invariant probability measures for non-elementary groups of automorphisms, on any compact K\\\"ahler surface X, under the assumption that the group contains a so-called \"parabolic automorphism\". We also prove that except in certain rigid situations known as Kummer examples, there are only finitely many invariant, ergodic, probability measures with a Zariski dense support. If X is a K3 or Enriques surface, and the group does not preserve any algebraic subset, this leads to a complete description of orbit closures.</td>\n",
       "      <td>[math.DS, math.AG]</td>\n",
       "      <td>[['Cantat', 'Serge', ''], ['Dujardin', 'Romain', '']]</td>\n",
       "      <td>2022-02-10</td>\n",
       "      <td>2110.04213</td>\n",
       "      <td>Invariant measures for large automorphism groups of projective surfaces</td>\n",
       "      <td>We classify invariant probability measures for non-elementary groups of automorphisms, on any compact Kahler surface X, under the assumption that the group contains a so-called \"parabolic automorphism\". We also prove that except in certain rigid situations known as Kummer examples, there are only finitely many invariant, ergodic, probability measures with a Zariski dense support. If X is a K3 or Enriques surface, and the group does not preserve any algebraic subset, this leads to a complete description of orbit closures.</td>\n",
       "      <td>None</td>\n",
       "      <td>[non-elementary, so-called]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7442</th>\n",
       "      <td>Sharp estimates, uniqueness and nondegeneracy of positive solutions of   the Lane-Emden system in planar domains</td>\n",
       "      <td>We study the Lane-Emden system $$\\begin{cases} -\\Delta u=v^p,\\quad u&gt;0,\\quad\\text{in}~\\Omega, -\\Delta v=u^q,\\quad v&gt;0,\\quad\\text{in}~\\Omega, u=v=0,\\quad\\text{on}~\\partial\\Omega, \\end{cases}$$ where $\\Omega\\subset\\mathbb{R}^2$ is a smooth bounded domain. In a recent work, we studied the concentration phenomena of positive solutions as $p,q\\to+\\infty$ and $|q-p|\\leq \\Lambda$. In this paper, we obtain sharp estimates of such multi-bubble solutions, including sharp convergence rates of local maxima and scaling parameters, and accurate approximations of solutions. As an application of these sharp estimates, we show that when $\\Omega$ is convex, then the solution of this system is unique and nondegenerate for large $p, q$.</td>\n",
       "      <td>[math.AP]</td>\n",
       "      <td>[['Chen', 'Zhijie', ''], ['Li', 'Houwang', ''], ['Zou', 'Wenming', '']]</td>\n",
       "      <td>2022-07-26</td>\n",
       "      <td>2205.15055</td>\n",
       "      <td>Sharp estimates, uniqueness and nondegeneracy of positive solutions of   the Lane-Emden system in planar domains</td>\n",
       "      <td>We study the Lane-Emden system LATEX  where LATEX  is a smooth bounded domain. In a recent work, we studied the concentration phenomena of positive solutions as LATEX  and LATEX  In this paper, we obtain sharp estimates of such multi-bubble solutions, including sharp convergence rates of local maxima and scaling parameters, and accurate approximations of solutions. As an application of these sharp estimates, we show that when LATEX  is convex, then the solution of this system is unique and nondegenerate for large LATEX</td>\n",
       "      <td>[Lane-Emden]</td>\n",
       "      <td>[Lane-Emden, multi-bubble]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>162934</th>\n",
       "      <td>The time evolution of permutations under random stirring</td>\n",
       "      <td>We consider permutations of $\\{1,...,n\\}$ obtained by $\\lfloor\\sqrt{n}t\\rfloor$ independent applications of random stirring. In each step the same marked stirring element is transposed with probability $1/n$ with any one of the $n$ elements. Normalizing by $\\sqrt{n}$ we describe the asymptotic distribution of the cycle structure of these permutations, for all $t\\ge 0$, as $n\\to\\infty$.</td>\n",
       "      <td>[math.PR]</td>\n",
       "      <td>[['Vető', 'Bálint', '']]</td>\n",
       "      <td>2019-05-20</td>\n",
       "      <td>math/0603044</td>\n",
       "      <td>The time evolution of permutations under random stirring</td>\n",
       "      <td>We consider permutations of LATEX  obtained by LATEX  independent applications of random stirring. In each step the same marked stirring element is transposed with probability LATEX  with any one of the LATEX  elements. Normalizing by LATEX  we describe the asymptotic distribution of the cycle structure of these permutations, for all LATEX  as LATEX</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1035</th>\n",
       "      <td>Bridges of Markov counting processes. Reciprocal classes and duality   formulas</td>\n",
       "      <td>Processes having the same bridges are said to belong to the same reciprocal class. In this article we analyze reciprocal classes of Markov counting processes by identifying their reciprocal invariants and we characterize them as the set of counting processes satisfying some duality formula.</td>\n",
       "      <td>[math.PR]</td>\n",
       "      <td>[['Conforti', 'Giovanni', '', \"MODAL'X\"], ['Léonard', 'Christian', '', \"MODAL'X\"], ['Murr', 'Rüdiger', ''], ['Roelly', 'Sylvie', '']]</td>\n",
       "      <td>2022-09-05</td>\n",
       "      <td>1408.1332</td>\n",
       "      <td>Bridges of Markov counting processes. Reciprocal classes and duality   formulas</td>\n",
       "      <td>Processes having the same bridges are said to belong to the same reciprocal class. In this article we analyze reciprocal classes of Markov counting processes by identifying their reciprocal invariants and we characterize them as the set of counting processes satisfying some duality formula.</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29606</th>\n",
       "      <td>Dynamic Compressed Sensing of Unsteady Flows with a Mobile Robot</td>\n",
       "      <td>Large-scale environmental sensing with a finite number of mobile sensors is a challenging task that requires a lot of resources and time. This is especially true when features in the environment are spatiotemporally changing with unknown or partially known dynamics. Fortunately, these dynamic features often evolve in a low-dimensional space, making it possible to capture their dynamics sufficiently well with only one or several properly planned mobile sensors. This paper investigates the problem of dynamic compressed sensing of an unsteady flow field, which takes advantage of the inherently low dimensionality of the underlying flow dynamics to reduce number of waypoints for a mobile sensing robot. The optimal sensing waypoints are identified by an iterative compressed sensing algorithm that optimizes the flow reconstruction based on the proper orthogonal decomposition modes. An optimal sampling trajectory is then found to traverse these waypoints while minimizing the energy consumption, time, and flow reconstruction error. Simulation results in an unsteady double gyre flow field is presented to demonstrate the efficacy of the proposed algorithms. Experimental results with an indoor quadcopter are presented to show the feasibility of the resulting trajectory.</td>\n",
       "      <td>[cs.RO, eess.SP, math.OC]</td>\n",
       "      <td>[['Shriwastav', 'Sachin', ''], ['Snyder', 'Gregory', ''], ['Song', 'Zhuoyuan', '']]</td>\n",
       "      <td>2022-03-03</td>\n",
       "      <td>2110.08658</td>\n",
       "      <td>Dynamic Compressed Sensing of Unsteady Flows with a Mobile Robot</td>\n",
       "      <td>Large-scale environmental sensing with a finite number of mobile sensors is a challenging task that requires a lot of resources and time. This is especially true when features in the environment are spatiotemporally changing with unknown or partially known dynamics. Fortunately, these dynamic features often evolve in a low-dimensional space, making it possible to capture their dynamics sufficiently well with only one or several properly planned mobile sensors. This paper investigates the problem of dynamic compressed sensing of an unsteady flow field, which takes advantage of the inherently low dimensionality of the underlying flow dynamics to reduce number of waypoints for a mobile sensing robot. The optimal sensing waypoints are identified by an iterative compressed sensing algorithm that optimizes the flow reconstruction based on the proper orthogonal decomposition modes. An optimal sampling trajectory is then found to traverse these waypoints while minimizing the energy consumption, time, and flow reconstruction error. Simulation results in an unsteady double gyre flow field is presented to demonstrate the efficacy of the proposed algorithms. Experimental results with an indoor quadcopter are presented to show the feasibility of the resulting trajectory.</td>\n",
       "      <td>None</td>\n",
       "      <td>[Large-scale, low-dimensional]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7622</th>\n",
       "      <td>Reconstructing anisotropic conductivities on two-dimensional Riemannian   manifolds from power densities</td>\n",
       "      <td>We consider an electrically conductive compact two-dimensional Riemannian manifold with a smooth boundary. This setting defines a natural conductive Laplacian on the manifold and hence also voltage potentials, current fields and corresponding power densities arising from suitable boundary conditions. Motivated by Acousto-Electric Tomography we show that if the manifold has genus zero and the metric is known, then the conductivity can be recovered uniquely and constructively from knowledge of a few power densities. We illustrate the reconstruction procedure numerically by an example of a conductivity on a non-simply connected surface in three-space.</td>\n",
       "      <td>[math.AP]</td>\n",
       "      <td>[['Knudsen', 'Kim', ''], ['Markvorsen', 'Steen', ''], ['Schlüter', 'Hjørdis', '']]</td>\n",
       "      <td>2022-07-26</td>\n",
       "      <td>2202.12056</td>\n",
       "      <td>Reconstructing anisotropic conductivities on two-dimensional Riemannian   manifolds from power densities</td>\n",
       "      <td>We consider an electrically conductive compact two-dimensional Riemannian manifold with a smooth boundary. This setting defines a natural conductive Laplacian on the manifold and hence also voltage potentials, current fields and corresponding power densities arising from suitable boundary conditions. Motivated by Acousto-Electric Tomography we show that if the manifold has genus zero and the metric is known, then the conductivity can be recovered uniquely and constructively from knowledge of a few power densities. We illustrate the reconstruction procedure numerically by an example of a conductivity on a non-simply connected surface in three-space.</td>\n",
       "      <td>[two-dimensional]</td>\n",
       "      <td>[two-dimensional, Acousto-Electric, non-simply, three-space]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102914</th>\n",
       "      <td>Multilevel Ensemble Kalman Filtering based on a sample average of   independent EnKF estimators</td>\n",
       "      <td>We introduce a new multilevel ensemble Kalman filter method (MLEnKF) which consists of a hierarchy of independent samples of ensemble Kalman filters (EnKF). This new MLEnKF method is fundamentally different from the preexisting method introduced by Hoel, Law and Tempone in 2016, and it is suitable for extensions towards multi-index Monte Carlo based filtering methods. Robust theoretical analysis and supporting numerical examples show that under appropriate regularity assumptions, the MLEnKF method has better complexity than plain vanilla EnKF in the large-ensemble and fine-resolution limits, for weak approximations of quantities of interest. The method is developed for discrete-time filtering problems with finite-dimensional state space and linear observations polluted by additive Gaussian noise.</td>\n",
       "      <td>[math.NA, cs.NA]</td>\n",
       "      <td>[['Hoel', 'Håkon', ''], ['Shaimerdenova', 'Gaukhar', ''], ['Tempone', 'Raúl', '']]</td>\n",
       "      <td>2020-09-22</td>\n",
       "      <td>2002.00480</td>\n",
       "      <td>Multilevel Ensemble Kalman Filtering based on a sample average of   independent EnKF estimators</td>\n",
       "      <td>We introduce a new multilevel ensemble Kalman filter method (MLEnKF) which consists of a hierarchy of independent samples of ensemble Kalman filters (EnKF). This new MLEnKF method is fundamentally different from the preexisting method introduced by Hoel, Law and Tempone in 2016, and it is suitable for extensions towards multi-index Monte Carlo based filtering methods. Robust theoretical analysis and supporting numerical examples show that under appropriate regularity assumptions, the MLEnKF method has better complexity than plain vanilla EnKF in the large-ensemble and fine-resolution limits, for weak approximations of quantities of interest. The method is developed for discrete-time filtering problems with finite-dimensional state space and linear observations polluted by additive Gaussian noise.</td>\n",
       "      <td>None</td>\n",
       "      <td>[multi-index, large-ensemble, fine-resolution, discrete-time, finite-dimensional]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77326</th>\n",
       "      <td>Minimum Feature Size Control in Level Set Topology Optimization via   Density Fields</td>\n",
       "      <td>A level set topology optimization approach that uses an auxiliary density field to nucleate holes during the optimization process and achieves minimum feature size control in optimized designs is explored. The level set field determines the solid-void interface, and the density field describes the distribution of a fictitious porous material using the solid isotropic material with penalization. These fields are governed by two sets of independent optimization variables which are initially coupled using a penalty for hole nucleation. The strength of the density field penalization and projection are gradually increased through the optimization process to promote a 0-1 density distribution. This treatment of the density field combined with a second penalty that regulates the evolution of the density field in the void phase, mitigate the appearance of small design features. The minimum feature size of optimized designs is controlled by the radius of the linear filter applied to the density optimization variables. The structural response is predicted by the extended finite element method, the sensitivities by the adjoint method, and the optimization variables are updated by a gradient-based optimization algorithm. Numerical examples investigate the robustness of this approach with respect to algorithmic parameters and mesh refinement. The results show the applicability of the combined density level set topology optimization approach for both optimal hole nucleation and for minimum feature size control in 2D and 3D. This comes, however, at the cost of a more advanced problem formulation and additional computational cost due to an increased number of optimization variables.</td>\n",
       "      <td>[math.OC]</td>\n",
       "      <td>[['Barrera', 'Jorge L.', ''], ['Geiss', 'Markus J.', ''], ['Maute', 'Kurt', '']]</td>\n",
       "      <td>2021-03-30</td>\n",
       "      <td>2103.14585</td>\n",
       "      <td>Minimum Feature Size Control in Level Set Topology Optimization via   Density Fields</td>\n",
       "      <td>A level set topology optimization approach that uses an auxiliary density field to nucleate holes during the optimization process and achieves minimum feature size control in optimized designs is explored. The level set field determines the solid-void interface, and the density field describes the distribution of a fictitious porous material using the solid isotropic material with penalization. These fields are governed by two sets of independent optimization variables which are initially coupled using a penalty for hole nucleation. The strength of the density field penalization and projection are gradually increased through the optimization process to promote a 0-1 density distribution. This treatment of the density field combined with a second penalty that regulates the evolution of the density field in the void phase, mitigate the appearance of small design features. The minimum feature size of optimized designs is controlled by the radius of the linear filter applied to the density optimization variables. The structural response is predicted by the extended finite element method, the sensitivities by the adjoint method, and the optimization variables are updated by a gradient-based optimization algorithm. Numerical examples investigate the robustness of this approach with respect to algorithmic parameters and mesh refinement. The results show the applicability of the combined density level set topology optimization approach for both optimal hole nucleation and for minimum feature size control in 2D and 3D. This comes, however, at the cost of a more advanced problem formulation and additional computational cost due to an increased number of optimization variables.</td>\n",
       "      <td>None</td>\n",
       "      <td>[solid-void, 0-1, gradient-based]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>71427</th>\n",
       "      <td>Super quantum cohomology I: Super stable maps of genus zero with   Neveu-Schwarz punctures</td>\n",
       "      <td>In this article we define stable supercurves and super stable maps of genus zero via labeled trees. We prove that the moduli space of stable supercurves and super stable maps of fixed tree type are quotient superorbifolds. To this end, we prove a slice theorem for the action of super Lie groups on Riemannian supermanifolds and discuss superorbifolds. Furthermore, we propose a Gromov topology on super stable maps such that the restriction to fixed tree type yields the quotient topology from the superorbifolds and the reduction is compact. This would, possibly, lead to the notions of super Gromov-Witten invariants and small super quantum cohomology to be discussed in sequels.</td>\n",
       "      <td>[math.DG, math-ph, math.AG, math.MP]</td>\n",
       "      <td>[['Keßler', 'Enno', ''], ['Sheshmani', 'Artan', ''], ['Yau', 'Shing-Tung', '']]</td>\n",
       "      <td>2021-05-13</td>\n",
       "      <td>2010.15634</td>\n",
       "      <td>Super quantum cohomology I: Super stable maps of genus zero with   Neveu-Schwarz punctures</td>\n",
       "      <td>In this article we define stable supercurves and super stable maps of genus zero via labeled trees. We prove that the moduli space of stable supercurves and super stable maps of fixed tree type are quotient superorbifolds. To this end, we prove a slice theorem for the action of super Lie groups on Riemannian supermanifolds and discuss superorbifolds. Furthermore, we propose a Gromov topology on super stable maps such that the restriction to fixed tree type yields the quotient topology from the superorbifolds and the reduction is compact. This would, possibly, lead to the notions of super Gromov-Witten invariants and small super quantum cohomology to be discussed in sequels.</td>\n",
       "      <td>[Neveu-Schwarz]</td>\n",
       "      <td>[Gromov-Witten]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89892</th>\n",
       "      <td>The three types of normal sequential effect algebras</td>\n",
       "      <td>A sequential effect algebra (SEA) is an effect algebra equipped with a sequential product operation modeled after the L\\\"uders product $(a,b)\\mapsto \\sqrt{a}b\\sqrt{a}$ on C*-algebras. A SEA is called normal when it has all suprema of directed sets, and the sequential product interacts suitably with these suprema. The effects on a Hilbert space and the unit interval of a von Neumann or JBW algebra are examples of normal SEAs that are in addition convex, i.e. possess a suitable action of the real unit interval on the algebra. Complete Boolean algebras form normal SEAs too, which are convex only when $0=1$.   We show that any normal SEA $E$ splits as a direct sum $E\\equiv E_b\\oplus E_c \\oplus E_{ac}$ of a complete Boolean algebra $E_b$, a convex normal SEA $E_c$, and a newly identified type of normal SEA $E_{ac}$ we dub purely almost-convex. Along the way we show, among other things, that a SEA which contains only idempotents must be a Boolean algebra; and we establish a spectral theorem using which we settle for the class of normal SEAs a problem of Gudder regarding the uniqueness of square roots. After establishing our main result, we propose a simple extra axiom for normal SEAs that excludes the seemingly pathological a-convex SEAs. We conclude the paper by a study of SEAs with an associative sequential product. We find that associativity forces normal SEAs satisfying our new axiom to be commutative, shedding light on the question of why the sequential product in quantum theory should be non-associative.</td>\n",
       "      <td>[quant-ph, math.OA]</td>\n",
       "      <td>[['Westerbaan', 'Abraham', ''], ['Westerbaan', 'Bas', ''], ['van de Wetering', 'John', '']]</td>\n",
       "      <td>2020-12-30</td>\n",
       "      <td>2004.12749</td>\n",
       "      <td>The three types of normal sequential effect algebras</td>\n",
       "      <td>A sequential effect algebra (SEA) is an effect algebra equipped with a sequential product operation modeled after the Luders product LATEX  on C*-algebras. A SEA is called normal when it has all suprema of directed sets, and the sequential product interacts suitably with these suprema. The effects on a Hilbert space and the unit interval of a von Neumann or JBW algebra are examples of normal SEAs that are in addition convex, i.e. possess a suitable action of the real unit interval on the algebra. Complete Boolean algebras form normal SEAs too, which are convex only when LATEX    We show that any normal SEA LATEX  splits as a direct sum LATEX  of a complete Boolean algebra LATEX  a convex normal SEA LATEX  and a newly identified type of normal SEA LATEX  we dub purely almost-convex. Along the way we show, among other things, that a SEA which contains only idempotents must be a Boolean algebra; and we establish a spectral theorem using which we settle for the class of normal SEAs a problem of Gudder regarding the uniqueness of square roots. After establishing our main result, we propose a simple extra axiom for normal SEAs that excludes the seemingly pathological a-convex SEAs. We conclude the paper by a study of SEAs with an associative sequential product. We find that associativity forces normal SEAs satisfying our new axiom to be commutative, shedding light on the question of why the sequential product in quantum theory should be non-associative.</td>\n",
       "      <td>None</td>\n",
       "      <td>[almost-convex, a-convex, non-associative]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                           title  \\\n",
       "109208  Consistency of Variational Bayes Inference for Estimation and Model   Selection in Mixtures                                                \n",
       "167970  Data Amplification: A Unified and Competitive Approach to Property   Estimation                                                            \n",
       "83749   A Novel Trick to Overcome the Phase Space Volume Change and the Use of   Hamiltonian Trajectories with an emphasis on the Free Expansion   \n",
       "127567  2-Local derivations on the W-algebra W(2,2)                                                                                                \n",
       "42123   Rees algebra and special fiber ring of binomial edge ideals of closed   graphs                                                             \n",
       "45450   Unbalanced spanning subgraphs in edge labeled complete graphs                                                                              \n",
       "26834   Monotone metric tensors in Quantum Information Geometry                                                                                    \n",
       "14369   A new distance measurement and its application in K-Means Algorithm                                                                        \n",
       "174903  A numerical model based on the curvilinear coordinate system for the MAC   method simplified                                               \n",
       "133557  Weil-Petersson translation length and manifolds with many fibered   fillings                                                               \n",
       "32750   Invariant measures for large automorphism groups of projective surfaces                                                                    \n",
       "7442    Sharp estimates, uniqueness and nondegeneracy of positive solutions of   the Lane-Emden system in planar domains                           \n",
       "162934  The time evolution of permutations under random stirring                                                                                   \n",
       "1035    Bridges of Markov counting processes. Reciprocal classes and duality   formulas                                                            \n",
       "29606   Dynamic Compressed Sensing of Unsteady Flows with a Mobile Robot                                                                           \n",
       "7622    Reconstructing anisotropic conductivities on two-dimensional Riemannian   manifolds from power densities                                   \n",
       "102914  Multilevel Ensemble Kalman Filtering based on a sample average of   independent EnKF estimators                                            \n",
       "77326   Minimum Feature Size Control in Level Set Topology Optimization via   Density Fields                                                       \n",
       "71427   Super quantum cohomology I: Super stable maps of genus zero with   Neveu-Schwarz punctures                                                 \n",
       "89892   The three types of normal sequential effect algebras                                                                                       \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  abstract  \\\n",
       "109208    Mixture models are widely used in Bayesian statistics and machine learning, in particular in computational biology, natural language processing and many other fields. Variational inference, a technique for approximating intractable posteriors thanks to optimization algorithms, is extremely popular in practice when dealing with complex models such as mixtures. The contribution of this paper is two-fold. First, we study the concentration of variational approximations of posteriors, which is still an open problem for general mixtures, and we derive consistency and rates of convergence. We also tackle the problem of model selection for the number of components: we study the approach already used in practice, which consists in maximizing a numerical criterion (the Evidence Lower Bound). We prove that this strategy indeed leads to strong oracle inequalities. We illustrate our theoretical results by applications to Gaussian and multinomial mixtures.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "167970    Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying distributions uses just $2n$ samples to achieve the performance attained by the empirical estimator with $n\\sqrt{\\log n}$ samples. This provides off-the-shelf, distribution-independent, \"amplification\" of the amount of data available relative to common-practice estimators.   We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. In most cases, its performance with $n$ samples is even as good as that of the empirical estimator with $n\\log n$ samples, and for essentially all properties, its performance is comparable to that of the best existing estimator designed specifically for that property.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              \n",
       "83749     We extend and successfully apply a recently proposed microstate nonequilibrium thermodynamics to study expansion/contraction processes. Here, the numbers of initial and final microstates are different so they cannot be connected by unique Hamiltonian trajectories. This commonly happens when the phase space volume changes, and has not been studied so far using Hamiltonian trajectories that can be inverted to yield an identity mapping between initial and final microstates as the parameter in the Hamiltonian is changed. We propose a trick to overcome this hurdle with a focus on free expansion in an isolated system, where the concept of dissipated work is not clear. The trick is shown to be thermodynamically consistent and can be extremely useful in simulation. We justify that it is the thermodynamic average of the internal microwork done by a microstate that is dissipated; this microwork is different from the exchange microwork with the vacuum, which vanishes. We also establish that the microwork is nonnegative for free expansion, which is remarkable, since its sign is not fixed in a general process.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         \n",
       "127567    The present paper is devoted to study 2-local derivations on W-algebra $W(2,2)$ which is an infinite-dimensional Lie algebras with some out derivations. We prove that all 2-local derivations on the W-algebra $W(2,2)$ are derivation. We also give a complete classification of the 2-local derivation on the so called thin Lie algebra and prove that it admits a lots of 2-local derivations which are not derivations.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "42123     In this article, we compute the regularity of Rees algebra of binomial edge ideals of closed graphs. We obtain a lower bound for the regularity of Rees algebra of binomial edge ideals. We also study some algebraic properties of the Rees algebra and special fiber ring of binomial edge ideals of closed graphs via algebraic properties of their initial algebra and Sagbi basis theory. We obtain an upper bound for the regularity of the special fiber ring of binomial edge ideals of closed graphs.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "45450     Let $K$ be a complete graph of order $n$. For $d\\in (0,1)$, let $c$ be a $\\pm 1$-edge labeling of $K$ such that there are $d{n\\choose 2}$ edges with label $+1$, and let $G$ be a spanning subgraph of $K$ of maximum degree at most $\\Delta$. We prove the existence of an isomorphic copy $G'$ of $G$ in $K$ such that the number of edges with label $+1$ in $G'$ is at least $\\left(c_{d,\\Delta}-O\\left(\\frac{1}{n}\\right)\\right)m(G)$, where $c_{d,\\Delta}=d+\\Omega\\left(\\frac{1}{\\Delta}\\right)$ for fixed $d$, that is, this number visibly deviates from its expected value when considering a uniformly random copy of $G$ in $K$. For $d=\\frac{1}{2}$, and $\\Delta\\leq 2$, we present more detailed results.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "26834     We review some geometrical aspects pertaining to the world of monotone quantum metrics in finite dimensions. Particular emphasis is given to an unfolded perspective for quantum states that is built out of the spectral theorem and is naturally suited to investigate the comparison with the classical case of probability distributions.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "14369     K-Means clustering algorithm is one of the most commonly used clustering algorithms because of its simplicity and efficiency. K-Means clustering algorithm based on Euclidean distance only pays attention to the linear distance between samples, but ignores the overall distribution structure of the dataset (i.e. the fluid structure of dataset). Since it is difficult to describe the internal structure of two data points by Euclidean distance in high-dimensional data space, we propose a new distance measurement, namely, view-distance, and apply it to the K-Means algorithm. On the classical manifold learning datasets, S-curve and Swiss roll datasets, not only this new distance can cluster the data according to the structure of the data itself, but also the boundaries between categories are neat dividing lines. Moreover, we also tested the classification accuracy and clustering effect of the K-Means algorithm based on view-distance on some real-world datasets. The experimental results show that, on most datasets, the K-Means algorithm based on view-distance has a certain degree of improvement in classification accuracy and clustering effect.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "174903    In this paper we developed a numerical methodology to study some incompressible fluid flows without free surface, using the curvilinear coordinate system and whose edge geometry is constructed via parametrized spline. First, we discussed the representation of the Navier-Stokes and continuity equations on the curvilinear coordinate system, along with the auxiliary conditions. Then, we presented the numerical method -- a simplified version of MAC (\\textit{Marker and Cell}) method -- along with the discretization of the governing equations, which is carried out using the finite differences method and the implementation of the FOU (\\textit{First Order Upwind}) scheme. Finally, we applied the numerical methodology to the parallel plates problem, lid-driven cavity problem and atherosclerosis problem, and then we compare the results obtained with those presented in the literature.   Keywords: finite differences, simplified MAC, curvilinear coordinates, parallel plates, did-driven cavity, atherosclerosis.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               \n",
       "133557    We prove that any mapping torus of a pseudo-Anosov mapping class with bounded normalized Weil-Petersson translation length contains a finite set of transverse and level closed curves, and drilling out this set of curves results in one of a finite number of cusped hyperbolic 3-manifolds. The number of manifolds in the finite list depends only on the bound for normalized translation length. We also prove a complementary result that explains the necessity of removing level curves by producing new estimates for the Weil-Petersson translation length of compositions of pseudo-Anosov mapping classes and arbitrary powers of a Dehn twist.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "32750     We classify invariant probability measures for non-elementary groups of automorphisms, on any compact K\\\"ahler surface X, under the assumption that the group contains a so-called \"parabolic automorphism\". We also prove that except in certain rigid situations known as Kummer examples, there are only finitely many invariant, ergodic, probability measures with a Zariski dense support. If X is a K3 or Enriques surface, and the group does not preserve any algebraic subset, this leads to a complete description of orbit closures.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "7442      We study the Lane-Emden system $$\\begin{cases} -\\Delta u=v^p,\\quad u>0,\\quad\\text{in}~\\Omega, -\\Delta v=u^q,\\quad v>0,\\quad\\text{in}~\\Omega, u=v=0,\\quad\\text{on}~\\partial\\Omega, \\end{cases}$$ where $\\Omega\\subset\\mathbb{R}^2$ is a smooth bounded domain. In a recent work, we studied the concentration phenomena of positive solutions as $p,q\\to+\\infty$ and $|q-p|\\leq \\Lambda$. In this paper, we obtain sharp estimates of such multi-bubble solutions, including sharp convergence rates of local maxima and scaling parameters, and accurate approximations of solutions. As an application of these sharp estimates, we show that when $\\Omega$ is convex, then the solution of this system is unique and nondegenerate for large $p, q$.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "162934    We consider permutations of $\\{1,...,n\\}$ obtained by $\\lfloor\\sqrt{n}t\\rfloor$ independent applications of random stirring. In each step the same marked stirring element is transposed with probability $1/n$ with any one of the $n$ elements. Normalizing by $\\sqrt{n}$ we describe the asymptotic distribution of the cycle structure of these permutations, for all $t\\ge 0$, as $n\\to\\infty$.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               \n",
       "1035      Processes having the same bridges are said to belong to the same reciprocal class. In this article we analyze reciprocal classes of Markov counting processes by identifying their reciprocal invariants and we characterize them as the set of counting processes satisfying some duality formula.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                \n",
       "29606     Large-scale environmental sensing with a finite number of mobile sensors is a challenging task that requires a lot of resources and time. This is especially true when features in the environment are spatiotemporally changing with unknown or partially known dynamics. Fortunately, these dynamic features often evolve in a low-dimensional space, making it possible to capture their dynamics sufficiently well with only one or several properly planned mobile sensors. This paper investigates the problem of dynamic compressed sensing of an unsteady flow field, which takes advantage of the inherently low dimensionality of the underlying flow dynamics to reduce number of waypoints for a mobile sensing robot. The optimal sensing waypoints are identified by an iterative compressed sensing algorithm that optimizes the flow reconstruction based on the proper orthogonal decomposition modes. An optimal sampling trajectory is then found to traverse these waypoints while minimizing the energy consumption, time, and flow reconstruction error. Simulation results in an unsteady double gyre flow field is presented to demonstrate the efficacy of the proposed algorithms. Experimental results with an indoor quadcopter are presented to show the feasibility of the resulting trajectory.                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "7622      We consider an electrically conductive compact two-dimensional Riemannian manifold with a smooth boundary. This setting defines a natural conductive Laplacian on the manifold and hence also voltage potentials, current fields and corresponding power densities arising from suitable boundary conditions. Motivated by Acousto-Electric Tomography we show that if the manifold has genus zero and the metric is known, then the conductivity can be recovered uniquely and constructively from knowledge of a few power densities. We illustrate the reconstruction procedure numerically by an example of a conductivity on a non-simply connected surface in three-space.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "102914    We introduce a new multilevel ensemble Kalman filter method (MLEnKF) which consists of a hierarchy of independent samples of ensemble Kalman filters (EnKF). This new MLEnKF method is fundamentally different from the preexisting method introduced by Hoel, Law and Tempone in 2016, and it is suitable for extensions towards multi-index Monte Carlo based filtering methods. Robust theoretical analysis and supporting numerical examples show that under appropriate regularity assumptions, the MLEnKF method has better complexity than plain vanilla EnKF in the large-ensemble and fine-resolution limits, for weak approximations of quantities of interest. The method is developed for discrete-time filtering problems with finite-dimensional state space and linear observations polluted by additive Gaussian noise.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "77326     A level set topology optimization approach that uses an auxiliary density field to nucleate holes during the optimization process and achieves minimum feature size control in optimized designs is explored. The level set field determines the solid-void interface, and the density field describes the distribution of a fictitious porous material using the solid isotropic material with penalization. These fields are governed by two sets of independent optimization variables which are initially coupled using a penalty for hole nucleation. The strength of the density field penalization and projection are gradually increased through the optimization process to promote a 0-1 density distribution. This treatment of the density field combined with a second penalty that regulates the evolution of the density field in the void phase, mitigate the appearance of small design features. The minimum feature size of optimized designs is controlled by the radius of the linear filter applied to the density optimization variables. The structural response is predicted by the extended finite element method, the sensitivities by the adjoint method, and the optimization variables are updated by a gradient-based optimization algorithm. Numerical examples investigate the robustness of this approach with respect to algorithmic parameters and mesh refinement. The results show the applicability of the combined density level set topology optimization approach for both optimal hole nucleation and for minimum feature size control in 2D and 3D. This comes, however, at the cost of a more advanced problem formulation and additional computational cost due to an increased number of optimization variables.    \n",
       "71427     In this article we define stable supercurves and super stable maps of genus zero via labeled trees. We prove that the moduli space of stable supercurves and super stable maps of fixed tree type are quotient superorbifolds. To this end, we prove a slice theorem for the action of super Lie groups on Riemannian supermanifolds and discuss superorbifolds. Furthermore, we propose a Gromov topology on super stable maps such that the restriction to fixed tree type yields the quotient topology from the superorbifolds and the reduction is compact. This would, possibly, lead to the notions of super Gromov-Witten invariants and small super quantum cohomology to be discussed in sequels.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         \n",
       "89892     A sequential effect algebra (SEA) is an effect algebra equipped with a sequential product operation modeled after the L\\\"uders product $(a,b)\\mapsto \\sqrt{a}b\\sqrt{a}$ on C*-algebras. A SEA is called normal when it has all suprema of directed sets, and the sequential product interacts suitably with these suprema. The effects on a Hilbert space and the unit interval of a von Neumann or JBW algebra are examples of normal SEAs that are in addition convex, i.e. possess a suitable action of the real unit interval on the algebra. Complete Boolean algebras form normal SEAs too, which are convex only when $0=1$.   We show that any normal SEA $E$ splits as a direct sum $E\\equiv E_b\\oplus E_c \\oplus E_{ac}$ of a complete Boolean algebra $E_b$, a convex normal SEA $E_c$, and a newly identified type of normal SEA $E_{ac}$ we dub purely almost-convex. Along the way we show, among other things, that a SEA which contains only idempotents must be a Boolean algebra; and we establish a spectral theorem using which we settle for the class of normal SEAs a problem of Gudder regarding the uniqueness of square roots. After establishing our main result, we propose a simple extra axiom for normal SEAs that excludes the seemingly pathological a-convex SEAs. We conclude the paper by a study of SEAs with an associative sequential product. We find that associativity forces normal SEAs satisfying our new axiom to be commutative, shedding light on the question of why the sequential product in quantum theory should be non-associative.                                                                                                                                                                          \n",
       "\n",
       "                                                                               cat  \\\n",
       "109208  [math.ST, stat.CO, stat.ME, stat.TH]                                         \n",
       "167970  [stat.ML, cs.LG, math.ST, stat.TH]                                           \n",
       "83749   [cond-mat.stat-mech, cond-mat.mes-hall, math-ph, math.MP, physics.comp-ph]   \n",
       "127567  [math.RA]                                                                    \n",
       "42123   [math.AC]                                                                    \n",
       "45450   [math.CO]                                                                    \n",
       "26834   [quant-ph, math-ph, math.MP]                                                 \n",
       "14369   [cs.LG, cs.NA, math.NA]                                                      \n",
       "174903  [math.NA, physics.flu-dyn]                                                   \n",
       "133557  [math.GT, math.CV, math.DG]                                                  \n",
       "32750   [math.DS, math.AG]                                                           \n",
       "7442    [math.AP]                                                                    \n",
       "162934  [math.PR]                                                                    \n",
       "1035    [math.PR]                                                                    \n",
       "29606   [cs.RO, eess.SP, math.OC]                                                    \n",
       "7622    [math.AP]                                                                    \n",
       "102914  [math.NA, cs.NA]                                                             \n",
       "77326   [math.OC]                                                                    \n",
       "71427   [math.DG, math-ph, math.AG, math.MP]                                         \n",
       "89892   [quant-ph, math.OA]                                                          \n",
       "\n",
       "                                                                                                                                                authors_parsed  \\\n",
       "109208  [['Chérief-Abdellatif', 'Badr-Eddine', ''], ['Alquier', 'Pierre', '']]                                                                                   \n",
       "167970  [['Hao', 'Yi', ''], ['Orlitsky', 'Alon', ''], ['Suresh', 'Ananda T.', ''], ['Wu', 'Yihong', '']]                                                         \n",
       "83749   [['Gujrati', 'P. D.', '']]                                                                                                                               \n",
       "127567  [['Tang', 'Xiaomin', '']]                                                                                                                                \n",
       "42123   [['Kumar', 'Arvind', '']]                                                                                                                                \n",
       "45450   [['Bessy', 'Stéphane', ''], ['Pardey', 'Johannes', ''], ['Picasarri-Arrieta', 'Lucas', ''], ['Rautenbach', 'Dieter', '']]                                \n",
       "26834   [['Ciaglia', 'Florio M.', ''], ['Di Cosmo', 'Fabio', ''], ['Di Nocera', 'Fabio', ''], ['Vitale', 'Patrizia', '']]                                        \n",
       "14369   [['Zhang', 'Yiqun', ''], ['Li', 'Houbiao', '']]                                                                                                          \n",
       "174903  [['Cirilo', 'Eliandro Rodrigues', ''], ['Barba', 'Alessandra Negrini Dalla', ''], ['Romeiro', 'Neyva Maria Lopes', ''], ['Natti', 'Paulo Laerte', '']]   \n",
       "133557  [['Leininger', 'Christopher J.', ''], ['Minsky', 'Yair N.', ''], ['Souto', 'Juan', ''], ['Taylor', 'Samuel J.', '']]                                     \n",
       "32750   [['Cantat', 'Serge', ''], ['Dujardin', 'Romain', '']]                                                                                                    \n",
       "7442    [['Chen', 'Zhijie', ''], ['Li', 'Houwang', ''], ['Zou', 'Wenming', '']]                                                                                  \n",
       "162934  [['Vető', 'Bálint', '']]                                                                                                                                 \n",
       "1035    [['Conforti', 'Giovanni', '', \"MODAL'X\"], ['Léonard', 'Christian', '', \"MODAL'X\"], ['Murr', 'Rüdiger', ''], ['Roelly', 'Sylvie', '']]                    \n",
       "29606   [['Shriwastav', 'Sachin', ''], ['Snyder', 'Gregory', ''], ['Song', 'Zhuoyuan', '']]                                                                      \n",
       "7622    [['Knudsen', 'Kim', ''], ['Markvorsen', 'Steen', ''], ['Schlüter', 'Hjørdis', '']]                                                                       \n",
       "102914  [['Hoel', 'Håkon', ''], ['Shaimerdenova', 'Gaukhar', ''], ['Tempone', 'Raúl', '']]                                                                       \n",
       "77326   [['Barrera', 'Jorge L.', ''], ['Geiss', 'Markus J.', ''], ['Maute', 'Kurt', '']]                                                                         \n",
       "71427   [['Keßler', 'Enno', ''], ['Sheshmani', 'Artan', ''], ['Yau', 'Shing-Tung', '']]                                                                          \n",
       "89892   [['Westerbaan', 'Abraham', ''], ['Westerbaan', 'Bas', ''], ['van de Wetering', 'John', '']]                                                              \n",
       "\n",
       "       update_date            id  \\\n",
       "109208 2020-08-03   1805.05054     \n",
       "167970 2019-04-02   1904.00070     \n",
       "83749  2021-02-12   2102.06122     \n",
       "127567 2020-03-13   2003.05627     \n",
       "42123  2021-12-07   2102.03348     \n",
       "45450  2021-11-12   2107.09290     \n",
       "26834  2022-03-22   2203.10857     \n",
       "14369  2022-06-13   2206.05215     \n",
       "174903 2019-02-11   1902.03032     \n",
       "133557 2020-01-27   1910.01169     \n",
       "32750  2022-02-10   2110.04213     \n",
       "7442   2022-07-26   2205.15055     \n",
       "162934 2019-05-20   math/0603044   \n",
       "1035   2022-09-05   1408.1332      \n",
       "29606  2022-03-03   2110.08658     \n",
       "7622   2022-07-26   2202.12056     \n",
       "102914 2020-09-22   2002.00480     \n",
       "77326  2021-03-30   2103.14585     \n",
       "71427  2021-05-13   2010.15634     \n",
       "89892  2020-12-30   2004.12749     \n",
       "\n",
       "                                                                                                                                     clean_title  \\\n",
       "109208  Consistency of Variational Bayes Inference for Estimation and Model   Selection in Mixtures                                                \n",
       "167970  Data Amplification: A Unified and Competitive Approach to Property   Estimation                                                            \n",
       "83749   A Novel Trick to Overcome the Phase Space Volume Change and the Use of   Hamiltonian Trajectories with an emphasis on the Free Expansion   \n",
       "127567  2-Local derivations on the W-algebra W(2,2)                                                                                                \n",
       "42123   Rees algebra and special fiber ring of binomial edge ideals of closed   graphs                                                             \n",
       "45450   Unbalanced spanning subgraphs in edge labeled complete graphs                                                                              \n",
       "26834   Monotone metric tensors in Quantum Information Geometry                                                                                    \n",
       "14369   A new distance measurement and its application in K-Means Algorithm                                                                        \n",
       "174903  A numerical model based on the curvilinear coordinate system for the MAC   method simplified                                               \n",
       "133557  Weil-Petersson translation length and manifolds with many fibered   fillings                                                               \n",
       "32750   Invariant measures for large automorphism groups of projective surfaces                                                                    \n",
       "7442    Sharp estimates, uniqueness and nondegeneracy of positive solutions of   the Lane-Emden system in planar domains                           \n",
       "162934  The time evolution of permutations under random stirring                                                                                   \n",
       "1035    Bridges of Markov counting processes. Reciprocal classes and duality   formulas                                                            \n",
       "29606   Dynamic Compressed Sensing of Unsteady Flows with a Mobile Robot                                                                           \n",
       "7622    Reconstructing anisotropic conductivities on two-dimensional Riemannian   manifolds from power densities                                   \n",
       "102914  Multilevel Ensemble Kalman Filtering based on a sample average of   independent EnKF estimators                                            \n",
       "77326   Minimum Feature Size Control in Level Set Topology Optimization via   Density Fields                                                       \n",
       "71427   Super quantum cohomology I: Super stable maps of genus zero with   Neveu-Schwarz punctures                                                 \n",
       "89892   The three types of normal sequential effect algebras                                                                                       \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            clean_abstract  \\\n",
       "109208    Mixture models are widely used in Bayesian statistics and machine learning, in particular in computational biology, natural language processing and many other fields. Variational inference, a technique for approximating intractable posteriors thanks to optimization algorithms, is extremely popular in practice when dealing with complex models such as mixtures. The contribution of this paper is two-fold. First, we study the concentration of variational approximations of posteriors, which is still an open problem for general mixtures, and we derive consistency and rates of convergence. We also tackle the problem of model selection for the number of components: we study the approach already used in practice, which consists in maximizing a numerical criterion (the Evidence Lower Bound). We prove that this strategy indeed leads to strong oracle inequalities. We illustrate our theoretical results by applications to Gaussian and multinomial mixtures.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "167970    Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying distributions uses just LATEX  samples to achieve the performance attained by the empirical estimator with LATEX  samples. This provides off-the-shelf, distribution-independent, \"amplification\" of the amount of data available relative to common-practice estimators.   We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. In most cases, its performance with LATEX  samples is even as good as that of the empirical estimator with LATEX  samples, and for essentially all properties, its performance is comparable to that of the best existing estimator designed specifically for that property.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "83749     We extend and successfully apply a recently proposed microstate nonequilibrium thermodynamics to study expansion/contraction processes. Here, the numbers of initial and final microstates are different so they cannot be connected by unique Hamiltonian trajectories. This commonly happens when the phase space volume changes, and has not been studied so far using Hamiltonian trajectories that can be inverted to yield an identity mapping between initial and final microstates as the parameter in the Hamiltonian is changed. We propose a trick to overcome this hurdle with a focus on free expansion in an isolated system, where the concept of dissipated work is not clear. The trick is shown to be thermodynamically consistent and can be extremely useful in simulation. We justify that it is the thermodynamic average of the internal microwork done by a microstate that is dissipated; this microwork is different from the exchange microwork with the vacuum, which vanishes. We also establish that the microwork is nonnegative for free expansion, which is remarkable, since its sign is not fixed in a general process.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         \n",
       "127567    The present paper is devoted to study 2-local derivations on W-algebra LATEX  which is an infinite-dimensional Lie algebras with some out derivations. We prove that all 2-local derivations on the W-algebra LATEX  are derivation. We also give a complete classification of the 2-local derivation on the so called thin Lie algebra and prove that it admits a lots of 2-local derivations which are not derivations.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          \n",
       "42123     In this article, we compute the regularity of Rees algebra of binomial edge ideals of closed graphs. We obtain a lower bound for the regularity of Rees algebra of binomial edge ideals. We also study some algebraic properties of the Rees algebra and special fiber ring of binomial edge ideals of closed graphs via algebraic properties of their initial algebra and Sagbi basis theory. We obtain an upper bound for the regularity of the special fiber ring of binomial edge ideals of closed graphs.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "45450     Let LATEX  be a complete graph of order LATEX  For LATEX  let LATEX  be a LATEX  labeling of LATEX  such that there are LATEX  edges with label LATEX  and let LATEX  be a spanning subgraph of LATEX  of maximum degree at most LATEX  We prove the existence of an isomorphic copy LATEX  of LATEX  in LATEX  such that the number of edges with label LATEX  in LATEX  is at least LATEX  where LATEX  for fixed LATEX  that is, this number visibly deviates from its expected value when considering a uniformly random copy of LATEX  in LATEX  For LATEX  and LATEX  we present more detailed results.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "26834     We review some geometrical aspects pertaining to the world of monotone quantum metrics in finite dimensions. Particular emphasis is given to an unfolded perspective for quantum states that is built out of the spectral theorem and is naturally suited to investigate the comparison with the classical case of probability distributions.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "14369     K-Means clustering algorithm is one of the most commonly used clustering algorithms because of its simplicity and efficiency. K-Means clustering algorithm based on Euclidean distance only pays attention to the linear distance between samples, but ignores the overall distribution structure of the dataset (i.e. the fluid structure of dataset). Since it is difficult to describe the internal structure of two data points by Euclidean distance in high-dimensional data space, we propose a new distance measurement, namely, view-distance, and apply it to the K-Means algorithm. On the classical manifold learning datasets, S-curve and Swiss roll datasets, not only this new distance can cluster the data according to the structure of the data itself, but also the boundaries between categories are neat dividing lines. Moreover, we also tested the classification accuracy and clustering effect of the K-Means algorithm based on view-distance on some real-world datasets. The experimental results show that, on most datasets, the K-Means algorithm based on view-distance has a certain degree of improvement in classification accuracy and clustering effect.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "174903    In this paper we developed a numerical methodology to study some incompressible fluid flows without free surface, using the curvilinear coordinate system and whose edge geometry is constructed via parametrized spline. First, we discussed the representation of the Navier-Stokes and continuity equations on the curvilinear coordinate system, along with the auxiliary conditions. Then, we presented the numerical method -- a simplified version of MAC () method -- along with the discretization of the governing equations, which is carried out using the finite differences method and the implementation of the FOU () scheme. Finally, we applied the numerical methodology to the parallel plates problem, lid-driven cavity problem and atherosclerosis problem, and then we compare the results obtained with those presented in the literature.   Keywords: finite differences, simplified MAC, curvilinear coordinates, parallel plates, did-driven cavity, atherosclerosis.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  \n",
       "133557    We prove that any mapping torus of a pseudo-Anosov mapping class with bounded normalized Weil-Petersson translation length contains a finite set of transverse and level closed curves, and drilling out this set of curves results in one of a finite number of cusped hyperbolic 3-manifolds. The number of manifolds in the finite list depends only on the bound for normalized translation length. We also prove a complementary result that explains the necessity of removing level curves by producing new estimates for the Weil-Petersson translation length of compositions of pseudo-Anosov mapping classes and arbitrary powers of a Dehn twist.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "32750     We classify invariant probability measures for non-elementary groups of automorphisms, on any compact Kahler surface X, under the assumption that the group contains a so-called \"parabolic automorphism\". We also prove that except in certain rigid situations known as Kummer examples, there are only finitely many invariant, ergodic, probability measures with a Zariski dense support. If X is a K3 or Enriques surface, and the group does not preserve any algebraic subset, this leads to a complete description of orbit closures.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "7442      We study the Lane-Emden system LATEX  where LATEX  is a smooth bounded domain. In a recent work, we studied the concentration phenomena of positive solutions as LATEX  and LATEX  In this paper, we obtain sharp estimates of such multi-bubble solutions, including sharp convergence rates of local maxima and scaling parameters, and accurate approximations of solutions. As an application of these sharp estimates, we show that when LATEX  is convex, then the solution of this system is unique and nondegenerate for large LATEX                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "162934    We consider permutations of LATEX  obtained by LATEX  independent applications of random stirring. In each step the same marked stirring element is transposed with probability LATEX  with any one of the LATEX  elements. Normalizing by LATEX  we describe the asymptotic distribution of the cycle structure of these permutations, for all LATEX  as LATEX                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    \n",
       "1035      Processes having the same bridges are said to belong to the same reciprocal class. In this article we analyze reciprocal classes of Markov counting processes by identifying their reciprocal invariants and we characterize them as the set of counting processes satisfying some duality formula.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                \n",
       "29606     Large-scale environmental sensing with a finite number of mobile sensors is a challenging task that requires a lot of resources and time. This is especially true when features in the environment are spatiotemporally changing with unknown or partially known dynamics. Fortunately, these dynamic features often evolve in a low-dimensional space, making it possible to capture their dynamics sufficiently well with only one or several properly planned mobile sensors. This paper investigates the problem of dynamic compressed sensing of an unsteady flow field, which takes advantage of the inherently low dimensionality of the underlying flow dynamics to reduce number of waypoints for a mobile sensing robot. The optimal sensing waypoints are identified by an iterative compressed sensing algorithm that optimizes the flow reconstruction based on the proper orthogonal decomposition modes. An optimal sampling trajectory is then found to traverse these waypoints while minimizing the energy consumption, time, and flow reconstruction error. Simulation results in an unsteady double gyre flow field is presented to demonstrate the efficacy of the proposed algorithms. Experimental results with an indoor quadcopter are presented to show the feasibility of the resulting trajectory.                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "7622      We consider an electrically conductive compact two-dimensional Riemannian manifold with a smooth boundary. This setting defines a natural conductive Laplacian on the manifold and hence also voltage potentials, current fields and corresponding power densities arising from suitable boundary conditions. Motivated by Acousto-Electric Tomography we show that if the manifold has genus zero and the metric is known, then the conductivity can be recovered uniquely and constructively from knowledge of a few power densities. We illustrate the reconstruction procedure numerically by an example of a conductivity on a non-simply connected surface in three-space.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
       "102914    We introduce a new multilevel ensemble Kalman filter method (MLEnKF) which consists of a hierarchy of independent samples of ensemble Kalman filters (EnKF). This new MLEnKF method is fundamentally different from the preexisting method introduced by Hoel, Law and Tempone in 2016, and it is suitable for extensions towards multi-index Monte Carlo based filtering methods. Robust theoretical analysis and supporting numerical examples show that under appropriate regularity assumptions, the MLEnKF method has better complexity than plain vanilla EnKF in the large-ensemble and fine-resolution limits, for weak approximations of quantities of interest. The method is developed for discrete-time filtering problems with finite-dimensional state space and linear observations polluted by additive Gaussian noise.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            \n",
       "77326     A level set topology optimization approach that uses an auxiliary density field to nucleate holes during the optimization process and achieves minimum feature size control in optimized designs is explored. The level set field determines the solid-void interface, and the density field describes the distribution of a fictitious porous material using the solid isotropic material with penalization. These fields are governed by two sets of independent optimization variables which are initially coupled using a penalty for hole nucleation. The strength of the density field penalization and projection are gradually increased through the optimization process to promote a 0-1 density distribution. This treatment of the density field combined with a second penalty that regulates the evolution of the density field in the void phase, mitigate the appearance of small design features. The minimum feature size of optimized designs is controlled by the radius of the linear filter applied to the density optimization variables. The structural response is predicted by the extended finite element method, the sensitivities by the adjoint method, and the optimization variables are updated by a gradient-based optimization algorithm. Numerical examples investigate the robustness of this approach with respect to algorithmic parameters and mesh refinement. The results show the applicability of the combined density level set topology optimization approach for both optimal hole nucleation and for minimum feature size control in 2D and 3D. This comes, however, at the cost of a more advanced problem formulation and additional computational cost due to an increased number of optimization variables.    \n",
       "71427     In this article we define stable supercurves and super stable maps of genus zero via labeled trees. We prove that the moduli space of stable supercurves and super stable maps of fixed tree type are quotient superorbifolds. To this end, we prove a slice theorem for the action of super Lie groups on Riemannian supermanifolds and discuss superorbifolds. Furthermore, we propose a Gromov topology on super stable maps such that the restriction to fixed tree type yields the quotient topology from the superorbifolds and the reduction is compact. This would, possibly, lead to the notions of super Gromov-Witten invariants and small super quantum cohomology to be discussed in sequels.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         \n",
       "89892     A sequential effect algebra (SEA) is an effect algebra equipped with a sequential product operation modeled after the Luders product LATEX  on C*-algebras. A SEA is called normal when it has all suprema of directed sets, and the sequential product interacts suitably with these suprema. The effects on a Hilbert space and the unit interval of a von Neumann or JBW algebra are examples of normal SEAs that are in addition convex, i.e. possess a suitable action of the real unit interval on the algebra. Complete Boolean algebras form normal SEAs too, which are convex only when LATEX    We show that any normal SEA LATEX  splits as a direct sum LATEX  of a complete Boolean algebra LATEX  a convex normal SEA LATEX  and a newly identified type of normal SEA LATEX  we dub purely almost-convex. Along the way we show, among other things, that a SEA which contains only idempotents must be a Boolean algebra; and we establish a spectral theorem using which we settle for the class of normal SEAs a problem of Gudder regarding the uniqueness of square roots. After establishing our main result, we propose a simple extra axiom for normal SEAs that excludes the seemingly pathological a-convex SEAs. We conclude the paper by a study of SEAs with an associative sequential product. We find that associativity forces normal SEAs satisfying our new axiom to be commutative, shedding light on the question of why the sequential product in quantum theory should be non-associative.                                                                                                                                                                                                                                    \n",
       "\n",
       "               hyph_in_title  \\\n",
       "109208  None                   \n",
       "167970  None                   \n",
       "83749   None                   \n",
       "127567  [2-Local, W-algebra]   \n",
       "42123   None                   \n",
       "45450   None                   \n",
       "26834   None                   \n",
       "14369   [K-Means]              \n",
       "174903  None                   \n",
       "133557  [Weil-Petersson]       \n",
       "32750   None                   \n",
       "7442    [Lane-Emden]           \n",
       "162934  None                   \n",
       "1035    None                   \n",
       "29606   None                   \n",
       "7622    [two-dimensional]      \n",
       "102914  None                   \n",
       "77326   None                   \n",
       "71427   [Neveu-Schwarz]        \n",
       "89892   None                   \n",
       "\n",
       "                                                                                                                         hyph_in_abstract  \n",
       "109208  [two-fold]                                                                                                                         \n",
       "167970  [linear-time, off-the-shelf, distribution-independent, common-practice]                                                            \n",
       "83749   None                                                                                                                               \n",
       "127567  [2-local, W-algebra, infinite-dimensional, 2-local, W-algebra, 2-local, 2-local]                                                   \n",
       "42123   None                                                                                                                               \n",
       "45450   None                                                                                                                               \n",
       "26834   None                                                                                                                               \n",
       "14369   [K-Means, K-Means, high-dimensional, view-distance, K-Means, S-curve, K-Means, view-distance, real-world, K-Means, view-distance]  \n",
       "174903  [Navier-Stokes, lid-driven, did-driven]                                                                                            \n",
       "133557  [pseudo-Anosov, Weil-Petersson, 3-manifolds, Weil-Petersson, pseudo-Anosov]                                                        \n",
       "32750   [non-elementary, so-called]                                                                                                        \n",
       "7442    [Lane-Emden, multi-bubble]                                                                                                         \n",
       "162934  None                                                                                                                               \n",
       "1035    None                                                                                                                               \n",
       "29606   [Large-scale, low-dimensional]                                                                                                     \n",
       "7622    [two-dimensional, Acousto-Electric, non-simply, three-space]                                                                       \n",
       "102914  [multi-index, large-ensemble, fine-resolution, discrete-time, finite-dimensional]                                                  \n",
       "77326   [solid-void, 0-1, gradient-based]                                                                                                  \n",
       "71427   [Gromov-Witten]                                                                                                                    \n",
       "89892   [almost-convex, a-convex, non-associative]                                                                                         "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.sample(20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>raw_title</th>\n",
       "      <th>clean_title</th>\n",
       "      <th>hyph_in_title</th>\n",
       "      <th>raw_abstract</th>\n",
       "      <th>clean_abstract</th>\n",
       "      <th>hyph_in_abstract</th>\n",
       "      <th>authors_parsed</th>\n",
       "      <th>cat</th>\n",
       "      <th>update_date</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Vertex representations via finite groups and the McKay correspondence</td>\n",
       "      <td>Vertex representations via finite groups and the McKay correspondence</td>\n",
       "      <td>None</td>\n",
       "      <td>Given a finite group $\\Gamma$ and a virtual character $\\wt$ on it, we construct a Fock space and associated vertex operators in terms of representation ring of wreath products $\\Gamma\\sim S_n$. We recover the character tables of wreath products $\\Gamma\\sim S_n$ by vertex operator calculus. When $\\Gamma$ is a finite subgroup of $SU_2$, our construction yields a group theoretic realization of the basic representations of the affine and toroidal Lie algebras of $ADE$ type, which can be regarded as a new form of McKay correspondence.</td>\n",
       "      <td>Given a finite group LATEX  and a virtual character LATEX  on it, we construct a Fock space and associated vertex operators in terms of representation ring of wreath products LATEX  We recover the character tables of wreath products LATEX  by vertex operator calculus. When LATEX  is a finite subgroup of LATEX  our construction yields a group theoretic realization of the basic representations of the affine and toroidal Lie algebras of LATEX  type, which can be regarded as a new form of McKay correspondence.</td>\n",
       "      <td>None</td>\n",
       "      <td>[['Frenkel', 'Igor', ''], ['Jing', 'Naihuan', ''], ['Wang', 'Weiqiang', '']]</td>\n",
       "      <td>[math.QA, hep-th, math.RT]</td>\n",
       "      <td>2023-05-19</td>\n",
       "      <td>math/9907166</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Categoricity and amalgamation for AEC and $ \\kappa $ measurable</td>\n",
       "      <td>Categoricity and amalgamation for AEC and LATEX  measurable</td>\n",
       "      <td>None</td>\n",
       "      <td>In the original version of this paper, we assume a theory $T$ that the logic $\\mathbb L _{\\kappa, \\aleph_{0}}$ is categorical in a cardinal $\\lambda &gt; \\kappa$, and $\\kappa$ is a measurable cardinal. There we prove that the class of model of $T$ of cardinality $&lt;\\lambda$ (but $\\geq |T|+\\kappa$) has the amalgamation property; this is a step toward understanding the character of such classes of models.   In this revised version we replaced the class of models of $T$ by $\\mathfrak k$, an AEC (abstract elementary class) which has LS-number ${&lt;} \\, \\kappa,$ or at least which behave nicely for ultrapowers by $D$, a normal ultra-filter on $\\kappa$.   Presently sub-section \\S1A deals with $T \\subseteq \\mathbb L_{\\kappa^{+}, \\aleph_{0}}$ (and so does a large part of the introduction and little in the rest of \\S1), but otherwise, all is done in the context of AEC.</td>\n",
       "      <td>In the original version of this paper, we assume a theory LATEX  that the logic LATEX  is categorical in a cardinal LATEX  and LATEX  is a measurable cardinal. There we prove that the class of model of LATEX  of cardinality LATEX  (but LATEX  has the amalgamation property; this is a step toward understanding the character of such classes of models.   In this revised version we replaced the class of models of LATEX  by LATEX  an AEC (abstract elementary class) which has LS-number LATEX  or at least which behave nicely for ultrapowers by LATEX  a normal ultra-filter on LATEX    Presently sub-section \\S1A deals with LATEX  (and so does a large part of the introduction and little in the rest of \\S1), but otherwise, all is done in the context of AEC.</td>\n",
       "      <td>[LS-number, ultra-filter, sub-section]</td>\n",
       "      <td>[['Kolman', 'Oren', ''], ['Shelah', 'Saharon', '']]</td>\n",
       "      <td>[math.LO]</td>\n",
       "      <td>2023-05-19</td>\n",
       "      <td>math/9602216</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>From Loop Groups to 2-Groups</td>\n",
       "      <td>From Loop Groups to 2-Groups</td>\n",
       "      <td>[2-Groups]</td>\n",
       "      <td>We describe an interesting relation between Lie 2-algebras, the Kac-Moody central extensions of loop groups, and the group $\\mathrm{String}(n)$. A Lie 2-algebra is a categorified version of a Lie algebra where the Jacobi identity holds up to a natural isomorphism called the \"Jacobiator\". Similarly, a Lie 2-group is a categorified version of a Lie group. If $G$ is a simply-connected compact simple Lie group, there is a 1-parameter family of Lie 2-algebras $\\mathfrak{g}_k$ each having $\\mathrm{Lie}(G)$ as its Lie algebra of objects, but with a Jacobiator built from the canonical 3-form on $G$. There appears to be no Lie 2-group having $\\mathfrak{g}_k$ as its Lie 2-algebra, except when $k = 0$. Here, however, we construct for integral k an infinite-dimensional Lie 2-group whose Lie 2-algebra is equivalent to $\\mathfrak{g}_k$. The objects of this 2-group are based paths in $G$, while the automorphisms of any object form the level-$k$ Kac-Moody central extension of the loop group of $G$. This 2-group is closely related to the $k$th power of the canonical gerbe over $G$. Its nerve gives a topological group that is an extension of $G$ by $K(\\mathbb{Z},2)$. When $k = \\pm 1$, this topological group can also be obtained by killing the third homotopy group of $G$. Thus, when $G = \\mathrm{Spin}(n)$, it is none other than $\\mathrm{String}(n)$.</td>\n",
       "      <td>We describe an interesting relation between Lie 2-algebras, the Kac-Moody central extensions of loop groups, and the group LATEX  A Lie 2-algebra is a categorified version of a Lie algebra where the Jacobi identity holds up to a natural isomorphism called the \"Jacobiator\". Similarly, a Lie 2-group is a categorified version of a Lie group. If LATEX  is a simply-connected compact simple Lie group, there is a 1-parameter family of Lie 2-algebras LATEX  each having LATEX  as its Lie algebra of objects, but with a Jacobiator built from the canonical 3-form on LATEX  There appears to be no Lie 2-group having LATEX  as its Lie 2-algebra, except when LATEX  Here, however, we construct for integral k an infinite-dimensional Lie 2-group whose Lie 2-algebra is equivalent to LATEX  The objects of this 2-group are based paths in LATEX  while the automorphisms of any object form the level-$k$ Kac-Moody central extension of the loop group of LATEX  This 2-group is closely related to the LATEX  power of the canonical gerbe over LATEX  Its nerve gives a topological group that is an extension of LATEX  by LATEX  When LATEX  this topological group can also be obtained by killing the third homotopy group of LATEX  Thus, when LATEX  it is none other than LATEX</td>\n",
       "      <td>[2-algebras, Kac-Moody, 2-algebra, 2-group, simply-connected, 1-parameter, 2-algebras, 3-form, 2-group, 2-algebra, infinite-dimensional, 2-group, 2-algebra, 2-group, Kac-Moody, 2-group]</td>\n",
       "      <td>[['Baez', 'John C.', ''], ['Crans', 'Alissa S.', ''], ['Stevenson', 'Danny', ''], ['Schreiber', 'Urs', '']]</td>\n",
       "      <td>[math.QA, hep-th, math.DG]</td>\n",
       "      <td>2023-05-16</td>\n",
       "      <td>math/0504123</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Finite Supersymmetry Transformations</td>\n",
       "      <td>Finite Supersymmetry Transformations</td>\n",
       "      <td>None</td>\n",
       "      <td>We investigate simple examples of supersymmetry algebras with real and Grassmann parameters. Special attention is payed to the finite supertransformations and their probability interpretation. Furthermore we look for combinations of bosons and fermions which are invariant under supertransformations. These combinations correspond to states that are highly entangled.</td>\n",
       "      <td>We investigate simple examples of supersymmetry algebras with real and Grassmann parameters. Special attention is payed to the finite supertransformations and their probability interpretation. Furthermore we look for combinations of bosons and fermions which are invariant under supertransformations. These combinations correspond to states that are highly entangled.</td>\n",
       "      <td>None</td>\n",
       "      <td>[['Ilieva', 'Nevena', ''], ['Narnhofer', 'Heide', ''], ['Thirring', 'Walter', '']]</td>\n",
       "      <td>[quant-ph, hep-th, math-ph, math.MP]</td>\n",
       "      <td>2023-05-09</td>\n",
       "      <td>quant-ph/0401139</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Super black box (formerly: Middle diamond)</td>\n",
       "      <td>Super black box (formerly: Middle diamond)</td>\n",
       "      <td>None</td>\n",
       "      <td>This is a slightly corrected version of an old work.   Under certain cardinal arithmetic assumptions, we prove that for every large enough regular $\\lambda$ cardinal, for many regular $\\kappa &lt; \\lambda$, many stationary subsets of $\\lambda$ concentrating on cofinality $\\kappa$ have super BB. In particular, we have the super BB on $\\{\\delta &lt; \\lambda \\colon cf(\\delta) = \\kappa\\}$. This is a strong negation of uniformization.   We have added some details. Works continuing it are [Sh:898] and [Sh:1028]. We thank Ari Brodski and Adi Jarden for their helpful comments.   In this paper we had earlier used the notion ``middle diamond\" which is now replaced by ``super BB'', that is, ``super black box'', in order to be consistent with other papers (see [Sh:898]).</td>\n",
       "      <td>This is a slightly corrected version of an old work.   Under certain cardinal arithmetic assumptions, we prove that for every large enough regular LATEX  cardinal, for many regular LATEX  many stationary subsets of LATEX  concentrating on cofinality LATEX  have super BB. In particular, we have the super BB on LATEX  This is a strong negation of uniformization.   We have added some details. Works continuing it are [Sh:898] and [Sh:1028]. We thank Ari Brodski and Adi Jarden for their helpful comments.   In this paper we had earlier used the notion ``middle diamond\" which is now replaced by ``super BB'', that is, ``super black box'', in order to be consistent with other papers (see [Sh:898]).</td>\n",
       "      <td>None</td>\n",
       "      <td>[['Shelah', 'Saharon', '']]</td>\n",
       "      <td>[math.LO]</td>\n",
       "      <td>2023-05-04</td>\n",
       "      <td>math/0212249</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                               raw_title  \\\n",
       "0  Vertex representations via finite groups and the McKay correspondence   \n",
       "1  Categoricity and amalgamation for AEC and $ \\kappa $ measurable         \n",
       "2  From Loop Groups to 2-Groups                                            \n",
       "3  Finite Supersymmetry Transformations                                    \n",
       "4  Super black box (formerly: Middle diamond)                              \n",
       "\n",
       "                                                             clean_title  \\\n",
       "0  Vertex representations via finite groups and the McKay correspondence   \n",
       "1  Categoricity and amalgamation for AEC and LATEX  measurable             \n",
       "2  From Loop Groups to 2-Groups                                            \n",
       "3  Finite Supersymmetry Transformations                                    \n",
       "4  Super black box (formerly: Middle diamond)                              \n",
       "\n",
       "  hyph_in_title  \\\n",
       "0  None           \n",
       "1  None           \n",
       "2  [2-Groups]     \n",
       "3  None           \n",
       "4  None           \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  raw_abstract  \\\n",
       "0    Given a finite group $\\Gamma$ and a virtual character $\\wt$ on it, we construct a Fock space and associated vertex operators in terms of representation ring of wreath products $\\Gamma\\sim S_n$. We recover the character tables of wreath products $\\Gamma\\sim S_n$ by vertex operator calculus. When $\\Gamma$ is a finite subgroup of $SU_2$, our construction yields a group theoretic realization of the basic representations of the affine and toroidal Lie algebras of $ADE$ type, which can be regarded as a new form of McKay correspondence.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n",
       "1    In the original version of this paper, we assume a theory $T$ that the logic $\\mathbb L _{\\kappa, \\aleph_{0}}$ is categorical in a cardinal $\\lambda > \\kappa$, and $\\kappa$ is a measurable cardinal. There we prove that the class of model of $T$ of cardinality $<\\lambda$ (but $\\geq |T|+\\kappa$) has the amalgamation property; this is a step toward understanding the character of such classes of models.   In this revised version we replaced the class of models of $T$ by $\\mathfrak k$, an AEC (abstract elementary class) which has LS-number ${<} \\, \\kappa,$ or at least which behave nicely for ultrapowers by $D$, a normal ultra-filter on $\\kappa$.   Presently sub-section \\S1A deals with $T \\subseteq \\mathbb L_{\\kappa^{+}, \\aleph_{0}}$ (and so does a large part of the introduction and little in the rest of \\S1), but otherwise, all is done in the context of AEC.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "2    We describe an interesting relation between Lie 2-algebras, the Kac-Moody central extensions of loop groups, and the group $\\mathrm{String}(n)$. A Lie 2-algebra is a categorified version of a Lie algebra where the Jacobi identity holds up to a natural isomorphism called the \"Jacobiator\". Similarly, a Lie 2-group is a categorified version of a Lie group. If $G$ is a simply-connected compact simple Lie group, there is a 1-parameter family of Lie 2-algebras $\\mathfrak{g}_k$ each having $\\mathrm{Lie}(G)$ as its Lie algebra of objects, but with a Jacobiator built from the canonical 3-form on $G$. There appears to be no Lie 2-group having $\\mathfrak{g}_k$ as its Lie 2-algebra, except when $k = 0$. Here, however, we construct for integral k an infinite-dimensional Lie 2-group whose Lie 2-algebra is equivalent to $\\mathfrak{g}_k$. The objects of this 2-group are based paths in $G$, while the automorphisms of any object form the level-$k$ Kac-Moody central extension of the loop group of $G$. This 2-group is closely related to the $k$th power of the canonical gerbe over $G$. Its nerve gives a topological group that is an extension of $G$ by $K(\\mathbb{Z},2)$. When $k = \\pm 1$, this topological group can also be obtained by killing the third homotopy group of $G$. Thus, when $G = \\mathrm{Spin}(n)$, it is none other than $\\mathrm{String}(n)$.    \n",
       "3    We investigate simple examples of supersymmetry algebras with real and Grassmann parameters. Special attention is payed to the finite supertransformations and their probability interpretation. Furthermore we look for combinations of bosons and fermions which are invariant under supertransformations. These combinations correspond to states that are highly entangled.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "4    This is a slightly corrected version of an old work.   Under certain cardinal arithmetic assumptions, we prove that for every large enough regular $\\lambda$ cardinal, for many regular $\\kappa < \\lambda$, many stationary subsets of $\\lambda$ concentrating on cofinality $\\kappa$ have super BB. In particular, we have the super BB on $\\{\\delta < \\lambda \\colon cf(\\delta) = \\kappa\\}$. This is a strong negation of uniformization.   We have added some details. Works continuing it are [Sh:898] and [Sh:1028]. We thank Ari Brodski and Adi Jarden for their helpful comments.   In this paper we had earlier used the notion ``middle diamond\" which is now replaced by ``super BB'', that is, ``super black box'', in order to be consistent with other papers (see [Sh:898]).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    clean_abstract  \\\n",
       "0    Given a finite group LATEX  and a virtual character LATEX  on it, we construct a Fock space and associated vertex operators in terms of representation ring of wreath products LATEX  We recover the character tables of wreath products LATEX  by vertex operator calculus. When LATEX  is a finite subgroup of LATEX  our construction yields a group theoretic realization of the basic representations of the affine and toroidal Lie algebras of LATEX  type, which can be regarded as a new form of McKay correspondence.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "1    In the original version of this paper, we assume a theory LATEX  that the logic LATEX  is categorical in a cardinal LATEX  and LATEX  is a measurable cardinal. There we prove that the class of model of LATEX  of cardinality LATEX  (but LATEX  has the amalgamation property; this is a step toward understanding the character of such classes of models.   In this revised version we replaced the class of models of LATEX  by LATEX  an AEC (abstract elementary class) which has LS-number LATEX  or at least which behave nicely for ultrapowers by LATEX  a normal ultra-filter on LATEX    Presently sub-section \\S1A deals with LATEX  (and so does a large part of the introduction and little in the rest of \\S1), but otherwise, all is done in the context of AEC.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             \n",
       "2    We describe an interesting relation between Lie 2-algebras, the Kac-Moody central extensions of loop groups, and the group LATEX  A Lie 2-algebra is a categorified version of a Lie algebra where the Jacobi identity holds up to a natural isomorphism called the \"Jacobiator\". Similarly, a Lie 2-group is a categorified version of a Lie group. If LATEX  is a simply-connected compact simple Lie group, there is a 1-parameter family of Lie 2-algebras LATEX  each having LATEX  as its Lie algebra of objects, but with a Jacobiator built from the canonical 3-form on LATEX  There appears to be no Lie 2-group having LATEX  as its Lie 2-algebra, except when LATEX  Here, however, we construct for integral k an infinite-dimensional Lie 2-group whose Lie 2-algebra is equivalent to LATEX  The objects of this 2-group are based paths in LATEX  while the automorphisms of any object form the level-$k$ Kac-Moody central extension of the loop group of LATEX  This 2-group is closely related to the LATEX  power of the canonical gerbe over LATEX  Its nerve gives a topological group that is an extension of LATEX  by LATEX  When LATEX  this topological group can also be obtained by killing the third homotopy group of LATEX  Thus, when LATEX  it is none other than LATEX     \n",
       "3    We investigate simple examples of supersymmetry algebras with real and Grassmann parameters. Special attention is payed to the finite supertransformations and their probability interpretation. Furthermore we look for combinations of bosons and fermions which are invariant under supertransformations. These combinations correspond to states that are highly entangled.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "4    This is a slightly corrected version of an old work.   Under certain cardinal arithmetic assumptions, we prove that for every large enough regular LATEX  cardinal, for many regular LATEX  many stationary subsets of LATEX  concentrating on cofinality LATEX  have super BB. In particular, we have the super BB on LATEX  This is a strong negation of uniformization.   We have added some details. Works continuing it are [Sh:898] and [Sh:1028]. We thank Ari Brodski and Adi Jarden for their helpful comments.   In this paper we had earlier used the notion ``middle diamond\" which is now replaced by ``super BB'', that is, ``super black box'', in order to be consistent with other papers (see [Sh:898]).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      \n",
       "\n",
       "                                                                                                                                                                            hyph_in_abstract  \\\n",
       "0  None                                                                                                                                                                                        \n",
       "1  [LS-number, ultra-filter, sub-section]                                                                                                                                                      \n",
       "2  [2-algebras, Kac-Moody, 2-algebra, 2-group, simply-connected, 1-parameter, 2-algebras, 3-form, 2-group, 2-algebra, infinite-dimensional, 2-group, 2-algebra, 2-group, Kac-Moody, 2-group]   \n",
       "3  None                                                                                                                                                                                        \n",
       "4  None                                                                                                                                                                                        \n",
       "\n",
       "                                                                                                authors_parsed  \\\n",
       "0  [['Frenkel', 'Igor', ''], ['Jing', 'Naihuan', ''], ['Wang', 'Weiqiang', '']]                                  \n",
       "1  [['Kolman', 'Oren', ''], ['Shelah', 'Saharon', '']]                                                           \n",
       "2  [['Baez', 'John C.', ''], ['Crans', 'Alissa S.', ''], ['Stevenson', 'Danny', ''], ['Schreiber', 'Urs', '']]   \n",
       "3  [['Ilieva', 'Nevena', ''], ['Narnhofer', 'Heide', ''], ['Thirring', 'Walter', '']]                            \n",
       "4  [['Shelah', 'Saharon', '']]                                                                                   \n",
       "\n",
       "                                    cat update_date                id  \n",
       "0  [math.QA, hep-th, math.RT]           2023-05-19   math/9907166      \n",
       "1  [math.LO]                            2023-05-19   math/9602216      \n",
       "2  [math.QA, hep-th, math.DG]           2023-05-16   math/0504123      \n",
       "3  [quant-ph, hep-th, math-ph, math.MP] 2023-05-09   quant-ph/0401139  \n",
       "4  [math.LO]                            2023-05-04   math/0212249      "
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Column order for the cleaned dataset: raw and cleaned text side by side\n",
    "cols = ['raw_title', 'clean_title', 'hyph_in_title',\n",
    "        'raw_abstract', 'clean_abstract', 'hyph_in_abstract',\n",
    "        'authors_parsed', 'cat', 'update_date', 'id']\n",
    "\n",
    "cleaned_data = pd.DataFrame(columns=cols)\n",
    "\n",
    "# Copy every column that already exists under the same name in `data`\n",
    "for name in cols:\n",
    "    if name not in ['raw_title', 'raw_abstract']:\n",
    "        cleaned_data[name] = data[name]\n",
    "\n",
    "# The original text columns are stored under the 'raw_' prefix\n",
    "cleaned_data['raw_title'] = data['title']\n",
    "cleaned_data['raw_abstract'] = data['abstract']\n",
    "\n",
    "cleaned_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the cleaned data to a Parquet file\n",
    "\n",
    "cleaned_data.to_parquet('./data/arXiv_clean.parquet')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To do: include the cleaning utilities applied to the data above in the util file."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}