Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeNeural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology
While many approaches to make neural networks more fathomable have been proposed, they are restricted to interrogating the network with input data. Measures for characterizing and monitoring structural properties, however, have not been developed. In this work, we propose neural persistence, a complexity measure for neural network architectures based on topological data analysis on weighted stratified graphs. To demonstrate the usefulness of our approach, we show that neural persistence reflects best practices developed in the deep learning community such as dropout and batch normalization. Moreover, we derive a neural persistence-based stopping criterion that shortens the training process while achieving comparable accuracies as early stopping based on validation loss.
Persistent-Homology-based Machine Learning and its Applications -- A Survey
A suitable feature representation that can both preserve the data intrinsic information and reduce data complexity and dimensionality is key to the performance of machine learning models. Deeply rooted in algebraic topology, persistent homology (PH) provides a delicate balance between data simplification and intrinsic structure characterization, and has been applied to various areas successfully. However, the combination of PH and machine learning has been hindered greatly by three challenges, namely topological representation of data, PH-based distance measurements or metrics, and PH-based feature representation. With the development of topological data analysis, progresses have been made on all these three problems, but widely scattered in different literatures. In this paper, we provide a systematical review of PH and PH-based supervised and unsupervised models from a computational perspective. Our emphasizes are the recent development of mathematical models and tools, including PH softwares and PH-based functions, feature representations, kernels, and similarity models. Essentially, this paper can work as a roadmap for the practical application of PH-based machine learning tools. Further, we consider different topological feature representations in different machine learning models, and investigate their impacts on the protein secondary structure classification.
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
Sheaf Neural Networks with Connection Laplacians
A Sheaf Neural Network (SNN) is a type of Graph Neural Network (GNN) that operates on a sheaf, an object that equips a graph with vector spaces over its nodes and edges and linear maps between these spaces. SNNs have been shown to have useful theoretical properties that help tackle issues arising from heterophily and over-smoothing. One complication intrinsic to these models is finding a good sheaf for the task to be solved. Previous works proposed two diametrically opposed approaches: manually constructing the sheaf based on domain knowledge and learning the sheaf end-to-end using gradient-based methods. However, domain knowledge is often insufficient, while learning a sheaf could lead to overfitting and significant computational overhead. In this work, we propose a novel way of computing sheaves drawing inspiration from Riemannian geometry: we leverage the manifold assumption to compute manifold-and-graph-aware orthogonal maps, which optimally align the tangent spaces of neighbouring data points. We show that this approach achieves promising results with less computational overhead when compared to previous SNN models. Overall, this work provides an interesting connection between algebraic topology and differential geometry, and we hope that it will spark future research in this direction.
A Framework for Fast and Stable Representations of Multiparameter Persistent Homology Decompositions
Topological data analysis (TDA) is an area of data science that focuses on using invariants from algebraic topology to provide multiscale shape descriptors for geometric data sets such as point clouds. One of the most important such descriptors is {\em persistent homology}, which encodes the change in shape as a filtration parameter changes; a typical parameter is the feature scale. For many data sets, it is useful to simultaneously vary multiple filtration parameters, for example feature scale and density. While the theoretical properties of single parameter persistent homology are well understood, less is known about the multiparameter case. In particular, a central question is the problem of representing multiparameter persistent homology by elements of a vector space for integration with standard machine learning algorithms. Existing approaches to this problem either ignore most of the multiparameter information to reduce to the one-parameter case or are heuristic and potentially unstable in the face of noise. In this article, we introduce a new general representation framework that leverages recent results on {\em decompositions} of multiparameter persistent homology. This framework is rich in information, fast to compute, and encompasses previous approaches. Moreover, we establish theoretical stability guarantees under this framework as well as efficient algorithms for practical computation, making this framework an applicable and versatile tool for analyzing geometric and point cloud data. We validate our stability results and algorithms with numerical experiments that demonstrate statistical convergence, prediction accuracy, and fast running times on several real data sets.
Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs
Cellular sheaves equip graphs with a "geometrical" structure by assigning vector spaces and linear maps to nodes and edges. Graph Neural Networks (GNNs) implicitly assume a graph with a trivial underlying sheaf. This choice is reflected in the structure of the graph Laplacian operator, the properties of the associated diffusion equation, and the characteristics of the convolutional models that discretise this equation. In this paper, we use cellular sheaf theory to show that the underlying geometry of the graph is deeply linked with the performance of GNNs in heterophilic settings and their oversmoothing behaviour. By considering a hierarchy of increasingly general sheaves, we study how the ability of the sheaf diffusion process to achieve linear separation of the classes in the infinite time limit expands. At the same time, we prove that when the sheaf is non-trivial, discretised parametric diffusion processes have greater control than GNNs over their asymptotic behaviour. On the practical side, we study how sheaves can be learned from data. The resulting sheaf diffusion models have many desirable properties that address the limitations of classical graph diffusion equations (and corresponding GNN models) and obtain competitive results in heterophilic settings. Overall, our work provides new connections between GNNs and algebraic topology and would be of interest to both fields.
Weighting vectors for machine learning: numerical harmonic analysis applied to boundary detection
Metric space magnitude, an active field of research in algebraic topology, is a scalar quantity that summarizes the effective number of distinct points that live in a general metric space. The {\em weighting vector} is a closely-related concept that captures, in a nontrivial way, much of the underlying geometry of the original metric space. Recent work has demonstrated that when the metric space is Euclidean, the weighting vector serves as an effective tool for boundary detection. We recast this result and show the weighting vector may be viewed as a solution to a kernelized SVM. As one consequence, we apply this new insight to the task of outlier detection, and we demonstrate performance that is competitive or exceeds performance of state-of-the-art techniques on benchmark data sets. Under mild assumptions, we show the weighting vector, which has computational cost of matrix inversion, can be efficiently approximated in linear time. We show how nearest neighbor methods can approximate solutions to the minimization problems defined by SVMs.
Practical applications of metric space magnitude and weighting vectors
Metric space magnitude, an active subject of research in algebraic topology, originally arose in the context of biology, where it was used to represent the effective number of distinct species in an environment. In a more general setting, the magnitude of a metric space is a real number that aims to quantify the effective number of distinct points in the space. The contribution of each point to a metric space's global magnitude, which is encoded by the {\em weighting vector}, captures much of the underlying geometry of the original metric space. Surprisingly, when the metric space is Euclidean, the weighting vector also serves as an effective tool for boundary detection. This allows the weighting vector to serve as the foundation of novel algorithms for classic machine learning tasks such as classification, outlier detection and active learning. We demonstrate, using experiments and comparisons on classic benchmark datasets, the promise of the proposed magnitude and weighting vector-based approaches.
Approximating the Convex Hull via Metric Space Magnitude
Magnitude of a finite metric space and the related notion of magnitude functions on metric spaces is an active area of research in algebraic topology. Magnitude originally arose in the context of biology, where it represents the number of effective species in an environment; when applied to a one-parameter family of metric spaces tX with scale parameter t, the magnitude captures much of the underlying geometry of the space. Prior work has mostly focussed on properties of magnitude in a global sense; in this paper we restrict the sets to finite subsets of Euclidean space and investigate its individual components. We give an explicit formula for the corrected inclusion-exclusion principle, and define a quantity associated with each point, called the moment which gives an intrinsic ordering to the points. We exploit this in order to form an algorithm which approximates the convex hull.
LeanAgent: Lifelong Learning for Formal Theorem Proving
Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully proves 162 theorems previously unproved by humans across 23 diverse Lean repositories, many from advanced mathematics. It performs up to 11times better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem proving performance.
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently nonEuclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-time, to topologically complex interactions between neurons in the brain, to the algebraic transformations describing symmetries of physical systems. Extracting knowledge from such non-Euclidean data necessitates a broader mathematical perspective. Echoing the 19th-century revolutions that gave rise to non-Euclidean geometry, an emerging line of research is redefining modern machine learning with non-Euclidean structures. Its goal: generalizing classical methods to unconventional data types with geometry, topology, and algebra. In this review, we provide an accessible gateway to this fast-growing field and propose a graphical taxonomy that integrates recent advances into an intuitive unified framework. We subsequently extract insights into current challenges and highlight exciting opportunities for future development in this field.
A Phenomenological Approach to Interactive Knot Diagrams
Knot diagrams are among the most common visual tools in topology. Computer programs now make it possible to draw, manipulate and render them digitally, which proves to be useful in knot theory teaching and research. Still, an openly available tool to manipulate knot diagrams in a real-time, interactive way is yet to be developed. We introduce a method of operating on the geometry of the knot diagram itself without any underlying three-dimensional structure that can underpin such an application. This allows us to directly interact with vector graphics knot diagrams while at the same time computing knot invariants in ways proposed by previous work. An implementation of this method is provided.
Construction of simplicial complexes with prescribed degree-size sequences
We study the realizability of simplicial complexes with a given pair of integer sequences, representing the node degree distribution and the facet size distribution, respectively. While the s-uniform variant of the problem is NP-complete when s geq 3, we identify two populations of input sequences, most of which can be solved in polynomial time using a recursive algorithm that we contribute. Combining with a sampler for the simplicial configuration model [J.-G. Young et al., Phys. Rev. E 96, 032312 (2017)], we facilitate the efficient sampling of simplicial ensembles from arbitrary degree and size distributions. We find that, contrary to expectations based on dyadic networks, increasing the nodes' degrees reduces the number of loops in simplicial complexes. Our work unveils a fundamental constraint on the degree-size sequences and sheds light on further analysis of higher-order phenomena based on local structures.
On the Expressivity of Persistent Homology in Graph Learning
Persistent homology, a technique from computational topology, has recently shown strong empirical performance in the context of graph classification. Being able to capture long range graph properties via higher-order topological features, such as cycles of arbitrary length, in combination with multi-scale topological descriptors, has improved predictive performance for data sets with prominent topological structures, such as molecules. At the same time, the theoretical properties of persistent homology have not been formally assessed in this context. This paper intends to bridge the gap between computational topology and graph machine learning by providing a brief introduction to persistent homology in the context of graphs, as well as a theoretical discussion and empirical analysis of its expressivity for graph learning tasks.
Probability, valuations, hyperspace: Three monads on Top and the support as a morphism
We consider three monads on Top, the category of topological spaces, which formalize topological aspects of probability and possibility in categorical terms. The first one is the Hoare hyperspace monad H, which assigns to every space its space of closed subsets equipped with the lower Vietoris topology. The second is the monad V of continuous valuations, also known as the extended probabilistic powerdomain. We construct both monads in a unified way in terms of double dualization. This reveals a close analogy between them, and allows us to prove that the operation of taking the support of a continuous valuation is a morphism of monads from V to H. In particular, this implies that every H-algebra (topological complete semilattice) is also a V-algebra. Third, we show that V can be restricted to a submonad of tau-smooth probability measures on Top. By composing these two morphisms of monads, we obtain that taking the support of a tau-smooth probability measure is also a morphism of monads.
Open Gromov-Witten theory on Calabi-Yau three-folds I
We propose a general theory of the Open Gromov-Witten invariant on Calabi-Yau three-folds. We introduce the moduli space of multi-curves and show how it leads to invariants. Our construction is based on an idea of Witten. In the special case that each connected component of the Lagrangian submanifold has the rational homology of a sphere we define rational numbers F_{g,h} for each genus g and h boundary components.
A Heegaard-Floer TQFT for link cobordisms
We introduce a Heegaard-Floer homology functor from the category of oriented links in closed 3-manifolds and oriented surface cobordisms in 4-manifolds connecting them to the category of F[v]-modules and F[v]-homomorphisms between them, where F is the field with two elements. In comparison with previously defined TQFTs for decorated links and link cobordisms, the construction of this paper has the advantage of being independent from the decoration. Some of the basic properties of this functor are also explored.
Topological structure of complex predictions
Complex prediction models such as deep learning are the output from fitting machine learning, neural networks, or AI models to a set of training data. These are now standard tools in science. A key challenge with the current generation of models is that they are highly parameterized, which makes describing and interpreting the prediction strategies difficult. We use topological data analysis to transform these complex prediction models into pictures representing a topological view. The result is a map of the predictions that enables inspection. The methods scale up to large datasets across different domains and enable us to detect labeling errors in training data, understand generalization in image classification, and inspect predictions of likely pathogenic mutations in the BRCA1 gene.
Positive Geometries and Canonical Forms
Recent years have seen a surprising connection between the physics of scattering amplitudes and a class of mathematical objects--the positive Grassmannian, positive loop Grassmannians, tree and loop Amplituhedra--which have been loosely referred to as "positive geometries". The connection between the geometry and physics is provided by a unique differential form canonically determined by the property of having logarithmic singularities (only) on all the boundaries of the space, with residues on each boundary given by the canonical form on that boundary. In this paper we initiate an exploration of "positive geometries" and "canonical forms" as objects of study in their own right in a more general mathematical setting. We give a precise definition of positive geometries and canonical forms, introduce general methods for finding forms for more complicated positive geometries from simpler ones, and present numerous examples of positive geometries in projective spaces, Grassmannians, and toric, cluster and flag varieties. We also illustrate a number of strategies for computing canonical forms which yield interesting representations for the forms associated with wide classes of positive geometries, ranging from the simplest Amplituhedra to new expressions for the volume of arbitrary convex polytopes.
Homotopy Limits and Homotopy Colimits of Chain Complexes
We prove that the homotopy limits and homotopy colimits of chain complexes can be computed by the cobar and bar constructions. We also show that the totalizations of double complexes compute the homotopy limits and homotopy colimits of simplicial and cosimplicial chain complexes.
Clifford Group Equivariant Simplicial Message Passing Networks
We introduce Clifford Group Equivariant Simplicial Message Passing Networks, a method for steerable E(n)-equivariant message passing on simplicial complexes. Our method integrates the expressivity of Clifford group-equivariant layers with simplicial message passing, which is topologically more intricate than regular graph message passing. Clifford algebras include higher-order objects such as bivectors and trivectors, which express geometric features (e.g., areas, volumes) derived from vectors. Using this knowledge, we represent simplex features through geometric products of their vertices. To achieve efficient simplicial message passing, we share the parameters of the message network across different dimensions. Additionally, we restrict the final message to an aggregation of the incoming messages from different dimensions, leading to what we term shared simplicial message passing. Experimental results show that our method is able to outperform both equivariant and simplicial graph neural networks on a variety of geometric tasks.
New counterexamples to the birational Torelli theorem for Calabi--Yau manifolds
We produce counterexamples to the birational Torelli theorem for Calabi-Yau manifolds in arbitrarily high dimension: this is done by exhibiting a series of non birational pairs of Calabi-Yau (n^2-1)-folds which, for n geq 2 even, admit an isometry between their middle cohomologies. These varieties also satisfy an mathbb L-equivalence relation in the Grothendieck ring of varieties, i.e. the difference of their classes annihilates a power of the class of the affine line. We state this last property for a broader class of Calabi-Yau pairs, namely all those which are realized as pushforwards of a general (1,1)-section on a homogeneous roof in the sense of Kanemitsu, along its two extremal contractions.
Topological Singularity Detection at Multiple Scales
The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.
The generalized roof F(1,2,n): Hodge structures and derived categories
We consider generalized homogeneous roofs, i.e. quotients of simply connected, semisimple Lie groups by a parabolic subgroup, which admit two projective bundle structures. Given a general hyperplane section on such a variety, we consider the zero loci of its pushforwards along the projective bundle structures and we discuss their properties at the level of Hodge structures. In the case of the flag variety F(1,2,n) with its projections to P^{n-1} and G(2, n), we construct a derived embedding of the relevant zero loci by methods based on the study of B-brane categories in the context of a gauged linear sigma model.
Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
The natural world is full of complex systems characterized by intricate relations between their components: from social interactions between individuals in a social network to electrostatic interactions between atoms in a protein. Topological Deep Learning (TDL) provides a comprehensive framework to process and extract knowledge from data associated with these systems, such as predicting the social community to which an individual belongs or predicting whether a protein can be a reasonable target for drug development. TDL has demonstrated theoretical and practical advantages that hold the promise of breaking ground in the applied sciences and beyond. However, the rapid growth of the TDL literature has also led to a lack of unification in notation and language across Topological Neural Network (TNN) architectures. This presents a real obstacle for building upon existing works and for deploying TNNs to new real-world problems. To address this issue, we provide an accessible introduction to TDL, and compare the recently published TNNs using a unified mathematical and graphical notation. Through an intuitive and critical review of the emerging field of TDL, we extract valuable insights into current challenges and exciting opportunities for future development.
Calabi-Yau fibrations, simple K-equivalence and mutations
A homogeneous roof is a rational homogeneous variety of Picard rank 2 and index r equipped with two different mathbb P^{r-1}-bundle structures. We consider bundles of homogeneous roofs over a smooth projective variety, formulating a relative version of the duality of Calabi--Yau pairs associated to roofs of projective bundles. We discuss how derived equivalence of such pairs can lift to Calabi--Yau fibrations, extending a result of Bridgeland and Maciocia to higher-dimensional cases. We formulate an approach to prove that the DK-conjecture holds for a class of simple K-equivalent maps arising from bundles of roofs. As an example, we propose a pair of eight-dimensional Calabi--Yau varieties fibered in dual Calabi--Yau threefolds, related by a GLSM phase transition, and we prove derived equivalence with the methods above.
Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds
Machine learning for point clouds has been attracting much attention, with many applications in various fields, such as shape recognition and material science. For enhancing the accuracy of such machine learning methods, it is often effective to incorporate global topological features, which are typically extracted by persistent homology. In the calculation of persistent homology for a point cloud, we choose a filtration for the point cloud, an increasing sequence of spaces. Since the performance of machine learning methods combined with persistent homology is highly affected by the choice of a filtration, we need to tune it depending on data and tasks. In this paper, we propose a framework that learns a filtration adaptively with the use of neural networks. In order to make the resulting persistent homology isometry-invariant, we develop a neural network architecture with such invariance. Additionally, we show a theoretical result on a finite-dimensional approximation of filtration functions, which justifies the proposed network architecture. Experimental results demonstrated the efficacy of our framework in several classification tasks.
Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures
Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, which are interpretable, stable to perturbations, and invariant under, e.g., relabeling. Most applications of PH focus on the one-parameter case -- where the descriptors summarize the changes in topology of data as it is filtered by a single quantity of interest -- and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data that is filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarceness of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes -- a recent family of MPH descriptors -- as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. While, as a proof of concept, we focus on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.
Adiabatic Solutions of the Haydys-Witten Equations and Symplectic Khovanov Homology
An influential conjecture by Witten states that there is an instanton Floer homology of four-manifolds with corners that in certain situations is isomorphic to Khovanov homology of a given knot K. The Floer chain complex is generated by Nahm pole solutions of the Kapustin-Witten equations on R^3 times R^+_y with an additional monopole-like singular behaviour along the knot K inside the three-dimensional boundary at y=0. The Floer differential is given by counting solutions of the Haydys-Witten equations that interpolate between Kapustin-Witten solutions along an additional flow direction R_s. This article investigates solutions of a decoupled version of the Kapustin-Witten and Haydys-Witten equations on R_s times R^3 times R^+_y, which in contrast to the full equations exhibit a Hermitian Yang-Mills structure and can be viewed as a lift of the extended Bogomolny equations (EBE) from three to five dimensions. Inspired by Gaiotto-Witten's approach of adiabatically braiding EBE-solutions to obtain generators of the Floer homology, we propose that there is an equivalence between adiabatic solutions of the decoupled Haydys-Witten equations and non-vertical paths in the moduli space of EBE-solutions fibered over the space of monopole positions. Moreover, we argue that the Grothendieck-Springer resolution of the Lie algebra of the gauge group provides a finite-dimensional model of this moduli space of monopole solutions. These considerations suggest an intriguing similarity between Haydys-Witten instanton Floer homology and symplectic Khovanov homology and provide a novel approach towards a proof of Witten's gauge-theoretic interpretations of Khovanov homology.
Chordal Averaging on Flag Manifolds and Its Applications
This paper presents a new, provably-convergent algorithm for computing the flag-mean and flag-median of a set of points on a flag manifold under the chordal metric. The flag manifold is a mathematical space consisting of flags, which are sequences of nested subspaces of a vector space that increase in dimension. The flag manifold is a superset of a wide range of known matrix spaces, including Stiefel and Grassmanians, making it a general object that is useful in a wide variety computer vision problems. To tackle the challenge of computing first order flag statistics, we first transform the problem into one that involves auxiliary variables constrained to the Stiefel manifold. The Stiefel manifold is a space of orthogonal frames, and leveraging the numerical stability and efficiency of Stiefel-manifold optimization enables us to compute the flag-mean effectively. Through a series of experiments, we show the competence of our method in Grassmann and rotation averaging, as well as principal component analysis. We release our source code under https://github.com/nmank/FlagAveraging.
Topological Autoencoders
We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors.
Topological Point Cloud Clustering
We present Topological Point Cloud Clustering (TPCC), a new method to cluster points in an arbitrary point cloud based on their contribution to global topological features. TPCC synthesizes desirable features from spectral clustering and topological data analysis and is based on considering the spectral properties of a simplicial complex associated to the considered point cloud. As it is based on considering sparse eigenvector computations, TPCC is similarly easy to interpret and implement as spectral clustering. However, by focusing not just on a single matrix associated to a graph created from the point cloud data, but on a whole set of Hodge-Laplacians associated to an appropriately constructed simplicial complex, we can leverage a far richer set of topological features to characterize the data points within the point cloud and benefit from the relative robustness of topological techniques against noise. We test the performance of TPCC on both synthetic and real-world data and compare it with classical spectral clustering.
Pruning-based Topology Refinement of 3D Mesh using a 2D Alpha Mask
Image-based 3D reconstruction has increasingly stunning results over the past few years with the latest improvements in computer vision and graphics. Geometry and topology are two fundamental concepts when dealing with 3D mesh structures. But the latest often remains a side issue in the 3D mesh-based reconstruction literature. Indeed, performing per-vertex elementary displacements over a 3D sphere mesh only impacts its geometry and leaves the topological structure unchanged and fixed. Whereas few attempts propose to update the geometry and the topology, all need to lean on costly 3D ground-truth to determine the faces/edges to prune. We present in this work a method that aims to refine the topology of any 3D mesh through a face-pruning strategy that extensively relies upon 2D alpha masks and camera pose information. Our solution leverages a differentiable renderer that renders each face as a 2D soft map. Its pixel intensity reflects the probability of being covered during the rendering process by such a face. Based on the 2D soft-masks available, our method is thus able to quickly highlight all the incorrectly rendered faces for a given viewpoint. Because our module is agnostic to the network that produces the 3D mesh, it can be easily plugged into any self-supervised image-based (either synthetic or natural) 3D reconstruction pipeline to get complex meshes with a non-spherical topology.
A Very Elementary Introduction to Sheaves
This paper is a very non-rigorous, loose, and extremely basic introduction to sheaves. This is meant to be a a guide to gaining intuition about sheaves, what they look like, and how they work, so that after reading this paper, someone can jump into the extremely abstract definitions and examples seen in textbooks with at least some idea of what is going on. Most of this material is inspired and built from the work of Dr. Michael Robinson, and that of Dr. Robert Ghrist and Dr. Jakob Hansen, as well as Dr. Justin Curry's PhD thesis, who are some of the only applied sheaf theorists out there and they do an amazing job of explaining sheaves in a concrete way through their research. The rest of this paper is populated by mathematical definitions found in textbooks that I have stretched from two lines into multiple pages, as well as some analogies for thinking of sheaves I have thought of myself. This paper only assumes knowledge of basic linear algebra, basic group theory, and the very fundamentals of topology. If there is anything in the setup that you do not understand it is probably a quick Wikipedia search away. I hope this paper provides insight, intuition, and helpful examples of why sheaves are such powerful tools in both math and science.
Homoclinic Floer homology via direct limits
Let (M omega) be a two dimensional symplectic manifold, phi: M to M a symplectomorphism with hyperbolic fixed point x and transversely intersecting stable and unstable manifolds W^s(phi, x) cap W^u(phi, x)=:H(phi, x). The intersection points are called homoclinic points, and the stable and unstable manifold are in this situation Lagrangian submanifolds. For this Lagrangian intersection problem with its infinite number of intersection points and wild oscillation behavior, we first define a Floer homology generated by finite sets of so-called contractible homoclinic points. This generalizes very significantly the Floer homologies generated by (semi)primary points defined by us in earlier works. Nevertheless these Floer homologies only consider quite `local' aspects of W^s(phi, x) cap W^u(phi, x) since their generator sets are finite, but the number of all contractible homoclinic points is infinite. To overcome this issue, we construct a direct limit of these `local' homoclinic Floer homologies over suitable index sets. These direct limits thus accumulate the information gathered by the finitely generated local' homoclinic Floer homologies.
One-connection rule for structural equation models
Linear structural equation models are multivariate statistical models encoded by mixed graphs. In particular, the set of covariance matrices for distributions belonging to a linear structural equation model for a fixed mixed graph G=(V, D,B) is parameterized by a rational function with parameters for each vertex and edge in G. This rational parametrization naturally allows for the study of these models from an algebraic and combinatorial point of view. Indeed, this point of view has led to a collection of results in the literature, mainly focusing on questions related to identifiability and determining relationships between covariances (i.e., finding polynomials in the Gaussian vanishing ideal). So far, a large proportion of these results has focused on the case when D, the directed part of the mixed graph G, is acyclic. This is due to the fact that in the acyclic case, the parametrization becomes polynomial and there is a description of the entries of the covariance matrices in terms of a finite sum. We move beyond the acyclic case and give a closed form expression for the entries of the covariance matrices in terms of the one-connections in a graph obtained from D through some small operations. This closed form expression then allows us to show that if G is simple, then the parametrization map is generically finite-to-one. Finally, having a closed form expression for the covariance matrices allows for the development of an algorithm for systematically exploring possible polynomials in the Gaussian vanishing ideal.
Theoretical and Numerical Analysis of 3D Reconstruction Using Point and Line Incidences
We study the joint image of lines incident to points, meaning the set of image tuples obtained from fixed cameras observing a varying 3D point-line incidence. We prove a formula for the number of complex critical points of the triangulation problem that aims to compute a 3D point-line incidence from noisy images. Our formula works for an arbitrary number of images and measures the intrinsic difficulty of this triangulation. Additionally, we conduct numerical experiments using homotopy continuation methods, comparing different approaches of triangulation of such incidences. In our setup, exploiting the incidence relations gives both a faster point reconstruction and in three views more accurate.
Proving Olympiad Algebraic Inequalities without Human Demonstrations
Solving Olympiad-level mathematical problems represents a significant advancement in machine intelligence and automated reasoning. Current machine learning methods, however, struggle to solve Olympiad-level problems beyond Euclidean plane geometry due to a lack of large-scale, high-quality datasets. The challenge is even greater in algebraic systems, which involve infinite reasoning spaces within finite conditions. To address these issues, we propose AIPS, an Algebraic Inequality Proving System capable of autonomously generating complex inequality theorems and effectively solving Olympiad-level inequality problems without requiring human demonstrations. During proof search in a mixed reasoning manner, a value curriculum learning strategy on generated datasets is implemented to improve proving performance, demonstrating strong mathematical intuitions. On a test set of 20 International Mathematical Olympiad-level inequality problems, AIPS successfully solved 10, outperforming state-of-the-art methods. Furthermore, AIPS automatically generated a vast array of non-trivial theorems without human intervention, some of which have been evaluated by professional contestants and deemed to reach the level of the International Mathematical Olympiad. Notably, one theorem was selected as a competition problem in a major city 2024 Mathematical Olympiad.
Learners' Languages
In "Backprop as functor", the authors show that the fundamental elements of deep learning -- gradient descent and backpropagation -- can be conceptualized as a strong monoidal functor Para(Euc)toLearn from the category of parameterized Euclidean spaces to that of learners, a category developed explicitly to capture parameter update and backpropagation. It was soon realized that there is an isomorphism LearncongPara(Slens), where Slens is the symmetric monoidal category of simple lenses as used in functional programming. In this note, we observe that Slens is a full subcategory of Poly, the category of polynomial functors in one variable, via the functor Amapsto Ay^A. Using the fact that (Poly,otimes) is monoidal closed, we show that a map Ato B in Para(Slens) has a natural interpretation in terms of dynamical systems (more precisely, generalized Moore machines) whose interface is the internal-hom type [Ay^A,By^B]. Finally, we review the fact that the category p-Coalg of dynamical systems on any p in Poly forms a topos, and consider the logical propositions that can be stated in its internal language. We give gradient descent as an example, and we conclude by discussing some directions for future work.
Flagfolds
By interpreting the product of the Principal Component Analysis, that is the covariance matrix, as a sequence of nested subspaces naturally coming with weights according to the level of approximation they provide, we are able to embed all d--dimensional Grassmannians into a stratified space of covariance matrices. We observe that Grassmannians constitute the lowest dimensional skeleton of the stratification while it is possible to define a Riemaniann metric on the highest dimensional and dense stratum, such a metric being compatible with the global stratification. With such a Riemaniann metric at hand, it is possible to look for geodesics between two linear subspaces of different dimensions that do not go through higher dimensional linear subspaces as would euclidean geodesics. Building upon the proposed embedding of Grassmannians into the stratified space of covariance matrices, we generalize the concept of varifolds to what we call flagfolds in order to model multi-dimensional shapes.
Fullness of the Kuznetsov-Polishchuk exceptional collection for the spinor tenfold
Kuznetsov and Polishchuk provided a general algorithm to construct exceptional collections of maximal length for homogeneous varieties of type A,B,C,D. We consider the case of the spinor tenfold and we prove that the corresponding collection is full, i.e. it generates the whole derived category of coherent sheaves. As a step of the proof, we construct some resolutions of homogeneous vector bundles which might be of independent interest.
Haldane Bundles: A Dataset for Learning to Predict the Chern Number of Line Bundles on the Torus
Characteristic classes, which are abstract topological invariants associated with vector bundles, have become an important notion in modern physics with surprising real-world consequences. As a representative example, the incredible properties of topological insulators, which are insulators in their bulk but conductors on their surface, can be completely characterized by a specific characteristic class associated with their electronic band structure, the first Chern class. Given their importance to next generation computing and the computational challenge of calculating them using first-principles approaches, there is a need to develop machine learning approaches to predict the characteristic classes associated with a material system. To aid in this program we introduce the {Haldane bundle dataset}, which consists of synthetically generated complex line bundles on the 2-torus. We envision this dataset, which is not as challenging as noisy and sparsely measured real-world datasets but (as we show) still difficult for off-the-shelf architectures, to be a testing ground for architectures that incorporate the rich topological and geometric priors underlying characteristic classes.
ICLR 2021 Challenge for Computational Geometry & Topology: Design and Results
This paper presents the computational challenge on differential geometry and topology that happened within the ICLR 2021 workshop "Geometric and Topological Representation Learning". The competition asked participants to provide creative contributions to the fields of computational geometry and topology through the open-source repositories Geomstats and Giotto-TDA. The challenge attracted 16 teams in its two month duration. This paper describes the design of the challenge and summarizes its main findings.
Learning From Simplicial Data Based on Random Walks and 1D Convolutions
Triggered by limitations of graph-based deep learning methods in terms of computational expressivity and model flexibility, recent years have seen a surge of interest in computational models that operate on higher-order topological domains such as hypergraphs and simplicial complexes. While the increased expressivity of these models can indeed lead to a better classification performance and a more faithful representation of the underlying system, the computational cost of these higher-order models can increase dramatically. To this end, we here explore a simplicial complex neural network learning architecture based on random walks and fast 1D convolutions (SCRaWl), in which we can adjust the increase in computational cost by varying the length and number of random walks considered while accounting for higher-order relationships. Importantly, due to the random walk-based design, the expressivity of the proposed architecture is provably incomparable to that of existing message-passing simplicial neural networks. We empirically evaluate SCRaWl on real-world datasets and show that it outperforms other simplicial neural networks.
TopoBenchmarkX: A Framework for Benchmarking Topological Deep Learning
This work introduces TopoBenchmarkX, a modular open-source library designed to standardize benchmarking and accelerate research in Topological Deep Learning (TDL). TopoBenchmarkX maps the TDL pipeline into a sequence of independent and modular components for data loading and processing, as well as model training, optimization, and evaluation. This modular organization provides flexibility for modifications and facilitates the adaptation and optimization of various TDL pipelines. A key feature of TopoBenchmarkX is that it allows for the transformation and lifting between topological domains. This enables, for example, to obtain richer data representations and more fine-grained analyses by mapping the topology and features of a graph to higher-order topological domains such as simplicial and cell complexes. The range of applicability of TopoBenchmarkX is demonstrated by benchmarking several TDL architectures for various tasks and datasets.
Fast, Stable and Efficient Approximation of Multi-parameter Persistence Modules with MMA
In this article, we introduce a new parameterized family of topological invariants, taking the form of candidate decompositions, for multi-parameter persistence modules. We prove that our candidate decompositions are controllable approximations: when restricting to modules that can be decomposed into interval summands, we establish theoretical results about the approximation error between our candidate decompositions and the true underlying module in terms of the standard interleaving and bottleneck distances. Moreover, even when the underlying module does not admit such a decomposition, our candidate decompositions are nonetheless stable invariants; small perturbations in the underlying module lead to small perturbations in the candidate decomposition. Then, we introduce MMA (Multipersistence Module Approximation): an algorithm for computing stable instances of such invariants, which is based on fibered barcodes and exact matchings, two constructions that stem from the theory of single-parameter persistence. By design, MMA can handle an arbitrary number of filtrations, and has bounded complexity and running time. Finally, we present empirical evidence validating the generalization capabilities and running time speed-ups of MMA on several data sets.
Understanding networks and their behaviors using sheaf theory
Many complicated network problems can be easily understood on small networks. Difficulties arise when small networks are combined into larger ones. Fortunately, the mathematical theory of sheaves was constructed to address just this kind of situation; it extends locally-defined structures to globally valid inferences by way of consistency relations. This paper exhibits examples in network monitoring and filter hardware where sheaves have useful descriptive power.
Topological Graph Neural Networks
Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive (in terms the Weisfeiler--Lehman graph isomorphism test) than message-passing GNNs. Augmenting GNNs with TOGL leads to improved predictive performance for graph and node classification tasks, both on synthetic data sets, which can be classified by humans using their topology but not by ordinary GNNs, and on real-world data.
Barycentric Subspace Analysis on Manifolds
This paper investigates the generalization of Principal Component Analysis (PCA) to Riemannian manifolds. We first propose a new and general type of family of subspaces in manifolds that we call barycentric subspaces. They are implicitly defined as the locus of points which are weighted means of k+1 reference points. As this definition relies on points and not on tangent vectors, it can also be extended to geodesic spaces which are not Riemannian. For instance, in stratified spaces, it naturally allows principal subspaces that span several strata, which is impossible in previous generalizations of PCA. We show that barycentric subspaces locally define a submanifold of dimension k which generalizes geodesic subspaces.Second, we rephrase PCA in Euclidean spaces as an optimization on flags of linear subspaces (a hierarchy of properly embedded linear subspaces of increasing dimension). We show that the Euclidean PCA minimizes the Accumulated Unexplained Variances by all the subspaces of the flag (AUV). Barycentric subspaces are naturally nested, allowing the construction of hierarchically nested subspaces. Optimizing the AUV criterion to optimally approximate data points with flags of affine spans in Riemannian manifolds lead to a particularly appealing generalization of PCA on manifolds called Barycentric Subspaces Analysis (BSA).
Persistent homology of the cosmic web. I: Hierarchical topology in ΛCDM cosmologies
Using a set of LambdaCDM simulations of cosmic structure formation, we study the evolving connectivity and changing topological structure of the cosmic web using state-of-the-art tools of multiscale topological data analysis (TDA). We follow the development of the cosmic web topology in terms of the evolution of Betti number curves and feature persistence diagrams of the three (topological) classes of structural features: matter concentrations, filaments and tunnels, and voids. The Betti curves specify the prominence of features as a function of density level, and their evolution with cosmic epoch reflects the changing network connections between these structural features. The persistence diagrams quantify the longevity and stability of topological features. In this study we establish, for the first time, the link between persistence diagrams, the features they show, and the gravitationally driven cosmic structure formation process. By following the diagrams' development over cosmic time, the link between the multiscale topology of the cosmic web and the hierarchical buildup of cosmic structure is established. The sharp apexes in the diagrams are intimately related to key transitions in the structure formation process. The apex in the matter concentration diagrams coincides with the density level at which, typically, they detach from the Hubble expansion and begin to collapse. At that level many individual islands merge to form the network of the cosmic web and a large number of filaments and tunnels emerge to establish its connecting bridges. The location trends of the apex possess a self-similar character that can be related to the cosmic web's hierarchical buildup. We find that persistence diagrams provide a significantly higher and more profound level of information on the structure formation process than more global summary statistics like Euler characteristic or Betti numbers.
Topology-Aware Latent Diffusion for 3D Shape Generation
We introduce a new generative model that combines latent diffusion with persistent homology to create 3D shapes with high diversity, with a special emphasis on their topological characteristics. Our method involves representing 3D shapes as implicit fields, then employing persistent homology to extract topological features, including Betti numbers and persistence diagrams. The shape generation process consists of two steps. Initially, we employ a transformer-based autoencoding module to embed the implicit representation of each 3D shape into a set of latent vectors. Subsequently, we navigate through the learned latent space via a diffusion model. By strategically incorporating topological features into the diffusion process, our generative module is able to produce a richer variety of 3D shapes with different topological structures. Furthermore, our framework is flexible, supporting generation tasks constrained by a variety of inputs, including sparse and partial point clouds, as well as sketches. By modifying the persistence diagrams, we can alter the topology of the shapes generated from these input modalities.
Bimonoidal Structure of Probability Monads
We give a conceptual treatment of the notion of joints, marginals, and independence in the setting of categorical probability. This is achieved by endowing the usual probability monads (like the Giry monad) with a monoidal and an opmonoidal structure, mutually compatible (i.e. a bimonoidal structure). If the underlying monoidal category is cartesian monoidal, a bimonoidal structure is given uniquely by a commutative strength. However, if the underlying monoidal category is not cartesian monoidal, a strength is not enough to guarantee all the desired properties of joints and marginals. A bimonoidal structure is then the correct requirement for the more general case. We explain the theory and the operational interpretation, with the help of the graphical calculus for monoidal categories. We give a definition of stochastic independence based on the bimonoidal structure, compatible with the intuition and with other approaches in the literature for cartesian monoidal categories. We then show as an example that the Kantorovich monad on the category of complete metric spaces is a bimonoidal monad for a non-cartesian monoidal structure.
Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building a such a bridge, we propose to apply category theory -- precisely, the universal algebra of monads valued in a 2-category of parametric maps -- as a single theory elegantly subsuming both of these flavours of neural network design. To defend our position, we show how this theory recovers constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks, such as RNNs. We also illustrate how the theory naturally encodes many standard constructs in computer science and automata theory.
Differentiability and Optimization of Multiparameter Persistent Homology
Real-valued functions on geometric data -- such as node attributes on a graph -- can be optimized using descriptors from persistent homology, allowing the user to incorporate topological terms in the loss function. When optimizing a single real-valued function (the one-parameter setting), there is a canonical choice of descriptor for persistent homology: the barcode. The operation mapping a real-valued function to its barcode is differentiable almost everywhere, and the convergence of gradient descent for losses using barcodes is relatively well understood. When optimizing a vector-valued function (the multiparameter setting), there is no unique choice of descriptor for multiparameter persistent homology, and many distinct descriptors have been proposed. This calls for the development of a general framework for differentiability and optimization that applies to a wide range of multiparameter homological descriptors. In this article, we develop such a framework and show that it encompasses well-known descriptors of different flavors, such as signed barcodes and the multiparameter persistence landscape. We complement the theory with numerical experiments supporting the idea that optimizing multiparameter homological descriptors can lead to improved performances compared to optimizing one-parameter descriptors, even when using the simplest and most efficiently computable multiparameter descriptors.
Building Neural Networks on Matrix Manifolds: A Gyrovector Space Approach
Matrix manifolds, such as manifolds of Symmetric Positive Definite (SPD) matrices and Grassmann manifolds, appear in many applications. Recently, by applying the theory of gyrogroups and gyrovector spaces that is a powerful framework for studying hyperbolic geometry, some works have attempted to build principled generalizations of Euclidean neural networks on matrix manifolds. However, due to the lack of many concepts in gyrovector spaces for the considered manifolds, e.g., the inner product and gyroangles, techniques and mathematical tools provided by these works are still limited compared to those developed for studying hyperbolic geometry. In this paper, we generalize some notions in gyrovector spaces for SPD and Grassmann manifolds, and propose new models and layers for building neural networks on these manifolds. We show the effectiveness of our approach in two applications, i.e., human action recognition and knowledge graph completion.
Derived categories of families of Fano threefolds
We construct S-linear semiorthogonal decompositions of derived categories of smooth Fano threefold fibrations X/S with relative Picard rank 1 and rational geometric fibers and discuss how the structure of components of these decompositions is related to rationality properties of X/S.
Capacity Analysis of Vector Symbolic Architectures
Hyperdimensional computing (HDC) is a biologically-inspired framework which represents symbols with high-dimensional vectors, and uses vector operations to manipulate them. The ensemble of a particular vector space and a prescribed set of vector operations (including one addition-like for "bundling" and one outer-product-like for "binding") form a *vector symbolic architecture* (VSA). While VSAs have been employed in numerous applications and have been studied empirically, many theoretical questions about VSAs remain open. We analyze the *representation capacities* of four common VSAs: MAP-I, MAP-B, and two VSAs based on sparse binary vectors. "Representation capacity' here refers to bounds on the dimensions of the VSA vectors required to perform certain symbolic tasks, such as testing for set membership i in S and estimating set intersection sizes |X cap Y| for two sets of symbols X and Y, to a given degree of accuracy. We also analyze the ability of a novel variant of a Hopfield network (a simple model of associative memory) to perform some of the same tasks that are typically asked of VSAs. In addition to providing new bounds on VSA capacities, our analyses establish and leverage connections between VSAs, "sketching" (dimensionality reduction) algorithms, and Bloom filters.
TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning
Topology reasoning aims to comprehensively understand road scenes and present drivable routes in autonomous driving. It requires detecting road centerlines (lane) and traffic elements, further reasoning their topology relationship, i.e., lane-lane topology, and lane-traffic topology. In this work, we first present that the topology score relies heavily on detection performance on lane and traffic elements. Therefore, we introduce a powerful 3D lane detector and an improved 2D traffic element detector to extend the upper limit of topology performance. Further, we propose TopoMLP, a simple yet high-performance pipeline for driving topology reasoning. Based on the impressive detection performance, we develop two simple MLP-based heads for topology generation. TopoMLP achieves state-of-the-art performance on OpenLane-V2 benchmark, i.e., 41.2% OLS with ResNet-50 backbone. It is also the 1st solution for 1st OpenLane Topology in Autonomous Driving Challenge. We hope such simple and strong pipeline can provide some new insights to the community. Code is at https://github.com/wudongming97/TopoMLP.
Optimizing NOTEARS Objectives via Topological Swaps
Recently, an intriguing class of non-convex optimization problems has emerged in the context of learning directed acyclic graphs (DAGs). These problems involve minimizing a given loss or score function, subject to a non-convex continuous constraint that penalizes the presence of cycles in a graph. In this work, we delve into the optimization challenges associated with this class of non-convex programs. To address these challenges, we propose a bi-level algorithm that leverages the non-convex constraint in a novel way. The outer level of the algorithm optimizes over topological orders by iteratively swapping pairs of nodes within the topological order of a DAG. A key innovation of our approach is the development of an effective method for generating a set of candidate swapping pairs for each iteration. At the inner level, given a topological order, we utilize off-the-shelf solvers that can handle linear constraints. The key advantage of our proposed algorithm is that it is guaranteed to find a local minimum or a KKT point under weaker conditions compared to previous work and finds solutions with lower scores. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in terms of achieving a better score. Additionally, our method can also be used as a post-processing algorithm to significantly improve the score of other algorithms. Code implementing the proposed method is available at https://github.com/duntrain/topo.
From Latent Graph to Latent Topology Inference: Differentiable Cell Complex Module
Latent Graph Inference (LGI) relaxed the reliance of Graph Neural Networks (GNNs) on a given graph topology by dynamically learning it. However, most of LGI methods assume to have a (noisy, incomplete, improvable, ...) input graph to rewire and can solely learn regular graph topologies. In the wake of the success of Topological Deep Learning (TDL), we study Latent Topology Inference (LTI) for learning higher-order cell complexes (with sparse and not regular topology) describing multi-way interactions between data points. To this aim, we introduce the Differentiable Cell Complex Module (DCM), a novel learnable function that computes cell probabilities in the complex to improve the downstream task. We show how to integrate DCM with cell complex message passing networks layers and train it in a end-to-end fashion, thanks to a two-step inference procedure that avoids an exhaustive search across all possible cells in the input, thus maintaining scalability. Our model is tested on several homophilic and heterophilic graph datasets and it is shown to outperform other state-of-the-art techniques, offering significant improvements especially in cases where an input graph is not provided.
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
Solving algebraic word problems requires executing a series of arithmetic operations---a program---to obtain a final answer. However, since programs can be arbitrarily complicated, inducing them directly from question-answer pairs is a formidable challenge. To make this task more feasible, we solve these problems by generating answer rationales, sequences of natural language and human-readable mathematical expressions that derive the final answer through a series of small steps. Although rationales do not explicitly specify programs, they provide a scaffolding for their structure via intermediate milestones. To evaluate our approach, we have created a new 100,000-sample dataset of questions, answers and rationales. Experimental results show that indirect supervision of program learning via answer rationales is a promising strategy for inducing arithmetic programs.
Path-based Algebraic Foundations of Graph Query Languages
Graph databases are gaining momentum thanks to the flexibility and expressiveness of their data models and query languages. A standardization activity driven by the ISO/IEC standardization body is also ongoing and has already conducted to the specification of the first versions of two standard graph query languages, namely SQL/PGQ and GQL, respectively in 2023 and 2024. Apart from the standards, there exists a panoply of concrete graph query languages provided by current graph database systems, each offering different query features. A common limitation of current graph query engines is the absence of an algebraic approach for evaluating path queries. To address this, we introduce an abstract algebra for evaluating path queries, allowing paths to be treated as first-class entities within the query processing pipeline. We demonstrate that our algebra can express a core fragment of path queries defined in GQL and SQL/PGQ, thereby serving as a formal framework for studying both standards and supporting their implementation in current graph database systems. We also show that evaluation trees for path algebra expressions can function as logical plans for evaluating path queries and enable the application of query optimization techniques. Our algebraic framework has the potential to act as a lingua franca for path query evaluation, enabling different implementations to be expressed and compared.
Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets
We prove rich algebraic structures of the solution space for 2-layer neural networks with quadratic activation and L_2 loss, trained on reasoning tasks in Abelian group (e.g., modular addition). Such a rich structure enables analytical construction of global optimal solutions from partial solutions that only satisfy part of the loss, despite its high nonlinearity. We coin the framework as CoGO (Composing Global Optimizers). Specifically, we show that the weight space over different numbers of hidden nodes of the 2-layer network is equipped with a semi-ring algebraic structure, and the loss function to be optimized consists of monomial potentials, which are ring homomorphism, allowing partial solutions to be composed into global ones by ring addition and multiplication. Our experiments show that around 95% of the solutions obtained by gradient descent match exactly our theoretical constructions. Although the global optimizers constructed only required a small number of hidden nodes, our analysis on gradient dynamics shows that over-parameterization asymptotically decouples training dynamics and is beneficial. We further show that training dynamics favors simpler solutions under weight decay, and thus high-order global optimizers such as perfect memorization are unfavorable.
A Compositional Atlas for Algebraic Circuits
Circuits based on sum-product structure have become a ubiquitous representation to compactly encode knowledge, from Boolean functions to probability distributions. By imposing constraints on the structure of such circuits, certain inference queries become tractable, such as model counting and most probable configuration. Recent works have explored analyzing probabilistic and causal inference queries as compositions of basic operators to derive tractability conditions. In this paper, we take an algebraic perspective for compositional inference, and show that a large class of queries - including marginal MAP, probabilistic answer set programming inference, and causal backdoor adjustment - correspond to a combination of basic operators over semirings: aggregation, product, and elementwise mapping. Using this framework, we uncover simple and general sufficient conditions for tractable composition of these operators, in terms of circuit properties (e.g., marginal determinism, compatibility) and conditions on the elementwise mappings. Applying our analysis, we derive novel tractability conditions for many such compositional queries. Our results unify tractability conditions for existing problems on circuits, while providing a blueprint for analysing novel compositional inference queries.
Feature emergence via margin maximization: case studies in algebraic tasks
Understanding the internal representations learned by neural networks is a cornerstone challenge in the science of machine learning. While there have been significant recent strides in some cases towards understanding how neural networks implement specific target functions, this paper explores a complementary question -- why do networks arrive at particular computational strategies? Our inquiry focuses on the algebraic learning tasks of modular addition, sparse parities, and finite group operations. Our primary theoretical findings analytically characterize the features learned by stylized neural networks for these algebraic tasks. Notably, our main technique demonstrates how the principle of margin maximization alone can be used to fully specify the features learned by the network. Specifically, we prove that the trained networks utilize Fourier features to perform modular addition and employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups, aligning closely with the empirical observations of Nanda et al. and Chughtai et al. More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
Classifying Clustering Schemes
Many clustering schemes are defined by optimizing an objective function defined on the partitions of the underlying set of a finite metric space. In this paper, we construct a framework for studying what happens when we instead impose various structural conditions on the clustering schemes, under the general heading of functoriality. Functoriality refers to the idea that one should be able to compare the results of clustering algorithms as one varies the data set, for example by adding points or by applying functions to it. We show that within this framework, one can prove a theorems analogous to one of J. Kleinberg, in which for example one obtains an existence and uniqueness theorem instead of a non-existence result. We obtain a full classification of all clustering schemes satisfying a condition we refer to as excisiveness. The classification can be changed by varying the notion of maps of finite metric spaces. The conditions occur naturally when one considers clustering as the statistical version of the geometric notion of connected components. By varying the degree of functoriality that one requires from the schemes it is possible to construct richer families of clustering schemes that exhibit sensitivity to density.
Graphlets correct for the topological information missed by random walks
Random walks are widely used for mining networks due to the computational efficiency of computing them. For instance, graph representation learning learns a d-dimensional embedding space, so that the nodes that tend to co-occur on random walks (a proxy of being in the same network neighborhood) are close in the embedding space. Specific local network topology (i.e., structure) influences the co-occurrence of nodes on random walks, so random walks of limited length capture only partial topological information, hence diminishing the performance of downstream methods. We explicitly capture all topological neighborhood information and improve performance by introducing orbit adjacencies that quantify the adjacencies of two nodes as co-occurring on a given pair of graphlet orbits, which are symmetric positions on graphlets (small, connected, non-isomorphic, induced subgraphs of a large network). Importantly, we mathematically prove that random walks on up to k nodes capture only a subset of all the possible orbit adjacencies for up to k-node graphlets. Furthermore, we enable orbit adjacency-based analysis of networks by developing an efficient GRaphlet-orbit ADjacency COunter (GRADCO), which exhaustively computes all 28 orbit adjacency matrices for up to four-node graphlets. Note that four-node graphlets suffice, because real networks are usually small-world. In large networks on around 20,000 nodes, GRADCOcomputesthe28matricesinminutes. Onsixrealnetworksfromvarious domains, we compare the performance of node-label predictors obtained by using the network embeddings based on our orbit adjacencies to those based on random walks. We find that orbit adjacencies, which include those unseen by random walks, outperform random walk-based adjacencies, demonstrating the importance of the inclusion of the topological neighborhood information that is unseen by random walks.
Functorial Manifold Learning
We adapt previous research on category theory and topological unsupervised learning to develop a functorial perspective on manifold learning, also known as nonlinear dimensionality reduction. We first characterize manifold learning algorithms as functors that map pseudometric spaces to optimization objectives and that factor through hierarchical clustering functors. We then use this characterization to prove refinement bounds on manifold learning loss functions and construct a hierarchy of manifold learning algorithms based on their equivariants. We express several popular manifold learning algorithms as functors at different levels of this hierarchy, including Metric Multidimensional Scaling, IsoMap, and UMAP. Next, we use interleaving distance to study the stability of a broad class of manifold learning algorithms. We present bounds on how closely the embeddings these algorithms produce from noisy data approximate the embeddings they would learn from noiseless data. Finally, we use our framework to derive a set of novel manifold learning algorithms, which we experimentally demonstrate are competitive with the state of the art.
Preservation of Loewy Diagrams Under Exact Functors
We derive sufficient conditions for exact functors on locally finite abelian categories to preserve Loewy diagrams of objects. We apply our results to determine sufficient conditions for induction functors associated to simple current extensions of vertex algebras to preserve Loewy diagrams.
Information structures and their cohomology
We introduce the category of information structures, whose objects are suitable diagrams of measurable sets that encode the possible outputs of a given family of observables and their mutual relationships of refinement; they serve as mathematical models of contextuality in classical and quantum settings. Each information structure can be regarded as a ringed site with trivial topology; the structure ring is generated by the observables themselves and its multiplication corresponds to joint measurement. We extend Baudot and Bennequin's definition of information cohomology to this setting, as a derived functor in the category of modules over the structure ring, and show explicitly that the bar construction gives a projective resolution in that category, recovering in this way the cochain complexes previously considered in the literature. Finally, we study the particular case of a one-parameter family of coefficients made of functions of probability distributions. The only 1-cocycles are Shannon entropy or Tsallis alpha-entropy, depending on the value of the parameter.
Geometric Algebra Transformers
Problems involving geometric data arise in a variety of fields, including computer vision, robotics, chemistry, and physics. Such data can take numerous forms, such as points, direction vectors, planes, or transformations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric algebra, which offers an efficient 16-dimensional vector space representation of common geometric objects as well as operators acting on them. GATr is equivariant with respect to E(3), the symmetry group of 3D Euclidean space. As a transformer, GATr is scalable, expressive, and versatile. In experiments with n-body modeling and robotic planning, GATr shows strong improvements over non-geometric baselines.
Unveiling Real Triple Degeneracies in Crystals: Exploring Link and Compound Structures
With their non-Abelian topological charges, real multi-bandgap systems challenge the conventional topological phase classifications. As the minimal sector of multi-bandgap systems, real triple degeneracies (RTPs), which serve as real 'Weyl points', lay the foundation for the research on real topological phases. However, experimental demonstration of physical systems with global band configurations consisting of multiple RTPs in crystals has not been reported. Here we present experimental evidence of RTPs in photonic meta-crystals, characterizing them using the Euler number, and establishing their connection with both Abelian and non-Abelian charges. By considering RTPs as the basic elements, we further propose the concept of a topological compound, akin to a chemical compound, where we find that certain phases are not topologically allowed. The topological classification of RTPs in crystals demonstrated in our work plays a similar role as the 'no-go' theorem in Weyl systems.
Principal Landau Determinants
We reformulate the Landau analysis of Feynman integrals with the aim of advancing the state of the art in modern particle-physics computations. We contribute new algorithms for computing Landau singularities, using tools from polyhedral geometry and symbolic/numerical elimination. Inspired by the work of Gelfand, Kapranov, and Zelevinsky (GKZ) on generalized Euler integrals, we define the principal Landau determinant of a Feynman diagram. We illustrate with a number of examples that this algebraic formalism allows to compute many components of the Landau singular locus. We adapt the GKZ framework by carefully specializing Euler integrals to Feynman integrals. For instance, ultraviolet and infrared singularities are detected as irreducible components of an incidence variety, which project dominantly to the kinematic space. We compute principal Landau determinants for the infinite families of one-loop and banana diagrams with different mass configurations, and for a range of cutting-edge Standard Model processes. Our algorithms build on the Julia package Landau.jl and are implemented in the new open-source package PLD.jl available at https://mathrepo.mis.mpg.de/PLD/.
An elementary and unified proof of Grothendieck's inequality
We present an elementary, self-contained proof of Grothendieck's inequality that unifies the real and complex cases and yields both the Krivine and Haagerup bounds, the current best-known explicit bounds for the real and complex Grothendieck constants respectively. This article is intended to be pedagogical, combining and streamlining known ideas of Lindenstrauss--Pe{\l}czy\'nski, Krivine, and Haagerup into a proof that need only univariate calculus, basic complex variables, and a modicum of linear algebra as prerequisites.
Sheaf Theory through Examples (Abridged Version)
This book provides an inviting tour through sheaf theory, from the perspective of applied category theory and pitched at a less specialized audience than is typical with introductions to sheaves. The book makes it as easy as possible for the reader new to sheaves, by motivating and developing the theory via a broad range of concrete examples and explicit constructions, including applications to n-colorings of graphs, satellite data, chess problems, Bayes nets, musical performance, complexes, and more. Included is an extended first chapter introducing and motivating all the necessary category-theoretical background, again with a strong emphasis on concrete examples. A new and unabridged version (including a fifth chapter on more advanced topics and a conclusion) will be available with MIT Press.
All Weight Systems for Calabi-Yau Fourfolds from Reflexive Polyhedra
For any given dimension d, all reflexive d-polytopes can be found (in principle) as subpolytopes of a number of maximal polyhedra that are defined in terms of (d+1)-tuples of integers (weights), or combinations of k-tuples of weights with k<d+1. We present the results of a complete classification of sextuples of weights pertaining to the construction of all reflexive polytopes in five dimensions. We find 322 383 760 930 such weight systems. 185 269 499 015 of them give rise directly to reflexive polytopes and thereby to mirror pairs of Calabi-Yau fourfolds. These lead to 532 600 483 distinct sets of Hodge numbers.
Categories of Differentiable Polynomial Circuits for Machine Learning
Reverse derivative categories (RDCs) have recently been shown to be a suitable semantic framework for studying machine learning algorithms. Whereas emphasis has been put on training methodologies, less attention has been devoted to particular model classes: the concrete categories whose morphisms represent machine learning models. In this paper we study presentations by generators and equations of classes of RDCs. In particular, we propose polynomial circuits as a suitable machine learning model. We give an axiomatisation for these circuits and prove a functional completeness result. Finally, we discuss the use of polynomial circuits over specific semirings to perform machine learning with discrete values.
The Topology and Geometry of Neural Representations
A central question for neuroscience is how to characterize brain representations of perceptual and cognitive content. An ideal characterization should distinguish different functional regions with robustness to noise and idiosyncrasies of individual brains that do not correspond to computational differences. Previous studies have characterized brain representations by their representational geometry, which is defined by the representational dissimilarity matrix (RDM), a summary statistic that abstracts from the roles of individual neurons (or responses channels) and characterizes the discriminability of stimuli. Here we explore a further step of abstraction: from the geometry to the topology of brain representations. We propose topological representational similarity analysis (tRSA), an extension of representational similarity analysis (RSA) that uses a family of geo-topological summary statistics that generalizes the RDM to characterize the topology while de-emphasizing the geometry. We evaluate this new family of statistics in terms of the sensitivity and specificity for model selection using both simulations and functional MRI (fMRI) data. In the simulations, the ground truth is a data-generating layer representation in a neural network model and the models are the same and other layers in different model instances (trained from different random seeds). In fMRI, the ground truth is a visual area and the models are the same and other areas measured in different subjects. Results show that topology-sensitive characterizations of population codes are robust to noise and interindividual variability and maintain excellent sensitivity to the unique representational signatures of different neural network layers and brain regions.
Expectation-Complete Graph Representations with Homomorphisms
We investigate novel random graph embeddings that can be computed in expected polynomial time and that are able to distinguish all non-isomorphic graphs in expectation. Previous graph embeddings have limited expressiveness and either cannot distinguish all graphs or cannot be computed efficiently for every graph. To be able to approximate arbitrary functions on graphs, we are interested in efficient alternatives that become arbitrarily expressive with increasing resources. Our approach is based on Lov\'asz' characterisation of graph isomorphism through an infinite dimensional vector of homomorphism counts. Our empirical evaluation shows competitive results on several benchmark graph learning tasks.
Geometric Clifford Algebra Networks
We propose Geometric Clifford Algebra Networks (GCANs) for modeling dynamical systems. GCANs are based on symmetry group transformations using geometric (Clifford) algebras. We first review the quintessence of modern (plane-based) geometric algebra, which builds on isometries encoded as elements of the Pin(p,q,r) group. We then propose the concept of group action layers, which linearly combine object transformations using pre-specified group actions. Together with a new activation and normalization scheme, these layers serve as adjustable geometric templates that can be refined via gradient descent. Theoretical advantages are strongly reflected in the modeling of three-dimensional rigid body transformations as well as large-scale fluid dynamics simulations, showing significantly improved performance over traditional methods.
On affine spaces of alternating matrices with constant rank
Let F be a field, and n geq r>0 be integers, with r even. Denote by A_n(F) the space of all n-by-n alternating matrices with entries in F. We consider the problem of determining the greatest possible dimension for an affine subspace of A_n(F) in which every matrix has rank equal to r (or rank at least r). Recently Rubei has solved this problem over the field of real numbers. We extend her result to all fields with large enough cardinality. Provided that n geq r+3 and |F|geq minbigl(r-1,r{2}+2bigr), we also determine the affine subspaces of rank r matrices in A_n(F) that have the greatest possible dimension, and we point to difficulties for the corresponding problem in the case nleq r+2.
Differentiable Euler Characteristic Transforms for Shape Classification
The Euler Characteristic Transform (ECT) has proven to be a powerful representation, combining geometrical and topological characteristics of shapes and graphs. However, the ECT was hitherto unable to learn task-specific representations. We overcome this issue and develop a novel computational layer that enables learning the ECT in an end-to-end fashion. Our method, the Differentiable Euler Characteristic Transform (DECT), is fast and computationally efficient, while exhibiting performance on a par with more complex models in both graph and point cloud classification tasks. Moreover, we show that this seemingly simple statistic provides the same topological expressivity as more complex topological deep learning layers.
Geometric Algebra Attention Networks for Small Point Clouds
Much of the success of deep learning is drawn from building architectures that properly respect underlying symmetry and structure in the data on which they operate - a set of considerations that have been united under the banner of geometric deep learning. Often problems in the physical sciences deal with relatively small sets of points in two- or three-dimensional space wherein translation, rotation, and permutation equivariance are important or even vital for models to be useful in practice. In this work, we present rotation- and permutation-equivariant architectures for deep learning on these small point clouds, composed of a set of products of terms from the geometric algebra and reductions over those products using an attention mechanism. The geometric algebra provides valuable mathematical structure by which to combine vector, scalar, and other types of geometric inputs in a systematic way to account for rotation invariance or covariance, while attention yields a powerful way to impose permutation equivariance. We demonstrate the usefulness of these architectures by training models to solve sample problems relevant to physics, chemistry, and biology.
Categorification of Group Equivariant Neural Networks
We present a novel application of category theory for deep learning. We show how category theory can be used to understand and work with the linear layer functions of group equivariant neural networks whose layers are some tensor power space of R^{n} for the groups S_n, O(n), Sp(n), and SO(n). By using category theoretic constructions, we build a richer structure that is not seen in the original formulation of these neural networks, leading to new insights. In particular, we outline the development of an algorithm for quickly computing the result of a vector that is passed through an equivariant, linear layer for each group in question. The success of our approach suggests that category theory could be beneficial for other areas of deep learning.
Space-time tradeoffs of lenses and optics via higher category theory
Optics and lenses are abstract categorical gadgets that model systems with bidirectional data flow. In this paper we observe that the denotational definition of optics - identifying two optics as equivalent by observing their behaviour from the outside - is not suitable for operational, software oriented approaches where optics are not merely observed, but built with their internal setups in mind. We identify operational differences between denotationally isomorphic categories of cartesian optics and lenses: their different composition rule and corresponding space-time tradeoffs, positioning them at two opposite ends of a spectrum. With these motivations we lift the existing categorical constructions and their relationships to the 2-categorical level, showing that the relevant operational concerns become visible. We define the 2-category 2-Optic(C) whose 2-cells explicitly track optics' internal configuration. We show that the 1-category Optic(C) arises by locally quotienting out the connected components of this 2-category. We show that the embedding of lenses into cartesian optics gets weakened from a functor to an oplax functor whose oplaxator now detects the different composition rule. We determine the difficulties in showing this functor forms a part of an adjunction in any of the standard 2-categories. We establish a conjecture that the well-known isomorphism between cartesian lenses and optics arises out of the lax 2-adjunction between their double-categorical counterparts. In addition to presenting new research, this paper is also meant to be an accessible introduction to the topic.
Graph Automorphism Group Equivariant Neural Networks
For any graph G having n vertices and its automorphism group Aut(G), we provide a full characterisation of all of the possible Aut(G)-equivariant neural networks whose layers are some tensor power of R^{n}. In particular, we find a spanning set of matrices for the learnable, linear, Aut(G)-equivariant layer functions between such tensor power spaces in the standard basis of R^{n}.
Discrete Galerkin Method for Fractional Integro-Differential Equations
In this paper, we develop a fully discrete Galerkin method for solving initial value fractional integro-differential equations(FIDEs). We consider Generalized Jacobi polynomials(GJPs) with indexes corresponding to the number of homogeneous initial conditions as natural basis functions for the approximate solution. The fractional derivatives are used in the Caputo sense. The numerical solvability of algebraic system obtained from implementation of proposed method for a special case of FIDEs is investigated. We also provide a suitable convergence analysis to approximate solutions under a more general regularity assumption on the exact solution.
A Theory of Topological Derivatives for Inverse Rendering of Geometry
We introduce a theoretical framework for differentiable surface evolution that allows discrete topology changes through the use of topological derivatives for variational optimization of image functionals. While prior methods for inverse rendering of geometry rely on silhouette gradients for topology changes, such signals are sparse. In contrast, our theory derives topological derivatives that relate the introduction of vanishing holes and phases to changes in image intensity. As a result, we enable differentiable shape perturbations in the form of hole or phase nucleation. We validate the proposed theory with optimization of closed curves in 2D and surfaces in 3D to lend insights into limitations of current methods and enable improved applications such as image vectorization, vector-graphics generation from text prompts, single-image reconstruction of shape ambigrams and multi-view 3D reconstruction.
Linear Spaces of Meanings: Compositional Structures in Vision-Language Models
We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs). Traditionally, compositionality has been associated with algebraic operations on embeddings of words from a pre-existing vocabulary. In contrast, we seek to approximate representations from an encoder as combinations of a smaller set of vectors in the embedding space. These vectors can be seen as "ideal words" for generating concepts directly within the embedding space of the model. We first present a framework for understanding compositional structures from a geometric perspective. We then explain what these compositional structures entail probabilistically in the case of VLM embeddings, providing intuitions for why they arise in practice. Finally, we empirically explore these structures in CLIP's embeddings and we evaluate their usefulness for solving different vision-language tasks such as classification, debiasing, and retrieval. Our results show that simple linear algebraic operations on embedding vectors can be used as compositional and interpretable methods for regulating the behavior of VLMs.
Generalized Convolution and Efficient Language Recognition
Convolution is a broadly useful operation with applications including signal processing, machine learning, probability, optics, polynomial multiplication, and efficient parsing. Usually, however, this operation is understood and implemented in more specialized forms, hiding commonalities and limiting usefulness. This paper formulates convolution in the common algebraic framework of semirings and semimodules and populates that framework with various representation types. One of those types is the grand abstract template and itself generalizes to the free semimodule monad. Other representations serve varied uses and performance trade-offs, with implementations calculated from simple and regular specifications. Of particular interest is Brzozowski's method for regular expression matching. Uncovering the method's essence frees it from syntactic manipulations, while generalizing from boolean to weighted membership (such as multisets and probability distributions) and from sets to n-ary relations. The classic trie data structure then provides an elegant and efficient alternative to syntax. Pleasantly, polynomial arithmetic requires no additional implementation effort, works correctly with a variety of representations, and handles multivariate polynomials and power series with ease. Image convolution also falls out as a special case.
Mukai duality via roofs of projective bundles
We investigate a construction providing pairs of Calabi-Yau varieties described as zero loci of pushforwards of a hyperplane section on a roof as described by Kanemitsu. We discuss the implications of such construction at the level of Hodge equivalence, derived equivalence and mathbb L-equivalence. For the case of K3 surfaces, we provide alternative interpretations for the Fourier-Mukai duality in the family of K3 surfaces of degree 12 of Mukai. In all these constructions the derived equivalence lifts to an equivalence of matrix factorizations categories.
A Latent Variable Model Approach to PMI-based Word Embeddings
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods. Many use nonlinear operations on co-occurrence statistics, and have hand-tuned hyperparameters and reweighting methods. This paper proposes a new generative model, a dynamic version of the log-linear topic model of~mnih2007three. The methodological novelty is to use the prior to compute closed form expressions for word statistics. This provides a theoretical justification for nonlinear models like PMI, word2vec, and GloVe, as well as some hyperparameter choices. It also helps explain why low-dimensional semantic embeddings contain linear algebraic structure that allows solution of word analogies, as shown by~mikolov2013efficient and many subsequent papers. Experimental support is provided for the generative model assumptions, the most important of which is that latent word vectors are fairly uniformly dispersed in space.
Lexically Grounded Subword Segmentation
We present three innovations in tokenization and subword segmentation. First, we propose to use unsupervised morphological analysis with Morfessor as pre-tokenization. Second, we present an algebraic method for obtaining subword embeddings grounded in a word embedding space. Based on that, we design a novel subword segmentation algorithm that uses the embeddings, ensuring that the procedure considers lexical meaning. Third, we introduce an efficient segmentation algorithm based on a subword bigram model that can be initialized with the lexically aware segmentation method to avoid using Morfessor and large embedding tables at inference time. We evaluate the proposed approaches using two intrinsic metrics and measure their performance on two downstream tasks: part-of-speech tagging and machine translation. Our experiments show significant improvements in the morphological plausibility of the segmentation when evaluated using segmentation precision on morpheme boundaries and improved R\'enyi efficiency in 8 languages. Although the proposed tokenization methods do not have a large impact on automatic translation quality, we observe consistent performance gains in the arguably more morphological task of part-of-speech tagging.
Complex Network for Complex Problems: A comparative study of CNN and Complex-valued CNN
Neural networks, especially convolutional neural networks (CNN), are one of the most common tools these days used in computer vision. Most of these networks work with real-valued data using real-valued features. Complex-valued convolutional neural networks (CV-CNN) can preserve the algebraic structure of complex-valued input data and have the potential to learn more complex relationships between the input and the ground-truth. Although some comparisons of CNNs and CV-CNNs for different tasks have been performed in the past, a large-scale investigation comparing different models operating on different tasks has not been conducted. Furthermore, because complex features contain both real and imaginary components, CV-CNNs have double the number of trainable parameters as real-valued CNNs in terms of the actual number of trainable parameters. Whether or not the improvements in performance with CV-CNN observed in the past have been because of the complex features or just because of having double the number of trainable parameters has not yet been explored. This paper presents a comparative study of CNN, CNNx2 (CNN with double the number of trainable parameters as the CNN), and CV-CNN. The experiments were performed using seven models for two different tasks - brain tumour classification and segmentation in brain MRIs. The results have revealed that the CV-CNN models outperformed the CNN and CNNx2 models.
Re-evaluating Evaluation
Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation -- since there is no harm (computational cost aside) from including all available tasks and agents.
Extending Bootstrap AMG for Clustering of Attributed Graphs
In this paper we propose a new approach to detect clusters in undirected graphs with attributed vertices. We incorporate structural and attribute similarities between the vertices in an augmented graph by creating additional vertices and edges as proposed in [1, 2]. The augmented graph is then embedded in a Euclidean space associated to its Laplacian and we cluster vertices via a modified K-means algorithm, using a new vector-valued distance in the embedding space. Main novelty of our method, which can be classified as an early fusion method, i.e., a method in which additional information on vertices are fused to the structure information before applying clustering, is the interpretation of attributes as new realizations of graph vertices, which can be dealt with as coordinate vectors in a related Euclidean space. This allows us to extend a scalable generalized spectral clustering procedure which substitutes graph Laplacian eigenvectors with some vectors, named algebraically smooth vectors, obtained by a linear-time complexity Algebraic MultiGrid (AMG) method. We discuss the performance of our proposed clustering method by comparison with recent literature approaches and public available results. Extensive experiments on different types of synthetic datasets and real-world attributed graphs show that our new algorithm, embedding attributes information in the clustering, outperforms structure-only-based methods, when the attributed network has an ambiguous structure. Furthermore, our new method largely outperforms the method which originally proposed the graph augmentation, showing that our embedding strategy and vector-valued distance are very effective in taking advantages from the augmented-graph representation.
MathPrompter: Mathematical Reasoning using Large Language Models
Large Language Models (LLMs) have limited performance when solving arithmetic reasoning tasks and often provide incorrect answers. Unlike natural language understanding, math problems typically have a single correct answer, making the task of generating accurate solutions more challenging for LLMs. To the best of our knowledge, we are not aware of any LLMs that indicate their level of confidence in their responses which fuels a trust deficit in these models impeding their adoption. To address this deficiency, we propose `MathPrompter', a technique that improves performance of LLMs on arithmetic problems along with increased reliance in the predictions. MathPrompter uses the Zero-shot chain-of-thought prompting technique to generate multiple Algebraic expressions or Python functions to solve the same math problem in different ways and thereby raise the confidence level in the output results. This is in contrast to other prompt based CoT methods, where there is no check on the validity of the intermediate steps followed. Our technique improves over state-of-the-art on the MultiArith dataset (78.7%rightarrow92.5%) evaluated using 175B parameter GPT-based LLM.
Higher Order Automatic Differentiation of Higher Order Functions
We present semantic correctness proofs of automatic differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Throughout, we show how the analysis extends to AD methods for computing higher order derivatives using a Taylor approximation.
DYNOTEARS: Structure Learning from Time-Series Data
We revisit the structure learning problem for dynamic Bayesian networks and propose a method that simultaneously estimates contemporaneous (intra-slice) and time-lagged (inter-slice) relationships between variables in a time-series. Our approach is score-based, and revolves around minimizing a penalized loss subject to an acyclicity constraint. To solve this problem, we leverage a recent algebraic result characterizing the acyclicity constraint as a smooth equality constraint. The resulting algorithm, which we call DYNOTEARS, outperforms other methods on simulated data, especially in high-dimensions as the number of variables increases. We also apply this algorithm on real datasets from two different domains, finance and molecular biology, and analyze the resulting output. Compared to state-of-the-art methods for learning dynamic Bayesian networks, our method is both scalable and accurate on real data. The simple formulation and competitive performance of our method make it suitable for a variety of problems where one seeks to learn connections between variables across time.
Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing
We present semantic correctness proofs of Automatic Differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Finally, we sketch how the analysis extends to other AD methods by considering a continuation-based method.
Node Embedding from Neural Hamiltonian Orbits in Graph Neural Networks
In the graph node embedding problem, embedding spaces can vary significantly for different data types, leading to the need for different GNN model types. In this paper, we model the embedding update of a node feature as a Hamiltonian orbit over time. Since the Hamiltonian orbits generalize the exponential maps, this approach allows us to learn the underlying manifold of the graph in training, in contrast to most of the existing literature that assumes a fixed graph embedding manifold with a closed exponential map solution. Our proposed node embedding strategy can automatically learn, without extensive tuning, the underlying geometry of any given graph dataset even if it has diverse geometries. We test Hamiltonian functions of different forms and verify the performance of our approach on two graph node embedding downstream tasks: node classification and link prediction. Numerical experiments demonstrate that our approach adapts better to different types of graph datasets than popular state-of-the-art graph node embedding GNNs. The code is available at https://github.com/zknus/Hamiltonian-GNN.
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
We introduce FrontierMath, a benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians. The questions cover most major branches of modern mathematics -- from computationally intensive problems in number theory and real analysis to abstract questions in algebraic geometry and category theory. Solving a typical problem requires multiple hours of effort from a researcher in the relevant branch of mathematics, and for the upper end questions, multiple days. FrontierMath uses new, unpublished problems and automated verification to reliably evaluate models while minimizing risk of data contamination. Current state-of-the-art AI models solve under 2% of problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community. As AI systems advance toward expert-level mathematical abilities, FrontierMath offers a rigorous testbed that quantifies their progress.
Simple Token-Level Confidence Improves Caption Correctness
The ability to judge whether a caption correctly describes an image is a critical part of vision-language understanding. However, state-of-the-art models often misinterpret the correctness of fine-grained details, leading to errors in outputs such as hallucinating objects in generated captions or poor compositional reasoning. In this work, we explore Token-Level Confidence, or TLC, as a simple yet surprisingly effective method to assess caption correctness. Specifically, we fine-tune a vision-language model on image captioning, input an image and proposed caption to the model, and aggregate either algebraic or learned token confidences over words or sequences to estimate image-caption consistency. Compared to sequence-level scores from pretrained models, TLC with algebraic confidence measures achieves a relative improvement in accuracy by 10% on verb understanding in SVO-Probes and outperforms prior state-of-the-art in image and group scores for compositional reasoning in Winoground by a relative 37% and 9%, respectively. When training data are available, a learned confidence estimator provides further improved performance, reducing object hallucination rates in MS COCO Captions by a relative 30% over the original model and setting a new state-of-the-art.
Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams
Solving Algebra Problems with Geometry Diagrams (APGDs) is still a challenging problem because diagram processing is not studied as intensively as language processing. To work against this challenge, this paper proposes a hologram reasoning scheme and develops a high-performance method for solving APGDs by using this scheme. To reach this goal, it first defines a hologram, being a kind of graph, and proposes a hologram generator to convert a given APGD into a hologram, which represents the entire information of APGD and the relations for solving the problem can be acquired from it by a uniform way. Then HGR, a hologram reasoning method employs a pool of prepared graph models to derive algebraic equations, which is consistent with the geometric theorems. This method is able to be updated by adding new graph models into the pool. Lastly, it employs deep reinforcement learning to enhance the efficiency of model selection from the pool. The entire HGR not only ensures high solution accuracy with fewer reasoning steps but also significantly enhances the interpretability of the solution process by providing descriptions of all reasoning steps. Experimental results demonstrate the effectiveness of HGR in improving both accuracy and interpretability in solving APGDs.
On Hofstadter's G-sequence
We characterize the entries of Hofstadter's G-sequence in terms of the lower and upper Wythoff sequences. This can be used to give a short and comprehensive proof of the equality of Hofstadter's G-sequence and the sequence of averages of the swapped Wythoff sequences. In a second part we give some new results that hold when one replaces the golden mean by other quadratic algebraic numbers.
ML-driven Hardware Cost Model for MLIR
During early optimization passes, compilers must make predictions for machine-dependent characteristics such as execution unit utilization, number of register spills, latency, throughput etc. to generate better code. Often a hand-written static/analytical hardware cost model is built into the compiler. However, the need for more sophisticated and varied predictions has become more pronounced with the development of deep learning compilers which need to optimize dataflow graphs. Such compilers usually employ a much higher level MLIR form as an IR representation before lowering to traditional LLVM-IR. A static/analytical cost model in such a scenario is cumbersome and error prone as the opcodes represent very high level algebraic/arithmetic operations. Hence, we develop a machine learning-based cost model for high-level MLIR which can predict different target variables of interest such as CPU/GPU/xPU utilization, instructions executed, register usage etc. By considering the incoming MLIR as a text input a la NLP models we can apply well-known techniques from modern NLP research to help predict hardware characteristics more accurately. We expect such precise ML-driven hardware cost models to guide our deep learning compiler in graph level optimizations around operator fusion, local memory allocation, kernel scheduling etc. as well as in many kernel-level optimizations such as loop interchange, LICM and unroll. We report early work-in -progress results of developing such models on high-level MLIR representing dataflow graphs emitted by Pytorch/Tensorflow-like frameworks as well as lower-level dialects like affine. We show that these models can provide reasonably good estimates with low error bounds for various hardware characteristics of interest and can be a go-to mechanism for hardware cost modelling in the future.
LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning
Recently, deep learning models have made great progress in MWP solving on answer accuracy. However, they are uninterpretable since they mainly rely on shallow heuristics to achieve high performance without understanding and reasoning the grounded math logic. To address this issue and make a step towards interpretable MWP solving, we first construct a high-quality MWP dataset named InterMWP which consists of 11,495 MWPs and annotates interpretable logical formulas based on algebraic knowledge as the grounded linguistic logic of each solution equation. Different from existing MWP datasets, our InterMWP benchmark asks for a solver to not only output the solution expressions but also predict the corresponding logical formulas. We further propose a novel approach with logical prompt and interpretation generation, called LogicSolver. For each MWP, our LogicSolver first retrieves some highly-correlated algebraic knowledge and then passes them to the backbone model as prompts to improve the semantic representations of MWPs. With these improved semantic representations, our LogicSolver generates corresponding solution expressions and interpretable knowledge formulas in accord with the generated solution expressions, simultaneously. Experimental results show that our LogicSolver has stronger logical formula-based interpretability than baselines while achieving higher answer accuracy with the help of logical prompts, simultaneously. The source code and dataset is available at https://github.com/yangzhch6/InterMWP.
On Expressivity and Trainability of Quadratic Networks
Inspired by the diversity of biological neurons, quadratic artificial neurons can play an important role in deep learning models. The type of quadratic neurons of our interest replaces the inner-product operation in the conventional neuron with a quadratic function. Despite promising results so far achieved by networks of quadratic neurons, there are important issues not well addressed. Theoretically, the superior expressivity of a quadratic network over either a conventional network or a conventional network via quadratic activation is not fully elucidated, which makes the use of quadratic networks not well grounded. Practically, although a quadratic network can be trained via generic backpropagation, it can be subject to a higher risk of collapse than the conventional counterpart. To address these issues, we first apply the spline theory and a measure from algebraic geometry to give two theorems that demonstrate better model expressivity of a quadratic network than the conventional counterpart with or without quadratic activation. Then, we propose an effective training strategy referred to as ReLinear to stabilize the training process of a quadratic network, thereby unleashing the full potential in its associated machine learning tasks. Comprehensive experiments on popular datasets are performed to support our findings and confirm the performance of quadratic deep learning. We have shared our code in https://github.com/FengleiFan/ReLinear.
High-performance symbolic-numerics via multiple dispatch
As mathematical computing becomes more democratized in high-level languages, high-performance symbolic-numeric systems are necessary for domain scientists and engineers to get the best performance out of their machine without deep knowledge of code optimization. Naturally, users need different term types either to have different algebraic properties for them, or to use efficient data structures. To this end, we developed Symbolics.jl, an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs. In this work we detail an underlying abstract term interface which allows for speed without sacrificing generality. We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system without changing the pre-existing term rewriters. We showcase how this can be used to optimize term construction and give a 113x acceleration on general symbolic transformations. Further, we show that such a generic API allows for complementary term-rewriting implementations. We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers. We showcase an e-graph ruleset which minimizes the number of CPU cycles during expression evaluation, and demonstrate how it simplifies a real-world reaction-network simulation to halve the runtime. Additionally, we show a reaction-diffusion partial differential equation solver which is able to be automatically converted into symbolic expressions via multiple dispatch tracing, which is subsequently accelerated and parallelized to give a 157x simulation speedup. Together, this presents Symbolics.jl as a next-generation symbolic-numeric computing environment geared towards modeling and simulation.
A Categorical Framework for Learning Generalised Tree Automata
Automata learning is a popular technique used to automatically construct an automaton model from queries. Much research went into devising ad hoc adaptations of algorithms for different types of automata. The CALF project seeks to unify these using category theory in order to ease correctness proofs and guide the design of new algorithms. In this paper, we extend CALF to cover learning of algebraic structures that may not have a coalgebraic presentation. Furthermore, we provide a detailed algorithmic account of an abstract version of the popular L* algorithm, which was missing from CALF. We instantiate the abstract theory to a large class of Set functors, by which we recover for the first time practical tree automata learning algorithms from an abstract framework and at the same time obtain new algorithms to learn algebras of quotiented polynomial functors.
Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models
The performance of large language models (LLMs) on existing reasoning benchmarks has significantly improved over the past years. In response, we present JEEBench, a considerably more challenging benchmark dataset for evaluating the problem solving abilities of LLMs. We curate 515 challenging pre-engineering mathematics, physics and chemistry problems from the highly competitive IIT JEE-Advanced exam. Long-horizon reasoning on top of deep in-domain knowledge is essential for solving problems in this benchmark. Our evaluation on various open-source and proprietary models reveals that the highest performance, even after using techniques like self-consistency, self-refinement and chain-of-thought prompting, is less than 40%. The typical failure modes of GPT-4, the best model, are errors in algebraic manipulation, difficulty in grounding abstract concepts into mathematical equations accurately and failure in retrieving relevant domain-specific concepts. We also observe that by mere prompting, GPT-4 is unable to assess risk introduced by negative marking for incorrect answers. For this, we develop a post-hoc confidence-thresholding method over self-consistency, which enables effective response selection. We hope that our challenging benchmark will guide future re-search in problem-solving using LLMs.
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
Vision-Language Models (VLMs) have transformed tasks requiring visual and reasoning abilities, such as image retrieval and Visual Question Answering (VQA). Despite their success, VLMs face significant challenges with tasks involving geometric reasoning, algebraic problem-solving, and counting. These limitations stem from difficulties effectively integrating multiple modalities and accurately interpreting geometry-related tasks. Various works claim that introducing a captioning pipeline before VQA tasks enhances performance. We incorporated this pipeline for tasks involving geometry, algebra, and counting. We found that captioning results are not generalizable, specifically with larger VLMs primarily trained on downstream QnA tasks showing random performance on math-related challenges. However, we present a promising alternative: task-based prompting, enriching the prompt with task-specific guidance. This approach shows promise and proves more effective than direct captioning methods for math-heavy problems.
Meaning Representations from Trajectories in Autoregressive Models
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text. This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model. Moreover, unlike vector-based representations, distribution-based representations can also model asymmetric relations (e.g., direction of logical entailment, hypernym/hyponym relations) by using algebraic operations between likelihood functions. These ideas are grounded in distributional perspectives on semantics and are connected to standard constructions in automata theory, but to our knowledge they have not been applied to modern language models. We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle. Finally, we extend our method to represent data from different modalities (e.g., image and text) using multimodal autoregressive models. Our code is available at: https://github.com/tianyu139/meaning-as-trajectories
IterLara: A Turing Complete Algebra for Big Data, AI, Scientific Computing, and Database
Lara is a key-value algebra that aims at unifying linear and relational algebra with three types of operation abstraction. The study of Lara's expressive ability reports that it can represent relational algebra and most linear algebra operations. However, several essential computations, such as matrix inversion and determinant, cannot be expressed in Lara. Lara cannot represent global and iterative computation, either. This article proposes IterLara, extending Lara with iterative operators, to provide an algebraic model that unifies operations in general-purpose computing, like big data, AI, scientific computing, and database. We study the expressive ability of Lara and IterLara and prove that IterLara with aggregation functions can represent matrix inversion, determinant. Besides, we demonstrate that IterLara with no limitation of function utility is Turing complete. We also propose the Operation Count (OP) as a metric of computation amount for IterLara and ensure that the OP metric is in accordance with the existing computation metrics.
An Evaluation Dataset for Legal Word Embedding: A Case Study On Chinese Codex
Word embedding is a modern distributed word representations approach widely used in many natural language processing tasks. Converting the vocabulary in a legal document into a word embedding model facilitates subjecting legal documents to machine learning, deep learning, and other algorithms and subsequently performing the downstream tasks of natural language processing vis-\`a-vis, for instance, document classification, contract review, and machine translation. The most common and practical approach of accuracy evaluation with the word embedding model uses a benchmark set with linguistic rules or the relationship between words to perform analogy reasoning via algebraic calculation. This paper proposes establishing a 1,134 Legal Analogical Reasoning Questions Set (LARQS) from the 2,388 Chinese Codex corpus using five kinds of legal relations, which are then used to evaluate the accuracy of the Chinese word embedding model. Moreover, we discovered that legal relations might be ubiquitous in the word embedding model.
Poincaré Embeddings for Learning Hierarchical Representations
Representation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space -- or more precisely into an n-dimensional Poincar\'e ball. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity. We introduce an efficient algorithm to learn the embeddings based on Riemannian optimization and show experimentally that Poincar\'e embeddings outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.
Evaluating Language Models for Mathematics through Interactions
The standard methodology of evaluating large language models (LLMs) based on static pairs of inputs and outputs is insufficient for developing assistants: this kind of assessments fails to take into account the essential interactive element in their deployment, and therefore limits how we understand language model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate to evaluate three language models~(InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analysing MathConverse, we derive a preliminary taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness in LLM generations, amongst other findings. Further, we identify useful scenarios and existing issues of GPT-4 in mathematical reasoning through a series of case studies contributed by expert mathematicians. We conclude with actionable takeaways for ML practitioners and mathematicians: models which communicate uncertainty, respond well to user corrections, are more interpretable and concise may constitute better assistants; interactive evaluation is a promising way to continually navigate the capability of these models; humans should be aware of language models' algebraic fallibility, and for that reason discern where they should be used.
A Probability Monad as the Colimit of Spaces of Finite Samples
We define and study a probability monad on the category of complete metric spaces and short maps. It assigns to each space the space of Radon probability measures on it with finite first moment, equipped with the Kantorovich-Wasserstein distance. This monad is analogous to the Giry monad on the category of Polish spaces, and it extends a construction due to van Breugel for compact and for 1-bounded complete metric spaces. We prove that this Kantorovich monad arises from a colimit construction on finite power-like constructions, which formalizes the intuition that probability measures are limits of finite samples. The proof relies on a criterion for when an ordinary left Kan extension of lax monoidal functors is a monoidal Kan extension. The colimit characterization allows the development of integration theory and the treatment of measures on spaces of measures, without measure theory. We also show that the category of algebras of the Kantorovich monad is equivalent to the category of closed convex subsets of Banach spaces with short affine maps as morphisms.
Specialization maps for Scholze's category of diamonds
We introduce the specialization map in Scholzes theory of diamonds. We consider v-sheaves that behave like formal schemes and call them kimberlites. We attach to them: a reduced special fiber, an analytic locus, a specialization map, a Zariski site, and an etale site. When the kimberlite comes from a formal scheme, our sites recover the classical ones. We prove that unramified p-adic Beilinson--Drinfeld Grassmannians are kimberlites with finiteness and normality properties.
Topological data analysis on noisy quantum computers
Topological data analysis (TDA) is a powerful technique for extracting complex and valuable shape-related summaries of high-dimensional data. However, the computational demands of classical algorithms for computing TDA are exorbitant, and quickly become impractical for high-order characteristics. Quantum computers offer the potential of achieving significant speedup for certain computational problems. Indeed, TDA has been purported to be one such problem, yet, quantum computing algorithms proposed for the problem, such as the original Quantum TDA (QTDA) formulation by Lloyd, Garnerone and Zanardi, require fault-tolerance qualifications that are currently unavailable. In this study, we present NISQ-TDA, a fully implemented end-to-end quantum machine learning algorithm needing only a short circuit-depth, that is applicable to high-dimensional classical data, and with provable asymptotic speedup for certain classes of problems. The algorithm neither suffers from the data-loading problem nor does it need to store the input data on the quantum computer explicitly. The algorithm was successfully executed on quantum computing devices, as well as on noisy quantum simulators, applied to small datasets. Preliminary empirical results suggest that the algorithm is robust to noise.
Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation
The study of universal approximation properties (UAP) for neural networks (NN) has a long history. When the network width is unlimited, only a single hidden layer is sufficient for UAP. In contrast, when the depth is unlimited, the width for UAP needs to be not less than the critical width w^*_{min}=max(d_x,d_y), where d_x and d_y are the dimensions of the input and output, respectively. Recently, cai2022achieve shows that a leaky-ReLU NN with this critical width can achieve UAP for L^p functions on a compact domain K, i.e., the UAP for L^p(K,R^{d_y}). This paper examines a uniform UAP for the function class C(K,R^{d_y}) and gives the exact minimum width of the leaky-ReLU NN as w_{min}=max(d_x+1,d_y)+1_{d_y=d_x+1}, which involves the effects of the output dimensions. To obtain this result, we propose a novel lift-flow-discretization approach that shows that the uniform UAP has a deep connection with topological theory.
Holistic Geometric Feature Learning for Structured Reconstruction
The inference of topological principles is a key problem in structured reconstruction. We observe that wrongly predicted topological relationships are often incurred by the lack of holistic geometry clues in low-level features. Inspired by the fact that massive signals can be compactly described with frequency analysis, we experimentally explore the efficiency and tendency of learning structure geometry in the frequency domain. Accordingly, we propose a frequency-domain feature learning strategy (F-Learn) to fuse scattered geometric fragments holistically for topology-intact structure reasoning. Benefiting from the parsimonious design, the F-Learn strategy can be easily deployed into a deep reconstructor with a lightweight model modification. Experiments demonstrate that the F-Learn strategy can effectively introduce structure awareness into geometric primitive detection and topology inference, bringing significant performance improvement to final structured reconstruction. Code and pre-trained models are available at https://github.com/Geo-Tell/F-Learn.
Lie Group Decompositions for Equivariant Neural Networks
Invariance and equivariance to geometrical transformations have proven to be very useful inductive biases when training (convolutional) neural network models, especially in the low-data regime. Much work has focused on the case where the symmetry group employed is compact or abelian, or both. Recent work has explored enlarging the class of transformations used to the case of Lie groups, principally through the use of their Lie algebra, as well as the group exponential and logarithm maps. The applicability of such methods to larger transformation groups is limited by the fact that depending on the group of interest G, the exponential map may not be surjective. Further limitations are encountered when G is neither compact nor abelian. Using the structure and geometry of Lie groups and their homogeneous spaces, we present a framework by which it is possible to work with such groups primarily focusing on the Lie groups G = GL^{+}(n, R) and G = SL(n, R), as well as their representation as affine transformations R^{n} rtimes G. Invariant integration as well as a global parametrization is realized by decomposing the `larger` groups into subgroups and submanifolds which can be handled individually. Under this framework, we show how convolution kernels can be parametrized to build models equivariant with respect to affine transformations. We evaluate the robustness and out-of-distribution generalisation capability of our model on the standard affine-invariant benchmark classification task, where we outperform all previous equivariant models as well as all Capsule Network proposals.
Topological Components in a Community Currency Network
Transaction data from digital payment systems can be used to study economic processes at such a detail that was not possible previously. Here, we analyse the data from Sarafu token network, a community inclusion currency in Kenya. During the COVID-19 emergency, the Sarafu was disbursed as part of a humanitarian aid project. In this work, the transactions are analysed using network science. A topological categorisation is defined to identify cyclic and acyclic components. Furthermore, temporal aspects of circulation taking place within these components are considered. The significant presence of different types of strongly connected components as compared to randomized null models shows the importance of cycles in this economic network. Especially, indicating their key role in currency recirculation. In some acyclic components, the most significant triad suggests the presence of a group of users collecting currency from accounts active only once, hinting at a misuse of the system. In some other acyclic components, small isolated groups of users were active only once, suggesting the presence of users only interested in trying out the system. The methods used in this paper can answer specific questions related to user activities, currency design, and assessment of monetary interventions. Our methodology provides a general quantitative tool for analysing the behaviour of users in a currency network.
Generating logical magic states with the aid of non-Abelian topological order
In fault-tolerant quantum computing with the surface code, non-Clifford gates are crucial for universal computation. However, implementing these gates using methods like magic state distillation and code switching requires significant resources. In this work, we propose a new protocol that combines magic state preparation and code switching to realize logical non-Clifford operations with the potential for fault tolerance. Our approach begins with a special logical state in the Z_4 surface code. By applying a sequence of transformations, the system goes through different topological codes, including the non-Abelian D_4 quantum double model. This process ultimately produces a magic state in a condensed Z_2 surface code, which enables the implementation of a logical T gate in the standard Z_2 surface code. In our analysis, we employ a framework where the topological codes are represented by their topological orders and all the transformations are considered as topological manipulations such as gauging symmetries and condensing anyons. This perspective is particularly useful for understanding code switching between topological codes.
Elliptic genera of two-dimensional N=2 gauge theories with rank-one gauge groups
We compute the elliptic genera of two-dimensional N=(2,2) and N=(0,2) gauged linear sigma models via supersymmetric localization, for rank-one gauge groups. The elliptic genus is expressed as a sum over residues of a meromorphic function whose argument is the holonomy of the gauge field along both the spatial and the temporal directions of the torus. We illustrate our formulas by a few examples including the quintic Calabi-Yau, N=(2,2) SU(2) and O(2) gauge theories coupled to N fundamental chiral multiplets, and a geometric N=(0,2) model.
Visualizing Large-scale and High-dimensional Data
We study the problem of visualizing large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space. Much success has been reported recently by techniques that first compute a similarity structure of the data points and then project them into a low-dimensional space with the structure preserved. These two steps suffer from considerable computational costs, preventing the state-of-the-art methods such as the t-SNE from scaling to large-scale and high-dimensional data (e.g., millions of data points and hundreds of dimensions). We propose the LargeVis, a technique that first constructs an accurately approximated K-nearest neighbor graph from the data and then layouts the graph in the low-dimensional space. Comparing to t-SNE, LargeVis significantly reduces the computational cost of the graph construction step and employs a principled probabilistic model for the visualization step, the objective of which can be effectively optimized through asynchronous stochastic gradient descent with a linear time complexity. The whole procedure thus easily scales to millions of high-dimensional data points. Experimental results on real-world data sets demonstrate that the LargeVis outperforms the state-of-the-art methods in both efficiency and effectiveness. The hyper-parameters of LargeVis are also much more stable over different data sets.