Universal Dataset to Test, Enhance and Benchmark AI Algorithms https://mltblog.com/4ia7r2D
This scientific research has three components. First, I present my most recent advances toward solving one of the most famous multi-century-old conjectures in number theory: one that kids in elementary school can understand, yet that is incredibly hard to prove. At its very core, it is about the spectacular quantum dynamics of the digit sum function.
Then, I present an infinite dataset that has all the patterns you or AI can imagine, and many more, ranging from obvious to undetectable. More specifically, it is an infinite collection of infinite datasets, all in tabular format, with various degrees of auto- and cross-correlation (short- and long-range), designed to test, enhance and benchmark AI algorithms, including LLMs. It is based on the physics of the digit sum function and linked to the aforementioned conjecture. This one-of-a-kind synthetic data is useful in contexts such as fraud detection and cybersecurity.
Finally, it comes with very efficient Python code to generate the data, involving gigantic numbers and high-precision arithmetic.
➡️ Read the article to learn how to generate and use the dataset: https://mltblog.com/4ia7r2D