Spaces:

MilesCranmer
/

PySR

Running

App Files Files Community

PySR / README.md

MilesCranmer

Installation instructions

dc9d777 over 4 years ago

preview code

raw

history blame

5.87 kB

	# Eureqa.jl

	Symbolic regression built on Julia, and interfaced by Python.
	Uses regularized evolution and simulated annealing.

	## Installation

	Install [Julia](https://julialang.org/downloads/). Then, at the command line,
	install the `Optim` package via: `julia -e 'import Pkg; Pkg.add("Optim")'`.
	For python, you need to have Python 3, numpy, and pandas installed.

	## Running:

	You can either call the program by calling the `eureqa` function from `eureqa.py`,
	or execute the program from the command line with, for example:
	```bash
	python eureqa.py --threads 8 --binary-operators plus mult pow --npop 200
	```

	Here is the full list of arguments:
	```
	usage: eureqa.py [-h] [--threads THREADS] [--parsimony PARSIMONY]
	[--alpha ALPHA] [--maxsize MAXSIZE]
	[--niterations NITERATIONS] [--npop NPOP]
	[--ncyclesperiteration NCYCLESPERITERATION] [--topn TOPN]
	[--fractionReplacedHof FRACTIONREPLACEDHOF]
	[--fractionReplaced FRACTIONREPLACED] [--migration MIGRATION]
	[--hofMigration HOFMIGRATION]
	[--shouldOptimizeConstants SHOULDOPTIMIZECONSTANTS]
	[--annealing ANNEALING] [--equation_file EQUATION_FILE]
	[--test TEST]
	[--binary-operators BINARY_OPERATORS [BINARY_OPERATORS ...]]
	[--unary-operators UNARY_OPERATORS]

	optional arguments:
	-h, --help show this help message and exit
	--threads THREADS Number of threads (default: 4)
	--parsimony PARSIMONY
	How much to punish complexity (default: 0.001)
	--alpha ALPHA Scaling of temperature (default: 10)
	--maxsize MAXSIZE Max size of equation (default: 20)
	--niterations NITERATIONS
	Number of total migration periods (default: 20)
	--npop NPOP Number of members per population (default: 100)
	--ncyclesperiteration NCYCLESPERITERATION
	Number of evolutionary cycles per migration (default:
	5000)
	--topn TOPN How many best species to distribute from each
	population (default: 10)
	--fractionReplacedHof FRACTIONREPLACEDHOF
	Fraction of population to replace with hall of fame
	(default: 0.1)
	--fractionReplaced FRACTIONREPLACED
	Fraction of population to replace with best from other
	populations (default: 0.1)
	--migration MIGRATION
	Whether to migrate (default: True)
	--hofMigration HOFMIGRATION
	Whether to have hall of fame migration (default: True)
	--shouldOptimizeConstants SHOULDOPTIMIZECONSTANTS
	Whether to use classical optimization on constants
	before every migration (doesn't impact performance
	that much) (default: True)
	--annealing ANNEALING
	Whether to use simulated annealing (default: True)
	--equation_file EQUATION_FILE
	File to dump best equations to (default:
	hall_of_fame.csv)
	--test TEST Which test to run (default: simple1)
	--binary-operators BINARY_OPERATORS [BINARY_OPERATORS ...]
	Binary operators. Make sure they are defined in
	operators.jl (default: ['plus', 'mul'])
	--unary-operators UNARY_OPERATORS
	Unary operators. Make sure they are defined in
	operators.jl (default: ['exp', 'sin', 'cos'])
	```




	## Modification

	You can add more operators in `operators.jl`, or use default
	Julia ones. Make sure all operators are defined for scalar `Float32`.
	Then just specify the operator names in your call, as above.
	You can also change the dataset learned on by passing in `X` and `y` as
	numpy arrays to `eureqa(...)`.

	One can also adjust the relative probabilities of each operation here,
	inside `eureqa.jl`:
	```julia
	weights = [8, 1, 1, 1, 0.1, 0.5, 2]
	```
	for:

	1. Perturb constant
	2. Mutate operator
	3. Append a node
	4. Delete a subtree
	5. Simplify equation
	6. Randomize completely
	7. Do nothing


	# TODO

	- [ ] Hyperparameter tune
	- [ ] Add interface for either defining an operation to learn, or loading in arbitrary dataset.
	- Could just write out the dataset in julia, or load it.
	- [ ] Add mutation for constant<->variable
	- [ ] Create a benchmark for accuracy
	- [ ] Use NN to generate weights over all probability distribution conditional on error and existing equation, and train on some randomly-generated equations
	- [ ] Performance:
	- [ ] Use an enum for functions instead of storing them?
	- Current most expensive operations:
	- [x] deepcopy() before the mutate, to see whether to accept or not.
	- Seems like its necessary right now. But still by far the slowest option.
	- [ ] Calculating the loss function - there is duplicate calculations happening.
	- [ ] Declaration of the weights array every iteration
	- [x] Create a Python interface
	- [x] Explicit constant optimization on hall-of-fame
	- Create method to find and return all constants, from left to right
	- Create method to find and set all constants, in same order
	- Pull up some optimization algorithm and add it. Keep the package small!
	- [x] Create a benchmark for speed
	- [x] Simplify subtrees with only constants beneath them. Or should I? Maybe randomly simplify sometimes?
	- [x] Record hall of fame
	- [x] Optionally (with hyperparameter) migrate the hall of fame, rather than current bests
	- [x] Test performance of reduced precision integers
	- No effect
	- [x] Create struct to pass through all hyperparameters, instead of treating as constants
	- Make sure doesn't affect performance