# Eureqa.jl
Symbolic regression built on Julia, with a Python interface.
Uses regularized evolution and simulated annealing.
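
For readers unfamiliar with regularized evolution, the following is a minimal, generic sketch of the idea on a toy problem (minimizing `(x - 3)^2` over plain floats). It is not the code in `eureqa.jl`: the function names, the toy loss, the Gaussian mutation, and the `sample_size` parameter are all assumptions made for illustration, and simulated annealing is left out. Each cycle samples a small tournament, mutates a copy of its best member, adds the child, and removes the oldest member.

```python
import random
from collections import deque


def loss(x):
    # Toy objective (illustrative only): minimized at x = 3.
    return (x - 3.0) ** 2


def regularized_evolution(npop=20, ncycles=500, sample_size=5, seed=0):
    rng = random.Random(seed)
    # The population is a queue ordered by age: oldest on the left.
    population = deque(rng.uniform(-10.0, 10.0) for _ in range(npop))
    # Track the best member ever seen, since aging can discard it.
    best = min(population, key=loss)
    for _ in range(ncycles):
        # Tournament: sample a few members and pick the fittest as parent.
        tournament = rng.sample(list(population), sample_size)
        parent = min(tournament, key=loss)
        # Mutate a copy of the parent (here: add Gaussian noise).
        child = parent + rng.gauss(0.0, 0.5)
        # Add the child and remove the oldest member, not the worst one.
        population.append(child)
        population.popleft()
        if loss(child) < loss(best):
            best = child
    return best, population


best, population = regularized_evolution()
print(best, loss(best))
```

Removing the oldest member rather than the worst one is what distinguishes regularized evolution from a plain tournament scheme; keeping a separate record of the best member plays the role of a hall of fame.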
## Installation
Install [Julia](https://julialang.org/downloads/). Then, at the command line,
install the `Optim` package via: `julia -e 'import Pkg; Pkg.add("Optim")'`.
For Python, you need Python 3 with numpy and pandas installed.
## Running
You can either call the `eureqa` function from `eureqa.py`,
or run the program from the command line, for example:
```bash
python eureqa.py --threads 8 --binary-operators plus mult pow --npop 200
```
Here is the full list of arguments:
```
usage: eureqa.py [-h] [--threads THREADS] [--parsimony PARSIMONY]
                 [--alpha ALPHA] [--maxsize MAXSIZE]
                 [--niterations NITERATIONS] [--npop NPOP]
                 [--ncyclesperiteration NCYCLESPERITERATION] [--topn TOPN]
                 [--fractionReplacedHof FRACTIONREPLACEDHOF]
                 [--fractionReplaced FRACTIONREPLACED] [--migration MIGRATION]
                 [--hofMigration HOFMIGRATION]
                 [--shouldOptimizeConstants SHOULDOPTIMIZECONSTANTS]
                 [--annealing ANNEALING] [--equation_file EQUATION_FILE]
                 [--test TEST]
                 [--binary-operators BINARY_OPERATORS [BINARY_OPERATORS ...]]
                 [--unary-operators UNARY_OPERATORS]

optional arguments:
  -h, --help            show this help message and exit
  --threads THREADS     Number of threads (default: 4)
  --parsimony PARSIMONY
                        How much to punish complexity (default: 0.001)
  --alpha ALPHA         Scaling of temperature (default: 10)
  --maxsize MAXSIZE     Max size of equation (default: 20)
  --niterations NITERATIONS
                        Number of total migration periods (default: 20)
  --npop NPOP           Number of members per population (default: 100)
  --ncyclesperiteration NCYCLESPERITERATION
                        Number of evolutionary cycles per migration (default:
                        5000)
  --topn TOPN           How many best species to distribute from each
                        population (default: 10)
  --fractionReplacedHof FRACTIONREPLACEDHOF
                        Fraction of population to replace with hall of fame
                        (default: 0.1)
  --fractionReplaced FRACTIONREPLACED
                        Fraction of population to replace with best from other
                        populations (default: 0.1)
  --migration MIGRATION
                        Whether to migrate (default: True)
  --hofMigration HOFMIGRATION
                        Whether to have hall of fame migration (default: True)
  --shouldOptimizeConstants SHOULDOPTIMIZECONSTANTS
                        Whether to use classical optimization on constants
                        before every migration (doesn't impact performance
                        that much) (default: True)
  --annealing ANNEALING
                        Whether to use simulated annealing (default: True)
  --equation_file EQUATION_FILE
                        File to dump best equations to (default:
                        hall_of_fame.csv)
  --test TEST           Which test to run (default: simple1)
  --binary-operators BINARY_OPERATORS [BINARY_OPERATORS ...]
                        Binary operators. Make sure they are defined in
                        operators.jl (default: ['plus', 'mul'])
  --unary-operators UNARY_OPERATORS
                        Unary operators. Make sure they are defined in
                        operators.jl (default: ['exp', 'sin', 'cos'])
```
## Modification
You can add more operators in `operators.jl`, or use Julia's built-in
ones. Make sure all operators are defined for scalar `Float32`.
Then specify the operator names in your call, as above.
You can also change the dataset to fit by passing `X` and `y` as
numpy arrays to `eureqa(...)`.
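
As a rough sketch of preparing such a dataset, the snippet below builds `X` and `y` as `float32` numpy arrays. The layout (one row per sample, one column per feature), the toy target function, and the keyword names in the commented-out call are assumptions for illustration; check `eureqa.py` for the actual signature.

```python
import numpy as np

# Reproducible toy data: 100 samples, 5 features (layout is an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)).astype(np.float32)

# Toy target to recover (illustrative only): y = 2*cos(x3) + x0^2 - 2.
y = (2.0 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 2.0).astype(np.float32)

print(X.shape, y.shape, X.dtype)

# Hypothetical call; the keyword names are not confirmed by this README:
# from eureqa import eureqa
# eureqa(X=X, y=y)
```

Casting to `float32` mirrors the requirement that operators be defined for scalar `Float32`.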
You can also adjust the relative probability of each operation
inside `eureqa.jl`:
```julia
weights = [8, 1, 1, 1, 0.1, 0.5, 2]
```
The entries correspond, in order, to:
1. Perturb constant
2. Mutate operator
3. Append a node
4. Delete a subtree
5. Simplify equation
6. Randomize completely
7. Do nothing
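
To see what these relative weights mean in practice, the sketch below normalizes them by their sum (13.6) and draws one operation at random. That `eureqa.jl` samples in exactly this way is an assumption; the operation names are taken from the list above, and the variable names are illustrative.

```python
import random

# Operation names in the same order as the weights array.
MUTATIONS = [
    "perturb constant",
    "mutate operator",
    "append a node",
    "delete a subtree",
    "simplify equation",
    "randomize completely",
    "do nothing",
]
weights = [8, 1, 1, 1, 0.1, 0.5, 2]

# Convert relative weights into probabilities that sum to 1.
total = sum(weights)
probabilities = [w / total for w in weights]
for name, p in zip(MUTATIONS, probabilities):
    print(f"{name:22s} {p:.3f}")

# Draw one operation according to the weights (seeded for repeatability).
rng = random.Random(0)
choice = rng.choices(MUTATIONS, weights=weights, k=1)[0]
print("sampled:", choice)
```

Under this reading, perturbing a constant is chosen roughly 59% of the time and simplification under 1% of the time.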
# TODO
- [ ] Hyperparameter tune
- [ ] Add mutation for constant<->variable
- [ ] Create a benchmark for accuracy
- [ ] Use NN to generate weights over all probability distribution conditional on error and existing equation, and train on some randomly-generated equations
- [ ] Performance:
    - [ ] Use an enum for functions instead of storing them?
    - Current most expensive operations:
        - [x] deepcopy() before the mutate, to see whether to accept or not.
            - Seems like it's necessary right now. But still by far the slowest option.
        - [ ] Calculating the loss function - there are duplicate calculations happening.
        - [ ] Declaration of the weights array every iteration
- [x] Add interface for either defining an operation to learn, or loading in arbitrary dataset.
    - Could just write out the dataset in Julia, or load it.
- [x] Create a Python interface
- [x] Explicit constant optimization on hall-of-fame
    - Create method to find and return all constants, from left to right
    - Create method to find and set all constants, in same order
    - Pull up some optimization algorithm and add it. Keep the package small!
- [x] Create a benchmark for speed
- [x] Simplify subtrees with only constants beneath them. Or should I? Maybe randomly simplify sometimes?
- [x] Record hall of fame
- [x] Optionally (with hyperparameter) migrate the hall of fame, rather than current bests
- [x] Test performance of reduced precision integers
    - No effect
- [x] Create struct to pass through all hyperparameters, instead of treating as constants
    - Make sure it doesn't affect performance