MilesCranmer committed on
Commit
012bfcc
1 Parent(s): 7fb9d91

Greatly improve readme

Files changed (2)
  1. README.md +65 -70
  2. eureqa.py +29 -58
README.md CHANGED
@@ -11,74 +11,17 @@ For python, you need to have Python 3, numpy, and pandas installed.
11
 
12
  ## Running:
13
 
14
- You can either call the program by calling the `eureqa` function from `eureqa.py`,
15
- or execute the program from the command line with, for example:
16
- ```bash
17
- python eureqa.py --threads 8 --binary-operators plus mult pow --npop 200
18
- ```
19
-
20
- Here is the full list of arguments:
21
- ```
22
- usage: eureqa.py [-h] [--threads THREADS] [--parsimony PARSIMONY]
23
- [--alpha ALPHA] [--maxsize MAXSIZE]
24
- [--niterations NITERATIONS] [--npop NPOP]
25
- [--ncyclesperiteration NCYCLESPERITERATION] [--topn TOPN]
26
- [--fractionReplacedHof FRACTIONREPLACEDHOF]
27
- [--fractionReplaced FRACTIONREPLACED] [--migration MIGRATION]
28
- [--hofMigration HOFMIGRATION]
29
- [--shouldOptimizeConstants SHOULDOPTIMIZECONSTANTS]
30
- [--annealing ANNEALING] [--equation_file EQUATION_FILE]
31
- [--test TEST]
32
- [--binary-operators BINARY_OPERATORS [BINARY_OPERATORS ...]]
33
- [--unary-operators UNARY_OPERATORS]
34
-
35
- optional arguments:
36
- -h, --help show this help message and exit
37
- --threads THREADS Number of threads (default: 4)
38
- --parsimony PARSIMONY
39
- How much to punish complexity (default: 0.001)
40
- --alpha ALPHA Scaling of temperature (default: 10)
41
- --maxsize MAXSIZE Max size of equation (default: 20)
42
- --niterations NITERATIONS
43
- Number of total migration periods (default: 20)
44
- --npop NPOP Number of members per population (default: 100)
45
- --ncyclesperiteration NCYCLESPERITERATION
46
- Number of evolutionary cycles per migration (default:
47
- 5000)
48
- --topn TOPN How many best species to distribute from each
49
- population (default: 10)
50
- --fractionReplacedHof FRACTIONREPLACEDHOF
51
- Fraction of population to replace with hall of fame
52
- (default: 0.1)
53
- --fractionReplaced FRACTIONREPLACED
54
- Fraction of population to replace with best from other
55
- populations (default: 0.1)
56
- --migration MIGRATION
57
- Whether to migrate (default: True)
58
- --hofMigration HOFMIGRATION
59
- Whether to have hall of fame migration (default: True)
60
- --shouldOptimizeConstants SHOULDOPTIMIZECONSTANTS
61
- Whether to use classical optimization on constants
62
- before every migration (doesn't impact performance
63
- that much) (default: True)
64
- --annealing ANNEALING
65
- Whether to use simulated annealing (default: True)
66
- --equation_file EQUATION_FILE
67
- File to dump best equations to (default:
68
- hall_of_fame.csv)
69
- --test TEST Which test to run (default: simple1)
70
- --binary-operators BINARY_OPERATORS [BINARY_OPERATORS ...]
71
- Binary operators. Make sure they are defined in
72
- operators.jl (default: ['plus', 'mult'])
73
- --unary-operators UNARY_OPERATORS
74
- Unary operators. Make sure they are defined in
75
- operators.jl (default: ['exp', 'sin', 'cos'])
76
- ```
77
-
78
-
79
-
80
 
81
- ## Modification
82
 
83
  You can add more operators in `operators.jl`, or use default
84
  Julia ones. Make sure all operators are defined for scalar `Float32`.
@@ -86,9 +29,61 @@ Then just specify the operator names in your call, as above.
86
  You can also change the dataset learned on by passing in `X` and `y` as
87
  numpy arrays to `eureqa(...)`.
88
 
89
- One can also adjust the relative probabilities of each mutation operation
90
- with the `weight...` parameters to `eureqa(...).
91
- inside `eureqa.jl`.
92
 
93
  # TODO
94
 
 
11
 
12
  ## Running:
13
 
14
+ What follows is the API reference for running the numpy interface.
15
+ Note that nearly all parameters here
16
+ have been tuned with ~1000 trials over several example
17
+ equations. However, you should adjust `threads`, `niterations`,
18
+ `binary_operators`, `unary_operators` to your requirements.
19
 
20
+ The program will output a pandas DataFrame containing the equations,
21
+ mean square error, and complexity. It will also dump to a csv
22
+ at the end of every iteration,
23
+ which is `hall_of_fame.csv` by default. It also prints the
24
+ equations to stdout.
25
 
26
  You can add more operators in `operators.jl`, or use default
27
  Julia ones. Make sure all operators are defined for scalar `Float32`.
 
29
  You can also change the dataset learned on by passing in `X` and `y` as
30
  numpy arrays to `eureqa(...)`.
31
 
32
+ ```python
33
+ eureqa(X=None, y=None, threads=4, niterations=20, ncyclesperiteration=int(default_ncyclesperiteration), binary_operators=["plus", "mult"], unary_operators=["cos", "exp", "sin"], alpha=default_alpha, annealing=True, fractionReplaced=default_fractionReplaced, fractionReplacedHof=default_fractionReplacedHof, npop=int(default_npop), parsimony=default_parsimony, migration=True, hofMigration=True, shouldOptimizeConstants=True, topn=int(default_topn), weightAddNode=default_weightAddNode, weightDeleteNode=default_weightDeleteNode, weightDoNothing=default_weightDoNothing, weightMutateConstant=default_weightMutateConstant, weightMutateOperator=default_weightMutateOperator, weightRandomize=default_weightRandomize, weightSimplify=default_weightSimplify, timeout=None, equation_file='hall_of_fame.csv', test='simple1', maxsize=20)
34
+ ```
35
+
36
+ Run symbolic regression to fit f(X[i, :]) ~ y[i] for all i.
37
+
38
+ **Arguments**:
39
+
40
+ - `X`: np.ndarray, 2D array. Rows are examples, columns are features.
41
+ - `y`: np.ndarray, 1D array. Rows are examples.
42
+ - `threads`: int, Number of threads (=number of populations running).
43
+ You can have more threads than cores - it actually makes it more
44
+ efficient.
45
+ - `niterations`: int, Number of iterations of the algorithm to run. The best
46
+ equations are printed, and migrate between populations, at the
47
+ end of each.
48
+ - `ncyclesperiteration`: int, Number of total mutations to run, per 10
49
+ samples of the population, per iteration.
50
+ - `binary_operators`: list, List of strings giving the binary operators
51
+ in Julia's Base, or in `operator.jl`.
52
+ - `unary_operators`: list, Same but for operators taking a single `Float32`.
53
+ - `alpha`: float, Initial temperature.
54
+ - `annealing`: bool, Whether to use annealing. You should (and it is default).
55
+ - `fractionReplaced`: float, How much of population to replace with migrating
56
+ equations from other populations.
57
+ - `fractionReplacedHof`: float, How much of population to replace with migrating
58
+ equations from hall of fame.
59
+ - `npop`: int, Number of individuals in each population
60
+ - `parsimony`: float, Multiplicative factor for how much to punish complexity.
61
+ - `migration`: bool, Whether to migrate.
62
+ - `hofMigration`: bool, Whether to have the hall of fame migrate.
63
+ - `shouldOptimizeConstants`: bool, Whether to numerically optimize
64
+ constants (Nelder-Mead/Newton) at the end of each iteration.
65
+ - `topn`: int, How many top individuals migrate from each population.
66
+ - `weightAddNode`: float, Relative likelihood for mutation to add a node
67
+ - `weightDeleteNode`: float, Relative likelihood for mutation to delete a node
68
+ - `weightDoNothing`: float, Relative likelihood for mutation to leave the individual
69
+ - `weightMutateConstant`: float, Relative likelihood for mutation to change
70
+ the constant slightly in a random direction.
71
+ - `weightMutateOperator`: float, Relative likelihood for mutation to swap
72
+ an operator.
73
+ - `weightRandomize`: float, Relative likelihood for mutation to completely
74
+ delete and then randomly generate the equation
75
+ - `weightSimplify`: float, Relative likelihood for mutation to simplify
76
+ constant parts by evaluation
77
+ - `timeout`: float, Time in seconds to timeout search
78
+ - `equation_file`: str, Where to save the files (.csv separated by |)
79
+ - `test`: str, What test to run, if X,y not passed.
80
+ - `maxsize`: int, Max size of an equation.
81
+
82
+ **Returns**:
83
+
84
+ pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
85
+ (as strings).
86
+
87
 
88
  # TODO
89
 
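The README text added above says the results are dumped to `hall_of_fame.csv`, separated by `|`, and that they give complexity, MSE, and the equation. A minimal sketch of reading such a file with only the Python standard library might look like the following. The column names (`Complexity`, `MSE`, `Equation`), the helper `read_hall_of_fame`, and the sample rows are assumptions made for illustration; they are not taken from this commit.

```python
import csv
import io

# Hypothetical file contents; the real column names and values may differ.
SAMPLE = """Complexity|MSE|Equation
1|2.5|x0
3|0.8|plus(x0, x1)
5|0.1|plus(mult(x0, x0), x1)
"""


def read_hall_of_fame(handle):
    """Parse a '|'-separated hall-of-fame file into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="|")
    return [
        {
            "complexity": int(row["Complexity"]),
            "mse": float(row["MSE"]),
            "equation": row["Equation"],
        }
        for row in reader
    ]


rows = read_hall_of_fame(io.StringIO(SAMPLE))

# Pick the row with the lowest mean square error.
best = min(rows, key=lambda r: r["mse"])
print(best["equation"])
```

For a file on disk, the same helper could be called as `read_hall_of_fame(open("hall_of_fame.csv", newline=""))`, assuming the header row matches the names used here.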
eureqa.py CHANGED
@@ -56,78 +56,49 @@ def eureqa(X=None, y=None, threads=4,
56
  equations, but you should adjust `threads`, `niterations`,
57
  `binary_operators`, `unary_operators` to your requirements.
58
 
59
- :param X: 2D array. Rows are examples, columns are features.
60
- :type X: np.ndarray, optional
61
- :param y: 1D array. Rows are examples.
62
- :type y: np.ndarray, optional
63
- :param threads: Number of threads (=number of populations running).
64
  You can have more threads than cores - it actually makes it more
65
  efficient.
66
- :type threads: int, optional
67
- :param niterations: Number of iterations of the algorithm to run. The best
68
  equations are printed, and migrate between populations, at the
69
  end of each.
70
- :type niterations: int, optional
71
- :param ncyclesperiteration: Number of total mutations to run, per 10
72
  samples of the population, per iteration.
73
- :type ncyclesperiteration: int, optional
74
- :param binary_operators: List of strings giving the binary operators
75
  in Julia's Base, or in `operator.jl`.
76
- :type binary_operators: list, optional
77
- :param unary_operators: Same but for operators taking a single `Float32`.
78
- :type unary_operators: list, optional
79
- :param alpha: Initial temperature.
80
- :type alpha: float, optional
81
- :param annealing: Whether to use annealing. You should (and it is default).
82
- :type annealing: bool, optional
83
- :param fractionReplaced: How much of population to replace with migrating
84
  equations from other populations.
85
- :type fractionReplaced: float, optional
86
- :param fractionReplacedHof: How much of population to replace with migrating
87
  equations from hall of fame.
88
- :type fractionReplacedHof: float, optional
89
- :param npop: Number of individuals in each population
90
- :type npop: int, optional
91
- :param parsimony: Multiplicative factor for how much to punish complexity.
92
- :type parsimony: float, optional
93
- :param migration: Whether to migrate.
94
- :type migration: bool, optional
95
- :param hofMigration: Whether to have the hall of fame migrate.
96
- :type hofMigration: bool, optional
97
- :param shouldOptimizeConstants: Whether to numerically optimize
98
  constants (Nelder-Mead/Newton) at the end of each iteration.
99
- :type shouldOptimizeConstants: bool, optional
100
- :param topn: How many top individuals migrate from each population.
101
- :type topn: int, optional
102
- :param weightAddNode: Relative likelihood for mutation to add a node
103
- :type weightAddNode: float, optional
104
- :param weightDeleteNode: Relative likelihood for mutation to delete a node
105
- :type weightDeleteNode: float, optional
106
- :param weightDoNothing: Relative likelihood for mutation to leave the individual
107
- :type weightDoNothing: float, optional
108
- :param weightMutateConstant: Relative likelihood for mutation to change
109
  the constant slightly in a random direction.
110
- :type weightMutateConstant: float, optional
111
- :param weightMutateOperator: Relative likelihood for mutation to swap
112
  an operator.
113
- :type weightMutateOperator: float, optional
114
- :param weightRandomize: Relative likelihood for mutation to completely
115
  delete and then randomly generate the equation
116
- :type weightRandomize: float, optional
117
- :param weightSimplify: Relative likelihood for mutation to simplify
118
  constant parts by evaluation
119
- :type weightSimplify: float, optional
120
- :param timeout: Time in seconds to timeout search
121
- :type timeout: float, optional
122
- :param equation_file: Where to save the files (.csv separated by |)
123
- :type equation_file: str, optional
124
- :param test: What test to run, if X,y not passed.
125
- :type test: str, optional
126
- :param maxsize: Max size of an equation.
127
- :type maxsize: int, optional
128
- :returns: Results dataframe, giving complexity, MSE, and equations
129
  (as strings).
130
- :rtype: pd.DataFrame
131
 
132
  """
133
 
 
56
  equations, but you should adjust `threads`, `niterations`,
57
  `binary_operators`, `unary_operators` to your requirements.
58
 
59
+ :param X: np.ndarray, 2D array. Rows are examples, columns are features.
60
+ :param y: np.ndarray, 1D array. Rows are examples.
61
+ :param threads: int, Number of threads (=number of populations running).
 
 
62
  You can have more threads than cores - it actually makes it more
63
  efficient.
64
+ :param niterations: int, Number of iterations of the algorithm to run. The best
 
65
  equations are printed, and migrate between populations, at the
66
  end of each.
67
+ :param ncyclesperiteration: int, Number of total mutations to run, per 10
 
68
  samples of the population, per iteration.
69
+ :param binary_operators: list, List of strings giving the binary operators
 
70
  in Julia's Base, or in `operator.jl`.
71
+ :param unary_operators: list, Same but for operators taking a single `Float32`.
72
+ :param alpha: float, Initial temperature.
73
+ :param annealing: bool, Whether to use annealing. You should (and it is default).
74
+ :param fractionReplaced: float, How much of population to replace with migrating
75
  equations from other populations.
76
+ :param fractionReplacedHof: float, How much of population to replace with migrating
 
77
  equations from hall of fame.
78
+ :param npop: int, Number of individuals in each population
79
+ :param parsimony: float, Multiplicative factor for how much to punish complexity.
80
+ :param migration: bool, Whether to migrate.
81
+ :param hofMigration: bool, Whether to have the hall of fame migrate.
82
+ :param shouldOptimizeConstants: bool, Whether to numerically optimize
83
  constants (Nelder-Mead/Newton) at the end of each iteration.
84
+ :param topn: int, How many top individuals migrate from each population.
85
+ :param weightAddNode: float, Relative likelihood for mutation to add a node
86
+ :param weightDeleteNode: float, Relative likelihood for mutation to delete a node
87
+ :param weightDoNothing: float, Relative likelihood for mutation to leave the individual
88
+ :param weightMutateConstant: float, Relative likelihood for mutation to change
89
  the constant slightly in a random direction.
90
+ :param weightMutateOperator: float, Relative likelihood for mutation to swap
 
91
  an operator.
92
+ :param weightRandomize: float, Relative likelihood for mutation to completely
 
93
  delete and then randomly generate the equation
94
+ :param weightSimplify: float, Relative likelihood for mutation to simplify
 
95
  constant parts by evaluation
96
+ :param timeout: float, Time in seconds to timeout search
97
+ :param equation_file: str, Where to save the files (.csv separated by |)
98
+ :param test: str, What test to run, if X,y not passed.
99
+ :param maxsize: int, Max size of an equation.
100
+ :returns: pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
101
  (as strings).
 
102
 
103
  """
104
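The docstring describes each `weight...` parameter as a relative likelihood for one kind of mutation. One common reading of "relative likelihood" is that the weights are normalized to probabilities and a mutation is sampled accordingly. The sketch below illustrates that reading only: the numeric weights, the dictionary keys, and both helper functions are hypothetical, and the diff does not show how the Julia backend actually samples mutations.

```python
import random

# Hypothetical weights; the actual defaults are not shown in this diff.
weights = {
    "mutate_constant": 10.0,
    "mutate_operator": 1.0,
    "add_node": 1.0,
    "delete_node": 3.0,
    "simplify": 0.1,
    "randomize": 2.0,
    "do_nothing": 1.0,
}


def to_probabilities(relative_weights):
    """Normalize relative weights so that they sum to 1."""
    total = sum(relative_weights.values())
    return {name: w / total for name, w in relative_weights.items()}


def choose_mutation(relative_weights, rng):
    """Sample one mutation name in proportion to its relative weight."""
    names = list(relative_weights)
    return rng.choices(names, weights=[relative_weights[n] for n in names], k=1)[0]


probabilities = to_probabilities(weights)
rng = random.Random(0)  # seeded so repeated runs pick the same mutation
picked = choose_mutation(weights, rng)
print(probabilities["mutate_constant"], picked)
```

Under this reading only the ratios between weights matter, so doubling every weight leaves the sampling distribution unchanged.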