MilesCranmer committed on
Commit 6f11ae4 · 1 Parent(s): a4eb420

Update docstring on README

Files changed (1)
  1. README.md +23 -4
README.md CHANGED
@@ -196,17 +196,24 @@ which is `hall_of_fame.csv` by default. It also prints the
  equations to stdout.
 
  ```python
- pysr(X=None, y=None, weights=None, procs=4, niterations=100, ncyclesperiteration=300, binary_operators=["plus", "mult"], unary_operators=["cos", "exp", "sin"], alpha=0.1, annealing=True, fractionReplaced=0.10, fractionReplacedHof=0.10, npop=1000, parsimony=1e-4, migration=True, hofMigration=True, shouldOptimizeConstants=True, topn=10, weightAddNode=1, weightInsertNode=3, weightDeleteNode=3, weightDoNothing=1, weightMutateConstant=10, weightMutateOperator=1, weightRandomize=1, weightSimplify=0.01, perturbationFactor=1.0, nrestarts=3, timeout=None, equation_file='hall_of_fame.csv', test='simple1', verbosity=1e9, maxsize=20)
+ pysr(X=None, y=None, weights=None, procs=4, populations=None, niterations=100, ncyclesperiteration=300, binary_operators=["plus", "mult"], unary_operators=["cos", "exp", "sin"], alpha=0.1, annealing=True, fractionReplaced=0.10, fractionReplacedHof=0.10, npop=1000, parsimony=1e-4, migration=True, hofMigration=True, shouldOptimizeConstants=True, topn=10, weightAddNode=1, weightInsertNode=3, weightDeleteNode=3, weightDoNothing=1, weightMutateConstant=10, weightMutateOperator=1, weightRandomize=1, weightSimplify=0.01, perturbationFactor=1.0, nrestarts=3, timeout=None, extra_sympy_mappings={}, equation_file='hall_of_fame.csv', test='simple1', verbosity=1e9, maxsize=20, fast_cycle=False, maxdepth=None, variable_names=[], select_k_features=None, threads=None, julia_optimization=3)
  ```
 
  Run symbolic regression to fit f(X[i, :]) ~ y[i] for all i.
+ Note: most default parameters have been tuned over several example
+ equations, but you should adjust `threads`, `niterations`,
+ `binary_operators`, `unary_operators` to your requirements.
 
  **Arguments**:
 
- - `X`: np.ndarray, 2D array. Rows are examples, columns are features.
+ - `X`: np.ndarray or pandas.DataFrame, 2D array. Rows are examples,
+ columns are features. If pandas DataFrame, the columns are used
+ for variable names (so make sure they don't contain spaces).
  - `y`: np.ndarray, 1D array. Rows are examples.
- - `weights`: np.ndarray, 1D array. Same shape as `y`. Optional weighted sum (e.g., 1/error^2).
- - `procs`: int, Number of processes running (=number of populations running).
+ - `weights`: np.ndarray, 1D array. Each row is how to weight the
+ mean-square-error loss on weights.
+ - `procs`: int, Number of processes (=number of populations running).
+ - `populations`: int, Number of populations running; by default=procs.
  - `niterations`: int, Number of iterations of the algorithm to run. The best
  equations are printed, and migrate between populations, at the
  end of each.
@@ -248,6 +255,18 @@ constant parts by evaluation
  - `equation_file`: str, Where to save the files (.csv separated by |)
  - `test`: str, What test to run, if X,y not passed.
  - `maxsize`: int, Max size of an equation.
+ - `maxdepth`: int, Max depth of an equation. You can use both maxsize and maxdepth.
+ maxdepth is by default set to = maxsize, which means that it is redundant.
+ - `fast_cycle`: bool, (experimental) - batch over population subsamples. This
+ is a slightly different algorithm than regularized evolution, but does cycles
+ 15% faster. May be algorithmically less efficient.
+ - `variable_names`: list, a list of names for the variables, other
+ than "x0", "x1", etc.
+ - `select_k_features`: (None, int), whether to run feature selection in
+ Python using random forests, before passing to the symbolic regression
+ code. None means no feature selection; an int means select that many
+ features.
+ - `julia_optimization`: int, Optimization level (0, 1, 2, 3)
 
  **Returns**:
 
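The new `X` entry says DataFrame column names become variable names and should not contain spaces. A minimal sketch of one way to clean names before passing them in is shown here. The helper `sanitize_variable_names` is hypothetical and not part of PySR; it only prepares names and does not call `pysr` itself.

```python
import re


def sanitize_variable_names(names):
    """Hypothetical helper (not part of PySR): return space-free names.

    Runs of whitespace become a single underscore, and repeated names
    get a numeric suffix. This is simple disambiguation only; it does not
    check whether a suffixed name collides with another input name.
    """
    cleaned = []
    seen = {}
    for name in names:
        # Trim the ends, then collapse each whitespace run into "_".
        candidate = re.sub(r"\s+", "_", str(name).strip())
        # Use a placeholder if nothing is left after trimming.
        if not candidate:
            candidate = "x"
        # Append a counter to the second and later occurrences.
        count = seen.get(candidate, 0)
        seen[candidate] = count + 1
        cleaned.append(candidate if count == 0 else f"{candidate}_{count}")
    return cleaned


print(sanitize_variable_names(["flow rate", "temp", "flow  rate"]))
```

Assuming a pandas DataFrame `df`, the result could be assigned back with `df.columns = sanitize_variable_names(df.columns)` before passing `df` as `X`, or the list could be supplied through the `variable_names` argument when `X` is a plain array.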