MilesCranmer committed on
Commit
0dbee97
·
unverified ·
2 Parent(s): 3ef2b32 35e6ab1

Merge branch 'master' into latex-table

Files changed (9)
  1. .gitignore +1 -0
  2. Dockerfile +7 -0
  3. README.md +29 -3
  4. docs/options.md +18 -0
  5. pysr/sr.py +230 -38
  6. pysr/version.py +2 -2
  7. test/test.py +106 -6
  8. test/test_jax.py +10 -10
  9. test/test_torch.py +9 -9
.gitignore CHANGED
@@ -3,6 +3,7 @@
3
  *.csv
4
  *.csv.out*
5
  *.bkup
 
6
  performance*txt
7
  *.out
8
  trials*
 
3
  *.csv
4
  *.csv.out*
5
  *.bkup
6
+ *.pkl
7
  performance*txt
8
  *.out
9
  trials*
Dockerfile CHANGED
@@ -3,9 +3,16 @@
3
 
4
  ARG ARCH=linux/amd64
5
  ARG VERSION=latest
 
6
 
7
  FROM --platform=$ARCH julia:$VERSION
8
 
9
  # Need to use ARG after FROM, otherwise it won't get passed through.
10
  ARG PYVERSION=3.9.10
11
 
 
3
 
4
  ARG ARCH=linux/amd64
5
  ARG VERSION=latest
6
+ ARG PKGVERSION=0.9.5
7
 
8
  FROM --platform=$ARCH julia:$VERSION
9
 
10
+ # metainformation
11
+ LABEL org.opencontainers.image.version = $PKGVERSION
12
+ LABEL org.opencontainers.image.authors = "Miles Cranmer"
13
+ LABEL org.opencontainers.image.source = "https://github.com/MilesCranmer/PySR"
14
+ LABEL org.opencontainers.image.licenses = "Apache License 2.0"
15
+
16
  # Need to use ARG after FROM, otherwise it won't get passed through.
17
  ARG PYVERSION=3.9.10
18
 
README.md CHANGED
@@ -1,15 +1,24 @@
1
  [//]: # (Logo:)
2
 
3
- <img src="https://raw.githubusercontent.com/MilesCranmer/PySR/master/docs/images/pysr_logo.svg" width="400" />
4
 
 
 
5
  # PySR: High-Performance Symbolic Regression in Python
6
 
 
 
 
7
  PySR is built on an extremely optimized pure-Julia backend, and uses regularized evolution, simulated annealing, and gradient-free optimization to search for equations that fit your data.
8
 
 
 
9
  | **Docs** | **colab** | **pip** | **conda** | **Stats** |
10
  |---|---|---|---|---|
11
  |[![Documentation](https://github.com/MilesCranmer/PySR/actions/workflows/docs.yml/badge.svg)](https://astroautomata.com/PySR/)|[![Colab](https://img.shields.io/badge/colab-notebook-yellow)](https://colab.research.google.com/github/MilesCranmer/PySR/blob/master/examples/pysr_demo.ipynb)|[![PyPI version](https://badge.fury.io/py/pysr.svg)](https://badge.fury.io/py/pysr)|[![Conda Version](https://img.shields.io/conda/vn/conda-forge/pysr.svg)](https://anaconda.org/conda-forge/pysr)|[![Downloads](https://pepy.tech/badge/pysr)](https://badge.fury.io/py/pysr)|
12
 
 
 
13
 
14
  (pronounced like *py* as in python, and then *sur* as in surface)
15
 
@@ -17,7 +26,10 @@ If you find PySR useful, please cite it using the citation information given in
17
  If you've finished a project with PySR, please submit a PR to showcase your work on the [Research Showcase page](https://astroautomata.com/PySR/#/papers)!
18
 
19
 
20
- ### Test status:
 
 
 
21
  | **Linux** | **Windows** | **macOS (intel)** |
22
  |---|---|---|
23
  |[![Linux](https://github.com/MilesCranmer/PySR/actions/workflows/CI.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI.yml)|[![Windows](https://github.com/MilesCranmer/PySR/actions/workflows/CI_Windows.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_Windows.yml)|[![macOS](https://github.com/MilesCranmer/PySR/actions/workflows/CI_mac.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_mac.yml)|
@@ -25,6 +37,8 @@ If you've finished a project with PySR, please submit a PR to showcase your work
25
  |[![Docker](https://github.com/MilesCranmer/PySR/actions/workflows/CI_docker.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_docker.yml)|[![conda-forge](https://github.com/MilesCranmer/PySR/actions/workflows/CI_conda_forge.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_conda_forge.yml)|[![Coverage Status](https://coveralls.io/repos/github/MilesCranmer/PySR/badge.svg?branch=master&service=github)](https://coveralls.io/github/MilesCranmer/PySR)|
26
 
27
 
 
 
28
  Check out [SymbolicRegression.jl](https://github.com/MilesCranmer/SymbolicRegression.jl) for
29
  the pure-Julia backend of this package.
30
 
@@ -58,10 +72,14 @@ python interface.
58
 
59
  # Installation
60
 
 
 
61
  | pip (macOS, Linux, Windows) | conda (macOS - only Intel, Linux) |
62
  |---|---|
63
  | 1. Install Julia manually (see [downloads](https://julialang.org/downloads/))<br>2. `pip install pysr`<br>3. `python -c 'import pysr; pysr.install()'` | 1. `conda install -c conda-forge pysr`<br>2. `python -c 'import pysr; pysr.install()'`|
64
 
 
 
65
  This last step will install and update the required Julia packages, including
66
  `PyCall.jl`.
67
 
@@ -144,7 +162,15 @@ This arrow in the `pick` column indicates which equation is currently selected b
144
  SymPy format (`sympy_format` - which you can also get with `model.sympy()`), and even JAX and PyTorch format
145
  (both of which are differentiable - which you can get with `model.jax()` and `model.pytorch()`).
146
 
147
- Note that `PySRRegressor` stores the state of the last search, and will restart from where you left off the next time you call `.fit()`. This will cause problems if significant changes are made to the search parameters (like changing the operators). You can run `model.reset()` to reset the state.
148
 
149
  There are several other useful features such as denoising (e.g., `denoising=True`),
150
  feature selection (e.g., `select_k_features=3`).
 
1
  [//]: # (Logo:)
2
 
3
+ <div align="center">
4
 
5
+ <img src="https://raw.githubusercontent.com/MilesCranmer/PySR/master/docs/images/pysr_logo.svg" width="200" />
6
+
7
  # PySR: High-Performance Symbolic Regression in Python
8
 
9
+ </div>
10
+
11
+
12
  PySR is built on an extremely optimized pure-Julia backend, and uses regularized evolution, simulated annealing, and gradient-free optimization to search for equations that fit your data.
13
 
14
+ <div align="center">
15
+
16
  | **Docs** | **colab** | **pip** | **conda** | **Stats** |
17
  |---|---|---|---|---|
18
  |[![Documentation](https://github.com/MilesCranmer/PySR/actions/workflows/docs.yml/badge.svg)](https://astroautomata.com/PySR/)|[![Colab](https://img.shields.io/badge/colab-notebook-yellow)](https://colab.research.google.com/github/MilesCranmer/PySR/blob/master/examples/pysr_demo.ipynb)|[![PyPI version](https://badge.fury.io/py/pysr.svg)](https://badge.fury.io/py/pysr)|[![Conda Version](https://img.shields.io/conda/vn/conda-forge/pysr.svg)](https://anaconda.org/conda-forge/pysr)|[![Downloads](https://pepy.tech/badge/pysr)](https://badge.fury.io/py/pysr)|
19
 
20
+ </div>
21
+
22
 
23
  (pronounced like *py* as in python, and then *sur* as in surface)
24
 
 
26
  If you've finished a project with PySR, please submit a PR to showcase your work on the [Research Showcase page](https://astroautomata.com/PySR/#/papers)!
27
 
28
 
29
+ <div align="center">
30
+
31
+ ### Test status
32
+
33
  | **Linux** | **Windows** | **macOS (intel)** |
34
  |---|---|---|
35
  |[![Linux](https://github.com/MilesCranmer/PySR/actions/workflows/CI.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI.yml)|[![Windows](https://github.com/MilesCranmer/PySR/actions/workflows/CI_Windows.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_Windows.yml)|[![macOS](https://github.com/MilesCranmer/PySR/actions/workflows/CI_mac.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_mac.yml)|
 
37
  |[![Docker](https://github.com/MilesCranmer/PySR/actions/workflows/CI_docker.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_docker.yml)|[![conda-forge](https://github.com/MilesCranmer/PySR/actions/workflows/CI_conda_forge.yml/badge.svg)](https://github.com/MilesCranmer/PySR/actions/workflows/CI_conda_forge.yml)|[![Coverage Status](https://coveralls.io/repos/github/MilesCranmer/PySR/badge.svg?branch=master&service=github)](https://coveralls.io/github/MilesCranmer/PySR)|
38
 
39
 
40
+ </div>
41
+
42
  Check out [SymbolicRegression.jl](https://github.com/MilesCranmer/SymbolicRegression.jl) for
43
  the pure-Julia backend of this package.
44
 
 
72
 
73
  # Installation
74
 
75
+ <div align="center">
76
+
77
  | pip (macOS, Linux, Windows) | conda (macOS - only Intel, Linux) |
78
  |---|---|
79
  | 1. Install Julia manually (see [downloads](https://julialang.org/downloads/))<br>2. `pip install pysr`<br>3. `python -c 'import pysr; pysr.install()'` | 1. `conda install -c conda-forge pysr`<br>2. `python -c 'import pysr; pysr.install()'`|
80
 
81
+ </div>
82
+
83
  This last step will install and update the required Julia packages, including
84
  `PyCall.jl`.
85
 
 
162
  SymPy format (`sympy_format` - which you can also get with `model.sympy()`), and even JAX and PyTorch format
163
  (both of which are differentiable - which you can get with `model.jax()` and `model.pytorch()`).
164
 
165
+ Note that `PySRRegressor` stores the state of the last search, and will restart from where you left off the next time you call `.fit()`, assuming you have set `warm_start=True`.
166
+ This will cause problems if significant changes are made to the search parameters (like changing the operators). You can run `model.reset()` to reset the state.
167
+
168
+ You will notice that PySR will save two files: `hall_of_fame...csv` and `hall_of_fame...pkl`.
169
+ The csv file is a list of equations and their losses, and the pkl file is a saved state of the model.
170
+ You may load the model from the `pkl` file with:
171
+ ```python
172
+ model = PySRRegressor.from_file("hall_of_fame.2022-08-10_100832.281.pkl")
173
+ ```
174
 
175
  There are several other useful features such as denoising (e.g., `denoising=True`),
176
  feature selection (e.g., `select_k_features=3`).
docs/options.md CHANGED
@@ -16,6 +16,7 @@ may find useful include:
16
  - LaTeX, SymPy
17
  - Callable exports: numpy, pytorch, jax
18
  - `loss`
 
19
 
20
  These are described below
21
 
@@ -252,3 +253,20 @@ Can also uses these losses for weighted (weighted-average):
252
  model = PySRRegressor(..., weights=weights, loss="LPDistLoss{3}()")
253
  model.fit(..., weights=weights)
254
  ```
 
16
  - LaTeX, SymPy
17
  - Callable exports: numpy, pytorch, jax
18
  - `loss`
19
+ - Model loading
20
 
21
  These are described below
22
 
 
253
  model = PySRRegressor(..., weights=weights, loss="LPDistLoss{3}()")
254
  model.fit(..., weights=weights)
255
  ```
256
+
257
+ ## Model loading
258
+
259
+ PySR will automatically save a pickle file of the model state
260
+ when you call `model.fit`, once before the search starts,
261
+ and again after the search finishes. The filename will
262
+ have the same base name as the equation file (`equation_file`), but with a `.pkl` extension.
263
+ You can load the saved model state with:
264
+ ```python
265
+ model = PySRRegressor.from_file(pickle_filename)
266
+ ```
267
+ If you have a long-running job and would like to load the model
268
+ before completion, you can also do this. In this case, the model
269
+ loading will use the `csv` file to load the equations, since the
270
+ `csv` file is continually updated during the search. Once
271
+ the search completes, the model including its equations will
272
+ be saved to the pickle file, overwriting the existing version.
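The fallback described above (use the pickle for parameters, and read equations from the `csv` file when the search has not finished) can be sketched with the standard library alone. This is only an illustration under stated assumptions: `load_checkpoint` is a hypothetical helper, and the pickle is assumed to hold a plain dict with an `"equations"` key rather than a real `PySRRegressor`. It is not the actual `from_file` implementation.

```python
import csv
import pickle


def load_checkpoint(pkl_path, csv_path):
    """Hypothetical helper: return the equations recorded by a checkpoint.

    Assumes the pickle holds a plain dict with an "equations" key, which
    is a simplification of the real model object.
    """
    with open(pkl_path, "rb") as f:
        state = pickle.load(f)
    equations = state.get("equations")
    if equations is None:
        # The search has not finished yet, so the pickle only holds the
        # parameters. Fall back to the csv file, which is updated
        # continually during the search.
        with open(csv_path, newline="") as f:
            equations = list(csv.DictReader(f))
    return equations
```

Once the search completes and the pickle is rewritten with its equations, the same call returns them directly and never opens the `csv` file.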
pysr/sr.py CHANGED
@@ -1,3 +1,4 @@
 
1
  import os
2
  import sys
3
  import numpy as np
@@ -8,6 +9,7 @@ import re
8
  import tempfile
9
  import shutil
10
  from pathlib import Path
 
11
  from datetime import datetime
12
  import warnings
13
  from multiprocessing import cpu_count
@@ -204,10 +206,18 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
204
  Parameters
205
  ----------
206
  model_selection : str, default="best"
207
- Model selection criterion. Can be 'accuracy' or 'best'.
208
- `"accuracy"` selects the candidate model with the lowest loss
209
- (highest accuracy). `"best"` selects the candidate model with
210
- the lowest sum of normalized loss and complexity.
211
 
212
  binary_operators : list[str], default=["+", "-", "*", "/"]
213
  List of strings giving the binary operators in Julia's Base.
@@ -468,7 +478,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
468
  Whether to use a progress bar instead of printing to stdout.
469
 
470
  equation_file : str, default=None
471
- Where to save the files (.csv separated by |).
472
 
473
  temp_equation_file : bool, default=False
474
  Whether to put the hall of fame file in the temp directory.
@@ -563,6 +573,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
563
  equation_file_contents_ : list[pandas.DataFrame]
564
  Contents of the equation file output by the Julia backend.
565
 
 
 
 
566
  Notes
567
  -----
568
  Most default parameters have been tuned over several example equations,
@@ -806,6 +819,119 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
806
  f"{k} is not a valid keyword argument for PySRRegressor."
807
  )
808
 
809
  def __repr__(self):
810
  """
811
  Prints all current equations fitted by the model.
@@ -826,12 +952,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
826
 
827
  for i, equations in enumerate(all_equations):
828
  selected = ["" for _ in range(len(equations))]
829
- if self.model_selection == "accuracy":
830
- chosen_row = -1
831
- elif self.model_selection == "best":
832
- chosen_row = equations["score"].idxmax()
833
- else:
834
- raise NotImplementedError
835
  selected[chosen_row] = ">>>>"
836
  repr_equations = pd.DataFrame(
837
  dict(
@@ -874,17 +995,31 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
874
  from the pickled instance.
875
  """
876
  state = self.__dict__
877
- if "raw_julia_state_" in state:
 
 
 
878
  warnings.warn(
879
  "raw_julia_state_ cannot be pickled and will be removed from the "
880
  "serialized instance. This will prevent a `warm_start` fit of any "
881
  "model that is deserialized via `pickle.load()`."
882
  )
883
  pickled_state = {
884
- key: None if key == "raw_julia_state_" else value
885
  for key, value in state.items()
886
  }
887
- if "equations_" in pickled_state:
 
 
888
  pickled_state["output_torch_format"] = False
889
  pickled_state["output_jax_format"] = False
890
  if self.nout_ == 1:
@@ -907,6 +1042,16 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
907
  ]
908
  return pickled_state
909
 
910
  @property
911
  def equations(self): # pragma: no cover
912
  warnings.warn(
@@ -950,18 +1095,14 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
950
  return [eq.iloc[i] for eq, i in zip(self.equations_, index)]
951
  return self.equations_.iloc[index]
952
 
953
- if self.model_selection == "accuracy":
954
- if isinstance(self.equations_, list):
955
- return [eq.iloc[-1] for eq in self.equations_]
956
- return self.equations_.iloc[-1]
957
- elif self.model_selection == "best":
958
- if isinstance(self.equations_, list):
959
- return [eq.iloc[eq["score"].idxmax()] for eq in self.equations_]
960
- return self.equations_.iloc[self.equations_["score"].idxmax()]
961
- else:
962
- raise NotImplementedError(
963
- f"{self.model_selection} is not a valid model selection strategy."
964
- )
965
 
966
  def _setup_equation_file(self):
967
  """
@@ -1607,8 +1748,20 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
1607
  y,
1608
  )
1609
 
1610
- # Fitting procedure
1611
- return self._run(X, y, mutated_params, weights=weights, seed=seed)
1612
 
1613
  def refresh(self, checkpoint_file=None):
1614
  """
@@ -1620,10 +1773,10 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
1620
  checkpoint_file : str, default=None
1621
  Path to checkpoint hall of fame file to be loaded.
1622
  """
1623
- check_is_fitted(self, attributes=["equation_file_"])
1624
  if checkpoint_file:
1625
  self.equation_file_ = checkpoint_file
1626
  self.equation_file_contents_ = None
 
1627
  self.equations_ = self.get_hof()
1628
 
1629
  def predict(self, X, index=None):
@@ -1695,7 +1848,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
1695
  raise ValueError(
1696
  "Failed to evaluate the expression. "
1697
  "If you are using a custom operator, make sure to define it in :param`extra_sympy_mappings`, "
1698
- "e.g., `model.set_params(extra_sympy_mappings={'inv': lambda x: 1 / x})`."
 
1699
  ) from error
1700
 
1701
  def sympy(self, index=None):
@@ -1819,15 +1973,15 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
1819
  if self.nout_ > 1:
1820
  all_outputs = []
1821
  for i in range(1, self.nout_ + 1):
1822
- df = pd.read_csv(
1823
- str(self.equation_file_) + f".out{i}" + ".bkup",
1824
- sep="|",
1825
- )
1826
  # Rename Complexity column to complexity:
1827
  df.rename(
1828
  columns={
1829
  "Complexity": "complexity",
1830
- "MSE": "loss",
1831
  "Equation": "equation",
1832
  },
1833
  inplace=True,
@@ -1835,11 +1989,14 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
1835
 
1836
  all_outputs.append(df)
1837
  else:
1838
- all_outputs = [pd.read_csv(str(self.equation_file_) + ".bkup", sep="|")]
 
 
 
1839
  all_outputs[-1].rename(
1840
  columns={
1841
  "Complexity": "complexity",
1842
- "MSE": "loss",
1843
  "Equation": "equation",
1844
  },
1845
  inplace=True,
@@ -1893,7 +2050,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
1893
 
1894
  ret_outputs = []
1895
 
1896
- for output in self.equation_file_contents_:
 
 
1897
 
1898
  scores = []
1899
  lastMSE = None
@@ -2043,6 +2202,26 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
2043
  )
2044
 
2045
 
2046
  def _denoise(X, y, Xresampled=None, random_state=None):
2047
  """Denoise the dataset using a Gaussian process"""
2048
  from sklearn.gaussian_process import GaussianProcessRegressor
@@ -2088,3 +2267,16 @@ def run_feature_selection(X, y, select_k_features, random_state=None):
2088
  clf, threshold=-np.inf, max_features=select_k_features, prefit=True
2089
  )
2090
  return selector.get_support(indices=True)
 
1
+ import copy
2
  import os
3
  import sys
4
  import numpy as np
 
9
  import tempfile
10
  import shutil
11
  from pathlib import Path
12
+ import pickle as pkl
13
  from datetime import datetime
14
  import warnings
15
  from multiprocessing import cpu_count
 
206
  Parameters
207
  ----------
208
  model_selection : str, default="best"
209
+ Model selection criterion when selecting a final expression from
210
+ the list of best expressions at each complexity.
211
+ Can be 'accuracy', 'best', or 'score'.
212
+ - `"accuracy"` selects the candidate model with the lowest loss
213
+ (highest accuracy).
214
+ - `"score"` selects the candidate model with the highest score.
215
+ Score is defined as the negated derivative of the log-loss with
216
+ respect to complexity - if an expression has a much better
217
+ loss at a slightly higher complexity, it is preferred.
218
+ - `"best"` selects the candidate model with the highest score
219
+ among expressions whose loss is at most 1.5x that of the
220
+ most accurate model.
221
 
222
  binary_operators : list[str], default=["+", "-", "*", "/"]
223
  List of strings giving the binary operators in Julia's Base.
 
478
  Whether to use a progress bar instead of printing to stdout.
479
 
480
  equation_file : str, default=None
481
+ Where to save the files (.csv extension).
482
 
483
  temp_equation_file : bool, default=False
484
  Whether to put the hall of fame file in the temp directory.
 
573
  equation_file_contents_ : list[pandas.DataFrame]
574
  Contents of the equation file output by the Julia backend.
575
 
576
+ show_pickle_warnings_ : bool
577
+ Whether to show warnings about what attributes can be pickled.
578
+
579
  Notes
580
  -----
581
  Most default parameters have been tuned over several example equations,
 
819
  f"{k} is not a valid keyword argument for PySRRegressor."
820
  )
821
 
822
+ @classmethod
823
+ def from_file(
824
+ cls,
825
+ equation_file,
826
+ *,
827
+ binary_operators=None,
828
+ unary_operators=None,
829
+ n_features_in=None,
830
+ feature_names_in=None,
831
+ selection_mask=None,
832
+ nout=1,
833
+ **pysr_kwargs,
834
+ ):
835
+ """
836
+ Create a model from a saved model checkpoint or equation file.
837
+
838
+ Parameters
839
+ ----------
840
+ equation_file : str
841
+ Path to a pickle file containing a saved model, or a csv file
842
+ containing equations.
843
+
844
+ binary_operators : list[str]
845
+ The same binary operators used when creating the model.
846
+ Not needed if loading from a pickle file.
847
+
848
+ unary_operators : list[str]
849
+ The same unary operators used when creating the model.
850
+ Not needed if loading from a pickle file.
851
+
852
+ n_features_in : int
853
+ Number of features passed to the model.
854
+ Not needed if loading from a pickle file.
855
+
856
+ feature_names_in : list[str]
857
+ Names of the features passed to the model.
858
+ Not needed if loading from a pickle file.
859
+
860
+ selection_mask : list[bool]
861
+ If using select_k_features, you must pass `model.selection_mask_` here.
862
+ Not needed if loading from a pickle file.
863
+
864
+ nout : int, default=1
865
+ Number of outputs of the model.
866
+ Not needed if loading from a pickle file.
867
+
868
+ pysr_kwargs : dict
869
+ Any other keyword arguments to initialize the PySRRegressor object.
870
+ These will overwrite those stored in the pickle file.
871
+ Not needed if loading from a pickle file.
872
+
873
+ Returns
874
+ -------
875
+ model : PySRRegressor
876
+ The model with fitted equations.
877
+ """
878
+ if os.path.splitext(equation_file)[1] != ".pkl":
879
+ pkl_filename = _csv_filename_to_pkl_filename(equation_file)
880
+ else:
881
+ pkl_filename = equation_file
882
+
883
+ # Try to load model from <equation_file>.pkl
884
+ print(f"Checking if {pkl_filename} exists...")
885
+ if os.path.exists(pkl_filename):
886
+ print(f"Loading model from {pkl_filename}")
887
+ assert binary_operators is None
888
+ assert unary_operators is None
889
+ assert n_features_in is None
890
+ with open(pkl_filename, "rb") as f:
891
+ model = pkl.load(f)
892
+ # Update any parameters if necessary, such as
893
+ # extra_sympy_mappings:
894
+ model.set_params(**pysr_kwargs)
895
+ if "equations_" not in model.__dict__ or model.equations_ is None:
896
+ model.refresh()
897
+
898
+ return model
899
+
900
+ # Else, we re-create it.
901
+ print(
902
+ f"{pkl_filename} does not exist, "
903
+ "so we must create the model from scratch."
904
+ )
905
+ assert binary_operators is not None
906
+ assert unary_operators is not None
907
+ assert n_features_in is not None
908
+
909
+ # TODO: copy .bkup file if exists.
910
+ model = cls(
911
+ equation_file=equation_file,
912
+ binary_operators=binary_operators,
913
+ unary_operators=unary_operators,
914
+ **pysr_kwargs,
915
+ )
916
+
917
+ model.nout_ = nout
918
+ model.n_features_in_ = n_features_in
919
+
920
+ if feature_names_in is None:
921
+ model.feature_names_in_ = [f"x{i}" for i in range(n_features_in)]
922
+ else:
923
+ assert len(feature_names_in) == n_features_in
924
+ model.feature_names_in_ = feature_names_in
925
+
926
+ if selection_mask is None:
927
+ model.selection_mask_ = np.ones(n_features_in, dtype=bool)
928
+ else:
929
+ model.selection_mask_ = selection_mask
930
+
931
+ model.refresh(checkpoint_file=equation_file)
932
+
933
+ return model
934
+
935
  def __repr__(self):
936
  """
937
  Prints all current equations fitted by the model.
 
952
 
953
  for i, equations in enumerate(all_equations):
954
  selected = ["" for _ in range(len(equations))]
955
+ chosen_row = idx_model_selection(equations, self.model_selection)
956
  selected[chosen_row] = ">>>>"
957
  repr_equations = pd.DataFrame(
958
  dict(
 
995
  from the pickled instance.
996
  """
997
  state = self.__dict__
998
+ show_pickle_warning = not (
999
+ "show_pickle_warnings_" in state and not state["show_pickle_warnings_"]
1000
+ )
1001
+ if "raw_julia_state_" in state and show_pickle_warning:
1002
  warnings.warn(
1003
  "raw_julia_state_ cannot be pickled and will be removed from the "
1004
  "serialized instance. This will prevent a `warm_start` fit of any "
1005
  "model that is deserialized via `pickle.load()`."
1006
  )
1007
+ state_keys_containing_lambdas = ["extra_sympy_mappings", "extra_torch_mappings"]
1008
+ for state_key in state_keys_containing_lambdas:
1009
+ if state[state_key] is not None and show_pickle_warning:
1010
+ warnings.warn(
1011
+ f"`{state_key}` cannot be pickled and will be removed from the "
1012
+ "serialized instance. When loading the model, please redefine "
1013
+ f"`{state_key}` at runtime."
1014
+ )
1015
+ state_keys_to_clear = ["raw_julia_state_"] + state_keys_containing_lambdas
1016
  pickled_state = {
1017
+ key: (None if key in state_keys_to_clear else value)
1018
  for key, value in state.items()
1019
  }
1020
+ if ("equations_" in pickled_state) and (
1021
+ pickled_state["equations_"] is not None
1022
+ ):
1023
  pickled_state["output_torch_format"] = False
1024
  pickled_state["output_jax_format"] = False
1025
  if self.nout_ == 1:
 
1042
  ]
1043
  return pickled_state
1044
 
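The pickling logic above follows a common pattern: `__getstate__` returns a copy of the instance state with unpicklable entries replaced by `None`, and warns unless a flag suppresses the warning. Below is a minimal sketch of that pattern on a toy class. `Checkpointable`, `raw_state_` and `extra_mappings` are illustrative names, not part of PySR.

```python
import warnings


class Checkpointable:
    """Toy stand-in for an estimator holding unpicklable attributes."""

    # Attributes that may hold lambdas or foreign state (illustrative names).
    _KEYS_TO_CLEAR = ("raw_state_", "extra_mappings")

    def __init__(self, extra_mappings=None):
        self.extra_mappings = extra_mappings
        self.raw_state_ = None
        self.show_pickle_warnings_ = True

    def __getstate__(self):
        state = self.__dict__
        if state.get("show_pickle_warnings_", True):
            for key in self._KEYS_TO_CLEAR:
                if state.get(key) is not None:
                    warnings.warn(
                        f"`{key}` cannot be pickled and will be removed "
                        "from the serialized instance."
                    )
        # Build a new dict so the live object keeps its attributes.
        return {
            key: (None if key in self._KEYS_TO_CLEAR else value)
            for key, value in state.items()
        }
```

Because the cleared copy is a new dict, the live instance still holds its lambdas after pickling. Only the serialized copy loses them, which is why they must be redefined after loading.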
1045
+ def _checkpoint(self):
1046
+ """Saves the model's current state to a checkpoint file.
1047
+
1048
+ This should only be used internally by PySRRegressor."""
1049
+ # Save model state:
1050
+ self.show_pickle_warnings_ = False
1051
+ with open(_csv_filename_to_pkl_filename(self.equation_file_), "wb") as f:
1052
+ pkl.dump(self, f)
1053
+ self.show_pickle_warnings_ = True
1054
+
1055
  @property
1056
  def equations(self): # pragma: no cover
1057
  warnings.warn(
 
1095
  return [eq.iloc[i] for eq, i in zip(self.equations_, index)]
1096
  return self.equations_.iloc[index]
1097
 
1098
+ if isinstance(self.equations_, list):
1099
+ return [
1100
+ eq.iloc[idx_model_selection(eq, self.model_selection)]
1101
+ for eq in self.equations_
1102
+ ]
1103
+ return self.equations_.iloc[
1104
+ idx_model_selection(self.equations_, self.model_selection)
1105
+ ]
1106
 
1107
  def _setup_equation_file(self):
1108
  """
 
1748
  y,
1749
  )
1750
 
1751
+ # Initially, just save model parameters, so that
1752
+ # it can be loaded from an early exit:
1753
+ if not self.temp_equation_file:
1754
+ self._checkpoint()
1755
+
1756
+ # Perform the search:
1757
+ self._run(X, y, mutated_params, weights=weights, seed=seed)
1758
+
1759
+ # Then, after fit, we save again, so the pickle file contains
1760
+ # the equations:
1761
+ if not self.temp_equation_file:
1762
+ self._checkpoint()
1763
+
1764
+ return self
1765
 
1766
  def refresh(self, checkpoint_file=None):
1767
  """
 
1773
  checkpoint_file : str, default=None
1774
  Path to checkpoint hall of fame file to be loaded.
1775
  """
 
1776
  if checkpoint_file:
1777
  self.equation_file_ = checkpoint_file
1778
  self.equation_file_contents_ = None
1779
+ check_is_fitted(self, attributes=["equation_file_"])
1780
  self.equations_ = self.get_hof()
1781
 
1782
  def predict(self, X, index=None):
 
1848
  raise ValueError(
1849
  "Failed to evaluate the expression. "
1850
  "If you are using a custom operator, make sure to define it in :param`extra_sympy_mappings`, "
1851
+ "e.g., `model.set_params(extra_sympy_mappings={'inv': lambda x: 1 / x})`. You can then "
1852
+ "run `model.refresh()` to re-load the expressions."
1853
  ) from error
1854
 
1855
  def sympy(self, index=None):
 
1973
  if self.nout_ > 1:
1974
  all_outputs = []
1975
  for i in range(1, self.nout_ + 1):
1976
+ cur_filename = str(self.equation_file_) + f".out{i}" + ".bkup"
1977
+ if not os.path.exists(cur_filename):
1978
+ cur_filename = str(self.equation_file_) + f".out{i}"
1979
+ df = pd.read_csv(cur_filename)
1980
  # Rename Complexity column to complexity:
1981
  df.rename(
1982
  columns={
1983
  "Complexity": "complexity",
1984
+ "Loss": "loss",
1985
  "Equation": "equation",
1986
  },
1987
  inplace=True,
 
1989
 
1990
  all_outputs.append(df)
1991
  else:
1992
+ filename = str(self.equation_file_) + ".bkup"
1993
+ if not os.path.exists(filename):
1994
+ filename = str(self.equation_file_)
1995
+ all_outputs = [pd.read_csv(filename)]
1996
  all_outputs[-1].rename(
1997
  columns={
1998
  "Complexity": "complexity",
1999
+ "Loss": "loss",
2000
  "Equation": "equation",
2001
  },
2002
  inplace=True,
 
2050
 
2051
  ret_outputs = []
2052
 
2053
+ equation_file_contents = copy.deepcopy(self.equation_file_contents_)
2054
+
2055
+ for output in equation_file_contents:
2056
 
2057
  scores = []
2058
  lastMSE = None
 
2202
  )
2203
 
2204
 
2205
+ def idx_model_selection(equations: pd.DataFrame, model_selection: str) -> int:
2206
+ """
2207
+ Return the index of the selected expression, given a dataframe of
2208
+ equations and a model selection.
2209
+ """
2210
+ if model_selection == "accuracy":
2211
+ chosen_idx = equations["loss"].idxmin()
2212
+ elif model_selection == "best":
2213
+ threshold = 1.5 * equations["loss"].min()
2214
+ filtered_equations = equations.query(f"loss <= {threshold}")
2215
+ chosen_idx = filtered_equations["score"].idxmax()
2216
+ elif model_selection == "score":
2217
+ chosen_idx = equations["score"].idxmax()
2218
+ else:
2219
+ raise NotImplementedError(
2220
+ f"{model_selection} is not a valid model selection strategy."
2221
+ )
2222
+ return chosen_idx
2223
+
2224
+
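To see how the three strategies can disagree, here is a dependency-free sketch of the same selection rule on plain lists. `select_index` is a hypothetical name and the numbers are made up for illustration. It mirrors the logic of `idx_model_selection` but is not the library function.

```python
def select_index(losses, scores, model_selection="best"):
    """Return the position of the selected expression (illustrative only)."""
    indices = range(len(losses))
    if model_selection == "accuracy":
        # Lowest loss wins.
        return min(indices, key=losses.__getitem__)
    if model_selection == "score":
        # Highest score wins, regardless of loss.
        return max(indices, key=scores.__getitem__)
    if model_selection == "best":
        # Highest score among expressions within 1.5x of the best loss.
        threshold = 1.5 * min(losses)
        candidates = [i for i in indices if losses[i] <= threshold]
        return max(candidates, key=scores.__getitem__)
    raise NotImplementedError(
        f"{model_selection} is not a valid model selection strategy."
    )


# Made-up hall of fame, ordered by increasing complexity.
losses = [10.0, 2.0, 1.0, 0.9]
scores = [0.0, 1.6, 0.7, 0.1]
```

With these values, `"accuracy"` picks index 3, `"score"` picks index 1, and `"best"` picks index 2: the loss filter excludes the expression at index 1, whose loss of 2.0 exceeds the threshold of 1.35.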
2225
  def _denoise(X, y, Xresampled=None, random_state=None):
2226
  """Denoise the dataset using a Gaussian process"""
2227
  from sklearn.gaussian_process import GaussianProcessRegressor
 
2267
  clf, threshold=-np.inf, max_features=select_k_features, prefit=True
2268
  )
2269
  return selector.get_support(indices=True)
2270
+
2271
+
2272
+ def _csv_filename_to_pkl_filename(csv_filename) -> str:
2273
+ # Assume that the csv filename is of the form "foo.csv"
2274
+ assert str(csv_filename).endswith(".csv")
2275
+
2276
+ dirname = str(os.path.dirname(csv_filename))
2277
+ basename = str(os.path.basename(csv_filename))
2278
+ base = str(os.path.splitext(basename)[0])
2279
+
2280
+ pkl_basename = base + ".pkl"
2281
+
2282
+ return os.path.join(dirname, pkl_basename)
pysr/version.py CHANGED
@@ -1,2 +1,2 @@
1
- __version__ = "0.9.5"
2
- __symbolic_regression_jl_version__ = "0.9.7"
 
1
+ __version__ = "0.10.0"
2
+ __symbolic_regression_jl_version__ = "0.10.0"
test/test.py CHANGED
@@ -5,7 +5,12 @@ import unittest
5
  import numpy as np
6
  from sklearn import model_selection
7
  from pysr import PySRRegressor
8
- from pysr.sr import run_feature_selection, _handle_feature_selection
9
  from pysr.export_latex import to_latex
10
  from sklearn.utils.estimator_checks import check_estimator
11
  import sympy
@@ -13,6 +18,7 @@ import pandas as pd
13
  import warnings
14
  import pickle as pkl
15
  import tempfile
 
16
 
17
  DEFAULT_PARAMS = inspect.signature(PySRRegressor.__init__).parameters
18
  DEFAULT_NITERATIONS = DEFAULT_PARAMS["niterations"].default
@@ -136,7 +142,7 @@ class TestPipeline(unittest.TestCase):
136
  # These tests are flaky, so don't fail test:
137
  try:
138
  np.testing.assert_almost_equal(
139
- model.predict(X.copy())[:, 0], X[:, 0] ** 2, decimal=4
140
  )
141
  except AssertionError:
142
  print("Error in test_multioutput_weighted_with_callable_temp_equation")
@@ -145,7 +151,7 @@ class TestPipeline(unittest.TestCase):
145
 
146
  try:
147
  np.testing.assert_almost_equal(
148
- model.predict(X.copy())[:, 1], X[:, 1] ** 2, decimal=4
149
  )
150
  except AssertionError:
151
  print("Error in test_multioutput_weighted_with_callable_temp_equation")
@@ -281,6 +287,72 @@ class TestPipeline(unittest.TestCase):
281
  model.fit(X.values, y.values, Xresampled=Xresampled.values)
282
  self.assertLess(np.average((model.predict(X.values) - y.values) ** 2), 1e-4)
283

284
 
285
  def manually_create_model(equations, feature_names=None):
286
  if feature_names is None:
@@ -304,7 +376,7 @@ def manually_create_model(equations, feature_names=None):
304
  model.feature_names_in_ = np.array(feature_names, dtype=object)
305
  for i in range(model.nout_):
306
  equations[i]["complexity loss equation".split(" ")].to_csv(
307
- f"equation_file.csv.out{i+1}.bkup", sep="|"
308
  )
309
  else:
310
  model.equation_file_ = "equation_file.csv"
@@ -312,7 +384,7 @@ def manually_create_model(equations, feature_names=None):
312
  model.selection_mask_ = None
313
  model.feature_names_in_ = np.array(feature_names, dtype=object)
314
  equations["complexity loss equation".split(" ")].to_csv(
315
- "equation_file.csv.bkup", sep="|"
316
  )
317
 
318
  model.refresh()
@@ -351,7 +423,21 @@ class TestBest(unittest.TestCase):
351
  X = self.X
352
  y = self.y
353
  for f in [self.model.predict, self.equations_.iloc[-1]["lambda_format"]]:
354
- np.testing.assert_almost_equal(f(X), y, decimal=4)
355
 
356
 
357
  class TestFeatureSelection(unittest.TestCase):
@@ -385,6 +471,20 @@ class TestFeatureSelection(unittest.TestCase):
385
  class TestMiscellaneous(unittest.TestCase):
386
  """Test miscellaneous functions."""
387

388
  def test_deprecation(self):
389
  """Ensure that deprecation works as expected.
390
 
 
5
  import numpy as np
6
  from sklearn import model_selection
7
  from pysr import PySRRegressor
8
+ from pysr.sr import (
9
+ run_feature_selection,
10
+ _handle_feature_selection,
11
+ _csv_filename_to_pkl_filename,
12
+ idx_model_selection,
13
+ )
14
  from pysr.export_latex import to_latex
15
  from sklearn.utils.estimator_checks import check_estimator
16
  import sympy
 
18
  import warnings
19
  import pickle as pkl
20
  import tempfile
21
+ from pathlib import Path
22
 
23
  DEFAULT_PARAMS = inspect.signature(PySRRegressor.__init__).parameters
24
  DEFAULT_NITERATIONS = DEFAULT_PARAMS["niterations"].default
 
142
  # These tests are flaky, so don't fail test:
143
  try:
144
  np.testing.assert_almost_equal(
145
+ model.predict(X.copy())[:, 0], X[:, 0] ** 2, decimal=3
146
  )
147
  except AssertionError:
148
  print("Error in test_multioutput_weighted_with_callable_temp_equation")
 
151
 
152
  try:
153
  np.testing.assert_almost_equal(
154
+ model.predict(X.copy())[:, 1], X[:, 1] ** 2, decimal=3
155
  )
156
  except AssertionError:
157
  print("Error in test_multioutput_weighted_with_callable_temp_equation")
 
287
  model.fit(X.values, y.values, Xresampled=Xresampled.values)
288
  self.assertLess(np.average((model.predict(X.values) - y.values) ** 2), 1e-4)
289
 
290
+ def test_load_model(self):
291
+ """See if we can load a ran model from the equation file."""
292
+ csv_file_data = """
293
+ Complexity,Loss,Equation
294
+ 1,0.19951081,"1.9762075"
295
+ 3,0.12717344,"(f0 + 1.4724599)"
296
+ 4,0.104823045,"pow_abs(2.2683423, cos(f3))\""""
297
+ # Strip the indents:
298
+ csv_file_data = "\n".join([l.strip() for l in csv_file_data.split("\n")])
299
+
300
+ for from_backup in [False, True]:
301
+ rand_dir = Path(tempfile.mkdtemp())
302
+ equation_filename = str(rand_dir / "equation.csv")
303
+ with open(equation_filename + (".bkup" if from_backup else ""), "w") as f:
304
+ f.write(csv_file_data)
305
+ model = PySRRegressor.from_file(
306
+ equation_filename,
307
+ n_features_in=5,
308
+ feature_names_in=["f0", "f1", "f2", "f3", "f4"],
309
+ binary_operators=["+", "*", "/", "-", "^"],
310
+ unary_operators=["cos"],
311
+ )
312
+ X = self.rstate.rand(100, 5)
313
+ y_truth = 2.2683423 ** np.cos(X[:, 3])
314
+ y_test = model.predict(X, 2)
315
+
316
+ np.testing.assert_allclose(y_truth, y_test)
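
Review note: the fixture in `test_load_model` is comma-delimited and wraps each equation in double quotes. The quoting matters because the last equation contains a comma of its own. The sketch below parses the same rows with the standard-library `csv` module. That module is only a stand-in for illustration; which parser PySR uses internally is not shown in this hunk.

```python
import csv
import io

# Same rows as the fixture in test_load_model, with the indentation stripped.
csv_file_data = "\n".join(
    [
        "Complexity,Loss,Equation",
        '1,0.19951081,"1.9762075"',
        '3,0.12717344,"(f0 + 1.4724599)"',
        '4,0.104823045,"pow_abs(2.2683423, cos(f3))"',
    ]
)

# DictReader uses the header row for keys and honors the double quotes,
# so the comma inside pow_abs(...) does not split the field.
rows = list(csv.DictReader(io.StringIO(csv_file_data)))
last_equation = rows[-1]["Equation"]
```

Without the quotes, the last row would parse into four fields instead of three.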
317
+
318
+ def test_load_model_simple(self):
319
+ # Test that we can simply load a model from its equation file.
320
+ y = self.X[:, [0, 1]] ** 2
321
+ model = PySRRegressor(
322
+ # Test that passing a single operator works:
323
+ unary_operators="sq(x) = x^2",
324
+ binary_operators="plus",
325
+ extra_sympy_mappings={"sq": lambda x: x**2},
326
+ **self.default_test_kwargs,
327
+ procs=0,
328
+ denoise=True,
329
+ early_stop_condition="stop_if(loss, complexity) = loss < 0.05 && complexity == 2",
330
+ )
331
+ rand_dir = Path(tempfile.mkdtemp())
332
+ equation_file = rand_dir / "equations.csv"
333
+ model.set_params(temp_equation_file=False)
334
+ model.set_params(equation_file=equation_file)
335
+ model.fit(self.X, y)
336
+
337
+ # lambda functions are removed from the pickling, so we need
338
+ # to pass it during the loading:
339
+ model2 = PySRRegressor.from_file(
340
+ model.equation_file_, extra_sympy_mappings={"sq": lambda x: x**2}
341
+ )
342
+
343
+ np.testing.assert_allclose(model.predict(self.X), model2.predict(self.X))
344
+
345
+ # Try again, but using only the pickle file:
346
+ for file_to_delete in [str(equation_file), str(equation_file) + ".bkup"]:
347
+ if os.path.exists(file_to_delete):
348
+ os.remove(file_to_delete)
349
+
350
+ pickle_file = rand_dir / "equations.pkl"
351
+ model3 = PySRRegressor.from_file(
352
+ model.equation_file_, extra_sympy_mappings={"sq": lambda x: x**2}
353
+ )
354
+ np.testing.assert_allclose(model.predict(self.X), model3.predict(self.X))
355
+
356
 
357
  def manually_create_model(equations, feature_names=None):
358
  if feature_names is None:
 
376
  model.feature_names_in_ = np.array(feature_names, dtype=object)
377
  for i in range(model.nout_):
378
  equations[i]["complexity loss equation".split(" ")].to_csv(
379
+ f"equation_file.csv.out{i+1}.bkup"
380
  )
381
  else:
382
  model.equation_file_ = "equation_file.csv"
 
384
  model.selection_mask_ = None
385
  model.feature_names_in_ = np.array(feature_names, dtype=object)
386
  equations["complexity loss equation".split(" ")].to_csv(
387
+ "equation_file.csv.bkup"
388
  )
389
 
390
  model.refresh()
 
423
  X = self.X
424
  y = self.y
425
  for f in [self.model.predict, self.equations_.iloc[-1]["lambda_format"]]:
426
+ np.testing.assert_almost_equal(f(X), y, decimal=3)
427
+
428
+ def test_all_selection_strategies(self):
429
+ equations = pd.DataFrame(
430
+ dict(
431
+ loss=[1.0, 0.1, 0.01, 0.001 * 1.4, 0.001],
432
+ score=[0.5, 1.0, 0.5, 0.5, 0.3],
433
+ )
434
+ )
435
+ idx_accuracy = idx_model_selection(equations, "accuracy")
436
+ self.assertEqual(idx_accuracy, 4)
437
+ idx_best = idx_model_selection(equations, "best")
438
+ self.assertEqual(idx_best, 3)
439
+ idx_score = idx_model_selection(equations, "score")
440
+ self.assertEqual(idx_score, 1)
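
Review note: the three assertions above pin down how each strategy chooses a row. The sketch below reproduces those choices on plain lists. `pick_index` is a hypothetical stand-in, not the `idx_model_selection` implementation. The rule used for `"best"` (highest score among rows whose loss is within 1.5x of the minimum) is an assumption that happens to match the expected index in this test.

```python
def pick_index(losses, scores, strategy):
    """Hypothetical stand-in for idx_model_selection, on plain lists."""
    indices = range(len(losses))
    if strategy == "accuracy":
        # Lowest loss wins, regardless of score.
        return min(indices, key=lambda i: losses[i])
    if strategy == "score":
        # Highest score wins, regardless of loss.
        return max(indices, key=lambda i: scores[i])
    if strategy == "best":
        # Assumed rule: keep rows within 1.5x of the minimum loss,
        # then take the highest score among them.
        threshold = 1.5 * min(losses)
        candidates = [i for i in indices if losses[i] <= threshold]
        return max(candidates, key=lambda i: scores[i])
    raise ValueError(f"unknown strategy: {strategy!r}")


# Same values as the DataFrame in test_all_selection_strategies.
losses = [1.0, 0.1, 0.01, 0.001 * 1.4, 0.001]
scores = [0.5, 1.0, 0.5, 0.5, 0.3]
picks = {s: pick_index(losses, scores, s) for s in ("accuracy", "best", "score")}
```

Under these assumptions `picks` agrees with the indices asserted in the test: 4 for `"accuracy"`, 3 for `"best"`, and 1 for `"score"`.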
441
 
442
 
443
  class TestFeatureSelection(unittest.TestCase):
 
471
  class TestMiscellaneous(unittest.TestCase):
472
  """Test miscellaneous functions."""
473
 
474
+ def test_csv_to_pkl_conversion(self):
475
+ """Test that csv filename to pkl filename works as expected."""
476
+ tmpdir = Path(tempfile.mkdtemp())
477
+ equation_file = tmpdir / "equations.389479384.28378374.csv"
478
+ expected_pkl_file = tmpdir / "equations.389479384.28378374.pkl"
479
+
480
+ # First, test inputting the paths:
481
+ test_pkl_file = _csv_filename_to_pkl_filename(equation_file)
482
+ self.assertEqual(test_pkl_file, str(expected_pkl_file))
483
+
484
+ # Next, test inputting the strings.
485
+ test_pkl_file = _csv_filename_to_pkl_filename(str(equation_file))
486
+ self.assertEqual(test_pkl_file, str(expected_pkl_file))
487
+
488
  def test_deprecation(self):
489
  """Ensure that deprecation works as expected.
490
 
test/test_jax.py CHANGED
@@ -34,13 +34,13 @@ class TestJAX(unittest.TestCase):
34
  equations = pd.DataFrame(
35
  {
36
  "Equation": ["1.0", "cos(x1)", "square(cos(x1))"],
37
- "MSE": [1.0, 0.1, 1e-5],
38
  "Complexity": [1, 2, 3],
39
  }
40
  )
41
 
42
- equations["Complexity MSE Equation".split(" ")].to_csv(
43
- "equation_file.csv.bkup", sep="|"
44
  )
45
 
46
  model.refresh(checkpoint_file="equation_file.csv")
@@ -49,7 +49,7 @@ class TestJAX(unittest.TestCase):
49
  np.testing.assert_almost_equal(
50
  np.array(jformat["callable"](jnp.array(X), jformat["parameters"])),
51
  np.square(np.cos(X.values[:, 1])), # Select feature 1
52
- decimal=4,
53
  )
54
 
55
  def test_pipeline(self):
@@ -61,13 +61,13 @@ class TestJAX(unittest.TestCase):
61
  equations = pd.DataFrame(
62
  {
63
  "Equation": ["1.0", "cos(x1)", "square(cos(x1))"],
64
- "MSE": [1.0, 0.1, 1e-5],
65
  "Complexity": [1, 2, 3],
66
  }
67
  )
68
 
69
- equations["Complexity MSE Equation".split(" ")].to_csv(
70
- "equation_file.csv.bkup", sep="|"
71
  )
72
 
73
  model.refresh(checkpoint_file="equation_file.csv")
@@ -76,7 +76,7 @@ class TestJAX(unittest.TestCase):
76
  np.testing.assert_almost_equal(
77
  np.array(jformat["callable"](jnp.array(X), jformat["parameters"])),
78
  np.square(np.cos(X[:, 1])), # Select feature 1
79
- decimal=4,
80
  )
81
 
82
  def test_feature_selection_custom_operators(self):
@@ -110,5 +110,5 @@ class TestJAX(unittest.TestCase):
110
  np_output = np_prediction(X.values)
111
  jax_output = jax_prediction(X.values)
112
 
113
- np.testing.assert_almost_equal(y.values, np_output, decimal=4)
114
- np.testing.assert_almost_equal(y.values, jax_output, decimal=4)
 
34
  equations = pd.DataFrame(
35
  {
36
  "Equation": ["1.0", "cos(x1)", "square(cos(x1))"],
37
+ "Loss": [1.0, 0.1, 1e-5],
38
  "Complexity": [1, 2, 3],
39
  }
40
  )
41
 
42
+ equations["Complexity Loss Equation".split(" ")].to_csv(
43
+ "equation_file.csv.bkup"
44
  )
45
 
46
  model.refresh(checkpoint_file="equation_file.csv")
 
49
  np.testing.assert_almost_equal(
50
  np.array(jformat["callable"](jnp.array(X), jformat["parameters"])),
51
  np.square(np.cos(X.values[:, 1])), # Select feature 1
52
+ decimal=3,
53
  )
54
 
55
  def test_pipeline(self):
 
61
  equations = pd.DataFrame(
62
  {
63
  "Equation": ["1.0", "cos(x1)", "square(cos(x1))"],
64
+ "Loss": [1.0, 0.1, 1e-5],
65
  "Complexity": [1, 2, 3],
66
  }
67
  )
68
 
69
+ equations["Complexity Loss Equation".split(" ")].to_csv(
70
+ "equation_file.csv.bkup"
71
  )
72
 
73
  model.refresh(checkpoint_file="equation_file.csv")
 
76
  np.testing.assert_almost_equal(
77
  np.array(jformat["callable"](jnp.array(X), jformat["parameters"])),
78
  np.square(np.cos(X[:, 1])), # Select feature 1
79
+ decimal=3,
80
  )
81
 
82
  def test_feature_selection_custom_operators(self):
 
110
  np_output = np_prediction(X.values)
111
  jax_output = jax_prediction(X.values)
112
 
113
+ np.testing.assert_almost_equal(y.values, np_output, decimal=3)
114
+ np.testing.assert_almost_equal(y.values, jax_output, decimal=3)
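
Review note: several hunks in this commit relax `decimal=4` to `decimal=3`. NumPy documents the check in `assert_almost_equal` as `abs(desired - actual) < 1.5 * 10**(-decimal)`, so the change widens the tolerance from 1.5e-4 to 1.5e-3. The sketch below mirrors that criterion in plain Python for a single scalar. The error value is hypothetical and chosen only to fall between the two thresholds.

```python
def passes_almost_equal(actual, desired, decimal):
    """Mirror NumPy's documented criterion for assert_almost_equal."""
    return abs(desired - actual) < 1.5 * 10 ** (-decimal)


error = 4e-4  # hypothetical prediction error, for illustration only
ok_at_3 = passes_almost_equal(1.0 + error, 1.0, decimal=3)
ok_at_4 = passes_almost_equal(1.0 + error, 1.0, decimal=4)
```

An error of this size passes at `decimal=3` but fails at `decimal=4`.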
test/test_torch.py CHANGED
@@ -49,13 +49,13 @@ class TestTorch(unittest.TestCase):
49
  equations = pd.DataFrame(
50
  {
51
  "Equation": ["1.0", "cos(x1)", "square(cos(x1))"],
52
- "MSE": [1.0, 0.1, 1e-5],
53
  "Complexity": [1, 2, 3],
54
  }
55
  )
56
 
57
- equations["Complexity MSE Equation".split(" ")].to_csv(
58
- "equation_file.csv.bkup", sep="|"
59
  )
60
 
61
  model.refresh(checkpoint_file="equation_file.csv")
@@ -82,13 +82,13 @@ class TestTorch(unittest.TestCase):
82
  equations = pd.DataFrame(
83
  {
84
  "Equation": ["1.0", "cos(x1)", "square(cos(x1))"],
85
- "MSE": [1.0, 0.1, 1e-5],
86
  "Complexity": [1, 2, 3],
87
  }
88
  )
89
 
90
- equations["Complexity MSE Equation".split(" ")].to_csv(
91
- "equation_file.csv.bkup", sep="|"
92
  )
93
 
94
  model.refresh(checkpoint_file="equation_file.csv")
@@ -133,13 +133,13 @@ class TestTorch(unittest.TestCase):
133
  equations = pd.DataFrame(
134
  {
135
  "Equation": ["1.0", "mycustomoperator(x1)"],
136
- "MSE": [1.0, 0.1],
137
  "Complexity": [1, 2],
138
  }
139
  )
140
 
141
- equations["Complexity MSE Equation".split(" ")].to_csv(
142
- "equation_file_custom_operator.csv.bkup", sep="|"
143
  )
144
 
145
  model.set_params(
 
49
  equations = pd.DataFrame(
50
  {
51
  "Equation": ["1.0", "cos(x1)", "square(cos(x1))"],
52
+ "Loss": [1.0, 0.1, 1e-5],
53
  "Complexity": [1, 2, 3],
54
  }
55
  )
56
 
57
+ equations["Complexity Loss Equation".split(" ")].to_csv(
58
+ "equation_file.csv.bkup"
59
  )
60
 
61
  model.refresh(checkpoint_file="equation_file.csv")
 
82
  equations = pd.DataFrame(
83
  {
84
  "Equation": ["1.0", "cos(x1)", "square(cos(x1))"],
85
+ "Loss": [1.0, 0.1, 1e-5],
86
  "Complexity": [1, 2, 3],
87
  }
88
  )
89
 
90
+ equations["Complexity Loss Equation".split(" ")].to_csv(
91
+ "equation_file.csv.bkup"
92
  )
93
 
94
  model.refresh(checkpoint_file="equation_file.csv")
 
133
  equations = pd.DataFrame(
134
  {
135
  "Equation": ["1.0", "mycustomoperator(x1)"],
136
+ "Loss": [1.0, 0.1],
137
  "Complexity": [1, 2],
138
  }
139
  )
140
 
141
+ equations["Complexity Loss Equation".split(" ")].to_csv(
142
+ "equation_file_custom_operator.csv.bkup"
143
  )
144
 
145
  model.set_params(