Commit a107c57
Parent(s): 882c546
grammarly typos

app.py CHANGED
@@ -12,12 +12,12 @@ you may be aware that changes in the distribution of
 the production data can affect the model's performance.
 """)
 
-st.markdown("""Recently a paper from MIT, Harvard and other institutions showed how [91% of their ML models
-experiments
+st.markdown("""Recently a paper from MIT, Harvard, and other institutions showed how [91% of their ML models
+experiments degraded]('https://www.nannyml.com/blog/91-of-ml-perfomance-degrade-in-time') in time.""")
 
-st.markdown("""Typically, to know if a model is degrading
-getting new labeled data is
-knowing how the model
+st.markdown("""Typically, we need access to ground truth to know if a model is degrading.
+But most of the time, getting new labeled data is expensive, time-consuming, or impossible.
+So we end up blind, without knowing how the model performs in production.
 """)
 
 st.markdown("""
@@ -39,18 +39,18 @@ car_value, salary_range, loan_lenght, etc.
 st.dataframe(analysis_df.head(3))
 
 st.markdown("""
-We know that the model had a **Test F1-Score of: 0.943**. But
+We know that the model had a **Test F1-Score of: 0.943**. But what guarantees us that the F1-Score
 will continue to be good on production data?
 """)
 
 st.markdown("#### Estimating the Model Performance")
 st.markdown("""
-Instead of waiting for ground truth we can use NannyML's
+Instead of waiting for ground truth, we can use NannyML's
 [CBPE]("https://nannyml.readthedocs.io/en/stable/tutorials/performance_estimation/binary_performance_estimation/standard_metric_estimation.html")
 method to estimate the performance of an ML model.
 
 CBPE's trick is to use the confidence scores of the ML model. It calibrates the scores to turn them into actual probabilities.
-Once the probabilities are
+Once the probabilities are calibrated, it can estimate any performance metric that can be computed from the confusion matrix elements.
 """)
 
 chunk_size = st.slider('Chunk/Sample Size', 2500, 7500, 5000, 500)
@@ -101,7 +101,7 @@ st.divider()
 
 
 
-st.markdown("""Created by [santiviquez](https://twitter.com/santiviquez) from NannyML""")
+st.markdown("""Created by [santiviquez](https://twitter.com/santiviquez) from NannyML.""")
 
 st.markdown("""
 NannyML is an open-source library for post-deployment data science. Leave us a ⭐ on [GitHub]("https://github.com/NannyML/nannyml")