Santiago Viquez
santiviquez
AI & ML interests
ML @ NannyML. A bit of everything. NLP, RL, and, of course, tabular. In the GenAI era, how can you not love tabular data? Educational content and OSS.
Posts
17
Post
2010
More open research updates
Performance estimation is currently the best way to quantify the impact of data drift on model performance.
I've been benchmarking performance estimation methods (CBPE and M-CBPE) against data drift signals.
I'm using drift results as features for many regression algorithms, and then using those regression models to estimate the monitored model's performance. Finally, I'm measuring the Mean Absolute Error (MAE) between the regression models' predictions and the actual performance.
So far, in all my experiments, performance estimation methods do better than drift signals.
Bear in mind that these are early results; I'm running the flow on more datasets as we speak.
Hopefully, by next week, I will have more results to share.
Post
1338
How would you benchmark performance estimation algorithms vs data drift signals?
I'm working on a benchmarking analysis, and I'm currently doing the following:
- Get univariate and multivariate drift signals and measure their correlation with realized performance.
- Use drift signals as features of a regression model to predict the model's performance.
- Use drift signals as features of a classification model to predict a performance drop.
- Compare all the above experiments with results from Performance Estimation algorithms.
Any other ideas?
Collections
1
Collection of LLM hallucination and evaluation papers that I've been exploring and implementing. Some of them have my comments and annotated doodles.
- Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
  Paper • 2208.05309 • Published • 1
- LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models
  Paper • 2305.13711 • Published • 2
- Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
  Paper • 2302.09664 • Published • 2
- BARTScore: Evaluating Generated Text as Text Generation
  Paper • 2106.11520 • Published • 1
models
16
santiviquez/t5-small-finetuned-samsum-en
Summarization • Updated • 3
santiviquez/bart-base-finetuned-samsum-en
Summarization • Updated • 4
santiviquez/amazon-reviews-sentiment-bert-base-uncased-6000-samples
Updated
santiviquez/amazon-reviews-sentiment-distilbert-base-uncased-6000-samples
Text Classification • Updated • 2
santiviquez/amazon-reviews-finetuning-distilbert-base-uncased
Text Classification • Updated • 1
santiviquez/amazon-reviews-finetuning-distilbert-base-uncased_books
Text Classification • Updated • 7
santiviquez/amazon-reviews-finetuning-bert-base-sentiment
Text Classification • Updated • 19
santiviquez/amazon_reviews_finetuning-sentiment-model-3000-samples
Text Classification • Updated • 1
santiviquez/noisy_human_cnn
Updated
santiviquez/ssr-base-finetuned-samsum-en
Summarization • Updated • 4