starcoder_memorization_checker

dhuynh95 committed on Oct 30, 2023

Commit

f8720c8

1 Parent(s): 30962e5

Update app.py

Files changed (1)

app.py

@@ -137,6 +137,8 @@ To raise awareness of this issue, we show in this demo how much [StarCoder](http
 We found that **StarCoder memorized at least 8% of the training samples** we used, which highlights the high risks of LLMs exposing the training set. We provide a notebook to reproduce our results [here](https://colab.research.google.com/drive/1YaaPOXzodEAc4JXboa12gN5zdlzy5XaR?usp=sharing). 👈
 To evaluate memorization of the training set, we can prompt StarCoder with the first tokens of an example from the training set. If StarCoder completes the prompt with an output that looks very similar to the original sample, we will consider this sample to be memorized by the LLM. 💾
+⚠️ Non-responsiveness: We use the Hugging Face Pro Inference solution to query StarCoder, which might not be available. If the demo does not work, please try again later.
 """
 memorization_definition = """
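
The memorization test described in the changed text (prompt with the start of a training sample, then compare the completion with the original continuation) can be sketched roughly as follows. This is a minimal illustration and not the code in `app.py`: the function name `is_memorized`, the character-based prefix (the demo text speaks of tokens), the `difflib` similarity measure, and the 0.9 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def is_memorized(sample: str, completion: str, prefix_len: int, threshold: float = 0.9) -> bool:
    """Hypothetical check: does `completion` closely match the true continuation?

    `sample` is a full training example, `prefix_len` is the number of leading
    characters that were used as the prompt (an assumption; the demo uses tokens),
    and `completion` is the text the model generated after that prompt.
    """
    reference = sample[prefix_len:]
    if not reference:
        # Nothing left to compare against, so the test is not meaningful.
        return False
    # Compare only as much of the completion as the reference is long.
    candidate = completion[: len(reference)]
    ratio = SequenceMatcher(None, reference, candidate, autojunk=False).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n"
    prompt_len = len("def add(a, b):\n")
    print(is_memorized(sample, sample[prompt_len:], prompt_len))  # exact reproduction
    print(is_memorized(sample, "    pass\n", prompt_len))         # unrelated completion
```

In a real run, `completion` would come from the hosted StarCoder endpoint. Since the added warning notes that the endpoint may be unavailable, a caller would likely want to catch request failures and retry later.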