Redacting names is bad practice.

#53
by wasertech - opened

I tried the model on HuggingChat. Here is one particularly bad example.

In physics and cosmology, a singularity refers to a point in spacetime where the known laws of physics break down and quantities like density and curvature become infinite.
The most well-known example of a singularity is the one predicted by <NAME>'s theory of general relativity at the center of a black hole.

As anyone with two brain cells can clearly infer, the model is referring to Albert Einstein's work on general relativity. Yet the model was trained on data with redacted names, so it redacts them from its answers as well.

I understand this is done to protect people's privacy and anonymity (which I appreciate), but doing so systematically for all names (including public figures) is just bad practice: the model learns this behavior and is therefore useless in most cases.

Sad. I was really looking forward to playing more with this model, but in this state I don't even see the point.
