As you can see, there are two unmatched entities: "January 18" and "U.S.". The first is a hallucinated entity: it appears in the summary but does not exist in the article.
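The matching step can be illustrated with a minimal Python sketch. It assumes the entities have already been extracted from the summary and the article (for example by an NER model) and compares them by exact string equality. The function name `unmatched_entities` and the example entity lists are hypothetical, and the app itself may use a different matching strategy.

```python
def unmatched_entities(summary_entities, article_entities):
    """Return summary entities that have no exact match among the article entities."""
    # A set gives fast membership checks for the article side
    article_set = set(article_entities)
    # Keep the summary's order so the flagged entities are easy to read
    return [ent for ent in summary_entities if ent not in article_set]


# Hypothetical entity lists, standing in for NER output on a summary and its article
summary_entities = ["Washington", "January 18", "U.S."]
article_entities = ["Washington", "US"]

print(unmatched_entities(summary_entities, article_entities))
# -> ['January 18', 'U.S.']
```

With exact matching, both "January 18" and "U.S." are flagged, mirroring the two unmatched entities in this example.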
Deep-learning-based generation is [prone to hallucinate](https://arxiv.org/pdf/2202.03629.pdf) unintended text. These hallucinations degrade
system performance and fail to meet user expectations in many real-world scenarios. By applying entity matching, we can mitigate this problem
for the downstream task of summary generation. The second entity, "U.S.", **does** occur in the article, but written as "US" instead of "U.S.". This could be solved
by comparing entities against a list of abbreviations, or by using an embedder specialized for abbreviations, but this is not currently implemented.
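One way to handle the "US" versus "U.S." mismatch, along the lines of the abbreviation-list idea above, is to normalize entities before comparing them. The sketch below is an assumption, not the app's implementation: it lowercases each entity, strips periods, and looks the result up in a small hypothetical `ABBREVIATIONS` table. All names and example data are illustrative.

```python
# Hypothetical abbreviation table: maps long forms to a canonical abbreviation
ABBREVIATIONS = {
    "united states": "us",
    "united kingdom": "uk",
}


def normalize(entity):
    """Lowercase, drop periods, and map known long forms to their abbreviation."""
    key = entity.lower().replace(".", "").strip()
    return ABBREVIATIONS.get(key, key)


def unmatched_entities_normalized(summary_entities, article_entities):
    """Return summary entities whose normalized form does not occur in the article."""
    article_norm = {normalize(ent) for ent in article_entities}
    return [ent for ent in summary_entities if normalize(ent) not in article_norm]


# Hypothetical entity lists, standing in for NER output on a summary and its article
summary_entities = ["Washington", "January 18", "U.S."]
article_entities = ["Washington", "US"]

print(unmatched_entities_normalized(summary_entities, article_entities))
# -> ['January 18']
```

After normalization, "U.S." and "US" both map to `us`, so only the hallucinated "January 18" remains flagged. Stripping periods is a blunt heuristic, and a hand-written table only covers the abbreviations someone thought to add, which is one reason a dedicated embedder for abbreviations might generalize better.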