{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "32744c39-dae0-4f8c-beea-af8cf934f977", "metadata": {}, "outputs": [], "source": [ "from paraphrase_metrics import metrics as pm\n", "import pandas as pd\n", "import spacy\n", "from tqdm import tqdm\n", "nlp = spacy.load(\"en_core_web_sm\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "c219268a-d25b-4b27-b0eb-0bb578ec3450", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
| | og_s1 | og_s2 | new_s1 | new_s2 | og_label | new_label | remarks |
|---|---|---|---|---|---|---|---|
| 0 | Amrozi accused his brother, whom he called \"th... | Referring to him as only \"the witness\", Amrozi... | Amrozi accused his brother, whom he called \"th... | Referring to him as only \"the witness\", Amrozi... | 1 | 1 | no need to correct |
| 1 | Yucaipa owned Dominick's before selling the ch... | Yucaipa bought Dominick's in 1995 for $693 mil... | Yucaipa owned Dominick's before selling the ch... | Yucaipa bought Dominick's in 1995 for $693 mil... | 0 | 0 | no need to correct |
| 2 | They had published an advertisement on the Int... | On June 10, the ship's owners had published an... | They had published an advertisement on the Int... | On June 10, the ship's owners had published an... | 1 | 1 | no need to correct |
| 3 | Around 0335 GMT, Tab shares were up 19 cents, ... | Tab shares jumped 20 cents, or 4.6%, to set a ... | Around 0335 GMT, Tab shares were up 19 cents, ... | Tab shares jumped 20 cents, or 4.6%, to set a ... | 0 | 0 | no need to correct |
| 4 | The stock rose $2.11, or about 11 percent, to ... | PG&E Corp. shares jumped $1.63 or 8 percent to... | The stock rose $2.11, or about 11 percent, to ... | PG&E Corp. shares jumped $1.63 or 8 percent to... | 1 | 0 | can't correct |
\n", "