Unable to repro Llama-3.1-8B-Instruct results + Inconsistent behavior when running locally
I managed to get the evals running locally after a few changes; however, I'm getting inconsistent behavior from the HHEM-2.1 model.
Here's one run:
"source" "summary" "HEM scores"
"Paul Merson has restarted his row with Andros Townsend after the Tottenham midfielder was brought on with only seven minutes remaining in his team's 0-0 draw with Burnley on Sunday. 'Just been watching the game, did you miss the coach? #RubberDub #7minutes,' Merson put on Twitter. Merson initially angered Townsend for writing in his Sky Sports column that 'if Andros Townsend can get in (the England team) then it opens it up to anybody.' Paul Merson had another dig at Andros Townsend after his appearance for Tottenham against Burnley . Townsend was brought on in the 83rd minute for Tottenham as they drew 0-0 against Burnley . Andros Townsend scores England's equaliser in their 1-1 friendly draw with Italy in Turin on Tuesday night . The former Arsenal man was proven wrong when Townsend hit a stunning equaliser for England against Italy and he duly admitted his mistake. 'It's not as though I was watching hoping he wouldn't score for England, I'm genuinely pleased for him and fair play to him – it was a great goal,' Merson said. 'It's just a matter of opinion, and my opinion was that he got pulled off after half an hour at Manchester United in front of Roy Hodgson, so he shouldn't have been in the squad. 'When I'm wrong, I hold my hands up. I don't have a problem with doing that - I'll always be the first to admit when I'm wrong.' Townsend hit back at Merson on Twitter after scoring for England against Italy . Sky Sports pundit Merson (centre) criticised Townsend's call-up to the England squad last week . Townsend hit back at Merson after netting for England in Turin on Wednesday, saying 'Not bad for a player that should be 'nowhere near the squad' ay @PaulMerse?' Any bad feeling between the pair seemed to have passed but Merson was unable to resist having another dig at Townsend after Tottenham drew at Turf Moor." "Paul Merson and Andros Townsend have a long-standing row. 
Merson initially criticized Townsend for being in the England team, saying 'if Andros Townsend can get in (the England team) then it opens it up to anybody.' However, Townsend scored a stunning equaliser for England against Italy, proving Merson wrong. Merson admitted his mistake and expressed genuine pleasure for Townsend's goal. Despite this, Merson continued to have a dig at Townsend after his appearance for Tottenham against Burnley." "0.0"
"Chelsea have made an offer for FC Tokyo's 22-year-old forward Yoshinori Muto, according to club president Naoki Ogane. The Japan international, who has played for the J-League side since 2013, will join Chelsea's Dutch partner club Vitesse Arnhem on loan next season if he completes a move to Stamford Bridge this summer. Ogane claims that Chelsea's interest in Muto is not connected to the £200million sponsorship deal they signed with Japanese company Yokohama Rubber in February. FC Tokyo forward Yoshinori Muto (centre) brings the ball forward against Albirex Niigata in March . FC Tokyo president Naoki Ogane claims that Chelsea have made a bid for Japan international Muto . Muto tussles with Yuji Nakazawa of Yokohama F.Marinos during a J-League clash last month . Age: 22 . Club: FC Tokyo . Appearances: 37 . Goals: 16 . International caps (Japan): 11 . International goals: 1 . Did you know? Muto graduated from Keio University in Tokyo with an economics degree two weeks ago . Speaking to Sports Nippon, Ogane said: 'It is true that Chelsea sent us an offer for Muto. 'It is a formal offer with conditions. They want to acquire him in the summer.' Muto, who only graduated from Keio University a fortnight ago after completing an economics degree, would be the first Japanese player to represent Chelsea if he moves to west London. He has earned 11 caps for his country after signing his first professional contract in 2014, scoring once for the Samurai Blue. A £4million deal for the youngster has been mooted, but Muto admits that he isn't sure if he will join the Premier League title chasers despite being pleased with their bid. He said: 'I have not decided yet at all. It is an honour for me to receive this offer from a great club.' Muto scored 13 times in his debut season with FC Tokyo and was named in the J-League's best XI. Muto admits it is an 'honour' to receive an offer from Chelsea although he has not yet decided to join the club . 
Muto, pictured in action against Ventforet Kofu has scored three goals in four games so far this season . The 22-year-old has a shot at goal during Japan's Asian Cup match against Palestine in January this year . He has continued his fine form during the current campaign, helping his club to third place in the division with three goals in four games. Yokohama Rubber - one of the world's largest tyre manufacturers - will become Chelsea's official shirt sponsors from the start of the 2015-16 season. The initial five-year deal is the biggest in the club's history, with the Blues now considering a two-week pre-season tour of Japan this summer." "Here's a concise summary of the passage covering the core pieces of information described:
Chelsea have made a formal offer to FC Tokyo for 22-year-old forward Yoshinori Muto, a Japan international. The offer includes a loan deal for Muto to join Chelsea's Dutch partner club Vitesse Arnhem next season if he completes a move to Chelsea this summer. Muto, who has played for FC Tokyo since 2013, has earned 11 caps for Japan and scored 16 goals in 37 appearances for the club. A £4million deal has been mooted, but Muto has not yet decided to join Chelsea, calling it an 'honour' to receive the offer from the Premier League title chasers." "0.00031"
Here's another run:
"source" "summary" "HEM scores"
"Paul Merson has restarted his row with Andros Townsend after the Tottenham midfielder was brought on with only seven minutes remaining in his team's 0-0 draw with Burnley on Sunday. 'Just been watching the game, did you miss the coach? #RubberDub #7minutes,' Merson put on Twitter. Merson initially angered Townsend for writing in his Sky Sports column that 'if Andros Townsend can get in (the England team) then it opens it up to anybody.' Paul Merson had another dig at Andros Townsend after his appearance for Tottenham against Burnley . Townsend was brought on in the 83rd minute for Tottenham as they drew 0-0 against Burnley . Andros Townsend scores England's equaliser in their 1-1 friendly draw with Italy in Turin on Tuesday night . The former Arsenal man was proven wrong when Townsend hit a stunning equaliser for England against Italy and he duly admitted his mistake. 'It's not as though I was watching hoping he wouldn't score for England, I'm genuinely pleased for him and fair play to him – it was a great goal,' Merson said. 'It's just a matter of opinion, and my opinion was that he got pulled off after half an hour at Manchester United in front of Roy Hodgson, so he shouldn't have been in the squad. 'When I'm wrong, I hold my hands up. I don't have a problem with doing that - I'll always be the first to admit when I'm wrong.' Townsend hit back at Merson on Twitter after scoring for England against Italy . Sky Sports pundit Merson (centre) criticised Townsend's call-up to the England squad last week . Townsend hit back at Merson after netting for England in Turin on Wednesday, saying 'Not bad for a player that should be 'nowhere near the squad' ay @PaulMerse?' Any bad feeling between the pair seemed to have passed but Merson was unable to resist having another dig at Townsend after Tottenham drew at Turf Moor." "Paul Merson and Andros Townsend have a long-standing row. 
Merson initially criticized Townsend for being in the England team, saying 'if Andros Townsend can get in (the England team) then it opens it up to anybody.' However, Townsend scored a stunning equaliser for England against Italy, proving Merson wrong. Merson admitted his mistake and expressed genuine pleasure for Townsend's goal. Despite this, Merson continued to have a dig at Townsend after his appearance for Tottenham against Burnley." "1.0"
"Chelsea have made an offer for FC Tokyo's 22-year-old forward Yoshinori Muto, according to club president Naoki Ogane. The Japan international, who has played for the J-League side since 2013, will join Chelsea's Dutch partner club Vitesse Arnhem on loan next season if he completes a move to Stamford Bridge this summer. Ogane claims that Chelsea's interest in Muto is not connected to the £200million sponsorship deal they signed with Japanese company Yokohama Rubber in February. FC Tokyo forward Yoshinori Muto (centre) brings the ball forward against Albirex Niigata in March . FC Tokyo president Naoki Ogane claims that Chelsea have made a bid for Japan international Muto . Muto tussles with Yuji Nakazawa of Yokohama F.Marinos during a J-League clash last month . Age: 22 . Club: FC Tokyo . Appearances: 37 . Goals: 16 . International caps (Japan): 11 . International goals: 1 . Did you know? Muto graduated from Keio University in Tokyo with an economics degree two weeks ago . Speaking to Sports Nippon, Ogane said: 'It is true that Chelsea sent us an offer for Muto. 'It is a formal offer with conditions. They want to acquire him in the summer.' Muto, who only graduated from Keio University a fortnight ago after completing an economics degree, would be the first Japanese player to represent Chelsea if he moves to west London. He has earned 11 caps for his country after signing his first professional contract in 2014, scoring once for the Samurai Blue. A £4million deal for the youngster has been mooted, but Muto admits that he isn't sure if he will join the Premier League title chasers despite being pleased with their bid. He said: 'I have not decided yet at all. It is an honour for me to receive this offer from a great club.' Muto scored 13 times in his debut season with FC Tokyo and was named in the J-League's best XI. Muto admits it is an 'honour' to receive an offer from Chelsea although he has not yet decided to join the club . 
Muto, pictured in action against Ventforet Kofu has scored three goals in four games so far this season . The 22-year-old has a shot at goal during Japan's Asian Cup match against Palestine in January this year . He has continued his fine form during the current campaign, helping his club to third place in the division with three goals in four games. Yokohama Rubber - one of the world's largest tyre manufacturers - will become Chelsea's official shirt sponsors from the start of the 2015-16 season. The initial five-year deal is the biggest in the club's history, with the Blues now considering a two-week pre-season tour of Japan this summer." "Here's a concise summary of the passage covering the core pieces of information described:
Chelsea have made a formal offer to FC Tokyo for 22-year-old forward Yoshinori Muto, a Japan international. The offer includes a loan deal for Muto to join Chelsea's Dutch partner club Vitesse Arnhem next season if he completes a move to Chelsea this summer. Muto, who has played for FC Tokyo since 2013, has earned 11 caps for Japan and scored 16 goals in 37 appearances for the club. A £4million deal has been mooted, but Muto has not yet decided to join Chelsea, calling it an 'honour' to receive the offer from the Premier League title chasers." "1.0"
Note that the inputs and outputs are identical, yet the final scores are completely different.
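A quick way to make the discrepancy concrete is to diff the score columns of the two runs row by row. The sketch below uses only the two "HEM scores" values shown in each run above; the helper name `differing_rows` and the tolerance are illustrative choices, not part of the leaderboard repo.

```python
def differing_rows(run_a, run_b, tol=1e-3):
    """Return the indices where two runs' scores differ by more than tol."""
    if len(run_a) != len(run_b):
        raise ValueError("both runs must score the same number of pairs")
    return [i for i, (a, b) in enumerate(zip(run_a, run_b)) if abs(a - b) > tol]


run_1 = [0.0, 0.00031]  # "HEM scores" column from the first run above
run_2 = [1.0, 1.0]      # same column from the second run

print(differing_rows(run_1, run_2))
```

With a deterministic scorer and identical inputs, the returned list should be empty; here both rows are flagged.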
The full runs result in the following scores:
Factual Consistency Rate: 89.76143141153081
Hallucination Rate: 10.238568588469192
Factual Consistency Rate: 10.139165009940358
Hallucination Rate: 89.86083499005964
The two runs are effectively mirrors of each other (which is strange), and neither is anywhere near the official 8B numbers:
Factual Consistency Rate: 94.6
Hallucination Rate: 5.4
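To illustrate why two runs can look like mirrors, here is a minimal sketch of how the two rates relate to per-sample scores. It assumes a summary counts as factually consistent when its score is at least 0.5; that threshold, the function name `rates`, and the sample scores are assumptions for illustration and are not taken from the repo or from the actual runs.

```python
def rates(scores, threshold=0.5):
    """Return (factual_consistency_rate, hallucination_rate) in percent.

    Assumes a score >= threshold means "consistent" (an assumption here).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    consistent = sum(1 for s in scores if s >= threshold)
    fcr = 100.0 * consistent / len(scores)
    return fcr, 100.0 - fcr


# Hypothetical scores, not taken from the actual runs.
scores = [0.97, 0.91, 0.88, 0.12, 0.99, 0.76, 0.93, 0.81, 0.95, 0.64]
# Simulate a scorer that reads out the opposite class probability.
flipped = [1.0 - s for s in scores]

print(rates(scores))
print(rates(flipped))
```

If every score were flipped like this, the two rates would swap exactly. The rates reported above are close to a swap but not exact (89.76 vs. 89.86), so a pure flip would not fully account for them.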
I get a correct repro (and consistent behavior) when modifying the inference to use AutoModel instead:
https://huggingface.co/vectara/hallucination_evaluation_model#using-with-automodel
Factual Consistency Rate: 97.01789264413519
Hallucination Rate: 2.9821073558648123
Hi @lema-balerion, thanks for your interest in HHEM-2.1-Open. Has your issue been solved?
My issue is solved, but since I needed to make multiple changes to the repo to get there, this issue should remain open until inference is fixed. Is this still being maintained?
Could you please provide more details on your previous implementation that led to inconsistent performance, as well as the specific changes you made to achieve consistent predictions? This information will help us address the issue more efficiently.
It's not my previous implementation, it's the default state of the repo.
The changes that needed to be made were in model_operations.py:
self.other_model = AutoModelForSequenceClassification.from_pretrained(
    model_path, trust_remote_code=True)
and
scores = self.other_model.predict(text_pairs)
scores = [round(x, 5) for x in scores.tolist()]
return scores
Basically completely replacing the current implementation, which uses AutoModelForTokenClassification, AutoTokenizer, and manually pulls out the logits:
prompt = "<pad> Determine if the hypothesis is true given the premise?\n\nPremise: {text1}\n\nHypothesis: {text2}"
tokenizer = AutoTokenizer.from_pretrained('t5-base')
inputs = tokenizer(
    [prompt.format(text1=pair[0], text2=pair[1]) for pair in text_pairs],
    return_tensors='pt', padding='longest').to(self.device)
self.model.eval()
with torch.no_grad():
    output = self.model(**inputs)
    logits = output.logits
    logits = logits[:, 0, :]  # get the logits on the first token
    logits = torch.softmax(logits, dim=-1)
scores = [round(x, 5) for x in logits[:, 1].tolist()]  # list of float
return scores
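For readers unfamiliar with the tensor indexing, the last few lines of that snippet can be mirrored in plain Python. This is only a sketch of the arithmetic (first-token logits, softmax over classes, probability of class index 1), with made-up logit values and a hypothetical helper name; it does not load any model and says nothing about why the repo's version is unstable.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a flat list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def first_token_class1_scores(logits):
    """logits is a nested list shaped [batch][seq_len][num_classes]."""
    scores = []
    for sequence in logits:
        probs = softmax(sequence[0])        # same as logits[:, 0, :] + softmax
        scores.append(round(probs[1], 5))   # same as logits[:, 1] after softmax
    return scores


# Hypothetical logits for two inputs, two tokens each, two classes.
example = [
    [[2.0, -2.0], [0.1, 0.3]],
    [[-3.0, 3.0], [0.0, 0.0]],
]
print(first_token_class1_scores(example))
```

Only the first token's logits influence the result; the remaining positions are ignored, which is why the second token's values above have no effect on the scores.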
Hi @lema-balerion, do you want us to investigate this further, or do you think it is okay to close the ticket?
Until changes are made to the repo to address the issue I don't think the ticket should be closed, no.
Hi @lema-balerion , can you check if the following implementation matches your initial attempt that resulted in unstable predictions? I’ve used the two pairs you mentioned as an example. If I’ve misunderstood the issue, could you clarify what your initial approach was?
from transformers import AutoModelForSequenceClassification
source_1 = "Paul Merson has restarted his row with Andros Townsend after the Tottenham midfielder was brought on with only seven minutes remaining in his team's 0-0 draw with Burnley on Sunday. 'Just been watching the game, did you miss the coach? #RubberDub #7minutes,' Merson put on Twitter. Merson initially angered Townsend for writing in his Sky Sports column that 'if Andros Townsend can get in (the England team) then it opens it up to anybody.' Paul Merson had another dig at Andros Townsend after his appearance for Tottenham against Burnley . Townsend was brought on in the 83rd minute for Tottenham as they drew 0-0 against Burnley . Andros Townsend scores England's equaliser in their 1-1 friendly draw with Italy in Turin on Tuesday night . The former Arsenal man was proven wrong when Townsend hit a stunning equaliser for England against Italy and he duly admitted his mistake. 'It's not as though I was watching hoping he wouldn't score for England, I'm genuinely pleased for him and fair play to him – it was a great goal,' Merson said. 'It's just a matter of opinion, and my opinion was that he got pulled off after half an hour at Manchester United in front of Roy Hodgson, so he shouldn't have been in the squad. 'When I'm wrong, I hold my hands up. I don't have a problem with doing that - I'll always be the first to admit when I'm wrong.' Townsend hit back at Merson on Twitter after scoring for England against Italy . Sky Sports pundit Merson (centre) criticised Townsend's call-up to the England squad last week . Townsend hit back at Merson after netting for England in Turin on Wednesday, saying 'Not bad for a player that should be 'nowhere near the squad' ay @PaulMerse?' Any bad feeling between the pair seemed to have passed but Merson was unable to resist having another dig at Townsend after Tottenham drew at Turf Moor."
summary_1 = "Paul Merson and Andros Townsend have a long-standing row. Merson initially criticized Townsend for being in the England team, saying 'if Andros Townsend can get in (the England team) then it opens it up to anybody.' However, Townsend scored a stunning equaliser for England against Italy, proving Merson wrong. Merson admitted his mistake and expressed genuine pleasure for Townsend's goal. Despite this, Merson continued to have a dig at Townsend after his appearance for Tottenham against Burnley."
source_2 = "Chelsea have made an offer for FC Tokyo's 22-year-old forward Yoshinori Muto, according to club president Naoki Ogane. The Japan international, who has played for the J-League side since 2013, will join Chelsea's Dutch partner club Vitesse Arnhem on loan next season if he completes a move to Stamford Bridge this summer. Ogane claims that Chelsea's interest in Muto is not connected to the £200million sponsorship deal they signed with Japanese company Yokohama Rubber in February. FC Tokyo forward Yoshinori Muto (centre) brings the ball forward against Albirex Niigata in March . FC Tokyo president Naoki Ogane claims that Chelsea have made a bid for Japan international Muto . Muto tussles with Yuji Nakazawa of Yokohama F.Marinos during a J-League clash last month . Age: 22 . Club: FC Tokyo . Appearances: 37 . Goals: 16 . International caps (Japan): 11 . International goals: 1 . Did you know? Muto graduated from Keio University in Tokyo with an economics degree two weeks ago . Speaking to Sports Nippon, Ogane said: 'It is true that Chelsea sent us an offer for Muto. 'It is a formal offer with conditions. They want to acquire him in the summer.' Muto, who only graduated from Keio University a fortnight ago after completing an economics degree, would be the first Japanese player to represent Chelsea if he moves to west London. He has earned 11 caps for his country after signing his first professional contract in 2014, scoring once for the Samurai Blue. A £4million deal for the youngster has been mooted, but Muto admits that he isn't sure if he will join the Premier League title chasers despite being pleased with their bid. He said: 'I have not decided yet at all. It is an honour for me to receive this offer from a great club.' Muto scored 13 times in his debut season with FC Tokyo and was named in the J-League's best XI. Muto admits it is an 'honour' to receive an offer from Chelsea although he has not yet decided to join the club . 
Muto, pictured in action against Ventforet Kofu has scored three goals in four games so far this season . The 22-year-old has a shot at goal during Japan's Asian Cup match against Palestine in January this year . He has continued his fine form during the current campaign, helping his club to third place in the division with three goals in four games. Yokohama Rubber - one of the world's largest tyre manufacturers - will become Chelsea's official shirt sponsors from the start of the 2015-16 season. The initial five-year deal is the biggest in the club's history, with the Blues now considering a two-week pre-season tour of Japan this summer."
summary_2 = """Here's a concise summary of the passage covering the core pieces of information described:
Chelsea have made a formal offer to FC Tokyo for 22-year-old forward Yoshinori Muto, a Japan international. The offer includes a loan deal for Muto to join Chelsea's Dutch partner club Vitesse Arnhem next season if he completes a move to Chelsea this summer. Muto, who has played for FC Tokyo since 2013, has earned 11 caps for Japan and scored 16 goals in 37 appearances for the club. A £4million deal has been mooted, but Muto has not yet decided to join Chelsea, calling it an 'honour' to receive the offer from the Premier League title chasers."""
pairs = [  # Test data, List[Tuple[str, str]]
    (source, summary) for source, summary in zip([source_1, source_2], [summary_1, summary_2])
]
model = AutoModelForSequenceClassification.from_pretrained('vectara/hallucination_evaluation_model', trust_remote_code=True).to("cuda:0")
print(model.predict(pairs))
Your samples are correct, but the approach you're sharing is the one that is stable and correct. The one that's inconsistent is the one built into this repo:
https://huggingface.co/spaces/vectara/leaderboard/blob/066863b06e8d230559d3638a331370baa70773f1/src/backend/model_operations.py#L519
The code in this repository is for our internal maintenance and was implemented for HHEM-2.1, which supports the leaderboard. HHEM-2.1-Open is an open-source alternative. For reproducing results, please refer to the Evaluation with HHEM-2.1-Open Locally section on the About page. When using HHEM-2.1-Open, it should be loaded with AutoModelForSequenceClassification, as demonstrated in the example provided on the model card. I hope this clears up any confusion.
Evaluation with HHEM-2.1-Open Locally
- You can access generated summaries from models on the leaderboard here. The text generation prompt is available under "Prompt Used" section in the repository's README.
- Check here for more details on using HHEM-2.1-Open. Please note that our leaderboard is scored based on the HHEM-2.1 model, which excels in hallucination detection. While we offer HHEM-2.1-Open as an open-source alternative, it may produce slightly different results.
Ah, I see the About page was updated 6 days ago, while I was following the instructions 7 days ago and filed the bug afterward.
https://huggingface.co/spaces/vectara/leaderboard/commit/066863b06e8d230559d3638a331370baa70773f1
Now that the About page has been updated, this can be closed.