seonil committed on
Commit
adcb90e
1 Parent(s): ac939ac

return only harim+ scores

Files changed (2)
  1. README.md +3 -4
  2. harim_plus.py +13 -17
README.md CHANGED
@@ -4,7 +4,7 @@ emoji: 🤗
  colorFrom: blue
  colorTo: red
  sdk: gradio
- sdk_version: 3.9
  app_file: app.py
  pinned: false
  tags:
@@ -34,7 +34,6 @@ pip install evaluate
  import evaluate
  from pprint import pprint

- # example from the paper
  art = """Spain's 2-0 defeat by Holland on Tuesday brought back bitter memories of their disastrous 2014 World Cup, but coach Vicente del Bosque will not be too worried about a third straight friendly defeat, insists Gerard Pique. Holland, whose 5-1 drubbing of Spain in the group stage in Brazil last year marked the end of the Iberian nation's six-year domination of the world game, scored two early goals at the Amsterdam Arena and held on against some determined Spain pressure in the second half for a 2-0 success. They became the first team to inflict two defeats on Del Bosque since he took over in 2008 but the gruff 64-year-old had used the match to try out several new faces and he fielded a largely experimental, second-string team. Stefan de Vrij (right) headed Holland in front against Spain at the Amsterdam Arena on Tuesday Gerard Pique (left) could do nothing to stop Davy Klaassen doubling the Dutch advantage Malaga forward Juanmi and Sevilla midfielder Vitolo became the 55th and 56th players to debut under Del Bosque, while the likes of goalkeeper David de Gea, defenders Raul Albiol, Juan Bernat and Dani Carvajal and midfielder Mario Suarez all started the game. 'The national team's state of health is good,' centre back Gerard Pique told reporters. 'We are in a process where players are coming into the team and gathering experience,' added the Barcelona defender. 'We are second in qualifying (for Euro 2016) and these friendly games are for experimenting. 'I am not that worried about this match because we lost friendlies in previous years and then ended up winning titles.' David de Gea was given a start by Vicente del Bosque but could not keep out De Vrij's header here Dani Carvajal (centre) was another squad player given a chance to impress against Holland Del Bosque will be confident he can find the right mix of players to secure Spain's berth at Euro 2016 in France next year, when they will be chasing an unprecedented third straight title. 
Slovakia are the surprise leaders in qualifying Group C thanks to a 2-1 win over Spain in Zilina in October and have a maximum 15 points from five of 10 matches. Spain are second on 12 points, three ahead of Ukraine, who they beat 1-0 in Seville on Friday. Del Bosque's side host Slovakia in September in a match that could decide who goes through to the finals as group winners. 'The team is in good shape,' forward Pedro told reporters. 'We have a very clear idea of our playing style and we are able to count on people who are gradually making a place for themselves in the team.'"""
 
  summaries = [
@@ -46,8 +45,8 @@ summaries = [
  articles = [art] * len(summaries)

  scorer = evaluate.load('NCSOFT/harim_plus')
- scores = scorer.compute(predictions = summaries, references = articles) # use_aggregator=False, tokenwise_score=False, bsz=32)
- pprint(scores['harim+'])
  >>> [1.8230078220367432,
  1.5361897945404053,
  1.806436538696289,
 
  colorFrom: blue
  colorTo: red
  sdk: gradio
+ sdk_version: 3.0.2
  app_file: app.py
  pinned: false
  tags:
 
  import evaluate
  from pprint import pprint
 
 
  art = """Spain's 2-0 defeat by Holland on Tuesday brought back bitter memories of their disastrous 2014 World Cup, but coach Vicente del Bosque will not be too worried about a third straight friendly defeat, insists Gerard Pique. Holland, whose 5-1 drubbing of Spain in the group stage in Brazil last year marked the end of the Iberian nation's six-year domination of the world game, scored two early goals at the Amsterdam Arena and held on against some determined Spain pressure in the second half for a 2-0 success. They became the first team to inflict two defeats on Del Bosque since he took over in 2008 but the gruff 64-year-old had used the match to try out several new faces and he fielded a largely experimental, second-string team. Stefan de Vrij (right) headed Holland in front against Spain at the Amsterdam Arena on Tuesday Gerard Pique (left) could do nothing to stop Davy Klaassen doubling the Dutch advantage Malaga forward Juanmi and Sevilla midfielder Vitolo became the 55th and 56th players to debut under Del Bosque, while the likes of goalkeeper David de Gea, defenders Raul Albiol, Juan Bernat and Dani Carvajal and midfielder Mario Suarez all started the game. 'The national team's state of health is good,' centre back Gerard Pique told reporters. 'We are in a process where players are coming into the team and gathering experience,' added the Barcelona defender. 'We are second in qualifying (for Euro 2016) and these friendly games are for experimenting. 'I am not that worried about this match because we lost friendlies in previous years and then ended up winning titles.' David de Gea was given a start by Vicente del Bosque but could not keep out De Vrij's header here Dani Carvajal (centre) was another squad player given a chance to impress against Holland Del Bosque will be confident he can find the right mix of players to secure Spain's berth at Euro 2016 in France next year, when they will be chasing an unprecedented third straight title. 
Slovakia are the surprise leaders in qualifying Group C thanks to a 2-1 win over Spain in Zilina in October and have a maximum 15 points from five of 10 matches. Spain are second on 12 points, three ahead of Ukraine, who they beat 1-0 in Seville on Friday. Del Bosque's side host Slovakia in September in a match that could decide who goes through to the finals as group winners. 'The team is in good shape,' forward Pedro told reporters. 'We have a very clear idea of our playing style and we are able to count on people who are gradually making a place for themselves in the team.'"""
 
  summaries = [
 
  articles = [art] * len(summaries)

  scorer = evaluate.load('NCSOFT/harim_plus')
+ scores = scorer.compute(predictions = summaries, references = articles) # use_aggregator=False, bsz=32, return_details=False, tokenwise_score=False)
+ pprint(scores)
  >>> [1.8230078220367432,
  1.5361897945404053,
  1.806436538696289,
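
Because the README example switches from `pprint(scores['harim+'])` to `pprint(scores)`, code written against the earlier return shape will fail on indexing. A minimal sketch of a caller-side shim follows. It assumes the pre-commit output is a dict with a `'harim+'` key and the post-commit default output is a plain list of floats, as the diff suggests. `harim_plus_scores` is a hypothetical helper name and is not part of the metric.

```python
from collections.abc import Mapping


def harim_plus_scores(scores):
    """Return only the HaRiM+ values, whichever shape `compute` produced.

    Hypothetical helper (not part of NCSOFT/harim_plus). Assumes:
      - old shape: dict containing a 'harim+' entry
      - new default shape: list of floats (or a single float when aggregated)
    """
    if isinstance(scores, Mapping):
        # Old shape: pick the HaRiM+ entry out of the dict.
        return scores["harim+"]
    # New default shape: already just the HaRiM+ scores.
    return scores
```

A caller could then wrap the existing call as `harim_plus_scores(scorer.compute(predictions=summaries, references=articles))` and stay agnostic to which revision of the metric is installed.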
harim_plus.py CHANGED
@@ -30,14 +30,11 @@ _CITATION = """\
  }
  """

- _DESCRIPTION = """\
- HaRiM+ is a reference-less (i.e. scoring summary quality only requires an article) evaluation metric score for summarization task which hurls the power of summarization model.
- It will work great ranking the summary-article pairs according to its quality.
- Note that the score range is unbound.

- Summarization model inside the HaRiM+ will read and evaluate how good the quality of a summary given the paired source article.
-
- HaRiM+ is proved effective for benchmarking summarization systems (system-level performance) as well as ranking the article-summary pairs (segment-level performance) in comprehensive aspect such as factuality, consistency, coherency, fluency, and relevance. For details, refer to our paper published in AACL2022.
  """

  _KWARGS_DESCRIPTION = """
@@ -51,14 +48,12 @@ Args:
  `predictions` (list of str): generated summaries
  `references` (list of str): source articles to be summarized
  `use_aggregator` (bool): if True, average of the scores are returned

  Returns:
- 'results' (dict): {
- 'harim+' (List[float] or float): HaRiM+ score to use,
- 'harim' (List[float] or float): HaRiM term for computing the score above,
- 'log_ppl' (List[float] or float): Log perplexity term. Same as (Yuan et al., NeurIPS 2021),
- 'lambda' (float): (recommend not to modify this) Balancing coeff. for computing harim+ from harim and log_ppl.
- }

  Examples:
  >>> summaries = ["hello there", "hello there"]
@@ -94,8 +89,8 @@ class Harimplus(evaluate.Metric):
  inputs_description=_KWARGS_DESCRIPTION,
  features=datasets.Features(
  {
- "predictions": datasets.Value("string", id="sequence"),
- "references": datasets.Value("string", id="sequence"),
  }
  ),
  codebase_urls=[CODEBASE_URL],
@@ -124,8 +119,9 @@ class Harimplus(evaluate.Metric):
  references=None,
  use_aggregator=False,
  bsz=32,
- tokenwise_score=False):
  summaries = predictions
  articles = references
- scores = self.scorer.compute(predictions=summaries, references=articles, use_aggregator=use_aggregator, bsz=bsz, tokenwise_score=tokenwise_score)
  return scores
 
  }
  """

+ _DESCRIPTION = f"""HaRiM+ is a reference-less evaluation metric (i.e. requires only article-summary pair, no reference summary) for summarization which hurls the power of summarization model.
+ Summarization model inside the HaRiM+ will read and evaluate how good the quality of a summary given the paired article.
+ It will work great for ranking the summary-article pairs according to its quality.

+ HaRiM+ is proved effective for benchmarking summarization systems (system-level performance) as well as ranking the article-summary pairs (segment-level performance) in comprehensive aspect such as factuality, consistency, coherency, fluency, and relevance. For details, refer to our [paper]({PAPER_URL}) published in AACL2022.
  """

  _KWARGS_DESCRIPTION = """
 
  `predictions` (list of str): generated summaries
  `references` (list of str): source articles to be summarized
  `use_aggregator` (bool): if True, average of the scores are returned
+ `bsz` (int): batch size for harim to iterate through the given pairs
+ `return_details` (bool): whether to show more than harim+ score (returns logppl, harim term. refer to the paper for detail)
+ `tokenwise_score` (bool): whether to show tokenwise scores for input pairs (if return_details=False, this is ignored)

  Returns:
+ 'results' (list of float): harim+ score for each summary-article pair

  Examples:
  >>> summaries = ["hello there", "hello there"]
 
  inputs_description=_KWARGS_DESCRIPTION,
  features=datasets.Features(
  {
+ "predictions (summaries)": datasets.Value("string", id="sequence"),
+ "references (articles)": datasets.Value("string", id="sequence"),
  }
  ),
  codebase_urls=[CODEBASE_URL],
 
  references=None,
  use_aggregator=False,
  bsz=32,
+ tokenwise_score=False,
+ return_details=False):
  summaries = predictions
  articles = references
+ scores = self.scorer.compute(predictions=summaries, references=articles, use_aggregator=use_aggregator, bsz=bsz, tokenwise_score=tokenwise_score, return_details=return_details)
  return scores
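
The diff only forwards `return_details` to `self.scorer.compute`; the scorer itself is not shown. As a rough illustration of the flag behaviour the updated docstring describes (plain HaRiM+ scores by default, extra terms only when `return_details=True`, and `tokenwise_score` ignored otherwise), here is a stand-in sketch. `select_outputs` and the `details` dict are hypothetical. The keys `'harim+'`, `'harim'` and `'log_ppl'` are borrowed from the removed docstring, and `'tokenwise'` is an invented placeholder key, so none of this should be read as the scorer's actual implementation.

```python
def select_outputs(details, return_details=False, tokenwise_score=False):
    """Illustrative stand-in for the documented flag behaviour.

    `details` is a hypothetical dict holding everything a scorer computed.
    Key names 'harim+', 'harim', 'log_ppl' follow the removed docstring;
    'tokenwise' is an invented placeholder, not a confirmed key.
    """
    if not return_details:
        # Default after this commit: only the HaRiM+ scores are returned,
        # and tokenwise_score has no effect.
        return details["harim+"]

    # Detailed mode: expose the terms behind the final score as well.
    selected = {key: details[key] for key in ("harim+", "harim", "log_ppl")}
    if tokenwise_score:
        selected["tokenwise"] = details["tokenwise"]
    return selected
```

Under these assumptions, calling `select_outputs(details)` yields the same list shape that `pprint(scores)` prints in the updated README, while `select_outputs(details, return_details=True)` yields a dict similar to the pre-commit return value.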