Chenxi Whitehouse committed on
Commit
2b35800
1 Parent(s): f50b47e

add new files, fix typo

README.md CHANGED
@@ -32,8 +32,14 @@ bash script/scraper.sh <split> <start_idx> <end_idx>
  # e.g., bash script/scraper.sh dev 0 500
  ```
 
- ### Rank the sentences in the knowledge store with BM25
- See [bm25_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/bm25_sentences.py) for more args
+ ### Rank the sentences in the knowledge store with BM25, keep top 100 sentences for each claim
+ See [bm25_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/bm25_sentences.py) for more argument options.
  ```
  python -m src.reranking.bm25_sentences
+ ```
+
+ ### Generate questions for each evidence sentence
+ We use [BLOOM](https://huggingface.co/bigscience/bloom-7b1) to generate questions for each evidence sentence using the closest examples from the training set. See [question_generation_top_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/question_generation_top_sentences.py) for more argument options.
+ ```
+ python -m retrieval_reranking.question_generation_top_sentences
  ```
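The BM25 step in the README hunk above keeps the top 100 sentences per claim. The following is a minimal sketch of that idea, not the repository's `bm25_sentences.py`: it assumes a plain Okapi BM25 score, a naive regex tokenizer, and a hypothetical helper name `bm25_top_k`. The record layout (`claim_id`, `claim`, `top_100` with `sentence` and `url`) follows the lines added in `data_store/dev_top_k.json`.

```python
import json
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def bm25_top_k(claim, candidates, k=100, k1=1.5, b=0.75):
    """Rank candidate sentences against a claim with Okapi BM25; keep the best k.

    `candidates` is a list of {"sentence": ..., "url": ...} dicts.
    `bm25_top_k`, `k1` and `b` are illustrative choices, not the repository's.
    """
    docs = [tokenize(c["sentence"]) for c in candidates]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs

    # Number of candidate sentences that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    def idf(term):
        n = doc_freq.get(term, 0)
        return math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)

    query_terms = tokenize(claim)
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation; fall back to k1 if every sentence is empty.
        norm = k1 * (1 - b + b * len(d) / avg_len) if avg_len else k1
        score = 0.0
        for term in query_terms:
            f = tf.get(term, 0)
            if f:
                score += idf(term) * f * (k1 + 1) / (f + norm)
        scores.append(score)

    # Highest score first; ties keep their original order.
    order = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:k]]


claim = "In a letter to Steve Jobs, Sean Connery refused to appear in an apple commercial."
candidates = [
    {"sentence": "Steve Jobs in an Interview with Fortune Magazine, 2000",
     "url": "https://www.stephenfry.com/2011/10/steve-jobs/"},
    {"sentence": "Fake Sean Connery / Steve Jobs letter becomes top Twitter trending topic",
     "url": "https://www.mi6-hq.com/news/index.php?itemid=9532"},
    {"sentence": "Cast: Sean Connery, Ursula Andress, Joseph Wiseman",
     "url": "https://www.afi.com/afis-100-years-100-heroes-villians/"},
]
record = {"claim_id": 0, "claim": claim, "top_100": bm25_top_k(claim, candidates)}
print(json.dumps(record, ensure_ascii=False))  # one JSON object per line
```

This sketch does no stop-word removal or deduplication, so function words such as "in" or "an" also contribute to the score and the same sentence can appear under several URLs, as it does in the data file. The actual script may tokenize and filter differently; see its argument options.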
data_store/dev_top_k.json ADDED
@@ -0,0 +1,2 @@
 
 
 
+ {"claim_id": 0, "claim": "In a letter to Steve Jobs, Sean Connery refused to appear in an apple commercial.", "top_100": [{"sentence": "Also, fake Sean Connery sent a letter to Real Steve Jobs.", "url": "https://www.nbcnews.com/news/world/pre-caffeine-tech-apple-gossip-smart-pugs-flna122578"}, {"sentence": "This is a letter Sean Connery wrote didn't write in response to Steve Jobs after being asked to appear in an Apple ad.", "url": "https://www.businessinsider.com/james-bond-sean-connery-steve-jobs-apple-letter-2011-6"}, {"sentence": "Fake Sean Connery / Steve Jobs letter becomes top Twitter trending topic", "url": "https://www.mi6-hq.com/news/index.php?itemid=9532"}, {"sentence": "Hilarious, though fictional, was this letter from Sean Connery to Steve Jobs released this morning on Scoopertino.", "url": "https://www.splasmata.com/?m=201106"}, {"sentence": "First, the bad news. Sean Connery never actually sent a typewritten letter to Steve Jobs in 1998 refusing to be in an Apple ad.", "url": "https://www.cnet.com/culture/fake-sean-connery-letter-to-steve-jobs-goes-viral/"}, {"sentence": "Pingback: Did Sean Connery Write an Angry Letter to Steve Jobs? | wafflesatnoon.com", "url": "https://web.archive.org/web/20201129141238/https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "Pingback: Did Sean Connery Write an Angry Letter to Steve Jobs? 
| wafflesatnoon.com", "url": "https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "Pingback: Carta de Sean Connery a Steve Jobs — Tecnoculto", "url": "https://web.archive.org/web/20201129141238/https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "Pingback: Carta de Sean Connery a Steve Jobs — Tecnoculto", "url": "https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "'I am f****** James Bond': Sean Connery letter to Steve Jobs rejecting offer to appear in Apple ad revealed to be fake", "url": "https://www.dailymail.co.uk/news/article-2006317/Sean-Connery-letter-Steve-Jobs-rejecting-offer-appear-Apple-ad-revealed-fake.html"}, {"sentence": "Sean Connery eating an apple on set of film Highlander in costume", "url": "https://www.mediastorehouse.com/memory-lane-prints/mirror/0000to0099-00029/sean-connery-eating-apple-set-film-highlander-21300411.html"}, {"sentence": "Sean Connery eating an apple on set of film Highlander in costume", "url": "https://www.mediastorehouse.co.uk/memory-lane-prints/mirror/0000to0099-00029/sean-connery-eating-apple-set-film-highlander-21300411.html"}, {"sentence": "私のジェームズ・ボンド好きとアップル好きのせいで、私のメール箱は、1998 年に Sean Connery から Steve Jobs に送られたという Scoopertino の偽手紙に関するメールで溢れかえっている。これはこういうことだ。この手紙の画像コピーがインターネットを野火の如く流布している。しかし Scoopertino へのリンクは張られていない。ということはなりすましのイタズラはうまくいかなかったということだ。", "url": "https://maclalala2.wordpress.com/2011/06/24/%E3%81%9F%E3%81%8B%E3%81%8C%E3%82%B3%E3%83%B3%E3%83%94%E3%83%A5%E3%83%BC%E3%82%BF%E3%82%BB%E3%83%BC%E3%83%AB%E3%82%B9%E3%83%9E%E3%83%B3%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AB%E3%82%B8%E3%82%A7%E3%83%BC/"}, {"sentence": "Pingback: JamesBondAuction.co.uk – James Bond 007 » Archive » Fake Sean Connery / Steve Jobs letter becomes top Twitter trending topic", "url": "https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "Pingback: JamesBondAuction.co.uk – James Bond 007 » Archive » Fake Sean Connery / Steve 
Jobs letter becomes top Twitter trending topic", "url": "https://web.archive.org/web/20201129141238/https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "Sean Connery eating an apple on set of film Highlander in costume Circa May 1985", "url": "https://www.mediastorehouse.co.uk/memory-lane-prints/mirror/0000to0099-00029/sean-connery-eating-apple-set-film-highlander-21300411.html"}, {"sentence": "Sean Connery eating an apple on set of film Highlander in costume Circa May 1985", "url": "https://www.mediastorehouse.co.uk/memory-lane-prints/mirror/0000to0099-00029/sean-connery-eating-apple-set-film-highlander-21300411.html"}, {"sentence": "Sean Connery eating an apple on set of film Highlander in costume Circa May 1985", "url": "https://www.mediastorehouse.com/memory-lane-prints/mirror/0000to0099-00029/sean-connery-eating-apple-set-film-highlander-21300411.html"}, {"sentence": "Sean Connery eating an apple on set of film Highlander in costume Circa May 1985", "url": "https://www.mediastorehouse.com/memory-lane-prints/mirror/0000to0099-00029/sean-connery-eating-apple-set-film-highlander-21300411.html"}, {"sentence": "Pingback: Sean Connery writes Steve Jobs. - Science Fiction Fantasy Chronicles: forums", "url": "https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "Pingback: Sean Connery writes Steve Jobs. - Science Fiction Fantasy Chronicles: forums", "url": "https://web.archive.org/web/20201129141238/https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "Do you know who I am? 
A faked letter from James Bond star Sir Sean Connery firmly rejected an apparent advertising role from Apple chief Steve Jobs", "url": "https://www.dailymail.co.uk/news/article-2006317/Sean-Connery-letter-Steve-Jobs-rejecting-offer-appear-Apple-ad-revealed-fake.html"}, {"sentence": "An image of a purported 1998 letter from actor Sean Connery (famous for his portrayal of agent James Bond) to Apple CEO Steve Jobs, caustically rebuffing an offer to become a pitchman for Apple Computers, hit the Internet in June 2011.", "url": "https://www.snopes.com/fact-check/false-sean-connery-letter-to-apple/"}, {"sentence": "Ernest Hemingway, Sean Connery, Sigmund Freud, Steve Jobs, Padre Pio, Van Gogh, Giuseppe Verdi, George Clooney, Lenin, Cavour, Garibaldi…", "url": "https://www.isupportstreetart.com/sigmund-freud-test-of-personality-by-chekos-opiemme/"}, {"sentence": "Steve Jobs in an Interview with Fortune Magazine, 2000", "url": "https://www.stephenfry.com/2011/10/steve-jobs/"}, {"sentence": "A 'letter' from Sean Connery to Steve Jobs was a top trending topic on Twitter today thanks to an unwitting tweet from a marketing executive who thought it was genuine.", "url": "https://www.mi6-hq.com/news/index.php?itemid=9532"}, {"sentence": "Thanks to the confluence of my interests and the fact that it’s funny as hell, I’ve been inundated with email regarding Scoopertino’s fake 1998 letter from Sean Connery to Steve Jobs.", "url": "https://maclalala2.wordpress.com/2011/06/24/%E3%81%9F%E3%81%8B%E3%81%8C%E3%82%B3%E3%83%B3%E3%83%94%E3%83%A5%E3%83%BC%E3%82%BF%E3%82%BB%E3%83%BC%E3%83%AB%E3%82%B9%E3%83%9E%E3%83%B3%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AB%E3%82%B8%E3%82%A7%E3%83%BC/"}, {"sentence": "In 1996, she married Jason Connery, son of Sean Connery, with whom she performed in Bullet to Beijing (1995).", "url": "https://www.imdb.com/search/name/?birth_year=1967"}, {"sentence": "Another added: 'Sean Connery's letter to Steve Jobs? 
Well done satire site Scoopertino for fooling so many tweeps .'", "url": "https://www.dailymail.co.uk/news/article-2006317/Sean-Connery-letter-Steve-Jobs-rejecting-offer-appear-Apple-ad-revealed-fake.html"}, {"sentence": "Kellie Pickler, Sean Connery, Others Weigh In On Proper Kilt Etiquette", "url": "https://www.mtv.com/news/mhhftp/kellie-pickler-sean-connery-others-weigh-in-on-proper-kilt-etiquette"}, {"sentence": "Anyway, here is a link to Sean Connery in his own words.....", "url": "https://www.mumsnet.com/Talk/_chat/4066060-Sean-Connery-is-not-a-legend"}, {"sentence": "Though Steve had a thing for Sean Connery, the feeling was not mutual. Connery was appalled by the “advert” Jobs sent across the pond and declined to participate in the misadventure on at least three separate occasions.", "url": "https://maclalala2.wordpress.com/2011/06/24/%E3%81%9F%E3%81%8B%E3%81%8C%E3%82%B3%E3%83%B3%E3%83%94%E3%83%A5%E3%83%BC%E3%82%BF%E3%82%BB%E3%83%BC%E3%83%AB%E3%82%B9%E3%83%9E%E3%83%B3%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AB%E3%82%B8%E3%82%A7%E3%83%BC/"}, {"sentence": "Though Steve had a thing for Sean Connery, the feeling was not mutual. Connery was appalled by the “advert” Jobs sent across the pond and declined to participate in the misadventure on at least three separate occasions.", "url": "https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "Though Steve had a thing for Sean Connery, the feeling was not mutual. Connery was appalled by the “advert” Jobs sent across the pond and declined to participate in the misadventure on at least three separate occasions.", "url": "https://web.archive.org/web/20201129141238/https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "such an assembly to the duke, but James refused. 
In retaliation, mer-", "url": "https://rockinst.org/wp-content/uploads/2017/10/New-York-State-Government-Second-Edition.pdf"}, {"sentence": "Thousands of James Bond fans were today taken in by a spoof letter from Sean Conney to Apple boss Steve Jobs in which the film star launches a rant at the computer chief.", "url": "https://www.dailymail.co.uk/news/article-2006317/Sean-Connery-letter-Steve-Jobs-rejecting-offer-appear-Apple-ad-revealed-fake.html"}, {"sentence": "Steve Jobs, a lifelong fan of James Bond (he'd originally wanted to name the revolutionary computer \"Double-O-Mac\"), instructed his agency to begin work on a special celebrity Christmas ad featuring 007 himself, Sean Connery — even though Connery had yet to be signed.", "url": "https://www.snopes.com/fact-check/false-sean-connery-letter-to-apple/"}, {"sentence": "Starring Lorraine Bracco, Sean Connery, José Wilker", "url": "https://tv.apple.com/us/movie/medicine-man/umc.cmc.4okbho8b3z0zwwsq45dju39vk"}, {"sentence": "The Macintosh was named after a type (or, more appropriately for a company run by Steve Jobs, a “cultivar”) of apple.", "url": "https://www.techadvisor.com/article/725333/apple-a-z-everything-you-need-to-know-about-apple.html"}, {"sentence": "Dieser war damit nicht ganz einverstanden und das ging ihm offenbar gehörig auf den Saque. 
Also schrieb Sean Connery einen nicht ganz freundlichen Brief an Steve Jobs.", "url": "https://www.kraftfuttermischwerk.de/blogg/james-bonds-brief-an-steve-jobs/"}, {"sentence": "Steve Jobs, a lifelong fan of James Bond (he’d originally wanted to name the revolutionary computer “Double-O-Mac”), instructed his agency to begin work on a special celebrity Christmas ad featuring 007 himself, Sean Connery — even though Connery had yet to be signed.", "url": "https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "Steve Jobs, a lifelong fan of James Bond (he’d originally wanted to name the revolutionary computer “Double-O-Mac”), instructed his agency to begin work on a special celebrity Christmas ad featuring 007 himself, Sean Connery — even though Connery had yet to be signed.", "url": "https://web.archive.org/web/20201129141238/https://scoopertino.com/exposed-the-imac-disaster-that-almost-was/"}, {"sentence": "Steve Jobs, a lifelong fan of James Bond (he’d originally wanted to name the revolutionary computer “Double-O-Mac”), instructed his agency to begin work on a special celebrity Christmas ad featuring 007 himself, Sean Connery — even though Connery had yet to be signed.", "url": "https://maclalala2.wordpress.com/2011/06/24/%E3%81%9F%E3%81%8B%E3%81%8C%E3%82%B3%E3%83%B3%E3%83%94%E3%83%A5%E3%83%BC%E3%82%BF%E3%82%BB%E3%83%BC%E3%83%AB%E3%82%B9%E3%83%9E%E3%83%B3%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AB%E3%82%B8%E3%82%A7%E3%83%BC/"}, {"sentence": "Sean Connery has been in the news of late: First there was the “gay kiss” (more on that later), then The Donald (a.k.a. 
Donald Trump) announced he wanted Connery to open his Scottish golf complex, and now comes a letter – fake, but worth reading all the same – “From the Desk of Sean Connery,” telling Apple’s computer salesman Steve Jobs to get lost for good.", "url": "https://www.altfg.com/film/warren-beatty-howard-hughes/"}, {"sentence": "In what may be the most exciting James Bond/Apple crossover since the famous fake letter from Sean Connery to Steve Jobs, style icon James Bond cosplaying as Apple’s late CEO is perhaps the best compliment Apple can be paid as it continues to take on the fashion world.", "url": "https://www.cultofmac.com/316087/what-do-steve-jobs-and-james-bond-have-in-common-turtlenecks-black-turtlenecks/"}, {"sentence": "Though Connery became known as Sir Sean Connery, it was a bumpy road to his eventual knighting in 2000.", "url": "https://www.looper.com/246825/the-untold-truth-of-sean-connery/"}, {"sentence": "According to resources including documented first-person interviews, TIME magazine and Walter Isaacson, Steve Jobs’ biographer, here is a mosaic of Steve Jobs’ sample day:", "url": "https://owaves.com/day-plan/day-life-steve-jobs/"}, {"sentence": "played by Sean Connery, Jack Nicholson, or Harrison ", "url": "https://www.brooklaw.edu/-/media/Brooklaw/News-Events/Brooklyn-Law-Notes/Legacy-Issues/PDFs/LawNotesFall2012.pdf"}, {"sentence": "Producer: Rhonda Tollefson, Michael Hertzberg, Sean Connery", "url": "https://www.rottentomatoes.com/m/entrapment"}, {"sentence": "Cast: Sean Connery, Ursula Andress, Joseph Wiseman", "url": "https://www.afi.com/afis-100-years-100-heroes-villians/"}, {"sentence": "Cast: Sean Connery, Ursula Andress, Joseph Wiseman", "url": "https://www.afi.com/afis-100-years-100-heroes-villians/"}, {"sentence": "A guy called Jony Ive, an Englishman, was hand-in-glove with Steve Jobs. 
The first apple instrument that was not white was the U2 black and red iPod.", "url": "https://thecurrency.news/articles/27183/u2-and-me-paul-mcguinness-on-russian-oligarchs-in-the-riviera-cutting-deals-with-steve-jobs-and-taking-financial-advice-from-bono/"}, {"sentence": "In your natal chart, Sean Connery, the ten main planets are distributed as follows:", "url": "https://www.astrotheme.com/astrology/Sean_Connery"}, {"sentence": "In 1977 Steve Jobs founded Apple together with Steve Wozniak, Ronald Wayne and Mike Markkula. In 1985 Jobs resigned from Apple after losing a struggle with the board of directors.", "url": "https://creativecriminals.com/celebrities/apple/think-different"}, {"sentence": "Sean Connery, albeit his hirsute moobs have a longer screentime ", "url": "https://commanderbond.net/wp-content/uploads/2013/12/The-007th-Minute-Ebook-Edition.pdf"}, {"sentence": "Elements, Modes and House Accentuations for Sean Connery", "url": "https://www.astrotheme.com/astrology/Sean_Connery"}, {"sentence": "Sean Connery has said that he refused to give up his favourite wine despite doctors' calls for him to quit drinking.", "url": "https://www.digitalspy.com/showbiz/a181395/connery-refused-to-quit-drinking/"}, {"sentence": "his complaints in a letter to the president.\" In the letter, Chennault said he ", "url": "https://history.army.mil/html/books/068/68-4/CMH_Pub_68-4.pdf"}, {"sentence": "appearance, a lot. Because Costner appeared with Sean Connery in The Untoucha-", "url": "https://monoskop.org/images/7/7b/Lovink_Geert_Rasch_Miriam_eds_Unlike_Us_Reader_Social_Media_Monopolies_and_Their_Alternatives.pdf"}, {"sentence": "RIP Steve Jobs, thanks for everything. 
You have been an inspiration to my entrepreneurial career.", "url": "https://news.ycombinator.com/item?id=3078128"}, {"sentence": "In 2008, Joe Nocera was working on a column about Steve Jobs' health, criticizing Jobs and Apple for keeping it a secret from investors.", "url": "https://www.businessinsider.com/steve-jobs-jerk-2011-10"}, {"sentence": "or the late Apple founder Steve Jobs, few of us will ever talk to an audience of ", "url": "https://www.cag.edu.tr/uploads/site/lecturer-files/mary-guffey-essentials-of-business-communication-2016-yzss.pdf"}, {"sentence": "Dressed in his character's costume, Connery exudes an air of effortless charm and sophistication as he indulges in a crisp apple.", "url": "https://www.mediastorehouse.com/memory-lane-prints/mirror/0000to0099-00029/sean-connery-eating-apple-set-film-highlander-21300411.html"}, {"sentence": "Dressed in his character's costume, Connery exudes an air of effortless charm and sophistication as he indulges in a crisp apple.", "url": "https://www.mediastorehouse.co.uk/memory-lane-prints/mirror/0000to0099-00029/sean-connery-eating-apple-set-film-highlander-21300411.html"}, {"sentence": "In what film did Sean Connery sing Pretty Irish Girl", "url": "https://www.scoutingpolaris.nl/downloads/spellen/10.000vragen.pdf"}, {"sentence": "The next major stumbling block was in choosing an actor to read the script. 
Siltanen wanted Robin Williams but he refused to do any advertising, even after Jobs attempted to call him personally (his wife refused to put Jobs through).", "url": "https://www.creativereview.co.uk/apple-think-different-slogan/"}, {"sentence": "in an open letter to the Department of Social Work, “In this technological age, when ", "url": "https://wne.edu/university-archives/doc/WNE_History.pdf"}, {"sentence": "rockets eaten whole, Sean Connery with a camera on his head, Fifty ", "url": "https://commanderbond.net/wp-content/uploads/2013/12/The-007th-Minute-Ebook-Edition.pdf"}, {"sentence": "Excerpts from an Oral History Interview with Steve Jobs", "url": "https://americanhistory.si.edu/comphist/sj1.html"}, {"sentence": "Steve Jobs got his start in business with another Steve, Steve Wozniak, building the blue boxes phone phreakers used to make free calls across the nation.", "url": "https://www.investopedia.com/articles/fundamental-analysis/12/steve-jobs-apple-story.asp"}, {"sentence": "Sean Connery writes me a letter and gives a phone call to Disney and says, Guys, you know what? I think I'm too old for this part.", "url": "http://www.cigaraficionado.com/article/an-interview-with-arnon-milchan-6231"}, {"sentence": "first glance appear to be a commodity, undifferentiated product, in an attempt to improve ", "url": "https://colbournecollege.weebly.com/uploads/2/3/7/9/23793496/essentials_of_marketing_3e.pdf"}, {"sentence": "In 1997, the year Steve Jobs returned as CEO, the company successfully managed to rebrand Apple as a product for independent thinkers.", "url": "https://www.businessinsider.com/apple-history-through-advertising-40-years-anniversary-2016-3"}, {"sentence": "- ^ Walter Isaacson, Steve Jobs, Simon & Schuster, 2011", "url": "https://en.wikipedia.org/wiki/IPod_advertising"}, {"sentence": "That’s precisely what we’ve done with Steve Jobs. 
Well, to an extent.", "url": "https://georgejziogas.medium.com/lets-stop-worshipping-steve-jobs-and-people-like-him-a99f2a7caa00"}, {"sentence": "Connery. In Victorian England, a criminal plans to", "url": "https://www.dominionpost.com/wp-content/uploads/paperpages/DP-2013-02-24.pdf"}, {"sentence": "Dominants: Planets, Signs and Houses for Sean Connery", "url": "https://www.astrotheme.com/astrology/Sean_Connery"}, {"sentence": "When it comes to Steve Jobs, there's the \"Good Steve,\" and then, there's the \"Bad Steve,\" says biographer Walter Isaacson.", "url": "https://www.businessinsider.com/steve-jobs-jerk-2011-10"}, {"sentence": "One would think that the only thing 007 Sean Connery has in common with Apple co-founder Steve Jobs is a penchant for cool gadgets but this morning’s tweets proved otherwise.", "url": "https://www.telegraph.co.uk/culture/film/jamesbond/8589096/Fake-Sean-Connery-letter-to-Steve-Jobs-becomes-Twitter-sensation.html"}, {"sentence": "In 1997 Steve Jobs returned to Apple, called the cloners “leeches” and killed the whole thing.", "url": "https://www.techadvisor.com/article/725333/apple-a-z-everything-you-need-to-know-about-apple.html"}, {"sentence": "With respect to Steve Jobs, his family, and the tremendous legacy he created.", "url": "https://owaves.com/day-plan/day-life-steve-jobs/"}, {"sentence": "Apparently Sean Connery is a bigger deal than Steve Jobs across the pond too. 
And apparently, this isn't the only difference in the lists from the sequel that is already hitting theaters around the world.", "url": "https://www.firstshowing.net/2014/check-out-steve-rogers-varying-to-do-lists-from-captain-america-2/"}, {"sentence": "His goal, according to Walter Isaacson's biography \"Steve Jobs,\" was to build an enduring company that prioritized people.", "url": "https://www.cnbc.com/2019/10/05/apple-ceo-steve-jobs-technology-is-nothing-heres-what-it-takes-to-achieve-great-success.html"}, {"sentence": "And Bono formed a very deep friendship with Steve Jobs and had known him already before this. And Steve Jobs, a big music fan…", "url": "https://thecurrency.news/articles/27183/u2-and-me-paul-mcguinness-on-russian-oligarchs-in-the-riviera-cutting-deals-with-steve-jobs-and-taking-financial-advice-from-bono/"}, {"sentence": "forwarding to watch an appealing or novel commercial. In addition, longer commercials", "url": "https://steladhima.files.wordpress.com/2014/03/consumer-behavior.pdf"}, {"sentence": "- Steve Jobs Discovers the Macintosh Project, Mac History", "url": "https://www.bahcall.com/tim-ferriss-garry-kasparov-and-the-secret-weapon-of-a-world-champion-chess-player/"}, {"sentence": "and entertainers such as Sammy Davis, Jr., Sean Connery, Dean Martin, ", "url": "https://nibmehub.com/opac-service/pdf/read/Designing%20Clothes%20Culture%20and%20Organization%20of%20the%20Fashion%20Industry.pdf"}, {"sentence": "Jobs, S. 2007, September 6. \"Steve Jobs' Letter to iPhone Customers.\" The Wall Street Journal. 
", "url": "https://www.augie.edu/sites/default/files/u57/pdf/jaciel_subdocs/iPhone.pdf"}, {"sentence": "Murray plays an out-of-luck American actor who goes to Japan to advertise whiskey, following in the footsteps of Mickey Rourke, Sammy Davis Junior and, again, Sean Connery.", "url": "http://news.bbc.co.uk/2/hi/entertainment/3326137.stm"}, {"sentence": "Sean Connery, Actor And The Original James Bond, Dies At 90", "url": "https://www.npr.org/2020/10/31/521703453/sean-connery-actor-and-the-original-james-bond-dies-at-90"}, {"sentence": "Sean Connery, Actor And The Original James Bond, Dies At 90", "url": "https://www.npr.org/2020/10/31/521703453/sean-connery-actor-and-the-original-james-bond-dies-at-90"}, {"sentence": "interviews with, and publicity about, Sean Connery and Roger Moore;", "url": "https://files.eric.ed.gov/fulltext/ED355523.pdf"}, {"sentence": "Sean married actress Diane Cilento in 1962 and they had Sean's only child, Jason Connery, born on January 11, 1963.", "url": "https://www.imdb.com/name/nm0000125/"}, {"sentence": "Sean married actress Diane Cilento in 1962 and they had Sean's only child, Jason Connery, born on January 11, 1963.", "url": "https://www.imdb.com/name/nm0000125/"}, {"sentence": "Sam Neill, who appeared with Connery in The Hunt for Red October, tweeted: “Every day on set with Sean Connery was an object lesson in how to act on screen.", "url": "https://www.theguardian.com/film/2020/oct/31/sean-connery-james-bond-actor-dies-aged-90"}, {"sentence": "In a letter to his wife, a Confederate soldier who witnessed ", "url": "https://www.wsfcs.k12.nc.us/cms/lib/NC01001395/Centricity/Domain/2407/A%20Pocket%20Style%20Manual%20-%20Diana%20Hacker.pdf"}, {"sentence": "the teacher’s desk is an apple. 
The task, in other words, is to recognize an", "url": "https://www.tribuneschoolchd.com/uploads/tms/files/1595167242-the-creative-mind-pdfdrive-com-.pdf"}, {"sentence": "Steve Jobs, he stated, avoided using people in his ads because it was difficult to find an actor who appealed to everyone.[2]", "url": "https://en.wikipedia.org/wiki/IPod_advertising"}, {"sentence": "For much more on Sean Connery, please scroll down.", "url": "https://www.closerweekly.com/posts/sean-connery-movies-your-guide-to-the-actors-life-and-career/"}, {"sentence": "Sharma, A., Wingfield, N., & Yuan, L. 2007, February 17. \"How Steve Jobs Played Hardball In iPhone ", "url": "https://www.augie.edu/sites/default/files/u57/pdf/jaciel_subdocs/iPhone.pdf"}]}
+ {"claim_id": 1, "claim": "Trump Administration claimed songwriter Billie Eilish Is Destroying Our Country In Leaked Documents", "top_100": [{"sentence": "Billie Eilish Is \"Destroying Our Country,\" Leaked Trump Memo Says", "url": "https://www.nylon.com/entertainment/billie-eilish-trump-covid-ad"}, {"sentence": "The Trump Administration Claimed That Billie Eilish Is \"Destroying the Country\"", "url": "https://finance.yahoo.com/news/trump-administration-claimed-billie-eilish-214141314.html"}, {"sentence": "The Trump Administration Claimed That Billie Eilish Is “Destroying the Country”", "url": "https://finance.yahoo.com/news/trump-administration-claimed-billie-eilish-214141314.html"}, {"sentence": "Trump Administration Official Accused Billie Eilish of 'Destroying Our Country'", "url": "https://www.justjared.com/2020/10/29/trump-administration-official-accused-billie-eilish-of-destroying-our-country/"}, {"sentence": "Leaked Trump Admin Document Describes Billie Eilish as 'Destroying Our Country and Everything We Care About'", "url": "https://money.yahoo.com/leaked-trump-admin-document-describes-191559664.html"}, {"sentence": "Leaked Trump Admin Document Describes Billie Eilish as 'Destroying Our Country and Everything We Care About'", "url": "https://finance.yahoo.com/news/leaked-trump-admin-document-describes-191559664.html"}, {"sentence": "Billie Eilish Trashes Trump for 'Destroying Our Country' Before DNC Performance of 'My Future'View Story", "url": "https://toofab.com/2020/08/21/julia-louis-dreyfus-dnc-jokes/"}, {"sentence": "\"Our troops deserve better. Our country deserves better.\" Billie Eilish Says Trump Is 'Destroying Our Country and Everything We Care About' in DNC Remarks Sen. 
Tammy Duckworth.", "url": "https://people.com/politics/tammy-duckworth-trump-shouldnt-be-president-for-another-4-minutes/"}, {"sentence": "Leaked Trump Admin Docs Rule Out Billie Eilish For Ad Campaign", "url": "https://www.stereogum.com/2104276/billie-eilish-is-destroying-our-country-and-everything-we-care-about-according-to-leaked-trump-admin-doc/news/"}, {"sentence": "Political Violence Is Destroying Our Country. Former Antifa Member Reveals How Young People Are Drawn to the Domestic Terror Group", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Communists: The Anti-American Democrats Are Destroying Our Country’s Power and Prestige", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Leaked Email Shows US Military Is Secretly Moving Illegal Immigrants Around the Country", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Romney: Trump’s Impeachment Is Important to Bring ‘Unity in Our Country’", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Billie Eilish Is At The Epicentre Of A New Youthquake In Pop", "url": "https://www.gq-magazine.co.uk/culture/article/billie-eilish-interview"}, {"sentence": "The U.S. 
Is Becoming A Third World Country Right Before Our Eyes", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "BREAKING: Leaked Google Documents Link Holocaust Denial", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Trump Tells GSA To Allow Biden Transition To Proceed \"In The Best Interest Of Our Country\"", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Our System Is Crumbling Right In Front Of Our Eyes", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "The World Is Contemplating a Second Trump Administration", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Leaked Documents Outline DHS’s Plans to Police Disinformation", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Woke Is A Tyranny Destroying Freedom – On The Way To Destroying Itself", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "America in Decline on Many Fronts: Our Political System Is Broken, Our Industrial Base Is Vanishing, Our Education System Is In Shambles", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "A recent bizarre leaked document from the Trump Administration reveals that Billie Eilish was among the celebrities considered for a pro-Trump coronavirus campaign.", "url": "https://www.altpress.com/tags/billie-eilish/page/6/"}, {"sentence": "Leaked Facebook Documents Show Discrimination Against Conservatives and Intent to Influence Elections", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Billie Eilish Pirate Baird O'Connell (born December 18, 2001), known professionally as Billie Eilish, is an American singer and songwriter born and raised in Los Angeles, California.", "url": "https://genius.com/artists/Billie-eilish"}, {"sentence": "Billie Eilish Revealed Which Of Her Songs Is Partially About Olivia Rodrigo", "url": 
"https://www.buzzfeed.com/tag/billie-eilish"}, {"sentence": "Ivermectin 'Works Throughout All Phases' Of COVID According To Leaked Military Documents", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "A Trump Indictment Over Mishandling Classified Documents Is Now a Very Real Possibility", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Billie Eilish Pirate Baird O’Connell, (born December 18, 2001) known professionally as Billie Eilish, is an American singer and songwriter born and raised in Los Angeles, CA.", "url": "https://madeinatlantis.com/2020/01/28/billie-eilish-career-milestones/"}, {"sentence": "I. Trump Administration Blueprint In Brief | 11 | ", "url": "https://www.hhs.gov/sites/default/files/AmericanPatientsFirst.pdf"}, {"sentence": "I. Trump Administration Blueprint In Brief | 9 | ", "url": "https://www.hhs.gov/sites/default/files/AmericanPatientsFirst.pdf"}, {"sentence": "Woke Is A Tyranny Destroying Freedom – On The Way To Destroying Itself – David Icke Dot-Connector", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "College Director: \"Every White Person In This Country Is Racist\"", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Related: Billie Eilish: Trump is 'destroying' the US", "url": "https://finance.yahoo.com/news/leaked-trump-admin-document-describes-191559664.html"}, {"sentence": "Related: Billie Eilish: Trump is 'destroying' the US", "url": "https://money.yahoo.com/leaked-trump-admin-document-describes-191559664.html"}, {"sentence": "Chinese Officials Trying to Dodge COVID-19 Vaccinations, Citing Health Reasons: Leaked Documents", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Billie Eilish Sweeps Grammys In Ceremony Clouded By Controversy And Mourning", "url": "https://www.npr.org/2020/01/27/799879297/billie-eilish-sweeps-grammys-in-ceremony-clouded-by-controversy-and-mourning"}, 
{"sentence": "Rand Paul: Trump Administration Is Giving 'Contradictory Information' on Soleimani Killing", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "According to a high-ranking Trump Administration official, Billie Eilish, and several other celebrities, are \"destroying the country and everything we care about\".", "url": "https://www.hotnewhiphop.com/337775-trump-admin-says-billie-eilish-is-destroying-the-country-in-leaked-docs-news"}, {"sentence": "Left Is Churning Through Our Institutions, Constitution to Get Trump", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Leaked Documents Reveal Homeland Security’s ‘Expansive’ Influence Over Social Media Censorship", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "A Washington Post story wrongly claimed the Trump administration accused Billie Eilish of “destroying our country” — then spread like wildfire among the entertainment industry.", "url": "https://nypost.com/2020/10/30/washington-post-wrongly-claims-trump-officials-criticized-billie-eilish/"}, {"sentence": "A Washington Post story wrongly claimed the Trump administration accused Billie Eilish of “destroying our country” — then spread like wildfire among the entertainment industry.", "url": "https://web.archive.org/web/20201101145631/https://nypost.com/2020/10/30/washington-post-wrongly-claims-trump-officials-criticized-billie-eilish/"}, {"sentence": "A Washington Post story wrongly claimed the Trump administration accused Billie Eilish of \"destroying our country\" — then spread like wildfire among the entertainment industry.", "url": "https://nypost.com/2020/10/30/"}, {"sentence": "Trump's Alleged Beef With Billie Eilish, WaPo Report Under Scrutiny", "url": "https://www.tmz.com/2020/10/29/billie-eilish-trump-document-ripped-leaked-destroying-our-country/"}, {"sentence": "The Crimes Committed Against Our Country by Corrupt Elites", "url": 
"https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Just shy of her 18th birthday, singer and songwriter Billie Eilish has already achieved what many performers aspire to for their entire careers.", "url": "https://fherehab.com/learning/billie-eilish-her-depression-mental-health-struggles/"}, {"sentence": "How The Obama Administration Set In Motion Its Coup Against Trump", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "\"This Is For You, Dad\": Redditor Shares Heartbreaking Reason For Destroying Short-Sellers In WSB Raids", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "\"This Is For You, Dad\": Redditor Shares Heartbreaking Reason For Destroying Short-Sellers In WSB Raids", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Alex Baker alexandra@highrisepr.com Billie Eilish’s tracks What Was I Made For? [From The Motion Picture \"Barbie\"] by Billie Eilish published on 2023-07-13T16:47:56Z hotline (edit) by Billie Eilish published on 2023-05-10T01:45:24Z The 30th by Billie Eilish published on 2022-07-21T18:16:25Z TV by Billie Eilish published on 2022-07-21T18:14:06Z Getting Older by Billie Eilish published on 2021-07-29T18:01:46Z Male Fantasy by Billie Eilish published on 2021-07-29T17:58:53Z Happier Than Ever by Billie Eilish published on 2021-07-29T17:57:15Z Oxytocin by Billie Eilish published on 2021-07-29T17:56:46Z Halley's Comet by Billie Eilish published on 2021-07-29T17:56:02Z I Didn't Change My Number by Billie Eilish published on 2021-07-29T17:56:00Z", "url": "https://soundcloud.com/billieeilish"}, {"sentence": "Photos: Getty Posted to: Billie Eilish, Donald Trump, Newsies", "url": "https://www.justjared.com/2020/10/29/trump-administration-official-accused-billie-eilish-of-destroying-our-country/"}, {"sentence": "Trump officials accuse Billie Eilish of “destroying” America in leaked report", "url": 
"https://www.altpress.com/tags/billie-eilish/page/6/"}, {"sentence": "Trump when bad guy by Billie Eilish comes on the radio: pic.twitter.com/mR9GcLJDtO", "url": "https://happymag.tv/leaked-trump-docs-billie-eilish-destroying-america/"}, {"sentence": "Billie Eilish Trump Beef Or Misquote??? WaPo Report Under Scrutiny", "url": "https://www.tmz.com/2020/10/29/billie-eilish-trump-document-ripped-leaked-destroying-our-country/"}, {"sentence": "How Billie Eilish Went From Bedroom Musician To Global Icon In 8 Steps", "url": "https://www.udiscovermusic.com/stories/billie-eilish-introduction/"}, {"sentence": "Is America Turning Into a Communist Country? What We Are Going to do When They Come For Our Freedom Of Speech And Our Freedom to Bear Arms?", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Is America Turning Into a Communist Country? What We Are Going to do When They Come For Our Freedom Of Speech And Our Freedom to Bear Arms?", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "The Ongoing Displacement of Americans Out of Our Own Country", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Meet NBC News’ Brandy Zadrozny — The Woman In Charge of Doxxing and Destroying Trump Supporters", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "In 2018 the Trump Administration continued to advance discriminatory policies that under-", "url": "https://fenwayhealth.org/wp-content/uploads/Trump-Administration-Impact-on-LGBTs-Year-Two-Brief_Web.pdf"}, {"sentence": "Leaked Documents Show, NBC NEWS (Apr. 16, 2019), https://www.nbcnews.com/tech/social-media/mark-zuckerberg-", "url": "https://democrats-judiciary.house.gov/uploadedfiles/competition_in_digital_markets.pdf"}, {"sentence": "Leaked Documents Show, NBC NEWS (Apr. 
16, 2019), https://www.nbcnews.com/tech/social-media/mark-zuckerberg-", "url": "https://democrats-judiciary.house.gov/uploadedfiles/competition_in_digital_markets.pdf"}, {"sentence": "Leaked Documents Show, NBC NEWS (Apr. 16, 2019), https://www.nbcnews.com/tech/social-media/mark-zuckerberg-", "url": "https://democrats-judiciary.house.gov/uploadedfiles/competition_in_digital_markets.pdf"}, {"sentence": "“Black Myself,” Amythyst Kiah, songwriter (Our Native Daughters)", "url": "https://www.dailynews.com/2019/11/20/billie-eilish-lizzo-expected-to-dominate-grammy-categories/"}, {"sentence": "Withholding Our Cash And Our Business Is Our Ultimate Weapon Against Woke", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Mary Trump: ‘Insurrection Is Very Possibly Tied’ with Donald ‘Stealing’ Classified Documents", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "The US Military/Security Complex Is Destroying Both Peace and the US Economy", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Is Slack Destroying American Companies? 
Q&A With Antonio Garcia-Martinez", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "In 2018, the Trump Administration continued to remove sexual orientation and gender ", "url": "https://fenwayhealth.org/wp-content/uploads/Trump-Administration-Impact-on-LGBTs-Year-Two-Brief_Web.pdf"}, {"sentence": "or transgender status.19 In 2017, the Trump Administration halted the forward momentum ", "url": "https://fenwayhealth.org/wp-content/uploads/Trump-Administration-Impact-on-LGBTs-Year-Two-Brief_Web.pdf"}, {"sentence": "This Video Of Dua Lipa Being Accidentally Snubbed By Billie Eilish Is Making People “Physically Cringe”", "url": "https://www.buzzfeed.com/tag/billie-eilish"}, {"sentence": "J.Lo, Billie Eilish among stars rejected from Trump's coronavirus ad for criticizing president", "url": "https://www.yahoo.com/entertainment/donald-trump-coronavirus-ad-campaign-rejected-billie-eilish-jennifer-lopez-220508625.html"}, {"sentence": "Billie Eilish And Jesse Rutherford Have Made Their Red Carpet Debut In An All-Gucci Look", "url": "https://www.buzzfeed.com/tag/jesse-rutherford"}, {"sentence": "Billie Eilish And Jesse Rutherford Have Made Their Red Carpet Debut In An All-Gucci Look", "url": "https://www.buzzfeed.com/tag/billie-eilish"}, {"sentence": "Q is destroying the GOP: QAnon Is Destroying the GOP From Within", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "This is excellent writing but sad and true: Our Destroyed Country", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Our Country Has Been Stolen and Republicans Did Not Prevent the Theft", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "AT&T Declares Our Country's Problem to Be White Americans", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Google Sent Police to ‘SWAT’ a Whistleblower When He Leaked Their Documents Proving Interference with US Elections", "url": 
"https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "01:25 GMT – Billie Eilish debuts new song, says Trump is ‘destroying’ US", "url": "https://www.aljazeera.com/news/2020/8/20/kamala-harris-makes-history-as-vice-presidential-candidate"}, {"sentence": "Trump admin says Billie Eilish is \"destroying our country and everything we care about\"", "url": "https://www.reddit.com/r/popheads/comments/jkdyi8/trump_admin_says_billie_eilish_is_destroying_our/"}, {"sentence": "\"Our Legal System Is Corrupt\" - Trump Responds After Sussman 'FBI-Russia-Hoax-Lie' Acquittal", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Hirono: We See the ‘Spectacle’ of Trump ‘Acting Like He’s the Dictator of Our Country’", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "\"Our System Is Collapsing In Real Time\": Tucker Carlson Gives Bombshell Interview", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "\"Our System Is Collapsing In Real Time\": Tucker Carlson Gives Bombshell Interview", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "A Bank With $49 Trillion In Derivatives Exposure Is Melting Down Before Our Eyes", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "A Stunning 10 Million Illegals Have Entered The US Under Biden; Tucker Warns They Are \"Destroying\" The Country", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "Billie, 20, confirmed her relationship with the 31-year-old singer and songwriter around Halloween.", "url": "https://www.buzzfeed.com/tag/billie-eilish"}, {"sentence": "Billie, 20, confirmed her relationship with the 31-year-old singer and songwriter around Halloween.", "url": "https://www.buzzfeed.com/tag/jesse-rutherford"}, {"sentence": "Meet The RVs That Are Literally \"Driving\" Our Country's GDP", "url": "https://healthmasters.com/daily-news-articles-archive"}, 
{"sentence": "In Strategic Bind, Israel Weighs Freeing Hostages Against Destroying Hamas", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "In its second year in office, the Trump Administration continued to enforce and endorse ", "url": "https://fenwayhealth.org/wp-content/uploads/Trump-Administration-Impact-on-LGBTs-Year-Two-Brief_Web.pdf"}, {"sentence": "This will not happen: Trump Reauthorizes Declassification Of All 'Russia Hoax' Documents In Late-Night Tweetstorm", "url": "https://healthmasters.com/daily-news-articles-archive"}, {"sentence": "In particular, outlets from the AV Club to Billboard, from NME and Complex to LoveBScott, all wrote something along the lines of: “Leaked documents show the Trump administration thinks Billie Eilish is ‘destroying our country and everything we care about.’”", "url": "https://medium.com/everythings-interesting/no-the-trump-administration-didnt-accuse-billie-eilish-of-destroying-our-country-3771ad9361b6"}, {"sentence": "Billie Eilish performs song 'My Future' at DNC after ripping President Trump for 'destroying' country", "url": "https://www.usatoday.com/story/entertainment/music/2020/08/19/billie-eilish-debuts-song-my-future-dnc-after-ripping-trump/5613653002/"}, {"sentence": "According to leaked documents published by CNBC, the Trump administration asked Billie Eilish to participate.", "url": "https://www.thefader.com/2020/10/29/report-trump-admin-document-says-billie-eilish-is-destroying-our-country-and-everything-we-care-about"}, {"sentence": "Washington Post wrongly claims Trump officials said Billie Eilish is 'destroying our country'October 30, 2020 |", "url": "https://nypost.com/2020/10/30/"}, {"sentence": "No, the Trump administration didn’t accuse Billie Eilish of “destroying our country”", "url": "https://medium.com/everythings-interesting/no-the-trump-administration-didnt-accuse-billie-eilish-of-destroying-our-country-3771ad9361b6"}, {"sentence": "“Bad Guy,” Billie Eilish O’Connell 
& Finneas O’Connell, songwriters (Billie Eilish)", "url": "https://www.dailynews.com/2019/11/20/billie-eilish-lizzo-expected-to-dominate-grammy-categories/"}]}
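The added `data_store/dev_top_k.json` appears to store one JSON object per line, each with `claim_id`, `claim`, and a `top_100` list of `{sentence, url}` pairs. A minimal sketch of a line-by-line reader follows; the helper name `iter_top_k` and the in-memory sample are illustrative assumptions, not part of the repository.

```python
import io
import json


def iter_top_k(lines, top_k=100):
    """Yield (claim_id, claim, [(sentence, url), ...]) for each non-empty line.

    `lines` is any iterable of strings, e.g. an open file handle.
    """
    key = f"top_{top_k}"  # same key pattern the question-generation script reads
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        pairs = [(item["sentence"], item["url"]) for item in record[key]]
        yield record["claim_id"], record["claim"], pairs


# Tiny in-memory stand-in for the real file (hypothetical content).
sample = io.StringIO(
    json.dumps(
        {
            "claim_id": 0,
            "claim": "Example claim.",
            "top_100": [
                {"sentence": "First sentence.", "url": "https://example.com/a"},
                {"sentence": "Second sentence.", "url": "https://example.com/b"},
            ],
        }
    )
    + "\n"
)

records = list(iter_top_k(sample))
```

With a real file, the `io.StringIO` object would be replaced by `open("data_store/dev_top_k.json", encoding="utf-8")`.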
script/scraper.sh CHANGED
@@ -3,7 +3,7 @@
3
  for ((i=$2;i<$3;i++))
4
  do
5
  echo $i
6
- python -m src.retrieval.scraper_for_knowledge_store -i ../AVeriTeC/data_store/"$1"_store/$i.tsv -o data_store/output_"$1" &
7
  done
8
 
9
  wait
 
3
  for ((i=$2;i<$3;i++))
4
  do
5
  echo $i
6
+ python -m src.retrieval.scraper_for_knowledge_store -i data_store/"$1"_store/$i.tsv -o data_store/output_"$1" &
7
  done
8
 
9
  wait
src/reranking/bm25_sentences.py CHANGED
@@ -8,7 +8,6 @@ from rank_bm25 import BM25Okapi
8
 
9
 
10
  def combine_all_sentences(knowledge_file):
11
- # Get all the unique sentences from the scraped urks for this claim
12
  sentences, urls = [], []
13
 
14
  with open(knowledge_file, "r", encoding="utf-8") as json_file:
@@ -31,7 +30,7 @@ def retrieve_top_k_sentences(query, document, urls, top_k):
31
  if __name__ == "__main__":
32
 
33
  parser = argparse.ArgumentParser(
34
- description="Get top 100 sentences for sentences in the knowlede store"
35
  )
36
  parser.add_argument(
37
  "-k",
@@ -96,7 +95,7 @@ if __name__ == "__main__":
96
  )
97
 
98
  print(
99
- f"Obtained {len(document_in_sentences)} sentenes from {num_urls_this_claim} urls."
100
  )
101
 
102
  # Retrieve top_k sentences with bm25
 
8
 
9
 
10
  def combine_all_sentences(knowledge_file):
 
11
  sentences, urls = [], []
12
 
13
  with open(knowledge_file, "r", encoding="utf-8") as json_file:
 
30
  if __name__ == "__main__":
31
 
32
  parser = argparse.ArgumentParser(
33
+ description="Get the top 100 sentences in the knowledge store for each claim"
34
  )
35
  parser.add_argument(
36
  "-k",
 
95
  )
96
 
97
  print(
98
+ f"Obtained {len(document_in_sentences)} sentences from {num_urls_this_claim} urls."
99
  )
100
 
101
  # Retrieve top_k sentences with bm25
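For readers unfamiliar with the ranking step, here is a rough, standard-library-only sketch of BM25 top-k sentence selection. It is a stand-in for illustration only: the script itself uses `BM25Okapi` from `rank_bm25`, whose IDF handling and tokenization may differ in detail, and the names `bm25_scores` and `top_k_sentences` as well as the whitespace tokenizer are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    if not tokenized_docs:
        return []
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumption; not necessarily rank_bm25's).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k_sentences(query, sentences, urls, top_k):
    """Return the top_k (sentence, url) pairs, best match first."""
    tokenized = [s.lower().split() for s in sentences]  # naive tokenizer
    scores = bm25_scores(query.lower().split(), tokenized)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [(sentences[i], urls[i]) for i in order[:top_k]]


sentences = [
    "Sean Connery never sent a letter to Steve Jobs.",
    "Billie Eilish swept the Grammys.",
    "The weather is nice today.",
]
urls = ["u1", "u2", "u3"]
top = top_k_sentences("Sean Connery letter Steve Jobs", sentences, urls, 2)
```

Keeping the URL list parallel to the sentence list means each ranked sentence can be traced back to its source page, as in the `top_100` entries of the JSON output.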
src/reranking/question_generation_top_sentences.py ADDED
@@ -0,0 +1,171 @@
1
+ import argparse
2
+ import time
3
+ import json
4
+ import nltk
5
+ from rank_bm25 import BM25Okapi
6
+ import numpy as np
7
+ import torch
8
+ from transformers import BloomTokenizerFast, BloomForCausalLM
9
+
10
+
11
+ def claim2prompts(example):
12
+ claim = example["claim"]
13
+
14
+ # claim_str = "Claim: " + claim + "||Evidence: "
15
+ claim_str = "Evidence: "
16
+
17
+ for question in example["questions"]:
18
+ q_text = question["question"].strip()
19
+ if len(q_text) == 0:
20
+ continue
21
+
22
+ if not q_text[-1] == "?":
23
+ q_text += "?"
24
+
25
+ answer_strings = []
26
+
27
+ for a in question["answers"]:
28
+ if a["answer_type"] in ["Extractive", "Abstractive"]:
29
+ answer_strings.append(a["answer"])
30
+ if a["answer_type"] == "Boolean":
31
+ answer_strings.append(
32
+ a["answer"]
33
+ + ", because "
34
+ + a["boolean_explanation"].lower().strip()
35
+ )
36
+
37
+ for a_text in answer_strings:
38
+ if not a_text[-1] in [".", "!", ":", "?"]:
39
+ a_text += "."
40
+
41
+ # prompt_lookup_str = claim + " " + a_text
42
+ prompt_lookup_str = a_text
43
+ this_q_claim_str = (
44
+ claim_str + " " + a_text.strip() + "||Question answered: " + q_text
45
+ )
46
+ yield (
47
+ prompt_lookup_str,
48
+ this_q_claim_str.replace("\n", " ").replace("||", "\n"),
49
+ )
50
+
51
+
52
+ if __name__ == "__main__":
53
+ parser = argparse.ArgumentParser(
54
+ description="Use a prompt to generate questions that could be answered by top-k retrieved evidence. Output generated questions."
55
+ )
56
+ parser.add_argument("--reference_corpus", default="data/train.json", help="JSON file with the training examples used as few-shot prompts.")
57
+ parser.add_argument("--target_file", default="data/dev.json", help="JSON file with the target claims (not read by this script).")
58
+ parser.add_argument(
59
+ "-i",
60
+ "--top_k_target_knowledge",
61
+ default="data_store/dev_top_k.json",
62
+ help="File with the top-k BM25 sentences for each claim, one JSON object per line.",
63
+ )
64
+ parser.add_argument(
65
+ "-o",
66
+ "--output_questions",
67
+ default="data_store/dev_bm25_questions.json",
68
+ help="File to which the generated questions are appended.",
69
+ )
70
+ parser.add_argument(
71
+ "--top_k",
72
+ default=100,
73
+ type=int,
74
+ help="Number of top BM25 sentences per claim; selects the top_<k> key in the input file.",
75
+ )
76
+ args = parser.parse_args()
77
+
78
+ # few-shot learning from the training set
79
+ with open(args.reference_corpus, "r", encoding="utf-8") as json_file:
80
+ train_examples = json.load(json_file)
81
+
82
+ prompt_corpus, tokenized_corpus = [], []
83
+
84
+ for example in train_examples:
85
+ for lookup_str, prompt in claim2prompts(example):
86
+ entry = nltk.word_tokenize(lookup_str)
87
+ tokenized_corpus.append(entry)
88
+ prompt_corpus.append(prompt)
89
+
90
+ prompt_bm25 = BM25Okapi(tokenized_corpus)
91
+
92
+ # Load the bloom model:
93
+ tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-7b1")
94
+ model = BloomForCausalLM.from_pretrained(
95
+ "bigscience/bloom-7b1",
96
+ device_map="auto",
97
+ torch_dtype=torch.bfloat16,
98
+ offload_folder="./offload",
99
+ )
100
+
101
+ with open(args.output_questions, "a", encoding="utf-8") as output_file:
102
+ with open(args.top_k_target_knowledge, "r", encoding="utf-8") as json_file:
103
+ for i, line in enumerate(json_file):
104
+ data = json.loads(line)
105
+ top_k_sentences_urls = data[f"top_{args.top_k}"]
106
+ claim = data["claim"]
107
+ claim_id = data["claim_id"]
108
+
109
+ bm25_qau = [] # question, answer, url
110
+ # Generate questions for those top k:
111
+ for sent_i, sentences_urls in enumerate(top_k_sentences_urls):
112
+
113
+ prompt_lookup_str = sentences_urls["sentence"]
114
+ url = sentences_urls["url"]
115
+
116
+ st = time.time()
117
+ prompt_s = prompt_bm25.get_scores(
118
+ nltk.word_tokenize(prompt_lookup_str)
119
+ )
120
+ prompt_n = 10
121
+ prompt_top_n = np.argsort(prompt_s)[::-1][:prompt_n]
122
+ prompt_docs = [prompt_corpus[i] for i in prompt_top_n]
123
+ print(
124
+ f"Got top {prompt_n} prompts for sentence {sent_i} of claim {i}. Time elapsed: {time.time() - st}"
125
+ )
126
+
127
+ claim_prompt = (
128
+ "Evidence: "
129
+ + prompt_lookup_str.replace("\n", " ")
130
+ + "\nQuestion answered: "
131
+ )
132
+
133
+ prompt = "\n\n".join(prompt_docs + [claim_prompt])
134
+
135
+ inputs = tokenizer([prompt], padding=True, return_tensors="pt").to(
136
+ model.device
137
+ )
138
+
139
+ outputs = model.generate(
140
+ inputs["input_ids"],
141
+ max_length=5000,
142
+ num_beams=2,
143
+ no_repeat_ngram_size=2,
144
+ early_stopping=True,
145
+ )
146
+
147
+ tgt_text = tokenizer.batch_decode(
148
+ outputs[:, inputs["input_ids"].shape[-1] :],
149
+ skip_special_tokens=True,
150
+ )[0]
151
+
152
+ # We are not allowed to generate more than 250 characters:
153
+ tgt_text = tgt_text[:250]
154
+
155
+ qau_pair = [
156
+ tgt_text.strip().split("?")[0].replace("\n", " ") + "?",
157
+ prompt_lookup_str.replace("\n", " "),
158
+ url,
159
+ ]
160
+
161
+ bm25_qau.append(qau_pair)
162
+
163
+ json_data = {
164
+ "claim_id": claim_id,
165
+ "claim": claim,
166
+ "bm25_qau": bm25_qau,
167
+ }
168
+ output_file.write(
169
+ json.dumps(json_data, ensure_ascii=False, indent=4) + "\n"
170
+ )
171
+ output_file.flush()
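The script builds a few-shot prompt from the closest training examples and then trims the model output down to a single question. The sketch below isolates those two string-handling steps so they can be tried without loading BLOOM; the function names `build_prompt` and `extract_question` are hypothetical, and the sample strings are invented for illustration.

```python
def build_prompt(few_shot_examples, evidence_sentence):
    """Join the retrieved few-shot examples with the open-ended target entry."""
    target = (
        "Evidence: "
        + evidence_sentence.replace("\n", " ")
        + "\nQuestion answered: "
    )
    return "\n\n".join(list(few_shot_examples) + [target])


def extract_question(generated_text, max_chars=250):
    """Keep the text before the first '?' within the first max_chars characters."""
    truncated = generated_text[:max_chars]
    return truncated.strip().split("?")[0].replace("\n", " ") + "?"


# Invented example in the "Evidence / Question answered" format used by the script.
examples = [
    "Evidence: The letter was published by Scoopertino.\n"
    "Question answered: Who published the letter?"
]
prompt = build_prompt(examples, "Sean Connery never sent\na letter.")
question = extract_question(" Did Sean Connery send a letter?\n\nEvidence: more text")
```

Note that when the generated text contains no question mark, this logic still appends one to whatever text was produced, so downstream consumers may want to check for such cases.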
src/retrieval/html2lines.py CHANGED
@@ -18,7 +18,6 @@ def get_page(url):
18
  page = None
19
  for _ in range(3):
20
  try:
21
- # for website that is "maintaining", trafilatura "respect the retry of the html" and waits for 24 hours
22
  page = trafilatura.fetch_url(url, config=DEFAULT_CONFIG)
23
  assert page is not None
24
  print("Fetched " + url, file=sys.stderr)
@@ -59,7 +58,7 @@ def line_correction(lines, max_size=100):
59
 
60
  if (
61
  len(stack) > MIN_CHAR
62
- ): # Enusre every lines in the out_lines suffice the MIN_CHAR restriction
63
  out_lines.append(stack)
64
  else:
65
  out_lines.append(line)
 
18
  page = None
19
  for _ in range(3):
20
  try:
 
21
  page = trafilatura.fetch_url(url, config=DEFAULT_CONFIG)
22
  assert page is not None
23
  print("Fetched " + url, file=sys.stderr)
 
58
 
59
  if (
60
  len(stack) > MIN_CHAR
61
+ ): # Ensure every line in out_lines satisfies the MIN_CHAR restriction
62
  out_lines.append(stack)
63
  else:
64
  out_lines.append(line)
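The `get_page` hunk above retries a fetch up to three times. As a rough illustration of that pattern, the sketch below wraps an arbitrary fetch callable; `fetch_with_retries` and the `fetch` parameter are hypothetical names, and the callable merely stands in for `trafilatura.fetch_url`, which is not invoked here.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, delay_seconds=0.0):
    """Call fetch(url) up to `attempts` times; return None if every try fails."""
    for _ in range(attempts):
        try:
            page = fetch(url)
            if page is not None:
                return page
        except Exception:
            # Treat any error as a failed attempt and try again.
            pass
        time.sleep(delay_seconds)
    return None


# Fake fetcher that succeeds only on the third call (for demonstration).
calls = []


def flaky_fetch(url):
    calls.append(url)
    return "<html>ok</html>" if len(calls) == 3 else None


result = fetch_with_retries(flaky_fetch, "https://example.com")
```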
src/retrieval/scraper_for_knowledge_store.py CHANGED
@@ -27,7 +27,7 @@ def scrape_text_from_url(url, temp_name):
27
 
28
  if (
29
  response is None or response.status_code == 503
30
- ): # trafilatura does not handle retry with 503, often waiting 24hours as overwriten by the html
31
  return []
32
 
33
  if url.endswith(".pdf"):
@@ -92,7 +92,7 @@ if __name__ == "__main__":
92
  lines_skipped = len(existing_data)
93
  print(f" Skipping {lines_skipped} lines in {json_output_path}")
94
 
95
- # Some tsv files will fail to be laoded, try all 4 different libs to to load them
96
  try:
97
  df = pd.read_csv(args.tsv_input_file, sep="\t", header=None)
98
  data = df.values
@@ -107,7 +107,6 @@ if __name__ == "__main__":
107
  print("Data loaded successfully with NumPy.")
108
  except Exception as e:
109
  print("Error loading with NumPy:", e)
110
- # If NumPy loading fails, attempt to load with Pandas
111
  try:
112
  data = []
113
  with open(args.tsv_input_file, "r", newline="") as tsvfile:
 
27
 
28
  if (
29
  response is None or response.status_code == 503
30
+ ): # trafilatura does not handle retry with 503, often waiting 24 hours as overwritten by the html
31
  return []
32
 
33
  if url.endswith(".pdf"):
 
92
  lines_skipped = len(existing_data)
93
  print(f" Skipping {lines_skipped} lines in {json_output_path}")
94
 
95
+ # Some tsv files will fail to be loaded, try different libs to load them
96
  try:
97
  df = pd.read_csv(args.tsv_input_file, sep="\t", header=None)
98
  data = df.values
 
107
  print("Data loaded successfully with NumPy.")
108
  except Exception as e:
109
  print("Error loading with NumPy:", e)
 
110
  try:
111
  data = []
112
  with open(args.tsv_input_file, "r", newline="") as tsvfile:
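The loading code above tries several libraries in turn because some TSV files fail to load with one of them. A minimal sketch of that fallback chain, using only the standard library, might look as follows; `load_rows`, `strict_loader`, and `csv_loader` are hypothetical stand-ins for the pandas, NumPy, and `csv`-based attempts in the script.

```python
import csv
import io


def load_rows(text, loaders):
    """Try each (name, loader) pair in order; return the first success."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader(text)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise ValueError("All loaders failed: " + "; ".join(errors))


def strict_loader(text):
    """Reject input whose rows do not all have the same number of columns."""
    rows = [line.split("\t") for line in text.splitlines()]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("inconsistent column count")
    return rows


def csv_loader(text):
    """Lenient loader: accept ragged rows as they come."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))


ragged = "a\tb\nc\n"
name, rows = load_rows(ragged, [("strict", strict_loader), ("csv", csv_loader)])
```

Collecting the error messages from each failed attempt makes it easier to see why a particular file could not be read by any loader.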