Summarization

Summarization involves two sequences: a longer sequence (the document) and a shorter sequence that represents it (the summary).

Let’s take a look at what a simple summarization dataset might look like.

document: Recent reports have linked some France-based players with returns to Wales. "I've always felt - and this is with my rugby hat on now; this is not region or WRU - I'd rather spend that money on keeping players in Wales," said Davies. The WRU provides £2m to the fund and £1.3m comes from the regions. Former Wales and British and Irish Lions fly-half Davies became WRU chairman on Tuesday 21 October, succeeding deposed David Pickering following governing body elections. He is now serving a notice period to leave his role as Newport Gwent Dragons chief executive after being voted on to the WRU board in September. Davies was among the leading figures among Dragons, Ospreys, Scarlets and Cardiff Blues officials who were embroiled in a protracted dispute with the WRU that ended in a £60m deal in August this year. In the wake of that deal being done, Davies said the £3.3m should be spent on ensuring current Wales-based stars remain there. In recent weeks, Racing Metro flanker Dan Lydiate was linked with returning to Wales. Likewise the Paris club's scrum-half Mike Phillips and centre Jamie Roberts were also touted for possible returns. Wales coach Warren Gatland has said: "We haven't instigated contact with the players. "But we are aware that one or two of them are keen to return to Wales sooner rather than later." Speaking to Scrum V on BBC Radio Wales, Davies re-iterated his stance, saying keeping players such as Scarlets full-back Liam Williams and Ospreys flanker Justin Tipuric in Wales should take precedence. "It's obviously a limited amount of money [available]. The union are contributing 60% of that contract and the regions are putting £1.3m in. "So it's a total pot of just over £3m and if you look at the sorts of salaries that the... guys... have been tempted to go overseas for [are] significant amounts of money. "So if we were to bring the players back, we'd probably get five or six players. "And I've always felt - and this is with my rugby hat on now; this is not region or WRU - I'd rather spend that money on keeping players in Wales. "There are players coming out of contract, perhaps in the next year or so… you're looking at your Liam Williams' of the world; Justin Tipuric for example - we need to keep these guys in Wales. "We actually want them there. They are the ones who are going to impress the young kids, for example. "They are the sort of heroes that our young kids want to emulate. "So I would start off [by saying] with the limited pot of money, we have to retain players in Wales. "Now, if that can be done and there's some spare monies available at the end, yes, let's look to bring players back. "But it's a cruel world, isn't it? "It's fine to take the buck and go, but great if you can get them back as well, provided there's enough money." British and Irish Lions centre Roberts has insisted he will see out his Racing Metro contract. He and Phillips also earlier dismissed the idea of leaving Paris. Roberts also admitted being hurt by comments in French Newspaper L'Equipe attributed to Racing Coach Laurent Labit questioning their effectiveness. Centre Roberts and flanker Lydiate joined Racing ahead of the 2013-14 season while scrum-half Phillips moved there in December 2013 after being dismissed for disciplinary reasons by former club Bayonne.
summary: New Welsh Rugby Union chairman Gareth Davies believes a joint £3.3m WRU-regions fund should be used to retain home-based talent such as Liam Williams, not bring back exiled stars.

document: Army explosives experts were called out to deal with a suspect package at the offices on the Newtownards Road on Friday night. Roads were sealed off and traffic diverted as a controlled explosion was carried out. The premises, used by East Belfast MP Naomi Long, have been targeted a number of times. Most recently, petrol bomb attacks were carried out on the offices on consecutive nights in April and May. The attacks began following a Belfast City Council vote in December 2012 restricting the flying of the union flag at the City Hall. Condemning the latest hoax, Alliance MLA Chris Lyttle said: "It is a serious incident for the local area, it causes serious disruption, it puts people's lives at risk, it can prevent emergency services reaching the area. "Ultimately we need people with information to share that with the police in order for them to do their job and bring these people to justice."
summary: A suspicious package left outside an Alliance Party office in east Belfast has been declared a hoax.

document: The warning begins at 22:00 GMT on Saturday and ends at 10:00 on Sunday. The ice could lead to difficult driving conditions on untreated roads and slippery conditions on pavements, the weather service warned. Only the southernmost counties and parts of the most westerly counties are expected to escape. Counties expected to be affected are Carmarthenshire, Powys, Ceredigion, Pembrokeshire, Denbighshire, Gwynedd, Wrexham, Conwy, Flintshire, Anglesey, Monmouthshire, Blaenau Gwent, Caerphilly, Merthyr Tydfil, Neath Port Talbot, Rhondda Cynon Taff and Torfaen
summary: The Met Office has issued a yellow weather warning for ice across most of Wales.
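
The upload commands later in this guide use .csv files, so one way to get data into this shape is to write it with Python's csv module. The following is a minimal sketch, not part of AutoNLP: the helper names write_rows and read_rows, the temporary output path, and the choice of which sentences to keep in each shortened row are assumptions made for illustration. The csv module takes care of quoting documents that contain commas or quotation marks.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows to a CSV file whose header line is 'document,summary'."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["document", "summary"])
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Read the file back as a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Two rows shortened from the examples above (hypothetical selection).
rows = [
    {
        "document": "The warning begins at 22:00 GMT on Saturday and ends at "
                    "10:00 on Sunday. The ice could lead to difficult driving "
                    "conditions on untreated roads and slippery conditions on "
                    "pavements, the weather service warned.",
        "summary": "The Met Office has issued a yellow weather warning for ice "
                   "across most of Wales.",
    },
    {
        "document": "Army explosives experts were called out to deal with a "
                    "suspect package at the offices on the Newtownards Road on "
                    "Friday night.",
        "summary": "A suspicious package left outside an Alliance Party office "
                   "in east Belfast has been declared a hoax.",
    },
]

# Write to a temporary directory so the sketch has no lasting side effects.
path = os.path.join(tempfile.mkdtemp(), "train.csv")
write_rows(path, rows)

# Reading the file back confirms that the quoting round-trips.
loaded = read_rows(path)
print(list(loaded[0].keys()))
```

Reading the file back before uploading is a cheap way to confirm that every row still has exactly the two expected columns.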

Once you have the data in the format specified above, you are ready to train models using AutoNLP. Yes, it’s that easy.

The first step is to log in to AutoNLP:

$ autonlp login --api-key YOUR_HUGGING_FACE_API_TOKEN

If you do not know your Hugging Face API token, create an account on huggingface.co; you will find the token in your account settings. Do not share your API token with anyone!

Once you have logged in, you can create a new project:

$ autonlp create_project --name summarization_model --language en --task summarization

When creating the project, you can choose the language with the “--language” parameter.

The next step is to upload files. Here, column mapping is very important: the columns in your original data are mapped to AutoNLP column names. In the data above, the original columns are “document” and “summary”. A summarization problem needs no other columns.

AutoNLP columns for summarization are:

  • text

  • target

The original columns thus need to be mapped to text and target. This is done in the upload command. You also need to tell AutoNLP which split you are uploading: train or valid.

$ autonlp upload --project summarization_model --split train \
            --col_mapping document:text,summary:target \
            --files ~/datasets/train.csv

Similarly, upload the validation file:

$ autonlp upload --project summarization_model --split valid \
            --col_mapping document:text,summary:target \
            --files ~/datasets/valid.csv

Column mapping is always from original column to AutoNLP column (original_column:autonlp_column).
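
Because a reversed or incomplete mapping is easy to type, it can help to check the mapping string locally before uploading. The sketch below is a hypothetical pre-flight check, not AutoNLP's own parsing code: the function names parse_col_mapping and mapping_problems are invented here, and the rules it enforces are only the ones stated in this guide (original_column:autonlp_column pairs separated by commas, with text and target as the AutoNLP columns).

```python
# AutoNLP column names for summarization, as listed in this guide.
AUTONLP_COLUMNS = {"text", "target"}


def parse_col_mapping(mapping):
    """Turn 'document:text,summary:target' into a dict of original -> AutoNLP names."""
    pairs = {}
    for item in mapping.split(","):
        original, sep, autonlp_name = (part.strip() for part in item.partition(":"))
        if not (original and sep and autonlp_name):
            raise ValueError(
                f"expected original_column:autonlp_column, got {item!r}"
            )
        pairs[original] = autonlp_name
    return pairs


def mapping_problems(mapping, header):
    """Return a list of problems; an empty list means the mapping looks consistent."""
    pairs = parse_col_mapping(mapping)
    problems = []
    # Every original column named in the mapping must exist in the file.
    for original in pairs:
        if original not in header:
            problems.append(f"column {original!r} is not in the file header")
    # Both AutoNLP columns must be covered by the mapping.
    for name in sorted(AUTONLP_COLUMNS - set(pairs.values())):
        problems.append(f"no column is mapped to {name!r}")
    # Anything on the right-hand side must be a known AutoNLP column.
    for name in sorted(set(pairs.values()) - AUTONLP_COLUMNS):
        problems.append(f"{name!r} is not a summarization column")
    return problems


header = ["document", "summary"]
print(mapping_problems("document:text,summary:target", header))
print(mapping_problems("text:document,target:summary", header))
```

The second call shows the reversed mapping being flagged, which is the mistake the original_column:autonlp_column rule is meant to prevent.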

Note that you can upload multiple files by separating the paths with commas. However, the column names must be the same in every file.
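
When a split is spread over several files, the matching-columns requirement can be verified before building the comma-separated value for --files. This is a minimal sketch under stated assumptions: read_header and files_argument are hypothetical helpers, the files are assumed to be CSV with a header line, and column order is assumed not to matter, so names are compared after sorting. The two small files are created in a temporary directory purely for the demonstration.

```python
import csv
import os
import tempfile


def read_header(path):
    """Return the first row of a CSV file, or an empty list if the file is empty."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])


def files_argument(paths):
    """Return the comma-separated value for --files, or raise if the headers differ."""
    expected = sorted(read_header(paths[0]))
    for path in paths[1:]:
        found = sorted(read_header(path))
        if found != expected:
            raise ValueError(f"{path}: columns {found} differ from {expected}")
    return ",".join(paths)


# Demonstration files with placeholder content.
tmp = tempfile.mkdtemp()
part_a = os.path.join(tmp, "train_a.csv")
part_b = os.path.join(tmp, "train_b.csv")
for p in (part_a, part_b):
    with open(p, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["document", "summary"], ["some text", "short text"]])

files_value = files_argument([part_a, part_b])
print(files_value)
```

Since the paths are joined with commas, a path that itself contains a comma would not survive this; renaming such files first is the simplest workaround.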

Once you have uploaded the files successfully, you can start training by using the train command:

$ autonlp train --project summarization_model

And that’s it!

Your model will start training and you can monitor the training if you wish.
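
If you repeat this workflow often, the commands can be assembled in a script. The sketch below only builds and prints the command lines; it uses exactly the subcommands and flags shown in this guide and nothing else. The function name workflow_commands and the relative file paths are assumptions made for illustration, and the token is a placeholder.

```python
import shlex


def workflow_commands(project, token, train_files, valid_files, col_mapping, language="en"):
    """Build the login, create_project, upload and train commands as argument lists."""
    return [
        ["autonlp", "login", "--api-key", token],
        ["autonlp", "create_project", "--name", project,
         "--language", language, "--task", "summarization"],
        ["autonlp", "upload", "--project", project, "--split", "train",
         "--col_mapping", col_mapping, "--files", ",".join(train_files)],
        ["autonlp", "upload", "--project", project, "--split", "valid",
         "--col_mapping", col_mapping, "--files", ",".join(valid_files)],
        ["autonlp", "train", "--project", project],
    ]


commands = workflow_commands(
    project="summarization_model",
    token="YOUR_HUGGING_FACE_API_TOKEN",  # placeholder, never commit a real token
    train_files=["datasets/train.csv"],   # hypothetical paths
    valid_files=["datasets/valid.csv"],
    col_mapping="document:text,summary:target",
)

# Print the commands for review; to execute them instead, each list could be
# passed to subprocess.run(argv, check=True).
for argv in commands:
    print(shlex.join(argv))
```

Keeping the commands as argument lists rather than one long string avoids shell-quoting problems if a project name or path contains spaces.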