spyrosbriakos/greek_legal_bert_v2

This model was produced as part of the B.Sc. thesis "NLP Tasks with GreekLegalBERT v2".

As far as we can tell, two models currently exist in the Greek NLP area: the general-purpose Greek-BERT model and the domain-specific Greek-Legal-BERT-v1 model. This thesis focuses on the creation and presentation of the second version of Greek-Legal-BERT, namely GreekLegalBERT v2, which was pretrained on more legal data than the first version.

The combined dataset used to pretrain this model comprises:

The Raptarchis dataset, also known as RAPTARCHIS47k, a comprehensive collection of Greek legislation consisting of approximately 47 thousand legal resources, dating from the founding of the Greek state in 1834 through 2015.
Nomothesi@, a platform that makes Greek legislation available on the Web as linked open data.
EuroParl, a parallel corpus collected by Philipp Koehn’s team in Edinburgh from European Parliament sessions in 11 European Union languages, including Greek.
EUR-LEX, which provides official and comprehensive online access to European Union (EU) legal documents; the corpus contains 57 thousand Greek EU legislative documents from the EUR-LEX portal.
Hellenic Parliament Sessions, all the available minutes of the plenary sessions of the Hellenic (Greek) Parliament from 3 July 1989 to 24 August 2021.

The goal of the thesis is to compare the three distinct BERT-based Greek NLP models on different downstream NLP tasks, notably Named Entity Recognition, Natural Language Inference, and Multiclass Classification on the Raptarchis dataset.
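As a starting point for the downstream tasks mentioned above, the following is a minimal sketch of loading the checkpoint and extracting contextual embeddings. It assumes that the model is published on the Hugging Face Hub under the identifier in the title of this card and that it loads with the generic `AutoTokenizer` and `AutoModel` classes of the `transformers` library; the helper names `load_model` and `embed` and the example sentence are illustrative and not part of the thesis.

```python
MODEL_ID = "spyrosbriakos/greek_legal_bert_v2"  # identifier taken from the title of this card


def load_model(model_id: str = MODEL_ID):
    """Return (tokenizer, model); assumes the checkpoint works with the Auto classes."""
    # Imported lazily so that importing this module needs no heavy dependencies.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Return the last-layer hidden states for a single input string."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs.last_hidden_state  # shape: (1, sequence_length, hidden_size)


if __name__ == "__main__":
    tok, mdl = load_model()
    # Illustrative sentence: "The law enters into force upon its publication."
    hidden = embed("Ο νόμος τίθεται σε ισχύ από τη δημοσίευσή του.", tok, mdl)
    print(hidden.shape)
```

For fine-tuning on a specific task, a task-specific head (for example a token-classification head for Named Entity Recognition) would typically replace `AutoModel`; which head and hyperparameters the thesis used is not stated here.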