---
language:
  - en
  - cs
license: cc-by-4.0
metrics:
  - bleurt
  - bleu
  - bertscore
pipeline_tag: text-classification
---

# AlignScoreCS

A multi-task, multilingual model for assessing factual consistency in various NLU tasks in Czech and English. We followed the approach of the original AlignScore paper (https://arxiv.org/abs/2305.16739). The model uses the shared architecture of the xlm-roberta-large checkpoint (https://huggingface.co/FacebookAI/xlm-roberta-large), extended with three linear heads: one for regression, one for binary classification, and one for ternary classification.
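The head layout described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the shared encoder's pooled output is stubbed with random vectors, NumPy stands in for the training framework, and the hidden size of 1024 is taken from xlm-roberta-large.

```python
import numpy as np

HIDDEN = 1024  # hidden size of xlm-roberta-large

rng = np.random.default_rng(0)

# Stub for the shared encoder's pooled output (batch of 2 examples).
pooled = rng.standard_normal((2, HIDDEN))

# Three task-specific linear heads on top of the shared encoder.
w_reg = rng.standard_normal((HIDDEN, 1))  # regression
w_bin = rng.standard_normal((HIDDEN, 2))  # binary classification
w_tri = rng.standard_normal((HIDDEN, 3))  # ternary classification

reg_out = pooled @ w_reg  # one score per example
bin_out = pooled @ w_bin  # two logits per example
tri_out = pooled @ w_tri  # three logits per example

print(reg_out.shape, bin_out.shape, tri_out.shape)
```

All three heads read the same encoder representation, which is what makes the multi-task training share parameters across the 3-way, 2-way, and regression datasets listed below.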

## Usage

```python
# Assuming you copied the attached Files_and_versions/AlignScore.py file
# for ease of use with transformers.
from AlignScoreCS import AlignScoreCS

alignScoreCS = AlignScoreCS.from_pretrained("krotima1/AlignScoreCS")
# Move the model to CUDA to accelerate inference.
print(alignScoreCS.score(context="This is context", claim="This is claim"))
```
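For context, the AlignScore paper scores long inputs by splitting the context into chunks and the claim into sentences, scoring each sentence against every chunk, then taking the maximum over chunks and the mean over sentences. A minimal sketch of that aggregation (the `aggregate` helper is hypothetical, not part of this repository's API):

```python
def aggregate(scores):
    """AlignScore-style aggregation.

    scores[i][j] is the alignment score of claim sentence i
    against context chunk j: take the max over chunks, then
    the mean over sentences.
    """
    per_sentence = [max(row) for row in scores]
    return sum(per_sentence) / len(per_sentence)

# Two claim sentences scored against two context chunks.
print(aggregate([[0.2, 0.9], [0.4, 0.6]]))  # → 0.75
```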

## Results

## Training datasets

The following table shows the datasets used to train the model. We translated the English datasets to Czech using SeamlessM4T.

| NLP Task | Dataset | Training Task | Context (n words) | Claim (n words) | Samples (Cs) | Samples (En) |
|---|---|---|---|---|---|---|
| NLI | SNLI | 3-way | 10 | 13 | 500k | 550k |
| NLI | MultiNLI | 3-way | 16 | 20 | 393k | 393k |
| NLI | Adversarial NLI | 3-way | 48 | 54 | 163k | 163k |
| NLI | DocNLI | 2-way | 97 | 285 | 200k | 942k |
| Fact Verification | NLI-style FEVER | 3-way | 48 | 50 | 208k | 208k |
| Fact Verification | Vitamin C | 3-way | 23 | 25 | 371k | 371k |
| Paraphrase | QQP | 2-way | 9 | 11 | 162k | 364k |
| Paraphrase | PAWS | 2-way | - | 18 | - | 707k |
| Paraphrase | PAWS labeled | 2-way | 18 | - | 49k | - |
| Paraphrase | PAWS unlabeled | 2-way | 18 | - | 487k | - |
| STS | SICK | reg | - | 10 | - | 4k |
| STS | STS Benchmark | reg | - | 10 | - | 6k |
| STS | Free-N1 | reg | 18 | - | 20k | - |
| QA | SQuAD v2 | 2-way | 105 | 119 | 130k | 130k |
| QA | RACE | 2-way | 266 | 273 | 200k | 351k |
| Information Retrieval | MS MARCO | 2-way | 49 | 56 | 200k | 5M |
| Summarization | WikiHow | 2-way | 434 | 508 | 157k | 157k |
| Summarization | SumAug | 2-way | - | - | - | - |