Czech Metrum Validator.
A validator for verse metre (metrum), trained on Czech poetry from the GitHub corpus published by the
Institute of Czech Literature, Czech Academy of Sciences.
https://github.com/versotym/corpusCzechVerse
Usage
Loading model
Download validator.py, which contains the interface. Then download the model and load it with PyTorch:
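import torch
from validator import ValidatorInterface  # assumes validator.py from the step above is on the import path
model: ValidatorInterface = torch.load(args.metre_model_path_full, map_location=torch.device('cpu'))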
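Load the base RobeCzech tokenizer and try it out:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('roberta-base')
# datum is one tokenized example with "input_ids" and "metre" fields
model.validate(input_ids=datum["input_ids"], metre=datum["metre"])['acc']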
Train Model
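The training snippet in this section reads its settings from an `args` namespace that is not defined here. A minimal sketch of how such a namespace could be built with argparse follows. The flag names mirror the attributes the snippet uses (including the `worm_up` spelling), while `build_parser` and every default value are hypothetical placeholders chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI exposing the attributes the training snippet reads from `args`."""
    parser = argparse.ArgumentParser(description="Train a metre validator (sketch)")
    # All defaults below are illustrative placeholders, not the author's settings.
    parser.add_argument("--pretrained_model", type=str, default="roberta-base")
    parser.add_argument("--tokenizer", type=str, default="roberta-base")
    parser.add_argument("--worm_up", type=int, default=0)  # warmup steps; spelling follows the snippet
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--batch_size", type=int, default=8)
    return parser


# Parse an explicit argument list; call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["--epochs", "3", "--worm_up", "100"])
print(args.epochs, args.worm_up, args.batch_size)
```

Passing an explicit list to `parse_args` keeps the sketch usable from a notebook as well as from the command line.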
from transformers import AutoTokenizer, Trainer, TrainingArguments
meter_model = MeterValidator(pretrained_model=args.pretrained_model)
tokenizer = AutoTokenizer.from_pretrained(args.tokenizer)
training_args = TrainingArguments(
    save_strategy="no",
    logging_steps=500,
    warmup_steps=args.worm_up,
    weight_decay=0.0,
    num_train_epochs=args.epochs,
    learning_rate=args.learning_rate,
    fp16=torch.cuda.is_available(),
    ddp_backend="nccl",
    lr_scheduler_type="cosine",
    logging_dir='./logs',
    output_dir='./results',
    per_device_train_batch_size=args.batch_size)
Trainer(model=meter_model,
        args=training_args,
        train_dataset=train_data.pytorch_dataset_body,
        data_collator=collate).train()
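The training arguments combine `warmup_steps` with `lr_scheduler_type="cosine"`. As a rough sketch of the shape such a schedule takes, assuming linear warmup followed by a half-cosine decay to zero, the learning rate at each step can be computed as below. The function name `cosine_lr` and the step counts and base rate in the loop are illustrative assumptions, not values from this project.

```python
import math


def cosine_lr(step: int, total_steps: int, warmup_steps: int, base_lr: float) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Illustrative values only: 1000 total steps, 100 warmup steps, base rate 5e-5.
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, total_steps=1000, warmup_steps=100, base_lr=5e-5))
```

The rate peaks at the end of warmup and falls smoothly afterwards, so a longer warmup delays the peak and shortens the decay phase.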