# Model description

- Morphosyntactic analyzer: Trankit
- Tagset: UD (Universal Dependencies)
- Embeddings: XLM-RoBERTa base (`xlm-roberta-base`)
- Dataset: NLPrePL-NKJP-fair-by-name (https://huggingface.co/datasets/ipipan/nlprepl)

# How to use

## Clone
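
```bash
# The model files are stored with Git LFS; without it, the clone contains only pointer files
git lfs install
# Cloning over SSH requires an SSH key added to your Hugging Face account
git clone git@hf.co:ipipan/nlpre_trankit_ud_xlm-roberta-base_nkjp-by-name
```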
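
If SSH access to the Hugging Face Hub is not set up, the repository can also be fetched from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed and network access is available; `download_model`, `REPO_ID` and `DEFAULT_DIR` are hypothetical names chosen to match the paths used in this README.

```python
# Hypothetical helper (not part of this repository): downloads the model repository
# with huggingface_hub instead of cloning it over SSH.
REPO_ID = "ipipan/nlpre_trankit_ud_xlm-roberta-base_nkjp-by-name"
DEFAULT_DIR = "./nlpre_trankit_ud_xlm-roberta-base_nkjp-by-name"


def download_model(local_dir=DEFAULT_DIR):
    """Download all files of the model repository into `local_dir` and return its path.

    Assumes the `huggingface_hub` package is installed and the Hub is reachable.
    """
    # Imported lazily so that defining the helper does not require the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_model()` returns the local directory, which can then be used as `model_path` when loading the model.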

## Load model

```python
import trankit

# Local directory of the cloned repository
model_path = './nlpre_trankit_ud_xlm-roberta-base_nkjp-by-name'

# Verify that the directory contains all files of a customized Trankit pipeline
trankit.verify_customized_pipeline(
    category='customized-mwt',  # pipeline category (includes multi-word token expansion)
    save_dir=model_path,  # directory containing the pipeline files (the cloned repository)
    embedding_name='xlm-roberta-base'  # embedding the pipeline was trained with (Trankit's default)
)

# Load the pipeline; `embedding` must match the embedding used for training
model = trankit.Pipeline(lang='customized-mwt', cache_dir=model_path, embedding='xlm-roberta-base')
```
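
Trankit returns its analysis as a nested Python dict in which multi-word tokens list their syntactic words under an `expanded` key. The sketch below flattens such a dict into one row per syntactic word. `iter_words` is a hypothetical helper, not part of Trankit, and `sample_doc` is a hand-written illustration of the output layout, not actual output of this model.

```python
from typing import Any, Dict, Iterator, Tuple

# One row per syntactic word: (sentence no., word id, form, lemma, UPOS, feats, head, deprel)
Row = Tuple[Any, ...]


def iter_words(doc: Dict[str, Any]) -> Iterator[Row]:
    """Yield one row per syntactic word of a Trankit-style document dict."""
    for sent_no, sent in enumerate(doc["sentences"], start=1):
        for token in sent["tokens"]:
            # Multi-word tokens carry their syntactic words under 'expanded';
            # a regular token is itself a single word.
            for word in token.get("expanded", [token]):
                yield (
                    sent_no,
                    word["id"],
                    word["text"],
                    word.get("lemma", "_"),  # '_' marks a missing annotation, as in CoNLL-U
                    word.get("upos", "_"),
                    word.get("feats", "_"),
                    word.get("head", "_"),
                    word.get("deprel", "_"),
                )


# Hand-written sample in Trankit's output layout (illustrative, NOT real model output).
sample_doc = {
    "text": "Poszedłem do domu.",
    "sentences": [
        {
            "id": 1,
            "text": "Poszedłem do domu.",
            "tokens": [
                {
                    "id": (1, 2),
                    "text": "Poszedłem",  # multi-word token: "Poszedł" + "em"
                    "expanded": [
                        {"id": 1, "text": "Poszedł", "lemma": "pójść", "upos": "VERB", "head": 0, "deprel": "root"},
                        {"id": 2, "text": "em", "lemma": "być", "upos": "AUX", "head": 1, "deprel": "aux:clitic"},
                    ],
                },
                {"id": 3, "text": "do", "lemma": "do", "upos": "ADP", "head": 4, "deprel": "case"},
                {"id": 4, "text": "domu", "lemma": "dom", "upos": "NOUN", "head": 1, "deprel": "obl"},
                {"id": 5, "text": ".", "lemma": ".", "upos": "PUNCT", "head": 1, "deprel": "punct"},
            ],
        }
    ],
}

# Print the flattened rows as tab-separated lines
for row in iter_words(sample_doc):
    print("\t".join(str(field) for field in row))
```

With the pipeline loaded, the same helper can be applied to real output, e.g. `iter_words(model('Poszedłem do domu.'))`.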