How to convert Marian model format to huggingface

#1
by corner - opened

I know that the original “Helsinki-NLP/opus-mt-en-zh” was trained with MarianNMT, but how do I use it in a Hugging Face project?

Hey @corner ,

Sorry, I don't fully understand the question. To use the model, you can simply follow this code:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

The docs are here: https://huggingface.co/docs/transformers/model_doc/marian
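To go from the loaded tokenizer and model to an actual translation, a minimal sketch could look like the one below. The helper names `translate` and `main` are hypothetical (they are not part of transformers), the example sentence is arbitrary, and generation uses whatever default settings ship with the model.

```python
def translate(texts, tokenizer, model):
    """Translate a list of source sentences and return a list of strings."""
    # Tokenize the batch, padding to the longest sentence.
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    # Generate target token ids with the model's default generation settings.
    generated = model.generate(**batch)
    # Convert ids back to text, dropping padding and end-of-sentence tokens.
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def main():
    # Imported here so that defining translate() does not require transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = "Helsinki-NLP/opus-mt-en-zh"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    print(translate(["How are you today?"], tokenizer, model))
```

Calling `main()` downloads the model on first use and prints the translated sentence.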

Thank you for your reply, @patrickvonplaten.
What I mean is: if I train a model with Marian, how can I use it in a Hugging Face project, the way “Helsinki-NLP/opus-mt-en-zh” is used in:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

You may try the code below to convert the model to PyTorch.

import argparse
from pathlib import Path

from transformers.models.marian.convert_marian_to_pytorch import convert

# Read the Marian model directory and the output directory from the command line.
argparser = argparse.ArgumentParser(description='Convert Marian NMT models to PyTorch')
argparser.add_argument('--model-path', action="store", required=True)
argparser.add_argument('--dest-path', action="store", required=True)
args = argparser.parse_args()

# Create the output directory if needed, then run the conversion.
Path(args.dest_path).mkdir(parents=True, exist_ok=True)
convert(Path(args.model_path), Path(args.dest_path))
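Once the conversion has finished, the destination directory can be passed to `from_pretrained` in place of a Hub model name. The sketch below assumes the conversion wrote a `config.json` into that directory; `has_model_config` and `load_converted` are hypothetical helper names, not transformers functions.

```python
from pathlib import Path


def has_model_config(model_dir):
    """Return True if model_dir contains a config.json file."""
    return (Path(model_dir) / "config.json").is_file()


def load_converted(model_dir):
    """Load the tokenizer and model from a local, converted directory."""
    if not has_model_config(model_dir):
        raise FileNotFoundError(
            f"no config.json in {model_dir}; did the conversion finish?"
        )
    # Imported lazily so the directory check works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # from_pretrained accepts a local directory path as well as a Hub model name.
    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
    model = AutoModelForSeq2SeqLM.from_pretrained(str(model_dir))
    return tokenizer, model
```

For example, `tokenizer, model = load_converted("path/to/dest")`, where the path is whatever was given as `--dest-path`.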

Thanks, bobosui.
I'll try it.

Hello, @bobosui.
Do I have to use sentencepiece to train the model? How else can I get source.spm and target.spm?
Thanks!

Language Technology Research Group at the University of Helsinki org

Yes, you have to use sentencepiece to get the subword segmentation models. You can probably use other subword tokenizers as well but then you have to dig into the model conversion code to make the appropriate adjustments.

Thanks to you all!
I trained with sentencepiece and used convert_marian_to_pytorch.py to convert the model to PyTorch format. The result is not bad!
