# gemma-2-9b-chatml / gemma-tokenizer-chatml.py
# original author: intervitens
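import sentencepiece.sentencepiece_model_pb2 as model

# Load the original SentencePiece model into a protobuf message.
m = model.ModelProto()
with open('./tokenizer.model', 'rb') as f:
    m.ParseFromString(f.read())
# Overwrite the pieces at indices 11 and 12 with the ChatML delimiters.
m.pieces[11].piece = '<|im_start|>'
m.pieces[12].piece = '<|im_end|>'
# Write the patched model to a new file, leaving the original untouched.
with open('tokenizer_fixed.model', 'wb') as f:
    f.write(m.SerializeToString())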