---
license: mit
language:
- ja
pipeline_tag: text-generation
---

# japanese-gpt-1b-PII-masking

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64ffe8a785a884a964b0cffe/gFQn0Oc6Nrvj8ViyTdZuM.png)

# Model Description

japanese-gpt-1b-PII-masking is based on the [Japanese pre-trained 1B GPT model](https://huggingface.co/rinna/japanese-gpt-1b) and is trained to mask personal information (PII) in Japanese text.

# Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Text to mask; replace the empty string with your own Japanese input.
input_text = ""

model_name = "cameltech/japanese-gpt-1b-PII-masking"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

if torch.cuda.is_available():
    model = model.to("cuda")

# Assumption: line breaks are encoded with the placeholder token "<LB>".
# Adjust this value if the model expects a different token.
LINE_BREAK_TOKEN = "<LB>"


def preprocess(text):
    # Replace newlines with the placeholder before tokenization.
    return text.replace("\n", LINE_BREAK_TOKEN)


def postprocess(text):
    # Restore newlines in the generated text.
    return text.replace(LINE_BREAK_TOKEN, "\n")


# Append EOS to mark the end of the input, then encode line breaks.
input_text += tokenizer.eos_token
input_text = preprocess(input_text)

with torch.no_grad():
    token_ids = tokenizer.encode(input_text, add_special_tokens=False, return_tensors="pt")

    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=256,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1):], skip_special_tokens=True)
output = postprocess(output)

print(output)
```

# License

[The MIT license](https://opensource.org/licenses/MIT)