---
library_name: transformers
tags:
- code
datasets:
- elyza/ELYZA-tasks-100
language:
- ja
metrics:
- accuracy
base_model:
- tohoku-nlp/bert-base-japanese-v3
pipeline_tag: text-classification
---

# Model Card for Model ID

## Model Details

This model takes the `input` field of a task from ELYZA-tasks-100 and classifies the task by type.
The task categories are:

- Knowledge Explanation (知識説明型)
- Creative Generation (創作型)
- Analytical Reasoning (分析推論型)
- Task Solution (課題解決型)
- Information Extraction (情報抽出型)
- Step-by-Step Calculation (計算・手順型)
- Opinion-Perspective (意見・視点型)
- Role-Play Response (ロールプレイ型)

### Model Description

- **Developed by:** Hiroki Yanagisawa
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** BERT
- **Language(s) (NLP):** Japanese
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** tohoku-nlp/bert-base-japanese-v3

### Direct Use

```python
from tqdm import tqdm
from transformers import AutoTokenizer, BatchEncoding, pipeline

model_name = "hiroki-rad/bert-base-classification-ft"

# Mapping between label names and integer IDs
label2id = {
    'Task_Solution': 0,
    'Creative_Generation': 1,
    'Knowledge_Explanation': 2,
    'Analytical_Reasoning': 3,
    'Information_Extraction': 4,
    'Step_by_Step_Calculation': 5,
    'Role_Play_Response': 6,
    'Opinion_Perspective': 7
}
id2label = {id_: label for label, id_ in label2id.items()}

# Assumption: the tokenizer is loaded from the same repository as the model
tokenizer = AutoTokenizer.from_pretrained(model_name)


def preprocess_text_classification(examples: dict[str, list]) -> BatchEncoding:
    """Tokenize a batch of examples and convert label names to IDs."""
    encoded_examples = tokenizer(
        examples["questions"],  # a list of strings, because examples arrive in batches
        max_length=512,
        padding=True,
        truncation=True,
        return_tensors=None,  # use None for batched processing (plain Python lists)
    )
    # Convert the batch of label names to integer IDs
    encoded_examples["labels"] = [label2id[label] for label in examples["labels"]]
    return encoded_examples


# Dataset to evaluate on.
# `test_data` is assumed to be an already-loaded `datasets.Dataset` with
# string columns "questions" and "labels"; loading it is not shown here.
test_data = test_data.to_pandas()
test_data["labels"] = test_data["labels"].apply(lambda x: label2id[x])

# Use device="cpu" if no GPU is available
classify_pipe = pipeline(model=model_name, device="cuda:0")

results: list[dict[str, int | float | str]] = []
for i, example in enumerate(tqdm(test_data.itertuples(), total=len(test_data))):
    # Get the model's prediction for this example
    model_prediction = classify_pipe(example.questions)[0]
    # Convert the ground-truth label ID back to its label name
    true_label = id2label[example.labels]

    results.append(
        {
            "example_id": i,
            "pred_prob": model_prediction["score"],
            "pred_label": model_prediction["label"],
            "true_label": true_label,
        }
    )
```
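
The card lists accuracy as its metric, so the records collected in `results` can be summarized along those lines. The following is a minimal sketch, not the author's evaluation code: the helper names `accuracy` and `per_label_accuracy` and the sample records are hypothetical, and it assumes the pipeline's `pred_label` strings use the same names as `true_label`.

```python
from collections import defaultdict


def accuracy(results: list[dict]) -> float:
    """Fraction of records whose predicted label equals the true label."""
    if not results:
        return 0.0
    correct = sum(r["pred_label"] == r["true_label"] for r in results)
    return correct / len(results)


def per_label_accuracy(results: list[dict]) -> dict[str, float]:
    """Accuracy computed separately for each true label."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        totals[r["true_label"]] += 1
        hits[r["true_label"]] += int(r["pred_label"] == r["true_label"])
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical records in the same shape as `results` (illustrative values only)
sample_results = [
    {"example_id": 0, "pred_prob": 0.91, "pred_label": "Task_Solution", "true_label": "Task_Solution"},
    {"example_id": 1, "pred_prob": 0.55, "pred_label": "Role_Play_Response", "true_label": "Creative_Generation"},
    {"example_id": 2, "pred_prob": 0.87, "pred_label": "Creative_Generation", "true_label": "Creative_Generation"},
    {"example_id": 3, "pred_prob": 0.78, "pred_label": "Task_Solution", "true_label": "Task_Solution"},
]

print(accuracy(sample_results))            # 0.75
print(per_label_accuracy(sample_results))  # {'Task_Solution': 1.0, 'Creative_Generation': 0.5}
```

Passing the real `results` list in place of `sample_results` gives the overall score and shows which task types are misclassified most often.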