---
library_name: transformers
tags:
- LoRA
- unsloth
license: apache-2.0
language:
- ja
base_model:
- IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit
---

# Model Card for Model ID

This model is a **LoRA (Low-Rank Adaptation)** fine-tuned version of `IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit`, designed for efficient parameter updates and task-specific customization. LoRA enables lightweight fine-tuning by adapting only a small subset of model parameters, significantly reducing computational and storage requirements.

This model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

## Model Details

### Model Architecture and Visual Abstract

![Model Architecture 2](https://huggingface.co/IshiiTakahiro/llm-jp-3-13b-it_lora/resolve/main/napkin-selection2.png)

### Model Overview

**Note**: The base model, `IshiiTakahiro/llm-jp-3-13b-q-it-id2098_16bit`, is a further pre-trained version of `llm-jp/llm-jp-3-13b`. Please be aware of this distinction.

This is the model card of a 🤗 transformers model that has been pushed to the Hub.

- **Base Model**: IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit
- **Adaptation Type**: LoRA
- **Language**: Japanese
- **License**: Apache 2.0

This model specializes in tasks such as sentiment analysis, dialogue generation, and text summarization.

---

## Intended Use

### Primary Use Cases

This LoRA model is ideal for the following tasks (a minimal loading sketch follows this list):

1. **Text Generation:** Efficiently generate Japanese text for specific domains or use cases.
2. **Text Classification:** Perform classification tasks with reduced resource consumption.
3. **Domain-Specific Fine-Tuning:** Quickly adapt to niche tasks without retraining the entire model.
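If you prefer not to depend on Unsloth, the adapter can in principle be loaded with plain 🤗 Transformers and PEFT. The snippet below is a minimal sketch, not a tested recipe: it assumes the adapter repository ships tokenizer files, that the 4-bit base checkpoint loads with its saved quantization config, and it reuses the `### 指示` / `### 回答` prompt format from the example code further down.

```python
# Minimal sketch (assumptions noted above): load the LoRA adapter
# on top of the 4-bit base checkpoint with plain transformers + PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit"
ADAPTER_ID = "IshiiTakahiro/llm-jp-3-13b-it_lora"

# Assumes the adapter repo includes tokenizer files; otherwise load from BASE_ID.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER_ID)

# Prompt format used throughout this card: "### 指示" (instruction) / "### 回答" (answer).
prompt = "### 指示\n日本の四季について簡潔に説明してください。\n### 回答\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```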
### Out-of-Scope Use Cases

This LoRA model inherits limitations from the base model and should not be used for:

- Generating harmful or biased content.
- High-stakes decision-making in legal, medical, or critical scenarios.

---

## How to Use

### Installation

Before running the code, install the required libraries:

```
!pip uninstall unsloth -y && pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers trl peft accelerate bitsandbytes jsonlines
```

### Example Code

```python
from unsloth import FastLanguageModel
import torch
from tqdm import tqdm
import random
import numpy as np
from multiprocessing import Pool, cpu_count
import csv
import jsonlines
from google.colab import userdata

HF_TOKEN = userdata.get('HF_TOKEN')

# Load the LoRA model and tokenizer with Unsloth.
peft_model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="IshiiTakahiro/llm-jp-3-13b-it_lora",
    dtype=torch.bfloat16,
    load_in_4bit=False,
    trust_remote_code=True,
    token=HF_TOKEN,
)

def evaluate_task_score(task: str, answer: str) -> int:
    # Placeholder scorer: always returns 0.
    # Replace with a real metric (see the sketch after this code block).
    return 0

def batchify(data, batch_size):
    """Split the data into batches."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

def score_prediction(task_input, predictions, index):
    scores = [
        evaluate_task_score(task_input, prediction)
        for prediction in predictions
    ]
    # Append the per-prediction scores to a CSV file.
    output_file = f"{ID}.prediction_scores.csv"
    with open(output_file, mode="a", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        for prediction, score in zip(predictions, scores):
            writer.writerow([index, task_input, prediction, score])
    return index, scores

# Load the task data. Upload the data file in advance.
datasets = []
with jsonlines.open("./elyza-tasks-100-TV_0.jsonl", "r") as reader:
    datasets = list(reader)

# Switch the model into Unsloth's fast inference mode.
FastLanguageModel.for_inference(peft_model)
peft_model = peft_model.to(dtype=torch.bfloat16)

MAX_NEW_TOKENS: int = 2048
NUM_RETURN_SEQUENCES: int = 1
BATCH_SIZE: int = 1
ID: int = 1000

tasks_with_predictions = []
torch.cuda.empty_cache()

for batch in tqdm(batchify(datasets, BATCH_SIZE), desc="Running inference on GPU"):
    batch_inputs = []
    batch_task_ids = []
    for dt in batch:
        input_text = dt["input"]
        # Add a distractor annotation (a random number the model is told to ignore).
        annotation = f"### 注釈\n 追加情報: {random.randint(1, 100)}。この追加情報は無視してください。"
        prompt = f"### 指示\n{input_text}\n{annotation}\n### 回答\n"
        batch_inputs.append(prompt)
        batch_task_ids.append(dt["task_id"])

    inputs = tokenizer(
        batch_inputs,
        return_tensors="pt",
        padding=True,
        truncation=True
    ).to(peft_model.device)

    with torch.no_grad():
        outputs = peft_model.generate(
            **inputs,
            max_new_tokens=MAX_NEW_TOKENS,
            use_cache=True,
            do_sample=False,  # greedy decoding
            repetition_penalty=1.07,
            num_return_sequences=NUM_RETURN_SEQUENCES,
            pad_token_id=tokenizer.pad_token_id,
            bos_token_id=tokenizer.bos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    # Keep only the text after the final "### 回答" (answer) marker.
    batch_predictions = [
        tokenizer.decode(output, skip_special_tokens=True).split('\n### 回答')[-1].strip()
        for output in outputs
    ]

    for task_id, input_text, prediction in zip(batch_task_ids, batch_inputs, batch_predictions):
        tasks_with_predictions.append((task_id, input_text, [prediction]))

# Scoring on CPU (parallelized).
def cpu_scoring(task):
    task_id, task_input, predictions = task
    return score_prediction(task_input, predictions, task_id)

with Pool(cpu_count()) as pool:
    scoring_results = list(
        tqdm(pool.imap(cpu_scoring, tasks_with_predictions),
             total=len(tasks_with_predictions),
             desc="Scoring on CPU")
    )

# Collect the final results, keeping the best-scoring prediction per task.
results = []
for (task_id, task_input, predictions), (_, scores) in zip(tasks_with_predictions, scoring_results):
    best_index = np.argmax(scores)
    print("task_id:", task_id, ", best_index:", best_index)
    best_prediction = predictions[best_index]
    results.append({
        "task_id": task_id,
        "input": task_input,
        "output": best_prediction
    })

# Save the results as JSON Lines.
output_filename = f"id{ID}.jsonl"
with jsonlines.open(output_filename, mode='w') as writer:
    writer.write_all(results)

print(f"Results saved to {output_filename}")
```
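The `evaluate_task_score` stub above always returns 0, so with a single generated sequence `best_index` is trivially 0. If you raise `NUM_RETURN_SEQUENCES` to rank several candidates, you will need a real scorer. Below is a deliberately simple, hypothetical sketch; the heuristics are illustrative assumptions, not the metric used in the original evaluation.

```python
# Hypothetical drop-in replacement for the evaluate_task_score stub.
# Simple surface heuristics; illustrative only, not the original metric.
def evaluate_task_score(task: str, answer: str) -> int:
    text = answer.strip()
    if not text:
        return 0  # empty answers rank last
    score = 1
    if text.endswith(("。", "!", "?", ".", "!", "?")):
        score += 1  # ends on a sentence boundary (not truncated mid-sentence)
    if 20 <= len(text) <= 1000:
        score += 1  # plausible answer length for these tasks
    return score
```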
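For readers who want to reproduce a comparable fine-tuning run, the Training Procedure section below states a LoRA rank of 8 in bf16 on NVIDIA L4 GPUs. The following is a hedged sketch of such a setup with Unsloth's `get_peft_model`; everything except `r=8` and the base model name (`lora_alpha`, `target_modules`, and the other options) is an assumption based on common defaults, not a confirmed value from the original run.

```python
# Sketch of a LoRA fine-tuning setup matching the stated config (rank 8, bf16).
# target_modules and all hyperparameters other than r=8 are assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit",
    dtype=None,          # Unsloth selects bf16 automatically on supported GPUs
    load_in_4bit=True,   # the base model is a 4-bit checkpoint
)

model = FastLanguageModel.get_peft_model(
    model,
    r=8,                                   # LoRA rank stated under Training Procedure
    lora_alpha=16,                         # assumed; 2 * r is a common choice
    lora_dropout=0.0,                      # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed
    use_gradient_checkpointing="unsloth",  # reduces memory during training
)
# From here, training proceeds with TRL's SFTTrainer as usual.
```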
## Bias, Risks, and Limitations

This LoRA model inherits biases and risks from its base model:

- **Cultural and Linguistic Bias:** Outputs may reflect biases present in the Japanese-language training data.
- **Domain-Specific Limitations:** Performance may degrade outside of the fine-tuned domain or task.

### Recommendations

- Validate outputs critically, especially when applied to sensitive domains.
- Fine-tune further or evaluate carefully when adapting this model to a new domain.

### Training Procedure

This model was fine-tuned using LoRA, which updates only a small number of low-rank matrices:

- **Base Model:** IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit
- **LoRA Rank:** 8
- **Precision:** bf16
- **Hardware:** NVIDIA L4 GPUs

LoRA significantly reduces computational overhead compared to full-model fine-tuning, while maintaining performance on the target task.

## Citation

If you use this model, please cite it as follows:

**BibTeX:**

```bibtex
@misc{ishii2024lora,
  title  = {LoRA Adaptation of Large Japanese Language Model},
  author = {Takahiro Ishii},
  year   = {2024},
  note   = {Available at Hugging Face Hub: https://huggingface.co/IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit}
}
```