base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
Uploaded model
- Developed by: SAS3
- License: apache-2.0
- Finetuned from model: llm-jp/llm-jp-3-13b
This Llama-architecture model was trained 2x faster with Unsloth and Hugging Face's TRL library.
About This Model
This model is a fine-tuned version of llm-jp/llm-jp-3-13b, trained with the Unsloth library and Hugging Face's TRL (Transformer Reinforcement Learning) library. It is designed to generate responses to instructions written in Japanese.
Features
- Efficient Loading: The sample code loads the model with 4-bit quantization (load_in_4bit=True) to reduce GPU memory usage.
- Customizable: Works with any JSONL dataset whose records contain an "input" field.
- Easy Integration: The sample code in Sample Usage covers setup and batch inference end to end.
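As a rough illustration of why 4-bit loading matters for a 13B model, the sketch below estimates the memory needed just to store the weights. It assumes a nominal 13 billion parameters (taken from the model name, not an exact count) and ignores quantization constants, layers kept at higher precision, activations and the KV cache, so actual GPU usage will be higher.

```python
# Back-of-the-envelope weight memory for a model with a nominal 13B parameters.
# Assumption: every weight is stored at the given bit width; quantization
# constants, higher-precision layers, activations and the KV cache are ignored.

NOMINAL_PARAMS = 13e9  # nominal size from the model name, not an exact count
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


fp16 = weight_memory_gib(NOMINAL_PARAMS, 16)
int4 = weight_memory_gib(NOMINAL_PARAMS, 4)
print(f"16-bit: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB")
```

Under these assumptions it prints about 24.2 GiB for 16-bit weights versus about 6.1 GiB for 4-bit weights, a fourfold reduction.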
Intended Use
- Instruction Following: Generating responses to specific instructions or prompts.
- Text Generation: Applications that require Japanese text generation.
Limitations
- Language: The model is fine-tuned for Japanese and may not perform well on inputs in other languages.
- Biases: As with any language model, outputs may reflect biases present in the training data.
How to Cite
If you use this model in your research or applications, please cite it as:
SAS3/llm-jp-3-13b-it on Hugging Face
Contact
For any questions or support, please contact SAS3.
Sample Usage
Below is an example of how to use the uploaded model to generate outputs for a JSONL dataset. The code uses the Unsloth library to load the model and run inference, then writes a JSONL file containing the model's output for each record in your dataset.
# Install necessary libraries
!pip install unsloth
!pip uninstall unsloth -y
!pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
from unsloth import FastLanguageModel
import torch
import json
from tqdm import tqdm
import os
# Load the model and tokenizer
model_name = "SAS3/llm-jp-3-13b-it"
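max_seq_length = 2048  # maximum context length (prompt + generated tokens)
dtype = None  # None lets Unsloth choose the dtype automatically
load_in_4bit = True  # load the weights with 4-bit quantization to save GPU memory
# Replace "YOUR_HF_TOKEN" with your actual Hugging Face token
HF_TOKEN = "YOUR_HF_TOKEN"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=HF_TOKEN,
)
# Switch the model to Unsloth's inference mode
FastLanguageModel.for_inference(model)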
# Load your dataset (replace 'your_dataset.jsonl' with your dataset file)
data = []
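with open("your_dataset.jsonl", "r", encoding='utf-8') as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        # A record is complete once the accumulated text ends with "}",
        # so objects spread over several lines are also accepted
        if item.endswith("}"):
            data.append(json.loads(item))
            item = ""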
# Perform inference
results = []
for dt in tqdm(data):
    input_text = dt.get("input", "")
    prompt = f"""### Instruction
{input_text}
### Response
"""
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        use_cache=True,
        do_sample=False,  # greedy decoding
        repetition_penalty=1.2,
    )
    # Decode only the newly generated tokens so the prompt is not echoed in the output
    prompt_length = inputs["input_ids"].shape[1]
    prediction = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
    results.append({
        "task_id": dt.get("task_id", ""),
        "input": input_text,
        "output": prediction,
    })
# Save the results
safe_model_name = os.path.basename(model_name)
with open(f"./{safe_model_name}_output.jsonl", 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write('\n')
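The generate call in the sample code decodes greedily (do_sample=False) and applies repetition_penalty=1.2. The pure-Python sketch below, on a made-up four-token vocabulary, illustrates the CTRL-style rule that Hugging Face Transformers applies for this parameter: the score of every token that already occurs in the sequence (prompt included) is divided by the penalty if positive and multiplied by it if negative. It is an illustration only; the library itself works on batched tensors, and apply_repetition_penalty and greedy_pick are hypothetical helper names.

```python
from typing import Dict, List


def apply_repetition_penalty(
    logits: Dict[int, float], previous_token_ids: List[int], penalty: float
) -> Dict[int, float]:
    """Return a copy of `logits` with a CTRL-style repetition penalty applied.

    Every token id already present in `previous_token_ids` has its logit
    divided by `penalty` if positive, or multiplied by `penalty` if negative,
    so with penalty > 1 the token becomes less likely either way.
    """
    penalized = dict(logits)
    for token_id in set(previous_token_ids):
        if token_id in penalized:
            score = penalized[token_id]
            penalized[token_id] = score * penalty if score < 0 else score / penalty
    return penalized


def greedy_pick(logits: Dict[int, float]) -> int:
    """do_sample=False: choose the highest-scoring token id."""
    return max(logits, key=logits.get)


# Toy vocabulary of four token ids with made-up logits.
logits = {0: 2.0, 1: 1.9, 2: -0.5, 3: 0.1}
history = [0, 2]  # tokens 0 and 2 already appear in the sequence

print(greedy_pick(logits))  # without a penalty, token 0 has the top score
penalized = apply_repetition_penalty(logits, history, penalty=1.2)
print(greedy_pick(penalized))  # 2.0 / 1.2 is about 1.67 < 1.9, so token 1 wins
```

The example shows why a penalty slightly above 1 is often enough with greedy decoding: it only needs to push a repeated token below the runner-up to change the choice.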
Notes
Hugging Face Token: Replace "YOUR_HF_TOKEN" in the code with your actual Hugging Face access token, which you can create under Access Tokens in your Hugging Face account settings.
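Rather than pasting the token into the notebook, you can keep it in an environment variable. A minimal sketch, assuming the variable is named HF_TOKEN (the name the huggingface_hub library also checks by default); get_hf_token is a hypothetical helper, not part of Unsloth.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from the environment instead of hard-coding it."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token."
        )
    return token
```

HF_TOKEN = get_hf_token() can then replace the hard-coded assignment in the sample code, and the value is passed to FastLanguageModel.from_pretrained through token=HF_TOKEN as before.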
Dataset:
- Replace 'your_dataset.jsonl' in the code with the path to your JSONL dataset file.
- Ensure your dataset is in JSON Lines format, where each line is a valid JSON object.
- Each JSON object must contain at least an "input" field. If a "task_id" is present, it is copied to the output; other metadata fields are allowed but ignored.
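To check a dataset before running inference, the sketch below writes two hypothetical records to sample_dataset.jsonl and reads them back with a strict one-object-per-line parser that rejects records without an "input" field. The file name, the records and the write_jsonl/read_jsonl helpers are illustrative assumptions; the loader in the sample code is more lenient and also accepts objects spread over several lines.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical example records: "input" is required, "task_id" is optional metadata.
records = [
    {"task_id": 0, "input": "日本の首都はどこですか？"},  # "What is the capital of Japan?"
    {"task_id": 1, "input": "俳句を一つ作ってください。"},  # "Please write a haiku."
]


def write_jsonl(path: Path, rows: List[Dict]) -> None:
    """Write one JSON object per line, keeping Japanese text unescaped."""
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> List[Dict]:
    """Read a JSON Lines file and check that every object has an "input" field."""
    rows = []
    with path.open("r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            if "input" not in row:
                raise ValueError(f"line {line_no}: missing 'input' field")
            rows.append(row)
    return rows


path = Path("sample_dataset.jsonl")  # hypothetical file name
write_jsonl(path, records)
loaded = read_jsonl(path)
print(len(loaded), loaded[0]["input"])
```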
Library Installation: The code begins with commands that install Unsloth from its GitHub repository. The leading "!" is notebook syntax: in a Jupyter notebook or Google Colab, run these lines as-is; in a regular terminal, drop the "!" and run the pip commands directly.
Inference Process:
- The model and tokenizer are loaded using the Unsloth library.
- For each input in your dataset, the code builds a prompt in the following format:
### Instruction
{input_text}
### Response
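- The model generates a response, which is decoded and appended to the results together with the input and its "task_id".
- The final results are saved as a JSONL file named after the model, e.g. "./llm-jp-3-13b-it_output.jsonl" for "SAS3/llm-jp-3-13b-it".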
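The prompt template and the file naming can be checked without loading the model. The helpers below (build_prompt, extract_response and output_path are hypothetical names) build the prompt in the format above, strip an echoed prompt from a decoded string by splitting on the last "\n### Response" marker, and derive the output file name from the repository id. The decoded text in the example is made up, not real model output.

```python
import os


def build_prompt(input_text: str) -> str:
    """Build a prompt in the "### Instruction / ### Response" format."""
    return f"### Instruction\n{input_text}\n### Response\n"


def extract_response(decoded_text: str) -> str:
    """Keep only the text after the last "### Response" marker."""
    return decoded_text.split("\n### Response")[-1].strip()


def output_path(model_name: str) -> str:
    """Results file name: last component of the repo id plus "_output.jsonl"."""
    return f"./{os.path.basename(model_name)}_output.jsonl"


prompt = build_prompt("日本の首都はどこですか？")  # "What is the capital of Japan?"
# Pretend the decoded output echoes the prompt followed by a (made-up) answer.
decoded = prompt + "東京です。"
print(repr(extract_response(decoded)))
print(output_path("SAS3/llm-jp-3-13b-it"))
```

Splitting on the marker is a purely string-based approach; it can cut the answer short if the model itself emits "### Response", which is why decoding only the tokens after the prompt is the more robust option.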