---
license: llama2
language:
- ja
- en
---

## Model Overview
This is a comedy language model built by adding Japanese vocabulary to Llama2-13b, continuing its pre-training, and then fine-tuning it on comedy data.  
This model excels in "Ogiri," a Japanese wordplay game that involves answering questions or completing sentences in a humorous or witty manner.  
Ogiri is a traditional form of Japanese entertainment that showcases one's quick wit and creativity, often performed in groups where participants try to come up with the most amusing response. Our model is capable of generating Ogiri responses in both Japanese and English, making it versatile for engaging with this unique aspect of Japanese culture in a multilingual context.  
Development of this model was supported by the [AWS LLM Development Support Program](https://aws.amazon.com/jp/local/llm-development-support-program/). Continued pre-training was run in parallel across four AWS Trainium [trn1.32xlarge](https://aws.amazon.com/jp/ec2/instance-types/trn1/) instances.

* License: [LLAMA 2 COMMUNITY LICENSE](https://github.com/facebookresearch/llama/blob/main/LICENSE)
* Library: [neuronx-nemo-megatron](https://github.com/aws-neuron/neuronx-nemo-megatron)

### Tokenizer
The original Llama2 tokenizer, which has a vocabulary of 32,000, was expanded with 13,046 Japanese vocabulary items learned with BPE, bringing the total vocabulary size to 45,046. When adding vocabulary, single-character kanji tokens were limited to commonly used kanji and kanji that appear frequently in the training data. To avoid tokens that mix digits or symbols with letters, digits and symbols were removed from the data before the additional vocabulary was learned.
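
As a quick sanity check, the expanded vocabulary can be inspected directly from the released tokenizer. The snippet below is a minimal sketch; the Japanese example sentence is only illustrative.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "watashiha/Watashiha-Llama-2-13B-Ogiri-sft", use_fast=True
)

# 32,000 original Llama2 tokens + 13,046 added Japanese tokens = 45,046
print(len(tokenizer))

# Japanese text should now segment into fewer, mostly word-level tokens
# than with the original Llama2 tokenizer.
print(tokenizer.tokenize("時計がお腹を空かせたらどうなりますか？"))
```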

## Training Data
The model was pre-trained on the following corpora, totaling 65 billion tokens:
* Japanese data from [C4](https://huggingface.co/datasets/mc4)
* Japanese data from [CC-100](https://huggingface.co/datasets/cc100)
* Japanese data from [OSCAR](https://huggingface.co/datasets/oscar)
* Japanese and English dump data from Wikipedia ([Japanese Main Page](https://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8), [English Main Page](https://en.wikipedia.org/wiki/Main_Page))
* Proprietary company data
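
The public portions of these corpora can be browsed with the Hugging Face `datasets` library. The sketch below streams a few records from the Japanese split of mC4 purely as an illustration; the `"ja"` config name is taken from the dataset card linked above, and the actual cleaning and filtering applied for training is not reproduced here.

```python
from datasets import load_dataset

# Stream the Japanese split of mC4 without downloading the whole corpus.
# The "ja" config name follows the dataset card; this is not the exact
# preprocessing pipeline used to build the 65B-token training set.
mc4_ja = load_dataset("mc4", "ja", split="train", streaming=True)

for i, example in enumerate(mc4_ja):
    print(example["text"][:100])
    if i >= 2:
        break
```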

## How to Use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "watashiha/Watashiha-Llama-2-13B-Ogiri-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

if torch.cuda.is_available():
    model = model.to("cuda")

odai = "What happens when a clock is hungry?"
text = f"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.

Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.

Input:
{odai}

Response:
"""
text = text.lstrip()

with torch.no_grad():
    token_ids = tokenizer.encode(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        token_ids,
        do_sample=True,
        min_new_tokens=1,
        max_new_tokens=64,
        top_p=0.9,
        top_k=50,
        temperature=0.8,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
output = tokenizer.decode(output_ids.tolist()[0], skip_special_tokens=True)
print(output)
"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.

Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.

Input:
What happens when a clock is hungry?

Response:
It takes time to get back on top!
"""
```

### How to Run on AWS inf2.xlarge
As of January 24, 2024, [AWS inf2 instances](https://aws.amazon.com/ec2/instance-types/inf2/) are a cost-effective way to run models with more than 10 billion parameters compared to GPU instances.  
The model and source code can be found [here](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft-neuron).
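
For orientation, here is a minimal sketch of how such a model is typically loaded on Inferentia2 with the `optimum-neuron` library. This is an assumption about one possible path, not the code shipped with this model; the supported version lives in the repository linked above, and the compile-time shapes and core counts below are illustrative.

```python
# Hypothetical sketch using optimum-neuron; see the linked
# Watashiha-Llama-2-13B-Ogiri-sft-neuron repository for the supported code.
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

model_name = "watashiha/Watashiha-Llama-2-13B-Ogiri-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

# Compile the model for Neuron on first load; batch size, sequence length,
# and core count are assumptions (inf2.xlarge exposes 2 NeuronCores).
model = NeuronModelForCausalLM.from_pretrained(
    model_name,
    export=True,
    batch_size=1,
    sequence_length=2048,
    num_cores=2,
    auto_cast_type="bf16",
)

# Use the same prompt format as in the "How to Use" section above.
text = "Input:\nWhat happens when a clock is hungry?\n\nResponse:\n"
token_ids = tokenizer.encode(text, return_tensors="pt")
output_ids = model.generate(token_ids, do_sample=True, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```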

## Developers
- Tatsuya Uchida
- Yohei Kobashi
- Shuya Kuroki
- Hikaru Kubota
- Daisuke Takenouchi