stockmark/stockmark-100b-instruct-v0.1

Stockmark-100b-instruct-v0.1 is an instruction-tuned version of stockmark-100b, a 100-billion-parameter LLM developed by Stockmark Inc.

How to use

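import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Prompt template. "### 指示:" means "### Instruction:" and "### 応答:" means "### Response:".
# The markers are part of the prompt format shown here, so they are left untranslated.
prompt_template = """### 指示:
{instruction}

### 応答:
"""

# Load the tokenizer and the model in bfloat16; device_map="auto" places the weights on the available devices.
tokenizer = AutoTokenizer.from_pretrained("stockmark/stockmark-100b-instruct-v0.1")
model = AutoPeftModelForCausalLM.from_pretrained("stockmark/stockmark-100b-instruct-v0.1", device_map="auto", torch_dtype=torch.bfloat16)

instruction = "生成AIとは？"  # "What is generative AI?"
prompt = prompt_template.format(instruction=instruction)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Sample up to 256 new tokens without tracking gradients.
with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.08,
    )

# The returned sequence starts with the prompt tokens, so the decoded text includes the prompt.
output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)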
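
Because the decoded output contains the prompt followed by the generated text, it can be convenient to separate the two. The following is a minimal sketch, assuming the same prompt template as above. The helper names `build_prompt` and `extract_response` are hypothetical and are not part of transformers, peft, or the model itself, and the decoded string is a stand-in rather than real model output.

```python
# Hypothetical helpers for illustration; not part of any library.
PROMPT_TEMPLATE = """### 指示:
{instruction}

### 応答:
"""

RESPONSE_MARKER = "### 応答:"  # "### Response:"


def build_prompt(instruction: str) -> str:
    """Fill the instruction into the prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker.

    If the marker is missing, the whole string is returned stripped.
    """
    _, marker, answer = decoded.partition(RESPONSE_MARKER)
    return answer.strip() if marker else decoded.strip()


prompt = build_prompt("生成AIとは？")
# Stand-in for tokenizer.decode(tokens[0], skip_special_tokens=True).
decoded = prompt + "<generated text>"
print(extract_response(decoded))
```

This sketch splits on the first occurrence of the marker, so an instruction that itself contains `### 応答:` would need different handling.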

Dataset (fine-tuning)

Ichikara instruction [Web Page], [Paper]

Performance

Stockmark Business Questions

Dataset: https://huggingface.co/datasets/stockmark/business-questions

model	accuracy
stockmark-100b-instruct	0.90
stockmark-13b-instruct	0.80
GPT-3.5-turbo^1	0.42

Japanese Vicuna QA Benchmark

We excluded the categories that require calculation or coding and used the remaining 60 questions for evaluation.

GitHub: https://github.com/ku-nlp/ja-vicuna-qa-benchmark
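
The filtering step described above can be sketched as follows. This is an illustration only: the category labels in `EXCLUDED_CATEGORIES`, the record layout, and the function name `average_score` are assumptions, since the text only says that calculation and coding categories were excluded. The scores are toy values, not benchmark results, and the benchmark's own judging pipeline lives in the repository linked above.

```python
from statistics import mean

# Assumed category labels; the source does not name the excluded categories.
EXCLUDED_CATEGORIES = {"math", "fermi", "coding"}


def average_score(records, excluded=EXCLUDED_CATEGORIES):
    """Average the judge scores of all records outside the excluded categories."""
    kept = [r["score"] for r in records if r["category"] not in excluded]
    if not kept:
        raise ValueError("no questions left after filtering")
    return mean(kept)


# Toy records for illustration only; not real benchmark data.
records = [
    {"question_id": 1, "category": "generic", "score": 7},
    {"question_id": 2, "category": "coding", "score": 2},
    {"question_id": 3, "category": "roleplay", "score": 5},
    {"question_id": 4, "category": "math", "score": 1},
]
print(average_score(records))
```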

model	average score
stockmark-100b-instruct	5.97
tokyotech-llm/Swallow-70b-instruct-hf	5.59
GPT-3.5 (text-davinci-003)	5.08

Inference speed

model	time [s] for generating 100 characters in Japanese
stockmark-100b-instruct	1.86
gpt-3.5-turbo	2.15
gpt-4-turbo	5.48
tokyotech-llm/Swallow-70b-instruct-hf	2.22

For the local LLMs (stockmark-100b-instruct and tokyotech-llm/Swallow-70b-instruct-hf), we measured the inference time using AWS Inferentia2.
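
A rough way to produce a comparable "seconds per 100 characters" figure is sketched below. The measurement procedure is not described beyond the hardware, so this is not the harness behind the table: the function `seconds_per_100_chars`, its `runs` parameter, and the `fake_generate` stand-in are all hypothetical. In practice `generate_fn` would wrap a real model or API call and return only the newly generated text.

```python
import time


def seconds_per_100_chars(generate_fn, prompt, runs=3):
    """Average wall-clock seconds per 100 generated characters over several runs."""
    total_seconds = 0.0
    total_chars = 0
    for _ in range(runs):
        start = time.perf_counter()
        text = generate_fn(prompt)
        total_seconds += time.perf_counter() - start
        total_chars += len(text)
    if total_chars == 0:
        raise ValueError("generate_fn produced no text")
    return total_seconds / total_chars * 100


def fake_generate(prompt):
    """Stand-in generator: sleeps briefly and returns 50 Japanese characters."""
    time.sleep(0.01)
    return "あ" * 50


elapsed = seconds_per_100_chars(fake_generate, "dummy prompt")
print(f"{elapsed:.3f} s per 100 characters")
```

Normalizing by the number of characters rather than tokens keeps the figure comparable across models with different tokenizers.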

License

MIT

Developed by

Stockmark Inc.
