metadata
tags:
  - text-generation
license: cc-by-nc-sa-4.0
language:
  - ko
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
pipeline_tag: text-generation

DataVortexTL-1.1B-v0.1

DataVortex

License

cc-by-nc-sa-4.0

Model Details

Base Model

TinyLlama/TinyLlama-1.1B-Chat-v1.0

Trained On

1x NVIDIA H100 80GB

Instruction format
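
The prompt layout is defined by the chat template stored with the tokenizer (presumably inherited from the TinyLlama/TinyLlama-1.1B-Chat-v1.0 base model). As a minimal sketch, you can render the template for a sample conversation to inspect the exact format the model expects; the message content below is only an illustrative placeholder.

from transformers import AutoTokenizer

# Sketch: render the tokenizer's built-in chat template as plain text
# to inspect the instruction format (the sample message is a placeholder).
tokenizer = AutoTokenizer.from_pretrained("Edentns/DataVortexTL-1.1B-v0.1")

sample = [
    { "role": "user", "content": "Hello" }
]

prompt = tokenizer.apply_chat_template(
    sample,
    tokenize=False,             # return the formatted string instead of token ids
    add_generation_prompt=True  # append the assistant turn prefix
)
print(prompt)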

Model Benchmark

Ko-LLM-Leaderboard

Benchmarking in progress.

Implementation Code

Since the chat_template already contains the instruction format described above, you can use the code below.

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("Edentns/DataVortexTL-1.1B-v0.1", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Edentns/DataVortexTL-1.1B-v0.1")

messages = [
    { "role": "user", "content": "대한민국의 수도는 어디야?" }  # "What is the capital of South Korea?"
]

# Apply the chat template to build the prompt and move the input ids to the GPU
encoded = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_token_type_ids=False
).to(device)

# Sample a completion
decoded = model.generate(
    input_ids=encoded,
    temperature=0.2,
    top_p=0.9,
    repetition_penalty=1.2,
    do_sample=True,
    max_length=4096,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id
)

# Drop the prompt tokens and decode only the newly generated part
decoded = decoded[0][encoded.shape[1]:]
decoded_text = tokenizer.decode(decoded, skip_special_tokens=True)
print(decoded_text)
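
As a usage note, the same model can also be driven through the transformers pipeline API. The sketch below is an assumption-based alternative, not part of the original card: it builds the prompt from the tokenizer's chat template and reuses the sampling settings shown above.

from transformers import pipeline

# Sketch: text-generation pipeline with the same model and sampling settings
pipe = pipeline(
    "text-generation",
    model="Edentns/DataVortexTL-1.1B-v0.1",
    device_map="cuda"
)

messages = [
    { "role": "user", "content": "대한민국의 수도는 어디야?" }  # "What is the capital of South Korea?"
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipe(
    prompt,
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
    repetition_penalty=1.2,
    max_new_tokens=512,
    return_full_text=False  # return only the newly generated text
)
print(outputs[0]["generated_text"])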