---
license: apache-2.0
datasets:
- Open-Orca/SlimOrca
pipeline_tag: text-generation
---

Obtained from freecs/ThetaWave-7B after SFT fine-tuning on the Open-Orca/SlimOrca dataset.

The model does not currently support a system_prompt because it uses Mistral's chat_template. The next release, currently in training, will switch to the ChatML template to support system_prompt. A system_prompt can be enabled today by manually changing the chat_template (a sketch of such an override is included at the end of this card), but after testing, this seems to degrade model performance.

More model details will be released...

vLLM deployment command

```
# Single GPU
python /path/to/vllm/vllm/entrypoints/openai/api_server.py \
    --model '/path/to/ThetaWave-7B-sft' \
    --tokenizer '/path/to/ThetaWave-7B-sft' \
    --tokenizer-mode auto \
    --dtype float16 \
    --enforce-eager \
    --host 0.0.0.0 \
    --port 6000 \
    --disable-log-stats \
    --disable-log-requests

# Dual GPUs (tensor parallel)
python /path/to/vllm/vllm/entrypoints/openai/api_server.py \
    --model '/path/to/ThetaWave-7B-sft' \
    --tokenizer '/path/to/ThetaWave-7B-sft' \
    --tokenizer-mode auto \
    --dtype float16 \
    --enforce-eager \
    --tensor-parallel-size 2 \
    --worker-use-ray \
    --engine-use-ray \
    --host 0.0.0.0 \
    --port 6000 \
    --disable-log-stats \
    --disable-log-requests
```

Try it directly:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("Liangmingxin/ThetaWave-7B-sft")
tokenizer = AutoTokenizer.from_pretrained("Liangmingxin/ThetaWave-7B-sft")

messages = [
    {"role": "user", "content": "Who are you?"},
]

# Build the prompt ids from the model's chat template
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
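
For reference, the manual chat_template override mentioned above might look roughly like the sketch below. This is only an illustration under assumptions: the ChatML template string here is hypothetical (the `<|im_start|>`/`<|im_end|>` markers are not guaranteed to be special tokens in this model's current vocabulary), and as noted, this override appeared to degrade performance in testing.

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Liangmingxin/ThetaWave-7B-sft")

# Hypothetical ChatML-style template; the template used by the upcoming
# release may differ.
chatml_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)
tokenizer.chat_template = chatml_template

# With a ChatML-style template, a system message can be included
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
```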