
Model Card for Model ID

This model card is a base template for new models, generated from the default raw template.

Model Details

Model Description

  • Developed by: [More Information Needed]
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: [More Information Needed]
  • Model type: [More Information Needed]
  • Language(s) (NLP): [More Information Needed]
  • License: [More Information Needed]
  • Finetuned from model [optional]: [More Information Needed]

Model Sources [optional]

  • Repository: [More Information Needed]
  • Paper [optional]: [More Information Needed]
  • Demo [optional]: [More Information Needed]

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the model's risks, biases, and limitations. More information is needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
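No official quick-start snippet is provided yet. As a minimal sketch using the same LLaMA-Factory CLI as the training run in this card (assumes LLaMA-Factory is installed and the adapter was saved to the output_dir shown under Training Hyperparameters):

```shell
# Hedged sketch: start an interactive chat session with the base model
# plus the LoRA adapter trained in this card. Paths are copied from the
# training command; verify them against your local checkout.
llamafactory-cli chat \
    --model_name_or_path shenzhi-wang/Llama3-8B-Chinese-Chat \
    --adapter_name_or_path saves/LLaMA3-8B-Chinese-Chat/lora/4-bit-LLaMA3-8B-Chinese-Chat \
    --template llama3 \
    --finetuning_type lora
```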

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: [More Information Needed]

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path shenzhi-wang/Llama3-8B-Chinese-Chat \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --quantization_bit 4 \
    --template llama3 \
    --flash_attn auto \
    --dataset_dir data \
    --dataset healthcare \
    --cutoff_len 512 \
    --learning_rate 0.0002 \
    --num_train_epochs 1.0 \
    --max_samples 10000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/LLaMA3-8B-Chinese-Chat/lora/4-bit-LLaMA3-8B-Chinese-Chat \
    --fp16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --lora_rank 16 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target all
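Two quantities implied by these flags are worth making explicit: the effective batch size and the LoRA scaling factor. They follow directly from the hyperparameters above (the GPU count is an assumption, since the card does not state the hardware):

```python
# Quantities derived from the training flags above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 16
num_gpus = 1  # assumption: single GPU; not stated in this card

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 32

# LoRA updates are scaled by alpha / rank, so alpha == rank gives scaling 1.0.
lora_rank, lora_alpha = 16, 16
lora_scaling = lora_alpha / lora_rank
print(lora_scaling)  # 1.0
```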

Speeds, Sizes, Times [optional]

Training time (wall clock): 0:48:21

[More Information Needed]

Evaluation

llamafactory-cli train \
    --stage sft \
    --model_name_or_path shenzhi-wang/Llama3-8B-Chinese-Chat \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --quantization_bit 4 \
    --template llama3 \
    --flash_attn auto \
    --dataset_dir data \
    --dataset sample_healthcare \
    --cutoff_len 512 \
    --max_samples 10000 \
    --per_device_eval_batch_size 2 \
    --predict_with_generate True \
    --max_new_tokens 512 \
    --top_p 0.7 \
    --temperature 0.95 \
    --output_dir saves/LLaMA3-8B-Chinese-Chat/lora/eval_2024-06-19-09-59-51 \
    --do_predict True \
    --adapter_name_or_path saves/LLaMA3-8B-Chinese-Chat/lora/4-bit-LLaMA3-8B-Chinese-Chat
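With `--do_predict`, LLaMA-Factory writes the generated outputs as a JSON-lines file (conventionally `generated_predictions.jsonl` in the output_dir; the exact file and field names `prompt`/`predict`/`label` should be verified against your LLaMA-Factory version). A sketch of parsing such a file, with an inline sample standing in for the real output:

```python
# Sketch: parse a generated_predictions.jsonl-style file into parallel
# prediction/reference lists. Field names follow LLaMA-Factory's
# convention but are assumptions, not verified against this run.
import json
import io

# Inline stand-in for open("generated_predictions.jsonl").
sample = io.StringIO(
    '{"prompt": "q1", "predict": "a1", "label": "ref1"}\n'
    '{"prompt": "q2", "predict": "a2", "label": "ref2"}\n'
)

records = [json.loads(line) for line in sample if line.strip()]
predictions = [r["predict"] for r in records]
references = [r["label"] for r in records]
print(len(predictions))  # 2
```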

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

{
  "predict_bleu-4": 3.6242000999999995,
  "predict_rouge-1": 18.92082985,
  "predict_rouge-2": 2.8953536,
  "predict_rouge-l": 15.46343345,
  "predict_runtime": 35159.3161,
  "predict_samples_per_second": 0.057,
  "predict_steps_per_second": 0.028
}
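For readers unfamiliar with `predict_rouge-l`: ROUGE-L is the F-score built from the longest common subsequence (LCS) between prediction and reference tokens. A minimal token-level sketch of the definition (no stemming or language-specific tokenization, unlike the production metric):

```python
def lcs_len(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(pred, ref):
    # ROUGE-L F1 over whitespace-tokenized strings.
    p, r = pred.split(), ref.split()
    lcs = lcs_len(p, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(p), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

# LCS is "the cat on the mat" (5 of 6 tokens on each side).
print(round(rouge_l_f1("the cat sat on the mat", "the cat lay on the mat"), 3))  # 0.833
```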

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]
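The ML Impact calculator's estimate reduces to a simple product: energy = power × utilization × hours, emissions = energy × grid carbon intensity. A back-of-envelope sketch where the training time comes from this card but the GPU power, utilization, and grid intensity are illustrative assumptions, not measured values for this run:

```python
# Rough CO2 estimate following the method of Lacoste et al. (2019).
gpu_tdp_kw = 0.35          # assumption: a ~350 W data-center GPU
utilization = 0.8          # assumption: average power draw fraction
hours = 48.35 / 60         # 0:48:21 of training, from this card
grid_kg_co2_per_kwh = 0.4  # assumption: rough grid carbon intensity

energy_kwh = gpu_tdp_kw * utilization * hours
co2_kg = energy_kwh * grid_kg_co2_per_kwh
print(round(co2_kg, 3))  # ~0.09 kg CO2 under these assumptions
```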

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Model size: 8.03B params (Safetensors)
Tensor type: BF16