Text Generation
English
peft - PEFT 0.5.0

Model Details

QLoRA adapter trained with Direct Preference Optimization (DPO) on Llama-2-13b-chat.

Model Description

  • Trained by: Nekochu
  • Model type: Llama
  • Finetuned from model: Llama 2 (Llama-2-13b-chat)

How to Get Started with the Model

Example output, tested in text-generation-webui.

Input:
Name 3 applications for artificial intelligence

Output, base llama-2-chat:
Sure! Here are three potential applications of artificial intelligence (AI):

- Healthcare: AI can be used to analyze medical records and identify patterns that could lead to better diagnoses or treatments for diseases. Additionally, AI-powered robots can assist in surgeries and other medical procedures, allowing doctors to perform complex tasks with greater precision and accuracy.
- Finance: AI can be used to detect fraud and prevent financial losses by analyzing large amounts of data quickly and accurately. Additionally, AI-powered chatbots can help customers with their financial queries, freeing up human representatives to focus on more complex issues.
- Education: AI can be used to personalize learning experiences for students, adapting to their individual needs and abilities. Additionally, AI-powered tools can help teachers with grading and providing feedback, reducing their workload and allowing them to focus on more important aspects of

Output, QLoRA:
Certainly! In terms of applications, there are countless possibilities, but here are three examples that come to mind:

1) Virtual Assistants - These virtual assistants use natural language processing (NLP) capabilities to understand voice commands from users, perform tasks such as scheduling appointments or sending emails, and even make decisions based on user preferences.
2) Predictive Maintenance - Artificial intelligence algorithms can analyze data from sensors in industrial machines and predict when maintenance will be required, reducing downtime and increasing productivity.
3) Fraud Detection - Machine learning algorithms can analyze patterns in customer transactions to identify fraudulent activity, allowing businesses to take action before financial loss occurs.

Prompt template: Alpaca

Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
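
The template above can be filled in programmatically. The following is a minimal sketch, not part of the original card: the names ALPACA_TEMPLATE and build_alpaca_prompt are hypothetical, and the line breaks follow the template exactly as it is printed here (some Alpaca variants add blank lines between sections, which is not assumed).

```python
# Hypothetical helper for illustration; names are not from the model card.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{prompt}\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Insert a user instruction into the Alpaca-style template."""
    # Strip surrounding whitespace so the section markers stay on their own lines.
    return ALPACA_TEMPLATE.format(prompt=instruction.strip())


if __name__ == "__main__":
    print(build_alpaca_prompt("Name 3 applications for artificial intelligence"))
```

The returned string ends right after "### Response:" so that the model's generation continues from that point.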

Training Details

The main branch holds the checkpoint at 200k steps. The full run is 3 epochs, 400k steps in total, over 10 days.

Training and evaluation data

Nekochu

Training procedure

Datasets
  • HH-RLHF (en)
  • Open Assistant (multilingual)
  • GPT-4 Generated Data (en&zh)

Training hyperparameters

The following hyperparameters were used during training:

  • finetuning_type: lora
  • quantization_bit: 4
  • stage: dpo
  • learning_rate: 5e-05
  • cutoff_len: 4096
  • num_train_epochs: 3.0
  • max_samples: 100000
  • warmup_steps: 0
  • train_batch_size: 1
  • distributed_type: single-GPU
  • num_devices: 1
  • rope_scaling: linear
  • lora_rank: 32
  • lora_dropout: 0.15
  • dpo_beta: 0.1
  • bnb_4bit_compute_dtype: bfloat16
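
To make the role of dpo_beta concrete, here is a small sketch of the standard per-pair DPO objective, -log(sigmoid(beta * margin)), in plain Python. It is an illustration only, not the trainer's actual code: the function name dpo_loss is hypothetical and the log-probability values are made up.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss for sequence log-probabilities (illustrative sketch)."""
    # Log-ratio of the policy against the frozen reference model for each answer.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales how strongly the preference margin is rewarded.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Made-up log-probabilities: the policy favors the chosen answer
    # more than the reference model does, so the loss drops below log(2).
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0, beta=0.1))
    # Policy identical to the reference: margin is 0 and the loss is log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0, beta=0.1))
```

A small beta such as 0.1 keeps the margin term gentle, so the policy is pulled toward the preferred answers without drifting far from the reference model.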

Trainer

  • hiyouga/LLaMA-Efficient-Tuning

Datasets used to train Nekochu/Luminia-13B-v1-QLora