suehyunpark committed
Commit 107cfa0
1 Parent(s): a24054a

Update README.md

Files changed (1)
  1. README.md +19 -13
README.md CHANGED
@@ -11,8 +11,8 @@ library_name: transformers
 
 - **Homepage: In Progress**
 - **Repository: https://github.com/kaistAI/Janus**
-- **Paper:**
-- **Point of Contact:seongyun@kaist.ac.kr**
+- **Paper: https://arxiv.org/abs/2405.17977**
+- **Point of Contact: suehyunpark@kaist.ac.kr**
 
 # TL; DR
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6550c4f27bbfce1878f5f280/vrQl8D8FV3vqUJYbPgsiG.png)
@@ -20,17 +20,17 @@ library_name: transformers
 Janus is a model trained using [Mistral-7B-v0.2](https://huggingface.co/mistral-community/Mistral-7B-v0.2) as its base model. Janus has been trained on [Multifaceted Collection](https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-SFT), a preference dataset containing 196k unique system messages for aligning LLMs to diverse human preferences. Janus not only excels at generating personalized responses that cater to various human preferences but is also adept at producing responses that are generally preferred for being helpful and harmless.
 
 # Model Details
-Janus-RM-7B is a reward model created by training Janus with Multifaceted-Collection-RM. Janus-RM-7B generates rewards when provided with various system messages and instructions, along with the personalized responses generated in accordance with these. This can be utilized to perform tasks such as PPO and best-of-n sampling.
+Janus-RM-7B is a reward model created by training Janus-7B (which is trained for only 1 epoch on the full 196k training instances) with [Multifaceted-Collection-RM](https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-RM) and a similar-sized mix of representative general helpfulness data: 72% of [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), 14% of [OASST1 dataset preprocessed for reward modeling](https://huggingface.co/datasets/tasksource/oasst1_pairwise_rlhf_reward), and 14% of [WebGPT Comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons). Janus-RM-7B predicts a scalar reward when provided with a concatenation of system message, instruction, chosen response, and rejected response. This can be utilized to perform as a scoring function for Best-of-N sampling or for preference tuning with proximal policy optimization (PPO).
 
 ## Model Description
 
 - **Model type:** Language model
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
-- **Related Models:** [Janus-66k-7B]() [Janus-DPO-7B](), [Janus-ORPO-7B](), [Janus-7B]()
-- **Training Datasets**: [Multifaceted-Collection-SFT](https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-SFT)
+- **Related Models:** [Janus-DPO-7B](https://huggingface.co/kaist-ai/janus-dpo-7b), [Janus-ORPO-7B](https://huggingface.co/kaist-ai/janus-orpo-7b), [Janus-7B](https://huggingface.co/kaist-ai/janus-7b)
+- **Training Datasets**: [Multifaceted-Collection-RM](https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-RM), [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf), [tasksource/oasst1_pairwise_rlhf_reward](https://huggingface.co/datasets/tasksource/oasst1_pairwise_rlhf_reward), [openai/webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
 - **Resources for more information:**
-  - [Research paper]()
+  - [Research paper](https://arxiv.org/abs/2405.17977)
   - [GitHub Repo](https://github.com/kaistAI/Janus)
 
 # Usage
@@ -79,23 +79,28 @@ print(decoded[0][len(input_str):])
 ```
 To train Janus and evaluate the responses it generates, please refer to the [GitHub Repo](https://github.com/kaistAI/Janus).
 Additionally, refer to the [Multifaceted Bench](https://huggingface.co/datasets/kaist-ai/Multifaceted-Bench), which evaluates how well LLM generates personalized responses.
+
 # Training Details
 ## Training hyperparameters
 
-The following hyperparameters were used during training:
-- learning_rate: 5e-06
-- train_batch_size: 2
+The following hyperparameters were used for training:
+- learning_rate: 9e-6
+- train_batch_size: 8
 - eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 32
+- total_train_batch_size: 128
 - total_eval_batch_size: 8
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- optimizer: AdamW with betas=(0.9,0.95)
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 10
-- num_epochs: 4
+- lr_scheduler_warmup_steps: 3% of the maximum number of steps
+- num_epochs: 1
+- use_flash_attention_2: true
+- maximum_sequence_length: 2048
+- bf16: true
+- gradient_checkpointing: true
 
 ## Framework versions
 
@@ -103,6 +108,7 @@ The following hyperparameters were used during training:
 - Pytorch 2.2.2
 - Datasets 2.18.0
 - Tokenizers 0.15.0
+- DeepSpeed Zero-3
 
 # Citation
 
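A note on the updated Model Details paragraph above: the card says Janus-RM-7B produces a scalar reward for a system message, instruction, and response, and can serve as a scoring function. The sketch below shows what that scoring step could look like. It is only an illustration: the repo id `kaist-ai/janus-rm-7b`, loading via `AutoModelForSequenceClassification` with a single-label value head, and reliance on the tokenizer's chat template are assumptions not confirmed by this commit; the authoritative loading and scoring code is in the [GitHub Repo](https://github.com/kaistAI/Janus).

```python
# Hypothetical sketch: score one (system message, instruction, response) triple.
# Assumes the checkpoint exposes a scalar value head loadable as a 1-label
# sequence-classification model and that the tokenizer ships a chat template.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "kaist-ai/janus-rm-7b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=1, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

def reward(system_message: str, instruction: str, response: str) -> float:
    """Return the scalar reward the model assigns to one candidate response."""
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": instruction},
        {"role": "assistant", "content": response},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=2048
    ).to(model.device)
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()
```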
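The same paragraph mentions Best-of-N sampling. Under the same assumptions, the pattern is simply: sample N candidate responses from a policy model (for example Janus-7B), score each with the hypothetical `reward()` helper above, and keep the highest-scoring one.

```python
# Hypothetical Best-of-N selection reusing the reward() helper sketched above.
# `candidates` would come from any generation loop with sampling enabled.
def best_of_n(system_message: str, instruction: str, candidates: list[str]) -> str:
    scores = [reward(system_message, instruction, c) for c in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```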
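The hyperparameters added in this commit imply an effective batch size of 128 (8 per device x 4 GPUs x 4 accumulation steps). As a reading aid only, here is how those values could be expressed as a Hugging Face `TrainingArguments` configuration; the actual reward-modeling script, including the DeepSpeed ZeRO-3 setup, lives in the [GitHub Repo](https://github.com/kaistAI/Janus), and `warmup_ratio=0.03` is an approximation of the reported warmup of 3% of the maximum number of steps.

```python
# Illustrative mapping of the reported hyperparameters onto TrainingArguments.
# Not the authors' script; the real run used DeepSpeed ZeRO-3 across 4 GPUs.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="janus-rm-7b",        # placeholder output path
    learning_rate=9e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # 8 x 4 GPUs x 4 steps = 128 effective batch
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,               # ~3% of the maximum number of steps
    adam_beta1=0.9,
    adam_beta2=0.95,
    bf16=True,
    gradient_checkpointing=True,
    seed=42,
)
```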