giobin committed on
Commit 3b7a94a · verified · 1 Parent(s): ff8b510

Update README.md

Files changed (1)
  1. README.md +64 -13
README.md CHANGED
@@ -11,19 +11,70 @@ This is a IDEFICS 9B model trained with ppo on the frozenlake env.
 
  ## Model Details
 
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
+ ### Trainer Hyperparameters
+
+ suppress_warnings: True
+ debug: True
+ seed: 9812
+ reseed_env: True
+ torch_deterministic: True
+ track: True
+ wandb_project_name: "frozenlake_idefics"
+ wandb_entity: null #'rl-team-unito'
+ wandb_log_dir: "${now:%Y-%m-%d_%H-%M-%S}"
+ save_video: True
+ save_video_every: 20
+ save_stats: True
+ save_episode: False
+ env_size: 244
+ env_area: 8
+ num_prompt_images: 1
+ use_text_description: True
+
+ # Algorithm specific arguments
+ model: "HuggingFaceM4/idefics-9b-instruct"
+ model_ckpt: null
+ lora_adapter_path: null
+ is_slippery: False
+ fixed_orientation: True
+ no_step_description: False
+ first_person: True
+ fov: 1
+
+ total_timesteps: 400000
+ disable_training: False
+ from_accelerate_savestate_to_checkpoint: False
+ learning_rate: 1e-5
+ critic_learning_rate: 1e-5
+ local_num_envs: 4
+ num_steps: 128
+ anneal_lr: False
+ gamma: 0.99
+ gae_lambda: 0.95
+ num_minibatches: 128
+ update_epochs: 1
+ norm_adv: True
+ clip_coef: 0.1
+ clip_vloss: True
+ ent_coef: 0.01 #0.01
+ vf_coef: 0.5
+ max_grad_norm: 0.5
+ target_kl: null
+ save_every: 50
+ gradient_accumulation: 4
+ adam_epsilon: 1e-8
+ gradient_ckpt: False
+ lora: True
+ temperature: 'max_logit'
+ disable_adapters_for_generation: True
+ normalization_by_words: False
+ action_logits_from_whole_seq: True
+ advanced_action_matching: False
+ env_id: "FrozenLakeText-v0" # MiniGrid-LavaGapS7-v0
+ generate_actions: False
+ value_prompt_template: "I am the agent in this minigrid world. {} Avoid the traps!\nWhat's the next best action?"
+ action_template: " Based on the information provided, the next best action would be to {}"
+ possible_actions_list: "forward pickup toggle opt_left opt_right opt_back"
 
  ### Model Sources [optional]
 
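Note on the added hyperparameters: the commit lists the values but does not say how the trainer combines them. The sketch below is one plausible reading, not the author's code. It assumes the common PPO bookkeeping in which one rollout holds `local_num_envs * num_steps` transitions and is split into `num_minibatches` chunks, and it assumes the two templates are filled with `str.format`. The helper names `rollout_sizes` and `build_prompts`, the `world_size` argument, and the sample scene description are hypothetical.

```python
# Values copied from the "Trainer Hyperparameters" block added in this commit.
config = {
    "total_timesteps": 400000,
    "local_num_envs": 4,
    "num_steps": 128,
    "num_minibatches": 128,
    "value_prompt_template": (
        "I am the agent in this minigrid world. {} Avoid the traps!\n"
        "What's the next best action?"
    ),
    "action_template": (
        " Based on the information provided, the next best action would be to {}"
    ),
    "possible_actions_list": "forward pickup toggle opt_left opt_right opt_back",
}


def rollout_sizes(cfg, world_size=1):
    """Derive rollout sizes from the config.

    ASSUMPTION: standard PPO bookkeeping (one rollout = envs * steps
    transitions). The actual trainer may compute these differently.
    """
    batch_size = cfg["local_num_envs"] * cfg["num_steps"] * world_size
    return {
        "batch_size": batch_size,
        "minibatch_size": batch_size // cfg["num_minibatches"],
        "num_iterations": cfg["total_timesteps"] // batch_size,
    }


def build_prompts(cfg, description):
    """Fill the prompt templates for one observation.

    ASSUMPTION: the "{}" placeholders are filled with str.format, and the
    action list is whitespace-separated.
    """
    actions = cfg["possible_actions_list"].split()
    value_prompt = cfg["value_prompt_template"].format(description)
    candidates = [cfg["action_template"].format(action) for action in actions]
    return value_prompt, candidates


sizes = rollout_sizes(config)
# Hypothetical scene description, used only to show the template shape.
value_prompt, candidates = build_prompts(config, "The goal is two tiles ahead.")
print(sizes)
print(value_prompt)
print(candidates[0])
```

Under these assumptions a single-process run collects 512 transitions per rollout, trains on minibatches of 4, and performs 781 policy updates over `total_timesteps`. How `gradient_accumulation: 4` interacts with the minibatch size is not stated in the commit, so it is left out of the sketch.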