giobin/IDEFICS_frozenlake_rocket_trained

giobin committed on Sep 10, 2024

Commit 3b7a94a · verified · 1 Parent(s): ff8b510

Update README.md

Files changed (1)

README.md +64 -13

@@ -11,19 +11,70 @@ This is a IDEFICS 9B model trained with ppo on the frozenlake env.
 ## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
 ### Model Sources [optional]

 ## Model Details
+### Trainer Hyperparameters
+suppress_warnings: True
+debug: True
+seed: 9812
+reseed_env: True
+torch_deterministic: True
+track: True
+wandb_project_name: "frozenlake_idefics"
+wandb_entity: null #'rl-team-unito'
+wandb_log_dir: "${now:%Y-%m-%d_%H-%M-%S}"
+save_video: True
+save_video_every: 20
+save_stats: True
+save_episode: False
+env_size: 244
+env_area: 8
+num_prompt_images: 1
+use_text_description: True
+# Algorithm specific arguments
+model: "HuggingFaceM4/idefics-9b-instruct"
+model_ckpt: null
+lora_adapter_path: null
+is_slippery: False
+fixed_orientation: True
+no_step_description: False
+first_person: True
+fov: 1
+total_timesteps: 400000
+disable_training: False
+from_accelerate_savestate_to_checkpoint: False
+learning_rate: 1e-5
+critic_learning_rate: 1e-5
+local_num_envs: 4
+num_steps: 128
+anneal_lr: False
+gamma: 0.99
+gae_lambda: 0.95
+num_minibatches: 128
+update_epochs: 1
+norm_adv: True
+clip_coef: 0.1
+clip_vloss: True
+ent_coef: 0.01 #0.01
+vf_coef: 0.5
+max_grad_norm: 0.5
+target_kl: null
+save_every: 50
+gradient_accumulation: 4
+adam_epsilon: 1e-8
+gradient_ckpt: False
+lora: True
+temperature: 'max_logit'
+disable_adapters_for_generation: True
+normalization_by_words: False
+action_logits_from_whole_seq: True
+advanced_action_matching: False
+env_id: "FrozenLakeText-v0"  # MiniGrid-LavaGapS7-v0
+generate_actions: False
+value_prompt_template: "I am the agent in this minigrid world. {} Avoid the traps!\nWhat's the next best action?"
+action_template: " Based on the information provided, the next best action would be to {}"
+possible_actions_list: "forward pickup toggle opt_left opt_right opt_back"
 ### Model Sources [optional]
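
The hyperparameters added in this diff imply a few derived quantities that are not stated explicitly. The sketch below works them out under the assumption that the trainer follows the common PPO convention (as in CleanRL-style scripts) where the rollout batch is `num_envs * num_steps`, and that training ran in a single process so `local_num_envs` equals the global number of environments. The trainer itself is not shown in this commit, so `rollout_sizes` and `build_candidates` are hypothetical helper names, the example description string is invented for illustration, and concatenating the prompt with the action template is only one plausible reading of how the two templates are combined. `gradient_accumulation` is not modeled here.

```python
# Values copied from the hyperparameters added in the diff above.
LOCAL_NUM_ENVS = 4
NUM_STEPS = 128
NUM_MINIBATCHES = 128
TOTAL_TIMESTEPS = 400_000

VALUE_PROMPT_TEMPLATE = (
    "I am the agent in this minigrid world. {} Avoid the traps!\n"
    "What's the next best action?"
)
ACTION_TEMPLATE = (
    " Based on the information provided, the next best action would be to {}"
)
POSSIBLE_ACTIONS = "forward pickup toggle opt_left opt_right opt_back".split()


def rollout_sizes(num_envs, num_steps, num_minibatches, total_timesteps):
    """Derive batch sizes, ASSUMING the usual PPO bookkeeping (hypothetical helper)."""
    batch_size = num_envs * num_steps                # transitions per rollout
    minibatch_size = batch_size // num_minibatches   # transitions per minibatch
    num_iterations = total_timesteps // batch_size   # full rollout/update cycles
    return batch_size, minibatch_size, num_iterations


def build_candidates(description):
    """Fill both templates for every listed action (hypothetical helper).

    ASSUMPTION: one candidate string is built per action by appending the
    filled action template to the filled prompt.
    """
    prompt = VALUE_PROMPT_TEMPLATE.format(description)
    return [prompt + ACTION_TEMPLATE.format(action) for action in POSSIBLE_ACTIONS]


if __name__ == "__main__":
    print(rollout_sizes(LOCAL_NUM_ENVS, NUM_STEPS, NUM_MINIBATCHES, TOTAL_TIMESTEPS))
    # -> (512, 4, 781)

    # The description text below is invented for illustration only.
    for candidate in build_candidates("There is a hole ahead."):
        print(repr(candidate))
```

Under these assumptions each rollout collects 512 transitions, split into 128 minibatches of 4 transitions each, and 400000 total timesteps correspond to 781 complete iterations. If training was distributed across several processes, the global batch would scale with the process count and these numbers would change accordingly.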