---
license: other
tags:
- generated_from_trainer
- google/gemma
- PyTorch
- transformers
- trl
- peft
- tensorboard
model-index:
- name: gemma-2b-chat-ultra
  results: []
datasets:
- HuggingFaceH4/deita-10k-v0-sft
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
language:
- en
base_model: google/gemma-2b
widget:
- example_title: LLM
  messages:
  - role: user
    content: What is a large language model (LLM)?
pipeline_tag: text-generation
---

# Model Card for gemma-2b-chat-ultra 💬🤖

**gemma-2b-chat-ultra** is a language model trained to act as an AI assistant. It is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b), trained with `SFTTrainer` on the publicly available dataset [HuggingFaceH4/deita-10k-v0-sft](https://huggingface.co/datasets/HuggingFaceH4/deita-10k-v0-sft). A usage sketch is provided at the end of this card.

## Training Metrics

[The training metrics can be found on **TensorBoard**](https://huggingface.co/Menouar/gemma-2b-chat-ultra/tensorboard).

## Training hyperparameters

The following hyperparameters were used during training:

- output_dir: peft-lora-model
- overwrite_output_dir: True
- do_train: False
- do_eval: False
- do_predict: False
- evaluation_strategy: no
- prediction_loss_only: False
- per_device_train_batch_size: 2
- per_device_eval_batch_size: None
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- eval_delay: 0
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 0.3
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_dir: peft-lora-model/runs/Mar21_07-18-25_a518bf4a6403
- logging_strategy: steps
- logging_first_step: False
- logging_steps: 10
- logging_nan_inf_filter: True
- save_strategy: epoch
- save_steps: 500
- save_total_limit: None
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- eval_steps: None
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- run_name: peft-lora-model
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- metric_for_best_model: None
- greater_is_better: None
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True)
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- report_to: ['tensorboard']
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: None
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_token: None
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- include_inputs_for_metrics: False
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- push_to_hub_token: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- distributed_state: Distributed environment: NO, Num processes: 1, Process index: 0, Local process index: 0, Device: cuda
- _n_gpu: 1
- __cached__setup_devices: cuda:0
- deepspeed_plugin: None
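
## Usage

The snippet below is a minimal usage sketch with 🤗 Transformers. It assumes the repository ships a tokenizer with a chat template matching the user/assistant message format shown in the widget above; `max_new_tokens` and the greedy decoding setting are placeholders to adjust as needed.

```python
# Minimal usage sketch (assumes transformers >= 4.38 for Gemma support and a
# tokenizer chat template in the model repository).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Menouar/gemma-2b-chat-ultra"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bf16
    device_map="auto",
)

messages = [
    {"role": "user", "content": "What is a large language model (LLM)?"},
]

# Render the conversation with the chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the repository contains only a LoRA adapter rather than merged weights, the same snippet works after swapping `AutoModelForCausalLM` for `peft.AutoPeftModelForCausalLM`.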
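
## Fine-tuning sketch

To relate the hyperparameter list above to actual training code, the sketch below shows how those values map onto `TrainingArguments` and TRL's `SFTTrainer`. Only the values recorded above are taken from the run; the `LoraConfig` values, `max_seq_length`, `packing`, and the dataset split and column names are illustrative assumptions that are not documented in this card.

```python
# Illustrative reproduction sketch, not the exact training script.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", torch_dtype=torch.bfloat16)

# Assumes the dataset follows the HuggingFaceH4 SFT layout ("train_sft" split,
# "messages" column) and that the tokenizer defines a chat template.
dataset = load_dataset("HuggingFaceH4/deita-10k-v0-sft", split="train_sft")
dataset = dataset.map(
    lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)}
)

peft_config = LoraConfig(  # hypothetical adapter settings, not recorded in this card
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = TrainingArguments(  # values taken from the hyperparameter list above
    output_dir="peft-lora-model",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.0,
    max_grad_norm=0.3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    logging_steps=10,
    save_strategy="epoch",
    report_to="tensorboard",
    seed=42,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=1024,  # assumption, not recorded in this card
    packing=True,         # assumption, not recorded in this card
)
trainer.train()
```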