---
license: other
tags:
  - generated_from_trainer
  - google/gemma
  - PyTorch
  - transformers
  - trl
  - peft
  - tensorboard
model-index:
  - name: gemma-2b-chat-ultra
    results: []
datasets:
  - HuggingFaceH4/deita-10k-v0-sft
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
language:
  - en
base_model: google/gemma-2b
widget:
  - example_title: LLM
    messages:
      - role: user
        content: What is a large language model (LLM)?
pipeline_tag: text-generation
---

# Model Card for gemma-2b-chat-ultra

💬🤖

gemma-2b-chat-ultra is a language model trained to act as an AI assistant. It is a fine-tuned version of google/gemma-2b, trained with TRL's SFTTrainer on the publicly available HuggingFaceH4/deita-10k-v0-sft dataset.
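
Below is a minimal inference sketch (not part of the original card). It assumes the model is published under the repo id `Menouar/gemma-2b-chat-ultra`, that the tokenizer ships a chat template, and that a GPU is available:

```python
# Minimal inference sketch. Assumptions: repo id "Menouar/gemma-2b-chat-ultra",
# a chat template bundled with the tokenizer, and a CUDA-capable device.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Menouar/gemma-2b-chat-ultra",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Same example prompt as the model card widget.
messages = [{"role": "user", "content": "What is a large language model (LLM)?"}]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```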

## Training Metrics

The training metrics can be viewed in this repository's TensorBoard logs.
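
To browse those metrics locally, one option (a sketch, not from the card) is to download the event files with `huggingface_hub` and point TensorBoard at them; the repo id and the location of the event files are assumptions:

```python
# Sketch: fetch the TensorBoard event files from the Hub and inspect them locally.
# Assumptions: repo id "Menouar/gemma-2b-chat-ultra" and that the event files
# were uploaded alongside the model weights.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Menouar/gemma-2b-chat-ultra",
    allow_patterns=["*tfevents*", "runs/**"],
)
print(f"Logs downloaded to: {local_dir}")
# Then, from a shell: tensorboard --logdir <local_dir>
```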

## Training hyperparameters

The following hyperparameters were used during training (a condensed sketch of how they map onto TRL's SFTTrainer follows the list):

  • output_dir: peft-lora-model

  • overwrite_output_dir: True

  • do_train: False

  • do_eval: False

  • do_predict: False

  • evaluation_strategy: no

  • prediction_loss_only: False

  • per_device_train_batch_size: 2

  • per_device_eval_batch_size: None

  • per_gpu_train_batch_size: None

  • per_gpu_eval_batch_size: None

  • gradient_accumulation_steps: 4

  • eval_accumulation_steps: None

  • eval_delay: 0

  • learning_rate: 2e-05

  • weight_decay: 0.0

  • adam_beta1: 0.9

  • adam_beta2: 0.999

  • adam_epsilon: 1e-08

  • max_grad_norm: 0.3

  • num_train_epochs: 3

  • max_steps: -1

  • lr_scheduler_type: cosine

  • lr_scheduler_kwargs: {}

  • warmup_ratio: 0.1

  • warmup_steps: 0

  • log_level: passive

  • log_level_replica: warning

  • log_on_each_node: True

  • logging_dir: peft-lora-model/runs/Mar21_07-18-25_a518bf4a6403

  • logging_strategy: steps

  • logging_first_step: False

  • logging_steps: 10

  • logging_nan_inf_filter: True

  • save_strategy: epoch

  • save_steps: 500

  • save_total_limit: None

  • save_safetensors: True

  • save_on_each_node: False

  • save_only_model: False

  • no_cuda: False

  • use_cpu: False

  • use_mps_device: False

  • seed: 42

  • data_seed: None

  • jit_mode_eval: False

  • use_ipex: False

  • bf16: True

  • fp16: False

  • fp16_opt_level: O1

  • half_precision_backend: auto

  • bf16_full_eval: False

  • fp16_full_eval: False

  • tf32: None

  • local_rank: 0

  • ddp_backend: None

  • tpu_num_cores: None

  • tpu_metrics_debug: False

  • debug: []

  • dataloader_drop_last: False

  • eval_steps: None

  • dataloader_num_workers: 0

  • dataloader_prefetch_factor: None

  • past_index: -1

  • run_name: peft-lora-model

  • disable_tqdm: False

  • remove_unused_columns: True

  • label_names: None

  • load_best_model_at_end: False

  • metric_for_best_model: None

  • greater_is_better: None

  • ignore_data_skip: False

  • fsdp: []

  • fsdp_min_num_params: 0

  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}

  • fsdp_transformer_layer_cls_to_wrap: None

  • accelerator_config: AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True)

  • deepspeed: None

  • label_smoothing_factor: 0.0

  • optim: adamw_torch_fused

  • optim_args: None

  • adafactor: False

  • group_by_length: False

  • length_column_name: length

  • report_to: ['tensorboard']

  • ddp_find_unused_parameters: None

  • ddp_bucket_cap_mb: None

  • ddp_broadcast_buffers: None

  • dataloader_pin_memory: True

  • dataloader_persistent_workers: False

  • skip_memory_metrics: True

  • use_legacy_prediction_loop: False

  • push_to_hub: False

  • resume_from_checkpoint: None

  • hub_model_id: None

  • hub_strategy: every_save

  • hub_token: None

  • hub_private_repo: False

  • hub_always_push: False

  • gradient_checkpointing: True

  • gradient_checkpointing_kwargs: {'use_reentrant': False}

  • include_inputs_for_metrics: False

  • fp16_backend: auto

  • push_to_hub_model_id: None

  • push_to_hub_organization: None

  • push_to_hub_token: None

  • mp_parameters:

  • auto_find_batch_size: False

  • full_determinism: False

  • torchdynamo: None

  • ray_scope: last

  • ddp_timeout: 1800

  • torch_compile: False

  • torch_compile_backend: None

  • torch_compile_mode: None

  • dispatch_batches: None

  • split_batches: None

  • include_tokens_per_second: False

  • include_num_input_tokens_seen: False

  • neftune_noise_alpha: None

  • distributed_state: Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda

  • _n_gpu: 1

  • __cached__setup_devices: cuda:0

  • deepspeed_plugin: None
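
For reference, here is a condensed sketch of how the key hyperparameters above could be wired into TRL's `SFTTrainer` with a PEFT/LoRA setup. It is not the exact training script: the LoRA parameters, the dataset split name, the sequence length, and the chat-template preprocessing are assumptions shown only as placeholders.

```python
# Condensed training sketch based on the hyperparameters listed above.
# NOT the original script: LoRA settings, split name, max_seq_length and the
# chat-template preprocessing are illustrative assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Render each conversation ("messages" column) into a single text field.
dataset = load_dataset("HuggingFaceH4/deita-10k-v0-sft", split="train_sft")  # split name assumed
dataset = dataset.map(
    lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)}
)

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")  # placeholder values

args = TrainingArguments(
    output_dir="peft-lora-model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_grad_norm=0.3,
    weight_decay=0.0,
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    logging_steps=10,
    save_strategy="epoch",
    report_to=["tensorboard"],
    seed=42,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,  # not listed in the card; placeholder
    peft_config=peft_config,
)
trainer.train()
```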