---
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
license: apache-2.0
---

## Model Description

This model is Mistral 7B fine-tuned (with quantization) on a self-organised dataset about environmental knowledge. The model is still under development.

- **Developed by:** Fiona Zhang
- **Funded by:** CSIRO, Pawsey Supercomputing Research Centre
- **Fine-tuned from model:** [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)

## Uses

This repository includes only the adapter weights learned during fine-tuning. They should be loaded on top of the pre-trained Mistral 7B model and tokenizer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "mistralai/Mistral-7B-v0.1"

# Load the tokenizer and the pre-trained base model,
# adjusting the configuration if needed
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Load the fine-tuned adapter weights on top of the base model
fine_tuned_model = PeftModel.from_pretrained(
    base_model,
    "fionazhang/mistral_7b_environment",
)

# Now you can use `fine_tuned_model` for inference or further training
input_text = "The impact of climate change on"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output_ids = fine_tuned_model.generate(input_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If you prefer a standalone checkpoint, the adapter can also be merged into the base weights; see the sketch at the end of this card.

## Bias, Risks, and Limitations

No safety modifications have been applied to the model, so it may return undesired or offensive responses. Applying output filters is encouraged.

## Training Data

The fine-tuning data are parsed from these public Wikipedia articles:

- [Environmental Issues](https://en.wikipedia.org/wiki/Environmental_issues)
- [Natural Environment](https://en.wikipedia.org/wiki/Natural_environment)
- [Biophysical Environment](https://en.wikipedia.org/wiki/Biophysical_environment)
- [Ecology](https://en.wikipedia.org/wiki/Ecology)
- [Environment (Systems)](https://en.wikipedia.org/wiki/Environment_(systems))
- [Built Environment](https://en.wikipedia.org/wiki/Built_environment)
- [Climate Change](https://en.wikipedia.org/wiki/Climate_change)
- [Human Impact on the Environment](https://en.wikipedia.org/wiki/Human_impact_on_the_environment)
- [Environment of Australia](https://en.wikipedia.org/wiki/Environment_of_Australia)
- [Environmental Protection](https://en.wikipedia.org/wiki/Environmental_protection)
- [Environmental Issues in Australia](https://en.wikipedia.org/wiki/Environmental_issues_in_Australia)

The text corpus was preprocessed into a cleaner format before training.

## Training Procedure

The fine-tuning is self-supervised: the model is trained on the raw text corpus with a next-token prediction objective, using the hyperparameters below.

## Training Hyperparameters

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="wandb"
)
```

## Evaluation

Not yet evaluated; evaluation is still in progress.

## Environmental Impact

- **Hardware Type:** T4 GPU
- **Hours used:** <1
- **Cloud Provider:** Google Cloud
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

### Framework versions

- PEFT 0.7.1
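
## Merging the Adapter

To run the model without a PEFT dependency at inference time, the adapter can be merged into the base weights first. This is a minimal sketch, assuming the adapter is a LoRA-style adapter that supports `merge_and_unload()` and that there is enough memory to hold the full model; the output directory name is illustrative:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and attach the fine-tuned adapter
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
merged_model = PeftModel.from_pretrained(
    base_model,
    "fionazhang/mistral_7b_environment",
).merge_and_unload()

# Save a standalone checkpoint that no longer needs PEFT to load
merged_model.save_pretrained("mistral-7b-environment-merged")
```

The merged checkpoint can then be loaded with `AutoModelForCausalLM.from_pretrained("mistral-7b-environment-merged")` like any regular transformers model.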