---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: mistral-environment-all
  results: []
---

# mistral-environment-all

## Model Description

mistral-environment-all is a quantized Mistral 7B model fine-tuned on a self-organised dataset of environmental knowledge. The model is still under development.

- **Developed by:** Fiona Zhang
- **Funded by:** CSIRO, Pawsey Supercomputing Research Centre
- **Finetuned from model:** [Mistral7b](https://huggingface.co/mistralai/Mistral-7B-v0.1)

## Uses

This repository includes the weights learned during the fine-tuning process. They should be loaded together with the pre-trained Mistral 7B model and tokenizer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the base tokenizer and model; adjust the configuration if needed
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Text generation
def generate_text_sequences(pipe, prompt):
    sequences = pipe(
        prompt,
        do_sample=True,
        max_new_tokens=100,
        temperature=0.8,
        top_k=50,
        top_p=0.95,
        num_return_sequences=1,
    )
    return sequences[0]["generated_text"]

# Now you can use the model for inference
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    pad_token_id=2,
)
print(generate_text_sequences(pipe, "your prompt"))
```

## Training Data

The fine-tuning data were parsed from these public Wikipedia articles:

- [Environmental Issues](https://en.wikipedia.org/wiki/Environmental_issues)
- [Natural Environment](https://en.wikipedia.org/wiki/Natural_environment)
- [Biophysical Environment](https://en.wikipedia.org/wiki/Biophysical_environment)
- [Ecology](https://en.wikipedia.org/wiki/Ecology)
- [Environment (Systems)](https://en.wikipedia.org/wiki/Environment_(systems))
- [Built Environment](https://en.wikipedia.org/wiki/Built_environment)
- [Climate Change](https://en.wikipedia.org/wiki/Climate_change)
- [Human Impact on the Environment](https://en.wikipedia.org/wiki/Human_impact_on_the_environment)
- [Environment of Australia](https://en.wikipedia.org/wiki/Environment_of_Australia)
- [Environmental Protection](https://en.wikipedia.org/wiki/Environmental_protection)
- [Environmental Issues in Australia](https://en.wikipedia.org/wiki/Environmental_issues_in_Australia)

The text corpus was preprocessed into a cleaner format before fine-tuning.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of how they might map onto a TRL `SFTTrainer` run is given in the appendix at the end of this card):

- learning_rate: 0.0002
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 1

### Training results

### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.0a0+git7bcf7da
- Datasets 2.16.1
- Tokenizers 0.15.0

## Environmental Impact

- **Hardware Type:** Setonix (Pawsey Supercomputing Research Centre)
- **Hours used:** <1
- **Cloud Provider:** Google Cloud
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
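
## Appendix: Fine-Tuning Sketch

The hyperparameters and framework versions above are consistent with a TRL supervised fine-tuning (SFT) run. The sketch below is a minimal, hypothetical reconstruction of such a setup, assuming 4-bit quantization, a LoRA adapter, and a preprocessed corpus stored in `environment_corpus.jsonl` with a `text` column; the dataset file name, adapter settings, and quantization config are assumptions for illustration, not the exact recipe used for this model.

```python
# Hypothetical reconstruction of the fine-tuning setup from the hyperparameters
# above. The dataset file, LoRA settings, and 4-bit quantization config are
# assumptions for illustration, not the exact recipe used for this model.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

base_model = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# The card describes the model as quantized; 4-bit NF4 is one common choice.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

# Placeholder for the preprocessed Wikipedia corpus (assumed "text" column).
dataset = load_dataset("json", data_files="environment_corpus.jsonl", split="train")

# Mirrors the hyperparameters listed in the card; the Adam betas and epsilon
# shown above are already the TrainingArguments defaults.
args = TrainingArguments(
    output_dir="mistral-environment-all",
    learning_rate=2e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=1024,  # assumed sequence length
    peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32),  # assumed adapter
)
trainer.train()
```

With this kind of setup, the adapter weights saved to `output_dir` are what would be published in a repository like this one and applied on top of the pre-trained Mistral 7B at inference time.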