fionazhang committed
Commit ba2832b
1 Parent(s): 37e9970

Update README.md

Files changed (1)
  1. README.md +61 -7
README.md CHANGED
@@ -15,19 +15,62 @@ should probably proofread and complete it, then remove this comment. -->
 
  # mistral-environment-all
 
- This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
+ ## Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+ This model is a fine-tuned (quantized) version of Mistral 7B, trained on a self-curated dataset of environmental knowledge. It is still under development.
+
+ - **Developed by:** Fiona Zhang
+ - **Funded by:** CSIRO, Pawsey Supercomputing Research Centre
+ - **Finetuned from model:** [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)
+
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+ This repository includes the weights learned during the training process. They should be loaded together with the pre-trained Mistral 7B model and tokenizer.
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the tokenizer of the base model; adjust configuration if needed
+ base_model_name = "mistralai/Mistral-7B-v0.1"
+ tokenizer = AutoTokenizer.from_pretrained(base_model_name)
+
+ # Load the fine-tuned model with its trained weights
+ fine_tuned_model = AutoModelForCausalLM.from_pretrained(
+     "fionazhang/mistral_7b_environment",
+ )
+
+ # Now you can use `fine_tuned_model` for inference or further training
+ input_text = "The impact of climate change on"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids
+ output_ids = fine_tuned_model.generate(input_ids, max_new_tokens=50)
+
+ print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
+ ```
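+
+ The description above says this repository holds the weights learned during fine-tuning, to be combined with the pre-trained base model. If those weights are stored as a PEFT/LoRA adapter rather than a full checkpoint (an assumption, not confirmed by this card), loading could instead look like the following sketch using the `peft` library, with the base model loaded in 4-bit as the "(quantized)" note suggests:
+
+ ```python
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ # Load the base Mistral 7B in 4-bit (assumed setup; float16 compute suits a T4)
+ bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "mistralai/Mistral-7B-v0.1",
+     quantization_config=bnb_config,
+ )
+ tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
+
+ # Attach the fine-tuned weights as an adapter (assumes adapter-style weights)
+ model = PeftModel.from_pretrained(base_model, "fionazhang/mistral_7b_environment")
+ ```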
+
+
+ ## Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+ The fine-tuning data were parsed from the following public Wikipedia articles:
+ - [Environmental Issues](https://en.wikipedia.org/wiki/Environmental_issues)
+ - [Natural Environment](https://en.wikipedia.org/wiki/Natural_environment)
+ - [Biophysical Environment](https://en.wikipedia.org/wiki/Biophysical_environment)
+ - [Ecology](https://en.wikipedia.org/wiki/Ecology)
+ - [Environment (Systems)](https://en.wikipedia.org/wiki/Environment_(systems))
+ - [Built Environment](https://en.wikipedia.org/wiki/Built_environment)
+ - [Climate Change](https://en.wikipedia.org/wiki/Climate_change)
+ - [Human Impact on the Environment](https://en.wikipedia.org/wiki/Human_impact_on_the_environment)
+ - [Environment of Australia](https://en.wikipedia.org/wiki/Environment_of_Australia)
+ - [Environmental Protection](https://en.wikipedia.org/wiki/Environmental_protection)
+ - [Environmental Issues in Australia](https://en.wikipedia.org/wiki/Environmental_issues_in_Australia)
+
+ The text corpus was then preprocessed into a cleaner format.
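+
+ As a rough illustration, the parsing and preprocessing step might look like this sketch (the MediaWiki extracts API call, function name, and cleaning rules are assumptions, not the documented pipeline):
+
+ ```python
+ import re
+ import requests
+
+ API_URL = "https://en.wikipedia.org/w/api.php"
+
+ def fetch_plain_text(title: str) -> str:
+     """Fetch the plain-text extract of one Wikipedia article."""
+     params = {
+         "action": "query",
+         "prop": "extracts",
+         "explaintext": 1,
+         "format": "json",
+         "titles": title,
+     }
+     pages = requests.get(API_URL, params=params).json()["query"]["pages"]
+     return next(iter(pages.values())).get("extract", "")
+
+ # Build the corpus, collapsing repeated blank lines left by section headings
+ titles = ["Environmental issues", "Climate change", "Ecology"]
+ corpus = [re.sub(r"\n{2,}", "\n", fetch_plain_text(t)).strip() for t in titles]
+ ```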
 
  ## Training procedure
 
@@ -53,3 +96,14 @@ The following hyperparameters were used during training:
  - Pytorch 2.1.0a0+git7bcf7da
  - Datasets 2.16.1
  - Tokenizers 0.15.0
+
+
+ ## Environmental Impact
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ - **Hardware Type:** T4 GPU
+ - **Hours used:** <1
+ - **Cloud Provider:** Google Cloud
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
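+ Since the emissions figure is still to be filled in, one way to measure it on a future run is the `codecarbon` package (a sketch; the project name is illustrative):
+
+ ```python
+ from codecarbon import EmissionsTracker
+
+ # Wrap the fine-tuning run with an emissions tracker
+ tracker = EmissionsTracker(project_name="mistral-environment-all")
+ tracker.start()
+ # ... run fine-tuning here ...
+ emissions_kg = tracker.stop()  # returns emissions in kg of CO2eq
+ print(f"Carbon emitted: {emissions_kg:.4f} kg CO2eq")
+ ```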