---
license: mit
datasets:
- SustcZhangYX/ChatEnv
language:
- en
tags:
- Environmental Science
---
<div align="center">
<img src="LOGO.PNG" width="450px">
<h1 align="center"><font face="Arial">EnvGPT: Leveraging a Large Language Model for Environmental Science</font></h1>
</div>
**EnvGPT** is the first domain-specific large language model tailored for environmental science tasks.
Environmental science presents unique challenges for LLMs due to its interdisciplinary nature. EnvGPT was developed to address these challenges by leveraging a domain-specific environmental science instruction dataset and benchmark.
*The model was fine-tuned on the environmental science instruction dataset [ChatEnv](https://huggingface.co/datasets/SustcZhangYX/ChatEnv) through Supervised Fine-Tuning (SFT). The dataset contains **107,197,329** tokens in total, reflecting its depth and coverage of environmental science tasks.*
## 🚀 Getting Started
### Download the model
Download the model: [EnvGPT](https://huggingface.co/SustcZhangYX/EnvGPT)
```shell
git lfs install
git clone https://huggingface.co/SustcZhangYX/EnvGPT
```
### Model Usage
Here is a Python code snippet that demonstrates how to load the tokenizer and model and generate text using EnvGPT.
```python
import torch
import transformers

# Set the path to your local model
model_path = "YOUR_LOCAL_MODEL_PATH"

# Build a text-generation pipeline; this loads both the tokenizer and the model
pipeline = transformers.pipeline(
    "text-generation",
    model=model_path,  # Use local model path
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert assistant in environmental science, EnvGPT.You are a helpful assistant."},
    {"role": "user", "content": "What is the definition of environmental science?"},
]

# Pass the sampling parameters directly in the pipeline call
outputs = pipeline(
    messages,
    max_new_tokens=4096,
    do_sample=True,  # Enable sampling so top_p and temperature take effect
    top_p=0.7,  # Nucleus sampling
    temperature=0.9,  # Temperature control
)

# For chat-style input, generated_text holds the conversation including the new reply
print(outputs[0]["generated_text"])
```
This code demonstrates how to load the tokenizer and model from your local path, define environmental science-specific prompts, and generate responses using sampling techniques like top-p and temperature.
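When the pipeline is called with a list of chat messages, `generated_text` typically holds the full conversation rather than a single string, so printing it shows the system and user turns as well. The helper below is a minimal sketch for pulling out only the assistant's answer. The function name `extract_assistant_reply` and the `sample_outputs` data are hypothetical and used for illustration only; the sample is not real EnvGPT output.

```python
from __future__ import annotations

from typing import Any


def extract_assistant_reply(outputs: list[dict[str, Any]]) -> str:
    """Return the assistant's reply from a text-generation pipeline result.

    Assumes the output shape produced for chat-style input: a list with one
    dict whose "generated_text" is either a plain string or a list of
    {"role": ..., "content": ...} messages.
    """
    generated = outputs[0]["generated_text"]

    # Plain-string prompts yield a plain string; return it unchanged.
    if isinstance(generated, str):
        return generated

    # Chat prompts yield the whole conversation; take the last assistant turn.
    for message in reversed(generated):
        if message.get("role") == "assistant":
            return message["content"]

    raise ValueError("No assistant message found in pipeline output")


# Illustrative stand-in for a pipeline result (not real model output).
sample_outputs = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the definition of environmental science?"},
            {"role": "assistant", "content": "Placeholder answer."},
        ]
    }
]

print(extract_assistant_reply(sample_outputs))
```

With the snippet above, `extract_assistant_reply(outputs)` could replace the final `print(outputs[0]["generated_text"])` call when only the answer text is needed.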
## 🌏 Acknowledgement
EnvGPT is fine-tuned from the open-source [LLaMA](https://huggingface.co/meta-llama) models. We thank Meta AI for their contributions to the community.
## ❗Disclaimer
This project is intended solely for academic research and exploration. Please note that, like all large language models, this model may exhibit limitations, including potential inaccuracies or hallucinations in generated outputs.
## Limitations
- The model may produce hallucinated outputs or inaccuracies, which are inherent to large language models.
- The model's self-identification has not been specifically optimized, so it may generate content that resembles outputs from other LLaMA-based models or similar architectures.
- Generated outputs can vary between attempts due to sensitivity to prompt phrasing and token context.
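
Part of the run-to-run variation comes from sampling itself, in addition to prompt sensitivity. As a rough sketch, the helper below builds generation arguments that either reuse the sampling settings from the usage example or switch to greedy decoding (`do_sample=False`), which usually gives more repeatable outputs for a fixed prompt. The function name `build_generation_kwargs` and the default `max_new_tokens=512` are assumptions for illustration, not settings recommended by the EnvGPT authors.

```python
from __future__ import annotations

from typing import Any


def build_generation_kwargs(
    deterministic: bool = False,
    max_new_tokens: int = 512,  # hypothetical default, chosen for illustration
) -> dict[str, Any]:
    """Build keyword arguments for a text-generation pipeline call."""
    if deterministic:
        # Greedy decoding: no sampling, so top_p and temperature are omitted.
        return {"max_new_tokens": max_new_tokens, "do_sample": False}

    # Sampling settings taken from the usage example in this README.
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "top_p": 0.7,
        "temperature": 0.9,
    }


print(build_generation_kwargs(deterministic=True))
```

The result can be unpacked into the pipeline call from the usage example, for instance `pipeline(messages, **build_generation_kwargs(deterministic=True))`. Greedy decoding reduces variation between attempts but does not remove sensitivity to prompt phrasing.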
## 🚩Citation
If you use EnvGPT in your research or applications, please cite this work as follows:
```Markdown
[Placeholder for Citation]
Please refer to the forthcoming publication for details about EnvGPT.
This section will be updated with the citation once the paper is officially published.
```