OceanGPT
This repository contains OceanGPT, a large language model for ocean science tasks trained with KnowLM. Note that OceanGPT is continually being updated, so the current model is not the final version.
OceanGPT-7B-v0.2 is based on Qwen2-7B and trained on a bilingual dataset in Chinese and English.
You can download the model to generate responses, or contact us by email for the online test demo.
We will provide several examples soon, and you can modify the input according to your needs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "zjunlp/OceanGPT-7B-v0.2",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("zjunlp/OceanGPT-7B-v0.2")

prompt = "Which is the largest ocean in the world?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into the model's prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text remains.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
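The list comprehension above removes the prompt tokens from each generated sequence, since `generate` returns the prompt followed by the new tokens. In isolation, the slicing logic works like this (the token IDs here are illustrative placeholders, not real model output):

```python
# Illustrative token IDs only; a real model works with tensors, not lists.
input_ids_batch = [[101, 7592, 102]]               # prompt tokens for one example
output_ids_batch = [[101, 7592, 102, 2054, 2003]]  # generate() echoes the prompt first

# Keep only the tokens produced after the prompt.
new_tokens = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
]
print(new_tokens)  # [[2054, 2003]]
```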
Note: We are conducting final checks on OceanBench and will upload it to Hugging Face soon.
```python
>>> from datasets import load_dataset
>>> dataset = load_dataset("zjunlp/OceanBench")
```
```bibtex
@article{bi2023oceangpt,
  title={OceanGPT: A Large Language Model for Ocean Science Tasks},
  author={Bi, Zhen and Zhang, Ningyu and Xue, Yida and Ou, Yixin and Ji, Daxiong and Zheng, Guozhou and Chen, Huajun},
  journal={arXiv preprint arXiv:2310.02031},
  year={2023}
}
```