license: apache-2.0
language:
- ru
- en
- de
- es
- it
- ja
- vi
- zh
- fr
- pt
- id
- ko
pipeline_tag: text-generation
🌍 Vulture-180B
Vulture-180B is a further fine-tuned causal decoder-only LLM built by Virtual Interactive (VILM) on top of Falcon-180B by TII. We collected a new dataset of news articles and Wikipedia pages in 12 languages (80 GB in total) and continued pretraining Falcon-180B on it. Finally, we constructed a multilingual instruction dataset following Alpaca's techniques.
While Vulture-180B is an adapter freely usable under Apache-2.0, Falcon-180B itself remains available only under the Falcon-180B TII License and Acceptable Use Policy. Users should ensure that any commercial application based on Vulture-180B complies with the restrictions on Falcon-180B's use.
Technical Report coming soon 🤗
Prompt Format
The recommended prompt format is:
A chat between a curious user and an artificial intelligence assistant.
USER:{user's question}<|endoftext|>ASSISTANT:
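As a minimal sketch, the template above can be filled in with a small helper. The function name `build_prompt` and the constant `SYSTEM_PROMPT` are hypothetical, introduced only for illustration; the blank line between the system sentence and `USER:` is an assumption taken from the inference example later in this card.

```python
# Hypothetical helper for illustration; not part of any library.
SYSTEM_PROMPT = "A chat between a curious user and an artificial intelligence assistant."


def build_prompt(question: str, system: str = SYSTEM_PROMPT) -> str:
    """Fill the Vulture-180B prompt template with a single user question."""
    # Assumption: the system sentence and the USER turn are separated by a
    # blank line, as in the inference example in this card.
    return f"{system}\n\nUSER:{question}<|endoftext|>ASSISTANT:"


if __name__ == "__main__":
    print(build_prompt("Where is Ho Chi Minh City located?"))
```

The returned string can be passed directly to the tokenizer; the model's reply is whatever it generates after `ASSISTANT:`.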
Model Details
Model Description
- Developed by: https://www.tii.ae
- Finetuned by: Virtual Interactive
- Language(s) (NLP): English, German, Spanish, French, Portuguese, Russian, Italian, Vietnamese, Indonesian, Chinese, Japanese and Korean
- Training Time: 3,000 A100 Hours
Acknowledgement
- Thanks to TII for the amazing Falcon as the foundation model.
- Big thanks to Google for their generous Cloud credits.
Out-of-Scope Use
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
Bias, Risks, and Limitations
Vulture-180B is trained on large-scale corpora representative of the web, so it will carry the stereotypes and biases commonly encountered online.
Recommendations
We recommend that users of Vulture-180B consider fine-tuning it for their specific tasks of interest, and that guardrails and appropriate precautions be put in place for any production use.
How to Get Started with the Model
To run inference with the model in full bfloat16 precision, you need approximately 8x A100 80GB GPUs or equivalent.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model and the Vulture adapter applied on top of it
model = "tiiuae/falcon-180b"
adapters_name = "vilm/vulture-180b"

tokenizer = AutoTokenizer.from_pretrained(model)
# Load the base weights in bfloat16, sharded across the available GPUs
m = AutoModelForCausalLM.from_pretrained(model, torch_dtype=torch.bfloat16, device_map="auto")
# Attach the adapter weights to the base model
m = PeftModel.from_pretrained(m, adapters_name)

# The question in the prompt means "Where is Ho Chi Minh City located?"
prompt = "A chat between a curious user and an artificial intelligence assistant.\n\nUSER:Thành phố Hồ Chí Minh nằm ở đâu?<|endoftext|>ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# Sample up to 50 new tokens
output = m.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=50,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))
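The hardware figure quoted above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 180 billion parameters for the base model and 2 bytes per parameter in bfloat16; it counts weights only and ignores activations, the KV cache, and the adapter, so treat it as a rough lower bound rather than a measured requirement. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


# Assumption: ~180e9 parameters, stored in bfloat16 (2 bytes each).
N_PARAMS = 180e9
weights_gb = weight_memory_gb(N_PARAMS)

# Eight A100 GPUs with 80 GB each, as quoted in this card.
total_gpu_gb = 8 * 80
headroom_gb = total_gpu_gb - weights_gb

print(f"weights: {weights_gb:.0f} GB, available: {total_gpu_gb} GB, headroom: {headroom_gb:.0f} GB")
```

Under these assumptions the weights alone would not fit on four 80 GB GPUs, while eight leave room for activations and generation state.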