vilsonrodrigues/falcon-7b-instruct-sharded

@@ -1,9 +1,22 @@
 ---
 datasets:
-- tiiuae/falcon-refinedweb
 language:
-- en
 inference: true
 license: apache-2.0
 ---
@@ -22,12 +35,16 @@ Tutorial: https://medium.com/@vilsonrodrigues/run-your-private-llm-falcon-7b-ins
 *Paper coming soon 😊.*
 ## Why use Falcon-7B-Instruct?
 * **You are looking for a ready-to-use chat/instruct model based on [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).**
 * **Falcon-7B is a strong base model, outperforming comparable open-source models** (e.g., [MPT-7B](https://huggingface.co/mosaicml/mpt-7b), [StableLM](https://github.com/Stability-AI/StableLM), [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1) etc.), thanks to being trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora. See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 * **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).
 💬 **This is an instruct model, which may not be ideal for further finetuning.** If you are interested in building your own instruct/chat model, we recommend starting from [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).
 🔥 **Looking for an even more powerful model?** [Falcon-40B-Instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) is Falcon-7B-Instruct's big brother!
@@ -43,7 +60,6 @@ pipeline = transformers.pipeline(
     model=model,
     tokenizer=tokenizer,
     torch_dtype=torch.bfloat16,
-    trust_remote_code=True,
     device_map="auto",
 )
 sequences = pipeline(
@@ -60,6 +76,10 @@ for seq in sequences:
 💥 **Falcon LLMs require PyTorch 2.0 for use with `transformers`!**
 # Model Card for Falcon-7B-Instruct
@@ -109,7 +129,6 @@ pipeline = transformers.pipeline(
     model=model,
     tokenizer=tokenizer,
     torch_dtype=torch.bfloat16,
-    trust_remote_code=True,
     device_map="auto",
 )
 sequences = pipeline(

 ---
 datasets:
+  - tiiuae/falcon-refinedweb
 language:
+  - en
 inference: true
+widget:
+  - text: "Hey Falcon! Any recommendations for my holidays in Abu Dhabi?"
+    example_title: "Abu Dhabi Trip"
+  - text: "What's the Everett interpretation of quantum mechanics?"
+    example_title: "Q/A: Quantum & Answers"
+  - text: "Give me a list of the top 10 dive sites you would recommend around the world."
+    example_title: "Diving Top 10"
+  - text: "Can you tell me more about deep-water soloing?"
+    example_title: "Extreme sports"
+  - text: "Can you write a short tweet about the Apache 2.0 release of our latest AI model, Falcon LLM?"
+    example_title: "Twitter Helper"
+  - text: "What are the responsibilities of a Chief Llama Officer?"
+    example_title: "Trendy Jobs"
 license: apache-2.0
 ---
 *Paper coming soon 😊.*
+🤗 To get started with Falcon (inference, finetuning, quantization, etc.), we recommend reading [this great blogpost from HF](https://huggingface.co/blog/falcon)!
 ## Why use Falcon-7B-Instruct?
 * **You are looking for a ready-to-use chat/instruct model based on [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).**
 * **Falcon-7B is a strong base model, outperforming comparable open-source models** (e.g., [MPT-7B](https://huggingface.co/mosaicml/mpt-7b), [StableLM](https://github.com/Stability-AI/StableLM), [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1) etc.), thanks to being trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora. See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 * **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).
+⚠️ Falcon is now available as a core model in the `transformers` library! To use the in-library version, install the latest version of `transformers` with `pip install git+https://github.com/huggingface/transformers.git`, then remove the `trust_remote_code=True` argument from `from_pretrained()`.
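As a rough illustration of the change described above, the sketch below builds the same text-generation pipeline as the snippet in this card, without `trust_remote_code=True`. The helper name `build_falcon_pipeline` is hypothetical, and the model ID in the usage comment is simply the repository named at the top of this page. It assumes a `transformers` release that ships Falcon in-library, plus `torch` and `accelerate` (needed for `device_map="auto"`).

```python
def build_falcon_pipeline(model_id: str):
    """Build a text-generation pipeline that relies on the in-library Falcon code.

    `build_falcon_pipeline` is a hypothetical helper name used for illustration.
    """
    # Imported lazily so that defining the helper does not load the heavy dependencies.
    import torch
    import transformers

    tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
    return transformers.pipeline(
        "text-generation",
        model=model_id,
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # assumes the `accelerate` package is installed
        # trust_remote_code=True is intentionally omitted here.
    )


# Example usage (downloads the weights, so it is left commented out):
# pipe = build_falcon_pipeline("vilsonrodrigues/falcon-7b-instruct-sharded")
# print(pipe("Hey Falcon! Any recommendations for my holidays in Abu Dhabi?",
#            max_new_tokens=50)[0]["generated_text"])
```

With an older `transformers` release that lacks the in-library Falcon classes, loading will likely fail without `trust_remote_code=True`, so keep the argument until the library has been upgraded.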
 💬 **This is an instruct model, which may not be ideal for further finetuning.** If you are interested in building your own instruct/chat model, we recommend starting from [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).
 🔥 **Looking for an even more powerful model?** [Falcon-40B-Instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) is Falcon-7B-Instruct's big brother!
     model=model,
     tokenizer=tokenizer,
     torch_dtype=torch.bfloat16,
     device_map="auto",
 )
 sequences = pipeline(
 💥 **Falcon LLMs require PyTorch 2.0 for use with `transformers`!**
+For fast inference with Falcon, check out [Text Generation Inference](https://github.com/huggingface/text-generation-inference)! Read more in this [blogpost](https://huggingface.co/blog/falcon).
+You will need **at least 16GB of memory** to swiftly run inference with Falcon-7B-Instruct.
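A back-of-the-envelope check helps put the 16GB figure in context. The sketch below estimates the memory taken by the weights alone, assuming a nominal 7 billion parameters (the exact count is not stated here) and 2 bytes per parameter for `bfloat16`. It ignores activations, the attention cache, and framework overhead, which is one plausible reason the recommendation leaves some headroom.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter count for a "7B" model; an assumption, not an exact figure.
NOMINAL_PARAMS = 7e9

bf16_gb = estimate_weight_memory_gb(NOMINAL_PARAMS, bytes_per_param=2)
fp32_gb = estimate_weight_memory_gb(NOMINAL_PARAMS, bytes_per_param=4)
print(f"bfloat16 weights: ~{bf16_gb:.0f} GB, float32 weights: ~{fp32_gb:.0f} GB")
```

Under these assumptions the `bfloat16` weights take roughly 14 GB, while `float32` weights would need about twice that.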
 # Model Card for Falcon-7B-Instruct
     model=model,
     tokenizer=tokenizer,
     torch_dtype=torch.bfloat16,
     device_map="auto",
 )
 sequences = pipeline(