---
datasets:
- tiiuae/falcon-refinedweb
language:
- en
---

# Falcon-RW-1B

**Falcon-RW-1B is a 1B-parameter causal decoder-only model built by [TII](https://www.tii.ae) and trained on 350B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb). It is made available under the [TII Falcon LLM License](https://huggingface.co/tiiuae/falcon-rw-1b/blob/main/LICENSE.txt).**

RefinedWeb is a high-quality web dataset built by leveraging stringent filtering and large-scale deduplication. Falcon-RW-1B, trained on RefinedWeb only, matches or outperforms comparable models trained on curated data.

This model is intended for use as a research artifact, to study the influence of training on appropriately filtered web data alone.

# Model Card for Falcon-RW-1B

## Model Details

### Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Language(s) (NLP):** English
- **License:** TII Falcon LLM License

### Model Source

- **Paper:** coming soon
- **Demo:** coming soon

## Uses

### Direct Use

Research on large language models, and on the influence of adequately filtered and deduplicated web data on the properties of large language models (fairness, safety, limitations, capabilities, etc.).

### Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

Falcon-RW models are trained on English data only and will not generalize appropriately to other languages. Furthermore, as they are trained on a large-scale corpus representative of the web, they will carry the stereotypes and biases commonly encountered online.

## Paper

More details coming soon in the paper.
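For research use as described above, the model can be loaded with the Hugging Face `transformers` library. The sketch below is a minimal, hedged example (the `MODEL_ID` and generation parameters are illustrative defaults, not prescribed by this card); note that building the pipeline downloads the model weights on first use.

```python
from transformers import pipeline

MODEL_ID = "tiiuae/falcon-rw-1b"


def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a continuation of `prompt` with Falcon-RW-1B.

    Illustrative sketch: assumes `transformers` and `torch` are
    installed; the first call downloads the model weights from the
    Hugging Face Hub.
    """
    generator = pipeline("text-generation", model=MODEL_ID)
    result = generator(prompt, max_new_tokens=max_new_tokens)
    return result[0]["generated_text"]
```

For example, `generate("The RefinedWeb dataset is")` returns the prompt followed by a model-written continuation; sampling settings such as `do_sample` or `temperature` can be passed through the pipeline call for more varied outputs.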