---
datasets:
- tiiuae/falcon-refinedweb
language:
- en
---

# Falcon-RW-1B

**Falcon-RW-1B is a 1B-parameter causal decoder-only model built by [TII](https://www.tii.ae) and trained on 350B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb). It is made available under the [TII Falcon LLM License](https://huggingface.co/tiiuae/falcon-rw-1b/blob/main/LICENSE.txt).**

RefinedWeb is a high-quality web dataset built by leveraging stringent filtering and large-scale deduplication. Falcon-RW-1B, trained on RefinedWeb only, matches or outperforms comparable models trained on curated data.

This model is intended for use as a research artifact, to study the influence of training on appropriately filtered web data alone.


# Model Card for Falcon-RW-1B


## Model Details

### Model Description

- **Developed by:** [TII](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Language(s) (NLP):** English
- **License:** TII Falcon LLM License

### Model Source

- **Paper:** coming soon
- **Demo:** coming soon

## Uses

### Direct Use

Research on large language models, in particular on how adequately filtered and deduplicated web data influences their properties (fairness, safety, limitations, capabilities, etc.).
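For research use, the model can presumably be loaded like any other causal language model on the Hugging Face Hub. The following is a minimal sketch, not an official recipe. It assumes PyTorch and a recent version of the `transformers` library with native Falcon support are installed, and that the repository ID is `tiiuae/falcon-rw-1b` (inferred from the license URL in this card). The helper name `generate_text` and the sampling settings are illustrative choices.

```python
# Repository ID inferred from the license URL in this card (an assumption).
MODEL_ID = "tiiuae/falcon-rw-1b"


def generate_text(prompt: str, max_new_tokens: int = 50, model_id: str = MODEL_ID) -> str:
    """Sample a continuation of `prompt` (hypothetical helper, downloads weights on first call)."""
    # Imported lazily so that defining this helper does not require the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")

    # Sample without tracking gradients; top-k sampling is an arbitrary illustrative choice.
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_k=10,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode the full sequence (prompt plus continuation) back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The influence of web data on language models is")` downloads the weights on first use and returns the prompt followed by a sampled continuation. Because sampling is enabled, the output differs between runs.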

### Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases that may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

Falcon-RW models are trained on English data only, and will not generalize appropriately to other languages. Furthermore, as they are trained on large-scale corpora representative of the web, they will carry the stereotypes and biases commonly encountered online.


## Paper

More details coming soon in the paper.