hy1111 committed (verified) · Commit 2011329 · Parent: e7462e2

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -23,7 +23,7 @@ The training data is sourced from two types of datasets:
  - **Coarse semantic labels**: 8.5 million images paired with captions of varying quality, ranging from well-defined descriptions to noisy and less relevant text.

  ### 2. Data Filtering
- To refine the coarse dataset, we propose a data filtering strategy using the CLIP-based model, $\text{CLIP}_{\text{Sem}}$. This model is pre-trained on high-quality captions to ensure that only semantically accurate image-text pairs are retained. The similarity scores (SS) between each image-text pair are calculated, and captions with low similarity are discarded.
+ To refine the coarse dataset, we propose a data filtering strategy using the CLIP-based model, CLIP_Sem. This model is pre-trained on high-quality captions to ensure that only semantically accurate image-text pairs are retained. The similarity scores (SS) between each image-text pair are calculated, and captions with low similarity are discarded.

  ![Data Purification Process](figure/newversion.png)
  *Figure 1: Data Refinement Process of the CLIP-RS Dataset. Left: Workflow for filtering and refining low-quality captions. Right: Examples of low-quality captions and their refined versions.*
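
As a rough illustration of the filtering step described above (not the repository's actual pipeline), the sketch below scores each image-caption pair with a generic CLIP checkpoint and keeps only pairs whose similarity clears a threshold. The checkpoint name, the threshold value, and the `pairs` list are placeholders standing in for CLIP_Sem and the CLIP-RS coarse data, which are not specified in this excerpt.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder checkpoint: stands in for CLIP_Sem, which is pre-trained on
# high-quality captions; any CLIP-style model exposes the same interface here.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def similarity_score(image: Image.Image, caption: str) -> float:
    """Cosine similarity (SS) between the CLIP image and text embeddings."""
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).item()

# Placeholder pairs: in practice this loop would run over the 8.5M coarse pairs.
pairs = [
    (Image.open("images/airport_0001.jpg"), "an aerial view of an airport with two runways"),
    (Image.open("images/airport_0002.jpg"), "nice photo uploaded from my phone"),
]

# Illustrative threshold; the actual cut-off used for CLIP-RS is not stated here.
THRESHOLD = 0.25
kept = [(img, cap) for img, cap in pairs if similarity_score(img, cap) >= THRESHOLD]
```

Pairs that fail the similarity check would then be handled by the caption-refinement workflow shown in Figure 1 rather than being used as-is.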