# UVQ: Universal Video Quality Model
This repository contains checkpointed models of Google's Universal Video Quality (UVQ) model.
UVQ is a no-reference perceptual video quality assessment model that is designed to work
well on user-generated content, where there is no pristine reference.
Read this blog post for an overview of UVQ:
"[UVQ: Measuring YouTube's Perceptual Video Quality](https://ai.googleblog.com/2022/08/uvq-measuring-youtubes-perceptual-video.html)", Google AI Blog 2022
More details are available in our paper:
Yilin Wang, Junjie Ke, Hossein Talebi, Joong Gon Yim, Neil Birkbeck, Balu Adsumilli, Peyman Milanfar, Feng Yang, "[Rich features for perceptual quality assessment of UGC videos](https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Rich_Features_for_Perceptual_Quality_Assessment_of_UGC_Videos_CVPR_2021_paper.html)", CVPR 2021.
The corresponding data from the paper is available for download from: [YouTube UGC Dataset](https://media.withyoutube.com)
## Running the code
### Dependencies
You must have [FFmpeg](http://www.ffmpeg.org/) installed and available on your path.
The models and code require Python 3.6 (or greater) and [Tensorflow](https://www.tensorflow.org/install).
With virtualenv, you can install the requirements to a virtual environment:
```
virtualenv venv
source venv/bin/activate
pip3 install -r requirements.txt
```
### Predict Quality
You can grab some examples videos from the [YouTube UGC Dataset](https://media.withyoutube.com). For example, you can get Gaming_1080P-0ce6_orig.mp4 using curl:
```
curl -o Gaming_1080P-0ce6_orig.mp4 https://storage.googleapis.com/ugc-dataset/vp9_compressed_videos/Gaming_1080P-0ce6_orig.mp4
```
You can then run the example:
```bash
mkdir -p results
python3 uvq_main.py --input_files="Gaming_1080P-0ce6_orig,20,Gaming_1080P-0ce6_orig.mp4" --output_dir results --model_dir models
```
#### Input file formatting
The input files format is a line with the following fields:
`id,video_length,filepath`
#### Results
The `output_dir` will contain a csv file with the results for each model. For example,
```bash
cat results/mos_ytugc20s_0_Gaming_1080P-0ce6_orig.mp4_orig.csv
```
Gives:
```bash
Gaming_1080P-0ce6,compression,3.927867603302002
Gaming_1080P-0ce6,content,3.945391607284546
Gaming_1080P-0ce6,distortion,4.267196607589722
Gaming_1080P-0ce6,compression_content,3.9505696296691895
Gaming_1080P-0ce6,compression_distortion,4.062019920349121
Gaming_1080P-0ce6,content_distortion,4.067790699005127
Gaming_1080P-0ce6,compression_content_distortion,4.058663845062256
```
We provide multiple predcited scores, using different combinations of UVQ features.
`compression_content_distortion` (combining three features) is our default score for Mean Opinion Score (MOS) prediction.
The output features folder includes UVQ labels and raw features:
```bash
Gaming_1080P-0ce6_orig_feature_compression.binary
Gaming_1080P-0ce6_orig_feature_content.binary
Gaming_1080P-0ce6_orig_feature_distortion.binary
Gaming_1080P-0ce6_orig_label_compression.csv
Gaming_1080P-0ce6_orig_label_content.csv
Gaming_1080P-0ce6_orig_label_distortion.csv
```
UVQ labels (.csv, each row corresponding to 1s chunk):
compression: 16 compression levels per row, corresponding to 4x4 subregions of the entire frame.
distortion: 26 distortion types defined in [KADID-10k](http://database.mmsp-kn.de/kadid-10k-database.html) for 2x2 subregions. The first element is the undefined type.
content: 3862 content labels defined in [YouTube-8M](https://research.google.com/youtube8m/).
UVQ raw features (in binary):
25600 float numbers per 1s chunk.
## Contributors
[//]: contributor-faces
[//]: contributor-faces