# Neural net based entropy coding

This is a [TensorFlow](http://www.tensorflow.org/) model for additional
lossless compression of bitstreams generated by neural net based image
encoders as described in
[https://arxiv.org/abs/1703.10114](https://arxiv.org/abs/1703.10114).
To be more specific, the entropy coder aims at further compressing binary
codes which have a 3D tensor structure with:

* the first two dimensions of the tensor corresponding to the height and
  the width of the binary codes,
* the last dimension being the depth of the codes. The last dimension can be
  sliced into N groups of K bits, where each additional group is used by the
  image decoder to add more detail to the reconstructed image.
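For illustration, such a code tensor and its per-layer slicing might look like
the following (a minimal NumPy sketch; the shapes are arbitrary and not taken
from any actual image encoder):

```python
import numpy as np

height, width = 16, 16
layer_depth, layer_count = 32, 8  # K bits per layer, N layers

# Binary codes with shape (height, width, N * K), values in {0, 1}.
codes = np.random.randint(
    0, 2, size=(height, width, layer_count * layer_depth))

# Layer i is a slice of K consecutive bits along the depth dimension;
# each additional layer lets the image decoder add more detail.
layers = [codes[:, :, i * layer_depth:(i + 1) * layer_depth]
          for i in range(layer_count)]
```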
The code in this directory only contains the underlying code probability model
but does not perform the actual compression using arithmetic coding.
The code probability model is enough to compute the theoretical compression
ratio.
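The theoretical code length follows directly from the probability model: a bit
that the model predicts to be 1 with probability p costs -log2(p) bits if it is
indeed 1 and -log2(1 - p) bits otherwise. The helper below is not part of this
repository; it is only an illustrative sketch of that computation:

```python
import numpy as np

def theoretical_compression_ratio(codes, probabilities):
    """Estimate the gain of an ideal arithmetic coder driven by the model.

    codes: binary array in {0, 1}, e.g. of shape (height, width, depth).
    probabilities: predicted P(bit == 1) for every position, same shape.
    """
    p = np.clip(probabilities, 1e-9, 1.0 - 1e-9)  # avoid log(0)
    # Ideal code length of each bit is its negative log-likelihood in bits.
    bits = -(codes * np.log2(p) + (1 - codes) * np.log2(1 - p))
    # The raw representation costs exactly 1 bit per binary symbol.
    return codes.size / bits.sum()
```

This is the kind of ratio that the inference step described below reports for a
single sample.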
## Prerequisites

The only software requirement for running the encoder and decoder is having
TensorFlow installed.

You will also need to add the top level source directory of the entropy coder
to your `PYTHONPATH`, for example:

`export PYTHONPATH=${PYTHONPATH}:/tmp/models/compression`
## Training the entropy coder

### Synthetic dataset

If you do not have a training dataset, there is a simple code generative model
that you can use to generate a dataset and play with the entropy coder.
The generative model is located under `dataset/gen_synthetic_dataset.py`. Note
that this simple generative model is not going to give good results on real
images, as it does not try to match the statistics of the binary
representation of encoded images. Consider it a toy dataset, no more, no less.
To generate a synthetic dataset with 20000 samples:

```shell
mkdir -p /tmp/dataset
python ./dataset/gen_synthetic_dataset.py --dataset_dir=/tmp/dataset/ \
  --count=20000
```

Note that the generator has not been optimized at all, so generating the
synthetic dataset is currently pretty slow.
### Training

If you just want to play with the entropy coder trainer, here is the command
line that can be used to train the entropy coder on the synthetic dataset:

```shell
mkdir -p /tmp/entropy_coder_train
python ./core/entropy_coder_train.py --task=0 \
  --train_dir=/tmp/entropy_coder_train/ \
  --model=progressive \
  --model_config=./configs/synthetic/model_config.json \
  --train_config=./configs/synthetic/train_config.json \
  --input_config=./configs/synthetic/input_config.json
```
Training is configured using three JSON files:

* One file is used to configure the underlying entropy coder model.
  Currently, only the *progressive* model is supported.
  This model takes two mandatory parameters and an optional one
  (see the example configuration after this list):
    * `layer_depth`: the number of bits per layer (a.k.a. iteration).
      Background: the image decoder takes each layer to add more detail
      to the image.
    * `layer_count`: the maximum number of layers that should be supported
      by the model. This should be equal to or greater than the maximum number
      of layers in the input binary codes.
    * `coded_layer_count`: this can be used to consider only partial codes,
      keeping only the first `coded_layer_count` layers and ignoring the
      remaining layers. If left empty, the binary codes are left unchanged.
* One file to configure the training, including the learning rate, etc.
  The meaning of these parameters is pretty straightforward. Note that this
  file is only used during training and is not needed during inference.
* One file to specify the input dataset to use during training.
  The dataset is formatted using `tf.RecordIO`.
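For reference, a hypothetical `model_config.json` could look like the
following; the values are made up, only the key names come from the parameter
list above:

```json
{
  "layer_depth": 32,
  "layer_count": 16,
  "coded_layer_count": 8
}
```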
## Inference: file size after entropy coding

### Using a synthetic sample

Here is the command line to generate a single synthetic sample formatted
in the same way as what is provided by the image encoder:

```shell
python ./dataset/gen_synthetic_single.py \
  --sample_filename=/tmp/dataset/sample_0000.npz
```
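The generated sample is a standard NumPy `.npz` archive, so it can be inspected
directly (a minimal sketch; the array names stored in the file are chosen by
`gen_synthetic_single.py` and are not documented here):

```python
import numpy as np

# List the arrays stored in the generated sample along with their shapes.
sample = np.load("/tmp/dataset/sample_0000.npz")
for name in sample.files:
    print(name, sample[name].shape, sample[name].dtype)
```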
To actually compute the additional compression ratio using the entropy coder
trained in the previous step:

```shell
python ./core/entropy_coder_single.py \
  --model=progressive \
  --model_config=./configs/synthetic/model_config.json \
  --input_codes=/tmp/dataset/sample_0000.npz \
  --checkpoint=/tmp/entropy_coder_train/model.ckpt-209078
```

where the checkpoint number should be adjusted to match the checkpoint actually
produced by your training run.