# Using TensorFlow Securely

This document discusses how to safely deal with untrusted programs (models or
model parameters), and input data. Below, we also provide guidelines on how to
report vulnerabilities in TensorFlow.
## TensorFlow models are programs

TensorFlow's runtime system interprets and executes programs. What machine
learning practitioners term
[**models**](https://developers.google.com/machine-learning/glossary/#model) are
expressed as programs that TensorFlow executes. TensorFlow programs are encoded
as computation
[**graphs**](https://developers.google.com/machine-learning/glossary/#graph).
The model's parameters are often stored separately in **checkpoints**.

At runtime, TensorFlow executes the computation graph using the parameters
provided. Note that the behavior of the computation graph may change
depending on the parameters provided. TensorFlow itself is not a sandbox. When
executing the computation graph, TensorFlow may read and write files, send and
receive data over the network, and even spawn additional processes. All these
tasks are performed with the permissions of the TensorFlow process. Allowing
for this flexibility makes for a powerful machine learning platform,
but it has implications for security.

The computation graph may also accept **inputs**. Those inputs are the
data you supply to TensorFlow to train a model, or to use a model to run
inference on the data.

**TensorFlow models are programs, and need to be treated as such from a security
perspective.**
## Running untrusted models

As a general rule: **Always** execute untrusted models inside a sandbox (e.g.,
[nsjail](https://github.com/google/nsjail)).

There are several ways in which a model could become untrusted. Obviously, if an
untrusted party supplies TensorFlow kernels, arbitrary code may be executed.
The same is true if the untrusted party provides Python code, such as the
Python code that generates TensorFlow graphs.

Even if the untrusted party only supplies the serialized computation
graph (in the form of a `GraphDef`, `SavedModel`, or equivalent on-disk format),
the set of computation primitives available to TensorFlow is powerful enough
that you should assume that the TensorFlow process effectively executes
arbitrary code. One common solution is to allow only a few safe Ops. While this
is possible in theory, we still recommend you sandbox the execution.

Whether a user-provided checkpoint is safe depends on the computation graph.
It is easy to create computation graphs in which malicious
checkpoints can trigger unsafe behavior. For example, consider a graph that
contains a `tf.cond` depending on the value of a `tf.Variable`. One branch of
the `tf.cond` is harmless, but the other is unsafe. Since the `tf.Variable` is
stored in the checkpoint, whoever provides the checkpoint now has the ability to
trigger unsafe behavior, even though the graph is not under their control.

In other words, graphs can contain vulnerabilities of their own. To allow users
to provide checkpoints to a model you run on their behalf (e.g., in order to
compare model quality for a fixed model architecture), you must carefully audit
your model, and we recommend you run the TensorFlow process in a sandbox.
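
The "allow only a few safe Ops" approach mentioned above could be sketched
roughly as follows. This is a minimal illustration, not a vetted policy: the
`ALLOWED_OPS` set and the `find_disallowed_ops` helper are hypothetical names
introduced here, and the choice of which ops count as safe is an assumption
that would have to come from your own audit. The check works on plain op type
strings so that it can be read without TensorFlow installed.

```python
from typing import Iterable, List, Set

# Hypothetical allow-list. Which ops are acceptable is an assumption made for
# this sketch; a real list must come from an audit of your own models.
ALLOWED_OPS: Set[str] = {
    "Placeholder", "Const", "Identity", "MatMul", "Add", "Relu", "Softmax",
}


def find_disallowed_ops(op_types: Iterable[str],
                        allowed: Set[str] = ALLOWED_OPS) -> List[str]:
    """Return the sorted, de-duplicated op types that are not on the allow-list."""
    return sorted({op for op in op_types if op not in allowed})


# With TensorFlow available, the op types of a parsed GraphDef could be
# collected with something like
#     op_types = [node.op for node in graph_def.node]
# and the nodes of each function in graph_def.library.function (its node_def
# field) would need the same treatment.
example_ops = ["Placeholder", "MatMul", "Relu", "WriteFile"]
rejected = find_disallowed_ops(example_ops)
if rejected:
    print("refusing to load graph; disallowed ops:", rejected)
```

Such a check only narrows the attack surface. It does not replace the sandbox
recommended above, since allowed ops can still contain bugs.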
## Accepting untrusted inputs

It is possible to write models that are secure in the sense that they can safely
process untrusted inputs assuming there are no bugs. There are two main reasons
to not rely on this: First, it is easy to write models which must not be exposed
to untrusted inputs, and second, there are bugs in any software system of
sufficient complexity. Letting users control inputs could allow them to trigger
bugs either in TensorFlow or in dependent libraries.

In general, it is good practice to isolate any part of a system that is exposed
to untrusted (e.g., user-provided) inputs in a sandbox.

A useful analogy to how any TensorFlow graph is executed is any interpreted
programming language, such as Python. While it is possible to write secure
Python code which can be exposed to user-supplied inputs (by, e.g., carefully
quoting and sanitizing input strings, size-checking input blobs, etc.), it is
very easy to write Python programs which are insecure. Even secure Python code
could be rendered insecure by a bug in the Python interpreter, or by a bug in a
Python library used (e.g.,
[this one](https://www.cvedetails.com/cve/CVE-2017-12852/)).
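
As an illustration of the "size-checking input blobs" idea above, the following
sketch rejects obviously unsuitable image bytes before they would be handed to
a model. The size limit, the accepted formats, and the `check_image_blob` helper
are all assumptions made for this example, not part of TensorFlow.

```python
MAX_INPUT_BYTES = 4 * 1024 * 1024  # assumed limit; tune for your own model

# Leading "magic" bytes of the formats this hypothetical service accepts.
_MAGIC_PREFIXES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}


def check_image_blob(blob, max_bytes=MAX_INPUT_BYTES):
    """Return the detected format name, or raise ValueError for rejected input."""
    if not isinstance(blob, (bytes, bytearray)):
        raise ValueError("input must be raw bytes")
    if len(blob) == 0 or len(blob) > max_bytes:
        raise ValueError("input size out of bounds")
    for name, prefix in _MAGIC_PREFIXES.items():
        if bytes(blob[:len(prefix)]) == prefix:
            return name
    raise ValueError("unrecognized image format")
```

A check like this filters out some malformed requests cheaply, but it cannot
rule out bugs in the decoders themselves, which is why isolating the process
that handles untrusted inputs remains the recommendation.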
## Running a TensorFlow server

TensorFlow is a platform for distributed computing, and as such there is a
TensorFlow server (`tf.train.Server`). **The TensorFlow server is meant for
internal communication only. It is not built for use in an untrusted network.**

For performance reasons, the default TensorFlow server does not include any
authorization protocol and sends messages unencrypted. It accepts connections
from anywhere, and executes the graphs it is sent without performing any checks.
Therefore, if you run a `tf.train.Server` in your network, anybody with
access to the network can execute what you should consider arbitrary code with
the privileges of the process running the `tf.train.Server`.

When running distributed TensorFlow, you must isolate the network in which the
cluster lives. Cloud providers provide instructions for setting up isolated
networks, which are sometimes branded as "virtual private cloud." Refer to the
instructions for
[GCP](https://cloud.google.com/compute/docs/networks-and-firewalls) and
[AWS](https://aws.amazon.com/vpc/) for details.

Note that `tf.train.Server` is different from the server created by
`tensorflow/serving` (the default binary for which is called `ModelServer`).
By default, `ModelServer` also has no built-in mechanism for authentication.
Connecting it to an untrusted network allows anyone on this network to run the
graphs known to the `ModelServer`. This means that an attacker may run
graphs using untrusted inputs as described above, but they would not be able to
execute arbitrary graphs. It is possible to safely expose a `ModelServer`
directly to an untrusted network, **but only if the graphs it is configured to
use have been carefully audited to be safe**.

Similar to best practices for other servers, we recommend running any
`ModelServer` with appropriate privileges (i.e., using a separate user with
reduced permissions). In the spirit of defense in depth, we recommend
authenticating requests to any TensorFlow server connected to an untrusted
network, as well as sandboxing the server to minimize the adverse effects of
any breach.
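
As a small sanity check to accompany network isolation, one might verify that
every address in a cluster definition is a private or loopback IP before
starting any server. The sketch below assumes the cluster is described as a
dictionary mapping job names to lists of `host:port` strings (the same shape
`tf.train.ClusterSpec` accepts); the `non_private_addresses` helper and the
example addresses are hypothetical. It does not enforce isolation in any way;
firewall and VPC configuration still have to do that.

```python
import ipaddress


def non_private_addresses(cluster):
    """Return 'host:port' entries whose host is not a private or loopback IP."""
    flagged = []
    for addresses in cluster.values():
        for address in addresses:
            # Split off the port and drop the brackets of IPv6 literals.
            host = address.rsplit(":", 1)[0].strip("[]")
            try:
                ip = ipaddress.ip_address(host)
            except ValueError:
                # Hostnames cannot be classified without DNS; flag for review.
                flagged.append(address)
                continue
            if not (ip.is_private or ip.is_loopback):
                flagged.append(address)
    return flagged


# Hypothetical cluster definition used only for illustration.
cluster = {
    "worker": ["10.0.0.5:2222", "10.0.0.6:2222"],
    "ps": ["8.8.8.8:2222"],
}
print(non_private_addresses(cluster))
```

Anything the helper returns deserves a closer look before the cluster is
started, since a publicly routable address suggests the server may be reachable
from an untrusted network.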
## Vulnerabilities in TensorFlow

TensorFlow is a large and complex system. It also depends on a large set of
third-party libraries (e.g., `numpy`, `libjpeg-turbo`, PNG parsers, `protobuf`).
It is possible that TensorFlow or its dependent libraries contain
vulnerabilities that would allow triggering unexpected or dangerous behavior
with specially crafted inputs.
### What is a vulnerability?

Given TensorFlow's flexibility, it is possible to specify computation graphs
which exhibit unexpected or unwanted behavior. The fact that TensorFlow models
can perform arbitrary computations means that they may read and write files,
communicate via the network, produce deadlocks and infinite loops, or run out
of memory. It is only when these behaviors are outside the specifications of the
operations involved that such behavior is a vulnerability.

A `FileWriter` writing a file is not unexpected behavior and therefore is not a
vulnerability in TensorFlow. A `MatMul` allowing arbitrary binary code execution
**is** a vulnerability.

This is more subtle from a system perspective. For example, it is easy to cause
a TensorFlow process to try to allocate more memory than available by specifying
a computation graph containing an ill-considered `tf.tile` operation. TensorFlow
should exit cleanly in this case (it would raise an exception in Python, or
return an error `Status` in C++). However, if the surrounding system is not
expecting the possibility, such behavior could be used in a denial of service
attack (or worse). Because TensorFlow behaves correctly, this is not a
vulnerability in TensorFlow (although it would be a vulnerability of this
hypothetical system).

As a general rule, it is incorrect behavior for TensorFlow to access memory it
does not own, or to terminate in an unclean way. Bugs in TensorFlow that lead to
such behaviors constitute a vulnerability.

One of the most critical parts of any system is input handling. If malicious
input can trigger side effects or incorrect behavior, this is a bug, and likely
a vulnerability.
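
To make the point about the surrounding system concrete, here is a minimal
sketch of a caller that expects failure: it runs a job in a child Python
process with a time limit and turns crashes or timeouts into an ordinary return
value. The `run_isolated` helper is hypothetical, and the child code below
merely simulates an out-of-memory error rather than running TensorFlow. A
separate process is not a sandbox; this only shows the error-handling side.

```python
import subprocess
import sys


def run_isolated(python_code, timeout_s=30.0):
    """Run code in a child interpreter; return (ok, detail) instead of raising."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", python_code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False, "timed out"
    if result.returncode != 0:
        return False, "exited with status %d" % result.returncode
    return True, result.stdout.strip()


# Simulated failure standing in for a graph that exhausts memory.
ok, detail = run_isolated("raise MemoryError('simulated allocation failure')")
print(ok, detail)
```

With this structure, a request that makes the child fail costs one child
process rather than the whole service, and the caller can decide how to report
or rate-limit such requests.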
### Reporting vulnerabilities

Please email reports about any security-related issues you find to
`security@tensorflow.org`. This mail is delivered to a small security team. Your
email will be acknowledged within one business day, and you'll receive a more
detailed response to your email within 7 days indicating the next steps in
handling your report. For critical problems, you may encrypt your report (see
below).

Please use a descriptive subject line for your report email. After the initial
reply to your report, the security team will endeavor to keep you informed of
the progress being made towards a fix and announcement.

In addition, please include the following information along with your report:

* Your name and affiliation (if any).
* A description of the technical details of the vulnerabilities. It is very
  important to let us know how we can reproduce your findings.
* An explanation of who can exploit this vulnerability, and what they gain when
  doing so -- write an attack scenario. This will help us evaluate your report
  quickly, especially if the issue is complex.
* Whether this vulnerability is public or known to third parties. If it is,
  please provide details.

If you believe that an existing (public) issue is security-related, please send
an email to `security@tensorflow.org`. The email should include the issue ID and
a short description of why it should be handled according to this security
policy.

Once an issue is reported, TensorFlow uses the following disclosure process:

* When a report is received, we confirm the issue and determine its severity.
* If we know of specific third-party services or software based on TensorFlow
  that require mitigation before publication, those projects will be notified.
* An advisory is prepared (but not published) which details the problem and
  steps for mitigation.
* The vulnerability is fixed and potential workarounds are identified.
* Wherever possible, the fix is also prepared for the branches corresponding to
  all releases of TensorFlow at most one year old. We will attempt to commit
  these fixes as soon as possible, and as close together as possible.
* Patch releases are published for all fixed released versions, a
  notification is sent to discuss@tensorflow.org, and the advisory is published.

Note that we mostly do patch releases for security reasons and each version of
TensorFlow is supported for only 1 year after the release.

Past security advisories are listed below. We credit reporters for identifying
security issues, although we keep your name confidential if you request it.
#### Encryption key for `security@tensorflow.org`

If your disclosure is extremely sensitive, you may choose to encrypt your
report using the key below. Please only use this for critical security
reports.

```
-----BEGIN PGP PUBLIC KEY BLOCK-----

mQENBFpqdzwBCADTeAHLNEe9Vm77AxhmGP+CdjlY84O6DouOCDSq00zFYdIU/7aI
LjYwhEmDEvLnRCYeFGdIHVtW9YrVktqYE9HXVQC7nULU6U6cvkQbwHCdrjaDaylP
aJUXkNrrxibhx9YYdy465CfusAaZ0aM+T9DpcZg98SmsSml/HAiiY4mbg/yNVdPs
SEp/Ui4zdIBNNs6at2gGZrd4qWhdM0MqGJlehqdeUKRICE/mdedXwsWLM8AfEA0e
OeTVhZ+EtYCypiF4fVl/NsqJ/zhBJpCx/1FBI1Uf/lu2TE4eOS1FgmIqb2j4T+jY
e+4C8kGB405PAC0n50YpOrOs6k7fiQDjYmbNABEBAAG0LVRlbnNvckZsb3cgU2Vj
dXJpdHkgPHNlY3VyaXR5QHRlbnNvcmZsb3cub3JnPokBTgQTAQgAOBYhBEkvXzHm
gOJBnwP4Wxnef3wVoM2yBQJaanc8AhsDBQsJCAcCBhUKCQgLAgQWAgMBAh4BAheA
AAoJEBnef3wVoM2yNlkIAICqetv33MD9W6mPAXH3eon+KJoeHQHYOuwWfYkUF6CC
o+X2dlPqBSqMG3bFuTrrcwjr9w1V8HkNuzzOJvCm1CJVKaxMzPuXhBq5+DeT67+a
T/wK1L2R1bF0gs7Pp40W3np8iAFEh8sgqtxXvLGJLGDZ1Lnfdprg3HciqaVAiTum
HBFwszszZZ1wAnKJs5KVteFN7GSSng3qBcj0E0ql2nPGEqCVh+6RG/TU5C8gEsEf
3DX768M4okmFDKTzLNBm+l08kkBFt+P43rNK8dyC4PXk7yJa93SmS/dlK6DZ16Yw
2FS1StiZSVqygTW59rM5XNwdhKVXy2mf/RtNSr84gSi5AQ0EWmp3PAEIALInfBLR
N6fAUGPFj+K3za3PeD0fWDijlC9f4Ety/icwWPkOBdYVBn0atzI21thPRbfuUxfe
zr76xNNrtRRlbDSAChA1J5T86EflowcQor8dNC6fS+oHFCGeUjfEAm16P6mGTo0p
osdG2XnnTHOOEFbEUeWOwR/zT0QRaGGknoy2pc4doWcJptqJIdTl1K8xyBieik/b
nSoClqQdZJa4XA3H9G+F4NmoZGEguC5GGb2P9NHYAJ3MLHBHywZip8g9oojIwda+
OCLL4UPEZ89cl0EyhXM0nIAmGn3Chdjfu3ebF0SeuToGN8E1goUs3qSE77ZdzIsR
BzZSDFrgmZH+uP0AEQEAAYkBNgQYAQgAIBYhBEkvXzHmgOJBnwP4Wxnef3wVoM2y
BQJaanc8AhsMAAoJEBnef3wVoM2yX4wIALcYZbQhSEzCsTl56UHofze6C3QuFQIH
J4MIKrkTfwiHlCujv7GASGU2Vtis5YEyOoMidUVLlwnebE388MmaJYRm0fhYq6lP
A3vnOCcczy1tbo846bRdv012zdUA+wY+mOITdOoUjAhYulUR0kiA2UdLSfYzbWwy
7Obq96Jb/cPRxk8jKUu2rqC/KDrkFDtAtjdIHh6nbbQhFuaRuWntISZgpIJxd8Bt
Gwi0imUVd9m9wZGuTbDGi6YTNk0GPpX5OMF5hjtM/objzTihSw9UN+65Y/oSQM81
v//Fw6ZeY+HmRDFdirjD7wXtIuER4vqCryIqR6Xe9X8oJXz9L/Jhslc=
=CDME
-----END PGP PUBLIC KEY BLOCK-----
```
### Known Vulnerabilities

At this time there are no known vulnerabilities in TensorFlow-models. For a
list of known vulnerabilities and security advisories for TensorFlow, see the
[TensorFlow security advisories](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/security/README.md).