|
# Use Models |
|
|
|
## Build Models from Yacs Config |
|
From a yacs config object,
models (and their sub-models) can be built by
functions such as `build_model`, `build_backbone`, `build_roi_heads`:
|
```python
from detectron2.modeling import build_model
model = build_model(cfg)  # returns a torch.nn.Module
```
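Sub-model builders can likewise be called on their own. As a minimal sketch, assuming the config describes a standard R-CNN style model:

```python
from detectron2.modeling import build_backbone

backbone = build_backbone(cfg)  # returns a detectron2 Backbone, itself a torch.nn.Module
```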
|
|
|
`build_model` only builds the model structure and fills it with random parameters.
See below for how to load an existing checkpoint into the model and how to use the `model` object.
|
|
|
### Load/Save a Checkpoint |
|
```python
from detectron2.checkpoint import DetectionCheckpointer
DetectionCheckpointer(model).load(file_path_or_url)  # load a file, usually from cfg.MODEL.WEIGHTS

checkpointer = DetectionCheckpointer(model, save_dir="output")
checkpointer.save("model_999")  # save to output/model_999.pth
```
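To resume an interrupted training rather than always loading `cfg.MODEL.WEIGHTS`, the checkpointer also provides `resume_or_load`; a minimal sketch, reusing the `checkpointer` above:

```python
# If resume=True and a checkpoint exists in save_dir, load the latest one;
# otherwise fall back to loading cfg.MODEL.WEIGHTS.
checkpointer.resume_or_load(cfg.MODEL.WEIGHTS, resume=True)
```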
|
|
|
Detectron2's checkpointer recognizes models in PyTorch's `.pth` format, as well as the `.pkl` files
in our model zoo.
See [API doc](../modules/checkpoint.html#detectron2.checkpoint.DetectionCheckpointer)
for more details about its usage.
|
|
|
The model files can be arbitrarily manipulated using `torch.{load,save}` for `.pth` files or
`pickle.{dump,load}` for `.pkl` files.
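For example, a minimal sketch of opening both formats outside detectron2 (file names follow the examples above; the exact keys stored depend on how the file was produced):

```python
import pickle
import torch

ckpt = torch.load("output/model_999.pth", map_location="cpu")
print(ckpt.keys())  # typically includes a "model" state dict

with open("model_final.pkl", "rb") as f:  # e.g. a model zoo file
    data = pickle.load(f, encoding="latin1")
```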
|
|
|
### Use a Model |
|
|
|
A model can be called by `outputs = model(inputs)`, where `inputs` is a `list[dict]`.
Each dict corresponds to one image, and the required keys
depend on the type of model and on whether the model is in training or evaluation mode.
For example, in order to do inference,
all existing models expect the "image" key, and optionally "height" and "width".
The detailed format of the inputs and outputs of existing models is explained below.
|
|
|
__Training__: When in training mode, all models are required to be used under an `EventStorage`.
The training statistics will be put into the storage:
|
```python
from detectron2.utils.events import EventStorage
with EventStorage() as storage:
  losses = model(inputs)
```
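The returned `losses` is a dict of scalar tensors (see "Model Output Format" below). A minimal optimization step, assuming an `optimizer` has already been built over `model.parameters()`:

```python
with EventStorage() as storage:
  loss_dict = model(inputs)
  losses = sum(loss_dict.values())  # total loss, a scalar tensor
  optimizer.zero_grad()
  losses.backward()
  optimizer.step()
```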
|
|
|
__Inference__: If you only want to do simple inference using an existing model,
[DefaultPredictor](../modules/engine.html#detectron2.engine.defaults.DefaultPredictor)
is a wrapper around the model that provides such basic functionality.
It includes default behavior such as model loading and preprocessing,
and operates on a single image rather than on batches. See its documentation for usage.
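For example, a minimal sketch (here `original_image` is a (H, W, C) `numpy` array; `DefaultPredictor` expects BGR channel order and applies the conversion defined by `cfg.INPUT.FORMAT` itself):

```python
from detectron2.engine import DefaultPredictor

predictor = DefaultPredictor(cfg)    # builds the model and loads cfg.MODEL.WEIGHTS
outputs = predictor(original_image)  # one image in, one output dict out
```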
|
|
|
You can also run inference directly like this:
```python
model.eval()
with torch.no_grad():
  outputs = model(inputs)
```
|
|
|
### Model Input Format |
|
|
|
Users can implement custom models that support any arbitrary input format.
Here we describe the standard input format that all builtin models support in detectron2.
They all take a `list[dict]` as the inputs. Each dict
corresponds to information about one image.

The dict may contain the following keys:
|
|
|
* "image": `Tensor` in (C, H, W) format. The meaning of channels are defined by `cfg.INPUT.FORMAT`. |
|
Image normalization, if any, will be performed inside the model using |
|
`cfg.MODEL.PIXEL_{MEAN,STD}`. |
|
* "height", "width": the **desired** output height and width **in inference**, which is not necessarily the same |
|
as the height or width of the `image` field. |
|
For example, the `image` field contains the resized image, if resize is used as a preprocessing step. |
|
But you may want the outputs to be in **original** resolution. |
|
If provided, the model will produce output in this resolution, |
|
rather than in the resolution of the `image` as input into the model. This is more efficient and accurate. |
|
* "instances": an [Instances](../modules/structures.html#detectron2.structures.Instances) |
|
object for training, with the following fields: |
|
+ "gt_boxes": a [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each instance. |
|
+ "gt_classes": `Tensor` of long type, a vector of N labels, in range [0, num_categories). |
|
+ "gt_masks": a [PolygonMasks](../modules/structures.html#detectron2.structures.PolygonMasks) |
|
or [BitMasks](../modules/structures.html#detectron2.structures.BitMasks) object storing N masks, one for each instance. |
|
+ "gt_keypoints": a [Keypoints](../modules/structures.html#detectron2.structures.Keypoints) |
|
object storing N keypoint sets, one for each instance. |
|
* "sem_seg": `Tensor[int]` in (H, W) format. The semantic segmentation ground truth for training. |
|
Values represent category labels starting from 0. |
|
* "proposals": an [Instances](../modules/structures.html#detectron2.structures.Instances) |
|
object used only in Fast R-CNN style models, with the following fields: |
|
+ "proposal_boxes": a [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing P proposal boxes. |
|
+ "objectness_logits": `Tensor`, a vector of P scores, one for each proposal. |
|
|
|
For inference of builtin models, only the "image" key is required, and "height"/"width" are optional.
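As a sketch, an inference call for one image can therefore be assembled as follows (`image_tensor` is assumed to be a (C, H, W) tensor already in the channel order of `cfg.INPUT.FORMAT`, and `orig_h`/`orig_w` the resolution you want outputs in):

```python
import torch

inputs = [{"image": image_tensor, "height": orig_h, "width": orig_w}]
with torch.no_grad():
  outputs = model(inputs)
```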
|
|
|
We currently don't define a standard input format for panoptic segmentation training,
because models now use custom formats produced by custom data loaders.
|
|
|
#### How it connects to the data loader:
|
|
|
The output of the default [DatasetMapper](../modules/data.html#detectron2.data.DatasetMapper) is a dict
that follows the above format.
After the data loader performs batching, it becomes a `list[dict]` which the builtin models support.
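For example, a batch from the builtin train loader can be fed to a model directly; a minimal sketch:

```python
from detectron2.data import build_detection_train_loader

data_loader = build_detection_train_loader(cfg)
for batch in data_loader:  # batch is a list[dict] in the format above
  losses = model(batch)    # requires training mode and an EventStorage (see above)
  break
```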
|
|
|
|
|
### Model Output Format |
|
|
|
When in training mode, the builtin models output a `dict[str->ScalarTensor]` with all the losses. |
|
|
|
When in inference mode, the builtin models output a `list[dict]`, one dict for each image.
Based on the tasks the model is doing, each dict may contain the following fields:
|
|
|
* "instances": [Instances](../modules/structures.html#detectron2.structures.Instances) |
|
object with the following fields: |
|
* "pred_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each detected instance. |
|
* "scores": `Tensor`, a vector of N confidence scores. |
|
* "pred_classes": `Tensor`, a vector of N labels in range [0, num_categories). |
|
+ "pred_masks": a `Tensor` of shape (N, H, W), masks for each detected instance. |
|
+ "pred_keypoints": a `Tensor` of shape (N, num_keypoint, 3). |
|
Each row in the last dimension is (x, y, score). Confidence scores are larger than 0. |
|
* "sem_seg": `Tensor` of (num_categories, H, W), the semantic segmentation prediction. |
|
* "proposals": [Instances](../modules/structures.html#detectron2.structures.Instances) |
|
object with the following fields: |
|
* "proposal_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes) |
|
object storing N boxes. |
|
* "objectness_logits": a torch vector of N confidence scores. |
|
* "panoptic_seg": A tuple of `(pred: Tensor, segments_info: Optional[list[dict]])`. |
|
The `pred` tensor has shape (H, W), containing the segment id of each pixel. |
|
|
|
* If `segments_info` exists, each dict describes one segment id in `pred` and has the following fields: |
|
|
|
* "id": the segment id |
|
* "isthing": whether the segment is a thing or stuff |
|
* "category_id": the category id of this segment. |
|
|
|
If a pixel's id does not exist in `segments_info`, it is considered to be void label |
|
defined in [Panoptic Segmentation](https://arxiv.org/abs/1801.00868). |
|
|
|
* If `segments_info` is None, all pixel values in `pred` must be ≥ -1. |
|
Pixels with value -1 are assigned void labels. |
|
Otherwise, the category id of each pixel is obtained by |
|
`category_id = pixel // metadata.label_divisor`. |
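For example, detection results can be read out of the "instances" field like this (a sketch; which fields are present depends on the model):

```python
outputs = model(inputs)  # model in eval mode
instances = outputs[0]["instances"].to("cpu")
boxes = instances.pred_boxes.tensor  # (N, 4) boxes in XYXY format
scores = instances.scores            # (N,) confidence scores
classes = instances.pred_classes     # (N,) predicted labels
```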
|
|
|
|
|
### Partially execute a model: |
|
|
|
Sometimes you may want to obtain an intermediate tensor inside a model,
such as the input of a certain layer or the output before post-processing.
Since there are typically hundreds of intermediate tensors, there isn't an API that provides
the intermediate result you need.
You have the following options:
|
|
|
1. Write a (sub)model. Following the [tutorial](./write-models.md), you can
   rewrite a model component (e.g. a head of a model), such that it
   does the same thing as the existing component, but returns the output you need.
2. Partially execute a model. You can create the model as usual,
   but use custom code to execute it instead of its `forward()`. For example,
   the following code obtains mask features before the mask head.
|
|
|
```python
from detectron2.modeling import build_model
from detectron2.structures import ImageList

images = ImageList.from_tensors(...)  # preprocessed input tensor
model = build_model(cfg)              # remember to load trained weights, e.g. with DetectionCheckpointer
model.eval()
features = model.backbone(images.tensor)
proposals, _ = model.proposal_generator(images, features)
instances, _ = model.roi_heads(images, features, proposals)
mask_features = [features[f] for f in model.roi_heads.in_features]
mask_features = model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])
```
|
|
|
3. Use [forward hooks](https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html#forward-and-backward-function-hooks).
   Forward hooks can help you obtain inputs or outputs of a certain module.
   If they are not exactly what you want, they can at least be used together with partial execution
   to obtain other tensors; see the sketch below.
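A sketch of capturing the output of one submodule with a forward hook (the attribute name `backbone` matches the builtin meta-architectures; adapt it to the module you care about):

```python
import torch

captured = {}

def save_output(module, inputs, output):
  captured["backbone"] = output  # for builtin models, a dict of feature maps

handle = model.backbone.register_forward_hook(save_output)
with torch.no_grad():
  model(inputs)
handle.remove()  # detach the hook once done
```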
|
|
|
All options require you to read the documentation, and sometimes the code,
of the existing models to understand their internal logic,
in order to write code to obtain the internal tensors.
|
|