For example, a Faster R-CNN approach uses the following steps:

- an FPN (Feature Pyramid Network) over a ResNet backbone produces the image features,
- an RPN (Region Proposal Network) layer detects proposals in those features,
- the ROI (Region of Interest) heads turn those proposals into boxes in the original image,
- the boxes are filtered using NMS (non-maximum suppression), and
- the features of the selected boxes are used as the visual features.
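The NMS step is the easiest of these to illustrate in isolation. The following is a minimal pure-Python sketch of greedy non-maximum suppression, not the implementation used by any particular detection library. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the default `iou_threshold` of 0.5 are assumptions made for illustration.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept, in order of decreasing score."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```

In a full detector the kept indices would then be used to select the matching rows of the ROI feature tensor, which become the visual features described above. Production pipelines typically use a batched, per-class implementation rather than a Python loop like this one.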