Release note for Finegrain Box Segmenter v0.1

Demo

If you want to give the Finegrain Box Segmenter a try, the best way to is take a look at the Finegrain Object Cutter Space we shipped on Hugging Face: it's a fun "prompt to cut out" experience that will enable you to create pixel quality and high resolution cutouts for any object in a photo, by just naming the object.

Motivation

While building Finegrain, we needed a way to create pixel perfect and high resolution cutouts for objects in images. We looked at off-the-shelf solutions, but they simply didn't work for us:

  • On the one hand, traditional background removal models are great at producing HD cutouts, but unfortunately, different people will have different definitions for background and foreground in a given image - a way to prompt these models is missing.
  • On the other hand, new promptable approaches like SAM or SAM2 don't meet the quality bar for the use cases we are pursuing: they are generating internally a 256x256 low resolution mask - with built-in upscaling mechanisms that create artefacts and struggle with complex masks (a la Eiffel Tower).

The Finegrain Box Segmenter avoids these pitfalls by training MVANet to be a box-promptable High Definition (1024x1024) object-cutout model, making no assumption on what is background and what is foreground: users are fully in control.

License

The Finegrain Box Segmenter is published under the MIT license. Have fun using it in your projects! If you want an optimized version (speed and accuracy wise), we offer an API - just ping us!

Features

The Finegrain Box Segmenter:

  • produces HD and pixel quality masks,
  • gives control to users via box prompting,
  • outputs alpha masks: you can use it as an end-to-end Matting Segmenter without any post-processing or trimap.

Use cases

You should think of the Finegrain Box Segmenter as a way to select an object in a image, with pixel level accuracy, and in high resolution.

It's a prerequisite for a number of object manipulation tasks like:

  • Remove the background around an object
  • Change the background around an object
  • Erase an object from an image
  • Recolor an object in an image
  • Replace an object in an image
  • ...

Out-of-the box, the Finegrain Box Segmenter requires a bounding box as an input, but you can easily augment it to enable "prompt to select object" scenarios - see the Finegrain Object Cutter Hugging Face space for an example implementation.

Training

Our focus at Finegrain is e-commerce. We therefore trained our model with product datasets coming from 2 sources:

  • Nfinite:
  • Finegrain:
    • 1184 images sourced via hard negative mining
    • Natural data (both studio and UGC photos)
    • 1479 pixel quality masks
    • Common objects
    • Closed source

We moved away from the usual random crop approach. Instead, we designed our custom cropping strategy to make sure the model understands what object to select in a given bounding box. We used batch sizes of 5 to improve the training stability.

Evaluation

Given our focus on e-commerce, we crafted a specific test set, and in order to ease benchmarking with other models and solutions, we decided to open source part of it as the Finegrain Product Masks Lite, containing 120 pixel quality masks of common objects (both UGC and studio photos).

We're using the usual metrics, namely MAE, Smeasure, Emeasure and Dice, computed with PySODMetrics. We'll add more later to account for matting aspects (transparent objects) - still a work-in-progress on our end.

Model MAE Smeasure Emeasure Dice
briaai/RMBG-1.4 (x) 0.0226 90.7% 94.3% 88.5%
ZhengPeng7/BiRefNet (xx) 0.0194 93.1% 95.1% 91.5%
finegrain/finegrain-box-segmenter 0.0078 97.4% 98.5% 96.7%

(x) Using Cropping with 5% margin

(xx) Using "Segmentation With Box Guidance" from BiRefNet

Limitations

The Finegrain Box Segmenter v0.1 has a number of limitations that will be tackled in future versions:

  • prompting is not baked in yet,
  • it struggles when the object is touching the side of the image,
  • it doesn't support yet matting of STM (Salient Transparent/Meticulous Objects) or NS (non-salient) masks (see Deep Automatic Natural Image Matting for definition of SO/STM/NS),
  • it doesn't fully nail yet hard cases like hard shadows, strong reflections or hand-held configurations:
Strong reflection Hard shadow Hand-held
Image image/png image/png image/png
briaai/RMBG-1.4 image/png image/png image/png
ZhengPeng7/BiRefNet image/png image/png image/png
finegrain/finegrain-box-segmenter image/png image/png image/png

Bias and Fairness

Given our focus on e-commerce, we haven't yet conducted a thorough bias and fairness review. It will be tackled in future releases.

Usage

With refiners (https://github.com/finegrain-ai/refiners)

from PIL import Image
from refiners.solutions import BoxSegmenter

input_image = Image.open("input.png") 

# Downloads the weights from finegrain/finegrain-box-segmenter
segmenter = BoxSegmenter()
# box_prompt is (x_min, y_min, x_max, y_max)
mask = segmenter(input_image, box_prompt=(24, 133, 588, 531))
# Or without box_prompt as a background remover
# mask = segmenter(input_image.convert("RGB"))
mask.save("output.png")

With ComfyUI (https://registry.comfy.org/publishers/finegrain/nodes/comfyui-refiners)

Install the comfyui-refiners custom nodes:

comfy node registry-install comfyui-refiners

Example workflow Workflow

Via our HuggingFace Space (https://huggingface.co/spaces/finegrain/finegrain-object-cutter)

You can directly try it for free on our space, you may get rate-limited by ZeroGPU though. You will have better performances running the model yourself, and even more from our API (just ping us!)

Downloads last month
4,224
Safetensors
Model size
94.6M params
Tensor type
FP16
·
Inference API
Inference API (serverless) does not yet support refiners models for this pipeline type.

Dataset used to train finegrain/finegrain-box-segmenter

Spaces using finegrain/finegrain-box-segmenter 2