	---
	library_name: keras-hub
	tags:
	- image-segmentation
	- keras
	---
	## Model Overview
	A Keras model implementing the MixTransformer architecture to be used as a backbone for the SegFormer architecture. This model is supported in both KerasCV and KerasHub. KerasCV will no longer be actively developed, so please try to use KerasHub.

	References:
	- [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203)
	- [Based on the TensorFlow implementation from DeepVision](https://github.com/DavidLandup0/deepvision/tree/main/deepvision/models/classification/mix_transformer)

	## Links
	* [MiT Quickstart Notebook: coming soon]()
	* [MiT API Documentation: coming soon]()

	## Installation

	Keras and KerasHub can be installed with:

	```
	pip install -U -q keras-hub
	pip install -U -q "keras>=3"
	```

	JAX, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
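
Keras 3 reads the backend to use from the `KERAS_BACKEND` environment variable when it is first imported. A minimal sketch, assuming JAX is the backend installed in your environment:

```python
import os

# Assumption: JAX is installed here. "tensorflow" and "torch" are the
# other backends this README mentions.
os.environ["KERAS_BACKEND"] = "jax"

# Keras reads KERAS_BACKEND once, at import time, so any
# `import keras` or `import keras_hub` must come after this line.
```

Setting the variable in the shell before launching Python has the same effect.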

	## Presets

	The following model checkpoints are provided by the Keras team. Weights have been ported from https://dl.fbaipublicfiles.com/segment_anything/. Full code examples for each are available below.

	| Preset name | Parameters | Description |
	|--------------------------|------------|--------------------------------------------------------------------------------------------------|
	| mit_b0_ade20k_512 | 3.32M | MiT (MixTransformer) model with 8 transformer blocks, trained on the ADE20K dataset with an input resolution of 512x512 pixels. |
	| mit_b1_ade20k_512 | 13.16M | MiT (MixTransformer) model with 8 transformer blocks, trained on the ADE20K dataset with an input resolution of 512x512 pixels. |
	| mit_b2_ade20k_512 | 24.20M | MiT (MixTransformer) model with 16 transformer blocks, trained on the ADE20K dataset with an input resolution of 512x512 pixels. |
	| mit_b3_ade20k_512 | 44.08M | MiT (MixTransformer) model with 28 transformer blocks, trained on the ADE20K dataset with an input resolution of 512x512 pixels. |
	| mit_b4_ade20k_512 | 60.85M | MiT (MixTransformer) model with 41 transformer blocks, trained on the ADE20K dataset with an input resolution of 512x512 pixels. |
	| mit_b5_ade20k_640 | 81.45M | MiT (MixTransformer) model with 52 transformer blocks, trained on the ADE20K dataset with an input resolution of 640x640 pixels. |
	| mit_b0_cityscapes_1024 | 3.32M | MiT (MixTransformer) model with 8 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
	| mit_b1_cityscapes_1024 | 13.16M | MiT (MixTransformer) model with 8 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
	| mit_b2_cityscapes_1024 | 24.20M | MiT (MixTransformer) model with 16 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
	| mit_b3_cityscapes_1024 | 44.08M | MiT (MixTransformer) model with 28 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
	| mit_b4_cityscapes_1024 | 60.85M | MiT (MixTransformer) model with 41 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
	| mit_b5_cityscapes_1024 | 81.45M | MiT (MixTransformer) model with 52 transformer blocks, trained on the Cityscapes dataset with an input resolution of 1024x1024 pixels. |
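
The preset names in the table follow the pattern `mit_<variant>_<dataset>_<resolution>`. As a small illustration, the sketch below splits a name into those parts. `parse_preset_name` is a hypothetical helper written for this README, not part of KerasHub.

```python
def parse_preset_name(name):
    """Split a preset name such as 'mit_b4_ade20k_512' into its parts.

    Hypothetical helper for illustration only; it is not a KerasHub API.
    """
    parts = name.split("_")
    if len(parts) != 4 or parts[0] != "mit":
        raise ValueError(f"Unexpected preset name: {name!r}")
    _, variant, dataset, resolution = parts
    return {
        "variant": variant,             # model size, b0 through b5
        "dataset": dataset,             # ade20k or cityscapes
        "resolution": int(resolution),  # square input side in pixels
    }


info = parse_preset_name("mit_b5_ade20k_640")
print(info["variant"], info["dataset"], info["resolution"])
```

The resolution field is handy for resizing inputs to the size a given checkpoint was trained at.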

	## Example Usage
	Using the class with a `backbone`:

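	```
	import keras
	import keras_hub
	import numpy as np

	# Use the 512x512 resolution that this preset was trained at.
	images = np.ones(shape=(1, 512, 512, 3))
	labels = np.zeros(shape=(1, 512, 512, 1))
	backbone = keras_hub.models.MiTBackbone.from_preset("mit_b4_ade20k_512")

	# The backbone only extracts features. As an illustrative assumption (this is
	# not the SegFormer head), attach a 1x1 convolution that predicts one mask
	# channel and resize it back to the label resolution.
	x = keras.layers.Conv2D(1, kernel_size=1, activation="sigmoid")(backbone.output)
	outputs = keras.layers.Resizing(512, 512)(x)
	model = keras.Model(backbone.input, outputs)

	# Evaluate model
	model(images)

	# Train model
	model.compile(
	    optimizer="adam",
	    loss=keras.losses.BinaryCrossentropy(from_logits=False),
	    metrics=["accuracy"],
	)
	model.fit(images, labels, epochs=3)
	```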

	## Example Usage with Hugging Face URI

	Using the class with a `backbone`:

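	```
	import keras
	import keras_hub
	import numpy as np

	# Use the 512x512 resolution that this preset was trained at.
	images = np.ones(shape=(1, 512, 512, 3))
	labels = np.zeros(shape=(1, 512, 512, 1))
	backbone = keras_hub.models.MiTBackbone.from_preset("hf://keras/mit_b4_ade20k_512")

	# The backbone only extracts features. As an illustrative assumption (this is
	# not the SegFormer head), attach a 1x1 convolution that predicts one mask
	# channel and resize it back to the label resolution.
	x = keras.layers.Conv2D(1, kernel_size=1, activation="sigmoid")(backbone.output)
	outputs = keras.layers.Resizing(512, 512)(x)
	model = keras.Model(backbone.input, outputs)

	# Evaluate model
	model(images)

	# Train model
	model.compile(
	    optimizer="adam",
	    loss=keras.losses.BinaryCrossentropy(from_logits=False),
	    metrics=["accuracy"],
	)
	model.fit(images, labels, epochs=3)
	```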