	---
	license: apache-2.0
	tags:
	- RyzenAI
	- Image Segmentation
	- Pytorch
	- Vision
	datasets:
	- cityscape
	language:
	- en
	metrics:
	- mIoU
	---

	# SemanticFPN model trained on Cityscapes

	SemanticFPN is a conceptually simple yet effective baseline for panoptic segmentation, trained here on Cityscapes. The method starts from Mask R-CNN with FPN and adds a lightweight semantic segmentation branch for dense pixel prediction. It was introduced in the 2019 paper [Panoptic Feature Pyramid Networks](https://arxiv.org/pdf/1901.02446.pdf) by Kirillov et al.

	We developed a modified version that can run on [AMD Ryzen AI](https://ryzenai.docs.amd.com).


	## Model description

	SemanticFPN is a single network that unifies the tasks of instance segmentation and semantic segmentation. The network is designed by endowing Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone. This simple design not only remains effective for instance segmentation, but also yields a lightweight, top-performing method for semantic segmentation. It is robust and accurate on both tasks and can serve as a strong baseline for future research in panoptic segmentation.


	## Intended uses & limitations

	You can use the raw model for image segmentation. See the [model hub](https://huggingface.co/models?sort=trending&search=amd%2FSemanticFPN) to look for all available SemanticFPN models.


	## How to use

	### Installation

	Follow the [Ryzen AI Installation](https://ryzenai.docs.amd.com/en/latest/inst.html) guide to prepare the environment for Ryzen AI.
	Then run the following command to install the prerequisites for this model.
	```bash
	pip install -r requirements.txt
	```


	### Data Preparation (optional: for accuracy evaluation)

	1. Download the Cityscapes dataset from https://www.cityscapes-dataset.com/downloads:
	   - ground truth folder: gtFine_trainvaltest.zip [241MB]
	   - image folder: leftImg8bit_trainvaltest.zip [11GB]
	2. Organize the dataset directory as follows:
	```Plain
	└── data
	    └── cityscapes
	        ├── leftImg8bit
	        |   ├── train
	        |   └── val
	        └── gtFine
	            ├── train
	            └── val
	```
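
Before running the evaluation, it can help to confirm that the folders match the layout above. The following is a minimal sketch and not part of the repository: the helper name `missing_cityscapes_dirs` and the default root `./data/cityscapes` are assumptions made for illustration.

```python
from pathlib import Path

# Sub-directories expected under the dataset root, per the tree above.
EXPECTED_DIRS = [
    "leftImg8bit/train",
    "leftImg8bit/val",
    "gtFine/train",
    "gtFine/val",
]


def missing_cityscapes_dirs(root="./data/cityscapes"):
    """Return the expected sub-directories that are absent under `root`.

    Hypothetical helper for illustration. It only checks that the folders
    exist, not that they contain the full set of images.
    """
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_cityscapes_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

An empty list means all four folders were found; any returned entry names a folder that still needs to be extracted or moved.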

	### Test & Evaluation

	- Code snippet from [`infer_onnx.py`](infer_onnx.py) showing how to run the model
	```python
	import argparse

	import numpy as np
	import onnxruntime

	# build_img and colorize_mask are helper functions defined in infer_onnx.py.

	parser = argparse.ArgumentParser(description='SemanticFPN model')
	parser.add_argument('--onnx_path', type=str, default='FPN_int_NHWC.onnx')
	parser.add_argument('--save_path', type=str, default='./data/demo_results/senmatic_results.png')
	parser.add_argument('--input_path', type=str, default='data/cityscapes/cityscapes/leftImg8bit/test/bonn/bonn_000000_000019_leftImg8bit.png')
	parser.add_argument('--ipu', action='store_true',
	                    help='use ipu')
	parser.add_argument('--provider_config', type=str, default=None,
	                    help='provider config path')
	args = parser.parse_args()

	# Select the execution provider: Vitis AI (IPU) or plain CPU.
	if args.ipu:
	    providers = ["VitisAIExecutionProvider"]
	    provider_options = [{"config_file": args.provider_config}]
	else:
	    providers = ['CPUExecutionProvider']
	    provider_options = None

	onnx_path = args.onnx_path
	input_img = build_img(args)
	session = onnxruntime.InferenceSession(onnx_path, providers=providers, provider_options=provider_options)
	ort_input = {session.get_inputs()[0].name: input_img.cpu().numpy()}
	ort_output = session.run(None, ort_input)[0]
	if isinstance(ort_output, (tuple, list)):
	    ort_output = ort_output[0]

	# Take the first batch element, move channels last, then pick the best class per pixel.
	output = ort_output[0].transpose(1, 2, 0)
	seg_pred = np.asarray(np.argmax(output, axis=2), dtype=np.uint8)
	color_mask = colorize_mask(seg_pred)
	color_mask.save(args.save_path)
	```
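
The final lines of the snippet turn raw network output into a colored mask. As a rough illustration of that step, the sketch below takes logits in `(batch, classes, height, width)` layout, picks the highest-scoring class per pixel, and maps class IDs to RGB colors. The function `logits_to_color`, the three-entry palette, and the random input are assumptions for demonstration only; the repository's `colorize_mask` may use a different palette and return type.

```python
import numpy as np

# Illustrative palette: colors commonly used for the first three Cityscapes
# train IDs (road, sidewalk, building). Assumed here, not read from the repo.
PALETTE = np.array(
    [[128, 64, 128], [244, 35, 232], [70, 70, 70]], dtype=np.uint8
)


def logits_to_color(logits, palette=PALETTE):
    """Convert (1, C, H, W) logits to an (H, W) class map and an (H, W, 3) RGB array."""
    scores = logits[0].transpose(1, 2, 0)           # (C, H, W) -> (H, W, C)
    seg_pred = np.argmax(scores, axis=2).astype(np.uint8)
    color = palette[seg_pred]                       # one RGB triple per pixel
    return seg_pred, color


# Stand-in for network output: 3 classes on a 4x8 image.
rng = np.random.default_rng(0)
logits = rng.standard_normal((1, 3, 4, 8)).astype(np.float32)
seg_pred, color = logits_to_color(logits)
print(seg_pred.shape, color.shape)  # (4, 8) (4, 8, 3)
```

Indexing the palette with the class map avoids a per-pixel loop, which matters at full Cityscapes resolution.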

	- Run inference for a single image
	```bash
	python infer_onnx.py --onnx_path FPN_int_NHWC.onnx --input_path /Path/To/Your/Image --ipu --provider_config Path/To/vaip_config.json
	```

	- Test accuracy of the quantized model
	```bash
	python test_onnx.py --onnx_path FPN_int_NHWC.onnx --dataset citys --test-folder ./data/cityscapes --crop-size 256 --ipu --provider_config Path/To/vaip_config.json
	```
	### Performance

	| Model | Input size | FLOPs | mIoU on Cityscapes validation |
	|-------|------------|-------|-------------------------------|
	| SemanticFPN (ResNet18) | 256x512 | 10G | 62.9% |

	| Model | Input size | FLOPs | INT8 mIoU on Cityscapes validation |
	|-------|------------|-------|------------------------------------|
	| SemanticFPN (ResNet18) | 256x512 | 10G | 62.5% |
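
For reference, mIoU is the per-class intersection over union averaged across classes. The sketch below computes it from flat lists of predicted and ground-truth labels in plain Python. It is illustrative only: the function name `mean_iou` and the `ignore_index=255` convention are assumptions, and `test_onnx.py` may accumulate the metric differently.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over classes that appear in pred or target."""
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:      # skip unlabeled pixels
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:              # classes absent from both are left out of the mean
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Six pixels, two classes, one ignored pixel.
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 255, 0]
print(round(mean_iou(pred, target, num_classes=2), 4))  # 0.6667
```

In this toy case each class has an intersection of 2 pixels and a union of 3, so both IoUs are 2/3 and so is their mean.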

	```bibtex
	@inproceedings{kirillov2019panoptic,
	  title={Panoptic feature pyramid networks},
	  author={Kirillov, Alexander and Girshick, Ross and He, Kaiming and Doll{\'a}r, Piotr},
	  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
	  pages={6399--6408},
	  year={2019}
	}
	```