OpenGVLab/InternVL2-26B · About the downsample ratio

Jul 7, 2024

From the code, pixel shuffle doesn't seem to change the H/W dimensions, so how is it able to downsample?

czczup

OpenGVLab org Jul 7, 2024

Pixel Shuffle rearranges an image’s height and width into the channel dimension for downsampling.

Input Feature Map: Starts with dimensions (H, W, C).
Rearranging Dimensions: Converts spatial dimensions into channels by folding each 2×2 block of spatial positions into the channel axis. For example, (2H, 2W, C) becomes (H, W, 4C).

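This effectively downsamples the feature map: the spatial grid shrinks by 2× along each axis (4× fewer positions), while the spatial detail is packed into more channels rather than discarded.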
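
To make the shape change concrete, here is a minimal NumPy sketch of this kind of space-to-channel rearrangement. It is only an illustration, not the InternVL code: the function name `space_to_depth` and the `block` parameter are made up for this example, and the ordering of values inside the channel axis may differ from what the model uses.

```python
import numpy as np

def space_to_depth(x, block=2):
    """Fold each block x block spatial patch into the channel axis.

    x has shape (H, W, C); the result has shape
    (H // block, W // block, C * block * block).
    Hypothetical helper for illustration only.
    """
    h, w, c = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    # Split each spatial axis into (coarse index, offset inside the block).
    x = x.reshape(h // block, block, w // block, block, c)
    # Move both in-block offsets next to the channel axis.
    x = x.transpose(0, 2, 1, 3, 4)
    # Merge the offsets and the channels into one larger channel axis.
    return x.reshape(h // block, w // block, block * block * c)

feat = np.arange(4 * 4 * 3).reshape(4, 4, 3)  # toy (H, W, C) feature map
out = space_to_depth(feat)
print(feat.shape, "->", out.shape)  # (4, 4, 3) -> (2, 2, 12)
```

No values are dropped: `out` has the same number of elements as `feat`, only spread over a 2× smaller grid with 4× more channels per position.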

czczup changed discussion status to closed Jul 7, 2024

Jul 8, 2024

Hi, is it better compared with a Resampler? Or with convolution?