Image Acquisition Fundamentals in Digital Processing


Image acquisition in digital processing is the first step in turning physical phenomena (what we see in real life) into a digital representation (what we see on our computers). It starts with the interaction between an illumination source and the subject being imaged. This illumination can take many forms, from conventional light sources to more sophisticated ones such as other electromagnetic bands or ultrasound energy. The interaction results in energy being either reflected from or transmitted through the objects in the scene. This energy is captured by sensors acting as transducers, which convert the incident energy into an electrical voltage. The voltage signal is then digitized, resulting in a digital image. This requires advanced technology and precise calibration to ensure an accurate representation of the physical scene. In the next sections, we will explore some of these technologies.

The first photograph of the Moon, taken by Ranger 7 in 1964 (Courtesy of NASA)

Sensor Technologies and Their Role in Image Acquisition

As mentioned above, the first step in digital imaging involves sensors. To create a two-dimensional image with a single sensing element (such as a photodiode), the element must be moved along both the x and y axes. In contrast, the more common sensor strips capture an image linearly in one direction and are moved perpendicularly to that direction to obtain a complete 2D image. This technology is commonly found in devices like flatbed scanners and is used in airborne imaging systems. In more specialized applications, like medical imaging (e.g., CAT scans), ring-configured sensor strips are used. These setups rely on advanced reconstruction algorithms to transform the captured data into meaningful images.

Sensor arrays, like CCDs in digital cameras, consist of 2D arrays of sensing elements. They capture a complete image without motion, as each element detects part of the scene. These arrays are advantageous as they don’t require movement to capture an image, unlike single sensing elements and sensor strips. The captured energy is focused onto the sensor array, converted into an analog signal, and then digitized to form a digital image.

Digital Image Formation and Representation

The core of digital image formation is the function f(x, y), which is determined by the illumination source i(x, y) and the reflectance r(x, y) of the scene, commonly modelled as the product f(x, y) = i(x, y) r(x, y).

Image acquisition by collecting the reflected light from the scene

In transmission-based imaging, such as X-rays, transmissivity takes the place of reflectivity. The digital representation of an image is essentially a matrix or array of numerical values, each corresponding to a pixel. The process of transforming continuous image data into a digital format is twofold:

  • Sampling, which digitizes the coordinate values.
  • Quantization, which converts amplitude values into discrete quantities (both steps are sketched in the example after this list).
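
As a rough illustration of both steps, the sketch below models a made-up continuous scene as the product of an illumination term and a reflectance term, samples it on a coarse grid, and quantizes the amplitudes to 8 bits with NumPy. The scene function, grid size, and bit depth are arbitrary choices for demonstration, not values from the course.

```python
import numpy as np

# A made-up "continuous" scene: f(x, y) = i(x, y) * r(x, y).
def scene(x, y):
    illumination = 1.0 + 0.5 * np.cos(0.5 * x)                   # slowly varying light source
    reflectance = 0.5 + 0.5 * np.sin(2.0 * x) * np.cos(3.0 * y)  # surface pattern in [0, 1]
    return illumination * reflectance

# Sampling: evaluate the scene only at a coarse M x N grid of coordinates.
M, N = 64, 64
xs = np.linspace(0.0, 2.0 * np.pi, N)
ys = np.linspace(0.0, 2.0 * np.pi, M)
X, Y = np.meshgrid(xs, ys)
sampled = scene(X, Y)

# Quantization: map the continuous amplitudes onto 2**bits discrete levels.
bits = 8
levels = 2 ** bits
normalized = (sampled - sampled.min()) / (sampled.max() - sampled.min())
digital_image = np.round(normalized * (levels - 1)).astype(np.uint8)

print(digital_image.shape, digital_image.dtype)   # (64, 64) uint8
```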

The resolution and quality of a digital image significantly depend on the following:

  • The number of samples and discrete intensity levels used.
  • The dynamic range of the imaging system, which is the ratio of the maximum measurable intensity to the minimum detectable intensity. It plays a crucial role in the appearance and contrast of the image (a short numerical example follows this list).
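
As a short, hypothetical numerical example (the sensor limits below are invented, and the decibel convention varies between fields), the dynamic range and the number of intensity levels could be computed like this:

```python
import math

# Hypothetical sensor limits: maximum measurable vs. minimum detectable intensity.
i_max, i_min = 10_000.0, 1.0

dynamic_range = i_max / i_min                        # 10000:1 for these made-up values
dynamic_range_db = 20.0 * math.log10(dynamic_range)  # one common logarithmic convention

bits = 8
levels = 2 ** bits                                   # discrete intensity levels after quantization

print(f"dynamic range: {dynamic_range:.0f}:1 ({dynamic_range_db:.0f} dB)")
print(f"{bits}-bit quantization -> {levels} levels")
```
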
The first digital photograph by Russell A. Kirsch in 1957

Understanding Resolution in Digital Imaging

Spatial resolution refers to the smallest distinguishable detail in an image and is often measured in line pairs per unit distance or pixels per unit distance. The meaningfulness of spatial resolution is context-dependent, varying with the spatial units used. For example, a 20-megapixel camera typically resolves finer detail than an 8-megapixel camera. Intensity resolution relates to the smallest detectable change in intensity level and is often limited by the hardware's capabilities. It is quantized in binary increments, such as 8 bits, i.e., 256 levels. The perception of these intensity changes is influenced by various factors, including noise, saturation, and the limits of human vision.
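
To make the two notions concrete, the sketch below reduces the spatial resolution and the intensity resolution of a synthetic grayscale image separately, using NumPy and Pillow. The image, target size, and bit depth are arbitrary illustration values.

```python
import numpy as np
from PIL import Image

# Synthetic 256x256 grayscale test image (a smooth diagonal gradient).
arr = np.fromfunction(lambda y, x: (x + y) / 2, (256, 256)).astype(np.uint8)
img = Image.fromarray(arr, mode="L")

# Lower the spatial resolution: keep fewer samples per unit distance.
low_spatial = img.resize((img.width // 4, img.height // 4), resample=Image.NEAREST)

# Lower the intensity resolution: keep only 2**bits gray levels instead of 256.
bits = 3
step = 256 // (2 ** bits)
low_intensity = Image.fromarray((arr // step) * step, mode="L")

print(img.size, "->", low_spatial.size)                                      # (256, 256) -> (64, 64)
print(len(np.unique(arr)), "->", len(np.unique(np.asarray(low_intensity))))  # 256 -> 8 levels
```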

An illustration of image resolution

Image Restoration and Reconstruction Techniques

Image restoration focuses on recovering a degraded image using knowledge of the degradation phenomenon. It often involves modelling the degradation process and applying the inverse process to recover the original image.

An example of image restoration, where a degraded image is restored and colorized

In contrast, image enhancement is more subjective: it aims to improve the visual appearance of an image. Restoration techniques deal with issues like noise, which can originate from various sources during image acquisition or transmission. Advanced filters, both adaptive and non-adaptive, are used here for their noise-reduction capabilities. In medical imaging, particularly in computed tomography (CT), image reconstruction from projections is a crucial application.
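
As a minimal restoration sketch (one of many possible approaches, not the course's reference method), the code below corrupts a synthetic image with salt-and-pepper noise and applies a non-adaptive 3×3 median filter from SciPy, which handles impulse noise well because extreme outliers rarely survive the median:

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)

# A clean synthetic grayscale image (a horizontal gradient), kept self-contained.
clean = np.tile(np.linspace(0.0, 255.0, 256), (256, 1))

# Degradation model: salt-and-pepper noise on roughly 5% of the pixels.
noisy = clean.copy()
mask = rng.random(clean.shape) < 0.05
noisy[mask] = rng.choice([0.0, 255.0], size=int(mask.sum()))

# Restoration step: a non-adaptive 3x3 median filter.
restored = median_filter(noisy, size=3)

print("mean absolute error, noisy   :", round(float(np.abs(noisy - clean).mean()), 2))
print("mean absolute error, restored:", round(float(np.abs(restored - clean).mean()), 2))
```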

The first photograph of a person, taken by Louis Daguerre in 1838 at the Boulevard du Temple in Paris

Colour in Image Processing

Color is a powerful descriptor in image processing. It plays a role in object identification and recognition. Color image processing includes both pseudo-color and full-color processing.

The first colour photograph by James Clerk Maxwell in 1861 using 3 colour filters

Pseudo-color processing assigns colors to grayscale intensities, while full-color processing uses actual color data from sensors. Understanding the fundamentals of color, including human color perception, the color spectrum, and the attributes of chromatic light, is key. Human vision is trichromatic: color perception arises from how the three types of cones in our eyes are stimulated, responding roughly to red, green, and blue light. The color spectrum, in turn, is the range of wavelengths in the electromagnetic spectrum that elicit distinct visual sensations.

Different color models, such as RGB for monitors and cameras and CMY/CMYK for printing, standardize color representation in digital imaging. In the RGB color model, images have three components (i.e., channels), one each for red, green, and blue. The pixel depth in RGB images determines the number of possible colors, with a typical full-color image having a 24-bit depth (8 bits for each color component). This allows for over 16 million possible colors! The RGB color cube represents the range of colors achievable in this model, with the grayscale running along its diagonal from black to white.
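
The sketch below builds a small synthetic 24-bit RGB image with Pillow and splits it into its three 8-bit channels; the gradients are arbitrary and only meant to show the per-channel structure.

```python
import numpy as np
from PIL import Image

# Synthetic 24-bit RGB image: each 8-bit channel holds a different pattern.
h, w = 128, 128
r = np.tile(np.linspace(0, 255, w, dtype=np.uint8), (h, 1))           # red varies left to right
g = np.tile(np.linspace(0, 255, h, dtype=np.uint8)[:, None], (1, w))  # green varies top to bottom
b = np.full((h, w), 128, dtype=np.uint8)                              # constant blue
img = Image.fromarray(np.dstack([r, g, b]), mode="RGB")

# Split the 24-bit image into its three 8-bit components (each a grayscale band).
red, green, blue = img.split()
print(img.mode, red.mode, img.size)   # RGB L (128, 128)

# 8 bits per component -> 2**24 representable colours.
print(2 ** 24)                        # 16777216
```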

The colour channels of an image

Image Compression

Data compression reduces the data needed to represent information. It distinguishes between data (the means of conveying information) and information itself. It targets redundancy, which is data that is either irrelevant or repetitive. For example, a 10:1 compression ratio indicates 90% data redundancy.
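
The arithmetic behind that example is the relative data redundancy R = 1 − 1/C, where C is the compression ratio; a tiny sketch with made-up sizes:

```python
# Compression ratio C = bits before / bits after; relative redundancy R = 1 - 1/C.
bits_original = 1_000_000    # hypothetical uncompressed representation
bits_compressed = 100_000    # hypothetical compressed representation

C = bits_original / bits_compressed
R = 1 - 1 / C

print(f"compression ratio {C:.0f}:1 -> relative data redundancy {R:.0%}")   # 10:1 -> 90%
```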

In digital image compression, particularly with 2-D intensity arrays, the three main types of redundancies are:

  • Coding redundancy: Coding redundancy is particularly prevalent in images whose intensity values are not spread evenly across all possible values, which shows up as a non-uniform histogram. In such images, some intensity values occur far more frequently than others, yet natural binary encoding assigns the same number of bits to every intensity value regardless of its frequency. Common values are therefore encoded no more efficiently than rare ones, leading to an inefficient use of bits and thus coding redundancy. Ideally, more frequent values should be assigned shorter codes and less frequent values longer codes to minimize the total number of bits, which natural binary encoding does not do.
  • Spatial and temporal redundancy: Spatial and temporal redundancy appear in correlated pixel values within an image or across video frames.
  • Irrelevant information: Irrelevant information includes data ignored by the human visual system or unnecessary for the image’s purpose.

Efficient coding considers the probabilities of events, such as intensity values in images. Techniques like run-length encoding reduce spatial redundancy in images containing runs of constant intensity, significantly compressing the data; temporal redundancy in video sequences can be addressed similarly. Removing irrelevant information, however, involves quantization, an irreversible loss of quantitative information. Information theory, with concepts like entropy, helps determine the minimum amount of data needed to represent an image accurately. Image quality after compression is assessed using objective fidelity criteria (mathematical functions of the input and output) and subjective fidelity criteria (human evaluations).
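
A minimal run-length encoding sketch over a single scan line, assuming plain Python lists rather than any particular image format:

```python
from itertools import groupby

def run_length_encode(row):
    """Encode a sequence of pixel values as (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]

def run_length_decode(pairs):
    """Invert the encoding back to the original pixel sequence."""
    return [value for value, count in pairs for _ in range(count)]

# A scan line with long constant-intensity runs compresses well.
row = [255] * 12 + [0] * 3 + [255] * 9
encoded = run_length_encode(row)

print(encoded)                              # [(255, 12), (0, 3), (255, 9)]
print(run_length_decode(encoded) == row)    # True
```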

Image compression systems use encoders and decoders. Encoders eliminate redundancies through mapping (to reduce spatial/temporal redundancy), quantization (to discard irrelevant information), and symbol coding (assigning codes to the quantizer output). Decoders reverse these processes, except for quantization. Image file formats, containers, and standards like JPEG and MPEG are used for data organization and storage. Huffman coding is a notable method for removing coding redundancy; it builds efficient representations by combining the least probable source symbols first.
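
A compact Huffman-coding sketch over pixel intensities, using Python's heapq; it is only one of many possible implementations and skips the bitstream packing a real codec would need:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a prefix code that gives shorter codewords to more frequent symbols."""
    freq = Counter(symbols)
    # Heap entries: (total frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate case: a single distinct symbol
        return {s: "0" for s in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # combine the two least probable groups first
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Intensity values with a skewed (non-uniform) histogram.
pixels = [7] * 20 + [3] * 10 + [0] * 5 + [255] * 2
codes = huffman_codes(pixels)
print(codes)   # the most frequent intensity (7) gets the shortest codeword
```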
