Digital Image Processing (DIP): A Beginner's Guide
Machines can be trained to interpret images much as our brains do, and then to analyze those images in far greater depth than we can. When computer intelligence (CI) is applied to image processing, it can power face recognition and authentication to help secure public places, detect and recognize objects and patterns in images and videos, and much more.
Digital image processing is the manipulation of images using digital computers. Its use has grown exponentially in the last decades. Multimedia systems, one of the main pillars of today’s information society, depend heavily on digital image processing.
What is image processing?
Image processing is a method of performing certain operations on an image to enhance it or extract useful information. It’s a type of signal processing in which the input is an image, and the output may be an image or characteristics/features associated with that image.
The two types of image processing techniques are analog and digital. Analog image processing is applied to hard copies such as printouts and photographs. Digital image processing techniques manipulate digital images using computers. The three general phases that all types of data undergo with digital techniques are pre-processing, enhancement and display, and information extraction.
But what is an image?
An image is a 2D light-intensity function denoted by f(x,y), where the value or amplitude of f at spatial coordinates (x,y) gives the intensity (brightness) of the image at that point. The image must be sampled and converted into a matrix of numbers to be processed digitally. Since a computer represents numbers with finite precision, these numbers must also be quantized to be represented digitally.
Two components characterize the basic nature of f(x,y):
- The amount of source light incident on the scene (illumination): i(x,y), where 0 ≤ i(x,y) < ∞
- The amount of light reflected by the objects in the scene (reflectance): r(x,y), where 0 ≤ r(x,y) ≤ 1
The two combine as a product to form the image function: f(x,y) = i(x,y) · r(x,y).
An image can be defined as a two-dimensional array arranged in rows and columns. A digital image comprises a finite number of elements, each with a particular value at a specific location. These elements are called picture elements, image elements, or pixels; the term pixel is the most widely used.
2D Array of Pixels (Source)
There are several types of images:
- Binary Images
This is the simplest type of image. Each pixel takes just one of two values, black or white (0 or 1), so a binary image is a 1-bit image: a single binary digit represents each pixel.
Example of binary image (Source)
- Grayscale images
Grayscale images are monochrome images: they contain no color information, only intensity. A standard grayscale image stores 8 bits per pixel, which allows 256 different grey levels, and each pixel takes one of those levels.
Grayscale Image
- Color images
Color images are three-band monochrome images in which each band stores the data for one color. Each spectral band holds grey-level intensity data, and digital color images are most often represented with red, green, and blue bands (the RGB model). A typical color image uses 24 bits per pixel, i.e., 8 bits for each of the three color bands (RGB).
RGB image representation (Source)
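To make this concrete, here is a minimal NumPy sketch of how the three image types are typically held in memory (the arrays are tiny, illustrative examples rather than real images):

```python
import numpy as np

# A 4x4 binary image: each pixel is 0 (black) or 1 (white).
binary = np.array([[0, 1, 1, 0],
                   [1, 1, 1, 1],
                   [1, 1, 1, 1],
                   [0, 1, 1, 0]], dtype=np.uint8)

# A grayscale image: one 8-bit channel, values 0-255 (256 grey levels).
gray = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)

# A color (RGB) image: three 8-bit channels, i.e. 24 bits per pixel.
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(binary.shape, gray.shape, rgb.shape)   # (4, 4) (4, 4) (4, 4, 3)
```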
Image processing is used extensively in medical visualization, biometric technologies, self-driving automobiles, gaming, surveillance, law enforcement agencies, and other fields. The following are the primary purposes of image processing:
Visualization – Converting (rendering) image pixels/voxels into a 2D/3D graphical representation. Most computers support an 8-bit (256-level) grayscale display, which is sufficient because human vision resolves only roughly 32-64 grayscale levels. Visualization aims to communicate data or information clearly and effectively to readers.
Image restoration – The purpose of image restoration is to “compensate for” or “undo” defects that degrade an image. Degradation takes many forms, such as motion blur, noise, and camera misfocus. In cases like motion blur, it is often possible to estimate the blurring function accurately and “undo” the blur to recover an image close to the original.
Image retrieval – Browsing, searching, and retrieving images from an extensive database of digital images. Most traditional and standard image retrieval methods rely on metadata such as captions, keywords, or descriptions attached to the images.
Pattern recognition — Pattern recognition is classifying input data into objects, classes, or categories using computer algorithms based on key features or regularities. Pattern recognition has applications in computer vision, image segmentation, object detection, radar processing, speech recognition, and text classification.
Pattern recognition operation (Source)
Digital image processing is usually broken down into a sequence of fundamental steps, from image acquisition through to compression and color processing. Let’s look at each of these phases.
Image acquisition – In image processing, image acquisition retrieves an image from a source, usually a hardware system such as a camera or sensor. It’s the first and most crucial step in the workflow, because without an image no processing can take place.
In the image acquisition process, incoming light energy from an object is converted into an electrical signal by a combination of sensors sensitive to that specific type of energy. These small sub-systems work together to provide the most precise representation of the object.
While the sensor system and cameras primarily rely on available technologies, users have complete control over illumination.
Image enhancement – Improves the quality of an image by bringing out information that is hidden in it, for further processing.
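Enhancement covers a wide range of operations; as one illustrative sketch (not the only approach), histogram equalization stretches the contrast of an 8-bit grayscale image using plain NumPy:

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Contrast enhancement of an 8-bit grayscale image by histogram equalization."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize the CDF to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)          # grey-level lookup table
    return lut[gray]                                    # remap every pixel
```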
Image restoration – Recovering an image from a degraded version, usually a blurred and noisy one. Image restoration is a basic problem in image processing, and it also provides a testbed for more general inverse problems. Restoration is performed by reversing the process that blurred the image. This is typically accomplished by imaging a point source and using the resulting point-source image, called the Point Spread Function (PSF), to recover the image information lost to the blurring process.
Two views of Jupiter taken from the Hubble Space Telescope with flawed mirror (top) and restored images (Source)
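To make the PSF idea concrete, here is a hedged NumPy sketch of the simplest frequency-domain approach, a regularized inverse filter; practical restoration usually relies on more robust methods such as Wiener or Richardson-Lucy deconvolution:

```python
import numpy as np

def inverse_filter(blurred: np.ndarray, psf: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Restore a blurred image by dividing out the PSF in the frequency domain."""
    # Pad the PSF to the image size so both spectra have the same shape.
    padded = np.zeros_like(blurred, dtype=np.float64)
    padded[:psf.shape[0], :psf.shape[1]] = psf
    H = np.fft.fft2(padded)          # transfer function of the blurring process
    G = np.fft.fft2(blurred)         # spectrum of the degraded image
    # Regularized division: eps keeps near-zero frequencies from exploding.
    F_hat = G * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft2(F_hat))
```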
Morphological processing describes the shapes and structures of the objects in an image. In morphological processing, pixels are added to or removed from objects, and the shape and structure of the objects are analyzed so that they can be identified. The basic operations in this processing are binary convolution and correlation, based on logical rather than arithmetic operations.
Image segmentation is the process of dividing an image into multiple segments. It is often used to locate objects and boundaries in images. The goal of segmentation is to simplify and change the representation of an image into something more meaningful and easier to analyze.
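A minimal sketch of the simplest segmentation technique, global thresholding, which splits a grayscale image into foreground and background; the fixed threshold of 128 is an arbitrary illustrative choice (methods such as Otsu's compute it automatically):

```python
import numpy as np

def threshold_segment(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Segment a grayscale image into a binary foreground/background mask."""
    return (gray > threshold).astype(np.uint8)   # 1 = foreground, 0 = background
```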
Object recognition is a computer-vision technique for identifying objects in images or videos. Object recognition is a crucial output of deep learning and machine learning algorithms. When humans look at a picture or watch a video, we can readily spot people, objects, scenes, and visual details. The objective is to teach a computer to do what comes naturally to humans: understand what an image contains.
Machine learning and deep learning techniques have recently become popular approaches to object recognition problems. Both methods learn to identify objects in images, but their execution differs.
Machine learning and deep learning techniques for object recognition. (Source)
Representation and description – Once a region has been segmented, it must be represented and described in a form suitable for further processing:
- Representation based on its external characteristics (its boundary): shape characteristics.
- Representation based on its internal characteristics (its region): regional properties such as color and texture.
- Description based on the selected representation: for example, length, orientation, the number of concavities in the boundary, and statistical measures of the region.
Image compression - Minimizing the size in bytes of a graphics file without degrading the quality of the image. The reduced file size allows more images to be stored in a given disk or memory space. It also reduces the time needed for images to be sent over the Internet or downloaded from web pages.
Image files can be compressed in several ways. The two most common compressed graphic image formats for Internet use are the JPEG format and the GIF format. The JPEG method is commonly used for photographs, while the GIF method is frequently used for line art and other images in which geometric shapes are relatively simple.
Some of the standard image compression techniques are:
Fractal
Wavelets
Transform coding
Run-length encoding (a minimal sketch follows this list)
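Run-length encoding is the easiest of these to sketch: it replaces runs of identical values with (value, count) pairs. A minimal, illustrative Python version (lossless, and only effective when the data contains long runs):

```python
def run_length_encode(values):
    """Compress a flat sequence of pixel values into (value, run_length) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1          # extend the current run
        else:
            encoded.append([v, 1])       # start a new run
    return encoded

# Example: a row of a binary image with long runs compresses well.
print(run_length_encode([0, 0, 0, 0, 1, 1, 0, 0, 0]))   # [[0, 4], [1, 2], [0, 3]]
```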
Color image processing: Color is essential in image processing because color is a robust descriptor that simplifies object identification and extraction. A color model aims to facilitate the specification of colors in some standard way. A color model specifies a coordinate system and a subspace in which a single point represents each color. The color models most common in image processing are:
- RGB model for color displays and video cameras
In this model, each color is represented by its red, green, and blue primary components. The model is based on a Cartesian coordinate system. The various colors in this model are points on or inside a cube and are defined by vectors extending from the origin.
RGB color model
- CMYK (cyan, magenta, yellow, black) model for color prints
- HSI (hue, saturation, intensity) model
The RGB and CMYK color models are unsuitable for describing colors in terms of human interpretation. When we view a color object, we define its hue, saturation, and brightness (intensity). The HSI model separates the intensity component from the color-carrying data (hue and saturation) in a color image. The hue, saturation, and intensity values can be extracted from the RGB color cube. As a result, this model is the ideal tool for developing color image processing algorithms.
Full-color image and its HSI component images (Source)
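Python's standard-library colorsys module works with HSV (hue, saturation, value), a close cousin of HSI, so it can serve as a hedged illustration of moving between RGB and a hue-based model:

```python
import colorsys

# Pure red in normalized RGB (all values in [0, 1]).
r, g, b = 1.0, 0.0, 0.0

h, s, v = colorsys.rgb_to_hsv(r, g, b)
print(h, s, v)                         # 0.0 1.0 1.0 -> hue 0, fully saturated, full value

# And back again.
print(colorsys.hsv_to_rgb(h, s, v))    # (1.0, 0.0, 0.0)
```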
Traditional image processing algorithms
1. Morphological Image Processing
Morphological image processing removes imperfections from binary images because binary regions produced by simple thresholding can be distorted by noise and texture. It also helps to smooth the image by using opening and closing operations.
Morphological operations can also be extended to grayscale images. Morphological processing comprises non-linear operations related to the structure or features of an image. The technique probes the image with a small template known as the structuring element, which is placed at all possible locations in the image and compared with the corresponding neighborhood of pixels. A structuring element is a small matrix with values of 0 and 1.
Let’s look at two basic functions of morphological image processing, Dilation and Erosion:
The dilation function adds pixels to the boundaries of objects in an image.
The erosion function removes pixels from object boundaries.
The table below lists rules for a Dilation and Erosion function.
The number of pixels deleted or added to the original image is based on the size of the structuring element.
“What is a structuring element?”
A structuring element is a matrix comprising only 0’s and 1’s that can have any shape and size. It is placed at all possible locations in the image, and it is compared with the corresponding neighborhood pixels.
Probing of an image with a structuring element (white and grey pixels have zero and non-zero values, respectively). (Source)
In the illustration above, structuring element ‘A’ fits the image, ‘B’ intersects the image, and ‘C’ lies entirely outside it.
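Here is a small sketch of dilation and erosion on a binary image, assuming SciPy is available; the 3x3 square of ones plays the role of the structuring element:

```python
import numpy as np
from scipy import ndimage

# A tiny binary image with a single 2x2 white object.
image = np.zeros((5, 5), dtype=bool)
image[1:3, 1:3] = True

# 3x3 square structuring element of ones.
selem = np.ones((3, 3), dtype=bool)

dilated = ndimage.binary_dilation(image, structure=selem)   # grows the object
eroded = ndimage.binary_erosion(image, structure=selem)     # shrinks (here: removes) it

print(dilated.sum(), eroded.sum())   # 16 0 -- dilation adds pixels, erosion removes them
```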
2. Gaussian Image Processing
A Gaussian blur, also referred to as Gaussian smoothing, blurs an image using a Gaussian function.
It is used to reduce image noise. The visual effect of this blurring technique is similar to viewing an image through a translucent screen. It is also used in computer vision to enhance image structures at different scales, and as a data-augmentation technique in deep learning.
The two-dimensional Gaussian function looks like this: G(x,y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²)), where σ is the standard deviation that controls the strength of the blur.
In practice, it is best to take full advantage of the Gaussian blur’s separable property by dividing the process into two passes. In the first pass, a one-dimensional kernel is used to blur the image in only the horizontal or vertical direction. The same one-dimensional kernel is used to blur in the remaining direction in the second pass.
When such a normally distributed filter is applied to an image, the result looks like this:
Gaussian image processing (Source)
You can see that some of the edges have slightly less detail. The filter gives more weight to pixels at the center than to those away from the center. Gaussian filters are low-pass filters, i.e., they weaken high frequencies. Gaussian smoothing is commonly used as a pre-processing step in edge detection.
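Below is a hedged sketch of the two-pass, separable implementation described above, assuming SciPy is available; gaussian_filter1d blurs along one axis at a time:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def separable_gaussian_blur(image: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Blur a 2D image with two 1D Gaussian passes (horizontal, then vertical)."""
    blurred = gaussian_filter1d(image.astype(np.float64), sigma=sigma, axis=1)  # horizontal pass
    blurred = gaussian_filter1d(blurred, sigma=sigma, axis=0)                   # vertical pass
    return blurred
```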
3. Fourier Transform in image processing
The Fourier Transform is an essential image-processing tool used to decompose an image into its sine and cosine components. It has several applications, including image reconstruction, compression, and filtering. Since we are concerned with digital images, we will restrict this discussion to the Discrete Fourier Transform (DFT). Let’s examine a sinusoid; it consists of three things:
Magnitude – related to contrast
Spatial frequency – related to brightness
Phase – related to color information
The image in the frequency domain appears like this:
Fourier transform decomposes the image into sine and cosine components and is widely used in image reconstruction, compression, and filtering. Source
The principle of the 2D discrete Fourier transform is: F(u,v) = Σₓ Σᵧ f(x,y) · e^(−j2π(ux/M + vy/N)), where the sums run over x = 0…M−1 and y = 0…N−1 for an M×N image.
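A minimal NumPy sketch of computing and centering the 2D DFT of an image; the log scaling of the magnitude is only for easier visualization:

```python
import numpy as np

def magnitude_spectrum(image: np.ndarray) -> np.ndarray:
    """Compute the centered log-magnitude spectrum of a 2D image."""
    F = np.fft.fft2(image)              # 2D discrete Fourier transform
    F_shifted = np.fft.fftshift(F)      # move the zero-frequency term to the center
    return np.log1p(np.abs(F_shifted))  # log scale for easier viewing
```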
4. Wavelet Image Processing
A wavelet is a wave-like oscillation whose amplitude starts at zero, increases, and then decreases back to zero. It can usually be visualized as a “brief oscillation”, like one recorded by a seismograph or heart monitor. Wavelets have certain properties that make them useful for signal processing. Wavelets can also be combined with portions of a known signal, using a “reverse, shift, multiply, and integrate” technique called convolution, to extract information from an unknown signal.
Difference between a sine-wave and a Wavelet (Source)
The main difference is that the sine-wave is not localized in time (it stretches out from -infinity to +infinity), while a wavelet is localized in time. Such a feature allows the Wavelet transform to obtain time information in addition to frequency information.
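As a hedged sketch, assuming the third-party PyWavelets package (pywt) is installed, a single-level 2D discrete wavelet transform splits an image into an approximation and three detail sub-bands:

```python
import numpy as np
import pywt  # PyWavelets, a third-party package: pip install PyWavelets

image = np.random.rand(64, 64)

# One level of the 2D discrete wavelet transform with the Haar wavelet.
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')

# cA holds the low-frequency approximation; cH, cV, cD hold horizontal,
# vertical, and diagonal detail coefficients, each half the original size.
print(cA.shape, cH.shape)   # (32, 32) (32, 32)
```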
Image processing using Neural Networks
Neural networks are multi-layered networks consisting of neurons or nodes. These neurons are the core processing units of the network. They are loosely modeled on the human brain: they take in data, train themselves to recognize patterns in the data, and then predict the output.
A basic neural network has three layers:
Input layer
Hidden layer
Output layer
Basic neural network (Source)
One well-known neural network architecture that made a significant breakthrough on image data is the Convolutional Neural Network, or CNN. CNNs are used to analyze images for a wide range of image-processing tasks.
A convolutional neural network is built from three primary types of layer:
Convolutional Layer
Pooling Layer
Fully Connected Layer
Convolutional Layer (CONV): This is the core building block of a CNN, responsible for the convolution operations. The element involved in this layer’s convolution operation is called the kernel or filter (a matrix). The kernel makes horizontal and vertical shifts based on the stride rate until the full image is traversed.
Movement of the kernel (Source)
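To make the kernel's horizontal and vertical shifts concrete, here is a minimal NumPy sketch of a stride-1, "valid" (no padding) 2D convolution as used in a CONV layer; deep-learning frameworks implement the same idea far more efficiently:

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Stride-1, 'valid' 2D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):              # vertical shifts of the kernel
        for j in range(out_w):          # horizontal shifts of the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 edge-detecting kernel applied to a 5x5 image gives a 3x3 feature map.
feature_map = conv2d(np.random.rand(5, 5), np.array([[-1, 0, 1]] * 3, dtype=float))
print(feature_map.shape)   # (3, 3)
```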
Pooling Layer (POOL): The pooling layer gradually reduces the image’s size, keeping only the most essential information. Its purpose is to progressively reduce the spatial dimension of the representation, which reduces the number of parameters and the amount of computation in the network. There are two types of pooling: max pooling and average pooling.
Max pooling returns the maximum value from the area covered by the kernel on the image. Average pooling returns an average of all the values in the part of the image covered by the kernel.
Pooling operation (Source)
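A short NumPy sketch of 2x2 max pooling and average pooling, assuming the feature map's height and width are divisible by the pool size:

```python
import numpy as np

def pool2x2(feature_map: np.ndarray, mode: str = "max") -> np.ndarray:
    """2x2 pooling with stride 2 over a single-channel feature map."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)   # group pixels into 2x2 blocks
    if mode == "max":
        return blocks.max(axis=(1, 3))     # max pooling keeps the strongest response
    return blocks.mean(axis=(1, 3))        # average pooling keeps the mean response

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(x, "max"))    # [[ 5.  7.] [13. 15.]]
print(pool2x2(x, "mean"))   # [[ 2.5  4.5] [10.5 12.5]]
```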
Fully Connected Layer (FC): The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures. CNN is used primarily to extract features from the image with the help of its layers. CNN is widely used in image classification, where each input image is passed through a series of layers to get a probabilistic value between 0 and 1.
CNN architecture (Source)
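Putting the three layer types together, here is a hedged sketch of a tiny image classifier, assuming Keras (shipped with TensorFlow) is available; the layer sizes and the ten-class output are illustrative choices, not prescriptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),                        # e.g. 28x28 grayscale images
    layers.Conv2D(16, kernel_size=3, activation="relu"),    # CONV: learn local features
    layers.MaxPooling2D(pool_size=2),                       # POOL: shrink the feature maps
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                                       # flatten for the dense layers
    layers.Dense(10, activation="softmax"),                 # FC: class probabilities
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```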
Conclusion
Digital image processing, or DIP, is a fascinating field with the potential to change how we perceive and interact with the visual world. Its uses are numerous and constantly evolving, from photo enhancement to the extraction of crucial data for scientific study. As you dive deeper, you’ll find a wealth of algorithms and methods that let you manipulate, interpret, and unlock the potential hidden in digital images. Remember, this is only the start: with commitment and exploration, you too can master the digital world, one pixel at a time.