Project 4 consists of two parts: Image Warping and Mosaicing (Part A) and Feature Matching for Autostitching (Part B).
In Part A, I rectified images using the Perspective Transform: I manually selected correspondences on the images and warped them so that the transformed correspondences form a rectangle. I also produced mosaic images by blending pairs of images that overlap with each other. First, I manually matched a few pixels that represent the same corner of an object in the two images. Then, treating these pixel matches as correspondences, I warped the first image so that its post-warp correspondences align with the correspondences on the second image; this way, the same objects in the two images match up. Finally, I applied Alpha Blending to the output mosaic to erase the edge between the two images.
In Part B, I also produced mosaic images, only this time the pixel matches are detected and selected automatically instead of manually. Corners serve as great landmarks of objects in an image, so I used the Harris Corner Detector to find corners in the images and treated them as interest points. Then, I used Adaptive Non-Maximal Suppression (ANMS) to select a few interest points that are not only high in "corner strength" but also distributed as uniformly as possible across the image; these are the potential correspondences. Next, I matched the potential correspondences using Feature Descriptors, abandoning any pixel whose best match did not score significantly higher than its second-best match. Since the matched pixels may still contain errors, I found the optimal set of matches using Random Sample Consensus (RANSAC). Finally, as in Part A, I used the optimal matches to perspective-transform the first image so that it aligns with the second, and blended the overlapping region to erase the edge.
Image rectification is a transformation that reprojects an image onto a plane facing a rectangular object in the scene head-on. To do this, I manually selected the corners of a rectangular object in the image as the correspondences, then applied a Perspective Transform so that these correspondences form a rectangle after the transformation. The Perspective Transform is denoted as \[ \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \] where (x, y) is a pixel's coordinate before the transform, (x', y') is its coordinate after the transform, and the middle 3×3 matrix is the Transformation Matrix we wish to obtain.
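To make the homogeneous-coordinate bookkeeping concrete, here is a minimal NumPy sketch of applying such a matrix to a batch of points (the function name apply_homography and the array layout are illustrative choices; the key step is the final division by w):

```python
import numpy as np

def apply_homography(H, points):
    """Map an (N, 2) array of (x, y) points through a 3x3 homography H."""
    pts = np.asarray(points, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to (x, y, 1)
    mapped = pts_h @ H.T                              # rows are (wx', wy', w)
    return mapped[:, :2] / mapped[:, 2:3]             # divide out w
```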
The transformation matrix can be calculated by solving the equation \[ \begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix}, \] where (x, y) and (x', y') are the coordinates of a correspondence before and after the transform, respectively. Each correspondence contributes two such rows, so with exactly 4 correspondences the stacked system has 8 equations for 8 unknowns and a unique solution. When more than 4 correspondences exist, I reach an approximate solution by solving the system for the first four correspondences, then applying Gradient Descent to adjust the transformation matrix according to the remaining correspondences. The loss function is the mean squared error between the correspondences' desired post-transform coordinates and the coordinates the current matrix actually produces.
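For reference, below is a minimal sketch of this system in NumPy. Note that it solves all correspondences at once in the least-squares sense via np.linalg.lstsq, a standard alternative to the four-point solve plus Gradient Descent refinement described above; the function name compute_homography is illustrative.

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate the 3x3 matrix mapping src -> dst from N >= 4 point pairs."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two rows per correspondence, exactly as in the equation above.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)[0]
    return np.append(h, 1.0).reshape(3, 3)  # the ninth entry is fixed to 1
```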
Figure 1 shows the correspondences (blue dots) on a sign image and a sheet image. Their rectified versions, produced with these correspondences, are shown in Figure 2.
An Image Mosaic is a compound image created by stitching together a series of adjacent pictures, where each pair of adjacent images shares an overlapping region displaying the same objects. In this project, I created mosaics from pairs of overlapping images. I manually matched a number of pixels (no fewer than 4) in the two images' overlapping region. Figure 3 shows the matching pixels on two Sather Tower images I shot. Using these pixel matches as correspondences, I warped the first image with a Perspective Transform so that its post-transform correspondences align with those on the second image; this makes the same objects in the overlapping region line up. The Sather Tower mosaic is shown in Figure 4a.
Modern cameras (especially phone cameras) auto-adjust image brightness, so it is almost impossible for the same point to keep the exact same color across two different shots. To erase the nasty edge between the two images, I applied Alpha Compositing to blend the overlapping region. The alpha value for each pixel is denoted as \[ \alpha = \frac{d_1}{d_1 + d_2}, \] where d1 is the pixel's Euclidean Distance to its closest non-overlapping point in image 1, and d2 is that of image 2. The distances are obtained quickly using the Distance Transform algorithm. The mosaic image after blending is shown in Figure 4b.
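A sketch of this blending step, assuming both images have already been warped onto a shared canvas along with boolean footprint masks (SciPy's distance_transform_edt computes the Euclidean Distance Transform; the function name alpha_blend is illustrative):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def alpha_blend(im1, im2, mask1, mask2):
    """Blend two (H, W, 3) float images already warped onto the same canvas.

    mask1/mask2 are boolean footprints of each image on that canvas.
    """
    d1 = distance_transform_edt(mask1)  # distance to the edge of image 1
    d2 = distance_transform_edt(mask2)  # distance to the edge of image 2
    # Outside a mask its distance is 0, so alpha degenerates to 0 or 1
    # there, and each image is kept unchanged outside the overlap.
    alpha = np.divide(d1, d1 + d2, out=np.zeros_like(d1), where=(d1 + d2) > 0)
    return im1 * alpha[..., None] + im2 * (1 - alpha[..., None])
```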
I also produced mosaic images of my messy room and of a crossroad on my way to school, shown in Figures 5 and 6, respectively.
In Part B, instead of manually matching the pixels on the two images, I rely on the computer to find the matching pixels. Corners make great landmarks of objects in an image, as the color distribution around the same corner remains nearly identical across the two views. I used the Harris Corner Detector to extract the corner strength of each pixel: the higher a pixel's corner strength, the more likely it marks a corner in the image. The corner strength of each pixel in an image featuring Moffitt Library (Figure 7a) is shown in Figure 7b.
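A sketch of this detection step using scikit-image's corner_harris (the helper name harris_interest_points and the way the top 5000 pixels are extracted are illustrative):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import corner_harris

def harris_interest_points(image, num_points=5000):
    """Corner strength map plus the num_points strongest pixels of an RGB image.

    Returns the response map and an (N, 2) array of (row, col) coordinates
    sorted from strongest to weakest corner strength.
    """
    strength = corner_harris(rgb2gray(image))
    flat = np.argsort(strength, axis=None)[::-1][:num_points]
    coords = np.column_stack(np.unravel_index(flat, strength.shape))
    return strength, coords
```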
Pixels with higher corner strength are more likely to have such a distinctive nearby color distribution, which boosts matching accuracy. However, an optimal set of correspondences for the Perspective Transform should also cover as much of the image as possible, and a clustered set of correspondences often leads to an unsuccessful mosaic. To find pixels that are not only high in corner strength but also not clustered, I used Adaptive Non-Maximal Suppression (ANMS): I ordered the pixels by descending corner strength and selected potential correspondences from the front of the list, keeping a pixel only if it lies no closer than 18 pixels to any pixel already selected. The 5000 pixels with the highest corner strength (interest points) are shown as blue dots in Figure 8, along with the selected interest points (potential correspondences) shown as red dots.
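A sketch of this greedy selection (the 18-pixel threshold is the one above; the max_points cap is an assumed parameter, as the exact number of selected points is not fixed here):

```python
import numpy as np

def greedy_anms(coords, min_dist=18, max_points=500):
    """Greedily keep strong, well-separated interest points.

    coords: (N, 2) points already sorted by descending corner strength.
    A point is kept only if it lies at least min_dist pixels from every
    point kept so far.
    """
    kept = []
    for p in coords:
        if all(np.hypot(*(p - q)) >= min_dist for q in kept):
            kept.append(p)
            if len(kept) == max_points:
                break
    return np.array(kept)
```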
With potential correspondences selected on both adjacent images, it is time to match them. Within each match, the two potential correspondences should point to the same corner of the same object (the same corner of a building, the same treetop, etc.). Research has suggested that when the neighborhood of a point on an object is blurred, the blurred result remains similar even if the filming angle rotates within a small range. Therefore, I sampled pixels from a patch of radius r = 20 (size 40×40) with spacing s = 5, yielding an 8×8 Feature Descriptor. The sampled pixels are normalized to a mean of 0 and a standard deviation of 1. The correspondences whose Feature Descriptors achieve the highest inner product are matched.
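A sketch of the descriptor extraction, assuming a grayscale image and (row, col) interest points. The Gaussian blur width is a guess, as only r and s are fixed above, and border points without a full patch are simply skipped:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptors(gray, points, r=20, s=5):
    """Build an 8x8 descriptor around each in-bounds (row, col) point."""
    blurred = gaussian_filter(gray, sigma=2)  # blur before coarse sampling
    descs, kept = [], []
    for row, col in points:
        if r <= row < gray.shape[0] - r and r <= col < gray.shape[1] - r:
            patch = blurred[row - r:row + r:s, col - r:col + r:s]  # 8x8 samples
            patch = (patch - patch.mean()) / (patch.std() + 1e-8)  # mean 0, std 1
            descs.append(patch.ravel())
            kept.append((row, col))
    return np.array(descs), np.array(kept)
```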
When there are similar objects in an image (e.g., similar windows on a building, or leaves on a tree), it is easy to match the wrong corner. So I abandoned every pixel whose best match does not score significantly higher than its second-best match, i.e., whenever the inner product achieved with the second-best match is not at least 40% smaller than that achieved with the best match. The potential correspondences (pink dots) and the selected pixel matches (red dots and lines) in the two adjacent Moffitt Library images are shown in Figure 9.
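A sketch of this matching rule, where the ratio of 0.6 encodes the "at least 40% smaller" threshold and match_descriptors is an illustrative name:

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.6):
    """Pair descriptors by inner product, keeping only confident matches.

    A match (i, j) survives only if the second-best inner product for
    descriptor i is at most ratio times its best inner product.
    Returns an (M, 2) array of index pairs into desc1 / desc2.
    """
    scores = desc1 @ desc2.T                     # all pairwise inner products
    order = np.argsort(scores, axis=1)[:, ::-1]  # best candidates first
    matches = []
    for i, row in enumerate(order):
        best, second = scores[i, row[0]], scores[i, row[1]]
        if second <= ratio * best:
            matches.append((i, row[0]))
    return np.array(matches)
```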
The matched pixels may still contain errors. I adopted the idea of Random Sample Consensus (RANSAC) to find the optimal set of matches for the Perspective Transform. In each of 1000 iterations, I calculated the transformation matrix from 4 randomly selected pixel matches and recorded the number of outliers produced by that matrix. An outlier is any matched correspondence that, after the transform, fails to align with its counterpart in the second image; here, a distance larger than 3 pixels counts as misaligned. The transformation matrix with the fewest outliers is the optimal transformation matrix.
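A sketch of the loop, reusing compute_homography and apply_homography from the earlier sketches (points are (x, y) pairs; thresh is the 3-pixel tolerance above):

```python
import numpy as np

def ransac_homography(src, dst, n_iters=1000, thresh=3.0):
    """Keep the homography with the fewest outliers over n_iters trials.

    src/dst: (N, 2) matched points in images 1 and 2. Each trial fits a
    matrix to 4 random matches and counts matches whose reprojection
    error exceeds thresh pixels.
    """
    rng = np.random.default_rng()
    best_H, best_outliers = None, np.inf
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=4, replace=False)
        H = compute_homography(src[idx], dst[idx])
        err = np.linalg.norm(apply_homography(H, src) - dst, axis=1)
        outliers = np.count_nonzero(err > thresh)
        if outliers < best_outliers:
            best_H, best_outliers = H, outliers
    return best_H
```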
Using the same technique introduced in Part A, I produced mosaics of Moffitt Library (Figure 10), the Bakar BioEnginuity Hub (Figure 11), and the FU on my roommate's door (Figure 12). I also generated mosaics from the same source images using manually selected matches, and present them alongside the fully automatically generated ones. In my opinion, the mosaics with computer-generated matches appear more accurately aligned.
* Part A finished on Oct 19, 2024; Part B finished on Oct 24, 2024.