Sergey Prokudin-Gorskii photographed the Russian Empire on black-and-white negatives exposed through red, green, and blue filters, hoping that future technology could stack the three layers to produce a color image (see Prokudin-Gorskii's work here). The goal of this project is to take the digitized negatives of Prokudin-Gorskii's work and produce a single RGB color image. The three layers are not precisely aligned, so producing a clear picture requires an automated alignment process.
To align the blue and green layers to the red layer accurately, I defined a loss function that measures how well two layers are aligned. The optimal alignment is the one that minimizes this loss.
I initially used the L2 loss between raw pixel values, only to find the results unsatisfying. A bright pixel in one layer does not necessarily correspond to a bright pixel in another: the same color may produce a high value when passed through one filter and a low value when passed through another.
Even though pixel values in different layers may not match, the edges of shapes and figures should align. Therefore, I adopted the well-known Sobel operator to detect the edges in each layer before alignment, and defined the loss as the L2 distance between the layers' edge maps.
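The edge-based loss can be sketched in plain NumPy as follows. This is a minimal illustration, not the project's actual code; the function names `sobel_edges` and `edge_loss` are my own.

```python
import numpy as np

def sobel_edges(img):
    """Gradient magnitude via the Sobel operator (edge-padded, pure NumPy)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros(img.shape, dtype=float)
    gy = np.zeros(img.shape, dtype=float)
    # Accumulate the 3x3 correlation one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + img.shape[0], j:j + img.shape[1]]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)

def edge_loss(layer_a, layer_b):
    """L2 loss between the Sobel edge maps of two layers."""
    diff = sobel_edges(layer_a) - sobel_edges(layer_b)
    return np.sum(diff ** 2)
```

Because the loss compares edge maps rather than raw intensities, two layers with very different brightness for the same scene can still score a low loss when their contours coincide.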
Figure 1 shows the original blue layer of emir.tif and its edges detected by the Sobel operator.
To find the alignment with the lowest loss, I exhaustively search all candidate shifts within a range: the center of the blue or green layer may be aligned to any pixel in a square region on the red layer, defined by its center and side length. I initially placed the square at the center of the red layer, with sides 21 pixels long. After evaluating all 441 candidates, the alignment with the lowest loss is chosen. I applied this process to the three smaller JPEG images, cathedral.jpg, monastery.jpg, and tobolsk.jpg. Figures 2, 3, and 4 compare the images stacked directly without alignment against the aligned images.
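The exhaustive search over a 21-by-21 square of candidate shifts can be sketched as below. This is an illustrative version under my own naming; it takes the loss function as a parameter and uses circular shifts via `np.roll` for brevity, whereas a real implementation might crop borders instead.

```python
import numpy as np

def exhaustive_align(moving, reference, loss, radius=10):
    """Try every (dy, dx) shift in a (2*radius+1)^2 square and return
    the shift whose loss against the reference layer is smallest."""
    best_shift, best_loss = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            candidate = np.roll(moving, (dy, dx), axis=(0, 1))
            current = loss(candidate, reference)
            if current < best_loss:
                best_loss, best_shift = current, (dy, dx)
    return best_shift
```

With radius 10 this evaluates exactly 21 x 21 = 441 candidates, matching the search square described above.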
Although exhaustive search proved effective on the small JPEG images, on the larger TIFF images the optimal blue and green shifts may exceed the range of [-10, 10] pixels, and expanding the search range increases running time dramatically. To improve efficiency, I adopted the idea of an image pyramid. Each large TIFF image is downsampled by average pooling with a factor of 3 along each axis, and the result is downsampled again until it is small enough, which in this task means a height under 500 pixels. Exhaustive search is conducted on the center square of the smallest image, and the optimal shifts for the blue and green layers are recorded. The pyramid then moves up one level: the shift found at the lower level, scaled up, serves as an estimate of the best alignment, and an exhaustive search around that estimate refines the result at the higher resolution. This repeats level by level until the optimal alignment is obtained on the original image. I implemented this process with a recursive function. The resulting images are shown in Figures 5 through 15.
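The recursive coarse-to-fine procedure might look like the sketch below. The pooling factor of 3 and the 500-pixel cutoff follow the text; the function names, the refine radius of 3, and the use of circular `np.roll` shifts are my own assumptions for illustration.

```python
import numpy as np

def brute_force(moving, reference, loss, radius):
    """Exhaustive search over all shifts within the given radius."""
    best, best_loss = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            current = loss(np.roll(moving, (dy, dx), axis=(0, 1)), reference)
            if current < best_loss:
                best_loss, best = current, (dy, dx)
    return best

def downsample(img, factor=3):
    """Average-pool by `factor` along each axis (cropping any remainder)."""
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def pyramid_align(moving, reference, loss, min_height=500, radius=10, refine=3):
    """Recursive coarse-to-fine alignment: solve the downsampled problem
    first, scale that shift up by the pooling factor, then refine locally."""
    if moving.shape[0] <= min_height:
        return brute_force(moving, reference, loss, radius)
    coarse = pyramid_align(downsample(moving), downsample(reference),
                           loss, min_height, radius, refine)
    estimate = (coarse[0] * 3, coarse[1] * 3)
    shifted = np.roll(moving, estimate, axis=(0, 1))
    local = brute_force(shifted, reference, loss, refine)
    return (estimate[0] + local[0], estimate[1] + local[1])
```

Because each level only searches a small window around the scaled-up estimate from the level below, the total work grows roughly linearly with the number of pyramid levels instead of quadratically with the shift range.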
* Finished on Sep 9, 2024.