Final Project: Neural Radiance Field (NeRF)
My final project consists of two parts: Fit a Neural Field to a 2D Image
(Part A), and Fit a Neural Radiance Field from Multi-view Images
(Part B).
In Part A, I built a Multilayer Perceptron (MLP) network to
fit a single 2D image, so that given any pixel's coordinates, the network
can predict that pixel's RGB color. When the image's shape is provided,
the network can reconstruct the whole image.
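A key ingredient in coordinate-based networks like this one is typically a sinusoidal positional encoding of the input coordinates, which helps the MLP fit high-frequency detail. A minimal sketch of that idea (the frequency count, normalization, and function name are assumptions for illustration, not the project's exact settings):

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Expand coordinates into [x, sin(2^k * pi * x), cos(2^k * pi * x)]
    for k = 0 .. num_freqs - 1, applied elementwise."""
    out = [x]
    for k in range(num_freqs):
        out.append(np.sin(2.0 ** k * np.pi * x))
        out.append(np.cos(2.0 ** k * np.pi * x))
    return np.concatenate(out, axis=-1)

# One (u, v) pixel coordinate, normalized to [0, 1], becomes a
# 2 * (1 + 2 * num_freqs) = 42-dimensional input vector for the MLP.
enc = positional_encoding(np.array([[0.5, 0.25]]))
```

The encoded vector, rather than the raw coordinate, is what gets fed into the MLP's first layer.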
In Part B, I trained a larger MLP network to serve as a Neural
Radiance Field (NeRF) and used it to fit a 3D Lego object through
inverse rendering from multi-view calibrated images. Each pixel
on the images was associated with a ray represented in the 3D world
coordinate system. Sample locations were gathered along each ray, and
their volume-rendering results were fit to the RGB color of the
corresponding pixel. In this way the Lego object was encoded into the
NeRF. Using the trained NeRF, I can predict images of the Lego taken
from any given perspective. I rendered such images into a video
to create a rotating effect of the Lego.
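The volume-rendering step described above can be sketched in NumPy as follows (a simplified illustration with invented names; the per-ray sample spacing is taken as given):

```python
import numpy as np

def volume_render(sigmas, colors, deltas):
    """Composite N samples along one ray into a single RGB color.

    sigmas: (N,) densities, colors: (N, 3) RGB values, deltas: (N,)
    distances between consecutive samples along the ray.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)              # opacity of each segment
    # Transmittance T_i: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    weights = trans * alphas                             # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)
```

Because every operation is differentiable, the rendered color can be compared against the ground-truth pixel and the loss backpropagated into the network.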
Project 5: Fun with Diffusion Models
Project 5 consists of two parts: The Power of Diffusion Models
(Part A) and Diffusion Models from Scratch (Part B).
In Part A, I mainly played around with a pretrained diffusion model
called DeepFloyd IF. First, I used the model for
denoising: I corrupted a sample image with random noise
and used the model to predict that noise. I also denoised the image
with a Gaussian blur and compared the results. Then, I denoised
a pure-noise image to obtain a computer-generated image, applying
the Classifier-Free Guidance technique. Later, I conducted image-to-image
translation, where images are turned into similar images guided either
by masks or by text prompts. Finally, I produced Visual Anagrams,
Hybrid Images, and a course logo.
In Part B, I built and trained a diffusion model from scratch. First,
I trained a UNet to denoise half-noisy MNIST images (original
image + 50% pure noise). Then, to denoise images with different
amounts of noise, I added Time Conditioning to the UNet, where the
UNet is told how noisy each image is. The trained UNet can accurately
predict the noise that had been added to the images. Using the
trained UNet denoiser, I generated MNIST-like images by denoising
pure noise over 300 steps, only to find that the generated images
look little like handwritten digits. To improve the results, I added
Class Conditioning to the UNet, where the UNet is told not only how
noisy the images are, but also the labels (0 to 9) of the images.
10% of the images are given no label. I applied Classifier-Free
Guidance to generate MNIST-like images. The results are much better
than in the previous attempt.
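The Classifier-Free Guidance combination used at sampling time can be sketched as follows (the function name and default guidance scale are illustrative assumptions):

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, gamma=5.0):
    """Classifier-free guidance: extrapolate past the conditional estimate.

    gamma = 0 reproduces the unconditional prediction, gamma = 1 the
    conditional one; gamma > 1 pushes the sample harder toward the label."""
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```

Training with 10% of labels dropped is what gives the model a usable unconditional estimate to plug into this formula.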
Project 4: Autostitching and Photo Mosaic
Project 4 consists of two parts: Image Warping and Mosaicing
(Part A) and Feature Matching for Autostitching (Part B).
In Part A, I rectified images using Perspective Transform.
I manually selected correspondences on the images, and warped them
so that their transformed correspondences form a rectangle.
I also produced mosaic images by blending pairs of images that
overlap with each other. First, I manually matched a few pixels that
represent the same corner of an object in the two images. Then, I
treated these pixel matches as correspondences and warped the first
image so that, after warping, its correspondences align with those
on the second image. In this way, the same objects in the two images
match. Finally, I applied Alpha Blending to the output mosaic to
erase the seam between the two images.
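The perspective transform behind both parts is a 3x3 homography estimated from the correspondences. A minimal least-squares sketch (the function name and the h33 = 1 normalization are illustrative choices, not necessarily the project's exact code):

```python
import numpy as np

def compute_homography(src, dst):
    """Solve for the 3x3 homography H (with h33 fixed to 1) mapping src
    points to dst points, via least squares; needs >= 4 point pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float),
                        rcond=None)[0]
    return np.append(h, 1.0).reshape(3, 3)
```

Applying H to a homogeneous pixel coordinate and dividing by the last component gives the warped position.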
In Part B, I also produced mosaic images,
only this time instead of manually matching the pixels, the pixel
matches are automatically detected and selected.
Corners serve as distinctive landmarks of objects in an image, so I
used the Harris Corner Detector to find corners in the images and
treated them as interest points. Then, I used
Adaptive Non-Maximal Suppression (ANMS) to select a few interest
points that are not only high in "corner strength", but also as
uniformly distributed over the image as possible. These are the
potential correspondences. Later, I matched the potential
correspondences using Feature Descriptors. If the best match of a
potential correspondence did not score significantly better than its
second-best match, I abandoned that pixel. The matched pixels may
still contain errors, so I found the optimal set of matches using
Random Sample Consensus (RANSAC). At last, similar to Part A, I used
the optimal matches to apply a perspective transform to the first
image so that it aligns with the second image, and blended the
overlapping region to erase the seam.
Project 3: Face Morphing and Modelling a Photo Collection
In the first part of this project, I morphed two face images using
Affine Transformation. I obtained 100 correspondences for
each of the two faces and computed their average coordinates.
Triangulation was conducted on the correspondences and for
each triangle, Affine Matrices were generated to stretch the
triangles from the original images to the Midway image. I used Cross
Dissolve to blend the colors. I further generated a sequence
of 51 morphed images using different Morph Weights to produce the
morphing GIF. In the second part, I computed the mean face of 12
Brazilian faces and stretched my face into the shape of the mean face.
I also computed the mean face of 12 smiling Brazilian faces to add a smile
to my grim portrait.
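The per-triangle warp and the color blend can be sketched as follows (invented helper names; a minimal illustration rather than the project's actual code):

```python
import numpy as np

def triangle_affine(src_tri, dst_tri):
    """2x3 affine matrix mapping the vertices of src_tri onto dst_tri.

    Each triangle is a (3, 2) array of (x, y) vertices."""
    src = np.hstack([src_tri, np.ones((3, 1))])  # rows are (x, y, 1)
    return np.linalg.solve(src, dst_tri).T       # solve src @ X = dst; M = X^T

def cross_dissolve(img_a, img_b, t):
    """Blend the colors of two already-warped images with morph weight t."""
    return (1.0 - t) * img_a + t * img_b
```

Sweeping t from 0 to 1 (here over 51 steps) while warping both faces toward the intermediate shape yields the frames of the morphing GIF.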
Project 2: Fun with Filters and Frequencies
By applying filters and analyzing frequencies, images can be processed
and combined in interesting ways.
In the first part of this project, edge detection is conducted by applying
the Finite Difference Filter. A Gaussian Filter is applied to
get rid of unnecessary wrinkles in the edge map. Then, images are
sharpened by stacking their edges onto themselves. The second part of
this project consists of two image blending tasks. The first task
generates a Hybrid Image by adding the high frequencies of one image
to the low frequencies of another. Both successful and failed
attempts are shown. The second task blends images by applying the
Gaussian Stack and the Laplacian Stack.
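The stack-based blending can be sketched in 1-D (the real project works on 2-D images; the level count, blur widths, and function names are illustrative assumptions):

```python
import numpy as np

def gaussian_blur_1d(x, sigma):
    """Blur a 1-D signal with a truncated, normalized Gaussian kernel."""
    r = int(3 * sigma)
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2.0 * sigma ** 2))
    return np.convolve(x, k / k.sum(), mode="same")

def multires_blend(a, b, mask, levels=4):
    """Blend two signals: at each Laplacian level, mix the band-pass
    details of a and b using a progressively smoother copy of the mask."""
    out = np.zeros_like(a, dtype=float)
    ga, gb, gm = a.astype(float), b.astype(float), mask.astype(float)
    for lvl in range(levels):
        sigma = 2.0 ** lvl
        na, nb = gaussian_blur_1d(ga, sigma), gaussian_blur_1d(gb, sigma)
        # Band-pass (Laplacian) level; the final level keeps the low-pass residual.
        la, lb = (ga - na, gb - nb) if lvl < levels - 1 else (ga, gb)
        out += gm * la + (1.0 - gm) * lb
        ga, gb, gm = na, nb, gaussian_blur_1d(gm, sigma)
    return out
```

Blending each frequency band with an increasingly smooth mask is what hides the seam: fine detail switches over sharply while coarse intensity transitions gradually.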
Project 1: Colorizing the Prokudin-Gorskii Photo Collection
Prokudin-Gorskii photographed the Russian Empire using black-and-white
negatives with red, green, and blue filters, hoping future technologies could
stack the three layers to produce color images. I took the digitized negatives
of Prokudin-Gorskii's work to produce RGB color images. The original
three color layers from Prokudin-Gorskii's work were not accurately aligned,
so I designed an alignment algorithm using an Image Pyramid and Edge
Detection to preprocess the layers before stacking them together.
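The coarse-to-fine pyramid search can be sketched as follows (a simplified version using circular shifts and a sum-of-squared-differences score; the project's actual metric also involves edge detection, and all names here are illustrative):

```python
import numpy as np

def shift2d(img, dy, dx):
    """Circularly shift an image by (dy, dx) pixels."""
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def align(ref, mov, radius=2, levels=4):
    """Coarse-to-fine search for the (dy, dx) shift aligning mov to ref.

    The shift is estimated on a 2x-downsampled pair first, doubled, then
    refined over a small window at the current resolution."""
    dy = dx = 0
    if levels > 1 and min(ref.shape) >= 2 * (2 * radius + 2):
        cy, cx = align(ref[::2, ::2], mov[::2, ::2], radius, levels - 1)
        dy, dx = 2 * cy, 2 * cx
    best, best_err = (dy, dx), np.inf
    for ddy in range(-radius, radius + 1):
        for ddx in range(-radius, radius + 1):
            err = np.sum((ref - shift2d(mov, dy + ddy, dx + ddx)) ** 2)
            if err < best_err:
                best, best_err = (dy + ddy, dx + ddx), err
    return best
```

Searching a small window at each pyramid level keeps the cost low even when the true displacement on the full-resolution scans is large.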