UC Berkeley CS 180 Fall 2024
Intro to Computer Vision and Computational Photography
Projects

David Wei
david_wei@berkeley.edu
  1. Final Project : Neural Radiance Field (NeRF)
    My final project consists of two parts: Fit a Neural Field to a 2D Image (Part A), and Fit a Neural Radiance Field from Multi-view Images (Part B).
     In Part A, I built a Multilayer Perceptron (MLP) network to fit a single 2D image so that, given any pixel's coordinates, the network can predict that pixel's RGB color. When the image's shape is provided, the network can reconstruct the whole image.
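     Below is a minimal sketch of this setup in PyTorch. The layer sizes and the sinusoidal positional encoding with 10 frequency bands are illustrative assumptions, not the project's exact configuration.

        import torch
        import torch.nn as nn

        def positional_encoding(xy, num_freqs=10):
            """Map coordinates to [x, sin(2^k * pi * x), cos(2^k * pi * x), ...]."""
            parts = [xy]
            for k in range(num_freqs):
                parts.append(torch.sin((2.0 ** k) * torch.pi * xy))
                parts.append(torch.cos((2.0 ** k) * torch.pi * xy))
            return torch.cat(parts, dim=-1)

        class Field2D(nn.Module):
            def __init__(self, num_freqs=10, hidden=256):
                super().__init__()
                in_dim = 2 * (1 + 2 * num_freqs)  # (x, y) plus their sin/cos pairs
                self.num_freqs = num_freqs
                self.net = nn.Sequential(
                    nn.Linear(in_dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
                )

            def forward(self, xy):
                return self.net(positional_encoding(xy, self.num_freqs))

        # One training step: sample pixel coordinates (normalized to [0, 1])
        # and regress their colors with an MSE loss.
        model = Field2D()
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        xy = torch.rand(1024, 2)   # stand-in batch of pixel coordinates
        rgb = torch.rand(1024, 3)  # stand-in ground-truth colors
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(xy), rgb)
        loss.backward()
        optimizer.step()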
     In Part B, I trained a larger MLP network to serve as a Neural Radiance Field (NeRF) and used it to fit a 3D Lego object through inverse rendering from multi-view calibrated images. Each pixel on the images was associated with a ray represented in the 3D world coordinate system. Sample locations were gathered along the rays, and their volume rendering results were fit to the RGB colors of the images' pixels. In this way the Lego object was modeled by the NeRF. Using the trained NeRF, I am able to predict images of the Lego taken from any given perspective. I rendered these images into a video to create a rotating effect of the Lego.
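     The volume rendering step that turns per-sample densities and colors into one pixel color can be sketched as below; the tensor shapes, (N_rays, N_samples) for sigmas and deltas and (N_rays, N_samples, 3) for rgbs, are my assumptions.

        import torch

        def volume_render(sigmas, rgbs, deltas):
            # Opacity contributed by each ray segment of length delta_i.
            alphas = 1.0 - torch.exp(-sigmas * deltas)
            # Transmittance T_i = prod_{j<i} (1 - alpha_j): the chance a ray
            # reaches sample i without being absorbed earlier.
            ones = torch.ones_like(alphas[..., :1])
            trans = torch.cumprod(torch.cat([ones, 1.0 - alphas + 1e-10], dim=-1), dim=-1)[..., :-1]
            weights = trans * alphas                           # (N_rays, N_samples)
            return (weights.unsqueeze(-1) * rgbs).sum(dim=-2)  # (N_rays, 3)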
  2. Project 5 : Fun with Diffusion Models
    Project 5 consists of two parts: The Power of Diffusion Models (Part A) and Diffusion Models from Scratch (Part B).
      In Part A, I mainly experimented with a pretrained diffusion model called DeepFloyd IF. First, I used the model to conduct denoising: I corrupted a sample image with random noise and used the model to predict that noise. I also denoised the image using Gaussian Blur and compared the denoised results. Then, I denoised a pure-noise image to obtain a computer-generated image, adopting the Classifier-Free Guidance technique. Later, I conducted image-to-image translation, where images are translated into similar images based on either masks or text prompts. Finally, I produced Visual Anagrams, Hybrid Images, and a course logo.
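      For reference, the Classifier-Free Guidance step can be sketched as below; the `model` call signature and the guidance scale value are illustrative assumptions, and DeepFloyd's actual API differs.

        import torch

        def cfg_noise_estimate(model, x_t, t, cond_emb, uncond_emb, scale=7.5):
            eps_uncond = model(x_t, t, uncond_emb)  # noise estimate with the empty prompt
            eps_cond = model(x_t, t, cond_emb)      # noise estimate with the text prompt
            # Push the estimate past the conditional one, away from the
            # unconditional one, to strengthen prompt adherence.
            return eps_uncond + scale * (eps_cond - eps_uncond)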
      In Part B, I built and trained a diffusion model from scratch. First, I trained a UNet to denoise half-noisy MNIST images (original image + 50% pure noise). Then, to denoise images with different amounts of noise, I added Time Conditioning to the UNet, so that the UNet is told how noisy each image is. The trained UNet can accurately predict the noise that was added to the images. Using the trained UNet denoiser, I generated MNIST-like images by denoising pure noise in 300 steps, only to find that the computer-generated images looked little like human-written digits. To improve the results, I added Class Conditioning to the UNet, so that the UNet is told not only how noisy the images are but also the images' labels (0 to 9); 10% of the images are provided with no label. I applied Classifier-Free Guidance to generate MNIST-like images, and the results are much better than in the previous attempt.
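      A minimal sketch of one training step for the time-conditioned denoiser is shown below. The linear beta schedule and the `unet(x_t, t)` call signature are assumptions; the class-conditioned variant would additionally pass the (sometimes dropped) label.

        import torch
        import torch.nn.functional as F

        T = 300
        betas = torch.linspace(1e-4, 0.02, T)
        alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # noise schedule

        def train_step(unet, x0, optimizer):
            t = torch.randint(0, T, (x0.shape[0],))         # random timestep per image
            ab = alpha_bars[t].view(-1, 1, 1, 1)
            eps = torch.randn_like(x0)
            x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps  # forward noising
            loss = F.mse_loss(unet(x_t, t), eps)            # UNet predicts the added noise
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()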
  3. Project 4 : Autostitching and Photo Mosaic
    Project 4 consists of two parts: Image Warping and Mosaicing (Part A) and Feature Matching for Autostitching (Part B).
     In Part A, I rectified images using the Perspective Transform. I manually selected correspondences on the images and warped the images so that the transformed correspondences form a rectangle. I also produced mosaic images by blending pairs of images that overlap with each other. First, I manually matched a few pixels that represent the same corner of an object in the two images. Then, I treated these pixel matches as correspondences and warped the first image so that its transformed correspondences align with the correspondences on the second image; in this way, the same objects in the two images match up. Finally, I applied Alpha Blending to the output mosaic to erase the seam between the two images.
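     Recovering the 3x3 perspective transform (homography) from matched correspondences can be sketched with least squares as below; the (N, 2) NumPy point arrays and the helper name are mine, not the project's.

        import numpy as np

        def compute_homography(src, dst):
            A, b = [], []
            for (x, y), (u, v) in zip(src, dst):
                # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and similarly
                # for v, rearranged into two equations linear in the 8 unknowns.
                A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
                A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
                b.extend([u, v])
            h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
            return np.append(h, 1.0).reshape(3, 3)  # fix the last entry to 1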
     In Part B, I again produced mosaic images, only this time the pixel matches were detected and selected automatically instead of manually. Corners serve as strong landmarks of objects in an image, so I used the Harris Corner Detector to find corners in the images and treated them as interest points. Then, I used Adaptive Non-Maximal Suppression (ANMS) to select a subset of interest points that are not only high in "corner strength" but also distributed as uniformly as possible across the image; these are the potential correspondences. Next, I matched the potential correspondences using Feature Descriptors, discarding any point whose best match did not score significantly higher than its second-best match. Since the matched pixels may still contain errors, I found the optimal set of matches using Random Sample Consensus (RANSAC). Finally, as in Part A, I used the optimal matches to apply a perspective transform to the first image so that it aligns with the second, and blended the overlapping region to erase the seam.
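     The RANSAC loop can be sketched as below, reusing the compute_homography helper from the Part A sketch; the iteration count and the inlier threshold (in pixels) are assumptions.

        import numpy as np

        def ransac_homography(src, dst, iters=1000, thresh=2.0, seed=0):
            rng = np.random.default_rng(seed)
            best_inliers = np.zeros(len(src), dtype=bool)
            ones = np.ones((len(src), 1))
            for _ in range(iters):
                idx = rng.choice(len(src), size=4, replace=False)  # minimal sample
                H = compute_homography(src[idx], dst[idx])
                # Project every source point and measure reprojection error.
                proj = (H @ np.hstack([src, ones]).T).T
                proj = proj[:, :2] / proj[:, 2:3]
                inliers = np.linalg.norm(proj - dst, axis=1) < thresh
                if inliers.sum() > best_inliers.sum():
                    best_inliers = inliers
            # Refit on all inliers of the best sample for the final estimate.
            return compute_homography(src[best_inliers], dst[best_inliers]), best_inliers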
  4. Project 3 : Face Morphing and Modelling a Photo Collection
    In the first part of this project, I morphed two face images using Affine Transformations. I obtained 100 correspondences for each of the two faces and computed their average coordinates. Triangulation was conducted on the correspondences, and for each triangle, an Affine Matrix was generated to stretch the triangle from the original image to the midway image. I used Cross Dissolve to blend the colors. I then generated a sequence of 51 morphed images with different Morph Weights to produce the morphing GIF. In the second part, I computed the mean face of 12 Brazilian faces and stretched my face into the shape of the mean face. I also computed the mean face of 12 smiling Brazilian faces to add a smile to my grim portrait.
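    The per-triangle affine warp and the cross dissolve can be sketched as below; the (3, 2) vertex arrays and the helper names are illustrative assumptions.

        import numpy as np

        def affine_from_triangles(tri_src, tri_dst):
            # Solve for the 2x3 matrix M with M @ [x, y, 1]^T mapping each
            # source vertex to the matching destination vertex.
            A = np.hstack([tri_src, np.ones((3, 1))])  # (3, 3)
            return np.linalg.solve(A, tri_dst).T       # (2, 3)

        def cross_dissolve(warped_a, warped_b, t):
            # Blend the colors of the two warped images with morph weight t in [0, 1].
            return (1.0 - t) * warped_a + t * warped_b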
  5. Project 2 : Fun with Filters and Frequencies
    By applying filters and analyzing frequencies, images can be processed and combined in interesting ways. In the first part of this project, edge detection is conducted by applying the Finite Difference Filter, with a Gaussian Filter applied to get rid of unwanted wrinkles. Then, images are sharpened by stacking their edges back onto themselves. The second part of this project consists of two image blending tasks. The first task generates a Hybrid Image by adding the high frequencies of one image to the low frequencies of another; both successful and failed attempts are shown. The second task blends images by applying Gaussian Stacks and Laplacian Stacks.
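    A minimal sketch of the stack-based blending on grayscale images is shown below; the number of levels and the sigma schedule are assumptions.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def gaussian_stack(img, levels=5, sigma=2.0):
            # Unlike a pyramid, a stack keeps every level at full resolution;
            # level 0 is the image itself and each level doubles the blur.
            return [img] + [gaussian_filter(img, sigma * (2 ** k)) for k in range(levels - 1)]

        def blend(img_a, img_b, mask, levels=5):
            ga, gb = gaussian_stack(img_a, levels), gaussian_stack(img_b, levels)
            gm = gaussian_stack(mask.astype(float), levels)
            # Laplacian level = detail band between consecutive Gaussian levels.
            la = [ga[k] - ga[k + 1] for k in range(levels - 1)] + [ga[-1]]
            lb = [gb[k] - gb[k + 1] for k in range(levels - 1)] + [gb[-1]]
            # Combine each detail band under the correspondingly blurred mask.
            return sum(gm[k] * la[k] + (1.0 - gm[k]) * lb[k] for k in range(levels))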
  6. Project 1 : Colorizing the Prokudin-Gorskii Photo Collection
    Prokudin-Gorskii photographed the Russian Empire on black-and-white negatives behind red, green, and blue filters, hoping future technologies could stack the three layers to produce color images. I took the digitized negatives of Prokudin-Gorskii's work and produced the RGB color images. The three color layers were not accurately aligned in the originals, so I designed an alignment algorithm using an Image Pyramid and Edge Detection to preprocess the layers before stacking them together.
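    The coarse-to-fine search at the heart of such an alignment can be sketched as below; running it on edge maps rather than raw intensities, the +/-2 pixel window per level, and the recursion cutoff are assumptions, and a real implementation would also crop borders before scoring.

        import numpy as np
        from scipy.ndimage import shift, zoom

        def align(layer, ref):
            """Return the (dy, dx) translation that best aligns layer to ref."""
            if min(layer.shape) > 64:
                # Estimate on half-resolution images, then refine that guess.
                dy, dx = align(zoom(layer, 0.5), zoom(ref, 0.5))
                dy, dx = 2 * dy, 2 * dx
            else:
                dy, dx = 0, 0
            best, best_err = (dy, dx), np.inf
            for ddy in range(-2, 3):
                for ddx in range(-2, 3):
                    moved = shift(layer, (dy + ddy, dx + ddx))
                    err = np.sum((moved - ref) ** 2)  # sum of squared differences
                    if err < best_err:
                        best, best_err = (dy + ddy, dx + ddx), err
            return best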




* Last updated on Dec 9, 2024.