Project Work: Week 8

Variational Autoencoder

This week I took big steps toward modifying the application so it can actually sample novel meshes. This chiefly involved converting the AtlasNet autoencoder architecture into a variational autoencoder (VAE).

Architecture description

At the start of this week, my only goal was to find some method for synthesizing novel models, but I wasn’t yet certain how to do this. I knew that it had to involve “sampling the learned manifold” by generating a random feature vector and running it through the network’s decoder, so that is where I started. I performed one initial test where I fed a randomized feature vector into the decoder of the network trained on the 1000 planes, but the result was not great.

Random feature vector recon

Clearly this naive approach was not the way to go. Doing a bit of research online, I realized that in practice many generative models of this type are actually variational autoencoders, rather than plain autoencoders like AtlasNet. I decided to learn more about the architecture and how I could convert AtlasNet to this new scheme.

VAE Architecture description

A vanilla autoencoder learns one network that encodes the input data into a compact embedding, along with a second learned network that decodes that embedding back into the original input. The key here is that between the encoder and decoder sits an ordinary 1xN feature vector, output by a fully-connected layer at the end of the encoder.
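
To make the bottleneck concrete, here is a toy PyTorch sketch (not AtlasNet’s actual code, and with placeholder dimensions) of an encoder that ends in a fully-connected layer producing a single feature vector, and a decoder that maps it back:

```python
import torch
import torch.nn as nn

# Toy illustration of the vanilla autoencoder bottleneck described above.
# All dimensions are placeholders, not AtlasNet's actual sizes.
encoder = nn.Sequential(nn.Linear(3000, 512), nn.ReLU(), nn.Linear(512, 1024))
decoder = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 3000))

x = torch.randn(1, 3000)   # stand-in for a flattened input
z = encoder(x)             # deterministic 1xN feature vector
x_hat = decoder(z)         # reconstruction of the input
```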

In contrast, rather than encoding into a single fixed vector, a variational autoencoder (VAE) learns to encode in such a way that the embedding is a sample from a probability distribution. In practice, this means that instead of outputting a feature vector directly, the encoder outputs a mean and standard deviation, which are used to sample the feature vector fed to the decoder. This is what makes synthesis possible: one can then feed a random sample from a normal distribution to the decoder to get a novel result.
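
A minimal sketch of the “reparameterization trick” that keeps this sampling step differentiable, assuming the encoder outputs a mean and a log-variance (the variable names here are illustrative, not AtlasNet’s):

```python
import torch

# Sample a latent vector as mu + sigma * eps, with eps drawn from a standard
# normal, so gradients can flow back through mu and log_var.
def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    std = torch.exp(0.5 * log_var)   # standard deviation from the log-variance
    eps = torch.randn_like(std)      # noise sampled from N(0, I)
    return mu + eps * std            # differentiable sample passed to the decoder
```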

This change also necessitates an extra term in the training loss in the form of a Kullback-Leibler divergence (KLD) loss, which quantifies the difference between two probability distributions; here, it pushes the encoder’s output distribution toward a standard normal prior.
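
For reference, a sketch of the closed-form KLD term between the encoder’s Gaussian and a standard normal prior, in the form used by common PyTorch VAE examples (names mirror the sketch above):

```python
import torch

# KL divergence between N(mu, sigma^2) and the standard normal prior N(0, I).
def kld_loss(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
```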

VAE Results

Once I understood the basic architecture, I read online that I could make the change to a VAE by adjusting the end of the encoder to output a mean and standard deviation. I also had to add in the KLD loss, which I was able to do by following some PyTorch examples. Once I was finished, I tried training first on the 100 swords and then again on the 1000 planes to get the following results:
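
For illustration, this is roughly the kind of change I mean at the end of the encoder, sketched with placeholder dimensions rather than AtlasNet’s actual sizes:

```python
import torch
import torch.nn as nn

# Two parallel fully-connected layers replace the single feature-vector output:
# one predicts the mean, the other the log-variance.
class VAEHead(nn.Module):
    def __init__(self, in_dim: int = 1024, latent_dim: int = 1024):
        super().__init__()
        self.fc_mu = nn.Linear(in_dim, latent_dim)       # predicts the mean
        self.fc_logvar = nn.Linear(in_dim, latent_dim)   # predicts the log-variance

    def forward(self, features: torch.Tensor):
        return self.fc_mu(features), self.fc_logvar(features)
```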

VAE 1st sword, VAE 1st sword recon, VAE 1st plane, VAE 1st plane recon

These reconstruction results were not very satisfactory, but I also wasn’t certain that I had completed the implementation correctly. :) First I noticed that I had accidentally removed a ReLU nonlinearity from the end of the encoder, but adding it back in did not change the results appreciably. Next I tried removing the fully-connected layer at the end of the encoder, just before the mean and standard deviation layers, so that those two could be learned directly from the main encoder section. However, this did not seem to work well either.

Finally, given that the plane reconstructions had been so good before but no longer were in the VAE architecture, I thought that if I decreased the weight of the new KLD term in the loss function I might be able to force the network to focus more on standard reconstruction while still maintaining the sampling ability. I found a discussion online and a paper that both suggested 0.1 as an appropriate weight for the KLD loss, which I tried. However, this again did not give me better reconstructions.
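
As a sketch, the weighted objective looks something like the following, where recon_loss stands in for AtlasNet’s Chamfer reconstruction loss and mu/log_var are assumed to come from the encoder:

```python
import torch

# Reconstruction loss plus the KLD term scaled by the 0.1 weight mentioned above.
def weighted_vae_loss(recon_loss: torch.Tensor, mu: torch.Tensor,
                      log_var: torch.Tensor, kld_weight: float = 0.1) -> torch.Tensor:
    kld = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon_loss + kld_weight * kld
```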

VAE sampling

Given that I was able to create a VAE architecture, I did want to try the sampling mechanism, which gave the results below when sampling a feature vector from a standard normal distribution (mean 0, standard deviation 1).
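
The sampling step itself is small; here is a sketch of it, with the decoder and latent size passed in as placeholders rather than AtlasNet’s actual module and bottleneck size:

```python
import torch

# Draw a latent vector from a standard normal and decode it into a novel output.
def sample_novel(decoder: torch.nn.Module, latent_dim: int = 1024) -> torch.Tensor:
    z = torch.randn(1, latent_dim)   # latent vector sampled from N(0, 1)
    with torch.no_grad():
        return decoder(z)            # decode into a novel point set / mesh
```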

VAE Planes

The results were unique, but given that the reconstructions were not great to begin with, the generated results were not very realistic-looking either.

Transfer learning

On a separate note, over the past week I had been mulling over ideas from my deep learning class about the prevalence of transfer learning in the field. This is when a network is first trained on a large, standard dataset such as ImageNet (or ShapeNet, as AtlasNet is!) before being trained further on the dataset of interest. With this method, some of the existing weights may be kept fixed, or the entire network’s weights may be allowed to change.
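
In generic PyTorch terms (not AtlasNet’s exact option), the idea looks something like the following sketch, where the checkpoint path and the encoder attribute are placeholders:

```python
import torch
import torch.nn as nn

# Load pre-trained weights, optionally freeze the encoder, then keep training
# the model on the smaller dataset of interest.
def load_pretrained(model: nn.Module, checkpoint_path: str,
                    freeze_encoder: bool = False) -> nn.Module:
    state = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state)                 # start from pre-trained weights
    if freeze_encoder:
        for p in model.encoder.parameters():     # fix encoder weights, train the rest
            p.requires_grad = False
    return model
```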

To try this method, I used the input argument AtlasNet provides for loading an existing model, pointing it at the pre-trained ShapeNet model that comes with the GitHub repo. I then conducted training on the 100-sword dataset for 120 epochs as normal. This gave surprisingly good results!

Sword pretrain training, Sword pretrain reconstruction

The output was not quite as good when I tried other examples, but this approach performs much better than any other method I have tried in the past few weeks, which feels like an accomplishment.

Next steps

This week I made modifications to change the AtlasNet autoencoder into a VAE network, with mixed results. My work at the end on adjusting the KLD loss weight seems promising; I think I will start next week by reading more into those papers and figuring out whether I missed a step such as normalization. Otherwise I will have to continue experimenting to figure out how the new additions interact with the standard AtlasNet parts to make it no longer function as well.

Meanwhile, my work on transfer learning was promising, but there is the issue that the pre-trained ShapeNet network was trained with the vanilla AE architecture, not the VAE one. I will need to surmount this problem next week; otherwise I will have to train my own network to use for pre-training.

Nonetheless, I took some big steps this week and got a few promising results here and there, so I am confident for next week.