Project Work: Week 10

Final Testing: Transfer Learning + VAE

As the project’s timeline wraps up, I ran some final experiments and then finished by running the VAE model I have been working on, in conjunction with the transfer learning method I demonstrated previously.

Diversion: Transfer learning on depth map network

In a slight diversion at the start of the week, knowing that transfer learning worked very well on the autoencoder network for the sword dataset when the network was pre-trained on ShapeNet, I tried a similar tactic on the Soltani et al. depth map network I worked with in early October. The pre-trained version of that network similarly uses ShapeNet, and I was able to load the pre-trained weights without much trouble. Unfortunately, even with the pre-trained weights the network gave the same results I was getting on the 100-sword dataset before, namely degeneration into small diamond-like shapes.

Small experiment: Disabling random “patch sampling” before decoder

After that didn’t work, I returned to my main work on the AtlasNet architecture. My problem before was that all outputs seemed too similar to one another (and the quality wasn’t great). I tried to look for more reasons why this might be and noticed several interesting lines in the model architecture code before the latent vector is fed to the decoder.

Here a comment in one version of the documentation talked about random “patch sampling”, i.e. the 25 different patches that the network uses to make up the final reconstructed point cloud (and thus mesh). I theorized that this may have been affecting the performance in some way, e.g. randomizing which weights should go to each patch every epoch instead of keeping them the same each time. Thus I attempted to remove this functionality by commenting out the randomization line.

No random patch sampling

Unfortunately, this seemed to break something: the network was only generating one useful patch out of the 25, and the rest degenerated into single points in space. Given that this code was in the main autoencoder anyway, it seems to perform some needed function, even though I don’t fully understand it, and removing it does not solve my problem.

VAE with 25 different distributions

Next I modified the VAE to test something I had been meaning to try since the end of last week, namely having the network learn a different probability distribution for each patch instead of one shared by all of them. I edited the network architecture code to use one nn.ModuleList each for the mu and logvar layers instead of just single layers.


Training the new network for a few epochs, I was surprised to see results very similar to the image above, from when I took out random “patch sampling”. However, I realized this was because I had forgotten to return mu and logvar (and thus calculate the Kullback-Leibler divergence, or KLD, for the loss function) for each of the new layers, so the system was only returning the last pair. This meant that the network was only backpropagating into that one layer’s weights and was not learning anything for any of the other patches, which explains the results in the image.

I fixed the code to get the following results:


In the end I calculate the KLD for each patch and then sum them all together. The total is added to the Chamfer distance to get the final loss. Here I cut training short because the training and validation losses fluctuated without seeming to decrease at all over nearly 20 epochs. Thinking that my change might have affected the “ideal” learning rate, I tried decreasing it to 0.0001 to get rid of these fluctuations.
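The summed loss described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the actual network code: it assumes the usual closed-form KLD between a diagonal Gaussian and a standard normal, and the names kld_diag_gaussian, total_loss, and kld_weight are hypothetical.

```python
import math

def kld_diag_gaussian(mu, logvar):
    """KLD between N(mu, exp(logvar)) and N(0, 1), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def total_loss(chamfer, mus, logvars, kld_weight=1.0):
    """Chamfer distance plus the sum of the per-patch KLD terms.

    mus and logvars hold one vector per patch; kld_weight is a hypothetical scalar.
    """
    kld = sum(kld_diag_gaussian(mu, lv) for mu, lv in zip(mus, logvars))
    return chamfer + kld_weight * kld

# Toy values: 3 patches with 2-dimensional latents (the real network has 25 patches).
mus = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
logvars = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

per_patch = [kld_diag_gaussian(m, lv) for m, lv in zip(mus, logvars)]
loss = total_loss(0.25, mus, logvars)   # 0.25 is a made-up Chamfer distance

# The earlier bug corresponds to using only the last patch's term.
last_only = kld_diag_gaussian(mus[-1], logvars[-1])
```

With these toy numbers the per-patch terms are 0, 0.5, and 2.0, so the summed KLD is 2.5, while the last-only version would see just 2.0 and ignore the other patches entirely.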


The training plots are above. Since I added so many extra layers, training this network for 120 epochs took a long time: over 30 minutes.


And here is an example of the results. Here the output mesh seems to match the general wingless shape of the input point cloud (kind of), but overall I was disappointed to see that this whole endeavor didn’t seem to let the network broaden the range of meshes it would output besides variations on a “default plane”, which is still what I see here.

Varying the influence of epsilon

After that experiment, I went back to the version where only one distribution is learned, and tried to play with the distribution coefficients again. Last week I tried varying a scalar on the KLD term in the loss function, but this time I tried scaling the randomly generated epsilon vector, which is multiplied by the logvar term when calculating the VAE latent vector z.
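To make the role of this scalar concrete, here is a minimal sketch of the reparameterization step with an extra factor on epsilon. It assumes the common formulation z = mu + eps * exp(0.5 * logvar), which may differ in detail from the actual network code; the function name reparameterize and the parameter eps_scale are hypothetical.

```python
import math
import random

def reparameterize(mu, logvar, eps_scale=1.0, rng=random):
    """Sample z = mu + eps_scale * eps * exp(0.5 * logvar), with eps ~ N(0, 1)."""
    z = []
    for m, lv in zip(mu, logvar):
        eps = rng.gauss(0.0, 1.0)            # fresh noise for each latent dimension
        z.append(m + eps_scale * eps * math.exp(0.5 * lv))
    return z

mu = [0.5, -1.0, 2.0]
logvar = [0.0, -2.0, 1.0]

# Re-seeding gives every call the same epsilon, so only the scale differs.
z_full = reparameterize(mu, logvar, eps_scale=1.0, rng=random.Random(0))
z_small = reparameterize(mu, logvar, eps_scale=0.1, rng=random.Random(0))
z_zero = reparameterize(mu, logvar, eps_scale=0.0, rng=random.Random(0))
```

A scale of 0.1 shrinks the random offset from mu by a factor of ten, and a scale of 0 removes the noise entirely, so z is just mu and the sampling step becomes deterministic.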

First I tried scaling the vector by a factor of 0.1 to get the following training results:


These results were not very different from what I had been getting without the 0.1 scalar, which I confirmed by performing reconstructions using the learned weights of the network. Thinking I might not have gone far enough, I tried scaling this term to 0 entirely:


This gave me very odd results. The reconstruction in the visdom tool looked okay (but not really better), but the loss graph showed a flat, strongly negative loss (-410). I didn’t really understand what this meant, and when I tried to perform reconstructions using the learned weights, I got weird degenerate results with the different patches all in random places. Clearly this approach wasn’t working either.

Transfer learning with the VAE

At this point, with none of these experiments leading anywhere, I realized that I didn’t have much time left on the project, so I needed to focus my efforts on at least getting results from the network I have on my original sword dataset, so that I have something to present. This required figuring out how to combine the transfer learning method I tried a few weeks ago with the VAE modifications I have made since then.

Ultimately, I figured out a way to modify the weight-loading code in the training script to only try to load weights into layers with names that match ones in the file I am trying to load from. In the default version all layer names must match, which was an issue since the mu and logvar layers unique to the VAE model don’t exist in the pretrained network weights.
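The name-matching idea can be sketched as follows. This is a minimal illustration and not the actual training script code: plain dicts and lists stand in for the checkpoint and the model's weights, and the layer names and the helper filter_matching are hypothetical.

```python
def filter_matching(pretrained, model_state):
    """Keep only pretrained entries whose name and size match the model's."""
    return {
        name: values
        for name, values in pretrained.items()
        if name in model_state and len(values) == len(model_state[name])
    }

# Hypothetical layer names; plain lists stand in for weight tensors.
pretrained = {
    "encoder.fc.weight": [0.1, 0.2],
    "decoder.0.weight": [0.3, 0.4, 0.5],
}
model_state = {
    "encoder.fc.weight": [0.0, 0.0],
    "decoder.0.weight": [0.0, 0.0, 0.0],
    "mu.weight": [0.0, 0.0],       # VAE-only layer, absent from the checkpoint
    "logvar.weight": [0.0, 0.0],   # VAE-only layer, absent from the checkpoint
}

matched = filter_matching(pretrained, model_state)
model_state.update(matched)                       # overwrite only the matched layers
skipped = sorted(set(model_state) - set(matched)) # layers left at their initial values
```

In PyTorch the same pattern applies to the dicts returned by state_dict() and passed to load_state_dict(). Calling load_state_dict with strict=False is a built-in alternative that tolerates missing and unexpected keys, although it does not skip entries whose shapes differ.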

Once I did this, I loaded the pre-trained ShapeNet weights and trained on the 1000-plane ShapeNet dataset for 120 epochs as usual:


This showed pretty good results, as seen in the reconstruction on the left. When I performed actual reconstruction on my own, though, the resulting meshes were somewhat less accurate (but still more accurate than before). I also noted how the training and validation losses suddenly diverged at 100 epochs. This is probably due to some existing hyperparameter schedule in the code that I didn’t get time to investigate this week.


Next I tried the exact same experiment on my own dataset of 100 swords taken from 3D video games. The results initially seemed okay around 40 epochs but then seemed to get worse as the reconstructions became more blob-like.


Indeed, when I performed reconstruction on a random point cloud afterwards the results were not very pretty. My next goal is to figure out how to fix this part.

Sampling with the transfer learned VAE

Finally, since the crux of my project is about synthesizing new models for video games, I also needed to test sampling with this pre-trained network. I didn’t try it on the version trained on swords because the reconstruction results were still not good, but I did try several samples on the version trained on planes.


The results are definitely plane-like, but admittedly they are not as realistic as I would like. Unfortunately this is where I am at this rather late stage in the project. I think it is quite possible that the VAE architecture I chose is not going to generate results as nice-looking as the autoencoder reconstructions anyway, so this may just be the best I can get. I recorded these for now, and in my remaining time will work on trying to get similar or better results for the sword models.

Next steps

Here at the end of the project I have not reached a point where I have a model that is synthesizing beautiful and realistic new 3D models, but that may have been a little unrealistic to begin with. Currently I at least have a working VAE model that I can synthesize some new models from, and this is what I plan to present. As mentioned, in the remaining time I have I plan to work hard to see if I can get the network to synthesize at least decent-looking results for the sword dataset like it currently can for the planes. This would at least give reasonable validation of the ability of this technology to do what I set out to do, even if it is not quite powerful enough yet or I could not find the right architecture in the time I had.