I continued various experiments to try and determine why my current VAE implementation has not been giving good reconstructions yet.
At the start of the week, I had implemented a version of a variatonal autoencoder network using the AtlasNet model by Groueix et al., and was able to get reconstructions on 1000 ShapeNet planes. However these plane reconstructions were of much worse quality than those from the vanilla autoencoder architecture, which were quite true to the original point clouds. Better reconstructions are needed so that the models I generate using the end product are realistic enough; if they come out as amorphous blobs then the results won’t be quite as useful. Thus this week I set out to tinker with the VAE architecture in an attempt to determine how I could make better reconstructions.
One of the last things I tried last week was to weight the Kullback-Leibler divergence (KLD) loss term that measures loss on the learned distributions (the VAE part) less at 0.1 in hopes of getting better reconstructions. Since that wasn’t enough before, this week I began just by testing an even smaller weight, choosing a lambda weight of 0.001 instead.
Early on in training there seemed to be pretty good results and I was impressed with the quality of the reconstructions given.
However, by the end of training the reconstructions seemed to spread out again and become a little less tight. Another key fact I noticed was how the reconstructions all seemed to follow a basic passenger jet sort of shape, so that even when the input point clouds are different, as with the fighter jet in the image above, the outputs are still generally the same.
I confirmed this trend by running several actual reconstructions and getting similar results (and similar to the results I got when lambda = 0.1 before). Even when I tried giving an input where all values on the y and z axes were set to 0, I still received a similar output when reconstructed through the VAE.
Since the reconstructions seemed better earlier on in training, I tried to create a sort of learning schedule by decaying the value of lambda from 0.001 by a factor of 10 at 80 and 100 epochs respectively (out of 120 epochs). This did not give appreciably good results. In fact, when I tried to cut off training at 40 epochs when I thought there were good reconstructions that did not seem to be the case after all. At this point I realized that the main issue was that of all the reconstructions turning out the same, which was apparently starting already by this 40 epoch mark.
Comparison: Ignoring the KLD Loss term
I theorized that training with the KLD loss term was causing these reconstructions to all turn out the same, unlike the unique, realistic reconstructions from the original AE without the KLD term). Here is an example finished training visualization from the standard autoencoder:
The KLD term should be expected to give more organic, random results thanks to its randomized epsilon term used in the reparametrization trick between the encoder and decoder, but I found sources online that seemed to indicate still (as I was trying all along) that if the weight KLD term is decreased (or at the extreme, set to 0) then the model should naturally prioritize reconstruction. When I set the KLD term to 0 though, I did not find this to be the case.
In fact, the end results were just a bunch of useless dots that didn’t give any output mesh. Furthermore as I investigated I figured out that the weights on layer for the sigma term (which is multiplied by epsilon) were not being updated even though I thought they should have been. I am still currently stuck on this result, and finished the week by posting some of my findings on the AtlasNet GitHub page asking for support from the author.
This week unfortunately was somewhat less productive than the last in that I was unable to solve my problem of getting better reconstructions from the VAE network. I still have things to test for the next week, such as continuing to tinker with the KLD term weights, but there is not much time left on the project. Ultimately, in the next I need to worker harder to get to the point where I have a working reconstruction model that I can randomly sample from, as this will achieve my main project goals and give me something to present at the research expo in two (!) weeks.
As mentioned I posted on GitHub asking for support, but I also plan to reach out to one of my professors in the next few days who may be able to help me get around this problem so that I can complete my race to the finish and get my project done.