Project Work: Week 5

One Epoch at a Time

This week marked a more complete shift in my efforts as I fully transitioned from working with the Soltani et al. depth-map network to AtlasNet. Fortunately, my initial work with AtlasNet shows more potential than I encountered with the previous model.

Final work with depth maps

In working with the depth-map network by Soltani et al., I was still suspicious after last week about whether I was correctly activating the “silhouette” option in network training. This option is supposed to treat input depth maps as binary black/white images, which I thought would eliminate the issue where the network activations were concentrated only in areas near the camera. However, I was still getting the same results as before, and after confirming the setting I realized that silhouette mode was working correctly but would not fix my issue.
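For reference, my understanding of what the silhouette option does is simply to binarize each depth map before it is fed to the network, discarding the actual distance values. A minimal sketch of that idea (assuming depth maps stored as NumPy arrays with 0 marking background pixels) would be:

```python
import numpy as np

def depth_to_silhouette(depth_map):
    """Collapse a depth map into a binary silhouette.

    Any pixel with a valid (non-background) depth becomes 1 and
    everything else stays 0, so all distance information is discarded.
    """
    return (depth_map > 0).astype(np.float32)

# Toy 4x4 depth map where larger values are farther from the camera
depth = np.array([[0.0, 0.0, 0.0, 0.0],
                  [0.0, 0.8, 1.5, 0.0],
                  [0.0, 0.9, 1.6, 0.0],
                  [0.0, 0.0, 0.0, 0.0]])
print(depth_to_silhouette(depth))
```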

One VP sword

Next, I wondered if the number of different viewpoints was affecting things, so I tried training on just one viewpoint. This produced the much more sword-like image visible above, which also matches the training orientation. Encouraged by this result, I tried training on two viewpoints and got similar results for each of the two viewpoints. However, I started to notice a “ghosting” effect where a faint shape of one viewpoint’s image would be present in another. This phenomenon also existed in the ShapeNet headphone results.

Training on 6 viewpoints

Finally, I tried training on 6 viewpoints to create the image above. This result is visibly different from the horizontal diamond shape I was getting on the sword dataset with 20 viewpoints, but it clearly shows degeneration towards that shape. In addition, all six viewpoints displayed this general shape, which led me to believe that with my data, beyond a small number of viewpoints the network outputs very quickly homogenize into this one flat diamond shape for every viewpoint.

After these results, I began to realize that unless my input data are extremely similar, there may not be much I can do to make the Soltani et al. network give better synthesis results. It may simply be that this model is not good enough and will not do better no matter what I try. Given that I was already looking into alternatives, this was the moment I decided to move away from this model once and for all.

Real work with AtlasNet

Energized by my determination to try another network, I turned to the AtlasNet model. I had trouble with it last week because I was unable to get my data to work with the custom dataloader, but this week I was able to work around this issue and eventually get results.

The solution to the dataloader problem ended up being to simply spoof the rendered view files that the dataloader was expecting with ones copy-pasted from the ShapeNet dataset. Since, according to the advice I received from the author, these views are not even used by the autoencoder I wished to train, spoofing the data like this should not affect the results in theory, and did not appear to in practice.
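Concretely, the spoofing was just a matter of copying one rendering folder from ShapeNet into each of my sword model directories so the dataloader finds the files it expects. A rough sketch (the paths below are placeholders, not my actual directory names):

```python
import shutil
from pathlib import Path

# Placeholder paths standing in for my real directory layout
SHAPENET_RENDERING = Path("ShapeNet/02691156/some_model/rendering")
SWORD_DATASET = Path("data/swords")

# Copy the same dummy rendering folder into every sword model directory.
# The autoencoder training path never actually reads these images.
for model_dir in SWORD_DATASET.iterdir():
    if model_dir.is_dir() and not (model_dir / "rendering").exists():
        shutil.copytree(SHAPENET_RENDERING, model_dir / "rendering")
```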

Binary PLY mess

The next error I encountered was an arcane PyTorch error, but after some sleuthing online I found a question about non-ASCII characters. On closer inspection, I realized that although my sword data were in PLY files as required, PLY files apparently come in both binary and ASCII flavors, and mine were mistakenly binary, making them unreadable. Using Meshlab I was able to go back and re-export the files as ASCII PLY with the correct settings.
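Meshlab worked fine for me, but with a larger number of files the same conversion could presumably be scripted; a quick sketch using the plyfile package (an assumption on my part, since I did not actually go this route) might look like:

```python
from pathlib import Path
from plyfile import PlyData

def convert_ply_to_ascii(in_path, out_path):
    """Re-write a (possibly binary) PLY file in ASCII format."""
    data = PlyData.read(str(in_path))
    # Rebuild the PlyData with text=True so it is written as ASCII
    PlyData(data.elements, text=True).write(str(out_path))

for ply in Path("data/swords").rglob("*.ply"):
    convert_ply_to_ascii(ply, ply.with_name(ply.stem + "_ascii.ply"))
```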

The final error I ran into was a column mismatch. After some print-debugging I found that portions of each line of the PLY files were being cut off in some places. It turned out that the expected PLY line length was hard-coded at 60 characters, while my lines were longer; changing this value to 80 was enough to make things work.
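For illustration only (this is a hypothetical reconstruction, not the actual AtlasNet loader code), the pattern behind the error was something like truncating every vertex line at a fixed width, which leaves rows with different numbers of fields once a line runs past the limit:

```python
import numpy as np

# Hypothetical illustration of the bug, not the real dataloader code
LINE_LENGTH = 60  # raising this to 80 was enough to stop the truncation

def load_ascii_ply_vertices(lines):
    # Cutting each line at a fixed width chops off the trailing columns of
    # any vertex line longer than LINE_LENGTH characters, so rows end up
    # with different field counts -- the column mismatch I was seeing.
    rows = [line[:LINE_LENGTH].split() for line in lines]
    return np.array(rows, dtype=np.float32)
```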

After all this, at last the AtlasNet model was able to start training on my data!

Initial AtlasNet training

The model was set to run for 120 epochs and completed training in about 8-10 minutes on my computer. The authors provide functionality to connect to an online Visdom visualization tool, where I was able to view the inputs and the training/validation loss. Here I quickly realized that the training loss was not decreasing at all, and was instead wildly fluctuating around a very high value of about 14,000 (!).
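For anyone unfamiliar with Visdom, the loss curves come from a pattern roughly like the one below (a generic sketch with a dummy training step, not the authors’ actual logging code), where each epoch’s loss is appended to a line plot served by a locally running Visdom server:

```python
import numpy as np
import visdom

def train_one_epoch(epoch):
    # Dummy stand-in for the real AtlasNet training step
    return 14000.0 * np.exp(-0.05 * epoch) + np.random.rand()

# Assumes a Visdom server is already running (python -m visdom.server)
vis = visdom.Visdom()
loss_win = None

for epoch in range(120):
    loss = train_one_epoch(epoch)
    loss_win = vis.line(
        X=np.array([epoch]),
        Y=np.array([loss]),
        win=loss_win,
        update="append" if loss_win else None,
        opts={"title": "training loss"},
    )
```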

Drawing on knowledge from my deep learning class 😎 I remembered that I had earlier been concerned that my PLY files were not normalized to fit within a unit cube, but had not thought much of it at the time. Now I suspected that these values greater than 1 were producing activations greater than 1, which would lead to a very high loss and steep changes in the gradients during backpropagation (hence the oscillation). I took a small sample of data I had normalized earlier and tried running AtlasNet again. This time the training loss decreased along the expected curve, though the validation loss soon climbed far above the training loss, indicating massive overfitting. For an initial run, though, I was not too concerned about this.
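The normalization step I applied to that sample was just the usual centering and scaling; a minimal sketch of squeezing each point cloud into a unit cube (my own quick version, not AtlasNet’s own preprocessing) is:

```python
import numpy as np

def normalize_to_unit_cube(points):
    """Center a point cloud and scale it so coordinates lie in [-0.5, 0.5]."""
    points = np.asarray(points, dtype=np.float32)
    center = (points.max(axis=0) + points.min(axis=0)) / 2.0
    points = points - center
    extent = (points.max(axis=0) - points.min(axis=0)).max()
    return points / extent  # longest side of the bounding box is now 1

# Example: raw sword vertices with coordinates far outside [-1, 1]
raw = np.array([[0.0, 0.0, 0.0],
                [40.0, 5.0, 2.0],
                [20.0, 10.0, 1.0]])
print(normalize_to_unit_cube(raw))
```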

However, I did realize that I had been training on the AtlasNet v2 network, which, though newer, does not provide any visualization tools after training is complete. Without any way to check my results, I ended up porting the data to train again on the AtlasNet v1 network, which does include visualization scripts. Here I was able to get the following results:

Initial AtlasNet training

The results are not great, but for a first try on this network I was impressed. The general sword shape is captured, as well as some other features such as increased detail around the sword guard compared to the blade. With more experimentation I hope I can tease better results out of AtlasNet on my data.

Next steps

I am very encouraged by the AtlasNet results, and my main plan for next week is to continue experimenting with this network to see if I can get better results by adjusting model parameters or trying other data, such as my Pokémon dataset. If the model still does not perform well enough, I may consider some other papers I have recently found that could work as alternatives (they demonstrate better results than AtlasNet on ShapeNet data). I also wish to reach out again to industry members to see if I can conduct a second interview and get advice on the impact of my project. Overall, though, I have renewed optimism for the future of Project Rodin and am looking forward to this week’s tasks.