This week I continued on the work I completed last week with generating shapes from a Pokémon dataset using Soltani et al.'s multi-view encoder-decoder model. One of the major findings from last week was that my dataset was too diverse in its shapes, which resulted in the generation of many uninteresting blob-like shapes that did not express any distinctive features. Thus this week I created a new dataset that was more targeted to attempt to solve the problem.
This week I created a dataset of different kinds of swords, noting that this class of objects would still be useful for a game but that its shapes are more limited than that of the Pokémon I tested last week. Still, there was some intra-class variation such as curved blades and the length of each sword. Unlike last week I also sourced the models from a variety of different games instead of just one, which required that I resize each one to be proportional to the others before exporting.
Once all of the models were ready, as before I ran the program included with Soltani et al.'s code to generate depth maps from each of the STL files. Then I proceeded to train the network, where I ran into a good deal of trouble. Since my dataset was smaller this time than last, I ran into several unforeseen issues with the model code where the number of batches ran at a given stage would be larger due to division along the lines of (#examples/batchSize = #batches). A larger number of batches means that more training is running in parallel on the GPU. Thus I encountered a series of mysterious errors from deep within the Torch training libraries that I eventually realized were all coming from running out of VRAM on the GPU. I was able to solve these problems by manually restricting the number of batches in several places, though unfortunately this whole episode sapped a good deal of my time.
Eventually, with the model working I was able to retrieve samples from the learned manifold of sword shapes. Unfortunately, the results were a series of vague diamonds for all viewpoints, in a result reminiscent of last week. However this result troubled me more than last week, since it did not seem like having that same 2D shape for all viewpoints would logically lead to a 3D shape as some would have to look different. This led me to try various tactics to investigate if not all viewpoints were being used, but I was unable to find anything to solve the problem.
I wondered if the dataset size made any difference, and tried limiting the headphone dataset (which gives acceptable results) down to <15 examples. Curiously though, I was still able to get decent results while I would receive the diamond shapes for the swords on the exact same settings (i.e. only changing the data). Next steps for me are trying to figure out what intrinsic differences in the data might exist that are causing this problem, though I think I need to limit my time on this. As I have already sunk some time trying to fix this, if I do not get any farther after a few more attempts I think I will move on to trying a different model instead.