I’ve recently completed another large update to the MindsEye code, implementing a reference-counting base for many of the core classes. This memory management pattern gives us much tighter control over system resources and dramatically reduces load on the JVM’s garbage collector. Memory contention has proven to be a main limiting factor in supporting modern large-scale deep-learning models, so these changes were quite beneficial, and I think they suggest why Java has often been less popular in this field: the reliance on mark-sweep memory management in Java is often quite inefficient compared to other models when applied to this problem.
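To make the reference-counting idea concrete, here is a minimal sketch of what such a base class might look like. The names `RefCounted`, `addRef`, `freeRef`, and `Buffer` are hypothetical and chosen for illustration; they are not taken from the MindsEye source, and the real implementation may differ substantially.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Small demo: two owners share one buffer; it is released when the last one lets go.
final class RefCountDemo {
    public static void main(String[] args) {
        Buffer buffer = new Buffer(4);
        buffer.addRef();                        // a second owner takes a reference
        buffer.freeRef();                       // first owner is done; one reference remains
        System.out.println(buffer.isFreed());
        buffer.freeRef();                       // last owner is done; free() runs now
        System.out.println(buffer.isFreed());
    }
}

// Hypothetical base class (illustrative names, not MindsEye's actual API).
abstract class RefCounted {
    // The creator implicitly holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);

    // Take an additional reference; fails if the object was already released.
    public RefCounted addRef() {
        int prev;
        do {
            prev = refs.get();
            if (prev <= 0) throw new IllegalStateException("already freed");
        } while (!refs.compareAndSet(prev, prev + 1));
        return this;
    }

    // Drop one reference; the resource is released exactly when the count reaches zero.
    public void freeRef() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) throw new IllegalStateException("freed too many times");
        if (remaining == 0) free();
    }

    public boolean isFreed() {
        return refs.get() <= 0;
    }

    // Subclasses release their resources here (arrays, native or GPU memory, ...).
    protected abstract void free();
}

// Example resource: a numeric buffer that drops its array as soon as it is unreferenced.
final class Buffer extends RefCounted {
    private double[] data;

    Buffer(int length) {
        data = new double[length];
    }

    double[] data() {
        if (isFreed()) throw new IllegalStateException("buffer was freed");
        return data;
    }

    @Override
    protected void free() {
        data = null; // released deterministically instead of waiting for a GC cycle
    }
}
```

The point of the pattern is that large buffers are released at a known moment, when the last owner calls `freeRef()`, rather than whenever the collector next gets around to them.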
Now that I’ve cleaned up the testing and documentation of MindsEye, I have been able to re-focus on why I started writing it: Optimization Algorithm Research. In the course of playing with this code I have tried countless ideas, most of which taught me through failure instead of success… However, I do have two ideas, fully implemented and demonstrated in MindsEye, that I’d like to introduce today: Recursive Subspace Optimization allows deep networks to be trained effectively, and Quadratic Quasi-Newton enhances L-BFGS with a quadratic term on the line-search path.
In the last article, we covered a common testing framework for individual components, but we didn’t cover how these networks are actually trained. More specifically, how should we design a test suite to cover something as broad as optimization? A big problem here is that the components are heavily dependent on each other and also vary greatly in function and contract, so there are few opportunities for generic testing and validation logic.
A critical part of any good software is test code. It is an understatement to say that tests improve quality; they improve the scalability of the entire software development process. Tests let you write more code, faster code, better code. One of the leading testing methodologies is unit testing: the philosophy of breaking down software into individual components and testing each separately. It turns out that a great case study in unit test design also happens to be one of today’s hot tech topics: artificial neural networks.
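As one illustration of why network layers suit unit testing, a layer's analytic gradient can be compared against a finite-difference estimate. The sketch below uses a hypothetical `Layer` interface and a `SigmoidLayer` written for this example; it is a standard gradient check under those assumptions, not the test framework MindsEye actually uses.

```java
// Small demo: run the check on a sigmoid layer and print the worst-case discrepancy.
final class GradientCheckDemo {
    public static void main(String[] args) {
        double error = GradientCheck.maxError(new SigmoidLayer(), new double[]{-1.0, 0.0, 2.0}, 1e-4);
        System.out.println("max gradient error: " + error);
    }
}

// Hypothetical layer contract (illustrative, not MindsEye's actual API).
interface Layer {
    double[] forward(double[] input);

    // Given dLoss/dOutput, return dLoss/dInput evaluated at the given input.
    double[] backward(double[] input, double[] outputGradient);
}

// Element-wise logistic sigmoid.
final class SigmoidLayer implements Layer {
    public double[] forward(double[] input) {
        double[] out = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = 1.0 / (1.0 + Math.exp(-input[i]));
        }
        return out;
    }

    public double[] backward(double[] input, double[] outputGradient) {
        double[] out = forward(input);
        double[] grad = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            grad[i] = outputGradient[i] * out[i] * (1.0 - out[i]); // s'(x) = s(x)(1 - s(x))
        }
        return grad;
    }
}

final class GradientCheck {
    // Largest absolute difference between the analytic and central-difference
    // derivatives, taken over every (output j, input i) pair.
    static double maxError(Layer layer, double[] input, double eps) {
        int outputs = layer.forward(input).length;
        double worst = 0.0;
        for (int j = 0; j < outputs; j++) {
            double[] seed = new double[outputs];
            seed[j] = 1.0; // selects output j, so backward returns d out_j / d in_i
            double[] analytic = layer.backward(input, seed);
            for (int i = 0; i < input.length; i++) {
                double[] plus = input.clone();
                double[] minus = input.clone();
                plus[i] += eps;
                minus[i] -= eps;
                double numeric = (layer.forward(plus)[j] - layer.forward(minus)[j]) / (2 * eps);
                worst = Math.max(worst, Math.abs(numeric - analytic[i]));
            }
        }
        return worst;
    }
}
```

Because the check only depends on the `forward`/`backward` contract, the same test can be reused for every layer type, which is what makes this kind of component a good fit for generic unit tests.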
A recent project that has huge implications for the field of AI is NVIDIA’s cuDNN library and related CUDA-based libraries. Beyond simply being very useful and enabling hardware-accelerated AI with cutting-edge performance, it establishes a common layer of high-performance mathematical primitives that, while using the hardware to its best extent, provides a common API for writing software. With my recent addition of cuDNN-based layers, MindsEye should perform comparably with any other state-of-the-art deep learning library.
Recent developments in MindsEye have yielded greatly increased speed and scalability of network training. Major improvements to the OpenCL kernels have increased speed in some tests by 50x or more, and data-parallel training has been tested with a Spark cluster. This combination of GPU and cluster computing support should bring MindsEye much closer to the performance and scale of other frameworks, if not into the competitive range! The componentization of the optimization code that I wrote about previously has enabled Spark support to be implemented in only about 100 lines in one self-contained class, a nice result of careful design.
Further research and development with MindsEye has produced two new features I would like to discuss today. The first is a working demonstration of a stacked sparse denoising image autoencoder, which is a fundamental tool in any deep learning toolkit. Second, I will introduce a useful tool for producing both static and interactive scientific reports, which I use to produce many of my demonstrations and conduct much of my research.
Almost two years ago I developed a neural network library called MindsEye, which has largely sat idle since the release of TensorFlow. Recently, however, I’ve wanted to follow up on research involving neural networks, and I wanted a “pure” Java option for doing so. And so I decided it was time to revive my old project. In this release, I have reviewed all of the code and made many improvements.
In my last post, I talked about a new project I was working on to explore convolutional neural networks (CNNs). I’ve spent much of the time since playing with and iterating on this library, and I wanted to take a moment to share what has been built so far. I’ve ended up with a library of 30 network layer types which can be wired into an arbitrary directed acyclic (non-recurrent) graph/network and then trained and optimized using gradient descent.
I’ve gotten to an interesting point in my latest project, inspired by Google’s fascinating recent work with convolutional neural networks. The project can now apply inverse convolution operations using multiple fitness functions. I wanted to explore the technology of image processing neural networks from the ground up, so I started by building the fundamentals of a backpropagation neural network library. Building the basic components and solving the initial problems has been interesting, and surprisingly complex.