Java Reference Counting (and side projects)
I’ve recently completed another large update to the MindsEye code, implementing a reference-counting base for many of the core classes. This memory management pattern gives us much tighter control over system resources and dramatically reduces load on the JVM’s garbage collector. Memory contention has proven to be a main limiting factor in supporting modern large-scale deep-learning models, so these changes were quite beneficial, and I think they suggest why Java has often been less popular in this field: Java’s reliance on mark-sweep memory management is often quite inefficient compared to other models when applied to this problem. Conversion to a reference-counting model is possible in Java, but it involved so much testing and iteration that it actually gave me enough time to prototype a couple of interesting new projects.

The Road to Reference Counting

The transition to reference counting came in several phases because of how it overlays the main mark-sweep garbage collector. First, of course, we establish our base classes for the reference-counting logic. The initial design of the base class is critical, not only for performance but also for debugging. The class used by MindsEye is ReferenceCountingBase, and it has a few key characteristics:

- Many operations check and require that the object is “alive”, i.e. has more than zero references.
- It enforces a thread-safe contract: the object is freed exactly once, immediately after the last reference is released.
- If configured in “debug” mode, it tracks the thread stacks of all reference allocations and frees.
- If the object is freed by the garbage collector (i.e. not by a manual reference free), a diagnostic message is logged to debug.
- If an object is double-freed, i.e. a dead object is freed, a fatal error is thrown.

Our next step is to wire any resource-intensive class we want to be able to explicitly free so that it derives from this base type.
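To make the contract above a little more concrete, here is a minimal sketch of what such a base class and one derived class might look like. It is not the actual MindsEye ReferenceCountingBase: only the addRef and freeRef names come from this post, while RefCountedSketch, BufferSketch, isAlive, assertAlive, and release are hypothetical names chosen for illustration. The debug-mode stack tracking and the garbage-collector leak logging are left out to keep it short.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the real MindsEye ReferenceCountingBase.
abstract class RefCountedSketch {
    // The count starts at 1 because the creator holds the first reference.
    private final AtomicInteger references = new AtomicInteger(1);

    // Releases the underlying resources; invoked exactly once,
    // by whichever thread drops the last reference.
    protected abstract void release();

    public boolean isAlive() {
        return references.get() > 0;
    }

    // Guard for operations that require a live object.
    public void assertAlive() {
        if (!isAlive()) {
            throw new IllegalStateException("Object is dead");
        }
    }

    public RefCountedSketch addRef() {
        // Compare-and-set loop so that a dead object can never be revived.
        while (true) {
            int n = references.get();
            if (n <= 0) {
                throw new IllegalStateException("addRef on a dead object");
            }
            if (references.compareAndSet(n, n + 1)) {
                return this;
            }
        }
    }

    public void freeRef() {
        while (true) {
            int n = references.get();
            if (n <= 0) {
                throw new IllegalStateException("Double free of a dead object");
            }
            if (references.compareAndSet(n, n - 1)) {
                // Only the thread that moved the count from 1 to 0 releases.
                if (n == 1) {
                    release();
                }
                return;
            }
        }
    }
}

// Hypothetical resource-heavy class wired to derive from the base type.
final class BufferSketch extends RefCountedSketch {
    private double[] data;

    BufferSketch(int length) {
        data = new double[length];
    }

    double get(int index) {
        assertAlive();
        return data[index];
    }

    @Override
    protected void release() {
        data = null; // drop the large array as soon as the last reference goes
    }
}
```

In this sketch, a caller that hands a BufferSketch to another owner calls addRef first, and each owner calls freeRef when it is done. Using the buffer, adding a reference, or freeing it again after the count has reached zero throws an IllegalStateException rather than failing silently.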
We then recursively search for any types that keep references to reference-counted types; these should also be included in the conversion. Then we face the long task of adding the housekeeping calls to addRef and freeRef. Some platforms are able to automate this and even hide it from the language, but in Java this pattern isn’t nearly as standard. We rely on missing addRef calls being detected at runtime: using or freeing a dead object throws a fatal exception. We also detect missing freeRef calls at runtime as memory leaks, which are logged when objects are freed by Java’s garbage collector. Fortunately, our adoption of reference-counting logic can be somewhat gradual, since we can ignore any missing freeRef calls that do not retain too much memory.

Once we have wired in the reference-counting logic, we can use it to provide some very useful optimizations. The first is to use object pools, such as my RecycleBin, to ease the load on the memory allocation system. When an object is explicitly freed, it can return its resources to these pools, and new instances can then draw from the pools instead of allocating. The second is to provide methods such as addAndFree; many immutable object operators are immediately followed by freeing the previous instance, and if the two are combined into one call, the object can be safely modified in place when there is only one reference to it. This pattern can provide the efficiency of mutating operations combined with the logical rigidity of immutable objects.

The final optimization was to integrate reference counting into our caching mechanisms and provide logic for evicting “soft” logical references to free memory under load. In MindsEye, we track GPU memory usage, and when it exceeds a threshold we evict cached convolution kernels and intermediate datasets.

While the computer was working…

Completing this project required a lot of testing and iteration.
While I was running the automated tests between bug-fix updates, I had quite a bit of downtime in which I didn’t want to change MindsEye code, so I ended up exploring some lower-priority coding projects I’d been thinking about. The first is a filesystem driver for Hadoop that allows efficient and direct use of Git repositories. The second is an integration of Maven and Eclipse’s Java Document Object Model to provide a skeleton for future code analysis and generation projects.

For the unfamiliar, the Hadoop tool ecosystem includes a pluggable filesystem, with drivers provided for local, HDFS, S3, and many other storage systems. Git is a popular peer-to-peer version control system, providing an efficient, network-capable replicated database for source code and text data. Our project uses JGit, a pure-Java implementation of Git, to manage local repositories and network transfers. Repositories are cloned as needed, then updated by a background thread to keep the data current. Once the needed data is cloned to the local filesystem (similar to how the S3 drivers work), further driver logic is delegated to the local filesystem driver after rewriting the API calls. This provides high-performance access to code and configuration while using highly efficient network updates.

Another tool, or rather the start of one, is an integration of Maven and Eclipse to provide easy access to advanced Java meta-coding. This tool loads a project given its pom.xml file and uses Maven to resolve all runtime dependencies and source paths. This project data is then used to initialize an Eclipse code parser, resulting in a number of AST (abstract syntax tree) objects. These objects can then be read and analyzed using the visitor API, and the files can even be updated by leveraging the document object model. So far the demonstration simply mounts a project and prints out all of the ASTs, but even this can be interesting.
I have some much more interesting plans for this tool…