Wednesday, March 08, 2006

Java Optimizations

Java programmers are used to having memory handled for them. You don't have to malloc, and you don't have to free. You just create objects when you need them, and once you've stopped using them, the JVM will clean up after you. It's magic.

But sooner or later garbage collection will bite you on the ass. Here's what happens. Your application has places where it creates a lot of little objects and discards them almost immediately. If this doesn't fill up your heap space, it will fragment it, and the JVM has an increasingly difficult time finding consecutive blocks to fit your larger objects. Ultimately this triggers a "full gc", where the garbage collector walks the entire heap looking for memory blocks to free up, moving objects around in memory as necessary to defragment the heap. This process can lock up your application completely, and if memory is being allocated faster than it can be freed, it can lock it up indefinitely. The usual symptom is that Java's CPU usage goes to 100% and stays there.

To remedy this, you must optimize. But how, and what? As a starting point, here are some of the most common culprits of inefficient memory allocation, where you can optimize a lot for little effort:

XML. Processing XML is unnecessarily resource-intensive. The best advice, if you value your memory and can at all help it, is to not use XML in your application at all. Walk away and don't look back. Parsing this stuff generates an enormous number of memory allocations: every tag and attribute becomes an object, and so does every bit of whitespace between, before, and after every tag, and every one of those objects is a fresh allocation. There are many alternatives to XML that cost far fewer resources, including databases, s-expressions, and property files.
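If all you need is configuration, for instance, a plain property file does the job with a fraction of the allocations. Here's a minimal sketch, assuming a hypothetical config.properties file sitting on disk:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class ConfigLoader {
    public static void main(String[] args) throws IOException {
        // Plain key=value pairs: no tree of tag, attribute, and whitespace
        // objects like an XML parser would build.
        Properties config = new Properties();
        FileInputStream in = new FileInputStream("config.properties");
        try {
            config.load(in);
        } finally {
            in.close();
        }
        String dbUrl = config.getProperty("db.url", "jdbc:none");
        System.out.println("db.url = " + dbUrl);
    }
}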

Vendor JDBC drivers. These are the stock drivers that come with your Database Management System. Since these drivers are usually free, and they are not the vendors' core product, they are notoriously inefficient. If your application talks to a database, chances are that your JDBC driver is eating most of your memory. Consider switching to a commercial-grade JDBC driver, or make fewer calls to the database in your application. Just switching drivers can slash your memory usage by two thirds or more.
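One cheap way to make fewer calls is to batch your statements instead of sending them one at a time. A rough sketch, assuming a hypothetical orders table and a Connection you've already opened:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OrderWriter {
    // Insert many rows in one round trip instead of one statement per row.
    public static void insertOrders(Connection conn, int[] orderIds) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
                "INSERT INTO orders (id) VALUES (?)");
        try {
            for (int i = 0; i < orderIds.length; i++) {
                stmt.setInt(1, orderIds[i]);
                stmt.addBatch();
            }
            stmt.executeBatch(); // one call to the driver, not orderIds.length calls
        } finally {
            stmt.close();
        }
    }
}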

StringBuffers. Most applications do a lot of string concatenation, and you can get a lot of bang for your optimization buck here. Use StringBuffer.append() instead of Strings and the '+' operator. All string concatenation actually gets translated into StringBuffer appends at compile-time, but the compiler creates a brand new StringBuffer for every concatenation expression, so building a String with '+' inside a loop allocates a new buffer on every iteration. So use append() explicitly, use as few StringBuffers as you can, and call .toString() on the StringBuffer as late as you can get away with, and only once.
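To make the difference concrete, here's a sketch with a hypothetical report-building loop:

public class ReportBuilder {
    // Bad: each trip through the loop compiles to a brand new StringBuffer,
    // a few appends, and a toString() -- new allocations on every iteration.
    public static String slowReport(String[] lines) {
        String report = "";
        for (int i = 0; i < lines.length; i++) {
            report = report + lines[i] + "\n";
        }
        return report;
    }

    // Better: one StringBuffer, one toString() at the very end.
    public static String fastReport(String[] lines) {
        StringBuffer report = new StringBuffer();
        for (int i = 0; i < lines.length; i++) {
            report.append(lines[i]).append('\n');
        }
        return report.toString();
    }
}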

For example, if your method creates a StringBuffer, appends a bunch of stuff to it, and returns a String, consider converting that to a void method that takes a StringBuffer argument and appends to it. This will avoid both the allocation of a new StringBuffer and its conversion to a String.
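Continuing the hypothetical report example, the same logic rewritten to append into a caller-supplied buffer:

public class ReportAppender {
    // The caller owns the buffer; this method allocates nothing of its own.
    public static void appendReport(StringBuffer buffer, String[] lines) {
        for (int i = 0; i < lines.length; i++) {
            buffer.append(lines[i]).append('\n');
        }
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer();
        appendReport(buffer, new String[] { "line one", "line two" });
        appendReport(buffer, new String[] { "line three" });
        System.out.print(buffer.toString()); // one conversion, at the very end
    }
}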

Mind you, don't try to optimize concatenation of String literals this way. javac folds concatenations of literals into a single constant String at compile-time, and that constant is allocated exactly once when the class is first loaded.

Finally, if you already know roughly how big your final StringBuffer will be, initialize the buffer with the StringBuffer(int) constructor. The default capacity of a StringBuffer is 16 characters, and every time an append overflows the current capacity, the underlying char[] array is replaced with one roughly twice as large, which means another allocation and a copy. Better to allocate the whole buffer once, with not a byte more than you need.
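A quick sketch of presizing, using made-up data just to show the shape of it:

public class PresizedBuffer {
    public static void main(String[] args) {
        String[] words = { "alpha", "beta", "gamma", "delta" };

        // Estimate the final length so the internal char[] is allocated once.
        int estimate = 0;
        for (int i = 0; i < words.length; i++) {
            estimate += words[i].length() + 1; // word plus separator
        }

        StringBuffer buffer = new StringBuffer(estimate);
        for (int i = 0; i < words.length; i++) {
            buffer.append(words[i]).append(' ');
        }
        System.out.println(buffer.toString());
    }
}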

The same principle applies to collections. Initialize your collections to their final size rather than allowing them to resize several times. If you're adding to a collection in a loop, you probably already know how many iterations there will be, so pass that number to the constructor and save yourself some heap fragmentation.
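Same idea with an ArrayList, assuming a loop whose iteration count you know up front:

import java.util.ArrayList;
import java.util.List;

public class PresizedList {
    // The backing array is allocated once at the right size instead of
    // growing and copying itself several times while the loop runs.
    public static List<Integer> buildSquares(int count) {
        List<Integer> squares = new ArrayList<Integer>(count);
        for (int i = 0; i < count; i++) {
            squares.add(Integer.valueOf(i * i));
        }
        return squares;
    }
}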

Attack of the .clone()s. Sometimes you want to augment an object on the fly, just for yourself, without modifying the shared instance. So you clone it to a new instance and modify that. What this does is allocate memory for the new object and then copy each of the source object's attributes into the clone. Be very careful with this, and avoid it wherever possible. If you have to clone large objects, a lot of objects, or both, you probably have a poor design and should refactor. Deep cloning is even worse: usually it is done by serializing an object to a stream and then creating a new object from the serialized data, which copies the entire object graph. It's very rare that you actually need two copies of the same thing in the same VM. Consider refactoring.
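If you're wondering why the serialization trick is so expensive, this is roughly what it looks like; everything reachable from the object you pass in gets written to a byte array and then reconstructed:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DeepCloner {
    // Serialize the whole object graph to bytes, then read it back.
    // Every reachable object is copied, which is exactly why this
    // should be a last resort.
    public static Object deepClone(Serializable original) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(original);
        out.close();

        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        return in.readObject();
    }
}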

Go static. There might be places in your application where a lot of objects are being created for no good reason. You might have objects that do nothing except sit around and have their methods called, or get created on the fly like so: new Wossname().doThing(). Here, doThing() should be static, and there's no reason to create that instance of Wossname. There might be other objects that maintain a little bit of internal state that could just as well be handled statically. For example, a method may behave differently based on whether a boolean property of the object is true or false. In that case you can split the design into one class per behavior, or, simpler still, pass the boolean as a parameter to static methods instead of keeping it as a field on the object. Voila, no instances of those classes need to be created, ever.
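Here's a sketch of that idea (Wossname and doThing() are just the made-up names from above, with a hypothetical verbose flag standing in for the boolean property):

public class Wossname {
    private Wossname() {
        // No instances needed; the class exists only for its behavior.
    }

    // Before: new Wossname().doThing() allocated an object just to call a method.
    // After: the work is a static method, and the boolean that used to live
    // in a field is passed in as a parameter instead.
    public static void doThing(boolean verbose) {
        if (verbose) {
            System.out.println("doing the thing, loudly");
        } else {
            System.out.println("doing the thing");
        }
    }

    public static void main(String[] args) {
        Wossname.doThing(true);  // no allocation anywhere
        Wossname.doThing(false);
    }
}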

Pooling. In that same vein, you can save a lot of memory on classes that many threads have to instantiate constantly. Instead of creating and discarding several hundred objects per second, consider creating a pool for those objects. Threads request an instance from the pool and return it when they're done with it. At that point the pool reinitializes the object as necessary so it's ready to hand to another thread, just as if it were a brand new instance. The effect is that memory consumption stays far more stable over time, because far fewer allocations happen overall.
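A bare-bones sketch of the idea, using a hypothetical Parser class that is expensive to create but cheap to reset:

import java.util.LinkedList;

public class ParserPool {
    private final LinkedList<Parser> pool = new LinkedList<Parser>();

    // Hand out a pooled instance if one is available, otherwise allocate.
    public synchronized Parser acquire() {
        if (pool.isEmpty()) {
            return new Parser();
        }
        return pool.removeFirst();
    }

    // Reinitialize the instance and put it back for the next thread.
    public synchronized void release(Parser parser) {
        parser.reset();
        pool.addLast(parser);
    }

    // Hypothetical pooled class: expensive to create, cheap to reset.
    public static class Parser {
        private final StringBuffer scratch = new StringBuffer(4096);

        public void reset() {
            scratch.setLength(0);
        }
    }
}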


Ultimately, all these optimizations are just guesswork unless you can measure their effects in a controlled environment. For that you need a profiler. I recommend investing in JProfiler. It's a very good tool, although still somewhat buggy. Its best feature is the ability to look at "allocation hotspots", which shows you where most of the memory is being allocated, both by amount of memory and by number of allocations: not just which objects are eating your memory, but which methods are creating them. This kind of tool will show you just how poorly your code is really using memory, or how badly it is fragmenting your precious heap.
