The Efficiency of Java and C++
I seem to have trouble convincing my neo-Turk students that Java's design makes
it inherently less efficient than C++.
The arguments often and up in an exchange of comments like:
— This (micro) benchmark executes with the same speed when written in Java and C.
— Yes, but a realistic application, like Eclipse takes ages to start up.
— You are only complaining about the cost of the runtime startup
costs and JIT compilation, which are quickly amortized, and, anyway,
Eclipse offers many more features than other IDEs.
and so on.
I therefore wrote a small program to demonstrate the exact problems of
Java's design decisions.
The program fills a sorted container with 10000000 objects containing a pseudo-random integers, and then prints out one every 100000, displaying 10 integers on its standard output. The integer values of the objects in the container are ordered in a descending sequence. Here is the Java (1.5) and the C++ code.
Java
import java.util.*;
final class RInteger implements Comparable <RInteger> {
private int i;
RInteger(int v) { i = v; }
public String toString() { return (new Integer(i)).toString(); }
public int compareTo(RInteger o) {
if (o.i > i)
return 1;
else if (o.i < i)
return -1;
else
return 0;
}
}
class SortInt {
public static void main(String args[]) {
TreeSet <RInteger> s = new TreeSet<RInteger>();
int i;
int rand = 10;
for (i = 0; i < 1000000; i++) {
s.add(new RInteger(rand));
rand = rand * 1103515245 + 12345;
}
Iterator <RInteger> si;
for (i = 0, si = s.iterator(); si.hasNext(); i++)
if (i % 100000 == 0)
System.out.println(si.next());
else
si.next();
}
}
C++
#include <set>
#include <iostream>
using namespace std;
class RInteger {
private:
int i;
public:
RInteger(int v) { i = v; }
friend ostream& operator <<(ostream& o, const RInteger &ri);
inline friend bool operator <(const RInteger a, const RInteger b);
};
ostream&
operator <<(ostream& o, const RInteger &ri)
{
cout << ri.i;
return o;
}
inline bool
operator <(const RInteger a, const RInteger b)
{
return (b.i < a.i);
}
main(int argc, char *argv[])
{
set <RInteger> s;
int i;
int rand = 10;
for (i = 0; i < 1000000; i++) {
s.insert(RInteger(rand));
rand = rand * 1103515245 + 12345;
}
set<RInteger>::const_iterator si;
for (i = 0, si = s.begin(); si != s.end(); i++, si++)
if (i % 100000 == 0)
cout << *si << '\n';
return 0;
}
As you can see the two programs are very similar. Both use the language's abstraction facilities to store the integer values in an object, and specify its ordering within the container. I did not use the library's random number generator, but used the same one on both programs to obtain repeatable results. Differences in the container implementation could affect the program's running speed, but I claim that, because the two containers offer similar functionality, their comparison is fair game. Also note the different implementation of the integer comparison method. I was forced to use the if / else sequence in Java, because simply returning the difference between the two integers triggered a bug in the TreeSet code, and made the program produce different (erroneous) results. However, the change did not significantly impact the program's performance.
The Results
Predictably (for me at least) the Java program run three times slower than the C++ code. I sent the results to a student who in our discussions is an advocate of Java's performance, and he performed the following experiments.
- He replaced TreeSet with HashSet, but the program ended up spending most of its time in hash code.
- He stored the objects in an array, but the Java program became 10 times slower.
- He also instructed the C++ compiler to optimize the C++ code for Pentium 3 instructions (I failed to specify that), and the C++ code doubled its speed.
- Could the System.out.println be doing something very inefficient?
- Try a different VM.
- Implementation differences between stl::set and TreeSet
- Non-aligned code and cache misses.
The Difference Explained
I wrote the two programs with an explicit goal to demonstrate how Java's design makes it inherently slower than C++, and I succeeded. I was however surprised that the bottleneck was not readily apparent to the (very clever) people who looked at the two programs.
An explicit goal of C++ is that programs should only pay the cost of the facilities they require. In the program I wrote, the methods associated with the RInteger objects can be readily found at compile time. Therefore, there is no reason to pay the storage and processing costs of dynamic dispatch. Each C++ object is stored as an integer, and the methods associated with it, are as efficient as a C function taking integer arguments. With some minor tweaks I could recompile an operating system kernel written in C with a C++ compiler, and the system would run with exactly the same speed. If I wanted to encapsulate some operations of an operating system's abstraction (such as an inode) in an object to make the code more maintainable, I could do that, and, again I would not pay a performance price for it. If I used dynamic dispatch for methods that were already called in this way in C (think of vnodes), again, I would pay a price, but it would be the same as the price of the C code (a function call through a pointer).
In addition, the C++ STL library is cleverly designed to take advantage of this fact. The way templates work in C++ ensures that when we use a container for a specific object the code is explicitly optimized for the properties of that object. This makes the code extremely fast, but also results in code bloat.
Java has taken a different direction, forcing us to pay the costs of the abstraction mechanisms it offers, whether we need them or not. Therefore, all the RInteger objects are boxed and associated with with the class that produced them, occupying additional memory space, and requiring at least one level of indirection to access their methods. In addition, the Java containers are written to operate any Java objects, and can not easily take into account the properties of the specific objects (note that I marked the Java class as final, to provide the compiler a hint that the specific object would not be subclassed).
The way Java manages memory is also another contributing factor. While I agree that garbage collection can in some cases be more efficient than the explicit new/delete memory management of C++, the problem with Java is that it ignores (with a significant cost) two other ways to manage memory, both safe and very efficient. In C++ objects can be allocated on the method's stack frame (and automatically deallocated when the method exists), and can also be allocated in static memory. In this case the RInteger objects are first created in the method's stack, before getting passed to the insert method. In Java we also pay the runtime and space performance cost for allocating all those objects on the heap.
Concluding
So is the Java program destined to be always be slower than the C++ one? Not neccessarily. I could envisage a Java compiler that would create a separate specialized and highly optimized instance of the RInteger class and the TreeSet container to take into account the reduced requirements of the RInteger object. (An unoptimized version of the same code would still be needed for the case where RIntegers were used in a different context.) However I don't think this will happen anytime soon, if ever. The analysis required for performing these optimizations is very deep (it would have to cover the complete space of a running application), and made more difficult with Java's reflection facilities. In C++ it is considerably simpler to generate efficient code, so the performance will always stay one step ahead.
On the other hand, performance is not everything. The syntax of C++ is in cases areas arcane, and the way the language's elements interoperate can sometimes perplex even a guru. Pointers and explicit memory management have tortured and confused novice programmers for years, so there is a lot to be said in favor of Java's simplicity. I believe there is a clear ecological niche in the space defined by the simplicity of Java and the efficiency of C++. Stay tuned!
Check also the update on this entry.
Read and post comments