Latest test run
- Compilers and flags:
  - GCC 4.4.1:
    -O2 -march=core2 -ftree-vectorize -ffast-math -mfpmath=sse
  - Visual C++ 2005, 2008 and 2010 beta 2:
    /Ox /GS- /fp:fast /arch:SSE2 /MT
  - Intel C++ Compiler 11.1:
    /Ox /QxHost /EHsc
- Test system: Core2 Duo 2.13 GHz, 2 GB RAM.
- Operating systems: Vista 64 Ultimate for VC++ and Intel, Ubuntu 9.10 x64 for GCC.
- Results:
Older tests
Test description
- Bilinear C: samples an image with bilinear filtering. The image resolution is 1024x1024 and 32 million random samples are taken.
- Bilinear SSE Inline: bilinear filtering implemented with SSE1 as inline assembly. This code doesn't compile in 64-bit and doesn't work with GCC
(as it uses VC++ inline asm syntax). Same image as above, 82 million random samples.
- Bilinear SSE Intrinsics: Same SSE code as above, but written with intrinsics instead of inline ASM so that the compilers can show off their
scheduling-fu. VC++ doesn't compile it in 64-bit because it thinks MMX is evil on that platform. 82 million samples (same sample locations as above).
- Bilinear SSE2 Intrinsics: SSE2 bilinear filter, written with intrinsics. This one works with all the compilers which matter, on both 32-bit and 64-bit.
Same samples as the two tests above.
- Google Hash Insert: 600,000 inserts into a google::dense_hash_map<const char*, unsigned int>. The keys are 13 characters long and are compared
with strcmp().
- Google Hash Lookup: 6 million lookups into a google::dense_hash_map<const char*, unsigned int> containing 1 million entries. When a key is found,
the corresponding value is pushed into an std::vector.
- Linear search: 10,000 very unexciting linear searches inside an array of 100,000 C strings. The strings are compared with strcmp().
- Map Inserts: 3 million inserts into an std::map<unsigned int, unsigned int>. The map is cleared after every 100,000 inserts.
- Map Lookup/Delete: 5 million lookups into an std::map<unsigned int, unsigned int> containing 1 million elements. When a key is found,
the element is deleted. About 25% of the lookups are positive.
- Matrix Inplace: 53 million inverse-transpose operations on 4x4 matrices. To give the compilers the opportunity to demonstrate their NRVO and temporary
removal skills, the invert and transpose functions build a matrix which is returned by value and assigned to the original matrix, e.g.:
Matrix4 Matrix4::Invert() { ... }
Matrix4& Matrix4::InvertInPlace() { *this = Invert(); return *this; }
Matrix4 Matrix4::Transpose() { ... }
Matrix4& Matrix4::TransposeInPlace() { *this = Transpose(); return *this; }
// The test:
m.InvertInPlace().TransposeInPlace();
- Matrix-Matrix: 61 million 4x4 matrix-matrix multiplications. Nothing fancy in the multiplication operator.
- MtRand: 500 million random integers are generated using the Mersenne Twister algorithm.
- RB Construct: 5 million random integers are inserted into a red-black tree. The tree is cleared after every 1 million inserts.
- RB Find: 15 million lookups into a red-black tree containing 100,000 nodes.
- Raytracer double: The first single-file raytracer I could find with Google. Uses doubles. The scene is a group of spheres arranged in a
Sierpinski triangle. Only direct and shadow rays are traced, no bounces. Renders a 512x512 image with 4x supersampling.
- Raytracer float: The same raytracer as above, but with floats instead of doubles.
- Scaling Dumb C: A 64x2143 image is scaled to 64x915 one thousand times. The code is a straightforward implementation of a box filter.
- Scaling Better C: Optimized C code for the same box filter as above.
- Scaling SSE2: SSE2-optimized version of the box filter.
- Triangulator: A polygon triangulator which uses monotone decomposition. The polygons in the test data set were produced by Lightwave
from the text "The quick brown fox jumps over the lazy dog" using a wacky font. 37 polygons, 7591 vertices, 2000 runs.
- Vector-Matrix: 245 million vector-matrix multiplications. 4x4 matrices, 4-component vectors with the 4th assumed to be 1.
- Vector-NormMatrix: 53 million inverse-transpose operations on 4x4 matrices followed by a vector multiplication. The inverse-transpose
is not in-place, i.e.:
Matrix4 Matrix4::Invert() { ... }
Matrix4 Matrix4::Transpose() { ... }
Vector operator*(const Matrix4& m, const Vector& v) { ... }
// The test:
Vector result = m.Invert().Transpose() * v;
so compilers can get clever about removing those temporaries.
- std::sort: An std::vector containing 150 million integers is sorted with std::sort().
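For readers unfamiliar with what the bilinear tests in the list above measure, here is a minimal sketch of a scalar bilinear sample. It is not the benchmark code: the Image struct, the single float channel, the edge clamping and the name SampleBilinear are all assumptions made for illustration, since the page does not describe the pixel format or the border handling used in the tests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image; the pixel format of the real tests is not stated.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<float> pixels;  // row-major, width * height values

    float at(std::size_t x, std::size_t y) const { return pixels[y * width + x]; }
};

// Samples the image at continuous pixel coordinates (x, y).
// Coordinates outside the image are clamped to the nearest edge (an assumption).
float SampleBilinear(const Image& img, float x, float y) {
    assert(img.width > 0 && img.height > 0);
    assert(img.pixels.size() == img.width * img.height);

    x = std::min(std::max(x, 0.0f), static_cast<float>(img.width - 1));
    y = std::min(std::max(y, 0.0f), static_cast<float>(img.height - 1));

    // Integer corner of the 2x2 neighbourhood and its clamped neighbour.
    const std::size_t x0 = static_cast<std::size_t>(x);
    const std::size_t y0 = static_cast<std::size_t>(y);
    const std::size_t x1 = std::min(x0 + 1, img.width - 1);
    const std::size_t y1 = std::min(y0 + 1, img.height - 1);

    // Fractional position inside the 2x2 neighbourhood.
    const float fx = x - static_cast<float>(x0);
    const float fy = y - static_cast<float>(y0);

    // Blend horizontally on both rows, then vertically between the rows.
    const float top    = img.at(x0, y0) * (1.0f - fx) + img.at(x1, y0) * fx;
    const float bottom = img.at(x0, y1) * (1.0f - fx) + img.at(x1, y1) * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

Each sample reads four neighbouring pixels and performs three linear blends. Handling four values at a time like this is the kind of work the SSE variants of the test implement by hand.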
Each test was run 3 times and the best time was kept. Green cells mark the best time. Orange cells are within 3% of the best time. Red cells
are failed tests, i.e. the code didn't compile or didn't work correctly with that particular compiler.
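The best-of-three rule can be sketched as follows. This is an illustration rather than the harness used for these results: BestTimeSeconds and WithinThreePercent are hypothetical names, and std::chrono requires a C++11 compiler, which is newer than some of the compilers tested here.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Runs `work` `runs` times and returns the shortest wall-clock time in seconds.
// Keeping the minimum filters out runs disturbed by other activity on the machine.
template <typename Work>
double BestTimeSeconds(Work work, int runs = 3) {
    assert(runs > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double elapsed = std::chrono::duration<double>(stop - start).count();
        if (elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}

// True when `time` is at most 3% slower than `best` (the "orange cell" rule).
bool WithinThreePercent(double time, double best) {
    return time <= best * 1.03;
}
```

A test would be wrapped in a callable, e.g. BestTimeSeconds([&] { RunTest(); }), where RunTest stands for whichever benchmark is being measured. The result is then compared against the fastest compiler's time with WithinThreePercent.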
Some tests aren't exactly apples-to-apples comparisons, as they use the STL and/or standard C library, and that code is different with
every compiler. I included them because the performance of the libraries which ship with the compiler is important.
The code for the tests is available here.
I can be reached at mihnea.balta@gmail.com for questions, comments etc.