cjameshuff

2012-Nov-26, 01:42 PM

I've done some benchmarks comparing common approaches to vector math, particularly in relation to 3D geometry:

https://github.com/cjameshuff/vectests

It's a bit messy (various alternate tests/implementations are commented out) and nowhere near comprehensive, but it gives some useful results. There's little advantage to using arrays: C++ has little or no overhead for structs/classes, provided they have the right constructors, parameters are passed by reference, and virtual functions aren't used.
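To make the struct-versus-array comparison concrete, here is a minimal sketch of the two styles. The names (Vec3, dot, dot_array) are made up for illustration and are not taken from the vectests repository.

```cpp
// Hypothetical 3D vector struct: plain data, no virtual functions.
struct Vec3 {
    double x, y, z;
    constexpr Vec3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
};

// Parameters are taken by const reference, so nothing is copied and the
// compiler is free to inline the whole call.
constexpr Vec3 operator+(const Vec3& a, const Vec3& b) {
    return Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
}

constexpr double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// The same dot product written against raw arrays, for comparison.
inline double dot_array(const double a[3], const double b[3]) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// On common ABIs the struct is exactly three doubles: no hidden overhead.
static_assert(sizeof(Vec3) == 3 * sizeof(double), "unexpected padding in Vec3");
```

With optimization enabled, I'd expect dot and dot_array to compile to essentially the same instructions, but that is worth checking in the generated assembly rather than taking on faith.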

However, there is an advantage to using the built-in (and non-standard) primitive vector types, even over a vector class using SSE. I haven't looked at the resulting code in detail, but I think this is largely due to the built-in vectors being passed to functions via the SSE registers, while the contents of the vector class have to be loaded into a register before use. And Clang can apparently emit fairly good SSE code for those vectors, though hand optimization using SSE intrinsics can occasionally be a substantial benefit.
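For anyone who hasn't used the built-in vector types, here is a rough sketch of the difference, assuming the GCC/Clang vector_size extension. The names float4, Vec4 and madd are hypothetical, and the wrapper class here just wraps the extension type rather than using SSE intrinsics as a real vector class would.

```cpp
// GCC/Clang vector extension (non-standard): four floats in one 128-bit value.
typedef float float4 __attribute__((vector_size(16)));

// Built-in vector passed by value. On x86-64 the arguments can travel in
// SSE registers, and the arithmetic operators work element-wise.
inline float4 madd(float4 a, float4 b, float4 c) {
    return a * b + c;
}

// Hypothetical wrapper class around the same data.
struct Vec4 {
    float4 v;
    explicit Vec4(float4 v_) : v(v_) {}
    Vec4(float x, float y, float z, float w) {
        float4 t = {x, y, z, w};
        v = t;
    }
};

// Class passed by const reference. If the call is not inlined, the callee
// has to load each operand from memory before it can use it.
inline Vec4 madd(const Vec4& a, const Vec4& b, const Vec4& c) {
    return Vec4(a.v * b.v + c.v);
}
```

Both versions compute the same thing; any difference would show up only in how the operands reach the function when it isn't inlined.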

SSE is clearly not designed for general vector math; it's designed to do parallel operations on scalars. It's quite weak at "horizontal" operations like adding up the elements of a register, though recent versions have added a few horizontal operations that work on pairs of elements, plus floating point dot product instructions. It's also hampered by being only 128 bits wide, which means double precision math can only work on 2 doubles at a time. The new AVX instructions do better, doubling the register width to 256 bits...and someday I'll have hardware that can use them.
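To illustrate the "horizontal" problem, here is a sketch of a 4-float dot product using only original SSE intrinsics. The function names (hsum, sum4, dot4) are made up for this example, and the scalar fallback is there only so the sketch builds on non-x86 targets. The multiply is a single instruction, while summing the four lanes takes a shuffle/add sequence.

```cpp
#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>  // original SSE intrinsics

// Sum the four lanes of one register: two shuffles and two adds.
inline float hsum(__m128 v) {
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); // [v1, v0, v3, v2]
    __m128 sums = _mm_add_ps(v, shuf);   // [v0+v1, v0+v1, v2+v3, v2+v3]
    shuf = _mm_movehl_ps(shuf, sums);    // lane 0 now holds v2+v3
    sums = _mm_add_ss(sums, shuf);       // lane 0 = v0+v1+v2+v3
    return _mm_cvtss_f32(sums);
}

inline float sum4(const float* p) {
    return hsum(_mm_loadu_ps(p));
}

// "Vertical" multiply is one instruction; the reduction is the costly part.
inline float dot4(const float* a, const float* b) {
    return hsum(_mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
#else
// Scalar fallback for targets without SSE.
inline float sum4(const float* p) {
    return (p[0] + p[1]) + (p[2] + p[3]);
}

inline float dot4(const float* a, const float* b) {
    return (a[0] * b[0] + a[1] * b[1]) + (a[2] * b[2] + a[3] * b[3]);
}
#endif
```

The later horizontal-add and dot product instructions can shorten hsum, but the basic point stands: the instruction set is happiest when each lane is an independent scalar computation.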
