SiCortex was an inspiring company - they had a novel chip-to-chip networking architecture that boasted improved latency and bandwidth for MPI programs. They packed their processors densely, and instead of making rack-mount servers they optimized their own custom chassis for cooling. Their largest system unit had over 5,000 cores and could fit within the power budget of the typical office without any power or cooling retrofitting - a very exciting proposition and I don't think I was the only one who dreamed of bringing one home.
The system was marketed as a power-efficient supercomputer, a niche that seems worth targeting since commodity servers can be beaten by such a large margin in that arena, given the right architecture. Low-power cores coming out of ARM and Tensilica inspire thoughts of how computer systems could incorporate such efficient cores in a useful way.
In 2007, about a year after SiCortex installed their first systems, the best SiCortex system was "only arguably" more power efficient than the latest Intel servers - meaning a significant advantage on some common cluster workloads wasn't obvious. For example, in the double-precision GFlops arena (often not a representative benchmark, but used here for simplicity), SiCortex provided 5832 cores, each capable of 1 GFlops, yielding 5.832 TFlops. The power consumption of the system was about 18 kilowatts, resulting in 324 GFlops per kilowatt. The 3 GHz Core 2 Quads that could fit in 150-watt servers in 2007 were putting out 48 GFlops (4 ops/cycle in SIMD, 4 cores, 3 GHz). That's 320 GFlops per kilowatt, reducing the SiCortex advantage to a rounding error.
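The arithmetic above can be checked with a quick back-of-the-envelope sketch; the figures (core counts, clock rates, and power draws) are the ones quoted in this paragraph, not measured values:

```python
# Back-of-the-envelope GFlops-per-kilowatt comparison using the figures above.

# Largest SiCortex system: 5832 cores at 1 GFlops each, ~18 kW total.
sicortex_gflops = 5832 * 1.0
sicortex_kw = 18.0
sicortex_eff = sicortex_gflops / sicortex_kw  # GFlops per kilowatt

# 2007-era Core 2 Quad server: 4 SIMD ops/cycle * 4 cores * 3 GHz = 48 GFlops
# in a 150-watt box.
intel_gflops = 4 * 4 * 3.0
intel_kw = 0.150
intel_eff = intel_gflops / intel_kw  # GFlops per kilowatt

print(f"SiCortex:   {sicortex_eff:.0f} GFlops/kW")   # 324
print(f"Core 2 Quad: {intel_eff:.0f} GFlops/kW")     # 320
```

The gap works out to about one percent, which is the "rounding error" referred to above.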
The GFlops comparison is not fair - the SiCortex architecture had a lot of advantages outside of GFlops, like much lower penalties for cache misses, higher memory bandwidth per compute cycle, somewhat lower penalties for branch misprediction, etc. The comparison above also does not take network power consumption into account, and the PC network would have delivered lower performance for latency-bound or bandwidth-bound problems. These advantages would have been more compelling if SiCortex's floating-point power efficiency had held at least a 2x-4x lead over Intel at the time, which could have been perceived as a minimum 2-year to 4-year advantage over commodity servers. As it was, it was easy to think of workloads (e.g. SIMD floating-point-bound workloads) that gained no power efficiency advantage on the SiCortex hardware, which starts the power efficiency story on the wrong foot.