Ever since I watched this video showing the Virtex-7 2000T running 3600 cores at 100 MHz (daaaamn), I have wanted to investigate the architecture and what kind of performance per watt it could reasonably be expected to deliver running other kinds of soft cores. Initially I thought the only easy-to-use tool for estimating power at a given level of utilization and frequency was the Pocket Power Estimator. This got me to back up my iPhone and upgrade from a 2-year-old version of iOS (shout out to iOS 3.1.2 users!) to one that is new enough to actually run current apps. (Yeah yeah, I know, don't put off until tomorrow what you can do today - but I don't want to clutter my PC with iTunes, and the backup process takes hours! In the words of Gob from Arrested Development: "Yeah, the guy with the million dollar PC needs to wait three hours to back up his phone. COME ON!") For the humor impaired: my PC did not actually cost a million dollars :)
When I saw the Xilinx Power Estimator (XPE) spreadsheet, my initial suspicion was that it would be difficult to use. I downloaded it anyway and am happy to say that it works quite well. You can bump the numbers for frequency, logic utilization, BRAM utilization, DSP (multiplier) utilization, etc. It was a bit harder to estimate flip-flop (FF) usage, and the toggle rate was hard to guess, so I went with the default (12.5%). Dynamic power for the logic scales linearly with the toggle rate, which I had not expected. I had figured that the miniature SRAMs inside each programmable logic module (64 bits in each 6-input lookup table, or LUT) would perform a table lookup every cycle whether or not the inputs had changed. So either the table lookup does not happen when the inputs stay constant, or the lookups themselves don't consume much power, or looking up the same value twice in a row uses little, if any, power the second time.
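For what it's worth, the linear behavior matches the textbook first-order model of CMOS dynamic power, P = alpha * C * V^2 * f, where alpha is the fraction of nodes that switch each cycle (the toggle rate). Here is a minimal sketch of that model. To be clear, this is my assumption about the general shape of the math, not XPE's actual internal formula, and the capacitance and voltage values are made-up placeholders.

```python
def dynamic_power_watts(toggle_rate, switched_capacitance_f, vdd_volts, clock_hz):
    """First-order CMOS dynamic power: P = alpha * C * V^2 * f.

    toggle_rate is the fraction of nodes that change state per clock cycle.
    """
    return toggle_rate * switched_capacitance_f * vdd_volts ** 2 * clock_hz


# Placeholder values for illustration only (not from XPE or any datasheet).
CAPACITANCE_F = 1e-9   # assumed total switched capacitance, in farads
VDD_VOLTS = 1.0        # assumed core voltage
CLOCK_HZ = 100e6       # 100 MHz, as in the video

power_at_default = dynamic_power_watts(0.125, CAPACITANCE_F, VDD_VOLTS, CLOCK_HZ)
power_at_double = dynamic_power_watts(0.25, CAPACITANCE_F, VDD_VOLTS, CLOCK_HZ)

# Doubling the toggle rate doubles the dynamic power in this model.
print(power_at_default, power_at_double, power_at_double / power_at_default)
```

In this model a node that does not change state contributes nothing, which is consistent with the idea that a LUT whose inputs stay constant burns little or no dynamic power.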
I highly recommend the Xilinx Power Estimator. Given a design on a different generation of Xilinx product, it is fairly easy to guess how many copies of the design could be spread across another Xilinx product with X times as many logic gate equivalents (e.g. Spartan-3 4M -> Xilinx-7 20M is a 5x increase). Scale the counts of units such as BRAMs and DSPs by 5x, keep the logic and FF utilization percentages the same, and scale the frequency as is reasonable. Pop the numbers into the spreadsheet and you've got a power estimate. (In the words of Desmond from Fallout 3's Point Lookout: "Easy Peasy.")
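The scaling recipe above can be sketched in a few lines of Python. Everything here is hypothetical: the `DesignEstimate` fields, the helper name, and the example numbers are placeholders I picked for illustration, not measurements from a real design or fields copied from the spreadsheet.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DesignEstimate:
    bram_blocks: int              # absolute count, scales with device size
    dsp_slices: int               # absolute count, scales with device size
    logic_utilization_pct: float  # percentage, kept the same
    ff_utilization_pct: float     # percentage, kept the same
    clock_mhz: float              # chosen by hand for the new device


def scale_to_larger_device(design, source_gates, target_gates, target_clock_mhz):
    """Scale absolute resource counts by the gate ratio; keep percentages."""
    factor = target_gates / source_gates
    return replace(
        design,
        bram_blocks=round(design.bram_blocks * factor),
        dsp_slices=round(design.dsp_slices * factor),
        clock_mhz=target_clock_mhz,
    )


# Made-up starting point on a 4M-gate part.
original = DesignEstimate(
    bram_blocks=96,
    dsp_slices=96,
    logic_utilization_pct=80.0,
    ff_utilization_pct=40.0,
    clock_mhz=50.0,
)

# 4M -> 20M gate equivalents is the 5x example from the text.
scaled = scale_to_larger_device(original, 4_000_000, 20_000_000, target_clock_mhz=100.0)
print(scaled)
```

The fields of `scaled` are then the numbers you would type into the spreadsheet. The target clock is passed in by hand because what counts as a "reasonable" frequency on the new part is a judgment call.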
(Note that although IO is not considered here, that is an important aspect of scaling performance to a new FPGA. Still, many designs will find plenty of bandwidth on a superior new FPGA, and the power consumption of the IO is often small relative to logic power consumption.)