Valve’s Multithreaded Programming

the guys over at valve are the shit. they’ve apparently figured out a hybrid threading model (combining coarse and fine-grained threading) to actually pull some crazy power out of those multi-core processors.
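
for a rough picture of what “hybrid” could mean here, the sketch below runs one whole subsystem as a single coarse task while a big per-critter loop gets chopped into small chunks for a worker pool. everything in it (`update_critter`, `update_chunk`, `mix_audio`, `run_frame`, the chunk size of 64) is made up for illustration and is not valve’s code or api. python threads also won’t show the real speedup because of the interpreter lock, so treat it as the shape of the idea only.

```python
from concurrent.futures import ThreadPoolExecutor


def update_critter(critter, dt):
    # hypothetical per-critter step: move along a line at constant velocity
    position, velocity = critter
    return (position + velocity * dt, velocity)


def update_chunk(chunk, dt):
    # one fine-grained job: a small slice of the critter list
    return [update_critter(c, dt) for c in chunk]


def mix_audio(samples):
    # hypothetical coarse-grained job: a whole subsystem as a single task
    return sum(samples) / len(samples)


def run_frame(critters, samples, dt=0.016, workers=4, chunk_size=64):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # coarse: the audio subsystem gets one task of its own
        audio_job = pool.submit(mix_audio, samples)
        # fine: the critter loop is split into chunks the pool can share
        chunks = [critters[i:i + chunk_size]
                  for i in range(0, len(critters), chunk_size)]
        jobs = [pool.submit(update_chunk, chunk, dt) for chunk in chunks]
        # collect results in submission order so critter order is preserved
        updated = [c for job in jobs for c in job.result()]
        return updated, audio_job.result()
```

calling `run_frame([(0.0, 1.0)] * 500, [1.0, 3.0], dt=0.5)` hands back 500 updated critters plus the mixed audio value. the coarse task and the fine-grained chunks share the same pool, which is the part that makes it “hybrid” in this sketch.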

In one demo, over 500 tiny critters maneuvered around fire and complex obstacles, even tipping over a crate with their combined weight (physics calculations can also be multithreaded). The demo was run on a 2.6GHz Kentsfield CPU with four cores and 2GB of RAM. On a single-core 3.2 GHz Pentium 4, fewer than 100 critters could run around at the same frame rate, which looked much less impressive.

yeah. they are getting close to linear scaling out of it. so, it’s actually like having 4 processors instead of one.
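
to put a rough number on that: the quoted figures are bounds (over 500 critters on four cores, fewer than 100 on one), so the sketch below just treats them as 500 and 100. the function name is mine, and it ignores the fact that the two chips differ in clock speed and architecture, so this is not a clean scaling measurement.

```python
def speedup_and_efficiency(multi_count, single_count, cores):
    # speedup: how much more work the multi-core run handled at the same frame rate
    speedup = multi_count / single_count
    # efficiency: speedup per core; 1.0 would be perfectly linear scaling
    return speedup, speedup / cores


speedup, efficiency = speedup_and_efficiency(500, 100, cores=4)
print(speedup, efficiency)  # prints 5.0 1.25
```

an efficiency above 1.0 can’t come from core count alone, so at least part of the gap has to be the newer chip doing more work per core than the pentium 4.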

what else has a killer multi-core-crazy processor in it?

As far as console ports go, Gabe mentioned that Valve is already putting their hybrid threading technology to good use for their Xbox 360 projects, but could not comment about its applicability to the PS3 as they were not doing any PS3 ports themselves and had no PS3 systems in their building at the time. One of the Valve programmers did mention that the PS3’s architecture is not quite as suited to their frameworks due to its asymmetric approach and the fact that the SPUs (Synergistic Processing Units) could not directly access main memory.

it’s a bummer the ps3 hardware designers can design consumer electronics but not “computers,” and for sure don’t understand well enough what needs to happen in software over the next 5 to 10 years to unlock this “potential” they’re always flailing about.

how long before the other commercial engines start working through this stuff? mad ups for valve.


UPDATE: i should note, too, that the above “demo” ran on a 2.6 GHz processor with 4 simultaneous threads. note that it “roasts” a single-threaded 3.2 GHz processor. since i didn’t explicitly state it, the xbox 360 has 6 (not 4) threads running at 3.2 GHz (not 2.6 GHz). so, basically, better than all of the above in their pc demo.
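
the comparison in that update boils down to threads times clock speed. a sketch of that naive math is below; `thread_ghz` is a made-up name, and the metric assumes every hardware thread does the same amount of work per clock, which is a big assumption.

```python
def thread_ghz(threads, clock_ghz):
    # naive capacity: hardware threads multiplied by clock speed
    return threads * clock_ghz


pc_demo = thread_ghz(4, 2.6)    # the quad-core pc from the demo
xbox_360 = thread_ghz(6, 3.2)   # 6 hardware threads at 3.2 GHz
ratio = xbox_360 / pc_demo
print(round(ratio, 2))  # prints 1.85
```

so on this measure alone the 360 comes out around 1.85 times the demo pc. whether a hardware thread on one chip is really worth a hardware thread on the other is a separate question this number doesn’t answer.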


  1. Factory on

    “so, basically, better than all of the above in their pc demo.”

    Erm, no. The next (?) generation of consoles all (except possibly the Wii) have in-order processors, which means that despite being clocked rather high, these cpus are actually pretty slow. Quad core x86 cpus should easily be able to beat all of them.

  2. m3mnoch on

    i see where you’re coming from, but the xenon has 3 cores (albeit not 4 like a quad), each independent with its own 2 in-order threads. 6 threads total. 3 pairs of 2 in-order threads, not 6 in-order threads. so, overall, there are still 2 additional execution paths to be used.

    so, yeah, in theory, still faster. especially since it’s clocked faster.


  3. Factory on

    Erm, no. Even considering the two threads per core, the xenon will still be slower. You are underestimating the gains that an out-of-order (OoO) architecture gives you.

  4. m3mnoch on

    not really. you are OVERestimating the difference between 3 cores and 4.

    so, in ibm’s benchmark tests for multithreaded applications (e.g. games, especially the code valve is putting together), having 2 threads on a single core (xenon’s cores essentially do hyperthreading) can net you performance gains of 30% to 50% depending on the application.

    what does that mean? time for the rough back-of-the-envelope numbers…. it means one dual-threaded xenon core is worth 1.4-ish regular single-threaded 3.2 GHz cores.

    so, 3 dual-threaded cores running at 1.4 times the performance is about the same as 4.2 single threaded 3.2 GHz cores — given the linear gains valve’s talking about.

    4.2 > 4.

    oh…. and there’s that 23% clock speed advantage over the kentsfield (3.2 GHz vs 2.6 GHz) to deal with too.

    so, not only is this false: “Quad core x86 cpus should easily be able to beat all of them,” but the xenon is arguably faster.

    that being said, i’ll totally rethink my position if you have some numbers or thoughts or … well … anything other than a second-grade “nuh-uh” to back up your argument.

    my way, obviously, is the way i understand it to work.


  5. Factory on

    Erm, you are assuming that, clock-for-clock, an in-order cpu does as much work as an OoO cpu.

  6. m3mnoch on

    erm, you are assuming things like streaming media and vertex calculations don’t fill over 80% of the execution path.

    it’s the same reason powerpc hardware rips shit over x86 hardware during photoshop filter benchmarks. deep, not wide pipelines.
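
the back-of-the-envelope math in comment 4 can be written out as below. the 1.4 factor is that comment’s own rough midpoint of the 30% to 50% range, the function and variable names are made up, and nothing here accounts for the in-order vs OoO per-clock difference the other side of the thread is arguing about.

```python
def equivalent_cores(cores, smt_factor):
    # each dual-threaded core counts as smt_factor single-threaded cores
    return cores * smt_factor


# 3 xenon cores, in units of single-threaded 3.2 GHz cores
xenon_equiv = equivalent_cores(3, 1.4)
# xenon clock divided by kentsfield clock
clock_ratio = 3.2 / 2.6
print(round(xenon_equiv, 2), round(clock_ratio, 2))  # prints 4.2 1.23
```

changing `smt_factor` to 1.3 or 1.5 gives 3.9 or 4.5 equivalent cores, so the “4.2 > 4” result depends on where in that 30% to 50% range a given workload actually lands.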


