Friday 22 July 2011

cpu visualisation

It's quite interesting to contemplate different ways of looking at
things.

I have an Intel i7 machine - it's fast (it's a laptop, so it could be
faster if I had a desktop CPU).

Linux provides a lot of raw data, but one thing "top" lacks is
per-CPU detail. There are display widgets for KDE and GNOME
which help you visualise cpu load, but this display shows something
interesting:


last pid: 4792 in: 4448 load avg: 1.28 0.71 0.43 23:21:45
CPU: 8(HT) @ 2.00GHz, proc:231, thr:464, zombies: 1, stopped: 5, running: 3 [t
dixxy: 7.3% usr, 0.1% nice, 1.5% sys, 84.6% idle, 6.4% iow, 0.1% sirq
RAM:7918M RSS:0K Free:303M Cached:1913M Dirty: 664K Swap:225M Free:7878M
cpu
Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
usr nice sys idle iow irq sirq steal guest gnice
CPU0 8.4% 0.0% 2.6% 73.6% 15.2% 0.0% 0.8% 0.0% 0.0% 0.0%
CPU1 63.6% 0.0% 1.8% 21.0% 14.2% 0.0% 0.0% 0.0% 0.0% 0.0%
CPU2 0.2% 0.0% 1.0% 99.2% 0.0% 0.0% 0.2% 0.0% 0.0% 0.0%
CPU3 2.4% 0.0% 1.0% 97.0% 0.4% 0.0% 0.0% 0.0% 0.0% 0.0%
CPU4 0.0% 0.2% 0.2% 101.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
CPU5 0.0% 0.0% 0.2% 100.2% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
CPU6 0.2% 0.0% 0.8% 99.4% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0%
CPU7 0.0% 0.2% 0.6% 98.6% 1.2% 0.0% 0.0% 0.0% 0.0% 0.0%

MHz Cache Bogomips
CPU0 2001.000 6144 KB 3990.88
CPU1 1400.000 6144 KB 3990.92
CPU2 800.000 6144 KB 3990.97
CPU3 800.000 6144 KB 3990.93
CPU4 800.000 6144 KB 3990.96
CPU5 800.000 6144 KB 3990.96
CPU6 800.000 6144 KB 3990.94
CPU7 800.000 6144 KB 3990.98


The info is taken from /proc/cpuinfo (this is the "proc" utility - available
at my website; run it and type 'cpu' at the command line to see this
display).
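For illustration, here's a minimal Python sketch of pulling the same per-CPU fields out of /proc/cpuinfo. This is not the proc utility's actual code, just an assumption of roughly how such a display is built; the sample text is an embedded fragment so the sketch runs anywhere (on Linux you would read open("/proc/cpuinfo") instead):

```python
# Parse /proc/cpuinfo-style text into one dict per logical CPU.
SAMPLE = """\
processor\t: 0
model name\t: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
cpu MHz\t\t: 2001.000
cache size\t: 6144 KB
bogomips\t: 3990.88

processor\t: 1
model name\t: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
cpu MHz\t\t: 1400.000
cache size\t: 6144 KB
bogomips\t: 3990.92
"""

def parse_cpuinfo(text):
    """Split the file into blank-line-separated blocks, one per CPU,
    then split each "key : value" line at the first colon."""
    cpus = []
    for block in text.strip().split("\n\n"):
        rec = {}
        for line in block.splitlines():
            key, _, val = line.partition(":")
            rec[key.strip()] = val.strip()
        cpus.append(rec)
    return cpus

for cpu in parse_cpuinfo(SAMPLE):
    print(cpu["processor"], cpu["cpu MHz"], cpu["cache size"], cpu["bogomips"])
```

The MHz/cache/bogomips table above is exactly this data, reformatted one CPU per row.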

Note that CPU0 is running at 2GHz - to be expected, although slightly strange.
It's strange because this is the cpu that the proc command
is instantaneously running on. proc doesn't use much cpu, but the kernel
has ramped up the clock to give it speed. (Note that, as an i7, this
CPU should be able to turbo up to 2.9GHz, but I haven't seen evidence in
/proc/cpuinfo that this occurs.)

Note also that cpus 2-7 are idle (800MHz is the lowest speed
without actually sleeping).
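On a reasonably modern kernel, the per-CPU clock can also be read from sysfs cpufreq. A sketch, assuming a cpufreq driver is loaded (scaling_cur_freq reports kHz, and the function simply returns an empty dict where the files don't exist):

```python
import glob
import re

def khz_to_mhz(khz):
    # sysfs cpufreq files report frequency in kHz.
    return khz / 1000.0

def cur_mhz():
    """Return {cpu_index: current MHz} from sysfs cpufreq.
    Returns an empty dict on systems without the cpufreq files."""
    freqs = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"
    for path in glob.glob(pattern):
        n = int(re.search(r"cpu(\d+)", path).group(1))
        with open(path) as f:
            freqs[n] = khz_to_mhz(int(f.read()))
    return freqs

for n, mhz in sorted(cur_mhz().items()):
    print("CPU%d %8.3f MHz" % (n, mhz))
```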

CPU1 is running at 1.4GHz - I have a backup job running in another
window. The question is - *what is cpu1?* I presume it's the
hyperthreaded cpu, and therefore should run slower than cpu0. Ideally,
jobs should run on: cpu0, cpu2, cpu4, cpu6, cpu1, cpu3, cpu5, cpu7, in that
order.
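The kernel does expose this: each logical CPU's thread_siblings_list in sysfs names the logical CPUs sharing its physical core, so you can see directly whether cpu0's sibling is cpu1 or cpu4. A sketch (the file may use commas or ranges, e.g. "0,4" or "0-1"; the function returns an empty list where sysfs is absent):

```python
import glob

def parse_siblings(text):
    """Parse a siblings list like "0,4" or "0-1" into a list of CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

def hyperthread_pairs():
    """Return the sibling tuples, one per physical core (deduplicated,
    since each logical CPU in a core reports the same list)."""
    pairs = set()
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        with open(path) as f:
            pairs.add(tuple(parse_siblings(f.read())))
    return sorted(pairs)

print(hyperthread_pairs())
```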

The question in my mind - what is hyperthreading? Is it an attribute
of the cpu, which is fixed, or does it meander from one cpu to another?
If the hyperthreaded sibling is solely virtual, then one can deduce
that, on this system, we should see unequal performance as the 5th cpu
is made to do work.

I just did a test (seeing how many "counts" we can do per second), and
ran 5 of them in parallel. Certainly, one of them was not as busy as
the other 4. [This was not a good test, since the counter-loop doesn't
exercise cache-misses or hyperthread ability, but relies solely
on the Linux scheduler to place the processes.]
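A rough reconstruction of that counter test (my sketch, not the original): each worker spins for a fixed wall-clock interval and reports how far it counted, so markedly unequal totals across 5 workers on a 4-core/8-thread machine would hint at sibling contention:

```python
import time
from multiprocessing import Pool

def count_for(seconds):
    """Spin incrementing a counter until the wall clock expires,
    and return how high we counted."""
    n = 0
    deadline = time.time() + seconds
    while time.time() < deadline:
        n += 1
    return n

if __name__ == "__main__":
    # Run 5 counters in parallel, as in the test described above.
    with Pool(5) as pool:
        counts = pool.map(count_for, [0.5] * 5)
    for i, c in enumerate(counts):
        print("worker %d: %d counts" % (i, c))
```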

Definitely requires more investigation to understand the effects.

Post created by CRiSP v10.0.12a-b6036


3 comments:

  1. What is hyperthreading? I'm sure this has changed in recent years, but here's what I know about it.

    CPUs began with simple architectures where a single command occupied the entire cpu until it was completed. As though a supermarket could only allow one customer with his/her list (command) at a time in the building.

    In order to increase throughput of the instruction stream, CPUs became "micro coded", where a single external command was executed as a sequence of simpler internal micro commands. This allowed the advent of the pipelined architecture where parts of several commands could be in-process simultaneously as they moved through a fixed sequence of cpu functions (one command could be doing a memory fetch, while another was in the ALU, and a third was working in the floating point unit). Unfortunately, not every command needed every processing unit, so while you were pushing more commands through the CPU with each clock tick, a lot of the processing units were still idle (a memory fetch didn't need the FPU or ALU, even though they were required stops in the pipeline).

    In our market analogy, it's as though a supermarket was arranged as one long aisle-- canned-goods, frozen-foods, butcher, dairy, bakery, and produce in sequence, and customers could enter with their list (command) as they pleased, but had to walk in a single line through all sections to reach a single cash register.

    In current superscalar architectures, the rigid pipeline is gone and the number of processing units has increased. There may be several FPUs, branch units, memory interfaces and ALUs in the CPU. Each cpu micro command arrives at a dispatch station which determines what unit it needs and dispatches it to that unit's command queue. The commands don't have to pass through a rigid pipeline and so, for example, the FPU's queue only contains commands that need to use it.

    This is the modern supermarket, where customers can come in the door, and go to just the sections they need to satisfy their list and head to the cashier.

    But in a CPU, you can't have multiple cashiers. Results must come out of the CPU in the same order that the commands went in. Our supermarket metaphor breaks down a bit here. So we have a single cashier, and the customers take a ticket when they walk through the door, which is their reservation for the cashier. This way they will leave in the same order they arrived, even if the first customer shops for a week's worth of groceries and the second only needs a carton of milk.

    Even with this architecture, many of the processing stations in the CPU go idle, so hyperthreading was invented--a second stream of commands into the processor, with its own dispatch station and its own output stream. Both streams share the resources of the CPU--with some contention, but over all it works quite well.

    In the supermarket analogy, we have opened a second door inside and added a cashier. Customers who enter by the new door get a reservation for the new cashier. Except for contention at the butcher and bakery, customers pass from their door to their cashier as though they were in separate supermarkets.

    So hyperthreading is real. There are second instruction and output streams for the chip. Using the second processor increases the overall internal utilization and throughput of the chip, however it does have the potential to reduce the throughput of the first stream.

  2. Thanks Jim. The question I have in my mind is: can you distinguish the primary cpu from the hyperthread -- or is this symmetric? In your analogy, they are different and static. I am not sure how to validate which model is appropriate (and the model may change from one model of cpu to another).

  3. I expect it's symmetric, but I don't really know the answer to that. I'm surprised that the two streams can clock at different frequencies, but now that I think about it there's plenty of stream maintenance circuitry that can have an independent clock. As I said, it's been several years since I dealt with this.

    As for CPU allocation models, for outright performance, each core should have one stream (cpu) through it until you run out of cores, and then start allocating the second cpu on the cores. For power saving, the cores should have both cpus filled before moving to the next core--that is, if you can tell which cpu is operating on which core. I suppose that's part of why you're asking how to tell them apart. I haven't personally succumbed to the lure of the i5 or i7 CPU--mostly because I've spent my mad money lately on other toys (tablets)--but we're starting a machine upgrade cycle at work and I ought to have something like a desktop i7 before too long.
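    Those two allocation policies are easy to express once you have the sibling pairs. A sketch (the pairing [(0, 4), (1, 5), (2, 6), (3, 7)] is an assumption for this i7, not verified; check the topology files in sysfs on the real machine):

```python
def allocation_order(sibling_pairs, power_save=False):
    """Order logical CPUs for handing out work.

    Performance: one stream per physical core first, then the siblings.
    Power save: fill both streams of a core before waking the next core.
    """
    if power_save:
        return [cpu for pair in sibling_pairs for cpu in pair]
    return [p[0] for p in sibling_pairs] + [p[1] for p in sibling_pairs]

pairs = [(0, 4), (1, 5), (2, 6), (3, 7)]        # assumed topology
print(allocation_order(pairs))                   # performance order
print(allocation_order(pairs, power_save=True))  # power-saving order

# To actually pin a process to the chosen cpu on Linux:
#   os.sched_setaffinity(0, {allocation_order(pairs)[0]})
```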
