This story starts back in 2008, when I bought an Asus M3A78 Pro motherboard with an AMD 4850e processor. I was very happy with it until a few weeks ago, when I started streaming 1080p video to my PlayStation: the processor couldn't keep up. I needed an upgrade. I was looking into the more expensive AMD e-series processors because of their low TDP values. However, last week I realised I was too focussed on TDP. Those values don't say much about idle power consumption. In fact, I read that the Phenom II processors with C3 stepping have better power management... So I decided to go for the AMD Phenom II X4 965 Black Edition (which means: no multiplier lock) instead of the Phenom II X4 905e (slower, marketed as power efficient, and in my country just as expensive).
4850e vs. 965 BE
The 4850e is an Athlon X2 processor and the 965 BE is a Phenom II X4. Technically, the two processor types don't differ much. Okay, an Athlon X2 has two cores versus four on the Phenom II X4. And the Phenom II has L3 cache, but that doesn't always mean higher performance. One real difference, however, is the HyperTransport speed: 1000 MHz for the Athlon X2 versus 2000 MHz for the Phenom II (at least the 965 BE). With that in mind I figured that with two cores disabled and the clock brought back to the speed of the 4850e, it would be almost as power efficient. But it wasn't that simple...
Well, at the time I bought the 4850e, that processor was also oversized for the system. I didn't need 'all that power'. But now it isn't even powerful enough for 1080p video transcoding. So I figured: for now I only need two cores, and the extra cores I can use in the future when needed.
When I first booted I saw that the 965 drew 12~15 watts more than the 4850e at idle... Disabling two cores in the BIOS didn't help, clocking it back didn't help... nothing.
When I saw all the BIOS options about NB (NorthBridge) and HT (HyperTransport), I really didn't know what to do with them: in my time there was only an FSB speed and a multiplier. Nothing more. I first needed to learn more about the Phenom II.
The Phenom II architecture
I surfed around the net, read some white papers and gathered the most valuable information from forums to assemble this picture. It's an architectural view of the Phenom II processor with the chipset of my motherboard.
There are different buses, with different speeds. They can all be controlled by setting a multiplier in the BIOS. This multiplier times the reference speed (or FSB, also configured in the BIOS, typically 200 MHz) is the bus speed.
There is the processor speed, which is the speed at which the processor runs internally.
There is the HyperTransport or HT speed, which is the speed at which the processor communicates with the chipset (NorthBridge) of the system. The NorthBridge connects to the PCI Express bus to communicate with the video card(s) or onboard video chip. In my case the chipset has an integrated video chip.
There is the NB bus speed, which is typical for the Phenom processors, as they have a NorthBridge controller integrated. It's the speed at which the cores communicate with each other (if I'm correct) and with the L3 cache and the NorthBridge controller. A very important note: the NB speed should be equal to or higher than the HT speed, for the simple reason that it otherwise can't handle the data coming from the NorthBridge.
And there is the good old memory bus speed, at which the memory runs.
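The multiplier rule and the "NB speed at least as high as HT speed" rule above can be sketched in a few lines of shell. This is only an illustration: the function names and the NB multiplier of 10 are my own assumptions, and it only handles whole-number multipliers.

```sh
#!/bin/sh
# Sketch: derive bus speeds from the reference clock and a multiplier,
# then check the "NB speed >= HT speed" rule.
# Function names and sample multipliers are illustrative assumptions.

bus_speed() {
    # $1 = reference clock in MHz, $2 = integer multiplier
    echo $(( $1 * $2 ))
}

check_nb_ht() {
    # $1 = NB speed in MHz, $2 = HT speed in MHz
    if [ "$1" -ge "$2" ]; then
        echo "ok"
    else
        echo "NB too slow"
    fi
}

REF=200                      # typical reference clock
HT=$(bus_speed "$REF" 2)     # multiplier 2 gives 400 MHz
NB=$(bus_speed "$REF" 10)    # assumed NB multiplier, gives 2000 MHz
echo "HT=${HT} MHz, NB=${NB} MHz: $(check_nb_ht "$NB" "$HT")"
```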
Looking at this picture I realised that a server doesn't need a high HT speed, because video performance is not important (with a small side note: if you have an onboard Nvidia chip you can use it through a special API to do video encoding/decoding, for example). And the HT speed probably isn't a bottleneck for hard disk throughput either. With that in mind I started to configure the BIOS.
I regularly do some disk transfer tests on my 5-disk (software) RAID5 array. The numbers fluctuate because I run the test while the RAID array is mounted. But anyway.
Over 2009 I gathered the following numbers (MB/s, with hdparm -t). That is with HT speed at 1000 MHz and with the 4850e processor.
Currently I get these numbers (HT speed @ 400 MHz with 965 BE processor):
It's not a very representative test, but at least the performance didn't drop noticeably.
I have an Asus M3A78 Pro motherboard, but the BIOS options on AM2/2+/3 motherboards of other brands or types are similar. I started by disabling everything I'm not using and lowering the HT speed.
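Because single runs fluctuate, averaging a few of them gives a steadier figure. A minimal sketch, assuming each hdparm -t result line ends in "= <number> MB/sec"; the function name avg_mbs and the device /dev/md0 are placeholders of mine, not something from my setup notes.

```sh
#!/bin/sh
# Sketch: average the MB/sec figures from several `hdparm -t` runs.
# Assumes each result line ends in "= <number> MB/sec".

avg_mbs() {
    # Reads hdparm output on stdin and prints the mean throughput.
    awk '/MB\/sec/ { sum += $(NF-1); n++ }
         END { if (n) printf "%.1f\n", sum / n }'
}

# Usage (as root; /dev/md0 is a placeholder for your array):
# for i in 1 2 3; do hdparm -t /dev/md0; done | avg_mbs
```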
Here is an overview of the changes I made to the BIOS. The names of the options are in italic. Below there is a link to screenshots of all my BIOS settings.
I enabled Cool'n'Quiet (this enables frequency throttling)
I enabled C1E (this allows the cores to shut down when idle; the C3 stepping has a hardware implementation without the performance penalty the pre-C3 CPUs suffer from)
I disabled the audio chip
I didn't disable the serial chip (I am using it for communicating with the UPS)
I set the HT speed to 400 MHz (with 200 MHz my system didn't boot)
I left the NB speed at auto (I don't want to lower my L3 cache speed)
I increased the scrub times of my ECC memory (the lower the scrub times, the more power it consumes)
I enabled ACPI 2.0
I set the fan control to silent
I lowered the graphics clock (GFX Engine Clock) to 150 (lowest setting)
I disabled the NB Azalia (Azalia is the codename for HD Audio, so this is sound-related)
Note that I didn't undervolt anything, nor did I disable cores at this moment. I do that from Linux.
Please note that in some BIOSes the HT and NB speeds are given in MHz rather than as multipliers. When you increase the reference speed (FSB) of your board (typically 200 MHz) you have to take this into account.
For example, the HT speed setting in my BIOS is called CPU-NB HT Link Speed and has settings from 200 to 2200. 200 means multiplier 1, 400 means multiplier 2, etc. So if you set this to 2200 and your FSB to 225, this actually means that the HT speed will be 11 x 225 = 2475 MHz.
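The same arithmetic as a small shell function, for checking what a BIOS label really means once the reference clock is raised. The function name is made up for this sketch, and it assumes the labels are always multiples of 200.

```sh
#!/bin/sh
# Sketch: convert a BIOS HT label (given in MHz at a 200 MHz reference)
# into the effective HT speed at the actual reference clock.
# The function name is an illustrative assumption.

ht_effective() {
    # $1 = BIOS label (200..2200), $2 = actual reference clock in MHz
    mult=$(( $1 / 200 ))
    echo $(( mult * $2 ))
}

ht_effective 2200 225   # prints 2475
ht_effective 400 200    # prints 400
```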
As promised, here are the screenshots of all my BIOS settings.
The Phenom II has four power states: P0, P1, P2 and P3. P3 is the lowest power state, in which the processor runs as slowly as possible; 800 MHz is typical. P0 is the highest state, in which the processor runs at its maximum (3400 MHz in the case of the 965 BE). Phenom II processors before the C3 stepping could only throttle the core frequencies for all cores simultaneously: when one core was busy, the other cores also ran at the same frequency. This has been fixed in the C3 stepping of the Phenom IIs.
For the 965 BE the frequencies for the different powerstates are:
P0: 3.40 GHz @ 1.4V
P1: 2.70 GHz @ 1.3V
P2: 2.20 GHz @ 1.2V
P3: 800 MHz @ 1.1V
With the Linux tool k10ctl the frequencies and voltages of the cores can be set for each power state (for Windows there's k10stat). I currently use the same frequencies as in the table above, but lowered the voltage for each power state by 0.1 V. To allow frequency throttling, note that the selected governor (man cpufreq-set) should be either 'powersave' or 'ondemand'. I prefer 'ondemand', which gives a better maximum performance than 'powersave'.
The integrated NorthBridge also has power states: P0 and P1. Instead of a multiplier it uses a divider (or divisor, what is the correct name?). In P0 the NorthBridge runs at full speed, in P1 it runs at half speed. Half the speed the NorthBridge is configured to in the BIOS, that is. With k10ctl you can also configure this for each CPU power state. However, it seems that this divider and the NB voltage cannot be changed after the system has booted.
Because k10ctl is a bit difficult to use, I wrote a wrapper script for it. Using k10ctl and my wrapper script you can easily create different profiles. For example:
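For checking or switching the governor without cpufreq-set, the kernel's cpufreq sysfs files can be used directly. A minimal sketch: the function names and the SYSFS_CPU variable are my own, and writing the governor needs root.

```sh
#!/bin/sh
# Sketch: show and set the cpufreq governor through sysfs.
# SYSFS_CPU and the function names are illustrative assumptions.
# With cpufrequtils installed, `cpufreq-set -c 0 -g ondemand` does the same for core 0.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}

show_governors() {
    for f in "$SYSFS_CPU"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        cpu=${f#"$SYSFS_CPU"/}
        echo "${cpu%%/*}: $(cat "$f")"
    done
}

set_governor() {
    # $1 = governor name, e.g. ondemand (needs root on a real system)
    for f in "$SYSFS_CPU"/cpu[0-9]*/cpufreq/scaling_governor; do
        if [ -w "$f" ]; then
            echo "$1" > "$f"
        fi
    done
}

show_governors
```

Running `set_governor ondemand` as root and then `show_governors` should list 'ondemand' for every core that supports frequency scaling.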
My 'allround' profile (k10_allround.sh):
# P3: 800 MHz
k10ctl_wrapper 3 900 4 > /dev/null
# P2: 2000 MHz
k10ctl_wrapper 2 1100 10 > /dev/null
# P1: 2800 MHz
k10ctl_wrapper 1 1200 14 > /dev/null
# P0: 3400 MHz
k10ctl_wrapper 0 1300 17 > /dev/null
My 'performance' profile (k10_performance.sh):
# P3: 2.8 GHz
k10ctl_wrapper 3 1300 14 > /dev/null
# P2: 3.2 GHz
k10ctl_wrapper 2 1350 16 > /dev/null
# P1: 3.8 GHz
k10ctl_wrapper 1 1450 19 > /dev/null
# P0: 4.0 GHz
k10ctl_wrapper 0 1550 20 > /dev/null
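To sanity-check a profile before applying it, the arguments can be translated back into plain numbers. This sketch does not call k10ctl at all. It assumes, going by the comments in the profiles, that the arguments are the power state, the voltage in millivolts and a multiplier on a 200 MHz reference; describe_pstate is a hypothetical helper, not part of the wrapper.

```sh
#!/bin/sh
# Sketch: print what a profile line would mean, without touching the CPU.
# Assumption: arguments are <pstate> <millivolts> <multiplier>, with the
# frequency being multiplier x 200 MHz. describe_pstate is hypothetical.

describe_pstate() {
    awk -v p="$1" -v mv="$2" -v m="$3" \
        'BEGIN { printf "P%d: %d MHz @ %.3f V\n", p, m * 200, mv / 1000 }'
}

describe_pstate 3 900 4     # P3: 800 MHz @ 0.900 V
describe_pstate 0 1300 17   # P0: 3400 MHz @ 1.300 V
```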
The nice thing about k10ctl is that you can select a frequency and voltage per power state. When I change the voltage or frequency from the BIOS, it only changes it for P0.
In Linux you can also disable cores on the fly. I'm still experimenting with it, but it gave me some kernel Oops errors and odd behaviour. To disable a core:
To enable a core:
I don't know what happens if you disable a core that is actually doing some work... It seems to work, though. One side note: idle power consumption does not decrease when I disable one or more cores. Maybe that's because the C1E in the 965 BE C3 does its work correctly and there isn't much left to save.
I didn't do any scientific tests on this. Also, my system has a RAID5 with 5 hard disks + 1 spare (in standby), so comparing it to other systems would be difficult. I did some tests, however; the results can be found here.
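On Linux, taking cores offline and online normally goes through the CPU hotplug files in sysfs. A minimal sketch of that mechanism: the function name and the SYSFS_CPU variable are my own, it needs root, and on many x86 systems cpu0 has no 'online' file and cannot be taken offline.

```sh
#!/bin/sh
# Sketch: enable or disable a core through the CPU hotplug sysfs interface.
# SYSFS_CPU and set_core_online are illustrative assumptions.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}

set_core_online() {
    # $1 = core number, $2 = 1 (enable) or 0 (disable)
    f="$SYSFS_CPU/cpu$1/online"
    if [ ! -w "$f" ]; then
        echo "cannot write $f (not root, or core not hot-pluggable)" >&2
        return 1
    fi
    echo "$2" > "$f"
}

# set_core_online 3 0   # take core 3 offline
# set_core_online 3 1   # bring it back online
```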
I can tell you that my server with 4850e (including RAID5, network switch and UPS overhead) drew 105 watts idle (measured with the UPS). Now with the 965 BE it draws 108 watts idle. That's with 400 MHz HT speed and the k10_allround profile mentioned above.
AMD Performance Tuning Guide
At the end of this article I've written up my findings on using k10ctl
Here is more information about my home Linux server (Dutch)
Last update notes
Latest scripts can be found here, no support