21.7.10

VIA Epia Homeserver Project - HDD Power Saving

The power consumption of a hard disk cannot be neglected in a system designed for low power consumption. For the operating system, a flash-based drive (SSD or CF disk) is a good solution, yet for high-volume storage this is usually not affordable. Recently, hard disk drives in the terabyte range designed specifically for power efficiency have been announced, such as Western Digital's Greenpower series or Samsung's Ecogreen drives. They offer a substantial decrease in power consumption (up to 40% is claimed) at the expense of somewhat slower operation. This is achieved, for instance, by spinning at only 5400rpm instead of 7200rpm, which is still reasonable for media storage or backup and even for office applications.

This tutorial is about setting up a Samsung Ecogreen HD154UI drive in my VIA Epia homeserver. The goal was to make use of the power saving features that the drive has to offer using the native Linux tool hdparm.
This is a very powerful (and therefore dangerous!) program that can read or set many parameters of PATA or SATA drives in order to tune performance or power consumption, or even to hot swap disks. At the time of writing I use hdparm v8.9; note that the syntax might change in later versions. It is also important that the proper driver for the PATA or SATA chipset is either compiled into the kernel or loaded as a module.

All functionality of hdparm is available from the command line, yet it is better to put the parameters in /etc/hdparm.conf, so that they are applied at every system boot. Mine looks like this:

/dev/sda {
    dma = on
    interrupt_unmask = on
    mult_sect_io = 16
    write_cache = on
    transfer_mode = 70
    io32_support = 1
    apm = 1
    spindown_time = 60
}

Here dma = on selects DMA instead of PIO (this is the default anyway). interrupt_unmask = on allows the system to service other interrupts while a disk interrupt is being processed. mult_sect_io = 16 sets the number of sectors transferred per interrupt (it is safe to set it to the maximum, which the drive reports as MaxMultSect). write_cache = on enables the write cache of the drive. This might be dangerous if a power cut occurs, yet it is usually a good idea to set. transfer_mode = 70 selects UDMA6; the values for the other UDMA, DMA and PIO modes are listed under the -X option in the hdparm man page. io32_support = 1 enables 32-bit I/O transfers and should be set as well.
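The numbers used for transfer_mode follow a simple pattern, which can be sketched as a small shell function. The name xfer_mode is my own invention and not part of hdparm; the offsets (8 for PIO, 32 for multiword DMA, 64 for UltraDMA) are the ones described for the -X option in the hdparm man page, so please check them against the man page of your version.

```shell
# xfer_mode: hypothetical helper, not part of hdparm.
# Prints the numeric value for transfer_mode (hdparm -X), assuming the
# encoding from the hdparm man page:
#   PIO mode n -> 8 + n, multiword DMA mode n -> 32 + n, UltraDMA mode n -> 64 + n
xfer_mode() {
    kind=$1   # one of: pio, mdma, udma
    n=$2      # mode number, e.g. 6 for udma6
    case "$kind" in
        pio)  echo $((8 + n)) ;;
        mdma) echo $((32 + n)) ;;
        udma) echo $((64 + n)) ;;
        *)    echo "unknown mode type: $kind" >&2; return 1 ;;
    esac
}
```

For example, xfer_mode udma 6 prints 70, the value used in my hdparm.conf.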

The rest are the power saving parameters. apm = X sets how aggressively the drive enters power saving modes: 1 gives the lowest consumption and 254 the highest performance, while 255 disables advanced power management altogether. Note that with X equal to 128 or above, no spindown is allowed. As I use my disk mostly for storage, I chose the lowest power setting, X=1. spindown_time sets the length of inactivity before the drive spins down in units of 5 seconds, so that 60 means a timeout of 5 minutes. If you often find yourself waiting for the drive to spin up, consider setting this parameter higher.
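The 5 second unit only holds for spindown_time values up to 240; above that the encoding changes. The following is a rough sketch of the conversion, assuming the encoding given for the -S option in the hdparm man page. The function name spindown_seconds is hypothetical and only serves as an illustration.

```shell
# spindown_seconds: hypothetical helper that converts a spindown_time
# (hdparm -S) value into seconds, assuming the encoding from the man page:
#   0        timeout disabled
#   1..240   multiples of 5 seconds
#   241..251 1..11 units of 30 minutes
#   252      21 minutes
#   255      21 minutes 15 seconds
#   253/254  vendor-defined or reserved
spindown_seconds() {
    v=$1
    if [ "$v" -eq 0 ]; then
        echo "disabled"
    elif [ "$v" -le 240 ]; then
        echo $(( v * 5 ))
    elif [ "$v" -le 251 ]; then
        echo $(( (v - 240) * 30 * 60 ))
    elif [ "$v" -eq 252 ]; then
        echo $(( 21 * 60 ))
    elif [ "$v" -eq 255 ]; then
        echo $(( 21 * 60 + 15 ))
    else
        echo "vendor-defined or reserved"
    fi
}
```

With my setting, spindown_seconds 60 prints 300, that is, 5 minutes.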

It is very important to note that not only disk I/O but any disk operation, including checking the drive health status with smartmontools, will cause the drive to spin up. This means that periodic checking of drive health or temperature will render this setting essentially useless.
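One way to see whether the drive is actually spun down is the -C option of hdparm, which reports the current power mode. As far as I can tell it does not wake the drive, but I have not verified this on every drive. A minimal sketch for extracting just the state from its output follows; drive_state is a hypothetical name, and the output format it expects ("drive state is:  standby") is the one printed by my hdparm version, which may differ in others.

```shell
# drive_state: hypothetical helper. Reads the output of "hdparm -C <device>"
# on stdin and prints only the reported power state, e.g. "standby" or
# "active/idle". Assumes a line of the form " drive state is:  <state>".
drive_state() {
    sed -n 's/^[[:space:]]*drive state is:[[:space:]]*//p'
}
```

It would be used as hdparm -C /dev/sda | drive_state (as root). smartmontools also documents an option, smartctl -n standby, that is meant to skip the check when the drive is in standby; I have not tested how well it works with this drive.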

If hdparm.conf is set up correctly, the new parameters take effect at the next boot and can be checked:

~#: hdparm -i /dev/sda

/dev/sda:

 Model=SAMSUNG HD154UI                         , FwRev=1AG01118, SerialNo=S1XWJ1KZ300133     
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
 BuffType=DualPortCache, BuffSize=32767kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=18446744072344861488
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-3,4,5,6,7

 * signifies the current active mode


20.7.10

VIA Epia Homeserver Project - CPU Frequency Scaling

Even though the VIA Epia platform is designed to be very power efficient out of the box, it can be made even better. One tool for this is CPU frequency scaling, which is supported by both recent Linux kernels and the Eden CPU. Specifically, the processor of the EN12000EG board runs at 1200MHz by default, but can go as low as 400MHz, resulting not only in reduced power consumption but in less heat as well. This comes in especially handy for fanless systems, where it results in a lower die temperature and a longer life span.

System setup

In order to utilize this feature, the following kernel options have to be compiled in (=y) or built as modules (=m):
X86 CPU: CONFIG_X86_CPU=y 
VIA C7 CPU: CONFIG_MVIAC7=y

ACPI: (not sure if these are necessary, yet they won't hurt...)
CONFIG_ACPI=y 
CONFIG_ACPI_PROCESSOR=y

CONFIG_X86_PCC_CPUFREQ=y
CONFIG_X86_ACPI_CPUFREQ=y

(Note: "# CONFIG_X86_E_POWERSAVER is not set" is deliberate, because this driver is considered dangerous: no hardware limits are taken into account while tuning the CPU clock. However, I did not find any reports or observations on this driver.)

CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y

The options above set the governors that can be used later on. As you can see, there are five alternatives: Performance, Powersave, Userspace, Ondemand and Conservative.
The characteristics of each governor are the following:
  • Performance: simply sets the highest available CPU frequency. Usually this is the default in stock kernels.
  • Powersave: sets the lowest available frequency no matter what the system load is.
  • Userspace: the CPU clock can be set manually through the /sys interface.
  • Ondemand: switches between the highest and lowest frequency according to the system load. The corresponding thresholds can be set via the /sys interface.
  • Conservative: like Ondemand, but it changes the frequency gradually, so it can set intermediate frequency values as well. The trade-off is higher latency.
The default governor is selected with the corresponding CONFIG_CPU_FREQ_DEFAULT_GOV_ option; here I chose Conservative. Only this governor is discussed in detail in this tutorial, but most of what follows applies to the others as well.
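The governor does not have to be fixed at compile time: it can also be changed on a running system through the cpufreq nodes scaling_available_governors and scaling_governor in /sys. A minimal sketch of how this could be scripted is below. The function name set_governor and the choice of passing the cpufreq directory as an argument are my own assumptions for illustration; writing under /sys needs root.

```shell
# set_governor: hypothetical helper.
#   $1  cpufreq directory, e.g. /sys/devices/system/cpu/cpu0/cpufreq
#   $2  governor name, e.g. conservative
# Writes the name to scaling_governor only if the kernel lists it in
# scaling_available_governors; otherwise prints an error and returns 1.
set_governor() {
    dir=$1
    gov=$2
    for g in $(cat "$dir/scaling_available_governors"); do
        if [ "$g" = "$gov" ]; then
            echo "$gov" > "$dir/scaling_governor"
            return 0
        fi
    done
    echo "governor '$gov' is not available" >&2
    return 1
}
```

On this box the call would be set_governor /sys/devices/system/cpu/cpu0/cpufreq conservative. Checking the list first avoids a confusing write error when the governor was not compiled in or its module is not loaded.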

Test results

Upon booting with the new kernel and/or loading the corresponding modules, the influence of the governor is already visible (remember, the Conservative governor is set):

#:  cat /proc/cpuinfo
processor    : 0
vendor_id    : CentaurHauls
cpu family    : 6
model        : 10
model name    : VIA Esther processor 1200MHz
stepping    : 9
cpu MHz        : 400.000
cache size    : 128 KB
fdiv_bug    : no
hlt_bug        : no
f00f_bug    : no
coma_bug    : no
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm nx pni est tm2 rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en
bogomips    : 797.99
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 32 bits virtual
power management:

showing that the CPU clock is set to 400MHz. If, however, some CPU-intensive process is started (compiling a kernel or some application is a good check :)), the increased demand results in an increased clock frequency:

#:  cat /proc/cpuinfo
processor    : 0
vendor_id    : CentaurHauls
cpu family    : 6
model        : 10
model name    : VIA Esther processor 1200MHz
stepping    : 9
cpu MHz        : 1200.000
cache size    : 128 KB
fdiv_bug    : no
hlt_bug        : no
f00f_bug    : no
coma_bug    : no
fpu        : yes
fpu_exception    : yes
cpuid level    : 1
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm nx pni est tm2 rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en
bogomips    : 2393.89
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 32 bits virtual
power management:

Detailed statistics are available via the /sys interface:

#: cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
1200000 38445
400000 42604408

which tells us both the available frequency values (in kHz) and the time the CPU has spent in each state, in units of 10ms.
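The raw counters are easier to read as seconds and percentages. The following awk one-off is a sketch of that conversion, assuming the two-column format shown above and the 10ms unit; the function name state_share is made up for this example.

```shell
# state_share: hypothetical helper. Reads time_in_state-style lines
# ("<frequency in kHz> <time in 10 ms units>") on stdin and prints, for each
# frequency, the time in whole seconds and its share of the total time.
state_share() {
    awk 'NF == 2 { n++; freq[n] = $1; t[n] = $2; total += $2 }
         END {
             if (total == 0) exit
             for (i = 1; i <= n; i++)
                 printf "%d kHz: %d s (%.2f%%)\n", freq[i], t[i] / 100, 100 * t[i] / total
         }'
}
```

Fed with the two lines above (state_share < /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state), it prints "1200000 kHz: 384 s (0.09%)" and "400000 kHz: 426044 s (99.91%)".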

#: cat /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table
   From  :    To
         :   1200000    400000
  1200000:         0        51
   400000:        50         0 

This command provides a nice matrix that counts the transitions between each pair of starting (row) and end (column) frequencies. Note that the system boots at the higher frequency, so the one extra transition from 1200000 to 400000 (51 versus 50) means that the clock is currently at the lower value.

The current frequency can be obtained from /proc/cpuinfo as above, but /sys reports it as well:

#: cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
400000

The operation of this governor can be tuned via the nodes in /sys/devices/system/cpu/cpu0/cpufreq/conservative (followed by their default values):

down_threshold (20)
freq_step (5)
ignore_nice_load (0)
sampling_down_factor (1)
sampling_rate (200000)
sampling_rate_max (4294967295)
sampling_rate_min (200000)
up_threshold (80)

I advise being careful when tampering with them, however. As each clock transition takes time, the penalty can be a serious performance degradation if the system is set to change frequency too often. On the other hand, settings that are too lax result in increased power consumption and heat.
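To get a feeling for how up_threshold, down_threshold and freq_step interact, here is a deliberately simplified model of a single sampling step. It is not the kernel's implementation, only my reading of the governor documentation: the function name next_freq is hypothetical, the defaults are the values listed above, and freq_step is assumed to be a percentage of the maximum frequency.

```shell
# next_freq: simplified, hypothetical model of one sampling step of the
# Conservative governor. NOT the kernel code.
#   $1 current frequency (kHz)   $2 load in percent
#   $3 minimum frequency (kHz)   $4 maximum frequency (kHz)
next_freq() {
    cur=$1; load=$2; min=$3; max=$4
    up_threshold=80      # default listed above
    down_threshold=20    # default listed above
    freq_step=5          # assumed to be a percentage of the maximum frequency
    step=$(( max * freq_step / 100 ))
    if [ "$load" -gt "$up_threshold" ]; then
        cur=$(( cur + step ))      # busy: request one step up
    elif [ "$load" -lt "$down_threshold" ]; then
        cur=$(( cur - step ))      # idle: request one step down
    fi
    # Clamp the request to the range supported by the CPU
    [ "$cur" -gt "$max" ] && cur=$max
    [ "$cur" -lt "$min" ] && cur=$min
    echo "$cur"
}
```

For instance, next_freq 400000 90 400000 1200000 prints 460000. In this model the result is only a request; the cpufreq driver still has to map it to a frequency the hardware offers, which is presumably why a system that exposes only two frequencies ends up toggling between them.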

Conclusions

Using the Conservative governor is a reasonable improvement for my VIA Epia box, as the system now spends most of its time (over 99%!) at 400MHz instead of 1200MHz. In theory, some degradation of performance is associated with tuning the CPU frequency (as each clock transition takes time), but this was not observable on my system.

Sadly enough, the governor was not able to tune the CPU clock continuously; it rather switched between the highest and lowest available values. Because of this, the Conservative and Ondemand governors are essentially equivalent on this system. I am not sure whether using CONFIG_X86_E_POWERSAVER instead of the default ACPI cpufreq driver would solve this problem; this remains to be tested.