Script for Undervolt Stress Testing
{{NOTE|Please feel ''very free'' to improve/fix this script. My intent for its posting is to make its ownership as public as possible. There's no need to try to E-mail me to validate your changes. If you feel they are in the best interest of the public, just make the changes. The script attempts to employ pre-conditions to intelligently apply functionality only to those laptops that appear to support it. Hopefully, its framework will allow for extension without heavy redesign.}}
Latest revision as of 12:20, 12 September 2008
This script helps in calibrating voltages when undervolting a Pentium M processor.
People have many different tolerances for how far they will undervolt their system. Some are eager to just run their Pentium-Ms at 700mV and abandon safety; they ramp their systems as far as they can without crashing their system, and maybe they pull the voltages up a margin from the failure point. However, this provides only a weak degree of security as a number of failures can occur that might not surface immediately. In the worst case, the system will fail months later, and the blame might be assigned to, say, a kernel upgrade or patch when really the system failed due to intermittent lack of power.
Many would like to guard themselves against such a failure and consequently have opted to run a prime number stress test such as MPrime (http://www.mersenne.org/prime.htm) in a "torture test" mode, while they ramp down their voltages to find a comfortable margin from the failure point. However, as per recommendations from a thread of the Linux-Thinkpad mailing list (http://mailman.linux-thinkpad.org/pipermail/linux-thinkpad/2006-July/034806.html), perhaps even more can be done. Following such advice, this script not only runs MPrime, but also toggles a number of power-demanding features of the laptop on and off throughout the course of the test. The idea is to expose corner cases in which the system might act up more rapidly.
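The basic shape of that approach can be sketched in a few lines of bash. This is only an illustration, not the script itself: `sleep 30` stands in for `mprime -t`, and the names `CHECKER_PID`, `checker_alive`, and `PHASES_RUN` are hypothetical. A background checker keeps running while the foreground alternates between load phases and verifies after each phase that the checker is still alive.

```shell
#!/bin/bash
# Minimal sketch: a background checker plus alternating load phases.
# 'sleep 30' is a stand-in for 'mprime -t'; all names here are hypothetical.

sleep 30 &
CHECKER_PID=$!

checker_alive()
{
    # kill -0 sends no signal; it only tests whether the process exists.
    kill -0 "$CHECKER_PID" 2> /dev/null
}

PHASES_RUN=""
for phase in high low high low ; do
    # A real test would switch power-hungry features on or off here.
    PHASES_RUN="$PHASES_RUN $phase"
    sleep 0.1
    if ! checker_alive ; then
        echo "checker died during '$phase' phase" >&2
        break
    fi
done

# Stop the stand-in checker and reap it.
kill "$CHECKER_PID" 2> /dev/null || true
wait "$CHECKER_PID" 2> /dev/null || true
echo "phases run:$PHASES_RUN"
```

The full script below follows the same idea, but with much longer phases, real hardware toggles, and frequency changes between phases.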
This page contains a large amount of code. The actual code should be moved to a dedicated code article, to make it easier to download and edit.
#!/bin/bash
#
# DESCRIPTION AND MOTIVATION
# --------------------------
# Designed for undervolted laptops with frequency stepping, this script
# swings the system between aggressive and low power use, and also swings
# among the available frequencies.
#
# The idea is that such extreme use of the system will likely explore corner
# cases where the system might fail.  Hopefully, such testing can curtail the
# time necessary to establish confidence in undervolted systems.
#
# In the background the MPrime program, a prime number search engine, runs in a
# "torture test" mode, in which it tests computations against known results and
# errs out if there's a discrepancy.  Unless it errs out, this script runs
# forever.
#
# IMPLEMENTATION
# --------------
# The design of this script attempts to address laptops beyond the Thinkpad T42
# for which it was designed.  Many of the function definitions are prepended
# with conditionals that check the system for functionality and either bail out
# or disable features accordingly.
#
# In particular, the nature of what "aggressive" constitutes is defined by a
# number of "toggle_" functions.  The prepended conditional to these functions
# appends the function name to $AGGRESSIVE_TOGGLES if the system appears to
# support the feature.  The toggle_aggression function then calls all the
# functions in $AGGRESSIVE_TOGGLES.  Look at these "toggle_" functions for
# examples of how to extend this script for other possible stressing.
#
# EXTERNAL PROGRAMS EMPLOYED
# --------------------------
# Test system integrity (required):  MPrime - http://www.mersenne.org/prime.htm
# Download files:  curl - http://curl.haxx.se
# Read random sectors from CD:  spew (for gorge) - http://spew.berlios.de
# Keep hard disk active:  stress - http://weather.ou.edu/~apw/projects/stress/
#
# EXECUTION
# ---------
# Read this script including all the warnings below, and then make sure all the
# variables in the "Script Globals" section are appropriately set.
#
# This script uses the mprime binary with the "-t" switch for the MPrime
# "torture test."  This test by default uses all the memory available on the
# system.  However, if you run this script for many hours, your kernel may run
# out of memory, and kill mprime and this script.  To spare yourself this
# problem, use the "NightMemory=" and "DayMemory=" parameters in MPrime's
# local.ini file, a file typically in the same directory as the mprime
# executable (read the MPrime documentation for specifics).  The torture test
# by default uses the greater of these two settings, so just set them both a
# reasonable margin away from the total amount of memory available on your
# system.  On a system with 512MB of RAM, I set these parameters both to 448,
# and had enough memory left over to run my normal set of background processes.
#
# The arguments of this script are "aggression" toggles to disable.  Any
# function below that begins with "toggle_$OPTION" can be disabled by using
# $OPTION as one of the arguments of this script.  Otherwise, all the stress
# features that a system supports are enabled by default.
#
# Because of Warning 3 below, I recommend you run this script as
#
#     stress_test 2>&1 | tee output
#
# so that you have a persistent record of what has happened in case your battery
# drains completely.
#
# Keeping in mind Warning 1.1, run the script for as long as it takes to
# establish confidence in your system (a few hours, half a day, etc.).
#
# WARNINGS
# --------
# 1) This is a STRESS test, and it is very possible that you may witness some
# very bad behavior.  Some systems might already be on the verge of breaking,
# and this script might push them over the edge, and damage them irreparably.
# Especially since you've probably undervolted your system, please accept the
# inherent risk in running this script.  In fact, I have even seen some
# unexpected behavior on non-undervolted systems running this script.
#
# 1.1) This is a STRESS test, and it will run your system very hot at times.
# Since you are probably running this test because you've undervolted your
# system, you presumably care a lot about conserving your battery's charge.
# However, running a system hot and needlessly running through charging cycles
# will tax your battery more than just normal use.  It is very difficult to
# even estimate how much of your battery's life you may throw away running
# this test.  In all likelihood on a battery that's not too old or too new, it
# should be imperceptible, and the security you'll gain after running this test
# will be worth it.  You can always run this script without the battery
# connected; just run it with an "ac_via_smapi" argument to disable
# toggling from the ac to battery power.
#
# 2) Please READ THIS SCRIPT BEFORE RUNNING IT.  It was very much designed for my
# personal system, and although it worked very well for my needs, it relies
# heavily on a number of external programs for full functionality.  Finding these
# programs isn't so bad (with the exception of MPrime all were available as
# Debian packages -- spew, gorge, curl, etc.).  As I noted above, I've tried to
# structure this script such that it can be extended (as opposed to overwritten)
# to support other functionality.  However, you should also read this script
# entirely because it's not mature, so it's difficult for me to document all the
# strange ways in which it might behave under various circumstances.
#
# 3) This script might drain your battery completely.  It has some strong measures
# to prevent that from happening, but I can't make guarantees.
#
# 4) Be mindful that upon breaking out of this script, your system may not be
# in an agreeable state.  There is a bash trap that performs a lot of cleanup
# if you exit with a Ctrl-C.  But I didn't make the code to revert the CD's speed,
# the wireless device's original txpower, the display's brightness, etc.  Also, the
# bash trap isn't perfect, and might fail to restore the system.
#

set -e    # Script designed to bail out on any irregularities.


##############################################
# SCRIPT GLOBALS                             #
# (may need some adjusting for your system)  #
##############################################

MPRIME_BIN="./gimps/mprime"   # MPrime binary location (get from
                              #   http://www.mersenne.org/freesoft.htm)
AGGRESSIVE_SLEEP_SEC=90       # Seconds for "aggressive" testing interval when
                              #   testing with a fixed frequency
NONAGGRESSIVE_SLEEP_SEC=120   # Seconds for non-"aggressive" testing interval
                              #   when testing with a fixed frequency
FREQ_CYCLE_SLEEP_SEC=15       # Seconds for each random frequency when testing
                              #   with a fixed aggression
FREQ_CYCLE_NUM=15             # Number of random frequencies to cycle through
                              #   when testing with a fixed aggression
CAPACITY_LIMIT=50             # Minimum mWh required in battery before the script
                              #   takes time out to recharge the battery
SECONDS_TO_CHARGE=300         # Seconds to charge if $CAPACITY_LIMIT is reached
WIFI_DEVICE=eth1              # Set to garbage if you don't want to use wifi
MAX_TXPOWER=20                # Tx power (dBm) used for wifi device in aggressive
                              #   mode (off in non-aggressive mode)
CDROM_DEV_FILE=/dev/hdc       # Set to garbage if you don't want to use the CD-ROM
MAX_CD_SPEED=24               # Speed of CD in aggressive mode (off in
                              #   non-aggressive mode)

# Some services need to be stopped to prevent a conflict with
# aggressive/non-aggressive mode settings.  These services are restarted in
# reverse order upon the script's exit.  You can customize the path to these
# scripts here if your flavor of GNU doesn't use /etc/init.d/.
#
SERVICES_TO_STOP="tpsmapi powernowd acpid sleepd laptop-mode"
PATH_TO_SERVICES_SCRIPTS="/etc/init.d"

# Some info that should be in SysFS or ProcFS.
#
SYS_CPU_DIR=/sys/devices/system/cpu/cpu0/cpufreq
FREQS="$(cat $SYS_CPU_DIR/scaling_available_frequencies)"
FREQS_ARRAY=($FREQS)
SYS_TPSMAPI_BAT_DIR=/sys/devices/platform/smapi/BAT0
IBM_ACPI_BRIGHTNESS_FILE=/proc/acpi/ibm/brightness
RF_KILL_FILE=/sys/class/net/$WIFI_DEVICE/device/rf_kill


############
# BINARIES #
############
#
# Establishes paths for all binaries to make it easier for functions to test if
# they are executable with 'test -x "$BINARY_BIN"'.
#
{
    CURL_BIN=$(which curl)
    GORGE_BIN=$(which gorge)
    STRESS_BIN=$(which stress)
    IWCONFIG_BIN=$(which iwconfig)
    IFUP_BIN=$(which ifup)
    IFDOWN_BIN=$(which ifdown)
    EJECT_BIN=$(which eject)
    HDPARM_BIN=$(which hdparm)
    CPUFREQSET_BIN=$(which cpufreq-set)
    KILLALL_BIN=$(which killall)
    PKILL_BIN=$(which pkill)
    RENICE_BIN=$(which renice)
} || true


#############
# FUNCTIONS #
#############

# clean_up()
#
# Kills mprime background job and starts services that were stopped at the
# beginning of the script's execution.
#
if [ ! -x "$KILLALL_BIN" ]
then
    echo "Sorry, this script uses killall" ; exit 1
fi
for service in $SERVICES_TO_STOP ; do
    if [ ! -x "$PATH_TO_SERVICES_SCRIPTS/$service" ]
    then
        echo "$PATH_TO_SERVICES_SCRIPTS/$service can't be called." ; exit 1
    fi
done
clean_up()
{
    $KILLALL_BIN -q mprime || true
    if [ "$AGGRESSIVE" = "true" ] ; then toggle_aggression ; fi
    # Build the reversed list so services restart in reverse stop order.
    local SERVICES_TO_START=""
    for service in $SERVICES_TO_STOP
    do
        SERVICES_TO_START="$service $SERVICES_TO_START"
    done
    for service in $SERVICES_TO_START
    do
        $PATH_TO_SERVICES_SCRIPTS/$service start
    done
}
trap "echo 'cleaning up...' ; clean_up" SIGINT SIGTERM SIGHUP

# do_sleep()
#
# Before starting a testing interval, checks if the battery is low, and charges
# the battery if necessary.  After the testing interval, the running status of
# the mprime background job is verified.
#
# TODO:  I've not addressed multiple batteries, APM, or ACPI.
#
if [ ! -r "$SYS_TPSMAPI_BAT_DIR/remaining_capacity" ]
then
    echo -n "WARNING:  Thinkpad SMAPI SysFS interface not " > /dev/stderr
    echo "available to detect if battery" > /dev/stderr
    echo -n "          level too low.  This script could drain " > /dev/stderr
    echo "all of your battery." > /dev/stderr
fi
do_sleep()
{
    if [ -r "$SYS_TPSMAPI_BAT_DIR/remaining_capacity" ] ; then
        local REMAINING_CAPACITY
        while REMAINING_CAPACITY=$(cat $SYS_TPSMAPI_BAT_DIR/remaining_capacity \
                    2> /dev/null) \
                && REMAINING_CAPACITY=${REMAINING_CAPACITY%% *} \
                && [ "$REMAINING_CAPACITY" ] \
                && [ "$REMAINING_CAPACITY" -lt "$CAPACITY_LIMIT" ] ; do
            echo ; echo -n "Battery is too low to continue, "
            echo "taking a break to charge up."
            OLD_AGGRESSIVE="$AGGRESSIVE"
            # Drop to non-aggressive mode (AC restored) while charging.
            if [ "$AGGRESSIVE" = "true" ] ; then toggle_aggression ; fi
            sleep $SECONDS_TO_CHARGE
            if [ ! "$OLD_AGGRESSIVE" = "$AGGRESSIVE" ] ; then toggle_aggression ; fi
        done
    fi
    sleep $1
    # kill -0 only tests that the mprime process still exists.
    if kill -0 $MPRIME_PID 2> /dev/null
    then
        return 0
    else
        echo ; echo "mprime bailed out here!"
        clean_up
        exit 1
    fi
}

# set_frequency()
#
# Changes the frequency of the processor to $1.
#
# TODO:  Perhaps there should be other ways to change the frequency.
#        I found cpufreq-set convenient because it handles both ProcFS _and_
#        SysFS.
#
if [ ! -x "$CPUFREQSET_BIN" ] ; then
    echo "Sorry, the set_frequency() function needs to be updated" > /dev/stderr
    echo "   to change frequencies without cpufreq-set." > /dev/stderr
    exit 1
fi
set_frequency()
{
    $CPUFREQSET_BIN -f $1
}

# toggle_ac_via_smapi()
#
# If the system is a Thinkpad with the tp_smapi kernel module set up, the
# ac power is cut in an aggressive mode and returned in the non-aggressive mode.
#
if [ -w "$SYS_TPSMAPI_BAT_DIR/force_discharge" \
        -a -w "$SYS_TPSMAPI_BAT_DIR/inhibit_charge_minutes" ]
then
    AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_ac_via_smapi"
fi
toggle_ac_via_smapi()
{
    if [ "$AGGRESSIVE" = "true" ]
    then
        echo 0 > $SYS_TPSMAPI_BAT_DIR/force_discharge
        echo 0 > $SYS_TPSMAPI_BAT_DIR/inhibit_charge_minutes
    else
        echo 1 > $SYS_TPSMAPI_BAT_DIR/force_discharge
        echo 5 > $SYS_TPSMAPI_BAT_DIR/inhibit_charge_minutes
    fi
}

# toggle_ibm_acpi_brightness()
#
# If the Thinkpad ibm_acpi kernel module is set up, the brightness of screen
# is set to the brightest setting in an aggressive mode and the dimmest setting
# otherwise.
#
if [ -w "$IBM_ACPI_BRIGHTNESS_FILE" ]
then
    AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_ibm_acpi_brightness"
fi
toggle_ibm_acpi_brightness()
{
    if [ "$AGGRESSIVE" = "true" ]
    then
        echo level 0 > $IBM_ACPI_BRIGHTNESS_FILE
    else
        echo level 7 > $IBM_ACPI_BRIGHTNESS_FILE
    fi
}

# toggle_intel_wireless()
#
# Turns the wireless device on in power-hogging mode when aggressive, and
# turns the device off otherwise.
#
# NOTE:  Designed for the Intel 2200BG open source driver, and may not be
#        compatible with much else.
#
if [ -w "$RF_KILL_FILE" -a -x "$PKILL_BIN" -a -x "$IFDOWN_BIN" \
        -a -x "$IFUP_BIN" -a -x "$IWCONFIG_BIN" -a "$WIFI_DEVICE" ] \
        && grep -q "$WIFI_DEVICE" /proc/net/wireless
then
    AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_intel_wireless"
    $IWCONFIG_BIN $WIFI_DEVICE txpower $MAX_TXPOWER
    $IWCONFIG_BIN $WIFI_DEVICE power off
fi
toggle_intel_wireless()
{
    if [ "$AGGRESSIVE" = "true" ]
    then
        echo 1 > $RF_KILL_FILE
    else
        echo 0 > $RF_KILL_FILE
        $PKILL_BIN '^ifdown$|^ifup$' || true
        $IFDOWN_BIN $WIFI_DEVICE 2> /dev/null || true
        $IFUP_BIN $WIFI_DEVICE 2> /dev/null
        # Wait up to 45 seconds for the device to associate.
        local NUM_OF_TRIES=0
        while $IWCONFIG_BIN $WIFI_DEVICE | grep unassociated > /dev/null \
                && [ "$NUM_OF_TRIES" -lt 15 ]
        do
            sleep 3
            NUM_OF_TRIES=$(($NUM_OF_TRIES + 1))
        done
    fi
}

# toggle_gorge()
#
# In an aggressive mode, reads data from the CD-ROM at random offsets using the
# 'gorge' command (http://spew.berlios.de/).
#
# NOTE:  Don't use a DVD, as the speed set by `eject' doesn't affect DVDs.
#
# NOTE:  Make sure to use a CD with more than 450MB of data.
#
if [ -x "$GORGE_BIN" -a -x "$KILLALL_BIN" -a -r "$CDROM_DEV_FILE" ]
then
    AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_gorge"
fi
toggle_gorge()
{
    if [ "$AGGRESSIVE" = "true" ]
    then
        $KILLALL_BIN -q $GORGE_BIN || true
    else
        $GORGE_BIN -r 450M $CDROM_DEV_FILE 2> /dev/null &
        local GORGE_PID=$!
        #
        # My laptop needed a little priority push to get gorge CD reading started
        # in sync with the interval.
        #
        if [ -x "$RENICE_BIN" ]
        then
            $RENICE_BIN -2 -p $GORGE_PID > /dev/null
        fi
    fi
}

# toggle_stress()
#
# Runs the `stress' program (http://weather.ou.edu/~apw/projects/stress/) in
# the aggressive mode with settings to issue a large number of write(),
# unlink(), and sync() events.
#
if [ -x "$STRESS_BIN" -a -x "$KILLALL_BIN" ]
then
    AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_stress"
fi
toggle_stress()
{
    if [ "$AGGRESSIVE" = "true" ]
    then
        $KILLALL_BIN -q $STRESS_BIN || true
    else
        $STRESS_BIN -q -i 1 -d 1 &
    fi
}

# toggle_curl()
#
# Downloads a file (to drain power through the wireless device) in the
# aggressive mode using `curl'.
#
if [ -x "$CURL_BIN" -a -x "$KILLALL_BIN" ]
then
    AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_curl"
fi
toggle_curl()
{
    URL_FIRST_HALF="http://cdimage.debian.org/cdimage/weekly-builds/"
    URL_SECOND_HALF="i386/iso-cd/debian-testing-i386-binary-1.iso"
    if [ "$AGGRESSIVE" = "true" ]
    then
        $KILLALL_BIN -q $CURL_BIN || true
    else
        $CURL_BIN $URL_FIRST_HALF$URL_SECOND_HALF > /dev/null 2> /dev/null &
    fi
}

# toggle_aggression()
#
# Runs all the "toggle_" functions supported by the system unless specified
# as disabled in the script arguments.
#
for toggle_to_disable in "$@"
do
    AGGRESSIVE_TOGGLES=$(echo $AGGRESSIVE_TOGGLES \
        | sed -e "s/toggle_$toggle_to_disable//")
done
toggle_aggression()
{
    for toggle in $AGGRESSIVE_TOGGLES ; do $toggle ; done
    if [ "$AGGRESSIVE" = "true" ]
    then
        AGGRESSIVE="false"
    else
        AGGRESSIVE="true"
    fi
}


#########
# SETUP #
#########

# Stopping services that might interfere with the system state this script
# controls (precondition satisfied in definition of clean_up).
#
for service in $SERVICES_TO_STOP
do
    $PATH_TO_SERVICES_SCRIPTS/$service stop
done

# Setting CD to a fast speed
#
if [ -x "$EJECT_BIN" ]
then
    $EJECT_BIN -x $MAX_CD_SPEED
elif [ -x "$HDPARM_BIN" ]
then
    $HDPARM_BIN -E $MAX_CD_SPEED $CDROM_DEV_FILE
fi

# Starting the prime number search
#
if [ ! -x "$MPRIME_BIN" ] ; then
    echo "mprime program not executable/found." > /dev/stderr
    exit 1
fi
$MPRIME_BIN -t > mprime_output.txt &
MPRIME_PID=$!


########
# BODY #
########

while true ; do
    for f in $FREQS ; do
        echo "Cycling aggression twice for ${f}kHz:  "
        set_frequency $f
        if [ ! "$AGGRESSIVE" = "true" ] ; then toggle_aggression ; fi
        for i in 1 2 ; do
            echo "    high  " ; do_sleep $AGGRESSIVE_SLEEP_SEC ; toggle_aggression
            echo "    low  " ; do_sleep $NONAGGRESSIVE_SLEEP_SEC ; toggle_aggression
        done
        echo
        for i in 1 2 ; do
            if [ $i -eq 1 ]
            then
                if [ ! "$AGGRESSIVE" = "true" ] ; then toggle_aggression ; fi
                echo "Random freqs under high aggression:  "
            else
                if [ "$AGGRESSIVE" = "true" ] ; then toggle_aggression ; fi
                echo "Random freqs under low aggression:  "
            fi
            for (( j=1 ; j<=$FREQ_CYCLE_NUM ; j+=1 )) ; do
                # Pick a random entry from the available frequencies.
                FREQ=${FREQS_ARRAY[$(($RANDOM % ${#FREQS_ARRAY[@]}))]}
                echo "    ${FREQ}..."
                set_frequency $FREQ
                do_sleep $FREQ_CYCLE_SLEEP_SEC
            done
            echo
        done
    done
done
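The "toggle_" extension pattern described in the script's IMPLEMENTATION notes can be tried in isolation with a small sketch. The toggle below, `toggle_busy_loop`, is a hypothetical example that is not part of the script; it uses `yes` writing to /dev/null as a stand-in load, and `YES_BIN` and `BUSY_PID` are illustrative names. The precondition registers the toggle only if the binary is present, and `toggle_aggression` has the same shape as in the script.

```shell
#!/bin/bash
# Sketch of the "toggle_" extension pattern; toggle_busy_loop is a
# hypothetical example, not part of the original script.

AGGRESSIVE="false"
AGGRESSIVE_TOGGLES=""
YES_BIN=$(command -v yes) || true

# Precondition: register the toggle only if the system supports it.
if [ -x "$YES_BIN" ] ; then
    AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_busy_loop"
fi

toggle_busy_loop()
{
    if [ "$AGGRESSIVE" = "true" ] ; then
        # Currently aggressive: switch the load off.
        kill "$BUSY_PID" 2> /dev/null || true
        wait "$BUSY_PID" 2> /dev/null || true
    else
        # Currently non-aggressive: switch the load on.
        "$YES_BIN" > /dev/null &
        BUSY_PID=$!
    fi
}

# Same shape as the script's toggle_aggression().
toggle_aggression()
{
    for toggle in $AGGRESSIVE_TOGGLES ; do $toggle ; done
    if [ "$AGGRESSIVE" = "true" ]
    then
        AGGRESSIVE="false"
    else
        AGGRESSIVE="true"
    fi
}

toggle_aggression   # low -> high: starts the load
echo "aggressive=$AGGRESSIVE"
toggle_aggression   # high -> low: stops the load
echo "aggressive=$AGGRESSIVE"
```

Each toggle function reads `$AGGRESSIVE` as the state being left, which is why the "true" branch switches the load off. In the real script, a toggle named this way could be disabled by passing `busy_loop` as an argument.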