Script for Undervolt Stress Testing

From ThinkWiki
Revision as of 00:16, 13 April 2007 by Fb0Rnq (Talk | contribs)

This script helps in calibrating voltages when undervolting a Pentium M processor.

People differ in how far they are willing to undervolt their systems. Some simply run their Pentium Ms at 700mV and abandon safety: they lower the voltages as far as they can without crashing the system, and perhaps raise them again by a small margin above the failure point. However, this provides only weak assurance, because some failures do not surface immediately. In the worst case, the system fails months later, and the blame is assigned to, say, a kernel upgrade or patch when the real cause was an intermittent lack of power.

Many would like to guard themselves against such a failure and have consequently opted to run a prime number stress test such as MPrime in "torture test" mode while they ramp down their voltages to find a comfortable margin from the failure point. However, as recommended in a thread on the Linux-Thinkpad mailing list, perhaps even more can be done. Following that advice, this script not only runs MPrime, but also toggles many of the laptop's power-demanding features on and off throughout the course of the test. The idea is to expose more rapidly the corner cases in which the system might act up.
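The script's own comments describe how these features are registered: each "toggle_" function is preceded by a conditional that appends its name to $AGGRESSIVE_TOGGLES when the system appears to support it, and toggle_aggression then calls everything in that list. The following is only a minimal sketch of that pattern, not the script's actual code. The `register` helper, the `DISABLED` variable, and the `toggle_demo` and `toggle_missing` functions are hypothetical names introduced here for illustration.

```bash
#!/bin/bash
# Hypothetical sketch of the toggle-registration pattern.  Only the names
# AGGRESSIVE_TOGGLES and toggle_aggression come from the real script;
# everything else here is an illustrative assumption.

AGGRESSIVE_TOGGLES=""
DISABLED=" $* "    # script arguments name the toggles to leave out

# register NAME CHECK...: append toggle_NAME to the list when the CHECK
# command succeeds and NAME was not given as a script argument.
register() {
    local name=$1; shift
    case "$DISABLED" in *" $name "*) return 0 ;; esac
    if "$@" >/dev/null 2>&1; then
        AGGRESSIVE_TOGGLES="$AGGRESSIVE_TOGGLES toggle_$name"
    fi
}

# A feature whose precondition passes ("true" stands in for a real check).
toggle_demo() { echo "demo toggled"; }
register demo true

# A feature whose precondition fails, so it is never registered.
toggle_missing() { echo "never called"; }
register missing false

# Call every registered toggle in turn.
toggle_aggression() {
    local fn
    for fn in $AGGRESSIVE_TOGGLES; do
        "$fn"
    done
}

toggle_aggression
```

Run without arguments, the sketch calls only toggle_demo. Run with `demo` as an argument, it registers nothing, which mirrors how the real script's arguments are described as disabling toggles.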

NOTE!
Please feel very free to improve/fix this script. My intent for its posting is to make its ownership as public as possible. There's no need to try to E-mail me to validate your changes. If you feel they are in the best interest of the public, just make the changes. The script attempts to employ pre-conditions to intelligently apply functionality only to those laptops that appear to support it. Hopefully, its framework will allow for extension without heavy redesign.
ATTENTION!
There are very important warnings embedded into the comments of this script. I have left them there because if you copy this script to your system, I would want you to carry these warnings as comments with you. Please read these comments and the script very carefully. Stress testing an undervolted system is not a trivial undertaking and you need to be as accountable as possible for what a script like this does.

This page contains a large amount of code. The actual code should be moved to a dedicated code article to make it easier to download and edit.

#!/bin/bash
#
# DESCRIPTION AND MOTIVATION 
# --------------------------
# Designed for undervolted laptops with frequency stepping, this script
# swings the system between aggressive and low power use, and also swings
# among the available frequencies.
# 
# The idea is that such extreme use of the system will likely explore corner
# cases where the system might fail.  Hopefully, such testing can curtail the
# time necessary to establish confidence in undervolted systems.
#
# In the background the MPrime program, a prime number search engine, runs in a
# "torture test" mode, in which it tests computations against known results and
# exits with an error if there's a discrepancy.  Unless MPrime exits this way,
# this script runs forever.
# 
# IMPLEMENTATION
# --------------
# The design of this script attempts to address laptops beyond the Thinkpad T42
# for which it was designed.  Many of the function definitions are prepended
# with conditionals that check the system for functionality and either bail out
# or disable features accordingly.
#
# In particular, the nature of what "aggressive" constitutes is defined by a 
# number of "toggle_" functions.  The prepended conditional to these functions
# appends the function name to $AGGRESSIVE_TOGGLES if the system appears to
# support the feature.  The toggle_aggression function then calls all the 
# functions in $AGGRESSIVE_TOGGLES.  Look at these "toggle_" functions for 
# examples of how to extend this script for other possible stressing.
#
# EXTERNAL PROGRAMS EMPLOYED
# --------------------------
# Test system integrity (required):  MPrime - http://www.mersenne.org/prime.htm
# Download files:  curl - http://curl.haxx.se
# Read random sectors from CD:  spew (for gorge) - http://spew.berlios.de
# Keep hard disk active:  stress - http://weather.ou.edu/~apw/projects/stress/
#
# EXECUTION
# ---------
# Read this script including all the warnings below, and then make sure all the
# variables in the "Script Globals" section are appropriately set. 
#
# This script uses the mprime binary with the "-t" switch for the MPrime
# "torture test."  This test by default uses all the memory available on the
# system.  However, if you run this system for many hours, your kernel may run
# out of memory, and kill mprime and this script.  To spare yourself this
# problem, use the "NightMemory=" and "DayMemory=" parameters in MPrime's
# local.ini file, a file typically in the same directory as the mprime
# executable (read the MPrime documentation for specifics).  The torture test
# by default uses the greater of these two settings, so just set them both a
# reasonable margin away from the total amount of memory available on your
# system.  On a system with 512MB of RAM, I set these parameters both to 448,
# and had enough memory left over to run my normal set of background processes.
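#
# For example, on the 512MB system described above, local.ini would contain
# these two lines (values assumed to be in MB; check the MPrime
# documentation for your version):
#
#     DayMemory=448
#     NightMemory=448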
#
# The arguments of this script are "aggression" toggles to disable.  Any
# function below that begins with "toggle_$OPTION" can be disabled by using
# $OPTION as one of the arguments of this script.  Otherwise, all the stressing
# that a system supports is enabled by default.
#
# Because of Warning 3 below, I recommend you run this script as
#
#     stress_test 2>