ETM -- A Program Exception and Termination Manager

Paul DuBois
dubois@primate.wisc.edu
Wisconsin Regional Primate Research Center
Revision date:  10 April 1997

1. Introduction

This document describes the Exception and Termination Manager (ETM), a
simple(-minded) library for managing exceptional conditions that arise
during program execution and for providing orderly program shutdown.

There are at least a couple of approaches one may adopt for handling
error conditions within an application:

-	Have functions always return a value and have all callers test the
return value and respond accordingly.

-	Force the program to give up and exit early.

Each approach has strengths and weaknesses. A difficulty with the first
is that an action composed of many subsidiary actions, each of which may
itself succeed or fail, can easily become very unwieldy when an attempt
is made to handle every possible outcome. On the other hand, such a program
can continue in the face of extreme adversity.

An advantage of the second approach is that it is, conceptually at least,
simpler to let a program die when a serious error occurs. The difficulty
lies in making sure the program cleans up and shuts down properly before
it exits. This can be a problem especially when a program uses a number
of independent modules which can each encounter exceptional conditions
and need to be shut down, and which may know nothing of each other. ETM
is designed to alleviate the difficulties of this second approach.

The general architecture assumed for this discussion is that of an application
which uses zero or more subsystems which may be more or less independent
of each other, and which may each require initialization and/or termination.
Also, other application-specific initialization and/or termination actions
may need to be performed which are unrelated to those of the subsystems,
e.g., temporary files created at the beginning of the application need
to be removed before final termination, network connections need to be
shut down, terminal state needs to be restored.

Ideally, when an application executes normally, it will initialize, perform
the main processing, then shut down in an orderly fashion. This does
not always occur. Exceptional conditions may be detected which necessitate
a "panic" (an immediate program exit) because processing cannot continue
further, or because it is judged too burdensome to try to continue.

An individual subsystem may be easily written such that a panic within
itself causes its own shutdown code to be invoked. It is more difficult
to arrange for other subsystems to be notified of the panic so that they
can shut down as well, since the subsystem in which the panic occurs
may not even know about them.

An additional difficulty is that some exceptions may occur for reasons
not related to algorithmically detectable conditions. For instance, the
user of an application may cause a signal to be delivered to it at any
time. This has nothing to do with normal execution and cannot be predicted.

The goals of ETM are thus twofold:

(1)	Panics triggered anywhere within an application or any of its subsystems
should cause orderly shutdown of all subsystems and the application itself.

(2)	Signals that normally terminate a program should be caught and trigger
a panic to allow shutdown as per (1).

2. Processing Model

The model used by ETM is that the application initializes subsystems
in the order required by any dependencies among them, and then terminates
them in the reverse order. The presumption here is that if subsystem
ss2 is dependent upon subsystem ss1, then ss1 should be initialized
first and terminated last, since the dependency makes it unwise to shut
down ss1 before ss2.
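The ordering rule amounts to keeping shutdown routines on a stack: each
subsystem pushes its shutdown routine as it is initialized, and termination
pops them off. The sketch below illustrates only that idea and is not ETM's
actual implementation; the names (sketch_add_shutdown, sketch_run_shutdown,
ss1_init and so on) and the fixed capacity of 32 are hypothetical.

```c
#include <stdio.h>

/* Hypothetical sketch: a fixed-size LIFO of shutdown routines. */
typedef void (*ShutdownProc)(void);

#define MAX_SHUTDOWN_PROCS 32   /* arbitrary capacity for the sketch */

static ShutdownProc shutdown_procs[MAX_SHUTDOWN_PROCS];
static int n_shutdown_procs = 0;

/* Push a shutdown routine; returns 0 on success, -1 if p is NULL or full. */
int sketch_add_shutdown(ShutdownProc p)
{
    if (p == NULL || n_shutdown_procs >= MAX_SHUTDOWN_PROCS)
        return -1;
    shutdown_procs[n_shutdown_procs++] = p;
    return 0;
}

/* Pop and run every routine, most recently registered first. Each routine
 * is removed before it runs, so it runs at most once. */
void sketch_run_shutdown(void)
{
    while (n_shutdown_procs > 0) {
        ShutdownProc p = shutdown_procs[--n_shutdown_procs];
        (*p)();
    }
}

/* Example: ss2 depends on ss1, so ss1 is initialized first. The log
 * records the order in which the shutdown routines actually run. */
static char order_log[8];
static int order_len = 0;

static void ss1_end(void) { order_log[order_len++] = '1'; }
static void ss2_end(void) { order_log[order_len++] = '2'; }

void ss1_init(void) { sketch_add_shutdown(ss1_end); }
void ss2_init(void) { sketch_add_shutdown(ss2_end); }
```

Calling ss1_init() and then ss2_init() followed by sketch_run_shutdown()
runs ss2_end() before ss1_end(); a second call to sketch_run_shutdown()
does nothing because the stack is already empty.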

ETM must itself be initialized before any other subsystem which uses
it. The initialization call, ETMInit(), takes as an argument a pointer
to a routine which performs any application-specific cleanup not related
to its subsystems, or NULL if there is no such routine.

Each of the subsystems should then be initialized. A subsystem's initialization
routine should call ETMAddShutdownProc() to register its own shutdown
routine with ETM, if there is one. (Some subsystems may require no explicit
initialization or termination. However, if there is a shutdown routine,
you should at least call ETMAddShutdownProc() to register it.)

When the program detects an exceptional condition, it calls ETMPanic()
to describe the problem and exit. ETMPanic() is also called automatically
when a signal is caught. A message is printed, and all the shutdown routines
that have been registered are automatically executed, including the application-specific
one.

ETM is designed to handle shutting down under unusual circumstances,
but it also works well for terminating normally. Instead of calling ETMPanic(),
the application calls ETMEnd(), which takes care of calling all the shutdown
routines that have been registered. This is much like calling ETMPanic(),
except that no error message is printed and ETMEnd() returns to the caller
instead of exiting.
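To make the difference between the two calls concrete, here is a
hypothetical sketch (not ETM's code; sketch_end(), sketch_panic() and the
helper names are invented for illustration). Both run the registered
routines in reverse order; only the panic variant flushes stdout, prints a
message with a trailing newline, and exits, using ETM's documented default
status of 1.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*ShutdownProc)(void);

#define MAX_SHUTDOWN_PROCS 32   /* arbitrary capacity for the sketch */

static ShutdownProc procs[MAX_SHUTDOWN_PROCS];
static int nprocs = 0;

int sketch_add_shutdown(ShutdownProc p)
{
    if (p == NULL || nprocs >= MAX_SHUTDOWN_PROCS)
        return -1;
    procs[nprocs++] = p;
    return 0;
}

/* Run (and forget) all registered routines, newest first. */
static void run_shutdown_procs(void)
{
    while (nprocs > 0)
        (*procs[--nprocs])();
}

/* Normal termination: no message, and control returns to the caller,
 * which can then call exit(0) itself. */
void sketch_end(void)
{
    run_shutdown_procs();
}

/* Abnormal termination: shut everything down, report, and exit. */
void sketch_panic(const char *fmt, ...)
{
    va_list ap;

    fflush(stdout);             /* pending normal output appears first */
    run_shutdown_procs();
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);        /* the message gets a trailing newline */
    exit(1);
}

/* Example shutdown routine that just counts how often it runs. */
static int shutdown_calls = 0;

static void count_shutdown(void)
{
    shutdown_calls++;
}
```

Because both paths consume the list, a program that calls sketch_end()
and later panics will not run the same shutdown routine twice.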

It is evident that the functionality provided by ETM is somewhat like
that of the atexit() routine provided on some systems. Some differences
between the two are:

-	atexit() is either built in or not available. ETM can be put on any
system to which it can be ported (extent unknown, but includes at least
SunOS, Ultrix, Mips RISC/os and THINK C).

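-	ETM is designed specifically for handling exceptional conditions:
ETMPanic() prints a message and runs the registered shutdown routines
before exiting, and ETMInit() arranges for fatal signals to trigger a panic.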

-	ETM shutdown routines can be installed and removed later. atexit()
provides only for installation (although you could simulate removal by
setting a flag which shutdown routines examine to see whether to execute
or not).
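For comparison, the flag-based workaround mentioned above looks roughly
like this with the standard atexit() function. The file name and helper
names are hypothetical; the point is that the handler stays registered for
the life of the program and simply checks a flag to decide whether to do
anything.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical temporary file that should not outlive the program. */
static const char *temp_name = "scratch.tmp";
static int temp_cleanup_enabled = 0;   /* the simulated "removal" flag */

/* atexit() handler: does its work only while enabled. */
static void remove_temp_file(void)
{
    if (temp_cleanup_enabled)
        remove(temp_name);
}

/* Register the handler once (atexit() has no way to unregister it),
 * then enable it. Returns 0 on success, -1 if atexit() fails. */
int install_temp_cleanup(void)
{
    static int installed = 0;

    if (!installed) {
        if (atexit(remove_temp_file) != 0)
            return -1;
        installed = 1;
    }
    temp_cleanup_enabled = 1;
    return 0;
}

/* Simulated removal: the handler stays registered but becomes a no-op. */
void disable_temp_cleanup(void)
{
    temp_cleanup_enabled = 0;
}
```

With ETM, the same effect is obtained by calling ETMRemoveShutdownProc(),
so no flag is needed.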

Here is a short example of how to set up and shut down using ETM.


#include <stdlib.h>
#include "etm.h"

main ()
{
	. . .
	ETMInit (Cleanup);	/* register application-specific cleanup */
	SS1Init ();	/* registers SS1End() for shutdown */
	SS2Init ();	/* registers SS2End() for shutdown */
	SS3Init ();	/* registers SS3End() for shutdown */

	/* ... main processing here ... */

	ETMEnd ();	/* calls SS3End (), SS2End () and SS1End () */
	exit (0);
}


Subsystems that are themselves built on other subsystems may follow this
model, except that they would not call ETMInit() or ETMEnd().

If there is no special initialization or shutdown activity, and you don't
care about catching signals, it is not necessary to call ETMInit() and
ETMEnd(). The application may still call ETMPanic() to print error messages
and terminate. (Even if the application does use ETMInit() and ETMEnd(),
it is safe to call ETMPanic() before any initialization has been done,
because nothing needs to be shut down at that point yet.)

If ETM itself encounters an exceptional condition (e.g., it cannot allocate
memory when it needs to), it will--of course--trigger a panic. This should
be rare, but if it occurs, ETM will generate a message indicating what
the problem was.

3. Caveats

Shutdown routines shouldn't call ETMPanic(), since ETMPanic() causes
shutdown routines to be executed. ETM detects loops of this sort, but
their occurrence indicates a flaw in program logic. Similarly, if you
install a print routine to redirect ETM's output somewhere other than
stderr, the routine shouldn't call ETM to print any messages.
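One common way to detect this kind of loop is a flag that is set while
shutdown is in progress: a re-entrant call sees the flag and refuses to
recurse. This is a hypothetical sketch of that technique, not ETM's actual
loop-detection code, and the names are invented. A real panic routine
would report the loop and exit; here the re-entrant call just returns -1
so the behavior can be observed.

```c
#include <stdio.h>

typedef void (*ShutdownProc)(void);

#define MAX_SHUTDOWN_PROCS 32   /* arbitrary capacity for the sketch */

static ShutdownProc procs[MAX_SHUTDOWN_PROCS];
static int nprocs = 0;
static int shutting_down = 0;   /* set while shutdown routines are running */

int sketch_add_shutdown(ShutdownProc p)
{
    if (p == NULL || nprocs >= MAX_SHUTDOWN_PROCS)
        return -1;
    procs[nprocs++] = p;
    return 0;
}

/* Returns 0 normally, -1 if called from inside a shutdown routine. */
int sketch_run_shutdown(void)
{
    if (shutting_down)
        return -1;              /* loop detected */
    shutting_down = 1;
    while (nprocs > 0)
        (*procs[--nprocs])();
    shutting_down = 0;
    return 0;
}

/* The logic error described above: a shutdown routine that itself
 * triggers shutdown. */
static int loop_detected = 0;

static void misbehaving_shutdown(void)
{
    if (sketch_run_shutdown() == -1)
        loop_detected = 1;
}
```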

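SIGKILL (the signal sent by kill -9) is uncatchable, so a program killed
that way exits without running any shutdown routines, and there's nothing
you can do about it.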

4. Programming Interface

The ETM library should be installed in /usr/lib/libetm.a or local equivalent,
and applications should link in the ETM library with the -letm flag.
Source files that use ETM routines should include etm.h. If you use ETM
functions in a source file without including etm.h, you will get undefined
symbol errors at link time.

The abstract types ETMProcRetType and ETMProcPtr may be used for declaring
and passing pointers to functions that are passed to ETM routines. By
default these will be void and void(*)(), but on deficient systems with
C compilers lacking void pointers they will be int and int(*)(), the
usual C defaults for functions.
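Typedefs of this kind could be set up roughly as follows. This is a
hypothetical sketch, not the contents of etm.h: the configuration macro
NO_VOID_SUPPORT and the Sketch* names are invented here to show the idea
of switching the return type in one place.

```c
#include <stddef.h>

/* NO_VOID_SUPPORT is a hypothetical macro that a configuration step might
 * define for compilers that lack void. */
#ifdef NO_VOID_SUPPORT
typedef int SketchProcRetType;
#else
typedef void SketchProcRetType;
#endif

/* Pointer to a function taking no arguments and returning
 * SketchProcRetType; the empty parameter list matches void(*)(). */
typedef SketchProcRetType (*SketchProcPtr)();

/* A routine declared with the abstract return type, so the same source
 * compiles either way. */
static int cleanup_ran = 0;

SketchProcRetType SketchCleanup()
{
    cleanup_ran = 1;
}
```

A properly typed null pointer is then simply (SketchProcPtr) NULL.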

These types make it easier to declare properly typed functions and NULL
pointers. For instance, if you don't pass any shutdown routine to ETMInit(),
use


ETMInit ((ETMProcPtr) NULL);


If you do, use


ETMProcRetType ShutdownProc () { . . . }
. . .
main ()
{
	. . .
	ETMInit (ShutdownProc);
	. . .
}


Descriptions of the ETM routines follow.

ETMProcRetType ETMInit (p)
ETMProcPtr	p;

Registers the application's cleanup routine p (which should be NULL if
there is none) and registers default handlers for the following signals
(all of which normally cause program exit): SIGHUP, SIGINT, SIGQUIT,
SIGILL, SIGSYS, SIGTERM, SIGBUS, SIGSEGV, SIGFPE, SIGPIPE. If p is not
NULL, it should point to a routine that takes no arguments and returns
no value.
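Catching those signals amounts to pointing each of them at a common
handler that turns the signal into a panic. The sketch below uses ISO C
signal() and is an illustration, not ETM's code; its handler is a stub
that prints and exits with status 1 without running shutdown routines.
Signals outside ISO C (SIGHUP, SIGQUIT, SIGSYS, SIGBUS, SIGPIPE) are
guarded with #ifdef because not every system defines them. Note that
stdio calls and exit() are not async-signal-safe in the POSIX sense; a
panic-on-signal design accepts that risk in exchange for orderly shutdown.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a panic routine; a real one would also run the registered
 * shutdown routines before exiting. */
static void sketch_panic_on_signal(int signo)
{
    fflush(stdout);
    fprintf(stderr, "panic: caught signal %d\n", signo);
    exit(1);
}

/* Signals that normally terminate a program; non-ISO ones are guarded. */
static const int fatal_signals[] = {
#ifdef SIGHUP
    SIGHUP,
#endif
    SIGINT,
#ifdef SIGQUIT
    SIGQUIT,
#endif
    SIGILL,
#ifdef SIGSYS
    SIGSYS,
#endif
    SIGTERM,
#ifdef SIGBUS
    SIGBUS,
#endif
    SIGSEGV,
    SIGFPE,
#ifdef SIGPIPE
    SIGPIPE,
#endif
};

/* Install the handler for every signal in the list; -1 on failure. */
int sketch_catch_fatal_signals(void)
{
    size_t i;

    for (i = 0; i < sizeof fatal_signals / sizeof fatal_signals[0]; i++)
        if (signal(fatal_signals[i], sketch_panic_on_signal) == SIG_ERR)
            return -1;
    return 0;
}
```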

ETMProcRetType ETMEnd ()

Causes all registered shutdown routines to be executed. The application
may then exit normally with exit(0).

ETMProcRetType ETMPanic (fmt, ...)
char	*fmt;

ETMPanic() is called when a panic condition occurs and the program cannot
continue. The arguments are the same as those for printf() and are used
to print a message after shutting down all subsystems and executing the
application's cleanup routine, and before calling exit(). ETMPanic() adds
a newline to the end of the message.

ETMPanic() may be called at any time, including prior to calling ETMInit(),
but only those shutdown routines which have been registered are invoked.

A common problem with applications that encounter exceptional conditions
such as segmentation faults is that you often don't see all the output
your application has produced. This is because stdout is often buffered.
To alleviate this problem, stdout is flushed before any message is printed,
so that any pending application output is flushed and appears before
the error message.

By default, ETMPanic() prints the message on stderr. This behavior may
be modified with ETMSetPrintProc().

The default exit() value is 1. This may be modified with ETMSetExitStatus().
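The message handling described above (printf()-style arguments, a newline
appended, stdout flushed first, exit status 1 by default) could be
sketched as follows. Everything here is hypothetical: the 1024-byte limit,
the helper names, and the use of C99 vsnprintf() are choices made for the
sketch, not details taken from ETM.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_MAX 1024                /* arbitrary limit for the sketch */

static int panic_exit_status = 1;   /* ETM's documented default */

/* Format fmt/ap into buf and append a newline; always NUL-terminates.
 * Requires n >= 2. Overlong messages are truncated. */
static void format_message(char *buf, size_t n, const char *fmt, va_list ap)
{
    int len = vsnprintf(buf, n - 1, fmt, ap);   /* leave room for '\n' */

    if (len < 0)
        len = 0;                                /* formatting error */
    if ((size_t) len > n - 2)
        len = (int) (n - 2);                    /* output was truncated */
    buf[len] = '\n';
    buf[len + 1] = '\0';
}

/* Variadic front end to the formatter. */
void sketch_format(char *buf, size_t n, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    format_message(buf, n, fmt, ap);
    va_end(ap);
}

void sketch_panic(const char *fmt, ...)
{
    char buf[MSG_MAX];
    va_list ap;

    fflush(stdout);             /* pending application output comes first */
    /* ... a real version would run the shutdown routines here ... */
    va_start(ap, fmt);
    format_message(buf, sizeof buf, fmt, ap);
    va_end(ap);
    fputs(buf, stderr);         /* default destination */
    exit(panic_exit_status);
}
```

For example, sketch_format(buf, sizeof buf, "bad value %d", 42) leaves
"bad value 42\n" in buf.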

ETMProcRetType ETMMsg (fmt, ...)
char	*fmt;

ETMMsg() is like ETMPanic() except that it just prints the message and
returns. It is useful in that if panic message output has been redirected
somewhere other than stderr (e.g., to the system log), ETMMsg() will
write its output there, too. The application does not need to know whether
such redirection has taken place.

ETMMsg() may be called at any time, including prior to calling ETMInit().

ETMProcRetType ETMAddShutdownProc (p)
ETMProcPtr	p;

Register a shutdown routine with ETM. This is normally called within
a subsystem's initialization routine. p should point to a routine that
takes no arguments and returns no value.

ETMProcRetType ETMRemoveShutdownProc (p)
ETMProcPtr	p;

Deregister a previously registered shutdown routine. This is useful
for routines that need to be registered only temporarily, e.g., while
a piece of code has created a file that must be removed if the program
crashes, but that the code removes itself if execution proceeds
normally.
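Deregistration can be sketched as a search-and-remove on the list of
registered routines, shown here together with the temporary-file pattern
just described. As before, this is a hypothetical illustration: the
function names and the scratch file name are invented, and ETM's own
bookkeeping may differ. Because shutdown routines take no arguments, the
file name has to live in a static variable the routine can see.

```c
#include <stdio.h>

typedef void (*ShutdownProc)(void);

#define MAX_SHUTDOWN_PROCS 32   /* arbitrary capacity for the sketch */

static ShutdownProc procs[MAX_SHUTDOWN_PROCS];
static int nprocs = 0;

int sketch_add_shutdown(ShutdownProc p)
{
    if (p == NULL || nprocs >= MAX_SHUTDOWN_PROCS)
        return -1;
    procs[nprocs++] = p;
    return 0;
}

/* Remove the most recent registration of p, keeping the rest in order.
 * Returns 0 if p was found, -1 otherwise. */
int sketch_remove_shutdown(ShutdownProc p)
{
    int i, j;

    for (i = nprocs - 1; i >= 0; i--) {
        if (procs[i] == p) {
            for (j = i; j < nprocs - 1; j++)
                procs[j] = procs[j + 1];
            nprocs--;
            return 0;
        }
    }
    return -1;
}

/* Temporary-file pattern: cleanup is registered only while the file exists. */
static const char *scratch_name = "scratch.tmp";   /* hypothetical name */

static void remove_scratch(void)
{
    remove(scratch_name);
}

int use_scratch_file(const char *text)
{
    FILE *fp;

    if (sketch_add_shutdown(remove_scratch) != 0)
        return -1;
    fp = fopen(scratch_name, "w");
    if (fp == NULL) {
        sketch_remove_shutdown(remove_scratch);
        return -1;
    }
    fputs(text, fp);
    /* ... work that might panic; the file would then be removed ... */
    fclose(fp);
    remove(scratch_name);                  /* normal path cleans up itself */
    sketch_remove_shutdown(remove_scratch);
    return 0;
}
```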

ETMProcRetType ETMSetSignalProc (signo, p)
int	signo;
ETMProcPtr	p;

Register a signal-catching routine to override ETM's default. The routine
will be called with one argument, the signal number. It should return
no value, regardless of the usual return type of signal handler routines
on your system. (When ETM is configured for your system, it learns the
proper return type for signal handlers, but it hides differences among
systems from your application so you don't have to think about it.)

To return a signal to its default action or to cause a signal to be ignored,
pass the following values for p (these are defined in etm.h):


ETMSigIgnore	signal is ignored
ETMSigDefault	signal default action is restored
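One plausible way to get this behavior is a trampoline: the only function
ever passed to signal() is an internal handler with whatever signature the
system requires, and it forwards to the user's routine, which always
returns no value. The sketch below is hypothetical; sketch_sig_ignore and
sketch_sig_default are invented stand-ins for ETMSigIgnore and
ETMSigDefault, and the bound of 65 signal numbers is an assumption.

```c
#include <signal.h>
#include <stddef.h>

typedef void (*SigProc)(int);

#define SKETCH_NSIG 65          /* assumed upper bound on signal numbers */

/* Marker functions whose addresses act as the "ignore" and "default"
 * values in this sketch; they are never actually called as handlers. */
void sketch_sig_ignore(int signo) { (void) signo; }
void sketch_sig_default(int signo) { (void) signo; }

static SigProc user_procs[SKETCH_NSIG];

/* The only handler handed to signal(). A port to a system whose handlers
 * must return int would change just this function. */
static void trampoline(int signo)
{
    if (signo > 0 && signo < SKETCH_NSIG && user_procs[signo] != NULL)
        (*user_procs[signo])(signo);
}

int sketch_set_signal_proc(int signo, SigProc p)
{
    SigProc prev;

    if (signo <= 0 || signo >= SKETCH_NSIG || p == NULL)
        return -1;
    if (p == sketch_sig_ignore) {
        user_procs[signo] = NULL;
        prev = signal(signo, SIG_IGN);
    } else if (p == sketch_sig_default) {
        user_procs[signo] = NULL;
        prev = signal(signo, SIG_DFL);
    } else {
        user_procs[signo] = p;
        prev = signal(signo, trampoline);
    }
    return prev == SIG_ERR ? -1 : 0;
}

/* NULL means default action or ignored; the two are not distinguished. */
SigProc sketch_get_signal_proc(int signo)
{
    if (signo <= 0 || signo >= SKETCH_NSIG)
        return NULL;
    return user_procs[signo];
}

/* Example user routine: records which signal arrived. */
static volatile sig_atomic_t last_signal = 0;

static void note_signal(int signo)
{
    last_signal = signo;
}
```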


ETMProcPtr ETMGetSignalProc (signo)
int	signo;

Returns the function currently used to catch signal signo, or NULL if the
signal is handled with the default action or is being ignored (it's not
possible to distinguish between the last two cases).

ETMProcRetType ETMSetPrintProc (p)
ETMProcPtr	p;

This routine is used to register a procedure that ETM can use to print
messages. The default is to send messages to stderr, which is appropriate
for most programs. Applications may prefer to send messages elsewhere.
For instance, non-interactive programs like network servers might send
them to syslog() instead. Or a program may wish to send messages to multiple
destinations.

To override the default, pass the address of an alternate print routine
to ETMSetPrintProc(). The routine should take one argument, a pointer
to a character string, and return no value. The argument will be the
fully formatted panic message, complete with a newline on the end. To
restore the default, pass NULL.
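A print hook of this kind can be sketched as a function pointer that is
NULL while the default is in use, with every message (panic or not)
routed through it. The names below are hypothetical, and the
capture_print() routine stands in for something like a syslog() wrapper;
this is an illustration of the idea, not ETM's code.

```c
#include <stdarg.h>
#include <stdio.h>

typedef void (*PrintProc)(const char *);

#define MSG_MAX 1024            /* arbitrary limit for the sketch */

static PrintProc print_proc = NULL;    /* NULL means "use the default" */

static void default_print(const char *msg)
{
    fputs(msg, stderr);
}

/* Install p as the message printer; NULL restores the default. */
void sketch_set_print_proc(PrintProc p)
{
    print_proc = p;
}

/* Returns the installed printer, or NULL if the default is in use. */
PrintProc sketch_get_print_proc(void)
{
    return print_proc;
}

/* ETMMsg()-style routine: format, add a newline, hand off, and return. */
void sketch_msg(const char *fmt, ...)
{
    char buf[MSG_MAX];
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(buf, sizeof buf - 1, fmt, ap);  /* room for '\n' */
    va_end(ap);
    if (len < 0)
        len = 0;
    if ((size_t) len > sizeof buf - 2)
        len = (int) (sizeof buf - 2);
    buf[len] = '\n';
    buf[len + 1] = '\0';
    fflush(stdout);
    (*(print_proc != NULL ? print_proc : default_print))(buf);
}

/* Example alternate printer: keeps a copy of the last message. */
static char last_message[MSG_MAX];

static void capture_print(const char *msg)
{
    snprintf(last_message, sizeof last_message, "%s", msg);
}
```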

The printing routine shouldn't call ETMPanic() or ETMMsg(). If it does,
ETM detects the loop and conveniently panics, as a service to let you
know you have a logic error in your program.

ETMProcPtr ETMGetPrintProc ()

Returns a pointer to the current printing function, or NULL if the default
is being used.

ETMProcRetType ETMSetExitStatus (status)
int	status;

This routine is used to register the status value that is passed to exit()
when a panic occurs. The default is 1. For some applications it is desirable
to return a different value. For instance, a mail server that processes
messages may send back a message to the person who sent mail when a request
is erroneous, then panic (perhaps by writing a message to the system
log). On some systems, if a program invoked to handle mail returns non-zero,
the mailer will send another message to that person stating that there
was a problem handling the mail. This extra message is unnecessary, and
can be suppressed by registering an exit status of 0.

If ETMSetAbort() has been called to force an abort() on a panic, the
exit status is not returned.
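The final step of a panic, choosing between exit() with the registered
status and abort(), might look like the sketch below. The names are
hypothetical and the code is an illustration rather than ETM's
implementation; it keeps the documented defaults (status 1, no core
image).

```c
#include <stdlib.h>

static int exit_status = 1;     /* documented default for a panic */
static int abort_on_panic = 0;  /* default: no core image */

void sketch_set_exit_status(int status) { exit_status = status; }
int sketch_get_exit_status(void) { return exit_status; }

/* Any non-zero val requests a core image on panic. */
void sketch_set_abort(int val) { abort_on_panic = (val != 0); }
int sketch_get_abort(void) { return abort_on_panic; }

/* Called last, after shutdown routines have run and the message is out. */
void sketch_panic_exit(void)
{
    if (abort_on_panic)
        abort();                /* may leave a core image; status unused */
    exit(exit_status);
}
```

A mail-handling program like the one described would call
sketch_set_exit_status(0) early, so that a panic after it has already
replied to the sender does not provoke a second message from the mailer.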

int ETMGetExitStatus ()

Returns the current exit status which will be returned if a panic occurs.

ETMProcRetType ETMSetAbort (val)
int	val;

Calling this function with a non-zero value of val causes ETM to try
to generate a core image when ETMPanic() is called (after the panic message
is printed). This can sometimes be useful for debugging. If val is zero,
image generation is suppressed. The default is no image.

ETMSetAbort() is meaningless on systems with no concept of a core image.
Also, if you install a signal catcher for SIGABRT, you may end up in
a panic loop.

int ETMGetAbort ()

Returns the current image generation value.
