test file for threads:

	Macaulay2/packages/Macaulay2Doc/test/threads.m2

documentation for threads:

	Macaulay2/packages/Macaulay2Doc/threads.m2

audit all the code and make it re-entrant.

    already re-entrant or locked :

    	storing or removing an entry from a hash table
	assignment to a variable

    not re-entrant, but safe:

    	getting the keys, pairs, or values from a mutable hash table (storing and removing is
	done carefully, so that another thread traversing one of the lists of key-value pairs
	will not traverse part of it twice

    not yet re-entrant :

    	storing an entry into a mutable list at a position greater than the size,
	causing the list to be enlarged

    some areas of concern:

        in d/*.d we have exceptionFlag and interruptFlag, etc., which are defined to be atomic data items
	as well as thread local.  But one depends on another, so the pair isn't atomic.  That might be a problem.

Figure out how:

	to provide thread local storage when it isn't provided by the compiler/linker, such as under Mac OS X:

	    ideas:

		1. pthread_key_create

		2. see specific.c and include/private/specific.h in gc library.  It's faster than
		   pthread_key_create, and uses a hash table based on an address of a location
		   on the thread's stack.

	to communicate with a single thread in an xterm window

	to debug a thread with the Macaulay2 debugger

	to wait for a thread, or to wait for the first thread of a set of threads

	to provide mutexes at top level.

	     Possible ways:

		  1. top level mutex objects that can be locked or unlocked

		       But what if a top level routine is interrupted? How do we unlock the mutex while climbing 
		       the stack to the top level?  We don't yet have a top level way for the user to
		       install clean-up code.

		  2. a language construct for exclusive evaluation of code or a function

		  	e.g.: exclusivelyEvaluate (x=3; y=4)

			or  : exclusivelyEvaluate( () -> (x=3; y=4) )

			Hmm, this is not so useful -- it would be nicer to have exclusivity depend
			not only on the code, but also on the data!

	     Here's one situation where it would useful:  we are trying to print the internal state of
	     an object that is changing as it is being updated in another thread.  (It might be a chain
	     complex representing a resolution of a module.)  While examining the internal state we are
	     caching some results that depend on the internal state, such as the matrices in a resolution.
	     When printing again later we want certainty about whether the cached results have been 
	     invalidated by further computation.  We're not completely sure how to do this best.

	threads could report their status and progress

	threads, when done with a task, could wait for a new task to be assigned

	to wait for any thread in a set of threads to terminate (wait for pthread_cond_t?)

	our "time" and "timing" commands could give cputime just for the current thread.
	(this now works for linux, but not mac os x; see M2lib.c: system_cpuTime()
	and system_cpuTime_init())

	to make printing to stdio thread-safe.  The D variables stdIO and
	stderr are thread local, so that means that error messages (which are
	printed by D language code) from different threads come out on
	different lines on the screen (desirable!), but the Macaulay2 variable
	stdio is not thread local, so when printing is done from top level,
	there can be a problem.

		here is an example:

		i7 : for i from 1 to 3 list inThread (() -> (2^500; print "sssssssssssssssssssssssssssssssss"))
		sssssssssssssssssssssssssssssssss
		sssssssssssssssssssssssssssssssss
		/home/dan/src/M2/mike-development/BUILD/dan/../../Macaulay2/d/stdio.d:401:47: array index -34 out of bounds 0 .. 4095
		/home/dan/src/M2/mike-development/BUILD/dan/../../Macaulay2/d/stdio.d:524:29: array index -34 out of bounds 0 .. 4095

		The string printed is 34 bytes long (with the newline), and the I/O buffer is 4096 bytes long.

Provide Scheduling Facilities

   A. There are these competing "aspects":

	(1) sometimes a computation needs to be done, but there are multiple strategies for
	    pursuing it.  Then it can be a good idea to pursue the various strategies
	    simultaneously, waiting for the first one to return with the answer.  At that
	    point, the others can be killed.

	(2) sometimes several computations need to be done, and they can be done
	    in parallel

	(3) we start several threads computing various things, wait for the first
	    to return, and then we let the other threads continue until they have used
	    50%, say, more cputime than the first one.  We retrieve and use any additional
	    answers returned during that time for continuing the original computation,
	    and kill the thread that didn't terminate during that time.

    B. Here is a related question: what happens when threads start other threads, 
       recursively, and the applicable "aspect" varies?  That may affect how the threads are
       scheduled.  For example, if create a team of threads according to aspect (1), and they
       all start new threads, it would be bad if the children of the first thread would
       get all the CPU time.

    C. It would be nice if tasks could be scheduled directly from C++ or D code, in addition
       to Macaulay2 code.

    D. Some ideas might be gotten from "Grand Central Dispatch"
