9 files changed, 5597 insertions, 0 deletions
diff --git a/mpi/README b/mpi/README
new file mode 100644
index 00000000..415029f7
--- /dev/null
+++ b/mpi/README
@@ -0,0 +1,793 @@
+About the MPI Library
+---------------------
+
+The files 'mpi.h' and 'mpi.c' define a simple, arbitrary precision
+signed integer arithmetic package.  The implementation is not the most
+efficient possible, but the code is small and should be fairly easily
+portable to just about any machine that supports an ANSI C compiler,
+as long as it is capable of at least 16-bit arithmetic (but also see
+below for more on this).
+
+This library was written with an eye to cryptographic applications;
+thus, some care is taken to make sure that temporary values are not
+left lying around in memory when they are no longer in use.  This adds
+some overhead for zeroing buffers before they are released back into
+the free pool; however, it gives you the assurance that there is only
+one copy of your important values residing in your process's address
+space at a time.  Obviously, it is difficult to guarantee anything, in
+a pre-emptive multitasking environment, but this at least helps you
+keep a lid on the more obvious ways your data can get spread around in
+memory.
+
+
+Using the Library
+-----------------
+
+To use the MPI library in your program, you must include the header:
+
+#include "mpi.h"
+
+This header provides all the type and function declarations you'll
+need to use the library.  Almost all the names defined by the library
+begin with the prefix 'mp_', so it should be easy to keep them from
+clashing with your program's namespace (he says, glibly, knowing full
+well there are always pathological cases).
+
+There are a few things you may want to configure about the library.
+By default, the MPI library uses an unsigned short for its digit type,
+and an unsigned int for its word type.  The word type must be big
+enough to contain at least two digits, for the primitive arithmetic to
+work out.  On my machine, a short is 2 bytes and an int is 4 bytes --
+but if you have 64-bit ints, you might want to use a 4-byte digit and
+an 8-byte word.  I have tested the library using 1-byte digits and
+2-byte words, as well.  Whatever you choose to do, the things you need
+to change are:
+
+(1) The type definitions for mp_digit and mp_word.
+
+(2) The macro DIGIT_FMT which tells mp_print() how to display a
+    single digit.  This is just a printf() format string, so you
+    can adjust it appropriately.
+
+(3) The macros MP_DIGIT_MAX and MP_WORD_MAX, which specify the
+    largest value expressible in an mp_digit and an mp_word,
+    respectively.
+
+Both the mp_digit and mp_word should be UNSIGNED integer types.  The
+code relies on having the full positive precision of the type used for
+digits and words.
+
+The remaining type definitions should be left alone, for the most
+part.  The code in the library does not make any significant
+assumptions about the sizes of things, but there is little if any
+reason to change the other parameters, so I would recommend you leave
+them as you found them.
+
+The library comes with a Perl script, 'types.pl', which will scan your
+current Makefile settings, and attempt to find good definitions for
+these types.  It relies on a Unix sort of build environment, so it
+probably won't work under MacOS or Windows, but it can be convenient
+if you're porting to a new flavour of Unix.  Just run 'types.pl' at
+the command line, and it will spit out its results to the standard
+output.
+
+
+Conventions
+-----------
+
+Most functions in the library return a value of type mp_err.  This
+permits the library to communicate success or various kinds of failure
+to the calling program.  The return values currently defined are:
+
+        MP_OKAY         - okay, operation succeeded, all's well
+        MP_YES          - okay, the answer is yes (same as MP_OKAY)
+        MP_NO           - okay, but answer is no (not MP_OKAY)
+        MP_MEM          - operation ran out of memory
+        MP_RANGE        - input parameter was out of range
+        MP_BADARG       - an invalid input parameter was provided
+        MP_UNDEF        - no output value is defined for this input
+
+The only function which currently uses MP_UNDEF is mp_invmod().
+Division by zero is undefined, but the division functions will return
+MP_RANGE for a zero divisor.  MP_BADARG usually means you passed a
+bogus mp_int structure to the function.  MP_YES and MP_NO are not used
+by the library itself; they're defined so you can use them in your own
+extensions.
+
+If you need a readable interpretation of these error codes in your
+program, you may also use the mp_strerror() function.  This function
+takes an mp_err as input, and returns a pointer to a human-readable
+string describing the meaning of the error.  These strings are stored
+as constants within the library, so the caller should not attempt to
+modify or free the memory associated with these strings.
+
+The library represents values in signed-magnitude format.  Values
+strictly less than zero are negative, all others are considered
+positive (zero is positive by fiat).  You can access the 'sign' member
+of the mp_int structure directly, but better is to use the mp_cmp_z()
+function, to find out which side of zero the value lies on.
+
+Most arithmetic functions have a single-digit variant, as well as the
+full arbitrary-precision.  An mp_digit is an unsigned value between 0
+and MP_DIGIT_MAX inclusive.  The radix is available as RADIX.  The
+number of bits in a given digit is given as DIGIT_BIT.
+
+Generally, input parameters are given before output parameters.
+Unless otherwise specified, any input parameter can be re-used as an
+output parameter, without confusing anything.
+
+The basic numeric type defined by the library is an mp_int.  Virtually
+all the functions in the library take a pointer to an mp_int as one of
+their parameters.  An explanation of how to create and use these
+structures follows.  And so, without further ado...
+
+
+Initialization and Cleanup
+--------------------------
+
+The basic numeric type defined by the library is an 'mp_int'.
+However, it is not sufficient to simply declare a variable of type
+mp_int in your program.  These variables also need to be initialized
+before they can be used, to allocate the internal storage they require
+for computation.
+
+This is done using one of the following functions:
+
+        mp_init(mp_int *mp);
+        mp_init_copy(mp_int *mp, mp_int *from);
+        mp_init_size(mp_int *mp, mp_size p);
+
+Each of these requires a pointer to a structure of type mp_int.  The
+basic mp_init() simply initializes the mp_int to a default size, and
+sets its value to zero.  If you would like to initialize a copy of an
+existing mp_int, use mp_init_copy(), where the 'from' parameter is the
+mp_int you'd like to make a copy of.  The third function,
+mp_init_size(), permits you to specify how many digits of precision
+should be preallocated for your mp_int.  This can help the library
+avoid unnecessary re-allocations later on.
+
+The default precision used by mp_init() can be retrieved using:
+
+        precision = mp_get_prec();
+
+This returns the number of digits that will be allocated.  You can
+change this value by using:
+
+        mp_set_prec(unsigned int prec);
+
+Any positive value is acceptable -- if you pass zero, the default
+precision will be re-set to the compiled-in library default (this is
+specified in the header file 'mpi-config.h', and typically defaults to
+8 or 16).
+
+Just as you must allocate an mp_int before you can use it, you must
+clean up the structure when you are done with it.  This is performed
+using the mp_clear() function.  Remember that any mp_int that you
+create as a local variable in a function must be mp_clear()'d before
+that function exits, or else the memory allocated to that mp_int will
+be orphaned and unrecoverable.
+
+To set an mp_int to a given value, the following functions are given:
+
+        mp_set(mp_int *mp, mp_digit d);
+        mp_set_int(mp_int *mp, long z);
+
+The mp_set() function sets the mp_int to a single digit value, while
+mp_set_int() sets the mp_int to a signed long integer value.
+
+To set an mp_int to zero, use:
+
+        mp_zero(mp_int *mp);
+
+
+Copying and Moving
+------------------
+
+If you have two initialized mp_int's, and you want to copy the value
+of one into the other, use:
+
+        mp_copy(from, to)
+
+This takes care of clearing the old value of 'to', and copies the new
+value into it.  If 'to' is not yet initialized, use mp_init_copy()
+instead (see above).
+
+Note:   The library tries, whenever possible, to avoid allocating
+----    new memory.  Thus, mp_copy() tries first to satisfy the needs
+        of the copy by re-using the memory already allocated to 'to'.
+        Only if this proves insufficient will mp_copy() actually
+        allocate new memory.
+
+        For this reason, if you know a priori that 'to' has enough
+        available space to hold 'from', you don't need to check the
+        return value of mp_copy() for memory failure.  The USED()
+        macro tells you how many digits are used by an mp_int, and
+        the ALLOC() macro tells you how many are allocated.
+
+If you have two initialized mp_int's, and you want to exchange their
+values, use:
+
+        mp_exch(a, b)
+
+This is better than using mp_copy() with a temporary, since it will
+not (ever) touch the memory allocator -- it just swaps the exact
+contents of the two structures.  The mp_exch() function cannot fail;
+if you pass it an invalid structure, it just ignores it, and does
+nothing.
+
+
+Basic Arithmetic
+----------------
+
+Once you have initialized your integers, you can operate on them.  The
+basic arithmetic functions on full mp_int values are:
+
+mp_add(a, b, c)         - computes c = a + b
+mp_sub(a, b, c)         - computes c = a - b
+mp_mul(a, b, c)         - computes c = a * b
+mp_sqr(a, b)            - computes b = a * a
+mp_div(a, b, q, r)      - computes q, r such that a = bq + r
+mp_div_2d(a, d, q, r)   - computes q = a / 2^d, r = a % 2^d
+mp_expt(a, b, c)        - computes c = a ** b
+mp_2expt(a, k)          - computes a = 2^k
+mp_sqrt(a, c)           - computes c = floor(sqrt(a))
+
+The mp_div_2d() function efficiently computes division by powers of
+two.  Either the q or r parameter may be NULL, in which case that
+portion of the computation will be discarded.
+
+The mp_sqr() function squares its input argument.  A call to 
+
+	mp_sqr(a, c) 
+
+... is identical in meaning to mp_mul(a, a, c); however, if the
+MP_SQUARE variable is set true in mpi-config.h (see below), then it
+will be implemented with a different algorithm, that is supposed to
+take advantage of the redundant computation that takes place during
+squaring.  Unfortunately, some compilers result in worse performance
+on this code, so you can change the behaviour at will.  There is a
+utility program "mulsqr.c" that lets you test which does better on
+your system.
+
+The algorithms used for some of the computations here are described in
+the following files which are included with this distribution:
+
+mul.txt         Describes the multiplication algorithm
+div.txt         Describes the division algorithm
+expt.txt        Describes the exponentiation algorithm
+sqrt.txt        Describes the square-root algorithm
+square.txt      Describes the squaring algorithm
+
+There are single-digit versions of most of these routines, as well.
+In the following prototypes, 'd' is a single mp_digit:
+
+mp_add_d(a, d, c)       - computes c = a + d
+mp_sub_d(a, d, c)       - computes c = a - d
+mp_mul_d(a, d, c)       - computes c = a * d
+mp_mul_2(a, c)          - computes c = a * 2
+mp_mul_2d(a, d, c)      - computes c = a * 2^d
+mp_div_d(a, d, q, r)    - computes q, r such that a = bq + r
+mp_div_2(a, c)          - computes c = a / 2
+mp_div_2d(a, d, q, r)   - computes q, r such that a = q2^d + r
+mp_expt_d(a, d, c)      - computes c = a ** d
+
+The mp_mul_2() and mp_div_2() functions take advantage of the internal
+representation of an mp_int to do multiplication by two more quickly
+than mp_mul_d() would.  Other basic functions of an arithmetic variety
+include:
+
+mp_zero(a)              - assign 0 to a
+mp_neg(a, c)            - negate a: c = -a
+mp_abs(a, c)            - absolute value: c = |a|
+
+
+Comparisons
+-----------
+
+Several comparison functions are provided.  Each of these, unless
+otherwise specified, returns zero if the comparands are equal, < 0 if
+the first is less than the second, and > 0 if the first is greater
+than the second:
+
+mp_cmp_z(a)             - compare a <=> 0
+mp_cmp_d(a, d)          - compare a <=> d, d is a single digit
+mp_cmp(a, b)            - compare a <=> b
+mp_cmp_mag(a, b)        - compare |a| <=> |b|
+mp_cmp_int(a, z)        - compare a <=> z, z is a signed long integer
+mp_isodd(a)             - return nonzero if odd, zero otherwise
+mp_iseven(a)            - return nonzero if even, zero otherwise
+
+
+Modular Arithmetic
+------------------
+
+Modular variations of the basic arithmetic functions are also
+supported.  These are available if the MP_MODARITH parameter in
+mpi-config.h is turned on (it is by default).  The modular arithmetic
+functions are:
+
+mp_mod(a, m, c)         - compute c = a (mod m), 0 <= c < m
+mp_mod_d(a, d, c)       - compute c = a (mod d), 0 <= c < d (see below)
+mp_addmod(a, b, m, c)   - compute c = (a + b) mod m
+mp_submod(a, b, m, c)   - compute c = (a - b) mod m
+mp_mulmod(a, b, m, c)   - compute c = (a * b) mod m
+mp_sqrmod(a, m, c)      - compute c = (a * a) mod m
+mp_exptmod(a, b, m, c)  - compute c = (a ** b) mod m
+mp_exptmod_d(a, d, m, c)- compute c = (a ** d) mod m
+
+All these functions return the least non-negative representative of
+their result, modulo m.
+
+The mp_mod_d() function computes a modular reduction around a single
+digit d.  The resut is a single digit c.  
+
+See the file 'redux.txt' for a description of the modular reduction
+algorithm used by mp_exptmod().
+
+
+Greatest Common Divisor
+-----------------------
+
+If The greates common divisor of two values can be found using one of the
+following functions:
+
+mp_gcd(a, b, c)         - compute c = (a, b) using binary algorithm
+mp_lcm(a, b, c)         - compute c = [a, b] = ab / (a, b)
+mp_xgcd(a, b, g, x, y)  - compute g, x, y so that ax + by = g = (a, b)
+
+Also provided is a function to compute modular inverses, if they
+exist:
+
+mp_invmod(a, m, c)      - compute c = a^-1 (mod m), if it exists
+
+Because an inverse is defined for a (mod m) if and only if (a, m) = 1
+(that is, if a and m are relatively prime), mp_invmod() may not be
+able to compute an inverse for the arguments.  In this case, it
+returns the value MP_UNDEF, and does not modify c.  If an inverse is
+defined, however, it returns MP_OKAY, and sets c to the value of the
+inverse (mod m).
+
+The function mp_xgcd() computes the greatest common divisor, and also
+returns values of x and y satisfying Bezout's identity.  This is used
+by mp_invmod() to find modular inverses.  However, if you do not need
+these values, you will find that mp_gcd() is MUCH more efficient,
+since it doesn't need all the intermediate values that mp_xgcd()
+requires in order to compute x and y. 
+
+The mp_gcd() (and mp_xgcd()) functions use the binary (extended) GCD
+algorithm due to Josef Stein.
+
+
+Input & Output Functions
+------------------------
+
+The following basic I/O routines are provided.  These are present at
+all times:
+
+mp_read_radix(mp, str, r)  - convert a string in radix r to an mp_int
+
+
+mp_read_signed_bin(mp, s, len)
+                           - convert a signed string of bytes to an mp_int
+
+
+mp_read_unsigned_bin(mp, s, len)
+                           - convert an unsigned string of bytes to an mp_int
+
+
+mp_toradix(mp, str, r)     - convert an mp_int to a string of radix r digits
+mp_radix_size(mp, r)       - return length of buffer needed by mp_toradix()
+
+mp_to_signed_bin(mp, str)  - convert an mp_int to a string of bytes (signed)
+mp_signed_bin_size(mp)     - return length of buffer needed by the above
+
+mp_to_unsigned_bin(mp, str)
+                           - convert an mp_int to a string of bytes (unsigned)
+mp_unsigned_bin_size(mp)   - return length of buffer needed by the above
+
+mp_count_bits(mp)          - return the (exact) number of significant bits in
+                             the magnitude of mp.
+
+mp_char2value(ch, r)       - convert ch to its value when taken as
+                             a radix r digit, or -1 if invalid
+
+mp_value_radix_size(n, q, r)
+                           - return the number of bytes that would be required
+			     to express a number in radix r, if it were stored
+			     as n units of q bits each.  (Not including sign 
+			     or terminator bytes)
+
+mp_strerror(err)           - get a string describing mp_err value 'err'
+
+If you compile the MPI library with MP_IOFUNC defined, you will also
+have access to the following additional I/O function:
+
+mp_print(mp, ofp)       - print an mp_int as text to output stream ofp
+
+The mp_radix_size() function returns a size in bytes guaranteed to be
+big enough to hold the results of calling mp_toradix() on the value in
+that radix.  It includes an extra byte for a NUL terminator, and, if
+the input value is negative, an extra byte for the leading '-'.
+
+The sizes returned by mp_signed_bin_size() and mp_unsigned_bin_size()
+are exact; they will never over-estimate the number of bytes needed,
+and they do not include space for terminators, although the signed
+version does leave space for a sign indicator.
+
+The mp_read_radix() and mp_toradix() functions support bases from 2 to
+64 inclusive.  If you require more general radix conversion facilities
+than this, you will need to write them yourself (that's why mp_div_d()
+is provided, after all).
+
+Note:   mp_read_radix() will accept as digits either capital or 
+----    lower-case letters.  However, the current implementation of
+        mp_toradix() only outputs upper-case letters, when writing
+        bases betwee 10 and 36.  The underlying code supports using
+        lower-case letters, but the interface stub does not have a
+        selector for it.  You can add one yourself if you think it
+        is worthwhile -- I do not.  Bases from 36 to 64 use lower-
+        case letters as distinct from upper-case.  Bases 63 and
+        64 use the characters '+' and '/' as digits.
+
+        Note also that compiling with MP_IOFUNC defined will cause
+        inclusion of <stdio.h>, so if you are trying to write code
+        which does not depend on the standard C library, you will
+        probably want to avoid this option.  This is needed because
+        the mp_print() function takes a standard library FILE * as
+        one of its parameters, and uses the fprintf() function.
+
+The mp_to_signed_bin() function converts the integer to a sequence of
+bytes, in big-endian ordering (most-significant byte first).  Assuming
+your bytes are 8 bits wide, this corresponds to base 256.  The sign is
+encoded as a single leading byte, whose value is 0 for zero or
+positive values, or 1 for negative values.  The mp_read_signed_bin()
+function reverses this process -- it takes a buffer of bytes,
+interprets the first as a sign indicator (0 = zero/positive, nonzero =
+negative), and the rest as a sequence of 1-byte digits in big-endian
+ordering.
+
+The mp_signed_bin_size() function returns the exact number of bytes
+required to store the given integer in "raw" format (as described in
+the previous paragraph).  Zero is returned in case of error; a valid
+integer will require at least three bytes of storage.
+
+
+Other Functions
+---------------
+
+The files 'mpprime.h' and 'mpprime.c' define some routines which are
+useful for divisibility testing and probabilistic primality testing.
+The routines defined are:
+
+mpp_divis(a, b)          - is a divisible by b?
+mpp_divis_d(a, d)        - is a divisible by digit d?
+mpp_random(a)            - set a to random value at current precision
+mpp_random_size(a, prec) - set a to random value at given precision
+
+Note:  The mpp_random() and mpp_random_size() functions use the C
+----   library's rand() function to generate random values.  It is
+       up to the caller to seed this generator before it is called.
+       These functions are not suitable for generating quantities
+       requiring cryptographic-quality randomness; they are intended
+       primarily for use in primality testing.
+
+       Note too that the MPI library does not call srand(), so your
+       application should do this, if you ever want the sequence
+       to change.
+
+mpp_divis_vector(a, v, s, w)  - is a divisible by any of the s digits
+                                in v?  If so, let w be the index of 
+                                that digit
+
+mpp_divis_primes(a, np)       - is a divisible by any of the first np
+                                primes?  If so, set np to the prime 
+                                which divided a.
+
+mpp_fermat(a, w)              - test if w^a = w (mod a).  If so, 
+                                returns MP_YES, otherwise MP_NO.
+
+mpp_pprime(a, nt)             - perform nt iterations of the Rabin-
+                                Miller probabilistic primality test
+                                on a.  Returns MP_YES if all tests
+                                passed, or MP_NO if any test fails.
+
+The mpp_fermat() function works based on Fermat's little theorem, a
+consequence of which is that if p is a prime, and (w, p) = 1, then:
+
+        w^p = w (mod p)
+
+Put another way, if w^p != w (mod p), then p is not prime.  The test
+is expensive to compute, but it helps to quickly eliminate an enormous
+class of composite numbers prior to Rabin-Miller testing.
+
+Building the Library
+--------------------
+
+For the impatient among us, you should be able to build the MPI
+library using the following simple steps:
+
+	- Edit "Makefile.base" to set the compiler and its 
+	  options appropriately for your system.  If you're
+	  not sure, the default GCC settings are probably fine.
+
+	- Type 'make'
+
+	- You can run unit tests by typing 'make test'
+
+	- You can build the utility programs (in the 'utils' 
+	  subdirectory) by typing 'make tools'
+
+If you are interested in integrating MPI with your own application,
+however, read on.
+
+The MPI library is designed to be as self-contained as possible.  You
+should be able to compile it with your favourite ANSI C compiler, and
+link it into your program directly.  If you are on a Unix system using
+the GNU C compiler (gcc), the following should work:
+
+% gcc -ansi -pedantic -Wall -O3 -c mpi.c
+
+The file 'mpi-config.h' defines several configurable parameters for
+the library, which you can adjust to suit your application.  At the
+time of this writing, the available options are:
+
+MP_IOFUNC       - Define true to include the mp_print() function, 
+                  which is moderately useful for debugging.  This
+                  implicitly includes <stdio.h>.  For most I/O, you
+		  probably should leave this off, and use the built
+		  in base conversion function mp_toradix().
+
+MP_MODARITH     - Define true to include the modular arithmetic
+                  functions.  If you don't need modular arithmetic
+                  in your application, you can set this to zero to
+                  leave out all the modular routines.
+
+MP_NUMTH        - Define true to include number theoretic functions
+                  mp_gcd(), mp_lcm(), and mp_invmod().
+
+MP_LOGTAB       - If true, the file "logtab.h" is included, which
+                  is basically a static table of base 2 logarithms.
+                  These are used to compute how big the buffers for
+                  radix conversion need to be.  If you set this false,
+                  the library includes <math.h> and uses log().  This
+                  typically forces you to link against math libraries.
+
+MP_MEMSET       - If true, use memset() to zero buffers.  If you run
+                  into weird alignment related bugs, set this to zero
+                  and an explicit loop will be used.
+
+MP_MEMCPY       - If true, use memcpy() to copy buffers.  If you run
+                  into weird alignment bugs, set this to zero and an
+                  explicit loop will be used.
+
+MP_CRYPTO       - If true, whenever arrays of digits are free'd, they
+                  are zeroed first.  This is useful if you're using
+                  the library in a cryptographic environment; however,
+                  it does add overhead to each free() operation.  For
+                  performance, if you don't care about zeroing your
+                  buffers, set this to false.
+
+MP_ARGCHK       - Set to 0, 1, or 2.  This defines how the argument
+                  checking macro, ARGCHK(), gets expanded.  If this 
+                  is set to zero, ARGCHK() expands to nothing; no 
+                  argument checks are performed.  If this is 1, the
+                  ARGCHK() macro expands to code that returns MP_BADARG
+                  or similar at runtime.  If it is 2, ARGCHK() expands 
+                  to an assert() call that aborts the program on a 
+                  bad input (this is the default, and is recommended)
+
+MP_DEBUG        - Turns on debugging output.  This is probably not at
+                  all useful unless you are debugging the library.  It
+                  can spit out a LOT of output.
+
+MP_DEFPREC      - The default precision of a newly-created mp_int, in
+                  digits.  The precision can be changed at runtime by
+                  the mp_set_prec() function, but this is its initial
+                  value.  The memory allocator rounds all requests to
+		  a multiple of this size.
+
+MP_SQUARE       - If this is set to a nonzero value, the mp_sqr()
+		  function will use an alternate algorithm that takes
+		  advantage of the redundant inner product computation
+		  when both multiplicand and multiplier are equal.
+
+		  This should be faster on most systems, but because
+		  the inner loop is a bit more complicated than for
+		  regular multiplication, it may actually turn out to
+		  be faster to just multiply directly on your system.
+		  If that is the case, set this to false, and mp_sqr()
+		  will just call mp_mul() directly.
+
+		  This applies to all uses of mp_sqr(), including the
+		  modular version mp_sqrmod().
+
+		  There is a script called time_multiply in the utils
+		  subdirectory that will perform timing tests between
+		  mp_mu() and mp_sqr() to determine which is better.
+		  Use this if you have doubts (by default, MP_SQUARE
+		  should probably be left on)
+
+MP_PTAB_SIZE    - When compiling mpprime.c, the code includes a set
+                  of small prime integers, used to quickly eliminate
+                  obviously non-prime values.  This directive sets 
+                  how big that table will be.  If you set it to zero,
+                  all the primes less than 2^16 will be included,
+                  so it is best to set this to something "reasonable".
+                  See the file 'primes.c' for more details.
+
+If you would like to use the mp_print() function (see above), be sure
+to define MP_IOFUNC in mpi-config.h.  Many of the test drivers in the
+'tests' subdirectory expect this to be defined (although the test
+driver 'mpi-test' doesn't need it)
+
+If all goes well, the library should compile without warnings using
+this combination.  You should, of course, make whatever adjustments
+you find necessary.  
+
+The MPI library distribution comes with several additional programs
+which are intended to demonstrate the use of the library, and provide
+a framework for testing it.  There are a handful of test driver
+programs, in the files named 'mptest-X.c', where X is a digit.  Also,
+there are some simple command-line utilities (in the 'utils'
+directory) for manipulating large numbers.  These include:
+
+basecvt.c       A radix-conversion program, supporting bases from
+                2 to 64 inclusive.
+
+exptmod.c       Computes arbitrary precision modular exponentiation
+                from the command line (exptmod a b m -> a^b (mod m))
+
+fact.c          Computes the factorial of an arbitrary precision
+                integer (iterative).
+
+gcd.c           Computes the greatest common divisor of two values.
+                If invoked as 'xgcd', also computes constants x and
+                y such that (a, b) = ax + by, in accordance with
+                Bezout's identity.
+
+invmod.c        Computes modular inverses
+
+isprime.c       Performs the Rabin-Miller probabilistic primality
+                test on a number.  Values which fail this test are
+                definitely composite, and those which pass are very
+                likely to be prime (although there are no guarantees)
+
+
+makeprime.c	Finds the first prime number greater than or equal
+		to the input value.
+
+metime.c	Performs modular exponentiation timing tests.
+
+mpicalc.c	Stacking calculator implemented around MPI (can be 
+		used to build automatic testing scripts that compare
+		against known-good implementations like 'dc' or 'bc')
+
+mulsqr.c	Tests the speed of mp_mul() vs. mp_sqr().
+
+primegen.c      Generates large (probable) primes.
+
+sieve.c         Implements the Sieve of Eratosthenes, using a big
+                bitmap, to generate a list of prime numbers.
+
+Most of these can be built from the Makefile that comes with the
+library.  Try 'make tools', if your environment supports it.  (If you
+are compiling on a Macintosh, I'm afraid you'll have to build them by
+hand -- fortunately, this is not difficult -- the library itself
+should compile just fine under Metrowerks CodeWarrior).
+
+
+Testing the Library
+-------------------
+
+Running 'make test' from the main directory of the MPI distribution
+should perform unit tests.  A set of unit tests are included in a
+program called 'mpi-test' (source in tests/mpi-test.c).  To invoke it
+manually, do this:
+
+	cd tests
+	make mpi-test
+	mpi-test list
+
+Alternatively, you can just run the 'all-tests' script that resides in
+the 'tests' directory.  If all the tests pass, you should see a
+message reading:
+
+        All tests passed
+
+If something went wrong, you'll get:
+
+        One or more tests failed.
+
+If this happens, there should be a file called 'test-errors.txt' in
+the 'tests' directory, which shows what values failed.  Any failure
+indicates a bug of some kind in the library, which needs to be fixed
+in order to guarantee accurate results.  If you get any such error,
+please let me know what platform and compiler you were using at the
+time, as well as which test vector failed, and what the error message
+it reported was.
+
+If you're on a system such as the Macintosh, where the standard Unix
+build tools don't work, you can build the 'mpi-test' program manually,
+and run it by hand.  This is tedious and obnoxious, sorry.
+
+There are some old manual testing programs in the 'tests' directory.
+I don't recommend using them, in general, but they are there if you
+want them.  The Makefiles don't know how to build these.  Read the
+comments at the top of each source file to see what the driver is
+supposed to test.  You probably don't need to do this; these programs
+were only written to help me as I was developing the library.
+
+The relevant files in the 'tests' directory are:
+
+mpi-test.c              The source for the test driver
+
+make-test-arrays        A Perl script to generate some of the internal
+                        data structures used by mpi-test.c
+
+test-arrays.txt         The source file for make-test-arrays
+
+all-tests               A Bourne shell script which runs all the
+                        tests in the mpi-test suite
+
+Note:   Several of the tests hard-wired into 'mpi-test' operate under
+----    the assumption that you are using at least a 16-bit mp_digit 
+        type.  If that is not true, several tests might fail, because 
+        of range problems with the maximum digit value.
+
+	Former versions of the library required modifications to the
+	mp_read_raw() function, which used mp_mul_d() to multiply by
+	256 in each step.  This is no longer necessary; the current
+	version uses s_mp_mul_2d() to do its shifting.
+
+Acknowledgements:
+----------------
+
+The algorithms used in this library were drawn primarily from Volume
+2 of Donald Knuth's magnum opus, _The Art of Computer Programming_, 
+"Semi-Numerical Methods".  Barrett's algorithm for modular reduction
+came from Menezes, Oorschot, and Vanstone's _Handbook of Applied
+Cryptography_, Chapter 14.
+
+Thanks are definitely due to the following helpful souls, who have
+helped find and get rid of bugs in the library:
+
+- Tom St. Denis has found and reported lots and lots of bugs of
+  various severity, and generally tests the code heavily.  Kudos
+  and thanks to Tom for his persistence, and his patience with 
+  my mistakes. :)
+
+- Bryan Olson helped fix a subtle bug in s_mp_div() that was making
+  certain corner cases break.
+
+- Nelson Bolyard <nelsonb@netscape.com> found a subtle division bug
+  in s_mp_div() that was badly hurting performance in some cases.
+
+- scroussette@yahoo.com spotted a NULL pointer indirection bug in 
+  mp_div_d() when the quotient was zero.
+
+- Tushar Udeshi <tudeshi@zyvex.com> found a sign-related bug in the
+  mp_sub() function that would give the incorrect sign to the result
+  of subtracting a positive from a negative, when the output parameter
+  was different from both the input parameters.
+
+If you spot something you think may be a bug, please let me know!
+Obviously, it's most helpful if you can give me some test data that
+cause the bug to occur, and a complete description of what happens.
+
+
+About the Author
+----------------
+
+This software was written by Michael J. Fromberger,
+  http://www.dartmouth.edu/~sting/
+
+See the MPI home page at
+  http://www.dartmouth.edu/~sting/mpi/
+
+This software is in the public domain.  It is entirely free, and you
+may use it and/or redistribute it for whatever purpose you choose;
+however, as free software, it is provided without warranty of any
+kind, not even the implied warranty of merchantability or fitness for
+a particular purpose.
+
+Last updated:  24-Dec-2002
diff --git a/mpi/logtab.h b/mpi/logtab.h
new file mode 100644
index 00000000..88d2afa6
--- /dev/null
+++ b/mpi/logtab.h
@@ -0,0 +1,20 @@
+const double s_logv_2[] = {
+   0.000000000, 0.000000000, 1.000000000, 0.630929754, 	/*  0  1  2  3 */
+   0.500000000, 0.430676558, 0.386852807, 0.356207187, 	/*  4  5  6  7 */
+   0.333333333, 0.315464877, 0.301029996, 0.289064826, 	/*  8  9 10 11 */
+   0.278942946, 0.270238154, 0.262649535, 0.255958025, 	/* 12 13 14 15 */
+   0.250000000, 0.244650542, 0.239812467, 0.235408913, 	/* 16 17 18 19 */
+   0.231378213, 0.227670249, 0.224243824, 0.221064729, 	/* 20 21 22 23 */
+   0.218104292, 0.215338279, 0.212746054, 0.210309918, 	/* 24 25 26 27 */
+   0.208014598, 0.205846832, 0.203795047, 0.201849087, 	/* 28 29 30 31 */
+   0.200000000, 0.198239863, 0.196561632, 0.194959022, 	/* 32 33 34 35 */
+   0.193426404, 0.191958720, 0.190551412, 0.189200360, 	/* 36 37 38 39 */
+   0.187901825, 0.186652411, 0.185449023, 0.184288833, 	/* 40 41 42 43 */
+   0.183169251, 0.182087900, 0.181042597, 0.180031327, 	/* 44 45 46 47 */
+   0.179052232, 0.178103594, 0.177183820, 0.176291434, 	/* 48 49 50 51 */
+   0.175425064, 0.174583430, 0.173765343, 0.172969690, 	/* 52 53 54 55 */
+   0.172195434, 0.171441601, 0.170707280, 0.169991616, 	/* 56 57 58 59 */
+   0.169293808, 0.168613099, 0.167948779, 0.167300179, 	/* 60 61 62 63 */
+   0.166666667
+};
+
diff --git a/mpi/make-logtab b/mpi/make-logtab
new file mode 100755
index 00000000..bb4eea39
--- /dev/null
+++ b/mpi/make-logtab
@@ -0,0 +1,31 @@
+#!/usr/bin/env perl
+
+#
+# make-logtab
+#
+# Generate a table of logarithms of 2 in various bases, for use in
+# estimating the output sizes of various bases.
+#
+# by Michael J. Fromberger <sting@linguist.dartmouth.edu>
+# Copyright (C) 1999 Michael J. Fromberger, All Rights Reserved
+#
+# $Id: make-logtab,v 1.2 2006/07/07 19:35:02 sting Exp $
+#
+
+$ARRAYNAME = $ENV{'ARRAYNAME'} || "s_logv_2";
+$ARRAYTYPE = $ENV{'ARRAYTYPE'} || "double";
+
+printf("const %s %s[] = {\n   %0.9f, %0.9f, ", 
+       $ARRAYTYPE, $ARRAYNAME, 0, 0);
+$brk = 2;
+for($ix = 2; $ix < 64; $ix++) {
+    printf("%0.9f, ", (log(2)/log($ix)));
+    $brk = ($brk + 1) & 3;
+    if(!$brk) {
+	printf("\t/* %2d %2d %2d %2d */\n   ",
+	       $ix - 3, $ix - 2, $ix - 1, $ix);
+    }
+}
+printf("%0.9f\n};\n\n", (log(2)/log($ix)));
+
+exit 0;
diff --git a/mpi/mpi-config.h b/mpi/mpi-config.h
new file mode 100644
index 00000000..9d3df110
--- /dev/null
+++ b/mpi/mpi-config.h
@@ -0,0 +1,85 @@
+/* Default configuration for MPI library */
+/* $Id: mpi-config.h,v 1.1 2004/02/08 04:29:29 sting Exp $ */
+
+#ifndef MPI_CONFIG_H_
+#define MPI_CONFIG_H_
+
+/*
+  For boolean options, 
+  0 = no
+  1 = yes
+
+  Other options are documented individually.
+
+ */
+
+#ifndef MP_IOFUNC
+#define MP_IOFUNC     0  /* include mp_print() ?                */
+#endif
+
+#ifndef MP_MODARITH
+#define MP_MODARITH   1  /* include modular arithmetic ?        */
+#endif
+
+#ifndef MP_NUMTH
+#define MP_NUMTH      1  /* include number theoretic functions? */
+#endif
+
+#ifndef MP_LOGTAB
+#define MP_LOGTAB     1  /* use table of logs instead of log()? */
+#endif
+
+#ifndef MP_MEMSET
+#define MP_MEMSET     1  /* use memset() to zero buffers?       */
+#endif
+
+#ifndef MP_MEMCPY
+#define MP_MEMCPY     1  /* use memcpy() to copy buffers?       */
+#endif
+
+#ifndef MP_CRYPTO
+#define MP_CRYPTO     0  /* erase memory on free?               */
+#endif
+
+#ifndef MP_ARGCHK
+/*
+  0 = no parameter checks
+  1 = runtime checks, continue execution and return an error to caller
+  2 = assertions; dump core on parameter errors
+ */
+#define MP_ARGCHK     2  /* how to check input arguments        */
+#endif
+
+#ifndef MP_DEBUG
+#define MP_DEBUG      0  /* print diagnostic output?            */
+#endif
+
+#ifndef MP_DEFPREC
+#define MP_DEFPREC    16 /* default precision, in digits        */
+#endif
+
+#ifndef MP_MACRO
+#define MP_MACRO      1  /* use macros for frequent calls?      */
+#endif
+
+#ifndef MP_SQUARE
+#define MP_SQUARE     1  /* use separate squaring code?         */
+#endif
+
+#ifndef MP_PTAB_SIZE
+/*
+  When building mpprime.c, we build in a table of small prime
+  values to use for primality testing.  The more you include,
+  the more space they take up.  See primes.c for the possible
+  values (currently 16, 32, 64, 128, 256, and 6542)
+ */
+#define MP_PTAB_SIZE  128  /* how many built-in primes?         */
+#endif
+
+#ifndef MP_COMPAT_MACROS
+#define MP_COMPAT_MACROS 0   /* define compatibility macros?    */
+#endif
+
+#endif /* ifndef MPI_CONFIG_H_ */
+
+
diff --git a/mpi/mpi-types.h b/mpi/mpi-types.h
new file mode 100644
index 00000000..e222d68d
--- /dev/null
+++ b/mpi/mpi-types.h
@@ -0,0 +1,17 @@
+/* Type definitions generated by 'types.pl' */
+
+typedef char               mp_sign;
+typedef unsigned short     mp_digit;  /* 2 byte type */
+typedef unsigned int       mp_word;   /* 4 byte type */
+typedef unsigned int       mp_size;
+typedef int                mp_err;
+
+#define MP_DIGIT_BIT       (CHAR_BIT*sizeof(mp_digit))
+#define MP_DIGIT_MAX       USHRT_MAX
+#define MP_WORD_BIT        (CHAR_BIT*sizeof(mp_word))
+#define MP_WORD_MAX        UINT_MAX
+
+#define RADIX              (MP_DIGIT_MAX+1)
+
+#define MP_DIGIT_SIZE      2
+#define DIGIT_FMT          "%04X"
diff --git a/mpi/mpi.c b/mpi/mpi.c
new file mode 100644
index 00000000..e754ed2d
--- /dev/null
+++ b/mpi/mpi.c
@@ -0,0 +1,4001 @@
+/*
+    mpi.c
+
+    by Michael J. Fromberger <http://www.dartmouth.edu/~sting/>
+    Copyright (C) 1998 Michael J. Fromberger, All Rights Reserved
+
+    Arbitrary precision integer arithmetic library
+
+    $Id: mpi.c,v 1.1 2004/02/08 04:29:29 sting Exp $
+ */
+
+#include "mpi.h"
+#include <stdlib.h>
+#include <string.h>
+#include <ctype.h>
+
+#if MP_DEBUG
+#include <stdio.h>
+
+#define DIAG(T,V) {fprintf(stderr,T);mp_print(V,stderr);fputc('\n',stderr);}
+#else
+#define DIAG(T,V)
+#endif
+
+/* 
+   If MP_LOGTAB is not defined, use the math library to compute the
+   logarithms on the fly.  Otherwise, use the static table below.
+   Pick which works best for your system.
+ */
+#if MP_LOGTAB
+
+/* {{{ s_logv_2[] - log table for 2 in various bases */
+
+/*
+  A table of the logs of 2 for various bases (the 0 and 1 entries of
+  this table are meaningless and should not be referenced).  
+
+  This table is used to compute output lengths for the mp_toradix()
+  function.  Since a number n in radix r takes up about log_r(n)
+  digits, we estimate the output size by taking the least integer
+  greater than log_r(n), where:
+
+  log_r(n) = log_2(n) * log_r(2)
+
+  This table, therefore, is a table of log_r(2) for 2 <= r <= 36,
+  which are the output bases supported.  
+ */
+
+#include "logtab.h"
+
+/* }}} */
+#define LOG_V_2(R)  s_logv_2[(R)]
+
+#else
+
+#include <math.h>
+#define LOG_V_2(R)  (log(2.0)/log(R))
+
+#endif
+
+/* Default precision for newly created mp_int's      */
+static unsigned int s_mp_defprec = MP_DEFPREC;
+
+/* {{{ Digit arithmetic macros */
+
+/*
+  When adding and multiplying digits, the results can be larger than
+  can be contained in an mp_digit.  Thus, an mp_word is used.  These
+  macros mask off the upper and lower digits of the mp_word (the
+  mp_word may be more than 2 mp_digits wide, but we only concern
+  ourselves with the low-order 2 mp_digits)
+
+  If your mp_word DOES have more than 2 mp_digits, you need to
+  uncomment the first line, and comment out the second.
+ */
+
+/* #define  CARRYOUT(W)  (((W)>>DIGIT_BIT)&MP_DIGIT_MAX) */
+#define  CARRYOUT(W)  ((W)>>DIGIT_BIT)
+#define  ACCUM(W)     ((W)&MP_DIGIT_MAX)
+
+/* }}} */
+
+/* {{{ Comparison constants */
+
+#define  MP_LT       -1
+#define  MP_EQ        0
+#define  MP_GT        1
+
+/* }}} */
+
+/* {{{ Constant strings */
+
+/* Constant strings returned by mp_strerror() */
+static const char *mp_err_string[] = {
+  "unknown result code",     /* say what?            */
+  "boolean true",            /* MP_OKAY, MP_YES      */
+  "boolean false",           /* MP_NO                */
+  "out of memory",           /* MP_MEM               */
+  "argument out of range",   /* MP_RANGE             */
+  "invalid input parameter", /* MP_BADARG            */
+  "result is undefined"      /* MP_UNDEF             */
+};
+
+/* Value to digit maps for radix conversion   */
+
+/* s_dmap_1 - standard digits and letters */
+static const char *s_dmap_1 = 
+  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/";
+
+#if 0
+/* s_dmap_2 - base64 ordering for digits  */
+static const char *s_dmap_2 =
+  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+#endif
+
+/* }}} */
+
+/* {{{ Static function declarations */
+
+/* 
+   If MP_MACRO is false, these will be defined as actual functions;
+   otherwise, suitable macro definitions will be used.  This works
+   around the fact that ANSI C89 doesn't support an 'inline' keyword
+   (although I hear C9x will ... about bloody time).  At present, the
+   macro definitions are identical to the function bodies, but they'll
+   expand in place, instead of generating a function call.
+
+   I chose these particular functions to be made into macros because
+   some profiling showed they are called a lot on a typical workload,
+   and yet they are primarily housekeeping.
+ */
+#if MP_MACRO == 0
+ void     s_mp_setz(mp_digit *dp, mp_size count); /* zero digits           */
+ void     s_mp_copy(mp_digit *sp, mp_digit *dp, mp_size count); /* copy    */
+ void    *s_mp_alloc(size_t nb, size_t ni);       /* general allocator     */
+ void     s_mp_free(void *ptr);                   /* general free function */
+#else
+
+ /* Even if these are defined as macros, we need to respect the settings
+    of the MP_MEMSET and MP_MEMCPY configuration options...
+  */
+ #if MP_MEMSET == 0
+  #define  s_mp_setz(dp, count) \
+       {int ix;for(ix=0;ix<(count);ix++)(dp)[ix]=0;}
+ #else
+  #define  s_mp_setz(dp, count) memset(dp, 0, (count) * sizeof(mp_digit))
+ #endif /* MP_MEMSET */
+
+ #if MP_MEMCPY == 0
+  #define  s_mp_copy(sp, dp, count) \
+       {int ix;for(ix=0;ix<(count);ix++)(dp)[ix]=(sp)[ix];}
+ #else
+  #define  s_mp_copy(sp, dp, count) memcpy(dp, sp, (count) * sizeof(mp_digit))
+ #endif /* MP_MEMCPY */
+
+ #define  s_mp_alloc(nb, ni)  calloc(nb, ni)
+ #define  s_mp_free(ptr) {if(ptr) free(ptr);}
+#endif /* MP_MACRO */
+
+mp_err   s_mp_grow(mp_int *mp, mp_size min);   /* increase allocated size */
+mp_err   s_mp_pad(mp_int *mp, mp_size min);    /* left pad with zeroes    */
+
+void     s_mp_clamp(mp_int *mp);               /* clip leading zeroes     */
+
+void     s_mp_exch(mp_int *a, mp_int *b);      /* swap a and b in place   */
+
+mp_err   s_mp_lshd(mp_int *mp, mp_size p);     /* left-shift by p digits  */
+void     s_mp_rshd(mp_int *mp, mp_size p);     /* right-shift by p digits */
+void     s_mp_div_2d(mp_int *mp, mp_digit d);  /* divide by 2^d in place  */
+void     s_mp_mod_2d(mp_int *mp, mp_digit d);  /* modulo 2^d in place     */
+mp_err   s_mp_mul_2d(mp_int *mp, mp_digit d);  /* multiply by 2^d in place*/
+void     s_mp_div_2(mp_int *mp);               /* divide by 2 in place    */
+mp_err   s_mp_mul_2(mp_int *mp);               /* multiply by 2 in place  */
+mp_digit s_mp_norm(mp_int *a, mp_int *b);      /* normalize for division  */
+mp_err   s_mp_add_d(mp_int *mp, mp_digit d);   /* unsigned digit addition */
+mp_err   s_mp_sub_d(mp_int *mp, mp_digit d);   /* unsigned digit subtract */
+mp_err   s_mp_mul_d(mp_int *mp, mp_digit d);   /* unsigned digit multiply */
+mp_err   s_mp_div_d(mp_int *mp, mp_digit d, mp_digit *r);
+		                               /* unsigned digit divide   */
+mp_err   s_mp_reduce(mp_int *x, mp_int *m, mp_int *mu);
+                                               /* Barrett reduction       */
+mp_err   s_mp_add(mp_int *a, mp_int *b);       /* magnitude addition      */
+mp_err   s_mp_sub(mp_int *a, mp_int *b);       /* magnitude subtract      */
+mp_err   s_mp_mul(mp_int *a, mp_int *b);       /* magnitude multiply      */
+#if 0
+void     s_mp_kmul(mp_digit *a, mp_digit *b, mp_digit *out, mp_size len);
+                                               /* multiply buffers in place */
+#endif
+#if MP_SQUARE
+mp_err   s_mp_sqr(mp_int *a);                  /* magnitude square        */
+#else
+#define  s_mp_sqr(a) s_mp_mul(a, a)
+#endif
+mp_err   s_mp_div(mp_int *a, mp_int *b);       /* magnitude divide        */
+mp_err   s_mp_2expt(mp_int *a, mp_digit k);    /* a = 2^k                 */
+int      s_mp_cmp(mp_int *a, mp_int *b);       /* magnitude comparison    */
+int      s_mp_cmp_d(mp_int *a, mp_digit d);    /* magnitude digit compare */
+int      s_mp_ispow2(mp_int *v);               /* is v a power of 2?      */
+int      s_mp_ispow2d(mp_digit d);             /* is d a power of 2?      */
+
+int      s_mp_tovalue(char ch, int r);          /* convert ch to value    */
+char     s_mp_todigit(int val, int r, int low); /* convert val to digit   */
+int      s_mp_outlen(int bits, int r);          /* output length in bytes */
+
+/* }}} */
+
+/* {{{ Default precision manipulation */
+
+unsigned int mp_get_prec(void)
+{
+  return s_mp_defprec;
+
+} /* end mp_get_prec() */
+
+void         mp_set_prec(unsigned int prec)
+{
+  if(prec == 0)
+    s_mp_defprec = MP_DEFPREC;
+  else
+    s_mp_defprec = prec;
+
+} /* end mp_set_prec() */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/* {{{ mp_init(mp) */
+
+/*
+  mp_init(mp)
+
+  Initialize a new zero-valued mp_int.  Returns MP_OKAY if successful,
+  MP_MEM if memory could not be allocated for the structure.
+ */
+
+mp_err mp_init(mp_int *mp)
+{
+  return mp_init_size(mp, s_mp_defprec);
+
+} /* end mp_init() */
+
+/* }}} */
+
+/* {{{ mp_init_array(mp[], count) */
+
+mp_err mp_init_array(mp_int mp[], int count)
+{
+  mp_err  res;
+  int     pos;
+
+  ARGCHK(mp !=NULL && count > 0, MP_BADARG);
+
+  for(pos = 0; pos < count; ++pos) {
+    if((res = mp_init(&mp[pos])) != MP_OKAY)
+      goto CLEANUP;
+  }
+
+  return MP_OKAY;
+
+ CLEANUP:
+  while(--pos >= 0) 
+    mp_clear(&mp[pos]);
+
+  return res;
+
+} /* end mp_init_array() */
+
+/* }}} */
+
+/* {{{ mp_init_size(mp, prec) */
+
+/*
+  mp_init_size(mp, prec)
+
+  Initialize a new zero-valued mp_int with at least the given
+  precision; returns MP_OKAY if successful, or MP_MEM if memory could
+  not be allocated for the structure.
+ */
+
+mp_err mp_init_size(mp_int *mp, mp_size prec)
+{
+  ARGCHK(mp != NULL && prec > 0, MP_BADARG);
+
+  if((DIGITS(mp) = s_mp_alloc(prec, sizeof(mp_digit))) == NULL)
+    return MP_MEM;
+
+  SIGN(mp) = MP_ZPOS;
+  USED(mp) = 1;
+  ALLOC(mp) = prec;
+
+  return MP_OKAY;
+
+} /* end mp_init_size() */
+
+/* }}} */
+
+/* {{{ mp_init_copy(mp, from) */
+
+/*
+  mp_init_copy(mp, from)
+
+  Initialize mp as an exact copy of from.  Returns MP_OKAY if
+  successful, MP_MEM if memory could not be allocated for the new
+  structure.
+ */
+
+mp_err mp_init_copy(mp_int *mp, mp_int *from)
+{
+  ARGCHK(mp != NULL && from != NULL, MP_BADARG);
+
+  if(mp == from)
+    return MP_OKAY;
+
+  if((DIGITS(mp) = s_mp_alloc(USED(from), sizeof(mp_digit))) == NULL)
+    return MP_MEM;
+
+  s_mp_copy(DIGITS(from), DIGITS(mp), USED(from));
+  USED(mp) = USED(from);
+  ALLOC(mp) = USED(from);
+  SIGN(mp) = SIGN(from);
+
+  return MP_OKAY;
+
+} /* end mp_init_copy() */
+
+/* }}} */
+
+/* {{{ mp_copy(from, to) */
+
+/*
+  mp_copy(from, to)
+
+  Copies the mp_int 'from' to the mp_int 'to'.  It is presumed that
+  'to' has already been initialized (if not, use mp_init_copy()
+  instead). If 'from' and 'to' are identical, nothing happens.
+ */
+
+mp_err mp_copy(mp_int *from, mp_int *to)
+{
+  ARGCHK(from != NULL && to != NULL, MP_BADARG);
+
+  if(from == to)
+    return MP_OKAY;
+
+  { /* copy */
+    mp_digit   *tmp;
+
+    /*
+      If the allocated buffer in 'to' already has enough space to hold
+      all the used digits of 'from', we'll re-use it to avoid hitting
+      the memory allocater more than necessary; otherwise, we'd have
+      to grow anyway, so we just allocate a hunk and make the copy as
+      usual
+     */
+    if(ALLOC(to) >= USED(from)) {
+      s_mp_setz(DIGITS(to) + USED(from), ALLOC(to) - USED(from));
+      s_mp_copy(DIGITS(from), DIGITS(to), USED(from));
+      
+    } else {
+      if((tmp = s_mp_alloc(USED(from), sizeof(mp_digit))) == NULL)
+	return MP_MEM;
+
+      s_mp_copy(DIGITS(from), tmp, USED(from));
+
+      if(DIGITS(to) != NULL) {
+#if MP_CRYPTO
+	s_mp_setz(DIGITS(to), ALLOC(to));
+#endif
+	s_mp_free(DIGITS(to));
+      }
+
+      DIGITS(to) = tmp;
+      ALLOC(to) = USED(from);
+    }
+
+    /* Copy the precision and sign from the original */
+    USED(to) = USED(from);
+    SIGN(to) = SIGN(from);
+  } /* end copy */
+
+  return MP_OKAY;
+
+} /* end mp_copy() */
+
+/* }}} */
+
+/* {{{ mp_exch(mp1, mp2) */
+
+/*
+  mp_exch(mp1, mp2)
+
+  Exchange mp1 and mp2 without allocating any intermediate memory
+  (well, unless you count the stack space needed for this call and the
+  locals it creates...).  This cannot fail.
+ */
+
+void mp_exch(mp_int *mp1, mp_int *mp2)
+{
+#if MP_ARGCHK == 2
+  assert(mp1 != NULL && mp2 != NULL);
+#else
+  if(mp1 == NULL || mp2 == NULL)
+    return;
+#endif
+
+  s_mp_exch(mp1, mp2);
+
+} /* end mp_exch() */
+
+/* }}} */
+
+/* {{{ mp_clear(mp) */
+
+/*
+  mp_clear(mp)
+
+  Release the storage used by an mp_int, and void its fields so that
+  if someone calls mp_clear() again for the same int later, we won't
+  get tollchocked.
+ */
+
+void   mp_clear(mp_int *mp)
+{
+  if(mp == NULL)
+    return;
+
+  if(DIGITS(mp) != NULL) {
+#if MP_CRYPTO
+    s_mp_setz(DIGITS(mp), ALLOC(mp));
+#endif
+    s_mp_free(DIGITS(mp));
+    DIGITS(mp) = NULL;
+  }
+
+  USED(mp) = 0;
+  ALLOC(mp) = 0;
+
+} /* end mp_clear() */
+
+/* }}} */
+
+/* {{{ mp_clear_array(mp[], count) */
+
+void   mp_clear_array(mp_int mp[], int count)
+{
+  ARGCHK(mp != NULL && count > 0, MP_BADARG);
+
+  while(--count >= 0) 
+    mp_clear(&mp[count]);
+
+} /* end mp_clear_array() */
+
+/* }}} */
+
+/* {{{ mp_zero(mp) */
+
+/*
+  mp_zero(mp) 
+
+  Set mp to zero.  Does not change the allocated size of the structure,
+  and therefore cannot fail (except on a bad argument, which we ignore)
+ */
+void   mp_zero(mp_int *mp)
+{
+  if(mp == NULL)
+    return;
+
+  s_mp_setz(DIGITS(mp), ALLOC(mp));
+  USED(mp) = 1;
+  SIGN(mp) = MP_ZPOS;
+
+} /* end mp_zero() */
+
+/* }}} */
+
+/* {{{ mp_set(mp, d) */
+
+void   mp_set(mp_int *mp, mp_digit d)
+{
+  if(mp == NULL)
+    return;
+
+  mp_zero(mp);
+  DIGIT(mp, 0) = d;
+
+} /* end mp_set() */
+
+/* }}} */
+
+/* {{{ mp_set_int(mp, z) */
+
+mp_err mp_set_int(mp_int *mp, long z)
+{
+  int            ix;
+  unsigned long  v = abs(z);
+  mp_err         res;
+
+  ARGCHK(mp != NULL, MP_BADARG);
+
+  mp_zero(mp);
+  if(z == 0)
+    return MP_OKAY;  /* shortcut for zero */
+
+  for(ix = sizeof(long) - 1; ix >= 0; ix--) {
+
+    if((res = s_mp_mul_2d(mp, CHAR_BIT)) != MP_OKAY)
+      return res;
+
+    res = s_mp_add_d(mp, 
+		     (mp_digit)((v >> (ix * CHAR_BIT)) & UCHAR_MAX));
+    if(res != MP_OKAY)
+      return res;
+
+  }
+
+  if(z < 0)
+    SIGN(mp) = MP_NEG;
+
+  return MP_OKAY;
+
+} /* end mp_set_int() */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/* {{{ Digit arithmetic */
+
+/* {{{ mp_add_d(a, d, b) */
+
+/*
+  mp_add_d(a, d, b)
+
+  Compute the sum b = a + d, for a single digit d.  Respects the sign of
+  its primary addend (single digits are unsigned anyway).
+ */
+
+mp_err mp_add_d(mp_int *a, mp_digit d, mp_int *b)
+{
+  mp_err   res = MP_OKAY;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  if((res = mp_copy(a, b)) != MP_OKAY)
+    return res;
+
+  if(SIGN(b) == MP_ZPOS) {
+    res = s_mp_add_d(b, d);
+  } else if(s_mp_cmp_d(b, d) >= 0) {
+    res = s_mp_sub_d(b, d);
+  } else {
+    SIGN(b) = MP_ZPOS;
+
+    DIGIT(b, 0) = d - DIGIT(b, 0);
+  }
+
+  return res;
+
+} /* end mp_add_d() */
+
+/* }}} */
+
+/* {{{ mp_sub_d(a, d, b) */
+
+/*
+  mp_sub_d(a, d, b)
+
+  Compute the difference b = a - d, for a single digit d.  Respects the
+  sign of its subtrahend (single digits are unsigned anyway).
+ */
+
+mp_err mp_sub_d(mp_int *a, mp_digit d, mp_int *b)
+{
+  mp_err   res;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  if((res = mp_copy(a, b)) != MP_OKAY)
+    return res;
+
+  if(SIGN(b) == MP_NEG) {
+    if((res = s_mp_add_d(b, d)) != MP_OKAY)
+      return res;
+
+  } else if(s_mp_cmp_d(b, d) >= 0) {
+    if((res = s_mp_sub_d(b, d)) != MP_OKAY)
+      return res;
+
+  } else {
+    mp_neg(b, b);
+
+    DIGIT(b, 0) = d - DIGIT(b, 0);
+    SIGN(b) = MP_NEG;
+  }
+
+  if(s_mp_cmp_d(b, 0) == 0)
+    SIGN(b) = MP_ZPOS;
+
+  return MP_OKAY;
+
+} /* end mp_sub_d() */
+
+/* }}} */
+
+/* {{{ mp_mul_d(a, d, b) */
+
+/*
+  mp_mul_d(a, d, b)
+
+  Compute the product b = a * d, for a single digit d.  Respects the sign
+  of its multiplicand (single digits are unsigned anyway)
+ */
+
+mp_err mp_mul_d(mp_int *a, mp_digit d, mp_int *b)
+{
+  mp_err  res;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  if(d == 0) {
+    mp_zero(b);
+    return MP_OKAY;
+  }
+
+  if((res = mp_copy(a, b)) != MP_OKAY)
+    return res;
+
+  res = s_mp_mul_d(b, d);
+
+  return res;
+
+} /* end mp_mul_d() */
+
+/* }}} */
+
+/* {{{ mp_mul_2(a, c) */
+
+mp_err mp_mul_2(mp_int *a, mp_int *c)
+{
+  mp_err  res;
+
+  ARGCHK(a != NULL && c != NULL, MP_BADARG);
+
+  if((res = mp_copy(a, c)) != MP_OKAY)
+    return res;
+
+  return s_mp_mul_2(c);
+
+} /* end mp_mul_2() */
+
+/* }}} */
+
+/* {{{ mp_div_d(a, d, q, r) */
+
+/*
+  mp_div_d(a, d, q, r)
+
+  Compute the quotient q = a / d and remainder r = a mod d, for a
+  single digit d.  Respects the sign of its divisor (single digits are
+  unsigned anyway).
+ */
+
+mp_err mp_div_d(mp_int *a, mp_digit d, mp_int *q, mp_digit *r)
+{
+  mp_err   res;
+  mp_digit rem;
+  int      pow;
+
+  ARGCHK(a != NULL, MP_BADARG);
+
+  if(d == 0)
+    return MP_RANGE;
+
+  /* Shortcut for powers of two ... */
+  if((pow = s_mp_ispow2d(d)) >= 0) {
+    mp_digit  mask;
+
+    mask = (1 << pow) - 1;
+    rem = DIGIT(a, 0) & mask;
+
+    if(q) {
+      mp_copy(a, q);
+      s_mp_div_2d(q, pow);
+    }
+
+    if(r)
+      *r = rem;
+
+    return MP_OKAY;
+  }
+
+  /*
+    If the quotient is actually going to be returned, we'll try to
+    avoid hitting the memory allocator by copying the dividend into it
+    and doing the division there.  This can't be any _worse_ than
+    always copying, and will sometimes be better (since it won't make
+    another copy)
+
+    If it's not going to be returned, we need to allocate a temporary
+    to hold the quotient, which will just be discarded.
+   */
+  if(q) {
+    if((res = mp_copy(a, q)) != MP_OKAY)
+      return res;
+
+    res = s_mp_div_d(q, d, &rem);
+    if(s_mp_cmp_d(q, 0) == MP_EQ)
+      SIGN(q) = MP_ZPOS;
+
+  } else {
+    mp_int  qp;
+
+    if((res = mp_init_copy(&qp, a)) != MP_OKAY)
+      return res;
+
+    res = s_mp_div_d(&qp, d, &rem);
+    if(s_mp_cmp_d(&qp, 0) == 0)
+      SIGN(&qp) = MP_ZPOS;
+
+    mp_clear(&qp);
+  }
+
+  if(r)
+    *r = rem;
+
+  return res;
+
+} /* end mp_div_d() */
+
+/* }}} */
+
+/* {{{ mp_div_2(a, c) */
+
+/*
+  mp_div_2(a, c)
+
+  Compute c = a / 2, disregarding the remainder.
+ */
+
+mp_err mp_div_2(mp_int *a, mp_int *c)
+{
+  mp_err  res;
+
+  ARGCHK(a != NULL && c != NULL, MP_BADARG);
+
+  if((res = mp_copy(a, c)) != MP_OKAY)
+    return res;
+
+  s_mp_div_2(c);
+
+  return MP_OKAY;
+
+} /* end mp_div_2() */
+
+/* }}} */
+
+/* {{{ mp_expt_d(a, d, b) */
+
+mp_err mp_expt_d(mp_int *a, mp_digit d, mp_int *c)
+{
+  mp_int   s, x;
+  mp_err   res;
+  mp_sign  cs = MP_ZPOS;
+
+  ARGCHK(a != NULL && c != NULL, MP_BADARG);
+
+  if((res = mp_init(&s)) != MP_OKAY)
+    return res;
+  if((res = mp_init_copy(&x, a)) != MP_OKAY)
+    goto X;
+
+  DIGIT(&s, 0) = 1;
+
+  if((d % 2) == 1)
+    cs = SIGN(a);
+
+  while(d != 0) {
+    if(d & 1) {
+      if((res = s_mp_mul(&s, &x)) != MP_OKAY)
+	goto CLEANUP;
+    }
+
+    d >>= 1;
+
+    if((res = s_mp_sqr(&x)) != MP_OKAY)
+      goto CLEANUP;
+  }
+
+  SIGN(&s) = cs;
+
+  s_mp_exch(&s, c);
+
+CLEANUP:
+  mp_clear(&x);
+X:
+  mp_clear(&s);
+
+  return res;
+
+} /* end mp_expt_d() */
+
+/* }}} */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/* {{{ Full arithmetic */
+
+/* {{{ mp_abs(a, b) */
+
+/*
+  mp_abs(a, b)
+
+  Compute b = |a|.  'a' and 'b' may be identical.
+ */
+
+mp_err mp_abs(mp_int *a, mp_int *b)
+{
+  mp_err   res;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  if((res = mp_copy(a, b)) != MP_OKAY)
+    return res;
+
+  SIGN(b) = MP_ZPOS;
+
+  return MP_OKAY;
+
+} /* end mp_abs() */
+
+/* }}} */
+
+/* {{{ mp_neg(a, b) */
+
+/*
+  mp_neg(a, b)
+
+  Compute b = -a.  'a' and 'b' may be identical.
+ */
+
+mp_err mp_neg(mp_int *a, mp_int *b)
+{
+  mp_err   res;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  if((res = mp_copy(a, b)) != MP_OKAY)
+    return res;
+
+  if(s_mp_cmp_d(b, 0) == MP_EQ) 
+    SIGN(b) = MP_ZPOS;
+  else 
+    SIGN(b) = (SIGN(b) == MP_NEG) ? MP_ZPOS : MP_NEG;
+
+  return MP_OKAY;
+
+} /* end mp_neg() */
+
+/* }}} */
+
+/* {{{ mp_add(a, b, c) */
+
+/*
+  mp_add(a, b, c)
+
+  Compute c = a + b.  All parameters may be identical.
+ */
+
+mp_err mp_add(mp_int *a, mp_int *b, mp_int *c)
+{
+  mp_err  res;
+  int     cmp;
+
+  ARGCHK(a != NULL && b != NULL && c != NULL, MP_BADARG);
+
+  if(SIGN(a) == SIGN(b)) { /* same sign:  add values, keep sign */
+
+    /* Commutativity of addition lets us do this in either order,
+       so we avoid having to use a temporary even if the result 
+       is supposed to replace the output
+     */
+    if(c == b) {
+      if((res = s_mp_add(c, a)) != MP_OKAY)
+	return res;
+    } else {
+      if(c != a && (res = mp_copy(a, c)) != MP_OKAY)
+	return res;
+
+      if((res = s_mp_add(c, b)) != MP_OKAY) 
+	return res;
+    }
+
+  } else if((cmp = s_mp_cmp(a, b)) > 0) {  /* different sign: a > b   */
+
+    /* If the output is going to be clobbered, we will use a temporary
+       variable; otherwise, we'll do it without touching the memory 
+       allocator at all, if possible
+     */
+    if(c == b) {
+      mp_int  tmp;
+
+      if((res = mp_init_copy(&tmp, a)) != MP_OKAY)
+	return res;
+      if((res = s_mp_sub(&tmp, b)) != MP_OKAY) {
+	mp_clear(&tmp);
+	return res;
+      }
+
+      s_mp_exch(&tmp, c);
+      mp_clear(&tmp);
+
+    } else {
+
+      if(c != a && (res = mp_copy(a, c)) != MP_OKAY)
+	return res;
+      if((res = s_mp_sub(c, b)) != MP_OKAY)
+	return res;
+
+    }
+
+  } else if(cmp == 0) {             /* different sign, a == b   */
+
+    mp_zero(c);
+    return MP_OKAY;
+
+  } else {                          /* different sign: a < b    */
+
+    /* See above... */
+    if(c == a) {
+      mp_int  tmp;
+
+      if((res = mp_init_copy(&tmp, b)) != MP_OKAY)
+	return res;
+      if((res = s_mp_sub(&tmp, a)) != MP_OKAY) {
+	mp_clear(&tmp);
+	return res;
+      }
+
+      s_mp_exch(&tmp, c);
+      mp_clear(&tmp);
+
+    } else {
+
+      if(c != b && (res = mp_copy(b, c)) != MP_OKAY)
+	return res;
+      if((res = s_mp_sub(c, a)) != MP_OKAY)
+	return res;
+
+    }
+  }
+
+  if(USED(c) == 1 && DIGIT(c, 0) == 0)
+    SIGN(c) = MP_ZPOS;
+
+  return MP_OKAY;
+
+} /* end mp_add() */
+
+/* }}} */
+
+/* {{{ mp_sub(a, b, c) */
+
+/*
+  mp_sub(a, b, c)
+
+  Compute c = a - b.  All parameters may be identical.
+ */
+
+mp_err mp_sub(mp_int *a, mp_int *b, mp_int *c)
+{
+  mp_err  res;
+  int     cmp;
+
+  ARGCHK(a != NULL && b != NULL && c != NULL, MP_BADARG);
+
+  if(SIGN(a) != SIGN(b)) {
+    if(c == a) {
+      if((res = s_mp_add(c, b)) != MP_OKAY)
+	return res;
+    } else {
+      if(c != b && ((res = mp_copy(b, c)) != MP_OKAY))
+	return res;
+      if((res = s_mp_add(c, a)) != MP_OKAY)
+	return res;
+      SIGN(c) = SIGN(a);
+    }
+
+  } else if((cmp = s_mp_cmp(a, b)) > 0) { /* Same sign, a > b */
+    if(c == b) {
+      mp_int  tmp;
+
+      if((res = mp_init_copy(&tmp, a)) != MP_OKAY)
+	return res;
+      if((res = s_mp_sub(&tmp, b)) != MP_OKAY) {
+	mp_clear(&tmp);
+	return res;
+      }
+      s_mp_exch(&tmp, c);
+      mp_clear(&tmp);
+
+    } else {
+      if(c != a && ((res = mp_copy(a, c)) != MP_OKAY))
+	return res;
+
+      if((res = s_mp_sub(c, b)) != MP_OKAY)
+	return res;
+    }
+
+  } else if(cmp == 0) {  /* Same sign, equal magnitude */
+    mp_zero(c);
+    return MP_OKAY;
+
+  } else {               /* Same sign, b > a */
+    if(c == a) {
+      mp_int  tmp;
+
+      if((res = mp_init_copy(&tmp, b)) != MP_OKAY)
+	return res;
+
+      if((res = s_mp_sub(&tmp, a)) != MP_OKAY) {
+	mp_clear(&tmp);
+	return res;
+      }
+      s_mp_exch(&tmp, c);
+      mp_clear(&tmp);
+
+    } else {
+      if(c != b && ((res = mp_copy(b, c)) != MP_OKAY)) 
+	return res;
+
+      if((res = s_mp_sub(c, a)) != MP_OKAY)
+	return res;
+    }
+
+    SIGN(c) = !SIGN(b);
+  }
+
+  if(USED(c) == 1 && DIGIT(c, 0) == 0)
+    SIGN(c) = MP_ZPOS;
+
+  return MP_OKAY;
+
+} /* end mp_sub() */
+
+/* }}} */
+
+/* {{{ mp_mul(a, b, c) */
+
+/*
+  mp_mul(a, b, c)
+
+  Compute c = a * b.  All parameters may be identical.
+ */
+
+mp_err mp_mul(mp_int *a, mp_int *b, mp_int *c)
+{
+  mp_err   res;
+  mp_sign  sgn;
+
+  ARGCHK(a != NULL && b != NULL && c != NULL, MP_BADARG);
+
+  sgn = (SIGN(a) == SIGN(b)) ? MP_ZPOS : MP_NEG;
+
+  if(c == b) {
+    if((res = s_mp_mul(c, a)) != MP_OKAY)
+      return res;
+
+  } else {
+    if((res = mp_copy(a, c)) != MP_OKAY)
+      return res;
+
+    if((res = s_mp_mul(c, b)) != MP_OKAY)
+      return res;
+  }
+  
+  if(sgn == MP_ZPOS || s_mp_cmp_d(c, 0) == MP_EQ)
+    SIGN(c) = MP_ZPOS;
+  else
+    SIGN(c) = sgn;
+  
+  return MP_OKAY;
+
+} /* end mp_mul() */
+
+/* }}} */
+
+/* {{{ mp_mul_2d(a, d, c) */
+
+/*
+  mp_mul_2d(a, d, c)
+
+  Compute c = a * 2^d.  a may be the same as c.
+ */
+
+mp_err mp_mul_2d(mp_int *a, mp_digit d, mp_int *c)
+{
+  mp_err   res;
+
+  ARGCHK(a != NULL && c != NULL, MP_BADARG);
+
+  if((res = mp_copy(a, c)) != MP_OKAY)
+    return res;
+
+  if(d == 0)
+    return MP_OKAY;
+
+  return s_mp_mul_2d(c, d);
+
+} /* end mp_mul() */
+
+/* }}} */
+
+/* {{{ mp_sqr(a, b) */
+
+#if MP_SQUARE
+mp_err mp_sqr(mp_int *a, mp_int *b)
+{
+  mp_err   res;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  if((res = mp_copy(a, b)) != MP_OKAY)
+    return res;
+
+  if((res = s_mp_sqr(b)) != MP_OKAY)
+    return res;
+
+  SIGN(b) = MP_ZPOS;
+
+  return MP_OKAY;
+
+} /* end mp_sqr() */
+#endif
+
+/* }}} */
+
+/* {{{ mp_div(a, b, q, r) */
+
+/*
+  mp_div(a, b, q, r)
+
+  Compute q = a / b and r = a mod b.  Input parameters may be re-used
+  as output parameters.  If q or r is NULL, that portion of the
+  computation will be discarded (although it will still be computed)
+
+  Pay no attention to the hacker behind the curtain.
+ */
+
+mp_err mp_div(mp_int *a, mp_int *b, mp_int *q, mp_int *r)
+{
+  mp_err   res;
+  mp_int   qtmp, rtmp;
+  int      cmp;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  if(mp_cmp_z(b) == MP_EQ)
+    return MP_RANGE;
+
+  /* If a <= b, we can compute the solution without division, and
+     avoid any memory allocation
+   */
+  if((cmp = s_mp_cmp(a, b)) < 0) {
+    if(r) {
+      if((res = mp_copy(a, r)) != MP_OKAY)
+	return res;
+    }
+
+    if(q) 
+      mp_zero(q);
+
+    return MP_OKAY;
+
+  } else if(cmp == 0) {
+
+    /* Set quotient to 1, with appropriate sign */
+    if(q) {
+      int qneg = (SIGN(a) != SIGN(b));
+
+      mp_set(q, 1);
+      if(qneg)
+	SIGN(q) = MP_NEG;
+    }
+
+    if(r)
+      mp_zero(r);
+
+    return MP_OKAY;
+  }
+
+  /* If we get here, it means we actually have to do some division */
+
+  /* Set up some temporaries... */
+  if((res = mp_init_copy(&qtmp, a)) != MP_OKAY)
+    return res;
+  if((res = mp_init_copy(&rtmp, b)) != MP_OKAY)
+    goto CLEANUP;
+
+  if((res = s_mp_div(&qtmp, &rtmp)) != MP_OKAY)
+    goto CLEANUP;
+
+  /* Compute the signs for the output  */
+  SIGN(&rtmp) = SIGN(a); /* Sr = Sa              */
+  if(SIGN(a) == SIGN(b))
+    SIGN(&qtmp) = MP_ZPOS;  /* Sq = MP_ZPOS if Sa = Sb */
+  else
+    SIGN(&qtmp) = MP_NEG;   /* Sq = MP_NEG if Sa != Sb */
+
+  if(s_mp_cmp_d(&qtmp, 0) == MP_EQ)
+    SIGN(&qtmp) = MP_ZPOS;
+  if(s_mp_cmp_d(&rtmp, 0) == MP_EQ)
+    SIGN(&rtmp) = MP_ZPOS;
+
+  /* Copy output, if it is needed      */
+  if(q) 
+    s_mp_exch(&qtmp, q);
+
+  if(r) 
+    s_mp_exch(&rtmp, r);
+
+CLEANUP:
+  mp_clear(&rtmp);
+  mp_clear(&qtmp);
+
+  return res;
+
+} /* end mp_div() */
+
+/* }}} */
+
+/* {{{ mp_div_2d(a, d, q, r) */
+
+mp_err mp_div_2d(mp_int *a, mp_digit d, mp_int *q, mp_int *r)
+{
+  mp_err  res;
+
+  ARGCHK(a != NULL, MP_BADARG);
+
+  if(q) {
+    if((res = mp_copy(a, q)) != MP_OKAY)
+      return res;
+
+    s_mp_div_2d(q, d);
+  }
+
+  if(r) {
+    if((res = mp_copy(a, r)) != MP_OKAY)
+      return res;
+
+    s_mp_mod_2d(r, d);
+  }
+
+  return MP_OKAY;
+
+} /* end mp_div_2d() */
+
+/* }}} */
+
+/* {{{ mp_expt(a, b, c) */
+
+/*
+  mp_expt(a, b, c)
+
+  Compute c = a ** b, that is, raise a to the b power.  Uses a
+  standard iterative square-and-multiply technique.
+ */
+
+mp_err mp_expt(mp_int *a, mp_int *b, mp_int *c)
+{
+  mp_int   s, x;
+  mp_err   res;
+  mp_digit d;
+  int      dig, bit;
+
+  ARGCHK(a != NULL && b != NULL && c != NULL, MP_BADARG);
+
+  if(mp_cmp_z(b) < 0)
+    return MP_RANGE;
+
+  if((res = mp_init(&s)) != MP_OKAY)
+    return res;
+
+  mp_set(&s, 1);
+
+  if((res = mp_init_copy(&x, a)) != MP_OKAY)
+    goto X;
+
+  /* Loop over low-order digits in ascending order */
+  for(dig = 0; dig < (USED(b) - 1); dig++) {
+    d = DIGIT(b, dig);
+
+    /* Loop over bits of each non-maximal digit */
+    for(bit = 0; bit < DIGIT_BIT; bit++) {
+      if(d & 1) {
+	if((res = s_mp_mul(&s, &x)) != MP_OKAY) 
+	  goto CLEANUP;
+      }
+
+      d >>= 1;
+      
+      if((res = s_mp_sqr(&x)) != MP_OKAY)
+	goto CLEANUP;
+    }
+  }
+
+  /* Consider now the last digit... */
+  d = DIGIT(b, dig);
+
+  while(d) {
+    if(d & 1) {
+      if((res = s_mp_mul(&s, &x)) != MP_OKAY)
+	goto CLEANUP;
+    }
+
+    d >>= 1;
+
+    if((res = s_mp_sqr(&x)) != MP_OKAY)
+      goto CLEANUP;
+  }
+  
+  if(mp_iseven(b))
+    SIGN(&s) = SIGN(a);
+
+  res = mp_copy(&s, c);
+
+CLEANUP:
+  mp_clear(&x);
+X:
+  mp_clear(&s);
+
+  return res;
+
+} /* end mp_expt() */
+
+/* }}} */
+
+/* {{{ mp_2expt(a, k) */
+
+/* Compute a = 2^k */
+
+mp_err mp_2expt(mp_int *a, mp_digit k)
+{
+  ARGCHK(a != NULL, MP_BADARG);
+
+  return s_mp_2expt(a, k);
+
+} /* end mp_2expt() */
+
+/* }}} */
+
+/* {{{ mp_mod(a, m, c) */
+
+/*
+  mp_mod(a, m, c)
+
+  Compute c = a (mod m).  Result will always be 0 <= c < m.
+ */
+
+mp_err mp_mod(mp_int *a, mp_int *m, mp_int *c)
+{
+  mp_err  res;
+  int     mag;
+
+  ARGCHK(a != NULL && m != NULL && c != NULL, MP_BADARG);
+
+  if(SIGN(m) == MP_NEG)
+    return MP_RANGE;
+
+  /*
+     If |a| > m, we need to divide to get the remainder and take the
+     absolute value.  
+
+     If |a| < m, we don't need to do any division, just copy and adjust
+     the sign (if a is negative).
+
+     If |a| == m, we can simply set the result to zero.
+
+     This order is intended to minimize the average path length of the
+     comparison chain on common workloads -- the most frequent cases are
+     that |a| != m, so we do those first.
+   */
+  if((mag = s_mp_cmp(a, m)) > 0) {
+    if((res = mp_div(a, m, NULL, c)) != MP_OKAY)
+      return res;
+    
+    if(SIGN(c) == MP_NEG) {
+      if((res = mp_add(c, m, c)) != MP_OKAY)
+	return res;
+    }
+
+  } else if(mag < 0) {
+    if((res = mp_copy(a, c)) != MP_OKAY)
+      return res;
+
+    if(mp_cmp_z(a) < 0) {
+      if((res = mp_add(c, m, c)) != MP_OKAY)
+	return res;
+
+    }
+    
+  } else {
+    mp_zero(c);
+
+  }
+
+  return MP_OKAY;
+
+} /* end mp_mod() */
+
+/* }}} */
+
+/* {{{ mp_mod_d(a, d, c) */
+
+/*
+  mp_mod_d(a, d, c)
+
+  Compute c = a (mod d).  Result will always be 0 <= c < d
+ */
+mp_err mp_mod_d(mp_int *a, mp_digit d, mp_digit *c)
+{
+  mp_err   res;
+  mp_digit rem;
+
+  ARGCHK(a != NULL && c != NULL, MP_BADARG);
+
+  if(s_mp_cmp_d(a, d) > 0) {
+    if((res = mp_div_d(a, d, NULL, &rem)) != MP_OKAY)
+      return res;
+
+  } else {
+    if(SIGN(a) == MP_NEG)
+      rem = d - DIGIT(a, 0);
+    else
+      rem = DIGIT(a, 0);
+  }
+
+  if(c)
+    *c = rem;
+
+  return MP_OKAY;
+
+} /* end mp_mod_d() */
+
+/* }}} */
+
+/* {{{ mp_sqrt(a, b) */
+
+/*
+  mp_sqrt(a, b)
+
+  Compute the integer square root of a, and store the result in b.
+  Uses an integer-arithmetic version of Newton's iterative linear
+  approximation technique to determine this value; the result has the
+  following two properties:
+
+     b^2 <= a
+     (b+1)^2 >= a
+
+  It is a range error to pass a negative value.
+ */
+mp_err mp_sqrt(mp_int *a, mp_int *b)
+{
+  mp_int   x, t;
+  mp_err   res;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  /* Cannot take square root of a negative value */
+  if(SIGN(a) == MP_NEG)
+    return MP_RANGE;
+
+  /* Special cases for zero and one, trivial     */
+  if(mp_cmp_d(a, 0) == MP_EQ || mp_cmp_d(a, 1) == MP_EQ) 
+    return mp_copy(a, b);
+    
+  /* Initialize the temporaries we'll use below  */
+  if((res = mp_init_size(&t, USED(a))) != MP_OKAY)
+    return res;
+
+  /* Compute an initial guess for the iteration as a itself */
+  if((res = mp_init_copy(&x, a)) != MP_OKAY)
+    goto X;
+
+  for(;;) {
+    /* t = (x * x) - a */
+    mp_copy(&x, &t);      /* can't fail, t is big enough for original x */
+    if((res = mp_sqr(&t, &t)) != MP_OKAY ||
+       (res = mp_sub(&t, a, &t)) != MP_OKAY)
+      goto CLEANUP;
+
+    /* t = t / 2x       */
+    s_mp_mul_2(&x);
+    if((res = mp_div(&t, &x, &t, NULL)) != MP_OKAY)
+      goto CLEANUP;
+    s_mp_div_2(&x);
+
+    /* Terminate the loop, if the quotient is zero */
+    if(mp_cmp_z(&t) == MP_EQ)
+      break;
+
+    /* x = x - t       */
+    if((res = mp_sub(&x, &t, &x)) != MP_OKAY)
+      goto CLEANUP;
+
+  }
+
+  /* Copy result to output parameter */
+  mp_sub_d(&x, 1, &x);
+  s_mp_exch(&x, b);
+
+ CLEANUP:
+  mp_clear(&x);
+ X:
+  mp_clear(&t); 
+
+  return res;
+
+} /* end mp_sqrt() */
+
+/* }}} */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/* {{{ Modular arithmetic */
+
+#if MP_MODARITH
+/* {{{ mp_addmod(a, b, m, c) */
+
+/*
+  mp_addmod(a, b, m, c)
+
+  Compute c = (a + b) mod m
+ */
+
+mp_err mp_addmod(mp_int *a, mp_int *b, mp_int *m, mp_int *c)
+{
+  mp_err  res;
+
+  ARGCHK(a != NULL && b != NULL && m != NULL && c != NULL, MP_BADARG);
+
+  if((res = mp_add(a, b, c)) != MP_OKAY)
+    return res;
+  if((res = mp_mod(c, m, c)) != MP_OKAY)
+    return res;
+
+  return MP_OKAY;
+
+}
+
+/* }}} */
+
+/* {{{ mp_submod(a, b, m, c) */
+
+/*
+  mp_submod(a, b, m, c)
+
+  Compute c = (a - b) mod m
+ */
+
+mp_err mp_submod(mp_int *a, mp_int *b, mp_int *m, mp_int *c)
+{
+  mp_err  res;
+
+  ARGCHK(a != NULL && b != NULL && m != NULL && c != NULL, MP_BADARG);
+
+  if((res = mp_sub(a, b, c)) != MP_OKAY)
+    return res;
+  if((res = mp_mod(c, m, c)) != MP_OKAY)
+    return res;
+
+  return MP_OKAY;
+
+}
+
+/* }}} */
+
+/* {{{ mp_mulmod(a, b, m, c) */
+
+/*
+  mp_mulmod(a, b, m, c)
+
+  Compute c = (a * b) mod m
+ */
+
+mp_err mp_mulmod(mp_int *a, mp_int *b, mp_int *m, mp_int *c)
+{
+  mp_err  res;
+
+  ARGCHK(a != NULL && b != NULL && m != NULL && c != NULL, MP_BADARG);
+
+  if((res = mp_mul(a, b, c)) != MP_OKAY)
+    return res;
+  if((res = mp_mod(c, m, c)) != MP_OKAY)
+    return res;
+
+  return MP_OKAY;
+
+}
+
+/* }}} */
+
+/* {{{ mp_sqrmod(a, m, c) */
+
+#if MP_SQUARE
+mp_err mp_sqrmod(mp_int *a, mp_int *m, mp_int *c)
+{
+  mp_err  res;
+
+  ARGCHK(a != NULL && m != NULL && c != NULL, MP_BADARG);
+
+  if((res = mp_sqr(a, c)) != MP_OKAY)
+    return res;
+  if((res = mp_mod(c, m, c)) != MP_OKAY)
+    return res;
+
+  return MP_OKAY;
+
+} /* end mp_sqrmod() */
+#endif
+
+/* }}} */
+
+/* {{{ mp_exptmod(a, b, m, c) */
+
+/*
+  mp_exptmod(a, b, m, c)
+
+  Compute c = (a ** b) mod m.  Uses a standard square-and-multiply
+  method with modular reductions at each step. (This is basically the
+  same code as mp_expt(), except for the addition of the reductions)
+  
+  The modular reductions are done using Barrett's algorithm (see
+  s_mp_reduce() below for details)
+ */
+
+mp_err mp_exptmod(mp_int *a, mp_int *b, mp_int *m, mp_int *c)
+{
+  mp_int   s, x, mu;
+  mp_err   res;
+  mp_digit d, *db = DIGITS(b);
+  mp_size  ub = USED(b);
+  int      dig, bit;
+
+  ARGCHK(a != NULL && b != NULL && c != NULL, MP_BADARG);
+
+  if(mp_cmp_z(b) < 0 || mp_cmp_z(m) <= 0)
+    return MP_RANGE;
+
+  if((res = mp_init(&s)) != MP_OKAY)
+    return res;
+  if((res = mp_init_copy(&x, a)) != MP_OKAY)
+    goto X;
+  if((res = mp_mod(&x, m, &x)) != MP_OKAY ||
+     (res = mp_init(&mu)) != MP_OKAY)
+    goto MU;
+
+  mp_set(&s, 1);
+
+  /* mu = b^2k / m */
+  s_mp_add_d(&mu, 1); 
+  s_mp_lshd(&mu, 2 * USED(m));
+  if((res = mp_div(&mu, m, &mu, NULL)) != MP_OKAY)
+    goto CLEANUP;
+
+  /* Loop over digits of b in ascending order, except highest order */
+  for(dig = 0; dig < (ub - 1); dig++) {
+    d = *db++;
+
+    /* Loop over the bits of the lower-order digits */
+    for(bit = 0; bit < DIGIT_BIT; bit++) {
+      if(d & 1) {
+	if((res = s_mp_mul(&s, &x)) != MP_OKAY)
+	  goto CLEANUP;
+	if((res = s_mp_reduce(&s, m, &mu)) != MP_OKAY)
+	  goto CLEANUP;
+      }
+
+      d >>= 1;
+
+      if((res = s_mp_sqr(&x)) != MP_OKAY)
+	goto CLEANUP;
+      if((res = s_mp_reduce(&x, m, &mu)) != MP_OKAY)
+	goto CLEANUP;
+    }
+  }
+
+  /* Now do the last digit... */
+  d = *db;
+
+  while(d) {
+    if(d & 1) {
+      if((res = s_mp_mul(&s, &x)) != MP_OKAY)
+	goto CLEANUP;
+      if((res = s_mp_reduce(&s, m, &mu)) != MP_OKAY)
+	goto CLEANUP;
+    }
+
+    d >>= 1;
+
+    if((res = s_mp_sqr(&x)) != MP_OKAY)
+      goto CLEANUP;
+    if((res = s_mp_reduce(&x, m, &mu)) != MP_OKAY)
+      goto CLEANUP;
+  }
+
+  s_mp_exch(&s, c);
+
+ CLEANUP:
+  mp_clear(&mu);
+ MU:
+  mp_clear(&x);
+ X:
+  mp_clear(&s);
+
+  return res;
+
+} /* end mp_exptmod() */
+
+/* }}} */
+
+/* {{{ mp_exptmod_d(a, d, m, c) */
+
+mp_err mp_exptmod_d(mp_int *a, mp_digit d, mp_int *m, mp_int *c)
+{
+  mp_int   s, x;
+  mp_err   res;
+
+  ARGCHK(a != NULL && c != NULL, MP_BADARG);
+
+  if((res = mp_init(&s)) != MP_OKAY)
+    return res;
+  if((res = mp_init_copy(&x, a)) != MP_OKAY)
+    goto X;
+
+  mp_set(&s, 1);
+
+  while(d != 0) {
+    if(d & 1) {
+      if((res = s_mp_mul(&s, &x)) != MP_OKAY ||
+	 (res = mp_mod(&s, m, &s)) != MP_OKAY)
+	goto CLEANUP;
+    }
+
+    d /= 2;
+
+    if((res = s_mp_sqr(&x)) != MP_OKAY ||
+       (res = mp_mod(&x, m, &x)) != MP_OKAY)
+      goto CLEANUP;
+  }
+
+  s_mp_exch(&s, c);
+
+CLEANUP:
+  mp_clear(&x);
+X:
+  mp_clear(&s);
+
+  return res;
+
+} /* end mp_exptmod_d() */
+
+/* }}} */
+#endif /* if MP_MODARITH */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/* {{{ Comparison functions */
+
+/* {{{ mp_cmp_z(a) */
+
+/*
+  mp_cmp_z(a)
+
+  Compare a <=> 0.  Returns <0 if a<0, 0 if a=0, >0 if a>0.
+ */
+
+int    mp_cmp_z(mp_int *a)
+{
+  if(SIGN(a) == MP_NEG)
+    return MP_LT;
+  else if(USED(a) == 1 && DIGIT(a, 0) == 0)
+    return MP_EQ;
+  else
+    return MP_GT;
+
+} /* end mp_cmp_z() */
+
+/* }}} */
+
+/* {{{ mp_cmp_d(a, d) */
+
+/*
+  mp_cmp_d(a, d)
+
+  Compare a <=> d.  Returns <0 if a<d, 0 if a=d, >0 if a>d
+ */
+
+int    mp_cmp_d(mp_int *a, mp_digit d)
+{
+  ARGCHK(a != NULL, MP_EQ);
+
+  if(SIGN(a) == MP_NEG)
+    return MP_LT;
+
+  return s_mp_cmp_d(a, d);
+
+} /* end mp_cmp_d() */
+
+/* }}} */
+
+/* {{{ mp_cmp(a, b) */
+
+int    mp_cmp(mp_int *a, mp_int *b)
+{
+  ARGCHK(a != NULL && b != NULL, MP_EQ);
+
+  if(SIGN(a) == SIGN(b)) {
+    int  mag;
+
+    if((mag = s_mp_cmp(a, b)) == MP_EQ)
+      return MP_EQ;
+
+    if(SIGN(a) == MP_ZPOS)
+      return mag;
+    else
+      return -mag;
+
+  } else if(SIGN(a) == MP_ZPOS) {
+    return MP_GT;
+  } else {
+    return MP_LT;
+  }
+
+} /* end mp_cmp() */
+
+/* }}} */
+
+/* {{{ mp_cmp_mag(a, b) */
+
+/*
+  mp_cmp_mag(a, b)
+
+  Compares |a| <=> |b|, and returns an appropriate comparison result
+ */
+
+int    mp_cmp_mag(mp_int *a, mp_int *b)
+{
+  ARGCHK(a != NULL && b != NULL, MP_EQ);
+
+  return s_mp_cmp(a, b);
+
+} /* end mp_cmp_mag() */
+
+/* }}} */
+
+/* {{{ mp_cmp_int(a, z) */
+
+/*
+  This just converts z to an mp_int, and uses the existing comparison
+  routines.  This is sort of inefficient, but it's not clear to me how
+  frequently this wil get used anyway.  For small positive constants,
+  you can always use mp_cmp_d(), and for zero, there is mp_cmp_z().
+ */
+int    mp_cmp_int(mp_int *a, long z)
+{
+  mp_int  tmp;
+  int     out;
+
+  ARGCHK(a != NULL, MP_EQ);
+  
+  mp_init(&tmp); mp_set_int(&tmp, z);
+  out = mp_cmp(a, &tmp);
+  mp_clear(&tmp);
+
+  return out;
+
+} /* end mp_cmp_int() */
+
+/* }}} */
+
+/* {{{ mp_isodd(a) */
+
+/*
+  mp_isodd(a)
+
+  Returns a true (non-zero) value if a is odd, false (zero) otherwise.
+ */
+int    mp_isodd(mp_int *a)
+{
+  ARGCHK(a != NULL, 0);
+
+  return (DIGIT(a, 0) & 1);
+
+} /* end mp_isodd() */
+
+/* }}} */
+
+/* {{{ mp_iseven(a) */
+
+int    mp_iseven(mp_int *a)
+{
+  return !mp_isodd(a);
+
+} /* end mp_iseven() */
+
+/* }}} */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/* {{{ Number theoretic functions */
+
+#if MP_NUMTH
+/* {{{ mp_gcd(a, b, c) */
+
+/*
+  Like the old mp_gcd() function, except computes the GCD using the
+  binary algorithm due to Josef Stein in 1961 (via Knuth).
+ */
+mp_err mp_gcd(mp_int *a, mp_int *b, mp_int *c)
+{
+  mp_err   res;
+  mp_int   u, v, t;
+  mp_size  k = 0;
+
+  ARGCHK(a != NULL && b != NULL && c != NULL, MP_BADARG);
+
+  if(mp_cmp_z(a) == MP_EQ && mp_cmp_z(b) == MP_EQ)
+      return MP_RANGE;
+  if(mp_cmp_z(a) == MP_EQ) {
+    if((res = mp_copy(b, c)) != MP_OKAY)
+      return res;
+    SIGN(c) = MP_ZPOS; return MP_OKAY;
+  } else if(mp_cmp_z(b) == MP_EQ) {
+    if((res = mp_copy(a, c)) != MP_OKAY)
+      return res;
+    SIGN(c) = MP_ZPOS; return MP_OKAY;
+  }
+
+  if((res = mp_init(&t)) != MP_OKAY)
+    return res;
+  if((res = mp_init_copy(&u, a)) != MP_OKAY)
+    goto U;
+  if((res = mp_init_copy(&v, b)) != MP_OKAY)
+    goto V;
+
+  SIGN(&u) = MP_ZPOS;
+  SIGN(&v) = MP_ZPOS;
+
+  /* Divide out common factors of 2 until at least 1 of a, b is even */
+  while(mp_iseven(&u) && mp_iseven(&v)) {
+    s_mp_div_2(&u);
+    s_mp_div_2(&v);
+    ++k;
+  }
+
+  /* Initialize t */
+  if(mp_isodd(&u)) {
+    if((res = mp_copy(&v, &t)) != MP_OKAY)
+      goto CLEANUP;
+    
+    /* t = -v */
+    if(SIGN(&v) == MP_ZPOS)
+      SIGN(&t) = MP_NEG;
+    else
+      SIGN(&t) = MP_ZPOS;
+    
+  } else {
+    if((res = mp_copy(&u, &t)) != MP_OKAY)
+      goto CLEANUP;
+
+  }
+
+  for(;;) {
+    while(mp_iseven(&t)) {
+      s_mp_div_2(&t);
+    }
+
+    if(mp_cmp_z(&t) == MP_GT) {
+      if((res = mp_copy(&t, &u)) != MP_OKAY)
+	goto CLEANUP;
+
+    } else {
+      if((res = mp_copy(&t, &v)) != MP_OKAY)
+	goto CLEANUP;
+
+      /* v = -t */
+      if(SIGN(&t) == MP_ZPOS)
+	SIGN(&v) = MP_NEG;
+      else
+	SIGN(&v) = MP_ZPOS;
+    }
+
+    if((res = mp_sub(&u, &v, &t)) != MP_OKAY)
+      goto CLEANUP;
+
+    if(s_mp_cmp_d(&t, 0) == MP_EQ)
+      break;
+  }
+
+  s_mp_2expt(&v, k);       /* v = 2^k   */
+  res = mp_mul(&u, &v, c); /* c = u * v */
+
+ CLEANUP:
+  mp_clear(&v);
+ V:
+  mp_clear(&u);
+ U:
+  mp_clear(&t);
+
+  return res;
+
+} /* end mp_bgcd() */
+
+/* }}} */
+
+/* {{{ mp_lcm(a, b, c) */
+
+/* We compute the least common multiple using the rule:
+
+   ab = [a, b](a, b)
+
+   ... by computing the product, and dividing out the gcd.
+ */
+
+mp_err mp_lcm(mp_int *a, mp_int *b, mp_int *c)
+{
+  mp_int  gcd, prod;
+  mp_err  res;
+
+  ARGCHK(a != NULL && b != NULL && c != NULL, MP_BADARG);
+
+  /* Set up temporaries */
+  if((res = mp_init(&gcd)) != MP_OKAY)
+    return res;
+  if((res = mp_init(&prod)) != MP_OKAY)
+    goto GCD;
+
+  if((res = mp_mul(a, b, &prod)) != MP_OKAY)
+    goto CLEANUP;
+  if((res = mp_gcd(a, b, &gcd)) != MP_OKAY)
+    goto CLEANUP;
+
+  res = mp_div(&prod, &gcd, c, NULL);
+
+ CLEANUP:
+  mp_clear(&prod);
+ GCD:
+  mp_clear(&gcd);
+
+  return res;
+
+} /* end mp_lcm() */
+
+/* }}} */
+
+/* {{{ mp_xgcd(a, b, g, x, y) */
+
+/*
+  mp_xgcd(a, b, g, x, y)
+
+  Compute g = (a, b) and values x and y satisfying Bezout's identity
+  (that is, ax + by = g).  This uses the extended binary GCD algorithm
+  based on the Stein algorithm used for mp_gcd()
+ */
+
+mp_err mp_xgcd(mp_int *a, mp_int *b, mp_int *g, mp_int *x, mp_int *y)
+{
+  mp_int   gx, xc, yc, u, v, A, B, C, D;
+  mp_int  *clean[9];
+  mp_err   res;
+  int      last = -1;
+
+  if(mp_cmp_z(b) == 0)
+    return MP_RANGE;
+
+  /* Initialize all these variables we need */
+  if((res = mp_init(&u)) != MP_OKAY) goto CLEANUP;
+  clean[++last] = &u;
+  if((res = mp_init(&v)) != MP_OKAY) goto CLEANUP;
+  clean[++last] = &v;
+  if((res = mp_init(&gx)) != MP_OKAY) goto CLEANUP;
+  clean[++last] = &gx;
+  if((res = mp_init(&A)) != MP_OKAY) goto CLEANUP;
+  clean[++last] = &A;
+  if((res = mp_init(&B)) != MP_OKAY) goto CLEANUP;
+  clean[++last] = &B;
+  if((res = mp_init(&C)) != MP_OKAY) goto CLEANUP;
+  clean[++last] = &C;
+  if((res = mp_init(&D)) != MP_OKAY) goto CLEANUP;
+  clean[++last] = &D;
+  if((res = mp_init_copy(&xc, a)) != MP_OKAY) goto CLEANUP;
+  clean[++last] = &xc;
+  mp_abs(&xc, &xc);
+  if((res = mp_init_copy(&yc, b)) != MP_OKAY) goto CLEANUP;
+  clean[++last] = &yc;
+  mp_abs(&yc, &yc);
+
+  mp_set(&gx, 1);
+
+  /* Divide by two until at least one of them is even */
+  while(mp_iseven(&xc) && mp_iseven(&yc)) {
+    s_mp_div_2(&xc);
+    s_mp_div_2(&yc);
+    if((res = s_mp_mul_2(&gx)) != MP_OKAY)
+      goto CLEANUP;
+  }
+
+  mp_copy(&xc, &u);
+  mp_copy(&yc, &v);
+  mp_set(&A, 1); mp_set(&D, 1);
+
+  /* Loop through binary GCD algorithm */
+  for(;;) {
+    while(mp_iseven(&u)) {
+      s_mp_div_2(&u);
+
+      if(mp_iseven(&A) && mp_iseven(&B)) {
+	s_mp_div_2(&A); s_mp_div_2(&B);
+      } else {
+	if((res = mp_add(&A, &yc, &A)) != MP_OKAY) goto CLEANUP;
+	s_mp_div_2(&A);
+	if((res = mp_sub(&B, &xc, &B)) != MP_OKAY) goto CLEANUP;
+	s_mp_div_2(&B);
+      }
+    }
+
+    while(mp_iseven(&v)) {
+      s_mp_div_2(&v);
+
+      if(mp_iseven(&C) && mp_iseven(&D)) {
+	s_mp_div_2(&C); s_mp_div_2(&D);
+      } else {
+	if((res = mp_add(&C, &yc, &C)) != MP_OKAY) goto CLEANUP;
+	s_mp_div_2(&C);
+	if((res = mp_sub(&D, &xc, &D)) != MP_OKAY) goto CLEANUP;
+	s_mp_div_2(&D);
+      }
+    }
+
+    if(mp_cmp(&u, &v) >= 0) {
+      if((res = mp_sub(&u, &v, &u)) != MP_OKAY) goto CLEANUP;
+      if((res = mp_sub(&A, &C, &A)) != MP_OKAY) goto CLEANUP;
+      if((res = mp_sub(&B, &D, &B)) != MP_OKAY) goto CLEANUP;
+
+    } else {
+      if((res = mp_sub(&v, &u, &v)) != MP_OKAY) goto CLEANUP;
+      if((res = mp_sub(&C, &A, &C)) != MP_OKAY) goto CLEANUP;
+      if((res = mp_sub(&D, &B, &D)) != MP_OKAY) goto CLEANUP;
+
+    }
+
+    /* If we're done, copy results to output */
+    if(mp_cmp_z(&u) == 0) {
+      if(x)
+	if((res = mp_copy(&C, x)) != MP_OKAY) goto CLEANUP;
+
+      if(y)
+	if((res = mp_copy(&D, y)) != MP_OKAY) goto CLEANUP;
+      
+      if(g)
+	if((res = mp_mul(&gx, &v, g)) != MP_OKAY) goto CLEANUP;
+
+      break;
+    }
+  }
+
+ CLEANUP:
+  while(last >= 0)
+    mp_clear(clean[last--]);
+
+  return res;
+
+} /* end mp_xgcd() */
+
+/* }}} */
+
+/* {{{ mp_invmod(a, m, c) */
+
+/*
+  mp_invmod(a, m, c)
+
+  Compute c = a^-1 (mod m), if there is an inverse for a (mod m).
+  This is equivalent to the question of whether (a, m) = 1.  If not,
+  MP_UNDEF is returned, and there is no inverse.
+ */
+
+mp_err mp_invmod(mp_int *a, mp_int *m, mp_int *c)
+{
+  mp_int  g, x;
+  mp_sign sa;
+  mp_err  res;
+
+  ARGCHK(a && m && c, MP_BADARG);
+
+  if(mp_cmp_z(a) == 0 || mp_cmp_z(m) == 0)
+    return MP_RANGE;
+
+  sa = SIGN(a);
+
+  if((res = mp_init(&g)) != MP_OKAY)
+    return res;
+  if((res = mp_init(&x)) != MP_OKAY)
+    goto X;
+
+  if((res = mp_xgcd(a, m, &g, &x, NULL)) != MP_OKAY)
+    goto CLEANUP;
+
+  if(mp_cmp_d(&g, 1) != MP_EQ) {
+    res = MP_UNDEF;
+    goto CLEANUP;
+  }
+
+  res = mp_mod(&x, m, c);
+  SIGN(c) = sa;
+
+CLEANUP:
+  mp_clear(&x);
+X:
+  mp_clear(&g);
+
+  return res;
+
+} /* end mp_invmod() */
+
+/* }}} */
+#endif /* if MP_NUMTH */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/* {{{ mp_print(mp, ofp) */
+
+#if MP_IOFUNC
+/*
+  mp_print(mp, ofp)
+
+  Print a textual representation of the given mp_int on the output
+  stream 'ofp'.  Output is generated using the internal radix.
+ */
+
+void   mp_print(mp_int *mp, FILE *ofp)
+{
+  int   ix;
+
+  if(mp == NULL || ofp == NULL)
+    return;
+
+  fputc((SIGN(mp) == MP_NEG) ? '-' : '+', ofp);
+
+  for(ix = USED(mp) - 1; ix >= 0; ix--) {
+    fprintf(ofp, DIGIT_FMT, DIGIT(mp, ix));
+  }
+
+} /* end mp_print() */
+
+#endif /* if MP_IOFUNC */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/* {{{ More I/O Functions */
+
+/* {{{ mp_read_signed_bin(mp, str, len) */
+
+/* 
+   mp_read_signed_bin(mp, str, len)
+
+   Read in a raw value (base 256) into the given mp_int
+ */
+
+mp_err  mp_read_signed_bin(mp_int *mp, unsigned char *str, int len)
+{
+  mp_err         res;
+
+  ARGCHK(mp != NULL && str != NULL && len > 0, MP_BADARG);
+
+  if((res = mp_read_unsigned_bin(mp, str + 1, len - 1)) == MP_OKAY) {
+    /* Get sign from first byte */
+    if(str[0])
+      SIGN(mp) = MP_NEG;
+    else
+      SIGN(mp) = MP_ZPOS;
+  }
+
+  return res;
+
+} /* end mp_read_signed_bin() */
+
+/* }}} */
+
+/* {{{ mp_signed_bin_size(mp) */
+
+int    mp_signed_bin_size(mp_int *mp)
+{
+  ARGCHK(mp != NULL, 0);
+
+  return mp_unsigned_bin_size(mp) + 1;
+
+} /* end mp_signed_bin_size() */
+
+/* }}} */
+
+/* {{{ mp_to_signed_bin(mp, str) */
+
+mp_err mp_to_signed_bin(mp_int *mp, unsigned char *str)
+{
+  ARGCHK(mp != NULL && str != NULL, MP_BADARG);
+
+  /* Caller responsible for allocating enough memory (use mp_raw_size(mp)) */
+  str[0] = (char)SIGN(mp);
+
+  return mp_to_unsigned_bin(mp, str + 1);
+
+} /* end mp_to_signed_bin() */
+
+/* }}} */
+
+/* {{{ mp_read_unsigned_bin(mp, str, len) */
+
+/*
+  mp_read_unsigned_bin(mp, str, len)
+
+  Read in an unsigned value (base 256) into the given mp_int
+ */
+
+mp_err  mp_read_unsigned_bin(mp_int *mp, unsigned char *str, int len)
+{
+  int     ix;
+  mp_err  res;
+
+  ARGCHK(mp != NULL && str != NULL && len > 0, MP_BADARG);
+
+  mp_zero(mp);
+
+  for(ix = 0; ix < len; ix++) {
+    if((res = s_mp_mul_2d(mp, CHAR_BIT)) != MP_OKAY)
+      return res;
+
+    if((res = mp_add_d(mp, str[ix], mp)) != MP_OKAY)
+      return res;
+  }
+  
+  return MP_OKAY;
+  
+} /* end mp_read_unsigned_bin() */
+
+/* }}} */
+
+/* {{{ mp_unsigned_bin_size(mp) */
+
+int     mp_unsigned_bin_size(mp_int *mp) 
+{
+  mp_digit   topdig;
+  int        count;
+
+  ARGCHK(mp != NULL, 0);
+
+  /* Special case for the value zero */
+  if(USED(mp) == 1 && DIGIT(mp, 0) == 0)
+    return 1;
+
+  count = (USED(mp) - 1) * sizeof(mp_digit);
+  topdig = DIGIT(mp, USED(mp) - 1);
+
+  while(topdig != 0) {
+    ++count;
+    topdig >>= CHAR_BIT;
+  }
+
+  return count;
+
+} /* end mp_unsigned_bin_size() */
+
+/* }}} */
+
+/* {{{ mp_to_unsigned_bin(mp, str) */
+
+mp_err mp_to_unsigned_bin(mp_int *mp, unsigned char *str)
+{
+  mp_digit      *dp, *end, d;
+  unsigned char *spos;
+
+  ARGCHK(mp != NULL && str != NULL, MP_BADARG);
+
+  dp = DIGITS(mp);
+  end = dp + USED(mp) - 1;
+  spos = str;
+
+  /* Special case for zero, quick test */
+  if(dp == end && *dp == 0) {
+    *str = '\0';
+    return MP_OKAY;
+  }
+
+  /* Generate digits in reverse order */
+  while(dp < end) {
+    int      ix;
+
+    d = *dp;
+    for(ix = 0; ix < sizeof(mp_digit); ++ix) {
+      *spos = d & UCHAR_MAX;
+      d >>= CHAR_BIT;
+      ++spos;
+    }
+
+    ++dp;
+  }
+
+  /* Now handle last digit specially, high order zeroes are not written */
+  d = *end;
+  while(d != 0) {
+    *spos = d & UCHAR_MAX;
+    d >>= CHAR_BIT;
+    ++spos;
+  }
+
+  /* Reverse everything to get digits in the correct order */
+  while(--spos > str) {
+    unsigned char t = *str;
+    *str = *spos;
+    *spos = t;
+
+    ++str;
+  }
+
+  return MP_OKAY;
+
+} /* end mp_to_unsigned_bin() */
+
+/* }}} */
+
+/* {{{ mp_count_bits(mp) */
+
+int    mp_count_bits(mp_int *mp)
+{
+  int      len;
+  mp_digit d;
+
+  ARGCHK(mp != NULL, MP_BADARG);
+
+  len = DIGIT_BIT * (USED(mp) - 1);
+  d = DIGIT(mp, USED(mp) - 1);
+
+  while(d != 0) {
+    ++len;
+    d >>= 1;
+  }
+
+  return len;
+  
+} /* end mp_count_bits() */
+
+/* }}} */
+
+/* {{{ mp_read_radix(mp, str, radix) */
+
+/*
+  mp_read_radix(mp, str, radix)
+
+  Read an integer from the given string, and set mp to the resulting
+  value.  The input is presumed to be in base 10.  Leading non-digit
+  characters are ignored, and the function reads until a non-digit
+  character or the end of the string.
+ */
+
+mp_err  mp_read_radix(mp_int *mp, unsigned char *str, int radix)
+{
+  int     ix = 0, val = 0;
+  mp_err  res;
+  mp_sign sig = MP_ZPOS;
+
+  ARGCHK(mp != NULL && str != NULL && radix >= 2 && radix <= MAX_RADIX, 
+	 MP_BADARG);
+
+  mp_zero(mp);
+
+  /* Skip leading non-digit characters until a digit or '-' or '+' */
+  while(str[ix] && 
+	(s_mp_tovalue(str[ix], radix) < 0) && 
+	str[ix] != '-' &&
+	str[ix] != '+') {
+    ++ix;
+  }
+
+  if(str[ix] == '-') {
+    sig = MP_NEG;
+    ++ix;
+  } else if(str[ix] == '+') {
+    sig = MP_ZPOS; /* this is the default anyway... */
+    ++ix;
+  }
+
+  while((val = s_mp_tovalue(str[ix], radix)) >= 0) {
+    if((res = s_mp_mul_d(mp, radix)) != MP_OKAY)
+      return res;
+    if((res = s_mp_add_d(mp, val)) != MP_OKAY)
+      return res;
+    ++ix;
+  }
+
+  if(s_mp_cmp_d(mp, 0) == MP_EQ)
+    SIGN(mp) = MP_ZPOS;
+  else
+    SIGN(mp) = sig;
+
+  return MP_OKAY;
+
+} /* end mp_read_radix() */
+
+/* }}} */
+
+/* {{{ mp_radix_size(mp, radix) */
+
+int    mp_radix_size(mp_int *mp, int radix)
+{
+  int  len;
+  ARGCHK(mp != NULL, 0);
+
+  len = s_mp_outlen(mp_count_bits(mp), radix) + 1; /* for NUL terminator */
+
+  if(mp_cmp_z(mp) < 0)
+    ++len; /* for sign */
+
+  return len;
+
+} /* end mp_radix_size() */
+
+/* }}} */
+
+/* {{{ mp_value_radix_size(num, qty, radix) */
+
+/* num = number of digits
+   qty = number of bits per digit
+   radix = target base
+   
+   Return the number of digits in the specified radix that would be
+   needed to express 'num' digits of 'qty' bits each.
+ */
+int    mp_value_radix_size(int num, int qty, int radix)
+{
+  ARGCHK(num >= 0 && qty > 0 && radix >= 2 && radix <= MAX_RADIX, 0);
+
+  return s_mp_outlen(num * qty, radix);
+
+} /* end mp_value_radix_size() */
+
+/* }}} */
+
+/* {{{ mp_toradix(mp, str, radix) */
+
+mp_err mp_toradix(mp_int *mp, unsigned char *str, int radix)
+{
+  int  ix, pos = 0;
+
+  ARGCHK(mp != NULL && str != NULL, MP_BADARG);
+  ARGCHK(radix > 1 && radix <= MAX_RADIX, MP_RANGE);
+
+  if(mp_cmp_z(mp) == MP_EQ) {
+    str[0] = '0';
+    str[1] = '\0';
+  } else {
+    mp_err   res;
+    mp_int   tmp;
+    mp_sign  sgn;
+    mp_digit rem, rdx = (mp_digit)radix;
+    char     ch;
+
+    if((res = mp_init_copy(&tmp, mp)) != MP_OKAY)
+      return res;
+
+    /* Save sign for later, and take absolute value */
+    sgn = SIGN(&tmp); SIGN(&tmp) = MP_ZPOS;
+
+    /* Generate output digits in reverse order      */
+    while(mp_cmp_z(&tmp) != 0) {
+      if((res = s_mp_div_d(&tmp, rdx, &rem)) != MP_OKAY) {
+	mp_clear(&tmp);
+	return res;
+      }
+
+      /* Generate digits, use capital letters */
+      ch = s_mp_todigit(rem, radix, 0);
+
+      str[pos++] = ch;
+    }
+
+    /* Add - sign if original value was negative */
+    if(sgn == MP_NEG)
+      str[pos++] = '-';
+
+    /* Add trailing NUL to end the string        */
+    str[pos--] = '\0';
+
+    /* Reverse the digits and sign indicator     */
+    ix = 0;
+    while(ix < pos) {
+      char tmp = str[ix];
+
+      str[ix] = str[pos];
+      str[pos] = tmp;
+      ++ix;
+      --pos;
+    }
+    
+    mp_clear(&tmp);
+  }
+
+  return MP_OKAY;
+
+} /* end mp_toradix() */
+
+/* }}} */
+
+/* {{{ mp_char2value(ch, r) */
+
+int    mp_char2value(char ch, int r)
+{
+  return s_mp_tovalue(ch, r);
+
+} /* end mp_tovalue() */
+
+/* }}} */
+
+/* }}} */
+
+/* {{{ mp_strerror(ec) */
+
+/*
+  mp_strerror(ec)
+
+  Return a string describing the meaning of error code 'ec'.  The
+  string returned is allocated in static memory, so the caller should
+  not attempt to modify or free the memory associated with this
+  string.
+ */
+const char  *mp_strerror(mp_err ec)
+{
+  int   aec = (ec < 0) ? -ec : ec;
+
+  /* Code values are negative, so the senses of these comparisons
+     are accurate */
+  if(ec < MP_LAST_CODE || ec > MP_OKAY) {
+    return mp_err_string[0];  /* unknown error code */
+  } else {
+    return mp_err_string[aec + 1];
+  }
+
+} /* end mp_strerror() */
+
+/* }}} */
+
+/*========================================================================*/
+/*------------------------------------------------------------------------*/
+/* Static function definitions (internal use only)                        */
+
+/* {{{ Memory management */
+
+/* {{{ s_mp_grow(mp, min) */
+
+/* Make sure there are at least 'min' digits allocated to mp              */
+mp_err   s_mp_grow(mp_int *mp, mp_size min)
+{
+  if(min > ALLOC(mp)) {
+    mp_digit   *tmp;
+
+    /* Set min to next nearest default precision block size */
+    min = ((min + (s_mp_defprec - 1)) / s_mp_defprec) * s_mp_defprec;
+
+    if((tmp = s_mp_alloc(min, sizeof(mp_digit))) == NULL)
+      return MP_MEM;
+
+    s_mp_copy(DIGITS(mp), tmp, USED(mp));
+
+#if MP_CRYPTO
+    s_mp_setz(DIGITS(mp), ALLOC(mp));
+#endif
+    s_mp_free(DIGITS(mp));
+    DIGITS(mp) = tmp;
+    ALLOC(mp) = min;
+  }
+
+  return MP_OKAY;
+
+} /* end s_mp_grow() */
+
+/* }}} */
+
+/* {{{ s_mp_pad(mp, min) */
+
+/* Make sure the used size of mp is at least 'min', growing if needed     */
+mp_err   s_mp_pad(mp_int *mp, mp_size min)
+{
+  if(min > USED(mp)) {
+    mp_err  res;
+
+    /* Make sure there is room to increase precision  */
+    if(min > ALLOC(mp) && (res = s_mp_grow(mp, min)) != MP_OKAY)
+      return res;
+
+    /* Increase precision; should already be 0-filled */
+    USED(mp) = min;
+  }
+
+  return MP_OKAY;
+
+} /* end s_mp_pad() */
+
+/* }}} */
+
+/* {{{ s_mp_setz(dp, count) */
+
+#if MP_MACRO == 0
+/* Set 'count' digits pointed to by dp to be zeroes                       */
+void s_mp_setz(mp_digit *dp, mp_size count)
+{
+#if MP_MEMSET == 0
+  int  ix;
+
+  for(ix = 0; ix < count; ix++)
+    dp[ix] = 0;
+#else
+  memset(dp, 0, count * sizeof(mp_digit));
+#endif
+
+} /* end s_mp_setz() */
+#endif
+
+/* }}} */
+
+/* {{{ s_mp_copy(sp, dp, count) */
+
+#if MP_MACRO == 0
+/* Copy 'count' digits from sp to dp                                      */
+void s_mp_copy(mp_digit *sp, mp_digit *dp, mp_size count)
+{
+#if MP_MEMCPY == 0
+  int  ix;
+
+  for(ix = 0; ix < count; ix++)
+    dp[ix] = sp[ix];
+#else
+  memcpy(dp, sp, count * sizeof(mp_digit));
+#endif
+
+} /* end s_mp_copy() */
+#endif
+
+/* }}} */
+
+/* {{{ s_mp_alloc(nb, ni) */
+
+#if MP_MACRO == 0
+/* Allocate ni records of nb bytes each, and return a pointer to that     */
+void    *s_mp_alloc(size_t nb, size_t ni)
+{
+  return calloc(nb, ni);
+
+} /* end s_mp_alloc() */
+#endif
+
+/* }}} */
+
+/* {{{ s_mp_free(ptr) */
+
+#if MP_MACRO == 0
+/* Free the memory pointed to by ptr                                      */
+void     s_mp_free(void *ptr)
+{
+  if(ptr)
+    free(ptr);
+
+} /* end s_mp_free() */
+#endif
+
+/* }}} */
+
+/* {{{ s_mp_clamp(mp) */
+
+/* Remove leading zeroes from the given value                             */
+void     s_mp_clamp(mp_int *mp)
+{
+  mp_size   du = USED(mp);
+  mp_digit *zp = DIGITS(mp) + du - 1;
+
+  while(du > 1 && !*zp--)
+    --du;
+
+  if(du == 1 && *zp == 0)
+    SIGN(mp) = MP_ZPOS;
+
+  USED(mp) = du;
+
+} /* end s_mp_clamp() */
+
+
+/* }}} */
+
+/* {{{ s_mp_exch(a, b) */
+
+/* Exchange the data for a and b; (b, a) = (a, b)                         */
+void     s_mp_exch(mp_int *a, mp_int *b)
+{
+  mp_int   tmp;
+
+  tmp = *a;
+  *a = *b;
+  *b = tmp;
+
+} /* end s_mp_exch() */
+
+/* }}} */
+
+/* }}} */
+
+/* {{{ Arithmetic helpers */
+
+/* {{{ s_mp_lshd(mp, p) */
+
+/* 
+   Shift mp leftward by p digits, growing if needed, and zero-filling
+   the in-shifted digits at the right end.  This is a convenient
+   alternative to multiplication by powers of the radix
+ */   
+
+mp_err   s_mp_lshd(mp_int *mp, mp_size p)
+{
+  mp_err   res;
+  mp_size  pos;
+  mp_digit *dp;
+  int     ix;
+
+  if(p == 0)
+    return MP_OKAY;
+
+  if((res = s_mp_pad(mp, USED(mp) + p)) != MP_OKAY)
+    return res;
+
+  pos = USED(mp) - 1;
+  dp = DIGITS(mp);
+
+  /* Shift all the significant figures over as needed */
+  for(ix = pos - p; ix >= 0; ix--) 
+    dp[ix + p] = dp[ix];
+
+  /* Fill the bottom digits with zeroes */
+  for(ix = 0; ix < p; ix++)
+    dp[ix] = 0;
+
+  return MP_OKAY;
+
+} /* end s_mp_lshd() */
+
+/* }}} */
+
+/* {{{ s_mp_rshd(mp, p) */
+
+/* 
+   Shift mp rightward by p digits.  Maintains the invariant that
+   digits above the precision are all zero.  Digits shifted off the
+   end are lost.  Cannot fail.
+ */
+
+void     s_mp_rshd(mp_int *mp, mp_size p)
+{
+  mp_size  ix;
+  mp_digit *dp;
+
+  if(p == 0)
+    return;
+
+  /* Shortcut when all digits are to be shifted off */
+  if(p >= USED(mp)) {
+    s_mp_setz(DIGITS(mp), ALLOC(mp));
+    USED(mp) = 1;
+    SIGN(mp) = MP_ZPOS;
+    return;
+  }
+
+  /* Shift all the significant figures over as needed */
+  dp = DIGITS(mp);
+  for(ix = p; ix < USED(mp); ix++)
+    dp[ix - p] = dp[ix];
+
+  /* Fill the top digits with zeroes */
+  ix -= p;
+  while(ix < USED(mp))
+    dp[ix++] = 0;
+
+  /* Strip off any leading zeroes    */
+  s_mp_clamp(mp);
+
+} /* end s_mp_rshd() */
+
+/* }}} */
+
+/* {{{ s_mp_div_2(mp) */
+
+/* Divide by two -- take advantage of radix properties to do it fast      */
+void     s_mp_div_2(mp_int *mp)
+{
+  s_mp_div_2d(mp, 1);
+
+} /* end s_mp_div_2() */
+
+/* }}} */
+
+/* {{{ s_mp_mul_2(mp) */
+
+mp_err s_mp_mul_2(mp_int *mp)
+{
+  int      ix;
+  mp_digit kin = 0, kout, *dp = DIGITS(mp);
+  mp_err   res;
+
+  /* Shift digits leftward by 1 bit */
+  for(ix = 0; ix < USED(mp); ix++) {
+    kout = (dp[ix] >> (DIGIT_BIT - 1)) & 1;
+    dp[ix] = (dp[ix] << 1) | kin;
+
+    kin = kout;
+  }
+
+  /* Deal with rollover from last digit */
+  if(kin) {
+    if(ix >= ALLOC(mp)) {
+      if((res = s_mp_grow(mp, ALLOC(mp) + 1)) != MP_OKAY)
+	return res;
+      dp = DIGITS(mp);
+    }
+
+    dp[ix] = kin;
+    USED(mp) += 1;
+  }
+
+  return MP_OKAY;
+
+} /* end s_mp_mul_2() */
+
+/* }}} */
+
+/* {{{ s_mp_mod_2d(mp, d) */
+
+/*
+  Remainder the integer by 2^d, where d is a number of bits.  This
+  amounts to a bitwise AND of the value, and does not require the full
+  division code
+ */
+void     s_mp_mod_2d(mp_int *mp, mp_digit d)
+{
+  unsigned int  ndig = (d / DIGIT_BIT), nbit = (d % DIGIT_BIT);
+  unsigned int  ix;
+  mp_digit      dmask, *dp = DIGITS(mp);
+
+  if(ndig >= USED(mp))
+    return;
+
+  /* Flush all the bits above 2^d in its digit */
+  dmask = (1 << nbit) - 1;
+  dp[ndig] &= dmask;
+
+  /* Flush all digits above the one with 2^d in it */
+  for(ix = ndig + 1; ix < USED(mp); ix++)
+    dp[ix] = 0;
+
+  s_mp_clamp(mp);
+
+} /* end s_mp_mod_2d() */
+
+/* }}} */
+
+/* {{{ s_mp_mul_2d(mp, d) */
+
+/*
+  Multiply by the integer 2^d, where d is a number of bits.  This
+  amounts to a bitwise shift of the value, and does not require the
+  full multiplication code.
+ */
+mp_err    s_mp_mul_2d(mp_int *mp, mp_digit d)
+{
+  mp_err   res;
+  mp_digit save, next, mask, *dp;
+  mp_size  used;
+  int      ix;
+
+  if((res = s_mp_lshd(mp, d / DIGIT_BIT)) != MP_OKAY)
+    return res;
+
+  dp = DIGITS(mp); used = USED(mp);
+  d %= DIGIT_BIT;
+
+  mask = (1 << d) - 1;
+
+  /* If the shift requires another digit, make sure we've got one to
+     work with */
+  if((dp[used - 1] >> (DIGIT_BIT - d)) & mask) {
+    if((res = s_mp_grow(mp, used + 1)) != MP_OKAY)
+      return res;
+    dp = DIGITS(mp);
+  }
+
+  /* Do the shifting... */
+  save = 0;
+  for(ix = 0; ix < used; ix++) {
+    next = (dp[ix] >> (DIGIT_BIT - d)) & mask;
+    dp[ix] = (dp[ix] << d) | save;
+    save = next;
+  }
+
+  /* If, at this point, we have a nonzero carryout into the next
+     digit, we'll increase the size by one digit, and store it...
+   */
+  if(save) {
+    dp[used] = save;
+    USED(mp) += 1;
+  }
+
+  s_mp_clamp(mp);
+  return MP_OKAY;
+
+} /* end s_mp_mul_2d() */
+
+/* }}} */
+
+/* {{{ s_mp_div_2d(mp, d) */
+
+/*
+  Divide the integer by 2^d, where d is a number of bits.  This
+  amounts to a bitwise shift of the value, and does not require the
+  full division code (used in Barrett reduction, see below)
+ */
+void     s_mp_div_2d(mp_int *mp, mp_digit d)
+{
+  int       ix;
+  mp_digit  save, next, mask, *dp = DIGITS(mp);
+
+  s_mp_rshd(mp, d / DIGIT_BIT);
+  d %= DIGIT_BIT;
+
+  mask = (1 << d) - 1;
+
+  save = 0;
+  for(ix = USED(mp) - 1; ix >= 0; ix--) {
+    next = dp[ix] & mask;
+    dp[ix] = (dp[ix] >> d) | (save << (DIGIT_BIT - d));
+    save = next;
+  }
+
+  s_mp_clamp(mp);
+
+} /* end s_mp_div_2d() */
+
+/* }}} */
+
+/* {{{ s_mp_norm(a, b) */
+
+/*
+  s_mp_norm(a, b)
+
+  Normalize a and b for division, where b is the divisor.  In order
+  that we might make good guesses for quotient digits, we want the
+  leading digit of b to be at least half the radix, which we
+  accomplish by multiplying a and b by a constant.  This constant is
+  returned (so that it can be divided back out of the remainder at the
+  end of the division process).
+
+  We multiply by the smallest power of 2 that gives us a leading digit
+  at least half the radix.  By choosing a power of 2, we simplify the 
+  multiplication and division steps to simple shifts.
+ */
+mp_digit s_mp_norm(mp_int *a, mp_int *b)
+{
+  mp_digit  t, d = 0;
+
+  t = DIGIT(b, USED(b) - 1);
+  while(t < (RADIX / 2)) {
+    t <<= 1;
+    ++d;
+  }
+    
+  if(d != 0) {
+    s_mp_mul_2d(a, d);
+    s_mp_mul_2d(b, d);
+  }
+
+  return d;
+
+} /* end s_mp_norm() */
+
+/* }}} */
+
+/* }}} */
+
+/* {{{ Primitive digit arithmetic */
+
+/* {{{ s_mp_add_d(mp, d) */
+
+/* Add d to |mp| in place                                                 */
+mp_err   s_mp_add_d(mp_int *mp, mp_digit d)    /* unsigned digit addition */
+{
+  mp_word   w, k = 0;
+  mp_size   ix = 1, used = USED(mp);
+  mp_digit *dp = DIGITS(mp);
+
+  w = dp[0] + d;
+  dp[0] = ACCUM(w);
+  k = CARRYOUT(w);
+
+  while(ix < used && k) {
+    w = dp[ix] + k;
+    dp[ix] = ACCUM(w);
+    k = CARRYOUT(w);
+    ++ix;
+  }
+
+  if(k != 0) {
+    mp_err  res;
+
+    if((res = s_mp_pad(mp, USED(mp) + 1)) != MP_OKAY)
+      return res;
+
+    DIGIT(mp, ix) = k;
+  }
+
+  return MP_OKAY;
+
+} /* end s_mp_add_d() */
+
+/* }}} */
+
+/* {{{ s_mp_sub_d(mp, d) */
+
+/* Subtract d from |mp| in place, assumes |mp| > d                        */
+mp_err   s_mp_sub_d(mp_int *mp, mp_digit d)    /* unsigned digit subtract */
+{
+  mp_word   w, b = 0;
+  mp_size   ix = 1, used = USED(mp);
+  mp_digit *dp = DIGITS(mp);
+
+  /* Compute initial subtraction    */
+  w = (RADIX + dp[0]) - d;
+  b = CARRYOUT(w) ? 0 : 1;
+  dp[0] = ACCUM(w);
+
+  /* Propagate borrows leftward     */
+  while(b && ix < used) {
+    w = (RADIX + dp[ix]) - b;
+    b = CARRYOUT(w) ? 0 : 1;
+    dp[ix] = ACCUM(w);
+    ++ix;
+  }
+
+  /* Remove leading zeroes          */
+  s_mp_clamp(mp);
+
+  /* If we have a borrow out, it's a violation of the input invariant */
+  if(b)
+    return MP_RANGE;
+  else
+    return MP_OKAY;
+
+} /* end s_mp_sub_d() */
+
+/* }}} */
+
+/* {{{ s_mp_mul_d(a, d) */
+
+/* Compute a = a * d, single digit multiplication                         */
+mp_err   s_mp_mul_d(mp_int *a, mp_digit d)
+{
+  mp_word w, k = 0;
+  mp_size ix, max;
+  mp_err  res;
+  mp_digit *dp = DIGITS(a);
+
+  /*
+    Single-digit multiplication will increase the precision of the
+    output by at most one digit.  However, we can detect when this
+    will happen -- if the high-order digit of a, times d, gives a
+    two-digit result, then the precision of the result will increase;
+    otherwise it won't.  We use this fact to avoid calling s_mp_pad()
+    unless absolutely necessary.
+   */
+  max = USED(a);
+  w = dp[max - 1] * d;
+  if(CARRYOUT(w) != 0) {
+    if((res = s_mp_pad(a, max + 1)) != MP_OKAY)
+      return res;
+    dp = DIGITS(a);
+  }
+
+  for(ix = 0; ix < max; ix++) {
+    w = (dp[ix] * d) + k;
+    dp[ix] = ACCUM(w);
+    k = CARRYOUT(w);
+  }
+
+  /* If there is a precision increase, take care of it here; the above
+     test guarantees we have enough storage to do this safely.
+   */
+  if(k) {
+    dp[max] = k; 
+    USED(a) = max + 1;
+  }
+
+  s_mp_clamp(a);
+
+  return MP_OKAY;
+  
+} /* end s_mp_mul_d() */
+
+/* }}} */
+
+/* {{{ s_mp_div_d(mp, d, r) */
+
+/*
+  s_mp_div_d(mp, d, r)
+
+  Compute the quotient mp = mp / d and remainder r = mp mod d, for a
+  single digit d.  If r is null, the remainder will be discarded.
+ */
+
+mp_err   s_mp_div_d(mp_int *mp, mp_digit d, mp_digit *r)
+{
+  mp_word   w = 0, t;
+  mp_int    quot;
+  mp_err    res;
+  mp_digit *dp = DIGITS(mp), *qp;
+  int       ix;
+
+  if(d == 0)
+    return MP_RANGE;
+
+  /* Make room for the quotient */
+  if((res = mp_init_size(&quot, USED(mp))) != MP_OKAY)
+    return res;
+
+  USED(&quot) = USED(mp); /* so clamping will work below */
+  qp = DIGITS(&quot);
+
+  /* Divide without subtraction */
+  for(ix = USED(mp) - 1; ix >= 0; ix--) {
+    w = (w << DIGIT_BIT) | dp[ix];
+
+    if(w >= d) {
+      t = w / d;
+      w = w % d;
+    } else {
+      t = 0;
+    }
+
+    qp[ix] = t;
+  }
+
+  /* Deliver the remainder, if desired */
+  if(r)
+    *r = w;
+
+  s_mp_clamp(&quot);
+  mp_exch(&quot, mp);
+  mp_clear(&quot);
+
+  return MP_OKAY;
+
+} /* end s_mp_div_d() */
+
+/* }}} */
+
+/* }}} */
+
+/* {{{ Primitive full arithmetic */
+
+/* {{{ s_mp_add(a, b) */
+
+/* Compute a = |a| + |b|                                                  */
+mp_err   s_mp_add(mp_int *a, mp_int *b)        /* magnitude addition      */
+{
+  mp_word   w = 0;
+  mp_digit *pa, *pb;
+  mp_size   ix, used = USED(b);
+  mp_err    res;
+
+  /* Make sure a has enough precision for the output value */
+  if((used > USED(a)) && (res = s_mp_pad(a, used)) != MP_OKAY)
+    return res;
+
+  /*
+    Add up all digits up to the precision of b.  If b had initially
+    the same precision as a, or greater, we took care of it by the
+    padding step above, so there is no problem.  If b had initially
+    less precision, we'll have to make sure the carry out is duly
+    propagated upward among the higher-order digits of the sum.
+   */
+  pa = DIGITS(a);
+  pb = DIGITS(b);
+  for(ix = 0; ix < used; ++ix) {
+    w += *pa + *pb++;
+    *pa++ = ACCUM(w);
+    w = CARRYOUT(w);
+  }
+
+  /* If we run out of 'b' digits before we're actually done, make
+     sure the carries get propagated upward...  
+   */
+  used = USED(a);
+  while(w && ix < used) {
+    w += *pa;
+    *pa++ = ACCUM(w);
+    w = CARRYOUT(w);
+    ++ix;
+  }
+
+  /* If there's an overall carry out, increase precision and include
+     it.  We could have done this initially, but why touch the memory
+     allocator unless we're sure we have to?
+   */
+  if(w) {
+    if((res = s_mp_pad(a, used + 1)) != MP_OKAY)
+      return res;
+
+    DIGIT(a, ix) = w;  /* pa may not be valid after s_mp_pad() call */
+  }
+
+  return MP_OKAY;
+
+} /* end s_mp_add() */
+
+/* }}} */
+
+/* {{{ s_mp_sub(a, b) */
+
+/* Compute a = |a| - |b|, assumes |a| >= |b|                              */
+mp_err   s_mp_sub(mp_int *a, mp_int *b)        /* magnitude subtract      */
+{
+  mp_word   w = 0;
+  mp_digit *pa, *pb;
+  mp_size   ix, used = USED(b);
+
+  /*
+    Subtract and propagate borrow.  Up to the precision of b, this
+    accounts for the digits of b; after that, we just make sure the
+    carries get to the right place.  This saves having to pad b out to
+    the precision of a just to make the loops work right...
+   */
+  pa = DIGITS(a);
+  pb = DIGITS(b);
+
+  for(ix = 0; ix < used; ++ix) {
+    w = (RADIX + *pa) - w - *pb++;
+    *pa++ = ACCUM(w);
+    w = CARRYOUT(w) ? 0 : 1;
+  }
+
+  used = USED(a);
+  while(ix < used) {
+    w = RADIX + *pa - w;
+    *pa++ = ACCUM(w);
+    w = CARRYOUT(w) ? 0 : 1;
+    ++ix;
+  }
+
+  /* Clobber any leading zeroes we created    */
+  s_mp_clamp(a);
+
+  /* 
+     If there was a borrow out, then |b| > |a| in violation
+     of our input invariant.  We've already done the work,
+     but we'll at least complain about it...
+   */
+  if(w)
+    return MP_RANGE;
+  else
+    return MP_OKAY;
+
+} /* end s_mp_sub() */
+
+/* }}} */
+
+/* {{{ s_mp_mul(a, b) */
+
+/* Compute a = |a| * |b|                                                  */
+mp_err   s_mp_mul(mp_int *a, mp_int *b)
+{
+  mp_word   w, k = 0;
+  mp_int    tmp;
+  mp_err    res;
+  mp_size   ix, jx, ua = USED(a), ub = USED(b);
+  mp_digit *pa, *pb, *pt, *pbt;
+
+  if((res = mp_init_size(&tmp, ua + ub)) != MP_OKAY)
+    return res;
+
+  /* This has the effect of left-padding with zeroes... */
+  USED(&tmp) = ua + ub;
+
+  /* We're going to need the base value each iteration */
+  pbt = DIGITS(&tmp);
+
+  /* Outer loop:  Digits of b */
+
+  pb = DIGITS(b);
+  for(ix = 0; ix < ub; ++ix, ++pb) {
+    if(*pb == 0) 
+      continue;
+
+    /* Inner product:  Digits of a */
+    pa = DIGITS(a);
+    for(jx = 0; jx < ua; ++jx, ++pa) {
+      pt = pbt + ix + jx;
+      w = *pb * *pa + k + *pt;
+      *pt = ACCUM(w);
+      k = CARRYOUT(w);
+    }
+
+    pbt[ix + jx] = k;
+    k = 0;
+  }
+
+  s_mp_clamp(&tmp);
+  s_mp_exch(&tmp, a);
+
+  mp_clear(&tmp);
+
+  return MP_OKAY;
+
+} /* end s_mp_mul() */
+
+/* }}} */
+
+/* {{{ s_mp_kmul(a, b, out, len) */
+
+#if 0
+void   s_mp_kmul(mp_digit *a, mp_digit *b, mp_digit *out, mp_size len)
+{
+  mp_word   w, k = 0;
+  mp_size   ix, jx;
+  mp_digit *pa, *pt;
+
+  for(ix = 0; ix < len; ++ix, ++b) {
+    if(*b == 0)
+      continue;
+    
+    pa = a;
+    for(jx = 0; jx < len; ++jx, ++pa) {
+      pt = out + ix + jx;
+      w = *b * *pa + k + *pt;
+      *pt = ACCUM(w);
+      k = CARRYOUT(w);
+    }
+
+    out[ix + jx] = k;
+    k = 0;
+  }
+
+} /* end s_mp_kmul() */
+#endif
+
+/* }}} */
+
+/* {{{ s_mp_sqr(a) */
+
+/*
+  Computes the square of a, in place.  This can be done more
+  efficiently than a general multiplication, because many of the
+  computation steps are redundant when squaring.  The inner product
+  step is a bit more complicated, but we save a fair number of
+  iterations of the multiplication loop.
+ */
+#if MP_SQUARE
+mp_err   s_mp_sqr(mp_int *a)
+{
+  mp_word  w, k = 0;
+  mp_int   tmp;
+  mp_err   res;
+  mp_size  ix, jx, kx, used = USED(a);
+  mp_digit *pa1, *pa2, *pt, *pbt;
+
+  if((res = mp_init_size(&tmp, 2 * used)) != MP_OKAY)
+    return res;
+
+  /* Left-pad with zeroes */
+  USED(&tmp) = 2 * used;
+
+  /* We need the base value each time through the loop */
+  pbt = DIGITS(&tmp);
+
+  pa1 = DIGITS(a);
+  for(ix = 0; ix < used; ++ix, ++pa1) {
+    if(*pa1 == 0)
+      continue;
+
+    w = DIGIT(&tmp, ix + ix) + (*pa1 * *pa1);
+
+    pbt[ix + ix] = ACCUM(w);
+    k = CARRYOUT(w);
+
+    /*
+      The inner product is computed as:
+
+         (C, S) = t[i,j] + 2 a[i] a[j] + C
+
+      This can overflow what can be represented in an mp_word, and
+      since C arithmetic does not provide any way to check for
+      overflow, we have to check explicitly for overflow conditions
+      before they happen.
+     */
+    for(jx = ix + 1, pa2 = DIGITS(a) + jx; jx < used; ++jx, ++pa2) {
+      mp_word  u = 0, v;
+      
+      /* Store this in a temporary to avoid indirections later */
+      pt = pbt + ix + jx;
+
+      /* Compute the multiplicative step */
+      w = *pa1 * *pa2;
+
+      /* If w is more than half MP_WORD_MAX, the doubling will
+	 overflow, and we need to record a carry out into the next
+	 word */
+      u = (w >> (MP_WORD_BIT - 1)) & 1;
+
+      /* Double what we've got, overflow will be ignored as defined
+	 for C arithmetic (we've already noted if it is to occur)
+       */
+      w *= 2;
+
+      /* Compute the additive step */
+      v = *pt + k;
+
+      /* If we do not already have an overflow carry, check to see
+	 if the addition will cause one, and set the carry out if so 
+       */
+      u |= ((MP_WORD_MAX - v) < w);
+
+      /* Add in the rest, again ignoring overflow */
+      w += v;
+
+      /* Set the i,j digit of the output */
+      *pt = ACCUM(w);
+
+      /* Save carry information for the next iteration of the loop.
+	 This is why k must be an mp_word, instead of an mp_digit */
+      k = CARRYOUT(w) | (u << DIGIT_BIT);
+
+    } /* for(jx ...) */
+
+    /* Set the last digit in the cycle and reset the carry */
+    k = DIGIT(&tmp, ix + jx) + k;
+    pbt[ix + jx] = ACCUM(k);
+    k = CARRYOUT(k);
+
+    /* If we are carrying out, propagate the carry to the next digit
+       in the output.  This may cascade, so we have to be somewhat
+       circumspect -- but we will have enough precision in the output
+       that we won't overflow 
+     */
+    kx = 1;
+    while(k) {
+      k = pbt[ix + jx + kx] + 1;
+      pbt[ix + jx + kx] = ACCUM(k);
+      k = CARRYOUT(k);
+      ++kx;
+    }
+  } /* for(ix ...) */
+
+  s_mp_clamp(&tmp);
+  s_mp_exch(&tmp, a);
+
+  mp_clear(&tmp);
+
+  return MP_OKAY;
+
+} /* end s_mp_sqr() */
+#endif
+
+/* }}} */
+
+/* {{{ s_mp_div(a, b) */
+
+/*
+  s_mp_div(a, b)
+
+  Compute a = a / b and b = a mod b.  Assumes b > a.
+ */
+
+mp_err   s_mp_div(mp_int *a, mp_int *b)
+{
+  mp_int   quot, rem, t;
+  mp_word  q;
+  mp_err   res;
+  mp_digit d;
+  int      ix;
+
+  if(mp_cmp_z(b) == 0)
+    return MP_RANGE;
+
+  /* Shortcut if b is power of two */
+  if((ix = s_mp_ispow2(b)) >= 0) {
+    mp_copy(a, b);  /* need this for remainder */
+    s_mp_div_2d(a, (mp_digit)ix);
+    s_mp_mod_2d(b, (mp_digit)ix);
+
+    return MP_OKAY;
+  }
+
+  /* Allocate space to store the quotient */
+  if((res = mp_init_size(&quot, USED(a))) != MP_OKAY)
+    return res;
+
+  /* A working temporary for division     */
+  if((res = mp_init_size(&t, USED(a))) != MP_OKAY)
+    goto T;
+
+  /* Allocate space for the remainder     */
+  if((res = mp_init_size(&rem, USED(a))) != MP_OKAY)
+    goto REM;
+
+  /* Normalize to optimize guessing       */
+  d = s_mp_norm(a, b);
+
+  /* Perform the division itself...woo!   */
+  ix = USED(a) - 1;
+
+  while(ix >= 0) {
+    /* Find a partial substring of a which is at least b */
+    while(s_mp_cmp(&rem, b) < 0 && ix >= 0) {
+      if((res = s_mp_lshd(&rem, 1)) != MP_OKAY) 
+	goto CLEANUP;
+
+      if((res = s_mp_lshd(&quot, 1)) != MP_OKAY)
+	goto CLEANUP;
+
+      DIGIT(&rem, 0) = DIGIT(a, ix);
+      s_mp_clamp(&rem);
+      --ix;
+    }
+
+    /* If we didn't find one, we're finished dividing    */
+    if(s_mp_cmp(&rem, b) < 0) 
+      break;    
+
+    /* Compute a guess for the next quotient digit       */
+    q = DIGIT(&rem, USED(&rem) - 1);
+    if(q <= DIGIT(b, USED(b) - 1) && USED(&rem) > 1)
+      q = (q << DIGIT_BIT) | DIGIT(&rem, USED(&rem) - 2);
+
+    q /= DIGIT(b, USED(b) - 1);
+
+    /* The guess can be as much as RADIX + 1 */
+    if(q >= RADIX)
+      q = RADIX - 1;
+
+    /* See what that multiplies out to                   */
+    mp_copy(b, &t);
+    if((res = s_mp_mul_d(&t, q)) != MP_OKAY)
+      goto CLEANUP;
+
+    /* 
+       If it's too big, back it off.  We should not have to do this
+       more than once, or, in rare cases, twice.  Knuth describes a
+       method by which this could be reduced to a maximum of once, but
+       I didn't implement that here.
+     */
+    while(s_mp_cmp(&t, &rem) > 0) {
+      --q;
+      s_mp_sub(&t, b);
+    }
+
+    /* At this point, q should be the right next digit   */
+    if((res = s_mp_sub(&rem, &t)) != MP_OKAY)
+      goto CLEANUP;
+
+    /*
+      Include the digit in the quotient.  We allocated enough memory
+      for any quotient we could ever possibly get, so we should not
+      have to check for failures here
+     */
+    DIGIT(&quot, 0) = q;
+  }
+
+  /* Denormalize remainder                */
+  if(d != 0) 
+    s_mp_div_2d(&rem, d);
+
+  s_mp_clamp(&quot);
+  s_mp_clamp(&rem);
+
+  /* Copy quotient back to output         */
+  s_mp_exch(&quot, a);
+  
+  /* Copy remainder back to output        */
+  s_mp_exch(&rem, b);
+
+CLEANUP:
+  mp_clear(&rem);
+REM:
+  mp_clear(&t);
+T:
+  mp_clear(&quot);
+
+  return res;
+
+} /* end s_mp_div() */
+
+/* }}} */
+
+/* {{{ s_mp_2expt(a, k) */
+
+mp_err   s_mp_2expt(mp_int *a, mp_digit k)
+{
+  mp_err    res;
+  mp_size   dig, bit;
+
+  dig = k / DIGIT_BIT;
+  bit = k % DIGIT_BIT;
+
+  mp_zero(a);
+  if((res = s_mp_pad(a, dig + 1)) != MP_OKAY)
+    return res;
+  
+  DIGIT(a, dig) |= (1 << bit);
+
+  return MP_OKAY;
+
+} /* end s_mp_2expt() */
+
+/* }}} */
+
+/* {{{ s_mp_reduce(x, m, mu) */
+
+/*
+  Compute Barrett reduction, x (mod m), given a precomputed value for
+  mu = b^2k / m, where b = RADIX and k = #digits(m).  This should be
+  faster than straight division, when many reductions by the same
+  value of m are required (such as in modular exponentiation).  This
+  can nearly halve the time required to do modular exponentiation,
+  as compared to using the full integer divide to reduce.
+
+  This algorithm was derived from the _Handbook of Applied
+  Cryptography_ by Menezes, Oorschot and VanStone, Ch. 14,
+  pp. 603-604.  
+ */
+
+mp_err   s_mp_reduce(mp_int *x, mp_int *m, mp_int *mu)
+{
+  mp_int   q;
+  mp_err   res;
+  mp_size  um = USED(m);
+
+  if((res = mp_init_copy(&q, x)) != MP_OKAY)
+    return res;
+
+  s_mp_rshd(&q, um - 1);       /* q1 = x / b^(k-1)  */
+  s_mp_mul(&q, mu);            /* q2 = q1 * mu      */
+  s_mp_rshd(&q, um + 1);       /* q3 = q2 / b^(k+1) */
+
+  /* x = x mod b^(k+1), quick (no division) */
+  s_mp_mod_2d(x, DIGIT_BIT * (um + 1));
+
+  /* q = q * m mod b^(k+1), quick (no division) */
+  s_mp_mul(&q, m);
+  s_mp_mod_2d(&q, DIGIT_BIT * (um + 1));
+
+  /* x = x - q */
+  if((res = mp_sub(x, &q, x)) != MP_OKAY)
+    goto CLEANUP;
+
+  /* If x < 0, add b^(k+1) to it */
+  if(mp_cmp_z(x) < 0) {
+    mp_set(&q, 1);
+    if((res = s_mp_lshd(&q, um + 1)) != MP_OKAY)
+      goto CLEANUP;
+    if((res = mp_add(x, &q, x)) != MP_OKAY)
+      goto CLEANUP;
+  }
+
+  /* Back off if it's too big */
+  while(mp_cmp(x, m) >= 0) {
+    if((res = s_mp_sub(x, m)) != MP_OKAY)
+      break;
+  }
+
+ CLEANUP:
+  mp_clear(&q);
+
+  return res;
+
+} /* end s_mp_reduce() */
+
+/* }}} */
+
+/* }}} */
+
+/* {{{ Primitive comparisons */
+
+/* {{{ s_mp_cmp(a, b) */
+
+/* Compare |a| <=> |b|, return 0 if equal, <0 if a<b, >0 if a>b           */
+int      s_mp_cmp(mp_int *a, mp_int *b)
+{
+  mp_size   ua = USED(a), ub = USED(b);
+
+  if(ua > ub)
+    return MP_GT;
+  else if(ua < ub)
+    return MP_LT;
+  else {
+    int      ix = ua - 1;
+    mp_digit *ap = DIGITS(a) + ix, *bp = DIGITS(b) + ix;
+
+    while(ix >= 0) {
+      if(*ap > *bp)
+	return MP_GT;
+      else if(*ap < *bp)
+	return MP_LT;
+
+      --ap; --bp; --ix;
+    }
+
+    return MP_EQ;
+  }
+
+} /* end s_mp_cmp() */
+
+/* }}} */
+
+/* {{{ s_mp_cmp_d(a, d) */
+
+/* Compare |a| <=> d, return 0 if equal, <0 if a<d, >0 if a>d             */
+int      s_mp_cmp_d(mp_int *a, mp_digit d)
+{
+  mp_size  ua = USED(a);
+  mp_digit *ap = DIGITS(a);
+
+  if(ua > 1)
+    return MP_GT;
+
+  if(*ap < d) 
+    return MP_LT;
+  else if(*ap > d)
+    return MP_GT;
+  else
+    return MP_EQ;
+
+} /* end s_mp_cmp_d() */
+
+/* }}} */
+
+/* {{{ s_mp_ispow2(v) */
+
+/*
+  Returns -1 if the value is not a power of two; otherwise, it returns
+  k such that v = 2^k, i.e. lg(v).
+ */
+int      s_mp_ispow2(mp_int *v)
+{
+  mp_digit d, *dp;
+  mp_size  uv = USED(v);
+  int      extra = 0, ix;
+
+  d = DIGIT(v, uv - 1); /* most significant digit of v */
+
+  while(d && ((d & 1) == 0)) {
+    d >>= 1;
+    ++extra;
+  }
+
+  if(d == 1) {
+    ix = uv - 2;
+    dp = DIGITS(v) + ix;
+
+    while(ix >= 0) {
+      if(*dp)
+	return -1; /* not a power of two */
+
+      --dp; --ix;
+    }
+
+    return ((uv - 1) * DIGIT_BIT) + extra;
+  } 
+
+  return -1;
+
+} /* end s_mp_ispow2() */
+
+/* }}} */
+
+/* {{{ s_mp_ispow2d(d) */
+
+int      s_mp_ispow2d(mp_digit d)
+{
+  int   pow = 0;
+
+  while((d & 1) == 0) {
+    ++pow; d >>= 1;
+  }
+
+  if(d == 1)
+    return pow;
+
+  return -1;
+
+} /* end s_mp_ispow2d() */
+
+/* }}} */
+
+/* }}} */
+
+/* {{{ Primitive I/O helpers */
+
+/* {{{ s_mp_tovalue(ch, r) */
+
+/*
+  Convert the given character to its digit value, in the given radix.
+  If the given character is not understood in the given radix, -1 is
+  returned.  Otherwise the digit's numeric value is returned.
+
+  The results will be odd if you use a radix < 2 or > 62, you are
+  expected to know what you're up to.
+ */
+int      s_mp_tovalue(char ch, int r)
+{
+  int    val, xch;
+  
+  if(r > 36)
+    xch = ch;
+  else
+    xch = toupper(ch);
+
+  if(isdigit(xch))
+    val = xch - '0';
+  else if(isupper(xch))
+    val = xch - 'A' + 10;
+  else if(islower(xch))
+    val = xch - 'a' + 36;
+  else if(xch == '+')
+    val = 62;
+  else if(xch == '/')
+    val = 63;
+  else 
+    return -1;
+
+  if(val < 0 || val >= r)
+    return -1;
+
+  return val;
+
+} /* end s_mp_tovalue() */
+
+/* }}} */
+
+/* {{{ s_mp_todigit(val, r, low) */
+
+/*
+  Convert val to a radix-r digit, if possible.  If val is out of range
+  for r, returns zero.  Otherwise, returns an ASCII character denoting
+  the value in the given radix.
+
+  The results may be odd if you use a radix < 2 or > 64, you are
+  expected to know what you're doing.
+ */
+  
+char     s_mp_todigit(int val, int r, int low)
+{
+  char   ch;
+
+  if(val < 0 || val >= r)
+    return 0;
+
+  ch = s_dmap_1[val];
+
+  if(r <= 36 && low)
+    ch = tolower(ch);
+
+  return ch;
+
+} /* end s_mp_todigit() */
+
+/* }}} */
+
+/* {{{ s_mp_outlen(bits, radix) */
+
+/* 
+   Return an estimate for how long a string is needed to hold a radix
+   r representation of a number with 'bits' significant bits.
+
+   Does not include space for a sign or a NUL terminator.
+ */
+int      s_mp_outlen(int bits, int r)
+{
+  return (int)((double)bits * LOG_V_2(r) + 0.5);
+
+} /* end s_mp_outlen() */
+
+/* }}} */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/* HERE THERE BE DRAGONS                                                  */
diff --git a/mpi/mpi.h b/mpi/mpi.h
new file mode 100644
index 00000000..c87df54b
--- /dev/null
+++ b/mpi/mpi.h
@@ -0,0 +1,221 @@
+/*
+    mpi.h
+
+    by Michael J. Fromberger <http://www.dartmouth.edu/~sting/>
+    Copyright (C) 1998 Michael J. Fromberger, All Rights Reserved
+
+    Arbitrary precision integer arithmetic library
+
+    $Id: mpi.h,v 1.1 2004/02/08 04:29:29 sting Exp $
+ */
+
+#ifndef _H_MPI_
+#define _H_MPI_
+
+#include "mpi-config.h"
+
+#if MP_DEBUG
+#undef MP_IOFUNC
+#define MP_IOFUNC 1
+#endif
+
+#if MP_IOFUNC
+#include <stdio.h>
+#include <ctype.h>
+#endif
+
+#include <limits.h>
+
+#define  MP_NEG  1
+#define  MP_ZPOS 0
+
+/* Included for compatibility... */
+#define  NEG     MP_NEG
+#define  ZPOS    MP_ZPOS
+
+#define  MP_OKAY          0 /* no error, all is well */
+#define  MP_YES           0 /* yes (boolean result)  */
+#define  MP_NO           -1 /* no (boolean result)   */
+#define  MP_MEM          -2 /* out of memory         */
+#define  MP_RANGE        -3 /* argument out of range */
+#define  MP_BADARG       -4 /* invalid parameter     */
+#define  MP_UNDEF        -5 /* answer is undefined   */
+#define  MP_LAST_CODE    MP_UNDEF
+
+#include "mpi-types.h"
+
+/* Included for compatibility... */
+#define DIGIT_BIT         MP_DIGIT_BIT
+#define DIGIT_MAX         MP_DIGIT_MAX
+
+/* Macros for accessing the mp_int internals           */
+#define  SIGN(MP)     ((MP)->sign)
+#define  USED(MP)     ((MP)->used)
+#define  ALLOC(MP)    ((MP)->alloc)
+#define  DIGITS(MP)   ((MP)->dp)
+#define  DIGIT(MP,N)  (MP)->dp[(N)]
+
+#if MP_ARGCHK == 1
+#define  ARGCHK(X,Y)  {if(!(X)){return (Y);}}
+#elif MP_ARGCHK == 2
+#include <assert.h>
+#define  ARGCHK(X,Y)  assert(X)
+#else
+#define  ARGCHK(X,Y)  /*  */
+#endif
+
+/* This defines the maximum I/O base (minimum is 2)   */
+#define MAX_RADIX         64
+
+typedef struct {
+  mp_sign       sign;    /* sign of this quantity      */
+  mp_size       alloc;   /* how many digits allocated  */
+  mp_size       used;    /* how many digits used       */
+  mp_digit     *dp;      /* the digits themselves      */
+} mp_int;
+
+/*------------------------------------------------------------------------*/
+/* Default precision                                                      */
+
+unsigned int mp_get_prec(void);
+void         mp_set_prec(unsigned int prec);
+
+/*------------------------------------------------------------------------*/
+/* Memory management                                                      */
+
+mp_err mp_init(mp_int *mp);
+mp_err mp_init_array(mp_int mp[], int count);
+mp_err mp_init_size(mp_int *mp, mp_size prec);
+mp_err mp_init_copy(mp_int *mp, mp_int *from);
+mp_err mp_copy(mp_int *from, mp_int *to);
+void   mp_exch(mp_int *mp1, mp_int *mp2);
+void   mp_clear(mp_int *mp);
+void   mp_clear_array(mp_int mp[], int count);
+void   mp_zero(mp_int *mp);
+void   mp_set(mp_int *mp, mp_digit d);
+mp_err mp_set_int(mp_int *mp, long z);
+
+/*------------------------------------------------------------------------*/
+/* Single digit arithmetic                                                */
+
+mp_err mp_add_d(mp_int *a, mp_digit d, mp_int *b);
+mp_err mp_sub_d(mp_int *a, mp_digit d, mp_int *b);
+mp_err mp_mul_d(mp_int *a, mp_digit d, mp_int *b);
+mp_err mp_mul_2(mp_int *a, mp_int *c);
+mp_err mp_div_d(mp_int *a, mp_digit d, mp_int *q, mp_digit *r);
+mp_err mp_div_2(mp_int *a, mp_int *c);
+mp_err mp_expt_d(mp_int *a, mp_digit d, mp_int *c);
+
+/*------------------------------------------------------------------------*/
+/* Sign manipulations                                                     */
+
+mp_err mp_abs(mp_int *a, mp_int *b);
+mp_err mp_neg(mp_int *a, mp_int *b);
+
+/*------------------------------------------------------------------------*/
+/* Full arithmetic                                                        */
+
+mp_err mp_add(mp_int *a, mp_int *b, mp_int *c);
+mp_err mp_sub(mp_int *a, mp_int *b, mp_int *c);
+mp_err mp_mul(mp_int *a, mp_int *b, mp_int *c);
+mp_err mp_mul_2d(mp_int *a, mp_digit d, mp_int *c);
+#if MP_SQUARE
+mp_err mp_sqr(mp_int *a, mp_int *b);
+#else
+#define mp_sqr(a, b) mp_mul(a, a, b)
+#endif
+mp_err mp_div(mp_int *a, mp_int *b, mp_int *q, mp_int *r);
+mp_err mp_div_2d(mp_int *a, mp_digit d, mp_int *q, mp_int *r);
+mp_err mp_expt(mp_int *a, mp_int *b, mp_int *c);
+mp_err mp_2expt(mp_int *a, mp_digit k);
+mp_err mp_sqrt(mp_int *a, mp_int *b);
+
+/*------------------------------------------------------------------------*/
+/* Modular arithmetic                                                     */
+
+#if MP_MODARITH
+mp_err mp_mod(mp_int *a, mp_int *m, mp_int *c);
+mp_err mp_mod_d(mp_int *a, mp_digit d, mp_digit *c);
+mp_err mp_addmod(mp_int *a, mp_int *b, mp_int *m, mp_int *c);
+mp_err mp_submod(mp_int *a, mp_int *b, mp_int *m, mp_int *c);
+mp_err mp_mulmod(mp_int *a, mp_int *b, mp_int *m, mp_int *c);
+#if MP_SQUARE
+mp_err mp_sqrmod(mp_int *a, mp_int *m, mp_int *c);
+#else
+#define mp_sqrmod(a, m, c) mp_mulmod(a, a, m, c)
+#endif
+mp_err mp_exptmod(mp_int *a, mp_int *b, mp_int *m, mp_int *c);
+mp_err mp_exptmod_d(mp_int *a, mp_digit d, mp_int *m, mp_int *c);
+#endif /* MP_MODARITH */
+
+/*------------------------------------------------------------------------*/
+/* Comparisons                                                            */
+
+int    mp_cmp_z(mp_int *a);
+int    mp_cmp_d(mp_int *a, mp_digit d);
+int    mp_cmp(mp_int *a, mp_int *b);
+int    mp_cmp_mag(mp_int *a, mp_int *b);
+int    mp_cmp_int(mp_int *a, long z);
+int    mp_isodd(mp_int *a);
+int    mp_iseven(mp_int *a);
+
+/*------------------------------------------------------------------------*/
+/* Number theoretic                                                       */
+
+#if MP_NUMTH
+mp_err mp_gcd(mp_int *a, mp_int *b, mp_int *c);
+mp_err mp_lcm(mp_int *a, mp_int *b, mp_int *c);
+mp_err mp_xgcd(mp_int *a, mp_int *b, mp_int *g, mp_int *x, mp_int *y);
+mp_err mp_invmod(mp_int *a, mp_int *m, mp_int *c);
+#endif /* end MP_NUMTH */
+
+/*------------------------------------------------------------------------*/
+/* Input and output                                                       */
+
+#if MP_IOFUNC
+void   mp_print(mp_int *mp, FILE *ofp);
+#endif /* end MP_IOFUNC */
+
+/*------------------------------------------------------------------------*/
+/* Base conversion                                                        */
+
+#define BITS     1
+#define BYTES    CHAR_BIT
+
+mp_err mp_read_signed_bin(mp_int *mp, unsigned char *str, int len);
+int    mp_signed_bin_size(mp_int *mp);
+mp_err mp_to_signed_bin(mp_int *mp, unsigned char *str);
+
+mp_err mp_read_unsigned_bin(mp_int *mp, unsigned char *str, int len);
+int    mp_unsigned_bin_size(mp_int *mp);
+mp_err mp_to_unsigned_bin(mp_int *mp, unsigned char *str);
+
+int    mp_count_bits(mp_int *mp);
+
+#if MP_COMPAT_MACROS
+#define mp_read_raw(mp, str, len) mp_read_signed_bin((mp), (str), (len))
+#define mp_raw_size(mp)           mp_signed_bin_size(mp)
+#define mp_toraw(mp, str)         mp_to_signed_bin((mp), (str))
+#define mp_read_mag(mp, str, len) mp_read_unsigned_bin((mp), (str), (len))
+#define mp_mag_size(mp)           mp_unsigned_bin_size(mp)
+#define mp_tomag(mp, str)         mp_to_unsigned_bin((mp), (str))
+#endif
+
+mp_err mp_read_radix(mp_int *mp, unsigned char *str, int radix);
+int    mp_radix_size(mp_int *mp, int radix);
+int    mp_value_radix_size(int num, int qty, int radix);
+mp_err mp_toradix(mp_int *mp, unsigned char *str, int radix);
+
+int    mp_char2value(char ch, int r);
+
+#define mp_tobinary(M, S)  mp_toradix((M), (S), 2)
+#define mp_tooctal(M, S)   mp_toradix((M), (S), 8)
+#define mp_todecimal(M, S) mp_toradix((M), (S), 10)
+#define mp_tohex(M, S)     mp_toradix((M), (S), 16)
+
+/*------------------------------------------------------------------------*/
+/* Error strings                                                          */
+
+const  char  *mp_strerror(mp_err ec);
+
+#endif /* end _H_MPI_ */
diff --git a/mpi/mplogic.c b/mpi/mplogic.c
new file mode 100644
index 00000000..32df307a
--- /dev/null
+++ b/mpi/mplogic.c
@@ -0,0 +1,382 @@
+/*
+    mplogic.c
+
+    by Michael J. Fromberger <http://www.dartmouth.edu/~sting/>
+    Copyright (C) 1998 Michael J. Fromberger, All Rights Reserved
+
+    Bitwise logical operations on MPI values
+
+    $Id: mplogic.c,v 1.1 2004/02/08 04:29:29 sting Exp $
+ */
+
+#include "mplogic.h"
+#include <stdlib.h>
+
+/* Some things from the guts of the MPI library we make use of... */
+extern mp_err   s_mp_lshd(mp_int *mp, mp_size p);
+extern void     s_mp_rshd(mp_int *mp, mp_size p);
+
+#define  s_mp_clamp(mp)\
+   { while(USED(mp) > 1 && DIGIT((mp), USED(mp) - 1) == 0) USED(mp) -= 1; }
+
+/* {{{ Lookup table for population count */
+
+static unsigned char bitc[] = {
+   0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 
+   1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 
+   1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 
+   2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
+   1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 
+   2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
+   2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
+   3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 
+   1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 
+   2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
+   2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
+   3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 
+   2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
+   3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 
+   3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 
+   4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
+};
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/*
+  mpl_not(a, b)    - compute b = ~a
+  mpl_and(a, b, c) - compute c = a & b
+  mpl_or(a, b, c)  - compute c = a | b
+  mpl_xor(a, b, c) - compute c = a ^ b
+ */
+
+/* {{{ mpl_not(a, b) */
+
+mp_err mpl_not(mp_int *a, mp_int *b)
+{
+  mp_err   res;
+  int      ix;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  if((res = mp_copy(a, b)) != MP_OKAY)
+    return res;
+
+  /* This relies on the fact that the digit type is unsigned */
+  for(ix = 0; ix < USED(b); ix++) 
+    DIGIT(b, ix) = ~DIGIT(b, ix);
+
+  s_mp_clamp(b);
+
+  return MP_OKAY;
+
+} /* end mpl_not() */
+
+/* }}} */
+
+/* {{{ mpl_and(a, b, c) */
+
+mp_err mpl_and(mp_int *a, mp_int *b, mp_int *c)
+{
+  mp_int  *which, *other;
+  mp_err   res;
+  int      ix;
+
+  ARGCHK(a != NULL && b != NULL && c != NULL, MP_BADARG);
+
+  if(USED(a) <= USED(b)) {
+    which = a;
+    other = b;
+  } else {
+    which = b;
+    other = a;
+  }
+
+  if((res = mp_copy(which, c)) != MP_OKAY)
+    return res;
+
+  for(ix = 0; ix < USED(which); ix++)
+    DIGIT(c, ix) &= DIGIT(other, ix);
+
+  s_mp_clamp(c);
+
+  return MP_OKAY;
+
+} /* end mpl_and() */
+
+/* }}} */
+
+/* {{{ mpl_or(a, b, c) */
+
+mp_err mpl_or(mp_int *a, mp_int *b, mp_int *c)
+{
+  mp_int  *which, *other;
+  mp_err   res;
+  int      ix;
+
+  ARGCHK(a != NULL && b != NULL && c != NULL, MP_BADARG);
+
+  if(USED(a) >= USED(b)) {
+    which = a;
+    other = b;
+  } else {
+    which = b;
+    other = a;
+  }
+
+  if((res = mp_copy(which, c)) != MP_OKAY)
+    return res;
+
+  for(ix = 0; ix < USED(which); ix++)
+    DIGIT(c, ix) |= DIGIT(other, ix);
+
+  return MP_OKAY;
+
+} /* end mpl_or() */
+
+/* }}} */
+
+/* {{{ mpl_xor(a, b, c) */
+
+mp_err mpl_xor(mp_int *a, mp_int *b, mp_int *c)
+{
+  mp_int  *which, *other;
+  mp_err   res;
+  int      ix;
+
+  ARGCHK(a != NULL && b != NULL && c != NULL, MP_BADARG);
+
+  if(USED(a) >= USED(b)) {
+    which = a;
+    other = b;
+  } else {
+    which = b;
+    other = a;
+  }
+
+  if((res = mp_copy(which, c)) != MP_OKAY)
+    return res;
+
+  for(ix = 0; ix < USED(which); ix++)
+    DIGIT(c, ix) ^= DIGIT(other, ix);
+
+  s_mp_clamp(c);
+
+  return MP_OKAY;
+
+} /* end mpl_xor() */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/*
+  mpl_rsh(a, b, d)     - b = a >> d
+  mpl_lsh(a, b, d)     - b = a << d
+ */
+
+/* {{{ mpl_rsh(a, b, d) */
+
+mp_err mpl_rsh(mp_int *a, mp_int *b, mp_digit d)
+{
+  mp_err   res;
+  mp_digit dshift, bshift;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  dshift = d / DIGIT_BIT;  /* How many whole digits to shift by */
+  bshift = d % DIGIT_BIT;  /* How many bits to shift by         */
+
+  if((res = mp_copy(a, b)) != MP_OKAY)
+    return res;
+
+  /* Shift over as many whole digits as necessary */
+  if(dshift) 
+    s_mp_rshd(b, dshift);
+
+  /* Now handle any remaining bit shifting */
+  if(bshift)
+  {
+    mp_digit  prev = 0, next, mask = (1 << bshift) - 1;
+    int       ix;
+
+    /*
+      'mask' is a digit with the lower bshift bits set, the rest
+      clear.  It is used to mask off the bottom bshift bits of each
+      digit, which are then shifted on to the top of the next lower
+      digit.
+     */
+    for(ix = USED(b) - 1; ix >= 0; ix--) {
+      /* Take off the lower bits and shift them up... */
+      next = (DIGIT(b, ix) & mask) << (DIGIT_BIT - bshift);
+
+      /* Shift down the current digit, and mask in the bits saved
+	 from the previous digit 
+       */
+      DIGIT(b, ix) = (DIGIT(b, ix) >> bshift) | prev;
+      prev = next;
+    }
+  }
+
+  s_mp_clamp(b);   
+
+  return MP_OKAY;
+
+} /* end mpl_rsh() */
+
+/* }}} */
+
+/* {{{ mpl_lsh(a, b, d) */
+
+mp_err mpl_lsh(mp_int *a, mp_int *b, mp_digit d)
+{
+  mp_err   res;
+  mp_digit dshift, bshift;
+
+  ARGCHK(a != NULL && b != NULL, MP_BADARG);
+
+  dshift = d / DIGIT_BIT;
+  bshift = d % DIGIT_BIT;
+
+  if((res = mp_copy(a, b)) != MP_OKAY)
+    return res;
+
+  if(dshift)
+    if((res = s_mp_lshd(b, dshift)) != MP_OKAY)
+      return res;
+
+  if(bshift){ 
+    int       ix;
+    mp_digit  prev = 0, next, mask = (1 << bshift) - 1;
+
+    for(ix = 0; ix < USED(b); ix++) {
+      next = (DIGIT(b, ix) >> (DIGIT_BIT - bshift)) & mask;
+
+      DIGIT(b, ix) = (DIGIT(b, ix) << bshift) | prev;
+      prev = next;
+    }
+  }
+
+  s_mp_clamp(b);
+
+  return MP_OKAY;
+
+} /* end mpl_lsh() */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/*
+  mpl_num_set(a, num)
+
+  Count the number of set bits in the binary representation of a.
+  Returns MP_OKAY and sets 'num' to be the number of such bits, if
+  possible.  If num is NULL, the result is thrown away, but it is
+  not considered an error.
+
+  mpl_num_clear() does basically the same thing for clear bits.
+ */
+
+/* {{{ mpl_num_set(a, num) */
+
+mp_err mpl_num_set(mp_int *a, int *num)
+{
+  int            ix, db, nset = 0;
+  mp_digit       cur;
+  unsigned char  reg;
+
+  ARGCHK(a != NULL, MP_BADARG);
+
+  for(ix = 0; ix < USED(a); ix++) {
+    cur = DIGIT(a, ix);
+    
+    for(db = 0; db < sizeof(mp_digit); db++) {
+      reg = (cur >> (CHAR_BIT * db)) & UCHAR_MAX;
+
+      nset += bitc[reg];
+    }
+  }
+
+  if(num)
+    *num = nset;
+
+  return MP_OKAY;
+
+} /* end mpl_num_set() */
+
+/* }}} */
+
+/* {{{ mpl_num_clear(a, num) */
+
+mp_err mpl_num_clear(mp_int *a, int *num)
+{
+  int            ix, db, nset = 0;
+  mp_digit       cur;
+  unsigned char  reg;
+
+  ARGCHK(a != NULL, MP_BADARG);
+
+  for(ix = 0; ix < USED(a); ix++) {
+    cur = DIGIT(a, ix);
+    
+    for(db = 0; db < sizeof(mp_digit); db++) {
+      reg = (cur >> (CHAR_BIT * db)) & UCHAR_MAX;
+
+      nset += bitc[UCHAR_MAX - reg];
+    }
+  }
+
+  if(num)
+    *num = nset;
+
+  return MP_OKAY;
+
+
+} /* end mpl_num_clear() */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/*
+  mpl_parity(a)
+
+  Determines the bitwise parity of the value given.  Returns MP_EVEN
+  if an even number of digits are set, MP_ODD if an odd number are
+  set.
+ */
+
+/* {{{ mpl_parity(a) */
+
+mp_err mpl_parity(mp_int *a)
+{
+  int      ix, par = 0;
+  mp_digit cur;
+
+  ARGCHK(a != NULL, MP_BADARG);
+
+  for(ix = 0; ix < USED(a); ix++) {
+    int   shft = (sizeof(mp_digit) * CHAR_BIT) / 2;
+
+    cur = DIGIT(a, ix);
+
+    /* Compute parity for current digit */
+    while(shft != 0) {
+      cur ^= (cur >> shft);
+      shft >>= 1;
+    }
+    cur &= 1;
+
+    /* XOR with running parity so far   */
+    par ^= cur;
+  }
+
+  if(par)
+    return MP_ODD;
+  else
+    return MP_EVEN;
+
+} /* end mpl_parity() */
+
+/* }}} */
+
+/*------------------------------------------------------------------------*/
+/* HERE THERE BE DRAGONS                                                  */
diff --git a/mpi/mplogic.h b/mpi/mplogic.h
new file mode 100644
index 00000000..8400da3d
--- /dev/null
+++ b/mpi/mplogic.h
@@ -0,0 +1,47 @@
+/*
+    mplogic.h
+
+    by Michael J. Fromberger <http://www.dartmouth.edu/~sting/>
+    Copyright (C) 1998 Michael J. Fromberger, All Rights Reserved
+
+    Bitwise logical operations on MPI values
+
+    $Id: mplogic.h,v 1.1 2004/02/08 04:29:29 sting Exp $
+ */
+
+#ifndef _H_MPLOGIC_
+#define _H_MPLOGIC_
+
+#include "mpi.h"
+
+/*
+  The logical operations treat an mp_int as if it were a bit vector,
+  without regard to its sign (an mp_int is represented in a signed
+  magnitude format).  Values are treated as if they had an infinite
+  string of zeros left of the most-significant bit.
+ */
+
+/* Parity results                    */
+
+#define MP_EVEN       MP_YES
+#define MP_ODD        MP_NO
+
+/* Bitwise functions                 */
+
+mp_err mpl_not(mp_int *a, mp_int *b);            /* one's complement  */
+mp_err mpl_and(mp_int *a, mp_int *b, mp_int *c); /* bitwise AND       */
+mp_err mpl_or(mp_int *a, mp_int *b, mp_int *c);  /* bitwise OR        */
+mp_err mpl_xor(mp_int *a, mp_int *b, mp_int *c); /* bitwise XOR       */
+
+/* Shift functions                   */
+
+mp_err mpl_rsh(mp_int *a, mp_int *b, mp_digit d);   /* right shift    */
+mp_err mpl_lsh(mp_int *a, mp_int *b, mp_digit d);   /* left shift     */
+
+/* Bit count and parity              */
+
+mp_err mpl_num_set(mp_int *a, int *num);         /* count set bits    */
+mp_err mpl_num_clear(mp_int *a, int *num);       /* count clear bits  */
+mp_err mpl_parity(mp_int *a);                    /* determine parity  */
+
+#endif /* end _H_MPLOGIC_ */