Commit 2f1e4f6c authored by Robert Sprowson, committed by ROOL

Update SoftFloat library to 2c

The 2015 library release is almost identical to 2b, except for:
* Many comment improvements and changes to the legal wording
* A fix for the 128-bit shift in softfloat-macros
* A few 32->32 and 64->64 casts, which do nothing on ARM

Version 0.14. Tagged as 'VFPSupport-0_14'
parent 8689395a
;
; This file is automatically maintained by srccommit, do not edit manually.
; Last processed by srccommit version: 1.1.
;
GBLS Module_MajorVersion
GBLA Module_Version
......@@ -10,14 +9,12 @@
GBLS Module_ApplicationDate
GBLS Module_HelpVersion
GBLS Module_ComponentName
GBLS Module_ComponentPath
Module_MajorVersion SETS "0.13"
Module_Version SETA 13
Module_MajorVersion SETS "0.14"
Module_Version SETA 14
Module_MinorVersion SETS ""
Module_Date SETS "20 Feb 2018"
Module_ApplicationDate SETS "20-Feb-18"
Module_Date SETS "03 May 2021"
Module_ApplicationDate SETS "03-May-21"
Module_ComponentName SETS "VFPSupport"
Module_ComponentPath SETS "mixed/RiscOS/Sources/HWSupport/VFPSupport"
Module_FullVersion SETS "0.13"
Module_HelpVersion SETS "0.13 (20 Feb 2018)"
Module_FullVersion SETS "0.14"
Module_HelpVersion SETS "0.14 (03 May 2021)"
END
/* (0.13)
/* (0.14)
*
* This file is automatically maintained by srccommit, do not edit manually.
* Last processed by srccommit version: 1.1.
*
*/
#define Module_MajorVersion_CMHG 0.13
#define Module_MinorVersion_CMHG
#define Module_Date_CMHG 20 Feb 2018
#define Module_MajorVersion_CMHG 0.14
#define Module_MinorVersion_CMHG
#define Module_Date_CMHG 03 May 2021
#define Module_MajorVersion "0.13"
#define Module_Version 13
#define Module_MajorVersion "0.14"
#define Module_Version 14
#define Module_MinorVersion ""
#define Module_Date "20 Feb 2018"
#define Module_Date "03 May 2021"
#define Module_ApplicationDate "20-Feb-18"
#define Module_ApplicationDate "03-May-21"
#define Module_ComponentName "VFPSupport"
#define Module_ComponentPath "mixed/RiscOS/Sources/HWSupport/VFPSupport"
#define Module_FullVersion "0.13"
#define Module_HelpVersion "0.13 (20 Feb 2018)"
#define Module_LibraryVersionInfo "0:13"
#define Module_FullVersion "0.14"
#define Module_HelpVersion "0.14 (03 May 2021)"
#define Module_LibraryVersionInfo "0:14"
This directory contains a modified version of the SoftFloat library, which VFPSupport's support code uses to perform floating point calculations in an IEEE-compliant way.
The bits64 version of SoftFloat Release 2b was used as the basis, obtained from http://www.jhauser.us/arithmetic/SoftFloat.html
The bits64 version of SoftFloat Release 2c was used as the basis, obtained from http://www.jhauser.us/arithmetic/SoftFloat.html
The original SoftFloat license text follows
----------------------------------------------------------------------------
Legal Notice
SoftFloat was written by me, John R. Hauser. This work was made possible in
part by the International Computer Science Institute, located at Suite 600,
1947 Center Street, Berkeley, California 94704. Funding was partially
provided by the National Science Foundation under grant MIP-9311980. The
original version of this code was written as part of a project to build
a fixed-point vector processor in collaboration with the University of
California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.
SoftFloat was written by John R. Hauser. Release 2c of SoftFloat was made
possible in part by the International Computer Science Institute, located
at Suite 600, 1947 Center Street, Berkeley, California 94704. Funding
was partially provided by the National Science Foundation under grant
MIP-9311980. The original version of this code was written as part of a
project to build a fixed-point vector processor in collaboration with the
University of California at Berkeley, overseen by Profs. Nelson Morgan and
John Wawrzynek.
THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL
LOSSES, COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO
FURTHERMORE EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER
SCIENCE INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES,
COSTS, OR OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE
SOFTWARE.
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TOLERATE ALL LOSSES, COSTS, OR
OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE WITHOUT RECOMPENSE FROM JOHN
HAUSER OR THE INTERNATIONAL COMPUTER SCIENCE INSTITUTE, AND WHO FURTHERMORE
EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
INSTITUTE (possibly via similar legal notice) AGAINST ALL LOSSES, COSTS, OR
OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE,
OR INCURRED BY ANYONE DUE TO A DERIVATIVE WORK THEY CREATE USING ANY PART OF
THE SOFTWARE.
Derivative works are acceptable, even for commercial purposes, provided
that the minimal documentation requirements stated in the source code are
satisfied.
The following are expressly permitted, even for commercial purposes:
(1) distribution of SoftFloat in whole or in part, as long as this and
other legal notices remain and are prominent, and provided also that, for a
partial distribution, prominent notice is given that it is a subset of the
original; and
(2) inclusion or use of SoftFloat in whole or in part in a derivative
work, provided that the use restrictions above are met and the minimal
documentation requirements stated in the source code are satisfied.
......@@ -4,32 +4,25 @@
/*============================================================================
This C source file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic
Package, Release 2b.
Written by John R. Hauser. This work was made possible in part by the
International Computer Science Institute, located at Suite 600, 1947 Center
Street, Berkeley, California 94704. Funding was partially provided by the
National Science Foundation under grant MIP-9311980. The original version
of this code was written as part of a project to build a fixed-point vector
processor in collaboration with the University of California at Berkeley,
overseen by Profs. Nelson Morgan and John Wawrzynek. More information
is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
arithmetic/SoftFloat.html'.
This C source file is part of the Berkeley SoftFloat IEEE Floating-Point
Arithmetic Package, Release 2c, by John R. Hauser.
THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR
OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
Derivative works are acceptable, even for commercial purposes, so long as
(1) the source code for the derivative work includes prominent notice that
the work is derivative, and (2) the source code includes prominent notice with
these four paragraphs for those parts of this code that are retained.
AND ORGANIZATIONS WHO CAN AND WILL TOLERATE ALL LOSSES, COSTS, OR OTHER
PROBLEMS THEY INCUR DUE TO THE SOFTWARE WITHOUT RECOMPENSE FROM JOHN HAUSER OR
THE INTERNATIONAL COMPUTER SCIENCE INSTITUTE, AND WHO FURTHERMORE EFFECTIVELY
INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE INSTITUTE
(possibly via similar legal notice) AGAINST ALL LOSSES, COSTS, OR OTHER
PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE, OR
INCURRED BY ANYONE DUE TO A DERIVATIVE WORK THEY CREATE USING ANY PART OF THE
SOFTWARE.
Derivative works require also that (1) the source code for the derivative work
includes prominent notice that the work is derivative, and (2) the source code
includes prominent notice of these three paragraphs for those parts of this
code that are retained.
=============================================================================*/
......@@ -37,7 +30,7 @@ these four paragraphs for those parts of this code that are retained.
#include "softfloat.h"
/*----------------------------------------------------------------------------
| Floating-point rounding mode, extended double-precision rounding precision,
| Floating-point rounding mode, double-extended-precision rounding precision,
| and exception flags.
*----------------------------------------------------------------------------*/
#ifndef __riscos
......@@ -112,6 +105,7 @@ static int32 roundAndPackInt32( flag zSign, bits64 absZ )
z = absZ;
#endif
if ( zSign ) z = - z;
z = (sbits32) z;
if ( ( absZ>>32 ) || ( z && ( ( z < 0 ) ^ zSign ) ) ) {
float_raise( float_flag_invalid );
return zSign ? (sbits32) 0x80000000 : 0x7FFFFFFF;
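The overflow test in this hunk can be tried in isolation. The following is a minimal sketch under stated assumptions, not the library's code: `saturate_to_int32` and its `invalid` out-parameter are hypothetical names, the `<stdint.h>` types stand in for SoftFloat's `bits64` and `sbits32`, and the rounding that precedes the check in `roundAndPackInt32` is left out. It shows why a nonzero result whose sign disagrees with `zSign` is treated as overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of SoftFloat): turn a sign/magnitude pair
   into an int32_t, saturating when the magnitude does not fit. */
static int32_t saturate_to_int32(int sign, uint64_t abs_z, int *invalid)
{
    assert(invalid != NULL);

    /* Negate in unsigned arithmetic so that wrap-around is well defined. */
    uint32_t low = (uint32_t) abs_z;
    if (sign) low = 0u - low;

    /* Reinterpret as signed. For values above INT32_MAX this conversion is
       implementation-defined; common two's complement compilers wrap. */
    int32_t z = (int32_t) low;

    /* Overflow if high bits were dropped, or if the wrapped result ended up
       with the opposite sign to the one requested. */
    if ((abs_z >> 32) || (z && ((z < 0) != (sign != 0)))) {
        *invalid = 1;
        return sign ? INT32_MIN : INT32_MAX;
    }
    *invalid = 0;
    return z;
}
```

In this sketch a magnitude of 0x80000000 is accepted with the sign set (it is exactly INT32_MIN) but saturates to INT32_MAX without it.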
......@@ -166,6 +160,7 @@ static int64 roundAndPackInt64( flag zSign, bits64 absZ0, bits64 absZ1 )
}
z = absZ0;
if ( zSign ) z = - z;
z = (sbits64) z;
if ( z && ( ( z < 0 ) ^ zSign ) ) {
overflow:
float_raise( float_flag_invalid );
......@@ -264,9 +259,9 @@ INLINE float32 packFloat32( flag zSign, int16 zExp, bits32 zSig )
| significand must be normalized or smaller. If `zSig' is not normalized,
| `zExp' must be 0; in that case, the result returned is a subnormal number,
| and it must not require rounding. In the usual case that `zSig' is
| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
| The handling of underflow and overflow follows the IEC/IEEE Standard for
| Binary Floating-Point Arithmetic.
| normalized, `zExp' must be 1 less than the "true" floating-point exponent.
| The handling of underflow and overflow follows the IEEE Standard for
| Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
static float32 roundAndPackFloat32( flag zSign, int16 zExp, bits32 zSig )
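The requirement that `zExp' be 1 less than the "true" exponent is easier to see with numbers. The sketch below assumes the pack step adds the three fields together rather than ORing them, as the `packFloat32' routine named in the hunk header is understood to do upstream; `pack_float32_bits` is a hypothetical name and the `<stdint.h>` types are stand-ins for SoftFloat's own. Once a normalized significand has been rounded and shifted so that its leading bit sits at bit 23, that bit carries into the exponent field and supplies the missing 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a single-precision pack step. The fields are
   added, not ORed, so an integer bit at position 23 of `sig` increments
   the exponent field. */
static uint32_t pack_float32_bits(unsigned sign, int16_t exp, uint32_t sig)
{
    assert(sign <= 1u);
    return ((uint32_t) sign << 31) + ((uint32_t) exp << 23) + sig;
}
```

For example, `pack_float32_bits(0, 0x7E, 0x00800000)` gives 0x3F800000, the encoding of 1.0, even though the biased exponent of 1.0 is 0x7F. A subnormal significand has no bit 23 set, so with an exponent of 0 the field is left untouched.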
......@@ -326,7 +321,7 @@ static float32 roundAndPackFloat32( flag zSign, int16 zExp, bits32 zSig )
| and significand `zSig', and returns the proper single-precision floating-
| point value corresponding to the abstract input. This routine is just like
| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the "true"
| floating-point exponent.
*----------------------------------------------------------------------------*/
......@@ -434,9 +429,9 @@ INLINE float64 packFloat64( flag zSign, int16 zExp, bits64 zSig )
| significand must be normalized or smaller. If `zSig' is not normalized,
| `zExp' must be 0; in that case, the result returned is a subnormal number,
| and it must not require rounding. In the usual case that `zSig' is
| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
| The handling of underflow and overflow follows the IEC/IEEE Standard for
| Binary Floating-Point Arithmetic.
| normalized, `zExp' must be 1 less than the "true" floating-point exponent.
| The handling of underflow and overflow follows the IEEE Standard for
| Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
static float64 roundAndPackFloat64( flag zSign, int16 zExp, bits64 zSig )
......@@ -508,7 +503,7 @@ static float64 roundAndPackFloat64( flag zSign, int16 zExp, bits64 zSig )
| and significand `zSig', and returns the proper double-precision floating-
| point value corresponding to the abstract input. This routine is just like
| `roundAndPackFloat64' except that `zSig' does not have to be normalized.
| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the "true"
| floating-point exponent.
*----------------------------------------------------------------------------*/
......@@ -525,7 +520,7 @@ static float64
#ifdef FLOATX80
/*----------------------------------------------------------------------------
| Returns the fraction bits of the extended double-precision floating-point
| Returns the fraction bits of the double-extended-precision floating-point
| value `a'.
*----------------------------------------------------------------------------*/
......@@ -537,7 +532,7 @@ INLINE bits64 extractFloatx80Frac( floatx80 a )
}
/*----------------------------------------------------------------------------
| Returns the exponent bits of the extended double-precision floating-point
| Returns the exponent bits of the double-extended-precision floating-point
| value `a'.
*----------------------------------------------------------------------------*/
......@@ -549,7 +544,7 @@ INLINE int32 extractFloatx80Exp( floatx80 a )
}
/*----------------------------------------------------------------------------
| Returns the sign bit of the extended double-precision floating-point value
| Returns the sign bit of the double-extended-precision floating-point value
| `a'.
*----------------------------------------------------------------------------*/
......@@ -561,7 +556,7 @@ INLINE flag extractFloatx80Sign( floatx80 a )
}
/*----------------------------------------------------------------------------
| Normalizes the subnormal extended double-precision floating-point value
| Normalizes the subnormal double-extended-precision floating-point value
| represented by the denormalized significand `aSig'. The normalized exponent
| and significand are stored at the locations pointed to by `zExpPtr' and
| `zSigPtr', respectively.
......@@ -580,7 +575,7 @@ static void
/*----------------------------------------------------------------------------
| Packs the sign `zSign', exponent `zExp', and significand `zSig' into an
| extended double-precision floating-point value, returning the result.
| double-extended-precision floating-point value, returning the result.
*----------------------------------------------------------------------------*/
INLINE floatx80 packFloatx80( flag zSign, int32 zExp, bits64 zSig )
......@@ -596,25 +591,25 @@ INLINE floatx80 packFloatx80( flag zSign, int32 zExp, bits64 zSig )
/*----------------------------------------------------------------------------
| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
| and extended significand formed by the concatenation of `zSig0' and `zSig1',
| and returns the proper extended double-precision floating-point value
| and returns the proper double-extended-precision floating-point value
| corresponding to the abstract input. Ordinarily, the abstract value is
| rounded and packed into the extended double-precision format, with the
| rounded and packed into the double-extended-precision format, with the
| inexact exception raised if the abstract input cannot be represented
| exactly. However, if the abstract value is too large, the overflow and
| inexact exceptions are raised and an infinity or maximal finite value is
| returned. If the abstract value is too small, the input value is rounded to
| a subnormal number, and the underflow and inexact exceptions are raised if
| the abstract input cannot be represented exactly as a subnormal extended
| double-precision floating-point number.
| returned. If the abstract value is too small, the input value is rounded
| to a subnormal number, and the underflow and inexact exceptions are raised
| if the abstract input cannot be represented exactly as a subnormal double-
| extended-precision floating-point number.
| If `roundingPrecision' is 32 or 64, the result is rounded to the same
| number of bits as single or double precision, respectively. Otherwise, the
| result is rounded to the full precision of the extended double-precision
| number of bits as single- or double-precision, respectively. Otherwise,
| the result is rounded to the full precision of the double-extended-precision
| format.
| The input significand must be normalized or smaller. If the input
| significand is not normalized, `zExp' must be 0; in that case, the result
| returned is a subnormal number, and it must not require rounding. The
| handling of underflow and overflow follows the IEC/IEEE Standard for Binary
| Floating-Point Arithmetic.
| handling of underflow and overflow follows the IEEE Standard for Floating-
| Point Arithmetic.
*----------------------------------------------------------------------------*/
static floatx80
......@@ -779,7 +774,7 @@ static floatx80
/*----------------------------------------------------------------------------
| Takes an abstract floating-point value having sign `zSign', exponent
| `zExp', and significand formed by the concatenation of `zSig0' and `zSig1',
| and returns the proper extended double-precision floating-point value
| and returns the proper double-extended-precision floating-point value
| corresponding to the abstract input. This routine is just like
| `roundAndPackFloatx80' except that the input significand does not have to be
| normalized.
......@@ -938,8 +933,8 @@ INLINE float128
| significand is not normalized, `zExp' must be 0; in that case, the result
| returned is a subnormal number, and it must not require rounding. In the
| usual case that the input significand is normalized, `zExp' must be 1 less
| than the ``true'' floating-point exponent. The handling of underflow and
| overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
| than the "true" floating-point exponent. The handling of underflow and
| overflow follows the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
static float128
......@@ -1038,7 +1033,7 @@ static float128
| returns the proper quadruple-precision floating-point value corresponding
| to the abstract input. This routine is just like `roundAndPackFloat128'
| except that the input significand has fewer bits and does not have to be
| normalized. In all cases, `zExp' must be 1 less than the ``true'' floating-
| normalized. In all cases, `zExp' must be 1 less than the "true" floating-
| point exponent.
*----------------------------------------------------------------------------*/
......@@ -1073,7 +1068,7 @@ static float128
/*----------------------------------------------------------------------------
| Returns the result of converting the 32-bit two's complement integer `a'
| to the single-precision floating-point format. The conversion is performed
| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
| according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float32 int32_to_float32( int32 a )
......@@ -1090,7 +1085,7 @@ float32 int32_to_float32( int32 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the 32-bit two's complement integer `a'
| to the double-precision floating-point format. The conversion is performed
| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
| according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float64 int32_to_float64( int32 a )
......@@ -1113,9 +1108,8 @@ float64 int32_to_float64( int32 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the 32-bit two's complement integer `a'
| to the extended double-precision floating-point format. The conversion
| is performed according to the IEC/IEEE Standard for Binary Floating-Point
| Arithmetic.
| to the double-extended-precision floating-point format. The conversion is
| performed according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
floatx80 int32_to_floatx80( int32 a )
......@@ -1141,7 +1135,7 @@ floatx80 int32_to_floatx80( int32 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the 32-bit two's complement integer `a' to
| the quadruple-precision floating-point format. The conversion is performed
| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
| according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float128 int32_to_float128( int32 a )
......@@ -1165,7 +1159,7 @@ float128 int32_to_float128( int32 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the 64-bit two's complement integer `a'
| to the single-precision floating-point format. The conversion is performed
| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
| according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float32 int64_to_float32( int64 a )
......@@ -1208,7 +1202,7 @@ float32 int64_to_float32( int64 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the 64-bit two's complement integer `a'
| to the double-precision floating-point format. The conversion is performed
| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
| according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float64 int64_to_float64( int64 a )
......@@ -1228,9 +1222,8 @@ float64 int64_to_float64( int64 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the 64-bit two's complement integer `a'
| to the extended double-precision floating-point format. The conversion
| is performed according to the IEC/IEEE Standard for Binary Floating-Point
| Arithmetic.
| to the double-extended-precision floating-point format. The conversion
| is performed according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
floatx80 int64_to_floatx80( int64 a )
......@@ -1254,7 +1247,7 @@ floatx80 int64_to_floatx80( int64 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the 64-bit two's complement integer `a' to
| the quadruple-precision floating-point format. The conversion is performed
| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
| according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float128 int64_to_float128( int64 a )
......@@ -1289,11 +1282,11 @@ float128 int64_to_float128( int64 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the single-precision floating-point value
| `a' to the 32-bit two's complement integer format. The conversion is
| performed according to the IEC/IEEE Standard for Binary Floating-Point
| Arithmetic---which means in particular that the conversion is rounded
| according to the current rounding mode. If `a' is a NaN, the largest
| positive integer is returned. Otherwise, if the conversion overflows, the
| largest integer with the same sign as `a' is returned.
| performed according to the IEEE Standard for Floating-Point Arithmetic---
| which means in particular that the conversion is rounded according to the
| current rounding mode. If `a' is a NaN, the largest positive integer is
| returned. Otherwise, if the conversion overflows, the largest integer with
| the same sign as `a' is returned.
*----------------------------------------------------------------------------*/
int32 float32_to_int32( float32 a )
......@@ -1319,11 +1312,10 @@ int32 float32_to_int32( float32 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the single-precision floating-point value
| `a' to the 32-bit two's complement integer format. The conversion is
| performed according to the IEC/IEEE Standard for Binary Floating-Point
| Arithmetic, except that the conversion is always rounded toward zero.
| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
| the conversion overflows, the largest integer with the same sign as `a' is
| returned.
| performed according to the IEEE Standard for Floating-Point Arithmetic,
| except that the conversion is always rounded toward zero. If `a' is a NaN,
| the largest positive integer is returned. Otherwise, if the conversion
| overflows, the largest integer with the same sign as `a' is returned.
*----------------------------------------------------------------------------*/
int32 float32_to_int32_round_to_zero( float32 a )
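The NaN and overflow results described in the comment above can be sketched with the host's own `float` type. This is an illustration only, not how SoftFloat implements the routine: `float_to_int32_saturating` is a hypothetical name, the host `float` is assumed to be IEEE single precision, and the exception flags that the library raises are not modelled.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

/* Hypothetical helper: truncate toward zero, returning the largest positive
   integer for NaN and the largest integer of matching sign on overflow. */
static int32_t float_to_int32_saturating(float a)
{
    if (isnan(a)) return INT32_MAX;
    /* 2^31 and -2^31 are both exactly representable as float. */
    if (a >= 2147483648.0f) return INT32_MAX;
    if (a < -2147483648.0f) return INT32_MIN;
    /* In range: a C float-to-integer cast truncates toward zero. */
    return (int32_t) a;
}
```

The explicit range checks matter because a plain cast of an out-of-range or NaN value is undefined behaviour in C, whereas the comment above pins those results down.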
......@@ -1361,11 +1353,11 @@ int32 float32_to_int32_round_to_zero( float32 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the single-precision floating-point value
| `a' to the 64-bit two's complement integer format. The conversion is
| performed according to the IEC/IEEE Standard for Binary Floating-Point
| Arithmetic---which means in particular that the conversion is rounded
| according to the current rounding mode. If `a' is a NaN, the largest
| positive integer is returned. Otherwise, if the conversion overflows, the
| largest integer with the same sign as `a' is returned.
| performed according to the IEEE Standard for Floating-Point Arithmetic---
| which means in particular that the conversion is rounded according to the
| current rounding mode. If `a' is a NaN, the largest positive integer is
| returned. Otherwise, if the conversion overflows, the largest integer with
| the same sign as `a' is returned.
*----------------------------------------------------------------------------*/
int64 float32_to_int64( float32 a )
......@@ -1397,11 +1389,10 @@ int64 float32_to_int64( float32 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the single-precision floating-point value
| `a' to the 64-bit two's complement integer format. The conversion is
| performed according to the IEC/IEEE Standard for Binary Floating-Point
| Arithmetic, except that the conversion is always rounded toward zero. If
| `a' is a NaN, the largest positive integer is returned. Otherwise, if the
| conversion overflows, the largest integer with the same sign as `a' is
| returned.
| performed according to the IEEE Standard for Floating-Point Arithmetic,
| except that the conversion is always rounded toward zero. If `a' is a NaN,
| the largest positive integer is returned. Otherwise, if the conversion
| overflows, the largest integer with the same sign as `a' is returned.
*----------------------------------------------------------------------------*/
int64 float32_to_int64_round_to_zero( float32 a )
......@@ -1447,8 +1438,7 @@ int64 float32_to_int64_round_to_zero( float32 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the single-precision floating-point value
| `a' to the double-precision floating-point format. The conversion is
| performed according to the IEC/IEEE Standard for Binary Floating-Point
| Arithmetic.
| performed according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float64 float32_to_float64( float32 a )
......@@ -1477,9 +1467,8 @@ float64 float32_to_float64( float32 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the single-precision floating-point value
| `a' to the extended double-precision floating-point format. The conversion
| is performed according to the IEC/IEEE Standard for Binary Floating-Point
| Arithmetic.
| `a' to the double-extended-precision floating-point format. The conversion
| is performed according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
floatx80 float32_to_floatx80( float32 a )
......@@ -1511,8 +1500,7 @@ floatx80 float32_to_floatx80( float32 a )
/*----------------------------------------------------------------------------
| Returns the result of converting the single-precision floating-point value
| `a' to the quadruple-precision floating-point format. The conversion is
| performed according to the IEC/IEEE Standard for Binary Floating-Point
| Arithmetic.
| performed according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float128 float32_to_float128( float32 a )
......@@ -1540,10 +1528,10 @@ float128 float32_to_float128( float32 a )
#endif
/*----------------------------------------------------------------------------
| Rounds the single-precision floating-point value `a' to an integer, and
| returns the result as a single-precision floating-point value. The
| operation is performed according to the IEC/IEEE Standard for Binary
| Floating-Point Arithmetic.
| Rounds the single-precision floating-point value `a' to an integer,
| and returns the result as a single-precision floating-point value. The
| operation is performed according to the IEEE Standard for Floating-Point
| Arithmetic.
*----------------------------------------------------------------------------*/
float32 float32_round_to_int( float32 a )
......@@ -1601,9 +1589,9 @@ float32 float32_round_to_int( float32 a )
/*----------------------------------------------------------------------------
| Returns the result of adding the absolute values of the single-precision
| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
| before being returned. `zSign' is ignored if the result is a NaN.
| The addition is performed according to the IEC/IEEE Standard for Binary
| Floating-Point Arithmetic.
| before being returned. `zSign' is ignored if the result is a NaN. The
| addition is performed according to the IEEE Standard for Floating-Point
| Arithmetic.
*----------------------------------------------------------------------------*/
static float32 addFloat32Sigs( float32 a, float32 b, flag zSign )
......@@ -1673,8 +1661,8 @@ static float32 addFloat32Sigs( float32 a, float32 b, flag zSign )
| Returns the result of subtracting the absolute values of the single-
| precision floating-point values `a' and `b'. If `zSign' is 1, the
| difference is negated before being returned. `zSign' is ignored if the
| result is a NaN. The subtraction is performed according to the IEC/IEEE
| Standard for Binary Floating-Point Arithmetic.
| result is a NaN. The subtraction is performed according to the IEEE
| Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
static float32 subFloat32Sigs( float32 a, float32 b, flag zSign )
......@@ -1745,9 +1733,9 @@ static float32 subFloat32Sigs( float32 a, float32 b, flag zSign )
}
/*----------------------------------------------------------------------------
| Returns the result of adding the single-precision floating-point values `a'
| and `b'. The operation is performed according to the IEC/IEEE Standard for
| Binary Floating-Point Arithmetic.
| Returns the result of adding the single-precision floating-point values
| `a' and `b'. The operation is performed according to the IEEE Standard for
| Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float32 float32_add( float32 a, float32 b )
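For orientation when reading these comments: the two `...Sigs` helpers above work on absolute values plus an explicit `zSign`, so a signed add or subtract only has to look at the operands' sign bits to choose between them. The following is a minimal sketch of that decision, not the library's code. It assumes a `float32` is held as a raw IEEE 754 binary32 bit pattern in a `uint32_t`, and every name in it (`f32_sign`, `f32_exp`, `f32_frac`, `add_uses_magnitude_add`, `sub_uses_magnitude_add`) is hypothetical.

```c
#include <stdint.h>

/* Hypothetical field extractors, assuming a raw binary32 bit pattern:
   1 sign bit, 8 exponent bits, 23 fraction bits. */
static uint32_t f32_sign(uint32_t a) { return a >> 31; }
static uint32_t f32_exp(uint32_t a)  { return (a >> 23) & 0xFFu; }
static uint32_t f32_frac(uint32_t a) { return a & 0x007FFFFFu; }

/* a + b adds magnitudes when the signs agree, and subtracts
   magnitudes when they differ. Returns 1 for "add magnitudes". */
static int add_uses_magnitude_add(uint32_t a, uint32_t b)
{
    return f32_sign(a) == f32_sign(b);
}

/* a - b is a + (-b), so the choice is the opposite one. */
static int sub_uses_magnitude_add(uint32_t a, uint32_t b)
{
    return f32_sign(a) != f32_sign(b);
}
```

For example, `0x3F800000` (1.0) and `0xC0000000` (-2.0) have different signs, so adding them reduces to a magnitude subtraction, while subtracting them reduces to a magnitude addition.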
......@@ -1767,8 +1755,8 @@ float32 float32_add( float32 a, float32 b )
/*----------------------------------------------------------------------------
| Returns the result of subtracting the single-precision floating-point values
-| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
+| `a' and `b'. The operation is performed according to the IEEE Standard for
+| Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float32 float32_sub( float32 a, float32 b )
......@@ -1788,8 +1776,8 @@ float32 float32_sub( float32 a, float32 b )
/*----------------------------------------------------------------------------
| Returns the result of multiplying the single-precision floating-point values
-| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
+| `a' and `b'. The operation is performed according to the IEEE Standard for
+| Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float32 float32_mul( float32 a, float32 b )
......@@ -1853,7 +1841,7 @@ float32 float32_mul( float32 a, float32 b )
/*----------------------------------------------------------------------------
| Returns the result of dividing the single-precision floating-point value `a'
| by the corresponding value `b'. The operation is performed according to the
-| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+| IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float32 float32_div( float32 a, float32 b )
......@@ -1919,7 +1907,7 @@ float32 float32_div( float32 a, float32 b )
/*----------------------------------------------------------------------------
| Returns the remainder of the single-precision floating-point value `a'
| with respect to the corresponding value `b'. The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+| according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
float32 float32_rem( float32 a, float32 b )
......@@ -2028,8 +2016,8 @@ float32 float32_rem( float32 a, float32 b )
/*----------------------------------------------------------------------------
| Returns the square root of the single-precision floating-point value `a'.
-| The operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
+| The operation is performed according to the IEEE Standard for Floating-Point
+| Arithmetic.
*----------------------------------------------------------------------------*/
/* Disable peephole optimisation as a workaround for a bug in CC 5.69 which
......@@ -2093,7 +2081,7 @@ float32 float32_sqrt( float32 a )
/*----------------------------------------------------------------------------
| Returns 1 if the single-precision floating-point value `a' is equal to
| the corresponding value `b', and 0 otherwise. The comparison is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+| according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
flag float32_eq( float32 a, float32 b )
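As a reminder of what "equal according to the IEEE Standard" means at the bit level, here is a small sketch under the assumption that operands are raw binary32 bit patterns in a `uint32_t`. The names `f32_is_nan` and `f32_eq_sketch` are hypothetical, and the sketch deliberately ignores exception flags. Two rules make this differ from a plain integer compare: a NaN never compares equal to anything, and +0 equals -0.

```c
#include <stdint.h>

/* A NaN has an all-ones exponent and a nonzero fraction. */
static int f32_is_nan(uint32_t a)
{
    return ((a & 0x7F800000u) == 0x7F800000u) && ((a & 0x007FFFFFu) != 0);
}

/* Returns 1 if a and b compare equal under IEEE rules, else 0. */
static int f32_eq_sketch(uint32_t a, uint32_t b)
{
    if (f32_is_nan(a) || f32_is_nan(b)) return 0;  /* NaN != everything */
    /* Identical patterns, or both operands are zero of either sign. */
    return (a == b) || (((a | b) & 0x7FFFFFFFu) == 0);
}
```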
......@@ -2114,8 +2102,7 @@ flag float32_eq( float32 a, float32 b )
/*----------------------------------------------------------------------------
| Returns 1 if the single-precision floating-point value `a' is less than
| or equal to the corresponding value `b', and 0 otherwise. The comparison
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
+| is performed according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
flag float32_le( float32 a, float32 b )
......@@ -2138,7 +2125,7 @@ flag float32_le( float32 a, float32 b )
/*----------------------------------------------------------------------------
| Returns 1 if the single-precision floating-point value `a' is less than
| the corresponding value `b', and 0 otherwise. The comparison is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+| according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
flag float32_lt( float32 a, float32 b )
......@@ -2162,7 +2149,7 @@ flag float32_lt( float32 a, float32 b )
| Returns 1 if the single-precision floating-point value `a' is equal to
| the corresponding value `b', and 0 otherwise. The invalid exception is
| raised if either operand is a NaN. Otherwise, the comparison is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+| according to the IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
flag float32_eq_signaling( float32 a, float32 b )
......@@ -2182,7 +2169,7 @@ flag float32_eq_signaling( float32 a, float32 b )
| Returns 1 if the single-precision floating-point value `a' is less than or
| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
| cause an exception. Otherwise, the comparison is performed according to the
-| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+| IEEE Standard for Floating-Point Arithmetic.
*----------------------------------------------------------------------------*/
flag float32_le_quiet( float32 a, float32 b )
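The "quiet" wording in the comment above can be illustrated with a sketch. It assumes raw binary32 bit patterns in a `uint32_t`, the common convention (used on ARM) that a clear top fraction bit marks a signaling NaN, and a plain `int` standing in for the library's exception flags. It also assumes that a signaling NaN still raises invalid in a quiet comparison, which the comment implies but does not state. All names (`invalid_raised`, `is_nan32`, `is_snan32`, `f32_le_quiet_sketch`) are hypothetical.

```c
#include <stdint.h>

static int invalid_raised = 0;  /* stand-in for an exception-flags word */

static int is_nan32(uint32_t a)
{
    return ((a & 0x7F800000u) == 0x7F800000u) && ((a & 0x007FFFFFu) != 0);
}

/* Assumption: top fraction bit clear means signaling. */
static int is_snan32(uint32_t a)
{
    return is_nan32(a) && ((a & 0x00400000u) == 0);
}

/* Returns 1 if a <= b. Quiet NaNs give 0 without raising invalid. */
static int f32_le_quiet_sketch(uint32_t a, uint32_t b)
{
    uint32_t a_sign = a >> 31, b_sign = b >> 31;

    if (is_nan32(a) || is_nan32(b)) {
        if (is_snan32(a) || is_snan32(b)) invalid_raised = 1;
        return 0;  /* unordered operands are never "less or equal" */
    }
    if (a_sign != b_sign) {
        /* A negative a is below a positive b; otherwise only +0 <= -0. */
        return a_sign || (((a | b) & 0x7FFFFFFFu) == 0);
    }
    /* Same sign: for negatives the bit-pattern order is reversed. */
    return (a == b) || (a_sign ^ (uint32_t)(a < b));
}
```

Comparing a quiet NaN such as `0x7FC00000` with 1.0 returns 0 and leaves `invalid_raised` untouched, whereas the signaling pattern `0x7F800001` returns 0 and sets it.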
......@@ -2210,8 +2197,8 @@ flag float32_le_quiet( float32 a, float32 b )
/*----------------------------------------------------------------------------
| Returns 1 if the single-precision floating-point value `a' is less than