Document: N1092
Date: 29-Nov-04

# The type and representation of unsuffixed floating constants

## Introduction

The proposal for decimal floating types in N1077 requires new suffixes for decimal floating constants. It would help usability if unsuffixed floating constants could be used with these types as well. The same usability issue exists for the fixed point types specified in TR 18037. This paper proposes a solution.

The issue can be illustrated by the following example (_Decimal64 is a decimal floating type proposed in N1077):

_Decimal64 rate = 0.1;

Here 0.1 has type double. In an implementation that uses a binary representation for the floating types, and where FLT_EVAL_METHOD is not -1, the internal representation of 0.1 cannot be exact, so the variable 'rate' gets a value slightly different from 0.1. This defeats the purpose of decimal floating types. On the other hand, requiring programmers to write:

_Decimal64 rate = 0.1dd;

is inconvenient.

## Translation time data type

The main idea is to introduce a translation time data type (TTDT) which the translator uses as the type for unsuffixed floating constants. A floating constant is kept in this type and representation until an operation requires it to be converted to an actual type. The value of the constant remains exact as long as possible during the translation process. The idea can be summarized as follows:

1/ The implementation is allowed to use a type different from double and long double as the type of unsuffixed floating constants. This is an implementation defined type. The intention is that this type can represent the floating constant exactly. (A possible choice is a decimal floating type.)

2/ The range and precision of this type are implementation defined and are fixed throughout the program.

3/ TTDT is an arithmetic type. All arithmetic operations are defined for this type.

4/ Usual arithmetic conversion is extended to handle mixed operations between TTDT and other types. Roughly speaking, if an operation involves both TTDT and an actual type, the TTDT is converted to an actual type before the operation. This way, there is no "top-down" context information required when processing unsuffixed floating constants. For example:

double f;
f = 0.1;

Suppose the implementation uses _Decimal128 (a decimal floating type defined in N1077) as the TTDT. 0.1 is represented exactly after the constant is scanned. It is then converted to double in the assignment operator.

f = 0.1 * 0.3;

Here, both 0.1 and 0.3 are represented in TTDT, and the result of the multiply operator also has type TTDT. It is then converted to double before the assignment. The result may differ from that of a current implementation, where the constants have type double and a binary representation: there, a conversion error is incurred in each constant and then propagated through the multiplication. The TTDT provides a more accurate result.

float g = 0.3f;
f = 0.1 * g;

When one operand is a TTDT and the other is one of float/double/long double, the TTDT is converted to double with an internal representation following the specification of FLT_EVAL_METHOD for constants of type double. The usual arithmetic conversions are then applied to the resulting operands.

_Decimal32 h = 0.1;

If one operand is a TTDT and the other a decimal floating type, the TTDT is converted to _Decimal64 with an internal representation specified by DEC_EVAL_METHOD (as specified in N1077). The usual arithmetic conversions are then applied.

If one operand is a TTDT and the other a fixed point type, the TTDT is converted to the fixed point type. If the implementation supports fixed point types, it should choose a representation for TTDT that can represent floating and fixed point constants exactly.

## Suggested changes to C99

Below are suggested changes to C99 to capture the above idea. Decimal floating types and fixed point types are not considered in these changes.

In 6.2.5 after paragraph 28, add a paragraph:

[28a] There is an implementation defined data type called the translation time data type, or TTDT. TTDT is an arithmetic type and is used as the type for unsuffixed floating constants. There is no type specifier for TTDT.

Replace 6.4.4.2 paragraph 4 with the following:

[4] An unsuffixed floating constant has type TTDT. If suffixed by the letter f or F, it has type float. If suffixed by the letter l or L, it has type long double.

Add the following paragraphs after 6.3.1.7:

6.3.1.7a Translation Time Data Type

When a TTDT is converted to double, it is converted to the internal representation specified by FLT_EVAL_METHOD.

Recommended practice

The conversion of TTDT to double should match the execution-time conversion of character strings by library functions, such as strtod, given matching inputs suitable for both conversions, the same format and default execution-time rounding.

6.3.1.7b

Before the usual arithmetic conversions are carried out, if one operand is TTDT and the other is not, the TTDT operand is converted to double.