Ticket #1978 (reopened bug)

Opened 11 years ago

Last modified 11 years ago

possible problem with g++ --optimize (t/pmc/float.t failure)

Reported by: mikehh Owned by:
Priority: normal Milestone:
Component: build Version: 3.0.0
Severity: medium Keywords:
Cc: Language:
Patch status: Platform: linux

Description

In testing t/pmc/float.t it failed a test when built with g++-4.5 with --optimize.

Failed test: 102 - comparison ops: cmp_p_p: equality (passes g++ without --optimize and gcc - Ubuntu 10.10 i386)

kid51 reported that the test passed for him using g++ v4.3.2

I did some further tests with Ubuntu 10.04 i386 and Ubuntu 10.10 i386 and it failed (g++ --optimize only) with g++ v4.4.3 and v4.5.1

NotFound and cotto both managed to reproduce the error using g++ v4.4.5

NotFound proposed a patch, which fixed the problem and cotto committed

diff --git a/src/pmc/float.pmc b/src/pmc/float.pmc
index e31cf55..dc8e06e 100644
--- a/src/pmc/float.pmc
+++ b/src/pmc/float.pmc
@@ -285,8 +285,9 @@ Returns the result of comparing the number with C<*value>.
     }
 
     MULTI INTVAL cmp_num(DEFAULT value) {
-        const FLOATVAL diff =
-                SELF.get_number() - VTABLE_get_number(INTERP, value);
+        volatile FLOATVAL n1 = SELF.get_number();
+        volatile FLOATVAL n2 = VTABLE_get_number(INTERP, value);
+        const FLOATVAL diff = n1 - n2;
         return diff > 0 ? 1 : diff < 0 ? -1 : 0;
     }

this fixed the failure.

The main reason for creating the ticket (apart from cotto asking me to) is that we had a failure with g++ --optimize 32 bit (later versions) that did not occur without --optimize or with gcc. The test did not fail on 64 bit (perhaps because of different ways of handling floating point).

Change History

  Changed 11 years ago by cotto

  • status changed from new to closed
  • resolution set to fixed

This is fixed, but I've added a reference to the fix in 248434ef in case anyone wants to diagnose this further. NotFound came up with the fix, but said that he doesn't entirely understand why it works.

  Changed 11 years ago by doughera

  • status changed from closed to reopened
  • resolution fixed deleted

I think the test is wrong. It does, in essence, the following:

    $P1 = new ['Float']
    $P1 = 123.45
    $P2 = new ['String']
    $P2 = "123.45"
    $I0 = cmp_num $P1, $P2
    is($I0, 0, 'comparison ops: cmp_p_p: equality')

I gather that the intent is to check whether the conversion of the string "123.45" to a Float is the same, whether that string comes in as literal text in the PIR file, or as a "String" that gets converted to a number.

Comparing those two code paths is sensible, but the problem is the choice of number. Since 123.45 can't be represented exactly in binary, the precise representation depends on how many bits are used. On the x86 family of processors using the x87 math coprocessor, that might be 80 bits. Using 'volatile' forces the compiler to reload the contents from memory. Without the volatile keyword, the compiler is free to use a copy that might be in a register. In the general case, we don't want to force the compiler to always refetch floating point numbers from memory. The compiler ought to be able to use registers.

I think this fix should be reverted, and the test should be changed to pick a number that can be represented exactly in binary.

  Changed 11 years ago by whiteknight

Better yet, Parrot should have a better and more robust way to compare floats using a delta or some other algorithm. Directly comparing floating-point representations in memory is fraught with danger no matter what values we pick.

I suggest we either modify cmp_num to take a third argument for a delta value, or we simply build in a much more robust algorithm into cmp_num.

  Changed 11 years ago by NotFound

I don't think that picking a number that pass the tests solves anything. If the exact equality isn't supported we should use an epsilon instead.

And if we don't support exact equality we'll get that comparing $P1 and $P2 gives a result different than converting them to number registers and comparing. This may be confusing.

  Changed 11 years ago by NotFound

Exact equality is needed sometimes. If cmp_num says they are equal but the difference is zero, or the contrary, will not be good. If someone wants comparison within a margin he should ask for that.

  Changed 11 years ago by doughera

Agreed, floating point is complicated. How "close" is "close enough" is strongly dependent on context. There is no single universal answer. It all depends on context. The context here is test 102 of t/pmc/float.t, not a generic discussion of floating point equality.

In this particular case, we have to ask what, precisely, we wish to test. I indicated above what I thought this test was testing. Assuming that is correct, then for this specific test, I think picking a number that can be represented exactly is sensible.

Of course, perhaps that test was intended to test something else. In that case, depending on what that "something else" is, it might be that the test is simply wrong and should be deleted. Without knowing what that "something else" is, I have no opinion.

In either case, I think forcing the compiler to fetch a fresh value from memory is suboptimal and should be reverted.

  Changed 11 years ago by NotFound

Note that the cmp_num vtable is used in a lot of places, so the impact of changing its meaning is unpredictable. I don't think we should risk that because of an optimization that no one is going to be able to measure.

About what the test does: I think that ff we want to improve it we should check for a variety of values the result of cpm_num with the PMCs is the same as the comparison of the number values obtained from them. Given that the implementation of the vtable cmp_num in several PMCs uses the vtable get_number, looks like the appropriate behavior to me.

follow-up: ↓ 10   Changed 11 years ago by mikehh

I agree with most of the comments related to floating point testing for equality - it is a very hazardous activity.

However, one of the main points of this ticket, was that we had a situation where the optimizing g++ compiler on a 32 bit platform, generated a different result, from the non-optimizing g++ compiler and from both the optimizing and non-optimizing gcc compiler and that an earlier version of that compiler did not, and furthermore, the problem did not occur on a 64 bit platform.

Is this a compiler bug, a parrot bug in which the later g++ compilers have more efficient optimizers, and where can this effect us in future.

Cheers, Michael (mikehh)

  Changed 11 years ago by bacek

Hello.

Just my $0.02. We should at least start with something like this - change 0.0 equality test to use epsilon value.

diff --git a/include/parrot/misc.h b/include/parrot/misc.h
index e81ff80..488739b 100644
--- a/include/parrot/misc.h
+++ b/include/parrot/misc.h
@@ -20,7 +20,8 @@
 
 #include "parrot/parrot.h"
 
-#define FLOAT_IS_ZERO(f) ((f) == 0.0)
+#include <float.h>
+#define FLOAT_IS_ZERO(f) (((f) <= DBL_EPSILON) && (f) >= -DBL_EPSILON )
 
 #ifndef PARROT_HAS_C99_SNPRINTF
 #  define snprintf Parrot_secret_snprintf
diff --git a/src/pmc/float.pmc b/src/pmc/float.pmc
index 55708a3..f265dd8 100644
--- a/src/pmc/float.pmc
+++ b/src/pmc/float.pmc
@@ -260,13 +260,13 @@ The C<cmp> operation.
 
     MULTI INTVAL cmp(Float value) {
         const FLOATVAL diff = SELF.get_number() - VTABLE_get_number(INTERP, value);
-        return diff > 0 ? 1 : diff < 0 ? -1 : 0;
+        return FLOAT_IS_ZERO(diff) ? 0 : diff < 0 ? -1 : 1;
     }
 
     MULTI INTVAL cmp(DEFAULT value) {
         const FLOATVAL diff =
                 SELF.get_number() - VTABLE_get_number(INTERP, value);
-        return diff > 0 ? 1 : diff < 0 ? -1 : 0;
+        return FLOAT_IS_ZERO(diff) ? 0 : diff < 0 ? -1 : 1;
     }
 
 /*
@@ -281,7 +281,7 @@ Returns the result of comparing the number with C<*value>.
 
     MULTI INTVAL cmp_num(Float value) {
         const FLOATVAL diff = SELF.get_number() - VTABLE_get_number(INTERP, value);
-        return diff > 0 ? 1 : diff < 0 ? -1 : 0;
+        return FLOAT_IS_ZERO(diff) ? 0 : diff < 0 ? -1 : 1;
     }
 
     MULTI INTVAL cmp_num(DEFAULT value) {
@@ -291,7 +291,7 @@ Returns the result of comparing the number with C<*value>.
         volatile FLOATVAL n1 = SELF.get_number();
         volatile FLOATVAL n2 = VTABLE_get_number(INTERP, value);
         const FLOATVAL diff = n1 - n2;
-        return diff > 0 ? 1 : diff < 0 ? -1 : 0;
+        return FLOAT_IS_ZERO(diff) ? 0 : diff < 0 ? -1 : 1;
     }
 
 /*

-- Bacek

in reply to: ↑ 8   Changed 11 years ago by doughera

Replying to mikehh:

I agree with most of the comments related to floating point testing for equality - it is a very hazardous activity.

Similarly, defining what parrot will and will not guarantee is also a very tricky activity.

However, one of the main points of this ticket, was that we had a situation where the optimizing g++ compiler on a 32 bit platform, generated a different result, from the non-optimizing g++ compiler and from both the optimizing and non-optimizing gcc compiler and that an earlier version of that compiler did not, and furthermore, the problem did not occur on a 64 bit platform. Is this a compiler bug, a parrot bug in which the later g++ compilers have more efficient optimizers, and where can this effect us in future.

I don't think it's a compiler bug. It's easy to imagine different versions of the compiler deciding to keep different items in registers. On x86 systems using the x87 coprocessor with the extended 80-bit precision, this can lead to the sort of problem you encountered. This does not affect x86_64 systems (nor SPARC nor PPC), since they don't use 80-bit extended precision registers for floating point.

I think it's an area where parrot's desired behavior hasn't been explicitly defined, so it's unclear whether the test, the pmc code, or both are in error.

Either choice can lead to surprises. For example, allowing the 80-bit registers led to the problem here. Not allowing 80-bit registers at all (gcc offers several ways to do this) might cause problems with external libraries that expect to use the 80-bit registers. (As just one example, a parrot-based program and an "equivalent" C program might no longer get the same answer. Perl 5 certainly has received a number of bug reports of just this sort.)

As a virtual machine that hopes to interact both with higher level languages and with external libraries, both choices are reasonable, but neither is ideal.

My personal bias is that the original test was in error -- parrot shouldn't make such promises of equality -- but it's also reasonable to take the opposite point of view.

  Changed 11 years ago by nwellnhof

Yes, this is a well-known issue with excess precision on x87, see  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 for example.

Using SSE instead of x87 with "-msse2 -mfpmath=sse" should also do the trick.

  Changed 11 years ago by jkeenan

Is the issue reported in TT #1930 (a test failure on Darwin/PPC in t/op/number.t that appears only with --optimize) fundamentally the same as this one?

  Changed 11 years ago by jkeenan

  • component changed from none to build

  Changed 11 years ago by nwellnhof

FWIW, I added the -fexcess-precision=standard option supported by GCC >= 4.5 in d0fd13f.

  Changed 11 years ago by dukeleto

What needs to happen to close this ticket? Do we have a consensus?

Note: See TracTickets for help on using tickets.