Ticket #1071 (new bug)

Opened 12 years ago

Last modified 11 years ago

Intermittent failure with t/pmc/os.t test 9

Reported by: mikehh
Owned by:
Priority: normal
Milestone:
Component: testing
Version: trunk
Severity: medium
Keywords:
Cc:
Language:
Patch status:
Platform:

Description

I am getting an intermittent failure with t/pmc/os.t - test 9 on Ubuntu 9.04 amd64.

It has failed twice for me recently - pretty much in the same way (see below) but in both cases when I re-ran the test it PASSed.

In fact, over the last couple of days I have run the test many times: it is run 7 times in each of my test runs (once in smoke and 6 times in fulltest, once for each core). As I have logged 6 test runs since the first failure (plus other runs not recorded), I have run the test more than 50 times with 2 failures.

At r41542 from http://smolder.plusthree.com/app/public_projects/report_details/28233:

1..16
ok 1 - Test cwd
ok 2 - Test chdir
ok 3 - Test mkdir
ok 4 - Test rm call in a directory
ok 5 - Test that rm removed the directory
ok 6 - Test OS.stat
ok 7 - Test OS.readdir
ok 8 - Test OS.rename
not ok 9 - Test OS.lstat

#   Failed test 'Test OS.lstat'
#   at t/pmc/os.t line 314.
#          got: '0x00000811
# 0x00a1b82f
# 0x000081a4
# 0x00000001
# 0x000003e8
# 0x000003e8
# 0x00000000
# 0x00000004
# 0x4ac081ae
# 0x4ac081ad
# 0x4ac081ad
# 0x00001000
# 0x00000008
# '
#     expected: '0x00000811
# 0x00a1b82f
# 0x000081a4
# 0x00000001
# 0x000003e8
# 0x000003e8
# 0x00000000
# 0x00000004
# 0x4ac081ad
# 0x4ac081ad
# 0x4ac081ad
# 0x00001000
# 0x00000008
# '
ok 10 - Test rm call in a file
ok 11 - Test that rm removed file
ok 12 - Test symlink
ok 13 - symlink was really created
ok 14 - Test link
ok 15 - hard link to file was really created
ok 16 - Test dirlink
# Looks like you failed 1 test of 16.

At r41534 make fulltest - make testr:

#   Failed test 'Test OS.lstat'
#   at t/pmc/os.t line 314.
#          got: '0x00000811
# 0x009e0109
# 0x000081a4
# 0x00000001
# 0x000003e8
# 0x000003e8
# 0x00000000
# 0x00000004
# 0x4abfaa1d
# 0x4abfaa1c
# 0x4abfaa1c
# 0x00001000
# 0x00000008
# '
#     expected: '0x00000811
# 0x009e0109
# 0x000081a4
# 0x00000001
# 0x000003e8
# 0x000003e8
# 0x00000000
# 0x00000004
# 0x4abfaa1c
# 0x4abfaa1c
# 0x4abfaa1c
# 0x00001000
# 0x00000008
# '

Note that the failure (0x4ac081ae expected 0x4abfaa1d) and (0x4ac081ad expected 0x4abfaa1c) in the 9th line of output seems to be the same failure.

I am investigating this further.

Change History

Changed 12 years ago by mikehh

messed the note up - should be:

Note that the failures (0x4ac081ae, expected 0x4ac081ad) and (0x4abfaa1d, expected 0x4abfaa1c) in the 9th line of output seem to be the same failure.

(Copy-paste doesn't seem to work too well in Trac)

Changed 12 years ago by doughera

On Mon, 28 Sep 2009, Parrot wrote:

>  It has failed twice for me recently - pretty much in the same way (see
>  below) but in both cases when I re-ran the test it PASSed.
 
>  In fact in the last couple of days I have run the test many times
>  (considering that it is run 7 times for each of my test runs, once in
>  smoke and 6 times (once for each core) in fulltest.  As I have logged 6
>  test runs since the first failure (+ other runs not recorded) I have run
>  the test more than 50 times with 2 failures.

>  Note that the failure (0x4ac081ae expected 0x4abfaa1d) and (0x4ac081ad
>  expected 0x4abfaa1c) in the 9th line of output seems to be the same
>  failure.

The 9th element is the 'atime last access time in seconds since the 
epoch'.  This simply means that between the time when perl did the stat on 
the file (to get the 'Expected' values) and when parrot did a stat on the 
file, the internal clock ticked over one second.

This sort of error is to be expected in any test that involves timing, and 
the parrot test suite ought to be a bit more forgiving here, but I don't 
think there's anything deeper going on.

-- 
    Andy Dougherty		doughera@lafayette.edu
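
The diagnosis above can be checked directly against the r41542 report. The following is a minimal Python sketch, not part of the Parrot test suite; the helper names (parse_fields, differing_fields, as_utc) are made up for illustration. It parses the two 13-field dumps, finds which position differs, and decodes that field as seconds since the epoch.

```python
from datetime import datetime, timezone

# The 13 lstat fields from the r41542 report, in output order.
GOT = """
0x00000811 0x00a1b82f 0x000081a4 0x00000001 0x000003e8 0x000003e8 0x00000000
0x00000004 0x4ac081ae 0x4ac081ad 0x4ac081ad 0x00001000 0x00000008
"""
EXPECTED = """
0x00000811 0x00a1b82f 0x000081a4 0x00000001 0x000003e8 0x000003e8 0x00000000
0x00000004 0x4ac081ad 0x4ac081ad 0x4ac081ad 0x00001000 0x00000008
"""


def parse_fields(text):
    """Turn a whitespace-separated dump of hex values into a list of ints."""
    return [int(token, 16) for token in text.split()]


def differing_fields(got, expected):
    """Return (1-based position, got, expected) for every field that differs."""
    return [
        (pos, g, e)
        for pos, (g, e) in enumerate(zip(got, expected), start=1)
        if g != e
    ]


def as_utc(seconds):
    """Interpret an integer as seconds since the epoch, in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


diffs = differing_fields(parse_fields(GOT), parse_fields(EXPECTED))
for pos, g, e in diffs:
    print(f"field {pos}: got {as_utc(g).isoformat()}, "
          f"expected {as_utc(e).isoformat()}, delta {g - e:+d} s")
```

Only the ninth field is reported, with a delta of one second, which is consistent with the atime explanation.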

Changed 12 years ago by chromatic

On Monday 28 September 2009 03:35:01 Parrot wrote:

> I am getting an intermittent failure with t/pmc/os.t - test 9 on Ubuntu
> 9.04 amd64.

>  Note that the failure (0x4ac081ae expected 0x4abfaa1d) and (0x4ac081ad
>  expected 0x4abfaa1c) in the 9th line of output seems to be the same
>  failure.

Given that the ninth element of the list returned from the lstat() method is a
file's atime, there's a race condition here that a parallel test or a test on a
heavily loaded system might encounter.

Perhaps we should check a delta against the time-based fields (nine through 
eleven).

-- c
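
One way to read the delta suggestion, sketched in Python rather than in the Perl/PIR of the real test: compare the non-time fields exactly and allow the three time fields to drift by a small tolerance. The function name stat_matches and the one-second default tolerance are assumptions for illustration; the field order is taken from the 13-field dumps in the description.

```python
# Zero-based indexes of fields nine through eleven (atime, mtime, ctime).
TIME_FIELDS = (8, 9, 10)


def stat_matches(got, expected, tolerance=1):
    """Compare two stat-style field lists.

    Non-time fields must be equal. Time fields may differ by at most
    `tolerance` seconds in either direction (an assumed default of 1).
    """
    if len(got) != len(expected):
        return False
    for index, (g, e) in enumerate(zip(got, expected)):
        if index in TIME_FIELDS:
            if abs(g - e) > tolerance:
                return False
        elif g != e:
            return False
    return True


# Values from the r41534 failure: only the atime differs, by one second.
expected = [0x811, 0x9e0109, 0x81a4, 0x1, 0x3e8, 0x3e8, 0x0, 0x4,
            0x4abfaa1c, 0x4abfaa1c, 0x4abfaa1c, 0x1000, 0x8]
got = list(expected)
got[8] = 0x4abfaa1d

print(stat_matches(got, expected))               # within the tolerance
print(stat_matches(got, expected, tolerance=0))  # strict comparison
```

With the default tolerance the r41534 output is accepted; with a tolerance of zero it fails in the same way the current test does.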

Changed 12 years ago by dukeleto

I think if we just test that the ninth through eleventh fields are >= their original values, that would be sufficient.
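
A minimal Python sketch of that one-sided check, for comparison; the function name time_fields_not_earlier is hypothetical, and the real test is Perl/PIR. It accepts any time field that is equal to or later than the reference value, so it tolerates the one-second tick but also much larger drifts.

```python
def time_fields_not_earlier(got, expected, time_fields=(8, 9, 10)):
    """True if every time field in `got` is >= the matching field in `expected`.

    `time_fields` holds zero-based indexes of fields nine through eleven.
    """
    return all(got[i] >= expected[i] for i in time_fields)


# Reference values from the r41542 report.
expected = [0x811, 0xa1b82f, 0x81a4, 0x1, 0x3e8, 0x3e8, 0x0, 0x4,
            0x4ac081ad, 0x4ac081ad, 0x4ac081ad, 0x1000, 0x8]

one_second_later = list(expected)
one_second_later[8] += 1

one_hour_later = list(expected)
one_hour_later[8] += 3600

one_second_earlier = list(expected)
one_second_earlier[8] -= 1

print(time_fields_not_earlier(one_second_later, expected))
print(time_fields_not_earlier(one_hour_later, expected))
print(time_fields_not_earlier(one_second_earlier, expected))
```

The second case shows the trade-off: a >= check is simpler than a bounded delta, but it would not notice a timestamp that is wrong by a large positive amount.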

Changed 12 years ago by mikehh

The only times I have had a failure with this (only twice in a large number of tests), it has been the atime (access time) component that differed, by 1 second.

What the test does: it runs a Perl "stat" and places the result in the <<CODE portion of the pir_output_is call, then runs the PIR code, which calls the os.pmc "stat" function, and compares the output. Under most circumstances the two are equal, but in rare cases the access time seen by the two calls can differ, since one call can fall just before and the other just after the second ticks over. The access time is the only aspect of the test that could differ.

Fixing this would entail a complete rewrite of the test to use the access functions for individual elements. That would, of course, give better coverage, so maybe it is worthwhile.

I will make an attempt at this "real soon now", when I have some time available for it - but it is not a high priority. Unless someone beats me to it :-}

Changed 12 years ago by doughera

  • component changed from none to testing

If anyone does modify this test, it would be useful to fix TT #457 at the same time.

Changed 11 years ago by mikehh

the test was moved from t/pmc/os.t to t/dynpmc/os.t at r46259
