Ticket #1789 (closed bug: fixed)

Opened 4 years ago

Last modified 4 years ago

String concatenation is really slow

Reported by: pmichaud Owned by:
Priority: normal Milestone:
Component: core Version: 2.7.0
Severity: medium Keywords:
Cc: Language:
Patch status: Platform:

Description

String concatenation can be really slow in Parrot. This was originally reported in TT #1749 as part of the 'readall' method on FileHandle PMCs; that ticket has been marked as "fixed" (by causing 'readall' to avoid string concatenations) but the underlying performance problem remains. This ticket focuses more directly on the string concatenation issue so it doesn't get lost.

The following Perl 5 program generates a string of 25,000 comma-separated random integers:

#! perl

my $s = int(rand(10000));
for ($n=1; $n<25000; $n++) {
    $s = $s . ', ' . int(rand(10000));
}
print length($s), "\n"

Hand translation to PIR results in something like:

pmichaud@orange:~/rakudo$ cat x1.pir
.loadlib 'math_ops'

.sub 'main' :main
    .local string s
    .local int n

    $I0 = rand 10000
    s = $I0
    n = 1
  loop:
    unless n < 25000 goto done
    $I0 = rand 10000
    $S0 = $I0
    s = concat s, ', '
    s = concat s, $S0
    inc n
    goto loop
  done:
    $I0 = length s
    say $I0
.end

Running the program under r48933 takes a little over 3 seconds:

pmichaud@orange:~/rakudo$ parrot_install/bin/parrot_config revision
48933
pmichaud@orange:~/rakudo$ time parrot_install/bin/parrot x1.pir
147305

real	0m3.208s
user	0m3.190s
sys	0m0.000s

Adding a single line to load Rakudo's perl6.pbc bytecode file (but not use Rakudo in any way) makes the program at least nine times slower:

pmichaud@orange:~/rakudo$ diff -u x1.pir x2.pir
--- x1.pir	2010-09-13 15:07:23.000000000 -0500
+++ x2.pir	2010-09-13 15:07:29.000000000 -0500
@@ -1,6 +1,8 @@
 .loadlib 'math_ops'
 
 .sub 'main' :main
+    load_bytecode 'perl6.pbc'
+
     .local string s
     .local int n
 
pmichaud@orange:~/rakudo$ time parrot_install/bin/parrot x2.pir
147242

real	0m30.381s
user	0m30.030s
sys	0m0.200s

Notes:

1. It only takes about a second for x2.pir to complete the C<load_bytecode 'perl6.pbc'> step -- the increase in execution time is almost entirely in the concatenation steps.

2. There's no "large Rakudo parse tree or ast" lying around that would be artificially increasing the work of the GC -- this is truly compiled code.

3. The code is running entirely in the 'parrot' HLL namespace, there are no Rakudo-specific PMCs, subroutines, dispatchers, etc. being used once the load_bytecode has completed.

(By way of comparison, the Perl 5 version of this program completes in approximately 0.7 sec on my system.)

Pm

Change History

  Changed 4 years ago by nwellnhof

The problem is that after the immutable string changes, concatenating many small strings has quadratic runtime. I think the in-place variant of the concat opcode could be optimized pretty easily for that case. We would have to add back the Parrot_str_append function which can reallocate the string buffer efficiently if it's unshared.

Optimizing the three argument variant of concat would require tracking additional data for each string buffer, AFAICS.

Unshared buffers would be another solution but they make substr slower.

follow-up: ↓ 3   Changed 4 years ago by Paul C. Anagnostopoulos

Can the assembler be made clever enough to notice multiple catenations in a row? If so, then you could have an n-ary catenate operator that allocates the final string of the overall size and then catenates all the operand strings into it.

in reply to: ↑ 2   Changed 4 years ago by plobsing

Replying to Paul C. Anagnostopoulos:

Can the assembler be made clever enough to notice multiple catenations in a row? If so, then you could have an n-ary catenate operator that allocates the final string of the overall size and then catenates all the operand strings into it.

Pattern matching on source code is the wrong approach because:

  • Matching is hard.
  • Intelligence/Complexity in the assembler is evil. It complicates the compiler and violates the POLS of the user. PIR is already a horrible language in this regard.
  • There will always be a case that isn't matched by the matching heuristic which has drastically different performance characteristics from seemingly equivalent code.

Secondly, n-ary concatenate as an op is a bad idea because:

  • Variable-length ops already require too much special casing in anything that has to walk bytecode. Please don't add any more of these.
  • Requires temporal coupling of all strings to be concatenated. This doesn't solve, for example, efficiently reading and buffering data from a socket as it arrives.

follow-up: ↓ 5   Changed 4 years ago by Paul C. Anagnostopoulos

I understand your points, but I don't see how something like:

    a := b & c & d;              // & is catenation.

is going to happen efficiently without some cleverness.

in reply to: ↑ 4   Changed 4 years ago by pmichaud

Replying to Paul C. Anagnostopoulos:

I understand your points, but I don't see how something like: {{{ a := b & c & d; // & is catenation. }}} is going to happen efficiently without some cleverness.

Other dynamic languages like Perl 5 and Python are able to perform string concatenation effectively without having to resort to any special cleverness. The inefficiency isn't in the api of the concatenation operation, it's in Parrot's implementation of it.

Another possibly interesting data point: running x1.pir with garbage collection disabled (-G) makes it run substantially slower as well.

pmichaud@orange:~/rakudo$ time parrot_install/bin/parrot x1.pir
147244

real	0m3.242s
user	0m3.200s
sys	0m0.030s
pmichaud@orange:~/rakudo$ time parrot_install/bin/parrot -G x1.pir
147266

real	0m14.895s
user	0m1.010s
sys	0m4.140s
pmichaud@orange:~/rakudo$ 

I'm guessing (and it's completely a guess on my part) that this is because with garbage collection disabled, the memory allocator ends up doing more work in trying to decide when/where to allocate memory. Still, it surprised me a bit.

Pm

  Changed 4 years ago by nwellnhof

I had a further look into this, and it's actually pretty easy to speed up the concatenation. Please give r49018 a try. Only appending short strings is optimized now but a similar optimization is possible for prepending short strings.

  Changed 4 years ago by NotFound

r49018 is just plain wrong. strings are immutable.

  Changed 4 years ago by NotFound

Discusion about r49018, related details, life, the universe and all in:  http://irclog.perlgeek.de/parrot/2010-09-15#i_2829617

  Changed 4 years ago by bacek

JFYI.

On gc_massacre branch with GC MS2 enabled:

bacek@illusion:~/src/rakudo$ time parrot x1.pir 
147205

real	0m1.584s
user	0m1.376s
sys	0m0.152s
bacek@illusion:~/src/rakudo$ time parrot x2.pir 
147239

real	0m1.930s
user	0m1.720s
sys	0m0.212s

Looks promising :)

-- Bacek.

  Changed 4 years ago by bacek

  • status changed from new to closed
  • resolution set to fixed

Results on current parrot's master:

~/src/rakudo (master)$ time parrot x.pir 
147179

real	0m0.028s
user	0m0.028s
sys	0m0.000s
~/src/rakudo (master)$ time parrot y.pir 
147225

real	0m0.342s
user	0m0.280s
sys	0m0.060s
~/src/rakudo (master)$ 

Closing ticket.

Note: See TracTickets for help on using tickets.