Ticket #1749 (closed bug: fixed)

Opened 11 years ago

Last modified 11 years ago

readall method on open FileHandle is inefficient

Reported by: pmichaud Owned by:
Priority: normal Milestone:
Component: core Version: 2.6.0
Severity: medium Keywords:
Cc: Language:
Patch status: Platform:

Description

Using readall on an open filehandle appears to be extremely inefficient. Some simple tests reading a 10,000 line file can end up taking several seconds to complete. Notably, reading the filehandle line-at-a-time via 'read' is far more efficient than 'readall'.

Looking at the filehandle.pmc code, it appears that 'readall' turns on line buffering on the input filehandle, and then does successive string concatenations to produce the resulting string. That involves a lot of string allocations, iiuc, which is probably part of the slowness.

I'm not sure why it chooses to use line buffering -- it seems like some other strategy would be far more efficient.

Pm

Change History

  Changed 11 years ago by chromatic

On Friday 20 August 2010 at 18:15, Parrot  wrote:

>  I'm not sure why it chooses to use line buffering -- it seems like
>  some other strategy would be far more efficient.

Agreed, but in browsing the IO code I haven't seen an easy way to specify 
"Don't read line by line" or an IO function to perform the STRING 
concatenation at the C level, which won't allocate lots of temporary immutable 
STRINGs.

Allison?

-- c

  Changed 11 years ago by bacek

On Sun, Aug 22, 2010 at 1:35 AM, chromatic <chromatic@wgz.org> wrote:
> On Friday 20 August 2010 at 18:15, Parrot  wrote:
>
>>  I'm not sure why it chooses to use line buffering -- it seems like
>>  some other strategy would be far more efficient.
>
> Agreed, but in browsing the IO code I haven't seen an easy way to specify
> "Don't read line by line" or an IO function to perform the STRING
> concatenation at the C level, which won't allocate lots of temporary immutable
> STRINGs.

What about switching to use StringBuilder internally?

-- 
Bacek

  Changed 11 years ago by pmichaud

On Sun, Aug 22, 2010 at 08:52:21AM +0800, Vasily Chekalkin wrote:
> On Sun, Aug 22, 2010 at 1:35 AM, chromatic <chromatic@wgz.org> wrote:
> > On Friday 20 August 2010 at 18:15, Parrot  wrote:
> >
> >>  I'm not sure why it chooses to use line buffering -- it seems like
> >>  some other strategy would be far more efficient.
> >
> > Agreed, but in browsing the IO code I haven't seen an easy way to specify
> > "Don't read line by line" or an IO function to perform the STRING
> > concatenation at the C level, which won't allocate lots of temporary immutable
> > STRINGs.
> 
> What about switching to use StringBuilder internally?

Wouldn't StringBuilder still require lots of intermediate strings to
be used as input to the StringBuilder?

Pm

  Changed 11 years ago by chromatic

On Saturday 21 August 2010 at 18:42, Patrick R wrote:

> > What about switching to use StringBuilder internally?
 
> Wouldn't StringBuilder still require lots of intermediate strings to
> be used as input to the StringBuilder?

Yes, and due to an implementation quirk of StringBuilder the internal encoding 
of the resulting STRING may change from the expected form to the most 
compatible form.  That *shouldn't* be an issue, but it caused a test failure 
when I tried it initially.

Reading the whole file in one go seems the cheapest in terms of C function 
calls and memory allocations.

-- c

follow-up: ↓ 6   Changed 11 years ago by bacek

  • status changed from new to closed
  • resolution set to fixed

Fixed in r48710.

in reply to: ↑ 5   Changed 11 years ago by jimmy

  • status changed from closed to reopened
  • resolution fixed deleted

Replying to bacek:

Fixed in r48710.

reverted in r48715.

  Changed 11 years ago by nwellnhof

  • status changed from reopened to closed
  • resolution set to fixed

The actual problem is that concatenating strings using str_concat has quadratic running time with immutable strings. Fixed in r48824 by using the StringBuilder PMC.

Note: See TracTickets for help on using tickets.