22 April 2011

underscore, "while", and angle brackets in Perl

This post discusses some pitfalls of the Perl construct "while (<>)". We'll refer to it as WAB (While Angle Bracket).

WAB sets $_ but does not localize *_ (the underscore glob). This can cause undesired interactions with other constructs that set $_. These constructs include "for ()", "foreach ()", "map", and "grep".

In general, if a WAB is dynamically enclosed by one of these other constructs, it will try to stomp on the enclosing $_. If $_ is not a constant, it will succeed in stomping on it. If $_ is a constant, recent Perls will die with "Modification of a read-only value attempted".

The program below, wab.pl, shows this. Its WAB stomps on the $_ set by the enclosing "for ()". What's more, since $_ is just an alias to the members of the list given to "for ()", the WAB stomps on the list, too!
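The listing of wab.pl is not reproduced in this copy of the post. What follows is a minimal reconstruction, not the original. Only lines 8, 12, and 15 are taken from the patches shown in this post. Everything else (the pragmas, the initial value of $a, the call to "f") is an assumption, chosen so that the line numbers and the output quoted here match.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

sub f { while ( <> ) {} }

my $a = 1;    # assumed initial value

for ( $a )
{
    f();      # assumed call; the WAB in f() assigns to the aliased $_
    print Dumper( $_, $a );
}
```

When standard input is empty, the first read hits end of file at once, so the WAB assigns undef to $_ and, through the alias, to $a.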
The command "true | ./wab.pl" gives the following output.

    $VAR1 = undef;
    $VAR2 = undef;

The effect is more dramatic if the list given to "for ()" contains constants. If we modify wab.pl with the following patch and run it under a recent Perl, it dies with "Modification of a read-only value attempted".

    12c12
    < for ( $a )
    ---
    > for ( 1, $a )
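
The read-only failure can also be observed without killing the program. The sketch below is an illustration under assumptions, not part of wab.pl: "f" is a hypothetical stand-in that runs a WAB over an in-memory filehandle (so no real input is needed), and "eval" traps the fatal error.

```perl
use strict;
use warnings;

my $input = "line 1\nline 2\n";    # assumed sample input

# Hypothetical stand-in for a routine that uses WAB.
sub f {
    open( my $fh, '<', \$input ) or die "open failed: $!";
    while ( <$fh> ) {}    # assigns to the global $_ on every read
    close $fh;
}

my $a = 1;

# "for" aliases $_ to the literal 1, which is read-only, so the first
# assignment made by the WAB inside f() is fatal. eval traps the error.
my $survived = eval {
    for ( 1, $a ) { f() }
    1;
};
my $error = $survived ? '' : $@;

print $error ? "died: $error" : "survived\n";
```

The loop dies on its first iteration, so the second element, $a, is never reached and keeps its value.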
There are various ways to avoid WAB's behavior. One way is to explicitly localize *_. For example, we could modify wab.pl with the following patch.

    8c8
    < sub f { while ( <> ) {} }
    ---
    > sub f { local *_; while ( <> ) {} }

The output would then be as follows.

    $VAR1 = 1;
    $VAR2 = 1;
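
A side-by-side comparison makes the difference easier to see. This is a sketch under assumptions: "f_unsafe" and "f_safe" are hypothetical names, and both read from an in-memory filehandle rather than from "<>" so that the example runs without input.

```perl
use strict;
use warnings;

my $input = "line 1\nline 2\n";    # assumed sample input

# Hypothetical routine with an unprotected WAB.
sub f_unsafe {
    open( my $fh, '<', \$input ) or die "open failed: $!";
    while ( <$fh> ) {}
    close $fh;
}

# Same routine, but the underscore glob is localized first, so the
# WAB writes to a fresh $_ that is discarded when the sub returns.
sub f_safe {
    local *_;
    open( my $fh, '<', \$input ) or die "open failed: $!";
    while ( <$fh> ) {}
    close $fh;
}

my $x = 1;
my $y = 1;

for ( $x ) { f_unsafe() }    # $_ is an alias to $x, so $x is stomped
for ( $y ) { f_safe() }      # $y is left alone

print 'x = ', ( defined $x ? $x : 'undef' ), "\n";
print 'y = ', ( defined $y ? $y : 'undef' ), "\n";
```

Note that "local *_" also localizes @_ and the other variables that share the glob, so a routine that needs its arguments should copy them out before localizing.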
We can also just stop using WAB. For example, we could modify wab.pl with the following patch.

    8c8
    < sub f { while ( <> ) {} }
    ---
    > sub f { while ( my $f = <> ) {} }
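
A self-contained sketch of the same idea follows. It is an assumption-laden illustration: "f" is a hypothetical routine that counts lines from an in-memory filehandle, using a "my" variable as the read target so that $_ is never touched.

```perl
use strict;
use warnings;

my $input = "line 1\nline 2\n";    # assumed sample input

# Hypothetical routine that reads into a lexical instead of $_.
sub f {
    open( my $fh, '<', \$input ) or die "open failed: $!";
    my $count = 0;
    # Perl still wraps this assignment in an implicit defined() test,
    # so a line such as "0" does not end the loop early.
    while ( my $line = <$fh> ) { $count++ }
    close $fh;
    return $count;
}

my $x = 1;
my $lines;

for ( $x ) { $lines = f() }    # $_ aliases $x, but f() never assigns to $_

print "lines = $lines, x = $x\n";
```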
That concludes the main body of this post. Some additional notes appear below, for the more curious.

Some additional notes

Though WAB's behavior is often undesirable, it is far from undocumented. See, for example, the "I/O Operators" section of perlop in the official Perl documentation.

Constructs other than WAB that set $_ work fine together because they localize *_. These constructs include "for ()", "foreach ()", "map", and "grep".

It is not sufficient to localize $_, i.e. to do "local $_". We need to localize the entire glob for underscore, i.e. we need to do "local *_". This is needed in case $_ is currently aliased to a magic constant like $1. In such a case, doing "local $_" gets fresh storage for $_ but still leaves it as a constant, i.e. read-only. (Perl 5.14 changed "local $_" to strip all magic from $_, so this caveat mainly concerns older Perls.)

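In Perl 5.10 and later, you can use "my $_" to achieve an effect similar to "local *_". The effect is only similar, not identical, because this makes the scope of $_ lexical rather than dynamic. Be aware that lexical $_ was later marked experimental (Perl 5.18) and then removed (Perl 5.24), so this option exists only in that range of versions.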

We could continue to use WAB but still avoid undesired interactions if we stopped using the other constructs that set $_, or started using them in a WAB-defensive way. This feels a little like "blaming the victim," but who said programming was fair?

In Perls that support lexical $_ (5.10 up to, but not including, 5.24), a WAB-defensive way to use any of these constructs is to precede them with "my $_". The dynamically enclosed code may have to be changed because this makes the scope of $_ lexical rather than dynamic.

If "my $_" is unavailable or undesirable, we can use alternate forms of "for" and "foreach" that do not set $_. For example, we could modify wab.pl with the following patch.

    12c12
    < for ( $a )
    ---
    > for my $i ( $a )
    15c15
    <     print Dumper( $_, $a );
    ---
    >     print Dumper( $i, $a );
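
The sketch below shows the same change in a self-contained form. It rests on assumptions: "f" is a hypothetical routine with an unprotected WAB over an in-memory filehandle, and $seen is a helper variable introduced only to record what the loop body observed.

```perl
use strict;
use warnings;

my $input = "line 1\nline 2\n";    # assumed sample input

# Hypothetical routine with an unprotected WAB.
sub f {
    open( my $fh, '<', \$input ) or die "open failed: $!";
    while ( <$fh> ) {}    # still assigns to the global $_
    close $fh;
}

my $x = 1;
my $seen;

# The loop variable is a lexical, so the global $_ is never aliased
# to $x and the WAB in f() has nothing of ours to stomp on.
for my $i ( $x ) {
    f();
    $seen = $i;
}

print "i = $seen, x = $x\n";
```

The WAB in "f" still overwrites the global $_; the loop is simply no longer exposed to it.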
Unlike "for ()" and "foreach ()", "map" and "grep" don't have alternate forms that would allow us to avoid setting $_. But, if "my $_" is unavailable or undesirable, we can use "map" and "grep" in the following WAB-defensive way.

  1. Capture $_ in a "my" variable before any WAB has a chance to change it, and then use that "my" variable instead of $_.
  2. Copy the list to be operated on to a temporary, so that WAB's attempts to stomp (a) succeed, even if the original list contains constants, and (b) are invisible to the rest of the program.

E.g. imagine "f" might use WAB in the code below.

    map { f(); g( $_ ) } @a;

We might WAB-defend this code as follows.

    { map { my $x = $_; f(); g( $x ) } ( my @tmp = @a ) }

This is cumbersome, but might be the best choice if it is costly to change "f".
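
To make the two steps concrete, here is a sketch that runs the plain and the defended "map" side by side. The bodies of "f" and "g" are assumptions made up for the illustration: "f" runs a WAB over an in-memory filehandle, and "g" multiplies its argument by ten (or reports "undef").

```perl
use strict;
use warnings;

my $input = "line 1\n";    # assumed sample input

# Hypothetical routine with an unprotected WAB.
sub f {
    open( my $fh, '<', \$input ) or die "open failed: $!";
    while ( <$fh> ) {}
    close $fh;
}

# Hypothetical consumer of each list element.
sub g {
    my ( $v ) = @_;
    return defined $v ? $v * 10 : 'undef';
}

# Plain form: $_ aliases the elements of @a, so f() stomps on them
# before g() ever sees them.
my @a = ( 1, 2, 3 );
my @plain = map { f(); g( $_ ) } @a;

# Defended form: $_ aliases the elements of the temporary copy, and
# $x holds the value captured before f() runs.
my @b = ( 1, 2, 3 );
my @defended = map { my $x = $_; f(); g( $x ) } ( my @tmp = @b );

print "plain:    @plain\n";
print "defended: @defended\n";
print "b:        @b\n";
```

In the defended form the stomping still happens, but it lands on @tmp, which nothing else reads.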

I conclude this post by listing some relevant links below.
