28 April 2011

undefined subroutines in Perl

Perl does not complain at compile time about calls to undeclared subroutines made with parentheses. E.g. if f is undeclared, Perl does not complain about "f(2)", though it does complain about "f 2" (a syntax error). Neither "use strict" nor "use warnings" changes the situation.
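A minimal sketch of the difference follows. Each call is compiled inside a string eval, so the script can report the result instead of refusing to start. String evals inherit "use strict" and "use warnings" from the surrounding scope. The sub name f is the one used above and is deliberately never declared.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# f is never declared anywhere in this program.
# String evals inherit "use strict" and "use warnings" from this scope.

# Without parentheses, f is a bareword and the call is a syntax error.
my $no_parens       = eval q{ sub { f 2 } };
my $no_parens_error = $@;

# With parentheses, the same call compiles without complaint.
my $parens       = eval q{ sub { f(2) } };
my $parens_error = $@;

print 'f 2  compiles? ', ( defined $no_parens ? 'yes' : 'no' ), "\n";   # no
print 'f(2) compiles? ', ( defined $parens    ? 'yes' : 'no' ), "\n";   # yes
```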

This is not always a bad thing, since it enables techniques in which f is undeclared at compile time but will be defined at run time. This is possible since "declared-ness" is a static / lexical / syntactic property, whereas defined-ness is a dynamic / semantic property.
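One such technique is to give the subroutine its body at run time by assigning an anonymous sub to its typeglob. The sketch below is minimal and illustrative, and the names call_f and f are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# call_f refers to f, which no declaration or definition precedes.
sub call_f { return f(21) }

# At run time, before call_f is ever called, give f a body by assigning
# a code reference to the *f typeglob (typeglobs are allowed under strict).
{
    no warnings 'once';
    *f = sub { return 2 * $_[0] };
}

print call_f(), "\n";    # prints 42
```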

But this is a bad thing if such "undeclared but defined" techniques are not being used. In those cases, a misspelled or missing subroutine is reported as an "Undefined subroutine" error only at run time, when the call is actually reached. Such bugs can slip by tests when coverage is incomplete, e.g. in code to handle an error that is hard to provoke.
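The sketch below shows how such a bug can hide in an error path. It assumes a hypothetical parse_port function whose error branch calls a misspelled helper. The happy path works, and the typo surfaces only when bad input finally arrives.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, only meant to be reached when the input is bad.
sub report_error { return "error: $_[0]" }

sub parse_port {
    my ($text) = @_;
    return $text if $text =~ /\A\d+\z/;

    # Typo: there is no report_eror, yet the program compiles cleanly.
    return report_eror("bad port '$text'");
}

# The path a quick test is likely to cover works fine...
print parse_port('8080'), "\n";

# ...while the rarely exercised error path dies at run time.
my $ok = eval { parse_port('http'); 1 };
print "error path died: $@" unless $ok;   # Undefined subroutine &main::report_eror called ...
```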

Three ways to avoid this problem are listed below.
  • avoid parentheses when calling subroutines
  • call only references to subroutines
  • use B::Lint
The most direct solution is to avoid parentheses when calling subroutines, e.g. to use "f 2" instead of "f(2)". This requires subroutines to be declared before they are used, which may require some adjustment to existing code or some adjustment to habits when writing new code. It also sometimes requires precedence to be made explicit. For example, "f(2) || die" doesn't simply become "f 2 || die", which parses as "f(2 || die)". It must become "(f 2) || die" or similar to preserve the original meaning. (The low-precedence "or" does not have this problem: "f 2 or die" already means "(f 2) or die".)
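A minimal sketch of the parenthesis-free style, using hypothetical subs twice and increment. It shows that a forward declaration ("sub name;") satisfies the declared-before-use requirement without moving the body. It also shows how "||" binds when the parentheses are dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Defined before any call, so "twice 5" parses as a call.
sub twice { return 2 * $_[0] }

# A forward declaration is enough; the body can come later in the file.
sub increment;

my $ten      = twice 5;              # 10
my $explicit = (twice 0) || 'zero';  # 'zero': the call happens first
my $surprise = twice 0 || 3;         # 6: parsed as twice(0 || 3)
my $seven    = increment 6;          # 7, thanks to the forward declaration

print "$ten $explicit $surprise $seven\n";    # 10 zero 6 7

sub increment { return $_[0] + 1 }
```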

If parentheses cannot be avoided, then a solution is to call only references to subroutines.
This takes advantage of the fact that Perl can detect undeclared variables via "use strict".
E.g.,
sub MyFunction { ... }
...
MyFunction();
becomes
my $MyFunction = sub { ... };
...
$MyFunction->();
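A minimal sketch of why this helps follows; the variable names are illustrative. Under "use strict", a misspelled reference variable is a compile-time error. A misspelled named sub called with parentheses, by contrast, is not caught until run time. The typo is compiled in a string eval here only so the script can report it instead of refusing to start.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $my_function = sub { return 'called' };
print $my_function->(), "\n";          # prints "called"

# The same call with a typo in the variable name, compiled under strict.
my $compiled = eval q{ sub { $my_functoin->() } };
my $error    = $@;
print "rejected at compile time: $error" unless $compiled;
```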
One consequence of this style is that subroutines must be defined before they are used. This may be considered a good thing or a bad thing. Another consequence is that it makes stack traces much less meaningful, since every such subroutine is reported in them as "__ANON__". This is almost certainly a bad thing. Consider the following program.
use Carp qw(cluck);
 
my $f = sub { cluck 'cluck'; };
sub g { cluck 'cluck'; }
 
$f->();
g();
The output of this program is
cluck at ./sub.pl line 9
  main::__ANON__() called at ./sub.pl line 12
cluck at ./sub.pl line 8
  main::g() called at ./sub.pl line 11
One way around this "__ANON__" problem is to use references to named subroutines rather than references to anonymous subroutines. E.g. the "__ANON__" problem goes away if the program above is modified with the following diff.
< my $f = sub { cluck 'cluck'; };
---
> my $f = \&g;
But this re-exposes us to the problem of undefined subroutines at run time. E.g. above, if the programmer had accidentally typed "\&G" instead of "\&g", the mistake would go undetected until run time. In addition to being error-prone to write and maintain, this solution is cumbersome to read. E.g.
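A minimal sketch of the "\&G" typo described above. Taking a reference to a subroutine that does not exist compiles and runs without complaint. The error appears only when the reference is finally called.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub g { return 'g called' }

my $f = \&G;      # typo for \&g: compiles and runs without complaint

# The mistake only surfaces when the reference is finally called.
my $ok    = eval { $f->(); 1 };
my $error = $@;
print "died: $error" unless $ok;   # Undefined subroutine &main::G called ...
```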
sub MyFunction { ... }
...
MyFunction();
becomes
my $MyFunction = \&MyFunction;
sub MyFunction { ... }
...
$MyFunction->();
To discourage direct use of named subroutines, some naming convention should probably be used. Perhaps a prefix of "ppp", e.g.
my $MyFunction = \&pppMyFunction;
sub pppMyFunction { ... }
...
$MyFunction->();
Another way around the __ANON__ problem is to use the Sub::Name CPAN module.
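Sub::Name is a CPAN module. The sketch below instead uses set_subname from Sub::Util, which ships with Perl 5.22 and later and serves the same purpose, so the sketch runs without a CPAN install. With Sub::Name, the equivalent call would be subname('main::my_function', sub { ... }). The name my_function is hypothetical. Each sub reports the name that caller(), and therefore Carp, sees for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Sub::Util qw(set_subname subname);   # core module since Perl 5.22

# Each sub returns the name that caller() (and hence Carp) reports for it.
my $anon  = sub { return (caller 0)[3] };
my $named = set_subname( 'main::my_function', sub { return (caller 0)[3] } );

print $anon->(),       "\n";    # main::__ANON__
print $named->(),      "\n";    # main::my_function
print subname($named), "\n";    # main::my_function
```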

Finally, the B::Lint core Perl module can be used. Its "undefined-subs" check, run as e.g. "perl -MO=Lint,undefined-subs script.pl", warns about explicit calls to subroutines that are not defined. Though B::Lint claims to be equivalent to an extended version of the -w option, there are some important differences that may limit its applicability. It is unlike -w in that it only compiles the program; it does not run it. Also, unlike -w, it has no lexical form, whereas -w has the analogous lexical form "use warnings".

I conclude by thanking Perl Monks for the posts below.

22 April 2011

underscore, "while", and angle brackets in Perl

This post discusses some pitfalls of the Perl construct "while (<>)". We'll refer to it as WAB (While Angle Bracket).

WAB sets $_ but does not localize *_ (the underscore glob). This can cause undesired interactions with other constructs that set $_. These constructs include "for ()", "foreach ()", "map", and "grep".

In general, if a WAB is dynamically enclosed by one of these other constructs, it will overwrite the enclosing $_ (when input runs out, with undef). If that $_ is aliased to a modifiable value, the overwrite succeeds. If it is aliased to a constant, recent Perls die with "Modification of a read-only value attempted".

The program below, wab.pl, shows this. Its WAB stomps on the $_ set by the enclosing "for ()". What's more, since $_ is just an alias to the members of the list given to "for ()", the WAB stomps on the list, too!
The command "true | ./wab.pl" gives the following output.
$VAR1 = undef;
$VAR2 = undef;
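The wab.pl listing is not reproduced here. Below is a minimal program in its spirit. It has a hypothetical drain sub and reads an empty in-memory file instead of <>, so the result does not depend on what is on standard input. "while (<$fh>)" assigns to the global $_ without localizing it, just like "while (<>)".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Drain an empty in-memory file with the implicit-$_ form of while.
# (Stands in for "true | ./wab.pl"; <$fh> behaves like <> here.)
sub drain {
    my $empty = '';
    open my $fh, '<', \$empty or die "open: $!";
    while ( <$fh> ) {}    # assigns to the global $_, which is not localized
}

my $x = 1;
for ( $x ) {              # inside the loop, $_ is an alias for $x
    drain();
}
print Dumper($x);         # $VAR1 = undef;
```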
The effect is more dramatic if the list given to "for ()" contains constants. If we modify wab.pl with the following patch and run it under a recent Perl, it dies with "Modification of a read-only value attempted".
12c12
< for ( $a )
---
> for ( 1, $a )
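The following is a self-contained sketch of the read-only case, again reading an empty in-memory file in place of standard input (drain is a hypothetical name). Because the literal 1 is read-only, overwriting $_ while it is aliased to 1 dies on the first iteration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub drain {
    my $empty = '';
    open my $fh, '<', \$empty or die "open: $!";
    while ( <$fh> ) {}    # tries to assign undef to the caller's $_
}

my $x = 1;
my $ok = eval {
    for ( 1, $x ) { drain() }    # $_ is first aliased to the constant 1
    1;
};
my $error = $@;
print "died: $error" unless $ok;   # Modification of a read-only value attempted ...
```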
There are various ways to avoid WAB's behavior. One way is to explicitly localize *_. For example, we could modify wab.pl with the following patch.
8c8
< sub f { while ( <> ) {} }
---
> sub f { local *_; while ( <> ) {} }
The output would then be as follows.
$VAR1 = 1;
$VAR2 = 1;
We can also just stop using WAB. For example, we could modify wab.pl with the following patch.
8c8
< sub f { while ( <> ) {} }
---
> sub f { while ( my $f = <> ) {} }
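Both fixes can be checked in one self-contained sketch. It assumes hypothetical drain_localized and drain_lexical subs that read an empty in-memory file in place of <>. Neither sub disturbs the caller's $_, even when $_ is aliased to a read-only constant.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fix 1: localize the *_ glob, so the loop's $_ is private to this sub.
sub drain_localized {
    local *_;                       # localizes the whole glob, not just $_
    my $empty = '';
    open my $fh, '<', \$empty or die "open: $!";
    while ( <$fh> ) {}
}

# Fix 2: read into a lexical instead, so $_ is never touched.
sub drain_lexical {
    my $empty = '';
    open my $fh, '<', \$empty or die "open: $!";
    while ( my $line = <$fh> ) {}   # Perl adds the defined() test implicitly
}

my $x = 1;
for ( 1, $x ) {                     # includes a read-only constant
    drain_localized();
    drain_lexical();
}
print "x is still $x\n";            # x is still 1
```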
That concludes the main body of this post. Some additional notes appear below, for the more curious.

21 April 2011

bash pipefail

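If you care about the exit code of a piped bash command, you must set the "pipefail" option. By default, a pipeline's exit status is that of its last command, so a failure earlier in the pipe is lost. With "set -o pipefail", the status is that of the last (rightmost) command to exit with a nonzero status, or zero if all commands succeed. The code below shows this.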


Output:
false alone: 1
false piped to head: 0
false 'pipefailed' to head: 1
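The script that produced this output is not reproduced here. A minimal stand-in is sketched below. It drives bash from Perl (keeping this page's examples in one language), assumes bash and head are on the PATH, and uses a hypothetical bash_status helper. The three command strings are the obvious candidates for the labels above, not necessarily the original ones.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command line under bash and return its exit status.
sub bash_status {
    my ($command) = @_;
    system( 'bash', '-c', $command );
    die "could not run bash: $!" if $? == -1;
    return $? >> 8;
}

my @cases = (
    [ 'false alone'                => 'false' ],
    [ 'false piped to head'        => 'false | head' ],
    [ "false 'pipefailed' to head" => 'set -o pipefail; false | head' ],
);

for my $case (@cases) {
    my ( $label, $command ) = @$case;
    print "$label: ", bash_status($command), "\n";
}
```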
The "take home" message from this is that you should always set the pipefail option. Too bad it is not the default.

Below are links to some other sites on this topic.

20 April 2011

Inaugural post

Now that I've left Oblong Industries, I have time to write a blog.  I plan to post on programming, build systems, and software engineering in general.  I plan to express my opinions as well as present facts I wish I'd known.  These facts will rarely be discoveries, i.e. most of them will be widely known already.  For those widely-known facts, I hope to add some value by presenting them in a way I wish had existed.  Perhaps that will be a useful theme: "facts I wish I'd known, presented in a way I wish had existed."