28 April 2011

undefined subroutines in Perl

Perl does not complain at compile time about calls to undeclared subroutines made with parentheses. E.g. if f is undeclared, Perl accepts "f(2)" without complaint, though it rejects "f 2" with a syntax error. Neither "use strict" nor "use warnings" changes the situation.
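
As a minimal sketch of this behaviour (the subroutine name "f" is purely illustrative and is deliberately never declared), the following file compiles cleanly under both pragmas and fails only when the call is reached:

```perl
use strict;
use warnings;

# "f" is a hypothetical subroutine that is never declared or defined.
# This file still passes "perl -c"; the call fails only when it executes.
my $ok  = eval { f(2); 1 };
my $err = $@;

print "run-time failure: $err" unless $ok;
```

The message captured in $err begins with "Undefined subroutine &main::f called".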

This is not always a bad thing, since it enables techniques in which f is undeclared at compile time but will be defined at run time. This is possible since "declared-ness" is a static / lexical / syntactic property, whereas defined-ness is a dynamic / semantic property.
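
One such technique, sketched here under the assumption that the subroutine is installed through the symbol table (string eval and AUTOLOAD are other possibilities), looks like this. The name "f" and its body are made up for illustration:

```perl
use strict;
use warnings;

# Nothing named "f" exists when the call below is compiled.
# At run time, install a hypothetical implementation in the symbol table.
{
    no strict 'refs';
    *{'main::f'} = sub { return 2 * $_[0] };
}

my $result = f(21);    # resolved by name when the call executes
print "$result\n";     # prints 42
```

Writing the call as "f 21" instead would be a syntax error here, because f is not declared when that line is compiled.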

But this is a bad thing when such "undeclared but defined" techniques are not being used. In those cases, a missing or misspelled subroutine surfaces only as an "Undefined subroutine" error at run time. Such bugs can slip by tests when coverage is incomplete, e.g. in code to handle an error that is hard to provoke.

Three ways to avoid this problem are listed below.
  • avoid parentheses when calling subroutines
  • call only references to subroutines
  • use B::Lint
The most direct solution is to avoid parentheses when calling subroutines, e.g. to use "f 2" instead of "f(2)". This requires subroutines to be declared before they are used, which may require some adjustment to existing code or to habits when writing new code. It also sometimes requires precedence to be made explicit. A user-defined subroutine without a prototype parses as a list operator, which binds more loosely than "||", so "f(2) || die" doesn't simply become "f 2 || die" (that parses as "f(2 || die)"); it must become "(f 2) || die", "f 2 or die", or similar to preserve the original meaning.
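
The precedence pitfall with parenthesis-free calls can be checked with a small sketch. The subroutine "f" below is hypothetical and always returns false, so a correctly grouped "|| die" should always fire:

```perl
use strict;
use warnings;

# Hypothetical subroutine that always reports failure by returning false.
sub f { return 0 }

# Parsed as f(2 || die ...): 2 is true, so die is never evaluated.
my $ungrouped = eval { f 2 || die "f failed\n"; 1 } ? 'lived' : 'died';

# Parsed as f(2) || die ...: f returns false, so die fires.
my $grouped = eval { (f 2) || die "f failed\n"; 1 } ? 'lived' : 'died';

# "or" binds more loosely than a list operator, so no grouping is needed.
my $with_or = eval { f 2 or die "f failed\n"; 1 } ? 'lived' : 'died';

print "f 2 || die:   $ungrouped\n";   # lived (the failure is silently lost)
print "(f 2) || die: $grouped\n";     # died
print "f 2 or die:   $with_or\n";     # died
```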

If parentheses cannot be avoided, then a solution is to call only references to subroutines.
This takes advantage of the fact that Perl can detect undeclared variables at compile time via "use strict".
E.g.,
sub MyFunction { ... }
...
MyFunction();
becomes
my $MyFunction = sub { ... };
...
$MyFunction->();
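
To see that a misspelled reference variable is rejected before anything runs, here is a rough sketch. The typo "$MyFunctoin" and the flag "$reached" are invented for the demonstration, and string eval is used only so that the script can observe its own compile error:

```perl
use strict;
use warnings;

our $reached = 0;

# A snippet with a deliberately misspelled code-reference variable.
my $snippet = <<'END_SNIPPET';
use strict;
$main::reached = 1;                         # would run first, if anything ran
my $MyFunction = sub { return 'called' };
$MyFunctoin->();                            # hypothetical typo
1;
END_SNIPPET

my $ok  = eval $snippet;
my $err = $@;

print "compiled: ", ($ok ? 'yes' : 'no'), ", statements run: $reached\n";
print "error: $err";
```

The snippet fails to compile with a 'Global symbol "$MyFunctoin" requires explicit package name' error, and $reached stays 0 because none of the snippet's statements are executed.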
One consequence of this style is that subroutines must be defined before they are used. This may be considered a good thing or a bad thing. Another consequence of this style is that it makes stack traces much less meaningful. This is almost certainly a bad thing. Consider the following program.
use Carp qw(cluck);
 
my $f = sub { cluck 'cluck'; };
sub g { cluck 'cluck'; }
 
$f->();
g();
The output of this program is
cluck at ./sub.pl line 9
  main::__ANON__() called at ./sub.pl line 12
cluck at ./sub.pl line 8
  main::g() called at ./sub.pl line 11
One way around this "__ANON__" problem is to use references to named subroutines rather than references to anonymous subroutines. E.g. the "__ANON__" problem goes away if the program above is modified with the following diff.
< my $f = sub { cluck 'cluck'; };
---
> my $f = \&g;
But this re-exposes us to the problem of undefined subroutines at run time. E.g. above, if the user had accidentally typed "\&G" instead of "\&g", Perl would still hand back a code reference, and the mistake would go undetected until the reference is called at run time. In addition to being error-prone to write and maintain, this solution is cumbersome to read. E.g.
sub MyFunction { ... }
...
MyFunction();
becomes
my $MyFunction = \&MyFunction;
sub MyFunction { ... }
...
$MyFunction->();
To discourage direct use of named subroutines, some naming convention should probably be used. Perhaps a prefix of "ppp", e.g.
my $MyFunction = \&pppMyFunction;
sub pppMyFunction { ... }
...
$MyFunction->();
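
The run-time failure mode of a mistyped named reference, such as the "\&G" typo mentioned earlier, can be reproduced with a short sketch (the names are illustrative only):

```perl
use strict;
use warnings;

sub g { return 'g called' }

# Hypothetical typo: "G" instead of "g". Taking the reference succeeds.
my $f = \&G;

my $type = ref $f;                  # 'CODE', so it looks like a valid reference
my $ok   = eval { $f->(); 1 };      # the failure appears only at the call
my $err  = $@;

print "ref type: $type\n";
print "call failed: $err" unless $ok;
```

Neither "use strict" nor "use warnings" objects to the line that takes the reference; the "Undefined subroutine &main::G called" error is raised only when the reference is invoked.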
Another way around the "__ANON__" problem is to use the Sub::Name CPAN module, whose subname function assigns a name to an anonymous subroutine.
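
A minimal sketch of the Sub::Name approach follows, assuming the module has been installed from CPAN. The name 'f' is arbitrary, and caller is used here in place of a full Carp trace to read the name that a stack trace would report:

```perl
use strict;
use warnings;
use Sub::Name qw(subname);    # CPAN module; not part of core Perl

# (caller(0))[3] is the fully qualified name of the running subroutine,
# which is the name a stack trace reports for that frame.
my $anon  = sub { return (caller(0))[3] };
my $named = subname 'f' => sub { return (caller(0))[3] };

my $anon_name  = $anon->();
my $named_name = $named->();

print "$anon_name\n";     # main::__ANON__
print "$named_name\n";    # main::f
```

The subroutine is still held only in a lexical variable, so "use strict" continues to catch misspellings of that variable at compile time.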

Finally, the B::Lint core Perl module can be used, e.g. as "perl -MO=Lint,undefined-subs prog.pl" (with prog.pl standing in for the program to check), which warns about explicit calls to undefined subroutines. Though its documentation describes it as equivalent to an extended version of the -w option, there are some important differences that may limit its applicability. Unlike -w, it only compiles the program; it does not run it. Also unlike -w, which has the lexical counterpart "use warnings", it has no lexical form.

I conclude by thanking Perl Monks for the posts below.
