Chapter 14

The Perl Language

by Bob Breedlove


The goal of this chapter is to explain the Perl language so that you can use it to create Web applications. I do not attempt in this short space to cover all the capabilities of Perl. Several good books on programming Perl are available, including Teach Yourself Perl in 21 Days and Perl 5 Unleashed, both from Sams Publishing. This chapter assumes that you have at least a basic understanding of programming and programming terminology.

This chapter relies heavily on the Perl manual pages (man pages). The UNIX man facility provides online documentation from specially formatted files and is the standard for UNIX-based documentation. Implementations of Perl for other operating systems might also supply versions of this authoritative documentation. For ease of access, the Perl manual has been split up into several sections. References are made throughout this chapter using the standard naming convention for these pages as shown in the following table.

Man page     Description
perl         Perl overview (this section)
perldata     Perl data structures
perlsyn      Perl syntax
perlop       Perl operators and precedence
perlre       Perl regular expressions
perlrun      Perl execution and options
perlfunc     Perl built-in functions
perlvar      Perl predefined variables
perlsub      Perl subroutines
perlmod      Perl modules
perlref      Perl references and nested data structures
perlobj      Perl objects
perlbot      Perl object-oriented tricks and examples
perldebug    Perl debugging
perldiag     Perl diagnostic messages
perlform     Perl formats
perlipc      Perl interprocess communication
perlsec      Perl security
perltrap     Perl traps for the unwary
perlstyle    Perl style guide
perlapi      Perl application programming interface
perlguts     Perl internal functions for those doing extensions
perlcall     Perl calling conventions from C
perlovl      Perl overloading semantics
perlembed    How to embed Perl in your C or C++ app
perlpod      Perl plain old documentation

One excellent version of the documentation is supplied as Adobe Acrobat files. It is available on the Web at


http://www.perl.com/CPAN/authors/id/BMIDD/perlpdf-5.002.tar.gz

Adobe readers are available free for many operating systems; check http://www.adobe.com for current availability.

The man pages are also available in HTML format as Web pages. They are available on the Internet at


http://www.perl.com/CPAN/doc/manual/html/index.html

or packaged in two formats at


http://www.perl.com/CPAN/doc/manual/html/PerlDoc-beta1g-html.tar.gz

http://www.perl.com/CPAN/authors/id/BMIDD/perlhtml-5.002.tar.gz

If you do not have easy access to this complete reference set with your version of Perl, you should take the time to download a copy.

At this writing, two versions of Perl are in common use: version 4+ and version 5+ (version numbers for version 5 might vary). I cover aspects of the language common to both and only touch on the more advanced aspects of version 5+ that are directly applicable to Web applications.

After you have completed this and the next chapter, "Perl in Internet Applications," you should have a solid understanding of Perl and be able to create Web applications using this language.

About the Perl Chapters

This and the following chapters form a reference and tutorial for the Perl language in Internet programming. This chapter follows the organization of the Perl manual. I hope that the information here clarifies some aspects of Perl programming that are especially important in Internet programming.

Perl provides a number of equally good ways to accomplish a task. The best way to learn to program in Perl is to program in Perl. Because it is an interpreted language, you can develop and test small portions of a program with relative ease in a small amount of time. In the following chapter, a small application acts as a tutorial in Perl programming. You are encouraged to enter the program code along with the chapter, to try it, and to experiment with alternative programming techniques by either modifying the samples or creating your own code based on the demonstrated techniques.

Perl varies some depending on the platform on which it is implemented and the version of the language that runs on your installation. I'll point out differences where applicable.

Writing Perl Scripts

To reiterate an important point, the best way to learn to write Perl scripts is to simply write Perl scripts. That statement isn't quite as silly as it sounds. Perl is intended to be a language in which you can get things done, and it usually gives you more than one way to accomplish a task. The scripts can be simple, straightforward, and quick-and-dirty or elegant and organized. (This ability to write either quick-and-dirty code or elegant and organized code is especially true if you are using the object-oriented aspects of Perl version 5. These aspects of the language are beyond the scope of this chapter, however.)

In keeping with the tradition of most programming texts (at least for C-like languages), here is the popular Hello World script (program) for Perl:


print "Hello World\n";

Not much of a program, is it? It is fully functional, however. Compare the Perl script with its C counterpart:


#include <stdio.h>

int main(void)

{

       printf("Hello World\n");

       return 0;

}

You'll notice differences right away. First, because Perl is a scripting language, it starts at the top of the script file and works its way to the bottom. It can take some branches, perform some loops, or execute some functions, but top-to-bottom execution is the basic rule of script programming. Perl has no main() function. Perl starts with the first executable instruction it finds and executes instructions until it executes the entire script. Note that subroutines (functions), methods, packages, and so on are not considered executable instructions for this purpose.

The Perl print statement is also less complex than its C counterpart. Perl actually supports the printf() function, but the print statement prints to the standard output just fine.
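As a small sketch of the difference, the following lines write the same text with print and with printf, and then use the related sprintf function to build a formatted string. The variable names are illustrative only, and the example assumes Perl 5 (it uses my and use strict).

```perl
use strict;
use warnings;

my $greeting = "Hello World";

# print writes its list of arguments to standard output as they are.
print $greeting, "\n";

# printf formats its arguments according to a C-style template first.
printf("%s\n", $greeting);

# sprintf returns the formatted string instead of printing it.
my $formatted = sprintf("%-8s|%5.2f", "Price", 3.14159);
print "$formatted\n";
```

For plain text, print is usually all you need; printf earns its keep when you want column widths or a fixed number of decimal places.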

Executing Perl Scripts

The techniques used to execute Perl will vary somewhat by operating system. Generally, Perl is supplied as an executable program (file). Refer to your specific operating system manuals for instructions on how to execute programs, and refer to the documentation with Perl for your specific platform for specifics on executing Perl scripts. This section takes the simplest example of command-line operating systems such as UNIX and MS-DOS. In general, the following steps are needed to create and execute the Hello World script:

You should see the Hello World phrase on your screen followed by a newline.

On most UNIX systems and Windows NT, the Perl interpreter can be associated with a particular file naming pattern to allow a safer execution of Perl for purposes of writing CGI programs. For example, if you had associated Perl with files ending in .pl, the command hello.pl would execute the Hello World script when entered on the command line.

Note that in Web programming the following construct is very dangerous, and you should avoid using it at all costs:


http://{host}/{library}/Perl?hello.pl

If you allow this, any Perl script can be substituted, with possible disastrous effects for your host site. Instead, if you are going to do Web programming in Perl, you should be able to execute your Perl scripts by entering only the name of the script. The exact method you use to accomplish this task depends upon your operating system.

On UNIX platforms, Perl scripts can be executed in the same way that shell scripts are executed: you provide the full location of the interpreter (Perl, in this case) on your system by making the first line of the script a comment in a special format:


#!{interpreter location}

On my installation, the location is /usr/bin/perl. Thus, you can modify the "Hello World" program to be


#!/usr/bin/perl

print "Hello World\n";

Then use the chmod command to set the resulting script file to be executable using some variation of the command:


chmod +x hello.pl

You run the executable by simply entering its name at the command prompt (hello.pl<enter>).

On other systems, you might have to register the extension (.pl) with the operating system to run the interpreter when a file with this extension is selected. Note also that some HTTP daemons or installations require scripts with specific extensions. The installation on which my home page is located (http://www.channel1.com/users/rbreed01/), for example, requires that all executable scripts have an extension of .cgi.

Perl Style

Everyone who writes in Perl develops a personal style. Style is important when you want to change something in your script (and you will want to change things), sometimes months after you have implemented it. Style and comments can help you make improvements in your script at a later date with a minimum of fuss.

Programmers can argue style until the cows come home. Larry Wall has some definite feelings about Perl style. If you're interested, check out the perlstyle man page. The important point is readability and maintainability. You have to be able to figure out what is going on and be able to make changes to your code quickly.

The following list outlines some more substantive style issues you might want to consider. For examples and other issues, see the perlstyle man page.

NOTE
Using "here documents" can actually detract from the readability of the indentation used for formatting in programs. You might want to include some comments and whitespace to delineate the document.

Perl Data Types

Perl has three data types: scalars, arrays of scalars, and hashes (associative arrays) of scalars.

Perl is not strongly typed. In fact, all data in Perl is either a scalar, an array of scalars, or a hash of scalars. You do not have to declare variables as a particular type (integer, character, or Boolean) before you use them. Variables can contain either numeric or alphanumeric data and can vary throughout the execution of the program. The following code is valid in a Perl script:


$a = 'some string';

...

$a = 25;

Because arrays are arrays of scalars, different elements of an array can contain either numeric or alphanumeric data. The following code


@a = (1, 2, 'buckle my shoe,', 3, 4, 'shut the door.');

print join(' ',@a), "\n";

works just fine and results in the following line:


1 2 buckle my shoe, 3 4 shut the door.

A scalar value is interpreted as TRUE in the Boolean sense if it is not the null string or the number 0 (or its string equivalent, "0"). The Boolean context is simply a special kind of scalar context.
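The rule is easier to remember after seeing a few edge cases. The following sketch (assuming Perl 5; the truth subroutine and the %result hash are names invented for this illustration) tests several scalars in Boolean context.

```perl
use strict;
use warnings;

# Returns the string 'TRUE' or 'FALSE' for a scalar tested in Boolean context.
sub truth {
    my ($value) = @_;
    return $value ? 'TRUE' : 'FALSE';
}

my %result = (
    'empty string' => truth(''),
    'number 0'     => truth(0),
    'string "0"'   => truth('0'),
    'string "0.0"' => truth('0.0'),   # not the string "0", so it is true
    'string "00"'  => truth('00'),    # also not the string "0"
    'number 42'    => truth(42),
    'a space'      => truth(' '),     # not the null string
);

foreach my $case (sort keys %result) {
    print "$case: $result{$case}\n";
}
```

Only the first three cases report FALSE. Strings such as "0.0" and "00" are numerically zero but are still TRUE because they are neither the null string nor exactly "0".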

The two varieties of null scalars are defined and undefined. Undefined null scalars are returned when something doesn't have a real value, such as when an error occurs, at end of file, or when you refer to an uninitialized variable or element of an array. In Perl, variables do not have to be predefined. Therefore, an undefined null scalar may become defined the first time you use it as if it were defined (such as in an assignment statement). However, before that, you can use the defined() operator to determine whether the value is defined.
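A short sketch of defined() in action, assuming Perl 5 (the variable names are illustrative):

```perl
use strict;
use warnings;

my $count;                        # declared but never assigned: undefined
my $before = defined($count) ? 'defined' : 'undefined';

$count = 0;                       # assignment defines it, even though 0 is false
my $after = defined($count) ? 'defined' : 'undefined';

my @list = (10, 20);
my $missing = defined($list[5]) ? 'defined' : 'undefined';   # no such element

print "before: $before, after: $after, element 5: $missing\n";
```

Note that a value can be defined and still be false, as $count is after the assignment. Testing with defined() rather than testing truth keeps a legitimate 0 from being mistaken for missing data.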

Normal arrays are indexed by number with the first element indexed at zero. Negative subscripts count from the end. Hash arrays are indexed by string.
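For instance, the following sketch (assuming Perl 5; the array and hash contents are made up for illustration) shows zero-based indexing, negative subscripts, and a hash lookup by string:

```perl
use strict;
use warnings;

my @days = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri');
my $first = $days[0];     # first element: 'Mon'
my $last  = $days[-1];    # counting from the end: 'Fri'
my $prev  = $days[-2];    # second from the end: 'Thu'

my %hours = ('Mon', 8, 'Tue', 6);
my $tue = $hours{'Tue'};  # indexed by string: 6

print "$first $last $prev $tue\n";
```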

Scalar values are always named with $, even when referring to a scalar that is part of an array:


$month # a simple value holding the month of the year



@month = 

('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec');

$month[0] # the first element of array @month, 'Jan'

%month = ('Jan',31,'Feb',28,'Mar',31);

$month{'Feb'} # the 'Feb' value from the associative array %month or 28

Entire arrays or array slices are denoted by @:


@month # The entire array 

@value[3,4,5] # the 4th through 6th elements of the array

@things{'abc','def'} 

# the elements indexed by 'abc' and 'def' from the associative array

Entire hashes are denoted by %:


%days # (key1, val1, key2, val2 ...)

Subroutines are named with an initial &:


if (&getValue() < 10) {

       ...

}

sub getValue {

       ...

       return $value

}

Perl Variable Naming Conventions

Every variable type has its own namespace. You can use the same name for a scalar variable, an array, or a hash. Therefore, $foo and @foo are two different variables, and $foo[1] is a part of @foo, not a part of $foo.
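The following sketch (assuming Perl 5) uses the same name, foo, for a scalar, an array, and a hash to show that the three do not interfere with one another:

```perl
use strict;
use warnings;

my $foo = 'scalar';
my @foo = ('first', 'second');
my %foo = (key => 'hash value');

# $foo[1] is an element of @foo; $foo{key} is an element of %foo.
# Neither has anything to do with the scalar $foo.
my $summary = "$foo / $foo[1] / $foo{key}";
print "$summary\n";
```

Although this is legal, reusing one name for several variables can confuse a reader, so you might prefer distinct names in real scripts.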

In general, Perl variable names can contain any combination of letters, digits, underscores, and special characters. I personally prefer to use only letters and digits to avoid some of the special cases described in the following paragraphs.

Because variable and array references always start with $, @, or %, the Perl "reserved" words, which define the language constructs, aren't in fact reserved with respect to variable names. They are reserved with respect to language elements, such as labels and filehandles, however, which don't have an initial special character.

As in C, case is significant in Perl: "FOO", "Foo", and "foo" are all different names. Names that start with a letter or underscore can also contain digits and underscores.

In place of such a name, you can use an expression that returns a reference to an object of that type (see the perlref man page).

Names that start with a digit can only contain more digits. Names that do not start with a letter, underscore, or digit are limited to one character, for example, $% or $$. (Most one-character names have a predefined significance to Perl. For instance, $$ is the current process ID.)

Scalar Values

Numeric literals are specified in any of the customary floating-point or integer formats:


12345 

12345.67 

.23E-10 

0xffff # hex 

0377 # octal 

4_294_967_296 # underline for legibility

String literals are usually delimited by either single or double quotes. Double-quoted string literals are subject to backslash and variable substitution. Single-quoted strings are not subject to these substitutions except for \' and \\. The usual UNIX backslash rules apply for making characters such as newline and tab, as well as some more exotic forms.
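A brief sketch of the two quoting styles side by side, assuming Perl 5 (the variable names are illustrative):

```perl
use strict;
use warnings;

my $user = 'Fred';

my $double  = "Name:\t$user\n";     # \t, \n, and $user are all substituted
my $single  = 'Name:\t$user\n';     # taken literally, backslashes and all
my $escaped = 'It\'s a \\ here';    # only \' and \\ are special in single quotes

print $double;
print $single, "\n";
print $escaped, "\n";
```

The first print produces a tab and the name Fred; the second prints the characters \t, $user, and \n exactly as typed.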

You can also embed newlines directly in your strings; that is, strings can end on a different line from where they begin. This feature is nice, but if you forget your trailing quote, the error is not reported until Perl finds another line containing the quote character, which might be much farther on in the script. Variable substitution inside strings is limited to scalar variables, arrays, and array slices. The following example prints the name in the line:


$name = 'Fred';

print "Hello, $name!\n";

As in some shells, you can put curly brackets around the identifier to delimit it from following alphanumerics. In fact, an identifier within such curlies is forced to be a string, as is any single identifier within a hash subscript.


$days{'Feb'}

can be written as


$days{Feb}

and the quotes are assumed automatically. Anything more complicated in the subscript is interpreted as an expression.
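The substitution rules described above can be sketched as follows, assuming Perl 5 (the variables and their contents are invented for this illustration):

```perl
use strict;
use warnings;

my $fruit  = 'apple';
my @basket = ('pear', 'plum', 'fig');
my %days   = (Feb => 28);

my $plural = "Two ${fruit}s";            # braces end the name before the "s"
my $whole  = "All: @basket";             # array elements joined by a space
my $slice  = "Some: @basket[0,1]";       # an array slice
my $lookup = "Feb has $days{Feb} days";  # single identifier as key, quotes assumed

print "$plural\n$whole\n$slice\n$lookup\n";
```

Without the braces, "Two $fruits" would refer to a different variable, $fruits, which does not exist here.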

Note that a single-quoted string must be separated from a preceding word by a space because a single quote is a valid character in an identifier.

A word that has no other interpretation in Perl is treated as a quoted string. These words are known as barewords. A bareword that consists entirely of lowercase letters risks conflict with future reserved words. You might want to avoid barewords entirely or always code them in uppercase.

Perl supports a line-oriented form of quoting. Following a <<, you specify a string to terminate the quoted material, and all lines following the current line down to the terminating string are the value of the item. The terminating string can be either an identifier (a word) or some quoted text. If you use quoted text, the type of quotes determines the treatment of the text, just as in regular quoting. An unquoted identifier works like double quotes. You cannot leave a space between the << and the identifier. The terminating string must appear by itself (unquoted and with no surrounding whitespace) on the terminating line.

This line-oriented format can be especially helpful in producing HTML pages. The following script prints the template for an HTML page:


print <<EOF;

<HTML>

<HEAD>

<TITLE>...</TITLE>

</HEAD>

<BODY>

...

</BODY>

</HTML>

EOF

       ;

Note that the terminating EOF is on a line by itself and a semicolon (;) is supplied on the next line.
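The effect of the terminator's quoting can be sketched as follows, assuming Perl 5. The $title variable and the page text are made up for the example, and the here documents are assigned to variables rather than printed directly so that the results are easy to inspect.

```perl
use strict;
use warnings;

my $title = 'My Home Page';

# Unquoted (or double-quoted) terminator: $title is interpolated.
my $page = <<EOF;
<HTML>
<HEAD><TITLE>$title</TITLE></HEAD>
</HTML>
EOF

# Single-quoted terminator: the text is taken literally.
my $literal = <<'EOT';
Price: $5.00 (no substitution for $title here)
EOT

print $page, $literal;
```

The single-quoted form is handy when the text contains dollar signs or at signs, such as prices or e-mail addresses, that Perl would otherwise try to treat as variables.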

List values are separated by commas and enclosed in parentheses. The following code


@myList = (1,2,3,4,5);

assigns the values 1-5 to the array variable @myList. Arrays assigned to other arrays lose their identity. Given the assignment above,


@myList2 = (@myList,6,7,8,9,10);

is equivalent to


@myList2 = (1,2,3,4,5,6,7,8,9,10);

You cannot identify @myList within @myList2.

The null list is represented by ().

Lists can be assigned to when the elements of the list are valid to be assigned to. This feature can be useful when splitting comma-delimited files.


while(<IN>) {

       chop;

       ($name, $addr, $city, @junk) = split(/,/);

       ...

}

The preceding script reads lines from the filehandle IN and splits them by commas, placing the first three results from the split into $name, $addr, and $city, respectively, and the remainder of the line, if any, into the array @junk. (If you aren't going to use the remainder of the line, you can leave off @junk, and the remainder of the line is not assigned.)

You can actually place an array or hash anywhere in the list being assigned to, but the first array or hash absorbs all the remaining values; any variables after it in the list receive nothing.
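Both behaviors can be seen in a small sketch, assuming Perl 5. The sample line and the variable names other than $name, $addr, $city, and @junk are invented for illustration.

```perl
use strict;
use warnings;

my $line = "Fred,12 Main St,Springfield,extra1,extra2";

# The array at the end of the list collects whatever is left over.
my ($name, $addr, $city, @junk) = split(/,/, $line);

# An array in the middle takes everything that remains; $tail gets nothing.
my ($head, @middle, $tail) = (1, 2, 3, 4);

print "$name | $addr | $city | @junk\n";
print "head=$head middle=@middle tail=", defined($tail) ? $tail : 'undef', "\n";
```

Because of this, an array or hash is normally placed last in a list that is being assigned to.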

When assigning values to a hash, you can use the => operator. Using the operator is just a more visible way of showing the assignment. For example, the following assignments are equivalent:


%stuff = ( thing1 => 'abcde',

           thing2 => 'defgh',

           thing3 => 'ijklm');



%stuff = ( thing1, 'abcde',

           thing2, 'defgh',

           thing3, 'ijklm');

One thing to note about hashes: The order in which a hash is initialized is not necessarily the order in which the elements are retrieved from the hash (see the description of the sort function).
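When the order matters, a common approach is to sort the keys before using them. The following is a minimal sketch, assuming Perl 5 and reusing the %stuff hash from the example above with its entries deliberately listed out of order:

```perl
use strict;
use warnings;

my %stuff = (thing3 => 'ijklm', thing1 => 'abcde', thing2 => 'defgh');

# keys returns the keys in no particular order; sort imposes one.
my @ordered = sort keys %stuff;

foreach my $key (@ordered) {
    print "$key => $stuff{$key}\n";
}
```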

Assigning an array to a scalar variable returns the number of items in the array. If the number returned is zero, then the array is empty.
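For example (a sketch assuming Perl 5; the names $count, @empty, and $state are illustrative):

```perl
use strict;
use warnings;

my @myList = (1, 2, 3, 4, 5);
my $count  = @myList;          # scalar assignment yields the element count

my @empty = ();
my $state = @empty ? 'has items' : 'is empty';   # zero items tests as false

print "count=$count, \@empty $state\n";
```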

Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a * because it represents all types. Typeglobs used to be the preferred way to pass arrays and hashes by reference into a function, but now that there are real references, you rarely use this technique.

One place to still use typeglobs is for passing or storing filehandles. To save a filehandle, do this:


$fh = *STDOUT;

Use the same method to create a local filehandle.
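As a sketch of why a saved filehandle is useful, the following example (assuming Perl 5; the report subroutine is a hypothetical helper, not part of Perl) passes filehandles to a subroutine that does not care where its output goes:

```perl
use strict;
use warnings;

# A hypothetical helper that writes to whatever filehandle it is given.
sub report {
    my ($handle, $message) = @_;
    print $handle "$message\n";
}

my $fh = *STDOUT;            # save the STDOUT filehandle in a scalar
report($fh, "written through a saved typeglob");
report(*STDERR, "typeglobs can also be passed directly");
```

Note that there is no comma between the filehandle and the text in the print statement; the filehandle is not part of the list being printed.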

Predefined Variables

Predefined variables are names that have special meaning to Perl. Most of the punctuation names (such as $$ and $#) have reasonable mnemonics, or analogues, in one of the shells. Nevertheless, if you wish to use the long variable names, you just need to say


use English;

at the top of your program. This statement will alias all the short names to the long names in the current package (that is, the module) so that you can use the longer English names instead of the more cryptic special character names. Some of them even have medium names, generally borrowed from the UNIX pattern matching language awk.
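A minimal sketch of the long names in use, assuming a Perl 5 installation that includes the English module (the variable names @parts, $joined, and $same_pid are illustrative):

```perl
use strict;
use warnings;
use English;

my @parts = ('a', 'b', 'c');
my $joined;
{
    local $LIST_SEPARATOR = '-';     # long name for $"
    $joined = "@parts";              # elements are now joined by '-'
}
print "joined: $joined\n";

# The long and short names refer to the same variable.
my $same_pid = ($PROCESS_ID == $$) ? 'yes' : 'no';
print "PROCESS_ID matches \$\$: $same_pid\n";
```

Using local inside a block restores the previous value of the variable when the block ends, so the change does not leak into the rest of the script.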

A few of these variables are considered read-only. If you try to assign to a read-only variable, either directly or indirectly through a reference, you raise a runtime exception.
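The following sketch, assuming Perl 5, shows what happens when a script assigns to the read-only variable $1. The eval block traps the exception so that the script can report it instead of dying; $ok and $error are illustrative names.

```perl
use strict;
use warnings;

'abc' =~ /(b)/;                       # sets the read-only variable $1 to 'b'

# eval traps the runtime exception so the script can report it and continue.
my $ok = eval { $1 = 'changed'; 1 };
my $error = $ok ? '' : $@;

print "assignment failed: $error" unless $ok;
```

If you need to change a matched value, copy it to a variable of your own first and modify the copy.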

The following table describes the Perl predefined variables.

Variable    Description
$ARG $_    The default input and pattern-searching space. Here are the places where Perl assumes $_ even if you don't use it:
  • Various unary functions, including functions like ord() and int(), as well as all the filetests (-f, -d) except for -t, which defaults to STDIN.
  • Various list functions like print() and unlink().
  • The pattern matching operations m//, s///, and tr/// when used without an =~ operator.
  • The default iterator variable in a foreach loop if no other variable is supplied.
  • The implicit iterator variable in the grep() and map() functions.
  • The default place to put an input record when a <FH> operation's result is tested by itself as the sole criterion of a while test. Note that outside a while test, this condition does not occur.
$<digit>    Contains the subpattern from the corresponding set of parentheses in the last pattern matched, not counting patterns matched in nested blocks that have been exited already. [read-only]
$MATCH $&    The string matched by the last successful pattern match excluding any matches hidden within a BLOCK or eval() enclosed by the current BLOCK. [read-only]
$PREMATCH $`    The string preceding whatever was matched by the last successful pattern match excluding any matches hidden within a BLOCK or eval() enclosed by the current BLOCK. [read-only]
$POSTMATCH $'    The string following whatever was matched by the last successful pattern match excluding matches hidden within a BLOCK or eval() enclosed by the current BLOCK. [read-only]
$LAST_PAREN_MATCH $+    The last bracket matched by the last search pattern. This variable is useful if you don't know which of a set of alternative patterns matched. [read-only]
$MULTILINE_MATCHING $*    Set to 1 to do multiline matching within a string, or 0 to tell Perl that it can assume that strings contain a single line for the purpose of optimizing pattern matches. Pattern matches on strings containing multiple newlines can produce confusing results when $* is 0. Default is 0. Note that this variable influences only the interpretation of ^ and $. You can search for a literal newline even when $* is 0.
input_line_number HANDLE EXPR $INPUT_LINE_NUMBER $NR $.    The current input line number of the last filehandle that was read. An explicit close on the filehandle resets the line number. Line numbers increase across ARGV files.
input_record_separator HANDLE EXPR $INPUT_RECORD_SEPARATOR $RS $/    The input record separator, set to newline by default. If set to the null string, it treats blank lines as delimiters. You can set the separator to a multicharacter string to match a multicharacter delimiter. Note that setting the separator to "\n\n" means something slightly different than setting it to "" if the file contains consecutive blank lines. Setting it to "" treats two or more consecutive blank lines as a single blank line. Setting it to "\n\n" blindly assumes that the next input character belongs to the next paragraph, even if it's a newline.
autoflush HANDLE EXPR $OUTPUT_AUTOFLUSH $|    A nonzero setting forces a flush after every write or print on the currently selected output channel. The default is 0.
output_field_separator HANDLE EXPR $OUTPUT_FIELD_SEPARATOR $OFS $,    The output field separator for the print operator. Ordinarily, the print operator simply prints the comma-separated fields you specify. Set this variable to specify what is printed between fields.
output_record_separator HANDLE EXPR $OUTPUT_RECORD_SEPARATOR $ORS $\    The output record separator for the print operator. Ordinarily, the print operator simply prints the comma-separated fields you specify with no trailing newline or record separator assumed. Set this variable to specify what is printed at the end of each print.
$LIST_SEPARATOR $"    This separator is like $, except that it applies to array values interpolated into a double-quoted string or similar interpreted string. Default is a space.
$SUBSCRIPT_SEPARATOR $SUBSEP $;    The subscript separator for multidimensional array emulation. Default is \034. Note that if the keys contain binary data, $; might not have a safe value.
$OFMT $#    The output format for printed numbers. The initial value is %.20g.
format_page_number HANDLE EXPR $FORMAT_PAGE_NUMBER $%    The current page number of the currently selected output channel.
format_lines_per_page HANDLE EXPR $FORMAT_LINES_PER_PAGE $=    The current page length of the currently selected output channel. Default is 60.
format_lines_left HANDLE EXPR $FORMAT_LINES_LEFT $-    The number of lines left on the page of the currently selected output channel.
format_name HANDLE EXPR $FORMAT_NAME $~    The name of the current report format for the currently selected output channel. Default is the name of the filehandle.
format_top_name HANDLE EXPR $FORMAT_TOP_NAME $^    The name of the current top-of-page format for the currently selected output channel. Default is the name of the filehandle with _TOP appended.
format_line_break_characters HANDLE EXPR $FORMAT_LINE_BREAK_CHARACTERS $:    The current set of characters after which a string can be broken to fill continuation fields (starting with ^) in a format. Default is " \n-" to break on whitespace or hyphens.
format_formfeed HANDLE EXPR $FORMAT_FORMFEED $^L    What the program outputs to perform a form feed. Default is \f.
$ACCUMULATOR $^A    The current value of the write() accumulator for format() lines. A format contains formline() commands that put their result into $^A. After calling its format, write() prints the contents of $^A and empties it. You never actually see the contents of $^A unless you call formline() yourself and then look at it.
$CHILD_ERROR $?    The status returned by the last pipe close, backtick (``) command, or system() operator. Note that this status word is returned by the wait() system call, so the exit value of the subprocess is actually ($? >> 8). Thus, on many systems, $? & 255 specifies which signal, if any, the process died from and whether a core dump occurred.
$OS_ERROR $ERRNO $!    If used in a numeric context, $! yields the current value of errno with all the usual caveats. (That is, you shouldn't depend on the value of $! to be anything in particular unless you've gotten a specific error return indicating a system error.) If used in a string context, $! yields the corresponding system error string. You can assign a value to $! in order to set errno if, for example, you want $! to return the string for error n or you want to set the exit value for the die() operator.
$EVAL_ERROR $@    The Perl syntax error message from the last eval() command. If null, the last eval() parsed and executed correctly (although the operations you invoked might have failed in the normal fashion). Note that warning messages are not collected in this variable. You can, however, set up a routine to process warnings by setting $SIG{__WARN__} (see the %SIG entry below).
$PROCESS_ID $PID $$    The process number of the Perl running this script.
$REAL_USER_ID $UID $<    The real user ID (uid) of this process.
$EFFECTIVE_USER_ID $EUID $>    The effective user ID of this process.
$REAL_GROUP_ID $GID $(    The real group ID (gid) of this process. If you are on a machine that supports membership in multiple groups simultaneously, the variable gives a space-separated list of groups you are in. The first number is the one returned by getgid(), and the subsequent numbers are returned by getgroups(), one of which may be the same as the first number.
$EFFECTIVE_GROUP_ID $EGID $)    The effective gid of this process. If you are on a machine that supports membership in multiple groups simultaneously, it gives a space-separated list of groups you are in. The first number is the one returned by getegid(), and the subsequent numbers are returned by getgroups(), one of which may be the same as the first number.
$PROGRAM_NAME $0    Contains the name of the file containing the Perl script being executed. Assigning to $0 modifies the argument area that the ps(1) program sees.
$[    The index of the first element in an array and of the first character in a substring. The default is 0.
$PERL_VERSION $]    The version number of this Perl installation (the information reported by the command perl -v).
$DEBUGGING $^D    The current value of the debugging flags.
$SYSTEM_FD_MAX $^F    The maximum system file descriptor, ordinarily 2. System file descriptors are passed to exec()ed processes, whereas higher file descriptors are not. Also, during an open(), system file descriptors are preserved even if the open() fails. (Ordinary file descriptors are closed before the open() is attempted.) Note that the close-on-exec status of a file descriptor is decided according to the value of $^F at the time of the open, not at the time of the exec.
$INPLACE_EDIT $^I    The current value of the inplace-edit extension. Use undef to disable inplace editing.
$PERLDB $^P    The internal flag that the debugger clears so that it doesn't debug itself. You could conceivably disable debugging yourself by clearing it.
$BASETIME $^T    The time at which the script began running in seconds since the epoch (beginning of 1970). The values returned by the -M, -A, and -C filetests are based on this value.
$WARNING $^W    The current value of the warning switch, either TRUE or FALSE.
$EXECUTABLE_NAME $^X    The name that the Perl binary itself was executed as, from C's argv[0].
$ARGV    Contains the name of the current file when reading from <>.
@ARGV    The array @ARGV contains the command-line arguments. Note that $#ARGV is the number of arguments minus one because $ARGV[0] is the first argument, not the command name. See $0 for the command name.
@INC    The array @INC contains the list of places to look for Perl scripts to be evaluated by the do EXPR, require, or use constructs. @INC initially consists of the arguments to any -I command-line switches, followed by the default Perl library, followed by ., to represent the current directory.
%INC    The hash %INC contains entries for each filename that has been included via do or require. The key is the filename you specified, and the value is the location of the file actually found. The require command uses this hash to determine whether a given file has already been included.
$ENV{expr}    The hash %ENV contains your current environment. Setting a value in %ENV changes the environment for child processes.
$SIG{expr}    The hash %SIG is used to set signal handlers for various signals.
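A few of these variables turn up constantly in Web scripts. The following sketch, assuming Perl 5, exercises the pattern-match variables, the default variable $_, and the list separator $". The sample text and the names $pre, $match, $post, $word, @short, and $listing are invented for the illustration.

```perl
use strict;
use warnings;

my $text = 'The quick brown fox';
my ($pre, $match, $post, $word);
if ($text =~ /(quick) (brown)/) {
    $pre   = $`;     # text before the match
    $match = $&;     # the matched text itself
    $post  = $';     # text after the match
    $word  = $2;     # second parenthesized subpattern
}

# $_ is the default loop variable and the default target of a pattern match.
my @short;
foreach ('ant', 'giraffe', 'bee') {
    push @short, $_ if /^\w{1,3}$/;
}

my $listing;
{
    local $" = ', ';             # $LIST_SEPARATOR, restored at end of block
    $listing = "@short";
}

print "[$pre][$match][$post][$word] $listing\n";
```

The foreach loop names no variable and the pattern match has no =~ operator, so both fall back on $_ as described at the start of the table.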

Perl Syntax

Perl is generally a free-form language. The only elements that you need to declare are report formats and subroutines. To create a variable or other object, simply use it. All uninitialized user-created objects are assumed to start with a null or 0 value until they are defined by some explicit operation such as assignment.

The sequence of statements is executed just once. The interpreter first "compiles" the script, checking for syntax errors. With the exception of subroutines (functions), Perl executes statements from the first line of the script to the last. Of course, like any programming language, Perl supports looping and branching statements, which affect the flow of the program but generally continue execution at the next sequential statement after the statements complete their operation.

Comments-Documenting the Script

Comments are indicated by the # character and extend to the end of the line. Here are examples of comments:


# This is a comment which extends over the entire line

# Comments do not span lines, the "#" character must be used

# at the start of the comment on every line.

$a = 1; # Comments can be added to lines

$b = 2; # to explain usage of a particular instruction.

Declarations

Declarations all take effect at compile time when the script is first executed. You can put a declaration anywhere you can put a statement, but a declaration has no effect on the execution of the primary sequence of statements.

Declaring a subroutine allows a subroutine name to be used as if it were a list operator from that point forward in the program.
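As a sketch of what this means in practice (assuming Perl 5; largest is a hypothetical subroutine written for this example), a subroutine that appears before its first use can be called without parentheses or a leading &:

```perl
use strict;
use warnings;

# Declared (and defined) before its first use, so the name can be
# called without parentheses, like a built-in list operator.
sub largest {
    my $max = shift;
    foreach my $n (@_) {
        $max = $n if $n > $max;
    }
    return $max;
}

my $biggest = largest 3, 17, 5;    # same as largest(3, 17, 5)
print "largest is $biggest\n";
```

If the subroutine were defined after this call instead, Perl would not yet know that largest is a subroutine name, and you would need to write largest(3, 17, 5) or &largest(3, 17, 5).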

Simple Statements

A simple statement is an expression that is evaluated and executed by the interpreter. You must terminate every simple statement with a semicolon unless it is the final statement in a block.

Optionally, you can place a SINGLE modifier after any simple statement; the modifier goes just before the terminating semicolon (or block ending). Here are the four valid modifiers:


if {expression} 

unless {expression} 

while {expression} 

until {expression}

The if and unless modifiers work as you might expect. The statement is executed if or unless the {expression} is true.


$a = 2 if $b == $c;

The variable $a is set to 2 if the variable $b is equal to the variable $c. An equivalent statement in a more traditional format is


if ($b == $c) {

       $a = 2;

}

The unless modifier executes the statement only if the {expression} is not true. For example,


$a = 2 unless $b == $c;

sets $a equal to 2 only if $b is not equal to $c. The equivalent statement is


if ($b != $c) {

       $a = 2;

}

The while and until modifiers first evaluate the conditional except when they follow a do block.


$c = 0;

$c += 2 until $a + $c > 10;

adds 2 to $c until $a plus $c exceeds 10. If $a is 0 before this statement, the addition executes six times, leaving $c at 12. If $a is greater than 10 when the statement is first evaluated, the addition never executes.

In the case of a do{} block, the block executes once before the conditional is evaluated. Therefore, you can write loops such as


do { $in = <STDIN>; ... } while $in ne ".\n";

The statements within the do{} block read and process at least one line from standard input. If that line is a period followed by an end-of-line character, the loop terminates.
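The following is a minimal sketch of the four modifiers and the do{} special case described above. The variable names ($limit, $count, $status, $passes) are illustrative only and do not come from the chapter's scripts.

```perl
use strict;
use warnings;

my $limit = 10;
my $count = 0;

# if modifier: the assignment runs only when the test is true
my $status = 'unset';
$status = 'at start' if $count == 0;

# unless modifier: the assignment runs only when the test is false
$status = 'nonzero' unless $count == 0;    # skipped here because $count is 0

# until modifier: the test is evaluated before each execution
$count += 2 until $count > $limit;         # stops once $count exceeds 10

# do {} while: the block runs once before the test is evaluated
my $passes = 0;
do { $passes++ } while $passes < 0;        # the test is false, but one pass ran

print "status=$status count=$count passes=$passes\n";
```

The last statement shows why the do{} form matters: a plain while modifier with the same test would never have run the increment.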

Compound Statements

A series of statements that defines a scope is called a block (referenced as BLOCK throughout this chapter). Generally, a block is delimited by braces ({}).

You can use the following compound statements to control flow:


if (expression) {...} 

if (expression) {...} else {...} 

if (expression) {...} elsif (expression) {...} else {...} 

label while (expression) {...}

label while (expression) {...} continue {...} 

label for (expression; expression; expression) {...} 

label foreach variable (list) {...} 

label {...} continue {...}

Note that, unlike C and Pascal, which execute only the next statement after the conditional unless braces or begin/end pairs are used, Perl compound statements are defined in terms of BLOCKs, not statements. Consequently, the braces are required.

The if statement in Perl works much as it does in most other languages. If you use unless in place of if, the sense of the test is reversed.

The while statement executes the block as long as the expression is true (that is, not the null string, 0, or "0"). The label is optional. If a label is included, it consists of an identifier followed by a colon. The label identifies the loop for the loop control statements next, last, and redo. If the label is omitted, the loop control statement refers to the innermost enclosing loop.

A continue block is always executed immediately before the conditional is about to be evaluated again, just like the third part of a for loop in C. Therefore, you can use a continue block to increment a loop variable, even when the loop has been continued via the next statement.
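As a small sketch of how a continue block interacts with next (the label NUMBER and the variables are made up for this illustration), consider a loop that collects odd numbers:

```perl
use strict;
use warnings;

my $i = 0;
my @odd;

NUMBER: while ($i < 10) {
    next NUMBER if $i % 2 == 0;   # skip even values; the continue block still runs
    push @odd, $i;
} continue {
    $i++;                          # runs after every pass, including passes ended by next
}

print "@odd\n";
```

If the increment were placed at the bottom of the main block instead, next would skip it and the loop would never get past 0.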

Loop Control

The following statements control looping in Perl scripts. They are generally equivalent to their C language counterparts.

next

Starts the next iteration of the loop. In the following example, the code in the while loop will not be executed if the line begins with a #, indicating a comment in a Perl script. (This is a convenient construct to strip comments from Perl code.)


LINE: while (<IN>) { 

next LINE if /^#/; # strip comments 

... # additional code

} #end of while loop

last

Immediately exits the loop. The continue block, if any, is not executed.


GET1: while (<IN>) { 

last GET1 if /^EOF/; # exit when a line starting with EOF is encountered;

 ... 

}

redo

Restarts the loop block without evaluating the conditional again. The continue block, if any, is not executed.

For example, when processing a file, input lines might end in a continuation character, such as the plus sign. You can use redo to skip ahead and get the next record.


while (<>) { 

       chop; #remove trailing linefeed

       if (s/\+$//) {  # record ends in a plus sign indicating a continued line

              $_ .= <>; #append the next line to $_

              redo unless eof(); # go back and check again

       } # end of if statement

... # process the record after gathering continuation lines

} # end of while statement
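The same idea can be tried without an input file. This sketch assumes the records are already in an array (@input is a hypothetical stand-in for the lines of a file) and joins continued lines with a space; it is one way to apply redo, not the only one.

```perl
use strict;
use warnings;

# Hypothetical input: a trailing plus sign marks a continued line
my @input = ("first +\n", "second +\n", "third\n", "alone\n");
my @records;
my $line;

while (defined($line = shift @input)) {
    chomp $line;                          # remove the trailing newline
    if ($line =~ s/\s*\+$//) {            # strip the continuation marker
        my $next = shift @input;
        if (defined $next) {
            $line .= ' ' . $next;         # append the next line
            redo;                         # re-run the block without re-testing the condition
        }
    }
    push @records, $line;                 # a complete record
}

print "$_\n" foreach @records;
```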

For Loops

Perl's for loops are exactly like their C equivalents.


for ($i = 1; $i < 100; $i++) { ... }

You could write the same thing using a while loop:


$i = 1;

while ($i < 100) { 

       ... 

} continue {

       $i++;

}

Foreach Loops

The foreach loop iterates over a normal list value and sets the variable to be each element of the list in turn. The variable is implicitly local to the loop and regains its former value upon exiting the loop.

The foreach keyword is actually a synonym for the for keyword, so you can use foreach for readability or for for brevity. If the variable is omitted, $_ is set to each value. If LIST is an actual array (as opposed to an expression returning a list value), you can modify each element of the array by modifying the variable inside the loop.

Here are some examples. The first reads the array @things, returning each value into $_ and changing the name smith to jones.


foreach (@things) { 

       s/smith/jones/ 

} 

The next example reads each element of @numbs into $numb and then multiplies it by 2. Because @numbs is an actual array, the elements of the array itself are doubled.


foreach $numb (@numbs) { 

$numb *= 2; 

}

The last example prints the keys and values in the associative array %items.


foreach $item (keys %items) {

       print "$item = $items{$item}\n"

}
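Here is a short, self-contained sketch of both behaviors: modifying an array through the loop variable and walking a hash by its keys. The data (@prices, %stock) is invented for the example, and the keys are sorted only to make the output order predictable.

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);
foreach my $price (@prices) {
    $price *= 2;                  # $price is an alias, so @prices itself changes
}

my %stock = (apples => 3, pears => 5);
my @lines;
foreach my $item (sort keys %stock) {
    push @lines, "$item = $stock{$item}";
}

print "@prices\n";
print "$_\n" foreach @lines;
```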

Blocks

A labeled or unlabeled block is equivalent to a loop that executes once. Therefore, you can use any of the loop control statements to leave or restart the block. The continue block is optional.

Unlike C, Perl does not have a switch statement. Perl does have several alternative ways to write equivalent statements. The block construct is particularly nice for doing case structures. Here's an example using a block. Note that SWITCH: is not a statement, but merely a label. Any other label would work as well. This statement tests the first character of $_ and performs the logic related to it.


SWITCH: { 

       if (/^a/) { 

              $value = 1; 

              last SWITCH; 

       } 

       if (/^b/) { 

              $value = 2; 

              last SWITCH; 

       } 

       if (/^c/) { 

              $value = 3; 

              last SWITCH; 

       } 

       $value = 0; 

}

Goto

Perl does support three forms of the goto statement: goto-LABEL, goto-EXPR, and goto-&NAME. A loop's LABEL is not a valid target for a goto; it's just the name of the loop. However, these statements could be considered bad programming form, so I advise against using them unless absolutely necessary and will not spend time on them here. These functions are described in the "Alphabetical Listing of Perl Functions."

Perl Operators

The following list outlines Perl operator associativity and precedence, from highest precedence to lowest. With very few exceptions, Perl operators operate on scalar values only, not on array values.

Associativity  Operators
left           terms and list operators (leftward)
left           ->
nonassoc       ++ --
right          **
right          ! ~ \ and unary + and -
left           =~ !~
left           * / % x
left           + - .
left           << >>
nonassoc       named unary operators
nonassoc       < > <= >= lt gt le ge
nonassoc       == != <=> eq ne cmp
left           &
left           | ^
left           &&
left           ||
nonassoc       ..
right          ?:
right          = += -= *= and so on
left           , =>
nonassoc       list operators (rightward)
right          not
left           and
left           or xor

The following sections present the operators in precedence order.

Terms and List Operators (Leftward)

Any term (shown throughout this chapter as "TERM") has the highest precedence in Perl. Terms include variables, quote and quote-like operators, expressions in parentheses, and functions whose arguments are parenthesized.

If any list operator (print(), for example) or any unary operator (chdir(), for example) is followed by a left parenthesis as the next token, the operator and arguments within parentheses are taken to be of highest precedence, just like a normal function call.

In the absence of parentheses, the precedence of list operators such as print, sort, or chmod is either very high or very low depending on whether you look at the left side of the operator or the right side of it. In the following example, the elements (commas) on the right of the sort are evaluated before the sort, but the commas on the left are evaluated after.


@ary = (1, 3, sort 4, 2); 

print @ary; # prints 1324

List operators tend to "gobble up" all the arguments that follow them and then act like a simple TERM with regard to the preceding expression. Note that you have to be careful with parentheses. This is illustrated in the following examples.


# These evaluate exit before doing the print and, thus never print: 

print($foo, exit); # Obviously not what you want. 

print $foo, exit; # Nor is this. 



# These do the print before evaluating exit: 

(print $foo), exit; # This is what you want. 

print($foo), exit; # Or this. 

print ($foo), exit; # Or even this.

Also note that


print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. A complete discussion of parentheses is beyond the scope of this chapter. See "Named Unary Operators" in the perlop man page for a more complete discussion of parentheses.
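To make the pitfall concrete: because print is followed directly by a left parenthesis, Perl treats ($foo & 255) as the complete argument list, prints that value, and then adds 1 to print's return value and discards the result. The sketch below leaves the problem line commented out and shows two common ways around it; the value 511 is just a sample.

```perl
use strict;
use warnings;

my $foo = 511;

# print ($foo & 255) + 1, "\n";   # parsed as (print($foo & 255)) + 1, "\n"

my $value = ($foo & 255) + 1;     # 511 & 255 is 255, so $value is 256

print +($foo & 255) + 1, "\n";    # unary plus: the parenthesis no longer starts the argument list
print(($foo & 255) + 1, "\n");    # or parenthesize the entire argument list
```

Both print statements write 256 followed by a newline.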

Also parsed as terms are the do{} and eval{} constructs, as well as subroutine and method calls and the anonymous constructors [] and {}.

The Arrow Operator

Just as in C and C++, -> is an infix dereference operator. If the right side is either a [...] or {...} subscript, then the left side must be either a hard or symbolic reference to an array or hash (or a location capable of holding a hard reference, if it's an lvalue). See the perlref man page for a more complete explanation of its use.

Otherwise, the right side is a method name or a simple scalar variable containing the method name, and the left side must either be an object or a class name. See the perlobj man page for a more complete discussion.
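A brief sketch of both uses of the arrow follows. The Counter class is hypothetical and exists only to give the method-call forms something to call; see perlref and perlobj for the full rules.

```perl
use strict;
use warnings;

# Subscript forms: the left side is a reference to an array or hash
my $list = [10, 20, 30];                 # reference to an anonymous array
my $rec  = { name => 'Bob', hits => 3 }; # reference to an anonymous hash

my $second = $list->[1];                 # 20
$rec->{hits}++;                          # hits is now 4

# Method forms: the left side is a class name or an object
{
    package Counter;                     # hypothetical class for illustration
    sub new  { my ($class) = @_; return bless { n => 0 }, $class; }
    sub bump { my ($self) = @_;  return ++$self->{n}; }
}

my $counter = Counter->new;              # class name on the left
$counter->bump;                          # object on the left
my $method = 'bump';
my $total  = $counter->$method();        # method name held in a scalar

print "$second $rec->{hits} $total\n";
```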

Autoincrement and Autodecrement

++ and -- placed before a variable increment or decrement the variable before returning the value. Placed after, they increment or decrement the variable after returning the value.

The autoincrement operator has an extra functionality built into it. If you increment a variable that is numeric, or that has ever been used in a numeric context, you get a normal increment. If, however, the variable has been used only in string contexts since it was set and has a value that has any number of alpha characters followed by any number of numeric characters (/^[a-zA-Z]*[0-9]*$/), the increment is done as a string, preserving each character within its range with carry. Here are some examples:


print ++($foo = '99'); # prints '100' 

print ++($foo = 'a0'); # prints 'a1' 

print ++($foo = 'Az'); # prints 'Ba' 

print ++($foo = 'zz'); # prints 'aaa'

The autodecrement operator does not perform this little trick in reverse.

Exponentiation

** is the exponentiation operator. Note that it binds even more tightly than unary minus, so -2**4 is -(2**4), not (-2)**4.

Symbolic Unary Operators

These are operators represented by single character symbols. Many are equivalent to their counterparts in languages like C, COBOL, or Pascal. Others have unique meaning or usage in Perl.

!
Logical negation, that is, "not."
-
Arithmetic negation if the operand is numeric. If the operand is an identifier, a string consisting of a minus sign concatenated with the identifier is returned. Otherwise, if the string starts with a plus or minus, a string starting with the opposite sign is returned.
~
Bitwise negation (1's complement).
+
Has no effect whatsoever, even on strings.
\
Creates a reference to whatever follows it. See the perlref man page. Do not confuse this behavior with the behavior of a backslash within a string, although both forms do convey the notion of protecting the next thing from interpretation.

Binding Operators

Binding operators bind an expression to a pattern match.

=~
Binds a scalar expression to a pattern match. Certain operations search or modify the string $_ by default. This operator makes that kind of operation work on some other string. The right argument is a search pattern, substitution, or translation. The left argument is what is supposed to be searched, substituted, or translated instead of the default $_. The return value indicates the success of the operation.
!~
Performs just like =~ except the return value is logically negated.
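As a quick illustration of the binding operators with a match, a substitution, and a translation (the header string is sample data, not output from a real server):

```perl
use strict;
use warnings;

my $line = "Content-type: text/html";

# Match against $line instead of the default $_
my $is_header = ($line =~ /^Content-type:/) ? 1 : 0;

# !~ returns the logical negation of the match result
my $not_plain = ($line !~ /plain/) ? 1 : 0;

# Copy first, then bind a substitution or translation to the copy
(my $copy  = $line) =~ s/html/plain/;
(my $upper = $line) =~ tr/a-z/A-Z/;

print "$is_header $not_plain\n$copy\n$upper\n";
```

Copying before binding leaves $line unchanged, which is often what you want when the original value is needed later.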

Perl Built-In Functions

Perl supports a rich set of built-in functions. These functions can be used as terms in an expression. The two categories of functions are list operators and named unary operators.

The difference between the categories is their precedence relationship with a following comma. List operators take more than one argument, whereas unary operators can never take more than one argument.

In the syntax descriptions in Table 14.1, list operators that expect a list are shown with LIST as an argument. Such a list can consist of any combination of scalar arguments or list values; the list values are included in the list as if each individual element were entered at that point in the list, forming a longer single-dimensional list value. Elements of the LIST should be separated by commas.

You can use any function in Table 14.1 with or without parentheses around its arguments.

Perl Functions By Category

Table 14.1 shows the Perl functions by category. Some functions appear in more than one place. Not all of these functions are covered in detail in this chapter. I skip the functions that have no value in most CGI programs or that are more complex. Refer to the perlfunc man page or the Sams books mentioned earlier in the chapter for information about these functions.

Table 14.1. Perl functions by category.

Category: Perl functions
Functions for scalars or strings: chomp, chop, chr, crypt, hex, index, lc, lcfirst, length, oct, ord, pack, q/STRING/, qq/STRING/, reverse, rindex, sprintf, substr, tr///, uc, ucfirst, y///
Regular expressions and pattern matching: m//, pos, quotemeta, s///, split, study
Numeric functions: abs, atan2, cos, exp, hex, int, log, oct, rand, sin, sqrt, srand
Real @ARRAY functions: pop, push, shift, splice, unshift
List data functions: grep, join, map, qw/STRING/, reverse, sort, unpack
Real %HASH functions: delete, each, exists, keys, values
Input and output functions: binmode, close, closedir, dbmclose, dbmopen, die, eof, fileno, flock, format, getc, print, printf, read, readdir, rewinddir, seek, seekdir, select, syscall, sysread, syswrite, tell, telldir, truncate, warn, write
Functions for fixed length data or records: pack, read, syscall, sysread, syswrite, unpack, vec
Functions for filehandles, files, or directories: -X, chdir, chmod, chown, chroot, fcntl, glob, ioctl, link, lstat, mkdir, open, opendir, readlink, rename, rmdir, stat, symlink, umask, unlink, utime
Keywords related to the control of program flow: caller, continue, die, do, dump, eval, exit, goto, last, next, redo, return, sub, wantarray
Scoping keywords: caller, import, local, my, package, use
Miscellaneous functions: defined, dump, eval, formline, local, my, reset, scalar, undef, wantarray
Functions for processes and process groups: alarm, exec, fork, getpgrp, getppid, getpriority, kill, pipe, qx/STRING/, setpgrp, setpriority, sleep, system, times, wait, waitpid
Keywords related to Perl modules: do, import, no, package, require, use
Keywords related to classes and object orientation: bless, dbmclose, dbmopen, package, ref, tie, tied, untie, use
Low-level socket functions: accept, bind, connect, getpeername, getsockname, getsockopt, listen, recv, send, setsockopt, shutdown, socket, socketpair
System V interprocess communication functions: msgctl, msgget, msgrcv, msgsnd, semctl, semget, semop, shmctl, shmget, shmread, shmwrite
Fetching user and group information: endgrent, endhostent, endnetent, endpwent, getgrent, getgrgid, getgrnam, getlogin, getpwent, getpwnam, getpwuid, setgrent, setpwent
Fetching network information: endprotoent, endservent, gethostbyaddr, gethostbyname, gethostent, getnetbyaddr, getnetbyname, getnetent, getprotobyname, getprotobynumber, getprotoent, getservbyname, getservbyport, getservent, sethostent, setnetent, setprotoent, setservent
Time functions: gmtime, localtime, time, times

Alphabetical Listing of Perl Functions

This section presents the basic Perl functions in alphabetical order as a reference. Not all functions in the language are included in detail here.

-X [FILEHANDLE|EXPR]

A file test, where X is one of the letters in the list that follows. This unary operator takes one argument, either a filename or a filehandle, and tests the associated file to see if something is true about it. If the argument is omitted, the expression tests $_, except for -t, which tests STDIN. Unless otherwise documented, this test returns 1 for TRUE, "" for FALSE, or the undefined value if the file doesn't exist. Precedence is the same as any other named unary operator, and the argument may be parenthesized like any other unary operator. The operator may be any of the following:

-r
File is readable by effective uid/gid.
-w
File is writable by effective uid/gid.
-x
File is executable by effective uid/gid.
-o
File is owned by effective uid.
-R
File is readable by real uid/gid.
-W
File is writable by real uid/gid.
-X
File is executable by real uid/gid.
-O
File is owned by real uid.
-e
File exists.
-z
File has zero size.
-s
File has nonzero size (returns size).
-f
File is a plain file.
-d
File is a directory.
-l
File is a symbolic link.
-p
File is a named pipe (FIFO).
-S
File is a socket.
-b
File is a block special file.
-c
File is a character special file.
-t
Filehandle is opened to a tty.
-u
File has setuid bit set.
-g
File has setgid bit set.
-k
File has sticky bit set.
-T
File is a text file.
-B
File is a binary file (opposite of -T).
-M
Age of file in days when script started.
-A
Same for access time.
-C
Same for inode change time.

Note that not all of the preceding operators have meaning in all operating systems. See the Perl man pages for details of using these switches.
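The following sketch exercises a few of the file tests against a scratch file. It assumes the standard File::Temp module is available to create the file; the file contents and the ".nope" suffix used for the missing file are arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the script exits
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close $fh or die "close failed: $!";

my $exists  = -e $path ? 1 : 0;    # the file exists
my $size    = -s $path;            # nonzero size, returned in bytes
my $is_dir  = -d $path ? 1 : 0;    # not a directory
my $plain   = -f $path ? 1 : 0;    # a plain file
my $missing = -e "$path.nope";     # false: no such file

print "exists=$exists size=$size dir=$is_dir plain=$plain\n";
print "missing file test is false\n" unless $missing;
```

In a CGI script, tests such as -e and -r are a cheap way to check a data file before trying to open it.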

abs VALUE

Returns the absolute value of its argument.

accept NEWSOCKET,GENERICSOCKET

Accepts an incoming socket connection, just as the UNIX accept(2) system call does. Returns the packed address if it succeeded, FALSE otherwise.

atan2 Y,X

Returns the arctangent of Y/X in the range -PI to PI.

bind SOCKET,NAME

Binds a network address to a socket, just as the bind system call does. Returns TRUE if it succeeded, FALSE otherwise. NAME should be a packed address of the appropriate type for the socket.

binmode FILEHANDLE

The file identified by FILEHANDLE is read or written in binary mode in operating systems that distinguish between binary and text files. Files that are not in binary mode have CR LF sequences translated to LF on input and LF translated to CR LF on output.

caller [EXPR]

caller returns the context of the current subroutine call. In a scalar context, it returns TRUE if a caller exists, that is, in a subroutine or eval() or require(); otherwise, it returns FALSE. In a list context, returns


($package, $filename, $line) = caller;

With EXPR, caller returns some extra information that the debugger uses to print a stack trace. The value of EXPR indicates how many call frames to go back before the current one.


($package, $filename, $line, $subroutine, $hasargs, $wantargs) = caller($i);

chdir [EXPR]

Changes the working directory to EXPR, if possible. If EXPR is omitted, it changes the working directory to the home directory. Returns TRUE upon success; otherwise, returns FALSE.

chmod LIST

Changes the permissions of a list of files. The first element of the list must be the numerical mode, which should probably be an octal number. Returns the number of files successfully changed.


$cnt = chmod 0755, 'foo', 'bar'; 

chmod 0755, @myfiles;

chomp [VARIABLE|LIST]

chomp is a slightly safer version of chop (see next entry). chomp removes any line ending that corresponds to the current value of $/ and returns the number of characters removed. It's often used to remove the newline from the end of an input record when you're worried that the final record may be missing its newline. When in paragraph mode ($/ = ""), chomp removes all trailing newlines from the string. If VARIABLE is omitted, it chomps $_.


while (<>) { 

       chomp; # avoid \n on last field 

       @array = split(/:/); 

       ... 

}

You can actually chomp anything that's an lvalue, including an assignment:


chomp($cwd = `pwd`);

chomp($answer = <STDIN>);

If you chomp a list, each element is chomped and the total number of characters removed is returned.

chop [VARIABLE|LIST]

Chops off the last character of a string and returns the character chopped. The primary use of chop is to remove the newline from the end of an input record. It neither scans nor copies the string. If VARIABLE is omitted, it chops $_.

If you chop a list, each element is chopped. Only the value of the last chop is returned.

Note that chop returns the last character. To return all but the last character, use


substr($string, 0, -1)
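The difference between chomp and chop is easiest to see side by side. This is a small sketch with made-up strings; it assumes $/ still has its default value of a newline.

```perl
use strict;
use warnings;

my $line = "alpha\n";
my $removed_first  = chomp $line;   # removes the newline, returns 1
my $removed_second = chomp $line;   # nothing left to remove, returns 0

my $word = "beta";
my $last = chop $word;              # removes and returns "a" regardless of what it is

my $rest = substr("gamma", 0, -1);  # all but the last character, original untouched

print "$line $removed_first $removed_second $word $last $rest\n";
```

Calling chomp twice is harmless, whereas a second chop would have eaten a character of real data. That is the sense in which chomp is the safer choice.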

chown LIST

Changes the owner (and group) of a list of files. The first two elements of the list must be the numerical uid and gid, in that order. Returns the number of files successfully changed.

chr NUMBER

Returns the character represented by that NUMBER in the character set. For example, chr(65) is A in ASCII.

close FILEHANDLE

Closes the file or pipe associated with the filehandle, returning TRUE only if stdio successfully flushes buffers and closes the system file descriptor.

FILEHANDLE may be an expression whose value gives the real filehandle name.

closedir DIRHANDLE

Closes a directory opened by opendir().

connect SOCKET,NAME

Attempts to connect to a remote socket, just as the connect system call does. Returns TRUE if successful, FALSE otherwise. NAME should be a packed address of the appropriate type for the socket.

continue BLOCK

Actually a flow control statement rather than a function. If a continue BLOCK is attached to a BLOCK (typically in a while or foreach), the continue statement is always executed just before the conditional is about to be evaluated again, just like the third part of a for loop in C.

cos EXPR

Returns the cosine of EXPR (expressed in radians). If EXPR is omitted, the function takes the cosine of $_.

crypt PLAINTEXT,SALT

Encrypts a string exactly like the crypt(3) function in the C library.

defined EXPR

Returns a Boolean value saying whether EXPR has a real value or not. Many operations return the undefined value under exceptional conditions. This function allows you to distinguish between an undefined null scalar and a defined null scalar with operations that might return a real null string, such as referencing elements of an array.

See also undef.

delete EXPR

Deletes the specified value from its hash array. Returns the deleted value or the undefined value if nothing was deleted. Deleting from $ENV{} modifies the environment. Deleting from an array tied to a DBM file deletes the entry from the DBM file.

The following deletes all the values of an associative array:


foreach $key (keys %ARRAY) { 

       delete $ARRAY{$key}; 

}

die LIST

Outside of an eval(), prints the value of LIST to STDERR and exits with the current value of $! (errno). If $! is 0, exits with the value of ($? >> 8) (the status of the last backtick `command`). If ($? >> 8) is 0, exits with 255. Inside an eval(), the error message is stuffed into $@, and the eval() is terminated with the undefined value; this functionality makes die() the way to raise an exception in a script.

do BLOCK

Not really a function. Returns the value of the last command in the sequence of commands indicated by BLOCK. When modified by a loop modifier, executes the BLOCK once before testing the loop condition.

do SUBROUTINE(LIST)

A deprecated form of subroutine call. See the perlsub man page for more information on subroutines.

do EXPR

Uses the value of EXPR as a filename and executes the contents of the file as a Perl script. Its primary use is to include subroutines from a Perl subroutine library.


do 'stat.pl';

is just like


eval `cat stat.pl`;

except that it's more efficient, more concise, keeps track of the current filename for error messages, and searches all the -I libraries if the file isn't in the current directory. Both statements parse the file every time they are called.

A better way to include library modules is to use the use() and require() operators, which also do error checking and raise an exception if a problem occurs.

dump LABEL

This function causes an immediate core dump.

each ASSOC_ARRAY

Returns a two-element array consisting of the key and value for the next value of an associative array so that you can iterate over it. Entries are returned in an apparently random order. When the array is entirely read, a null array is returned. The following call to each() starts iterating again. The iterator can be reset only by reading all the elements from the array. You should not add elements to an array while you're iterating over it. Each associative array has a single iterator that all each(), keys(), and values() function calls in the program share.
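A typical use of each() is a while loop that runs until the empty list is returned. In this sketch the hash %port and its contents are sample data; the copy into %seen is there only so the result can be printed in a predictable order.

```perl
use strict;
use warnings;

my %port = (http => 80, ftp => 21, smtp => 25);
my %seen;
my $count = 0;

# each returns a (key, value) pair, then an empty list when the hash is exhausted
while (my ($service, $number) = each %port) {
    $seen{$service} = $number;
    $count++;
}

print "$_ => $seen{$_}\n" foreach sort keys %seen;
print "$count entries\n";
```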

eof [FILEHANDLE|()]

Returns 1 if the next read on FILEHANDLE returns end of file or if FILEHANDLE is not open. FILEHANDLE may be an expression whose value gives the real filehandle name.

An eof without an argument uses the last file read as an argument. Empty parentheses may be used to indicate the pseudofile formed of the files listed on the command line. Use eof(ARGV) or eof without the parentheses to test each file in a while (<>) loop.

eval [EXPR|BLOCK]

EXPR is parsed and executed as if it were a little Perl program. It is executed in the context of the current Perl program so that any variable settings, subroutines, or format definitions remain afterwards. The value returned is the value of the last expression evaluated; alternatively, a return statement may be used, just as with subroutines.

Note
eval can be very dangerous in CGI programming. Do not automatically eval anything sent to you by a Web browser.

If a syntax error or runtime error occurs or a die() statement is executed, eval() returns an undefined value and $@ is set to the error message. If no error occurs, $@ is guaranteed to be a null string. If EXPR is omitted, eval evaluates $_. You may omit the final semicolon, if any, from the expression.

Note that because eval() traps otherwise fatal errors, it is useful for determining whether a particular feature (such as socket() or symlink()) is implemented. It is also Perl's exception-trapping mechanism, when the die operator is used to raise exceptions.
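Here is a minimal sketch of exception handling with die and the block form of eval. The subroutine safe_divide is hypothetical. The block form is used because it never compiles text supplied at runtime, which sidesteps the CGI danger mentioned in the note above.

```perl
use strict;
use warnings;

# Hypothetical helper that raises an exception on bad input
sub safe_divide {
    my ($x, $y) = @_;
    die "division by zero\n" if $y == 0;
    return $x / $y;
}

my $ok     = eval { safe_divide(10, 2) };   # no error: value of the block
my $ok_err = $@;                            # null string on success

my $bad     = eval { safe_divide(1, 0) };   # die is trapped: undefined value
my $bad_err = $@;                           # holds the message passed to die

print "ok=$ok\n";
print "trapped: $bad_err" unless defined $bad;
```

Copy $@ into your own variable right away, as shown, because a later eval resets it.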

exec LIST

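The exec() function executes a system command and never returns unless the command cannot be started, in which case it returns FALSE. Use the system() function if you want control to return to your script after the command completes.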

exists EXPR

Returns TRUE if the specified hash key exists in its hash array even if the corresponding value is undefined.


print "Exists\n" if exists $array{$key}; 

print "Defined\n" if defined $array{$key}; 

print "True\n" if $array{$key};

A hash element can be TRUE only if it's defined, and it can be defined only if it exists, but the reverse doesn't necessarily hold true.
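The three tests can be compared directly. In this sketch the hash %h is sample data, and each key is summarized with the letters E (exists), D (defined), and T (true), with a dash for a test that fails.

```perl
use strict;
use warnings;

my %h = (zero => 0, empty => '', none => undef, name => 'perl');
my %report;

foreach my $key (qw(zero empty none name absent)) {
    $report{$key} = join '',
        (exists  $h{$key} ? 'E' : '-'),
        (defined $h{$key} ? 'D' : '-'),
        ($h{$key}         ? 'T' : '-');
}

print "$_: $report{$_}\n" foreach qw(zero empty none name absent);
```

A form field that was submitted empty shows up like the 'empty' key here: it exists and is defined, yet it is false, so test with exists or defined when an empty value is legitimate input.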

exit [EXPR]

Evaluates EXPR and exits immediately with that value. See also die(). If EXPR is omitted, exits with 0 status.

exp [EXPR]

Returns e (the natural logarithm base) to the power of EXPR. If EXPR is omitted, gives exp($_).

fcntl FILEHANDLE,FUNCTION,SCALAR

Implements the fcntl(2) function.

fileno FILEHANDLE

Returns the file descriptor for a filehandle. This function is useful for constructing bitmaps for select(). If FILEHANDLE is an expression, the value is taken as the name of the filehandle.

flock FILEHANDLE,OPERATION

Calls flock(2) on FILEHANDLE. See the flock(2) man page for definition of OPERATION. Returns TRUE for success, FALSE for failure. This function produces a fatal error if it is used on a machine that doesn't implement either flock(2) or fcntl(2).

fork

Does a fork(2) system call. Returns the child process ID (PID) to the parent process and 0 to the child process, or returns undef if the fork is unsuccessful.

Note
Unflushed buffers remain unflushed in both processes, which means you may need to set $| ($AUTOFLUSH in English) or call the autoflush() filehandle method to avoid duplicate output.

getc FILEHANDLE

getc returns the next character from the input file attached to FILEHANDLE or a null string at end of file. If FILEHANDLE is omitted, reads from STDIN. getc is not particularly efficient, and it cannot be used to get unbuffered single characters.

getlogin

Returns the current login from /etc/utmp, if any. If null, use getpwuid().

getpeername SOCKET

Returns the packed sockaddr address of the other end of the SOCKET connection.

getpgrp PID

Returns the current process group for the specified PID. Use a PID of 0 to get the process group of the current process. Raises an exception if used on a machine that doesn't implement getpgrp(2). If PID is omitted, the function returns the process group of the current process.

getppid

Returns the process ID of the parent process.

getpriority WHICH,WHO

Returns the current priority for a process, a process group, or a user. (See the getpriority(2) man page.) Raises a fatal exception if used on a machine that doesn't implement getpriority(2).

getpwnam NAME
getgrnam NAME
gethostbyname NAME
getnetbyname NAME
getprotobyname NAME
getpwuid UID
getgrgid GID
getservbyname NAME,PROTO
gethostbyaddr ADDR,ADDRTYPE
getnetbyaddr ADDR,ADDRTYPE
getprotobynumber NUMBER
getservbyport PORT,PROTO
getpwent
getgrent
gethostent
getnetent
getprotoent
getservent
setpwent
setgrent
sethostent STAYOPEN
setnetent STAYOPEN
setprotoent STAYOPEN
setservent STAYOPEN
endpwent
endgrent
endhostent
endnetent
endprotoent
endservent

All these routines perform the same functions as their counterparts in the system library.

getsockname SOCKET

Returns the packed sockaddr address of this end of the SOCKET connection.


use Socket; 

$mysockaddr = getsockname(SOCK); 

($port, $myaddr) = unpack_sockaddr_in($mysockaddr);

getsockopt SOCKET,LEVEL,OPTNAME

Returns the socket option requested or returns undefined if there is an error.

glob EXPR

Returns the value of EXPR with filename expansions such as a shell would do. This routine is the internal function implementing the <*.*> operator.

gmtime EXPR

Converts a time as returned by the time function to a nine-element array with the time localized for the standard Greenwich time zone. Typically used as follows:


($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime(time);

All array elements are numeric and come straight out of a struct tm. Specifically, $mon has the range 0..11, and $wday has the range 0..6. If EXPR is omitted, gmtime performs the equivalent of gmtime(time()).
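Because the fields come straight from struct tm, two adjustments are needed before printing a date: $mon is zero-based, and $year counts years since 1900. The sketch below formats time value 0, assuming a system whose epoch is 1 January 1970 (as on UNIX); the output format is just one possible layout.

```perl
use strict;
use warnings;

# Time value 0 is the start of the epoch on UNIX-style systems (assumption)
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = gmtime(0);

my @day_name = qw(Sun Mon Tue Wed Thu Fri Sat);

# Add 1900 to $year and 1 to $mon to get calendar values
my $stamp = sprintf("%s %04d-%02d-%02d %02d:%02d:%02d GMT",
                    $day_name[$wday], $year + 1900, $mon + 1, $mday,
                    $hour, $min, $sec);

print "$stamp\n";
```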

goto [LABEL|EXPR|&NAME]

The goto-LABEL form finds the statement labeled with LABEL and resumes execution there. It may not be used to go into any construct that requires initialization, such as a subroutine or a foreach loop, or to go into a construct that is optimized away. Although the goto-LABEL form can be used to go almost anywhere else within the dynamic scope, including out of subroutines, a better method is to use some other construct such as last or die.

The goto-EXPR form expects a label name, whose scope is resolved dynamically.

The goto-&NAME form substitutes a call to the named subroutine for the currently running subroutine. This form is used by AUTOLOAD subroutines that want to load another subroutine and then pretend that the newly loaded subroutine had been called in the first place. After the goto, not even caller() is able to tell which routine was called first.

grep [BLOCK|EXPR], LIST

Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value consisting of those elements for which the expression evaluated to TRUE. In a scalar context, grep returns the number of times the expression was TRUE.
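As a short sketch of grep in both contexts (the @lines data imitates a small configuration file and is invented for the example):

```perl
use strict;
use warnings;

my @lines = ("# comment", "name=bob", "", "# another", "age=42");

# List context: keep the elements that are neither comments nor empty
my @settings = grep { !/^#/ && length($_) } @lines;

# Scalar context: count how many elements matched
my $comments = grep { /^#/ } @lines;

print "@settings\n";
print "$comments comment lines\n";
```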

hex EXPR

Interprets EXPR as a hex string and returns the corresponding decimal value. If EXPR is omitted, the function uses $_.

index STR,SUBSTR[,POSITION]

Returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If POSITION is omitted, starts searching from the beginning of the string. The return value is based at 0 (or whatever you've set the $[ variable to). If the substring is not found, returns one less than the base, ordinarily -1.
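A common use of POSITION is to find every occurrence of a substring by restarting each search just past the previous hit. The following is a minimal sketch that assumes $[ has not been changed, so a failed search returns -1; the query string is sample data.

```perl
use strict;
use warnings;

my $query = "a=1&b=2&c=3";
my @offsets;

# Start at the beginning, then resume one character past each hit.
my $pos = index($query, "&");
while ($pos != -1) {
    push @offsets, $pos;
    $pos = index($query, "&", $pos + 1);
}

print "separators at: @offsets\n";   # separators at: 3 7
```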

int EXPR

Returns the integer portion of EXPR. If EXPR is omitted, uses $_.

ioctl FILEHANDLE,FUNCTION,SCALAR

Implements the ioctl(2) function.

join EXPR,LIST

Joins the separate strings of LIST or ARRAY into a single string, with fields separated by the value of EXPR, and returns the string. This routine can be used to create delimited records for inclusion in databases. For example, given a form that returns three variables that have been parsed into the hash %in, the following code


$in{'var1'} = 'Last Name';

$in{'var2'} = 'First Name';

$in{'var3'} = 'Middle Name';

$dbrec = join(',', $in{'var1'}, $in{'var2'}, $in{'var3'});

results in


Last Name,First Name,Middle Name

See split.
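Because join and split are inverses, a delimited record can be taken apart again with the same delimiter. This sketch reuses the field values from the example above; the names @fields and @back are illustrative. Note that the round trip only works when no field contains the delimiter itself.

```perl
use strict;
use warnings;

my @fields = ('Last Name', 'First Name', 'Middle Name');

my $dbrec = join(',', @fields);    # build the delimited record
my @back  = split(/,/, $dbrec);    # recover the individual fields

print "$dbrec\n";                  # Last Name,First Name,Middle Name
print scalar(@back), " fields\n";  # 3 fields
```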

keys ASSOC_ARRAY

Returns a normal array consisting of all the keys of the named associative array. (In a scalar context, returns the number of keys.) The keys are returned in an apparently random order, but it is the same order as either the values() or each() function produces.

This routine can be very useful in processing lists of key/value pairs. For example, given an associative array called %stuff, the following code prints the keys and their values:


foreach $key (keys %stuff) {

       print "$key = $stuff{$key}\n";

}

kill LIST

Sends a signal to a list of processes. The first element of the list must be the signal to send. Returns the number of processes successfully signaled.

last [LABEL]

The last command is like the break statement in C. It immediately exits the loop in question. If the LABEL is omitted, the command refers to the innermost enclosing loop. The continue block, if any, is not executed:


LINE: while (<STDIN>) { 

       last LINE if /^$/; # exit when done with header 

       ...

}

lc EXPR

Returns a lowercased version of EXPR.

lcfirst EXPR

Returns the value of EXPR with the first character lowercased.

length EXPR

Returns the length in characters of the value of EXPR. If EXPR is omitted, returns the length of $_. Remember that unless you have reset $[, string offsets are zero based, so the value returned by length is one greater than the offset of the last character in the string.

link OLDFILE,NEWFILE

Creates a new filename linked to the old filename. Returns 1 for success, 0 otherwise. (Note, link might not be implemented on all operating systems.)

listen SOCKET,QUEUESIZE

Does the same thing that the listen system call does. Returns TRUE if it succeeded, FALSE otherwise.

local EXPR

A local modifies the listed variables to be local to the enclosing block, subroutine, eval{}, or do. If more than one value is listed, the list must be placed in parentheses.

localtime EXPR

Converts a time as returned by the time function to a nine-element array with the time analyzed for the local time zone. Typically used as follows:


($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);

All array elements are numeric and come straight out of a struct tm. In particular, $mon has the range 0..11 and $wday has the range 0..6. If EXPR is omitted, does localtime(time).

In a scalar context, returns the ctime(3) value:


$now_string = localtime; # e.g. "Thu Oct 13 04:54:34 1994"

log EXPR

Returns logarithm (base e) of EXPR. If EXPR is omitted, returns log of $_.

lstat [FILEHANDLE|EXPR]

Does the same thing as the stat() function, but performs the stat function on a symbolic link instead of the file to which the symbolic link points. If symbolic links are not implemented on your system, a normal stat() occurs.

m// or //

The match operator. See the section, "Perl Regular Expressions," for more details on the match operator and its available options.

map [BLOCK LIST|EXPR,LIST]

Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value composed of the results of each such evaluation. Evaluates BLOCK or EXPR in a list context, so each element of LIST may produce zero, one, or more elements in the returned value.


@chars = map(chr, @nums);

translates a list of numbers to the corresponding characters.
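Because the BLOCK is evaluated in a list context, one input element can contribute several output elements or none at all. The sketch below shows both cases with made-up sample data.

```perl
use strict;
use warnings;

my @pairs = ("name=Bob", "lang=Perl");

# Each element yields two values (key and value), so the result fills a hash.
my %in = map { split(/=/, $_, 2) } @pairs;

# An element may also yield nothing: the empty list drops the odd numbers.
my @doubled_evens = map { $_ % 2 ? () : $_ * 2 } 1 .. 6;

print "$in{name} $in{lang} @doubled_evens\n";   # Bob Perl 4 8 12
```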

mkdir FILENAME,MODE

Creates the directory specified by FILENAME with permissions specified by MODE (as modified by umask). If it succeeds, mkdir returns 1; otherwise, it returns 0 and sets $! (errno). MODE varies depending on the operating system implementation.

msgctl ID,CMD,ARG

Calls the System V IPC function msgctl(2). If CMD is &IPC_STAT, then ARG must be a variable that holds the returned msqid_ds structure. Returns values like ioctl: the undefined value for error, "0 but true" for zero, or the actual return value.

msgget KEY,FLAGS

Calls the System V IPC function msgget(2). Either returns the message queue ID or returns the undefined value if an error occurs.

msgsnd ID,MSG,FLAGS

Calls the System V IPC function msgsnd to send the message MSG to the message queue ID. MSG must begin with the long integer message type, which may be created with pack("l", $type). Returns TRUE if successful; returns FALSE if an error occurs.

msgrcv ID,VAR,SIZE,TYPE,FLAGS

Calls the System V IPC function msgrcv to receive a message from message queue ID into variable VAR with a maximum message size of SIZE. If a message is received, the message type is the first thing in VAR; the maximum length of VAR is SIZE plus the size of the message type. Returns TRUE if successful or returns FALSE if an error occurs.

my EXPR

A my declares the listed variables to be local (lexically) to the enclosing block, subroutine, eval, or do/require/use file. If more than one value is listed, the list must be placed in parentheses.
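The difference between my and local shows up when a subroutine calls another subroutine. In this sketch (all names are illustrative), local gives a package variable a temporary value that called subroutines can see, whereas my creates a new lexical variable that is invisible outside its block. The our declaration is used only to satisfy use strict.

```perl
use strict;
use warnings;

our $level = "global";          # package variable, visible to show()

sub show { return $level }      # sees whatever value is current at call time

sub with_local {
    local $level = "local";     # temporary value, visible to called subs
    return show();
}

sub with_my {
    my $level = "my";           # new lexical variable, invisible to show()
    return show();
}

print with_local(), " ", with_my(), " ", show(), "\n";   # local global global
```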

next [LABEL]

The next command is like the continue statement in C; it starts the next iteration of the loop:


LINE: while (<STDIN>) { 

       next LINE if /^#/; # discard comments 

       ... 

}

Note that if the preceding code contained a continue block, the block would be executed even on discarded lines. If the LABEL is omitted, the command refers to the innermost enclosing loop.
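The interaction of next, last, and a continue block can be seen in a small sketch. The input list and the counter $seen are illustrative; the loop runs over a list rather than STDIN so that the result is predictable.

```perl
use strict;
use warnings;

my @input = ("# header", "one", "# note", "two", "", "three");
my @kept;
my $seen = 0;

LINE: foreach my $line (@input) {
    last LINE if $line eq "";      # stop at the first empty line
    next LINE if $line =~ /^#/;    # skip comment lines
    push @kept, $line;
}
continue {
    $seen++;    # runs after next and after a normal pass, but not after last
}

print "kept: @kept, seen: $seen\n";   # kept: one two, seen: 4
```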

no Module LIST

This function is the opposite of the use function. See the use function.

oct EXPR

Interprets EXPR as an octal string and returns the corresponding decimal value. (If EXPR happens to begin with 0x, this function interprets it as a hex string instead.)

If EXPR is omitted, uses $_.
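A brief sketch of oct and its companion hex follows. The variable names are illustrative; the octal string is the kind of value typically written for a file permission mode.

```perl
use strict;
use warnings;

my $mode  = oct("755");     # octal string
my $mask  = oct("0x1F");    # leading 0x: treated as hex
my $color = hex("ff");      # hex string, no prefix needed

print "$mode $mask $color\n";   # 493 31 255
```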

open FILEHANDLE[,EXPR]

Opens the file whose filename is given by EXPR and associates it with FILEHANDLE. If FILEHANDLE is an expression, its value is used as the name of the real filehandle wanted. If EXPR is omitted, the scalar variable of the same name as the FILEHANDLE contains the filename. The following characters have special meaning if they begin the filename:

< or nothing    Opened for input
>               Opened for output
>>              Opened for appending

You can put a + in front of the > or < to indicate that you want both read and write access to the file. Thus, +< is usually preferred for read/write updates, because the +> mode would clobber the file first. These indicators correspond to the fopen(3) modes of r, r+, w, w+, a, and a+.

If the filename begins with a vertical bar (|), the filename is interpreted as a command to which output is to be piped; and if the filename ends with a |, the filename is interpreted as a command from which input will be piped.

Opening the filename - opens STDIN, and opening >- opens STDOUT. The open function returns nonzero upon success and returns the undefined value otherwise. If the open involved a pipe, the return value is the process ID (PID) of the subprocess.
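The three basic modes can be tried with a scratch file. This is a sketch, not a recommended pattern for production code: the file name is hypothetical, it is placed in the system temporary directory by way of the File::Spec module, and each open is checked so that a failure reports $!.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical scratch file; $$ (the process ID) keeps the name unique.
my $file = File::Spec->catfile(File::Spec->tmpdir, "open_demo_$$.txt");

open(OUT, ">$file")  || die "cannot create $file: $!";      # output, truncates
print OUT "first\n";
close(OUT);

open(OUT, ">>$file") || die "cannot append to $file: $!";   # append
print OUT "second\n";
close(OUT);

open(IN, "<$file")   || die "cannot read $file: $!";        # input
my @lines = <IN>;
close(IN);

unlink($file);
print @lines;
```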

opendir DIRHANDLE,EXPR

Opens a directory named EXPR for processing by readdir(), telldir(), seekdir(), rewinddir(), and closedir(). Returns TRUE if successful. DIRHANDLEs have their own namespaces separate from FILEHANDLEs.

ord EXPR

Returns the numeric ASCII value of the first character of EXPR. If EXPR is omitted, uses $_.

pack TEMPLATE,LIST

Takes an array or list of values and packs it into a binary structure, returning the string containing the structure. The TEMPLATE is a sequence of characters that give the order and type of values, as follows:

A    ASCII string, space padded
a    ASCII string, null padded
b    Bit string, ascending bit order
B    Bit string, descending bit order
h    Hex string, low nybble first
H    Hex string, high nybble first
c    Signed char value
C    Unsigned char value
s    Signed short value
S    Unsigned short value
i    Signed integer value
I    Unsigned integer value
l    Signed long value
L    Unsigned long value
n    Short in "network" order
N    Long in "network" order
v    Short, little-endian order
V    Long, little-endian order
f    Single-precision float, native format
d    Double-precision float, native format
p    Pointer to null-terminated string
P    Pointer to a structure (fixed-length string)
u    Uuencoded string
x    Null byte
X    Back up a byte

Each letter may optionally be followed by a number that gives a repeat count. With all types except a, A, b, B, h, H, and P, the pack function gobbles up that many values from the list. A * for the repeat count means to use however many items are left. The a and A types gobble just one value but pack it as a string of length count, padding with nulls or spaces as necessary. (When unpacking, A strips trailing spaces and nulls, but a does not.) Likewise, the b and B fields pack a string that many bits long, and the h and H fields pack a string that many nybbles long. The P type packs a pointer to a structure of the size indicated by the length.

Real numbers (floats and doubles) are in the native machine format only; because of the large number of floating-point formats and the lack of a standard network representation, no facility for interchange has been provided. Therefore, packed floating-point data written on one machine may not be readable on another, even if both use IEEE floating-point arithmetic (the "endian-ness" of the memory representation is not part of the IEEE specification). Note that Perl uses doubles internally for all numeric calculations, and converting from double to float and back to double again inevitably loses precision (for example, unpack("f", pack("f", $foo)) does not in general equal $foo).

You can generally use the same template in the unpack function.
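As a small sketch of a template in use, the following code packs a three-field record and unpacks it again with the same template. The record layout (a 4-byte tag, a 16-bit width, and a 1-byte flag) is invented for the example.

```perl
use strict;
use warnings;

# A4: 4-byte space-padded string; n: 16-bit "network" short; C: unsigned char.
my $record = pack("A4 n C", "GIF", 640, 7);

# The same template reverses the operation; A strips the trailing space.
my ($tag, $width, $flag) = unpack("A4 n C", $record);

printf "%d bytes: tag=%s width=%d flag=%d\n",
       length($record), $tag, $width, $flag;
# 7 bytes: tag=GIF width=640 flag=7
```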

package NAMESPACE

Declares the compilation unit as being in the given NAMESPACE. The scope of the package declaration is from the declaration itself through the end of the enclosing block (the same scope as the local() operator).

pipe READHANDLE,WRITEHANDLE

Opens a pair of connected pipes like the corresponding system call. Note that if you set up a loop of piped processes, deadlock can occur unless you are very careful. In addition, note that Perl's pipes use stdio buffering, so you may need to set $| to flush your WRITEHANDLE after each command, depending on the application.

pop ARRAY

Pops and returns the last value of the array, shortening the array by 1. If the array is empty, returns the undefined value. If ARRAY is omitted, pops the @ARGV array in the main program and the @_ array in subroutines, just like shift().

pos SCALAR

Returns the offset of where the last m//g search left off for the variable in question. (m//g searches for all occurrences of the regular expression in a line.) Can be modified to change that offset.
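The following sketch records the offset reported by pos after each successful m//g match. The sample string and the array @ends are illustrative.

```perl
use strict;
use warnings;

my $text = "x=1, y=22, z=333";
my @ends;

# Each successful m//g match leaves pos() just past the matched text.
while ($text =~ /\d+/g) {
    push @ends, pos($text);
}

print "matches end at: @ends\n";   # matches end at: 3 9 16
```

When the match finally fails, the offset is reset, so pos($text) is undefined after the loop.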

print [FILEHANDLE LIST|LIST]

Prints a string or a comma-separated list of strings. Returns TRUE if successful. FILEHANDLE may be a scalar variable name, in which case the variable contains the name of or a reference to the filehandle. If FILEHANDLE is omitted, prints to standard output or to the last selected output channel. If LIST is also omitted, prints $_ to STDOUT. To set the default output channel to something other than STDOUT, use the select operation. Note that, because print takes a LIST, anything in the LIST is evaluated in a list context, and any subroutine that you call has one or more of its expressions evaluated in a list context. Also, be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print.

push ARRAY,LIST

Treats ARRAY as a stack and pushes the values of LIST onto the end of ARRAY. The length of ARRAY increases by the length of LIST. Returns the new number of elements in the array.

q[q|x|w]/STRING/

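Generalized quotes. The q// form is a single-quoted string, qq// is a double-quoted (interpolated) string, qx// executes STRING as a command in the same way as backticks, and qw// returns a list of the whitespace-separated words in STRING. Any delimiter may be used in place of the slashes.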

quotemeta EXPR

Returns the value of EXPR with all regular expression metacharacters backslashed.

rand [EXPR]

Returns a random fractional number between 0 and the value of EXPR. (EXPR should be positive.) If EXPR is omitted, returns a value between 0 and 1. This function produces repeatable sequences unless srand() is invoked. See also srand.

read FILEHANDLE,SCALAR,LENGTH[,OFFSET]

Attempts to read LENGTH bytes of data into variable SCALAR from the specified FILEHANDLE. Returns the number of bytes actually read or returns undef if an error occurs. SCALAR is grown or shrunk to the length actually read. An OFFSET may be specified to place the read data at some place other than the beginning of the string. This call is actually implemented in terms of stdio's fread call. To get a true read system call, see sysread.

readdir DIRHANDLE

Returns the next directory entry for a directory opened by opendir(). If used in a list context, returns all the rest of the entries in the directory. If there are no more entries, returns an undefined value in a scalar context or a null list in a list context.
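A typical directory scan reads all entries in a list context and filters out the . and .. entries. In this sketch the scratch directory and the file names in it are hypothetical and are created only so that the result is predictable.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical scratch directory; $$ (the process ID) keeps the name unique.
my $dir = File::Spec->catdir(File::Spec->tmpdir, "readdir_demo_$$");
mkdir($dir, 0755) || die "cannot mkdir $dir: $!";

foreach my $name ("b.html", "a.html") {
    my $path = File::Spec->catfile($dir, $name);
    open(OUT, ">$path") || die "cannot create $path: $!";
    close(OUT);
}

opendir(DIR, $dir) || die "cannot opendir $dir: $!";
# List context returns every remaining entry, including "." and "..".
my @entries = sort grep { $_ ne "." && $_ ne ".." } readdir(DIR);
closedir(DIR);

# Clean up the scratch files and directory.
foreach my $name (@entries) {
    unlink(File::Spec->catfile($dir, $name));
}
rmdir($dir);

print "@entries\n";   # a.html b.html
```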

readlink EXPR

If symbolic links are implemented, readlink returns the value of a symbolic link. If not, it gives a fatal error. If a system error occurs, readlink returns the undefined value and sets $! (errno). If EXPR is omitted, uses $_.

recv SOCKET,SCALAR,LEN,FLAGS

Receives a message on a socket. Attempts to receive LEN bytes of data into the variable SCALAR from the specified SOCKET filehandle. Returns the address of the sender. Returns the undefined value if an error occurs. SCALAR is grown or shrunk to the length actually read. Takes the same flags as the system call of the same name.

redo [LABEL]

The redo command restarts the loop block without evaluating the conditional again. The continue block, if any, is not executed. If the LABEL is omitted, the command refers to the innermost enclosing loop.

ref EXPR

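Returns a TRUE value if EXPR is a reference, FALSE otherwise. The TRUE value is a string that names the type of thing EXPR refers to. The built-in types that EXPR can reference include REF, SCALAR, ARRAY, HASH, CODE, and GLOB. If the referenced thing has been blessed into a package, that package name is returned instead.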
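The strings returned by ref can be seen by passing it one reference of each common kind. This is a minimal sketch; the sample values and the "not a reference" label are illustrative.

```perl
use strict;
use warnings;

my @things = (\"text", [1, 2], { a => 1 }, sub { 1 }, \\"x", "plain");

# ref returns a type name for references and a FALSE value for anything else.
my @kinds = map { ref($_) || "not a reference" } @things;

print join(", ", @kinds), "\n";
# SCALAR, ARRAY, HASH, CODE, REF, not a reference
```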

rename OLDNAME,NEWNAME

Changes the name of a file. Returns 1 for success, 0 otherwise. Does not work across file system boundaries.

require [EXPR]

Demands some semantics specified by EXPR or by $_ if EXPR is not supplied. If EXPR is numeric, demands that the current version of Perl ($] or $PERL_VERSION) be equal to or greater than EXPR.

Otherwise, demands that a library file be included if it hasn't already been included.

Note that the file is not included twice under the same specified name. The file must return TRUE as the last statement to indicate successful execution of any initialization code, so it's customary to end such a file with 1;.

If EXPR is a bare word, require assumes a .pm extension to enable you to load standard modules without altering your namespace.

reset [EXPR]

Generally used in a continue block at the end of a loop to clear variables and reset ?? searches so that they work again. The expression is interpreted as a list of single characters (hyphens are allowed for ranges). All variables and arrays beginning with one of those letters are reset to their pristine state. If the expression is omitted, one-match searches (?pattern?) are reset to match again. Only resets variables or searches in the current package and always returns 1.

return LIST

Returns from a subroutine or eval with the value specified. In the absence of a return, a subroutine or eval() automatically returns the value of the last expression evaluated.

reverse LIST

In a list context, returns a list value consisting of the elements of LIST in the opposite order. In a scalar context, returns a string value consisting of the bytes of the first element of LIST in the opposite order.
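The context dependence is easy to trip over, because print supplies a list context to its arguments. The sketch below shows both contexts with a single string argument.

```perl
use strict;
use warnings;

my @list = reverse("a", "b", "c");       # list context: ("c", "b", "a")
my $word = reverse("hello");             # scalar context: "olleh"

# print supplies list context, so force scalar context to reverse the string.
print scalar(reverse("hello")), "\n";    # olleh
print "@list $word\n";                   # c b a olleh
```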

rewinddir DIRHANDLE

Sets the current position to the beginning of the directory for the readdir() routine on DIRHANDLE.

rindex STR,SUBSTR[,POSITION]

Works just like index except that it returns the position of the last occurrence of SUBSTR in STR. If POSITION is specified, returns the last occurrence at or before that position.
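One common use of rindex is to split a path at its last separator. The following sketch uses a made-up path; the variable names are illustrative.

```perl
use strict;
use warnings;

my $path = "/cgi-bin/forms/guest.pl";

my $slash  = rindex($path, "/");               # last slash in the path
my $file   = substr($path, $slash + 1);        # everything after it
my $parent = rindex($path, "/", $slash - 1);   # last slash before that one

my $dot = rindex($file, ".");
my $ext = $dot >= 0 ? substr($file, $dot + 1) : "";

print "$file $ext $slash $parent\n";   # guest.pl pl 14 8
```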

rmdir [FILENAME]

Deletes the directory specified by FILENAME if it is empty. If it succeeds, rmdir returns 1; otherwise, rmdir returns 0 and sets $! (errno). If FILENAME is omitted, uses $_.

s///

The substitution operator. See the section "Perl Regular Expressions" for more detail on the substitution operator and its available options.

scalar EXPR

Forces EXPR to be interpreted in a scalar context and returns the value of EXPR.

seek FILEHANDLE,POSITION,WHENCE

Randomly positions the file pointer for FILEHANDLE, just like the C fseek() call of stdio. FILEHANDLE may be an expression whose value gives the name of the filehandle. The values for WHENCE are

0    Set file pointer to POSITION
1    Set file pointer to current position plus POSITION
2    Set file pointer to EOF plus POSITION

You can use the values SEEK_SET, SEEK_CUR, and SEEK_END from the POSIX module for WHENCE. Returns 1 on success and 0 otherwise.
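A short sketch of seek with the POSIX constants follows. The scratch file name is hypothetical, and the file is opened with +> so that it can be written and then read back through the same filehandle.

```perl
use strict;
use warnings;
use POSIX qw(SEEK_SET SEEK_END);
use File::Spec;

# Hypothetical scratch file; $$ (the process ID) keeps the name unique.
my $file = File::Spec->catfile(File::Spec->tmpdir, "seek_demo_$$.txt");

open(FH, "+>$file") || die "cannot open $file: $!";   # read/write, truncates
print FH "0123456789";

seek(FH, 3, SEEK_SET)  || die "seek failed: $!";   # 3 bytes from the start
read(FH, my $middle, 4);

seek(FH, -2, SEEK_END) || die "seek failed: $!";   # 2 bytes before EOF
read(FH, my $tail, 2);

close(FH);
unlink($file);

print "$middle $tail\n";   # 3456 89
```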

seekdir DIRHANDLE,POS

Sets the current position for the readdir() routine on DIRHANDLE. POS must be a value returned by telldir().

select [FILEHANDLE]

Returns the currently selected filehandle. If FILEHANDLE is supplied, select sets the current default filehandle for output. This action has two effects. First, a write or a print without a filehandle defaults to this FILEHANDLE. Second, references to variables related to output refer to this output channel.
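Both effects can be seen in a sketch that temporarily makes STDERR the default output channel, turns on autoflush for it through $|, and then restores the earlier default. The variable name $previous is illustrative.

```perl
use strict;
use warnings;

my $previous = select(STDERR);   # make STDERR the default; remember the old one
$| = 1;                          # $| now applies to STDERR: flush after each write
print "this line goes to STDERR\n";

select($previous);               # restore the earlier default
print "this line goes to the restored default\n";
```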

select RBITS,WBITS,EBITS,TIMEOUT

Calls the select(2) system call with the bit masks specified, which can be constructed using fileno() and vec().

semctl ID,SEMNUM,CMD,ARG

Calls the System V IPC function semctl. If CMD is &IPC_STAT or &GETALL, then ARG must be a variable that holds the returned semid_ds structure or semaphore value array. Like ioctl, semctl returns the undefined value for error, "0 but true" for zero, or the actual return value otherwise.

semget KEY,NSEMS,FLAGS

Calls the System V IPC function semget. Returns the semaphore ID; if an error occurs, returns the undefined value.

semop KEY,OPSTRING

Calls the System V IPC function semop to perform semaphore operations such as signaling and waiting. OPSTRING must be a packed array of semop structures. Each semop structure can be generated with pack("sss", $semnum, $semop, $semflag). The number of semaphore operations is implied by the length of OPSTRING. Returns TRUE if successful; returns FALSE if an error occurs.

send SOCKET,MSG,FLAGS[,TO]

Sends a message on a socket. Takes the same flags as the system call of the same name. On unconnected sockets, you must specify a destination to send TO, in which case send does a C sendto(). Returns the number of characters sent or the undefined value if an error occurs.

setpgrp PID,PGRP

Sets the current process group for the specified PID, 0 for the current process. Produces a fatal error if used on a machine that doesn't implement setpgrp(2).