Tuesday, July 1, 2008

LEARN PERL (CLASS 3)

String Functions




CHAPTER 16: STRING FUNCTIONS


String Manipulation

- Perl provides several functions to perform various operations
on strings

- These are similar to the corresponding awk built-in string
functions


Length Function

- Returns the length in characters of an expression evaluated in
a scalar context

- length (SCALAR)
length SCALAR

- If SCALAR is omitted, the length of $_ is returned

- Ex.

$x = length ("toy1.c"); # $x is 6
$x = length (6 + 6); # $x is 2
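
A short sketch of length with and without an argument. The variable names ($name, @lengths, $digits) are made up for this illustration.

```perl
use strict;
use warnings;

my $name = "toy1.c";
my $len  = length($name);          # 6 characters

# With no argument, length works on $_
my @lengths;
for ("a", "bb", "ccc") {
    push @lengths, length;         # length of the current $_
}

# A numeric expression is evaluated, then measured as a string
my $digits = length(6 + 6);        # "12" has 2 characters

print "$len @lengths $digits\n";   # prints "6 1 2 3 2"
```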


Index Function

- Returns the position (starting at 0) of the first (leftmost)
occurrence of a substring (SUBSTRING) in a string (STRING)

- index (STRING, SUBSTRING, POSITION)
index (STRING, SUBSTRING)

- If the substring is not found in the string, -1 is returned

- If POSITION is specified, the search starts at that position and
the returned value will be greater than or equal to POSITION
(or -1)

- Ex.

$x = index ("testing", "t"); # $x is 0
$x = index ("testing", "bob"); # $x is -1
$x = index ("testing", "t", 2); # $x is 3
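
The POSITION argument makes it easy to walk through every occurrence of a substring. The following is only a sketch; the sample string and the names $string, @positions and $pos are assumptions for illustration.

```perl
use strict;
use warnings;

my $string = "testing the tests";
my @positions;

my $pos = index($string, "t");
while ($pos != -1) {
    push @positions, $pos;
    # Restart the search one character past the last hit
    $pos = index($string, "t", $pos + 1);
}

print "@positions\n";   # prints "0 3 8 12 15"
```

The loop stops as soon as index returns -1, which is why the "not found" value is tested explicitly instead of testing for truth (position 0 is a valid hit but is false).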


Rindex Function

- Returns the position (starting at 0) of the last (rightmost)
occurrence of a substring (SUBSTRING) in a string (STRING)

- rindex (STRING, SUBSTRING, POSITION)
rindex (STRING, SUBSTRING)

- If the substring is not found in the string, -1 is returned

- If POSITION is specified, then it is the rightmost position
that can be returned. So the returned value will be less than
or equal to POSITION (or -1).

- Note that the returned value of rindex is still the position of
the substring from the LEFT end of the string

- Ex.

$x = rindex ("testing", "t"); # $x is 3
$x = rindex ("testing", "t", 2); # $x is 0


Substr Function

- Extracts a substring from a string

- substr (STRING, OFFSET, LENGTH)
substr (STRING, OFFSET)

- Extracts the substring starting at position OFFSET of length
LENGTH from the string STRING

- If LENGTH is not specified, everything to the end of the string
is extracted. If LENGTH is zero, the null string is returned. In
Perl 5, a negative LENGTH leaves that many characters off the end
of the string. The extracted characters NEVER go beyond the end of
the string.

- If OFFSET exceeds the string length, Perl 5 returns undef (and
issues a warning if warnings are enabled).
If OFFSET is a negative number, the extraction begins at OFFSET
characters from the end of the string. If a negative OFFSET
would cause the extraction to begin before the start of the string,
an offset of 0 is used.

- Ex.

$x = substr ("testing", 2); # $x is "sting"
$x = substr ("testing", 2, 3); # $x is "sti"
$x = substr ("testing", -2, 3); # $x is "ng"
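
Since rindex always reports a position counted from the LEFT end, its result can be fed straight into substr. Here is a rough sketch that splits a file path into pieces; the path and all variable names are hypothetical.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/toy1.c";       # hypothetical path

my $slash = rindex($path, "/");           # 14, position of the last slash
my $dir   = substr($path, 0, $slash);     # "/usr/local/bin"
my $file  = substr($path, $slash + 1);    # "toy1.c"

my $dot   = rindex($file, ".");           # 4
my $base  = substr($file, 0, $dot);       # "toy1"
my $ext   = substr($file, -1);            # "c" (negative OFFSET)
my $trim  = substr($file, 0, -2);         # "toy1" (negative LENGTH, Perl 5)

print "$dir | $file | $base | $ext | $trim\n";
```

For real path handling the File::Basename module that ships with Perl is the safer choice; the point here is only to show the offsets.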


Using Substr As An Lvalue

- If the STRING argument to the substr function is a scalar variable,
substr can itself be used on the left side of an assignment

- In this case, that part of the string which would have been
extracted is changed. The original string automatically grows
or shrinks as appropriate.

- This method can be more efficient than string concatenation,
since the existing string is changed in place

- Ex.

$x = "Testing";
substr ($x, 4) = "ed"; # $x is now "Tested"

$x = "Testing";
substr ($x, 0, 0) = "Start "; # $x is now "Start Testing"
# A way to prepend!

$x = "Testing";
substr ($x, length ($x), 0) = " Over";
# $x is now "Testing Over"
# A way to append!


Note, however, that if the offset is more than the length of
the string, the assignment fails! In Perl 5 this is a fatal
"substr outside of string" error, so $x is never changed.

$x = "Testing";
substr ($x, length ($x) + 1, 0) = " Over";
# dies: substr outside of string
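
In Perl 5, an lvalue substr that lies entirely outside the string raises a fatal "substr outside of string" error, so it is worth guarding with eval when the offset is computed. The sketch below also shows the four-argument form of substr, which is not covered in these notes but does the same replacement and returns the text it replaced. All variable names are made up for the example.

```perl
use strict;
use warnings;

my $x = "Testing";
substr($x, 0, 4) = "Rest";                 # $x is now "Resting"

# Four-argument form: replace in place, get the old text back
my $y   = "Testing";
my $old = substr($y, 4, 3, "ed");          # $old is "ing", $y is "Tested"

# An offset past the end of the string is fatal, so trap it
my $ok = eval {
    substr($x, length($x) + 1, 0) = " Over";
    1;
};
my $error = $ok ? "" : $@;                 # "substr outside of string at ..."

print "$x | $y | $old\n";                  # prints "Resting | Tested | ing"
```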


Sprintf Function

- Returns a string formatted by the usual "printf" format
specifications

- sprintf (FORMAT, LIST)

- Sprintf is useful in many circumstances. In particular, consider
the following. Suppose you want to invoke the system function as
follows:

system ("/bin/chmod 0755 toy1");

(BTW, it is MUCH better to use the chmod function to do the above,
but for the sake of an example, bear with me!)

Suppose the mode is stored in a scalar variable:

$mode = 0755;

If you use:

system ("/bin/chmod $mode toy1");

Perl expands $mode as a DECIMAL value (493 in this case) and
/bin/chmod complains about an invalid mode. To solve this
problem use sprintf to create a string with the proper octal
value for the mode:

$string = sprintf ("/bin/chmod %o toy1", $mode);
system ($string);
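
To see the difference without actually running chmod, the strings can simply be built and compared. This is only a sketch; $interpolated, $command, $padded and $row are names invented here, and the last line just shows a few other common printf-style conversions.

```perl
use strict;
use warnings;

my $mode = 0755;                                        # octal literal, value 493

my $interpolated = "/bin/chmod $mode toy1";             # "/bin/chmod 493 toy1"
my $command      = sprintf("/bin/chmod %o toy1", $mode);# "/bin/chmod 755 toy1"
my $padded       = sprintf("%04o", $mode);              # "0755"

# Left-justified string, fixed-point number, zero-padded integer
my $row = sprintf("%-8s|%5.2f|%03d", "toy1", 3.14159, 7);

print "$interpolated\n$command\n$padded\n$row\n";
```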


Hex Function

- Returns the decimal value of an expression interpreted as
a hex string

- hex (EXPR)
hex EXPR

- If EXPR is omitted, uses $_

- The hex function is used to convert input data in hex format
to the proper numeric value

- The hex function can handle strings with or without a leading
0x or 0X

- Ex.

$x = hex ("0xa2"); # $x is 162
$x = hex ("a2"); # $x is 162
$x = hex (0xa2); # $x is 354 (!)
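
A typical use is turning a list of hex strings from input into numbers. The sketch below assumes the input has already been read into @input (a made-up name); sprintf with %x goes the other way.

```perl
use strict;
use warnings;

my @input  = ("ff", "0x1F", "A2");        # sample hex strings
my @values = map { hex($_) } @input;      # 255, 31, 162

my $sum = 0;
$sum += $_ for @values;                   # 448

my $back = sprintf("%x", $values[2]);     # "a2" (lowercase, no 0x prefix)

print "@values $sum $back\n";             # prints "255 31 162 448 a2"
```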


Oct Function

- Returns the decimal value of an expression interpreted as
an octal string

- oct (EXPR)
oct EXPR

- If EXPR is omitted, uses $_

- The oct function is used to convert input data in octal format
to the proper numeric value

- The oct function can also handle strings with a leading 0x or
0X

- Ex.

$x = oct ("042"); # $x is 34
$x = oct ("42"); # $x is 34
$x = oct ("0x42"); # $x is 66
$x = oct (042); # $x is 28 (!)
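
oct is the natural partner of sprintf's %o when a file mode arrives as text. The following sketch assumes the mode string came from user input; the variable names are invented, and the 0b (binary) prefix shown on the last conversion is something Perl 5.6 and later accept but these notes do not cover.

```perl
use strict;
use warnings;

my $mode_string = "0755";                 # e.g. typed by the user
my $mode  = oct($mode_string);            # 493
my $round = sprintf("%o", $mode);         # back to "755"

my $from_hex = oct("0x1F");               # 31
my $from_bin = oct("0b101");              # 5

print "$mode $round $from_hex $from_bin\n";   # prints "493 755 31 5"
```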


Transliteration

- Translates all occurrences of the characters found in a search
list (SL) to the corresponding character in a replacement list
(RL).

- tr/SL/RL/
y/SL/RL/ (y is an alias for tr, for you sed fanatics!)

- Returns the number of characters replaced

- Similar to the UNIX "tr" command

- Operates on $_ by default. The target can be changed with the
=~ operator.

- If the RL is shorter than the SL, the last character of the RL
is repeated until the lists are equal length (but NOT if the d
(delete) option is used)

- If the RL is empty, a copy of the SL is used for the RL (but
NOT if the d (delete) option is used)

- A range of characters can be indicated by two characters
separated by a dash. (Use \- to get a literal dash.)

- Usually faster than the substitution command (s///) for simple
character-for-character changes, since no regular expression is
involved

- Ex.

$x = "Testing";
$x =~ tr/et/ET/; # $x is now "TEsTing"

$x = "Testing";
$x =~ tr/a-z/x/; # $x is now "Txxxxxx"

$x = "Testing";
$x =~ tr/A-Z/a-z/; # $x is now "testing"
# (Converts uppercase to
# lowercase)

$x = "baacaad";
$y = $x =~ tr/a//; # $x is still "baacaad"
# $y is 4 (the number of a's in $x)
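
A few more sketches of tr in action: working on a copy so the original survives, counting with the return value, the classic ROT13 range trick, and the $_ default. The sample strings and variable names are made up.

```perl
use strict;
use warnings;

my $dna = "AACGTTTA";

# Copy first, then translate the copy
(my $rna = $dna) =~ tr/T/U/;                    # "AACGUUUA"

# Empty RL: nothing changes, but the matches are counted
my $count_a = ($dna =~ tr/A//);                 # 3

# Ranges on both sides: ROT13
(my $rot13 = "Hello, World") =~ tr/A-Za-z/N-ZA-Mn-za-m/;

# No =~ means tr works on $_
$_ = "testing";
my $changed = tr/a-z/A-Z/;                      # $_ is "TESTING", $changed is 7

print "$rna $count_a $rot13 $_ $changed\n";
```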


Options For The Transliteration Command

- d (delete) - deletes all characters in the SL which do NOT
have a corresponding character in the RL.
Those characters from the SL which do have a
corresponding character in the RL are translated
normally. If the RL is shorter than the SL, it
is NOT extended and if the RL is empty, it is
NOT equated to the SL.

- Ex.

$x = "Testing";
$x =~ tr/tei/w/d; # $x is now "Tswng"

- c (complement) - complements the SL with respect to the
characters \000 - \377. So the actual SL
is the set of all possible 256 characters
minus the original SL.

- Ex.

$x = "Good&Plenty";
$x =~ tr/a-zA-Z/ /c; # $x is now "Good Plenty"
# (All non-alphabetics are
# changed to blanks)

- s (squeeze) - squeezes sequences of the same TRANSLATED
characters to a single occurrence of that
character. Note that sequences of the
same character which occurred in the
original string and did NOT result from
the translation are NOT squeezed.

- Ex.

$x = "Good&Plenty";
$x =~ tr/len/x/s; # $x is now "Good&Pxty"

$x = "Good&Plenty";
$x =~ tr/len/t/s; # $x is now "Good&Ptty"
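
The options can be combined. The sketch below shows three common pairings; the sample strings and variable names are invented for illustration.

```perl
use strict;
use warnings;

# c + d: delete everything that is NOT a letter
(my $letters = "a,b;;c  d") =~ tr/a-zA-Z//cd;           # "abcd"

# s with an empty RL: characters map to themselves, runs are squeezed
(my $squeezed = "aabbccdd") =~ tr/a-z//s;               # "abcd"

# c + s: turn each run of non-letters into a single blank
(my $words = "one,,two;;;three") =~ tr/a-zA-Z/ /cs;     # "one two three"

print "$letters | $squeezed | $words\n";
```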


Other String Functions

- Don't forget our old favorites!

chop
print
printf
s///
