
Perl Regular Expressions

Regular Expressions (0)


It's a template that either matches or doesn't match a given string.


One of the most important features of Perl: strong regular expression support.

/PATTERN/

Regular Expressions (1)


The "Dirty Dozen" metacharacters:
\ . * + ? ( ) | [ { ^ $
These characters have special meaning in regular expressions.
A backslash in front of any metacharacter makes it literal (non-special).

Regular Expressions (2)


. matches any character except a newline (\n).

Quantifiers decide how many times the preceding item is repeated.

/hello.you/ matches any string that contains hello, followed by exactly one character, followed by you.
/to*ols/ the character before * may be repeated zero or more times. Matches tools, tooooools, tols (but not toxols!).
/to+ols/ the character before + may be repeated one or more times. Matches tools, tooooools (but not tols).
/to.*ols/ matches to, followed by any string (possibly empty, without newlines), followed by ols.
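As a quick check of the rules above, the following sketch tries each pattern against a handful of sample strings. The sample list and the variable names (@samples, @patterns, %matched) are made up for illustration and are not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical sample strings used only for this illustration.
my @samples = ("tols", "tools", "tooooools", "toxols", "hello you", "helloyou");

# Each entry: a printable name and the compiled pattern.
my @patterns = (
    [ 'hello.you', qr/hello.you/ ],
    [ 'to*ols',    qr/to*ols/    ],
    [ 'to+ols',    qr/to+ols/    ],
    [ 'to.*ols',   qr/to.*ols/   ],
);

# Collect, for every pattern, the samples it matches.
my %matched;
for my $p (@patterns) {
    my ($name, $re) = @$p;
    $matched{$name} = [ grep { $_ =~ $re } @samples ];
    print "/$name/ matches: @{ $matched{$name} }\n";
}
```

Note that "helloyou" fails /hello.you/ because the dot needs exactly one character between the two words.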

Regular Expressions (3)

/to?ols/ the character before ? is optional. Thus, there are only two matching strings: tools and tols.
/to{2}ls/ the number in {} gives the number of repetitions; matches tools.
{count}   - Match exactly count times
{min,max} - Match at least min but not more than max times
{min,}    - Match at least min times

Exercise: write the {} quantifier equivalent to each of *, + and ?.
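One way to check an answer to this exercise is to compare each shorthand quantifier with a brace form on the same inputs. The sketch below assumes the usual equivalences ({0,} for *, {1,} for +, {0,1} for ?). The sample strings and variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical test strings with zero, one, two and three o's.
my @samples = ("tls", "tols", "tools", "toools");

# Pairs of (shorthand, brace form) that should behave identically.
my @pairs = (
    [ qr/^to*ls$/, qr/^to{0,}ls$/  ],
    [ qr/^to+ls$/, qr/^to{1,}ls$/  ],
    [ qr/^to?ls$/, qr/^to{0,1}ls$/ ],
);

my $all_agree = 1;
for my $pair (@pairs) {
    my ($short, $braced) = @$pair;
    for my $s (@samples) {
        my $m1 = ($s =~ $short)  ? 1 : 0;
        my $m2 = ($s =~ $braced) ? 1 : 0;
        $all_agree = 0 if $m1 != $m2;   # any disagreement flips the flag
    }
}
print $all_agree ? "all pairs agree\n" : "mismatch found\n";

# {2} from the slide: exactly two o's.
my @exactly_two = grep { /^to{2}ls$/ } @samples;
```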

Regular Expressions (4)


Grouping: parentheses ( ) group one or more characters so that a quantifier applies to the whole group.
/(tools)+/ matches tools, toolstools, toolstoolstoolstools.
Alternatives:
/hello (world|Perl)/ - matches hello world, hello Perl.
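A small sketch of grouping and alternation in use. The input strings are invented for the example, and anchoring with ^ and $ is added here only to make the grouping effect visible.

```perl
use strict;
use warnings;

# The + applies to the whole group, so repeated "tools" matches end to end.
my $grouped_ok  = ("toolstoolstools" =~ /^(tools)+$/) ? 1 : 0;
# Without the group, + would apply only to the final "s".
my $ungrouped_ok = ("toolstoolstools" =~ /^tools+$/) ? 1 : 0;

# Alternation: remember which alternative was seen.
my @greeted;
for my $s ("hello world", "hello Perl", "hello there") {
    if ($s =~ /hello (world|Perl)/) {
        push @greeted, $1;    # $1 holds the alternative that matched
    }
}
print "grouped: $grouped_ok, ungrouped: $ungrouped_ok, greeted: @greeted\n";
```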

Regular Expressions (5)


Character Class - a list of all possible characters for one position
/Hello [abcde]/ matches Hello a or Hello b (through Hello e)
/Hello [a-e]/ the same as above

Negating:
[^abc] any character except a, b, c
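The following sketch shows the list form, the range form and the negated form side by side. The sample strings are assumptions chosen to hit each case.

```perl
use strict;
use warnings;

# Hypothetical inputs: two inside the class, one outside, one upper case.
my @samples = ("Hello a", "Hello e", "Hello f", "Hello A");

my @in_list  = grep { /Hello [abcde]/ } @samples;   # explicit list
my @in_range = grep { /Hello [a-e]/   } @samples;   # same set, written as a range
my @negated  = grep { /Hello [^abc]/  } @samples;   # anything except a, b, c

print "list:    ", join(", ", @in_list),  "\n";
print "range:   ", join(", ", @in_range), "\n";
print "negated: ", join(", ", @negated),  "\n";
```

Character classes are case sensitive, so "Hello A" is not matched by [a-e] but is matched by [^abc].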

Regular Expressions (6)


Shortcuts
\d digit [0-9]
\w word character [A-Za-z0-9_]
\s whitespace [ \t\n\r\f]
Negation with ^: [^\d] matches a non-digit
\S anything not \s
\D anything not \d
\W anything not \w

Exercise: write the character classes for
1. Matching vowels
2. Matching consonants
3. Anything other than non-numbers
What is the difference between \D and [^\d]?
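A possible way to experiment with the shortcuts and the exercise is sketched below. The sample text and the vowel and consonant classes are one reasonable answer for ASCII letters, not the only one.

```perl
use strict;
use warnings;

my $text = "Room 42, floor 3";   # hypothetical sample text

# In list context with /g, a match returns every occurrence.
my @digits = $text =~ /\d/g;      # single digits
my @words  = $text =~ /\w+/g;     # runs of word characters
my @gaps   = $text =~ /\s+/g;     # runs of whitespace

# One possible pair of classes for ASCII vowels and consonants.
my @vowels     = $text =~ /[aeiou]/gi;
my @consonants = $text =~ /[b-df-hj-np-tv-z]/gi;

# \D and [^\d] select the same characters.
my @non_digits_a = $text =~ /\D/g;
my @non_digits_b = $text =~ /[^\d]/g;

print "digits: @digits\nwords: @words\n";
print "vowels: ", scalar(@vowels), ", consonants: ", scalar(@consonants), "\n";
```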

Regular Expressions (7)


Anchors

^ - marks the beginning of the string
$ - marks the end of the string

/^abc/ - ^ anchors the match at the beginning of the string
/a\^bc/ - \^ matches a literal ^
/[^abc]/ - inside a character class, a leading ^ negates

/^Hello Perl/ - matches "Hello Perl, good bye Perl", but not "Perl Hello Perl"

What pattern will match blank lines?
/^\s*$/ - matches all blank lines
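The three roles of ^ and the blank-line pattern can be tried out with a sketch like this one. The @lines array is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical input lines, including an empty one and a whitespace-only one.
my @lines = (
    "Hello Perl, good bye Perl",
    "Perl Hello Perl",
    "",
    "   \t",
    "a^bc",
);

my @starts = grep { /^Hello Perl/ } @lines;   # anchored at the start
my @blank  = grep { /^\s*$/ }       @lines;   # empty or whitespace only
my @caret  = grep { /a\^bc/ }       @lines;   # escaped ^ is a literal caret
my @not_abc = grep { /^[^abc]/ }    @lines;   # first character is not a, b or c

print scalar(@starts), " line(s) start with Hello Perl\n";
print scalar(@blank),  " blank line(s)\n";
```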

Regular Expressions (8)


\b - matches at either end of a word (the start or the end of a group of \w characters)
/\bPerl\b/ - matches Hello Perl, Perl and even Perl++ (+ is not a \w character), but not Perl5 or Perlish
Which part of "That's my house" does /^\w+\b/ match?

\B - the negation of \b: matches wherever \b does not
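A short sketch for experimenting with \b and \B. The sample words are assumptions picked so that some end in a word character and some do not.

```perl
use strict;
use warnings;

my @samples = ("Hello Perl", "Perl", "Perl++", "Perl5", "Perlish");

# \b needs a word character on one side and a non-word character
# (or the string edge) on the other.
my @whole = grep { /\bPerl\b/ } @samples;

# \B after Perl requires another word character to follow.
my @glued = grep { /Perl\B/ } @samples;

# The apostrophe is not a \w character, so the first word stops before it.
my ($first_word) = "That's my house" =~ /^(\w+\b)/;

print "whole word: ", join(", ", @whole), "\n";
print "glued:      ", join(", ", @glued), "\n";
print "first word: $first_word\n";
```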

Regular Expressions (9)


Back references:
/(World|Perl) \1/ - matches World World, Perl Perl.

/((hello|hi) (world|Perl))/
\1 refers to (hello|hi) (world|Perl)
\2 refers to (hello|hi)
\3 refers to (world|Perl)

$1, $2, $3 store the values of \1, \2, \3 after a regular expression is applied.
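The numbering of groups by their opening parenthesis can be seen in a sketch like the following. The input strings and variable names are illustrative.

```perl
use strict;
use warnings;

# \1 must repeat exactly what the first group matched.
my @doubled = grep { /(World|Perl) \1/ } ("World World", "Perl Perl", "World Perl");

# Nested groups are numbered by the position of their opening parenthesis.
my $str = "hi Perl";
my ($whole, $greeting, $name);
if ($str =~ /((hello|hi) (world|Perl))/) {
    ($whole, $greeting, $name) = ($1, $2, $3);
}

print "doubled: ", join(", ", @doubled), "\n";
print "whole=$whole greeting=$greeting name=$name\n";
```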

Regular Expressions (10)


Option modifiers
/i : Case-insensitive matching
/s : . also matches \n
/m : Let ^ and $ match next to embedded \n
/x : Ignore whitespace in the pattern
/o : Compile the pattern once
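The effect of most of these modifiers is easiest to see by running the same pattern with and without them. The two-line sample string below is an assumption for the demonstration; /o is left out because its effect is on compilation, not on what matches.

```perl
use strict;
use warnings;

my $text = "Hello\nperl world";   # hypothetical string with an embedded newline

my $ci           = ($text =~ /PERL/i)        ? 1 : 0;  # case differences ignored
my $dot_plain    = ($text =~ /Hello.perl/)   ? 1 : 0;  # . does not match \n
my $dot_s        = ($text =~ /Hello.perl/s)  ? 1 : 0;  # with /s it does
my $anchor_plain = ($text =~ /^perl/)        ? 1 : 0;  # ^ only at string start
my $anchor_m     = ($text =~ /^perl/m)       ? 1 : 0;  # with /m also after \n
my $spaced       = ($text =~ / perl \s+ world /x) ? 1 : 0;  # layout spaces ignored

print "i=$ci dot=$dot_plain s=$dot_s ^=$anchor_plain m=$anchor_m x=$spaced\n";
```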

Regular Expressions (11)


Bind operator =~

Tells Perl to match the pattern on the right against the string on the left.

Pattern match operator m//

$str =~ /pattern/;
$str =~ m/pattern/;

Regular Expressions (12)


When no variable is mentioned, the pattern is matched against the default variable $_.

if( $str =~ /hello/ ){
    # matched against $str
}

while( <STDIN> ){
    if( /hello/ ){
        # matched against $_, the line just read
    }
}

@words = split /\s+/, $str;
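To try the explicit and the default form without typing input, the sketch below loops over a fixed list instead of reading STDIN. The strings and the @hits array are made up for the example.

```perl
use strict;
use warnings;

my $str = "say hello twice";

# Explicit binding: both spellings do the same thing.
my $bound  = ($str =~ /hello/)  ? 1 : 0;
my $bound2 = ($str =~ m/hello/) ? 1 : 0;

# With m, other delimiters may be used instead of slashes.
my $braces = ($str =~ m{hello}) ? 1 : 0;

# No binding operator: the pattern is tested against $_,
# which the for loop sets to each element in turn.
my @hits;
for ("hello there", "goodbye", "well hello") {
    push @hits, $_ if /hello/;
}

print "bound=$bound bound2=$bound2 braces=$braces hits=", scalar(@hits), "\n";
```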

Examples
$date = "12 10\n10";
if ($date =~ /(\d+)/) {
    print $1.":".$2.":".$3.":\n";
}
#output ($2 and $3 are empty):
#12:::
if ($date =~ /(\d+)(\s+\1)+/) {
    print $1.":".$2.":".$3.":\n";
}
#output ($2 contains the newline from $date, $3 is empty):
#10:
#10::

$str="Hello World";
if($str=~ /((Hello|Hi) (World|Perl))/)
{
print $1.":".$2.":".$3.":\n";
}
#output:
#Hello World:Hello:World:
$str = "Hello Perl Hi";
if ($str =~ /((Hello|Hi) (World|Perl)) \1/) {
    print $1.":".$2.":".$3.":\n";
}
#output: none (the pattern does not match)
$str = "Hi Perl Hi Perl";
if ($str =~ /((Hello|Hi) (World|Perl)) \1/) {
    print $1.":".$2.":".$3.":\n";
}
#output:
#Hi Perl:Hi:Perl:

Examples
1. What is it?
/^0x[0-9a-fA-F]+$/

2. Date format: Month-Day-Year -> Year:Day:Month


$date = "12-31-1901";
$date =~ s/(\d+)-(\d+)-(\d+)/$3:$2:$1/;

Examples
3. Make a pattern that matches any line of input that has
the same word repeated two (or more) times in a row.
Whitespace between words may differ.

4. Which part of "That's my house" does /^\w+\b/ match?

Example
1. /\w+/ #matches a word
2. /(\w+)/ #captures it to remember later
3. /(\w+)\1/ #the same text two times
4. /(\w+)\s+\1/ #whitespace between the words
5. /\b(\w+)\s+\1/ #"This is a test" no longer matches on "is is"
6. /\b(\w+)\s+\1\b/ #"This is the theory" no longer matches on "the the"
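The refinement steps can be checked with a sketch that runs the last three patterns over the two problem sentences plus one line that really contains a doubled word. The third sentence and the variable names are assumptions added for illustration.

```perl
use strict;
use warnings;

my $naive = qr/(\w+)\s+\1/;        # step 4: no word boundaries
my $left  = qr/\b(\w+)\s+\1/;      # step 5: boundary before the first word
my $both  = qr/\b(\w+)\s+\1\b/;    # step 6: boundary on both sides

my @lines = (
    "This is a test",              # false positive for step 4 ("is is")
    "This is the theory",          # false positive for step 5 ("the the")
    "Paris in the  the spring",    # a real repeated word, two spaces apart
);

my @naive_hits = grep { $_ =~ $naive } @lines;
my @left_hits  = grep { $_ =~ $left  } @lines;
my @both_hits  = grep { $_ =~ $both  } @lines;

print "step 4 matches ", scalar(@naive_hits), " line(s)\n";
print "step 5 matches ", scalar(@left_hits),  " line(s)\n";
print "step 6 matches ", scalar(@both_hits),  " line(s)\n";
```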

Let's try
1) Write a regular expression that identifies a time on a 24-hour clock. For example: 0:01, 00:20, 15:00, 23:59

2) Write a regular expression that identifies a floating-point number. For example: 10, 10.0001, -0.1, +001.3456789

For both, write a single program that identifies these patterns in the input lines and prints out only the matched patterns.

Negated Match
Negation
if( $str =~ /hello/ ){
    # runs when $str contains hello
}

if( $str !~ /hello/ ){
    # runs when $str does not contain hello
}

Regular Expressions (13)


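$& - what was actually matched
$` - what came before the match
$' - the rest of the string after the matched pattern
$` . $& . $' - the original string

Caution: Never use these variables in your script unless you really need them. In older versions of Perl, using them anywhere slows down every pattern match in the program.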
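A minimal sketch of the three match variables, copied into ordinary variables straight after the match. The sample string and the names $before, $matched and $after are illustrative.

```perl
use strict;
use warnings;

my $str = "Hello Perl world";   # hypothetical sample string
my ($before, $matched, $after);

if ($str =~ /Perl/) {
    # Copy the special variables immediately; the next successful
    # match would overwrite them.
    ($before, $matched, $after) = ($`, $&, $');
}

print "before:  [$before]\n";
print "matched: [$matched]\n";
print "after:   [$after]\n";
print "rebuilt: [", $before . $matched . $after, "]\n";
```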

Regular Expressions (14)


Substitutions:
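s/T/U/; #substitutes the first T with U (only once)
s/T/U/g; #global substitution: replaces every T
s/\s+/ /g; #collapses each run of whitespace into one space
s/(\w+) (\w+)/$2 $1/g; #swaps each pair of adjacent words
s/T/U/; #applied to the $_ variable
$str =~ s/T/U/; #applied to $str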
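The substitutions above can be tried on small strings as sketched here. The DNA-like sample and the other inputs are assumptions for the demonstration.

```perl
use strict;
use warnings;

my $dna = "ATTGTC";   # hypothetical input

# Copy first, then substitute, so $dna itself stays unchanged.
(my $once = $dna) =~ s/T/U/;     # first T only
(my $all  = $dna) =~ s/T/U/g;    # every T

my $spaced = "a   b \t c";
$spaced =~ s/\s+/ /g;            # each run of whitespace becomes one space

my $names = "John Smith";
$names =~ s/(\w+) (\w+)/$2 $1/;  # swap the two captured words

# Without =~ the substitution works on $_, here each array element in turn.
my @seqs = ("TTT", "GTG");
s/T/U/ for @seqs;

print "$once $all [$spaced] [$names] @seqs\n";
```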

Regular Expressions (15)


File Extension Renaming:
my ($from, $to) = @ARGV;
my @files = glob("*.$from");
foreach my $file (@files) {
    my $newfile = $file;
    # replace only the extension at the end of the name
    $newfile =~ s/\.\Q$from\E$/.$to/;
    rename($file, $newfile) or warn "Cannot rename $file: $!\n";
}

Split and Join


$str = "aaa bbb   ccc     dddd";
@words = split /\s+/, $str;
$str = join ":", @words;
#result is aaa:bbb:ccc:dddd

@words = split /\s+/, $_;    # " aaa b" -> "", "aaa", "b"
@words = split;              # " aaa b" -> "aaa", "b"
@words = split " ", $_;      # " aaa b" -> "aaa", "b"
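The difference between splitting on /\s+/ and the special single-space form shows up only when the string starts with whitespace. The sketch below uses made-up inputs to make that visible.

```perl
use strict;
use warnings;

my $str    = "aaa bbb   ccc \t dddd";     # hypothetical input
my @words  = split /\s+/, $str;
my $joined = join ":", @words;

my $padded = "  aaa b";                   # note the leading spaces

# A pattern split keeps the empty field before the leading whitespace.
my @with_empty = split /\s+/, $padded;

# The special string ' ' skips leading whitespace first.
my @awk_style = split ' ', $padded;

# Bare split is shorthand for split ' ', $_.
my @default;
for ($padded) { @default = split; }

print "$joined\n";
print scalar(@with_empty), " fields vs ", scalar(@awk_style), " fields\n";
```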

Grep

grep EXPR, LIST;


@results = grep /^>/, @array;
@results = grep /^>/, <FILE>;
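A small sketch of grep with a pattern, on an array and on a filehandle. To keep it self-contained, the filehandle reads from an in-memory string instead of a real file; the FASTA-like data and the variable names are assumptions.

```perl
use strict;
use warnings;

my @array = (">seq1", "ACGT", ">seq2", "TTGA");   # hypothetical data

my @headers     = grep /^>/, @array;       # elements starting with >
my @non_headers = grep { !/^>/ } @array;   # block form, negated
my $count       = grep /^>/, @array;       # scalar context gives the count

# Reading from a filehandle in list context returns all lines,
# which grep then filters. Lines keep their trailing newline.
my $data = ">seq1\nACGT\n>seq2\nTTGA\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @from_file = grep /^>/, <$fh>;
close $fh;

print "headers: @headers (count $count)\n";
```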

Thank You !!!
