Perl - Extracting string with regex stored in hash
I'm trying to parse specific values out of a text file and output them to a different file.
I'm using regular expressions stored in a hash (each matched to a descriptive name) to search through a string (scalar), storing the discovered values in an array, which is then written out to a file.
I've got everything working except the searching/extracting part. (I've only learned Perl in the past couple of days, so I wouldn't be surprised if I'm making simple mistakes.)
    $inputstring = 'lorem ipsum dolor date: 20110131 quis semper egestas.';

    %myregexhash = (
        date => '/([12][09][0-9][0-9][0-1][0-2][0-9][0-9])/'
    );

    @foundvaluesarray = ();

    while ( ($thefieldname, $theregex) = each (%myregexhash)) {
        if ($inputstring =~ $theregex) {
            push(@foundvaluesarray, "$thefieldname: $&\n");
            $inputstring = $';
        }
    }

    print "@foundvaluesarray";

The array fills with the field names ("date:"), but not the values I'm looking for ("20110131").
Any idea what I'm doing wrong?
Make one small change:

    %myregexhash = (
        date => qr/([12][09][0-9][0-9][0-1][0-2][0-9][0-9])/
    );

Note the use of qr//, which compiles the regex.
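To see why the quoted version does not match this input, here is a minimal sketch comparing the two forms. When a plain string is used as a pattern, the slashes inside it are literal characters to be matched rather than delimiters, and the input contains no slashes. The names $input, $as_string, and $as_qr are made up for this illustration.

```perl
use strict;
use warnings;

my $input = 'lorem ipsum dolor date: 20110131 quis semper egestas.';

# A plain string: the leading and trailing slashes are part of the pattern,
# so it would only match text that looks like "/20110131/".
my $as_string = '/([12][09][0-9][0-9][0-1][0-2][0-9][0-9])/';

# A compiled regex: here the slashes are just the qr// delimiters.
my $as_qr = qr/([12][09][0-9][0-9][0-1][0-2][0-9][0-9])/;

my $string_matched = ($input =~ $as_string) ? 1 : 0;
my $qr_matched     = ($input =~ $as_qr)     ? 1 : 0;

# $1 still holds the capture from the last successful match.
my $captured = $qr_matched ? $1 : undef;

print "string form matched: $string_matched\n";
print "qr// form matched:   $qr_matched\n";
print "captured: $captured\n" if defined $captured;
```

Dropping the slashes from the quoted string would also work, but qr// keeps the pattern readable as a regex and compiles it once.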
Since you're new, I'd recommend a few other changes.
Any non-trivial program should begin with the following front matter:

    #! /usr/bin/env perl

    use strict;
    use warnings;

The strict pragma has nice benefits such as catching misspelled variable names at compile time and checking the use of references. The warnings pragma turns on warning diagnostics that can alert you to questionable cases in your code.
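As a rough illustration of the misspelled-variable check, the sketch below compiles a small snippet with a string eval so the failure can be inspected without stopping the script. The variables $total and $totl are invented for this example; in a real program the error would simply abort compilation.

```perl
use strict;
use warnings;

# Compile a tiny snippet at run time so the strict failure can be observed.
# $totl is a deliberate misspelling of $total.
my $ok = eval q{
    use strict;
    my $total = 1;
    $totl = 2;
    1;
};
my $error = $@;

print "compiled: ", ($ok ? "yes" : "no"), "\n";
print "error: $error" unless $ok;
```

Without strict, the misspelled name would silently create a new global variable and the bug would only show up later as a wrong value.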
Now you must predeclare your variables:

    my $inputstring = 'lorem ipsum dolor date: 20110131 quis semper egestas.';

    my %myregexhash = (
        date => qr/([12][09][0-9][0-9][0-1][0-2][0-9][0-9])/
    );

    my @foundvaluesarray = ();

The = () is implied in an array or hash declaration, so you don't see it in idiomatic Perl.
You don't want to use $& if you can help it because it slows down your entire program. The perlre documentation explains:

WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc., so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression (?: ... ) instead.) But if you never use $&, $` or $', then patterns without capturing parentheses will not be penalized. So avoid $&, $', and $` if you can, but if you can't (and some algorithms really appreciate them), once you've used them once, use them at will, because you've already paid the price. As of 5.005, $& is not so costly as the other two.
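The grouping advice in that passage can be sketched as follows: (?: ... ) groups without capturing, so only the parentheses you actually care about fill $1. The pattern and the variable names here are illustrative and not from the original post. As far as I know, the perlvar documentation for Perl 5.20 and later says the copy-on-write change removes most of the $& penalty, so the performance concern applies mainly to older interpreters.

```perl
use strict;
use warnings;

my $input = 'lorem ipsum dolor date: 20110131 quis semper egestas.';

# (?:date|day) groups the alternation without capturing,
# so the eight digits are the first (and only) capture.
# In list context the match returns the captured substrings.
my ($value) = $input =~ /(?:date|day):\s*([0-9]{8})/;

print "value: $value\n" if defined $value;
```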
Because you surrounded your pattern with parentheses, the substring matched is captured in $1, so grab it from there.
Also, the way you chopped off the front of $inputstring is more naturally expressed in Perl with s///.
    while (my ($thefieldname, $theregex) = each (%myregexhash)) {
        if ($inputstring =~ s/$theregex//) {
            push(@foundvaluesarray, "$thefieldname: $1\n");
        }
    }

    print "@foundvaluesarray";

Output:

    date: 20110131
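Putting the pieces together, here is a sketch of the whole script with a second, made-up field (id, with an assumed input format of "id: " followed by digits) to show how the hash scales to more than one pattern. It iterates over sorted keys instead of using each so that the output order is predictable; that is a choice made for this example rather than something the answer requires.

```perl
#! /usr/bin/env perl

use strict;
use warnings;

# Hypothetical input: the "id: 4711" part is invented for this example.
my $inputstring = 'lorem id: 4711 ipsum dolor date: 20110131 quis semper egestas.';

my %myregexhash = (
    date => qr/([12][09][0-9][0-9][0-1][0-2][0-9][0-9])/,
    id   => qr/id:\s*([0-9]+)/,    # assumed field format
);

my @foundvaluesarray;

# Hash order is not guaranteed, so sort the field names.
for my $thefieldname (sort keys %myregexhash) {
    my $theregex = $myregexhash{$thefieldname};

    # s/// removes the matched text and leaves the first capture in $1.
    if ($inputstring =~ s/$theregex//) {
        push @foundvaluesarray, "$thefieldname: $1";
    }
}

print "$_\n" for @foundvaluesarray;
```

Note that s/// removes only the matched text, whereas assigning $' also discards everything before the match, so later patterns can still find values that appear earlier in the string.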