Thursday, February 17, 2011

How can I search a particular column in Perl?

I have a text file that contains some data. I am trying to search for EA in the ID column only and print the whole row, but my code matches EA anywhere in the line and prints every row. What should I add to the code to satisfy this condition? Thanks again :-)!

DATA:
Name Age ID
---------------------
KRISTE,22,EA2008
J**EA**N,21,ES4567
JAK,45,EA2008

The code prints:
KRISTE,22,EA2008
J**EA**N,21,ES4567
JAK,45,EA2008

Desired output:
KRISTE,22,EA2008
JAK,45,EA2008

file='save.txt';
open(F,$file)||die("Could not open $file");
while ($line=<F>){
if ($line=~ m/$EA/i) {
my @cells=($f1,$f2,$f3)= split ',',$line;
print "<TD>f1</TD>";
print "<TD>f2</TD>";
print "<TD>f3</TD>";
}
From stackoverflow
  • You almost had it, I think this should work:

    file='save.txt';
    open(F,$file)||die("Could not open $file");
    
    while ($line=<F>){
      my @cells=($f1,$f2,$f3)= split ',',$line;
      if ($f3=~ m/$EA/i) {
        print "<TD>f1</TD>";
        print "<TD>f2</TD>";
        print "<TD>f3</TD>";
      }
    }
    

    This splits the line into columns first, and then does the regex only on the third column.

    BTW your code may have other problems (for example, those print statements don't look like they print the values of your variables), but I don't know Perl very well, so I only answered your main question.

    Shiel : my $EA=param('keyword');//example:$EA= 'EA'
    Jonathan Leffler : You've fixed the core problem right - split then match - but didn't fix the bust syntax. Brian fixed the bust syntax - but didn't fix the core problem.
  • You should post the actual sample program you are using to illustrate the problem. Here's your cleansed program:

    use strict;
    use warnings;
    
    use CGI qw(:standard);   # function-style param() needs an import list
    
    my $EA = param('keyword');
    
    my $file = 'save.txt';
    open my $fh, "<", $file or die "Could not open $file: $!";
    
    while( my $line = <$fh> ) {   # declare $line so it compiles under strict
       if( $line =~ m/$EA/i ) {
           my( $f1, $f2, $f3 ) = split ',', $line;
           print "<TD>$f1</TD>";
           print "<TD>$f2</TD>";
           print "<TD>$f3</TD>";
           }
       }
    

    Here are a few things that can help you.

    • Your variables need their sigils. They don't do anything without them.
    • When you try to open a file and want to report an error, include the $! variable so you see what the error is.
    • You can split directly to scalar variables. It's just a list assignment. You don't need the extra @cells variable.
    • Give your statements some room to breathe by using some whitespace. It's free, after all.
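
    As a minimal sketch of the points above, run on one line of the question's data (the file name no_such_file.txt is made up for illustration and is assumed not to exist):

```perl
use strict;
use warnings;

# List assignment: split hands its fields straight to three scalars.
my $line = "KRISTE,22,EA2008\n";
chomp $line;
my ( $name, $age, $id ) = split /,/, $line;
print "<TD>$name</TD><TD>$age</TD><TD>$id</TD>\n";

# Including $! in the message shows why the open failed.
# 'no_such_file.txt' is a hypothetical name chosen so that the open fails.
my $missing = 'no_such_file.txt';
my $error   = '';
open( my $fh, '<', $missing ) or $error = "Could not open $missing: $!";
print "$error\n" if $error;
```

    In a real script you would normally die with that message instead of storing it; it is kept in a variable here only so the sketch runs to the end.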
    Jonathan Leffler : Your cleansed program is good - except it doesn't fix the problem.
    brian d foy : I didn't say it fixed the problem. I didn't bother to even think about the problem. I fixed the question though :)
  • A combination of brian's and Jeremy's code fixes all the problems:

    use strict;
    use warnings;
    
    my $file = 'save.txt';
    open my $fh, "<", $file or die "Could not open $file: $!";
    
    while (my $line = <$fh>)
    {
        my($f1, $f2, $f3) = split ',', $line;
        if ($f3 =~ m/EA/i)
        {
            print "<TD>$f1</TD>";
            print "<TD>$f2</TD>";
            print "<TD>$f3</TD>";
        }
    }
    

    Brian had generalized the match pattern with use CGI; and my $EA = param('keyword'); but I undid that as I didn't see it as applicable to the question.

    brian d foy : the param stuff came from the OPs comments to the other answers.
    Jonathan Leffler : OK - fair enough. I managed to miss that. That's why you get code reviewed, anyway. :-D
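
    If the keyword does come from user input, as in the OP's comment, it is probably worth escaping it before interpolating it into the pattern. A rough sketch, assuming a hypothetical helper named rows_matching_id and an in-memory copy of the sample data (without the emphasis markers) standing in for save.txt:

```perl
use strict;
use warnings;

# Hypothetical helper: return the rows whose third column contains $keyword.
sub rows_matching_id {
    my ( $fh, $keyword ) = @_;
    my @rows;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $f1, $f2, $f3 ) = split /,/, $line;
        next unless defined $f3;    # skip header, separator and blank lines
        # \Q...\E makes the keyword match literally, so input such as "."
        # or "(" is not treated as regex syntax.
        push @rows, [ $f1, $f2, $f3 ] if $f3 =~ /\Q$keyword\E/i;
    }
    return @rows;
}

# Sample data based on the question, read through an in-memory filehandle.
my $data = "Name Age ID\n---------------------\n"
         . "KRISTE,22,EA2008\nJEAN,21,ES4567\nJAK,45,EA2008\n";
open( my $fh, '<', \$data ) or die "Could not open in-memory data: $!";

for my $row ( rows_matching_id( $fh, 'EA' ) ) {
    print join( '', map { "<TD>$_</TD>" } @$row ), "\n";
}
close $fh;
```

    With a CGI script, the second argument would be whatever param('keyword') returned.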
  • Alternately, you could alter your regex to just match the third item in the list:

    /[^,]*,[^,]*,.*EA/
    
    Dave Sherohman : [^,]* would be better than .*?, as it explicitly states that you only want non-comma characters, which makes your intent more obvious to a human reader and allows the regex engine to avoid (potentially time-consuming) backtracking.
    Ben Doom : @Dave Sherohman, agreed (mostly about the backtracking -- I don't find it any more readable). Answer edited.
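
    To illustrate Dave's point, here is a small sketch comparing the two styles. Both patterns are anchored, and the stricter one also uses [^,]* for the third column, which goes a little further than the regex in the answer. The last row is made up: it has a fourth column, which the question's data does not.

```perl
use strict;
use warnings;

# Two ways of writing "EA somewhere in the third column".
my $strict = qr/^[^,]*,[^,]*,[^,]*EA/;    # a field can never contain a comma
my $lazy   = qr/^.*?,.*?,.*?EA/;          # a field can, once the engine backtracks

# The first three rows are based on the question's data. The last one is a
# hypothetical four-column row whose only "EA" is in the fourth column.
my @rows = ( 'KRISTE,22,EA2008', 'JEAN,21,ES4567', 'JAK,45,EA2008', 'BOB,30,XX1,EA9' );

for my $row (@rows) {
    printf "%-18s strict=%s lazy=%s\n", $row,
        ( $row =~ $strict ? 'yes' : 'no' ),
        ( $row =~ $lazy   ? 'yes' : 'no' );
}
```

    Both patterns agree on the three-column rows; only the lazy one accepts the four-column row, because .*? is free to swallow a comma.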
  • Your regex is incorrect for what you are trying to do. Ben's solution works, although there should also be a ^ at the start, which ensures that the regex will start matching from the start of the string:

    /^[^,]*,[^,]*,.*EA/

    Also, your code is kind of noisy from a Perl point of view. If you want to make your code easier to read, you can do this (I'm using Ben's regex):

    $file = 'save.txt';

    open( F, $file ) || die("Could not open $file: $!");

    @matches = grep { /^[^,]*,[^,]*,.*EA/ } <F>;

    Now @matches will hold all your matched records, you can do what you want with them.
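
    For instance, the matched lines could be turned into the table cells the question prints. This is only a sketch: it uses the anchored form of Ben's pattern and an in-memory copy of the sample data (without the emphasis markers) in place of save.txt.

```perl
use strict;
use warnings;

# In-memory stand-in for save.txt, based on the question's sample data.
my $data = "KRISTE,22,EA2008\nJEAN,21,ES4567\nJAK,45,EA2008\n";
open( my $fh, '<', \$data ) or die "Could not open in-memory data: $!";

# Keep only the lines whose third column contains EA.
my @matches = grep { /^[^,]*,[^,]*,.*EA/ } <$fh>;
close $fh;

# Turn each matched line into a row of table cells.
for my $line (@matches) {
    chomp $line;    # $line aliases the array element, so @matches is chomped too
    my @cells = split /,/, $line;
    print join( '', map { "<TD>$_</TD>" } @cells ), "\n";
}
```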

  • Rather than trying to do the CSV parsing yourself, use the excellent and efficient Text::CSV_XS. This will handle escapes and quoting.

    #!/usr/bin/perl -w
    
    use Text::CSV_XS;
    
    my $csv = Text::CSV_XS->new();
    
    # Skip to the data.
    while(<DATA>) {
        last if /^-{10,}$/;
    }
    
    while( my $row = $csv->getline(*DATA) ) {
        print "@$row\n" if $row->[2] =~ /EA/;
    }
    
    
    __DATA__
    Name Age ID
    ---------------------
    KRISTE,22,EA2008
    J**EA**N,21,ES4567
    JAK,45,EA2008
    
