Monday, March 28, 2011

Regular expression to find a line containing certain characters and remove that line

Hi,

I have a text file with a lot of entries, one line after another. I want to find all lines that start with :: and delete them.

What is the regular expression to do this?

-AD

From stackoverflow
  • Simple as:

    ^::
    
  • ^::.*[\r\n]*
    

    If you're reading the file line-by-line you won't need the [\r\n]* part.

  • Regular expressions don't "do" anything. They only match text.

    What you want is a tool that uses regular expressions to identify a line and then applies some command to that line.

    One such tool is sed (there's also awk and many others). You'd use it like this:

    sed -e "/^::/d" < input.txt > output.txt
    

    The part "/^::/" tells sed to apply the following command to all lines that start with "::" and "d" simply means "delete that line".

    Or the simplest solution (which my brain didn't produce for some strange reason):

    grep -v "^::" input.txt > output.txt
    
    Dscoduc : I think you have forgotten the Regex.Replace function... That actually "does" something, doesn't it?
    Joachim Sauer : @Dscoduc: as you said, the function does something (it's one of the tools I mentioned). The regular expression itself still only matches some text. It's the semantics of the function that define what is done with the matched text.
    Dscoduc : Thanks for the clarification... I stand corrected...
  • sed -i -e '/^::/d' yourfile.txt
    
    oylenshpeegul : I think this is perhaps the best answer, but it might be worth mentioning that not all versions of sed have a -i option.
  • If you don't have sed or grep, find this and replace it with an empty string:

    ^::.*[\r\n]
    
  • Thanks for the pointers:

    The following worked for me. Any character could appear after "::" in the text file, so I used:

    ^::[a-zA-Z0-9 I put all punctuation symbols here]*$

    -AD

    Manu : you don't need to match anything after the initial ^::. In your example you are forced to "account for" all the characters because you put a $ at the end.
    Alan Moore : If he's using a line-oriented tool like grep you're right. But he still hasn't said.
    Alan Moore : @goldenmean, what's preventing you from using .* instead of that monster character class?
    Dscoduc : I agree, it would probably be better to use a singleline option and add the .* to the expression.
    Alan Moore : Single-line? Why would you want the dot to match newline characters? If you read one line at a time, there won't be any newlines to match, and if you read the whole file into memory before processing, the dot-star will consume the rest of the file the first time it's applied.
  • Here's my contribution in C#:

    Text stream:

    string stream = ":: This is a comment line";
    

    Syntax:

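    // Multiline makes ^ match at every line start; Singleline would let .* run past the newline to the end of the text
    Regex commentsExp = new Regex("^::.*", RegexOptions.Multiline);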
    

    Usage:

    Console.WriteLine(commentsExp.Replace(stream, string.Empty));
    

    Alternatively, if I wanted to take a text file that included comments and produce an exact duplicate without the comment lines, I could use a simple but effective combination of the type and findstr command-line tools:

    type commented.txt | findstr /v /R "^::" > uncommented.txt
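
For anyone doing the find-and-replace approach from a script rather than an editor, here is a minimal Python sketch (Python is my choice here; none of the answers use it, and the sample text is made up). Because it works on the whole text as one string, the pattern needs multiline mode so that ^ matches at every line start, and it uses \r?\n instead of [\r\n]* so that exactly one line ending is removed and blank lines following a :: line are kept.

```python
import re

# Made-up sample input; the "::" lines are the ones to drop.
text = ":: comment one\nkeep this\n:: comment two\n\nkeep this too\n"

# re.MULTILINE: ^ matches at the start of every line, not only the string.
# \r?\n consumes exactly one line ending (Windows or Unix), so following
# blank lines survive. \Z covers a "::" last line with no trailing newline.
COMMENT_LINE = re.compile(r"^::.*(?:\r?\n|\Z)", re.MULTILINE)

def strip_comment_lines(s: str) -> str:
    """Remove every line that starts with '::', including its line ending."""
    return COMMENT_LINE.sub("", s)

print(repr(strip_comment_lines(text)))  # prints 'keep this\n\nkeep this too\n'
```

The \Z alternative matters for the ^::.*[\r\n] variant above: without it, a :: line at the very end of a file that has no trailing newline is not matched.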
    
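The sed, grep -v and findstr /v answers all work line by line, so ^:: on its own is enough and no line ending has to be matched. A rough Python equivalent, as a sketch: the function names are my own, the UTF-8 encoding is an assumption, and filter_in_place only approximates sed -i because it simply reads the whole file into memory and rewrites it.

```python
import re
from typing import Iterable, Iterator

# Same pattern the sed/grep answers use; only the start of each line matters.
COMMENT_PREFIX = re.compile(r"^::")

def drop_comment_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines that do not start with '::' (like grep -v "^::")."""
    for line in lines:
        if not COMMENT_PREFIX.search(line):
            yield line

def filter_file(src: str, dst: str) -> None:
    """Like: sed -e "/^::/d" < src > dst (UTF-8 is an assumption)."""
    # newline="" keeps each line's original ending (\n or \r\n) untouched.
    with open(src, encoding="utf-8", newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.writelines(drop_comment_lines(fin))

def filter_in_place(path: str) -> None:
    """Rough equivalent of: sed -i -e '/^::/d' path"""
    with open(path, encoding="utf-8", newline="") as f:
        kept = list(drop_comment_lines(f))
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.writelines(kept)
```

Calling filter_file("input.txt", "output.txt") should then produce the same output as the sed and grep commands above. A plain line.startswith("::") would do the same job as the regex here.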

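Alan Moore's point about the dot is easy to check. In this small Python sketch (my own, with made-up sample text), re.DOTALL plays the role of .NET's RegexOptions.Singleline and re.MULTILINE that of RegexOptions.Multiline:

```python
import re

text = ":: comment\nkeep this\nand this\n"

# re.DOTALL (like .NET's RegexOptions.Singleline): the dot also matches "\n",
# so ".*" runs from the first "::" to the end of the whole string.
greedy = re.sub(r"^::.*", "", text, flags=re.DOTALL)

# re.MULTILINE (like RegexOptions.Multiline) only changes ^ and $;
# the dot still stops at the end of the line.
per_line = re.sub(r"^::.*", "", text, flags=re.MULTILINE)

print(repr(greedy))    # prints ''
print(repr(per_line))  # prints '\nkeep this\nand this\n'
```

With re.MULTILINE the text of the :: line goes but its line break stays, leaving an empty line; also matching the line ending, as the [\r\n] variants in the answers do, removes it completely.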