Code Answer: Regular expression to find a line containing certain characters and remove that line

Hi,

I have a text file with a lot of character entries, one line after another. I want to find all lines that start with :: and delete them.

What is the regular expression to do this?

-AD

From stackoverflow

Simple as:
```
^::
```
```
^::.*[\r\n]*
```
If you're reading the file line-by-line you won't need the [\r\n]* part.
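
To make the difference concrete, here is a minimal Python sketch (Python is only my choice for illustration; the question does not name a tool). It assumes the whole file has been read into one string, in which case `re.MULTILINE` is needed so that `^` matches at the start of every line. The name `strip_marked_lines` and the sample text are hypothetical.

```python
import re

# Pattern from the answer above: a line starting with "::" plus any trailing line breaks.
# re.MULTILINE makes ^ match at the start of each line, not only at the start of the text.
PATTERN = re.compile(r"^::.*[\r\n]*", re.MULTILINE)

def strip_marked_lines(text: str) -> str:
    """Remove every line that starts with '::' from a multi-line string."""
    return PATTERN.sub("", text)

# Hypothetical sample with mixed LF and CRLF line endings.
sample = "keep 1\n:: drop me\nkeep 2\r\n:: drop me too\r\nkeep 3\n"
print(repr(strip_marked_lines(sample)))  # 'keep 1\nkeep 2\r\nkeep 3\n'
```

One thing to watch: `[\r\n]*` also swallows any blank lines that directly follow a removed line. If blank lines matter, `\r?\n?` in its place is a stricter alternative.
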
Regular expressions don't "do" anything. They only match text.

What you want is a tool that uses regular expressions to identify lines and then applies some command to those lines.

One such tool is sed (there's also awk and many others). You'd use it like this:
```
sed -e "/^::/d" < input.txt > output.txt
```
The part "/^::/" tells sed to apply the following command to all lines that start with "::" and "d" simply means "delete that line".

Or the simplest solution (which my brain didn't produce for some strange reason):
```
grep -v "^::" input.txt > output.txt
```
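If neither sed nor grep is at hand, the same line-by-line filtering can be sketched in Python. This is only an illustration of the idea, not part of the original answer; the names `drop_marked` and `filter_file` are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Same pattern as in the sed and grep commands: "::" at the start of a line.
MARKER = re.compile(r"^::")

def drop_marked(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that do not start with '::' (like grep -v '^::')."""
    for line in lines:
        if not MARKER.search(line):
            yield line

def filter_file(src: str, dst: str) -> None:
    """Copy src to dst, leaving out the marked lines."""
    # newline="" keeps the original line endings untouched on read and write.
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.writelines(drop_marked(fin))
```

Calling `filter_file("input.txt", "output.txt")` would then play the role of the commands above. Because each line is tested on its own, no newline handling is needed in the pattern.
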
Dscoduc : I think you have forgotten the Regex.Replace function... That actually "does" something, doesn't it?

Joachim Sauer : @Dscoduc: as you said: The function does something (it's one of the tools I mentioned). The regular expression itself still only matches some text. It's the semantics of the function that define what is to be done with the matched text.

Dscoduc : Thanks for the clarification... I stand corrected...
```
sed -i -e '/^::/d' yourfile.txt
```
oylenshpeegul : I think this is perhaps the best answer, but it might be worth mentioning that not all versions of sed have a -i option.
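
Where sed lacks `-i`, one portable way to get the same effect is to write the filtered lines to a temporary file and then swap it in. The Python sketch below shows that idea under a few assumptions: the file is UTF-8 text, and the function name `delete_marked_in_place` and its `marker` parameter are hypothetical. It also does not preserve the original file's permissions.

```python
import os
import tempfile

def delete_marked_in_place(path: str, marker: str = "::") -> None:
    """Rewrite `path` without the lines that start with `marker`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp, \
             open(path, encoding="utf-8", newline="") as src:
            tmp.writelines(line for line in src if not line.startswith(marker))
        # Swap the filtered copy in place of the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong, then re-raise.
        os.unlink(tmp_path)
        raise
```
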
If you don't have sed or grep, find this and replace it with an empty string:
```
^::.*[\r\n]
```
Thanks for the pointers:

The following worked for me. After "::" any character could possibly be present in the text file, so I used:

^::[a-zA-Z0-9 I put all punctuation symbols here]*$

-AD

Manu : You don't need to match anything after the initial ^::. In your example you are forced to "account for" all the characters because you put a $ at the end.

Alan Moore : If he's using a line-oriented tool like grep you're right. But he still hasn't said.

Alan Moore : @goldenmean, what's preventing you from using .* instead of that monster character class?

Dscoduc : I agree, it would probably be better to use a singleline option and add the .* to the expression.

Alan Moore : Single-line? Why would you want the dot to match newline characters? If you read one line at a time, there won't be any newlines to match, and if you read the whole file into memory before processing, the dot-star will consume the rest of the file the first time it's applied.
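
The effect described in that last comment can be reproduced with a small Python sketch. Python's `re.DOTALL` is the counterpart of the single-line option, and `re.MULTILINE` is added here so that `^` matches at each line start when the whole text is processed at once. The sample text is hypothetical.

```python
import re

# Hypothetical file content read into memory as one string.
text = "keep 1\n:: comment\nkeep 2\n"

# Default dot: it stops at the newline, so only the marked line is removed.
per_line = re.sub(r"^::.*\n?", "", text, flags=re.MULTILINE)

# DOTALL (single-line mode): the dot also matches newlines,
# so ".*" runs on to the end of the text and takes "keep 2" with it.
greedy = re.sub(r"^::.*", "", text, flags=re.MULTILINE | re.DOTALL)

print(repr(per_line))  # 'keep 1\nkeep 2\n'
print(repr(greedy))    # 'keep 1\n'
```
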
Here's my contribution in C#:

Text stream:
```
string stream = ":: This is a comment line";
```
Syntax:
```
Regex commentsExp = new Regex("^::.*", RegexOptions.Singleline);
```
Usage:
```
Console.WriteLine(commentsExp.Replace(stream, string.Empty));
```
Alternatively, if I simply wanted to take a text file that includes comments and produce an exact duplicate without the comment lines, I could use a simple but effective combination of the type and findstr command-line tools:
```
type commented.txt | findstr /v /R "^::" > uncommented.txt
```

Monday, March 28, 2011
