Tuesday, March 1, 2011

Match regexp in java

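import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "[! hello ¡world ¡] otra cosa ¡]";
String pt = "\\[!(.*)¡\\]";   // group 1: the text between "[!" and a closing "¡]"
Matcher mc = Pattern.compile(pt).matcher(text);
while (mc.find()) {
    System.out.println(mc.group(1));
}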

This code prints hello ¡world ¡] otra cosa, i.e. everything up to the last ¡].

What pattern would match only hello ¡world?

What I can't find is a way to negate a whole literal string rather than a single character, something like ([^(¡\])]*).

The question is:
How do I match everything that is NOT a given literal string?
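The attempted ([^(¡\])]*) cannot work because a character class always matches exactly one character: [^(¡\])] means "any single character except (, ¡, ] or )", not "anything except the sequence ¡]". The sketch below (class and method names are made up for illustration) shows that the group stops at the first ¡, so a pattern that still requires the closing ¡] finds no match at all on the question's text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    public static final String TEXT = "[! hello ¡world ¡] otra cosa ¡]";

    // Returns group 1 of the first match, or null if the regex does not match.
    public static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // [^(¡\])] rejects each of the characters ( ¡ ] ) individually,
        // so the group stops at the first '¡' (the one in "¡world").
        String open = firstGroup("\\[!([^(¡\\])]*)", TEXT);
        System.out.println("[" + open + "]");   // prints "[ hello ]"

        // Requiring the closing "¡]" makes the match fail outright: the class can
        // never consume the '¡' of "¡world", and "¡w" is not "¡]".
        String closed = firstGroup("\\[!([^(¡\\])]*)¡\\]", TEXT);
        System.out.println(closed);             // prints "null"
    }
}
```

In other words, negation inside [...] works character by character; excluding a multi-character literal needs a different tool, which is what the answers below provide.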

From Stack Overflow:
  • Just add a ? after the *

    String pt = "\\[!(.*?)¡\\]";
    
    Franco : Thanks man, this is the right answer.
    Esko : Franco, there's the green tick right below the voting arrows, if you feel this is the right answer, please accept it.
  • You need a shy or reluctant (non-greedy) expression.

    See the documentation for the java.util.regex.Pattern class for the quantifier syntax: appending ? to a quantifier (*?, +?, ??, {n,m}?) makes it reluctant. In your case, you want your Kleene star to match the shortest string possible. The normal behavior is greedy: it matches the longest string possible.

  • An alternative way to match the string is to use a negated character class, so that you avoid .*?, which can cause undesirable results and slow performance.

    String pt = "\\[!([^\\]]*)¡\\]";
    

    This version matches everything up to the first closing bracket and then backtracks only once, to match the ¡.

    Jeremy Stein : I assume Franco wants `]` to be matched by the `.` if it isn't preceded by `¡`.
  • To answer your direct question: the way to match . only where it does not start the string ¡] is to use a negative look-ahead:

    String pt = "\\[!((?:(?!¡\\]).)*)¡\\]";
    
    Franco : This is what I am looking for: matching . only when it is not part of ¡], for example. I'll see if it works.
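The three answers agree on the question's text but not in general. Jeremy Stein's comment points at the difference: the negated class [^\]] refuses every ], while the reluctant .*? and the negative look-ahead stop only at the literal ¡]. The sketch below runs the answers' patterns, written as compilable Java string literals, against the original text and against a second, made-up input containing a lone ]. The class name and the helper quoteExcept are hypothetical; quoteExcept generalizes the look-ahead answer to any literal by escaping it with Pattern.quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClosingTokenDemo {
    // Patterns from the answers, as Java string literals.
    public static final String RELUCTANT = "\\[!(.*?)¡\\]";
    public static final String NEGATED_CLASS = "\\[!([^\\]]*)¡\\]";
    public static final String LOOKAHEAD = "\\[!((?:(?!¡\\]).)*)¡\\]";

    // Hypothetical helper: "any run of characters that does not contain `literal`",
    // built from a negative look-ahead; Pattern.quote escapes regex metacharacters.
    public static String quoteExcept(String literal) {
        return "(?:(?!" + Pattern.quote(literal) + ").)*";
    }

    // Returns group 1 of the first match, or null if there is no match.
    public static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Same shape as LOOKAHEAD, but assembled from quoted literals.
        String generic = Pattern.quote("[!") + "(" + quoteExcept("¡]") + ")" + Pattern.quote("¡]");
        String[] inputs = {
            "[! hello ¡world ¡] otra cosa ¡]",  // text from the question
            "[! a]b ¡] rest ¡]"                  // made-up input with a lone ']'
        };
        for (String text : inputs) {
            System.out.println(text);
            System.out.println("  reluctant:     " + firstGroup(RELUCTANT, text));
            System.out.println("  negated class: " + firstGroup(NEGATED_CLASS, text));
            System.out.println("  look-ahead:    " + firstGroup(LOOKAHEAD, text));
            System.out.println("  quoteExcept:   " + firstGroup(generic, text));
        }
    }
}
```

On the question's text every pattern captures hello ¡world (with its surrounding spaces). On the second input the negated class finds no match (null), while the other three capture a]b. The look-ahead form tests (?!¡\]) before consuming each character, so it does a little more work per character than a plain class. Like any pattern built on ., it does not cross line terminators unless Pattern.DOTALL is set.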
