Saturday, February 19, 2011

JavaScript RegEx for div tags

I have a JavaScript variable which holds an HTML page and due to the setup I need to extract everything between <div id="LiveArea"> and </div> from that variable using JavaScript.

Any help is greatly appreciated.

From stackoverflow
  • I'm not sure I follow you when you say "JavaScript variable which holds an HTML page", but if you need to extract the HTML between such a div, you can use the element's innerHTML property.

    
    var e = document.getElementById('LiveArea');
    if(e) alert(e.innerHTML);
    
    
    
  • This should do it:

    pattern = /<div id="LiveArea">(.*?)<\/div>/;
    matches = your_html_var.match(pattern);
    the_string = matches[0];
    
    document.write(the_string);
    
    PhiLho : Should be matches[1] to get the part inside the div. And indeed, one must hope there is no internal div... Might work on well defined context.
    Timothée Boucher : That wouldn't work because the end of the pattern could match a closing tag that doesn't (necessarily) correspond to your opening tag. If you make the expression lazy, it could stop on a closing `div` inside `LiveArea`; if it is greedy, it would stop at the last closing `div`, which again is not necessarily the one corresponding to your opening `div`. Also, PhiLho is right: `matches[0]` will hold the whole pattern matched and `matches[1]` will hold your capturing group.
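To make the two comments above concrete, here is a minimal sketch. The sample `html` string and the variable names are made up for illustration; the patterns are the lazy one from the answer and a greedy variant of it.

```javascript
// Sample input (hypothetical): LiveArea contains a nested div and is followed by another div.
var html = '<div id="LiveArea"><div>inner</div> tail</div><div>other</div>';

var lazy = html.match(/<div id="LiveArea">(.*?)<\/div>/);
var greedy = html.match(/<div id="LiveArea">(.*)<\/div>/);

// Index 0 is the whole match, index 1 is the capturing group.
var lazyWhole = lazy[0];     // '<div id="LiveArea"><div>inner</div>'
var lazyInner = lazy[1];     // '<div>inner' (stops at the nested closing tag)
var greedyInner = greedy[1]; // '<div>inner</div> tail</div><div>other' (runs to the last closing tag)
```

Neither result is the real content of LiveArea, which is why the pattern is only safe when the div is known to contain no other divs.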
  • This will not be possible with just a regular expression unless the HTML inside that div contains no other divs. A pattern like Jeremy's will match the first closing div tag, which wouldn't necessarily be the closing tag for the div#LiveArea element.

    If you have control over the source HTML, you could insert a comment that you could use to match on for the correct "closing" location.

    There are other JavaScript-only options, but each is kludgy or hacky:

    1. Set the innerHTML of a hidden element equal to this string of content, THEN pull the innerHTML you need using mmattax's solution. But you will probably have to perform the 2nd step here with a timeout to give the browser time to evaluate this new HTML and expose it to the DOM.
    2. Actually parse the content, keeping track of opening/closing divs as you come across them so you will then know when you're at the correct </div> tag.
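Option 2 could be sketched roughly as follows. `extractDivById` is a hypothetical helper, not part of any library. It assumes the id attribute is double-quoted, that the id contains no regex special characters, and that no div tags appear inside comments, scripts, or attribute values.

```javascript
// Hypothetical helper: returns the inner HTML of the first <div id="..."> found
// in `html`, or null if the div is missing or its tags are unbalanced.
function extractDivById(html, id) {
  // Locate the opening tag (assumes a double-quoted id attribute).
  var open = new RegExp('<div\\b[^>]*\\bid="' + id + '"[^>]*>', 'i');
  var start = open.exec(html);
  if (!start) return null;

  var contentStart = start.index + start[0].length;

  // Visit every later opening or closing div tag and track the nesting depth.
  var tag = /<(\/?)div\b[^>]*>/gi;
  tag.lastIndex = contentStart;
  var depth = 1;
  var m;
  while ((m = tag.exec(html)) !== null) {
    depth += m[1] ? -1 : 1; // m[1] is "/" for a closing tag
    if (depth === 0) {
      return html.slice(contentStart, m.index);
    }
  }
  return null; // ran out of input before the div was closed
}

var sample = '<body><div id="LiveArea"><p>a</p><div class="x">inner</div>tail</div><div>after</div></body>';
var extracted = extractDivById(sample, 'LiveArea');
```

This is still string scanning rather than real HTML parsing, so it only holds up on reasonably well-formed markup.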
  • var temp = document.createElement('DIV');
    temp.innerHTML = YourVariable;
    var liveArea;
    for (var i = 0; i < temp.childNodes.length; i++)
    {
       if (temp.childNodes[i].id == 'LiveArea')
       {
           liveArea = temp.childNodes[i];
           break;
       }
    }
    
    Peter Bailey : Why is this being voted up? It doesn't even work. HTMLElement.getElementById() is not a standard DOM method. If this solution relies on a 3rd party library, then the response should indicate as such.
    FlySwat : Quite right, for some reason I thought that HTMLElements had getElementById on them (They should), I've corrected it to a way that will work.
    Peter Bailey : I agree, they should have that method. Still, this solution assumes that div#LiveArea will actually be a child node, and not a deeper descendant, which may not be the case w/the source HTML.
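One way to address the deeper-descendant concern is to walk the tree recursively instead of checking only direct children. `findById` below is a hypothetical helper; it relies only on the `id` and `childNodes` properties that DOM nodes expose.

```javascript
// Hypothetical helper: depth-first search for a descendant of `node` with the given id.
function findById(node, id) {
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.id === id) return child;
    var found = findById(child, id); // look inside this child as well
    if (found) return found;
  }
  return null;
}

// In a browser it would be used along these lines:
//   var temp = document.createElement('div');
//   temp.innerHTML = YourVariable;
//   var liveArea = findById(temp, 'LiveArea');
```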
  • I found this article while surfing the web. It takes a DIV id and shows that DIV's content on a new page for printing:

    function getPrint(print_area)
    {
    //Creating new page
    var pp = window.open();
    //Adding HTML opening tag with <HEAD> … </HEAD> portion 
    pp.document.writeln('<HTML><HEAD><title>Print Preview</title>');
    pp.document.writeln('<LINK href=Styles.css type="text/css" rel="stylesheet">');
    pp.document.writeln('<LINK href=PrintStyle.css ' +
                        'type="text/css" rel="stylesheet" media="print">');
    pp.document.writeln('<base target="_self"></HEAD>');
    
    //Adding Body Tag
    pp.document.writeln('<body MS_POSITIONING="GridLayout" bottomMargin="0"');
    pp.document.writeln(' leftMargin="0" topMargin="0" rightMargin="0">');
    //Adding form Tag
    pp.document.writeln('<form method="post">');
    
    //Creating two buttons Print and Close within a HTML table
    pp.document.writeln('<TABLE width=100%><TR><TD></TD></TR><TR><TD align=right>');
    pp.document.writeln('<INPUT ID="PRINT" type="button" value="Print" ');
    pp.document.writeln('onclick="javascript:location.reload(true);window.print();">');
    pp.document.writeln('<INPUT ID="CLOSE" type="button" ' + 
                        'value="Close" onclick="window.close();">');
    pp.document.writeln('</TD></TR><TR><TD></TD></TR></TABLE>');
    
    //Writing print area of the calling page
    pp.document.writeln(document.getElementById(print_area).innerHTML);
    //Ending Tag of </form>, </body> and </HTML>
    pp.document.writeln('</form></body></HTML>');
    

    }

    Call this script with the id of the DIV you want to get:

    btnGet.Attributes.Add("Onclick", "getPrint('YOURDIV');")
    

    It worked exactly as I wanted. Hope it helps.

  • It seems that JavaScript doesn't support lookbehinds, which is very disappointing; that would make this problem so much easier to solve.

    (?<=<div id="LiveArea">).*(?=<\/div>)

    Here are some links that might help out, though.

    As for the issue of nested tags, that is beyond the abilities of regex to solve, so Jeremy's solution is the best you can do with regex. What is more, the contents have to be on a single line: the pattern won't even match if the contents of the div are on separate lines, because there is no 's' flag for JavaScript. I think Peter has given the answer for this one.
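The single-line limitation can be worked around without an 's' flag by using `[\s\S]` in place of `.`, since that character class also matches newlines. A minimal sketch, with a made-up sample string (engines implementing ES2018 later added both the `s` flag and lookbehind, but the workaround does not depend on either):

```javascript
// Sample input (hypothetical): the div's content spans several lines.
var html = '<div id="LiveArea">\n  <p>line one</p>\n  <p>line two</p>\n</div>';

// "." does not match newlines, so this pattern finds nothing.
var dotMatch = html.match(/<div id="LiveArea">(.*?)<\/div>/);

// [\s\S] matches any character, including newlines.
var anyMatch = html.match(/<div id="LiveArea">([\s\S]*?)<\/div>/);
var inner = anyMatch ? anyMatch[1] : null;
```

The nested-div caveat still applies; this only removes the single-line restriction.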

  • Sorry for the late reply. If someone else stumbles on this problem, here is my suggestion, assuming you have access to the source code of the page you are reading from.

    Add HTML comments like this:

    <div id="LiveArea">
    <!--LiveArea-->
    Content here
    <!--EndLiveArea-->
    </div>
    

    Then match it with

    htmlVal.match(/<!--LiveArea-->([\s\S]*?)<!--EndLiveArea-->/);
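Since the markers are fixed strings, the same extraction can also be done without a regex. A minimal sketch, where `extractBetween` is a hypothetical helper name:

```javascript
// Hypothetical helper: returns the text between the first startMarker and the
// next endMarker after it, or null if either marker is missing.
function extractBetween(html, startMarker, endMarker) {
  var start = html.indexOf(startMarker);
  if (start === -1) return null;
  start += startMarker.length;
  var end = html.indexOf(endMarker, start);
  if (end === -1) return null;
  return html.slice(start, end);
}

var htmlVal = '<div id="LiveArea">\n<!--LiveArea-->\nContent here\n<!--EndLiveArea-->\n</div>';
var content = extractBetween(htmlVal, '<!--LiveArea-->', '<!--EndLiveArea-->');
```

Either way, nested divs inside the marked region are no longer a problem, because the match ends at the comment rather than at a closing tag.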
    
  • Let jQuery do the parsing for you:

    $(page_html).find("#LiveArea").html();
    
