Tuesday, May 3, 2011

Online tool for crawling a website and retrieving all meta information for every page

Does anyone know of a free online tool that can crawl any given website and return just the meta keywords and meta description for every page?

From stackoverflow
  • I think you'll have to code it yourself :(

    But don't panic, it's not hard; I think you can do it with a couple of Unix commands :)

  • Assuming you have access to Linux/Unix:

    mkdir temp
    cd temp
    # recursively mirror the site into the current directory
    wget -r SITE_ADDRESS
    

    Then, for keywords:

    egrep -r -h 'meta[^>]+name="keywords' * | sed 's/^.*content="\([^"]*\)".*$/\1/g'
    

    and for descriptions:

    egrep -r -h 'meta[^>]+name="description' * | sed 's/^.*content="\([^"]*\)".*$/\1/g'
    

    If you want all the unique keywords, try:

    egrep -r -h 'meta[^>]+name="keywords' * | sed 's/^.*content="\([^"]*\)".*$/\1/g' | sed 's/\s*,\s*/\n/g' | sort | uniq
    

    I'm sure there's a one-liner or program out there that does this exact thing, and there are definitely easier answers.
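
In that spirit, here is a rough combined pass over the same temp directory from the wget step, printed per page rather than pooled together. It is only a sketch: it assumes bash, GNU grep and sed, that the mirrored pages end in .html or .htm, and, like the commands above, that each meta tag sits on a single line with the attributes written in lowercase.

    # Print the keywords and description for every mirrored page
    # (hypothetical combined version of the commands above;
    #  run from inside the temp directory).
    find . -type f \( -name '*.html' -o -name '*.htm' \) -print0 |
    while IFS= read -r -d '' page; do
        echo "== $page"
        grep -o 'meta[^>]*name="keywords"[^>]*' "$page" \
            | sed -n 's/^.*content="\([^"]*\)".*$/keywords: \1/p'
        grep -o 'meta[^>]*name="description"[^>]*' "$page" \
            | sed -n 's/^.*content="\([^"]*\)".*$/description: \1/p'
    done

For anything messier than that (tags spread over several lines, mixed-case attributes), a real HTML parser is the safer route.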
