Check a Google Sitemap for bad URLs
Create cagsmfbu.py:
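import sys
import httplib2
import xml.dom.minidom as md

H = httplib2.Http()
# Parse the sitemap file named on the command line.
X = md.parse(open(sys.argv[1]))
# Every <loc> element holds one URL.
locs = X.getElementsByTagName("loc")
for loc in locs:
    # The URL is the first (text) child of the <loc> element.
    url = loc.childNodes[0].nodeValue.encode('utf-8')
    try:
        res, content = H.request(url)
        print "%s\t%d" % (url, res.status)
    except Exception:
        # Any request failure (e.g. too many redirects) is reported as TOOMANY.
        print "%s\tTOOMANY" % url
    # Flush after each URL so results show up in the pipe immediately.
    sys.stdout.flush()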
And then run it on the sitemap, saving the tab-separated results to output.tdf:
% python cagsmfbu.py sitemap.xml | tee output.tdf
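
Once output.tdf exists, the bad URLs are the rows whose second column is not a success status. Here is a minimal sketch of that filtering step. The helper name bad_rows and its ok_statuses parameter are made up for illustration, and it assumes each row is exactly "url<TAB>status" as printed by the script above, with only 200 counted as good.

```python
def bad_rows(lines, ok_statuses=("200",)):
    """Yield (url, status) for rows whose status is not in ok_statuses."""
    for line in lines:
        line = line.rstrip("\n")
        # Split on the last tab, so the status is always the final column.
        url, sep, status = line.rpartition("\t")
        if not sep:
            continue  # skip blank or malformed rows that have no tab
        if status not in ok_statuses:
            yield url, status


# Sample rows in the same shape the script prints (hypothetical URLs).
sample = [
    "http://example.com/\t200\n",
    "http://example.com/missing\t404\n",
    "http://example.com/loop\tTOOMANY\n",
]
for url, status in bad_rows(sample):
    print("%s\t%s" % (url, status))
```

To run it on the real results, pass the open file instead of the sample list: bad_rows(open("output.tdf")).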