Search Google from command line

As I had pointed out in my previous entry, I had not been able to search Google from within my Emacs environment. Hence the "need" to write a command-line script that I could call from within Emacs.

The code is not the best I have written and any decent Python programmer would be able to make more improvements to it. If you do something clever with the code, it would be very kind of you to let me know about it too (raj at rajshekhar.net).

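You need to have the PyGoogle module installed. In its unaltered form, the script requires Python 2.3 to run. However, if you remove the part between the "#-- begin hack" and "#-- end hack" comments (see the comments in the code), along with the codecs.register_error line that the hack depends on, it will run with Python 2.2 too.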

#!/usr/bin/python2.3
import google,sys,codecs
from sgmllib import SGMLParser

# HTML Stripper class to strip out html from the google search
# returned.  shamelessly copy pasted from
# http://mail.python.org/pipermail/tutor/2002-September/017573.html

class HTMLStripper(SGMLParser):
    def __init__(self):
        SGMLParser.__init__(self)
        self._text = []

    def handle_data(self, data):
        self._text.append(data)

    def read_text(self):
        return ''.join(self._text)


def strip_html(text):
    stripper = HTMLStripper()
    stripper.feed(text)
    return stripper.read_text()

print "Searching the World Live Web "

google.setLicense('your google key') # must get your own key from  http://www.google.com/apis/

codecs.register_error('xml', codecs.xmlcharrefreplace_errors)

n_show_results = 10 #change the number of search results that are shown from here 
search_str = ""
# join all command line arguments into a single query string
search_str = " ".join(sys.argv[1:])
    
print "Searching for " ,search_str

data = google.doGoogleSearch(search_str,0,n_show_results)
   
print 'Search took %f seconds and I found a total of %d results' % (data.meta.searchTime,  data.meta.estimatedTotalResultsCount)

for result in data.results:
    temp = result.title
    
    
    # see  http://www.informit.com/articles/article.asp?p=31272&seqNum=5
    # to know why this ugly hack is needed
    
    #-- begin hack

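    # the encoder returns an (encoded_string, length) tuple, so take
    # the first element instead of the string form of the whole tuple
    in_tuple = codecs.getencoder('ASCII')(temp, 'xml')
    in_str = in_tuple[0]
    print 'Title\t:', strip_html(in_str)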

    #-- end hack

    # if you just want to use the script from the command line and
    # not call it from emacs, you can remove the block between the
    # begin hack and end hack comments and replace it with the
    # following line
    # print 'Title\t:', result.title
    
    
    print 'URL\t:', result.URL
    print
    
print "\n "
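
The sgmllib module used by HTMLStripper above does not exist in Python 3. For anyone trying the same trick on a newer interpreter, here is a rough sketch of the stripper built on html.parser from the standard library instead. This is my assumption of a reasonable port, not part of the original script, and the sample string at the end is made up for illustration.

```python
from html.parser import HTMLParser


class HTMLStripper(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before
        # the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self._text = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self._text.append(data)

    def read_text(self):
        return "".join(self._text)


def strip_html(text):
    stripper = HTMLStripper()
    stripper.feed(text)
    stripper.close()  # flush any text still buffered by the parser
    return stripper.read_text()


# Hypothetical title, in the style of a search result with bold markup.
print(strip_html("<b>Emacs</b> &amp; Python"))
```

The tags are dropped and only the text between them is kept, which is all the script needs for a title line.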

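The hack in the loop above exists because titles can contain characters outside ASCII. The following is a small sketch of the same idea in Python 3, where the 'xmlcharrefreplace' error handler is built in and does not need to be registered under a custom name. The function name to_ascii_charrefs and the sample title are my own, chosen only for illustration.

```python
import codecs


def to_ascii_charrefs(text):
    """Return text as ASCII, with other characters as &#NNN; references."""
    # The encoder returns an (encoded_bytes, characters_consumed) tuple,
    # so only the first element holds the encoded text.
    encoded, _consumed = codecs.getencoder("ascii")(text, "xmlcharrefreplace")
    return encoded.decode("ascii")


# Hypothetical title containing accented letters and an en dash.
print(to_ascii_charrefs("Caf\u00e9 \u2013 r\u00e9sum\u00e9"))
```

Every character that ASCII cannot represent is replaced by its numeric character reference, so the output is safe to print in a buffer or terminal that only handles ASCII.
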
Additional information