Search Google from command line
Last Updated (Monday, 20 September 2004 09:13) Monday, 20 September 2004 08:40
As I pointed out in my previous entry, I had not been able to search Google from within my Emacs environment. Hence the "need" to write a command-line script that I could call from within Emacs.
The code is not the best I have written and any decent Python programmer would be able to make more improvements to it. If you do something clever with the code, it would be very kind of you to let me know about it too (raj at rajshekhar.net).
You need to have the PyGoogle module installed. In its unaltered form, the script requires Python 2.3 to run. However, if you remove the part marked as the ugly hack (see the comments in the code), along with the codecs.register_error call that supports it, it will run with Python 2.2 too.
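The Python 2.3 requirement comes from the codec error-handler machinery (codecs.register_error and codecs.xmlcharrefreplace_errors), which first appeared in that release. The sketch below shows in isolation what that handler does to non-ASCII text. It is written in Python 3 syntax, which is an assumption on my part since the script itself targets Python 2.3, and to_ascii is a hypothetical helper name that does not appear in the script.

```python
import codecs

# Register the built-in XML character-reference handler under the
# name 'xml', the same registration the script performs.
codecs.register_error('xml', codecs.xmlcharrefreplace_errors)


def to_ascii(text):
    """Encode text as ASCII, replacing other characters with &#NNN; references.

    Hypothetical helper, used only for this illustration.
    """
    # The encoder returns an (encoded bytes, length consumed) tuple;
    # only the first element is the encoded text.
    encoded, _length = codecs.getencoder('ascii')(text, 'xml')
    return encoded.decode('ascii')


print(to_ascii('caf\u00e9'))
```

Characters that ASCII cannot represent come out as numeric character references instead of raising an error, so the output is safe to print in a terminal or buffer that only handles ASCII.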
#!/usr/bin/python2.3
import google,sys,codecs
from sgmllib import SGMLParser
# HTML Stripper class to strip out html from the google search
# returned. shamelessly copy pasted from
# http://mail.python.org/pipermail/tutor/2002-September/017573.html
class HTMLStripper(SGMLParser):
    def __init__(self):
        SGMLParser.__init__(self)
        self._text = []

    def handle_data(self, data):
        self._text.append(data)

    def read_text(self):
        return ''.join(self._text)

def strip_html(text):
    stripper = HTMLStripper()
    stripper.feed(text)
    return stripper.read_text()

print "Searching the World Live Web "
google.setLicense('your google key') # must get your own key from http://www.google.com/apis/
codecs.register_error('xml', codecs.xmlcharrefreplace_errors)
n_show_results = 10 #change the number of search results that are shown from here
search_str = ""
for i in range(1,len(sys.argv)):
    search_str = search_str + " " + sys.argv[i]
print "Searching for " ,search_str
data = google.doGoogleSearch(search_str,0,n_show_results)
print 'Search took %f time and I found a total of %d results' % (data.meta.searchTime, data.meta.estimatedTotalResultsCount)
for result in data.results:
    temp = result.title
    # see http://www.informit.com/articles/article.asp?p=31272&seqNum=5
    # to know why this ugly hack is needed
    #-- begin hack
    # the encoder returns an (encoded string, length) tuple; keep only the string
    in_tuple = codecs.getencoder('ASCII')(temp, 'xml')
    in_str = in_tuple[0]
    print 'Title\t:', strip_html(in_str)
    #-- end hack
    # if you just want to use the script from the command line and not
    # call it from emacs, you can remove the part between the #-- hack
    # markers and replace it with the following line
    # print 'Title\t:', result.title
    print 'URL\t:', result.URL
    print
print "\n "
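The sgmllib module used by the HTML stripper above was removed in Python 3. For anyone trying the same idea on a newer interpreter, here is a minimal sketch of the stripper built on html.parser from the standard library. This is my own adaptation and not part of the original script; the sample title passed to it is made up for illustration.

```python
from html.parser import HTMLParser


class HTMLStripper(HTMLParser):
    """Collect only the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        # convert_charrefs=True turns references such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self._text = []

    def handle_data(self, data):
        self._text.append(data)

    def read_text(self):
        return ''.join(self._text)


def strip_html(text):
    stripper = HTMLStripper()
    stripper.feed(text)
    stripper.close()  # flush any text the parser is still buffering
    return stripper.read_text()


# Made-up sample title with bold markup around a search term
print(strip_html('<b>Python</b> Programming Language'))
```

The structure mirrors the SGMLParser version: handle_data gathers text fragments and read_text joins them, so the rest of a script could call strip_html unchanged.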