BossArray: list-like Yahoo search results
It's been a while since I've worked on any small fun projects just for my own enjoyment. This is something I've been wanting to put together for a couple of weeks, and I finally had a couple of chunks of time to do it. I am pleased to announce boss_array.py, a pleasant wrapper around the Yahoo BOSS Mashup Framework that allows search results to be treated like a regular list. Let's start off with an example.
>>> from boss_array import BossArray
>>> x = BossArray("Tokyo cheap hotel")
>>> x[0]
{u'dispurl': u'directrooms.com/japan/hotels/tokyo-hotels/price1.htm', u'title': u'Tokyo Hotels in Japan - DirectRooms', u'url': u'http://directrooms.com/japan/hotels/tokyo-hotels/price1.htm', u'abstract': u'Tokyo discount hotel reservations from DirectRooms. Save ... The hotel is located on the eastern side of Tokyo, Nihonbashi area, ... 5 >all Tokyo hotels on 1 ...', u'clickurl': u'http://directrooms.com/japan/hotels/tokyo-hotels/price1.htm', u'date': u'2008/05/03', u'size': u'53291'}
>>> len(x)
298267
>>> x[0:10]
u"All ten results would display, removed for readability."
>>> x[20:60]
u"Results 20-60 would display..."
>>> x[0:200]
u"Results 0-200 would display..."
There are a couple of convenient things happening here:
- You specify the query once when you create the BossArray, and afterwards it remembers your search terms.
- It allows very easy access to the number of search results (via the len function).
- It allows you to retrieve more than 50 results at once. The search API is limited to 50 results per query, but the BossArray will break large requests apart into multiple queries. At the moment they are processed sequentially, so it can get a bit slow if you are retrieving a very large quantity of results at once.
- All search results are cached by the BossArray. That means if you retrieve x[0:20] and then retrieve x[5], it doesn't require an HTTP request. If you attempted to retrieve x[5:15], it would use the cached copy as well.
It works well in more complex situations too. Consider this:
>>> _ = x[0:50]
>>> _ = x[100:150]
>>> _ = x[0:200]
BossArray is smart enough to handle that correctly. In the final lookup, x[0:200], it will use the cached results from 0-50 and 100-150, and perform two queries to fill in the missing gaps between 50-100 and 150-200.
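To make the gap-filling idea concrete, here is a minimal sketch of how the missing ranges could be computed and then split to respect a 50-result limit. It is written in Python 3 so it runs today, and it is not taken from boss_array.py: the function names missing_ranges and split_into_requests, and the representation of the cache as a list of half-open (begin, end) ranges, are assumptions made for illustration.

```python
def missing_ranges(cached, start, stop):
    """Return the uncached gaps inside [start, stop) as (begin, end) pairs.

    `cached` is a list of non-overlapping, half-open (begin, end) ranges.
    Hypothetical helper, not part of boss_array.py.
    """
    gaps = []
    cursor = start
    for begin, end in sorted(cached):
        if end <= cursor:
            continue  # this cached range lies entirely before the cursor
        if begin >= stop:
            break  # this cached range lies entirely after the request
        if begin > cursor:
            gaps.append((cursor, begin))  # uncached stretch before this range
        cursor = max(cursor, end)
    if cursor < stop:
        gaps.append((cursor, stop))  # uncached tail of the request
    return gaps


def split_into_requests(gaps, limit=50):
    """Break each gap into pieces no larger than the per-query limit."""
    requests = []
    for begin, end in gaps:
        for offset in range(begin, end, limit):
            requests.append((offset, min(offset + limit, end)))
    return requests


# The scenario from the text: 0-50 and 100-150 are cached, 0-200 is requested.
cached = [(0, 50), (100, 150)]
gaps = missing_ranges(cached, 0, 200)
requests = split_into_requests(gaps)
print(gaps)
print(requests)
```

With this cache, the sketch finds the gaps 50-100 and 150-200, and since each fits within the limit it issues exactly two queries, matching the behaviour described above. A gap longer than 50 results would simply be split into several consecutive requests.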
Usage
You've already seen the usage in the above examples, but here are a few more. First we'll open the eleventh search result in a web browser.
>>> from boss_array import BossArray
>>> x = BossArray("Python")
>>> import webbrowser
>>> webbrowser.open(x[10]['url'])
Next, let's look at displaying all the URLs for the first one hundred search results.
>>> from boss_array import BossArray
>>> x = BossArray("Restaurants near Suitengumae Station")
>>> urls = [a['url'] for a in x[:100]]
>>> for url in urls:
...     print url
...
u"url 1"
u"url 2"
u"..."
Basically, you use it like a Python list.
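For readers curious about what "like a Python list" takes, the sketch below shows one way a class can support len(), indexing, and slicing while fetching results lazily and caching them. It is a Python 3 illustration under stated assumptions, not the BossArray source: the class name LazyResults, its fetch(start, count) callback, and the fake_fetch stand-in for a real search call are all hypothetical.

```python
class LazyResults:
    """Hypothetical list-like wrapper; not the actual BossArray code."""

    def __init__(self, fetch, total):
        self._fetch = fetch    # assumed callable(start, count) -> list of results
        self._total = total    # total number of results reported by the backend
        self._cache = {}       # maps result index -> result

    def __len__(self):
        return self._total

    def _load(self, indexes):
        # Fetch each contiguous run of uncached indexes with a single call.
        missing = [i for i in indexes if i not in self._cache]
        run_start = 0
        for pos in range(1, len(missing) + 1):
            if pos == len(missing) or missing[pos] != missing[pos - 1] + 1:
                first = missing[run_start]
                count = pos - run_start
                for offset, item in enumerate(self._fetch(first, count)):
                    self._cache[first + offset] = item
                run_start = pos

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices clamps the slice to the sequence length
            indexes = range(*key.indices(len(self)))
            self._load(indexes)
            return [self._cache[i] for i in indexes]
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError("result index out of range")
        self._load([key])
        return self._cache[key]


calls = []  # records every simulated request


def fake_fetch(start, count):
    """Stand-in for a real search request (an assumption for this demo)."""
    calls.append((start, count))
    return ["result %d" % i for i in range(start, start + count)]


x = LazyResults(fake_fetch, total=1000)
first_twenty = x[0:20]   # one simulated request
fifth = x[5]             # answered from the cache, no new request
print(len(x), fifth, calls)
```

Because only __len__ and __getitem__ are defined, plain for loops and list comprehensions over the object also work, which is the same convenience the examples above rely on.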
Setup
Setting up BossArray is as simple as setting up the Yahoo BOSS Mashup framework, and then starting Python from a directory containing a config.json file (as explained in the Yahoo BOSS Mashup framework setup instructions).
Repository and Download
You can download and contribute to BossArray at its GitHub repository.