21 Oct 2011
In the previous seven posts I’ve gone through all the stages in building a search engine. If you want to try
and run it for yourself and tweak it to make it even better then you can. I’ve put the
code up on GitHub. All I ask is that if you beat Google,
you give me a credit somewhere.
When you’ve downloaded the code it should prove to be quite simple to get running. First you’ll need to edit
settings.py. It should work out of the box, but you should change the USER_AGENT
setting to something
unique. You may also want to adjust some of the other settings, such as the database connection or the
CouchDB URLs. To set up the CouchDB views, run python manage.py update_couchdb.
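As a rough sketch, the edits to settings.py might look something like this. USER_AGENT is the setting named above; the other names are illustrative, so check the downloaded file for the project's actual ones.

```python
# Hypothetical settings.py fragment -- only USER_AGENT is a
# setting named in the post; the rest are placeholders.

# Identify your crawler uniquely and politely to webmasters.
USER_AGENT = "MySearchBot/0.1 (+http://example.com/bot)"

# Where crawled pages are stored (illustrative setting name).
COUCHDB_URL = "http://localhost:5984/crawler"
```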
Next, to run the celery daemon you’ll need to type the following two commands:
python manage.py celeryd -Q retrieve
python manage.py celeryd -Q process
This sets up the daemons to monitor the two queues and process the tasks. As mentioned in a previous post,
two queues are needed to prevent one set of tasks from swamping the other.
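The way tasks end up on those two queues can be sketched with a Celery routing table like the one below. The task names are hypothetical, not necessarily the project's own; the point is that download tasks and processing tasks are pinned to separate queues.

```python
# Sketch of Celery queue routing (Celery 2.x era, as used with
# django-celery at the time). Task names here are illustrative.
# Network-bound retrieval and CPU-bound processing each get
# their own queue, so neither can starve the other.
CELERY_ROUTES = {
    "tasks.retrieve_page": {"queue": "retrieve"},
    "tasks.process_page": {"queue": "process"},
}
```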
Read More...
19 Oct 2011
The key ingredients of our search engine are now in place, but we face a problem. We can download webpages and
store them in CouchDB. We can rank them in order of importance and
query them using Whoosh, but the internet is big,
really big! A
single server doesn’t even come close to being able to hold all the information that you would want it to -
Google has an estimated
900,000
servers. So how do we effectively scale the software we’ve written so far?
The reason I started writing this series was to investigate how well Celery’s integration with CouchDB works.
This gives us an immediate win in terms of scaling as we don’t need to worry about a different backend, such
as RabbitMQ. Celery itself is designed to scale so we can run
celeryd
daemons on as many boxes as we like and the jobs will be divided amongst them. This means that
our indexing and ranking processes will scale easily.
CouchDB is not designed to scale across multiple machines, but there is some mature software,
CouchDB-lounge, that does just that. I won’t go into how
to set this up, but fundamentally you set up a proxy that sits in front of your CouchDB cluster and shards
the data across the nodes. It deals with the job of merging view results and managing where the data is
actually stored so you don’t have to. O’Reilly’s CouchDB: The Definitive Guide has a chapter
on clustering that is well worth a read.
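To make the sharding idea concrete, here is a minimal Python sketch of hash-based shard selection. This illustrates the principle only; CouchDB-lounge's actual hashing, view merging and resharding are considerably more involved.

```python
import zlib

def shard_for(doc_id, num_shards):
    """Pick which CouchDB node stores a document.

    Illustrative only: a stable hash of the document id, modulo
    the number of shards, so the same id always lands on the
    same node. A real lounge proxy also merges view results and
    handles resharding, which this ignores.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

# The same url always maps to the same shard:
shard = shard_for("http://example.com/", 4)
```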
Read More...
17 Oct 2011
This weekend I joined the hysterical masses and upgraded my increasingly ancient iPhone 3G to a shiny new 64GB
iPhone 4S. Except that it was actually a bit of an anticlimax. I went into my local
O2 shop at about 10:30am on Saturday morning, the day after the launch, and
purchased a phone. No queueing, no raging hordes. I didn’t even have to shove a granny out of the way to get
one. After handing over my credit card, cringing at the expense, it was back home to enjoy the
famous Apple unboxing experience.
I wish I’d never upgraded my 3G to iOS 4.2. Up until that point it was a great phone. Afterwards it was slow
and applications would repeatedly crash on startup. Did I mention it was slow?
It’s hard to express just how much quicker the 4S is compared to my 3G. Often just typing my passcode
would be too quick for the 3G and it would miss one of the numbers forcing me to go back. No danger of this
with the 4S though. Launching applications, browsing the web and taking photos are all super speedy.
Although the screen is the same as the iPhone 4’s, it’s still incredible. It’s so bright and sharp it’s really a
joy to use. It really comes into its own when browsing webpages that are designed for bigger screens. The
extra detail really helps you to work out where to zoom in.
Read More...
13 Oct 2011
We’re nearing the end of our plot to create a Google-beating search engine (in my dreams at least) and in
this post we’ll build the interface to query the index we’ve built up. Like Google’s, the interface is very
simple, just a text box on one page and a list of results on another.
To begin with we just need a page with a query box. To make the page slightly more interesting we’ll also
include the number of pages in the index, and a list of the top documents as ordered by our ranking algorithm.
In the templates on this page we reference base.html,
which provides the boilerplate code needed to
make an HTML page.
Read More...
11 Oct 2011
In this post we’ll continue building the backend for our search engine by implementing the algorithm we
designed in the last post for ranking pages. We’ll also build an index of our pages with
Whoosh, a pure-Python full-text indexer and
query engine.
To calculate the rank of a page we need to know what other pages link to a given url, and how many links that
page has. The code below is a CouchDB map called page/links_to_url. For each page this will output a
row for each link on the page with the url linked to as the key and the page’s rank and number of links as the
value.
function (doc) {
  if (doc.type == "page") {
    // One row per outbound link: key is the target url,
    // value carries this page's rank and its link count.
    for (var i = 0; i < doc.links.length; i++) {
      emit(doc.links[i], [doc.rank, doc.links.length]);
    }
  }
}
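To show how those rows get used, here is a hedged Python sketch of combining them into a PageRank-style score for one url. Each row's value is [linking page's rank, linking page's link count], so the new rank is a damped sum of rank-over-link-count contributions. The damping factor and formula are the standard PageRank ones; the series' ranking post may differ in detail.

```python
# Standard PageRank damping factor; an assumption, not
# necessarily the value the project uses.
DAMPING = 0.85

def rank_from_rows(rows, damping=DAMPING):
    """Combine view rows for one url into a rank.

    Each row is (linking_page_rank, linking_page_link_count),
    matching the value emitted by page/links_to_url.
    """
    contribution = sum(rank / links for rank, links in rows)
    return (1 - damping) + damping * contribution

# Two pages link here: one of rank 1.0 with 2 outbound links,
# one of rank 0.5 with a single outbound link.
new_rank = rank_from_rows([(1.0, 2), (0.5, 1)])
```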
Read More...