Scalable Collaborative Filtering With MongoDB

Many websites have some form of recommendation system. While it’s simple to create a recommendation system for small amounts of data, how do you create a system that scales to huge amounts of data?

How to actually calculate the similarity of two items is a complicated topic with many possible solutions. Which one is appropriate depends on your particular application. If you want to find out more I suggest reading the excellent Programming Collective Intelligence (Amazon affiliate link) by Toby Segaran.

We’ll take the simplest method for calculating similarity: the number of users who have visited both pages as a percentage of the number who have visited either. If Page 1 was visited by users A, B and C, and Page 2 was visited by A, C and D, then A and C visited both pages while A, B, C and D visited at least one of them, so the similarity is 2 out of 4, or 50%.
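The measure described above is commonly known as the Jaccard index. As a minimal sketch of the calculation in plain Python (the function name and the in-memory sets are illustrative assumptions, not the MongoDB-based implementation the full post goes on to build):

```python
def jaccard_similarity(visitors_a, visitors_b):
    """Fraction of users who visited both pages out of those who visited either."""
    a, b = set(visitors_a), set(visitors_b)
    either = a | b
    if not either:
        # Assumption: two pages with no visitors at all are treated as dissimilar.
        return 0.0
    return len(a & b) / len(either)


# The worked example from the text: Page 1 seen by A, B, C; Page 2 seen by A, C, D.
page_1 = {"A", "B", "C"}
page_2 = {"A", "C", "D"}
similarity = jaccard_similarity(page_1, page_2)
print(f"{similarity:.0%}")  # 50%
```

This works for a handful of pages held in memory. Comparing every pair of pages this way grows quadratically with the number of pages, which is the scaling problem the post sets out to address.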

Read More...

Steve Jobs and the Lean Startup

Steve Jobs

On my 25-minute train journey to work each morning I like to pass the time by reading. The two most recent books I’ve read are The Lean Startup: How Constant Innovation Creates Radically Successful Businesses by Eric Ries and Steve Jobs by Walter Isaacson (both links contain an affiliate ID). Although one is a biography and the other is a book on project management, they actually cover similar ground, and both are books that people working in technology should read.

Walter Isaacson’s book has been extensively reviewed and dissected, so I’m not going to go into detail on it. The book is roughly divided into two halves. The first section is on the founding of Apple, Pixar and NeXT. This section serves as an inspirational guide to setting up your own company. The joy of building a great product and defying the odds against a company succeeding comes across very strongly. The later section, following Jobs’s return to Apple, is much more about the nuts and bolts of running a huge corporation. While it’s an interesting guide to how Apple got to where it is today, it lacks the excitement of the earlier chapters.

Read More...

Django ImportError Hiding

A little while ago I was asked what my biggest gripe with Django was. At the time I couldn’t think of a good answer because, since I started using Django in the pre-1.0 days, most of the rough edges have been smoothed. Yesterday, though, I encountered an error that made me wish I had thought of it at the time.

The code that produced the error looked like this:

from django.db import models
class MyModel(models.Model):
    ...
    def save(self):
        models.Model.save(self)
        ...
    ...

The error that was raised was AttributeError: 'NoneType' object has no attribute 'Model'. This means that rather than containing a module object, models was None. Clearly this is impossible as the class could not have been created if that was the case. Impossible or not, it was clearly happening.

Read More...

Back Garden Weather in CouchDB (Part 5)

After a two-week gap the recent snow in the UK has inspired me to get back to my series of posts on my weather station website, WelwynWeather.co.uk. In this post I’ll discuss the records page, which shows details such as the highest and lowest temperatures, and the heaviest periods of rain.

From a previous post in this series you’ll remember that the website is implemented as a CouchApp. These are JavaScript functions that run inside the CouchDB database, and while they provide quite a lot of flexibility you do need to tailor your code to them.

On previous pages we have used CouchDB’s map/reduce framework to summarise data, then used a list function to display the results. The records page could take a similar approach, but there are some drawbacks to that. Unlike the rest of the pages, the information on the records page consists of a number of unrelated numbers. While we could create a single map/reduce function to process all of them at once, that function would quickly grow and become unmanageable, so instead we’ll calculate the statistics individually and use AJAX to load them dynamically into the page.

Read More...

Hackathons, and why your company needs one

I could wax lyrical about how programming is an art form and requires a great deal of creativity. However, it’s easy to lose focus on this in the middle of creating project specs and servicing your technical debt. Like many companies, we recently held a hackathon event where we split up into teams and worked on projects suggested by the team members.

Different teams took different approaches to the challenge: one team set about integrating an open source code review site into our development environment, while others investigated how some commercial technologies could be useful to us. My team built a collaborative filtering system using MongoDB. I’ll post about that project in the future, but in this post I want to focus on what we learnt about running a company hackathon event.

If you’re lucky you’ll work in a company that’s focused on technology and you’ll always be creating new and interesting things. In the majority of companies technology is a means to an end, rather than the goal. In that case it’s easy to become so engrossed in the day-to-day work that you forget to innovate or to experiment with new technologies. A hackathon is a great way to take a step back and try something new for a few days.

Running a hackathon event should be divided into three stages: the preparation, the event itself and the follow-up. Before the event you need to take some time to collect ideas and do some preliminary research. The event itself should be a whirlwind of pumping out code and building something exciting. Afterwards you need to take some time to demonstrate what you’ve built, and share what you’ve learnt.

Read More...