Searching Stemmed Fields With Whoosh

Whoosh is quite a nice pure-Python full-text search engine. While it is still being actively developed and is suitable for production usage, there are still some rough edges. One problem that stumped me for a while was searching stemmed fields.

Stemming is where you take the endings off words, such as the ‘ings’ on the word ‘endings’. This reduces the accuracy of searches but greatly increases the chances of users finding something related to what they were looking for. To create a stemmed field you need to tell Whoosh to use the StemmingAnalyzer, as shown in the schema definition below.

from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import Schema, TEXT, ID

# The text field is run through the stemming analyzer when it is indexed
schema = Schema(id=ID(stored=True, unique=True),
                text=TEXT(analyzer=StemmingAnalyzer()))
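
To make the trade-off concrete, here is a minimal sketch of what stemming does to an index. It does not use Whoosh at all: `toy_stem`, `build_index` and `search` are hypothetical helpers, and the crude suffix stripping only stands in for the real stemming algorithm that StemmingAnalyzer applies.

```python
# Toy illustration only: a crude suffix stripper, not the stemmer Whoosh uses.
SUFFIXES = ("ings", "ing", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def build_index(docs):
    """Map each stemmed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(toy_stem(word), set()).add(doc_id)
    return index


def search(index, term):
    """Stem the query term the same way the indexed text was stemmed."""
    return sorted(index.get(toy_stem(term), set()))


docs = {"a": "happy endings", "b": "the end"}
index = build_index(docs)
matches = search(index, "ending")  # finds both documents via the stem "end"
```

The important detail is that the query term goes through the same stemming step as the indexed text. If only one side is stemmed, the terms never line up and the search finds nothing.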

Custom Podcasts With MythTV

I love listening to both BBC Radio 4 and BBC 6 Music. As with the rest of the BBC radio stations, a significant proportion of their shows are available as podcasts. Unfortunately this is not true of all the shows, and for those that feature music, such as Adam & Joe or Steve Lamacq, the podcasts contain only the talking.

I watch almost all of my TV through MythTV, which records all of my favourite shows automatically. On my way to work I like to listen to podcasts that iTunes downloads automatically. Would it be possible to have MythTV record shows that aren’t available as podcasts and sync them to my iPhone automatically?

Recording a radio show with MythTV is no different to recording a TV show, so that’s not a problem. MythTV also provides the ability to run a script after certain shows have been recorded. All that is required is a script that converts the recording into an MP3 file and builds an RSS feed which can be read by iTunes.
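
A rough sketch of the two halves of such a script might look like the following. It assumes ffmpeg with the libmp3lame encoder is installed, and `mp3_command`, `build_feed`, the episode dictionary keys and the example URLs are all made up for illustration; the real script may be organised quite differently.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def mp3_command(recording_path, mp3_path):
    """Build an ffmpeg command that drops the video and encodes the audio as MP3.

    Assumes ffmpeg is on the PATH; pass the list to subprocess.run to execute it.
    """
    return ["ffmpeg", "-i", recording_path, "-vn",
            "-codec:a", "libmp3lame", mp3_path]


def build_feed(title, link, episodes):
    """Return an RSS 2.0 document with one <enclosure> per episode.

    Each episode is a dict with the (hypothetical) keys
    'title', 'url', 'size' in bytes and 'timestamp' in seconds since the epoch.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for episode in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = episode["title"]
        ET.SubElement(item, "guid").text = episode["url"]
        ET.SubElement(item, "pubDate").text = formatdate(episode["timestamp"])
        # Podcast clients download the file named by the enclosure element
        ET.SubElement(item, "enclosure", url=episode["url"],
                      length=str(episode["size"]), type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")


feed = build_feed("Recorded Radio", "http://example.com/radio/", [
    {"title": "Episode 1", "url": "http://example.com/radio/ep1.mp3",
     "size": 1234, "timestamp": 0},
])
```

The feed only needs to be regenerated each time a new recording has been converted, so both steps fit naturally in the script that MythTV runs after a recording finishes.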


CouchDB Document Cache

It’s well known that one of the best things you can do to speed up CouchDB is to use bulk inserts to add or update many documents at one time.

Bulk updates are easy to use if you’re just blindly inserting documents into the database, because you can just maintain a list of documents. However, a scheme that I often use is to call a view to determine whether a document representing an object exists, then update it if it does or add a new document if it doesn’t. To help make this easier I use the DocCache class given below.

The cache contains two interesting methods, get and update. Rather than writing directly to CouchDB when you want to add or update a document, just pass the document to update. This will cache the document and periodically save the cached documents in a bulk update.

It is possible that you will retrieve a document from CouchDB for which an updated version is still waiting in the cache. To avoid the possibility that changes get lost you should pass the retrieved document to get. This will either return the document you passed in or the document that’s waiting to be saved if it exists in the cache. Because there is a gap between when you ask for a document to be saved and when it actually is saved, any views you use may be out of date, but that’s the cost of faster updates with CouchDB.
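
The behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the class from the full post: it assumes every document already carries an `_id`, and the `bulk_save` callable, the `max_pending` threshold and the `flush` method are hypothetical names. In practice `bulk_save` would wrap whatever your client library offers for CouchDB’s `_bulk_docs` endpoint.

```python
class DocCache:
    """Queue documents in memory and write them out in bulk batches.

    Sketch only: assumes each document is a dict with an '_id' key.
    """

    def __init__(self, bulk_save, max_pending=100):
        self._bulk_save = bulk_save      # callable taking a list of documents
        self._max_pending = max_pending  # flush once this many docs are queued
        self._pending = {}               # _id -> document waiting to be saved

    def get(self, doc):
        """Return the queued version of doc if there is one, else doc itself."""
        return self._pending.get(doc["_id"], doc)

    def update(self, doc):
        """Queue doc for saving, flushing when the batch is full."""
        self._pending[doc["_id"]] = doc
        if len(self._pending) >= self._max_pending:
            self.flush()

    def flush(self):
        """Write every queued document in a single bulk call."""
        if self._pending:
            self._bulk_save(list(self._pending.values()))
            self._pending = {}


batches = []
cache = DocCache(batches.append, max_pending=2)
cache.update({"_id": "a", "count": 2})
stale = {"_id": "a", "count": 1}   # e.g. an older copy returned by a view
current = cache.get(stale)         # the queued copy wins over the stale one
cache.update({"_id": "b", "count": 5})  # second doc fills the batch and flushes
```

Remember to call `flush` once more at the end of a run, otherwise a partly filled batch never reaches the database.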


Charming Roulette

Recently I went to a wedding which had a casino theme. To keep the guests entertained they gave every guest $100 from the Bank Of Fun to spend on the roulette and blackjack tables. I decided to play roulette and I knew that the best way to maximise my chances of winning was to bet only on odd or even and to double my bet whenever I lost. At one point I was 2.6x up on my initial stake, but unfortunately, as you’d expect, I eventually lost the lot.

I want to see what I could have done to increase my peak winnings, and to try my best to leave the table with a positive cash flow. To do this we’ll simulate a roulette table using Python and try out various betting strategies. The roulette wheel that was used at the wedding was an American wheel and featured the numbers 1 to 36 as well as 0 and 00. A bet on odd or even wins if a number from 1 to 36 comes up and it matches your choice. 0 or 00 will lose you your money. If you win, your stake is doubled. Because only 18 of the 38 pockets win, betting on odd or even gives you a 47% chance of winning (18/38).
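
As a starting point, a simulation of the double-after-a-loss strategy might look something like the sketch below. The function and parameter names are hypothetical, the $1 base bet is an assumption, and capping the bet at the remaining bankroll is just one way of handling the point where you can no longer afford to double.

```python
import random

# American wheel: 0 and 00 plus the numbers 1 to 36, 38 pockets in total
POCKETS = ["0", "00"] + [str(n) for n in range(1, 37)]


def odd_wins(pocket):
    """A bet on odd wins only for odd numbers; 0 and 00 always lose."""
    return pocket not in ("0", "00") and int(pocket) % 2 == 1


WIN_PROBABILITY = sum(odd_wins(p) for p in POCKETS) / len(POCKETS)  # 18/38


def play_martingale(bankroll=100, base_bet=1, max_spins=1000, rng=None):
    """Bet on odd, doubling after each loss and resetting after each win.

    Returns (final bankroll, peak bankroll reached during the session).
    """
    rng = rng or random.Random()
    bet = base_bet
    peak = bankroll
    for _ in range(max_spins):
        if bankroll <= 0:
            break
        bet = min(bet, bankroll)  # cannot stake more than is left
        if odd_wins(rng.choice(POCKETS)):
            bankroll += bet
            bet = base_bet
        else:
            bankroll -= bet
            bet *= 2
        peak = max(peak, bankroll)
    return bankroll, peak


final, peak = play_martingale(rng=random.Random(1))
```

Passing in a seeded `random.Random` makes a session repeatable, which helps when comparing strategies against the same sequence of spins.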


Testing A Facebook Connect Site

I’ve been developing a website in my spare time. Because I want to add plenty of social features it makes sense to let users log in using Facebook Connect. The Facebook platform is by far the most successful social platform, with many developers having created applications and websites that use it. I expected that the experience for developers would be a good one. Unfortunately, I was disappointed.

Facebook makes it easy to register an application and provides links to libraries that wrap their API and make it easy to get started. What Facebook don’t provide, however, is a downloadable version of their API to test with. Facebook have made some effort to support test users, but you have to open ports in your firewall and use your real Facebook account to test with. Testing a new user signing up for your app is really quite a chore. Automating this sort of test is essentially impossible.

In an ideal world Facebook would produce a downloadable program that you can use to automatically create users, programmatically log in users and generally automatically test all the parts of your code. The danger is that they’d have to give you a downloadable copy of their website code. Google App Engine gives a similar downloadable environment, and you can’t say that Google don’t have a load of code that they don’t want to give away!

The Facebook API is pretty simple to get started with, and within a couple of minutes you’ll have the code written to log a user in. Checking that it all works, though, is a much tougher challenge…
