21 Jan 2010
Whoosh is quite a nice pure-python full text search engine. While it is still being
actively developed and is suitable for production usage, there are still some rough edges. One problem that
stumped me for a while was searching stemmed fields.
Stemming is where you take the endings off words, such as the ‘ings’ in ‘endings’. This reduces the
accuracy of searches but greatly increases the chances of users finding something related to what they were
looking for. To create a stemmed field you need to tell Whoosh to use the StemmingAnalyzer, as
shown in the schema definition below.
from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import Schema, TEXT, ID

schema = Schema(id=ID(stored=True, unique=True),
                text=TEXT(analyzer=StemmingAnalyzer()))
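To make the effect of stemming concrete, here is a toy sketch in plain Python. It is not how Whoosh does it: Whoosh's analyzer uses a proper stemming algorithm, whereas the suffix list and the naive_stem and matches helpers here are invented purely for illustration.

```python
# Toy suffix stripper, for illustration only. The suffix list is an
# assumption and is far cruder than a real stemming algorithm.
SUFFIXES = ("ings", "ing", "es", "s", "ed")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def matches(query, text):
    """Return True if the stemmed query equals any stemmed word in text."""
    stems = {naive_stem(w) for w in text.split()}
    return naive_stem(query) in stems
```

With this sketch a search for "ending" finds text containing "endings", because both reduce to the same stem. The trade-off is the one described above: more matches, some of them less exact.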
Read More...
17 Dec 2009
I love listening to both BBC Radio 4 and BBC 6 Music. As with the rest of the BBC radio stations, a significant
proportion of their shows are available as podcasts. Unfortunately this is not true of all the shows, and for
those that feature music, such as Adam & Joe or Steve Lamacq, the podcasts contain only the talking.
I watch almost all of my TV through MythTV, which records all of my favourite shows automatically. On my way
to work I like to listen to podcasts that are downloaded automatically by iTunes. Would it be possible to
automatically record shows with MythTV that aren’t available as podcasts and sync them to my iPhone
automatically?
Recording a radio show with MythTV is no different to recording a TV show so that’s not a problem. MythTV also
provides the ability to run a script after certain shows have been recorded. All that is required is a script
that converts the recording into an mp3 file and builds an RSS feed which can be read by iTunes.
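The feed-building half of such a script might look roughly like the sketch below. It uses only the standard library and assumes the mp3 conversion has already happened. The build_feed function, the episode dictionary keys and the URLs in the usage note are all hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def build_feed(title, link, episodes):
    """Return an RSS 2.0 document as a string.

    Each episode is a dict with the (assumed) keys "title", "url",
    "size" in bytes and "timestamp" in seconds since the epoch.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "guid").text = ep["url"]
        # RSS expects RFC 822 style dates, which formatdate produces.
        ET.SubElement(item, "pubDate").text = formatdate(ep["timestamp"])
        # The enclosure element is what tells a podcast client where the audio is.
        ET.SubElement(item, "enclosure", url=ep["url"],
                      length=str(ep["size"]), type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")
```

Calling build_feed("Recorded radio", "http://example.com/radio/", episodes) and writing the result to a file served over HTTP should give iTunes something it can subscribe to, although a real feed would probably want more metadata than this.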
Read More...
29 Sep 2009
It’s well known that one of the best things you can do to speed up CouchDB is to use bulk
inserts to add or update many documents at one
time.
Bulk updates are easy to use if you’re just blindly inserting documents into the database because you can just
maintain a list of documents. However, a scheme that I often use is to call a view to determine whether
a document representing an object exists, then update it if it does or add a new document if it doesn’t. To help make
this easier I use the DocCache class given below.
The cache contains two interesting methods, get and update. Rather than writing directly to CouchDB when
you want to add or update a document, just pass the document to update. This will cache the document and
periodically save the cached documents in a bulk update.
It is possible that you will retrieve a document from CouchDB while an updated version of it is still waiting in
the cache. To avoid the possibility that changes get lost you should pass the retrieved document to get. This will
either return the document you passed in or the document that’s waiting to be saved if it exists in the cache.
Because there is a gap between when you ask for a document to be saved and when it actually is saved, any views
you use may be out of date, but that’s the cost of faster updates with CouchDB.
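As a rough illustration of the behaviour described above, here is a minimal sketch of such a cache. It is not the class from the full post. It assumes a db object with an update(docs) method that performs a bulk save, and the limit parameter, the flush method and the client-side id generation are all assumptions made for the example. Conflict handling is left out entirely.

```python
import uuid


class DocCache:
    """Sketch of a write-behind cache that saves documents in bulk."""

    def __init__(self, db, limit=100):
        self.db = db          # assumed to expose update(list_of_docs)
        self.limit = limit    # flush once this many documents are pending
        self._pending = {}    # maps _id to the document awaiting a save

    def get(self, doc):
        """Return the pending version of doc if there is one, else doc."""
        return self._pending.get(doc.get("_id"), doc)

    def update(self, doc):
        """Queue doc for saving, flushing when the cache is full."""
        # New documents get a client-generated id so they can be keyed.
        doc.setdefault("_id", uuid.uuid4().hex)
        self._pending[doc["_id"]] = doc
        if len(self._pending) >= self.limit:
            self.flush()

    def flush(self):
        """Write all pending documents in a single bulk update."""
        if self._pending:
            self.db.update(list(self._pending.values()))
            self._pending.clear()
```

In this sketch the caller needs to call flush once at the end of a run, otherwise the last partial batch is never written.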
Read More...
25 Sep 2009
Recently I went to a wedding which had a casino theme. To keep the guests entertained they gave every guest
$100 from the Bank Of Fun to spend on the roulette and blackjack tables. I decided to play roulette and I
knew that the best way to maximise my chances of winning was to bet only on odd or even and to double my bet
whenever I lost. At one point I was 2.6x up on my initial stake, but unfortunately, as you’d expect, I
eventually lost the lot.
I want to see what I could have done to increase my peak winnings, and to try my best to leave the table with
a positive cash flow. To do this we’ll simulate a roulette table using Python and try out various betting
strategies. The roulette wheel that was used at the wedding was an American wheel and featured the numbers 1
to 36 as well as 0 and 00. A bet on odd or even wins if a number from 1 to 36 comes up and it matches your choice.
0 or 00 will lose you your money. If you win you get back double your stake. This means that by betting on odd or
even you stand a 47% chance of winning (18 winning pockets out of 38).
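A minimal sketch of such a simulation, using the double-after-a-loss strategy described above, could look like this. The function names, the default stake of 1 and the cap on the number of spins are assumptions chosen for the example, and the bet is capped at whatever money is left.

```python
import random

# American wheel: 0, 00 and the numbers 1 to 36, giving 38 pockets.
POCKETS = ["0", "00"] + [str(n) for n in range(1, 37)]


def even_bet_wins(pocket, bet_on="odd"):
    """Return True if an odd/even bet wins on this pocket."""
    if pocket in ("0", "00"):
        return False  # the two zero pockets always lose
    is_odd = int(pocket) % 2 == 1
    return is_odd if bet_on == "odd" else not is_odd


def play_martingale(bankroll=100, base_bet=1, max_spins=1000, rng=None):
    """Double the bet after each loss and reset it after each win.

    Returns (final_bankroll, peak_bankroll).
    """
    rng = rng or random.Random()
    bet = base_bet
    peak = bankroll
    for _ in range(max_spins):
        if bankroll <= 0:
            break
        bet = min(bet, bankroll)  # cannot stake more than is left
        if even_bet_wins(rng.choice(POCKETS)):
            bankroll += bet
            bet = base_bet
        else:
            bankroll -= bet
            bet *= 2
        peak = max(peak, bankroll)
    return bankroll, peak
```

Running play_martingale many times with different seeds and comparing the final and peak values gives a feel for how often the strategy ends up ahead and how far it gets before the money runs out.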
Read More...
02 Jul 2009
I’ve been developing a website in my spare time. Because I want to add plenty of
social features it makes sense to let users log in using Facebook Connect. The Facebook platform is by far the
most successful social platform with many developers having created applications and websites that use it. I
expected that the experience for developers would be a good one. Unfortunately, I was disappointed.
Facebook makes it easy to register an application and provides links to libraries that wrap their API and make
it easy to get started. What Facebook don’t provide however is a downloadable version of their API to test
with. Facebook have made some effort to support test
users, but you have to open ports in your
firewall and use your real Facebook account to test with. Testing a new user signing up for your app is really
quite a chore. Automating this sort of test is essentially impossible.
In an ideal world Facebook would produce a downloadable program that you can use to automatically create users,
programmatically log in users and generally automatically test all the parts of your code. The danger is that
they’d have to give you a downloadable copy of their website code. Google App Engine give a similar
downloadable environment, and you can’t say that Google don’t have a load of code that they don’t want to give
away!
The Facebook API is pretty simple to get started with, and within a couple of minutes you’ll have the code
written to log a user in. Checking that it all works though, is a much tougher challenge…
Read More...