Geohash PHP class

Thursday, 28 February 2008, 0:43

geohash.org popped up recently with a method of generating short hash codes for geographical locations. The codes have some interesting properties and the algorithm is documented on wikipedia.

I recently started doing a Project Euler problem a day as a fun diversion. I found the geohash algorithm intriguing, so it became my “Euler problem” for the day!

Having done it, I’m not sure I’ll find a use for it myself, but if anyone else finds it handy, do let me know!

Download it here.

Munin plugin for graphing MySQL slave delay

Sunday, 10 February 2008, 11:41

I wanted to track the delay of a mysql slave server in Munin, and found nothing appropriate after a quick google. So, in case anyone else finds it useful, here is a plugin which will allow you to do just that.

Here’s a typical graph:

Slave Delay Graph

Download: mysql_slave (1 Kb)

EDIT: I wrote this so I could better understand the recovery time of a slave which had 13 hours of updates to catch up on. While it was good for that, it’s less useful for the day-to-day lag as munin will only be polling it every 5 minutes. A better way would be to use mk-heartbeat to get a more accurate moving average.

MySQL gotcha using BETWEEN with dates

Friday, 16 November 2007, 12:47

We just updated a development server from Ubuntu 6.06 to 6.10. As a result, MySQL went up a couple of point releases from 5.0.22 to 5.0.24a.

Suddenly, the web application we were working on stopped working. It wasn’t possible to log in.

We quickly narrowed it down to a clause in query which checked a users subscription was valid by checking its start and end dates. It looked bit like this:

SELECT foo FROM bar WHERE NOW() BETWEEN start AND end;

MySQL 5.0.22 returned rows, but 5.0.24 didn’t. After scratching our heads, I checked the manual:

For best results when using BETWEEN with date or time values, you should use CAST() to explicitly convert the values to the desired data type. Examples: If you compare a DATETIME to two DATE values, convert the DATE values to DATETIME values. If you use a string constant such as ‘2001-1-1′ in a comparison to a DATE, cast the string to a DATE.

The phrase “best results” make it sound like we’re baking a cake, and that your query results will be fluffier and more flavoursome if you use CAST! What they actually mean is, if you’re comparing different date types, you’re screwed if you don’t cast. You might get nothing.

Sure enough, our start and end columns were dates, and using cast in the query worked just fine:

SELECT foo FROM bar WHERE CAST(NOW() AS DATE)  BETWEEN start AND end;

I couldn’t find anything directly related to this change in behaviour in the changelogs, but hopefully this will help someone out there…

Silk icons for web apps

Wednesday, 10 October 2007, 17:18

IconsThe free Silk icon set from famfamfam.com is very useful for giving even the most mundane of web pages a little polish. With over 700 icons you’d think you’d have one for every occasion, right? Wrong! Or so says Damien Guard, who is releasing a companion set of 400+ icons in the same style as the original.

Nice work, will look forward to seeing them!

(20th Nov 2007: the new icon set is now available!)

Pastebin fights the spam!

Tuesday, 21 August 2007, 19:15

A few people have emailed me recently disappointed by the level of spam postings on pastebin.com. I’ve never really understood why spammers bother, but as they are bothering in increasing numbers it was time to take some action.

Last night I built in some spam filtering which has caught hundreds of posts since going live. I also added a “report spam” link which has flagged over 500 posts in past 20 hours. By iteratively tweaking the spam filter to identify the legimately flagged posts, I’ve been able to quickly delete a lot of older spam posts.

Hopefully this will make pastebin look like a well tended garden rather than a run-down wasteland! Comments welcome…

Pastebin – Turbo Boost Success!

Tuesday, 17 July 2007, 8:19

Just been checking the stats on pastebin.com and clearly the recent changes have worked well! Usage has trebled since last week and it’s still very responsive.

It’s nice to see that people still want to use it, so I’m going to ride this wave of enthusiasm and improve it further over the coming weeks.

Your feedback, as ever, is welcome!

Yahoo ZoneTag

Wednesday, 11 July 2007, 8:43

Now that I’ve got pastebin back on an even keel, there’s lots of Geograph work to be done. I’ve been slowly working on a tagging engine which is nearing completion, but this morning Barry Hunter pointed me in the direction of Yahoo’s ZoneTag project (via HighEarthOrbit)

ZoneTag can suggest your location and tagging information based on cellphone tower, zipcode or latitude & longitude. The last piece of the tagging system I’m working on is the suggestion engine, so this was of great interest.

I gave it a quick test run, and predictably, it’s a case of feast or famine. A request for my home town of Baldock in the UK produces two results – “Plinston Hall” and “Letchworth” – so not entirely useful. More fruitful was a request for the location of the London Eye – you’re deluged with a plethora of data – here are the “venue” tags it suggests:

Waterloo, cleopatra’s needle, Eurostar, Downing Street, Eagle, London Eye, Guard, Waterloo Station, Parliament, Somerset House, Whitehall, Westminster, Memorial, skyline, Big Ben, Embankment, Thames, graffiti, Hungerford Bridge, Dali, SW1, horse, Eye, Banksy, train, Trafalgar Square, National Theatre, St Martins Lane, trains, station, Ferris Wheel, bridge, river, Guards, church, SE1, pub, Waterloo Bridge, Protest, IMAX, Horse Guards Parade, Wheel, River Thames, Charing Cross, Underground, View, sign, Southwark, House, Tube, Palace of Westminster, Horse Guards, Red, night, London Aquarium, Victoria Tower, england, sky, UK, Lion, long exposure, blue, statue, Trafalgar, Lambeth, Millenium Wheel, Demonstration, Peace, Anti War, Bus, Elephant, Europe, Clock, Nelsons Column, Street, Oxo Tower, Guess Where London

Perhaps a few too many there, but still, not a bad result. As it’s an experimental beta, not sure we can use it directly for Geograph, but it might serve as a useful comparison for our own suggestion engine. We could even offer a compatible service based on our own data too…

Pastebin Reloaded!

Tuesday, 10 July 2007, 20:09

Well, I promised it waaaay back in january, but I’ve finally released an update to pastebin.com. A few people have asked for the source over the past few months and have seen some of the updates already, but here’s what’s new…

  • MySQL storage replaced with file-based storage, making it much faster
  • Revamped the colour scheme, which has been pretty much the same for 5 years
  • Added a ‘delete post’ feature
  • Switched to Affero GPL licence

If you’ve drifted away from pastebin due it’s lethargic speed, now’s the time to come back! Give it a whirl and if you have any feedback, leave a comment on this post.

Here’s some more detail on the changes…

File based storage

Pastebin used MySQL for storage since it was first launched in 2002. It has steadily grown in popularity, but that popularity began to take its toll on performance in the past 12 months.

Pastebin started out just keeping the last 1000 posts, which kept things zippy. Then I added custom domains, which increased the number of posts being retained, but what really hurt it was adding a common request – permanent posts, which meant that over time, the database grew inexorably larger.

In January I began to wonder if I needed a relational database at all. After all, pastebin is really just a single table application, and there are only two main operations:

  • Fetch post x
  • Get last 10 posts on domain foo

So I refactored the code to allow the storage mechanism to be changed. The new file based mechanism assigns a random identifier to a new post, e.g. abcdefgh and stores it in a structured directory:

posts/<d|m|f>/ab/cd/ef/abcdefgh

The top level directory ‘d’, ‘m’, or ‘f’ is chosen based on the desired lifetime of the post (1 day, 1 month or forever). Garbage collection of the 1 day posts in the ‘d’ directory can thus be carried out by performing a find for files older than a day with something like this running from cron every day:

find /path/to/pastebin/posts/d -mtime +1 -exec rm \{\} \;

To maintain the MRU lists of recent posts, the code maintains a serialized array for each domain. Whenever a post is made, this serialized file is locked, updated and unlocked. This is the only time the code can find itself competing for a shared resource, and even then its on a per-domain basis, rather than for the entire application as with the mysql storage.

As I write, this mechanism has been running for a few hours on the live site, and performance is much improved. At peak times it could take 15-20 seconds to make a post, it’s now much, much zippier!

Revamped Colour Scheme

I thought the old CSS was looking a little tired so I’ve freshened it up a little. I want to avoid adding graphics to the design and just use pure HTML and CSS if possible, which keeps things speedy too.

Comments on it are welcome, it’s likely I’ll tinker with it some more…

Delete Post

This is quite neat I think – if you choose to hit the “remember me” button, you’ll be assigned a random token which is used to mark your posts. This token is stored in a cookie. When you later view a post, if your cookie token and the post token match, you’ll be offered the opportunity of deleting the post.

I like this as you don’t have to go entering a password or setting up an account – it just works.

As always, if you’ve made a post you want removing and this feature doesn’t do it for you, just ask and I’ll take care of it

Changed to Affero GPL

The last few releases of pastebin used the GPL licence. Trouble is, while the GPL guarantees access to the source if you receive a binary copy of the software, with a website that doesn’t happen. The Affero GPL is a modified version of the GPL which contains an extra clause guaranteeing your access to the source when you interact with the software over a network.

So if you use pastebin in your own site, or adapt it further, you must continue to offer that source to your users. Lovely

What’s next?

Well, now that pastebin is actually usable again, I’m on a roll. The code has partially complete support for translation, and I’ve an army of volunteers ready to translate, so that’s the next goal…

Google Reader ’shared items’ Wordpress widget

Sunday, 8 July 2007, 9:46

Use Wordpress? Use Google Reader? Want a widget to share what you’re reading with the world? The widget that powers my “Interesting Posts” sidebar is now available for all – download it here – should work with any “widget ready” Wordpress template.

There are other ways to get a similar result, a popular one is to use Feedburner to clean up the Google feed, then use a standard RSS widget to display it.

Enjoy!

Safari on Windows

Monday, 11 June 2007, 20:37

Well how about that?

The browser wars are back. But this time round, everybody wins!