" /> Kazuho at Work: December 2008 Archives

« October 2008 | Main | February 2009 »

December 12, 2008

Using Top N Sort on MySQL

One of the best practices on using MySQL is to avoid filesort. However there are cases where it is inevitable (e.g. ordering the result of fulltext search by modification date), and although in most cases we only the top N rows of sorted resultset are needed, MySQL does not implement top N sort.

After wondering for couple of months if I should hack the MySQL core to implement top-N-sort, today I decided to write a UDF that performs top N sort, and see how well it performs. And here it is: top-n-sort.c.

First the benchmark. When performing order by 〜 limit on an unindexed column of a 100k row table, top N sort is more than two times faster than the internal sort algorithm.

mysql> SELECT id,priority FROM testsort ORDER BY priority LIMIT 10;
10 rows in set (0.11 sec)

mysql> SELECT topn_get(v) AS id,topn_get_value(v) AS priority FROM int_seq WHERE v<(SELECT topn_set(10,id,priority) FROM testsort);
10 rows in set (0.05 sec)

The top-n-sort UDF consists of three functions, topn_set, topn_get, topn_get_value.

Topn_set function takes three arguments, number of top columns to preserve, id of the column, and sort value. It is an aggregate function that returns the number of rows that should be returned.

Topn_get function returns the n-th id of the sort result, while Topn_get_value function returns the n-th value.

So combining the results of the UDFs with a table (int_seq) that holds a sequence of integers starting from zero, it is possible to perform a top N sort within a single query using the UDF functions. And it is fast.

Next question will be whether I should add top N sort to MySQL by myself, now that I have proved there would be performance benefit? For myself the UDF version seems satisfactory althougth the syntax is a bit complicated.