« Benchmarking SSD for MySQL | Main | Using O_DIRECT on Mac OS X »

Using Top N Sort on MySQL

One of the best practices on using MySQL is to avoid filesort. However there are cases where it is inevitable (e.g. ordering the result of fulltext search by modification date), and although in most cases we only the top N rows of sorted resultset are needed, MySQL does not implement top N sort.

After wondering for couple of months if I should hack the MySQL core to implement top-N-sort, today I decided to write a UDF that performs top N sort, and see how well it performs. And here it is: top-n-sort.c.

First the benchmark. When performing order by 〜 limit on an unindexed column of a 100k row table, top N sort is more than two times faster than the internal sort algorithm.

mysql> SELECT id,priority FROM testsort ORDER BY priority LIMIT 10;
10 rows in set (0.11 sec)

mysql> SELECT topn_get(v) AS id,topn_get_value(v) AS priority FROM int_seq WHERE v<(SELECT topn_set(10,id,priority) FROM testsort);
10 rows in set (0.05 sec)

The top-n-sort UDF consists of three functions, topn_set, topn_get, topn_get_value.

Topn_set function takes three arguments, number of top columns to preserve, id of the column, and sort value. It is an aggregate function that returns the number of rows that should be returned.

Topn_get function returns the n-th id of the sort result, while Topn_get_value function returns the n-th value.

So combining the results of the UDFs with a table (int_seq) that holds a sequence of integers starting from zero, it is possible to perform a top N sort within a single query using the UDF functions. And it is fast.

Next question will be whether I should add top N sort to MySQL by myself, now that I have proved there would be performance benefit? For myself the UDF version seems satisfactory althougth the syntax is a bit complicated.


So basically the hack here is that you never write it to disk?

You just keep the top N in memory and then evict low ranking item when they fall below the N window?

I have needed this for a long time :)

I've tryed to compile top-n-sort.c in windows but failed.
Why alloc memory and cache it as thread local data?
just for avoid memory fragment?

just for avoid memory fragment


Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)