« Implementing Timeline in Web Services - Paradigms and Techniques | Main | Performance of MySQL UDFs vs. Native Functions »

Optimizing MySQL Performance Using Direct Access to Storage Engines (faster timelines)

First, let's look at the numbers. The table below lists the speed of building a timeline like Twitter does, all of them using pull model.

Building Timelines on MySQL
timelines / sec.
Stored Procedure136
UDF using Direct Access1,710

As I explained in my previous post (Implementing Timeline in Web Services - Paradigms and Techniques, it is difficult (if not impossible) to write an optimal SQL query to build timelines on the fly. Yesterday I asked on the MySQL Internals mailing list whether it is possible to write code that directly accesses the storage engine (in my case InnoDB) for the highest performance, and Brian gave me a quick response (thank you) that there is a MySQL branch that supports writing stored procedures in external languages. So I looked into it, but since it seemed to me like directed towards flexibility than performance. Wondering for a while, I came up with an idea of calling storage engine APIs from an UDF, tried, and it worked!

The code can be found here (/lang/sql/mysql_timeline - CodeRepos::Share - Trac). Its only about 120 lines long with a general helper library (in C++ template) with about the same size. And although it uses a better-tuned version of an algorithm described in my previous post, the core code is as small as follows. IMHO, it is easier to understand than the stored procedure version.

    int4store(follower_keybuff, user_id);
    if (follower_tbl->file->index_read_map(follower_tbl->record[0],
					   follower_keybuff, 1, HA_READ_PREFIX)
	== 0) {
      do {
	unsigned follower_id = follower_follower_id_fld->val_int();
	int4store(message_keybuff, follower_id);
	if (message_tbl->file->index_read_map(message_tbl->record[0],
					      message_keybuff, 1,
	    == 0) {
	  do {
	    if (! test_add_id(message_id_fld->val_int())) {
	  } while (message_tbl->file->index_prev(message_tbl->record[0]) == 0
		   && message_user_id_fld->val_int() == follower_id);
      } while (follower_tbl->file->index_next(follower_tbl->record[0]) == 0
	       && follower_user_id_fld->val_int() == user_id);

To use the code, all you have to do is to compile to a shared library, and install it as an ordinal UDF.

% g++ -I ~/dev/mysql/51/32-src/include -I ~/dev/mysql/51/32-src/sql -I ~/dev/mysql/51/32-src/regex -g -Wall -O1 -fno-rtti -fno-exceptions  -shared -o timeline.so timeline.cc
% cp timeline.so ~/dev/mysql/51/32-bin/lib/plugin
% mysql -u root -p test
mysql> CREATE FUNCTION timeline returns int soname 'timeline.so';
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TEMPORARY TABLE fetch_timeline_tt (id int unsigned NOT NULL) ENGINE=heap;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT timeline(123);

mysql> SELECT message.* FROM message INNER JOIN fetch_timeline_tt USING (id);

And it returns the timeline for user id 123. A quick hack, but easy to implement, and very fast. No more need for writing custom servers. Whenever SQL doesn't do well, I can just call InnoDB directly.


Hi Kazuho,

very internesting what you created here. Surly I would call it a Hack. The performance increace you are experiencing comes from the fact that you are bypassing MySQL it self. Very dangerous if you ask me. This way you are deniing MySQL any type of optimisations and are dependand on the detailed Storage Engine API (that has changes over time.


Thank you for your comment.

Yes, IMHO its pretty much like writing in assembly or using SIMD extensions (instead of C). It's hard to maintain, might become outdated, but it sometimes gives you significant boost on performance.

Thank you

Thank you


Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)