Optimizing MySQL Performance Using Direct Access to Storage Engines (faster timelines)
First, let's look at the numbers. The table below lists the speed of building a timeline like Twitter does, all of them using pull model.
| timelines / sec. | |
|---|---|
| SQL | 56.7 |
| Stored Procedure | 136 |
| UDF using Direct Access | 1,710 |
As I explained in my previous post (Implementing Timeline in Web Services - Paradigms and Techniques, it is difficult (if not impossible) to write an optimal SQL query to build timelines on the fly. Yesterday I asked on the MySQL Internals mailing list whether it is possible to write code that directly accesses the storage engine (in my case InnoDB) for the highest performance, and Brian gave me a quick response (thank you) that there is a MySQL branch that supports writing stored procedures in external languages. So I looked into it, but since it seemed to me like directed towards flexibility than performance. Wondering for a while, I came up with an idea of calling storage engine APIs from an UDF, tried, and it worked!
The code can be found here (/lang/sql/mysql_timeline - CodeRepos::Share - Trac). Its only about 120 lines long with a general helper library (in C++ template) with about the same size. And although it uses a better-tuned version of an algorithm described in my previous post, the core code is as small as follows. IMHO, it is easier to understand than the stored procedure version.