
June 12, 2008

Optimizing MySQL Performance Using Direct Access to Storage Engines (faster timelines)

First, let's look at the numbers. The table below lists the speed of building a timeline the way Twitter does, in each case using the pull model.

Building Timelines on MySQL

Method                      timelines / sec.
SQL                         56.7
Stored Procedure            136
UDF using Direct Access     1,710
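
For readers unfamiliar with the term, a pull-model timeline is assembled at read time from the messages of the users being followed. The sketch below shows that idea in plain C, independent of MySQL. It is only an illustration under assumptions: the names `user_messages` and `build_timeline` are hypothetical, in-memory arrays stand in for table rows, and it is neither the UDF measured above nor the algorithm from the previous post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model: each followed user has message IDs
 * stored newest first (descending); larger IDs are assumed to be newer. */
struct user_messages {
    const unsigned *ids;   /* message IDs, descending */
    size_t count;          /* number of IDs */
};

/* Pull model: merge the newest messages of all followed users at read time.
 * Writes at most `limit` IDs to `out` (newest first) and returns how many
 * were written. `pos` is caller-provided scratch space, one cursor per user. */
size_t build_timeline(const struct user_messages *followed, size_t n_followed,
                      size_t *pos, unsigned *out, size_t limit)
{
    size_t written = 0;
    size_t i;

    assert(n_followed == 0 || (followed != NULL && pos != NULL));
    for (i = 0; i < n_followed; i++)
        pos[i] = 0;
    while (written < limit) {
        size_t best = n_followed;      /* sentinel: no candidate yet */
        for (i = 0; i < n_followed; i++) {
            if (pos[i] >= followed[i].count)
                continue;              /* this user's list is exhausted */
            if (best == n_followed ||
                followed[i].ids[pos[i]] > followed[best].ids[pos[best]])
                best = i;
        }
        if (best == n_followed)
            break;                     /* every list is exhausted */
        out[written++] = followed[best].ids[pos[best]++];
    }
    return written;
}
```

Each output message costs one scan over the followed users, so the work grows with both the timeline length and the number of users followed, which is why the per-row overhead of the access path matters so much in the numbers above.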

As I explained in my previous post (Implementing Timeline in Web Services - Paradigms and Techniques), it is difficult (if not impossible) to write an optimal SQL query to build timelines on the fly. Yesterday I asked on the MySQL Internals mailing list whether it is possible to write code that directly accesses the storage engine (in my case InnoDB) for the highest performance, and Brian gave me a quick response (thank you) that there is a MySQL branch that supports writing stored procedures in external languages. I looked into it, but it seemed to me to be directed more towards flexibility than performance. After thinking it over for a while, I came up with the idea of calling storage engine APIs from a UDF, tried it, and it worked!

The code can be found here (/lang/sql/mysql_timeline - CodeRepos::Share - Trac). It's only about 120 lines long, plus a general helper library (C++ templates) of about the same size. And although it uses a better-tuned version of the algorithm described in my previous post, the core code is as small as follows. IMHO, it is easier to understand than the stored procedure version.

Continue reading "Optimizing MySQL Performance Using Direct Access to Storage Engines (faster timelines)" »

February 03, 2006

C-0.05

Version 0.05 of C, a pseudo-interpreter of the C programming language, has been released.

Download: tgz / RPM

Install the RPM, or install the tarball manually by running configure && make && make install.

C-0.05 is a minor update to 0.04 that adds the following features:

1. support for the special filename (-) and the end-of-options marker (--)

The interpreter can now pass arguments to a script given through standard input by using the special filename '-'. You can copy and paste source code from the Internet and execute it with the parameters you give.
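
As an illustration of how '-' and '--' are conventionally told apart from ordinary options, here is a minimal sketch. It is not the interpreter's actual code: `split_args` and `split_command_line` are hypothetical names, and the sketch deliberately ignores options that take a separate value (such as -e).

```c
#include <assert.h>
#include <string.h>

/* Result of splitting a command line; all names here are hypothetical. */
struct split_args {
    int script_index;      /* index of the script name in argv, or -1 */
    int from_stdin;        /* nonzero if the script name is "-" */
    int first_script_arg;  /* index of the first argument for the script */
};

struct split_args split_command_line(int argc, char **argv)
{
    struct split_args r = { -1, 0, argc };
    int i = 1;

    assert(argc >= 1 && argv != NULL);
    /* Options start with '-' and have at least one more character,
     * so a lone "-" is never treated as an option. */
    while (i < argc && argv[i][0] == '-' && argv[i][1] != '\0') {
        if (strcmp(argv[i], "--") == 0) {
            i++;           /* "--" ends option parsing and is consumed */
            break;
        }
        i++;
    }
    if (i < argc) {
        r.script_index = i;
        r.from_stdin = (strcmp(argv[i], "-") == 0);
        r.first_script_arg = i + 1;
    }
    return r;
}
```

With this split, everything from `first_script_arg` onward belongs to the script, whether the script came from a file or from standard input.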

2. support for TMPDIR environment variable

The interpreter now uses TMPDIR, if given, for storing temporary and cached files.
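
A minimal sketch of this kind of lookup follows. The function name `temp_path` and the "/tmp" fallback are assumptions made for illustration; the directory the interpreter actually uses when TMPDIR is unset is not stated here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds "<dir>/<name>" in buf, where <dir> is $TMPDIR if set and non-empty.
 * Returns the length snprintf reports; a value >= size means truncation. */
int temp_path(char *buf, size_t size, const char *name)
{
    const char *dir = getenv("TMPDIR");
    size_t len;

    assert(buf != NULL && name != NULL);
    if (dir == NULL || dir[0] == '\0')
        dir = "/tmp";           /* assumed fallback, not from the original */
    len = strlen(dir);
    if (len > 1 && dir[len - 1] == '/')
        len--;                  /* avoid "dir//name" when TMPDIR ends in '/' */
    return snprintf(buf, size, "%.*s/%s", (int)len, dir, name);
}
```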

3. eliminate the use of MAXPATHLEN

The interpreter no longer assumes a fixed maximum path length, so it also works on systems that do not define MAXPATHLEN (such as the Hurd).
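
The usual way to avoid a fixed-size path buffer is to grow the buffer until the call succeeds. The sketch below applies that technique to getcwd; `current_dir` is a hypothetical helper written for illustration and is not taken from the interpreter's source.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns the current working directory in a malloc'ed buffer that the
 * caller must free, or NULL on error. The buffer doubles until it fits. */
char *current_dir(void)
{
    size_t size = 64;

    for (;;) {
        char *buf = malloc(size);
        int saved_errno;

        if (buf == NULL)
            return NULL;
        if (getcwd(buf, size) != NULL)
            return buf;
        saved_errno = errno;    /* free() may change errno */
        free(buf);
        if (saved_errno != ERANGE)
            return NULL;        /* a real error, not just a short buffer */
        assert(size <= (size_t)-1 / 2);
        size *= 2;
    }
}
```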

Continue reading "C-0.05" »

January 17, 2006

C - 0.04

Version 0.04 of C, a pseudo-interpreter of the C programming language, has been released.

Download: tgz / RPM

Install the RPM, or install the tarball manually by running configure && make && make install.

In version 0.04, a compile cache has been implemented, speeding up the execution of frequently used scripts and one-liners. The interpreter has also been rewritten in C. With these improvements, not only CPU-intensive tasks but even the simplest scripts, such as hello-world, run faster than perl when executed by the pseudo-interpreter... well, from the second run onward, actually :-p
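
One common way to build such a cache is to derive a key from the source text and the compiler options, and to reuse the compiled binary whenever the key matches. The sketch below shows that approach with a 64-bit FNV-1a hash. It is an assumption about how a compile cache could work, not a description of the scheme C-0.04 uses; `cache_key` and `cache_file_name` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a NUL-terminated string, continuing from hash `h`. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= UINT64_C(0x100000001b3);           /* FNV prime */
    }
    return h;
}

/* Hypothetical cache key: identical source and options give the same key,
 * so a previously compiled binary can be reused instead of invoking GCC. */
uint64_t cache_key(const char *source, const char *options)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);  /* FNV offset basis */

    assert(source != NULL && options != NULL);
    h = fnv1a(h, options);
    h = fnv1a(h, "\n");     /* separator between options and source */
    return fnv1a(h, source);
}

/* Formats the key as a 16-digit hex file name with a ".bin" suffix. */
int cache_file_name(char *buf, size_t size, uint64_t key)
{
    return snprintf(buf, size, "%016llx.bin", (unsigned long long)key);
}
```

Including the options in the key matters: the same source compiled with different flags must not share a cached binary.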

The main features of C-0.04 are:

+ support for ANSI C / C++ (runs as a wrapper of GCC)
+ native-code execution speed
+ compile cache
+ support for debugging (calls GDB)

Continue reading "C - 0.04" »

January 10, 2006

C - 0.02

C - a pseudo-interpreter of the C programming language has now been updated to version 0.02.

To install, cp C-0_02 /usr/bin/C && chmod 755 /usr/bin/C.

The following features have been added:


-m option:
Support for source files that contain a main function. Source files downloaded from the Internet can be executed directly.

% C -m uuencode.c.txt ...

debugger support:
By using the -d option, gdb can be used for debugging.

man page
To install the man page, pod2man --section=1 --center="Cybozu Labs" --release="C-0.02" /usr/bin/C > /usr/share/man/man1/C.1

#option directive
Compile options can be specified from within source files by using the #option directive.
For example, #option -cWall is equivalent to passing -Wall to gcc, and plays a role similar to use strict; in perl.
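
To make the mapping concrete, here is a small sketch of how a line could be recognized as an #option directive and how a -c option could be turned into a compiler flag. Both functions (`option_directive`, `compiler_flag`) are hypothetical and written only for illustration; the sketch assumes the caller has already stripped the trailing newline from the line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If `line` is an "#option" directive, returns a pointer to the first
 * option character after the keyword; otherwise returns NULL. */
const char *option_directive(const char *line)
{
    static const char keyword[] = "#option";
    size_t n = sizeof keyword - 1;

    assert(line != NULL);
    if (strncmp(line, keyword, n) != 0)
        return NULL;
    if (line[n] != ' ' && line[n] != '\t')
        return NULL;            /* e.g. "#optional" is not a directive */
    line += n;
    while (*line == ' ' || *line == '\t')
        line++;
    return line;
}

/* Translates an option of the form "-cXYZ" into the compiler flag "-XYZ"
 * (so "-cWall" becomes "-Wall"). Returns the length written, or -1 if
 * `opt` is not a "-c" option. */
int compiler_flag(char *out, size_t size, const char *opt)
{
    assert(out != NULL && opt != NULL);
    if (strncmp(opt, "-c", 2) != 0 || opt[2] == '\0')
        return -1;
    return snprintf(out, size, "-%s", opt + 2);
}
```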

Have fun.

Continue reading "C - 0.02" »

January 08, 2006

C - a pseudo-interpreter of the C programming language

In order to write one-liners in C, I created a tiny wrapper for GCC.

C (pronounced large-C) is a pseudo-interpreter of the C programming language.

Without any manual compilation step, developers can rapidly create scripts or write one-liners in the C programming language that run at native-code speed.

C, the pseudo-interpreter, has several advantages over other scripting languages, such as perl.

  • very fast (100x faster than perl when calculating fib(40))
  • easy handling of binary data
  • good for testing system calls and C APIs


To install, copy the downloaded file to /usr/bin and chmod 755 it.

Below are some examples.

% C -e 'printf("hello world\n")'
hello world

% C -cO3 -e 'int fib(int i) { return i > 2 ? fib(i-1) + fib(i-2) : 1; } printf("%d\n", fib(40))'
102334155

% C -e 'int t, sum = 0; while (fread(&t, sizeof(int), 1, stdin) == 1) sum += t;printf("%d", sum)' < data
31289

% cat hello.C
#! /usr/bin/C

printf("hello world\n");

% ./hello.C
hello world

Continue reading "C - a pseudo-interpreter of the C programming language" »