" /> Kazuho at Work: May 2008 Archives


May 29, 2008

Presentation Slides of Japanize, Mylingual, Pathtraq

This afternoon, famous bloggers from the United States visited our office as part of their trip to Japan (Lunarr founder invites famous US-bloggers to Japan | Asiajin).

Could anybody miss such a good chance to demonstrate their web services to people like Tom Foremski (Silicon Valley Watcher), Marshall Kirkpatrick (Read/Write Web), Kristen Nicole (Mashable), and Bob Walsh (47 Hats)? Well, I certainly couldn't, so I gave a short presentation of my own.

The services I introduced are:

  • Japanize - an end-user-driven localization service (to Japanese) for web services
  • Mylingual - an internationalized version of Japanize
  • Pathtraq - a real-time search engine and content recommendation service based on an Alexa-like technology

The slides I used can be found here, and here.

Since it was a jump-in presentation (thank you to all the attendees for letting me do so), the Japanize / Mylingual slides are a bit outdated (there are about 40,000 users today). They may also be too focused on technical details (they were originally written for YAPC), but I hope you will enjoy reading them.

Thank you to the bloggers for coming, and have a nice stay in Japan!

May 27, 2008

Slides on Q4M

Today I had a chance to explain Q4M in detail, and here are the slides I used.

It covers what a message queue is in general, the internals of Q4M, how it can be used as a pluggable storage engine of MySQL, and a couple of usage scenarios. I hope you will enjoy reading it.
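If you have never seen Q4M from the client side, the consumer boils down to a loop like the sketch below. It assumes the queue_wait() / queue_end() SQL functions that Q4M exposes; the table name test.q4m_t and its single message column are made up for the illustration.

#!/usr/bin/perl
# Minimal Q4M consumer sketch (illustrative; table and column names are hypothetical).
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect(
    'dbi:mysql:test;mysql_socket=/tmp/mysql51.sock',  # adjust to your setup
    'root', '',
    { RaiseError => 1 },
);

while (1) {
    # block until a row becomes available and take ownership of it
    my ($found) = $dbh->selectrow_array(q{SELECT queue_wait('test.q4m_t')});
    next unless $found;
    # while owning a row, a plain SELECT returns only that row
    my ($message) = $dbh->selectrow_array(q{SELECT message FROM test.q4m_t});
    print "got: $message\n";
    # remove the consumed row (queue_abort() would return it to the queue instead)
    $dbh->do(q{SELECT queue_end()});
}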

May 21, 2008

Maximum Performance of MySQL and Q4M

I usually blog my rough ideas on one of my Japanese blogs (id:kazuhooku's memos). When I wrote down my thoughts on how to further optimize Q4M, Nishida-san asked me, "how fast is the raw performance without client overhead?" Although that seems a bit difficult to answer directly, it is easy to measure the performance of the MySQL core and the storage engine interface, and by deducting their overhead, the raw performance of the I/O operations in Q4M can be estimated. All the benchmarks were taken on Linux 2.6.18 running on two Opteron 2218s.

So first, I measured the raw performance of the MySQL core on my testbed using mysqlslap: 115k queries per second.

$ perl -e 'print "select 1;\n" for 1..10000' > /tmp/select10k.sql && /usr/local/mysql51/bin/mysqlslap --query=/tmp/select10k.sql --socket=/tmp/mysql51.sock --iterations=1 --concurrency=40
Benchmark
        Average number of seconds to run all queries: 3.470 seconds
        Minimum number of seconds to run all queries: 3.470 seconds
        Maximum number of seconds to run all queries: 3.470 seconds
        Number of clients running queries: 40
        Average number of queries per client: 10000
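(The 115k figure is just the totals above divided out: 40 clients, 10,000 queries each, over 3.470 seconds of wall-clock time. The 76k figure for the next run is obtained the same way from 400,000 / 5.282.)

$ perl -e 'printf "%d queries/sec\n", 40 * 10000 / 3.470'
115273 queries/sec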

Next, the throughput of single-row selects against the Q4M storage engine was 76k queries per second.

$ perl -e 'print "select * from test.q4m_t limit 1;\n" for 1..10000' > /tmp/select10k.sql && /usr/local/mysql51/bin/mysqlslap --query=/tmp/select10k.sql --socket=/tmp/mysql51.sock --iterations=1 --concurrency=40
Benchmark
        Average number of seconds to run all queries: 5.282 seconds
        Minimum number of seconds to run all queries: 5.282 seconds
        Maximum number of seconds to run all queries: 5.282 seconds
        Number of clients running queries: 40
        Average number of queries per client: 10000

And finally, the queue consumption speed of Q4M (configure options: --with-mt-pwrite --with-sync=no) was 28k messages per second; when I set the --with-sync flag to fsync, the speed was 20k messages per second. Considering that consuming a single row requires two queries (one to retrieve the row and one to remove it), the numbers look quite good to me, although further optimization should be possible.

$ MESSAGES=200000 CONCURRENCY=40 DBI='dbi:mysql:test;mysql_socket=/tmp/mysql51.sock' t/05-multireader.t 
1..4
ok 1
ok 2
ok 3
ok 4


Multireader benchmark result:
    Number of messages: 200000
    Number of readers:  40
    Elapsed:            7.040 seconds
    Throughput:         28410.198 mess./sec.

So, returning to the question about the raw performance of Q4M: the overhead of consuming a single row is about 30 microseconds in the Q4M core with fsync enabled, and about 15 microseconds if only pwrite(2) calls are issued.
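To spell the deduction out (this particular accounting is mine; only the resulting figures are given above): with 40 concurrent clients, the aggregate cost of one message is simply 1 / throughput, i.e. roughly 50 microseconds with fsync and 35 microseconds with pwrite only. Subtracting two queries' worth of MySQL overhead, estimated from the two baselines above as one near-empty query plus one single-row select, leaves numbers in the same ballpark as the quoted 30 and 15 microseconds.

#!/usr/bin/perl
# Back-of-the-envelope version of the deduction; the way the two consumption
# queries are split between the two baselines is an assumption of this sketch.
use strict;
use warnings;

my $core_us   = 1e6 / 115_000;  # "select 1" baseline           => ~8.7 us/query
my $engine_us = 1e6 / 76_000;   # single-row select through Q4M => ~13.2 us/query
my $overhead  = $core_us + $engine_us;  # two queries per consumed message

for ([ 'fsync enabled', 20_000 ], [ 'pwrite only', 28_410 ]) {
    my ($label, $msgs_per_sec) = @$_;
    my $per_msg = 1e6 / $msgs_per_sec;
    printf "%-13s: %4.1f us/message - %4.1f us overhead = ~%4.1f us in Q4M core\n",
        $label, $per_msg, $overhead, $per_msg - $overhead;
}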

Cache::Swifty now on CPAN

Yesterday (the day has just changed here in Japan), I uploaded Cache::Swifty to CPAN, a Perl frontend for swifty, a very fast memory-mapped cache.

Although you still have to install swifty itself by hand, I hope it relieves the pain of installing the Perl module manually.
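For the curious, usage might look something like the sketch below; this is hypothetical, assuming the conventional get/set interface of CPAN cache modules and a constructor that takes the cache directory, so please check the module's POD for the actual API.

#!/usr/bin/perl
# Hypothetical Cache::Swifty usage sketch; the constructor arguments and the
# cache directory path are assumptions, not taken from the module's documentation.
use strict;
use warnings;
use Cache::Swifty;

my $cache = Cache::Swifty->new({
    dir => '/tmp/swifty-cache',  # hypothetical path to a prepared swifty cache
});

$cache->set('greeting', 'hello from the memory-mapped cache');
print $cache->get('greeting'), "\n";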

May 20, 2008

Q4M version 0.5.1 released

I have just released version 0.5.1 of Q4M, a message queue that acts as a pluggable storage engine of MySQL. This release fixes two bugs that might block table compaction from occurring, or cause an empty result set to be returned when data exists. Thanks to Brian for pointing them out.

Q4M Homepage

PS. If you have installation problems, using the svn version might help. The installation problems in 0.5.1 (a link error on linux/x86_64 and an installation directory problem with the binary distribution of MySQL) have been fixed there.

Slides from YAPC::Asia 2008 on MySQL Tuning in Pathtraq

Last Friday I had the chance to give a talk at YAPC::Asia 2008 on the internals of Pathtraq, one of the largest web access statistics services in Japan.

The talk covered the techniques we use for compressing data in MySQL tables, our cache architecture, and a MySQL-based message queue (Q4M) that we developed and use.
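To give a generic flavor of the first topic (this illustrates the general idea only, not necessarily the exact technique from the talk): large, rarely-filtered text columns can be stored as zlib-compressed blobs and inflated in the application layer, along these lines.

#!/usr/bin/perl
# Generic illustration of storing zlib-compressed blobs in MySQL from Perl;
# the page_bodies table and its columns are made up for this example.
use strict;
use warnings;
use DBI;
use Compress::Zlib;  # exports compress() and uncompress()

my $dbh = DBI->connect('dbi:mysql:test', 'root', '', { RaiseError => 1 });

# store a compressed page body
my $html = '<html>...a large page body...</html>';
$dbh->do(
    'INSERT INTO page_bodies (url, body_z) VALUES (?, ?)',
    undef, 'http://example.com/', compress($html),
);

# load and inflate it again
my ($body_z) = $dbh->selectrow_array(
    'SELECT body_z FROM page_bodies WHERE url = ?',
    undef, 'http://example.com/',
);
print uncompress($body_z);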

The slides of the talk are available at http://www.slideshare.net/kazuho/yapcasia-2008-tokyo-pathtraq-building-a-computationcentric-web-service, so please have a look.