Monday, August 21, 2006

Deadlock problem in MySQL JConnector 3.1

Recently I found a very interesting but also annoying database deadlock problem in the my current project. For the performance reason we uses different storage engine in MySQL for different type of tables, for example InnoDB engine for OLTP type tables, MyISAM engine when high data throughput as well as certain level of data inconsistency can coexist, MEMORY engine for transient data or tables that have extreme performance requirement.

Just a couple of weeks ago, during system stability testing we discovered that occasionally deadlocks were found in MySQL JConnector 3.1 driver when we have large number of InnoDB and MyISAM tables coexist in the same database. According to MySQL bug base, this problem is scheduled to be fixed in JConnector 5, unfortunately due to many different reasons both technical and political we do not have the luxury to upgrade to JConnector 5 at least for now in this project, but we also disparately need this mixed engine deployment feature from MySQL to meet our performance requirement. It became obvious a couple weeks ago, I had to fixed this problem for this project to be successfully released. After downloading JConnector source code, by using the combination of simple JVM dump and JProfiler finally I pinpointed the problem in two classes com.mysql.jdbc.Connection and com.mysql.jdbc.ServerPreparedStatement. In the original Connection class prepareStatement method implementation, where PreparedStatement gets created, it only uses a simple synchronized keyword at method level as the semaphore. Within this method Connection class performs many call to ServerPreparedStatement to open, close, and check state of the statement. Based on my observation and investigation, realClose method on ServerPreparedStatement seems is the one that causes all of the problems. Looked at the realClose method, quickly I realized the problem is that in this method ServerPreparedStatement makes callback to Connection class which causes a classic cross-reference scenario (the breeding bed of the notorious deadlock problem). After that I also examined how ServerPreparedStatement uses its semaphore, it appears to me it is obvious that the orginal implementation of the locking mechanism in this part of the code is rather naïve and not well thought-through. The original implementation of ServerPreparedStatement uses three different semaphores in realClose method in the order of statement.this, connection.this, connection.mutex, and now if you look at the Connection.prepareStatement implementation as we mentioned before at first it seems only uses connection.this as the semaphore, but with a little bit digging I realized that before calling prepareStatement method the current thread has already acquired connection.mutex as a semaphore which means Connection.prepareStatement locks in connection.mutex then connection.this sequence, but ServerPreparedStatement.realClose locks in connection.this then connection.mutex sequence. Now everything is clear, it is a classic locking sequence problem, and the solution is easy that is to make the sequence of semaphore acquisition the same in both Connection.prepareStatement and ServerPreparedStatement.realClose methods by adding explicit locking order:

synchronized (getMutex()) {
synchronized (this) {

to Connection.prepareStatement implementation and switching the locking sequence:

synchronized (this.connection.getMutex()) {
synchronized (this.connection) {

in ServerPreparedStatement.realClose implementation.

Although this fix did fixed all the problem we had in our project, but according to MySQL official bug base that this deadlock problem might have a much broader base of causes and impact, therefore it can be only address in version 5, so before you apply this fix make sure you fully understand the implication and the causes of your particular problem, also I strongly recommend you running a thorough stability test and profiling session after applying this fix, nevertheless I am hoping this post will shed some light to this problem as well as its solution and hopefully can also provide some help.

2 comments:

Shebang said...

sweet analysis. If this was the problem, wouldnt it deadlock all the time or be prone enough it to be caught during qa of the driver?

Nick Zhu said...

Actually it only happens when your application try to open and close prepared statements at the same time by multiple threads on the same connection instance