Alexander The Great

March 20, 2008

The Prevayler, Klaus Wuestefeld, and Cooked Numbers

Filed under: Programming — alexanderthegreatest @ 12:16 pm

Actually, two-phase commit is unnecessary. –KlausWuestefeld

A two-phase commit (2PC) is a protocol for distributed transactions. In a local transaction, work is done in a tentative way and then committed, or made permanent; this is a single-phase commit. When a transaction is run against more than one store or data source, a single phase is insufficient. Consider this scenario (a minimal coordinator sketch follows the list):

  1. Data must be changed in 3 related databases (each hosted on its own server or “box”) to process a logical unit of work, say creating a new user account.
  2. A winter storm knocks down a power line, causing one server to lose power and go down.
  3. The 2 other servers are unable to get confirmation that the 3rd has successfully completed its update, so they roll back their part of the transaction, preserving a valid state across the system. (This must be possible even after a local commit.)
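To make the prepare/commit split concrete, here is a minimal sketch of a 2PC coordinator in Java. The Participant interface and the coordinator class are hypothetical names invented for illustration; this is not JTA or any real driver API, just the shape of the protocol.

    import java.util.List;

    // Hypothetical participant in a distributed transaction
    // (invented for illustration; not a real API).
    interface Participant {
        boolean prepare();  // phase 1: do the work tentatively, vote yes or no
        void commit();      // phase 2: make the tentative work permanent
        void rollback();    // undo the tentative work (a no-op if never prepared)
    }

    class TwoPhaseCommitCoordinator {
        // Returns true only if every participant prepared and then committed.
        boolean run(List<Participant> participants) {
            // Phase 1: ask every participant to prepare and vote.
            for (Participant p : participants) {
                boolean vote;
                try {
                    vote = p.prepare();
                } catch (RuntimeException serverDown) { // e.g. the box lost power
                    vote = false;
                }
                if (!vote) {
                    // One "no" vote (or one dead server) aborts the whole transaction.
                    participants.forEach(Participant::rollback);
                    return false;
                }
            }
            // Phase 2: unanimous "yes", so the decision is commit.
            participants.forEach(Participant::commit);
            return true;
        }
    }

A real coordinator also has to write its commit/abort decision to a durable log and retry participants that were unreachable when the decision was made; that recovery machinery is a large part of what makes 2PC heavyweight in practice.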

Ideally, the distributed transaction would not fail, and no rollback would be necessary. However, in the real world this kind of failure is inevitable and must be accounted for. A non-computer analogy would be a space shuttle (or even a seafaring vessel) preparing to launch: if any one system isn’t ready, the entire launch is aborted. This is very much necessary.

Consider this quote from the very same article:

Actually the TwoPhaseCommit is the algorithm for resolving commit commands for distributed transactions. What Prevayler does is a basic recovery mechanism, and it sweeps concurrency control under the carpet by serializing all concurrent accesses to the global object model. Well, it works in some situations, but you have to be very careful in analyzing whether it works for your particular problem. If you have a distributed transaction situation, the lack of support for TwoPhaseCommit can be a very serious deficiency, but again, depending on the particular situation it may not be too big a problem. Because what TwoPhaseCommit does at the infrastructure layer, transparent to the developer, could be done at the application layer with application-specific business logic. I guess that’s what Klaus’s article is trying to say: that you can deal with the issues at the business logic level. However, depending on the situation, that extra business logic you have to add can be extremely simple or it can be extremely complex, and I’m afraid the examples he presented are not very representative of how and why one would be in a distributed transaction situation and how to handle it.
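As one illustration of “doing it at the application layer,” here is a hedged sketch of the compensating-action style the commenter hints at: run each step, and if a later step fails, undo the earlier ones. All names here are hypothetical; this is one possible shape, not anything from the Prevayler article itself.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical application-level alternative to 2PC:
    // each step carries its own business-specific undo.
    interface Step {
        void apply();       // e.g. insert the new account row in one database
        void compensate();  // e.g. delete that row again
    }

    class CompensatingWorkflow {
        // Returns true if all steps applied; otherwise undoes the
        // completed ones in reverse order and returns false.
        boolean run(Step... steps) {
            Deque<Step> done = new ArrayDeque<>();
            for (Step step : steps) {
                try {
                    step.apply();
                    done.push(step);
                } catch (RuntimeException failure) {
                    while (!done.isEmpty()) {
                        done.pop().compensate();
                    }
                    return false;
                }
            }
            return true;
        }
    }

This makes the commenter’s trade-off visible: the compensation logic is business-specific, so it can be trivial (deleting a half-created user account) or very hard (un-sending an email), which is exactly why that extra business logic “can be extremely simple or it can be extremely complex.”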

The wiki article introducing The Prevayler gives these statistics:

  * 3,251 times faster than MySQL.
  * 9,983 times faster than ORACLE.

This is probably true for some operations. For example, someone else in the article goes on to explain that Oracle is tuned for a mixture of read and write operations, while MySQL is less optimized for real-world workloads. While Oracle generally delivers far better performance than MySQL, very specific cases can be set up to show the reverse.

The Prevayler stores its data entirely in memory and serializes all “transactions” (runs them step by step, one after another), so scale is severely limited, and real-world data sets simply cannot be processed with this Java toy. Much like Safari, “the world’s fastest browser,” which ignores some instructions (in some cases anything to do with background) in order to achieve that speed. Hardly useful!
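For readers who haven’t seen the prevalence idea, here is a rough toy sketch of the pattern as commonly described: keep the whole model in memory, journal every command to disk, and funnel all writes through one serialized point. Every name here is invented; this is not Prevayler’s actual API.

    import java.io.*;
    import java.util.HashMap;
    import java.util.Map;

    // A command that mutates the in-memory model. It must be serializable
    // so it can be journaled and replayed after a crash.
    interface Command extends Serializable {
        void executeOn(Map<String, String> model);
    }

    class ToyPrevalenceEngine {
        private final Map<String, String> model = new HashMap<>(); // the whole "database"
        private final ObjectOutputStream journal;

        ToyPrevalenceEngine(File journalFile) throws IOException {
            this.journal = new ObjectOutputStream(new FileOutputStream(journalFile, true));
        }

        // Every write funnels through this one synchronized method, so commands
        // run strictly one after another. That is the serialization the post
        // complains about: simple and crash-recoverable, but it caps throughput.
        synchronized void execute(Command command) throws IOException {
            journal.writeObject(command); // durable log first...
            journal.flush();
            command.executeOn(model);     // ...then apply to the in-memory model
        }
    }

Replaying the journal at startup (omitted here) rebuilds the model; that is the “basic recovery mechanism” the quoted commenter describes. There is no prepare phase and no way to coordinate with a second store, which is exactly where two-phase commit would normally come in.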

Still, “toy” is perhaps too harsh a word, for there are many web applications with trivial needs that such a system can satisfy. It’s annoying, though, to read Klaus defend his work by saying, over and over again, that any feature not implemented by “The Prevayler” is unnecessary in any situation. As shown with two-phase commit, this kind of arrogance is off-putting, and makes a person want to stick with standardized, tested, non-buggy RDBMS code.


3 Comments

  1. If you design applications as front-ends to RDBMSs then sure, 2-PC is inevitable…
    Otherwise, 2-PC is unnecessary. (And simplistic, I may add…)
    In fact it is undesirable, as it relies on fallacy #2 of the Fallacies of Distributed Computing (“latency is zero”).
    And consequently scalability suffers (more than the figures above show!).
    Have a look at this article and you’ll see that there are other ways…:

    http://www.michaelnygard.com/blog/2007/11/architecting_for_latency.html

    Comment by Tasos Zervos — December 28, 2008 @ 8:38 am

  2. http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing

    Comment by Tasos Zervos — December 28, 2008 @ 8:39 am

  3. You suck

    Comment by Yomomma — January 28, 2009 @ 9:31 pm

