Wednesday, March 25, 2009

Implementing Harmony Oriented Programming in Hydra Cache: A Real World Project

In this post I would like to share some of the experience and lessons we learned when applying concepts and principles inspired by Harmony Oriented Programming (HOP) in the Hydra Cache project. Hydra Cache is an open source project created to implement Amazon’s Dynamo system in Java in order to provide a highly reliable and scalable distributed in-memory cache. The system itself is essentially a giant hash table that gets partitioned and distributed across a number of servers (we call this partition-aware server cluster a Hydra Space), with all servers working together to provide unified and transparent storage for the client code. One of the core objectives we are tackling in this project is to provide reliable in-memory storage by partitioning, distributing and coordinating all GET and PUT operations across multiple servers, hence offering redundancy and enhancing reliability. Let’s look at the following parameters used to describe Dynamo:

  • T – Total number of servers in the space.
  • N – Total number of replicas that will eventually receive the replication
  • W – Minimum number of replicas that must acknowledge a write for it to be considered successful
  • R – Minimum number of replicas consulted and consolidated for every successful read operation

To achieve reliability, each PUT operation invoked on any of the nodes will be replicated to a minimum of W servers and eventually to N servers. For example, if I store my shopping cart data in the cache with W=3 and N=6, then in theory the same data will be replicated to at least 3 servers before the PUT is considered successful, and eventually to 6 servers. Similarly, each GET operation requires a minimum of R servers to cooperate and consolidate the data they hold in order to perform a successful, reliable read.

Usually a system like Dynamo or Hydra Cache will be configured with T > N > max(W, R) and W >= R for the configuration to make sense. When you have W + R > N you have a basic quorum system, which guarantees strong consistency. On the other hand, with a W + R <= N configuration the system provides weak consistency (in this case eventual consistency), since a read is not guaranteed to see the most recent version written. However unsafe that might sound, this is actually a widely deployed configuration when the performance penalty of strong consistency is not desirable and eventual consistency, especially during failures, can be tolerated. Moreover, according to the CAP theorem only two of the three properties Consistency, Availability and tolerance to network Partition can be achieved at the same time, so it is common for large Internet-based services to sacrifice consistency for high availability and partition tolerance.

* In Hydra Cache the consistent hashing algorithm and the partition-aware client help minimize the inconsistency window when weak/eventual consistency is configured, meaning that, although still constrained by the CAP theorem, Hydra Cache delivers consistent data most of the time as long as there is no node or network failure.
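To make the relationship between these parameters concrete, here is a minimal Java sketch (the class and field names are purely illustrative and not part of the Hydra Cache code base) that validates an N/W/R configuration and reports which consistency level it yields:

```java
/**
 * Minimal sketch of an N/W/R replication configuration check.
 * The class and its fields are hypothetical and only illustrate the
 * rules described above; they are not Hydra Cache APIs.
 */
public class ReplicationConfig {
    private final int totalServers;   // T - total servers in the space
    private final int replicas;       // N - replicas that eventually receive the data
    private final int writeQuorum;    // W - acks required for a successful write
    private final int readQuorum;     // R - replicas consolidated for a successful read

    public ReplicationConfig(int totalServers, int replicas, int writeQuorum, int readQuorum) {
        // T > N > max(W, R) keeps the configuration meaningful
        if (!(totalServers > replicas && replicas > Math.max(writeQuorum, readQuorum))) {
            throw new IllegalArgumentException("Expected T > N > max(W, R)");
        }
        this.totalServers = totalServers;
        this.replicas = replicas;
        this.writeQuorum = writeQuorum;
        this.readQuorum = readQuorum;
    }

    /** W + R > N means read and write sets always overlap: strong consistency. */
    public boolean isStronglyConsistent() {
        return writeQuorum + readQuorum > replicas;
    }

    public static void main(String[] args) {
        ReplicationConfig cfg = new ReplicationConfig(10, 6, 3, 2);
        System.out.println(cfg.isStronglyConsistent()
                ? "Strong consistency (quorum)"
                : "Weak / eventual consistency");
    }
}
```

With W=3 and R=2 against N=6, W + R <= N, so this particular configuration lands on the eventual-consistency side of the trade-off.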

While designing the system, I initially experimented with driving the design directly from this model and found the resulting implementation rather rigid and lacking a metaphor. Personally I have always found that a powerful metaphor is the best help you can get when designing a system, so whenever I start coding the first thing I do is search for a metaphor that fits the problem at hand. While struggling with the design, a talk at OOPSLA caught my attention. The talk itself was about Harmony Oriented Programming. Although I was not particularly interested in it at the programming-language level, I was fascinated by how the basic Harmony Orientation principles and practices could be leveraged as a powerful ally for modeling the reliability framework in the Hydra Cache project.

Harmony Orientation Principles

  • Balance – For a HOP system the ultimate goal is to achieve balance (harmony) as a whole. No single member of the system should selfishly perform at the cost of its neighbors. This principle pretty much describes the essence of Hydra Cache, since no node should significantly out-perform or under-perform its peers, and the system as a whole (the Hydra Space) should always maintain equilibrium even when a node or connection fails or a new node is introduced.
  • Code Fragmentation – In HOP the dominant decomposition of a program is the code fragment with minimal encapsulation; in the simplest case a fragment could be just a single statement. For Hydra Cache this principle maps perfectly onto the two core operations the Hydra Space supports, GET and PUT, both of which can be reduced to almost a single statement from the client’s point of view (see the sketch after this list).
  • Code Positioning – The code positioning principle states that every part of a program is assigned to one or more locations in a virtual space. Related parts of a program are positioned close to each other to form a specific context. In the Hydra Space all code (the nodes) is positioned on a consistent hash ring, with related nodes placed close to each other based on their hash values.
  • Information Sharing – The information sharing principle suggests that all data is readable (or can be queried) by any part of the program. This is the core feature that Hydra Cache offers: the ability to query any data stored anywhere in the space from pretty much anywhere.
  • Information Diffusion – The information diffusion principle states that data, or a description of the data (metadata), generated by any part of the program is diffused throughout the virtual program space. Data has an associated intensity that decreases the further it is diffused. The concept of information diffusion is expressed in HOP using the concept of a substance, which I will illustrate further below; this concept maps nicely onto the N/W/R values we discussed at the beginning of this post.
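
To give a feel for the Code Fragmentation principle from the client’s perspective, here is a minimal sketch of what working with the space might look like; the interface and method names below are hypothetical and not the actual Hydra Cache client API:

```java
import java.io.Serializable;

/**
 * Illustrative client-side view of the space: each core operation is
 * essentially a single statement, matching the Code Fragmentation principle.
 * This interface is a sketch, not the real Hydra Cache API.
 */
public interface HydraSpaceClient {
    void put(String key, Serializable value);
    Serializable get(String key);
}

class ShoppingCartExample {
    void example(HydraSpaceClient space, Serializable cart) {
        // Both operations reduce to one statement from the client's point of view
        space.put("cart:42", cart);
        Serializable restored = space.get("cart:42");
    }
}
```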

Enough principles and dry ideas; now let’s see how HOP (or more accurately Harmony Oriented Design, HOD) is implemented in Hydra Cache. In Hydra Cache each server node is considered a code fragment holder that resides in a HOP space; see the following illustration for a 4-server space deployment.

* In this diagram the Hydra space is illustrated as a 2-dimensional space; in the implementation, however, the space is structured as a Consistent Hash Ring (a.k.a. a Continuum) using a uniformly distributed consistent hashing algorithm inspired by the Ketama project. Therefore the distance between server nodes, as well as the radius of the substances, is measured using the difference between each node’s hash value.
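
For readers unfamiliar with the technique, here is a deliberately simplified sketch of a consistent hash ring; the real Hydra Cache continuum uses a Ketama-inspired algorithm with many points per node, whereas this version places a single point per node just to show the idea:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Simplified consistent hash ring (continuum) sketch. Not the actual
 * Hydra Cache implementation: only one point per node, MD5-based hashing.
 */
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String nodeName) {
        ring.put(hash(nodeName), nodeName);
    }

    /** A key is owned by the first node found clockwise from the key's hash. */
    public String nodeFor(String key) {
        long h = hash(key);
        SortedMap<Long, String> tail = ring.tailMap(h);
        Long owner = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(owner);
    }

    private long hash(String value) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(value.getBytes(StandardCharsets.UTF_8));
            // Fold the first 4 bytes of the MD5 digest into an unsigned 32-bit value
            return ((long) (digest[3] & 0xFF) << 24)
                 | ((long) (digest[2] & 0xFF) << 16)
                 | ((long) (digest[1] & 0xFF) << 8)
                 | (digest[0] & 0xFF);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the nodes placed on the ring this way, a key is owned by the first node found clockwise from the key’s hash, and the distance between nodes is simply the difference between their hash values.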

As you can see in the diagram, Server A emits a substance that includes both Server B and Server D, hence any code fragment residing on Server A will diffuse its information to both B and D. In other words, every time a PUT operation is invoked on A the same operation will be invoked on both B and D. It now becomes obvious that the size of the substance is the N value, and on top of that Hydra Cache also allows the server administrator to configure the W and R parameters to fine-tune the consistency level. Server A is not the only one that emits a substance; rather, all servers in the space do. Here is a somewhat messy diagram showing all 4 substances.
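
Below is a hedged sketch of how a node’s substance could be derived from the ring. It treats the substance as the N nodes closest to the emitting server by hash distance, which is one way to read the radius idea from the note above; the helper is hypothetical and not the actual Hydra Cache implementation:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/**
 * Sketch of substance (replica set) selection. The radius of a substance is
 * measured in hash-value distance on the ring, so here a node's substance is
 * simply the N nodes closest to it on the continuum. Hypothetical helper,
 * not the actual Hydra Cache code.
 */
public class SubstanceSelector {
    private static final long RING_SIZE = 1L << 32;   // 32-bit hash space

    /** Circular distance between two positions on the ring. */
    private static long distance(long a, long b) {
        long d = Math.abs(a - b);
        return Math.min(d, RING_SIZE - d);
    }

    /**
     * @param ring     consistent hash ring mapping hash position -> node name
     * @param nodeHash hash position of the server emitting the substance
     * @param n        substance size (N): how many neighbours receive the diffusion
     */
    public static List<String> substanceFor(TreeMap<Long, String> ring, long nodeHash, int n) {
        return ring.entrySet().stream()
                .filter(e -> e.getKey() != nodeHash)                 // exclude the emitting node
                .sorted(Comparator.comparingLong(
                        (Map.Entry<Long, String> e) -> distance(e.getKey(), nodeHash)))
                .limit(n)                                            // N closest neighbours
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }
}
```

In the 4-server example with N=2, calling substanceFor with Server A’s position would return its two nearest neighbours on the ring (B and D in the diagram).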

Let’s see it in action. Based on the above example the Hydra space would be configured with N=2. To optimize reads you can configure W=2 and R=1, which means every PUT operation is only considered successful when the data gets diffused to at least one other server (either D or C) in the substance, and eventually reaches both servers (most likely D and C, but it could be B if one of them crashes during the diffusion). For reads the server will simply return the local copy, since R is 1. As demonstrated here, these parameters are a flexible and powerful tool to tune the space to focus and balance different priorities: consistency, reliability, fault tolerance, read vs. write, and performance.
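
To illustrate how W shapes a write, here is a minimal sketch of a diffusing PUT that sends the data to every replica in the substance but only blocks until the required number of acknowledgements arrive. Everything here is illustrative: the ReplicaClient interface is hypothetical, the real Hydra Cache 1.0 diffusion rides on JGroups reliable multicast rather than point-to-point calls, and if W is counted to include the local copy (as in the example above) the caller would wait for W - 1 remote acknowledgements:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a write that diffuses data to every server in the substance but
 * only blocks until enough acknowledgements arrive. Not Hydra Cache code.
 */
public class DiffusingWriter {
    /** Hypothetical stand-in for a remote replica connection. */
    public interface ReplicaClient {
        void put(String key, byte[] value) throws Exception;
    }

    private final ExecutorService executor = Executors.newCachedThreadPool();

    /**
     * @param substance the replica clients in this node's substance (size N)
     * @param acksNeeded acknowledgements required before the write is reported successful
     */
    public boolean put(List<ReplicaClient> substance, int acksNeeded,
                       String key, byte[] value, long timeoutMillis) throws InterruptedException {
        CountDownLatch acks = new CountDownLatch(acksNeeded);
        for (ReplicaClient replica : substance) {
            executor.submit(() -> {
                try {
                    replica.put(key, value);   // diffuse to every replica in the substance
                    acks.countDown();          // count only successful acknowledgements
                } catch (Exception ignored) {
                    // a failed replica simply never acknowledges
                }
            });
        }
        // The write succeeds once enough replicas have acknowledged (or fails on timeout)
        return acks.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```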

HOD offered an almost perfect metaphor for modeling and solving our problem domain, which in turn made it extremely effective to communicate our design in both human language and code. In our version 1.0 release the Hydra space is implemented using reliable multicast with JGroups. If you are interested in learning more about Hydra Cache, please visit our website at http://www.hydracache.org.


References:

  1. Harmony Oriented Programming
  2. Towards harmony-oriented programming
  3. Byzantine fault tolerance
  4. Fault detection for Byzantine quorum systems
  5. Amazon Dynamo
  6. Project Voldemort
  7. Consistent Hashing and Random Trees
  8. Eventual Consistency