Over the last eight years we have been building an open framework for database replication and clustering. Relational databases have been the keystone of information systems' dependability. They offer a uniform approach to data integrity and durability, using tried and tested techniques based on a set of widely accepted assumptions. However, current systems leave a lot to be desired in terms of fault tolerance. Centralized systems, no matter how powerful and expensive, have inherent scalability limits and cannot tolerate a crash without downtime for repair.
The ESCADA framework we built comprises a rich set of database-specific replication protocols, implementations of a generic database replication interface (GORDA Application Programming Interface) for the most representative open-source database management systems (MySQL, PostgreSQL and Apache Derby), implementations of a generic group communication interface (Java Group Communication Service) for most current open-source group communication toolkits, and tools for setting up, monitoring and managing database clusters.
We are now turning our focus to cloud computing environments. It seems that the pendulum has really swung back this time: despite the many challenges ahead, critical services and large amounts of data are better off stored and managed inside the cloud. In the scope of the "Dependable Cloud Computing Management Services" project (with HP Labs) we are looking into monitoring and data management protocols capable of scaling up to tens of thousands of nodes, leveraging our previous results on, and knowledge of, epidemic multicast protocols. In the Clouder project we are studying the emerging proposals for cloud-based storage as well as the emerging client applications. Our aim is to build a (very) large-scale, fully decentralized object store with the right API and the desired consistency guarantees.