Past projects

DC2MS - Dependable Cloud Computing Management Services

A Cloud Computing environment is a collection of individual nodes that a consumer can see, transparently, as a single highly available computer. This is achieved by passing messages over the network, both to exchange business data among the nodes and to exchange the system data that keeps the system working properly and in a coordinated manner, ensuring correct service delivery. To ensure that the system works properly, we need to ensure that exchanged messages are reliably delivered to their intended recipients. Guaranteeing this in large systems with several hundred or even thousands of nodes is difficult due to sheer scale, let alone in the presence of failures. In systems of this dimension that are required to run 24/7, failures are common, whether caused by hardware or software problems in links or nodes. Failures and business needs (for example, during a peak in service demand, or to reduce energy costs, nodes must be turned on and off without imposing constraints on service delivery) lead to a highly dynamic environment where change is the normal event.

It is clear, then, that coordination issues and the problems associated with this dynamism are key issues in a Cloud Computing environment. At this low level it is fundamental to ensure reliable message delivery and proper node coordination, thus allowing the higher levels of abstraction to work properly toward the delivery of the service. We believe that the key to solving these issues is not protocols and systems that merely cope with change, but protocols that handle change by design, since change is a natural part of those systems.
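As a concrete illustration of the kind of low-level coordination building block involved, the sketch below (our own minimal example, not project code; the class name HeartbeatMonitor is hypothetical) shows a heartbeat-based failure detector that distinguishes planned departures, such as a node powered down to save energy, from suspected failures.

    import java.util.Map;
    import java.util.Set;
    import java.util.HashSet;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: a heartbeat-based failure detector, one of the
    // low-level coordination building blocks the project targets. Nodes
    // that stop sending heartbeats within `timeoutMillis` become suspects.
    public class HeartbeatMonitor {
        private final Map<String, Long> lastHeard = new ConcurrentHashMap<>();
        private final long timeoutMillis;

        public HeartbeatMonitor(long timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }

        // Called whenever a heartbeat message arrives from `nodeId`.
        public void onHeartbeat(String nodeId) {
            lastHeard.put(nodeId, System.currentTimeMillis());
        }

        // Called when a node announces a planned shutdown (e.g., to save
        // energy), so churn by design is not mistaken for a failure.
        public void onLeave(String nodeId) {
            lastHeard.remove(nodeId);
        }

        // Nodes whose heartbeats are overdue; the membership layer can
        // then exclude them and re-route messages to live replicas.
        public Set<String> suspects() {
            long now = System.currentTimeMillis();
            Set<String> out = new HashSet<>();
            for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
                if (now - e.getValue() > timeoutMillis) {
                    out.add(e.getKey());
                }
            }
            return out;
        }
    }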

Project website

Pastramy - Persistent and highly Available Software TRansactional MemorY

The hardware industry's commitment to multi-core processors as the only practical way to improve the computing power of the new generation of computers has finally brought concurrent programming into the realm of mainstream programming. Yet almost all modern programming languages lack adequate abstractions for concurrent programming. Software Transactional Memories (STMs) are emerging as a powerful paradigm for developing concurrent applications. By relieving the programmer from the burden of managing locks or other low-level concurrency control mechanisms, the reliability of the code is increased and the software development time is significantly shortened. The INESC-ID team has coordinated the development of a university management system, called FenixEDU, that is based on STM technology. The application has been in production since 2001 at the Instituto Superior Técnico (IST) of the Technical University of Lisbon, serving a population of 12000 students, 900 faculty members, and 800 administrative staff. FenixEDU is the first system in the world to use an STM approach in production. FenixEDU already augments the basic STM model with persistence, to provide ACID properties to web application functionality, and with replication, because the system's high load requires the application to be deployed on more than one cluster server. However, our current solutions for these two important aspects suffer from several limitations, including:

  • The interface with the persistence subsystem (a relational database) often requires an excessive amount of memory, increasing garbage collection activity and degrading system performance.
  • The system is only able to interface with a single datastore, which introduces a single point of failure.
  • Resolution of conflicts between STMs running on different nodes of the cluster requires nodes to obtain exclusive access to the datastore during the commit phase, limiting the concurrency of the system.

These limitations are not surprising given that, to our knowledge, techniques to persist and replicate STMs are still an open research problem. This project brings the FenixEDU team together with two research teams with significant expertise in building replication solutions for relational and object-oriented databases. This will allow the FenixEDU team to gain insight into the pros and cons of existing replication techniques, both from a theoretical and a practical point of view. Reciprocally, the database replication experts will obtain an inside view of the singularities of the STM approach. The project will use this synergy to design, develop, implement, and validate a Persistent and highly Available Software TRansactional MemorY.
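To make the STM programming model concrete, here is a minimal sketch of our own (MiniStm, TBox, and atomic are hypothetical names, not the FenixEDU API). It deliberately serializes transactions with a single global lock; a production STM instead runs transactions optimistically and validates them at commit time, but the programmer-facing abstraction, an atomic block instead of hand-managed locks, is the same.

    import java.util.concurrent.locks.ReentrantLock;
    import java.util.function.Supplier;

    // Hypothetical sketch of the STM programming model: the programmer
    // marks a computation as atomic instead of managing locks by hand.
    // This naive version serializes transactions with one global lock;
    // a real STM runs them optimistically and validates reads at commit,
    // but the abstraction offered to the programmer is the same.
    public class MiniStm {
        private static final ReentrantLock GLOBAL = new ReentrantLock();

        // A transactional memory cell.
        public static final class TBox<T> {
            private T value;
            public TBox(T initial) { value = initial; }
            public T get() { return value; }
            public void put(T v) { value = v; }
        }

        public static <T> T atomic(Supplier<T> tx) {
            GLOBAL.lock();
            try {
                return tx.get();  // all reads/writes inside appear indivisible
            } finally {
                GLOBAL.unlock();
            }
        }

        public static void main(String[] args) {
            TBox<Integer> from = new TBox<>(100);
            TBox<Integer> to = new TBox<>(0);
            // The transfer is atomic: no other transaction can observe the
            // state where money has left `from` but not arrived at `to`.
            atomic(() -> {
                from.put(from.get() - 10);
                to.put(to.get() + 10);
                return null;
            });
            System.out.println(from.get() + " / " + to.get()); // 90 / 10
        }
    }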

Project website

GORDA - Open Replication of Databases

Database replication is a key technology for the long-term competitiveness of today's businesses. In this context, replication technology is faced with new requirements. First, enterprise-wide availability should take legacy information systems into account. Moreover, open markets and a globalized economy mean that wide-area support is required not only by large businesses but also by small and medium enterprises (SMEs), which inevitably raises security issues. Finally, the wider applicability of database technology calls for highly scalable DBMSs. The resulting heterogeneity, geographical dispersion, and scale have a profound impact on the applicability of replication techniques. The wider applicability of DBMSs also calls for an increased concern with cost. The goal of the GORDA project is to foster database replication as a means to address the challenges of trust, integration, performance, and cost in the database systems underlying the information society. This is to be achieved by standardizing architecture and interfaces, and by sparking their usage with a comprehensive set of components ready to be deployed.

Project website

StrongRep - Strongly Consistent Replicated Databases in Geographically Large-Scale Systems

The StrongRep project addresses the problem of maintaining strongly consistent replicated data in a large-scale setting. To achieve this goal, the project intends to combine recent research on replication management with broadcast protocols optimized for large-scale operation. The project builds on recent efforts to implement and optimize topology-aware broadcast algorithms, as well as semantically reliable algorithms, and intends to configure and tune them for the specific application area of database replication. It is expected that the novel combination of these approaches can result in a solution with enough performance to sustain the management of critical application data that strictly requires strong consistency.

Project website

Rumor - Probabilistic semantically reliable multicast protocols

In recent years, two orthogonal and promising approaches to improving the scalability of multicast protocols have emerged. One is the use of probabilistic epidemic protocols, which support the efficient dissemination of data among a large number of nodes while providing probabilistic delivery guarantees. The other is the use of semantic knowledge to sustain higher throughput in groups with heterogeneous performance. This project intends to conduct research to integrate both approaches into a single, efficient, large-scale multicast protocol. Any large-scale application that requires multicasting messages among a large number of members and exhibits some degree of message obsolescence can potentially benefit from the results of this project.
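As a rough illustration of the epidemic approach, the sketch below (our own example; GossipNode and its fanout parameter are hypothetical, not project code) pushes each newly received message to a few randomly chosen peers. With a fanout that grows logarithmically with the group size, such protocols reach all nodes with high probability, which is the probabilistic delivery guarantee mentioned above.

    import java.util.List;
    import java.util.Set;
    import java.util.HashSet;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Random;

    // Hypothetical sketch of epidemic (gossip) dissemination: on first
    // receiving a message, a node relays it to `fanout` peers chosen at
    // random; duplicates are ignored, so the epidemic eventually dies out.
    public class GossipNode {
        private final String id;
        private final List<GossipNode> peers = new ArrayList<>();
        private final Set<String> seen = new HashSet<>();
        private final int fanout;
        private final Random rng = new Random();

        public GossipNode(String id, int fanout) {
            this.id = id;
            this.fanout = fanout;
        }

        public void setPeers(List<GossipNode> all) {
            for (GossipNode n : all) if (n != this) peers.add(n);
        }

        public void receive(String messageId, String payload) {
            if (!seen.add(messageId)) return;  // duplicate: stop the epidemic here
            System.out.println(id + " delivered " + payload);
            List<GossipNode> targets = new ArrayList<>(peers);
            Collections.shuffle(targets, rng);
            for (GossipNode t : targets.subList(0, Math.min(fanout, targets.size()))) {
                t.receive(messageId, payload);  // push the rumor onward
            }
        }

        public static void main(String[] args) {
            List<GossipNode> group = new ArrayList<>();
            for (int i = 0; i < 16; i++) group.add(new GossipNode("node-" + i, 3));
            for (GossipNode n : group) n.setPeers(group);
            group.get(0).receive("m1", "hello, group");
        }
    }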

Project website

ESCADA - Fault-tolerant Scalable Distributed Databases

The project aims to study, design, and implement transaction replication mechanisms suited to large-scale distributed systems. In particular, the proposal intends to exploit partial replication techniques to provide strong consistency criteria without introducing significant synchronization and performance overheads. Although several replication frameworks consider the possibility of partial replication, the technique is usually justified by storage constraints and regulated by weak consistency criteria.
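For intuition, the following minimal sketch (ours; PartialReplication and its partition map are hypothetical) shows the core idea of partial replication: each item is stored by a small replica group, so an update needs to be coordinated only with the nodes that actually hold it. The open problem the project tackles is doing this while still enforcing strong consistency across groups.

    import java.util.List;
    import java.util.Map;
    import java.util.HashMap;
    import java.util.Arrays;

    // Hypothetical sketch of partial replication: each data partition is
    // stored by a small replica group rather than by every node, so an
    // update only needs to be synchronized with the few replicas that
    // actually hold the item.
    public class PartialReplication {
        private final Map<Integer, List<String>> replicasOfPartition = new HashMap<>();
        private final int partitions;

        public PartialReplication(int partitions) {
            this.partitions = partitions;
        }

        public void assign(int partition, List<String> replicaGroup) {
            replicasOfPartition.put(partition, replicaGroup);
        }

        // Only the replicas of the item's partition take part in an update.
        public List<String> replicasFor(String key) {
            int partition = Math.abs(key.hashCode()) % partitions;
            return replicasOfPartition.get(partition);
        }

        public static void main(String[] args) {
            PartialReplication pr = new PartialReplication(2);
            pr.assign(0, Arrays.asList("n1", "n2"));
            pr.assign(1, Arrays.asList("n3", "n4"));
            // An update of "account-42" touches only its partition's
            // replicas, not all four nodes.
            System.out.println(pr.replicasFor("account-42"));
        }
    }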

Project website

SHIFT - Group Communication with Differentiated Messages

The project studies new protocols that use application semantics to improve the performance and throughput stability of end-to-end multicast communication. It comprises the study of new multicast protocols based on the concepts of message obsolescence and message semantics to improve throughput stability in group communication, and the implementation of semantic multicast protocols on Appia, a layered and modular communication framework. These protocols can be applied in diverse application fields, such as publish-subscribe and data replication applications.
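To illustrate message obsolescence, the sketch below (our own example; ObsolescenceBuffer is a hypothetical name, not project code) models a send buffer in which a newer message with the same semantic key, say a fresh quote for the same stock symbol, purges its obsolete predecessor, so slow receivers skip stale data instead of stalling the whole group.

    import java.util.Map;
    import java.util.LinkedHashMap;
    import java.util.Collection;

    // Hypothetical sketch of message obsolescence: when application
    // semantics say a newer message supersedes an older one, the older
    // message can be dropped from a congested buffer without hurting
    // correctness, keeping throughput stable when some members are slow.
    public class ObsolescenceBuffer {
        // Keyed by the semantic identifier the messages share; a new entry
        // for the same key silently purges the now-obsolete old one.
        private final Map<String, String> pending = new LinkedHashMap<>();

        public void enqueue(String key, String payload) {
            pending.remove(key);        // drop the obsolete predecessor, if any
            pending.put(key, payload);  // fresh message goes to the tail
        }

        public Collection<String> drain() {
            return pending.values();
        }

        public static void main(String[] args) {
            ObsolescenceBuffer buf = new ObsolescenceBuffer();
            buf.enqueue("ACME", "ACME=10.0");
            buf.enqueue("INIT", "INIT=3.2");
            buf.enqueue("ACME", "ACME=10.5");  // makes the first quote obsolete
            System.out.println(buf.drain());   // [INIT=3.2, ACME=10.5]
        }
    }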

Project website