Enterprise-wide applications vs World-wide applications

Enterprise-wide applications provide services to a community where a few thousands, maybe hundreds of thousands, users use the application to achieve some business goals. These enterprise applications are characterized by Martin Fowler as containing a large amount data which is simultaneously accessed by users through a set of operations that implement complex business logic. To guarantee the consistency of data these kind of systems use a transactional architecture which ensures the ACID (Atomicity, Consistency, Isolation, and Durability) properties for the operations executed by users. From a software engineering point of view transactional architectures are very convenient for developers because they can focus on the business logic of the application and ignore the infrastructural aspects resulting from the simultaneous access by users. Therefore, developers can code each operation as it will execute in isolation.

World-wide applications, like Facebook or Amazon, are simular to enterprise application but for the number of users and information they have to deal with. This kind of applications have millions of users and their communities are constantly enlarging, which requires some plasticity of the production infrastructure, the cost of increasing the number of servers and disks should be linear with the increase of the users base. This quality is called scalability. The tactic this kind of applications use to achieve the scalability quality is to apply a shared nothing architecture, where each component provides its services independently of the other. A shared nothing architecture splits its data in independent shards such that each shard is self-sufficient to provide the service. Therefore there is no significant overhead in adding, or removing, components in a shared nothing architecture. For example, in Facebook, which has more that one billion (thousand million) users, it is necessary to have several shards for the single reason that there isn’t a single disk with the capacity to contain all the information. So, each node in the architecture contains the data for some of the users and when, due to their activity, the disk capacity is not enough a new data node is added to the system and the users’ data split between the two nodes.

Today’s transactional technologies do not scale, at least they do not fully support the ACID properties and so a weak consistency model is provided, which may not fit the end user intuition. Consider one of the Facebook shards, which is necessary because of scalability on the amount of data Facebook supports, the data in this shard is accessed by millions of users and this number keeps increasing. Therefore, there is another scalability need in terms of processing capability to support all these accesses. The architectural tactic followed by Facebook is to have several replicas of the shard to augment its capability in terms of read operations. This corresponds to a master-slave architecture where writes are done in the master, which propagates them to the slaves, and reads are supported by the slaves. Therefore, more slaves can be added to the system when the demand increases. However, to provide good performance the propagation of writes to the slaves is not transactional, because it would delay the termination of the execution of the write operation in the master until all slaves were synchronized, and an additional number of messages would need to be exchanged between nodes. As a consequence of this architectural decision it results in a system behavior where two different users, accessing at the same time different slave nodes of the same shard, may not obtain the same version of the data.

From the above example we can identify two aspects that need to be taken into consideration by an architect of a world-wide application: (1) what will be the weak consistency model, (2) what will be the system workload. The Facebook consistency mode guarantees that eventually the users will obtain the most recent data, which is called eventual consistency, and the delay can also be specified. For instance the architecture of MediaWiki, the Wikipedia engine, tries to enforce a maximum lag of 30 seconds between a write in the master and its completed propagation to the slaves. Second, the software architect needs to have a precise description of the system workload. Note that the master-slave tactic of Facebook’s architecture is based on a precise requirement associated with the scalability of reads, and the scalability of writes is dealt by the sharding tactic because the increase of writes is related to the amount of data that needs to be stored. Note that in Facebook the information is not deleted, thus the relation between the number of writes and the amount of disk capacity.

The lack of an infrastructure that supports the ACID properties is a burden for software developers because they need to concern about both the business logic and the “logic” of the interactions among the different operations. Therefore, the development of world-wide applications is harder and more error prone, which increases the overall cost of development.