Scalable Web Architecture and Distributed Systems

In the article Scalable Web Architecture and Distributed Systems, Kate Matsudaira gives a good overview of the qualities of scalable web applications and of the solutions used to achieve them. In the following analysis I use the SEI software architecture terminology; in particular, I distinguish a module, a unit of implementation, from a component (or connector), a unit of execution. I also describe the architecture using several views, which can be of the module, component-and-connector, or allocation type.

The article does not describe a particular system; instead, it systematically enumerates the mechanisms that today’s highly scalable systems use. The article has three parts: the first identifies the qualities web applications should have; the second describes different architectural configurations that achieve scalability; and the third details mechanisms that are pervasive in today’s highly scalable systems and that improve performance and availability.

When identifying requirements, the article includes a business requirement, cost, and several system requirements, such as performance and scalability. Usually, business and system requirements are not independent. For instance, manageability, that is, how easy the system is to operate, has an impact on cost, because the number of system administrators needed to manage the scalable system depends on how easy it is to operate. In fact, system requirements may result from a refinement of business requirements.

Given the kind of qualities in play, the architectural views are of the component-and-connector type, so that they describe how the runtime execution of the system can support, for instance, performance or availability. In particular, the second part, section 1.2, presents a few architectural views of a hypothetical Image Hosting Application to illustrate how the qualities can be achieved. These views describe components that interact through connectors, and it is the connectors that provide most of the relevant qualities. The mechanisms that connectors need to use, such as caching, are the focus of the third part, section 1.3.

Four components are identified. There are two client components, which illustrate that there are two kinds of operations that, from an architectural perspective, use the system resources differently: upload operations, which may be slower because the file has to be written to disk, and download operations, which are faster and occur more frequently. Additionally, note that in a typical workload for this kind of system the number of downloads is several times larger than the number of uploads. The other two components, the ImageRequestHandler and the ImageFileStorage, support the processing of client requests and the storage of the files, respectively. The connectors between these components support the communication protocols and have the properties that provide the overall system qualities. It is by discussing the different configurations of the four components and the properties of their connectors that software architects can reason about the systemic qualities of the system.
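To make the vocabulary concrete, here is a minimal sketch of how such a component-and-connector view could be written down as data. It is only an illustration, not something taken from the article: the Component and Connector classes, the connector names, and the name UploadClient are assumptions introduced here; DownloadClient, ImageRequestHandler and ImageFileStorage are the names used in this analysis.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A unit of execution in the component-and-connector view."""
    name: str


@dataclass
class Connector:
    """An interaction path between components, labelled with properties."""
    name: str
    kind: str          # interaction style, e.g. "request/reply"
    endpoints: tuple   # the components this connector attaches to
    properties: dict = field(default_factory=dict)


# "UploadClient" is a hypothetical name chosen for this sketch.
upload_client = Component("UploadClient")
download_client = Component("DownloadClient")
handler = Component("ImageRequestHandler")
storage = Component("ImageFileStorage")

# Initial configuration: every client request goes through the handler.
# Connector names are assumptions made for illustration.
view = [
    Connector("upload", "request/reply", (upload_client, handler)),
    Connector("download", "request/reply", (download_client, handler)),
    Connector("file-access", "request/reply", (handler, storage)),
]


def connectors_of(component, connectors):
    """Return the names of the connectors attached to a component."""
    return [c.name for c in connectors if component in c.endpoints]


print(connectors_of(handler, view))
```

In this configuration every connector is attached to the ImageRequestHandler, which is a quick way to see that it sits on the path of every request.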

The following are a few examples of how this reasoning about system qualities can take place.

Suppose that the team is discussing an architectural view that depicts a single ImageRequestHandler component to which both kinds of client components connect to upload and download files. What kind of questions may be asked? The questions may be driven by the architecturally significant requirements that were agreed among the stakeholders. Let’s assume that 95% of the requests will be reads and only 5% will be writes. In this situation it would be a good idea to try to reduce the load that reads place on the ImageRequestHandler component. To do so, a good tactic may be to define another connector, between the DownloadClient component and the ImageFileStorage, so that the actual transfer of the file is done without the intermediation of the ImageRequestHandler. Therefore, according to this new architectural view, DownloadClient components interact with the ImageRequestHandler to obtain the file location and then send a request to the ImageFileStorage for the file transfer. Note that the new connector may use streaming, while the old connector is of the request/reply type. Overall, the load on the ImageRequestHandler, which no longer relays file contents for downloads, is reduced by moving some of the work to the DownloadClient component.
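The two-step download can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions, not the article's design: the method names (upload, locate, read, download), the byte counters, and the "/images/N" location scheme are all hypothetical, and plain method calls stand in for the request/reply and file-transfer connectors.

```python
class ImageFileStorage:
    """Stores file contents by location and counts the bytes it serves."""

    def __init__(self):
        self._files = {}
        self.bytes_served = 0

    def write(self, location, data):
        self._files[location] = data

    def read(self, location):
        data = self._files[location]
        self.bytes_served += len(data)
        return data


class ImageRequestHandler:
    """Handles uploads and answers location lookups for downloads."""

    def __init__(self, storage):
        self._storage = storage
        self._locations = {}
        self.bytes_relayed = 0  # file bytes that passed through the handler

    def upload(self, name, data):
        location = f"/images/{len(self._locations)}"  # hypothetical scheme
        self._storage.write(location, data)
        self._locations[name] = location
        self.bytes_relayed += len(data)

    def locate(self, name):
        # Small request/reply exchange: only the location is returned.
        return self._locations[name]


class DownloadClient:
    def __init__(self, handler, storage):
        self._handler = handler
        self._storage = storage

    def download(self, name):
        location = self._handler.locate(name)  # step 1: ask where the file is
        return self._storage.read(location)    # step 2: transfer from storage


storage = ImageFileStorage()
handler = ImageRequestHandler(storage)
handler.upload("cat.jpg", b"x" * 1000)
relayed_after_upload = handler.bytes_relayed

client = DownloadClient(handler, storage)
image = client.download("cat.jpg")
print(handler.bytes_relayed - relayed_after_upload, storage.bytes_served)
```

The handler's counter does not move during the download: the file bytes travel only over the connector between the client and the storage.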

The previous description shows that by changing the configuration in a component-and-connector view we can discuss systemic qualities, like performance. Consider now that the same files are frequently downloaded and that there is an architecturally significant requirement stating that they should be downloaded quickly. A possible tactic to satisfy this requirement is to keep a copy of the file closer to the DownloadClient components, with the additional advantage of reducing the load on the ImageFileStorage; a cache is a technique that implements this tactic. How can we represent a cache? A first approach is to say that the file transfer connector between the DownloadClient components and the ImageFileStorage supports the cache, which can be represented as a property of the connector. Then we may ask a few more questions. First, if two different DownloadClient components download the same file, will it be fetched from the ImageFileStorage twice? This amounts to asking whether we will have a local or a global cache. Suppose that the decision is to have a global cache, because the download workload hints that different clients will frequently download the same files within a certain period of time; for instance, when a photo is published on Facebook it is immediately downloaded by several users. How can we represent this? So far, the architectural view has a different connector for each pair of DownloadClient and ImageFileStorage components. A possible representation is to have a single connector shared by all DownloadClient components and the ImageFileStorage, and to label it with a property indicating that a global cache will be used to implement it.
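The difference between a local and a global cache can be made tangible with a small experiment. The sketch below is an assumption-laden toy, not the article's mechanism: the Cache class, the reads counter, and the storage_reads helper are hypothetical names, and three clients each download the same file once.

```python
class ImageFileStorage:
    """Serves files by name and counts how many reads reach it."""

    def __init__(self, files):
        self._files = dict(files)
        self.reads = 0

    def read(self, name):
        self.reads += 1
        return self._files[name]


class Cache:
    """Read-through cache in front of the storage (no eviction here)."""

    def __init__(self, storage):
        self._storage = storage
        self._entries = {}

    def get(self, name):
        if name not in self._entries:
            self._entries[name] = self._storage.read(name)
        return self._entries[name]


def storage_reads(shared, clients=3):
    """Each client downloads the same file once; return reads on storage."""
    storage = ImageFileStorage({"photo.jpg": b"pixels"})
    if shared:
        # Global cache: one cache instance behind a single connector.
        caches = [Cache(storage)] * clients
    else:
        # Local caches: one cache per client connector.
        caches = [Cache(storage) for _ in range(clients)]
    for cache in caches:
        cache.get("photo.jpg")
    return storage.reads


print(storage_reads(shared=False))  # 3
print(storage_reads(shared=True))   # 1
```

With local caches each client misses once, so the storage is read three times; with the global cache only the first client's request reaches the storage.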

This second iteration on the architectural view shows that by adding properties to connectors we can discuss how the system qualities are achieved. The next step, if the team considered it necessary, would be to decompose the file transfer connector into a component, the GlobalCache component, and two connectors, one with the DownloadClient components and another with the ImageFileStorage, in order to detail how the interactions will occur and, for instance, what the eviction policy of the cache will be, and whether it will be functionality-specific or application-independent. Note that the level of detail of the architectural views depends on the architecturally significant requirements. In this case, the file transfer connector should only be decomposed if there is an explicit architectural requirement for which labelling the connector with a property would not be enough to show how the quality is achieved.
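If the team did decide to decompose the connector, the GlobalCache component and its eviction policy might be sketched as follows. The text above does not choose a policy; least-recently-used (LRU) eviction is assumed here purely for illustration, and the fetch callable, the capacity parameter, and the misses counter are hypothetical names. The fetch callable stands in for the connector to the ImageFileStorage.

```python
from collections import OrderedDict


class GlobalCache:
    """Shared cache with least-recently-used eviction (an assumed policy)."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch            # stands in for the storage connector
        self._capacity = capacity
        self._entries = OrderedDict()  # oldest entry first
        self.misses = 0

    def get(self, name):
        if name in self._entries:
            self._entries.move_to_end(name)    # mark as most recently used
            return self._entries[name]
        self.misses += 1
        data = self._fetch(name)
        self._entries[name] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return data

    def cached_names(self):
        return list(self._entries)


files = {"a.jpg": b"A", "b.jpg": b"B", "c.jpg": b"C"}
cache = GlobalCache(files.__getitem__, capacity=2)
for name in ["a.jpg", "b.jpg", "a.jpg", "c.jpg"]:
    cache.get(name)

print(cache.cached_names(), cache.misses)
```

After this sequence the cache holds a.jpg and c.jpg: the second request for a.jpg made b.jpg the least recently used entry, so b.jpg is the one evicted when c.jpg arrives. An application-independent policy like this one needs no knowledge of what the files are, which is one side of the choice mentioned above.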