Category Archives: Practical Design

Hadoop Cluster Sizing

When you are ready to start your “big data” initiative with Hadoop, one of your first questions will be about cluster sizing. What is the right hardware to choose in terms of price/performance? How much hardware do you need to handle your data and your workload? I will do my best to answer these questions in this article.

Measuring the Elephant Hadoop

Twitter Architecture Analysis (part 1)

Over time, as more people gain internet connectivity, internet services become more complicated. Twitter is one of the most complicated distributed systems deployed today, and it is really interesting to understand how it works under the hood.

If you want to be a distributed systems architect, a common interview question looks like this: “Imagine that you need to build Twitter from scratch. Define the technologies you would use for the backend and perform initial system sizing.” In this article I will give you my understanding of this problem and provide an example of an answer that I would consider a good one, even though it might be far from the real state of things. Be aware that I have no relation to Twitter the company, and everything stated below is just my own thoughts on the topic.