Tag Archives: design

Apache HAWQ Architecture Talk

I have finally translated my talk from the Highload++ 2015 conference in Moscow into English, so you can now enjoy fresh information about Apache HAWQ internals!

If you’d like to download the slides, you can find them here: HAWQ Architecture HL++ 2015 Moscow

Spark Architecture


Edit from 2015/12/17: The memory model described in this article is deprecated starting with Apache Spark 1.6. The new memory model is based on UnifiedMemoryManager and is described in this article

Recently I’ve answered a series of questions about Apache Spark architecture on StackOverflow. All of them seem to be caused by the absence of a good general description of the Spark architecture on the internet. Even the official guide does not go into much detail, and of course it lacks good diagrams. The same goes for the “Learning Spark” book and the materials of the official workshops.

In this article I will try to fix this and provide a one-stop guide to the Spark architecture in general, along with answers to some of the most popular questions about its concepts. This article is not for complete beginners: it does not introduce Spark’s main programming abstractions (RDD and DAG), but assumes you already know them.

This is the first article in a series. The second one, about shuffle, is available here. The third one, about the new memory management model, is available here.


Why independent consultancy matters?

Distributed Systems Architecture

brought to you by Alexey Grishchenko
