The first blog post of mine is accepted to official Pivotal blog! Feel free to comment and share your opinion on the subject:
Today I had a great talk at the Hadoop User Group Ireland meetup in Dublin, and it was an adapted and refactored version of the article on the same subject, MPP vs Hadoop. Here are the slides:
Feel free to comment and share your opinion on this subject
Here are the slides for the talk I just gave at JavaDay Kiev about the architecture of Apache Spark, its internals like memory management and shuffle implementation:
If you’d like to download the slides, you can find them here: Spark Architecture – JD Kiev v04
Here are the slides for the talk I just gave at JavaDay Kiev about the modern data architecture and different modern approaches of data processing:
If you’d like to download the slides, you can find them here: Modern Data Architecture – JD Kiev v05
The faster your data warehousing solution runs, the higher would be the business demand related to the speed of new data availability in their reports. Over the last time I’ve seen a number of attempts to build up a cool thing called “online DWH” – a data warehouse that is almost in sync with data sources and has its data marts and reports dynamically updated as new data flows into it. This is a very great and powerful thing, but unfortunately its implementation is not as straightforward as the business wants it to be.
Great news! I have participated in a podcast recorded by Pivotal and published in our official blog. In this podcast I discuss the data architecture in general – how the things started, what was the main driver for its evolution and what we have now as a “modern data architecture”. Come and listen here: http://blog.pivotal.io/pivotal-perspectives/features/discussing-modern-data-architecture
Text transcript of this talk is also available by the same URL
Over the latest time I’ve heard many discussions on this topic. Also this is a very popular question asked by the customers with not much experience in the field of “big data”. In fact, I dislike this buzzword for ambiguity, but this is what the customers are usually coming to us with, so I got to use it.
If we take a look 5 years back, that was the time when Hadoop was not an option for most of the companies, especially for the enterprises that ask for stable and mature platforms. At that very moment the choice was very simple: when your analytical database grow beyond 5-7 terabytes in size you just initiate an MPP migration project and move to one of the proven enterprise MPP solutions. No one heard about the “unstructured” data – if you got to analyze logs just parse them with Perl/Python/Java/C++ and load into you analytical DBMS. And no one heard about high velocity data – simply use traditional OLTP RDBMS for frequent updates and chunk them for insertion into the analytical DWH.