I have just read the “Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics” paper and decided to write a short blog post going through some of the key moments of the paper’s motivation. Let’s start.
Yesterday my blog has got the 100th subscriber. To commemorate this, I prepared the post on the major industry trends happening in the field of “data”. I might miss something, so feel free to comment and extend the article with your opinion!
Big data is falling down the hype curve
Even though Gartner has removed “Big Data” from the last year’s hype diagram, it does not mean it suddenly moved from the peak of the “hype” to the plateau of adoption. Here is how the hype cycle look like:
The first blog post of mine is accepted to official Pivotal blog! Feel free to comment and share your opinion on the subject:
Here is the video of my talk on Modern Data Architecture from Java Day Kiev 2015
The slides are available here: Modern Data Architecture – JD Kiev v05
Open source data community has been rapidly growing over the last 10 years. You can feel this by the emerge of projects like Apache Hadoop, Apache Spark and the likes. It is growing this fast that there is almost no chance of keeping up with its growth without constantly monitoring the related events, announcements and other changes. 10 years ago it was enough to know “just Oracle” or “just MySQL” to make a successful career in data. Now the things has greatly changed, and if you cannot answer questions like “what is the difference between MapReduce and Spark?” and “when would you prefer to use Flink over Storm?” at your job interview you are screwed.
Also, what would be the “next big thing” in data?
This is the talk I made on Java Day Kiev 2015. It was a great conference after all
The question regarding running Hadoop on a remote storage rises again and again by many independent developers, enterprise users and vendors. And there are still many discussions in community, with completely opposite opinions. I’d like to state here my personal view on this complex problem.
Today I had a great talk at the Hadoop User Group Ireland meetup in Dublin, and it was an adapted and refactored version of the article on the same subject, MPP vs Hadoop. Here are the slides:
Feel free to comment and share your opinion on this subject
Finally I have translated my talk from Highload++ 2015 conference in Moscow into English, so now you can enjoy the fresh information about the Apache HAWQ internals!
If you’d like to download the slides, you can find them here: HAWQ Architecture HL++ 2015 Moscow