Snowflake or SnowflakeDB is a cloud SaaS database for analytical workloads and batch data ingestion, typically used for building a data warehouse in the cloud. However, it appears to be so cool and shiny that people are getting mad at praising it all around the internet. Seeing that, I could not resist the urge to take a closer look at this technology and poke into some of its pain points. What have also stumbled me at first is the lack of SnowflakeDB criticism in the blogs and message boards, which sounds suspicious given the self-proclaimed customer base of more than 1000 enterprises. So, let’s take a closer look at it.
Continue readingCategory Archives: DBMS
Apache Spark Future
Everyone around the internet is constantly talking about the bright future of Apache Spark. How cool it is, how innovative it is, how fast it is moving, how big its community is, how big the investments into it are, etc. But what is really hiding behind this enthusiasm of Spark adepts, and what is the real future of Apache Spark?
In this article I show you the real data and real trends, trying to be as agnostic and unbiased as possible. This article is not affiliated with any vendor.
Apache HAWQ: Next Step in MPP
The first blog post of mine is accepted to official Pivotal blog! Feel free to comment and share your opinion on the subject:
MPP vs Hadoop Talk
Today I had a great talk at the Hadoop User Group Ireland meetup in Dublin, and it was an adapted and refactored version of the article on the same subject, MPP vs Hadoop. Here are the slides:
Feel free to comment and share your opinion on this subject
Modern Data Architecture
Here are the slides for the talk I just gave at JavaDay Kiev about the modern data architecture and different modern approaches of data processing:
If you’d like to download the slides, you can find them here: Modern Data Architecture – JD Kiev v05
Cloudera Kudu: Catching a Unicorn
Recently Cloudera announces new storage engine for fast analytics and fast data called Kudu. This is a very interesting piece of code and I couldn’t withstand an attraction of analyzing this technology deeper and going beyond the marketing.
The Story of Online Data Warehouse
The faster your data warehousing solution runs, the higher would be the business demand related to the speed of new data availability in their reports. Over the last time I’ve seen a number of attempts to build up a cool thing called “online DWH” – a data warehouse that is almost in sync with data sources and has its data marts and reports dynamically updated as new data flows into it. This is a very great and powerful thing, but unfortunately its implementation is not as straightforward as the business wants it to be.