Snowflake or SnowflakeDB is a cloud SaaS database for analytical workloads and batch data ingestion, typically used for building a data warehouse in the cloud. However, it appears to be so cool and shiny that people are getting mad at praising it all around the internet. Seeing that, I could not resist the urge to take a closer look at this technology and poke into some of its pain points. What have also stumbled me at first is the lack of SnowflakeDB criticism in the blogs and message boards, which sounds suspicious given the self-proclaimed customer base of more than 1000 enterprises. So, let’s take a closer look at it.Continue reading
Everyone around the internet is constantly talking about the bright future of Apache Spark. How cool it is, how innovative it is, how fast it is moving, how big its community is, how big the investments into it are, etc. But what is really hiding behind this enthusiasm of Spark adepts, and what is the real future of Apache Spark?
In this article I show you the real data and real trends, trying to be as agnostic and unbiased as possible. This article is not affiliated with any vendor.
The first blog post of mine is accepted to official Pivotal blog! Feel free to comment and share your opinion on the subject:
Today I had a great talk at the Hadoop User Group Ireland meetup in Dublin, and it was an adapted and refactored version of the article on the same subject, MPP vs Hadoop. Here are the slides:
Feel free to comment and share your opinion on this subject
Here are the slides for the talk I just gave at JavaDay Kiev about the modern data architecture and different modern approaches of data processing:
If you’d like to download the slides, you can find them here: Modern Data Architecture – JD Kiev v05
The faster your data warehousing solution runs, the higher would be the business demand related to the speed of new data availability in their reports. Over the last time I’ve seen a number of attempts to build up a cool thing called “online DWH” – a data warehouse that is almost in sync with data sources and has its data marts and reports dynamically updated as new data flows into it. This is a very great and powerful thing, but unfortunately its implementation is not as straightforward as the business wants it to be.
Recently Databricks announced availability of DataFrames in Spark , which gives you a great opportunity to write even simpler code that would execute faster, especially if you are heavy Python/R user. In this article I would go a bit deeper than the publicly available benchmark results to show you how it really works.
Great news! I have participated in a podcast recorded by Pivotal and published in our official blog. In this podcast I discuss the data architecture in general – how the things started, what was the main driver for its evolution and what we have now as a “modern data architecture”. Come and listen here: http://blog.pivotal.io/pivotal-perspectives/features/discussing-modern-data-architecture
Text transcript of this talk is also available by the same URL
Today I will tell you about the startup called “Splice Machine”. They position themselves as “The Only Hadoop RDBMS”, which is quite bold given the boom we see now in SQL-on-Hadoop solutions field, almost each of the big vendors implemented their own “one and only” solution and claim it to be the best. But let’s take a look at its internals to say whether its design really reflects the marketing slogan they’ve chosen.