Category Archives: Enterprises

Snowflake: The Good, The Bad and The Ugly

Snowflake, or SnowflakeDB, is a cloud SaaS database for analytical workloads and batch data ingestion, typically used for building a data warehouse in the cloud. It appears to be so cool and shiny that people are going crazy praising it all over the internet. Seeing that, I could not resist the urge to take a closer look at this technology and poke at some of its pain points. What also puzzled me at first is the lack of SnowflakeDB criticism on blogs and message boards, which looks suspicious given the self-proclaimed customer base of more than 1000 enterprises. So, let’s take a closer look at it.

Data Industry Trends

Yesterday my blog got its 100th subscriber. To commemorate this, I prepared a post on the major industry trends in the field of “data”. I might have missed something, so feel free to comment and extend the article with your opinion!

Big data is falling down the hype curve

Even though Gartner removed “Big Data” from last year’s hype diagram, this does not mean it suddenly moved from the peak of the hype to the plateau of adoption. Here is what the hype cycle looks like:

[Image: hype curve]

Hadoop on Remote Storage

The question of running Hadoop on remote storage is raised again and again by independent developers, enterprise users and vendors. There are still many discussions in the community, with completely opposite opinions. I’d like to state here my personal view on this complex problem.

[Image: Hadoop elephant balancing on the shared storage ball]

The Story of Online Data Warehouse

The faster your data warehousing solution runs, the higher the business demand for new data to show up quickly in reports. Recently I’ve seen a number of attempts to build a cool thing called an “online DWH” – a data warehouse that is almost in sync with its data sources and has its data marts and reports dynamically updated as new data flows into it. This is a great and powerful thing, but unfortunately its implementation is not as straightforward as the business wants it to be.

Modern Data Architecture Podcast

Great news! I participated in a podcast recorded by Pivotal and published on our official blog. In this podcast I discuss data architecture in general – how things started, what the main driver of its evolution was, and what we have now as a “modern data architecture”. Come and listen here: http://blog.pivotal.io/pivotal-perspectives/features/discussing-modern-data-architecture

A text transcript of this talk is also available at the same URL.

Hadoop vs MPP

Lately I’ve heard many discussions on this topic. It is also a very popular question among customers without much experience in the field of “big data”. In fact, I dislike this buzzword for its ambiguity, but this is what customers usually come to us with, so I have to use it.

If we look 5 years back, that was the time when Hadoop was not an option for most companies, especially for enterprises that ask for stable and mature platforms. At that moment the choice was very simple: when your analytical database grew beyond 5-7 terabytes in size, you just initiated an MPP migration project and moved to one of the proven enterprise MPP solutions. No one had heard about “unstructured” data – if you had to analyze logs, you just parsed them with Perl/Python/Java/C++ and loaded them into your analytical DBMS. And no one had heard about high-velocity data – you simply used a traditional OLTP RDBMS for frequent updates and chunked them for insertion into the analytical DWH.

Hadoop Cluster Backup

While working with enterprise customers, I repeatedly hear the question of Hadoop cluster backup. It is a very reasonable question from the customer’s standpoint: they know that backup is the best way to protect themselves from data loss, and it is a crucial concept for every enterprise. But this question should be treated with care, because when interpreted the wrong way it might lead to huge investments on the customer side that in the end would be completely useless. I will try to highlight the main pitfalls and potential approaches that would allow you to work out the Hadoop backup approach that best fulfills your needs.

Why independent consultancy matters?

The world is biased. You can find many examples of it everywhere around you. I really like the story about the doctor:

I felt sick and went to the doctor. The doctor prescribed me specific pills that would help me get better. And that would have been completely fine, except I noticed that this doctor had a pen, notepad and calendar branded with the same pills he had prescribed me. I never took those pills.

This is a true story that happens everywhere in my home country. The problem is that this kind of thing happens everywhere, including the IT sector.
