Snowflake or SnowflakeDB is a cloud SaaS database for analytical workloads and batch data ingestion, typically used for building a data warehouse in the cloud. However, it appears to be so cool and shiny that people are getting mad at praising it all around the internet. Seeing that, I could not resist the urge to take a closer look at this technology and poke into some of its pain points. What have also stumbled me at first is the lack of SnowflakeDB criticism in the blogs and message boards, which sounds suspicious given the self-proclaimed customer base of more than 1000 enterprises. So, let’s take a closer look at it.Continue reading
MVCC stands for Multi-Version Concurrency Control. It is the basic transaction isolation idea that stands behind many transactional systems and allows different processes see different version of truth for the same data. Considering DBMS system, when you are running the query that performs “update” of a specific number of records in the table, you should guarantee specific transaction isolation: if you run “select” in parallel with this “update”, you most likely want this “select” to see the data that was in the table before the “update” has started and not the “dirty” data that was created by this “update” (that might be rollbacked as well as committed).
The solution to handle this particular problem is MVCC – you need to store a number of versions for each row of the table that got changed. This data should be stored somehow and somehow maintained. I will discuss a number of approaches to make it.