Snowflake Architecture

How is the Snowflake AI Data Cloud put together?

The biggest area on the Snowflake SnowPro Core certification is entitled Snowflake AI Data Cloud Features & Architecture. This area covers a full 25% of the marks available, so if you can do well here you're a third of the way to passing the exam.

This means you really need to do your research on how Snowflake is put together. The first place to start is Snowflake's explainer on the data cloud, which gives a decent overview. Here are some of the key points:

  • Optimized storage - supports structured, semi-structured and unstructured data
  • Data can be stored at near-infinite scale
  • Security and governance controls - Snowflake has multiple features which allow organisations to secure sensitive data
  • Operates across regions and clouds - uses Snowgrid, a cross-cloud technology layer which connects various business ecosystems
  • Separate layers - Snowflake's architecture consists of three layers: storage, compute and cloud services
  • Elastic multi-cluster compute - clusters are added and removed automatically as demand changes
  • Multiple language support - Python, SQL, Java and Scala
  • Integration with various applications, e.g. Tableau and Power BI
  • Integration with customers' own cloud environments across AWS, Azure and GCP
  • Only pay for the compute and storage you actually use
  • Fully managed service

What about the technical architecture?

Snowflake describes itself as a data platform, rather than a DBMS or database. In truth, this is no different to systems like SQL Server or Oracle. For instance, many people think SQL Server is just a database. But it has an ETL component (SSIS), an analysis component (SSAS) and a reporting platform (SSRS), in addition to other bits and pieces. So, you could argue Snowflake is very similar to existing platforms in that way. The big difference with Snowflake is that it was built as a cloud-first system, so absolutely nothing is kept on-premises.

Let's list some important elements of Snowflake's architecture:

  • Uses a completely new SQL query engine, designed from the ground up
  • The engine is essentially a relational database engine
  • Completely cloud-based
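  • Uses a mix of traditional shared-disk and shared-nothing database architectures. In shared-disk, each node has its own distinct memory, but the nodes share the same disks. In shared-nothing, the nodes have distinct memory and distinct disks. Snowflake resembles shared-disk in that all compute nodes read from one central data repository, and resembles shared-nothing in that each node in a cluster works on its own portion of the data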
  • All maintenance is handled by Snowflake - no need to manage software upgrades or installations
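
To make the shared-disk/shared-nothing mix more concrete, here is a toy Python sketch. It is not how Snowflake is implemented internally. It simply assumes one central store that every node can read (the shared-disk part) and worker nodes that each process their own slice of the data using their own local cache (the shared-nothing part). The names CENTRAL_STORAGE, ComputeNode and run_query are made up for illustration.

```python
from dataclasses import dataclass, field

# Shared-disk part: a single central repository that every node can read.
# (Hypothetical data: partition id -> list of values.)
CENTRAL_STORAGE = {
    "part_0": [10, 20, 30],
    "part_1": [40, 50],
    "part_2": [60, 70, 80, 90],
}


@dataclass
class ComputeNode:
    """A worker with its own private cache (the shared-nothing part)."""
    name: str
    local_cache: dict = field(default_factory=dict)

    def scan(self, partition_ids):
        """Sum the values in the partitions assigned to this node."""
        total = 0
        for pid in partition_ids:
            # Pull the partition from central storage into this node's cache.
            if pid not in self.local_cache:
                self.local_cache[pid] = CENTRAL_STORAGE[pid]
            total += sum(self.local_cache[pid])
        return total


def run_query(nodes):
    """Give each node a disjoint slice of partitions, then merge the results."""
    partition_ids = sorted(CENTRAL_STORAGE)
    partials = []
    for i, node in enumerate(nodes):
        my_slice = partition_ids[i::len(nodes)]  # round-robin assignment
        partials.append(node.scan(my_slice))
    return sum(partials)


nodes = [ComputeNode("node_a"), ComputeNode("node_b")]
total = run_query(nodes)
print(total)
```

Every node can reach any partition in the central store, but during a query each node only touches the slice it was given. That is the trade-off the hybrid design is aiming for: simple, centralised data management with parallel processing on top.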

The three layers of Snowflake

  • Database storage
    Snowflake stores data in a compressed columnar format, which it optimizes as the data is loaded. Tables are also divided into micro-partitions, which is a key concept you need to understand. Each micro-partition holds a group of rows, stored column by column, and Snowflake keeps metadata on the range of values each one contains. This lets a query skip any micro-partition that cannot hold matching rows.
     
  • Query processing
    Query processing is performed by "virtual warehouses". A virtual warehouse is a compute cluster, consisting of multiple nodes. When you set up Snowflake, you choose a cloud provider, e.g. GCP. Snowflake allocates compute nodes from your chosen cloud provider.
     
  • Cloud services
    This covers the remaining services, which coordinate your activities within Snowflake. This includes things like authentication, access control, metadata management, and query parsing and optimization.
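
The idea behind micro-partitions is easier to see with a small example. The Python sketch below is a simplified model, not Snowflake's real storage format. It assumes each micro-partition records the minimum and maximum value of a date column, and that a query uses those ranges to decide which partitions to read. MicroPartition and prune are hypothetical names, and the data is invented.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    """Toy micro-partition: some rows plus min/max metadata for one column."""
    name: str
    min_date: str  # ISO-format dates compare correctly as plain strings
    max_date: str
    rows: list     # (order_date, amount) tuples


def prune(partitions, lo, hi):
    """Keep only partitions whose [min_date, max_date] overlaps [lo, hi]."""
    return [p for p in partitions if p.max_date >= lo and p.min_date <= hi]


partitions = [
    MicroPartition("mp1", "2024-01-01", "2024-01-31", [("2024-01-15", 100)]),
    MicroPartition("mp2", "2024-02-01", "2024-02-29", [("2024-02-10", 250)]),
    MicroPartition("mp3", "2024-03-01", "2024-03-31", [("2024-03-05", 75)]),
]

# Query: total amount for February 2024.
lo, hi = "2024-02-01", "2024-02-29"
candidates = prune(partitions, lo, hi)   # decided from metadata only
scanned = [p.name for p in candidates]
total = sum(
    amount
    for p in candidates
    for (day, amount) in p.rows
    if lo <= day <= hi
)
print(scanned, total)
```

Only the partition whose date range overlaps the filter is read. The other two are ruled out from their metadata alone, which is why well-organised micro-partitions make queries cheaper.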
