diff --git a/Snowflake.md b/Snowflake.md
new file mode 100644
index 00000000..d794fb04
--- /dev/null
+++ b/Snowflake.md
@@ -0,0 +1,53 @@
+# Snowflake
+
+**Introduction**
+
+Snowflake’s founders started from scratch and built a data platform that would harness the immense power of the cloud. But their vision didn’t stop there. They engineered Snowflake to power the Data Cloud, where thousands of organizations have seamless access to explore, share, and unlock the true value of their data.
+
+With the shift to digital automation, businesses are rapidly adjusting the ways they use data to automate processes and create innovative products and services. The Snowflake Data Cloud gives businesses a way of uniting data to make it easily discoverable, securely shareable, and available for diverse analytic workloads. But a key challenge remains in validating the reliability of that data so that it is useful, relevant, and timely for the teams who need it, when they need it.
+
+**Project Summary**
+|||
+|-|-|
+| Website | https://www.soda.io/snowflake |
+| Foundation Name | Snowflake |
+| Company | Snowflake Inc. |
+| Open/Proprietary | Open |
+| Source Code | https://github.com/snowflakedb |
+| Brief description | The Snowflake Data Cloud gives businesses a way of uniting data to make it easily discoverable, securely shareable, and available for diverse analytic workloads. |
+
+
+
+**Project Details**
+
+**Key Features**
+
+· Execute dataset and record-level quality checks.
+
+· Create and track data quality incidents to streamline issue resolution and decrease data downtime.
+
+· Access health dashboards to get at-a-glance visibility into dataset health.
+
+· Establish data quality agreements to ensure that data is, and remains, fit-for-purpose.
+
+· Reduce the risk and cost of migrating data from its source to the Data Cloud.
+
+**Architecture**
+
+As the sources and types of data that businesses accumulate grow in volume and complexity (first-party data generated in-house, second-party data produced from collaborations, third-party data acquired externally), uniting and sharing good-quality data lets businesses maximize its value.
+![Architecture overview](https://docs.snowflake.com/en/_images/architecture-overview.png)
+
+**Current Usage**
+
+Using Soda, customers and partners on Snowflake gain a complete data quality workflow across the entire Data Cloud, from data ingestion to consumption. With these workflows, data teams can make trusted data available 24/7, and get ahead of data issues before they have a downstream impact on the business.
+
+### Comparison
+
+**Redshift vs. Snowflake: Which warehouse makes sense for you?**
+
+Further comparison between these two data warehouse solutions illustrates how they're suited for different needs:
+
+- **Features: bundled or not?** Redshift bundles compute and storage to provide the immediate potential to scale to an enterprise-level data warehouse. But by splitting computation and storage and offering tiered editions, Snowflake provides businesses the flexibility to purchase only the features they need while preserving the potential to scale.
+- **JSON: dealbreaker or no big deal?** When it comes to JSON storage, Snowflake's support is decidedly more robust than Redshift's. This means that with Snowflake you can store and query JSON with native, built-in functions. When JSON is loaded into Redshift, it's split into strings, which makes it harder to work with and query.
+- **Security: everything you could ever need, or only what your business needs?** Redshift includes a deep bench of customizable encryption solutions, but Snowflake provides security and compliance features oriented to its specific editions so that you have the level of protection most relevant to your data strategy.
+- **Data duties: automated or hands-on?** Redshift requires more hands-on maintenance for a greater range of tasks that can't be automated, such as data vacuuming and compression. Snowflake has the advantage in this regard: it automates more of these tasks, saving significant time otherwise spent diagnosing and resolving issues.