
Commit

Merge branch 'main' into delta-vs-orc
MrPowers authored Sep 20, 2024
2 parents 114714c + 37ad9a1 commit bc3640d
Showing 3 changed files with 4 additions and 2 deletions.
Binary file added src/blog/delta-lake-vs-data-lake/image5.png
2 changes: 2 additions & 0 deletions src/blog/delta-lake-vs-data-lake/index.mdx
@@ -134,6 +134,8 @@ To read your data from a Parquet data lake, you will first have to list all the

Delta Lake stores the paths to all of the underlying Parquet files in the transaction log. The transaction log is a separate file, so fetching these paths doesn’t require an expensive file listing operation. The more files you have, the faster it will be to read your data with Delta Lake compared to regular Parquet files.

![](image5.png)
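
A rough sketch of the difference, using the `deltalake` Python package; the directory paths below are hypothetical:

```python
import glob

from deltalake import DeltaTable

# Plain Parquet data lake: the engine must list every file under the
# directory before it can plan the read, which is slow on object stores.
parquet_files = glob.glob("/data/my_parquet_lake/**/*.parquet", recursive=True)

# Delta Lake: the file paths are read straight out of the transaction
# log (_delta_log), so no directory listing is needed.
dt = DeltaTable("/data/my_delta_table")
delta_files = dt.files()
```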

### Delta Lake vs Data Lake: Metadata

Regular Parquet files store metadata about column values in the footer of each file. This metadata contains min/max values of the columns per row group. This means that when you want to read the metadata of your data lake, you will have to read the metadata from each individual Parquet file. This requires fetching each file and grabbing the footer metadata, which is slow when you have lots of Parquet files.
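
As an illustration, here is roughly what collecting those footer statistics looks like with `pyarrow`, assuming a hypothetical directory of Parquet files:

```python
import glob

import pyarrow.parquet as pq

# Every Parquet file's footer has to be fetched and parsed individually
# to collect the min/max statistics for its row groups.
for path in glob.glob("/data/my_parquet_lake/*.parquet"):
    metadata = pq.ParquetFile(path).metadata
    for rg in range(metadata.num_row_groups):
        stats = metadata.row_group(rg).column(0).statistics
        if stats is not None and stats.has_min_max:
            print(path, stats.min, stats.max)
```
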
4 changes: 2 additions & 2 deletions src/blog/delta-lake-vs-parquet-comparison/index.mdx
@@ -192,7 +192,7 @@ Delta Lake allows for schema evolution so you can seamlessly add new columns to

Suppose you append a DataFrame to a Parquet table with a mismatched schema. In that case, you must remember to set a specific option every time you read the table to ensure accurate results. Query engines usually take shortcuts when determining the schema of a Parquet table. They look at the schema of one file and just assume that all the other files have the same schema.

- The engine can consults the schema of all the files in a Parquet table when determining the schema of the overall table when you manually set a flag. Checking the schema of all the files is more computationally expensive, so it isn’t set by default. Delta Lake schema evolution is better than what’s offered by Parquet.
+ The engine consults the schema of all the files in a Parquet table when determining the schema of the overall table when you manually set a flag. Checking the schema of all the files is more computationally expensive, so it isn’t set by default. Delta Lake schema evolution is better than what’s offered by Parquet.
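
A hedged sketch of both sides, assuming an active SparkSession named `spark`, a DataFrame `new_df` with extra columns, and hypothetical table paths:

```python
# Plain Parquet: by default the engine infers the schema from a single
# file. To have it reconcile the schemas of every file, you must set a
# flag explicitly, e.g. Spark's mergeSchema read option.
df = (
    spark.read.option("mergeSchema", "true")
    .parquet("/data/my_parquet_lake")
)

# Delta Lake: schema evolution happens at write time, so readers always
# get one consistent schema from the transaction log.
(
    new_df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("/data/my_delta_table")
)
```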

## Delta Lake vs. Parquet: check constraints

@@ -210,7 +210,7 @@ Versioned data also impacts how engines execute certain transactions. For exampl

Parquet tables don’t support versioned data. When you remove data from a Parquet table, you actually delete it from storage, which is referred to as a “physical delete”.

- Logical data operations are better because they are safer and allow for mistakes to be reversed. If you overwrite a Parquet table, it is an irreversible error (unless there is a separate mechanism backing up the data). It’s easy to undo an overwrite tranaction in a Delta table.
+ Logical data operations are better because they are safer and allow for mistakes to be reversed. If you overwrite a Parquet table, it is an irreversible error (unless there is a separate mechanism backing up the data). It’s easy to undo an overwrite transaction in a Delta table.

See this blog post on [Why PySpark append and overwrite operations are safer in Delta Lake than Parquet tables](https://delta.io/blog/2022-11-01-pyspark-save-mode-append-overwrite-error/) to learn more.
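
For illustration, here is roughly how an accidental overwrite can be rolled back with Delta Lake's Python API; the table path and version number are hypothetical:

```python
from delta.tables import DeltaTable

# Assumes an active SparkSession named `spark` and a Delta table at a
# hypothetical path that was just overwritten by mistake.
dt = DeltaTable.forPath(spark, "/data/my_delta_table")

# Roll the table back to the version that existed before the overwrite.
dt.restoreToVersion(1)

# Or read the earlier version directly without modifying the table.
old_df = (
    spark.read.format("delta")
    .option("versionAsOf", 1)
    .load("/data/my_delta_table")
)
```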

