diff --git a/_config.yml b/_config.yml
index c233c43..3f370c7 100644
--- a/_config.yml
+++ b/_config.yml
@@ -5,7 +5,7 @@ author: emma marshall
 execute:
   execute_notebooks: 'off'
   # allow_errors: true
-  timeout: 500
+  timeout: 1000
 
 # Add a bibtex file so that we can create citations
 bibtex_bibfiles:
diff --git a/asf_local_vrt.ipynb b/asf_local_vrt.ipynb
index d07d786..6ac9918 100644
--- a/asf_local_vrt.ipynb
+++ b/asf_local_vrt.ipynb
@@ -2848,7 +2848,7 @@
    "source": [
     "### Taking a look at chunking\n",
     "\n",
-    "If you take a look at the chunking you will see that the entire object has a shape `(103, 13379, 17452)` and that each chunk is `(1, 5760, 5760)`. This breaks the full array (~ 89 GB) into 1,236 chunks that are about 127 MB each. We can also see that chunking keeps each time step intact which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [appendix]. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using `xr.open_mfdataset()` led us to use the VRT approach in this notebook but `xr.open_mfdataset()` is still a very useful tool if your data is a good fit. \n",
+    "If you take a look at the chunking you will see that the entire object has a shape `(103, 13379, 17452)` and that each chunk is `(1, 5760, 5760)`. This breaks the full array (~ 89 GB) into 1,236 chunks that are about 127 MB each. We can also see that chunking keeps each time step intact, which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in [asf_local_mf.ipynb]. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using `xr.open_mfdataset()` led us to use the VRT approach in this notebook, but `xr.open_mfdataset()` is still a very useful tool if your data is a good fit. \n",
     "\n",
     "Chunking is an important aspect of how dask works. You want the chunking strategy to match the structure of the data (ie. internal tiling of the data, if your data is stored locally you want chunks to match the storage structure) without having too many chunks (this will cause unnecessary communication among workers) or too few chunks (this will lead to large chunk sizes and slower processing). There are helpful explanations [here](https://docs.dask.org/en/stable/array-best-practices.html#select-a-good-chunk-size) and [here](https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes).\n",
     "When chunking is set to `auto` (the case here), the optimal chunk size will be selected for each dimension (if specified individually) or all dimensions. Read more about chunking [here](https://docs.dask.org/en/stable/array-chunks.html)."