Commit
fix link and config execution limit
e-marshall committed Jan 29, 2024
1 parent 05c2bcc commit 4b85e8a
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion _config.yml
@@ -5,7 +5,7 @@ author: emma marshall
 execute:
   execute_notebooks: 'off' #
   allow_errors: true
-  timeout: 500
+  timeout: 1000
 
 # Add a bibtex file so that we can create citations
 bibtex_bibfiles:
2 changes: 1 addition & 1 deletion asf_local_vrt.ipynb
@@ -2848,7 +2848,7 @@
 "source": [
 "### Taking a look at chunking\n",
 "\n",
-"If you take a look at the chunking you will see that the entire object has a shape `(103, 13379, 17452)` and that each chunk is `(1, 5760, 5760)`. This breaks the full array (~ 89 GB) into 1,236 chunks that are about 127 MB each. We can also see that chunking keeps each time step intact which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [appendix]. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using `xr.open_mfdataset()` led us to use the VRT approach in this notebook but `xr.open_mfdataset()` is still a very useful tool if your data is a good fit. \n",
+"If you take a look at the chunking you will see that the entire object has a shape `(103, 13379, 17452)` and that each chunk is `(1, 5760, 5760)`. This breaks the full array (~ 89 GB) into 1,236 chunks that are about 127 MB each. We can also see that chunking keeps each time step intact which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [asf_local_mf.ipynb]. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using `xr.open_mfdataset()` led us to use the VRT approach in this notebook but `xr.open_mfdataset()` is still a very useful tool if your data is a good fit. \n",
 "\n",
 "Chunking is an important aspect of how dask works. You want the chunking strategy to match the structure of the data (ie. internal tiling of the data, if your data is stored locally you want chunks to match the storage structure) without having too many chunks (this will cause unnecessary communication among workers) or too few chunks (this will lead to large chunk sizes and slower processing). There are helpful explanations [here](https://docs.dask.org/en/stable/array-best-practices.html#select-a-good-chunk-size) and [here](https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes).\n",
 "When chunking is set to `auto` (the case here), the optimal chunk size will be selected for each dimension (if specified individually) or all dimensions. Read more about chunking [here](https://docs.dask.org/en/stable/array-chunks.html)."
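For reference (not part of this commit), a minimal sketch of how the chunking described in the edited cell might be inspected, assuming the time series stack is opened from a VRT with rioxarray and automatic dask chunking; the VRT file name here is hypothetical:

import rioxarray as rxr

# Hypothetical VRT built from the local ASF scenes (name is illustrative only)
da = rxr.open_rasterio("asf_timeseries.vrt", chunks=True)  # let dask choose chunk sizes

print(da.shape)               # full array shape, e.g. (103, 13379, 17452)
print(da.chunks)              # chunk sizes along each dimension
print(da.nbytes / 1e9, "GB")  # approximate in-memory size of the full array

With chunks of (1, 5760, 5760), each time step stays in its own chunk, matching the ~127 MB chunk size discussed in the cell above.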
