Merge branch 'main' into development

MannLabs · Dec 15, 2024 · f7cf794
2 parents a94b8cb + de25590
commit f7cf794
Showing 6 changed files with 46 additions and 30 deletions.
diff --git a/docs/index.rst b/docs/index.rst
@@ -1,4 +1,4 @@
-scPortrait - image-based single cell analysis at scale in Python
+scPortrait – image-based single cell analysis at scale in Python
 ================================================================
 
 scPortrait is a scalable toolkit to analyse single-cell image datasets. This Python implementation efficiently segments individual cells, generates single-cell datasets and provides tools for the efficient deep learning classification of their phenotypes for downstream applications.

diff --git a/docs/pages/notebooks/example_scPortrait_project.ipynb b/docs/pages/notebooks/example_scPortrait_project.ipynb
@@ -5,17 +5,17 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# A walk through the scPortrait Ecosystem\n",
+    "# A Walk Through The scPortrait Ecosystem\n",
     "\n",
     "This notebook will introduce you to the scPortrait ecosystem and give you a complete working example of how to use scPortrait. It will walk you through the following steps of the scPortrait workflow:\n",
     "\n",
-    "1. **segmentation**: Generates masks for the segmentation of input images into individual cells\n",
+    "1. **Segmentation**: Generates masks for the segmentation of input images into individual cells\n",
     "\n",
-    "2. **extraction**: The segmentation masks are applied to extract single-cell images for all cells in the input images. Images of individual cells are rescaled to [0, 1] per channel.\n",
+    "2. **Extraction**: The segmentation masks are applied to extract single-cell images for all cells in the input images. Images of individual cells are rescaled to [0, 1] per channel.\n",
     "\n",
-    "3. **featurization**: The image-based phenotype of each individual cell in the extracted single-cell dataset is featurized using the specified featurization method. Multiple featurization runs can be performed on the same dataset using different methods. Here we utilize the pretrained binary classifier from the original [SPARCS manuscript](https://doi.org/10.1101/2023.06.01.542416) that identifies individual cells defective in a biological process called \"autophagy\". \n",
+    "3. **Featurization**: The image-based phenotype of each individual cell in the extracted single-cell dataset is featurized using the specified featurization method. Multiple featurization runs can be performed on the same dataset using different methods. Here we utilize the pretrained binary classifier from the original [SPARCS manuscript](https://doi.org/10.1101/2023.06.01.542416) that identifies individual cells defective in a biological process called \"autophagy\". \n",
     "\n",
-    "4. **selection**: Cutting instructions for the isolation of selected individual cells by laser microdissection are generated. The cutting shapes are written to an ``.xml`` file which can be loaded on a leica LMD microscope for automated cell excision.\n",
+    "4. **Selection**: Cutting instructions for the isolation of selected individual cells by laser microdissection are generated. The cutting shapes are written to an ``.xml`` file which can be loaded on a Leica LMD7 microscope for automated cell excision.\n",
     "\n",
     "The data used in this notebook was previously stitched using the stitching workflow in [SPARCStools](https://github.com/MannLabs/SPARCStools). Please see the notebook [here](https://mannlabs.github.io/SPARCStools/html/pages/notebooks/example_stitching_notebook.html)."
    ]
@@ -768,14 +768,19 @@
     "fig.tight_layout()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively, you can visualize the input images, as well as all other objects saved in a spatialdata object."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# alternatively you can also visualize the input images as well as all other objects saved in spatialdata object\n",
-    "\n",
     "project.view_sdata()"
    ]
   },
@@ -1048,7 +1053,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Looking at Segmentation Results\n",
+    "### Inspecting Segmentation Results\n",
     "\n",
     "The segmentation results are written to an HDF5 file called `segmentation.h5`, located in the segmentation directory of our scPortrait project.\n",
     "\n",
@@ -1177,7 +1182,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Extracting single-cell images\n",
+    "## Extracting single cell images\n",
     "\n",
     "Once we have generated a segmentation mask, the next step is to extract single-cell images of segmented cells in the dataset.\n",
     "\n",
@@ -1289,7 +1294,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Look at extracted single-cell images\n",
+    "### Look at extracted single cell images\n",
     "\n",
     "The extracted single-cell images are written to an HDF5 file `single_cells.h5` located under `extraction\\data` within the project folder.\n",
     "\n",
@@ -1377,7 +1382,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Classification of extracted single-cells\n",
+    "## Classification of extracted single cells\n",
     "\n",
     "Next, we can apply a pretrained model to classify our cells within the scPortrait project. \n",
     "\n",
@@ -1458,7 +1463,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### looking at the generated results\n",
+    "### Looking at the generated results\n",
     "\n",
     "The results are written to a csv file which we can load with pandas.\n",
     "\n",
@@ -1651,7 +1656,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Exporting Cutting contours for excision on the LMD7\n",
+    "## Exporting Cutting contours for excision on a Leica LMD7\n",
     "\n",
     "scPortrait directly interfaces with our other open-source python library [py-lmd](https://github.com/MannLabs/py-lmd) to easily select and export cells for excision on a Leica LMD microscope. \n",
     "\n",

diff --git a/docs/pages/tools/parsing/example_parsing_notebook.ipynb b/docs/pages/tools/parsing/example_parsing_notebook.ipynb
@@ -5,7 +5,7 @@
    "id": "6fc618c5",
    "metadata": {},
    "source": [
-    "# Example Parsing Notebook to rename phenix experiments"
+    "# Example Notebook to parse and rename files from experiments imaged on an Opera Phenix microscope"
    ]
   },
   {

diff --git a/docs/pages/tools/stitching/example_stitching_notebook.ipynb b/docs/pages/tools/stitching/example_stitching_notebook.ipynb
@@ -8,6 +8,13 @@
     "# Example Stitching Notebook"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "scPortrait uses [Ashlar](https://labsyspharm.github.io/ashlar/) for stitching images. When stitching from `.tif` files, Ashlar reads channel and tile position information from filenames according to a predefined `pattern`. Hence, filenames matter when stitching from `.tif` files."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -45,7 +52,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### initializing the stitcher object"
+    "### Initializing the `Stitcher` object"
    ]
   },
   {
@@ -66,12 +73,12 @@
     "slidename = \"stitching_test\"\n",
     "outdir = os.path.join(str(input_dir).replace(\"stitching_example\", \"example_projects/stitching\"), slidename)\n",
     "\n",
-    "row = str(2).zfill(2)  # specify the row of the well you want to stitch\n",
-    "well = str(4).zfill(2)  # specifc the well number you wish to stitch\n",
+    "row = str(2).zfill(2)  # specify the row of the well you want to stitch, here = 2\n",
+    "well = str(4).zfill(2)  # specify the well number you wish to stitch, here = 4\n",
     "zstack_value = str(1).zfill(\n",
     "    3\n",
     ")  # specify the zstack you want to stitch. for multiple zstacks please make a loop and iterate through each of them.\n",
-    "timepoint = str(1).zfill(3)  # specifz the timepoint you wish to stitch\n",
+    "timepoint = str(1).zfill(3)  # specify the timepoint you wish to stitch\n",
     "\n",
     "pattern = f\"Timepoint{timepoint}_Row{row}_Well{well}_{{channel}}_zstack{zstack_value}_r{{row:03}}_c{{col:03}}.tif\"\n",
     "\n",
@@ -487,14 +494,14 @@
    "source": [
     "## Multi-threaded Stitching\n",
     "\n",
-    "Using the ParallelStitcher class stitching can be speed up by using multiple threads. The code to perform stitching remains more or less the same."
+    "The `ParallelStitcher` class can speed up stitching by using multiple threads. The code to start stitching remains the same, but `ParallelStitcher` takes an additional argument `threads`, specifying the number of parallel threads to use."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### initializing the stitcher object"
+    "### Initializing the `ParallelStitcher` object"
    ]
   },
   {
@@ -516,12 +523,12 @@
     "outdir_parallel = os.path.join(str(input_dir).replace(\"stitching_example\", \"example_projects/stitching\"), slidename)\n",
     "\n",
     "\n",
-    "row = str(2).zfill(2)  # specify the row of the well you want to stitch\n",
-    "well = str(4).zfill(2)  # specifc the well number you wish to stitch\n",
+    "row = str(2).zfill(2)  # specify the row of the well you want to stitch, here = 2\n",
+    "well = str(4).zfill(2)  # specify the well number you wish to stitch, here = 4\n",
     "zstack_value = str(1).zfill(\n",
     "    3\n",
     ")  # specify the zstack you want to stitch. for multiple zstacks please make a loop and iterate through each of them.\n",
-    "timepoint = str(1).zfill(3)  # specifz the timepoint you wish to stitch\n",
+    "timepoint = str(1).zfill(3)  # specify the timepoint you wish to stitch\n",
     "\n",
     "pattern = f\"Timepoint{timepoint}_Row{row}_Well{well}_{{channel}}_zstack{zstack_value}_r{{row:03}}_c{{col:03}}.tif\"\n",
     "\n",

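The `pattern` string in the stitching notebook mixes two kinds of braces: single braces are filled in immediately by the f-string, while doubled braces survive as literal `{channel}`, `{row:03}` and `{col:03}` placeholders. The sketch below only illustrates that string mechanics with plain `str.format`. The channel name `DAPI` and the tile indices are hypothetical, and how Ashlar itself consumes the pattern is not shown in this diff.

```python
# Values copied from the notebook cell: zero-padded identifiers as strings.
row = str(2).zfill(2)           # "02"
well = str(4).zfill(2)          # "04"
zstack_value = str(1).zfill(3)  # "001"
timepoint = str(1).zfill(3)     # "001"

# Single braces are substituted now; doubled braces stay as placeholders.
pattern = f"Timepoint{timepoint}_Row{row}_Well{well}_{{channel}}_zstack{zstack_value}_r{{row:03}}_c{{col:03}}.tif"

# Hypothetical tile: channel "DAPI" at tile row 0, tile column 1.
# Note that this `row` is the tile row placeholder, not the well row above.
example_filename = pattern.format(channel="DAPI", row=0, col=1)

print(pattern)
print(example_filename)
```

Because the well row and the tile row share the name `row`, only the brace doubling separates them, so a filename that does not follow this layout exactly may simply fail to match.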
diff --git a/src/scportrait/io/daskmmap.py b/src/scportrait/io/daskmmap.py
@@ -1,3 +1,6 @@
+import warnings
+
+warnings.filterwarnings("ignore", message=".*`dataframe.query-planning`.*")
 import dask
 import dask.array as da
 import h5py
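The filter added above is installed before `import dask`, presumably so that it is already active when a warning could be raised at import time. As a minimal standard-library sketch of how such a message filter behaves: the warning texts below are made up for illustration and are not the exact messages Dask emits.

```python
import warnings


def emit_warnings():
    # Hypothetical warning texts, used only to exercise the filter.
    warnings.warn("Please review the `dataframe.query-planning` option", FutureWarning)
    warnings.warn("an unrelated warning", UserWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything by default
    # Same filter as in the diff; the message is a regex matched from the
    # start of the warning text, hence the leading ".*".
    warnings.filterwarnings("ignore", message=".*`dataframe.query-planning`.*")
    emit_warnings()

messages = [str(w.message) for w in caught]
print(messages)
```

Only the unrelated warning is recorded, because the most recently added filter is checked first and ignores the matching message.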

diff --git a/src/scportrait/tools/ml/pretrained_models.py b/src/scportrait/tools/ml/pretrained_models.py
@@ -48,17 +48,18 @@ def get_data_dir() -> Path:
         Path to data directory
     """
 
-    def find_root_by_file(marker_file: str, current_path: Path) -> Path | None:
+    def find_root_by_folder(marker_folder: str, current_path: Path) -> Path | None:
         for parent in current_path.parents:
-            if (parent / marker_file).exists():
+            if (parent / marker_folder).is_dir():
                 return parent
         return None
 
-    src_code_dir = find_root_by_file("README.md", Path(__file__))
-    if src_code_dir is None:
-        raise FileNotFoundError("Could not find scPortrait root directory")
+    src_code_dir = find_root_by_folder("io", Path(__file__))
 
+    if src_code_dir is None:
+        raise FileNotFoundError("Could not find scPortrait source directory")
     data_dir = src_code_dir / "scportrait_data"
+
     return data_dir.absolute()
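The lookup now walks up from the module file until it finds a parent that contains a directory named `io`, instead of searching for a `README.md`. A self-contained sketch of that behaviour follows, using a temporary directory tree that only imitates the package layout. The layout and the `no_such_folder` marker are assumptions made for the example.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_root_by_folder(marker_folder: str, current_path: Path) -> Path | None:
    # Return the nearest ancestor of current_path that contains marker_folder.
    for parent in current_path.parents:
        if (parent / marker_folder).is_dir():
            return parent
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout: <tmp>/src/scportrait/{io, tools/ml/pretrained_models.py}
    package_dir = Path(tmp) / "src" / "scportrait"
    (package_dir / "io").mkdir(parents=True)
    module_file = package_dir / "tools" / "ml" / "pretrained_models.py"
    module_file.parent.mkdir(parents=True)
    module_file.touch()

    found = find_root_by_folder("io", module_file)
    found_matches = found == package_dir
    missing = find_root_by_folder("no_such_folder", module_file)

print(found_matches, missing)
```

The search stops at the nearest ancestor that has an `io` directory, so an unrelated `io` folder closer to the module would be picked up first.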