New Hydra genomes reveal conserved principles of hydrozoan transcriptional regulation

This repository contains step-by-step descriptions of all analyses and associated code related to the following manuscript:

Cazet JF, Siebert S, Morris Little H, Bertemes P, Primack AS, Ladurner P, Achrainer M, Fredriksen MT, Moreland RT, Singh S, Zhang S, Wolfsberg TG, Schnitzler CE, Baxevanis AD, Simakov O, Hobmayer B, Juliano CE. A chromosome-scale epigenetic map of the Hydra genome reveals conserved regulators of cell state. Genome Res. 2023 Jan 13:gr.277040.122. doi: 10.1101/gr.277040.122. PMID: 36639202.

The manuscript is also accompanied by a genome portal, available here, that allows users to interact with and download the data generated in this study. A BLAST server is available to search for genes of interest in the H. oligactis and strain AEP H. vulgaris gene models. The portal includes an interactive genome browser for visualizing gene models, repetitive regions, ATAC-seq and CUT&Tag peaks, ATAC-seq and CUT&Tag read density, and sequence conservation across the AEP assembly. The website also features an interactive ShinyCell portal for viewing the AEP-aligned Hydra single-cell atlas.

Structure and intent of this repository

This repository is organized around markdown documents that are each focused on one particular computational aspect of the manuscript. Each markdown includes all code used for the analysis in question along with accompanying text that explains the code's purpose and rationale. Each markdown is also accompanied by a folder that contains the original script files used for the analysis as well as files generated by the analysis itself. Descriptions for all files within each folder can be found at the bottom of the accompanying markdown document.

Our intention in generating this repository was to document the methodology we used to produce the results reported in the manuscript in sufficient detail for other researchers to reproduce our findings. However, the code is written in a manner that relies on directory/file structures and software path configurations that are specific to the systems on which the analyses were initially performed. This original file organization is not recapitulated by this repository. In addition, because of file size limitations, we are not able to provide all necessary files for every analysis via GitHub. As such, users will need to modify the paths within each script and download additional files from other sources (described below) for the code to run properly after the repository has been cloned.

Accessing additional necessary files

All files necessary for performing the analyses described in this repository are available through the Hydra vulgaris, strain AEP genome portal. Specifically, we provide complete versions of the folders that accompany each markdown document in this repository, including all files that were too large to host on GitHub. We also provide all sequencing data as well as R binary files containing various versions of the AEP-mapped single-cell RNA-seq atlas formatted as Seurat objects (v4).

Raw sequencing data is also available via NCBI under the BioProject ID PRJNA816482. The strain AEP H. vulgaris genome assembly is hosted on GenBank under the accession JALDPZ010000000 and the H. oligactis assembly is hosted under the accession JALDAD010000000.

A note on naming conventions

When preparing the new genome portal, modifications were made to the naming conventions used for genome contigs/scaffolds and gene/transcript models. Because these changes were done after all analyses for the manuscript had already been completed, the code in this repository is based around a different naming convention than the one used for the genome portal.

The scaffold naming convention used during assembly and annotation of the H. vulgaris, strain AEP genome consists of the prefix 'chr-' (for chromosome) followed by a number. For example, chr-1 refers to the scaffold 'chromosome 1'. Gene models were named using the format HVAEP1_G######, with 'HVAEP1' indicating the genome version (i.e., H. vulgaris, strain AEP, version 1), 'G' indicating that the identifier refers to a gene, and '######' being a unique zero-padded numeric ID for a particular gene model. For example, the gene name for wnt3 is HVAEP1_G010730. Genes are numbered according to their order in the genome, such that HVAEP1_G010729 is the gene immediately upstream of HVAEP1_G010730 and HVAEP1_G010731 is the gene immediately downstream of HVAEP1_G010730. The transcript naming convention uses the format HVAEP1_T######.#, with 'HVAEP1' again indicating the genome version, 'T' indicating that the identifier refers to a transcript, '######' being the same unique numeric ID as the parent gene, and '.#' indicating the transcript isoform number. For example, the first isoform for wnt3 in the AEP gene models is HVAEP1_T010730.1.
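The ID formats described above are regular enough to check mechanically. The following is a minimal Python sketch, not part of the original analysis code; the names `parse_aep_id` and `parent_gene` and the returned dictionary layout are illustrative assumptions, and the six-digit padding is taken from the format description in this README.

```python
import re

# Original (pre-portal) AEP identifiers as described in this README:
#   gene:       HVAEP1_G######
#   transcript: HVAEP1_T######.#
AEP_ID = re.compile(r"^HVAEP1_(?P<kind>[GT])(?P<num>\d{6})(?:\.(?P<isoform>\d+))?$")


def parse_aep_id(identifier):
    """Split an original-convention AEP gene or transcript ID into its parts."""
    match = AEP_ID.match(identifier)
    if match is None:
        raise ValueError(f"not an original-convention AEP ID: {identifier!r}")
    kind, num, isoform = match.group("kind", "num", "isoform")
    # Gene IDs carry no isoform suffix; transcript IDs must have one.
    if (kind == "G") != (isoform is None):
        raise ValueError(f"isoform suffix does not match ID type: {identifier!r}")
    return {
        "type": "gene" if kind == "G" else "transcript",
        "number": int(num),
        "isoform": int(isoform) if isoform is not None else None,
    }


def parent_gene(transcript_id):
    """Return the gene ID that shares the transcript's numeric ID."""
    parts = parse_aep_id(transcript_id)
    if parts["type"] != "transcript":
        raise ValueError(f"not a transcript ID: {transcript_id!r}")
    return f"HVAEP1_G{parts['number']:06d}"
```

For instance, `parent_gene("HVAEP1_T010730.1")` gives the wnt3 gene ID `HVAEP1_G010730`.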

On the genome portal, the AEP scaffold prefix was modified from 'chr-' to 'HVAEP', such that the scaffold chr-1 became HVAEP1. For the AEP gene and transcript models, the 'HVAEP1' prefix, which was initially intended to indicate genome version, was modified to instead reflect the scaffold that contains the gene model. In addition, the underscore was replaced with a dot. Thus, because wnt3 is located on chr-6 (now HVAEP6), the wnt3 transcript ID HVAEP1_T010730.1 was changed to HVAEP6.T010730.1.
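The renaming rule for AEP identifiers can be sketched as follows. This is an illustrative Python example rather than the procedure actually used for the portal: the function names are hypothetical, only scaffolds named with the 'chr-' prefix are handled, and the scaffold that contains each gene model has to be supplied by the caller (for example, from the annotation or from the tables in 'ID_Conversion').

```python
import re

OLD_SCAFFOLD = re.compile(r"^chr-(\d+)$")
OLD_MODEL_ID = re.compile(r"^HVAEP1_([GT]\d{6}(?:\.\d+)?)$")


def convert_scaffold(old_scaffold):
    """Map an original scaffold name (e.g. chr-1) to its portal name (HVAEP1)."""
    match = OLD_SCAFFOLD.match(old_scaffold)
    if match is None:
        raise ValueError(f"unexpected scaffold name: {old_scaffold!r}")
    return f"HVAEP{match.group(1)}"


def convert_model_id(old_id, old_scaffold):
    """Map an original gene/transcript ID to the portal convention.

    The 'HVAEP1' version prefix is replaced by the portal name of the
    scaffold that contains the model, and the underscore becomes a dot.
    """
    match = OLD_MODEL_ID.match(old_id)
    if match is None:
        raise ValueError(f"unexpected gene/transcript ID: {old_id!r}")
    return f"{convert_scaffold(old_scaffold)}.{match.group(1)}"
```

Applied to the wnt3 example, `convert_model_id("HVAEP1_T010730.1", "chr-6")` yields `HVAEP6.T010730.1`; the scaffold chr-6 is inferred here from the portal ID quoted in this README.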

The contig/scaffold naming convention used during the H. oligactis genome assembly process consists of either the prefix 'contig_' or 'scaffold_' followed by an arbitrary number. The transcript models follow the standard AUGUSTUS format (e.g., g1842.t1): a 'g' prefix indicating that the identifier refers to a gene model, a unique non-padded numeric ID for each gene ('1842'), and a transcript isoform ID ('.t1').
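As a small illustration of the AUGUSTUS-style format, the Python sketch below splits a transcript ID into its gene and isoform numbers. The function names are hypothetical and the pattern covers only the `g<number>.t<number>` form shown in the example above.

```python
import re

AUGUSTUS_ID = re.compile(r"^g(?P<gene>\d+)\.t(?P<isoform>\d+)$")


def parse_augustus_id(transcript_id):
    """Return (gene number, isoform number) for an ID such as g1842.t1."""
    match = AUGUSTUS_ID.match(transcript_id)
    if match is None:
        raise ValueError(f"not an AUGUSTUS-style transcript ID: {transcript_id!r}")
    return int(match.group("gene")), int(match.group("isoform"))


def augustus_gene_id(transcript_id):
    """Return the gene-level ID (e.g. g1842) for a transcript ID."""
    gene_number, _ = parse_augustus_id(transcript_id)
    return f"g{gene_number}"
```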

On the genome portal, the H. oligactis contigs/scaffolds were all renamed to have the prefix 'HOLI'. The numbering was also changed to reflect contig/scaffold size, with the largest contig/scaffold being assigned the ID HOLI00001 (previously contig_18179) and the smallest being assigned the ID HOLI16314 (previously contig_5588). Gene models kept the same AUGUSTUS-formatted ID, but the name of the parent contig/scaffold was prepended to each ID, such that g1842.t1 became HOLI00150.g1842.t1.
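The size-based renumbering can be approximated with the Python sketch below. It is an illustration under stated assumptions, not the script used for the portal: the function names are hypothetical, the input is a plain mapping from original sequence name to length, and ties between equal-length sequences are broken alphabetically, which may differ from the portal's actual ordering. For real conversions, the tables in 'ID_Conversion' remain the authoritative mapping.

```python
def assign_holi_names(lengths):
    """Rank sequences from largest to smallest and assign HOLI##### names.

    lengths: dict mapping original names (contig_*/scaffold_*) to lengths in bp.
    Tie-breaking by name is an assumption made only to keep the result stable.
    """
    ranked = sorted(lengths, key=lambda name: (-lengths[name], name))
    return {old: f"HOLI{rank:05d}" for rank, old in enumerate(ranked, start=1)}


def portal_transcript_id(new_scaffold, augustus_id):
    """Prepend the renamed parent contig/scaffold to an AUGUSTUS transcript ID."""
    return f"{new_scaffold}.{augustus_id}"
```

With made-up lengths, `assign_holi_names({"contig_a": 500, "scaffold_b": 1200, "contig_c": 80})` assigns HOLI00001 to scaffold_b, HOLI00002 to contig_a, and HOLI00003 to contig_c, and `portal_transcript_id("HOLI00150", "g1842.t1")` reproduces the example ID HOLI00150.g1842.t1.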

All files and code associated with this repository, including the downloadable files hosted on the genome portal (specifically via the 'Scripts and Data' page, available here), use the original naming convention, whereas all other parts of the genome portal use the new naming convention. We include tables in the folder 'ID_Conversion' that provide the necessary information for mapping the IDs from the old naming convention to their equivalent ID under the new naming convention.
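One way to apply such a conversion table is sketched below in Python. The layout of the tables in 'ID_Conversion' is not described here, so this example assumes a hypothetical tab-separated file with the old ID in the first column and the new ID in the second; the column positions and function names would need to be adjusted to match the actual files.

```python
import csv
import io


def load_id_map(handle, old_column=0, new_column=1):
    """Read a tab-separated conversion table into an old-ID -> new-ID dict.

    The two-column layout is an assumption; blank lines and lines starting
    with '#' are skipped.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        mapping[row[old_column]] = row[new_column]
    return mapping


def translate(ids, mapping):
    """Translate IDs, returning None for any ID missing from the table."""
    return [mapping.get(identifier) for identifier in ids]


# Toy in-memory table built from the two examples given in this README.
example_table = io.StringIO(
    "# old_id\tnew_id\n"
    "HVAEP1_T010730.1\tHVAEP6.T010730.1\n"
    "g1842.t1\tHOLI00150.g1842.t1\n"
)
example_map = load_id_map(example_table)
```

Returning `None` for unmapped IDs, rather than passing them through unchanged, makes it easier to notice identifiers that are absent from the table.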

