
<!DOCTYPE html>
<html lang="en">
  <head>
  <title>Galaxy Australia</title>
  <meta property="og:title" content="" />
  <meta property="og:description" content="" />
  <meta property="og:image" content="/assets/media/galaxy-eu-logo.512.png" />
  <meta name="description" content="The Australian Galaxy Instance">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">

  <link rel="stylesheet" href="/assets/css/bootstrap.min.css">
  <link rel="stylesheet" href="/assets/css/main.css">


  <link rel="canonical" href="https://usegalaxy-au.github.io/index-clip.html">
  <link rel="shortcut icon" href="/assets/media/galaxy-eu-logo.64.png" type="image/x-icon" />
  <link rel="alternate" type="application/rss+xml" title="Galaxy Australia" href="/feed.xml">

  <link href="/assets/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-wvfXpqpZZVQGK6TAh5PVlGOfQNHSoD2xbE+QkPxCAFlNEevoEH3Sl0sibVcOQVnN" crossorigin="anonymous">
  <script src="/assets/js/jquery-3.2.1.slim.min.js" integrity="sha256-k2WSCIexGzOj3Euiig+TlR8gA0EmPjuc79OEeY5L45g=" crossorigin="anonymous"></script>
  <script src="/assets/js/bootstrap.min.js" integrity="sha256-U5ZEeKfGNOja007MMD3YBI0A3OSZOQbeG6z2f2Y0hu8=" crossorigin="anonymous"></script>
</head>

  <body>
    <div id="wrap">
      <div id="main">
        <div class="container" id="maincontainer">
          <div class="home">


  <h1 id="galaxy-clip-explorer">Galaxy CLIP-Explorer</h1>

<p>Welcome to the Galaxy CLIP-Explorer – a webserver to process, analyse and visualise CLIP-Seq data.</p>

<p><img src="/assets/media/cover_design_clipseq.png" alt="" /></p>

<h2 id="1-getting-started-with-galaxy-clip-explorer">1. Getting Started with Galaxy CLIP-Explorer</h2>

<p>Are you new to Galaxy, or returning after a long time, and looking for help to get started? Take <a target="_parent" href="https://hicexplorer.usegalaxy.eu/tours/core.galaxy_ui">a guided tour</a> through Galaxy’s user interface.</p>

<p>Take a look at the CLIP-Seq data analysis tutorial on the <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">Galaxy Training Network</a>, where you can analyse CLIP-Seq data of RBFOX2 from human liver cancer cells (Hep G2). The tutorial will help you to understand the analysis steps and the most important parameters and tools that are used in CLIP-Explorer.</p>

<p>The underlying workflow of the tutorial can be found <a target="_parent" href="https://github.com/galaxyproject/training-material/tree/master/topics/transcriptomics/tutorials/clipseq/workflows/">here</a>.</p>

<p>We recommend following the tutorial on <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html">FastQC</a> for quality checks and the tutorial on <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/introduction/tutorials/igv-introduction/tutorial.html">IGV</a> for data inspection.</p>

<p>The Galaxy Training Network tutorial uses eCLIP data from human liver cancer cells (Hep G2), which is hosted on Zenodo: <a target="_parent" href="https://zenodo.org/record/1327423"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.1327423.svg" alt="DOI" /></a></p>

<p>Galaxy CLIP-Explorer can process large eCLIP and iCLIP datasets. We processed eCLIP data with around 20 million reads from <a href="https://doi.org/10.1038/nmeth.3810">Van Nostrand et al. (2016)</a>. CLIP-Explorer can handle multiplexed and de-multiplexed eCLIP and iCLIP data in FASTQ and FASTA format.</p>

<h2 id="2-galaxy-clip-explorer--many-possibilities">2. Galaxy CLIP-Explorer – Many Possibilities</h2>

<p><img src="/assets/media/content_design_clipseq.png" alt="" />
 <b>(A)</b> Galaxy CLIP-Explorer workflows and tools. <b>(B)</b> Output of <code class="highlighter-rouge">multiBamSummary</code> and <code class="highlighter-rouge">plotCorrelation</code> comparing two biological replicates of a CLIP-Seq experiment and one control sample. <b>(C)</b> Output of <code class="highlighter-rouge">plotFingerprint</code> showing the read coverage for the CLIP-Seq and control samples. <b>(D)</b> Output of <code class="highlighter-rouge">CollectInsertSizeMetrics</code> estimating the insert size of the read libraries. <b>(E)</b> Output of <code class="highlighter-rouge">FastQC</code> showing the duplication levels of the read libraries. <b>(F)</b> Binding sequence motifs found by <code class="highlighter-rouge">MEME-ChIP</code> (DREME and MEME) in potential binding regions (peaks) obtained by a peak caller such as <code class="highlighter-rouge">PEAKachu</code>, <code class="highlighter-rouge">Piranha</code> or <code class="highlighter-rouge">PureCLIP</code>. <b>(G-I)</b> Example output of <code class="highlighter-rouge">RCAS</code> (RNA Centric Annotation System): <b>(G)</b> binding coverage across the transcript and the 5’ and 3’ UTRs, <b>(H)</b> binding coverage around the exon-intron boundaries, and <b>(I)</b> a target distribution plot showing which kinds of RNA the protein of interest predominantly binds.</p>

<h2 id="3-workflows">3. Workflows</h2>

<p>We provide the following workflows to automate the data analysis for iCLIP and eCLIP data. All workflows can be found <a href="https://github.com/Florian-H-Lab/CLIP-Explorer">here</a>. The data needs to be in FASTA or FASTQ format and can be either multiplexed or de-multiplexed. All workflows, except the robust peak analysis, require the data as a list of dataset pairs. A tutorial on creating a list of dataset pairs can be found in the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a> or <a href="https://galaxyproject.github.io/training-material/topics/galaxy-data-manipulation/tutorials/collections/tutorial.html">here</a>. Please keep in mind that all workflows need additional input files from the user.</p>

<h3 id="31-from-scratch-to-de-multiplexed-fastq-files">3.1 From scratch to de-multiplexed FASTQ files</h3>

<p>If your data is not de-multiplexed yet, then use the following workflows. The user has to provide the in-line barcodes in a tab-delimited tabular format, for example:</p>

<ul>
  <li>rep1	TTAG</li>
  <li>rep2	TGGC</li>
  <li>rep3	TTAA</li>
</ul>
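
<p>As an illustration, a tab-delimited barcode table like the one above could be generated with a few lines of Python. This is only a sketch: the replicate names and barcodes are the example values from the list, and the helper name <code class="highlighter-rouge">barcodes_to_tabular</code> is hypothetical and not part of CLIP-Explorer.</p>

```python
import csv
import io

# Example replicate names and in-line barcodes, taken from the list above.
barcodes = [("rep1", "TTAG"), ("rep2", "TGGC"), ("rep3", "TTAA")]


def barcodes_to_tabular(pairs):
    """Return the barcode table as tab-delimited text, one 'name TAB barcode' row per line."""
    for name, seq in pairs:
        # Reject empty barcodes and characters that are not nucleotide codes.
        if not seq or set(seq) - set("ACGTN"):
            raise ValueError("invalid barcode for " + name + ": " + repr(seq))
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()


table = barcodes_to_tabular(barcodes)
print(table, end="")
```

<p>The returned text can be saved to a file and uploaded to Galaxy as a tabular dataset.</p>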

<p>The raw data needs to be in FASTA or FASTQ format as a list of dataset pairs.</p>

<ul>
  <li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/1demultiplexeclip">Workflow to de-multiplex eCLIP read library</a></li>
  <li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/2demultiplexiclip">Workflow to de-multiplex iCLIP read library</a></li>
</ul>

<h3 id="32-from-scratch-with-de-multiplexed-fastq-files">3.2 From scratch with de-multiplexed FASTQ files</h3>

<p>You can choose between three different types of peak calling for the analysis of eCLIP and iCLIP data. The data requirements of each peak calling algorithm are listed below:</p>

<p><strong>Table 1</strong>: Data specification of the different peak calling algorithms.</p>

<table class="table table-striped">
  <thead>
    <tr>
      <th>Tool</th>
      <th style="text-align: center">Biological Replicates (Yes/No)</th>
      <th style="text-align: center">Control Data (Yes/No)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="https://github.com/tbischler/PEAKachu">PEAKachu</a></td>
      <td style="text-align: center">Yes</td>
      <td style="text-align: center">Yes</td>
    </tr>
    <tr>
      <td><a href="https://doi.org/10.1186/s13059-017-1364-2">PureCLIP</a></td>
      <td style="text-align: center">No</td>
      <td style="text-align: center">Yes</td>
    </tr>
    <tr>
      <td><a href="https://doi.org/10.1093/bioinformatics/bts569">Piranha</a></td>
      <td style="text-align: center">No</td>
      <td style="text-align: center">No</td>
    </tr>
  </tbody>
</table>

<h4 id="note-if-you-have-used-the-de-mutliplexing-workflows">Note if you have used the de-multiplexing workflows:</h4>
<p>If you used the preceding workflows for de-multiplexing, then remove the <code class="highlighter-rouge">Cutadapt</code> and <code class="highlighter-rouge">UMI-tools extract</code> steps from the following workflows before analysing your data. Simply import the workflow into your account, remove the tools and connect the loose ends directly to the alignment step.</p>

<h4 id="note-if-you-use-eclip-data-of-nostrand-et-al-2016">Note if you use eCLIP data of Van Nostrand et al. (2016):</h4>
<p>The workflow for the eCLIP data of <a href="https://doi.org/10.1038/nmeth.3810">Van Nostrand et al. (2016)</a> was used to analyse the data of RBFOX2. Take care when using other data from the study of <a href="https://doi.org/10.1038/nmeth.3810">Van Nostrand et al. (2016)</a>, because the size of the unique molecular identifier (UMI) can be different. The workflow is set to a UMI of five nucleotides. You can change this by importing the workflow into your account and amending the parameter <code class="highlighter-rouge">Cut bases from reads before adapter trimming</code> of the second <code class="highlighter-rouge">Cutadapt</code> step for the CLIP and control data.</p>

<h4 id="eclip">eCLIP</h4>
<ul>
  <li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/1clipseq-explorerdemultiplexedpeakachuecliphg19n5-1">Workflow for the eCLIP data of Van Nostrand et al. (2016)</a></li>
  <li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/2clipseq-explorerdemultipeakachuecliphg19">Peak calling with PEAKachu</a></li>
  <li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/3clipseq-explorerdemultipureclipecliphg19">Peak calling with PureCLIP</a></li>
  <li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/4clipseq-explorerdemultipiranhaecliphg19">Peak calling with Piranha</a></li>
</ul>

<h4 id="iclip">iCLIP</h4>
<ul>
  <li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/1clipseq-explorerdemultipeakachuicliphg19">Peak calling with PEAKachu</a></li>
  <li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/2clipseq-explorerdemultipureclipicliphg19">Peak calling with PureCLIP</a></li>
  <li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/3clipseq-explorerdemultipiranhaicliphg19">Peak calling with Piranha</a></li>
</ul>

<h3 id="33-further-optional-peak-analysis">3.3 Further optional peak analysis</h3>

<p>The following workflow can be used if you have picked a peak calling algorithm that does not support biological replicates. The workflow finds and analyses robust binding regions shared between different peak files.</p>

<ul>
  <li><a href="https://clipseq.usegalaxy.eu/u/heylf/w/robustpeakanalysis">Robust peak analysis</a></li>
</ul>

<h2 id="4-remarks">4. Remarks</h2>

<p>Please follow the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a> for a deeper understanding of the tools of CLIP-Explorer. You can change the workflows at any time: simply import the workflow into your account and amend the necessary tools. When doing so, keep the following things in mind:</p>

<h3 id="41-adapter-sequences">4.1 Adapter sequences</h3>
<p>The workflows use <code class="highlighter-rouge">Cutadapt</code> to remove standard eCLIP and iCLIP adapter sequences. You need to change the <code class="highlighter-rouge">Cutadapt</code> parameters if your read library contains other adapter sequences.</p>

<h3 id="42-umi-and-in-line-barcodes">4.2 UMI and in-line barcodes</h3>
<p>The workflows use <code class="highlighter-rouge">Cutadapt</code> to trim off the length of the UMI (plus barcode) from one mate of the read pair. The length depends on whether you use the iCLIP protocol, the eCLIP protocol or your own protocol. Please check or change the parameter in <code class="highlighter-rouge">Cutadapt</code> based on your UMI and in-line barcode. For more information follow the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a>.</p>

<p>CLIP-Explorer uses <code class="highlighter-rouge">UMI-tools extract</code> to find the UMIs inside your reads. Change the pattern of <code class="highlighter-rouge">UMI-tools extract</code> based on your read library preparation.</p>
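
<p>To illustrate what such a pattern means, the following Python sketch mimics the idea on a single read. It assumes the common convention in which <code class="highlighter-rouge">N</code> marks a UMI base and <code class="highlighter-rouge">X</code> marks a base that stays in the read, and it assumes a five-base UMI at the start of the read. The function <code class="highlighter-rouge">extract_umi</code> and the example read are hypothetical; this is not the implementation used by <code class="highlighter-rouge">UMI-tools</code>.</p>

```python
def extract_umi(read_name, sequence, quality, pattern):
    """Move the UMI bases selected by the pattern from the read into the read name.

    Pattern symbols (simplified): 'N' = UMI base, 'X' = base kept in the read.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    if min(len(sequence), len(pattern)) != len(pattern):
        raise ValueError("read is shorter than the pattern")

    umi, kept_seq, kept_qual = [], [], []
    for symbol, base, qual in zip(pattern, sequence, quality):
        if symbol == "N":
            umi.append(base)          # UMI base: removed from the read
        elif symbol == "X":
            kept_seq.append(base)     # non-UMI base: stays in the read
            kept_qual.append(qual)
        else:
            raise ValueError("unsupported pattern symbol: " + symbol)

    rest = len(pattern)
    new_seq = "".join(kept_seq) + sequence[rest:]
    new_qual = "".join(kept_qual) + quality[rest:]
    # The UMI is appended to the read name so it can be used for de-duplication later.
    return read_name + "_" + "".join(umi), new_seq, new_qual


# Hypothetical read with a five-base UMI at its start.
name, seq, qual = extract_umi("read1", "ACGTAGGGTTTCCC", "IIIIIHHHHHHHHH", "NNNNN")
print(name, seq, qual)
```

<p>If your protocol places the UMI elsewhere or uses a different length, the pattern has to change accordingly.</p>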

<h3 id="43-read-alignment">4.3 Read alignment</h3>
<p>Read alignment is done with <code class="highlighter-rouge">STAR</code>, which combines genome and transcriptome data. CLIP-Explorer uses only uniquely mapped reads. Furthermore, <code class="highlighter-rouge">STAR</code> is executed with soft-clipping turned off. For more information follow the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a>.</p>

<h3 id="44-peak-calling-with-peakachu">4.4 Peak calling with PEAKachu</h3>
<p>You need to specify the insert size of your paired-end reads for <code class="highlighter-rouge">PEAKachu</code>. Check the output image of <code class="highlighter-rouge">CollectInsertSizeMetrics</code> to get an estimate for that parameter.</p>

<h3 id="45-peak-calling-with-pureclip">4.5 Peak calling with PureCLIP</h3>
<p>PureCLIP works best with only one mate of the paired-end reads, namely the one in which the cross-linking event occurs. Thus, CLIP-Explorer filters out the other mate before the peak calling. Remove the <code class="highlighter-rouge">Bam filter</code> tool to disable this behavior or change <code class="highlighter-rouge">Bam filter</code> to pick the correct mate.</p>

<h3 id="46-extension-of-the-binding-regions">4.6 Extension of the binding regions</h3>
<p>CLIP-Explorer uses <code class="highlighter-rouge">SlopBED</code> to extend the peaks by a few base pairs to the left and right in order to correct for an underestimation of the binding regions by the peak calling algorithms. For more information follow the CLIP-Seq data analysis <a target="_parent" href="https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/clipseq/tutorial.html">tutorial</a>. Remove the tool or change the parameters of <code class="highlighter-rouge">SlopBED</code> to change this behavior.</p>
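
<p>The effect of such an extension can be sketched in Python as follows. The sketch assumes a symmetric extension in BED-style coordinates, with the result clamped to the chromosome boundaries. The function <code class="highlighter-rouge">extend_peak</code>, the flank size of 20 and the chromosome lengths are made-up values for illustration, not the settings of the workflows.</p>

```python
def extend_peak(start, end, flank, chrom_length):
    """Extend a peak by 'flank' bases on both sides without leaving the chromosome."""
    new_start = max(0, start - flank)           # do not run past the chromosome start
    new_end = min(chrom_length, end + flank)    # do not run past the chromosome end
    return new_start, new_end


# A peak in the middle of a chromosome grows by the flank size on both sides.
print(extend_peak(100, 150, 20, 1000))

# A peak close to the chromosome ends is clamped to the valid range.
print(extend_peak(5, 40, 20, 50))
```

<p>A larger flank captures more sequence context around each peak, at the cost of including more bases that may not be bound.</p>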


<h2>Our Data Policy</h2>

<style>
    th, td {
        border-bottom: 1px solid #ddd;
        padding: 10px;
    }
    th {
        background-color: #f2f2f2;
    }

</style>

<p><strong>Galaxy Australia</strong> is designed for data analysis and not for
long term storage of data.</p>
<br />
<p>Use Galaxy Australia to host your input data only for the period required for
analysis. Also, remember to export/download your analysed data, as this will
also not be stored beyond the limits set out below.</p>

<p>It is <u>your responsibility</u> as a user of the system to manage your own data
and routinely remove both your input and output data from this community system
to enable capacity for other users. Any data that is not removed by you within
a defined time period (see below) will be automatically and permanently deleted.
</p>

<p>Galaxy Australia maintains a collection of frequently used reference genomes
and annotation datasets. The inclusion of additional reference genomes and/or
annotation data on the system for community use can be <a href="https://docs.google.com/forms/d/e/1FAIpQLSdXuarvkzFA5kRqoCfO8uiUGAB0PplfR4yvAfpCPSpdMcehmA/viewform">
requested</a>. Galaxy Australia's hosting of all reference and annotation data
does not count towards your quota and it is the best way to access
reference/annotation data.</p>

<h3>Data storage quotas and retention periods</h3><br />
<center>
<table>
    <tr>
        <th></th>
        <th>Storage quota</th>
        <th>Data retention period</th>
    </tr>
    <tr>
        <td><strong>Registered Australian researchers</strong></td><td>600GB</td><td>1 year (52 weeks)</td>
    </tr>
    <tr>
        <td><strong>Other registered users</strong></td><td>100GB</td><td>1 year (52 weeks)</td>
    </tr>
    <tr>
        <td><strong>Unregistered users</strong></td><td>5GB</td><td>NA</td>
    </tr>
</table>
</center>

<h3>Registered Users</h3>

<ul>
    <li><u>Registered Australian Researchers</u> are defined by a registration
        email address from:
        <ul>
            <li>@domain.edu.au</li>
            <li>@domain.org.au</li>
            <li>@domain.edu - only in the case of known Australian Universities
                not on the .au domain</li>
        </ul>
    </li>
    <li><u>Other Registered Users</u> are defined by a registration email
        address from any other @domain</li>
    <li>Please contact us (help@genome.edu.au) if your institution does not
        conform to this rule but you believe it should be recognised as
        performing publicly funded Australian research</li>
</ul>

<p>Registered users from Australian publicly funded research organisations have
a 600GB data storage quota. Other registered users have a 100GB quota. An
increased data storage quota can be <a href="https://docs.google.com/forms/d/e/1FAIpQLSeiw6ajmkezLCwbXc3OFQEU3Ai9hGnBd967u9YbQ8ANPgvatA/viewform">
requested</a> for a limited time period in special cases.</p>


<p>Registered users' data (i.e. datasets, histories) will be available on the
system for 1 year (52 weeks) from the point of upload or creation. Within this period,
any data marked by a Registered User as "deleted" will be permanently removed
within 5 days. If a registered user "purges" the dataset, it will be removed
immediately and permanently.</p>

<h3>Unregistered Users</h3>

<p>Processed data will only be accessible during one browser session, using a
browser cookie to identify an Unregistered User's data. This cookie is not used
for any other purposes (e.g. tracking or analytics).</p>


<h3>What does it mean when I go over quota?</h3>

<p>Your data and histories are still accessible, but you will not be able to run new
jobs or import more data. If you know in advance that you will exceed your quota, consider
<a href="https://docs.google.com/forms/d/e/1FAIpQLSeiw6ajmkezLCwbXc3OFQEU3Ai9hGnBd967u9YbQ8ANPgvatA/viewform">
requesting</a> more analysis storage or downloading and deleting old, unwanted data.</p>

<style>
    .column {
        float: left;
        width: 25%;
    }
    
    .big_column {
        float: left;
        width: 100%
    }
    /* Clear floats after the columns */
    
    .row:after {
        content: "";
        display: table;
        clear: both;
    }
    /* Responsive layout - makes a two column-layout instead of four columns */
    
    @media screen and (max-width: 1200px) {
        .column {
            width: 50%;
        }
    }
    /* Responsive layout - makes the two columns stack on top of each other instead of next to each other */
    
    @media screen and (max-width: 600px) {
        .column {
            width: 100%;
        }
    }
</style>
<div class="row" width="90%">
    <div class="big_column">
        <!--<iframe width="100%" height="200px" src="https://stats.genome.edu.au/d-solo/-D4mtTAik/for_embedding?refresh=10s&orgId=1&panelId=2" frameborder="0" ></iframe> -->
        <iframe src="https://stats.usegalaxy.org.au/d-solo/-D4mtTAik/for_embedding?orgId=1&refresh=10s&panelId=2" width="100%" height="200px" frameborder="0"></iframe>
    </div>
</div>
<center>
    <div class="row" width="90%">
        <div class="column">
            <iframe src="https://stats.usegalaxy.org.au/d-solo/-D4mtTAik/for_embedding?orgId=1&refresh=1d&panelId=19" width="95%" height="110px" frameborder="0"></iframe>
        </div>
        <div class="column">
            <iframe src="https://stats.usegalaxy.org.au/d-solo/-D4mtTAik/for_embedding?orgId=1&refresh=1d&panelId=21" width="95%" height="110px" frameborder="0"></iframe>
        </div>
        <div class="column">
            <iframe src="https://stats.usegalaxy.org.au/d-solo/-D4mtTAik/for_embedding?orgId=1&refresh=1d&panelId=23" width="95%" height="110px" frameborder="0"></iframe>
        </div>
        <div class="column">
            <iframe src="https://stats.usegalaxy.org.au/d-solo/-D4mtTAik/for_embedding?orgId=1&refresh=1d&panelId=25" width="95%" height="110px" frameborder="0"></iframe>
        </div>
    </div>
</center>
<div class="row">

    <section class="section-content">
        <div class="col-md-12">
        </div>
    </section>
</div>

</div>

        </div>
      </div>
    </div>
    

<footer class="navbar-default">
    <div class="container">
        <div class="row">
            <div class="col-lg-12" style="text-align:center">
                <p>UseGalaxy.org.au is maintained largely by the <a href="/people">Australian Galaxy Team</a> including staff from QCIF, UQ-RCC and Melbourne Bioinformatics.
All content on this site is available under <a href="https://creativecommons.org/share-your-work/public-domain/cc0/" target="_blank">CC0-1.0</a>, unless otherwise specified.
Galaxy Australia is currently running Galaxy version 21.09 (September 2021)</p>

            </div>
        </div>
        <div class="row">
            <div class="col-lg-12" style="text-align:center">
                <ul class="contact-info">
                    
                      <li><i class="fa fa-envelope"></i><a href="mailto:help@genome.edu.au">help@genome.edu.au</a></li>
                    
                    
                      <li><i class="fa fa-github"></i><a href="https://github.com/usegalaxy-au" target="_blank">usegalaxy-au</a></li>
                    
                    
                      <li><i class="fa fa-twitter"></i><a href="https://twitter.com/galaxyaustralia" target="_blank">galaxyaustralia</a></li>
                    
                      <!-- <li><i class="fa fa-rss"></i>Subscribe <a href="/feed.xml">via RSS (UseGalaxy.eu Feed)</a></li> -->
                </ul>
            </div>
        </div>
    </div>
</footer>

  </body>
</html>