Changelog_v2.1.txt

Changelog

v2.1

New features & bug fixes:

1:
  - Generation of consensus sequences is now parallelized, because the use of multiple threads for generating consensus sequences did not work efficiently. Note that the process now uses all available cores. If only a single core is to be used, the original version of the script is available as 1_generate_consensus_sequences_single-core.sh
  - The intronerated contig sequences are now recovered and saved as not interleaved; the interleaved format had caused issues for the merging in step 4
  - The script now removes all text after the sample name in the fasta file, i.e. everything after the first space. HybPiper 2.0 writes additional information into the fasta files, which can lead to issues downstream.
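
The header clean-up described in the last item can be sketched as follows. This is only an illustration and not necessarily how the HybPhaser script does it; the file names and the demo sequences are made up.

```shell
# Hypothetical demo file; real HybPhaser file names and paths differ.
workdir=$(mktemp -d)
printf '>sampleA length=1200 extra info\nACGTACGT\n>sampleB\nTTGGCCAA\n' > "$workdir/demo.fasta"

# On header lines only (lines starting with ">"), drop the first space
# and everything after it. Sequence lines are left untouched.
sed '/^>/ s/ .*//' "$workdir/demo.fasta" > "$workdir/demo_clean.fasta"

cat "$workdir/demo_clean.fasta"
```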

Bugfixes:

1a: The script hung when an empty consensus file existed. A consensus file must now have at least some content.

1b: Fixed the last line of file "2a_List_of_paralogs_removed_for_all_samples.txt" by adding an end-of-line character

1c: The removal of loci from sequence lists for intronerated/consensus did not work; this is now fixed.

1c: Fixed bug that aborted sequence generation when there were failed loci

3a: Correct generation of reference names when using intronerated contigs/consensus
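
A check of the kind described under 1a (a consensus file only counts if it has content) could look like the sketch below. It assumes a plain file-size test is sufficient; the file names are hypothetical and the actual script may test this differently.

```shell
# Hypothetical demo files: one empty, one with content.
workdir=$(mktemp -d)
: > "$workdir/empty_consensus.fasta"
printf '>geneX\nACGT\n' > "$workdir/filled_consensus.fasta"

status=""
for f in "$workdir"/*_consensus.fasta; do
  # "-s" is true only if the file exists AND has a size greater than zero.
  if [ -s "$f" ]; then
    status="$status$(basename "$f"):ok "
  else
    status="$status$(basename "$f"):redo "
  fi
done

echo "$status"
```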


v2.0

Changed folder structure!
  Reads and contigs from HybPiper are copied to new folder (HybPhaser output folder/01_data)
  All HybPhaser output is collected in this folder

Name change from "supercontig" to "intronerated" to avoid ambiguous meaning of "supercontig"
  The term "supercontig" had been used to refer to the sequence generated by HybPiper-intronerate, which contains the exons as well as intron regions.
  However, the term is ambiguous. While HybPiper refers to the intronerated contigs as supercontigs (mainly in the graphics and the contig file name), it also refers to the contigs generated by exonerate as supercontigs (especially in the code).
  Therefore, all sequences generated using intronerate are from now on referred to as intronerated contigs or intronerated consensus sequences.


Improved generation of consensus sequences!

  - Contig sequences and reads are collected and saved in output subfolder "01_data" before mapping. Missing gene names in the contig sequences are corrected.
  Paired-end reads are usually separated into *_interleaved.fasta and *_unpaired.fasta. These are now combined into *_combined.fasta. BWA can handle read files that mix paired and single-end reads.
  - Output of the script is improved to show which sample the process is working on and how long it takes per sample.
  - The process is sped up by using multiple threads in several more steps of the variant calling (the actual speed-up is questionable, though).
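
Combining the two read files mentioned above amounts to a simple concatenation. The sketch below uses a made-up sample name and toy reads; only the *_interleaved.fasta, *_unpaired.fasta and *_combined.fasta suffixes come from this changelog.

```shell
# Hypothetical sample "sampleA" with toy reads.
workdir=$(mktemp -d)
printf '>read1/1\nACGT\n>read1/2\nTGCA\n' > "$workdir/sampleA_interleaved.fasta"
printf '>read2\nGGCC\n' > "$workdir/sampleA_unpaired.fasta"

# Concatenate paired (interleaved) and unpaired reads into one file.
cat "$workdir/sampleA_interleaved.fasta" "$workdir/sampleA_unpaired.fasta" \
  > "$workdir/sampleA_combined.fasta"

# Count the reads in the combined file via the fasta header lines.
n_reads=$(grep -c '^>' "$workdir/sampleA_combined.fasta")
echo "$n_reads reads in combined file"
```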

Configuration in a single file (config.txt)

Scripts 1b_cleaning and 1c_heterozygosity_assessment combined into one script, 1b_assessment
  The script also generates a file listing the removed paralogs (for all samples), which can be used as input for the paralog removal of phased accessions

...more small bug fixes and changes


v.1.3


BugFixes:

- Fixed error when target files have spaces after gene names (1b)
  "Error in names(mean_seq_length_loci) <- gene_names : 'names' attribute [743] must be the same length as the vector [353]"
  
- Problem with "'samples_to_remove' not found" resolved (1d)

- Changed rename function to mv for the sequence generation, to avoid issue with different rename versions. (1d)
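
The rename command exists in several incompatible variants (for example the Perl version and the util-linux version take different arguments), whereas mv behaves the same everywhere. A loop of the following kind avoids the dependency. The file names and suffixes are made up for illustration and are not taken from the HybPhaser scripts.

```shell
# Hypothetical demo files to be renamed.
workdir=$(mktemp -d)
touch "$workdir/geneA_old.fasta" "$workdir/geneB_old.fasta"

for f in "$workdir"/*_old.fasta; do
  # Strip the old suffix with parameter expansion and append the new one.
  mv "$f" "${f%_old.fasta}_new.fasta"
done

ls "$workdir"
```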


Other:

Bash scripts won't exit when errors occur during operations in a loop; they continue instead.


v.1.2

Added functionality:

- generates txt file with list of included samples (useful for clade association)
- changed "extract_mapped_reads.sh" to work with BLASTX output
    Previously it extracted mapped reads from the BWA files; now it collects all mapped read files from the gene subdirectories and concatenates them.
    It also generates R1 and R2 files for paired-end reads.


Bugfixes:

- does not abort when no paralogs_for_all are removed (1b)
- fixed "no threshold_value found" (1b)
- fixed wrong plots in PDF for diverse levels of heterozygosity (1c)
- removed irrelevant error messages from 'system("cat ...")' (1d)
- fixed error involving wrong locus name with error message that stopped script (1d)
- fixed issue with combining tables (2b)
- fixed that non-phased sequences of phased samples are removed from the combined alignment (4b)

Known issues:

- extract mapping reads script only works with BAM files from BWA mapping. When BLASTX is used the output file in blastx format has to be transformed to BAM first. (blast2bam)


v.1.1

Added functionality:

- Target file can be nucleotides or amino acid (fasta format). 
- Improved performance of Rscript_1d 
- Improved output of graphic for paralogous loci for all (1c)
  
Bug fixes:    

- fixed bug with read names in paired-end mode (2b)
- fixed bug with file name recognition (2b)
- reads correct Summary table (2b)
- fixed bug, now it gets the correct folder (3a)
- corrected numbers in graph for n loci, and n samples (1b)
- save table as R object with correct file ending (1c)
- empty only subfolder to start clean (1d)