Separating ancient DNA from modern contamination.
This package includes a modified version of PMDtools. The original software developed by Pontus Skoglund is available at https://code.google.com/p/pmdtools/.
PMDtools implements a likelihood framework incorporating postmortem damage (PMD), base quality scores and biological polymorphism to identify degraded DNA sequences that are unlikely to originate from modern contamination. Using the model, each sequence is assigned a PMD score, for which positive values indicate support for the sequence being genuinely ancient. For details of the method, please see the main paper in PNAS: http://www.pnas.org/content/111/6/2229.abstract
PMDtools also offers PMD-aware base quality score adjustment and tools for inspecting damage patterns.
PMDtools takes SAM-formatted input and requires an MD tag with alignment information. Many aligners write the MD tag; where it is missing, it can be added with the SAMtools calmd command (formerly fillmd) (Li, Handsaker et al. 2009).
Calculate post-mortem damage (PMD) scores for NGS reads in SAM files and filter reads by score.
Modified version of PMDtools based on v0.55 by Pontus Skoglund:
cite: P Skoglund, BH Northoff, MV Shunkov, AP Derevianko, S Paabo, J Krause, M Jakobsson (2014) Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal, PNAS, advance online 27 January
Included changes:
Adjust base quality value (sequencing error probability).
Parameters: | |
---|---|
Returns: | Adjusted base quality value. |
Return type: | float |
Method stub, apparently unused, for fetching a reference sequence from a supplied FASTA file.
Calls SAMtools in a subprocess to create an index of the FASTA file.
Parameters: | |
---|---|
Returns: | reference sequence |
Return type: | str |
Calculate the probability of deamination from a geometric distribution.
Parameters: | |
---|---|
Returns: | Probability of deamination based on geometric distribution. |
Return type: | float |
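As a minimal sketch of a geometric damage probability: the function name, the `constant` floor, and the example values 0.3 and 0.01 are assumptions made for illustration, since the parameter list is not shown here.

```python
def geometric_probability(p, k, constant=0.0):
    """Geometric probability mass at trial k (1-based), plus an optional floor.

    p        -- success probability of the geometric distribution (hypothetical name)
    k        -- 1-based distance of the base from the read end (hypothetical name)
    constant -- assumed constant background rate added to every position
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if k < 1:
        raise ValueError("k must be >= 1")
    return (1.0 - p) ** (k - 1) * p + constant


# Illustrative damage profile for the first five positions of a read.
damage_profile = [geometric_probability(0.3, z, 0.01) for z in range(1, 6)]
```

The profile decays with distance from the read end, which matches the expectation that deamination concentrates at fragment termini.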
Calculate likelihood of a match under given model.
Parameters: | |
---|---|
Returns: | Probability of match under given model. |
Return type: | float |
Calculate likelihood of a mismatch under given model.
Parameters: | |
---|---|
Returns: | Probability of mismatch under given model. |
Return type: | float |
Executable to calculate post-mortem damage (PMD) scores for SAM files.
See --help for details on expected arguments. Reads input from STDIN only, logs messages to STDERR, and writes the processed SAM file to STDOUT.
Convert PHRED score to probability of sequencing error.
Parameters: | quality (int) – PHRED score. |
---|---|
Returns: | The probability of a sequencing error. |
Return type: | float |
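The conversion follows the standard PHRED definition, Q = -10 log10(p). A minimal sketch, with a hypothetical function name:

```python
def phred_to_probability(quality):
    """Convert a PHRED score to the probability of a sequencing error."""
    return 10.0 ** (-quality / 10.0)


error_q20 = phred_to_probability(20)  # one error expected in 100 base calls
```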
Calculate probability of either match or mismatch under given model.
Expects the distance of the base from the end of the read, a pre-computed distribution of damage probabilities, the PHRED score of the base (from which the probability of a sequencing error is derived), and the probability of a true polymorphism.
Parameters: | |
---|---|
Returns: | Probability of either match or mismatch under given model. |
Return type: | float |
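One way such a likelihood could combine the four inputs is sketched below. The function names and the exact formula are assumptions made for illustration and are not confirmed by the text here: a match is taken to arise when no event occurs, or when two of the three events (damage, sequencing error, polymorphism) cancel each other out.

```python
def match_probability(position, damage_model, quality, polymorphism):
    """Assumed probability of observing a match at a given read position."""
    p_damage = damage_model[position]
    # Divide by 3: an error yields one specific alternative base out of three.
    p_error = 10.0 ** (-quality / 10.0) / 3.0
    p_poly = polymorphism
    return ((1.0 - p_damage) * (1.0 - p_error) * (1.0 - p_poly)
            + p_damage * p_error * (1.0 - p_poly)
            + p_error * p_poly * (1.0 - p_damage))


def mismatch_probability(position, damage_model, quality, polymorphism):
    """Assumed complement: every outcome that is not a match."""
    return 1.0 - match_probability(position, damage_model, quality, polymorphism)
```

Under a null model without damage, the same functions can be evaluated with a damage distribution of all zeros.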
Convert probability of sequencing error to ASCII-encoded PHRED score.
Uses ASCII offset 33 by default (the Sanger encoding, also used by Illumina 1.8 and later).
Parameters: | |
---|---|
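A minimal sketch of the encoding step follows. The function name, the rounding to the nearest integer, and the cap at Q93 (the highest score that offset 33 can represent in printable ASCII) are assumptions; the actual implementation may truncate or cap differently.

```python
import math


def probability_to_ascii(probability, offset=33, max_quality=93):
    """Encode an error probability as a single ASCII PHRED character."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    quality = int(round(-10.0 * math.log10(probability)))
    # Clamp so the result stays within the printable ASCII range.
    quality = max(0, min(quality, max_quality))
    return chr(quality + offset)


encoded = probability_to_ascii(0.001)  # Q30 with offset 33
```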
Convert probability of sequencing error to PHRED score.
Parameters: | probability (float) – The probability of a sequencing error. |
---|---|
Returns: | PHRED score. |
Return type: | float |
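This is the inverse of the PHRED-to-probability conversion. A minimal sketch with a hypothetical function name; the guard against non-positive input is an addition, since the logarithm is undefined there.

```python
import math


def probability_to_phred(probability):
    """Convert the probability of a sequencing error to a PHRED score."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -10.0 * math.log10(probability)


quality = probability_to_phred(0.001)
```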
Calculate the post-mortem damage score (PMDS).
Requires the full read and reference sequences, including clips, skips and gaps as recovered from the CIGAR string and MD tag, so that the correct distance from the 5’ or 3’ end is used in the PMDS calculation. Only the aligned read bases with PHRED scores are considered for PMD scoring.
The optional adjustment of base qualities requires an additional adjustment model.
Parameters: | |
---|---|
Returns: | |
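The score can be read as a log-likelihood ratio between a damage model and a null model without damage, so that positive values support an ancient origin. The sketch below is an illustration under assumptions, not the package's implementation: all names are hypothetical, the per-base likelihood is an assumed form, and the input is taken to be already reduced to damage-informative positions, given as (distance from read end, match flag, PHRED score) tuples.

```python
import math


def match_probability(p_damage, quality, p_poly):
    """Assumed per-base probability of a match (see lead-in)."""
    p_error = 10.0 ** (-quality / 10.0) / 3.0
    return ((1.0 - p_damage) * (1.0 - p_error) * (1.0 - p_poly)
            + p_damage * p_error * (1.0 - p_poly)
            + p_error * p_poly * (1.0 - p_damage))


def pmd_score(observations, damage_model, p_poly=0.001):
    """Sum of per-base log-likelihood ratios: damage model vs. no damage."""
    score = 0.0
    for distance, is_match, quality in observations:
        p_pmd = match_probability(damage_model[distance], quality, p_poly)
        p_null = match_probability(0.0, quality, p_poly)
        if not is_match:
            p_pmd, p_null = 1.0 - p_pmd, 1.0 - p_null
        score += math.log(p_pmd / p_null)
    return score


# Illustrative geometric damage profile with a constant floor (assumed values).
damage_model = [0.7 ** z * 0.3 + 0.01 for z in range(50)]
damaged = pmd_score([(0, False, 30), (5, True, 30)], damage_model)
intact = pmd_score([(0, True, 30), (5, True, 30)], damage_model)
```

A mismatch at the first position pushes the score above zero, while a read that matches at every informative position scores below zero.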