pmdtools package

Separating ancient DNA from modern contamination.

This package includes a modified version of PMDtools. The original software developed by Pontus Skoglund is available at https://code.google.com/p/pmdtools/.

PMDtools implements a likelihood framework incorporating postmortem damage (PMD), base quality scores and biological polymorphism to identify degraded DNA sequences that are unlikely to originate from modern contamination. Using the model, each sequence is assigned a PMD score, for which positive values indicate support for the sequence being genuinely ancient. For details of the method, please see the main paper in PNAS: http://www.pnas.org/content/111/6/2229.abstract

In addition, PMDtools also offers PMD-aware base quality score adjustment and investigation of damage patterns.

PMDtools takes SAM-formatted input, and requires an MD tag with alignment information. The MD tag is featured in the output of many aligners but can otherwise be added e.g. using the SAMtools fillmd/calmd tool (Li, Handsaker et al. 2009).

pmdtools_mod module

Calculate and filter post-mortem degeneration score for NGS reads in SAM files.

Modified version of PMDtools based on v0.55 by Pontus Skoglund:

cite: P Skoglund, BH Northoff, MV Shunkov, AP Derevianko, S Paabo, J Krause, M Jakobsson (2014) Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal, PNAS, advance online 27 January

Included changes:

  • count and show excluded reads due to unsupported CIGAR operations
  • count and show excluded reads due to % identity filtering
  • port from Python2 to Python3
  • bugfix: ambiguous combinations of CIGAR and MD with deletions lead to shifts in reconstructed reference sequence
  • support for all CIGAR operations except padded references (untested)
  • support of IUPAC ambiguity codes
  • optional parameters to adapt % identity calculation to MALT defaults
  • refactoring into main method
  • use of ArgumentParser over deprecated OptionParser
  • improved PEP8 and pylint compliance
  • ReST documentation
adjust_quality(position, model, qual)[source]

Adjust base quality value (sequencing error probability).

Parameters:
  • position (int) – Position of base from end of read.
  • model (list) – Distribution of damage probabilities.
  • qual (int) – PHRED score of base.
Returns:

Adjusted base quality value.

Return type:

float

fa_get(ffile, chrom, fstart, fend, samtoolspath, verbose=False)[source]

Unused method stub to get reference sequence from supplied FASTA?

Calls SAMtools in subprocess to create index of FASTA file.

Parameters:
  • ffile (str) – Path to FASTA file.
  • chrom (int) – Chromosome number.
  • fstart (int) – Start position.
  • fend (int) – End position.
  • samtoolspath (str) – Path to samtools.
  • verbose (bool, optional) – Enable output of executed commandline, defaults to False.
Returns:

reference sequence

Return type:

str

geometric(pval, kval, constant)[source]

Calculate geometrically distributed probability.

Parameters:
  • pval (float) – Parameter p of geometric distribution.
  • kval (float) – Parameter k of geometric distribution.
  • constant (float) – Constant value of geometric distribution.
Returns:

Probability of deamination based on geometric distribution.

Return type:

float

likelihood_match(position, model, qual, poly)[source]

Calculate likelihood of a match under given model.

Parameters:
  • pos (int) – Position of base from end of read.
  • model (list) – Distribution of damage probabilities.
  • qual (int) – PHRED score of base.
  • poly (float) – Probability of a true polymorphism at this base.
Returns:

Probability of match under given model.

Return type:

float

likelihood_mismatch(position, model, qual, poly)[source]

Calculate likelihood of a mismatch under given model.

Parameters:
  • pos (int) – Position of base from end of read.
  • model (list) – Distribution of damage probabilities.
  • qual (int) – PHRED score of base.
  • poly (float) – Probability of a true polymorphism at this base.
Returns:

Probability of mismatch under given model.

Return type:

float

main()[source]

Executable to calculate post-mortem degradation scores for SAM files.

See --help for details on expected arguments. Takes input only on STDIN. Logs messages to STDERR and writes processed SAM file to STDOUT.

phred2prob(quality)[source]

Convert PHRED score to probability of sequencing error.

Parameters:quality (int) – PHRED score.
Returns:The probability of a sequencing error.
Return type:float
prob(pos, model, qual, poly, match=True)[source]

Calculate probability of either match or mismatch under given model.

Expects distance of base from end of read, a pre-computed distribution of damage probabilities, PHRED score of base (probability of sequencing error) and probability of true polymorphism.

Parameters:
  • pos (int) – Position of base from end of read.
  • model (list) – Distribution of damage probabilities.
  • qual (int) – PHRED score of base.
  • poly (float) – Probability of a true polymorphism at this base.
  • match (bool, optional) – Switch between calculation for a match (default) or mismatch.
Returns:

Probability of either match or mismatch under given model.

Return type:

float

prob2ascii(probability, offset=33)[source]

Convert probability of sequencing error to ASCII-encoded PHRED score.

By default uses ASCII offset 33 (Illumina standard).

Parameters:
  • probability (float) – The probability of a sequencing error.
  • offset (int, optional) – ASCII encoding offset, defaults to 33.
prob2phred(probability)[source]

Convert probability of sequencing error to PHRED score.

Parameters:probability (float) – The probability of a sequencing error.
Returns:PHRED score.
Return type:float
score_pmd(read, ref, quals, ancient_model_deam, modern_model_deam, adjustment_model_deam=None, polymorphism_contamination=0.001, polymorphism_ancient=0.001, adjustbaseq_all=False, adjustbaseq=False, baseq=0, deamination=False, cpg=False, nocpg=False, udg_half=False, pmds_prim=False)[source]

Calculate post-mortem degradation score (PMDS).

Requires full read and reference sequence including clips, skips and gaps as recovered from CIGAR and MD to use the correct distance from 5’ or 3’ end in the PMDS calculation. However only the aligned read bases with PHRED scores are considered for PMD scoring.

Optional adjustment of base qualities requires additional adjustment model.

Parameters:
  • read (str) – Full read sequence.
  • ref (str) – Aligned reference sequence including clips, skips and gaps.
  • quals (list) – Quality scores for aligned bases.
  • ancient_model_deam (list) – Probability distribution for ancient deamination.
  • modern_model_deam (list) – Probability distribution for modern deamination.
  • adjustment_model_deam (list, optional) – Adjustments to the deamination probabilities, defaults to None.
  • polymorphism_contamination (float, optional) – Probability of polymorphism due to contamination, defaults to 0.001.
  • polymorphism_ancient (float, optional) – Probability of polymorphism due to age, defaults to 0.001.
  • adjustbaseq_all (bool, optional) – Switch to adjust all base quality values, defaults to False.
  • adjustbaseq (bool, optional) – Switch to adjust affected base quality values, defaults to False.
  • baseq (int, optional) – Minimum quality value for processed bases, defaults to 0
  • deamination (bool, optional) – Enable output of base frequencies in deaminated context, defaults to False.
  • cpg (bool, optional) – Only use Cs and Gs in CpG context, defaults to False.
  • nocpg (bool, optional) – Do NOT use Cs and Gs in CpG context, defaults to False.
  • udg_half (bool, optional) – Only use Cs and Gs in CpG context, the first and last base are used regardless of dinucleotide context. Defaults to False.
  • pmds_prim (bool, optional) – Defaults to False.
Returns:

  • float – Likelihood of deamination under ancient DNA model.
  • float – Likelihood of deamination under ancient DNA model.
  • float – Post-mortem degradation score
  • list – Adjusted base qualities.

Table Of Contents

Previous topic

SamSifter API documentation

Next topic

samsifter package

This Page