Filters, tools and wrappers for the manipulation of SAM files.
This package includes a collection of stand-alone filters and tools as well as wrappers for external tools that process SAM files in a POSIX pipeline context. They can be divided into the following groups:
All modules in this package should provide an item() method which returns an instance of samsifter.models.filter.FilterItem representing the tool and its parameters in the SamSifter GUI. This item should also point to either the external command or the entry point of the script, tool or main method that is to be executed. The executable main() may be located within the same module, but this is not required.
New tools or wrappers need to be imported in samsifter.samsifter and registered in its method samsifter.samsifter.MainWindow.populate_filters() to be selectable from the SamSifter main menu and tools dock. For Python scripts it may be convenient to also add an entry point in setup.py so that the Python installation routine automatically creates executables that work on any of the supported operating systems.
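As a rough illustration of the entry point mentioned above, a setup.py could declare a console script along the following lines. The command name `my_filter` and the module path `samsifter.tools.my_filter` are hypothetical placeholders, not names taken from SamSifter; only the `console_scripts` entry point group itself is standard setuptools behaviour.

```python
# Excerpt of what could go into setup.py. The names are hypothetical.
# Each entry has the form "<command name> = <module path>:<function>".
ENTRY_POINTS = {
    "console_scripts": [
        "my_filter = samsifter.tools.my_filter:main",
    ],
}

# In setup.py this mapping would be handed to setuptools, for example:
#     setup(name="...", entry_points=ENTRY_POINTS, ...)
```

On installation, setuptools would then generate a `my_filter` executable that calls the module's `main()` function.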
Wrapper for SAMtools view functionality converting BAM to SAM files.
Wrapper for PMDtools score calculation functionality.
Wrapper for GNU Gzip compression functionality.
Analysing filter step to count reads per taxon.
Wrapper for GNU Gzip decompression functionality.
Filters highly conserved reads in a SAM file.
Identifies reads assigned to multiple taxa with similar identity. Excludes reads mapping to different accessions/taxa with similar alignment scores.
Wrapper for PMDtools identity filter functionality.
Filters reads by a list of QNAMES.
Filters reads by a list of QNAMES (read identifiers) given in a tab-separated CSV file.
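A minimal sketch of such a QNAME filter is shown below. It assumes that the read identifiers sit in the first column of the tab-separated file and that the file has no header row; the function names and the `discard` switch are illustrative and not taken from the SamSifter source.

```python
import csv
import io


def load_qnames(handle):
    """Collect QNAMEs from the first column of a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0] for row in reader if row and row[0]}


def filter_by_qnames(sam_lines, qnames, discard=False):
    """Yield header lines plus reads that are (or are not) in qnames."""
    for line in sam_lines:
        if line.startswith("@"):
            # SAM header lines are always passed through unchanged
            yield line
            continue
        qname = line.split("\t", 1)[0]
        # keep listed reads by default, unlisted reads if discard is set
        if (qname in qnames) != discard:
            yield line


if __name__ == "__main__":
    listed = load_qnames(io.StringIO("read1\tsome annotation\n"))
    sam = ["@HD\tVN:1.0\n", "read1\t0\tref\t1\n", "read2\t0\tref\t1\n"]
    print("".join(filter_by_qnames(sam, listed)), end="")
```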
Wrapper for PMDtools ancient read filter functionality.
Identify reference accessions with uneven coverage in MALT’ed SAM files.
Comes with several methods to create optional plots of coverage and read length distributions.
Warning
Activating the plotting of these distributions for a large input dataset can create I/O problems due to the large amounts of PNG files generated. It will also decrease the performance of this filter considerably and should only be used to troubleshoot filter parameters for small subsets of the data.
Calculate average depth from a coverage distribution.
Optionally ignores uncovered bases (first array element).
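The average depth calculation can be sketched as follows, assuming the coverage distribution is an array in which element i holds the number of reference bases covered exactly i times. The function and parameter names are illustrative and may differ from the actual implementation.

```python
def average_depth(depth_dist, ignore_uncovered=False):
    """Mean coverage depth from a depth histogram.

    depth_dist[i] is assumed to be the number of reference bases that
    are covered by exactly i reads.
    """
    # skip the first element (uncovered bases) if requested
    start = 1 if ignore_uncovered else 0
    bases = sum(depth_dist[start:])
    if bases == 0:
        return 0.0
    aligned = sum(
        depth * count
        for depth, count in enumerate(depth_dist)
        if depth >= start
    )
    return aligned / bases


# 5 uncovered bases, 3 bases at depth 1, 2 bases at depth 2
dist = [5, 3, 2]
mean_all = average_depth(dist)                             # 7 / 10
mean_covered = average_depth(dist, ignore_uncovered=True)  # 7 / 5
```

Ignoring the uncovered bases raises the average because the same number of aligned bases is divided by fewer reference positions.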
Calculates length of reference covered by read from CIGAR operations.
Parameters: | cigar (str) – Unmodified CIGAR string from SAM file. |
---|---|
Returns: | Length of the reference sequence. |
Return type: | int |
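One way to derive the covered reference length is to sum the lengths of all CIGAR operations that consume the reference, which according to the SAM specification are M, D, N, = and X. The sketch below follows that rule; the function name and the handling of an unavailable CIGAR string ("*") are assumptions and may differ from the actual implementation.

```python
import re

# one CIGAR operation: a length followed by an operation code
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# operations that consume bases of the reference sequence
_REF_CONSUMING = frozenset("MDN=X")


def reference_length(cigar):
    """Length of the reference stretch covered by an alignment."""
    if cigar == "*":
        # CIGAR unavailable, e.g. for unmapped reads
        return 0
    return sum(
        int(length)
        for length, op in _CIGAR_OP.findall(cigar)
        if op in _REF_CONSUMING
    )


print(reference_length("10M2I5M3D4S"))  # 10 + 5 + 3 -> 18
```

Insertions (I) and clipped bases (S, H) are part of the read but do not advance the position on the reference, so they are left out of the sum.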
Calculates Gini coefficient and area under Lorenz curve.
The Gini coefficient (also known as the Gini index) is a measure of statistical dispersion. When applied to the distribution of aligned read bases per reference base an even distribution of reads across the reference should have a low Gini coefficient (towards 0) while an alignment with all reads covering the same reference region should have a high Gini coefficient (towards 1).
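The relationship between the Lorenz curve and the Gini coefficient can be illustrated with a small sketch. It assumes a list of per-base coverage depths as input, approximates the area under the Lorenz curve with the trapezoidal rule and uses G = 1 - 2A; the function name and the input format are illustrative, not the signature used in SamSifter.

```python
def gini_coefficient(depths):
    """Return (gini, area under Lorenz curve) for per-base depths."""
    values = sorted(depths)
    total = sum(values)
    if not values or total == 0:
        raise ValueError("need at least one covered base")
    n = len(values)
    area = 0.0
    previous = 0.0
    running = 0
    for value in values:
        running += value
        current = running / total
        # trapezoid between two neighbouring points, width 1 / n
        area += (previous + current) / (2 * n)
        previous = current
    return 1.0 - 2.0 * area, area


even, _ = gini_coefficient([2, 2, 2, 2])    # perfectly even coverage
uneven, _ = gini_coefficient([0, 0, 0, 8])  # all reads on one base
```

For the even case the Lorenz curve coincides with the diagonal (area 0.5, Gini 0). For the uneven case the coefficient is (n - 1) / n = 0.75 and approaches 1 as the reference grows.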
Integrate discrete distribution with stepsize 1 by adding up values.
Returns: | Integral of the distribution between 0 and upper limit. |
---|---|
Return type: | float |
Integrates scaled discrete distribution with arbitrary stepsize.
Returns: | Integral of the distribution between 0 and upper limit. |
---|---|
Return type: | float |
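Both integration helpers can be read as a simple rectangle rule: every value of the discrete distribution contributes a bar of height value and width stepsize. The sketch below makes that assumption explicit. It expects the caller to pass only the values between 0 and the upper limit, and it is not claimed to match the exact signatures or limit handling of the functions documented above.

```python
def integrate(values, stepsize=1.0):
    """Rectangle-rule integral of a discrete distribution.

    With the default stepsize of 1 this reduces to adding up the values.
    """
    return float(sum(values)) * stepsize


dist = [1, 2, 3, 4]
upper_limit = 3  # number of steps to integrate over (illustrative)
unit_area = integrate(dist[:upper_limit])         # 1 + 2 + 3
scaled_area = integrate(dist[:upper_limit], 0.5)  # (1 + 2 + 3) * 0.5
```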
Create item representing this tool in list and tree views.
Returns: | Item for use in item-based list and tree views. |
---|---|
Return type: | FilterItem |
Calculates Lorenz curve from coverage distribution.
Parameters: | depth_dist (array_like) – Coverage depth distribution. |
---|---|
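A Lorenz curve can be built directly from a depth distribution without expanding it to single bases. The sketch below assumes that element i of the distribution counts the reference bases covered exactly i times and returns the cumulative share of reference bases (x) against the cumulative share of aligned read bases (y). The name and the return format are illustrative only.

```python
def lorenz_curve(depth_dist):
    """Return (x, y) points of the Lorenz curve of a depth histogram."""
    total_bases = sum(depth_dist)
    total_aligned = sum(d * c for d, c in enumerate(depth_dist))
    if total_bases == 0 or total_aligned == 0:
        raise ValueError("distribution contains no aligned bases")
    x, y = [0.0], [0.0]
    bases = 0
    aligned = 0
    # depths are visited in ascending order, as a Lorenz curve requires
    for depth, count in enumerate(depth_dist):
        bases += count
        aligned += depth * count
        x.append(bases / total_bases)
        y.append(aligned / total_aligned)
    return x, y


# 2 uncovered bases, 1 base at depth 1, 1 base at depth 2
x, y = lorenz_curve([2, 1, 1])
```

Because all bases within one bin share the same depth, the straight segments between the returned points describe the curve exactly.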
Calculate Lorenz curve from base2base coverage distribution.
Executable to filter SAM files for references with uneven coverage.
See --help for details on expected arguments. Takes input from STDIN or from optional or positional arguments. Logs messages to STDERR and writes processed SAM files to STDOUT.
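The general shape of such a pipeline executable is sketched below: SAM records are read from a file or STDIN, header lines are passed through, log messages go to STDERR and the filtered records go to STDOUT. The `--min-mapq` option and all function names are hypothetical stand-ins; the real filter decides on coverage evenness and has its own arguments.

```python
import argparse
import logging
import sys


def mapq_at_least(threshold):
    """Build a predicate on SAM records (MAPQ is the fifth field)."""
    def keep(line):
        return int(line.split("\t")[4]) >= threshold
    return keep


def process(lines, keep):
    """Yield header lines and all records accepted by keep()."""
    for line in lines:
        if line.startswith("@") or keep(line):
            yield line


def main(argv=None):
    parser = argparse.ArgumentParser(description="SAM filter sketch")
    parser.add_argument("-i", "--input", type=argparse.FileType("r"),
                        default=sys.stdin, help="SAM file (default: STDIN)")
    parser.add_argument("--min-mapq", type=int, default=0,
                        help="hypothetical example threshold")
    args = parser.parse_args(argv)

    # log to STDERR so that STDOUT carries nothing but SAM data
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    written = 0
    for line in process(args.input, mapq_at_least(args.min_mapq)):
        sys.stdout.write(line)
        written += 1
    logging.info("wrote %d lines", written)
    return 0
```

Registered as an entry point or called via `sys.exit(main())`, a script of this shape can be chained with other tools in a pipeline, since it writes only SAM data to STDOUT.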
Creates a bar plot of a cumulative coverage distribution.
Creates a bar plot of a coverage distribution.
Creates a Lorenz curve plot of a coverage distribution.
Creates a Lorenz curve plot of a base2base coverage distribution.
Creates a bar plot of a normalized cumulative coverage distribution.
Includes legend stating average scaled and total depth.
Creates a bar plot of a read length distribution.
Includes scaled expected distribution based on all reads in file.
Creates a bar plot of a scaled cumulative coverage distribution.
Filters references by identity values of assigned reads.
This filter processes reference accessions with too few or too many reads of high or low percent identity in MALT’ed SAM files.
Filter references by a list of accessions.
Filter references with high attribution of ancient reads in a MALT’ed and PMD’ed SAM file.
Filter SAM files for a list of taxon IDs.
Filter taxa with high attribution of ancient reads in a MALT’ed and PMD’ed SAM file.
Wrapper for SAMtools rmdup.
Wrapper for SAMtools view functionality to convert SAM to BAM files.
Wrapper for SAMtools sort functionality for sorting reads by coordinates.