SigMA (Signature Multivariate Analysis): We developed SigMA to detect mutational signatures from the single-nucleotide variant (SNV) calls of whole-genome, exome, or targeted gene panel sequencing data. A detailed description of the algorithm and its performance is provided in the related manuscript.
In brief, SigMA consists of five main steps. First, mutational signatures are discovered in whole-genome sequencing (WGS) data using non-negative matrix factorization (NMF). Second, tumors are clustered into subtypes according to their signature composition, and these subtypes serve as a reference for panels. Third, we simulate cancer-gene panels and exomes from the WGS data. In these simulations, the labels (whether a tumor is truly Signature 3-positive or -negative) are known from the signature analysis of the WGS data. Fourth, we calculate the likelihood measure, the cosine similarity, and the exposure of Signature 3 obtained with non-negative least squares (NNLS) for the simulated panels, exomes, and WGS data. Finally, we train Gradient Boosting Classifiers (GBCs) specific to each tumor type and sequencing platform, using the features from step 4. The GBCs yield a final combined score, the SigMA score. Using the simulated data and the true labels from the WGS analysis, we determine thresholds on the SigMA score that correspond to low false positive rates. The thresholds depend on the tumor type and on the platform.
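As an illustration of two ideas from the steps above, the following minimal Python sketch computes a cosine similarity between a sample's mutation spectrum and a signature, and picks a score threshold from labeled simulated data so that the false positive rate stays at or below a chosen target. It is not SigMA's implementation: the function names, the toy vectors, and the target rate are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length non-negative vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an empty spectrum
    return dot / (norm_a * norm_b)


def threshold_for_fpr(scores, labels, max_fpr):
    """Return a cut such that calling `score > cut` positive yields a
    false positive rate of at most `max_fpr` on the labeled negatives."""
    negatives = sorted(
        (s for s, is_pos in zip(scores, labels) if not is_pos), reverse=True
    )
    if not negatives:
        raise ValueError("need at least one negative sample")
    # Number of negatives we may tolerate above the cut.
    allowed = min(int(max_fpr * len(negatives)), len(negatives) - 1)
    # At most `allowed` negatives score strictly above this value.
    return negatives[allowed]


# Toy data (hypothetical): a 4-category spectrum instead of a real one.
signature = [0.1, 0.4, 0.4, 0.1]
sample = [2, 9, 7, 3]
similarity = cosine_similarity(sample, signature)

# Hypothetical simulated scores with known labels (True = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, False, False, False]
cut = threshold_for_fpr(scores, labels, max_fpr=0.2)
calls = [s > cut for s in scores]
```

In this sketch a separate threshold would be computed for each tumor type and platform by passing the corresponding subset of simulated scores and labels.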
Go to the "Get Started" tab. To upload your data, browse and select a file or directory as shown in the figure.