TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

TumorTracer : a method to identify the tissue of origin from the somatic mutations of a tumor specimen. / Marquard, Andrea Marion; Birkbak, Nicolai Juul; Thomas, Cecilia Engel; Favero, Francesco; Krzystanek, Marcin; Lefebvre, Celine; Ferté, Charles; Jamal-Hanjani, Mariam; Wilson, Gareth A; Shafi, Seema; Swanton, Charles; André, Fabrice; Szallasi, Zoltan; Eklund, Aron Charles.

In: BMC Medical Genomics, Vol. 8, 01.10.2015, p. 58.


Harvard

Marquard, AM, Birkbak, NJ, Thomas, CE, Favero, F, Krzystanek, M, Lefebvre, C, Ferté, C, Jamal-Hanjani, M, Wilson, GA, Shafi, S, Swanton, C, André, F, Szallasi, Z & Eklund, AC 2015, 'TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen', BMC Medical Genomics, vol. 8, article no. 58. https://doi.org/10.1186/s12920-015-0130-0

APA

Marquard, A. M., Birkbak, N. J., Thomas, C. E., Favero, F., Krzystanek, M., Lefebvre, C., Ferté, C., Jamal-Hanjani, M., Wilson, G. A., Shafi, S., Swanton, C., André, F., Szallasi, Z., & Eklund, A. C. (2015). TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen. BMC Medical Genomics, 8, 58. https://doi.org/10.1186/s12920-015-0130-0

Vancouver

Marquard AM, Birkbak NJ, Thomas CE, Favero F, Krzystanek M, Lefebvre C et al. TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen. BMC Medical Genomics. 2015 Oct 1;8:58. https://doi.org/10.1186/s12920-015-0130-0

Author

Marquard, Andrea Marion ; Birkbak, Nicolai Juul ; Thomas, Cecilia Engel ; Favero, Francesco ; Krzystanek, Marcin ; Lefebvre, Celine ; Ferté, Charles ; Jamal-Hanjani, Mariam ; Wilson, Gareth A ; Shafi, Seema ; Swanton, Charles ; André, Fabrice ; Szallasi, Zoltan ; Eklund, Aron Charles. / TumorTracer : a method to identify the tissue of origin from the somatic mutations of a tumor specimen. In: BMC Medical Genomics. 2015 ; Vol. 8. p. 58.

Bibtex

@article{948f352c2bfb4b19ab9ceb527a837009,
title = "{TumorTracer}: a method to identify the tissue of origin from the somatic mutations of a tumor specimen",
abstract = "BACKGROUND: A substantial proportion of cancer cases present with a metastatic tumor and require further testing to determine the primary site; many of these are never fully diagnosed and remain cancer of unknown primary origin (CUP). It has been previously demonstrated that the somatic point mutations detected in a tumor can be used to identify its site of origin with limited accuracy. We hypothesized that higher accuracy could be achieved by a classification algorithm based on the following feature sets: 1) the number of nonsynonymous point mutations in a set of 232 specific cancer-associated genes, 2) frequencies of the 96 classes of single-nucleotide substitution determined by the flanking bases, and 3) copy number profiles, if available. METHODS: We used publicly available somatic mutation data from the COSMIC database to train random forest classifiers to distinguish among those tissues of origin for which sufficient data was available. We selected feature sets using cross-validation and then derived two final classifiers (with or without copy number profiles) using 80 \% of the available tumors. We evaluated the accuracy using the remaining 20 \%. For further validation, we assessed accuracy of the without-copy-number classifier on three independent data sets: 1669 newly available public tumors of various types, a cohort of 91 breast metastases, and a set of 24 specimens from 9 lung cancer patients subjected to multiregion sequencing. RESULTS: The cross-validation accuracy was highest when all three types of information were used. On the left-out COSMIC data not used for training, we achieved a classification accuracy of 85 \% across 6 primary sites (with copy numbers), and 69 \% across 10 primary sites (without copy numbers). Importantly, a derived confidence score could distinguish tumors that could be identified with 95 \% accuracy (32 \%/75 \% of tumors with/without copy numbers) from those that were less certain. Accuracy in the independent data sets was 46 \%, 53 \% and 89 \% respectively, similar to the accuracy expected from the training data. CONCLUSIONS: Identification of primary site from point mutation and/or copy number data may be accurate enough to aid clinical diagnosis of cancers of unknown primary origin.",
keywords = "Breast Neoplasms/genetics, Databases, Genetic, Female, Genes, Neoplasm, Humans, Lung Neoplasms/genetics, Male, Organ Specificity, Point Mutation, Polymorphism, Single Nucleotide",
author = "Marquard, {Andrea Marion} and Birkbak, {Nicolai Juul} and Thomas, {Cecilia Engel} and Francesco Favero and Marcin Krzystanek and Celine Lefebvre and Charles Fert{\'e} and Mariam Jamal-Hanjani and Wilson, {Gareth A} and Seema Shafi and Charles Swanton and Fabrice Andr{\'e} and Zoltan Szallasi and Eklund, {Aron Charles}",
year = "2015",
month = oct,
day = "1",
doi = "10.1186/s12920-015-0130-0",
language = "English",
volume = "8",
pages = "58",
journal = "BMC Medical Genomics",
issn = "1755-8794",
publisher = "BioMed Central Ltd.",

}

RIS

TY  - JOUR
T1  - TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen
AU  - Marquard, Andrea Marion
AU  - Birkbak, Nicolai Juul
AU  - Thomas, Cecilia Engel
AU  - Favero, Francesco
AU  - Krzystanek, Marcin
AU  - Lefebvre, Celine
AU  - Ferté, Charles
AU  - Jamal-Hanjani, Mariam
AU  - Wilson, Gareth A
AU  - Shafi, Seema
AU  - Swanton, Charles
AU  - André, Fabrice
AU  - Szallasi, Zoltan
AU  - Eklund, Aron Charles
PY  - 2015/10/1
Y1  - 2015/10/1
N2  - BACKGROUND: A substantial proportion of cancer cases present with a metastatic tumor and require further testing to determine the primary site; many of these are never fully diagnosed and remain cancer of unknown primary origin (CUP). It has been previously demonstrated that the somatic point mutations detected in a tumor can be used to identify its site of origin with limited accuracy. We hypothesized that higher accuracy could be achieved by a classification algorithm based on the following feature sets: 1) the number of nonsynonymous point mutations in a set of 232 specific cancer-associated genes, 2) frequencies of the 96 classes of single-nucleotide substitution determined by the flanking bases, and 3) copy number profiles, if available. METHODS: We used publicly available somatic mutation data from the COSMIC database to train random forest classifiers to distinguish among those tissues of origin for which sufficient data was available. We selected feature sets using cross-validation and then derived two final classifiers (with or without copy number profiles) using 80 % of the available tumors. We evaluated the accuracy using the remaining 20 %. For further validation, we assessed accuracy of the without-copy-number classifier on three independent data sets: 1669 newly available public tumors of various types, a cohort of 91 breast metastases, and a set of 24 specimens from 9 lung cancer patients subjected to multiregion sequencing. RESULTS: The cross-validation accuracy was highest when all three types of information were used. On the left-out COSMIC data not used for training, we achieved a classification accuracy of 85 % across 6 primary sites (with copy numbers), and 69 % across 10 primary sites (without copy numbers). Importantly, a derived confidence score could distinguish tumors that could be identified with 95 % accuracy (32 %/75 % of tumors with/without copy numbers) from those that were less certain. Accuracy in the independent data sets was 46 %, 53 % and 89 % respectively, similar to the accuracy expected from the training data. CONCLUSIONS: Identification of primary site from point mutation and/or copy number data may be accurate enough to aid clinical diagnosis of cancers of unknown primary origin.
AB  - BACKGROUND: A substantial proportion of cancer cases present with a metastatic tumor and require further testing to determine the primary site; many of these are never fully diagnosed and remain cancer of unknown primary origin (CUP). It has been previously demonstrated that the somatic point mutations detected in a tumor can be used to identify its site of origin with limited accuracy. We hypothesized that higher accuracy could be achieved by a classification algorithm based on the following feature sets: 1) the number of nonsynonymous point mutations in a set of 232 specific cancer-associated genes, 2) frequencies of the 96 classes of single-nucleotide substitution determined by the flanking bases, and 3) copy number profiles, if available. METHODS: We used publicly available somatic mutation data from the COSMIC database to train random forest classifiers to distinguish among those tissues of origin for which sufficient data was available. We selected feature sets using cross-validation and then derived two final classifiers (with or without copy number profiles) using 80 % of the available tumors. We evaluated the accuracy using the remaining 20 %. For further validation, we assessed accuracy of the without-copy-number classifier on three independent data sets: 1669 newly available public tumors of various types, a cohort of 91 breast metastases, and a set of 24 specimens from 9 lung cancer patients subjected to multiregion sequencing. RESULTS: The cross-validation accuracy was highest when all three types of information were used. On the left-out COSMIC data not used for training, we achieved a classification accuracy of 85 % across 6 primary sites (with copy numbers), and 69 % across 10 primary sites (without copy numbers). Importantly, a derived confidence score could distinguish tumors that could be identified with 95 % accuracy (32 %/75 % of tumors with/without copy numbers) from those that were less certain. Accuracy in the independent data sets was 46 %, 53 % and 89 % respectively, similar to the accuracy expected from the training data. CONCLUSIONS: Identification of primary site from point mutation and/or copy number data may be accurate enough to aid clinical diagnosis of cancers of unknown primary origin.
KW  - Breast Neoplasms/genetics
KW  - Databases, Genetic
KW  - Female
KW  - Genes, Neoplasm
KW  - Humans
KW  - Lung Neoplasms/genetics
KW  - Male
KW  - Organ Specificity
KW  - Point Mutation
KW  - Polymorphism, Single Nucleotide
U2  - 10.1186/s12920-015-0130-0
DO  - 10.1186/s12920-015-0130-0
M3  - Journal article
C2  - 26429708
VL  - 8
SP  - 58
JO  - BMC Medical Genomics
JF  - BMC Medical Genomics
SN  - 1755-8794
ER  - 
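The abstract's second feature set, the frequencies of the 96 classes of single-nucleotide substitution determined by the flanking bases, can be made concrete with a short Python sketch. It assumes the widely used convention (not spelled out in the abstract) of naming each substitution from the pyrimidine (C or T) reference strand, so that 6 substitution types × 16 combinations of 5′ and 3′ flanking bases give 96 classes. The input format (reference trinucleotide plus alternate base) and the helper names `classify` and `class_frequencies` are hypothetical; this illustrates the feature and is not the TumorTracer code.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed convention: substitutions are named from the pyrimidine (C or T) reference.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# 6 substitution types x 4 possible 5' bases x 4 possible 3' bases = 96 classes,
# labelled like "A[C>T]G".
CLASSES = [f"{five}[{sub}]{three}"
           for sub in SUBSTITUTIONS
           for five, three in product(BASES, repeat=2)]


def classify(context: str, alt: str) -> str:
    """Map one SNV to its class.

    `context` is the reference trinucleotide with the mutated base in the
    middle (e.g. "ACG"); `alt` is the alternate base. Both are hypothetical
    input conventions for this sketch.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set(BASES):
        raise ValueError(f"invalid SNV: {context!r} -> {alt!r}")
    if context[1] == alt:
        raise ValueError("reference and alternate base are identical")
    if context[1] in "AG":
        # Purine reference: reverse-complement the context and complement the
        # alternate base so the class is named from the pyrimidine strand.
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def class_frequencies(mutations):
    """Return the fraction of a tumor's SNVs in each of the 96 classes."""
    counts = Counter(classify(context, alt) for context, alt in mutations)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CLASSES}


if __name__ == "__main__":
    snvs = [("ACG", "T"), ("CGT", "A"), ("TCA", "G")]
    freqs = class_frequencies(snvs)
    print(len(freqs), freqs["A[C>T]G"])  # 96 0.6666666666666666
```

In the example, the G>A change in CGT is the same event as C>T in ACG read from the other strand, so both SNVs fall into A[C>T]G. Normalising counts to frequencies keeps tumors with very different mutation burdens on a comparable scale; whether TumorTracer normalises in exactly this way is an assumption of the sketch.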
