Machine Learning Methylation Profiles Accurately Identify Tissue of Origin in Metastatic Cancers of Unknown Primary

Kindai University researchers use machine learning and DNA methylation to predict cancer origins with 95% accuracy in initial testing, a potential life-saver for patients with cancers of unknown primary (CUP).

By: AXL Media

Published: Apr 21, 2026, 3:59 AM EDT

Source: Information for this report was sourced from the American Association for Cancer Research

Solving the Diagnostic Crisis of Unknown Primaries

Cancers of Unknown Primary (CUP) represent a significant clinical challenge: these metastatic malignancies are identified only after they have spread, and the original tumor site remains elusive. Patients diagnosed with CUP typically face poor prognoses, often surviving only six to nine months, as they are frequently treated with broad, nonspecific chemotherapy. According to Dr. Marco A. De Velasco of Kindai University, only a small fraction of these patients receive site-specific therapies. A molecular diagnostic tool that pinpoints the origin of these cells is critical, as targeted treatment can extend patient survival to 24 months or more.

DNA Methylation as a Molecular Fingerprint

The research team focused on CpG DNA methylation, a chemical tag added to cytosine bases that sit directly before guanine bases in the DNA sequence. These marks remain remarkably stable even as cancer cells migrate throughout the body, and their patterns act as a biological "fingerprint" unique to the tissue where the cancer first developed. By leveraging these tissue-specific markers, the researchers aimed to bridge the gap in current molecular profiling, which has historically struggled to translate complex genomic data into clear survival benefits within a clinical setting.

Streamlining Prediction with Machine Learning

In a major departure from previous models that required massive, unwieldy datasets, De Velasco’s team used machine learning to distill hundreds of thousands of genomic regions into a practical subset of approximately 1,000 CpG markers. The streamlined model was trained on data from nearly 7,500 patients across 21 cancer types, sourced from The Cancer Genome Atlas. It achieved 95% accuracy in the initial test cohort and 87% accuracy when applied to an independent validation group from the researchers' own institution, which covered 17 distinct cancer types.
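For readers who want a feel for the general idea, the toy Python sketch below reduces a large set of CpG sites to a small informative panel and then classifies held-out samples by tissue. It is not the Kindai team's method, which the report does not describe in detail. Everything in it is an assumption made for illustration: the data are simulated, the three tissue labels are hypothetical, the ranking score (between-tissue over within-tissue variance) and the nearest-centroid classifier are simple stand-ins, and the sizes are far smaller than the roughly 1,000 markers and 7,500 patients reported.

```python
import random
from statistics import mean, pvariance

# Hypothetical setup: 3 tissue types, 300 CpG sites, 10 informative sites each.
TISSUES = ["lung", "colon", "breast"]
N_CPGS = 300
INFORMATIVE = {"lung": range(0, 10), "colon": range(10, 20), "breast": range(20, 30)}


def simulate(n_per_tissue, rng):
    """Simulate (beta_values, tissue) pairs; beta values are clamped to [0, 1]."""
    samples = []
    for tissue in TISSUES:
        for _ in range(n_per_tissue):
            betas = []
            for i in range(N_CPGS):
                # Informative sites are more methylated in their own tissue.
                center = 0.85 if i in INFORMATIVE[tissue] else 0.5
                betas.append(min(1.0, max(0.0, rng.gauss(center, 0.1))))
            samples.append((betas, tissue))
    return samples


def select_cpgs(train, k):
    """Keep the k sites with the highest between/within-tissue variance ratio."""
    scores = []
    for i in range(N_CPGS):
        groups = [[b[i] for b, t in train if t == tissue] for tissue in TISSUES]
        between = pvariance([mean(g) for g in groups])
        within = mean(pvariance(g) for g in groups) + 1e-9  # avoid divide-by-zero
        scores.append((between / within, i))
    return sorted(i for _, i in sorted(scores, reverse=True)[:k])


def fit_centroids(train, cpgs):
    """Average methylation per tissue over the selected sites only."""
    return {
        tissue: [mean(b[i] for b, t in train if t == tissue) for i in cpgs]
        for tissue in TISSUES
    }


def predict(betas, cpgs, centroids):
    """Assign the tissue whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((betas[i] - m) ** 2 for i, m in zip(cpgs, centroid))
    return min(centroids, key=lambda tissue: dist(centroids[tissue]))


def accuracy(samples, cpgs, centroids):
    hits = sum(predict(b, cpgs, centroids) == t for b, t in samples)
    return hits / len(samples)


rng = random.Random(0)
train = simulate(40, rng)   # stands in for the training cohort
test = simulate(20, rng)    # stands in for a held-out cohort
cpgs = select_cpgs(train, k=30)
centroids = fit_centroids(train, cpgs)
print(f"selected {len(cpgs)} of {N_CPGS} CpG sites")
print(f"held-out accuracy: {accuracy(test, cpgs, centroids):.2f}")
```

Because the simulated tissue signal is strong and clean, this toy scores far better than real data would allow. The gap between the reported 95% and 87% figures reflects the harder, more realistic case of applying a model to patients from a different institution.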
