Research reveals insufficient samples from racial minorities to detect moderately common genomic alterations
The current ethnic demography of the U.S. lends truth to the melting pot metaphor: 61.3% white, 17.8% Hispanic, 13.3% black, 5.7% Asian, and 1.5% Native peoples. Cancer researchers have begun evaluating this demographic breakdown against the available genomic data for carcinomas.
For instance, there’s The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute and the National Human Genome Research Institute (NHGRI) to create multi-dimensional maps of the key genomic changes in 33 types of cancer.
More than 11,000 cancer patients have contributed biospecimens for genomic sequencing and analysis to TCGA, with upwards of 500 samples analyzed for each tumor type, including lung cancer. These large datasets are needed to provide statistical power to produce a comprehensive genomic profile of each cancer. A large sample size is also necessary to provide the power to detect mutations against the background rate.
“TCGA project has uncovered numerous uncommon subtypes and mutations across multiple cancer types, and these results are being used to develop new therapies and ultimately improve outcomes for patients with cancer,” noted Joseph Osborne, MD, PhD, of Memorial Sloan Kettering Cancer Center in New York City, and colleagues.
But the data used to study cancer genomics have a weakness: an imbalance in the representation of the various ethnic groups.
Alex Adjei, MD, PhD, editor-in-chief of the Journal of Thoracic Oncology, noted: “The genomic revolution has led to the sequencing of lung cancer specimens in large consortia such as TCGA in the United States, that have provided useful genomic information to drive therapeutic as well as other research in lung cancer. However, tumors from under-represented minorities in the United States such as blacks and Hispanics were under-represented in these samples. This has meant that the mutational profiles of lung cancer from these populations are not accurately documented.”
Researchers have tackled this disparity by taking a closer look at how databases like TCGA and others can increase the sample size to better represent the population. As Osborne’s group remarked, “Without adequate representation of racial minorities within massive sequencing efforts, healthcare disparities may inadvertently be increased, because race-specific mutational patterns are unable to be appreciated.”
‘Avoid Widening the Gap’
“It is probable, but poorly understood, that ethnic diversity is related to the pathogenesis of cancer, and may have an impact on the generalizability of findings from TCGA to racial minorities,” Osborne et al continued. “Despite the important benefits that continue to be gained from genomic sequencing, dedicated efforts are needed to avoid widening the already pervasive gap in healthcare disparities.”
The team reviewed ethnic data in TCGA from 5,729 samples in 10 of the 33 available tumor types, including lung adenocarcinoma and lung squamous cell carcinoma. Using the estimated median somatic mutational frequency for each tumor type and each racial/ethnic group, they calculated how many samples beyond those in TCGA would be needed to detect mutations occurring at 5% and at 10% frequency over the background somatic mutation frequency.
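To make the sample-size reasoning concrete, here is a minimal Python sketch of a generic binomial power calculation. It is not the authors' method: the `min_hits` threshold (how many mutated tumors must be seen before a mutation stands out from the background) and the 90% target power are illustrative assumptions, and the function names are hypothetical.

```python
from math import comb


def detection_power(n_samples, mutation_freq, min_hits):
    """Probability that at least `min_hits` of `n_samples` tumors carry a
    mutation present at `mutation_freq`, under a simple binomial model."""
    p_fewer = sum(
        comb(n_samples, k)
        * mutation_freq ** k
        * (1 - mutation_freq) ** (n_samples - k)
        for k in range(min_hits)
    )
    return 1.0 - p_fewer


def samples_needed(mutation_freq, min_hits, target_power=0.9, max_n=100_000):
    """Smallest cohort size whose detection power reaches `target_power`.

    `min_hits` and `target_power` are assumed values for illustration;
    a real analysis would derive the threshold from the tumor type's
    background mutation rate.
    """
    for n in range(min_hits, max_n + 1):
        if detection_power(n, mutation_freq, min_hits) >= target_power:
            return n
    raise ValueError("target power not reached within max_n samples")


if __name__ == "__main__":
    for freq in (0.10, 0.05):
        print(freq, samples_needed(freq, min_hits=5))
```

Whatever threshold is assumed, the rarer mutation always requires the larger cohort, which is why groups with few samples fall short at the 5% level first.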
For patients of white ethnicity, TCGA is very powerful, the authors said. All tumor types from white patients contained enough samples to detect a 10% mutational frequency. Of the 5,729 samples analyzed by the team, 77% (4,389) came from white patients, an overrepresentation relative to their share of the U.S. population, they pointed out.
“This is in contrast to all other racial ethnicities, for which group-specific mutations with 10% frequency would be detectable only for black patients with breast cancer. Group-specific mutations with 5% frequency would be undetectable in any racial minority, but detectable in white patients for all cancer types except lung (adenocarcinoma and squamous cell carcinoma) and colon cancer.”
The median somatic mutation frequency (per Mb) was 8.1 for lung adenocarcinoma and 9.9 for lung squamous cell carcinoma.
Of the 5,729 TCGA samples analyzed, black patients accounted for 12% (660), Asian patients for 3% (173), Hispanic patients for 3% (149), and patients from Native peoples for less than 0.5% combined.
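The comparison between TCGA shares and U.S. population shares can be sketched in a few lines of Python. The counts and percentages are the ones quoted in this article; the "representation ratio" (TCGA share divided by population share) is an illustrative measure introduced here, not one the authors report, and Native peoples are omitted because no sample count is given.

```python
# Figures quoted in the article; the ratio itself is an illustrative measure.
TCGA_TOTAL = 5729
tcga_counts = {"white": 4389, "black": 660, "Asian": 173, "Hispanic": 149}
us_share_pct = {"white": 61.3, "black": 13.3, "Asian": 5.7, "Hispanic": 17.8}


def representation_ratio(group):
    """TCGA share of a group divided by its share of the U.S. population.

    A value above 1 means the group is overrepresented in the analyzed
    samples; a value below 1 means it is underrepresented.
    """
    tcga_share_pct = 100.0 * tcga_counts[group] / TCGA_TOTAL
    return tcga_share_pct / us_share_pct[group]


if __name__ == "__main__":
    for group in tcga_counts:
        print(f"{group}: {representation_ratio(group):.2f}")
```

A ratio near 1 says nothing about statistical power on its own, since detection depends on the absolute number of samples rather than on the share.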
“As we demonstrate, despite the approximately proportional relative sample size of many demographic minorities within TCGA when compared with the U.S. population, the absolute sample size of these minorities is inadequate to capture even relatively common somatic mutations that are specific to those groups,” the authors wrote. “Still, TCGA can be commended for their enrollment of racial minorities that has been far more successful than many clinical trial efforts.”
The investigators cited non-small cell lung cancer (NSCLC) and the epidermal growth factor receptor (EGFR) mutation as an example of a carcinoma where ethnicity-specific data made a difference. The phase III ISEL trial failed to show a benefit of treatment with gefitinib (Iressa) in a predominantly white cohort. But there was a significant overall survival benefit in Asian patients.
“These observations are explained by the PIONEER study, a multinational epidemiologic prospective study that demonstrated that EGFR mutations are present in 51.4% of stage IIIB or IV lung adenocarcinomas among Asian patients, in contrast to approximately 20% in white and African-American patients,” the researchers said. “Given the potential for disparate tumor biology by race, we must critically evaluate the generalizability of new discoveries to all patients.”
NSCLC and Hispanics
Giuseppe Giaccone, MD, PhD, co-leader of the Experimental Therapeutics Program at the Lombardi Comprehensive Cancer Center of Georgetown University Medical Center in Washington, D.C., and colleagues sought to narrow the Hispanic cancer genomic knowledge gap by assessing EGFR mutations (exons 18-21) among NSCLC patients at seven institutions in the U.S. and Latin America.
Samples were obtained from 642 patients; 75% (480) of the samples had EGFR mutation analysis successfully performed. The ethnic breakdown of the samples was:
- 66% (318) non-Latino whites
- 19% (90) Latino
- 7% (35) non-Latino Asians
- 6% (30) non-Latino blacks
- 2% other races/ethnicities
EGFR mutations were found in 23% (21) of the Latino cohort, with varying frequencies according to the country of origin. Latinos from Peru demonstrated the highest frequency at 37%, followed by the U.S. at 23%, Mexico at 18%, Venezuela at 10%, and Bolivia at 8%.
The researchers found a significant difference in the frequency of EGFR mutations among the different racial and ethnic subgroups analyzed (P < 0.001), with non-Latino Asians having the highest frequency at 57%, followed by Latinos at 23%, non-Latino whites at 19%, and non-Latino blacks at 10%. Patients from Peru had an overall higher frequency of mutations (37%) than all other Latinos (17%), but this difference exhibited only a trend toward significance (P = 0.058).
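Subgroup estimates like these carry wide uncertainty when the subgroup is small. As a rough illustration, the Python sketch below computes a Wilson score interval for the Latino cohort (21 mutated tumors among 90 samples). The choice of the Wilson interval and the 95% level (`z = 1.96`) are assumptions made here for illustration; the study does not report this interval.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level
    (an assumed choice, not one taken from the study).
    """
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    low, high = wilson_interval(21, 90)
    print(f"Latino EGFR mutation frequency: {21 / 90:.1%} "
          f"(approx. 95% interval {low:.1%} to {high:.1%})")
```

The per-country frequencies rest on even fewer samples, so their intervals would be wider still.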
There were two significant study limitations, the authors said: First, Latino patient enrollment in the U.S. was low (30 patients, 7%) despite a study protocol specifically targeted toward Latino enrollment. In addition, although several large Latin American cancer centers participated, they also had low Latino enrollment.
“This problem highlights the significant difficulties of research collaborations with developing countries in which resource constraints, logistic, and legal challenges may significantly affect enrollment,” the authors stated.
Second, the study did not account for the significant racial and genetic differences within the Latino population. The authors did not collect information on race in the Latino population, nor did they perform genetic ancestry analyses or genotype germline ancestry-informative markers that could characterize genetic origin within admixed populations.
“It is possible that we may have primarily sampled a subset of Latino patients with NSCLC, such as the Latino white population. Because this population is similar, in terms of genetic ancestry, to the non-Latino white population in the U.S., this may have obscured a potential difference in EGFR mutation frequency between the two groups.”
The authors acknowledged that Latinos with Native peoples ancestry are of special interest, given that they represent the majority population in Mexico, Central America, and parts of South America (such as Peru), and Latinos from these geographic areas comprise the largest subgroup of Latinos in the U.S.
Citing the high frequency (37%) of EGFR mutations found in Peru, the investigators said this may indicate a higher EGFR mutation frequency among Latinos with a high proportion of Native peoples ancestry. Alternatively, it may reflect sampling of the Peruvian population of Chinese and Japanese descent, which is among the largest in Latin America.
“Although we did not observe a difference in the frequency of EGFR mutations between Latinos and non-Latinos, our results should be interpreted with caution, given the significant limitations of the study,” the researchers wrote.
Latino Lung Registry
In an effort to address the lack of genomic data from Hispanic/Latino patients with lung cancer, the Latino Lung Cancer Registry was recently established. It is a multinational effort among the University of South Florida in Tampa; Ponce Health Sciences University in Ponce, Puerto Rico; and Universidad Peruana Cayetano Heredia in Lima, Peru.
The registry currently has NSCLC tumor samples from 163 Hispanic/Latino patients. The ethnic background of the Hispanic/Latino patients in the registry is reported as 67% European, 21% Native peoples, and 12% African. Patients are clustered into ancestral groups on the basis of ancestry informative marker analyses.
In another study, Nicholas Gimbrone, of H. Lee Moffitt Cancer Center and Research Institute in Tampa, FL, and colleagues from the Latino Lung Cancer Registry performed targeted exome sequencing of the registry samples to determine how ethnicity may affect the genetic aberrations found in NSCLC. The mutation frequencies were compared with those in a similar cohort of non-Hispanic white (NHW) patients. Among the 120 adenocarcinomas in the Hispanic/Latino group, 31% had EGFR mutations versus 17% in the NHW group (P < 0.001).
“Our data suggest that the increase in EGFR mutations within our [Hispanic/Latino] cohort is driven by females, with 48% having EGFR mutations,” the authors wrote.
In addition, they said, the data suggest that, relative to European ancestry, Native peoples ancestry correlates with low rates of tumor protein p53 (TP53) and serine/threonine kinase 11 (STK11) mutations and high rates of EGFR mutations, and that African ancestry correlates with low rates of KRAS mutations.
“This observation may point to a connection to a genetic component from Asia-Pacific migration, because it is known that the EGFR mutation rate is high among Asian patients,” the authors stated.
Study limitations were the various tissue sources and sequencing technologies utilized by the participating institutions; the incomplete clinical data for several samples; and the lack of age, tumor stage, and outcome data for all patients. “Substantial variation in the distribution of sex, smoking history, and ancestry is evident between the three Latino cohorts included in the registry, indicating the potential complexity in untangling the factors contributing to driver mutation frequencies,” stated Ann Schwartz, PhD, and Donovan Watza, both from the Barbara Ann Karmanos Cancer Institute in Detroit, in an accompanying editorial.
They noted that the formation of the Latino Lung Cancer Registry represents progress in addressing the dearth of genomic cancer data in the Hispanic/Latino population, but cautioned that additional funding, enrollment, and data collection will be necessary for this registry to reach maturity.