Lung Ca and Ethnicity: Why Genomics Needs to Step Up

Research reveals insufficient samples from racial minorities to detect moderately common genomic alterations

The current ethnic demography of the U.S. lends weight to the melting pot metaphor: 61.3% white, 17.8% Hispanic, 13.3% black, 5.7% Asian, and 1.5% Native peoples. Cancer researchers have begun evaluating this demographic breakdown against the available genomic data for carcinomas.

One major source of such data is The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) to create multidimensional maps of the key genomic changes in 33 types of cancer.

More than 11,000 cancer patients have contributed biospecimens to TCGA for genomic sequencing and analysis, with upwards of 500 samples analyzed for each tumor type, including lung cancer. Datasets this large are needed to provide the statistical power to produce a comprehensive genomic profile of each cancer, and to detect mutations that occur more often than the background mutation rate would predict.

“TCGA project has uncovered numerous uncommon subtypes and mutations across multiple cancer types, and these results are being used to develop new therapies and ultimately improve outcomes for patients with cancer,” noted Joseph Osborne, MD, PhD, of Memorial Sloan Kettering Cancer Center in New York City, and colleagues.

But the data used to study cancer genomics have a weakness: the various ethnic groups are not represented in balance.

Alex Adjei, MD, PhD, editor-in-chief of the Journal of Thoracic Oncology, noted: “The genomic revolution has led to the sequencing of lung cancer specimens in large consortia such as TCGA in the United States, that have provided useful genomic information to drive therapeutic as well as other research in lung cancer. However, tumors from under-represented minorities in the United States such as blacks and Hispanics were under-represented in these samples. This has meant that the mutational profiles of lung cancer from these populations are not accurately documented.”

Researchers have begun to tackle this disparity by examining how TCGA and similar databases could increase their sample sizes to better represent the population. As Osborne’s group remarked, “Without adequate representation of racial minorities within massive sequencing efforts, healthcare disparities may inadvertently be increased, because race-specific mutational patterns are unable to be appreciated.”

‘Avoid Widening the Gap’

“It is probable, but poorly understood, that ethnic diversity is related to the pathogenesis of cancer, and may have an impact on the generalizability of findings from TCGA to racial minorities,” Osborne et al. continued. “Despite the important benefits that continue to be gained from genomic sequencing, dedicated efforts are needed to avoid widening the already pervasive gap in healthcare disparities.”

The team reviewed ethnic data in TCGA from 5,729 samples in 10 of the 33 available tumor types, including lung adenocarcinoma and lung squamous cell carcinoma. Using the estimated median somatic mutation frequency for each tumor type by racial or ethnic group, they calculated how many samples beyond those in TCGA would be needed to detect mutations occurring at a frequency 5% or 10% above the background somatic mutation frequency.

For white patients, TCGA is very well powered, the authors said: all tumor types from white patients contained enough samples to detect a 10% mutational frequency. Of the 5,729 samples analyzed by the team, 77% (4,389) came from white patients, an overrepresentation compared with their share of the U.S. population, they pointed out.

“This is in contrast to all other racial ethnicities, for which group-specific mutations with 10% frequency would be detectable only for black patients with breast cancer. Group-specific mutations with 5% frequency would be undetectable in any racial minority, but detectable in white patients for all cancer types except lung (adenocarcinoma and squamous cell carcinoma) and colon cancer.”

The median somatic mutation frequency was 8.1 mutations per megabase (Mb) for lung adenocarcinoma and 9.9 per Mb for lung squamous cell carcinoma.
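Why absolute sample size matters can be illustrated with a deliberately simplified power calculation. The sketch below is not the authors' method. It assumes passenger mutations land uniformly at random (a Poisson model), so a hypothetical 1,500-bp gene is mutated by chance in a fraction p0 of tumors, and it searches for the smallest cohort in which a gene mutated in p0 + 10% or p0 + 5% of tumors would be flagged. The gene length, the significance threshold (0.05 Bonferroni-corrected over a hypothetical 20,000 genes), and the 90% power target are illustrative assumptions; only the 8.1 and 9.9 per Mb medians come from the study. Because the model ignores variation in mutation rate between patients and across the genome, it likely understates the numbers a full analysis would require.

```python
import math


def log_binom_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p), computed with lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def upper_tails(n, p):
    """tails[k] = P(X >= k) for k = 0..n+1 (tails[n + 1] is 0)."""
    tails = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        tails[k] = tails[k + 1] + math.exp(log_binom_pmf(k, n, p))
    return tails


def background_rate(mut_per_mb, gene_length_bp):
    """Chance that a gene carries >= 1 random (passenger) mutation in one tumor,
    assuming mutations fall uniformly at random (Poisson model)."""
    expected_hits = mut_per_mb * gene_length_bp / 1e6
    return 1.0 - math.exp(-expected_hits)


def samples_needed(mut_per_mb, excess, gene_length_bp=1500,  # hypothetical gene length
                   alpha=0.05 / 20000,                       # assumed Bonferroni threshold
                   power=0.9, max_n=2000):                   # assumed power target
    """Smallest cohort size n at which a gene mutated in (background + excess)
    of tumors is detected with the requested power. Illustrative only."""
    p0 = background_rate(mut_per_mb, gene_length_bp)
    p1 = p0 + excess
    for n in range(1, max_n + 1):
        null_tails = upper_tails(n, p0)
        # Critical count: fewest mutated tumors that background alone
        # would produce with probability <= alpha.
        k_crit = next(k for k in range(n + 2) if null_tails[k] <= alpha)
        if k_crit > n:
            continue  # cohort too small for any count to reach significance
        if upper_tails(n, p1)[k_crit] >= power:
            return n
    return None


for label, rate in [("lung adenocarcinoma", 8.1), ("lung squamous", 9.9)]:
    for excess in (0.10, 0.05):
        print(label, f"+{excess:.0%}:", samples_needed(rate, excess), "tumors")
```

Halving the excess frequency from 10% to 5% raises the required cohort several-fold in this model, which is the same qualitative pattern the authors report: groups with a few hundred samples or fewer cannot resolve mutations at the 5% level.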

Of the 5,729 TCGA samples analyzed, 12% (660) came from black patients, 3% (173) from Asian patients, 3% (149) from Hispanic patients, and less than 0.5% combined from Native Peoples.
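The difference between proportional and absolute representation can be made concrete by comparing each group's share of the 5,729 samples with the U.S. shares quoted at the top of the article. This sketch uses only those reported numbers and assumes the four groups are mutually exclusive and comparable with the census categories, which may not hold exactly if race and ethnicity were recorded separately.

```python
# Counts reported for the 5,729 TCGA samples analyzed by Osborne et al.
tcga_counts = {"white": 4389, "black": 660, "Asian": 173, "Hispanic": 149}
total_samples = 5729

# U.S. population shares (%) quoted at the start of the article.
us_share = {"white": 61.3, "black": 13.3, "Asian": 5.7, "Hispanic": 17.8}


def representation(counts, total, population):
    """Return {group: (TCGA share %, U.S. share %, ratio)}.
    A ratio above 1 means over-represented, below 1 under-represented.
    Assumes the groups do not overlap (an assumption, see lead-in)."""
    result = {}
    for group, n in counts.items():
        share = 100 * n / total
        result[group] = (share, population[group], share / population[group])
    return result


table = representation(tcga_counts, total_samples, us_share)
for group, (share, pop, ratio) in table.items():
    print(f"{group:9s} TCGA {share:5.1f}%  U.S. {pop:5.1f}%  ratio {ratio:4.2f}")

# Samples not assigned to any of the four listed groups.
unassigned = total_samples - sum(tcga_counts.values())
print("samples outside these four groups:", unassigned)
```

By this crude measure white patients are over-represented, Hispanic patients are the most under-represented group, and 358 samples fall outside the four listed groups. Even where a ratio is close to 1, as for black patients, the absolute count is what limits statistical power.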

“As we demonstrate, despite the approximately proportional relative sample size of many demographic minorities within TCGA when compared with the U.S. population, the absolute sample size of these minorities is inadequate to capture even relatively common somatic mutations that are specific to those groups,” the authors wrote. “Still, TCGA can be commended for their enrollment of racial minorities that has been far more successful than many clinical trial efforts.”

The investigators cited non-small cell lung cancer (NSCLC) and the epidermal growth factor receptor (EGFR) mutation as an example of a carcinoma where ethnicity-specific data made a difference. The phase III ISEL trial failed to show a benefit of treatment with gefitinib (Iressa) in a predominantly white cohort. But there was a significant overall survival benefit in Asian patients.

“These observations are explained by the PIONEER study, a multinational epidemiologic prospective study that demonstrated that EGFR mutations are present in 51.4% of stage IIIB or IV lung adenocarcinomas among Asian patients, in contrast to approximately 20% in white and African-American patients,” the researchers said. “Given the potential for disparate tumor biology by race, we must critically evaluate the generalizability of new discoveries to all patients.”

NSCLC and Hispanics

Giuseppe Giaccone, MD, PhD, co-leader of the Experimental Therapeutics Program at the Lombardi Comprehensive Cancer Center of Georgetown University Medical Center in Washington, D.C., and colleagues sought to narrow the Hispanic cancer genomic knowledge gap by assessing EGFR mutations (exons 18-21) among NSCLC patients at seven institutions in the U.S. and Latin America.

Samples were obtained from 642 patients, and EGFR mutation analysis was successfully performed on 75% (480) of them. The ethnic breakdown of the analyzed samples was:

  • 66% (318) non-Latino whites
  • 19% (90) Latino
  • 7% (35) non-Latino Asians
  • 6% (30) non-Latino blacks
  • 2% other races/ethnicities

EGFR mutations were found in 23% (21) of the Latino cohort, with varying frequencies according to the country of origin. Latinos from Peru demonstrated the highest frequency at 37%, followed by the U.S. at 23%, Mexico at 18%, Venezuela at 10%, and Bolivia at 8%.

The researchers found a significant difference in the frequency of EGFR mutations among the different racial and ethnic subgroups analyzed (P < 0.001), with non-Latino Asians having the highest frequency at 57%, followed by Latinos at 23%, non-Latino whites at 19%, and non-Latino blacks at 10%. Patients from Peru had an overall higher frequency of mutations (37%) than all other Latinos (17%), but this difference exhibited only a trend toward significance (P = 0.058).

There were two significant study limitations, the authors said: First, Latino patient enrollment in the U.S. was low (30 patients, 7%) despite a study protocol specifically targeted toward Latino enrollment. In addition, although several large Latin American cancer centers participated, they also had low Latino enrollment.

“This problem highlights the significant difficulties of research collaborations with developing countries in which resource constraints, logistic, and legal challenges may significantly affect enrollment,” the authors stated.

Second, the study did not account for the significant racial and genetic differences within the Latino population. The authors did not collect information on race in the Latino population, nor did they perform genetic ancestry analyses or use germline ancestry informative markers that could characterize genetic origin within admixed populations.

“It is possible that we may have primarily sampled a subset of Latino patients with NSCLC, such as the Latino white population. Because this population is similar, in terms of genetic ancestry, to the non-Latino white population in the U.S., this may have obscured a potential difference in EGFR mutation frequency between the two groups.”

The authors acknowledged that Latinos with Native peoples ancestry are of special interest, given that they represent the majority population in Mexico, Central America, and parts of South America (such as Peru), and Latinos from these geographic areas comprise the largest subgroup of Latinos in the U.S.

Citing the high frequency (37%) of EGFR mutations found in Peru, the investigators suggested that EGFR mutations may be more frequent among Latinos with a high proportion of Native Peoples ancestry. Alternatively, the finding may reflect sampling of Peru’s population of Chinese and Japanese descent, which is among the largest in Latin America.

“Although we did not observe a difference in the frequency of EGFR mutations between Latinos and non-Latinos, our results should be interpreted with caution, given the significant limitations of the study,” the researchers wrote.

Latino Lung Registry

In an effort to address the lack of genomic data from Hispanic/Latino patients with lung cancer, the Latino Lung Cancer Registry was recently established. It is a multinational effort among the University of South Florida in Tampa; Ponce Health Sciences University in Ponce, Puerto Rico; and Universidad Peruana Cayetano Heredia in Lima, Peru.

The registry currently holds NSCLC tumor samples from 163 Hispanic/Latino patients. The ancestral background of these patients is reported as 67% European, 21% Native peoples, and 12% African. Patients are clustered into ancestral groups on the basis of ancestry informative marker analyses.

In another study, Nicholas Gimbrone, of H. Lee Moffitt Cancer Center and Research Institute in Tampa, FL, and colleagues from the Latino Lung Cancer Registry performed targeted exome sequencing of the registry samples to determine how ethnicity may affect the genetic aberrations found in NSCLC. The mutation frequencies were compared with those in a similar cohort of non-Hispanic white (NHW) patients. Among the 120 adenocarcinomas in the Hispanic/Latino group, 31% harbored EGFR mutations, compared with 17% in the NHW group (P < 0.001).

“Our data suggest that the increase in EGFR mutations within our [Hispanic/Latino] cohort is driven by females, with 48% having EGFR mutations,” the authors wrote.

In addition, they said, the data suggest that, relative to European ancestry, Native peoples ancestry correlates with low rates of tumor protein p53 (TP53) and serine/threonine kinase 11 (STK11) mutations and high rates of EGFR mutations, and that African ancestry correlates with low rates of KRAS mutations.

“This observation may point to a connection to a genetic component from Asia-Pacific migration, because it is known that the EGFR mutation rate is high among Asian patients,” the authors stated.

Study limitations included the variety of tissue sources and sequencing technologies used by the participating institutions, incomplete clinical data for several samples, and the lack of age, tumor stage, and outcome data for all patients. “Substantial variation in the distribution of sex, smoking history, and ancestry is evident between the three Latino cohorts included in the registry, indicating the potential complexity in untangling the factors contributing to driver mutation frequencies,” wrote Ann Schwartz, PhD, and Donovan Watza, both of the Barbara Ann Karmanos Cancer Institute in Detroit, in an accompanying editorial.

They noted that the formation of the Latino Lung Cancer Registry represents progress in addressing the dearth of genomic cancer data in the Hispanic/Latino population, but cautioned that additional funding, enrollment, and data collection will be necessary for this registry to reach maturity.

The First Book To Be Encoded in DNA

Two Harvard scientists have produced 70 billion copies of a book in DNA code, and the whole lot is smaller than your thumbnail.

Although 70 billion copies of it exist, very few people have actually read the DNA edition of Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, by George Church and Ed Regis. The reason? It is written in the basic building blocks of life: deoxyribonucleic acid, or DNA.

Church and his colleague Sriram Kosuri, both molecular geneticists at Harvard’s Wyss Institute for Biologically Inspired Engineering, used the book to demonstrate a breakthrough in DNA data storage. By encoding the 53,000-word book (along with 11 JPEG images and a computer program), they packed a thousand times more data into strands of DNA than had ever been encoded before, as reported in the August 17 issue of the journal Science. (To put the number of copies in perspective, 70 billion is more than three times the combined total for the 200 most popular books in the world.)

Part of DNA’s genius is how remarkably compact it is: so dense and energy efficient that one gram of the stuff can hold 455 billion gigabytes. Four grams could, in theory, hold every scrap of data the entire world produces in a year. Couple this with a theoretical lifespan of 3.5 billion years and you have a revolution in data storage, with wide-ranging implications for the amount of information we could record and store.
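As a rough check on the scale of these figures, the sketch below converts the quoted density into larger units. It assumes decimal (SI) prefixes, so 1 GB is 10^9 bytes; it does not verify the density itself or the estimate of the world’s annual data output.

```python
# Figures quoted in the article; decimal (SI) units assumed: 1 GB = 1e9 bytes.
GB = 10**9
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte

bytes_per_gram = 455 * 10**9 * GB   # "455 billion gigabytes" per gram
grams = 4                           # "four grams"

capacity = grams * bytes_per_gram
print(f"1 g = {bytes_per_gram / EB:.0f} exabytes")
print(f"4 g = {capacity / ZB:.2f} zettabytes")
```

On these assumptions, one gram holds 455 exabytes and four grams roughly 1.8 zettabytes.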

Don’t expect your library to transform from paperbacks to vials of DNA anytime soon though. “It took a decade to work out the next generation of reading and writing of DNA – I’ve been working on reading for 38 years, and writing since the 90s,” Church tells TIME.

The actual work of encoding the book into DNA, then decoding and copying it, took only a couple of weeks. “I did it with my own two hands!” says Dr. Church, “which is very rare to have that kind of time to spend doing something like this.” Church and Kosuri took a computer file of Regenesis and converted it into binary code: strings of ones and zeroes. They then translated that code into the four bases of DNA. “The 1s stand for adenine (A) or cytosine (C) and the zero for guanine (G) and thymine (T),” says Kosuri. With a computer program, this translation was simple.
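The one-bit-per-base translation Kosuri describes can be sketched in a few lines. This is an illustrative sketch, not the team’s software: it follows the mapping as quoted (1 to A or C, 0 to G or T), and the rule for choosing between the two candidate bases (switch whenever the same base would otherwise repeat more than three times in a row) is an assumption made here to show why having two bases per bit is useful. Addressing, error correction, and splitting into short DNA fragments are omitted.

```python
ONE_BASES = ("A", "C")   # bases standing for bit 1, per the quoted mapping
ZERO_BASES = ("G", "T")  # bases standing for bit 0
MAX_RUN = 3              # assumed limit on identical consecutive bases


def bytes_to_bits(data: bytes) -> str:
    """Binary string of the data, 8 bits per byte, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def encode(data: bytes) -> str:
    """Map each bit to one base, using the alternate base of the pair
    whenever the preferred one would extend a run past MAX_RUN."""
    strand = []
    for bit in bytes_to_bits(data):
        preferred, alternate = ONE_BASES if bit == "1" else ZERO_BASES
        recent = strand[-MAX_RUN:]
        if len(recent) == MAX_RUN and all(b == preferred for b in recent):
            strand.append(alternate)
        else:
            strand.append(preferred)
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Read each base back as a bit, then regroup the bits into bytes."""
    bits = "".join("1" if base in ONE_BASES else "0" for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


text = "Regenesis".encode("utf-8")
dna = encode(text)
print(dna)
assert decode(dna) == text  # the round trip recovers the original bytes
```

Because both bases in a pair decode to the same bit, the encoder is free to pick whichever keeps the strand easier to synthesize and sequence without losing any information.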

While the future implications and applications are not yet clear, the DNA storage industry is moving at an incredible speed. “Classical electronic technology is moving forward something like 1.5 fold per year,” says Dr. Church, “whereas reading and writing DNA is improving roughly ten fold per year. We’ve already had a million-fold improvement in the past few years, which is shocking.”
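To see what the quoted rates imply, the sketch below assumes, purely for illustration, that the tenfold annual improvement held constant. Under that assumption a million-fold gain takes six years, over which electronics improving 1.5-fold per year would gain only about elevenfold. Real improvement rates are not constant, so this is arithmetic on the quotes, not a forecast.

```python
import math

DNA_RATE = 10.0          # "roughly ten fold per year" (quoted)
ELECTRONIC_RATE = 1.5    # "something like 1.5 fold per year" (quoted)
TARGET = 1_000_000       # the "million-fold improvement" Church mentions

# Years of constant tenfold annual gains needed to reach a million-fold improvement
years = math.log10(TARGET) / math.log10(DNA_RATE)

# Gain for electronics over the same span at 1.5-fold per year
electronic_gain = ELECTRONIC_RATE ** years

print(f"{years:.1f} years at 10x/year gives {TARGET:,}x for DNA")
print(f"same period at 1.5x/year gives {electronic_gain:.1f}x for electronics")
```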

Given that the genomics field has attracted its fair share of criticism — witness, for example, the firestorm that greeted biologist Craig Venter and his colleagues when they created the first synthetic cell in 2010 — there are ethical questions to address. Dr. Church and co-author Ed Regis have decided not to include a DNA insert of the book with the actual paper copy when it comes out in October because of this sensitivity.

“We’re always trying to think proactively about the ethical, social and economic implications in this line of work,” says Dr. Church. He explains that the risks are relatively small, but both he and Dr. Kosuri mention that if it is possible to encode a book in DNA, it is also theoretically possible to encode a virus, though this would be a far-fetched scenario.

“The chances that something bad will come out of this is so small,” says Dr. Kosuri. “If someone really nefarious wanted to make a virus they would have to use a much larger chunk of DNA to encode function.”

Why make 70 billion copies of the book? “Oh that was a bit of fun,” says Dr. Church. “We calculated the total copies of the top 200 books of all time, including A Tale of Two Cities and the Bible and so on, and they add up to about 20 billion. We figured we needed to go well beyond that.”

Source: Time
