While students share movies and music, researchers are sharing cancer DNA.
The Cancer Genomics Hub (CGHub) is a new online repository for cancer genomic sequences. Released by UC Santa Cruz professor David Haussler, CGHub will provide access to invaluable information on an unprecedented scale to cancer researchers everywhere.
“This is a time of blossoming, of great creativity in computational genomics,” Haussler said.
CGHub is a database of basic genome, or DNA, sequences from tumorous cancers collected both nationally and internationally. It is not designed to analyze or extrapolate information, Haussler said. Rather, it will serve as a foundation for programs that will, and do, use it for looking at cancer in ways that have been impossible until now.
One in two people born today will be diagnosed with cancer, according to the National Cancer Institute. It is the second leading cause of death in the United States after heart disease, according to the Centers for Disease Control and Prevention, and it is notoriously difficult to fight.
That’s because every case is different, Haussler said.
“Cancer is a disease that is entirely caused by changes in the DNA,” Haussler said. “Certain cells in your body accumulate changes in their genome and it causes them to grow uncontrollably, and that is cancer.”
This makes computational genomics, or the analysis of DNA with a computer, especially important. Databases like CGHub enable huge strides in analyzing the differences and similarities between cancers, Haussler said.
But it’s difficult to make sense of just raw data. That’s where researchers like UCSC’s Josh Stuart come in.
Stuart is one of many researchers looking at the data and examining what’s useful.
“Each one of these 500 samples is like a new car crash. There’s 500 car crashes … and you have some data that you’re given: velocity, what people were saying in the car, location parameters, instrument readings, how much fuel they were burning, that kind of data,” Stuart said. “Somebody drops that ream of data on your desk and asks you, ‘So what went wrong? Why did the cars crash?’ Some of [the data] is relevant, some of it isn’t. You want to figure out what makes them the same: what’s the common theme.”
But such comparisons are no walk in the park. CGHub is currently designed to hold data from 15,000 patients, said Robert Zimmerman, CGHub’s program director, and possibly from 20,000 patients in the future. That’s a lot of data to cover.
“If you’ve ever tried downloading a movie, a 3D Blu-ray movie would be 25-30 gigabytes,” Zimmerman said. “[A] single genome for a single patient would be roughly 10-20 times that size.”
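For a rough sense of scale, Zimmerman’s figures can be multiplied out. The short Python sketch below is only a back-of-the-envelope estimate, not a description of how CGHub actually stores its data. It assumes one genome per patient and decimal units (1 petabyte = 1,000,000 gigabytes), and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate built only from the figures quoted in the article.
# Assumptions (not from CGHub): one genome per patient, 1 PB = 1,000,000 GB.

BLU_RAY_GB = (25, 30)          # size of a 3D Blu-ray movie, low and high
GENOME_MULTIPLIER = (10, 20)   # a genome is "roughly 10-20 times that size"
GB_PER_PB = 1_000_000


def genome_size_gb():
    """Return the (low, high) estimate for a single patient's genome in GB."""
    low = BLU_RAY_GB[0] * GENOME_MULTIPLIER[0]
    high = BLU_RAY_GB[1] * GENOME_MULTIPLIER[1]
    return low, high


def repository_size_pb(patients):
    """Return the (low, high) estimate for the whole repository in PB."""
    low_gb, high_gb = genome_size_gb()
    return patients * low_gb / GB_PER_PB, patients * high_gb / GB_PER_PB


if __name__ == "__main__":
    print("One genome (GB):", genome_size_gb())
    for patients in (15_000, 20_000):
        print(patients, "patients (PB):", repository_size_pb(patients))
```

Under those assumptions, a single genome comes to roughly 250 to 600 gigabytes, and 15,000 patients to somewhere between about 3.75 and 9 petabytes.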
But, with growing interest in the research and databases like CGHub, Haussler doesn’t predict any shortage of new scientists willing to analyze the data.
“There are young geeks coming up through the ranks who are attracted to this, and they want to do something meaningful,” Haussler said. “They want to turn their talents to a really challenging, really important problem, and so cancer genomics is attracting them.”
Stuart and Zimmerman said a database similar to CGHub was originally part of the federal government: the Short Read Archive (SRA), run by the National Center for Biotechnology Information (NCBI). But the SRA was running out of room to store the massive amounts of data, and the NCBI needed money to expand, money it was not granted.
The SRA had already collected substantial amounts of data, and a call went out for someone to take the SRA’s data and continue what NCBI could not.
Having already created the UCSC Genome Browser and the Cancer Genomics Browser, Haussler and his team took on the task.
“David’s a hero,” Stuart said. “He’s doing this out of the kindness of his heart.”
But for Haussler and his team, creating CGHub was no easy process.
“We designed it from scratch, and the system’s challenges were exceptional,” Haussler said. “This was, I would say, among the most difficult projects I’ve been involved in in my career.”
With so much data being used, Haussler said big results and discoveries are on the horizon for cancer research.
“I would say to the readers,” Haussler said, “watch Nature and other scientific journals for some exciting results over the next several months.”