In a major advance in astronomy, scientists announced last month that they had observed two neutron stars colliding, a never-before-seen cosmic event that made headlines the world over — and two UC Merced computer scientists were instrumental in making it happen.
Computer science Professor Florin Rusu and fourth-year graduate student Weijie Zhao, coauthors on one of the groundbreaking papers published in Science, see it as a victory for their field as well. The data processing tools they developed enabled astronomers to determine that they were indeed observing something unprecedented.
Astronomers received the first signal of a momentous cosmic event on Aug. 17, when the Laser Interferometer Gravitational-Wave Observatory (LIGO) detected gravitational waves — literal ripples in the fabric of spacetime. That’s when terrestrial observatories began searching for their source.
Ground-based telescopes scoured the skies looking for the luminous bursts of electromagnetic radiation (EMR) that should accompany any cosmic event energetic enough to produce gravitational waves detectable on Earth. But in a universe brimming with objects that produce all kinds of EMR, from low-energy radio waves to high-energy gamma rays, how were astronomers able to find what they were looking for?
The answer, it turns out, is databases.
“Astronomy requires databases with huge amounts of data,” Rusu explained. “You have huge telescopes at observatories like Palomar that scan the sky every night. They take 40 to 80 large, high-dimensional images every hour to produce a database of images for any given point in the sky.”
All of this data is assembled into massive catalogs (e.g., the Sloan Digital Sky Survey and Palomar Transient Factory) that provide astronomers with reference images of each point in space. These references serve as a kind of celestial baseline, letting astronomers know what the sky looks like most of the time. When something unusual appears — a “transient,” in astronomical parlance — scientists compare it to the references in their databases and determine whether they’re seeing something new or if it’s just a false alarm.
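As a rough illustration of the reference comparison described above, the sketch below flags pixels in a new image that are much brighter than the same pixels in a reference image. It is a toy example, not the pipeline astronomers actually use; the function name `find_transients`, the tiny 3x3 images and the brightness threshold are all assumptions made for illustration.

```python
def find_transients(reference, new_image, threshold):
    """Return (row, col) positions where new_image exceeds reference by more than threshold.

    Both images are equally sized lists of rows of brightness values
    (a hypothetical, simplified stand-in for real telescope images).
    """
    hits = []
    for r, (ref_row, new_row) in enumerate(zip(reference, new_image)):
        for c, (ref_val, new_val) in enumerate(zip(ref_row, new_row)):
            if new_val - ref_val > threshold:
                hits.append((r, c))
    return hits


# Hypothetical reference image: what this patch of sky usually looks like.
reference = [
    [10, 11, 10],
    [10, 12, 11],
    [11, 10, 10],
]

# Hypothetical new observation: one pixel has brightened sharply.
new_image = [
    [10, 12, 10],
    [10, 12, 48],
    [11, 10, 9],
]

candidates = find_transients(reference, new_image, threshold=5)
print(candidates)  # [(1, 2)]
```

Small fluctuations, such as the pixel that rises from 11 to 12, stay below the threshold and are ignored, so only the sharply brightened pixel is reported as a candidate.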
But astronomers can’t do this manually — there’s too much data. Instead, they rely on database experts like Rusu and Zhao.
“In one night, you can have over 10,000 candidates,” Rusu said. “We try to reduce this number to somewhere between 10 and 100 candidates. Our focus was identifying possible candidates by applying techniques from array databases.”
Arrays are a common form of multidimensional data, and they happen to be the kind of data that astronomers amass in their celestial catalogs. In this case, astronomers needed to compare images across multiple catalogs to eliminate false positives and arrive at a short list of possible candidates for the neutron star collision. To do so, they used techniques that Rusu and Zhao developed in collaboration with Bin Dong, Kesheng Wu and Peter Nugent of Lawrence Berkeley National Laboratory.
These novel database techniques, the array similarity join operator and views over arrays, were first described in papers presented at the 2016 and 2017 SIGMOD conferences, which are among the most selective in the field of data management. In essence, these techniques let scientists rapidly compare huge amounts of array data across multiple databases by minimizing data transfer, reducing network congestion and eliminating redundant processing.
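To give a feel for what a similarity join does, the sketch below pairs each candidate detection with any catalog object lying within a small radius, then keeps the candidates that match nothing known. It is a minimal single-machine illustration, not the operator described in the SIGMOD papers; the function names, the flat (x, y) coordinates and the grid-bucketing shortcut are assumptions chosen for the example.

```python
from collections import defaultdict
from math import floor, hypot


def build_grid(points, cell):
    """Bucket (x, y) points into square cells of side `cell`."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(floor(x / cell), floor(y / cell))].append((x, y))
    return grid


def similarity_join(candidates, catalog, radius):
    """Return (candidate, catalog_point) pairs at most `radius` apart.

    With cell size equal to `radius`, any match must lie in the candidate's
    own cell or one of its eight neighbors, so most of the catalog is never
    compared against a given candidate.
    """
    grid = build_grid(catalog, radius)
    pairs = []
    for cx, cy in candidates:
        gx, gy = floor(cx / radius), floor(cy / radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for px, py in grid.get((gx + dx, gy + dy), ()):
                    if hypot(cx - px, cy - py) <= radius:
                        pairs.append(((cx, cy), (px, py)))
    return pairs


def unmatched(candidates, catalog, radius):
    """Candidates with no catalog object nearby: the short list to inspect."""
    matched = {cand for cand, _ in similarity_join(candidates, catalog, radius)}
    return [c for c in candidates if c not in matched]


# Hypothetical positions in arbitrary units.
catalog = [(1.0, 1.0), (5.0, 5.0), (9.0, 2.0)]
candidates = [(1.1, 0.9), (5.0, 5.2), (7.0, 7.0)]

shortlist = unmatched(candidates, catalog, radius=0.5)
print(shortlist)  # [(7.0, 7.0)]
```

Two of the three candidates sit next to known catalog objects and are discarded as false alarms, which mirrors the narrowing from thousands of nightly candidates to a short list. Restricting each lookup to nearby cells is one simple way to avoid redundant comparisons; the published techniques address the same problem across distributed arrays.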
“Our techniques let astronomers assess far more candidate images than would otherwise be possible,” Rusu said.
If not for the rapid comparisons that Rusu and Zhao’s database techniques enabled, astronomers might never have found the neutron star collision. But they did, and the astronomical community was able to solve a long-standing mystery about the origin of heavy elements (like gold and platinum) while also gleaning additional insight into gravitational waves. It’s a fundamental advance in astronomy that couldn’t have happened without fundamental advances in computer science.
“For me, it is the astronomical event of a lifetime,” said Daniel Kasen, professor of physics and astronomy at UC Berkeley, who coauthored the paper. “It’s also an incredible moment for the field of scientific computing.”
Rusu and Zhao note that their innovative database techniques are broadly applicable. Scientists in a variety of fields, from astronomy to genetics, are likely to find them useful.
“Our technique is a basic part of what led to this discovery,” Rusu said. “But our technique goes beyond any one domain. If you have large multidimensional data sets and you’re trying to figure out what is similar to what, you can use this.”
The research was funded in part by a Department of Energy Early Career Award. The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory served as the primary supercomputing facility for the research. Additional media coverage of the discovery can be found in the New York Times, Washington Post, Scientific American and Wired.