Until recently, topology was seen as among the most abstract fields of mathematics, one that bore out Henry John Stephen Smith’s 19th-century toast: “Pure mathematics, may it never be of use to anyone!” But now the field, which deals with the shape of many-dimensional objects, has unexpectedly proved its usefulness in, of all places, medicine. Researchers have used topology to discover a new subgroup of breast cancer patients, none of whom died of the disease during 10 years of follow-up. More generally, the method may prove powerful for making sense of the massive, high-dimensional, noisy datasets modern science is producing.
Genetics experiments can produce vast quantities of data — determining the activity of each of the approximately 20,000 genes in a sample of breast cancer tissue, for example. Each sample can be seen as a point in 20,000-dimensional space. But these readings aren’t absolutely accurate, so each point may not be in exactly the right place. That makes plucking information out of that sea of data particularly challenging.
One key is to recognize that “data has shape, and that shape matters,” says mathematician Gunnar Carlsson of Stanford University.
Topology turns out to be especially useful for identifying the shape of noisy data, because it characterizes shapes in a flexible, qualitative way. Squish, twist or enlarge an object and topology will consider it unchanged, as long as you don’t punch holes or glue bits together. So from the perspective of topology, a coffee cup and a doughnut have the same shape: Squish the cup down, and the handle turns into a doughnutlike ring. This qualitative understanding is well suited to noisy datasets, since the precise location of each data point doesn’t matter.
To show the power of topological methods for data analysis, Carlsson and his colleagues Monica Nicolau of Stanford and Arnold Levine of the Institute for Advanced Study in Princeton, N.J., analyzed gene activity data from about 300 Dutch breast cancer patients.
To turn the discrete data points into a surface that topology could analyze, the researchers calculated how different each breast cancer sample was from normal tissue. They decreed two data points to be close to one another if they had a similar degree of difference from the normal tissue. The scientists then “fattened up” the data points to form a surface, essentially by treating every point within a certain distance of an existing data point as part of the surface.
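A rough Python sketch may make the fattening idea concrete. It is an illustration under stated assumptions, not the researchers' code: the function names, the toy two-dimensional points and the use of ordinary Euclidean distance are all invented for the example. Each point is replaced by a ball of a chosen radius, and the sketch counts how many separate pieces the overlapping balls form.

```python
import math
from itertools import combinations

def deviation_from_normal(sample, normal_profile):
    """Hypothetical measure of difference: Euclidean distance to a 'normal' profile."""
    return math.dist(sample, normal_profile)

def fattened_components(points, radius):
    """Count connected pieces after replacing each point with a ball of `radius`.

    Two balls overlap when their centres are at most 2 * radius apart.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= 2 * radius:
            parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(points))})

# Toy data: two clumps of points on a line (made up for illustration).
toy_points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]

for r in (0.4, 0.6, 5.0):
    print(r, fattened_components(toy_points, r))
```

With a small radius every point stands alone, with a moderate one the two clumps appear, and with a very large one everything merges into a single blob. Choosing that radius is a judgment call in any analysis of this kind.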
The next step was to understand the shape of this 20,000-dimensional surface. Carlsson notes that the crucial details probably fit in many fewer than 20,000 dimensions. A cylinder, for example, lives in three dimensions, but since it can be squished flat it’s topologically equivalent to a circle, which lives in only two. Carlsson’s team created a version of the data in two dimensions that captured essential aspects of the data’s shape, if not every detail.
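One common way to build such a low-dimensional summary is to slice the data into overlapping bands according to a single measurement, cluster the points within each band, and link clusters that share points. The Python sketch below does this on made-up data. Whether it matches the team's actual pipeline is an assumption, and every name, parameter and coordinate in it is hypothetical. Here a point's height stands in for its degree of difference from normal tissue.

```python
import math
from itertools import combinations

def single_linkage(indices, points, eps):
    """Chain together points that lie within eps of one another."""
    clusters = []
    unvisited = set(indices)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def summary_graph(points, filter_values, n_intervals, overlap, eps):
    """Build nodes (clusters per band) and edges (clusters sharing a point).

    Assumes the filter values are not all identical.
    """
    lo, hi = min(filter_values), max(filter_values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        start = lo + k * width - overlap * width
        end = lo + (k + 1) * width + overlap * width
        members = [i for i, f in enumerate(filter_values) if start <= f <= end]
        nodes.extend(single_linkage(members, points, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges

# Made-up Y-shaped data: a stem that splits into a left and a right arm.
stem = [(0.0, 0.5 * k) for k in range(5)]
left = [(-0.5 * k, 2.0 + 0.5 * k) for k in range(1, 5)]
right = [(0.5 * k, 2.0 + 0.5 * k) for k in range(1, 5)]
y_points = stem + left + right
heights = [p[1] for p in y_points]  # stand-in for "difference from normal"

nodes, edges = summary_graph(y_points, heights, n_intervals=4, overlap=0.3, eps=0.8)
print(len(nodes), "nodes,", len(edges), "edges")  # 5 nodes, 4 edges
```

On this toy input the result is a chain of three nodes that forks into two, a small graph with the same Y shape as the 13 original points. The number of bands, the amount of overlap and the clustering distance all affect the picture, so in practice they would need to be varied and checked.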
The resulting shape looked like a Y, with normal patients at the bottom. The right-hand flare consisted of known subgroups of patients with mostly poor prognoses, but the left-hand flare consisted of patients who had not been previously identified as a coherent subgroup.
To see what these patients had in common biologically, Nicolau checked their survival rates. She was shocked: “I saw the best survival curve I’ve seen in my entire life.” Eight percent of the patients fell into the newly identified group, and not one of them had died from their cancer in the 10 years they’d been followed. Further analysis showed that gene activity patterns among these patients were extremely similar, suggesting that the same gene had been mutated in each case. Applying the same analysis to two more groups comprising 134 women yielded the same result.
The impact on breast cancer treatment is not yet clear. Most of the patients in the new subgroup were already known to have good prognoses, so they were unlikely to receive aggressive treatment. Further studies would be needed to determine whether these patients would do just as well with no treatment. A private company that Carlsson cofounded, Ayasdi Inc., is working to bring the result to clinical practice. The scientists are also working to identify subgroups of leukemia patients, in the hope of understanding which treatments are appropriate for which patients.
Since publishing the breast cancer work in the Proceedings of the National Academy of Sciences in April, the researchers have applied their method to many other datasets. Among other things, they tracked the fate of the oil plume after the Gulf of Mexico spill, the disappearance of moderate votes in Congress in 2009–2010 and the number of functional positions that basketball players assume on the court.