How A Look At Your Gmail Reveals The Power Of Metadata
Sometimes you have to give up a little privacy in order to find out how much — or how little — privacy you really have. So I handed over the keys to my Gmail account to Cesar Hidalgo, a professor at the MIT Media Lab and the designer of a program called Immersion.
Immersion looks at the volume and frequency of your email traffic. It looks at what's known as metadata: to/from information. In this sense, it operates similarly to the program the National Security Agency uses to collect data on most U.S. phone traffic, as part of its effort to fight terrorism. Neither the NSA nor Immersion can see what you are actually saying to your friends, family and work colleagues. But it's surprising just how much you can see simply by looking at metadata.
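To make the idea concrete, here is a minimal sketch of the kind of counting that to/from metadata allows. It is not Immersion's actual code; the function name `contact_volumes`, the addresses, and the sample records are all hypothetical, and only sender and recipient fields are used.

```python
from collections import Counter

def contact_volumes(messages, owner):
    """Count messages sent to and received from each contact.

    `messages` is a list of (sender, recipient) pairs, i.e. header
    metadata only. No subject lines or message bodies are read.
    """
    sent = Counter()
    received = Counter()
    for sender, recipient in messages:
        if sender == owner:
            sent[recipient] += 1
        elif recipient == owner:
            received[sender] += 1
    contacts = set(sent) | set(received)
    return {c: (sent[c], received[c]) for c in contacts}

# Hypothetical header records (from, to), invented for illustration.
messages = [
    ("me@example.com", "anita@example.com"),
    ("anita@example.com", "me@example.com"),
    ("me@example.com", "anita@example.com"),
    ("me@example.com", "seth@example.com"),
]

volumes = contact_volumes(messages, "me@example.com")
# Rank contacts by total traffic, busiest first.
ranked = sorted(volumes, key=lambda c: sum(volumes[c]), reverse=True)
```

Even this crude tally puts the busiest correspondent first, which is roughly what the size of a circle conveys in a graphic like the one described here.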
A few minutes after looking at my Immersion profile, Hidalgo had my number. Like a fortune teller, he could immediately ferret out my closest relationships.
For example, my correspondence with my girlfriend, Anita, put her right at the center of my Immersion profile. She appears as a big blue circle. By sorting our emails by month and year, Hidalgo could see that we met four years ago. He could see that things started out slowly, and as we became closer, messages flew back and forth at a faster rate.
"It was a little bit timid in the beginning, and then the relationship intensified," Hidalgo said with a wink.
A look at my correspondence with my son showed rather one-sided exchanges that should be familiar to any parent. "For every five emails that you send, he gives you two back," Hidalgo said. Again, a very accurate reading of a father's efforts to stay in touch with a wonderful son who is not the best correspondent. Hidalgo could see all that, based on just a few hundred emails.
The analogy to the NSA program is, of course, not perfect. In some ways, NSA investigators know less than Hidalgo does. The government says investigators cannot see identifying information; they just see connections between anonymous numbers.
My Gmail account, on the other hand, conveniently reveals the screen names of my contacts, and that often helps identify who is male and who is female. Leaks from Edward Snowden tell us that the NSA is able to follow these trails much further than Immersion can. The NSA can pursue three "hops," so it learns not just the names of my contacts, but those of my contacts' contacts, and even my contacts' contacts' contacts.
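The notion of "hops" can be sketched as a breadth-first walk over a contact graph. This is an illustration under stated assumptions, not a description of any agency's tooling; the function `within_hops` and the tiny graph below are hypothetical.

```python
from collections import deque

def within_hops(graph, start, max_hops=3):
    """Return {person: hop_count} for everyone reachable from `start`
    in at most `max_hops` steps through the contact graph."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for contact in graph.get(person, ()):
            if contact not in seen:
                seen[contact] = seen[person] + 1
                queue.append(contact)
    del seen[start]  # the starting person is not their own contact
    return seen

# Hypothetical contact graph: each key lists who that person communicates with.
graph = {
    "me": ["a"],
    "a": ["b"],
    "b": ["c"],
    "c": ["d"],
}

reached = within_hops(graph, "me", max_hops=3)
```

In this toy graph, "a" is one hop away, "b" two and "c" three, while "d" sits at four hops and falls outside the limit. In a real network, where each person has many contacts, the number of people within three hops grows very quickly.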
Hidalgo notes that even without identifying information attached to these communications, the patterns he sees in metadata are unique. Few people have the exact same communications pattern with my girlfriend, my son, and with my close friends. So that information could potentially be used to identify me.
Just imagine how powerful this metadata could be when paired with other forms of metadata: Web-surfing information, credit card purchases, even perhaps cellphone location information, which some regard as the new frontier of metadata.
As many of us now know, our cellphones send out a signal, telling our provider where we are throughout the day. As with phone records, the government has argued that the location information from your cellphone belongs to the telecommunications company you subscribe to. The courts have agreed, saying these records are not subject to Fourth Amendment protections.
Thanks to electronic gadgets and our increasing use of the Internet, the quantity of metadata is exploding. And as Hidalgo's Immersion research shows, our understanding of what that metadata means is also advancing rapidly. The question is whether laws to protect that information are keeping up.
RENEE MONTAGNE, HOST:
More now on the debate over surveillance and privacy. The government says it's merely been collecting our telephone metadata, not actually eavesdropping on our conversations or reading our emails. Metadata are the lists of phone numbers called, when the calls were placed, and how long they lasted. Yesterday on this program we heard why this metadata is not usually considered private. But sometimes metadata, whether seemingly anonymous phone or email logs, can tell you a lot.
NPR's Larry Abramson found this out for himself.
And Larry, you've brought up an image on our computer here in the studio. And what are we looking at? I'll just say right off, it's a bunch of circles, colored different colors with lines, connected by lines. Looks to me, at first glance, like an organizational chart.
LARRY ABRAMSON, BYLINE: Well, in some ways it is. This is actually my life as represented through my Gmail account. This graphic shows all of the emails that I've sent to other people, emails that they've sent to me. And I decided to hand over the keys to this particular Gmail account to a guy named Cesar Hidalgo. He's a professor at the MIT Media Lab and he's devised a program called Immersion.
With one click, Immersion takes your email and turns it into this graphic according to how many emails you've sent to your friends and how many they've sent you. And within seconds - it was kind of creepy, Renee - Hidalgo was able to tell me a lot about my life. Listen to what he had to say:
CESAR HIDALGO: The first thing that comes conspicuously out of the screen here is that there are three people that are very important in your life.
MONTAGNE: And Larry, looking at the screen, I assume those three people are represented by the three biggest balloons on this graphic - two blue and one orange.
ABRAMSON: Exactly. Hidalgo could see that I email a lot with my girlfriend, Anita, my son Seth, and also with a close friend named Scott.
MONTAGNE: But we're just looking at the amount of email traffic here, right? I mean I can see these circles. But I mean, we can't see what you were saying to these people.
ABRAMSON: Well, right. But that's the point. Without having to sort through all of the content of the messages, you can still see really important information about the nature of my relationship with these different people. Cesar Hidalgo looked more closely at my exchanges with my girlfriend, Anita.
HIDALGO: This is a relationship that started around the beginning of 2010. And it was a little bit timid in the beginning and then the relationship intensified...
ABRAMSON: I think that's enough about my relationship with my girlfriend.
ABRAMSON: But by looking at the frequency of our email exchanges, Hidalgo mapped the progress of our relationship pretty accurately. Now, here's another example. He looked at my emails to my son, and this pattern should be familiar to a lot of parents.
HIDALGO: So you see you have sent Seth 488 emails and you have received only 192. So basically for every five emails that you send, he gives you two back.
MONTAGNE: I'm wondering if an analyst might look at that and go, oh, of course, child and parent...
MONTAGNE: But I do see a bunch of green balloons here that are interconnected by a web of lines and they all seem to be linked to your close friend Scott, which I can tell because there's Scott and he's in a very large orange balloon.
ABRAMSON: Right in the middle, right.
MONTAGNE: Yeah. So what does that mean?
ABRAMSON: Well, to me, this is the most interesting part. Immersion brings out groups, networks of people. And when Cesar Hidalgo looked at this, like a trained investigator he instantly guessed what that would mean.
HIDALGO: The fact that they form a tightly knit group tells me that probably you guys either went to school together or you participate in sports together or do some sort of activity as a group.
ABRAMSON: Once again, spot on. These guys are part of a kayaking group that I belong to and that my friend Scott introduced me to.
MONTAGNE: So he's honing in on the nature of the relationship just by looking at the to and from information. Although I must say he probably didn't figure out it was a kayaking group.
ABRAMSON: No. He could just see that it was a sports group or something like that because he could see that it was a bunch of guys. And again, it was a wild guess on his part, but it was right. It also shows the power, Renee, of not just looking at my contacts but also looking at my contacts' contacts, because all of these guys were emailing one another in addition to emailing to me.
MONTAGNE: How different would things look if you and your friends were up to no good - plotting a crime, for heaven's sake, or God forbid, a terrorist attack?
ABRAMSON: Well, you know, that's true. You would probably see a similar web of connections between the leaders of any group and its various members. And this is why intelligence officials emphasize they don't just look at collections of phone numbers and draw a conclusion unless they have a suspicion that somebody in these groups is connected to terrorism somehow.
They don't investigate groups of innocent people, they say.
MONTAGNE: So Larry, how particular is this sort of thing? Won't my email patterns look a lot like yours? I mean lots of messages to my friends, that sort of thing.
ABRAMSON: Right. We all have close friends in the center of our networks and no two networks will be identical. But Cesar Hidalgo has looked at lots and lots of patterns and he says these patterns are unique. They're kind of like a bit of personal DNA.
HIDALGO: Obviously there's probably only one person that interacts, first with Anita, then with Seth and then with Scott, you know, and that's you.
MONTAGNE: Bigger question. If this information is so valuable, could it not be argued that it's foolish for law enforcement to ignore all of this metadata?
ABRAMSON: Yeah, it would be foolish. I mean it would be like trying to be a journalist without using the Internet. You'd be ignoring really important information that characterizes modern life. Government investigators say this is the kind of information that we need to get in order to find criminals and terrorists. The challenge for the courts and for Congress is figuring out whether the availability of all of this information about you and me requires new laws and new protections for privacy.
MONTAGNE: That's NPR's national security correspondent Larry Abramson. Thanks very much.
ABRAMSON: All right. Thank you.
Transcript provided by NPR, Copyright NPR.