February 13, 2016

How NSA contact chaining combines domestic and foreign phone records

(Updated: September 18, 2017)

In the previous posting we saw that the domestic telephone records, which NSA collected under authority of Section 215 of the USA PATRIOT Act (internally referred to as BR-FISA), were stored in the centralized contact chaining system MAINWAY, which also contains all kinds of metadata collected overseas.

Here we will take a step-by-step look at what NSA analysts do with these data in order to find yet unknown conspirators of foreign terrorist organisations.

It becomes clear that the initial contact chaining is followed by various analysis methods, and that the domestic metadata are largely integrated with the foreign ones, something NSA never talked about and which only very few observers noticed.

What is described here is the situation until the end of 2015. The current practice under the USA FREEDOM Act differs in various ways. The information in this article is almost completely derived from documents declassified by the US government, but these have various parts redacted.


 

RAS-approval

As a seed for starting a contact chain, NSA analysts can take a telephone identifier like a phone number (also called a selector), based upon:
- their own ongoing analysis on an existing target set;
- a Request for Information (RFI) from another government agency;
- a notification of a match between a known counterterrorism-related selector and an identifier among newly ingested phone metadata.

Access to the domestic phone records was granted to about 125 intelligence analysts from the Homeland Security Analysis Center (HSAC, or S2I4) of the NSA's Signals Intelligence Directorate. There were also up to 22 specially trained officials called Homeland Mission Coordinators or HMCs (initially shift coordinators).

As required by the FISA Court orders, only these HMCs, the chief and the deputy chief of the HSAC are allowed to determine that there is a Reasonable, Articulable Suspicion (RAS) that a certain selector is associated with a designated foreign terrorism group and/or Iran. Such a RAS-approval is only needed for the domestic phone records, not the ones collected overseas.

NSA has a special RAS Identifier Management System to streamline the adjudication of the requests for RAS approval and the documentation thereof. The codename of this system is IRONMAN, as we learn from this document from a declassified 2011 training presentation (.pdf) in which this codeword wasn't redacted twice:



A RAS-approval is effective for one year, meaning that during the next year, repeated queries using the approved seed selector can be made. If the selector is reasonably believed to be used by a US person, the approval period is 6 months.

The number of RAS-approved identifiers varied substantially over the years, but in 2012, there were fewer than 300. According to the annual Transparancy Report from the Director of National Intelligence (DNI), there were 423 such selectors in 2013, but just 161 in 2014. It's not known how many of these belonged to Americans.
 


Different kinds of queries

From various declassified documents analysed in an article on the weblog EmptyWheel, it becomes clear that there are three different kinds of queries that NSA analysts conducted on the domestic phone records database:
1. Queries for data integrity purposes
2. Queries for "Ident lookups"
3. Queries for contact chaining

In the EmptyWheel article it's assumed that besides these queries, NSA also conducted some kind of pattern analysis: in many declassified documents a redaction appears right after the term "contact chaining", which according to EmptyWheel could hide something like "pattern analysis".

Given that in these documents the targets are also redacted, there's also the possibility that the redaction hides a description of the target, like "contact chaining al-Qaida affiliates".

At least one NSA memorandum from 2009 indeed speaks about "chaining and analysis", but there can be two kinds of analysis: one conducted on the bulk of raw metadata records, and another one on selected results of contact chaining.

NSA always denied that it conducts pattern analysis on the bulk metadata themselves, stating that every search begins with a specific telephone number or other specific selection term. So far, there are no indications of the contrary, so the analysis apparently refers to the results of contact chaining queries, which is confirmed by the 2014 report (.pdf) about the Section 215 program by the Privacy and Civil Liberties Oversight Board (PCLOB).

As we will see later on, this second type of analysis is indispensable for making the contact chaining queries useful for foreign intelligence purposes.




(1) Data integrity queries

The first way the domestic phone records were queried was for data integrity purposes. This was done by some 25 specialized Data Integrity Analysts (DIAs). They didn't conduct target analysis, but helped intelligence analysts with questions on a target. For those cases, a DIA could use a standard login (with appropriate controls) to query the phone records for foreign intelligence purposes.

However, when they queried for data integrity purposes, DIAs used a special login that bypassed the normal controls (like EAR) and also the auditing. This because for this task, they were allowed to use identifiers that were not RAS-approved (not allowed though were selectors that had expired because they were not revalidated).

One goal of these data integrity queries was to discover selectors that, for reasons that were redacted in the review report, should not become part of analysis, both for BR FISA and other purposes. These selectors could then be added to a defeat list of identifiers that were deemed to be of little analytic value, and/or to a database holding those that should not be tasked onto the collection system.

There was of course a risk of mixing up these tasks, and after an expired identifier had been queried in March 2010, the NSA Inspector General recommended that the duties of DIAs and foreign intelligence analysts should be clearly separated.


(2) Ident lookup queries

A second kind of query was for so-called "ident lookup". According to an NSA Inspector General test report (.pdf) from April 2010, this refers to:
"querying a selector using [tool name redacted] to determine the approval status of a selector. In such cases, the Emphatic Access Restriction controls will prevent chaining of a selector that is not marked as approved for querying, and return an error message to the analyst. Because the selector was not actually chained, there is no violation of the Order"

Emphatic Access Restriction (EAR, pronounced as "ear") is a tool that was installed at the MAINWAY database in February 2009. It automatically prevents using a selector that is not RAS-approved. It seems therefore that when an analyst started a query and the seed selector appeared to be not approved, that query was called an "ident lookup" (although EmptyWheel has a different interpretation).

This could be the way it worked before the IRONMAN system was established, as in a training module from 2011, it is said that by then, analysts just had to "use [tool name redacted] to determine the identifier’s approval status".
 


(3) Contact chaining queries

The most important queries on the domestic phone records were of course those conducted by intelligence analysts in order to "identify unknown terrorist operatives through their contacts with known suspects, discover links between known suspects, and monitor the pattern of communications among suspects".

For this, an analyst took a RAS-approved selector (often a telephone number) and entered it into a specialized metadata tool, which searched the telephone metadata in the MAINWAY contact chaining system. To limit the number of results, the analyst could set a certain timeframe for the query.

The metadata tool then returns "a .cml file, usually referred to as a chain, which is made up of the individual first hop contacts of the seed". Usually, the analyst will also be interested in the second-hop contacts, and then the tool will retrieve the batches of one-hop chains for the identifiers that had been in direct contact with those from the first hop series.



Number of hops

Based upon the FISA Court orders, NSA analysts were also allowed to retrieve the numbers in contact with all the numbers from the second hop, which would make a third hop. The software tools are said to prevent looking beyond the third hop, or performing a query of a selection term that has not been RAS-approved.

The initial authorizations under the President's Surveillance Program (PSP) did not prohibit chaining more than two degrees of separation from the target, but "NSA analysts determined that it was not analytically useful to do so".* When this collection was brought under supervision of the FISA Court, it limited contact chaining to 3 hops.

But despite that authorization, the policy of NSA's Counter Terrorism branch restricted chaining to 2 hops, as can be seen in an NSA training presentation (.pdf) from 2007:


A 2011 training module says that chaining to a third hop is possible, but only after prior approval by the analyst's division management (for example when a contact that comes up with the first hop appears to be an already known suspect).

Strangely enough, both a government white paper and the PCLOB-report don't mention this policy restriction and in the latter it's even assumed that chaining 3 hops was regular practice:
"If a seed number has seventy-five direct contacts, for instance, and each of these first-hop contact has seventy-five new contacts of its own, then each query would provide the government with the complete calling records of 5,625 telephone numbers. And if each of those second-hop numbers has seventy-five new contacts of its own, a single query would result in a batch of calling records involving over 420,000 telephone numbers"

As of 2012, the FISA Court also allowed an automated chaining process in which "the NSA's database periodically performs queries on all RAS-approved seed terms, up to three hops away from the approved seeds. The database places the results of these queries together in a repository called the "corporate store" - the NSA was never able to get that working though (although the PCLOB report, again, describes it as if it was actually implemented).


Visualization

The results from a contact chaining query can be visualized by a contact graph. An example was published by the German magazine Der Spiegel, showing a slide from an NSA presentation with a 2-hop contact graph for the e-mail addresses of the CEO and the chairwoman of the Chinese telecommunications company Huawei:




Domestic and foreign results

Generally, it is said that analysts query the "Section 215 calling records", the "BR metadata" or something similar. This sounds like they only access the domestic telephone records and that therefore the resulting contact chains would fully consist of American phone numbers.

The initial seed number however will often be a foreign number, as the whole purpose of the Section 215 program is to discover connections between foreign terrorists and potential conspirators inside the US. Analysts will therefore choose a seed for which they expect a good chance it has a domestic nexus, which probably explains the low numbers of RAS-approved identifiers.

But as we have seen in the previous article, NSA stored the domestic phone records in MAINWAY, which also contains the foreign telephone and internet metadata collected overseas. That means that a contact chaining query will not only return identifiers from the domestic, but also from the NSA's worldwide metadata collection.


Federated queries

Such results from multiple sources are called federated queries. According to a 2011 training module, BR FISA queries initially only resulted in these federated queries, but in later versions of the query tool, the analyst could also check boxes to conduct an "unfederated" query and choose individual collection sources.

These options can be seen in the following screenshot from the user interface (the codename of which is redacted) used to conduct the contact chaining:


Selecting the "FISABR Mode" makes that an additional checkbox for the EO12333 source appears. An NSA memorandum explains that when this BR FISA option is chosen, the analyst will not only be provided with the domestic telephone metadata, but also with those from the SIGINT realm (which is collection overseas under EO 12333 authority), dating back to late 1998.

When the analyst used a RAS-approved selector, he could also check the box for PENREGISTRY, or PR/TT, which refers to the domestic internet metadata, but the collection thereof was ended by the end of 2011. Normal mode is for all other metadata collected abroad.
Analysts can determine the collection sources of each result by examining the Producer Designator Digraph (PDDG) and/or SIGINT Activity Designator (SIGAD) from each line of the contact chain file. BR FISA metadata can be identified by specific SIGADs.

SPCMA

There's also a fourth box for SPCMA mode, which stands for the "Special Procedures governing Communications Metadata Analysis" from January 2011. These allow contact chaining and other types of analysis on metadata that have already been collected under EO 12333, regardless of nationality and location (because metadata aren't constitutionally protected).

This means that US person identifiers that were in contact with valid foreign intelligence targets may be used for searching these foreign metadata too.

NSA isn't allowed to collect US data overseas, but these do come in "incidentally" when for example foreigners communicate with Americans - precisely the kind of communications that could reveal conspirators inside the US. Many international phone calls from or to the US, will likely be intercepted by NSA collection facilities abroad too.


In other words:
- By default, any contact chaining query will use the foreign metadata collected overseas. For these, any useful selector may be used as a seed, and, under SPCMA, even one that belongs to an American.

- If the seed selector is RAS-approved, then the domestic phone records will be used too, which could lead to the discovery of additional contacts within in the US.

The fact that most contact chains will consist of both foreign and domestic identifiers means that they contain much less American numbers then in calculations like the one from PCLOB, which give the impression that queries resulted in up to 3 hops of domestic numbers.


 


Analysing the contact chains

It should be noted that the phone numbers (or other selectors) which are returned after an initial contact chaining query are anonymous and therefore meaningless. They're just numbers which could belong to anyone: from a pizza delivery to a dangerous conspirator.

So, in order to identify which numbers are of interest for finding unknown suspects, additional analysis is needed - a comprehensive GCHQ book (.pdf) disclosed last week calls contact chaining the start of a "painstaking process of assembling information about a terrorist cell or network".


Analytic tools

In the early years of the President's Surveillance Program (PSP), only the SIGINT Navigator (SIGNAV) tool was available to view the output of the MAINWAY contact chaining system. Later, new tools were created to improve efficiency and to obtain the most complete results, they were designed to use phone records collected both domestically and overseas.

According to the 2009 BR FISA review, there were 19 different analytic tools used for analysing both the raw metadata as well as the results of contact chaining. The glossary of the review lists following tools, unfortunately with their codenames redacted:


S................?
"This tool is used by HMCs to conduct contact chaining against BR FISA metadata and provide the results to the [...]team. HMCs only used RAS-approced selectors when using this tool. The [...] team ultimately provided the results to NSA's [....]"

S.........?
"The primary desktop graphical user interface (GUI) for access to [....] data and services"

S....?
"An analytic query tool used to seek out additional information on telephony selectors from [MAINWAY?] and other knowledge bases and reporting repositories"

[SYNAPSE Workbench?]
"A next generation metadata analysis graphical user interface (GUI) which is the replacement for [......]"

W......?
"The query tool, which indicates whether a telephony selector is present in NSA data repositories, the total number of unique contacts, total number of calls, and "first heard" and "last heard" information for the selector"


The 2009 PR/TT review also mentions the following tool, which could have been redacted in the BR FISA review:

M.....?
"A database analytic system and user interface tool for integrated analysis of multiple types of metadata, facilitating more comprehensive target activity tracking"


Update:
According to the internal NSA newsletter SIDToday from March 4, 2005, which was published by The Intercept in September 2017, MAINWAY's Sigint Navigator (SigNav) version 4.0 became the vehicle for the new single sign-on tool GLOBALVISION, which gave analysts access to 11 databases.


Combining multiple contact chains

In 2006, a "high-level Bush Administration intelligence official" told Seymour Hersh that analysts could for example look whether any number that is two or three hops away from the seed number is also in direct contact with that original suspect number. That sounds smart, but in that case, that number which is two or three hops away is simply a first-hop contact.

Finding suspects just by looking at connections between anonymous numbers could work however when several contact chains (from related suspect seed numbers for example) are combined: then a number that appears to be in contact with seed #1 and also with seed #2, would be suspicious, as it apparently belongs to someone known by both initial suspects.

This approach was seen in the CBS television program 60 Minutes from December 15, 2013, in which an NSA employee gave a demonstration of how metadata contact chaining works. He used a tool for foreign collection under EO 12333, resulting in some contact chains of almost fully masked phone numbers from Somalia. Clearly visible are numbers that different targets had in common:



Detailed call record analysis

Besides analysing the breadth of the contact chains, each contact between two phone numbers can also be analysed in depth. For this, the analytic software provides analysts access to the complete calling records associated with all the phone calls from a contact chain.

Such a record, as provided by the telecoms, includes the calling and the called number, a calling-card number, the IMEI number of a mobile handset and the IMSI number of a SIM card, as well as the date and time of the call, its duration and technical information about how the call was routed through the telephone networks.

This provides analysts with information like which number initiated the call, the day and time the call was made, and how long it lasted. And although the domestic phone records may not contain cell phone location data, the area code and prefix of a landline telephone number, as well as the trunk identifier for mobile networks, still indicate the area where a particular phone was located.

As described in the previous article, these data weren't derived from the MAINWAY system, but from a second database which holds "individual BR FISA metadata call records for access by authorized Homeland Security Analysis Center (HSAC) and data integrity analysts to view detailed information about specific telephony calling events".


Searching the second database

This database of calling records also enables analysts to subject these records "to other analytic methods or techniques besides querying", like for example searching them "using numbers, words, or symbols that uniquely identify a particular caller or device", or using "selection terms that are not uniquely associated with any particular caller or device" - according to the PCLOB report.

So, when analysing one or more contact chains resulted in finding several suspicious phone numbers, analysts can then use those numbers for querying the second database in order to see whether these numbers also appear in phone records that were not included in their initial contact chains.

And it also seems possible to query for example a trunk identifier to discover other phones from the same region. These kind of searches can therefore provide potential connections that could not have been found by conducting a direct contact chaining query.

Update:
An NSA slide that was already published in December 2013, shows that MAINWAY can indeed be used for queries with cell tower identifiers, in order to find selectors in certain geographical areas:



Some numbers

In a Department of Justice report (.pdf) from 2006 it's said that NSA "estimated that only a tiny fraction (0,000025% or one in four million) of the call-detail records [...] were expected to be analyzed". This would mean that of the 1,8 billion domestic phone records provided daily by AT&T, just 450 would be used for analysis.

So in a year, the records (not the content) of roughly 230.000 individual calls from the domestic metadata collection could have been used for analysis in addition to contact chaining.



Foreign call records

As we have seen, a contact chaining query on Section 215 telephone metadata will generally result in both foreign and domestic numbers. Analysts will therefore not only like to analyze the associated call records from the domestic collection, but also those from foreign collection conducted abroad.

These foreign phone records could be retrieved from the known metadata repositories like ASSOCIATION (for mobile calls) and BANYAN (for landline calls), or from a single foreign "SIGINT" database, as is suggested by an NSA memorandum from 2009.


Enrichment

Analyzing the detailed call records will still not provide names or other information that allows the identification of the people to which the numbers from a contact chain belong. For that, the phone numbers have to be correlated ("enriched") with other kinds of information.

The easiest way is probably to combine them with target watch lists to see if the contact chains contain phone numbers that belong to already known targets. This is demonstrated in the following video, which shows contact chain analysis using Sentinel Visualizer, which is a commercially available program for this purpose:





Telephone identifiers found through contact chaining and subsequent analysis can of course also be correlated with internet metadata. NSA does not collect domestic internet metadata anymore, but its collection abroad results in over 10 billion internet metadata a day being stored in the MARINA database.

The metadata from contact chains can also be enriched with data from for example GPS and TomTom, billing records and bank transactions, passenger manifests, voter registration rolls, property records and unspecified tax data - for both Americans and foreigners, according to a New York Times report, but in which NSA denies using this for the domestic metadata collected under Section 215.


SYNAPSE Data Model

With all this, analysts can build extensive social network graphs (or "community of interest" profiles) using 164 different relationship types like "travelsWith, hasFather, sentForumMessage, employs". It seems that this refers to the SYNAPSE Data Model, for which internal NSA relationships are shown in the following diagram that was published by The New York Times too:



Apparently also based upon this data model is SYNAPSE Workbench, which seems to be the "next generation metadata analysis graphical user interface (GUI)" described in the 2009 BR FISA review. SYNAPSE Workbench is apparently capable of fusing metadata from multiple sources and is also enabled for SPCMA searches.


Further action

When all this makes an analyst to believe that a certain telephone identifier belongs to someone who is of interest but wasn't yet known or identified, the following actions can be taken:
Is the identifier American and of counterterrorism value, then it can be passed on to the FBI for further intelligence or criminal investigation. From 2006-2009, NSA provided the FBI (and other intelligence agencies) a total of 277 reports containing 2883 telephone identifiers.
Is the identifier foreign, then NSA can use it as a selector to retrieve the content of associated communications that might be already in its databases. It can also be entered into the NSA collection system in order to pull in the content of any future communications of the target systematically.

In case the identifier of the yet unknown suspect is foreign, the analyst might have found out a name through the various enrichment correlations, but if not, this can also be achieved by listening into the content of associated phone calls or additional Human Intelligence (HUMINT) methods.


 

Conclusion

As we have seen, the domestic phone records collected by NSA under Section 215 are used for contact chaining that combines both domestic and foreign identifiers. NSA never explicitly explained this, probably because they didn't want to draw attention to their foreign metadata collection and analysis efforts. But it did became clear from the many documents about the Section 215 program that were declassified by the US government.

These documents made clear that NSA rarely went to 3 hops of contact chaining, which is contrary to what most people, including the Privacy and Civil Liberties Oversight Board (PCLOB) assumed. Because of the federated queries, the resulting contact chains were made up of both domestic and foreign identifiers, which means contact chaining under the Section 215 program involved far less American phone numbers than often presumed.

The documents also show that contact chaining for finding yet unknown conspirators isn't as easy as it may appear. It's not that one enters a phone numbers and the software provides a list of suspects. Data retrieved through the contact chains have to be analysed and correlated with other data sets in order to find out which numbers could matter. It still depends on experience, analysis and eventually even guessing which data and which numbers might be worth a closer investigation.

How successful this contact chaining and subsequent analysis is, is difficult to say. The PCLOB report judged that there was "no instance in which the [Section 215] program directly contributed to the discovery of a previously unknown terrorist plot or the disruption of a terrorist attack" - but it's also possible that there were just no such conspirators.

The PCLOB report noticed that analysing the domestic telephone metadata did provide some value "by offering additional leads regarding the contacts of terrorism suspects already known to investigators, and by demonstrating that foreign terrorist plots do not have a U.S. nexus" - although useful, this seems a rather meager result of what for sure required lots of work.


> Next: Collection of domestic phone records under the USA FREEDOM Act



Links and Sources
- Lawfare Blog: Understanding Footnote 14: NSA Lawyering, Oversight, and Compliance (2016)
- EmptyWheel.net: Federated Queries and EO 12333 FISC Workaround (2013) - What We Know about the Section 215 Phone Dragnet and Location Data (2016)
- PCLOB: Report on the Telephone Records Program Conducted under Section 215 of the USA PATRIOT Act (pdf) (2014)
- Cryptome.org: NSA FISA Business Records Offer a Lot to Learn (2013)
- Huffingtonpost.com: The NSA's Telephone Meta-data Program: Part I (2013)
- US Administration White Paper: Bulk Collection of Telephony Metadata under Section 215 of the USA PATRIOT Act (pdf) (2013)
- The New Yorker: What the N.S.A. Wants to Know About Your Phone Calls (2013)
- NSA: Business Records FISA NSA Review (.pdf) (2009)

No comments:

In Dutch: Meer over het wetsvoorstel voor de Tijdelijke wet cyberoperaties