Thursday, 26 May 2016

The content v metadata contest at the heart of the Investigatory Powers Bill

After more than 30 hours of Commons Committee debate and 1,000 or so proposed Opposition amendments, the Investigatory Powers Bill is moving on to its Report stage. Now is a good time to revisit one of the most fundamental points in the Bill: the dividing line between content and metadata.

This is especially topical in view of reports that David Anderson Q.C. is to undertake an independent review of the operational case for bulk powers. As will become apparent, the dividing line is not the same for every power. This has particular relevance for bulk interception and equipment interference.

Sensitivity of content and metadata

Why does the distinction between content and metadata matter? 

The government’s position, which finds support in human rights law, is that intercepting, acquiring, processing and examining the content of a communication is more intrusive than doing the same with the “who, when, where, how” contextual data wrapped around it.

Others argue that, as metadata has become ever richer and more revealing thanks to the mobile internet and smartphone apps, the difference in intrusiveness has become less marked. We know from the Report of the Intelligence and Security Committee of Parliament in March 2015 that the intelligence agencies value metadata as much as, if not more than, content.

Be that as it may, under the Bill fewer safeguards and constraints apply to selection and examination of metadata than to content.

Content and metadata separation

Where to draw a line between content and metadata is not necessarily obvious. There is no assurance, come the inevitable human rights scrutiny, that courts applying ECHR Articles 8 and 10 or the EU Charter will draw a dividing line in the same place as domestic legislation.

In fact the Bill creates different dividing lines between content and metadata for different purposes: one version for mandatory retention and acquisition of communications data from service providers and another for communications interception and equipment interference. The latter designates more information as metadata and less as content.

This is perhaps not wholly surprising, since the Anderson Review (10.28) was sympathetic to the usefulness of content-derived metadata. Whether the possible extent of the change wrought by the Bill is generally appreciated is another matter.

Consequences of the demarcation

The demarcation between content and metadata has significant practical consequences. The Snowden disclosures suggest that GCHQ has bulk intercepted and stored metadata by the tens of billions of records. Even where such 'related communications data' (the term used in the Regulation of Investigatory Powers Act (RIPA)) is gathered as the by-product of an overseas-focused bulk interception campaign, the agency is able to look in the resulting metadata pool for information about people known to be in the British Islands.

Under RIPA it cannot do that for content without obtaining special Ministerial authorisation. Under the Bill that would need a targeted examination warrant. In Committee the government resisted an amendment that would have extended the requirement for a targeted examination warrant to include metadata as well as content. 

Does Parliament have enough information to know where the line is drawn?

The Commons Science and Technology Committee and a Joint Parliamentary Committee scrutinised the draft Bill. Neither seemed confident that it (or anyone else) understood where the legislation drew the line between content and metadata. 

The Joint Committee identified the definitions of communications data and content as one of the most common concerns among witnesses. Its Recommendation 1 said:

“Parliament will need to look again at this issue when the Bill is introduced. We urge the Government to undertake further consultation with communications service providers, oversight bodies and others to ascertain whether the definitions are sufficiently clear to those who will have to use them.”

For bulk interception the Committee noted the concerns of witnesses about the distinction between ‘related communications data’ and content. It recorded my own suggestion that:

“The Home Office could usefully produce a comprehensive list of datatype examples, where appropriate with explanations of context, categorised as to whether the Home Office believes that each would be entity data, events data, contents of a communication, data capable of being related communications data when extracted from the contents of a communication and so on.”

The Science and Technology Committee had previously noted that the government, in seeking to future-proof the legislation, had produced definitions that had led to significant confusion on the part of communications service providers and others. It said that definitions such as ‘communications content’ needed to be clarified as a matter of urgency.

The closest that the Home Office has come to producing a systematic analysis is Annex A to evidence submitted to the Joint Committee, which categorised a selection of datatypes. This came too late to be considered by most witnesses and was light on analysis of why particular items fell on one side of the line or the other.

Since then, the Bill as introduced into Parliament in March 2016 has revised some of the definitions. Most significantly it replaces 'related communications data' with ‘secondary data’. This, explained the government:

“[makes] clear that it is broader than communications data. This clarifies the distinction between this type of data and the narrower class of data available under a communications data authorisation.” (emphasis added)

The government published draft Codes of Practice alongside the Bill. In principle the wealth of explanation in these and other sources – Explanatory Notes, Home Office evidence, fact sheets, operational cases, Ministerial statements in Committee, Home Office letters to the Committee and so on – should help us understand where the dividing line lies.

How does the Bill draw the line?

Any attempt to draw a line between content and metadata has to avoid circularity: “Why is this information not content?” “Because it is less sensitive.” “Why is this information less sensitive?” “Because it is not content.”

The Bill's new definition of content (there is no existing definition in RIPA) turns on whether data reveals anything of what might reasonably be considered to be the meaning (if any) of a communication. The Joint Committee commented on the draft Bill:

The impression of having to perform metaphysical gymnastics is bolstered when we are introduced to the concept of ‘inferred meaning’. Paragraph 2.14 of the draft Interception Code of Practice says:

“There are two exceptions to the definition of content set out in section 223(6). The first is there to address inferred meaning. When a communication is sent, the simple fact of the communication conveys some meaning, e.g. it can provide a link between persons or between a person and a service. This exception makes clear that any communications data associated with the communication remains communications data and the fact that some meaning can be inferred from it does not make it content.”

If anything this confirms Paul Bernal’s concern that since meaning can be derived from almost any data, a dividing line based on the existence of meaning is problematic.

What is the practical result of the Bill’s definitions? 

Since the Bill draws the line in different places for different purposes the practical result depends on which set of definitions is used. One set applies to interception and equipment interference, the other to retention and acquisition of communications data. 

The interception variety of metadata is ‘secondary data’. For equipment interference it is the similar ‘equipment data’. Both consist of either ‘systems data’ or ‘identifying data’. Systems data is a critical definition, since S.223(6) lays down that if something is systems data it cannot be content.

The overriding nature of the systems data definition relieves the intercepting or interfering agency of the need to grapple with questions of the ‘meaning’ of the communication. The draft Interception Code of Practice notes that in practice the agency will only have to decide whether information fits within the definition of systems data. If so, it cannot be content even if it reveals some of the meaning of the communication.

The Bill will also enable ‘identifying data’ to be extracted from the contents of a communication and treated as secondary data. Under RIPA, information such as an e-mail address embedded in a web page is treated as content. Under the Bill, intercepting and interfering agencies would be able to scrape such data from the body of a communication and treat it as metadata.

For retention and acquisition of communications data metadata is either ‘entity data’ or ‘events data’. Here the position is reversed: content takes precedence. If information reveals anything of the meaning of the communication (beyond the mere fact or transmission of the communication) then for these purposes it is content, even if for interception or equipment interference purposes it would be systems data. The ‘identifying data’ scraping exception does not apply.

The result is that some types of information may be treated as metadata for the purposes of interception and equipment interference, but as content for the purposes of communications data retention and acquisition.

This overlap of content and metadata is not merely theoretical. The draft Communications Data Code of Practice suggests that some communications may consist entirely of systems data (and thus be deemed to contain no content). The draft Equipment Interference Code of Practice gives the example of machine to machine messages between items of network infrastructure to enable the system to manage the flow of communications. 
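The two dividing lines can be sketched as a pair of decision functions. This is a deliberately simplified, hypothetical model, not the statutory test: the flags `is_systems_data`, `reveals_meaning`, `from_body` and `separable` are illustrative stand-ins for legal judgements that the Bill leaves to the agency or service provider. The rule that data found in the body of a communication stays content for retention and acquisition purposes is an assumption drawn from the point that the identifying data scraping exception does not apply there. For equipment interference, read ‘equipment data’ for ‘secondary data’.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """A hypothetical, simplified description of one piece of data.

    The yes/no flags stand in for legal judgements; they are not statutory terms.
    """
    label: str
    is_systems_data: bool       # enables or facilitates a system or service to function
    reveals_meaning: bool       # reveals something of the meaning of the communication
    from_body: bool             # found inside the body of the communication
    separable: bool = False     # capable of being logically separated from the rest


def interception_view(item: DataItem) -> str:
    """Interception / equipment interference line: systems data overrides content."""
    if item.is_systems_data:
        return "secondary data"     # systems data cannot be content, even if meaningful
    if item.reveals_meaning:
        return "content"
    if item.from_body and not item.separable:
        return "content"            # cannot be scraped out, so stays part of the body
    return "secondary data"         # identifying data, extracted if necessary


def retention_view(item: DataItem) -> str:
    """Retention / acquisition line: content takes precedence."""
    if item.reveals_meaning:
        return "content"            # even if it would be systems data for interception
    if item.from_body:
        return "content"            # assumption: no identifying data scraping exception
    return "communications data"    # entity data or events data


if __name__ == "__main__":
    examples = [
        DataItem("meaningful systems data", True, True, False),
        DataItem("mailto address in a web page", False, False, True, separable=True),
    ]
    for item in examples:
        print(f"{item.label}: {interception_view(item)} / {retention_view(item)}")
```

Run as a script, both examples come out as secondary data on the interception side and as content on the retention side: the same information lands on different sides of the line depending on which power is in play.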

Testing the content/metadata dividing line

The most comprehensive way of testing the dividing line between content and metadata is to take a large number of examples of different types of information and assess on which side of the line each would fall.

I have adopted a different approach: take a short e-mail and evaluate which of its components might count as content and which as metadata.

For this exercise I have used the version of the dividing line that contrasts content with ‘secondary data’. This applies to targeted, thematic and bulk interception warrants. It replaces ‘related communications data’ under RIPA. As we have seen, ‘secondary data’ is generally broader than the ‘communications data’ definition used for mandatory retention and acquisition.

Here is my sample e-mail.

An initial impression is probably that the From/To and Sent fields are metadata and everything else is content. Indeed that is the current position under RIPA. When we turn to the Bill however, things seem to be rather different. It appears that most of the e-mail may be either systems data, or identifying data that can be extracted and treated as metadata.

Of course only the visible parts of the e-mail are shown. More datatypes will be lurking in the header. Depending on exactly what they contain those are likely to be secondary data.
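To show what else a raw message carries, here is a minimal sketch using Python's standard `email` package on a made-up message (the names, addresses and header values are hypothetical and use reserved example domains; this is not the sample e-mail). It separates the fields a mail client usually displays from the remaining headers and the body. Which of those headers would count as systems data or identifying data under the Bill is a legal question the code does not answer.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# A made-up message; the addresses use reserved example domains.
RAW = """\
Received: from mail.example.net (mail.example.net [192.0.2.10])
\tby mx.example.org; Wed, 25 May 2016 09:15:00 +0100
Message-ID: <12345@mail.example.net>
From: Alice <alice@example.net>
To: Bob <bob@example.org>
Date: Wed, 25 May 2016 09:14:58 +0100
Subject: Hypothetical subject line
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Hypothetical body text.
"""

# Fields a mail client typically displays; the other headers stay out of view.
VISIBLE = {"From", "To", "Date", "Subject"}


def summarise(raw: str) -> dict:
    """Split a raw message into displayed fields, hidden headers and body."""
    msg = message_from_string(raw)
    return {
        "from": getaddresses(msg.get_all("From", [])),
        "to": getaddresses(msg.get_all("To", [])),
        "sent": parsedate_to_datetime(msg["Date"]),
        "hidden_headers": [name for name in msg.keys() if name not in VISIBLE],
        "body": msg.get_payload(),
    }


if __name__ == "__main__":
    for key, value in summarise(RAW).items():
        print(f"{key}: {value!r}")
```

Even this short message has four headers (Received, Message-ID, MIME-Version, Content-Type) that a reader of the displayed e-mail would never see.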

To understand how what looks like e-mail content can become metadata, we need to delve more deeply into the definition of 'secondary data'.

What is secondary data?

S.120 of the Bill provides that secondary data, in relation to any communication transmitted by means of a telecommunication system, means any data falling within either of two subsections:

Subsection (4) is systems data which is comprised in, included as part of, attached to or logically associated with the communication (whether by the sender or otherwise). In general terms systems data is data that enables or facilitates a telecommunication system or service, a system holding a communication, or a service provided by such a system, to function. It is not limited to the system that is conveying the communication in question. For a graphical representation of the full definition of systems data, see here.

Subsection (5) concerns identifying data. Like systems data it must be comprised in, included as part of, attached to or logically associated with the communication (whether by the sender or otherwise). Unlike systems data it must also be capable of being logically separated from the remainder of the communication; and, if it were separated, must not “reveal anything of what might reasonably be considered to be the meaning (if any) of the communication, disregarding any meaning arising from the fact of the communication or from any data relating to the transmission of the communication.”

This last condition mirrors the Bill’s general definition of content. It raises the perplexing question of what (and how much) information can be extracted from the content of a communication without revealing anything of the meaning of the communication. Examples given in the Explanatory Notes include:

  • the location of a meeting in a calendar appointment; 
  • photograph information - such as the time/date and location it was taken; and 
  • contact 'mailto' addresses within a webpage.

The first two of these examples reveal a possibly surprising feature of identifying data. The data can, it seems, relate to matters such as a real world meeting or the taking of a photograph that are not an aspect of a communication.

This conclusion follows from the definition of ‘identifying data’, which includes data which may be used to identify any person, apparatus, system or service, any event, or the location of any person, event or thing. Events are – apparently - not limited to events forming part of the use of a communications system. Data may relate to the fact of the event, the type, method or pattern of event, or the time or duration of the event. For a graphical representation of the full definition of identifying data, see here.

The Home Office in its evidence to the Joint Committee said: “It is also possible for certain structured data types to be extracted from the content of a communication”. In the Bill neither the systems data nor identifying data definitions appear to be restricted to structured data (and the definition of ‘data’ is certainly not limited in that way).

Identifying data must be capable of being logically separated from the content of the communication. Does that imply some element of structure in the extractable data? It may just mean that physical separation is unnecessary. In the Bill Committee on 12 April 2016 the Minister said: “For example, if there are email addresses embedded in a webpage, those could be extracted as identifying data.”
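The Minister's example can be made concrete. The sketch below is a hypothetical illustration, not a description of any agency's tooling: it uses Python's standard `html.parser` to pull addresses out of `mailto:` links in an HTML page while discarding the rest of the page. It relies on the addresses being marked up as links; an address or a name mentioned in running text has no such structure, which is why the question whether unstructured data can be “logically separated” matters.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class MailtoExtractor(HTMLParser):
    """Collects e-mail addresses from <a href="mailto:..."> links."""

    def __init__(self) -> None:
        super().__init__()
        self.addresses = []  # addresses in the order they appear

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:  # HTMLParser lower-cases attribute names
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the scheme and any "?subject=..." query part.
                address = value[len("mailto:"):].split("?", 1)[0]
                self.addresses.append(unquote(address))


def extract_mailto(html: str) -> list[str]:
    """Return mailto addresses in document order; the rest of the page is ignored."""
    parser = MailtoExtractor()
    parser.feed(html)
    parser.close()
    return parser.addresses
```

For example, `extract_mailto('<a href="mailto:info@example.com?subject=Hi">us</a>')` returns `['info@example.com']`.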

Another conundrum is whether each item of identifying data has to be evaluated separately in determining whether it reveals anything of the meaning of the communication, or whether extracted items of identifying data should be considered cumulatively.

For the purposes of analysing my sample e-mail I have assumed that unstructured information can for the purposes of the Bill (whether it is technically possible is another matter) be “logically separated” from the rest of the communication; and that extracted elements of identifying data are not considered cumulatively. These are points on which further elucidation would be desirable.

Analysis of sample e-mail

Below is a marked up version of the e-mail. All the highlighted text could, it seems, be either systems data (yellow) or identifying data (orange). 

The “From”, “To” and “Sent” fields fit the definition of systems data, as data facilitating the functioning of a telecommunications service. This is unsurprising and corresponds to the existing position under RIPA.

An e-mail 'Subject' line is content. However, as the draft Equipment Interference Code of Practice explains in relation to equipment data, elements of the subject line may be capable of being extracted and treated as metadata: “the text in the subject line would not be equipment data (unless separated as identifying data).”

So consider “last night’s call”. The word ‘call’ appears to be identifying data, since it identifies both the fact and type of an event (S.225(2)(b), (3)(a) and (b)). “last night’s” relates to the time of the event (225(3)(c)).

“Bill” and “Graham” both identify, or may assist in identifying, persons (S.225(2)(a)).

“Meet”, “Wednesday” and “Red Lion” all appear to be identifying data. “Meet” relates to the type of event (S.225(2)(b), (3)(b)), “Wednesday” to its time (225(3)(c)) and “Red Lion” to the location of the event (225(2)(c)). The fact that this is a real world event rather than a communications event does not appear to prevent it being identifying data. The Explanatory Notes give an example of the location of a meeting in a calendar appointment. It would be odd if information sent in a calendar appointment were treated differently from the same information sent in an e-mail.

“DM”. It is possible that this is systems data, describing something connected with enabling or facilitating the functioning of a telecommunications service. If not, it appears to be identifying data as assisting in identifying a service (225(2)(a)).

“@cyberleagle” is probably systems data (there is no apparent requirement that the data should relate to the means used to send the intercepted communication itself). If not, this is identifying data.

If this tentative analysis is correct, the secondary data (and equipment data) provisions of the Bill would represent a significant change to the existing content/metadata boundary under RIPA. 

Despite all the supporting Bill materials, these provisions are still hard to understand. If Parliament is to have a properly informed debate on these matters, a fully detailed and reasoned Home Office explanation of what data falls within each category, and why, would be helpful.

Friday, 15 April 2016

Future-proofing the Investigatory Powers Bill

[Based on a presentation to BILETA 2016 on 11 April 2016]

If we know one thing about the Investigatory Powers Bill, it is that it must be future-proof. Legislation should, self-evidently, stand the test of time in the face of rapid technological change and not become out of date overnight.

However the task is not a simple matter of spraying a coat of future-proof paint on to the Bill. Future-proofing can give rise to serious difficulties when the legislation furnishes the state with intrusive powers over its citizens. An attempt to future-proof blighted the current Regulation of Investigatory Powers Act (RIPA). The signs are that some of the mistakes of RIPA are about to be repeated in the Investigatory Powers Bill.

How should we set about future-proofing legislation? In the communications surveillance field two techniques have been tried.

One is a broad, flexible, order-making power. The statute would empower the Secretary of State to make and revise regulations from time to time, subject to less Parliamentary scrutiny than for primary legislation. However when considering the primary legislation Parliament has only the mistiest outline of what it is being asked to approve. The features of the landscape do not appear until it is too late.

That was the approach adopted in the draft Communications Data Bill (CDB), which in 2012 was stopped in its tracks by a Joint Parliamentary Committee. Clause 1 of the draft Bill was a general order-making power that could be used to mandate collection, generation and retention of communications data. Home Office official Charles Farr said in evidence to the Committee:

"Future-proofing and flexibility are at the heart of the language we have used in clause 1."

The Committee noted the "wide anxiety raised by the breadth of clause 1". It concluded:

"We do not think that Parliament should grant powers that are required only on the precautionary principle. There should be a current and pressing need for them."

Remnants of the CDB approach survive in parts of the Investigatory Powers Bill.

The power to serve technical capability notices on telecommunications operators sets out a list of obligations that can be imposed, including the obligation to remove electronic protection applied by or on behalf of the operator. Although the list is fairly specific, the power itself is open-ended. The obligations that may be specified in regulations merely "include, among other things" the items in the list.

The direct descendant of Clause 1 of the CDB is Clause 78 of the IP Bill. Clause 78, unlike the CDB, sets out a list of ‘Relevant Communications Data’ that can be the subject of data retention notices issued by the Secretary of State. The items on the list are still described in quite general terms, including for instance “data which may be used to identify, or assist in identifying, … the type, method or pattern, or fact, of communication”.

Clause 78 also retains a strong bias towards the ‘precautionary principle’ deprecated by the 2012 Joint Committee. At present notices under DRIPA can require retention of a few specific types of data in respect of limited categories of communication such as internet e-mail, SMS messages and internet telephony. The Counter Terrorism and Security Act 2015 added IP address resolution data. The financial projections in the Home Office’s IP Bill Impact Assessment allow only for the addition of so-called internet connection records. Yet Clause 78 is much broader than that, encompassing for instance the machine to machine communications that will underpin the internet of things. There has been no attempt to explain or justify this broad scope.

Another method of future-proofing is technological neutrality. This approach contrasts with technology-specific legislation. The objective is to draft at a sufficiently abstract level to allow for future changes in technology.

IT and technology lawyers have been brought up to think of technologically neutral legislation as a Good Thing. Professor Chris Reed observed in 2007 that technological neutrality had become part of the general wisdom: 'motherhood and apple pie'. And so it was, when we were trying to avoid problems such as statutory writing requirements that assumed paper. However technological neutrality runs into trouble when applied to intrusive state powers.

The first problem is that abstract drafting has a tendency to be unintelligible. The obvious example is RIPA. Sir David Omand, the Permanent Secretary in the Home Office at the time RIPA was prepared, told the Commons Home Affairs Select Committee in February 2014:

“The instructions to parliamentary draftsmen were to make it technology-neutral, because everyone could see that the technology was moving very fast. Parliamentary draftsmen did an excellent job in doing that, but as a result I do not think the ordinary person or Member of Parliament would be able to follow the Act without a lawyer to explain how these different sections interact.”

RIPA is notoriously impenetrable, even to lawyers. It has been criticised almost from birth:

  • "We have found RIPA to be a particularly puzzling statute" (R v W, Court of Appeal, 2003)
  • "longer and even more perplexing" than the "short but difficult" Interception of Communications Act 1985 (Lord Bingham, A-G’s Ref (No 5 of 2002), 2004)
  • "this impenetrable statute … one of the most complex and unsatisfactory statutes currently in force" (Professor David Ormerod, 2005)
  • "a complex and difficult piece of legislation" (Mummery LJ, then President of the Investigatory Powers Tribunal, 2006)
  • "RIPA 2000 is a difficult statute to understand" (Sir Anthony May, IOCC Report for 2013)
  • "RIPA, obscure since its inception, has been patched up so many times as to make it incomprehensible to all but a tiny band of initiates" (David Anderson Q.C., A Question of Trust, 2015)

Unintelligibility is a direct consequence of the attempt to future-proof by technologically neutral, abstract drafting.

Intelligibility is not just a lawyer's nice-to-have. Where intrusive powers are concerned it is a rule of law principle that the public should be able to know with reasonable certainty the kind of circumstances in which the powers may be used against them. Unintelligible legislation fails that test. A Question of Trust said:

“The desire for legislative clarity is more than just tidy-mindedness. Obscure laws – and there are few more impenetrable than RIPA and its satellites – corrode democracy itself, because neither the public to whom they apply, nor even the legislators who debate and amend them, fully understand what they mean.”

A Question of Trust challenged the government to produce legislation that is both comprehensive and comprehensible.

A second problem with applying technological neutrality to intrusive powers arises from the fact that where the technology goes, so the powers automatically follow.

This of course is what the technique is intended to achieve. As people use technology in ways that were unknown at the time of the legislation, the powers will apply to the new behaviour. However the result is that the balance between privacy and intrusion that Parliament contemplated at the time it passed the legislation is liable to shift due to mere accidents of technology.

Again, RIPA is a prime example. Mobile phones existed in 2000, as did the internet. But they were not yet combined. When they merged on the smartphone all kinds of human activity that were previously untouched by RIPA suddenly fell into its scope.

It will be said that that is how it should be: conspirators who used to communicate by telephone and now use over-the-top messaging should be subject to equivalent powers. That may be so. But entirely personal behaviour that does not involve any kind of messaging between two or more human beings has also been swept up. We never used to read books or newspapers over the telephone. Now we read websites remotely. RIPA counts this activity, equivalent to sitting at home reading a book, as a communication - as if it were the same as e-mailing or text messaging a contact.

The mobile internet was not contemplated by the legislators in 2000. The result of this accident of technology is a major shift in the privacy/intrusion balance, without Parliament ever having had the opportunity to consider it. Now that Parliament is considering it, it is doing so against the background of a sense of entitlement to the bounty of data that adventitiously fell into the laps of intelligence agencies and law enforcement bodies.

What should we do? The key is to ask what we should be seeking to future-proof: the powers themselves or the privacy/intrusion balance settled upon by Parliament when it enacts legislation of this kind. My own view is that we should learn the lesson of RIPA and seek to future-proof the privacy/intrusion balance, not the powers.

That would require a fundamentally different approach: concrete, technology-specific drafting, sunsetting of powers, frequent review by Parliament and continued openness by the government about how the powers have been used. The last of these is critical if Parliament is to engage in an informed debate when powers come back for renewal.

Regrettably, the IP Bill has gone down a similar track to RIPA. It has tried to future-proof the powers and, as with RIPA, the predictable result is unintelligibility. The House of Commons Science and Technology Committee said in its report on the draft Bill:

“The Home Secretary told us subsequently that the definitions for ‘communication data’ and ICRs were intended to be “technology neutral and flexible in order that, should user behaviour and technology change, they will still apply”. The definitions were to be applied “to the full range of powers and obligations under the draft Bill” which had subsumed provisions from several current statutes. As a result, “the definitions as they are formulated are necessarily abstract”.” (emphasis added)

The Committee concluded:

“The government, in seeking to future-proof the proposed legislation, has produced definitions of internet connection records and other terms which have led to significant confusion on the part of communications service providers and others.”

The "others" include the general public, whose communications form the subject of the Bill and who should, as a matter of the rule of law, be able to understand the scope of the powers.

The government, responding to a recommendation by the Joint Committee, has included provision for review of the Bill after five and a half years. However that is insufficient without addressing the problems of over-abstract drafting. Nor is hiving off detail to Codes of Practice a good approach. It is not the function of Codes of Practice to compensate for obscure legislation. 

Further reading on technology neutrality

Alberto Escudero Pascual and Ian Hosein, The hazards of technology-neutral policy: questioning lawful access to traffic data, Communications of the Association for Computing Machinery (CACM), published 29 Feb 2004

Chris Reed, Taking Sides on Technology Neutrality, (2007) 4:3 SCRIPTed 263

Graham Smith, Are Techlaw principles in the Ascendency? Intellectual Property Forum: journal of the Intellectual and Industrial Property Society of Australia and New Zealand, Issue 96 (Mar 2014)

[Amended 15 April 2016 to make specific reference to mobile internet.]

Friday, 1 April 2016

An official announcement

The following official statement was issued this morning.
“A temporary ceasefire has been agreed among combatants in the Semantic Wars. 
A list of banned words and phrases has been drawn up including ‘Itemised Phone Bill’, ‘The Outside of an Envelope’ and ‘We only want to do what [named Silicon Valley company] does’.

Any permutation of (indiscriminate, blanket, mass, dragnet, random, uncontrolled, at will) and (surveillance, trawling, snooping, browsing, monitoring) is also prohibited, whether accusations or denials thereof.
Use of the term 'Snoopers Charter' will be regarded as grounds for immediate termination of the accord.”

Early indications are that the truce is unlikely to hold.

[BREAKING NEWS, 10.45 am. Unconfirmed reports suggest that teams of inspectors are in the process of being deployed to eliminate stockpiles of unused non-denial denials.]

Tuesday, 29 March 2016

Woe unto you, cryptographers!

A hitherto unknown translation of the Bible has been found in a Cheltenham safe deposit. So far it has been possible to decipher only a few verses:

Matthew 7:16: "Ye shall know them by their metadata".

Job 31:4: "Does not GCHQ see my ways, and count all my hops?" 

Revelation 20:13: "And they were judged all according to the pattern of their communications."

Revelation 3:8: "Behold, I have set before thee an open door, and no man can shut it: for thou hast a little strength, and hast kept my word, and hast not installed end-to-end encryption".

Psalm 139:1-2 (To Wearable Tech): "Thou knowest my sitting down and my rising up, thou understandest my thought afar off."

Luke 11:52: "Woe unto you, cryptographers! for ye have taken away the key of knowledge: ye entered not in yourselves, and them that were entering in ye hindered."

Thursday, 24 March 2016

All about the metadata

If it is true that granularity of language reflects the importance of the subject matter then metadata, not content, is at the heart of the Investigatory Powers Bill.

For content the Bill provides a few definitions: Content, Relevant Content, Intercepted Content and Protected Material. 

For metadata we have a richer set: Communications Data, Relevant Communications Data, Internet Connection Records, Entity Data, Events Data, Systems Data, Related Systems Data, Equipment Data, Secondary Data and Identifying Data.

The emphasis on metadata is perhaps unsurprising, since the Intelligence and Security Committee told us in its March 2015 report that metadata is indeed more valuable than content to the intelligence agencies in their mission to join up the dots and spot potential malefactors.

The plethora of definitions (not to mention the proliferation of cross-linked sub-definitions) does not make for easy understanding. 

In an attempt to untangle the spaghetti heap I have been experimenting with flowchart visualisations of the more significant and complex data definitions. More of that anon. 

The table below shows where the major varieties of telecommunications data fit in the scheme of the Bill. For simplicity it focuses mainly on bulk powers and also omits definitions of overseas-related communications, overseas-related equipment data and overseas-related information in the bulk equipment interference part of the Bill.  

In general terms the types of metadata obtainable under the bulk interception and interference warrants are broader than those under the powers and bulk warrant for acquisition of communications data.

Subject matter, followed by the data definitions it relies on (component definitions nested beneath)

Communications data retention notice (78(1))
  • Relevant Communications Data (78(9))
      ◦ Communications Data (223(5))

Communications data acquisition: authorisation and notice (53)
  • Communications Data (223(5))
      ◦ Entity Data (223(3))
      ◦ Events Data (223(4))

Restrictions on use of S.53 power to access or process internet connection records (54(4))
  • Internet Connection Records (54(6))
      ◦ Communications Data (223(5))

Bulk communications data acquisition warrant (138)
  • Communications Data (223(5))
      ◦ Entity Data (223(3))
      ◦ Events Data (223(4))

Bulk interception warrant (119)
  • Communications (223(2))
  • Content (223(6))
  • Intercepted Content (137(1))
  • Relevant Content (134(5))
  • Secondary Data (120(3))
      ◦ Systems Data (225(4))
      ◦ Identifying Data (225(2) and (3))
  • Related Systems Data (119(6))
      ◦ Systems Data (225(4))

Bulk equipment interference warrant (154)
  • Communications (223(2))
  • Protected Material (170(9))
      ◦ [not] Equipment Data (155(5))
      ◦ Private Information (173(1))
  • Equipment Data (155(5))
      ◦ Systems Data (225(4))
      ◦ Identifying Data (225(2) and (3))

Warrant for retention or examination of bulk personal datasets (175)
  • Bulk Personal Dataset (174)

It can be seen that around half a dozen different kinds of power or authority provide routes for the compulsory retention and acquisition of various kinds of metadata. They all have in common that the Bill’s restrictions on selecting and accessing bulk content (an individual located within the British Islands at the time of selection cannot normally be targeted without a further warrant) do not apply.

This is a diagram of the overall metadata ingestion scheme of the Bill.

Turning to the definitions, the Clause 78 power to direct retention of communications data rests on the definition of Relevant Communications Data. Internet Connection Records are a subset of Relevant Communications Data to which Clause 54 applies some access restrictions (although fewer in the Bill than in the draft Bill). 

Relevant Communications Data in turn depends on the dividing line between Content and Communications Data. The definition of content interfaces separately with Systems Data. The draft Codes of Practice released with the Bill suggest that it is possible for communications to consist entirely of Systems Data and so contain no content.

What the definition of content lacks in companions it makes up for in conceptual difficulty. The Parliamentary Joint Committee scrutinising the draft Bill remarked on this.

Communications Data consists of either Entity Data or Events Data, to which different levels of authorisation apply under the targeted communications data access regime in Part 3 of the Bill. This is the equivalent of the current RIPA communications data access regime under which over 500,000 access demands are made on communications service providers annually.

Turning to bulk powers, the bulk communications data acquisition warrant authorises the obtaining of Communications Data. A bulk interception warrant authorises the interception of Secondary Data in addition to content. Secondary Data is the Bill’s version of what under RIPA is known as Related Communications Data. Secondary Data consists of either Systems Data (as before) or Identifying Data. Unlike RIPA, the Bill will allow metadata contained within the content of a communication to be scraped and no longer treated as content. 

Similarly a bulk equipment interference warrant authorises the obtaining of Equipment Data, a close cousin of Secondary Data.

Last, a bulk interception warrant also authorises the obtaining of Related Systems Data from telecommunications operators. 

That's all about the metadata.

The chief remaining omission from the visualisations is Protected Material in S.170(9). This is the bulk equipment interference warrant equivalent of Content. As such it defines the material for which a targeted examination warrant is necessary if it is to be selected for examination by reference to an individual known to be located in the British Islands. 

The definition contains a triple negative that presents a considerable challenge to parse and represent graphically. Instead, here is the unadorned raw text to ponder:
“protected material” means any material obtained under the warrant other than material which is -

(a) equipment data;
(b) information (other than a communication or equipment data) which is not private information.”
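
One way to untangle the triple negative is to reduce it to booleans. This is a toy model only, assuming (hypothetically) that each item of material can simply be tagged as a communication, as equipment data and/or as private information; in practice each of those tags is itself a contested judgment. The first function encodes the definition literally, the second multiplies out the negatives, and the helper checks by brute force that the two agree on every combination of inputs. The function names are illustrative, not anything found in the Bill.

```python
from itertools import product


def is_protected_material_literal(is_communication: bool,
                                  is_equipment_data: bool,
                                  is_private_information: bool) -> bool:
    """Literal reading of S.170(9): anything obtained under the warrant other
    than (a) equipment data or (b) information (other than a communication or
    equipment data) which is not private information."""
    excluded_a = is_equipment_data
    excluded_b = (not is_communication
                  and not is_equipment_data
                  and not is_private_information)
    return not (excluded_a or excluded_b)


def is_protected_material_simplified(is_communication: bool,
                                     is_equipment_data: bool,
                                     is_private_information: bool) -> bool:
    """The same test with the negatives multiplied out: not equipment data,
    and either a communication or private information."""
    return (not is_equipment_data) and (is_communication or is_private_information)


def truth_table():
    """Return (communication, equipment_data, private, protected) for all 8 cases,
    checking that the literal and simplified readings agree on each one."""
    rows = []
    for comm, equip, private in product([False, True], repeat=3):
        literal = is_protected_material_literal(comm, equip, private)
        simple = is_protected_material_simplified(comm, equip, private)
        assert literal == simple
        rows.append((comm, equip, private, literal))
    return rows


if __name__ == "__main__":
    print("communication equipment_data private -> protected")
    for comm, equip, private, protected in truth_table():
        print(f"{comm!s:13} {equip!s:14} {private!s:7} -> {protected}")
```

On this reading, a communication is protected material unless it is equipment data, while other information is protected material only if it is private information (and not equipment data).
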
Relevant Content crops up in relation to targeted examination warrants in Part 1. It means 'any content of communications intercepted by an interception authorised or required by a bulk interception warrant'. 

Intercepted Content, in relation to a bulk interception warrant in Part 6, is defined almost identically: 'any content of communications intercepted by an interception authorised or required by the warrant'.

Tuesday, 15 March 2016

Relevant Communications Data revisited

One of the more critical definitions in the Investigatory Powers Bill is 'relevant communications data'. This determines the scope of the Secretary of State's power under Section 78 to direct telecommunications operators (both public and private networks) to generate, obtain and retain communications data including (but by no means limited to) so-called internet connection records (site browsing histories).

It is also one of the most complex definitions in the Bill. The draft Bill version consisted of 14 interlinked definitions and sub-definitions. If anything it has become even more complex in the Bill itself, now expanded to 16 definitions and sub-definitions. On the upside, at least we now have only one definition of internet connection records.

For the draft Bill I attempted a visualisation of the web of definitions that make up 'Relevant communications data'.  

Here is my updated version for the Bill, accompanied by a colour-coded reference list of the definitions: all 985 words of them.

Reference list of definitions

78(9): In this Part “relevant communications data” means communications data which may be used to identify, or assist in identifying, any of the following—

(a) the sender or recipient of a communication (whether or not a person),
(b) the time or duration of a communication,
(c) the type, method or pattern, or fact, of communication,
(d) the telecommunication system (or any part of it) from, to or through which, or by means of which, a communication is or may be transmitted, or
(e) the location of any such system,
and this expression therefore includes, in particular, internet connection records.
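
Structurally, 78(9) works as a filter over communications data: an item qualifies if it may help identify at least one of the five listed matters. A minimal sketch, assuming (hypothetically) that each item could be tagged with the limbs it may help to identify; deciding those tags is where the real argument lies, and the `Limb` labels are invented for illustration.

```python
from enum import Enum
from typing import Iterable


class Limb(Enum):
    """Hypothetical labels for the five matters listed in 78(9)."""
    SENDER_OR_RECIPIENT = "a"
    TIME_OR_DURATION = "b"
    TYPE_METHOD_PATTERN_OR_FACT = "c"
    TELECOMMUNICATION_SYSTEM = "d"
    LOCATION_OF_SYSTEM = "e"


def is_relevant_communications_data(is_communications_data: bool,
                                    may_help_identify: Iterable[Limb]) -> bool:
    """Toy reading of 78(9): communications data that may be used to identify,
    or assist in identifying, at least one of the matters in limbs (a) to (e)."""
    # The limbs are alternatives, so a single match is enough.
    return is_communications_data and any(isinstance(m, Limb) for m in may_help_identify)
```

Because limb (c) extends to the mere "fact" of communication, on this toy reading any item of communications data that may help show a communication took place would pass the filter.
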

54(6): In this Act “internet connection record” means communications data which -

(a) may be used to identify, or assist in identifying, a telecommunications service to which a communication is transmitted by means of a telecommunication system for the purpose of obtaining access to, or running, a computer file or computer program, and
(b) comprises data generated or processed by a telecommunications operator in the process of supplying the telecommunications service to the sender of the communication (whether or not a person).

223(2): “Communication”, in relation to a telecommunications operator, telecommunications service or telecommunication system, includes—
(a) anything comprising speech, music, sounds, visual images or data of any description, and
(b) signals serving either for the impartation of anything between persons, between a person and a thing or between things or for the actuation or control of any apparatus.

223(13): “Telecommunication system” means a system (including the apparatus comprised in it) that exists (whether wholly or partly in the United Kingdom or elsewhere) for the purpose of facilitating the transmission of communications by any means involving the use of electrical or electromagnetic energy.

223(11) and (12): “Telecommunications service” means any service that consists in the provision of access to, and of facilities for making use of, any telecommunication system (whether or not one provided by the person providing the service).

For the purposes of subsection (11), the cases in which a service is to be taken to consist in the provision of access to, and of facilities for making use of, a telecommunication system include any case where a service consists in or includes facilitating the creation, management or storage of communications transmitted, or that may be transmitted, by means of such a system.

223(10): “Telecommunications operator” means a person who—

(a) offers or provides a telecommunications service to persons in the United Kingdom, or
(b) controls or provides a telecommunication system which is (wholly or partly)—
(i) in the United Kingdom, or
(ii) controlled from the United Kingdom.

223(5): “Communications data”, in relation to a telecommunications operator, telecommunications service or telecommunication system, means entity data or events data

(a) which is (or is to be or is capable of being) held or obtained by, or on behalf of, a telecommunications operator and—
(i) is about an entity to which a telecommunications service is provided and relates to the provision of the service,
(ii) is comprised in, included as part of, attached to or logically associated with a communication (whether by the sender or otherwise) for the purposes of a telecommunication system by means of which the communication is being or may be transmitted, or
(iii) does not fall within sub-paragraph (i) or (ii) but does relate to the use of a telecommunications service or a telecommunication system,
(b) which is available directly from a telecommunication system and falls within sub-paragraph (ii) of paragraph (a), or
(c) which—
(i) is (or is to be or is capable of being) held or obtained by, or on behalf of, a telecommunications operator,
(ii) is about the architecture of a telecommunication system, and
(iii) is not about a specific person,
but does not include any content of a communication or anything which, in the absence of subsection (6)(b), would be content of a communication.
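
The logical shape of 223(5) is easier to see as a boolean expression: entity or events data, satisfying at least one of limbs (a), (b) and (c), and not caught by the closing exclusion. A minimal sketch, assuming hypothetical yes/no answers to each statutory question; the `Item` fields are invented labels, and the real difficulty of answering each question is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical yes/no answers to the questions 223(5) asks about an item."""
    entity_or_events_data: bool
    held_by_operator: bool              # held or obtained by or for an operator
    about_serviced_entity: bool         # (a)(i)
    attached_for_transmission: bool     # (a)(ii)
    relates_to_use: bool                # (a)(iii)
    directly_from_system: bool          # (b)
    about_architecture: bool            # (c)(ii)
    about_specific_person: bool         # (c)(iii) requires this to be False
    content_or_content_but_for_6b: bool  # closing exclusion


def is_communications_data(x: Item) -> bool:
    """Toy reading of 223(5)."""
    # (a): held by an operator and within (i), (ii) or (iii).
    limb_a = x.held_by_operator and (
        x.about_serviced_entity or x.attached_for_transmission or x.relates_to_use)
    # (b): available directly from a system and within (a)(ii).
    limb_b = x.directly_from_system and x.attached_for_transmission
    # (c): held by an operator, about system architecture, not about a specific person.
    limb_c = (x.held_by_operator and x.about_architecture
              and not x.about_specific_person)
    return (x.entity_or_events_data
            and (limb_a or limb_b or limb_c)
            and not x.content_or_content_but_for_6b)
```

The last condition captures the interface with 223(6): data that escapes being content only because of the systems data carve-out in 223(6)(b) is not communications data either.
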

225(1): “data” includes data which is not electronic data and any information (whether or not electronic),

223(3): “Entity data” means any data which—

(a) is about—
(i) an entity,
(ii) an association between a telecommunications service and an entity, or
(iii) an association between any part of a telecommunication system and an entity,
(b) consists of, or includes, data which identifies or describes the entity (whether or not by reference to the entity’s location), and
(c) is not events data.

223(4): “Events data” means any data which identifies or describes an event (whether or not by reference to its location) on, in or by means of a telecommunication system where the event consists of one or more entities engaging in a specific activity at a specific time.
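
Entity data and events data are drafted so as not to overlap: 223(3)(c) provides that entity data "is not events data". In effect the events data test takes precedence. A minimal sketch of that ordering, with hypothetical boolean inputs standing in for the statutory tests (the function and its return labels are illustrative only):

```python
from typing import Optional


def classify(identifies_or_describes_event: bool,
             about_entity_or_association: bool,
             identifies_or_describes_entity: bool) -> Optional[str]:
    """Return 'events data', 'entity data' or None.

    The inputs are hypothetical stand-ins for the tests in 223(3) and 223(4);
    deciding them for real data is not modelled here.
    """
    # 223(4): data identifying or describing an event (one or more entities
    # engaging in a specific activity at a specific time) is events data.
    if identifies_or_describes_event:
        return "events data"
    # 223(3)(a) to (c): about an entity (or an association with a service or
    # system), identifies or describes it, and is not events data.
    if about_entity_or_association and identifies_or_describes_entity:
        return "entity data"
    return None
```

On this reading, data that both describes a subscriber and records that subscriber's activity at a specific time is events data rather than entity data, a distinction that matters because different levels of authorisation apply to each under Part 3.
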

223(7): “Entity” means a person or thing.

225(1): “person” (other than in Parts 2 and 5) includes an organisation and any association or combination of persons

223(6): “Content”, in relation to a communication and a telecommunications operator, telecommunications service or telecommunication system, means any element of the communication, or any data attached to or logically associated with the communication, which reveals anything of what might reasonably be considered to be the meaning (if any) of the communication, but—
(a) any meaning arising from the fact of the communication or from any data relating to the transmission of the communication is to be disregarded, and
(b) anything which is systems data is not content.
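
The content definition can be read as a basic test with two overrides: does the element or associated data reveal anything of the meaning of the communication; disregard meaning that arises only from the fact of the communication or from transmission data; and exclude systems data altogether. A minimal sketch, assuming hypothetical boolean inputs where in reality each is a judgment rather than a flag:

```python
def is_content(reveals_meaning: bool,
               meaning_only_from_fact_or_transmission: bool,
               is_systems_data: bool) -> bool:
    """Toy reading of 223(6)."""
    # (b): systems data is never content, whatever it reveals.
    if is_systems_data:
        return False
    # (a): meaning arising only from the fact of the communication, or from
    # data relating to its transmission, is disregarded.
    if meaning_only_from_fact_or_transmission:
        return False
    return reveals_meaning
```

The (b) override is what allows, as the draft Codes of Practice suggest, a communication to consist entirely of systems data and so contain no content.
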

225(4): In this Act “systems data” means any data that enables or facilitates, or identifies or describes anything connected with enabling or facilitating, the functioning of any of the following—
(a) a postal service;
(b) a telecommunication system (including any apparatus forming part of the system);
(c) any telecommunications service provided by means of a telecommunication system;
(d) a relevant system (including any apparatus forming part of the system);
(e) any service provided by means of a relevant system.

225(5): For the purposes of subsection (4), a system is a “relevant system” if any communications or other information are held on or by means of the system.

225(1): “apparatus” includes any equipment, machinery or device (whether physical or logical) and any wire or cable