Monday, July 10, 2017

The .io Error: A Problem With Bad Optics, But Little Substance

It's been a while since I've posted anything here. I've been busy with work and various other projects, and nothing has quite risen to the "gotta write about it" level, until now.

This morning, someone at work drew my attention to a post on The Hacker Blog that they thought warranted some attention. For those who haven't seen it, it purports to describe how the author managed to almost hijack most of the DNS traffic for the .io Top Level Domain. The post has started to receive a bit of attention elsewhere, and while it describes a real mistake on the part of the Backend Registry Operator for the .io TLD, it does not constitute the catastrophe the article implies.

The problem with the article stems from the author's misunderstanding of how delegations in DNS actually work, and the part that the behaviour of both recursive and authoritative name servers has to play in the described "hijack." The author assumes that because he's able to register a domain name that matches several of the authoritative name server names for the .io TLD that it is "likely that clients will randomly select our hijacked nameservers over any of the legitimate nameservers..." This is wrong.

The author demonstrates his "hijack" by pasting these results to a DNS query:

; <<>> DiG 9.8.3-P1 <<>> NS ns-a1.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8052
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ns-a1.io.          IN  NS

;; ANSWER SECTION:
ns-a1.io.       86399   IN  NS  ns2.networkobservatory.com.
ns-a1.io.       86399   IN  NS  ns1.networkobservatory.com.

;; Query time: 4 msec
;; SERVER: 2604:5500:16:32f9:6238:e0ff:feb2:e7f8#53(2604:5500:16:32f9:6238:e0ff:feb2:e7f8)
;; WHEN: Wed Jul  5 08:46:44 2017
;; MSG SIZE  rcvd: 84

In order to poison a DNS server, you must understand the normal queries that would typically be sent by that server and be in a position to answer one of them with a crafted response, or be in a position to trigger specific abnormal queries that will elicit your poisoning response. The problem with the example query is that it would never be sent by a typical client, without some sort of abnormal prompting.

Since the author doesn't claim any ability to trigger unusual queries in arbitrary recursive servers, in order to evaluate the attack we should look at typical queries that would be sent by a recursive name server trying to look up an .io domain. To see what would actually happen in his attack, let's see what would happen if someone were to look up the A record for 'bit.io' (the first .io domain that popped into my head). We'll assume an empty cache in order to give the attacker the greatest advantage, but skip the root priming query for simplicity, and because it wouldn't be relevant here.

The first query that will be sent is to the root. Because recursive servers are normally trying to get the most work done with the least effort, they always send the query for the information they're actually interested in, and deal with whatever response they get (an exception to this is a new option in the lookup algorithm called QNAME minimisation, but it would have no effect here). Therefore, the server does not ask the root for the .io name servers; instead, it asks for the A record for bit.io.

; <<>> DiG 9.11.1-P1 <<>> @a.root-servers.net +norec IN A bit.io.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21410
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 7, ADDITIONAL: 13

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;bit.io.    IN A

;; AUTHORITY SECTION:
io.   172800 IN NS a0.nic.io.
io.   172800 IN NS b0.nic.io.
io.   172800 IN NS c0.nic.io.
io.   172800 IN NS ns-a1.io.
io.   172800 IN NS ns-a2.io.
io.   172800 IN NS ns-a3.io.
io.   172800 IN NS ns-a4.io.

;; ADDITIONAL SECTION:
a0.nic.io.  172800 IN A 65.22.160.17
b0.nic.io.  172800 IN A 65.22.161.17
c0.nic.io.  172800 IN A 65.22.162.17
ns-a1.io.  172800 IN A 194.0.1.1
ns-a2.io.  172800 IN A 194.0.2.1
ns-a3.io.  172800 IN A 74.116.178.1
ns-a4.io.  172800 IN A 74.116.179.1
a0.nic.io.  172800 IN AAAA 2a01:8840:9e::17
b0.nic.io.  172800 IN AAAA 2a01:8840:9f::17
c0.nic.io.  172800 IN AAAA 2a01:8840:a0::17
ns-a1.io.  172800 IN AAAA 2001:678:4::1
ns-a2.io.  172800 IN AAAA 2001:678:5::1

;; Query time: 23 msec
;; SERVER: 198.41.0.4#53(198.41.0.4)
;; WHEN: Mon Jul 10 16:06:40 EDT 2017
;; MSG SIZE  rcvd: 422

The root servers respond with a delegation to the .io name servers. So far this meets with the attacker's requirements.

For the second query, the client will select one of the .io name servers in the previous response, and send it the same query.

; <<>> DiG 9.11.1-P1 <<>> @a0.nic.io. +norec IN A bit.io.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15355
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;bit.io.    IN A

;; AUTHORITY SECTION:
bit.io.   86400 IN NS ns1.dreamhost.com.
bit.io.   86400 IN NS ns3.dreamhost.com.
bit.io.   86400 IN NS ns2.dreamhost.com.

;; Query time: 25 msec
;; SERVER: 65.22.160.17#53(65.22.160.17)
;; WHEN: Mon Jul 10 16:08:47 EDT 2017
;; MSG SIZE  rcvd: 102

The name server responds with the list of name servers for the bit.io domain. Note the very important difference between this and the example given in the original article: the list of .io name servers is nowhere to be seen. The client will go on and ask one of the Dreamhost name servers the same question:

; <<>> DiG 9.11.1-P1 <<>> @ns1.dreamhost.com. +norec IN A bit.io.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23214
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 2800
;; QUESTION SECTION:
;bit.io.    IN A

;; AUTHORITY SECTION:
bit.io.   14400 IN SOA ns1.dreamhost.com. hostmaster.dreamhost.com. 2014052400 18794 1800 1814400 14400

;; Query time: 62 msec
;; SERVER: 64.90.62.230#53(64.90.62.230)
;; WHEN: Mon Jul 10 16:11:12 EDT 2017
;; MSG SIZE  rcvd: 99

It turns out there is no A record for bit.io, so the client is going to be disappointed. More importantly, it will not be poisoned with the attacker's NS set. The only set of name servers in this query chain to give the list of .io name servers was the root, and the contents of the root zone are unaffected by what you can convince your Registrar to pass up to your Registry, or your Registry to put in their zone.
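
To make the chain above easier to follow, here is a minimal Python sketch that replays it. It is a toy model, not a resolver: the RESPONSES table and the resolve() function are my own illustrative names, the data is copied from the dig output above, and nothing is sent over the network. It only records which server each NS set was learned from.

```python
# Toy replay of the lookup traced above, using only data copied from the dig
# output in this post. Nothing here touches the network.

# For the A query for bit.io., the servers for each zone returned either a
# referral (child zone plus its NS names) or a final "no data" response.
RESPONSES = {
    "root": ("referral", "io.", [
        "a0.nic.io.", "b0.nic.io.", "c0.nic.io.",
        "ns-a1.io.", "ns-a2.io.", "ns-a3.io.", "ns-a4.io.",
    ]),
    "io.": ("referral", "bit.io.", [
        "ns1.dreamhost.com.", "ns3.dreamhost.com.", "ns2.dreamhost.com.",
    ]),
    "bit.io.": ("nodata", None, []),
}


def resolve():
    """Follow referrals from the root, noting where each NS set came from."""
    learned = {}      # zone -> (server that supplied the NS set, NS names)
    asked = "root"
    while True:
        kind, child, ns_names = RESPONSES[asked]
        if kind != "referral":
            return kind, learned
        learned[child] = (asked, ns_names)
        asked = child  # next, ask the servers for the child zone


outcome, learned = resolve()
print("final response:", outcome)
for zone, (source, names) in learned.items():
    print(f"{zone} NS set learned from {source}: {', '.join(names)}")
```

In this replay the only NS set for io. comes from the root, and the attacker's name servers never appear in anything the client learns.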

The key element here is that the name servers for the .io TLD don't include their own NS set in their responses. The only way you're likely to get that NS set out of those servers is to specifically ask for it, and that's a query rarely performed by your typical recursive DNS server. Even then, the author's attack doesn't work.

Let's use a concrete example. Here's a nearly-empty zone I just created for the TLD "myTLD".

$TTL 3600
@               IN  SOA ns1.localhost.myTLD. hostmaster.localhost.myTLD. (
                        2017071000  ; serial
                        12h         ; refresh
                        15m         ; retry
                        2w          ; expiry
                        1h )        ; negative TTL

                IN  NS  ns1.localhost.myTLD.
                IN  NS  ns2.localhost.myTLD.
                IN  NS  ns3.localhost.myTLD.

ns1.localhost   IN  A   127.0.0.1
ns2.localhost   IN  A   127.0.0.2
ns3.localhost   IN  A   127.0.0.3

example         IN  NS  ns1.example.myTLD.
example         IN  NS  ns2.example.myTLD.
ns1.example     IN  A   192.0.2.1
ns2.example     IN  A   192.0.2.2

The zone contains only those things necessary to make it a valid zone, plus one delegation for example.myTLD. The Hacker Blog author's attack was to add a delegation for one of the name servers. Here's what that would look like in the above zone:

ns1.localhost   IN  NS  ns1.attacker.example.com.
ns1.localhost   IN  NS  ns2.attacker.example.com.

Note that this creates a conflict in the zone. There is both a delegation and an A record for ns1.localhost.myTLD. In all authoritative DNS servers this converts that A record from an authoritative record in the zone to an "occluded name." In simple terms, the A record is hidden by the presence of a delegation at the same point in the tree.

Querying this zone for the A record for www.example.myTLD has the same results as for bit.io in the above tests, but what happens if we query this TLD name server for its own NS set?

; <<>> DiG 9.11.1-P1 <<>> -p 5053 @localhost +norec IN NS myTLD.
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5117
;; flags: qr aa; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 17c0de220c7de23977eeeabd5963eb3576021df81bbff58f (good)
;; QUESTION SECTION:
;myTLD.    IN NS

;; ANSWER SECTION:
myTLD.   3600 IN NS ns1.localhost.myTLD.
myTLD.   3600 IN NS ns2.localhost.myTLD.
myTLD.   3600 IN NS ns3.localhost.myTLD.

;; ADDITIONAL SECTION:
ns2.localhost.myTLD. 3600 IN A 127.0.0.2
ns3.localhost.myTLD. 3600 IN A 127.0.0.3

;; Query time: 0 msec
;; SERVER: 127.0.0.1#5053(127.0.0.1)
;; WHEN: Mon Jul 10 17:01:41 EDT 2017
;; MSG SIZE  rcvd: 158

The occluded A record is not returned in the result, but neither is anything from the attacker's delegation. To get that, you have to ask specifically for that domain name:

; <<>> DiG 9.11.1-P1 <<>> -p 5053 @localhost +norec IN A ns1.localhost.myTLD.
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15774
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 2dcb47dea6b96a9c075f84a15963ebcf3878b68057c0ef7c (good)
;; QUESTION SECTION:
;ns1.localhost.myTLD.  IN A

;; AUTHORITY SECTION:
ns1.localhost.myTLD. 3600 IN NS ns2.attacker.example.com.
ns1.localhost.myTLD. 3600 IN NS ns1.attacker.example.com.

;; Query time: 0 msec
;; SERVER: 127.0.0.1#5053(127.0.0.1)
;; WHEN: Mon Jul 10 17:04:15 EDT 2017
;; MSG SIZE  rcvd: 132

And again, this is not a query that is normally going to be sent by any recursive name server trying to look up 'www.example.myTLD.' That client just wouldn't care. What the author succeeds in doing is to add a delegation to the .io zone. What he needs to do, to redirect any traffic at all, is to get an address into the .io zone. This is not going to happen with the method he's used.
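
The behaviour in the two queries above can be approximated with a short Python sketch. This is a simplified model of how an authoritative server picks between an answer and a referral, not the code of any real server: ZONE, find_cut() and lookup() are hypothetical names, the zone data is the myTLD example with the attacker's delegation added, and details such as SOA records, negative answers and wildcards are left out.

```python
# Simplified model of authoritative lookup in the myTLD example zone.
APEX = "mytld."

# (owner name, record type) -> list of record data, all lower case.
ZONE = {
    (APEX, "NS"): ["ns1.localhost.mytld.", "ns2.localhost.mytld.",
                   "ns3.localhost.mytld."],
    ("ns1.localhost.mytld.", "A"): ["127.0.0.1"],
    ("ns2.localhost.mytld.", "A"): ["127.0.0.2"],
    ("ns3.localhost.mytld.", "A"): ["127.0.0.3"],
    ("example.mytld.", "NS"): ["ns1.example.mytld.", "ns2.example.mytld."],
    ("ns1.example.mytld.", "A"): ["192.0.2.1"],
    ("ns2.example.mytld.", "A"): ["192.0.2.2"],
    # The attacker's delegation, which occludes the A record for ns1.localhost.
    ("ns1.localhost.mytld.", "NS"): ["ns1.attacker.example.com.",
                                     "ns2.attacker.example.com."],
}


def find_cut(name):
    """Return the delegation point at or above name (below the apex), if any."""
    labels = name.rstrip(".").split(".")
    apex_len = len(APEX.rstrip(".").split("."))
    # Walk downwards from just below the apex towards the full name.
    for i in range(len(labels) - apex_len - 1, -1, -1):
        candidate = ".".join(labels[i:]) + "."
        if (candidate, "NS") in ZONE:
            return candidate
    return None


def lookup(qname, qtype):
    """Answer a query for a name assumed to be inside the zone."""
    qname = qname.lower()
    cut = find_cut(qname)
    if cut is not None:
        # At or below a delegation: non-authoritative referral plus any glue.
        targets = ZONE[(cut, "NS")]
        glue = [(t, "A", a) for t in targets for a in ZONE.get((t, "A"), [])]
        return {"aa": False, "answer": [],
                "authority": [(cut, "NS", t) for t in targets],
                "additional": glue}
    answer = [(qname, qtype, r) for r in ZONE.get((qname, qtype), [])]
    additional = []
    if qtype == "NS":
        for _, _, target in answer:
            # Only authoritative (non-occluded) in-zone addresses are added.
            if target.endswith("." + APEX) and find_cut(target) is None:
                additional += [(target, "A", a)
                               for a in ZONE.get((target, "A"), [])]
    return {"aa": True, "answer": answer, "authority": [],
            "additional": additional}


print(lookup("myTLD.", "NS"))
print(lookup("ns1.localhost.myTLD.", "A"))
```

Under these assumptions, the apex NS query returns the three legitimate name servers with addresses for only ns2 and ns3, and the attacker's name servers show up only when the query name is ns1.localhost.myTLD itself.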

In theory, getting an address for one of the TLD's name servers into the TLD zone might be possible if he can register a host record with his registrar, but success assumes several things:
1. That the .io TLD uses host records at all:
   Unbound by the same restrictions as the gTLDs, some ccTLDs don't bother with all of the bells and whistles of the EPP standards. I don't know about .io specifically, but there are some ccTLDs that do not allow the registration of host records.

2. That host records are as free of restrictions as delegations were:
   Assuming the Registry for .io uses host records, they may have a different set of criteria for allowing or disallowing the registration of host records than they do for delegations. For example, it's unlikely you can register a host record that is not a subdomain of a domain you already have the delegation for, and it's not unreasonable to expect that the registry might require the host record to actually be a subdomain, and not be equal to the domain name.

3. That the registry doesn't have safeguards in its publishing that prevent duplicate records for its name server set:
   Depending on how the Registry publishes its zone, it's possible (even likely) that the minimal set of records (the SOA, the apex NS set, and glue for the apex NS set) are handled differently during zone generation than delegations and their glue. The former is likely a static set, and the latter would typically come out of a database. Most Registries have safeguards in place to check the validity of a newly generated zone, and would be wise to include checks of the former "minimal set" in those tests.
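
As an illustration of the kind of check meant by that last point, here is a minimal Python sketch of a pre-publication test. It is an assumption about what such a safeguard could look like, not a description of how the .io Registry or any other registry works; the function names and the sample data (taken from the myTLD example) are hypothetical.

```python
# Hypothetical pre-publication check: flag any delegation that would occlude
# the address records of one of the zone's own (apex) name servers.

def is_at_or_below(name, ancestor):
    """True if name equals ancestor or is a subdomain of it."""
    name, ancestor = name.lower(), ancestor.lower()
    return name == ancestor or name.endswith("." + ancestor)


def occluding_delegations(apex_ns_targets, delegation_owners):
    """Return sorted (delegation, name server) pairs that conflict."""
    return sorted(
        (owner.lower(), target.lower())
        for owner in delegation_owners
        for target in apex_ns_targets
        if is_at_or_below(target, owner)
    )


# The static "minimal set": the name servers listed in the apex NS set.
apex_ns = ["ns1.localhost.myTLD.", "ns2.localhost.myTLD.",
           "ns3.localhost.myTLD."]
# Delegations as they might come out of the registration database.
delegations = ["example.myTLD.", "ns1.localhost.myTLD."]

for owner, target in occluding_delegations(apex_ns, delegations):
    print(f"refusing to publish: delegation {owner} occludes {target}")
```

A zone generator that ran a check like this before publishing would reject the delegation at ns1.localhost.myTLD while leaving example.myTLD alone.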