For several years, DNS-OARC has been collecting DNS query data “from busy and interesting DNS name servers” as part of an annual “Day-in-the-Life” (DITL) effort, originated by CAIDA in 2002, that I discussed in the first blog post in this series. DNS-OARC currently offers eight such data sets, covering the queries to many but not all of the 13 DNS root servers (and some non-root data) over a period of two days or longer each year from 2006 to the present. With tens of billions of queries, the data sets provide researchers with a broad base of information about how the world is interacting with the global DNS as seen from the perspective of root and other name server operators.
For second-level domain (SLD) blocking to mitigate the risk of name collisions for a given gTLD, the SLDs associated with at-risk queries must occur with sufficient frequency and geographical distribution to be captured in the DITL data sets with high probability. Because it is a purely quantitative countermeasure, based only on the occurrence of a query and not on the context around it, SLD blocking offers no model for distinguishing at-risk queries from queries that are not at risk. Consequently, to be effective, SLD blocking must rely on a stronger assumption: that any queries involving a given SLD occur with sufficient frequency and geographical distribution to be captured with high probability.
Put another way, the DITL data set – limited in time to an annual two-day period and in space to the name servers that participate in the DITL study – offers only a sample of the queries from installed systems, not statistically significant evidence of their behavior and of which at-risk queries are actually occurring.
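To make the sampling concern concrete, here is a rough back-of-the-envelope sketch in Python. It is not a model of the actual DITL collection. It assumes, purely for illustration, that queries for a given SLD arrive as a Poisson process at a constant rate and that each query has a fixed chance of reaching a participating name server. The function name `capture_probability` and the parameters `queries_per_year`, `window_days`, and `server_coverage` are hypothetical, and the sketch ignores caching, seasonality, and geography.

```python
import math


def capture_probability(queries_per_year: float,
                        window_days: float = 2.0,
                        server_coverage: float = 1.0) -> float:
    """Chance that at least one query for an SLD lands in the sample.

    Assumption (not from the DITL data): Poisson arrivals at a constant
    rate, with each query independently reaching a participating server
    with probability `server_coverage`.
    """
    if queries_per_year < 0 or window_days < 0:
        raise ValueError("rate and window must be non-negative")
    if not 0.0 <= server_coverage <= 1.0:
        raise ValueError("server_coverage must be between 0 and 1")
    # Expected number of queries observed inside the collection window.
    expected_seen = queries_per_year * (window_days / 365.0) * server_coverage
    # P(at least one observed) = 1 - P(none observed) for a Poisson count.
    return 1.0 - math.exp(-expected_seen)


if __name__ == "__main__":
    # Hypothetical rates, from a once-a-year query to ten per day.
    for rate in (1, 12, 52, 365, 3650):
        p = capture_probability(rate, window_days=2.0, server_coverage=0.5)
        print(f"{rate:>5} queries/year -> capture probability {p:.3f}")
```

Under these simplifying assumptions, an SLD that is queried only a handful of times a year (say, by a quarterly or annual process on an installed system) is unlikely to show up in a two-day window, which is the gap the stronger assumption has to cover.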