As someone who has long studied trends in the domain name industry, I have been intrigued for quite some time, and on many levels, by the opening of hundreds of new gTLDs. One question I found myself pondering was: Will new gTLDs create “new” naming trends, or redundant domains across many TLDs? With more than 3 million domains delegated in the new TLD space, there is now a corpus to study to answer this question.
The short answer is clear from these first two pie charts, which illustrate the percentage of the second-level domains (SLDs) that were available in .com as of December 15, 2014:
To answer the redundancy question, I compared the SLDs in new gTLDs with the SLDs in .com. The results show that a significant majority (~84%) of the SLD strings being registered in the new gTLDs are also registered in .com. However, there are 521,834 new gTLD domain names (493,563 unique SLD strings, or 16% of all new gTLD SLDs) that are registered in new gTLDs but not in .com.
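The comparison described above amounts to a set membership test. The following is a minimal sketch, not the code behind the figures in this post: the function names and the toy data are hypothetical, and it assumes you already hold a list of new gTLD domain names and a set of .com SLD strings. It also shows why the domain count can exceed the unique SLD count, since the same string may be registered in several new gTLDs.

```python
def split_domain(name):
    """Split 'example.club' into ('example', 'club')."""
    sld, _, tld = name.lower().rstrip(".").rpartition(".")
    return sld, tld


def distinct_from_com(new_gtld_domains, com_slds):
    """Return (domains, unique SLD strings) registered in new gTLDs but not in .com."""
    domains = [d for d in new_gtld_domains if split_domain(d)[0] not in com_slds]
    unique_slds = {split_domain(d)[0] for d in domains}
    return domains, unique_slds


# Toy data for illustration only (not taken from any zone file).
com_slds = {"andy", "shop"}
new_gtld_domains = ["andy.club", "shop.guru", "pvcbusiness.club", "pvcbusiness.guru"]

domains, unique_slds = distinct_from_com(new_gtld_domains, com_slds)
print(len(domains), len(unique_slds))
```

Here two domain names share one SLD string, so the two counts differ in the same way the 521,834 and 493,563 figures do.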
Next, I looked at whether the combination of the SLD and the new gTLD string is available as an SLD in .com (e.g., Andy.NewgTLD as AndyNewgTLD.com). As can be seen in the second pie chart, when the new gTLD is combined with the SLD, nearly 75% of the names registered in new gTLDs are available in .com today.
Digging deeper, I proceeded to explore which new gTLDs are home to new SLDs that don’t exist in .com.
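The combined-string check can be sketched the same way. This is an illustration under stated assumptions rather than the original analysis code: `combined_label` and `share_available_in_com` are hypothetical names, the data is made up, and the sketch assumes plain ASCII names (IDN labels in Punycode form would need separate handling).

```python
def combined_label(domain):
    """Join the labels of a name: 'Andy.NewgTLD' -> 'andynewgtld'."""
    return domain.lower().rstrip(".").replace(".", "")


def share_available_in_com(new_gtld_domains, com_slds):
    """Fraction of names whose combined SLD+TLD string is not a registered .com SLD."""
    if not new_gtld_domains:
        return 0.0
    available = [d for d in new_gtld_domains if combined_label(d) not in com_slds]
    return len(available) / len(new_gtld_domains)


# Toy data for illustration only.
com_slds = {"shopguru", "bigclub"}
new_gtld_domains = ["Andy.NewgTLD", "shop.guru", "yoga.club", "big.club"]

print(share_available_in_com(new_gtld_domains, com_slds))
```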
For starters, a few gTLDs seem to have a disproportionate number of these strings. The bar chart shows the top new gTLDs that have “distinct” new gTLD names along with the percentage of their zone that is in effect now “distinct” from .com:
One interesting takeaway is that the IDN TLDs all skew higher in terms of the portion of their base that is “distinct”. Intuitively, it seems that this may be an area where broader internationalization in new gTLDs helps the domains make sense (i.e., IDN.IDN).
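A per-TLD breakdown of this kind could be produced with a simple grouping step. The sketch below is an assumption about how one might do it, not the method used for the chart: `distinct_share_by_tld` is a hypothetical name, `idn-example` is a placeholder TLD rather than a real one, and the data is invented.

```python
from collections import Counter


def distinct_share_by_tld(new_gtld_domains, com_slds):
    """Return (tld, distinct_count, distinct_share) rows, largest count first."""
    total, distinct = Counter(), Counter()
    for domain in new_gtld_domains:
        sld, _, tld = domain.lower().rstrip(".").rpartition(".")
        total[tld] += 1
        if sld not in com_slds:
            distinct[tld] += 1
    rows = [(tld, distinct[tld], distinct[tld] / total[tld]) for tld in total]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy data for illustration only; 'idn-example' is a placeholder TLD.
com_slds = {"shop"}
new_gtld_domains = ["a1.club", "shop.club", "x9.idn-example", "y8.idn-example"]

for tld, count, share in distinct_share_by_tld(new_gtld_domains, com_slds):
    print(tld, count, f"{share:.0%}")
```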
A couple of other interesting facts:
- Using a domain tokenization algorithm I have written that identifies domains made up exclusively of English keywords, I found that 153,316 (31%) of the SLDs registered in new gTLDs that are available in .com are keyword-exclusive domains. A few example strings that were available in .com at the time of writing include: pvcbusiness, emailinvention and searchcustomerservicejobs.
- I also observed that end users and/or applications may be confused when using the new gTLDs and may be trying to reach the .com domains in error. I observed this by looking at DNS requests for the new gTLD strings that are available in .com. When examining the DNS requests for each string in .com, I found that more than 20,000 strings began being requested as .com strings only after the new gTLD string was registered. While this is an opportunity for applicants that may wish to acquire the corresponding SLDs within .com, it also further illustrates the universal acceptance challenges that continue to exist with new gTLDs.
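Both observations above can be approximated in a few lines. The tokenization algorithm itself is not described in this post, so the first function substitutes a standard dictionary-based word segmentation (dynamic programming over prefixes) as a stand-in. The second function assumes you have, for each string, the date of its first observed .com DNS request and its new gTLD registration date; those inputs, the function names, and all data shown are hypothetical.

```python
from datetime import date


def is_keyword_exclusive(label, words):
    """True if label can be split entirely into words from the given set."""
    n = len(label)
    # reachable[i] is True when label[:i] can be segmented into dictionary words.
    reachable = [True] + [False] * n
    for end in range(1, n + 1):
        for start in range(end):
            if reachable[start] and label[start:end] in words:
                reachable[end] = True
                break
    return n > 0 and reachable[n]


def requested_only_after_registration(first_com_request, gtld_registered):
    """Strings whose first observed .com request postdates the new gTLD registration."""
    return sorted(
        s for s, seen in first_com_request.items()
        if s in gtld_registered and seen > gtld_registered[s]
    )


# Toy word list and dates for illustration only.
words = {"pvc", "business", "email", "invention"}
print(is_keyword_exclusive("pvcbusiness", words))
print(is_keyword_exclusive("xqzbusiness", words))

first_com_request = {"pvcbusiness": date(2014, 6, 1), "emailinvention": date(2013, 1, 1)}
gtld_registered = {"pvcbusiness": date(2014, 3, 1), "emailinvention": date(2014, 5, 1)}
print(requested_only_after_registration(first_com_request, gtld_registered))
```

A real word list would be far larger, and a production tokenizer would likely need to deal with ambiguous splits, digits and hyphens, which this sketch ignores.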
If you want to learn more about these domains that are innovating in the SLD arena, here is how to go about it:
- Obtain access to the new gTLD zone files from ICANN by using the Centralized Zone Data Service (CZDS) at http://czds.icann.org/
- Obtain access to the .com zone file by following the instructions available here: http://www.verisigninc.com/en_US/channel-resources/domain-registry-products/zone-file-information/index.xhtml. Other established gTLDs also make their zone files public, but those that began operation before the new gTLDs are not yet available in CZDS (except .museum, which has migrated to the new system).
- Once you have secured access, you can download the zone files from the respective servers, per the terms of the zone file access agreements. The files you will receive are essentially server configuration files that an authoritative name server references to determine how to respond when asked about a domain. The files contain various DNS records (typically NS and A) for domains that can be transformed into a list of SLDs that should currently be active within the corresponding TLDs.
- Use your favorite programming language to combine, compare and analyze. I typically use a hybrid of Unix utilities and scripting languages like awk and Perl.
Look for more analysis like this from me in the future. I also hope to see what interesting insights others are able to derive on their own.
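As a rough illustration of the last two steps, the sketch below turns zone file lines into a set of SLDs. It is written in Python rather than the awk and Perl mentioned above, and it rests on assumptions: each record sits on one line, the owner name is fully qualified and comes first, and only NS records are used to decide that a domain is delegated. Real zone files can differ (for example, relative owner names under an `$ORIGIN` directive), so treat `slds_from_zone_lines` as a hypothetical helper to adapt, not a complete parser.

```python
def slds_from_zone_lines(lines, tld):
    """Collect second-level labels that have at least one NS record in the zone."""
    suffix = "." + tld.lower().strip(".") + "."
    slds = set()
    for line in lines:
        if line.startswith((";", "$")):      # comments and directives
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        # The record type sits between the owner name and the record data.
        if "ns" not in (f.lower() for f in fields[1:-1]):
            continue
        owner = fields[0].lower()
        if owner.endswith(suffix):
            # Keep only the label immediately to the left of the TLD.
            label = owner[: -len(suffix)].rpartition(".")[2]
            if label:
                slds.add(label)
    return slds


# Made-up sample lines in the assumed format.
zone = [
    "; sample zone data",
    "club. 3600 in ns a.nic.club.",
    "example.club. 3600 in ns ns1.host.net.",
    "example.club. 3600 in ns ns2.host.net.",
    "ns1.example.club. 3600 in a 192.0.2.1",
    "Other.CLUB. 3600 IN NS ns1.host.net.",
]
club_slds = slds_from_zone_lines(zone, "club")
com_slds = {"example"}
print(sorted(club_slds - com_slds))
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the list, which avoids loading a large zone file into memory at once.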