The blogosphere has played an instrumental role in the transition and the evolution of linking technologies and practices. This research traces and maps historical changes in the Dutch blogosphere and the interconnections between blogs, which — traditionally considered — turn a set of blogs into a blogosphere. This paper will discuss the definition of the blogosphere by asking who the actors are which make up the blogosphere through its interconnections. This research aims to repurpose the Wayback Machine so as to trace and map transitions in linking technologies and practices in the blogosphere over time by means of digital methods and custom software. We are then able to create yearly network visualizations of the historical Dutch blogosphere (1999–2009). This approach allows us to study the emergence and decline of blog platforms and social media platforms within the blogosphere and it also allows us to investigate local blog cultures.

Read the full article at First Monday.

Acknowledgments

We would like to express our sincere thanks to Erik Borra for developing custom tools, discussing methods and probing sharp questions, Mathieu Jacomy for his help with creating Gephi maps and giving access to the G-Atlas tool for analyzing maps and Jan–Willem Hiddink and Robert–Reinder Nederhoed for providing a database dump from “Loglijst”. Last but not least, we would like to thank Marguerite Lely and Anneke Agema for their editorial advice.


Authors

Anne Helmond is a Ph.D. candidate with the Digital Methods Initiative, the New Media Ph.D. program at the Department of Media Studies, University of Amsterdam. In her research she focuses on software–engine relations in the blogosphere and cross-syndication politics in social media. She also teaches new media courses in the Media Studies Department.

Esther Weltevrede is a Ph.D. candidate with the Digital Methods Initiative, the New Media Ph.D. program at the Department of Media Studies, University of Amsterdam, where she also teaches. Esther’s research interests include national Web studies as well as platform and engine politics. Additionally Esther has been coordinating the DMI Summer schools and is also a member of Govcom.org, a foundation dedicated to creating political Web tools.


TLD analysis

The top-level domain (TLD) analysis presented here is part of a larger series of URL analysis methods discussed in this paper. As a first step we counted the TLDs of our starting points per year by entering URLs in batches corresponding to a single year using the TLDCount tool. Figure 1 shows the relative distribution of TLD usage over time. The Dutch blogs in our collection favor the .nl domain over all other domains throughout the years. Moreover, the share of the .nl domain increases markedly, whereas the .com domain steadily loses share over time; our preliminary findings in the next section show that Dutch bloggers move away from .com blogging platforms such as Blogger’s Blogspot to Dutch .nl blogging platforms.
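The counting step described above can be sketched as follows. This is a minimal illustration, not the TLDCount tool itself: the function names and the sample URLs are assumptions, and the TLD is taken to be simply the last label of the hostname.

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_of(url):
    """Return the last hostname label, e.g. 'nl' for www.example.nl."""
    # urlsplit only finds a hostname when a scheme or '//' precedes it.
    host = urlsplit(url if "://" in url else "//" + url).hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else ""


def tld_shares(urls):
    """Relative TLD distribution for one yearly batch of URLs."""
    counts = Counter(t for t in map(tld_of, urls) if t)
    total = sum(counts.values())
    return {tld: n / total for tld, n in counts.items()} if total else {}


# Hypothetical batch for a single year (illustrative URLs only).
batch_2002 = [
    "http://a.example.nl/",
    "http://b.example.nl/",
    "http://c.blogspot.com/",
    "d.example.tk",
]
shares_2002 = tld_shares(batch_2002)
```

Running one batch per year and plotting the resulting shares side by side would give a chart of the kind shown in Figure 1.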

The Dutch .nl domain is one of the top five country code top level domains (ccTLDs) in the world, which is also reflected in the Dutch blogs. It is, however, remarkable that the .nl domain has been dominant from the beginning, since it only became available to private individuals in 2003. As a forerunner, from 2000 onwards individuals were allowed to register third–level domains such as jansen.123.nl, but these domains were rather rare and are absent from our collection of blogs. As stated before, the Dutch blog collection contains a number of .be blogs, steady from 2000 onwards. Furthermore, 2002 presents a peak of .tk. Dot.tk (“Renaming the Internet”) offers free domain names and includes URL redirection and forwarding services. Lastly, a number of domains are unconventionally used for “commercial or vanity” purposes, including .nu (country code for Niue), marketed as ‘now’ in Dutch, and .is (country code for Iceland), which is used as the verb ‘to be’.

Figure 1: Relative distribution of Top Level Domains (TLDs) in the Dutch blogs over time.


Platform analysis

A second way to answer the question ‘Where do bloggers blog?’, complementing the TLD analysis, is by visualizing the variety and proportion of blog platforms used in the Dutch blogosphere. This requires basic background knowledge of blog platforms. With the use of Google Refine, “a power tool for working with messy data”, we ‘coded’ each of the blog platforms in GREL (Google Refine Expression Language) to automatically search, transform and count the platforms in our set of URLs. The results are presented in Figure 2, a visualization combining the blog platform analysis with the self-hosted software analysis as discussed in the next section.
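The ‘coding’ of platforms can be illustrated outside Google Refine as well. The sketch below is written in Python rather than GREL and is not the expression set used in the study; the suffix table is a small, illustrative assumption and would need to be extended with every platform in the collection.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative mapping from hostname suffix to platform label (an assumption,
# not the full list coded in the study).
PLATFORM_SUFFIXES = {
    "blogspot.com": "Blogspot",
    "wordpress.com": "WordPress.com",
    "web-log.nl": "Web-Log.nl",
    "pitas.com": "Pitas",
}


def platform_of(url):
    """Return the platform label for a blog URL, or None if no suffix matches."""
    host = urlsplit(url if "://" in url else "//" + url).hostname or ""
    for suffix, name in PLATFORM_SUFFIXES.items():
        # Match the suffix on a label boundary so 'notblogspot.com' is ignored.
        if host == suffix or host.endswith("." + suffix):
            return name
    return None


def platform_counts(urls):
    """Count platforms in a set of URLs; unmatched URLs are grouped together."""
    return Counter(platform_of(u) or "other/self-hosted" for u in urls)
```

URLs that match no suffix are grouped under a single label here; in the study such blogs are examined further in the self-hosted software analysis.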

The graph shows the rise and popularity of Blogger’s platform, Blogspot, in the beginning of 2000. The decline of Blogspot coincides with the rise of the Web–Log.nl blogging platform, and other Dutch blog platforms such as BlogNL, Blogo, Blogse, Punt and Blogeiland. Figure 2 clearly shows how from 2004–2005 onwards Dutch bloggers — except for a relatively small number of Blogspot and WordPress.com users — shift to Dutch platforms, which are orange color–coded. Only a few bloggers remain on legacy platforms such as Pitas, which no longer accept new members but are still functional for old members.

Dutch software and platforms play an important role in the Dutch blogosphere and between 2004 and 2009 over 40 percent of all bloggers use Dutch blog software or Dutch blog platforms. When zooming into the use of platforms only, in 2009 almost all bloggers on blog platforms make use of Dutch platforms (see Figure 3).

Figure 2: Relative distribution of self–hosted blog software & blog platforms in Dutch blogs.

Figure 3: The relative amount of Dutch blog platforms over time compared to other blog platforms.


Self–hosted software analysis

URLs were analyzed to investigate the distribution of TLDs and platforms used in the Dutch blogosphere. The outcome suggests that the early Dutch bloggers did not use blog platforms. In general, they preferred to manually create their blogs, written in HTML or they used specifically designed self–hosted blog software. In HTML, the reverse-chronology, which is considered to be a key characteristic of blogs (Blood 2004; boyd 2006), had to be manually enforced in order to place the latest blog post on top. In order to include these kinds of blogs in our analysis we developed a method going beyond the blog’s URL. We searched within the page’s source code to look for the URL referencing the software powering the blog to create an accurate list of blog software.

Initially, the list was compiled by analyzing maps and then refined with newly discovered blog software throughout the research project. To compile the list of self–hosting software, we used the reflexivity of bloggers. Typically, bloggers tend to analyze and describe the practice of blogging (Hourihan, 2002; Blood, 2002). When researching our initial list of software, we found blog posts comparing or mentioning different types of software. For each year we searched the source code of the collection of archived blog front pages for the presence of the blog software types with the sourceCodeSearch tool. The results were editorially checked to establish whether the reference to the software implied that the blog was indeed running on it. Especially in the beginning, references to self–hosted blog software were not standardized. In later years the ‘powered by’ button in the side bar or footer became standard for most self–hosting software.
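The search within the page source can be sketched as below. This is an assumption-laden illustration rather than the sourceCodeSearch tool: the signature table lists a few software homepage domains as hypothetical patterns, and a match only produces a candidate that still has to be checked editorially, as described above.

```python
import re

# Hypothetical signatures: references to the software's homepage in the HTML.
SOFTWARE_SIGNATURES = {
    "WordPress": re.compile(r"wordpress\.org", re.I),
    "Movable Type": re.compile(r"movabletype\.org", re.I),
    "Nucleus": re.compile(r"nucleuscms\.org", re.I),
    "Pivot/PivotX": re.compile(r"pivotx\.net|pivotlog\.net", re.I),
}


def software_candidates(html):
    """Return the sorted names of all software referenced in a page's source.

    A hit is only a candidate: a blog post that merely mentions or compares
    software also matches, so results need a manual check.
    """
    return sorted(
        name for name, pattern in SOFTWARE_SIGNATURES.items() if pattern.search(html)
    )
```

A page returning more than one candidate is a typical case for manual review, since it may be a post comparing software rather than a ‘powered by’ reference.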

Contrary to the blog platform counts, the self–hosted blog software results suggest that the Dutch blog software Pivot/PivotX has been powering Dutch blogs from the start; it appears to have been the most frequently used software in the heyday of Dutch blogging. The decline of Blogger, the first blog platform used by Dutch bloggers, coincides with the rise of Blogspot, Blogger’s platform. Furthermore, the bar graph shows a boost of blogs powered by WordPress.org in the blogosphere from 2006 onwards. Movable Type and the Belgian Nucleus have a small but loyal share of bloggers running the software.

In terms of blog software and blog platforms, the peak of Dutch blogs was around 2005 for platforms and 2006 for software. Notably, the share of self–hosted software exceeds that of one–click publishing platforms, which even the bloggers themselves had not expected. A number of posts from early bloggers express fear that soon everybody will be blogging; others voice rivalry between self–hosting bloggers and platform bloggers.


Network visualization over time

Mapping the outlinks of the blogs we retrieved from the Internet Archive from 1999 until 2009 allows us to go back in time and study how and where the Dutch blogosphere originated. The network is visualized with Gephi for each year. Figure 4 shows the rise, evolution and first signs of decline of the Dutch blogosphere, grey depicting the hyperlink network of all years together and red the blogosphere of a particular year. The first Dutch bloggers starting mid 1999 were not interlinked into a ‘sphere’, so we can trace back the beginning of a structural Dutch blogosphere to 2000.
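Preparing the outlinks for a yearly network visualization can be sketched as follows. The input format (year, source URL, target URL) and the function names are assumptions made for illustration; the custom software used in the study is not reproduced here. The output is a simple edge list with Source and Target columns, a layout that Gephi's spreadsheet import is generally able to read.

```python
import csv
import io
from urllib.parse import urlsplit


def host_of(url):
    """Hostname of a URL without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def yearly_edges(outlinks):
    """Group (year, source_url, target_url) triples into per-year host edges."""
    edges = {}
    for year, source_url, target_url in outlinks:
        source, target = host_of(source_url), host_of(target_url)
        # Skip unparsable URLs and links that stay within one host.
        if source and target and source != target:
            edges.setdefault(year, set()).add((source, target))
    return edges


def edges_to_csv(edge_set):
    """Serialize one year's edges as CSV text with a Source,Target header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(sorted(edge_set))
    return buffer.getvalue()
```

Writing one such file per year, plus one for all years combined, mirrors the yearly and composite networks described above.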

Figure 4: The Dutch blogosphere in transition.


Pre-blogosphere in 1999

Figure 5 shows that some of the known Dutch bloggers, as mentioned in Meeuwsen (2010), together with less well–known bloggers, are present but do not form a blogosphere yet. Most notably, Alt0169, ~wzweers and ~onnoz reach out to other Dutch blogs; their outlinks may be seen as an effort to establish a community between blogs. Exemplary are links to blogs that list blogs, like http://beboo.org/metalog, listing the top 50 (international) blogs.

Figure 5: The pre–blogosphere in 1999. Early blogs linking outward.


Cluster analysis over time

In 2000 the Dutch blogosphere is dominated for the first time (see Figure 6) by bloggers on personal homepage providers (blue) and student pages (pink). The left side of the map shows a loosely defined news–tech cluster of Dutch news sites, surrounded by U.S. and U.K. news and tech blogs. Similar to the early U.S. blogosphere, tech and news are prominent in the Dutch blogosphere (Stevenson, 2010). The right side of the blogosphere shows a cluster of Dutch homepages (~) and student homepages. The free homepage provider DDS and Dutch Internet service provider XS4ALL are the most prominent providers. The larger nodes in the center are the founding blogs of the Dutch blogosphere, such as Alt0169, Sikkema, S-lr, Smoel, Rikmulder, Tonie, Prolific, Pjoe, Stronk, Ben Bender, Vandenb and Retecool. Together they form a closely linked cluster. Alt0169.com, a heavy linker in 1999 that did not receive any links back, is a central node in 2000. Figure 7 shows the Dutch marketing cluster, which emerged in 2005 and is still a very dominant cluster in the Dutch blogosphere. Another distinct cluster in the later blogosphere is the Blog.nl cluster. Blog.nl has a very distinct shape because all Blog.nl blogs list and link the other blogs on that platform, as can be seen on the right in Figure 9.

Figure 6: The Dutch blogosphere in 2000. Note: Blue: personal homepages; Pink: student pages; Yellow: blog platforms.

Figure 7: The Dutch marketing cluster in 2005.


Blog related software: statistics

The newly defined blogosphere includes a variety of blog–related actors. The blogosphere does not only take shape by the interconnections between the blogs but also by the interconnections between the blogs and other actors, such as links to external (blog) services and links to the blog software homepages. Blog related services include portals, manual and automatic blog indexers, external comment services and statistics providers. One of the most prominent nodes since 1999 has been Nedstat, the Dutch statistics provider. Nedstat — and its basic/free service Nedstatbasic — is a Dutch service providing statistics for Web masters and bloggers about their visitors and has been present in the blogosphere together with other statistics providers. Most bloggers publish their statistics, which supports the claim that “the blogosphere is obsessed with measuring, counting, and feeding” (Lovink, 2009, p. 30). Zooming into the node (see Figure 8) shows us all linked bloggers, hence presumably using Nedstat as their statistics provider.

Figure 8: Links to Nedstat.


Social media analysis

The early blogosphere is characterized by larger nodes such as Alt0169, Sikkema and ~wzweers, the founding fathers of the Dutch blogosphere. The heyday of the Dutch blogosphere is characterized by the rise of specific clusters, such as the marketing cluster and the blog platform cluster of Blog.nl, and by the rise of blog related services such as statistics. The later period is characterized by social media and content links. In this social media research project, we aimed at developing methods to analyze more closely the practices between blogs and social media.

When we compare the 2009 blogosphere with and without our custom actor definition (Figure 9), it becomes apparent that the social media platforms call for a more fine–grained analysis. Social media are the big nodes in the network without custom actor definition; however, with custom actor definition the social media platforms seem to lose prominence in the blogosphere.

The question then arises: what do people link to in social media, user pages or content (e.g., video, photo, status update)? Figure 10 shows the large social media platform nodes, containing smaller nodes. Comparing the various social media platforms, the results suggest that some platforms can be defined as ‘media sharing’ platforms, such as YouTube and Flickr, which mainly consist of embedded content links in blogs. In the blogosphere map with actor definition, these nodes decrease in size. Facebook is a relatively small node in the Dutch blogosphere and the links it receives dissolve into a diverse set of profiles, pages, apps, events, and groups. Hyves, the Dutch social network that still outnumbers Facebook in the Netherlands (Comscore, 2011), is one of the smallest social media references. Although the Dutch blogosphere prefers Dutch software and platforms, this is not reflected in social media platform links. Twitter, the largest node in the network, is a platform mainly receiving links to user pages. This means that bloggers refer to themselves or to friends on the micro–blogging platform.
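The distinction between links to user pages and links to content can be sketched with a few URL rules. The patterns below are illustrative assumptions about how these platforms structured their URLs around 2009; they are not the rules used in the study and cover only three platforms.

```python
import re
from urllib.parse import urlsplit

# (domain, path pattern, label); more specific patterns come first.
RULES = [
    ("youtube.com", re.compile(r"^/watch"), "content"),
    ("youtube.com", re.compile(r"^/user/[^/]+/?$"), "user page"),
    ("flickr.com", re.compile(r"^/photos/[^/]+/\d+"), "content"),
    ("flickr.com", re.compile(r"^/photos/[^/]+/?$"), "user page"),
    ("twitter.com", re.compile(r"^/[^/]+/status(es)?/\d+"), "content"),
    ("twitter.com", re.compile(r"^/[^/]+/?$"), "user page"),
]


def link_type(url):
    """Label a social media link as 'content', 'user page' or 'other'."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    for domain, pattern, label in RULES:
        if host == domain and pattern.search(parts.path):
            return label
    return "other"
```

Counting the labels per platform would indicate whether a platform mainly receives content links, as reported for YouTube and Flickr, or user page links, as reported for Twitter.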

Traditional link analysis has its limitations when analyzing the share of social media in blogosphere networks. Our study suggests that the uniform large platform nodes are misleading. We found that link analysis zooms out to look at platforms as a whole and treats the entire platform domain as the node; in doing so the individual content link and the individual author disappear. The platform nodes require a more nuanced exploration.
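The effect of treating the whole platform domain as one node can be made concrete with a small sketch. The finer-grained key used here (hostname plus first path segment) is a simplifying assumption for illustration and not the custom actor definition applied in the study.

```python
from collections import Counter
from urllib.parse import urlsplit


def node_key(url, fine_grained=False):
    """Return the node a link is attributed to.

    Without fine_grained, every link collapses into its platform domain.
    With fine_grained, the first path segment (often a user name) is kept.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    if not fine_grained:
        return host
    segments = [segment for segment in parts.path.split("/") if segment]
    return host + "/" + segments[0] if segments else host


# Hypothetical links found in blogs (illustrative only).
links = [
    "http://twitter.com/alice",
    "http://twitter.com/bob",
    "http://twitter.com/alice/status/1",
]
domain_nodes = Counter(node_key(u) for u in links)
actor_nodes = Counter(node_key(u, fine_grained=True) for u in links)
```

In this toy example the single platform node with three inlinks dissolves into two smaller nodes, one per user, which is the kind of loss of prominence described for the map with actor definition.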

Figure 9: Big social media nodes. The 2009 blogosphere with and without actor definition.

Figure 10: Social media in the 2009 Dutch blogosphere. A fine–grained URL analysis of the Big social media nodes. References to social media platforms demystified.


Download the starting points, used to generate the Dutch blogosphere, in .csv format or explore the network visualizations yourself in the G-Atlas. The G-Atlas, developed by the TIC-Migrations group from Paris, is a piece of software that on the one hand allows researchers to explore their corpus and on the other hand can be used as an analytical tool because it outputs corpus statistics. The G-Atlas contains the networks of each blogosphere from 1999 to 2009 and one composite network of all the years combined.