XSXKCD Project logo
Collection and Scope


We consider whole web pages, including both the data inside the HTML and the images themselves, to be xkcd documents. Within these documents are the titles, images, URLs, bits of JavaScript, and title texts that make up xkcd comics. While this is an expansive document type, it is the only document type included in this collection.

This collection grows three times a week, as Munroe updates xkcd every Monday, Wednesday, and Friday. The software running this website on the back end is capable of intake that keeps up with this release schedule. As more instances of xkcd appear in the wild, it is critical that we include them in our collection.

While that aspect of our collection development is automated, our system of tagging is not. Our subjective tags are critical for providing subject access, and they require work to maintain. Each tag groups instances of xkcd in a way that goes beyond simple keywords. For example, a comic that contains a graph may not include any text or keywords indicating that it does, so today's automated methods cannot produce the subjective groupings that help users find similar concepts across comics. Simply put, maintaining our system of manual classification tags is one of the most critical aspects of this collection.
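
The difference between keyword matching and manual tags can be sketched in a few lines of Python. This is only an illustration: the comic numbers, the text, and the tags below are invented, and `keyword_search`, `build_tag_index`, and `tag_lookup` are hypothetical helpers rather than the software that runs this site.

```python
from collections import defaultdict

# Invented sample data: comic number -> text scraped from the page.
scraped_text = {
    101: "A line graph of enthusiasm over time.",
    102: "Two people stand next to a chart with unlabeled axes.",
    103: "A character talks about sandwiches.",
}

# Invented manual tags, as a human cataloguer might assign them.
manual_tags = {
    101: {"graph"},
    102: {"graph"},
    103: {"food"},
}


def keyword_search(term):
    """Return the comic numbers whose scraped text contains the term."""
    term = term.lower()
    return sorted(n for n, text in scraped_text.items() if term in text.lower())


def build_tag_index(tags):
    """Invert a comic -> tags mapping into a tag -> comics mapping."""
    index = defaultdict(set)
    for number, comic_tags in tags.items():
        for tag in comic_tags:
            index[tag].add(number)
    return index


def tag_lookup(index, tag):
    """Return the comic numbers carrying the given manual tag."""
    return sorted(index.get(tag, set()))


tag_index = build_tag_index(manual_tags)
print(keyword_search("graph"))         # only comics whose text says "graph"
print(tag_lookup(tag_index, "graph"))  # every comic a cataloguer tagged as a graph
```

In this invented sample, comic 102 contains a graph but never uses the word, so only the manual tag retrieves it.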

Our coverage, of course, extends all the way back to the beginning of xkcd, xkcd.com/1/. This coverage is uniform: every element we collect (title text, comic transcript, title, and so on) is included for each instance. As a result, the collection behaves predictably, and users can easily learn to search it well.

The only element of our collection that is not uniform, beyond the subjective tagging of comics, is the inclusion of links. As will be discussed elsewhere, we believe certain external resources to be vital to the audience for this xkcd collection, and we are including them in our records. However, not every comic in the collection has relevant links. As a result, we view this as a value-added service, rather than something that is essential to a collection of xkcd. Simply put, a project to organize the xkcd corpus can work without including relevant external links, but xkcd projects of this kind improve their quality by including these resources. This is a way to further engage the community in the context of our ever-growing collection.
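
One way to picture the split between uniform and non-uniform elements is a record type in which the uniform fields are required and the tags and links default to empty. The Python sketch below is an assumption for illustration only: the class name `ComicRecord`, its field names, and the sample values are hypothetical and do not describe our actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComicRecord:
    # Uniform elements: required for every comic in the collection.
    number: int
    title: str
    image_url: str
    title_text: str
    transcript: str
    published: str  # publication date as text, e.g. "YYYY-MM-DD"
    # Non-uniform elements: may legitimately be empty for a given comic.
    tags: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)

    def page_url(self):
        """Build the comic's page address from its number."""
        return f"https://xkcd.com/{self.number}/"

    def has_links(self):
        """Report whether any external resources are attached."""
        return bool(self.links)


# Placeholder values; they are not the real metadata of any comic.
record = ComicRecord(
    number=1,
    title="Example title",
    image_url="https://example.org/comic.png",
    title_text="Example title text",
    transcript="Example transcript",
    published="2000-01-01",
)
print(record.page_url(), record.has_links())
```

Because `tags` and `links` default to empty lists, a record without external links is still complete, which matches the view of links as a value-added service rather than an essential element.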


As mentioned in the previous section, our definition of xkcd documents is expansive, but there are limits to our scope. First, we are not ascribing certain meanings or points to comics beyond that which is readily accessible in the text we scrape from xkcd proper. In short, we are not like the Explain xkcd wiki (http://www.explainxkcd.com/wiki/index.php?title=Main_Page). We do not claim to know the points of specific xkcd instances. Instead, we provide methods of access (including linking to external resources [more on that feature below]) to allow users to derive meaning or entertainment from xkcd comics themselves.

We do, however, exclude some resources that allow users to derive meaning from xkcd themselves. The first print publication of xkcd, “xkcd: volume 0”, is not included in the catalog because it is not a digital resource and cannot be data mined as the rest of our xkcd documents are. It also modifies instances of xkcd, and the provenance of those modifications is unclear. The book includes comments, notes, annotations, and changes that force us to treat it as its own work. It is not known when the changes to xkcd comics in “volume 0” were made, whereas each xkcd comic has a known publication date that we access.

Beyond the book collection of xkcd, we deem all xkcd merchandise to be beyond the scope of our collection. Many of the goods available in the xkcd store are directly related to xkcd comics, but several are not. Besides that, a more obvious reason for exclusion is that the merchandise available for purchase is simply not comics. The items are t-shirts with images from comics printed on them, or posters printed and signed by the author. To a different collector, these items could easily be construed as documents, but our collection is limited to the comics proper and to providing coherent access to them.

Similarly, we do not include in our collection public events in which Munroe participated or in which xkcd was the focus. Beyond the problem of not knowing about all of the events that could fit in that category, and the probably impossible task of discovering each and every one, they are, again, simply not comics. Munroe often references his webcomic in speeches and appearances, but audience interaction and lines of inquiry that go beyond the comic disqualify these events, on top of the simple disqualification that they are not pure comics. Public events are not digital comics.

Although it is hosted on the same website as the xkcd comics, we exclude “what-if”, a sister project of xkcd. Every Tuesday Munroe answers questions with article-length explanations of situations and physics. These include drawings in the style of many xkcd comics, but they are not comics. They are drawings that illustrate points in “what-if” explanation articles. There are curious interactions, especially when Munroe uses characters that are recognizable from multiple xkcd instances. But no matter the cross-pollination of comic characters and style, “what-if” is simply not a comic. Munroe is also not solely responsible for its authorship, since the questions he answers are user submissions. One concept we may explore is using the external link aspect of our collection to take especially relevant “what-if” articles into account. But we do not expect to use many of these articles in the records of items in our collection.

We make the rest of the Internet available in the linking field, but beyond the specific resources we choose to link, the web writ large is excluded from the collection. There are many xkcd spin-off materials produced by the audience, such as discussion posts on the xkcd forums and blogs that explain, criticize, and comment on xkcd. But they are not comics authorized by Munroe, so they are excluded from the collection.

Overall, our scope is limited for two reasons: simplicity and coherence. We allow links to slightly relax our collection's strict limitation to xkcd proper, but these links do not totally open up the proverbial floodgates. Keeping the floodgates essentially closed to materials that could distort the collection limits our scope and allows us to have a manageable amount of resources to organize. This limited and well-organized domain, in turn, becomes coherent through our efforts.

Created and maintained by Brian Wilson, Garrett Traylor, Keri Carroll, and Brian Balsamo, in conjunction with Randall Munroe.