
How Metrics Shape the Creation of Artifactual Data

        Following my long-standing interest in professional baseball, I decided to look at some old baseball scorecards. These scorecards were created by the Villanova college baseball team between 1866 and 1874, and they reveal important differences in rules and techniques between the game of that era and modern professional baseball. Looking at these scorecards, I realized that preserving historical datasets in their original forms is extremely important. Many may think that datasets as seemingly evanescent as baseball scorecards have become meaningless now that digitized versions have accumulated, but the physical copies can help us understand a great deal about the cultures that created them. The analysis of Foucault and Scott below clarifies both the significance of such artifacts and the type of information we can look for in order to gain this deeper understanding.

Digital transformation of artifactual data allows for better visualization and storage, but it fails to account for some of the intricacies of the data that are essential to the reader’s understanding of it. Although some data can certainly be viewed in a non-contextual form such as a spreadsheet, many artifacts carry much more significance when viewed in their original forms. The significance of original data is captured by the concept of the episteme: the implicit ‘rules’ that govern the world during a specific time period and in a specific cultural context. Foucault explains the classical episteme as a system that can be understood “in its most general arrangement in terms of the articulated system of a mathesis, a taxonomia, and a genetic analysis.” “The sciences… are always directed… towards the discovery of simple elements… on which knowledge is displayed in a system contemporary with itself,” Foucault states (The Order of Things, 74). Applying this concept of a contemporary episteme to a specific artifact, we can derive the simplest meanings from it: why it was created, who created it and for whom, and which elements of that genre of artifact mattered, or differed, when it was created compared with now.

One way to derive these simple meanings is to understand the standard metrics that were in force when the artifact was created. These metrics are laws, regulations, or rules specific to the thing being measured, and they give us the background information we need to understand why the object was created as it was. James C. Scott’s “door-and-window tax” in Seeing Like a State exemplifies this idea. Scott states, “[The tax’s] originator must have assumed that the number of windows and doors in a dwelling was proportional to the dwelling's size. Thus a tax assessor need not enter the house or measure it but merely count the doors and windows” (47). It was an ingenious plan to simplify individual taxation: houses with more windows were generally bigger and their residents more likely to be wealthy, which made those residents eligible for a higher tax rate. It also had one major design flaw, however: it did not account for future developments. Wealthy home builders merely requested that their homes be constructed with fewer windows and doors in order to pay less tax. We can therefore assume, in this example, that a large house with few openings built in France before 1917 (when the tax was abolished) may have been designed to exploit this loophole.

Before applying this logic to the baseball scorecards, we need to understand some of their basic elements. The cards in this set were created from 1866 to 1874 by Villanova and were used to keep track of scores and runs during the team’s baseball games. There are approximately 80 scorecards in the collection, all stored within a scorebook. Each card consists of two pages of the book: one page for the home team (Villanova) and the other for the away team. Each team’s page has the team name written at the very top, and below that a section for the players’ names, their positions, the innings, number of outs, number of runs, and remarks. These sections are organized by player rather than by team, and record how many runs or outs each player had during the course of the game. Below these sections are records of the umpire’s name, the scorer’s name, passed balls, missed fly catches, and the time and date of the game.

By analyzing the scorecards in depth and in light of the Scott example, we can gain an initial understanding of the rules and metrics of the nineteenth-century game. The first scorecard we will look at depicts a game between the Villanova Nine and the Bradford’s Nine that was played on April 9th, 1867. One thing we can immediately notice is that the score is extraordinarily high for a baseball game (31 - 25). This may reflect the lack of the wide variety of pitching techniques, such as the curve ball, which was invented in 1867[1], that modern pitchers have at their disposal and use to strike opponents out. It could also reflect the fact that professional baseball only lifted its ban on overhand pitching in 1884;[2] once overhand pitching was allowed, it became much harder for batters to hit the ball. Another thing we can notice is that the game lasted only six innings as opposed to the ‘traditional’ nine. Building on the previous assumption, we may additionally suppose that games were shorter because the comparatively ineffective pitching strategies produced much more scoring than there is in contemporary baseball.

There are a few other things to note. A section at the bottom tallies all of the outs, the runs, and the total score. The slashes (/) signify runs, 1’s signify the first out, 2’s the second out, and 3’s the third out of the inning. The card can thus be read horizontally, starting from the first player and moving down to the last.
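As a rough illustration of how this notation could be tallied once it has been transcribed, the following Python sketch counts the runs and outs in one player's row. The function name tally_row, the list-of-strings layout, and the sample row are assumptions made for illustration; none of them is taken from the scorebook itself.

```python
RUN_MARK = "/"  # a slash marks a run on the 1867 card


def tally_row(cells):
    """Count runs and outs in one player's row of inning cells.

    Each cell is a string transcribed from one inning box. A cell may be
    empty (no mark that inning) or hold several marks (assumed to be
    possible in a high-scoring inning).
    """
    runs = 0
    outs = 0
    for cell in cells:
        for mark in cell.strip():
            if mark == RUN_MARK:
                runs += 1
            elif mark.isdigit():
                # Any digit records an out; its value says which out
                # of the inning it was.
                outs += 1
    return runs, outs


# Hypothetical row for one player across six innings (not real data).
example_row = ["/", "2", "", "/", "3", "/"]
print(tally_row(example_row))  # (3, 2)
```

Summing the results of tally_row over every player on a page should reproduce the totals written at the bottom of the card, which gives a simple check on the transcription.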

If we examine the alterations to the cards over the eight years of their creation, we can also understand how the rules, metrics, and recording techniques changed over time. For example, the card depicting the 1874 game between the Villanova Nine and the Philadelphia Nine has some interesting differences. Compared to the first card we looked at, the combined score is somewhat lower (10 - 31), and the game has been lengthened to nine innings. The notation for runs was changed from a slash to a dot, probably because the slash looked too much like the “1” that is written when a player gets out for the first time in an inning. There is also now a notation for a base hit: an “x” written in any given square signifies a base hit, while an x “to the power of” a number signifies a base hit followed by an out. The way runs are recorded differs slightly as well; whereas before they were recorded only by player and total, runs on this card are recorded by player, inning, and total.
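To make the later notation concrete, here is a minimal Python sketch that classifies transcribed marks from an 1874-style card. The plain-text encoding (for example "x^2" for an x with a raised 2), the function names, and the sample marks are all assumptions chosen for illustration; the scorebook itself is handwritten and defines no such encoding.

```python
import re

# Assumed plain-text transcription of the 1874 marks (illustrative only):
#   "."            a run
#   "x"            a base hit
#   "x^2"          a base hit followed by the second out of the inning
#   "1", "2", "3"  the first, second, or third out
HIT_THEN_OUT = re.compile(r"x\^([123])")


def classify(mark):
    """Return (runs, hits, outs) recorded by a single transcribed mark."""
    mark = mark.strip()
    if mark == "":
        return (0, 0, 0)
    if mark == ".":
        return (1, 0, 0)
    if mark == "x":
        return (0, 1, 0)
    if HIT_THEN_OUT.fullmatch(mark):
        return (0, 1, 1)
    if mark in ("1", "2", "3"):
        return (0, 0, 1)
    raise ValueError(f"unrecognized mark: {mark!r}")


def totals(marks):
    """Sum runs, hits, and outs over a sequence of marks."""
    runs = hits = outs = 0
    for mark in marks:
        r, h, o = classify(mark)
        runs += r
        hits += h
        outs += o
    return runs, hits, outs


# Hypothetical marks, not copied from the card.
print(totals([".", "x", "x^2", "1", ""]))  # (1, 2, 2)
```

Raising an error on unrecognized marks is a deliberate choice in this sketch: it flags squares whose handwriting was transcribed ambiguously instead of silently dropping them.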

The differences between these two cards give us a glimpse into the rule changes that occurred during the seven years between their creation; a wider view of the scorebook as a whole contributes to an understanding of the time in which the cards were created. More information is recorded in the later cards, a trend that has continued through the decades, as current sports organizations record and track a wide variety of data on their players. The rules also changed continually over the course of these games. Additionally, by looking at the physical copies of these cards, we can imagine the sport’s first games being played out in our heads. We can compare cards to see how players joining and leaving the team affected the team’s record, and we can make assumptions about how different rule changes affected each game’s score. Most importantly, we can see the artifact in its entirety: the handwritten transcription on the old, faded piece of paper that allows us to better imagine the time in which it was written.

Another artifact with an interesting connection to the cultural episteme is a sheet listing a boat’s crew members. The sheet records each person’s name, condition (married/unmarried), rank or occupation, place of birth, and whether they were deaf-and-dumb or blind. We can see that this specific sheet is important because it lists the chief, or captain, of the ship among its names. One particularly interesting part of this document is the ages of the crew members. Two of the crew members, Thomas Coleman (35) and Catherine Coleman (28), have a son of ten years, also named Thomas Coleman, and a daughter of 12 years, Martha Coleman. Not only is it interesting that the crew member named his son after himself, a practice that was more common in the eighteenth and nineteenth centuries[3], but the Colemans also had children at a very young age. Seeing this reinforces my preconceived notion that in the nineteenth century children were born to younger parents because it was more culturally acceptable at the time.

Seeing these two artifacts in their original forms allowed me to derive certain meanings, or metrics, from them (old baseball rules and strategies from the scorecards, and cultural norms from the crew sheet) that I would not have been able to derive if they were merely words on a spreadsheet. Through this understanding of the societal metrics, or rules, I further gained an understanding of the episteme surrounding their creation.

        Although it is important to preserve the original form of these artifacts as just described, digitization is also essential if a modern audience is to use them. A complete digitization of the baseball scorebook would not take long with the technology we have at our disposal. I would use an Excel spreadsheet to digitize this data. First, I would create headers for all of the relevant information: baseball club, players, position, inning, outs, runs, et cetera. I would then write Excel formulas to total the runs in each column and row rather than entering those totals by hand. Because outs are recorded with numbers, I would not be able to record runs with a number, and would most likely use the “*” symbol. To count the runs, I would use Excel’s COUNTIF function (rather than SUMIF, which adds up numeric values) and specify “~*” as the condition; the tilde is needed because Excel otherwise treats “*” as a wildcard that matches any text. This would give me the number of “*” symbols in each column and row. Letting formulas compute the totals would allow me to complete the project in much less time than it would take to input every total into the sheet. Finally, I would copy and paste the template roughly eighty times, once for each scorecard, and place the teams that played against each other side by side. I would then complete the project by manually inputting the rest of the information. I estimate that this digitization would take me around three hours to complete. The digitized version of the baseball scorecards would make the data much easier to navigate. I could search the spreadsheet for a specific year, date, or game, and find the relevant information much more quickly than I could with the physical files.
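The tabulation described above can also be sketched outside a spreadsheet. The following Python example is one possible layout, not a finished design: it stores one record per mark and counts the “*” run marks by player or by inning. The Mark class, its field names, the runs_by helper, and the player names are hypothetical; only the game date and the Villanova team name come from the scorecards discussed earlier.

```python
from collections import defaultdict
from dataclasses import dataclass

RUN_SYMBOL = "*"  # runs are entered as "*" because digits already mean outs


@dataclass(frozen=True)
class Mark:
    game_date: str  # ISO date, so the records can be searched by year or day
    team: str
    player: str
    inning: int
    symbol: str     # "*" for a run, "1"-"3" for an out


def runs_by(marks, key):
    """Count run marks, grouped by whatever key(mark) returns."""
    totals = defaultdict(int)
    for mark in marks:
        if mark.symbol == RUN_SYMBOL:
            totals[key(mark)] += 1
    return dict(totals)


# Hypothetical entries; the players and marks are invented for illustration.
marks = [
    Mark("1867-04-09", "Villanova", "Player A", 1, "*"),
    Mark("1867-04-09", "Villanova", "Player A", 2, "1"),
    Mark("1867-04-09", "Villanova", "Player B", 1, "*"),
    Mark("1867-04-09", "Villanova", "Player B", 2, "*"),
]

per_player = runs_by(marks, key=lambda m: m.player)
per_inning = runs_by(marks, key=lambda m: m.inning)
print(per_player)  # {'Player A': 1, 'Player B': 2}
print(per_inning)  # {1: 2, 2: 1}
```

Keeping one record per mark, instead of one sheet per card, means the row and column totals never need to be copied between cards: the same function answers questions about a player, an inning, or a whole season by changing the grouping key.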

Both means of recording data discussed in this article have their benefits. Navigating the data would be much easier digitally, but a digital version also makes it much harder to understand the culture surrounding baseball, which is more readily detectable on the physical copies of these scorecards. Some academic articles combine the two approaches by providing a transcription of the data below an image of the original, and, for the reasons stated above, that is exactly what I advocate.


[1] https://www.baseball-almanac.com/blog/who-invented-curveball/

[2] https://www.sbnation.com/2018/12/1/18119682/the-story-of-baseballs-first-curveball

[3] https://www.familytreemagazine.com/names/naming-traditions/jan-2012-naming-practices-feature/