Useless Knowledge? Notes from the K/V Dark Data Workshop

No, not that Dark Data

According to many IT dictionaries, dark data is data that is collected and stored but not actively used. Much of this data is collected to comply with regulations or as a by-product of research processes. This definition, however, was complicated by the participants of the ‘Dark Data’ workshop, organised by the Knowledge-Value research group. Across speakers and interlocutors, there was a sense that ‘dark data’ is produced by, and has an impact on, anything from scientific dating to qualitative evaluation. Almost every speaker presented their own taxonomy. Sharon Traweek, for instance, provocatively stated that ‘all data are dark’, explaining her definition by drawing attention to the similarities between data types rather than their differences, whereas Carlo Caduff claimed the opposite, namely that ‘dark data is anything but dark’. There were many other positions and provocations. Sabina Leonelli offered a contrast with ‘open data’, emphasising the effects of ‘newly institutionalised openness’ on the status of ‘dark data’. Alison Wylie, discussing archaeological methods, reminded the audience of the challenges archaeologists face in working with the many facets of ‘dark data’, including ‘legacy data’ (inherited from past research) and data gaps. Jennifer Cuffe, using the example of drug safety policy-making, illustrated how changes in coding can also produce ‘darkness’ (she also offered five types of ‘dark data’ that I was too slow to write down, and I also missed Sally Wyatt’s high-speed differentiations).

Overall, speakers agreed that there were many types of absence, darkness and obscurity – from mindless ‘data dumping’ (Rachel Ankeny) to deadly secrecy (Brian Balmer, Neal White) – as well as different ‘data vernaculars’ (e.g. Mike Fischer), all of which warrant a more nuanced approach to ‘dark data’. A key distinction, for me, was that between ‘passive’ and ‘active’ production of ‘dark data’. Emilia Sanabria, for instance, touched on this in her discussion of obesity campaigns: ‘what can be known… what is actively made unknown?’ Her paper pointed to an urgent issue: how are we affected by data that is deliberately withheld? This partly addressed the implicit question of the workshop (I’m remixing questions by Kaushik Sunder Rajan here): why should we care about data that is sitting around, seemingly of no value to anyone? What determines the value of data? What determines its audiences? To this, Gail Davies added a concern with the geographical distribution of data: how, and between whom, does knowledge move? Other participants complemented this by asking: who defines what is sensitive or accessible? And how can one intervene in such ‘dark’ or grey spaces?

These questions were tackled by sharing experiences of working with ‘dark data’, big data, open data – or data in general. Brian Balmer talked about an ‘ecology of practices’ around data: the everyday maintenance and curation of data by collectors and users. He also pointed out that even in the ‘darkest’ spaces, these mundane levels of processing exist: what do we include under the heading ‘top secret’? (Balmer contributed a tragicomic anecdote from the 1950s about Whitehall civil servants complaining about the overuse of the classification ‘top secret’.) Similarly, Rachel Ankeny argued that it is the activities around data that are important, not the data itself. In her example, she contrasted data dumping with data sharing – amassing versus fostering collaboration (although, for her, sharing does not necessarily imply collaboration). Her questions, which pointed to the shaping of knowledge by relationships and goals, could be summed up as follows: How is data used as knowledge? What are the implicit social contracts when we collect data? How is data rendered ‘dark’ by them? Sally Wyatt usefully added a reminder that ‘persuasion is part of working with data’ and that this dynamic has consequences for knowledge production.

Other speakers struggled with the complexity of ‘dark data’. Sharon Traweek commented that all fields have their coping strategies, and that identifying these strategies is useful. She named a myriad of perplexing questions that researchers had to face when working with ‘open data’, including ‘should collected data on an open data project be open, too?’ For her own topic, Emilia Sanabria offered two frameworks of complexity as a starting point: one ‘romantic’ (a coherent model of relations), the other ‘baroque’ (without final coherence, but rich with uncertain or partial relations). Alison Wylie wondered whether it is sometimes necessary to ‘pull the data away from the question’, whereas Rachel Ankeny suggested that academic standards were perhaps changing altogether, extrapolating from a potentially eroding distinction between ‘real scientific literature’ and grey literature. Within this context, temporality emerged as a provocation. One example was Neal White’s bemusement that incredibly big and complex data infrastructures appear to be necessary to access incredibly small amounts of ‘time’ or information, whether at CERN, in stock trading or in military surveillance. Another was Ann Kelly’s insight into the role of the temporality of data in emergencies such as epidemics, in particular the clash between the need for an urgent, pragmatic response and the slower temporality of scientific research that deals with complex situations. She also introduced the terms ‘charismatic data’ and the ‘pragmatic liveliness’ of data from her field.

Connected to the practices of working with data was the theme of ‘data imaginaries’. Whether data was framed as ‘currency’ (Elena Aronova), as property (John Kelly) or as a ‘resource’ (Alison Wylie), how we value data matters. Here, Elizabeth Johnson offered some interesting thoughts about data imaginaries in the context of global environmental change. For her, data was shaped by four major tensions: ‘demand’ (the demand for data versus the seemingly ‘never enough data’ to prove a particular environmental problem); ‘intervention’ (the desire not to bring the future we see into being); ‘wager’ (the moment when one has to decide to intervene, despite ‘unknowns’); and ‘imagination’ (where is the line between data and fiction?). Gail Davies further raised the role of data visualisations, citing examples such as the controversial ‘burning embers’ climate change diagram. How do images shape imaginaries? This led to discussions of the boundaries between art (or style) and data circulation, archiving and access. Lastly, participants pointed out that ‘dark’ should not only have negative associations. Many speakers noted the enjoyment that they or their research participants took in the ‘detective work’ surrounding ‘dark data’ (Cuffe, White, Davies). Elena Aronova even contested the negative framing of secrecy and offered a view of secrecy as enabling: ignorance can be a resource. This was later taken up by Brian Balmer, Neal White and Linsey McGoey, who argued for the productive potential of ignorance – from the fearlessness of ‘overt research’ to a more subversive kind of ‘strategic ignorance’ (more below).

A further theme that emerged was that of knowledge production and epistemic injustice. This began with a discussion of the ‘crafting practices of data’ by Sharon Traweek, who asked who and what is involved in these practices. According to her, issues of epistemic injustice start with ‘funding ecologies’ and gender and race inequalities in research hierarchies. Others, too, criticised ‘strategic blindness’ and ‘white ignorance’ in the service of maintaining authority (McGoey, Balmer), as well as the colonial history of archives and the ongoing unequal status of different knowledges (Sunder Rajan). What are the researcher’s responsibilities? (Sanabria) Against this background, many participants called for data-related academic activism. Suggestions included subverting the economic rationale behind data handling in universities (Fischer), using data to strengthen academic activism against metrics (Traweek) and involving students in the analysis of debt and its history. This tied in with a wider discussion of capitalist structures. Here, Kaushik Sunder Rajan noted the difficulty of analysing economic knowledge, the logic of capital and corporate power together. Linsey McGoey added a critique of how the caveat ‘for use for the public good’ in intellectual property legislation is endemically forgotten. Returning to the earlier theme of ‘legacy data’, Mike Fischer addressed the conditions of data inheritance under the dynamic of underfunding and privatisation, using the example of environmental services, where companies are stepping in and generating or inheriting data using different systems. He lamented that experts managing data ‘commons’ were being laid off.

Potential paths of action were also considered. Neal White’s ‘overt research’ represented one tactic, in which secrecy is countered with open and assertive public scrutiny. An almost opposite approach came from Elizabeth Johnson, who drew on the French philosopher Frédéric Neyrat and his proposal of a strategic withdrawal from data, refusing the constant demand for data and its rationality. Both tactics, despite their differences, aimed at producing alternative imaginaries of politics and activism. Linsey McGoey further drew attention to the importance of definitions: how we define things, particularly uncertainty and ambiguity, can undermine our own ability to critique. In addition, art emerged as a medium of contestation. Again, Neal White argued that art has the capacity for ‘creating room for what is un-thought’. He also (rightly) joked that ‘an artist can get into anything these days’, alluding to the diversity of places offering ‘artist residencies’. Other participants, too, hoped that the presence of artists through residencies in unusual places would ‘make something happen’ (Fischer), or noted that artists such as the Critical Art Ensemble posed a dual critique of both biopolitics and art (Joe Dumit). Artistic experiments were regarded as especially effective because they combined the experimental and the experiential. Beyond art, experimental, marginal ‘institutions in the wild’ that create a sense of common trajectory towards something (Sunder Rajan; White) were seen as a promising means of contestation. Elena Aronova contributed a historical example of contestation from the Cold War: contestation through the use of a new medium. While the US focused its research on computers, the Soviet Union turned its efforts to microfilm. Will we see a return to the analogue? And, if so, does the analogue still represent a form of contestation?

Finally, the usefulness (or not) of the term ‘dark’ was debated. On the one hand, the metaphor was appreciated for its associations with contrast, darkness being essential for ‘making things sharper’ (Jim Griesemer). It was further appreciated as a ‘frontier metaphor’: dark data as a frontier and a ‘site on the verge of exploitation’ and colonisation, bringing into focus the ways this frontier is ‘crafted’ in both good and bad ways (Caduff). On the other hand, there were concerns about other inheritances and connotations of the word ‘dark’. Does it have racist connotations? Is it unfashionably ocular? Are we trying to bring light into darkness in an Enlightenment sense? What is our project, if there is one? Or, as one participant asked: if we are not doing enlightenment (of ‘dark data’), what are we doing? This challenge was countered with a critique of the critique of the Enlightenment. John Kelly, for instance, lamented (as part of his critique of imperialist histories of theory) that it almost seems as if ‘to be ethical you have to be against reason’. He said he was against the idea that the Enlightenment is the problem and argued that, instead of the image of ‘shedding light’, a more realistic critique is needed that takes the situatedness of reason into account. As part of the same discussion, Kaushik Sunder Rajan offered that ‘Enlightenment not negotiable, but how we do it is negotiable’. For him, it could also be a project of attentiveness. In the end, there was no consensus on how best to deal with the term ‘dark’, although some interesting references to critiques of the Enlightenment surfaced, such as Stefano Harney and Fred Moten’s ‘Undercommons of the Enlightenment’ challenge and Moten’s ‘Touring Machine’.

As this blog post hopefully illustrates, despite its many ‘absences’ (I could go very ‘meta’ on data and blogging here!), the two days of discussion were very rich and dense, and also constructively challenging in their interdisciplinarity. I was pretty exhausted at the end just from listening and felt seriously sorry for the workshop participants who were in for the ‘long haul’ – another three days of debating data-intensive science at a follow-on workshop. As someone who has mainly engaged with big, open and dark data as an occasional visitor to Open Tech and CCC meetings, I found it interesting to see how these debates play out in a purely academic context. Despite the heavy theorisation, there was a hopeful parallel: as at these more practitioner- and activist-oriented events, I came away with a feeling that data handling concerns everyone, and that everyone has some means of accessing the debate through the many ways one is affected by data. I was particularly pleased that the university itself was ‘brought to light’ as a problematic data space in need of closer scrutiny and intervention, thus dissolving the boundaries between topic and setting. Everything was suddenly (and uncomfortably?) data. And here it is perhaps fitting to end with the questions John Kelly used to open his talk (which played on both Tolstoy and Martin Luther King): Where have we been? And where do we go from here?

The workshop, titled ‘Knowledge/Value and Dark Data: Absences, Interventions and Digital Worlds’, was held at the University of Exeter on 15-16 December 2014 and was organised by Sabina Leonelli, Gail Davies, Brian Rappert and Kaushik Sunder Rajan.
