Co-written with Peter Pulsifer and originally published for the Arctic Horizons Program.
Data is a hot topic right now. All over the news and social media, there are mentions of “big data,” “data science,” “data visualization,” and beyond. It is no secret that data, and lots of it, can be a powerful tool for scientific research, enabling research questions across broad spatial scales and deep time periods. A close collaborator (who likes to avoid all things digital) once said in response to a presentation on aggregated archaeological faunal data across the whole of Europe, “I think being able to aggregate all of these data provides interesting interpretations, but I always wonder what’s behind those pie charts and visualizations…” She was right to be skeptical, highlighting an urgent need to consider all aspects of the data life cycle: in addition to needing “lots of” data, those data also need to be high quality and reliable. Indeed, when compiling lots of data from many different sources (if you can even find and access them), how reliable are the aggregated data?
While my collaborator would never use this term, she recognized the need for data “curation,” which is far more than simple data storage, archiving, and management. To be reused and aggregated, data need to be high quality, accessible, and discoverable, which takes a lot of time and effort from both the scientist and the data manager (if they are two different people, of course!) to document metadata, employ standard terms and vocabularies, and provide file formats that can be accessed by open source software now and in the future. In addition to being scientists, however, we are also social scientists, meaning we often deal with data that are sensitive in nature, contain identifying information about our research subjects, or provide information about locations that need to be protected (e.g., sacred or archaeological sites). Another term that comes up often and requires careful consideration is “open data.” While “open data” is a championed goal (to enable data sharing and access) and often required by funding organizations, we like to add the word “ethically” to the start of that phrase. By ensuring “ethically open data,” we can make data appropriately “open” based on the specific needs of the data, whether by anonymizing research subjects or randomizing the locations of protected sites.
Fortunately, many initiatives are ongoing in the Arctic research community to build capacity, infrastructure, and community to enable data curation for the Arctic social sciences and to ensure high quality data are accessible, discoverable, usable, and linked across projects. Based out of the National Snow and Ice Data Center, Strawhacker leads the DataARC (formerly known as cyberNABO) project (supported by two grants from the National Science Foundation, SMA 1439389 and SMA 1637076), designed to support the linking of databases from archaeology, climate science, paleoecology, and the humanities and the development of data products to enable research on the long-term human ecodynamics of the North Atlantic. Another project, Patchwork Barents, based out of the Jefferson Institute, is designed to make data more widely available and easily visualized for data journalists working in the Barents Region. Both of these projects show how we can create data products for a variety of audiences.
While the focus of this blog has been on social science data, given our research context in the Arctic, it is not surprising that much of this topic feeds into Indigenous Knowledge. In the previous blog, we discussed the special considerations needed to curate and manage data, information, and documentation of Knowledge coming directly from Indigenous communities in the Arctic, focusing on efforts from the Exchange for Local Observations and Knowledge in the Arctic (ELOKA, eloka-arctic.org).