Archiving and sustainability

KDL's pragmatic approach to managing 100 Digital Humanities projects, and more

Archiving and sustainability is integral to KDL’s Software Development Life Cycle (SDLC). Discussions about archiving and forward planning inform our pre-project phases and determine our release and post-project phases. When and how archiving and sustainability considerations are introduced and discussed with partners depends on the scope, size and type of project, but they are typically raised and evaluated in early conversations, as part of our preliminary feasibility assessment.

Part of these discussions focuses on what KDL can offer with respect to data hosting and maintenance, explaining what a Service Level Agreement (SLA) with KDL entails (what it includes and indicative costs). If the partner is a colleague within the same University (King’s College London) and the research data component of the project is substantial (as opposed to, for example, an experimental project where tests on the data are deemed ephemeral), options for data management with the University Library Services’ King's Open Research Data System (KORDS) are also discussed. Subject repositories or other external research infrastructures relevant to the project's data management might also be considered at this stage. The KDL feasibility document includes a section titled Forward Planning Definition which, amongst other things, describes the hosting and infrastructure SLA along with archiving, sustainability and research data management options.

Background

At its inception in 2015, KDL inherited just under 100 digital research projects and websites. Supporting that number of projects (most of which had no funding attached) was a challenging proposition, but KDL accepted its responsibility to the community to manage them in a fair and transparent manner. This in itself is significant: it meant shouldering responsibility for issues that were often outside the team's control and rested with external Principal Investigators (PIs) and their institutions, communities of scholars, and funding bodies. In other words, the archiving process entailed far more than merely mechanical technical work. It required sometimes lengthy discussions with research partners (usually PIs) and, in some cases, stakeholder communities to assess the available options and choose the most appropriate one for each project.

Ideally, projects were placed on an SLA to ensure ongoing hosting and maintenance, but this was not always feasible or sensible. In many cases PIs responded promptly, asking us to gracefully archive their project because it had fulfilled its purpose. In other situations it became apparent that migration to a new host was possible, or conversion to a simpler format that substantially reduced the cost of hosting and maintenance. The archiving options offered by the team have evolved over time to suit the needs of our research community. Currently, we offer the options outlined below.

Archiving and Sustainability Options

  1. SLA with KDL

    Maintaining the project under a costed Service Level Agreement with KDL. We offer agreements of variable duration depending on project age, infrastructure requirements and PI preference, but our standard SLA duration is five years.

  2. Migration to ITS

    Migration of the project (and associated data) to a King’s College London ITS microsite if that is agreeable to both PIs and ITS.

  3. Migration to external host

    Migration of the project (and associated data) to an external host, such as the PI’s new institution or a commercial hosting provider.

  4. Static conversion

    Conversion of the project website to a static website. This is offered as a free service for suitable projects. It can be a very cost-effective solution, and it maintains public access to legacy projects and datasets that are no longer updated but continue to have value for the PI and research community.

  5. Datasets deposit

    These options are evolving in collaboration with the Department of Digital Humanities and other partners, and include: the KDL-DDH CKAN data catalogue and repository (https://data.kdl.kcl.ac.uk/); the KCL Research Data Management System (https://www.kcl.ac.uk/library/researchsupport/research-data-management/Preserve/Deposit-your-data-with-Kings3.aspx); or an external institutional or subject-specific data repository.

  6. Minimal archiving

    Minimal storage (for two years minimum): project and data are archived on our infrastructure and a placeholder page is shown at a project URL. This placeholder includes a description of the project and other metadata, as well as links to snapshots at the Internet Archive and other web archives (such as webarchive.org.uk) where the project has been archived.
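The placeholder page described under minimal archiving can be sketched as a small template. This is a minimal illustration under stated assumptions, not KDL's actual template: the `ArchivedProject` record, its fields, the `render_placeholder` helper and the page markup are all hypothetical. The only external convention it relies on is the Internet Archive's Wayback Machine URL form `https://web.archive.org/web/*/<url>`, which lists all captures of a page; snapshot links from other web archives are passed in as ready-made URLs.

```python
from dataclasses import dataclass, field
from html import escape
from string import Template


@dataclass
class ArchivedProject:
    """Hypothetical metadata record for a minimally archived project."""
    title: str
    description: str
    original_url: str
    pis: list[str] = field(default_factory=list)
    # Ready-made snapshot URLs from other web archives (e.g. the UK Web Archive).
    extra_archive_links: list[str] = field(default_factory=list)


def wayback_calendar_url(url: str) -> str:
    # The Wayback Machine lists all captures of a URL at /web/*/<url>.
    return f"https://web.archive.org/web/*/{url}"


# Hypothetical placeholder markup; $-fields are filled by string.Template.
PAGE = Template("""<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>$title (archived)</title></head>
<body>
<h1>$title</h1>
<p>This project has been archived and is no longer available online.</p>
<p>$description</p>
<p>Principal Investigator(s): $pis</p>
<h2>Web archive snapshots</h2>
<ul>
$links
</ul>
</body>
</html>
""")


def render_placeholder(project: ArchivedProject) -> str:
    """Return a static HTML placeholder page for an archived project."""
    links = [wayback_calendar_url(project.original_url), *project.extra_archive_links]
    items = "\n".join(
        f'<li><a href="{escape(u)}">{escape(u)}</a></li>' for u in links
    )
    # Escape every metadata value so legacy text cannot break the markup.
    return PAGE.substitute(
        title=escape(project.title),
        description=escape(project.description),
        pis=escape(", ".join(project.pis)) or "not recorded",
        links=items,
    )
```

For example, `render_placeholder(ArchivedProject("Example Project", "A short description.", "https://example.org/"))` returns a complete HTML page whose snapshot list links to the Wayback Machine capture list for `https://example.org/`. Escaping every metadata value with `html.escape` keeps legacy titles and descriptions that contain `&` or `<` from breaking the page.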

We are also interested in options such as providing projects as virtual machines (VMs) for offline use, or capturing their last good state with recording tools such as webrecorder.io, but (interestingly enough) those options have not yet been requested. Where PIs or Co-Is cannot be reached, options are considered within KDL and with appropriate Department of Digital Humanities staff, and a decision is made on a per-project basis.

Projects under SLA

Institutions that support SLAs for projects hosted and maintained by KDL are included in brackets, below. FAH stands for Faculty of Arts & Humanities at King’s College London.

Migrated projects

If researchers wanted to maintain dynamic functionality for their projects but were unable or unwilling to financially support a Service Level Agreement with KDL, we sometimes helped them to migrate their projects to other institutional or commercial infrastructure. King’s College London academics with relatively straightforward and current websites (that is, still actively updated) were offered the opportunity to migrate to KCL ITS infrastructure where appropriate. We are always sad to see projects leave KDL, but we view migration as a positive outcome: the long-term interests of the project should always come first.

Migration in progress

Projects converted to static sites

Projects converted to static sites remain on KDL infrastructure, but no longer offer search or other dynamic functionality. These are hosted for free by KDL, although we cannot promise to fix issues that arise from factors outside our control, such as browser updates released by external vendors.

Conversion to static in progress

Minimally archived projects

Minimally archived projects are no longer accessible. We maintain their original URL if it is under our control, and content files and data are archived for a minimum of two years on KDL’s infrastructure (we have had to limit this to two years for practical purposes but aim to keep data longer when feasible). We provide a static placeholder page for historic purposes, with basic information about the project and (where possible) screenshots and links to web archive snapshots. The reasons why a project would be archived in this fashion are diverse:

Governance

Responsibility for KDL's archiving and sustainability strategy rests with its Director, in consultation with the Faculty of Arts & Humanities, King's College London, through the oversight of the SLA committee, which meets annually. For further information see:

The team

Samantha Callaghan

Research Software Analyst

I've been responsible for documenting the decommissioning process for more than 70 projects that have been migrated, converted to static sites or minimally archived. Having worked in the information management sector for several years, I have found the KDL Archiving and Sustainability project a fantastic opportunity to come to grips with the digital archiving of frequently complex digital objects. One lesson that we have taken from this work is to "document as you go"!

Arianna Ciula

Director and Senior Research Software Analyst

As the lists above show, the archiving process is not only substantial in scale but, as new projects start and end, also runs continuously in the background of KDL's daily operations. It was therefore important to consult with other analysts and feed our perspective to the lab manager, in order to align the workflow with the KDL Software Development Life Cycle.

Brian Maher (2017-2023)

Senior Research Software Engineer and Systems Administrator

Through our Archiving & Sustainability efforts, we have been able to greatly improve the security of our systems. I have worked to update or archive as many websites as possible, converting many to static sites, in order to keep important academic research online for as long as possible.

Pam Mellen

Research Software Lab Manager

I've primarily worked on integrating the Archiving and Sustainability project with the larger KDL Software Development Lifecycle, as well as supporting implementation at various points throughout the project. With the goal of building Archiving and Sustainability into KDL's 'business as usual', this has involved working with the team on processes and documentation that are robust and standardised. As we reach the stage where SLA maintenance is integrated into the normal work of the Lab, I have implemented processes to ensure that we maintain good oversight of our portfolio of projects under SLA.

Tiffany Ong

Senior Research Software UI/UX Designer

I worked on the design of a simple HTML structure and neutral styling for the decommissioned sites template. I also worked on the footer, privacy and cookie policy templates for sites under SLA, following the current KDL style.

Toby Pitts (2018)

Student Intern - Archiving and Sustainability Assistant

I began documenting the decommissioning process over a two-week period in 2018.

Natasha Romanova

Digital Methods Lead (2019-2020)

As part of my role as Digital Methods Lead at KDL in 2019-2020, I assisted the decommissioning analyst in preparing decommissioning documentation and, wherever appropriate, placeholder pages for a number of legacy projects and projects reaching the end of their SLA period. As part of this work, I often had to look into the history of the project and liaise with project partners and technical teams.

Anna Maria Sichani

Marie Curie Postdoctoral Researcher (2016)

I was involved in the KDL Archiving and Sustainability project at its initial stages during my KDL PhD research fellowship. I captured as much information as possible about legacy projects in order to sketch an informed sustainability strategy. It was a great school for me in understanding several crucial qualities of a DH project: documentation, documentation, documentation.

Miguel Vieira

Principal Research Software Engineer

My work on Archiving & Sustainability focused on creating simple and easily deployable tools and scripts that could be incorporated into legacy projects, in order to integrate them smoothly and efficiently into the KDL Software Development Lifecycle.

Tim Watts

Senior Research Software Systems Manager

As part of our infrastructure renewal, I incorporated some cost-effective additional storage into our SAN (disk array), which is just a fancy way of saying that we added some high-volume, slower disks to complement the super-fast SSDs we use for our main service. We built the same robust level of fault protection into this archival group: RAID6 plus 2 hot spares, which means any 2 disks can fail simultaneously without loss of data, and 2 spare disks are available to take over from the failed disks. In addition, our archival datasets are backed up to the same enterprise-grade backup disk systems, using the same backup software and monitoring as our main service. At least one of the backup storage systems is located in a remote datacenter to reduce risk. I also designed a very simple and effective curation process for archiving whole virtual servers and any associated components such as databases, image sets etc.
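The fault-tolerance arithmetic of "RAID6 plus 2 hot spares" can be made concrete with a small model. This is a sketch under stated assumptions: the `Raid6Group` class is hypothetical, the disk count and disk size used below are illustrative only (they are not the actual configuration of KDL's SAN), and the model ignores filesystem overhead, rebuild time and the risk of a further failure during a rebuild.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raid6Group:
    """Hypothetical RAID6 disk group: active members plus idle hot spares."""
    members: int      # disks holding data and parity
    hot_spares: int   # idle disks that take over from failed members
    disk_tb: float    # capacity of each disk, in terabytes

    PARITY_DISKS = 2  # RAID6 keeps two independent parity blocks per stripe

    def usable_tb(self) -> float:
        # Two disks' worth of the active set holds parity; spares hold nothing.
        return (self.members - self.PARITY_DISKS) * self.disk_tb

    def survives(self, failed_members: int) -> bool:
        # Data is intact if no more than two active members fail at once.
        return failed_members <= self.PARITY_DISKS

    def tolerance_after_rebuild(self, failed_members: int) -> int:
        """Further simultaneous failures tolerated once spares are rebuilt in."""
        if not self.survives(failed_members):
            raise ValueError("more than two simultaneous failures: data lost")
        rebuilt = min(failed_members, self.hot_spares)
        return self.PARITY_DISKS - failed_members + rebuilt


# Illustrative figures only, not KDL's actual SAN configuration.
group = Raid6Group(members=12, hot_spares=2, disk_tb=8.0)
print(group.usable_tb())                      # 80.0
print(group.survives(2), group.survives(3))   # True False
print(group.tolerance_after_rebuild(2))       # 2
```

The result of `tolerance_after_rebuild(2)` shows why the two hot spares matter: after both spares have been rebuilt in, the group can again absorb two simultaneous failures, although no spares remain for the next rebuild.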

Carina Westling

Project Manager (2016-2017)

I worked with James Smithies and the wider team at KDL to structure and define the problem space of the Archiving and Sustainability project at KDL, and devise a strategy for negotiating onward solutions for ca. 100 legacy projects inherited from the Department of Digital Humanities. Understanding and costing the resources involved in the ongoing maintenance of digital research projects was a big part of this work, and contributed to the development of KDL Software Development Lifecycle processes.