Pat Galloway
INF 392K Digital Archiving and Preservation - Schedule, Spring 2016

NOTE: This syllabus is a work in progress until the first class meets and may change slightly during the semester if new issues come up. Please do not print it once and keep referring to that version; bookmark the link instead.

January 20: Course overview: overall discussion of course, assignments, student skillsets, and the history of computer technology (!)

Discuss student backgrounds and skills; at rollcall students will introduce themselves. Direct students who need them toward resources to bring skills up to speed.

Lecture Topic: Discuss the history of corporate and individual production of digital objects: what are the technologies we should be concerned with? What are digital archivists likely to encounter? A history of hardware, software, systems, networks, and media will be sketched. I will outline the history of the iSchool repository and its present contents, some of which, especially the final reports of projects from earlier iterations of this course, you will be using. I will also list the projects for this semester, which will be assigned the week after next, and discuss the overall schedule of work to accomplish semester projects.

Readings: These are brief pieces that you may find interesting to follow up on after class. They are not required, since I will refer to them in class, but you can learn more by looking at them.

Micah Beck, "Is [Scalable] Digital Preservation Possible?" This is a presentation available at http://www.dlib.indiana.edu/education/brownbags/fall2011/scalable/scalable.pdf

Allen Renear, David Dubin, Karen M. Wickett, "When Digital Objects Change--Exactly What Changes?" Proceedings of the American Society for Information Science and Technology, 45(1), 1-3, 2008. Available at http://onlinelibrary.wiley.com/doi/10.1002/meet.2008.14504503143/pdf

Chris Rusbridge, "Excuse Me... Some Digital Preservation Fallacies?" Ariadne 46, February 2006. Available at http://www.ariadne.ac.uk/issue46/rusbridge/

January 27: Preservation action: Overview of the digital preservation problem and field and the approach taken in this course.

Topic: Basic digital preservation management, including the (all too brief) history of digital archiving and preservation research and practice, will be discussed in the light of the relatively popular presentations of them that are easily available. We will discuss what we are trying to accomplish in the course in terms of both what you will learn and what you will learn how to learn.

Questions to prepare for discussion:
1) What are we trying to do in digital preservation? Is it possible? Why or why not? (skip back and look at Beck and Rusbridge too)
2) From the perspective of the Cornell tutorial, how would you summarize the efforts made so far toward digital preservation?
3) After having read the assignments for today, what form of digital cultural object worries you most in terms of survival and why?

Readings:

If you have little or no background on the topic of digital preservation, to prepare for class today you can "take" the Cornell tutorial on digital records preservation; that is, spend at least an hour going through it, taking notes for discussion in class, paying attention to the sidebar issues, and following the major links:
"Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems." Available at: http://www.icpsr.umich.edu/dpm/

Michael Gruber, "Digital Archaeology" (WIRED 2002). For a more romantic (and possibly more realistic, hence the name of our lab) view of the issues above, see this very brief article: http://www.wired.com/wired/archive/1.05/1.5_archaeology_pr.html

Patricia Galloway, "Preservation of Digital Objects" (ARIST 38, 2004, Chapter 11, 549-590) and "Digital Archiving" (ELIS 3rd Edition, 2010, 1518-1527). Both of these will be available through Canvas and they provide another view of the changes that have taken place in the field in a short time.

Alex Ball, Preservation and Curation in Institutional Repositories (DCC State of the Art Report, March 10, 2010), available at http://www.dcc.ac.uk/sites/default/files/documents/reports/irpc-report-v1.3.pdf. This report has a distinctly European flavor, but it provides a good overview of major projects that are either currently in play or have been completed and have created tools, some of which we may use. Read it so that you understand how these current projects fit into the picture.

National Digital Stewardship Alliance, National Agenda for Digital Stewardship 2014: http://www.digitalpreservation.gov/ndsa/documents/2014NationalAgenda.pdf

Cal Lee and Helen Tibbo, "Where’s the Archivist in Digital Curation? Exploring the Possibilities through a Matrix of Knowledge and Skills" (Archivaria 72, 2011, 123-168).

Peter Chan, "What Does It Take to Be a Well-rounded Digital Archivist," Library of Congress The Signal blog, October 7, 2014: http://blogs.loc.gov/digitalpreservation/2014/10/what-does-it-take-to-be-a-well-rounded-digital-archivist/

Patricia Galloway, "Educating for Digital Archiving through Studio Pedagogy, Sequential Case Studies, and Reflective Practice" (Archivaria 72, 2011, 169-196).

February 3: Context of Creation: Reliability, authenticity, custodianship; file format conversion, migration, emulation, reauthentication; digital genres and their significant properties

Project Assignments: Students will be assigned to project teams and an outline protocol for project work will be discussed, including the steps that will be undertaken through the project and how they will coincide with class lectures. We will discuss the inventory instrument(s) you will be using, the basic SIP agreement, and the methods you will use to review your digital materials safely so as to preserve authenticity.

Topic: Discussion of major issues related to the nature of digital objects and the nature of archives. Different "genres" of electronic records (email, webpages, databases, etc.) represent different bundles of affordances, necessitating different strategies for preservation and different "significant properties" to be considered in devising those strategies. Should all properties be preserved? Should only "significant properties" be provided for access? Discuss strategies: bitstreams as authenticity guarantors and starting place for serious study; use copies as digital library fodder; making readers and other tools available. Think in terms of your project's digital objects and the significant properties issues that they raise.

Once you receive your assignment, you should begin thinking about the specific problems raised by the materials you are dealing with; we will discuss these issues in subsequent classes. Today we will begin to discuss the kinds of replication that are part of the digital preservation task: disk images, forensic copies, non-forensic copies, use copies, and so on. The emphasis is on bitstream preservation and on contextualization/documentation of the capture process. We'll discuss a range of options for overcoming hardware/software obsolescence and when each is appropriate. We will also discuss the associated practices supporting these concepts, including a general protocol for capture and preprocessing of archival materials and a range of tools available for use. Finally, we will discuss ground rules for group work.
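Bitstream preservation depends on being able to show that a copy is bit-for-bit identical to what was captured. The following is a minimal sketch, assuming Python 3.9+ and SHA-256 as the fixity algorithm (the course does not prescribe one), of recording a checksum manifest at capture time and re-verifying it after a copy or transfer. The function names are hypothetical and are not part of any tool we will use in class.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large disk images do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a POSIX-style relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest has changed."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

In practice the manifest would be stored alongside the capture documentation, so that a later empty result from verify_manifest is evidence that the working copy still matches the captured bitstream.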

Questions to prepare for discussion:
1) What do "reliability" and "authenticity" mean in the digital environment?
2) Should we separate out "significant properties"? As opposed to what? What might be an "insignificant property"?
3) How important is it to distinguish the form or genre of a digital object? What aspects distinguish a genre?
4) How do significant properties and genres map onto file formats--if they do?

Readings:

Luciana Duranti, "Reliability and Authenticity: The Concepts and their Implications," Archivaria 39:1-10 (1995). This is the canonical definition of the two concepts as used by archivists of the diplomatic persuasion (and by us, in perhaps a stricter sense than the one suggested by the next reading below), and you need to be clear on the two concepts and the difference between them as terms of art. Even if you have read this before, read it again.

Jean-Francois Blanchette, "A Material History of Bits," JASIST 62(6) 2011, 1042-1057. Download from Canvas. Especially useful for thinking through the thicket of things it is necessary to be concerned about, articulated through the idea of stacks.

Matthew Kirschenbaum, Mechanisms: New Media and the Forensic Imagination, Chapter 1 (Cambridge: MIT Press, 2008; paperback 2012).

Jeff Rothenberg, Avoiding Technological Quicksand (his classic tirade on emulation from 1999): http://www.clir.org/pubs/reports/rothenberg/contents.html

Phil Mellor, Paul Wheatley, and Derek Sergeant, "Migration on Request, a Practical Technique for Preservation," CaMiLEON report from 2002, available on Canvas. This reading is background for the next.

Kam Woods and Geoffrey Brown, "Migration Performance for Legacy Data Access," International Journal of Digital Curation 3(2), 2008 (this paper discusses how you can actually make migration on demand work in a dynamic environment): http://www.ijdc.net/index.php/ijdc/article/viewFile/88/59

Margaret Hedstrom, Christopher Lee, "Significant properties of digital objects: definitions, applications, implications," (in Proceedings of 2002 DLM-Forum): http://www.ils.unc.edu/callee/sigprops_dlm2002.pdf

J.E.P. Currall, M.S. Moss, and S.A.J. Stuart, "Authenticity: A red herring?" Journal of Applied Logic 6(4), 2008: 534-544. Available at: http://eprints.gla.ac.uk/4658/

Geoffrey Yeo, "'Nothing is the same as something else': Significant properties and notions of identity and originality," Archival Science 10(2), 2010: 85-116.

February 10: Transition from Context of Creation to Context of Preservation: Digital archaeology and preprocessing steps; levels of service and digital forensics

Topic: What are the details of preprocessing steps beginning with capture and ending with ingest? What does "digital archaeology" mean and what are the techniques used for identifying and recovering digital objects that can no longer be accessed using current technology? Finally, can/should we distinguish degrees of care/effort that we expend with reference to digital records? What is the relation between cost-benefit and levels of service?

Questions to prepare for discussion:
1) How have expectations and realities of levels of service from digital repositories changed over time?
2) Has NARA been wise or foolish to have been so slow in its development of the Electronic Records Archive?
3) How is digital forensics useful to digital preservation?

Readings:

William LeFurgy, "Levels of Service for Digital Repositories," D-Lib Magazine (May 2002); this is a central concept that needs to be addressed in order to define what preservation steps will be taken: http://www.dlib.org/dlib/may02/lefurgy/05lefurgy.html

Matthew Kirschenbaum, Mechanisms, Chapters 2 and 3.

Digital Forensics and Born-Digital Content in Cultural Heritage Collections, CLIR report, 2010: http://www.clir.org/pubs/reports/pub149/pub149.pdf

Jeremy Leighton John, Digital Forensics and Preservation, a Digital Preservation Coalition Technology Watch Report, 2012. Available at http://www.dpconline.org/component/docman/doc_download/810-dpctw12-03pdf

Cal Lee, Kam Woods, Matt Kirschenbaum, Alexandra Chassanoff, From Bitstreams to Heritage: Putting Digital Forensics into Practice in Collecting Institutions, Report of the BitCurator Project, September 20, 2013: http://www.bitcurator.net/docs/bitstreams-to-heritage.pdf

Dan Farmer and Wietse Venema, Forensic Discovery (Addison-Wesley, 2006); read chapter 1 (and chapter 2 as well if you are feeling curious). Available online in an HTML version at http://www.porcupine.org/forensics/forensic-discovery/

February 17: Context of Preservation: Archival institutional repositories and the OAIS model; distinctive characteristics of the digital archive

Topic: Fortunately for the digital archiving community, there is now a widely accepted model for the functions that a digital archive should provide: the Open Archival Information System (OAIS) model, and it is important that you be familiar with it. Readings include the original OAIS specifications, MIT/HP's open-source DSpace as an implementation of that model and the version of the model that we will use in this class, and a recent book on institutional repositories considered as archival.

Questions to prepare for discussion:
1) Why do you think the OAIS model has been so successful in taking digital preservation forward?
2) Have you used a digital repository? If so which one(s) and for what?
3) What features make a digital repository "archival"? What is the difference between an archival repository and a digital library?

Readings (NOTE that for this class you are expected to review these documents and especially to read in detail the "Functional Overview" section of the DSpace system documentation; we will return to these documents for discussion as the course progresses):

Brian Lavoie, The Open Archival Information System (OAIS) Reference Model: Introductory Guide (2nd ed.). DPC Technology Watch Report 14-02 (October 2, 2014). Find this at http://www.dpconline.org/publications/technology-watch-reports
Please use the following complete document for reference; the examples in Annex A are useful to browse: OAIS model: http://public.ccsds.org/publications/archive/650x0m2.pdf --this is the most recent 2012 "Magenta Book" version of the OAIS specification

Anne Marie Donovan, Maria Esteva, Addy Sonder, and Sue Trombley, "Proposal for Establishment of a DSpace Digital Repository at the School of Information, University of Texas at Austin," final report for 2003 INF 392K class. Available on ford at: https://ford.ischool.utexas.edu/handle/2081/1273

Richard Jones, Theo Andrew, and John MacColl, The Institutional Repository (Oxford: Chandos, 2006), Chapter 3, "Technologies and Technicalities." Available on Canvas.

DSpace system documentation for version 5.x: available on Canvas under Files > Manuals--read especially the section "Functional Overview." It is especially important that you become familiar with the DSpace documentation so that we can discuss how DSpace instantiates the OAIS model (or doesn't).

DSpace roadmap document: https://wiki.duraspace.org/display/DSPACE/RoadMap.

Trustworthy Repositories Audit and Certification document (this is the original--free--document from 2007 that became the ISO 16363--not free--document). Skim this to see how repositories are now being audited: http://www.crl.edu/sites/default/files/attachments/pages/trac_0.pdf

Galloway, "Institutionalizing a University Department-Level Institutional Repository," a review of the state of pacer in 2006, available on ford at: https://ford.ischool.utexas.edu/handle/2081/29101

February 24: Simulating the Context of Creation: Metadata for preservation and description

Students will report on the progress of their projects, including the first meeting with collection creators (or custodians) if that has happened.

Topic: Without descriptive metadata digital objects are literally lost, and without preservation metadata they might as well be. An enormous amount of attention has been devoted to the metadata requirements for archival digital objects: what metadata are needed, when they are generated, and how they are generated. Metadata is the "wrapper" that facilitates all digital archival activities, and it is central to the structure of DSpace. We will discuss the DSpace Dublin Core registry and the addition of METS, as well as the PREMIS standard for preservation metadata. We will also discuss adding other metadata sets and controlled vocabularies to DSpace. Finally, we will look at available metadata harvesting tools and how to use them, and discuss a handout on metadata standards for the course, including biog/hist, scope/content, controlled vocabularies, and special format subsets.

Questions to prepare for discussion:
1) What kinds of metadata are most important for digital preservation?
2) What kinds of metadata are most needed for management of digital collections?
3) What kinds of metadata do digital objects contain as part of their structure?

Metadata readings for background:

Introduction to Metadata: Pathways to Digital Information, 1998; including Anne Gilliland-Swetland, “Setting the Stage,” Tony Gill, “Metadata and the World Wide Web,” and Mary Woodley, “Crosswalks: the Path to Universal Access?”: http://www.getty.edu/research/conducting_research/standards/intrometadata/index.html

Qualified Dublin Core is what DSpace supports out of the box. There is now a repository of all the papers from Dublin Core international conferences, 2001 to the present: http://dcpapers.dublincore.org/pubs. You don't have to register to get access, and the site is a treasure trove of work on a standardization effort that has been astoundingly successful because it has been open and free. The 2009 papers include a metadata framework for manga, and the 2012 papers include an article on metadata for Thai palm-leaf manuscripts.

Readings:

OAIS model, sections 4-6 and annexes

DSpace system documentation, version 5.x, section 1.2, Functional Overview. This version of the manual has a less useful Functional Overview than the previous manuals have had, but it covers more ground and tallies better with many parts of OAIS.

PREMIS preservation metadata document:
PREMIS Data Dictionary for Preservation Metadata v. 2.2 (2012): http://www.loc.gov/standards/premis/v2/premis-2-2.pdf

Brian Lavoie and Richard Gartner, Preservation Metadata, 2nd ed. DPC Technology Watch Report 13-03, May 2013. Linked from http://www.dpconline.org/publications/technology-watch-reports. This report provides an overall view of preservation metadata.

There are several metadata extractors that we have used; look at the ones mentioned below and get a general idea of what they do before class, because we will experiment with them.

New Zealand Metadata Extraction Tool: see its SourceForge pages and user manual etc. here: http://meta-extractor.sourceforge.net/ User guides are found under Documentation.

Investigate the GDFR (http://www.gdfr.info/) and PRONOM (http://www.nationalarchives.gov.uk/PRONOM/Default.aspx) file format registries, both of them now of historical interest, and find out how they differ; then look at the UDFR project final report (http://udfr.org/project/UDFR-final-report.pdf), which describes the merged effort. Investigate the file profilers JHOVE2 (https://bitbucket.org/jhove2/main/wiki/Home) and DROID (http://www.nationalarchives.gov.uk/information-management/our-services/dc-file-profiling-tool.htm), as well as the omnibus File Information Tool Set (FITS) tool wrapper, which puts it all into one package: http://code.google.com/p/fits/
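As a rough illustration of the core idea behind signature-based identifiers such as DROID, here is a minimal sketch that matches a file's leading bytes against a signature table. The table is a hypothetical, hand-picked handful of well-known magic numbers, not PRONOM data; DROID, JHOVE2, and FITS rely on far richer signature registries and (in JHOVE2's case) on validation, so treat this only as a model of the technique.

```python
from pathlib import Path

# Hypothetical, deliberately tiny signature table: (leading bytes, label).
# Real registries such as PRONOM hold thousands of signatures, including
# offsets and end-of-file markers that this sketch ignores.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also DOCX, XLSX, ODT, EPUB, ...)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy MS Office)"),
]


def identify(data: bytes) -> str:
    """Return the label of the first signature that the data starts with."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: Path) -> str:
    """Identify a file from its first 16 bytes; the extension is never consulted."""
    with path.open("rb") as fh:
        header = fh.read(16)
    return identify(header)
```

Note that the ZIP signature alone cannot tell a DOCX from an EPUB; distinguishing them requires looking inside the container, which is one reason the real tools are so much more elaborate.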

March 2: Structuring the Context of Preservation: Logical models for digital collections ("arrangement")

Topic: Discuss the OAIS and other logical models for the sake of features that they might add to OAIS/DSpace. Discuss the structure of collections in DSpace and how the DSpace object model can be used to advantage in creating virtual collections. Discuss student progress with research on areas of expertise. Discuss order as received (creator order) vs virtual orderings (interpreted order[s]). We will work through possible structures for several of the projects in order to discuss these issues. Students should identify the operating system environment that runs on the machine they will be working with and learn something about it: what does the interface look like? How are commands delivered? How many commands are there (tens? hundreds?).

Questions to prepare for discussion:
1) How can a logical model force the way we think about archival materials? Examples?
2) How are orderings represented in the digital environment? What is "original order" in that environment?
3) How might you represent original order in an OAIS-compliant repository? How does DSpace do it?
4) What happens to the concept of original order when disk images are captured?
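To make the contrast between order as received and a virtual ordering concrete, here is a minimal sketch, assuming the captured files sit in an ordinary directory tree. It records the creator's folder paths untouched and then builds one possible interpreted order (grouping by file extension, chosen arbitrarily for illustration) that only references those paths; nothing is moved or renamed. The function names are hypothetical, and sorting within folders is an assumption made for reproducibility, since directory listing order varies between file systems.

```python
from collections import defaultdict
from pathlib import Path


def order_as_received(root: Path) -> list[str]:
    """List every file under root by its path relative to root, keeping the
    creator's folder hierarchy (sorted for a reproducible listing)."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def virtual_order_by_type(received: list[str]) -> dict[str, list[str]]:
    """Build an interpreted ordering that groups the same paths by extension.
    The original paths are only referenced, never changed."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rel in received:
        suffix = Path(rel).suffix.lower() or "(no extension)"
        groups[suffix].append(rel)
    return dict(sorted(groups.items()))
```

The same received list could feed any number of other virtual orderings (by correspondent, by date assigned during processing, and so on), which is roughly the relationship DSpace supports when one item is mapped into several collections.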

Readings:

CEDARS/OAIS model (two documents): these documents are now historical and represent optional reading only

Kelly Russell, "Digital Preservation and the Cedars Project Experience," available at: http://worldcat.org/arcviewer/1/OCC/2007/08/08/0000070504/viewer/file137.html

CEDARS final report: go to this page: http://www.ukwebarchive.org.uk/wayback/archive/20050111000000/http://www.leeds.ac.uk/cedars/index.html
Click on "Homepage archived 11 Jan 2005," which will show you the archived CEDARS site; click on Publications and Conferences, then scroll down to "The Cedars Project Report, April 1998 to March 2001," click on it, and download the full PDF. You will note that this is a digital archiving website archive archived for the Brits by the Internet Archive, and most of the site works.

DSpace as real and virtual model

Review the DSpace data model in the DSpace 5.x documentation that I placed on Canvas in the Files section under Manuals. Focus on section 1.2, as last time. Think about how you might structure the project data you expect to retrieve.

Patricia Galloway, "Representing Archival Descriptive Metadata in a DSpace Environment," linked here; also "Order as Received: Constructing an Initial Virtual Order for Digital Objects," linked here --both of these unpublished papers address issues in the "real" (i.e. visible to users--first paper) and the "virtual" (what DSpace can do--second paper) DSpace models of data arrangement.

David S.H. Rosenthal, Emulation & Virtualization as Preservation Strategies, a report commissioned by The Andrew W. Mellon Foundation, New York, October 2015, https://mellon.org/Rosenthal-Emulation-2015

 

March 9: Actors in the Context of Preservation: Authentication structure (Producer-Archive interface): communities, groups, e-people, collections (lab)

For this week's project meeting reports please provide a list in class of what you see as your remaining tasks and how you plan to tackle them.

Topic: Presentation of DSpace management interface and elements of the DSpace authentication system. Setting up groups and levels of access. We will set up and discuss appropriate collection structures in DSpace for your materials and review the details of "ingest," both as envisioned in OAIS and as implemented in DSpace as a manual process. Deconstructing DSpace and using it to instantiate/represent archival materials. Discussion of issues of closed vs open collections and how to assure the desired outcome.

Questions to prepare for discussion:
1) Envision your project repository structure considering: a) how it relates to the materials as received/found/harvested, and b) how it will fit into the existing ford structure, using (roughly) a subcommunity as the fonds and collections as series.
2) Think about how you will allocate the roles that DSpace defines: (sub)community administrator, collection administrator, submitter.
3) Review the manual ingest process and the metadata needed to carry out such an ingest.
4) Finalize your narrative aggregate metadata and choose a logo for the subcommunity page and collection pages.

Management policy: Each team will describe briefly and present schematically its proposed management policy for its designated community as part of the topic discussion this week. Teams will have consulted the initial proposed policy document on ford (https://ford.ischool.utexas.edu/handle/2081/718) on overall policy for the iSchool repository as a context for their policy development. Use the document as a list of things that your handling of your particular materials might need to consider, such as privacy considerations or, if necessary, renegotiation of terms for materials for a project that continues a previous one. In particular, work out how you see the structure of communities, subcommunities, collections, items, and bitstreams, as well as what materials you plan to ingest and how you propose to ingest them; if you will be doing manual ingest, work out how you propose to set up any workflow you wish to use. The purpose here is to prepare for setting up the structure to receive your archival collections in DSpace and to have a plan for their maintenance.

Readings:

See DSpace system documentation (https://wiki.duraspace.org/display/DSDOC/DSpace+Documentation) under "Functional Overview: Ingest Process and Workflow" if you have not already done so (and even if you have).

Also review Chapter 4 in The Institutional Repository. Available on Canvas.

Finally, there is a (slightly old--2003) policy-outline document from MIT that is worth review (you can find it by plugging the following URL into the Wayback Machine): http://dspace.org/implement/policy-issues.html

ERPANET, ERPA Guidance: Ingest Strategies (ERPANET, September 2004). Available at http://www.erpanet.org/guidance/docs/ERPANETIngestTool.pdf This document will provide guidance to each team in deciding on the overall strategies/templates for their collection creation.

Producer-Archive Interface Methodology Abstract Standard, CCSDS 651.0-M-1 (this is the OAIS Ingest document from the CCSDS, issued as a "Magenta Book" in May 2004). Available at: http://public.ccsds.org/publications/archive/651x0m1.pdf
You need to read this document carefully for a broader picture of the process than is represented in the ERPANET document, to be sure you can contextualize the DSpace version of the process adequately.

Read 1) the ingest worksheet for manual ingest and 2-4) the batch import/ingest documents, all from the Resources page.

 

SPRING BREAK March 14-19

 

March 23: Ingest test set of documents (lab)

Topic: Students should have a test set of files and appropriate metadata ready to ingest into DSpace and will describe to the class how they chose the files and what specific problems the files raise. Step through the manual ingest process.

Readings:

DSpace system documentation: Ingest workflow (this is part of the document mentioned for last week).

March 30: Preparation of batch ingest (lab)

Topic: Perfecting understanding of the batch ingest process.

April 6: Ingest collections, test (lab)

Topic: Students will get on with the tasks of preparing and ingesting collections.

April 13: Ingest collections, test (lab)

Topic: Students will get on with the tasks of preparing and ingesting collections.

April 20: Ingest remaining collections, test (lab)

Topic: Additional manual ingest and/or preparation of batch ingest directory trees.
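Batch ingest in DSpace reads a directory tree in the Simple Archive Format: one subdirectory per item, each holding the item's bitstreams, a "contents" file listing their filenames one per line, and a "dublin_core.xml" file made of dcvalue elements. Here is a minimal sketch of generating such a tree, assuming Python 3.9+. The helper names and the item_000 naming pattern are illustrative choices, and the optional per-line options that a contents file can carry (bundle names, permissions, and the like) are omitted; check the DSpace 5.x import documentation before relying on it.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(item_dir: Path, files: list[Path],
                   dc: list[tuple[str, str, str]]) -> None:
    """Write one item directory.

    files: bitstreams to copy into the item directory.
    dc: (element, qualifier, value) triples; "none" marks an unqualified element.
    """
    item_dir.mkdir(parents=True, exist_ok=False)
    for f in files:
        shutil.copy2(f, item_dir / f.name)  # copy2 also keeps file timestamps
    # The contents file lists one bitstream filename per line.
    (item_dir / "contents").write_text(
        "".join(f.name + "\n" for f in files), encoding="utf-8")
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(item_dir / "dublin_core.xml",
                               encoding="utf-8", xml_declaration=True)


def build_saf_archive(archive_root: Path,
                      items: list[tuple[list[Path], list[tuple[str, str, str]]]]) -> list[Path]:
    """Create item_000, item_001, ... under archive_root and return their paths."""
    dirs = []
    for i, (files, dc) in enumerate(items):
        item_dir = archive_root / f"item_{i:03d}"
        write_saf_item(item_dir, files, dc)
        dirs.append(item_dir)
    return dirs
```

The resulting archive directory is what the DSpace import tool takes as its source; running it first against a throwaway test collection is a sensible way to catch metadata mistakes before the real ingest.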

April 27: Complete any remaining tasks (lab)

Topic: For example: mapping real collections onto virtual ones!

May 4: Summative discussion

Class evaluation to be done after this class (online).

Project team formal presentations, 10:00-12:00 (i.e., 15 minutes allotted to each project; all team members should participate).

Note: We will invite all the collection custodians to attend your presentations; treat the presentation as you would a professional presentation.

Final project report due (ingested to the pacer repository) and task journal complete.

May 6: Spring Open House display of project posters