INF 382R - Introduction to Scientific Data Informatics
Introduction to the characteristics of scientific data and the emerging practices applied
toward their management and preservation.
The class will include (1) an introduction to the information properties of scientific data and to scientific data modeling; (2) an introduction to emerging Semantic Web approaches to describing these data sets; (3) hands-on assessments and evaluations of public scientific data sets; and (4) discussions of current issues in scientific data sharing, publication, and preservation.
Three lecture hours a week for one semester.
Detailed Course Summary
To quote Godfray and Knapp (from an article on taxonomy for the 21st century):
"Any student of data needs to learn the tools and techniques needed to identify and describe the various elements and properties of data objects and data collections."
The goal of this class is to facilitate such learning. The focus of the class is on the information properties of scientific and technical data and of collections of such data. In addition, we will devote time to learning about some methods and tools associated with data science. We will use existing libraries in the statistics system R to explore what these tools can provide for data informaticians. Those of us aspiring to be data informaticians need familiarity with emerging trends in research data and big-data domains.
A course synopsis: An introduction to scientific data informatics, in which we (1) examine the information properties of scientific data; (2) develop criteria to appraise the properties of scientific data sets; (3) learn about and use command-line data science tools and R libraries to answer data informatics questions; (4) take an expedition into the Semantic Web and the networks of Linked Data, and learn about their use in the sciences; (5) examine issues of long-term management of, and access to, scientific data from the perspectives of working scientists; and (6) summarize what we have learned from an inquiry-based project.
The course work includes weekly readings from the information science, computer science, and systems literature, as well as from current science, science policy, and science practice media, and from social media. Weekly news and views are collected on a class wiki. A set of assignments provides hands-on experience with data informatics tools. The data science and R sections also include hands-on assignments. There is a major project that results in a report and a presentation.
Other than comfort in using computers for common research and writing work, there are no prerequisites for this class. The entire scientific and research data area is in a period of growth, transformation, and development. I look forward to learning more about it with you.
H.C.J. Godfray and S. Knapp (2004), Introduction to 'Taxonomy for the twenty-first century', http://dx.doi.org/10.1098/rstb.2003.1457