North Temperate Lakes Information Management System
Philosophy and Goals
Information management is an integral part of the research process at our site. From the design of data collection to incorporation in the centralized database, our information management system (IMS) is designed to be modular, adaptive and lightweight. Almost 40 years of data management experience have taught us that a successful IMS needs to respond easily to developing best practices, advancing programming environments and the evolving skills of its operators. In this rapidly changing environment, however, our primary goals for information management continue to be: (1) maintain data integrity; (2) document the research process for transparency and reusability; (3) facilitate research by making high quality data available for a long period; (4) provide technologies that enable researchers to collect, discover, access, integrate and analyze data across disciplines. The information management staff strives to stay informed about and evaluate new technologies and approaches that may advance these goals, and actively provides IM training, advice and service to graduate students, post-docs, and faculty.
Information Management System (IMS) Scope
The NTL IMS maintains all datasets collected as part of the NTL LTER project as well as datasets from associated projects (i.e., other NSF-funded research conducted at the site and by NTL PIs) and frequently needed supporting data from other agencies which may not otherwise be easily available. In addition to the main NTL website, the IMS supports the life cycle of associated project websites: creation, content development, and eventually curation and archiving of the content, with the most pertinent information included in the main NTL website under the ‘related projects’ category.
NTL zooplankton samples, phytoplankton slides, fish scales, and benthic invertebrate samples are stored at the University of Wisconsin-Madison Zoology Museum. Periodic maintenance is performed to prevent drying out of wet samples. An electronic catalog provides information on this collection.
Data Release, Access and Use Policies
Our longstanding policy is that all core data sets are available as soon as possible to all project PIs and staff. Core data sets are collected and managed centrally as opposed to being collected and managed by individual PIs. Typically, physical limnology data are available within a month, fish data within two months, and other data, including chemical limnology and other biological data, within a year but often much sooner. Data from instrumented buoys are uploaded to the database and published on the website in near real-time (approximately hourly).
We encourage collaborative explorations of our data. Our policy is to provide all core data prior to the most recent two years online on the website, and, in actual practice, much of the data are available publicly on the NTL website considerably sooner. Non-core data that are not restricted for access are also made available publicly. Very few data sets have restricted access, and the reasons for these access restrictions are almost always either proprietary (e.g., data sets produced by another source) or human subject confidentiality issues. Please see our data access/data use policy.
Long-term core data are stored in a relational database system (MySQL). In addition to maintaining the live data in a dynamic database, datasets are periodically archived as immutable, versioned ASCII files on our server. These files are uploaded to the repository of the Environmental Data Initiative (EDI) on a regular basis (annually for most, more frequently for high-frequency streaming sensor data), where they receive a Digital Object Identifier (DOI). Short-term (mostly student) research data of less broad application are deposited only as ASCII text files in our above-described archive and EDI. All data are accessible on our website for discovery and download, while the long-term core data may also be queried to extract subsets.
NTL LTER has computer facilities at the Hasler Laboratory and Trout Lake Station.
Corinna Gries was hired as lead information manager in 2009, replacing Barbara Benson, who had led the team since 1983 and is now retired. Gries provides stability and continuity to information management and its linkage to our science. With a Ph.D. in ecology, she is directly engaged in the design of research and frequently serves as lead or co-author of ecological papers in addition to papers on information management. She consults with students and other researchers on research data management. Our assistant Information Manager, Mark Gahler, joined our team in 2014. Paul Hanson maintains the LAN and computers at the Center for Limnology with the help of student hourly staff. With a Ph.D. in Limnology, he is a frequent contributor to NTL publications.
IMS Support for Science
This website provides researchers with powerful search capabilities for simple, direct access. Currently, most NTL researchers and the public access data through the dynamic query application available on the NTL web site within the online data catalog, which also supplies the supporting metadata.
In addition, all data are available through the public EDI data repository, which may be searched directly or via the DataONE search interface. There, datasets are available as comma-separated text files, which after download may be manipulated with local tools.
Information management staff provides support to project scientists and staff by developing data acquisition tools for technicians and assisting data analysis by researchers. To speed data acquisition and reduce data entry errors, custom software has been written for recording fish field data and counting and measuring zooplankton. Numerous programs have been written to manipulate raw data into forms requested by researchers (e.g., hypsometric averages of depth profile data, estimation of mixed layer depth, histograms of fish lengths). We have developed tools for querying sensor data to facilitate the use of these data in modeling, including nowcasting.
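As an illustration of one of the derived products named above, a hypsometric average weights each measurement in a depth profile by the lake area at that depth, so that values from the large, shallow layers count for more than values from the small, deep ones. The following is a minimal sketch only, not the NTL program itself: the function name, the argument layout and the use of trapezoidal integration are assumptions made for this example.

```python
def hypsometric_average(depths, values, areas):
    """Area-weighted mean of a depth profile (illustrative sketch).

    depths: sample depths in metres, strictly increasing
    values: measured value at each depth (e.g. temperature)
    areas:  lake area at each depth, taken from a hypsometric table
    All three argument names are hypothetical.
    """
    if not (len(depths) == len(values) == len(areas)):
        raise ValueError("depths, values and areas must have equal length")
    if len(depths) < 2:
        raise ValueError("at least two depths are needed to integrate")

    weighted_sum = 0.0  # integral of value * area over depth
    total_weight = 0.0  # integral of area over depth (approximate volume)
    for i in range(len(depths) - 1):
        dz = depths[i + 1] - depths[i]
        if dz <= 0:
            raise ValueError("depths must be strictly increasing")
        # Trapezoid rule on each layer between two adjacent sample depths.
        upper = values[i] * areas[i]
        lower = values[i + 1] * areas[i + 1]
        weighted_sum += 0.5 * (upper + lower) * dz
        total_weight += 0.5 * (areas[i] + areas[i + 1]) * dz

    if total_weight == 0:
        raise ValueError("total area weight is zero")
    return weighted_sum / total_weight
```

With a uniform profile the result equals the measured value regardless of the area curve, which is a quick sanity check when adapting the sketch to real hypsometric tables.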
Metadata are a crucial part of our information system. Ecological Metadata Language (EML), a metadata standard based on the XML Schema specification, is the standard adopted by the LTER Network. Each NTL online data set has associated metadata online in text file and EML formats.
The metadata for the non-spatial data are stored and managed in a MySQL database with this website being used as the management front end. The dynamic data catalog and forms for querying are generated for each data set based on these metadata.
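One way a query form can be driven by stored metadata is sketched below. It is a hedged illustration and not the NTL implementation: the metadata layout (a dictionary with "table" and "columns" keys), the function name build_query and the table and column names in the usage note are all hypothetical. The sketch only builds a parameterized SQL string; it does not connect to a database.

```python
ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}


def build_query(metadata, columns, filters):
    """Build a parameterized SELECT from dataset metadata (sketch).

    metadata: {"table": str, "columns": [{"name": str}, ...]}  (assumed layout)
    columns:  column names the user ticked on the form
    filters:  list of (column, operator, value) tuples from the form
    Returns (sql, params); %s is the placeholder style used by common
    MySQL drivers for Python.
    """
    known = {col["name"] for col in metadata["columns"]}

    # Only identifiers listed in the metadata may appear in the SQL text.
    for name in list(columns) + [f[0] for f in filters]:
        if name not in known:
            raise ValueError(f"unknown column: {name}")
    if not columns:
        raise ValueError("select at least one column")

    sql = "SELECT " + ", ".join(columns) + " FROM " + metadata["table"]
    clauses = []
    params = []
    for name, operator, value in filters:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        clauses.append(f"{name} {operator} %s")
        params.append(value)  # values travel separately from the SQL text
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

For example, with a hypothetical metadata record for a table named chem_demo holding columns lakeid and ph, calling build_query(meta, ["lakeid", "ph"], [("ph", ">=", 7)]) yields a SELECT statement with one placeholder and the parameter list [7]. Checking every identifier against the metadata keeps user input out of the SQL text itself.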
Data Quality Assurance / Quality Control
A number of different quality-control mechanisms have been established. For example, the sampling and analysis protocol for physical and chemical parameters includes random blind samples and replicate analyses at the ratio of approximately 1:10 (replicate:sample). Quality of the chemical results is verified by comparison with previous years’ data and parameter consistency checks. Water chemistry analytical performance is also evaluated every six months via analysis of blind samples provided through the USGS laboratory accreditation process. Error checking occurs in some data entry software and through proofreading. Data sets have a system of flags to indicate quality conditions such as non-standard routine or equipment used. Database triggers perform range checks on data captured in the database in near-real time from instrumented lake buoys. For other data sets, information management staff and technicians follow data-specific protocols for visual and computer-aided screening of data.
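The range checks and quality flags described above can be pictured with a short sketch. At NTL the checks on buoy data run as database triggers; the Python below is a stand-in written for illustration only, and the flag code "R", the variable names and the limit values in the usage note are hypothetical rather than taken from the NTL protocols.

```python
def flag_out_of_range(readings, limits):
    """Attach a quality flag to each sensor reading (illustrative sketch).

    readings: list of dicts with "variable" and "value" keys (assumed layout)
    limits:   {variable: (low, high)} plausible range per variable
    Returns new dicts with a "flag" key: "" when the value is within range,
    "R" (a made-up code) when it falls outside. The input is not modified.
    A variable missing from limits raises KeyError.
    """
    flagged = []
    for reading in readings:
        low, high = limits[reading["variable"]]
        in_range = low <= reading["value"] <= high
        flagged.append({**reading, "flag": "" if in_range else "R"})
    return flagged
```

Given hypothetical limits of 0 to 35 for a variable called water_temp, a reading of 12.4 keeps an empty flag while a reading of 85.0 is flagged "R". Flagging rather than deleting suspect values keeps the raw record intact for later review.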