North Temperate Lakes Information Management System

Philosophy and Goals

Information management is an integral part of the research process at our site. The NTL information management system is designed to facilitate interdisciplinary research. From the design of data collection, to incorporation in the centralized database, to analyses, we focus on linkages among the components of the ecological and social systems we study. Our primary goals for information management are to (1) maintain database integrity, (2) create a powerful and accessible environment for the retrieval of information, (3) facilitate linkages among diverse data sets, and (4) provide adequate metadata for data interpretation and analysis.

Information Management System (IMS) Scope

Most of the core data collected by NTL reside in the North Temperate Lakes LTER Database, implemented in MySQL. Core spatial imagery (i.e., MODIS) is stored on network hard drives at the Center for Limnology and is accessible to NTL researchers. Non-core data of general interest, such as regional limnological surveys, are also maintained in MySQL, as are data from associated projects (e.g., the NTL Microbial Observatory). Other data of narrower application, such as student research data, are archived in text format. Under our current operating procedure, all LTER graduate students meet with the information management staff to be advised on data management and to ensure that their data and metadata are archived. In the past, LTER graduate student data were archived as resources permitted, according to priorities set by the PIs. Core data are available to all NTL researchers as soon as the data are entered and quality screened.

NTL zooplankton samples, phytoplankton slides, fish scales, and benthic invertebrate samples are stored at the University of Wisconsin-Madison Zoology Museum. Periodic maintenance is performed to keep wet samples from drying out. An electronic catalog provides information on this collection.

Data Release, Access and Use Policies

Our longstanding policy is that all core data sets are available as soon as possible to all project PIs and staff. Core data sets are collected and managed centrally as opposed to being collected and managed by individual PIs. Typically, physical limnology data are available within a month, fish data within two months, and other data, including chemical limnology and other biological data, within a year but often much sooner. Data from instrumented buoys are uploaded to the database and published on the website in near real-time (approximately hourly).

We encourage collaborative explorations of our data. Our policy is to make all core data older than the most recent two years publicly available on the website, and in actual practice much of the data are posted considerably sooner. Non-core data that are not restricted are also made publicly available. Very few data sets have restricted access, and the reasons for these restrictions are almost always either proprietary (e.g., data sets produced by another source) or human-subject confidentiality issues. Our data access/data use policy is available at http://lter.limnology.wisc.edu/about/ntl-lter-data-access-policy.

IMS Design

Core data are entered, updated, and maintained in the MySQL database using scripts in the Kepler workflow system as well as through an application installed on networked Windows workstations. Most physical and chemical limnology data are entered through a locally developed web application that manages samples and data from core sampling. Data from the MySQL database are made available through several methods for viewing and/or downloading over a network connection. The database provides broad functionality for maintaining database integrity (e.g., password-controlled access, privileges and roles to control read/write permissions, recovery from system crashes, backup utilities). Files archived as text may also be downloaded through the online data catalog.
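
As a small illustration of the kind of programmatic read access the system supports, the sketch below connects to a MySQL database with the pymysql driver. The host, credentials, schema, and table/column names (physical_limnology, lakeid, sampledate, wtemp) are hypothetical placeholders, not the production configuration.

```python
# Minimal sketch of read access to a core MySQL database.
# Host, credentials, and table/column names are hypothetical.
import pymysql

conn = pymysql.connect(
    host="db.example.edu",   # placeholder host
    user="ntl_reader",       # a read-only role; roles/privileges control writes
    password="********",
    database="ntl_lter",     # placeholder schema name
)
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT lakeid, sampledate, depth, wtemp "
            "FROM physical_limnology "  # hypothetical table name
            "WHERE lakeid = %s AND sampledate >= %s",
            ("TR", "2000-01-01"),
        )
        for row in cur.fetchall():
            print(row)
finally:
    conn.close()
```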

A major area of recent innovation has been the development of IM tools for handling sensor data streams, which bring new challenges of near-real-time processing and increased data volumes. We are collaborating with computer scientists (at the San Diego Supercomputer Center, Indiana University, and SUNY-Binghamton) on automated scaling and processing of sensor network data. At NTL we are exploring an alternative data model better suited to the automated reconfiguration necessitated by changes in sensor deployment, as well as developing tools to query, edit, display, and download sensor data.
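
A minimal sketch of the kind of "long" observation model this implies, under invented names (Observation, parse_buoy_line, and the deployment map are all illustrative): each reading becomes an independent record keyed by site, sensor, variable, and timestamp, so a redeployment changes a configuration table rather than the database schema.

```python
# Illustrative long-format observation model for streaming buoy data.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Observation:
    site: str          # e.g., a buoy identifier
    sensor: str        # physical instrument identifier
    variable: str      # measured quantity, e.g., "water_temp_c"
    depth_m: float     # deployment depth; changes when the buoy is reconfigured
    timestamp: datetime
    value: float
    flag: str = ""     # quality flag, empty when the value passes screening

def parse_buoy_line(line: str) -> list[Observation]:
    """Convert one wide, comma-separated telemetry line into long records."""
    ts_raw, *values = line.strip().split(",")
    ts = datetime.fromisoformat(ts_raw).replace(tzinfo=timezone.utc)
    # Deployment map: column position -> (sensor, variable, depth). Editing
    # this map is all that is needed after a redeployment.
    deployment = [("T0123", "water_temp_c", 0.5), ("T0124", "water_temp_c", 2.0)]
    return [
        Observation("TroutBuoy", s, v, d, ts, float(x))
        for (s, v, d), x in zip(deployment, values)
    ]

print(parse_buoy_line("2011-07-01T10:00:00,24.1,22.8"))
```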

NTL LTER has computer facilities at the Hasler Laboratory and Trout Lake Station.

IMS Staffing

Corinna Gries was hired as lead information manager in 2009, replacing Barbara Benson, who led the team from 1983 until her retirement. Gries provides stability and continuity to information management and its linkage to our science. With a Ph.D. in ecology, she is directly engaged in research design and frequently serves as lead or co-author of ecological papers in addition to papers on information management. She consults with students and other researchers on research data management. Our assistant information manager and GIS specialist, Aaron Stephenson, holds an M.S. in environmental monitoring and worked in industry before joining our team in 2010. The laboratory manager, Paul Hanson, maintains the LAN and computers at the Center for Limnology with the help of student hourly staff. With a Ph.D. in limnology, he is a frequent contributor to NTL publications.

IMS Support for Science

The client/server environment provides researchers with the powerful search and linkage capabilities of a relational database together with end-user query tools for simple, direct access. Currently, most NTL researchers and the public access data through the dynamic query application available on the NTL web site within the online data catalog that also supplies the supporting metadata. For more sophisticated queries a generic online interface is provided. With this tool, a researcher may retrieve information from the database to answer questions such as "What was the average epilimnetic chlorophyll concentration in Trout Lake during the ice-free season for each year since 1982?" The relational database supports the linking of the chlorophyll concentration table with the ice duration table, and the subsetting and aggregation that this request entails.
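
A sketch of the relational logic behind that example question, using hypothetical table and column names (chlorophyll, ice_duration, chlor, ice_off, ice_on; the real NTL schema differs): the join restricts samples to each year's ice-free season, the depth filter approximates the epilimnion, and the GROUP BY yields one mean per year.

```python
# Hypothetical schema: chlorophyll(lakeid, sampledate, depth, chlor) and
# ice_duration(lakeid, year, ice_off, ice_on). Connection details are placeholders.
import pymysql

sql = """
SELECT YEAR(c.sampledate) AS yr, AVG(c.chlor) AS mean_chlor
FROM chlorophyll AS c
JOIN ice_duration AS i
  ON c.lakeid = i.lakeid
 AND c.sampledate BETWEEN i.ice_off AND i.ice_on
WHERE c.lakeid = 'TR'              -- Trout Lake
  AND c.depth <= 4                 -- illustrative epilimnion cutoff in meters
  AND YEAR(c.sampledate) >= 1982
GROUP BY YEAR(c.sampledate)
ORDER BY yr
"""

conn = pymysql.connect(host="db.example.edu", user="ntl_reader",
                       password="********", database="ntl_lter")
with conn.cursor() as cur:
    cur.execute(sql)
    for year, mean_chlor in cur.fetchall():
        print(year, mean_chlor)
conn.close()
```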

Information management staff support project scientists and staff by developing data acquisition tools for technicians and by assisting researchers with data analysis. To speed data acquisition and reduce data entry errors, custom software has been written for recording fish field data and for counting and measuring zooplankton. Numerous programs have been written to transform raw data into forms requested by researchers (e.g., hypsometric averages of depth profile data, estimation of mixed layer depth, histograms of fish lengths). We have developed tools for querying sensor data to facilitate their use in modeling, including nowcasting.
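
As one example of such a derived product, the sketch below is a minimal mixed-layer-depth estimator (an illustration, not the site's actual program): it scans a temperature profile from the surface down and reports the first depth where temperature drops more than a threshold below the surface value.

```python
# Illustrative threshold-based mixed-layer-depth estimator.
def mixed_layer_depth(depths_m, temps_c, threshold_c=1.0):
    """Return the estimated mixed layer depth, or the deepest measurement if
    no reading exceeds the threshold (profile assumed ordered surface-down)."""
    surface = temps_c[0]
    for z, t in zip(depths_m, temps_c):
        if surface - t > threshold_c:
            return z
    return depths_m[-1]

# Example profile: warm epilimnion above a sharp thermocline.
print(mixed_layer_depth([0, 1, 2, 3, 4, 5, 6],
                        [24.0, 23.9, 23.8, 23.5, 19.0, 14.0, 11.0]))  # -> 4
```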

Metadata

Metadata are a crucial part of our information system. Ecological Metadata Language (EML), a metadata standard based on the XML Schema specification, is the standard adopted by the LTER Network. Each NTL online data set has associated metadata online in text file and EML formats. Metadata for spatial data include copyrights, map scale, coordinate accuracy, and data lineage information and are available as text or EML.  Field and lab methods are documented for each core data set and available online for most data sets.
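
As a small illustration of what an EML document contains, the sketch below emits a minimal (and deliberately incomplete) EML fragment using Python's standard library; the packageId, namespace version, and title are illustrative, and real NTL documents carry far richer content and are validated against the EML schema before publication.

```python
# Sketch of building a minimal EML fragment; values are illustrative only.
import xml.etree.ElementTree as ET

EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"  # illustrative version
ET.register_namespace("eml", EML_NS)

eml = ET.Element(f"{{{EML_NS}}}eml",
                 {"packageId": "knb-lter-ntl.1.1", "system": "knb"})
dataset = ET.SubElement(eml, "dataset")
ET.SubElement(dataset, "title").text = (
    "North Temperate Lakes LTER: Physical Limnology (illustrative)")
creator = ET.SubElement(dataset, "creator")
ET.SubElement(creator, "organizationName").text = "NTL LTER"

print(ET.tostring(eml, encoding="unicode"))
```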

The metadata for the non-spatial data are stored in the MySQL database and drive the dynamic database application in the data catalog, allowing query forms to be generated dynamically for each data set maintained in the database and provided in the catalog. We have harvested the EML documents for the NTL core data sets into Metacat, the central metadata catalog for the LTER Network. These harvested documents are valid EML and describe identification, discovery, evaluation, access, and integration information.
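
A sketch of the metadata-driven idea, with an invented attribute structure: because the catalog stores attribute-level metadata for each data set, one generic routine can emit query-form fields for any data set, rather than requiring a hand-built form per data set.

```python
# Illustrative generation of query-form fields from attribute metadata.
attributes = [
    {"name": "lakeid", "type": "enum", "values": ["TR", "SP", "CR"]},
    {"name": "sampledate", "type": "date"},
    {"name": "depth", "type": "numeric", "unit": "m"},
]

def form_field(attr):
    """Render one HTML form field from one attribute-metadata record."""
    if attr["type"] == "enum":
        options = "".join(f"<option>{v}</option>" for v in attr["values"])
        return f'<select name="{attr["name"]}">{options}</select>'
    if attr["type"] == "date":
        return f'<input type="date" name="{attr["name"]}">'
    return f'<input type="number" name="{attr["name"]}">'

print("\n".join(form_field(a) for a in attributes))
```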

Data Quality Assurance / Quality Control

A number of quality-control mechanisms have been established. For example, the sampling and analysis protocol for physical and chemical parameters includes random blind samples and replicate analyses at a ratio of approximately 1:10 (replicate:sample). The quality of chemical results is verified by comparison with previous years' data and by parameter consistency checks. Water chemistry analytical performance is also evaluated every six months through analysis of blind samples provided through the USGS laboratory accreditation process. Error checking occurs in some data entry software and through proofreading. Data sets carry a system of flags to indicate quality conditions such as the use of a non-standard routine or equipment. Database triggers perform range checks on data captured in near-real time from instrumented lake buoys. For other data sets, information management staff and technicians follow data-specific protocols for visual and computer-aided screening of data.
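
An illustrative version of such a range check (the variable names and ranges below are examples, not the production trigger logic): out-of-range values are flagged rather than silently discarded, preserving the raw record for later review.

```python
# Illustrative per-variable range screening with quality flags.
RANGES = {"water_temp_c": (-1.0, 40.0), "do_mg_l": (0.0, 25.0)}

def screen(variable: str, value: float) -> str:
    """Return a quality flag: '' if in range, 'R' if out of range,
    'U' if the variable has no configured range."""
    if variable not in RANGES:
        return "U"
    lo, hi = RANGES[variable]
    return "" if lo <= value <= hi else "R"

assert screen("water_temp_c", 18.2) == ""   # plausible reading passes
assert screen("water_temp_c", 55.0) == "R"  # implausible reading is flagged
```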

Future Directions and Challenges

The universal accessibility of the web makes our website the main entry point for external data distribution. We plan future enhancements to the data catalog for searching and browsing that will allow searches based on study site, thematic area, and selected metadata fields.

To enhance the dissemination of our spatial data, we will extend our current web-mapping applications. New tools will allow searching, segmenting, and downloading of spatial data sets based on a user-specified spatial extent (e.g., a watershed, riparian buffer, or groundwater recharge zone). In addition, we plan to offer tools that enable users to integrate spatial and tabular data using geoprocessing algorithms served over the web.
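
A sketch of what such extent-based subsetting could look like, assuming geopandas as the tooling and hypothetical file names; the eventual implementation would run server-side within the web-mapping application rather than on a user's desktop.

```python
# Illustrative clip of a spatial data set to a user-specified extent.
import geopandas as gpd

lakes = gpd.read_file("ntl_lakes.shp")            # full spatial data set (hypothetical)
watershed = gpd.read_file("trout_watershed.shp")  # user-specified extent (hypothetical)

subset = gpd.clip(lakes, watershed)               # segment to the extent
subset.to_file("lakes_in_watershed.gpkg", driver="GPKG")  # package for download
```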

We will continue our leadership role in IMS design and tools for sensor network data within both the LTER Network and GLEON.