Hanson PC, Carpenter S, Cardille JA, Coe MT, Winslow LA. 2007. Small lakes dominate a random sample of regional lake characteristics. Freshwater Biology. 52:814-22

Lakes were selected from unique Water Body Identification Codes (WBICs). Linear features and water bodies classified as impoundments or stream openings were identified from maps digitised by the Departments of Natural Resources of Michigan and Wisconsin (1 : 24 000 USGS 7.5’ topographic quadrangles) and were excluded. More than 7500 lakes ranging in size from about 0.01 to over 2800 ha remained in the data set. We used a stratified random survey, an approach consistent with the Environmental Monitoring and Assessment Program (EMAP) guidelines (Larsen et al., 1994) of the U.S. Environmental Protection Agency, to select and sample 300 lakes from the data set as follows. All lakes were ordered by area and divided into 20 bins of equal population. From each bin, 15 lakes were chosen at random. Because of logistical issues in travelling to many lakes scattered over a wide geographical region, we clustered lakes into 31 geographically small regions of about 150 km² each. The order of regions sampled was randomised to reduce correlation of geographic region with time. For any one sampling date we visited only one region, although not all lakes in a region could be visited on a single trip. After all 31 regions were visited, the regions were again selected at random, and lakes previously not visited were sampled. There were 45 sampling days spread between May 20 and August 19. Some lakes that were chosen for sampling could not be visited. Difficulty portaging the sampling gear to a lake or failure to gain access to a lake through private property were reasons for abandoning a sampling effort.

Lakes were sampled at their approximate geographic centre. Lake depth and water clarity were measured with a Secchi disk. Our measurement of lake depth was a measurement of neither the maximum nor the mean depth.
Because the measurement was made in the middle of the lake and most lakes in the region tend to be bowl shaped, our measurement was probably between mean and maximum depth. Dissolved oxygen (DO) and thermal profiles were obtained from a YSI Model 58 (YSI, Inc., Yellow Springs, OH, U.S.A.) meter (DO air calibrated; temperature calibrated in the laboratory), and the approximate middle of the epilimnion was estimated from the profile. Thermal stratification was calculated from the thermal profile according to the methods listed on the Internet at the North Temperate Lakes Long Term Ecological Research (NTL-LTER) program Web site (http://lter.limnology.wisc.edu). Water samples for later analyses (Table 1, chemical variables) were obtained from the middle of the epilimnion, using a peristaltic pump. For samples that required filtration [dissolved inorganic carbon (DIC), dissolved organic carbon (DOC), cations and anions], a 0.45 μm filter was attached in-line. All samples were refrigerated upon returning to the vehicle, and samples for total nitrogen (TN) and total phosphorus (TP) were preserved by acidification. Acid neutralizing capacity (ANC) and pH were determined on the day of sampling by Gran alkalinity titration (for ANC) and measurement by pH probe (Accumet 950; Fisher Scientific, Hanover Park, IL, U.S.A.). pH was not air equilibrated. DIC and DOC were measured with a carbon analyzer (TOC-V; Shimadzu Scientific Instruments, Columbia, MD, U.S.A.). TN and TP were measured with a segmented flow auto-analyzer (Astoria-Pacific, Inc., Clackamas, OR, U.S.A.). Anions were measured using an ion chromatograph (DX500; Dionex Corporation, Sunnyvale, CA, U.S.A.), and cations using inductively coupled plasma mass spectrometry (ICP-MS; PerkinElmer Life and Analytical Sciences, Shelton, CT, U.S.A.). Details of chemical analyses are available on the Internet at the NTL-LTER Web site listed above.

To correct for bias introduced by not sampling all 300 lakes, we replaced missing data using multiple imputation (Levy, 1999).
Multiple imputation is a technique for estimating the uncertainty of imputed variables. For each variable for each lake not sampled in a given bin, we chose at random (with replacement) a value from lakes sampled in that bin. We repeated the imputation 1000 times to provide a distribution of estimates for each variable in the lakes not sampled. The distribution mean for each variable in each lake was used in the calculation of the median for the regional lake population. We chose to present the median for the 300 lakes because distributions tended to be highly skewed. For comparison purposes, we also calculated the median from sampled lakes only (i.e. excluding imputed data). The mean cumulative distributions for some variables, including 95% confidence intervals, were plotted from the 1000 cumulative distributions generated by multiple imputation.

We fit a Pareto distribution to the regional lake area data set to compare the size distribution of NHLD lakes with those of other regions. We used the maximum likelihood estimator for parameter estimates (Bernardo & Smith, 2000). Of particular interest is the parameter (β) that describes the logarithmic decline in number of lakes with lake area, because this parameter has been used previously (Downing et al., 2006, Table 1) to compare lake area distributions among regions and to estimate the global abundance of lakes.

Where indicated, results have been area weighted to reflect the influence of lake size. For correlations, data were transformed (log10) to normalise distributions and linearise relationships. Shoreline development factor (SDF), an index of the irregular shape of lakes, was calculated for each lake according to Kalff (2002). The minimum SDF, 1, indicates a lake is a perfect circle.
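
The three calculations described above (within-bin imputation, the Pareto fit and SDF) can be illustrated with a minimal Python sketch. It is not the code used in the study, and it rests on several assumptions. The function and argument names are hypothetical. SDF is taken to be the standard ratio of shoreline length to the circumference of a circle of equal area, which we assume matches the form given by Kalff (2002). The Pareto scale parameter is assumed to be the smallest observed lake area, so that the shape estimate is β = n / Σ ln(a_i / a_min). The imputation function summarises each unsampled lake by the mean of its draws, as in the text.

```python
import math
import random
import statistics


def shoreline_development_factor(shoreline_m, area_m2):
    """Shoreline length divided by the circumference of a circle of equal area."""
    return shoreline_m / (2.0 * math.sqrt(math.pi * area_m2))


def pareto_mle(areas):
    """Maximum likelihood fit of a Pareto distribution to lake areas.

    Assumes the scale parameter equals the smallest observed area (a_min).
    Returns (a_min, beta) with beta = n / sum(ln(a_i / a_min)).
    Raises ZeroDivisionError if all areas are identical.
    """
    a_min = min(areas)
    log_sum = sum(math.log(a / a_min) for a in areas)
    return a_min, len(areas) / log_sum


def impute_bin(sampled_values, n_missing, n_draws=1000, seed=0):
    """Impute one variable for the unsampled lakes of a single size bin.

    For each unsampled lake, draw n_draws values with replacement from the
    lakes sampled in the same bin and keep the mean of those draws.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(sampled_values, k=n_draws))
        for _ in range(n_missing)
    ]


# Illustrative values only, not data from the survey.
radius_m = 100.0
sdf_circle = shoreline_development_factor(
    2.0 * math.pi * radius_m, math.pi * radius_m ** 2
)  # a perfect circle gives an SDF of 1 (up to rounding)
a_min, beta = pareto_mle([1.0, math.e, math.e ** 2])
imputed = impute_bin([2.1, 3.4, 5.0, 8.2], n_missing=3)
```

The regional median would then be taken over the measured values of sampled lakes together with the imputed means of the unsampled lakes, repeating `impute_bin` for each variable in each of the 20 bins.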