A subset of lake and geographic data was created to examine spatial variation in the relationships of total phosphorus (TP) and water color with chlorophyll a (CHLa) across broad geographic extents, using spatially-varying coefficient models in a Bayesian framework. Lakes were selected that had complete records for summer epilimnetic TP, true water color, and CHLa. In addition, we selected lakes with surface area greater than or equal to 4 ha and less than 10,000 ha to exclude very small and very large lakes from the analyses. The resulting dataset includes 838 lakes in Wisconsin, Michigan, New York, and Maine, with 7395 observations. Most lakes in the data subset (~72%) have only one water chemistry observation. The other 228 lakes have more than one water chemistry observation, taken on different sampling occasions over time (an average of 29 observations per lake with repeated measures). The dataset reports the original, individual measurements. The proportions of agriculture and wetlands in each lake catchment were derived from land cover and land use data in the National Land Cover Database (2006). For the analyses, we withheld ten percent of the observations for model validation and to assess prediction accuracy; the remaining observations were used in the model-building steps. The 'dataset' column in the data indicates whether an observation belongs to the model-building ('mb') or hold-out ('h') dataset.
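
The selection criteria and the model-building/hold-out split described above can be sketched in a few lines of Python. This is only an illustration, not the processing code used to build the dataset. The column names lake_id, area_ha, tp, color, and chla are hypothetical, and the sample rows are made-up toy values; only the 'dataset' column and its 'mb' and 'h' codes come from the description. The distributed data are presumably already filtered, so the eligibility check is shown just to make the criteria concrete.

```python
import csv
import io

# Area bounds from the description: >= 4 ha and < 10,000 ha.
MIN_AREA_HA = 4.0
MAX_AREA_HA = 10_000.0

# Hypothetical column names for the three required water chemistry variables.
REQUIRED = ("tp", "color", "chla")


def is_eligible(row):
    """True if TP, color, and CHLa are all present and area is in [4, 10000) ha."""
    if any(row.get(name, "").strip() == "" for name in REQUIRED):
        return False
    area = float(row["area_ha"])
    return MIN_AREA_HA <= area < MAX_AREA_HA


def split_by_dataset(rows):
    """Separate rows into model-building ('mb') and hold-out ('h') lists."""
    model_building = [r for r in rows if r["dataset"] == "mb"]
    hold_out = [r for r in rows if r["dataset"] == "h"]
    return model_building, hold_out


# Toy rows invented for illustration; they are not values from the dataset.
SAMPLE_CSV = """lake_id,area_ha,tp,color,chla,dataset
A,3.5,12,20,4.1,mb
B,4.0,15,35,6.3,mb
C,250,9,,2.2,mb
D,9999,30,60,11.0,h
E,10000,22,40,8.5,h
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
eligible = [r for r in rows if is_eligible(r)]
model_building, hold_out = split_by_dataset(eligible)

# Lake A is too small, C lacks a color value, and E reaches the upper bound.
print([r["lake_id"] for r in model_building])  # ['B']
print([r["lake_id"] for r in hold_out])        # ['D']
```

Note that the lower area bound is inclusive and the upper bound is exclusive, matching "greater than or equal to 4 ha and less than 10,000 ha".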