Sunday, August 30, 2015

Lab 1 - Calculating Metrics for Spatial Data Quality

This first lab was about calculating metrics for spatial data. One of the first things to get a handle on was the difference between accuracy and precision: accuracy is how close a measured or observed value is to the known or “true” value, while precision is how closely clustered the measured or observed values are to each other.

The first part of the lab introduced us to using ArcMap to determine precision and accuracy. Using a set of 50 points representing a single location mapped 50 times with a GPS unit, we performed tasks to determine the accuracy and precision of the unit. First, we determined the average location by using the Statistics tool to obtain an average X and Y coordinate, then created a feature class and used Editor to place this point on the map. The shapefiles needed to be projected into a coordinate system that uses meters instead of decimal degrees.

Next, we needed to find the distances from the average location that enclosed a given percentage of the observations, specifically 50%, 68%, and 95%. To do this, I first performed a spatial join on the waypoints and average-location shapefiles, which created a distance field. After sorting the distance field in ascending order, I set my buffers at the distances corresponding to the 25th value (50%) and the 34th value (68%), and at the value halfway between the 47th and 48th values (95%). The resulting buffers for the 50th, 68th, and 95th percentiles show the precision of the results. The map displaying this is shown below.
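The sort-and-count step above can be sketched in a few lines of Python. This is just an illustration of the rule I used, not the ArcMap workflow itself; the coordinates are made-up values standing in for the 50 GPS fixes.

```python
import math
import random

# Hypothetical sample: 50 repeated GPS fixes of one location, in projected
# (meter-based) coordinates. These values are made up for illustration.
random.seed(42)
points = [(random.gauss(500000, 3), random.gauss(4500000, 3)) for _ in range(50)]

# Average (mean) location, as obtained from ArcMap's Statistics tool
avg_x = sum(x for x, _ in points) / len(points)
avg_y = sum(y for _, y in points) / len(points)

# Distance of each fix from the average location, sorted ascending
dists = sorted(math.hypot(x - avg_x, y - avg_y) for x, y in points)

# Percentile distances using the counting rule from the lab:
# 50% -> 25th value, 68% -> 34th value, 95% -> midway between 47th and 48th
d50 = dists[24]                      # 25th value (index 24)
d68 = dists[33]                      # 34th value (index 33)
d95 = (dists[46] + dists[47]) / 2    # halfway between 47th and 48th values

print(f"50%: {d50:.2f} m, 68%: {d68:.2f} m, 95%: {d95:.2f} m")
```

The three distances are what I used as buffer radii around the average location.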



Using the 68th percentile as an accepted value for precision, I compared the device’s accuracy and precision. In the horizontal direction, the distance between the average location and the reference point was determined (using the Measure tool) to be 3.8 meters. The precision value was 4.4 meters. Since the distance between the average and the known point is less than the precision value, I determined that the device is reasonably accurate in the horizontal direction. In the vertical direction, the distance between the average and the reference value was approximately 6 meters, while the precision value is 5.7 meters. This tells me that the device is not as accurate in the vertical direction.

In Part B, we worked with a larger dataset, determined the Root Mean Square Error (RMSE), and created a cumulative distribution function (CDF) of the error. After opening the Excel file and copying the benchmark X and Y values over, I had columns for the measured points (X and Y values) and the benchmark X and Y values. From these I created new columns and calculated the X and Y errors (the differences between the point values and the benchmark values), the squared XY error, and the error XY values (the combined horizontal error for each point). From the error XY values, I calculated the mean, median, RMSE, and the 68th, 90th, and 95th percentiles. I also determined the minimum and maximum.
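The RMSE portion of the spreadsheet work boils down to a short calculation. This sketch uses a handful of invented error values rather than the lab's 200-point dataset, just to show the arithmetic:

```python
import math

# Hypothetical X and Y errors (measured minus benchmark), in meters
x_err = [1.2, -0.8, 2.1, -1.5]
y_err = [0.5, 1.9, -0.3, 1.1]

# Combined horizontal (XY) error for each point
err_xy = [math.hypot(dx, dy) for dx, dy in zip(x_err, y_err)]

# RMSE: square root of the mean of the squared XY errors
rmse = math.sqrt(sum(e * e for e in err_xy) / len(err_xy))

# Other summary metrics from the lab
err_sorted = sorted(err_xy)
mean_err = sum(err_xy) / len(err_xy)
min_err, max_err = err_sorted[0], err_sorted[-1]
```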
From the error XY values and the cumulative percent (there are 200 values, so each is 0.5 percent of the total), I created a scatterplot of Error_XY versus cumulative percent. This CDF chart demonstrates that several of the metrics, such as the percentile values, can be read directly off the plot.
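Reading a percentile off the CDF amounts to finding the smallest error whose cumulative percent reaches the target. A minimal sketch, using ten invented error values instead of the lab's 200 (the helper function is my own, purely for illustration):

```python
# Sorted combined XY errors, in meters (illustrative values only)
err_xy = sorted([0.9, 1.4, 2.0, 2.7, 3.1, 3.8, 4.6, 5.2, 6.0, 7.5])

n = len(err_xy)
# Each value contributes 100/n percent (0.5% each for the lab's 200 points)
cum_pct = [(i + 1) * 100.0 / n for i in range(n)]

def percentile_from_cdf(p):
    """Smallest error whose cumulative percent reaches p."""
    for e, c in zip(err_xy, cum_pct):
        if c >= p:
            return e
    return err_xy[-1]

print(percentile_from_cdf(68))  # error value at the 68% cumulative mark
```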

The most difficult part of the assignment was determining the difference between accuracy and precision. The definition is straightforward enough, but I was having issues mentally picturing the difference when performing the calculations. It made sense to me to say that if the distance between the average location and the reference point was less than the 68% precision value, then the data is accurate, but it’s still a little unclear. Hopefully I can work a little more with the statistical aspect of GIS in the future and gain a better understanding of it.
