Sunday, August 30, 2015

Lab 1 - Calculating Metrics for Spatial Data Quality

This first lab was about calculating metrics for spatial data. One of the first things to get a handle on was the difference between accuracy and precision. Accuracy is how close the measured or observed value is to the known or “true” value, while precision is how closely the measured or observed values cluster around each other.

The first part of the lab introduced us to using ArcMap to determine precision and accuracy. Using a set of 50 points representing a single location mapped 50 times with a GPS unit, we performed tasks to determine the accuracy and precision of the unit. First, we determined the average location by using the Statistics tool to obtain an average X and Y, then created a feature class and used Editor to place this point on our map. The shapefiles needed to be projected into a coordinate system that uses meters instead of decimal degrees.

Next, we needed to find the distances from the average location that contained a given percentage of the observations, specifically 50%, 68%, and 95%. To do this, I first performed a spatial join on the waypoints and average location shapefiles, which created a distance field. After sorting the distance field in ascending order, I set my buffers at the distance corresponding to the 25th value (50%), the 34th value (68%), and the value halfway between the 47th and 48th values (95%). This created buffers for the 50th, 68th, and 95th percentiles, which show the precision of the results. The map displaying this is shown below.
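The percentile step can also be sketched outside ArcMap. The Python below is only an illustration of the ranking approach described above, not the lab's actual workflow: the function names are hypothetical, and it assumes the points are already in a projected coordinate system measured in meters.

```python
import math

def mean_location(points):
    """Average X and Y of a list of (x, y) tuples."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def percentile_distance(points, center, percent):
    """Distance from center that contains `percent` of the points.

    Uses the ranking described in the post: sort the distances, then take
    the value at rank percent * n / 100 (1-based), interpolating halfway
    style when the rank is fractional (e.g. 47.5 of 50).
    """
    distances = sorted(
        math.hypot(x - center[0], y - center[1]) for x, y in points
    )
    rank = percent * len(distances) / 100.0
    lower = int(math.floor(rank))
    if lower < 1:
        return distances[0]
    if rank == lower:
        return distances[lower - 1]
    # Fractional rank: interpolate between the two neighbouring values.
    below, above = distances[lower - 1], distances[lower]
    return below + (rank - lower) * (above - below)
```

With 50 points, `percentile_distance(points, center, 50)` picks the 25th sorted distance, 68 picks the 34th, and 95 falls halfway between the 47th and 48th, matching the buffer distances chosen by hand.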



Using the 68th percentile as an accepted value for precision, I compared the device’s accuracy and precision. In the horizontal direction, the distance between the average location and the reference point was 3.8 meters (measured with the Measure tool), and the precision value was 4.4 meters. Because the distance between the average and the known location is less than the precision value, I concluded that the device is reasonably accurate in the horizontal direction. In the vertical direction, the distance between the average and the reference value was approximately 6 meters, while the precision value was 5.7 meters. This tells me that the device is not as accurate in the vertical direction.
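The comparison I used is a rule of thumb rather than a formal statistical test. As a small sketch (the function name is made up for illustration), it amounts to checking whether the offset from the reference falls inside the 68% precision distance:

```python
def offset_within_precision(offset_m, precision_m):
    """Rule of thumb from this post: treat the device as reasonably
    accurate when the distance between the average location and the
    reference is no larger than the 68th-percentile precision value."""
    return offset_m <= precision_m

# Values reported above, in meters.
horizontal_ok = offset_within_precision(3.8, 4.4)
vertical_ok = offset_within_precision(6.0, 5.7)
```

Here `horizontal_ok` comes out true and `vertical_ok` false, which is the same conclusion reached above.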

In Part B, we worked with a larger dataset, determined the Root Mean Square Error (RMSE), and created a cumulative distribution function (CDF) of the error. After opening the Excel file and copying the benchmark X and Y values over, I had columns for the measured points (X and Y values) and the benchmark X and Y values. From these I created new columns for the X and Y errors (the difference between the point values and the benchmark values), the squared XY error, and the XY error. From the XY error values, I calculated the mean, median, RMSE, and the 68th, 90th, and 95th percentiles. I also determined the minimum and maximum.
From the XY error values and the cumulative percent (there are 200 values, so each is 0.5 percent of the total), I created a scatterplot of Error_XY versus cumulative percent. Several of the metrics, such as the percentiles, can be read directly from this chart.
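The spreadsheet columns can be reproduced in a few lines of Python. This is a rough sketch, not the lab's spreadsheet: the function names are hypothetical, and it assumes the XY error is the straight-line distance between each point and its benchmark.

```python
import math

def xy_errors(points, benchmarks):
    """Straight-line error between each measured point and its benchmark.

    Assumption: Error_XY = sqrt(error_x**2 + error_y**2).
    """
    return [
        math.hypot(px - bx, py - by)
        for (px, py), (bx, by) in zip(points, benchmarks)
    ]

def rmse(errors):
    """Root mean square error: square root of the mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def cumulative_distribution(errors):
    """Pairs of (error, cumulative percent) for plotting a CDF.

    Each of n sorted values adds 100 / n percent, so 200 values
    add 0.5 percent each.
    """
    ordered = sorted(errors)
    n = len(ordered)
    return [(e, 100.0 * (i + 1) / n) for i, e in enumerate(ordered)]
```

Plotting the pairs from `cumulative_distribution` gives the same kind of chart as the scatterplot, where a percentile is read off as the error value at the matching cumulative percent.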

The most difficult part of the assignment was working out the difference between accuracy and precision. The definitions are straightforward enough, but I had trouble mentally picturing the difference when performing the calculations. It made sense to me to say that if the distance between the average location and the reference point was less than the 68% precision value, then the data is accurate, but it’s still a little unclear. Hopefully I can work a little more with the statistical aspect of GIS in the future and gain a better understanding of it.

Friday, August 7, 2015

Module 11 - Sharing Tools

This was our last assignment of the course, and in this one we worked with Python tools again. Last week we learned to create tools, and this week we learned to share them, so this is somewhat of a continuation from last week. In this lab we modified a script so that its parameters would come from file paths set in the tool itself instead of from paths hard-coded in the script. We did this by using sys.argv[]. The position numbers for this start at "1", as opposed to arcpy.GetParameter(), which starts at "0".
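The indexing difference is easier to see in code. The sketch below is not the lab script; the function and variable names are made up, and it only shows why sys.argv starts at 1 for tool parameters.

```python
import sys

def read_tool_parameters(argv):
    """Read two tool parameters from an argv-style list.

    argv[0] is always the path of the script itself, so the first
    parameter set in the tool dialog arrives as argv[1].
    """
    input_features = argv[1]
    output_features = argv[2]
    return input_features, output_features

# Inside a script tool this would be called as:
#     read_tool_parameters(sys.argv)
# The arcpy equivalent counts parameters from 0 instead:
#     input_features = arcpy.GetParameterAsText(0)
#     output_features = arcpy.GetParameterAsText(1)
```

Both approaches read the same values from the tool dialog; only the starting index differs.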

We learned to edit the item description of the script tool in ArcCatalog. We also learned to embed the script into the tool by right-clicking the tool in ArcCatalog and selecting "Import Script." One of the more interesting things we can do once the script is imported is set a password. The tool still runs without it, but the password must be entered to view or edit the script. The tool in the assignment creates a number of random points within a feature, and then creates a buffer around those points. Below are my tool dialog and the results in ArcMap.



You can see the feature in purple with the random points (in this case, 50 of them) in black, with the light blue buffers around them.

I really enjoyed this course, and I learned a lot from it. I would have liked to do a little more script writing, but this course is designed to use Python with ArcGIS, and I think it teaches that very well. I learned how to use Python to perform geoprocessing tasks, and about toolboxes and tools within Python, working with rasters and different geometries, etc. I think this course is a great start as one moves forward to work more with GIS and Python programming.