Sunday, June 28, 2015

Lab 6 - Crime Analysis

This week’s exercise was about crime mapping, specifically hotspot analysis. We became familiar with various techniques commonly used in crime analysis, and we learned to aggregate crime events to calculate crime rates. We examined spatial patterns in crime rates and their relationship to socio-economic characteristics. We also learned about global and local spatial clustering methods, and we compared the reliability of different hotspot mapping methods for crime prediction.

In the first part of the assignment, we were to determine whether a relationship existed between socio-economic variables and residential burglaries. From the crime dataset provided, I used a SQL query to select only residential burglaries and spatially joined that data with census data. A table join with the demographic data then allowed me to calculate the crime rate per 1,000 housing units. From there, I created a choropleth map of residential burglaries per 1,000 housing units by census tract, as well as a choropleth map of the percentage of housing units that were rented in each census tract. Finally, I created a scatterplot of the percentage of housing units rented vs. the number of burglaries per 1,000 housing units.
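Since the rate calculation is the core of this part, here is a minimal arcpy sketch of that workflow; the layer names, the crime-type value, and the housing-unit field (HSE_UNITS) are placeholders rather than the lab data's actual schema.

```python
import arcpy

# Hypothetical layer names and where clause; the real dataset's fields differ.
arcpy.MakeFeatureLayer_management("crimes.shp", "burglary_lyr",
                                  "CRIME_TYPE = 'RESIDENTIAL BURGLARY'")

# Spatially join burglary points to census tracts to get a count per tract.
arcpy.SpatialJoin_analysis("tracts.shp", "burglary_lyr", "tracts_burg.shp",
                           "JOIN_ONE_TO_ONE", "KEEP_ALL")

# Rate per 1,000 housing units, using the Join_Count field created by the
# spatial join and a placeholder housing-unit field.
arcpy.AddField_management("tracts_burg.shp", "BURG_RATE", "DOUBLE")
arcpy.CalculateField_management("tracts_burg.shp", "BURG_RATE",
                                "float(!Join_Count!) / !HSE_UNITS! * 1000",
                                "PYTHON")
```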

The second part of the assignment introduced us to kernel density hotspot mapping, a technique commonly used to show clustering of point events. Although this technique does not show statistical significance, it does show hotspots in a visually pleasing manner. One has to be careful, however, to choose a reasonable bandwidth, as this controls the amount of smoothing. If the bandwidth is too large, there will be too much smoothing and you will lose detail in the map; if the bandwidth is too small, there will not be enough smoothing and the surface will appear noisy. Here, we wanted a kernel density map of auto thefts with a bandwidth of 0.5 miles and a cell size of 100 feet (the cell size needs to be much smaller than the bandwidth). As I wanted to use the average density for classification, I excluded any values of “0”, determined the mean density, and classified the data as multiples of the mean (2 * mean, 3 * mean, etc.) to create the hotspot map.
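Below is a rough sketch of how this step could be scripted with the Spatial Analyst extension; the dataset names are placeholders, and the 0.5-mile bandwidth is expressed as 2,640 feet on the assumption that the data are in a foot-based projected coordinate system.

```python
import arcpy
import numpy as np
from arcpy.sa import KernelDensity

arcpy.CheckOutExtension("Spatial")

# Kernel density of auto thefts: 100 ft cells, 0.5 mi (2,640 ft) search radius.
density = KernelDensity("auto_thefts.shp", "NONE", 100, 2640)
density.save("theft_density")

# Mean of the non-zero densities, used to set class breaks at multiples of the mean.
arr = arcpy.RasterToNumPyArray(density, nodata_to_value=0)
mean_density = arr[arr > 0].mean()
print("Mean non-zero density: %f" % mean_density)
print("Class breaks: %s" % [mean_density * m for m in (1, 2, 3)])
```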

The third part of the assignment was to create three hotspot maps using different techniques and to compare them. The first method was grid-based hotspot mapping. I performed a spatial join between the burglaries and grid layers, which created a count field showing the number of burglaries in each grid cell. I then manually selected the top 20% of cells (the top quintile) and dissolved them into a single polygon, which became my hotspot map. The second method was a kernel density hotspot map similar to that in the second part of the assignment, using the same 0.5-mile bandwidth. To classify the data, I excluded any areas with a value of “0” and used the Reclassify tool so that only areas greater than three times the mean density were displayed as part of the hotspot map. The third hotspot map was created using the local Moran’s I technique, which uses crime counts or rates aggregated by meaningful boundaries. I performed a spatial join between block groups and the 2007 burglaries and calculated the crime rate (number of burglaries per 1,000 housing units). I then ran the Local Moran’s I tool in ArcMap, which identified different types of spatial clusters. Because this is a hotspot map, I selected only the “high-high” clusters, meaning areas with a high incidence of burglaries adjacent to other areas with a high incidence of burglaries. To compare the three techniques, I created the map shown below, displaying all three hotspot maps in the same data frame (with varying levels of transparency so all are visible).
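For reference, here is a hedged sketch of the Local Moran’s I step (the Cluster and Outlier Analysis tool) and the selection of the high-high clusters; it assumes the block groups already carry the burglary-rate field, the layer and field names are placeholders, and the output cluster field is typically named COType.

```python
import arcpy

# Anselin Local Moran's I on the 2007 burglary rate (placeholder names).
arcpy.ClustersOutliers_stats("blockgroups_2007.shp", "BURG_RATE",
                             "burg_localmoran.shp",
                             "INVERSE_DISTANCE", "EUCLIDEAN_DISTANCE", "NONE")

# Keep only the high-high clusters as the hotspot, then dissolve into one polygon.
hh_where = "\"COType\" = 'HH'"   # shapefile field delimiters are double quotes
arcpy.MakeFeatureLayer_management("burg_localmoran.shp", "hh_lyr", hh_where)
arcpy.Dissolve_management("hh_lyr", "moran_hotspot.shp")
```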



Next we were to determine how reliably the hotspot maps predict crime. To do this, I used burglary data from 2008 (the hotspot maps were built from 2007 data). I determined the total area of each of the three hotspot maps and the number of 2008 burglaries within each hotspot area. From this, I calculated the percentage of 2008 burglaries falling within the 2007 hotspots and the crime density. The grid-based overlay method showed the highest percentage of 2008 burglaries in the 2007 hotspot area, but that hotspot was also spread over a larger area. The kernel density method captured a smaller percentage of 2008 burglaries, but its total hotspot area was much smaller and its crime density was higher, so in my opinion it is the better analysis for predicting future crime, at least with regard to deploying more officers to the area. There would be a greater number of officers in a smaller area (compared to the other hotspot maps), so response times would be shorter and backup would reach the scene more quickly, if necessary. The Local Moran’s I method showed both a lower percentage of 2008 burglaries and a lower crime density.
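To make the comparison concrete, here is a tiny sketch of the two metrics with made-up numbers; these values are purely illustrative and not the lab’s actual results.

```python
# Illustrative values only (not the actual 2008 counts or hotspot areas).
total_burglaries_2008 = 1500       # all 2008 burglaries in the study area
burglaries_in_hotspot = 450        # 2008 burglaries inside the 2007 hotspot
hotspot_area_sq_mi = 12.0          # area of the 2007 hotspot polygon

hit_rate = burglaries_in_hotspot / float(total_burglaries_2008) * 100
crime_density = burglaries_in_hotspot / hotspot_area_sq_mi

print("Percent of 2008 burglaries inside the 2007 hotspot: %.1f%%" % hit_rate)
print("Crime density: %.1f burglaries per square mile" % crime_density)
```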


This was really an interesting assignment. I have some experience with criminal justice courses, but have never worked with crime mapping or hotspot analysis at all. I found it interesting to learn about the various mapping methods and the pros and cons of each. I also found it helpful to think about which analysis is better in terms of where officers are needed and how quickly they could likely get to an incident.

Wednesday, June 24, 2015

Module 6 - Geoprocessing with Python

In this lab, we were to write a script that performs some geoprocessing functions. For me, being able to use the Search tool to look up the proper syntax for a geoprocessing function was very useful. In this script, I imported the relevant modules and set the workspace. This script first adds XY coordinates to the "hospitals" shapefile using the AddXY tool. It then uses the Buffer tool to create a 1000 meter buffer around the hospital features. After the buffer is created, it uses the Dissolve tool to dissolve the hospital buffers into a single feature. After each geoprocessing function was completed, I used the GetMessages() function to print out the messages from that tool. After that I went back through and commented the script to show which geoprocessing tool was being run. I also went back to the top of the script and used comments to add my name, date, the script's name, and a description of what the script does. Below is a screenshot of the results from the interactive window (created from the GetMessages() function).
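Here is a minimal sketch of what the script looked like conceptually; the workspace path and output names are placeholders, not the lab's actual file paths.

```python
import arcpy
from arcpy import env

# Placeholder workspace; the lab used its own folder structure.
env.workspace = r"C:\GIS\Module6"
env.overwriteOutput = True

# Add X and Y coordinate fields to the hospital points.
arcpy.AddXY_management("hospitals.shp")
print(arcpy.GetMessages())

# Create a 1000 meter buffer around each hospital.
arcpy.Buffer_analysis("hospitals.shp", "hospitals_buffer.shp", "1000 Meters")
print(arcpy.GetMessages())

# Dissolve the buffers into a single feature.
arcpy.Dissolve_management("hospitals_buffer.shp", "hospitals_dissolve.shp")
print(arcpy.GetMessages())
```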


This lab, while relatively short, was a good introduction to using Python to run various geoprocessing tools. Learning that the Desktop Help and the Search button in ArcMap will show us the proper syntax is very helpful and will be useful in future labs.

Monday, June 22, 2015

Lab 5 - Spatial Accessibility Modeling

This week’s exercise was on spatial accessibility modeling, and had three parts. In Part A we worked through a number of GIS tutorials learning the different types of network analysis using the Network Analyst extension. In Part B, we measured the straight-line distance from psychiatric hospitals in Georgia to different counties to determine who had access to those services. In Part C, we performed a network analysis determining the accessibility of community college campuses to potential students before and after a specific campus closed.

In Part A, I worked through the assigned tutorials (as well as the two optional extra ones mentioned in the lab assignment), and I really got a feel for how Network Analyst handles each type of analysis. Each tutorial focused on a different kind of problem: the best route, closest facility, service area, location-allocation, origin-destination (OD) cost matrix, and the vehicle routing problem. Working through these tutorials and viewing the ESRI video provided helped me learn not only the concepts but also how to apply them in ArcMap. I was especially impressed by how customizable the analysis can be, down to whether U-turns are allowed or the time of day, if I wanted to simulate traffic.

In Part B, we wanted to determine the accessibility of psychiatric hospitals to the population of Georgia. Using a spatial join, I determined the distance from each county to the nearest hospital. From there, I could set a reasonable distance from the nearest hospital to count as “accessible” and determine how many people lived within that area. Here, I wanted to create a cumulative distribution function (CDF) of the data. I opened the DBF file in Excel and began to manipulate the data. Deleting fields unnecessary for the analysis made the table less cumbersome to work with. Adding and manipulating the fields wasn’t so bad, but I am a bit rusty with creating graphs in Excel and it probably took more time than it should have. After creating a new field called cumulative percentage (the cumulative percentage of all the counties of Georgia as we moved through the table), I created a scatterplot of the distance to the nearest psychiatric hospital (in miles) vs. the cumulative percentage. I then converted the census tracts to centroids and performed a spatial join to determine the distance from each census tract to the nearest hospital. The end product was another CDF of accessibility to the hospitals, this one comparing accessibility for those over the age of 65 and those under the age of 65.
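The same cumulative-percentage calculation could also be done in a few lines of Python instead of Excel; this sketch uses made-up distances, with one value per county.

```python
# Miles from each county to the nearest psychiatric hospital (illustrative values).
distances = [5.2, 12.8, 33.1, 58.7, 14.0, 21.3, 9.6]

distances.sort()
n = float(len(distances))

# Each pair is one point on the CDF: distance vs. cumulative percent of counties
# whose nearest hospital is within that distance.
for i, d in enumerate(distances, start=1):
    print("%.1f miles -> %.1f%% of counties" % (d, i / n * 100))
```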

Part C was quite tricky. For this accessibility analysis, I wanted to consider two scenarios. One was the accessibility of a population to a total of 7 community college campuses. The other was for a total of 6 campuses after one of the original 7 had closed. We wanted to determine the impact the closure would have on the community college system. Using the Network Analyst extension, I created service areas of 5, 10, and 15 minutes from each campus for both scenarios. Below is my map of the service areas of the two scenarios side-by-side, with the campus to be closed labeled.



Next I used the closest facility analysis for both the 7-campus and 6-campus scenarios, using the campuses as the facilities and the block group centroids as the incidents. The main output here is the travel time from each centroid to the nearest campus. At this point, things became a little tricky, as we needed to create a FIPS field in order to join the tables. After joining the two tables, I was able to use Select By Attributes to determine the number of college-age residents within the service area boundaries, both before and after the closure of the campus. Then we wanted to answer how the closure affected travel times for those residents. Using another spatial join, I created a layer focusing only on students for whom the campus to be closed was the nearest, and identified which campus would be nearest to them after it closed. After opening the DBF file in Excel, I was able to determine the number of students for whom the closed campus was closest (and how long they needed to travel to it), and where and how far the closest campus was for those potential students after they were displaced by the closure. I then created a CDF comparing travel times before and after the closure. As expected, travel times are consistently longer after the closure of that campus.
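The tricky join step could be scripted roughly like this; the component field names (STATEFP, COUNTYFP, TRACTCE, BLKGRPCE) are assumptions about a typical block group schema, not necessarily the fields in the lab data.

```python
import arcpy

# Build a single FIPS key from the component codes so the two tables can be joined.
arcpy.AddField_management("blockgroups.shp", "FIPS", "TEXT", field_length=12)
arcpy.CalculateField_management(
    "blockgroups.shp", "FIPS",
    "!STATEFP! + !COUNTYFP! + !TRACTCE! + !BLKGRPCE!", "PYTHON")
```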


I learned a lot during this lab, and I enjoyed working with spatial accessibility metrics. The biggest difficulties of this lab were figuring out some of the processes required to answer some of the analysis questions. Once I was able to do that though, the calculations required came rather quickly. Another obstacle for me personally was creating the CDFs in Excel. I haven’t used Excel to graph often, so it took some time for me to arrange the extent of the data correctly. I enjoyed learning about the different accessibility analysis techniques, and I thought the tutorials and especially the ESRI video were excellent.

Tuesday, June 16, 2015

Module 5 - Geoprocessing in ArcGIS

This lab was about geoprocessing in ArcGIS. In it, we learned to create a toolbox and how to create tools using ModelBuilder and script tools. We also learned to export scripts from ModelBuilder and how to update a script derived from a model so that it will work on its own.
This module was really interesting to me because I have wanted more practice with ModelBuilder since we were introduced to it last semester. I created a new toolbox, which was a pretty straightforward process in ArcCatalog. Inside the toolbox, I created a new model, the end product of which would contain the soils within a basin not classified as “not prime farmland.” The model uses the Clip, Select, and Erase tools to accomplish this.
I set all the input and output variables as model parameters, which allows us to change the input features, output name, or file path if we choose. One thing to note, and something I feel is pretty easy to forget, is that all the input features need to be added to the current document for the model and script tools to work properly. We exported the model as a Python script, which is really useful because much of the scripting is then done for us, but some work remained to make the script run on its own. In this case, all we really needed to change was to explicitly define the full file path of each layer we were working with. It’s also important to allow ArcGIS to overwrite the output; otherwise the script will stop with an error. Using this process, I created a shapefile showing the soils not classified as “not prime farmland.” A screenshot of the shapefile is below.
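Here is a rough sketch of what the edited stand-alone script might look like, with full file paths made explicit and output overwriting enabled; the paths, file names, and the farmland field/value are placeholders rather than the lab’s actual data.

```python
import arcpy

# Allow outputs to be overwritten so the script can be rerun without errors.
arcpy.env.overwriteOutput = True

# Placeholder full file paths (the exported script originally used layer names).
soils = r"C:\GIS\Module5\Data\soils.shp"
basin = r"C:\GIS\Module5\Data\basin.shp"
soils_clip = r"C:\GIS\Module5\Results\soils_clip.shp"
not_prime = r"C:\GIS\Module5\Results\not_prime.shp"
final = r"C:\GIS\Module5\Results\soils_final.shp"

# Clip the soils to the basin boundary.
arcpy.Clip_analysis(soils, basin, soils_clip)

# Select the soils classified as "not prime farmland" (placeholder field name).
arcpy.Select_analysis(soils_clip, not_prime,
                      "\"FARMLNDCL\" = 'Not prime farmland'")

# Erase those polygons, leaving only the soils not classified as "not prime farmland".
arcpy.Erase_analysis(soils_clip, not_prime, final)
```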



Next we learned how to share a toolbox. I selected both the toolbox and the scripts folder and sent them to a new zipped folder.

This lab, while not very long, was very informative and full of useful material. I really liked how easy it was to export the model to a script, and to create the model itself. I also really liked how complete the script was; it even had some comments to describe what the script was doing. I learned quite a bit in this lab and hope to work with some of this in the future.

Sunday, June 14, 2015

Lab 4 - Visibility Analysis

This lab was very lengthy but we learned several new techniques around the overall theme of visibility analysis. We learned to perform viewshed and line of sight (LOS) analyses using elevation models. We worked with LIDAR data and prepared it for use in visibility analysis. We created and investigated profile graphs for LOS analysis. We also learned to adjust various parameters used in visibility analysis, such as offset and various viewing angles.

In the first part of this assignment, our objective was to perform a visibility analysis for three potential fire tower locations. We created a viewshed raster and an observer raster and investigated the differences. The viewshed shows the areas visible from a specific observation point or points, whereas the observer raster shows exactly which observer points are visible from each raster cell, which is a somewhat confusing concept at first.

For the second part of the assignment, our objective was to create a map of visibility from the roads in Yellowstone National Park. Fortunately, the visibility raster was provided for us. It identifies areas not seen from any road and areas seen from one or several locations along the roads. After creating the map, a qualitative comparison of the topography and visibility shows that the higher-visibility areas coincided with either areas of higher elevation or areas very near the roads, which makes sense. My map is found below.



The third part of the assignment introduced us to LIDAR and LOS analysis. Here, we wanted to examine the viewshed from camera locations and place new cameras for optimal coverage at the site of the Boston Marathon finish line. First, we added the LIDAR image of the area and created a DEM using the LIDAR data as the input. After adding the cameras and an orthographic image of the Boston Marathon site, we performed a viewshed analysis. We had to modify the offset and viewing angle to be more realistic (the default offset is “0”, which is ground level, and the default viewing angle includes all angles). Determining the placement of two new cameras took some trial and error: I would try a location, run a visibility analysis, and move the camera if I didn’t like the result. I wanted to cover the street, the sidewalk areas, and the buildings immediately surrounding the finish line with three security cameras. Camera 1 is placed really well, as it views most of the length of the street, but there are several areas, mainly near sidewalks, that it misses. For Camera 2, I decided to place it at a height of 80 feet, on the corner of a building located on the northeast corner of the intersection just to the northwest of the finish line. It faces in such a way that it covers a viewing angle of 90° to 180°, or east to south. The height of this building is approximately 180 feet, but I was concerned that placing the camera that high would make it hard to see things at ground level, such as behind vegetation or on sidewalk areas under buildings, so I placed it at 80 feet, at what looks like the corner of the building. There appears to be a ledge there, so it should be fairly easy to mount. Camera 2 captures the area east to south of that location, and it gets a good view of the sidewalk side of the vegetation, which is what I was having trouble with when trying other locations. Camera 3 is the camera placed furthest from the finish line, on the corner of a building across the intersection east-northeast of the finish line. It is at a height of 100 feet, also on the corner of a building, covering a viewing angle from 180° to 270°, or south to west. My main purpose for placing it here was to capture both intersections immediately next to the finish line, and this camera does that well while also capturing the finish line itself. The blue areas show coverage by one camera, green by two cameras, and red by all three cameras. All three cameras capture the finish line and the area immediately surrounding it. Below is a screenshot of my camera locations and visibility analysis.
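For reference, here is a hedged sketch of how the camera parameters can be supplied through the observer fields that the Viewshed tool reads (OFFSETA for height, AZIMUTH1/AZIMUTH2 for the horizontal viewing angle); the layer names and values are illustrative, not my exact settings.

```python
import arcpy
from arcpy.sa import Viewshed

arcpy.CheckOutExtension("Spatial")

cameras = "cameras.shp"          # placeholder observer point layer
existing = [f.name for f in arcpy.ListFields(cameras)]

# Add the controlling fields if they are not already present.
for field in ("OFFSETA", "AZIMUTH1", "AZIMUTH2"):
    if field not in existing:
        arcpy.AddField_management(cameras, field, "DOUBLE")

# Example values: an 80 ft camera height looking from east (90) to south (180).
with arcpy.da.UpdateCursor(cameras, ["OFFSETA", "AZIMUTH1", "AZIMUTH2"]) as rows:
    for row in rows:
        row[0], row[1], row[2] = 80, 90, 180
        rows.updateRow(row)

# Run the viewshed against the LIDAR-derived DEM (placeholder raster name).
viewshed = Viewshed("lidar_dem", cameras)
viewshed.save("camera_viewshed")
```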



The final part of this lab was a line of sight analysis using the Create Line of Sight tool in the 3D Analyst toolbar of ArcMap. This is a really neat feature that quickly gives point-to-point line of sight analyses, and it can work with multiple points at once. Our data contained all 15 summits, and we created an LOS analysis for all of them at once, which created an LOS path from each summit to every other summit. We then compared this LOS output to the viewshed output for a particular summit.

I really learned a lot in this lab. I had some issues, seemingly where others did as well, around the last couple of questions of Part A. I’m still a little iffy on the difference between what the viewshed raster and the observer raster represent, and finding which summits can be seen from a specific peak was confusing because the layers did not define the peaks with a consistent ID value, so there was a lot of visual comparison between the two on that section. Overall though, I learned a lot from this lab and really enjoyed working with the various methods of creating a visibility analysis.

Wednesday, June 10, 2015

Lab 4 - Debugging and Error Handling

This lab was about learning to debug and handle errors in Python code. We also got some experience with the debugger in PythonWin. I think it’s really helpful to learn some of this before writing much more complicated code. In this lab, we had three scripts in which we had to handle errors in different ways.
Part I was basically finding and correcting errors so that the code would print out the field names of the “parks” shapefile. In this case, we knew we had only two errors, which of course we won’t know when writing our own code. Below is a screenshot of the output of the first corrected script.



The second script had a total of eight errors, most of which were pretty easy for me to spot. This script was supposed to print the layers in a data frame. It had several errors, some of them in commands we have not used yet, but using a search engine to look up the command in question (mainly to see what its syntax normally looks like) was very helpful. Below is a screenshot of my output.



The third part was trickier. We were to bypass an error using a try-except statement, so that Part A returns an error message but Part B still runs correctly. Figuring out where to place the try-except statement was mainly trial and error for me; I imagine I will get faster at it with more practice. Below is a screenshot of my output.
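Here is a minimal sketch of the try-except structure; the failing tool call is just an illustration and not the actual error from the lab script.

```python
import arcpy

try:
    # Part A: this call fails because the input does not exist (illustrative error).
    arcpy.Buffer_analysis("does_not_exist.shp", "out_buffer.shp", "1000 Meters")
except Exception as e:
    # Report the error message instead of letting the script crash.
    print("Part A produced an error: %s" % e)

# Part B still runs because the exception above was caught.
print("Part B runs correctly.")
```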




I feel having some prior experience with debugging helped a lot. The new concepts to me were really learning how to use the debugger tool for Python and using the try-except statements. I feel this was a good lab to help us get comfortable with the types of errors we will see in future labs.

Sunday, June 7, 2015

Participation Assignment #1 -

When looking through the UWF library, I found an article about using GIS to establish relationships between precipitation and terrain data. The article was written by Hong Haoyuan from the Jiangxi Meteorological Observatory in China. The objective of the study I chose was to use GIS and a regression equation to provide better support for flood disaster prevention.
The authors obtained DEM, slope, slope aspect, and water system data. Most of the data came from the Jiangxi Meteorological Bureau; the precipitation data came from observations at several meteorological stations on the days before, during, and after the rainstorm event.
The authors used a multiple linear regression model based on the least squares method. They determined that the greatest impacts on the water system were due to rainfall, slope, altitude, and slope aspect (the direction the slope faces). Using the analysis tools of ArcGIS, they examined the distribution characteristics of these criteria and found that the factors most related to flood disasters in this region are the slope and slope aspect of the terrain. The province discussed in the study has flat plains in the center and more mountainous regions in both the western and eastern sections. This region contains Poyang Lake, the largest freshwater lake in China, and the precipitation amount showed a close correlation with proximity to the lake.
Also using the tools in ArcGIS, the authors mapped the cumulative rainfall distribution of the event at a spatial resolution of 30 m x 30 m. Based on the spatial distribution of precipitation, with the maximum rainfall near the areas of steeper slopes and near the large lake, they determined that orographic lifting was an important contributor to the rainstorm event. Orographic lifting is air forced upward by terrain, causing it to cool, become saturated, and produce rain. In addition to greater precipitation near the large lake, there were also precipitation maxima just off some of the steeper slopes, supporting the authors' idea that orographic lifting was a major contributor to this rainstorm.
The authors used observational data from weather monitoring stations to determine which factors were the most important to the spatial distribution of rainfall and established a regression model to examine this. Using ArcGIS software, they analyzed the spatial distribution of rainfall using those criteria. Using their model, the modeled rainfall and actual rainfall differed by only 1.358%, so GIS showed that using their criteria produced an accurate representation of rainfall distribution in this case.
I chose this article because this is an interesting blend of GIS and meteorological concepts. I found it really interesting to see some of the meteorological concepts being mapped out using some of the GIS analysis techniques that we have recently learned. I feel that GIS isn’t used enough in meteorology, and I love finding interesting articles that blend the two together.


Lab 3 - Watershed Analysis

This lab dealt with some more complicated concepts in watershed analysis. We were introduced to the procedures necessary to delineate streams and watersheds from a digital elevation model (DEM), which was provided to us but originally obtained from the National Elevation Dataset (NED) through the Seamless Data Distribution System of the USGS. The first step in preparing the DEM was to fill the sinks. A sink is a depression with no visible outlet; filling the sinks allows water to flow properly throughout the area we are investigating. After filling the sinks, we created a flow direction raster, which uses the D8 algorithm to determine the direction of flow. From the flow direction raster, we were able to create a flow accumulation raster, which calculates the accumulated weight of all cells flowing into each downstream cell. Cells with higher values are cells where flow is concentrated and can be used to identify streams. From the flow accumulation raster, we created a raster of the stream network based on a threshold, which is the minimum number of upstream cells that must flow into a cell before it is considered part of a stream. Part of this lab was changing the threshold value and determining which value gave the best results when comparing our modeled stream network to the actual stream network. We converted our streams to features and, from there, created a stream link and stream order raster. I found the stream order raster particularly interesting. Using the Strahler method, the stream order increases only when two streams of the same order meet. In the Shreve method, stream magnitudes are additive, so the order of a downstream segment is the sum of the magnitudes of the merging streams, irrespective of their individual orders. In our lab we used the Strahler method, but I feel that either gives a good idea of the rate of flow or size of the stream, even though this doesn't seem to be quantifiable from this tool alone.
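Below is a hedged sketch of that sequence using the Spatial Analyst tools; the dataset names and the threshold value are placeholders, not the lab's actual inputs.

```python
import arcpy
from arcpy.sa import (Fill, FlowDirection, FlowAccumulation, Con,
                      StreamLink, StreamOrder)

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\GIS\Lab3"    # placeholder workspace

filled = Fill("elevation_dem")          # fill sinks in the DEM
flowdir = FlowDirection(filled)         # D8 flow direction
flowacc = FlowAccumulation(flowdir)     # upstream cells draining to each cell

# Cells whose flow accumulation exceeds the threshold become stream cells.
threshold = 200
streams = Con(flowacc > threshold, 1)

links = StreamLink(streams, flowdir)                 # unique ID per stream segment
order = StreamOrder(streams, flowdir, "STRAHLER")    # Strahler stream order
order.save("stream_order")
```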

Next we worked with watersheds and pour points. The term "pour point" was a little confusing to me at first; I kept thinking of it as an inflow. It is essentially where water "pours" out of the stream system, often into a larger body of water. To determine a pour point, we added a new feature class and digitized a pour point where the stream drained out to the ocean. Using the Watershed tool with the flow direction raster and our pour point as inputs, we delineated the watershed containing the area that drains to that pour point. We also learned to snap a pour point to a stream if the pour point does not fall exactly on a stream cell.
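Continuing the sketch above, snapping the pour point and delineating the watershed could look roughly like this; the snap distance and names are again placeholders.

```python
from arcpy.sa import SnapPourPoint, Watershed

# Snap the digitized pour point to the highest-accumulation cell within 100 map
# units so it sits exactly on a modeled stream cell.
snapped = SnapPourPoint("pour_point.shp", flowacc, 100)

# Delineate the area that drains to the snapped pour point.
basin = Watershed(flowdir, snapped)
basin.save("modeled_watershed")
```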

For our final product, we were to select an existing watershed with only one pour point. We needed to show a map displaying a comparison of the modeled and existing streams, as well as a comparison of the existing and modeled watersheds. I chose the Lumahai River watershed. I outlined the borders of the existing watersheds in red. I displayed the existing watershed in a darker blue and placed the modeled watershed on top to show any discrepancies between the two. The modeled streams are shown in orange so they stand out from the blue existing streams. The inset shows the larger extent of the island with the Lumahai River watershed highlighted. I found it interesting that using hillshade as a separate raster looks much better than using the hillshade effect within the symbology window of the DEM. Some of the concepts this week took me quite a bit of time to really understand, and I feel a lot more is possible within watershed analysis, but I learned some very useful techniques, especially for when I am working with data at the National Weather Service and the hydrology department, as I should be soon.


Monday, June 1, 2015

Module 3: Python Fundamentals Part II

This week we learned some more about the fundamentals of Python. We learned how to import modules into a script, find and correct errors in the code, create loops and conditional statements, add comments, and how to iterate variables within loops.

For this assignment, we were provided a partially written script that we needed to complete. The objective was to create a small dice-rolling game based on each player's name length, then create a list containing 20 numbers. We needed to create a conditional statement with a nested while loop that removed all instances of a specified number from that list.

First, we added a line importing the random module. Then we were directed to a couple of lines of code with errors that we needed to find and correct. Then the fun part of writing new code began. The next block of code needed a loop that added 20 random integers between 0 and 10 to a list. I started with a counter and an empty list and used a while loop to accomplish this. The next block of code took some more thought. I wanted to remove a specific number anywhere it appeared in the list, so I created a new variable and assigned the integer I wanted to remove to it. Using a conditional statement and the .count() method, I determined whether that specific number was in the list. If it was, I used a while loop to find how many times it appeared and removed each instance from the list. Throughout the writing of the program, we were encouraged to use comments, as they not only tell who wrote the script and when, but also help describe in plain English what a line or block of code is supposed to accomplish. The screenshot below shows the output of my program: each player's dice roll and whether they won or lost, the 20-integer list, how many times my specific number was removed, and the list after that integer has been removed.
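Here is a small sketch of the list-building and removal logic described above; the variable names and the number being removed are illustrative rather than the exact ones from my script.

```python
import random

# Build a list of 20 random integers between 0 and 10 using a while loop.
dice_list = []
count = 0
while count < 20:
    dice_list.append(random.randint(0, 10))
    count += 1
print(dice_list)

# Remove every instance of a chosen number, if it appears in the list.
unlucky = 7
times_found = dice_list.count(unlucky)
if times_found > 0:
    print("%d appears %d times; removing it." % (unlucky, times_found))
    while unlucky in dice_list:
        dice_list.remove(unlucky)
else:
    print("%d is not in the list." % unlucky)

print(dice_list)
```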

I really enjoyed working on this lab. It was good to start writing some simple scripts while practicing importing modules, using conditional statements and loops, and using comments within the script. I think I may have had an easier time than some due to previous programming experience, but this was an excellent exercise and I still learned a lot about Python specifically. I'm still used to code complaining at me when I leave an open-ended conditional statement (one without an endif), but I am enjoying learning Python and I like it much better than what I have worked with before. I enjoyed working on this script and look forward to writing more.