Since the model needs more variables, I ran another analysis using three: Population, Low Education, and Distance to Urban Centers. This model shows that an increase in population or in low education leads to an increase in 911 calls, and that a decrease in the distance from urban centers also leads to an increase in 911 calls. The Jarque-Bera statistic is not statistically significant, which means that the model residuals are normally distributed and the model is not biased by a missing key explanatory variable. The VIF for all three explanatory variables is between 1 and 2, well below the 7.5 threshold for redundancy, so the variables are not redundant (I’m not using too many variables). Based on the adjusted R-squared value, 74% of the variation in the number of 911 calls can be explained by population, low education, and distance from urban centers. This is a good model, but how do I know it's the best?
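For anyone curious what is behind the three diagnostics mentioned above, here is a minimal sketch of the textbook formulas in plain Python. It is not the code the regression tool runs; the function names are my own, and the inputs (an R-squared value, a list of residuals) are assumed to come from a model that has already been fitted.

```python
import math

def adjusted_r_squared(r2, n_obs, n_explanatory):
    """Penalize R-squared for the number of explanatory variables used."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_explanatory - 1)

def vif(r2_aux):
    """Variance inflation factor for one explanatory variable.

    r2_aux is the R-squared from regressing that variable on the
    other explanatory variables (an auxiliary regression).
    """
    return 1.0 / (1.0 - r2_aux)

def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its approximate p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # JB is asymptotically chi-square with 2 degrees of freedom,
    # whose upper-tail probability is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value
```

A large p-value from `jarque_bera` corresponds to the "not statistically significant" result described above, and a `vif` value above 7.5 would flag a redundant variable. The p-value is an asymptotic approximation, so it is rough for small samples.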
Part D investigates how to determine the best model. This is done using the Exploratory Regression tool, which runs a regression analysis on every combination of the selected explanatory variables so that we can compare the resulting statistics and pick the best model. In this case, the best model used 4 explanatory variables: Jobs, Low Education, Distance to Urban Centers, and Alcohol. Three of the 4 variables have a positive relationship with 911 calls: jobs, low education, and alcohol. The distance to urban centers has a negative relationship with 911 calls. The performance of the model is judged partly by the adjusted R-squared value, which measures how much of the variation in the dependent variable can be explained by the explanatory variables. I also looked at the VIF, where values above 7.5 mean that variables are redundant. The Jarque-Bera statistic tests whether or not the residuals are normally distributed. If they are, that is one sign that the model is properly specified. Another check is the Spatial Autocorrelation (Global Moran's I) tool, which reports whether the residuals are clustered or randomly distributed. If the residuals are clustered, the model is probably missing at least one key explanatory variable.
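To make the "all combinations" idea concrete, here is a rough sketch in plain Python. It is only an illustration of the approach, not how the Exploratory Regression tool is implemented: it ranks models by adjusted R-squared alone, whereas the tool also screens on VIF, Jarque-Bera, and spatial autocorrelation. The function names and the toy numbers are made up for the example.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def adjusted_r2(columns, y):
    """Fit OLS with an intercept and return the adjusted R-squared."""
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = k + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def exploratory_regression(candidates, y, max_vars):
    """Score every combination of up to max_vars candidate variables."""
    results = []
    for size in range(1, max_vars + 1):
        for names in combinations(sorted(candidates), size):
            score = adjusted_r2([candidates[nm] for nm in names], y)
            results.append((score, names))
    return sorted(results, reverse=True)  # best adjusted R-squared first

# Toy data (invented): y depends on a and b only; c is unrelated.
candidates = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "b": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0],
    "c": [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0],
}
y = [6.0, 11.0, 4.0, 9.0, 2.0, 7.0, 0.0, 5.0]
ranked = exploratory_regression(candidates, y, max_vars=3)
```

Each entry in `ranked` is a pair of adjusted R-squared and the variable names used, so the first entry is the top-scoring combination. With real data the number of combinations grows quickly, which is why the tool lets you cap the number of explanatory variables per model.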