* This problem is described in the file named "Moran's I in Stata.pdf".
* The user-written Stata commands spatwmat and spatgsa must be installed
* on your computer before running this file.
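* A minimal sketch of an installation check: "which" looks a command up on
* the ado-path and returns a nonzero code if it is not found.  Suggesting
* findit as the way to locate the package is an assumption, not part of the
* original instructions.
foreach cmd in spatwmat spatgsa {
    capture which `cmd'
    if _rc {
        display as error "`cmd' not found; try typing: findit `cmd'"
    }
}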

use http://www.ats.ucla.edu/stat/stata/faq/ozone.dta, clear

summarize lat lon

* Based on the minimum and maximum values of these variables, we can calculate
* an upper bound on the Euclidean distance between any two points in this
* dataset: the diagonal of the bounding box around all of the locations.

display sqrt((34.69012 - 33.6275)^2 + (-116.2339 - -118.5347)^2)
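* The same bound can be computed from the stored results of summarize rather
* than typed in by hand.  This is a sketch: the local macro names latrange
* and lonrange are hypothetical, and summarize is run once per variable
* because r(min) and r(max) describe only the last variable summarized.
quietly summarize lat
local latrange = r(max) - r(min)
quietly summarize lon
local lonrange = r(max) - r(min)
display sqrt(`latrange'^2 + `lonrange'^2)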

* Knowing this maximum distance between two points in our data, we can 
* generate a matrix based on the distances between points.  In the spatwmat
* command, we name the weights matrix to be generated, indicate which of our
* variables are the x- and y-coordinate variables, and provide a range of
* distance values that are of interest in the band option.  All of the 
* distances are of interest in this example, so we create a band with an
* upper bound greater than our largest possible distance.  If we did not
* care about distances greater than 2, we could indicate this in the band
* option.  

spatwmat, name(ozoneweights) xcoord(lon) ycoord(lat) band(0 3)

* As described in the output, the command above generated a matrix with 32
* rows and 32 columns because our data include 32 locations.  Each
* off-diagonal entry (i,j) in the matrix is equal to 1/(distance between
* point i and point j).  Thus, the matrix entries for pairs of points that are
* close together are higher than for pairs of points that are far apart.  If
* you wish to look at the matrix, you can display it with the "matrix list"
* command.  With our matrix of weights, we can now calculate Moran's I and do
* hypothesis testing.  The null hypothesis is that there is no spatial
* autocorrelation between the ozone measurements; the alternative hypothesis
* is that there is some spatial autocorrelation between them.
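* As a sketch of the "matrix list" step mentioned above, the lines below copy
* the first 5 rows and columns into a scratch matrix and display only that
* corner, since the full 32 x 32 listing is long.  The name W5 is a
* hypothetical choice.
matrix W5 = ozoneweights[1..5, 1..5]
matrix list W5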

spatgsa av8top, weights(ozoneweights) moran

* Based on these results, we can reject the null hypothesis that there is
* zero global spatial autocorrelation present in the variable av8top. 

* Variations

* BINARY MATRIX: If there exists some threshold distance d such that pairs
* with distances less than d are neighbors and pairs with distances greater
* than d are not, you can create a binary neighbors matrix with the
* spatwmat command (indicating "bin" and setting band to have an upper bound
* of d) and use this weights matrix for calculating Moran's I.  We could do this
* for d = 1:

spatwmat, name(ozoneweights) xcoord(lon) ycoord(lat) band(0 1) bin

* Using this binary weight matrix we can again calculate Moran's I to test
* for the presence of spatial autocorrelation among the defined "neighbors."
* In this example, the binary formulation of distance yields a similar result.
* We can reject the null hypothesis that there is zero spatial autocorrelation
* present in the variable av8top at alpha = 0.05.

spatgsa av8top, weights(ozoneweights) moran

* USING AN EXISTING MATRIX: If you have calculated a weights matrix according
* to some metric other than those available in spatwmat and wish to use it in
* calculating Moran's I, spatwmat allows you to read in a Stata dataset of
* the required dimensions and format it as a weights matrix that can be
* used by spatgsa.  If altweights.dta is a dataset with 32 columns and 32
* rows, it could be converted to a weights matrix "aweights" to be used in
* analyzing av8top:
* spatwmat using "C:\altweights.dta", name(aweights)
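* Once aweights exists, Moran's I could be computed exactly as before.  This
* line is a sketch and is left commented out because altweights.dta is a
* hypothetical file that is not supplied with this example:
* spatgsa av8top, weights(aweights) moran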