Spatial joins

Analysing sets of geolocated data with the help of Spatial Joins
Spatial joins refer to queries you can make between two sets of geolocated data. The spatial join function in mWater allows users to make these queries based on the distance between various entities.  These can be between water points and communities, schools and surveys, health facilities and hand washing facilities, pipes and households, boreholes and shapefiles, to name a few.

Spatial joins allow users to make analyses between geolocated sets of data
Spatial joins allow users to make analyses between geolocated sets of data
With these spatial queries you can for example: 

  • Count how many taps are within 100m of a water kiosk.
  • Show a list of water points within 500m of a selected school.
  • Calculate the amount of households your survey has visited within 1km of a community.
  • See if a borehole is within an area of low, moderate or good ground water potential.
  • Build a map of all the health facilities which don't have any other health facilities within 10km mapped in mWater. 

and much more.

In this guide we'll cover how spatial joins work, and look at a few case studies in detail. 

Note: You can also do geospatial queries on population data within the system. Learn more here.
Explore a demonstration dashboard here
Contents
This feature has been funded by USAID’s WATSAN project, implemented by DAI
A quick demonstration:
You can learn more about creating dashboards here.

Building a spatial join query

You can use the spatial join expression across mWater's data analysis features, including in dashboard widgets such as tables, charts, pivot tables, as well as maps and datagrids. 

To build a spatial data query, you need two data sources which both have GPS location data as part of them. This can mean a survey with a location or site question, Sites, indicators with GPS data, etc. 

Once you have selected which two geolocated data sources you want to query, and the kind of analysis you want to build (table, chart, etc), you can build the spatial join query with these basic steps:

1) Select the first data source as the source of the Widget, Map or Table.
2) Select the Spatial Join formula from the list of available expressions.
3) Configure the spatial join to point at the second data source, and set the distance in meters that you are interested in.


1. Pick a data source with GPS
1. Pick a data source with GPS
2. Select the formula Spatial Join
2. Select the formula Spatial Join
3. Configure the spatial join to point to the second data source. This example will create a list of schools and show the *number* of water points within 100m of each school.
3. Configure the spatial join to point to the second data source. This example will create a list of schools and show the *number* of water points within 100m of each school.
The key step to get the result you want is to configure the spatial join expression correctly, so let's look at the configuration options in detail. 

The data sources you are querying between are stored as tables within mWater. The data source you have selected for your widget or table is the From Table (source) of the join, and the second data source you want to query is the To Table (target). 
For the source table, select the appropriate GPS Location for the Location field. You need to manually select this because for example a survey can have any number of location questions so the system expects you to specify the exact one. 
For the target table you need to first select the data source and then specify the Location field for that. 
The Calculated Value field is critical in giving your query the value you like. The most common option is to count the number of entries of the target table (e.g. the number of water points) within a given distance (100m) of each entry in the source table (schools). 
However, you can also build more complex calculations as fit your needs. For example if you wanted to find out the average depth of boreholes within 100m of schools, or the total amount of subscribers, you could can build that expression here. We'll look at more examples in case studies below. 
Next, you can Filter the target table to narrow down the query. If the target table's data source is the water point site table, you could filter that table down to only boreholes, for example, or to only one country or district. 
You also need to set the distance for the query. A value of 100 means that the query will join all rows between the source table and the target table which are 100 meters or less from each other. 
You can optionally choose to Aggregate results, which combines the source table results together and doesn't duplicate any values. Let's look at that option in detail next.

Aggregating results

Note: Aggregate results have been temporarily disabled due to an issue in PostGIS

The aggregate results option combines all distance queries together and includes each result calculation only once. Select this option when you want to accurately aggregate a calculation across a level beyond a single point, such as a district, region, or utility without duplication. 

In the below example, we see a spatial join done between water kiosks and households that are within 200m of those kiosks. When we don't check the aggregate results option, our pivot table counts a total of 50 households as matching that query. This is because the query looks at each kiosk and counts how many households are within the specified distance. 
With the aggregate results box checked, we get a total of 41 households. This is because within the queried area, 41 households are within 200m of ANY water point in the source we have chosen. So the households which fall within 200m of two or more kiosks are only counted once. 
You can explore this example here.

Example - Kiosk subscribers


In Haiti utilities are using this feature to find out keep track of how many customers they have within a given distance of a water point. Combined with the population query feature this also allows the utilities to estimate the future potential customer base by subtracting existing customers from the estimated population around a water kiosk. 

Note: The population count is divided by the average number of people per household to get an Est. # of households within 100m of each kiosk.

In this approach, customers are tracked as water points classified as taps to the yard or dwelling. Then it is possible to do a spatial join by filtering the source table to the kiosks, and filtering the target table (also water points) to only taps in yards or dwellings.
The spatial join feature allows utilities to calculate the amount of customers within a distance from a water kiosk
The spatial join feature allows utilities to calculate the amount of customers within a distance from a water kiosk
The target table is filtered to only the relevant domestic water points
The target table is filtered to only the relevant domestic water points
View this example here

Example - Groundwater potential

Spatial joins can also be done with custom shapefiles which you have imported into the platform. This opens many possibilities of geospatial analysis, for example of electrical grids, lithology, or ground water potential. 

In this example we have imported a shapefile that contains regions categorized by ground water potential. On top of this layer, we are mapping water points and ranking them on the basis of what shapefile area they are overlapping. 

The key steps to take into account in creating the map layer is to set the shapefile as the target table and the distance to 0 meters (the distance is 0 meters when the point data overlaps with the shapefile polygon exactly). The value that the spatial join should return depends on what attribute of the shapefile you want to analyze - in this example the shapefile has a value for total groundwater potential for each region in it (Total gw_pot) which we bring in and then color code. 

The details of the water point layer configuration depend on the shapefile you have imported:
You can view the example dashboard here

Example - A map of solitary health care facilities

You can customize maps in mWater to for example filter a layer down to display only those things which meet a certain spatial join criteria. In this example, we've created a map that shows only solitary health care facilities, meaning ones that are at least 10km away from any other mapped health care facility. 

The key step is to create the spatial join in the filter of the layer, make sure the target table is the same as the source table, so in this case Health facility for both, and then set the operator of the filter to is l. Why is this? The spatial join returns the number of health care facilities which are less than 10,000 meters away. Because any geolocation is by definition less than 10,000 meters away from itself, the spatial join will always return at least the value 1 for each health facility. 

This same filtering principle can be applied to any data you are mapping. 
The spatial join in this case returns a numerical value. We want it to be 1 for those health care facilities which have only themselves within 10,000 meters of their own position.
The spatial join in this case returns a numerical value. We want it to be 1 for those health care facilities which have only themselves within 10,000 meters of their own position.
We can compare this map to a view of all mapped health care facilities to see that they are indeed being excluded by our filter:
All mapped health care facilities
All mapped health care facilities
Explore the map here.
These are just some of the possible use cases - remember that any data you have added to or can access in the platform which has GPS locations included can be part of a spatial join.

Questions or comments on this feature? Get in touch via info@mwater.co
Please feel free to get in touch if you have suggestions or want to share a use case for this feature which might benefit other users.
Updated: 2021-2-4