





















































In this article by Michael Diener, author of the Python Geospatial Analysis Cookbook, we will cover the following topics:
(For more resources related to this topic, see here.)
Geospatial data comes in hundreds of formats and massaging this data from one format to another is a simple task. The ability to convert data types, such as rasters or vectors, belongs to data wrangling tasks that are involved in geospatial analysis. Here is an example of a raster and vector dataset so you can see what I am talking about:
Source: Michael Diener drawing
The best practice is to run analysis functions or models on data stored in a common format, such as a Postgresql PostGIS database or a set of Shapefiles, in a common coordinate system. For example, running analysis on input data stored in multiple formats is also possible, but you can expect to find the devil in the details of your results if something goes wrong or your results are not what you expect.
This article looks at some common data formats and demonstrates how to move these formats around from one to another with the most common tools.
The simplest way to transform data from one format to another is to directly use the ogr2ogr tool that comes with the installation of GDAL. This powerful tool can convert over 200 geospatial formats. In this solution, we will execute the ogr2ogr utility from within a Python script to execute generic vector data conversions. The python code is, therefore, used to execute this command-line tool and pass around variables that are needed to create your own scripts for data imports or exports.
The use of this tool is also recommended if you are not really interested in coding too much and simply want to get the job done to move your data. A pure python solution is, of course, possible but it is definitely targeted more at developers (or python purists).
To run this script, you will need the GDAL utility application installed on your system. Windows users can visit OSGeo4W (http://trac.osgeo.org/osgeo4w) and download the 32-bit or 64-bit Windows installer as follows:
For Ubuntu Linux users, use the following steps for installation:
$ sudo apt-get install gdal-bin
Sudo su createuser –U postgres –P pluto
Setting up your Postgresql database with the PostGIS extension in Windows is the same as setting it up in Ubuntu Linux. Perform the following steps to do this:
Createuser.exe –U postgres –P pluto
$ sudo su createdb –O pluto –U postgres py_geoan_cb
createdb.exe –O pluto –U postgres py_geoan_cb
psql –U postgres -d py_geoan_cb -c "CREATE EXTENSION postgis;"
psql.exe –U postgres –d py_geoan_cb –c "CREATE EXTENSION postgis;"
Lastly, create a schema called geodata to store the new spatial table. It is common to store spatial data in another schema outside the Postgresql default schema, public. Create the schema as follows:
sudo -u postgres psql -d py_geoan_cb -c "CREATE SCHEMA geodata
AUTHORIZATION pluto;"
psql.exe –U postgres –d py_geoan_cb –c "CREATE SCHEMA geodata
AUTHORIZATION pluto;"
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import subprocess
# database options
db_schema = "SCHEMA=geodata"
overwrite_option = "OVERWRITE=YES"
geom_type = "MULTILINESTRING"
output_format = "PostgreSQL"
# database connection string
db_connection = """PG:host=localhost port=5432
user=pluto dbname=py_test password=stars"""
# input shapefile
input_shp = "../geodata/bikeways.shp"
# call ogr2ogr from python
subprocess.call(["ogr2ogr","-lco", db_schema, "-lco", overwrite_option, "-nlt", geom_type, "-f", output_format, db_connection, input_shp])
$ python ch03-01_shp2pg.py
We begin with importing the standard python module subprocess that will call the ogr2ogr command-line tool. Next, we'll set a range of variables that are used as input arguments and various options for ogr2ogr to execute.
Starting with the SCHEMA=geodata Postgresql database, we'll set a nondefault database schema for the destination of our new table. It is best practice to store your spatial data tables in a separate schema outside the "public" schema, which is set as the default. This practice will make backups and restores much easier and keep your database better organized.
Next, we'll create a overwrite_option variable that's set to "yes" so that we can overwrite any table with the same name when its created. This is helpful when you want to completely replace the table with new data, otherwise, it is recommended to use the -append option. We'll also specify the geometry type because, sometimes, ogr2ogr does not always guess the correct geometry type of our Shapefile so setting this value saves you any worry.
Now, setting our output_format variable with the PostgreSQL keyword tells ogr2ogr that we want to output data into a Postgresql database. This is then followed by the db_connection variable, which specifies our database connection information. Do not forget that the database must already exist along with the "geodata" schema, otherwise, we will get an error.
The last input_shp variable gives the full path to our Shapefile, including the .shp file ending. Now, we will call the subprocess module ,which will then call the ogr2ogr command-line tool and pass along the variable options required to run the tool. We'll pass this function an array of arguments, the first object in the array being the ogr2ogr command-line tool name. After this, we'll pass each option after another in the array to complete the call.
Subprocess can be used to call any command-line tool directly. It takes a list of parameters that are separated by spaces. This passing of parameters is quite fussy, so make sure you follow along closely and don't add any extra spaces or commas.
Last but not least, we need to execute our script from the command line to actually import our Shapefile by calling the python interpreter and passing the script. Then, head over to the PgAdmin Postgresql database viewer and see if it worked, or even better, open up Quantum GIS (www.qgis.org) and take a look at the newly created tables.
If you would like to see the full list of options available with the ogr2ogr command, simple enter the following in the command line:
$ ogr2ogr –help
You will see the full list of options that are available. Also, visit http://gdal.org/ogr2ogr.html to read the required documentation.
We would like to extend our last script to loop over a folder full of Shapefiles and import them into PostGIS. Most importing tasks involve more than one file to import so this makes it a very practical task.
The following steps will batch import a folder of Shapefiles into PostGIS using ogr2ogr:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import subprocess
import os
import ogr
def discover_geom_name(ogr_type):
"""
:param ogr_type: ogr GetGeomType()
:return: string geometry type name
"""
return {ogr.wkbUnknown : "UNKNOWN",
ogr.wkbPoint : "POINT",
ogr.wkbLineString : "LINESTRING",
ogr.wkbPolygon : "POLYGON",
ogr.wkbMultiPoint : "MULTIPOINT",
ogr.wkbMultiLineString : "MULTILINESTRING",
ogr.wkbMultiPolygon : "MULTIPOLYGON",
ogr.wkbGeometryCollection : "GEOMETRYCOLLECTION",
ogr.wkbNone : "NONE",
ogr.wkbLinearRing : "LINEARRING"}.get(ogr_type)
def run_shp2pg(input_shp):
"""
input_shp is full path to shapefile including file ending
usage: run_shp2pg('/home/geodata/myshape.shp')
"""
db_schema = "SCHEMA=geodata"
db_connection = """PG:host=localhost port=5432
user=pluto dbname=py_geoan_cb password=stars"""
output_format = "PostgreSQL"
overwrite_option = "OVERWRITE=YES"
shp_dataset = shp_driver.Open(input_shp)
layer = shp_dataset.GetLayer(0)
geometry_type = layer.GetLayerDefn().GetGeomType()
geometry_name = discover_geom_name(geometry_type)
print (geometry_name)
subprocess.call(["ogr2ogr", "-lco", db_schema, "-lco", overwrite_option,
"-nlt", geometry_name, "-skipfailures",
"-f", output_format, db_connection, input_shp])
# directory full of shapefiles
shapefile_dir = os.path.realpath('../geodata')
# define the ogr spatial driver type
shp_driver = ogr.GetDriverByName('ESRI Shapefile')
# empty list to hold names of all shapefils in directory
shapefile_list = []
for shp_file in os.listdir(shapefile_dir):
if shp_file.endswith(".shp"):
# apped join path to file name to outpout "../geodata/myshape.shp"
full_shapefile_path = os.path.join(shapefile_dir, shp_file)
shapefile_list.append(full_shapefile_path)
# loop over list of Shapefiles running our import function
for each_shapefile in shapefile_list:
run_shp2pg(each_shapefile)
print ("importing Shapefile: " + each_shapefile)
$ python ch03-02_batch_shp2pg.py
Here, we will reuse our code from the previous script but have converted it into a python function called run_shp2pg (input_shp), which takes exactly one argument to complete the path to the Shapefile we want to import. The input argument must include a Shapefile ending with .shp.
We have a helper function that will get the geometry type as a string by reading in the Shapefile feature layer and outputting the geometry type so that the ogr commands know what to expect. This does not always work and some errors can occur. The skipfailures option will plow over any errors that are thrown during insert and will still populate our tables.
To begin with, we need to define the folder that contains all our Shapefiles to be imported. Next up, we'll create an empty list object called shapefile_list that will hold a list of all the Shapefiles we want to import.
The first for loop is used to get the list of all the Shapefiles in the directory specified using the standard python os.listdir() function. We do not want all the files in this folder; we only want files with ending with .shp, hence, the if statement will evaluate to True if the file ends with .shp. Once the .shp file is found, we need to append the file path to the file name to create a single string that holds the path plus the Shapefile name, and this is our variable called full_shapefile_path. The final part of this is to add each new file with its attached path to our shapefile_list list object. So, we now have our final list to loop through.
It is time to loop through each Shapefile in our new list and run our run_shp2pg(input_shp) function for each Shapefile in the list by importing it into our Postgresql PostGIS database.
If you have a lot of Shapefiles, and by this I mean mean hundred or more Shapefiles, performance will be one consideration and will, therefore, indicate that there are a lot of machines with free resources.
We will now change directions and take a look at how we can batch export a list of tables from our PostGIS database into a folder of Shapefiles. We'll again use the ogr2ogr command-line tool from within a python script so that you can include it in your application programming workflow. Near the end, you can also see how this all of works in one single command line.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
import subprocess
import os
# folder to hold output Shapefiles
destination_dir = os.path.realpath('../geodata/temp')
# list of postGIS tables
postgis_tables_list = ["bikeways", "highest_mountains"]
# database connection parameters
db_connection = """PG:host=localhost port=5432 user=pluto
dbname=py_geoan_cb password=stars active_schema=geodata"""
output_format = "ESRI Shapefile"
# check if destination directory exists
if not os.path.isdir(destination_dir):
os.mkdir(destination_dir)
for table in postgis_tables_list:
subprocess.call(["ogr2ogr", "-f", output_format, destination_dir,
db_connection, table])
print("running ogr2ogr on table: " + table)
else:
print("oh no your destination directory " + destination_dir +
" already exist please remove it then run again")
# commandline call without using python will look like this
# ogr2ogr -f "ESRI Shapefile" mydatadump
# PG:"host=myhost user=myloginname dbname=mydbname password=mypassword"
neighborhood parcels
$ python ch03-03_batch_postgis2shp.py
Beginning with a simple import of our subprocess and os modules, we'll immediately define our destination directory where we want to store the exported Shapefiles. This variable is followed by the list of table names that we want to export. This list can only include files located in the same Postgresql schema. The schema is defined as the active_schema so that ogr2ogr knows where to find the tables to be exported.
Once again, we'll define the output format as ESRI Shapefile. Now, we'll check whether the destination folder exists. If it does, continue and call our loop. Then, loop through the list of tables stored in our postgis_tables_list variable. If the destination folder does not exist, you will see an error printed on the screen.
If you are programming an application, then executing the ogr2ogr command from inside your script is definitely quick and easy. On the other hand, for a one-off job, simply executing the command-line tool is what you want when you export your list of Shapefiles. To do this in a one-liner, please take a look at the following information box.
A one line example of calling the ogr2ogr batch of the PostGIS table to Shapefiles is shown here if you simply want to execute this once and not in a scripting environment:
ogr2ogr -f "ESRI Shapefile" /home/ch03/geodata/temp
PG:"host=localhost user=pluto dbname=py_geoan_cb password=stars"
bikeways highest_mountainsThe list of tables you want to export is located as a list separated by spaces. The destination location of the exported Shapefiles is ../geodata/temp. Note this this /temp directory must exist.
OpenStreetMap (OSM) has a wealth of free data, but to use it with most other applications, we need to convert it to another format, such as a Shapefile or Postgresql PostGIS database. This recipe will use the ogr2ogr tool to do the conversion for us within a python script. The benefit derived here is simplicity.
To get started, you will need to download the OSM data at http://www.openstreetmap.org/export#map=17/37.80721/-122.47305 and saving the file (.osm) to your /ch03/geodata directory. The download button is located on the bar on the left-hand side, and when pressed, it should immediately start the download (refer to the following image). The area we are testing is from San Francisco, just before the golden gate bridge.
If you choose to download another area from OSM feel free to, but make sure that you take a small area like preceding example link. If you select a larger area, the OSM web tool will give you a warning and disable the download button. The reason is simple: if the dataset is very large, it will most likely be better suited for another tool, such as osm2pgsql (http://wiki.openstreetmap.org/wiki/Osm2pgsql), for you conversion. If you need to get OSM data for a large area and want to export to Shapefile, it would be advisable to use another tool, such as osm2pgsql, which will first import your data to a Postgresql database. Then, export from the PostGIS database to Shapefile using the pgsql2shp tool.
A python tool to import OSM data into a PostGIS database is also available and is called imposm (located here at http://imposm.org/). Version 2 of it is written in python and version 3 is written in the "go" programming language if you want to give it a try.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# convert / import osm xml .osm file into a Shapefile
import subprocess
import os
import shutil
# specify output format
output_format = "ESRI Shapefile"
# complete path to input OSM xml file .osm
input_osm = '../geodata/OSM_san_francisco_westbluff.osm'
# Windows users can uncomment these two lines if needed
# ogr2ogr = r"c:/OSGeo4W/bin/ogr2ogr.exe"
# ogr_info = r"c:/OSGeo4W/bin/ogrinfo.exe"
# view what geometry types are available in our OSM file
subprocess.call([ogr_info, input_osm])
destination_dir = os.path.realpath('../geodata/temp')
if os.path.isdir(destination_dir):
# remove output folder if it exists
shutil.rmtree(destination_dir)
print("removing existing directory : " + destination_dir)
# create new output folder
os.mkdir(destination_dir)
print("creating new directory : " + destination_dir)
# list of geometry types to convert to Shapefile
geom_types = ["lines", "points", "multilinestrings", "multipolygons"]
# create a new Shapefile for each geometry type
for g_type in geom_types:
subprocess.call([ogr2ogr,
"-skipfailures", "-f", output_format,
destination_dir, input_osm,
"layer", g_type,
"--config","OSM_USE_CUSTOM_INDEXING", "NO"])
print("done creating " + g_type)
# if you like to export to SPATIALITE from .osm
# subprocess.call([ogr2ogr, "-skipfailures", "-f",
# "SQLITE", "-dsco", "SPATIALITE=YES",
# "my2.sqlite", input_osm])
$ python ch03-04_osm2shp.py
Go and have a look at your ../geodata folder to see the newly created Shapefiles, and try to open them up in Quantum GIS, which is a free GIS software (www.qgis.org)
This script should be clear as we are using the subprocess module call to fire our ogr2ogr command-line tool. Specify our OSM dataset as an input file, including the full path to the file. The Shapefile name is not supplied as ogr2ogr and will output a set of Shapefiles, one for each geometry shape according to the geometry type it finds inside the OSM file. We only need to specify the name of the folder where we want ogr2ogr to export the Shapefiles to, automatically creating the folder if it does not exist.
Windows users: if you do not have your ogr2ogr tool mapped to your environment variables, you can simply uncomment at lines 16 and 17 in the preceding code and replace the path shown with the path on your machine to the Windows executables.
The first subprocess call prints out the screen that the geometry types have found inside the OSM file. This is helpful in most cases to help you identify what is available. Shapefiles can only support one geometry type per file, and this is why ogr2ogr outputs a folder full of Shapefiles, each one representing a separate geometry type.
Lastly, we'll call subprocess to execute ogr2ogr, passing in the output "ESRI Shapefile" file type, output folder, and the name of the OSM dataset.
Moving data from format to format also includes moving from vector to raster or the other way around. In this recipe, we move from a vector (Shapefile) to a raster (GeoTiff) with the python gdal and ogr modules.
We need to be inside our virtual environment again, so fire it up so that we can access our gdal and ogr python modules.
As usual, enter your python virtual environment with the workon pygeoan_cb command:
$ source venvs/pygeoan_cb/bin/activate
Let's dive in and convert our golf course polygon Shapefile into a GeoTif; here is the code to do this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from osgeo import ogr
from osgeo import gdal
# set pixel size
pixel_size = 1
no_data_value = -9999
# Shapefile input name
# input projection must be in cartesian system in meters
# input wgs 84 or EPSG: 4326 will NOT work!!!
input_shp = r'../geodata/ply_golfcourse-strasslach3857.shp'
# TIF Raster file to be created
output_raster = r'../geodata/ply_golfcourse-strasslach.tif'
# Open the data source get the layer object
# assign extent coordinates
open_shp = ogr.Open(input_shp)
shp_layer = open_shp.GetLayer()
x_min, x_max, y_min, y_max = shp_layer.GetExtent()
# calculate raster resolution
x_res = int((x_max - x_min) / pixel_size)
y_res = int((y_max - y_min) / pixel_size)
# set the image type for export
image_type = 'GTiff'
driver = gdal.GetDriverByName(image_type)
new_raster = driver.Create(output_raster, x_res, y_res, 1, gdal.GDT_Byte)
new_raster.SetGeoTransform((x_min, pixel_size, 0, y_max, 0, -
pixel_size))
# get the raster band we want to export too
raster_band = new_raster.GetRasterBand(1)
# assign the no data value to empty cells
raster_band.SetNoDataValue(no_data_value)
# run vector to raster on new raster with input Shapefile
gdal.RasterizeLayer(new_raster, [1], shp_layer, burn_values=[255])
$ python ch03-05_shp2raster.py
The resulting raster should look like this if you open it using QGIS (http://www.qgis.org):
There are several steps involved in this code so please follow along as some points could lead to trouble if you are not sure what values to input. We start with the import of the gdal and ogr modules, respectively, since they will do the work for us by inputting a Shapefile (vector) and outputting a GeoTiff (raster).
The pixel_size variable is very important since it will determine the size of the new raster we will create. In this example, we only have two polygons, so we'll set pixel_size = 1 to keep a fine border. If you have many polygons stretching across the globe in one Shapefile, it is wiser to set this value to 25 or more. Otherwise, you could end up with a 10 GB raster and your machine will run all night long! The no_data_value parameter is needed to tell GDAL what values to set in the empty space around our input polygons, and we set these to -9999 in order to be easily identified.
Next, we'll simply set the input Shapefile stored in the EPSG:3857 web mercator and output GeoTiff. Check to make sure that you change the file names accordingly if you want to use some other dataset. We start by working with the OGR module to open the Shapefile and retrieve its layer information and the extent information. The extent is important because it is used to calculate the size of the output raster width and height values, which must be integers that are represented by the x_res and y_res variables.
Note that the projection of your Shapefile must be in meters not degrees. This is very important since this will NOT work in EPSG:4326 or WGS 84, for example. The reason is that the coordinate units are LAT/LON. This means that WGS84 is not a flat plane projection and cannot be drawn as is. Our x_res and y_res values would evaluate to 0 since we cannot get a real ratio using degrees. This is a result of use not being able to simply subtract coordinate x from coordinate y because the units are in degrees and not in a flat plane meter projection.
Now moving on to the raster setup, we'll define the type of raster we want to export as a Gtiff. Then, we can get the correct GDAL driver by the raster type. Once the raster type is set, we can create a new empty raster dataset, passing in the raster file name, width, and height of the raster in pixels, number of raster bands, and finally, the type of raster in GDAL terms, that is, the gdal.GDT_Byte. These five parameters are mandatory to create a new raster.
Next, we'll call SetGeoTransform that handles transforming between pixel/line raster space and projection coordinates space. We want to activate the band 1 as it is the only band we have in our raster. Then, we'll assign the no data value for all our empty space around the polygon.
The final step is to call the gdal.RasterizeLayer() function and pass in our new raster band Shapefile and the value to assign to the inside of our raster. The value of all the pixels inside our polygon will be 255.
If you are interested, you can visit the command-line tool gdal_rasterize at http://www.gdal.org/gdal_rasterize.html. You can run this straight from the command line.
We have now looked at how we can go from vector to raster, so it is time to go from raster to vector. This method is much more common because most of our vector data is derived from remotely sensed data such as satellite images, orthophotos, or some other remote sensing dataset such as lidar.
As usual, please enter your python virtual environment with the help of the workon pygeoan_cb command:
$ source venvs/pygeoan_cb/bin/activate
Now let's begin:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from osgeo import ogr
from osgeo import gdal
# get raster datasource
open_image = gdal.Open( "../geodata/cadaster_borders-2tone-black-
white.png")
input_band = open_image.GetRasterBand(3)
# create output datasource
output_shp = "../geodata/cadaster_raster"
shp_driver = ogr.GetDriverByName("ESRI Shapefile")
# create output file name
output_shapefile = shp_driver.CreateDataSource( output_shp + ".shp" )
new_shapefile = output_shapefile.CreateLayer(output_shp, srs = None )
gdal.Polygonize(input_band, None, new_shapefile, -1, [], callback=None)
new_shapefile.SyncToDisk()
$ python ch03-06_raster2shp.py
Working with ogr and gdal is similar in all our recipes; we must define the inputs and get the appropriate file driver to open the files. The GDAL library is very powerful and in only one line of code, we can convert a raster to a vector with the gdal. Polygonize function. All the preceding code is simply setup code to define which format we want to work with. We can then set up the appropriate driver to input and output our new file.
In this article we covered converting a Shapefile to a PostGIS table using ogr2ogr, batch importing a folder of Shapefiles into PostGIS using ogr2ogr, batch exporting a list of tables from PostGIS to Shapefiles, converting an OpenStreetMap (OSM) XML to a Shapefile, converting a Shapefile (vector) to a GeoTiff (raster), and converting a GeoTiff (raster) to a Shapefile (vector) using GDAL
Further resources on this subject: