GIS uses points, lines, polygons, coverages, and rasters to store data. Each of these GIS data types can be used in different ways when performing analyses, and each has different attributes and traits. Python, like GIS, has data types that it uses to organize data. The main data types used in this book are strings, integers, floats, lists, tuples, and dictionaries. Each has its own attributes and traits, and each is used for specific parts of code automation. There are also built-in functions that convert data from one type to another; for instance, the integer 1 can be converted to the string '1' using the function str():
>>> variable = 1
>>> strvar = str(variable)
>>> strvar
'1'
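The same idea extends to the other built-in conversion functions. The following is a minimal sketch with invented variable names: int(), float(), and str() convert between types, int() truncates a float toward zero rather than rounding it, and text that does not look like a number raises a ValueError.

```python
# Converting between basic data types with built-in functions.
count = 1
count_text = str(count)            # integer -> string: '1'
width = float('12.5')              # string -> float: 12.5
whole_width = int(width)           # float -> integer; int() truncates toward zero: 12
parcel_id = int('42')              # string -> integer: 42

# Strings and numbers cannot be joined with + until the number is converted.
label = 'Parcel ' + str(parcel_id) # 'Parcel 42'

# A string that does not represent a number cannot be converted to one.
try:
    int('Main St')
except ValueError:
    conversion_failed = True
else:
    conversion_failed = False
```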
Strings are used to contain any kind of character, and begin and end with quotation marks. Either single or double quotes can be used; the string must begin and end with the same type of quotation mark. Quoted text can appear within a string; it must use the opposite quotation mark to avoid conflicting with the string, as shown here:
>>> quote = 'This string contains a quote: "Here is the quote" '
A third type of string is also used: a multiple-line string, which starts and ends with three single quotation marks (three double quotation marks also work), like this:
>>> multiString = '''This string has
... multiple lines and can go for
... as long as I want it to'''
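The quoting rules above can be checked with a short sketch. The variable names are illustrative only; the sketch shows that both quote styles produce the same value, that a matching quote can alternatively be escaped with a backslash, and that a triple-quoted string stores its line breaks as newline characters.

```python
single = 'road'
double = "road"
# The choice of quote style does not change the value of the string.
same_value = single == double

# Use the opposite quote type to embed quotation marks...
quote = 'This string contains a quote: "Here is the quote"'
# ...or escape a matching quote with a backslash.
escaped = 'It\'s a shapefile'

# Triple-quoted strings keep their line breaks as newline characters.
multi_string = """This string has
multiple lines and can go for
as long as I want it to"""
lines = multi_string.split('\n')   # three separate lines
```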
Integers are whole numbers that do not have any decimal places. In Python 2, which the examples in this book use, integers have a special consequence in mathematical operations: dividing one integer by another returns an integer. Check out the following code snippet to see an example of this:
>>> 5/2
2
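Instead of the accurate result of 2.5, Python 2 returns the "floor" of the result, that is, the quotient rounded down to the nearest whole number. This can cause subtle bugs in scripts, which can have major consequences, so be aware of this issue when writing scripts. To get a decimal result, make at least one of the numbers a float (for example, 5.0/2 or float(5)/2). In Python 3, the / operator performs true division (5/2 returns 2.5), and the // operator performs floor division.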
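Because the result of / depends on the Python version, the sketch below sticks to forms that behave the same in Python 2 and Python 3: converting one operand to a float forces true division, and // always floors. (In a Python 2 script, placing `from __future__ import division` at the very top also makes / perform true division.)

```python
# Converting one operand to a float forces true division in Python 2 and 3.
true_result = float(5) / 2      # 2.5

# The // operator always performs floor division.
floor_result = 5 // 2           # 2

# "Floor" means rounding down, so a negative result moves away from zero.
negative_floor = -5 // 2        # -3
```

The negative case is worth remembering: the floor of -2.5 is -3, not -2.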
Data must often be grouped, ordered, counted, and sorted. Python has a number of built-in data "containers" that can meet all of these needs. Lists, tuples, sets, and dictionaries are the main data containers, and can be created and manipulated without the need to import any libraries.
For ordered array types like lists and tuples, the position of each item determines how it is retrieved. Dictionaries instead "map" data using a "key-value" retrieval system, where each "key" is mapped, or connected, to a "value"; data is retrieved by key, so its order is not used for retrieval. Sets hold only unique values, so converting a data container such as a list into a set is a quick way to retrieve all of its unique values.
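As a minimal illustration of using a set to find unique values, assume a list of land-use codes read from an attribute table (the codes here are hypothetical). Because sets are unordered, the result is passed to sorted() when a predictable order is needed.

```python
# Hypothetical land-use codes, as they might be read from an attribute table.
land_use_codes = ['RES', 'COM', 'RES', 'IND', 'COM', 'RES']

# A set keeps one copy of each value; its order is not guaranteed.
unique_codes = set(land_use_codes)

# sorted() returns a new, ordered list built from the set.
sorted_codes = sorted(unique_codes)     # ['COM', 'IND', 'RES']
```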
Data stored in ordered arrays like lists and tuples often needs to be accessed individually. To directly access a particular item within a list or tuple, pass its index number to the array in square brackets. It is important to remember that Python indexing and counting start at 0 instead of 1. This means that the first member of a group of data is at index position 0, the second member is at index position 1, and so on:
>>> newlist = ['run','chase','dog','bark']
>>> newlist[0]
'run'
>>> newlist[2]
'dog'
Zero-based indexing applies to characters within a string. Here, the list item is accessed using indexing, and then individual characters within the string are accessed, also using indexing:
>>> newlist[3][0]
'b'
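A consequence of zero-based indexing is that the last item of a list sits at index len(list) - 1, not len(list). The sketch below reuses newlist from the examples above; it also shows negative indexes, which count back from the end, and the IndexError raised for a position that does not exist.

```python
newlist = ['run', 'chase', 'dog', 'bark']

last_item = newlist[len(newlist) - 1]   # 'bark': the last index is length - 1
also_last = newlist[-1]                 # negative indexes count back from the end
first_letter = newlist[0][0]            # 'r': index the list, then the string

# Index 4 does not exist in a four-item list, so Python raises IndexError.
try:
    newlist[4]
except IndexError:
    out_of_range = True
else:
    out_of_range = False
```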
Zero-based counting also applies when a for loop iterates over a data container with a counter. When the iteration starts, the first member of the data container being iterated is data item 0, the next is data item 1, and so on:
>>> newlist = ['run','chase','dog','bark']
>>> for counter, item in enumerate(newlist):
...     print counter, newlist[counter]
...
0 run
1 chase
2 dog
3 bark
Note
The built-in enumerate function (a function, not a module) is used to add a counter variable to a for loop, which can report the current index value.
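enumerate() produces (index, item) pairs starting at 0 by default. As a small sketch, its optional start argument can number items from 1 for human-readable output; note that this changes only the counter, not how the list itself is indexed.

```python
newlist = ['run', 'chase', 'dog', 'bark']

# enumerate() yields (index, item) pairs, starting at 0 by default.
pairs = list(enumerate(newlist))
# [(0, 'run'), (1, 'chase'), (2, 'dog'), (3, 'bark')]

# start=1 changes only the counter value, not the list's own indexes.
numbered = ['{0}. {1}'.format(number, item)
            for number, item in enumerate(newlist, start=1)]
# ['1. run', '2. chase', '3. dog', '4. bark']
```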
Lists are ordered arrays of data, which are contained in square brackets, []. Lists can contain any type of data, and different types, such as floats, integers, strings, or even other lists, can be mixed within the same list. Lists have properties, such as length and order, which can be used to count and retrieve their members. Lists have methods, such as extend(), reverse(), and sort(), and can be passed to built-in Python functions such as sum(), max(), and min() to total them or to get their maximum or minimum value.
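The following sketch exercises those methods and built-in functions on a list of hypothetical parcel areas. sort() and reverse() change the list in place (and return None), while len(), sum(), max(), and min() return new values and leave the list untouched.

```python
# A list can mix data types; a nested list counts as a single member.
mixed = ['road', 12, 3.5, ['a', 'b']]
mixed_length = len(mixed)               # 4

# Hypothetical parcel areas used to show list methods and built-in functions.
areas = [250.0, 75.5, 1200.0, 430.25]
areas.sort()                            # in place: [75.5, 250.0, 430.25, 1200.0]
areas.reverse()                         # in place: [1200.0, 430.25, 250.0, 75.5]

total_area = sum(areas)                 # 1955.75
largest = max(areas)                    # 1200.0
smallest = min(areas)                   # 75.5
```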
Data pieces within a list are separated by commas. List members are referenced by their index, or position, in the list, and the index always starts at zero. Indexes are passed to square brackets [] to access these members, as in the following example:
>>> alist = ['a','b','c','d']
>>> alist[0]
'a'
The preceding example shows how to extract the first value (at index 0) from the list called alist. To get the second value in the list (the value at index 1), the same method is used:
>>> alist[1]
'b'
Lists, being mutable, can be changed: data can be added or removed. To merge two lists, use the extend method, which adds each member of the second list to the end of the first list in place:
>>> blist = [2,5,6]
>>> alist.extend(blist)
>>> alist
['a', 'b', 'c', 'd', 2, 5, 6]
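extend() is easy to confuse with append(), which adds its argument as a single member. This sketch copies the lists from the example above so the originals stay unchanged, compares extend(), append(), and the + operator, and shows remove() and pop() for taking data back out.

```python
alist = ['a', 'b', 'c', 'd']
blist = [2, 5, 6]

extended = list(alist)       # work on a copy so alist stays unchanged
extended.extend(blist)       # adds each member: ['a', 'b', 'c', 'd', 2, 5, 6]

appended = list(alist)
appended.append(blist)       # adds blist as ONE member: ['a', 'b', 'c', 'd', [2, 5, 6]]

combined = alist + blist     # + builds a new list and leaves both originals alone

# remove() deletes the first matching value; pop() deletes by index (last by default).
extended.remove('c')         # ['a', 'b', 'd', 2, 5, 6]
last = extended.pop()        # returns 6; extended is now ['a', 'b', 'd', 2, 5]
```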
Lists are a great way to organize data, and are used all the time in ArcPy.
Tuples are cousins to lists, and are denoted by parentheses (). Unlike lists, tuples are "immutable": once a tuple has been created in memory, no data can be added to it or removed from it, and it cannot be adjusted or extended (although the variable can be overwritten with a new tuple). Data within a tuple is referenced in the same way as in a list, using index references starting at 0 passed to square brackets []:
>>> atuple = ('e','d','k')
>>> atuple[0]
'e'
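A short sketch of what immutability means in practice, reusing atuple from above: assigning to a member raises a TypeError, but the variable name can be rebound to a brand-new tuple. It also shows that a one-member tuple needs a trailing comma.

```python
atuple = ('e', 'd', 'k')

# Changing a member of a tuple is not allowed.
try:
    atuple[0] = 'x'
except TypeError:
    change_refused = True
else:
    change_refused = False

# The variable can be overwritten with a new tuple built from the old one.
atuple = atuple + ('z',)     # ('e', 'd', 'k', 'z')

# A one-member tuple needs a trailing comma; without it, the parentheses only group.
single = ('e',)
not_a_tuple = ('e')          # this is simply the string 'e'
```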
Dictionaries are denoted by curly brackets, {}, and are used to create "key-value" pairs. Each key is mapped to a value, so the key can be used to look up the value, and the value's data can then be used in processing. Here is a simple example:
>>> new_dic = {}
>>> new_dic['key'] = 'value'
>>> new_dic
{'key': 'value'}
>>> adic = {'key':'value'}
>>> adic['key']
'value'
Note that instead of referring to an index position, as with lists or tuples, the values are referenced using a key. Also, keys must be immutable objects, such as strings, numbers, or tuples of these; lists cannot be used as keys because they are mutable.
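To see the key rule in action, the sketch below stores hypothetical spot elevations keyed by (x, y) coordinate tuples, which works because tuples are immutable, and then shows that trying the same thing with a list raises a TypeError.

```python
# Hypothetical spot elevations keyed by (x, y) coordinate tuples.
elevations = {(100, 200): 35.2, (105, 200): 36.8}
spot_height = elevations[(105, 200)]     # 36.8

# A list is mutable (unhashable), so it cannot be a dictionary key.
try:
    elevations[[110, 200]] = 37.1
except TypeError:
    list_key_rejected = True
else:
    list_key_rejected = False
```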
This can be very valuable when reading data from a shapefile or feature class for use in analysis. For example, when address_field is used as a key, its value can be a list of the attributes associated with that address. Look at the following example:
>>> business_info = { 'address_field' : ['100', 'Main', 'St'], 'phone':'1800MIXALOT' }
>>> business_info['address_field']
['100', 'Main', 'St']
Dictionaries are very valuable when reading in feature classes: they make it easy to parse the data by retrieving only the rows of interest, among other operations. They are also great for ordering and reordering data for later use in a script.
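As a sketch of the row-lookup pattern, assume each row is a tuple of (business_id, address, phone). In an ArcPy script such rows would typically come from a cursor such as arcpy.da.SearchCursor; here a hard-coded, hypothetical list stands in so the example runs without ArcGIS. Each ID becomes a key, so any row can be pulled out directly, and get() supplies a default instead of raising a KeyError when an ID is missing.

```python
# Hypothetical rows standing in for cursor output: (business_id, address, phone).
rows = [
    (101, '100 Main St', '555-0100'),
    (102, '250 Oak Ave', '555-0199'),
    (103, '17 Pine Rd', '555-0142'),
]

# Map each ID to the rest of its row so rows can be retrieved by key.
rows_by_id = {}
for business_id, address, phone in rows:
    rows_by_id[business_id] = [address, phone]

oak_address = rows_by_id[102][0]            # '250 Oak Ave'
has_104 = 104 in rows_by_id                 # False: test for a key before using it
missing = rows_by_id.get(104, 'not found')  # default value instead of a KeyError
```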
Dictionaries are also useful for counting data items such as the number of times a value appears within a dataset, as seen in this example:
>>> list_values = [1,4,6,7,'foo',3,2,7,4,2,'foo']
>>> count_dictionary = {}
>>> for value in list_values:
...     if value not in count_dictionary:
...         count_dictionary[value] = 1
...     else:
...         count_dictionary[value] += 1
...
>>> count_dictionary['foo']
2
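The same counts can be produced more compactly. This sketch shows two alternatives from the standard library: dict.get() with a default of 0, which removes the if/else test, and collections.Counter (available since Python 2.7), which builds the whole count in one call and returns 0 for values it has not seen.

```python
from collections import Counter

list_values = [1, 4, 6, 7, 'foo', 3, 2, 7, 4, 2, 'foo']

# dict.get() supplies 0 for a value that has not been counted yet.
count_dictionary = {}
for value in list_values:
    count_dictionary[value] = count_dictionary.get(value, 0) + 1

# collections.Counter builds the same counts in one step.
counts = Counter(list_values)
```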