ArcPy and ArcGIS

Chapter 1. Introduction to Python for ArcGIS

In this chapter, we will discuss the development of Python as a programming language from its introduction in the late 1980s to its current state. We will discuss the creator of the language and the philosophy of design that spurred its development. We will also touch on important modules that will be used throughout the rest of the book, and especially focus on the modules built into the Python Standard Library. We will configure Python and the computer to execute Python scripts. The structure of the Python folder will be discussed, as will the location of the ArcPy module within the ArcGIS folder structure. We will also discuss Integrated Development Environments (IDEs)--programs designed to assist in code creation and code execution--comparing and contrasting existing IDEs to determine what benefits each IDE can offer when scripting Python code.

This chapter will cover the following topics:

Quick Overview of Python: what it is and what it does, who created it, where is it now
Important Python modules, both built-in and third party
Python core concepts including data types, data containers, and looping
The location of the Python interpreter, and how it is called to execute a script
Adjusting the computer's environment variables to ensure correct code execution
Integrated Development Environments (IDEs)
Python's folder structure, and where the modules are stored

Python as a programming language

Over the last 40+ years of computer science, programming languages have developed from assembly and machine code towards high-level abstracted languages, which are much closer to English. The Python programming language, begun by Guido van Rossum in 1989, was designed to overcome issues that programmers were dealing with in the 1980s: slow development time, overly complicated syntax, and horrible readability. He developed Python as a language that would enable rapid code development and testing, have beautiful (or at least readable) syntax, and produce results with fewer lines of code, in less time. The first version of Python (0.9.0) was released in 1991, and it has always been free to download and use.

Note

Go to https://www.python.org/ to explore Python documentation, try tutorials, get help, find useful Python code libraries, and download Python. Python has multiple major and minor versions. For much of the book, we are using Python 2.7, which is installed automatically along with ArcGIS for Desktop. For chapters on ArcGIS Pro, we will use Python 3.5.

Interpreted language

Python is an interpreted language. The standard Python interpreter is itself written in C, a compiled language; it converts Python code into an intermediate form called bytecode, and then executes that bytecode immediately. Practically, this means that code runs as soon as it is entered, with no separate compile step. While interpretation can have speed implications for the execution of Python-based programs, this has very few real-world implications for its use with ArcGIS. Testing of code snippets is much faster in an interpreted environment, and it is perfect for creating scripts to automate basic, repeatable computing tasks.

Standard (built-in) library

Python, when installed, has a basic set of modules and functions that are referred to as the standard (or built-in) library. These built-in tools allow Python to perform string manipulations, math computations, HTTP calls, and URL parsing along with many other functions. Some of these tools are available as soon as Python is started, while others are organized into libraries, or modules, that must be explicitly loaded using the "import" keyword to make their functions and classes available. Other modules have been developed by third parties, and can be downloaded and installed onto the Python installation as needed.
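As a minimal sketch of the difference between tools that are always available and tools that must be imported, consider the following script. The file name parcels.shp is a hypothetical placeholder, and the script is written so that it should run under both Python 2.7 and Python 3:

```python
import math
from os import path

# Built-in functions such as len() and max() need no import.
values = [3, 10, 7]
count = len(values)        # 3
largest = max(values)      # 10

# Functions in standard library modules are available only after an import.
root = math.sqrt(16)                            # 4.0
extension = path.splitext("parcels.shp")[1]     # '.shp' (hypothetical file name)

print("%s values, largest is %s" % (count, largest))
```

The import lines sit at the top of the script by convention; len and max work without them, while math.sqrt and path.splitext do not.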

Glue language

Python is often called a "glue" language. This term describes the use of Python code to control other software programs by sending inputs to their Application Programming Interface (API) and collecting outputs, which are then sent to another program to repeat the process. A GIS example would be to use Python's urllib2 module to download zipped shapefiles from a website, unzip the files, process them using ArcToolbox, and compile the results into an Excel spreadsheet. All of this is accomplished using freely available modules that are either included in Python's built-in library, or added on when ArcGIS is installed.
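The following is a rough sketch of that "glue" pattern using only standard library modules. It is not the workflow itself: the download step is replaced by building a small zip file locally, the ArcToolbox step is replaced by a simple file-name filter, and the Excel step is replaced by a text report. All file names are hypothetical placeholders:

```python
import os
import shutil
import tempfile
import zipfile

work_dir = tempfile.mkdtemp()

# Stand-in for the download step: build a small zip archive locally.
# The member names are placeholders, not real shapefiles.
zip_path = os.path.join(work_dir, "download.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("roads.shp", "placeholder")
    archive.writestr("roads.dbf", "placeholder")

# Step 1: unzip the archive into its own folder.
extract_dir = os.path.join(work_dir, "unzipped")
with zipfile.ZipFile(zip_path) as archive:
    archive.extractall(extract_dir)

# Step 2: "process" the files (here, just collect the .shp names).
shapefiles = sorted(name for name in os.listdir(extract_dir)
                    if name.endswith(".shp"))

# Step 3: compile the results into a report file.
report_path = os.path.join(work_dir, "report.txt")
with open(report_path, "w") as report:
    for name in shapefiles:
        report.write(name + "\n")

with open(report_path) as report:
    report_lines = report.read().splitlines()

shutil.rmtree(work_dir)  # remove the temporary folder when finished
```

Each step hands its output to the next one, which is the essence of using Python as glue between programs.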

Wrapper modules

The ArcPy module is a "wrapper" module. It "wraps" a Python code interface over the existing ArcGIS tools and source code, allowing us to access these tools within our scripts. Wrapper modules are common in Python, and are so named because they "wrap" Python onto the tools we will need. They allow us to use Python to interface with programs written in C or other programming languages using the API of those programs. The wrapper module os allows for operating system operations.

For example, wrapper modules make it possible to extract data from an Excel spreadsheet, transform the data into another format such as a shapefile, and load it into an MXD as a layer. Not all modules are wrappers; some modules are written in "pure Python", and perform their analysis and computations using Python syntax. Either way, the end result is that a computer and its programs are available to be manipulated and controlled using Python.

The basics of Python programming

Python has a number of language requirements and conventions, which allow for the control of modules and the structuring of code. Following are a number of important basic concepts which will be used throughout this book, and when crafting scripts for use with geospatial analysis.

To test these examples, open the IDLE (Python GUI) program from the Start Menu/ArcGIS/Python2.7 folder after installing ArcGIS for Desktop. It has a built-in "interpreter" or code entry interface, indicated by the triple chevron >>> and a blinking cursor. To create a script in IDLE to save your code, click on the File menu and then click New File. Save any script with a .py extension. Otherwise, just enter commands into the interpreter and push Enter to execute or add the next line.

Import statements

Import statements are used to augment the power of Python by calling other modules for use in the script. These modules can be a part of the standard Python library of modules, such as the math module (used to do higher mathematical calculations), or can be add-on modules such as ArcPy, which will allow us to interact with ArcGIS. Import statements can be located anywhere before the module is used, but, by convention, they are located at the top of a script.

There are three ways to create an import statement. The first, and most standard, is to import the whole module as follows:

import arcpy

Using this method, we can even import more than one module on the same line. Next, we will import three modules: arcpy, os (the operating system module), and sys (the Python system module):

import arcpy, os, sys

The next method is to import a specific portion of a module, instead of importing the entire module, using the from <module> import <submodule> syntax:

from arcpy import mapping

This method is used when only a portion of the code from ArcPy is needed; it lets us refer to that portion by a shorter name (mapping instead of arcpy.mapping). Note that Python still loads the parent module behind the scenes, so this form should not be relied on to reduce memory use. We can also import multiple portions of the module in the same fashion:

from arcpy import mapping, da

The third way to import a module is to write the from <module> import <submodule> syntax, but use an asterisk * to import all parts of the module as follows:

from arcpy import *

This method is still used, but it is discouraged as it can have unknown effects--the main one is that the names of the variables in the module might conflict with another variable in another module. For this reason, it is best to avoid this third method. However, lots of existing scripts include import statements in this format, so it is good to know that it exists.
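To illustrate the kind of name conflict that the asterisk form can cause, here is a small sketch that uses two standard library modules, math and cmath, rather than ArcPy (an assumption made so that the example runs without ArcGIS). Both modules define a function called sqrt:

```python
from math import *    # makes math.sqrt available as sqrt
from cmath import *   # silently replaces sqrt with cmath.sqrt

result = sqrt(4)      # now a complex number, not the float we may expect
print(type(result).__name__)

# Importing the whole modules keeps the two functions separate.
import math
import cmath

real_root = math.sqrt(4)       # 2.0
complex_root = cmath.sqrt(4)   # (2+0j)
```

No error or warning is raised when the second import replaces sqrt, which is why this style can produce hard-to-find bugs.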

Variables

Variables are a part of all programming languages. They are used to reference data objects stored in memory for later use in a script. There are a lot of arguments over the best method of naming variables. No variable standard has been developed for Python scripting for ArcGIS, so I will describe some common practices to use when naming variables here:

Making them descriptive: Don't just name a variable x; that variable will be useless later when the script is reviewed, and there is no way of knowing what it is used for, or why. Names should be longer rather than shorter, and should explain what the variable does, or even what type of data it holds. For example:

shapefilePath = "C:/Data/shapefile.shp"

Using CamelCase to make the variable readable: Camel case is a term used for variables that start with a lowercase letter but have uppercase letters in the middle, resembling a camel's hump. For example:

camelCase = 'camel case is twoWords stuck together like this'

Using an underscore to separate parts of the name: This makes the name longer, but adds some clarity when reading the variable name, like this:

location_address = '100 Main St'

Including the data type in the variable name: If the variable contains a string, call it variableString or variable_string. This is not standard, and will not be used in this book, but it can help organize the script, and is helpful for others who will read these scripts. Python is dynamically typed instead of statically typed, a programming language distinction, which means that a variable does not have to be declared before it can be used, unlike Visual Basic or other statically typed languages. For example:

variableString = 'this is a string'
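As a brief sketch of what dynamic typing means in practice, the same variable name can be assigned objects of different types without any declaration. The name record and its values are hypothetical:

```python
# No declaration is needed; the name simply refers to whatever is assigned.
record = 'parcel 42'                  # a string
first_type = type(record).__name__    # 'str'

record = 42                           # the same name now refers to an integer
second_type = type(record).__name__   # 'int'

record = 42.0                         # and now to a float
third_type = type(record).__name__    # 'float'
```

This flexibility is one reason descriptive names are so helpful: the name is often the only hint about what kind of data a variable is expected to hold.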

For loops

Built into all programming languages is the ability to iterate over a dataset to perform an operation on the data, thus transforming the data or extracting data that meets specific criteria. The dataset must be iterable to be used in a for loop. We will use iteration in the form of for loops throughout this book. Here is a simple example of a for loop, which takes string values and prints them in uppercase using the string upper method. Open IDLE (Python GUI) from the Start Menu/ArcGIS/Python2.7 folder to try a for loop. Enter commands at the Python interpreter's triple chevron >>> :

>>> newlist = ['a','b','c','d']
>>> for value in newlist:
       print value.upper()

A
B
C
D

If/Elif/Else statements

Conditional statements, called if...else statements in Python, are another programming language standard. They are used when evaluating data; when certain conditions are met, one action will be taken (the initial if statement); if another condition is met, another action is taken (this is an elif statement), and if the data does not meet either condition, a final action is assigned to deal with those cases (the else statement). They are similar to a conditional in an SQL statement used with the Select Tool in ArcToolbox. Here is an example using an if...else statement to evaluate data in a list. In the example, within the for loop, the modulo operator % produces the remainder of a division operation. The if condition checks for no remainder when the value is divided by two, the elif condition looks for a remainder of two when the value is divided by three, and the else condition catches any other result, as shown:

>>> data = [1,2,3,4,5,6,7]
>>> for val in data:
          if val % 2 == 0:
               print val, "no remainder"
          elif val % 3 == 2:
               print val, "remainder of two"
          else:
               print "final case"


final case
2 no remainder
final case
4 no remainder
5 remainder of two
6 no remainder
final case

While statements

Another important evaluation tool is the while statement. It is used to perform an action while a condition is true; when the condition is false, the evaluation will stop. Note that the condition must become false, or the action will be performed forever, creating an "infinite loop" that will not stop until the Python interpreter is shut off externally. Here is an example of using a while loop to perform an action until a true condition becomes false:

>>> x = 0
>>> while x < 5:
      print x
      x+=1
0
1
2
3
4

Comments

Comments in Python are used to add notes within a script. They are marked by a pound sign, and are ignored by the Python interpreter when the script is run. Comments are useful for explaining what a code block does when it is executed, or for any other helpful note that a script author would like to make for future script users:

#This is a comment

Data types

GIS uses points, lines, polygons, coverages, and rasters to store data. Each of these GIS data types can be used in different ways when performing analyses, and have different attributes and traits. Python, like GIS, has data types that it uses to organize data. The main data types used in this book are strings, integers, floats, lists, tuples, and dictionaries. They each have their own attributes and traits, and are used for specific parts of code automation. There are also built-in functions that allow for data types to be converted from one type to another; for instance, the integer 1 can be converted to the string '1' using the function str():

>>> variable = 1
>>> strvar = str(variable)
>>> strvar
'1'

Strings

Strings are used to contain any kind of character, and begin and end with quotation marks. Either single or double quotes can be used; the string must begin and end with the same type of quotation mark. Quoted text can appear within a string; it must use the opposite quotation mark to avoid conflicting with the string, as shown here:

>>> quote = 'This string contains a quote: "Here is the quote" '

A third type of string is also employed: a multiple line string, which starts and ends with three single quote marks like this:

>>> multiString = '''This string has
multiple lines and can go for
as long as I want it to'''

Integers

Integers are whole numbers that do not have any decimal places. There is a special consequence to the use of integers in mathematical operations in Python 2.7: if two integers are divided, an integer result will be returned. Check out the following code snippet to see an example of this:

>>> 5/2
2

Instead of an accurate result of 2.5, Python 2.7 will return the "floor", or the nearest whole integer at or below the true result, for any integer division calculation. This can cause small bugs in scripts, which can have major consequences. Making one of the values a float (for example, 5/2.0) returns 2.5. Note that Python 3, which is used with ArcGIS Pro, returns 2.5 for 5/2. Please be aware of this issue when writing scripts.
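A short sketch of ways to control the result of a division follows. The expressions used here are chosen because they behave the same way under Python 2.7 and Python 3:

```python
floor_quotient = 5 // 2             # 2: the // operator always floors
float_quotient = 5 / 2.0            # 2.5: one operand is a float
converted_quotient = float(5) / 2   # 2.5: convert an integer before dividing
negative_floor = -5 // 2            # -3: flooring rounds toward negative infinity
```

Using // when a whole number is wanted, and a float operand when a decimal result is wanted, makes the intent of the calculation explicit in either version of Python.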

Floats

Floating point values, or floats, are used by Python to represent decimal values. Because computers store values in a base 2 binary system, there can be issues representing a floating value that would normally be represented in base 10. Read https://docs.python.org/2/tutorial/floatingpoint.html for a further discussion on the ramifications of this limitation; for applications discussed within this book, it won't be an issue.
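The classic illustration of this limitation is sketched here. The tolerance of 1e-9 is an arbitrary value chosen for the example, not a recommended standard:

```python
total = 0.1 + 0.2

# The sum is not stored as exactly 0.3, so a direct comparison fails.
exact_match = (total == 0.3)              # False

# Comparing within a small tolerance is the usual workaround.
close_enough = abs(total - 0.3) < 1e-9    # True
```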

Data containers

Data must often be grouped, ordered, counted, and sorted. Python has a number of built-in data "containers", which can be used for each and all of these needs. Lists, tuples, sets, and dictionaries are the main data containers, and can be created and manipulated without the need to import any libraries.

For array types like lists and tuples, the order of the data is very important for retrieval. Data containers like dictionaries "map" data using a "key-value" retrieval system, where the "key" is mapped or connected to the "value". In dictionaries, the order of the data is not important. Sets hold only unique values, so they can be used to retrieve all of the unique values within another data container such as a list.
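As a small sketch of using a set to find unique values, consider a list of attribute values with repeats. The values are hypothetical:

```python
field_values = ['road', 'river', 'road', 'rail', 'river']  # hypothetical data

unique_values = set(field_values)       # duplicates are dropped
unique_count = len(unique_values)       # 3

# Sets are unordered, so sort the result when a stable order is needed.
unique_sorted = sorted(unique_values)   # ['rail', 'river', 'road']
```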

Zero-based indexing

Data stored in ordered arrays like lists and tuples often needs to be individually accessed. To directly access a particular item within a list or tuple, you need to pass its index number to the array in square brackets. This makes it important to remember that Python indexing and counting starts at 0 instead of 1. This means that the first member of a group of data is at the 0 index position, and the second member is at the 1 index position, and so on:

>>> newlist = ['run','chase','dog','bark']
>>> newlist[0]
'run'
>>> newlist[2]
'dog'

Zero-based indexing applies to characters within a string. Here, the list item is accessed using indexing, and then individual characters within the string are accessed, also using indexing:

>>> newlist[3][0]
'b'

Zero-based indexing also applies when there is a for loop iteration within a script. When the iteration starts, the first member of the data container being iterated is data item 0, the next is data item 1, and so on:

>>> newlist = ['run','chase','dog','bark']
>>> for counter, item in enumerate(newlist):
       print counter, newlist[counter]


0 run
1 chase
2 dog
3 bark

Note

The built-in enumerate function is used to add a counter variable to a for loop, which can report the current index value.

Lists

Lists are ordered arrays of data, which are contained in square brackets, []. Lists can contain any other type of data, including other lists. Mixing of data types, such as floats, integers, strings, or even other lists, is allowed within the same list. Lists have properties, such as length and order, which can be accessed to count and retrieve. Lists have methods to be extended, reversed, sorted, and can be passed to built-in Python tools to be summed, or to get the maximum or minimum value of the list.

Data pieces within a list are separated by commas. List members are referenced by their index or position in the list, and the index always starts at zero. Indexes are passed to square brackets [] to access these members, as in the following example:

>>> alist = ['a','b','c','d']
>>> alist[0]
'a'

The preceding example shows us how to extract the first value (at index 0) from the list called alist. Once a list has been populated, the data within it is referenced by its index, which is passed to the list in square brackets. To get the second value in a list (the value at index 1), the same method is used:

>>> alist[1]
'b'

Lists, being mutable, can be changed. Data can be added or removed. To merge two lists, the extend method is used:

>>> blist = [2,5,6]
>>> alist.extend(blist)
>>> alist
['a', 'b', 'c', 'd', 2, 5, 6]

Lists are a great way to organize data, and are used all the time in ArcPy.
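The list properties and methods mentioned in this section can be sketched as follows. The elevation values are hypothetical:

```python
elevations = [120, 85, 240, 160]   # hypothetical values

count = len(elevations)      # 4
total = sum(elevations)      # 605
highest = max(elevations)    # 240
lowest = min(elevations)     # 85

elevations.sort()                # sorts in place, ascending
ascending = list(elevations)     # copy of the sorted list

elevations.reverse()             # reverses in place
descending = list(elevations)

elevations.append(300)           # adds a single item to the end
```

Note that sort and reverse change the list itself and return nothing, which is why copies are taken here to keep each intermediate result.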

Tuples

Tuples are cousins to lists, and are denoted by parentheses (). Unlike lists, tuples are "immutable". Once a tuple has been created in memory, no data can be added or removed, and its members cannot be adjusted or extended (although the variable that refers to it can be reassigned to a new tuple). Data within a tuple is referenced in the same way as a list, using index references starting at 0 passed to square brackets []:

>>> atuple = ('e','d','k')
>>> atuple[0]
'e'
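A brief sketch of what immutability means in practice follows: an attempt to change a member raises a TypeError, while building a new tuple and reassigning the variable is allowed:

```python
atuple = ('e', 'd', 'k')

try:
    atuple[0] = 'z'          # not allowed: tuples cannot be modified
    changed = True
except TypeError:
    changed = False

# Concatenation creates a new tuple; the original object is untouched.
atuple = atuple + ('m',)     # ('e', 'd', 'k', 'm')
```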

Dictionaries

Dictionaries are denoted by curly brackets "{ }", and are used to create "key-value" pairs. This allows us to look up a value by its key, so that the data stored in the value can be used in processing. Here is a simple example:

>>> new_dic = {}
>>> new_dic['key'] = 'value'
>>> new_dic
{'key': 'value'}
>>> adic = {'key':'value'}
>>> adic['key']
'value'

Note that instead of referring to an index position, like lists or tuples, the values are referenced using a key. Also, keys can be any immutable (hashable) data object, such as a string, a number, or a tuple; lists cannot be used as keys because they are mutable.
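The restriction on keys can be sketched as follows. The coordinate pairs and labels are hypothetical:

```python
# A tuple is immutable, so it can serve as a dictionary key.
locations = {(0, 0): 'origin'}
locations[(10, 5)] = 'well'

# A list is mutable, so using one as a key raises a TypeError.
try:
    locations[[1, 1]] = 'invalid'
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

label = locations[(10, 5)]   # 'well'
```

Tuple keys of this kind are handy when values need to be looked up by a pair of numbers, such as an x and y coordinate.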

This can be very valuable when reading data from a shapefile or feature class for use in analysis. For example, when using an address_field as a key, the value would be a list of row attributes associated with that address. Look at the following example:

>>> business_info = { 'address_field' :  ['100', 'Main', 'St'], 'phone':'1800MIXALOT'  }
>>> business_info['address_field']
['100', 'Main', 'St']

Dictionaries are very valuable for reading in feature classes, and for easily parsing through the data by calling only the rows of interest, among other operations. They are great for ordering and reordering data for later use in a script.

Dictionaries are also useful for counting data items such as the number of times a value appears within a dataset, as seen in this example:

>>> list_values = [1,4,6,7,'foo',3,2,7,4,2,'foo']
>>> count_dictionary = {}
>>> for value in list_values:
       if value not in count_dictionary:
           count_dictionary[value] = 1
       else:
           count_dictionary[value] += 1
>>> count_dictionary['foo']
2

Other important concepts

The use of Python for programming requires an introduction to a number of concepts that are either specific to Python, or are common programming concepts that will be invoked repeatedly when creating scripts. The following are a number of these concepts, which must be covered to be fluent in Python.

Indentation

Python, unlike most other programming languages, enforces strict rules on indenting lines of code. This concept derives again from Guido's desire to produce clean, readable code. When creating functions, or using for loops or if...else statements, indentation is required on the succeeding lines of code. If a for loop is included inside an if...else statement, there will be two levels of indentation. New programmers generally find it to be helpful, as it makes it easy to organize code. A lot of programmers new to Python will create an indentation error at some point, so make sure to pay attention to the indentation levels.
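Here is a short sketch of nested indentation, with an if statement containing a for loop that in turn contains another if statement. Four spaces per level is the common convention and is assumed here:

```python
values = [4, 7, 10]
evens = []

if values:                          # no indentation: top level of the script
    for value in values:            # one level: inside the if statement
        if value % 2 == 0:          # two levels: inside the for loop
            evens.append(value)     # three levels: inside the inner if
```

Removing or misaligning any of these indents would either raise an indentation error or change which lines belong to which block.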

Functions

Functions are used to take code that is repeated over and over within a script, or across scripts, and make formal tools out of them. Using the keyword def, short for "define function", functions are created with defined inputs and outputs (which are returned from the function using the keyword return). The idea of a function in computing is that it takes in data in one state, and converts it into data in another state, without affecting any other part of the script. This can be very valuable for automating a GIS analysis.

Here is an example of a function that returns the square of any number supplied:

>>> def square(inVal):
       return inVal ** 2

>>> square(3)
9

Keywords

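There are a number of keywords built into Python that cannot be used as variable names. These include return, def, del, from, not, in, as, if, else, elif, or, and while, among others; using one of them as a variable name raises a syntax error. The names of built-in functions and types, such as max, min, sum, list, and tuple, are not keywords, but they should also be avoided: assigning a value to one of them silently replaces the built-in tool for the rest of the script, which can result in errors in your code.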
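The standard library's keyword module can be used to check whether a name is reserved. The following sketch also shows what happens when a built-in name is reused as a variable; the function shadow_demo is a hypothetical helper written only for this illustration:

```python
import keyword

def_is_keyword = keyword.iskeyword('def')   # True: reserved by the language
sum_is_keyword = keyword.iskeyword('sum')   # False: sum is a built-in function

def shadow_demo():
    sum = 10                  # this local variable hides the built-in sum
    try:
        sum([1, 2, 3])        # fails: an integer cannot be called
    except TypeError:
        return 'built-in sum is hidden'
    return 'built-in sum still works'

shadow_result = shadow_demo()

# Outside the function, the built-in is untouched.
builtin_total = sum([1, 2, 3])   # 6
```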

Namespaces

Namespaces are a logical way to organize variable names, to allow a variable inside a function or an imported module to share the same name as a variable in the main script body, without overwriting the variable. Referred to as "local" variables versus "global" variables, local variables are contained within a function (either in the script or within an imported module), while global variables are within the main body of the script.

These issues often arise when a variable within an imported module unexpectedly has the same name of a variable in the script, and the interpreter has to use namespace rules to decide between the two variables.
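As a minimal sketch of these rules, the following script uses the same variable name inside and outside a function, and a name that also exists inside an imported module. The variable names and values are hypothetical:

```python
import math

buffer_distance = 100          # global: defined in the main body of the script

def make_buffer_label():
    buffer_distance = 25       # local: exists only inside this function
    return 'buffer_%s' % buffer_distance

label = make_buffer_label()    # 'buffer_25'
# The global variable is unchanged by the assignment inside the function.

pi = 3                         # a script variable named pi
module_pi = math.pi            # the module's own pi, kept in its namespace
```

Because the module was imported with a plain import statement, its pi is reached as math.pi and never collides with the script's own pi.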

Important Python modules

Modules, or code libraries that can be called by a script to increase its programming potential, are either built into Python, or are created by third parties, and added later to Python. Most of these are written in Python, but a number of them are also written in other programming languages, and then "wrapped" in Python to make them available within Python scripts. Wrappers are also used to make other software available to Python, such as the tools built into Microsoft Excel.

The OS (operating system) module

The os module, part of the standard library, is very useful for a number of regular operations within Python. The most used part of the os module is the os.path submodule, which allows the script to build and manipulate file paths, and to divide them into directory paths and base names. There is also a useful function, os.walk, which will "walk" a directory and return all files within the folders and the subfolders.
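The following sketch exercises os.path and os.walk against a small temporary folder tree, so that it can run on any machine. The folder and file names are hypothetical placeholders:

```python
import os
import shutil
import tempfile

# Build a temporary folder tree containing two empty placeholder files.
root = tempfile.mkdtemp()
sub = os.path.join(root, 'county')
os.mkdir(sub)
for folder, name in [(root, 'state.shp'), (sub, 'parcels.shp')]:
    open(os.path.join(folder, name), 'w').close()

# Divide a full path into its directory path and base name.
sample = os.path.join(sub, 'parcels.shp')
directory, base = os.path.split(sample)      # (..., 'parcels.shp')
stem, extension = os.path.splitext(base)     # ('parcels', '.shp')

# Walk the tree and collect every file name in every subfolder.
found = []
for current_dir, subfolders, files in os.walk(root):
    for name in files:
        found.append(name)
found.sort()

shutil.rmtree(root)   # remove the temporary folders when finished
```

Using os.path.join rather than typing separators by hand keeps the paths valid regardless of the operating system.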

The sys (Python system) module

The sys module, part of the standard library, refers to the Python installation itself. It has a number of attributes and functions that will get information about the version of Python installed, as well as information about the script, and any "arguments" supplied to the script, which are stored in the sys.argv list. The sys.path list is very useful for extending the Python search path; practically, this means that folders containing scripts can be appended to it, which makes the functions those scripts contain importable by other scripts.
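A short sketch of these parts of the sys module follows. The folder C:/Projects/scripts is a hypothetical location used only for illustration:

```python
import sys

# Version information for the running interpreter.
major_version = sys.version_info[0]      # 2 or 3

# Anything typed after the script name on the command line.
extra_arguments = sys.argv[1:]

# Make scripts in another folder importable (hypothetical folder path).
scripts_folder = 'C:/Projects/scripts'
if scripts_folder not in sys.path:
    sys.path.append(scripts_folder)
```

Appending to sys.path only affects the current run of the interpreter; it does not change any system settings.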

The CSV, XLRD, and XLWT modules

The csv, xlrd, and xlwt modules are used to read and write data spreadsheets. They can be very useful for extracting data from the spreadsheets and converting them into data for GIS analysis, or for writing out analysis results as spreadsheets when an analysis is complete. The csv module (which creates text file spreadsheets using text delimiters like commas) is a built-in module, while xlrd and xlwt (which read and write Excel files respectively) are not part of the standard library, but are installed along with ArcGIS and Python 2.7.

Commonly used built-in functions

There are a number of built-in functions that we will use throughout the book. The main ones are listed as follows:

str: The string function is used to convert any other type of data into a string.
int: The integer function is used to convert a string or float into an integer. To avoid an error, any string passed to the integer function must be a number such as '1'.
float: The float function is used to convert a string or an integer into a float, much like the integer function.
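These three conversion functions can be sketched as follows, including the error that is raised when a string does not represent a number:

```python
as_string = str(42)         # '42'
as_integer = int('1')       # 1
as_float = float('2.5')     # 2.5
truncated = int(7.9)        # 7: the decimal portion is dropped, not rounded

# A string that is not a number cannot be converted.
try:
    int('one')
    conversion_ok = True
except ValueError:
    conversion_ok = False
```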

Standard library modules

Commonly used standard library modules that must be imported are as follows:

datetime: The datetime module has date and time information, and can convert date data formats
math: The math module is for higher level math functions, such as getting a value for Pi or calculating a square root
string: The string module is used for string manipulations
csv: The csv module is used for creating, accessing, and editing text spreadsheets.

Check out https://docs.python.org/2/library/ for a complete list of the built-in modules.
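As a brief sketch of two of the modules listed above, the following script converts a date between formats and performs a couple of calculations with the math module. The date and the measurements are hypothetical:

```python
import datetime
import math

# Parse a date string, then write it back out in another format.
survey_date = datetime.datetime.strptime('2017-06-30', '%Y-%m-%d')
us_format = survey_date.strftime('%m/%d/%Y')            # '06/30/2017'
next_week = survey_date + datetime.timedelta(days=7)    # July 7, 2017

# Use the math module for Pi and for a square root.
circle_area = math.pi * 10 ** 2
hypotenuse = math.sqrt(3 ** 2 + 4 ** 2)                 # 5.0
```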

How Python executes a script

Understanding how Python works to interpret a script, and then executes the commands within, is as important as understanding the Python language itself. Hours of debugging and error checking can be avoided by taking the time to set up Python correctly. The interpretive nature of Python means that a script will first have to be converted into bytecode before it can be executed. We will cover the steps that Python takes to achieve our goal of automating GIS analysis.

What is a Python script?

Let's start with the very basics of writing and executing a Python script. What is a Python script? It is a simple text file that contains a series of organized commands, written in a formalized language. The text file has the extension .py, but other than that, there is nothing to distinguish it from any other text file. It can be opened using a text editor such as Notepad or WordPad, but the "magic" that Python performs does not reside in the script itself. Without the Python interpreter, a Python script cannot run, and its commands cannot be executed.

Python interpreter

The Python interpreter, in a Windows environment, is a program that has been 'compiled' from the Python source code into a Windows executable and has the extension .exe. The Python interpreter, python.exe, is written in C, an older and extensively used programming language with a more complex syntax.

The Python interpreter, as its name implies, interprets the commands contained within a Python script. When a Python script is run, or executed, the syntax is first checked to make sure that it conforms to the rules of Python (for example, that indentation rules are followed, and that variable names are valid). Then, if the script is valid, the commands contained within are converted into bytecode, a specialized code that is executed by the bytecode interpreter, a virtual machine written in C. The bytecode interpreter steps through the bytecode and carries out each instruction using its own compiled machine code, which the CPU executes. Bytecode for imported modules is also saved in files that end with the extension .pyc, so that it does not have to be regenerated on every run. This is a complex process, which allows Python to maintain a semblance of simplicity.
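The two stages described above, syntax checking followed by execution of the compiled code, can be observed from within Python using the built-in compile function. This is only an illustrative sketch; the interpreter performs these steps automatically, and the source strings here are hypothetical:

```python
# Stage 1: compile valid source text into a code object (the bytecode container).
source = "total = 1 + 2\n"
code_object = compile(source, '<example>', 'exec')

# Stage 2: execute the code object in its own namespace.
namespace = {}
exec(code_object, namespace)
total = namespace['total']      # 3

# Source that breaks the indentation rules is rejected before anything runs.
bad_source = "for i in range(3):\nprint(i)\n"
try:
    compile(bad_source, '<example>', 'exec')
    syntax_ok = True
except SyntaxError:
    syntax_ok = False
```

Because the whole script is checked first, a syntax error on the last line prevents even the first line from being executed.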

Where is the Python interpreter located?

The location of the Python interpreter within the folder structure of a computer is an important detail to master. Python is often downloaded directly from https://www.python.org/, and installed separately from ArcGIS. However, each ArcGIS version will require a specific version of Python; given this requirement, the inclusion of Python within the ArcGIS installation package is helpful. For this book, we will be using ArcGIS 10.5, and this will require Python 2.7.

On a Windows machine, the Python folder structure is placed directly in the C:\ drive unless it is explicitly loaded on another drive. The installation process for ArcGIS 10.5 will create a folder at C:\Python27, which will contain another folder called either ArcGIS10.5 or ArcGIS10.5x64 depending on the version of ArcGIS that has been installed. For this book, we will be using the 32-bit version of ArcGIS, so, the final folder path will be C:\Python27\ArcGIS10.5.

Within that folder are a number of subfolders as well as python.exe, which is the Python interpreter itself. Also included is a second version of the interpreter called pythonw.exe; this version is also very important, as it will execute a script without causing a terminal window to appear. Both python.exe and pythonw.exe contain complete copies of all Python commands, and can be used to execute a script.

Which Python interpreter should be used?

The general rule for executing a script directly using the Python interpreters is to use pythonw.exe, as no terminal window will appear. When there is a need to test code snippets, or to see output within a terminal window, then start python.exe by double-clicking the executable file.

When python.exe is started, a Python interpreter console will appear as seen in the following screenshot:

Note the distinctive three chevrons >>> that appear below the header explaining version information. That is the Python "prompt" where code is entered to be executed line by line, instead of in a completed script. This direct access to the interpreter is useful for testing code snippets and understanding syntax. A version of this interpreter, the Python Window, has been built into ArcMap and ArcCatalog since ArcGIS 10. It will be discussed further in later chapters.

How does the machine know where the interpreter is?

To be able to execute Python scripts directly (that is, to make the scripts run by double-clicking on them), the computer will also need to know where the interpreter sits within its folder structure. To accomplish this requires both administrative account access, and advanced knowledge of how Windows searches for a program. If you have this, you can adjust an environment variable within the advanced system settings dialogue to register the interpreter with the system path.

On a Windows 7/10 machine, click on the Start menu, and right-click on Computer. Then select Properties from the menu. On a Windows 8 machine, open up Windows explorer, right click on This PC, and select Properties from the menu. These commands are shortcuts to get to the Control Panel's System and Security/System menu. Select Advanced system settings from the panel on the left. Click on the Environment Variables button at the bottom of the System Properties menu that appears. In the lower portion of the Environment Variables menu, scroll in the System variables window until the Path variable appears. Select it by clicking on it, and click on the Edit button. The Edit System Variable window will appear like this:

This variable has two components: Variable name (Path) and Variable value. The value is a series of folder paths separated by semicolons. This is the list of folders that Windows searches when a program is called by name without its full path. In our case, we will add the folder path that contains the Python interpreter. Type C:\Python27\ArcGIS10.5 (or the equivalent on your machine) into the Variable value field, making sure to separate it from the value before it with a semicolon. Press OK to exit the Edit dialogue, OK to exit the Environment Variables menu, and OK to exit the System Properties menu. The machine will now know where the Python interpreter is, as it will search all folders contained within the Path variable to look for an executable called python. To test that the path adjustment worked correctly, open up a command window (Start Menu/Run, and type "cmd"), and type python.

The interpreter should start directly in the command window:

If the Python header with version information and the triple chevron appears, the path adjustment has worked correctly.

If there is no admin access available, there is a workaround. In a command-line window, pass the entire path to the Python interpreter (for example, C:\Python27\ArcGIS10.5\python.exe) to start the interpreter.
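The search that Windows performs over the Path variable can be imitated in a few lines of Python. The following is only a sketch of the idea, not how Windows itself is implemented: the function name find_on_path is made up for illustration, and the folders it searches are temporary stand-ins rather than real Path entries. The code runs under both Python 2.7 and Python 3.

```python
import os
import tempfile


def find_on_path(executable, path_value):
    """Return the first folder in path_value that contains executable, else None."""
    # os.pathsep is ';' on Windows and ':' on other systems.
    for folder in path_value.split(os.pathsep):
        if folder and os.path.isfile(os.path.join(folder, executable)):
            return folder
    return None


# Throwaway folders standing in for real Path entries.
empty_dir = tempfile.mkdtemp()
python_dir = tempfile.mkdtemp()
open(os.path.join(python_dir, "python.exe"), "w").close()

demo_path = os.pathsep.join([empty_dir, python_dir])
found = find_on_path("python.exe", demo_path)
print(found == python_dir)
```

To check a real machine, pass os.environ.get("PATH", "") as the second argument instead of demo_path. A result of None suggests that the interpreter's folder has not been added to the Path variable.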

Make Python scripts executable when clicked

The final step to make the scripts run when clicked (which also means they can run outside of the ArcGIS environment, saving lots of memory overhead) is to associate files with the .py extension with the Python interpreter. If the scripts have not already been associated with the interpreter, they will appear as files of an unknown type or as a text file.

To change this, right-click on a Python script. Select Open With, and then select Choose Default Program. If python.exe or pythonw.exe does not appear as a choice, navigate to the folder that holds them (C:\Python27\ArcGIS10.5 in this case), and select either python.exe or pythonw.exe. Again, the difference between the two is the appearance of a terminal window when the scripts are run using python.exe, which will contain any output from the script (but this window will disappear when the script is done). I recommend using pythonw.exe when executing scripts, and python.exe for testing out code. Python scripts can also explicitly call pythonw.exe by saving the script with the extension .pyw instead of .py.
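Because a script run with pythonw.exe has no terminal window, anything it prints is not visible. One common way to keep a record of what the script did is to write messages to a log file with the standard logging module. This is a minimal sketch under stated assumptions: the log file name script_output.log and the logger name my_script are hypothetical, and the temporary folder is used only so that the example can run anywhere.

```python
import logging
import os
import tempfile

# Hypothetical log location; any writable folder would do.
log_path = os.path.join(tempfile.gettempdir(), "script_output.log")

logger = logging.getLogger("my_script")  # hypothetical logger name
logger.setLevel(logging.INFO)

# mode="w" starts a fresh log file on every run.
handler = logging.FileHandler(log_path, mode="w")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("Script started")
logger.info("Script finished")

# Release the file so that it can be opened or deleted afterwards.
handler.close()
logger.removeHandler(handler)
```

After the script runs, open the log file in a text editor to review the messages, each of which is stamped with the time at which it was written.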

Integrated Development Environments (IDEs)

The Python interpreter contains everything that is needed to execute a Python script or to test Python code by interacting with the Python interpreter. However, writing scripts requires a text editor. There are usually at least two included simple text editors on a Windows machine (Notepad and WordPad), and they would work in an emergency to edit a script or even write a whole script.

Unfortunately, they are very simple, and do not offer the functionality that would make it easier to write multiple scripts or very long scripts. To bridge the gap, a series of programs, collectively known as Integrated Development Environments (IDEs), have been developed. IDEs exist for all programming languages, and include functions such as variable listing, code assist, and more, which makes them ideal for crafting programming scripts. We will review a few of them next to assess their usefulness for writing Python scripts. The options discussed are well established within different Python communities, and most of them are free.

IDLE

Python includes an IDE when it is installed. To start it in Windows 7, go to the Start menu, and find the ArcGIS folder within the Programs menu. Then find the Python folder; IDLE will be one of the choices within that folder. Select it to start IDLE.

IDLE contains an interactive interpreter (that is, the famous triple chevron), and the ability to run whole Python scripts. It is also written using Python's built-in GUI module called Tkinter, so it has the advantage of being written using the same language that it executes.

IDLE is a passable IDE, which is useful if no other programs can be installed on the machine. It is also very useful for rapid testing of code snippets. While it is not my IDE of choice, I find myself using IDLE almost daily.

PythonWin

PythonWin includes an Interactive Window where the user can directly interact with the Python interpreter. Scripts can also be opened within PythonWin, and it includes a set of tiling commands in the Windows menu, which allows the user to organize the display of all open scripts and the Interactive Window. It is very popular with users of ArcPy and ArcGIS, but it has been eclipsed in use by the full-fledged code editors described next.

Atom, Notepad++, and Sublime Text

Some text editors have full-fledged code editing capabilities, making them ideal IDEs. While Sublime Text is commercial, it is a powerful program, which allows for easy code editing across multiple operating systems. Similarly, Notepad++ for Windows is a powerful text editor that works well for editing code. Atom, available at https://atom.io/, is a product of the GitHub development team, and offers multiple powerful language options such as code completion and error highlighting.

All three of these advanced text editors recognize Python keywords and code structure, and will make it easy to indent code according to the rules of Python. I use them all, often interchangeably, and have no strong opinion about which one is better, though I prefer Atom and Sublime Text, as these can be used in multiple operating systems, while Notepad++ is only available for Windows. They are powerful IDEs, which are available for download from online sources.

IDE summary

There are many other IDEs, both commercial and free, available for coding in Python. In the end, each GIS analyst must choose the tool that makes them feel productive and comfortable. This may change as programming becomes a bigger part of their daily work flow. Be sure to test out a few different IDEs to find one that is easy to use and intuitive.

Python folder structure

Python's folder structure holds more than just the Python interpreter. Within the subfolders reside a number of important scripts, dynamic link libraries (DLLs), and even C language modules. Not all of the scripts are used all the time, but each has a role in making the Python programming environment possible. The most important folder to know about is the site-packages folder, where most modules that will be imported in Python scripts are contained.

Where modules reside

Within every Python installation is a folder called Lib, and within that folder is a folder called site-packages. On my machine, the folder sits at C:\Python27\ArcGIS10.5\Lib\site-packages.
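The interpreter can report these locations itself, which is handy when the installation folder is not known. The sketch below uses the standard sysconfig module (available in Python 2.7 and Python 3) to ask for the site-packages folder, and the __file__ attribute of an imported module to see where that module was loaded from. The json module is used only as a convenient example from the Standard Library.

```python
import json
import os
import sysconfig

# Folder where third-party pure-Python packages are installed.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# Any module imported from a file records where that file lives.
json_folder = os.path.dirname(json.__file__)
print(json_folder)
```

On the installation described here, the first line printed would be expected to match the site-packages path given above, but the exact output depends on the machine.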

Almost all third-party modules are copied into this folder to be imported as needed. The main exception to this rule, for our purposes, is the ArcPy module, which is stored within the ArcGIS folder in the Program Files folder (for example, C:\Program Files (x86)\ArcGIS\Desktop10.5\arcpy). To make that possible, the ArcGIS installer adjusts the Python system path (using the sys module) to make the arcpy module importable, as described next.

Installing a third-party module

To add greater functionality, thousands of third-party modules, or packages, are available for download. Online module repositories include the Python Package Index (PyPI) as well as GitHub, and others. Recent releases of Python 2 (2.7.9 and later) and Python 3 (3.4 and later) include a module designed to make installing these packages simpler than it was in the past. This module, pip, will look up registered modules in PyPI, and install the latest version. Use pip from the command prompt by passing the command install and the name of the package to install.
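When several Python installations exist on one machine, it helps to run pip through the interpreter that should receive the package, using the -m switch. The following sketch builds that command from within Python. The helper names pip_install_command and pip_install are made up for illustration, and requests is only an example package name. The install itself is not run here, since it needs network access.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the command that asks this interpreter's pip to install package."""
    # Running pip as "python -m pip" installs the package for the same
    # interpreter that runs this script.
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run the install; raises CalledProcessError if pip reports a failure."""
    subprocess.check_call(pip_install_command(package))


# "requests" is only an example package name.
print(pip_install_command("requests")[1:])
```

Typing the equivalent at the command prompt (the full path to python.exe, followed by -m pip install and the package name) has the same effect as calling pip_install.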

If the module is not available on PyPI, pip may not be able to install it. For instance, if it is on GitHub instead (even the source code of Python 3.7 is now hosted on https://github.com/, so GitHub is worth knowing), download the zip file of the module, and unzip it into the Lib/site-packages folder. Open a Command Prompt terminal, change directory (cd) into the newly unzipped folder, and run the script setup.py that is included with most modules, using the command python setup.py install. This script will install the module so that it can be imported.

Many Python modules are only available in the GZip format, which can be unzipped using freeware such as 7Zip. Unzip the .gz file, then unzip the resulting .tar file into the Lib/site-packages folder in the Python folder.
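If no unzipping program is available, Python's standard tarfile module can unpack a .tar.gz archive in a single step. The sketch below first builds a tiny stand-in archive so that it can be run anywhere; the file and folder names are hypothetical, and a real download would replace archive_path. As with any archive, only extract files from a source you trust.

```python
import os
import tarfile
import tempfile


def extract_tar_gz(archive_path, destination):
    """Extract a .tar.gz archive into destination in one step."""
    # Mode "r:gz" reads the tar archive through gzip decompression.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(destination)


# Build a tiny stand-in archive so the sketch can be run anywhere.
work_dir = tempfile.mkdtemp()
source_file = os.path.join(work_dir, "example_module.py")
with open(source_file, "w") as handle:
    handle.write("VALUE = 1\n")

archive_path = os.path.join(work_dir, "example_module.tar.gz")
with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(source_file, arcname="example_module.py")

target_dir = os.path.join(work_dir, "unpacked")
os.mkdir(target_dir)
extract_tar_gz(archive_path, target_dir)
print(os.listdir(target_dir))
```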

Using Python's sys module to add a module

Python's sys module allows the user to take advantage of the system tools built into the Python interpreter. One of the most useful properties of the sys module is sys.path. It is a list of file paths which the user can modify to adjust where Python will look for a module to import, without needing administrative access.

When Python 2.7 is installed by the ArcGIS 10.5 installer, the installer takes advantage of sys.path functions to add C:\Program Files (x86)\ArcGIS\Desktop10.5\arcpy to the system path. To test this, start up the Python interpreter or an IDE, and type the following:

>>> import sys
>>> print sys.path
['', 'C:\\WINDOWS\\SYSTEM32\\python27.zip', 'C:\\Python27\\ArcGIS10.5\\Dlls', 'C:\\Python27\\ArcGIS10.5\\lib', 'C:\\Python27\\ArcGIS10.5\\lib\\plat-win', 'C:\\Python27\\ArcGIS10.5', 'C:\\Program Files (x86)\\ArcGIS\\Desktop10.5\\arcpy', 'C:\\Program Files (x86)\\ArcGIS\\Desktop10.5\\ArcToolbox\\Scripts']

The system path (stored in the sys property sys.path) includes all of the folders that ArcPy requires to automate ArcGIS. The system path incorporates all directories listed in the PYTHONPATH environment variable (if one has been created); this is separate from the Windows Path environment variable discussed earlier. The two separate path variables work together to help Python locate modules.
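To see what PYTHONPATH contributes, read the environment variable and split it in the same way that the Windows Path value is split. This is a small sketch: the function name split_pythonpath and the two sample folders are hypothetical. On a machine where PYTHONPATH has not been created, the first loop prints nothing.

```python
import os


def split_pythonpath(value):
    """Split a PYTHONPATH-style string into its folder entries."""
    # os.pathsep is ';' on Windows and ':' on other systems.
    return [entry for entry in value.split(os.pathsep) if entry]


# Entries from the real environment variable (an empty list if it is not set).
for entry in split_pythonpath(os.environ.get("PYTHONPATH", "")):
    print(entry)

# Example with two made-up folders.
folder_a = os.path.join("Projects", "ModulesA")
folder_b = os.path.join("Projects", "ModulesB")
sample = os.pathsep.join([folder_a, folder_b])
print(split_pythonpath(sample) == [folder_a, folder_b])
```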

The sys.path.append method

The sys.path property is a mutable list, and can be appended or extended to include new file paths that will point to modules the user wants to import. To avoid the need to adjust the sys.path, copy the module into the site-packages folder; however, this is not always possible, so use the sys.path.append method as needed:

>>> sys.path.append("C:\\Projects\\Requests")
>>> sys.path
['', 'C:\\WINDOWS\\SYSTEM32\\python27.zip', 'C:\\Python27\\ArcGIS10.5\\Dlls', 'C:\\Python27\\ArcGIS10.5\\lib', 'C:\\Python27\\ArcGIS10.5\\lib\\plat-win', 'C:\\Python27\\ArcGIS10.5', 'C:\\Program Files (x86)\\ArcGIS\\Desktop10.5\\arcpy', 'C:\\Program Files (x86)\\ArcGIS\\Desktop10.5\\ArcToolbox\\Scripts', 'C:\\Projects\\Requests']

When the sys.path.append method is used, the adjustment is temporary: it lasts only until the interpreter session or script ends. To make a permanent change, adjust the PYTHONPATH environment variable in the Windows System Properties menu (discussed earlier in the Path environment variable section), creating PYTHONPATH if it does not already exist.
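Because the change must be repeated each time a script runs, the append usually sits at the top of the script, ahead of the import that depends on it. The sketch below adds a guard so that the same folder is not appended twice. The helper name add_to_sys_path and the module demo_helper_module are hypothetical, and the module is written to a temporary folder only so that the example can be run anywhere.

```python
import os
import sys
import tempfile


def add_to_sys_path(folder):
    """Append folder to sys.path unless it is already listed."""
    if folder not in sys.path:
        sys.path.append(folder)


# Create a throwaway folder holding a one-line module (names are hypothetical).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "demo_helper_module.py"), "w") as handle:
    handle.write("GREETING = 'hello from demo_helper_module'\n")

add_to_sys_path(module_dir)
add_to_sys_path(module_dir)  # the second call changes nothing

# The import succeeds because module_dir is now on sys.path.
import demo_helper_module

print(demo_helper_module.GREETING)
print(sys.path.count(module_dir))
```

In a real script, module_dir would be replaced by the folder that holds the module to import, such as the C:\Projects\Requests folder used in the earlier example.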

One last, valuable note: to import a module without adjusting the system path or copying the module into the site-packages folder, place the module in the folder that contains the script that is importing it. As long as the module is configured correctly, it will work normally. This is useful when there is no administrative access available to the executing machine.
