
The Five Kinds of Python Functions: Python 3.4 Edition

  • 33 min read
  • 06 Feb 2015


This article is written by Steven Lott, author of the book Functional Python Programming.

You can find more about him at http://slott-softwarearchitect.blogspot.com.


What's This About?

We're going to look at the various ways that Python 3 lets us define things that behave like functions. The proper term here is callable: we're looking at objects that can be called like a function.

We'll look at the following Python constructs:

  • Function definitions
  • Higher-order functions
  • Function wrappers (around methods)
  • Lambdas
  • Callable objects
  • Generator functions and the yield statement

And yes, we're aware that the list above has six items on it. That's because higher-order functions in Python aren't really all that complex or different. In some languages, functions that take functions as arguments involve special syntax. In Python, it's simple and common, and barely worth mentioning as a separate topic.

We'll look at when it's appropriate and inappropriate to use one or the other of these various functional forms.

Some background

Let's take a quick peek at a basic bit of mathematical formalism. We'll look at a function as an abstract formalism. We often annotate it like this:

y = f(x)

This shows us that f() is a function. It has one argument, x, and will map this to a single value, y. Some mathematical functions are written in front, for example, y=sin x. Some are written in other places around the argument, for example, y=|x|.

In Python, the syntax is more consistent, for example, we use a function like this:

>>> abs(-5)
5

We've applied the abs() function to an argument value of -5. The argument value was mapped to a value of 5.

Terminology

Consider the following function:

f(a, b) = (q, r)

In this definition, the argument is a pair of values, (a,b). This is called the domain. We can summarize it as the domain of values for which the function is defined. Outside this domain, the function is not defined. In Python, we get a TypeError exception if we provide one value or three values as the argument.

The function maps the domain pair to a pair of values, (q,r). This is the range of the function. We can call this the range of values that could be returned by the function.
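Python's built-in divmod() is exactly this kind of function: its domain is pairs of numbers, and its range is pairs of (quotient, remainder) values. Calling it outside its domain raises the promised TypeError:

>>> divmod(7, 3)
(2, 1)
>>> divmod(7)
Traceback (most recent call last):
  ...
TypeError: divmod expected 2 arguments, got 1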

Mathematical function features

As we look at the abstract mathematical definition of functions, we note that functions are generally assumed to have no hysteresis; they have no history or memory of prior use. This is sometimes (loosely) called idempotence: the results are always the same for a given argument value.

We see this in Python as a common feature. But it's not universally true. We'll look at a number of exceptions to the rule of idempotence. Here's an example of the usual situation:

>>> int("10f1", 16)
4337

The value returned from the evaluation of int("10f1", 16) never changes.

There are, however, some common examples of non-idempotent functions in Python.

Examples of hysteresis

Here are two common situations where a function has hysteresis. In some cases, results vary based on history. In other cases, results vary based on events in some external environment, as follows:

  • Random number generators. We don't want them to produce the same value over and over again. The Python random.randrange() function is obviously not idempotent.
  • OS functions. These depend on the state of the machine as a whole. The os.listdir() function returns values that depend on the use of functions such as os.unlink(), os.rename(), and open() (among several others). While the rules are generally simple, they involve stateful objects outside the narrow world of the code itself.
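A quick illustration of the first case. The values shown are merely illustrative; by design, they vary from run to run:

>>> import random
>>> random.randrange(10), random.randrange(10)
(3, 7)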

These are examples of Python functions that don't completely fit the formal mathematical definition; they lack idempotence, and their values depend on history, other functions, or both.

Function Definitions

Python has two statements that are essential features of function definition. The def statement specifies the domain and the return statement(s) specify the range. A simplified gloss of the syntax is as follows:

def name(params):
   body
   return expression

In effect, the function's domain is defined by the parameters provided in the def statement. This list of parameter names is not all the information on the domain, however. Even if we use one of the Python extensions to add type annotations, that's still not all the information.

There may be if statements in the body of the function that impose additional explicit restrictions. There may be other functions that impose their own implicit restrictions. If, for example, the body included math.sqrt(), there would be an implicit restriction that some values must be non-negative.

The return statements provide the function's range. An empty return statement (or no return statement at all) means the function returns None. When there are multiple return statements, the range is the union of the ranges of all the return statements.

This mapping between Python syntax and mathematical concepts isn't very complete. We need more information about a function.

Example definition

Here's an example of function definition:

def odd(n):
   """odd(n) -> boolean, true if n is odd."""
   return n % 2 == 1

What does this definition tell us? Several things:

  • Domain: We know that this function accepts n, a single object.
  • Range: Boolean value, True if n is an odd number. This is the most likely interpretation. It's also remotely possible that the class of n has repurposed __mod__() or __rmod__() methods, in which case the semantics can be pretty obscure.

Because of the inherent ambiguity in Python, this function provides a triple-quoted """docstring""" with a summary of the function. This is a best practice, and should be followed universally, except in articles like this one, where it gets too long-winded to include a docstring everywhere.

In this case, the docstring doesn't state unambiguously that n is intended to be a number. There are two ways to handle this gap, as follows:

  • Actually include words like "n is a number" in the docstring
  • Include doctest test cases in the docstring that show the required behavior

Either is acceptable. Both are preferable.

Using a function

To complete this example, here's how we'd use this odd little function named odd():

>>> odd(3)
True
>>> odd(4)
False

This kind of example text can be included in the docstring to create two test cases that offer insight into what the function really means.
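Here's what that might look like. The REPL examples pasted into the docstring become doctest cases, which the standard doctest module can check:

def odd(n):
   """odd(n) -> boolean, true if n is an odd number.

   >>> odd(3)
   True
   >>> odd(4)
   False
   """
   return n % 2 == 1

if __name__ == "__main__":
   import doctest
   doctest.testmod()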

The lack of declarations

More verbose type declarations—as used in many popular programming languages—aren't actually enough information to fully specify a function's domain and range. To be rigorously complete, we need type definitions that include optional predicates. Take a look at the following predicate:

isinstance(n,int) and n >= 0

The assert statement is a good place for this kind of additional argument domain checking. This isn't a perfect solution, because assert statements can be disabled very easily. Still, it can help during design and testing, and it can help people read your code.
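Here's a sketch of the idea; int_sqrt() is a hypothetical example of ours, not a function from the text:

import math

def int_sqrt(n):
   """int_sqrt(n) -> int, the integer part of the square root of n."""
   # Make math.sqrt()'s implicit non-negative restriction explicit.
   # Note: assert checks disappear under python -O.
   assert isinstance(n, int) and n >= 0, "n must be a non-negative integer"
   return int(math.sqrt(n))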

The fussy formal declarations of data type used in other languages are not really needed in Python. Python replaces an up-front claim about required types with a runtime search for appropriate class methods.

This works because each Python object has all the type information bound into it. Static compile-time type information is redundant, since the runtime type information is complete.

A Python function definition is pretty spare. It includes the minimal amount of information about the function. There is no formal declaration of parameter types or return types.

This odd little function will work with any object that implements the % operator:

  • Generally, this means any object that implements __mod__() or __rmod__().
  • This means most subclasses of numbers.Number.
  • It also means instances of any class that happens to provide these methods. That could become very weird, but it's still possible. We hesitate to think about non-numeric objects that work with the number-like % operator, but one is sketched below.
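Here's such a sketch. Odometer is a hypothetical class of ours that happens to implement __mod__(), so odd() accepts it:

class Odometer:
   """A non-numeric wrapper that delegates % to its mileage reading."""
   def __init__(self, miles):
       self.miles = miles
   def __mod__(self, other):
       return self.miles % other

>>> odd(Odometer(3))
True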

Some Python features

In Python, the functions we declare are proper first-class objects. This means that they have attributes, can be assigned to variables, and can be placed into collections. Quite a few clever things can be done with function objects.

One of the most elegant things is to use a function as an argument or a return value from another function. The ability to do this means that we can easily create and use higher-order functions in Python.

For folks who know languages such as C (and C++): functions there aren't proper first-class objects. A pointer to a function, however, is a first-class object in C. The function itself is a block of code that can't easily be manipulated.

We'll look at a number of simple ways in which we can write—and use—higher-order functions in Python.

Functions are objects

Consider the following command example:

>>> not_even = odd
>>> not_even(3)
True

We've assigned the odd little function object to a new variable, not_even.

This creates an alias for a function. While this isn't always the best idea, there are times when we might want to provide an alternate name for a function as part of maintaining reverse compatibility with a previous release of a library.

Using functions

Consider the following function definition:

def some_test(function, value):
   print(function, value)
   return function(value)

This function's domain includes arguments named function and value. We can see that it prints the arguments, then applies the function argument to the given value.

When we use the preceding function, it looks like this:

>>> some_test(odd, 3)
<function odd at 0x613978> 3
True

The some_test() function accepted a function as an argument. When we printed the function, we got a summary, <function odd at 0x613978>, that shows us some information about the object. We also show a summary of the argument value, 3.

When we applied the function to a value, we got the expected result. We can—of course—extend this concept. In particular, we can apply a single function to many values.

Higher-order Functions

Higher-order functions become particularly useful when we apply them to collections of objects. The built-in map() function applies a simple function to each value in an argument sequence. Here's an example:

>>> list(map(odd, [1,2,3,4]))
[True, False, True, False]

We've used the map() function to apply the odd() function to each value in the sequence. This is a lot like evaluating:

>>> [odd(x) for x in [1,2,3,4]]
[True, False, True, False]

We've created a list comprehension instead of applying the higher-order map() function. This is equivalent to the following snippet:

[odd(1), odd(2), odd(3), odd(4)]

Here, we've manually applied the odd() function to each value in a sequence. And yes, that's a diesel engine alternator and some hoses in the photo below:

[Photo: a diesel engine alternator and hoses]

We'll use this alternator as a subject for some concrete examples of higher-order functions.

Diesel engine background

First, some basic diesel engine mechanics:

  • The engine turns the alternator.
  • The alternator generates pulses that drive the tachometer (amongst other things, like charging the batteries).

The alternator provides an indirect measurement of engine RPMs. Direct measurement would involve connecting to a small geared shaft. It's difficult and expensive. We already have a tachometer; it's just incorrect.

The new alternator has new wheels. The ratios between engine and alternator have changed.

We're not interested in installing a new tachometer. Instead, we'll create a conversion from a number on the tachometer, which is calibrated to the old alternator, to a proper number of engine RPMs. This has to allow for the change in wheel ratio between the original alternator and the new one.

Let's collect some data and see what we can figure out about engine RPMs.

New alternator

First approximation: all we did was get new wheels. We can presume that the old tachometer was correct. Since the new wheel is smaller, we'll have higher alternator RPMs. That means higher readings on the old tachometer.

Here's the key question: How far wrong are the RPMs?

The old wheel measures approximately 3.5 and the new wheel approximately 2.5; only the ratio between them matters here.

We can compute the potential ratio between what the tach says and what the engine is really doing:

>>> 3.5/2.5
1.4
>>> 1/_
0.7142857142857143

That's nice. Is it right?

Can we really just multiply displayed RPMs by 0.7 to get actual engine RPMs?

Let's create the conversion card first, then collect some more data.

Use case

Given an RPM reading on the tachometer, what's the real RPM of the engine? The following function computes it:

def eng(r):
   return r/1.4

Use it like the following:

>>> eng(2100)
1500.0

This seems useful. Tach says 2100, engine (theoretically) spinning at 1500, more or less.

Let's confirm our hypothesis with some real data.

Data collection

Over a period of time, we recorded tachometer readings and actual RPMs using a visual RPM measuring device. The visual device requires a strip of reflective tape on one of the engine wheels. It uses a laser and counts returns per minute. Simple. Elegant. Accurate.

It's really inconvenient. But it got some data we could digest.

Skipping some boring statistics, we wind up with the following function that maps displayed RPMs to actual RPMs:

def eng2(r):
   return 0.7724*r**1.0134

Here's a sample result:

>>> eng2(2100)
1797.1291903589386

When tach says 2100, the engine is measured as spinning at about 1800 RPM.

That's not quite the same as the theoretical model. But it's so close that it gives us a lot of confidence in this version.
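We skipped the statistics above. For the curious, here's one plausible way such a power-law fit might be produced; this is a sketch assuming ordinary least squares on the log-log data, not necessarily the method actually used:

import math

def power_fit(pairs):
   """Fit actual = a * displayed**b to (displayed, actual) pairs."""
   xs = [math.log(d) for d, a in pairs]
   ys = [math.log(a) for d, a in pairs]
   mean_x = sum(xs)/len(xs)
   mean_y = sum(ys)/len(ys)
   # Least-squares slope and intercept in log-log space.
   b = (sum((x-mean_x)*(y-mean_y) for x, y in zip(xs, ys))
        / sum((x-mean_x)**2 for x in xs))
   a = math.exp(mean_y - b*mean_x)
   return a, b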

Of course, the number displayed is hideous. All that floating-point cruft is crazy. What can we do?

Rounding is only part of the solution. We need to think through the use case. After all, we use this standing at the helm of the boat; how much detail is appropriate?

Limits and ranges

The engine has governors and only runs between 800 and 2500 RPM. There's a very tight limit here.

Realistically, we're talking about this small range of values:

>>> list(range(800, 2500, 200))
[800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400]

There's no sensible reason for providing any more detailed engine RPMs. It's a sailboat; top speed is 7.5 knots (nautical miles per hour). Wind and current have far more impact on the boat speed than the difference between 1600 and 1700 RPMs.

The tach can't be read to closer than 100-200 RPM. It's not digital, it's a red pointer near little tick lines. There's no reason to preserve more than a few bits of precision.

Example of Tach translation

Given the engine RPMs and the conversion function, we can deduce that the tachometer display will be between 1000 and 3200. This will map to engine RPMs in the range of about 800 to 2500.

We can confirm this with a mapping like this:

>>> list(map(eng2, range(1000,3200,200)))
[847.3098694826986, 1019.258964596305, 1191.5942982618956,
1364.2609728487703, 1537.2178605443924, 1710.4329833319157,
1883.8807562755746, 2057.5402392829747, 2231.3939741669838,
2405.4271806626366, 2579.627182659544]

We've applied the eng2() mapping from tach to engine RPM. For tach readings between 1000 and 3200 in steps of 200, we've computed the actual engine RPMs.

For those who use spreadsheets a lot, the range() function is like filling a column with values. The map(eng2, …) function is like filling an adjacent column with a calculation. We've created the result of applying a function to each value of a given range.

As shown, this is a little difficult to use. We need to do a little more cleanup. What other function do we need to apply to the results?

Round to 100

Here's a function that will round to the nearest 100:

def next100(n):
   return int(round(n, -2))

We could call this a kind of composite function, built from int() and a partial application of round().
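We can make that partial application literal with the standard functools module, assuming a Python 3 whose round() accepts ndigits as a keyword argument:

>>> from functools import partial
>>> round100 = partial(round, ndigits=-2)
>>> int(round100(1797.1291903589386))
1800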

If we map this function to the previous results, we get something a little easier to work with. How does this look?

>>> tach= range(1000,3200,200)
>>> list(map(next100, map(eng2, tach)))
[800, 1000, 1200, 1400, 1500, 1700, 1900, 2100, 2200, 2400, 2600]

This expression is a bit complex; let's break it down into three discrete steps:

  • First, map the eng2() function to tach values between 1000 and 3200. The result is effectively a sequence of values (it's not actually a list; it's an iterator, a potential list)
  • Second, map the next100() function to the results of the previous mapping
  • Finally, collect a single list object from the results

We've applied two functions, eng2() and next100(), to a list of values. In principle, we've created a kind of composite function, next100(eng2(rpm)). Python doesn't support function composition directly, hence the complex-looking map-of-map syntax.
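If we wanted the composite as a single callable, a tiny helper does it. The compose() function here is our own sketch, not part of the standard library:

def compose(f, g):
   """Return a new function that computes f(g(x))."""
   return lambda x: f(g(x))

>>> tach_to_engine = compose(next100, eng2)
>>> tach_to_engine(2100)
1800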

Interleave sequences of values

The final step is to create a table that shows both the tachometer reading and the computed engine RPMs. We need to interleave the input and output values into a single list of pairs.

Here are the tach readings we're working with, as a list:

>>> tach= range(1000,3200,200)

Here are the engine RPMs:

>>> engine= list(map(next100,map(eng2,tach)))

Here's how we can interleave the two to create something that shows our tachometer reading and engine RPMs:

>>> list(zip(tach, engine))
[(1000, 800), (1200, 1000), (1400, 1200), (1600, 1400), (1800, 1500),
(2000, 1700),(2200, 1900), (2400, 2100), (2600, 2200), (2800, 2400),
(3000, 2600)]

The rest is pretty-printing.
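For completeness, here's one way that pretty-printing might go, using str.format() to align the columns into a conversion card:

>>> for t, e in zip(tach, engine):
...     print("{0:>5} {1:>5}".format(t, e))
 1000   800
 1200  1000
 1400  1200
 1600  1400
 1800  1500
 2000  1700
 2200  1900
 2400  2100
 2600  2200
 2800  2400
 3000  2600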

What's important is that we could take functions like eng() or eng2() and apply them to columns of numbers, creating columns of results. The map() function means that we don't have to write explicit for loops simply to apply a function to a sequence of values.

Map is lazy

We have a few other observations about Python's higher-order functions. First, these functions are lazy: they don't compute any results until required by other statements or expressions. Because they don't create intermediate list objects, they can be quite fast.


The laziness feature is true for the built-in higher-order functions map() and filter(). It's also true for many of the functions in the itertools library. Many of these functions don't simply create a list object, they yield values as requested.

For debugging purposes, we use list() to see what's being produced. If we don't apply list() to the result of a lazy function, we simply see that it's a lazy function. Here's an example:

>>> map(lambda x:x*1.4, range(1000,3200,200))
<map object at 0x102130610>

We don't see a proper result here, because the lazy map() function didn't do anything. The list(), tuple(), or set() functions will force a lazy map() function to actually get up off the couch and compute something.

Function Wrappers

There are a number of Python functions which are syntactic sugar for method functions. One example is the len() function. This function behaves as if it had the following definition:

def len(obj):
   return obj.__len__()

The function acts like it's simply invoking the object's built-in __len__() method.

There are several Python functions that exist only to make the syntax a little more readable. Post-fix syntax purists would prefer to see syntax such as some_list.len(). Those who like their code to look a little more mathematical prefer len(some_list).

Some people will go so far as to claim that the presence of prefix functions means that Python isn't strictly object-oriented. This is false; Python is very strictly object-oriented. It doesn't—however—use only postfix method notation.

We can write function wrappers to make some method functions a little more palatable.

Another good example is the divmod() function. This relies on two method functions:

  • a.__divmod__(b)
  • b.__rdivmod__(a)

The usual operator rules apply here. If the class for object a implements __divmod__(), then that's used to compute the result. If not, then the same test is made for the class of object b; if there's an implementation, that will be used to compute the results. Otherwise, it's undefined and we'll get an exception.
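Here's a hypothetical class of ours that hooks into this protocol by providing __divmod__():

class Hours:
   """Wrap a count of hours; divmod() splits it into (days, hours)."""
   def __init__(self, hours):
       self.hours = hours
   def __divmod__(self, other):
       return divmod(self.hours, other)

>>> divmod(Hours(27), 24)
(1, 3)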

Why wrap a method?

Function wrappers for methods are syntactic sugar. They exist to make object methods look like simple functions. In some cases, the functional view is more succinct and expressive.

Sometimes the object involved is obvious. For example, the os module functions provide access to OS-level libraries. The OS object is concealed inside the module.

Sometimes the object is implied. For example, the random module makes a Random instance for us. We can simply call random.randint() without worrying about the object that was required for this to work properly.

Lambdas

A lambda is an anonymous function with a degenerate body. It's like a function in some respects and it's unlike a function because of the following two things:

  • A lambda has no name
  • A lambda has no statements

A lambda's body is a single expression, nothing more. The lambda can have parameters, however, which is why a lambda is a handy form of a callable. The syntax is essentially as follows:

lambda params : expression

Here's a concrete example:

lambda r: 0.7724*r**1.0134

You may recognize this as the eng2() function defined previously. We don't always need a complete, formal function. Sometimes, we just need an expression that has parameters.

Speaking theoretically, a lambda is a one-argument function. When we have a multi-argument function, we can transform it into a series of one-argument lambda forms; this transformation is called currying, and it can be helpful for optimization in some languages. Little of that applies to Python, but here's a quick illustration, with names of our own choosing:
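>>> curried_add = lambda a: lambda b: a + b
>>> add_five = curried_add(5)
>>> add_five(3)
8

We'll move on.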

Using a Lambda with map

Here are two equivalent results:

  • map(eng2, tach)
  • map(lambda r: 0.7724*r**1.0134, tach)

Here's a previous example, using the lambda instead of the function:

>>> tach= range(1000,3200,200)
>>> list( map(lambda r: 0.7724*r**1.0134, tach))
[847.3098694826986, 1019.258964596305, 1191.5942982618956,
1364.2609728487703, 1537.2178605443924, 1710.4329833319157,
1883.8807562755746, 2057.5402392829747, 2231.3939741669838,
2405.4271806626366, 2579.627182659544]

You could scroll back to see that the results are the same. If we're doing a small thing once only, a lambda object might be more clear than a complete function definition.

Emphasis here is on small and once only. If we start trying to reuse a lambda object, or feel the need to assign a lambda object to a variable, we should really consider a function definition and the associated docstring and doctest features.

Another use of Lambdas

A common use of lambdas is as the key function for three other higher-order functions: sort(), min(), and max(). We might use these with a list object:

  • some_list.sort(key=lambda x: expr)
  • min(some_list, key=lambda x: expr)
  • max(some_list, key=lambda x: expr)

In each case, we're using a lambda object to embed an expression into the argument values for a function. In some cases, the expression might be very sophisticated; in other cases, it might be something as trivial as lambda x: x[1].

When the expression is trivial, a lambda object is a good idea. If the expression is going to get reused, however, a lambda object might be a bad idea.
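For example, sorting our (tach, engine) pairs by the computed engine RPMs uses exactly that kind of trivial lambda:

>>> pairs = [(1000, 800), (1200, 1000), (1400, 1200)]
>>> sorted(pairs, key=lambda x: x[1], reverse=True)
[(1400, 1200), (1200, 1000), (1000, 800)]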

You can do this… But…

The following kind of statement makes sense:

some_name = lambda x: 3*x+1

We've created a callable object that takes a single argument value and returns a numeric value. It has much the same effect as the following definition:

def some_name(x): return 3*x+1

There are some differences. Most notably the following:

  • A lambda object is all on one line of code. A possible advantage.
  • There's no docstring. A disadvantage for lambdas of any complexity.
  • Nor is there any doctest in the missing docstring. A significant problem for a lambda object that requires testing. There are ways to test lambdas with doctest outside a docstring, but it seems simpler to switch to a full function definition.
  • We can't easily apply decorators to it; we lose the @decorator syntax.
  • We can't use any Python statements in it. In particular, no try-except block is possible.

For these reasons, we suggest limiting the use of lambdas to truly trivial situations.

Callable Objects

A callable object fits the model of a function. The unifying feature of all the things we've looked at is that they're callable. Functions are the primary example of callables, but objects can also be callable.

Callable objects can be subclasses of collections.abc.Callable. Because of Python's flexibility, this isn't a requirement; it's merely a good idea. To be callable, a class only needs to provide a __call__() method.

Here's a complete callable class definition:

from collections.abc import Callable
class Engine(Callable):
   def __call__(self, tach):
       return 0.7724*tach**1.0134

We've imported the collections.abc.Callable class. This will provide some assurance that any class that extends this abstract superclass will provide a definition for the __call__() method. This is a handy error-checking feature.

Our class extends Callable by providing the needed __call__() method. In this case, the __call__() method performs a calculation on the single parameter value, returning a single result.

Here's a callable object built from this class:

eng= Engine()

This creates a callable object that we can use like a function. We can evaluate eng(1000) to get the engine RPMs when the tach reads 1000.
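For example, matching the first value in our earlier mapping:

>>> eng = Engine()
>>> eng(1000)
847.3098694826986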

Callable objects step-by-step

There are two parts to making a function a callable object. We'll emphasize these for folks who are new to object-oriented programming:

  1. Define a class. Generally, we make this a subclass of collections.abc.Callable. Technically, we only need to implement a __call__() method. It helps to use the proper superclass because it might help catch a few common mistakes.
  2. Create an instance of the class. This instance will be a callable object. The object that's created will be very similar to a defined function. And very similar to a lambda object that's been assigned to a variable.

While such an object will be similar to a function built with a def statement, it has one important additional feature: hysteresis. This can be the source of endless bugs. It can also be a way to improve performance.

Callables can have hysteresis

Here's an example of a callable object that uses hysteresis as a kind of optimization:

class Factorial(Callable):
   def __init__(self):
       self.previous = {}
   def __call__(self, n):
       if n not in self.previous:
           self.previous[n]= self.compute(n)
       return self.previous[n]
   def compute(self, n):
       if n == 0: return 1
       return n*self.__call__(n-1)

Here's how we can use this:

>>> fact= Factorial()
>>> fact(5)
120

We create an instance of the class, and then call the instance to compute a value for us.

The initializer

The initialization method looks like this:

   def __init__(self):
       self.previous = {}

This function creates a cache of previously computed values. This is a technique called memoization. If we've already computed a result once, it's in the self.previous cache; we don't need to compute it again, we already know the answer.

The Callable interface

The required __call__() method looks like this:

   def __call__(self, n):
       if n not in self.previous:
           self.previous[n]= self.compute(n)
       return self.previous[n]

We've checked the memoization cache first. If the value is not there, we're forced to compute the answer, and insert it into the cache. The final answer is always a value in the cache.

A common what-if question: what if we have a function of multiple arguments? There are two minuscule changes to support more complex arguments: use def __call__(self, *n): and self.compute(*n). Since we're only computing factorials, there's no need to over-generalize, but a sketch follows.
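A minimal sketch of those two changes; the tuple of arguments becomes the cache key:

   def __call__(self, *n):
       if n not in self.previous:
           self.previous[n] = self.compute(*n)
       return self.previous[n]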

The Compute method

The essential computation has been allocated to a method called compute. It looks like this:

   def compute(self, n):
       if n == 0: return 1
       return n*self.__call__(n-1)

This does the real work of the callable object: it computes n!. In this case, we've used a pretty standard recursive factorial definition. This recursion relies on the __call__() method to check the cache for previous values.

If we don't expect to compute values larger than 1000! (a 2,568-digit number, by the way), the recursion works nicely. If we think we need to compute really large factorials, we'll need a different approach, since deep recursion eventually hits Python's recursion limit.

To compute very large factorials without deep recursion, multiply iteratively:

import functools, operator
functools.reduce(operator.mul, range(1, n+1))

Either way, we can depend on the internal memoization to leverage previous results.

Note the potential issue

Hysteresis—memory of what came before—is available to callable objects. Functions and lambdas are stateless; callable objects can be stateful.

This may be desirable to optimize performance. We can memoize the previous results or we can design an object that's simply confusing. Consider a function like divmod() that returns two values. We could try to define a callable object that first returns the quotient and on the second call with the same arguments returns the remainder:

>>> crazy_divmod(355,113)
3
>>> crazy_divmod(355,113)
16

This is technically possible. But it's crazy. Warning: Stay away.

We generally expect idempotence: functions do the same thing each time. Implementing memoization didn't alter the basic idempotence of our factorial function.

Generator Functions

Here's a fun example for generators: the Collatz function. It computes the next value in a sequence using a simple pair of rules. We could call the rule Half-Or-Three-Plus-One (HOTPO). We'll call the function collatz():

def collatz(n):
   if n % 2 == 0:
       return n//2
   else:
       return 3*n+1

Each integer argument maps to a next integer. These can form a chain. For example, if we start with collatz(13), we get 40. The value of collatz(40) is 20. Here's the sequence of values:

13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
At 1, it loops: 1 → 4 → 2 → 1 …

Interestingly, all chains seem to lead—eventually—to 1. To explore this, we need a simple function that will build a chain from a given starting value.

Successive values

Here's a function that will build a list object. It iterates through values in the sequence until it reaches 1, at which point it terminates:

def col_list(n):
   seq= [n]
   while n != 1:
       n= collatz(n)
       seq.append(n)
   return seq

This is not wrong. But it's not really the most useful implementation. This always creates a sequence object. In many cases, we don't really want an object, we only want information about the sequence. We might only want the length, or the largest numbers, or the sum.

This is where a generator function might be more useful. A generator function yields elements from the sequence instead of building the sequence as a single object.

Generator functions

To create a generator function, we write a function that has a loop; inside the loop, there's a yield statement.

A function with a yield statement returns an iterable object; it can be used in a for statement to produce values. It doesn't create a big list object; it creates the items, which can be accumulated into a list or tuple object.

A generator function is lazy: it doesn't compute anything unless forced to by another function needing results. We can iterate through as many (or as few) of the results as we need.

For example, list(some_generator()) forces all values to be returned. For another example of lazy behavior, look at how range() objects work. If we evaluate range(10), we only get a lazy range object. If we evaluate list(range(10)), we get a list object.

The Collatz generator

Here's a generator function that will produce sequences of values using the collatz() function shown previously:

def col_iter(n):
   yield n
   while n != 1:
       n= collatz(n)
       yield n

When we use this in a for loop or with the list() function, it will first yield the argument number. While the number is not 1, it will apply the collatz() function and yield successive values in the chain. When it has yielded 1, it will terminate.

One common pattern for generator functions is to replace all list-accumulation statements with yield statements. Instead of building a list one item at a time, we yield each item.

The col_iter() function is lazy. We don't get an answer unless we use list() or tuple() or some variation of a for statement context.

Using a generator function

Here's how this function looks in practice:

>>> for i in col_iter(3):
...    print(i)
3
10
5
16
8
4
2
1

We've used the generator function in a for loop so that it will yield all of the values until it terminates.

Collatz function sequences

Now we can do some exploration of this Collatz sequence. Here are a few evaluations of the col_iter() function:

>>> list(col_iter(3))
[3, 10, 5, 16, 8, 4, 2, 1]
>>> list(col_iter(5))
[5, 16, 8, 4, 2, 1]
>>> list(col_iter(6))
[6, 3, 10, 5, 16, 8, 4, 2, 1]
>>> list(col_iter(13))
[13, 40, 20, 10, 5, 16, 8, 4, 2, 1]

There's an interesting pattern here. It seems that once we reach 16, we know the rest. Generalizing: from any number we've already seen, we know the rest of the chain.

Wait. This means that memoization might be a big help in exploring the values created by this sequence.
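Here's a sketch of that idea, reusing the callable-object pattern from earlier. The ChainLength name is our own:

from collections.abc import Callable

class ChainLength(Callable):
   """Memoized length of the Collatz chain from n down to 1."""
   def __init__(self):
       self.previous = {1: 1}
   def __call__(self, n):
       if n not in self.previous:
           # One step, plus the (possibly cached) length of the rest.
           self.previous[n] = 1 + self.__call__(collatz(n))
       return self.previous[n]

>>> chain_len = ChainLength()
>>> chain_len(9)
20

Every chain we compute seeds the cache for future chains.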

When we start combining function design patterns like this, we're doing functional programming. We're stepping outside the box of purely object-oriented Python.

Alternate styles

Here is an alternative version of the collatz() function:

def collatz2(n):
   return n//2 if n%2 == 0 else 3*n+1

This simply collapses the if statement into a single if expression, and may not help much. We also have this:

collatz3= lambda n: n//2 if n%2 == 0 else 3*n+1

We've collapsed the function into a lambda object. Helpful? Perhaps not. On the other hand, the function doesn't really need all of the overhead of a full function definition and multiple statements. The lambda object seems to capture everything relevant.

Functions as objects

Here's a higher-order function that will produce values until some ending condition is met. We can plug one of the versions of the collatz() function and a termination test into this general-purpose function:

def recurse_until(ending, the_function, n):
   yield n
   while not ending(n):
       n= the_function(n)
       yield n

This requires two plug-in functions, as follows:

  • ending() is a function that tests whether we're done, for example, lambda n: n==1
  • the_function() is a form of the Collatz function

We've completely uncoupled the general idea of recursively applying a function from a specific function and a specific terminating condition.

Using the recurse_until() function

We can apply this higher-order recurse_until() function like this:

>>> recurse_until(lambda n: n==1, collatz2, 9)
<generator object recurse_until at 0x1021278c0>

What's that?

That's how a lazy generator looks: it didn't return any values because we didn't demand any values. We need to use it in a loop or some kind of expression that iterates through all available values. The list() function, for example, will collect all of the values.

Getting the list of values

Here's how we make the lazy generator do some work:

>>> list(_)
[9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4,
2, 1]

The _ variable is the previously computed value. It relieves us from the burden of having to write an assignment statement. We can write an expression, see the results, and know the results were automatically saved in the _ variable.

Project Euler #14

Which starting number, under one million, produces the longest chain?

Try it without memoization, on just the first ten starting values. It's really simple:

>>> collatz_len = [len(list(recurse_until(lambda n: n==1, collatz2, i)))
...     for i in range(1,11)]
>>> results = zip(collatz_len, range(1,11))
>>> max(results)
(20, 9)
>>> list(col_iter(9))
[9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4,
2, 1]

We defined collatz_len as a list comprehension. The expression evaluates len(something) for i in range(1,11); this means we'll be collecting ten values into the list, each of which is the length of something. The something is a list object built from the recurse_until(lambda n: n==1, collatz2, i) generator that we discussed. This computes the sequence of Collatz values starting from i and proceeding until the value in the sequence is 1.

We've zipped the lengths with the original values of i. This will create pairs of lengths and starting numbers. The maximum length will now have a starting value associated with it so that we can confirm that the results match our expectations.

And yes, this Project Euler problem could—in principle—be solved in a single line of code.

Will this scale to 100? 1,000? 1,000,000? How much will memoization help?

Summary

In this article, we've looked at five (or six) kinds of Python callables. They all fit the y = f(x) model of a function to varying degrees. When is it appropriate to use each of these different ways to express the same essential concept?

Functions are created with def and return. It shouldn't come as a surprise that this covers most cases. It allows a docstring and doctest test cases. We could call these def functions, since they're built with the def statement.

Higher-order functions—map(), filter(), and the itertools library—are generally written as plain-old def functions. They're higher-order because they accept functions as arguments or return functions as results. Otherwise, they're just functions.

Function wrappers—len(), divmod(), pow(), str(), and repr()—are function syntax wrapped around object methods. These are def'd functions with very tiny bodies. We use them because a.pow(2) doesn't seem as clear as pow(a, 2).

Lambdas are appropriate for one-time use of something so simple that it doesn't deserve being wrapped in a def statement body. In some cases, we have a small nugget of code that seems more clear when written as a lambda expression rather than a more complete function definition. Simple filter rules, and simple computations are often more clearly shown as a lambda object.

Callable objects have a special property that plain functions lack: hysteresis. They can retain the results of previous calculations. We've used this hysteresis property to implement memoization, which can be a huge performance boost. Callable objects can also be used badly, to create objects with simply bizarre behavior. Most functions should strive for idempotence: the same arguments should yield the same results.

Generator functions are created with a def statement and at least one yield statement. These functions are iterable. They can be used in a for statement to examine each resulting value. They can also be used with functions like list(), tuple(), and set() to create an actual object from the iterable sequence of values. We might combine them with higher-order functions to do complex processing, one item at a time.

It's important to work with each of these kinds of callables. If you only have one tool—a hammer—then every problem has to be reshaped into a nail before you can solve it. Once you have multiple tools available, you can pick the tool that provides the most succinct and expressive solution to the problem.
