This article is written by Steven Lott, author of the book Functional Python Programming.
You can find more about him at http://slott-softwarearchitect.blogspot.com.
We're going to look at various ways that Python 3 lets us define things that behave like functions. The proper term here is callable: we're looking at objects that can be called like a function.
We'll look at the following Python constructs:

- Functions defined with the def statement
- Higher-order functions that accept or return other functions
- Function wrappers around object methods
- Lambda forms
- Callable objects
- Generator functions built with yield

And yes, we're aware that the list above has six items on it. That's because higher-order functions in Python aren't really all that complex or different. In some languages, functions that take functions as arguments involve special syntax. In Python, it's simple and common and barely worth mentioning as a separate topic.
We'll look at when it's appropriate and inappropriate to use one or the other of these various functional forms.
Let's take a quick peek at a basic bit of mathematical formalism. We'll look at a function as an abstraction. We often annotate it like this:

y = f(x)

This shows us that f() is a function. It has one argument, x, and maps it to a single value, y. Some mathematical functions are written as a prefix, for example, y = sin x. Some are written in other places around the argument, for example, y = |x|.
In Python, the syntax is more consistent, for example, we use a function like this:
>>> abs(-5)
5
We've applied the abs() function to an argument value of -5. The argument value was mapped to a value of 5.
Consider a function that maps a pair of values to another pair of values:

f(a, b) = (q, r)

In this definition, the argument is a pair of values, (a, b). This pair is called the domain: the set of values for which the function is defined. Outside this domain, the function is not defined. In Python, we get a TypeError exception if we provide one value or three values as the argument.

The function maps the domain pair to a pair of values, (q, r). This is the range of the function: the set of values that could be returned by the function.
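The built-in divmod() function is a concrete example with exactly this shape; we can also see what happens outside the domain (the exact TypeError wording varies by Python version):

>>> divmod(17, 5)     # the pair (17, 5) maps to the pair (3, 2)
(3, 2)
>>> divmod(17)        # outside the domain: one value instead of a pair
Traceback (most recent call last):
  ...
TypeError: divmod expected 2 arguments, got 1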
As we look at the abstract mathematical definition of functions, we note that functions are generally assumed to have no hysteresis; they have no history or memory of prior use. Such functions are called pure (or referentially transparent): the results are always the same for a given argument value.

We see this in Python as a common feature. But it's not universally true. We'll look at a number of exceptions to this rule of purity. Here's an example of the usual situation:
>>> int("10f1", 16)
4337
The value returned from the evaluation of int("10f1", 16) never changes.
There are, however, some common examples of impure functions in Python.

Here are three common situations where a function has hysteresis. In some cases, results vary based on history. In other cases, results vary based on events in some external environment:

- random.random() and its relatives, whose results depend on the hidden state of the random number generator
- time.time() and datetime.datetime.now(), whose results depend on an external clock
- input(), and functions that read files or network sockets, whose results depend on the outside world

These are examples of Python functions that don't completely fit the formal mathematical definition; they aren't pure, and their values depend on history, the external environment, or both.
Python has two statements that are essential features of function definition. The def statement specifies the domain and the return statement(s) specify the range. A simplified gloss of the syntax is as follows:
def name(params):
    body
    return expression
In effect, the function's domain is defined by the parameters provided in the def statement. This list of parameter names is not all the information on the domain, however. Even if we use one of the Python extensions to add type annotations, that's still not all the information.
There may be if statements in the body of the function that impose additional explicit restrictions. There may be other functions that impose their own kind of implicit restrictions. If, for example, the body included math.sqrt() then there would be an implicit restriction on some values being non-negative.
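For example, here's a hypothetical function—root_diff() is our name, purely for illustration—whose implicit domain restriction comes entirely from math.sqrt():

>>> import math
>>> def root_diff(a, b):
...     return math.sqrt(a - b)    # implicitly requires a >= b
...
>>> root_diff(7, 3)
2.0
>>> root_diff(3, 7)
Traceback (most recent call last):
  ...
ValueError: math domain error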
The return statements provide the function's range. An empty return statement means the range is just the None value. When there are multiple return statements, the range is the union of the ranges of all the return statements.
This mapping between Python syntax and mathematical concepts isn't very complete. We need more information about a function.
Here's an example of function definition:
def odd(n):
    """odd(n) -> boolean, true if n is odd."""
    return n % 2 == 1
What does this definition tell us? Several things, such as:

- The function's name is odd
- It accepts a single parameter, n
- It returns the Boolean value of the expression n % 2 == 1: True when n is odd, False otherwise
Because of the inherent ambiguity in Python, this function provides a triple-quoted """Docstring""" with a summary of the function. This is a best practice, and should be followed universally, except in articles like this, where a docstring everywhere gets too long-winded.
In this case, the docstring doesn't state unambiguously that n is intended to be a number. There are two ways to handle this gap:

- Rewrite the docstring to state explicitly that n should be an integer
- Add doctest examples to the docstring showing how the function is meant to be used

Either is acceptable. Both are preferable.
To complete this example, here's how we'd use this odd little function named odd():
>>> odd(3)
True
>>> odd(4)
False
This kind of example text can be included in the docstring to create two test cases that offer insight into what the function really means.
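Here's a sketch of odd() with the examples folded into the docstring; running the module through the standard doctest module checks both cases:

def odd(n):
    """odd(n) -> boolean, true if n is odd.

    >>> odd(3)
    True
    >>> odd(4)
    False
    """
    return n % 2 == 1

if __name__ == "__main__":
    import doctest
    doctest.testmod()    # silently passes when both examples hold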
More verbose type declarations—as used in many popular programming languages—aren't actually enough information to fully specify a function's domain and range. To be rigorously complete, we need type definitions that include optional predicates. Take a look at the following predicate:

isinstance(n, int) and n >= 0
The assert statement is a good place for this kind of additional argument domain checking. This isn't the perfect solution because assert statements can be disabled very easily. It can help during design and testing and it can help people to read your code.
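Here's a sketch of odd() with that predicate as an assert statement; the exact predicate is a design choice, not a requirement:

def odd(n):
    """odd(n) -> boolean, true if n is odd."""
    assert isinstance(n, int) and n >= 0, "odd() requires a non-negative int"
    return n % 2 == 1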
The fussy formal declarations of data type used in other languages are not really needed in Python. Python replaces an up-front claim about required types with a runtime search for appropriate class methods.
This works because each Python object has all the type information bound into it. Static compile-time type information is redundant, since the runtime type information is complete.
A Python function definition is pretty spare. It includes the minimal amount of information about the function. There are no formal declarations of parameter types or return types.
This odd little function will work with any object that implements the % operator.
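Floats and decimal.Decimal values both implement %, so odd() accepts them too; whether the answers are meaningful for those types is our problem, not Python's:

>>> odd(3.0)               # float implements %
True
>>> from decimal import Decimal
>>> odd(Decimal(7))        # so does Decimal
True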
In Python, functions we declare are proper first-class objects. This means that they have attributes that can be assigned to variables and placed into collections. Quite a few clever things can be done with function objects.
One of the most elegant things is to use a function as an argument or a return value from another function. The ability to do this means that we can easily create and use higher-order functions in Python.
For folks who know languages such as C (and C++), functions aren't proper first-class objects there. A pointer to a function, however, is a first-class object in C. But the function itself is a block of code that can't easily be manipulated.
We'll look at a number of simple ways in which we can write—and use—higher-order functions in Python.
Consider the following command example:
>>> not_even = odd
>>> not_even(3)
True
We've assigned the odd little function object to a new variable, not_even.
This creates an alias for a function. While this isn't always the best idea, there are times when we might want to provide an alternate name for a function as part of maintaining reverse compatibility with a previous release of a library.
Consider the following function definition:
def some_test(function, value):
    print(function, value)
    return function(value)
This function's domain includes arguments named function and value. We can see that it prints the arguments, then applies the function argument to the given value.
When we use the preceding function, it looks like this:
>>> some_test(odd, 3)
<function odd at 0x613978> 3
True
The some_test() function accepted a function as an argument. When we printed the function, we got a summary, <function odd at 0x613978>, that shows us some information about the object. We also show a summary of the argument value, 3.
When we applied the function to a value, we got the expected result. We can—of course—extend this concept. In particular, we can apply a single function to many values.
Higher-order functions become particularly useful when we apply them to collections of objects. The built-in map() function applies a simple function to each value in an argument sequence. Here's an example:
>>> list(map(odd, [1,2,3,4]))
[True, False, True, False]
We've used the map() function to apply the odd() function to each value in the sequence. This is a lot like evaluating:
>>> [odd(x) for x in [1,2,3,4]]
[True, False, True, False]
We've created a list comprehension instead of applying a higher-order map() function. This is equivalent to the following command snippet:
[odd(1), odd(2), odd(3), odd(4)]
Here, we've manually applied the odd() function to each value in a sequence.

[Photo: a diesel engine alternator and some hoses.]

We'll use this alternator as a subject for some concrete examples of higher-order functions.
First, some basic diesel engine mechanics:
The alternator provides an indirect measurement of engine RPMs. Direct measurement would involve connecting to a small geared shaft. It's difficult and expensive. We already have a tachometer; it's just incorrect.
The new alternator has new wheels. The ratios between engine and alternator have changed.
We're not interested in installing a new tachometer. Instead, we'll create a conversion from a number on the tachometer, which is calibrated to the old alternator, to a proper number of engine RPMs. This has to allow for the change in ratio between the original tachometer and the new alternator.
Let's collect some data and see what we can figure out about engine RPMs.
First approximation: all we did was get new wheels. We can presume that the old tachometer was correct. Since the new wheel is smaller, we'll have higher alternator RPMs. That means higher readings on the old tachometer.
Here's the key question: How far wrong are the RPMs?
The old wheel was approximately 3.5 inches in diameter and the new wheel is approximately 2.5 inches.
We can compute the potential ratio between what the tach says and what the engine is really doing:
>>> 3.5/2.5
1.4
>>> 1/_
0.7142857142857143
That's nice. Is it right?
Can we really just multiply the displayed RPMs by 0.7 to get actual engine RPMs?
Let's create the conversion card first, then collect some more data.
Given an RPM reading on the tachometer, what's the real RPM of the engine? Use the following function:
def eng(r):
    return r/1.4
Use it like the following:
>>> eng(2100)
1500.0
This seems useful. Tach says 2100, engine (theoretically) spinning at 1500, more or less.
Let's confirm our hypothesis with some real data.
Over a period of time, we recorded tachometer readings and actual RPMs using a visual RPM measuring device. The visual device requires a strip of reflective tape on one of the engine wheels. It uses a laser and counts returns per minute. Simple. Elegant. Accurate.
It's really inconvenient. But it got some data we could digest.
Skipping some boring statistics, we wind up with the following function that maps displayed RPMs to actual RPMs:
def eng2(r):
    return 0.7724*r**1.0134
Here's a sample result:
>>> eng2(2100)
1797.1291903589386
When tach says 2100, the engine is measured as spinning at about 1800 RPM.
That's not quite the same as the theoretical model. But it's so close that it gives us a lot of confidence in this version.
Of course, the number displayed is hideous. All that floating-point cruft is crazy. What can we do?
Rounding is only part of the solution. We need to think through the use case. After all, we use this standing at the helm of the boat; how much detail is appropriate?
The engine has governors and only runs between 800 and 2500 RPM. There's a very tight limit here.
Realistically, we're talking about this small range of values:
>>> list(range(800, 2500, 200))
[800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400]
There's no sensible reason for providing any more detailed engine RPMs. It's a sailboat; top speed is 7.5 knots (nautical miles per hour). Wind and current have far more impact on the boat speed than the difference between 1600 and 1700 RPMs.
The tach can't be read to closer than 100-200 RPM. It's not digital, it's a red pointer near little tick lines. There's no reason to preserve more than a few bits of precision.
Given the engine's RPM range and the conversion function, we can deduce that the tachometer display will be between 1000 and 3200. This maps to engine RPMs in the range of about 800 to 2500.
We can confirm this with a mapping like this:
>>> list(map(eng2, range(1000,3200,200)))
[847.3098694826986, 1019.258964596305, 1191.5942982618956,
1364.2609728487703, 1537.2178605443924, 1710.4329833319157,
1883.8807562755746, 2057.5402392829747, 2231.3939741669838,
2405.4271806626366, 2579.627182659544]
We've applied the eng2() mapping from tach to engine RPM. For tach readings between 1000 and 3200 in steps of 200, we've computed the actual engine RPMs.
For those who use spreadsheets a lot, the range() function is like filling a column with values. The map(eng2, …) function is like filling an adjacent column with a calculation. We've created the result of applying a function to each value of a given range.
As shown, this is a little difficult to use. We need to do a little more cleanup. What other function do we need to apply to the results?
Here's a function that will round to the nearest 100:
def next100(n):
    return int(round(n, -2))
We could call this a kind of composite function built from a partial application of round() and int() functions.
If we map this function to the previous results, we get something a little easier to work with. How does this look?
>>> tach= range(1000,3200,200)
>>> list(map(next100, map(eng2, tach)))
[800, 1000, 1200, 1400, 1500, 1700, 1900, 2100, 2200, 2400, 2600]
This expression is a bit complex; let's break it down into three discrete steps:

- Create the range object tach with tachometer readings from 1000 to 3000 in steps of 200
- Apply the eng2() function to each reading with map(eng2, tach)
- Apply the next100() function to each converted value with map(next100, …), and collect the results with list()

We've applied two functions, eng2() and next100(), to a list of values. In principle, we've created a kind of composite function, next100(eng2(rpm)). Python doesn't support function composition directly, hence the complex-looking map-of-map syntax.
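If the nested maps feel awkward, we can sketch a tiny composition helper of our own; compose() is our name, not a built-in:

>>> def compose(f, g):
...     """Return a function computing f(g(x))."""
...     return lambda x: f(g(x))
...
>>> tach_to_rpm = compose(next100, eng2)
>>> list(map(tach_to_rpm, range(1000,3200,200)))
[800, 1000, 1200, 1400, 1500, 1700, 1900, 2100, 2200, 2400, 2600]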
The final step is to create a table that shows both the tachometer reading and the computed engine RPMs. We need to interleave the input and output values into a single list of pairs.
Here are the tach readings we're working with, as a list:
>>> tach= range(1000,3200,200)
Here are the engine RPMs:
>>> engine= list(map(next100,map(eng2,tach)))
Here's how we can interleave the two to create something that shows our tachometer reading and engine RPMs:
>>> list(zip(tach, engine))
[(1000, 800), (1200, 1000), (1400, 1200), (1600, 1400), (1800, 1500),
(2000, 1700), (2200, 1900), (2400, 2100), (2600, 2200), (2800, 2400),
(3000, 2600)]
The rest is pretty-printing.
What's important is that we could take functions like eng() or eng2() and apply them to columns of numbers, creating columns of results. The map() function means that we don't have to write explicit for loops simply to apply a function to a sequence of values.

We have a few other observations about the Python higher-order functions. First, these functions are lazy: they don't compute any results until required by other statements or expressions. Because they don't actually create intermediate list objects, they may be quite fast.
The laziness feature is true for the built-in higher-order functions map() and filter(). It's also true for many of the functions in the itertools library. Many of these functions don't simply create a list object, they yield values as requested.
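For example, itertools.count() describes an unbounded sequence of integers; only because map() is lazy can we combine the two and demand just the values we need:

>>> import itertools
>>> doubled = map(lambda n: 2*n, itertools.count(1))   # conceptually infinite
>>> list(itertools.islice(doubled, 5))                 # demand only five values
[2, 4, 6, 8, 10]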
For debugging purposes, we use list() to see what's being produced. If we don't apply list() to the result of a lazy function, we simply see that it's a lazy function. Here's an example:
>>> map(lambda x:x*1.4, range(1000,3200,200))
<map object at 0x102130610>
We don't see a proper result here, because the lazy map() function didn't do anything. The list(), tuple(), or set() functions will force a lazy map() function to actually get up off the couch and compute something.
There are a number of Python functions which are syntactic sugar for method functions. One example is the len() function. This function behaves as if it had the following definition:
def len(obj):
    return obj.__len__()
The function acts like it's simply invoking the object's built-in __len__() method.
There are several Python functions that exist only to make the syntax a little more readable. Post-fix syntax purists would prefer to see syntax such as some_list.len(). Those who like their code to look a little more mathematical prefer len(some_list).
Some people will go so far as to claim that the presence of prefix functions means that Python isn't strictly object-oriented. This is false; Python is very strictly object-oriented. It doesn't—however—use only postfix method notation.
We can write function wrappers to make some method functions a little more palatable.
Another good example is the divmod() function. This relies on two method functions:

- a.__divmod__(b), the primary implementation
- b.__rdivmod__(a), the reflected fallback

The usual operator rules apply here. If the class for object a implements __divmod__(), then that's used to compute the result. If not, then the reflected method is looked up on the class of object b; if there's an implementation, that will be used to compute the result. Otherwise, the operation is undefined and we'll get a TypeError exception.
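Here's a sketch of a class that plugs into this protocol; the Angle class is our own invention, purely for illustration:

>>> class Angle:
...     """Degrees; divmod() yields (whole turns, leftover degrees)."""
...     def __init__(self, degrees):
...         self.degrees = degrees
...     def __divmod__(self, other):
...         return divmod(self.degrees, other)
...
>>> divmod(Angle(750), 360)
(2, 30)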
Function wrappers for methods are syntactic sugar. They exist to make object methods look like simple functions. In some cases, the functional view is more succinct and expressive.
Sometimes the object involved is obvious. For example, the os module functions provide access to OS-level libraries. The OS object is concealed inside the module.
Sometimes the object is implied. For example, the random module makes a Random instance for us. We can simply call random.randint() without worrying about the object that was required for this to work properly.
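When the hidden object does matter—for repeatable test data, say—we can build our own instance explicitly; rng is our name for it:

>>> import random
>>> rng = random.Random(42)    # our own Random instance, seeded
>>> roll = rng.randint(1, 6)   # same interface as random.randint()
>>> 1 <= roll <= 6
True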
A lambda is an anonymous function with a degenerate body. It's like a function in some respects, and it's unlike a function because of the following two things:

- It has no name; a lambda form is anonymous
- Its body is a single expression; there's no room for statements, docstrings, or doctests
A lambda's body is a single expression, nothing more. This expression can have parameters, however, which is why a lambda is a handy form of a callable function. The syntax is essentially as follows:
lambda params : expression
Here's a concrete example:
lambda r: 0.7724*r**1.0134
You may recognize this as the eng2() function defined previously. We don't always need a complete, formal function. Sometimes, we just need an expression that has parameters.
Speaking theoretically, a lambda is a one-argument function. When we have a multi-argument function, we can transform it into a series of one-argument lambda forms, a technique called currying. This transformation can be helpful for optimization in some languages. None of that applies to Python. We'll move on.
Here are two equivalent results:

>>> eng2(2100)
1797.1291903589386
>>> (lambda r: 0.7724*r**1.0134)(2100)
1797.1291903589386
Here's a previous example, using the lambda instead of the function:
>>> tach= range(1000,3200,200)
>>> list( map(lambda r: 0.7724*r**1.0134, tach))
[847.3098694826986, 1019.258964596305, 1191.5942982618956,
1364.2609728487703, 1537.2178605443924, 1710.4329833319157,
1883.8807562755746, 2057.5402392829747, 2231.3939741669838,
2405.4271806626366, 2579.627182659544]
You could scroll back to see that the results are the same. If we're doing a small thing once only, a lambda object might be more clear than a complete function definition.
The emphasis here is on small and once only. If we start trying to reuse a lambda object, or feel the need to assign a lambda object to a variable, we should really consider a function definition and the associated docstring and doctest features.
A common use of lambdas is with three other higher-order functions: sort(), min(), and max(). We might use one of these with a list object like this:

- some_list.sort(key=lambda x: x[1])
- min(some_list, key=lambda x: x[1])
- max(some_list, key=lambda x: x[1])
In each case, we're using a lambda object to embed an expression into the argument values for a function. In some cases, the expression might be very sophisticated; in other cases, it might be something as trivial as lambda x: x[1].
When the expression is trivial, a lambda object is a good idea. If the expression is going to get reused, however, a lambda object might be a bad idea.
The following kind of statement seems to make sense:
some_name = lambda x: 3*x+1
We've created a callable object that takes a single argument value and returns a numeric value. It's essentially equivalent to the following definition:

def some_name(x): return 3*x+1
There are some differences. Most notably the following:

- A def statement gives the function a meaningful __name__ attribute; a lambda's __name__ is the generic '<lambda>', which makes tracebacks harder to read
- A def statement can carry a docstring, including doctest examples; a lambda can't

We can see the first difference directly:
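>>> some_name = lambda x: 3*x+1
>>> some_name.__name__
'<lambda>'
>>> def some_name(x): return 3*x+1
...
>>> some_name.__name__
'some_name'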
For these reasons, we suggest limiting the use of lambdas to truly trivial situations.
A callable object fits the model of a function. The unifying feature of all of the things we've looked at is that they're callable. Functions are the primary example of being callable but objects can also be callable.
Callable objects can be subclasses of collections.abc.Callable. Because of Python's flexibility, this isn't a requirement, it's merely a good idea. To be callable, a class only needs to provide a __call__() method.
Here's a complete callable class definition:
from collections.abc import Callable

class Engine(Callable):
    def __call__(self, tach):
        return 0.7724*tach**1.0134
We've imported the collections.abc.Callable class. This will provide some assurance that any class that extends this abstract superclass will provide a definition for the __call__() method. This is a handy error-checking feature.
Our class extends Callable by providing the needed __call__() method. In this case, the __call__() method performs a calculation on the single parameter value, returning a single result.
Here's a callable object built from this class:
eng= Engine()
This creates a callable object that we can use as if it were a function. We can evaluate eng(1000) to get the engine RPMs when the tach reads 1000.
There are two parts to making a function a callable object. We'll emphasize these for folks who are new to object-oriented programming:

- Define a class that provides a __call__() method; extending collections.abc.Callable makes this requirement explicit
- Create an instance of that class; the instance is the callable object

While the result is similar to a def statement, it has one important additional feature: hysteresis. This can be the source of endless bugs. It can also be a way to improve performance.
Here's an example of a callable object that uses hysteresis as a kind of optimization:
class Factorial(Callable):
    def __init__(self):
        self.previous = {}
    def __call__(self, n):
        if n not in self.previous:
            self.previous[n]= self.compute(n)
        return self.previous[n]
    def compute(self, n):
        if n == 0: return 1
        return n*self.__call__(n-1)
Here's how we can use this:
>>> fact= Factorial()
>>> fact(5)
120
We create an instance of the class, and then call the instance to compute a value for us.
The initialization method looks like this:
def __init__(self):
    self.previous = {}
This function creates a cache of previously computed values. This is a technique called memoization. If we've already computed a result once, it's in the self.previous cache; we don't need to compute it again, we already know the answer.
The required __call__() method looks like this:
def __call__(self, n):
    if n not in self.previous:
        self.previous[n]= self.compute(n)
    return self.previous[n]
We've checked the memoization cache first. If the value is not there, we're forced to compute the answer, and insert it into the cache. The final answer is always a value in the cache.
A common what-if question: what if we have a function of multiple arguments? There are two minuscule changes to support more complex arguments: use def __call__(self, *n): and self.compute(*n). Since we're only computing factorials, there's no need to over-generalize here.
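As a sketch, the generalized method might look like this; the argument tuple n works directly as a dictionary key:

def __call__(self, *n):
    if n not in self.previous:
        self.previous[n]= self.compute(*n)   # n is a tuple of arguments
    return self.previous[n]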
The essential computation has been allocated to a method called compute. It looks like this:
def compute(self, n):
    if n == 0: return 1
    return n*self.__call__(n-1)
This does the real work of the callable object: it computes n!. In this case, we've used a pretty standard recursive factorial definition. This recursion relies on the __call__() method to check the cache for previous values.
If we don't expect to compute values larger than 1000! (a 2,568-digit number, by the way), the recursion works nicely, although it brushes up against Python's default recursion limit. If we think we need to compute really large factorials, we'll need a different approach.

To compute very large factorials, we can avoid deep recursion entirely with functools.reduce(operator.mul, range(1, n+1)).
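Here's a runnable sketch with the needed imports; fact() is our name for it:

>>> import functools, operator
>>> def fact(n):
...     return functools.reduce(operator.mul, range(1, n+1), 1)
...
>>> fact(5)
120
>>> len(str(fact(1000)))    # 1000! really does have 2,568 digits
2568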
Either way, we can depend on the internal memoization to leverage previous results.
Hysteresis—memory of what came before—is available to callable objects. Functions and lambdas are stateless, whereas callable objects can be stateful.
This may be desirable to optimize performance: we can memoize previous results. But we can also design an object that's simply confusing. Consider a function like divmod() that returns two values. We could try to define a callable object that first returns the quotient, and on a second call with the same arguments returns the remainder:
>>> crazy_divmod(355,113)
3
>>> crazy_divmod(355,113)
16
This is technically possible. But it's crazy. Warning: Stay away.
We generally expect functions to be pure: they do the same thing each time. Implementing memoization didn't alter the basic purity of our factorial function; the same argument still yields the same result.
Here's a fun function for generating sequences: the Collatz function. It creates a sequence using a simple pair of rules. We could call this rule Half-Or-Three-Plus-One (HOTPO). We'll call it collatz():
def collatz(n):
    if n % 2 == 0:
        return n//2
    else:
        return 3*n+1
Each integer argument yields a next integer. These can form a chain. For example, if we start with collatz(13), we get 40. The value of collatz(40) is 20. Here's the sequence of values:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
At 1, it loops: 1 → 4 → 2 → 1 …
Interestingly, all chains seem to lead—eventually—to 1. To explore this, we need a simple function that will build a chain from a given starting value.
Here's a function that will build a list object. It iterates through values in the sequence until it reaches 1, when it terminates:
def col_list(n):
    seq= [n]
    while n != 1:
        n= collatz(n)
        seq.append(n)
    return seq
This is not wrong. But it's not really the most useful implementation. This always creates a sequence object. In many cases, we don't really want an object, we only want information about the sequence. We might only want the length, or the largest numbers, or the sum.
This is where a generator function might be more useful. A generator function yields elements from the sequence instead of building the sequence as a single object.
To create a generator function, we write a function that has a loop; inside the loop, there's a yield statement.
A function with a yield statement produces an iterator when called; it can be used in a for statement to produce values. It doesn't create a big list object; it creates the items, which can be accumulated into a list or tuple object.
A generator function is lazy: it doesn't compute anything unless forced to by another function needing results. We can iterate through as many (or as few) of the results as we need.
For example, list(some_generator()) forces all values to be returned. For another example of lazy evaluation, look at how range() objects work. If we evaluate range(10), we only get a lazy range object. If we evaluate list(range(10)), we get a list object.
Here's a generator function that will produce sequences of values using the collatz() rule shown previously:
def col_iter(n):
    yield n
    while n != 1:
        n= collatz(n)
        yield n
When we use this in a for loop or with the list() function, it will first yield the argument number. While the number is not 1, it will apply the collatz() function and yield successive values in the chain. When it has yielded 1, it will terminate.
One common pattern for generator functions is to replace all list-accumulation statements with yield statements. Instead of building a list one item at a time, we yield each item.
The col_iter() generator is lazy. We don't get an answer unless we use list() or tuple() or some variation of a for statement context.
Here's how this function looks in practice:
>>> for i in col_iter(3):
...     print(i)
3
10
5
16
8
4
2
1
We've used the generator function in a for loop so that it will yield all of the values until it terminates.
Now we can do some exploration of this Collatz sequence. Here are a few evaluations of the col_iter() function:
>>> list(col_iter(3))
[3, 10, 5, 16, 8, 4, 2, 1]
>>> list(col_iter(5))
[5, 16, 8, 4, 2, 1]
>>> list(col_iter(6))
[6, 3, 10, 5, 16, 8, 4, 2, 1]
>>> list(col_iter(13))
[13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
There's an interesting pattern here. It seems that from 16, we know the rest. Generalizing this: from any number we've already seen, we know the rest.
Wait. This means that memoization might be a big help in exploring the values created by this sequence.
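Here's a sketch of how that memoization could look, reusing the pattern from the Factorial class; ChainLength is our own name for it. Any value we've seen before ends the computation immediately:

from collections.abc import Callable

class ChainLength(Callable):
    """Memoized length of the Collatz chain from n down to 1."""
    def __init__(self):
        self.previous = {1: 1}
    def __call__(self, n):
        if n not in self.previous:
            # The chain from n is one element longer than the chain
            # from collatz(n); any cached value stops the recursion.
            self.previous[n] = 1 + self.__call__(collatz(n))
        return self.previous[n]

Used like this:

>>> chain_len = ChainLength()
>>> chain_len(9)
20
>>> chain_len(27)
112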
When we start combining function design patterns like this, we're doing functional programming. We're stepping outside the box of purely object-oriented Python.
Here is an alternative version of the collatz() function:
def collatz2(n):
    return n//2 if n%2 == 0 else 3*n+1
This simply collapses the if statement into a single conditional expression; it may not help much. We also have this:
collatz3= lambda n: n//2 if n%2 == 0 else 3*n+1
We've collapsed the expression into a lambda object. Helpful? Perhaps not. On the other hand, the function doesn't really need all of the overhead of a full function definition and multiple statements. The lambda object seems to capture everything relevant.
There's a higher-order function that will produce values until some ending condition is met. We can plug one of the versions of the collatz() function and a termination test into this general-purpose function:
def recurse_until(ending, the_function, n):
    yield n
    while not ending(n):
        n= the_function(n)
        yield n
This requires two plug-in functions:

- ending(n), which returns True when the sequence should stop
- the_function(n), which computes the next value in the sequence from the current one

We've completely uncoupled the general idea of repeatedly applying a function from any specific function and any specific terminating condition.
We can apply this higher-order recurse_until() function like this:
>>> recurse_until(lambda n: n==1, collatz2, 9)
<generator object recurse_until at 0x1021278c0>
That's how a lazy generator looks: it didn't return any values because we didn't demand any values. We need to use it in a loop or some kind of expression that iterates through all available values. The list() function, for example, will collect all of the values.
Here's how we make the lazy generator do some work:
>>> list(_)
[9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4,
2, 1]
The _ variable is the previously computed value. It relieves us from the burden of having to write an assignment statement. We can write an expression, see the results, and know the results were automatically saved in the _ variable.
Which starting number, under one million, produces the longest chain?
Let's try a small version first, without memoization. It's really simple:
>>> collatz_len= [len(list(recurse_until(lambda n: n==1, collatz2, i)))
...     for i in range(1,11)]
>>> results = zip(collatz_len, range(1,11))
>>> max(results)
(20, 9)
>>> list(col_iter(9))
[9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4,
2, 1]
We defined collatz_len with a list comprehension. The comprehension evaluates len(something) for i in range(1,11). This means we'll collect ten values into the list, each of which is the length of something. The something is a list object built from the recurse_until(lambda n: n==1, collatz2, i) generator that we discussed. This computes the sequence of Collatz values starting from i and proceeding until the value in the sequence is 1.
We've zipped the lengths with the original values of i. This will create pairs of lengths and starting numbers. The maximum length will now have a starting value associated with it so that we can confirm that the results match our expectations.
And yes, this Project Euler problem could—in principle—be solved in a single line of code.
Will this scale to 100? 1,000? 1,000,000? How much will memoization help?
In this article, we've looked at five (or six) kinds of Python callables. They all fit the y = f(x) model of a function to varying degrees. When is it appropriate to use each of these different ways to express the same essential concept?
Functions are created with def and return statements. It shouldn't come as a surprise that this covers most cases. A def statement allows a docstring and doctest test cases. We could call these def functions, since they're built with the def statement.
Higher-order functions—map(), filter(), and the itertools library—are generally written as plain-old def functions. They're higher-order because they accept functions as arguments or return functions as results. Otherwise, they're just functions.
Function wrappers—len(), divmod(), pow(), str(), and repr()—are function syntax wrapped around object methods. These are def'd functions with very tiny bodies. We use them because a.pow(2) doesn't seem as clear as pow(a, 2).
Lambdas are appropriate for one-time use of something so simple that it doesn't deserve being wrapped in a def statement body. In some cases, we have a small nugget of code that seems more clear when written as a lambda expression rather than a more complete function definition. Simple filter rules, and simple computations are often more clearly shown as a lambda object.
The callable objects have a special property that other functions lack: hysteresis. They can retain the results of previous calculations. We've used this hysteresis property to implement memoization. This can be a huge performance boost. Callable objects can be used badly, however, to create objects that have simply bizarre behavior. Most functions should strive for purity: the same arguments should yield the same results.
Generator functions are created with a def statement and at least one yield statement. These functions are iterable. They can be used in a for statement to examine each resulting value. They can also be used with functions like list(), tuple(), and set() to create an actual object from the iterable sequence of values. We might combine them with higher-order functions to do complex processing, one item at a time.
It's important to work with each of these kinds of callables. If you only have one tool—a hammer—then every problem has to be reshaped into a nail before you can solve it. Once you have multiple tools available, you can pick the tool that provides the most succinct and expressive solution to the problem.