Book Excerpt: Chapter 6: Functions and Functional Programming
Read an excerpt from the 4th Edition of the book Python Essential Reference by David Beazley.
Chapter 6: Functions and Functional Programming
Substantial programs are broken up into functions for better modularity and ease of maintenance. Python makes it easy to define functions but also incorporates a surprising number of features from functional programming languages. This chapter describes the basic mechanisms associated with Python functions including scoping rules, closures, decorators, generators, and coroutines.
Functions
Functions are defined with the def statement:
def add(x,y): return x + y
The body of a function is simply a sequence of statements that execute when the function is called. You invoke a function by writing the function name followed by a tuple of function arguments, such as a = add(3,4). The order and number of arguments must match those given in the function definition. If a mismatch exists, a TypeError exception is raised.
You can attach default arguments to function parameters by assigning values in the function definition. For example:
def split(line,delimiter=','): statements
When a function defines a parameter with a default value, that parameter and all the parameters that follow are optional. If a parameter without a default value follows one that has a default value in the function definition, a SyntaxError exception is raised.
Default parameter values are always set to the objects that were supplied as values when the function was defined. Here’s an example:
a = 10
def foo(x=a):
    return x

a = 5       # Reassign 'a'.
foo()       # returns 10 (default value not changed)
In addition, the use of mutable objects as default values may lead to unintended behavior:
def foo(x, items=[]):
    items.append(x)
    return items

foo(1)      # returns [1]
foo(2)      # returns [1, 2]
foo(3)      # returns [1, 2, 3]
Notice how the default argument retains modifications made from previous invocations. To prevent this, it is better to use None and add a check as follows:
def foo(x, items=None):
    if items is None:
        items = []
    items.append(x)
    return items
A function can accept a variable number of parameters if an asterisk (*) is added to the last parameter name:
def fprintf(file, fmt, *args):
    file.write(fmt % args)

# Use fprintf. args gets (42,"hello world", 3.45)
fprintf(out,"%d %s %f", 42, "hello world", 3.45)
In this case, all the remaining arguments are placed into the args variable as a tuple. To pass a tuple args to a function as if they were parameters, the *args syntax can be used in a function call as follows:
def printf(fmt, *args):
    # Call another function and pass along args
    fprintf(sys.stdout, fmt, *args)
Function arguments can also be supplied by explicitly naming each parameter and specifying a value. These are known as keyword arguments. Here is an example:
def foo(w,x,y,z):
    statements

# Keyword argument invocation
foo(x=3, y=22, w='hello', z=[1,2])
With keyword arguments, the order of the parameters doesn’t matter. However, unless there are default values, you must explicitly name all of the required function parameters. If you omit any of the required parameters or if the name of a keyword doesn’t match any of the parameter names in the function definition, a TypeError exception is raised. Also, since any Python function can be called using the keyword calling style, it is generally a good idea to define functions with descriptive argument names.
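The two error cases just described can be observed directly. The following is a minimal sketch; the function describe() and its parameter names are hypothetical and exist only to show the calling styles.

```python
def describe(name, size, color):
    # Hypothetical function used only to demonstrate keyword calls
    return "%s: size=%s, color=%s" % (name, size, color)

# Order doesn't matter when every argument is named
result = describe(color="red", name="box", size=3)

# A keyword that matches no parameter name raises TypeError
try:
    describe(name="box", size=3, colour="red")
except TypeError as e:
    unknown_error = str(e)

# Omitting a required parameter also raises TypeError
try:
    describe(name="box", size=3)
except TypeError as e:
    missing_error = str(e)
```

The exact wording of the error messages differs between Python versions, so the sketch only captures them rather than comparing against fixed text.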
Positional arguments and keyword arguments can appear in the same function call, provided that all the positional arguments appear first, values are provided for all non-optional arguments, and no argument value is defined more than once. Here’s an example:
foo('hello', 3, z=[1,2], y=22)
foo(3, 22, w='hello', z=[1,2])      # TypeError. Multiple values for w
If the last argument of a function definition begins with **, all the additional keyword arguments (those that don’t match any of the other parameter names) are placed in a dictionary and passed to the function. This can be a useful way to write functions that accept a large number of potentially open-ended configuration options that would be too unwieldy to list as parameters. Here’s an example:
def make_table(data, **parms):
    # Get configuration parameters from parms (a dict)
    fgcolor = parms.pop("fgcolor","black")
    bgcolor = parms.pop("bgcolor","white")
    width = parms.pop("width",None)
    ...
    # No more options
    if parms:
        raise TypeError("Unsupported configuration options %s" % list(parms))

make_table(items, fgcolor="black", bgcolor="white", border=1,
           borderstyle="grooved", cellpadding=10, width=400)
You can combine extra keyword arguments with variable-length argument lists, as long as the ** parameter appears last:
# Accept variable number of positional or keyword arguments
def spam(*args, **kwargs):
    # args is a tuple of positional args
    # kwargs is dictionary of keyword args
    ...
Keyword arguments can also be passed to another function using the **kwargs syntax:
def callfunc(*args, **kwargs):
    func(*args,**kwargs)
The combination of *args and **kwargs is commonly used to write wrappers and proxies for other functions. For example, callfunc() accepts any combination of arguments and simply passes them through to func().
Parameter Passing and Return Values
When a function is invoked, the function parameters are simply names that refer to the passed input objects. The underlying semantics of parameter passing doesn’t neatly fit into any single style, such as “pass by value” or “pass by reference,” that you might know about from other programming languages. For example, if you pass an immutable value, the argument effectively looks like it was passed by value. However, if a mutable object (such as a list or dictionary) is passed to a function where it’s then modified, those changes will be reflected in the original object. Here’s an example:
a = [1, 2, 3, 4, 5]
def square(items):
    for i,x in enumerate(items):
        items[i] = x * x      # Modify items in-place

square(a)      # Changes a to [1, 4, 9, 16, 25]
Functions that mutate their input values or change the state of other parts of the program behind the scenes like this are said to have side effects. As a general rule, this is a programming style that is best avoided because such functions can become a source of subtle programming errors as programs grow in size and complexity (for example, it’s not obvious from reading a function call if a function has side effects). Such functions interact poorly with programs involving threads and concurrency because side effects typically need to be protected by locks.
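For contrast, here is a sketch of a side-effect-free alternative to square(). The name squared() is an assumption made for this illustration; it builds a new list and leaves the caller's list alone.

```python
def squared(items):
    # Build and return a new list; the argument is never modified
    return [x * x for x in items]

a = [1, 2, 3, 4, 5]
b = squared(a)      # b is a new list; a is unchanged
```

With this version, the call site makes the data flow visible: the result has to be assigned to something in order to be used.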
The return statement returns a value from a function. If no value is specified or you omit the return statement, the None object is returned. To return multiple values, place them in a tuple:
def factor(a):
    d = 2
    while d <= a // 2:
        if (a // d) * d == a:      # d divides a evenly (integer division)
            return (a // d, d)
        d = d + 1
    return (a, 1)
Multiple return values returned in a tuple can be assigned to individual variables:
x, y = factor(1243) # Return values placed in x and y.
or
(x, y) = factor(1243) # Alternate version. Same behavior.
Scoping Rules
Each time a function executes, a new local namespace is created. This namespace represents a local environment that contains the names of the function parameters, as well as the names of variables that are assigned inside the function body. When resolving names, the interpreter first searches the local namespace. If no match exists, it searches the global namespace. The global namespace for a function is always the module in which the function was defined. If the interpreter finds no match in the global namespace, it makes a final check in the built-in namespace. If this fails, a NameError exception is raised.
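The lookup order described above (local, then global, then built-in) can be seen in a short sketch. The names lookup(), broken(), and undefined_name are made up for this illustration.

```python
x = "global x"

def lookup():
    y = "local y"                # Found in the local namespace
    return y, x, len("abc")      # x is global; len is a built-in

def broken():
    return undefined_name        # Defined nowhere: NameError when called

values = lookup()

try:
    broken()
except NameError as e:
    error = str(e)
```

Note that broken() can be defined without complaint. The failed lookup only happens when the function body actually executes.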
One peculiarity of namespaces is the manipulation of global variables within a function. For example, consider the following code:
a = 42
def foo():
    a = 13

foo()
# a is still 42
When this code executes, a retains its value of 42, despite the appearance that we might be modifying the variable a inside the function foo. When variables are assigned inside a function, they’re always bound to the function’s local namespace; as a result, the variable a in the function body refers to an entirely new object containing the value 13, not the outer variable. To alter this behavior, use the global statement. global simply declares names as belonging to the global namespace, and it’s necessary only when global variables will be modified. It can be placed anywhere in a function body and used repeatedly. Here’s an example:
a = 42
b = 37
def foo():
    global a      # 'a' is in global namespace
    a = 13
    b = 0

foo()
# a is now 13. b is still 37.
Python supports nested function definitions. Here’s an example:
def countdown(start):
    n = start
    def display():      # Nested function definition
        print('T-minus %d' % n)
    while n > 0:
        display()
        n -= 1
Variables in nested functions are bound using lexical scoping. That is, names are resolved by first checking the local scope and then all enclosing scopes of outer function definitions from the innermost scope to the outermost scope. If no match is found, the global and built-in namespaces are checked as before. Although names in enclosing scopes are accessible, Python 2 only allows variables to be reassigned in the innermost scope (local variables) and the global namespace (using global). Therefore, an inner function can’t reassign the value of a local variable defined in an outer function. For example, this code does not work:
def countdown(start):
    n = start
    def display():
        print('T-minus %d' % n)
    def decrement():
        n -= 1      # Fails
    while n > 0:
        display()
        decrement()
In Python 2, you can work around this by placing values you want to change in a list or dictionary. In Python 3, you can declare n as nonlocal as follows:
def countdown(start):
    n = start
    def display():
        print('T-minus %d' % n)
    def decrement():
        nonlocal n      # Bind to outer n (Python 3 only)
        n -= 1
    while n > 0:
        display()
        decrement()
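The Python 2 workaround mentioned earlier, storing the value in a list, might look like the following sketch. Collecting the output in a list named shown (rather than printing) is an addition made here so the result is easy to inspect.

```python
def countdown(start):
    n = [start]      # A one-element list holds the counter
    shown = []       # Collected output (added for illustration)
    def display():
        shown.append('T-minus %d' % n[0])
    def decrement():
        n[0] -= 1    # Mutates the list; the name n is never rebound
    while n[0] > 0:
        display()
        decrement()
    return shown

lines = countdown(3)
```

This works in both Python 2 and Python 3 because the inner function only modifies the object that n refers to and never assigns to n itself.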
Functions as Objects and Closures
Functions are first-class objects in Python. This means that they can be passed as arguments to other functions, placed in data structures, and returned by a function as a result. Here is an example of a function that accepts another function as input and calls it:
# foo.py
def callf(func):
    return func()
Here is an example of using the above function:
>>> import foo
>>> def helloworld():
...     return 'Hello World'
...
>>> foo.callf(helloworld)      # Pass a function as an argument
'Hello World'
>>>
When a function is handled as data, it implicitly carries information related to the surrounding environment where the function was defined. This affects how free variables in the function are bound. As an example, consider this modified version of foo.py that now contains a variable definition:
# foo.py
x = 42
def callf(func):
    return func()
Now, observe the behavior of this example:
>>> import foo
>>> x = 37
>>> def helloworld():
...     return "Hello World. x is %d" % x
...
>>> foo.callf(helloworld)      # Pass a function as an argument
'Hello World. x is 37'
>>>
In this example, notice how the function helloworld() uses the value of x that’s defined in the same environment as where helloworld() was defined. Thus, even though there is also an x defined in foo.py and that’s where helloworld() is actually being called, that value of x is not the one that’s used when helloworld() executes.
When the statements that make up a function are packaged together with the environment in which they execute, the resulting object is known as a closure. The behavior of the previous example is explained by the fact that all functions have a __globals__ attribute that points to the global namespace in which the function was defined. This always corresponds to the enclosing module in which a function was defined. For the previous example, you get the following:
>>> helloworld.__globals__
{'__builtins__': <module '__builtin__' (built-in)>,
 'helloworld': <function helloworld at 0x7bb30>,
 'x': 37, '__name__': '__main__', '__doc__': None,
 'foo': <module 'foo' from 'foo.py'>}
>>>
When nested functions are used, closures capture the entire environment needed for the inner function to execute. Here is an example:
import foo

def bar():
    x = 13
    def helloworld():
        return "Hello World. x is %d" % x
    foo.callf(helloworld)      # returns 'Hello World. x is 13'
Closures and nested functions are especially useful if you want to write code based on the concept of lazy or delayed evaluation. Here is another example:
from urllib import urlopen
# from urllib.request import urlopen (Python 3)

def page(url):
    def get():
        return urlopen(url).read()
    return get
In this example, the page() function doesn’t actually carry out any interesting computation. Instead, it merely creates and returns a function get() that will fetch the contents of a web page when it is called. Thus, the computation carried out in get() is actually delayed until some later point in a program when get() is evaluated. For example:
>>> python = page("https://www.python.org")
>>> jython = page("https://www.jython.org")
>>> python
<function get at 0x95d5f0>
>>> jython
<function get at 0x9735f0>
>>> pydata = python()      # Fetches https://www.python.org
>>> jydata = jython()      # Fetches https://www.jython.org
>>>
In this example, the two variables python and jython are actually two different versions of the get() function. Even though the page() function that created these values is no longer executing, both get() functions implicitly carry the values of the outer variables that were defined when the get() function was created. Thus, when get() executes, it calls urlopen(url) with the value of url that was originally supplied to page(). With a little inspection, you can view the contents of variables that are carried along in a closure. For example:
>>> python.__closure__
(<cell at 0x67f50: str object at 0x69230>,)
>>> python.__closure__[0].cell_contents
'https://www.python.org'
>>> jython.__closure__[0].cell_contents
'https://www.jython.org'
>>>
A closure can be a highly efficient way to preserve state across a series of function calls. For example, consider this code that runs a simple counter:
def countdown(n):
    def next():
        nonlocal n
        r = n
        n -= 1
        return r
    return next

# Example use
next = countdown(10)
while True:
    v = next()      # Get the next value
    if not v: break
In this code, a closure is being used to store the internal counter value n. The inner function next() updates and returns the previous value of this counter variable each time it is called.
The fact that closures capture the environment of inner functions also makes them useful for applications where you want to wrap existing functions in order to add extra capabilities. This is described next.
Decorators
A decorator is a function whose primary purpose is to wrap another function. The wrapping transparently alters or enhances the behavior of the object being wrapped. Syntactically, decorators are denoted using the special @ symbol as follows:
@trace
def square(x):
    return x*x
The preceding code is shorthand for the following:
def square(x):
    return x*x
square = trace(square)
In the example, a function square() is defined. However, immediately after its definition, the function object itself is passed to the function trace(), which returns an object that replaces the original square. Now, let’s consider an implementation of trace that will clarify how this might be useful:
enable_tracing = True
if enable_tracing:
    debug_log = open("debug.log","w")

def trace(func):
    if enable_tracing:
        def callf(*args,**kwargs):
            debug_log.write("Calling %s: %s, %s\n" %
                            (func.__name__, args, kwargs))
            r = func(*args,**kwargs)
            debug_log.write("%s returned %s\n" % (func.__name__, r))
            return r
        return callf
    else:
        return func
In this code, trace() creates a wrapper function that writes some debugging output and then calls the original function object. Thus, if you call square(), you will see the output of the write() methods in the wrapper. The function callf that is returned from trace() is a closure that serves as a replacement for the original function. A final interesting aspect of the implementation is that the tracing feature itself is only enabled through the use of a global variable enable_tracing as shown. If set to False, the trace() decorator simply returns the original function unmodified. Thus, when tracing is disabled, there is no added performance penalty associated with using the decorator.
When decorators are used, they must appear on their own line immediately prior to a function or class definition. More than one decorator can also be applied. Here’s an example:
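@foo
@bar
@spam
def grok(x):
    pass
In this case, the decorators are applied from the bottom up: the decorator closest to the function definition (spam) is applied first, and the one listed first (foo) is applied last. The result is the same as this:
def grok(x):
    pass
grok = foo(bar(spam(grok)))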
Decorators can interact strangely with other aspects of functions such as recursion, documentation strings, and function attributes. These issues are described later in this chapter.
Generators and yield
If a function uses the yield keyword, it defines an object known as a generator. A generator is a function that produces a sequence of values for use in iteration. Here’s an example:
def countdown(n):
    print("Counting down from %d" % n)
    while n > 0:
        yield n
        n -= 1
    return
If you call this function, you will find that none of its code starts executing. For example:
>>> c = countdown(10)
>>>
Instead, a generator object is returned. The generator object, in turn, executes the function whenever next() is called (or __next__() in Python 3). Here’s an example:
>>> c.next()      # Use c.__next__() in Python 3
Counting down from 10
10
>>> c.next()
9
When next() is invoked, the generator function executes statements until it reaches a yield statement. The yield statement produces a result at which point execution of the function stops until next() is invoked again. Execution then resumes with the statement following yield.
You normally don’t call next() directly on a generator but use it with the for statement, sum(), or some other operation that consumes a sequence. For example:
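for n in countdown(10):
    statements
a = sum(countdown(10))
A generator function signals completion by returning, at which point a StopIteration exception is raised to the caller and iteration stops. (Raising StopIteration explicitly inside the generator also worked in Python 2, but Python 3.7 and later convert it to a RuntimeError.) In Python 2, it is never legal for a generator to return a value other than None upon completion; Python 3.3 and later allow a return value, which is attached to the StopIteration exception.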
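The end of iteration can be observed by driving a generator by hand. This sketch uses the built-in next() function, which is available in Python 2.6 and later as well as Python 3; the variable names are chosen for illustration.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

c = countdown(2)
first = next(c)       # Runs to the first yield
second = next(c)      # Resumes after the yield and loops once more

try:
    next(c)           # The function body finishes without yielding
    finished = False
except StopIteration:
    finished = True
```

A for loop or a function such as sum() catches this exception internally, which is why it rarely appears in ordinary code.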
Coroutines and yield Expressions
Inside a function, the yield statement can also be used as an expression that appears on the right side of an assignment operator. For example:
def receiver():
    print("Ready to receive")
    while True:
        n = (yield)
        print("Got %s" % n)
A function that uses yield in this manner is known as a coroutine, and it executes in response to values being sent to it. Its behavior is also very similar to a generator. For example:
>>> r = receiver()
>>> r.next()      # Advance to first yield (r.__next__() in Python 3)
Ready to receive
>>> r.send(1)
Got 1
>>> r.send(2)
Got 2
>>> r.send("Hello")
Got Hello
>>>
In this example, the initial call to next() is necessary so that the coroutine executes statements leading to the first yield expression. At this point, the coroutine suspends, waiting for a value to be sent to it using the send() method of the associated generator object r. The value passed to send() is returned by the (yield) expression in the coroutine. Upon receiving a value, a coroutine executes statements until the next yield statement is encountered.
The requirement of first calling next() on a coroutine is easily overlooked and a common source of errors. Therefore, it is recommended that coroutines be wrapped with a decorator that automatically takes care of this step.
def coroutine(func):
    def start(*args,**kwargs):
        g = func(*args,**kwargs)
        next(g)      # Advance to first yield (g.next() also works in Python 2)
        return g
    return start
Using this decorator, you would write and use coroutines using:
@coroutine
def receiver():
    print("Ready to receive")
    while True:
        n = (yield)
        print("Got %s" % n)

# Example use
r = receiver()
r.send("Hello World")      # Note : No initial .next() needed
A coroutine will typically run indefinitely unless it is explicitly shut down or it exits on its own. To close the stream of input values, use the close() method like this:
>>> r.close()
>>> r.send(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Once closed, a StopIteration exception will be raised if further values are sent to a coroutine. The close() operation raises GeneratorExit inside the coroutine. For example:
def receiver():
    print("Ready to receive")
    try:
        while True:
            n = (yield)
            print("Got %s" % n)
    except GeneratorExit:
        print("Receiver done")
Exceptions can be raised inside a coroutine using the throw(exctype [, value [, tb]]) method where exctype is an exception type, value is the exception value, and tb is a traceback object. For example:
>>> r.throw(RuntimeError,"You're hosed!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in receiver
RuntimeError: You're hosed!
Exceptions raised in this manner will originate at the currently executing yield statement in the coroutine. A coroutine can elect to catch exceptions and handle them as appropriate. It is not safe to use throw() as an asynchronous signal to a coroutine—it should never be invoked from a separate execution thread or in a signal handler.
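As a sketch of a coroutine that catches a thrown exception and keeps running, consider the following variation on receiver(). Recording messages in a module-level list named log, and the choice of ValueError, are assumptions made for this illustration.

```python
log = []

def receiver():
    while True:
        try:
            n = (yield)
            log.append("Got %s" % n)
        except ValueError as e:
            # The exception surfaces at the yield; handle it and continue
            log.append("Ignored bad value: %s" % e)

r = receiver()
next(r)                                  # Advance to the first yield
r.send(1)
r.throw(ValueError("not a number"))      # Handled inside the coroutine
r.send(2)                                # The coroutine is still alive
```

Because the handler sits inside the loop, the coroutine goes back to the yield after handling the exception and continues to accept values.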
Using Generators and Coroutines
At first glance, it might not be obvious how to use generators and coroutines for practical problems. However, generators and coroutines can be particularly effective when applied to certain kinds of programming problems in systems, networking, and distributed computation. For example, generator functions are useful if you want to set up a processing pipeline, similar in nature to using a pipe in the UNIX shell. Here is an example involving a set of generator functions related to finding, opening, reading, and processing files:
import os
import fnmatch

def find_files(topdir, pattern):
    for path, dirname, filelist in os.walk(topdir):
        for name in filelist:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(path,name)

import gzip, bz2

def opener(filenames):
    for name in filenames:
        if name.endswith(".gz"): f = gzip.open(name)
        elif name.endswith(".bz2"): f = bz2.BZ2File(name)
        else: f = open(name)
        yield f

def cat(filelist):
    for f in filelist:
        for line in f:
            yield line

def grep(pattern, lines):
    for line in lines:
        if pattern in line:
            yield line
Here is an example of using these functions to set up a processing pipeline:
import sys

wwwlogs = find_files("www","access-log*")
files = opener(wwwlogs)
lines = cat(files)
pylines = grep("python", lines)
for line in pylines:
    sys.stdout.write(line)
In this example, the program is processing all lines in all "access-log*" files found within all subdirectories of a top-level directory "www". Each "access-log" is tested for file compression and opened using an appropriate file opener. Lines are concatenated together and processed through a filter that is looking for a substring "python". The entire program is being driven by the for statement at the end. Each iteration of this loop pulls a new value through the pipeline and consumes it. Moreover, the implementation is highly memory-efficient because no temporary lists or other large data structures are ever created.
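The pull-driven behavior can be seen without touching the filesystem. The sketch below reuses the grep() generator on an in-memory list; the source() generator, the sample lines, and the pulled list are all made up for this illustration and simply record how many lines have been read so far.

```python
def grep(pattern, lines):
    for line in lines:
        if pattern in line:
            yield line

pulled = []      # Records every line read from the source

def source(lines):
    for line in lines:
        pulled.append(line)
        yield line

log_lines = ["GET /python.html\n", "GET /index.html\n", "GET /python/faq\n"]

pipeline = grep("python", source(log_lines))
first = next(pipeline)          # Pulls only as much input as needed
pulled_so_far = len(pulled)     # One line has been read at this point
rest = list(pipeline)           # Drains the remainder of the pipeline
```

Only one input line is read to produce the first match. The remaining lines are read when the consumer asks for more.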
Coroutines can be used to write programs based on data-flow processing. Programs organized in this way look like inverted pipelines. Instead of pulling values through a sequence of generator functions using a for loop, you send values into a collection of linked coroutines. Here is an example of coroutine functions written to mimic the generator functions shown previously:
import os
import fnmatch
import sys

@coroutine
def find_files(target):
    while True:
        topdir, pattern = (yield)
        for path, dirname, filelist in os.walk(topdir):
            for name in filelist:
                if fnmatch.fnmatch(name,pattern):
                    target.send(os.path.join(path,name))

import gzip, bz2

@coroutine
def opener(target):
    while True:
        name = (yield)
        if name.endswith(".gz"): f = gzip.open(name)
        elif name.endswith(".bz2"): f = bz2.BZ2File(name)
        else: f = open(name)
        target.send(f)

@coroutine
def cat(target):
    while True:
        f = (yield)
        for line in f:
            target.send(line)

@coroutine
def grep(pattern, target):
    while True:
        line = (yield)
        if pattern in line:
            target.send(line)

@coroutine
def printer():
    while True:
        line = (yield)
        sys.stdout.write(line)
Here is how you would link these coroutines to create a dataflow processing pipeline:
finder = find_files(opener(cat(grep("python",printer()))))

# Now, send a value
finder.send(("www","access-log*"))
finder.send(("otherwww","access-log*"))
In this example, each coroutine sends data to another coroutine specified in the target argument to each coroutine. Unlike the generator example, execution is entirely driven by pushing data into the first coroutine find_files(). This coroutine, in turn, pushes data to the next stage. A critical aspect of this example is that the coroutine pipeline remains active indefinitely or until close() is explicitly called on it. Because of this, a program can continue to feed data into a coroutine for as long as necessary—for example, the two repeated calls to send() shown in the example.
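The same push-driven flow can be tried on in-memory data. In this sketch, the collector() coroutine and the sample lines are assumptions made for illustration; coroutine() and grep() follow the definitions given earlier.

```python
def coroutine(func):
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)      # Advance to the first yield
        return g
    return start

@coroutine
def grep(pattern, target):
    while True:
        line = (yield)
        if pattern in line:
            target.send(line)

@coroutine
def collector(results):
    # Final stage: store whatever arrives (hypothetical helper)
    while True:
        results.append((yield))

matches = []
pipe = grep("python", collector(matches))
for line in ["GET /python.html", "GET /index.html", "GET /python/faq"]:
    pipe.send(line)      # Push each value into the first stage
pipe.close()             # Shut down the first stage explicitly
```

Each call to send() runs the pipeline just far enough to process that one value, and then every stage suspends again at its yield.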
Coroutines can be used to implement a form of concurrency. For example, a centralized task manager or event loop can schedule and send data into a large collection of hundreds or even thousands of coroutines that carry out various processing tasks. The fact that input data is “sent” to a coroutine also means that coroutines can often be easily mixed with programs that use message queues and message passing to communicate between program components. Further information on this can be found in Chapter 20, “Threads and Concurrency.”
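A very small round-robin scheduler gives a feel for the idea. This is a simplified sketch under stated assumptions, not the approach of Chapter 20: the tasks are plain generators resumed with next() rather than coroutines receiving data, and the names worker() and run() are invented for this illustration.

```python
from collections import deque

def worker(name, count, log):
    for i in range(count):
        log.append("%s step %d" % (name, i))
        yield      # Hand control back to the scheduler

def run(tasks):
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)             # Run the task up to its next yield
            queue.append(task)     # Still alive: schedule it again
        except StopIteration:
            pass                   # Task finished: drop it

log = []
run([worker("A", 2, log), worker("B", 1, log)])
```

The tasks interleave their steps even though only one of them runs at any moment, and no threads are involved.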
The lambda Operator
Anonymous functions in the form of an expression can be created using the lambda operator:
lambda args : expression
args is a comma-separated list of arguments, and expression is an expression involving those arguments. Here’s an example:
a = lambda x,y : x+y
r = a(2,3)      # r gets 5
The code defined with lambda must be a valid expression. Multiple statements and other non-expression statements, such as for and while, cannot appear in a lambda expression. lambda expressions follow the same scoping rules as functions.
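One consequence of sharing the scoping rules of functions is that a free variable in a lambda is looked up when the lambda is called, not when it is created. A short sketch, with variable names chosen for illustration:

```python
x = 10
f = lambda: x * 2      # x is a free variable, resolved at call time

before = f()           # Uses the current value of x
x = 20
after = f()            # Sees the reassigned value
```

If the value at creation time is what you want, it can be captured with a default argument, as in lambda x=x: x * 2.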
The primary use of lambda is in specifying short callback functions. For example, if you wanted to sort a list of names with case-insensitivity, you might write this:
names.sort(key=lambda n: n.lower())
Recursion
Recursive functions are easily defined. For example:
def factorial(n):
    if n <= 1: return 1
    else: return n * factorial(n - 1)
However, be aware that there is a limit on the depth of recursive function calls. The function sys.getrecursionlimit() returns the current maximum recursion depth, and the function sys.setrecursionlimit() can be used to change the value. The default value is 1000. Although it is possible to increase the value, programs are still limited by the stack size limits enforced by the host operating system. When the recursion depth is exceeded, a RuntimeError exception is raised (in Python 3.5 and later, its subclass RecursionError). Python does not perform the tail-recursion optimization that you often find in functional languages such as Scheme.
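The limit can be probed with a deliberately deep recursion. This is a minimal sketch; the function depth() is hypothetical. Catching RuntimeError also covers RecursionError, which is a subclass of it in Python 3.5 and later.

```python
import sys

def depth(n):
    # Recurses n levels deep before returning
    return 1 if n <= 1 else 1 + depth(n - 1)

limit = sys.getrecursionlimit()

try:
    depth(limit + 100)      # Deliberately deeper than the limit allows
    exceeded = False
except RuntimeError:
    exceeded = True
```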
Recursion does not work as you might expect in generator functions and coroutines. For example, this code prints all items in a nested collection of lists:
def flatten(lists):
    for s in lists:
        if isinstance(s,list):
            flatten(s)
        else:
            print(s)

items = [[1,2,3],[4,5,[5,6]],[7,8,9]]
flatten(items)      # Prints 1 2 3 4 5 5 6 7 8 9
However, if you change the print operation to a yield, it no longer works. This is because the recursive call to flatten() merely creates a new generator object without actually iterating over it. Here’s a recursive generator version that works:
def genflatten(lists):
    for s in lists:
        if isinstance(s,list):
            for item in genflatten(s):
                yield item
        else:
            yield s
Care should also be taken when mixing recursive functions and decorators. If a decorator is applied to a recursive function, all inner recursive calls now get routed through the decorated version. For example:
@locked
def factorial(n):
    if n <= 1: return 1
    else: return n * factorial(n - 1)      # Calls the wrapped version of factorial
If the purpose of the decorator was related to some kind of system management such as synchronization or locking, recursion is something probably best avoided.
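The routing of inner calls through the wrapper is easy to demonstrate. In this sketch, a hypothetical counted() decorator stands in for locked and simply records the arguments of every call it sees.

```python
calls = []

def counted(func):
    def wrapper(*args, **kwargs):
        calls.append(args)      # Record each call that reaches the wrapper
        return func(*args, **kwargs)
    return wrapper

@counted
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)      # Looks up the decorated name

result = factorial(4)
```

The wrapper runs once per level of recursion, not once per outer call. With a decorator that acquires a non-reentrant lock, this nesting is what would cause trouble.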
Documentation Strings
It is common practice for the first statement of a function to be a documentation string describing its usage. For example:
def factorial(n):
    """Computes n factorial. For example:

    >>> factorial(6)
    720
    >>>
    """
    if n <= 1: return 1
    else: return n*factorial(n-1)
The documentation string is stored in the __doc__ attribute of the function, which is commonly used by IDEs to provide interactive help.
If you are using decorators, be aware that wrapping a function with a decorator can break the help features associated with documentation strings. For example, consider this code:
def wrap(func):
    def call(*args,**kwargs):
        return func(*args,**kwargs)
    return call

@wrap
def factorial(n):
    """Computes n factorial."""
    ...
If a user requests help on this version of factorial(), he will get a rather cryptic explanation:
>>> help(factorial)
Help on function call in module __main__:

call(*args, **kwargs)
(END)
>>>
To fix this, write decorator functions so that they propagate the function name and documentation string. For example:
def wrap(func):
    def call(*args,**kwargs):
        return func(*args,**kwargs)
    call.__doc__ = func.__doc__
    call.__name__ = func.__name__
    return call
Because this is a common problem, the functools module provides a function wraps that can automatically copy these attributes. Not surprisingly, it is also a decorator:
from functools import wraps

def wrap(func):
    @wraps(func)
    def call(*args,**kwargs):
        return func(*args,**kwargs)
    return call
The @wraps(func) decorator, defined in functools, propagates attributes from func to the wrapper function that is being defined.
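A quick check confirms that the name and documentation string survive the wrapping. The sketch below applies the wrap() decorator to the factorial() function from earlier and inspects the result; the variables name and doc are chosen for illustration.

```python
from functools import wraps

def wrap(func):
    @wraps(func)
    def call(*args, **kwargs):
        return func(*args, **kwargs)
    return call

@wrap
def factorial(n):
    """Computes n factorial."""
    return 1 if n <= 1 else n * factorial(n - 1)

name = factorial.__name__      # Reports the original name, not 'call'
doc = factorial.__doc__        # The original documentation string
```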
Function Attributes
Functions can have arbitrary attributes attached to them. Here’s an example:
def foo():
    statements

foo.secure = 1
foo.private = 1
Function attributes are stored in a dictionary that is available as the __dict__ attribute of a function.
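The dictionary can be inspected directly. This sketch uses a trivial function body so that it runs as written.

```python
def foo():
    pass

foo.secure = 1
foo.private = 1

attrs = dict(foo.__dict__)      # Copy of the function's attribute dictionary
```

Here attrs contains exactly the two attributes that were assigned. Built-in attributes such as __name__ and __doc__ are not stored in this dictionary.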
The primary use of function attributes is in highly specialized applications such as parser generators and application frameworks that would like to attach additional information to function objects.
As with documentation strings, care should be given if mixing function attributes with decorators. If a function is wrapped by a decorator, access to the attributes will actually take place on the decorator function, not the original implementation. This may or may not be what you want depending on the application. To propagate already defined function attributes to a decorator function, use the following template or the functools.wraps() decorator as shown in the previous section:
def wrap(func):
    def call(*args,**kwargs):
        return func(*args,**kwargs)
    call.__doc__ = func.__doc__
    call.__name__ = func.__name__
    call.__dict__.update(func.__dict__)
    return call
© Copyright Pearson Education. All rights reserved.