<h1>Python args and kwargs: Demystified</h1>
<p>Real Python, 2019-09-04T14:00:00+00:00, <a href="https://realpython.com/python-kwargs-and-args/">https://realpython.com/python-kwargs-and-args/</a></p>
<p>In this step-by-step tutorial, you'll learn how to use args and kwargs in Python to add more flexibility to your functions. You'll also take a closer look at the single and double-asterisk unpacking operators, which you can use to unpack any iterable object in Python.</p>
<p>Sometimes, when you look at a function definition in Python, you might see that it takes two strange arguments: <strong><code>*args</code></strong> and <strong><code>**kwargs</code></strong>. If you’ve ever wondered what these peculiar variables are, or why your IDE defines them in <code>main()</code>, then this article is for you. You’ll learn how to use args and kwargs in Python to add more flexibility to your functions.</p>
<p><strong>By the end of the article, you’ll know:</strong></p>
<ul>
<li>What <code>*args</code> and <code>**kwargs</code> actually mean</li>
<li>How to use <code>*args</code> and <code>**kwargs</code> in function definitions</li>
<li>How to use a single asterisk (<code>*</code>) to unpack iterables</li>
<li>How to use two asterisks (<code>**</code>) to unpack dictionaries</li>
</ul>
<p>This article assumes that you already know how to define Python functions and work with <a href="https://realpython.com/lessons/mutable-data-structures-lists-dictionaries/">lists and dictionaries</a>.</p>
<h2 id="passing-multiple-arguments-to-a-function">Passing Multiple Arguments to a Function</h2>
<p><strong><code>*args</code></strong> and <strong><code>**kwargs</code></strong> allow you to pass multiple arguments or keyword arguments to a function. Consider the following example. This is a simple function that takes two arguments and returns their sum:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
</pre></div>
<p>This function works fine, but it’s limited to only two arguments. What if you need to sum a varying number of arguments, where the specific number of arguments passed is only determined at runtime? Wouldn’t it be great to create a function that could sum <em>all</em> the integers passed to it, no matter how many there are?</p>
<h2 id="using-the-python-args-variable-in-function-definitions">Using the Python args Variable in Function Definitions</h2>
<p>There are a few ways you can pass a varying number of arguments to a function. The first way is often the most intuitive for people who have experience with collections. You simply pass a list or a <a href="https://realpython.com/python-sets/">set</a> of all the arguments to your function. So for <code>my_sum()</code>, you could pass a list of all the integers you need to add:</p>
<div class="highlight python"><pre><span></span><span class="c1"># sum_integers_list.py</span>
<span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="n">my_integers</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">my_integers</span><span class="p">:</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">x</span>
    <span class="k">return</span> <span class="n">result</span>

<span class="n">list_of_integers</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="n">my_sum</span><span class="p">(</span><span class="n">list_of_integers</span><span class="p">))</span>
</pre></div>
<p>This implementation works, but whenever you call this function you’ll also need to create a list of arguments to pass to it. This can be inconvenient, especially if you don’t know up front all the values that should go into the list.</p>
<p>This is where <code>*args</code> can be really useful, because it allows you to pass a varying number of positional arguments. Take the following example:</p>
<div class="highlight python"><pre><span></span><span class="c1"># sum_integers_args.py</span>
<span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="c1"># Iterating over the Python args tuple</span>
    <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">args</span><span class="p">:</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">x</span>
    <span class="k">return</span> <span class="n">result</span>

<span class="nb">print</span><span class="p">(</span><span class="n">my_sum</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
</pre></div>
<p>In this example, you’re no longer passing a list to <code>my_sum()</code>. Instead, you’re passing three different positional arguments. <code>my_sum()</code> takes all the positional arguments that you provide and packs them into a single iterable object named <code>args</code>.</p>
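<p>To see the packing for yourself, here is a minimal sketch. The function name <code>inspect_args()</code> is hypothetical and only used for illustration. It returns the type name and the contents of whatever Python packed into <code>args</code>:</p>

```python
# inspect_args.py (hypothetical example)

def inspect_args(*args):
    # Return the type name of args together with its contents
    return type(args).__name__, args

print(inspect_args(1, 2, 3))  # ('tuple', (1, 2, 3))
print(inspect_args())         # ('tuple', ())
```

<p>Even when you call the function with no arguments at all, <code>args</code> still exists. It's simply empty.</p>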
<p>Note that <strong><code>args</code> is just a name.</strong> You’re not required to use the name <code>args</code>. You can choose any name that you prefer, such as <code>integers</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># sum_integers_args_2.py</span>
<span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">integers</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">integers</span><span class="p">:</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">x</span>
    <span class="k">return</span> <span class="n">result</span>

<span class="nb">print</span><span class="p">(</span><span class="n">my_sum</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
</pre></div>
<p>The function still works, even though the parameter is now named <code>integers</code> instead of <code>args</code>. All that matters here is that you use the <strong>unpacking operator</strong> (<code>*</code>).</p>
<p>Bear in mind that the iterable object you’ll get using the unpacking operator <code>*</code> is <a href="https://realpython.com/python-lists-tuples/">not a <code>list</code> but a <code>tuple</code></a>. A <code>tuple</code> is similar to a <code>list</code> in that they both support slicing and iteration. However, tuples are very different in at least one aspect: lists are <a href="https://realpython.com/courses/immutability-python/">mutable</a>, while tuples are not. To test this, run the following code. This script tries to change a value of a list:</p>
<div class="highlight python"><pre><span></span><span class="c1"># change_list.py</span>
<span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="n">my_list</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">9</span>
<span class="nb">print</span><span class="p">(</span><span class="n">my_list</span><span class="p">)</span>
</pre></div>
<p>The value located at the very first index of the list should be updated to <code>9</code>. If you execute this script, you will see that the list indeed gets modified:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python change_list.py
<span class="go">[9, 2, 3]</span>
</pre></div>
<p>The first value is no longer <code>1</code>, but the updated value <code>9</code>. Now, try to do the same with a tuple:</p>
<div class="highlight python"><pre><span></span><span class="c1"># change_tuple.py</span>
<span class="n">my_tuple</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">my_tuple</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">9</span>
<span class="nb">print</span><span class="p">(</span><span class="n">my_tuple</span><span class="p">)</span>
</pre></div>
<p>Here, you see the same values, except they’re held together as a tuple. If you try to execute this script, you will see that the Python interpreter raises an <a href="https://realpython.com/python-exceptions/">error</a>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python change_tuple.py
<span class="go">Traceback (most recent call last):</span>
<span class="go">  File "change_tuple.py", line 3, in &lt;module&gt;</span>
<span class="go">    my_tuple[0] = 9</span>
<span class="go">TypeError: 'tuple' object does not support item assignment</span>
</pre></div>
<p>This is because a tuple is an immutable object, and its values cannot be changed after assignment. Keep this in mind when you’re working with tuples and <code>*args</code>.</p>
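<p>If you do need to modify the values received through <code>*args</code>, one option is to copy them into a list first. The sketch below assumes a hypothetical function, <code>double_all()</code>, that isn't part of the examples above:</p>

```python
# double_all.py (hypothetical example)

def double_all(*args):
    # args is a tuple, so copy it into a list before changing items
    values = list(args)
    for index, value in enumerate(values):
        values[index] = value * 2
    return values

print(double_all(1, 2, 3))  # [2, 4, 6]
```

<p>The original <code>args</code> tuple stays untouched. Only the new list is modified.</p>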
<h2 id="using-the-python-kwargs-variable-in-function-definitions">Using the Python kwargs Variable in Function Definitions</h2>
<p>Okay, now you’ve understood what <code>*args</code> is for, but what about <code>**kwargs</code>? <code>**kwargs</code> works just like <code>*args</code>, but instead of accepting positional arguments it accepts keyword (or <strong>named</strong>) arguments. Take the following example:</p>
<div class="highlight python"><pre><span></span><span class="c1"># concatenate.py</span>
<span class="k">def</span> <span class="nf">concatenate</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="s2">""</span>
    <span class="c1"># Iterating over the Python kwargs dictionary</span>
    <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">kwargs</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">arg</span>
    <span class="k">return</span> <span class="n">result</span>

<span class="nb">print</span><span class="p">(</span><span class="n">concatenate</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="s2">"Real"</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="s2">"Python"</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s2">"Is"</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="s2">"Great"</span><span class="p">,</span> <span class="n">e</span><span class="o">=</span><span class="s2">"!"</span><span class="p">))</span>
</pre></div>
<p>When you execute the script above, <code>concatenate()</code> will iterate through the Python kwargs <a href="https://realpython.com/python-dicts/">dictionary</a> and concatenate all the values it finds:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python concatenate.py
<span class="go">RealPythonIsGreat!</span>
</pre></div>
<p>Like <code>args</code>, <code>kwargs</code> is just a name that can be changed to whatever you want. Again, what is important here is the use of the <strong>unpacking operator</strong> (<code>**</code>).</p>
<p>So, the previous example could be written like this:</p>
<div class="highlight python"><pre><span></span><span class="c1"># concatenate_2.py</span>
<span class="k">def</span> <span class="nf">concatenate</span><span class="p">(</span><span class="o">**</span><span class="n">words</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="s2">""</span>
    <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">words</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">arg</span>
    <span class="k">return</span> <span class="n">result</span>

<span class="nb">print</span><span class="p">(</span><span class="n">concatenate</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="s2">"Real"</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="s2">"Python"</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s2">"Is"</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="s2">"Great"</span><span class="p">,</span> <span class="n">e</span><span class="o">=</span><span class="s2">"!"</span><span class="p">))</span>
</pre></div>
<p>Note that in the example above the iterable object is a standard <code>dict</code>. If you <a href="https://realpython.com/iterate-through-dictionary-python/">iterate over the dictionary</a> and want to return its values, like in the example shown, then you must use <code>.values()</code>.</p>
<p>In fact, if you forget to use this method, you will find yourself iterating through the <strong>keys</strong> of your Python kwargs dictionary instead, like in the following example:</p>
<div class="highlight python"><pre><span></span><span class="c1"># concatenate_keys.py</span>
<span class="k">def</span> <span class="nf">concatenate</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="s2">""</span>
    <span class="c1"># Iterating over the keys of the Python kwargs dictionary</span>
    <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">kwargs</span><span class="p">:</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">arg</span>
    <span class="k">return</span> <span class="n">result</span>

<span class="nb">print</span><span class="p">(</span><span class="n">concatenate</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="s2">"Real"</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="s2">"Python"</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s2">"Is"</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="s2">"Great"</span><span class="p">,</span> <span class="n">e</span><span class="o">=</span><span class="s2">"!"</span><span class="p">))</span>
</pre></div>
<p>Now, if you try to execute this example, you’ll notice the following output:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python concatenate_keys.py
<span class="go">abcde</span>
</pre></div>
<p>As you can see, if you don’t specify <code>.values()</code>, your function will iterate over the keys of your Python kwargs dictionary, returning the wrong result.</p>
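<p>When you need both the names and the values of the keyword arguments, the standard dictionary method <code>.items()</code> gives you each pair. Here is a minimal sketch, where <code>describe()</code> is a hypothetical function used only for illustration:</p>

```python
# describe_kwargs.py (hypothetical example)

def describe(**kwargs):
    parts = []
    # .items() yields (key, value) pairs in the order the arguments were passed
    for name, value in kwargs.items():
        parts.append(f"{name}={value}")
    return ", ".join(parts)

print(describe(a="Real", b="Python"))  # a=Real, b=Python
```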
<h2 id="ordering-arguments-in-a-function">Ordering Arguments in a Function</h2>
<p>Now that you have learned what <code>*args</code> and <code>**kwargs</code> are for, you are ready to start writing functions that take a varying number of input arguments. But what if you want to create a function that takes a changeable number of both positional <em>and</em> named arguments?</p>
<p>In this case, you have to bear in mind that <strong>order counts</strong>. Just as non-default arguments have to precede default arguments, so <code>*args</code> must come before <code>**kwargs</code>.</p>
<p>To recap, the correct order for your parameters is:</p>
<ol>
<li>Standard arguments</li>
<li><code>*args</code> arguments</li>
<li><code>**kwargs</code> arguments</li>
</ol>
<p>For example, this function definition is correct:</p>
<div class="highlight python"><pre><span></span><span class="c1"># correct_function_definition.py</span>
<span class="k">def</span> <span class="nf">my_function</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
<p>The <code>*args</code> variable is appropriately listed before <code>**kwargs</code>. But what if you try to modify the order of the arguments? For example, consider the following function:</p>
<div class="highlight python"><pre><span></span><span class="c1"># wrong_function_definition.py</span>
<span class="k">def</span> <span class="nf">my_function</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
<p>Now, <code>**kwargs</code> comes before <code>*args</code> in the function definition. If you try to run this example, you’ll receive an error from the interpreter:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python wrong_function_definition.py
<span class="go">  File "wrong_function_definition.py", line 2</span>
<span class="go">    def my_function(a, b, **kwargs, *args):</span>
<span class="go">                                    ^</span>
<span class="go">SyntaxError: invalid syntax</span>
</pre></div>
<p>In this case, since <code>*args</code> comes after <code>**kwargs</code>, the Python interpreter raises a <code>SyntaxError</code>.</p>
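<p>To see where each argument ends up when all three kinds of parameters are combined, consider this small sketch. The function <code>show_arguments()</code> is hypothetical and simply returns everything it receives:</p>

```python
# show_arguments.py (hypothetical example)

def show_arguments(a, b, *args, **kwargs):
    # a and b take the first two positional arguments,
    # args collects the remaining positional arguments,
    # and kwargs collects all the keyword arguments
    return a, b, args, kwargs

print(show_arguments(1, 2, 3, 4, x=5, y=6))
# (1, 2, (3, 4), {'x': 5, 'y': 6})
```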
<h2 id="unpacking-with-the-asterisk-operators">Unpacking With the Asterisk Operators: <code>*</code> &amp; <code>**</code></h2>
<p>You are now able to use <code>*args</code> and <code>**kwargs</code> to define Python functions that take a varying number of input arguments. Let’s go a little deeper to understand something more about the <strong>unpacking operators</strong>.</p>
<p>The single and double asterisk unpacking operators were introduced in Python 2. As of the 3.5 release, they have become even more powerful, thanks to <a href="https://www.python.org/dev/peps/pep-0448/">PEP 448</a>. In short, the unpacking operators are operators that unpack the values from iterable objects in Python. The single asterisk operator <code>*</code> can be used on any iterable that Python provides, while the double asterisk operator <code>**</code> can only be used on dictionaries.</p>
<p>Let’s start with an example:</p>
<div class="highlight python"><pre><span></span><span class="c1"># print_list.py</span>
<span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="n">my_list</span><span class="p">)</span>
</pre></div>
<p>This code defines a list and then prints it to the standard output:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python print_list.py
<span class="go">[1, 2, 3]</span>
</pre></div>
<p>Note how the list is printed, along with the corresponding brackets and commas.</p>
<p>Now, try to prepend the unpacking operator <code>*</code> to the name of your list:</p>
<div class="highlight python"><pre><span></span><span class="c1"># print_unpacked_list.py</span>
<span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="o">*</span><span class="n">my_list</span><span class="p">)</span>
</pre></div>
<p>Here, the <code>*</code> operator tells <code>print()</code> to unpack the list first.</p>
<p>In this case, the output is no longer the list itself, but rather <em>the content</em> of the list:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python print_unpacked_list.py
<span class="go">1 2 3</span>
</pre></div>
<p>Can you see the difference between this execution and the one from <code>print_list.py</code>? Instead of a list, <code>print()</code> has taken three separate arguments as the input.</p>
<p>Another thing you’ll notice is that in <code>print_unpacked_list.py</code>, you used the unpacking operator <code>*</code> to call a function, instead of in a function definition. In this case, <code>print()</code> takes all the items of a list as though they were single arguments.</p>
<p>You can also use this method to call your own functions, but if your function requires a specific number of arguments, then the iterable you unpack must contain the same number of items.</p>
<p>To test this behavior, consider this script:</p>
<div class="highlight python"><pre><span></span><span class="c1"># unpacking_call.py</span>
<span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">+</span> <span class="n">c</span><span class="p">)</span>

<span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="n">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">my_list</span><span class="p">)</span>
</pre></div>
<p>Here, <code>my_sum()</code> explicitly states that <code>a</code>, <code>b</code>, and <code>c</code> are required arguments.</p>
<p>If you run this script, you’ll get the sum of the three numbers in <code>my_list</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python unpacking_call.py
<span class="go">6</span>
</pre></div>
<p>The 3 elements in <code>my_list</code> match up perfectly with the required arguments in <code>my_sum()</code>.</p>
<p>Now look at the following script, where <code>my_list</code> has 4 items instead of 3:</p>
<div class="highlight python"><pre><span></span><span class="c1"># wrong_unpacking_call.py</span>
<span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">+</span> <span class="n">c</span><span class="p">)</span>

<span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span>
<span class="n">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">my_list</span><span class="p">)</span>
</pre></div>
<p>In this example, <code>my_sum()</code> still expects just three arguments, but the <code>*</code> operator gets 4 items from the list. If you try to execute this script, you’ll see that the Python interpreter is unable to run it:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python wrong_unpacking_call.py
<span class="go">Traceback (most recent call last):</span>
<span class="go">  File "wrong_unpacking_call.py", line 6, in &lt;module&gt;</span>
<span class="go">    my_sum(*my_list)</span>
<span class="go">TypeError: my_sum() takes 3 positional arguments but 4 were given</span>
</pre></div>
<p>When you use the <code>*</code> operator to unpack a list and pass arguments to a function, it’s exactly as though you’re passing every single argument alone. This means that you can use multiple unpacking operators to get values from several lists and pass them all to a single function.</p>
<p>To test this behavior, consider the following example:</p>
<div class="highlight python"><pre><span></span><span class="c1"># sum_integers_args_3.py</span>
<span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">args</span><span class="p">:</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">x</span>
    <span class="k">return</span> <span class="n">result</span>

<span class="n">list1</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="n">list2</span> <span class="o">=</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
<span class="n">list3</span> <span class="o">=</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="n">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">list1</span><span class="p">,</span> <span class="o">*</span><span class="n">list2</span><span class="p">,</span> <span class="o">*</span><span class="n">list3</span><span class="p">))</span>
</pre></div>
<p>If you run this example, all three lists are unpacked. Each individual item is passed to <code>my_sum()</code>, resulting in the following output:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python sum_integers_args_3.py
<span class="go">45</span>
</pre></div>
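<p>The double asterisk operator <code>**</code> works in a similar way in function calls: it unpacks a dictionary into keyword arguments. In this minimal sketch, <code>greet()</code> and <code>options</code> are hypothetical names, and the dictionary keys are assumed to match the function's parameter names:</p>

```python
# unpacking_dict_call.py (hypothetical example)

def greet(greeting, name):
    return f"{greeting}, {name}!"

# Each key must be a string that matches a parameter name of greet()
options = {"greeting": "Hello", "name": "Real Python"}

# Equivalent to greet(greeting="Hello", name="Real Python")
print(greet(**options))  # Hello, Real Python!
```

<p>If the dictionary contained a key that isn't a parameter of <code>greet()</code>, then the call would raise a <code>TypeError</code>.</p>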
<p>There are other convenient uses of the unpacking operator. For example, say you need to split a list into three different parts. The output should show the first value, the last value, and all the values in between. With the unpacking operator, you can do this in just one line of code:</p>
<div class="highlight python"><pre><span></span><span class="c1"># extract_list_body.py</span>
<span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
<span class="n">a</span><span class="p">,</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="n">c</span> <span class="o">=</span> <span class="n">my_list</span>
<span class="nb">print</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
</pre></div>
<p>In this example, <code>my_list</code> contains 6 items. The first value is assigned to <code>a</code>, the last to <code>c</code>, and all other values are packed into a new list <code>b</code>. If you run the <a href="https://realpython.com/run-python-scripts/">script</a>, <code>print()</code> will show you that your three variables have the values you would expect:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python extract_list_body.py
<span class="go">1</span>
<span class="go">[2, 3, 4, 5]</span>
<span class="go">6</span>
</pre></div>
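<p>The starred name doesn't have to sit in the middle. As a short sketch with hypothetical variable names, you can also collect everything after the first item or everything before the last one:</p>

```python
# extract_head_and_tail.py (hypothetical example)
my_list = [1, 2, 3, 4, 5, 6]

# The starred name collects whatever the other names don't take
first, *rest = my_list
*init, last = my_list

print(first, rest)  # 1 [2, 3, 4, 5, 6]
print(init, last)   # [1, 2, 3, 4, 5] 6

# If nothing is left over, the starred name becomes an empty list
a, *b, c = [1, 2]
print(b)  # []
```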
<p>Another interesting thing you can do with the unpacking operator <code>*</code> is to split the items of any iterable object. This could be very useful if you need to merge two lists, for instance:</p>
<div class="highlight python"><pre><span></span><span class="c1"># merging_lists.py</span>
<span class="n">my_first_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="n">my_second_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
<span class="n">my_merged_list</span> <span class="o">=</span> <span class="p">[</span><span class="o">*</span><span class="n">my_first_list</span><span class="p">,</span> <span class="o">*</span><span class="n">my_second_list</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="n">my_merged_list</span><span class="p">)</span>
</pre></div>
<p>The unpacking operator <code>*</code> is prepended to both <code>my_first_list</code> and <code>my_second_list</code>.</p>
<p>If you run this script, you’ll see that the result is a merged list:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python merging_lists.py
<span class="go">[1, 2, 3, 4, 5, 6]</span>
</pre></div>
<p>You can even merge two different dictionaries by using the unpacking operator <code>**</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># merging_dicts.py</span>
<span class="n">my_first_dict</span> <span class="o">=</span> <span class="p">{</span><span class="s2">"A"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s2">"B"</span><span class="p">:</span> <span class="mi">2</span><span class="p">}</span>
<span class="n">my_second_dict</span> <span class="o">=</span> <span class="p">{</span><span class="s2">"C"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="s2">"D"</span><span class="p">:</span> <span class="mi">4</span><span class="p">}</span>
<span class="n">my_merged_dict</span> <span class="o">=</span> <span class="p">{</span><span class="o">**</span><span class="n">my_first_dict</span><span class="p">,</span> <span class="o">**</span><span class="n">my_second_dict</span><span class="p">}</span>
<span class="nb">print</span><span class="p">(</span><span class="n">my_merged_dict</span><span class="p">)</span>
</pre></div>
<p>Here, the iterables to merge are <code>my_first_dict</code> and <code>my_second_dict</code>.</p>
<p>Executing this code outputs a merged dictionary:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python merging_dicts.py
<span class="go">{'A': 1, 'B': 2, 'C': 3, 'D': 4}</span>
</pre></div>
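<p>The two dictionaries above share no keys. As a minimal sketch of what happens when they do, the following example uses two hypothetical settings dictionaries (<code>defaults</code> and <code>overrides</code> are illustrative names, not part of the original scripts). When a key appears more than once, the value from the right-most dictionary wins:</p>

```python
# Hypothetical settings dictionaries, used only for illustration.
defaults = {"host": "localhost", "port": 8000}
overrides = {"port": 9000, "debug": True}

# Keys are inserted left to right; a repeated key keeps its original
# position but takes the value from the right-most dictionary.
merged = {**defaults, **overrides}
print(merged)  # {'host': 'localhost', 'port': 9000, 'debug': True}
```

<p>Swapping the order to <code>{**overrides, **defaults}</code> would keep <code>8000</code> as the port, so the order of unpacking matters whenever keys overlap.</p>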
<p>Remember that the <code>*</code> operator works on <em>any</em> iterable object. It can also be used to unpack a <a href="https://realpython.com/python-strings/">string</a>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># string_to_list.py</span>
<span class="n">a</span> <span class="o">=</span> <span class="p">[</span><span class="o">*</span><span class="s2">"RealPython"</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
</pre></div>
<p>In Python, strings are iterable objects, so <code>*</code> will unpack the string and place its individual characters in the list <code>a</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python string_to_list.py
<span class="go">['R', 'e', 'a', 'l', 'P', 'y', 't', 'h', 'o', 'n']</span>
</pre></div>
<p>The previous example seems great, but when you work with these operators it’s important to keep in mind the seventh rule of <a href="https://www.python.org/dev/peps/pep-0020/"><em>The Zen of Python</em></a> by Tim Peters: <em>Readability counts</em>.</p>
<p>To see why, consider the following example:</p>
<div class="highlight python"><pre><span></span><span class="c1"># mysterious_statement.py</span>
<span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="o">=</span> <span class="s2">"RealPython"</span>
<span class="nb">print</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
</pre></div>
<p>There’s the unpacking operator <code>*</code>, followed by a variable, a comma, and an assignment. That’s a lot packed into one line! In fact, this code is no different from the previous example. It just takes the string <code>RealPython</code> and assigns all the items to the new list <code>a</code>, thanks to the unpacking operator <code>*</code>.</p>
<p>The comma after the <code>a</code> does the trick. When you use the unpacking operator in an assignment, Python requires the starred name to appear inside a list or tuple of targets. The trailing comma turns the left-hand side into a one-element tuple whose only target is the starred variable <code>a</code>, which then collects every item into a list.</p>
<p>While this is a neat trick, many Pythonistas would not consider this code to be very readable. As such, it’s best to use these kinds of constructions sparingly.</p>
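<p>As a rough sketch of more readable alternatives, the example below builds the same list with <code>list()</code> and then shows a case where starred assignment arguably earns its place: splitting a sequence into a head, a middle, and a tail. The variable names are illustrative only:</p>

```python
text = "RealPython"

# Explicit conversion: same result as `*a, = text`, but easier to read.
letters = list(text)
print(letters)

# Starred assignment is more useful when it splits a sequence into parts.
first, *middle, last = text
print(first, middle, last)  # R ['e', 'a', 'l', 'P', 'y', 't', 'h', 'o'] n
```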
<h2 id="conclusion">Conclusion</h2>
<p>You are now able to use <strong><code>*args</code></strong> and <strong><code>**kwargs</code></strong> to accept a variable number of arguments in your functions. You have also learned more about the unpacking operators.</p>
<p>You’ve learned:</p>
<ul>
<li>What <code>*args</code> and <code>**kwargs</code> actually mean</li>
<li>How to use <code>*args</code> and <code>**kwargs</code> in function definitions</li>
<li>How to use a single asterisk (<code>*</code>) to unpack iterables</li>
<li>How to use two asterisks (<code>**</code>) to unpack dictionaries</li>
</ul>
<p>If you still have questions, don’t hesitate to reach out in the comments section below! To learn more about the use of the asterisks in Python, have a look at <a href="https://treyhunner.com/2018/10/asterisks-in-python-what-they-are-and-how-to-use-them/">Trey Hunner’s article on the subject</a>.</p>
<hr />
<p><em>[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&utm_medium=rss&utm_campaign=footer">>> Click here to learn more and see examples</a> ]</em></p>
Lists and Tuples in Pythonhttps://realpython.com/courses/lists-tuples-python/2019-09-03T14:00:00+00:00In this course, you'll cover the important characteristics of lists and tuples in Python 3. You'll learn how to define them and how to manipulate them. When you're finished, you'll have a good feel for when and how to use these object types in a Python program.
<p>In this course, you’ll learn about working with lists and tuples. <strong>Lists</strong> and <strong>tuples</strong> are arguably Python’s most versatile, useful <a href="https://realpython.com/python-data-types/">data types</a>. You’ll find them in virtually every non-trivial Python program.</p>
<p><strong>Here’s what you’ll learn in this tutorial:</strong> You’ll cover the important characteristics of lists and tuples. You’ll learn how to define them and how to manipulate them. When you’re finished, you’ll have a good feel for when and how to use these object types in a Python program.</p>
<div class="alert alert-primary" role="alert">
<p><strong><i class="fa fa-graduation-cap" aria-hidden="true"></i> Take the Quiz:</strong> Test your knowledge with our interactive “Python Lists and Tuples” quiz. Upon completion you will receive a score so you can track your learning progress over time:</p><p class="text-center my-2"><a class="btn btn-primary" href="/quizzes/python-lists-tuples/" target="_blank">Take the Quiz »</a></p>
</div>
Natural Language Processing With spaCy in Pythonhttps://realpython.com/natural-language-processing-spacy-python/2019-09-02T14:00:00+00:00In this step-by-step tutorial, you'll learn how to use spaCy. This free and open-source library for Natural Language Processing (NLP) in Python has a lot of built-in capabilities and is becoming increasingly popular for processing and analyzing data in NLP.
<p><strong>spaCy</strong> is a free and open-source library for <strong>Natural Language Processing</strong> (NLP) in Python with a lot of in-built capabilities. It’s becoming increasingly popular for processing and analyzing data in NLP. Unstructured textual data is produced at a large scale, and it’s important to process and derive insights from unstructured data. To do that, you need to represent the data in a format that can be understood by computers. NLP can help you do that.</p>
<p><strong>In this tutorial, you’ll learn:</strong></p>
<ul>
<li>What the foundational terms and concepts in NLP are</li>
<li>How to implement those concepts in spaCy</li>
<li>How to customize and extend built-in functionalities in spaCy</li>
<li>How to perform basic statistical analysis on a text</li>
<li>How to create a pipeline to process unstructured text</li>
<li>How to parse a sentence and extract meaningful insights from it</li>
</ul>
<div class="alert alert-warning" role="alert"><p><strong>Free Bonus:</strong> <a href="" class="alert-link" data-toggle="modal" data-target="#modal-python-tricks-sample" data-focus="false">Click here to get access to a chapter from Python Tricks: The Book</a> that shows you Python's best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.</p></div>
<h2 id="what-are-nlp-and-spacy">What Are NLP and spaCy?</h2>
<p><strong>NLP</strong> is a subfield of <strong>Artificial Intelligence</strong> and is concerned with interactions between computers and human languages. NLP is the process by which computers analyze, understand, and derive meaning from human languages.</p>
<p>NLP helps you extract insights from unstructured text and has several use cases, such as:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Automatic_summarization">Automatic summarization</a></li>
<li><a href="https://en.wikipedia.org/wiki/Named-entity_recognition">Named entity recognition</a></li>
<li><a href="https://en.wikipedia.org/wiki/Question_answering">Question answering systems</a></li>
<li><a href="https://en.wikipedia.org/wiki/Sentiment_analysis">Sentiment analysis</a></li>
</ul>
<p>spaCy is a free, open-source library for NLP in Python. It’s written in <a href="https://cython.org/">Cython</a> and is designed to build information extraction or natural language understanding systems. It’s built for production use and provides a concise and user-friendly API.</p>
<h2 id="installation">Installation</h2>
<p>In this section, you’ll install spaCy and then download data and models for the English language.</p>
<h3 id="how-to-install-spacy">How to Install spaCy</h3>
<p>spaCy can be installed using <strong><code>pip</code></strong>, a Python package manager. You can use a <strong>virtual environment</strong> to avoid depending on system-wide packages. To learn more about virtual environments and <code>pip</code>, check out <a href="https://realpython.com/what-is-pip/">What Is Pip? A Guide for New Pythonistas</a> and <a href="https://realpython.com/python-virtual-environments-a-primer/">Python Virtual Environments: A Primer</a>.</p>
<p>Create a new virtual environment:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python3 -m venv env
</pre></div>
<p>Activate this virtual environment and install spaCy:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">source</span> ./env/bin/activate
<span class="gp">$</span> pip install spacy
</pre></div>
<h3 id="how-to-download-models-and-data">How to Download Models and Data</h3>
<p>spaCy has <a href="https://spaCy.io/models">different types</a> of models. The default model for the English language is <code>en_core_web_sm</code>.</p>
<p>Activate the virtual environment created in the previous step and download models and data for the English language:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python -m spacy download en_core_web_sm
</pre></div>
<p>Verify that the download was successful by loading the model:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">spacy</span>
<span class="gp">>>> </span><span class="n">nlp</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'en_core_web_sm'</span><span class="p">)</span>
</pre></div>
<p>If the <code>nlp</code> object is created, then it means that spaCy was installed and that models and data were successfully downloaded.</p>
<h2 id="using-spacy">Using spaCy</h2>
<p>In this section, you’ll use spaCy for a given input string and a text file. Load the language model instance in spaCy:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">spacy</span>
<span class="gp">>>> </span><span class="n">nlp</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'en_core_web_sm'</span><span class="p">)</span>
</pre></div>
<p>Here, the <code>nlp</code> object is a language model instance. You can assume that, throughout this tutorial, <code>nlp</code> refers to the language model loaded by <code>en_core_web_sm</code>. Now you can use spaCy to read a string or a text file.</p>
<h3 id="how-to-read-a-string">How to Read a String</h3>
<p>You can use spaCy to create a processed <a href="https://spaCy.io/api/doc">Doc</a> object, which is a container for accessing linguistic annotations, for a given input string:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">introduction_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'This tutorial is about Natural'</span>
<span class="gp">... </span> <span class="s1">' Language Processing in Spacy.'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">introduction_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">introduction_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Extract tokens for the given doc</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">introduction_doc</span><span class="p">])</span>
<span class="go">['This', 'tutorial', 'is', 'about', 'Natural', 'Language',</span>
<span class="go">'Processing', 'in', 'Spacy', '.']</span>
</pre></div>
<p>In the above example, notice how the text is converted to an object that is understood by spaCy. You can use this method to convert any text into a processed <code>Doc</code> object and deduce attributes, which will be covered in the coming sections.</p>
<h3 id="how-to-read-a-text-file">How to Read a Text File</h3>
<p>In this section, you’ll create a processed <a href="https://spaCy.io/api/doc">Doc</a> object for a text file:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">file_name</span> <span class="o">=</span> <span class="s1">'introduction.txt'</span>
<span class="gp">>>> </span><span class="n">introduction_file_text</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">file_name</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">introduction_file_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">introduction_file_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Extract tokens for the given doc</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">introduction_file_doc</span><span class="p">])</span>
<span class="go">['This', 'tutorial', 'is', 'about', 'Natural', 'Language',</span>
<span class="go">'Processing', 'in', 'Spacy', '.', '\n']</span>
</pre></div>
<p>This is how you can convert a text file into a processed <code>Doc</code> object.</p>
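<p>The <code>open(file_name).read()</code> call above never closes the file explicitly. A minimal sketch of the file-reading step alone, assuming a UTF-8 encoded file, uses a <code>with</code> statement instead. The temporary directory exists only to make the example self-contained, and the resulting string can be passed to <code>nlp()</code> exactly as before:</p>

```python
import tempfile
from pathlib import Path

# Create a throwaway file so the example is self-contained.
# The file name and contents mirror the tutorial's introduction.txt.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "introduction.txt"
    file_path.write_text(
        "This tutorial is about Natural Language Processing in Spacy.\n",
        encoding="utf-8",
    )

    # The with statement closes the file even if reading fails.
    with open(file_path, encoding="utf-8") as text_file:
        introduction_file_text = text_file.read()

print(repr(introduction_file_text))
```

<p>The trailing newline in the file is preserved in the string, which is why the token list above ends with <code>'\n'</code>.</p>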
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> </p>
<p>You can assume that:</p>
<ul>
<li>Variable names ending with the suffix <strong><code>_text</code></strong> are <strong><a href="https://realpython.com/python-encodings-guide/">Unicode</a> string objects</strong>.</li>
<li>Variable names ending with the suffix <strong><code>_doc</code></strong> are <strong>spaCy’s processed <code>Doc</code> objects</strong>.</li>
</ul>
</div>
<h2 id="sentence-detection">Sentence Detection</h2>
<p><strong>Sentence Detection</strong> is the process of locating the start and end of sentences in a given text. This allows you to divide a text into linguistically meaningful units. You’ll use these units when you’re processing your text to perform tasks such as <strong>part of speech tagging</strong> and <strong>entity extraction</strong>.</p>
<p>In spaCy, the <code>sents</code> property is used to extract sentences. Here’s how you would extract the total number of sentences and the sentences for a given input text:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">about_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'Gus Proto is a Python developer currently'</span>
<span class="gp">... </span> <span class="s1">' working for a London-based Fintech'</span>
<span class="gp">... </span> <span class="s1">' company. He is interested in learning'</span>
<span class="gp">... </span> <span class="s1">' Natural Language Processing.'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">about_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">about_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">sentences</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">about_doc</span><span class="o">.</span><span class="n">sents</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">len</span><span class="p">(</span><span class="n">sentences</span><span class="p">)</span>
<span class="go">2</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">sentence</span> <span class="ow">in</span> <span class="n">sentences</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">sentence</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Gus Proto is a Python developer currently working for a</span>
<span class="go">London-based Fintech company.</span>
<span class="go">He is interested in learning Natural Language Processing.</span>
</pre></div>
<p>In the above example, spaCy correctly identifies the sentences in the English text, using a full stop (<code>.</code>) as the sentence delimiter. You can also customize sentence detection to split on custom delimiters.</p>
<p>Here’s an example where an ellipsis (<code>...</code>) is used as the delimiter:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">set_custom_boundaries</span><span class="p">(</span><span class="n">doc</span><span class="p">):</span>
<span class="gp">... </span>    <span class="c1"># Adds support to use `...` as the delimiter for sentence detection</span>
<span class="gp">... </span>    <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">doc</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span>
<span class="gp">... </span>        <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="o">==</span> <span class="s1">'...'</span><span class="p">:</span>
<span class="gp">... </span>            <span class="n">doc</span><span class="p">[</span><span class="n">token</span><span class="o">.</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">is_sent_start</span> <span class="o">=</span> <span class="kc">True</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="n">doc</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">ellipsis_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'Gus, can you, ... never mind, I forgot'</span>
<span class="gp">... </span> <span class="s1">' what I was saying. So, do you think'</span>
<span class="gp">... </span> <span class="s1">' we should ...'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Load a new model instance</span>
<span class="gp">>>> </span><span class="n">custom_nlp</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'en_core_web_sm'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">custom_nlp</span><span class="o">.</span><span class="n">add_pipe</span><span class="p">(</span><span class="n">set_custom_boundaries</span><span class="p">,</span> <span class="n">before</span><span class="o">=</span><span class="s1">'parser'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">custom_ellipsis_doc</span> <span class="o">=</span> <span class="n">custom_nlp</span><span class="p">(</span><span class="n">ellipsis_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">custom_ellipsis_sentences</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">custom_ellipsis_doc</span><span class="o">.</span><span class="n">sents</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">sentence</span> <span class="ow">in</span> <span class="n">custom_ellipsis_sentences</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">sentence</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Gus, can you, ...</span>
<span class="go">never mind, I forgot what I was saying.</span>
<span class="go">So, do you think we should ...</span>
<span class="gp">>>> </span><span class="c1"># Sentence Detection with no customization</span>
<span class="gp">>>> </span><span class="n">ellipsis_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">ellipsis_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">ellipsis_sentences</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">ellipsis_doc</span><span class="o">.</span><span class="n">sents</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">sentence</span> <span class="ow">in</span> <span class="n">ellipsis_sentences</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">sentence</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Gus, can you, ... never mind, I forgot what I was saying.</span>
<span class="go">So, do you think we should ...</span>
</pre></div>
<p>Note that <code>custom_ellipsis_sentences</code> contains three sentences, whereas <code>ellipsis_sentences</code> contains two sentences. These sentences are still obtained via the <code>sents</code> attribute, as you saw before.</p>
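<p>To see the boundary rule in isolation, here is a plain-Python sketch that applies the same idea to a list of strings rather than to a spaCy <code>Doc</code>. The function <code>mark_sentence_starts()</code> and its <code>delimiter</code> parameter are hypothetical names for illustration, and the sketch covers only the custom rule, not the decisions spaCy’s own pipeline makes afterwards:</p>

```python
def mark_sentence_starts(tokens, delimiter="..."):
    """Return one boolean per token: True where a new sentence starts."""
    starts = [False] * len(tokens)
    if tokens:
        starts[0] = True  # the first token always opens a sentence
    # Skip the last token: nothing follows it, so there is nothing to flag.
    for i, token in enumerate(tokens[:-1]):
        if token == delimiter:
            starts[i + 1] = True
    return starts


tokens = ["Gus", ",", "can", "you", ",", "...", "never", "mind"]
flags = mark_sentence_starts(tokens)
print([tok for tok, flag in zip(tokens, flags) if flag])  # ['Gus', 'never']
```

<p>Iterating over <code>tokens[:-1]</code> mirrors the <code>doc[:-1]</code> slice in <code>set_custom_boundaries()</code>: it avoids indexing past the end when the text finishes with the delimiter.</p>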
<h2 id="tokenization-in-spacy">Tokenization in spaCy</h2>
<p><strong>Tokenization</strong> is the next step after sentence detection. It allows you to identify the basic units in your text. These basic units are called <strong>tokens</strong>. Tokenization is useful because it breaks a text into meaningful units. These units are used for further analysis, like part of speech tagging.</p>
<p>In spaCy, you can print tokens by iterating on the <code>Doc</code> object:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">idx</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Gus 0</span>
<span class="go">Proto 4</span>
<span class="go">is 10</span>
<span class="go">a 13</span>
<span class="go">Python 15</span>
<span class="go">developer 22</span>
<span class="go">currently 32</span>
<span class="go">working 42</span>
<span class="go">for 50</span>
<span class="go">a 54</span>
<span class="go">London 56</span>
<span class="go">- 62</span>
<span class="go">based 63</span>
<span class="go">Fintech 69</span>
<span class="go">company 77</span>
<span class="go">. 84</span>
<span class="go">He 86</span>
<span class="go">is 89</span>
<span class="go">interested 92</span>
<span class="go">in 103</span>
<span class="go">learning 106</span>
<span class="go">Natural 115</span>
<span class="go">Language 123</span>
<span class="go">Processing 132</span>
<span class="go">. 142</span>
</pre></div>
<p>Note how spaCy preserves the <strong>starting index</strong> of the tokens. It’s useful for in-place word replacement. spaCy provides <a href="https://spacy.io/api/token#attributes">various attributes</a> for the <code>Token</code> class:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">idx</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">text_with_ws</span><span class="p">,</span>
<span class="gp">... </span>          <span class="n">token</span><span class="o">.</span><span class="n">is_alpha</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">is_punct</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">is_space</span><span class="p">,</span>
<span class="gp">... </span>          <span class="n">token</span><span class="o">.</span><span class="n">shape_</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">is_stop</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Gus 0 Gus True False False Xxx False</span>
<span class="go">Proto 4 Proto True False False Xxxxx False</span>
<span class="go">is 10 is True False False xx True</span>
<span class="go">a 13 a True False False x True</span>
<span class="go">Python 15 Python True False False Xxxxx False</span>
<span class="go">developer 22 developer True False False xxxx False</span>
<span class="go">currently 32 currently True False False xxxx False</span>
<span class="go">working 42 working True False False xxxx False</span>
<span class="go">for 50 for True False False xxx True</span>
<span class="go">a 54 a True False False x True</span>
<span class="go">London 56 London True False False Xxxxx False</span>
<span class="go">- 62 - False True False - False</span>
<span class="go">based 63 based True False False xxxx False</span>
<span class="go">Fintech 69 Fintech True False False Xxxxx False</span>
<span class="go">company 77 company True False False xxxx False</span>
<span class="go">. 84 . False True False . False</span>
<span class="go">He 86 He True False False Xx True</span>
<span class="go">is 89 is True False False xx True</span>
<span class="go">interested 92 interested True False False xxxx False</span>
<span class="go">in 103 in True False False xx True</span>
<span class="go">learning 106 learning True False False xxxx False</span>
<span class="go">Natural 115 Natural True False False Xxxxx False</span>
<span class="go">Language 123 Language True False False Xxxxx False</span>
<span class="go">Processing 132 Processing True False False Xxxxx False</span>
<span class="go">. 142 . False True False . False</span>
</pre></div>
<p>In this example, some of the commonly required attributes are accessed:</p>
<ul>
<li><strong><code>text_with_ws</code></strong> prints token text with trailing space (if present).</li>
<li><strong><code>is_alpha</code></strong> detects if the token consists of alphabetic characters or not.</li>
<li><strong><code>is_punct</code></strong> detects if the token is a punctuation symbol or not.</li>
<li><strong><code>is_space</code></strong> detects if the token is a space or not.</li>
<li><strong><code>shape_</code></strong> prints out the shape of the word.</li>
<li><strong><code>is_stop</code></strong> detects if the token is a stop word or not.</li>
</ul>
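<p>The shape values in the output above, such as <code>Xxxxx</code> for <code>Python</code> and <code>xxxx</code> for <code>developer</code>, follow a simple pattern. The function below is a plain-Python approximation that reproduces those values. It is a sketch based on the output shown here, not spaCy’s actual implementation, and <code>word_shape()</code> is an illustrative name:</p>

```python
def word_shape(text):
    """Approximate a token's shape: X/x for letters, d for digits."""
    shape = []
    previous = ""
    run_length = 0
    for char in text:
        if char.isalpha():
            symbol = "X" if char.isupper() else "x"
        elif char.isdigit():
            symbol = "d"
        else:
            symbol = char  # punctuation is kept as-is
        if symbol == previous:
            run_length += 1
        else:
            run_length = 1
            previous = symbol
        # Keep at most four consecutive copies of the same symbol.
        if run_length <= 4:
            shape.append(symbol)
    return "".join(shape)


for word in ["Gus", "Python", "developer", "-"]:
    print(word, word_shape(word))
```

<p>Capping each run at four symbols is why long lowercase words all collapse to <code>xxxx</code>, which keeps the number of distinct shapes small.</p>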
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> You’ll learn more about <strong>stop words</strong> in the next section.</p>
</div>
<p>You can also customize the tokenization process to detect tokens on custom characters. This is often used for hyphenated words, which are words joined with a hyphen. For example, “London-based” is a hyphenated word.</p>
<p>spaCy allows you to customize tokenization by updating the <code>tokenizer</code> property on the <code>nlp</code> object:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">re</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">spacy</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">spacy.tokenizer</span> <span class="k">import</span> <span class="n">Tokenizer</span>
<span class="gp">>>> </span><span class="n">custom_nlp</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'en_core_web_sm'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">prefix_re</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">compile_prefix_regex</span><span class="p">(</span><span class="n">custom_nlp</span><span class="o">.</span><span class="n">Defaults</span><span class="o">.</span><span class="n">prefixes</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">suffix_re</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">compile_suffix_regex</span><span class="p">(</span><span class="n">custom_nlp</span><span class="o">.</span><span class="n">Defaults</span><span class="o">.</span><span class="n">suffixes</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">infix_re</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'''[-~]'''</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">customize_tokenizer</span><span class="p">(</span><span class="n">nlp</span><span class="p">):</span>
<span class="gp">... </span>    <span class="c1"># Adds support to use `-` as the delimiter for tokenization</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">nlp</span><span class="o">.</span><span class="n">vocab</span><span class="p">,</span> <span class="n">prefix_search</span><span class="o">=</span><span class="n">prefix_re</span><span class="o">.</span><span class="n">search</span><span class="p">,</span>
<span class="gp">... </span>                     <span class="n">suffix_search</span><span class="o">=</span><span class="n">suffix_re</span><span class="o">.</span><span class="n">search</span><span class="p">,</span>
<span class="gp">... </span>                     <span class="n">infix_finditer</span><span class="o">=</span><span class="n">infix_re</span><span class="o">.</span><span class="n">finditer</span><span class="p">,</span>
<span class="gp">... </span>                     <span class="n">token_match</span><span class="o">=</span><span class="kc">None</span>
<span class="gp">... </span>                     <span class="p">)</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">custom_nlp</span><span class="o">.</span><span class="n">tokenizer</span> <span class="o">=</span> <span class="n">customize_tokenizer</span><span class="p">(</span><span class="n">custom_nlp</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">custom_tokenizer_about_doc</span> <span class="o">=</span> <span class="n">custom_nlp</span><span class="p">(</span><span class="n">about_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">custom_tokenizer_about_doc</span><span class="p">])</span>
<span class="go">['Gus', 'Proto', 'is', 'a', 'Python', 'developer', 'currently',</span>
<span class="go">'working', 'for', 'a', 'London', '-', 'based', 'Fintech',</span>
<span class="go">'company', '.', 'He', 'is', 'interested', 'in', 'learning',</span>
<span class="go">'Natural', 'Language', 'Processing', '.']</span>
</pre></div>
<p>To customize the tokenizer, you can pass various parameters to the <code>Tokenizer</code> class:</p>
<ul>
<li><strong><code>nlp.vocab</code></strong> is a storage container for special cases and is used to handle cases like contractions and emoticons.</li>
<li><strong><code>prefix_search</code></strong> is the function that is used to handle preceding punctuation, such as opening parentheses.</li>
<li><strong><code>infix_finditer</code></strong> is the function that is used to handle non-whitespace separators, such as hyphens.</li>
<li><strong><code>suffix_search</code></strong> is the function that is used to handle succeeding punctuation, such as closing parentheses.</li>
<li><strong><code>token_match</code></strong> is an optional boolean function that is used to match strings that should never be split. It overrides the previous rules and is useful for entities like URLs or numbers.</li>
</ul>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> spaCy already splits hyphenated words into individual tokens by default. The code above is just an example of how you can customize tokenization. You can apply the same approach to any other character.</p>
</div>
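<p>To see what an infix pattern does in isolation, here’s a minimal sketch that uses only Python’s <code>re</code> module. It isn’t spaCy’s tokenization algorithm: <code>split_on_infixes()</code> is a hypothetical helper that reuses the <code>[-~]</code> pattern from the example above and splits a single whitespace-free chunk wherever <code>finditer()</code> reports a match:</p>

```python
import re

# Same character class as the infix pattern in the example above.
infix_re = re.compile(r'''[-~]''')

def split_on_infixes(chunk):
    """Split chunk around every infix match, keeping each infix as its own piece.

    Hypothetical helper for illustration; spaCy's Tokenizer also applies
    special cases, prefixes, and suffixes, which are ignored here.
    """
    pieces = []
    start = 0
    for match in infix_re.finditer(chunk):
        if match.start() > start:
            pieces.append(chunk[start:match.start()])
        pieces.append(match.group())
        start = match.end()
    if start < len(chunk):
        pieces.append(chunk[start:])
    return pieces

print(split_on_infixes("London-based"))  # ['London', '-', 'based']
```

<p>Because the function only needs an object with a <code>finditer()</code> method, you could swap in a different compiled pattern to experiment with other separator characters.</p>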
<h2 id="stop-words">Stop Words</h2>
<p><strong>Stop words</strong> are the most common words in a language. In the English language, some examples of stop words are <code>the</code>, <code>are</code>, <code>but</code>, and <code>they</code>. Most sentences need to contain stop words in order to be full sentences that make sense.</p>
<p>Generally, stop words are removed because they aren’t significant and distort the word frequency analysis. spaCy has a list of stop words for the English language:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">spacy</span>
<span class="gp">>>> </span><span class="n">spacy_stopwords</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">lang</span><span class="o">.</span><span class="n">en</span><span class="o">.</span><span class="n">stop_words</span><span class="o">.</span><span class="n">STOP_WORDS</span>
<span class="gp">>>> </span><span class="nb">len</span><span class="p">(</span><span class="n">spacy_stopwords</span><span class="p">)</span>
<span class="go">326</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">stop_word</span> <span class="ow">in</span> <span class="nb">list</span><span class="p">(</span><span class="n">spacy_stopwords</span><span class="p">)[:</span><span class="mi">10</span><span class="p">]:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">stop_word</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">using</span>
<span class="go">becomes</span>
<span class="go">had</span>
<span class="go">itself</span>
<span class="go">once</span>
<span class="go">often</span>
<span class="go">is</span>
<span class="go">herein</span>
<span class="go">who</span>
<span class="go">too</span>
</pre></div>
<p>You can remove stop words from the input text:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span><span class="p">:</span>
<span class="gp">... </span>    <span class="k">if</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">is_stop</span><span class="p">:</span>
<span class="gp">... </span>        <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Gus</span>
<span class="go">Proto</span>
<span class="go">Python</span>
<span class="go">developer</span>
<span class="go">currently</span>
<span class="go">working</span>
<span class="go">London</span>
<span class="go">-</span>
<span class="go">based</span>
<span class="go">Fintech</span>
<span class="go">company</span>
<span class="go">.</span>
<span class="go">interested</span>
<span class="go">learning</span>
<span class="go">Natural</span>
<span class="go">Language</span>
<span class="go">Processing</span>
<span class="go">.</span>
</pre></div>
<p>Stop words like <code>is</code>, <code>a</code>, <code>for</code>, <code>the</code>, and <code>in</code> are not printed in the output above. You can also create a list of tokens not containing stop words:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">about_no_stopword_doc</span> <span class="o">=</span> <span class="p">[</span><span class="n">token</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">is_stop</span><span class="p">]</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">about_no_stopword_doc</span><span class="p">)</span>
<span class="go">[Gus, Proto, Python, developer, currently, working, London,</span>
<span class="go">-, based, Fintech, company, ., interested, learning, Natural,</span>
<span class="go">Language, Processing, .]</span>
</pre></div>
<p>You can join the text of the tokens in <code>about_no_stopword_doc</code> with spaces to form a sentence with no stop words.</p>
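<p>The filter-then-join idea can also be sketched without spaCy. The example below is an approximation under stated assumptions: <code>SAMPLE_STOP_WORDS</code> is a tiny hand-picked set (not spaCy’s <code>STOP_WORDS</code>), <code>remove_stop_words()</code> is a hypothetical helper, and the text is split on whitespace only, so punctuation stays attached to words:</p>

```python
# A tiny, hand-picked stop word set for illustration only.
# spaCy's real English list is much longer.
SAMPLE_STOP_WORDS = {"is", "a", "for", "the", "in", "he"}

def remove_stop_words(text, stop_words=SAMPLE_STOP_WORDS):
    """Drop whitespace-separated words found in stop_words (case-insensitive)."""
    kept = [word for word in text.split() if word.lower() not in stop_words]
    return " ".join(kept)

sample_text = ("Gus Proto is a Python developer currently working for a "
               "London-based Fintech company. He is interested in learning "
               "Natural Language Processing.")
print(remove_stop_words(sample_text))
```

<p>With spaCy tokens, the same join would use each token’s <code>text</code> attribute instead of plain strings.</p>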
<h2 id="lemmatization">Lemmatization</h2>
<p><strong>Lemmatization</strong> is the process of reducing inflected forms of a word while still ensuring that the reduced form belongs to the language. This reduced form or root word is called a <strong>lemma</strong>.</p>
<p>For example, <em>organizes</em>, <em>organized</em> and <em>organizing</em> are all forms of <em>organize</em>. Here, <em>organize</em> is the lemma. The inflection of a word allows you to express different grammatical categories like tense (<em>organized</em> vs <em>organize</em>), number (<em>trains</em> vs <em>train</em>), and so on. Lemmatization is necessary because it helps you reduce the inflected forms of a word so that they can be analyzed as a single item. It can also help you <strong>normalize</strong> the text.</p>
<p>spaCy has the attribute <code>lemma_</code> on the <code>Token</code> class. This attribute has the lemmatized form of a token:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">conference_help_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'Gus is helping organize a developer'</span>
<span class="gp">... </span> <span class="s1">' conference on Applications of Natural Language'</span>
<span class="gp">... </span> <span class="s1">' Processing. He keeps organizing local Python meetups'</span>
<span class="gp">... </span> <span class="s1">' and several internal talks at his workplace.'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">conference_help_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">conference_help_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">conference_help_doc</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">lemma_</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Gus Gus</span>
<span class="go">is be</span>
<span class="go">helping help</span>
<span class="go">organize organize</span>
<span class="go">a a</span>
<span class="go">developer developer</span>
<span class="go">conference conference</span>
<span class="go">on on</span>
<span class="go">Applications Applications</span>
<span class="go">of of</span>
<span class="go">Natural Natural</span>
<span class="go">Language Language</span>
<span class="go">Processing Processing</span>
<span class="go">. .</span>
<span class="go">He -PRON-</span>
<span class="go">keeps keep</span>
<span class="go">organizing organize</span>
<span class="go">local local</span>
<span class="go">Python Python</span>
<span class="go">meetups meetup</span>
<span class="go">and and</span>
<span class="go">several several</span>
<span class="go">internal internal</span>
<span class="go">talks talk</span>
<span class="go">at at</span>
<span class="go">his -PRON-</span>
<span class="go">workplace workplace</span>
<span class="go">. .</span>
</pre></div>
<p>In this example, <code>organizing</code> reduces to its lemma form <code>organize</code>. If you do not lemmatize the text, then <code>organize</code> and <code>organizing</code> will be counted as different tokens, even though they both have a similar meaning. Lemmatization helps you avoid duplicate words that have similar meanings.</p>
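<p>To make the effect on counting concrete, here’s a small sketch that uses a lookup table in place of spaCy’s lemmatizer. <code>LEMMA_LOOKUP</code> and <code>lemmatize()</code> are hypothetical and cover only the words in this example, so treat it as an illustration of why lemmas merge counts rather than as a real lemmatizer:</p>

```python
from collections import Counter

# Hypothetical lookup table covering only the words used below.
LEMMA_LOOKUP = {
    "organizes": "organize",
    "organized": "organize",
    "organizing": "organize",
    "talks": "talk",
}

def lemmatize(word):
    """Return the lemma from the table, or the word itself if it isn't listed."""
    return LEMMA_LOOKUP.get(word, word)

words = ["organize", "organizing", "organized", "talks", "talk"]
raw_counts = Counter(words)
lemma_counts = Counter(lemmatize(word) for word in words)

print(raw_counts["organize"])    # 1: each inflected form is counted separately
print(lemma_counts["organize"])  # 3: all forms are counted under one lemma
```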
<h2 id="word-frequency">Word Frequency</h2>
<p>You can now convert a given text into tokens and perform statistical analysis over it. This analysis can give you various insights about word patterns, such as common words or unique words in the text:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">collections</span> <span class="k">import</span> <span class="n">Counter</span>
<span class="gp">>>> </span><span class="n">complete_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'Gus Proto is a Python developer currently'</span>
<span class="gp">... </span> <span class="s1">' working for a London-based Fintech company. He is'</span>
<span class="gp">... </span> <span class="s1">' interested in learning Natural Language Processing.'</span>
<span class="gp">... </span> <span class="s1">' There is a developer conference happening on 21 July'</span>
<span class="gp">... </span> <span class="s1">' 2019 in London. It is titled "Applications of Natural'</span>
<span class="gp">... </span> <span class="s1">' Language Processing". There is a helpline number '</span>
<span class="gp">... </span> <span class="s1">' available at +1-1234567891. Gus is helping organize it.'</span>
<span class="gp">... </span> <span class="s1">' He keeps organizing local Python meetups and several'</span>
<span class="gp">... </span> <span class="s1">' internal talks at his workplace. Gus is also presenting'</span>
<span class="gp">... </span> <span class="s1">' a talk. The talk will introduce the reader about "Use'</span>
<span class="gp">... </span> <span class="s1">' cases of Natural Language Processing in Fintech".'</span>
<span class="gp">... </span> <span class="s1">' Apart from his work, he is very passionate about music.'</span>
<span class="gp">... </span> <span class="s1">' Gus is learning to play the Piano. He has enrolled '</span>
<span class="gp">... </span> <span class="s1">' himself in the weekend batch of Great Piano Academy.'</span>
<span class="gp">... </span> <span class="s1">' Great Piano Academy is situated in Mayfair or the City'</span>
<span class="gp">... </span> <span class="s1">' of London and has world-class piano instructors.'</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">complete_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">complete_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Remove stop words and punctuation symbols</span>
<span class="gp">>>> </span><span class="n">words</span> <span class="o">=</span> <span class="p">[</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">complete_doc</span>
<span class="gp">... </span> <span class="k">if</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">is_stop</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">is_punct</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">word_freq</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">(</span><span class="n">words</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># 5 commonly occurring words with their frequencies</span>
<span class="gp">>>> </span><span class="n">common_words</span> <span class="o">=</span> <span class="n">word_freq</span><span class="o">.</span><span class="n">most_common</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">common_words</span><span class="p">)</span>
<span class="go">[('Gus', 4), ('London', 3), ('Natural', 3), ('Language', 3), ('Processing', 3)]</span>
<span class="gp">>>> </span><span class="c1"># Unique words</span>
<span class="gp">>>> </span><span class="n">unique_words</span> <span class="o">=</span> <span class="p">[</span><span class="n">word</span> <span class="k">for</span> <span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">freq</span><span class="p">)</span> <span class="ow">in</span> <span class="n">word_freq</span><span class="o">.</span><span class="n">items</span><span class="p">()</span> <span class="k">if</span> <span class="n">freq</span> <span class="o">==</span> <span class="mi">1</span><span class="p">]</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">unique_words</span><span class="p">)</span>
<span class="go">['Proto', 'currently', 'working', 'based', 'company',</span>
<span class="go">'interested', 'conference', 'happening', '21', 'July',</span>
<span class="go">'2019', 'titled', 'Applications', 'helpline', 'number',</span>
<span class="go">'available', '+1', '1234567891', 'helping', 'organize',</span>
<span class="go">'keeps', 'organizing', 'local', 'meetups', 'internal',</span>
<span class="go">'talks', 'workplace', 'presenting', 'introduce', 'reader',</span>
<span class="go">'Use', 'cases', 'Apart', 'work', 'passionate', 'music', 'play',</span>
<span class="go">'enrolled', 'weekend', 'batch', 'situated', 'Mayfair', 'City',</span>
<span class="go">'world', 'class', 'piano', 'instructors']</span>
</pre></div>
<p>By looking at the common words, you can see that the text as a whole is probably about <code>Gus</code>, <code>London</code>, or <code>Natural Language Processing</code>. This way, you can take any unstructured text and perform statistical analysis to know what it’s about.</p>
<p>Here’s another example of the same text with stop words:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">words_all</span> <span class="o">=</span> <span class="p">[</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">complete_doc</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">is_punct</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">word_freq_all</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">(</span><span class="n">words_all</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># 5 commonly occurring words with their frequencies</span>
<span class="gp">>>> </span><span class="n">common_words_all</span> <span class="o">=</span> <span class="n">word_freq_all</span><span class="o">.</span><span class="n">most_common</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">common_words_all</span><span class="p">)</span>
<span class="go">[('is', 10), ('a', 5), ('in', 5), ('Gus', 4), ('of', 4)]</span>
</pre></div>
<p>Four out of five of the most common words are stop words, which don’t tell you much about the text. If you consider stop words while doing word frequency analysis, then you won’t be able to derive meaningful insights from the input text. This is why removing stop words is so important.</p>
<h2 id="part-of-speech-tagging">Part of Speech Tagging</h2>
<p><strong>Part of speech</strong> or <strong>POS</strong> is a grammatical role that explains how a particular word is used in a sentence. English grammar traditionally recognizes eight parts of speech:</p>
<ol>
<li>Noun</li>
<li>Pronoun</li>
<li>Adjective</li>
<li>Verb</li>
<li>Adverb</li>
<li>Preposition</li>
<li>Conjunction</li>
<li>Interjection</li>
</ol>
<p><strong>Part of speech tagging</strong> is the process of assigning a <strong>POS tag</strong> to each token depending on its usage in the sentence. POS tags are useful for assigning a syntactic category like <strong>noun</strong> or <strong>verb</strong> to each word.</p>
<p>In spaCy, POS tags are available as an attribute on the <code>Token</code> object:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">tag_</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">pos_</span><span class="p">,</span> <span class="n">spacy</span><span class="o">.</span><span class="n">explain</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">tag_</span><span class="p">))</span>
<span class="gp">...</span>
<span class="go">Gus NNP PROPN noun, proper singular</span>
<span class="go">Proto NNP PROPN noun, proper singular</span>
<span class="go">is VBZ VERB verb, 3rd person singular present</span>
<span class="go">a DT DET determiner</span>
<span class="go">Python NNP PROPN noun, proper singular</span>
<span class="go">developer NN NOUN noun, singular or mass</span>
<span class="go">currently RB ADV adverb</span>
<span class="go">working VBG VERB verb, gerund or present participle</span>
<span class="go">for IN ADP conjunction, subordinating or preposition</span>
<span class="go">a DT DET determiner</span>
<span class="go">London NNP PROPN noun, proper singular</span>
<span class="go">- HYPH PUNCT punctuation mark, hyphen</span>
<span class="go">based VBN VERB verb, past participle</span>
<span class="go">Fintech NNP PROPN noun, proper singular</span>
<span class="go">company NN NOUN noun, singular or mass</span>
<span class="go">. . PUNCT punctuation mark, sentence closer</span>
<span class="go">He PRP PRON pronoun, personal</span>
<span class="go">is VBZ VERB verb, 3rd person singular present</span>
<span class="go">interested JJ ADJ adjective</span>
<span class="go">in IN ADP conjunction, subordinating or preposition</span>
<span class="go">learning VBG VERB verb, gerund or present participle</span>
<span class="go">Natural NNP PROPN noun, proper singular</span>
<span class="go">Language NNP PROPN noun, proper singular</span>
<span class="go">Processing NNP PROPN noun, proper singular</span>
<span class="go">. . PUNCT punctuation mark, sentence closer</span>
</pre></div>
<p>Here, two attributes of the <code>Token</code> class are accessed:</p>
<ol>
<li><strong><code>tag_</code></strong> lists the fine-grained part of speech.</li>
<li><strong><code>pos_</code></strong> lists the coarse-grained part of speech.</li>
</ol>
<p><code>spacy.explain</code> gives descriptive details about a particular POS tag. spaCy provides a <a href="https://spaCy.io/api/annotation#pos-tagging">complete tag list</a> along with an explanation for each tag.</p>
<p>Using POS tags, you can extract a particular category of words:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">nouns</span> <span class="o">=</span> <span class="p">[]</span>
<span class="gp">>>> </span><span class="n">adjectives</span> <span class="o">=</span> <span class="p">[]</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span><span class="p">:</span>
<span class="gp">... </span>    <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">pos_</span> <span class="o">==</span> <span class="s1">'NOUN'</span><span class="p">:</span>
<span class="gp">... </span>        <span class="n">nouns</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
<span class="gp">... </span>    <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">pos_</span> <span class="o">==</span> <span class="s1">'ADJ'</span><span class="p">:</span>
<span class="gp">... </span>        <span class="n">adjectives</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">nouns</span>
<span class="go">[developer, company]</span>
<span class="gp">>>> </span><span class="n">adjectives</span>
<span class="go">[interested]</span>
</pre></div>
<p>You can use this to derive insights, remove the most common nouns, or see which adjectives are used for a particular noun.</p>
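<p>If you want every category at once rather than one list per tag, you could group tokens by their coarse-grained tag. The sketch below assumes the (text, POS) pairs have already been produced. They’re copied by hand from the tagging output above so that the example runs without a spaCy model, and <code>group_by_pos()</code> is a hypothetical helper:</p>

```python
from collections import defaultdict

# (text, coarse-grained POS) pairs copied from the tagging output above.
tagged_tokens = [
    ("Gus", "PROPN"), ("Proto", "PROPN"), ("is", "VERB"), ("a", "DET"),
    ("Python", "PROPN"), ("developer", "NOUN"), ("currently", "ADV"),
    ("working", "VERB"), ("for", "ADP"), ("a", "DET"), ("London", "PROPN"),
    ("-", "PUNCT"), ("based", "VERB"), ("Fintech", "PROPN"),
    ("company", "NOUN"), (".", "PUNCT"), ("He", "PRON"), ("is", "VERB"),
    ("interested", "ADJ"), ("in", "ADP"), ("learning", "VERB"),
    ("Natural", "PROPN"), ("Language", "PROPN"), ("Processing", "PROPN"),
    (".", "PUNCT"),
]

def group_by_pos(pairs):
    """Map each POS tag to the list of token texts that carry it, in order."""
    groups = defaultdict(list)
    for text, pos in pairs:
        groups[pos].append(text)
    return dict(groups)

words_by_pos = group_by_pos(tagged_tokens)
print(words_by_pos["NOUN"])  # ['developer', 'company']
print(words_by_pos["ADJ"])   # ['interested']
```

<p>With a real <code>Doc</code>, you could build the same pairs from each token’s <code>text</code> and <code>pos_</code> attributes.</p>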
<h2 id="visualization-using-displacy">Visualization: Using displaCy</h2>
<p>spaCy comes with a built-in visualizer called <strong>displaCy</strong>. You can use it to visualize a <strong>dependency parse</strong> or <strong>named entities</strong> in a browser or a <a href="https://realpython.com/jupyter-notebook-introduction/">Jupyter notebook</a>.</p>
<p>You can use displaCy to find POS tags for tokens:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">spacy</span> <span class="k">import</span> <span class="n">displacy</span>
<span class="gp">>>> </span><span class="n">about_interest_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'He is interested in learning'</span>
<span class="gp">... </span> <span class="s1">' Natural Language Processing.'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">about_interest_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">about_interest_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">displacy</span><span class="o">.</span><span class="n">serve</span><span class="p">(</span><span class="n">about_interest_doc</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s1">'dep'</span><span class="p">)</span>
</pre></div>
<p>The above code will spin up a simple web server. You can see the visualization by opening <a href="http://127.0.0.1:5000">http://127.0.0.1:5000</a> in your browser:</p>
<figure class="figure mx-auto d-block"><a href="https://files.realpython.com/media/displacy_pos_tags.45059f2bf851.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/displacy_pos_tags.45059f2bf851.png" width="2630" height="600" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_pos_tags.45059f2bf851.png&w=657&sig=a49d6b5a0e5952aea59c0241f61fb09440bb326b 657w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_pos_tags.45059f2bf851.png&w=1315&sig=858218e45ae1e23a87ad42204154aeae77c9cc0c 1315w, https://files.realpython.com/media/displacy_pos_tags.45059f2bf851.png 2630w" sizes="75vw" alt="Displacy: Part of Speech Tagging Demo"/></a><figcaption class="figure-caption text-center">displaCy: Part of Speech Tagging Demo</figcaption></figure>
<p>In the image above, each token is assigned a POS tag written just below the token.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Here’s how you can use displaCy in a Jupyter notebook:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">displacy</span><span class="o">.</span><span class="n">render</span><span class="p">(</span><span class="n">about_interest_doc</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s1">'dep'</span><span class="p">,</span> <span class="n">jupyter</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
<h2 id="preprocessing-functions">Preprocessing Functions</h2>
<p>You can create a <strong>preprocessing function</strong> that takes text as input and applies the following operations:</p>
<ul>
<li>Lowercases the text</li>
<li>Lemmatizes each token</li>
<li>Removes punctuation symbols</li>
<li>Removes stop words</li>
</ul>
<p>A preprocessing function converts text to an analyzable format. It’s necessary for most NLP tasks. Here’s an example:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">is_token_allowed</span><span class="p">(</span><span class="n">token</span><span class="p">):</span>
<span class="gp">... </span>    <span class="sd">'''</span>
<span class="gp">... </span><span class="sd">    Only allow tokens that are neither stop words</span>
<span class="gp">... </span><span class="sd">    nor punctuation symbols.</span>
<span class="gp">... </span><span class="sd">    '''</span>
<span class="gp">... </span>    <span class="k">if</span> <span class="p">(</span><span class="ow">not</span> <span class="n">token</span> <span class="ow">or</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">text</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="ow">or</span>
<span class="gp">... </span>        <span class="n">token</span><span class="o">.</span><span class="n">is_stop</span> <span class="ow">or</span> <span class="n">token</span><span class="o">.</span><span class="n">is_punct</span><span class="p">):</span>
<span class="gp">... </span>        <span class="k">return</span> <span class="kc">False</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="kc">True</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">preprocess_token</span><span class="p">(</span><span class="n">token</span><span class="p">):</span>
<span class="gp">... </span>    <span class="c1"># Reduce token to its lowercase lemma form</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="n">token</span><span class="o">.</span><span class="n">lemma_</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">complete_filtered_tokens</span> <span class="o">=</span> <span class="p">[</span><span class="n">preprocess_token</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
<span class="gp">... </span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">complete_doc</span> <span class="k">if</span> <span class="n">is_token_allowed</span><span class="p">(</span><span class="n">token</span><span class="p">)]</span>
<span class="gp">>>> </span><span class="n">complete_filtered_tokens</span>
<span class="go">['gus', 'proto', 'python', 'developer', 'currently', 'work',</span>
<span class="go">'london', 'base', 'fintech', 'company', 'interested', 'learn',</span>
<span class="go">'natural', 'language', 'processing', 'developer', 'conference',</span>
<span class="go">'happen', '21', 'july', '2019', 'london', 'title',</span>
<span class="go">'applications', 'natural', 'language', 'processing', 'helpline',</span>
<span class="go">'number', 'available', '+1', '1234567891', 'gus', 'help',</span>
<span class="go">'organize', 'keep', 'organize', 'local', 'python', 'meetup',</span>
<span class="go">'internal', 'talk', 'workplace', 'gus', 'present', 'talk', 'talk',</span>
<span class="go">'introduce', 'reader', 'use', 'case', 'natural', 'language',</span>
<span class="go">'processing', 'fintech', 'apart', 'work', 'passionate', 'music',</span>
<span class="go">'gus', 'learn', 'play', 'piano', 'enrol', 'weekend', 'batch',</span>
<span class="go">'great', 'piano', 'academy', 'great', 'piano', 'academy',</span>
<span class="go">'situate', 'mayfair', 'city', 'london', 'world', 'class',</span>
<span class="go">'piano', 'instructor']</span>
</pre></div>
<p>Note that <code>complete_filtered_tokens</code> doesn’t contain any stop words or punctuation symbols and consists of lemmatized lowercase tokens.</p>
<h2 id="rule-based-matching-using-spacy">Rule-Based Matching Using spaCy</h2>
<p><strong>Rule-based matching</strong> is one of the steps in extracting information from unstructured text. It’s used to identify and extract tokens and phrases according to patterns (such as lowercase) and grammatical features (such as part of speech).</p>
<p>Rule-based matching can use <a href="https://en.wikipedia.org/wiki/Regular_expression">regular expressions</a> to extract entities (such as phone numbers) from unstructured text. It differs from extracting text with regular expressions alone because regular expressions don’t consider the lexical and grammatical attributes of the text.</p>
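<p>To illustrate what matching on grammatical attributes means, here’s a simplified sketch that scans a list of (text, POS) pairs for a sequence of tags. It’s a stand-in for the idea, not spaCy’s matching engine: the pairs are copied by hand from the first sentence of the tagging output shown earlier, and <code>find_pos_sequences()</code> is a hypothetical helper:</p>

```python
# (text, coarse-grained POS) pairs for one sentence, copied by hand from
# the earlier tagging output; spaCy would normally assign these tags.
tagged_tokens = [
    ("Gus", "PROPN"), ("Proto", "PROPN"), ("is", "VERB"), ("a", "DET"),
    ("Python", "PROPN"), ("developer", "NOUN"), ("currently", "ADV"),
    ("working", "VERB"), ("for", "ADP"), ("a", "DET"), ("London", "PROPN"),
    ("-", "PUNCT"), ("based", "VERB"), ("Fintech", "PROPN"),
    ("company", "NOUN"), (".", "PUNCT"),
]

def find_pos_sequences(pairs, pos_pattern):
    """Return the text of every run of tokens whose POS tags equal pos_pattern."""
    size = len(pos_pattern)
    found = []
    for start in range(len(pairs) - size + 1):
        window = pairs[start:start + size]
        if [pos for _, pos in window] == list(pos_pattern):
            found.append(" ".join(text for text, _ in window))
    return found

print(find_pos_sequences(tagged_tokens, ["PROPN", "PROPN"]))  # ['Gus Proto']
```

<p>A regular expression sees only characters, so it can’t tell that <code>Gus</code> and <code>Proto</code> are both proper nouns. A rule over token attributes can.</p>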
<p>With rule-based matching, you can extract a first name and a last name, which are always <strong>proper nouns</strong>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">spacy.matcher</span> <span class="k">import</span> <span class="n">Matcher</span>
<span class="gp">>>> </span><span class="n">matcher</span> <span class="o">=</span> <span class="n">Matcher</span><span class="p">(</span><span class="n">nlp</span><span class="o">.</span><span class="n">vocab</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">extract_full_name</span><span class="p">(</span><span class="n">nlp_doc</span><span class="p">):</span>
<span class="gp">... </span>    <span class="n">pattern</span> <span class="o">=</span> <span class="p">[{</span><span class="s1">'POS'</span><span class="p">:</span> <span class="s1">'PROPN'</span><span class="p">},</span> <span class="p">{</span><span class="s1">'POS'</span><span class="p">:</span> <span class="s1">'PROPN'</span><span class="p">}]</span>
<span class="gp">... </span>    <span class="n">matcher</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s1">'FULL_NAME'</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="n">pattern</span><span class="p">)</span>
<span class="gp">... </span>    <span class="n">matches</span> <span class="o">=</span> <span class="n">matcher</span><span class="p">(</span><span class="n">nlp_doc</span><span class="p">)</span>
<span class="gp">... </span>    <span class="k">for</span> <span class="n">match_id</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">end</span> <span class="ow">in</span> <span class="n">matches</span><span class="p">:</span>
<span class="gp">... </span>        <span class="n">span</span> <span class="o">=</span> <span class="n">nlp_doc</span><span class="p">[</span><span class="n">start</span><span class="p">:</span><span class="n">end</span><span class="p">]</span>
<span class="gp">... </span>        <span class="k">return</span> <span class="n">span</span><span class="o">.</span><span class="n">text</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">extract_full_name</span><span class="p">(</span><span class="n">about_doc</span><span class="p">)</span>
<span class="go">'Gus Proto'</span>
</pre></div>
<p>In this example, <code>pattern</code> is a list of objects that defines the combination of tokens to be matched. It consists of two objects, each requiring the POS tag <code>PROPN</code> (proper noun), so it matches two consecutive proper nouns. The pattern is then added to <code>Matcher</code> under the key <code>FULL_NAME</code>, which identifies it as the <code>match_id</code> in the results. Finally, matches are obtained with their start and end indexes, and the function returns the text of the first match.</p>
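<p>To make the idea of matching on token attributes concrete, the following sketch reimplements the two-proper-noun rule in plain Python. It is not how spaCy’s <code>Matcher</code> works internally. The <code>tagged_tokens</code> list and the <code>match_pos_sequence()</code> helper are hypothetical stand-ins for a parsed <code>Doc</code> and the matcher:</p>

```python
# Hypothetical (text, POS) pairs; in spaCy these would come from a parsed Doc.
tagged_tokens = [
    ("Gus", "PROPN"), ("Proto", "PROPN"), ("is", "AUX"),
    ("a", "DET"), ("Python", "PROPN"), ("developer", "NOUN"),
]


def match_pos_sequence(tokens, pos_pattern):
    """Yield (start, end) index pairs where consecutive tags equal pos_pattern."""
    width = len(pos_pattern)
    for start in range(len(tokens) - width + 1):
        window = [pos for _, pos in tokens[start:start + width]]
        if window == pos_pattern:
            yield start, start + width


matches = list(match_pos_sequence(tagged_tokens, ["PROPN", "PROPN"]))
first_start, first_end = matches[0]
full_name = " ".join(text for text, _ in tagged_tokens[first_start:first_end])
print(full_name)
```

<p>As in the spaCy version, each match is reported as a pair of start and end indexes, and the end index is exclusive.</p>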
<p>You can also use rule-based matching to extract phone numbers:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">spacy.matcher</span> <span class="k">import</span> <span class="n">Matcher</span>
<span class="gp">>>> </span><span class="n">matcher</span> <span class="o">=</span> <span class="n">Matcher</span><span class="p">(</span><span class="n">nlp</span><span class="o">.</span><span class="n">vocab</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">conference_org_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'There is a developer conference'</span>
<span class="gp">... </span> <span class="s1">' happening on 21 July 2019 in London. It is titled'</span>
<span class="gp">... </span> <span class="s1">' "Applications of Natural Language Processing".'</span>
<span class="gp">... </span> <span class="s1">' There is a helpline number available'</span>
<span class="gp">... </span> <span class="s1">' at (123) 456-789'</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">extract_phone_number</span><span class="p">(</span><span class="n">nlp_doc</span><span class="p">):</span>
<span class="gp">... </span>    <span class="n">pattern</span> <span class="o">=</span> <span class="p">[{</span><span class="s1">'ORTH'</span><span class="p">:</span> <span class="s1">'('</span><span class="p">},</span> <span class="p">{</span><span class="s1">'SHAPE'</span><span class="p">:</span> <span class="s1">'ddd'</span><span class="p">},</span>
<span class="gp">... </span>               <span class="p">{</span><span class="s1">'ORTH'</span><span class="p">:</span> <span class="s1">')'</span><span class="p">},</span> <span class="p">{</span><span class="s1">'SHAPE'</span><span class="p">:</span> <span class="s1">'ddd'</span><span class="p">},</span>
<span class="gp">... </span>               <span class="p">{</span><span class="s1">'ORTH'</span><span class="p">:</span> <span class="s1">'-'</span><span class="p">,</span> <span class="s1">'OP'</span><span class="p">:</span> <span class="s1">'?'</span><span class="p">},</span>
<span class="gp">... </span>               <span class="p">{</span><span class="s1">'SHAPE'</span><span class="p">:</span> <span class="s1">'ddd'</span><span class="p">}]</span>
<span class="gp">... </span>    <span class="n">matcher</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s1">'PHONE_NUMBER'</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="n">pattern</span><span class="p">)</span>
<span class="gp">... </span>    <span class="n">matches</span> <span class="o">=</span> <span class="n">matcher</span><span class="p">(</span><span class="n">nlp_doc</span><span class="p">)</span>
<span class="gp">... </span>    <span class="k">for</span> <span class="n">match_id</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">end</span> <span class="ow">in</span> <span class="n">matches</span><span class="p">:</span>
<span class="gp">... </span>        <span class="n">span</span> <span class="o">=</span> <span class="n">nlp_doc</span><span class="p">[</span><span class="n">start</span><span class="p">:</span><span class="n">end</span><span class="p">]</span>
<span class="gp">... </span>        <span class="k">return</span> <span class="n">span</span><span class="o">.</span><span class="n">text</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">conference_org_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">conference_org_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">extract_phone_number</span><span class="p">(</span><span class="n">conference_org_doc</span><span class="p">)</span>
<span class="go">'(123) 456-789'</span>
</pre></div>
<p>Compared with the previous example, only the pattern changes so that it matches phone numbers. Here, some additional token attributes are used:</p>
<ul>
<li><strong><code>ORTH</code></strong> gives the exact text of the token.</li>
<li><strong><code>SHAPE</code></strong> transforms the token string to show orthographic features, such as <code>ddd</code> for a token made of three digits.</li>
<li><strong><code>OP</code></strong> defines operators. Using <code>?</code> as a value means that the token is optional, so it can match 0 or 1 times.</li>
</ul>
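<p>The <code>SHAPE</code> attribute can be approximated with a few lines of plain Python. The <code>word_shape()</code> function below is a simplified, hypothetical version written for illustration. spaCy’s own shape feature additionally shortens long runs of the same character class, which this sketch does not do:</p>

```python
def word_shape(text):
    """Map digits to 'd', uppercase letters to 'X', and other letters to 'x'.

    Simplified sketch: long runs of the same class are not shortened here.
    """
    shape = []
    for char in text:
        if char.isdigit():
            shape.append("d")
        elif char.isalpha():
            shape.append("X" if char.isupper() else "x")
        else:
            shape.append(char)  # punctuation and symbols are kept as-is
    return "".join(shape)


for token_text in ["456", "Gus", "(", "2019"]:
    print(token_text, "->", word_shape(token_text))
```

<p>With this view of a token, the rule <code>{'SHAPE': 'ddd'}</code> reads as “any token made of exactly three digits”.</p>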
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> For simplicity, phone numbers are assumed to be of a particular format: <code>(123) 456-789</code>. You can change this depending on your use case.</p>
</div>
<p>Rule-based matching helps you identify and extract tokens and phrases according to lexical patterns (such as lowercase) and grammatical features (such as part of speech).</p>
<h2 id="dependency-parsing-using-spacy">Dependency Parsing Using spaCy</h2>
<p><strong>Dependency parsing</strong> is the process of extracting the dependency parse of a sentence to represent its grammatical structure. It defines the dependency relationship between <strong>headwords</strong> and their <strong>dependents</strong>. The head of a sentence has no dependency and is called the <strong>root of the sentence</strong>. The <strong>verb</strong> is usually the head of the sentence. All other words are linked to the headword.</p>
<p>The dependencies can be mapped in a directed graph representation: </p>
<ul>
<li>Words are the nodes.</li>
<li>The grammatical relationships are the edges.</li>
</ul>
<p>Dependency parsing helps you know what role a word plays in the text and how different words relate to each other. It’s also used in <strong>shallow parsing</strong> and named entity recognition.</p>
<p>Here’s how you can use dependency parsing to see the relationships between words:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">piano_text</span> <span class="o">=</span> <span class="s1">'Gus is learning piano'</span>
<span class="gp">>>> </span><span class="n">piano_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">piano_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">piano_doc</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">text</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">tag_</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">head</span><span class="o">.</span><span class="n">text</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">dep_</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Gus NNP learning nsubj</span>
<span class="go">is VBZ learning aux</span>
<span class="go">learning VBG learning ROOT</span>
<span class="go">piano NN learning dobj</span>
</pre></div>
<p>In this example, the sentence contains three relationships:</p>
<ol>
<li><strong><code>nsubj</code></strong> is the nominal subject of the verb. Its headword is a verb.</li>
<li><strong><code>aux</code></strong> is an auxiliary word. Its headword is a verb.</li>
<li><strong><code>dobj</code></strong> is the direct object of the verb. Its headword is a verb.</li>
</ol>
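<p>The printed parse can be turned into the directed graph described earlier with plain Python. This is only a sketch: the triples are copied from the output above, and using the token text as a dictionary key assumes that no word occurs twice in the sentence:</p>

```python
# (token, head, relation) triples copied from the printed parse.
parse = [
    ("Gus", "learning", "nsubj"),
    ("is", "learning", "aux"),
    ("learning", "learning", "ROOT"),
    ("piano", "learning", "dobj"),
]

# Directed edges point from a headword to each of its dependents.
edges = {}
root = None
for token, head, relation in parse:
    if relation == "ROOT":
        root = token  # in the output, the root is listed as its own head
        continue
    edges.setdefault(head, []).append((token, relation))

print(root)
print(edges[root])
```

<p>Every word except the root appears exactly once as a dependent, which is what makes the structure a tree.</p>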
<p>There is a detailed <a href="https://nlp.stanford.edu/software/dependencies_manual.pdf">list of relationships</a> with descriptions. You can use displaCy to visualize the dependency tree:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">displacy</span><span class="o">.</span><span class="n">serve</span><span class="p">(</span><span class="n">piano_doc</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s1">'dep'</span><span class="p">)</span>
</pre></div>
<p>This code will produce a visualization that can be accessed by opening <a href="http://127.0.0.1:5000">http://127.0.0.1:5000</a> in your browser:</p>
<figure class="figure mx-auto d-block"><a href="https://files.realpython.com/media/displacy_dependency_parse.de72f9b1d115.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/displacy_dependency_parse.de72f9b1d115.png" width="1278" height="596" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_dependency_parse.de72f9b1d115.png&w=319&sig=111728c07cf2e1f64b8419cfce8a5f880c244d03 319w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_dependency_parse.de72f9b1d115.png&w=639&sig=f90a72529d7bc2d2dd944af3c02bbf487b65aaf3 639w, https://files.realpython.com/media/displacy_dependency_parse.de72f9b1d115.png 1278w" sizes="75vw" alt="Displacy: Dependency Parse Demo"/></a><figcaption class="figure-caption text-center">displaCy: Dependency Parse Demo</figcaption></figure>
<p>This image shows you that the subject of the sentence is the proper noun <code>Gus</code> and that it has a <code>learn</code> relationship with <code>piano</code>.</p>
<h2 id="navigating-the-tree-and-subtree">Navigating the Tree and Subtree</h2>
<p>The dependency parse tree has all the properties of a <a href="https://en.wikipedia.org/wiki/Tree_(data_structure)">tree</a>. This tree contains information about sentence structure and grammar and can be traversed in different ways to extract relationships.</p>
<p>spaCy provides attributes like <code>children</code>, <code>lefts</code>, <code>rights</code>, and <code>subtree</code> to navigate the parse tree:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">one_line_about_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'Gus Proto is a Python developer'</span>
<span class="gp">... </span> <span class="s1">' currently working for a London-based Fintech company'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">one_line_about_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">one_line_about_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Extract children of `developer`</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">children</span><span class="p">])</span>
<span class="go">['a', 'Python', 'working']</span>
<span class="gp">>>> </span><span class="c1"># Extract previous neighboring node of `developer`</span>
<span class="gp">>>> </span><span class="nb">print</span> <span class="p">(</span><span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">nbor</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span>
<span class="go">Python</span>
<span class="gp">>>> </span><span class="c1"># Extract next neighboring node of `developer`</span>
<span class="gp">>>> </span><span class="nb">print</span> <span class="p">(</span><span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">nbor</span><span class="p">())</span>
<span class="go">currently</span>
<span class="gp">>>> </span><span class="c1"># Extract all tokens on the left of `developer`</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">lefts</span><span class="p">])</span>
<span class="go">['a', 'Python']</span>
<span class="gp">>>> </span><span class="c1"># Extract tokens on the right of `developer`</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">rights</span><span class="p">])</span>
<span class="go">['working']</span>
<span class="gp">>>> </span><span class="c1"># Print subtree of `developer`</span>
<span class="gp">>>> </span><span class="nb">print</span> <span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">subtree</span><span class="p">))</span>
<span class="go">[a, Python, developer, currently, working, for, a, London, -,</span>
<span class="go">based, Fintech, company]</span>
</pre></div>
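<p>To see what these attributes compute, you can model the parse as a list of head indexes and write the traversals by hand. In the sketch below, the <code>heads</code> list is an assumption chosen to agree with the output above. Only the relationships around <code>developer</code> are confirmed by that output, and the helper functions are hypothetical, not spaCy’s implementation:</p>

```python
words = ["Gus", "Proto", "is", "a", "Python", "developer", "currently",
         "working", "for", "a", "London", "-", "based", "Fintech", "company"]
# heads[i] is the index of token i's headword; the root (index 2) points to itself.
# Assumed values for illustration, chosen to agree with the output shown above.
heads = [1, 2, 2, 5, 5, 2, 7, 5, 7, 14, 12, 12, 14, 14, 8]


def children(index):
    """Return the indexes of the direct dependents of a token."""
    return [i for i, head in enumerate(heads) if head == index and i != index]


def lefts(index):
    """Return the direct dependents that precede the token."""
    return [i for i in children(index) if i < index]


def rights(index):
    """Return the direct dependents that follow the token."""
    return [i for i in children(index) if i > index]


def subtree(index):
    """Return the token and all of its descendants in sentence order."""
    found = [index]
    for child in children(index):
        found.extend(subtree(child))
    return sorted(found)


developer = 5
print([words[i] for i in children(developer)])
print([words[i] for i in lefts(developer)])
print([words[i] for i in rights(developer)])
print([words[i] for i in subtree(developer)])
```

<p>Note that <code>lefts</code> and <code>rights</code> contain only direct children, whereas <code>subtree</code> follows the edges all the way down.</p>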
<p>You can construct a function that takes a subtree as an argument and returns a string by merging words in it:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">flatten_tree</span><span class="p">(</span><span class="n">tree</span><span class="p">):</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text_with_ws</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="nb">list</span><span class="p">(</span><span class="n">tree</span><span class="p">)])</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="c1"># Print flattened subtree of `developer`</span>
<span class="gp">>>> </span><span class="nb">print</span> <span class="p">(</span><span class="n">flatten_tree</span><span class="p">(</span><span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">subtree</span><span class="p">))</span>
<span class="go">a Python developer currently working for a London-based Fintech company</span>
</pre></div>
<p>You can use this function to print all the tokens in a subtree.</p>
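<p>The reason <code>flatten_tree()</code> joins <code>text_with_ws</code> rather than plain token text becomes clearer with a small stand-alone sketch. Here each token is modeled as a hypothetical <code>(text, trailing_whitespace)</code> pair, and <code>flatten_pairs()</code> is an illustrative helper rather than part of spaCy:</p>

```python
# (text, trailing_whitespace) pairs for the subtree of `developer`.
# In spaCy, token.text_with_ws is the text followed by that whitespace.
subtree_tokens = [
    ("a", " "), ("Python", " "), ("developer", " "), ("currently", " "),
    ("working", " "), ("for", " "), ("a", " "), ("London", ""), ("-", ""),
    ("based", " "), ("Fintech", " "), ("company", ""),
]


def flatten_pairs(pairs):
    """Join tokens using their own trailing whitespace."""
    return "".join(text + whitespace for text, whitespace in pairs).strip()


print(flatten_pairs(subtree_tokens))
# Joining on a single space instead breaks the hyphenated word apart.
print(" ".join(text for text, _ in subtree_tokens))
```

<p>Keeping each token’s own whitespace reproduces <code>London-based</code> exactly as written, while a naive space join yields <code>London - based</code>.</p>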
<h2 id="shallow-parsing">Shallow Parsing</h2>
<p><strong>Shallow parsing</strong>, or <strong>chunking</strong>, is the process of extracting phrases from unstructured text. Chunking groups adjacent tokens into phrases on the basis of their POS tags. There are some standard well-known chunks such as noun phrases, verb phrases, and prepositional phrases.</p>
<h3 id="noun-phrase-detection">Noun Phrase Detection</h3>
<p>A noun phrase is a phrase that has a noun as its head. It can also include other kinds of words, such as adjectives, ordinals, and determiners. Noun phrases are useful for explaining the context of the sentence. They help you infer <em>what</em> is being talked about in the sentence.</p>
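<p>As a rough illustration of chunking on POS tags, the sketch below groups an optional run of modifiers followed by one or more nouns. The tagged tokens and the <code>naive_noun_chunks()</code> helper are hypothetical, and this is not how spaCy finds noun phrases, since spaCy also relies on the dependency parse:</p>

```python
# Hypothetical (text, POS) pairs for illustration.
tagged = [
    ("There", "PRON"), ("is", "VERB"), ("a", "DET"), ("developer", "NOUN"),
    ("conference", "NOUN"), ("happening", "VERB"), ("in", "ADP"),
    ("London", "PROPN"),
]

MODIFIERS = {"DET", "ADJ", "NUM"}
NOUNS = {"NOUN", "PROPN"}


def naive_noun_chunks(tokens):
    """Group runs of modifiers followed by nouns into phrases."""
    chunks, current, has_noun = [], [], False
    for text, pos in tokens:
        if pos in NOUNS:
            current.append(text)
            has_noun = True
        elif pos in MODIFIERS and not has_noun:
            current.append(text)
        else:
            if has_noun:
                chunks.append(" ".join(current))
            # Start over; a modifier seen after a noun opens a new chunk.
            current = [text] if pos in MODIFIERS else []
            has_noun = False
    if has_noun:
        chunks.append(" ".join(current))
    return chunks


print(naive_noun_chunks(tagged))
```

<p>A tag-only rule like this is easy to fool, which is one reason to prefer a parser-backed approach for real text.</p>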
<p>spaCy has the property <code>noun_chunks</code> on the <code>Doc</code> object. You can use it to extract noun phrases:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">conference_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'There is a developer conference'</span>
<span class="gp">... </span> <span class="s1">' happening on 21 July 2019 in London.'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">conference_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">conference_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Extract Noun Phrases</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="n">conference_doc</span><span class="o">.</span><span class="n">noun_chunks</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">a developer conference</span>
<span class="go">21 July</span>
<span class="go">London</span>
</pre></div>
<p>By looking at noun phrases, you can get information about your text. For example, <code>a developer conference</code> indicates that the text mentions a conference, while the date <code>21 July</code> lets you know that the conference is scheduled for <code>21 July</code>. From the date, you can figure out whether the conference is in the past or the future. <code>London</code> tells you that the conference is in <code>London</code>.</p>
<h3 id="verb-phrase-detection">Verb Phrase Detection</h3>
<p>A <strong>verb phrase</strong> is a syntactic unit composed of at least one verb. This verb can be followed by other chunks, such as noun phrases. Verb phrases are useful for understanding the actions that nouns are involved in. </p>
<p>spaCy has no built-in functionality to extract verb phrases, so you’ll need a library called <a href="https://chartbeat-labs.github.io/textacy/"><code>textacy</code></a>:</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> </p>
<p>You can use <code>pip</code> to install <code>textacy</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install textacy
</pre></div>
</div>
<p>Now that you have <code>textacy</code> installed, you can use it to extract verb phrases based on grammar rules:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">textacy</span>
<span class="gp">>>> </span><span class="n">about_talk_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'The talk will introduce reader about Use'</span>
<span class="gp">... </span> <span class="s1">' cases of Natural Language Processing in'</span>
<span class="gp">... </span> <span class="s1">' Fintech'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">pattern</span> <span class="o">=</span> <span class="sa">r</span><span class="s1">'(<VERB>?<ADV>*<VERB>+)'</span>
<span class="gp">>>> </span><span class="n">about_talk_doc</span> <span class="o">=</span> <span class="n">textacy</span><span class="o">.</span><span class="n">make_spacy_doc</span><span class="p">(</span><span class="n">about_talk_text</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">lang</span><span class="o">=</span><span class="s1">'en_core_web_sm'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">verb_phrases</span> <span class="o">=</span> <span class="n">textacy</span><span class="o">.</span><span class="n">extract</span><span class="o">.</span><span class="n">pos_regex_matches</span><span class="p">(</span><span class="n">about_talk_doc</span><span class="p">,</span> <span class="n">pattern</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Print all Verb Phrase</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="n">verb_phrases</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">chunk</span><span class="o">.</span><span class="n">text</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">will introduce</span>
<span class="gp">>>> </span><span class="c1"># Extract Noun Phrase to explain what nouns are involved</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="n">about_talk_doc</span><span class="o">.</span><span class="n">noun_chunks</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">The talk</span>
<span class="go">reader</span>
<span class="go">Use cases</span>
<span class="go">Natural Language Processing</span>
<span class="go">Fintech</span>
</pre></div>
<p>In this example, the verb phrase <code>will introduce</code> indicates that something will be introduced. By looking at the noun phrases, you can see that there is a <code>talk</code> that will <code>introduce</code> the <code>reader</code> to <code>use cases</code> of <code>Natural Language Processing</code> in <code>Fintech</code>.</p>
<p>The above code extracts all the verb phrases <a href="https://chartbeat-labs.github.io/textacy/api_reference/information_extraction.html?highlight=pos#textacy.extract.pos_regex_matches">using a regular expression pattern</a> of POS tags. You can tweak the pattern for verb phrases depending upon your use case.</p>
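<p>To illustrate what a regular expression over POS tags means, the following sketch encodes the tag sequence as a string and runs Python’s <code>re</code> module over it. The tagged tokens are hypothetical, <code>pos_regex_spans()</code> is an illustrative helper, and nothing here is meant to describe how <code>textacy</code> implements its matching:</p>

```python
import re

# Hypothetical coarse POS tags for illustration.
tagged = [
    ("The", "DET"), ("talk", "NOUN"), ("will", "VERB"),
    ("introduce", "VERB"), ("reader", "NOUN"),
]


def pos_regex_spans(tokens, pattern):
    """Yield the text of token runs matched by a regex written over <TAG> symbols."""
    tag_string = "".join(f"<{pos}>" for _, pos in tokens)
    # Character offset at which each token's tag starts in tag_string.
    offsets = []
    position = 0
    for _, pos in tokens:
        offsets.append(position)
        position += len(pos) + 2  # the tag plus its angle brackets
    # Wrap every <TAG> in a non-capturing group so ?, * and + apply to the whole tag.
    grouped = re.sub(r"<[A-Z]+>", lambda m: f"(?:{m.group(0)})", pattern)
    for match in re.finditer(grouped, tag_string):
        if not match.group(0):
            continue  # skip empty matches from fully optional patterns
        start = offsets.index(match.start())
        end = sum(1 for offset in offsets if offset < match.end())
        yield " ".join(text for text, _ in tokens[start:end])


print(list(pos_regex_spans(tagged, r"(<VERB>?<ADV>*<VERB>+)")))
```

<p>Read this way, the pattern means an optional verb, any number of adverbs, and then one or more verbs.</p>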
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> In the previous example, you could have also done dependency parsing to see what the <a href="https://nlp.stanford.edu/software/dependencies_manual.pdf">relationships</a> between the words were.</p>
</div>
<h2 id="named-entity-recognition">Named Entity Recognition</h2>
<p><strong>Named Entity Recognition</strong> (NER) is the process of locating <strong>named entities</strong> in unstructured text and then classifying them into pre-defined categories, such as person names, organizations, locations, monetary values, percentages, time expressions, and so on.</p>
<p>You can use <strong>NER</strong> to know more about the meaning of your text. For example, you could use it to populate tags for a set of documents in order to improve the keyword search. You could also use it to categorize customer support tickets into relevant categories.</p>
<p>spaCy has the property <code>ents</code> on <code>Doc</code> objects. You can use it to extract named entities:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">piano_class_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'Great Piano Academy is situated'</span>
<span class="gp">... </span> <span class="s1">' in Mayfair or the City of London and has'</span>
<span class="gp">... </span> <span class="s1">' world-class piano instructors.'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">piano_class_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">piano_class_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">ent</span> <span class="ow">in</span> <span class="n">piano_class_doc</span><span class="o">.</span><span class="n">ents</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">ent</span><span class="o">.</span><span class="n">text</span><span class="p">,</span> <span class="n">ent</span><span class="o">.</span><span class="n">start_char</span><span class="p">,</span> <span class="n">ent</span><span class="o">.</span><span class="n">end_char</span><span class="p">,</span>
<span class="gp">... </span>          <span class="n">ent</span><span class="o">.</span><span class="n">label_</span><span class="p">,</span> <span class="n">spacy</span><span class="o">.</span><span class="n">explain</span><span class="p">(</span><span class="n">ent</span><span class="o">.</span><span class="n">label_</span><span class="p">))</span>
<span class="gp">...</span>
<span class="go">Great Piano Academy 0 19 ORG Companies, agencies, institutions, etc.</span>
<span class="go">Mayfair 35 42 GPE Countries, cities, states</span>
<span class="go">the City of London 46 64 GPE Countries, cities, states</span>
</pre></div>
<p>In the above example, <code>ent</code> is a <a href="https://spacy.io/api/span"><code>Span</code></a> object with various attributes:</p>
<ul>
<li><strong><code>text</code></strong> gives the Unicode text representation of the entity.</li>
<li><strong><code>start_char</code></strong> denotes the character offset for the start of the entity.</li>
<li><strong><code>end_char</code></strong> denotes the character offset for the end of the entity.</li>
<li><strong><code>label_</code></strong> gives the label of the entity.</li>
</ul>
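<p>The character offsets can be checked without spaCy at all. This sketch copies the offsets and labels from the output above and slices the original string with them. The variable names <code>entity_offsets</code> and <code>entity_texts</code> are introduced here for illustration:</p>

```python
piano_class_text = ('Great Piano Academy is situated'
                    ' in Mayfair or the City of London and has'
                    ' world-class piano instructors.')

# (start_char, end_char, label) triples copied from the output above.
entity_offsets = [(0, 19, 'ORG'), (35, 42, 'GPE'), (46, 64, 'GPE')]

# end_char is exclusive, so ordinary string slicing recovers each entity.
entity_texts = [piano_class_text[start:end] for start, end, _ in entity_offsets]

for text, (_, _, label) in zip(entity_texts, entity_offsets):
    print(text, label)
```

<p>Because the offsets refer to the original string, you can use them to highlight or replace entities in the raw text.</p>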
<p><code>spacy.explain</code> gives descriptive details about an entity label. The spaCy model has a pre-trained <a href="https://spaCy.io/api/annotation#named-entities">list of entity classes</a>. You can use displaCy to visualize these entities:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">displacy</span><span class="o">.</span><span class="n">serve</span><span class="p">(</span><span class="n">piano_class_doc</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s1">'ent'</span><span class="p">)</span>
</pre></div>
<p>If you open <a href="http://127.0.0.1:5000">http://127.0.0.1:5000</a> in your browser, then you can see the visualization:</p>
<figure class="figure mx-auto d-block"><a href="https://files.realpython.com/media/displacy_ner.1fba6869638f.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/displacy_ner.1fba6869638f.png" width="1930" height="140" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_ner.1fba6869638f.png&w=482&sig=18b93b0aed61930a6eedd37dbd12fbbce22733d4 482w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_ner.1fba6869638f.png&w=965&sig=f6b3cfb460053397a23a0eb49ebc22cf05dd15ab 965w, https://files.realpython.com/media/displacy_ner.1fba6869638f.png 1930w" sizes="75vw" alt="Displacy: Named Entity Recognition Demo"/></a><figcaption class="figure-caption text-center">displaCy: Named Entity Recognition Demo</figcaption></figure>
<p>You can use NER to redact people’s names from a text. For example, you might want to do this in order to hide personal information collected in a survey. You can use spaCy to do that:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">survey_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'Out of 5 people surveyed, James Robert,'</span>
<span class="gp">... </span> <span class="s1">' Julie Fuller and Benjamin Brooks like'</span>
<span class="gp">... </span> <span class="s1">' apples. Kelly Cox and Matthew Evans'</span>
<span class="gp">... </span> <span class="s1">' like oranges.'</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">replace_person_names</span><span class="p">(</span><span class="n">token</span><span class="p">):</span>
<span class="gp">... </span>    <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">ent_iob</span> <span class="o">!=</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">token</span><span class="o">.</span><span class="n">ent_type_</span> <span class="o">==</span> <span class="s1">'PERSON'</span><span class="p">:</span>
<span class="gp">... </span>        <span class="k">return</span> <span class="s1">'[REDACTED] '</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="n">token</span><span class="o">.</span><span class="n">string</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">redact_names</span><span class="p">(</span><span class="n">nlp_doc</span><span class="p">):</span>
<span class="gp">... </span>    <span class="k">for</span> <span class="n">ent</span> <span class="ow">in</span> <span class="n">nlp_doc</span><span class="o">.</span><span class="n">ents</span><span class="p">:</span>
<span class="gp">... </span>        <span class="n">ent</span><span class="o">.</span><span class="n">merge</span><span class="p">()</span>
<span class="gp">... </span>    <span class="n">tokens</span> <span class="o">=</span> <span class="nb">map</span><span class="p">(</span><span class="n">replace_person_names</span><span class="p">,</span> <span class="n">nlp_doc</span><span class="p">)</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="s1">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">survey_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">survey_text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">redact_names</span><span class="p">(</span><span class="n">survey_doc</span><span class="p">)</span>
<span class="go">'Out of 5 people surveyed, [REDACTED] , [REDACTED] and'</span>
<span class="go">' [REDACTED] like apples. [REDACTED] and [REDACTED]'</span>
<span class="go">' like oranges.'</span>
</pre></div>
<p>In this example, <code>replace_person_names()</code> checks <code>ent_iob</code>, which gives the IOB code of the named entity tag using <a href="https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)">inside-outside-beginning (IOB) tagging</a>. The function requires a value other than zero, because zero means that no entity tag is set.</p>
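<p>To see the IOB check in isolation, here is a minimal sketch that runs without spaCy or a language model. <code>FakeToken</code> is a hypothetical stand-in, not part of spaCy. It only mimics the three token attributes the check needs, and the IOB codes in the comment follow spaCy's documented convention:</p>

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy token (not a spaCy class)."""
    text_with_ws: str  # token text plus any trailing whitespace
    ent_iob: int       # 3 = begins an entity, 1 = inside, 2 = outside, 0 = no tag set
    ent_type_: str     # entity label, empty when the token is not an entity


def replace_person_names(token):
    # Redact only tokens that carry an entity tag and are labeled PERSON.
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        return "[REDACTED] "
    return token.text_with_ws


# Assumed, hand-written tokens; a real pipeline would produce these.
tokens = [
    FakeToken("Kelly Cox ", 3, "PERSON"),
    FakeToken("likes ", 2, ""),
    FakeToken("oranges", 2, ""),
    FakeToken(".", 2, ""),
]

redacted = "".join(replace_person_names(t) for t in tokens)
print(redacted)  # [REDACTED] likes oranges.
```

<p>Because the stand-in tokens are written by hand, this only illustrates the filtering logic. Real entity labels depend on the model you load.</p>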
<h2 id="conclusion">Conclusion</h2>
<p>spaCy is a powerful and advanced library that is gaining huge popularity for NLP applications due to its speed, ease of use, accuracy, and extensibility. Congratulations! You now know:</p>
<ul>
<li>What the foundational terms and concepts in NLP are</li>
<li>How to implement those concepts in spaCy</li>
<li>How to customize and extend built-in functionalities in spaCy</li>
<li>How to perform basic statistical analysis on a text</li>
<li>How to create a pipeline to process unstructured text</li>
<li>How to parse a sentence and extract meaningful insights from it</li>
</ul>
PyCharm for Productive Python Development (Guide)https://realpython.com/pycharm-guide/2019-08-28T14:00:00+00:00In this step-by-step tutorial, you'll learn how you can use PyCharm to be a more productive Python developer. PyCharm makes debugging and visualization easy so you can focus on business logic and just get the job done.
<p>As a programmer, you should be focused on the business logic and on creating useful applications for your users. <a href="https://www.jetbrains.com/pycharm/">PyCharm</a> by <a href="https://www.jetbrains.com/">JetBrains</a> saves you a lot of time along the way by taking care of routine work and by making other tasks, such as debugging and visualization, easy.</p>
<p><strong>In this article, you’ll learn about:</strong></p>
<ul>
<li>Installing PyCharm</li>
<li>Writing code in PyCharm</li>
<li>Running your code in PyCharm</li>
<li>Debugging and testing your code in PyCharm</li>
<li>Editing an existing project in PyCharm</li>
<li>Searching and navigating in PyCharm</li>
<li>Using Version Control in PyCharm</li>
<li>Using Plugins and External Tools in PyCharm</li>
<li>Using PyCharm Professional features, such as Django support and Scientific mode</li>
</ul>
<p>This article assumes that you’re familiar with Python development and already have some form of Python installed on your system. Python 3.6 will be used for this tutorial. Screenshots and demos provided are for macOS. Because PyCharm runs on all major platforms, you may see slightly different UI elements and may need to modify certain commands.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note</strong>: </p>
<p>PyCharm comes in three editions: </p>
<ol>
<li><a href="https://www.jetbrains.com/pycharm-edu/">PyCharm Edu</a> is free and for educational purposes. </li>
<li><a href="https://www.jetbrains.com/pycharm">PyCharm Community</a> is free as well and intended for pure Python development. </li>
<li><a href="https://www.jetbrains.com/pycharm">PyCharm Professional</a> is paid and has everything the Community edition has. It is also well suited for web and scientific development, with support for frameworks such as Django and Flask, for databases and SQL, and for scientific tools such as Jupyter.</li>
</ol>
<p>For more details on their differences, check out the <a href="https://www.jetbrains.com/pycharm/features/editions_comparison_matrix.html">PyCharm Editions Comparison Matrix</a> by JetBrains. The company also has <a href="https://www.jetbrains.com/pycharm/buy/#edition=discounts">special offers</a> for students, teachers, open source projects, and other cases.</p>
</div>
<div class="alert alert-warning" role="alert"><p><strong>Clone Repo:</strong> <a href="https://realpython.com/optins/view/alcazar-web-framework/" class="alert-link" data-toggle="modal" data-target="#modal-alcazar-web-framework" data-focus="false">Click here to clone the repo you'll use</a> to explore the project-focused features of PyCharm in this tutorial.</p></div>
<h2 id="installing-pycharm">Installing PyCharm</h2>
<p>This article will use PyCharm Community Edition 2019.1 as it’s free and available on every major platform. Only the section about the professional features will use PyCharm Professional Edition 2019.1. </p>
<p>The recommended way of installing PyCharm is with the <a href="https://www.jetbrains.com/toolbox/app/">JetBrains Toolbox App</a>. With its help, you’ll be able to install different JetBrains products or several versions of the same product, update, rollback, and easily remove any tool when necessary. You’ll also be able to quickly open any project in the right IDE and version.</p>
<p>To install the Toolbox App, refer to the <a href="https://www.jetbrains.com/help/pycharm/installation-guide.html#toolbox">documentation</a> by JetBrains. It will automatically give you the right instructions depending on your OS. In case it doesn’t recognize your OS correctly, you can always select it from the drop-down list in the top-right section:</p>
<p><a href="https://files.realpython.com/media/pycharm-jetbrains-os-list.231740335aaa.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-jetbrains-os-list.231740335aaa.png" width="1010" height="679" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-jetbrains-os-list.231740335aaa.png&w=252&sig=e331b2eb15a3c8b9396327dedc700bd2bcbbc9e3 252w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-jetbrains-os-list.231740335aaa.png&w=505&sig=4a0a0527b968050fb042a0565f5d6970d72ee1f9 505w, https://files.realpython.com/media/pycharm-jetbrains-os-list.231740335aaa.png 1010w" sizes="75vw" alt="List of OSes in the JetBrains website"/></a></p>
<p>After installing, launch the app and accept the user agreement. Under the <em>Tools</em> tab, you’ll see a list of available products. Find PyCharm Community there and click <em>Install</em>:</p>
<p><a href="https://files.realpython.com/media/pycharm-toolbox-installed-pycharm.cdcf1b52bc02.png" target="_blank"><img class="img-fluid mx-auto d-block border w-33" src="https://files.realpython.com/media/pycharm-toolbox-installed-pycharm.cdcf1b52bc02.png" width="337" height="537" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-toolbox-installed-pycharm.cdcf1b52bc02.png&w=84&sig=5f1e571c6c7bed958efddaec87d6ac5168713217 84w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-toolbox-installed-pycharm.cdcf1b52bc02.png&w=168&sig=0a76111939657ae01eaa8203b24c0c2e4fff5ee6 168w, https://files.realpython.com/media/pycharm-toolbox-installed-pycharm.cdcf1b52bc02.png 337w" sizes="75vw" alt="PyCharm installed with the Toolbox app"/></a></p>
<p>Voilà! You have PyCharm available on your machine. If you don’t want to use the Toolbox app, then you can also do a <a href="https://www.jetbrains.com/help/pycharm/installation-guide.html#standalone">stand-alone installation of PyCharm</a>.</p>
<p>Launch PyCharm, and you’ll see the import settings popup:</p>
<p><a href="https://files.realpython.com/media/pycharm-import-settings-popup.4e360260c697.png" target="_blank"><img class="img-fluid mx-auto d-block w-50" src="https://files.realpython.com/media/pycharm-import-settings-popup.4e360260c697.png" width="416" height="156" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-import-settings-popup.4e360260c697.png&w=104&sig=4920753cc035f162c505253937453e1aa7cc4d26 104w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-import-settings-popup.4e360260c697.png&w=208&sig=b65b1226bcc172a811d7cd1e00cd600408fba092 208w, https://files.realpython.com/media/pycharm-import-settings-popup.4e360260c697.png 416w" sizes="75vw" alt="PyCharm Import Settings Popup"/></a></p>
<p>PyCharm will automatically detect that this is a fresh install and choose <em>Do not import settings</em> for you. Click <em>OK</em>, and PyCharm will ask you to select a keymap scheme. Leave the default and click <em>Next: UI Themes</em> on the bottom right:</p>
<p><a href="https://files.realpython.com/media/pycharm-keymap-scheme.c8115fda9bdd.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pycharm-keymap-scheme.c8115fda9bdd.png" width="805" height="666" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-keymap-scheme.c8115fda9bdd.png&w=201&sig=644595a94c07780a552f76abbfc5fe526b3c9459 201w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-keymap-scheme.c8115fda9bdd.png&w=402&sig=64c348872c1519bc4148e3dbbac2550ed6c0fa30 402w, https://files.realpython.com/media/pycharm-keymap-scheme.c8115fda9bdd.png 805w" sizes="75vw" alt="PyCharm Keymap Scheme"/></a></p>
<p>PyCharm will then ask you to choose a dark theme called Darcula or a light theme. Choose whichever you prefer and click <em>Next: Launcher Script</em>: </p>
<p><a href="https://files.realpython.com/media/pycharm-set-ui-theme.c48aac8e3fe0.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-set-ui-theme.c48aac8e3fe0.png" width="803" height="666" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-set-ui-theme.c48aac8e3fe0.png&w=200&sig=6998b85afd9e2ca1503624ba55b904f4051f1ffe 200w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-set-ui-theme.c48aac8e3fe0.png&w=401&sig=a6480f0b0073f0680068828fc2f0f5c8ee55cbdb 401w, https://files.realpython.com/media/pycharm-set-ui-theme.c48aac8e3fe0.png 803w" sizes="75vw" alt="PyCharm Set UI Theme Page"/></a></p>
<p>I’ll be using the dark theme Darcula throughout this tutorial. You can find and install other themes as <a href="#using-plugins-and-external-tools-in-pycharm">plugins</a>, or you can also <a href="https://blog.codota.com/5-best-intellij-themes/">import them</a>.</p>
<p>On the next page, leave the defaults and click <em>Next: Featured plugins</em>. There, PyCharm will show you a list of plugins you may want to install because most users like to use them. Click <em>Start using PyCharm</em>, and now you are ready to write some code!</p>
<h2 id="writing-code-in-pycharm">Writing Code in PyCharm</h2>
<p>In PyCharm, you do everything in the context of a <strong>project</strong>. Thus, the first thing you need to do is create one.</p>
<p>After installing and opening PyCharm, you are on the welcome screen. Click <em>Create New Project</em>, and you’ll see the <em>New Project</em> popup:</p>
<p><a href="https://files.realpython.com/media/pycharm-new-project.cc35f3aa1056.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-new-project.cc35f3aa1056.png" width="664" height="480" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-new-project.cc35f3aa1056.png&w=166&sig=6423b68127eae8ca93165323df4884844265f5e3 166w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-new-project.cc35f3aa1056.png&w=332&sig=50d660be9a42904b1161bd76df1ab9ddd77e2132 332w, https://files.realpython.com/media/pycharm-new-project.cc35f3aa1056.png 664w" sizes="75vw" alt="New Project in PyCharm"/></a></p>
<p>Specify the project location and expand the <em>Project Interpreter</em> drop down. Here, you have options to create a new project interpreter or reuse an existing one. Choose <em>New environment using</em>. Right next to it, you have a drop down list to select one of <em>Virtualenv</em>, <em>Pipenv</em>, or <em>Conda</em>, which are the tools that help to keep dependencies required by different projects separate by creating isolated Python environments for them. </p>
<p>You are free to select whichever you like, but <em>Virtualenv</em> is used for this tutorial. If you choose to, you can specify the environment location and choose the base interpreter from the list of Python interpreters (such as Python 2.7 and Python 3.6) installed on your system. Usually, the defaults are fine. Below that, there are two checkboxes: one to inherit global site-packages in your new environment and one to make the environment available to all other projects. Leave both unselected.</p>
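<p>The option to inherit global site-packages also exists outside the IDE. As a rough sketch, and assuming the standard-library <code>venv</code> module as a stand-in for the <em>Virtualenv</em> tool that PyCharm drives, an isolated environment could be created like this. The directory name <code>demo-env</code> is hypothetical:</p>

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical location; PyCharm would normally suggest a path for you.
env_dir = Path(tempfile.mkdtemp()) / "demo-env"

# system_site_packages=False keeps globally installed packages out of the
# new environment. with_pip=False skips bootstrapping pip so the sketch
# runs quickly.
venv.create(env_dir, system_site_packages=False, with_pip=False)

# The environment records its settings in pyvenv.cfg.
config = (env_dir / "pyvenv.cfg").read_text()
print(config)
```

<p>The printed configuration should include a line stating that system site-packages are not included, which is the same choice as leaving the inherit checkbox unselected.</p>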
<p>Click <em>Create</em> on the bottom right and you will see the new project created:</p>
<p><a href="https://files.realpython.com/media/pycharm-project-created.99dffd1d4e9a.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-project-created.99dffd1d4e9a.png" width="1174" height="734" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-project-created.99dffd1d4e9a.png&w=293&sig=d6394f174acab8ee63eb6ce0360d0174857f7afb 293w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-project-created.99dffd1d4e9a.png&w=587&sig=2fd8ea44cc0ad15f015aab687018ec7ad8861a53 587w, https://files.realpython.com/media/pycharm-project-created.99dffd1d4e9a.png 1174w" sizes="75vw" alt="Project created in PyCharm"/></a></p>
<p>You will also see a small <em>Tip of the Day</em> popup where PyCharm gives you one trick to learn at each startup. Go ahead and close this popup.</p>
<p>It is now time to start a new Python program. Type <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-n">N</kbd></span> if you are on Mac or <span class="keys"><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-insert">Ins</kbd></span> if you are on Windows or Linux. Then, choose <em>Python File</em>. You can also select <em>File → New</em> from the menu. Name the new file <code>guess_game.py</code> and click <em>OK</em>. You will see a PyCharm window similar to the following:</p>
<p><a href="https://files.realpython.com/media/pycharm-new-file.7ea9902d73ea.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-new-file.7ea9902d73ea.png" width="1172" height="734" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-new-file.7ea9902d73ea.png&w=293&sig=b1ee432e97d642aea67818cc7280971247196a62 293w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-new-file.7ea9902d73ea.png&w=586&sig=3563b00140a3edd3f1df0e522d5069f6efcb62d3 586w, https://files.realpython.com/media/pycharm-new-file.7ea9902d73ea.png 1172w" sizes="75vw" alt="PyCharm New File"/></a></p>
<p>For our test code, let’s quickly code up a simple guessing game in which the program chooses a number that the user has to guess. For every guess, the program will tell if the user’s guess was smaller or bigger than the secret number. The game ends when the user guesses the number. Here’s the code for the game:</p>
<div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">from</span> <span class="nn">random</span> <span class="k">import</span> <span class="n">randint</span>
<span class="lineno"> 2 </span>
<span class="lineno"> 3 </span>
<span class="lineno"> 4 </span><span class="k">def</span> <span class="nf">play</span><span class="p">():</span>
<span class="lineno"> 5 </span>    <span class="n">random_int</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="lineno"> 6 </span>
<span class="lineno"> 7 </span>    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="lineno"> 8 </span>        <span class="n">user_guess</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">input</span><span class="p">(</span><span class="s2">"What number did we guess (0-100)?"</span><span class="p">))</span>
<span class="lineno"> 9 </span>
<span class="lineno">10 </span>        <span class="k">if</span> <span class="n">user_guess</span> <span class="o">==</span> <span class="n">randint</span><span class="p">:</span>
<span class="lineno">11 </span>            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s2">"You found the number (</span><span class="si">{random_int}</span><span class="s2">). Congrats!"</span><span class="p">)</span>
<span class="lineno">12 </span>            <span class="k">break</span>
<span class="lineno">13 </span>
<span class="lineno">14 </span>        <span class="k">if</span> <span class="n">user_guess</span> <span class="o"><</span> <span class="n">random_int</span><span class="p">:</span>
<span class="lineno">15 </span>            <span class="nb">print</span><span class="p">(</span><span class="s2">"Your number is less than the number we guessed."</span><span class="p">)</span>
<span class="lineno">16 </span>            <span class="k">continue</span>
<span class="lineno">17 </span>
<span class="lineno">18 </span>        <span class="k">if</span> <span class="n">user_guess</span> <span class="o">></span> <span class="n">random_int</span><span class="p">:</span>
<span class="lineno">19 </span>            <span class="nb">print</span><span class="p">(</span><span class="s2">"Your number is more than the number we guessed."</span><span class="p">)</span>
<span class="lineno">20 </span>            <span class="k">continue</span>
<span class="lineno">21 </span>
<span class="lineno">22 </span>
<span class="lineno">23 </span><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="lineno">24 </span>    <span class="n">play</span><span class="p">()</span>
</pre></div>
<p>Type this code directly rather than copying and pasting. You’ll see something like this:</p>
<p><a href="https://files.realpython.com/media/typing-guess-game.fcaedeb8ece2.gif" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/typing-guess-game.fcaedeb8ece2.gif" width="528" height="480" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/typing-guess-game.fcaedeb8ece2.gif&w=132&sig=7e5eb20fb9ae97b1cea80380f9ad00f35dd76707 132w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/typing-guess-game.fcaedeb8ece2.gif&w=264&sig=eb242bca301203a38741a986d7847cf0f3ef4cff 264w, https://files.realpython.com/media/typing-guess-game.fcaedeb8ece2.gif 528w" sizes="75vw" alt="Typing Guessing Game"/></a></p>
<p>As you can see, PyCharm provides <a href="https://www.jetbrains.com/pycharm/features/coding_assistance.html">Intelligent Coding Assistance</a> with code completion, code inspections, on-the-fly error highlighting, and quick-fix suggestions. In particular, note how when you typed <code>main</code> and then hit tab, PyCharm auto-completed the whole <code>main</code> clause for you. </p>
<p>Also note how, if you forget to type <code>if</code> before the condition, append <code>.if</code>, and then hit <span class="keys"><kbd class="key-tab">Tab</kbd></span>, PyCharm fixes the <code>if</code> clause for you. The same is true with <code>True.while</code>. That’s <a href="https://www.jetbrains.com/help/pycharm/settings-postfix-completion.html">PyCharm’s Postfix completions</a> working for you to help reduce backward caret jumps.</p>
<h2 id="running-code-in-pycharm">Running Code in PyCharm</h2>
<p>Now that you’ve coded up the game, it’s time for you to run it.</p>
<p>You have three ways of running this program:</p>
<ol>
<li>Use the shortcut <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-r">R</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-f10">F10</kbd></span> on Windows or Linux.</li>
<li>Right-click the background and choose <em>Run ‘guess_game’</em> from the menu.</li>
<li>Since this program has the <code>__main__</code> clause, you can click on the little green arrow to the left of the <code>__main__</code> clause and choose <em>Run ‘guess_game’</em> from there.</li>
</ol>
<p>Use any one of the options above to run the program, and you’ll see the Run Tool pane appear at the bottom of the window, with your code output showing:</p>
<p><a href="https://files.realpython.com/media/pycharm-running-script.33fb830f45b4.gif" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-running-script.33fb830f45b4.gif" width="1068" height="720" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-running-script.33fb830f45b4.gif&w=267&sig=44be962297881f8ae66557c19905a55202ee14de 267w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-running-script.33fb830f45b4.gif&w=534&sig=000552999cea7a9ce79eeec2f9db7361e9585d77 534w, https://files.realpython.com/media/pycharm-running-script.33fb830f45b4.gif 1068w" sizes="75vw" alt="Running a script in PyCharm"/></a></p>
<p>Play the game for a little bit to see if you can find the number guessed. Pro tip: start with 50. </p>
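<p>Why start with 50? Guessing the midpoint of the remaining range halves the search space each time. The sketch below is not part of the game. It uses a hypothetical helper, <code>guesses_needed()</code>, to count how many guesses that halving strategy takes for every possible secret number from 0 to 100:</p>

```python
def guesses_needed(secret, low=0, high=100):
    """Count the guesses a halving strategy needs to find secret in [low, high]."""
    count = 0
    while True:
        guess = (low + high) // 2  # always guess the midpoint
        count += 1
        if guess == secret:
            return count
        if guess < secret:
            low = guess + 1   # secret is above the guess
        else:
            high = guess - 1  # secret is below the guess


# Try every possible secret number and keep the largest guess count.
worst_case = max(guesses_needed(n) for n in range(101))
print(worst_case)  # 7
```

<p>With this strategy, no secret number in the range takes more than seven guesses.</p>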
<h2 id="debugging-in-pycharm">Debugging in PyCharm</h2>
<p>Did you find the number? If so, you may have seen something weird after you found the number. Instead of printing the congratulations message and exiting, the program seems to start over. That’s a bug right there. To discover why the program starts over, you’ll now debug the program.</p>
<p>First, place a breakpoint by clicking on the blank space to the left of line number 8:</p>
<p><a href="https://files.realpython.com/media/pycharm-debug-breakpoint.55cf93c49859.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-debug-breakpoint.55cf93c49859.png" width="1042" height="710" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debug-breakpoint.55cf93c49859.png&w=260&sig=e714eeae34fad6c0e5889bee0f236f9c30e100a0 260w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debug-breakpoint.55cf93c49859.png&w=521&sig=7c4692a273d49a627785a715fd0a08d4c8120649 521w, https://files.realpython.com/media/pycharm-debug-breakpoint.55cf93c49859.png 1042w" sizes="75vw" alt="Debug breakpoint in PyCharm"/></a></p>
<p>This will be the point where the program will be suspended, and you can start exploring what went wrong from there on. Next, choose one of the following three ways to start debugging:</p>
<ol>
<li>Press <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-d">D</kbd></span> on Mac or <span class="keys"><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-f9">F9</kbd></span> on Windows or Linux.</li>
<li>Right-click the background and choose <em>Debug ‘guess_game’</em>.</li>
<li>Click on the little green arrow to the left of the <code>__main__</code> clause and choose <em>Debug ‘guess_game’</em> from there.</li>
</ol>
<p>Afterwards, you’ll see a <em>Debug</em> window open at the bottom:</p>
<p><a href="https://files.realpython.com/media/pycharm-debugging-start.04246b743469.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-debugging-start.04246b743469.png" width="1043" height="711" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debugging-start.04246b743469.png&w=260&sig=cea78f8df9a7f183330e3610c90a2abeab879923 260w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debugging-start.04246b743469.png&w=521&sig=6ae0fe4515cf7f0d52cbcda986eb9c52d0a602ee 521w, https://files.realpython.com/media/pycharm-debugging-start.04246b743469.png 1043w" sizes="75vw" alt="Start of debugging in PyCharm"/></a></p>
<p>Follow the steps below to debug the program:</p>
<ol>
<li>
<p>Notice that the current line is highlighted in blue.</p>
</li>
<li>
<p>See that <code>random_int</code> and its value are listed in the Debug window. Make a note of this number. (In the picture, the number is 85.)</p>
</li>
<li>
<p>Hit <span class="keys"><kbd class="key-f8">F8</kbd></span> to execute the current line and step <em>over</em> to the next one. You can also use <span class="keys"><kbd class="key-f7">F7</kbd></span> to step <em>into</em> the function in the current line, if necessary. As you continue executing the statements, the changes in the variables will be automatically reflected in the Debugger window.</p>
</li>
<li>
<p>Notice the Console tab right next to the Debugger tab that opened. Only one of these two tabs is visible at a time. In the Console tab, you interact with your program, and in the Debugger tab, you perform the debugging actions.</p>
</li>
<li>
<p>Switch to the Console tab to enter your guess.</p>
</li>
<li>
<p>Type the number shown, and then hit <span class="keys"><kbd class="key-enter">Enter</kbd></span>.</p>
</li>
<li>
<p>Switch back to the Debugger tab.</p>
</li>
<li>
<p>Hit <span class="keys"><kbd class="key-f8">F8</kbd></span> again to evaluate the <code>if</code> statement. Notice that you are now on line 14. But wait a minute! Why didn’t it go to line 11? The reason is that the <code>if</code> statement on line 10 evaluated to <code>False</code>. But why did it evaluate to <code>False</code> when you entered the number that was chosen?</p>
</li>
<li>
<p>Look carefully at line 10 and notice that we are comparing <code>user_guess</code> with the wrong thing. Instead of comparing it with <code>random_int</code>, we are comparing it with <code>randint</code>, the function that was imported from the <code>random</code> module.</p>
</li>
<li>
<p>Change it to <code>random_int</code>, restart the debugging, and follow the same steps again. You will see that, this time, line 10 evaluates to <code>True</code> and execution moves on to line 11:</p>
</li>
</ol>
<p><a href="https://files.realpython.com/media/pycharm-debugging-scripts.bb5a077da438.gif" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-debugging-scripts.bb5a077da438.gif" width="1092" height="720" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debugging-scripts.bb5a077da438.gif&w=273&sig=fc5de269fbc13ea5c1d8be4ca7f525e04a4bb68c 273w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debugging-scripts.bb5a077da438.gif&w=546&sig=4719f69fd6e78ed3e68f16a9355d00d0edd85a26 546w, https://files.realpython.com/media/pycharm-debugging-scripts.bb5a077da438.gif 1092w" sizes="75vw" alt="Debugging Script in PyCharm"/></a></p>
<p>Congratulations! You fixed the bug.</p>
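<p>The root cause is easy to reproduce outside the debugger. In this minimal sketch, the value 85 is simply the example number from the debugging session. An integer never compares equal to a function object, so the buggy condition is always <code>False</code>, no matter what the user types:</p>

```python
from random import randint

random_int = 85  # stands in for the secret number
user_guess = 85  # the user types the correct number

# Buggy comparison: an int is compared with the randint function itself.
buggy_result = user_guess == randint
print(buggy_result)  # False

# Fixed comparison: an int is compared with the stored secret number.
fixed_result = user_guess == random_int
print(fixed_result)  # True
```

<p>Python doesn’t raise an error for the buggy comparison, which is why the program kept running instead of failing loudly.</p>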
<h2 id="testing-in-pycharm">Testing in PyCharm</h2>
<p>No application is reliable without unit tests. PyCharm helps you write and run them very quickly and comfortably. By default, <a href="https://docs.python.org/3/library/unittest.html"><code>unittest</code></a> is used as the test runner, but PyCharm also supports other testing frameworks such as <a href="http://www.pytest.org/en/latest/"><code>pytest</code></a>, <a href="https://nose.readthedocs.io/en/latest/"><code>nose</code></a>, <a href="https://docs.python.org/3/library/doctest.html"><code>doctest</code></a>, <a href="https://www.jetbrains.com/help/pycharm/tox-support.html"><code>tox</code></a>, and <a href="https://twistedmatrix.com/trac/wiki/TwistedTrial"><code>trial</code></a>. You can, for example, enable <code>pytest</code> for your project like this:</p>
<ol>
<li>Open the <em>Settings/Preferences → Tools → Python Integrated Tools</em> settings dialog.</li>
<li>Select <code>pytest</code> in the Default test runner field.</li>
<li>Click <em>OK</em> to save the settings. </li>
</ol>
<p>For this example, we’ll be using the default test runner <code>unittest</code>. </p>
<p>In the same project, create a file called <code>calculator.py</code> and put the following <code>Calculator</code> class in it:</p>
<div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="k">class</span> <span class="nc">Calculator</span><span class="p">:</span>
<span class="lineno"> 2 </span>    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="lineno"> 3 </span>        <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
<span class="lineno"> 4 </span>
<span class="lineno"> 5 </span>    <span class="k">def</span> <span class="nf">multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="lineno"> 6 </span>        <span class="k">return</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
</pre></div>
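<p>Before generating tests, it can help to confirm by hand what the two methods return. This is only a quick sanity check, with the class repeated so that the snippet runs on its own. The input values are arbitrary:</p>

```python
class Calculator:
    def add(self, a, b):
        return a + b

    def multiply(self, a, b):
        return a * b


calc = Calculator()
total = calc.add(2, 3)         # 2 + 3
product = calc.multiply(2, 3)  # 2 * 3
print(total, product)  # 5 6
```

<p>Values like these are natural candidates for the assertions you’d later put into unit tests.</p>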
<p>PyCharm makes it very easy to create tests for your existing code. With the <code>calculator.py</code> file open, execute any one of the following that you like:</p>
<ul>
<li>Press <span class="keys"><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-t">T</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-t">T</kbd></span> on Windows or Linux.</li>
<li>Right-click in the background of the class and then choose <em>Go To</em> and <em>Test</em>.</li>
<li>On the main menu, choose <em>Navigate → Test</em>.</li>
</ul>
<p>Choose <em>Create New Test…</em>, and you will see the following window:</p>
<p><a href="https://files.realpython.com/media/pycharm-create-tests.9a6cea78f9c6.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-create-tests.9a6cea78f9c6.png" width="500" height="402" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-create-tests.9a6cea78f9c6.png&w=125&sig=0c50b83f35578fd8004dce9e7d55fcd3b09a1967 125w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-create-tests.9a6cea78f9c6.png&w=250&sig=f6740142e71d2d2f6196ffbe341ae09bc9a64453 250w, https://files.realpython.com/media/pycharm-create-tests.9a6cea78f9c6.png 500w" sizes="75vw" alt="Create tests in PyCharm"/></a></p>
<p>Leave the defaults of <em>Target directory</em>, <em>Test file name</em>, and <em>Test class name</em>. Select both of the methods and click <em>OK</em>. Voilà! PyCharm has automatically created a file called <code>test_calculator.py</code> containing the following stub tests:</p>
<div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">from</span> <span class="nn">unittest</span> <span class="k">import</span> <span class="n">TestCase</span>
<span class="lineno"> 2 </span>
<span class="lineno"> 3 </span><span class="k">class</span> <span class="nc">TestCalculator</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
<span class="lineno"> 4 </span> <span class="k">def</span> <span class="nf">test_add</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="lineno"> 5 </span> <span class="bp">self</span><span class="o">.</span><span class="n">fail</span><span class="p">()</span>
<span class="lineno"> 6 </span>
<span class="lineno"> 7 </span> <span class="k">def</span> <span class="nf">test_multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="lineno"> 8 </span> <span class="bp">self</span><span class="o">.</span><span class="n">fail</span><span class="p">()</span>
</pre></div>
<p>Run the tests using one of the methods below:</p>
<ul>
<li>Press <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-r">R</kbd></span> on Mac or <span class="keys"><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-f10">F10</kbd></span> on Windows or Linux.</li>
<li>Right-click the background and choose <em>Run ‘Unittests for test_calculator.py’</em>.</li>
<li>Click on the little green arrow to the left of the test class name and choose <em>Run ‘Unittests for test_calculator.py’</em>.</li>
</ul>
<p>You’ll see the tests window open on the bottom with all the tests failing:</p>
<p><a href="https://files.realpython.com/media/pycharm-failed-tests.810aa9c365cb.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-failed-tests.810aa9c365cb.png" width="972" height="645" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-failed-tests.810aa9c365cb.png&w=243&sig=cb7ef285c20ed83b9771a91cc38d77342e4d3745 243w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-failed-tests.810aa9c365cb.png&w=486&sig=aaec55471da2d168048783b9c4e573f5ce876de4 486w, https://files.realpython.com/media/pycharm-failed-tests.810aa9c365cb.png 972w" sizes="75vw" alt="Failed tests in PyCharm"/></a></p>
<p>Notice that you have the hierarchy of the test results on the left and the output of the terminal on the right. </p>
<p>Now, implement <code>test_add</code> by changing the code to the following:</p>
<div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">from</span> <span class="nn">unittest</span> <span class="k">import</span> <span class="n">TestCase</span>
<span class="lineno"> 2 </span>
<span class="lineno"> 3 </span><span class="kn">from</span> <span class="nn">calculator</span> <span class="k">import</span> <span class="n">Calculator</span>
<span class="lineno"> 4 </span>
<span class="lineno"> 5 </span><span class="k">class</span> <span class="nc">TestCalculator</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
<span class="lineno"> 6 </span> <span class="k">def</span> <span class="nf">test_add</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="lineno"> 7 </span> <span class="bp">self</span><span class="o">.</span><span class="n">calculator</span> <span class="o">=</span> <span class="n">Calculator</span><span class="p">()</span>
<span class="lineno"> 8 </span> <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">calculator</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span> <span class="mi">7</span><span class="p">)</span>
<span class="lineno"> 9 </span>
<span class="lineno">10 </span> <span class="k">def</span> <span class="nf">test_multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="lineno">11 </span> <span class="bp">self</span><span class="o">.</span><span class="n">fail</span><span class="p">()</span>
</pre></div>
<p>Run the tests again, and you’ll see that one test passed and the other failed. Explore the options to show passed tests, to show ignored tests, to sort tests alphabetically, and to sort tests by duration:</p>
<p><a href="https://files.realpython.com/media/pycharm-running-tests.6077562207ba.gif" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-running-tests.6077562207ba.gif" width="1092" height="720" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-running-tests.6077562207ba.gif&w=273&sig=e2238425e1cfb9a9a298741244f3021f3984dbf8 273w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-running-tests.6077562207ba.gif&w=546&sig=85203afd5ff2189fdfadad0960da514b03063953 546w, https://files.realpython.com/media/pycharm-running-tests.6077562207ba.gif 1092w" sizes="75vw" alt="Running tests in PyCharm"/></a></p>
<p>Note that the <code>sleep(0.1)</code> call that you see in the GIF above is there intentionally: it makes one of the tests slower so that sorting by duration has a visible effect.</p>
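<p>As a minimal sketch of where this leaves the test file, the version below also implements <code>test_multiply</code> and adds the artificial delay. The <code>Calculator</code> class is repeated so that the snippet runs on its own, and two details are assumptions rather than something the tutorial specifies: the <code>setUp()</code> method and the choice of <code>test_multiply</code> as the slow test.</p>

```python
import time
import unittest


class Calculator:
    """Same two-method class as calculator.py, repeated so the sketch runs standalone."""

    def add(self, a, b):
        return a + b

    def multiply(self, a, b):
        return a * b


class TestCalculator(unittest.TestCase):
    def setUp(self):
        # Assumption: create a fresh Calculator before every test
        # instead of building it inside each test method.
        self.calculator = Calculator()

    def test_add(self):
        self.assertEqual(self.calculator.add(3, 4), 7)

    def test_multiply(self):
        # Artificial delay so this test takes visibly longer than test_add,
        # which makes sorting by duration meaningful.
        time.sleep(0.1)
        self.assertEqual(self.calculator.multiply(3, 4), 12)
```

<p>In a real project, the class would still be imported with <code>from calculator import Calculator</code>, and you would run the tests from PyCharm exactly as described above.</p>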
<h2 id="editing-an-existing-project-in-pycharm">Editing an Existing Project in PyCharm</h2>
<p>These single file projects are great for examples, but you’ll often work on much larger projects over a longer period of time. In this section, you’ll take a look at how PyCharm works with a larger project. </p>
<p>To explore the project-focused features of PyCharm, you’ll use the Alcazar web framework that was built for learning purposes. To continue following along, clone the repo locally:</p>
<div class="alert alert-warning" role="alert"><p><strong>Clone Repo:</strong> <a href="https://realpython.com/optins/view/alcazar-web-framework/" class="alert-link" data-toggle="modal" data-target="#modal-alcazar-web-framework" data-focus="false">Click here to clone the repo you'll use</a> to explore the project-focused features of PyCharm in this tutorial.</p></div>
<p>Once you have a project locally, open it in PyCharm using one of the following methods:</p>
<ul>
<li>Click <em>File → Open</em> on the main menu.</li>
<li>Click <em>Open</em> on the <a href="https://www.jetbrains.com/help/pycharm/welcome-screen.html">Welcome Screen</a> if you are there.</li>
</ul>
<p>After either of these steps, find the folder containing the project on your computer and open it.</p>
<p>If this project contains a <a href="https://realpython.com/python-virtual-environments-a-primer/">virtual environment</a>, then PyCharm will automatically use this virtual environment and make it the project interpreter.</p>
<p>If you need to configure a different <code>virtualenv</code>, then open <em>Preferences</em> on Mac by pressing <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-comma">,</kbd></span> or <em>Settings</em> on Windows or Linux by pressing <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-s">S</kbd></span> and find the <em>Project: ProjectName</em> section. Open the drop-down and choose <em>Project Interpreter</em>:</p>
<p><a href="https://files.realpython.com/media/pycharm-project-interpreter.57282306555a.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-project-interpreter.57282306555a.png" width="1083" height="723" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-project-interpreter.57282306555a.png&w=270&sig=286643bc473f648bbcce27338c980eb023746ac2 270w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-project-interpreter.57282306555a.png&w=541&sig=6d33c5adf0380e8c5b01ae9430436983580b4b49 541w, https://files.realpython.com/media/pycharm-project-interpreter.57282306555a.png 1083w" sizes="75vw" alt="Project interpreter in PyCharm"/></a></p>
<p>Choose the <code>virtualenv</code> from the drop-down list. If it’s not there, then click on the settings button to the right of the drop-down list and then choose <em>Add…</em>. The rest of the steps should be the same as when we were <a href="#writing-code-in-pycharm">creating a new project</a>.</p>
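<p>To double-check which interpreter the project ended up with, a short script along these lines can be run from PyCharm. It is only a sketch: the helper name <code>in_virtualenv()</code> is made up for this example, and the check assumes an environment created with <code>venv</code> or a recent version of <code>virtualenv</code>.</p>

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

<p>If the printed interpreter path is not inside the environment you expected, revisit the <em>Project Interpreter</em> setting described above.</p>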
<h2 id="searching-and-navigating-in-pycharm">Searching and Navigating in PyCharm</h2>
<p>In a big project where it’s difficult for a single person to remember where everything is located, it’s very important to be able to quickly navigate and find what you’re looking for. PyCharm has you covered here as well. Use the project you opened in the section above to practice these shortcuts:</p>
<ul>
<li><strong>Searching for a fragment in the current file:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-f">F</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-f">F</kbd></span> on Windows or Linux.</li>
<li><strong>Searching for a fragment in the entire project:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-f">F</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-f">F</kbd></span> on Windows or Linux.</li>
<li><strong>Searching for a class:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-o">O</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-n">N</kbd></span> on Windows or Linux.</li>
<li><strong>Searching for a file:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-o">O</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-n">N</kbd></span> on Windows or Linux.</li>
<li><strong>Searching everywhere if you don’t know whether you’re looking for a file, a class, or a code fragment:</strong> Press <span class="keys"><kbd class="key-shift">Shift</kbd></span> twice.</li>
</ul>
<p>As for the navigation, the following shortcuts may save you a lot of time:</p>
<ul>
<li><strong>Going to the declaration of a variable:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd></span> on Windows or Linux, and click on the variable.</li>
<li><strong>Finding usages of a class, a method, or any symbol:</strong> Press <span class="keys"><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-f7">F7</kbd></span>.</li>
<li><strong>Seeing your recent changes:</strong> Press <span class="keys"><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-c">C</kbd></span> or go to <em>View → Recent Changes</em> on the main menu.</li>
<li><strong>Seeing your recent files:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-e">E</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-e">E</kbd></span> on Windows or Linux, or go to <em>View → Recent Files</em> on the main menu.</li>
<li><strong>Going backward and forward through your history of navigation after you jumped around:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-bracket-left">[</kbd></span> / <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-bracket-right">]</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-arrow-left">Left</kbd></span> / <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-arrow-right">Right</kbd></span> on Windows or Linux.</li>
</ul>
<p>For more details, see the <a href="https://www.jetbrains.com/help/pycharm/tutorial-exploring-navigation-and-search.html">official documentation</a>. </p>
<h2 id="using-version-control-in-pycharm">Using Version Control in PyCharm</h2>
<p>Version control systems such as <a href="https://git-scm.com/">Git</a> and <a href="https://www.mercurial-scm.org/">Mercurial</a> are some of the most important tools in the modern software development world, so it is essential for an IDE to support them. PyCharm does that very well by integrating with many popular version control systems, such as Git (and <a href="https://github.com/">GitHub</a>), Mercurial, <a href="https://www.perforce.com/solutions/version-control">Perforce</a>, and <a href="https://subversion.apache.org/">Subversion</a>.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note</strong>: <a href="https://realpython.com/python-git-github-intro/">Git</a> is used for the following examples.</p>
</div>
<h3 id="configuring-vcs">Configuring VCS</h3>
<p>To enable VCS integration, go to <em>VCS → VCS Operations Popup…</em> from the menu on the top or press <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-v">V</kbd></span> on Mac or <span class="keys"><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-grave">`</kbd></span> on Windows or Linux. Choose <em>Enable Version Control Integration…</em>. You’ll see the following window open:</p>
<p><a href="https://files.realpython.com/media/pycharm-enable-vc-integration.b30ec94c1246.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-enable-vc-integration.b30ec94c1246.png" width="715" height="147" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-enable-vc-integration.b30ec94c1246.png&w=178&sig=7ef55e4ed6068c86d831adaefc1af11f4c083763 178w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-enable-vc-integration.b30ec94c1246.png&w=357&sig=c701d809b49e9837477c14e8dfe34dc4cfa66c33 357w, https://files.realpython.com/media/pycharm-enable-vc-integration.b30ec94c1246.png 715w" sizes="75vw" alt="Enable Version Control Integration in PyCharm"/></a></p>
<p>Choose <em>Git</em> from the drop-down list, click <em>OK</em>, and you have VCS enabled for your project. Note that if you opened an existing project that has version control enabled, then PyCharm will detect that and automatically enable it.</p>
<p>Now, if you go to the <em>VCS Operations Popup…</em>, you’ll see a different popup with the options to do <code>git add</code>, <code>git stash</code>, <code>git branch</code>, <code>git commit</code>, <code>git push</code> and more:</p>
<p><a href="https://files.realpython.com/media/pycharm-vcs-operations.70dbafcb983a.png" target="_blank"><img class="img-fluid mx-auto d-block border w-50" src="https://files.realpython.com/media/pycharm-vcs-operations.70dbafcb983a.png" width="392" height="379" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-vcs-operations.70dbafcb983a.png&w=98&sig=f285b015c957936448441c4ec8b03cf8627cdffc 98w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-vcs-operations.70dbafcb983a.png&w=196&sig=a16c449a21dde2a2b0f2f7533c92e71574cd3d60 196w, https://files.realpython.com/media/pycharm-vcs-operations.70dbafcb983a.png 392w" sizes="75vw" alt="VCS operations in PyCharm"/></a></p>
<p>If you can’t find what you need, you can most probably find it by going to <em>VCS</em> from the top menu and choosing <em>Git</em>, where you can even create and view pull requests.</p>
<h3 id="committing-and-conflict-resolution">Committing and Conflict Resolution</h3>
<p>These are two features of VCS integration in PyCharm that I personally use and enjoy a lot! Let’s say you have finished your work and want to commit it. Go to <em>VCS → VCS Operations Popup… → Commit…</em> or press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-k">K</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-k">K</kbd></span> on Windows or Linux. You’ll see the following window open:</p>
<p><a href="https://files.realpython.com/media/pycharm-commit-window.a4ceff16c2d3.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-commit-window.a4ceff16c2d3.png" width="929" height="682" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-commit-window.a4ceff16c2d3.png&w=232&sig=935dabf7a28cf757a5c87165e3da494540c3e4a6 232w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-commit-window.a4ceff16c2d3.png&w=464&sig=7a05fe8a5cbfb1c3524c24515f9a8d7fa3211591 464w, https://files.realpython.com/media/pycharm-commit-window.a4ceff16c2d3.png 929w" sizes="75vw" alt="Commit window in PyCharm"/></a></p>
<p>In this window, you can do the following:</p>
<ol>
<li>Choose which files to commit</li>
<li>Write your commit message</li>
<li>Do all kinds of checks and cleanup <a href="https://www.jetbrains.com/help/idea/commit-changes-dialog.html#before_commit">before commit</a></li>
<li>See the difference of changes</li>
<li>Commit and push at once by pressing the arrow to the right of the <em>Commit</em> button at the bottom right and choosing <em>Commit and Push…</em></li>
</ol>
<p>It can feel magical and fast, especially if you’re used to doing everything manually on the command line.</p>
<p>When you work in a team, <strong>merge conflicts</strong> do happen. When somebody commits changes to a file that you’re working on, but their changes overlap with yours because both of you changed the same lines, then VCS will not be able to figure out if it should choose your changes or those of your teammate. So you’ll get these unfortunate arrows and symbols:</p>
<p><a href="https://files.realpython.com/media/pycharm-conflicts.74b23b9ec798.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-conflicts.74b23b9ec798.png" width="996" height="691" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-conflicts.74b23b9ec798.png&w=249&sig=d02221be0ce12dbc9a8ea7514e047bb608b16c08 249w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-conflicts.74b23b9ec798.png&w=498&sig=1286c7aa0f62795e01c6a2f0607096b8c0d67b20 498w, https://files.realpython.com/media/pycharm-conflicts.74b23b9ec798.png 996w" sizes="75vw" alt="Conflicts in PyCharm"/></a></p>
<p>This looks strange, and it’s difficult to figure out which changes should be deleted and which ones should stay. PyCharm to the rescue! It has a much nicer and cleaner way of resolving conflicts. Go to <em>VCS</em> in the top menu, choose <em>Git</em> and then <em>Resolve conflicts…</em>. Choose the file whose conflicts you want to resolve and click on <em>Merge</em>. You will see the following window open:</p>
<p><a href="https://files.realpython.com/media/pycharm-conflict-resolving-window.eea8f79a12b2.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-conflict-resolving-window.eea8f79a12b2.png" width="1174" height="709" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-conflict-resolving-window.eea8f79a12b2.png&w=293&sig=ef195e8acbb5ec9fa55fca46a43486995c2efca7 293w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-conflict-resolving-window.eea8f79a12b2.png&w=587&sig=067731ba579736c92c747eb128b3fb02b0d68417 587w, https://files.realpython.com/media/pycharm-conflict-resolving-window.eea8f79a12b2.png 1174w" sizes="75vw" alt="Conflict resolving window in PyCharm"/></a></p>
<p>In the left column, you will see your changes. In the right column, you will see the changes made by your teammate. Finally, in the middle column, you will see the result. The conflicting lines are highlighted, and you can see a little <em>X</em> and <em>>></em>/<em><<</em> right beside those lines. Press the arrows to accept the changes and the <em>X</em> to decline. After you resolve all those conflicts, click the <em>Apply</em> button:</p>
<p><a href="https://files.realpython.com/media/pycharm-resolving-conflicts.d3128ce78c45.gif" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-resolving-conflicts.d3128ce78c45.gif" width="1200" height="720" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-resolving-conflicts.d3128ce78c45.gif&w=300&sig=099dcca659431f9d2a1315b1fea5d7cbe246425c 300w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-resolving-conflicts.d3128ce78c45.gif&w=600&sig=6b98751300d8750c93d80cf7b65e96fb57d81714 600w, https://files.realpython.com/media/pycharm-resolving-conflicts.d3128ce78c45.gif 1200w" sizes="75vw" alt="Resolving Conflicts in PyCharm"/></a></p>
<p>In the GIF above, for the first conflicting line, the author declined his own changes and accepted those of his teammates. Conversely, the author accepted his own changes and declined his teammates’ for the second conflicting line.</p>
<p>There’s a lot more that you can do with the VCS integration in PyCharm. For more details, see <a href="https://www.jetbrains.com/help/pycharm/version-control-integration.html">this documentation</a>.</p>
<h2 id="using-plugins-and-external-tools-in-pycharm">Using Plugins and External Tools in PyCharm</h2>
<p>You can find almost everything you need for development in PyCharm. If you can’t, there is most probably a <a href="https://plugins.jetbrains.com/">plugin</a> that adds the functionality you need. For example, plugins can:</p>
<ul>
<li>Add support for various languages and frameworks </li>
<li>Boost your productivity with shortcut hints, file watchers, and so on </li>
<li>Help you learn a new programming language with coding exercises</li>
</ul>
<p>For instance, <a href="https://plugins.jetbrains.com/plugin/164-ideavim">IdeaVim</a> adds Vim emulation to PyCharm. If you like Vim, this can be a pretty good combination. </p>
<p><a href="https://plugins.jetbrains.com/plugin/8006-material-theme-ui">Material Theme UI</a> changes the appearance of PyCharm to a Material Design look and feel: </p>
<p><a href="https://files.realpython.com/media/pycharm-material-theme.178175815adc.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pycharm-material-theme.178175815adc.png" width="1110" height="743" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-material-theme.178175815adc.png&w=277&sig=60ec4c8b5f6a89af345a230518e21ee8a33d174b 277w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-material-theme.178175815adc.png&w=555&sig=fdac8603145ed77eb443abcbbe8ebdb9811bdff4 555w, https://files.realpython.com/media/pycharm-material-theme.178175815adc.png 1110w" sizes="75vw" alt="Material Theme in PyCharm"/></a></p>
<p><a href="https://plugins.jetbrains.com/plugin/9442-vue-js">Vue.js</a> adds support for <a href="https://vuejs.org/">Vue.js</a> projects. <a href="https://plugins.jetbrains.com/plugin/7793-markdown">Markdown</a> provides the capability to edit Markdown files within the IDE and see the rendered HTML in a live preview. You can find and install all of the available plugins by going to <em>Preferences → Plugins</em> on Mac or <em>Settings → Plugins</em> on Windows or Linux, under the <em>Marketplace</em> tab:</p>
<p><a href="https://files.realpython.com/media/pycharm-plugin-marketplace.7d1cecfdc8b3.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-plugin-marketplace.7d1cecfdc8b3.png" width="1047" height="687" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-plugin-marketplace.7d1cecfdc8b3.png&w=261&sig=8d4a9ba35b5eb27b5604f86108b79f28e40f3cc9 261w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-plugin-marketplace.7d1cecfdc8b3.png&w=523&sig=ec5e20ae96bf9ee37a1011bcc2203e7d1599b4a4 523w, https://files.realpython.com/media/pycharm-plugin-marketplace.7d1cecfdc8b3.png 1047w" sizes="75vw" alt="Plugin Marketplace in PyCharm"/></a></p>
<p>If you can’t find what you need, you can even <a href="http://www.jetbrains.org/intellij/sdk/docs/basics.html">develop your own plugin</a>.</p>
<p>If you can’t find the right plugin and don’t want to develop your own because there’s already a package in PyPI, then you can add it to PyCharm as an external tool. Take <a href="http://flake8.pycqa.org/en/latest/"><code>Flake8</code></a>, the code analyzer, as an example. </p>
<p>First, install <code>flake8</code> in your virtualenv with <code>pip install flake8</code> in the Terminal app of your choice. You can also use the one integrated into PyCharm:</p>
<p><a href="https://files.realpython.com/media/pycharm-terminal.bb20cae6697e.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pycharm-terminal.bb20cae6697e.png" width="972" height="646" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-terminal.bb20cae6697e.png&w=243&sig=860217f31e60a4bb574e169ee05b6788cacaa388 243w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-terminal.bb20cae6697e.png&w=486&sig=26a30f328e42cc19e7e652c44767093480dbf352 486w, https://files.realpython.com/media/pycharm-terminal.bb20cae6697e.png 972w" sizes="75vw" alt="Terminal in PyCharm"/></a></p>
<p>Next, go to <em>Preferences → Tools</em> on Mac or <em>Settings → Tools</em> on Windows or Linux, and choose <em>External Tools</em>. Click on the little <em>+</em> button at the bottom (1). In the new popup window, insert the details as shown below and click <em>OK</em> for both windows:</p>
<p><a href="https://files.realpython.com/media/pycharm-flake8-tool.3963506224b4.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pycharm-flake8-tool.3963506224b4.png" width="1082" height="720" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-flake8-tool.3963506224b4.png&w=270&sig=152f2ccf0a75a950b6a5cd6b5087507b66288595 270w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-flake8-tool.3963506224b4.png&w=541&sig=ee7221ee226076dfeaedfd5d2756263e351d8d14 541w, https://files.realpython.com/media/pycharm-flake8-tool.3963506224b4.png 1082w" sizes="75vw" alt="Flake8 tool in PyCharm"/></a></p>
<p>Here, <em>Program</em> (2) refers to the Flake8 executable that can be found in the folder <em>/bin</em> of your virtual environment. <em>Arguments</em> (3) refers to which file you want to analyze with the help of Flake8. <em>Working directory</em> is the directory of your project.</p>
<p>You could hardcode the absolute paths for everything here, but that would mean that you couldn’t use this external tool in other projects. You would be able to use it only inside one project for one file. </p>
<p>So you need to use something called <em>Macros</em>. Macros are basically variables in the format of <code>$name$</code> that change according to your context. For example, <code>$FileName$</code> is <code>first.py</code> when you’re editing <code>first.py</code>, and it is <code>second.py</code> when you’re editing <code>second.py</code>. You can see their list and insert any of them by clicking on the <em>Insert Macro…</em> buttons. Because you used macros here, the values will change according to the project you’re currently working on, and Flake8 will continue to do its job properly. </p>
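<p>The idea behind macros can be sketched in a few lines of Python. This is not how PyCharm implements them. It is a toy illustration in which the function names, the example paths, and the macro names other than <code>$FileName$</code> are assumptions made for the example:</p>

```python
import re
from pathlib import Path


def expand_macros(template, context):
    """Replace each $Name$ placeholder in template with its value from context."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown macros untouched rather than guessing a value.
        return context.get(name, match.group(0))

    return re.sub(r"\$(\w+)\$", substitute, template)


def context_for(file_path, project_dir):
    """Build illustrative macro values for the file that is currently being edited."""
    path = Path(file_path)
    return {
        "FileName": path.name,
        "FilePath": str(path),
        "ProjectFileDir": str(project_dir),
    }


# Hypothetical paths: the same template yields different arguments per file.
first = context_for("/projects/demo/first.py", "/projects/demo")
second = context_for("/projects/demo/second.py", "/projects/demo")

print(expand_macros("flake8 $FileName$", first))
print(expand_macros("flake8 $FileName$", second))
```

<p>The template stays the same while the context changes, which is why a single external tool definition keeps working as you move between files and projects.</p>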
<p>In order to use it, create a file <code>example.py</code> and put the following code in it:</p>
<div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="n">CONSTANT_VAR</span> <span class="o">=</span> <span class="mi">1</span>
<span class="lineno"> 2 </span>
<span class="lineno"> 3 </span>
<span class="lineno"> 4 </span>
<span class="lineno"> 5 </span><span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="lineno"> 6 </span> <span class="n">c</span> <span class="o">=</span> <span class="s2">"hello"</span>
<span class="lineno"> 7 </span> <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
</pre></div>
<p>It deliberately breaks some of the Flake8 rules. Right-click the background of this file. Choose <em>External Tools</em> and then <em>Flake8</em>. Voilà! The output of the Flake8 analysis will appear at the bottom:</p>
<p><a href="https://files.realpython.com/media/pycharm-flake8-output.5b78e911e6d3.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-flake8-output.5b78e911e6d3.png" width="997" height="634" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-flake8-output.5b78e911e6d3.png&w=249&sig=8fecbaaf9d4e2daa2bbe443be4b6dee2634f2a46 249w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-flake8-output.5b78e911e6d3.png&w=498&sig=4e41c6dbac28fbe23c18716f49c3cadf63767500 498w, https://files.realpython.com/media/pycharm-flake8-output.5b78e911e6d3.png 997w" sizes="75vw" alt="Flake8 Output in PyCharm"/></a></p>
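<p>For comparison, here is one way <code>example.py</code> could look after cleanup. This is a sketch that assumes Flake8’s default configuration, under which the likely complaints are the three blank lines before the function (two are expected) and the local variable <code>c</code>, which is assigned but never used:</p>

```python
CONSTANT_VAR = 1


def add(a, b):
    # The unused local variable is gone, and exactly two blank lines
    # separate the constant from the function definition.
    return a + b
```

<p>Running the external tool on a file like this should leave those two complaints out of the report, although your own Flake8 configuration may enable additional checks.</p>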
<p>To make the tool even more convenient, you can add a shortcut for it. Go to <em>Preferences</em> on Mac or to <em>Settings</em> on Windows or Linux. Then, go to <em>Keymap → External Tools → External Tools</em>. Double-click <em>Flake8</em> and choose <em>Add Keyboard Shortcut</em>. You’ll see this window:</p>
<p><a href="https://files.realpython.com/media/pycharm-add-shortcut.8c66b2bd12c0.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pycharm-add-shortcut.8c66b2bd12c0.png" width="1084" height="724" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-add-shortcut.8c66b2bd12c0.png&w=271&sig=e95cc32634af125588e6881ec6992dace79ec667 271w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-add-shortcut.8c66b2bd12c0.png&w=542&sig=836cbeec5256df8e8048aa6a0c3e7d89b2bac444 542w, https://files.realpython.com/media/pycharm-add-shortcut.8c66b2bd12c0.png 1084w" sizes="75vw" alt="Add shortcut in PyCharm"/></a></p>
<p>In the image above, the shortcut is <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-a">A</kbd></span> for this tool. Add your preferred shortcut in the textbox and click <em>OK</em> for both windows. Now you can use that shortcut to analyze the file you’re currently working on with Flake8.</p>
<h2 id="pycharm-professional-features">PyCharm Professional Features</h2>
<p>PyCharm Professional is a paid version of PyCharm with more out-of-the-box features and integrations. In this section, you’ll mainly be presented with overviews of its main features and links to the official documentation, where each feature is discussed in detail. Remember that none of the following features is available in the Community edition. </p>
<h3 id="django-support">Django Support</h3>
<p>PyCharm has extensive support for <a href="https://www.djangoproject.com/">Django</a>, one of the most popular and beloved <a href="https://realpython.com/learning-paths/become-python-web-developer/">Python web frameworks</a>. To make sure that it’s enabled, do the following:</p>
<ol>
<li>Open <em>Preferences</em> on Mac or <em>Settings</em> on Windows or Linux.</li>
<li>Choose <em>Languages and Frameworks</em>.</li>
<li>Choose <em>Django</em>.</li>
<li>Check the checkbox <em>Enable Django support</em>.</li>
<li>Apply changes.</li>
</ol>
<p>Now that you’ve enabled Django support, your Django development journey will be a lot easier in PyCharm:</p>
<ul>
<li>When creating a project, you’ll have a dedicated Django project type. This means that, when you choose this type, you’ll have all the necessary files and settings. This is the equivalent of using <code>django-admin startproject mysite</code>.</li>
<li>You can run <code>manage.py</code> commands directly inside PyCharm. </li>
<li>Django templates are supported, including:<ul>
<li>Syntax and error highlighting</li>
<li>Code completion</li>
<li>Navigation</li>
<li>Completion for block names</li>
<li>Completion for custom tags and filters</li>
<li>Quick documentation for tags and filters</li>
<li>Capability to debug them</li>
</ul>
</li>
<li>Code completion in all other Django parts such as views, URLs and models, and code insight support for Django ORM.</li>
<li>Model dependency diagrams for Django models.</li>
</ul>
<p>For more details on Django support, see the <a href="https://www.jetbrains.com/help/pycharm/django-support7.html">official documentation</a>.</p>
<h3 id="database-support">Database Support</h3>
<p>Modern database development is a complex task with many supporting systems and workflows. That’s why JetBrains, the company behind PyCharm, developed a standalone IDE called <a href="https://www.jetbrains.com/datagrip/">DataGrip</a> for that. It’s a separate product from PyCharm with a separate license. </p>
<p>Luckily, PyCharm supports all the features that are available in DataGrip through a plugin called <em>Database tools and SQL</em>, which is enabled by default. With its help, you can query, create, and manage databases whether they’re running locally, on a server, or in the cloud. The plugin supports MySQL, PostgreSQL, Microsoft SQL Server, SQLite, MariaDB, Oracle, Apache Cassandra, and others. For more information on what you can do with this plugin, check out <a href="https://www.jetbrains.com/help/pycharm/relational-databases.html">the comprehensive documentation on the database support</a>.</p>
<h3 id="thread-concurrency-visualization">Thread Concurrency Visualization</h3>
<p><a href="https://channels.readthedocs.io/en/latest/"><code>Django Channels</code></a>, <a href="https://realpython.com/async-io-python/"><code>asyncio</code></a>, and recent frameworks like <a href="https://www.starlette.io/"><code>Starlette</code></a> are examples of a growing trend in asynchronous Python programming. While asynchronous programs do bring a lot of benefits to the table, they’re also notoriously hard to write and debug. In such cases, <em>Thread Concurrency Visualization</em> can be just what the doctor ordered because it helps you take full control over your multi-threaded applications and optimize them.</p>
<p>Check out <a href="https://www.jetbrains.com/help/pycharm/thread-concurrency-visualization.html">the comprehensive documentation of this feature</a> for more details.</p>
<h3 id="profiler">Profiler</h3>
<p>Speaking of optimization, profiling is another technique that you can use to optimize your code. With its help, you can see which parts of your code are taking most of the execution time. PyCharm picks the first available profiler in the following order of priority:</p>
<ol>
<li><a href="https://vmprof.readthedocs.io/en/latest/"><code>vmprof</code></a> </li>
<li><a href="https://github.com/sumerc/yappi"><code>yappi</code></a></li>
<li><a href="https://docs.python.org/3/library/profile.html"><code>cProfile</code></a></li>
</ol>
<p>If you don’t have <code>vmprof</code> or <code>yappi</code> installed, then PyCharm falls back to the standard <code>cProfile</code>. The profiler is <a href="https://www.jetbrains.com/help/pycharm/profiler.html">well-documented</a>, so I won’t rehash it here.</p>
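<p>PyCharm drives the profiler from its UI, but you can get a feel for the <code>cProfile</code> fallback by running it by hand. The following is a minimal sketch that uses only the standard library. The function <code>slow_sum()</code> is made up for illustration and exists only to give the profiler something to measure:</p>

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop so the profiler has something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Collect statistics only while the code of interest is running.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Sort by cumulative time and write the top five entries to a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

<p>The report lists call counts and timings per function. The timings will differ from machine to machine, so it’s best to read them as relative indicators rather than absolute numbers.</p>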
<h3 id="scientific-mode">Scientific Mode</h3>
<p>Python is not only a language for general and web programming. It has also emerged as the best tool for data science and machine learning in recent years thanks to libraries and tools like <a href="http://www.numpy.org/">NumPy</a>, <a href="https://www.scipy.org/">SciPy</a>, <a href="https://scikit-learn.org/">scikit-learn</a>, <a href="https://matplotlib.org/">Matplotlib</a>, <a href="https://jupyter.org/">Jupyter</a>, and more. With such powerful libraries available, you need a powerful IDE to support the graphing and analysis functions those libraries provide. PyCharm provides everything you need as <a href="https://www.jetbrains.com/help/pycharm/matplotlib-support.html">thoroughly documented here</a>.</p>
<h3 id="remote-development">Remote Development</h3>
<p>One common cause of bugs in many applications is that development and production environments differ. Although, in most cases, it’s not possible to provide an exact copy of the production environment for development, pursuing it is a worthy goal.</p>
<p>With PyCharm, you can debug your application using an interpreter that is located on another computer, such as a Linux VM. As a result, you can have the same interpreter as your production environment to fix and avoid many bugs resulting from the difference between development and production environments. Make sure to check out the <a href="https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html">official documentation</a> to learn more.</p>
<h2 id="conclusion">Conclusion</h2>
<p>PyCharm is one of the best, if not the best, full-featured, dedicated, and versatile IDEs for Python development. It offers a ton of benefits, saving you a lot of time by helping you with routine tasks. Now you know how to be productive with it!</p>
<p>In this article, you learned a lot, including:</p>
<ul>
<li>Installing PyCharm</li>
<li>Writing code in PyCharm</li>
<li>Running your code in PyCharm</li>
<li>Debugging and testing your code in PyCharm</li>
<li>Editing an existing project in PyCharm</li>
<li>Searching and navigating in PyCharm</li>
<li>Using Version Control in PyCharm</li>
<li>Using Plugins and External Tools in PyCharm</li>
<li>Using PyCharm Professional features, such as Django support and Scientific mode</li>
</ul>
<p>If there’s anything you’d like to ask or share, please reach out in the comments below. There’s also a lot more information at the <a href="https://www.jetbrains.com/pycharm/documentation/">PyCharm website</a> for you to explore.</p>
<div class="alert alert-warning" role="alert"><p><strong>Clone Repo:</strong> <a href="https://realpython.com/optins/view/alcazar-web-framework/" class="alert-link" data-toggle="modal" data-target="#modal-alcazar-web-framework" data-focus="false">Click here to clone the repo you'll use</a> to explore the project-focused features of PyCharm in this tutorial.</p></div>
<hr />
<p><em>[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&utm_medium=rss&utm_campaign=footer">>> Click here to learn more and see examples</a> ]</em></p>
How to Use Python Lambda Functions
https://realpython.com/courses/python-lambda-functions/
2019-08-27T14:00:00+00:00
In this step-by-step course, you'll learn about Python lambda functions. You'll see how they compare with regular functions and how you can use them in accordance with best practices.
<p>Python and other languages like Java, C#, and even C++ have had lambda functions added to their syntax, whereas languages like LISP or the ML family of languages, Haskell, OCaml, and F#, use lambdas as a core concept. Python lambdas are little, anonymous functions, subject to a more restrictive but more concise syntax than regular Python functions.</p>
<p><strong>By the end of this course, you’ll know:</strong></p>
<ul>
<li>How Python lambdas came to be </li>
<li>How lambdas compare with regular function objects</li>
<li>How to write lambda functions</li>
<li>Which functions in the Python standard library leverage lambdas</li>
<li>When to use or avoid Python lambda functions</li>
</ul>
<p>This course is mainly for intermediate to experienced Python programmers, but it is accessible to anyone curious with an interest in programming. All the examples included in this tutorial have been tested with Python 3.7.</p>
A Guide to Excel Spreadsheets in Python With openpyxl
https://realpython.com/openpyxl-excel-spreadsheets-python/
2019-08-26T14:00:00+00:00
In this step-by-step tutorial, you'll learn how to handle spreadsheets in Python using the openpyxl package. You'll learn how to manipulate Excel spreadsheets, extract information from spreadsheets, create simple or more complex spreadsheets, including adding styles, charts, and so on.
<p>Excel spreadsheets are one of those things you might have to deal with at some point. Whether it’s because your boss loves them or because marketing needs them, you might have to learn how to work with spreadsheets, and that’s when knowing <code>openpyxl</code> comes in handy!</p>
<p>Spreadsheets are a very intuitive and user-friendly way to manipulate large datasets without any prior technical background. That’s why they’re still so commonly used today.</p>
<p><strong>In this article, you’ll learn how to use openpyxl to:</strong></p>
<ul>
<li>Manipulate Excel spreadsheets with confidence</li>
<li>Extract information from spreadsheets</li>
<li>Create simple or more complex spreadsheets, including adding styles, charts, and so on</li>
</ul>
<p>This article is written for intermediate developers who have a pretty good knowledge of Python data structures, such as <a href="https://realpython.com/python-dicts/">dicts</a> and <a href="https://realpython.com/python-lists-tuples/">lists</a>, but also feel comfortable around <a href="https://realpython.com/python3-object-oriented-programming/">OOP</a> and more intermediate level topics.</p>
<div class="alert alert-warning" role="alert"><p><strong>Download Dataset:</strong> <a href="https://realpython.com/optins/view/openpyxl-sample-dataset/" class="alert-link" data-toggle="modal" data-target="#modal-openpyxl-sample-dataset" data-focus="false">Click here to download the dataset for the openpyxl exercise you'll be following in this tutorial.</a></p></div>
<h2 id="before-you-begin">Before You Begin</h2>
<p>If you ever get asked to extract some data from a database or log file into an Excel spreadsheet, or if you often have to convert an Excel spreadsheet into some more usable programmatic form, then this tutorial is perfect for you. Let’s jump into the <code>openpyxl</code> caravan!</p>
<h3 id="practical-use-cases">Practical Use Cases</h3>
<p>First things first, when would you need to use a package like <code>openpyxl</code> in a real-world scenario? You’ll see a few examples below, but really, there are hundreds of possible scenarios where this knowledge could come in handy.</p>
<h4 id="importing-new-products-into-a-database">Importing New Products Into a Database</h4>
<p>You’re responsible for tech at an online store, and your boss doesn’t want to pay for a cool and expensive CMS.</p>
<p>Every time they want to add new products to the online store, they come to you with an Excel spreadsheet with a few hundred rows and, for each of them, you have the product name, description, price, and so forth.</p>
<p>Now, to import the data, you’ll have to iterate over each spreadsheet row and add each product to the online store.</p>
<h4 id="exporting-database-data-into-a-spreadsheet">Exporting Database Data Into a Spreadsheet</h4>
<p>Say you have a database table where you record all your users’ information, including name, phone number, email address, and so forth.</p>
<p>Now, the marketing team wants to contact all users to give them some discounted offer or promotion. However, they don’t have access to the database, or they don’t know how to use SQL to extract that information easily.</p>
<p>What can you do to help? Well, you can make a quick script using <code>openpyxl</code> that iterates over every single user record and puts all the essential information into an Excel spreadsheet.</p>
<p>That’s gonna earn you an extra slice of cake at your company’s next birthday party!</p>
<h4 id="appending-information-to-an-existing-spreadsheet">Appending Information to an Existing Spreadsheet</h4>
<p>You may also have to open a spreadsheet, read the information in it and, according to some business logic, append more data to it.</p>
<p>For example, using the online store scenario again, say you get an Excel spreadsheet with a list of users and you need to append to each row the total amount they’ve spent in your store.</p>
<p>This data is in the database. To add it, you have to read the spreadsheet, iterate through each row, fetch the total amount spent from the database, and then write it back to the spreadsheet.</p>
<p>Not a problem for <code>openpyxl</code>!</p>
<h3 id="learning-some-basic-excel-terminology">Learning Some Basic Excel Terminology</h3>
<p>Here’s a quick list of basic terms you’ll see when you’re working with Excel spreadsheets:</p>
<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>Term</th>
<th>Explanation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Spreadsheet or Workbook</td>
<td>A <strong>Spreadsheet</strong> is the main file you are creating or working with.</td>
</tr>
<tr>
<td>Worksheet or Sheet</td>
<td>A <strong>Sheet</strong> is used to split different kinds of content within the same spreadsheet. A <strong>Spreadsheet</strong> can have one or more <strong>Sheets</strong>.</td>
</tr>
<tr>
<td>Column</td>
<td>A <strong>Column</strong> is a vertical line, and it’s represented by an uppercase letter: <em>A</em>.</td>
</tr>
<tr>
<td>Row</td>
<td>A <strong>Row</strong> is a horizontal line, and it’s represented by a number: <em>1</em>.</td>
</tr>
<tr>
<td>Cell</td>
<td>A <strong>Cell</strong> is a combination of <strong>Column</strong> and <strong>Row</strong>, represented by both an uppercase letter and a number: <em>A1</em>.</td>
</tr>
</tbody>
</table>
</div>
<h3 id="getting-started-with-openpyxl">Getting Started With openpyxl</h3>
<p>Now that you’re aware of the benefits of a tool like <code>openpyxl</code>, let’s get down to it and start by installing the package. For this tutorial, you should use Python 3.7 and openpyxl 2.6.2. To install the package, you can do the following:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install openpyxl
</pre></div>
<p>After you install the package, you should be able to create a super simple spreadsheet with the following code:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
<span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="n">sheet</span><span class="p">[</span><span class="s2">"A1"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"hello"</span>
<span class="n">sheet</span><span class="p">[</span><span class="s2">"B1"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"world!"</span>
<span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"hello_world.xlsx"</span><span class="p">)</span>
</pre></div>
<p>The code above should create a file called <code>hello_world.xlsx</code> in the folder you are using to run the code. If you open that file with Excel you should see something like this:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_16.54.45.e646867e4dbb.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_16.54.45.e646867e4dbb.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_16.54.45.e646867e4dbb.png&w=540&sig=4c3acdcf35f528b6ed0cf6e299c2575781934414 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_16.54.45.e646867e4dbb.png&w=1080&sig=328d4ff12cec767d684f5b7666380d9f23a2a548 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_16.54.45.e646867e4dbb.png 2160w" sizes="75vw" alt="A Simple Hello World Spreadsheet"/></a></p>
<p><em>Woohoo</em>, your first spreadsheet created!</p>
<h2 id="reading-excel-spreadsheets-with-openpyxl">Reading Excel Spreadsheets With openpyxl</h2>
<p>Let’s start with the most essential thing one can do with a spreadsheet: read it.</p>
<p>You’ll go from a straightforward approach to reading a spreadsheet to more complex examples where you read the data and convert it into more useful Python structures.</p>
<h3 id="dataset-for-this-tutorial">Dataset for This Tutorial</h3>
<p>Before you dive deep into some code examples, you should <strong>download this sample dataset</strong> and store it somewhere as <code>sample.xlsx</code>:</p>
<div class="alert alert-warning" role="alert"><p><strong>Download Dataset:</strong> <a href="https://realpython.com/optins/view/openpyxl-sample-dataset/" class="alert-link" data-toggle="modal" data-target="#modal-openpyxl-sample-dataset" data-focus="false">Click here to download the dataset for the openpyxl exercise you'll be following in this tutorial.</a></p></div>
<p>This is one of the datasets you’ll be using throughout this tutorial, and it’s a spreadsheet with a sample of real data from Amazon’s online product reviews. This dataset is only a tiny fraction of what Amazon <a href="https://registry.opendata.aws/amazon-reviews/">provides</a>, but for testing purposes, it’s more than enough.</p>
<h3 id="a-simple-approach-to-reading-an-excel-spreadsheet">A Simple Approach to Reading an Excel Spreadsheet</h3>
<p>Finally, let’s start reading some spreadsheets! To begin with, open our sample spreadsheet:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
<span class="gp">>>> </span><span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample.xlsx"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['Sheet 1']</span>
<span class="gp">>>> </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="gp">>>> </span><span class="n">sheet</span>
<span class="go"><Worksheet "Sheet 1"></span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">title</span>
<span class="go">'Sheet 1'</span>
</pre></div>
<p>In the code above, you first open the spreadsheet <code>sample.xlsx</code> using <code>load_workbook()</code>, and then you can use <code>workbook.sheetnames</code> to see all the sheets you have available to work with. After that, <code>workbook.active</code> returns the active sheet, which in this case is the first one, <strong>Sheet 1</strong>. This is the most common way of opening a spreadsheet, and you’ll see it many times during this tutorial.</p>
<p>Now, after opening a spreadsheet, you can easily retrieve data from it like this:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A1"</span><span class="p">]</span>
<span class="go"><Cell 'Sheet 1'.A1></span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A1"</span><span class="p">]</span><span class="o">.</span><span class="n">value</span>
<span class="go">'marketplace'</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"F10"</span><span class="p">]</span><span class="o">.</span><span class="n">value</span>
<span class="go">"G-Shock Men's Grey Sport Watch"</span>
</pre></div>
<p>To return the actual value of a cell, you need to access <code>.value</code>. Otherwise, you’ll get the <code>Cell</code> object itself. You can also use the method <code>.cell()</code> to retrieve a cell by its row and column numbers. Remember to add <code>.value</code> to get the actual value and not a <code>Cell</code> object:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">cell</span><span class="p">(</span><span class="n">row</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">6</span><span class="p">)</span>
<span class="go"><Cell 'Sheet 1'.F10></span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">cell</span><span class="p">(</span><span class="n">row</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">6</span><span class="p">)</span><span class="o">.</span><span class="n">value</span>
<span class="go">"G-Shock Men's Grey Sport Watch"</span>
</pre></div>
<p>You can see that the results returned are the same, no matter which way you choose. However, in this tutorial, you’ll be mostly using the first approach: <code>["A1"]</code>.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Even though in Python you’re used to a zero-indexed notation, with spreadsheets you’ll always use a one-indexed notation where the first row or column always has index <code>1</code>.</p>
</div>
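<p>To make the one-indexed notation concrete, here’s a small pure-Python sketch that converts between column letters and column numbers. The two helper functions are hypothetical and written only for illustration. In real code, you’d more likely reach for <code>get_column_letter()</code> and <code>column_index_from_string()</code> from <code>openpyxl.utils</code>:</p>

```python
def column_letter_to_index(letters):
    """Convert a column label such as "A" or "AB" to its one-based index."""
    index = 0
    for char in letters.upper():
        # Treat the label as a base-26 number whose digits run from 1 to 26.
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index


def index_to_column_letter(index):
    """Convert a one-based column index back to its letter label."""
    letters = ""
    while index > 0:
        # Subtract 1 first because there is no zero digit in this notation.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


print(column_letter_to_index("A"))  # 1
print(column_letter_to_index("F"))  # 6
print(index_to_column_letter(27))   # AA
```

<p>This is why <code>sheet["F10"]</code> and <code>sheet.cell(row=10, column=6)</code> point at the same cell: column <em>F</em> is column number 6, counting from 1.</p>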
<p>So far, you’ve seen the quickest way to open a spreadsheet. However, you can pass additional parameters to change the way a spreadsheet is loaded.</p>
<h4 id="additional-reading-options">Additional Reading Options</h4>
<p>There are a few arguments you can pass to <code>load_workbook()</code> that change the way a spreadsheet is loaded. The most important ones are the following two Booleans:</p>
<ol>
<li><strong>read_only</strong> loads a spreadsheet in read-only mode, allowing you to open very large Excel files.</li>
<li><strong>data_only</strong> skips the formulas and instead loads only the values that were stored the last time the file was calculated and saved.</li>
</ol>
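<p>Here’s a rough sketch of how the two options behave. It builds a tiny workbook in memory instead of using <code>sample.xlsx</code>, so the cell contents and variable names are assumptions made for the example. Note that <code>openpyxl</code> doesn’t evaluate formulas itself, so a file that no spreadsheet application has recalculated is expected to have no stored result for <code>data_only</code> to return:</p>

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a tiny workbook in memory so the sketch doesn't depend on sample.xlsx.
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = 2
sheet["A2"] = 3
sheet["A3"] = "=SUM(A1:A2)"
buffer = BytesIO()
workbook.save(buffer)

# Default load: the formula comes back as the text that was written.
buffer.seek(0)
formula = load_workbook(buffer).active["A3"].value

# data_only=True: ask for the stored result instead of the formula.
buffer.seek(0)
cached = load_workbook(buffer, data_only=True).active["A3"].value

# read_only=True: rows are read lazily, so close the workbook when done.
buffer.seek(0)
streamed = load_workbook(buffer, read_only=True)
first_column = [
    row[0] for row in streamed.active.iter_rows(min_row=1, values_only=True)
]
streamed.close()

print(formula)  # =SUM(A1:A2)
print(cached)
print(first_column)
```

<p>If you need the calculated results of formulas, then the file has to be opened and saved by an application such as Excel first. Read-only mode is mainly worth the extra <code>.close()</code> call when the file is too large to load comfortably in one go.</p>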
<h3 id="importing-data-from-a-spreadsheet">Importing Data From a Spreadsheet</h3>
<p>Now that you’ve learned the basics about loading a spreadsheet, it’s about time you get to the fun part: <strong>the iteration and actual usage of the values within the spreadsheet</strong>.</p>
<p>This section is where you’ll learn all the different ways you can iterate through the data, but also how to convert that data into something usable and, more importantly, how to do it in a Pythonic way.</p>
<h4 id="iterating-through-the-data">Iterating Through the Data</h4>
<p>There are a few different ways you can iterate through the data depending on your needs.</p>
<p>You can slice the data with a combination of columns and rows:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A1:C2"</span><span class="p">]</span>
<span class="go">((<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>),</span>
<span class="go"> (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>))</span>
</pre></div>
<p>You can get ranges of rows or columns:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Get all cells from column A</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A"</span><span class="p">]</span>
<span class="go">(<Cell 'Sheet 1'.A1>,</span>
<span class="go"> <Cell 'Sheet 1'.A2>,</span>
<span class="go"> ...</span>
<span class="go"> <Cell 'Sheet 1'.A99>,</span>
<span class="go"> <Cell 'Sheet 1'.A100>)</span>
<span class="gp">>>> </span><span class="c1"># Get all cells for a range of columns</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A:B"</span><span class="p">]</span>
<span class="go">((<Cell 'Sheet 1'.A1>,</span>
<span class="go"> <Cell 'Sheet 1'.A2>,</span>
<span class="go"> ...</span>
<span class="go"> <Cell 'Sheet 1'.A99>,</span>
<span class="go"> <Cell 'Sheet 1'.A100>),</span>
<span class="go"> (<Cell 'Sheet 1'.B1>,</span>
<span class="go"> <Cell 'Sheet 1'.B2>,</span>
<span class="go"> ...</span>
<span class="go"> <Cell 'Sheet 1'.B99>,</span>
<span class="go"> <Cell 'Sheet 1'.B100>))</span>
<span class="gp">>>> </span><span class="c1"># Get all cells from row 5</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span>
<span class="go">(<Cell 'Sheet 1'.A5>,</span>
<span class="go"> <Cell 'Sheet 1'.B5>,</span>
<span class="go"> ...</span>
<span class="go"> <Cell 'Sheet 1'.N5>,</span>
<span class="go"> <Cell 'Sheet 1'.O5>)</span>
<span class="gp">>>> </span><span class="c1"># Get all cells for a range of rows</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="mi">5</span><span class="p">:</span><span class="mi">6</span><span class="p">]</span>
<span class="go">((<Cell 'Sheet 1'.A5>,</span>
<span class="go"> <Cell 'Sheet 1'.B5>,</span>
<span class="go"> ...</span>
<span class="go"> <Cell 'Sheet 1'.N5>,</span>
<span class="go"> <Cell 'Sheet 1'.O5>),</span>
<span class="go"> (<Cell 'Sheet 1'.A6>,</span>
<span class="go"> <Cell 'Sheet 1'.B6>,</span>
<span class="go"> ...</span>
<span class="go"> <Cell 'Sheet 1'.N6>,</span>
<span class="go"> <Cell 'Sheet 1'.O6>))</span>
</pre></div>
<p>You’ll notice that all of the above examples return a <code>tuple</code>. If you want to refresh your memory on how to handle <code>tuples</code> in Python, check out the article on <a href="https://realpython.com/python-lists-tuples/#python-tuples">Lists and Tuples in Python</a>.</p>
<p>There are also multiple ways of using normal Python <a href="https://realpython.com/introduction-to-python-generators/">generators</a> to go through the data. The main methods you can use to achieve this are:</p>
<ul>
<li><code>.iter_rows()</code></li>
<li><code>.iter_cols()</code></li>
</ul>
<p>Both methods can receive the following arguments:</p>
<ul>
<li><code>min_row</code></li>
<li><code>max_row</code></li>
<li><code>min_col</code></li>
<li><code>max_col</code></li>
</ul>
<p>These arguments are used to set boundaries for the iteration:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">max_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">min_col</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">max_col</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
<span class="go">(<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>)</span>
<span class="go">(<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">column</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_cols</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">max_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">min_col</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">max_col</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">column</span><span class="p">)</span>
<span class="go">(<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>)</span>
<span class="go">(<Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.B2>)</span>
<span class="go">(<Cell 'Sheet 1'.C1>, <Cell 'Sheet 1'.C2>)</span>
</pre></div>
<p>You’ll notice that in the first example, when iterating through the rows using <code>.iter_rows()</code>, you get one <code>tuple</code> per selected row, whereas when iterating through the columns using <code>.iter_cols()</code>, you get one <code>tuple</code> per column instead.</p>
<p>One additional argument you can pass to both methods is the Boolean <code>values_only</code>. When it’s set to <code>True</code>, the values of the cell are returned, instead of the <code>Cell</code> object:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">max_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">min_col</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">max_col</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
<span class="go">('marketplace', 'customer_id', 'review_id')</span>
<span class="go">('US', 3653882, 'R3O9SGZBVQBV76')</span>
</pre></div>
<p>If you want to iterate through the whole dataset, then you can also use the attributes <code>.rows</code> or <code>.columns</code> directly, which are shortcuts to using <code>.iter_rows()</code> and <code>.iter_cols()</code> without any arguments:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">rows</span><span class="p">:</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
<span class="go">(<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1></span>
<span class="gp">...</span>
<span class="go"><Cell 'Sheet 1'.M100>, <Cell 'Sheet 1'.N100>, <Cell 'Sheet 1'.O100>)</span>
</pre></div>
<p>These shortcuts are very useful when you’re iterating through the whole dataset.</p>
<h4 id="manipulate-data-using-pythons-default-data-structures">Manipulate Data Using Python’s Default Data Structures</h4>
<p>Now that you know the basics of iterating through the data in a workbook, let’s look at smart ways of converting that data into Python structures.</p>
<p>As you saw earlier, the result from all iterations comes in the form of <code>tuples</code>. However, since a <code>tuple</code> behaves much like an immutable <code>list</code>, you can easily access its data and transform it into other structures.</p>
<p>For example, say you want to extract product information from the <code>sample.xlsx</code> spreadsheet and into a dictionary where each key is a product ID.</p>
<p>A straightforward way to do this is to iterate over all the rows, pick the columns you know are related to product information, and then store that in a dictionary. Let’s code this out!</p>
<p>First of all, have a look at the headers and see what information you care most about:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">max_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
<span class="go">('marketplace', 'customer_id', 'review_id', 'product_id', ...)</span>
</pre></div>
<p>This code prints a tuple with all the column names you have in the spreadsheet. To start, grab the columns with these names:</p>
<ul>
<li><code>product_id</code></li>
<li><code>product_parent</code></li>
<li><code>product_title</code></li>
<li><code>product_category</code></li>
</ul>
<p>Lucky for you, the columns you need are all next to each other, so you can use the <code>min_col</code> and <code>max_col</code> arguments to easily get the data you want:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">min_col</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">max_col</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
<span class="go">('B00FALQ1ZC', 937001370, 'Invicta Women\'s 15150 "Angel" 18k Yellow...)</span>
<span class="go">('B00D3RGO20', 484010722, "Kenneth Cole New York Women's KC4944...)</span>
<span class="gp">...</span>
</pre></div>
<p>Nice! Now that you know how to get all the important product information you need, let’s put that data into a dictionary:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
<span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample.xlsx"</span><span class="p">)</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="n">products</span> <span class="o">=</span> <span class="p">{}</span>
<span class="c1"># Using the values_only because you want to return the cells' values</span>
<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="n">min_col</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
<span class="n">max_col</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span>
<span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="n">product_id</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">product</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"parent"</span><span class="p">:</span> <span class="n">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span>
<span class="s2">"title"</span><span class="p">:</span> <span class="n">row</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span>
<span class="s2">"category"</span><span class="p">:</span> <span class="n">row</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>
<span class="p">}</span>
<span class="n">products</span><span class="p">[</span><span class="n">product_id</span><span class="p">]</span> <span class="o">=</span> <span class="n">product</span>
<span class="c1"># Using json here to be able to format the output for displaying later</span>
<span class="nb">print</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">products</span><span class="p">))</span>
</pre></div>
<p>The code above prints JSON similar to this:</p>
<div class="highlight json"><pre><span></span><span class="p">{</span>
<span class="nt">"B00FALQ1ZC"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"parent"</span><span class="p">:</span> <span class="mi">937001370</span><span class="p">,</span>
<span class="nt">"title"</span><span class="p">:</span> <span class="s2">"Invicta Women's 15150 ..."</span><span class="p">,</span>
<span class="nt">"category"</span><span class="p">:</span> <span class="s2">"Watches"</span>
<span class="p">},</span>
<span class="nt">"B00D3RGO20"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"parent"</span><span class="p">:</span> <span class="mi">484010722</span><span class="p">,</span>
<span class="nt">"title"</span><span class="p">:</span> <span class="s2">"Kenneth Cole New York ..."</span><span class="p">,</span>
<span class="nt">"category"</span><span class="p">:</span> <span class="s2">"Watches"</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Here you can see that the output is trimmed to 2 products only, but if you run the script as it is, then you should get 98 products.</p>
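<p>One thing to keep in mind with this approach is that a dictionary holds only one entry per key, so rows that share a product ID collapse into a single product. The sketch below uses a few hypothetical rows, shaped like the tuples printed earlier, to show that behavior. It also passes <code>indent=4</code> to <code>json.dumps()</code>, which is one way to get multi-line output like the JSON shown above:</p>

```python
import json

# Hypothetical rows shaped like (product_id, parent, title, category)
rows = [
    ("B00FALQ1ZC", 937001370, "Invicta Women's 15150 ...", "Watches"),
    ("B00D3RGO20", 484010722, "Kenneth Cole New York ...", "Watches"),
    ("B00FALQ1ZC", 937001370, "Invicta Women's 15150 ...", "Watches"),  # repeated ID
]

# Build the same nested structure with a dictionary comprehension
products = {
    product_id: {"parent": parent, "title": title, "category": category}
    for product_id, parent, title, category in rows
}

# Three rows went in, but a repeated key overwrites the earlier entry
print(len(products))

# indent=4 spreads the JSON over several lines for easier reading
print(json.dumps(products, indent=4))
```

<p>If you need to keep every row, then a list of dictionaries may be a better fit than a dictionary keyed by product ID.</p>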
<h4 id="convert-data-into-python-classes">Convert Data Into Python Classes</h4>
<p>To finalize the reading section of this tutorial, let’s dive into Python classes and see how you could improve on the example above and better structure the data.</p>
<p>For this, you’ll be using the new Python <a href="https://realpython.com/python-data-classes/">Data Classes</a> that are available from Python 3.7. If you’re using an older version of Python, then you can use the default <a href="https://realpython.com/python3-object-oriented-programming/#classes-in-python">Classes</a> instead.</p>
<p>So, first things first, let’s look at the data you have and decide what you want to store and how you want to store it.</p>
<p>As you saw right at the start, this data comes from Amazon, and it’s a list of product reviews. You can check the <a href="https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt">list of all the columns and their meaning</a> on Amazon.</p>
<p>There are two significant elements you can extract from the data available:</p>
<ol>
<li>Products</li>
<li>Reviews</li>
</ol>
<p>A <strong>Product</strong> has:</p>
<ul>
<li>ID</li>
<li>Title</li>
<li>Parent</li>
<li>Category</li>
</ul>
<p>The <strong>Review</strong> has a few more fields:</p>
<ul>
<li>ID</li>
<li>Customer ID</li>
<li>Stars</li>
<li>Headline</li>
<li>Body</li>
<li>Date</li>
</ul>
<p>You can ignore a few of the review fields to make things a bit simpler.</p>
<p>So, a straightforward implementation of these two classes could be written in a separate file <code>classes.py</code>:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">datetime</span>
<span class="kn">from</span> <span class="nn">dataclasses</span> <span class="k">import</span> <span class="n">dataclass</span>
<span class="nd">@dataclass</span>
<span class="k">class</span> <span class="nc">Product</span><span class="p">:</span>
<span class="nb">id</span><span class="p">:</span> <span class="nb">str</span>
<span class="n">parent</span><span class="p">:</span> <span class="nb">str</span>
<span class="n">title</span><span class="p">:</span> <span class="nb">str</span>
<span class="n">category</span><span class="p">:</span> <span class="nb">str</span>
<span class="nd">@dataclass</span>
<span class="k">class</span> <span class="nc">Review</span><span class="p">:</span>
<span class="nb">id</span><span class="p">:</span> <span class="nb">str</span>
<span class="n">customer_id</span><span class="p">:</span> <span class="nb">str</span>
<span class="n">stars</span><span class="p">:</span> <span class="nb">int</span>
<span class="n">headline</span><span class="p">:</span> <span class="nb">str</span>
<span class="n">body</span><span class="p">:</span> <span class="nb">str</span>
<span class="n">date</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span>
</pre></div>
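<p>To get a feel for what the <code>@dataclass</code> decorator gives you, here’s a short standalone sketch that repeats the <code>Product</code> class and creates a couple of instances by hand. The field values are placeholders taken from the earlier output:</p>

```python
from dataclasses import asdict, dataclass


@dataclass
class Product:
    id: str
    parent: str
    title: str
    category: str


first = Product(id="B00FALQ1ZC", parent="937001370",
                title="Invicta Women's 15150 ...", category="Watches")
same = Product(id="B00FALQ1ZC", parent="937001370",
               title="Invicta Women's 15150 ...", category="Watches")

# The decorator generates __repr__() and __eq__() from the declared fields
print(first)
print(first == same)

# asdict() converts an instance back into a plain dictionary
print(asdict(first)["category"])

# Type annotations are not enforced at runtime, so an int is accepted here
unchecked = Product(id="B00D3RGO20", parent=484010722,
                    title="Kenneth Cole New York ...", category="Watches")
print(type(unchecked.parent).__name__)
```

<p>Because annotations aren’t checked at runtime, a field can end up holding whatever type the spreadsheet cell contained. Convert values explicitly if you rely on a specific type.</p>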
<p>After defining your data classes, you need to convert the data from the spreadsheet into these new structures.</p>
<p>Before doing the conversion, it’s worth looking at our header again and creating a mapping between columns and the fields you need:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">max_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
<span class="go">('marketplace', 'customer_id', 'review_id', 'product_id', ...)</span>
<span class="gp">>>> </span><span class="c1"># Or an alternative</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">sheet</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">cell</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
<span class="go">marketplace</span>
<span class="go">customer_id</span>
<span class="go">review_id</span>
<span class="go">product_id</span>
<span class="go">product_parent</span>
<span class="gp">...</span>
</pre></div>
<p>Let’s create a file <code>mapping.py</code> where you have a list of all the field names and their column location (zero-indexed) on the spreadsheet:</p>
<div class="highlight python"><pre><span></span><span class="c1"># Product fields</span>
<span class="n">PRODUCT_ID</span> <span class="o">=</span> <span class="mi">3</span>
<span class="n">PRODUCT_PARENT</span> <span class="o">=</span> <span class="mi">4</span>
<span class="n">PRODUCT_TITLE</span> <span class="o">=</span> <span class="mi">5</span>
<span class="n">PRODUCT_CATEGORY</span> <span class="o">=</span> <span class="mi">6</span>
<span class="c1"># Review fields</span>
<span class="n">REVIEW_ID</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">REVIEW_CUSTOMER</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">REVIEW_STARS</span> <span class="o">=</span> <span class="mi">7</span>
<span class="n">REVIEW_HEADLINE</span> <span class="o">=</span> <span class="mi">12</span>
<span class="n">REVIEW_BODY</span> <span class="o">=</span> <span class="mi">13</span>
<span class="n">REVIEW_DATE</span> <span class="o">=</span> <span class="mi">14</span>
</pre></div>
<p>You don’t necessarily have to do the mapping above. It’s more for readability when parsing the row data, so you don’t end up with a lot of magic numbers lying around.</p>
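<p>As an alternative to hard-coding the positions, you could derive them from the header row itself. This is only a sketch: the <code>header</code> tuple below is a hypothetical stand-in for the first row of the spreadsheet, truncated to the five column names printed above, and <code>column_index</code> is a name made up for this example:</p>

```python
# Hypothetical header tuple, as you would get from row 1 with values_only=True
header = ("marketplace", "customer_id", "review_id", "product_id", "product_parent")

# Map each column name to its zero-based position in a row tuple
column_index = {name: index for index, name in enumerate(header)}

print(column_index["product_id"])
print(column_index["customer_id"])
```

<p>A lookup like this keeps working if columns are reordered, as long as the column names stay the same.</p>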
<p>Finally, let’s look at the code needed to parse the spreadsheet data into a list of product and review objects:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">datetime</span> <span class="k">import</span> <span class="n">datetime</span>
<span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
<span class="kn">from</span> <span class="nn">classes</span> <span class="k">import</span> <span class="n">Product</span><span class="p">,</span> <span class="n">Review</span>
<span class="kn">from</span> <span class="nn">mapping</span> <span class="k">import</span> <span class="n">PRODUCT_ID</span><span class="p">,</span> <span class="n">PRODUCT_PARENT</span><span class="p">,</span> <span class="n">PRODUCT_TITLE</span><span class="p">,</span> \
<span class="n">PRODUCT_CATEGORY</span><span class="p">,</span> <span class="n">REVIEW_DATE</span><span class="p">,</span> <span class="n">REVIEW_ID</span><span class="p">,</span> <span class="n">REVIEW_CUSTOMER</span><span class="p">,</span> \
<span class="n">REVIEW_STARS</span><span class="p">,</span> <span class="n">REVIEW_HEADLINE</span><span class="p">,</span> <span class="n">REVIEW_BODY</span>
<span class="c1"># Using read_only mode since you're not going to edit the spreadsheet</span>
<span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample.xlsx"</span><span class="p">,</span> <span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="n">products</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">reviews</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># Using the values_only because you just want to return the cell value</span>
<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="n">product</span> <span class="o">=</span> <span class="n">Product</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">PRODUCT_ID</span><span class="p">],</span>
<span class="n">parent</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">PRODUCT_PARENT</span><span class="p">],</span>
<span class="n">title</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">PRODUCT_TITLE</span><span class="p">],</span>
<span class="n">category</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">PRODUCT_CATEGORY</span><span class="p">])</span>
<span class="n">products</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">product</span><span class="p">)</span>
<span class="c1"># You need to parse the date from the spreadsheet into a datetime format</span>
<span class="n">spread_date</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="n">REVIEW_DATE</span><span class="p">]</span>
<span class="n">parsed_date</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">strptime</span><span class="p">(</span><span class="n">spread_date</span><span class="p">,</span> <span class="s2">"%Y-%m-</span><span class="si">%d</span><span class="s2">"</span><span class="p">)</span>
<span class="n">review</span> <span class="o">=</span> <span class="n">Review</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_ID</span><span class="p">],</span>
<span class="n">customer_id</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_CUSTOMER</span><span class="p">],</span>
<span class="n">stars</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_STARS</span><span class="p">],</span>
<span class="n">headline</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_HEADLINE</span><span class="p">],</span>
<span class="n">body</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_BODY</span><span class="p">],</span>
<span class="n">date</span><span class="o">=</span><span class="n">parsed_date</span><span class="p">)</span>
<span class="n">reviews</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">review</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">products</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="nb">print</span><span class="p">(</span><span class="n">reviews</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
</pre></div>
<p>After you run the code above, you should get some output like this:</p>
<div class="highlight python"><pre><span></span><span class="n">Product</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s1">'B00FALQ1ZC'</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="mi">937001370</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
<span class="n">Review</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s1">'R3O9SGZBVQBV76'</span><span class="p">,</span> <span class="n">customer_id</span><span class="o">=</span><span class="mi">3653882</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
</pre></div>
<p>That’s it! Now you should have the data in a very simple and digestible class format, and you can start thinking of storing this in a <a href="https://realpython.com/tutorials/databases/">Database</a> or any other type of data storage you like.</p>
<p>Using this kind of OOP strategy to parse spreadsheets makes handling the data much simpler later on.</p>
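<p>The date-parsing step in the script above assumes the review date is stored as text in <code>YYYY-MM-DD</code> format. If a spreadsheet stores real date cells instead, the value may already arrive as a <code>datetime</code> object. The following sketch shows one defensive way to handle both cases. The helper name <code>to_datetime()</code> and the sample date are made up for this example:</p>

```python
from datetime import datetime


def to_datetime(value):
    """Return a datetime for a YYYY-MM-DD string or an existing datetime."""
    if isinstance(value, datetime):
        # Already a datetime, so there is nothing to parse
        return value
    # %Y is the four-digit year, %m the month, and %d the day of the month
    return datetime.strptime(value, "%Y-%m-%d")


print(to_datetime("2015-08-31"))
print(to_datetime(datetime(2015, 8, 31)))
```

<p>Keep in mind that <code>strptime()</code> raises a <code>ValueError</code> when the text doesn’t match the format, so you may want to handle that case for messy data.</p>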
<h3 id="appending-new-data">Appending New Data</h3>
<p>Before you start creating very complex spreadsheets, have a quick look at an example of how to append data to an existing spreadsheet.</p>
<p>Go back to the first example spreadsheet you created (<code>hello_world.xlsx</code>) and try opening it and appending some data to it, like this:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
<span class="c1"># Start by opening the spreadsheet and selecting the main sheet</span>
<span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"hello_world.xlsx"</span><span class="p">)</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="c1"># Write what you want into a specific cell</span>
<span class="n">sheet</span><span class="p">[</span><span class="s2">"C1"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"writing ;)"</span>
<span class="c1"># Save the spreadsheet</span>
<span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"hello_world_append.xlsx"</span><span class="p">)</span>
</pre></div>
<p><em>Et voilà</em>, if you open the new <code>hello_world_append.xlsx</code> spreadsheet, you’ll see the following change:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_17.44.22.e4f18e5abc42.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_17.44.22.e4f18e5abc42.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_17.44.22.e4f18e5abc42.png&w=540&sig=098886279b90048004feb6dcdbe1c66ac3e231ce 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_17.44.22.e4f18e5abc42.png&w=1080&sig=8619e04c109779499f96dcd8aee01c4cf1ed52eb 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_17.44.22.e4f18e5abc42.png 2160w" sizes="75vw" alt="Appending Data to a Spreadsheet"/></a></p>
<p>Notice the additional <em>writing ;)</em> on cell <code>C1</code>.</p>
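<p>Writing to a specific cell works well when you know the target coordinate. If you’d rather add a whole row below the existing data, worksheets also have an <code>.append()</code> method. Here’s a minimal sketch that uses an in-memory workbook mirroring <code>hello_world.xlsx</code>, so it doesn’t depend on the file being present. The appended values are placeholders:</p>

```python
from openpyxl import Workbook

# In-memory stand-in for hello_world.xlsx
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "hello"
sheet["B1"] = "world!"

# .append() writes the values as a new row below the last row in use
sheet.append(["another", "row"])

rows = list(sheet.iter_rows(values_only=True))
print(rows)
```

<p>As with direct cell assignment, nothing is written to disk until you call <code>workbook.save()</code>.</p>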
<h2 id="writing-excel-spreadsheets-with-openpyxl">Writing Excel Spreadsheets With openpyxl</h2>
<p>There are a lot of different things you can write to a spreadsheet, from simple text or number values to complex formulas, charts, or even images.</p>
<p>Let’s start creating some spreadsheets!</p>
<h3 id="creating-a-simple-spreadsheet">Creating a Simple Spreadsheet</h3>
<p>Previously, you saw a very quick example of how to write “Hello world!” into a spreadsheet, so you can start with that:</p>
<div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
<span class="lineno"> 2 </span>
<span class="lineno"> 3 </span><span class="n">filename</span> <span class="o">=</span> <span class="s2">"hello_world.xlsx"</span>
<span class="lineno"> 4 </span>
<span class="lineno"> 5 </span><span class="hll"><span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
</span><span class="lineno"> 6 </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="lineno"> 7 </span>
<span class="lineno"> 8 </span><span class="hll"><span class="n">sheet</span><span class="p">[</span><span class="s2">"A1"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"hello"</span>
</span><span class="lineno"> 9 </span><span class="hll"><span class="n">sheet</span><span class="p">[</span><span class="s2">"B1"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"world!"</span>
</span><span class="lineno">10 </span>
<span class="lineno">11 </span><span class="hll"><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="n">filename</span><span class="p">)</span>
</span></pre></div>
<p>The highlighted lines in the code above are the most important ones for writing. In the code, you can see that:</p>
<ul>
<li><strong>Line 5</strong> shows you how to create a new empty workbook.</li>
<li><strong>Lines 8 and 9</strong> show you how to add data to specific cells.</li>
<li><strong>Line 11</strong> shows you how to save the spreadsheet when you’re done.</li>
</ul>
<p>Even though the lines above are straightforward, it’s still good to know them well for when things get a bit more complicated.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> You’ll be using the <code>hello_world.xlsx</code> spreadsheet for some of the upcoming examples, so keep it handy.</p>
</div>
<p>One thing you can do to help with the upcoming code examples is add the following function to your Python file or console:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">print_rows</span><span class="p">():</span>
<span class="gp">... </span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
</pre></div>
<p>It makes it easier to print all of your spreadsheet values by just calling <code>print_rows()</code>.</p>
<h3 id="basic-spreadsheet-operations">Basic Spreadsheet Operations</h3>
<p>Before you get into the more advanced topics, it’s good for you to know how to manage the simplest elements of a spreadsheet.</p>
<h4 id="adding-and-updating-cell-values">Adding and Updating Cell Values</h4>
<p>You already learned how to add values to a spreadsheet like this:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A1"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"value"</span>
</pre></div>
<p>There’s another way you can do this, by first selecting a cell and then changing its value:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">cell</span> <span class="o">=</span> <span class="n">sheet</span><span class="p">[</span><span class="s2">"A1"</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">cell</span>
<span class="go"><Cell 'Sheet'.A1></span>
<span class="gp">>>> </span><span class="n">cell</span><span class="o">.</span><span class="n">value</span>
<span class="go">'hello'</span>
<span class="gp">>>> </span><span class="n">cell</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="s2">"hey"</span>
<span class="gp">>>> </span><span class="n">cell</span><span class="o">.</span><span class="n">value</span>
<span class="go">'hey'</span>
</pre></div>
<p>The new value is only stored into the spreadsheet once you call <code>workbook.save()</code>.</p>
<p>The <code>openpyxl</code> package creates a cell when you add a value, if that cell didn’t exist before:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Before, our spreadsheet has only 1 row</span>
<span class="gp">>>> </span><span class="n">print_rows</span><span class="p">()</span>
<span class="go">('hello', 'world!')</span>
<span class="gp">>>> </span><span class="c1"># Try adding a value to row 10</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"B10"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"test"</span>
<span class="gp">>>> </span><span class="n">print_rows</span><span class="p">()</span>
<span class="go">('hello', 'world!')</span>
<span class="go">(None, None)</span>
<span class="go">(None, None)</span>
<span class="go">(None, None)</span>
<span class="go">(None, None)</span>
<span class="go">(None, None)</span>
<span class="go">(None, None)</span>
<span class="go">(None, None)</span>
<span class="go">(None, None)</span>
<span class="go">(None, 'test')</span>
</pre></div>
<p>As you can see, when you add a value to cell <code>B10</code>, you end up with 10 rows, most of them filled with <code>None</code>, just so you can have that <em>test</em> value.</p>
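<p>If you want to check how far the used range of a sheet extends, or skip the empty filler rows, then the sketch below shows one way to do it. It rebuilds the same cells in an in-memory workbook, and the <code>filled</code> list is just an illustrative name:</p>

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "hello"
sheet["B1"] = "world!"

# Writing to B10 stretches the used range down to row 10
sheet["B10"] = "test"

print(sheet.max_row)     # highest row number in use
print(sheet.max_column)  # highest column number in use

# Keep only the rows that contain at least one actual value
filled = [
    row
    for row in sheet.iter_rows(values_only=True)
    if any(value is not None for value in row)
]
print(filled)
```

<p>Filtering like this can be handy when a stray value far down the sheet would otherwise give you many empty rows.</p>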
<h4 id="managing-rows-and-columns">Managing Rows and Columns</h4>
<p>One of the most common things you have to do when manipulating spreadsheets is adding or removing rows and columns. The <code>openpyxl</code> package allows you to do that in a very straightforward way by using the methods:</p>
<ul>
<li><code>.insert_rows()</code></li>
<li><code>.delete_rows()</code></li>
<li><code>.insert_cols()</code></li>
<li><code>.delete_cols()</code></li>
</ul>
<p>Every single one of those methods can receive two arguments:</p>
<ol>
<li><code>idx</code></li>
<li><code>amount</code></li>
</ol>
<p>Using our basic <code>hello_world.xlsx</code> example again, let’s see how these methods work:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">print_rows</span><span class="p">()</span>
<span class="go">('hello', 'world!')</span>
<span class="gp">>>> </span><span class="c1"># Insert a column before the existing column 1 ("A")</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">insert_cols</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">print_rows</span><span class="p">()</span>
<span class="go">(None, 'hello', 'world!')</span>
<span class="gp">>>> </span><span class="c1"># Insert 5 columns between column 2 ("B") and 3 ("C")</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">insert_cols</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">print_rows</span><span class="p">()</span>
<span class="go">(None, 'hello', None, None, None, None, None, 'world!')</span>
<span class="gp">>>> </span><span class="c1"># Delete the created columns</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">delete_cols</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">delete_cols</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">print_rows</span><span class="p">()</span>
<span class="go">('hello', 'world!')</span>
<span class="gp">>>> </span><span class="c1"># Insert a new row in the beginning</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">insert_rows</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">print_rows</span><span class="p">()</span>
<span class="go">(None, None)</span>
<span class="go">('hello', 'world!')</span>
<span class="gp">>>> </span><span class="c1"># Insert 3 new rows in the beginning</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">insert_rows</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">print_rows</span><span class="p">()</span>
<span class="go">(None, None)</span>
<span class="go">(None, None)</span>
<span class="go">(None, None)</span>
<span class="go">(None, None)</span>
<span class="go">('hello', 'world!')</span>
<span class="gp">>>> </span><span class="c1"># Delete the first 4 rows</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">delete_rows</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">print_rows</span><span class="p">()</span>
<span class="go">('hello', 'world!')</span>
</pre></div>
<p>The only thing you need to remember is that when inserting new data (rows or columns), the insertion happens <strong>before</strong> the <code>idx</code> parameter.</p>
<p>So, if you do <code>insert_rows(1)</code>, it inserts a new row <strong>before</strong> the existing first row.</p>
<p>It’s the same for columns: when you call <code>insert_cols(2)</code>, it inserts a new column right <strong>before</strong> the already existing second column (<code>B</code>).</p>
<p>However, when deleting rows or columns, <code>.delete_...</code> deletes data <strong>starting from</strong> the index passed as an argument.</p>
<p>For example, when doing <code>delete_rows(2)</code> it deletes row <code>2</code>, and when doing <code>delete_cols(3)</code> it deletes the third column (<code>C</code>).</p>
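<p>The insert-before and delete-from rules can be checked on a small in-memory workbook. This sketch uses a hypothetical helper, <code>rows()</code>, as a stand-in for the tutorial's <code>print_rows()</code>:</p>

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet.append(["hello", "world!"])


def rows(worksheet):
    """Hypothetical helper: return all rows as a list of value tuples."""
    return list(worksheet.iter_rows(values_only=True))


# Inserting at idx=1 pushes the existing first row down to row 2
sheet.insert_rows(idx=1)
after_insert_row = rows(sheet)

# Deleting at idx=1 removes row 1 itself, so the data moves back up
sheet.delete_rows(idx=1)
after_delete_row = rows(sheet)

# Inserting at idx=2 places the new column before the existing column B
sheet.insert_cols(idx=2)
after_insert_col = rows(sheet)

# Deleting at idx=2 removes that new, empty column again
sheet.delete_cols(idx=2)
after_delete_col = rows(sheet)
```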
<h4 id="managing-sheets">Managing Sheets</h4>
<p>Sheet management is also one of those things you might need to know, even though it might be something that you don’t use that often.</p>
<p>If you look back at the code examples from this tutorial, you’ll notice the following recurring piece of code:</p>
<div class="highlight python"><pre><span></span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
</pre></div>
<p>This is the way to select the default sheet from a spreadsheet. However, if you’re opening a spreadsheet with multiple sheets, then you can always select a specific one like this:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Let's say you have two sheets: "Products" and "Company Sales"</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['Products', 'Company Sales']</span>
<span class="gp">>>> </span><span class="c1"># You can select a sheet using its title</span>
<span class="gp">>>> </span><span class="n">products_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="p">[</span><span class="s2">"Products"</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">sales_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="p">[</span><span class="s2">"Company Sales"</span><span class="p">]</span>
</pre></div>
<p>You can also change a sheet title very easily:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['Products', 'Company Sales']</span>
<span class="gp">>>> </span><span class="n">products_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="p">[</span><span class="s2">"Products"</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">products_sheet</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">"New Products"</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['New Products', 'Company Sales']</span>
</pre></div>
<p>If you want to create or delete sheets, then you can also do that with <code>.create_sheet()</code> and <code>.remove()</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['Products', 'Company Sales']</span>
<span class="gp">>>> </span><span class="n">operations_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">create_sheet</span><span class="p">(</span><span class="s2">"Operations"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['Products', 'Company Sales', 'Operations']</span>
<span class="gp">>>> </span><span class="c1"># You can also define the position to create the sheet at</span>
<span class="gp">>>> </span><span class="n">hr_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">create_sheet</span><span class="p">(</span><span class="s2">"HR"</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['HR', 'Products', 'Company Sales', 'Operations']</span>
<span class="gp">>>> </span><span class="c1"># To remove a sheet, pass it as an argument to .remove()</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">operations_sheet</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['HR', 'Products', 'Company Sales']</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">hr_sheet</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['Products', 'Company Sales']</span>
</pre></div>
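<p>Here is the same create-and-remove flow as a self-contained sketch. It assumes a brand-new workbook whose sheet names are set up to mirror the example above, and it also shows that looking up a removed sheet by title raises a <code>KeyError</code>:</p>

```python
from openpyxl import Workbook

workbook = Workbook()
workbook.active.title = "Products"
workbook.create_sheet("Company Sales")

# Without an index, the new sheet is appended at the end
operations_sheet = workbook.create_sheet("Operations")
# With index 0, the new sheet becomes the first one
hr_sheet = workbook.create_sheet("HR", 0)
names_after_create = list(workbook.sheetnames)

workbook.remove(operations_sheet)
workbook.remove(hr_sheet)
names_after_remove = list(workbook.sheetnames)

# A removed sheet can no longer be selected by its title
try:
    workbook["Operations"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```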
<p>One other thing you can do is make duplicates of a sheet using <code>copy_worksheet()</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['Products', 'Company Sales']</span>
<span class="gp">>>> </span><span class="n">products_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="p">[</span><span class="s2">"Products"</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">copy_worksheet</span><span class="p">(</span><span class="n">products_sheet</span><span class="p">)</span>
<span class="go"><Worksheet "Products Copy"></span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
<span class="go">['Products', 'Company Sales', 'Products Copy']</span>
</pre></div>
<p>If you open your spreadsheet after saving the above code, you’ll notice that the sheet <em>Products Copy</em> is a duplicate of the sheet <em>Products</em>.</p>
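<p>The copy is an independent sheet, so you can rename it and edit it without touching the original. A minimal sketch, assuming a small in-memory workbook with made-up product data:</p>

```python
from openpyxl import Workbook

workbook = Workbook()
products_sheet = workbook.active
products_sheet.title = "Products"
products_sheet.append(["id", "name"])
products_sheet.append([1, "Watch"])  # Made-up sample row

copy_sheet = workbook.copy_worksheet(products_sheet)
default_copy_title = copy_sheet.title

# Rename the copy and change one of its cells
copy_sheet.title = "Products Backup"
copy_sheet["B2"] = "Clock"

original_value = products_sheet["B2"].value
copied_header = copy_sheet["A1"].value
```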
<h4 id="freezing-rows-and-columns">Freezing Rows and Columns</h4>
<p>Something that you might want to do when working with big spreadsheets is to freeze a few rows or columns, so they remain visible when you scroll right or down.</p>
<p>Freezing data allows you to keep an eye on important rows or columns, regardless of where you scroll in the spreadsheet.</p>
<p>With <code>openpyxl</code>, you can accomplish this by setting the worksheet’s <code>freeze_panes</code> attribute. For this example, go back to our <code>sample.xlsx</code> spreadsheet and try doing the following:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample.xlsx"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">freeze_panes</span> <span class="o">=</span> <span class="s2">"C2"</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">"sample_frozen.xlsx"</span><span class="p">)</span>
</pre></div>
<p>If you open the <code>sample_frozen.xlsx</code> spreadsheet in your favorite spreadsheet editor, you’ll notice that row <code>1</code> and columns <code>A</code> and <code>B</code> are frozen and are always visible no matter where you navigate within the spreadsheet.</p>
<p>This feature is handy, for example, to keep headers within sight, so you always know what each column represents.</p>
<p>Here’s how it looks in the editor:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.12.20.55694a0781f8.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.12.20.55694a0781f8.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.12.20.55694a0781f8.png&w=540&sig=5826de23e5df2e08d625844698fc3a29b32ee7b2 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.12.20.55694a0781f8.png&w=1080&sig=c3abe2321f00372d975bbbf033f3ebe3687eb09f 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.12.20.55694a0781f8.png 2160w" sizes="75vw" alt="Example Spreadsheet With Frozen Rows and Columns"/></a></p>
<p>Notice how you’re at the end of the spreadsheet, and yet, you can see both row <code>1</code> and columns <code>A</code> and <code>B</code>.</p>
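<p>The cell you assign to <code>freeze_panes</code> marks the top-left corner of the scrollable area: everything above it and to its left stays frozen. The following sketch, run against an empty in-memory workbook, shows a few common variations:</p>

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active

# Freeze only the header row: everything above row 2 stays visible
sheet.freeze_panes = "A2"
header_only = sheet.freeze_panes

# Freeze only column A: everything left of column B stays visible
sheet.freeze_panes = "B1"
first_column_only = sheet.freeze_panes

# Assigning None removes the frozen panes again
sheet.freeze_panes = None
unfrozen = sheet.freeze_panes
```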
<h4 id="adding-filters">Adding Filters</h4>
<p>You can use <code>openpyxl</code> to add filters and sorts to your spreadsheet. However, when you open the spreadsheet, the data won’t be rearranged according to these sorts and filters.</p>
<p>At first, this might seem like a pretty useless feature, but when you’re programmatically creating a spreadsheet that is going to be sent to and used by somebody else, it’s still nice to at least create the filters and allow people to use them afterward.</p>
<p>The code below is an example of how you would add some filters to our existing <code>sample.xlsx</code> spreadsheet:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Check the used spreadsheet space using the attribute "dimensions"</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">dimensions</span>
<span class="go">'A1:O100'</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">auto_filter</span><span class="o">.</span><span class="n">ref</span> <span class="o">=</span> <span class="s2">"A1:O100"</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample_with_filters.xlsx"</span><span class="p">)</span>
</pre></div>
<p>You should now see the filters created when opening the spreadsheet in your editor:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.20.35.5fdbfe805194.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.20.35.5fdbfe805194.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.20.35.5fdbfe805194.png&w=540&sig=c1d7ad4f2dfc03fc8730e3babf9000ac74170c7d 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.20.35.5fdbfe805194.png&w=1080&sig=27e888a967ddc112f1e824be671a15e2c111fe6c 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.20.35.5fdbfe805194.png 2160w" sizes="75vw" alt="Example Spreadsheet With Filters"/></a></p>
<p>You don’t have to use <code>sheet.dimensions</code> if you know precisely which part of the spreadsheet you want to apply filters to.</p>
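<p>Conversely, when you do want the filter to cover all the used cells, you can pass <code>sheet.dimensions</code> straight to the filter instead of typing the range by hand. A short sketch with a made-up three-row table:</p>

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
# Made-up data, only here to give the sheet a used range
sheet.append(["id", "name"])
sheet.append([1, "first"])
sheet.append([2, "second"])

used_range = sheet.dimensions
sheet.auto_filter.ref = used_range
filter_range = sheet.auto_filter.ref
```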
<h3 id="adding-formulas">Adding Formulas</h3>
<p><strong>Formulas</strong> (or <strong>formulae</strong>) are one of the most powerful features of spreadsheets.</p>
<p>They give you the power to apply specific mathematical equations to a range of cells. Using formulas with <code>openpyxl</code> is as simple as editing the value of a cell.</p>
<p>You can see the list of formulas supported by <code>openpyxl</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl.utils</span> <span class="k">import</span> <span class="n">FORMULAE</span>
<span class="gp">>>> </span><span class="n">FORMULAE</span>
<span class="go">frozenset({'ABS',</span>
<span class="go"> 'ACCRINT',</span>
<span class="go"> 'ACCRINTM',</span>
<span class="go"> 'ACOS',</span>
<span class="go"> 'ACOSH',</span>
<span class="go"> 'AMORDEGRC',</span>
<span class="go"> 'AMORLINC',</span>
<span class="go"> 'AND',</span>
<span class="go"> ...</span>
<span class="go"> 'YEARFRAC',</span>
<span class="go"> 'YIELD',</span>
<span class="go"> 'YIELDDISC',</span>
<span class="go"> 'YIELDMAT',</span>
<span class="go"> 'ZTEST'})</span>
</pre></div>
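<p>Because <code>FORMULAE</code> is a <code>frozenset</code> of upper-case names, you can check a single function with the <code>in</code> operator instead of scanning the whole output. The helper below is hypothetical and only wraps that membership test:</p>

```python
from openpyxl.utils import FORMULAE


def is_known_formula(name):
    """Hypothetical helper: check a function name against FORMULAE."""
    return name.upper() in FORMULAE


average_known = is_known_formula("average")
made_up_known = is_known_formula("NOT_A_REAL_FUNCTION")
```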
<p>Let’s add some formulas to our <code>sample.xlsx</code> spreadsheet.</p>
<p>Starting with something easy, let’s check the average star rating for the 99 reviews within the spreadsheet:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Star rating is column "H"</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"P2"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"=AVERAGE(H2:H100)"</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample_formulas.xlsx"</span><span class="p">)</span>
</pre></div>
<p>If you open the spreadsheet now and go to cell <code>P2</code>, you should see that its value is: <em>4.18181818181818</em>. Have a look in the editor:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.33.09.7c2633f706cc.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.33.09.7c2633f706cc.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.33.09.7c2633f706cc.png&w=540&sig=5d7a9eb97acf524d5d2b9b93ae0e9214bbcf95c8 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.33.09.7c2633f706cc.png&w=1080&sig=8af67321cb101fb2f30fd0ca2bdcc62de35c9334 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.33.09.7c2633f706cc.png 2160w" sizes="75vw" alt="Example Spreadsheet With Average Formula"/></a></p>
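<p>Keep in mind that <code>openpyxl</code> only stores the formula text. The number itself is calculated by the spreadsheet editor when the file is opened. A minimal sketch on an empty in-memory workbook shows what the cell holds on the Python side:</p>

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet["P2"] = "=AVERAGE(H2:H100)"

# The cell keeps the formula as text and marks it as a formula ("f")
stored_value = sheet["P2"].value
stored_type = sheet["P2"].data_type
```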
<p>You can use the same methodology to add any formulas to your spreadsheet. For example, let’s count the number of reviews that had helpful votes:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># The helpful votes are counted on column "I"</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"P3"</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'=COUNTIF(I2:I100, ">0")'</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample_formulas.xlsx"</span><span class="p">)</span>
</pre></div>
<p>You should get the number <code>21</code> in cell <code>P3</code> of your spreadsheet, like so:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.35.24.e26e97b0c9c0.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.35.24.e26e97b0c9c0.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.35.24.e26e97b0c9c0.png&w=540&sig=0ec4c4c12a792a1a393e0273855282bfa0594d53 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.35.24.e26e97b0c9c0.png&w=1080&sig=67b399b8cb79ddbe7285da0325fe8f6b9edf3ecc 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.35.24.e26e97b0c9c0.png 2160w" sizes="75vw" alt="Example Spreadsheet With Average and CountIf Formula"/></a></p>
<p>You’ll have to make sure that the strings within a formula are always in double quotes, so you either have to use single quotes around the formula like in the example above or you’ll have to escape the double quotes inside the formula: <code>"=COUNTIF(I2:I100, \">0\")"</code>.</p>
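<p>Both quoting styles produce exactly the same Python string, which you can confirm without touching a spreadsheet. The <code>countif_formula()</code> helper is hypothetical and simply shows one way to build such strings:</p>

```python
# Single quotes outside, double quotes inside
single_quoted = '=COUNTIF(I2:I100, ">0")'
# Double quotes outside, escaped double quotes inside
escaped = "=COUNTIF(I2:I100, \">0\")"
same_formula = single_quoted == escaped


def countif_formula(cell_range, criterion):
    """Hypothetical helper: build a COUNTIF formula with a quoted criterion."""
    return f'=COUNTIF({cell_range}, "{criterion}")'


built = countif_formula("I2:I100", ">0")
```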
<p>There are a ton of other formulas you can add to your spreadsheet using the same procedure you tried above. Give it a go yourself!</p>
<h3 id="adding-styles">Adding Styles</h3>
<p>Even though styling a spreadsheet might not be something you would do every day, it’s still good to know how to do it.</p>
<p>Using <code>openpyxl</code>, you can apply multiple styling options to your spreadsheet, including fonts, borders, colors, and so on. Have a look at the <code>openpyxl</code> <a href="https://openpyxl.readthedocs.io/en/stable/styles.html">documentation</a> to learn more.</p>
<p>You can also choose to either apply a style directly to a cell or create a template and reuse it to apply styles to multiple cells.</p>
<p>Let’s start by having a look at simple cell styling, using our <code>sample.xlsx</code> again as the base spreadsheet:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Import necessary style classes</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl.styles</span> <span class="k">import</span> <span class="n">Font</span><span class="p">,</span> <span class="n">Color</span><span class="p">,</span> <span class="n">Alignment</span><span class="p">,</span> <span class="n">Border</span><span class="p">,</span> <span class="n">Side</span><span class="p">,</span> <span class="n">colors</span>
<span class="gp">>>> </span><span class="c1"># Create a few styles</span>
<span class="gp">>>> </span><span class="n">bold_font</span> <span class="o">=</span> <span class="n">Font</span><span class="p">(</span><span class="n">bold</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">big_red_text</span> <span class="o">=</span> <span class="n">Font</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="n">colors</span><span class="o">.</span><span class="n">RED</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">center_aligned_text</span> <span class="o">=</span> <span class="n">Alignment</span><span class="p">(</span><span class="n">horizontal</span><span class="o">=</span><span class="s2">"center"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">double_border_side</span> <span class="o">=</span> <span class="n">Side</span><span class="p">(</span><span class="n">border_style</span><span class="o">=</span><span class="s2">"double"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">square_border</span> <span class="o">=</span> <span class="n">Border</span><span class="p">(</span><span class="n">top</span><span class="o">=</span><span class="n">double_border_side</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">right</span><span class="o">=</span><span class="n">double_border_side</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">bottom</span><span class="o">=</span><span class="n">double_border_side</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">left</span><span class="o">=</span><span class="n">double_border_side</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Style some cells!</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A2"</span><span class="p">]</span><span class="o">.</span><span class="n">font</span> <span class="o">=</span> <span class="n">bold_font</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A3"</span><span class="p">]</span><span class="o">.</span><span class="n">font</span> <span class="o">=</span> <span class="n">big_red_text</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A4"</span><span class="p">]</span><span class="o">.</span><span class="n">alignment</span> <span class="o">=</span> <span class="n">center_aligned_text</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A5"</span><span class="p">]</span><span class="o">.</span><span class="n">border</span> <span class="o">=</span> <span class="n">square_border</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample_styles.xlsx"</span><span class="p">)</span>
</pre></div>
<p>If you open your spreadsheet now, you should see quite a few different styles on the first 5 cells of column <code>A</code>:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.43.15.e3aeb3fb06e3.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.43.15.e3aeb3fb06e3.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.43.15.e3aeb3fb06e3.png&w=540&sig=ecc21878006697a6135ae515442642a95ab2bfb6 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.43.15.e3aeb3fb06e3.png&w=1080&sig=6f0ef4a148f1ca5a588e0cb2c02b0c9aad4246f2 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.43.15.e3aeb3fb06e3.png 2160w" sizes="75vw" alt="Example Spreadsheet With Simple Cell Styles"/></a></p>
<p>There you go. You got:</p>
<ul>
<li><strong>A2</strong> with the text in bold</li>
<li><strong>A3</strong> with the text in red and bigger font size</li>
<li><strong>A4</strong> with the text centered</li>
<li><strong>A5</strong> with a square border around the text</li>
</ul>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> For the colors, you can also use HEX codes instead by doing <code>Font(color="C70E0F")</code>.</p>
</div>
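<p>Here is a short sketch of the HEX variant, using an in-memory workbook and a made-up cell value. A HEX string can also serve as a fallback if the named color constants are not available in your installed <code>openpyxl</code> version, which is an assumption worth checking against your own environment:</p>

```python
from openpyxl import Workbook
from openpyxl.styles import Font

workbook = Workbook()
sheet = workbook.active
sheet["A3"] = "warning"  # Made-up cell content

# Use a HEX color code instead of a named color constant
sheet["A3"].font = Font(color="C70E0F", size=20)

applied_color = sheet["A3"].font.color.rgb
applied_size = sheet["A3"].font.size
```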
<p>You can also combine styles by simply adding them to the cell at the same time:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Reusing the same styles from the example above</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A6"</span><span class="p">]</span><span class="o">.</span><span class="n">alignment</span> <span class="o">=</span> <span class="n">center_aligned_text</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A6"</span><span class="p">]</span><span class="o">.</span><span class="n">font</span> <span class="o">=</span> <span class="n">big_red_text</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="p">[</span><span class="s2">"A6"</span><span class="p">]</span><span class="o">.</span><span class="n">border</span> <span class="o">=</span> <span class="n">square_border</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample_styles.xlsx"</span><span class="p">)</span>
</pre></div>
<p>Have a look at cell <code>A6</code> here:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.46.04.314517930065.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.46.04.314517930065.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.46.04.314517930065.png&w=540&sig=290bbf523eb24ac8c9741daf86701ca57cad4b96 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.46.04.314517930065.png&w=1080&sig=9decedef154c2138e26b287f2186213142650f6e 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.46.04.314517930065.png 2160w" sizes="75vw" alt="Example Spreadsheet With Coupled Cell Styles"/></a></p>
<p>When you want to apply multiple styles to one or several cells, you can use the <code>NamedStyle</code> class instead, which is like a style template that you can use over and over again. Have a look at the example below:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl.styles</span> <span class="k">import</span> <span class="n">NamedStyle</span>
<span class="gp">>>> </span><span class="c1"># Let's create a style template for the header row</span>
<span class="gp">>>> </span><span class="n">header</span> <span class="o">=</span> <span class="n">NamedStyle</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">"header"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">header</span><span class="o">.</span><span class="n">font</span> <span class="o">=</span> <span class="n">Font</span><span class="p">(</span><span class="n">bold</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">header</span><span class="o">.</span><span class="n">border</span> <span class="o">=</span> <span class="n">Border</span><span class="p">(</span><span class="n">bottom</span><span class="o">=</span><span class="n">Side</span><span class="p">(</span><span class="n">border_style</span><span class="o">=</span><span class="s2">"thin"</span><span class="p">))</span>
<span class="gp">>>> </span><span class="n">header</span><span class="o">.</span><span class="n">alignment</span> <span class="o">=</span> <span class="n">Alignment</span><span class="p">(</span><span class="n">horizontal</span><span class="o">=</span><span class="s2">"center"</span><span class="p">,</span> <span class="n">vertical</span><span class="o">=</span><span class="s2">"center"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Now let's apply this to all first row (header) cells</span>
<span class="gp">>>> </span><span class="n">header_row</span> <span class="o">=</span> <span class="n">sheet</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">header_row</span><span class="p">:</span>
<span class="gp">... </span>    <span class="n">cell</span><span class="o">.</span><span class="n">style</span> <span class="o">=</span> <span class="n">header</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample_styles.xlsx"</span><span class="p">)</span>
</pre></div>
<p>If you open the spreadsheet now, you should see that its first row is bold, the text is centered, and there’s a thin bottom border. Have a look below:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.48.33.4bc57d1b24d5.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.48.33.4bc57d1b24d5.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.48.33.4bc57d1b24d5.png&w=540&sig=199107a0c9ea60fbf1dfcc078a7680b43faeef3a 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.48.33.4bc57d1b24d5.png&w=1080&sig=af3e5225e36a24dea088e8c175da02050bb5dda9 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.48.33.4bc57d1b24d5.png 2160w" sizes="75vw" alt="Example Spreadsheet With Named Styles"/></a></p>
<p>As you saw above, there are many styling options, and the right choice depends on your use case. Feel free to check the <code>openpyxl</code> <a href="https://openpyxl.readthedocs.io/en/stable/styles.html">documentation</a> to see what else you can do.</p>
<h3 id="conditional-formatting">Conditional Formatting</h3>
<p>This feature is one of my personal favorites when it comes to adding styles to a spreadsheet.</p>
<p>It’s a much more powerful approach to styling because it dynamically applies styles according to how the data in the spreadsheet changes.</p>
<p>In a nutshell, <strong>conditional formatting</strong> allows you to specify a list of styles to apply to a cell (or cell range) according to specific conditions.</p>
<p>For example, a widespread use case is to have a balance sheet where all the negative totals are in red, and the positive ones are in green. This formatting makes it much more efficient to spot good vs bad periods.</p>
<p>Without further ado, let’s pick our favorite spreadsheet—<code>sample.xlsx</code>—and add some conditional formatting.</p>
<p>You can start by adding a simple rule that gives a red background to all reviews with fewer than 3 stars:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl.styles</span> <span class="k">import</span> <span class="n">PatternFill</span><span class="p">,</span> <span class="n">colors</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl.styles.differential</span> <span class="k">import</span> <span class="n">DifferentialStyle</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl.formatting.rule</span> <span class="k">import</span> <span class="n">Rule</span>
<span class="gp">>>> </span><span class="n">red_background</span> <span class="o">=</span> <span class="n">PatternFill</span><span class="p">(</span><span class="n">bgColor</span><span class="o">=</span><span class="n">colors</span><span class="o">.</span><span class="n">RED</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">diff_style</span> <span class="o">=</span> <span class="n">DifferentialStyle</span><span class="p">(</span><span class="n">fill</span><span class="o">=</span><span class="n">red_background</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">rule</span> <span class="o">=</span> <span class="n">Rule</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s2">"expression"</span><span class="p">,</span> <span class="n">dxf</span><span class="o">=</span><span class="n">diff_style</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">rule</span><span class="o">.</span><span class="n">formula</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"$H1<3"</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">conditional_formatting</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">"A1:O100"</span><span class="p">,</span> <span class="n">rule</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">"sample_conditional_formatting.xlsx"</span><span class="p">)</span>
</pre></div>
<p>Now you’ll see all the reviews with a star rating below 3 marked with a red background:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.55.41.17f234a186c6.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.55.41.17f234a186c6.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.55.41.17f234a186c6.png&w=540&sig=f3c141c2fe708c031c32c083cb038a736fd8da87 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.55.41.17f234a186c6.png&w=1080&sig=9ded3045cee09d34cd6f5dada7a4eea669ac4808 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.55.41.17f234a186c6.png 2160w" sizes="75vw" alt="Example Spreadsheet With Simple Conditional Formatting"/></a></p>
<p>Code-wise, the only things that are new here are the objects <code>DifferentialStyle</code> and <code>Rule</code>:</p>
<ul>
<li><strong><code>DifferentialStyle</code></strong> is quite similar to <code>NamedStyle</code>, which you already saw above, and it’s used to aggregate multiple styles such as fonts, borders, alignment, and so forth.</li>
<li><strong><code>Rule</code></strong> is responsible for selecting the cells and applying the styles if the cells match the rule’s logic.</li>
</ul>
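<p>To build some intuition for what the rule above checks, here is a minimal plain-Python sketch. It doesn’t use <code>openpyxl</code>, and the helper name <code>rows_to_highlight()</code> and the sample ratings are hypothetical. It assumes that the spreadsheet application re-evaluates the formula once per row of the range and always reads column <code>H</code>, because the <code>$</code> prefix keeps the column fixed:</p>

```python
def rows_to_highlight(ratings_by_row, threshold=3):
    """Return the row numbers whose star rating is below the threshold.

    ratings_by_row maps a row number to the value in column H of that row.
    """
    return [
        row
        for row, rating in sorted(ratings_by_row.items())
        # Only numeric cells are compared; a text header is skipped in this sketch
        if isinstance(rating, (int, float)) and rating < threshold
    ]


# Hypothetical sample: row 1 holds the header, rows 2 to 6 hold star ratings
sample = {1: "star_rating", 2: 5, 3: 2, 4: 3, 5: 1, 6: 4}
print(rows_to_highlight(sample))
```

<p>Every row that this check selects would get the red background across the whole range, not only in column <code>H</code>.</p>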
<p>Using a <code>Rule</code> object, you can create numerous conditional formatting scenarios.</p>
<p>However, for simplicity’s sake, the <code>openpyxl</code> package offers 3 built-in formats that make it easier to create a few common conditional formatting patterns. These built-ins are:</p>
<ul>
<li><code>ColorScale</code></li>
<li><code>IconSet</code></li>
<li><code>DataBar</code></li>
</ul>
<p>The <strong>ColorScale</strong> gives you the ability to create color gradients:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl.formatting.rule</span> <span class="k">import</span> <span class="n">ColorScaleRule</span>
<span class="gp">>>> </span><span class="n">color_scale_rule</span> <span class="o">=</span> <span class="n">ColorScaleRule</span><span class="p">(</span><span class="n">start_type</span><span class="o">=</span><span class="s2">"min"</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">start_color</span><span class="o">=</span><span class="n">colors</span><span class="o">.</span><span class="n">RED</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">end_type</span><span class="o">=</span><span class="s2">"max"</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">end_color</span><span class="o">=</span><span class="n">colors</span><span class="o">.</span><span class="n">GREEN</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Again, let's add this gradient to the star ratings, column "H"</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">conditional_formatting</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">"H2:H100"</span><span class="p">,</span> <span class="n">color_scale_rule</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample_conditional_formatting_color_scale.xlsx"</span><span class="p">)</span>
</pre></div>
<p>Now you should see a color gradient on column <code>H</code>, from red to green, according to the star rating:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_19.00.57.26756963c1e9.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_19.00.57.26756963c1e9.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.00.57.26756963c1e9.png&w=540&sig=782964d150a8adc1de811fab78b0accde6357f85 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.00.57.26756963c1e9.png&w=1080&sig=4c7c87c194ec0a8eb3a9e73fdabf3ead991e96ea 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_19.00.57.26756963c1e9.png 2160w" sizes="75vw" alt="Example Spreadsheet With Color Scale Conditional Formatting"/></a></p>
<p>You can also add a third color and make two gradients instead:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl.formatting.rule</span> <span class="k">import</span> <span class="n">ColorScaleRule</span>
<span class="gp">>>> </span><span class="n">color_scale_rule</span> <span class="o">=</span> <span class="n">ColorScaleRule</span><span class="p">(</span><span class="n">start_type</span><span class="o">=</span><span class="s2">"num"</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">start_value</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">start_color</span><span class="o">=</span><span class="n">colors</span><span class="o">.</span><span class="n">RED</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">mid_type</span><span class="o">=</span><span class="s2">"num"</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">mid_value</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">mid_color</span><span class="o">=</span><span class="n">colors</span><span class="o">.</span><span class="n">YELLOW</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">end_type</span><span class="o">=</span><span class="s2">"num"</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">end_value</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">end_color</span><span class="o">=</span><span class="n">colors</span><span class="o">.</span><span class="n">GREEN</span><span class="p">)</span>
<span class="gp">>>> </span><span class="c1"># Again, let's add this gradient to the star ratings, column "H"</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">conditional_formatting</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">"H2:H100"</span><span class="p">,</span> <span class="n">color_scale_rule</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample_conditional_formatting_color_scale_3.xlsx"</span><span class="p">)</span>
</pre></div>
<p>This time, you’ll notice that star ratings between 1 and 3 have a gradient from red to yellow, and star ratings between 3 and 5 have a gradient from yellow to green:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_19.03.30.0de9a2ff9866.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_19.03.30.0de9a2ff9866.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.03.30.0de9a2ff9866.png&w=540&sig=17daddc356c8ff5c78497b200fe57ab69f80617d 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.03.30.0de9a2ff9866.png&w=1080&sig=6dcb18f4aa3f8ae0a395ca1e3581f8d4c59805ad 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_19.03.30.0de9a2ff9866.png 2160w" sizes="75vw" alt="Example Spreadsheet With 2 Color Scales Conditional Formatting"/></a></p>
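<p>If you’re curious how a value ends up with a particular shade, the sketch below models a three-color scale in plain Python. It’s only an approximation under stated assumptions: the spreadsheet application computes the real colors, and this sketch assumes linear blending of RGB channels. The function names and the hex values for red, yellow, and green are hypothetical and not part of <code>openpyxl</code>:</p>

```python
def hex_to_rgb(color):
    """Convert an "RRGGBB" hex string to a tuple of three integers."""
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert a tuple of three integers back to an "RRGGBB" hex string."""
    return "".join(f"{channel:02X}" for channel in rgb)


def three_color_scale(value, stops):
    """Blend linearly between the two color stops that surround value.

    stops is a list of (number, "RRGGBB") pairs sorted by number.
    """
    # Values outside the scale get the color of the nearest end
    if value <= stops[0][0]:
        return stops[0][1]
    if value >= stops[-1][0]:
        return stops[-1][1]
    for (low, low_color), (high, high_color) in zip(stops, stops[1:]):
        if low <= value <= high:
            share = (value - low) / (high - low)
            blended = tuple(
                round(a + (b - a) * share)
                for a, b in zip(hex_to_rgb(low_color), hex_to_rgb(high_color))
            )
            return rgb_to_hex(blended)


# Assumed RGB values for red, yellow, and green at ratings 1, 3, and 5
stops = [(1, "FF0000"), (3, "FFFF00"), (5, "00FF00")]
for rating in (1, 2, 3, 4, 5):
    print(rating, three_color_scale(rating, stops))
```

<p>Ratings of 1, 3, and 5 land exactly on the three stops, while 2 and 4 fall halfway between two neighboring colors.</p>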
<p>The <strong>IconSet</strong> allows you to add an icon to the cell according to its value:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl.formatting.rule</span> <span class="k">import</span> <span class="n">IconSetRule</span>
<span class="gp">>>> </span><span class="n">icon_set_rule</span> <span class="o">=</span> <span class="n">IconSetRule</span><span class="p">(</span><span class="s2">"5Arrows"</span><span class="p">,</span> <span class="s2">"num"</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">conditional_formatting</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">"H2:H100"</span><span class="p">,</span> <span class="n">icon_set_rule</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">"sample_conditional_formatting_icon_set.xlsx"</span><span class="p">)</span>
</pre></div>
<p>You’ll see a colored arrow next to the star rating. This arrow is red and points down when the value of the cell is 1 and, as the rating gets better, the arrow starts pointing up and becomes green:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_19.07.29.23e75ff46771.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_19.07.29.23e75ff46771.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.07.29.23e75ff46771.png&w=540&sig=388fc68ff53fa2e2d3d5678acfd41f10fa8eccde 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.07.29.23e75ff46771.png&w=1080&sig=e33f483f9758189782bb619a6714c948130375aa 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_19.07.29.23e75ff46771.png 2160w" sizes="75vw" alt="Example Spreadsheet With Icon Set Conditional Formatting"/></a></p>
<p>The <code>openpyxl</code> package has a <a href="https://openpyxl.readthedocs.io/en/stable/formatting.html#iconset">full list</a> of other icons you can use, besides the arrow.</p>
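<p>The list <code>[1, 2, 3, 4, 5]</code> in the icon set example acts as a set of thresholds. As a rough plain-Python sketch of that idea, the code below picks the icon that belongs to the highest threshold a value reaches. It assumes a “greater than or equal to” comparison, and the arrow labels and the <code>pick_icon()</code> helper are hypothetical names used only for illustration:</p>

```python
from bisect import bisect_right

# Descriptive labels for the five arrows, from worst to best (hypothetical names)
ARROWS = ["down", "diagonal down", "sideways", "diagonal up", "up"]


def pick_icon(value, thresholds, icons=ARROWS):
    """Return the icon of the highest threshold that value reaches."""
    # bisect_right counts how many thresholds are less than or equal to value
    reached = bisect_right(thresholds, value)
    # Values below the first threshold still get the first icon
    return icons[max(reached - 1, 0)]


thresholds = [1, 2, 3, 4, 5]
for rating in (1, 2.5, 4, 5):
    print(rating, pick_icon(rating, thresholds))
```

<p>The spreadsheet application makes the final decision when it renders the file, so treat this as a mental model rather than a description of how <code>openpyxl</code> works internally.</p>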
<p>Finally, the <strong>DataBar</strong> allows you to create progress bars:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">openpyxl.formatting.rule</span> <span class="k">import</span> <span class="n">DataBarRule</span>
<span class="gp">>>> </span><span class="n">data_bar_rule</span> <span class="o">=</span> <span class="n">DataBarRule</span><span class="p">(</span><span class="n">start_type</span><span class="o">=</span><span class="s2">"num"</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">start_value</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">end_type</span><span class="o">=</span><span class="s2">"num"</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">end_value</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">color</span><span class="o">=</span><span class="n">colors</span><span class="o">.</span><span class="n">GREEN</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">sheet</span><span class="o">.</span><span class="n">conditional_formatting</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">"H2:H100"</span><span class="p">,</span> <span class="n">data_bar_rule</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">"sample_conditional_formatting_data_bar.xlsx"</span><span class="p">)</span>
</pre></div>
<p>You’ll now see a green progress bar that gets fuller the closer the star rating is to the number 5:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_19.09.10.ebbe032c088d.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_19.09.10.ebbe032c088d.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.09.10.ebbe032c088d.png&w=540&sig=a7b8e3515fde3ff6b662ff1780cbc290da9ce2ad 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.09.10.ebbe032c088d.png&w=1080&sig=1e3d2befc147a43c5b98061cf5888abf22ca4c4a 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_19.09.10.ebbe032c088d.png 2160w" sizes="75vw" alt="Example Spreadsheet With Data Bar Conditional Formatting"/></a></p>
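<p>The length of each bar follows from the start and end values of the rule. Here’s a minimal plain-Python sketch of that relationship, assuming the bar grows linearly between the two values. The helper <code>bar_fill()</code> is a hypothetical name, and a real spreadsheet application may render the bars differently, for example with a minimum visible length:</p>

```python
def bar_fill(value, start=1, end=5):
    """Return the filled share of a data bar as a number from 0.0 to 1.0."""
    share = (value - start) / (end - start)
    # Clamp so values outside the start-end range don't overflow the cell
    return min(max(share, 0.0), 1.0)


for rating in (1, 2, 3, 4, 5):
    print(rating, f"{bar_fill(rating):.0%}")
```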
<p>As you can see, there are a lot of cool things you can do with conditional formatting.</p>
<p>Here, you saw only a few examples of what you can achieve with it, but check the <code>openpyxl</code> <a href="https://openpyxl.readthedocs.io/en/stable/formatting.html">documentation</a> to see a bunch of other options.</p>
<h3 id="adding-images">Adding Images</h3>
<p>Even though images are not something that you’ll often see in a spreadsheet, it’s quite cool to be able to add them. Maybe you can use it for branding purposes or to make spreadsheets more personal.</p>
<p>To be able to load images to a spreadsheet using <code>openpyxl</code>, you’ll have to install <code>Pillow</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install Pillow
</pre></div>
<p>Apart from that, you’ll also need an image. For this example, you can grab the <em>Real Python</em> logo below and convert it from <code>.webp</code> to <code>.png</code> using an online converter such as <a href="https://cloudconvert.com/webp-to-png">cloudconvert.com</a>. Save the final file as <code>logo.png</code> and copy it to the folder where you’re running your examples:</p>
<p><a href="https://files.realpython.com/media/real-python-logo-round.4d95338e8944.png" target="_blank"><img class="img-fluid mx-auto d-block w-25" src="https://files.realpython.com/media/real-python-logo-round.4d95338e8944.png" width="1500" height="1500" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/real-python-logo-round.4d95338e8944.png&w=375&sig=e431a39c9d7f2d5963a81687571a41288c359142 375w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/real-python-logo-round.4d95338e8944.png&w=750&sig=a098752adfc378feee6bc69748af593ed078b8c0 750w, https://files.realpython.com/media/real-python-logo-round.4d95338e8944.png 1500w" sizes="75vw" alt="Real Python Logo"/></a></p>
<p>Afterward, this is the code you need to import that image into the <code>hello_world.xlsx</code> spreadsheet:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
<span class="kn">from</span> <span class="nn">openpyxl.drawing.image</span> <span class="k">import</span> <span class="n">Image</span>
<span class="c1"># Let's use the hello_world spreadsheet since it has less data</span>
<span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"hello_world.xlsx"</span><span class="p">)</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="n">logo</span> <span class="o">=</span> <span class="n">Image</span><span class="p">(</span><span class="s2">"logo.png"</span><span class="p">)</span>
<span class="c1"># A bit of resizing to not fill the whole spreadsheet with the logo</span>
<span class="n">logo</span><span class="o">.</span><span class="n">height</span> <span class="o">=</span> <span class="mi">150</span>
<span class="n">logo</span><span class="o">.</span><span class="n">width</span> <span class="o">=</span> <span class="mi">150</span>
<span class="n">sheet</span><span class="o">.</span><span class="n">add_image</span><span class="p">(</span><span class="n">logo</span><span class="p">,</span> <span class="s2">"A3"</span><span class="p">)</span>
<span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"hello_world_logo.xlsx"</span><span class="p">)</span>
</pre></div>
<p>You have an image on your spreadsheet! Here it is:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_20.05.30.2a69f2a77f68.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_20.05.30.2a69f2a77f68.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_20.05.30.2a69f2a77f68.png&w=540&sig=574c2c425c011fa21e790f7cd4a41547f2449b01 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_20.05.30.2a69f2a77f68.png&w=1080&sig=e6b9c78c7daa929ae86f887d560c3f50f0851d5d 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_20.05.30.2a69f2a77f68.png 2160w" sizes="75vw" alt="Example Spreadsheet With Image"/></a></p>
<p>The image’s top left corner is anchored to the cell you chose, in this case, <code>A3</code>.</p>
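<p>The logo used here is square, so setting both dimensions to 150 keeps its proportions. For an image that isn’t square, you’d probably want to compute one dimension from the other. The sketch below shows one way to do that in plain Python. The helper <code>scale_to_width()</code> and the second set of dimensions are hypothetical:</p>

```python
def scale_to_width(width, height, target_width):
    """Return (width, height) scaled to target_width, keeping the aspect ratio."""
    return target_width, round(height * target_width / width)


# The square 1500x1500 logo keeps its shape at 150 pixels wide
print(scale_to_width(1500, 1500, 150))
# A hypothetical 2160x1352 image scaled down to 540 pixels wide
print(scale_to_width(2160, 1352, 540))
```

<p>You could then assign the two results to <code>logo.width</code> and <code>logo.height</code> before adding the image to the sheet.</p>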
<h3 id="adding-pretty-charts">Adding Pretty Charts</h3>
<p>Another powerful thing you can do with spreadsheets is create an incredible variety of charts.</p>
<p>Charts are a great way to visualize and understand loads of data quickly. There are a lot of different chart types: bar chart, pie chart, line chart, and so on. <code>openpyxl</code> has support for a lot of them.</p>
<p>Here, you’ll see only a couple of examples of charts because the theory behind them is the same for every chart type.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> A few of the chart types that <code>openpyxl</code> currently doesn’t have support for are Funnel, Gantt, Pareto, Treemap, Waterfall, Map, and Sunburst.</p>
</div>
<p>For any chart you want to build, you’ll need to define the chart type, such as <code>BarChart</code> or <code>LineChart</code>, plus the data to be used for the chart, which you specify with a <code>Reference</code> object.</p>
<p>Before you can build your chart, you need to define what data you want to see represented in it. Sometimes, you can use the dataset as is, but other times you need to massage the data a bit to get additional information.</p>
<p>Let’s start by building a new workbook with some sample data:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
<span class="kn">from</span> <span class="nn">openpyxl.chart</span> <span class="k">import</span> <span class="n">BarChart</span><span class="p">,</span> <span class="n">Reference</span>

<span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>

<span class="c1"># Let's create some sample sales data</span>
<span class="n">rows</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">[</span><span class="s2">"Product"</span><span class="p">,</span> <span class="s2">"Online"</span><span class="p">,</span> <span class="s2">"Store"</span><span class="p">],</span>
    <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">45</span><span class="p">],</span>
    <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">30</span><span class="p">],</span>
    <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">25</span><span class="p">],</span>
    <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">50</span><span class="p">,</span> <span class="mi">30</span><span class="p">],</span>
    <span class="p">[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">25</span><span class="p">],</span>
    <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">25</span><span class="p">,</span> <span class="mi">35</span><span class="p">],</span>
    <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">40</span><span class="p">],</span>
<span class="p">]</span>

<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">rows</span><span class="p">:</span>
    <span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
</pre></div>
<p>Now you’re going to start by creating a <strong>bar chart</strong> that displays the total number of sales per product:</p>
<div class="highlight python"><pre><span></span><span class="n">chart</span> <span class="o">=</span> <span class="n">BarChart</span><span class="p">()</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
                 <span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
                 <span class="n">max_row</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span>
                 <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
                 <span class="n">max_col</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>

<span class="n">chart</span><span class="o">.</span><span class="n">add_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">titles_from_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">sheet</span><span class="o">.</span><span class="n">add_chart</span><span class="p">(</span><span class="n">chart</span><span class="p">,</span> <span class="s2">"E2"</span><span class="p">)</span>

<span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">"chart.xlsx"</span><span class="p">)</span>
</pre></div>
<p>There you have it. Below, you can see a very straightforward bar chart showing the difference between <strong>online</strong> and <strong>in-store</strong> product sales:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_20.59.43.7eac35127b97.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_20.59.43.7eac35127b97.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_20.59.43.7eac35127b97.png&w=540&sig=bcdfa015a56b903169702ddbeb1ec06c8d67bc87 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_20.59.43.7eac35127b97.png&w=1080&sig=904be9b2a172b3ae7436311c9d05f6a9ad8ae451 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_20.59.43.7eac35127b97.png 2160w" sizes="75vw" alt="Example Spreadsheet With Bar Chart"/></a></p>
<p>As with images, the top left corner of the chart is anchored to the cell you added the chart to. In this case, it’s cell <code>E2</code>.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Depending on whether you’re using Microsoft Excel or an open-source alternative (LibreOffice or OpenOffice), the chart might look slightly different.</p>
</div>
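<p>The four boundaries passed to <code>Reference</code> for the bar chart describe a rectangular block of cells, and <code>titles_from_data=True</code> tells the chart to treat the first row of that block as series titles. The plain-Python sketch below illustrates both ideas without using <code>openpyxl</code>. The helper names are hypothetical, and only the first few rows of the sample data are repeated here:</p>

```python
def column_letter(index):
    """Convert a 1-based column index to its letter, such as 1 -> "A"."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def reference_range(min_row, max_row, min_col, max_col):
    """Return the cell range that the four boundaries describe."""
    return f"{column_letter(min_col)}{min_row}:{column_letter(max_col)}{max_row}"


def split_series(rows, min_col, max_col):
    """Use the first row as titles and the remaining rows as values."""
    header, *body = rows
    return {
        header[col - 1]: [row[col - 1] for row in body]
        for col in range(min_col, max_col + 1)
    }


rows = [
    ["Product", "Online", "Store"],
    [1, 30, 45],
    [2, 40, 30],
    [3, 40, 25],
]
print(reference_range(min_row=1, max_row=8, min_col=2, max_col=3))
print(split_series(rows, min_col=2, max_col=3))
```

<p>In your own code, you don’t need to write the column conversion yourself because <code>openpyxl.utils</code> provides <code>get_column_letter()</code> for that purpose.</p>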
<p>Try creating a <strong>line chart</strong> instead, changing the data a bit:</p>
<div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">import</span> <span class="nn">random</span>
<span class="lineno"> 2 </span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
<span class="lineno"> 3 </span><span class="kn">from</span> <span class="nn">openpyxl.chart</span> <span class="k">import</span> <span class="n">LineChart</span><span class="p">,</span> <span class="n">Reference</span>
<span class="lineno"> 4 </span>
<span class="lineno"> 5 </span><span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
<span class="lineno"> 6 </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="lineno"> 7 </span>
<span class="lineno"> 8 </span><span class="c1"># Let's create some sample sales data</span>
<span class="lineno"> 9 </span><span class="n">rows</span> <span class="o">=</span> <span class="p">[</span>
<span class="lineno">10 </span>    <span class="p">[</span><span class="s2">""</span><span class="p">,</span> <span class="s2">"January"</span><span class="p">,</span> <span class="s2">"February"</span><span class="p">,</span> <span class="s2">"March"</span><span class="p">,</span> <span class="s2">"April"</span><span class="p">,</span>
<span class="lineno">11 </span>     <span class="s2">"May"</span><span class="p">,</span> <span class="s2">"June"</span><span class="p">,</span> <span class="s2">"July"</span><span class="p">,</span> <span class="s2">"August"</span><span class="p">,</span> <span class="s2">"September"</span><span class="p">,</span>
<span class="lineno">12 </span>     <span class="s2">"October"</span><span class="p">,</span> <span class="s2">"November"</span><span class="p">,</span> <span class="s2">"December"</span><span class="p">],</span>
<span class="lineno">13 </span>    <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="p">],</span>
<span class="lineno">14 </span>    <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="p">],</span>
<span class="lineno">15 </span>    <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="p">],</span>
<span class="lineno">16 </span><span class="p">]</span>
<span class="lineno">17 </span>
<span class="lineno">18 </span><span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">rows</span><span class="p">:</span>
<span class="lineno">19 </span>    <span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
<span class="lineno">20 </span>
<span class="lineno">21 </span><span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="lineno">22 </span>                           <span class="n">max_row</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
<span class="lineno">23 </span>                           <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="lineno">24 </span>                           <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">):</span>
<span class="lineno">25 </span>    <span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">row</span><span class="p">:</span>
<span class="lineno">26 </span>        <span class="n">cell</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
</pre></div>
<p>With the above code, you’ll be able to generate some random data regarding the sales of 3 different products across a whole year.</p>
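<p>If you’d like the sample data to be repeatable between runs, one option is to build each row completely before appending it. The following is a minimal standard-library sketch under that assumption; <code>make_sales_rows()</code>, <code>MONTHS</code>, and the seed value are illustrative names that aren’t part of the code above:</p>

```python
import random

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def make_sales_rows(product_ids, seed=None):
    """Return a header row plus one fully populated row per product."""
    # A private generator: passing a seed makes the values repeatable.
    rng = random.Random(seed)
    rows = [[""] + MONTHS]
    for product_id in product_ids:
        # randrange(5, 100) yields integers from 5 to 99, as in the code above.
        rows.append([product_id] + [rng.randrange(5, 100) for _ in MONTHS])
    return rows


rows = make_sales_rows([1, 2, 3], seed=42)
```

<p>Each list in <code>rows</code> can then be passed to <code>sheet.append()</code>, which removes the need to overwrite the cells afterward.</p>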
<p>Once that’s done, you can very easily create a line chart with the following code:</p>
<div class="highlight python"><pre><span></span><span class="lineno">28 </span><span class="n">chart</span> <span class="o">=</span> <span class="n">LineChart</span><span class="p">()</span>
<span class="lineno">29 </span><span class="n">data</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
<span class="lineno">30 </span>                 <span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="lineno">31 </span>                 <span class="n">max_row</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
<span class="lineno">32 </span>                 <span class="n">min_col</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="lineno">33 </span>                 <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">)</span>
<span class="lineno">34 </span>
<span class="lineno">35 </span><span class="n">chart</span><span class="o">.</span><span class="n">add_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">from_rows</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">titles_from_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="lineno">36 </span><span class="n">sheet</span><span class="o">.</span><span class="n">add_chart</span><span class="p">(</span><span class="n">chart</span><span class="p">,</span> <span class="s2">"C6"</span><span class="p">)</span>
<span class="lineno">37 </span>
<span class="lineno">38 </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">"line_chart.xlsx"</span><span class="p">)</span>
</pre></div>
<p>Here’s the outcome of the above piece of code:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.06.42.e4e52ab1b433.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.06.42.e4e52ab1b433.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.06.42.e4e52ab1b433.png&w=540&sig=362319a9716ded57c9567de98c52a4dc805b5346 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.06.42.e4e52ab1b433.png&w=1080&sig=cbd7b37aa77318e4ed8d2401281817fb53a7144b 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.06.42.e4e52ab1b433.png 2160w" sizes="75vw" alt="Example Spreadsheet With Line Chart"/></a></p>
<p>One thing to keep in mind here is the fact that you’re using <code>from_rows=True</code> when adding the data. This argument makes the chart plot row by row instead of column by column.</p>
<p>In your sample data, you see that each product has a row with 12 values (1 column per month). That’s why you use <code>from_rows</code>. If you don’t pass that argument, by default, the chart tries to plot by column, and you’ll get a month-by-month comparison of sales.</p>
<p>Another difference related to this argument change is that your <code>Reference</code> now starts from the first column, <code>min_col=1</code>, instead of the second one. This change is needed because, with <code>titles_from_data=True</code>, the chart expects the first column to hold the series titles.</p>
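<p>To picture what these two arguments do, here’s a plain-Python sketch that splits a small grid into series by row or by column. It only illustrates the idea and is not how <code>openpyxl</code> is implemented; <code>split_series()</code> and the sample grid are hypothetical:</p>

```python
def split_series(grid, from_rows=False, titles_from_data=False):
    """Split a rectangular grid of cell values into (title, values) series."""
    # When reading by column, transpose so each series is again a flat list.
    lines = grid if from_rows else [list(col) for col in zip(*grid)]
    series = []
    for line in lines:
        if titles_from_data:
            # The first cell of each line is taken as the series title.
            series.append((line[0], list(line[1:])))
        else:
            series.append((None, list(line)))
    return series


# Two products, three months; the first column holds the product label.
grid = [
    ["Product 1", 10, 20, 30],
    ["Product 2", 40, 50, 60],
]

# One series per product, titled from the first column.
by_row = split_series(grid, from_rows=True, titles_from_data=True)

# One series per month, built from the numeric cells only.
numbers = [row[1:] for row in grid]
by_col = split_series(numbers)
```

<p>Read by row, each product becomes one series with its label as the title. Read by column, you get one series per month instead.</p>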
<p>There are a couple of other things you can also change regarding the style of the chart. For example, you can add specific categories to the chart:</p>
<div class="highlight python"><pre><span></span><span class="n">cats</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
                 <span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
                 <span class="n">max_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
                 <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
                 <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">)</span>
<span class="n">chart</span><span class="o">.</span><span class="n">set_categories</span><span class="p">(</span><span class="n">cats</span><span class="p">)</span>
</pre></div>
<p>Add this piece of code before saving the workbook, and you should see the month names appearing instead of numbers:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.08.05.8867e2cced85.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.08.05.8867e2cced85.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.08.05.8867e2cced85.png&w=540&sig=6e48719b75e585dcb58fec3768def0630bb367ff 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.08.05.8867e2cced85.png&w=1080&sig=9fed1350289fe067d8f50c07516bfcc460f4a720 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.08.05.8867e2cced85.png 2160w" sizes="75vw" alt="Example Spreadsheet With Line Chart and Categories"/></a></p>
<p>Code-wise, this is a minimal change. But in terms of the readability of the spreadsheet, this makes it much easier for someone to open the spreadsheet and understand the chart straight away.</p>
<p>Another thing you can do to improve the chart’s readability is to add axis titles. You can do that using the attributes <code>x_axis</code> and <code>y_axis</code>:</p>
<div class="highlight python"><pre><span></span><span class="n">chart</span><span class="o">.</span><span class="n">x_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">"Months"</span>
<span class="n">chart</span><span class="o">.</span><span class="n">y_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">"Sales (per unit)"</span>
</pre></div>
<p>This will generate a spreadsheet like the one below:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.09.46.ce55f629b073.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.09.46.ce55f629b073.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.09.46.ce55f629b073.png&w=540&sig=ac73f73702b55a957c77d6da224cde46c2c9a802 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.09.46.ce55f629b073.png&w=1080&sig=408f6929a900e4a5ccb5cb1cca8647cf64d0d069 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.09.46.ce55f629b073.png 2160w" sizes="75vw" alt="Example Spreadsheet With Line Chart, Categories and Axis Titles"/></a></p>
<p>As you can see, small changes like the above make reading your chart a much easier and quicker task.</p>
<p>There is also a way to style your chart by using Excel’s default <code>ChartStyle</code> property. In this case, you have to choose a number between 1 and 48. Depending on your choice, the colors of your chart change as well:</p>
<div class="highlight python"><pre><span></span><span class="c1"># You can play with this by choosing any number between 1 and 48</span>
<span class="n">chart</span><span class="o">.</span><span class="n">style</span> <span class="o">=</span> <span class="mi">24</span>
</pre></div>
<p>With the style selected above, all lines have some shade of orange:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.16.31.7df18bbe94cb.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.16.31.7df18bbe94cb.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.16.31.7df18bbe94cb.png&w=540&sig=4b0439c6c48d0b3f96f411e6198f6fd49d5c7026 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.16.31.7df18bbe94cb.png&w=1080&sig=4d0aad0a27321bb321dc9c88158f357155968121 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.16.31.7df18bbe94cb.png 2160w" sizes="75vw" alt="Example Spreadsheet With Line Chart, Categories, Axis Titles and Style"/></a></p>
<p>There is no clear documentation on what each style number looks like, but <a href="https://1drv.ms/x/s!Asf0Y5Y4GI3Mg6kZNRd1IA09NLWv9A">this spreadsheet</a> has a few examples of the styles available.</p>
<div class="card mb-3" id="collapse_card0fb191">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapse0fb191" aria-expanded="false" aria-controls="collapse0fb191">Complete Code Example</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapse0fb191" aria-expanded="false" aria-controls="collapse0fb191">Show/Hide</button></p></div>
<div id="collapse0fb191" class="collapse" data-parent="#collapse_card0fb191"><div class="card-body" markdown="1">
<p>Here’s the full code used to generate the line chart with categories, axis titles, and style:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">random</span>
<span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
<span class="kn">from</span> <span class="nn">openpyxl.chart</span> <span class="k">import</span> <span class="n">LineChart</span><span class="p">,</span> <span class="n">Reference</span>
<span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="c1"># Let's create some sample sales data</span>
<span class="n">rows</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">[</span><span class="s2">""</span><span class="p">,</span> <span class="s2">"January"</span><span class="p">,</span> <span class="s2">"February"</span><span class="p">,</span> <span class="s2">"March"</span><span class="p">,</span> <span class="s2">"April"</span><span class="p">,</span>
     <span class="s2">"May"</span><span class="p">,</span> <span class="s2">"June"</span><span class="p">,</span> <span class="s2">"July"</span><span class="p">,</span> <span class="s2">"August"</span><span class="p">,</span> <span class="s2">"September"</span><span class="p">,</span>
     <span class="s2">"October"</span><span class="p">,</span> <span class="s2">"November"</span><span class="p">,</span> <span class="s2">"December"</span><span class="p">],</span>
    <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="p">],</span>
    <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="p">],</span>
    <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="p">],</span>
<span class="p">]</span>
<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">rows</span><span class="p">:</span>
    <span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
                           <span class="n">max_row</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
                           <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
                           <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">row</span><span class="p">:</span>
        <span class="n">cell</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="c1"># Create a LineChart and add the main data</span>
<span class="n">chart</span> <span class="o">=</span> <span class="n">LineChart</span><span class="p">()</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
                 <span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
                 <span class="n">max_row</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
                 <span class="n">min_col</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
                 <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">)</span>
<span class="n">chart</span><span class="o">.</span><span class="n">add_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">titles_from_data</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">from_rows</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="c1"># Add categories to the chart</span>
<span class="n">cats</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
                 <span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
                 <span class="n">max_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
                 <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
                 <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">)</span>
<span class="n">chart</span><span class="o">.</span><span class="n">set_categories</span><span class="p">(</span><span class="n">cats</span><span class="p">)</span>
<span class="c1"># Rename the X and Y Axis</span>
<span class="n">chart</span><span class="o">.</span><span class="n">x_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">"Months"</span>
<span class="n">chart</span><span class="o">.</span><span class="n">y_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">"Sales (per unit)"</span>
<span class="c1"># Apply a specific Style</span>
<span class="n">chart</span><span class="o">.</span><span class="n">style</span> <span class="o">=</span> <span class="mi">24</span>
<span class="c1"># Add the chart to the sheet and save</span>
<span class="n">sheet</span><span class="o">.</span><span class="n">add_chart</span><span class="p">(</span><span class="n">chart</span><span class="p">,</span> <span class="s2">"C6"</span><span class="p">)</span>
<span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">"line_chart.xlsx"</span><span class="p">)</span>
</pre></div>
</div></div>
</div>
<p>There are many more chart types and customizations you can apply, so be sure to check out the <a href="https://openpyxl.readthedocs.io/en/stable/charts/introduction.html">package documentation</a> on this if you need some specific formatting.</p>
<h3 id="convert-python-classes-to-excel-spreadsheet">Convert Python Classes to Excel Spreadsheet</h3>
<p>You already saw how to convert an Excel spreadsheet’s data into Python classes, but now let’s do the opposite.</p>
<p>Let’s imagine you have a database and are using some Object-Relational Mapping (ORM) to map DB objects into Python classes. Now, you want to export those same objects into a spreadsheet.</p>
<p>Let’s assume the following <a href="https://realpython.com/python-data-classes/">data classes</a> to represent the data coming from your database regarding product sales:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">dataclasses</span> <span class="k">import</span> <span class="n">dataclass</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="k">import</span> <span class="n">List</span>
<span class="nd">@dataclass</span>
<span class="k">class</span> <span class="nc">Sale</span><span class="p">:</span>
    <span class="nb">id</span><span class="p">:</span> <span class="nb">str</span>
    <span class="n">quantity</span><span class="p">:</span> <span class="nb">int</span>
<span class="nd">@dataclass</span>
<span class="k">class</span> <span class="nc">Product</span><span class="p">:</span>
    <span class="nb">id</span><span class="p">:</span> <span class="nb">str</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">str</span>
    <span class="n">sales</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">Sale</span><span class="p">]</span>
</pre></div>
<p>Now, let’s generate some random data, assuming the above classes are stored in a <code>db_classes.py</code> file:</p>
<div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">import</span> <span class="nn">random</span>
<span class="lineno"> 2 </span>
<span class="lineno"> 3 </span><span class="c1"># Ignore these for now. You'll use them in a sec ;)</span>
<span class="lineno"> 4 </span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
<span class="lineno"> 5 </span><span class="kn">from</span> <span class="nn">openpyxl.chart</span> <span class="k">import</span> <span class="n">LineChart</span><span class="p">,</span> <span class="n">Reference</span>
<span class="lineno"> 6 </span>
<span class="lineno"> 7 </span><span class="kn">from</span> <span class="nn">db_classes</span> <span class="k">import</span> <span class="n">Product</span><span class="p">,</span> <span class="n">Sale</span>
<span class="lineno"> 8 </span>
<span class="lineno"> 9 </span><span class="n">products</span> <span class="o">=</span> <span class="p">[]</span>
<span class="lineno">10 </span>
<span class="lineno">11 </span><span class="c1"># Let's create 5 products</span>
<span class="lineno">12 </span><span class="k">for</span> <span class="n">idx</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">6</span><span class="p">):</span>
<span class="lineno">13 </span>    <span class="n">sales</span> <span class="o">=</span> <span class="p">[]</span>
<span class="lineno">14 </span>
<span class="lineno">15 </span>    <span class="c1"># Create 5 months of sales; each Sale needs an id too</span>
<span class="lineno">16 </span>    <span class="k">for</span> <span class="n">month</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">6</span><span class="p">):</span>
<span class="lineno">17 </span>        <span class="n">sale</span> <span class="o">=</span> <span class="n">Sale</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="nb">str</span><span class="p">(</span><span class="n">month</span><span class="p">),</span> <span class="n">quantity</span><span class="o">=</span><span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">100</span><span class="p">))</span>
<span class="lineno">18 </span>        <span class="n">sales</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sale</span><span class="p">)</span>
<span class="lineno">19 </span>
<span class="lineno">20 </span>    <span class="n">product</span> <span class="o">=</span> <span class="n">Product</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="nb">str</span><span class="p">(</span><span class="n">idx</span><span class="p">),</span>
<span class="lineno">21 </span>                      <span class="n">name</span><span class="o">=</span><span class="s2">"Product </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">idx</span><span class="p">,</span>
<span class="lineno">22 </span>                      <span class="n">sales</span><span class="o">=</span><span class="n">sales</span><span class="p">)</span>
<span class="lineno">23 </span>    <span class="n">products</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">product</span><span class="p">)</span>
</pre></div>
<p>Running this piece of code should give you 5 products, each with 5 months of sales and a random quantity for each month.</p>
<p>Now, to convert this into a spreadsheet, you need to iterate over the data and append it to the spreadsheet:</p>
<div class="highlight python"><pre><span></span><span class="lineno">25 </span><span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
<span class="lineno">26 </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="lineno">27 </span>
<span class="lineno">28 </span><span class="c1"># Append column names first</span>
<span class="lineno">29 </span><span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">([</span><span class="s2">"Product ID"</span><span class="p">,</span> <span class="s2">"Product Name"</span><span class="p">,</span> <span class="s2">"Month 1"</span><span class="p">,</span>
<span class="lineno">30 </span>              <span class="s2">"Month 2"</span><span class="p">,</span> <span class="s2">"Month 3"</span><span class="p">,</span> <span class="s2">"Month 4"</span><span class="p">,</span> <span class="s2">"Month 5"</span><span class="p">])</span>
<span class="lineno">31 </span>
<span class="lineno">32 </span><span class="c1"># Append the data</span>
<span class="lineno">33 </span><span class="k">for</span> <span class="n">product</span> <span class="ow">in</span> <span class="n">products</span><span class="p">:</span>
<span class="lineno">34 </span>    <span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="n">product</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="n">product</span><span class="o">.</span><span class="n">name</span><span class="p">]</span>
<span class="lineno">35 </span>    <span class="k">for</span> <span class="n">sale</span> <span class="ow">in</span> <span class="n">product</span><span class="o">.</span><span class="n">sales</span><span class="p">:</span>
<span class="lineno">36 </span>        <span class="n">data</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sale</span><span class="o">.</span><span class="n">quantity</span><span class="p">)</span>
<span class="lineno">37 </span>    <span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div>
<p>That’s it. That should allow you to create a spreadsheet with some data coming from your database.</p>
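<p>The same flattening step can be pulled into small helpers, which keeps the header in sync with the number of months. This is a sketch that assumes the data classes shown earlier (repeated here so that it runs on its own); <code>build_header()</code>, <code>product_to_row()</code>, and the sample values are hypothetical:</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sale:
    id: str
    quantity: int


@dataclass
class Product:
    id: str
    name: str
    sales: List[Sale]


def build_header(products):
    """Column names sized to the product with the most months of sales."""
    months = max((len(p.sales) for p in products), default=0)
    return ["Product ID", "Product Name"] + [f"Month {n}" for n in range(1, months + 1)]


def product_to_row(product):
    """Flatten one product into a list of cell values."""
    return [product.id, product.name] + [sale.quantity for sale in product.sales]


# Hand-written sample objects standing in for rows loaded from a database.
products = [
    Product(id="1", name="Product 1",
            sales=[Sale(id="1-1", quantity=12), Sale(id="1-2", quantity=7)]),
    Product(id="2", name="Product 2",
            sales=[Sale(id="2-1", quantity=30), Sale(id="2-2", quantity=45)]),
]

rows = [build_header(products)] + [product_to_row(p) for p in products]
```

<p>Each list in <code>rows</code> is ready to be passed to <code>sheet.append()</code>.</p>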
<p>However, why not use some of that cool knowledge you gained recently to add a chart as well to display that data more visually?</p>
<p>All right, then you could probably do something like this:</p>
<div class="highlight python"><pre><span></span><span class="lineno">38 </span><span class="n">chart</span> <span class="o">=</span> <span class="n">LineChart</span><span class="p">()</span>
<span class="lineno">39 </span><span class="n">data</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
<span class="lineno">40 </span>                 <span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="lineno">41 </span>                 <span class="n">max_row</span><span class="o">=</span><span class="mi">6</span><span class="p">,</span>
<span class="lineno">42 </span>                 <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
<span class="lineno">43 </span>                 <span class="n">max_col</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span>
<span class="lineno">44 </span>
<span class="lineno">45 </span><span class="n">chart</span><span class="o">.</span><span class="n">add_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">titles_from_data</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">from_rows</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="lineno">46 </span><span class="n">sheet</span><span class="o">.</span><span class="n">add_chart</span><span class="p">(</span><span class="n">chart</span><span class="p">,</span> <span class="s2">"B8"</span><span class="p">)</span>
<span class="lineno">47 </span>
<span class="lineno">48 </span><span class="n">cats</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
<span class="lineno">49 </span>                 <span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="lineno">50 </span>                 <span class="n">max_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="lineno">51 </span>                 <span class="n">min_col</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
<span class="lineno">52 </span>                 <span class="n">max_col</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span>
<span class="lineno">53 </span><span class="n">chart</span><span class="o">.</span><span class="n">set_categories</span><span class="p">(</span><span class="n">cats</span><span class="p">)</span>
<span class="lineno">54 </span>
<span class="lineno">55 </span><span class="n">chart</span><span class="o">.</span><span class="n">x_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">"Months"</span>
<span class="lineno">56 </span><span class="n">chart</span><span class="o">.</span><span class="n">y_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">"Sales (per unit)"</span>
<span class="lineno">57 </span>
<span class="lineno">58 </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"oop_sample.xlsx"</span><span class="p">)</span>
</pre></div>
<p>Now we’re talking! Here’s a spreadsheet generated from database objects and with a chart and everything:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.26.23.1f355e76586d.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.26.23.1f355e76586d.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.26.23.1f355e76586d.png&w=540&sig=135f4ee5413467c91f65bbb6e914724cdf1fd413 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.26.23.1f355e76586d.png&w=1080&sig=5b7dd165f92c237ddd049350ca6fc6a165e10512 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.26.23.1f355e76586d.png 2160w" sizes="75vw" alt="Example Spreadsheet With Conversion from Python Data Classes"/></a></p>
<p>That’s a great way for you to wrap up your new knowledge of charts!</p>
<h3 id="bonus-working-with-pandas">Bonus: Working With Pandas</h3>
<p>Even though you can use <a href="https://realpython.com/working-with-large-excel-files-in-pandas/">Pandas to handle Excel files</a>, there are a few things that you either can’t accomplish with Pandas or that you’d be better off doing with <code>openpyxl</code> directly.</p>
<p>For example, some of the advantages of using <code>openpyxl</code> are the ability to easily customize your spreadsheet with styles, conditional formatting, and such.</p>
<p>But guess what, you don’t have to pick one. In fact, <code>openpyxl</code> supports both converting data from a Pandas DataFrame into a workbook and the opposite, converting an <code>openpyxl</code> workbook into a Pandas DataFrame.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> If you’re new to Pandas, check our <a href="https://realpython.com/courses/pandas-dataframes-101/">course on Pandas DataFrames</a> beforehand.</p>
</div>
<p>First things first, remember to install the <code>pandas</code> package:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install pandas
</pre></div>
<p>Then, let’s create a sample DataFrame:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>

<span class="n">data</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">"Product Name"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"Product 1"</span><span class="p">,</span> <span class="s2">"Product 2"</span><span class="p">],</span>
    <span class="s2">"Sales Month 1"</span><span class="p">:</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">],</span>
    <span class="s2">"Sales Month 2"</span><span class="p">:</span> <span class="p">[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">35</span><span class="p">],</span>
<span class="p">}</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div>
<p>Now that you have some data, you can use <code>dataframe_to_rows()</code> to turn the DataFrame into rows and append them to a worksheet:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
<span class="kn">from</span> <span class="nn">openpyxl.utils.dataframe</span> <span class="k">import</span> <span class="n">dataframe_to_rows</span>

<span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>

<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">dataframe_to_rows</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
    <span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>

<span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">"pandas.xlsx"</span><span class="p">)</span>
</pre></div>
<p>You should see a spreadsheet that looks like this:</p>
<p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.42.15.0a4208db25f0.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.42.15.0a4208db25f0.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.42.15.0a4208db25f0.png&w=540&sig=636303dd8f99512651c5868f4ef572b2afa75d3c 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.42.15.0a4208db25f0.png&w=1080&sig=c2ddf015c46566fc545ad03187cc5780a61938f9 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.42.15.0a4208db25f0.png 2160w" sizes="75vw" alt="Example Spreadsheet With Data from Pandas Data Frame"/></a></p>
<p>If you want to add the <a href="https://realpython.com/python-data-cleaning-numpy-pandas/#changing-the-index-of-a-dataframe">DataFrame’s index</a>, you can set <code>index=True</code>, and each row’s index is added to your spreadsheet.</p>
<p>On the other hand, if you want to convert a spreadsheet into a DataFrame, you can also do it in a very straightforward way like so:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
<span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample.xlsx"</span><span class="p">)</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="n">values</span> <span class="o">=</span> <span class="n">sheet</span><span class="o">.</span><span class="n">values</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">values</span><span class="p">)</span>
</pre></div>
<p>Alternatively, if you want to add the correct headers and use the review ID as the index, for example, then you can also do it like this instead:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
<span class="kn">from</span> <span class="nn">mapping</span> <span class="k">import</span> <span class="n">REVIEW_ID</span>
<span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">"sample.xlsx"</span><span class="p">)</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">sheet</span><span class="o">.</span><span class="n">values</span>
<span class="c1"># Set the first row as the columns for the DataFrame</span>
<span class="n">cols</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="c1"># Set the field "review_id" as the indexes for each row</span>
<span class="n">idx</span> <span class="o">=</span> <span class="p">[</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_ID</span><span class="p">]</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="n">idx</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="n">cols</span><span class="p">)</span>
</pre></div>
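<p>The header handling above relies on <code>sheet.values</code> being a generator: <code>next()</code> takes the first row off the front, and <code>list()</code> then collects only what is left. Here is a minimal sketch of that idiom with no spreadsheet involved. The <code>fake_sheet_values()</code> generator and its two-column rows are made up for illustration, and the column position is looked up by name rather than imported from <code>mapping</code>:</p>

```python
def fake_sheet_values():
    """Hypothetical stand-in for sheet.values: yields one tuple per row."""
    yield ("review_id", "star_rating")
    yield ("R1", 5)
    yield ("R2", 3)


data = fake_sheet_values()

# next() consumes the first row, so it becomes the header...
cols = next(data)
# ...and list() collects only the rows that are still left.
rows = list(data)

# Look up the position of the index field instead of hard-coding it.
id_position = cols.index("review_id")
idx = [row[id_position] for row in rows]

print(cols)  # ('review_id', 'star_rating')
print(rows)  # [('R1', 5), ('R2', 3)]
print(idx)   # ['R1', 'R2']
```

<p>Because the generator can only be consumed once, the order matters: calling <code>list(data)</code> before <code>next(data)</code> would leave nothing for the header.</p>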
<p>Using indexes and columns allows you to access data from your DataFrame easily:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">df</span><span class="o">.</span><span class="n">columns</span>
<span class="go">Index(['marketplace', 'customer_id', 'review_id', 'product_id',</span>
<span class="go"> 'product_parent', 'product_title', 'product_category', 'star_rating',</span>
<span class="go"> 'helpful_votes', 'total_votes', 'vine', 'verified_purchase',</span>
<span class="go"> 'review_headline', 'review_body', 'review_date'],</span>
<span class="go"> dtype='object')</span>
<span class="gp">>>> </span><span class="c1"># Get first 10 reviews' star rating</span>
<span class="gp">>>> </span><span class="n">df</span><span class="p">[</span><span class="s2">"star_rating"</span><span class="p">][:</span><span class="mi">10</span><span class="p">]</span>
<span class="go">R3O9SGZBVQBV76 5</span>
<span class="go">RKH8BNC3L5DLF 5</span>
<span class="go">R2HLE8WKZSU3NL 2</span>
<span class="go">R31U3UH5AZ42LL 5</span>
<span class="go">R2SV659OUJ945Y 4</span>
<span class="go">RA51CP8TR5A2L 5</span>
<span class="go">RB2Q7DLDN6TH6 5</span>
<span class="go">R2RHFJV0UYBK3Y 1</span>
<span class="go">R2Z6JOQ94LFHEP 5</span>
<span class="go">RX27XIIWY5JPB 4</span>
<span class="go">Name: star_rating, dtype: int64</span>
<span class="gp">>>> </span><span class="c1"># Grab review with id "R2EQL1V1L6E0C9", using the index</span>
<span class="gp">>>> </span><span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s2">"R2EQL1V1L6E0C9"</span><span class="p">]</span>
<span class="go">marketplace US</span>
<span class="go">customer_id 15305006</span>
<span class="go">review_id R2EQL1V1L6E0C9</span>
<span class="go">product_id B004LURNO6</span>
<span class="go">product_parent 892860326</span>
<span class="go">review_headline Five Stars</span>
<span class="go">review_body Love it</span>
<span class="go">review_date 2015-08-31</span>
<span class="go">Name: R2EQL1V1L6E0C9, dtype: object</span>
</pre></div>
<p>There you go! Whether you want to use <code>openpyxl</code> to prettify your Pandas dataset or use Pandas to do some hardcore algebra, you now know how to switch between both packages.</p>
<h2 id="conclusion">Conclusion</h2>
<p><em>Phew</em>, after that long read, you now know how to work with spreadsheets in Python! You can rely on <code>openpyxl</code>, your trustworthy companion, to:</p>
<ul>
<li>Extract valuable information from spreadsheets in a Pythonic manner</li>
<li>Create your own spreadsheets, no matter the complexity level</li>
<li>Add cool features such as conditional formatting or charts to your spreadsheets</li>
</ul>
<p>There are a few other things you can do with <code>openpyxl</code> that might not have been covered in this tutorial, but you can always check the package’s official <a href="https://openpyxl.readthedocs.io/en/stable/index.html">documentation website</a> to learn more about it. You can even venture into checking its <a href="https://bitbucket.org/openpyxl/openpyxl/src/default/">source code</a> and improving the package further.</p>
<p>Feel free to leave any comments below if you have any questions, or if there’s any section you’d love to hear more about.</p>
<div class="alert alert-warning" role="alert"><p><strong>Download Dataset:</strong> <a href="https://realpython.com/optins/view/openpyxl-sample-dataset/" class="alert-link" data-toggle="modal" data-target="#modal-openpyxl-sample-dataset" data-focus="false">Click here to download the dataset for the openpyxl exercise you'll be following in this tutorial.</a></p></div>
Your Guide to the CPython Source Codehttps://realpython.com/cpython-source-code-guide/2019-08-21T16:10:00+00:00In this detailed Python tutorial, you'll explore the CPython source code. By following this step-by-step walkthrough, you'll take a deep dive into how the CPython compiler works and how your Python code gets executed.
<p>Are there certain parts of Python that just seem magic? Like how dictionaries are so much faster than looping over a list to find an item? How does a generator remember the state of its variables each time it yields a value, and why do you never have to allocate memory like you do in other languages? It turns out that CPython, the most popular Python runtime, is written in human-readable C and Python code. This tutorial will walk you through the CPython source code.</p>
<p>You’ll cover the concepts behind the internals of CPython and how they work, with visual explanations as you go.</p>
<p><strong>You’ll learn how to:</strong></p>
<ul>
<li>Read and navigate the source code</li>
<li>Compile CPython from source code</li>
<li>Navigate and comprehend the inner workings of concepts like lists, dictionaries, and generators</li>
<li>Run the test suite</li>
<li>Modify or upgrade components of the CPython library to contribute them to future versions</li>
</ul>
<p>Yes, this is a very long article. If you just made yourself a fresh cup of tea, coffee or your favorite beverage, it’s going to be cold by the end of Part 1. </p>
<p>This tutorial is split into five parts. Take your time with each part and make sure you try out the demos and the interactive components. Grasping the core concepts of Python will give you a sense of achievement and can make you a better Python programmer.</p>
<div class="alert alert-warning" role="alert"><p><strong>Free Bonus:</strong> <a href="" class="alert-link" data-toggle="modal" data-target="#modal-python-mastery-course" data-focus="false">5 Thoughts On Python Mastery</a>, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.</p></div>
<h2 h1="h1" id="part-1-introduction-to-cpython">Part 1: Introduction to CPython</h2>
<p>When you type <code>python</code> at the console or install a Python distribution from <a href="https://www.python.org">python.org</a>, you are running <strong>CPython</strong>. CPython is one of the many Python runtimes, maintained and written by different teams of developers. Some other runtimes you may have heard of are <a href="https://pypy.org/">PyPy</a>, <a href="https://cython.org/">Cython</a>, and <a href="https://www.jython.org/">Jython</a>.</p>
<p>The unique thing about CPython is that it contains both a runtime and the shared language specification that all Python runtimes use. CPython is the “official,” or reference implementation of Python.</p>
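<p>If you are not sure which runtime a given <code>python</code> command starts, the standard library can tell you. A short sketch:</p>

```python
import platform
import sys

# Human-readable name of the running implementation, e.g. "CPython" or "PyPy".
implementation = platform.python_implementation()

# Lowercase identifier of the same implementation, e.g. "cpython".
identifier = sys.implementation.name

print(implementation, identifier)
```

<p>On an interpreter downloaded from python.org, this should report CPython.</p>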
<p>The Python language specification is the document that describes the Python language. For example, it says that <code>assert</code> is a reserved keyword, and that <code>[]</code> is used for indexing, slicing, and creating empty lists.</p>
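<p>You can see both of those rules from the specification at work in any Python 3 interpreter. The sketch below uses the standard library <code>keyword</code> module. The list contents are arbitrary sample values:</p>

```python
import keyword

# assert is reserved by the language, so it cannot be used as a name.
print(keyword.iskeyword("assert"))  # True
# print is an ordinary built-in function in Python 3, not a keyword.
print(keyword.iskeyword("print"))   # False

# The three uses of [] that the specification describes:
empty = []                       # creating an empty list
letters = ["a", "b", "c", "d"]
first = letters[0]               # indexing
middle = letters[1:3]            # slicing

print(empty, first, middle)      # [] a ['b', 'c']
```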
<p>Think about what you expect to be inside the Python distribution on your computer:</p>
<ul>
<li>When you type <code>python</code> without a file or module, it gives an interactive prompt.</li>
<li>You can import built-in modules from the standard library like <code>json</code>.</li>
<li>You can install packages from the internet using <code>pip</code>.</li>
<li>You can test your applications using the built-in <code>unittest</code> library.</li>
</ul>
<p>These are all part of the CPython distribution. There’s a lot more than just a compiler.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> This article is written against version <a href="https://github.com/python/cpython/tree/v3.8.0b4">3.8.0b4</a> of the CPython source code.</p>
</div>
<h3 id="whats-in-the-source-code">What’s in the Source Code?</h3>
<p>The CPython source distribution comes with a whole range of tools, libraries, and components. We’ll explore those in this article. First we are going to focus on the compiler.</p>
<p>To download a copy of the CPython source code, you can use <code>git</code> to pull the latest version to a working copy locally:</p>
<div class="highlight sh"><pre><span></span><span class="go">git clone https://github.com/python/cpython</span>
<span class="go">cd cpython</span>
<span class="go">git checkout v3.8.0b4</span>
</pre></div>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> If you don’t have Git available, you can download the source in a <a href="https://github.com/python/cpython/archive/v3.8.0b4.zip">ZIP</a> file directly from the GitHub website.</p>
</div>
<p>Inside of the newly downloaded <code>cpython</code> directory, you will find the following subdirectories:</p>
<div class="highlight"><pre><span></span>cpython/
│
├── Doc      ← Source for the documentation
├── Grammar  ← The computer-readable language definition
├── Include  ← The C header files
├── Lib      ← Standard library modules written in Python
├── Mac      ← macOS support files
├── Misc     ← Miscellaneous files
├── Modules  ← Standard Library Modules written in C
├── Objects  ← Core types and the object model
├── Parser   ← The Python parser source code
├── PC       ← Windows build support files
├── PCbuild  ← Windows build support files for older Windows versions
├── Programs ← Source code for the python executable and other binaries
├── Python   ← The CPython interpreter source code
└── Tools    ← Standalone tools useful for building or extending Python
</pre></div>
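<p>To compare this listing with your own checkout, you can print the top-level directories with a few lines of Python. This is a minimal sketch. The helper name <code>list_subdirectories()</code> is made up for illustration, and <code>cpython</code> is assumed to be the path of your local clone:</p>

```python
from pathlib import Path


def list_subdirectories(checkout):
    """Return the sorted names of the visible directories directly under checkout."""
    root = Path(checkout)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not entry.name.startswith(".")
    )


if __name__ == "__main__":
    # "cpython" is an assumed path; adjust it to wherever you cloned the repository.
    checkout = Path("cpython")
    if checkout.is_dir():
        for name in list_subdirectories(checkout):
            print(name)
```

<p>Hidden directories such as <code>.git</code> are skipped so the output stays close to the tree above.</p>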
<p>Next, we’ll compile CPython from the source code. This step requires a C compiler, and some build tools, which depend on the operating system you’re using.</p>
<h3 id="compiling-cpython-macos">Compiling CPython (macOS)</h3>
<p>Compiling CPython on macOS is straightforward. You will first need the essential C compiler toolkit. The Command Line Development Tools is an app that you can update in macOS through the App Store. You need to perform the initial installation on the terminal.</p>
<p>To open up a terminal in macOS, go to the Launchpad, then <em>Other</em>, then choose the <em>Terminal</em> app. You will want to save this app to your Dock, so right-click the icon and select <em>Keep in Dock</em>.</p>
<p>Now, within the terminal, install the C compiler and toolkit by running the following:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> xcode-select --install
</pre></div>
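<p>This command will pop up with a prompt to download and install a set of tools, including Git, Make, and a C compiler. On current macOS versions that compiler is Clang, which is also available under the name <code>gcc</code>.</p>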
<p>You will also need a working copy of <a href="https://www.openssl.org/">OpenSSL</a> to use for fetching packages from the PyPI.org website. If you later plan on using this build to install additional packages, SSL validation is required.</p>
<p>The simplest way to install OpenSSL on macOS is by using <a href="https://brew.sh">Homebrew</a>. If you already have Homebrew installed, you can install the dependencies for CPython with the <code>brew install</code> command:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> brew install openssl xz zlib
</pre></div>
<p>Now that you have the dependencies, you can run the <code>configure</code> script. The command below enables SSL support by passing the location where Homebrew installed OpenSSL, and it enables the debug hooks with <code>--with-pydebug</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nv">CPPFLAGS</span><span class="o">=</span><span class="s2">"-I</span><span class="k">$(</span>brew --prefix zlib<span class="k">)</span><span class="s2">/include"</span> <span class="se">\</span>
<span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L</span><span class="k">$(</span>brew --prefix zlib<span class="k">)</span><span class="s2">/lib"</span> <span class="se">\</span>
./configure --with-openssl<span class="o">=</span><span class="k">$(</span>brew --prefix openssl<span class="k">)</span> --with-pydebug
</pre></div>
<p>This will generate a <code>Makefile</code> in the root of the repository that you can use to automate the build process. The <code>./configure</code> step only needs to be run once. You can build the CPython binary by running:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> make -j2 -s
</pre></div>
<p>The <code>-j2</code> flag allows <code>make</code> to run 2 jobs simultaneously. If you have 4 cores, you can change this to 4. The <code>-s</code> flag stops the <code>Makefile</code> from printing every command it runs to the console. You can remove this, but the output is very verbose.</p>
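<p>If you don’t know how many cores your machine has, Python can report it. The sketch below builds the same <code>make</code> invocation with a matching job count. The helper name <code>make_command()</code> is made up for illustration, and note that <code>os.cpu_count()</code> counts logical CPUs, which can be higher than the number of physical cores:</p>

```python
import os


def make_command(jobs=None):
    """Build the make invocation from this section with a chosen job count."""
    if jobs is None:
        # os.cpu_count() can return None when the count cannot be determined.
        jobs = os.cpu_count() or 1
    return ["make", f"-j{jobs}", "-s"]


print(" ".join(make_command()))
```

<p>The result is a list of arguments, so you could pass it to <code>subprocess.run()</code> from the root of the repository, or just copy the printed command into your terminal.</p>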
<p>During the build, you may receive some errors, and in the summary, it will notify you that not all packages could be built. For example, <code>_dbm</code>, <code>_sqlite3</code>, <code>_uuid</code>, <code>nis</code>, <code>ossaudiodev</code>, <code>spwd</code>, and <code>_tkinter</code> would fail to build with this set of instructions. That’s okay if you aren’t planning on developing against those packages. If you are, then check out the <a href="https://devguide.python.org/">dev guide</a> website for more information.</p>
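<p>Once the build has finished, you can ask the new interpreter which of those modules it can actually find. This is a rough sketch using <code>importlib.util.find_spec()</code>. The helper name <code>missing_modules()</code> is made up for illustration, and the check only locates a module without importing it, so a module that is found could still fail at import time:</p>

```python
import importlib.util

# Extension modules this section lists as possibly missing from the build.
OPTIONAL_MODULES = [
    "_dbm", "_sqlite3", "_uuid", "nis", "ossaudiodev", "spwd", "_tkinter",
]


def missing_modules(names):
    """Return the names that the running interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(OPTIONAL_MODULES))
```

<p>Run it with the binary you just built rather than your system Python, since the answer depends on the interpreter that executes the script.</p>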
<p>The build will take a few minutes and generate a binary called <code>python.exe</code>. Every time you make changes to the source code, you will need to re-run <code>make</code> with the same flags.
The <code>python.exe</code> binary is the debug binary of CPython. Execute <code>python.exe</code> to see a working REPL:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe
<span class="go">Python 3.8.0b4 (tags/v3.8.0b4:d93605de72, Aug 30 2019, 10:00:03) </span>
<span class="go">[Clang 10.0.1 (clang-1001.0.46.4)] on darwin</span>
<span class="go">Type "help", "copyright", "credits" or "license" for more information.</span>
<span class="gp">></span>>>
</pre></div>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong>
Yes, that’s right, the macOS build has the file extension <code>.exe</code>. This is <em>not</em> because it’s a Windows binary. macOS has a case-insensitive filesystem by default, and the developers didn’t want people working with the binary to accidentally refer to the directory <code>Python/</code>, so <code>.exe</code> was appended to avoid ambiguity.
If you later run <code>make install</code> or <code>make altinstall</code>, it will rename the file back to <code>python</code>.</p>
</div>
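<p>You can check whether your own filesystem treats <code>Python</code> and <code>python</code> as the same name. The sketch below creates a directory called <code>Python</code> in a temporary location and then looks it up in lowercase. The helper name <code>is_case_insensitive()</code> is made up for illustration:</p>

```python
import os
import tempfile


def is_case_insensitive(directory=None):
    """Return True if the filesystem holding directory ignores letter case."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        os.mkdir(os.path.join(tmp, "Python"))
        # On a case-insensitive filesystem, "python" resolves to "Python".
        return os.path.exists(os.path.join(tmp, "python"))


print(is_case_insensitive())
```

<p>The result describes the volume that holds the temporary directory, so pass <code>directory</code> explicitly if your checkout lives on a different volume.</p>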
<h3 id="compiling-cpython-linux">Compiling CPython (Linux)</h3>
<p>For Linux, the first step is to download and install <code>make</code>, <code>gcc</code>, <code>configure</code>, and <code>pkgconfig</code>. </p>
<p>For Fedora Core, RHEL, CentOS, or other yum-based systems: </p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> sudo yum install yum-utils
</pre></div>
<p>For Debian, Ubuntu, or other <code>apt</code>-based systems:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> sudo apt install build-essential
</pre></div>
<p>Then install the required packages, for Fedora Core, RHEL, CentOS or other yum-based systems: </p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> sudo yum-builddep python3
</pre></div>
<p>For Debian, Ubuntu, or other <code>apt</code>-based systems:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> sudo apt install libssl-dev zlib1g-dev libncurses5-dev <span class="se">\</span>
libncursesw5-dev libreadline-dev libsqlite3-dev libgdbm-dev <span class="se">\</span>
libdb5.3-dev libbz2-dev libexpat1-dev liblzma-dev libffi-dev
</pre></div>
<p>Now that you have the dependencies, you can run the <code>configure</code> script, enabling the debug hooks <code>--with-pydebug</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ./configure --with-pydebug
</pre></div>
<p>Review the output to ensure that OpenSSL support was marked as <code>YES</code>. Otherwise, check with your distribution for instructions on installing the headers for OpenSSL.</p>
<p>Next, you can build the CPython binary by running the generated <code>Makefile</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> make -j2 -s
</pre></div>
<p>During the build, you may receive some errors, and in the summary, it will notify you that not all packages could be built. That’s okay if you aren’t planning on developing against those packages. If you are, then check out the <a href="https://devguide.python.org/">dev guide</a> website for more information.</p>
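<p>The build will take a few minutes and generate a binary called <code>python</code>. This is the debug binary of CPython. Execute <code>./python</code> to see a working REPL. The banner reports the compiler and platform of your own build, so on Linux it will differ from the sample below, which was captured on macOS:</p>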
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python
<span class="go">Python 3.8.0b4 (tags/v3.8.0b4:d93605de72, Aug 30 2019, 10:00:03) </span>
<span class="go">[Clang 10.0.1 (clang-1001.0.46.4)] on darwin</span>
<span class="go">Type "help", "copyright", "credits" or "license" for more information.</span>
<span class="gp">></span>>>
</pre></div>
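<p>To confirm that the binary you just built really is a debug build with OpenSSL support, you can run a short script with it. This is a minimal sketch. The helper name <code>build_report()</code> is made up for illustration. It relies on the fact that <code>sys.gettotalrefcount()</code> only exists in interpreters configured with <code>--with-pydebug</code>:</p>

```python
import sys


def build_report():
    """Collect a few facts about the interpreter that runs this script."""
    try:
        import ssl
        openssl = ssl.OPENSSL_VERSION
    except ImportError:
        # The ssl module is missing when the build could not use OpenSSL.
        openssl = None
    return {
        # sys.gettotalrefcount() is only defined in --with-pydebug builds.
        "debug_build": hasattr(sys, "gettotalrefcount"),
        "openssl": openssl,
    }


print(build_report())
```

<p>Saved under a name of your choice and run with <code>./python</code>, it should report <code>'debug_build': True</code> and an OpenSSL version string if both options took effect.</p>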
<h3 id="compiling-cpython-windows">Compiling CPython (Windows)</h3>
<p>Inside the <code>PCbuild</code> folder is a Visual Studio solution file for building and exploring CPython. To use this, you need to have Visual Studio installed on your PC.</p>
<p>The newest version of Visual Studio, Visual Studio 2019, makes it easier to work with Python and the CPython source code, so it is recommended for use in this tutorial. If you already have Visual Studio 2017 installed, that would also work fine.</p>
<p>None of the paid features are required for compiling CPython or this tutorial. You can use the Community edition of Visual Studio, which is available for free from <a href="https://visualstudio.microsoft.com/vs/">Microsoft’s Visual Studio website</a>.</p>
<p>Once you’ve downloaded the installer, you’ll be asked to select which components you want to install. The bare minimum for this tutorial is:</p>
<ul>
<li>The <strong>Python Development</strong> workload</li>
<li>The optional <strong>Python native development tools</strong></li>
<li>Python 3 64-bit (3.7.2) (can be deselected if you already have Python 3.7 installed)</li>
</ul>
<p>Any other optional features can be deselected if you want to be more conservative with disk space:</p>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-08-22_at_2.47.23_pm.5e8682a89503.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-08-22_at_2.47.23_pm.5e8682a89503.png" width="2504" height="1260" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-08-22_at_2.47.23_pm.5e8682a89503.png&w=626&sig=86eb9f82580a69f533983087ba0fa4faf0d5bf96 626w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-08-22_at_2.47.23_pm.5e8682a89503.png&w=1252&sig=fe2157486f81073eabe043b3c441af23dd67b78a 1252w, https://files.realpython.com/media/Screen_Shot_2019-08-22_at_2.47.23_pm.5e8682a89503.png 2504w" sizes="75vw" alt="Visual Studio Options Window"/></a></p>
<p>The installer will then download and install all of the required components. The installation could take an hour, so you may want to read on and come back to this section.</p>
<p>Once the installer has completed, click the <em>Launch</em> button to start Visual Studio. You will be prompted to sign in. If you have a Microsoft account you can log in, or skip that step.</p>
<p>Once Visual Studio starts, you will be prompted to Open a Project. A shortcut to getting started with the Git configuration and cloning CPython is to choose the <em>Clone or check out code</em> option:</p>
<p><a href="https://files.realpython.com/media/Capture3.e19765d74ec4.PNG" target="_blank"><img class="img-fluid mx-auto d-block w-50" src="https://files.realpython.com/media/Capture3.e19765d74ec4.PNG" width="2048" height="1420" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture3.e19765d74ec4.PNG&w=512&sig=e475e2a09cd780894f108850f91736c53ac95f27 512w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture3.e19765d74ec4.PNG&w=1024&sig=47d66a316c2546c9787e5af4cb4a9a006dca936d 1024w, https://files.realpython.com/media/Capture3.e19765d74ec4.PNG 2048w" sizes="75vw" alt="Choosing a Project Type in Visual Studio"/></a></p>
<p>For the project URL, type <code>https://github.com/python/cpython</code> to clone:</p>
<p><a href="https://files.realpython.com/media/Capture4.ea01418a971c.PNG" target="_blank"><img class="img-fluid mx-auto d-block w-50" src="https://files.realpython.com/media/Capture4.ea01418a971c.PNG" width="2048" height="1420" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture4.ea01418a971c.PNG&w=512&sig=47ed81b234652446a4f0e77e2a3f70e0074ac222 512w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture4.ea01418a971c.PNG&w=1024&sig=de393823fe557847f295d040573bd061b2ccd557 1024w, https://files.realpython.com/media/Capture4.ea01418a971c.PNG 2048w" sizes="75vw" alt="Cloning projects in Visual Studio"/></a></p>
<p>Visual Studio will then download a copy of CPython from GitHub using the version of Git bundled with Visual Studio. This step also saves you the hassle of having to install Git on Windows. The download may take 10 minutes.</p>
<p>Once the project has downloaded, you need to point it to the <strong><code>pcbuild</code></strong> Solution file, by clicking on <em>Solutions and Projects</em> and selecting <code>pcbuild.sln</code>:</p>
<p><a href="https://files.realpython.com/media/Capture6.3d06a62b8e87.PNG" target="_blank"><img class="img-fluid mx-auto d-block border w-50" src="https://files.realpython.com/media/Capture6.3d06a62b8e87.PNG" width="863" height="565" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture6.3d06a62b8e87.PNG&w=215&sig=0d4c23758a58848ea65b74514aca9e15a72748fa 215w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture6.3d06a62b8e87.PNG&w=431&sig=05f0ebe121ef54abf51bcdb55fba695d28656868 431w, https://files.realpython.com/media/Capture6.3d06a62b8e87.PNG 863w" sizes="75vw" alt="Selecting a solution"/></a></p>
<p>When the solution is loaded, it will prompt you to retarget the projects inside the solution to the version of the C/C++ compiler you have installed. Visual Studio will also target the version of the Windows SDK you have installed.</p>
<p>Ensure that you change the Windows SDK version to the newest installed version and the platform toolset to the latest version. If you missed this window, you can right-click on the Solution in the <em>Solutions and Projects</em> window and click <em>Retarget Solution</em>.</p>
<p>Once this is complete, you need to download some source files to be able to build the whole CPython package. Inside the <code>PCbuild</code> folder there is a <code>.bat</code> file that automates this for you. <a href="https://www.youtube.com/watch?v=bgSSJQolR0E">Open up a command-line prompt inside</a> the downloaded <code>PCbuild</code> folder and run <code>get_externals.bat</code>:</p>
<div class="highlight sh"><pre><span></span><span class="go"> > get_externals.bat</span>
<span class="go">Using py -3.7 (found 3.7 with py.exe)</span>
<span class="go">Fetching external libraries...</span>
<span class="go">Fetching bzip2-1.0.6...</span>
<span class="go">Fetching sqlite-3.21.0.0...</span>
<span class="go">Fetching xz-5.2.2...</span>
<span class="go">Fetching zlib-1.2.11...</span>
<span class="go">Fetching external binaries...</span>
<span class="go">Fetching openssl-bin-1.1.0j...</span>
<span class="go">Fetching tcltk-8.6.9.0...</span>
<span class="go">Finished.</span>
</pre></div>
<p>Next, back within Visual Studio, build CPython by pressing <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-b">B</kbd></span>, or choosing <em>Build Solution</em> from the top menu. If you receive any errors about the Windows SDK being missing, make sure you set the right targeting settings in the <em>Retarget Solution</em> window. You should also see <em>Windows Kits</em> inside your Start Menu, and <em>Windows Software Development Kit</em> inside of that menu.</p>
<p>The first build could take 10 minutes or more. Once the build has completed, you may see a few warnings, which you can safely ignore.</p>
<p>To start the debug version of CPython, press <span class="keys"><kbd class="key-f5">F5</kbd></span> and CPython will start in Debug mode straight into the REPL:</p>
<p><a href="https://files.realpython.com/media/Capture8.967a3606daf0.PNG" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Capture8.967a3606daf0.PNG" width="3360" height="2100" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture8.967a3606daf0.PNG&w=840&sig=1822fde8ffe6946fc91e47dd8975aa23add8b23a 840w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture8.967a3606daf0.PNG&w=1680&sig=a093e68320ad548384c585840002e4294ab4bf94 1680w, https://files.realpython.com/media/Capture8.967a3606daf0.PNG 3360w" sizes="75vw" alt="CPython debugging Windows"/></a></p>
<p>Once this is completed, you can run the Release build by changing the build configuration from <em>Debug</em> to <em>Release</em> on the top menu bar and running <em>Build Solution</em> again.
You now have both Debug and Release versions of the CPython binary within <code>PCBuild\win32\</code>.</p>
<p>You can set up Visual Studio to be able to open a REPL with either the Release or Debug build by choosing <em><code>Tools</code>-><code>Python</code>-><code>Python Environments</code></em> from the top menu:</p>
<p><a href="https://files.realpython.com/media/Environments.96a819ecf0b3.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Environments.96a819ecf0b3.png" width="3360" height="2033" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Environments.96a819ecf0b3.png&w=840&sig=f8dd9b3b31d44c25cbe06d56078b0462cb0fa753 840w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Environments.96a819ecf0b3.png&w=1680&sig=e8ca69be87363b62ccda44bcf53bedd4e4e320c2 1680w, https://files.realpython.com/media/Environments.96a819ecf0b3.png 3360w" sizes="75vw" alt="Choosing Python environments"/></a></p>
<p>Then click <em>Add Environment</em> and then target the Debug or Release binary. The Debug binary will end in <code>_d.exe</code>, for example, <code>python_d.exe</code> and <code>pythonw_d.exe</code>. You will most likely want to use the debug binary as it comes with Debugging support in Visual Studio and will be useful for this tutorial.</p>
<p>In the Add Environment window, target the <code>python_d.exe</code> file inside the <code>PCBuild/win32</code> directory as the interpreter and the <code>pythonw_d.exe</code> file as the windowed interpreter:</p>
<p><a href="https://files.realpython.com/media/environment3.d33858c1f6aa.PNG" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/environment3.d33858c1f6aa.PNG" width="2048" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environment3.d33858c1f6aa.PNG&w=512&sig=ffc6b359ac60d689f40f98233466cef35354d239 512w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environment3.d33858c1f6aa.PNG&w=1024&sig=d881ce7ecabc62624fb2a5154102a5be2ff6a4db 1024w, https://files.realpython.com/media/environment3.d33858c1f6aa.PNG 2048w" sizes="75vw" alt="Adding an environment in VS2019"/></a></p>
<p>Now, you can start a REPL session by clicking <em>Open Interactive Window</em> in the Python Environments window and you will see the REPL for the compiled version of Python:</p>
<p><a href="https://files.realpython.com/media/environment4.7c9eade3b74e.PNG" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/environment4.7c9eade3b74e.PNG" width="3360" height="2033" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environment4.7c9eade3b74e.PNG&w=840&sig=9e384e72bcfdebb39fe6dc23f21c239a0895ad51 840w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environment4.7c9eade3b74e.PNG&w=1680&sig=be56aac5b9baffc9b8b0de4701527954ed32ef53 1680w, https://files.realpython.com/media/environment4.7c9eade3b74e.PNG 3360w" sizes="75vw" alt="Python Environment REPL"/></a></p>
<p>During this tutorial there will be REPL sessions with example commands. I encourage you to use the Debug binary to run these REPL sessions in case you want to put in any breakpoints within the code.</p>
<p>Lastly, to make it easier to navigate the code, in the Solution View, click on the toggle button next to the Home icon to switch to Folder view:</p>
<p><a href="https://files.realpython.com/media/environments5.6462694398e3.6fb872a5f57d.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/environments5.6462694398e3.6fb872a5f57d.png" width="1231" height="692" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environments5.6462694398e3.6fb872a5f57d.png&w=307&sig=14bf2b50bd86dfabedc3ee1ec75a60ebccd574c4 307w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environments5.6462694398e3.6fb872a5f57d.png&w=615&sig=0d5644853d8c429ccc78fc1bced16d8f571f555d 615w, https://files.realpython.com/media/environments5.6462694398e3.6fb872a5f57d.png 1231w" sizes="75vw" alt="Switching Environment Mode"/></a></p>
<p>Now that you have a version of CPython compiled and ready to go, let’s find out how the CPython compiler works.</p>
<h3 id="what-does-a-compiler-do">What Does a Compiler Do?</h3>
<p>The purpose of a compiler is to convert one language into another. Think of a compiler like a translator. You would hire a translator to listen to you speaking in English and then speak in Japanese:</p>
<p><a href="https://files.realpython.com/media/t.38be306a7e83.png" target="_blank"><img class="img-fluid mx-auto d-block w-75" src="https://files.realpython.com/media/t.38be306a7e83.png" width="960" height="540" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/t.38be306a7e83.png&w=240&sig=2ad6eec49af1eaba79b83c099925e01a970d5efd 240w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/t.38be306a7e83.png&w=480&sig=033587107c8dceca7bcc292527a4c815fffb0b8d 480w, https://files.realpython.com/media/t.38be306a7e83.png 960w" sizes="75vw" alt="Translating from English to Japanese"/></a></p>
<p>Some compilers will compile into a low-level machine code which can be executed directly on a system. Other compilers will compile into an intermediary language, to be executed by a virtual machine.</p>
<p>One important decision to make when choosing a compiler is the system portability requirements. <a href="https://en.wikipedia.org/wiki/Java_bytecode">Java</a> and <a href="https://en.wikipedia.org/wiki/Common_Language_Runtime">.NET CLR</a> will compile into an Intermediary Language so that the compiled code is portable across multiple systems architectures. C, Go, C++, and Pascal will compile into a low-level executable that will only work on systems similar to the one it was compiled on.</p>
<p>Because Python applications are typically distributed as source code, the role of the Python runtime is to convert the Python source code and execute it in one step. Internally, the CPython runtime does compile your code. A popular misconception is that Python is an interpreted language. It is actually compiled.</p>
<p>Python code is not compiled into machine code. It is compiled into a special low-level intermediary language called <strong>bytecode</strong> that only CPython understands. For imported modules, this bytecode is stored in <code>.pyc</code> files inside a <code>__pycache__</code> directory and cached for execution. If you run the same Python application twice without changing the source code, it will usually start faster the second time, because CPython loads the cached bytecode and executes it directly instead of compiling the source again.</p>
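<p>To make the idea of bytecode more concrete, here is a minimal sketch that uses only the standard library. It compiles a one-line program, prints the disassembled instructions, and asks <code>importlib</code> where the cached <code>.pyc</code> file for a module would live. The file name <code>example.py</code> is a hypothetical placeholder, and the exact instructions printed depend on your Python version:</p>

```python
import dis
import importlib.util

# Compile a one-line program into a code object without running it.
source = "total = 1 + 2"
code = compile(source, "<example>", "exec")

# co_code holds the raw bytecode as a bytes object.
print(type(code.co_code), len(code.co_code))

# dis renders the same bytecode as human-readable instructions.
dis.dis(code)

# Path where CPython would cache the bytecode of a module named example.py
# (example.py is a hypothetical file name used only for illustration).
cache_path = importlib.util.cache_from_source("example.py")
print(cache_path)
```

<p>The instruction names and the cache file name vary between releases, which is one reason <code>.pyc</code> files are tagged with the interpreter version.</p>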
<h3 id="why-is-cpython-written-in-c-and-not-python">Why Is CPython Written in C and Not Python?</h3>
<p>The <strong>C</strong> in CPython is a reference to the C programming language, implying that this Python distribution is written in the C language.</p>
<p>This statement is largely true: the compiler in CPython is written in pure C. However, many of the standard library modules are written in pure Python or a combination of C and Python.</p>
<p><strong>So why is CPython written in C and not Python?</strong></p>
<p>The answer is located in how compilers work. There are two types of compiler:</p>
<ol>
<li><strong><a href="https://en.wikipedia.org/wiki/Self-hosting">Self-hosted compilers</a></strong> are compilers written in the language they compile, such as the Go compiler.</li>
<li><strong><a href="https://en.wikipedia.org/wiki/Source-to-source_compiler">Source-to-source compilers</a></strong> are compilers written in another language that already has a compiler.</li>
</ol>
<p>If you’re writing a new programming language from scratch, you need an executable application to compile your compiler! You need a compiler to execute anything, so when new languages are developed, they’re often written first in an older, more established language.</p>
<p>A good example would be the Go programming language. The first Go compiler was written in C, then once Go could be compiled, the compiler was rewritten in Go. </p>
<p>CPython kept its C heritage: many of the standard library modules, like the <code>ssl</code> module or the <code>socket</code> module, are written in C to access low-level operating system APIs.
The APIs in the Windows and Linux kernels for <a href="https://realpython.com/python-sockets/">creating network sockets</a>, <a href="https://realpython.com/working-with-files-in-python/">working with the filesystem</a> or <a href="https://realpython.com/python-gui-with-wxpython/">interacting with the display</a> are all written in C. It made sense for Python’s extensibility layer to be focused on the C language. Later in this article, we will cover the Python Standard Library and the C modules.</p>
<p>There is a Python compiler written in Python called <a href="https://pypy.org/">PyPy</a>. PyPy’s logo is an <a href="https://en.wikipedia.org/wiki/Ouroboros">Ouroboros</a> to represent the self-hosting nature of the compiler.</p>
<p>Another example of a cross-compiler for Python is <a href="https://www.jython.org/">Jython</a>. Jython is written in Java and compiles from Python source code into Java bytecode. In the same way that CPython makes it easy to import C libraries and use them from Python, Jython makes it easy to import and reference Java modules and classes.</p>
<h3 id="the-python-language-specification">The Python Language Specification</h3>
<p>Contained within the CPython source code is the definition of the Python language. This is the reference specification used by all the Python interpreters.</p>
<p>The specification is in both human-readable and machine-readable format. Inside the documentation is a detailed explanation of the Python language, what is allowed, and how each statement should behave.</p>
<h4 id="documentation">Documentation</h4>
<p>Located inside the <code>Doc/reference</code> directory are <a href="http://docutils.sourceforge.net/rst.html">reStructuredText</a> explanations of each of the features in the Python language. This forms the official Python reference guide on <a href="https://docs.python.org/3/reference/">docs.python.org</a>.</p>
<p>Inside the directory are the files you need to understand the whole language, structure, and keywords:</p>
<div class="highlight"><pre><span></span>cpython/Doc/reference
|
├── compound_stmts.rst
├── datamodel.rst
├── executionmodel.rst
├── expressions.rst
├── grammar.rst
├── import.rst
├── index.rst
├── introduction.rst
├── lexical_analysis.rst
├── simple_stmts.rst
└── toplevel_components.rst
</pre></div>
<p>Inside <code>compound_stmts.rst</code>, the documentation for compound statements, you can see a simple example defining the <code>with</code> statement.</p>
<p>The <code>with</code> statement can be used in multiple ways in Python, the simplest being the <a href="https://dbader.org/blog/python-context-managers-and-with-statement">instantiation of a context-manager</a> and a nested block of code:</p>
<div class="highlight python"><pre><span></span><span class="k">with</span> <span class="n">x</span><span class="p">():</span>
    <span class="o">...</span>
</pre></div>
<p>You can assign the result to a variable using the <code>as</code> keyword:</p>
<div class="highlight python"><pre><span></span><span class="k">with</span> <span class="n">x</span><span class="p">()</span> <span class="k">as</span> <span class="n">y</span><span class="p">:</span>
    <span class="o">...</span>
</pre></div>
<p>You can also chain context managers together with a comma:</p>
<div class="highlight python"><pre><span></span><span class="k">with</span> <span class="n">x</span><span class="p">()</span> <span class="k">as</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">()</span> <span class="k">as</span> <span class="n">jk</span><span class="p">:</span>
    <span class="o">...</span>
</pre></div>
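<p>The three forms can be tried out with a small, self-contained sketch. The <code>managed()</code> context manager and the <code>events</code> list are hypothetical helpers introduced only for this illustration; they record the order in which each context is entered and exited:</p>

```python
from contextlib import contextmanager

events = []  # records what happens, in order


@contextmanager
def managed(name):
    """Hypothetical context manager that logs entry and exit."""
    events.append(f"enter {name}")
    try:
        yield name.upper()  # the value bound by "as"
    finally:
        events.append(f"exit {name}")


# Form 1: context manager with a nested block
with managed("x"):
    events.append("body 1")

# Form 2: bind the result with "as"
with managed("x") as y:
    events.append(f"body 2 got {y}")

# Form 3: chain several context managers with a comma
with managed("x") as y, managed("z") as jk:
    events.append(f"body 3 got {y} and {jk}")

print("\n".join(events))
```

<p>In the chained form, the context managers are entered left to right and exited in reverse order, as if the <code>with</code> statements were nested.</p>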
<p>Next, we’ll explore the computer-readable documentation of the Python language.</p>
<h4 id="grammar">Grammar</h4>
<p>The documentation contains the human-readable specification of the language, and the machine-readable specification is housed in a single file, <a href="https://github.com/python/cpython/blob/master/Grammar/Grammar"><code>Grammar/Grammar</code></a>. </p>
<p>The Grammar file is written in a notation for context-free grammars called <a href="https://en.m.wikipedia.org/wiki/Backus%E2%80%93Naur_form">Backus-Naur Form (BNF)</a>. BNF is not specific to Python and is often used as the notation for grammars in many other languages.</p>
<p>The concept of grammatical structure in a programming language is inspired by <a href="https://en.wikipedia.org/wiki/Syntactic_Structures">Noam Chomsky’s work on Syntactic Structures</a> in the 1950s!</p>
<p>Python’s grammar file uses the Extended-BNF (EBNF) specification with regular-expression syntax. So, in the grammar file you can use:</p>
<ul>
<li><strong><code>*</code></strong> for repetition</li>
<li><strong><code>+</code></strong> for at-least-once repetition</li>
<li><strong><code>[]</code></strong> for optional parts</li>
<li><strong><code>|</code></strong> for alternatives</li>
<li><strong><code>()</code></strong> for grouping</li>
</ul>
<p>If you search for the <code>with</code> statement in the grammar file, at around line 80 you’ll see the definitions for the <code>with</code> statement:</p>
<div class="highlight text"><pre><span></span>with_stmt: 'with' with_item (',' with_item)* ':' suite
with_item: test ['as' expr]
</pre></div>
<p>Anything in quotes is a string literal, which is how keywords are defined. So the <code>with_stmt</code> is specified as:</p>
<ol>
<li>Starting with the word <code>with</code></li>
<li>Followed by a <code>with_item</code>, which is a <code>test</code> and (optionally), the word <code>as</code>, and an expression</li>
<li>Followed by zero or more additional <code>with_item</code> entries, each preceded by a comma</li>
<li>Ending with a <code>:</code></li>
<li>Followed by a <code>suite</code></li>
</ol>
<p>There are references to some other definitions in these two lines:</p>
<ul>
<li><strong><code>suite</code></strong> refers to a block of code with one or multiple statements</li>
<li><strong><code>test</code></strong> refers to an expression that is evaluated</li>
<li><strong><code>expr</code></strong> refers to a simple expression</li>
</ul>
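<p>Because the grammar file borrows its repetition, optional, and grouping operators from regular expressions, the <code>with_stmt</code> rule can be mimicked with a toy regular expression. This is only a sketch to show how the operators read: it treats <code>test</code>, <code>expr</code>, and <code>suite</code> as literal placeholder words in a space-separated string, which is an assumption made for illustration and not how CPython parses code:</p>

```python
import re

# with_item: test ['as' expr]
# The optional part [...] becomes an optional group (?: ... )?
WITH_ITEM = r"test(?: as expr)?"

# with_stmt: 'with' with_item (',' with_item)* ':' suite
# The repeated group (...)* becomes (?: ... )*
WITH_STMT = re.compile(rf"with {WITH_ITEM}(?: , {WITH_ITEM})* : suite")


def matches_with_stmt(tokens):
    """Return True if the simplified token string fits the with_stmt rule."""
    return WITH_STMT.fullmatch(tokens) is not None


print(matches_with_stmt("with test : suite"))                  # one item
print(matches_with_stmt("with test as expr , test : suite"))   # two items
print(matches_with_stmt("with : suite"))                       # no item
```

<p>The first two strings fit the rule and the third does not, because at least one <code>with_item</code> is required.</p>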
<p>If you want to explore those in detail, the whole of the Python grammar is defined in this single file.</p>
<p>If you want to see a recent example of how grammar is used, in PEP 572 the <strong>colon equals</strong> operator was added to the grammar file in <a href="https://github.com/python/cpython/commit/8f59ee01be3d83d5513a9a3f654a237d77d80d9a#diff-cb0b9d6312c0d67f6d4aa1966766cedd">this Git commit</a>.</p>
<h4 id="using-pgen">Using <code>pgen</code></h4>
<p>The grammar file itself is never used by the Python compiler. Instead, a parser table created by a tool called <code>pgen</code> is used. <code>pgen</code> reads the grammar file and converts it into a parser table. If you make changes to the grammar file, you must regenerate the parser table and recompile Python.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> The <code>pgen</code> application was rewritten in Python 3.8 from C to <a href="https://github.com/python/cpython/blob/master/Parser/pgen/pgen.py">pure Python</a>.</p>
</div>
<p>To see <code>pgen</code> in action, let’s change part of the Python grammar. Around line 51 you will see the definition of a <code>pass</code> statement:</p>
<div class="highlight text"><pre><span></span>pass_stmt: 'pass'
</pre></div>
<p>Change that line to accept either <code>'pass'</code> or <code>'proceed'</code> as the keyword:</p>
<div class="highlight text"><pre><span></span>pass_stmt: 'pass' | 'proceed'
</pre></div>
<p>Now you need to rebuild the grammar files.
On macOS and Linux, run <code>make regen-grammar</code> to run <code>pgen</code> over the altered grammar file. For Windows, there is no officially supported way of running <code>pgen</code>. However, you can clone <a href="https://github.com/tonybaloney/cpython/tree/pcbuildregen">my fork</a> and run <code>build.bat --regen</code> from within the <code>PCBuild</code> directory.</p>
<p>You should see an output similar to this, showing that the new <code>Include/graminit.h</code> and <code>Python/graminit.c</code> files have been generated:</p>
<div class="highlight text"><pre><span></span># Regenerate Doc/library/token-list.inc from Grammar/Tokens
# using Tools/scripts/generate_token.py
...
python3 ./Tools/scripts/update_file.py ./Include/graminit.h ./Include/graminit.h.new
python3 ./Tools/scripts/update_file.py ./Python/graminit.c ./Python/graminit.c.new
</pre></div>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> <code>pgen</code> works by converting the EBNF statements into a <a href="https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton">Non-deterministic Finite Automaton (NFA)</a>, which is then turned into a <a href="https://en.wikipedia.org/wiki/Deterministic_finite_automaton">Deterministic Finite Automaton (DFA)</a>.
The DFAs are used by the parser as parsing tables in a special way that’s unique to CPython. This technique was <a href="http://infolab.stanford.edu/~ullman/dragon/slides1.pdf">formed at Stanford University</a> and developed in the 1980s, just before the advent of Python.</p>
</div>
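<p>The NFA-to-DFA step can be illustrated with the textbook subset construction. The sketch below is not <code>pgen</code>’s code: the state numbers, the <code>EPSILON</code> marker, and the hand-built NFA for <code>pass_stmt: 'pass' | 'proceed'</code> are all assumptions chosen to keep the example small:</p>

```python
from collections import deque

EPSILON = None  # label used here for transitions that consume no input


def epsilon_closure(states, nfa):
    """Return every state reachable from `states` through epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in nfa.get((state, EPSILON), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = epsilon_closure({start}, nfa)
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue  # already expanded
        dfa[current] = {}
        for symbol in alphabet:
            targets = set()
            for state in current:
                targets.update(nfa.get((state, symbol), ()))
            if targets:
                next_set = epsilon_closure(targets, nfa)
                dfa[current][symbol] = next_set
                queue.append(next_set)
    return start_set, dfa


# Hand-built NFA for the rule  pass_stmt: 'pass' | 'proceed'
# State 0 is the start state and state 5 is the accepting state.
nfa = {
    (0, EPSILON): {1, 3},
    (1, "pass"): {2},
    (3, "proceed"): {4},
    (2, EPSILON): {5},
    (4, EPSILON): {5},
}

start_set, dfa = nfa_to_dfa(nfa, start=0, alphabet=["pass", "proceed"])
for state, moves in dfa.items():
    print(sorted(state), {sym: sorted(dest) for sym, dest in moves.items()})
```

<p>The six NFA states collapse into three DFA states: a start state and one state per keyword, each of which contains the accepting NFA state.</p>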
<p>With the regenerated parser tables, you need to recompile CPython to see the new syntax. Use the same compilation steps you used earlier for your operating system.</p>
<p>If the code compiled successfully, you can execute your new CPython binary and start a REPL.</p>
<p>In the REPL, you can now try defining a function and instead of using the <code>pass</code> statement, use the <code>proceed</code> keyword alternative that you compiled into the Python grammar:</p>
<div class="highlight text"><pre><span></span>Python 3.8.0b4 (tags/v3.8.0b4:d93605de72, Aug 30 2019, 10:00:03)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def example():
...     proceed
...
>>> example()
</pre></div>
<p>Well done! You’ve changed the CPython syntax and compiled your own version of CPython. Ship it!</p>
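<p>For comparison, here is a small sketch of what the same function does on an interpreter that was built <em>without</em> the grammar change. The helper name <code>run_proceed_example()</code> is hypothetical. On an unmodified build, <code>proceed</code> is parsed as an ordinary name, so the function definition is accepted but calling it fails at run time:</p>

```python
def run_proceed_example():
    """Define and call a function whose body is just the word 'proceed'."""
    namespace = {}
    # Defining the function succeeds on any build: without the grammar
    # change, "proceed" is parsed as an ordinary name expression.
    exec("def example():\n    proceed\n", namespace)
    try:
        namespace["example"]()
    except NameError:
        return "NameError: stock grammar, 'proceed' is looked up as a variable"
    return "no error: this build accepts 'proceed' as a statement"


result = run_proceed_example()
print(result)
```

<p>Running the same snippet with your freshly compiled binary should take the second branch instead, since <code>proceed</code> now behaves like <code>pass</code>.</p>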
<p>Next, we’ll explore tokens and their relationship to grammar.</p>
<h4 id="tokens">Tokens</h4>
<p>Alongside the grammar file in the <code>Grammar</code> folder is a <a href="https://github.com/python/cpython/blob/master/Grammar/Tokens"><code>Tokens</code></a> file, which contains each of the unique types found as a leaf node in a parse tree. We will cover parse trees in depth later.
Each token also has a name and a generated unique ID. The names make it simpler to refer to tokens in the tokenizer.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> The <code>Tokens</code> file is a new feature in Python 3.8.</p>
</div>
<p>For example, the left parenthesis is called <code>LPAR</code>, and semicolons are called <code>SEMI</code>. You’ll see these tokens later in the article:</p>
<div class="highlight text"><pre><span></span>LPAR '('
RPAR ')'
LSQB '['
RSQB ']'
COLON ':'
COMMA ','
SEMI ';'
</pre></div>
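<p>The same names and IDs are exposed to Python code through the standard library <code>token</code> module. As a quick sketch, you can look up the numeric ID for each name and map it back again with <code>token.tok_name</code>. The numeric values are generated and can differ between Python versions, so treat them as opaque:</p>

```python
import token

# Look up the generated ID for each token name, then map it back to the name.
for name in ("LPAR", "RPAR", "LSQB", "RSQB", "COLON", "COMMA", "SEMI"):
    token_id = getattr(token, name)
    print(name, token_id, token.tok_name[token_id])

lpar_name = token.tok_name[token.LPAR]
semi_name = token.tok_name[token.SEMI]
```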
<p>As with the <code>Grammar</code> file, if you change the <code>Tokens</code> file, you need to run <code>pgen</code> again. </p>
<p>To see tokens in action, you can use the <code>tokenize</code> module in CPython. Create a simple Python script called <code>test_tokens.py</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># Hello world!</span>
<span class="k">def</span> <span class="nf">my_function</span><span class="p">():</span>
    <span class="n">proceed</span>
</pre></div>
<div class="alert alert-primary" role="alert">
<p>For the rest of this tutorial, <code>./python.exe</code> will refer to the compiled version of CPython. However, the actual command will depend on your system.</p>
<p>For Windows:</p>
<div class="highlight sh"><pre><span></span><span class="go"> > python.exe</span>
</pre></div>
<p>For Linux:</p>
<div class="highlight sh"><pre><span></span><span class="go"> > ./python</span>
</pre></div>
<p>For macOS:</p>
<div class="highlight sh"><pre><span></span><span class="go"> > ./python.exe</span>
</pre></div>
</div>
<p>Then pass this file through a module built into the standard library called <code>tokenize</code>. You will see the list of tokens, by line and character. Use the <code>-e</code> flag to output the exact token name:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe -m tokenize -e test_tokens.py
<span class="go">0,0-0,0: ENCODING 'utf-8' </span>
<span class="go">1,0-1,14: COMMENT '# Hello world!'</span>
<span class="go">1,14-1,15: NL '\n' </span>
<span class="go">2,0-2,3: NAME 'def' </span>
<span class="go">2,4-2,15: NAME 'my_function' </span>
<span class="go">2,15-2,16: LPAR '(' </span>
<span class="go">2,16-2,17: RPAR ')' </span>
<span class="go">2,17-2,18: COLON ':' </span>
<span class="go">2,18-2,19: NEWLINE '\n' </span>
<span class="go">3,0-3,3: INDENT ' ' </span>
<span class="go">3,3-3,7: NAME 'proceed' </span>
<span class="go">3,7-3,8: NEWLINE '\n' </span>
<span class="go">4,0-4,0: DEDENT '' </span>
<span class="go">4,0-4,0: ENDMARKER '' </span>
</pre></div>
<p>In the output, the first column is the range of the line/column coordinates, the second column is the name of the token, and the final column is the value of the token.</p>
<p>In the output, the <code>tokenize</code> module has implied some tokens that were not in the file: the <code>ENCODING</code> token for <code>utf-8</code>, and, on an implied blank line at the end, a <code>DEDENT</code> to close the function declaration and an <code>ENDMARKER</code> to end the file.</p>
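<p>You can produce a similar listing from Python code instead of the command line. The following sketch feeds a string to <code>tokenize.generate_tokens()</code> and prints the exact name of each token. It uses <code>pass</code> in the function body so that it behaves the same on an unmodified interpreter, and because the input is a string rather than a file of bytes, no <code>ENCODING</code> token is emitted:</p>

```python
import io
import token
import tokenize

source = "def my_function():\n    pass\n"

token_names = []
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    # exact_type distinguishes operators such as LPAR and COLON,
    # whereas tok.type would report all of them as OP.
    name = token.tok_name[tok.exact_type]
    token_names.append(name)
    print(f"{tok.start}-{tok.end}: {name:10} {tok.string!r}")
```

<p>The <code>INDENT</code>, <code>DEDENT</code>, and <code>ENDMARKER</code> entries in the result have no corresponding characters of their own in the source string; the tokenizer derives them from the indentation and the end of input.</p>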
<p>It is best practice to have a blank line at the end of your Python source files. If you omit it, CPython adds it for you, with a tiny performance penalty.</p>
<p>The <code>tokenize</code> module is written in pure Python and is located in <a href="https://github.com/python/cpython/blob/master/Lib/tokenize.py"><code>Lib/tokenize.py</code></a> within the CPython source code.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Important:</strong> There are two tokenizers in the CPython source code: one written in Python, demonstrated here, and another written in C.
The tokenizer written in Python is meant as a utility, and the one written in C is used by the Python compiler. They have identical output and behavior. The version written in C is designed for performance and the module in Python is designed for debugging.</p>
</div>
<p>To see a verbose readout of the C tokenizer, you can run Python with the <code>-d</code> flag. Using the <code>test_tokens.py</code> script you created earlier, run it with the following:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe -d test_tokens.py
<span class="go">Token NAME/'def' ... It's a keyword</span>
<span class="go"> DFA 'file_input', state 0: Push 'stmt'</span>
<span class="go"> DFA 'stmt', state 0: Push 'compound_stmt'</span>
<span class="go"> DFA 'compound_stmt', state 0: Push 'funcdef'</span>
<span class="go"> DFA 'funcdef', state 0: Shift.</span>
<span class="go">Token NAME/'my_function' ... It's a token we know</span>
<span class="go"> DFA 'funcdef', state 1: Shift.</span>
<span class="go">Token LPAR/'(' ... It's a token we know</span>
<span class="go"> DFA 'funcdef', state 2: Push 'parameters'</span>
<span class="go"> DFA 'parameters', state 0: Shift.</span>
<span class="go">Token RPAR/')' ... It's a token we know</span>
<span class="go"> DFA 'parameters', state 1: Shift.</span>
<span class="go"> DFA 'parameters', state 2: Direct pop.</span>
<span class="go">Token COLON/':' ... It's a token we know</span>
<span class="go"> DFA 'funcdef', state 3: Shift.</span>
<span class="go">Token NEWLINE/'' ... It's a token we know</span>
<span class="go"> DFA 'funcdef', state 5: [switch func_body_suite to suite] Push 'suite'</span>
<span class="go"> DFA 'suite', state 0: Shift.</span>
<span class="go">Token INDENT/'' ... It's a token we know</span>
<span class="go"> DFA 'suite', state 1: Shift.</span>
<span class="hll"><span class="go">Token NAME/'proceed' ... It's a keyword</span>
</span><span class="go"> DFA 'suite', state 3: Push 'stmt'</span>
<span class="go">...</span>
<span class="go"> ACCEPT.</span>
</pre></div>
<p>In the output, you can see that it highlighted <code>proceed</code> as a keyword. In the next chapter, we’ll see how executing the Python binary gets to the tokenizer and what happens from there to execute your code.</p>
<p>Now that you have an overview of the Python grammar and the relationship between tokens and statements, you can go one step further and convert the <code>pgen</code> output into an interactive graph.</p>
<p>Here is a screenshot of the Python 3.8a2 grammar:</p>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-12_at_2.31.16_pm.f36c3e99b8b4.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-03-12_at_2.31.16_pm.f36c3e99b8b4.png" width="3258" height="2248" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-12_at_2.31.16_pm.f36c3e99b8b4.png&w=814&sig=21666b4228a46a6bcc7aeca6d5263e62a3aeb6d5 814w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-12_at_2.31.16_pm.f36c3e99b8b4.png&w=1629&sig=01404157eddc09548f8cf7d4e995da9806c36fac 1629w, https://files.realpython.com/media/Screen_Shot_2019-03-12_at_2.31.16_pm.f36c3e99b8b4.png 3258w" sizes="75vw" alt="Python 3.8 DFA node graph"/></a></p>
<p>The Python package used to generate this graph, <code>instaviz</code>, will be covered in a later chapter.</p>
<h3 id="memory-management-in-cpython">Memory Management in CPython</h3>
<p>Throughout this article, you will see references to a <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L128"><code>PyArena</code></a> object. The arena is one of CPython’s memory management structures. The code is within <code>Python/pyarena.c</code> and contains a wrapper around C’s memory allocation and deallocation functions.</p>
<p>In a traditionally written C program, the developer <em>should</em> allocate memory for data structures before writing into that data. This allocation marks the memory as belonging to the process with the operating system.</p>
<p>It is also up to the developer to deallocate, or “free,” the allocated memory when it’s no longer being used and return it to the operating system’s block table of free memory.
If a process allocates memory for a variable, say within a function or loop, when that function has completed, the memory is not automatically given back to the operating system in C. So if it hasn’t been explicitly deallocated in the C code, it causes a memory leak. The process will continue to take more memory each time that function runs until eventually, the system runs out of memory, and crashes!</p>
<p>Python takes that responsibility away from the programmer and uses two algorithms: <a href="https://realpython.com/python-memory-management/">a reference counter and a garbage collector</a>.</p>
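<p>The garbage collector exists because reference counting alone cannot reclaim objects that refer to each other. The sketch below builds such a cycle and uses a weak reference to observe when the objects are actually freed. The <code>Node</code> class and <code>make_cycle()</code> helper are hypothetical names for this example, and the behavior shown is specific to CPython:</p>

```python
import gc
import weakref


class Node:
    """Plain object that can hold a reference to another Node."""


def make_cycle():
    """Create two objects that reference each other, then drop both names."""
    first, second = Node(), Node()
    first.other = second
    second.other = first
    # A weak reference lets us observe the object without keeping it alive.
    return weakref.ref(first)


gc.disable()  # keep the automatic collector from running mid-example
probe = make_cycle()
alive_before = probe() is not None  # reference counts never reached zero
gc.collect()                        # the cycle detector finds the pair
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)
```

<p>After <code>make_cycle()</code> returns, nothing outside the pair refers to either object, yet each still has a reference count of one because of the other. Only the explicit collection pass frees them.</p>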
<p>Whenever an interpreter is instantiated, a <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L128"><code>PyArena</code></a> is created and attached to one of the fields in the interpreter. During the lifecycle of a CPython interpreter, many arenas could be allocated. They are connected with a linked list. The arena stores a list of pointers to Python Objects as a <code>PyListObject</code>. Whenever a new Python object is created, a pointer to it is added using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L203"><code>PyArena_AddPyObject()</code></a>. This function call stores a pointer in the arena’s list, <code>a_objects</code>.</p>
<div class="alert alert-primary" role="alert">
<p>Even though Python doesn’t have pointers, there are some <a href="https://realpython.com/pointers-in-python/">interesting techniques</a> to simulate the behavior of pointers.</p>
</div>
<p>The <code>PyArena</code> serves a second function, which is to allocate and reference a list of raw memory blocks. For example, a <code>PyList</code> would need extra memory if you added thousands of additional values. The <code>PyList</code> object’s C code does not allocate memory directly. The object gets raw blocks of memory from the <code>PyArena</code> by calling <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L180"><code>PyArena_Malloc()</code></a> from the <code>PyObject</code> with the required memory size. This task is completed by another abstraction in <code>Objects/obmalloc.c</code>. In the object allocation module, memory can be allocated, freed, and reallocated for a Python Object.</p>
<p>A linked list of allocated blocks is stored inside the arena, so that when an interpreter is stopped, all managed memory blocks can be deallocated in one go using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L157"><code>PyArena_Free()</code></a>.</p>
<p>Take the <code>PyListObject</code> example. If you were to <code>.append()</code> an object to the end of a Python list, you don’t need to reallocate the memory used in the existing list beforehand. The <code>.append()</code> method calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/listobject.c#L36"><code>list_resize()</code></a>, which handles memory allocation for lists. Each list object keeps a record of the amount of memory allocated. If the item you’re appending will fit inside the existing free memory, it is simply added. If the list needs more memory space, it is expanded. Lists are expanded in length as 0, 4, 8, 16, 25, 35, 46, 58, 72, 88.</p>
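<p>You can observe this over-allocation from Python with <code>sys.getsizeof()</code>. The following sketch appends items one at a time and records the list lengths at which the reported size changes. The helper name <code>resize_points()</code> is hypothetical, and the exact lengths depend on the CPython version, so no particular output is assumed here:</p>

```python
import sys


def resize_points(count):
    """Return the list lengths at which appending triggered a resize."""
    items = []
    last_size = sys.getsizeof(items)
    points = []
    for value in range(count):
        items.append(value)
        size = sys.getsizeof(items)
        if size != last_size:
            # The reported size only changes when the list grows its buffer.
            points.append(len(items))
            last_size = size
    return points


points = resize_points(100)
print(points)
```

<p>The list of resize points is much shorter than the number of appends, which shows that most calls to <code>.append()</code> reuse memory that was already reserved.</p>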
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/obmalloc.c#L618"><code>PyMem_Realloc()</code></a> is called to expand the memory allocated in a list. <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/obmalloc.c#L618"><code>PyMem_Realloc()</code></a> is an API wrapper for <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/obmalloc.c#L1913"><code>pymalloc_realloc()</code></a>.</p>
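<p>You can observe this over-allocation from Python itself. The following is a minimal sketch that assumes CPython, where <code>sys.getsizeof()</code> on a list reflects the allocated capacity rather than the current length. The helper name <code>observe_list_growth()</code> is made up for this illustration, and the exact resize points depend on the CPython version you run:</p>

```python
import sys


def observe_list_growth(n_appends):
    """Return the list lengths at which the reported size of the list changed."""
    items = []
    last_size = sys.getsizeof(items)
    resize_points = []
    for value in range(n_appends):
        items.append(value)
        size = sys.getsizeof(items)
        if size != last_size:
            # The list had no spare capacity left, so it was resized.
            resize_points.append(len(items))
            last_size = size
    return resize_points


resize_points = observe_list_growth(64)
print(resize_points)
```

<p>Far fewer resizes than appends are reported, because most appends fit into spare capacity that an earlier resize already reserved.</p>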
<p>Python also has a special wrapper for the C call <code>malloc()</code>, which sets the max size of the memory allocation to help prevent buffer overflow errors (See <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/overlapped.c#L28"><code>PyMem_RawMalloc()</code></a>).</p>
<p>In summary: </p>
<ul>
<li>Allocation of raw memory blocks is done via <code>PyMem_RawMalloc()</code>.</li>
<li>The pointers to Python objects are stored within the <code>PyArena</code>.</li>
<li><code>PyArena</code> also stores a linked-list of allocated memory blocks.</li>
</ul>
<p>More information on the API is detailed in the <a href="https://docs.python.org/3/c-api/memory.html">CPython documentation</a>.</p>
<h4 id="reference-counting">Reference Counting</h4>
<p>To create a variable in Python, you have to assign a value to a <em>uniquely</em> named variable:</p>
<div class="highlight python"><pre><span></span><span class="n">my_variable</span> <span class="o">=</span> <span class="mi">180392</span>
</pre></div>
<p>Whenever a value is assigned to a variable in Python, the name of the variable is checked within the locals and globals scope to see if it already exists.</p>
<p>Because <code>my_variable</code> is not already within the <code>locals()</code> or <code>globals()</code> dictionary, this new object is created, and the value is assigned as being the numeric constant <code>180392</code>.</p>
<p>There is now one reference to <code>my_variable</code>, so the reference counter for <code>my_variable</code> is incremented by 1. </p>
<p>You will see function calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L239"><code>Py_INCREF()</code></a> and <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L245"><code>Py_DECREF()</code></a> throughout the C source code for CPython. These functions increment and decrement the count of references to that object.</p>
<p>References to an object are decremented when a variable falls outside of the scope in which it was declared. Scope in Python can refer to a function or method, a comprehension, or a lambda function. These are some of the more literal scopes, but there are many other implicit scopes, like passing variables to a function call.</p>
<p>The handling of incrementing and decrementing references based on the language is built into the CPython compiler and the core execution loop, <code>ceval.c</code>, which we will cover in detail later in this article.</p>
<p>Whenever <code>Py_DECREF()</code> is called, and the counter becomes 0, the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/obmalloc.c#L707"><code>PyObject_Free()</code></a> function is called. For that object <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L157"><code>PyArena_Free()</code></a> is called for all of the memory that was allocated. </p>
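<p>You can watch the reference counter change from Python with <code>sys.getrefcount()</code>. This is a small sketch, assuming CPython. The <code>Payload</code> class is a throwaway made up for the example, and the absolute counts are implementation details, so only the differences are compared:</p>

```python
import sys


class Payload:
    """A throwaway object whose reference count is easy to follow."""


payload = Payload()
baseline = sys.getrefcount(payload)

# Storing the object in a list twice adds two references to it.
holders = [payload, payload]
after_incref = sys.getrefcount(payload)

# Emptying the list releases both references again.
holders.clear()
after_decref = sys.getrefcount(payload)

print(after_incref - baseline, after_decref - baseline)
```

<p>The absolute value returned by <code>sys.getrefcount()</code> includes temporary references created by the call itself, which is why the sketch looks at the change relative to <code>baseline</code>.</p>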
<h4 id="garbage-collection">Garbage Collection</h4>
<p>How often does your garbage get collected? Weekly, or fortnightly? </p>
<p>When you’re finished with something, you discard it and throw it in the trash. But that trash won’t get collected straight away. You need to wait for the garbage trucks to come and pick it up.</p>
<p>CPython has the same principle, using a garbage collection algorithm. CPython’s garbage collector is enabled by default, happens in the background and works to deallocate memory that’s been used for objects which are no longer in use.</p>
<p>Because the garbage collection algorithm is a lot more complex than the reference counter, it doesn’t run all the time. If it did, it would consume a huge amount of CPU resources. Instead, it runs periodically, after a set number of operations.</p>
<p>CPython’s standard library comes with a Python module to interface with the arena and the garbage collector, the <code>gc</code> module. Here’s how to use the <code>gc</code> module in debug mode:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">gc</span>
<span class="gp">>>> </span><span class="n">gc</span><span class="o">.</span><span class="n">set_debug</span><span class="p">(</span><span class="n">gc</span><span class="o">.</span><span class="n">DEBUG_STATS</span><span class="p">)</span>
</pre></div>
<p>This will print the statistics whenever the garbage collector is run.</p>
<p>You can get the threshold after which the garbage collector is run by calling <code>get_threshold()</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">gc</span><span class="o">.</span><span class="n">get_threshold</span><span class="p">()</span>
<span class="go">(700, 10, 10)</span>
</pre></div>
<p>You can also get the current collection counts, which are compared against those thresholds:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">gc</span><span class="o">.</span><span class="n">get_count</span><span class="p">()</span>
<span class="go">(688, 1, 1)</span>
</pre></div>
<p>Lastly, you can run the collection algorithm manually:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">gc</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
<span class="go">24</span>
</pre></div>
<p>This will call <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/gcmodule.c#L987"><code>collect()</code></a> inside the <code>Modules/gcmodule.c</code> file which contains the implementation of the garbage collector algorithm.</p>
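<p>Reference counting alone cannot free objects that refer to each other, which is where the collector comes in. The sketch below builds such a reference cycle and then runs a manual collection. The <code>Node</code> class and <code>make_cycle()</code> helper are invented for this example, and the exact number reported by <code>gc.collect()</code> can vary between CPython versions:</p>

```python
import gc


class Node:
    def __init__(self):
        self.partner = None


def make_cycle():
    """Create two objects that reference each other, then drop both names."""
    first, second = Node(), Node()
    first.partner = second
    second.partner = first


gc.collect()   # start from a clean slate
gc.disable()   # keep automatic runs from interfering with the count
try:
    make_cycle()
    # The two Node objects are unreachable now, but their reference
    # counts never dropped to 0 because they still point at each other.
    unreachable = gc.collect()
finally:
    gc.enable()

print(unreachable)
```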
<h3 id="conclusion">Conclusion</h3>
<p>In Part 1, you covered the structure of the source code repository, how to compile from source, and the Python language specification. These core concepts will be critical in Part 2 as you dive deeper into the Python interpreter process.</p>
<h2 id="part-2-the-python-interpreter-process">Part 2: The Python Interpreter Process</h2>
<p>Now that you’ve seen the Python grammar and memory management, you can follow the process from typing <code>python</code> to the part where your code is executed.</p>
<p>There are five ways the <code>python</code> binary can be called:</p>
<ol>
<li>To run a single command with <code>-c</code> and a Python command</li>
<li>To start a module with <code>-m</code> and the name of a module</li>
<li>To run a file with the filename</li>
<li>To run the <code>stdin</code> input using a shell pipe</li>
<li>To start the REPL and execute commands one at a time</li>
</ol>
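<p>The first four of these modes can be tried out from a script. This is a minimal sketch that launches the current interpreter through <code>subprocess</code>. The <code>run()</code> helper and the <code>hello_script</code> module name are made up for the example, and the REPL is left out because it needs an interactive terminal:</p>

```python
import os
import subprocess
import sys
import tempfile

SOURCE = "print('hello')"


def run(args, stdin_text=None, cwd=None):
    """Start the current interpreter with the given arguments and return its output."""
    result = subprocess.run(
        [sys.executable, *args],
        input=stdin_text,
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "hello_script.py")
    with open(script, "w") as handle:
        handle.write(SOURCE + "\n")

    outputs = {
        "-c": run(["-c", SOURCE]),
        # -m looks the module up on sys.path, which includes the working directory
        "-m": run(["-m", "hello_script"], cwd=tmp),
        "file": run([script]),
        "stdin": run([], stdin_text=SOURCE),
    }

print(outputs)
```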
<div class="alert alert-primary" role="alert">
<p>Python has so many ways to execute scripts, it can be a little overwhelming. Darren Jones has put together a <a href="https://realpython.com/courses/running-python-scripts/">great course on running Python scripts</a> if you want to learn more.</p>
</div>
<p>The three source files you need to inspect to see this process are:</p>
<ol>
<li><strong><code>Programs/python.c</code></strong> is a simple entry point.</li>
<li><strong><code>Modules/main.c</code></strong> contains the code to bring together the whole process, loading configuration, executing code and clearing up memory.</li>
<li><strong><code>Python/initconfig.c</code></strong> loads the configuration from the system environment and merges it with any command-line flags.</li>
</ol>
<p>This diagram shows how the functions in each of those files are called:</p>
<p><a href="https://files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png" width="1046" height="851" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png&w=261&sig=8aa36cedaf32be0236896cce2c32b6c4c4ec7e05 261w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png&w=523&sig=a447b53d2afa96dfd204b9265c8b439bd71272d7 523w, https://files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png 1046w" sizes="75vw" alt="Python run swim lane diagram"/></a></p>
<p>The execution mode is determined from the configuration.</p>
<div class="alert alert-primary" role="alert">
<p><strong>The CPython source code style:</strong></p>
<p>Similar to the <a href="https://realpython.com/courses/writing-beautiful-python-code-pep-8/">PEP 8 style guide for Python code</a>, there is an <a href="https://www.python.org/dev/peps/pep-0007/">official style guide</a> for the CPython C code, designed originally in 2001 and updated for modern versions. </p>
<p>There are some naming standards which help when navigating the source code:</p>
<ul>
<li>
<p>Use a <code>Py</code> prefix for public functions, never for static functions. The <code>Py_</code> prefix is reserved for global service routines like <code>Py_FatalError</code>. Specific groups of routines (like specific object type APIs) use a longer prefix, such as <code>PyString_</code> for string functions.</p>
</li>
<li>
<p>Public functions and variables use MixedCase with underscores, like this: <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L924"><code>PyObject_GetAttr</code></a>, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/modsupport.h#L20"><code>Py_BuildValue</code></a>, <code>PyExc_TypeError</code>.</p>
</li>
<li>
<p>Occasionally an “internal” function has to be visible to the loader. We use the <code>_Py</code> prefix for this, for example, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L464"><code>_PyObject_Dump</code></a>.</p>
</li>
<li>
<p>Macros should have a MixedCase prefix and then use upper case, for example <code>PyString_AS_STRING</code>, <code>Py_PRINT_RAW</code>.</p>
</li>
</ul>
</div>
<h3 id="establishing-runtime-configuration">Establishing Runtime Configuration</h3>
<p><a href="https://files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png" width="1046" height="851" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png&w=261&sig=8aa36cedaf32be0236896cce2c32b6c4c4ec7e05 261w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png&w=523&sig=a447b53d2afa96dfd204b9265c8b439bd71272d7 523w, https://files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png 1046w" sizes="75vw" alt="Python run swim lane diagram"/></a></p>
<p>In the swimlanes, you can see that before any Python code is executed, the runtime first establishes the configuration.
The configuration of the runtime is a data structure defined in <code>Include/cpython/initconfig.h</code> named <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/cpython/initconfig.h#L407"><code>PyConfig</code></a>.</p>
<p>The configuration data structure includes things like:</p>
<ul>
<li>Runtime flags for various modes like debug and optimized mode</li>
<li>The execution mode, such as whether a filename, a module name, or <code>stdin</code> input was provided</li>
<li>Extended options, specified by <code>-X &lt;option&gt;</code></li>
<li>Environment variables for runtime settings</li>
</ul>
<p>The configuration data is primarily used by the CPython runtime to enable and disable various features.</p>
<p>Python also comes with several <a href="https://docs.python.org/3/using/cmdline.html">Command Line Interface Options</a>. In Python you can enable verbose mode with the <code>-v</code> flag. In verbose mode, Python will print messages to the screen when modules are loaded:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe -v -c <span class="s2">"print('hello world')"</span>
<span class="gp">#</span> installing zipimport hook
<span class="go">import zipimport # builtin</span>
<span class="gp">#</span> installed zipimport hook
<span class="go">...</span>
</pre></div>
<p>You will see a hundred lines or more with all the imports of your user site-packages and anything else in the system environment.</p>
<p>You can see the definition of this flag within <code>Include/cpython/initconfig.h</code> inside the <code>struct</code> for <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/cpython/initconfig.h#L407"><code>PyConfig</code></a>:</p>
<div class="highlight c"><pre><span></span><span class="cm">/* --- PyConfig ---------------------------------------------- */</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">_config_version</span><span class="p">;</span> <span class="cm">/* Internal configuration version,</span>
<span class="cm"> used for ABI compatibility */</span>
<span class="kt">int</span> <span class="n">_config_init</span><span class="p">;</span> <span class="cm">/* _PyConfigInitEnum value */</span>
<span class="p">...</span>
<span class="cm">/* If greater than 0, enable the verbose mode: print a message each time a</span>
<span class="cm"> module is initialized, showing the place (filename or built-in module)</span>
<span class="cm"> from which it is loaded.</span>
<span class="cm"> If greater or equal to 2, print a message for each file that is checked</span>
<span class="cm"> for when searching for a module. Also provides information on module</span>
<span class="cm"> cleanup at exit.</span>
<span class="cm"> Incremented by the -v option. Set by the PYTHONVERBOSE environment</span>
<span class="cm"> variable. If set to -1 (default), inherit Py_VerboseFlag value. */</span>
<span class="kt">int</span> <span class="n">verbose</span><span class="p">;</span>
</pre></div>
<p>In <code>Python/initconfig.c</code>, the logic for reading settings from environment variables and runtime command-line flags is established.</p>
<p>In the <code>config_read_env_vars</code> function, the environment variables are read and used to assign the values for the configuration settings:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyStatus</span>
<span class="nf">config_read_env_vars</span><span class="p">(</span><span class="n">PyConfig</span> <span class="o">*</span><span class="n">config</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyStatus</span> <span class="n">status</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">use_env</span> <span class="o">=</span> <span class="n">config</span><span class="o">-></span><span class="n">use_environment</span><span class="p">;</span>
<span class="cm">/* Get environment variables */</span>
<span class="hll"> <span class="n">_Py_get_env_flag</span><span class="p">(</span><span class="n">use_env</span><span class="p">,</span> <span class="o">&</span><span class="n">config</span><span class="o">-></span><span class="n">parser_debug</span><span class="p">,</span> <span class="s">"PYTHONDEBUG"</span><span class="p">);</span>
</span> <span class="n">_Py_get_env_flag</span><span class="p">(</span><span class="n">use_env</span><span class="p">,</span> <span class="o">&</span><span class="n">config</span><span class="o">-></span><span class="n">verbose</span><span class="p">,</span> <span class="s">"PYTHONVERBOSE"</span><span class="p">);</span>
<span class="n">_Py_get_env_flag</span><span class="p">(</span><span class="n">use_env</span><span class="p">,</span> <span class="o">&</span><span class="n">config</span><span class="o">-></span><span class="n">optimization_level</span><span class="p">,</span> <span class="s">"PYTHONOPTIMIZE"</span><span class="p">);</span>
<span class="n">_Py_get_env_flag</span><span class="p">(</span><span class="n">use_env</span><span class="p">,</span> <span class="o">&</span><span class="n">config</span><span class="o">-></span><span class="n">inspect</span><span class="p">,</span> <span class="s">"PYTHONINSPECT"</span><span class="p">);</span>
</pre></div>
<p>For the verbose setting, you can see that the value of <code>PYTHONVERBOSE</code> is used to set the value of <code>&config->verbose</code>, if <code>PYTHONVERBOSE</code> is found. If the environment variable does not exist, then the default value of <code>-1</code> will remain.</p>
<p>Then in <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/initconfig.c#L1828"><code>config_parse_cmdline</code></a> within <code>initconfig.c</code> again, the command-line flag is used to set the value, if provided:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyStatus</span>
<span class="nf">config_parse_cmdline</span><span class="p">(</span><span class="n">PyConfig</span> <span class="o">*</span><span class="n">config</span><span class="p">,</span> <span class="n">PyWideStringList</span> <span class="o">*</span><span class="n">warnoptions</span><span class="p">,</span>
<span class="n">Py_ssize_t</span> <span class="o">*</span><span class="n">opt_index</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="p">...</span>
<span class="k">case</span> <span class="sc">'v'</span><span class="o">:</span>
<span class="hll"> <span class="n">config</span><span class="o">-></span><span class="n">verbose</span><span class="o">++</span><span class="p">;</span>
</span> <span class="k">break</span><span class="p">;</span>
<span class="p">...</span>
<span class="cm">/* This space reserved for other options */</span>
<span class="k">default</span><span class="o">:</span>
<span class="cm">/* unknown argument: parsing failed */</span>
<span class="n">config_usage</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">program</span><span class="p">);</span>
<span class="k">return</span> <span class="n">_PyStatus_EXIT</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">);</span>
</pre></div>
<p>This value is later copied to a global variable <code>Py_VerboseFlag</code> by the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/initconfig.c#L134"><code>_Py_GetGlobalVariablesAsDict</code></a> function.</p>
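<p>You can check the effect of both inputs on the final verbose level without reading any C. The sketch below starts child interpreters and reports <code>sys.flags.verbose</code> for each. The <code>verbose_level()</code> helper is an assumption of this example, not part of CPython:</p>

```python
import os
import subprocess
import sys

PROBE = "import sys; print(sys.flags.verbose)"


def verbose_level(extra_args=(), env_value=None):
    """Return sys.flags.verbose as seen by a child interpreter."""
    env = dict(os.environ)
    env.pop("PYTHONVERBOSE", None)  # start from a known state
    if env_value is not None:
        env["PYTHONVERBOSE"] = env_value
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        env=env,
        capture_output=True,  # verbose import messages go to stderr
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


levels = {
    "default": verbose_level(),
    "-v": verbose_level(["-v"]),
    "-v -v": verbose_level(["-v", "-v"]),
    "PYTHONVERBOSE=2": verbose_level(env_value="2"),
}
print(levels)
```

<p>Each <code>-v</code> on the command line raises the level by one, matching the <code>config->verbose++</code> line in the <code>switch</code> statement, while an integer in <code>PYTHONVERBOSE</code> sets the level directly.</p>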
<p>Within a Python session, you can access the runtime flags, like verbose mode and quiet mode, using the <code>sys.flags</code> named tuple.
The <code>-X</code> flags are all available inside the <code>sys._xoptions</code> dictionary:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="go">$ ./python.exe -X dev -q </span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">sys</span>
<span class="gp">>>> </span><span class="n">sys</span><span class="o">.</span><span class="n">flags</span>
<span class="go">sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, </span>
<span class="go"> no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, </span>
<span class="go"> quiet=1, hash_randomization=1, isolated=0, dev_mode=True, utf8_mode=0)</span>
<span class="gp">>>> </span><span class="n">sys</span><span class="o">.</span><span class="n">_xoptions</span>
<span class="go">{'dev': True}</span>
</pre></div>
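<p>The same values are available to a script. As a short sketch, this reads a few fields from <code>sys.flags</code> by name and copies the <code>-X</code> options. Note that <code>sys._xoptions</code> is specific to CPython, and the selection of fields here is arbitrary:</p>

```python
import sys

# sys.flags is a read-only named tuple, so fields can be read by name.
field_names = ("verbose", "quiet", "optimize", "dev_mode")
selected = {name: getattr(sys.flags, name) for name in field_names}

# Copy the -X options so the interpreter's own dictionary is left alone.
x_options = dict(sys._xoptions)

print(selected)
print(x_options)
```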
<p>As well as the runtime configuration in <code>initconfig.h</code>, there is also the build configuration, which is located inside <code>pyconfig.h</code> in the root folder. This file is created dynamically in the <code>configure</code> step in the build process, or by Visual Studio for Windows systems.</p>
<p>You can see the build configuration by running:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe -m sysconfig
</pre></div>
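<p>The build configuration can also be queried from code through the <code>sysconfig</code> module. This sketch prints a handful of values. The variable names chosen here are only examples, and some of them may be missing on a given platform, which is why <code>.get()</code> is used:</p>

```python
import sysconfig

# All build-time variables as a dictionary
build_vars = sysconfig.get_config_vars()

print(sysconfig.get_platform())
print(sysconfig.get_python_version())

for name in ("prefix", "EXT_SUFFIX", "Py_DEBUG"):
    # Not every variable exists on every platform, so missing ones print None.
    print(name, "=", build_vars.get(name))
```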
<h3 id="reading-filesinput">Reading Files/Input</h3>
<p>Once CPython has the runtime configuration and the command-line arguments, it can establish what it needs to execute.</p>
<p>This task is handled by the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/main.c#L665"><code>pymain_main</code></a> function inside <code>Modules/main.c</code>. Depending on the newly created <code>config</code> instance, CPython will now execute code provided via several options.</p>
<h4 id="input-via-c">Input via <code>-c</code></h4>
<p>The simplest is providing CPython a command with the <code>-c</code> option and a Python program inside quotes.</p>
<p>For example:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe -c <span class="s2">"print('hi')"</span>
<span class="go">hi</span>
</pre></div>
<p>Here is the full flowchart of how this happens:</p>
<p><a href="https://files.realpython.com/media/pymain_run_command.f5da561ba7d5.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pymain_run_command.f5da561ba7d5.png" width="1041" height="751" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pymain_run_command.f5da561ba7d5.png&w=260&sig=f7f802b23e900bf42b29804ee80ed0dd0eaec6a4 260w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pymain_run_command.f5da561ba7d5.png&w=520&sig=3079060f305945fd3556ce7cd453fac965a6ec27 520w, https://files.realpython.com/media/pymain_run_command.f5da561ba7d5.png 1041w" sizes="75vw" alt="Flow chart of pymain_run_command"/></a></p>
<p>First, the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/main.c#L240"><code>pymain_run_command()</code></a> function is executed inside <code>Modules/main.c</code> taking the command passed in <code>-c</code> as an argument in the C type <code>wchar_t*</code>. The <code>wchar_t*</code> type is often used as a low-level storage type for Unicode data across CPython as the size of the type can store UTF8 characters.</p>
<p>When converting the <code>wchar_t*</code> to a Python string, the <code>Objects/unicodeobject.c</code> file has a helper function <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/unicodeobject.c#L2088"><code>PyUnicode_FromWideChar()</code></a> that returns a <code>PyObject</code>, of type <code>str</code>. The encoding to UTF8 is then done by <code>PyUnicode_AsUTF8String()</code> on the Python <code>str</code> object to convert it to a Python <code>bytes</code> object. </p>
<p>Once this is complete, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/main.c#L240"><code>pymain_run_command()</code></a> will then pass the Python bytes object to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L453"><code>PyRun_SimpleStringFlags()</code></a> for execution, but first converting the <code>bytes</code> object to a C string (<code>char*</code>) with <code>PyBytes_AsString()</code>:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
<span class="nf">pymain_run_command</span><span class="p">(</span><span class="kt">wchar_t</span> <span class="o">*</span><span class="n">command</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">cf</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">unicode</span><span class="p">,</span> <span class="o">*</span><span class="n">bytes</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">ret</span><span class="p">;</span>
<span class="n">unicode</span> <span class="o">=</span> <span class="n">PyUnicode_FromWideChar</span><span class="p">(</span><span class="n">command</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">unicode</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PySys_Audit</span><span class="p">(</span><span class="s">"cpython.run_command"</span><span class="p">,</span> <span class="s">"O"</span><span class="p">,</span> <span class="n">unicode</span><span class="p">)</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">pymain_exit_err_print</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">bytes</span> <span class="o">=</span> <span class="n">PyUnicode_AsUTF8String</span><span class="p">(</span><span class="n">unicode</span><span class="p">);</span>
<span class="n">Py_DECREF</span><span class="p">(</span><span class="n">unicode</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">bytes</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">PyRun_SimpleStringFlags</span><span class="p">(</span><span class="n">PyBytes_AsString</span><span class="p">(</span><span class="n">bytes</span><span class="p">),</span> <span class="n">cf</span><span class="p">);</span>
<span class="n">Py_DECREF</span><span class="p">(</span><span class="n">bytes</span><span class="p">);</span>
<span class="k">return</span> <span class="p">(</span><span class="n">ret</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">);</span>
<span class="nl">error</span><span class="p">:</span>
<span class="n">PySys_WriteStderr</span><span class="p">(</span><span class="s">"Unable to decode the command from the command line:</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span> <span class="n">pymain_exit_err_print</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
<p>The conversion of <code>wchar_t*</code> to Unicode, bytes, and then a string is roughly equivalent to the following:</p>
<div class="highlight python"><pre><span></span><span class="n">unicode</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">command</span><span class="p">)</span>
<span class="n">bytes_</span> <span class="o">=</span> <span class="nb">bytes</span><span class="p">(</span><span class="n">unicode</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'utf8'</span><span class="p">))</span>
<span class="c1"># call PyRun_SimpleStringFlags with bytes_</span>
</pre></div>
<p>The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L453"><code>PyRun_SimpleStringFlags()</code></a> function is part of <code>Python/pythonrun.c</code>. Its purpose is to turn this simple command into a Python module and then send it on to be executed.
Since a Python module needs to have <code>__main__</code> to be executed as a standalone module, it creates that automatically:</p>
<div class="highlight c"><pre><span></span><span class="kt">int</span>
<span class="nf">PyRun_SimpleStringFlags</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">command</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">m</span><span class="p">,</span> <span class="o">*</span><span class="n">d</span><span class="p">,</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
<span class="hll"> <span class="n">m</span> <span class="o">=</span> <span class="n">PyImport_AddModule</span><span class="p">(</span><span class="s">"__main__"</span><span class="p">);</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="n">m</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="hll"> <span class="n">d</span> <span class="o">=</span> <span class="n">PyModule_GetDict</span><span class="p">(</span><span class="n">m</span><span class="p">);</span>
</span><span class="hll"> <span class="n">v</span> <span class="o">=</span> <span class="n">PyRun_StringFlags</span><span class="p">(</span><span class="n">command</span><span class="p">,</span> <span class="n">Py_file_input</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">PyErr_Print</span><span class="p">();</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">Py_DECREF</span><span class="p">(</span><span class="n">v</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
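<p>In Python terms, the function above behaves roughly like compiling a string and executing it with a module’s dictionary as both globals and locals. The following is a loose analogue rather than a translation of the C code. The <code>run_simple_string()</code> name is made up, and it builds a fresh stand-in module instead of reusing the real <code>__main__</code>:</p>

```python
import types


def run_simple_string(command):
    """Execute a command string inside a module-like namespace."""
    # Stand-in for PyImport_AddModule("__main__")
    module = types.ModuleType("__main__")
    # Stand-in for PyModule_GetDict(m)
    namespace = module.__dict__
    # "<string>" is a placeholder filename for code that has no file.
    code = compile(command, "<string>", "exec")
    # The same dictionary is passed as both globals and locals.
    exec(code, namespace, namespace)
    return namespace


namespace = run_simple_string("answer = 6 * 7")
print(namespace["__name__"], namespace["answer"])
```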
<p>Once <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L453"><code>PyRun_SimpleStringFlags()</code></a> has created a module and a dictionary, it calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1008"><code>PyRun_StringFlags()</code></a>, which creates a fake filename and then calls the Python parser to create an AST from the string and return a module, <code>mod</code>:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">PyRun_StringFlags</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">str</span><span class="p">,</span> <span class="kt">int</span> <span class="n">start</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">mod</span> <span class="o">=</span> <span class="n">PyParser_ASTFromStringObject</span><span class="p">(</span><span class="n">str</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">mod</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">run_mod</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
<span class="n">PyArena_Free</span><span class="p">(</span><span class="n">arena</span><span class="p">);</span>
<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>You’ll dive into the AST and Parser code in the next section.</p>
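<p>To make the <code>-c</code> flow more concrete, here is a minimal Python-level sketch of the same idea: create a <code>__main__</code> module, take its dictionary, and run the command string with that dictionary as both globals and locals. The helper name <code>run_simple_string()</code> is hypothetical, and <code>types.ModuleType</code> only stands in for what <code>PyImport_AddModule()</code> does in C:</p>

```python
import types


def run_simple_string(command):
    """Hypothetical, simplified analogue of running `python -c command`."""
    # Stand-in for PyImport_AddModule("__main__"): a fresh module object
    main_module = types.ModuleType("__main__")
    namespace = main_module.__dict__
    # "<string>" plays the role of the fake filename
    code = compile(command, "<string>", "exec")
    # The same dictionary is passed as both globals and locals
    exec(code, namespace, namespace)
    return namespace


result = run_simple_string("x = 2 + 3")
print(result["x"])  # 5
```

<p>This is only an approximation: the real function works on the interpreter's existing <code>__main__</code> module rather than a throwaway one.</p>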
<h4 id="input-via-m">Input via <code>-m</code></h4>
<p>Another way to execute Python commands is by using the <code>-m</code> option with the name of a module.
A typical example is <code>python -m unittest</code> to run the unittest module in the standard library.</p>
<p>The ability to execute modules as scripts was initially proposed in <a href="https://www.python.org/dev/peps/pep-0338">PEP 338</a>, and the standard for explicit relative imports was then defined in <a href="https://www.python.org/dev/peps/pep-0366">PEP 366</a>.</p>
<p>The use of the <code>-m</code> flag implies that within the module package, you want to execute whatever is inside <a href="https://realpython.com/python-main-function/"><code>__main__</code></a>. It also implies that you want to search <code>sys.path</code> for the named module.</p>
<p>This search mechanism is why you don’t need to remember where the <code>unittest</code> module is stored on your filesystem.</p>
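<p>You can watch a similar search happen from Python itself. The sketch below uses <code>importlib.util.find_spec()</code> to ask the import system where <code>unittest</code> lives. The path it prints depends on your installation, so no particular output is assumed:</p>

```python
import importlib.util

# Ask the import system to locate the module by searching sys.path
spec = importlib.util.find_spec("unittest")

print(spec.name)    # unittest
print(spec.origin)  # location of the package on this machine
```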
<p>Inside <code>Modules/main.c</code> there is a function called when the command-line is run with the <code>-m</code> flag. The name of the module is passed as the <code>modname</code> argument.</p>
<p>CPython will then import a standard library module, <code>runpy</code>, and execute it using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/call.c#L214"><code>PyObject_Call()</code></a>. The import is done using the C API function <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/import.c#L1409"><code>PyImport_ImportModule()</code></a>, found within the <code>Python/import.c</code> file:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
<span class="nf">pymain_run_module</span><span class="p">(</span><span class="k">const</span> <span class="kt">wchar_t</span> <span class="o">*</span><span class="n">modname</span><span class="p">,</span> <span class="kt">int</span> <span class="n">set_argv0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">module</span><span class="p">,</span> <span class="o">*</span><span class="n">runpy</span><span class="p">,</span> <span class="o">*</span><span class="n">runmodule</span><span class="p">,</span> <span class="o">*</span><span class="n">runargs</span><span class="p">,</span> <span class="o">*</span><span class="n">result</span><span class="p">;</span>
<span class="n">runpy</span> <span class="o">=</span> <span class="n">PyImport_ImportModule</span><span class="p">(</span><span class="s">"runpy"</span><span class="p">);</span>
<span class="p">...</span>
<span class="n">runmodule</span> <span class="o">=</span> <span class="n">PyObject_GetAttrString</span><span class="p">(</span><span class="n">runpy</span><span class="p">,</span> <span class="s">"_run_module_as_main"</span><span class="p">);</span>
<span class="p">...</span>
<span class="n">module</span> <span class="o">=</span> <span class="n">PyUnicode_FromWideChar</span><span class="p">(</span><span class="n">modname</span><span class="p">,</span> <span class="n">wcslen</span><span class="p">(</span><span class="n">modname</span><span class="p">));</span>
<span class="p">...</span>
<span class="n">runargs</span> <span class="o">=</span> <span class="n">Py_BuildValue</span><span class="p">(</span><span class="s">"(Oi)"</span><span class="p">,</span> <span class="n">module</span><span class="p">,</span> <span class="n">set_argv0</span><span class="p">);</span>
<span class="p">...</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">PyObject_Call</span><span class="p">(</span><span class="n">runmodule</span><span class="p">,</span> <span class="n">runargs</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="p">...</span>
<span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">pymain_exit_err_print</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">Py_DECREF</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>In this function you’ll also see 2 other C API functions: <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/call.c#L214"><code>PyObject_Call()</code></a> and <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L831"><code>PyObject_GetAttrString()</code></a>. Because <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/import.c#L1409"><code>PyImport_ImportModule()</code></a> returns a <code>PyObject*</code>, the core object type, you need to call special functions to get attributes and to call it.</p>
<p>In Python, if you had an object and wanted to get an attribute, then you could call <code>getattr()</code>. In the C API, this call is <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L831"><code>PyObject_GetAttrString()</code></a>, which is found in <code>Objects/object.c</code>. If you wanted to run a callable, you would give it parentheses, or you could call the <code>__call__()</code> method that every callable Python object has. In the C API, the equivalent is <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/call.c#L214"><code>PyObject_Call()</code></a>, which is found in <code>Objects/call.c</code>. In Python, the two ways of calling look like this:</p>
<div class="highlight python"><pre><span></span><span class="n">hi</span> <span class="o">=</span> <span class="s2">"hi!"</span>
<span class="n">hi</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span> <span class="o">==</span> <span class="n">hi</span><span class="o">.</span><span class="n">upper</span><span class="o">.</span><span class="fm">__call__</span><span class="p">()</span> <span class="c1"># this is the same</span>
</pre></div>
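<p>The import, attribute lookup, and call steps in <code>pymain_run_module()</code> each have a Python-level counterpart. The sketch below walks through the same three steps, but it uses <code>json.dumps()</code> as a harmless stand-in, because actually calling <code>runpy._run_module_as_main()</code> would execute a module and alter <code>sys.argv</code>:</p>

```python
import importlib

# PyImport_ImportModule(...): import a module by its name
module = importlib.import_module("json")

# PyObject_GetAttrString(...): look up an attribute by its name
func = getattr(module, "dumps")

# Py_BuildValue("(...)", ...): build a tuple of positional arguments
args = ([1, 2, 3],)

# PyObject_Call(callable, args, NULL): call with args and no keyword arguments
result = func(*args)
print(result)  # [1, 2, 3]
```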
<p>The <code>runpy</code> module is written in pure Python and located in <code>Lib/runpy.py</code>.</p>
<p>Executing <code>python -m <module></code> is equivalent to running <code>python -m runpy <module></code>. The <code>runpy</code> module was created to abstract the process of locating and executing modules on an operating system.</p>
<p><code>runpy</code> does a few things to run the target module:</p>
<ul>
<li>Calls <code>__import__()</code> for the module name you provided</li>
<li>Sets <code>__name__</code> (the module name) to <code>__main__</code></li>
<li>Executes the module within the <code>__main__</code> namespace</li>
</ul>
<p>The <code>runpy</code> module also supports executing directories and zip files.</p>
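<p>You can try these steps yourself with the public <code>runpy.run_module()</code> function. The following sketch writes a throwaway module called <code>demo_mod</code> (a hypothetical name used only for this example) into a temporary directory, makes it importable, and runs it with <code>__name__</code> set to <code>__main__</code>:</p>

```python
import importlib
import os
import runpy
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A one-line module that records the value of __name__ it was run with
    with open(os.path.join(tmp, "demo_mod.py"), "w") as f:
        f.write("whoami = __name__\n")

    sys.path.insert(0, tmp)  # make the module discoverable
    importlib.invalidate_caches()
    try:
        # run_module() returns the globals of the executed module
        namespace = runpy.run_module("demo_mod", run_name="__main__")
    finally:
        sys.path.remove(tmp)

print(namespace["whoami"])  # __main__
```

<p>Unlike <code>python -m</code>, this call leaves <code>sys.argv</code> and <code>sys.modules</code> untouched because <code>alter_sys</code> defaults to <code>False</code>.</p>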
<h4 id="input-via-filename">Input via Filename</h4>
<p>If the first argument to <code>python</code> is a filename, such as <code>python test.py</code>, then CPython will open a file handle, similar to using <code>open()</code> in Python, and pass the handle to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L372"><code>PyRun_SimpleFileExFlags()</code></a> inside <code>Python/pythonrun.c</code>.</p>
<p>There are 3 paths this function can take:</p>
<ol>
<li>If the file path is a <code>.pyc</code> file, it will call <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1145"><code>run_pyc_file()</code></a>.</li>
<li>If the file path is a script file (<code>.py</code>) it will run <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1032"><code>PyRun_FileExFlags()</code></a>.</li>
<li>If the file path is <code>stdin</code> because the user ran <code>command | python</code>, it will treat <code>stdin</code> as a file handle and run <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1032"><code>PyRun_FileExFlags()</code></a>.</li>
</ol>
<div class="highlight c"><pre><span></span><span class="kt">int</span>
<span class="nf">PyRun_SimpleFileExFlags</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="kt">int</span> <span class="n">closeit</span><span class="p">,</span>
<span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">PyImport_AddModule</span><span class="p">(</span><span class="s">"__main__"</span><span class="p">);</span>
<span class="p">...</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="n">maybe_pyc_file</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">ext</span><span class="p">,</span> <span class="n">closeit</span><span class="p">))</span> <span class="p">{</span>
</span> <span class="p">...</span>
<span class="hll"> <span class="n">v</span> <span class="o">=</span> <span class="n">run_pyc_file</span><span class="p">(</span><span class="n">pyc_fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="cm">/* When running from stdin, leave __main__.__loader__ alone */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s">"<stdin>"</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span> <span class="o">&&</span>
<span class="n">set_main_loader</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="s">"SourceFileLoader"</span><span class="p">)</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"python: failed to set __main__.__loader__</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
<span class="p">}</span>
<span class="hll"> <span class="n">v</span> <span class="o">=</span> <span class="n">PyRun_FileExFlags</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">Py_file_input</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span>
</span><span class="hll"> <span class="n">closeit</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
</span> <span class="p">}</span>
<span class="p">...</span>
<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
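<p>The branching in this function can be summarized in a few lines of Python. This is a deliberately simplified sketch: <code>choose_runner()</code> is a hypothetical helper that only looks at the file extension, and it does not reproduce the other checks that <code>maybe_pyc_file()</code> makes in C:</p>

```python
def choose_runner(filename):
    """Return the name of the C function that would handle this input.

    Simplified: only the file extension is inspected.
    """
    if filename.endswith(".pyc"):
        # Compiled bytecode is unmarshaled instead of parsed
        return "run_pyc_file"
    # Source files and "<stdin>" both go through the parser
    return "PyRun_FileExFlags"


for name in ("test.py", "test.pyc", "<stdin>"):
    print(name, "->", choose_runner(name))
```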
<h4 id="input-via-file-with-pyrun_fileexflags">Input via File With <code>PyRun_FileExFlags()</code></h4>
<p>For <code>stdin</code> and basic script files, CPython will pass the file handle to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1032"><code>PyRun_FileExFlags()</code></a> located in the <code>pythonrun.c</code> file.</p>
<p>The purpose of <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1032"><code>PyRun_FileExFlags()</code></a> is similar to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L453"><code>PyRun_SimpleStringFlags()</code></a> used for the <code>-c</code> input. CPython will load the file handle into <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1369"><code>PyParser_ASTFromFileObject()</code></a>. We’ll cover the Parser and AST modules in the next section.
Because this is a full script, it doesn’t need the <code>PyImport_AddModule("__main__");</code> step used by <code>-c</code>:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">PyRun_FileExFlags</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">filename_str</span><span class="p">,</span> <span class="kt">int</span> <span class="n">start</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span> <span class="kt">int</span> <span class="n">closeit</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">mod</span> <span class="o">=</span> <span class="n">PyParser_ASTFromFileObject</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span>
<span class="n">flags</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
<span class="p">...</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">run_mod</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>As with <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L453"><code>PyRun_SimpleStringFlags()</code></a>, once <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1032"><code>PyRun_FileExFlags()</code></a> has created a Python module from the file, it sends it to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1125"><code>run_mod()</code></a> to be executed.</p>
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1125"><code>run_mod()</code></a> is found within <code>Python/pythonrun.c</code>, and sends the module to the AST to be compiled into a code object. Code objects are a format used to store the bytecode operations and the format kept in <code>.pyc</code> files:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">run_mod</span><span class="p">(</span><span class="n">mod_ty</span> <span class="n">mod</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span>
<span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">,</span> <span class="n">PyArena</span> <span class="o">*</span><span class="n">arena</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span><span class="p">;</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
<span class="n">co</span> <span class="o">=</span> <span class="n">PyAST_CompileObject</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">co</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PySys_Audit</span><span class="p">(</span><span class="s">"exec"</span><span class="p">,</span> <span class="s">"O"</span><span class="p">,</span> <span class="n">co</span><span class="p">)</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Py_DECREF</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">run_eval_code_obj</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">);</span>
<span class="n">Py_DECREF</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
<span class="k">return</span> <span class="n">v</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>We will cover the CPython compiler and bytecodes in the next section. <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a> is a simple wrapper function that calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> in the <code>Python/ceval.c</code> file. The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> function is the main evaluation loop for CPython. It iterates over each bytecode statement and executes it on your local machine.</p>
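<p>The parse, compile, and evaluate stages that <code>run_mod()</code> ties together are also exposed to Python code through <code>ast.parse()</code>, <code>compile()</code>, and <code>exec()</code>. The sketch below is a rough analogue and not a description of the C code path. The source string and the <code>&lt;example&gt;</code> filename are made up for illustration:</p>

```python
import ast

source = "total = sum(range(5))"
filename = "<example>"  # made-up filename, used only in error messages

# Parse: source text -> AST module (the `mod` passed to run_mod())
mod = ast.parse(source, filename=filename, mode="exec")

# Compile: AST -> code object (the job of PyAST_CompileObject())
co = compile(mod, filename, "exec")

# Evaluate: run the code object with the same dict as globals and locals
namespace = {}
exec(co, namespace, namespace)

print(type(co).__name__)   # code
print(namespace["total"])  # 10
```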
<h4 id="input-via-compiled-bytecode-with-run_pyc_file">Input via Compiled Bytecode With <code>run_pyc_file()</code></h4>
<p>In <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L372"><code>PyRun_SimpleFileExFlags()</code></a>, there was a clause for the user providing a file path to a <code>.pyc</code> file. If the file path ends in <code>.pyc</code>, then instead of loading the file as a plain text file and parsing it, CPython will assume that the <code>.pyc</code> file contains a code object written to disk.</p>
<p>The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1145"><code>run_pyc_file()</code></a> function inside <code>Python/pythonrun.c</code> then marshals the code object from the <code>.pyc</code> file by using the file handle. <strong>Marshaling</strong> is a technical term for copying the contents of a file into memory and converting them to a specific data structure. The code object data structure on the disk is the CPython compiler’s way of caching compiled code so that it doesn’t need to parse it every time the script is called:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">run_pyc_file</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span><span class="p">;</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
<span class="p">...</span>
<span class="hll"> <span class="n">v</span> <span class="o">=</span> <span class="n">PyMarshal_ReadLastObjectFromFile</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
</span> <span class="p">...</span>
<span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">||</span> <span class="o">!</span><span class="n">PyCode_Check</span><span class="p">(</span><span class="n">v</span><span class="p">))</span> <span class="p">{</span>
<span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">v</span><span class="p">);</span>
<span class="n">PyErr_SetString</span><span class="p">(</span><span class="n">PyExc_RuntimeError</span><span class="p">,</span>
<span class="s">"Bad code object in .pyc file"</span><span class="p">);</span>
<span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">fclose</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
<span class="hll"> <span class="n">co</span> <span class="o">=</span> <span class="p">(</span><span class="n">PyCodeObject</span> <span class="o">*</span><span class="p">)</span><span class="n">v</span><span class="p">;</span>
</span><span class="hll"> <span class="n">v</span> <span class="o">=</span> <span class="n">run_eval_code_obj</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">);</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">&&</span> <span class="n">flags</span><span class="p">)</span>
<span class="n">flags</span><span class="o">-></span><span class="n">cf_flags</span> <span class="o">|=</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_flags</span> <span class="o">&</span> <span class="n">PyCF_MASK</span><span class="p">);</span>
<span class="n">Py_DECREF</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
<span class="k">return</span> <span class="n">v</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Once the code object has been marshaled to memory, it is sent to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a>, which calls <code>Python/ceval.c</code> to execute the code.</p>
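<p>The same round trip can be reproduced from Python with the <code>py_compile</code> and <code>marshal</code> modules. The sketch below compiles a tiny script to a <code>.pyc</code> file, skips the file header, and unmarshals the code object before executing it. It assumes Python 3.7 or later, where the <code>.pyc</code> header is 16 bytes long. The file names are made up for this example:</p>

```python
import marshal
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "demo.py")
    with open(script, "w") as f:
        f.write("answer = 6 * 7\n")

    # Write the compiled bytecode next to the script
    pyc_path = py_compile.compile(
        script, cfile=os.path.join(tmp, "demo.pyc"), doraise=True
    )

    with open(pyc_path, "rb") as f:
        f.read(16)  # assumption: 16-byte header (Python 3.7+)
        co = marshal.load(f)  # the code object stored in the file

# Execute the unmarshaled code object in a fresh namespace
namespace = {}
exec(co, namespace)
print(namespace["answer"])  # 42
```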
<h3 id="lexing-and-parsing">Lexing and Parsing</h3>
<p>In the exploration of reading and executing Python files, we dived as deep as the parser and AST modules, with function calls to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1369"><code>PyParser_ASTFromFileObject()</code></a>.</p>
<p>Sticking within <code>Python/pythonrun.c</code>, the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1369"><code>PyParser_ASTFromFileObject()</code></a> function will take a file handle, compiler flags and a <code>PyArena</code> instance and convert the file object into a node object using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L163"><code>PyParser_ParseFileObject()</code></a>.</p>
<p>With the node object, it will then convert that into a module using the AST function <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L772"><code>PyAST_FromNodeObject()</code></a>:</p>
<div class="highlight c"><pre><span></span><span class="n">mod_ty</span>
<span class="nf">PyParser_ASTFromFileObject</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">enc</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">start</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ps1</span><span class="p">,</span>
<span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ps2</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">errcode</span><span class="p">,</span>
<span class="n">PyArena</span> <span class="o">*</span><span class="n">arena</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="hll"> <span class="n">node</span> <span class="o">*</span><span class="n">n</span> <span class="o">=</span> <span class="n">PyParser_ParseFileObject</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">enc</span><span class="p">,</span>
</span><span class="hll"> <span class="o">&</span><span class="n">_PyParser_Grammar</span><span class="p">,</span>
</span><span class="hll"> <span class="n">start</span><span class="p">,</span> <span class="n">ps1</span><span class="p">,</span> <span class="n">ps2</span><span class="p">,</span> <span class="o">&</span><span class="n">err</span><span class="p">,</span> <span class="o">&</span><span class="n">iflags</span><span class="p">);</span>
</span> <span class="p">...</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="p">{</span>
<span class="hll"> <span class="n">flags</span><span class="o">-></span><span class="n">cf_flags</span> <span class="o">|=</span> <span class="n">iflags</span> <span class="o">&</span> <span class="n">PyCF_MASK</span><span class="p">;</span>
</span> <span class="n">mod</span> <span class="o">=</span> <span class="n">PyAST_FromNodeObject</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
<span class="n">PyNode_Free</span><span class="p">(</span><span class="n">n</span><span class="p">);</span>
<span class="p">...</span>
<span class="k">return</span> <span class="n">mod</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>For <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L163"><code>PyParser_ParseFileObject()</code></a> we switch to <code>Parser/parsetok.c</code> and the parser-tokenizer stage of the CPython interpreter. This function has two important tasks:</p>
<ol>
<li>Instantiate a tokenizer state <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/tokenizer.h#L23"><code>tok_state</code></a> using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/tokenizer.h#L78"><code>PyTokenizer_FromFile()</code></a> in <code>Parser/tokenizer.c</code></li>
<li>Convert the tokens into a concrete parse tree (a list of <code>node</code>) using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L232"><code>parsetok()</code></a> in <code>Parser/parsetok.c</code> </li>
</ol>
<div class="highlight c"><pre><span></span><span class="n">node</span> <span class="o">*</span>
<span class="nf">PyParser_ParseFileObject</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span>
<span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">enc</span><span class="p">,</span> <span class="n">grammar</span> <span class="o">*</span><span class="n">g</span><span class="p">,</span> <span class="kt">int</span> <span class="n">start</span><span class="p">,</span>
<span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ps1</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ps2</span><span class="p">,</span>
<span class="n">perrdetail</span> <span class="o">*</span><span class="n">err_ret</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
<span class="hll"> <span class="k">struct</span> <span class="n">tok_state</span> <span class="o">*</span><span class="n">tok</span><span class="p">;</span>
</span><span class="p">...</span>
<span class="hll"> <span class="k">if</span> <span class="p">((</span><span class="n">tok</span> <span class="o">=</span> <span class="n">PyTokenizer_FromFile</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">enc</span><span class="p">,</span> <span class="n">ps1</span><span class="p">,</span> <span class="n">ps2</span><span class="p">))</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
</span> <span class="n">err_ret</span><span class="o">-></span><span class="n">error</span> <span class="o">=</span> <span class="n">E_NOMEM</span><span class="p">;</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="hll"> <span class="k">return</span> <span class="n">parsetok</span><span class="p">(</span><span class="n">tok</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">err_ret</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
</span><span class="p">}</span>
</pre></div>
<p><code>tok_state</code> (defined in <code>Parser/tokenizer.h</code>) is the data structure that stores all temporary data generated by the tokenizer. It is returned to the parser-tokenizer because <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L232"><code>parsetok()</code></a> needs it to build the concrete syntax tree (CST).</p>
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L232"><code>parsetok()</code></a> uses the <code>tok_state</code> structure and calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/tokenizer.c#L1110"><code>tok_get()</code></a> in a loop until the file is exhausted and no more tokens can be found.</p>
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/tokenizer.c#L1110"><code>tok_get()</code></a>, defined in <code>Parser/tokenizer.c</code>, behaves like an iterator: each call returns the next token from the source.</p>
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/tokenizer.c#L1110"><code>tok_get()</code></a> is one of the most complex functions in the whole CPython codebase. It has over 640 lines and includes decades of heritage with edge cases, new language features, and syntax.</p>
<p>One of the simpler parts is the code that converts a line break into a <code>NEWLINE</code> token:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
<span class="nf">tok_get</span><span class="p">(</span><span class="k">struct</span> <span class="n">tok_state</span> <span class="o">*</span><span class="n">tok</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">p_start</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">p_end</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="cm">/* Newline */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">c</span> <span class="o">==</span> <span class="sc">'\n'</span><span class="p">)</span> <span class="p">{</span>
<span class="n">tok</span><span class="o">-></span><span class="n">atbol</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">blankline</span> <span class="o">||</span> <span class="n">tok</span><span class="o">-></span><span class="n">level</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">goto</span> <span class="n">nextline</span><span class="p">;</span>
<span class="p">}</span>
<span class="o">*</span><span class="n">p_start</span> <span class="o">=</span> <span class="n">tok</span><span class="o">-></span><span class="n">start</span><span class="p">;</span>
<span class="o">*</span><span class="n">p_end</span> <span class="o">=</span> <span class="n">tok</span><span class="o">-></span><span class="n">cur</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="cm">/* Leave '\n' out of the string */</span>
<span class="n">tok</span><span class="o">-></span><span class="n">cont_line</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">tok</span><span class="o">-></span><span class="n">async_def</span><span class="p">)</span> <span class="p">{</span>
<span class="cm">/* We're somewhere inside an 'async def' function, and</span>
<span class="cm"> we've encountered a NEWLINE after its signature. */</span>
<span class="n">tok</span><span class="o">-></span><span class="n">async_def_nl</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">NEWLINE</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>In this case, <code>NEWLINE</code> is a token, with a value defined in <code>Include/token.h</code>. All tokens are constant <code>int</code> values, and the <code>Include/token.h</code> file was generated earlier when we ran <code>make regen-grammar</code>.</p>
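<p>You can get a feel for these integer constants from Python itself. The following is a minimal sketch that uses the standard library <code>token</code> and <code>tokenize</code> modules. It does not call <code>tok_get()</code> directly, and the helper name <code>token_names()</code> is made up for this illustration, but the token stream it yields for a simple expression is comparable:</p>

```python
import io
import token
import tokenize


def token_names(source):
    """Return the name of every token produced for `source`, in order."""
    readline = io.StringIO(source).readline
    # exact_type resolves generic OP tokens to specific ones such as PLUS.
    return [token.tok_name[tok.exact_type]
            for tok in tokenize.generate_tokens(readline)]


names = token_names("a + 1\n")
print(names)
# ['NAME', 'PLUS', 'NUMBER', 'NEWLINE', 'ENDMARKER']

# Each token is a plain int, and tok_name maps the number back to its name.
print(token.NEWLINE, token.tok_name[token.NEWLINE])
```

<p>The token stream is consumed one item at a time, which mirrors the way <code>parsetok()</code> keeps asking the tokenizer for the next token.</p>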
<p>The <code>node</code> type returned by <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L163"><code>PyParser_ParseFileObject()</code></a> is going to be essential for the next stage, converting a parse tree into an Abstract Syntax Tree (AST):</p>
<div class="highlight c"><pre><span></span><span class="k">typedef</span> <span class="k">struct</span> <span class="n">_node</span> <span class="p">{</span>
<span class="kt">short</span> <span class="n">n_type</span><span class="p">;</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">n_str</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">n_lineno</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">n_col_offset</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">n_nchildren</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">_node</span> <span class="o">*</span><span class="n">n_child</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">n_end_lineno</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">n_end_col_offset</span><span class="p">;</span>
<span class="p">}</span> <span class="n">node</span><span class="p">;</span>
</pre></div>
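<p>To make the shape of this structure more concrete, here is a rough Python stand-in. It is only a sketch: the <code>Node</code> dataclass and the <code>from_list()</code> helper are hypothetical names, the line and column fields are left out, and the numeric IDs in the sample are purely illustrative:</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical Python mirror of a few fields of the C `node` struct."""
    n_type: int                      # token or symbol number
    n_str: Optional[str] = None      # source text, set only on token nodes
    n_child: List["Node"] = field(default_factory=list)

    @property
    def n_nchildren(self) -> int:
        return len(self.n_child)


def from_list(item):
    """Build a Node tree from a nested list of the form [type, child, ...]."""
    n_type, *rest = item
    if len(rest) == 1 and isinstance(rest[0], str):
        # A leaf: [token_number, 'text']
        return Node(n_type, n_str=rest[0])
    return Node(n_type, n_child=[from_list(child) for child in rest])


# Illustrative IDs only: a symbol node wrapping a single token node.
tree = from_list([325, [1, "a"]])
print(tree.n_type, tree.n_nchildren, tree.n_child[0].n_str)
# 325 1 a
```

<p>Every node, whatever it represents, has the same generic list of children. That uniformity is what makes the CST easy to build but awkward to reason about.</p>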
<p>Since the CST is a tree of syntax, token IDs, and symbols, it would be difficult for the compiler to make quick decisions based on the Python language.</p>
<p>That is why the next stage is to convert the CST into an AST, a much higher-level structure. This task is performed by the <code>Python/ast.c</code> module, which has both a C and Python API.</p>
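<p>Before you jump into the AST, there is a way to access the output from the parser stage. CPython has a standard library module <code>parser</code>, which exposes the C functions with a Python API. (The module was deprecated in Python 3.9 and removed in Python 3.10, so the examples that use it need an earlier version.)</p>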
<p>The module is documented as an implementation detail of CPython, so you won’t see it in other Python interpreters. Also, the output from the functions is not that easy to read.</p>
<p>The output will be in numeric form, using the token and symbol numbers generated by the <code>make regen-grammar</code> stage, stored in <code>Include/token.h</code> (tokens) and <code>Include/graminit.h</code> (symbols):</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pprint</span> <span class="k">import</span> <span class="n">pprint</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">parser</span>
<span class="gp">>>> </span><span class="n">st</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">expr</span><span class="p">(</span><span class="s1">'a + 1'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">pprint</span><span class="p">(</span><span class="n">parser</span><span class="o">.</span><span class="n">st2list</span><span class="p">(</span><span class="n">st</span><span class="p">))</span>
<span class="go">[258,</span>
<span class="go"> [332,</span>
<span class="go"> [306,</span>
<span class="go"> [310,</span>
<span class="go"> [311,</span>
<span class="go"> [312,</span>
<span class="go"> [313,</span>
<span class="go"> [316,</span>
<span class="go"> [317,</span>
<span class="go"> [318,</span>
<span class="go"> [319,</span>
<span class="go"> [320,</span>
<span class="go"> [321, [322, [323, [324, [325, [1, 'a']]]]]],</span>
<span class="go"> [14, '+'],</span>
<span class="go"> [321, [322, [323, [324, [325, [2, '1']]]]]]]]]]]]]]]]],</span>
<span class="go"> [4, ''],</span>
<span class="go"> [0, '']]</span>
</pre></div>
<p>To make it easier to understand, you can take all the numbers in the <code>symbol</code> and <code>token</code> modules, put them into a dictionary and recursively replace the values in the output of <code>parser.st2list()</code> with the names:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">symbol</span>
<span class="kn">import</span> <span class="nn">token</span>
<span class="kn">import</span> <span class="nn">parser</span>
<span class="k">def</span> <span class="nf">lex</span><span class="p">(</span><span class="n">expression</span><span class="p">):</span>
<span class="n">symbols</span> <span class="o">=</span> <span class="p">{</span><span class="n">v</span><span class="p">:</span> <span class="n">k</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">symbol</span><span class="o">.</span><span class="vm">__dict__</span><span class="o">.</span><span class="n">items</span><span class="p">()</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="nb">int</span><span class="p">)}</span>
<span class="n">tokens</span> <span class="o">=</span> <span class="p">{</span><span class="n">v</span><span class="p">:</span> <span class="n">k</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">token</span><span class="o">.</span><span class="vm">__dict__</span><span class="o">.</span><span class="n">items</span><span class="p">()</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="nb">int</span><span class="p">)}</span>
<span class="n">lexicon</span> <span class="o">=</span> <span class="p">{</span><span class="o">**</span><span class="n">symbols</span><span class="p">,</span> <span class="o">**</span><span class="n">tokens</span><span class="p">}</span>
<span class="n">st</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">expr</span><span class="p">(</span><span class="n">expression</span><span class="p">)</span>
<span class="n">st_list</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">st2list</span><span class="p">(</span><span class="n">st</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">replace</span><span class="p">(</span><span class="n">l</span><span class="p">:</span> <span class="nb">list</span><span class="p">):</span>
<span class="n">r</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">l</span><span class="p">:</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="nb">list</span><span class="p">):</span>
<span class="n">r</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">replace</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">if</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">lexicon</span><span class="p">:</span>
<span class="n">r</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">lexicon</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">r</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="k">return</span> <span class="n">r</span>
<span class="k">return</span> <span class="n">replace</span><span class="p">(</span><span class="n">st_list</span><span class="p">)</span>
</pre></div>
<p>You can run <code>lex()</code> with a simple expression, like <code>a + 1</code>, to see how it is represented as a parse tree:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pprint</span> <span class="k">import</span> <span class="n">pprint</span>
<span class="gp">>>> </span><span class="n">pprint</span><span class="p">(</span><span class="n">lex</span><span class="p">(</span><span class="s1">'a + 1'</span><span class="p">))</span>
<span class="go">['eval_input',</span>
<span class="go"> ['testlist',</span>
<span class="go"> ['test',</span>
<span class="go"> ['or_test',</span>
<span class="go"> ['and_test',</span>
<span class="go"> ['not_test',</span>
<span class="go"> ['comparison',</span>
<span class="go"> ['expr',</span>
<span class="go"> ['xor_expr',</span>
<span class="go"> ['and_expr',</span>
<span class="go"> ['shift_expr',</span>
<span class="go"> ['arith_expr',</span>
<span class="go"> ['term',</span>
<span class="go"> ['factor', ['power', ['atom_expr', ['atom', ['NAME', 'a']]]]]],</span>
<span class="go"> ['PLUS', '+'],</span>
<span class="go"> ['term',</span>
<span class="go"> ['factor',</span>
<span class="go"> ['power', ['atom_expr', ['atom', ['NUMBER', '1']]]]]]]]]]]]]]]]],</span>
<span class="go"> ['NEWLINE', ''],</span>
<span class="go"> ['ENDMARKER', '']]</span>
</pre></div>
<p>In the output, you can see the symbols in lowercase, such as <code>'test'</code> and the tokens in uppercase, such as <code>'NUMBER'</code>.</p>
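<p>The terminal nodes of this tree are exactly the tokens, in source order. As a small sketch of that idea, the snippet below works on a literal copy of the <code>arith_expr</code> subtree from the output above, so it runs without the <code>parser</code> module. The <code>leaves()</code> helper is a hypothetical name introduced for this illustration:</p>

```python
# Literal copy of the 'arith_expr' subtree printed by lex('a + 1').
ARITH_EXPR = [
    "arith_expr",
    ["term", ["factor", ["power", ["atom_expr", ["atom", ["NAME", "a"]]]]]],
    ["PLUS", "+"],
    ["term", ["factor", ["power", ["atom_expr", ["atom", ["NUMBER", "1"]]]]]],
]


def leaves(tree):
    """Return (token_name, text) pairs for the terminal nodes, left to right."""
    head, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        # A token node looks like ['NAME', 'a'].
        return [(head, children[0])]
    found = []
    for child in children:
        found.extend(leaves(child))
    return found


print(leaves(ARITH_EXPR))
# [('NAME', 'a'), ('PLUS', '+'), ('NUMBER', '1')]
```

<p>Everything between the root and those leaves is grammar bookkeeping, which is why three tokens produce such a deeply nested tree.</p>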
<h3 id="abstract-syntax-trees">Abstract Syntax Trees</h3>
<p>The next stage in the CPython interpreter is to convert the CST generated by the parser into something more logical that can be executed. The structure is a higher-level representation of the code, called an Abstract Syntax Tree (AST).</p>
<p>ASTs are produced inline with the CPython interpreter process, but you can also generate them from Python, using the <code>ast</code> module in the standard library, or through the C API.</p>
<p>Before diving into the C implementation of the AST, it would be useful to understand what an AST looks like for a simple piece of Python code.</p>
<p>To do this, here’s a simple app called <code>instaviz</code> for this tutorial. It displays the AST and bytecode instructions (which we’ll cover later) in a Web UI.</p>
<p>To install <code>instaviz</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install instaviz
</pre></div>
<p>Then, open up a REPL by running <code>python</code> at the command line with no arguments:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">instaviz</span>
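<span class="gp">>>> </span><span class="k">def</span> <span class="nf">example</span><span class="p">():</span>
<span class="gp">... </span>    <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span>
<span class="gp">... </span>    <span class="n">b</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">1</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="n">b</span>
<span class="gp">...</span>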
<span class="gp">>>> </span><span class="n">instaviz</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">example</span><span class="p">)</span>
</pre></div>
<p>You’ll see a notification on the command line that a web server has started on port <code>8080</code>. If you are already using that port for something else, you can change it by calling <code>instaviz.show(example, port=9090)</code> or passing another port number.</p>
<p>In the web browser, you can see the detailed breakdown of your function:</p>
<p><a href="https://files.realpython.com/media/screenshot.e148c89e3a9a.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/screenshot.e148c89e3a9a.png" width="4802" height="2566" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/screenshot.e148c89e3a9a.png&w=1200&sig=eeb7b21839b90e3726ea5db7bfa0aac05730b1fe 1200w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/screenshot.e148c89e3a9a.png&w=2401&sig=aed4aed922bbe4ef9c665a90c9244ebaa950e032 2401w, https://files.realpython.com/media/screenshot.e148c89e3a9a.png 4802w" sizes="75vw" alt="Instaviz screenshot"/></a></p>
<p>The bottom-left graph is the function you declared in the REPL, represented as an Abstract Syntax Tree. Each node in the tree is an AST type. These types are found in the <code>ast</code> module, and all inherit from <code>_ast.AST</code>.</p>
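<p>If you would rather check this without a browser, the standard library can list the same node types. This is a minimal sketch; <code>node_type_names()</code> is a hypothetical helper, and the exact set of names printed depends on your Python version:</p>

```python
import ast

SOURCE = '''
def example():
    a = 1
    b = a + 1
    return b
'''


def node_type_names(source):
    """Return the class name of every node in the AST, root first."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


names = node_type_names(SOURCE)
print(sorted(set(names)))

# Every node yielded by ast.walk() is an instance of ast.AST.
all_ast = all(isinstance(node, ast.AST) for node in ast.walk(ast.parse(SOURCE)))
print(all_ast)
```

<p>The list should include <code>Module</code>, <code>FunctionDef</code>, <code>Assign</code>, <code>BinOp</code>, and <code>Return</code>, which are the same node names you can see in the graph.</p>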
<p>Some nodes have named properties that link them to child nodes, unlike CST nodes, which have a single generic child-node property.</p>
<p>For example, if you click on the Assign node in the center, this links to the line <code>b = a + 1</code>:</p>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.17_pm.a5df8d873988.png" target="_blank"><img class="img-fluid mx-auto d-block border w-75" src="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.17_pm.a5df8d873988.png" width="2226" height="1596" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.17_pm.a5df8d873988.png&w=556&sig=6a6f30034c85f411bd6c159df8aef50e899dda9c 556w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.17_pm.a5df8d873988.png&w=1113&sig=1b85cbf0bec4b114ef52d8a7bcdbfa08a1519237 1113w, https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.17_pm.a5df8d873988.png 2226w" sizes="75vw" alt="Instaviz screenshot 2"/></a></p>
<p>It has two properties:</p>
<ol>
<li><strong><code>targets</code></strong> is a list of names to assign. It is a list because you can assign to multiple variables with a single expression using unpacking.</li>
<li><strong><code>value</code></strong> is the value to assign, which in this case is a <code>BinOp</code> expression, <code>a + 1</code>.</li>
</ol>
<p>If you click on the <code>BinOp</code> node, it shows the relevant properties:</p>
<ul>
<li><strong><code>left</code>:</strong> the node to the left of the operator</li>
<li><strong><code>op</code>:</strong> the operator, in this case, an <code>Add</code> node (<code>+</code>) for addition</li>
<li><strong><code>right</code>:</strong> the node to the right of the operator</li>
</ul>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.37_pm.21a11b49a820.png" target="_blank"><img class="img-fluid mx-auto d-block border w-75" src="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.37_pm.21a11b49a820.png" width="1708" height="932" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.37_pm.21a11b49a820.png&w=427&sig=381d8bee4cc98dc98031abc6bc34ec376fd452b7 427w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.37_pm.21a11b49a820.png&w=854&sig=ccc0b88e9d762b2298a33ca2a03e826ee13f4193 854w, https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.37_pm.21a11b49a820.png 1708w" sizes="75vw" alt="Instaviz screenshot 3"/></a></p>
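<p>The same properties can be read directly with the <code>ast</code> module. The sketch below assumes Python 3.8 or later, where a numeric literal is an <code>ast.Constant</code> node with a <code>value</code> attribute; the <code>summary</code> dictionary is just an illustrative way to collect the results:</p>

```python
import ast

tree = ast.parse("b = a + 1")
assign = tree.body[0]   # the only statement in the module
binop = assign.value    # the right-hand side of the assignment

summary = {
    "statement": type(assign).__name__,
    "targets": [target.id for target in assign.targets],
    "value": type(binop).__name__,
    "left": binop.left.id,
    "op": type(binop.op).__name__,
    "right": binop.right.value,
}
print(summary)
# {'statement': 'Assign', 'targets': ['b'], 'value': 'BinOp',
#  'left': 'a', 'op': 'Add', 'right': 1}
```

<p>Each child is reached through a named attribute, such as <code>targets</code>, <code>value</code>, <code>left</code>, or <code>right</code>, rather than by position in a generic list of children.</p>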
<p>Compiling an AST in C is not a straightforward task, so the <code>Python/ast.c</code> module is over 5000 lines of code.</p>
<p>There are a few entry points, forming part of the AST’s public API. In the last section on the lexer and parser, you stopped when you’d reached the call to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L772"><code>PyAST_FromNodeObject()</code></a>. By this stage, the Python interpreter process had created a CST in the form of a <code>node *</code> tree.</p>
<p>Jumping into <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L772"><code>PyAST_FromNodeObject()</code></a> inside <code>Python/ast.c</code>, you can see it receives the <code>node *</code> tree, the filename, compiler flags, and the <code>PyArena</code>.</p>
<p>The return type from this function is <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/ast.h#L10"><code>mod_ty</code></a>, defined in <code>Include/Python-ast.h</code>. <code>mod_ty</code> is a container structure for one of the 5 module types in Python:</p>
<ol>
<li><code>Module</code></li>
<li><code>Interactive</code></li>
<li><code>Expression</code></li>
<li><code>FunctionType</code></li>
<li><code>Suite</code></li>
</ol>
<p>In <code>Include/Python-ast.h</code> you can see that an <code>Expression</code> type requires a field <code>body</code>, which is an <code>expr_ty</code> type. The <code>expr_ty</code> type is also defined in <code>Include/Python-ast.h</code>:</p>
<div class="highlight c"><pre><span></span><span class="k">enum</span> <span class="n">_mod_kind</span> <span class="p">{</span><span class="n">Module_kind</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">Interactive_kind</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">Expression_kind</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
<span class="n">FunctionType_kind</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">Suite_kind</span><span class="o">=</span><span class="mi">5</span><span class="p">};</span>
<span class="k">struct</span> <span class="n">_mod</span> <span class="p">{</span>
<span class="k">enum</span> <span class="n">_mod_kind</span> <span class="n">kind</span><span class="p">;</span>
<span class="k">union</span> <span class="p">{</span>
<span class="k">struct</span> <span class="p">{</span>
<span class="n">asdl_seq</span> <span class="o">*</span><span class="n">body</span><span class="p">;</span>
<span class="n">asdl_seq</span> <span class="o">*</span><span class="n">type_ignores</span><span class="p">;</span>
<span class="p">}</span> <span class="n">Module</span><span class="p">;</span>
<span class="k">struct</span> <span class="p">{</span>
<span class="n">asdl_seq</span> <span class="o">*</span><span class="n">body</span><span class="p">;</span>
<span class="p">}</span> <span class="n">Interactive</span><span class="p">;</span>
<span class="k">struct</span> <span class="p">{</span>
<span class="n">expr_ty</span> <span class="n">body</span><span class="p">;</span>
<span class="p">}</span> <span class="n">Expression</span><span class="p">;</span>
<span class="k">struct</span> <span class="p">{</span>
<span class="n">asdl_seq</span> <span class="o">*</span><span class="n">argtypes</span><span class="p">;</span>
<span class="n">expr_ty</span> <span class="n">returns</span><span class="p">;</span>
<span class="p">}</span> <span class="n">FunctionType</span><span class="p">;</span>
<span class="k">struct</span> <span class="p">{</span>
<span class="n">asdl_seq</span> <span class="o">*</span><span class="n">body</span><span class="p">;</span>
<span class="p">}</span> <span class="n">Suite</span><span class="p">;</span>
<span class="p">}</span> <span class="n">v</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
<p>The AST types are all listed in <code>Parser/Python.asdl</code>. You will see the module types, statement types, expression types, operators, and comprehensions all listed. The names of the types in this document relate to the classes generated by the AST and the same classes named in the <code>ast</code> standard library module.</p>
<p>The parameters and names in <code>Include/Python-ast.h</code> correlate directly to those specified in <code>Parser/Python.asdl</code>:</p>
<div class="highlight text"><pre><span></span>-- ASDL's 5 builtin types are:
-- identifier, int, string, object, constant
module Python
{
mod = Module(stmt* body, type_ignore *type_ignores)
| Interactive(stmt* body)
<span class="hll"> | Expression(expr body)
</span> | FunctionType(expr* argtypes, expr returns)
</pre></div>
<p>The C header file and structures are there so that the <code>Python/ast.c</code> program can quickly generate the structures with pointers to the relevant data.</p>
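<p>You can see the correspondence between the ASDL definitions and the generated classes from Python. The following sketch assumes Python 3.8 or later, which is where <code>type_ignores</code> and the <code>func_type</code> mode of <code>ast.parse()</code> are available; <code>top_level_type()</code> is a hypothetical helper name:</p>

```python
import ast


def top_level_type(source, mode):
    """Return the name of the module-level node ast.parse() produces."""
    return type(ast.parse(source, mode=mode)).__name__


kinds = {
    "exec": top_level_type("a = 1\n", "exec"),
    "single": top_level_type("a = 1\n", "single"),
    "eval": top_level_type("a + 1", "eval"),
    "func_type": top_level_type("(int, str) -> bool", "func_type"),
}
print(kinds)
# {'exec': 'Module', 'single': 'Interactive', 'eval': 'Expression',
#  'func_type': 'FunctionType'}

# The field names on each class match the ASDL definition of `mod`.
print(ast.Module._fields)        # ('body', 'type_ignores')
print(ast.Expression._fields)    # ('body',)
print(ast.FunctionType._fields)  # ('argtypes', 'returns')
```

<p>The <code>mode</code> argument selects which start symbol the parser uses, and therefore which of the module types ends up at the root of the tree.</p>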
<p>Looking at <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L772"><code>PyAST_FromNodeObject()</code></a> you can see that it is essentially a <code>switch</code> statement around the result from <code>TYPE(n)</code>. <code>TYPE()</code> is one of the core macros used by the AST to determine what type a node in the concrete syntax tree is. In the case of <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L772"><code>PyAST_FromNodeObject()</code></a> it’s just looking at the first node, so the result can only be one of the module types: <code>Module</code>, <code>Interactive</code>, <code>Expression</code>, or <code>FunctionType</code>.</p>
<p>The result of <code>TYPE()</code> will be either a symbol or token type, which we’re very familiar with by this stage.</p>
<p>For <code>file_input</code>, the result should be a <code>Module</code>. Modules are a series of statements, of which there are a few types. The logic to traverse the children of <code>n</code> and create statement nodes is within <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L4512"><code>ast_for_stmt()</code></a>. This function is called either once, if there is only one statement in the module, or in a loop if there are many. The resulting <code>Module</code> is then returned with the <code>PyArena</code>.</p>
<p>For <code>eval_input</code>, the result should be an <code>Expression</code>. The result from <code>CHILD(n, 0)</code>, which is the first child of <code>n</code>, is passed to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L3246"><code>ast_for_testlist()</code></a>, which returns an <code>expr_ty</code> type. This <code>expr_ty</code> is sent to <code>Expression()</code> with the <code>PyArena</code> to create an expression node, which is then passed back as the result:</p>
<div class="highlight c"><pre><span></span><span class="n">mod_ty</span>
<span class="nf">PyAST_FromNodeObject</span><span class="p">(</span><span class="k">const</span> <span class="n">node</span> <span class="o">*</span><span class="n">n</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyArena</span> <span class="o">*</span><span class="n">arena</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">TYPE</span><span class="p">(</span><span class="n">n</span><span class="p">))</span> <span class="p">{</span>
<span class="k">case</span> <span class="nl">file_input</span><span class="p">:</span>
<span class="n">stmts</span> <span class="o">=</span> <span class="n">_Py_asdl_seq_new</span><span class="p">(</span><span class="n">num_stmts</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="n">arena</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">stmts</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">NCH</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="n">ch</span> <span class="o">=</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">TYPE</span><span class="p">(</span><span class="n">ch</span><span class="p">)</span> <span class="o">==</span> <span class="n">NEWLINE</span><span class="p">)</span>
<span class="k">continue</span><span class="p">;</span>
<span class="n">REQ</span><span class="p">(</span><span class="n">ch</span><span class="p">,</span> <span class="n">stmt</span><span class="p">);</span>
<span class="n">num</span> <span class="o">=</span> <span class="n">num_stmts</span><span class="p">(</span><span class="n">ch</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">num</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="hll"> <span class="n">s</span> <span class="o">=</span> <span class="n">ast_for_stmt</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">,</span> <span class="n">ch</span><span class="p">);</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">s</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
<span class="n">asdl_seq_SET</span><span class="p">(</span><span class="n">stmts</span><span class="p">,</span> <span class="n">k</span><span class="o">++</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="p">{</span>
<span class="n">ch</span> <span class="o">=</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">ch</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">REQ</span><span class="p">(</span><span class="n">ch</span><span class="p">,</span> <span class="n">simple_stmt</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">num</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="hll"> <span class="n">s</span> <span class="o">=</span> <span class="n">ast_for_stmt</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">,</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">ch</span><span class="p">,</span> <span class="n">j</span> <span class="o">*</span> <span class="mi">2</span><span class="p">));</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">s</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
<span class="n">asdl_seq_SET</span><span class="p">(</span><span class="n">stmts</span><span class="p">,</span> <span class="n">k</span><span class="o">++</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="cm">/* Type ignores are stored under the ENDMARKER in file_input. */</span>
<span class="p">...</span>
<span class="hll"> <span class="n">res</span> <span class="o">=</span> <span class="n">Module</span><span class="p">(</span><span class="n">stmts</span><span class="p">,</span> <span class="n">type_ignores</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="nl">eval_input</span><span class="p">:</span> <span class="p">{</span>
<span class="n">expr_ty</span> <span class="n">testlist_ast</span><span class="p">;</span>
<span class="cm">/* XXX Why not comp_for here? */</span>
<span class="hll"> <span class="n">testlist_ast</span> <span class="o">=</span> <span class="n">ast_for_testlist</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">,</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">testlist_ast</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
<span class="hll"> <span class="n">res</span> <span class="o">=</span> <span class="n">Expression</span><span class="p">(</span><span class="n">testlist_ast</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">case</span> <span class="nl">single_input</span><span class="p">:</span>
<span class="p">...</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="nl">func_type_input</span><span class="p">:</span>
<span class="p">...</span>
<span class="p">...</span>
<span class="k">return</span> <span class="n">res</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
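<p>The three input types handled by this <code>switch</code> can also be observed from Python. The following is a minimal sketch, assuming the standard library <code>ast</code> module and its <code>mode</code> argument, that shows which root node each mode produces. The <code>ROOT_TYPES</code> name is only for illustration:</p>

```python
import ast

# Each mode corresponds to one grammar start symbol in the C switch:
# "exec" is file_input, "eval" is eval_input, "single" is single_input.
ROOT_TYPES = {
    mode: type(ast.parse("2**4", mode=mode)).__name__
    for mode in ("exec", "eval", "single")
}

for mode, root in ROOT_TYPES.items():
    print(mode, root)
```

<p>The root node differs per mode, but the expression underneath is built by the same <code>ast_for_*()</code> functions in each case.</p>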
<p>Inside the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L4512"><code>ast_for_stmt()</code></a> function, there is another <code>switch</code> statement for each possible statement type (<code>simple_stmt</code>, <code>compound_stmt</code>, and so on) and the code to determine the arguments to the node class.</p>
<p>One of the simpler functions handles the power expression, for example <code>2**4</code>, which is 2 to the power of 4. This function starts by calling <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L2770"><code>ast_for_atom_expr()</code></a> to get the left-hand atom, which is the number <code>2</code> in our example. If the node has only one child, it returns that atomic expression. If the last child is a <code>factor</code>, it gets the right-hand expression (the number <code>4</code>) and returns a <code>BinOp</code> (binary operation) with the operator <code>Pow</code> (power), the left-hand side <code>e</code> (2), and the right-hand side <code>f</code> (4):</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">expr_ty</span>
<span class="nf">ast_for_power</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiling</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="k">const</span> <span class="n">node</span> <span class="o">*</span><span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
<span class="cm">/* power: atom trailer* ('**' factor)*</span>
<span class="cm"> */</span>
<span class="n">expr_ty</span> <span class="n">e</span><span class="p">;</span>
<span class="n">REQ</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">power</span><span class="p">);</span>
<span class="n">e</span> <span class="o">=</span> <span class="n">ast_for_atom_expr</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">e</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">NCH</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="n">e</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">TYPE</span><span class="p">(</span><span class="n">CHILD</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">NCH</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span> <span class="o">==</span> <span class="n">factor</span><span class="p">)</span> <span class="p">{</span>
<span class="n">expr_ty</span> <span class="n">f</span> <span class="o">=</span> <span class="n">ast_for_expr</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">NCH</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">f</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">e</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="n">Pow</span><span class="p">,</span> <span class="n">f</span><span class="p">,</span> <span class="n">LINENO</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="n">n</span><span class="o">-></span><span class="n">n_col_offset</span><span class="p">,</span>
<span class="n">n</span><span class="o">-></span><span class="n">n_end_lineno</span><span class="p">,</span> <span class="n">n</span><span class="o">-></span><span class="n">n_end_col_offset</span><span class="p">,</span> <span class="n">c</span><span class="o">-></span><span class="n">c_arena</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">e</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>You can see the result of this if you send a short function to the <code>instaviz</code> module:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
<span class="go"> 2**4</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">instaviz</span>
<span class="gp">>>> </span><span class="n">instaviz</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span>
</pre></div>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.34.51_pm.c3a1e8d717f5.png" target="_blank"><img class="img-fluid mx-auto d-block border w-75" src="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.34.51_pm.c3a1e8d717f5.png" width="1708" height="1094" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_5.34.51_pm.c3a1e8d717f5.png&w=427&sig=68167794b71cec6447aa8fcb4d22ca738a2e2ce4 427w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_5.34.51_pm.c3a1e8d717f5.png&w=854&sig=feeef6a7ce0b14d39c57956f77320125bcce43bf 854w, https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.34.51_pm.c3a1e8d717f5.png 1708w" sizes="75vw" alt="Instaviz screenshot 4"/></a></p>
<p>In the UI you can also see the corresponding properties:</p>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.36.34_pm.0067235460de.png" target="_blank"><img class="img-fluid mx-auto d-block border w-75" src="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.36.34_pm.0067235460de.png" width="1708" height="630" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_5.36.34_pm.0067235460de.png&w=427&sig=91663308ef854874e262756b2a28e598d263fff5 427w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_5.36.34_pm.0067235460de.png&w=854&sig=fbfcdd33bcf685f91ebe0559b4135ead2b296d7e 854w, https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.36.34_pm.0067235460de.png 1708w" sizes="75vw" alt="Instaviz screenshot 5"/></a></p>
<p>In summary, each statement type and expression has a corresponding <code>ast_for_*()</code> function to create it. The arguments are defined in <code>Parser/Python.asdl</code> and exposed via the <code>ast</code> module in the standard library. If an expression or statement has children, then it will call the corresponding <code>ast_for_*</code> child function in a depth-first traversal.</p>
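<p>If you don’t have <code>instaviz</code> installed, a rough equivalent is to inspect the node with the <code>ast</code> module directly. This sketch assumes only the standard library and reuses the <code>2**4</code> expression from above:</p>

```python
import ast

# mode="eval" yields an ast.Expression whose body is the expression node.
node = ast.parse("2**4", mode="eval").body

print(type(node).__name__)     # node class built for the power expression
print(type(node.op).__name__)  # operator stored on the node
print(node._fields)            # field names, as declared in Python.asdl

# The operands are child nodes; literal_eval turns them back into values.
left = ast.literal_eval(node.left)
right = ast.literal_eval(node.right)
print(left, right)
```

<p>The field names on the node correspond to the arguments that <code>BinOp()</code> receives in the C code.</p>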
<h3 id="conclusion_1">Conclusion</h3>
<p>CPython’s versatility and low-level execution API make it the ideal candidate for an embedded scripting engine. You will see CPython used in many UI applications, such as game design, 3D graphics, and system automation.</p>
<p>The interpreter process is flexible and efficient, and now that you understand how it works, you’re ready to understand the compiler.</p>
<h2 h1="h1" id="part-3-the-cpython-compiler-and-execution-loop">Part 3: The CPython Compiler and Execution Loop</h2>
<p>In Part 2, you saw how the CPython interpreter takes an input, such as a file or string, and converts it into a logical Abstract Syntax Tree. We’re still not at the stage where this code can be executed. Next, we have to go deeper to convert the Abstract Syntax Tree into a set of sequential commands that the interpreter can execute.</p>
<h3 id="compiling">Compiling</h3>
<p>Now the interpreter has an AST with the properties required for each of the operations, functions, classes, and namespaces. It is the job of the compiler to turn the AST into instructions that CPython’s evaluation loop can execute.</p>
<p>This compilation task is split into two parts:</p>
<ol>
<li>Traverse the tree and create a control-flow-graph, which represents the logical sequence for execution</li>
<li>Convert the nodes in the CFG to smaller, executable statements, known as byte-code</li>
</ol>
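<p>The end product of these two parts can be inspected from Python. This is a small sketch, assuming the built-in <code>compile()</code> function and the standard library <code>dis</code> module. The filename <code>demo.py</code> is a placeholder, and the exact instruction names depend on the CPython version:</p>

```python
import dis

# Compile a small expression to a code object. The raw byte-code lives in
# co_code, and dis decodes it into named instructions.
code = compile("a + b", "demo.py", "eval")
opnames = [instr.opname for instr in dis.get_instructions(code)]

print(len(code.co_code), "bytes of byte-code")
print(opnames)  # instruction names vary between CPython versions
result = eval(code, {"a": 1, "b": 2})
print(result)
```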
<p>Earlier, we looked at how files are executed and at the <code>PyRun_FileExFlags()</code> function in <code>Python/pythonrun.c</code>. Inside this function, the <code>FILE</code> handle is converted into a <code>mod</code>, of type <code>mod_ty</code>. This task is completed by <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1369"><code>PyParser_ASTFromFileObject()</code></a>, which in turn calls the <code>tokenizer</code>, the <code>parser-tokenizer</code>, and then the AST builder:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">PyRun_FileExFlags</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">filename_str</span><span class="p">,</span> <span class="kt">int</span> <span class="n">start</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span> <span class="kt">int</span> <span class="n">closeit</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="hll"> <span class="n">mod</span> <span class="o">=</span> <span class="n">PyParser_ASTFromFileObject</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span>
</span> <span class="p">...</span>
<span class="hll"> <span class="n">ret</span> <span class="o">=</span> <span class="n">run_mod</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
</span><span class="p">}</span>
</pre></div>
<p>The resulting module from the call to <code>PyParser_ASTFromFileObject()</code> is sent to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1125"><code>run_mod()</code></a>, still in <code>Python/pythonrun.c</code>. This is a small function that gets a <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/code.h#L69"><code>PyCodeObject</code></a> from <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a> and sends it on to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a>. You will tackle <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a> in the next section:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">run_mod</span><span class="p">(</span><span class="n">mod_ty</span> <span class="n">mod</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span>
<span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">,</span> <span class="n">PyArena</span> <span class="o">*</span><span class="n">arena</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span><span class="p">;</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
<span class="hll"> <span class="n">co</span> <span class="o">=</span> <span class="n">PyAST_CompileObject</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="n">co</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PySys_Audit</span><span class="p">(</span><span class="s">"exec"</span><span class="p">,</span> <span class="s">"O"</span><span class="p">,</span> <span class="n">co</span><span class="p">)</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Py_DECREF</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="hll"> <span class="n">v</span> <span class="o">=</span> <span class="n">run_eval_code_obj</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">);</span>
</span> <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
<span class="k">return</span> <span class="n">v</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
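<p>The same parse, compile, and run sequence can be approximated in pure Python. The sketch below is only an analogy, not how <code>run_mod()</code> is implemented. The <code>run_source()</code> helper and the <code>demo.py</code> filename are hypothetical names chosen for illustration:</p>

```python
import ast


def run_source(source, filename, namespace):
    """Hypothetical helper mirroring the parse, compile, run sequence."""
    tree = ast.parse(source, filename=filename, mode="exec")  # AST module
    code = compile(tree, filename, "exec")                    # code object
    exec(code, namespace)                                     # evaluation
    return code


namespace = {}
code = run_source("total = sum(range(5))", "demo.py", namespace)
print(type(code).__name__, namespace["total"])
```

<p>Here <code>compile()</code> plays the role of <code>PyAST_CompileObject()</code>, and <code>exec()</code> stands in for <code>run_eval_code_obj()</code>.</p>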
<p>The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a> function is the main entry point to the CPython compiler. It takes a Python AST module as its primary argument, along with the name of the file, the compiler flags, the optimization level, and the <code>PyArena</code> created earlier in the interpreter process.</p>
<p>We’re starting to get into the guts of the CPython compiler now, with decades of development and Computer Science theory behind it. Don’t be put off by the language. Once we break down the compiler into logical steps, it’ll make sense.</p>
<p>Before the compiler starts, a global compiler state is created. This type, <code>compiler</code>, is defined in <code>Python/compile.c</code> and contains properties used by the compiler to remember the compiler flags, the stack, and the <code>PyArena</code>:</p>
<div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">compiler</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">c_filename</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">symtable</span> <span class="o">*</span><span class="n">c_st</span><span class="p">;</span>
<span class="n">PyFutureFeatures</span> <span class="o">*</span><span class="n">c_future</span><span class="p">;</span> <span class="cm">/* pointer to module's __future__ */</span>
<span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">c_flags</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">c_optimize</span><span class="p">;</span> <span class="cm">/* optimization level */</span>
<span class="kt">int</span> <span class="n">c_interactive</span><span class="p">;</span> <span class="cm">/* true if in interactive mode */</span>
<span class="kt">int</span> <span class="n">c_nestlevel</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">c_do_not_emit_bytecode</span><span class="p">;</span> <span class="cm">/* The compiler won't emit any bytecode</span>
<span class="cm"> if this value is different from zero.</span>
<span class="cm"> This can be used to temporarily visit</span>
<span class="cm"> nodes without emitting bytecode to</span>
<span class="cm"> check only errors. */</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">c_const_cache</span><span class="p">;</span> <span class="cm">/* Python dict holding all constants,</span>
<span class="cm"> including names tuple */</span>
<span class="k">struct</span> <span class="n">compiler_unit</span> <span class="o">*</span><span class="n">u</span><span class="p">;</span> <span class="cm">/* compiler state for current block */</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">c_stack</span><span class="p">;</span> <span class="cm">/* Python list holding compiler_unit ptrs */</span>
<span class="n">PyArena</span> <span class="o">*</span><span class="n">c_arena</span><span class="p">;</span> <span class="cm">/* pointer to memory allocation arena */</span>
<span class="p">};</span>
</pre></div>
<p>Inside <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a>, there are 11 main steps happening:</p>
<ol>
<li>Create an empty <code>__doc__</code> property on the module if it doesn’t exist.</li>
<li>Create an empty <code>__annotations__</code> property on the module if it doesn’t exist.</li>
<li>Set the filename of the global compiler state to the filename argument.</li>
<li>Set the memory allocation arena for the compiler to the one used by the interpreter.</li>
<li>Copy any <code>__future__</code> flags in the module to the future flags in the compiler.</li>
<li>Merge runtime flags provided by the command-line or environment variables.</li>
<li>Enable any <code>__future__</code> features in the compiler.</li>
<li>Set the optimization level to the provided argument, or default.</li>
<li>Build a symbol table from the module object.</li>
<li>Run the compiler with the compiler state and return the code object.</li>
<li>Free any memory allocated by the compiler.</li>
</ol>
<div class="highlight c"><pre><span></span><span class="n">PyCodeObject</span> <span class="o">*</span>
<span class="nf">PyAST_CompileObject</span><span class="p">(</span><span class="n">mod_ty</span> <span class="n">mod</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">optimize</span><span class="p">,</span> <span class="n">PyArena</span> <span class="o">*</span><span class="n">arena</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">compiler</span> <span class="n">c</span><span class="p">;</span>
<span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">PyCompilerFlags</span> <span class="n">local_flags</span> <span class="o">=</span> <span class="n">_PyCompilerFlags_INIT</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">merged</span><span class="p">;</span>
<span class="n">PyConfig</span> <span class="o">*</span><span class="n">config</span> <span class="o">=</span> <span class="o">&</span><span class="n">_PyInterpreterState_GET_UNSAFE</span><span class="p">()</span><span class="o">-></span><span class="n">config</span><span class="p">;</span>
<span class="hll">
</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">__doc__</span><span class="p">)</span> <span class="p">{</span>
<span class="n">__doc__</span> <span class="o">=</span> <span class="n">PyUnicode_InternFromString</span><span class="p">(</span><span class="s">"__doc__"</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">__doc__</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">__annotations__</span><span class="p">)</span> <span class="p">{</span>
</span> <span class="n">__annotations__</span> <span class="o">=</span> <span class="n">PyUnicode_InternFromString</span><span class="p">(</span><span class="s">"__annotations__"</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">__annotations__</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">compiler_init</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">))</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="hll"> <span class="n">Py_INCREF</span><span class="p">(</span><span class="n">filename</span><span class="p">);</span>
</span><span class="hll"> <span class="n">c</span><span class="p">.</span><span class="n">c_filename</span> <span class="o">=</span> <span class="n">filename</span><span class="p">;</span>
</span><span class="hll"> <span class="n">c</span><span class="p">.</span><span class="n">c_arena</span> <span class="o">=</span> <span class="n">arena</span><span class="p">;</span>
</span> <span class="n">c</span><span class="p">.</span><span class="n">c_future</span> <span class="o">=</span> <span class="n">PyFuture_FromASTObject</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">c_future</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">finally</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">flags</span><span class="p">)</span> <span class="p">{</span>
<span class="n">flags</span> <span class="o">=</span> <span class="o">&</span><span class="n">local_flags</span><span class="p">;</span>
<span class="hll"> <span class="p">}</span>
</span><span class="hll"> <span class="n">merged</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">c_future</span><span class="o">-></span><span class="n">ff_features</span> <span class="o">|</span> <span class="n">flags</span><span class="o">-></span><span class="n">cf_flags</span><span class="p">;</span>
</span> <span class="n">c</span><span class="p">.</span><span class="n">c_future</span><span class="o">-></span><span class="n">ff_features</span> <span class="o">=</span> <span class="n">merged</span><span class="p">;</span>
<span class="n">flags</span><span class="o">-></span><span class="n">cf_flags</span> <span class="o">=</span> <span class="n">merged</span><span class="p">;</span>
<span class="hll"> <span class="n">c</span><span class="p">.</span><span class="n">c_flags</span> <span class="o">=</span> <span class="n">flags</span><span class="p">;</span>
</span> <span class="n">c</span><span class="p">.</span><span class="n">c_optimize</span> <span class="o">=</span> <span class="p">(</span><span class="n">optimize</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">?</span> <span class="n">config</span><span class="o">-></span><span class="nl">optimization_level</span> <span class="p">:</span> <span class="n">optimize</span><span class="p">;</span>
<span class="n">c</span><span class="p">.</span><span class="n">c_nestlevel</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">c</span><span class="p">.</span><span class="n">c_do_not_emit_bytecode</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">_PyAST_Optimize</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">arena</span><span class="p">,</span> <span class="n">c</span><span class="p">.</span><span class="n">c_optimize</span><span class="p">))</span> <span class="p">{</span>
<span class="k">goto</span> <span class="n">finally</span><span class="p">;</span>
<span class="hll"> <span class="p">}</span>
</span>
<span class="n">c</span><span class="p">.</span><span class="n">c_st</span> <span class="o">=</span> <span class="n">PySymtable_BuildObject</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">c</span><span class="p">.</span><span class="n">c_future</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">c_st</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">PyErr_Occurred</span><span class="p">())</span>
<span class="n">PyErr_SetString</span><span class="p">(</span><span class="n">PyExc_SystemError</span><span class="p">,</span> <span class="s">"no symtable"</span><span class="p">);</span>
<span class="k">goto</span> <span class="n">finally</span><span class="p">;</span>
<span class="hll"> <span class="p">}</span>
</span>
<span class="n">co</span> <span class="o">=</span> <span class="n">compiler_mod</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">,</span> <span class="n">mod</span><span class="p">);</span>
<span class="nl">finally</span><span class="p">:</span>
<span class="n">compiler_free</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">);</span>
<span class="n">assert</span><span class="p">(</span><span class="n">co</span> <span class="o">||</span> <span class="n">PyErr_Occurred</span><span class="p">());</span>
<span class="k">return</span> <span class="n">co</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<h4 id="future-flags-and-compiler-flags">Future Flags and Compiler Flags</h4>
<p>Before the compiler runs, there are two types of flags to toggle the features inside the compiler. These come from two places:</p>
<ol>
<li>The interpreter state, which may have been set by command-line options, in <code>pyconfig.h</code>, or via environment variables</li>
<li>The use of <code>__future__</code> statements inside the actual source code of the module</li>
</ol>
<p>To distinguish the two types of flags, remember that the <code>__future__</code> flags are required because of the syntax or features used in that specific module. For example, Python 3.7 introduced delayed evaluation of type hints through the <code>annotations</code> future flag:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">annotations</span>
</pre></div>
<p>The code after this statement might use unresolved type hints, so the <code>__future__</code> statement is required. Otherwise, the module wouldn’t import. It would be unmaintainable to manually request that the person importing the module enable this specific compiler flag.</p>
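<p>As a minimal sketch of how a future flag reaches the compiler, the snippet below passes the flag to the built-in <code>compile()</code> instead of writing the <code>__future__</code> statement in the source. The source string, the <code>greet()</code> function, and the <code>UndefinedType</code> name are made up for illustration:</p>

```python
import __future__

# Source with an annotation that names a type which is never defined.
source = "def greet(user: UndefinedType):\n    return user\n"

# Every feature in the __future__ module carries the compiler flag that enables it.
flag = __future__.annotations.compiler_flag

# Passing the flag to compile() has the same effect as writing
# "from __future__ import annotations" at the top of the source.
code = compile(source, "demo.py", "exec", flags=flag)

namespace = {}
exec(code, namespace)

# The annotation is stored as a string instead of being evaluated.
print(namespace["greet"].__annotations__)
```

<p>Because the annotation is kept as a string, the undefined name never has to be resolved when the function is defined.</p>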
<p>The other compiler flags are specific to the environment, so they might change the way the code executes or the way the compiler runs, but they shouldn’t link to the source in the same way that <code>__future__</code> statements do.</p>
<p>One example of a compiler flag would be the <a href="https://docs.python.org/3/using/cmdline.html#cmdoption-o"><code>-O</code> flag for optimizing the use of <code>assert</code> statements</a>. This flag disables any <code>assert</code> statements, which may have been put in the code for <a href="https://realpython.com/python-debugging-pdb/">debugging purposes</a>.
It can also be enabled with the <code>PYTHONOPTIMIZE=1</code> environment variable setting.</p>
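<p>You can observe the same effect without restarting the interpreter, because the built-in <code>compile()</code> accepts an <code>optimize</code> argument. The following is a small sketch with an invented source string, not a look at the flag handling inside CPython:</p>

```python
# A tiny module whose first statement is an assertion that always fails.
source = "assert False, 'debug check failed'\nresult = 'finished'\n"

# optimize=0 keeps assert statements, optimize=1 behaves like the -O flag.
debug_code = compile(source, "demo.py", "exec", optimize=0)
optimized_code = compile(source, "demo.py", "exec", optimize=1)

# The optimized code object runs to completion because the assert is gone.
namespace = {}
exec(optimized_code, namespace)
print(namespace["result"])

# The unoptimized code object still raises AssertionError.
try:
    exec(debug_code, {})
    assert_fired = False
except AssertionError:
    assert_fired = True
print(assert_fired)
```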
<h4 id="symbol-tables">Symbol Tables</h4>
<p>In <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a> there was a reference to a <code>symtable</code> and a call to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L262"><code>PySymtable_BuildObject()</code></a> with the module to be executed.</p>
<p>The purpose of the symbol table is to provide a list of namespaces, globals, and locals for the compiler to use for referencing and resolving scopes.</p>
<p>The <code>symtable</code> structure in <code>Include/symtable.h</code> is well documented, so it’s clear what each of the fields is for. There should be one symtable instance for the compiler, so namespacing becomes essential. </p>
<p>If you create a function called <code>resolve_names()</code> in one module and declare another function with the same name in another module, you want to be sure which one is called. The symtable serves this purpose, as well as ensuring that variables declared within a narrow scope don’t automatically become globals (after all, this isn’t JavaScript):</p>
<div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">symtable</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">st_filename</span><span class="p">;</span> <span class="cm">/* name of file being compiled,</span>
<span class="cm"> decoded from the filesystem encoding */</span>
<span class="k">struct</span> <span class="n">_symtable_entry</span> <span class="o">*</span><span class="n">st_cur</span><span class="p">;</span> <span class="cm">/* current symbol table entry */</span>
<span class="k">struct</span> <span class="n">_symtable_entry</span> <span class="o">*</span><span class="n">st_top</span><span class="p">;</span> <span class="cm">/* symbol table entry for module */</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">st_blocks</span><span class="p">;</span> <span class="cm">/* dict: map AST node addresses</span>
<span class="cm"> * to symbol table entries */</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">st_stack</span><span class="p">;</span> <span class="cm">/* list: stack of namespace info */</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">st_global</span><span class="p">;</span> <span class="cm">/* borrowed ref to st_top->ste_symbols */</span>
<span class="kt">int</span> <span class="n">st_nblocks</span><span class="p">;</span> <span class="cm">/* number of blocks used. kept for</span>
<span class="cm"> consistency with the corresponding</span>
<span class="cm"> compiler structure */</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">st_private</span><span class="p">;</span> <span class="cm">/* name of current class or NULL */</span>
<span class="n">PyFutureFeatures</span> <span class="o">*</span><span class="n">st_future</span><span class="p">;</span> <span class="cm">/* module's future features that affect</span>
<span class="cm"> the symbol table */</span>
<span class="kt">int</span> <span class="n">recursion_depth</span><span class="p">;</span> <span class="cm">/* current recursion depth */</span>
<span class="kt">int</span> <span class="n">recursion_limit</span><span class="p">;</span> <span class="cm">/* recursion limit */</span>
<span class="p">};</span>
</pre></div>
<p>Some of the symbol table API is exposed via <a href="https://docs.python.org/3/library/symtable.html">the <code>symtable</code> module</a> in the standard library. You can provide an expression or a module and receive a <code>symtable.SymbolTable</code> instance.</p>
<p>You can provide a string with a Python expression and a <code>compile_type</code> of <code>"eval"</code>, or the source of a module, function, or class and a <code>compile_type</code> of <code>"exec"</code>, to get a symbol table.</p>
<p>Looping over the elements in the table, we can see some of the public and private fields and their types:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">symtable</span>
<span class="gp">>>> </span><span class="n">s</span> <span class="o">=</span> <span class="n">symtable</span><span class="o">.</span><span class="n">symtable</span><span class="p">(</span><span class="s1">'b + 1'</span><span class="p">,</span> <span class="n">filename</span><span class="o">=</span><span class="s1">'test.py'</span><span class="p">,</span> <span class="n">compile_type</span><span class="o">=</span><span class="s1">'eval'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="p">[</span><span class="n">symbol</span><span class="o">.</span><span class="vm">__dict__</span> <span class="k">for</span> <span class="n">symbol</span> <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">get_symbols</span><span class="p">()]</span>
<span class="go">[{'_Symbol__name': 'b', '_Symbol__flags': 6160, '_Symbol__scope': 3, '_Symbol__namespaces': ()}]</span>
</pre></div>
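<p>The public methods of the <code>symtable</code> module are usually easier to read than the private fields. Here is a short sketch that uses <code>"exec"</code> mode on a made-up module containing a <code>resolve_names()</code> function, and then asks the nested function table how each name was classified:</p>

```python
import symtable

# Hypothetical module source: one global and one function that reads it.
source = """
counter = 0

def resolve_names(prefix):
    label = prefix + str(counter)
    return label
"""

module_table = symtable.symtable(source, "demo.py", "exec")

# The function body gets its own child table below the module table.
function_table = module_table.get_children()[0]
print(function_table.get_name())

# Report how the symbol table classified each name used in the function.
for name in ("prefix", "label", "counter"):
    symbol = function_table.lookup(name)
    print(name, symbol.is_parameter(), symbol.is_local(), symbol.is_global())
```

<p>Here <code>label</code> is reported as local to the function, while <code>counter</code> is resolved as a global, which is exactly the scoping decision the compiler relies on later.</p>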
<p>The C code behind this is all within <code>Python/symtable.c</code> and the primary interface is the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L262"><code>PySymtable_BuildObject()</code></a> function.</p>
<p>Similar to the top-level AST function we covered earlier, the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L262"><code>PySymtable_BuildObject()</code></a> function switches between the <code>mod_ty</code> possible types (Module, Expression, Interactive, Suite, FunctionType), and visits each of the statements inside them.</p>
<p>Remember, <code>mod_ty</code> is an AST instance, so the function will now recursively explore the nodes and branches of the tree and add entries to the symtable:</p>
<div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">symtable</span> <span class="o">*</span>
<span class="nf">PySymtable_BuildObject</span><span class="p">(</span><span class="n">mod_ty</span> <span class="n">mod</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyFutureFeatures</span> <span class="o">*</span><span class="n">future</span><span class="p">)</span>
<span class="p">{</span>
<span class="hll"> <span class="k">struct</span> <span class="n">symtable</span> <span class="o">*</span><span class="n">st</span> <span class="o">=</span> <span class="n">symtable_new</span><span class="p">();</span>
</span> <span class="n">asdl_seq</span> <span class="o">*</span><span class="n">seq</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">i</span><span class="p">;</span>
<span class="n">PyThreadState</span> <span class="o">*</span><span class="n">tstate</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">recursion_limit</span> <span class="o">=</span> <span class="n">Py_GetRecursionLimit</span><span class="p">();</span>
<span class="p">...</span>
<span class="n">st</span><span class="o">-></span><span class="n">st_top</span> <span class="o">=</span> <span class="n">st</span><span class="o">-></span><span class="n">st_cur</span><span class="p">;</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">mod</span><span class="o">-></span><span class="n">kind</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="nl">Module_kind</span><span class="p">:</span>
<span class="n">seq</span> <span class="o">=</span> <span class="n">mod</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">Module</span><span class="p">.</span><span class="n">body</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">asdl_seq_LEN</span><span class="p">(</span><span class="n">seq</span><span class="p">);</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">symtable_visit_stmt</span><span class="p">(</span><span class="n">st</span><span class="p">,</span>
</span> <span class="p">(</span><span class="n">stmt_ty</span><span class="p">)</span><span class="n">asdl_seq_GET</span><span class="p">(</span><span class="n">seq</span><span class="p">,</span> <span class="n">i</span><span class="p">)))</span>
<span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="nl">Expression_kind</span><span class="p">:</span>
<span class="p">...</span>
<span class="k">case</span> <span class="nl">Interactive_kind</span><span class="p">:</span>
<span class="p">...</span>
<span class="k">case</span> <span class="nl">Suite_kind</span><span class="p">:</span>
<span class="p">...</span>
<span class="k">case</span> <span class="nl">FunctionType_kind</span><span class="p">:</span>
<span class="p">...</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="p">}</span>
</pre></div>
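<p>The kinds in this <code>switch</code> statement match the root node types you get back from the <code>ast</code> module in Python. As a rough illustration (the source strings are arbitrary), each parsing mode produces a different root node, and a module root carries the list of statements that the loop above walks over:</p>

```python
import ast

# Each parsing mode yields a different kind of root node.
module_tree = ast.parse("x = 1\ny = x + 1", mode="exec")
expression_tree = ast.parse("x + 1", mode="eval")
interactive_tree = ast.parse("x = 1", mode="single")

print(type(module_tree).__name__)
print(type(expression_tree).__name__)
print(type(interactive_tree).__name__)

# A Module root holds a sequence of statements in its body, which is
# what a statement visitor iterates over.
statement_kinds = [type(statement).__name__ for statement in module_tree.body]
print(statement_kinds)
```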
<p>So for a module, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L262"><code>PySymtable_BuildObject()</code></a> will loop through each statement in the module and call <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L1176"><code>symtable_visit_stmt()</code></a>. The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L1176"><code>symtable_visit_stmt()</code></a> function is a huge <code>switch</code> statement with a case for each statement type (defined in <code>Parser/Python.asdl</code>).</p>
<p>Each statement type has its own specific logic. For example, a function definition is handled in these steps:</p>
<ol>
<li>If the recursion depth is beyond the limit, a recursion depth error is raised</li>
<li>The name of the function is added as a local variable</li>
<li>The default values for sequential arguments are resolved</li>
<li>The default values for keyword arguments are resolved</li>
<li>Any annotations for the arguments or the return type are resolved</li>
<li>Any function decorators are resolved</li>
<li>The code block with the contents of the function is entered in <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L973"><code>symtable_enter_block()</code></a></li>
<li>The arguments are visited</li>
<li>The body of the function is visited</li>
</ol>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> If you’ve ever wondered why Python’s default arguments are mutable, the reason is in this function. You can see they are a pointer to the variable in the symtable. No extra work is done to copy any values to an immutable type.</p>
</div>
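<p>The behavior mentioned in the note is easy to observe from Python. This is only a demonstration of the visible effect, using a made-up <code>append_item()</code> function, and not a trace of the C code:</p>

```python
def append_item(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and the same object is reused on every call that omits "bucket".
    bucket.append(item)
    return bucket

first = append_item(1)
second = append_item(2)

# Both calls returned the very same list object.
print(first is second)
print(second)

# The shared object is stored on the function itself.
print(append_item.__defaults__[0] is first)
```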
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
<span class="nf">symtable_visit_stmt</span><span class="p">(</span><span class="k">struct</span> <span class="n">symtable</span> <span class="o">*</span><span class="n">st</span><span class="p">,</span> <span class="n">stmt_ty</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="o">++</span><span class="n">st</span><span class="o">-></span><span class="n">recursion_depth</span> <span class="o">></span> <span class="n">st</span><span class="o">-></span><span class="n">recursion_limit</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// 1.</span>
</span> <span class="n">PyErr_SetString</span><span class="p">(</span><span class="n">PyExc_RecursionError</span><span class="p">,</span>
<span class="s">"maximum recursion depth exceeded during compilation"</span><span class="p">);</span>
<span class="n">VISIT_QUIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">s</span><span class="o">-></span><span class="n">kind</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="nl">FunctionDef_kind</span><span class="p">:</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">symtable_add_def</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">DEF_LOCAL</span><span class="p">))</span> <span class="c1">// 2.</span>
</span> <span class="n">VISIT_QUIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="o">-></span><span class="n">defaults</span><span class="p">)</span> <span class="c1">// 3.</span>
</span> <span class="n">VISIT_SEQ</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">expr</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="o">-></span><span class="n">defaults</span><span class="p">);</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="o">-></span><span class="n">kw_defaults</span><span class="p">)</span> <span class="c1">// 4.</span>
</span> <span class="n">VISIT_SEQ_WITH_NULL</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">expr</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="o">-></span><span class="n">kw_defaults</span><span class="p">);</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">symtable_visit_annotations</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="p">,</span> <span class="c1">// 5.</span>
</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">returns</span><span class="p">))</span>
<span class="n">VISIT_QUIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">decorator_list</span><span class="p">)</span> <span class="c1">// 6.</span>
</span> <span class="n">VISIT_SEQ</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">expr</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">decorator_list</span><span class="p">);</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">symtable_enter_block</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="c1">// 7.</span>
</span> <span class="n">FunctionBlock</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">s</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">lineno</span><span class="p">,</span>
<span class="n">s</span><span class="o">-></span><span class="n">col_offset</span><span class="p">))</span>
<span class="n">VISIT_QUIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="hll"> <span class="n">VISIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">arguments</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="p">);</span> <span class="c1">// 8.</span>
</span><span class="hll"> <span class="n">VISIT_SEQ</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">stmt</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">body</span><span class="p">);</span> <span class="c1">// 9.</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">symtable_exit_block</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">s</span><span class="p">))</span>
<span class="n">VISIT_QUIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="nl">ClassDef_kind</span><span class="p">:</span> <span class="p">{</span>
<span class="p">...</span>
<span class="p">}</span>
<span class="k">case</span> <span class="nl">Return_kind</span><span class="p">:</span>
<span class="p">...</span>
<span class="k">case</span> <span class="nl">Delete_kind</span><span class="p">:</span>
<span class="p">...</span>
<span class="k">case</span> <span class="nl">Assign_kind</span><span class="p">:</span>
<span class="p">...</span>
<span class="k">case</span> <span class="nl">AnnAssign_kind</span><span class="p">:</span>
<span class="p">...</span>
</pre></div>
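<p>One visible consequence of this ordering is that default values and decorators are visited before the function block is entered, so the names they use belong to the enclosing scope. The sketch below checks this with the <code>symtable</code> module. The names <code>decorate</code>, <code>default_value</code>, and <code>example</code> are placeholders, and nothing is executed, so they don't need to exist:</p>

```python
import symtable

# Hypothetical source: a decorated function with a default value.
source = """
@decorate
def example(arg=default_value):
    local_name = arg
    return local_name
"""

module_table = symtable.symtable(source, "demo.py", "exec")
function_table = module_table.get_children()[0]

# Names used by the decorator and the default land in the module table.
module_names = sorted(symbol.get_name() for symbol in module_table.get_symbols())
print(module_names)

# Only the argument and the names in the body land in the function table.
function_names = sorted(symbol.get_name() for symbol in function_table.get_symbols())
print(function_names)
```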
<p>Once the resulting symtable has been created, it is sent back to be used by the compiler.</p>
<h4 id="core-compilation-process">Core Compilation Process</h4>
<p>Now that <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a> has a compiler state, a symtable, and a module in the form of the AST, the actual compilation can begin.</p>
<p>The purpose of the core compiler is to:</p>
<ul>
<li>Convert the state, symtable, and AST into a <a href="https://en.wikipedia.org/wiki/Control-flow_graph">Control-Flow-Graph (CFG)</a></li>
<li>Protect the execution stage from runtime exceptions by catching any logic and code errors and raising them here</li>
</ul>
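<p>To see the second point in action, here is a small sketch in which an error is reported while compiling, before any code runs. The <code>compile_error()</code> helper is invented for this example:</p>

```python
def compile_error(source):
    """Return the compiler's error message, or None if the source compiles."""
    try:
        compile(source, "demo.py", "exec")
    except SyntaxError as error:
        return error.msg
    return None

# A return statement at module level is rejected during compilation.
print(compile_error("return 1"))

# The same statement inside a function compiles without complaint.
print(compile_error("def f():\n    return 1\n"))
```

<p>Nothing in the rejected source is ever executed, which is the protection the list above describes.</p>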
<p>You can call the CPython compiler in Python code by calling the built-in function <code>compile()</code>. It returns a <code>code object</code> instance:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">compile</span><span class="p">(</span><span class="s1">'b+1'</span><span class="p">,</span> <span class="s1">'test.py'</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">'eval'</span><span class="p">)</span>
<span class="go"><code object <module> at 0x10f222780, file "test.py", line 1></span>
</pre></div>
<p>The same as with the <code>symtable()</code> function, a simple expression should have a mode of <code>'eval'</code> and a module, function, or class should have a mode of <code>'exec'</code>.</p>
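<p>A short sketch of the two modes side by side may help. The <code>double()</code> function and the namespace dictionaries are made up for the example:</p>

```python
# An expression is compiled in "eval" mode and produces a value.
expression_code = compile("b + 1", "test.py", mode="eval")
value = eval(expression_code, {"b": 41})
print(value)

# Statements, such as a function definition, need "exec" mode.
module_source = "def double(x):\n    return x * 2\n"
module_code = compile(module_source, "test.py", mode="exec")
namespace = {}
exec(module_code, namespace)
print(namespace["double"](21))

# Compiling a statement in "eval" mode is rejected by the compiler.
try:
    compile("x = 1", "test.py", mode="eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
```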
<p>The compiled code can be found in the <code>co_code</code> property of the code object:</p>
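<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">co</span> <span class="o">=</span> <span class="nb">compile</span><span class="p">(</span><span class="s1">'b+1'</span><span class="p">,</span> <span class="s1">'test.py'</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">'eval'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">co</span><span class="o">.</span><span class="n">co_code</span>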
<span class="go">b'e\x00d\x00\x17\x00S\x00'</span>
</pre></div>
<p>There is also a <code>dis</code> module in the standard library, which disassembles the bytecode instructions and can print them on the screen or give you a list of <code>Instruction</code> instances.</p>
<p>If you import <code>dis</code> and give the <code>dis()</code> function the code object’s <code>co_code</code> property, it disassembles it and prints the instructions on the REPL:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">dis</span>
<span class="gp">>>> </span><span class="n">dis</span><span class="o">.</span><span class="n">dis</span><span class="p">(</span><span class="n">co</span><span class="o">.</span><span class="n">co_code</span><span class="p">)</span>
<span class="go"> 0 LOAD_NAME 0 (0)</span>
<span class="go"> 2 LOAD_CONST 0 (0)</span>
<span class="go"> 4 BINARY_ADD</span>
<span class="go"> 6 RETURN_VALUE</span>
</pre></div>
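<p>If you pass the whole code object rather than the raw <code>co_code</code> bytes, <code>dis</code> can also resolve names and constants. A minimal sketch using <code>dis.get_instructions()</code> follows. The exact instructions printed depend on your Python version:</p>

```python
import dis

co = compile("b + 1", "test.py", mode="eval")

# get_instructions() yields Instruction objects instead of printing text.
opnames = []
for instruction in dis.get_instructions(co):
    opnames.append(instruction.opname)
    print(instruction.offset, instruction.opname, instruction.argrepr)

# The name "b" is stored on the code object, which is why dis can show it
# when it is given the code object and not only the raw bytes.
print(co.co_names)

# The code object can be evaluated once "b" has a value.
result = eval(co, {"b": 41})
print(result)
```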
<p><code>LOAD_NAME</code>, <code>LOAD_CONST</code>, <code>BINARY_ADD</code>, and <code>RETURN_VALUE</code> are all bytecode instructions. They’re called bytecode because, in binary form, each opcode was a byte long. However, since Python 3.6 every instruction is stored as a two-byte <code>word</code> (one byte for the opcode and one for its argument), so now they’re technically wordcode, not bytecode.</p>
<p>The <a href="https://docs.python.org/3/library/dis.html#python-bytecode-instructions">full list of bytecode instructions</a> is available for each version of Python, and it does change between versions. For example, in Python 3.7, the new <code>LOAD_METHOD</code> and <code>CALL_METHOD</code> instructions were introduced to speed up execution of specific method calls.</p>
<p>In an earlier section, we explored the <code>instaviz</code> package. This included a visualization of the code object type by running the compiler. It also displays the bytecode operations inside the code objects.</p>
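<p>You can check the fixed-width layout yourself by splitting <code>co_code</code> into pairs of bytes. This is a rough sketch that assumes Python 3.6 or later, and the opcode names it prints will vary between versions:</p>

```python
import dis

co = compile("b + 1", "test.py", mode="eval")
raw = co.co_code

# Walk the raw bytes two at a time: the first byte of each pair is the
# opcode and the second byte is its argument.
pairs = []
for offset in range(0, len(raw), 2):
    opcode, argument = raw[offset], raw[offset + 1]
    pairs.append((dis.opname[opcode], argument))

for name, argument in pairs:
    print(name, argument)
```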
<p>Execute instaviz again to see the code object and bytecode for a function defined on the REPL:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">instaviz</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">example</span><span class="p">():</span>
<span class="go"> a = 1</span>
<span class="go"> b = a + 1</span>
<span class="go"> return b</span>
<span class="gp">>>> </span><span class="n">instaviz</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">example</span><span class="p">)</span>
</pre></div>
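<p>If you don’t have <code>instaviz</code> installed, you can still peek at some of the same fields directly, because every function keeps its code object in <code>__code__</code>. This is a plain-Python sketch and not a replacement for the visualization:</p>

```python
def example():
    a = 1
    b = a + 1
    return b

code = example.__code__

# A few of the fields stored on the compiled code object.
print(code.co_name)       # name the code object was compiled under
print(code.co_varnames)   # local variable names, in order of first use
print(code.co_argcount)   # number of positional arguments
print(len(code.co_code))  # size of the compiled instructions in bytes
```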
<p>Now we’ll jump into <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1782"><code>compiler_mod()</code></a>, a function that switches to different compiler functions depending on the module type. We’ll assume that <code>mod</code> is a <code>Module</code>. The module is compiled into the compiler state and then <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L5971"><code>assemble()</code></a> is run to create a <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/code.h#L69"><code>PyCodeObject</code></a>.</p>
<p>The new code object is returned to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a> and sent on for execution:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyCodeObject</span> <span class="o">*</span>
<span class="nf">compiler_mod</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">mod_ty</span> <span class="n">mod</span><span class="p">)</span>
<span class="p">{</span>
<span class="hll"> <span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span><span class="p">;</span>
</span> <span class="kt">int</span> <span class="n">addNone</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">module</span><span class="p">;</span>
<span class="p">...</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">mod</span><span class="o">-></span><span class="n">kind</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="nl">Module_kind</span><span class="p">:</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">compiler_body</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">mod</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">Module</span><span class="p">.</span><span class="n">body</span><span class="p">))</span> <span class="p">{</span>
</span> <span class="n">compiler_exit_scope</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="nl">Interactive_kind</span><span class="p">:</span>
<span class="p">...</span>
<span class="k">case</span> <span class="nl">Expression_kind</span><span class="p">:</span>
<span class="p">...</span>
<span class="k">case</span> <span class="nl">Suite_kind</span><span class="p">:</span>
<span class="p">...</span>
<span class="p">...</span>
<span class="hll"> <span class="n">co</span> <span class="o">=</span> <span class="n">assemble</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">addNone</span><span class="p">);</span>
</span> <span class="n">compiler_exit_scope</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="hll"> <span class="k">return</span> <span class="n">co</span><span class="p">;</span>
</span><span class="p">}</span>
</pre></div>
<p>The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1743"><code>compiler_body()</code></a> function has some optimization flags and then loops over each statement in the module and visits it, similar to how the <code>symtable</code> functions worked:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
<span class="nf">compiler_body</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">asdl_seq</span> <span class="o">*</span><span class="n">stmts</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">stmt_ty</span> <span class="n">st</span><span class="p">;</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">docstring</span><span class="p">;</span>
<span class="p">...</span>
<span class="hll"> <span class="k">for</span> <span class="p">(;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">asdl_seq_LEN</span><span class="p">(</span><span class="n">stmts</span><span class="p">);</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
</span><span class="hll"> <span class="n">VISIT</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">stmt</span><span class="p">,</span> <span class="p">(</span><span class="n">stmt_ty</span><span class="p">)</span><span class="n">asdl_seq_GET</span><span class="p">(</span><span class="n">stmts</span><span class="p">,</span> <span class="n">i</span><span class="p">));</span>
</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Each statement is fetched from the sequence with the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/asdl.h#L32"><code>asdl_seq_GET()</code></a> macro and cast to <code>stmt_ty</code>. The statement’s type is read later from the node’s <code>kind</code> field.</p>
<p>Through some smart macros, <code>VISIT</code> calls a function in <code>Python/compile.c</code> for each statement type:</p>
<div class="highlight c"><pre><span></span><span class="cp">#define VISIT(C, TYPE, V) {\</span>
<span class="cp"> if (!compiler_visit_ ## TYPE((C), (V))) \</span>
<span class="cp"> return 0; \</span>
<span class="cp">}</span>
</pre></div>
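<p>The same name-based dispatch idea exists in Python’s own <code>ast.NodeVisitor</code>, which looks up a <code>visit_</code> method from the node’s class name. The sketch below is an analogy and not how the C compiler is implemented. The <code>StatementKinds</code> class and the source string are invented for illustration:</p>

```python
import ast

class StatementKinds(ast.NodeVisitor):
    """Record the statements seen, dispatching on the node type's name."""

    def __init__(self):
        self.seen = []

    def visit_ClassDef(self, node):
        self.seen.append(("class", node.name))
        self.generic_visit(node)  # keep walking into the class body

    def visit_FunctionDef(self, node):
        self.seen.append(("function", node.name))
        self.generic_visit(node)  # keep walking into the function body

    def visit_Return(self, node):
        self.seen.append(("return", node.lineno))

source = "class A:\n    def method(self):\n        return 1\n"
visitor = StatementKinds()
visitor.visit(ast.parse(source))
print(visitor.seen)
```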
<p>For a <code>stmt</code> (the category for a statement) the compiler will then drop into <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L3310"><code>compiler_visit_stmt()</code></a> and switch through all of the potential statement types found in <code>Parser/Python.asdl</code>:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
<span class="nf">compiler_visit_stmt</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">stmt_ty</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Py_ssize_t</span> <span class="n">i</span><span class="p">,</span> <span class="n">n</span><span class="p">;</span>
<span class="cm">/* Always assign a lineno to the next instruction for a stmt. */</span>
<span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_lineno</span> <span class="o">=</span> <span class="n">s</span><span class="o">-></span><span class="n">lineno</span><span class="p">;</span>
<span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_col_offset</span> <span class="o">=</span> <span class="n">s</span><span class="o">-></span><span class="n">col_offset</span><span class="p">;</span>
<span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_lineno_set</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">s</span><span class="o">-></span><span class="n">kind</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="nl">FunctionDef_kind</span><span class="p">:</span>
<span class="k">return</span> <span class="n">compiler_function</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">case</span> <span class="nl">ClassDef_kind</span><span class="p">:</span>
<span class="k">return</span> <span class="n">compiler_class</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span>
<span class="p">...</span>
<span class="k">case</span> <span class="nl">For_kind</span><span class="p">:</span>
<span class="k">return</span> <span class="n">compiler_for</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span>
<span class="p">...</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
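<p>A similar name-based dispatch can be observed from Python. As a rough analogy only, and not the compiler’s own code, the standard library’s <code>ast.NodeVisitor</code> looks up a <code>visit_*()</code> method named after each node type, much as <code>VISIT</code> pastes the type name onto <code>compiler_visit_</code>. The class name <code>StatementLogger</code> and the sample source are made up for this sketch:</p>

```python
import ast

SOURCE = """
def example(items):
    for item in items:
        print(item)
"""


class StatementLogger(ast.NodeVisitor):
    """Record which statement kinds are visited (hypothetical helper)."""

    def __init__(self):
        self.kinds = []

    def visit_FunctionDef(self, node):
        self.kinds.append("FunctionDef")
        self.generic_visit(node)  # keep walking into the function body

    def visit_For(self, node):
        self.kinds.append("For")
        self.generic_visit(node)  # keep walking into the loop body


logger = StatementLogger()
logger.visit(ast.parse(SOURCE))
print(logger.kinds)
```

<p>Node types without a matching <code>visit_*()</code> method fall through to <code>generic_visit()</code>, which simply walks the child nodes.</p>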
<p>As an example, let’s focus on the <code>For</code> statement, which in Python looks like this:</p>
<div class="highlight python"><pre><span></span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">iterable</span><span class="p">:</span>
<span class="c1"># block</span>
<span class="k">else</span><span class="p">:</span> <span class="c1"># optional, runs if the loop finishes without a break</span>
<span class="c1"># block</span>
</pre></div>
<p>If the statement is a <code>For</code> type, <code>compiler_visit_stmt()</code> calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L2651"><code>compiler_for()</code></a>. There is an equivalent <code>compiler_*()</code> function for all of the statement and expression types. The more straightforward types create the bytecode instructions inline, while some of the more complex statement types call other functions.</p>
<p>Many of the statements can have sub-statements. A <code>for</code> loop has a body, but you can also have complex expressions in the assignment and the iterator.</p>
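<p>To make the sub-parts of a <code>for</code> loop concrete, here is a small sketch that parses one with the standard <code>ast</code> module and prints the node type of each part. The loop itself, including the names <code>pairs</code> and <code>data</code>, is invented for illustration and is never executed:</p>

```python
import ast

SOURCE = """
for a, b in pairs(data):
    total = a + b
else:
    done = True
"""

for_node = ast.parse(SOURCE).body[0]

print(type(for_node).__name__)         # the statement kind
print(type(for_node.target).__name__)  # assignment target (a tuple here)
print(type(for_node.iter).__name__)    # iterator expression (a call here)
print([type(s).__name__ for s in for_node.body])    # statements in the body
print([type(s).__name__ for s in for_node.orelse])  # statements in the else
```

<p>The <code>target</code>, <code>iter</code>, <code>body</code>, and <code>orelse</code> fields are the parts of the node that the compiler has to visit in turn.</p>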
<p>The compiler’s <code>compiler_*()</code> functions send blocks to the compiler state. These blocks contain instructions. The instruction data structure in <code>Python/compile.c</code> has the opcode, any arguments, the target block (if this is a jump instruction), and the line number.</p>
<p>Jump statements can be either absolute or relative. They are used to “jump” from one operation to another. Absolute jump statements specify the exact operation number in the compiled code object, whereas relative jump statements specify the jump target relative to another operation:</p>
<div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">instr</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="nl">i_jabs</span> <span class="p">:</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="nl">i_jrel</span> <span class="p">:</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">i_opcode</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">i_oparg</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">basicblock_</span> <span class="o">*</span><span class="n">i_target</span><span class="p">;</span> <span class="cm">/* target block (if jump instruction) */</span>
<span class="kt">int</span> <span class="n">i_lineno</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
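<p>The jump instructions that end up in a compiled function can be listed with the standard <code>dis</code> module. This is a hedged sketch: the function <code>count_up()</code> is made up, and the exact opcodes depend on the interpreter version. Newer CPython releases have moved toward relative jumps, so the absolute/relative split you see may differ from the source shown here:</p>

```python
import dis


def count_up(limit):
    total = 0
    for n in range(limit):
        total += n
    return total


# Opcodes that dis classifies as relative or absolute jumps.
relative = set(dis.hasjrel)
absolute = set(dis.hasjabs)

jumps = [
    ins
    for ins in dis.get_instructions(count_up)
    if ins.opcode in relative or ins.opcode in absolute
]

for ins in jumps:
    kind = "relative" if ins.opcode in relative else "absolute"
    # For jump instructions, argval is the resolved target offset.
    print(f"{ins.offset:>3} {ins.opname:<20} -> {ins.argval} ({kind})")
```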
<p>Instructions are grouped into blocks. A frame block (of type <code>basicblock</code>) contains the following fields:</p>
<ul>
<li>A <code>b_list</code> pointer, the link to a list of blocks for the compiler state</li>
<li>A list of instructions <code>b_instr</code>, with both the allocated list size <code>b_ialloc</code> and the number used <code>b_iused</code></li>
<li>The next block after this one, <code>b_next</code></li>
<li>Whether the block has been “seen” by the assembler when traversing depth-first (<code>b_seen</code>)</li>
<li>Whether this block has a <code>RETURN_VALUE</code> opcode (<code>b_return</code>)</li>
<li>The depth of the stack when this block was entered (<code>b_startdepth</code>)</li>
<li>The instruction offset for the assembler (<code>b_offset</code>)</li>
</ul>
<div class="highlight c"><pre><span></span><span class="k">typedef</span> <span class="k">struct</span> <span class="n">basicblock_</span> <span class="p">{</span>
<span class="cm">/* Each basicblock in a compilation unit is linked via b_list in the</span>
<span class="cm"> reverse order that the block are allocated. b_list points to the next</span>
<span class="cm"> block, not to be confused with b_next, which is next by control flow. */</span>
<span class="k">struct</span> <span class="n">basicblock_</span> <span class="o">*</span><span class="n">b_list</span><span class="p">;</span>
<span class="cm">/* number of instructions used */</span>
<span class="kt">int</span> <span class="n">b_iused</span><span class="p">;</span>
<span class="cm">/* length of instruction array (b_instr) */</span>
<span class="kt">int</span> <span class="n">b_ialloc</span><span class="p">;</span>
<span class="cm">/* pointer to an array of instructions, initially NULL */</span>
<span class="k">struct</span> <span class="n">instr</span> <span class="o">*</span><span class="n">b_instr</span><span class="p">;</span>
<span class="cm">/* If b_next is non-NULL, it is a pointer to the next</span>
<span class="cm"> block reached by normal control flow. */</span>
<span class="k">struct</span> <span class="n">basicblock_</span> <span class="o">*</span><span class="n">b_next</span><span class="p">;</span>
<span class="cm">/* b_seen is used to perform a DFS of basicblocks. */</span>
<span class="kt">unsigned</span> <span class="nl">b_seen</span> <span class="p">:</span> <span class="mi">1</span><span class="p">;</span>
<span class="cm">/* b_return is true if a RETURN_VALUE opcode is inserted. */</span>
<span class="kt">unsigned</span> <span class="nl">b_return</span> <span class="p">:</span> <span class="mi">1</span><span class="p">;</span>
<span class="cm">/* depth of stack upon entry of block, computed by stackdepth() */</span>
<span class="kt">int</span> <span class="n">b_startdepth</span><span class="p">;</span>
<span class="cm">/* instruction offset for block, computed by assemble_jump_offsets() */</span>
<span class="kt">int</span> <span class="n">b_offset</span><span class="p">;</span>
<span class="p">}</span> <span class="n">basicblock</span><span class="p">;</span>
</pre></div>
<p>The <code>For</code> statement is somewhere in the middle in terms of complexity. There are 15 steps in the compilation of a <code>For</code> statement with the <code>for <target> in <iterator>:</code> syntax:</p>
<ol>
<li>Create a new code block called <code>start</code>, which allocates memory and creates a <code>basicblock</code> pointer</li>
<li>Create a new code block called <code>cleanup</code></li>
<li>Create a new code block called <code>end</code></li>
<li>Push a frame block of type <code>FOR_LOOP</code> to the stack with <code>start</code> as the entry block and <code>end</code> as the exit block</li>
<li>Visit the iterator expression, which adds any operations for the iterator</li>
<li>Add the <code>GET_ITER</code> operation to the compiler state</li>
<li>Switch to the <code>start</code> block</li>
<li>Call <code>ADDOP_JREL</code> which calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1413"><code>compiler_addop_j()</code></a> to add the <code>FOR_ITER</code> operation with an argument of the <code>cleanup</code> block</li>
<li>Visit the <code>target</code> and add any special code, like tuple unpacking, to the <code>start</code> block</li>
<li>Visit each statement in the body of the for loop</li>
<li>Call <code>ADDOP_JABS</code>, which calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1413"><code>compiler_addop_j()</code></a> to add the <code>JUMP_ABSOLUTE</code> operation, which jumps back to the start of the loop after the body is executed</li>
<li>Move to the <code>cleanup</code> block</li>
<li>Pop the <code>FOR_LOOP</code> frame block off the stack</li>
<li>Visit the statements inside the <code>else</code> section of the for loop</li>
<li>Use the <code>end</code> block</li>
</ol>
<p>Referring back to the <code>basicblock</code> structure, you can see how, in the compilation of the for statement, the various blocks are created and pushed into the compiler’s frame block and stack:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
<span class="nf">compiler_for</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">stmt_ty</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">basicblock</span> <span class="o">*</span><span class="n">start</span><span class="p">,</span> <span class="o">*</span><span class="n">cleanup</span><span class="p">,</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
<span class="hll"> <span class="n">start</span> <span class="o">=</span> <span class="n">compiler_new_block</span><span class="p">(</span><span class="n">c</span><span class="p">);</span> <span class="c1">// 1.</span>
</span><span class="hll"> <span class="n">cleanup</span> <span class="o">=</span> <span class="n">compiler_new_block</span><span class="p">(</span><span class="n">c</span><span class="p">);</span> <span class="c1">// 2.</span>
</span><span class="hll"> <span class="n">end</span> <span class="o">=</span> <span class="n">compiler_new_block</span><span class="p">(</span><span class="n">c</span><span class="p">);</span> <span class="c1">// 3.</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="n">start</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">||</span> <span class="n">end</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">||</span> <span class="n">cleanup</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">compiler_push_fblock</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">FOR_LOOP</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">))</span> <span class="c1">// 4.</span>
</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="hll"> <span class="n">VISIT</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">expr</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">For</span><span class="p">.</span><span class="n">iter</span><span class="p">);</span> <span class="c1">// 5.</span>
</span><span class="hll"> <span class="n">ADDOP</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">GET_ITER</span><span class="p">);</span> <span class="c1">// 6.</span>
</span><span class="hll"> <span class="n">compiler_use_next_block</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">start</span><span class="p">);</span> <span class="c1">// 7.</span>
</span><span class="hll"> <span class="n">ADDOP_JREL</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">FOR_ITER</span><span class="p">,</span> <span class="n">cleanup</span><span class="p">);</span> <span class="c1">// 8.</span>
</span><span class="hll"> <span class="n">VISIT</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">expr</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">For</span><span class="p">.</span><span class="n">target</span><span class="p">);</span> <span class="c1">// 9.</span>
</span><span class="hll"> <span class="n">VISIT_SEQ</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">stmt</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">For</span><span class="p">.</span><span class="n">body</span><span class="p">);</span> <span class="c1">// 10.</span>
</span><span class="hll"> <span class="n">ADDOP_JABS</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">JUMP_ABSOLUTE</span><span class="p">,</span> <span class="n">start</span><span class="p">);</span> <span class="c1">// 11.</span>
</span><span class="hll"> <span class="n">compiler_use_next_block</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">cleanup</span><span class="p">);</span> <span class="c1">// 12.</span>
</span>
<span class="hll"> <span class="n">compiler_pop_fblock</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">FOR_LOOP</span><span class="p">,</span> <span class="n">start</span><span class="p">);</span> <span class="c1">// 13.</span>
</span>
<span class="hll"> <span class="n">VISIT_SEQ</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">stmt</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">v</span><span class="p">.</span><span class="n">For</span><span class="p">.</span><span class="n">orelse</span><span class="p">);</span> <span class="c1">// 14.</span>
</span><span class="hll"> <span class="n">compiler_use_next_block</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">end</span><span class="p">);</span> <span class="c1">// 15.</span>
</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
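<p>The effect of these steps is visible in the bytecode of any <code>for</code> loop. The following sketch disassembles a made-up function, <code>loop_with_else()</code>, and checks that <code>GET_ITER</code> (step 6) is emitted before <code>FOR_ITER</code> (step 8). The opcode used for the jump back to the start of the loop varies between CPython versions, so the sketch does not rely on its name:</p>

```python
import dis


def loop_with_else(items):
    for item in items:
        pass
    else:
        return "finished"


instructions = list(dis.get_instructions(loop_with_else))
names = [ins.opname for ins in instructions]
print(names)

# GET_ITER is added once, before the loop's start block begins with FOR_ITER.
get_iter_first = names.index("GET_ITER") < names.index("FOR_ITER")
print(get_iter_first)

# FOR_ITER sits at the top of the loop, so it is usually a jump target.
for_iter = next(ins for ins in instructions if ins.opname == "FOR_ITER")
print(for_iter.offset, for_iter.is_jump_target)
```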
<p>Depending on the type of operation, there are different arguments required. For example, we used <code>ADDOP_JABS</code> and <code>ADDOP_JREL</code> here, which refer to “<strong>ADD</strong> <strong>O</strong>peration with <strong>J</strong>ump to an <strong>ABS</strong>olute position” and “<strong>ADD</strong> <strong>O</strong>peration with <strong>J</strong>ump to a <strong>REL</strong>ative position”. These macros call <code>compiler_addop_j(struct compiler *c, int opcode, basicblock *b, int absolute)</code> and set the <code>absolute</code> argument to 1 and 0 respectively.</p>
<p>There are some other macros: <code>ADDOP_I</code> calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1383"><code>compiler_addop_i()</code></a>, which adds an operation with an integer argument, and <code>ADDOP_O</code> calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1345"><code>compiler_addop_o()</code></a>, which adds an operation with a <code>PyObject</code> argument.</p>
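<p>One way to see the difference between integer and object arguments from Python is to compare an instruction’s raw argument with the object it refers to. In the compiled bytecode every argument is an integer, and for opcodes that take a constant, that integer is an index into the code object’s <code>co_consts</code> tuple. The function <code>greet()</code> below is a made-up example:</p>

```python
import dis


def greet():
    message = "hello world"
    return message


code = greet.__code__

# Instructions whose argument refers to an entry in co_consts.
const_instructions = [
    ins for ins in dis.get_instructions(greet) if ins.opcode in dis.hasconst
]

for ins in const_instructions:
    # ins.arg is the integer stored in the bytecode;
    # ins.argval is the object that the integer refers to.
    print(ins.opname, ins.arg, repr(ins.argval), repr(code.co_consts[ins.arg]))
```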
<p>Once these stages have completed, the compiler has a list of frame blocks, each containing a list of instructions and a pointer to the next block.</p>
<h4 id="assembly">Assembly</h4>
<p>With the compiler state, the assembler performs a “depth-first-search” of the blocks and merges the instructions into a single bytecode sequence. The assembler state is declared in <code>Python/compile.c</code>:</p>
<div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">assembler</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">a_bytecode</span><span class="p">;</span> <span class="cm">/* string containing bytecode */</span>
<span class="kt">int</span> <span class="n">a_offset</span><span class="p">;</span> <span class="cm">/* offset into bytecode */</span>
<span class="kt">int</span> <span class="n">a_nblocks</span><span class="p">;</span> <span class="cm">/* number of reachable blocks */</span>
<span class="n">basicblock</span> <span class="o">**</span><span class="n">a_postorder</span><span class="p">;</span> <span class="cm">/* list of blocks in dfs postorder */</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">a_lnotab</span><span class="p">;</span> <span class="cm">/* string containing lnotab */</span>
<span class="kt">int</span> <span class="n">a_lnotab_off</span><span class="p">;</span> <span class="cm">/* offset into lnotab */</span>
<span class="kt">int</span> <span class="n">a_lineno</span><span class="p">;</span> <span class="cm">/* last lineno of emitted instruction */</span>
<span class="kt">int</span> <span class="n">a_lineno_off</span><span class="p">;</span> <span class="cm">/* bytecode offset of last lineno */</span>
<span class="p">};</span>
</pre></div>
<p>The <code>assemble()</code> function has a few tasks:</p>
<ul>
<li>Calculate the number of blocks for memory allocation</li>
<li>Ensure that every block that falls off the end returns <code>None</code>. This is why a function returns <code>None</code> when it finishes without reaching a <code>return</code> statement</li>
<li>Resolve any jump statement offsets that were marked as relative</li>
<li>Call <code>dfs()</code> to perform a depth-first-search of the blocks</li>
<li>Emit all the instructions into the assembler’s bytecode</li>
<li>Call <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L5854"><code>makecode()</code></a> with the compiler state to generate the <code>PyCodeObject</code></li>
</ul>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyCodeObject</span> <span class="o">*</span>
<span class="nf">assemble</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="kt">int</span> <span class="n">addNone</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">basicblock</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="o">*</span><span class="n">entryblock</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">assembler</span> <span class="n">a</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">nblocks</span><span class="p">;</span>
<span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="cm">/* Make sure every block that falls off the end returns None.</span>
<span class="cm"> XXX NEXT_BLOCK() isn't quite right, because if the last</span>
<span class="cm"> block ends with a jump or return b_next shouldn't set.</span>
<span class="cm"> */</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_curblock</span><span class="o">-></span><span class="n">b_return</span><span class="p">)</span> <span class="p">{</span>
<span class="n">NEXT_BLOCK</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="n">addNone</span><span class="p">)</span>
</span><span class="hll"> <span class="n">ADDOP_LOAD_CONST</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">Py_None</span><span class="p">);</span>
</span><span class="hll"> <span class="n">ADDOP</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">RETURN_VALUE</span><span class="p">);</span>
</span> <span class="p">}</span>
<span class="p">...</span>
<span class="hll"> <span class="n">dfs</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">entryblock</span><span class="p">,</span> <span class="o">&</span><span class="n">a</span><span class="p">,</span> <span class="n">nblocks</span><span class="p">);</span>
</span>
<span class="cm">/* Can't modify the bytecode after computing jump offsets. */</span>
<span class="hll"> <span class="n">assemble_jump_offsets</span><span class="p">(</span><span class="o">&</span><span class="n">a</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
</span>
<span class="cm">/* Emit code in reverse postorder from dfs. */</span>
<span class="hll"> <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">a_nblocks</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">--</span><span class="p">)</span> <span class="p">{</span>
</span><span class="hll"> <span class="n">b</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">a_postorder</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
</span><span class="hll"> <span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">b</span><span class="o">-></span><span class="n">b_iused</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span>
</span><span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">assemble_emit</span><span class="p">(</span><span class="o">&</span><span class="n">a</span><span class="p">,</span> <span class="o">&</span><span class="n">b</span><span class="o">-></span><span class="n">b_instr</span><span class="p">[</span><span class="n">j</span><span class="p">]))</span>
</span><span class="hll"> <span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
</span><span class="hll"> <span class="p">}</span>
</span> <span class="p">...</span>
<span class="n">co</span> <span class="o">=</span> <span class="n">makecode</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="o">&</span><span class="n">a</span><span class="p">);</span>
<span class="hll"> <span class="nl">error</span><span class="p">:</span>
</span> <span class="n">assemble_free</span><span class="p">(</span><span class="o">&</span><span class="n">a</span><span class="p">);</span>
<span class="k">return</span> <span class="n">co</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
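<p>The implicit <code>None</code> return can be observed directly. In this sketch, the made-up function <code>no_return()</code> has no <code>return</code> statement, yet its bytecode still ends with a return instruction. The exact opcode name depends on the CPython version, so the sketch only prints it:</p>

```python
import dis


def no_return():
    value = 1  # the function body ends without a return statement


instructions = list(dis.get_instructions(no_return))
last = instructions[-1]

print(last.opname)          # a RETURN_* opcode added by the compiler
print(no_return() is None)  # True
```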
<p>The depth-first-search is performed by the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L5397"><code>dfs()</code></a> function in <code>Python/compile.c</code>, which follows the <code>b_next</code> pointers in each of the blocks, marks them as seen by setting <code>b_seen</code>, and then adds them to the assembler’s <code>**a_postorder</code> list in reverse order.</p>
<p>The function then loops back over the assembler’s post-order list and, for each block that has a jump operation, recursively calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L5397"><code>dfs()</code></a> for that jump:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">void</span>
<span class="nf">dfs</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">basicblock</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="k">struct</span> <span class="n">assembler</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="n">end</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">;</span>
<span class="cm">/* Get rid of recursion for normal control flow.</span>
<span class="cm"> Since the number of blocks is limited, unused space in a_postorder</span>
<span class="cm"> (from a_nblocks to end) can be used as a stack for still not ordered</span>
<span class="cm"> blocks. */</span>
<span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="n">end</span><span class="p">;</span> <span class="n">b</span> <span class="o">&&</span> <span class="o">!</span><span class="n">b</span><span class="o">-></span><span class="n">b_seen</span><span class="p">;</span> <span class="n">b</span> <span class="o">=</span> <span class="n">b</span><span class="o">-></span><span class="n">b_next</span><span class="p">)</span> <span class="p">{</span>
<span class="n">b</span><span class="o">-></span><span class="n">b_seen</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">assert</span><span class="p">(</span><span class="n">a</span><span class="o">-></span><span class="n">a_nblocks</span> <span class="o"><</span> <span class="n">j</span><span class="p">);</span>
<span class="n">a</span><span class="o">-></span><span class="n">a_postorder</span><span class="p">[</span><span class="o">--</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">while</span> <span class="p">(</span><span class="n">j</span> <span class="o"><</span> <span class="n">end</span><span class="p">)</span> <span class="p">{</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">a</span><span class="o">-></span><span class="n">a_postorder</span><span class="p">[</span><span class="n">j</span><span class="o">++</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">b</span><span class="o">-></span><span class="n">b_iused</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">instr</span> <span class="o">*</span><span class="n">instr</span> <span class="o">=</span> <span class="o">&</span><span class="n">b</span><span class="o">-></span><span class="n">b_instr</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">instr</span><span class="o">-></span><span class="n">i_jrel</span> <span class="o">||</span> <span class="n">instr</span><span class="o">-></span><span class="n">i_jabs</span><span class="p">)</span>
<span class="n">dfs</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">instr</span><span class="o">-></span><span class="n">i_target</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">j</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">assert</span><span class="p">(</span><span class="n">a</span><span class="o">-></span><span class="n">a_nblocks</span> <span class="o"><</span> <span class="n">j</span><span class="p">);</span>
<span class="n">a</span><span class="o">-></span><span class="n">a_postorder</span><span class="p">[</span><span class="n">a</span><span class="o">-></span><span class="n">a_nblocks</span><span class="o">++</span><span class="p">]</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
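<p>The traversal order is easier to follow in a simplified model. The sketch below is a plain recursive version in Python, not a port of the C function: it skips the trick of reusing the unused part of <code>a_postorder</code> as a stack. The <code>Block</code> class, its field names, and the block layout for the loop are assumptions made for illustration:</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Block:
    """Toy stand-in for basicblock; all names here are hypothetical."""
    name: str
    next: Optional["Block"] = None                      # like b_next
    jumps: List["Block"] = field(default_factory=list)  # jump targets
    seen: bool = False                                  # like b_seen


def dfs(block: Optional[Block], postorder: List[Block]) -> None:
    """Append blocks reachable from `block` in depth-first post-order."""
    if block is None or block.seen:
        return
    block.seen = True
    dfs(block.next, postorder)   # normal control flow first
    for target in block.jumps:   # then any jump targets
        dfs(target, postorder)
    postorder.append(block)


# Block layout assumed for a for loop: entry -> start -> cleanup -> end
end = Block("end")
cleanup = Block("cleanup", next=end)
start = Block("start", next=cleanup)
start.jumps = [cleanup, start]   # the FOR_ITER exit and the jump back
entry = Block("entry", next=start)

postorder: List[Block] = []
dfs(entry, postorder)

# The assembler emits blocks in reverse post-order.
emit_order = [block.name for block in reversed(postorder)]
print(emit_order)  # ['entry', 'start', 'cleanup', 'end']
```

<p>Because each block is marked as seen before its successors are visited, the jump from <code>start</code> back to itself does not cause infinite recursion.</p>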
<h4 id="creating-a-code-object">Creating a Code Object</h4>
<p>The task of <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L5854"><code>makecode()</code></a> is to go through the compiler state and some of the assembler’s properties, and to put these into a <code>PyCodeObject</code> by calling <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/codeobject.c#L246"><code>PyCode_New()</code></a>:</p>
<p><a href="https://files.realpython.com/media/codeobject.9c054576627c.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/codeobject.9c054576627c.png" width="201" height="550" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/codeobject.9c054576627c.png&w=50&sig=9d1c4ff65adb0d6d578b775ca93a88843a3742c1 50w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/codeobject.9c054576627c.png&w=100&sig=076765eedb49d9e4e944629435b5a0fc10942c4c 100w, https://files.realpython.com/media/codeobject.9c054576627c.png 201w" sizes="75vw" alt="PyCodeObject structure"/></a></p>
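<p>Several of these properties can be inspected from Python through a function’s <code>__code__</code> attribute. The functions <code>outer()</code> and <code>inner()</code> below are made up to show local, cell, free, and global names side by side:</p>

```python
def outer(a):
    b = a + 1

    def inner():
        return b + len("x")

    return inner


outer_code = outer.__code__
inner_code = outer(1).__code__

print(outer_code.co_varnames)  # local variable names
print(outer_code.co_cellvars)  # locals captured by a nested function
print(inner_code.co_freevars)  # names resolved from the enclosing scope
print(inner_code.co_names)     # global and attribute names
print(inner_code.co_consts)    # constants used by the bytecode
```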
<p>The variable names and constants are stored as properties of the code object:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyCodeObject</span> <span class="o">*</span>
<span class="nf">makecode</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="k">struct</span> <span class="n">assembler</span> <span class="o">*</span><span class="n">a</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">consts</span> <span class="o">=</span> <span class="n">consts_dict_keys_inorder</span><span class="p">(</span><span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_consts</span><span class="p">);</span>
<span class="n">names</span> <span class="o">=</span> <span class="n">dict_keys_inorder</span><span class="p">(</span><span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_names</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">varnames</span> <span class="o">=</span> <span class="n">dict_keys_inorder</span><span class="p">(</span><span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_varnames</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">...</span>
<span class="n">cellvars</span> <span class="o">=</span> <span class="n">dict_keys_inorder</span><span class="p">(</span><span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_cellvars</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">...</span>
<span class="n">freevars</span> <span class="o">=</span> <span class="n">dict_keys_inorder</span><span class="p">(</span><span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_freevars</span><span class="p">,</span> <span class="n">PyTuple_GET_SIZE</span><span class="p">(</span><span class="n">cellvars</span><span class="p">));</span>
<span class="p">...</span>
<span class="n">flags</span> <span class="o">=</span> <span class="n">compute_code_flags</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">flags</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
<span class="n">bytecode</span> <span class="o">=</span> <span class="n">PyCode_Optimize</span><span class="p">(</span><span class="n">a</span><span class="o">-></span><span class="n">a_bytecode</span><span class="p">,</span> <span class="n">consts</span><span class="p">,</span> <span class="n">names</span><span class="p">,</span> <span class="n">a</span><span class="o">-></span><span class="n">a_lnotab</span><span class="p">);</span>
<span class="p">...</span>
<span class="n">co</span> <span class="o">=</span> <span class="n">PyCode_NewWithPosOnlyArgs</span><span class="p">(</span><span class="n">posonlyargcount</span><span class="o">+</span><span class="n">posorkeywordargcount</span><span class="p">,</span>
<span class="n">posonlyargcount</span><span class="p">,</span> <span class="n">kwonlyargcount</span><span class="p">,</span> <span class="n">nlocals_int</span><span class="p">,</span>
<span class="n">maxdepth</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">bytecode</span><span class="p">,</span> <span class="n">consts</span><span class="p">,</span> <span class="n">names</span><span class="p">,</span>
<span class="n">varnames</span><span class="p">,</span> <span class="n">freevars</span><span class="p">,</span> <span class="n">cellvars</span><span class="p">,</span> <span class="n">c</span><span class="o">-></span><span class="n">c_filename</span><span class="p">,</span>
<span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_name</span><span class="p">,</span> <span class="n">c</span><span class="o">-></span><span class="n">u</span><span class="o">-></span><span class="n">u_firstlineno</span><span class="p">,</span> <span class="n">a</span><span class="o">-></span><span class="n">a_lnotab</span><span class="p">);</span>
<span class="p">...</span>
<span class="k">return</span> <span class="n">co</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
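<p>The properties that <code>makecode()</code> collects are visible from Python on any function’s <code>__code__</code> attribute. The following is a minimal sketch, where <code>make_adder()</code> and <code>add()</code> are hypothetical names chosen for illustration:</p>

```python
def make_adder(n):
    def add(x):
        total = x + n
        return abs(total)
    return add


add = make_adder(1)
inner_code = add.__code__
outer_code = make_adder.__code__

# Arguments and local variables of add()
print(inner_code.co_varnames)
# Global and builtin names that add() refers to
print(inner_code.co_names)
# Names that add() captures from the enclosing scope
print(inner_code.co_freevars)
# Names in make_adder() that are referenced by a nested function
print(outer_code.co_cellvars)
# Constants used by make_adder(), including the code object for add()
print(outer_code.co_consts)
```

<p>The exact contents of <code>co_consts</code> can differ between CPython versions, so treat the printed values as something to explore rather than a fixed contract.</p>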
<p>You may also notice that the bytecode is sent to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/peephole.c#L230"><code>PyCode_Optimize()</code></a> before it is sent to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/codeobject.c#L106"><code>PyCode_NewWithPosOnlyArgs()</code></a>. This function is part of the bytecode optimization process in <code>Python/peephole.c</code>.</p>
<p>The peephole optimizer goes through the bytecode instructions and, in certain scenarios, replaces them with other instructions. For example, there is an optimization called “constant folding”, so if you put the following statement into your script:</p>
<div class="highlight python"><pre><span></span><span class="n">a</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">5</span>
</pre></div>
<p>It optimizes that to:</p>
<div class="highlight python"><pre><span></span><span class="n">a</span> <span class="o">=</span> <span class="mi">6</span>
</pre></div>
<p>Because 1 and 5 are constant values, the result will always be the same.</p>
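<p>You can observe this optimization from Python with the <code>dis</code> module. This sketch compiles the statement and inspects the resulting instructions. Which internal stage performs the folding, and which opcode loads the result, varies between CPython versions, so the check below only looks at the loaded value and the absence of a binary operation:</p>

```python
import dis

code = compile("a = 1 + 5", "<example>", "exec")
instructions = list(dis.get_instructions(code))

# Values that the compiled bytecode loads directly
loaded_values = [instruction.argval for instruction in instructions]

# If the addition was folded, no BINARY_* instruction remains
has_binary_op = any(
    instruction.opname.startswith("BINARY_") for instruction in instructions
)

for instruction in instructions:
    print(instruction.opname, instruction.argval)
```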
<h4 id="conclusion_2">Conclusion</h4>
<p>We can pull together all of these stages with the instaviz module:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">instaviz</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
<span class="n">a</span> <span class="o">=</span> <span class="mi">2</span><span class="o">**</span><span class="mi">4</span>
<span class="n">b</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">5</span>
<span class="n">c</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">c</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="k">return</span> <span class="n">c</span>
<span class="n">instaviz</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span>
</pre></div>
<p>This will produce an AST graph:</p>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.18.32_pm.4d9a0ea827ff.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.18.32_pm.4d9a0ea827ff.png" width="2788" height="1554" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.18.32_pm.4d9a0ea827ff.png&w=697&sig=9106a1bc23c5cc07f2ef159968c1ba2155a6562d 697w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.18.32_pm.4d9a0ea827ff.png&w=1394&sig=244bebb56c46d0b1eef97b8332afb6bfaa5e3648 1394w, https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.18.32_pm.4d9a0ea827ff.png 2788w" sizes="75vw" alt="Instaviz screenshot 6"/></a></p>
<p>With bytecode instructions in sequence:</p>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.54_pm.6ea8ea532015.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.54_pm.6ea8ea532015.png" width="2536" height="1592" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.54_pm.6ea8ea532015.png&w=634&sig=f499889305c84679bdf07256f978975bbbc98c03 634w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.54_pm.6ea8ea532015.png&w=1268&sig=8d86595bce0f05c8dc45c3db3c0bff9b2d9cb0a9 1268w, https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.54_pm.6ea8ea532015.png 2536w" sizes="75vw" alt="Instaviz screenshot 7"/></a></p>
<p>Also, the code object with the variable names, constants, and binary <code>co_code</code>:</p>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.41_pm.231a0678f142.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.41_pm.231a0678f142.png" width="2098" height="940" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.41_pm.231a0678f142.png&w=524&sig=6daa3f3b9841eabbbf87d73886b81d346cdb33b3 524w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.41_pm.231a0678f142.png&w=1049&sig=7b074cc948021547f3da60e6bf0be3747585c6c8 1049w, https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.41_pm.231a0678f142.png 2098w" sizes="75vw" alt="Instaviz screenshot 8"/></a></p>
<h3 id="execution">Execution</h3>
<p>In <code>Python/pythonrun.c</code> we broke out just before the call to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a>.</p>
<p>This call takes a code object, either fetched from the marshaled <code>.pyc</code> file, or compiled through the AST and compiler stages.</p>
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a> will pass the globals, locals, <code>PyArena</code>, and compiled <code>PyCodeObject</code> to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> in <code>Python/ceval.c</code>.</p>
<p>This stage forms the execution component of CPython. Each of the bytecode operations is taken and executed using a <a href="http://www.cs.uwm.edu/classes/cs315/Bacon/Lecture/HTML/ch10s07.html">“Stack Frame” based system</a>.</p>
<div class="alert alert-primary" role="alert">
<p><strong>What is a Stack Frame?</strong></p>
<p>Stack Frames are a data type used by many runtimes, not just Python, that allows functions to be called and variables to be returned between functions. Stack Frames also contain arguments, local variables, and other state information.</p>
<p>Typically, a Stack Frame exists for every function call, and they are stacked in sequence. You can see CPython’s frame stack anytime an exception is unhandled and the stack is printed on the screen.</p>
</div>
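<p>To make the idea of stacked frames concrete, here is a small sketch that walks the frame stack from Python. It relies on <code>inspect.currentframe()</code>, which is available in CPython but may return <code>None</code> on other implementations. The function names are hypothetical:</p>

```python
import inspect


def innermost():
    # Walk from the current frame back through each caller's frame.
    frame = inspect.currentframe()
    chain = []
    while frame is not None:
        chain.append(frame.f_code.co_name)
        frame = frame.f_back
    return chain


def middle():
    return innermost()


def outermost():
    return middle()


call_chain = outermost()
# The first three entries are the functions defined above, innermost first
print(call_chain[:3])
```

<p>Each frame keeps a reference to the frame that called it in <code>f_back</code>, which is the same chain CPython prints in a traceback.</p>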
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> is the public API for evaluating a code object. The logic for evaluation is split between <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L4045"><code>_PyEval_EvalCodeWithName()</code></a> and <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L745"><code>_PyEval_EvalFrameDefault()</code></a>, which are both in <code>ceval.c</code>.</p>
<p>The public API <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> will construct an execution frame from the top of the stack by calling <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L4045"><code>_PyEval_EvalCodeWithName()</code></a>.</p>
<p>The construction of the first execution frame has many steps:</p>
<ol>
<li>Keyword and positional arguments are resolved.</li>
<li>The use of <code>*args</code> and <code>**kwargs</code> in function definitions is resolved.</li>
<li>Arguments are added as local variables to the scope.</li>
<li>Co-routines and <a href="https://realpython.com/introduction-to-python-generators/">Generators</a> are created, including the Asynchronous Generators.</li>
</ol>
<p>The frame object looks like this:</p>
<p><a href="https://files.realpython.com/media/PyFrameObject.8616eee0503e.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/PyFrameObject.8616eee0503e.png" width="161" height="408" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/PyFrameObject.8616eee0503e.png&w=40&sig=5c85bcc7939e61d207a19cf82d23cab3f73ec760 40w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/PyFrameObject.8616eee0503e.png&w=80&sig=25333bbe22791650facbac4288bb9f49065fb014 80w, https://files.realpython.com/media/PyFrameObject.8616eee0503e.png 161w" sizes="75vw" alt="PyFrameObject structure"/></a></p>
<p>Let’s step through that sequence.</p>
<h4 id="1-constructing-thread-state">1. Constructing Thread State</h4>
<p>Before a frame can be executed, it needs to be referenced from a thread. CPython can have many threads running at any one time within a single interpreter. An Interpreter state includes a list of those threads as a linked list. The thread structure is called <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/pystate.h#L23"><code>PyThreadState</code></a>, and there are many references throughout <code>ceval.c</code>.</p>
<p>Here is the structure of the thread state object:</p>
<p><a href="https://files.realpython.com/media/PyThreadState.20467f3689b7.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/PyThreadState.20467f3689b7.png" width="201" height="208" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/PyThreadState.20467f3689b7.png&w=50&sig=90efd8d98ffa8ad9ed8b233c1e73fa469e4db4ac 50w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/PyThreadState.20467f3689b7.png&w=100&sig=e5d4a21dbc5cce9c1017a6d1805cf9eedf03ac9c 100w, https://files.realpython.com/media/PyThreadState.20467f3689b7.png 201w" sizes="75vw" alt="PyThreadState structure"/></a></p>
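<p>The relationship between threads and frames can be observed from Python. This sketch uses <code>sys._current_frames()</code>, a CPython-specific helper intended for debugging, to show that every running thread has its own topmost frame. The <code>worker()</code> function and the event names are hypothetical:</p>

```python
import sys
import threading

started = threading.Event()
release = threading.Event()


def worker():
    started.set()
    release.wait()  # keep the thread alive until the main thread is done


thread = threading.Thread(target=worker)
thread.start()
started.wait()

# Maps each thread identifier to that thread's current top frame
frames = sys._current_frames()
thread_ids = set(frames)
main_id = threading.get_ident()
worker_id = thread.ident

release.set()
thread.join()

print(len(thread_ids), "threads had an active frame")
```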
<h4 id="2-constructing-frames">2. Constructing Frames</h4>
<p>The input to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> and therefore <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L4045"><code>_PyEval_EvalCodeWithName()</code></a> has arguments for:</p>
<ul>
<li><strong><code>_co</code>:</strong> a <code>PyCodeObject</code></li>
<li><strong><code>globals</code>:</strong> a <code>PyDict</code> with variable names as keys and their values</li>
<li><strong><code>locals</code>:</strong> a <code>PyDict</code> with variable names as keys and their values</li>
</ul>
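<p>These three inputs mirror what the built-in <code>exec()</code> function accepts in Python. As a rough illustration, and not a description of the internal call path, the following sketch compiles a code object and evaluates it with explicit globals and locals dictionaries:</p>

```python
source = "result = x * 2"
code = compile(source, "<example>", "exec")

globals_ns = {"x": 21}  # names the code can read as globals
locals_ns = {}          # names the code assigns end up here

exec(code, globals_ns, locals_ns)

print(locals_ns["result"])
```

<p>Because a separate locals dictionary was supplied, the assignment to <code>result</code> lands there and not in the globals dictionary.</p>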
<p>The other arguments are optional, and not used for the basic API:</p>
<ul>
<li><strong><code>args</code>:</strong> a <code>PyTuple</code> with positional argument values in order, and <code>argcount</code> for the number of values</li>
<li><strong><code>kwnames</code>:</strong> a list of keyword argument names</li>
<li><strong><code>kwargs</code>:</strong> a list of keyword argument values, and <code>kwcount</code> for the number of them</li>
<li><strong><code>defs</code>:</strong> a list of default values for positional arguments, and <code>defcount</code> for the length</li>
<li><strong><code>kwdefs</code>:</strong> a dictionary with the default values for keyword arguments</li>
<li><strong><code>closure</code>:</strong> a tuple of cells holding the values for the names in the code object’s <code>co_freevars</code> field</li>
<li><strong><code>name</code>:</strong> the name for this evaluation statement as a string</li>
<li><strong><code>qualname</code>:</strong> the qualified name for this evaluation statement as a string</li>
</ul>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">_PyEval_EvalCodeWithName</span><span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="n">_co</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="k">const</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="n">Py_ssize_t</span> <span class="n">argcount</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="k">const</span> <span class="o">*</span><span class="n">kwnames</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="k">const</span> <span class="o">*</span><span class="n">kwargs</span><span class="p">,</span>
<span class="n">Py_ssize_t</span> <span class="n">kwcount</span><span class="p">,</span> <span class="kt">int</span> <span class="n">kwstep</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="k">const</span> <span class="o">*</span><span class="n">defs</span><span class="p">,</span> <span class="n">Py_ssize_t</span> <span class="n">defcount</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">kwdefs</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">closure</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">name</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">qualname</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">PyThreadState</span> <span class="o">*</span><span class="n">tstate</span> <span class="o">=</span> <span class="n">_PyThreadState_GET</span><span class="p">();</span>
<span class="n">assert</span><span class="p">(</span><span class="n">tstate</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">globals</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">_PyErr_SetString</span><span class="p">(</span><span class="n">tstate</span><span class="p">,</span> <span class="n">PyExc_SystemError</span><span class="p">,</span>
<span class="s">"PyEval_EvalCodeEx: NULL globals"</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="cm">/* Create the frame */</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">_PyFrame_New_NoTrack</span><span class="p">(</span><span class="n">tstate</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">f</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">fastlocals</span> <span class="o">=</span> <span class="n">f</span><span class="o">-></span><span class="n">f_localsplus</span><span class="p">;</span>
<span class="n">freevars</span> <span class="o">=</span> <span class="n">f</span><span class="o">-></span><span class="n">f_localsplus</span> <span class="o">+</span> <span class="n">co</span><span class="o">-></span><span class="n">co_nlocals</span><span class="p">;</span>
</pre></div>
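<p>Several of the optional arguments have counterparts that you can inspect on a function object. This sketch assumes a hypothetical <code>make_greeter()</code> factory and shows where the defaults, keyword-only defaults, closure, name, and qualified name come from:</p>

```python
def make_greeter(greeting):
    def greet(name, punctuation="!", *, shout=False):
        text = f"{greeting}, {name}{punctuation}"
        return text.upper() if shout else text
    return greet


greet = make_greeter("Hello")

print(greet.__defaults__)      # defaults for positional arguments
print(greet.__kwdefaults__)    # defaults for keyword-only arguments
print(greet.__closure__)       # cells for the free variables
print(greet.__code__.co_freevars)
print(greet.__name__, greet.__qualname__)
```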
<h4 id="3-converting-keyword-parameters-to-a-dictionary">3. Converting Keyword Parameters to a Dictionary</h4>
<p>If the function definition contained a <code>**kwargs</code> style catch-all for keyword arguments, then a new dictionary is created, and the values are copied across. The <code>kwargs</code> name is then set as a variable, like in this example:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">example</span><span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">arg2</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="n">kwargs</span><span class="p">[</span><span class="s1">'extra'</span><span class="p">])</span> <span class="c1"># this would resolve to a dictionary key</span>
</pre></div>
<p>The logic for creating a keyword argument dictionary is in the next part of <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L4045">_PyEval_EvalCodeWithName()</a>:</p>
<div class="highlight c"><pre><span></span> <span class="cm">/* Create a dictionary for keyword parameters (**kwargs) */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_flags</span> <span class="o">&</span> <span class="n">CO_VARKEYWORDS</span><span class="p">)</span> <span class="p">{</span>
<span class="n">kwdict</span> <span class="o">=</span> <span class="n">PyDict_New</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">kwdict</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">fail</span><span class="p">;</span>
<span class="n">i</span> <span class="o">=</span> <span class="n">total_args</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_flags</span> <span class="o">&</span> <span class="n">CO_VARARGS</span><span class="p">)</span> <span class="p">{</span>
<span class="n">i</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">SETLOCAL</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">kwdict</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="p">{</span>
<span class="n">kwdict</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>The <code>kwdict</code> variable will reference a <code>PyDictObject</code> if the function definition accepts <code>**kwargs</code>; otherwise, it is set to <code>NULL</code>.</p>
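<p>The <code>CO_VARKEYWORDS</code> flag checked above is also exposed in Python through the <code>inspect</code> module. The following sketch, using two hypothetical functions, shows the flag on a code object and confirms that the dictionary a function receives is a new object and not the caller’s dictionary:</p>

```python
import inspect


def with_kwargs(arg, arg2=None, **kwargs):
    return kwargs


def without_kwargs(arg, arg2=None):
    return None


accepts = bool(with_kwargs.__code__.co_flags & inspect.CO_VARKEYWORDS)
rejects = bool(without_kwargs.__code__.co_flags & inspect.CO_VARKEYWORDS)
print(accepts, rejects)

# The catch-all name is stored after the regular arguments
print(with_kwargs.__code__.co_varnames)

caller_dict = {"extra": 1}
received = with_kwargs(0, **caller_dict)
print(received == caller_dict, received is caller_dict)
```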
<h4 id="4-converting-positional-arguments-into-variables">4. Converting Positional Arguments Into Variables</h4>
<p>Next, each of the positional arguments (if provided) is set as a local variable:</p>
<div class="highlight c"><pre><span></span> <span class="cm">/* Copy all positional arguments into local variables */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">argcount</span> <span class="o">></span> <span class="n">co</span><span class="o">-></span><span class="n">co_argcount</span><span class="p">)</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">co</span><span class="o">-></span><span class="n">co_argcount</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">argcount</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="n">j</span><span class="p">];</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">x</span><span class="p">);</span>
<span class="n">SETLOCAL</span><span class="p">(</span><span class="n">j</span><span class="p">,</span> <span class="n">x</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>At the end of the loop, you’ll see a call to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L1005"><code>SETLOCAL()</code></a> with the value, so if a positional argument is defined with a value, that is available within this scope: </p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">example</span><span class="p">(</span><span class="n">arg1</span><span class="p">,</span> <span class="n">arg2</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="n">arg1</span><span class="p">,</span> <span class="n">arg2</span><span class="p">)</span> <span class="c1"># both args are already local variables.</span>
</pre></div>
<p>Also, the reference counter for those variables is incremented, so the garbage collector won’t remove them until the frame has been evaluated.</p>
<h4 id="5-packing-positional-arguments-into-args">5. Packing Positional Arguments Into <code>*args</code></h4>
<p>Similar to <code>**kwargs</code>, a function argument prepended with a <code>*</code> can be set to catch all remaining positional arguments. This argument is a tuple and the <code>*args</code> name is set as a local variable: </p>
<div class="highlight c"><pre><span></span> <span class="cm">/* Pack other positional arguments into the *args argument */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_flags</span> <span class="o">&</span> <span class="n">CO_VARARGS</span><span class="p">)</span> <span class="p">{</span>
<span class="n">u</span> <span class="o">=</span> <span class="n">_PyTuple_FromArray</span><span class="p">(</span><span class="n">args</span> <span class="o">+</span> <span class="n">n</span><span class="p">,</span> <span class="n">argcount</span> <span class="o">-</span> <span class="n">n</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">u</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">goto</span> <span class="n">fail</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">SETLOCAL</span><span class="p">(</span><span class="n">total_args</span><span class="p">,</span> <span class="n">u</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
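<p>The way positional arguments are split between named parameters and <code>*args</code> can be modeled in a few lines of Python. This is a simplified sketch of the logic shown in the two C snippets above, not the actual implementation, and <code>bind_positional()</code> is a hypothetical helper:</p>

```python
def bind_positional(args, co_argcount):
    # The first n values fill the named parameters.
    n = min(len(args), co_argcount)
    named = args[:n]
    # Everything left over is packed into the *args tuple.
    extra = tuple(args[n:])
    return named, extra


def example(arg1, arg2, *args):
    return arg1, arg2, args


call_args = (1, 2, 3, 4)
named, extra = bind_positional(call_args, example.__code__.co_argcount)
print(named, extra)
print(example(*call_args))
```

<p>Note that <code>co_argcount</code> counts only the named positional parameters, so the <code>*args</code> catch-all is not included in it.</p>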
<h4 id="6-loading-keyword-arguments">6. Loading Keyword Arguments</h4>
<p>If the function was called with keyword arguments and values, the <code>kwdict</code> dictionary created in step 3 is now filled with any remaining keyword arguments passed by the caller that don’t resolve to named arguments or positional arguments.</p>
<p>For example, the <code>e</code> argument was neither positional nor named, so it is added to <code>**remaining</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">my_function</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">remaining</span><span class="p">):</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">remaining</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">my_function</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">e</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="go">1 2 3 4 {'e': 5}</span>
</pre></div>
<div class="alert alert-primary" role="alert">
<p><strong>Positional-only arguments</strong> are a new feature in Python 3.8. Introduced in <a href="https://www.python.org/dev/peps/pep-0570/">PEP 570</a>, positional-only arguments are a way of stopping users of your API from passing positional arguments with keyword syntax.</p>
<p>For example, this simple function converts Fahrenheit to Celsius. Note the use of <code>/</code> as a special argument that separates positional-only arguments from the other arguments:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">to_celcius</span><span class="p">(</span><span class="n">farenheit</span><span class="p">,</span> <span class="o">/</span><span class="p">,</span> <span class="n">options</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">farenheit</span><span class="o">-</span><span class="mi">32</span><span class="p">)</span><span class="o">*</span><span class="mi">5</span><span class="o">/</span><span class="mi">9</span>
</pre></div>
<p>All arguments to the left of <code>/</code> must be passed positionally, and arguments to the right can be passed as either positional or keyword arguments:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">to_celcius</span><span class="p">(</span><span class="mi">110</span><span class="p">)</span>
</pre></div>
<p>Calling the function using a keyword argument to a positional-only argument will raise a <code>TypeError</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">to_celcius</span><span class="p">(</span><span class="n">farenheit</span><span class="o">=</span><span class="mi">110</span><span class="p">)</span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
<span class="gr">TypeError</span>: <span class="n">to_celcius() got some positional-only arguments passed as keyword arguments: 'farenheit'</span>
</pre></div>
</div>
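<p>The number of positional-only parameters is recorded on the code object as <code>co_posonlyargcount</code>, which you can read from Python 3.8 onward. The sketch below uses hypothetical functions to show that value, the resulting <code>TypeError</code>, and a consequence of skipping positional-only names during keyword matching: the same name can be reused as a key in <code>**kwargs</code>:</p>

```python
def to_celsius(fahrenheit, /, options=None):
    return (fahrenheit - 32) * 5 / 9


def tagged(value, /, **labels):
    return value, labels


# One parameter sits to the left of the "/" marker
print(to_celsius.__code__.co_posonlyargcount)
print(to_celsius(212))

error_message = None
try:
    to_celsius(fahrenheit=110)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# "value" is positional-only, so value=2 falls through to **labels
print(tagged(1, value=2))
```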
<p>The resolution of the keyword argument dictionary values comes after the unpacking of all other arguments. PEP 570 positional-only arguments are handled by starting the keyword-argument loop at <code>co_posonlyargcount</code>, so positional-only names are never matched against keywords. If the <code>/</code> symbol was used as the 3rd argument, the value of <code>co_posonlyargcount</code> would be <code>2</code>.
Each keyword that matches a parameter name is set as a local variable, and <code>PyDict_SetItem()</code> is called for each remaining keyword to add it to the <code>kwdict</code> dictionary:</p>
<div class="highlight c"><pre><span></span> <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">kwcount</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="n">kwstep</span><span class="p">)</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">**</span><span class="n">co_varnames</span><span class="p">;</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">keyword</span> <span class="o">=</span> <span class="n">kwnames</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">value</span> <span class="o">=</span> <span class="n">kwargs</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">...</span>
<span class="cm">/* Speed hack: do raw pointer compares. As names are</span>
<span class="cm"> normally interned this should almost always hit. */</span>
<span class="n">co_varnames</span> <span class="o">=</span> <span class="p">((</span><span class="n">PyTupleObject</span> <span class="o">*</span><span class="p">)(</span><span class="n">co</span><span class="o">-></span><span class="n">co_varnames</span><span class="p">))</span><span class="o">-></span><span class="n">ob_item</span><span class="p">;</span>
<span class="hll"> <span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="n">co</span><span class="o">-></span><span class="n">co_posonlyargcount</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">total_args</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">name</span> <span class="o">=</span> <span class="n">co_varnames</span><span class="p">[</span><span class="n">j</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">name</span> <span class="o">==</span> <span class="n">keyword</span><span class="p">)</span> <span class="p">{</span>
<span class="k">goto</span> <span class="n">kw_found</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">kwdict</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_posonlyargcount</span>
<span class="o">&&</span> <span class="n">positional_only_passed_as_keyword</span><span class="p">(</span><span class="n">tstate</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span>
<span class="n">kwcount</span><span class="p">,</span> <span class="n">kwnames</span><span class="p">))</span>
<span class="p">{</span>
<span class="k">goto</span> <span class="n">fail</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">_PyErr_Format</span><span class="p">(</span><span class="n">tstate</span><span class="p">,</span> <span class="n">PyExc_TypeError</span><span class="p">,</span>
<span class="s">"%U() got an unexpected keyword argument '%S'"</span><span class="p">,</span>
<span class="n">co</span><span class="o">-></span><span class="n">co_name</span><span class="p">,</span> <span class="n">keyword</span><span class="p">);</span>
<span class="k">goto</span> <span class="n">fail</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PyDict_SetItem</span><span class="p">(</span><span class="n">kwdict</span><span class="p">,</span> <span class="n">keyword</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">goto</span> <span class="n">fail</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">continue</span><span class="p">;</span>
<span class="nl">kw_found</span><span class="p">:</span>
<span class="p">...</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">SETLOCAL</span><span class="p">(</span><span class="n">j</span><span class="p">,</span> <span class="n">value</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">...</span>
</pre></div>
<p>At the end of the loop, you’ll see a call to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L1005"><code>SETLOCAL()</code></a> with the value. If a keyword argument is defined with a value, that is available within this scope:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">example</span><span class="p">(</span><span class="n">arg1</span><span class="p">,</span> <span class="n">arg2</span><span class="p">,</span> <span class="n">example_kwarg</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="n">example_kwarg</span><span class="p">)</span> <span class="c1"># example_kwarg is already a local variable.</span>
</pre></div>
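<p>You can inspect the same counters from Python. The following is a minimal sketch, assuming Python 3.8 or later (where <code>co_posonlyargcount</code> exists); the function <code>example()</code> and its parameter names are made up for illustration:</p>

```python
def example(arg1, arg2, /, arg3, example_kwarg=None):
    # Hypothetical function: arg1 and arg2 are positional-only (PEP 570).
    return locals()


code = example.__code__
print(code.co_posonlyargcount)  # number of parameters to the left of "/"
print(code.co_argcount)         # all positional parameters, positional-only included
print(code.co_varnames)         # parameter names come first, in signature order

# Keyword arguments that match a parameter name end up as ordinary locals.
print(example(1, 2, arg3=3, example_kwarg="x"))
```

<p>Because the keyword loop starts at <code>co_posonlyargcount</code>, only <code>arg3</code> and <code>example_kwarg</code> can be matched by name in this sketch.</p>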
<h4 id="7-adding-missing-positional-arguments">7. Adding Missing Positional Arguments</h4>
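<p>Any positional arguments provided to a function call that are not in the list of positional arguments are added to a <code>*args</code> tuple. If this tuple does not exist, a failure is raised. Positional parameters that the call did not supply are then filled in from their default values, and a failure is raised if a missing parameter has no default:</p>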
<div class="highlight c"><pre><span></span> <span class="cm">/* Add missing positional arguments (copy default values from defs) */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">argcount</span> <span class="o"><</span> <span class="n">co</span><span class="o">-></span><span class="n">co_argcount</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Py_ssize_t</span> <span class="n">m</span> <span class="o">=</span> <span class="n">co</span><span class="o">-></span><span class="n">co_argcount</span> <span class="o">-</span> <span class="n">defcount</span><span class="p">;</span>
<span class="n">Py_ssize_t</span> <span class="n">missing</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="n">argcount</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">m</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">GETLOCAL</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">missing</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">missing</span><span class="p">)</span> <span class="p">{</span>
<span class="n">missing_arguments</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="n">missing</span><span class="p">,</span> <span class="n">defcount</span><span class="p">,</span> <span class="n">fastlocals</span><span class="p">);</span>
<span class="k">goto</span> <span class="n">fail</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">></span> <span class="n">m</span><span class="p">)</span>
<span class="n">i</span> <span class="o">=</span> <span class="n">n</span> <span class="o">-</span> <span class="n">m</span><span class="p">;</span>
<span class="k">else</span>
<span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">defcount</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">GETLOCAL</span><span class="p">(</span><span class="n">m</span><span class="o">+</span><span class="n">i</span><span class="p">)</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">def</span> <span class="o">=</span> <span class="n">defs</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">def</span><span class="p">);</span>
<span class="n">SETLOCAL</span><span class="p">(</span><span class="n">m</span><span class="o">+</span><span class="n">i</span><span class="p">,</span> <span class="n">def</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
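<p>The effect of this step is visible from Python through <code>__defaults__</code>. This is a small sketch rather than a description of the C internals; <code>greet()</code> is a hypothetical function, and <code>first_default</code> mirrors the role of <code>m</code> in the C code above:</p>

```python
def greet(name, greeting="Hello", punctuation="!"):
    # Hypothetical function with one required and two defaulted parameters.
    return f"{greeting}, {name}{punctuation}"


defaults = greet.__defaults__
# Index of the first parameter that has a default value.
first_default = greet.__code__.co_argcount - len(defaults)

print(defaults)
print(first_default)
print(greet("Ada"))         # both defaults are copied in
print(greet("Ada", "Hi"))   # only the last default is copied in

try:
    greet()                 # "name" has no default to fall back on
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```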
<h4 id="8-adding-missing-keyword-arguments">8. Adding Missing Keyword Arguments</h4>
<p>Any keyword arguments provided to a function call that are not in the list of named keyword arguments are added to a <code>**kwargs</code> dictionary. If this dictionary does not exist, a failure is raised. Keyword-only parameters that the call did not supply are then filled in from <code>kwdefs</code>:</p>
<div class="highlight c"><pre><span></span> <span class="cm">/* Add missing keyword arguments (copy default values from kwdefs) */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_kwonlyargcount</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Py_ssize_t</span> <span class="n">missing</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="n">co</span><span class="o">-></span><span class="n">co_argcount</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">total_args</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">name</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">GETLOCAL</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">continue</span><span class="p">;</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">PyTuple_GET_ITEM</span><span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_varnames</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">kwdefs</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">def</span> <span class="o">=</span> <span class="n">PyDict_GetItemWithError</span><span class="p">(</span><span class="n">kwdefs</span><span class="p">,</span> <span class="n">name</span><span class="p">);</span>
<span class="p">...</span>
<span class="p">}</span>
<span class="n">missing</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="p">}</span>
</pre></div>
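<p>Keyword-only defaults are exposed in Python as <code>__kwdefaults__</code>. The sketch below assumes a hypothetical <code>connect()</code> function with one defaulted and one required keyword-only parameter:</p>

```python
def connect(host, *, port=5432, timeout):
    # Hypothetical function: port and timeout are keyword-only.
    return host, port, timeout


kw_defaults = connect.__kwdefaults__
print(kw_defaults)                          # only "port" has a default
print(connect.__code__.co_kwonlyargcount)   # both keyword-only parameters are counted
print(connect("db", timeout=3))             # port is copied from the defaults

try:
    connect("db")                           # timeout has no default
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```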
<h4 id="9-collapsing-closures">9. Collapsing Closures</h4>
<p>Any closure names are added to the code object’s list of free variable names:</p>
<div class="highlight c"><pre><span></span> <span class="cm">/* Copy closure variables to free variables */</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">PyTuple_GET_SIZE</span><span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_freevars</span><span class="p">);</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">o</span> <span class="o">=</span> <span class="n">PyTuple_GET_ITEM</span><span class="p">(</span><span class="n">closure</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">o</span><span class="p">);</span>
<span class="n">freevars</span><span class="p">[</span><span class="n">PyTuple_GET_SIZE</span><span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_cellvars</span><span class="p">)</span> <span class="o">+</span> <span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
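<p>From Python, the closure tuple and the free variable names can be inspected with <code>__closure__</code> and <code>co_freevars</code>. A minimal sketch, using a hypothetical <code>make_counter()</code> function:</p>

```python
def make_counter(start):
    count = start

    def increment():
        nonlocal count      # count is a free variable of increment()
        count += 1
        return count

    return increment


counter = make_counter(10)
free_names = counter.__code__.co_freevars   # names resolved through the closure
cell = counter.__closure__[0]               # one cell object per free variable

before = cell.cell_contents
result = counter()
after = cell.cell_contents
print(free_names, before, result, after)
```

<p>The cell is shared between the enclosing and the inner function, which is why the value read through <code>cell_contents</code> changes after the call.</p>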
<h4 id="10-creating-generators-coroutines-and-asynchronous-generators">10. Creating Generators, Coroutines, and Asynchronous Generators</h4>
<p>If the evaluated code object is flagged as a generator, coroutine, or async generator, then a new generator, coroutine, or async generator object is created using one of the unique methods in the Generator, Coroutine or Async libraries, and the current frame is added as a property.</p>
<p>The new object is then returned, and the original frame is not evaluated. The frame is only evaluated when the generator/coroutine/async method is called on to execute its target:</p>
<div class="highlight c"><pre><span></span> <span class="cm">/* Handle generator/coroutine/asynchronous generator */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_flags</span> <span class="o">&</span> <span class="p">(</span><span class="n">CO_GENERATOR</span> <span class="o">|</span> <span class="n">CO_COROUTINE</span> <span class="o">|</span> <span class="n">CO_ASYNC_GENERATOR</span><span class="p">))</span> <span class="p">{</span>
<span class="p">...</span>
<span class="cm">/* Create a new generator that owns the ready to run frame</span>
<span class="cm"> * and return that as the value. */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">is_coro</span><span class="p">)</span> <span class="p">{</span>
<span class="n">gen</span> <span class="o">=</span> <span class="n">PyCoro_New</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">qualname</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_flags</span> <span class="o">&</span> <span class="n">CO_ASYNC_GENERATOR</span><span class="p">)</span> <span class="p">{</span>
<span class="n">gen</span> <span class="o">=</span> <span class="n">PyAsyncGen_New</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">qualname</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">gen</span> <span class="o">=</span> <span class="n">PyGen_NewWithQualName</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">qualname</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="k">return</span> <span class="n">gen</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Lastly, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L738"><code>PyEval_EvalFrameEx()</code></a> is called with the new frame:</p>
<div class="highlight c"><pre><span></span> <span class="n">retval</span> <span class="o">=</span> <span class="n">PyEval_EvalFrameEx</span><span class="p">(</span><span class="n">f</span><span class="p">,</span><span class="mi">0</span><span class="p">);</span>
<span class="p">...</span>
<span class="p">}</span>
</pre></div>
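<p>The deferred evaluation of generator frames is easy to observe from Python. In this sketch, <code>numbers()</code> and the <code>executed</code> list are made-up names used only to record when the function body actually runs:</p>

```python
import inspect

executed = []


def numbers():
    executed.append("body started")
    yield 1
    yield 2


# The compiler sets CO_GENERATOR on the code object of a generator function.
is_generator = bool(numbers.__code__.co_flags & inspect.CO_GENERATOR)

gen = numbers()                  # returns a generator that owns the frame
ran_on_call = list(executed)     # nothing has run yet

first = next(gen)                # the frame is evaluated up to the first yield
ran_after_next = list(executed)

print(is_generator, type(gen).__name__, ran_on_call, first, ran_after_next)
```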
<h4 id="frame-execution">Frame Execution</h4>
<p>As covered earlier in the compiler and AST chapters, the code object contains a binary encoding of the bytecode to be executed. It also contains a list of variables and a symbol table.</p>
<p>The local and global variables are determined at runtime based on how that function, module, or block was called. This information is added to the frame by the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L4045"><code>_PyEval_EvalCodeWithName()</code></a> function. There are other usages of frames, like the coroutine decorator, which dynamically generates a frame with the target as a variable.</p>
<p>The public API, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L738"><code>PyEval_EvalFrameEx()</code></a>, calls the interpreter’s configured frame evaluation function in the <code>eval_frame</code> property. Frame evaluation was <a href="https://www.python.org/dev/peps/pep-0523/">made pluggable in Python 3.6 with PEP 523</a>.</p>
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L745"><code>_PyEval_EvalFrameDefault()</code></a> is the default function, and it is unusual to use anything other than this. </p>
<p>Frames are executed in the main execution loop inside <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L745"><code>_PyEval_EvalFrameDefault()</code></a>. This is the central function that brings everything together and brings your code to life. It contains decades of optimization, since even a single line of code can have a significant impact on performance for the whole of CPython.</p>
<p>Everything that gets executed in CPython goes through this function.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Something you might notice when reading <code>ceval.c</code> is how many times C macros have been used. C macros are a way of having DRY-compliant code without the overhead of making function calls. The preprocessor expands the macros into C code, and the compiler then compiles the generated code.</p>
<p>If you want to see the expanded code, you can run <code>gcc -E</code> on Linux and macOS:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> gcc -E Python/ceval.c
</pre></div>
<p>Alternatively, <a href="https://realpython.com/python-development-visual-studio-code/">Visual Studio Code</a> can do inline macro expansion once you have installed the official C/C++ extension:</p>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-25_at_3.33.40_pm.c240b1a46e99.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-03-25_at_3.33.40_pm.c240b1a46e99.png" width="1866" height="622" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-25_at_3.33.40_pm.c240b1a46e99.png&w=466&sig=28bc3a2f62e499a92fdac8fbd951c6ab5a18d361 466w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-25_at_3.33.40_pm.c240b1a46e99.png&w=933&sig=ec297187398d1a9b390bdadedd2c96855b2dfacc 933w, https://files.realpython.com/media/Screen_Shot_2019-03-25_at_3.33.40_pm.c240b1a46e99.png 1866w" sizes="75vw" alt="C Macro expansion with VScode"/></a>
</p>
</div>
<p>We can step through frame execution in Python 3.7 and beyond by enabling the tracing attribute on the current thread.</p>
<p>This code example sets the global tracing function to a function called <code>trace()</code> that gets the stack from the current frame and prints the disassembled opcodes to the screen, along with some extra information for debugging:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">dis</span>
<span class="kn">import</span> <span class="nn">traceback</span>
<span class="kn">import</span> <span class="nn">io</span>
<span class="k">def</span> <span class="nf">trace</span><span class="p">(</span><span class="n">frame</span><span class="p">,</span> <span class="n">event</span><span class="p">,</span> <span class="n">args</span><span class="p">):</span>
<span class="n">frame</span><span class="o">.</span><span class="n">f_trace_opcodes</span> <span class="o">=</span> <span class="kc">True</span>
<span class="n">stack</span> <span class="o">=</span> <span class="n">traceback</span><span class="o">.</span><span class="n">extract_stack</span><span class="p">(</span><span class="n">frame</span><span class="p">)</span>
<span class="n">pad</span> <span class="o">=</span> <span class="s2">" "</span><span class="o">*</span><span class="nb">len</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span> <span class="o">+</span> <span class="s2">"|"</span>
<span class="k">if</span> <span class="n">event</span> <span class="o">==</span> <span class="s1">'opcode'</span><span class="p">:</span>
<span class="k">with</span> <span class="n">io</span><span class="o">.</span><span class="n">StringIO</span><span class="p">()</span> <span class="k">as</span> <span class="n">out</span><span class="p">:</span>
<span class="n">dis</span><span class="o">.</span><span class="n">disco</span><span class="p">(</span><span class="n">frame</span><span class="o">.</span><span class="n">f_code</span><span class="p">,</span> <span class="n">frame</span><span class="o">.</span><span class="n">f_lasti</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">out</span><span class="p">)</span>
<span class="n">lines</span> <span class="o">=</span> <span class="n">out</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="p">[</span><span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s2">"</span><span class="si">{pad}{l}</span><span class="s2">"</span><span class="p">)</span> <span class="k">for</span> <span class="n">l</span> <span class="ow">in</span> <span class="n">lines</span><span class="p">]</span>
<span class="k">elif</span> <span class="n">event</span> <span class="o">==</span> <span class="s1">'call'</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s2">"</span><span class="si">{pad}</span><span class="s2">Calling </span><span class="si">{frame.f_code}</span><span class="s2">"</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">event</span> <span class="o">==</span> <span class="s1">'return'</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s2">"</span><span class="si">{pad}</span><span class="s2">Returning </span><span class="si">{args}</span><span class="s2">"</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">event</span> <span class="o">==</span> <span class="s1">'line'</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s2">"</span><span class="si">{pad}</span><span class="s2">Changing line to </span><span class="si">{frame.f_lineno}</span><span class="s2">"</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s2">"</span><span class="si">{pad}{frame}</span><span class="s2"> (</span><span class="si">{event}</span><span class="s2"> - </span><span class="si">{args}</span><span class="s2">)"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s2">"</span><span class="si">{pad}</span><span class="s2">----------------------------------"</span><span class="p">)</span>
<span class="k">return</span> <span class="n">trace</span>
<span class="n">sys</span><span class="o">.</span><span class="n">settrace</span><span class="p">(</span><span class="n">trace</span><span class="p">)</span>
<span class="c1"># Run some code for a demo</span>
<span class="nb">eval</span><span class="p">(</span><span class="s1">'"-".join([letter for letter in "hello"])'</span><span class="p">)</span>
</pre></div>
<p>This prints the code within each stack and points to the next operation before it is executed. When a frame returns a value, the return statement is printed:</p>
<p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-25_at_1.21.07_pm.7b03e9032f62.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-03-25_at_1.21.07_pm.7b03e9032f62.png" width="2572" height="1686" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-25_at_1.21.07_pm.7b03e9032f62.png&w=643&sig=6d6106abaf9fb5f3d7e45e09320c7575b3a0e77c 643w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-25_at_1.21.07_pm.7b03e9032f62.png&w=1286&sig=c41bf26bd15641fded6a31125b0cb5b630496647 1286w, https://files.realpython.com/media/Screen_Shot_2019-03-25_at_1.21.07_pm.7b03e9032f62.png 2572w" sizes="75vw" alt="Evaluating frame with tracing"/></a></p>
<p>The full list of instructions is available on the <a href="https://docs.python.org/3/library/dis.html#python-bytecode-instructions"><code>dis</code> module</a> documentation. </p>
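<p>To see which of those instructions a piece of code compiles to, you can iterate over them with <code>dis.get_instructions()</code>. This is a small sketch with a hypothetical <code>add()</code> function; the exact opcode names printed depend on the Python version, so no output is shown:</p>

```python
import dis


def add(a, b):
    return a + b


# Each Instruction carries the opcode name, its argument, and its byte offset.
instructions = list(dis.get_instructions(add))
for instruction in instructions:
    print(instruction.offset, instruction.opname, instruction.argrepr)
```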
<h4 id="the-value-stack">The Value Stack</h4>
<p>Inside the core evaluation loop, a value stack is created. This stack is a list of pointers to sequential <code>PyObject</code> instances.</p>
<p>One way to think of the value stack is like a wooden peg on which you can stack cylinders. You would only add or remove one item at a time. This is done using the <code>PUSH(a)</code> macro, where <code>a</code> is a pointer to a <code>PyObject</code>.</p>
<p>For example, if you created a <code>PyLong</code> with the value 10 and pushed it onto the value stack:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span><span class="n">a</span> <span class="o">=</span> <span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">10</span><span class="p">);</span>
<span class="n">PUSH</span><span class="p">(</span><span class="n">a</span><span class="p">);</span>
</pre></div>
<p>This action would have the following effect:</p>
<p><a href="https://files.realpython.com/media/stacks_push.0c755d83b347.png" target="_blank"><img class="img-fluid " src="https://files.realpython.com/media/stacks_push.0c755d83b347.png" width="822" height="279" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_push.0c755d83b347.png&w=205&sig=d3ce00e768e8c7f89ef321ea039b159131b39d8a 205w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_push.0c755d83b347.png&w=411&sig=65cf700de41fd59781082f03c5d347fd891ac5a9 411w, https://files.realpython.com/media/stacks_push.0c755d83b347.png 822w" sizes="75vw" alt="PUSH()"/></a></p>
<p>In the next operation, to fetch that value, you would use the <code>POP()</code> macro to take the top value from the stack:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span><span class="n">a</span> <span class="o">=</span> <span class="n">POP</span><span class="p">();</span> <span class="c1">// a is PyLongObject with a value of 10</span>
</pre></div>
<p>This action would return the top value and end up with an empty value stack:</p>
<p><a href="https://files.realpython.com/media/stacks_pop.85872baaa8c3.png" target="_blank"><img class="img-fluid " src="https://files.realpython.com/media/stacks_pop.85872baaa8c3.png" width="778" height="226" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_pop.85872baaa8c3.png&w=194&sig=887719d9b65d3ef671e2310aae15904e04acba28 194w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_pop.85872baaa8c3.png&w=389&sig=6950fcd6370022df61559d12b74c6b685a61e238 389w, https://files.realpython.com/media/stacks_pop.85872baaa8c3.png 778w" sizes="75vw" alt="POP()"/></a></p>
<p>If you were to add 2 values to the stack:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span><span class="n">a</span> <span class="o">=</span> <span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">10</span><span class="p">);</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">b</span> <span class="o">=</span> <span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">20</span><span class="p">);</span>
<span class="n">PUSH</span><span class="p">(</span><span class="n">a</span><span class="p">);</span>
<span class="n">PUSH</span><span class="p">(</span><span class="n">b</span><span class="p">);</span>
</pre></div>
<p>They would end up in the order in which they were added, so <code>a</code> would be pushed to the second position in the stack:</p>
<p><a href="https://files.realpython.com/media/stacks_pushpush.e052996db029.png" target="_blank"><img class="img-fluid " src="https://files.realpython.com/media/stacks_pushpush.e052996db029.png" width="783" height="226" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_pushpush.e052996db029.png&w=195&sig=47cc1b6b30f314f1589eb9c8188efcd3eb0ef79e 195w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_pushpush.e052996db029.png&w=391&sig=a0813bcc81d314c3f2f0eb146634ba187d447634 391w, https://files.realpython.com/media/stacks_pushpush.e052996db029.png 783w" sizes="75vw" alt="PUSH();PUSH()"/></a></p>
<p>If you were to fetch the top value in the stack, you would get a pointer to <code>b</code> because it is at the top:</p>
<p><a href="https://files.realpython.com/media/stacks_pop2.af5068718f92.png" target="_blank"><img class="img-fluid " src="https://files.realpython.com/media/stacks_pop2.af5068718f92.png" width="789" height="226" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_pop2.af5068718f92.png&w=197&sig=1856b32d5f95f58eb2ad60a549165f85bf2b4d7a 197w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_pop2.af5068718f92.png&w=394&sig=ce3ba993ee2fa8cb037bb84144c73f95450954ac 394w, https://files.realpython.com/media/stacks_pop2.af5068718f92.png 789w" sizes="75vw" alt="POP();"/></a></p>
<p>If you need to fetch the pointer to the top value in the stack without popping it, you can use the <code>PEEK(v)</code> operation, where <code>v</code> is the stack position:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span><span class="n">first</span> <span class="o">=</span> <span class="n">PEEK</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
</pre></div>
<p><code>0</code> represents the top of the stack, and <code>1</code> would be the second position:</p>
<p><a href="https://files.realpython.com/media/stacks_peek.b00bde86bc7b.png" target="_blank"><img class="img-fluid " src="https://files.realpython.com/media/stacks_peek.b00bde86bc7b.png" width="802" height="174" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_peek.b00bde86bc7b.png&w=200&sig=fac4fef886154a979f9abe7316e9468179c14c71 200w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_peek.b00bde86bc7b.png&w=401&sig=463aeb044f65a4c446a37c71cd86ae71564a2ec6 401w, https://files.realpython.com/media/stacks_peek.b00bde86bc7b.png 802w" sizes="75vw" alt="PEEK()"/></a></p>
<p>To clone the value at the top of the stack, you can use the <code>DUP_TOP()</code> macro or the <code>DUP_TOP</code> opcode:</p>
<div class="highlight c"><pre><span></span><span class="n">DUP_TOP</span><span class="p">();</span>
</pre></div>
<p>This action would copy the value at the top to form 2 pointers to the same object:</p>
<p><a href="https://files.realpython.com/media/stacks_duptop.9b2eafe67375.png" target="_blank"><img class="img-fluid " src="https://files.realpython.com/media/stacks_duptop.9b2eafe67375.png" width="799" height="177" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_duptop.9b2eafe67375.png&w=199&sig=3d40e1422f0aa5f3b50b8b174d02f0fe5df54364 199w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_duptop.9b2eafe67375.png&w=399&sig=11e2d5a69b3dd385c9201d7ab4eeb902f8fa20c9 399w, https://files.realpython.com/media/stacks_duptop.9b2eafe67375.png 799w" sizes="75vw" alt="DUP_TOP()"/></a></p>
<p>There is a rotation macro <code>ROT_TWO</code> that swaps the first and second values:</p>
<p><a href="https://files.realpython.com/media/stacks_rottwo.2990d9b3ecc7.png" target="_blank"><img class="img-fluid " src="https://files.realpython.com/media/stacks_rottwo.2990d9b3ecc7.png" width="803" height="175" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_rottwo.2990d9b3ecc7.png&w=200&sig=e3e88549e9801f463358ad1e8b5635ba56b77049 200w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/stacks_rottwo.2990d9b3ecc7.png&w=401&sig=d2c11f545336f13fa57bc1588f57b7e7f8dc049f 401w, https://files.realpython.com/media/stacks_rottwo.2990d9b3ecc7.png 803w" sizes="75vw" alt="ROT_TWO()"/></a></p>
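<p>The stack operations described above can be modelled in a few lines of Python. This is only a toy sketch: the <code>ValueStack</code> class and its method names are hypothetical, it stores plain Python references instead of <code>PyObject*</code> pointers, and it follows the numbering used in this section, where position 0 is the top:</p>

```python
class ValueStack:
    """Toy model of a value stack (hypothetical, not CPython's implementation)."""

    def __init__(self):
        self._items = []

    def push(self, value):
        # Add a reference to the top of the stack.
        self._items.append(value)

    def pop(self):
        # Remove and return the reference at the top of the stack.
        return self._items.pop()

    def peek(self, position=0):
        # Return a reference without removing it; 0 is the top in this model.
        return self._items[-1 - position]

    def dup_top(self):
        # Push a second reference to the same object that is on top.
        self._items.append(self._items[-1])

    def rot_two(self):
        # Swap the first and second values.
        self._items[-1], self._items[-2] = self._items[-2], self._items[-1]


a, b = object(), object()
stack = ValueStack()
stack.push(a)
stack.push(b)
top_before = stack.peek(0)      # b, because it was pushed last
stack.rot_two()
top_after_rot = stack.peek(0)   # a, after swapping the two values
stack.dup_top()
same_object = stack.peek(0) is stack.peek(1)
```

<p>Because <code>dup_top()</code> copies the reference and not the object, both of the top two positions point to the same object afterwards.</p>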
<p>Each of the opcodes has a predefined “stack effect,” calculated by the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L878"><code>stack_effect()</code></a> function inside <code>Python/compile.c</code>. This function returns the delta in the number of values inside the stack for each opcode.</p>
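<p>The standard library exposes the same calculation through <code>dis.stack_effect()</code>, so you can inspect the delta from Python. A small sketch, using two opcodes chosen here only as examples (the set of opcodes differs between CPython versions):</p>

```python
import dis

# LOAD_CONST pushes one value onto the stack; it takes an argument,
# so an (arbitrary) oparg of 0 is passed.
push_effect = dis.stack_effect(dis.opmap["LOAD_CONST"], 0)

# POP_TOP removes the value at the top of the stack and takes no argument.
pop_effect = dis.stack_effect(dis.opmap["POP_TOP"])

print(push_effect, pop_effect)
```

<p>A positive result means the opcode leaves more values on the stack than it found, and a negative result means it consumes values.</p>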
<h4 id="example-adding-an-item-to-a-list">Example: Adding an Item to a List</h4>
<p>In Python, when you create a list, the <code>.append()</code> method is available on the list object:</p>
<div class="highlight python"><pre><span></span><span class="n">my_list</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">my_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
</pre></div>
<p>Where <code>obj</code> is an object you want to append to the end of the list.</p>
<p>Two operations are involved here: <code>LOAD_FAST</code>, which loads the object <code>obj</code> to the top of the value stack from the list of <code>locals</code> in the frame, and <code>LIST_APPEND</code>, which adds the object.</p>
<p>Starting with <code>LOAD_FAST</code>, there are 5 steps:</p>
<ol>
<li>
<p>The pointer to <code>obj</code> is loaded from <code>GETLOCAL()</code>, where the variable to load is the operation argument. The list of variable pointers is stored in <code>fastlocals</code>, which is a copy of the PyFrame attribute <code>f_localsplus</code>. The operation argument is a number, pointing to the index in the <code>fastlocals</code> array pointer. This means that the loading of a local is simply a copy of the pointer instead of having to look up the variable name.</p>
</li>
<li>
<p>If the variable no longer exists, an unbound local variable error is raised.</p>
</li>
<li>
<p>The reference counter for <code>value</code> (in our case, <code>obj</code>) is increased by 1.</p>
</li>
<li>
<p>The pointer to <code>obj</code> is pushed to the top of the value stack.</p>
</li>
<li>
<p>The <code>FAST_DISPATCH</code> macro is called. If tracing is enabled, the loop goes over again (with all the tracing). If tracing is not enabled, a <code>goto</code> is called to <code>fast_next_opcode</code>, which jumps back to the top of the loop for the next instruction.</p>
</li>
</ol>
<div class="highlight c"><pre><span></span> <span class="p">...</span>
<span class="k">case</span> <span class="n">TARGET</span><span class="p">(</span><span class="n">LOAD_FAST</span><span class="p">)</span><span class="o">:</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">value</span> <span class="o">=</span> <span class="n">GETLOCAL</span><span class="p">(</span><span class="n">oparg</span><span class="p">);</span> <span class="c1">// 1.</span>
<span class="k">if</span> <span class="p">(</span><span class="n">value</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">format_exc_check_arg</span><span class="p">(</span>
<span class="n">PyExc_UnboundLocalError</span><span class="p">,</span>
<span class="n">UNBOUNDLOCAL_ERROR_MSG</span><span class="p">,</span>
<span class="n">PyTuple_GetItem</span><span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_varnames</span><span class="p">,</span> <span class="n">oparg</span><span class="p">));</span>
<span class="k">goto</span> <span class="n">error</span><span class="p">;</span> <span class="c1">// 2.</span>
<span class="p">}</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">value</span><span class="p">);</span> <span class="c1">// 3.</span>
<span class="n">PUSH</span><span class="p">(</span><span class="n">value</span><span class="p">);</span> <span class="c1">// 4.</span>
<span class="n">FAST_DISPATCH</span><span class="p">();</span> <span class="c1">// 5.</span>
<span class="p">}</span>
<span class="p">...</span>
</pre></div>
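<p>Steps 1 and 2 can be observed from Python. In this sketch, the function name <code>read_deleted_local</code> is made up for illustration. Deleting a local empties its slot, so the next load of that local finds no value and raises <code>UnboundLocalError</code>. The exact opcode used for the load and the wording of the error message vary between CPython versions:</p>

```python
def read_deleted_local():
    obj = "value"
    del obj  # the slot for obj in the frame's locals is now empty
    try:
        return obj  # loading the local finds no value in the slot
    except UnboundLocalError as exc:
        return type(exc).__name__


# Local variables are addressed by index; the names are kept in co_varnames.
local_names = read_deleted_local.__code__.co_varnames
result = read_deleted_local()
print(local_names, result)
```

<p>The index of <code>obj</code> in <code>co_varnames</code> is the same number that the operation argument carries, which is also what the C code passes to <code>PyTuple_GetItem()</code> when it builds the error message.</p>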
<p>Now the pointer to <code>obj</code> is at the top of the value stack. The next instruction <code>LIST_APPEND</code> is run.</p>
<p>Many of the bytecode operations reference the base types, like PyUnicode and PyNumber. For example, <code>LIST_APPEND</code> appends an object to the end of a list. To achieve this, it pops the pointer from the value stack with the <code>POP()</code> macro, which returns the pointer to the last object in the stack. The macro is a shortcut for:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span><span class="n">v</span> <span class="o">=</span> <span class="p">(</span><span class="o">*--</span><span class="n">stack_pointer</span><span class="p">);</span>
</pre></div>
<p>Now the pointer to <code>obj</code> is stored as <code>v</code>. The list pointer is loaded from <code>PEEK(oparg)</code>.</p>
<p>Then the C API for Python lists is called for <code>list</code> and <code>v</code>. The code for this is inside <code>Objects/listobject.c</code>, which we go into in the next chapter.</p>
<p>A call to <code>PREDICT</code> is made, which guesses that the next operation will be <code>JUMP_ABSOLUTE</code>. The <code>PREDICT</code> macro has compiler-generated <code>goto</code> statements for each of the potential operations’ <code>case</code> statements. This means the CPU can jump to that instruction and not have to go through the loop again:</p>
<div class="highlight c"><pre><span></span> <span class="p">...</span>
<span class="k">case</span> <span class="n">TARGET</span><span class="p">(</span><span class="n">LIST_APPEND</span><span class="p">)</span><span class="o">:</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">v</span> <span class="o">=</span> <span class="n">POP</span><span class="p">();</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">list</span> <span class="o">=</span> <span class="n">PEEK</span><span class="p">(</span><span class="n">oparg</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">err</span><span class="p">;</span>
<span class="hll"> <span class="n">err</span> <span class="o">=</span> <span class="n">PyList_Append</span><span class="p">(</span><span class="n">list</span><span class="p">,</span> <span class="n">v</span><span class="p">);</span>
</span> <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">v</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
<span class="n">PREDICT</span><span class="p">(</span><span class="n">JUMP_ABSOLUTE</span><span class="p">);</span>
<span class="n">DISPATCH</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">...</span>
</pre></div>
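<p>To see where <code>LIST_APPEND</code> shows up in compiled bytecode, you can use the <code>dis</code> module. In the CPython versions this sketch was written against, the opcode is emitted for list comprehensions, so that is what the hypothetical <code>build()</code> function below uses. The helper <code>opnames()</code> is also made up for this example, and it walks nested code objects because some versions compile a comprehension into its own code object:</p>

```python
import dis
import types


def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = [instruction.opname for instruction in dis.get_instructions(code)]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names.extend(opnames(const))
    return names


def build(items):
    # Each iteration appends one item to the list being built.
    return [item for item in items]


names = opnames(build.__code__)
print("LIST_APPEND" in names)
```

<p>Running <code>dis.dis(build)</code> prints the full listing if you want to see the surrounding instructions.</p>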
<div class="alert alert-primary" role="alert">
<p><strong>Opcode predictions:</strong>
Some opcodes tend to come in pairs thus making it possible to predict the second code when the first is run. For example, <code>COMPARE_OP</code> is often followed by <code>POP_JUMP_IF_FALSE</code> or <code>POP_JUMP_IF_TRUE</code>.</p>
<p>“Verifying the prediction costs a single high-speed test of a register variable against a constant. If the pairing was good, then the processor’s own internal branch predication has a high likelihood of success, resulting in a nearly zero-overhead transition to the next opcode. A successful prediction saves a trip through the eval-loop including its unpredictable switch-case branch. Combined with the processor’s internal branch prediction, a successful PREDICT has the effect of making the two opcodes run as if they were a single new opcode with the bodies combined.”</p>
<p>If collecting opcode statistics, you have two choices:</p>
<ol>
<li>Keep the predictions turned-on and interpret the results as if some opcodes had been combined</li>
<li>Turn off predictions so that the opcode frequency counter updates for both opcodes</li>
</ol>
<p>Opcode prediction is disabled with threaded code since the latter allows the CPU to record separate branch prediction information for each opcode.</p>
</div>
<p>Some of the operations, such as <code>CALL_FUNCTION</code>, <code>CALL_METHOD</code>, have an operation argument referencing another compiled function. In these cases, another frame is pushed to the frame stack in the thread, and the evaluation loop is run for that function until the function completes. Each time a new frame is created and pushed onto the stack, the value of the frame’s <code>f_back</code> is set to the current frame before the new one is created.</p>
<p>This nesting of frames is clear when you see a stack trace. Take this example script:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">function2</span><span class="p">():</span>
    <span class="k">raise</span> <span class="ne">RuntimeError</span>

<span class="k">def</span> <span class="nf">function1</span><span class="p">():</span>
    <span class="n">function2</span><span class="p">()</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">function1</span><span class="p">()</span>
</pre></div>
<p>Calling this on the command line will give you:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe example_stack.py
<span class="go">Traceback (most recent call last):</span>
<span class="go"> File "example_stack.py", line 8, in <module></span>
<span class="go"> function1()</span>
<span class="go"> File "example_stack.py", line 5, in function1</span>
<span class="go"> function2()</span>
<span class="go"> File "example_stack.py", line 2, in function2</span>
<span class="go"> raise RuntimeError</span>
<span class="go">RuntimeError</span>
</pre></div>
<p>In <code>traceback.py</code>, the <code>walk_stack()</code> function is used to walk the frames when printing tracebacks:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">walk_stack</span><span class="p">(</span><span class="n">f</span><span class="p">):</span>
    <span class="sd">"""Walk a stack yielding the frame and line number for each frame.</span>

<span class="sd">    This will follow f.f_back from the given frame. If no frame is given, the</span>
<span class="sd">    current stack is used. Usually used with StackSummary.extract.</span>
<span class="sd">    """</span>
    <span class="k">if</span> <span class="n">f</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
        <span class="n">f</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">_getframe</span><span class="p">()</span><span class="o">.</span><span class="n">f_back</span><span class="o">.</span><span class="n">f_back</span>
    <span class="k">while</span> <span class="n">f</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
        <span class="k">yield</span> <span class="n">f</span><span class="p">,</span> <span class="n">f</span><span class="o">.</span><span class="n">f_lineno</span>
        <span class="n">f</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">f_back</span>
</pre></div>
<p>Here you can see that when no frame is given, the current frame is fetched by calling <code>sys._getframe()</code>, and its parent’s parent is used as the starting frame. You don’t want to see the call to <code>walk_stack()</code> or <code>print_trace()</code> in the traceback, so those function frames are skipped.</p>
<p>Then the <code>f_back</code> pointer is followed to the top.</p>
<p><code>sys._getframe()</code> is the Python API to get the <code>frame</code> attribute of the current thread.</p>
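<p>You can follow the same <code>f_back</code> chain yourself. The sketch below reuses the function names from the example script and adds a hypothetical helper, <code>frame_names()</code>, that records the name of each frame’s code object. Note that <code>sys._getframe()</code> is a CPython implementation detail and may not exist in other Python implementations:</p>

```python
import sys


def frame_names():
    """Return the function names on the frame stack, innermost first."""
    names = []
    frame = sys._getframe()  # the frame of frame_names() itself
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # move to the caller's frame
    return names


def function2():
    return frame_names()


def function1():
    return function2()


names = function1()
print(names[:3])
```

<p>The first three entries are the helper and its two callers. The remaining entries depend on how the script was started.</p>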
<p>Here is how that frame stack would look visually, with 3 frames each with its code object and a thread state pointing to the current frame:</p>
<p><a href="https://files.realpython.com/media/Frame_Stack_Example.20728854763c.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Frame_Stack_Example.20728854763c.png" width="841" height="1556" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Frame_Stack_Example.20728854763c.png&w=210&sig=bb2e7a72a4363554566003b8412f81c843a8edc2 210w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Frame_Stack_Example.20728854763c.png&w=420&sig=17eddfb3f22ccab1056545af3055470669c4ce3f 420w, https://files.realpython.com/media/Frame_Stack_Example.20728854763c.png 841w" sizes="75vw" alt="Example frame stack"/></a></p>
<h3 id="conclusion_3">Conclusion</h3>
<p>In this Part, you explored the most complex element of CPython: the compiler. The original author of Python, Guido van Rossum, made the statement that CPython’s compiler should be “dumb” so that people can understand it.</p>
<p>By breaking down the compilation process into small, logical steps, it is far easier to understand.</p>
<p>In the next chapter, we connect the compilation process with the basis of all Python code, the <code>object</code>.</p>
<h2 h1="h1" id="part-4-objects-in-cpython">Part 4: Objects in CPython</h2>
<p>CPython comes with a collection of basic types like strings, lists, tuples, dictionaries, and objects.</p>
<p>All of these types are built-in. You don’t need to import any libraries, even from the standard library. Also, the instantiation of these built-in types has some handy shortcuts.</p>
<p>For example, to create a new list, you can call:</p>
<div class="highlight python"><pre><span></span><span class="n">lst</span> <span class="o">=</span> <span class="nb">list</span><span class="p">()</span>
</pre></div>
<p>Or, you can use square brackets:</p>
<div class="highlight python"><pre><span></span><span class="n">lst</span> <span class="o">=</span> <span class="p">[]</span>
</pre></div>
<p>Strings can be instantiated from a string-literal by using either double or single quotes. We explored the grammar definitions earlier that cause the compiler to interpret double quotes as a string literal. </p>
<p>All types in Python inherit from <code>object</code>, a built-in base type. Even strings, tuples, and lists inherit from <code>object</code>. During the walk-through of the C code, you have read lots of references to <code>PyObject*</code>, the C-API structure for an <code>object</code>.</p>
<p>Because C is not object-oriented <a href="https://realpython.com/python3-object-oriented-programming/">like Python</a>, objects in C don’t inherit from one another. <code>PyObject</code> is the data structure for the beginning of the Python object’s memory.</p>
<p>Much of the base object API is declared in <code>Objects/object.c</code>, like the function <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L505"><code>PyObject_Repr</code></a>, which the built-in <code>repr()</code> function calls. You will also find <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L810"><code>PyObject_Hash()</code></a> and other APIs.</p>
<p>All of these functions can be overridden in a custom object by implementing “dunder” methods on a Python object:</p>
<div class="highlight python"><pre><span></span><span class="k">class</span> <span class="nc">MyObject</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>

    <span class="k">def</span> <span class="nf">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s2">"<</span><span class="si">{0}</span><span class="s2"> id=</span><span class="si">{1}</span><span class="s2">>"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
</pre></div>
<p>This code is implemented in <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L505"><code>PyObject_Repr()</code></a>, inside <code>Objects/object.c</code>. The type of the target object, <code>v</code>, is inferred through a call to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/object.h#L122"><code>Py_TYPE()</code></a>, and if the <code>tp_repr</code> field is set, then that function pointer is called.
If the <code>tp_repr</code> field is not set, i.e. the object doesn’t declare a custom <code>__repr__</code> method, then the default behavior is run, which is to return <code>"<%s object at %p>"</code> with the type name and the ID:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">PyObject_Repr</span><span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">res</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PyErr_CheckSignals</span><span class="p">())</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">...</span>
<span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">return</span> <span class="n">PyUnicode_FromString</span><span class="p">(</span><span class="s">"<NULL>"</span><span class="p">);</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="n">Py_TYPE</span><span class="p">(</span><span class="n">v</span><span class="p">)</span><span class="o">-></span><span class="n">tp_repr</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
</span> <span class="k">return</span> <span class="n">PyUnicode_FromFormat</span><span class="p">(</span><span class="s">"<%s object at %p>"</span><span class="p">,</span>
<span class="n">v</span><span class="o">-></span><span class="n">ob_type</span><span class="o">-></span><span class="n">tp_name</span><span class="p">,</span> <span class="n">v</span><span class="p">);</span>
<span class="p">...</span>
<span class="p">}</span>
</pre></div>
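<p>Both behaviors are easy to compare from Python. In this sketch, <code>Plain</code> and <code>Named</code> are hypothetical classes: the first one declares no <code>__repr__</code> and gets a default representation containing the type name and the address, while the second one supplies its own:</p>

```python
class Plain:
    """Declares no __repr__, so the inherited default is used."""


class Named:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # A custom __repr__ replaces the default representation.
        return "<Named name={0}>".format(self.name)


plain_repr = repr(Plain())
named_repr = repr(Named("example"))
print(plain_repr)
print(named_repr)
```

<p>The address in the default representation changes from run to run, so only the custom result is predictable.</p>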
<p>The <code>ob_type</code> field for a given <code>PyObject*</code> will point to the data structure <code>PyTypeObject</code>, defined in <code>Include/cpython/object.h</code>.
This data structure lists all the built-in functions as fields, along with the arguments they should receive.</p>
<p>Take <code>tp_repr</code> as an example:</p>
<div class="highlight c"><pre><span></span><span class="k">typedef</span> <span class="k">struct</span> <span class="n">_typeobject</span> <span class="p">{</span>
<span class="n">PyObject_VAR_HEAD</span>
<span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">tp_name</span><span class="p">;</span> <span class="cm">/* For printing, in format "<module>.<name>" */</span>
<span class="n">Py_ssize_t</span> <span class="n">tp_basicsize</span><span class="p">,</span> <span class="n">tp_itemsize</span><span class="p">;</span> <span class="cm">/* For allocation */</span>
<span class="cm">/* Methods to implement standard operations */</span>
<span class="p">...</span>
<span class="hll"> <span class="n">reprfunc</span> <span class="n">tp_repr</span><span class="p">;</span>
</span></pre></div>
<p>Where <code>reprfunc</code> is a <code>typedef</code> for <code>PyObject *(*reprfunc)(PyObject *);</code>, a function that takes 1 pointer to <code>PyObject</code> (<code>self</code>).</p>
<p>Some of the dunder APIs are optional, because they only apply to certain types, like numbers:</p>
<div class="highlight c"><pre><span></span> <span class="cm">/* Method suites for standard classes */</span>
<span class="n">PyNumberMethods</span> <span class="o">*</span><span class="n">tp_as_number</span><span class="p">;</span>
<span class="n">PySequenceMethods</span> <span class="o">*</span><span class="n">tp_as_sequence</span><span class="p">;</span>
<span class="n">PyMappingMethods</span> <span class="o">*</span><span class="n">tp_as_mapping</span><span class="p">;</span>
</pre></div>
<p>A sequence, like a list, would implement the following methods:</p>
<div class="highlight c"><pre><span></span><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
<span class="n">lenfunc</span> <span class="n">sq_length</span><span class="p">;</span> <span class="c1">// len(v)</span>
<span class="n">binaryfunc</span> <span class="n">sq_concat</span><span class="p">;</span> <span class="c1">// v + x</span>
<span class="n">ssizeargfunc</span> <span class="n">sq_repeat</span><span class="p">;</span> <span class="c1">// v * n</span>
<span class="n">ssizeargfunc</span> <span class="n">sq_item</span><span class="p">;</span> <span class="c1">// v[x]</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">was_sq_slice</span><span class="p">;</span> <span class="c1">// v[x:y:z]</span>
<span class="n">ssizeobjargproc</span> <span class="n">sq_ass_item</span><span class="p">;</span> <span class="c1">// v[x] = z</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">was_sq_ass_slice</span><span class="p">;</span> <span class="c1">// v[x:y] = z</span>
<span class="n">objobjproc</span> <span class="n">sq_contains</span><span class="p">;</span> <span class="c1">// x in v</span>
<span class="n">binaryfunc</span> <span class="n">sq_inplace_concat</span><span class="p">;</span>
<span class="n">ssizeargfunc</span> <span class="n">sq_inplace_repeat</span><span class="p">;</span>
<span class="p">}</span> <span class="n">PySequenceMethods</span><span class="p">;</span>
</pre></div>
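<p>In Python code, the same operations are provided by dunder methods. The hypothetical <code>Deck</code> class below is a minimal sketch of a sequence-like object. The comments show the operation each method handles, which roughly corresponds to the slots listed above. How CPython maps each dunder method onto a C slot for a class written in Python is not shown here:</p>

```python
class Deck:
    """A minimal sequence-like container (hypothetical example)."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):                 # len(v)
        return len(self._cards)

    def __getitem__(self, index):      # v[x]
        return self._cards[index]

    def __contains__(self, card):      # x in v
        return card in self._cards

    def __add__(self, other):          # v + x
        return Deck(self._cards + list(other))

    def __mul__(self, count):          # v * n
        return Deck(self._cards * count)


deck = Deck(["J", "Q", "K"])
print(len(deck), deck[0], "K" in deck, len(deck + deck), len(deck * 2))
```

<p>Only the operations the class needs have to be implemented. Any operation without a method raises a <code>TypeError</code> when it is used.</p>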
<p>All of these built-in functions are called the <a href="https://docs.python.org/3/reference/datamodel.html">Python Data Model</a>. One of the great resources for the Python Data Model is <a href="https://www.oreilly.com/library/view/fluent-python/9781491946237/">“Fluent Python” by Luciano Ramalho</a>.</p>
<h3 id="base-object-type">Base Object Type</h3>
<p>In <code>Objects/object.c</code>, the base implementation of <code>object</code> type is written as pure C code. There are some concrete implementations of basic logic, like shallow comparisons.</p>
<p>Not all methods in a Python object are part of the Data Model. A Python object can also contain attributes (either class or instance attributes) and other methods.</p>
<p>A simple way to think of a Python object is as consisting of 2 things:</p>
<ol>
<li>The core data model, with pointers to compiled functions</li>
<li>A dictionary with any custom attributes and methods</li>
</ol>
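<p>The second part is visible from Python through <code>__dict__</code>. In this sketch, the <code>Point</code> class and its <code>norm1()</code> method are made up for illustration. Instance attributes live in the instance’s dictionary, while methods defined on the class live in the dictionary of the type:</p>

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


point = Point(1, -2)
point.label = "example"  # custom attributes can be added at runtime

instance_attributes = dict(point.__dict__)
method_on_type = "norm1" in type(point).__dict__
method_on_instance = "norm1" in point.__dict__
print(instance_attributes, method_on_type, method_on_instance)
```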
<p>The core data model is defined in the <code>PyTypeObject</code>, and the functions are defined in:</p>
<ul>
<li><code>Objects/object.c</code> for the built-in methods</li>
<li><code>Objects/boolobject.c</code> for the <code>bool</code> type</li>
<li><code>Objects/bytearrayobject.c</code> for the <code>bytearray</code> type</li>
<li><code>Objects/bytesobject.c</code> for the <code>bytes</code> type</li>
<li><code>Objects/cellobject.c</code> for the <code>cell</code> type</li>
<li><code>Objects/classobject.c</code> for the abstract <code>class</code> type, used in meta-programming</li>
<li><code>Objects/codeobject.c</code> used for the built-in <code>code</code> object type</li>
<li><code>Objects/complexobject.c</code> for a complex numeric type</li>
<li><code>Objects/iterobject.c</code> for an iterator</li>
<li><code>Objects/listobject.c</code> for the <code>list</code> type</li>
<li><code>Objects/longobject.c</code> for the <code>long</code> numeric type</li>
<li><code>Objects/memoryobject.c</code> for the base memory type</li>
<li><code>Objects/methodobject.c</code> for the class method type</li>
<li><code>Objects/moduleobject.c</code> for a module type</li>
<li><code>Objects/namespaceobject.c</code> for a namespace type</li>
<li><code>Objects/odictobject.c</code> for an ordered dictionary type</li>
<li><code>Objects/rangeobject.c</code> for a range generator</li>
<li><code>Objects/setobject.c</code> for a <code>set</code> type</li>
<li><code>Objects/sliceobject.c</code> for a slice reference type</li>
<li><code>Objects/structseq.c</code> for struct sequence types, such as <code>sys.version_info</code></li>
<li><code>Objects/tupleobject.c</code> for a <code>tuple</code> type</li>
<li><code>Objects/typeobject.c</code> for a <code>type</code> type</li>
<li><code>Objects/unicodeobject.c</code> for a <code>str</code> type</li>
<li><code>Objects/weakrefobject.c</code> for a <a href="https://docs.python.org/3/library/weakref.html"><code>weakref</code> object</a></li>
</ul>
<p>We’re going to dive into 3 of these types: </p>
<ol>
<li>Booleans </li>
<li>Integers</li>
<li>Generators</li>
</ol>
<p>Booleans and Integers have a lot in common, so we’ll cover those first.</p>
<h3 id="the-bool-and-long-integer-type">The Bool and Long Integer Type</h3>
<p>The <code>bool</code> type is the most straightforward implementation of the built-in types. It inherits from <code>long</code> and has the predefined constants, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/boolobject.h#L22"><code>Py_True</code></a> and <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/boolobject.h#L21"><code>Py_False</code></a>. These constants are immutable instances, created on the instantiation of the Python interpreter.</p>
<p>Inside <code>Objects/boolobject.c</code>, you can see the helper function to create a <code>bool</code> instance from a number:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span><span class="nf">PyBool_FromLong</span><span class="p">(</span><span class="kt">long</span> <span class="n">ok</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">result</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ok</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">Py_True</span><span class="p">;</span>
<span class="k">else</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">Py_False</span><span class="p">;</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>This function uses the C evaluation of a numeric type to assign <code>Py_True</code> or <code>Py_False</code> to a result and increment the reference counters.</p>
<p>The numeric functions for <code>and</code>, <code>xor</code>, and <code>or</code> are implemented, but addition, subtraction, and division are left to the base long type, since it would make no sense to define them specially for two boolean values.</p>
<p>The implementation of <code>and</code> for a <code>bool</code> value checks whether <code>a</code> and <code>b</code> are both booleans. If they are, it compares their references to <code>Py_True</code>. Otherwise, they are treated as numbers, and the <code>and</code> operation of the long type is run on the two values:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">bool_and</span><span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">b</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">PyBool_Check</span><span class="p">(</span><span class="n">a</span><span class="p">)</span> <span class="o">||</span> <span class="o">!</span><span class="n">PyBool_Check</span><span class="p">(</span><span class="n">b</span><span class="p">))</span>
<span class="k">return</span> <span class="n">PyLong_Type</span><span class="p">.</span><span class="n">tp_as_number</span><span class="o">-></span><span class="n">nb_and</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
<span class="k">return</span> <span class="n">PyBool_FromLong</span><span class="p">((</span><span class="n">a</span> <span class="o">==</span> <span class="n">Py_True</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">b</span> <span class="o">==</span> <span class="n">Py_True</span><span class="p">));</span>
<span class="p">}</span>
</pre></div>
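<p>The same fallback is visible from Python. The short sketch below (the variable names are only illustrative) combines two booleans, a boolean with a plain integer, and adds two booleans, then prints the type of each result:</p>

```python
# Combine bools with bools, and bools with plain integers.
both_bools = True & False   # both operands are bools, so the result is a bool
mixed = True & 3            # one operand is not a bool, so the int operation runs
summed = True + True        # addition comes from the integer type

for label, value in [
    ("bool & bool", both_bools),
    ("bool & int", mixed),
    ("bool + bool", summed),
]:
    print(f"{label}: {value!r} ({type(value).__name__})")
```

<p>Only the first expression stays a <code>bool</code>. The other two produce ordinary integers.</p>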
<p>The <code>long</code> type is a bit more complex, as the memory requirements are expansive. In the transition from Python 2 to 3, CPython dropped support for the <code>int</code> type and instead used the <code>long</code> type as the primary integer type. Python’s <code>long</code> type is quite special in that it can store a variable-length number. The maximum length is set in the compiled binary.</p>
<p>The data structure of a Python <code>long</code> consists of the <code>PyObject</code> header and a list of digits. The list of digits, <code>ob_digit</code>, is declared with a length of one digit, but it is expanded to a longer length when the object is initialized:</p>
<div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">_longobject</span> <span class="p">{</span>
<span class="n">PyObject_VAR_HEAD</span>
<span class="n">digit</span> <span class="n">ob_digit</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
<span class="p">};</span>
</pre></div>
<p>Memory is allocated to a new <code>long</code> through <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/longobject.c#L262"><code>_PyLong_New()</code></a>. This function takes a fixed length and makes sure it is smaller than <code>MAX_LONG_DIGITS</code>. Then it reallocates the memory for <code>ob_digit</code> to match the length.</p>
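<p>You can get a rough feel for this variable-length storage from Python. The sketch below assumes CPython, where <code>sys.int_info.bits_per_digit</code> reports how many bits each entry of <code>ob_digit</code> holds. The exact byte counts reported by <code>sys.getsizeof()</code> differ between versions and platforms, so only their growth is meaningful here:</p>

```python
import sys

# Number of bits stored in each digit (30 on most 64-bit builds, otherwise 15).
bits_per_digit = sys.int_info.bits_per_digit

# Integers that need one, two, and three digits respectively.
samples = (1, 2 ** bits_per_digit, 2 ** (2 * bits_per_digit))
sizes = [sys.getsizeof(n) for n in samples]

for n, size in zip(samples, sizes):
    print(f"{n}: {size} bytes")
```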
<p>To convert a C <code>long</code> type to a Python <code>long</code> type, the <code>long</code> is converted to a list of digits, the memory for the Python <code>long</code> is assigned, and then each of the digits is set.
Because <code>long</code> is initialized with <code>ob_digit</code> already being at a length of 1, if the absolute value fits in a single digit (it is less than <code>2 ** PyLong_SHIFT</code>), then the value is set on a single-digit object without looping to count the digits:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">PyLong_FromLong</span><span class="p">(</span><span class="kt">long</span> <span class="n">ival</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyLongObject</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">abs_ival</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">t</span><span class="p">;</span> <span class="cm">/* unsigned so >> doesn't propagate sign bit */</span>
<span class="kt">int</span> <span class="n">ndigits</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">sign</span><span class="p">;</span>
<span class="n">CHECK_SMALL_INT</span><span class="p">(</span><span class="n">ival</span><span class="p">);</span>
<span class="p">...</span>
<span class="cm">/* Fast path for single-digit ints */</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">abs_ival</span> <span class="o">>></span> <span class="n">PyLong_SHIFT</span><span class="p">))</span> <span class="p">{</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">_PyLong_New</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Py_SIZE</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="o">=</span> <span class="n">sign</span><span class="p">;</span>
<span class="n">v</span><span class="o">-></span><span class="n">ob_digit</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">Py_SAFE_DOWNCAST</span><span class="p">(</span>
<span class="n">abs_ival</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span><span class="p">,</span> <span class="n">digit</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="p">)</span><span class="n">v</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="cm">/* Larger numbers: loop to determine number of digits */</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">abs_ival</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="p">{</span>
<span class="o">++</span><span class="n">ndigits</span><span class="p">;</span>
<span class="n">t</span> <span class="o">>>=</span> <span class="n">PyLong_SHIFT</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">_PyLong_New</span><span class="p">(</span><span class="n">ndigits</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">digit</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">v</span><span class="o">-></span><span class="n">ob_digit</span><span class="p">;</span>
<span class="n">Py_SIZE</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="o">=</span> <span class="n">ndigits</span><span class="o">*</span><span class="n">sign</span><span class="p">;</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">abs_ival</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="p">{</span>
<span class="o">*</span><span class="n">p</span><span class="o">++</span> <span class="o">=</span> <span class="n">Py_SAFE_DOWNCAST</span><span class="p">(</span>
<span class="n">t</span> <span class="o">&</span> <span class="n">PyLong_MASK</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span><span class="p">,</span> <span class="n">digit</span><span class="p">);</span>
<span class="n">t</span> <span class="o">>>=</span> <span class="n">PyLong_SHIFT</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="p">)</span><span class="n">v</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
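<p>The digit loop at the end of <code>PyLong_FromLong()</code> can be mirrored in a few lines of Python. This is only a sketch: <code>split_digits()</code> and <code>join_digits()</code> are hypothetical helpers, and <code>SHIFT</code> and <code>MASK</code> stand in for <code>PyLong_SHIFT</code> and <code>PyLong_MASK</code>:</p>

```python
import sys

SHIFT = sys.int_info.bits_per_digit  # stand-in for PyLong_SHIFT
MASK = (1 << SHIFT) - 1              # stand-in for PyLong_MASK


def split_digits(value):
    """Return (sign, digits), with the least significant digit first."""
    sign = -1 if value < 0 else 1
    t = abs(value)
    digits = []
    while t:
        digits.append(t & MASK)  # keep the lowest SHIFT bits
        t >>= SHIFT              # move on to the next digit
    return sign, digits


def join_digits(sign, digits):
    """Rebuild the integer from its sign and digits."""
    return sign * sum(d << (SHIFT * i) for i, d in enumerate(digits))


sign, digits = split_digits(-(10 ** 30))
print(sign, digits)
print(join_digits(sign, digits))
```

<p>Any value below <code>2 ** SHIFT</code> produces a single digit, which corresponds to the fast path in the C code.</p>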
<p>To convert a <a href="https://en.wikipedia.org/wiki/Double-precision_floating-point_format">double-precision floating point</a> number to a Python <code>long</code>, <code>PyLong_FromDouble()</code> does the math for you:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">PyLong_FromDouble</span><span class="p">(</span><span class="kt">double</span> <span class="n">dval</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyLongObject</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">frac</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">ndig</span><span class="p">,</span> <span class="n">expo</span><span class="p">,</span> <span class="n">neg</span><span class="p">;</span>
<span class="n">neg</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">Py_IS_INFINITY</span><span class="p">(</span><span class="n">dval</span><span class="p">))</span> <span class="p">{</span>
<span class="n">PyErr_SetString</span><span class="p">(</span><span class="n">PyExc_OverflowError</span><span class="p">,</span>
<span class="s">"cannot convert float infinity to integer"</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">Py_IS_NAN</span><span class="p">(</span><span class="n">dval</span><span class="p">))</span> <span class="p">{</span>
<span class="n">PyErr_SetString</span><span class="p">(</span><span class="n">PyExc_ValueError</span><span class="p">,</span>
<span class="s">"cannot convert float NaN to integer"</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">dval</span> <span class="o"><</span> <span class="mf">0.0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">neg</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">dval</span> <span class="o">=</span> <span class="o">-</span><span class="n">dval</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">frac</span> <span class="o">=</span> <span class="n">frexp</span><span class="p">(</span><span class="n">dval</span><span class="p">,</span> <span class="o">&</span><span class="n">expo</span><span class="p">);</span> <span class="cm">/* dval = frac*2**expo; 0.0 <= frac < 1.0 */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">expo</span> <span class="o"><=</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">return</span> <span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">0L</span><span class="p">);</span>
<span class="n">ndig</span> <span class="o">=</span> <span class="p">(</span><span class="n">expo</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="n">PyLong_SHIFT</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="cm">/* Number of 'digits' in result */</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">_PyLong_New</span><span class="p">(</span><span class="n">ndig</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">frac</span> <span class="o">=</span> <span class="n">ldexp</span><span class="p">(</span><span class="n">frac</span><span class="p">,</span> <span class="p">(</span><span class="n">expo</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="n">PyLong_SHIFT</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="n">ndig</span><span class="p">;</span> <span class="o">--</span><span class="n">i</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">;</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">digit</span> <span class="n">bits</span> <span class="o">=</span> <span class="p">(</span><span class="n">digit</span><span class="p">)</span><span class="n">frac</span><span class="p">;</span>
<span class="n">v</span><span class="o">-></span><span class="n">ob_digit</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">bits</span><span class="p">;</span>
<span class="n">frac</span> <span class="o">=</span> <span class="n">frac</span> <span class="o">-</span> <span class="p">(</span><span class="kt">double</span><span class="p">)</span><span class="n">bits</span><span class="p">;</span>
<span class="n">frac</span> <span class="o">=</span> <span class="n">ldexp</span><span class="p">(</span><span class="n">frac</span><span class="p">,</span> <span class="n">PyLong_SHIFT</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">neg</span><span class="p">)</span>
<span class="n">Py_SIZE</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="n">Py_SIZE</span><span class="p">(</span><span class="n">v</span><span class="p">));</span>
<span class="k">return</span> <span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="p">)</span><span class="n">v</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
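<p>The same arithmetic can be followed step by step in Python with <code>math.frexp()</code> and <code>math.ldexp()</code>. The sketch below is an illustration rather than the real implementation: <code>digits_from_double()</code> and <code>to_int()</code> are hypothetical helpers, and <code>SHIFT</code> is assumed to be 30:</p>

```python
import math

SHIFT = 30  # assumed digit width; stands in for PyLong_SHIFT


def digits_from_double(dval):
    """Return (sign, digits), with the least significant digit first."""
    if math.isinf(dval):
        raise OverflowError("cannot convert float infinity to integer")
    if math.isnan(dval):
        raise ValueError("cannot convert float NaN to integer")
    sign = 1
    if dval < 0.0:
        sign, dval = -1, -dval
    frac, expo = math.frexp(dval)  # dval == frac * 2**expo
    if expo <= 0:
        return 0, []               # magnitude below 1 truncates to zero
    ndig = (expo - 1) // SHIFT + 1
    digits = [0] * ndig
    # Scale so that the integer part of frac is the most significant digit.
    frac = math.ldexp(frac, (expo - 1) % SHIFT + 1)
    for i in range(ndig - 1, -1, -1):
        bits = int(frac)
        digits[i] = bits
        frac = math.ldexp(frac - bits, SHIFT)  # shift in the next digit
    return sign, digits


def to_int(sign, digits):
    """Rebuild the integer from its sign and digits."""
    return sign * sum(d << (SHIFT * i) for i, d in enumerate(digits))


print(to_int(*digits_from_double(1e20)) == int(1e20))
```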
<p>The remaining functions in <code>longobject.c</code> are utilities, such as converting a Unicode string into a number with <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/longobject.c#L2672"><code>PyLong_FromUnicodeObject()</code></a>.</p>
<h3 id="a-review-of-the-generator-type">A Review of the Generator Type</h3>
<p><a href="https://realpython.com/introduction-to-python-generators/">Python Generators</a> are functions that contain a <code>yield</code> statement and can be resumed repeatedly to generate further values.</p>
<p>Commonly they are used as a more memory efficient way of looping through values in a large block of data, like a file, a database or over a network. </p>
<p>Generator objects are returned in place of a value when <code>yield</code> is used instead of <code>return</code>. The generator object is created when the function containing the <code>yield</code> statement is called, and it is returned to the caller.</p>
<p>Let’s create a simple generator with a list of 4 constant values:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">example</span><span class="p">():</span>
<span class="gp">... </span> <span class="n">lst</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]</span>
<span class="gp">... </span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">lst</span><span class="p">:</span>
<span class="gp">... </span> <span class="k">yield</span> <span class="n">i</span>
<span class="gp">... </span>
<span class="gp">>>> </span><span class="n">gen</span> <span class="o">=</span> <span class="n">example</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">gen</span>
<span class="go"><generator object example at 0x100bcc480></span>
</pre></div>
<p>If you explore the contents of the generator object, you can see some of the fields starting with <code>gi_</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">dir</span><span class="p">(</span><span class="n">gen</span><span class="p">)</span>
<span class="go">[ ...</span>
<span class="go"> 'close', </span>
<span class="go"> 'gi_code', </span>
<span class="go"> 'gi_frame', </span>
<span class="go"> 'gi_running', </span>
<span class="go"> 'gi_yieldfrom', </span>
<span class="go"> 'send', </span>
<span class="go"> 'throw']</span>
</pre></div>
<p>The <code>PyGenObject</code> type is defined in <code>Include/genobject.h</code> and there are 3 flavors:</p>
<ol>
<li>Generator objects</li>
<li>Coroutine objects</li>
<li>Async generator objects</li>
</ol>
<p>All 3 share the same subset of fields used in generators, and have similar behaviors:</p>
<p><a href="https://files.realpython.com/media/generators.536b1404195a.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/generators.536b1404195a.png" width="612" height="264" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/generators.536b1404195a.png&w=153&sig=665718d32fbb9081b7983085eaeb6de7437ed6b6 153w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/generators.536b1404195a.png&w=306&sig=05fff378d8474cff821fbd585c9a57c5c5c51283 306w, https://files.realpython.com/media/generators.536b1404195a.png 612w" sizes="75vw" alt="Structure of generator types"/></a></p>
<p>Focusing first on generators, you can see the fields:</p>
<ul>
<li><code>gi_frame</code> linking to a <code>PyFrameObject</code> for the generator. Earlier, in the execution chapter, we explored the use of locals and globals inside a frame’s value stack. This is how generators remember the last value of local variables, since the frame is persistent between calls</li>
<li><code>gi_running</code> set to 1 while the generator is currently running, and 0 otherwise</li>
<li><code>gi_code</code> linking to a <code>PyCodeObject</code> with the compiled function that yielded the generator so that it can be called again</li>
<li><code>gi_weakreflist</code> linking to a list of weak references to the generator object</li>
<li><code>gi_name</code> as the name of the generator</li>
<li><code>gi_qualname</code> as the qualified name of the generator</li>
<li><code>gi_exc_state</code> as a tuple of exception data if the generator call raises an exception</li>
</ul>
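<p>Several of these fields can be inspected directly from Python. The sketch below redefines the <code>example()</code> generator so that it is self-contained, and uses <code>inspect.getgeneratorstate()</code> from the standard library to show how the state changes after the first value is produced:</p>

```python
import inspect


def example():
    lst = [1, 2, 3, 4]
    for i in lst:
        yield i


gen = example()
state_before = inspect.getgeneratorstate(gen)  # nothing has run yet
first = next(gen)
state_after = inspect.getgeneratorstate(gen)   # paused at the yield

# The frame is kept between calls, so the local variable i is still there.
saved_i = gen.gi_frame.f_locals["i"]
same_code = gen.gi_code is example.__code__

print(state_before, state_after)
print(first, saved_i, gen.gi_running, same_code)
```

<p>Because the frame survives between calls, the value of <code>i</code> remains available in <code>gi_frame.f_locals</code> while the generator is suspended.</p>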
<p>The coroutine and <a href="https://realpython.com/async-io-python/#other-features-async-for-and-async-generators-comprehensions">async generators</a> have the same fields, but prefixed with <code>cr_</code> and <code>ag_</code> respectively.</p>
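<p>A quick way to check these prefixes is to list the matching attribute names on a coroutine and an async generator. The functions <code>coro()</code> and <code>agen()</code> below are throwaway examples, and the exact set of attributes may vary slightly between Python versions:</p>

```python
async def coro():
    return 1


async def agen():
    yield 1


c = coro()
a = agen()

cr_fields = sorted(name for name in dir(c) if name.startswith("cr_"))
ag_fields = sorted(name for name in dir(a) if name.startswith("ag_"))

print(cr_fields)
print(ag_fields)

c.close()  # close the coroutine so Python does not warn that it was never awaited
```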
<p>If you call <code>__next__()</code> on the generator object, the next value is yielded until eventually a <code>StopIteration</code> is raised:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">gen</span><span class="o">.</span><span class="fm">__next__</span><span class="p">()</span>
<span class="go">1</span>
<span class="gp">>>> </span><span class="n">gen</span><span class="o">.</span><span class="fm">__next__</span><span class="p">()</span>
<span class="go">2</span>
<span class="gp">>>> </span><span class="n">gen</span><span class="o">.</span><span class="fm">__next__</span><span class="p">()</span>
<span class="go">3</span>
<span class="gp">>>> </span><span class="n">gen</span><span class="o">.</span><span class="fm">__next__</span><span class="p">()</span>
<span class="go">4</span>
<span class="gp">>>> </span><span class="n">gen</span><span class="o">.</span><span class="fm">__next__</span><span class="p">()</span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
<span class="gr">StopIteration</span>
</pre></div>
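<p>In everyday code you rarely call <code>__next__()</code> by hand. A <code>for</code> loop does roughly the following, which the hypothetical <code>consume()</code> helper below spells out using the built-in <code>next()</code>:</p>

```python
def example():
    lst = [1, 2, 3, 4]
    for i in lst:
        yield i


def consume(iterator):
    """Collect values until the iterator signals that it is exhausted."""
    values = []
    while True:
        try:
            values.append(next(iterator))  # calls iterator.__next__()
        except StopIteration:
            break
    return values


collected = consume(example())
print(collected)
```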
<p>Each time <code>__next__()</code> is called, the code object inside the generator’s <code>gi_code</code> field is resumed in the generator’s frame (<code>gi_frame</code>), and the yielded value is returned to the caller.</p>
<p>You can also see that <code>gi_code</code> is the compiled code object for the generator function by importing the <code>dis</code> module and disassembling the bytecode inside:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">gen</span> <span class="o">=</span> <span class="n">example</span><span class="p">()</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">dis</span>
<span class="gp">>>> </span><span class="n">dis</span><span class="o">.</span><span class="n">disco</span><span class="p">(</span><span class="n">gen</span><span class="o">.</span><span class="n">gi_code</span><span class="p">)</span>
<span class="go"> 2 0 LOAD_CONST 1 (1)</span>
<span class="go"> 2 LOAD_CONST 2 (2)</span>
<span class="go"> 4 LOAD_CONST 3 (3)</span>
<span class="go"> 6 LOAD_CONST 4 (4)</span>
<span class="go"> 8 BUILD_LIST 4</span>
<span class="go"> 10 STORE_FAST 0 (lst)</span>
<span class="go"> 3 12 SETUP_LOOP 18 (to 32)</span>
<span class="go"> 14 LOAD_FAST 0 (lst)</span>
<span class="go"> 16 GET_ITER</span>
<span class="go"> >> 18 FOR_ITER 10 (to 30)</span>
<span class="go"> 20 STORE_FAST 1 (i)</span>
<span class="go"> 4 22 LOAD_FAST 1 (i)</span>
<span class="go"> 24 YIELD_VALUE</span>
<span class="go"> 26 POP_TOP</span>
<span class="go"> 28 JUMP_ABSOLUTE 18</span>
<span class="go"> >> 30 POP_BLOCK</span>
<span class="go"> >> 32 LOAD_CONST 0 (None)</span>
<span class="go"> 34 RETURN_VALUE</span>
</pre></div>
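<p>The exact bytecode differs between Python versions, so your output may not match the listing above. A version-tolerant check, sketched below, is to look at the <code>CO_GENERATOR</code> flag on the code object and to filter the instructions for yield-related opcodes:</p>

```python
import dis
import inspect


def example():
    lst = [1, 2, 3, 4]
    for i in lst:
        yield i


code = example.__code__

# The compiler sets this flag on any function whose body contains yield.
is_generator_code = bool(code.co_flags & inspect.CO_GENERATOR)

# Opcode names vary by version, so match loosely on the name.
yield_ops = [ins.opname for ins in dis.get_instructions(code) if "YIELD" in ins.opname]

print(is_generator_code)
print(yield_ops)
```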
<p>Whenever <code>__next__()</code> is called on a generator object, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/genobject.c#L541"><code>gen_iternext()</code></a> is called with the generator instance, which immediately calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/genobject.c#L153"><code>gen_send_ex()</code></a> inside <code>Objects/genobject.c</code>.</p>
<p><code>gen_send_ex()</code> is the function that converts a generator object into the next yielded result. You’ll see many similarities with the way frames are constructed in <code>Python/ceval.c</code> from a code object as these functions have similar tasks.</p>
<p>The <code>gen_send_ex()</code> function is shared by generators, coroutines, and async generators and has the following steps:</p>
<ol>
<li>
<p>The current thread state is fetched</p>
</li>
<li>
<p>The frame object from the generator object is fetched</p>
</li>
<li>
<p>If the generator is already running when <code>__next__()</code> is called, raise a <code>ValueError</code></p>
</li>
<li>
<p>If the generator has no frame, or the top of the frame’s value stack is <code>NULL</code>, the generator is already exhausted:</p>
<ul>
<li>In the case of a coroutine, if the coroutine is not already marked as closing, a <code>RuntimeError</code> is raised</li>
<li>If this is an async generator, raise a <code>StopAsyncIteration</code></li>
<li>For a standard generator, a <code>StopIteration</code> is raised.</li>
</ul>
</li>
<li>
<p>If the last instruction in the frame (<code>f->f_lasti</code>) is still -1 because the generator, coroutine, or async generator has just been started, then a non-None value can’t be passed as an argument, so a <code>TypeError</code> is raised</p>
</li>
<li>
<p>Otherwise, the frame has already been started, and arguments are allowed. The value of the argument (or <code>None</code> if there is none) is pushed to the frame’s value stack</p>
</li>
<li>
<p>The <code>f_back</code> field of the frame is the caller to which return values are sent, so this is set to the current frame in the thread. This means that the return value is sent to the caller, not the creator of the generator</p>
</li>
<li>
<p>The generator is marked as running</p>
</li>
<li>
<p>The generator’s exception state is linked to the thread state’s current exception info, which is stored as its previous item</p>
</li>
<li>
<p>The thread state exception info is set to the address of the generator’s exception info. This means that if the caller enters a breakpoint around the execution of a generator, the stack trace goes through the generator and the offending code is clear</p>
</li>
<li>
<p>The frame inside the generator is executed within the <code>Python/ceval.c</code> main execution loop, and the value returned</p>
</li>
<li>
<p>The thread state last exception is reset to the value before the frame was called</p>
</li>
<li>
<p>The generator is marked as not running</p>
</li>
<li>
<p>The following cases then match the return value and any exceptions thrown by the call to the generator. Remember that generators should raise a <code>StopIteration</code> when they are exhausted, either manually, or by not yielding a value. Coroutines and async generators should not:</p>
<ul>
<li>If no result was returned from the frame, raise a <code>StopIteration</code> for generators and <code>StopAsyncIteration</code> for async generators</li>
<li>If a <code>StopIteration</code> was explicitly raised, but this is a coroutine or an async generator, raise a <code>RuntimeError</code> as this is not allowed</li>
<li>If a <code>StopAsyncIteration</code> was explicitly raised and this is an async generator, raise a <code>RuntimeError</code>, as this is not allowed</li>
</ul>
</li>
<li>
<p>Lastly, the result is returned back to the caller of <code>__next__()</code></p>
</li>
</ol>
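<p>Some of these checks can be triggered from Python. The sketch below uses two throwaway generators, <code>echo()</code> and <code>reentrant()</code>, to hit the just-started check, the normal send path, the already-running check, and exhaustion. The error messages are those of CPython and may be worded differently elsewhere:</p>

```python
def echo():
    received = yield "ready"
    yield f"got {received}"


gen = echo()

# Sending a non-None value before the first yield is rejected.
try:
    gen.send("too early")
except TypeError as exc:
    early_error = str(exc)

first = next(gen)            # runs up to the first yield
second = gen.send("hello")   # the sent value becomes the result of yield
finished = next(gen, "exhausted")  # default returned instead of StopIteration


def reentrant():
    # Resuming a generator from inside itself is not allowed.
    yield next(me)


me = reentrant()
try:
    next(me)
except ValueError as exc:
    running_error = str(exc)

print(early_error)
print(first, second, finished)
print(running_error)
```

<p>The C implementation that follows marks the corresponding steps with numbered comments.</p>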
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">gen_send_ex</span><span class="p">(</span><span class="n">PyGenObject</span> <span class="o">*</span><span class="n">gen</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">arg</span><span class="p">,</span> <span class="kt">int</span> <span class="n">exc</span><span class="p">,</span> <span class="kt">int</span> <span class="n">closing</span><span class="p">)</span>
<span class="p">{</span>
<span class="hll"> <span class="n">PyThreadState</span> <span class="o">*</span><span class="n">tstate</span> <span class="o">=</span> <span class="n">_PyThreadState_GET</span><span class="p">();</span> <span class="c1">// 1.</span>
</span><span class="hll"> <span class="n">PyFrameObject</span> <span class="o">*</span><span class="n">f</span> <span class="o">=</span> <span class="n">gen</span><span class="o">-></span><span class="n">gi_frame</span><span class="p">;</span> <span class="c1">// 2.</span>
</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">result</span><span class="p">;</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="n">gen</span><span class="o">-></span><span class="n">gi_running</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// 3.</span>
</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">msg</span> <span class="o">=</span> <span class="s">"generator already executing"</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PyCoro_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">))</span> <span class="p">{</span>
<span class="n">msg</span> <span class="o">=</span> <span class="s">"coroutine already executing"</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">PyAsyncGen_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">))</span> <span class="p">{</span>
<span class="n">msg</span> <span class="o">=</span> <span class="s">"async generator already executing"</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">PyErr_SetString</span><span class="p">(</span><span class="n">PyExc_ValueError</span><span class="p">,</span> <span class="n">msg</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="n">f</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">||</span> <span class="n">f</span><span class="o">-></span><span class="n">f_stacktop</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// 4.</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="n">PyCoro_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">)</span> <span class="o">&&</span> <span class="o">!</span><span class="n">closing</span><span class="p">)</span> <span class="p">{</span>
<span class="cm">/* `gen` is an exhausted coroutine: raise an error,</span>
<span class="cm"> except when called from gen_close(), which should</span>
<span class="cm"> always be a silent method. */</span>
<span class="hll"> <span class="n">PyErr_SetString</span><span class="p">(</span>
</span><span class="hll"> <span class="n">PyExc_RuntimeError</span><span class="p">,</span>
</span><span class="hll"> <span class="s">"cannot reuse already awaited coroutine"</span><span class="p">);</span> <span class="c1">// 4a.</span>
</span> <span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">arg</span> <span class="o">&&</span> <span class="o">!</span><span class="n">exc</span><span class="p">)</span> <span class="p">{</span>
<span class="cm">/* `gen` is an exhausted generator:</span>
<span class="cm"> only set exception if called from send(). */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PyAsyncGen_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">))</span> <span class="p">{</span>
<span class="hll"> <span class="n">PyErr_SetNone</span><span class="p">(</span><span class="n">PyExc_StopAsyncIteration</span><span class="p">);</span> <span class="c1">// 4b.</span>
</span> <span class="p">}</span>
<span class="k">else</span> <span class="p">{</span>
<span class="hll"> <span class="n">PyErr_SetNone</span><span class="p">(</span><span class="n">PyExc_StopIteration</span><span class="p">);</span> <span class="c1">// 4c.</span>
</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">f</span><span class="o">-></span><span class="n">f_lasti</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="n">arg</span> <span class="o">&&</span> <span class="n">arg</span> <span class="o">!=</span> <span class="n">Py_None</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// 5.</span>
</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">msg</span> <span class="o">=</span> <span class="s">"can't send non-None value to a "</span>
<span class="s">"just-started generator"</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PyCoro_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">))</span> <span class="p">{</span>
<span class="n">msg</span> <span class="o">=</span> <span class="n">NON_INIT_CORO_MSG</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">PyAsyncGen_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">))</span> <span class="p">{</span>
<span class="n">msg</span> <span class="o">=</span> <span class="s">"can't send non-None value to a "</span>
<span class="s">"just-started async generator"</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">PyErr_SetString</span><span class="p">(</span><span class="n">PyExc_TypeError</span><span class="p">,</span> <span class="n">msg</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="hll"> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="c1">// 6.</span>
</span> <span class="cm">/* Push arg onto the frame's value stack */</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">arg</span> <span class="o">?</span> <span class="nl">arg</span> <span class="p">:</span> <span class="n">Py_None</span><span class="p">;</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
<span class="o">*</span><span class="p">(</span><span class="n">f</span><span class="o">-></span><span class="n">f_stacktop</span><span class="o">++</span><span class="p">)</span> <span class="o">=</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
<span class="cm">/* Generators always return to their most recent caller, not</span>
<span class="cm"> * necessarily their creator. */</span>
<span class="n">Py_XINCREF</span><span class="p">(</span><span class="n">tstate</span><span class="o">-></span><span class="n">frame</span><span class="p">);</span>
<span class="n">assert</span><span class="p">(</span><span class="n">f</span><span class="o">-></span><span class="n">f_back</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="hll"> <span class="n">f</span><span class="o">-></span><span class="n">f_back</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">frame</span><span class="p">;</span> <span class="c1">// 7.</span>
</span>
<span class="hll"> <span class="n">gen</span><span class="o">-></span><span class="n">gi_running</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// 8.</span>
</span><span class="hll"> <span class="n">gen</span><span class="o">-></span><span class="n">gi_exc_state</span><span class="p">.</span><span class="n">previous_item</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_info</span><span class="p">;</span> <span class="c1">// 9.</span>
</span><span class="hll"> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_info</span> <span class="o">=</span> <span class="o">&</span><span class="n">gen</span><span class="o">-></span><span class="n">gi_exc_state</span><span class="p">;</span> <span class="c1">// 10.</span>
</span><span class="hll"> <span class="n">result</span> <span class="o">=</span> <span class="n">PyEval_EvalFrameEx</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">exc</span><span class="p">);</span> <span class="c1">// 11.</span>
</span><span class="hll"> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_info</span> <span class="o">=</span> <span class="n">gen</span><span class="o">-></span><span class="n">gi_exc_state</span><span class="p">.</span><span class="n">previous_item</span><span class="p">;</span> <span class="c1">// 12.</span>
</span> <span class="n">gen</span><span class="o">-></span><span class="n">gi_exc_state</span><span class="p">.</span><span class="n">previous_item</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="hll"> <span class="n">gen</span><span class="o">-></span><span class="n">gi_running</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// 13.</span>
</span>
<span class="cm">/* Don't keep the reference to f_back any longer than necessary. It</span>
<span class="cm"> * may keep a chain of frames alive or it could create a reference</span>
<span class="cm"> * cycle. */</span>
<span class="n">assert</span><span class="p">(</span><span class="n">f</span><span class="o">-></span><span class="n">f_back</span> <span class="o">==</span> <span class="n">tstate</span><span class="o">-></span><span class="n">frame</span><span class="p">);</span>
<span class="n">Py_CLEAR</span><span class="p">(</span><span class="n">f</span><span class="o">-></span><span class="n">f_back</span><span class="p">);</span>
<span class="cm">/* If the generator just returned (as opposed to yielding), signal</span>
<span class="cm"> * that the generator is exhausted. */</span>
<span class="hll"> <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">&&</span> <span class="n">f</span><span class="o">-></span><span class="n">f_stacktop</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// 14a.</span>
</span> <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">==</span> <span class="n">Py_None</span><span class="p">)</span> <span class="p">{</span>
<span class="cm">/* Delay exception instantiation if we can */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PyAsyncGen_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">))</span> <span class="p">{</span>
<span class="n">PyErr_SetNone</span><span class="p">(</span><span class="n">PyExc_StopAsyncIteration</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="p">{</span>
<span class="n">PyErr_SetNone</span><span class="p">(</span><span class="n">PyExc_StopIteration</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">else</span> <span class="p">{</span>
<span class="cm">/* Async generators cannot return anything but None */</span>
<span class="n">assert</span><span class="p">(</span><span class="o">!</span><span class="n">PyAsyncGen_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">));</span>
<span class="n">_PyGen_SetStopIterationValue</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">Py_CLEAR</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
<span class="p">}</span>
<span class="hll"> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">result</span> <span class="o">&&</span> <span class="n">PyErr_ExceptionMatches</span><span class="p">(</span><span class="n">PyExc_StopIteration</span><span class="p">))</span> <span class="p">{</span> <span class="c1">// 14b.</span>
</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">msg</span> <span class="o">=</span> <span class="s">"generator raised StopIteration"</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PyCoro_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">))</span> <span class="p">{</span>
<span class="n">msg</span> <span class="o">=</span> <span class="s">"coroutine raised StopIteration"</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="n">PyAsyncGen_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">)</span> <span class="p">{</span>
<span class="n">msg</span> <span class="o">=</span> <span class="s">"async generator raised StopIteration"</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">_PyErr_FormatFromCause</span><span class="p">(</span><span class="n">PyExc_RuntimeError</span><span class="p">,</span> <span class="s">"%s"</span><span class="p">,</span> <span class="n">msg</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">result</span> <span class="o">&&</span> <span class="n">PyAsyncGen_CheckExact</span><span class="p">(</span><span class="n">gen</span><span class="p">)</span> <span class="o">&&</span>
<span class="hll"> <span class="n">PyErr_ExceptionMatches</span><span class="p">(</span><span class="n">PyExc_StopAsyncIteration</span><span class="p">))</span> <span class="c1">// 14c.</span>
</span> <span class="p">{</span>
<span class="cm">/* code in `gen` raised a StopAsyncIteration error:</span>
<span class="cm"> raise a RuntimeError.</span>
<span class="cm"> */</span>
<span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">msg</span> <span class="o">=</span> <span class="s">"async generator raised StopAsyncIteration"</span><span class="p">;</span>
<span class="n">_PyErr_FormatFromCause</span><span class="p">(</span><span class="n">PyExc_RuntimeError</span><span class="p">,</span> <span class="s">"%s"</span><span class="p">,</span> <span class="n">msg</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="hll"> <span class="k">return</span> <span class="n">result</span><span class="p">;</span> <span class="c1">// 15.</span>
</span><span class="p">}</span>
</pre></div>
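<p>Several of the highlighted branches can be observed directly from Python. The following is a minimal sketch, not part of the CPython source: <code>echo()</code>, <code>leaky()</code>, and <code>observe_send_rules()</code> are hypothetical names chosen for illustration, and the behavior shown assumes Python 3.7 or later.</p>

```python
def echo():
    # The value passed to the second send() is bound to `received`.
    received = yield "ready"
    return received


def leaky():
    # A StopIteration raised inside a generator body is replaced by RuntimeError.
    raise StopIteration
    yield  # unreachable, but it makes this function a generator


def observe_send_rules():
    observed = {}

    gen = echo()
    try:
        gen.send("too early")  # step 5: non-None value, generator not started
    except TypeError as exc:
        observed["early_send"] = str(exc)

    # The failed send() did not start the generator, so it can still be primed.
    observed["first_yield"] = gen.send(None)  # equivalent to next(gen)

    try:
        gen.send("done")  # the generator returns instead of yielding
    except StopIteration as exc:
        observed["return_value"] = exc.value  # step 14a

    try:
        next(leaky())
    except RuntimeError as exc:
        observed["leaked_stop"] = str(exc)  # step 14b

    return observed


print(observe_send_rules())
```

<p>The returned dictionary should show the <code>TypeError</code> message from step 5, the first yielded value, the return value carried on <code>StopIteration</code>, and the <code>RuntimeError</code> message from step 14b.</p>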
<p>Recall that code objects are evaluated whenever a function or module is called. That evaluation has a special case for generators, coroutines, and async generators in <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L4045"><code>_PyEval_EvalCodeWithName()</code></a>. This function checks for the <code>CO_GENERATOR</code>, <code>CO_COROUTINE</code>, and <code>CO_ASYNC_GENERATOR</code> flags on the code object.</p>
<p>Depending on the flag, a new coroutine is created using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/genobject.c#L1152"><code>PyCoro_New()</code></a>, a new async generator with <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/genobject.c#L1428"><code>PyAsyncGen_New()</code></a>, or a new generator with <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/genobject.c#L811"><code>PyGen_NewWithQualName()</code></a>. These objects are returned early instead of the result of an evaluated frame, which is why you get a generator object after calling a function that contains a <code>yield</code> statement:</p>
<div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">_PyEval_EvalCodeWithName</span><span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="n">_co</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span> <span class="p">...</span>
<span class="p">...</span>
<span class="cm">/* Handle generator/coroutine/asynchronous generator */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_flags</span> <span class="o">&</span> <span class="p">(</span><span class="n">CO_GENERATOR</span> <span class="o">|</span> <span class="n">CO_COROUTINE</span> <span class="o">|</span> <span class="n">CO_ASYNC_GENERATOR</span><span class="p">))</span> <span class="p">{</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">gen</span><span class="p">;</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">coro_wrapper</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">coroutine_wrapper</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">is_coro</span> <span class="o">=</span> <span class="n">co</span><span class="o">-></span><span class="n">co_flags</span> <span class="o">&</span> <span class="n">CO_COROUTINE</span><span class="p">;</span>
<span class="p">...</span>
<span class="cm">/* Create a new generator that owns the ready to run frame</span>
<span class="cm"> * and return that as the value. */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">is_coro</span><span class="p">)</span> <span class="p">{</span>
<span class="n">gen</span> <span class="o">=</span> <span class="n">PyCoro_New</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">qualname</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_flags</span> <span class="o">&</span> <span class="n">CO_ASYNC_GENERATOR</span><span class="p">)</span> <span class="p">{</span>
<span class="n">gen</span> <span class="o">=</span> <span class="n">PyAsyncGen_New</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">qualname</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">gen</span> <span class="o">=</span> <span class="n">PyGen_NewWithQualName</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">qualname</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="k">return</span> <span class="n">gen</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">...</span>
</pre></div>
<p>The flags in the code object were injected by the compiler after traversing the AST and seeing <code>yield</code> or <code>yield from</code> expressions, or an <code>async def</code> function definition.</p>
<p><code>PyGen_NewWithQualName()</code> calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/genobject.c#L778"><code>gen_new_with_qualname()</code></a> with the generated frame. That function creates the <code>PyGenObject</code>, initializes its exception state to <code>NULL</code> values, and stores a reference to the compiled code object:</p>
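<p>You can check these flags from Python, since the <code>inspect</code> module exposes them as constants. This is a small sketch: the three sample functions and the <code>generator_flags()</code> helper are hypothetical names used only for illustration.</p>

```python
import inspect

body_runs = []


def gen_func():
    body_runs.append("started")
    yield 1


async def coro_func():
    return 1


async def agen_func():
    yield 1


def generator_flags(func):
    """Report which of the three flags are set on func's code object."""
    flags = func.__code__.co_flags
    return {
        "CO_GENERATOR": bool(flags & inspect.CO_GENERATOR),
        "CO_COROUTINE": bool(flags & inspect.CO_COROUTINE),
        "CO_ASYNC_GENERATOR": bool(flags & inspect.CO_ASYNC_GENERATOR),
    }


for func in (gen_func, coro_func, agen_func):
    print(func.__name__, generator_flags(func))

gen = gen_func()  # returns a generator object; the body has not run yet
print(type(gen).__name__, body_runs)
next(gen)  # only now does the body start executing
print(body_runs)
```

<p>Each function should report exactly one flag, and <code>body_runs</code> should stay empty until <code>next()</code> is called, which matches the early return in the C code above.</p>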
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">gen_new_with_qualname</span><span class="p">(</span><span class="n">PyTypeObject</span> <span class="o">*</span><span class="n">type</span><span class="p">,</span> <span class="n">PyFrameObject</span> <span class="o">*</span><span class="n">f</span><span class="p">,</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">name</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">qualname</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyGenObject</span> <span class="o">*</span><span class="n">gen</span> <span class="o">=</span> <span class="n">PyObject_GC_New</span><span class="p">(</span><span class="n">PyGenObject</span><span class="p">,</span> <span class="n">type</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">gen</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Py_DECREF</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_frame</span> <span class="o">=</span> <span class="n">f</span><span class="p">;</span>
<span class="n">f</span><span class="o">-></span><span class="n">f_gen</span> <span class="o">=</span> <span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="p">)</span> <span class="n">gen</span><span class="p">;</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">f</span><span class="o">-></span><span class="n">f_code</span><span class="p">);</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_code</span> <span class="o">=</span> <span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="p">)(</span><span class="n">f</span><span class="o">-></span><span class="n">f_code</span><span class="p">);</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_running</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_weakreflist</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_exc_state</span><span class="p">.</span><span class="n">exc_type</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_exc_state</span><span class="p">.</span><span class="n">exc_value</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_exc_state</span><span class="p">.</span><span class="n">exc_traceback</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_exc_state</span><span class="p">.</span><span class="n">previous_item</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">name</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_name</span> <span class="o">=</span> <span class="n">name</span><span class="p">;</span>
<span class="k">else</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_name</span> <span class="o">=</span> <span class="p">((</span><span class="n">PyCodeObject</span> <span class="o">*</span><span class="p">)</span><span class="n">gen</span><span class="o">-></span><span class="n">gi_code</span><span class="p">)</span><span class="o">-></span><span class="n">co_name</span><span class="p">;</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">gen</span><span class="o">-></span><span class="n">gi_name</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">qualname</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_qualname</span> <span class="o">=</span> <span class="n">qualname</span><span class="p">;</span>
<span class="k">else</span>
<span class="n">gen</span><span class="o">-></span><span class="n">gi_qualname</span> <span class="o">=</span> <span class="n">gen</span><span class="o">-></span><span class="n">gi_name</span><span class="p">;</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">gen</span><span class="o">-></span><span class="n">gi_qualname</span><span class="p">);</span>
<span class="n">_PyObject_GC_TRACK</span><span class="p">(</span><span class="n">gen</span><span class="p">);</span>
<span class="k">return</span> <span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="p">)</span><span class="n">gen</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
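<p>Most of the fields set in <code>gen_new_with_qualname()</code> are visible as attributes on a generator object. The sketch below uses a hypothetical <code>counter()</code> generator function to inspect them.</p>

```python
def counter(limit):
    for number in range(limit):
        yield number


gen = counter(2)

fresh = {
    # gi_code refers to the function's code object; it is not a copy
    "shares_code": gen.gi_code is counter.__code__,
    # gi_frame is the frame that will execute that code object
    "frame_code": gen.gi_frame.f_code is gen.gi_code,
    # gi_running starts out false
    "running": bool(gen.gi_running),
    # gi_name and gi_qualname default to the function's names
    "name": gen.__name__,
    "same_qualname": gen.__qualname__ == counter.__qualname__,
}
print(fresh)

values = list(gen)  # exhaust the generator
print(values, gen.gi_frame)  # the frame is released once the generator finishes
```

<p>Note that <code>gi_code</code> is the very same object as <code>counter.__code__</code>, so every generator created from one function shares a single compiled code object.</p>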
<p>Bringing this all together, you can see how the generator is a powerful construct where a single keyword, <code>yield</code>, triggers a whole flow to create a unique object, attach a compiled code object as a property, set a frame, and store a list of variables in the local scope.</p>
<p>To the user of the generator, this all seems like magic, but under the covers it’s not <em>that</em> complex.</p>
<h3 id="conclusion_4">Conclusion</h3>
<p>Now that you understand how some built-in types work, you can explore other types.</p>
<p>When exploring Python classes, it is important to remember that there are built-in types, written in C, and classes inheriting from those types, written in Python or C.</p>
<p>Some libraries have types written in C instead of inheriting from the built-in types. One example is <code>numpy</code>, a library for numeric arrays. The <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html"><code>ndarray</code></a> type is written in C and is highly efficient.</p>
<p>In the next part, we will explore the classes and functions defined in the standard library.</p>
<h2 h1="h1" id="part-5-the-cpython-standard-library">Part 5: The CPython Standard Library</h2>
<p>Python has always come “batteries included.” This statement means that with a standard CPython distribution, there are libraries for working with files, threads, networks, websites, music, keyboards, screens, text, and all manner of utilities.</p>
<p>Some of the batteries that come with CPython are more like AA batteries. They’re useful for everything, like the <code>collections</code> module and the <code>sys</code> module. Some of them are a bit more obscure, like a small watch battery: you never know when it might come in useful.</p>
<p>There are two types of modules in the CPython standard library:</p>
<ol>
<li>Those written in pure Python that provide a utility</li>
<li>Those written in C with Python wrappers</li>
</ol>
<p>We will explore both types.</p>
<h3 id="python-modules">Python Modules</h3>
<p>The modules written in pure Python are all located in the <code>Lib/</code> directory in the source code. Some of the larger modules have submodules in subfolders, like the <code>email</code> module.</p>
<p>An easy module to look at would be the <code>colorsys</code> module. It’s only a few hundred lines of Python code. You may not have come across it before. The <code>colorsys</code> module has some utility functions for converting color scales.</p>
<p>When you install a Python distribution from source, standard library modules are copied from the <code>Lib</code> folder into the distribution folder. This folder is always part of your path when you start Python, so you can <code>import</code> the modules without having to worry about where they’re located. </p>
<p>For example:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">colorsys</span>
<span class="gp">>>> </span><span class="n">colorsys</span>
<span class="go"><module 'colorsys' from '/usr/shared/lib/python3.7/colorsys.py'></span>
<span class="gp">>>> </span><span class="n">colorsys</span><span class="o">.</span><span class="n">rgb_to_hls</span><span class="p">(</span><span class="mi">255</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">)</span>
<span class="go">(0.0, 127.5, -1.007905138339921) </span>
</pre></div>
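<p>The call above passes 8-bit channel values, which explains the out-of-range lightness and the negative saturation. The <code>colorsys</code> functions work on floating-point coordinates between 0 and 1, so a caller holding 0-255 values would normally scale them first. A minimal sketch, where <code>rgb255_to_hls()</code> is a hypothetical helper and not part of the module:</p>

```python
import colorsys


def rgb255_to_hls(red, green, blue):
    """Hypothetical helper: scale 0-255 channels to 0.0-1.0 before converting."""
    return colorsys.rgb_to_hls(red / 255, green / 255, blue / 255)


print(rgb255_to_hls(255, 0, 0))  # pure red -> (0.0, 0.5, 1.0)
```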
<p>We can see the source code of <code>rgb_to_hls()</code> inside <code>Lib/colorsys.py</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># HLS: Hue, Luminance, Saturation</span>
<span class="c1"># H: position in the spectrum</span>
<span class="c1"># L: color lightness</span>
<span class="c1"># S: color saturation</span>
<span class="k">def</span> <span class="nf">rgb_to_hls</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="n">maxc</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
<span class="n">minc</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
<span class="c1"># XXX Can optimize (maxc+minc) and (maxc-minc)</span>
<span class="n">l</span> <span class="o">=</span> <span class="p">(</span><span class="n">minc</span><span class="o">+</span><span class="n">maxc</span><span class="p">)</span><span class="o">/</span><span class="mf">2.0</span>
<span class="k">if</span> <span class="n">minc</span> <span class="o">==</span> <span class="n">maxc</span><span class="p">:</span>
<span class="k">return</span> <span class="mf">0.0</span><span class="p">,</span> <span class="n">l</span><span class="p">,</span> <span class="mf">0.0</span>
<span class="k">if</span> <span class="n">l</span> <span class="o"><=</span> <span class="mf">0.5</span><span class="p">:</span>
<span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="n">maxc</span><span class="o">-</span><span class="n">minc</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">maxc</span><span class="o">+</span><span class="n">minc</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="n">maxc</span><span class="o">-</span><span class="n">minc</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="mf">2.0</span><span class="o">-</span><span class="n">maxc</span><span class="o">-</span><span class="n">minc</span><span class="p">)</span>
<span class="n">rc</span> <span class="o">=</span> <span class="p">(</span><span class="n">maxc</span><span class="o">-</span><span class="n">r</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">maxc</span><span class="o">-</span><span class="n">minc</span><span class="p">)</span>
<span class="n">gc</span> <span class="o">=</span> <span class="p">(</span><span class="n">maxc</span><span class="o">-</span><span class="n">g</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">maxc</span><span class="o">-</span><span class="n">minc</span><span class="p">)</span>
<span class="n">bc</span> <span class="o">=</span> <span class="p">(</span><span class="n">maxc</span><span class="o">-</span><span class="n">b</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">maxc</span><span class="o">-</span><span class="n">minc</span><span class="p">)</span>
<span class="k">if</span> <span class="n">r</span> <span class="o">==</span> <span class="n">maxc</span><span class="p">:</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">bc</span><span class="o">-</span><span class="n">gc</span>
<span class="k">elif</span> <span class="n">g</span> <span class="o">==</span> <span class="n">maxc</span><span class="p">:</span>
<span class="n">h</span> <span class="o">=</span> <span class="mf">2.0</span><span class="o">+</span><span class="n">rc</span><span class="o">-</span><span class="n">bc</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">h</span> <span class="o">=</span> <span class="mf">4.0</span><span class="o">+</span><span class="n">gc</span><span class="o">-</span><span class="n">rc</span>
<span class="n">h</span> <span class="o">=</span> <span class="p">(</span><span class="n">h</span><span class="o">/</span><span class="mf">6.0</span><span class="p">)</span> <span class="o">%</span> <span class="mf">1.0</span>
<span class="k">return</span> <span class="n">h</span><span class="p">,</span> <span class="n">l</span><span class="p">,</span> <span class="n">s</span>
</pre></div>
<p>There’s nothing special about this function. It’s just standard Python. You’ll find the same is true of all the pure Python standard library modules. They’re written in plain Python, well laid out, and easy to understand. You may even spot improvements or bugs, so you can make changes to them and contribute those changes to the Python distribution. We’ll cover that toward the end of this article.</p>
<h3 id="python-and-c-modules">Python and C Modules</h3>
<p>The remainder of the modules are written in C, or a combination of Python and C. The source code for these is in <code>Lib/</code> for the Python component, and <code>Modules/</code> for the C component. There are two exceptions to this rule: the <code>sys</code> module, found in <code>Python/sysmodule.c</code>, and the <code>__builtins__</code> module, found in <code>Python/bltinmodule.c</code>.</p>
<p>Python effectively runs <code>from __builtins__ import *</code> when an interpreter is instantiated, so functions like <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/bltinmodule.c#L1821"><code>print()</code></a>, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/bltinmodule.c#L688"><code>chr()</code></a>, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/bltinmodule.c#L672"><code>format()</code></a>, etc. are available without an import. All of them are found within <code>Python/bltinmodule.c</code>.</p>
<p>Because the <code>sys</code> module is so specific to the interpreter and the internals of CPython, it is found inside the <code>Python</code> directory. It is also marked as an “implementation detail” of CPython and is not found in other distributions.</p>
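<p>The difference between compiled-in modules and pure Python modules is visible at runtime. The following sketch assumes a standard CPython installation where the standard library is stored as <code>.py</code> files on disk; the <code>report</code> dictionary is just an illustrative container.</p>

```python
import builtins
import colorsys
import sys

report = {
    # The print() found by name lookup is the object that lives in builtins.
    "print_is_builtin": print is builtins.print,
    # sys and builtins are compiled into the interpreter binary.
    "sys_compiled_in": "sys" in sys.builtin_module_names,
    "builtins_compiled_in": "builtins" in sys.builtin_module_names,
    # Compiled-in modules have no source file on disk.
    "sys_has_file": hasattr(sys, "__file__"),
    # Pure Python modules point back at their source file
    # (assumption: the standard library is installed as .py files).
    "colorsys_is_python_file": colorsys.__file__.endswith(".py"),
}
print(report)
```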
<p>The built-in <code>print()</code> function was probably the first thing you learned to do in Python. So what happens when you type <code>print("hello world!")</code>?</p>
<ol>
<li>The argument <code>"hello world!"</code> is converted from a string constant to a <code>PyUnicodeObject</code> by the compiler</li>
<li><code>builtin_print()</code> is executed with 1 argument and NULL <code>kwnames</code></li>
<li>The <code>file</code> variable is set to <code>PyId_stdout</code>, the system’s <code>stdout</code> handle</li>
<li>Each argument is sent to <code>file</code></li>
<li>A line break, <code>\n</code>, is sent to <code>file</code></li>
</ol>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">builtin_print</span><span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="n">self</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="k">const</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="n">Py_ssize_t</span> <span class="n">nargs</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">kwnames</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="k">if</span> <span class="p">(</span><span class="n">file</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">||</span> <span class="n">file</span> <span class="o">==</span> <span class="n">Py_None</span><span class="p">)</span> <span class="p">{</span>
<span class="hll"> <span class="n">file</span> <span class="o">=</span> <span class="n">_PySys_GetObjectId</span><span class="p">(</span><span class="o">&</span><span class="n">PyId_stdout</span><span class="p">);</span>
</span> <span class="p">...</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">nargs</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sep</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="hll"> <span class="n">err</span> <span class="o">=</span> <span class="n">PyFile_WriteString</span><span class="p">(</span><span class="s">" "</span><span class="p">,</span> <span class="n">file</span><span class="p">);</span>
</span> <span class="k">else</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">PyFile_WriteObject</span><span class="p">(</span><span class="n">sep</span><span class="p">,</span> <span class="n">file</span><span class="p">,</span>
<span class="n">Py_PRINT_RAW</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">PyFile_WriteObject</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">file</span><span class="p">,</span> <span class="n">Py_PRINT_RAW</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">end</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="hll"> <span class="n">err</span> <span class="o">=</span> <span class="n">PyFile_WriteString</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">file</span><span class="p">);</span>
</span> <span class="k">else</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">PyFile_WriteObject</span><span class="p">(</span><span class="n">end</span><span class="p">,</span> <span class="n">file</span><span class="p">,</span> <span class="n">Py_PRINT_RAW</span><span class="p">);</span>
<span class="p">...</span>
<span class="n">Py_RETURN_NONE</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
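<p>If you read Python more easily than C, the control flow above can be approximated in a few lines of Python. This is only a rough sketch: <code>sketch_print()</code> is a hypothetical name, and it leaves out flushing and the error handling that the C version performs after each write.</p>

```python
import io
import sys


def sketch_print(*args, sep=None, end=None, file=None):
    """Rough Python rendering of the loop in builtin_print() (illustrative only)."""
    # Fall back to sys.stdout when no file is given, like the NULL/Py_None check in C.
    if file is None:
        file = sys.stdout
    for i, arg in enumerate(args):
        if i > 0:
            # Write the separator between arguments; a single space is the default.
            file.write(" " if sep is None else sep)
        file.write(str(arg))
    # Finish with the end string; a newline is the default.
    file.write("\n" if end is None else end)


buffer = io.StringIO()
sketch_print("a", "b", 3, file=buffer)
sketch_print("x", "y", sep="-", end="!", file=buffer)
print(repr(buffer.getvalue()))  # 'a b 3\nx-y!'
```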
<p>The contents of some modules written in C expose operating system functions. Because the CPython source code needs to compile to macOS, Windows, Linux, and other *nix-based operating systems, there are some special cases.</p>
<p>The <code>time</code> module is a good example. The way that Windows keeps and stores time in the operating system is fundamentally different from the way Linux and macOS do. This is one of the reasons why the accuracy of the clock functions differs <a href="https://docs.python.org/3/library/time.html#time.clock_gettime_ns">between operating systems</a>.</p>
<p>In <code>Modules/timemodule.c</code>, the operating system time functions for Unix-based systems are imported from <code><sys/times.h></code>:</p>
<div class="highlight c"><pre><span></span><span class="cp">#ifdef HAVE_SYS_TIMES_H</span>
<span class="cp">#include</span> <span class="cpf"><sys/times.h></span><span class="cp"></span>
<span class="cp">#endif</span>
<span class="p">...</span>
<span class="cp">#ifdef MS_WINDOWS</span>
<span class="cp">#define WIN32_LEAN_AND_MEAN</span>
<span class="cp">#include</span> <span class="cpf"><windows.h></span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">"pythread.h"</span><span class="cp"></span>
<span class="cp">#endif </span><span class="cm">/* MS_WINDOWS */</span><span class="cp"></span>
<span class="p">...</span>
</pre></div>
<p>Later in the file, <code>time_process_time_ns()</code> is defined as a wrapper for <code>_PyTime_GetProcessTimeWithInfo()</code>:</p>
<div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
<span class="nf">time_process_time_ns</span><span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="n">self</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">unused</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">_PyTime_t</span> <span class="n">t</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">_PyTime_GetProcessTimeWithInfo</span><span class="p">(</span><span class="o">&</span><span class="n">t</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">)</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">_PyTime_AsNanosecondsObject</span><span class="p">(</span><span class="n">t</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/timemodule.c#L1120"><code>_PyTime_GetProcessTimeWithInfo()</code></a> is implemented multiple different ways in the source code, but only certain parts are compiled into the binary for the module, depending on the operating system. Windows systems will call <code>GetProcessTimes()</code> and Unix systems will call <code>clock_gettime()</code>.</p>
<p>Other modules that have multiple implementations for the same API are <a href="https://realpython.com/intro-to-python-threading/">the threading module</a>, the file system module, and the networking modules. Because operating systems behave differently, the CPython source code implements the same behavior as best as it can and exposes it using a consistent, abstracted API.</p>
<h3 id="the-cpython-regression-test-suite">The CPython Regression Test Suite</h3>
<p>CPython has a robust and extensive test suite covering the core interpreter, the standard library, and the tooling and distribution for both Windows and Linux/macOS.</p>
<p>The test suite is located in <code>Lib/test</code> and written almost entirely in Python.</p>
<p>The full test suite is a Python package, so it can be run using the Python interpreter that you’ve compiled. Change directory to the <code>Lib</code> directory and run <code>python -m test -j2</code>, where <code>-j2</code> means to run the tests in 2 parallel processes.</p>
<p>On Windows, use the <code>rt.bat</code> script inside the <code>PCbuild</code> folder, ensuring that you have built the <strong>Release</strong> configuration from Visual Studio in advance:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">cd</span> PCbuild
<span class="gp">$</span> rt.bat -q
<span class="go">C:\repos\cpython\PCbuild>"C:\repos\cpython\PCbuild\win32\python.exe" -u -Wd -E -bb -m test</span>
<span class="go">== CPython 3.8.0b4</span>
<span class="go">== Windows-10-10.0.17134-SP0 little-endian</span>
<span class="go">== cwd: C:\repos\cpython\build\test_python_2784</span>
<span class="go">== CPU count: 2</span>
<span class="go">== encodings: locale=cp1252, FS=utf-8</span>
<span class="go">Run tests sequentially</span>
<span class="go">0:00:00 [ 1/420] test_grammar</span>
<span class="go">0:00:00 [ 2/420] test_opcodes</span>
<span class="go">0:00:00 [ 3/420] test_dict</span>
<span class="go">0:00:00 [ 4/420] test_builtin</span>
<span class="go">...</span>
</pre></div>
<p>On Linux:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">cd</span> Lib
<span class="gp">$</span> ../python -m <span class="nb">test</span> -j2
<span class="go">== CPython 3.8.0b4</span>
<span class="go">== macOS-10.14.3-x86_64-i386-64bit little-endian</span>
<span class="go">== cwd: /Users/anthonyshaw/cpython/build/test_python_23399</span>
<span class="go">== CPU count: 4</span>
<span class="go">== encodings: locale=UTF-8, FS=utf-8</span>
<span class="go">Run tests in parallel using 2 child processes</span>
<span class="go">0:00:00 load avg: 2.14 [ 1/420] test_opcodes passed</span>
<span class="go">0:00:00 load avg: 2.14 [ 2/420] test_grammar passed</span>
<span class="go">...</span>
</pre></div>
<p>On macOS:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">cd</span> Lib
<span class="gp">$</span> ../python.exe -m <span class="nb">test</span> -j2
<span class="go">== CPython 3.8.0b4</span>
<span class="go">== macOS-10.14.3-x86_64-i386-64bit little-endian</span>
<span class="go">== cwd: /Users/anthonyshaw/cpython/build/test_python_23399</span>
<span class="go">== CPU count: 4</span>
<span class="go">== encodings: locale=UTF-8, FS=utf-8</span>
<span class="go">Run tests in parallel using 2 child processes</span>
<span class="go">0:00:00 load avg: 2.14 [ 1/420] test_opcodes passed</span>
<span class="go">0:00:00 load avg: 2.14 [ 2/420] test_grammar passed</span>
<span class="go">...</span>
</pre></div>
<p>Some tests require certain flags; otherwise they are skipped. For example, many of the IDLE tests require a GUI.</p>
<p>To see a list of test suites in the configuration, use the <code>--list-tests</code> flag:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ../python.exe -m <span class="nb">test</span> --list-tests
<span class="go">test_grammar</span>
<span class="go">test_opcodes</span>
<span class="go">test_dict</span>
<span class="go">test_builtin</span>
<span class="go">test_exceptions</span>
<span class="go">...</span>
</pre></div>
<p>You can run specific tests by providing the test suite as the first argument:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ../python.exe -m <span class="nb">test</span> test_webbrowser
<span class="go">Run tests sequentially</span>
<span class="go">0:00:00 load avg: 2.74 [1/1] test_webbrowser</span>
<span class="go">== Tests result: SUCCESS ==</span>
<span class="go">1 test OK.</span>
<span class="go">Total duration: 117 ms</span>
<span class="go">Tests result: SUCCESS</span>
</pre></div>
<p>You can also see a detailed list of the tests that were executed, along with their results, by using the <code>-v</code> argument:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> ../python.exe -m <span class="nb">test</span> test_webbrowser -v
<span class="go">== CPython 3.8.0b4 </span>
<span class="go">== macOS-10.14.3-x86_64-i386-64bit little-endian</span>
<span class="go">== cwd: /Users/anthonyshaw/cpython/build/test_python_24562</span>
<span class="go">== CPU count: 4</span>
<span class="go">== encodings: locale=UTF-8, FS=utf-8</span>
<span class="go">Run tests sequentially</span>
<span class="go">0:00:00 load avg: 2.36 [1/1] test_webbrowser</span>
<span class="go">test_open (test.test_webbrowser.BackgroundBrowserCommandTest) ... ok</span>
<span class="go">test_register (test.test_webbrowser.BrowserRegistrationTest) ... ok</span>
<span class="go">test_register_default (test.test_webbrowser.BrowserRegistrationTest) ... ok</span>
<span class="go">test_register_preferred (test.test_webbrowser.BrowserRegistrationTest) ... ok</span>
<span class="go">test_open (test.test_webbrowser.ChromeCommandTest) ... ok</span>
<span class="go">test_open_new (test.test_webbrowser.ChromeCommandTest) ... ok</span>
<span class="go">...</span>
<span class="go">test_open_with_autoraise_false (test.test_webbrowser.OperaCommandTest) ... ok</span>
<span class="go">----------------------------------------------------------------------</span>
<span class="go">Ran 34 tests in 0.056s</span>
<span class="go">OK (skipped=2)</span>
<span class="go">== Tests result: SUCCESS ==</span>
<span class="go">1 test OK.</span>
<span class="go">Total duration: 134 ms</span>
<span class="go">Tests result: SUCCESS</span>
</pre></div>
<p>Understanding how to use the test suite and checking the state of the version you have compiled is very important if you wish to make changes to CPython. Before you start making changes, you should run the whole test suite and make sure everything is passing.</p>
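<p>If you find yourself running the same test selections repeatedly, you could wrap the command line in a small script. The following is a minimal sketch, not part of CPython: <code>regrtest_command()</code> is a hypothetical helper that only assembles the flags shown above (<code>-j</code> and <code>-v</code>), and the <code>--run</code> switch is an invented convention of this script so that nothing is launched by accident.</p>

```python
import subprocess
import sys


def regrtest_command(*tests, jobs=None, verbose=False):
    """Build a command line for `python -m test` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "test"]
    if jobs is not None:
        cmd.append(f"-j{jobs}")  # number of parallel worker processes
    if verbose:
        cmd.append("-v")         # list each individual test and its result
    cmd.extend(tests)            # no test names means the whole suite
    return cmd


cmd = regrtest_command("test_webbrowser", verbose=True)
print(cmd[1:])  # ['-m', 'test', '-v', 'test_webbrowser']

if __name__ == "__main__" and "--run" in sys.argv:
    # Only start the test run when explicitly asked to, since it can take a while.
    sys.exit(subprocess.run(cmd).returncode)
```

<p>Run it with the interpreter you compiled so that <code>sys.executable</code> points at your own build rather than a system Python.</p>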
<h3 id="installing-a-custom-version">Installing a Custom Version</h3>
<p>If you’re happy with your changes and want to use them on your system, you can install your build from the source repository as a custom version.</p>
<p>For macOS and Linux, you can use the <code>altinstall</code> target, which installs a standalone version without creating symlinks for <code>python3</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> make altinstall
</pre></div>
<p>For Windows, you have to change the build configuration from <code>Debug</code> to <code>Release</code>, then copy the packaged binaries to a directory on your computer which is part of the system path.</p>
<h2 id="the-cpython-source-code-conclusion">The CPython Source Code: Conclusion</h2>
<p>Congratulations, you made it! Did your tea get cold? Make yourself another cup. You’ve earned it.</p>
<p>Now that you’ve seen the CPython source code, the modules, the compiler, and the tooling, you may wish to make some changes and contribute them back to the Python ecosystem.</p>
<p>The <a href="https://devguide.python.org/">official dev guide</a> contains plenty of resources for beginners. You’ve already taken the first step, to understand the source, knowing how to change, compile, and test the CPython applications.</p>
<p>Think back to all the things you’ve learned about CPython over this article. All the pieces of magic to which you’ve learned the secrets. The journey doesn’t stop here. </p>
<p>This might be a good time to learn more about Python and C. Who knows: you could be contributing more and more to the CPython project!</p>
<hr />
<p><em>[ Improve Your Python With Python Tricks: Get a short & sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&utm_medium=rss&utm_campaign=footer">>> Click here to learn more and see examples</a> ]</em></p>
Python Histogram Plotting: NumPy, Matplotlib, Pandas & Seaborn
https://realpython.com/courses/python-histograms/
2019-08-20T14:00:00+00:00
In this course, you'll be equipped to make production-quality, presentation-ready Python histogram plots with a range of choices and features. It's your one-stop shop for constructing and manipulating histograms with Python's scientific stack.
<p>In this course, you’ll be equipped to make production-quality, presentation-ready Python histogram plots with a range of choices and features.</p>
<p>If you have introductory to intermediate knowledge in Python and statistics, then you can use this article as a one-stop shop for building and plotting histograms in Python using libraries from its scientific stack, including NumPy, Matplotlib, Pandas, and Seaborn.</p>
<p>A histogram is a great tool for quickly assessing a <a href="https://en.wikipedia.org/wiki/Probability_distribution">probability distribution</a> that is intuitively understood by almost any audience. Python offers a handful of different options for building and plotting histograms. Most people know a histogram by its graphical representation, which is similar to a bar graph:</p>
<p><a href="https://files.realpython.com/media/commute_times.621e5b1ce062.png" target="_blank"><img class="img-fluid mx-auto d-block w-75" src="https://files.realpython.com/media/commute_times.621e5b1ce062.png" width="1152" height="888" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/commute_times.621e5b1ce062.png&w=288&sig=408f56b07d4fb71d47171405f51e3cb58a7e6cc2 288w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/commute_times.621e5b1ce062.png&w=576&sig=a2b0324c81a7cd89fe7ceaacdcd4ae797ad1a587 576w, https://files.realpython.com/media/commute_times.621e5b1ce062.png 1152w" sizes="75vw" alt="Histogram of commute times for 1000 commuters"/></a></p>
<p>This course will guide you through creating plots like the one above as well as more complex ones. Here’s what you’ll cover:</p>
<ul>
<li>Building histograms in pure Python, without the use of third-party libraries</li>
<li>Constructing histograms with NumPy to summarize the underlying data</li>
<li>Plotting the resulting histogram with Matplotlib, Pandas, and Seaborn</li>
</ul>
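<p>To give a flavor of the first item, here is a minimal pure-Python sketch that bins values and prints a text histogram. The sample data and the bin width of 10 are made up for illustration and are not taken from the course.</p>

```python
from collections import Counter

# Hypothetical sample data: commute times in minutes.
commute_times = [12, 15, 15, 22, 27, 31, 31, 31, 38, 44]

bin_width = 10
# Map each value to the lower edge of its bin, then count the values per bin.
counts = Counter((t // bin_width) * bin_width for t in commute_times)

# Print one row per bin, with one '#' for each value that fell into it.
for lower in sorted(counts):
    upper = lower + bin_width - 1
    print(f"{lower}-{upper}: {'#' * counts[lower]}")
```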
<div class="alert alert-warning" role="alert"><p><strong>Free Bonus:</strong> Short on time? <a href="https://realpython.com/optins/view/histograms-cheatsheet/" class="alert-link" data-toggle="modal" data-target="#modal-histograms-cheatsheet" data-focus="false">Click here to get access to a free two-page Python histograms cheat sheet</a> that summarizes the techniques explained in this tutorial.</p></div>
How to Make a Discord Bot in Python
https://realpython.com/how-to-make-a-discord-bot-python/
2019-08-19T14:00:00+00:00
In this step-by-step tutorial, you'll learn how to make a Discord bot in Python and interact with several APIs. You'll learn how to handle events, accept commands, validate and verify input, and all the basics that can help you create useful and exciting automations!
<p>In a world where video games are so important to so many people, communication and community around games are vital. Discord offers both of those and more in one well-designed package. In this tutorial, you’ll learn how to make a Discord bot in Python so that you can make the most of this fantastic platform.</p>
<p><strong>By the end of this article, you’ll have learned:</strong></p>
<ul>
<li>What Discord is and why it’s so valuable</li>
<li>How to make a Discord bot through the Developer Portal</li>
<li>How to create Discord connections</li>
<li>How to handle events</li>
<li>How to accept commands and validate assumptions</li>
<li>How to interact with various Discord APIs</li>
</ul>
<p>You’ll begin by learning what Discord is and why it’s valuable.</p>
<div class="alert alert-warning" role="alert"><p><strong>Free Bonus:</strong> <a href="" class="alert-link" data-toggle="modal" data-target="#modal-python-tricks-sample" data-focus="false">Click here to get access to a chapter from Python Tricks: The Book</a> that shows you Python's best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.</p></div>
<h2 id="what-is-discord">What Is Discord?</h2>
<p><a href="https://discordapp.com/">Discord</a> is a voice and text communication platform for gamers.</p>
<p>Players, streamers, and developers use Discord to discuss games, answer questions, chat while they play, and much more. It even has a game store, complete with critical reviews and a subscription service. It is nearly a one-stop shop for gaming communities.</p>
<p>While there are many things you can build using Discord’s <a href="https://discordapp.com/developers/docs/intro">APIs</a>, this tutorial will focus on a particular learning outcome: how to make a Discord bot in Python.</p>
<h2 id="what-is-a-bot">What Is a Bot?</h2>
<p>Discord is growing in popularity. As such, automated processes, such as banning inappropriate users and reacting to user requests, are vital for a community to thrive and grow.</p>
<p>Automated programs that look and act like users and automatically respond to events and commands on Discord are called <strong>bot users</strong>. Discord bot users (or just <strong>bots</strong>) have nearly <a href="https://discordbots.org">unlimited applications</a>.</p>
<p>For example, let’s say you’re managing a new Discord guild and a user joins for the very first time. Excited, you may personally reach out to that user and welcome them to your community. You might also tell them about your channels or ask them to introduce themselves.</p>
<p>The user feels welcomed and enjoys the discussions that happen in your guild and they, in turn, invite friends.</p>
<p>Over time, your community grows so big that it’s no longer feasible to personally reach out to each new member, but you still want to send them something to recognize them as a new member of the guild.</p>
<p>With a bot, it’s possible to automatically react to the new member joining your guild. You can even customize its behavior based on context and control how it interacts with each new user.</p>
<p>This is great, but it’s only one small example of how a bot can be useful. There are so many opportunities for you to be creative with bots, once you know how to make them.</p>
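<p>The idea of a bot reacting to events can be sketched without any Discord library at all. The toy example below is an assumption-laden illustration, not Discord’s API: <code>ToyBot</code>, the <code>member_join</code> event name, and the <code>welcome()</code> handler are all hypothetical. It only shows the general pattern of registering a handler and having it called when an event arrives.</p>

```python
from collections import defaultdict


class ToyBot:
    """Minimal stand-in for a bot: maps event names to handler functions."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_name):
        # Decorator that registers a function as a handler for an event.
        def register(func):
            self._handlers[event_name].append(func)
            return func
        return register

    def dispatch(self, event_name, *args):
        # Call every handler registered for the event and collect the results.
        return [handler(*args) for handler in self._handlers[event_name]]


bot = ToyBot()


@bot.on("member_join")
def welcome(member_name):
    return f"Hi {member_name}, welcome to the guild!"


messages = bot.dispatch("member_join", "Ada")
print(messages)  # ['Hi Ada, welcome to the guild!']
```

<p>A real bot follows the same shape, except that the events come from Discord over the network rather than from a manual <code>dispatch()</code> call.</p>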
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Although Discord allows you to create bots that deal with voice communication, this article will stick to the text side of the service.</p>
</div>
<p>There are two key steps when you’re creating a bot:</p>
<ol>
<li>Create the bot user on Discord and register it with a guild.</li>
<li>Write code that uses Discord’s APIs and implements your bot’s behaviors.</li>
</ol>
<p>In the next section, you’ll learn how to make a Discord bot in Discord’s <a href="https://discordapp.com/developers/applications">Developer Portal</a>.</p>
<h2 id="how-to-make-a-discord-bot-in-the-developer-portal">How to Make a Discord Bot in the Developer Portal</h2>
<p>Before you can dive into any Python code to handle events and create exciting automations, you need to first create a few Discord components:</p>
<ol>
<li>An account</li>
<li>An application</li>
<li>A bot</li>
<li>A guild</li>
</ol>
<p>You’ll learn more about each piece in the following sections.</p>
<p>Once you’ve created all of these components, you’ll tie them together by registering your bot with your guild.</p>
<p>You can get started by heading to Discord’s <a href="http://discordapp.com/developers/applications">Developer Portal</a>.</p>
<h3 id="creating-a-discord-account">Creating a Discord Account</h3>
<p>The first thing you’ll see is a landing page where you’ll need to either log in, if you have an existing account, or create a new account:</p>
<p><a href="https://files.realpython.com/media/discord-bot-register-user.41a9c2bc4db9.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-register-user.41a9c2bc4db9.png" width="3024" height="1762" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-register-user.41a9c2bc4db9.png&w=756&sig=6d39dcc4f6fc11eb5de6e2b457fc387d04dcb138 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-register-user.41a9c2bc4db9.png&w=1512&sig=454c3dfb7830bfefbd4fc267e92af61669936082 1512w, https://files.realpython.com/media/discord-bot-register-user.41a9c2bc4db9.png 3024w" sizes="75vw" alt="Discord: Account Login Screen"/></a></p>
<p>If you need to create a new account, then click on the <em>Register</em> button below <em>Login</em> and enter your account information.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Important:</strong> You’ll need to verify your email before you’re able to move on.</p>
</div>
<p>Once you’re finished, you’ll be redirected to the Developer Portal home page, where you’ll create your application.</p>
<h3 id="creating-an-application">Creating an Application</h3>
<p>An <strong>application</strong> allows you to interact with Discord’s APIs by providing authentication tokens, designating permissions, and so on.</p>
<p>To create a new application, select <em>New Application</em>:</p>
<p><a href="https://files.realpython.com/media/discord-bot-new-app.40b4a51bb57d.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-new-app.40b4a51bb57d.png" width="3024" height="1765" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-new-app.40b4a51bb57d.png&w=756&sig=094b6a170204052aa3e03749c858002b90ccb1bd 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-new-app.40b4a51bb57d.png&w=1512&sig=fb20c2f10526eafc64138f33c8e8f08e5c895d57 1512w, https://files.realpython.com/media/discord-bot-new-app.40b4a51bb57d.png 3024w" sizes="75vw" alt="Discord: My Applications Screen"/></a></p>
<p>Next, you’ll be prompted to name your application. Select a name and click <em>Create</em>:</p>
<p><a href="https://files.realpython.com/media/discord-bot-name-application.8ccfc8a69cb5.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-name-application.8ccfc8a69cb5.png" width="3024" height="1771" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-name-application.8ccfc8a69cb5.png&w=756&sig=663b4e94bcf90cb6c34d416805e4a488dfea3a36 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-name-application.8ccfc8a69cb5.png&w=1512&sig=46630dceba46441199dfd02f62565bc7be1ca896 1512w, https://files.realpython.com/media/discord-bot-name-application.8ccfc8a69cb5.png 3024w" sizes="75vw" alt="Discord: Naming an Application"/></a></p>
<p>Congratulations! You made a Discord application. On the resulting screen, you can see information about your application:</p>
<p><a href="https://files.realpython.com/media/discord-bot-app-info.146a24d590a6.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-app-info.146a24d590a6.png" width="3015" height="1767" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-app-info.146a24d590a6.png&w=753&sig=08d7b87d0cdda73c1ce9f040123d5fbe6b3ecc76 753w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-app-info.146a24d590a6.png&w=1507&sig=8d9fdeb253e6d562be88406244262b8797e50525 1507w, https://files.realpython.com/media/discord-bot-app-info.146a24d590a6.png 3015w" sizes="75vw" alt="Discord: Application General Information"/></a></p>
<p>Keep in mind that any program that interacts with Discord APIs requires a Discord application, not just bots. Bot-related APIs are only a subset of Discord’s total interface.</p>
<p>However, since this tutorial is about how to make a Discord bot, navigate to the <em>Bot</em> tab on the left-hand navigation list.</p>
<h3 id="creating-a-bot">Creating a Bot</h3>
<p>As you learned in the previous sections, a bot user is one that listens to and automatically reacts to certain events and commands on Discord.</p>
<p>For your code to actually be manifested on Discord, you’ll need to create a bot user. To do so, select <em>Add Bot</em>:</p>
<p><a href="https://files.realpython.com/media/discord-bot-add-bot.4735c88ff16b.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-add-bot.4735c88ff16b.png" width="3021" height="1761" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-add-bot.4735c88ff16b.png&w=755&sig=1bd131ee29f9c52aed1698d09c26f4267abb5f02 755w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-add-bot.4735c88ff16b.png&w=1510&sig=3b9afe030a68a934e1f3730ebab7c9bd7e8981cf 1510w, https://files.realpython.com/media/discord-bot-add-bot.4735c88ff16b.png 3021w" sizes="75vw" alt="Discord: Add Bot"/></a></p>
<p>Once you confirm that you want to add the bot to your application, you’ll see the new bot user in the portal:</p>
<p><a href="https://files.realpython.com/media/discord-bot-created.fbdf4a021810.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-created.fbdf4a021810.png" width="3009" height="1760" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-created.fbdf4a021810.png&w=752&sig=8d9dc28ff753dff981137aef1b7e5add00a429a1 752w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-created.fbdf4a021810.png&w=1504&sig=2ec8e17f1a229eb212a96d7f85c9b680391776f1 1504w, https://files.realpython.com/media/discord-bot-created.fbdf4a021810.png 3009w" sizes="75vw" alt="Discord: Bot Created Successfully"/></a></p>
<p>Notice that, by default, your bot user will inherit the name of your application. Instead, update the username to something more bot-like, such as <code>RealPythonTutorialBot</code>, and <em>Save Changes</em>:</p>
<p><a href="https://files.realpython.com/media/discord-bot-rename-bot.008fd6ed6354.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-rename-bot.008fd6ed6354.png" width="3023" height="1770" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-rename-bot.008fd6ed6354.png&w=755&sig=39e7af71df9ef2095bce4d7e964973af010903c0 755w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-rename-bot.008fd6ed6354.png&w=1511&sig=252ca9f5542ba888771dd8aa32cf36b65cbc6317 1511w, https://files.realpython.com/media/discord-bot-rename-bot.008fd6ed6354.png 3023w" sizes="75vw" alt="Discord: Rename Bot"/></a></p>
<p>Now, the bot’s all set and ready to go, but to where?</p>
<p>A bot user is not useful if it’s not interacting with other users. Next, you’ll create a guild so that your bot can interact with other users.</p>
<h3 id="creating-a-guild">Creating a Guild</h3>
<p>A <strong>guild</strong> (or a <strong>server</strong>, as it is often called in Discord’s user interface) is a specific group of channels where users congregate to chat.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> While <strong>guild</strong> and <strong>server</strong> are interchangeable, this article will use the term <strong>guild</strong> primarily because the APIs stick to the same term. The term <strong>server</strong> will only be used when referring to a guild in the graphical UI.</p>
</div>
<p>For example, say you want to create a space where users can come together and talk about your latest game. You’d start by creating a guild. Then, in your guild, you could have multiple channels, such as:</p>
<ul>
<li><strong>General Discussion:</strong> A channel for users to talk about whatever they want</li>
<li><strong>Spoilers, Beware:</strong> A channel for users who have finished your game to talk about all the end game reveals</li>
<li><strong>Announcements:</strong> A channel for you to announce game updates and for users to discuss them</li>
</ul>
<p>Once you’ve created your guild, you’d invite other users to populate it.</p>
<p>So, to create a guild, head to your Discord <a href="https://discordapp.com/channels/@me">home</a> page:</p>
<p><a href="https://files.realpython.com/media/discord-bot-homepage.f533b989cedd.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-homepage.f533b989cedd.png" width="3028" height="1717" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-homepage.f533b989cedd.png&w=757&sig=34788068aad51ef7668a3e5ff0583199519ec3c9 757w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-homepage.f533b989cedd.png&w=1514&sig=e36d4b45395b91c00d546eee352c80cee2f8165a 1514w, https://files.realpython.com/media/discord-bot-homepage.f533b989cedd.png 3028w" sizes="75vw" alt="Discord: User Account Home Page"/></a></p>
<p>From this home page, you can view and add friends, direct messages, and guilds. Select the <em>+</em> icon on the left-hand side of the web page to <em>Add a Server</em>:</p>
<p><a href="https://files.realpython.com/media/discord-bot-add-server.bd5a5a58c50c.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-add-server.bd5a5a58c50c.png" width="3027" height="1721" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-add-server.bd5a5a58c50c.png&w=756&sig=1113434567a2bedb5b26726cce843adc7e5791d6 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-add-server.bd5a5a58c50c.png&w=1513&sig=5cb289f097eac68358ee2400d26e45ee9e377fd6 1513w, https://files.realpython.com/media/discord-bot-add-server.bd5a5a58c50c.png 3027w" sizes="75vw" alt="Discord: Add Server"/></a></p>
<p>This will present two options, <em>Create a server</em> and <em>Join a Server</em>. In this case, select <em>Create a server</em> and enter a name for your guild:</p>
<p><a href="https://files.realpython.com/media/discord-bot-create-server.922dba753792.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-create-server.922dba753792.png" width="3023" height="1716" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-create-server.922dba753792.png&w=755&sig=e5083d753497fc156dfec6c3b351abd790b8c0d8 755w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-create-server.922dba753792.png&w=1511&sig=4709fdcaa3feede390c444431adbb70f998f3f04 1511w, https://files.realpython.com/media/discord-bot-create-server.922dba753792.png 3023w" sizes="75vw" alt="Discord: Naming a Server"/></a></p>
<p>Once you’ve finished creating your guild, you’ll be able to see the users on the right-hand side and the channels on the left:</p>
<p><a href="https://files.realpython.com/media/discord-bot-server.cba61f3781cf.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-server.cba61f3781cf.png" width="3026" height="1721" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-server.cba61f3781cf.png&w=756&sig=b8d05142625b9e2ce814747d1cf64921eba8e048 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-server.cba61f3781cf.png&w=1513&sig=6b6139eeecc4f1f64523d0a4d96f69e1a623c558 1513w, https://files.realpython.com/media/discord-bot-server.cba61f3781cf.png 3026w" sizes="75vw" alt="Discord: Newly Created Server"/></a></p>
<p>The final step on Discord is to register your bot with your new guild.</p>
<h3 id="adding-a-bot-to-a-guild">Adding a Bot to a Guild</h3>
<p>A bot can’t accept invites like a normal user can. Instead, you’ll add your bot using the OAuth2 protocol.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Technical Detail:</strong> <a href="https://oauth.net/2/">OAuth2</a> is a protocol for dealing with authorization, where a service can grant a client application limited access based on the application’s credentials and allowed scopes.</p>
</div>
<p>To do so, head back to the <a href="http://discordapp.com/developers/applications">Developer Portal</a> and select the OAuth2 page from the left-hand navigation:</p>
<p><a href="https://files.realpython.com/media/discord-bot-oauth2.7c000bfe571b.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-oauth2.7c000bfe571b.png" width="3008" height="1770" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-oauth2.7c000bfe571b.png&w=752&sig=a7d51e04cbb851359c7fe643d80acb8e1674d8a2 752w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-oauth2.7c000bfe571b.png&w=1504&sig=5ba25b1a754b3d08bb200328f56ea71e8b71db81 1504w, https://files.realpython.com/media/discord-bot-oauth2.7c000bfe571b.png 3008w" sizes="75vw" alt="Discord: Application OAuth2"/></a></p>
<p>From this window, you’ll see the OAuth2 URL Generator.</p>
<p>This tool generates an authorization URL that hits Discord’s OAuth2 API and authorizes API access using your application’s credentials.</p>
<p>In this case, you’ll want to grant your application’s bot user access to Discord APIs using your application’s OAuth2 credentials.</p>
<p>To do this, scroll down and select <em>bot</em> from the <em>SCOPES</em> options and <em>Administrator</em> from <em>BOT PERMISSIONS</em>:</p>
<p><a href="https://files.realpython.com/media/discord-bot-scopes.ee333b7a5987.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-scopes.ee333b7a5987.png" width="3012" height="1766" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-scopes.ee333b7a5987.png&w=753&sig=3df33d0353cab7ddbf213779e83c711e5d629e21 753w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-scopes.ee333b7a5987.png&w=1506&sig=259c0a008e687ca6bcb131a63d825e4a3f5133c5 1506w, https://files.realpython.com/media/discord-bot-scopes.ee333b7a5987.png 3012w" sizes="75vw" alt="Discord: Application Scopes and Bot Permissions"/></a></p>
<p>Now, Discord has generated your application’s authorization URL with the selected scope and permissions.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Disclaimer:</strong> While we’re using <em>Administrator</em> for the purposes of this tutorial, you should be as granular as possible when granting permissions in a real-world application.</p>
</div>
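<p>To make the shape of that authorization URL more concrete, here is a minimal sketch that assembles a similar URL by hand. The endpoint, the query parameter names, and the value <code>8</code> for <em>Administrator</em> are assumptions based on what the URL Generator typically produces, and <code>build_authorize_url()</code> is a hypothetical helper rather than part of any library. Prefer the URL that the Developer Portal generates for you.</p>

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names mirror the generated URL.
AUTHORIZE_URL = "https://discordapp.com/api/oauth2/authorize"
ADMINISTRATOR = 8  # assumed integer value of the Administrator permission


def build_authorize_url(client_id, permissions=ADMINISTRATOR, scope="bot"):
    """Return an authorization URL for adding a bot user to a guild."""
    query = urlencode(
        {"client_id": client_id, "permissions": permissions, "scope": scope}
    )
    return f"{AUTHORIZE_URL}?{query}"


if __name__ == "__main__":
    # "123456789012345678" is a placeholder, not a real client ID.
    print(build_authorize_url("123456789012345678"))
```

<p>Passing a smaller <code>permissions</code> value is how you would request fewer permissions than <em>Administrator</em>.</p>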
<p>Select <em>Copy</em> beside the URL that was generated for you, paste it into your browser, and select your guild from the dropdown options:</p>
<p><a href="https://files.realpython.com/media/discord-bot-select-server.3cd1af626256.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-select-server.3cd1af626256.png" width="3023" height="1715" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-select-server.3cd1af626256.png&w=755&sig=bfa0dba079ebee97013871c01220f66505eed020 755w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-select-server.3cd1af626256.png&w=1511&sig=f2e91e59e77705a053b9930c27d1bf198ff6c7dc 1511w, https://files.realpython.com/media/discord-bot-select-server.3cd1af626256.png 3023w" sizes="75vw" alt="Discord: Add Bot to a Server"/></a></p>
<p>Click <em>Authorize</em>, and you’re done!</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> You might get a <a href="https://en.wikipedia.org/wiki/ReCAPTCHA">reCAPTCHA</a> before moving on. If so, you’ll need to prove you’re a human.</p>
</div>
<p>If you go back to your guild, then you’ll see that the bot has been added:</p>
<p><a href="https://files.realpython.com/media/discord-bot-added-to-guild.4a6b4477bc1e.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-added-to-guild.4a6b4477bc1e.png" width="3024" height="1719" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-added-to-guild.4a6b4477bc1e.png&w=756&sig=39634e4118660e59511dee28f86e7bb290192f3d 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-added-to-guild.4a6b4477bc1e.png&w=1512&sig=add9851baae9338210082db661a6277cad86ef5f 1512w, https://files.realpython.com/media/discord-bot-added-to-guild.4a6b4477bc1e.png 3024w" sizes="75vw" alt="Discord: Bot Added to Guild"/></a></p>
<p>In summary, you’ve created:</p>
<ul>
<li>An <strong>application</strong> that your bot will use to authenticate with Discord’s APIs</li>
<li>A <strong>bot</strong> user that you’ll use to interact with other users and events in your guild</li>
<li>A <strong>guild</strong> in which your user account and your bot user will be active</li>
<li>A <strong>Discord</strong> account with which you created everything else and that you’ll use to interact with your bot</li>
</ul>
<p>Now, you know how to make a Discord bot using the Developer Portal. Next comes the fun stuff: implementing your bot in Python!</p>
<h2 id="how-to-make-a-discord-bot-in-python">How to Make a Discord Bot in Python</h2>
<p>Since you’re learning how to make a Discord bot with Python, you’ll be using <code>discord.py</code>.</p>
<p><a href="https://discordpy.readthedocs.io/en/latest/index.html"><code>discord.py</code></a> is a Python library that exhaustively implements Discord’s APIs in an efficient and Pythonic way. This includes utilizing Python’s implementation of <a href="https://realpython.com/async-io-python/">Async IO</a>.</p>
<p>Begin by installing <code>discord.py</code> with <a href="https://realpython.com/what-is-pip/"><code>pip</code></a>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install -U discord.py
</pre></div>
<p>Now that you’ve installed <code>discord.py</code>, you’ll use it to create your first connection to Discord!</p>
<h2 id="creating-a-discord-connection">Creating a Discord Connection</h2>
<p>The first step in implementing your bot user is to create a connection to Discord. With <code>discord.py</code>, you do this by creating an instance of <code>Client</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">token</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">():</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{client.user}</span><span class="s1"> has connected to Discord!'</span><span class="p">)</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
</pre></div>
<p>A <code>Client</code> is an object that represents a connection to Discord. A <code>Client</code> handles events, tracks state, and generally interacts with Discord APIs.</p>
<p>Here, you’ve created a <code>Client</code> and implemented its <code>on_ready()</code> event handler. This handler runs once the <code>Client</code> has established a connection to Discord and finished preparing the data that Discord has sent, such as login state, guild and channel data, and more.</p>
<p>In other words, <code>on_ready()</code> will be called (and your message will be printed) once <code>client</code> is ready for further action. You’ll learn more about event handlers later in this article.</p>
<p>When you’re working with secrets such as your Discord token, it’s good practice to read it into your program from an environment variable. Using environment variables helps you:</p>
<ul>
<li>Avoid putting the secrets into source control</li>
<li>Use different variables for development and production environments without changing your code</li>
</ul>
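<p>One practical consequence of reading secrets from the environment is that a variable can be missing. As a minimal sketch, a small helper can fail early with a clear message instead of passing <code>None</code> along. <code>require_env()</code> is a hypothetical name used only for illustration:</p>

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    os.environ["DISCORD_TOKEN"] = "placeholder-token"  # demo value only
    print(require_env("DISCORD_TOKEN"))
```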
<p>While you could <code>export DISCORD_TOKEN={your-bot-token}</code>, an easier solution is to save a <code>.env</code> file on all machines that will be running this code. This is not only easier, since you won’t have to <code>export</code> your token every time you start a new shell session, but it also keeps your secrets out of your shell’s history.</p>
<p>Create a file named <code>.env</code> in the same directory as <code>bot.py</code>:</p>
<div class="highlight text"><pre><span></span># .env
DISCORD_TOKEN={your-bot-token}
</pre></div>
<p>You’ll need to replace <code>{your-bot-token}</code> with your bot’s token, which you can get by going back to the <em>Bot</em> page on the <a href="http://discordapp.com/developers/applications">Developer Portal</a> and clicking <em>Copy</em> under the <em>TOKEN</em> section:</p>
<p><a href="https://files.realpython.com/media/discord-bot-copy-token.1228e6cb6cba.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-copy-token.1228e6cb6cba.png" width="3024" height="1767" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-copy-token.1228e6cb6cba.png&w=756&sig=57425662c2056deac50ebeb9a8d2891d4a8a6b53 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-copy-token.1228e6cb6cba.png&w=1512&sig=c8c437a86a1fb23a94e35050f5c969f5f20c4814 1512w, https://files.realpython.com/media/discord-bot-copy-token.1228e6cb6cba.png 3024w" sizes="75vw" alt="Discord: Copy Bot Token"/></a></p>
<p>Looking back at the <code>bot.py</code> code, you’ll notice a library called <a href="https://github.com/theskumar/python-dotenv"><code>dotenv</code></a>. This library is handy for working with <code>.env</code> files. <code>load_dotenv()</code> loads the variables from a <code>.env</code> file into your program’s environment so that you can read them in your code with <code>os.getenv()</code>.</p>
<p>Install <code>dotenv</code> with <code>pip</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install -U python-dotenv
</pre></div>
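<p>To get a feel for what loading a <code>.env</code> file involves, here is a deliberately simplified stand-in written with the standard library only. It is not how <code>python-dotenv</code> is implemented, and it ignores quoting, <code>export</code> prefixes, and variable expansion. <code>load_env_file()</code> is a hypothetical helper:</p>

```python
import os


def load_env_file(path, environ=os.environ):
    """Copy KEY=VALUE pairs from a file into environ, keeping existing values."""
    with open(path, encoding="utf-8") as env_file:
        for line in env_file:
            line = line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault() leaves variables that are already set untouched.
            environ.setdefault(key.strip(), value.strip())
```

<p>The sketch does not overwrite variables that are already set, which matches the default behavior of <code>load_dotenv()</code>.</p>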
<p>Finally, <code>client.run()</code> runs your <code>Client</code> using your bot’s token.</p>
<p>Now that you’ve set up both <code>bot.py</code> and <code>.env</code>, you can run your code:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python bot.py
<span class="go">RealPythonTutorialBot#9643 has connected to Discord!</span>
</pre></div>
<p>Great! Your <code>Client</code> has connected to Discord using your bot’s token. In the next section, you’ll build on this <code>Client</code> by interacting with more Discord APIs.</p>
<h2 id="interacting-with-discord-apis">Interacting With Discord APIs</h2>
<p>Using a <code>Client</code>, you have access to a wide range of Discord APIs.</p>
<p>For example, let’s say you wanted to write the name and identifier of the guild that you registered your bot user with to the console.</p>
<p>First, you’ll need to add a new environment variable:</p>
<div class="highlight text"><pre><span></span># .env
DISCORD_TOKEN={your-bot-token}
<span class="hll">DISCORD_GUILD={your-guild-name}
</span></pre></div>
<p>Don’t forget that you’ll need to replace the two placeholders with actual values:</p>
<ol>
<li><code>{your-bot-token}</code></li>
<li><code>{your-guild-name}</code></li>
</ol>
<p>Remember that <code>discord.py</code> calls <code>on_ready()</code>, which you used before, once the <code>Client</code> has made the connection and prepared the data. So, you can rely on the guild data being available inside <code>on_ready()</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">TOKEN</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">GUILD</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_GUILD'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">():</span>
    <span class="k">for</span> <span class="n">guild</span> <span class="ow">in</span> <span class="n">client</span><span class="o">.</span><span class="n">guilds</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">guild</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="n">GUILD</span><span class="p">:</span>
            <span class="k">break</span>

    <span class="nb">print</span><span class="p">(</span>
        <span class="n">f</span><span class="s1">'</span><span class="si">{client.user}</span><span class="s1"> is connected to the following guild:</span><span class="se">\n</span><span class="s1">'</span>
        <span class="n">f</span><span class="s1">'</span><span class="si">{guild.name}</span><span class="s1">(id: </span><span class="si">{guild.id}</span><span class="s1">)'</span>
    <span class="p">)</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">TOKEN</span><span class="p">)</span>
</pre></div>
<p>Here, you looped through the guild data that Discord has sent <code>client</code>, namely <code>client.guilds</code>. Then, you found the guild with the matching name and printed a <a href="https://realpython.com/python-f-strings/">formatted string</a> to <code>stdout</code>.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Even though you can be pretty confident at this point in the tutorial that your bot is only connected to a single guild (so <code>client.guilds[0]</code> would be simpler), it’s important to realize that a bot user can be connected to many guilds.</p>
<p>Therefore, a more robust solution is to loop through <code>client.guilds</code> to find the one you’re looking for.</p>
</div>
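<p>One subtlety of the loop above is that if no guild name matches, <code>guild</code> simply ends up holding the last guild in the list. A minimal sketch of a stricter lookup uses Python’s <code>for</code>…<code>else</code> construct to return <code>None</code> in that case. The <code>SimpleNamespace</code> objects are stand-ins for real guild objects, and <code>find_guild_by_name()</code> is a hypothetical helper:</p>

```python
from types import SimpleNamespace


def find_guild_by_name(guilds, name):
    """Return the first guild whose name matches, or None if there is none."""
    for guild in guilds:
        if guild.name == name:
            break
    else:
        # The loop finished without hitting break, so nothing matched.
        guild = None
    return guild


if __name__ == "__main__":
    # Stand-in objects; real guilds would come from client.guilds.
    guilds = [
        SimpleNamespace(name="Other Server", id=1),
        SimpleNamespace(name="RealPythonTutorialServer", id=2),
    ]
    print(find_guild_by_name(guilds, "RealPythonTutorialServer"))
    print(find_guild_by_name(guilds, "Missing Server"))
```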
<p>Run the program to see the results:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python bot.py
<span class="go">RealPythonTutorialBot#9643 is connected to the following guild:</span>
<span class="go">RealPythonTutorialServer(id: 571759877328732195)</span>
</pre></div>
<p>Great! You can see the name of your bot, the name of your server, and the server’s identification number.</p>
<p>Another interesting bit of data you can pull from a guild is the list of users who are members of the guild:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">TOKEN</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">GUILD</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_GUILD'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">():</span>
    <span class="k">for</span> <span class="n">guild</span> <span class="ow">in</span> <span class="n">client</span><span class="o">.</span><span class="n">guilds</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">guild</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="n">GUILD</span><span class="p">:</span>
            <span class="k">break</span>

    <span class="nb">print</span><span class="p">(</span>
        <span class="n">f</span><span class="s1">'</span><span class="si">{client.user}</span><span class="s1"> is connected to the following guild:</span><span class="se">\n</span><span class="s1">'</span>
        <span class="n">f</span><span class="s1">'</span><span class="si">{guild.name}</span><span class="s1">(id: </span><span class="si">{guild.id}</span><span class="s1">)</span><span class="se">\n</span><span class="s1">'</span>
    <span class="p">)</span>

    <span class="n">members</span> <span class="o">=</span> <span class="s1">'</span><span class="se">\n</span><span class="s1"> - '</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="n">member</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">member</span> <span class="ow">in</span> <span class="n">guild</span><span class="o">.</span><span class="n">members</span><span class="p">])</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Guild Members:</span><span class="se">\n</span><span class="s1"> - </span><span class="si">{members}</span><span class="s1">'</span><span class="p">)</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">TOKEN</span><span class="p">)</span>
</pre></div>
<p>By looping through <code>guild.members</code>, you pulled the names of all of the members of the guild and printed them with a formatted string.</p>
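<p>The string handling here is easy to try in isolation. As a small sketch with plain strings standing in for member names, note that <code>.join()</code> only places the separator between items, which is why the first bullet has to be written into the format string itself. <code>format_members()</code> is a hypothetical helper:</p>

```python
def format_members(names):
    """Build the same bulleted listing that the example prints."""
    # join() puts the separator between names, not before the first one.
    members = '\n - '.join(names)
    return f'Guild Members:\n - {members}'


if __name__ == "__main__":
    print(format_members(["aronq2", "RealPythonTutorialBot"]))
```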
<p>When you run the program, you should see at least the name of the account you created the guild with and the name of the bot user itself:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python bot.py
<span class="go">RealPythonTutorialBot#9643 is connected to the following guild:</span>
<span class="go">RealPythonTutorialServer(id: 571759877328732195)</span>
<span class="go">Guild Members:</span>
<span class="go"> - aronq2</span>
<span class="go"> - RealPythonTutorialBot</span>
</pre></div>
<p>These examples barely scratch the surface of the APIs available on Discord. Be sure to check out the <a href="https://discordpy.readthedocs.io/en/latest/api.html#">documentation</a> to see everything they have to offer.</p>
<p>Next, you’ll learn about some utility functions and how they can simplify these examples.</p>
<h2 id="using-utility-functions">Using Utility Functions</h2>
<p>Let’s take another look at the example from the last section where you printed the name and identifier of the bot’s guild:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">TOKEN</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">GUILD</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_GUILD'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">():</span>
    <span class="k">for</span> <span class="n">guild</span> <span class="ow">in</span> <span class="n">client</span><span class="o">.</span><span class="n">guilds</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">guild</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="n">GUILD</span><span class="p">:</span>
            <span class="k">break</span>

    <span class="nb">print</span><span class="p">(</span>
        <span class="n">f</span><span class="s1">'</span><span class="si">{client.user}</span><span class="s1"> is connected to the following guild:</span><span class="se">\n</span><span class="s1">'</span>
        <span class="n">f</span><span class="s1">'</span><span class="si">{guild.name}</span><span class="s1">(id: </span><span class="si">{guild.id}</span><span class="s1">)'</span>
    <span class="p">)</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">TOKEN</span><span class="p">)</span>
</pre></div>
<p>You could clean up this code by using some of the utility functions available in <code>discord.py</code>.</p>
<p><a href="https://discordpy.readthedocs.io/en/latest/api.html#discord.utils.find"><code>discord.utils.find()</code></a> is one utility that can improve the simplicity and readability of this code by replacing the <code>for</code> loop with an intuitive, abstracted function:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">TOKEN</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">GUILD</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_GUILD'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">():</span>
    <span class="n">guild</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="k">lambda</span> <span class="n">g</span><span class="p">:</span> <span class="n">g</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="n">GUILD</span><span class="p">,</span> <span class="n">client</span><span class="o">.</span><span class="n">guilds</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span>
        <span class="n">f</span><span class="s1">'</span><span class="si">{client.user}</span><span class="s1"> is connected to the following guild:</span><span class="se">\n</span><span class="s1">'</span>
        <span class="n">f</span><span class="s1">'</span><span class="si">{guild.name}</span><span class="s1">(id: </span><span class="si">{guild.id}</span><span class="s1">)'</span>
    <span class="p">)</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">TOKEN</span><span class="p">)</span>
</pre></div>
<p><code>find()</code> takes a function, called a <strong>predicate</strong>, which identifies some characteristic of the element in the iterable that you’re looking for. Here, you used a particular type of anonymous function, called a <a href="https://realpython.com/python-lambda/">lambda</a>, as the predicate.</p>
<p>In this case, you’re trying to find the guild with the same name as the one you stored in the <code>DISCORD_GUILD</code> environment variable. Once <code>find()</code> locates an element in the iterable that satisfies the predicate, it will return the element. This is essentially equivalent to the <code>break</code> statement in the previous example, but cleaner.</p>
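<p>To see the predicate idea without a live connection, here is a pure-Python sketch that behaves like the description above: it returns the first element that satisfies the predicate, or <code>None</code> when nothing does. It is an illustration, not the library’s source code, and the <code>SimpleNamespace</code> objects are stand-ins for guilds:</p>

```python
from types import SimpleNamespace


def find(predicate, iterable):
    """Return the first element for which predicate is true, else None."""
    for element in iterable:
        if predicate(element):
            return element
    return None


if __name__ == "__main__":
    guilds = [
        SimpleNamespace(name="Other Server"),
        SimpleNamespace(name="RealPythonTutorialServer"),
    ]
    match = find(lambda g: g.name == "RealPythonTutorialServer", guilds)
    print(match.name)
```

<p>Because a failed search yields <code>None</code>, it’s worth checking the result before accessing attributes such as <code>.name</code>.</p>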
<p><code>discord.py</code> has even abstracted this concept one step further with the <a href="https://discordpy.readthedocs.io/en/latest/api.html#discord.utils.get"><code>get()</code> utility</a>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">TOKEN</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">GUILD</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_GUILD'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">():</span>
    <span class="n">guild</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">client</span><span class="o">.</span><span class="n">guilds</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="n">GUILD</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span>
        <span class="n">f</span><span class="s1">'</span><span class="si">{client.user}</span><span class="s1"> is connected to the following guild:</span><span class="se">\n</span><span class="s1">'</span>
        <span class="n">f</span><span class="s1">'</span><span class="si">{guild.name}</span><span class="s1">(id: </span><span class="si">{guild.id}</span><span class="s1">)'</span>
    <span class="p">)</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">TOKEN</span><span class="p">)</span>
</pre></div>
<p><code>get()</code> takes the iterable and some keyword arguments. The keyword arguments represent attributes of the elements in the iterable that must all be satisfied for <code>get()</code> to return the element.</p>
<p>In this example, you’ve identified <code>name=GUILD</code> as the attribute that must be satisfied.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Technical Detail:</strong> Under the hood, <code>get()</code> actually uses the <code>attrs</code> keyword arguments to build a predicate, which it then uses to call <code>find()</code>.</p>
</div>
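<p>The relationship between the two utilities can be sketched in a few lines. This simplified version turns the keyword arguments into a predicate and hands it to a local <code>find()</code>. It is an illustration under that assumption only, and the real utility supports more than this sketch does:</p>

```python
from types import SimpleNamespace


def find(predicate, iterable):
    """Return the first element for which predicate is true, else None."""
    for element in iterable:
        if predicate(element):
            return element
    return None


def get(iterable, **attrs):
    """Return the first element whose attributes all equal the given values."""
    def predicate(element):
        # Every keyword argument must match the attribute of the same name.
        return all(
            hasattr(element, name) and getattr(element, name) == value
            for name, value in attrs.items()
        )

    return find(predicate, iterable)


if __name__ == "__main__":
    guilds = [
        SimpleNamespace(name="Other Server", id=1),
        SimpleNamespace(name="RealPythonTutorialServer", id=2),
    ]
    print(get(guilds, name="RealPythonTutorialServer").id)
```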
<p>Now that you’ve learned the basics of interacting with APIs, you’ll dive a little deeper into the function that you’ve been using to access them: <code>on_ready()</code>.</p>
<h2 id="responding-to-events">Responding to Events</h2>
<p>You already learned that <code>on_ready()</code> is an event handler. In fact, you might have noticed that it is identified as such in the code by the <code>client.event</code> <a href="https://realpython.com/primer-on-python-decorators/">decorator</a>.</p>
<p>But what is an event?</p>
<p>An <strong>event</strong> is something that happens on Discord that you can use to trigger a reaction in your code. Your code will listen for and then respond to events.</p>
<p>Using the example you’ve seen already, the <code>on_ready()</code> event handler handles the event that the <code>Client</code> has made a connection to Discord and prepared its response data.</p>
<p>So, when Discord fires an event, <code>discord.py</code> will route the event data to the corresponding event handler on your connected <code>Client</code>.</p>
<p>There are two ways in <code>discord.py</code> to implement an event handler:</p>
<ol>
<li>Using the <code>client.event</code> decorator</li>
<li>Creating a subclass of <code>Client</code> and overriding its handler methods</li>
</ol>
<p>You already saw the implementation using the decorator. Next, take a look at how to subclass <code>Client</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">token</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomClient</span><span class="p">(</span><span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">):</span>
    <span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{self.user}</span><span class="s1"> has connected to Discord!'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">CustomClient</span><span class="p">()</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
</pre></div>
<p>Here, just like before, you’ve created a <code>client</code> variable and called <code>.run()</code> with your Discord token. The actual <code>Client</code> is different, however. Instead of being an instance of the normal base class, <code>client</code> is an instance of <code>CustomClient</code>, which overrides the <code>on_ready()</code> method.</p>
<p>There is no functional difference between the two styles of implementing event handlers, but this tutorial will primarily use the decorator version because it looks similar to how you implement <code>Bot</code> commands, which is a topic you’ll cover in a bit.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Technical Detail:</strong> Regardless of how you implement your event handler, one thing must be consistent: all event handlers in <code>discord.py</code> must be <a href="https://realpython.com/async-io-python/#the-asyncawait-syntax-and-native-coroutines">coroutines</a>.</p>
</div>
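<p>If you want to see what that requirement means in practice, the following sketch checks whether a function was defined with <code>async def</code>. The <code>require_coroutine()</code> helper is a hypothetical name introduced here for illustration, and the check that <code>discord.py</code> performs internally may differ in its details:</p>

```python
import inspect


def require_coroutine(func):
    """Hypothetical guard: reject handlers not defined with 'async def'."""
    if not inspect.iscoroutinefunction(func):
        raise TypeError(f'{func.__name__}() must be defined with async def')
    return func


async def on_ready():
    print('connected')


def on_message(message):  # Missing 'async', so this is a plain function
    print(message)


require_coroutine(on_ready)  # Accepted

try:
    require_coroutine(on_message)
except TypeError as error:
    print(error)
```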
<p>Now that you’ve learned how to create an event handler, let’s walk through some different examples of handlers you can create.</p>
<h3 id="welcoming-new-members">Welcoming New Members</h3>
<p>Previously, you saw the example of responding to the event where a member joins a guild. In that example, your bot user could send them a message, welcoming them to your Discord community.</p>
<p>Now, you’ll implement that behavior in your <code>Client</code>, using event handlers, and verify its behavior in Discord:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">token</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">():</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{client.user.name}</span><span class="s1"> has connected to Discord!'</span><span class="p">)</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_member_join</span><span class="p">(</span><span class="n">member</span><span class="p">):</span>
    <span class="k">await</span> <span class="n">member</span><span class="o">.</span><span class="n">create_dm</span><span class="p">()</span>
    <span class="k">await</span> <span class="n">member</span><span class="o">.</span><span class="n">dm_channel</span><span class="o">.</span><span class="n">send</span><span class="p">(</span>
        <span class="n">f</span><span class="s1">'Hi </span><span class="si">{member.name}</span><span class="s1">, welcome to my Discord server!'</span>
    <span class="p">)</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
</pre></div>
<p>Like before, you handled the <code>on_ready()</code> event by printing the bot user’s name in a formatted string. New, however, is the implementation of the <code>on_member_join()</code> event handler.</p>
<p><code>on_member_join()</code>, as its name suggests, handles the event of a new member joining a guild.</p>
<p>In this example, you used <code>member.create_dm()</code> to create a direct message channel. Then, you used that channel to <code>.send()</code> a direct message to that new member.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Technical Detail:</strong> Notice the <code>await</code> keyword before <code>member.create_dm()</code> and <code>member.dm_channel.send()</code>.</p>
<p><code>await</code> suspends the execution of the surrounding coroutine until the awaited coroutine has finished.</p>
</div>
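<p>The ordering that <code>await</code> guarantees can be observed without connecting to Discord at all. In the sketch below, <code>create_dm()</code> and <code>send()</code> are hypothetical stand-ins for the real <code>member.create_dm()</code> and <code>member.dm_channel.send()</code> calls, and <code>asyncio.sleep()</code> simulates waiting on the network:</p>

```python
import asyncio

steps = []


async def create_dm():
    # Hypothetical stand-in for member.create_dm()
    steps.append('create_dm started')
    await asyncio.sleep(0.01)  # Simulate waiting on the network
    steps.append('create_dm finished')


async def send(text):
    # Hypothetical stand-in for member.dm_channel.send()
    steps.append(f'sent: {text}')


async def on_member_join():
    await create_dm()      # Suspends here until create_dm() is done
    await send('welcome')  # Runs only after the DM channel exists


asyncio.run(on_member_join())
print(steps)
```

<p>The message is recorded only after <code>create_dm()</code> has finished, even though that coroutine paused partway through.</p>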
<p>Now, let’s test out your bot’s new behavior.</p>
<p>First, run your new version of <code>bot.py</code> and wait for the <code>on_ready()</code> event to fire, logging your message to <code>stdout</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python bot.py
<span class="go">RealPythonTutorialBot has connected to Discord!</span>
</pre></div>
<p>Now, head over to <a href="https://discordapp.com/">Discord</a>, log in, and navigate to your guild by selecting it from the left-hand side of the screen:</p>
<p><a href="https://files.realpython.com/media/discord-bot-navigate-to-server.dfef0364630f.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-navigate-to-server.dfef0364630f.png" width="3024" height="1767" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-navigate-to-server.dfef0364630f.png&w=756&sig=e385f92a8203695f50bbd2d55922fc8ae3b4599e 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-navigate-to-server.dfef0364630f.png&w=1512&sig=7b1fd07e2bfbee2e23b36aa922a614d938b29698 1512w, https://files.realpython.com/media/discord-bot-navigate-to-server.dfef0364630f.png 3024w" sizes="75vw" alt="Discord: Navigate to Server"/></a></p>
<p>Select <em>Invite People</em> just beside the guild list where you selected your guild. Check the box that says <em>Set this link to never expire</em> and copy the link:</p>
<p><a href="https://files.realpython.com/media/discord-bot-copy-invite.0dd6b229c819.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-copy-invite.0dd6b229c819.png" width="3024" height="1766" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-copy-invite.0dd6b229c819.png&w=756&sig=81a88bffad8cbfa324a0eef38a1b6ff7ccfc67a5 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-copy-invite.0dd6b229c819.png&w=1512&sig=3c9f1911a3ebeb646c31e570e6a9ca8cff23f8cd 1512w, https://files.realpython.com/media/discord-bot-copy-invite.0dd6b229c819.png 3024w" sizes="75vw" alt="Discord: Copy Invite Link"/></a></p>
<p>Now, with the invite link copied, create a new account and join the guild using your invite link:</p>
<p><a href="https://files.realpython.com/media/discord-bot-accept-invite.4b33a1ba7062.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-accept-invite.4b33a1ba7062.png" width="3021" height="1767" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-accept-invite.4b33a1ba7062.png&w=755&sig=3b1a1db4101d859112e0a1d224d2aa8cc7606715 755w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-accept-invite.4b33a1ba7062.png&w=1510&sig=c5efa82e1d5940ff89422813371ddc83054cdbce 1510w, https://files.realpython.com/media/discord-bot-accept-invite.4b33a1ba7062.png 3021w" sizes="75vw" alt="Discord: Accept Invite"/></a></p>
<p>First, you’ll see that Discord introduced you to the guild by default with an automated message. More importantly though, notice the badge on the left-hand side of the screen that notifies you of a new message:</p>
<p><a href="https://files.realpython.com/media/discord-bot-direct-message-notification.95e423f72678.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-direct-message-notification.95e423f72678.png" width="3022" height="1768" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-direct-message-notification.95e423f72678.png&w=755&sig=84f08a7484e333f40081aa4071dafc4d331ca7f2 755w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-direct-message-notification.95e423f72678.png&w=1511&sig=51ed613c1cbe18a474ca12e6b648cb18d9a0505a 1511w, https://files.realpython.com/media/discord-bot-direct-message-notification.95e423f72678.png 3022w" sizes="75vw" alt="Discord: Direct Message Notification"/></a></p>
<p>When you select it, you’ll see a private message from your bot user:</p>
<p><a href="https://files.realpython.com/media/discord-bot-direct-message.7f49832b7bb7.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-direct-message.7f49832b7bb7.png" width="3024" height="1769" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-direct-message.7f49832b7bb7.png&w=756&sig=70663b008b61bdb83ce3f2f3b6cc4d4abda8a914 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-direct-message.7f49832b7bb7.png&w=1512&sig=5874eab3fb67f6bd0ba0b3a4ed5c6d401e23f8c8 1512w, https://files.realpython.com/media/discord-bot-direct-message.7f49832b7bb7.png 3024w" sizes="75vw" alt="Discord: Direct Message"/></a></p>
<p>Perfect! Your bot user is now interacting with other users with minimal code.</p>
<p>Next, you’ll learn how to respond to specific user messages in the chat.</p>
<h3 id="responding-to-messages">Responding to Messages</h3>
<p>Let’s add on to the previous functionality of your bot by handling the <code>on_message()</code> event.</p>
<p>The <code>on_message()</code> event fires when a message is posted in a channel that your bot has access to. In this example, you’ll respond to the message <code>'99!'</code> with a one-liner from the television show <a href="https://www.nbc.com/brooklyn-nine-nine">Brooklyn Nine-Nine</a>:</p>
<div class="highlight python"><pre><span></span><span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_message</span><span class="p">(</span><span class="n">message</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">message</span><span class="o">.</span><span class="n">author</span> <span class="o">==</span> <span class="n">client</span><span class="o">.</span><span class="n">user</span><span class="p">:</span>
        <span class="k">return</span>
    <span class="n">brooklyn_99_quotes</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'I</span><span class="se">\'</span><span class="s1">m the human form of the 💯 emoji.'</span><span class="p">,</span>
        <span class="s1">'Bingpot!'</span><span class="p">,</span>
        <span class="p">(</span>
            <span class="s1">'Cool. Cool cool cool cool cool cool cool, '</span>
            <span class="s1">'no doubt no doubt no doubt no doubt.'</span>
        <span class="p">),</span>
    <span class="p">]</span>
    <span class="k">if</span> <span class="n">message</span><span class="o">.</span><span class="n">content</span> <span class="o">==</span> <span class="s1">'99!'</span><span class="p">:</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">brooklyn_99_quotes</span><span class="p">)</span>
        <span class="k">await</span> <span class="n">message</span><span class="o">.</span><span class="n">channel</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
</pre></div>
<p>The bulk of this event handler looks at the <code>message.content</code>, checks to see if it’s equal to <code>'99!'</code>, and responds by sending a random quote to the message’s channel if it is.</p>
<p>The other piece is an important one:</p>
<div class="highlight python"><pre><span></span><span class="k">if</span> <span class="n">message</span><span class="o">.</span><span class="n">author</span> <span class="o">==</span> <span class="n">client</span><span class="o">.</span><span class="n">user</span><span class="p">:</span>
    <span class="k">return</span>
</pre></div>
<p>Because a <code>Client</code> can’t tell the difference between a bot user and a normal user account, your <code>on_message()</code> handler should protect against a potentially recursive case where the bot sends a message that it might, itself, handle.</p>
<p>To illustrate, let’s say you want your bot to listen for users telling each other <code>'Happy Birthday'</code>. You could implement your <code>on_message()</code> handler like this:</p>
<div class="highlight python"><pre><span></span><span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_message</span><span class="p">(</span><span class="n">message</span><span class="p">):</span>
    <span class="k">if</span> <span class="s1">'happy birthday'</span> <span class="ow">in</span> <span class="n">message</span><span class="o">.</span><span class="n">content</span><span class="o">.</span><span class="n">lower</span><span class="p">():</span>
        <span class="k">await</span> <span class="n">message</span><span class="o">.</span><span class="n">channel</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="s1">'Happy Birthday! 🎈🎉'</span><span class="p">)</span>
</pre></div>
<p>Aside from the potentially spammy nature of this event handler, it also has a devastating side effect. The bot’s reply contains the very phrase that the handler listens for, so the bot ends up responding to its own message!</p>
<p>So, if one person in the channel tells another “Happy Birthday,” then the bot will also chime in… again… and again… and again:</p>
<p><a href="https://files.realpython.com/media/discord-bot-happy-birthday-repetition.864acfe23979.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-happy-birthday-repetition.864acfe23979.png" width="3028" height="1769" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-happy-birthday-repetition.864acfe23979.png&w=757&sig=1e56e02abfd0366024a7df56f2111fbb023f42be 757w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-happy-birthday-repetition.864acfe23979.png&w=1514&sig=2f67cb61941a012ff7d260910406a5c427c8f4c4 1514w, https://files.realpython.com/media/discord-bot-happy-birthday-repetition.864acfe23979.png 3028w" sizes="75vw" alt="Discord: Happy Birthday Message Repetition"/></a></p>
<p>That’s why it’s important to compare the <code>message.author</code> to the <code>client.user</code> (your bot user), and ignore any of its own messages.</p>
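<p>You can reproduce the runaway loop, and the effect of the guard, without touching Discord. The sketch below is a toy simulation under stated assumptions: <code>BOT_USER</code>, <code>simulate()</code>, and both reply functions are hypothetical names, messages are plain objects, and a safety limit stops the unguarded version from looping forever:</p>

```python
from types import SimpleNamespace

BOT_USER = 'bot'  # Hypothetical stand-in for client.user


def reply_unguarded(message):
    # Responds to any message containing the trigger phrase.
    if 'happy birthday' in message.content.lower():
        return 'Happy Birthday!'
    return None


def reply_guarded(message):
    if message.author == BOT_USER:
        return None  # Ignore the bot's own messages
    return reply_unguarded(message)


def simulate(handler, limit=5):
    """Post one user message, then feed each bot reply back to the handler."""
    message = SimpleNamespace(author='alice', content='Happy Birthday, Bob!')
    replies = 0
    while message is not None and replies != limit:
        text = handler(message)
        if text is None:
            message = None
        else:
            replies += 1
            message = SimpleNamespace(author=BOT_USER, content=text)
    return replies


print(simulate(reply_unguarded))  # Keeps replying until the safety limit
print(simulate(reply_guarded))    # Replies once, then stops
```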
<p>So, let’s fix <code>bot.py</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">token</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">():</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{client.user.name}</span><span class="s1"> has connected to Discord!'</span><span class="p">)</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_member_join</span><span class="p">(</span><span class="n">member</span><span class="p">):</span>
    <span class="k">await</span> <span class="n">member</span><span class="o">.</span><span class="n">create_dm</span><span class="p">()</span>
    <span class="k">await</span> <span class="n">member</span><span class="o">.</span><span class="n">dm_channel</span><span class="o">.</span><span class="n">send</span><span class="p">(</span>
        <span class="n">f</span><span class="s1">'Hi </span><span class="si">{member.name}</span><span class="s1">, welcome to my Discord server!'</span>
    <span class="p">)</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_message</span><span class="p">(</span><span class="n">message</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">message</span><span class="o">.</span><span class="n">author</span> <span class="o">==</span> <span class="n">client</span><span class="o">.</span><span class="n">user</span><span class="p">:</span>
        <span class="k">return</span>
    <span class="n">brooklyn_99_quotes</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'I</span><span class="se">\'</span><span class="s1">m the human form of the 💯 emoji.'</span><span class="p">,</span>
        <span class="s1">'Bingpot!'</span><span class="p">,</span>
        <span class="p">(</span>
            <span class="s1">'Cool. Cool cool cool cool cool cool cool, '</span>
            <span class="s1">'no doubt no doubt no doubt no doubt.'</span>
        <span class="p">),</span>
    <span class="p">]</span>
    <span class="k">if</span> <span class="n">message</span><span class="o">.</span><span class="n">content</span> <span class="o">==</span> <span class="s1">'99!'</span><span class="p">:</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">brooklyn_99_quotes</span><span class="p">)</span>
        <span class="k">await</span> <span class="n">message</span><span class="o">.</span><span class="n">channel</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
</pre></div>
<p>Don’t forget to <code>import random</code> at the top of the module, since the <code>on_message()</code> handler utilizes <code>random.choice()</code>.</p>
<p>Run the program:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python bot.py
<span class="go">RealPythonTutorialBot has connected to Discord!</span>
</pre></div>
<p>Finally, head over to Discord to test it out:</p>
<p><a href="https://files.realpython.com/media/discord-bot-brooklyn-99-quotes.e934592e025e.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-brooklyn-99-quotes.e934592e025e.png" width="3027" height="1767" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-brooklyn-99-quotes.e934592e025e.png&w=756&sig=4f9a47703207d0920393a69f3411bd7d5914521b 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-brooklyn-99-quotes.e934592e025e.png&w=1513&sig=d78013102c8901874e9632b66e4a8d1b26f7b7fc 1513w, https://files.realpython.com/media/discord-bot-brooklyn-99-quotes.e934592e025e.png 3027w" sizes="75vw" alt="Discord: Quotes From Brooklyn Nine-Nine"/></a></p>
<p>Great! Now that you’ve seen a few different ways to handle some common Discord events, you’ll learn how to deal with errors that event handlers may raise.</p>
<h3 id="handling-exceptions">Handling Exceptions</h3>
<p>As you’ve seen already, <code>discord.py</code> is an event-driven system. This focus on events extends even to exceptions. When an event handler <a href="https://realpython.com/python-exceptions/">raises an <code>Exception</code></a>, <code>discord.py</code> calls <code>on_error()</code>.</p>
<p>The default behavior of <code>on_error()</code> is to write the error message and stack trace to <code>stderr</code>. To test this, add a special message handler to <code>on_message()</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">token</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">():</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{client.user.name}</span><span class="s1"> has connected to Discord!'</span><span class="p">)</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_member_join</span><span class="p">(</span><span class="n">member</span><span class="p">):</span>
    <span class="k">await</span> <span class="n">member</span><span class="o">.</span><span class="n">create_dm</span><span class="p">()</span>
    <span class="k">await</span> <span class="n">member</span><span class="o">.</span><span class="n">dm_channel</span><span class="o">.</span><span class="n">send</span><span class="p">(</span>
        <span class="n">f</span><span class="s1">'Hi </span><span class="si">{member.name}</span><span class="s1">, welcome to my Discord server!'</span>
    <span class="p">)</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_message</span><span class="p">(</span><span class="n">message</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">message</span><span class="o">.</span><span class="n">author</span> <span class="o">==</span> <span class="n">client</span><span class="o">.</span><span class="n">user</span><span class="p">:</span>
        <span class="k">return</span>
    <span class="n">brooklyn_99_quotes</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'I</span><span class="se">\'</span><span class="s1">m the human form of the 💯 emoji.'</span><span class="p">,</span>
        <span class="s1">'Bingpot!'</span><span class="p">,</span>
        <span class="p">(</span>
            <span class="s1">'Cool. Cool cool cool cool cool cool cool, '</span>
            <span class="s1">'no doubt no doubt no doubt no doubt.'</span>
        <span class="p">),</span>
    <span class="p">]</span>
    <span class="k">if</span> <span class="n">message</span><span class="o">.</span><span class="n">content</span> <span class="o">==</span> <span class="s1">'99!'</span><span class="p">:</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">brooklyn_99_quotes</span><span class="p">)</span>
        <span class="k">await</span> <span class="n">message</span><span class="o">.</span><span class="n">channel</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
<span class="hll">    <span class="k">elif</span> <span class="n">message</span><span class="o">.</span><span class="n">content</span> <span class="o">==</span> <span class="s1">'raise-exception'</span><span class="p">:</span>
</span><span class="hll">        <span class="k">raise</span> <span class="n">discord</span><span class="o">.</span><span class="n">DiscordException</span>
</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
</pre></div>
<p>The new <code>raise-exception</code> message handler allows you to raise a <code>DiscordException</code> on command.</p>
<p>Run the program and type <code>raise-exception</code> into the Discord channel:</p>
<p><a href="https://files.realpython.com/media/discord-bot-raise-exception.7fcae85fb06e.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-raise-exception.7fcae85fb06e.png" width="3026" height="1765" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-raise-exception.7fcae85fb06e.png&w=756&sig=5d0026bbd91a3b8bb1bd561f0779dde3150d588a 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-raise-exception.7fcae85fb06e.png&w=1513&sig=7b226812216c03ef139c4c33823dd6318dc34ec5 1513w, https://files.realpython.com/media/discord-bot-raise-exception.7fcae85fb06e.png 3026w" sizes="75vw" alt="Discord: Raise Exception Message"/></a></p>
<p>You should now see the <code>Exception</code> that was raised by your <code>on_message()</code> handler in the console:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python bot.py
<span class="go">RealPythonTutorialBot has connected to Discord!</span>
<span class="go">Ignoring exception in on_message</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go"> File "/Users/alex.ronquillo/.pyenv/versions/discord-venv/lib/python3.7/site-packages/discord/client.py", line 255, in _run_event</span>
<span class="go"> await coro(*args, **kwargs)</span>
<span class="go"> File "bot.py", line 42, in on_message</span>
<span class="go"> raise discord.DiscordException</span>
<span class="go">discord.errors.DiscordException</span>
</pre></div>
<p>The exception was caught by the default error handler, so the output contains the message <code>Ignoring exception in on_message</code>. Let’s fix that by handling that particular error. To do so, you’ll catch the <code>DiscordException</code> and <a href="https://realpython.com/working-with-files-in-python/">write it to a file</a> instead.</p>
<p>The <code>on_error()</code> event handler takes the <code>event</code> as the first argument. In this case, we expect the <code>event</code> to be <code>'on_message'</code>. It also accepts <code>*args</code> and <code>**kwargs</code>, which carry the positional and keyword arguments that were passed to the original event handler.</p>
<p>So, since <code>on_message()</code> takes a single argument, <code>message</code>, we expect <code>args[0]</code> to be the <code>message</code> that the user sent in the Discord channel:</p>
<div class="highlight python"><pre><span></span><span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_error</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'err.log'</span><span class="p">,</span> <span class="s1">'a'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">event</span> <span class="o">==</span> <span class="s1">'on_message'</span><span class="p">:</span>
            <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">f</span><span class="s1">'Unhandled message: </span><span class="si">{args[0]}</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">raise</span>
</pre></div>
<p>If the <code>Exception</code> originated in the <code>on_message()</code> event handler, you <code>.write()</code> a formatted string to the file <code>err.log</code>. If another event raises an <code>Exception</code>, then the bare <code>raise</code> statement re-raises it to invoke the default behavior.</p>
<p>Run <code>bot.py</code> and send the <code>raise-exception</code> message again to view the output in <code>err.log</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> cat err.log
<span class="go">Unhandled message: <Message id=573845548923224084 pinned=False author=<Member id=543612676807327754 name='alexronquillo' discriminator='0933' bot=False nick=None guild=<Guild id=571759877328732195 name='RealPythonTutorialServer' chunked=True>>></span>
</pre></div>
<p>Instead of only a stack trace, you have a more informative error, showing the <code>message</code> that caused <code>on_message()</code> to raise the <code>DiscordException</code>, saved to a file for longer persistence.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Technical Detail:</strong> If you want to take the actual <code>Exception</code> into account when you’re writing your error messages to <code>err.log</code>, then you can use functions from <code>sys</code>, such as <a href="https://docs.python.org/library/sys.html#sys.exc_info"><code>exc_info()</code></a>.</p>
</div>
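<p>As a minimal sketch of that idea, the helper below builds a log line from <code>sys.exc_info()</code>. The name <code>describe_current_exception()</code> is hypothetical and not part of <code>discord.py</code>. The assumption is that you would call it from inside your error handler, while the exception is still being handled, and write the result to <code>err.log</code>:</p>

```python
import sys
import traceback


def describe_current_exception(event):
    """Return a one-line summary of the exception currently being handled."""
    # sys.exc_info() returns (type, value, traceback) for the active exception,
    # or (None, None, None) when no exception is being handled.
    exc_type, exc_value, exc_tb = sys.exc_info()
    if exc_type is None:
        return f'{event}: no active exception\n'
    # The last entry of the traceback is the frame that raised the exception.
    last_frame = traceback.extract_tb(exc_tb)[-1]
    return (
        f'{event}: {exc_type.__name__}: {exc_value} '
        f'(line {last_frame.lineno} in {last_frame.name})\n'
    )


try:
    raise ValueError('bad input')
except ValueError:
    line = describe_current_exception('on_message')

print(line, end='')
```

<p>The printed line starts with <code>on_message: ValueError: bad input</code>, followed by the location where the exception was raised.</p>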
<p>Now that you have some experience handling different events and interacting with Discord APIs, you’ll learn about a subclass of <code>Client</code> called <code>Bot</code>, which implements some handy, bot-specific functionality.</p>
<h2 id="connecting-a-bot">Connecting a Bot</h2>
<p>A <code>Bot</code> is a subclass of <code>Client</code> that adds a little bit of extra functionality that is useful when you’re creating bot users. For example, a <code>Bot</code> can handle events and commands, invoke validation checks, and more.</p>
<p>Before you get into the features specific to <code>Bot</code>, convert <code>bot.py</code> to use a <code>Bot</code> instead of a <code>Client</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="c1"># 1</span>
<span class="kn">from</span> <span class="nn">discord.ext</span> <span class="k">import</span> <span class="n">commands</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">token</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="c1"># 2</span>
<span class="n">bot</span> <span class="o">=</span> <span class="n">commands</span><span class="o">.</span><span class="n">Bot</span><span class="p">(</span><span class="n">command_prefix</span><span class="o">=</span><span class="s1">'!'</span><span class="p">)</span>
<span class="nd">@bot</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_ready</span><span class="p">():</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{bot.user.name}</span><span class="s1"> has connected to Discord!'</span><span class="p">)</span>
<span class="n">bot</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
</pre></div>
<p>As you can see, <code>Bot</code> can handle events the same way that <code>Client</code> does. However, notice the differences between <code>Client</code> and <code>Bot</code>:</p>
<ol>
<li><code>Bot</code> is imported from the <code>discord.ext.commands</code> module.</li>
<li>The <code>Bot</code> initializer requires a <code>command_prefix</code>, which you’ll learn more about in the next section.</li>
</ol>
<p>The extensions library, <code>ext</code>, offers several interesting components to help you create a Discord <code>Bot</code>. One such component is the <a href="https://discordpy.readthedocs.io/en/latest/ext/commands/commands.html"><code>Command</code></a>.</p>
<h3 id="using-bot-commands">Using <code>Bot</code> Commands</h3>
<p>In general terms, a <strong>command</strong> is an order that a user gives to a bot so that it will do something. Commands are different from events because they are:</p>
<ul>
<li>Arbitrarily defined</li>
<li>Directly called by the user</li>
<li>Flexible, in terms of their interface</li>
</ul>
<p>In technical terms, a <strong><code>Command</code></strong> is an object that wraps a function that is invoked by a text command in Discord. The text command must start with the <code>command_prefix</code>, defined by the <code>Bot</code> object.</p>
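<p>To make the role of the prefix concrete, here is a rough sketch of how a message could be split into a command name and its arguments. This is a simplified illustration, not how <code>discord.py</code> actually parses messages, and <code>parse_command()</code> and the <code>!greet</code> command are hypothetical:</p>

```python
def parse_command(content, command_prefix='!'):
    """Split a chat message into (command_name, arguments), or return None.

    This is a simplified illustration, not discord.py's actual parser.
    """
    if not content.startswith(command_prefix):
        return None  # An ordinary message, not a command
    parts = content[len(command_prefix):].split()
    if not parts:
        return None  # Only the prefix was sent
    name, *arguments = parts
    return name, arguments


print(parse_command('!99'))            # ('99', [])
print(parse_command('!greet Ann 42'))  # ('greet', ['Ann', '42'])
print(parse_command('99!'))            # None
```

<p>Note that in this sketch every argument is still a string, even when it looks like a number.</p>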
<p>Let’s take a look at an old event to better understand what this looks like:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">TOKEN</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
<span class="nd">@client</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_message</span><span class="p">(</span><span class="n">message</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">message</span><span class="o">.</span><span class="n">author</span> <span class="o">==</span> <span class="n">client</span><span class="o">.</span><span class="n">user</span><span class="p">:</span>
        <span class="k">return</span>
    <span class="n">brooklyn_99_quotes</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'I</span><span class="se">\'</span><span class="s1">m the human form of the 💯 emoji.'</span><span class="p">,</span>
        <span class="s1">'Bingpot!'</span><span class="p">,</span>
        <span class="p">(</span>
            <span class="s1">'Cool. Cool cool cool cool cool cool cool, '</span>
            <span class="s1">'no doubt no doubt no doubt no doubt.'</span>
        <span class="p">),</span>
    <span class="p">]</span>
    <span class="k">if</span> <span class="n">message</span><span class="o">.</span><span class="n">content</span> <span class="o">==</span> <span class="s1">'99!'</span><span class="p">:</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">brooklyn_99_quotes</span><span class="p">)</span>
        <span class="k">await</span> <span class="n">message</span><span class="o">.</span><span class="n">channel</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
<span class="n">client</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">TOKEN</span><span class="p">)</span>
</pre></div>
<p>Here, you created an <code>on_message()</code> event handler, which receives the <code>message</code> object and compares its <code>.content</code> to a pre-defined option: <code>'99!'</code>.</p>
<p>Using a <code>Command</code>, you can convert this example to be more specific:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">from</span> <span class="nn">discord.ext</span> <span class="k">import</span> <span class="n">commands</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">token</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">bot</span> <span class="o">=</span> <span class="n">commands</span><span class="o">.</span><span class="n">Bot</span><span class="p">(</span><span class="n">command_prefix</span><span class="o">=</span><span class="s1">'!'</span><span class="p">)</span>
<span class="nd">@bot</span><span class="o">.</span><span class="n">command</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'99'</span><span class="p">)</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">nine_nine</span><span class="p">(</span><span class="n">ctx</span><span class="p">):</span>
    <span class="n">brooklyn_99_quotes</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'I</span><span class="se">\'</span><span class="s1">m the human form of the 💯 emoji.'</span><span class="p">,</span>
        <span class="s1">'Bingpot!'</span><span class="p">,</span>
        <span class="p">(</span>
            <span class="s1">'Cool. Cool cool cool cool cool cool cool, '</span>
            <span class="s1">'no doubt no doubt no doubt no doubt.'</span>
        <span class="p">),</span>
    <span class="p">]</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">brooklyn_99_quotes</span><span class="p">)</span>
    <span class="k">await</span> <span class="n">ctx</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
<span class="n">bot</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
</pre></div>
<p>There are several important characteristics to understand about using <code>Command</code>:</p>
<ol>
<li>
<p>Instead of using <code>bot.event</code> like before, you use <code>bot.command()</code>, passing the invocation command (<code>name</code>) as its argument.</p>
</li>
<li>
<p>The function will now only be called when a user sends the <code>!99</code> command in chat. This is different from the <code>on_message()</code> event, which was executed any time a user sent a message, regardless of the content.</p>
</li>
<li>
<p>The command must be prefixed with the exclamation point (<code>!</code>) because that’s the <code>command_prefix</code> that you defined in the initializer for your <code>Bot</code>.</p>
</li>
<li>
<p>Any <code>Command</code> function (technically called a <code>callback</code>) must accept at least one parameter, called <code>ctx</code>, which is the <a href="https://discordpy.readthedocs.io/en/latest/ext/commands/commands.html#invocation-context"><code>Context</code></a> surrounding the invoked <code>Command</code>.</p>
</li>
</ol>
<p>A <code>Context</code> holds data such as the channel and guild that the user called the <code>Command</code> from.</p>
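<p>As a hedged illustration, the callback below reads a few attributes from its <code>ctx</code> argument. The <code>where()</code> command is hypothetical, and the <code>SimpleNamespace</code> objects are stand-ins so that the sketch runs without connecting to Discord:</p>

```python
import asyncio
from types import SimpleNamespace


async def where(ctx):
    """Report where the command was invoked.

    In bot.py, you would decorate this with @bot.command(name='where').
    """
    await ctx.send(
        f'{ctx.author.name} called this from #{ctx.channel.name} '
        f'in {ctx.guild.name}'
    )


# A stand-in for the Context that discord.py would pass in. Its send()
# only records the text instead of posting it to a channel.
sent = []


async def fake_send(content):
    sent.append(content)


fake_ctx = SimpleNamespace(
    author=SimpleNamespace(name='alexronquillo'),
    channel=SimpleNamespace(name='general'),
    guild=SimpleNamespace(name='RealPythonTutorialServer'),
    send=fake_send,
)

asyncio.run(where(fake_ctx))
print(sent[0])
```

<p>The stub isn’t needed in <code>bot.py</code> itself, where <code>discord.py</code> supplies the real <code>Context</code>.</p>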
<p>Run the program:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python bot.py
</pre></div>
<p>With your bot running, you can now head to Discord to try out your new command:</p>
<p><a href="https://files.realpython.com/media/discord-bot-brooklyn-99-command.f01b21540756.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-brooklyn-99-command.f01b21540756.png" width="3031" height="1769" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-brooklyn-99-command.f01b21540756.png&w=757&sig=dff1081b5b379d39c1cdd41115bdef90fa4a833c 757w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-brooklyn-99-command.f01b21540756.png&w=1515&sig=a05851348e0e7b2217a0900c531a16a42bf4f32e 1515w, https://files.realpython.com/media/discord-bot-brooklyn-99-command.f01b21540756.png 3031w" sizes="75vw" alt="Discord: Brooklyn Nine-Nine Command"/></a></p>
<p>From the user’s point of view, the practical difference is that the prefix helps formalize the command, rather than simply reacting to a particular <code>on_message()</code> event.</p>
<p>This comes with other great benefits as well. For example, you can invoke the <code>help</code> command to see all the commands that your <code>Bot</code> handles:</p>
<p><a href="https://files.realpython.com/media/discord-bot-help-command.a2ec772cc910.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-help-command.a2ec772cc910.png" width="3027" height="1767" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-help-command.a2ec772cc910.png&w=756&sig=122cba0d422a63e39e790208f553a16ebbceb674 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-help-command.a2ec772cc910.png&w=1513&sig=e8ef5a87dcca2de43f9ff198bb191746c5757465 1513w, https://files.realpython.com/media/discord-bot-help-command.a2ec772cc910.png 3027w" sizes="75vw" alt="Discord: Help Command"/></a></p>
<p>If you want to add a description to your command so that the <code>help</code> message is more informative, simply pass a <code>help</code> description to the <code>.command()</code> decorator:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">from</span> <span class="nn">discord.ext</span> <span class="k">import</span> <span class="n">commands</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">token</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">bot</span> <span class="o">=</span> <span class="n">commands</span><span class="o">.</span><span class="n">Bot</span><span class="p">(</span><span class="n">command_prefix</span><span class="o">=</span><span class="s1">'!'</span><span class="p">)</span>
<span class="nd">@bot</span><span class="o">.</span><span class="n">command</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'99'</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">'Responds with a random quote from Brooklyn 99'</span><span class="p">)</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">nine_nine</span><span class="p">(</span><span class="n">ctx</span><span class="p">):</span>
    <span class="n">brooklyn_99_quotes</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'I</span><span class="se">\'</span><span class="s1">m the human form of the 💯 emoji.'</span><span class="p">,</span>
        <span class="s1">'Bingpot!'</span><span class="p">,</span>
        <span class="p">(</span>
            <span class="s1">'Cool. Cool cool cool cool cool cool cool, '</span>
            <span class="s1">'no doubt no doubt no doubt no doubt.'</span>
        <span class="p">),</span>
    <span class="p">]</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">brooklyn_99_quotes</span><span class="p">)</span>
    <span class="k">await</span> <span class="n">ctx</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
<span class="n">bot</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
</pre></div>
<p>Now, when the user invokes the <code>help</code> command, your bot will present a description of your command:</p>
<p><a href="https://files.realpython.com/media/discord-bot-help-description.7f710c984c66.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-help-description.7f710c984c66.png" width="3030" height="1771" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-help-description.7f710c984c66.png&w=757&sig=183786aba9f5ba8b48f494d4403adfc38436116d 757w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-help-description.7f710c984c66.png&w=1515&sig=9fde033f6aae07bf16a859ca882fd6b5ffbcf574 1515w, https://files.realpython.com/media/discord-bot-help-description.7f710c984c66.png 3030w" sizes="75vw" alt="Discord: Informative Help Description"/></a></p>
<p>Keep in mind that all of this functionality exists only for the <code>Bot</code> subclass, not the <code>Client</code> superclass.</p>
<p><code>Command</code> has another useful functionality: the ability to use a <code>Converter</code> to change the types of its arguments.</p>
<h3 id="converting-parameters-automatically">Converting Parameters Automatically</h3>
<p>Another benefit of using commands is the ability to <strong>convert</strong> parameters.</p>
<p>Sometimes, you require a parameter to be a certain type, but arguments to a <code>Command</code> function are, by default, strings. A <a href="https://discordpy.readthedocs.io/en/latest/ext/commands/commands.html#converters"><code>Converter</code></a> lets you convert those parameters to the type that you expect.</p>
<p>For example, if you want to build a <code>Command</code> for your bot user to simulate rolling some dice (knowing what you’ve learned so far), you might define it like this:</p>
<div class="highlight python"><pre><span></span><span class="nd">@bot</span><span class="o">.</span><span class="n">command</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'roll_dice'</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">'Simulates rolling dice.'</span><span class="p">)</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">roll</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">number_of_dice</span><span class="p">,</span> <span class="n">number_of_sides</span><span class="p">):</span>
    <span class="n">dice</span> <span class="o">=</span> <span class="p">[</span>
        <span class="nb">str</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">number_of_sides</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)))</span>
        <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">number_of_dice</span><span class="p">)</span>
    <span class="p">]</span>
    <span class="k">await</span> <span class="n">ctx</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">dice</span><span class="p">))</span>
</pre></div>
<p>You defined <code>roll</code> to take two parameters:</p>
<ol>
<li>The number of dice to roll</li>
<li>The number of sides per die</li>
</ol>
<p>Then, you decorated it with <code>.command()</code> so that you can invoke it with the <code>!roll_dice</code> command. Finally, you <code>.send()</code> the results in a message back to the <code>channel</code>.</p>
<p>While this looks correct, it isn’t. Unfortunately, if you run <code>bot.py</code>, and invoke the <code>!roll_dice</code> command in your Discord channel, you’ll see the following error:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python bot.py
<span class="go">Ignoring exception in command roll_dice:</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go"> File "/Users/alex.ronquillo/.pyenv/versions/discord-venv/lib/python3.7/site-packages/discord/ext/commands/core.py", line 63, in wrapped</span>
<span class="go"> ret = await coro(*args, **kwargs)</span>
<span class="go"> File "bot.py", line 40, in roll</span>
<span class="go"> for _ in range(number_of_dice)</span>
<span class="go">TypeError: 'str' object cannot be interpreted as an integer</span>
<span class="go">The above exception was the direct cause of the following exception:</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go"> File "/Users/alex.ronquillo/.pyenv/versions/discord-venv/lib/python3.7/site-packages/discord/ext/commands/bot.py", line 860, in invoke</span>
<span class="go"> await ctx.command.invoke(ctx)</span>
<span class="go"> File "/Users/alex.ronquillo/.pyenv/versions/discord-venv/lib/python3.7/site-packages/discord/ext/commands/core.py", line 698, in invoke</span>
<span class="go"> await injected(*ctx.args, **ctx.kwargs)</span>
<span class="go"> File "/Users/alex.ronquillo/.pyenv/versions/discord-venv/lib/python3.7/site-packages/discord/ext/commands/core.py", line 72, in wrapped</span>
<span class="go"> raise CommandInvokeError(exc) from exc</span>
<span class="go">discord.ext.commands.errors.CommandInvokeError: Command raised an exception: TypeError: 'str' object cannot be interpreted as an integer</span>
</pre></div>
<p>In other words, <code>range()</code> can’t accept a <code>str</code> as an argument. Instead, it must be an <code>int</code>. While you could cast each value to an <code>int</code>, there is a better way: you can use a <code>Converter</code>.</p>
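<p>For comparison, the manual approach might look something like the sketch below. It’s a plain function rather than a real <code>Command</code>, so it can run on its own, and it first reproduces the <code>TypeError</code> from the traceback:</p>

```python
import random


def roll(number_of_dice, number_of_sides):
    """Roll dice given string arguments, casting each one by hand."""
    # The arguments arrive as text, so convert them before calling range()
    number_of_dice = int(number_of_dice)
    number_of_sides = int(number_of_sides)
    return [
        random.choice(range(1, number_of_sides + 1))
        for _ in range(number_of_dice)
    ]


try:
    range('3')
except TypeError as exc:
    print(exc)  # 'str' object cannot be interpreted as an integer

results = roll('3', '6')
print(results)  # Three values between 1 and 6
```

<p>This works, but every command that needs numbers would have to repeat the same casts.</p>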
<p>In <code>discord.py</code>, a <code>Converter</code> is defined using Python 3’s <a href="https://realpython.com/python-type-checking/#annotations">function annotations</a>:</p>
<div class="highlight python"><pre><span></span><span class="nd">@bot</span><span class="o">.</span><span class="n">command</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'roll_dice'</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">'Simulates rolling dice.'</span><span class="p">)</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">roll</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">number_of_dice</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">number_of_sides</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
    <span class="n">dice</span> <span class="o">=</span> <span class="p">[</span>
        <span class="nb">str</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">number_of_sides</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)))</span>
        <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">number_of_dice</span><span class="p">)</span>
    <span class="p">]</span>
    <span class="k">await</span> <span class="n">ctx</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">dice</span><span class="p">))</span>
</pre></div>
<p>You added <code>: int</code> annotations to the two parameters that you expect to be of type <code>int</code>. Try the command again:</p>
<p><a href="https://files.realpython.com/media/discord-bot-roll-dice.0255e76f078e.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-roll-dice.0255e76f078e.png" width="3025" height="1769" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-roll-dice.0255e76f078e.png&w=756&sig=ab57569c628b33d801c556a37291da4c2bf38898 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-roll-dice.0255e76f078e.png&w=1512&sig=561a833205a5df4d50976436b4640f10942e6be3 1512w, https://files.realpython.com/media/discord-bot-roll-dice.0255e76f078e.png 3025w" sizes="75vw" alt="Discord: Bot Dice-Rolling Command"/></a></p>
<p>With that little change, your command works! The difference is that you’re now converting the command arguments to <code>int</code>, which makes them compatible with your function’s logic.</p>
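<p>If you’re curious how annotations can drive a conversion like this, the following sketch shows the general idea using <code>inspect</code> from the standard library. It is an illustration only, not the implementation used by <code>discord.py</code>, and <code>convert_arguments()</code> is a hypothetical helper:</p>

```python
import inspect


def convert_arguments(func, raw_arguments):
    """Convert string arguments using the annotations on func's parameters.

    A simplified illustration of the idea, not discord.py's implementation.
    """
    # Skip the first parameter (ctx), which isn't supplied by the user
    parameters = list(inspect.signature(func).parameters.values())[1:]
    converted = []
    for parameter, raw in zip(parameters, raw_arguments):
        converter = parameter.annotation
        if converter is inspect.Parameter.empty:
            converted.append(raw)  # No annotation: leave it as a string
        else:
            converted.append(converter(raw))
    return converted


def roll(ctx, number_of_dice: int, number_of_sides: int):
    ...


def echo(ctx, text):
    ...


print(convert_arguments(roll, ['3', '6']))  # [3, 6]
print(convert_arguments(echo, ['hello']))   # ['hello']
```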
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> A <code>Converter</code> can be any callable, not merely data types. The argument will be passed to the callable, and the return value will be passed into the <code>Command</code>.</p>
</div>
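<p>For instance, a converter could be an ordinary function that parses dice notation. The <code>dice_notation()</code> function below is a hypothetical example, shown on its own so that it runs without a bot:</p>

```python
def dice_notation(argument):
    """Convert text such as '3d6' into a (number_of_dice, number_of_sides) tuple."""
    number_of_dice, separator, number_of_sides = argument.lower().partition('d')
    if not separator:
        raise ValueError(f'Expected something like 3d6, got {argument!r}')
    return int(number_of_dice), int(number_of_sides)


# With discord.py, you would use the function as an annotation, for example:
#     async def roll(ctx, dice: dice_notation): ...
print(dice_notation('3d6'))   # (3, 6)
print(dice_notation('2D20'))  # (2, 20)
```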
<p>Next, you’ll learn about the <code>Check</code> object and how it can improve your commands.</p>
<h3 id="checking-command-predicates">Checking Command Predicates</h3>
<p>A <code>Check</code> is a predicate that is evaluated before a <code>Command</code> is executed to ensure that the <code>Context</code> surrounding the <code>Command</code> invocation is valid.</p>
<p>In an earlier example, you did something similar to verify that the user who sent a message that the bot handles was not the bot user itself:</p>
<div class="highlight python"><pre><span></span><span class="k">if</span> <span class="n">message</span><span class="o">.</span><span class="n">author</span> <span class="o">==</span> <span class="n">client</span><span class="o">.</span><span class="n">user</span><span class="p">:</span>
    <span class="k">return</span>
</pre></div>
<p>The <code>commands</code> extension provides a cleaner and more usable mechanism for performing this kind of check, namely using <code>Check</code> objects.</p>
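<p>Conceptually, a predicate is just a function that receives the <code>Context</code> and returns <code>True</code> or <code>False</code>. The sketch below uses stand-in objects and a hypothetical <code>has_role_named()</code> helper to show the idea. It is not the library’s own code:</p>

```python
from types import SimpleNamespace


def has_role_named(role_name):
    """Build a predicate that is true when the author holds role_name."""
    def predicate(ctx):
        return any(role.name == role_name for role in ctx.author.roles)
    return predicate


def make_ctx(*role_names):
    """Create a minimal fake Context whose author has the given roles."""
    roles = [SimpleNamespace(name=name) for name in role_names]
    return SimpleNamespace(author=SimpleNamespace(roles=roles))


is_admin = has_role_named('admin')
print(is_admin(make_ctx('admin', 'moderator')))  # True
print(is_admin(make_ctx('moderator')))           # False
```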
<p>To demonstrate how this works, assume you want to support a command <code>!create-channel &lt;channel_name&gt;</code> that creates a new channel. However, you only want to allow administrators the ability to create new channels with this command.</p>
<p>First, you’ll need to create a new <code>admin</code> member role. Go into the Discord guild and select the <em>{Server Name} → Server Settings</em> menu:</p>
<p><a href="https://files.realpython.com/media/discord-bot-server-settings.1eb7e71e881b.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-server-settings.1eb7e71e881b.png" width="3023" height="1768" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-server-settings.1eb7e71e881b.png&w=755&sig=65edbf90c63743f93f195bb32661ebd7dc8c453f 755w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-server-settings.1eb7e71e881b.png&w=1511&sig=9576ffae79458bead937645d0d0fc693e91c9972 1511w, https://files.realpython.com/media/discord-bot-server-settings.1eb7e71e881b.png 3023w" sizes="75vw" alt="Discord: Server Settings Screen"/></a></p>
<p>Then, select <em>Roles</em> from the left-hand navigation list:</p>
<p><a href="https://files.realpython.com/media/discord-bot-roles.bdc21374afa9.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-roles.bdc21374afa9.png" width="3002" height="1769" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-roles.bdc21374afa9.png&w=750&sig=c9353734130e1c77c2e9022990b89f79db8806b7 750w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-roles.bdc21374afa9.png&w=1501&sig=c990b12b692cfdee8eccd91f8dec3e70e1977a67 1501w, https://files.realpython.com/media/discord-bot-roles.bdc21374afa9.png 3002w" sizes="75vw" alt="Discord: Navigate to Roles"/></a></p>
<p>Finally, select the <em>+</em> sign next to <em>ROLES</em>, enter the name <code>admin</code>, and select <em>Save Changes</em>:</p>
<p><a href="https://files.realpython.com/media/discord-bot-new-role.7e8d95291d0d.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-new-role.7e8d95291d0d.png" width="3027" height="1767" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-new-role.7e8d95291d0d.png&w=756&sig=f69a116bd35511e1c5b1b70d26b96c483d0f723e 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-new-role.7e8d95291d0d.png&w=1513&sig=3d883a2bb6f60a880175c06d523790ca07cc208c 1513w, https://files.realpython.com/media/discord-bot-new-role.7e8d95291d0d.png 3027w" sizes="75vw" alt="Discord: Create New Admin Role"/></a></p>
<p>Now, you’ve created an <code>admin</code> role that you can assign to particular users. Next, you’ll update <code>bot.py</code> to <code>Check</code> the user’s role before allowing them to initiate the command:</p>
<div class="highlight python"><pre><span></span><span class="c1"># bot.py</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">discord</span>
<span class="kn">from</span> <span class="nn">discord.ext</span> <span class="k">import</span> <span class="n">commands</span>
<span class="kn">from</span> <span class="nn">dotenv</span> <span class="k">import</span> <span class="n">load_dotenv</span>
<span class="n">load_dotenv</span><span class="p">()</span>
<span class="n">TOKEN</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'DISCORD_TOKEN'</span><span class="p">)</span>
<span class="n">bot</span> <span class="o">=</span> <span class="n">commands</span><span class="o">.</span><span class="n">Bot</span><span class="p">(</span><span class="n">command_prefix</span><span class="o">=</span><span class="s1">'!'</span><span class="p">)</span>
<span class="nd">@bot</span><span class="o">.</span><span class="n">command</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'create-channel'</span><span class="p">)</span>
<span class="nd">@commands</span><span class="o">.</span><span class="n">has_role</span><span class="p">(</span><span class="s1">'admin'</span><span class="p">)</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">create_channel</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">channel_name</span><span class="o">=</span><span class="s1">'real-python'</span><span class="p">):</span>
    <span class="n">guild</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">guild</span>
    <span class="n">existing_channel</span> <span class="o">=</span> <span class="n">discord</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">guild</span><span class="o">.</span><span class="n">channels</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="n">channel_name</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">existing_channel</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Creating a new channel: </span><span class="si">{channel_name}</span><span class="s1">'</span><span class="p">)</span>
        <span class="k">await</span> <span class="n">guild</span><span class="o">.</span><span class="n">create_text_channel</span><span class="p">(</span><span class="n">channel_name</span><span class="p">)</span>
<span class="n">bot</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">TOKEN</span><span class="p">)</span>
</pre></div>
<p>In <code>bot.py</code>, you have a new <code>Command</code> function, called <code>create_channel()</code>, which takes an optional <code>channel_name</code> and creates that channel. <code>create_channel()</code> is also decorated with a <code>Check</code> called <code>has_role()</code>.</p>
<p>You also use <code>discord.utils.get()</code> to ensure that you don’t create a channel with the same name as an existing channel.</p>
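<p>The lookup behaves roughly like the sketch below: it returns the first item whose attributes match, or <code>None</code> if nothing matches. <code>get_first()</code> is a hypothetical stand-in written for illustration, and the channel objects are fakes:</p>

```python
from types import SimpleNamespace

_MISSING = object()  # Sentinel for attributes that an item doesn't have


def get_first(iterable, **attrs):
    """Return the first item whose attributes all match attrs, else None.

    A simplified stand-in that mirrors how discord.utils.get() is used here.
    """
    for item in iterable:
        if all(
            getattr(item, name, _MISSING) == value
            for name, value in attrs.items()
        ):
            return item
    return None


channels = [SimpleNamespace(name='general'), SimpleNamespace(name='real-python')]

found = get_first(channels, name='real-python')
print(found.name)                               # real-python
print(get_first(channels, name='bot-testing'))  # None
```

<p>Because a missing channel yields <code>None</code>, a check like <code>if not existing_channel:</code> is enough to decide whether to create it.</p>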
<p>If you run this program as it is and type <code>!create-channel</code> into your Discord channel, then you’ll see the following error message:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python bot.py
<span class="go">Ignoring exception in command create-channel:</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go">  File "/Users/alex.ronquillo/.pyenv/versions/discord-venv/lib/python3.7/site-packages/discord/ext/commands/bot.py", line 860, in invoke</span>
<span class="go">    await ctx.command.invoke(ctx)</span>
<span class="go">  File "/Users/alex.ronquillo/.pyenv/versions/discord-venv/lib/python3.7/site-packages/discord/ext/commands/core.py", line 691, in invoke</span>
<span class="go">    await self.prepare(ctx)</span>
<span class="go">  File "/Users/alex.ronquillo/.pyenv/versions/discord-venv/lib/python3.7/site-packages/discord/ext/commands/core.py", line 648, in prepare</span>
<span class="go">    await self._verify_checks(ctx)</span>
<span class="go">  File "/Users/alex.ronquillo/.pyenv/versions/discord-venv/lib/python3.7/site-packages/discord/ext/commands/core.py", line 598, in _verify_checks</span>
<span class="go">    raise CheckFailure('The check functions for command {0.qualified_name} failed.'.format(self))</span>
<span class="go">discord.ext.commands.errors.CheckFailure: The check functions for command create-channel failed.</span>
</pre></div>
<p>This <code>CheckFailure</code> says that <code>has_role('admin')</code> failed. Unfortunately, this error only prints to <code>stdout</code>. It would be better to report this to the user in the channel. To do so, add the following event:</p>
<div class="highlight python"><pre><span></span><span class="nd">@bot</span><span class="o">.</span><span class="n">event</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">on_command_error</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">error</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">error</span><span class="p">,</span> <span class="n">commands</span><span class="o">.</span><span class="n">errors</span><span class="o">.</span><span class="n">CheckFailure</span><span class="p">):</span>
        <span class="k">await</span> <span class="n">ctx</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="s1">'You do not have the correct role for this command.'</span><span class="p">)</span>
</pre></div>
<p>This event handles an error event from the command and sends an informative error message back to the original <code>Context</code> of the invoked <code>Command</code>.</p>
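<p>If you want to reason about this handler without connecting to Discord, you can exercise the same <code>isinstance()</code> dispatch against stand-in objects. In the sketch below, <code>CheckFailure</code> and <code>FakeContext</code> are hypothetical substitutes defined only for illustration. They mimic just enough of the real classes to show which errors produce a message:</p>

```python
import asyncio


class CheckFailure(Exception):
    """Hypothetical stand-in for the library's check-failure error."""


class FakeContext:
    """Hypothetical stand-in for a Context; records messages instead of sending."""

    def __init__(self):
        self.sent = []

    async def send(self, message):
        self.sent.append(message)


async def on_command_error(ctx, error):
    # Only failed checks are reported back to the channel.
    if isinstance(error, CheckFailure):
        await ctx.send('You do not have the correct role for this command.')


ctx = FakeContext()
asyncio.run(on_command_error(ctx, CheckFailure('role check failed')))
asyncio.run(on_command_error(ctx, ValueError('unrelated problem')))

print(ctx.sent)
```

<p>Only the first call records a message. Errors of any other type pass through this handler without a reply, so in a real bot you may want to log or re-raise them rather than drop them silently.</p>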
<p>Try it all again, and you should see an error in the Discord channel:</p>
<p><a href="https://files.realpython.com/media/discord-bot-role-error-message.adfe85fe76a9.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-role-error-message.adfe85fe76a9.png" width="3028" height="1768" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-role-error-message.adfe85fe76a9.png&w=757&sig=f4c4c8c318e4079b634fd089dfbdbc14524de8d8 757w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-role-error-message.adfe85fe76a9.png&w=1514&sig=0ac21290b0674b8954a40c6a92c0e14c04a8ca78 1514w, https://files.realpython.com/media/discord-bot-role-error-message.adfe85fe76a9.png 3028w" sizes="75vw" alt="Discord: Role Check Error"/></a></p>
<p>Great! Now, to resolve the issue, you’ll need to give yourself the <em>admin</em> role:</p>
<p><a href="https://files.realpython.com/media/discord-bot-role-granted.081c0c317834.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-role-granted.081c0c317834.png" width="3026" height="1767" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-role-granted.081c0c317834.png&w=756&sig=d2cb8f8765ffd8fc574808a810170c267e06069f 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-role-granted.081c0c317834.png&w=1513&sig=33fb00357b97007f92ee8fa88ecadfdcecdb09f9 1513w, https://files.realpython.com/media/discord-bot-role-granted.081c0c317834.png 3026w" sizes="75vw" alt="Discord: Grant Admin Role"/></a></p>
<p>With the <em>admin</em> role, your user will pass the <code>Check</code> and will be able to create channels using the command.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Keep in mind that in order to assign a role, your user will have to have the correct permissions. The easiest way to ensure this is to sign in with the user that you created the guild with.</p>
</div>
<p>When you type <code>!create-channel</code> again, you’ll successfully create the channel <em>real-python</em>:</p>
<p><a href="https://files.realpython.com/media/discord-bot-new-channel.43cd2889446c.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/discord-bot-new-channel.43cd2889446c.png" width="3026" height="1768" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-new-channel.43cd2889446c.png&w=756&sig=768abb1b230e675f2d0b864311f357eff67414b2 756w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/discord-bot-new-channel.43cd2889446c.png&w=1513&sig=ee91b1a9895a80fa1d326de58e6e5b8ac4fd1719 1513w, https://files.realpython.com/media/discord-bot-new-channel.43cd2889446c.png 3026w" sizes="75vw" alt="Discord: Navigate to New Channel"/></a></p>
<p>Also, note that you can pass the optional <code>channel_name</code> argument to name the channel whatever you want!</p>
<p>With this last example, you combined a <code>Command</code>, an event, a <code>Check</code>, and even the <code>get()</code> utility to create a useful Discord bot!</p>
<h2 id="conclusion">Conclusion</h2>
<p>Congratulations! Now, you’ve learned how to make a Discord bot in Python. You’re able to build bots for interacting with users in guilds that you create or even bots that other users can invite to interact with their communities. Your bots will be able to respond to messages and commands and numerous other events.</p>
<p>In this tutorial, you learned the basics of creating your own Discord bot. You now know:</p>
<ul>
<li>What Discord is</li>
<li>Why <code>discord.py</code> is so valuable</li>
<li>How to make a Discord bot in the Developer Portal</li>
<li>How to create a Discord connection in Python</li>
<li>How to handle events</li>
<li>How to create a <code>Bot</code> connection</li>
<li>How to use bot commands, checks, and converters</li>
</ul>
<p>To read more about the powerful <code>discord.py</code> library and take your bots to the next level, read through its extensive <a href="https://discordapp.com/developers/docs/intro">documentation</a>. Also, now that you’re familiar with Discord APIs in general, you have a better foundation for building other types of Discord applications.</p>
An Effective Python Environment: Making Yourself at Homehttps://realpython.com/effective-python-environment/2019-08-14T14:00:00+00:00This guide will walk you through the decisions you need to make when customizing your development environment for working with Python.
<p>When you’re first learning a new programming language, a lot of your time and effort go into understanding the syntax, code style, and built-in tooling. This is just as true for Python as it is for any other language. Once you gain enough familiarity to be comfortable with the ins and outs of Python, you can start to invest time into building a Python environment that will foster your productivity.</p>
<p>Your shell is more than a prebuilt program provided to you as-is. It’s a framework on which you can build an ecosystem. This ecosystem will come to fit your needs so that you can spend less time fiddling and more time thinking about the next big project you’re working on.</p>
<p>Although no two developers have the same setup, there are a number of choices everyone faces when cultivating their Python environment. It’s important to understand each of these decisions and the options available to you!</p>
<p><strong>By the end of this article, you’ll be able to answer questions like:</strong></p>
<ul>
<li>What shell should I use? What terminal should I use?</li>
<li>What version(s) of Python can I use?</li>
<li>How do I manage dependencies for different projects?</li>
<li>How can I make my tools do some of the work for me?</li>
</ul>
<p>Once you’ve answered these questions for yourself, you can embark on the journey of creating a Python environment to call your very own. Let’s get started!</p>
<div class="alert alert-warning" role="alert"><p><strong>Free Bonus:</strong> <a href="" class="alert-link" data-toggle="modal" data-target="#modal-dependency-pitfalls-email-course" data-focus="false">Click here to get access to a free 5-day class</a> that shows you how to avoid common dependency management issues with tools like Pip, PyPI, Virtualenv, and requirements files.</p></div>
<h2 id="shells">Shells</h2>
<p>When you use a <a href="https://realpython.com/command-line-interfaces-python-argparse/#what-is-a-command-line-interface">command-line interface</a> (CLI), you execute commands and see their output. A <strong>shell</strong> is a program that provides this (usually text-based) interface to you. Shells often provide their own programming language that you can use to manipulate files, install software, and so on.</p>
<p>There are more unique shells than could be reasonably listed here, so you’ll see a few prominent ones. Others differ in syntax or enhanced features, but they generally provide the same core functionality.</p>
<h3 id="unix-shells">Unix Shells</h3>
<p><a href="https://en.wikipedia.org/wiki/Unix">Unix</a> is a family of operating systems first developed in the early days of computing. Unix’s popularity has lasted through today, heavily inspiring Linux and macOS. The first shells were developed for use with Unix and Unix-like operating systems.</p>
<h4 id="bourne-shell-sh">Bourne Shell (<code>sh</code>)</h4>
<p>The Bourne shell—developed by Stephen Bourne for Bell Labs in 1979—was one of the first to incorporate the idea of environment variables, conditionals, and loops. It has provided a strong basis for many other shells in use today and is still available on most systems at <code>/bin/sh</code>.</p>
<h4 id="bourne-again-shell-bash">Bourne-Again Shell (<code>bash</code>)</h4>
<p>Built on the success of the original Bourne shell, <code>bash</code> introduced improved user-interaction features. With <code>bash</code>, you get <span class="keys"><kbd class="key-tab">Tab</kbd></span> completion, history, and wildcard searching for commands and paths. The <code>bash</code> programming language provides more data types, like arrays.</p>
<h4 id="z-shell-zsh">Z Shell (<code>zsh</code>)</h4>
<p><code>zsh</code> combines many of the best features from other shells along with a few of its own tricks into one experience. <code>zsh</code> offers autocorrection of misspelled commands, shorthand for manipulating multiple files, and advanced options for customizing your command prompt.</p>
<p><code>zsh</code> also provides a framework for deep customization. The <a href="https://ohmyz.sh">Oh My Zsh</a> project supplies a rich set of themes and plugins, and is often used hand in hand with <code>zsh</code>.</p>
<p><a href="https://support.apple.com/en-us/HT208050">macOS will ship with <code>zsh</code> as its default shell starting with Catalina</a>, speaking to the shell’s popularity. Consider acquainting yourself with <code>zsh</code> now so that you’ll be comfortable with it going forward.</p>
<h4 id="xonsh">Xonsh</h4>
<p>If you’re feeling particularly adventurous, you can give <a href="https://xon.sh">Xonsh</a> a try. Xonsh is a shell that combines some features of other Unix-like shells with the power of Python syntax. You can use the language you already know to accomplish tasks on your filesystem and so on.</p>
<p>Although Xonsh is powerful, it lacks the compatibility other shells tend to share. You might not be able to run many existing shell scripts in Xonsh as a result. If you find that you like Xonsh, but compatibility is a concern, then you can use Xonsh as a supplement to your activities in a more widely used shell.</p>
<h3 id="windows-shells">Windows Shells</h3>
<p>Similarly to Unix-like operating systems, Windows also offers a number of options when it comes to shells. The shells offered in Windows vary in features and syntax, so you may need to try several to find one you like best.</p>
<h4 id="cmd-cmdexe">CMD (<code>cmd.exe</code>)</h4>
<p>CMD (short for “command”) is the default CLI shell for Windows. It’s the successor to COMMAND.COM, the shell built for DOS (disk operating system).</p>
<p>Because DOS and Unix evolved independently, the commands and syntax in CMD are markedly different from shells built for Unix-like systems. However, CMD still provides the same core functionality for browsing and manipulating files, running commands, and viewing output.</p>
<h4 id="powershell">PowerShell</h4>
<p>PowerShell was released in 2006 and also ships with Windows. It provides Unix-like aliases for most commands, so if you’re coming to Windows from macOS or Linux or have to use both, then PowerShell might be great for you.</p>
<p>PowerShell is vastly more powerful than CMD. With PowerShell you can:</p>
<ul>
<li>Pipe the output of one command to the input of another</li>
<li>Automate tasks through the exposed Windows management features</li>
<li>Use a scripting language to accomplish complex tasks</li>
</ul>
<h4 id="windows-subsystem-for-linux">Windows Subsystem for Linux</h4>
<p>Microsoft has released a <a href="https://docs.microsoft.com/en-us/windows/wsl/install-win10">Windows subsystem for Linux</a> (WSL) for running Linux directly on Windows. If you install WSL, then you can use <code>zsh</code>, <code>bash</code>, or any other Unix-like shell. If you want strong compatibility across your Windows and macOS or Linux environments, then be sure to give WSL a try. You may also consider <a href="https://opensource.com/article/18/5/dual-boot-linux">dual-booting Linux and Windows</a> as an alternative.</p>
<p>See this <a href="https://en.wikipedia.org/wiki/Comparison_of_command_shells">comparison of command shells</a> for exhaustive coverage.</p>
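<p>If you aren’t sure which shell you’re currently set up to use, a short Python sketch can give you a hint. It assumes the conventional <code>SHELL</code> environment variable on Unix-like systems and <code>COMSPEC</code> on Windows. The helper name <code>detect_shell()</code> is made up for this example, and the result reflects your configured shell, which isn’t necessarily the one running in the current window:</p>

```python
import os


def detect_shell(env=None):
    """Return the configured shell's program name, or None if it is unknown.

    Hypothetical helper: reads SHELL (Unix-like) or COMSPEC (Windows).
    """
    env = os.environ if env is None else env
    shell_path = env.get("SHELL") or env.get("COMSPEC")
    if not shell_path:
        return None
    # Normalize Windows separators, then keep only the final path component.
    return shell_path.replace("\\", "/").rsplit("/", 1)[-1]


print(detect_shell())
```

<p>Passing a plain dictionary instead of the real environment makes the function easy to try out with different values.</p>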
<h2 id="terminal-emulators">Terminal Emulators</h2>
<p>Early developers used <strong>terminals</strong> to interact with a central mainframe computer. These were devices with a keyboard and a screen or printer that would display computed output.</p>
<p>Today, computers are portable and don’t require separate devices to interact with them, but the terminology still remains. Whereas a shell provides the prompt and interpreter you use to interface with text-based CLI tools, a terminal <strong>emulator</strong> (often shortened to <strong>terminal</strong>) is the graphical application you run to access the shell.</p>
<p>Almost any terminal you encounter should support the same basic features:</p>
<ul>
<li><strong>Text colors</strong> for syntax highlighting in your code or distinguishing meaningful text in command output</li>
<li><strong>Scrolling</strong> for viewing an earlier command or its output</li>
<li><strong>Copy/paste</strong> for transferring text in or out of the shell from other programs</li>
<li><strong>Tabs</strong> for running multiple programs at once or separating your work into different sessions</li>
</ul>
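<p>Text colors in particular are usually driven by ANSI escape sequences that the terminal interprets. The following sketch wraps a string in such a sequence. The <code>colorize()</code> helper and the small color table are assumptions made for illustration, and some older Windows consoles may print the raw codes instead of colors:</p>

```python
# Standard ANSI escape sequences for a few foreground colors.
RESET = "\033[0m"
COLORS = {
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
}


def colorize(text, color):
    """Wrap text in the escape codes for the given color (hypothetical helper)."""
    return f"{COLORS[color]}{text}{RESET}"


print(colorize("All checks passed", "green"))
print(colorize("Something went wrong", "red"))
```

<p>The trailing reset code matters: without it, the color would carry over into whatever the terminal prints next.</p>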
<h3 id="macos-terminals">macOS Terminals</h3>
<p>The terminal options available for macOS are all full-featured, differing mostly in aesthetics and specific integrations with other tools.</p>
<h4 id="terminal">Terminal</h4>
<p>If you’re using a Mac, then you may have used the built-in <a href="https://support.apple.com/guide/terminal/welcome/mac">Terminal</a> app before. Terminal supports all the usual functionality, and you can also customize the color scheme and a few hotkeys. It’s a nice enough tool if you don’t need many bells and whistles. You can find the Terminal app in <em>Applications → Utilities → Terminal</em> on macOS.</p>
<h4 id="iterm2">iTerm2</h4>
<p>I’ve been a long-time user of <a href="https://iterm2.com">iTerm2</a>. It takes the developer experience on Mac a step further, offering a much wider palette of customization and productivity options that enable you to:</p>
<ul>
<li>Integrate with the shell to jump quickly to previously entered commands</li>
<li>Create custom search term highlighting in the output from commands</li>
<li>Open URLs and files displayed in the terminal with <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd>click</kbd></span></li>
</ul>
<p>A Python API ships with the latest versions of iTerm2, so you can even improve your Python chops by developing more intricate customizations!</p>
<p>iTerm2 is popular enough to enjoy first-class integration with several other tools, and has a healthy community building plugins and so on. It’s a good choice because of its more frequent release cycle compared to Terminal, which only updates as often as macOS does.</p>
<h4 id="hyper">Hyper</h4>
<p>A relative newcomer, <a href="https://hyper.is/">Hyper</a> is a terminal built on <a href="https://electronjs.org/">Electron</a>, a framework for building desktop applications using web technologies. Electron apps are heavily customizable because they’re “just JavaScript” under the hood. You can create any functionality that you can write the JavaScript for.</p>
<p>On the other hand, JavaScript is a high-level programming language and won’t always perform as well as low-level languages like Objective-C or Swift. Be mindful of the plugins you install or create!</p>
<h3 id="windows-terminals">Windows Terminals</h3>
<p>As with the shell options, Windows terminal options vary widely in utility. Some are tightly bound to a particular shell as well.</p>
<h4 id="command-prompt">Command Prompt</h4>
<p>Command Prompt is the graphical application you can use to work with CMD in Windows. Like CMD, it’s a bare-bones tool for getting a few small things done. Although Command Prompt and CMD provide fewer features than other alternatives, you can be confident that they’ll be available on nearly every Windows installation and in a consistent place.</p>
<h4 id="cygwin">Cygwin</h4>
<p>Cygwin is a third-party suite of tools for Windows that provides a Unix-like wrapper. This was my preferred setup when I was in Windows, but you may consider adopting the Windows Subsystem for Linux as it receives more traction and polish.</p>
<h4 id="windows-terminal">Windows Terminal</h4>
<p>Microsoft recently released an open source terminal for Windows 10 called <a href="https://github.com/Microsoft/Terminal">Windows Terminal</a>. It lets you work in CMD, PowerShell, and even the Windows Subsystem for Linux. If you need to do a fair amount of shell work in Windows, then Windows Terminal is probably your best bet! Windows Terminal is still in late beta, so it doesn’t ship with Windows yet. Check the documentation for instructions on getting access.</p>
<h2 id="python-version-management">Python Version Management</h2>
<p>With your choice of terminal and shell made, you can focus your attention on your Python environment specifically.</p>
<p>Something you’ll eventually run into is the need to run multiple <strong>versions</strong> of Python. Projects you use may only run on certain versions, or you may be interested in creating a project that supports multiple Python versions. You can configure your Python environment to accommodate these needs.</p>
<p>macOS and most Unix operating systems come with a version of Python installed by default. This is often called the <strong>system Python</strong>. The system Python works just fine, but it’s usually out of date. As of this writing, macOS High Sierra still ships with Python 2.7.10 as the system Python.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note</strong>: You’ll almost certainly want to install the latest version of Python at a minimum, so you’ll have at least two versions of Python already. </p>
<p><strong>It’s important that you leave the system Python as the default</strong>, because many parts of the system rely on the default Python being a specific version. This is one of many great reasons to customize your Python environment!</p>
</div>
<p>How do you navigate this? Tooling is here to help.</p>
<h3 id="pyenv"><code>pyenv</code></h3>
<p><a href="https://github.com/pyenv/pyenv"><code>pyenv</code></a> is a mature tool for installing and managing multiple Python versions on macOS and Linux. I recommend <a href="https://github.com/pyenv/pyenv#homebrew-on-macos">installing it with Homebrew</a>. If you’re using Windows, you can use <a href="https://github.com/pyenv-win/pyenv-win#installation"><code>pyenv-win</code></a>. After you’ve got <code>pyenv</code> installed, you can install multiple versions of Python into your Python environment with a few short commands:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pyenv versions
<span class="go">* system</span>
<span class="gp">$</span> python --version
<span class="go">Python 2.7.10</span>
<span class="gp">$</span> pyenv install <span class="m">3</span>.7.3 <span class="c1"># This may take some time</span>
<span class="gp">$</span> pyenv versions
<span class="go">* system</span>
<span class="go"> 3.7.3</span>
</pre></div>
<p>You can manage which Python you’d like to use in your current session, globally, or on a per-project basis as well. <code>pyenv</code> will make the <code>python</code> command point to whichever Python you specify. Note that none of these overrides the default system Python for other applications, so you’re safe to use them however they work best for you within your Python environment:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pyenv global <span class="m">3</span>.7.3
<span class="gp">$</span> pyenv versions
<span class="go"> system</span>
<span class="go">* 3.7.3 (set by /Users/dhillard/.pyenv/version)</span>
<span class="gp">$</span> pyenv <span class="nb">local</span> <span class="m">3</span>.7.3
<span class="gp">$</span> pyenv versions
<span class="go"> system</span>
<span class="go">* 3.7.3 (set by /Users/dhillard/myproj/.python-version)</span>
<span class="gp">$</span> pyenv shell <span class="m">3</span>.7.3
<span class="gp">$</span> pyenv versions
<span class="go"> system</span>
<span class="go">* 3.7.3 (set by PYENV_VERSION environment variable)</span>
<span class="gp">$</span> python --version
<span class="go">Python 3.7.3</span>
</pre></div>
<p>Because I use a specific version of Python for work, the latest version of Python for personal projects, and multiple versions for testing open source projects, <code>pyenv</code> has proven to be a fairly smooth way for me to manage all these different versions within my own Python environment. See <a href="https://realpython.com/intro-to-pyenv/">Managing Multiple Python Versions with <code>pyenv</code></a> for a detailed overview of the tool.</p>
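<p>The <code>set by</code> hints in the output above suggest an order of precedence: the <code>PYENV_VERSION</code> environment variable, then a <code>.python-version</code> file in the project, then the global version file. As a rough mental model only, and not <code>pyenv</code>’s actual code, that lookup could be sketched like this. The function <code>resolve_version()</code> and the sample directory layout are hypothetical:</p>

```python
import tempfile
from pathlib import Path


def resolve_version(start_dir, env, global_file):
    """Return (version, source) using a simplified model of the lookup order."""
    # 1. A shell-level override wins over everything else.
    version = env.get("PYENV_VERSION")
    if version:
        return version, "PYENV_VERSION environment variable"

    # 2. Otherwise, use the nearest .python-version, walking up the tree.
    start = Path(start_dir).resolve()
    for directory in [start, *start.parents]:
        candidate = directory / ".python-version"
        if candidate.is_file():
            return candidate.read_text().strip(), str(candidate)

    # 3. Fall back to the global version file, then to the system Python.
    global_path = Path(global_file)
    if global_path.is_file():
        return global_path.read_text().strip(), str(global_path)
    return "system", "default"


with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout: a project with a nested source directory.
    project = Path(tmp) / "myproj"
    nested = project / "src"
    nested.mkdir(parents=True)
    (project / ".python-version").write_text("3.7.3\n")
    global_file = Path(tmp) / "version"
    global_file.write_text("3.6.9\n")

    local_result = resolve_version(nested, {}, global_file)
    shell_result = resolve_version(nested, {"PYENV_VERSION": "3.8.0"}, global_file)

print(local_result[0])
print(shell_result[0])
```

<p>In this model, a project-level file is found even from a subdirectory, and a shell-level setting overrides it for the current session.</p>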
<h3 id="conda"><code>conda</code></h3>
<p>If you’re in the data science community, you might already be using <a href="https://www.anaconda.com/distribution/">Anaconda</a> (or <a href="https://docs.conda.io/en/latest/miniconda.html">Miniconda</a>). Anaconda is a sort of one-stop shop for data science software that supports more than just Python.</p>
<p>If you don’t need the data science packages or all the things that come pre-packaged with Anaconda, <code>pyenv</code> might be a better lightweight solution for you. Managing Python versions is pretty similar in each, though. You can install Python versions similarly to <code>pyenv</code>, using the <code>conda</code> command:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> conda install <span class="nv">python</span><span class="o">=</span><span class="m">3</span>.7.3
</pre></div>
<p>You’ll see a verbose list of all the dependent software <code>conda</code> will install, and it will ask you to confirm.</p>
<p><code>conda</code> doesn’t have a way to set the “default” Python version or even a good way to see which versions of Python you’ve installed. Rather, it hinges on the concept of “environments,” which you can read more about in the following sections.</p>
<h2 id="virtual-environments">Virtual Environments</h2>
<p>Now you know how to manage multiple Python versions. Often, you’ll be working on multiple projects that need the <em>same</em> Python version.</p>
<p>Because each project has its own set of dependencies, it’s a good practice to avoid mixing them. If all the dependencies are installed together in a single Python environment, then it will be difficult to discern where each one came from. In the worst cases, two different projects may depend on two different versions of a package, but with Python you can only have one version of a package installed at one time. What a mess!</p>
<p>Enter <strong>virtual environments</strong>. You can think of a virtual environment as a carbon copy of a base version of Python. If you’ve installed Python 3.7.3, for example, then you can create many virtual environments based off of it. When you install a package in a virtual environment, you do it in isolation from other Python environments you may have. Each virtual environment has its own copy of the <code>python</code> executable.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Tip</strong>: Most virtual environment tooling provides a way to update your shell’s command prompt to show the current active virtual environment. Make sure to do this if you frequently switch between projects so you’re sure you’re working inside the correct virtual environment.</p>
</div>
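<p>You can also ask Python itself whether it’s running inside a virtual environment. For environments created with <code>venv</code>, <code>sys.prefix</code> points at the environment while <code>sys.base_prefix</code> points at the Python it was created from. The helper name below is made up for this sketch, and other tools may behave differently:</p>

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Outside a virtual environment, both prefixes refer to the same location.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtual_environment()}")
```

<p>Running this before and after activating an environment is a quick way to confirm that the <code>python</code> command points where you expect.</p>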
<h3 id="venv"><code>venv</code></h3>
<p><a href="https://docs.python.org/3/library/venv.html"><code>venv</code></a> ships with Python versions 3.3+. You can create virtual environments just by passing it a path at which to store the environment’s <code>python</code>, installed packages, and so on:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python -m venv ~/.virtualenvs/my-env
</pre></div>
<p>You activate a virtual environment by sourcing its <code>activate</code> script:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">source</span> ~/.virtualenvs/my-env/bin/activate
</pre></div>
<p>You exit the virtual environment using the <code>deactivate</code> command, which is made available when you activate the virtual environment:</p>
<div class="highlight sh"><pre><span></span><span class="gp">(my-env)$</span> deactivate
</pre></div>
<p><code>venv</code> is built on the wonderful work and successes of the independent <a href="https://virtualenv.pypa.io/en/stable/"><code>virtualenv</code></a> project. <code>virtualenv</code> still provides a few interesting features of its own, but <code>venv</code> is nice because it provides the utility of virtual environments without requiring you to install additional software. You can probably get pretty far with it if you’re working mostly in a single Python version in your Python environment.</p>
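<p>Because <code>venv</code> is part of the standard library, you can also create an environment from Python code rather than the command line. The sketch below uses a temporary directory as an assumed location and skips installing <code>pip</code> to keep it quick. It then checks for the files that mark the directory as a virtual environment:</p>

```python
import sys
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "my-env"

    # Roughly what ``python -m venv`` does, minus installing pip.
    venv.create(env_dir, with_pip=False)

    # Every environment gets a pyvenv.cfg plus a directory of executables.
    has_config = (env_dir / "pyvenv.cfg").is_file()
    bin_dir = env_dir / ("Scripts" if sys.platform == "win32" else "bin")
    has_bin = bin_dir.is_dir()

print(f"pyvenv.cfg created: {has_config}")
print(f"Executables directory created: {has_bin}")
```

<p>In a real project you would choose a permanent path and most likely pass <code>with_pip=True</code> so that you can install packages into the environment.</p>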
<p>If you’re already managing multiple Python versions (or plan to), then it could make sense to integrate with that tooling to simplify the process of making new virtual environments with specific versions of Python. The <code>pyenv</code> and <code>conda</code> ecosystems both provide ways to specify the Python version to use when you create new virtual environments, covered in the following sections.</p>
<h3 id="pyenv-virtualenv"><code>pyenv-virtualenv</code></h3>
<p>If you’re using <code>pyenv</code>, then <a href="https://github.com/pyenv/pyenv-virtualenv"><code>pyenv-virtualenv</code></a> enhances <code>pyenv</code> with a subcommand for managing virtual environments:</p>
<div class="highlight sh"><pre><span></span><span class="c1"># Create virtual environment</span>
<span class="gp">$</span> pyenv virtualenv <span class="m">3</span>.7.3 my-env
<span class="c1"># Activate virtual environment</span>
<span class="gp">$</span> pyenv activate my-env
<span class="c1"># Exit virtual environment</span>
<span class="gp">(my-env)$</span> pyenv deactivate
</pre></div>
<p>I switch contexts between a large handful of projects on a day-to-day basis. As a result, I have at least a dozen distinct virtual environments to manage in my Python environment. What’s really nice about <code>pyenv-virtualenv</code> is that you can configure a virtual environment using the <code>pyenv local</code> command and have <code>pyenv-virtualenv</code> auto-activate the right environments as you switch to different directories:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pyenv virtualenv <span class="m">3</span>.7.3 proj1
<span class="gp">$</span> pyenv virtualenv <span class="m">3</span>.7.3 proj2
<span class="gp">$</span> <span class="nb">cd</span> /Users/dhillard/proj1
<span class="gp">$</span> pyenv <span class="nb">local</span> proj1
<span class="gp">(proj1)$</span> <span class="nb">cd</span> ../proj2
<span class="gp">$</span> pyenv <span class="nb">local</span> proj2
<span class="gp">(proj2)$</span> pyenv versions
<span class="go"> system</span>
<span class="go"> 3.7.3</span>
<span class="go"> 3.7.3/envs/proj1</span>
<span class="go"> 3.7.3/envs/proj2</span>
<span class="go"> proj1</span>
<span class="go">* proj2 (set by /Users/dhillard/proj2/.python-version)</span>
</pre></div>
<p><code>pyenv</code> and <code>pyenv-virtualenv</code> have provided a particularly fluid workflow in my Python environment.</p>
<h3 id="conda_1"><code>conda</code></h3>
<p>You saw earlier that <code>conda</code> treats environments, rather than Python versions, as the main method of working. <a href="https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html"><code>conda</code> has built-in support for managing virtual environments</a>:</p>
<div class="highlight sh"><pre><span></span><span class="c1"># Create virtual environment</span>
<span class="gp">$</span> conda create --name my-env <span class="nv">python</span><span class="o">=</span><span class="m">3</span>.7.3
<span class="c1"># Activate virtual environment</span>
<span class="gp">$</span> conda activate my-env
<span class="c1"># Exit virtual environment</span>
<span class="gp">(my-env)$</span> conda deactivate
</pre></div>
<p><code>conda</code> will install the specified version of Python if it isn’t already installed, so you don’t have to run <code>conda install python=3.7.3</code> first.</p>
<h3 id="pipenv"><code>pipenv</code></h3>
<p><a href="https://docs.pipenv.org/en/latest/"><code>pipenv</code></a> is a relatively new tool that seeks to combine package management (more on this in a moment) with virtual environment management. It mostly abstracts the virtual environment management from you, which can be great as long as things go smoothly:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">cd</span> /Users/dhillard/myproj
<span class="c1"># Create virtual environment</span>
<span class="gp">$</span> pipenv install
<span class="go">Creating a virtualenv for this project…</span>
<span class="go">Pipfile: /Users/dhillard/myproj/Pipfile</span>
<span class="go">Using /path/to/pipenv/python3.7 (3.7.3) to create virtualenv…</span>
<span class="go">✔ Successfully created virtual environment!</span>
<span class="go">Virtualenv location: /Users/dhillard/.local/share/virtualenvs/myproj-nAbMEAt0</span>
<span class="go">Creating a Pipfile for this project…</span>
<span class="go">Pipfile.lock not found, creating…</span>
<span class="go">Locking [dev-packages] dependencies…</span>
<span class="go">Locking [packages] dependencies…</span>
<span class="go">Updated Pipfile.lock (a65489)!</span>
<span class="go">Installing dependencies from Pipfile.lock (a65489)…</span>
<span class="go">  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 0/0 &mdash; 00:00:00</span>
<span class="go">To activate this project's virtualenv, run pipenv shell.</span>
<span class="go">Alternatively, run a command inside the virtualenv with pipenv run.</span>
<span class="go">// Activate virtual environment (uses a subshell)</span>
<span class="gp">$</span> pipenv shell
<span class="go">Launching subshell in virtual environment…</span>
<span class="go"> . /Users/dhillard/.local/share/virtualenvs/myproj-nAbMEAt0/bin/activate</span>
<span class="go">// Exit virtual environment (by exiting subshell)</span>
<span class="gp">(myproj-nAbMEAt0)$</span> <span class="nb">exit</span>
</pre></div>
<p><code>pipenv</code> does all the heavy lifting of creating a virtual environment and activating it for you. If you look carefully, you can see that it also creates a file called <code>Pipfile</code>. After you first run <code>pipenv install</code>, this file contains just a few things:</p>
<div class="highlight ini"><pre><span></span><span class="k">[[source]]</span>
<span class="na">name</span> <span class="o">=</span> <span class="s">"pypi"</span>
<span class="na">url</span> <span class="o">=</span> <span class="s">"https://pypi.org/simple"</span>
<span class="na">verify_ssl</span> <span class="o">=</span> <span class="s">true</span>
<span class="k">[dev-packages]</span>
<span class="k">[packages]</span>
<span class="k">[requires]</span>
<span class="na">python_version</span> <span class="o">=</span> <span class="s">"3.7"</span>
</pre></div>
<p>In particular, note that it shows <code>python_version = "3.7"</code>. By default, <code>pipenv</code> creates a virtual Python environment using the same Python version it was installed under. If you want to use a different Python version, then you can create the <code>Pipfile</code> yourself before running <code>pipenv install</code> and specify the version you want. If you have <code>pyenv</code> installed, then <code>pipenv</code> will use it to install the specified Python version if necessary.</p>
<p>Abstracting virtual environment management is a noble goal of <code>pipenv</code>, but it does get hung up with hard-to-read errors occasionally. Give it a try, but don’t worry if you feel confused or overwhelmed by it. The tool, documentation, and community will grow and improve around it as it matures.</p>
<p>To get an in-depth introduction to virtual environments, be sure to read <a href="https://realpython.com/python-virtual-environments-a-primer">Python Virtual Environments: A Primer</a>.</p>
<h2 id="package-management">Package Management</h2>
<p>For many of the projects you work on, you’ll probably need some number of third-party packages. Those packages may have their own dependencies in turn. In the early days of Python, using packages involved manually downloading files and pointing Python at them. Today, we’re fortunate to have a variety of package management tools available to us.</p>
<p>Most package managers work in tandem with virtual environments, isolating the packages you install in one Python environment from another. Using the two together is where you really start to see the power of the tools available to you.</p>
<h3 id="pip"><code>pip</code></h3>
<p><code>pip</code> (<strong>p</strong>ip <strong>i</strong>nstalls <strong>p</strong>ackages) has been the de facto standard for package management in Python for several years. It was heavily inspired by an earlier tool called <code>easy_install</code>. Python incorporated <code>pip</code> into the standard distribution starting in version 3.4. <code>pip</code> automates the process of downloading packages and making Python aware of them.</p>
<p>If you have multiple virtual environments, then you can see that they’re isolated by installing a few packages in one:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pyenv virtualenv <span class="m">3</span>.7.3 proj1
<span class="gp">$</span> pyenv activate proj1
<span class="gp">(proj1)$</span> pip list
<span class="go">Package Version</span>
<span class="go">---------- ---------</span>
<span class="go">pip 19.1.1</span>
<span class="go">setuptools 40.8.0</span>
<span class="gp">(proj1)$</span> python -m pip install requests
<span class="go">Collecting requests</span>
<span class="go"> Downloading .../requests-2.22.0-py2.py3-none-any.whl (57kB)</span>
<span class="go"> 100% |████████████████████████████████| 61kB 2.2MB/s</span>
<span class="go">Collecting chardet<3.1.0,>=3.0.2 (from requests)</span>
<span class="go"> Downloading .../chardet-3.0.4-py2.py3-none-any.whl (133kB)</span>
<span class="go"> 100% |████████████████████████████████| 143kB 1.7MB/s</span>
<span class="go">Collecting certifi>=2017.4.17 (from requests)</span>
<span class="go"> Downloading .../certifi-2019.6.16-py2.py3-none-any.whl (157kB)</span>
<span class="go"> 100% |████████████████████████████████| 163kB 6.0MB/s</span>
<span class="go">Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests)</span>
<span class="go"> Downloading .../urllib3-1.25.3-py2.py3-none-any.whl (150kB)</span>
<span class="go"> 100% |████████████████████████████████| 153kB 1.7MB/s</span>
<span class="go">Collecting idna<2.9,>=2.5 (from requests)</span>
<span class="go"> Downloading .../idna-2.8-py2.py3-none-any.whl (58kB)</span>
<span class="go"> 100% |████████████████████████████████| 61kB 26.6MB/s</span>
<span class="go">Installing collected packages: chardet, certifi, urllib3, idna, requests</span>
<span class="go">Successfully installed packages</span>
<span class="gp">(proj1)$</span> pip list
<span class="go">Package Version</span>
<span class="go">---------- ---------</span>
<span class="go">certifi 2019.6.16</span>
<span class="go">chardet 3.0.4</span>
<span class="go">idna 2.8</span>
<span class="go">pip 19.1.1</span>
<span class="go">requests 2.22.0</span>
<span class="go">setuptools 40.8.0</span>
<span class="go">urllib3 1.25.3</span>
</pre></div>
<p><code>pip</code> installed <code>requests</code>, along with several packages it depends on. <code>pip list</code> shows you all the currently installed packages and their versions.</p>
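<p>If you ever want the same information from inside Python, the following is a minimal sketch that assumes Python 3.8 or later, where <code>importlib.metadata</code> is part of the standard library. The helper name <code>installed_packages()</code> is made up for this illustration and isn’t part of <code>pip</code>:</p>

```python
from importlib import metadata  # standard library since Python 3.8


def installed_packages():
    """Return sorted (name, version) pairs for the current environment."""
    pairs = {
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    }
    return sorted(pairs, key=lambda pair: pair[0].lower())


# Print a rough equivalent of the `pip list` table.
for name, version in installed_packages():
    print(f"{name:<20} {version}")
```

<p>Run it inside an activated virtual environment and it should only report the packages installed in that environment, which is another way to see the isolation at work.</p>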
<div class="alert alert-primary" role="alert">
<p><strong>Warning</strong>: You can uninstall packages using <code>pip uninstall requests</code>, for example, but this will <em>only</em> uninstall <code>requests</code>—not any of its dependencies.</p>
</div>
<p>A common way to specify project dependencies for <code>pip</code> is with a <code>requirements.txt</code> file. Each line in the file specifies a package name and, optionally, the version to install:</p>
<div class="highlight pyreq"><pre><span></span><span class="n">scipy</span><span class="o">==</span><span class="mf">1.3</span><span class="o">.</span><span class="mi">0</span>
<span class="n">requests</span><span class="o">==</span><span class="mf">2.22</span><span class="o">.</span><span class="mi">0</span>
</pre></div>
<p>You can then run <code>python -m pip install -r requirements.txt</code> to install all of the specified dependencies at once. For more on <code>pip</code>, see <a href="https://realpython.com/what-is-pip/">What is Pip? A Guide for New Pythonistas</a>.</p>
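<p>To make the file format concrete, here’s a small sketch of how the pinned lines above break down into names and versions. <code>parse_requirements()</code> is a hypothetical helper written for this example. It only understands comments and exact <code>==</code> pins, while real requirements files support much more, so treat it as an illustration rather than a replacement for <code>pip</code>’s own parser:</p>

```python
def parse_requirements(text):
    """Split simple 'name==version' lines into (name, version) pairs.

    Handles only blank lines, '#' comments, and exact '==' pins. Real
    requirements files allow much more (extras, markers, URLs, options).
    """
    pins = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        name, _, version = line.partition("==")
        # An unpinned line such as "flask" gets None as its version.
        pins.append((name.strip(), version.strip() or None))
    return pins


sample = """\
scipy==1.3.0
requests==2.22.0
"""
print(parse_requirements(sample))
# [('scipy', '1.3.0'), ('requests', '2.22.0')]
```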
<h3 id="pipenv_1"><code>pipenv</code></h3>
<p><a href="https://docs.pipenv.org/en/latest/"><code>pipenv</code></a> has most of the same basic operations as <code>pip</code> but thinks about packages a bit differently. Remember the <code>Pipfile</code> that <code>pipenv</code> creates? When you install a package, <code>pipenv</code> adds that package to <code>Pipfile</code> and also adds more detailed information to a new <strong>lock file</strong> called <code>Pipfile.lock</code>. Lock files act as a snapshot of the precise set of packages installed, including direct dependencies as well as their sub-dependencies.</p>
<p>You can see <code>pipenv</code> sorting out the package management when you install a package:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pipenv install requests
<span class="go">Installing requests…</span>
<span class="go">Adding requests to Pipfile's [packages]…</span>
<span class="go">✔ Installation Succeeded</span>
<span class="go">Pipfile.lock (444a6d) out of date, updating to (a65489)…</span>
<span class="go">Locking [dev-packages] dependencies…</span>
<span class="go">Locking [packages] dependencies…</span>
<span class="go">✔ Success!</span>
<span class="go">Updated Pipfile.lock (444a6d)!</span>
<span class="go">Installing dependencies from Pipfile.lock (444a6d)…</span>
<span class="go"> 🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 — 00:00:00</span>
</pre></div>
<p><code>pipenv</code> will use this lock file, if present, to install the same set of packages. You can ensure that you always have the same set of working dependencies in any Python environment you create using this approach.</p>
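<p><code>Pipfile.lock</code> is a JSON document, so you can inspect the snapshot with nothing but the standard library. The sketch below works on a heavily trimmed, hypothetical lock file: the <code>SAMPLE_LOCK</code> string, the <code>flake8</code> version in it, and the <code>pinned_versions()</code> helper are all assumptions for illustration, and a real lock file also stores hashes and other metadata for every package:</p>

```python
import json

# Hypothetical, heavily trimmed lock file contents (hashes omitted).
SAMPLE_LOCK = """
{
  "_meta": {"requires": {"python_version": "3.7"}},
  "default": {
    "requests": {"version": "==2.22.0"},
    "idna": {"version": "==2.8"}
  },
  "develop": {
    "flake8": {"version": "==3.7.8"}
  }
}
"""


def pinned_versions(lock_text, section="default"):
    """Map package name -> pinned version for one section of a lock file."""
    lock = json.loads(lock_text)
    return {
        name: entry.get("version", "").lstrip("=")
        for name, entry in lock.get(section, {}).items()
    }


print(pinned_versions(SAMPLE_LOCK))
# {'requests': '2.22.0', 'idna': '2.8'}
print(pinned_versions(SAMPLE_LOCK, section="develop"))
# {'flake8': '3.7.8'}
```

<p>Notice that <code>idna</code> shows up even though you’d only ask for <code>requests</code>: the lock file records sub-dependencies alongside the packages you installed directly.</p>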
<p><code>pipenv</code> also distinguishes between <strong>development dependencies</strong> and <strong>production (regular) dependencies</strong>. You may need some tools during development, such as <a href="https://github.com/python/black"><code>black</code></a> or <a href="http://flake8.pycqa.org/en/latest/"><code>flake8</code></a>, that you don’t need when you run your application in production. You can specify that a package is for development when you install it:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pipenv install --dev flake8
<span class="go">Installing flake8…</span>
<span class="go">Adding flake8 to Pipfile's [dev-packages]…</span>
<span class="go">✔ Installation Succeeded</span>
<span class="go">...</span>
</pre></div>
<p><code>pipenv install</code> (without any arguments) will only install your production packages by default, but you can tell it to install development dependencies as well with <code>pipenv install --dev</code>.</p>
<h3 id="poetry"><code>poetry</code></h3>
<p><a href="https://poetry.eustace.io"><code>poetry</code></a> addresses additional facets of package management, including creating and publishing your own packages. After installing <code>poetry</code>, you can use it to create a new project:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> poetry new myproj
<span class="go">Created package myproj in myproj</span>
<span class="gp">$</span> ls myproj/
<span class="go">README.rst myproj pyproject.toml tests</span>
</pre></div>
<p>Similarly to how <code>pipenv</code> creates the <code>Pipfile</code>, <code>poetry</code> creates a <code>pyproject.toml</code> file. This file, defined by a <a href="https://www.python.org/dev/peps/pep-0518/#file-format">recent standard</a>, contains metadata about the project as well as dependency versions:</p>
<div class="highlight ini"><pre><span></span><span class="k">[tool.poetry]</span>
<span class="na">name</span> <span class="o">=</span> <span class="s">"myproj"</span>
<span class="na">version</span> <span class="o">=</span> <span class="s">"0.1.0"</span>
<span class="na">description</span> <span class="o">=</span> <span class="s">""</span>
<span class="na">authors</span> <span class="o">=</span> <span class="s">["Dane Hillard <github@danehillard.com>"]</span>
<span class="k">[tool.poetry.dependencies]</span>
<span class="na">python</span> <span class="o">=</span> <span class="s">"^3.7"</span>
<span class="k">[tool.poetry.dev-dependencies]</span>
<span class="na">pytest</span> <span class="o">=</span> <span class="s">"^3.0"</span>
<span class="k">[build-system]</span>
<span class="na">requires</span> <span class="o">=</span> <span class="s">["poetry>=0.12"]</span>
<span class="na">build-backend</span> <span class="o">=</span> <span class="s">"poetry.masonry.api"</span>
</pre></div>
<p>You can install packages with <code>poetry add</code> (or as development dependencies with <code>poetry add --dev</code>):</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> poetry add requests
<span class="go">Using version ^2.22 for requests</span>
<span class="go">Updating dependencies</span>
<span class="go">Resolving dependencies... (0.2s)</span>
<span class="go">Writing lock file</span>
<span class="go">Package operations: 5 installs, 0 updates, 0 removals</span>
<span class="go"> - Installing certifi (2019.6.16)</span>
<span class="go"> - Installing chardet (3.0.4)</span>
<span class="go"> - Installing idna (2.8)</span>
<span class="go"> - Installing urllib3 (1.25.3)</span>
<span class="go"> - Installing requests (2.22.0)</span>
</pre></div>
<p><code>poetry</code> also maintains a lock file, and it has a benefit over <code>pipenv</code> because it keeps track of which packages are subdependencies. As a result, you can uninstall <code>requests</code> <em>and</em> its dependencies with <code>poetry remove requests</code>.</p>
<h3 id="conda_2"><code>conda</code></h3>
<p>With <code>conda</code>, you can use <code>pip</code> to install packages as usual, but you can also use <code>conda install</code> to install packages from different <strong>channels</strong>, which are collections of packages provided by Anaconda or other providers. To install <code>requests</code> from the <code>conda-forge</code> channel, you can run <code>conda install -c conda-forge requests</code>.</p>
<p>Learn more about package management in <code>conda</code> in <a href="https://realpython.com/python-windows-machine-learning-setup/">Setting Up Python for Machine Learning on Windows</a>.</p>
<h2 id="python-interpreters">Python Interpreters</h2>
<p>If you’re interested in further customization of your Python environment, you can choose the command line experience you have when interacting with Python. The Python interpreter provides a <strong>read-eval-print loop</strong> (REPL), which is what comes up when you type <code>python</code> with no arguments in your shell:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="go">Python 3.7.3 (default, Jun 17 2019, 14:09:05)</span>
<span class="go">[Clang 10.0.1 (clang-1001.0.46.4)] on darwin</span>
<span class="go">Type "help", "copyright", "credits" or "license" for more information.</span>
<span class="gp">>>> </span><span class="mi">2</span> <span class="o">+</span> <span class="mi">2</span>
<span class="go">4</span>
<span class="gp">>>> </span><span class="n">exit</span><span class="p">()</span>
</pre></div>
<p>The REPL <strong>reads</strong> what you type, <strong>evaluates</strong> it as Python code, and <strong>prints</strong> the result. Then it waits to do it all over again. This is about as much as the default Python REPL provides, which is sufficient for a good portion of typical work.</p>
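<p>The cycle itself is simple enough to sketch in a few lines. The toy function below is not how the real interpreter is implemented. It’s a hypothetical <code>read_eval_print()</code> helper that handles expressions only, and it exists purely to show the three steps in order:</p>

```python
def read_eval_print(lines):
    """Feed each line through a toy read-eval-print cycle.

    Returns the text a REPL would echo for each line. Expressions only:
    statements such as assignments are not handled in this sketch.
    """
    namespace = {}
    echoed = []
    for line in lines:                        # read
        try:
            result = eval(line, namespace)    # evaluate
        except Exception as error:
            echoed.append(f"{type(error).__name__}: {error}")
            continue
        if result is not None:                # print (None is not echoed)
            echoed.append(repr(result))
    return echoed


for output in read_eval_print(["2 + 2", "'py' * 2", "1 / 0"]):
    print(output)
# 4
# 'pypy'
# ZeroDivisionError: division by zero
```

<p>If you want a fuller REPL that you can embed in your own programs, the standard library’s <code>code</code> module provides one.</p>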
<h3 id="ipython">IPython</h3>
<p>Like Anaconda, <a href="https://ipython.org/">IPython</a> is a suite of tools supporting more than just Python, but one of its main features is an alternative Python REPL. IPython’s REPL numbers each command and explicitly labels each command’s input and output. After installing IPython (<code>python -m pip install ipython</code>), you can run the <code>ipython</code> command in place of the <code>python</code> command to use the IPython REPL:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="go">Python 3.7.3</span>
<span class="go">Type 'copyright', 'credits' or 'license' for more information</span>
<span class="go">IPython 6.0.0.dev -- An enhanced Interactive Python. Type '?' for help.</span>
<span class="gp">In [1]: </span><span class="mi">2</span> <span class="o">+</span> <span class="mi">2</span>
<span class="gh">Out[1]: </span><span class="go">4</span>
<span class="gp">In [2]: </span><span class="nb">print</span><span class="p">(</span><span class="s2">"Hello!"</span><span class="p">)</span>
<span class="go">Hello!</span>
</pre></div>
<p>IPython also supports <span class="keys"><kbd class="key-tab">Tab</kbd></span> completion, more powerful help features, and strong integration with other tooling such as <a href="https://matplotlib.org/"><code>matplotlib</code></a> for graphing. IPython provided the foundation for <a href="https://jupyter.org/">Jupyter</a>, and both have been used extensively in the data science community because of their integration with other tools.</p>
<p>The IPython REPL is <a href="https://ipython.readthedocs.io/en/stable/config/intro.html">highly configurable</a> too, so while it falls just shy of being a full development environment, it can still be a boon to your productivity. Its built-in and customizable <a href="https://ipython.org/ipython-doc/3/interactive/tutorial.html#magic-functions">magic commands</a> are worth checking out.</p>
<h3 id="bpython"><code>bpython</code></h3>
<p><a href="https://bpython-interpreter.org"><code>bpython</code></a> is another alternative REPL that provides inline syntax highlighting, tab completion, and even auto-suggestions as you type. It provides quite a few of the quick benefits of IPython without altering the interface much. Without the weight of the integrations and so on, <code>bpython</code> might be good to add to your repertoire for a while to see how it improves your use of the REPL.</p>
<h2 id="text-editors">Text Editors</h2>
<p>You spend a third of your life sleeping, so it makes sense to invest in a great bed. As a developer, you spend a great deal of your time reading and writing code, so it follows that you should invest time in setting up your Python environment’s text editor just the way you like it.</p>
<p>Each editor offers a different set of key bindings and model for manipulating text. Some require a mouse to interact with them effectively, whereas others can be controlled with only the keyboard. Some people consider their choice of text editor and customizations some of the most personal decisions they make!</p>
<p>There are so many options to choose from in this arena that I won’t attempt to cover them in detail here. Check out <a href="https://realpython.com/python-ides-code-editors-guide/">Python IDEs and Code Editors (Guide)</a> for a broad overview. A good strategy is to find a simple, small text editor for quick changes and a full-featured IDE for more involved work. <a href="https://www.vim.org/">Vim</a> and <a href="https://www.jetbrains.com/pycharm/">PyCharm</a>, respectively, are my editors of choice.</p>
<h2 id="python-environment-tips-and-tricks">Python Environment Tips and Tricks</h2>
<p>Once you’ve made the big decisions about your Python environment, the rest of the road is paved with little tweaks to make your life a little easier. These tweaks each save minutes or seconds alone, but they collectively save you hours of time.</p>
<p>Making a certain activity easier reduces your cognitive load so you can focus on the task at hand instead of the logistics surrounding it. If you notice yourself performing an action over and over, then consider automating it. Use <a href="https://xkcd.com/1205/">this wonderful chart</a> from XKCD to determine if it’s worth automating a particular task.</p>
<p>Here are a few final tips.</p>
<p><strong>Know your current virtual environment</strong></p>
<p>As mentioned earlier, it’s a great idea to display the active Python version or virtual environment in your command prompt. Most tools will do this for you, but if not (or if you want to customize the prompt), the value is usually contained in the <code>VIRTUAL_ENV</code> environment variable.</p>
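<p>You can read the same information from within Python. This is a rough sketch, and <code>describe_environment()</code> is a name invented for the example. It relies on <code>VIRTUAL_ENV</code> being set by the activation script and falls back to comparing <code>sys.prefix</code> with <code>sys.base_prefix</code>, which differ when the interpreter belongs to a virtual environment created by <code>venv</code>:</p>

```python
import os
import sys


def describe_environment():
    """Return a short label for the active Python environment."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    virtual_env = os.environ.get("VIRTUAL_ENV")  # set by activate scripts
    if virtual_env:
        # Use the directory name, e.g. "proj1" for /path/to/envs/proj1
        name = os.path.basename(virtual_env.rstrip(os.sep))
        return f"({name}) Python {version}"
    if sys.prefix != sys.base_prefix:
        # Running a venv's interpreter directly without activating it
        return f"(unactivated venv) Python {version}"
    return f"Python {version}"


print(describe_environment())
```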
<p><strong>Disable unnecessary, temporary files</strong></p>
<p>Have you ever noticed <code>*.pyc</code> files all over your project directories? These files are pre-compiled Python bytecode—they help Python start your application faster. In production, these are a great idea because they’ll give you some performance gain. During local development, however, they’re rarely useful. Set <code>PYTHONDONTWRITEBYTECODE=1</code> to disable this behavior. If you find use cases for them later, then you can easily remove this from your Python environment.</p>
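<p>To confirm that the setting took effect, you can ask the interpreter directly. In this short sketch, <code>sys.dont_write_bytecode</code> reflects the environment variable (or the <code>-B</code> command line option), and <code>bytecode_status()</code> is just a hypothetical helper for the example:</p>

```python
import os
import sys


def bytecode_status():
    """Report whether this interpreter will skip writing .pyc files."""
    return {
        # True when PYTHONDONTWRITEBYTECODE is set to a non-empty string
        # or when Python was started with the -B option.
        "disabled": sys.dont_write_bytecode,
        "env_value": os.environ.get("PYTHONDONTWRITEBYTECODE"),
    }


print(bytecode_status())
```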
<p><strong>Customize your Python interpreter</strong></p>
<p>You can affect how the REPL behaves using a <strong>startup file</strong>. Python will read this startup file and execute the code it contains before entering the REPL. Set the <code>PYTHONSTARTUP</code> environment variable to the path of your startup file. (Mine’s at <code>~/.pystartup</code>.) If you’d like to hit <span class="keys"><kbd class="key-arrow-up">Up</kbd></span> for command history and <span class="keys"><kbd class="key-tab">Tab</kbd></span> for completion like your shell provides, then give <a href="https://github.com/daneah/dotfiles/blob/master/source/pystartup">this startup file</a> a try.</p>
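<p>If you’d rather write your own, a startup file along these lines is a reasonable starting point. It’s a sketch under a few assumptions: the history path is arbitrary, the <code>readline</code> module isn’t available on every platform (so the import is guarded), and on systems where <code>readline</code> is backed by <code>libedit</code> the key binding syntax may need to be different. Recent Python versions also enable history and completion on their own when <code>readline</code> is present, so you may only need this for customization:</p>

```python
# Hypothetical contents for a startup file such as ~/.pystartup
import atexit
import os

try:
    import readline
    import rlcompleter  # noqa: F401 -- importing it installs the completer
except ImportError:  # for example, on a stock Windows install
    readline = None

# Assumed location for this sketch; any writable path works.
HISTORY_FILE = os.path.expanduser("~/.pystartup_history")


def _save_history(path):
    """Write the session history, ignoring file system problems."""
    try:
        readline.write_history_file(path)
    except OSError:
        pass  # never let a history problem break interpreter exit


def setup_repl(history_file=HISTORY_FILE):
    """Enable Tab completion and persistent history when readline exists."""
    if readline is None:
        return False
    readline.parse_and_bind("tab: complete")
    try:
        readline.read_history_file(history_file)
    except OSError:
        pass  # no history yet on the first run
    atexit.register(_save_history, history_file)
    return True


setup_repl()
```

<p>Because the startup file runs in the same namespace as your interactive session, any names it defines (such as <code>setup_repl</code> here) will be visible in the REPL afterward.</p>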
<h2 id="conclusion">Conclusion</h2>
<p>You learned about many facets of the typical Python environment. Armed with this knowledge, you can:</p>
<ul>
<li>Choose a terminal with the aesthetics and enhanced features you like</li>
<li>Choose a shell with as many (or as few) customization options as you need</li>
<li>Manage multiple versions of Python on your system</li>
<li>Manage multiple projects that use a single version of Python, using virtual Python environments</li>
<li>Install packages in your virtual environments</li>
<li>Choose a REPL that suits your interactive coding needs</li>
</ul>
<p>When you’ve got your Python environment just so, I hope you’ll share screenshots, screencasts, or blog posts about your perfect setup ✨</p>
<hr />
<p><em>[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&utm_medium=rss&utm_campaign=footer">>> Click here to learn more and see examples</a> ]</em></p>
Traditional Face Detection With Pythonhttps://realpython.com/courses/traditional-face-detection-python/2019-08-13T14:00:00+00:00In this course on face detection with Python, you'll learn about a historically important algorithm for object detection that can be successfully applied to finding the location of a human face within an image.
<p><strong>Computer vision</strong> is an exciting and growing field. There are tons of interesting problems to solve! One of them is face detection: the ability of a computer to recognize that a photograph contains a human face, and tell you where it is located. In this course, you’ll learn about <strong>face detection</strong> with Python.</p>
<p>To detect any object in an image, it is necessary to understand how images are represented inside a computer, and how that object differs <em>visually</em> from any other object.</p>
<p>Once that is done, the process of scanning an image and looking for those visual cues needs to be automated and optimized. All these steps come together to form a fast and reliable computer vision algorithm.</p>
<p><strong>In this course, you’ll learn:</strong></p>
<ul>
<li>What face detection is</li>
<li>How computers understand features in images</li>
<li>How to quickly analyze many different features to reach a decision</li>
<li>How to use a minimal Python solution for detecting human faces in images</li>
</ul>
Your Guide to the Python Print Functionhttps://realpython.com/python-print/2019-08-12T14:00:00+00:00In this step-by-step tutorial, you'll learn about the print() function in Python and discover some of its lesser-known features. Avoid common mistakes, take your "hello world" to the next level, and know when to use a better alternative.
<p>If you’re like most Python users, including me, then you probably started your Python journey by learning about <code>print()</code>. It helped you write your very own <code>hello world</code> one-liner. You can use it to display formatted messages onto the screen and perhaps find some bugs. But if you think that’s all there is to know about Python’s <code>print()</code> function, then you’re missing out on a lot!</p>
<p>Keep reading to take full advantage of this seemingly boring and unappreciated little function. This tutorial will get you up to speed with using Python <code>print()</code> effectively. However, prepare for a deep dive as you go through the sections. You may be surprised how much <code>print()</code> has to offer!</p>
<p><strong>By the end of this tutorial, you’ll know how to:</strong></p>
<ul>
<li>Avoid common mistakes with Python’s <code>print()</code></li>
<li>Deal with newlines, character encodings, and buffering</li>
<li>Write text to files</li>
<li>Mock <code>print()</code> in unit tests</li>
<li>Build advanced user interfaces in the terminal</li>
</ul>
<p>If you’re a complete beginner, then you’ll benefit most from reading the first part of this tutorial, which illustrates the essentials of printing in Python. Otherwise, feel free to skip that part and jump around as you see fit.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> <code>print()</code> was a major addition to Python 3, in which it replaced the old <code>print</code> statement available in Python 2.</p>
<p>There were a number of good reasons for that, as you’ll see shortly. Although this tutorial focuses on Python 3, it does show the old way of printing in Python for reference.</p>
</div>
<div class="alert alert-warning" role="alert"><p><strong>Free Bonus:</strong> <a href="" class="alert-link" data-toggle="modal" data-target="#modal-python-cheat-sheet-experiment" data-focus="false">Click here to get our free Python Cheat Sheet</a> that shows you the basics of Python 3, like working with data types, dictionaries, lists, and Python functions.</p></div>
<h2 id="printing-in-a-nutshell">Printing in a Nutshell</h2>
<p>Let’s jump in by looking at a few real-life examples of printing in Python. By the end of this section, you’ll know every possible way of calling <code>print()</code>. Or, in programmer lingo, you’ll be familiar with its <strong>function signature</strong>.</p>
<h3 id="calling-print">Calling Print</h3>
<p>The simplest example of using Python <code>print()</code> requires just a few keystrokes:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">()</span>
</pre></div>
<p>You don’t pass any arguments, but you still need to put empty parentheses at the end, which tell Python to actually <a href="https://realpython.com/lessons/example-function/">execute the function</a> rather than just refer to it by name.</p>
<p>This will produce an invisible newline character, which in turn will cause a blank line to appear on your screen. You can call <code>print()</code> multiple times like this to add vertical space. It’s just as if you were hitting <span class="keys"><kbd class="key-enter">Enter</kbd></span> on your keyboard in a word processor.</p>
<div class="card mb-3" id="collapse_card5787b1">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapse5787b1" aria-expanded="false" aria-controls="collapse5787b1">Newline Character</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapse5787b1" aria-expanded="false" aria-controls="collapse5787b1">Show/Hide</button></p></div>
<div id="collapse5787b1" class="collapse" data-parent="#collapse_card5787b1"><div class="card-body" markdown="1">
<p>A <strong>newline character</strong> is a special control character used to indicate the end of a line (EOL). It usually doesn’t have a visible representation on the screen, but some text editors can display such non-printable characters with little graphics.</p>
<p>The word “character” is somewhat of a misnomer in this case, because a newline is often more than one character long. For example, the Windows operating system, as well as the HTTP protocol, represent newlines with a pair of characters. Sometimes you need to take those differences into account to design truly portable programs.</p>
<p>To find out what constitutes a newline in your operating system, use Python’s built-in <code>os</code> module.</p>
<p>This will immediately tell you that <strong>Windows</strong> and <strong>DOS</strong> represent the newline as a sequence of <code>\r</code> followed by <code>\n</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">os</span>
<span class="gp">>>> </span><span class="n">os</span><span class="o">.</span><span class="n">linesep</span>
<span class="go">'\r\n'</span>
</pre></div>
<p>On <strong>Unix</strong>, <strong>Linux</strong>, and recent versions of <strong>macOS</strong>, it’s a single <code>\n</code> character:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">os</span>
<span class="gp">>>> </span><span class="n">os</span><span class="o">.</span><span class="n">linesep</span>
<span class="go">'\n'</span>
</pre></div>
<p>The classic <strong>Mac OS</strong> (the versions that preceded Mac OS X), however, sticks to its own “think different” philosophy by choosing yet another representation:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">os</span>
<span class="gp">>>> </span><span class="n">os</span><span class="o">.</span><span class="n">linesep</span>
<span class="go">'\r'</span>
</pre></div>
<p>Notice how these characters appear in string literals. They use special syntax with a preceding backslash (<code>\</code>) to denote the start of an <strong>escape character sequence</strong>. Such sequences allow for representing control characters, which would be otherwise invisible on screen.</p>
<p>Most programming languages come with a predefined set of escape sequences for special characters such as these:</p>
<ul>
<li><strong><code>\\</code>:</strong> backslash</li>
<li><strong><code>\b</code>:</strong> backspace</li>
<li><strong><code>\t</code>:</strong> tab</li>
<li><strong><code>\r</code>:</strong> carriage return (CR)</li>
<li><strong><code>\n</code>:</strong> newline, also known as line feed (LF)</li>
</ul>
<p>The last two are reminiscent of mechanical typewriters, which required two separate commands to insert a newline. The first command would move the carriage back to the beginning of the current line, while the second one would advance the roll to the next line.</p>
<p>By comparing the corresponding <strong>ASCII character codes</strong>, you’ll see that putting a backslash in front of a character changes its meaning completely. However, not all characters allow for this–only the special ones.</p>
<p>To compare ASCII character codes, you may want to use the built-in <code>ord()</code> function:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">ord</span><span class="p">(</span><span class="s1">'r'</span><span class="p">)</span>
<span class="go">114</span>
<span class="gp">>>> </span><span class="nb">ord</span><span class="p">(</span><span class="s1">'</span><span class="se">\r</span><span class="s1">'</span><span class="p">)</span>
<span class="go">13</span>
</pre></div>
<p>Keep in mind that, in order to form a correct escape sequence, there must be no space between the backslash character and a letter!</p>
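<p>As a quick check of the escape sequences listed above, the following sketch prints each one alongside its character code. The <code>escapes</code> and <code>codes</code> names are only illustrative and aren’t part of any library:</p>

```python
# Each escape sequence below stands for exactly one character,
# even though it takes two characters to type in a string literal.
escapes = {
    'backslash': '\\',
    'backspace': '\b',
    'tab': '\t',
    'carriage return': '\r',
    'newline': '\n',
}

for name, char in escapes.items():
    # repr() shows the escaped form; ord() shows the character code.
    print(f'{name:<16} {char!r:<6} length={len(char)} code={ord(char)}')

# Collect the character codes for later inspection.
codes = {name: ord(char) for name, char in escapes.items()}
```

<p>Every entry reports a length of one, which confirms that the backslash and the letter after it form a single character.</p>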
</div></div>
</div>
<p>As you just saw, calling <code>print()</code> without arguments results in a <strong>blank line</strong>, which is a line consisting solely of the newline character. Don’t confuse this with an <strong>empty line</strong>, which doesn’t contain any characters at all, not even the newline!</p>
<p>You can use Python’s <a href="https://realpython.com/python-strings/">string</a> literals to visualize these two:</p>
<div class="highlight python"><pre><span></span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span> <span class="c1"># Blank line</span>
<span class="s1">''</span> <span class="c1"># Empty line</span>
</pre></div>
<p>The first one is one character long, whereas the second one has no content.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> To remove the newline character from a string in Python, use its <code>.rstrip()</code> method, like this:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="s1">'A line of text.</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">rstrip</span><span class="p">()</span>
<span class="go">'A line of text.'</span>
</pre></div>
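<p>Called without arguments, this strips all trailing whitespace, not just the newline, from the right edge of the string. To remove only newline characters, pass <code>'\n'</code> as the argument.</p>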
</div>
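<p>To see how the argument changes what <code>.rstrip()</code> removes, here is a minimal sketch. The sample string and variable names are made up for illustration:</p>

```python
line = 'A line of text. \t\n'

# With no argument, .rstrip() removes every trailing whitespace character.
stripped_all = line.rstrip()

# Passing '\n' limits the stripping to trailing newline characters.
stripped_newline = line.rstrip('\n')

print(repr(stripped_all))
print(repr(stripped_newline))
```

<p>The second call keeps the trailing space and tab, which matters when that whitespace is meaningful, for example in tab-separated data.</p>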
<p>In a more common scenario, you’d want to communicate some message to the end user. There are a few ways to achieve this.</p>
<p>First, you may pass a string literal directly to <code>print()</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'Please wait while the program is loading...'</span><span class="p">)</span>
</pre></div>
<p>This will print the message verbatim onto the screen.</p>
<div class="card mb-3" id="collapse_cardf395ab">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapsef395ab" aria-expanded="false" aria-controls="collapsef395ab">String Literals</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapsef395ab" aria-expanded="false" aria-controls="collapsef395ab">Show/Hide</button></p></div>
<div id="collapsef395ab" class="collapse" data-parent="#collapse_cardf395ab"><div class="card-body" markdown="1">
<p><strong>String literals</strong> in Python can be enclosed either in single quotes (<code>'</code>) or double quotes (<code>"</code>). According to the official <a href="https://www.python.org/dev/peps/pep-0008/#string-quotes">PEP 8</a> style guide, you should just pick one and keep using it consistently. There’s no difference, unless you need to nest one in another.</p>
<p>For example, you can’t use double quotes for the literal and also include double quotes inside of it, because that’s ambiguous for the Python interpreter:</p>
<div class="highlight python"><pre><span></span><span class="s2">"My favorite book is "</span><span class="n">Python</span> <span class="n">Tricks</span><span class="s2">""</span> <span class="c1"># Wrong!</span>
</pre></div>
<p>What you want to do is enclose the text, which contains double quotes, within single quotes:</p>
<div class="highlight python"><pre><span></span><span class="s1">'My favorite book is "Python Tricks"'</span>
</pre></div>
<p>The same trick would work the other way around:</p>
<div class="highlight python"><pre><span></span><span class="s2">"My favorite book is 'Python Tricks'"</span>
</pre></div>
<p>Alternatively, you could use escape character sequences mentioned earlier, to make Python treat those internal double quotes literally as part of the string literal:</p>
<div class="highlight python"><pre><span></span><span class="s2">"My favorite book is </span><span class="se">\"</span><span class="s2">Python Tricks</span><span class="se">\"</span><span class="s2">"</span>
</pre></div>
<p>Escaping is fine and dandy, but it can sometimes get in the way. Specifically, when you need your string to contain relatively many backslash characters in literal form.</p>
<p>One classic example is a file path on Windows:</p>
<div class="highlight python"><pre><span></span><span class="s1">'C:\Users\jdoe'</span> <span class="c1"># Wrong!</span>
<span class="s1">'C:</span><span class="se">\\</span><span class="s1">Users</span><span class="se">\\</span><span class="s1">jdoe'</span>
</pre></div>
<p>Notice how each backslash character needs to be escaped with yet another backslash.</p>
<p>This is even more prominent with regular expressions, which quickly get convoluted due to the heavy use of special characters:</p>
<div class="highlight python"><pre><span></span><span class="s1">'^</span><span class="se">\\</span><span class="s1">w:</span><span class="se">\\\\</span><span class="s1">(?:(?:(?:[^</span><span class="se">\\\\</span><span class="s1">]+)?|(?:[^</span><span class="se">\\\\</span><span class="s1">]+)</span><span class="se">\\\\</span><span class="s1">[^</span><span class="se">\\\\</span><span class="s1">]+)*)$'</span>
</pre></div>
<p>Fortunately, you can turn off character escaping with the help of raw-string literals. Simply put an <code>r</code> or <code>R</code> prefix before the opening quote, and now you end up with this:</p>
<div class="highlight python"><pre><span></span><span class="sa">r</span><span class="s1">'C:\Users\jdoe'</span>
<span class="sa">r</span><span class="s1">'^\w:</span><span class="se">\\</span><span class="s1">(?:(?:(?:[^</span><span class="se">\\</span><span class="s1">]+)?|(?:[^</span><span class="se">\\</span><span class="s1">]+)</span><span class="se">\\</span><span class="s1">[^</span><span class="se">\\</span><span class="s1">]+)*)$'</span>
</pre></div>
<p>That’s much better, isn’t it?</p>
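<p>If you’d like to convince yourself that both spellings describe the same text, you can compare them directly. This is only a sketch, and the variable names are illustrative:</p>

```python
# The escaped literal and the raw literal produce identical strings.
escaped_path = 'C:\\Users\\jdoe'
raw_path = r'C:\Users\jdoe'

escaped_pattern = '^\\w:\\\\(?:(?:(?:[^\\\\]+)?|(?:[^\\\\]+)\\\\[^\\\\]+)*)$'
raw_pattern = r'^\w:\\(?:(?:(?:[^\\]+)?|(?:[^\\]+)\\[^\\]+)*)$'

print(escaped_path == raw_path)
print(escaped_pattern == raw_pattern)

# print() shows the actual characters, with single backslashes.
print(raw_path)
```

<p>One caveat is that a raw-string literal can’t end with a single backslash, because the backslash would still prevent the closing quote from ending the literal.</p>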
<p>There are a few more prefixes that give special meaning to string literals in Python, but you won’t get into them here.</p>
<p>Lastly, you can define multi-line string literals by enclosing them between <code>'''</code> or <code>"""</code>, which are often used as <strong>docstrings</strong>.</p>
<p>Here’s an example:</p>
<div class="highlight python"><pre><span></span><span class="sd">"""</span>
<span class="sd">This is an example</span>
<span class="sd">of a multi-line string</span>
<span class="sd">in Python.</span>
<span class="sd">"""</span>
</pre></div>
<p>To prevent an initial newline, simply put the text right after the opening <code>"""</code>:</p>
<div class="highlight python"><pre><span></span><span class="sd">"""This is an example</span>
<span class="sd">of a multi-line string</span>
<span class="sd">in Python.</span>
<span class="sd">"""</span>
</pre></div>
<p>You can also use a backslash to get rid of the newline:</p>
<div class="highlight python"><pre><span></span><span class="sd">"""\</span>
<span class="sd">This is an example</span>
<span class="sd">of a multi-line string</span>
<span class="sd">in Python.</span>
<span class="sd">"""</span>
</pre></div>
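<p>The effect of that backslash is easier to see when you compare the two strings side by side. The following sketch uses made-up variable names:</p>

```python
with_newline = """
This is an example
"""

# The backslash right after the opening quotes swallows the first newline.
without_newline = """\
This is an example
"""

print(repr(with_newline))
print(repr(without_newline))
```

<p>The only difference between the two values is the leading <code>'\n'</code> in the first one.</p>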
<p>To remove indentation from a multi-line string, you might take advantage of the <code>textwrap</code> module from the standard library:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">textwrap</span>
<span class="gp">>>> </span><span class="n">paragraph</span> <span class="o">=</span> <span class="s1">'''</span>
<span class="gp">... </span><span class="s1"> This is an example</span>
<span class="gp">... </span><span class="s1"> of a multi-line string</span>
<span class="gp">... </span><span class="s1"> in Python.</span>
<span class="gp">... </span><span class="s1"> '''</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">paragraph</span><span class="p">)</span>
<span class="go"> This is an example</span>
<span class="go"> of a multi-line string</span>
<span class="go"> in Python.</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">textwrap</span><span class="o">.</span><span class="n">dedent</span><span class="p">(</span><span class="n">paragraph</span><span class="p">)</span><span class="o">.</span><span class="n">strip</span><span class="p">())</span>
<span class="go">This is an example</span>
<span class="go">of a multi-line string</span>
<span class="go">in Python.</span>
</pre></div>
<p>This will take care of unindenting paragraphs for you. There are also a few other useful functions in <code>textwrap</code> for text alignment you’d find in a word processor.</p>
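<p>As a rough illustration of those other helpers, the sketch below tries <code>textwrap.fill()</code>, <code>textwrap.shorten()</code>, and <code>textwrap.indent()</code> on a sample sentence. The sample text and the chosen widths are arbitrary:</p>

```python
import textwrap

text = 'This is an example of a long line of text in Python.'

# Wrap the text so that no line is longer than 20 characters.
wrapped = textwrap.fill(text, width=20)

# Truncate the text to fit in 25 characters, ending with a placeholder.
shortened = textwrap.shorten(text, width=25, placeholder='...')

# Add a prefix to the beginning of every wrapped line.
indented = textwrap.indent(wrapped, prefix='> ')

print(wrapped)
print(shortened)
print(indented)
```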
</div></div>
</div>
<p>Secondly, you could extract that message into its own variable with a meaningful name to enhance readability and promote code reuse:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">message</span> <span class="o">=</span> <span class="s1">'Please wait while the program is loading...'</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">message</span><span class="p">)</span>
</pre></div>
<p>Lastly, you could pass an expression, like <a href="https://realpython.com/lessons/concatenating-joining-strings-python/">string concatenation</a>, to be evaluated before printing the result:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">os</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'Hello, '</span> <span class="o">+</span> <span class="n">os</span><span class="o">.</span><span class="n">getlogin</span><span class="p">()</span> <span class="o">+</span> <span class="s1">'! How are you?'</span><span class="p">)</span>
<span class="go">Hello, jdoe! How are you?</span>
</pre></div>
<p>In fact, there are a dozen ways to format messages in Python. I highly encourage you to take a look at <a href="https://realpython.com/python-f-strings/">f-strings</a>, introduced in Python 3.6, because they offer the most concise syntax of them all:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">os</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Hello, {os.getlogin()}! How are you?'</span><span class="p">)</span>
</pre></div>
<p>Moreover, f-strings will prevent you from making a common mistake, which is forgetting to type cast concatenated operands. Python is a strongly typed language, which means it won’t allow you to do this:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="s1">'My age is '</span> <span class="o">+</span> <span class="mi">42</span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"<input>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
<span class="s1">'My age is '</span> <span class="o">+</span> <span class="mi">42</span>
<span class="gr">TypeError</span>: <span class="n">can only concatenate str (not "int") to str</span>
</pre></div>
<p>That’s wrong because adding numbers to strings doesn’t make sense. You need to explicitly convert the number to string first, in order to join them together:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="s1">'My age is '</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
<span class="go">'My age is 42'</span>
</pre></div>
<p>Unless you <a href="https://realpython.com/courses/python-exceptions-101/">handle such errors</a> yourself, the Python interpreter will let you know about a problem by showing a <a href="https://realpython.com/python-traceback/">traceback</a>.</p>
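<p>Here’s a minimal sketch that puts these pieces together. It catches the <code>TypeError</code>, falls back to an explicit <code>str()</code> conversion, and then builds the same message with an f-string. The <code>age</code>, <code>message</code>, and <code>interpolated</code> names are illustrative:</p>

```python
age = 42

try:
    # Concatenating str and int raises TypeError.
    message = 'My age is ' + age
except TypeError as error:
    print(f'Caught: {error}')
    # Convert the number explicitly before joining.
    message = 'My age is ' + str(age)

# An f-string formats the number for you, so no explicit cast is needed.
interpolated = f'My age is {age}'

print(message)
print(interpolated)
```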
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> <code>str()</code> is a global built-in function that converts an object into its string representation.</p>
<p>You can call it directly on any object, for example, a number:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">str</span><span class="p">(</span><span class="mf">3.14</span><span class="p">)</span>
<span class="go">'3.14'</span>
</pre></div>
<p>Built-in data types have a predefined string representation out of the box, but later in this article, you’ll find out how to provide one for your custom classes.</p>
</div>
<p>As with any function, it doesn’t matter whether you pass a literal, a variable, or an expression. Unlike many other functions, however, <code>print()</code> will accept anything regardless of its type.</p>
<p>So far, you only looked at the string, but how about other data types? Let’s try literals of different built-in types and see what comes out:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span> <span class="c1"># <class 'int'></span>
<span class="go">42</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="mf">3.14</span><span class="p">)</span> <span class="c1"># <class 'float'></span>
<span class="go">3.14</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span><span class="n">j</span><span class="p">)</span> <span class="c1"># <class 'complex'></span>
<span class="go">(1+2j)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="kc">True</span><span class="p">)</span> <span class="c1"># <class 'bool'></span>
<span class="go">True</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span> <span class="c1"># <class 'list'></span>
<span class="go">[1, 2, 3]</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span> <span class="c1"># <class 'tuple'></span>
<span class="go">(1, 2, 3)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">({</span><span class="s1">'red'</span><span class="p">,</span> <span class="s1">'green'</span><span class="p">,</span> <span class="s1">'blue'</span><span class="p">})</span> <span class="c1"># <class 'set'></span>
<span class="go">{'red', 'green', 'blue'}</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">({</span><span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Alice'</span><span class="p">,</span> <span class="s1">'age'</span><span class="p">:</span> <span class="mi">42</span><span class="p">})</span> <span class="c1"># <class 'dict'></span>
<span class="go">{'name': 'Alice', 'age': 42}</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">)</span> <span class="c1"># <class 'str'></span>
<span class="go">hello</span>
</pre></div>
<p>Watch out for the <code>None</code> constant, though. Despite being used to indicate an absence of a value, it will show up as <code>'None'</code> rather than an empty string:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="kc">None</span><span class="p">)</span>
<span class="go">None</span>
</pre></div>
<p>How does <code>print()</code> know how to work with all these different types? Well, the short answer is that it doesn’t. It implicitly calls <code>str()</code> behind the scenes to type cast any object into a string. Afterward, it treats strings in a uniform way.</p>
<p>Later in this tutorial, you’ll learn how to use this mechanism for printing custom data types such as your classes.</p>
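<p>You can check the implicit <code>str()</code> call yourself by capturing what <code>print()</code> writes. The sketch below redirects the output to an in-memory buffer through the <code>file</code> keyword argument of <code>print()</code>. The <code>values</code> and <code>captured</code> names are made up for this example:</p>

```python
import io

values = [42, 3.14, 1 + 2j, True, [1, 2, 3], None]
captured = []

for value in values:
    buffer = io.StringIO()
    # Write into the in-memory buffer instead of the screen.
    print(value, file=buffer)
    captured.append(buffer.getvalue())

# Each captured output is str(value) followed by a newline.
for value, output in zip(values, captured):
    print(repr(output), output == str(value) + '\n')
```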
<p>Okay, you’re now able to call <code>print()</code> with a single argument or without any arguments. You know how to print fixed or formatted messages onto the screen. The next subsection will expand on message formatting a little bit.</p>
<div class="card mb-3" id="collapse_card09ba8e">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapse09ba8e" aria-expanded="false" aria-controls="collapse09ba8e">Syntax in Python 2</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapse09ba8e" aria-expanded="false" aria-controls="collapse09ba8e">Show/Hide</button></p></div>
<div id="collapse09ba8e" class="collapse" data-parent="#collapse_card09ba8e"><div class="card-body" markdown="1">
<p>To achieve the same result in the previous language generation, you’d normally want to drop the parentheses enclosing the text:</p>
<div class="highlight python"><pre><span></span><span class="c1"># Python 2</span>
<span class="k">print</span>
<span class="k">print</span> <span class="s1">'Please wait...'</span>
<span class="k">print</span> <span class="s1">'Hello, </span><span class="si">%s</span><span class="s1">! How are you?'</span> <span class="o">%</span> <span class="n">os</span><span class="o">.</span><span class="n">getlogin</span><span class="p">()</span>
<span class="k">print</span> <span class="s1">'Hello, </span><span class="si">%s</span><span class="s1">. Your age is </span><span class="si">%d</span><span class="s1">.'</span> <span class="o">%</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">)</span>
</pre></div>
<p>That’s because <code>print</code> wasn’t a function back then, as you’ll see in the <a href="#understanding-python-print">next section</a>. Note, however, that in some cases parentheses in Python are redundant. It wouldn’t harm to include them as they’d just get ignored. Does that mean you should be using the <code>print</code> statement as if it were a function? Absolutely not!</p>
<p>For example, parentheses enclosing a single expression or a literal are optional. Both instructions produce the same result in Python 2:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Python 2</span>
<span class="gp">>>> </span><span class="k">print</span> <span class="s1">'Please wait...'</span>
<span class="go">Please wait...</span>
<span class="gp">>>> </span><span class="k">print</span><span class="p">(</span><span class="s1">'Please wait...'</span><span class="p">)</span>
<span class="go">Please wait...</span>
</pre></div>
<p>Round brackets are actually part of the expression rather than the <code>print</code> statement. If your expression happens to contain only one item, then it’s as if you didn’t include the brackets at all.</p>
<p>On the other hand, putting parentheses around multiple items forms a <a href="https://realpython.com/python-lists-tuples/#python-tuples">tuple</a>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Python 2</span>
<span class="gp">>>> </span><span class="k">print</span> <span class="s1">'My name is'</span><span class="p">,</span> <span class="s1">'John'</span>
<span class="go">My name is John</span>
<span class="gp">>>> </span><span class="k">print</span><span class="p">(</span><span class="s1">'My name is'</span><span class="p">,</span> <span class="s1">'John'</span><span class="p">)</span>
<span class="go">('My name is', 'John')</span>
</pre></div>
<p>This is a known source of confusion. In fact, you’d also get a tuple by appending a trailing comma to the only item surrounded by parentheses:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Python 2</span>
<span class="gp">>>> </span><span class="k">print</span><span class="p">(</span><span class="s1">'Please wait...'</span><span class="p">)</span>
<span class="go">Please wait...</span>
<span class="gp">>>> </span><span class="k">print</span><span class="p">(</span><span class="s1">'Please wait...'</span><span class="p">,)</span> <span class="c1"># Notice the comma</span>
<span class="go">('Please wait...',)</span>
</pre></div>
<p>The bottom line is that you shouldn’t call <code>print</code> with brackets in Python 2. Although, to be completely accurate, you can work around this with the help of a <code>__future__</code> import, which you’ll read more about in the relevant section.</p>
</div></div>
</div>
<h3 id="separating-multiple-arguments">Separating Multiple Arguments</h3>
<p>You saw <code>print()</code> called without any arguments to produce a blank line and then called with a single argument to display either a fixed or a formatted message.</p>
<p>However, it turns out that this function can accept any number of <strong>positional arguments</strong>, including zero, one, or more arguments. That’s very handy in a common case of message formatting, where you’d want to join a few elements together.</p>
<div class="card mb-3" id="collapse_card9620b8">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapse9620b8" aria-expanded="false" aria-controls="collapse9620b8">Positional Arguments</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapse9620b8" aria-expanded="false" aria-controls="collapse9620b8">Show/Hide</button></p></div>
<div id="collapse9620b8" class="collapse" data-parent="#collapse_card9620b8"><div class="card-body" markdown="1">
<p>Arguments can be passed to a function in one of several ways. One way is by explicitly naming the arguments when you’re calling the function, like this:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">div</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="gp">... </span> <span class="k">return</span> <span class="n">a</span> <span class="o">/</span> <span class="n">b</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">div</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
<span class="go">0.75</span>
</pre></div>
<p>Since arguments can be uniquely identified by name, their order doesn’t matter. Swapping them out will still give the same result:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">div</span><span class="p">(</span><span class="n">b</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">a</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="go">0.75</span>
</pre></div>
<p>Conversely, arguments passed without names are identified by their position. That’s why <strong>positional arguments</strong> need to follow strictly the order imposed by the function signature:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">div</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
<span class="go">0.75</span>
<span class="gp">>>> </span><span class="n">div</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="go">1.3333333333333333</span>
</pre></div>
<p><code>print()</code> allows an <a href="https://docs.python.org/dev/tutorial/controlflow.html#arbitrary-argument-lists">arbitrary number of positional arguments</a> thanks to the <code>*args</code> parameter.</p>
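<p>To get a feel for how such a parameter works, consider this sketch of a hypothetical helper named <code>join_all()</code>. It isn’t how <code>print()</code> is actually implemented, but it collects any number of positional arguments in a similar way:</p>

```python
def join_all(*args, sep=' '):
    """Join any number of positional arguments into a single string.

    A hypothetical helper for illustration only.
    """
    # args is a tuple holding every positional argument that was passed.
    return sep.join(str(arg) for arg in args)


print(join_all())
print(join_all('hello'))
print(join_all('My name is', 'jdoe', 'and I am', 42))
```

<p>Inside the function, <code>args</code> is an ordinary tuple, so it can be empty, hold one item, or hold many.</p>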
</div></div>
</div>
<p>Let’s have a look at this example:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">os</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'My name is'</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">getlogin</span><span class="p">(),</span> <span class="s1">'and I am'</span><span class="p">,</span> <span class="mi">42</span><span class="p">)</span>
<span class="go">My name is jdoe and I am 42</span>
</pre></div>
<p><code>print()</code> concatenated all four arguments passed to it, and it inserted a single space between them so that you didn’t end up with a squashed message like <code>'My name isjdoeand I am42'</code>.</p>
<p>Notice that it also took care of proper type casting by implicitly calling <code>str()</code> on each argument before joining them together. If you recall from the previous subsection, a naïve concatenation may easily result in an error due to incompatible types:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'My age is: '</span> <span class="o">+</span> <span class="mi">42</span><span class="p">)</span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"<input>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'My age is: '</span> <span class="o">+</span> <span class="mi">42</span><span class="p">)</span>
<span class="gr">TypeError</span>: <span class="n">can only concatenate str (not "int") to str</span>
</pre></div>
<p>Apart from accepting a variable number of positional arguments, <code>print()</code> defines four named or <strong>keyword arguments</strong>, which are optional since they all have default values. You can view their brief documentation by calling <code>help(print)</code> from the interactive interpreter.</p>
<p>Let’s focus on <code>sep</code> just for now. It stands for <strong>separator</strong> and is assigned a single space (<code>' '</code>) by default. It determines the value to join elements with.</p>
<p>It has to be either a string or <code>None</code>, but the latter has the same effect as the default space:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">,</span> <span class="s1">'world'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
<span class="go">hello world</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">,</span> <span class="s1">'world'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">' '</span><span class="p">)</span>
<span class="go">hello world</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">,</span> <span class="s1">'world'</span><span class="p">)</span>
<span class="go">hello world</span>
</pre></div>
<p>If you wanted to suppress the separator completely, you’d have to pass an empty string (<code>''</code>) instead:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">,</span> <span class="s1">'world'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span>
<span class="go">helloworld</span>
</pre></div>
<p>You may want <code>print()</code> to join its arguments as separate lines. In that case, simply pass the escaped newline character described earlier:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">,</span> <span class="s1">'world'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="go">hello</span>
<span class="go">world</span>
</pre></div>
<p>A more useful example of the <code>sep</code> parameter would be printing something like file paths:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'home'</span><span class="p">,</span> <span class="s1">'user'</span><span class="p">,</span> <span class="s1">'documents'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">'/'</span><span class="p">)</span>
<span class="go">home/user/documents</span>
</pre></div>
<p>Remember that the separator comes between the elements, not around them, so you need to account for that in one way or another:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'/home'</span><span class="p">,</span> <span class="s1">'user'</span><span class="p">,</span> <span class="s1">'documents'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">'/'</span><span class="p">)</span>
<span class="go">/home/user/documents</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">,</span> <span class="s1">'home'</span><span class="p">,</span> <span class="s1">'user'</span><span class="p">,</span> <span class="s1">'documents'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">'/'</span><span class="p">)</span>
<span class="go">/home/user/documents</span>
</pre></div>
<p>Specifically, you can insert a slash character (<code>/</code>) into the first positional argument, or use an empty string as the first argument to enforce the leading slash.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Be careful about joining elements of a list or tuple.</p>
<p>Doing it manually will result in a well-known <code>TypeError</code> if at least one of the elements isn’t a string:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="s1">'jdoe is'</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="s1">'years old'</span><span class="p">]))</span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"<input>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
<span class="nb">print</span><span class="p">(</span><span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="s1">'jdoe is'</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="s1">'years old'</span><span class="p">]))</span>
<span class="gr">TypeError</span>: <span class="n">sequence item 1: expected str instance, int found</span>
</pre></div>
<p>It’s safer to just unpack the sequence with the star operator (<code>*</code>) and let <code>print()</code> handle type casting:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="o">*</span><span class="p">[</span><span class="s1">'jdoe is'</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="s1">'years old'</span><span class="p">])</span>
<span class="go">jdoe is 42 years old</span>
</pre></div>
<p>Unpacking is effectively the same as calling <code>print()</code> with individual elements of the list.</p>
</div>
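<p>If you need the joined text as a value rather than as printed output, one option is to convert each element with <code>str()</code> before joining. The snippet below is a minimal sketch of that idea. The names <code>items</code> and <code>joined</code> are made up for illustration:</p>

```python
items = ['jdoe is', 42, 'years old']

# str.join() only accepts strings, so convert every element first.
joined = ' '.join(map(str, items))
print(joined)

# Unpacking prints the same text, but print() does the conversion for you.
print(*items)
```

<p>Both calls print <code>jdoe is 42 years old</code>, but only the first approach leaves you with a string that you can reuse elsewhere.</p>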
<p>One more interesting example could be exporting data to a <a href="https://realpython.com/courses/reading-and-writing-csv-files/">comma-separated values</a> (CSV) format:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Python Tricks'</span><span class="p">,</span> <span class="s1">'Dan Bader'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
<span class="go">1,Python Tricks,Dan Bader</span>
</pre></div>
<p>This wouldn’t handle edge cases such as escaping commas correctly, but for simple use cases, it should do. The line above would show up in your terminal window. In order to save it to a file, you’d have to redirect the output. Later in this section, you’ll see how to use <code>print()</code> to write text to files straight from Python.</p>
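<p>To see why escaping matters, you can compare naive joining with the standard library’s <code>csv</code> module, which quotes fields that contain the delimiter. This is only a sketch: the record is invented for illustration, and an in-memory <code>io.StringIO</code> buffer stands in for a real file:</p>

```python
import csv
import io

# Hypothetical record whose second field contains a comma on purpose.
row = [1, 'Hello, World', 'Dan Bader']

# Naive joining, equivalent to print(*row, sep=','), leaves the comma unescaped.
naive_line = ','.join(map(str, row))

# csv.writer() quotes the field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
csv_line = buffer.getvalue()

print(naive_line)
print(csv_line, end='')
```

<p>The naive line appears to have four columns, while the line produced by <code>csv.writer()</code> keeps <code>"Hello, World"</code> together as one quoted field.</p>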
<p>Finally, the <code>sep</code> parameter isn’t constrained to a single character only. You can join elements with strings of any length:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'node'</span><span class="p">,</span> <span class="s1">'child'</span><span class="p">,</span> <span class="s1">'child'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">' -> '</span><span class="p">)</span>
<span class="go">node -> child -> child</span>
</pre></div>
<p>In the upcoming subsections, you’ll explore the remaining keyword arguments of the <code>print()</code> function.</p>
<div class="card mb-3" id="collapse_cardab50a5">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapseab50a5" aria-expanded="false" aria-controls="collapseab50a5">Syntax in Python 2</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapseab50a5" aria-expanded="false" aria-controls="collapseab50a5">Show/Hide</button></p></div>
<div id="collapseab50a5" class="collapse" data-parent="#collapse_cardab50a5"><div class="card-body" markdown="1">
<p>To print multiple elements in Python 2, you must drop the parentheses around them, just like before:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Python 2</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">os</span>
<span class="gp">>>> </span><span class="k">print</span> <span class="s1">'My name is'</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">getlogin</span><span class="p">(),</span> <span class="s1">'and I am'</span><span class="p">,</span> <span class="mi">42</span>
<span class="go">My name is jdoe and I am 42</span>
</pre></div>
<p>If you kept them, on the other hand, you’d be passing a single tuple element to the <code>print</code> statement:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Python 2</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">os</span>
<span class="gp">>>> </span><span class="k">print</span><span class="p">(</span><span class="s1">'My name is'</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">getlogin</span><span class="p">(),</span> <span class="s1">'and I am'</span><span class="p">,</span> <span class="mi">42</span><span class="p">)</span>
<span class="go">('My name is', 'jdoe', 'and I am', 42)</span>
</pre></div>
<p>Moreover, there’s no way of altering the default separator of joined elements in Python 2, so one workaround is to use string interpolation like so:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="c1"># Python 2</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">os</span>
<span class="gp">>>> </span><span class="k">print</span> <span class="s1">'My name is </span><span class="si">%s</span><span class="s1"> and I am </span><span class="si">%d</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getlogin</span><span class="p">(),</span> <span class="mi">42</span><span class="p">)</span>
<span class="go">My name is jdoe and I am 42</span>
</pre></div>
<p>That was the default way of formatting strings until the <code>.format()</code> method got backported from Python 3.</p>
</div></div>
</div>
<h3 id="preventing-line-breaks">Preventing Line Breaks</h3>
<p>Sometimes you don’t want to end your message with a trailing newline so that subsequent calls to <code>print()</code> will continue on the same line. Classic examples include updating the progress of a long-running operation or prompting the user for input. In the latter case, you want the user to type in the answer on the same line:</p>
<div class="highlight text"><pre><span></span>Are you sure you want to do this? [y/n] y
</pre></div>
<p>Many programming languages expose functions similar to <code>print()</code> through their standard libraries, but they let you decide whether to add a newline or not. For example, in Java and C#, you have two distinct functions, while other languages require you to explicitly append <code>\n</code> at the end of a string literal.</p>
<p>Here are a few examples of syntax in such languages:</p>
<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>Language</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>Perl</td>
<td><code>print "hello world\n"</code></td>
</tr>
<tr>
<td>C</td>
<td><code>printf("hello world\n");</code></td>
</tr>
<tr>
<td>C++</td>
<td><code>std::cout << "hello world" << std::endl;</code></td>
</tr>
</tbody>
</table>
</div>
<p>In contrast, Python’s <code>print()</code> function adds <code>\n</code> by default, because that’s what you want in most cases. To disable it, you can take advantage of yet another keyword argument, <code>end</code>, which dictates what to end the line with.</p>
<p>In terms of semantics, the <code>end</code> parameter is almost identical to the <code>sep</code> one that you saw earlier:</p>
<ul>
<li>It must be a string or <code>None</code>.</li>
<li>It can be arbitrarily long.</li>
<li>It has a default value of <code>'\n'</code>.</li>
<li>If equal to <code>None</code>, it’ll have the same effect as the default value.</li>
<li>If equal to an empty string (<code>''</code>), it’ll suppress the newline.</li>
</ul>
<p>Now you understand what’s happening under the hood when you’re calling <code>print()</code> without arguments. Since you don’t provide any positional arguments to the function, there’s nothing to be joined, and so the default separator isn’t used at all. However, the default value of <code>end</code> still applies, and a blank line shows up.</p>
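<p>One way to check these rules yourself is to capture exactly what <code>print()</code> writes and inspect it as a string. The sketch below does that with <code>contextlib.redirect_stdout()</code> from the standard library. The <code>captured()</code> helper is a hypothetical name introduced only for this example:</p>

```python
import io
from contextlib import redirect_stdout

def captured(*args, **kwargs):
    """Return exactly what print() would write for the given arguments."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print(*args, **kwargs)
    return buffer.getvalue()

default_end = captured('hello')           # Default: a newline is appended
none_end = captured('hello', end=None)    # None behaves like the default
empty_end = captured('hello', end='')     # Empty string suppresses the newline
no_args = captured()                      # Nothing to join, only end is written

print(repr(default_end), repr(none_end), repr(empty_end), repr(no_args))
```

<p>Looking at the <code>repr()</code> of each result makes the trailing newline visible, which is hard to notice in regular terminal output.</p>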
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> You may be wondering why the <code>end</code> parameter has a fixed default value rather than whatever makes sense on your operating system.</p>
<p>Well, you don’t have to worry about newline representation across different operating systems when printing, because the text stream that <code>print()</code> writes to will handle the conversion automatically. Just remember to always use the <code>\n</code> escape sequence in string literals.</p>
<p>This is currently the most portable way of printing a newline character in Python:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'line1</span><span class="se">\n</span><span class="s1">line2</span><span class="se">\n</span><span class="s1">line3'</span><span class="p">)</span>
<span class="go">line1</span>
<span class="go">line2</span>
<span class="go">line3</span>
</pre></div>
<p>If you were to try to forcefully print a Windows-specific newline character on a Linux machine, for example, you’d end up with broken output:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'line1</span><span class="se">\r\n</span><span class="s1">line2</span><span class="se">\r\n</span><span class="s1">line3'</span><span class="p">)</span>
<span class="go">line3</span>
</pre></div>
<p>On the flip side, when you open a file for reading with <code>open()</code>, you don’t need to care about newline representation either. The function will translate any system-specific newline it encounters into a universal <code>'\n'</code>. At the same time, you have control over how the newlines should be treated both on input and output if you really need that.</p>
</div>
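<p>If you do need that control, then the <code>newline</code> parameter of <code>open()</code> is the place to look. The following sketch writes a temporary file with Windows-style line endings and reads it back in three different ways. The file name and variable names are arbitrary choices made for this example:</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'newlines.txt')

    # newline='\r\n' makes the text layer write every '\n' as '\r\n'.
    with open(path, mode='w', newline='\r\n') as file_object:
        file_object.write('line1\nline2\n')

    # Binary mode shows the bytes that actually landed on disk.
    with open(path, mode='rb') as file_object:
        raw = file_object.read()

    # The default newline=None translates line endings back to '\n'.
    with open(path) as file_object:
        text = file_object.read()

    # newline='' turns the translation off when reading.
    with open(path, newline='') as file_object:
        untranslated = file_object.read()

print(raw)
print(repr(text))
print(repr(untranslated))
```

<p>Only the default text mode hides the difference, which is why you rarely have to think about it.</p>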
<p>To disable the newline, you must specify an empty string through the <code>end</code> keyword argument:</p>
<div class="highlight python"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s1">'Checking file integrity...'</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span>
<span class="c1"># (...)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'ok'</span><span class="p">)</span>
</pre></div>
<p>Even though these are two separate <code>print()</code> calls, which can execute a long time apart, you’ll eventually see only one line. First, it’ll look like this:</p>
<div class="highlight text"><pre><span></span>Checking file integrity...
</pre></div>
<p>However, after the second call to <code>print()</code>, the same line will appear on the screen as:</p>
<div class="highlight text"><pre><span></span>Checking file integrity...ok
</pre></div>
<p>As with <code>sep</code>, you can use <code>end</code> to join individual pieces into a big blob of text with a custom separator. Instead of joining multiple arguments, however, it’ll append text from each function call to the same line:</p>
<div class="highlight python"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s1">'The first sentence'</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">'. '</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'The second sentence'</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">'. '</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'The last sentence.'</span><span class="p">)</span>
</pre></div>
<p>These three instructions will output a single line of text:</p>
<div class="highlight text"><pre><span></span>The first sentence. The second sentence. The last sentence.
</pre></div>
<p>You can mix the two keyword arguments:</p>
<div class="highlight python"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s1">'Mercury'</span><span class="p">,</span> <span class="s1">'Venus'</span><span class="p">,</span> <span class="s1">'Earth'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">', '</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">', '</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Mars'</span><span class="p">,</span> <span class="s1">'Jupiter'</span><span class="p">,</span> <span class="s1">'Saturn'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">', '</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">', '</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Uranus'</span><span class="p">,</span> <span class="s1">'Neptune'</span><span class="p">,</span> <span class="s1">'Pluto'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">', '</span><span class="p">)</span>
</pre></div>
<p>Not only do you get a single line of text, but all items are separated with a comma:</p>
<div class="highlight text"><pre><span></span>Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune, Pluto
</pre></div>
<p>There’s nothing to stop you from using the newline character with some extra padding around it:</p>
<div class="highlight python"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s1">'Printing in a Nutshell'</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">'</span><span class="se">\n</span><span class="s1"> * '</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Calling Print'</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">'</span><span class="se">\n</span><span class="s1"> * '</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Separating Multiple Arguments'</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">'</span><span class="se">\n</span><span class="s1"> * '</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Preventing Line Breaks'</span><span class="p">)</span>
</pre></div>
<p>It would print out the following piece of text:</p>
<div class="highlight text"><pre><span></span>Printing in a Nutshell
* Calling Print
* Separating Multiple Arguments
* Preventing Line Breaks
</pre></div>
<p>As you can see, the <code>end</code> keyword argument will accept arbitrary strings.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Looping over lines in a text file preserves their own newline characters, which combined with the <code>print()</code> function’s default behavior will result in a redundant newline character:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.txt'</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_object</span><span class="p">:</span>
<span class="gp">... </span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">file_object</span><span class="p">:</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod</span>

<span class="go">tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,</span>

<span class="go">quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo</span>
</pre></div>
<p>There are two newlines after each line of text. You want to strip one of them, as shown earlier in this article, before printing the line:</p>
<div class="highlight python"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">line</span><span class="o">.</span><span class="n">rstrip</span><span class="p">())</span>
</pre></div>
<p>Alternatively, you can keep the newline in the content but suppress the one appended by <code>print()</code> automatically. You’d use the <code>end</code> keyword argument to do that:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.txt'</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_object</span><span class="p">:</span>
<span class="gp">... </span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">file_object</span><span class="p">:</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="n">line</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod</span>
<span class="go">tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,</span>
<span class="go">quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo</span>
</pre></div>
<p>By ending a line with an empty string, you effectively disable one of the newlines.</p>
</div>
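<p>Keep in mind that <code>.rstrip()</code> without arguments removes all trailing whitespace, not just the newline. If trailing spaces or tabs matter to you, then passing <code>'\n'</code> explicitly is a more conservative choice. Here’s a small sketch in which <code>io.StringIO</code> stands in for a real text file and the sample content is made up:</p>

```python
import io

# Invented content; the first line ends with two trailing spaces.
content = 'first line  \nsecond line\n'

# No argument: every kind of trailing whitespace is removed.
stripped_all = [line.rstrip() for line in io.StringIO(content)]

# Explicit '\n': only trailing newline characters are removed.
stripped_newline = [line.rstrip('\n') for line in io.StringIO(content)]

print(stripped_all)
print(stripped_newline)
```

<p>The second list keeps the two trailing spaces of the first line, while the first list drops them along with the newline.</p>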
<p>You’re getting more acquainted with printing in Python, but there’s still a lot of useful information ahead. In the upcoming subsection, you’ll learn how to intercept and redirect the <code>print()</code> function’s output.</p>
<div class="card mb-3" id="collapse_card37ad87">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapse37ad87" aria-expanded="false" aria-controls="collapse37ad87">Syntax in Python 2</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapse37ad87" aria-expanded="false" aria-controls="collapse37ad87">Show/Hide</button></p></div>
<div id="collapse37ad87" class="collapse" data-parent="#collapse_card37ad87"><div class="card-body" markdown="1">
<p>Preventing a line break in Python 2 requires that you append a trailing comma to the expression:</p>
<div class="highlight python"><pre><span></span><span class="k">print</span> <span class="s1">'hello world'</span><span class="p">,</span>
</pre></div>
<p>However, that’s not ideal because it also adds an unwanted space, which would translate to <code>end=' '</code> instead of <code>end=''</code> in Python 3. You can test this with the following code snippet:</p>
<div class="highlight python"><pre><span></span><span class="k">print</span> <span class="s1">'BEFORE'</span>
<span class="k">print</span> <span class="s1">'hello'</span><span class="p">,</span>
<span class="k">print</span> <span class="s1">'AFTER'</span>
</pre></div>
<p>Notice there’s a space between the words <code>hello</code> and <code>AFTER</code>:</p>
<div class="highlight text"><pre><span></span>BEFORE
hello AFTER
</pre></div>
<p>In order to get the expected result, you’d need to use one of the tricks explained later, which is either importing the <code>print()</code> function from <code>__future__</code> or falling back to the <code>sys</code> module:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">sys</span>
<span class="k">print</span> <span class="s1">'BEFORE'</span>
<span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">)</span>
<span class="k">print</span> <span class="s1">'AFTER'</span>
</pre></div>
<p>This will print the correct output without extra space:</p>
<div class="highlight text"><pre><span></span>BEFORE
helloAFTER
</pre></div>
<p>While using the <code>sys</code> module gives you control over what gets printed to the standard output, the code becomes a little bit more cluttered.</p>
</div></div>
</div>
<h3 id="printing-to-a-file">Printing to a File</h3>
<p>Believe it or not, <code>print()</code> doesn’t know how to turn messages into text on your screen, and frankly it doesn’t need to. That’s a job for lower-level layers of code, which understand bytes and know how to push them around.</p>
<p><code>print()</code> is an abstraction over these layers, providing a convenient interface that merely delegates the actual printing to a stream or <strong>file-like object</strong>. A stream can be any file on your disk, a network socket, or perhaps an in-memory buffer.</p>
<p>In addition to this, there are three standard streams provided by the operating system:</p>
<ol>
<li><strong><code>stdin</code>:</strong> standard input</li>
<li><strong><code>stdout</code>:</strong> standard output</li>
<li><strong><code>stderr</code>:</strong> standard error</li>
</ol>
<div class="card mb-3" id="collapse_cardd994b2">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapsed994b2" aria-expanded="false" aria-controls="collapsed994b2">Standard Streams</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapsed994b2" aria-expanded="false" aria-controls="collapsed994b2">Show/Hide</button></p></div>
<div id="collapsed994b2" class="collapse" data-parent="#collapse_cardd994b2"><div class="card-body" markdown="1">
<p><strong>Standard output</strong> is what you see in the terminal when you run various command-line programs including your own <a href="https://realpython.com/run-python-scripts/">Python scripts</a>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> cat hello.py
<span class="go">print('This will appear on stdout')</span>
<span class="gp">$</span> python hello.py
<span class="go">This will appear on stdout</span>
</pre></div>
<p>Unless otherwise instructed, <code>print()</code> will default to writing to standard output. However, you can tell your operating system to temporarily swap out <code>stdout</code> for a file stream, so that any output ends up in that file rather than the screen:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python hello.py > file.txt
<span class="gp">$</span> cat file.txt
<span class="go">This will appear on stdout</span>
</pre></div>
<p>That’s called stream redirection.</p>
<p>The standard error is similar to <code>stdout</code> in that it also shows up on the screen. Nonetheless, it’s a separate stream, whose purpose is to log error messages for diagnostics. By redirecting one or both of them, you can keep things clean.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> To redirect <code>stderr</code>, you need to know about <strong>file descriptors</strong>, also known as <strong>file handles</strong>.</p>
<p>They’re arbitrary, albeit constant, numbers associated with standard streams. Below, you’ll find a summary of the file descriptors for a family of POSIX-compliant operating systems:</p>
<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>Stream</th>
<th>File Descriptor</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>stdin</code></td>
<td>0</td>
</tr>
<tr>
<td><code>stdout</code></td>
<td>1</td>
</tr>
<tr>
<td><code>stderr</code></td>
<td>2</td>
</tr>
</tbody>
</table>
</div>
<p>Knowing those descriptors allows you to redirect one or more streams at a time:</p>
<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>Command</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>./program > out.txt</code></td>
<td>Redirect <code>stdout</code></td>
</tr>
<tr>
<td><code>./program 2> err.txt</code></td>
<td>Redirect <code>stderr</code></td>
</tr>
<tr>
<td><code>./program > out.txt 2> err.txt</code></td>
<td>Redirect <code>stdout</code> and <code>stderr</code> to separate files</td>
</tr>
<tr>
<td><code>./program &> out_err.txt</code></td>
<td>Redirect <code>stdout</code> and <code>stderr</code> to the same file</td>
</tr>
</tbody>
</table>
</div>
<p>Note that <code>></code> is the same as <code>1></code>.</p>
</div>
<p>Some programs use different coloring to distinguish between messages printed to <code>stdout</code> and <code>stderr</code>:</p>
<figure class="figure mx-auto d-block"><a href="https://files.realpython.com/media/pycharm-console-streams.69affb3462e4.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-console-streams.69affb3462e4.png" width="999" height="286" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-console-streams.69affb3462e4.png&w=249&sig=6f17bc7db6fe5ea5fb93f3575f95a4d46eb88af0 249w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-console-streams.69affb3462e4.png&w=499&sig=ccfce4ba25553f534f07f65b483f9af2b106109b 499w, https://files.realpython.com/media/pycharm-console-streams.69affb3462e4.png 999w" sizes="75vw" alt="The output of a program executed in PyCharm"/></a><figcaption class="figure-caption text-center">Run Tool Window in PyCharm</figcaption></figure>
<p>While both <code>stdout</code> and <code>stderr</code> are write-only, <code>stdin</code> is read-only. You can think of standard input as your keyboard, but just like with the other two, you can swap out <code>stdin</code> for a file to read data from.</p>
</div></div>
</div>
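<p>You can also observe that the two output streams are separate by starting a child process and capturing each stream on its own. The sketch below uses <code>subprocess.run()</code> with <code>capture_output=True</code>, which requires Python 3.7 or later. The child program and its messages are invented for the sake of the example:</p>

```python
import subprocess
import sys

# A tiny child program that writes one line to each output stream.
child_code = (
    "import sys\n"
    "sys.stdout.write('regular output\\n')\n"
    "sys.stderr.write('error message\\n')\n"
)

# capture_output=True collects stdout and stderr separately.
result = subprocess.run(
    [sys.executable, '-c', child_code],
    capture_output=True,
    text=True,
)

print('stdout:', repr(result.stdout))
print('stderr:', repr(result.stderr))
```

<p>Each message ends up in its own attribute of the result, even though both would normally show up interleaved on the same screen.</p>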
<p>In Python, you can access all standard streams through the built-in <code>sys</code> module:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">sys</span>
<span class="gp">>>> </span><span class="n">sys</span><span class="o">.</span><span class="n">stdin</span>
<span class="go"><_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'></span>
<span class="gp">>>> </span><span class="n">sys</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">fileno</span><span class="p">()</span>
<span class="go">0</span>
<span class="gp">>>> </span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span>
<span class="go"><_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'></span>
<span class="gp">>>> </span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">fileno</span><span class="p">()</span>
<span class="go">1</span>
<span class="gp">>>> </span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span>
<span class="go"><_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'></span>
<span class="gp">>>> </span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="o">.</span><span class="n">fileno</span><span class="p">()</span>
<span class="go">2</span>
</pre></div>
<p>As you can see, these predefined values resemble file-like objects with <code>mode</code> and <code>encoding</code> attributes as well as <code>.read()</code> and <code>.write()</code> methods among many others.</p>
<p>By default, <code>print()</code> is bound to <code>sys.stdout</code> through its <code>file</code> argument, but you can change that. Use that keyword argument to indicate a file that was opened in write or append mode, so that messages go straight to it:</p>
<div class="highlight python"><pre><span></span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.txt'</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_object</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'hello world'</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">file_object</span><span class="p">)</span>
</pre></div>
<p>This will make your code immune to stream redirection at the operating system level, which might or might not be desired.</p>
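<p>The <code>file</code> argument isn’t limited to files on disk. As a minimal sketch, the example below sends one message to standard error and collects two others in an in-memory <code>io.StringIO</code> buffer. The messages and variable names are placeholders:</p>

```python
import io
import sys

# Diagnostics can go to standard error instead of standard output.
print('something went wrong', file=sys.stderr)

# An in-memory text buffer is a file-like object too.
buffer = io.StringIO()
print('hello', 'world', sep=', ', file=buffer)
print('second line', file=buffer)

contents = buffer.getvalue()
print(repr(contents))
```

<p>Any object with a suitable <code>.write()</code> method should work here, which is what makes <code>print()</code> convenient for logging to different destinations.</p>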
<p>For more information on <a href="https://realpython.com/working-with-files-in-python/">working with files in Python</a>, you can check out <a href="https://realpython.com/read-write-files-python">Reading and Writing Files in Python (Guide)</a>.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Don’t try using <code>print()</code> for writing binary data as it’s only well suited for text.</p>
<p>Just call the binary file’s <code>.write()</code> directly:</p>
<div class="highlight python"><pre><span></span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.dat'</span><span class="p">,</span> <span class="s1">'wb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_object</span><span class="p">:</span>
    <span class="n">file_object</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="mi">4</span><span class="p">))</span>
    <span class="n">file_object</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="sa">b</span><span class="s1">'</span><span class="se">\xff</span><span class="s1">'</span><span class="p">)</span>
</pre></div>
<p>If you want to write raw bytes to the standard output, then this will fail too, because <code>sys.stdout</code> is a character stream:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">sys</span>
<span class="gp">>>> </span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="mi">4</span><span class="p">))</span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
<span class="gr">TypeError</span>: <span class="n">write() argument must be str, not bytes</span>
</pre></div>
<p>You must dig deeper to get a handle on the underlying byte stream instead:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">sys</span>
<span class="gp">>>> </span><span class="n">num_bytes_written</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">buffer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="sa">b</span><span class="s1">'</span><span class="se">\x41\x0a</span><span class="s1">'</span><span class="p">)</span>
<span class="go">A</span>
</pre></div>
<p>This prints an uppercase letter <code>A</code> and a newline character, which correspond to decimal values of 65 and 10 in ASCII. However, they’re encoded using hexadecimal notation in the bytes literal.</p>
</div>
<p>Note that <code>print()</code> has no control over <a href="https://realpython.com/python-encodings-guide/">character encoding</a>. It’s the stream’s responsibility to encode received Unicode strings into bytes correctly. In most cases, you won’t set the encoding yourself, because the default, which is typically UTF-8 but ultimately depends on your platform and locale settings, is what you want. If you really need to, perhaps for legacy systems, you can use the <code>encoding</code> argument of <code>open()</code>:</p>
<div class="highlight python"><pre><span></span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.txt'</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">'w'</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s1">'iso-8859-1'</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_object</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'über naïve café'</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">file_object</span><span class="p">)</span>
</pre></div>
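<p>To see what the choice of encoding changes in practice, here is a minimal sketch that writes the same text with ISO-8859-1 and compares the resulting bytes with their UTF-8 counterpart. The temporary directory and the variable names are only assumptions made for illustration:</p>

```python
import os
import tempfile

text = "über naïve café"

# Write the text through print() using an explicit legacy encoding.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "file.txt")  # Hypothetical file name
    # newline="\n" keeps the line ending identical on every platform.
    with open(path, mode="w", encoding="iso-8859-1", newline="\n") as file_object:
        print(text, file=file_object)
    # Read the file back in binary mode to inspect the raw bytes.
    with open(path, mode="rb") as file_object:
        latin1_bytes = file_object.read()

# The same text, including the trailing newline, encoded as UTF-8.
utf8_bytes = (text + "\n").encode("utf-8")

print(len(latin1_bytes), latin1_bytes)
print(len(utf8_bytes), utf8_bytes)
```

<p>Each accented letter takes a single byte in ISO-8859-1 but two bytes in UTF-8, which is why the two byte sequences differ in length even though <code>print()</code> received the same string.</p>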
<p>Instead of a real file existing somewhere in your file system, you can provide a fake one, which would reside in your computer’s memory. You’ll use this technique later for mocking <code>print()</code> in unit tests:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">io</span>
<span class="gp">>>> </span><span class="n">fake_file</span> <span class="o">=</span> <span class="n">io</span><span class="o">.</span><span class="n">StringIO</span><span class="p">()</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'hello world'</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">fake_file</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">fake_file</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()</span>
<span class="go">'hello world\n'</span>
</pre></div>
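<p>The same idea works for your own functions. As a rough sketch, a function can accept an optional stream and pass it on to <code>print()</code>, which lets you capture its output in memory. The <code>greet()</code> helper below is hypothetical and exists only to illustrate the pattern:</p>

```python
import io


def greet(name, file=None):
    """Hypothetical helper that prints a greeting to the given stream."""
    # When file is None, print() falls back to sys.stdout.
    print(f"Hello, {name}!", file=file)


# Capture the output in memory instead of showing it on the screen.
fake_file = io.StringIO()
greet("world", file=fake_file)
captured = fake_file.getvalue()
```

<p>Calling <code>greet("world")</code> without the second argument still prints to the standard output as usual.</p>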
<p>If you got to this point, then you’re left with only one keyword argument in <code>print()</code>, which you’ll see in the next subsection. It’s probably the least used of them all. Nevertheless, there are times when it’s absolutely necessary.</p>
<div class="card mb-3" id="collapse_card80f59e">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapse80f59e" aria-expanded="false" aria-controls="collapse80f59e">Syntax in Python 2</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapse80f59e" aria-expanded="false" aria-controls="collapse80f59e">Show/Hide</button></p></div>
<div id="collapse80f59e" class="collapse" data-parent="#collapse_card80f59e"><div class="card-body" markdown="1">
<p>There’s a special syntax in Python 2 for replacing the default <code>sys.stdout</code> with a custom file in the <code>print</code> statement:</p>
<div class="highlight python"><pre><span></span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.txt'</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_object</span><span class="p">:</span>
    <span class="k">print</span> <span class="o">>></span> <span class="n">file_object</span><span class="p">,</span> <span class="s1">'hello world'</span>
</pre></div>
<p>Because strings and bytes are represented with the same <code>str</code> type in Python 2, the <code>print</code> statement can handle binary data just fine:</p>
<div class="highlight python"><pre><span></span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.dat'</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">'wb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_object</span><span class="p">:</span>
    <span class="k">print</span> <span class="o">>></span> <span class="n">file_object</span><span class="p">,</span> <span class="s1">'</span><span class="se">\x41\x0a</span><span class="s1">'</span>
</pre></div>
<p>However, there’s a problem with character encoding. The <code>open()</code> function in Python 2 lacks the <code>encoding</code> parameter, which would often result in the dreadful <code>UnicodeEncodeError</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.txt'</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_object</span><span class="p">:</span>
<span class="gp">... </span>    <span class="n">unicode_text</span> <span class="o">=</span> <span class="sa">u</span><span class="s1">'</span><span class="se">\xfc</span><span class="s1">ber na</span><span class="se">\xef</span><span class="s1">ve caf</span><span class="se">\xe9</span><span class="s1">'</span>
<span class="gp">... </span>    <span class="k">print</span> <span class="o">>></span> <span class="n">file_object</span><span class="p">,</span> <span class="n">unicode_text</span>
<span class="gp">... </span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"<stdin>"</span>, line <span class="m">3</span>, in <span class="n"><module></span>
<span class="gr">UnicodeEncodeError</span>: <span class="n">'ascii' codec can't encode character u'\xfc'...</span>
</pre></div>
<p>Notice how non-ASCII characters must be escaped in both Unicode and string literals to avoid a syntax error. Take a look at this example:</p>
<div class="highlight python"><pre><span></span><span class="n">unicode_literal</span> <span class="o">=</span> <span class="sa">u</span><span class="s1">'</span><span class="se">\xfc</span><span class="s1">ber na</span><span class="se">\xef</span><span class="s1">ve caf</span><span class="se">\xe9</span><span class="s1">'</span>
<span class="n">string_literal</span> <span class="o">=</span> <span class="s1">'</span><span class="se">\xc3\xbc</span><span class="s1">ber na</span><span class="se">\xc3\xaf</span><span class="s1">ve caf</span><span class="se">\xc3\xa9</span><span class="s1">'</span>
</pre></div>
<p>Alternatively, you could specify source code encoding according to <a href="https://www.python.org/dev/peps/pep-0263/">PEP 263</a> at the top of the file, but that wasn’t the best practice due to portability issues:</p>
<div class="highlight python"><pre><span></span><span class="ch">#!/usr/bin/env python2</span>
<span class="c1"># -*- coding: utf-8 -*-</span>
<span class="n">unescaped_unicode_literal</span> <span class="o">=</span> <span class="sa">u</span><span class="s1">'über naïve café'</span>
<span class="n">unescaped_string_literal</span> <span class="o">=</span> <span class="s1">'über naïve café'</span>
</pre></div>
<p>Your best bet is to encode the Unicode string just before printing it. You can do this manually:</p>
<div class="highlight python"><pre><span></span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'file.txt'</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_object</span><span class="p">:</span>
    <span class="n">unicode_text</span> <span class="o">=</span> <span class="sa">u</span><span class="s1">'</span><span class="se">\xfc</span><span class="s1">ber na</span><span class="se">\xef</span><span class="s1">ve caf</span><span class="se">\xe9</span><span class="s1">'</span>
    <span class="n">encoded_text</span> <span class="o">=</span> <span class="n">unicode_text</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">)</span>
    <span class="k">print</span> <span class="o">>></span> <span class="n">file_object</span><span class="p">,</span> <span class="n">encoded_text</span>
</pre></div>
<p>However, a more convenient option is to use the <code>codecs</code> module from the standard library:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">codecs</span>
<span class="k">with</span> <span class="n">codecs</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s1">'file.txt'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s1">'utf-8'</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_object</span><span class="p">:</span>
    <span class="n">unicode_text</span> <span class="o">=</span> <span class="sa">u</span><span class="s1">'</span><span class="se">\xfc</span><span class="s1">ber na</span><span class="se">\xef</span><span class="s1">ve caf</span><span class="se">\xe9</span><span class="s1">'</span>
    <span class="k">print</span> <span class="o">>></span> <span class="n">file_object</span><span class="p">,</span> <span class="n">unicode_text</span>
</pre></div>
<p>It’ll take care of making appropriate conversions when you need to read or write files.</p>
</div></div>
</div>
<h3 id="buffering-print-calls">Buffering Print Calls</h3>
<p>In the previous subsection, you learned that <code>print()</code> delegates printing to a file-like object such as <code>sys.stdout</code>. Some streams, however, buffer certain I/O operations to enhance performance, which can get in the way. Let’s take a look at an example.</p>
<p>Imagine you were writing a countdown timer, which should append the remaining time to the same line every second:</p>
<div class="highlight text"><pre><span></span>3...2...1...Go!
</pre></div>
<p>Your first attempt may look something like this:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">time</span>
<span class="n">num_seconds</span> <span class="o">=</span> <span class="mi">3</span>
<span class="k">for</span> <span class="n">countdown</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="n">num_seconds</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)):</span>
    <span class="k">if</span> <span class="n">countdown</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">countdown</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">'...'</span><span class="p">)</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Go!'</span><span class="p">)</span>
</pre></div>
<p>As long as the <code>countdown</code> variable is greater than zero, the code keeps appending text without a trailing newline and then goes to sleep for one second. Finally, when the countdown is finished, it prints <code>Go!</code> and terminates the line.</p>
<p>Unexpectedly, instead of counting down every second, the program idles wastefully for three seconds, and then suddenly prints the entire line at once:</p>
<p><a href="https://files.realpython.com/media/print_countdown.ba38eb242915.gif" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/print_countdown.ba38eb242915.gif" width="576" height="236" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/print_countdown.ba38eb242915.gif&w=144&sig=1084be0d7eb723df32fe7d0e16f06724e2666c04 144w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/print_countdown.ba38eb242915.gif&w=288&sig=0618ab48bb5a9e8b729d8a97e4736f78f9b3560f 288w, https://files.realpython.com/media/print_countdown.ba38eb242915.gif 576w" sizes="75vw" alt="Terminal with buffered output"/></a></p>
<p>That’s because subsequent writes to the standard output are buffered in this case, so nothing reaches the terminal until the buffer gets flushed. You need to know that there are three kinds of streams with respect to buffering:</p>
<ol>
<li>Unbuffered</li>
<li>Line-buffered</li>
<li>Block-buffered</li>
</ol>
<p><strong>Unbuffered</strong> is self-explanatory, that is, no buffering is taking place, and all writes have immediate effect. A <strong>line-buffered</strong> stream waits before firing any I/O calls until a line break appears somewhere in the buffer, whereas a <strong>block-buffered</strong> one simply allows the buffer to fill up to a certain size regardless of its content. Standard output is both <strong>line-buffered</strong> and <strong>block-buffered</strong>, depending on which event comes first.</p>
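<p>To make line buffering a bit more tangible, here is a small sketch that stacks a line-buffered text stream on top of a raw stream that records every low-level write. The <code>RecordingRaw</code> class and the variable names are hypothetical, and the setup only approximates how the real standard output is wired:</p>

```python
import io


class RecordingRaw(io.RawIOBase):
    """Hypothetical raw stream that remembers every low-level write."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, data):
        # Store a copy of the bytes that actually reached this layer.
        self.chunks.append(bytes(data))
        return len(data)


raw = RecordingRaw()
stream = io.TextIOWrapper(
    io.BufferedWriter(raw), encoding="utf-8", line_buffering=True
)

# No newline yet, so the text stays in the buffer.
print(3, end="...", file=stream)
print(2, end="...", file=stream)
held_back = list(raw.chunks)

# The newline written by this call triggers a flush of the whole line.
print("Go!", file=stream)
after_newline = list(raw.chunks)
```

<p>In this sketch, <code>held_back</code> stays empty, while <code>after_newline</code> contains the entire line delivered in a single write.</p>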
<p>Buffering helps to reduce the number of expensive I/O calls. Think about sending messages over a high-latency network, for example. When you connect to a remote server to execute commands over the SSH protocol, each of your keystrokes may actually produce an individual data packet, which is orders of magnitude bigger than its payload. What an overhead! It would make sense to wait until at least a few characters are typed and then send them together. That’s where buffering steps in.</p>
<p>On the other hand, buffering can sometimes have undesired effects as you just saw with the countdown example. To fix it, you can simply tell <code>print()</code> to forcefully flush the stream without waiting for a newline character in the buffer using its <code>flush</code> flag:</p>
<div class="highlight python"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">countdown</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">'...'</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
<p>That’s all. Your countdown should work as expected now, but don’t take my word for it. Go ahead and test it to see the difference.</p>
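<p>For reference, the complete countdown with the fix applied might look like the sketch below. Wrapping the loop in a function with <code>delay</code> and <code>file</code> parameters is an assumption made here to keep the example easy to reuse and test:</p>

```python
import time


def countdown(num_seconds, delay=1.0, file=None):
    """Print a countdown on one line, flushing after every step."""
    for remaining in reversed(range(num_seconds + 1)):
        if remaining > 0:
            # flush=True pushes the text out without waiting for a newline.
            print(remaining, end="...", file=file, flush=True)
            time.sleep(delay)
        else:
            print("Go!", file=file)
```

<p>Calling <code>countdown(3)</code> should now show each number as soon as it’s printed.</p>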
<p>Congratulations! At this point, you’ve seen examples of calling <code>print()</code> that cover all of its parameters. You know their purpose and when to use them. Understanding the signature is only the beginning, however. In the upcoming sections, you’ll see why.</p>
<div class="card mb-3" id="collapse_cardec4147">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapseec4147" aria-expanded="false" aria-controls="collapseec4147">Syntax in Python 2</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapseec4147" aria-expanded="false" aria-controls="collapseec4147">Show/Hide</button></p></div>
<div id="collapseec4147" class="collapse" data-parent="#collapse_cardec4147"><div class="card-body" markdown="1">
<p>There isn’t an easy way to flush the stream in Python 2, because the <code>print</code> statement doesn’t allow for it by itself. You need to get a handle on its lower-level layer, which is the standard output, and call it directly:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="n">num_seconds</span> <span class="o">=</span> <span class="mi">3</span>
<span class="k">for</span> <span class="n">countdown</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="n">num_seconds</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)):</span>
    <span class="k">if</span> <span class="n">countdown</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'</span><span class="si">%s</span><span class="s1">...'</span> <span class="o">%</span> <span class="n">countdown</span><span class="p">)</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">flush</span><span class="p">()</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">print</span> <span class="s1">'Go!'</span>
</pre></div>
<p>Alternatively, you could disable buffering of the standard streams either by providing the <code>-u</code> flag to the Python interpreter or by setting up the <code>PYTHONUNBUFFERED</code> environment variable:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python2 -u countdown.py
<span class="gp">$</span> <span class="nv">PYTHONUNBUFFERED</span><span class="o">=</span><span class="m">1</span> python2 countdown.py
</pre></div>
<p>Note that <code>print()</code> was backported to Python 2 and made available through the <code>__future__</code> module. Unfortunately, it doesn’t come with the <code>flush</code> parameter:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="gp">>>> </span><span class="n">help</span><span class="p">(</span><span class="k">print</span><span class="p">)</span>
<span class="go">Help on built-in function print in module __builtin__:</span>
<span class="go">print(...)</span>
<span class="go"> print(value, ..., sep=' ', end='\n', file=sys.stdout)</span>
</pre></div>
<p>What you’re seeing here is a <strong>docstring</strong> of the <code>print()</code> function. You can display docstrings of various objects in Python using the built-in <code>help()</code> function.</p>
</div></div>
</div>
<h3 id="printing-custom-data-types">Printing Custom Data Types</h3>
<p>Up until now, you’ve only dealt with built-in data types such as strings and numbers, but you’ll often want to print your own abstract data types. Let’s have a look at different ways of defining them.</p>
<p>For simple objects without any logic, whose purpose is to carry data, you’ll typically take advantage of <a href="https://docs.python.org/3/library/collections.html#collections.namedtuple"><code>namedtuple</code></a>, which is available in the standard library. Named tuples have a neat textual representation out of the box:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">collections</span> <span class="k">import</span> <span class="n">namedtuple</span>
<span class="gp">>>> </span><span class="n">Person</span> <span class="o">=</span> <span class="n">namedtuple</span><span class="p">(</span><span class="s1">'Person'</span><span class="p">,</span> <span class="s1">'name age'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">jdoe</span> <span class="o">=</span> <span class="n">Person</span><span class="p">(</span><span class="s1">'John Doe'</span><span class="p">,</span> <span class="mi">42</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">jdoe</span><span class="p">)</span>
<span class="go">Person(name='John Doe', age=42)</span>
</pre></div>
<p>That’s great as long as holding data is enough, but in order to add behaviors to the <code>Person</code> type, you’ll eventually need to define a class. Take a look at this example:</p>
<div class="highlight python"><pre><span></span><span class="k">class</span> <span class="nc">Person</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">age</span> <span class="o">=</span> <span class="n">name</span><span class="p">,</span> <span class="n">age</span>
</pre></div>
<p>If you now create an instance of the <code>Person</code> class and try to print it, you’ll get this bizarre output, which is quite different from the equivalent <code>namedtuple</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">jdoe</span> <span class="o">=</span> <span class="n">Person</span><span class="p">(</span><span class="s1">'John Doe'</span><span class="p">,</span> <span class="mi">42</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">jdoe</span><span class="p">)</span>
<span class="go"><__main__.Person object at 0x7fcac3fed1d0></span>
</pre></div>
<p>It’s the default representation of objects, which comprises their address in memory, the corresponding class name and a module in which they were defined. You’ll fix that in a bit, but just for the record, as a quick workaround you could combine <code>namedtuple</code> and a custom class through <a href="https://realpython.com/inheritance-composition-python/">inheritance</a>:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">collections</span> <span class="k">import</span> <span class="n">namedtuple</span>
<span class="k">class</span> <span class="nc">Person</span><span class="p">(</span><span class="n">namedtuple</span><span class="p">(</span><span class="s1">'Person'</span><span class="p">,</span> <span class="s1">'name age'</span><span class="p">)):</span>
    <span class="k">pass</span>
</pre></div>
<p>Your <code>Person</code> class has just become a specialized kind of <code>namedtuple</code> with two attributes, which you can customize.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> In Python 3, the <code>pass</code> statement can be replaced with the <a href="https://docs.python.org/dev/library/constants.html#Ellipsis">ellipsis</a> (<code>...</code>) literal to indicate a placeholder:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">delta</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">):</span>
    <span class="o">...</span>
</pre></div>
<p>This prevents the interpreter from raising <code>IndentationError</code> due to a missing indented block of code.</p>
</div>
<p>That’s better than a plain <code>namedtuple</code>, because not only do you get printing right for free, but you can also add custom methods and properties to the class. However, it solves one problem while introducing another. Remember that tuples, including named tuples, are immutable in Python, so they can’t change their values once created.</p>
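<p>The following sketch illustrates both sides of that trade-off. The <code>greeting()</code> method is a hypothetical addition, and <code>._replace()</code> is shown as one way to derive a modified copy rather than change the original:</p>

```python
from collections import namedtuple


class Person(namedtuple("Person", "name age")):
    __slots__ = ()  # Keep instances as lightweight as plain tuples

    def greeting(self):
        # Hypothetical custom method added on top of the named tuple.
        return f"Hi, I'm {self.name}"


jdoe = Person("John Doe", 42)

# Attempting to change a field fails, because tuples are immutable.
mutation_error = None
try:
    jdoe.age = 43
except AttributeError as error:
    mutation_error = error

# ._replace() returns a new object and leaves the original untouched.
older_jdoe = jdoe._replace(age=43)
```

<p>Here, <code>jdoe</code> still reports an age of 42, while <code>older_jdoe</code> is a separate object with the updated value.</p>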
<p>It’s true that designing immutable data types is desirable, but in many cases, you’ll want them to allow for change, so you’re back with regular classes again.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Following other languages and frameworks, Python 3.7 introduced <a href="https://realpython.com/python-data-classes/">data classes</a>, which you can think of as mutable tuples. This way, you get the best of both worlds:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">dataclasses</span> <span class="k">import</span> <span class="n">dataclass</span>
<span class="gp">>>> </span><span class="nd">@dataclass</span>
<span class="gp">... </span><span class="k">class</span> <span class="nc">Person</span><span class="p">:</span>
<span class="gp">... </span>    <span class="n">name</span><span class="p">:</span> <span class="nb">str</span>
<span class="gp">... </span>    <span class="n">age</span><span class="p">:</span> <span class="nb">int</span>
<span class="gp">... </span>
<span class="gp">... </span>    <span class="k">def</span> <span class="nf">celebrate_birthday</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="gp">... </span>        <span class="bp">self</span><span class="o">.</span><span class="n">age</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="gp">... </span>
<span class="gp">>>> </span><span class="n">jdoe</span> <span class="o">=</span> <span class="n">Person</span><span class="p">(</span><span class="s1">'John Doe'</span><span class="p">,</span> <span class="mi">42</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">jdoe</span><span class="o">.</span><span class="n">celebrate_birthday</span><span class="p">()</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">jdoe</span><span class="p">)</span>
<span class="go">Person(name='John Doe', age=43)</span>
</pre></div>
<p>The syntax for <a href="https://www.python.org/dev/peps/pep-0526/">variable annotations</a>, which is required to specify class fields with their corresponding types, was defined in Python 3.6.</p>
</div>
<p>From earlier subsections, you already know that <code>print()</code> implicitly calls the built-in <code>str()</code> function to convert its positional arguments into strings. Indeed, calling <code>str()</code> manually against an instance of the regular <code>Person</code> class yields the same result as printing it:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">jdoe</span> <span class="o">=</span> <span class="n">Person</span><span class="p">(</span><span class="s1">'John Doe'</span><span class="p">,</span> <span class="mi">42</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">str</span><span class="p">(</span><span class="n">jdoe</span><span class="p">)</span>
<span class="go">'<__main__.Person object at 0x7fcac3fed1d0>'</span>
</pre></div>
<p><code>str()</code>, in turn, looks for one of two <strong>magic methods</strong> within the class body, which you typically implement. If it doesn’t find one, then it falls back to the ugly default representation. Those magic methods are, in order of search:</p>
<ol>
<li><code>def __str__(self)</code></li>
<li><code>def __repr__(self)</code></li>
</ol>
<p>The first one is recommended to return a short, human-readable text, which includes information from the most relevant attributes. After all, you don’t want to expose sensitive data, such as user passwords, when printing objects.</p>
<p>However, the other one should provide complete information about an object, to allow for restoring its state from a string. Ideally, it should return valid Python code, so that you can pass it directly to <code>eval()</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">repr</span><span class="p">(</span><span class="n">jdoe</span><span class="p">)</span>
<span class="go">"Person(name='John Doe', age=42)"</span>
<span class="gp">>>> </span><span class="nb">type</span><span class="p">(</span><span class="nb">eval</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">jdoe</span><span class="p">)))</span>
<span class="go"><class '__main__.Person'></span>
</pre></div>
<p>Notice the use of another built-in function, <code>repr()</code>, which always tries to call <code>.__repr__()</code> in an object, but falls back to the default representation if it doesn’t find that method.</p>
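<p>The lookup order described above can be observed with two small classes. This is only a sketch, and both <code>Point</code> and <code>Temperature</code> are hypothetical names chosen for illustration:</p>

```python
class Point:
    """Hypothetical class that defines only .__repr__()."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


class Temperature:
    """Hypothetical class that defines both magic methods."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return f"{self.degrees}°C"

    def __repr__(self):
        return f"Temperature({self.degrees})"


# str() falls back to .__repr__() when .__str__() is missing.
point_text = str(Point(1, 2))

# When both methods exist, str() and repr() give different results.
friendly_text = str(Temperature(21))
debug_text = repr(Temperature(21))
```

<p>Printing a <code>Point</code> shows the output of <code>.__repr__()</code>, whereas printing a <code>Temperature</code> shows the friendlier text returned by <code>.__str__()</code>.</p>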
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Even though <code>print()</code> itself uses <code>str()</code> for type casting, some compound data types delegate that call to <code>repr()</code> on their members. This happens to lists and tuples, for example.</p>
<p>Consider this class with both magic methods, which return alternative string representations of the same object:</p>
<div class="highlight python"><pre><span></span><span class="k">class</span> <span class="nc">User</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">login</span><span class="p">,</span> <span class="n">password</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">login</span> <span class="o">=</span> <span class="n">login</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">password</span> <span class="o">=</span> <span class="n">password</span>

    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">login</span>

    <span class="k">def</span> <span class="nf">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">f</span><span class="s2">"User('</span><span class="si">{self.login}</span><span class="s2">', '</span><span class="si">{self.password}</span><span class="s2">')"</span>
</pre></div>
<p>If you print a single object of the <code>User</code> class, then you won’t see the password, because <code>print(user)</code> will call <code>str(user)</code>, which eventually will invoke <code>user.__str__()</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">user</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="s1">'jdoe'</span><span class="p">,</span> <span class="s1">'s3cret'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
<span class="go">jdoe</span>
</pre></div>
<p>However, if you put the same <code>user</code> variable inside a list by wrapping it in square brackets, then the password will become clearly visible:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">([</span><span class="n">user</span><span class="p">])</span>
<span class="go">[User('jdoe', 's3cret')]</span>
</pre></div>
<p>That’s because sequences, such as lists and tuples, implement their <code>.__str__()</code> method so that all of their elements are first converted with <code>repr()</code>.</p>
</div>
<p>Python gives you a lot of freedom when it comes to defining your own data types if none of the built-in ones meet your needs. Some of them, such as named tuples and data classes, offer string representations that look good without requiring any work on your part. Still, for the most flexibility, you’ll have to define a class and override its magic methods described above.</p>
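<p>As a quick illustration of those ready-made string representations, the sketch below defines the same record as a named tuple and as a data class. The names <code>PersonTuple</code> and <code>PersonData</code> are made up for this example:</p>

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical record types, defined only for this illustration
PersonTuple = namedtuple("PersonTuple", ["name", "age"])


@dataclass
class PersonData:
    name: str
    age: int


# Neither type defines .__str__(), so print() falls back to
# the automatically generated .__repr__()
print(PersonTuple("John Doe", 42))
print(PersonData("John Doe", 42))
```

<p>Both calls print the type name followed by the field names and values, without you writing any magic methods.</p>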
<div class="card mb-3" id="collapse_card996300">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapse996300" aria-expanded="false" aria-controls="collapse996300">Syntax in Python 2</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapse996300" aria-expanded="false" aria-controls="collapse996300">Show/Hide</button></p></div>
<div id="collapse996300" class="collapse" data-parent="#collapse_card996300"><div class="card-body" markdown="1">
<p>The semantics of <code>.__str__()</code> and <code>.__repr__()</code> haven’t changed since Python 2, but you must remember that strings were nothing more than glorified byte arrays back then. To convert your objects into proper Unicode, which was a separate data type, you’d have to provide yet another magic method: <code>.__unicode__()</code>.</p>
<p>Here’s an example of the same <code>User</code> class in Python 2:</p>
<div class="highlight python"><pre><span></span><span class="k">class</span> <span class="nc">User</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">login</span><span class="p">,</span> <span class="n">password</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">login</span> <span class="o">=</span> <span class="n">login</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">password</span> <span class="o">=</span> <span class="n">password</span>
    <span class="k">def</span> <span class="fm">__unicode__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">login</span>
    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">unicode</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">)</span>
    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">user</span> <span class="o">=</span> <span class="sa">u</span><span class="s2">"User('</span><span class="si">%s</span><span class="s2">', '</span><span class="si">%s</span><span class="s2">')"</span> <span class="o">%</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">login</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">password</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">user</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'unicode_escape'</span><span class="p">)</span>
</pre></div>
<p>As you can see, this implementation delegates some work to avoid duplication by calling the built-in <code>unicode()</code> function on itself.</p>
<p>Both <code>.__str__()</code> and <code>.__repr__()</code> methods must return strings, so they encode Unicode characters into specific byte representations called <strong>character sets</strong>. UTF-8 is the most widespread and safest encoding, while <code>unicode_escape</code> is a special constant to express funky characters, such as <code>é</code>, as escape sequences in plain ASCII, such as <code>\xe9</code>.</p>
<p>The <code>print</code> statement looks for the magic <code>.__str__()</code> method in the class, so the chosen <strong>charset</strong> must correspond to the one used by the terminal. For example, the default console encoding on some DOS and Windows installations, such as Central European ones, is CP 852 rather than UTF-8, so running this can result in a <code>UnicodeEncodeError</code> or even garbled output:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">user</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="sa">u</span><span class="s1">'</span><span class="se">\u043d\u0438\u043a\u0438\u0442\u0430</span><span class="s1">'</span><span class="p">,</span> <span class="sa">u</span><span class="s1">'s3cret'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">print</span> <span class="n">user</span>
<span class="go">đŻđŞđ║đŞĐéđ░</span>
</pre></div>
<p>However, if you ran the same code on a system with UTF-8 encoding, then you’d get the proper spelling of a popular Russian name:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">user</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="sa">u</span><span class="s1">'</span><span class="se">\u043d\u0438\u043a\u0438\u0442\u0430</span><span class="s1">'</span><span class="p">,</span> <span class="sa">u</span><span class="s1">'s3cret'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">print</span> <span class="n">user</span>
<span class="go">никита</span>
</pre></div>
<p>It’s recommended to convert strings to Unicode as early as possible, for example, when you’re reading data from a file, and use it consistently everywhere in your code. At the same time, you should encode Unicode back to the chosen character set right before presenting it to the user.</p>
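<p>The same decode-early, encode-late advice carries over to Python 3. The following is only a sketch in Python 3 syntax: the <code>raw_bytes</code> value stands in for data read from a file or socket, and the variable names are invented for this example:</p>

```python
# Python 3 sketch: raw_bytes stands in for data read from a file or socket
raw_bytes = b"\xd0\xbd\xd0\xb8\xd0\xba\xd0\xb8\xd1\x82\xd0\xb0"

# 1. Decode to Unicode text as early as possible
login = raw_bytes.decode("utf-8")

# 2. Work with text consistently in the rest of the code
greeting = f"Hello, {login.capitalize()}!"

# 3. Encode back to bytes right before handing the data over
output_bytes = greeting.encode("utf-8")
```

<p>Only the first and last steps need to know which character encoding is in use.</p>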
<p>It seems as if you have more control over string representation of objects in Python 2 because there’s no magic <code>.__unicode__()</code> method in Python 3 anymore. You may be asking yourself if it’s possible to convert an object to its byte string representation rather than a Unicode string in Python 3. It’s possible, with a special <code>.__bytes__()</code> method that does just that:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">class</span> <span class="nc">User</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="gp">... </span>    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">login</span><span class="p">,</span> <span class="n">password</span><span class="p">):</span>
<span class="gp">... </span>        <span class="bp">self</span><span class="o">.</span><span class="n">login</span> <span class="o">=</span> <span class="n">login</span>
<span class="gp">... </span>        <span class="bp">self</span><span class="o">.</span><span class="n">password</span> <span class="o">=</span> <span class="n">password</span>
<span class="gp">... </span>
<span class="gp">... </span>    <span class="k">def</span> <span class="nf">__bytes__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="c1"># Python 3</span>
<span class="gp">... </span>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">login</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">user</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="sa">u</span><span class="s1">'</span><span class="se">\u043d\u0438\u043a\u0438\u0442\u0430</span><span class="s1">'</span><span class="p">,</span> <span class="sa">u</span><span class="s1">'s3cret'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">bytes</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
<span class="go">b'\xd0\xbd\xd0\xb8\xd0\xba\xd0\xb8\xd1\x82\xd0\xb0'</span>
</pre></div>
<p>Using the built-in <code>bytes()</code> function on an instance delegates the call to its <code>__bytes__()</code> method defined in the corresponding class.</p>
</div></div>
</div>
<h2 id="understanding-python-print">Understanding Python Print</h2>
<p>You know <strong>how</strong> to use <code>print()</code> quite well at this point, but knowing <strong>what</strong> it is will allow you to use it even more effectively and consciously. After reading this section, you’ll understand how printing in Python has improved over the years.</p>
<h3 id="print-is-a-function-in-python-3">Print Is a Function in Python 3</h3>
<p>You’ve seen that <code>print()</code> is a function in Python 3. More specifically, it’s a built-in function, which means that you don’t need to import it from anywhere:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span>
<span class="go"><built-in function print></span>
</pre></div>
<p>It’s always available in the global namespace so that you can call it directly, but you can also access it through a module from the standard library:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">builtins</span>
<span class="gp">>>> </span><span class="n">builtins</span><span class="o">.</span><span class="n">print</span>
<span class="go"><built-in function print></span>
</pre></div>
<p>This way, you can avoid name collisions with custom functions. Let’s say you wanted to <strong>redefine</strong> <code>print()</code> so that it doesn’t append a trailing newline. At the same time, you wanted to rename the original function to something like <code>println()</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">builtins</span>
<span class="gp">>>> </span><span class="n">println</span> <span class="o">=</span> <span class="n">builtins</span><span class="o">.</span><span class="n">print</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">print</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="gp">... </span>    <span class="n">builtins</span><span class="o">.</span><span class="n">print</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">println</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">)</span>
<span class="go">hello</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'hello</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="go">hello</span>
</pre></div>
<p>Now you have two separate printing functions just like in the Java programming language. You’ll define custom <code>print()</code> functions in the <a href="#mocking-python-print-in-unit-tests">mocking section</a> later as well. Also, note that you wouldn’t be able to overwrite <code>print()</code> in the first place if it wasn’t a function.</p>
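<p>One caveat of the redefinition above is that passing <code>end</code> explicitly to the new <code>print()</code> raises a <code>TypeError</code>, because the keyword would be supplied twice. A slightly more defensive variation might look like the sketch below. The name <code>print_no_newline()</code> is hypothetical, and an in-memory buffer is used in place of the terminal so that the result is easy to inspect:</p>

```python
import builtins
import io


def print_no_newline(*args, **kwargs):
    # Hypothetical helper: default to end='' only when
    # the caller hasn't chosen a value for end
    kwargs.setdefault("end", "")
    builtins.print(*args, **kwargs)


buffer = io.StringIO()
print_no_newline("hello", file=buffer)               # No trailing newline
print_no_newline(" world", end="!\n", file=buffer)   # Caller overrides end
```

<p>After these two calls, the buffer holds a single line of text, because only the second call ended with a newline.</p>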
<p>On the other hand, <code>print()</code> isn’t a function in the mathematical sense, because it doesn’t return any meaningful value other than the implicit <code>None</code>:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">value</span> <span class="o">=</span> <span class="nb">print</span><span class="p">(</span><span class="s1">'hello world'</span><span class="p">)</span>
<span class="go">hello world</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
<span class="go">None</span>
</pre></div>
<p>Such functions are, in fact, procedures or subroutines that you call to achieve some kind of side-effect, which ultimately is a change of a global state. In the case of <code>print()</code>, that side-effect is showing a message on the standard output or writing to a file.</p>
<p>Because <code>print()</code> is a function, it has a well-defined signature with known attributes. You can quickly find its <strong>documentation</strong> using the editor of your choice, without having to remember some weird syntax for performing a certain task.</p>
<p>Besides, functions are easier to <strong>extend</strong>. Adding a new feature to a function is as easy as adding another keyword argument, whereas changing the language to support that new feature is much more cumbersome. Think of stream redirection or buffer flushing, for example.</p>
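<p>For example, both stream redirection and buffer flushing are plain keyword arguments of <code>print()</code> in Python 3, and <code>flush</code> was added as late as Python 3.3 without any change to the language syntax. Here’s a short sketch, where the <code>log</code> buffer is just an illustrative stand-in for a real file:</p>

```python
import io
import sys

# Buffer flushing: show the message right away, without a newline
print("Loading...", end="", flush=True)
print(" done")

# Stream redirection: write to standard error instead of standard output
print("Warning: something looks off", file=sys.stderr)

# Any object with a .write() method works, such as an in-memory buffer
log = io.StringIO()
print("saved to memory", file=log)
```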
<p>Another benefit of <code>print()</code> being a function is <strong>composability</strong>. Functions are so-called <a href="https://realpython.com/lessons/functions-first-class-objects-python/">first-class objects</a> or <a href="https://realpython.com/lessons/functions-are-first-class-citizens-python/">first-class citizens</a> in Python, which is a fancy way of saying they’re values just like strings or numbers. This way, you can assign a function to a variable, pass it to another function, or even return one from another. <code>print()</code> isn’t different in this regard. For instance, you can take advantage of it for dependency injection:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">download</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">log</span><span class="o">=</span><span class="nb">print</span><span class="p">):</span>
    <span class="n">log</span><span class="p">(</span><span class="n">f</span><span class="s1">'Downloading </span><span class="si">{url}</span><span class="s1">'</span><span class="p">)</span>
    <span class="c1"># ...</span>
<span class="k">def</span> <span class="nf">custom_print</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
    <span class="k">pass</span> <span class="c1"># Do not print anything</span>
<span class="n">download</span><span class="p">(</span><span class="s1">'/js/app.js'</span><span class="p">,</span> <span class="n">log</span><span class="o">=</span><span class="n">custom_print</span><span class="p">)</span>
</pre></div>
<p>Here, the <code>log</code> parameter lets you inject a callback function, which defaults to <code>print()</code> but can be any callable. In this example, printing is completely disabled by substituting <code>print()</code> with a dummy function that does nothing.</p>
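<p>Because any callable will do, you could also inject something that records messages instead of discarding them, which comes in handy in tests. The sketch below repeats the <code>download()</code> function so that it runs on its own, and the <code>messages</code> list is an assumption made for this example:</p>

```python
def download(url, log=print):
    log(f"Downloading {url}")
    # ...


# Collect log messages in a list instead of printing them
messages = []
download("/js/app.js", log=messages.append)
```

<p>After the call, <code>messages</code> contains the single logged string, which you can compare against an expected value.</p>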
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> A <strong>dependency</strong> is any piece of code required by another bit of code.</p>
<p><strong>Dependency injection</strong> is a technique used in code design to make it more testable, reusable, and open for extension. You can achieve it by referring to dependencies indirectly through abstract interfaces and by providing them in a <strong>push</strong> rather than <strong>pull</strong> fashion.</p>
<p>There’s a funny explanation of dependency injection circulating on the Internet:</p>
<blockquote>
<p>Dependency injection for five-year-olds</p>
<p>When you go and get things out of the refrigerator for yourself, you can cause problems. You might leave the door open, you might get something Mommy or Daddy doesn’t want you to have. You might even be looking for something we don’t even have or which has expired.</p>
<p>What you should be doing is stating a need, “I need something to drink with lunch,” and then we will make sure you have something when you sit down to eat.</p>
<p>— <em>John Munsch, 28 October 2009.</em> (<a href="https://stackoverflow.com/a/1638961">Source</a>)</p>
</blockquote>
</div>
<p>Composition allows you to combine a few functions into a new one of the same kind. Let’s see this in action by specifying a custom <code>error()</code> function that prints to the standard error stream and prefixes all messages with a given log level:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">functools</span> <span class="k">import</span> <span class="n">partial</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">sys</span>
<span class="gp">>>> </span><span class="n">redirect</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">function</span><span class="p">,</span> <span class="n">stream</span><span class="p">:</span> <span class="n">partial</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">stream</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">prefix</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">function</span><span class="p">,</span> <span class="n">prefix</span><span class="p">:</span> <span class="n">partial</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="n">prefix</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">error</span> <span class="o">=</span> <span class="n">prefix</span><span class="p">(</span><span class="n">redirect</span><span class="p">(</span><span class="nb">print</span><span class="p">,</span> <span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">),</span> <span class="s1">'[ERROR]'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">error</span><span class="p">(</span><span class="s1">'Something went wrong'</span><span class="p">)</span>
<span class="go">[ERROR] Something went wrong</span>
</pre></div>
<p>This custom function uses <strong>partial functions</strong> to achieve the desired effect. It’s an advanced concept borrowed from the <a href="https://realpython.com/courses/functional-programming-python/">functional programming</a> paradigm, so you don’t need to go too deep into it for now. However, if you’d like to learn more, I recommend taking a look at the <a href="https://pymotw.com/3/functools/"><code>functools</code></a> module.</p>
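<p>If the nested lambdas look opaque, it may help to see what a single <code>partial()</code> call does: it returns a new callable with some arguments already filled in. In the sketch below, an in-memory <code>stream</code> stands in for <code>sys.stderr</code> so that the output can be inspected:</p>

```python
import io
from functools import partial

stream = io.StringIO()  # Stands in for sys.stderr in this sketch

# Pre-fill the first positional argument and the file keyword argument
error = partial(print, "[ERROR]", file=stream)
error("Something went wrong")

# The call above is equivalent to spelling everything out by hand
print("[ERROR]", "Something went wrong", file=stream)
```

<p>The partial object remembers the wrapped function and the pre-filled arguments in its <code>.func</code>, <code>.args</code>, and <code>.keywords</code> attributes.</p>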
<p>Unlike statements, functions are values. That means you can mix them with <strong>expressions</strong>, in particular, <a href="https://realpython.com/python-lambda/"><strong>lambda</strong> expressions</a>. Instead of defining a full-blown function to replace <code>print()</code> with, you can make an anonymous lambda expression that calls it:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">download</span><span class="p">(</span><span class="s1">'/js/app.js'</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">msg</span><span class="p">:</span> <span class="nb">print</span><span class="p">(</span><span class="s1">'[INFO]'</span><span class="p">,</span> <span class="n">msg</span><span class="p">))</span>
<span class="go">[INFO] Downloading /js/app.js</span>
</pre></div>
<p>However, because a lambda expression is defined in place, there’s no way of referring to it elsewhere in the code.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> In Python, you can’t put statements, such as assignments, conditional statements, loops, and so on, in an <strong>anonymous lambda function</strong>. It has to be a single expression!</p>
</div>
<p>Another kind of expression is a ternary conditional expression:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">user</span> <span class="o">=</span> <span class="s1">'jdoe'</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'Hi!'</span><span class="p">)</span> <span class="k">if</span> <span class="n">user</span> <span class="ow">is</span> <span class="kc">None</span> <span class="k">else</span> <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Hi, </span><span class="si">{user}</span><span class="s1">.'</span><span class="p">)</span>
<span class="go">Hi, jdoe.</span>
</pre></div>
<p>Python has both <a href="https://realpython.com/python-conditional-statements/">conditional statements</a> and <a href="https://realpython.com/python-conditional-statements/#conditional-expressions-pythons-ternary-operator">conditional expressions</a>. The latter is evaluated to a single value that can be assigned to a variable or passed to a function. In the example above, you’re interested in the side-effect rather than the value, which evaluates to <code>None</code>, so you simply ignore it.</p>
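<p>When you do care about the value, you can let the conditional expression pick the message and call <code>print()</code> only once. Here’s a small sketch, in which <code>greeting_for()</code> is a hypothetical helper:</p>

```python
def greeting_for(user=None):
    # The conditional expression evaluates to one of the two strings
    return "Hi!" if user is None else f"Hi, {user}."


print(greeting_for())
print(greeting_for("jdoe"))
```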
<p>As you can see, functions allow for an elegant and extensible solution, which is consistent with the rest of the language. In the next subsection, you’ll discover how not having <code>print()</code> as a function caused a lot of headaches.</p>
<h3 id="print-was-a-statement-in-python-2">Print Was a Statement in Python 2</h3>
<p>A <strong>statement</strong> is an instruction that may evoke a side-effect when executed but never evaluates to a value. In other words, you wouldn’t be able to print a statement or assign it to a variable like this:</p>
<div class="highlight python"><pre><span></span><span class="n">result</span> <span class="o">=</span> <span class="k">print</span> <span class="s1">'hello world'</span>
</pre></div>
<p>That’s a syntax error in Python 2.</p>
<p>Here are a few more examples of statements in Python:</p>
<ul>
<li><strong>assignment:</strong> <code>=</code></li>
<li><strong>conditional:</strong> <code>if</code></li>
<li><strong>loop:</strong> <code>while</code></li>
<li><strong>assertion</strong>: <code>assert</code></li>
</ul>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Python 3.8 brings a controversial <strong>walrus operator</strong> (<code>:=</code>), which is an <a href="https://www.python.org/dev/peps/pep-0572/">assignment expression</a>. With it, you can evaluate an expression and assign the result to a variable at the same time, even within another expression!</p>
<p>Take a look at this example, which calls an expensive function once and then reuses the result for further computation:</p>
<div class="highlight python"><pre><span></span><span class="c1"># Python 3.8+</span>
<span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="n">y</span> <span class="p">:</span><span class="o">=</span> <span class="n">f</span><span class="p">(</span><span class="n">x</span><span class="p">),</span> <span class="n">y</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="n">y</span><span class="o">**</span><span class="mi">3</span><span class="p">]</span>
</pre></div>
<p>This is useful for simplifying the code without losing its efficiency. Typically, performant code tends to be more verbose:</p>
<div class="highlight python"><pre><span></span><span class="n">y</span> <span class="o">=</span> <span class="n">f</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="n">y</span><span class="p">,</span> <span class="n">y</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="n">y</span><span class="o">**</span><span class="mi">3</span><span class="p">]</span>
</pre></div>
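<p>To try this out yourself, you need some definition of <code>f()</code> and <code>x</code>. The ones below are arbitrary stand-ins chosen for this sketch, which also shows the walrus operator in a loop condition:</p>

```python
# Python 3.8+
import io


def f(x):
    # Arbitrary stand-in for an expensive function
    return x + 1


x = 2
values = [y := f(x), y**2, y**3]  # f() is called only once

# Assign and test in one step while reading lines from a stream
stream = io.StringIO("first\nsecond\n")
lines = []
while (line := stream.readline()):
    lines.append(line.rstrip())
```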
<p>This new piece of syntax caused a lot of controversy. An abundance of negative comments and heated debates eventually led Guido van Rossum to step down from the <strong>Benevolent Dictator For Life</strong> or BDFL position.</p>
</div>
<p>Statements are usually made up of reserved keywords such as <code>if</code>, <code>for</code>, or <code>print</code> that have fixed meaning in the language. You can’t use them to name your variables or other symbols. That’s why redefining or mocking the <code>print</code> statement isn’t possible in Python 2. You’re stuck with what you get.</p>
<p>Furthermore, you can’t print from anonymous functions, because statements aren’t accepted in lambda expressions:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">lambda</span><span class="p">:</span> <span class="k">print</span> <span class="s1">'hello world'</span>
File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>
<span class="k">lambda</span><span class="p">:</span> <span class="k">print</span> <span class="s1">'hello world'</span>
<span class="o">^</span>
<span class="gr">SyntaxError</span>: <span class="n">invalid syntax</span>
</pre></div>
<p>The syntax of the <code>print</code> statement is ambiguous. Sometimes you can add parentheses around the message, and they’re completely optional:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">print</span> <span class="s1">'Please wait...'</span>
<span class="go">Please wait...</span>
<span class="gp">>>> </span><span class="k">print</span><span class="p">(</span><span class="s1">'Please wait...'</span><span class="p">)</span>
<span class="go">Please wait...</span>
</pre></div>
<p>At other times they change how the message is printed:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">print</span> <span class="s1">'My name is'</span><span class="p">,</span> <span class="s1">'John'</span>
<span class="go">My name is John</span>
<span class="gp">>>> </span><span class="k">print</span><span class="p">(</span><span class="s1">'My name is'</span><span class="p">,</span> <span class="s1">'John'</span><span class="p">)</span>
<span class="go">('My name is', 'John')</span>
</pre></div>
<p>String concatenation can raise a <code>TypeError</code> due to incompatible types, which you have to handle manually, for example:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'jdoe'</span><span class="p">,</span> <span class="s1">'is'</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="s1">'years old'</span><span class="p">]</span>
<span class="gp">>>> </span><span class="k">print</span> <span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="n">values</span><span class="p">))</span>
<span class="go">jdoe is 42 years old</span>
</pre></div>
<p>Compare this with similar code in Python 3, which leverages sequence unpacking:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'jdoe'</span><span class="p">,</span> <span class="s1">'is'</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="s1">'years old'</span><span class="p">]</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="o">*</span><span class="n">values</span><span class="p">)</span> <span class="c1"># Python 3</span>
<span class="go">jdoe is 42 years old</span>
</pre></div>
<p>There aren’t any keyword arguments for common tasks such as flushing the buffer or stream redirection. You need to remember the quirky syntax instead. Even the built-in <code>help()</code> function isn’t that helpful with regard to the <code>print</code> statement:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">help</span><span class="p">(</span><span class="k">print</span><span class="p">)</span>
File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>
<span class="n">help</span><span class="p">(</span><span class="k">print</span><span class="p">)</span>
<span class="o">^</span>
<span class="gr">SyntaxError</span>: <span class="n">invalid syntax</span>
</pre></div>
<p>Trailing newline removal doesn’t work quite right, because it adds an unwanted space. You can’t compose multiple <code>print</code> statements together, and, on top of that, you have to be extra diligent about character encoding.</p>
<p>The list of problems goes on and on. If you’re curious, you can jump back to the <a href="#printing-in-a-nutshell">previous section</a> and look for more detailed explanations of the syntax in Python 2.</p>
<p>However, you can mitigate some of those problems with a much simpler approach. It turns out the <code>print()</code> function was backported to ease the migration to Python 3. You can import it from a special <code>__future__</code> module, which exposes a selection of language features released in later Python versions.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> You may import future functions as well as baked-in language constructs such as the <code>with</code> statement.</p>
<p>To find out exactly what features are available to you, inspect the module:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">__future__</span>
<span class="gp">>>> </span><span class="n">__future__</span><span class="o">.</span><span class="n">all_feature_names</span>
<span class="go">['nested_scopes',</span>
<span class="go"> 'generators',</span>
<span class="go"> 'division',</span>
<span class="go"> 'absolute_import',</span>
<span class="go"> 'with_statement',</span>
<span class="go"> 'print_function',</span>
<span class="go"> 'unicode_literals']</span>
</pre></div>
<p>You could also call <code>dir(__future__)</code>, but that would show a lot of uninteresting internal details of the module.</p>
</div>
<p>To enable the <code>print()</code> function in Python 2, you need to add this import statement at the beginning of your source code:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
</pre></div>
<p>From now on, the <code>print</code> statement is no longer available, but you have the <code>print()</code> function at your disposal. Note that it isn’t exactly the same function as the one in Python 3, because it’s missing the <code>flush</code> keyword argument, but the rest of the arguments are the same.</p>
<p>Other than that, it doesn’t spare you from managing character encodings properly.</p>
<p>Here’s an example of calling the <code>print()</code> function in Python 2:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">sys</span>
<span class="gp">>>> </span><span class="k">print</span><span class="p">(</span><span class="s1">'I am a function in Python'</span><span class="p">,</span> <span class="n">sys</span><span class="o">.</span><span class="n">version_info</span><span class="o">.</span><span class="n">major</span><span class="p">)</span>
<span class="go">I am a function in Python 2</span>
</pre></div>
<p>You now have an idea of how printing in Python evolved and, most importantly, understand why these backward-incompatible changes were necessary. Knowing this will surely help you become a better Python programmer.</p>
<h2 id="printing-with-style">Printing With Style</h2>
<p>If you thought that printing was only about lighting pixels up on the screen, then technically you’d be right. However, there are ways to make it look cool. In this section, you’ll find out how to format complex data structures, add colors and other decorations, build interfaces, use animation, and even play sounds with text! </p>
<h3 id="pretty-printing-nested-data-structures">Pretty-Printing Nested Data Structures</h3>
<p>Computer languages allow you to represent data as well as executable code in a structured way. Unlike Python, however, most languages give you a lot of freedom in using whitespace and formatting. This can be useful, for example in compression, but it sometimes leads to less readable code.</p>
<p>Pretty-printing is about making a piece of data or code look more appealing to the human eye so that it can be understood more easily. This is done by indenting certain lines, inserting newlines, reordering elements, and so forth.</p>
<p>Python comes with the <code>pprint</code> module in its standard library, which will help you in pretty-printing large data structures that don’t fit on a single line. Because it prints in a more human-friendly way, many popular <a href="https://realpython.com/interacting-with-python/">REPL</a> tools, including <a href="https://realpython.com/jupyter-notebook-introduction/">JupyterLab and IPython</a>, use it by default in place of the regular <code>print()</code> function.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> To toggle pretty printing in IPython, issue the following command:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">In [1]: </span><span class="o">%</span><span class="k">pprint</span>
<span class="go">Pretty printing has been turned OFF</span>
<span class="gp">In [2]: </span><span class="o">%</span><span class="k">pprint</span>
<span class="go">Pretty printing has been turned ON</span>
</pre></div>
<p>This is an example of <strong>Magic</strong> in IPython. There are a lot of built-in commands that start with a percent sign (<code>%</code>), but you can find more on <a href="https://pypi.org/">PyPI</a>, or even create your own.</p>
</div>
<p>If you don’t care about not having access to the original <code>print()</code> function, then you can replace it with <code>pprint()</code> in your code using import renaming:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pprint</span> <span class="k">import</span> <span class="n">pprint</span> <span class="k">as</span> <span class="nb">print</span>
<span class="gp">>>> </span><span class="nb">print</span>
<span class="go"><function pprint at 0x7f7a775a3510></span>
</pre></div>
<p>Personally, I like to have both functions at my fingertips, so I’d rather use something like <code>pp</code> as a short alias:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">pprint</span> <span class="k">import</span> <span class="n">pprint</span> <span class="k">as</span> <span class="n">pp</span>
</pre></div>
<p>At first glance, there’s hardly any difference between the two functions, and in some cases there’s virtually none:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
<span class="go">42</span>
<span class="gp">>>> </span><span class="n">pp</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
<span class="go">42</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">)</span>
<span class="go">hello</span>
<span class="gp">>>> </span><span class="n">pp</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">)</span>
<span class="go">'hello' # Did you spot the difference?</span>
</pre></div>
<p>That’s because <code>pprint()</code> calls <code>repr()</code> instead of the usual <code>str()</code> to convert values to text, so that you may evaluate its output as Python code if you want to. The differences become apparent as you start feeding it more complex data structures:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">data</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'powers'</span><span class="p">:</span> <span class="p">[</span><span class="n">x</span><span class="o">**</span><span class="mi">10</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)]}</span>
<span class="gp">>>> </span><span class="n">pp</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="go">{'powers': [0,</span>
<span class="go">            1,</span>
<span class="go">            1024,</span>
<span class="go">            59049,</span>
<span class="go">            1048576,</span>
<span class="go">            9765625,</span>
<span class="go">            60466176,</span>
<span class="go">            282475249,</span>
<span class="go">            1073741824,</span>
<span class="go">            3486784401]}</span>
</pre></div>
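<p>The line breaks appear because the one-line representation doesn’t fit within the default width of 80 characters. As a rough sketch, you can observe this with <code>pformat()</code>, which returns the formatted text instead of printing it. The variable names and the width of 120 are arbitrary choices for illustration:</p>

```python
from pprint import pformat

data = {'powers': [x**10 for x in range(10)]}

# The one-line representation is longer than the default width of 80,
# so pformat() spreads it over several lines.
default_text = pformat(data)

# With a wider limit, the same data fits on a single line again.
wide_text = pformat(data, width=120)

print(len(repr(data)))
print('\n' in default_text)
print(wide_text == repr(data))
```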
<p>The function applies reasonable formatting to improve readability, but you can customize it even further with a couple of parameters. For example, you may limit a deeply nested hierarchy by showing an ellipsis below a given level:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">cities</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'USA'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'Texas'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'Dallas'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'Irving'</span><span class="p">]}}}</span>
<span class="gp">>>> </span><span class="n">pp</span><span class="p">(</span><span class="n">cities</span><span class="p">,</span> <span class="n">depth</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="go">{'USA': {'Texas': {'Dallas': [...]}}}</span>
</pre></div>
<p>The ordinary <code>print()</code> also uses ellipses, but for displaying recursive data structures, which form a cycle, to avoid a stack overflow error:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">items</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">items</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">items</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">items</span><span class="p">)</span>
<span class="go">[1, 2, 3, [...]]</span>
</pre></div>
<p>However, <code>pprint()</code> is more explicit about it by including the unique identity of a self-referencing object:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">pp</span><span class="p">(</span><span class="n">items</span><span class="p">)</span>
<span class="go">[1, 2, 3, <Recursion on list with id=140635757287688>]</span>
<span class="gp">>>> </span><span class="nb">id</span><span class="p">(</span><span class="n">items</span><span class="p">)</span>
<span class="go">140635757287688</span>
</pre></div>
<p>The last element in the list is the same object as the entire list.</p>
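<p>If you’d like to confirm this in code, then a short sketch along these lines should do. It relies on the <code>is</code> operator and on <code>pprint.isrecursive()</code>, which reports whether a structure contains a reference to itself:</p>

```python
import pprint

items = [1, 2, 3]
items.append(items)

# The last element is the very same object as the list itself.
print(items[-1] is items)

# pprint can tell you up front whether a structure contains a cycle.
print(pprint.isrecursive(items))
print(pprint.isrecursive([1, 2, 3]))
```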
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Recursive or very large data sets can be dealt with using the <code>reprlib</code> module as well:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">reprlib</span>
<span class="gp">>>> </span><span class="n">reprlib</span><span class="o">.</span><span class="n">repr</span><span class="p">([</span><span class="n">x</span><span class="o">**</span><span class="mi">10</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)])</span>
<span class="go">'[0, 1, 1024, 59049, 1048576, 9765625, ...]'</span>
</pre></div>
<p>This module supports most of the built-in types and is used by the Python debugger.</p>
</div>
<p><code>pprint()</code> automatically sorts dictionary keys for you before printing, which allows for consistent comparison. When you’re comparing strings, you often don’t care about a particular order of serialized attributes. Still, it’s always best to compare actual dictionaries before serialization.</p>
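<p>Here’s a small sketch of that idea. It assumes Python 3.7 or later, where dictionaries remember their insertion order, and the names <code>first</code> and <code>second</code> are made up for the example:</p>

```python
from pprint import pformat

# Two dictionaries with the same content but different insertion order.
first = {'username': 'jdoe', 'password': 's3cret'}
second = {'password': 's3cret', 'username': 'jdoe'}

# The dictionaries themselves compare equal regardless of order.
print(first == second)

# Their plain string forms differ, because they follow insertion order.
print(str(first) == str(second))

# pformat() sorts the keys, so the formatted strings match.
print(pformat(first) == pformat(second))
```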
<p>Dictionaries often represent <a href="https://realpython.com/python-json/">JSON data</a>, which is widely used on the Internet. To correctly serialize a dictionary into a valid JSON-formatted string, you can take advantage of the <code>json</code> module. It too has pretty-printing capabilities:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">json</span>
<span class="gp">>>> </span><span class="n">data</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'username'</span><span class="p">:</span> <span class="s1">'jdoe'</span><span class="p">,</span> <span class="s1">'password'</span><span class="p">:</span> <span class="s1">'s3cret'</span><span class="p">}</span>
<span class="gp">>>> </span><span class="n">ugly</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">pretty</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">sort_keys</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">ugly</span><span class="p">)</span>
<span class="go">{"username": "jdoe", "password": "s3cret"}</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">pretty</span><span class="p">)</span>
<span class="go">{</span>
<span class="go">    "password": "s3cret",</span>
<span class="go">    "username": "jdoe"</span>
<span class="go">}</span>
</pre></div>
<p>Notice, however, that you need to handle printing yourself, because it’s not something you’d typically want to do. Similarly, the <code>pprint</code> module has an additional <code>pformat()</code> function that returns a string, in case you need to do something other than print it.</p>
<p>Surprisingly, the signature of <code>pprint()</code> is nothing like that of the <code>print()</code> function. You can’t even pass more than one object to print, which shows how much it focuses on printing data structures.</p>
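<p>The following sketch illustrates both points. It reuses the <code>cities</code> dictionary from earlier, while <code>text</code> and <code>stream</code> are placeholder names. Note how the second positional parameter of <code>pprint()</code> is the target stream rather than another value to print:</p>

```python
import io
from pprint import pformat, pprint

cities = {'USA': {'Texas': {'Dallas': ['Irving']}}}

# pformat() returns the text instead of printing it.
text = pformat(cities, depth=2)

# The second parameter of pprint() is a stream, not another value to print.
stream = io.StringIO()
pprint(cities, stream, depth=2)

print(text)
print(stream.getvalue() == text + '\n')
```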
<h3 id="adding-colors-with-ansi-escape-sequences">Adding Colors With ANSI Escape Sequences</h3>
<p>As personal computers got more sophisticated, they had better graphics and could display more colors. However, different vendors had their own idea about the API design for controlling it. That changed a few decades ago when people at the American National Standards Institute decided to unify it by defining <a href="https://en.wikipedia.org/wiki/ANSI_escape_code">ANSI escape codes</a>.</p>
<p>Most of today’s terminal emulators support this standard to some degree. Until recently, the Windows operating system was a notable exception. Therefore, if you want the best portability, use the <a href="https://pypi.org/project/colorama/"><code>colorama</code></a> library in Python. It translates ANSI codes to their appropriate counterparts in Windows while keeping them intact in other operating systems.</p>
<p>To check if your terminal understands a subset of the ANSI escape sequences, for example, related to colors, you can try using the following command:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> tput colors
</pre></div>
<p>My default terminal on Linux says it can display 256 distinct colors, while xterm gives me only 8. The command prints a negative number if colors are unsupported.</p>
<p>ANSI escape sequences are like a markup language for the terminal. In HTML you work with tags, such as <code><b></code> or <code><i></code>, to change how elements look in the document. These tags are mixed with your content, but they’re not visible themselves. Similarly, escape codes won’t show up in the terminal as long as it recognizes them. Otherwise, they’ll appear in the literal form as if you were viewing the source of a website.</p>
<p>As its name implies, a sequence must begin with the non-printable <span class="keys"><kbd class="key-escape">Esc</kbd></span> character, whose ASCII value is 27, sometimes denoted as <code>0x1b</code> in hexadecimal or <code>033</code> in octal. You may use Python number literals to quickly verify it’s indeed the same number:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="mi">27</span> <span class="o">==</span> <span class="mh">0x1b</span> <span class="o">==</span> <span class="mo">0o33</span>
<span class="go">True</span>
</pre></div>
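<p>The same holds for the character itself. In this quick sketch, <code>ESC</code> is just a convenient name for the example:</p>

```python
# Three ways of spelling the same non-printable character.
ESC = chr(27)

print(ESC == '\x1b' == '\033')

# ord() maps the character back to its numeric code.
print(ord(ESC))

# repr() shows the escaped form instead of the raw character.
print(repr(ESC))
```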
<p>Additionally, you can obtain it with the <code>\e</code> escape sequence in the shell:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">echo</span> -e <span class="s2">"\e"</span>
</pre></div>
<p>The most common ANSI escape sequences take the following form:</p>
<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>Element</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="keys"><kbd class="key-escape">Esc</kbd></span></td>
<td>non-printable escape character</td>
<td><code>\033</code></td>
</tr>
<tr>
<td><code>[</code></td>
<td>opening square bracket</td>
<td><code>[</code></td>
</tr>
<tr>
<td>numeric code</td>
<td>one or more numbers separated with <code>;</code></td>
<td><code>0</code></td>
</tr>
<tr>
<td>character code</td>
<td>uppercase or lowercase letter</td>
<td><code>m</code></td>
</tr>
</tbody>
</table>
</div>
<p>The <strong>numeric code</strong> can be one or more numbers separated with a semicolon, while the <strong>character code</strong> is just one letter. Their specific meaning is defined by the ANSI standard. For example, to reset all formatting, you would type one of the following commands, which use the code zero and the letter <code>m</code>:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">echo</span> -e <span class="s2">"\e[0m"</span>
<span class="gp">$</span> <span class="nb">echo</span> -e <span class="s2">"\x1b[0m"</span>
<span class="gp">$</span> <span class="nb">echo</span> -e <span class="s2">"\033[0m"</span>
</pre></div>
<p>At the other end of the spectrum, you have compound code values. To set foreground and background with RGB channels, given that your terminal supports 24-bit depth, you could provide multiple numbers:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">echo</span> -e <span class="s2">"\e[38;2;0;0;0m\e[48;2;255;255;255mBlack on white\e[0m"</span>
</pre></div>
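<p>For comparison, here’s how you might build the same compound sequence in Python. This is only a sketch: <code>rgb_foreground()</code> and <code>rgb_background()</code> are hypothetical helpers, and the colors will only show up if your terminal supports 24-bit depth:</p>

```python
ESC = '\033'
RESET = f'{ESC}[0m'

def rgb_foreground(red, green, blue):
    # 38;2 selects a 24-bit foreground color.
    return f'{ESC}[38;2;{red};{green};{blue}m'

def rgb_background(red, green, blue):
    # 48;2 selects a 24-bit background color.
    return f'{ESC}[48;2;{red};{green};{blue}m'

text = (
    rgb_foreground(0, 0, 0)
    + rgb_background(255, 255, 255)
    + 'Black on white'
    + RESET
)
print(text)
```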
<p>It’s not just text color that you can set with the ANSI escape codes. You can, for example, clear and scroll the terminal window, change its background, move the cursor around, make the text blink or decorate it with an underline.</p>
<p>In Python, you’d probably write a helper function to allow for wrapping arbitrary codes into a sequence:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">esc</span><span class="p">(</span><span class="n">code</span><span class="p">):</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="n">f</span><span class="s1">'</span><span class="se">\033</span><span class="s1">[</span><span class="si">{code}</span><span class="s1">m'</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">esc</span><span class="p">(</span><span class="s1">'31;1;4'</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'really'</span> <span class="o">+</span> <span class="n">esc</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">+</span> <span class="s1">' important'</span><span class="p">)</span>
</pre></div>
<p>This would make the word <code>really</code> appear in red, bold, and underlined font:</p>
<p><a href="https://files.realpython.com/media/ansi.21ed85878eb9.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/ansi.21ed85878eb9.png" width="576" height="236" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ansi.21ed85878eb9.png&w=144&sig=1b3e337786114c9e099a662d5e096ce1d41ba6a5 144w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ansi.21ed85878eb9.png&w=288&sig=cb8720ce7fb4db092d674380ef3bf3543503546a 288w, https://files.realpython.com/media/ansi.21ed85878eb9.png 576w" sizes="75vw" alt="Text formatted with ANSI escape codes"/></a></p>
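<p>Numeric codes are hard to remember, so you might go one step further and give them names. The sketch below is one possible way to do it. The <code>SGR_CODES</code> mapping and the <code>style()</code> function are made up for this example and cover only a handful of codes:</p>

```python
# A few common numeric codes, keyed by a readable name.
SGR_CODES = {'reset': 0, 'bold': 1, 'underline': 4, 'red': 31, 'green': 32}

def esc(code):
    return f'\033[{code}m'

def style(text, *names):
    # Join the requested codes with semicolons, then reset afterward.
    codes = ';'.join(str(SGR_CODES[name]) for name in names)
    return esc(codes) + text + esc(SGR_CODES['reset'])

print(style('really', 'red', 'bold', 'underline') + ' important')
```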
<p>However, there are higher-level abstractions over ANSI escape codes, such as the mentioned <code>colorama</code> library, as well as tools for building user interfaces in the console.</p>
<h3 id="building-console-user-interfaces">Building Console User Interfaces</h3>
<p>While playing with ANSI escape codes is undeniably a ton of fun, in the real world you’d rather have more abstract building blocks to put together a user interface. There are a few libraries that provide such a high level of control over the terminal, but <a href="https://docs.python.org/3/howto/curses.html"><code>curses</code></a> seems to be the most popular choice.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> To use the <code>curses</code> library in Windows, you need to install a third-party package:</p>
<div class="highlight sh"><pre><span></span><span class="go">C:\> pip install windows-curses</span>
</pre></div>
<p>That’s because <code>curses</code> isn’t available in the standard library of the Python distribution for Windows.</p>
</div>
<p>Primarily, it allows you to think in terms of independent graphical widgets instead of a blob of text. Besides, you get a lot of freedom in expressing your inner artist, because it’s really like painting a blank canvas. The library hides the complexities of having to deal with different terminals. Other than that, it has great support for keyboard events, which might be useful for writing video games.</p>
<p>How about making a retro snake game? Let’s create a Python snake simulator:</p>
<p><a href="https://files.realpython.com/media/snake.a9589582b58a.gif" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/snake.a9589582b58a.gif" width="576" height="392" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/snake.a9589582b58a.gif&w=144&sig=a02bac6a4123b5c1a10112973045e6deec2895c3 144w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/snake.a9589582b58a.gif&w=288&sig=0019bb69f00b55d6201acf7661b870f865758f61 288w, https://files.realpython.com/media/snake.a9589582b58a.gif 576w" sizes="75vw" alt="The retro snake game built with curses library"/></a></p>
<p>First, you need to import the <code>curses</code> module. Since it modifies the state of a running terminal, it’s important to handle errors and gracefully restore the previous state. You can do this manually, but the library comes with a convenient wrapper for your main function:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">curses</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">screen</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">wrapper</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
</pre></div>
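<p>To get a feel for what the wrapper saves you from, here’s a rough sketch of the manual approach. The <code>run_manually()</code> name is hypothetical, and the real <code>curses.wrapper()</code> may take a few more steps than shown here:</p>

```python
import curses

def run_manually(main):
    # Take over the terminal and get the screen object.
    screen = curses.initscr()
    try:
        curses.noecho()      # Don't echo pressed keys
        curses.cbreak()      # React to keys without waiting for Enter
        screen.keypad(True)  # Translate special keys, such as arrows
        return main(screen)
    finally:
        # Restore the terminal even if main() raised an exception.
        screen.keypad(False)
        curses.nocbreak()
        curses.echo()
        curses.endwin()
```

<p>The <code>try</code> … <code>finally</code> block is the important part, because it restores the terminal even when your function fails.</p>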
<p>Note that the function must accept a reference to the screen object, also known as <code>stdscr</code>, that you’ll use later for additional setup.</p>
<p>If you run this program now, you won’t see any effects, because it terminates immediately. However, you can add a small delay to have a sneak peek:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">time</span><span class="o">,</span> <span class="nn">curses</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">screen</span><span class="p">):</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">wrapper</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
</pre></div>
<p>This time the screen went completely blank for a second, but the cursor was still blinking. To hide it, just call one of the configuration functions defined in the module:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">time</span><span class="o">,</span> <span class="nn">curses</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">screen</span><span class="p">):</span>
<span class="hll">    <span class="n">curses</span><span class="o">.</span><span class="n">curs_set</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>  <span class="c1"># Hide the cursor</span>
</span>    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">wrapper</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
</pre></div>
<p>Let’s define the snake as a list of points in screen coordinates:</p>
<div class="highlight python"><pre><span></span><span class="n">snake</span> <span class="o">=</span> <span class="p">[(</span><span class="mi">0</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">20</span><span class="p">))]</span>
</pre></div>
<p>The head of the snake is always the first element in the list, whereas the tail is the last one. The initial shape of the snake is horizontal, starting from the top-left corner of the screen and facing to the right. While its y-coordinate stays at zero, its x-coordinate decreases from head to tail.</p>
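<p>You can verify this layout with a few lines of plain Python. The <code>head</code> and <code>tail</code> names below are only there for illustration:</p>

```python
snake = [(0, i) for i in reversed(range(20))]

# Each point is a (y, x) pair, with the head stored first.
head, tail = snake[0], snake[-1]

print(head)  # (0, 19)
print(tail)  # (0, 0)

# Every segment lies on the top row of the screen.
print(all(y == 0 for y, x in snake))  # True
```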
<p>To draw the snake, you’ll start with the head and then follow with the remaining segments. Each segment carries <code>(y, x)</code> coordinates, so you can unpack them:</p>
<div class="highlight python"><pre><span></span><span class="c1"># Draw the snake</span>
<span class="n">screen</span><span class="o">.</span><span class="n">addstr</span><span class="p">(</span><span class="o">*</span><span class="n">snake</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s1">'@'</span><span class="p">)</span>
<span class="k">for</span> <span class="n">segment</span> <span class="ow">in</span> <span class="n">snake</span><span class="p">[</span><span class="mi">1</span><span class="p">:]:</span>
    <span class="n">screen</span><span class="o">.</span><span class="n">addstr</span><span class="p">(</span><span class="o">*</span><span class="n">segment</span><span class="p">,</span> <span class="s1">'*'</span><span class="p">)</span>
</pre></div>
<p>Again, if you run this code now, it won’t display anything, because you must explicitly refresh the screen afterward:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">time</span><span class="o">,</span> <span class="nn">curses</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">screen</span><span class="p">):</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">curs_set</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>  <span class="c1"># Hide the cursor</span>
    <span class="n">snake</span> <span class="o">=</span> <span class="p">[(</span><span class="mi">0</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">20</span><span class="p">))]</span>
    <span class="c1"># Draw the snake</span>
    <span class="n">screen</span><span class="o">.</span><span class="n">addstr</span><span class="p">(</span><span class="o">*</span><span class="n">snake</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s1">'@'</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">segment</span> <span class="ow">in</span> <span class="n">snake</span><span class="p">[</span><span class="mi">1</span><span class="p">:]:</span>
        <span class="n">screen</span><span class="o">.</span><span class="n">addstr</span><span class="p">(</span><span class="o">*</span><span class="n">segment</span><span class="p">,</span> <span class="s1">'*'</span><span class="p">)</span>
<span class="hll">    <span class="n">screen</span><span class="o">.</span><span class="n">refresh</span><span class="p">()</span>
</span>    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">wrapper</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
</pre></div>
<p>You want to move the snake in one of four directions, which can be defined as vectors. Eventually, the direction will change in response to an arrow keystroke, so you may hook it up to the library’s key codes:</p>
<div class="highlight python"><pre><span></span><span class="n">directions</span> <span class="o">=</span> <span class="p">{</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">KEY_UP</span><span class="p">:</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">KEY_DOWN</span><span class="p">:</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">KEY_LEFT</span><span class="p">:</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">),</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">KEY_RIGHT</span><span class="p">:</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="p">}</span>
<span class="n">direction</span> <span class="o">=</span> <span class="n">directions</span><span class="p">[</span><span class="n">curses</span><span class="o">.</span><span class="n">KEY_RIGHT</span><span class="p">]</span>
</pre></div>
<p>How does a snake move? It turns out that only its head really moves to a new location, while all other segments shift towards it. In each step, almost all segments remain the same, except for the head and the tail. Assuming the snake isn’t growing, you can remove the tail and insert a new head at the beginning of the list:</p>
<div class="highlight python"><pre><span></span><span class="c1"># Move the snake</span>
<span class="n">snake</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
<span class="n">snake</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">sum</span><span class="p">,</span> <span class="nb">zip</span><span class="p">(</span><span class="n">snake</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">direction</span><span class="p">))))</span>
</pre></div>
<p>To get the new coordinates of the head, you need to add the direction vector to it. However, adding tuples in Python results in a bigger tuple instead of the algebraic sum of the corresponding vector components. One way to fix this is by using the built-in <code>zip()</code>, <code>sum()</code>, and <code>map()</code> functions.</p>
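<p>To make the difference concrete, here is a minimal sketch that compares plain tuple addition with the component-wise sum. The sample coordinates are arbitrary values chosen only for illustration:</p>

```python
head = (0, 19)       # (row, column) of the snake's head, sample value
direction = (0, 1)   # one step to the right

# The + operator concatenates tuples instead of adding their components
concatenated = head + direction

# zip() pairs up the matching components, and map() applies sum() to each pair
new_head = tuple(map(sum, zip(head, direction)))

print(concatenated)  # (0, 19, 0, 1)
print(new_head)      # (0, 20)
```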
<p>The direction will change on a keystroke, so you need to call <code>.getch()</code> to obtain the pressed key code. However, if the pressed key doesn’t correspond to the arrow keys defined earlier as dictionary keys, the direction won’t change:</p>
<div class="highlight python"><pre><span></span><span class="c1"># Change direction on arrow keystroke</span>
<span class="n">direction</span> <span class="o">=</span> <span class="n">directions</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">screen</span><span class="o">.</span><span class="n">getch</span><span class="p">(),</span> <span class="n">direction</span><span class="p">)</span>
</pre></div>
<p>By default, however, <code>.getch()</code> is a blocking call that would prevent the snake from moving unless there was a keystroke. Therefore, you need to make the call non-blocking by adding yet another configuration:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">screen</span><span class="p">):</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">curs_set</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c1"># Hide the cursor</span>
<span class="hll">    <span class="n">screen</span><span class="o">.</span><span class="n">nodelay</span><span class="p">(</span><span class="kc">True</span><span class="p">)</span> <span class="c1"># Don't block I/O calls</span>
</span></pre></div>
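<p>The fallback behavior of <code>.get()</code> can be tried out without a terminal. The sketch below assumes placeholder strings in place of the real <code>curses</code> key codes, and the helper name <code>next_direction()</code> is hypothetical. The value <code>-1</code> stands for “no key pressed,” which is what <code>.getch()</code> returns in no-delay mode when there is no input:</p>

```python
# Hypothetical stand-ins for curses.KEY_* constants, so no terminal is needed
KEY_UP, KEY_DOWN, KEY_LEFT, KEY_RIGHT = 'up', 'down', 'left', 'right'

directions = {
    KEY_UP: (-1, 0),
    KEY_DOWN: (1, 0),
    KEY_LEFT: (0, -1),
    KEY_RIGHT: (0, 1),
}

def next_direction(key, current):
    # Keys missing from the dictionary leave the current direction unchanged
    return directions.get(key, current)

direction = directions[KEY_RIGHT]
direction = next_direction(-1, direction)      # No key pressed: keep going right
direction = next_direction('q', direction)     # Unrelated key: keep going right
direction = next_direction(KEY_UP, direction)  # Arrow key: turn upward
print(direction)  # (-1, 0)
```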
<p>You’re almost done, but there’s just one last thing left. If you now loop this code, the snake will appear to be growing instead of moving. That’s because you have to erase the screen explicitly before each iteration.</p>
<p>Finally, this is all you need to play the snake game in Python:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">time</span><span class="o">,</span> <span class="nn">curses</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">screen</span><span class="p">):</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">curs_set</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c1"># Hide the cursor</span>
    <span class="n">screen</span><span class="o">.</span><span class="n">nodelay</span><span class="p">(</span><span class="kc">True</span><span class="p">)</span> <span class="c1"># Don't block I/O calls</span>
    <span class="n">directions</span> <span class="o">=</span> <span class="p">{</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">KEY_UP</span><span class="p">:</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">KEY_DOWN</span><span class="p">:</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">KEY_LEFT</span><span class="p">:</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">),</span>
        <span class="n">curses</span><span class="o">.</span><span class="n">KEY_RIGHT</span><span class="p">:</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
    <span class="p">}</span>
    <span class="n">direction</span> <span class="o">=</span> <span class="n">directions</span><span class="p">[</span><span class="n">curses</span><span class="o">.</span><span class="n">KEY_RIGHT</span><span class="p">]</span>
    <span class="n">snake</span> <span class="o">=</span> <span class="p">[(</span><span class="mi">0</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">20</span><span class="p">))]</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="hll">        <span class="n">screen</span><span class="o">.</span><span class="n">erase</span><span class="p">()</span>
</span>
        <span class="c1"># Draw the snake</span>
        <span class="n">screen</span><span class="o">.</span><span class="n">addstr</span><span class="p">(</span><span class="o">*</span><span class="n">snake</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s1">'@'</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">segment</span> <span class="ow">in</span> <span class="n">snake</span><span class="p">[</span><span class="mi">1</span><span class="p">:]:</span>
            <span class="n">screen</span><span class="o">.</span><span class="n">addstr</span><span class="p">(</span><span class="o">*</span><span class="n">segment</span><span class="p">,</span> <span class="s1">'*'</span><span class="p">)</span>
        <span class="c1"># Move the snake</span>
        <span class="n">snake</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
        <span class="n">snake</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">sum</span><span class="p">,</span> <span class="nb">zip</span><span class="p">(</span><span class="n">snake</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">direction</span><span class="p">))))</span>
        <span class="c1"># Change direction on arrow keystroke</span>
        <span class="n">direction</span> <span class="o">=</span> <span class="n">directions</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">screen</span><span class="o">.</span><span class="n">getch</span><span class="p">(),</span> <span class="n">direction</span><span class="p">)</span>
        <span class="n">screen</span><span class="o">.</span><span class="n">refresh</span><span class="p">()</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.1</span><span class="p">)</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">curses</span><span class="o">.</span><span class="n">wrapper</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
</pre></div>
<p>This is merely scratching the surface of the possibilities that the <code>curses</code> module opens up. You may use it for game development like this or more business-oriented applications.</p>
<h3 id="living-it-up-with-cool-animations">Living It Up With Cool Animations</h3>
<p>Not only can animations make the user interface more appealing to the eye, but they also improve the overall user experience. When you provide early feedback to the user, for example, they’ll know if your program’s still working or if it’s time to kill it.</p>
<p>To animate text in the terminal, you have to be able to freely move the cursor around. You can do this with one of the tools mentioned previously, that is, ANSI escape codes or the <code>curses</code> library. However, I’d like to show you an even simpler way.</p>
<p>If the animation can be constrained to a single line of text, then you might be interested in two special escape character sequences:</p>
<ul>
<li><strong>Carriage return:</strong> <code>\r</code></li>
<li><strong>Backspace:</strong> <code>\b</code></li>
</ul>
<p>The first one moves the cursor to the beginning of the line, whereas the second one moves it only one character to the left. They both work in a non-destructive way without overwriting text that’s already been written.</p>
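<p>To see what “non-destructive” means in practice, you can model a single terminal line in plain Python. The <code>render()</code> function below is a hypothetical, deliberately simplified simulation of how a typical terminal treats these two characters. It isn’t part of any library, and real terminals handle many more control codes:</p>

```python
def render(text):
    # Simplified model of one terminal line: only handles \r and \b
    line = []    # Characters currently visible on the line
    cursor = 0   # Column where the next character will be written
    for char in text:
        if char == '\r':
            cursor = 0                   # Jump to the start, erase nothing
        elif char == '\b':
            cursor = max(cursor - 1, 0)  # Step one column left, erase nothing
        else:
            if cursor < len(line):
                line[cursor] = char      # Overwrite the character under the cursor
            else:
                line.append(char)
            cursor += 1
    return ''.join(line)

print(render('Hello\rJ'))   # Jello
print(render('abc\b\bX'))   # aXc
```

<p>Existing text only changes once something new is written over it, which is why the animations that follow reprint characters after moving the cursor.</p>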
<p>Let’s take a look at a few examples.</p>
<p>You’ll often want to display some kind of a <strong>spinning wheel</strong> to indicate a work in progress without knowing exactly how much time’s left to finish:</p>
<p><a href="https://files.realpython.com/media/spinning_wheel.c595af6f83ea.gif" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/spinning_wheel.c595af6f83ea.gif" width="576" height="236" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/spinning_wheel.c595af6f83ea.gif&w=144&sig=6668f7484a9d15711f5a9f149647b38baf8683cb 144w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/spinning_wheel.c595af6f83ea.gif&w=288&sig=267bb31016ff764a987006dcecafcf80233af3ad 288w, https://files.realpython.com/media/spinning_wheel.c595af6f83ea.gif 576w" sizes="75vw" alt="Indefinite animation in the terminal"/></a></p>
<p>Many command line tools use this trick while downloading data over the network. You can make a really simple stop motion animation from a sequence of characters that will cycle in a round-robin fashion:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">itertools</span> <span class="k">import</span> <span class="n">cycle</span>
<span class="kn">from</span> <span class="nn">time</span> <span class="k">import</span> <span class="n">sleep</span>
<span class="k">for</span> <span class="n">frame</span> <span class="ow">in</span> <span class="n">cycle</span><span class="p">(</span><span class="sa">r</span><span class="s1">'-\|/-\|/'</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="se">\r</span><span class="s1">'</span><span class="p">,</span> <span class="n">frame</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">''</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">''</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">sleep</span><span class="p">(</span><span class="mf">0.2</span><span class="p">)</span>
</pre></div>
<p>The loop gets the next character to print, then moves the cursor to the beginning of the line, and overwrites whatever there was before without adding a newline. You don’t want extra space between positional arguments, so the separator argument must be blank. Also, notice the use of a raw string literal, which keeps the backslash characters in it from being treated as escape sequences.</p>
<p>When you know the remaining time or task completion percentage, then you’re able to show an animated progress bar:</p>
<p><a href="https://files.realpython.com/media/progress.6bd055d8dcc4.gif" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/progress.6bd055d8dcc4.gif" width="576" height="236" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/progress.6bd055d8dcc4.gif&w=144&sig=b77d52a79d28329d6ec8d5e43fbffd8bc112896e 144w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/progress.6bd055d8dcc4.gif&w=288&sig=57b015e2b6a7534b9a9d027d81a71b52967c20e0 288w, https://files.realpython.com/media/progress.6bd055d8dcc4.gif 576w" sizes="75vw" alt="Progress bar animation in the terminal"/></a></p>
<p>First, you need to calculate how many hash characters (<code>#</code>) to display and how many blank spaces to insert. Next, you erase the line and build the bar from scratch:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">time</span> <span class="k">import</span> <span class="n">sleep</span>
<span class="k">def</span> <span class="nf">progress</span><span class="p">(</span><span class="n">percent</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">width</span><span class="o">=</span><span class="mi">30</span><span class="p">):</span>
    <span class="n">left</span> <span class="o">=</span> <span class="n">width</span> <span class="o">*</span> <span class="n">percent</span> <span class="o">//</span> <span class="mi">100</span>
    <span class="n">right</span> <span class="o">=</span> <span class="n">width</span> <span class="o">-</span> <span class="n">left</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="se">\r</span><span class="s1">['</span><span class="p">,</span> <span class="s1">'#'</span> <span class="o">*</span> <span class="n">left</span><span class="p">,</span> <span class="s1">' '</span> <span class="o">*</span> <span class="n">right</span><span class="p">,</span> <span class="s1">']'</span><span class="p">,</span>
          <span class="n">f</span><span class="s1">' </span><span class="si">{percent:.0f}</span><span class="s1">%'</span><span class="p">,</span>
          <span class="n">sep</span><span class="o">=</span><span class="s1">''</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">''</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">101</span><span class="p">):</span>
    <span class="n">progress</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
    <span class="n">sleep</span><span class="p">(</span><span class="mf">0.1</span><span class="p">)</span>
</pre></div>
<p>As before, each request for update repaints the entire line.</p>
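<p>If you’d like to check the arithmetic without watching the animation, one option is to separate building the bar from printing it. The sketch below uses a hypothetical helper, <code>render_bar()</code>, which applies the same formula as <code>progress()</code> but returns the text instead of writing it to the terminal. It assumes that <code>percent</code> is an integer between 0 and 100:</p>

```python
def render_bar(percent, width=30):
    # Same calculation as progress(), assuming an integer percent from 0 to 100
    left = width * percent // 100   # Number of filled cells
    right = width - left            # Number of empty cells
    return '[' + '#' * left + ' ' * right + ']' + f' {percent:.0f}%'

print(render_bar(0, width=10))    # [          ] 0%
print(render_bar(50, width=10))   # [#####     ] 50%
print(render_bar(100, width=10))  # [##########] 100%
```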
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> There’s a feature-rich <a href="https://pypi.org/project/progressbar2/"><code>progressbar2</code></a> library, along with a few other similar tools, that can show progress in a much more comprehensive way.</p>
</div>
<h3 id="making-sounds-with-print">Making Sounds With Print</h3>
<p>If you’re old enough to remember computers with a PC speaker, then you must also remember their distinctive <em>beep</em> sound, often used to indicate hardware problems. They could barely make any more noises than that, yet video games seemed so much better with it.</p>
<p>Today you can still take advantage of this small loudspeaker, but chances are your laptop didn’t come with one. In such a case, you can enable <strong>terminal bell</strong> emulation in your shell, so that a system warning sound is played instead.</p>
<p>Go ahead and type this command to see if your terminal can play a sound:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">echo</span> -e <span class="s2">"\a"</span>
</pre></div>
<p>This would normally print text, but the <code>-e</code> flag enables the interpretation of backslash escapes. As you can see, there’s a dedicated escape sequence <code>\a</code>, which stands for “alert”, that outputs a special <a href="https://en.wikipedia.org/wiki/Bell_character">bell character</a>. Some terminals make a sound whenever they see it.</p>
<p>Similarly, you can print this character in Python, perhaps in a loop to form some kind of melody. While it’s only a single note, you can still vary the length of pauses between consecutive instances. That seems like a perfect toy for Morse code playback!</p>
<p>The rules are the following:</p>
<ul>
<li>Letters are encoded with a sequence of <strong>dot</strong> (·) and <strong>dash</strong> (–) symbols.</li>
<li>A <strong>dot</strong> is one unit of time.</li>
<li>A <strong>dash</strong> is three units of time.</li>
<li>Individual <strong>symbols</strong> in a letter are spaced one unit of time apart.</li>
<li>Symbols of two adjacent <strong>letters</strong> are spaced three units of time apart.</li>
<li>Symbols of two adjacent <strong>words</strong> are spaced seven units of time apart.</li>
</ul>
<p>According to those rules, you could be “printing” an SOS signal indefinitely in the following way:</p>
<div class="highlight python"><pre><span></span><span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="n">dot</span><span class="p">()</span>
    <span class="n">symbol_space</span><span class="p">()</span>
    <span class="n">dot</span><span class="p">()</span>
    <span class="n">symbol_space</span><span class="p">()</span>
    <span class="n">dot</span><span class="p">()</span>
    <span class="n">letter_space</span><span class="p">()</span>
    <span class="n">dash</span><span class="p">()</span>
    <span class="n">symbol_space</span><span class="p">()</span>
    <span class="n">dash</span><span class="p">()</span>
    <span class="n">symbol_space</span><span class="p">()</span>
    <span class="n">dash</span><span class="p">()</span>
    <span class="n">letter_space</span><span class="p">()</span>
    <span class="n">dot</span><span class="p">()</span>
    <span class="n">symbol_space</span><span class="p">()</span>
    <span class="n">dot</span><span class="p">()</span>
    <span class="n">symbol_space</span><span class="p">()</span>
    <span class="n">dot</span><span class="p">()</span>
    <span class="n">word_space</span><span class="p">()</span>
</pre></div>
<p>In Python, you can implement it in merely ten lines of code:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">time</span> <span class="k">import</span> <span class="n">sleep</span>
<span class="n">speed</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="k">def</span> <span class="nf">signal</span><span class="p">(</span><span class="n">duration</span><span class="p">,</span> <span class="n">symbol</span><span class="p">):</span>
    <span class="n">sleep</span><span class="p">(</span><span class="n">duration</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">symbol</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">''</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">dot</span> <span class="o">=</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">signal</span><span class="p">(</span><span class="n">speed</span><span class="p">,</span> <span class="s1">'·</span><span class="se">\a</span><span class="s1">'</span><span class="p">)</span>
<span class="n">dash</span> <span class="o">=</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">signal</span><span class="p">(</span><span class="mi">3</span><span class="o">*</span><span class="n">speed</span><span class="p">,</span> <span class="s1">'–</span><span class="se">\a</span><span class="s1">'</span><span class="p">)</span>
<span class="n">symbol_space</span> <span class="o">=</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">signal</span><span class="p">(</span><span class="n">speed</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
<span class="n">letter_space</span> <span class="o">=</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">signal</span><span class="p">(</span><span class="mi">3</span><span class="o">*</span><span class="n">speed</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
<span class="n">word_space</span> <span class="o">=</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">signal</span><span class="p">(</span><span class="mi">7</span><span class="o">*</span><span class="n">speed</span><span class="p">,</span> <span class="s1">' '</span><span class="p">)</span>
</pre></div>
<p>Maybe you could even take it one step further and make a command line tool for translating text into Morse code? Either way, I hope you’re having fun with this!</p>
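<p>As a starting point for such a tool, here’s a rough sketch that derives the sequence of timing steps from text, following the spacing rules listed earlier. The <code>MORSE</code> table and the <code>schedule()</code> function are hypothetical names, and the table covers only the two letters needed for the example. The function returns the names of the steps rather than calling them, so you can inspect the result without any sound or delay:</p>

```python
# Deliberately incomplete table: just enough letters for the SOS example
MORSE = {'S': '...', 'O': '---'}

def schedule(word):
    # Return the names of the timing steps needed to signal a single word
    steps = []
    for i, letter in enumerate(word):
        if i > 0:
            steps.append('letter_space')      # Gap between adjacent letters
        for j, symbol in enumerate(MORSE[letter]):
            if j > 0:
                steps.append('symbol_space')  # Gap between symbols in a letter
            steps.append('dot' if symbol == '.' else 'dash')
    steps.append('word_space')                # Gap before the next word
    return steps

print(schedule('SOS'))
```

<p>For <code>'SOS'</code>, the result lists the same eighteen steps, in the same order, as the hand-written loop body shown before.</p>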
<h2 id="mocking-python-print-in-unit-tests">Mocking Python Print in Unit Tests</h2>
<p>Nowadays, it’s expected that you ship code that meets high quality standards. If you aspire to become a professional, you must learn <a href="https://realpython.com/python-testing/">how to test</a> your code.</p>
<p>Software testing is especially important in dynamically typed languages, such as Python, which don’t have a compiler to warn you about obvious mistakes. Defects can make their way to the production environment and remain dormant for a long time, until that one day when a branch of code finally gets executed.</p>
<p>Sure, you have <a href="https://realpython.com/python-code-quality/#linters">linters</a>, <a href="https://realpython.com/python-type-checking/">type checkers</a>, and other tools for static code analysis to assist you. But they won’t tell you whether your program does what it’s supposed to do on the business level.</p>
<p>So, should you be testing <code>print()</code>? No. After all, it’s a built-in function that must have already gone through a comprehensive suite of tests. What you want to test, though, is whether your code is calling <code>print()</code> at the right time with the expected parameters. That’s known as a <strong>behavior</strong>.</p>
<p>You can test behaviors by <a href="https://realpython.com/python-mock-library/">mocking</a> real objects or functions. In this case, you want to mock <code>print()</code> to record and verify its invocations.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> You might have heard the terms: <strong>dummy</strong>, <strong>fake</strong>, <strong>stub</strong>, <strong>spy</strong>, or <strong>mock</strong> used interchangeably. Some people make a distinction between them, while others don’t.</p>
<p>Martin Fowler explains their differences in a <a href="https://martinfowler.com/bliki/TestDouble.html">short glossary</a> and collectively calls them <strong>test doubles</strong>.</p>
</div>
<p>There are two ways to mock in Python. First, you can take the traditional path of statically typed languages by employing dependency injection. This may sometimes require you to change the code under test, which isn’t always possible if the code is defined in an external library:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">download</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">log</span><span class="o">=</span><span class="nb">print</span><span class="p">):</span>
    <span class="n">log</span><span class="p">(</span><span class="n">f</span><span class="s1">'Downloading </span><span class="si">{url}</span><span class="s1">'</span><span class="p">)</span>
    <span class="c1"># ...</span>
</pre></div>
<p>This is the same example I used in an earlier section to talk about function composition. It basically allows for substituting <code>print()</code> with a custom function of the same interface. To check if it prints the right message, you have to intercept it by injecting a mocked function:</p>
<div class="highlight python repl"><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">mock_print</span><span class="p">(</span><span class="n">message</span><span class="p">):</span>
<span class="gp">... </span>    <span class="n">mock_print</span><span class="o">.</span><span class="n">last_message</span> <span class="o">=</span> <span class="n">message</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">download</span><span class="p">(</span><span class="s1">'resource'</span><span class="p">,</span> <span class="n">mock_print</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">assert</span> <span class="s1">'Downloading resource'</span> <span class="o">==</span> <span class="n">mock_print</span><span class="o">.</span><span class="n">last_message</span>
</pre></div>
<p>Calling this mock makes it save the last message in an attribute, which you can inspect later, for example in an <code>assert</code> statement.</p>
<p>In a slightly alternative solution, instead of replacing the entire <code>print()</code> function with a custom wrapper, you could redirect the standard output to an in-memory file-like stream of characters:</p>
<div class="highlight python repl"><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">download</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">stream</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Downloading </span><span class="si">{url}</span><span class="s1">'</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">stream</span><span class="p">)</span>
<span class="gp">... </span>    <span class="c1"># ...</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">io</span>
<span class="gp">>>> </span><span class="n">memory_buffer</span> <span class="o">=</span> <span class="n">io</span><span class="o">.</span><span class="n">StringIO</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">download</span><span class="p">(</span><span class="s1">'app.js'</span><span class="p">,</span> <span class="n">memory_buffer</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">download</span><span class="p">(</span><span class="s1">'style.css'</span><span class="p">,</span> <span class="n">memory_buffer</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">memory_buffer</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()</span>
<span class="go">'Downloading app.js\nDownloading style.css\n'</span>
</pre></div>
<p>This time the function explicitly calls <code>print()</code>, but it exposes its <code>file</code> parameter to the outside world.</p>
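<p>When you can’t add a parameter to the function under test, the standard library’s <code>contextlib.redirect_stdout()</code> offers a related approach: it temporarily points <code>sys.stdout</code> at a stream of your choice. The following is a minimal sketch in which <code>download()</code> is a simplified stand-in that takes no stream argument at all:</p>

```python
import io
from contextlib import redirect_stdout

def download(url):
    # Simplified stand-in that prints straight to standard output
    print(f'Downloading {url}')
    # ...

memory_buffer = io.StringIO()
with redirect_stdout(memory_buffer):
    download('app.js')  # Output lands in the buffer, not on the screen

captured = memory_buffer.getvalue()
assert captured == 'Downloading app.js\n'
```

<p>Once the <code>with</code> block exits, standard output goes back to where it pointed before.</p>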
<p>However, a more Pythonic way of mocking objects takes advantage of the built-in <code>mock</code> module, which uses a technique called <a href="https://en.wikipedia.org/wiki/Monkey_patch">monkey patching</a>. This derogatory name stems from it being a “dirty hack” that you can easily shoot yourself in the foot with. It’s less elegant than dependency injection but definitely quick and convenient.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> The <code>mock</code> module was absorbed into the standard library in Python 3.3, but before that, it was a third-party package that you had to install separately:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> pip2 install mock
</pre></div>
<p>Other than that, you referred to it as <code>mock</code>, whereas in Python 3 it’s part of the unit testing module, so you must import from <code>unittest.mock</code>.</p>
</div>
<p>What monkey patching does is alter implementation dynamically at runtime. Such a change is visible globally, so it may have unwanted consequences. In practice, however, patching only affects the code for the duration of test execution.</p>
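<p>You can observe this limited lifetime directly. The sketch below uses <code>patch()</code> from <code>unittest.mock</code> as a context manager and checks that the built-in <code>print()</code> is swapped out only inside the <code>with</code> block. The variable names are made up for illustration:</p>

```python
import builtins
from unittest.mock import patch

original_print = builtins.print

with patch('builtins.print') as mock_print:
    # Inside the block, the name print resolves to the mock object
    patched_inside = builtins.print is mock_print
    print('swallowed by the mock')

# After the block, the genuine function is back in place
restored_after = builtins.print is original_print

assert patched_inside
assert restored_after
mock_print.assert_called_once_with('swallowed by the mock')
```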
<p>To mock <code>print()</code> in a test case, you’ll typically use the <code>@patch</code> <a href="https://realpython.com/primer-on-python-decorators/">decorator</a> and specify a target for patching by referring to it with a fully qualified name, that is, including the module name:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">unittest.mock</span> <span class="k">import</span> <span class="n">patch</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s1">'builtins.print'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_print</span><span class="p">(</span><span class="n">mock_print</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'not a real print'</span><span class="p">)</span>
    <span class="n">mock_print</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">(</span><span class="s1">'not a real print'</span><span class="p">)</span>
</pre></div>
<p>This will automatically create the mock for you and inject it into the test function. However, you need to declare that your test function accepts a mock now. The underlying mock object has lots of useful methods and attributes for verifying behavior.</p>
<p>Did you notice anything peculiar about that code snippet?</p>
<p>Despite injecting a mock into the function, you’re not calling it directly, although you could. That injected mock is only used to make assertions afterward and maybe to prepare the context before running the test.</p>
<p>In real life, mocking helps to isolate the code under test by removing dependencies such as a database connection. You rarely call mocks in a test, because that doesn’t make much sense. Rather, it’s other pieces of code that call your mock indirectly without knowing it.</p>
<p>Here’s what that means:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">unittest.mock</span> <span class="k">import</span> <span class="n">patch</span>
<span class="k">def</span> <span class="nf">greet</span><span class="p">(</span><span class="n">name</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Hello, </span><span class="si">{name}</span><span class="s1">!'</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s1">'builtins.print'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_greet</span><span class="p">(</span><span class="n">mock_print</span><span class="p">):</span>
    <span class="n">greet</span><span class="p">(</span><span class="s1">'John'</span><span class="p">)</span>
    <span class="n">mock_print</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">(</span><span class="s1">'Hello, John!'</span><span class="p">)</span>
</pre></div>
<p>The code under test is a function that prints a greeting. Even though it’s a fairly simple function, you can’t test it easily because it doesn’t return a value. It has a side-effect.</p>
<p>To eliminate that side-effect, you need to mock the dependency out. Patching lets you avoid making changes to the original function, which can remain agnostic about <code>print()</code>. It thinks it’s calling <code>print()</code>, but in reality, it’s calling a mock you’re in total control of.</p>
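<p>As a rough illustration of that control, the sketch below uses <code>patch()</code> as a context manager rather than a decorator and then inspects what the mock recorded. The <code>greet_all()</code> helper is a made-up function that exists only for this example:</p>

```python
from unittest.mock import call, patch


def greet_all(names):
    # Hypothetical code under test: prints one greeting per name.
    for name in names:
        print(f'Hello, {name}!')


# patch() also works as a context manager; print() is restored on exit.
with patch('builtins.print') as mock_print:
    greet_all(['John', 'Jane'])

# The mock remembers every call, so you can verify them afterward.
recorded_calls = mock_print.call_args_list
call_count = mock_print.call_count
mock_print.assert_has_calls([call('Hello, John!'), call('Hello, Jane!')])
```

<p>Because the patch is undone as soon as the <code>with</code> block ends, any <code>print()</code> call after it writes to the real standard output again.</p>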
<p>There are many reasons for testing software. One of them is looking for bugs. When you write tests, you often want to get rid of the <code>print()</code> function, for example, by mocking it away. Paradoxically, however, that same function can help you find bugs during a related process of debugging you’ll read about in the next section.</p>
<div class="card mb-3" id="collapse_card841486">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapse841486" aria-expanded="false" aria-controls="collapse841486">Syntax in Python 2</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapse841486" aria-expanded="false" aria-controls="collapse841486">Show/Hide</button></p></div>
<div id="collapse841486" class="collapse" data-parent="#collapse_card841486"><div class="card-body" markdown="1">
<p>You can’t monkey patch the <code>print</code> statement in Python 2, nor can you inject it as a dependency. However, you have a few other options:</p>
<ul>
<li>Use stream redirection.</li>
<li>Patch the standard output defined in the <code>sys</code> module.</li>
<li>Import <code>print()</code> from the <code>__future__</code> module.</li>
</ul>
<p>Let’s examine them one by one.</p>
<p>Stream redirection is almost identical to the example you saw earlier:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">download</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">stream</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
<span class="gp">... </span>    <span class="k">print</span> <span class="o">>></span> <span class="n">stream</span><span class="p">,</span> <span class="s1">'Downloading </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="n">url</span>
<span class="gp">... </span>    <span class="c1"># ...</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">StringIO</span> <span class="kn">import</span> <span class="n">StringIO</span>
<span class="gp">>>> </span><span class="n">memory_buffer</span> <span class="o">=</span> <span class="n">StringIO</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">download</span><span class="p">(</span><span class="s1">'app.js'</span><span class="p">,</span> <span class="n">memory_buffer</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">download</span><span class="p">(</span><span class="s1">'style.css'</span><span class="p">,</span> <span class="n">memory_buffer</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">memory_buffer</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()</span>
<span class="go">'Downloading app.js\nDownloading style.css\n'</span>
</pre></div>
<p>There are only two differences. First, the syntax for stream redirection uses the chevron (<code>>></code>) instead of the <code>file</code> argument. The other difference is where <code>StringIO</code> is defined. You can import it from a similarly named <code>StringIO</code> module, or <code>cStringIO</code> for a faster implementation.</p>
<p>Patching the standard output from the <code>sys</code> module is exactly what it sounds like, but you need to be aware of a few gotchas:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">mock</span> <span class="kn">import</span> <span class="n">patch</span><span class="p">,</span> <span class="n">call</span>
<span class="k">def</span> <span class="nf">greet</span><span class="p">(</span><span class="n">name</span><span class="p">):</span>
    <span class="k">print</span> <span class="s1">'Hello, </span><span class="si">%s</span><span class="s1">!'</span> <span class="o">%</span> <span class="n">name</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s1">'sys.stdout'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_greet</span><span class="p">(</span><span class="n">mock_stdout</span><span class="p">):</span>
    <span class="n">greet</span><span class="p">(</span><span class="s1">'John'</span><span class="p">)</span>
    <span class="n">mock_stdout</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">assert_has_calls</span><span class="p">([</span>
        <span class="n">call</span><span class="p">(</span><span class="s1">'Hello, John!'</span><span class="p">),</span>
        <span class="n">call</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
    <span class="p">])</span>
</pre></div>
<p>First of all, remember to install the <code>mock</code> module as it wasn’t available in the standard library in Python 2.</p>
<p>Secondly, the <code>print</code> statement calls the underlying <code>.write()</code> method on the mocked object instead of calling the object itself. That’s why you’ll run assertions against <code>mock_stdout.write</code>. </p>
<p>Finally, a single <code>print</code> statement doesn’t always correspond to a single call to <code>sys.stdout.write()</code>. In fact, you’ll see the newline character written separately.</p>
<p>The last option you have is importing <code>print()</code> from <code>__future__</code> and patching it:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">from</span> <span class="nn">mock</span> <span class="kn">import</span> <span class="n">patch</span>
<span class="k">def</span> <span class="nf">greet</span><span class="p">(</span><span class="n">name</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="s1">'Hello, </span><span class="si">%s</span><span class="s1">!'</span> <span class="o">%</span> <span class="n">name</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s1">'__builtin__.print'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_greet</span><span class="p">(</span><span class="n">mock_print</span><span class="p">):</span>
    <span class="n">greet</span><span class="p">(</span><span class="s1">'John'</span><span class="p">)</span>
    <span class="n">mock_print</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">(</span><span class="s1">'Hello, John!'</span><span class="p">)</span>
</pre></div>
<p>Again, it’s nearly identical to Python 3, but the <code>print()</code> function is defined in the <code>__builtin__</code> module rather than <code>builtins</code>.</p>
</div></div>
</div>
<h2 id="print-debugging">Print Debugging</h2>
<p>In this section, you’ll take a look at the available tools for debugging in Python, starting from a humble <code>print()</code> function, through the <code>logging</code> module, to a fully fledged debugger. After reading it, you’ll be able to make an educated decision about which of them is the most suitable in a given situation.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Debugging is the process of looking for the root causes of <strong>bugs</strong> or defects in software after they’ve been discovered, as well as taking steps to fix them.</p>
<p>The term <strong>bug</strong> has an <a href="https://en.wikipedia.org/wiki/Debugging#Origin_of_the_term">amusing story</a> about the origin of its name.</p>
</div>
<h3 id="tracing">Tracing</h3>
<p>Also known as <strong>print debugging</strong> or <strong>caveman debugging</strong>, it’s the most basic form of debugging. While a little bit old-fashioned, it’s still powerful and has its uses.</p>
<p>The idea is to follow the path of program execution until it stops abruptly, or gives incorrect results, to identify the exact instruction with a problem. You do that by inserting print statements with words that stand out in carefully chosen places.</p>
<p>Take a look at this example, which manifests a rounding error:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">average</span><span class="p">(</span><span class="n">numbers</span><span class="p">):</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="s1">'debug1:'</span><span class="p">,</span> <span class="n">numbers</span><span class="p">)</span>
<span class="gp">... </span>    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="gp">... </span>        <span class="nb">print</span><span class="p">(</span><span class="s1">'debug2:'</span><span class="p">,</span> <span class="nb">sum</span><span class="p">(</span><span class="n">numbers</span><span class="p">))</span>
<span class="gp">... </span>        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="mf">0.1</span> <span class="o">==</span> <span class="n">average</span><span class="p">(</span><span class="mi">3</span><span class="o">*</span><span class="p">[</span><span class="mf">0.1</span><span class="p">])</span>
<span class="go">debug1: [0.1, 0.1, 0.1]</span>
<span class="go">debug2: 0.30000000000000004</span>
<span class="go">False</span>
</pre></div>
<p>As you can see, the function doesn’t return the expected value of <code>0.1</code>, but now you know it’s because the sum is a little off. Tracing the state of variables at different steps of the algorithm can give you a hint where the issue is.</p>
<div class="card mb-3" id="collapse_cardab3ddc">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapseab3ddc" aria-expanded="false" aria-controls="collapseab3ddc">Rounding Error</button> <button class="btn btn-link float-right" data-toggle="collapse" data-target="#collapseab3ddc" aria-expanded="false" aria-controls="collapseab3ddc">Show/Hide</button></p></div>
<div id="collapseab3ddc" class="collapse" data-parent="#collapse_cardab3ddc"><div class="card-body" markdown="1">
<p>In this case, the problem lies in how <strong>floating point</strong> numbers are represented in computer memory. Remember that numbers are stored in binary form. The decimal value of <code>0.1</code> turns out to have an infinite binary representation, which gets rounded.</p>
<p>For more information on rounding numbers in Python, you can check out <a href="https://realpython.com/python-rounding/">How to Round Numbers in Python</a>.</p>
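<p>If you’d like to see that rounding for yourself, here’s a small sketch using only the standard library. Passing a float to <code>Decimal</code> exposes the exact value that’s stored, and <code>math.isclose()</code> is one common way to compare floats with a tolerance. The variable names are arbitrary:</p>

```python
import math
from decimal import Decimal

total = sum(3 * [0.1])
mean = total / 3

# Decimal(float) reveals the exact binary value stored for the literal 0.1.
stored = Decimal(0.1)

exact_match = (mean == 0.1)             # exact comparison fails
close_enough = math.isclose(mean, 0.1)  # tolerance-based comparison succeeds

print(stored)
print(total, exact_match, close_enough)
```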
</div></div>
</div>
<p>This method is simple and intuitive and will work in pretty much every programming language out there. Not to mention, it’s a great exercise in the learning process.</p>
<p>On the other hand, once you master more advanced techniques, it’s hard to go back, because they allow you to find bugs much more quickly. Tracing is a laborious manual process, which can let even more errors slip through. The build and deploy cycle takes time. Afterward, you need to remember to meticulously remove all the <code>print()</code> calls you made without accidentally touching the genuine ones.</p>
<p>Besides, it requires you to make changes in the code, which isn’t always possible. Maybe you’re debugging an application running on a remote web server or want to diagnose a problem in a <strong>post-mortem</strong> fashion. Sometimes you simply don’t have access to the standard output.</p>
<p>That’s precisely where <a href="https://realpython.com/courses/logging-python/">logging</a> shines.</p>
<h3 id="logging">Logging</h3>
<p>Let’s pretend for a minute that you’re running an e-commerce website. One day, an angry customer makes a phone call complaining about a failed transaction and saying he lost his money. He claims to have tried purchasing a few items, but in the end, there was some cryptic error that prevented him from finishing that order. Yet, when he checked his bank account, the money was gone.</p>
<p>You apologize sincerely and make a refund, but also don’t want this to happen again in the future. How do you debug that? If only you had some trace of what happened, ideally in the form of a chronological list of events with their context.</p>
<p>Whenever you find yourself doing print debugging, consider turning it into permanent log messages. This may help in situations like this, when you need to analyze a problem after it happened, in an environment that you don’t have access to.</p>
<p>There are sophisticated tools for log aggregation and searching, but at the most basic level, you can think of logs as text files. Each line conveys detailed information about an event in your system. Usually, it won’t contain personally identifying information, though, in some cases, it may be mandated by law.</p>
<p>Here’s a breakdown of a typical log record:</p>
<div class="highlight text"><pre><span></span>[2019-06-14 15:18:34,517][DEBUG][root][MainThread] Customer(id=123) logged out
</pre></div>
<p>As you can see, it has a structured form. Apart from a descriptive message, there are a few customizable fields, which provide the context of an event. Here, you have the exact date and time, the log level, the logger name, and the thread name.</p>
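<p>A record shaped like the one above can be produced with a format string along these lines. This is only a sketch: the exact format used for the sample isn’t given, so <code>LOG_FORMAT</code> is an assumption, and the in-memory stream merely stands in for a real log file:</p>

```python
import io
import logging

# Assumed format string mirroring the fields in the sample record.
LOG_FORMAT = '[%(asctime)s][%(levelname)s][%(name)s][%(threadName)s] %(message)s'

# An in-memory stream stands in for a log file, just for this demonstration.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
root_logger.addHandler(handler)

root_logger.debug('Customer(id=%s) logged out', 123)
root_logger.removeHandler(handler)

record = stream.getvalue()
print(record, end='')
```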
<p>Log levels allow you to filter messages quickly to reduce noise. If you’re looking for an error, you don’t want to see all the warnings or debug messages, for example. It’s trivial to disable or enable messages at certain log levels through the configuration, without even touching the code.</p>
<p>With logging, you can keep your debug messages separate from the standard output. All the log messages go to the standard error stream by default, which can conveniently show up in different colors. However, you can redirect log messages to separate files, even for individual modules!</p>
<p>Quite commonly, misconfigured logging can lead to running out of space on the server’s disk. To prevent that, you may set up <strong>log rotation</strong>, which will keep the log files for a specified duration, such as one week, or start a new file once the current one hits a certain size. Nevertheless, it’s always a good practice to archive older logs. Some regulations enforce that customer data be kept for as long as five years!</p>
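<p>In Python, size-based rotation is available through <code>RotatingFileHandler</code> from <code>logging.handlers</code>. The following is a minimal sketch, in which the file name, logger name, and tiny size limit are made up so that the rollover is easy to observe in a temporary directory:</p>

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

with tempfile.TemporaryDirectory() as log_dir:
    log_path = os.path.join(log_dir, 'app.log')  # hypothetical file name

    # Roll over at roughly 200 bytes and keep at most 2 old files.
    handler = RotatingFileHandler(log_path, maxBytes=200, backupCount=2)
    logger = logging.getLogger('rotation_demo')  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)

    for number in range(50):
        logger.info('Order %d processed', number)

    logger.removeHandler(handler)
    handler.close()
    log_files = sorted(os.listdir(log_dir))

print(log_files)
```

<p>For rotation based on time rather than size, the same module provides <code>TimedRotatingFileHandler</code>.</p>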
<p>Compared to other programming languages, <a href="https://realpython.com/python-logging/">logging in Python</a> is simpler, because the <code>logging</code> module is bundled with the standard library. You just import and configure it in as little as two lines of code:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">logging</span>
<span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">DEBUG</span><span class="p">)</span>
</pre></div>
<p>You can call functions defined at the module level, which are hooked to the <strong>root logger</strong>, but the more common practice is to obtain a dedicated logger for each of your source files:</p>
<div class="highlight python"><pre><span></span><span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">)</span> <span class="c1"># Module-level function</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">)</span> <span class="c1"># Logger's method</span>
</pre></div>
<p>The advantage of using custom loggers is more fine-grained control. They’re usually named after the module they were defined in through the <code>__name__</code> variable.</p>
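<p>To give a feel for that control, here’s a sketch with made-up logger names (<code>shop</code>, <code>shop.payments</code>, and <code>shop.catalog</code>) standing in for real module names. The small <code>ListHandler</code> class is also hypothetical and only collects messages so that the result is easy to inspect:</p>

```python
import logging

records = []


class ListHandler(logging.Handler):
    """Collect messages in a list instead of printing them."""

    def emit(self, record):
        records.append(f'{record.name}:{record.levelname}:{record.getMessage()}')


# Hypothetical module names; in real code you'd pass __name__ instead.
shop = logging.getLogger('shop')
payments = logging.getLogger('shop.payments')
catalog = logging.getLogger('shop.catalog')

shop.addHandler(ListHandler())
shop.propagate = False            # keep the demo output away from the root logger
shop.setLevel(logging.WARNING)    # default level for the whole package
payments.setLevel(logging.DEBUG)  # more detail for one module only

payments.debug('charging card')   # kept: this logger has an explicit DEBUG level
catalog.debug('listing items')    # dropped: inherits WARNING from 'shop'
catalog.warning('item missing')   # kept: meets the inherited WARNING level

print(records)
```

<p>Dotted names form a hierarchy, so a child logger inherits its parent’s level unless you set one explicitly.</p>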
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> There’s a somewhat related <code>warnings</code> module in Python, which can also log messages to the standard error stream. However, it has a narrower spectrum of applications, mostly in library code, whereas client applications should use the <code>logging</code> module.</p>
<p>That said, you can make them work together by calling <code>logging.captureWarnings(True)</code>.</p>
</div>
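<p>As a quick sketch of how the two modules cooperate, captured warnings are routed to a logger named <code>py.warnings</code>. The warning text and the in-memory stream below are made up for the demonstration:</p>

```python
import io
import logging
import warnings

stream = io.StringIO()
handler = logging.StreamHandler(stream)

# Captured warnings are logged to a logger named 'py.warnings'.
warnings_logger = logging.getLogger('py.warnings')
warnings_logger.addHandler(handler)
warnings_logger.propagate = False

logging.captureWarnings(True)
try:
    with warnings.catch_warnings():
        warnings.simplefilter('always')  # make sure the warning isn't filtered out
        warnings.warn('old_api() is deprecated', DeprecationWarning)
finally:
    # Undo the redirection so later warnings behave as usual.
    logging.captureWarnings(False)
    warnings_logger.removeHandler(handler)

captured = stream.getvalue()
print(captured)
```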
<p>One last reason to switch from the <code>print()</code> function to logging is thread safety. In the upcoming section, you’ll see that the former doesn’t play well with multiple threads of execution.</p>
<h3 id="debugging">Debugging</h3>
<p>The truth is that neither tracing nor logging can be considered real debugging. To do actual debugging, you need a debugger tool, which allows you to do the following:</p>
<ul>
<li>Step through the code interactively.</li>
<li>Set breakpoints, including conditional breakpoints.</li>
<li>Introspect variables in memory.</li>
<li>Evaluate custom expressions at runtime.</li>
</ul>
<p>A crude debugger that runs in the terminal, unsurprisingly named <strong><code>pdb</code></strong> for “The Python Debugger,” is distributed as part of the standard library. This makes it always available, so it may be your only choice for performing remote debugging. Perhaps that’s a good reason to get familiar with it.</p>
<p>However, it doesn’t come with a graphical interface, so <a href="https://realpython.com/python-debugging-pdb/">using <code>pdb</code></a> may be a bit tricky. If you can’t edit the code, you have to run it as a module and pass your script’s location:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python -m pdb my_script.py
</pre></div>
<p>Otherwise, you can set up a breakpoint directly in the code, which will pause the execution of your script and drop you into the debugger. The old way of doing this required two steps:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">pdb</span>
<span class="gp">>>> </span><span class="n">pdb</span><span class="o">.</span><span class="n">set_trace</span><span class="p">()</span>
<span class="go">--Return--</span>
<span class="go">> <stdin>(1)<module>()->None</span>
<span class="go">(Pdb)</span>
</pre></div>
<p>This brings up an interactive prompt, which might look intimidating at first. However, you can still type native Python at this point to examine or modify the state of local variables. Apart from that, there’s really only a handful of debugger-specific commands that you want to use for stepping through the code.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> It’s customary to put the two instructions for spinning up a debugger on a single line. This requires the use of a semicolon, which is rarely found in Python programs:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">pdb</span><span class="p">;</span> <span class="n">pdb</span><span class="o">.</span><span class="n">set_trace</span><span class="p">()</span>
</pre></div>
<p>While certainly not Pythonic, it stands out as a reminder to remove it after you’re done with debugging.</p>
</div>
<p>Since Python 3.7, you can also call the built-in <code>breakpoint()</code> function, which does the same thing, but in a more compact way and with some additional <a href="https://realpython.com/python37-new-features/#the-breakpoint-built-in">bells and whistles</a>:</p>
<div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">average</span><span class="p">(</span><span class="n">numbers</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">breakpoint</span><span class="p">()</span> <span class="c1"># Python 3.7+</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
</pre></div>
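<p>One of those extras is that <code>breakpoint()</code> delegates to <code>sys.breakpointhook()</code>, which you can replace. The sketch below swaps in a made-up <code>recording_hook()</code> function so that the code runs without stopping at an interactive prompt:</p>

```python
import sys

hits = []


def recording_hook(*args, **kwargs):
    # Hypothetical stand-in for a debugger: just note that we got here.
    hits.append('breakpoint reached')


def average(numbers):
    if len(numbers) > 0:
        breakpoint()  # Python 3.7+
        return sum(numbers) / len(numbers)


sys.breakpointhook = recording_hook
try:
    result = average([1, 2, 3])
finally:
    # sys.__breakpointhook__ always holds the original hook.
    sys.breakpointhook = sys.__breakpointhook__

print(result, hits)
```

<p>With the default hook in place, you can also set the <code>PYTHONBREAKPOINT</code> environment variable to <code>0</code> to turn every <code>breakpoint()</code> call into a no-op without editing the code.</p>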
<p>You’re probably going to use a visual debugger integrated with a code editor for the most part. <a href="https://www.jetbrains.com/pycharm/">PyCharm</a> has an excellent debugger, which boasts high performance, but you’ll find <a href="https://realpython.com/python-ides-code-editors-guide/">plenty of alternative IDEs</a> with debuggers, both paid and free of charge.</p>
<p>Debugging isn’t the proverbial silver bullet. Sometimes logging or tracing will be a better solution. For example, defects that are hard to reproduce, such as <a href="https://en.wikipedia.org/wiki/Race_condition">race conditions</a>, often result from temporal coupling. When you stop at a breakpoint, that little pause in program execution may mask the problem. It’s kind of like the <a href="https://en.wikipedia.org/wiki/Uncertainty_principle">Heisenberg principle</a>: you can’t measure and observe a bug at the same time.</p>
<p>These methods aren’t mutually exclusive. They complement each other.</p>
<h2 id="thread-safe-printing">Thread-Safe Printing</h2>
<p>I briefly touched upon the thread safety issue before, recommending <code>logging</code> over the <code>print()</code> function. If you’re still reading this, then you must be comfortable with <a href="https://realpython.com/intro-to-python-threading/">the concept of threads</a>.</p>
<p>Thread safety means that a piece of code can be safely shared between multiple threads of execution. The simplest strategy for ensuring thread-safety is by sharing <strong>immutable</strong> objects only. If threads can’t modify an object’s state, then there’s no risk of breaking its consistency.</p>
<p>Another method takes advantage of <strong>local memory</strong>, which makes each thread receive its own copy of the same object. That way, other threads can’t see the changes made to it in the current thread.</p>
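<p>In Python, thread-local storage is available through <code>threading.local()</code>. Here’s a minimal sketch, where the <code>worker()</code> function and the thread names are invented for illustration:</p>

```python
import threading

local_data = threading.local()
seen = {}


def worker(value):
    # Each thread sets and reads its own independent 'value' attribute.
    local_data.value = value
    seen[threading.current_thread().name] = local_data.value


threads = [
    threading.Thread(target=worker, args=(number,), name=f'worker-{number}')
    for number in range(3)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# The main thread never assigned the attribute, so it doesn't exist here.
main_has_value = hasattr(local_data, 'value')
print(seen, main_has_value)
```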
<p>But that doesn’t solve the problem, does it? You often want your threads to cooperate by being able to mutate a shared resource. The most common way of synchronizing concurrent access to such a resource is by <strong>locking</strong> it. This gives exclusive write access to one or sometimes a few threads at a time.</p>
<p>However, locking is expensive and reduces concurrent throughput, so other means for controlling access have been invented, such as <strong>atomic variables</strong> or the <strong>compare-and-swap</strong> algorithm.</p>
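<p>To make the locking strategy more concrete, here’s a small sketch in which several threads increment a shared counter. The names and the numbers of threads and iterations are arbitrary:</p>

```python
import threading

counter = 0
lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run the read-modify-write step.
        with lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)
```

<p>Acquiring the lock through a <code>with</code> statement guarantees that it’s released even if the protected code raises an exception.</p>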
<p>Printing isn’t thread-safe in Python. The <code>print()</code> function holds a reference to the standard output, which is a shared global variable. In theory, because there’s no locking, a context switch could happen during a call to <code>sys.stdout.write()</code>, intertwining bits of text from multiple <code>print()</code> calls.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> A context switch means that one thread halts its execution, either voluntarily or not, so that another one can take over. This might happen at any moment, even in the middle of a function call.</p>
</div>
<p>In practice, however, that doesn’t happen. No matter how hard you try, writing to the standard output seems to be atomic. The only problem that you may sometimes observe is with messed up line breaks:</p>
<div class="highlight text"><pre><span></span>[Thread-3 A][Thread-2 A][Thread-1 A]
[Thread-3 B][Thread-1 B]
[Thread-1 C][Thread-3 C]
[Thread-2 B]
[Thread-2 C]
</pre></div>
<p>To simulate this, you can increase the likelihood of a context switch by making the underlying <code>.write()</code> method go to sleep for a random amount of time. How? By mocking it, which you already know about from an earlier section:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">from</span> <span class="nn">time</span> <span class="k">import</span> <span class="n">sleep</span>
<span class="kn">from</span> <span class="nn">random</span> <span class="k">import</span> <span class="n">random</span>
<span class="kn">from</span> <span class="nn">threading</span> <span class="k">import</span> <span class="n">current_thread</span><span class="p">,</span> <span class="n">Thread</span>
<span class="kn">from</span> <span class="nn">unittest.mock</span> <span class="k">import</span> <span class="n">patch</span>
<span class="n">write</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">write</span>
<span class="k">def</span> <span class="nf">slow_write</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
    <span class="n">sleep</span><span class="p">(</span><span class="n">random</span><span class="p">())</span>
    <span class="n">write</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">task</span><span class="p">():</span>
    <span class="n">thread_name</span> <span class="o">=</span> <span class="n">current_thread</span><span class="p">()</span><span class="o">.</span><span class="n">name</span>
    <span class="k">for</span> <span class="n">letter</span> <span class="ow">in</span> <span class="s1">'ABC'</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'[</span><span class="si">{thread_name}</span><span class="s1"> </span><span class="si">{letter}</span><span class="s1">]'</span><span class="p">)</span>
<span class="k">with</span> <span class="n">patch</span><span class="p">(</span><span class="s1">'sys.stdout'</span><span class="p">)</span> <span class="k">as</span> <span class="n">mock_stdout</span><span class="p">:</span>
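    <span class="n">mock_stdout</span><span class="o">.</span><span class="n">write</span> <span class="o">=</span> <span class="n">slow_write</span>
    <span class="n">threads</span> <span class="o">=</span> <span class="p">[</span><span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">task</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)]</span>
    <span class="k">for</span> <span class="n">thread</span> <span class="ow">in</span> <span class="n">threads</span><span class="p">:</span>
        <span class="n">thread</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
    <span class="k">for</span> <span class="n">thread</span> <span class="ow">in</span> <span class="n">threads</span><span class="p">:</span>
        <span class="n">thread</span><span class="o">.</span><span class="n">join</span><span class="p">()</span>  <span class="c1"># Keep the patch active until all threads finish</span>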
</pre></div>
<p>First, you need to store the original <code>.write()</code> method in a variable, which you’ll delegate to later. Then you provide your fake implementation, which will take up to one second to execute. Each thread will make a few <code>print()</code> calls with its name and a letter: A, B, and C.</p>
<p>If you read the mocking section before, then you may already have an idea of why printing misbehaves like that. Nonetheless, to make it crystal clear, you can capture values fed into your <code>slow_write()</code> function. You’ll notice that you get a slightly different sequence each time:</p>
<div class="highlight python"><pre><span></span><span class="p">[</span>
<span class="s1">'[Thread-3 A]'</span><span class="p">,</span>
<span class="s1">'[Thread-2 A]'</span><span class="p">,</span>
<span class="s1">'[Thread-1 A]'</span><span class="p">,</span>
<span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span>
<span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span>
<span class="s1">'[Thread-3 B]'</span><span class="p">,</span>
<span class="p">(</span><span class="o">...</span><span class="p">)</span>
<span class="p">]</span>
</pre></div>
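<p>One possible way to capture those values is sketched below. It assumes a variation of the earlier script in which the fake <code>.write()</code> appends to a list instead of delegating to the real standard output. The <code>recording_write()</code> name is made up, and the delay is shortened so that the example finishes quickly:</p>

```python
from random import random
from threading import Thread, current_thread
from time import sleep
from unittest.mock import patch

captured = []


def recording_write(text):
    # Sleep briefly to encourage context switches, then record the text.
    sleep(random() / 100)
    captured.append(text)


def task():
    thread_name = current_thread().name
    for letter in 'ABC':
        print(f'[{thread_name} {letter}]')


with patch('sys.stdout') as mock_stdout:
    mock_stdout.write = recording_write
    threads = [Thread(target=task) for _ in range(3)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # Wait here so the patch stays active for every write

print(captured)
```

<p>The order of the recorded items will likely differ from one run to the next.</p>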
<p>Even though <code>sys.stdout.write()</code> itself is an atomic operation, a single call to the <code>print()</code> function can yield more than one write. For example, line breaks are written separately from the rest of the text, and context switching takes place between those writes.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> The atomic nature of the standard output in Python is a byproduct of the <a href="https://realpython.com/python-gil/">Global Interpreter Lock</a>, which applies locking around bytecode instructions. Be aware, however, that many interpreter flavors don’t have the GIL, and in those, multi-threaded printing requires explicit locking.</p>
</div>
<p>You can make the newline character become an integral part of the message by handling it manually:</p>
<div class="highlight python"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'[</span><span class="si">{thread_name}</span><span class="s1"> </span><span class="si">{letter}</span><span class="s1">]</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span>
</pre></div>
<p>This will fix the output:</p>
<div class="highlight text"><pre><span></span>[Thread-2 A]
[Thread-1 A]
[Thread-3 A]
[Thread-1 B]
[Thread-3 B]
[Thread-2 B]
[Thread-1 C]
[Thread-2 C]
[Thread-3 C]
</pre></div>
<p>Notice, however, that the <code>print()</code> function still makes a separate call for the empty suffix, which translates to a useless <code>sys.stdout.write('')</code> call:</p>
<div class="highlight python"><pre><span></span><span class="p">[</span>
<span class="s1">'[Thread-2 A]</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span>
<span class="s1">'[Thread-1 A]</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span>
<span class="s1">'[Thread-3 A]</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span>
<span class="s1">''</span><span class="p">,</span>
<span class="s1">''</span><span class="p">,</span>
<span class="s1">''</span><span class="p">,</span>
<span class="s1">'[Thread-1 B]</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span>
<span class="p">(</span><span class="o">...</span><span class="p">)</span>
<span class="p">]</span>
</pre></div>
<p>A truly thread-safe version of the <code>print()</code> function could look like this:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">threading</span>

<span class="n">lock</span> <span class="o">=</span> <span class="n">threading</span><span class="o">.</span><span class="n">Lock</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">thread_safe_print</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="k">with</span> <span class="n">lock</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
</pre></div>
<p>You can put that function in a module and import it elsewhere:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">threading</span> <span class="k">import</span> <span class="n">current_thread</span>
<span class="kn">from</span> <span class="nn">thread_safe_print</span> <span class="k">import</span> <span class="n">thread_safe_print</span>

<span class="k">def</span> <span class="nf">task</span><span class="p">():</span>
    <span class="n">thread_name</span> <span class="o">=</span> <span class="n">current_thread</span><span class="p">()</span><span class="o">.</span><span class="n">name</span>
    <span class="k">for</span> <span class="n">letter</span> <span class="ow">in</span> <span class="s1">'ABC'</span><span class="p">:</span>
        <span class="n">thread_safe_print</span><span class="p">(</span><span class="n">f</span><span class="s1">'[</span><span class="si">{thread_name}</span><span class="s1"> </span><span class="si">{letter}</span><span class="s1">]'</span><span class="p">)</span>
</pre></div>
<p>Now, despite making two writes per <code>print()</code> call, only one thread at a time is allowed to interact with the stream, while the rest must wait:</p>
<div class="highlight python"><pre><span></span><span class="p">[</span>
<span class="c1"># Lock acquired by Thread-3 </span>
<span class="s1">'[Thread-3 A]'</span><span class="p">,</span>
<span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span>
<span class="c1"># Lock released by Thread-3</span>
<span class="c1"># Lock acquired by Thread-1</span>
<span class="s1">'[Thread-1 B]'</span><span class="p">,</span>
<span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span>
<span class="c1"># Lock released by Thread-1</span>
<span class="p">(</span><span class="o">...</span><span class="p">)</span>
<span class="p">]</span>
</pre></div>
<p>I added comments to indicate how the lock is limiting access to the shared resource.</p>
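<p>The effect of the lock can be checked with a short, self-contained sketch. It repeats the <code>thread_safe_print()</code> definition from above and, as an assumption made only for this illustration, sends the output to a hypothetical <code>RecordingStream</code> object instead of the terminal, so that the order of the writes can be inspected afterward:</p>

```python
import threading

lock = threading.Lock()


class RecordingStream:
    """Hypothetical stream that remembers every write() call in order."""

    def __init__(self):
        self.writes = []

    def write(self, text):
        self.writes.append(text)


def thread_safe_print(*args, **kwargs):
    with lock:
        print(*args, **kwargs)


def task(stream):
    thread_name = threading.current_thread().name
    for letter in 'ABC':
        thread_safe_print(f'[{thread_name} {letter}]', file=stream)


stream = RecordingStream()
threads = [threading.Thread(target=task, args=(stream,)) for _ in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# With the lock in place, the writes alternate strictly: message, newline, ...
messages = stream.writes[0::2]
newlines = stream.writes[1::2]
print(all(text == '\n' for text in newlines))
```

<p>The order in which the threads take turns can still differ between runs, but every message is immediately followed by its own line break.</p>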
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Even in single-threaded code, you might get caught up in a similar situation, specifically when you’re printing to the standard output and the standard error streams at the same time. Unless you redirect one or both of them to separate files, they’ll both share a single terminal window.</p>
</div>
<p>Conversely, the <code>logging</code> module is thread-safe by design, which is reflected by its ability to display thread names in the formatted message:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">logging</span>
<span class="gp">>>> </span><span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="nb">format</span><span class="o">=</span><span class="s1">'</span><span class="si">%(threadName)s</span><span class="s1"> </span><span class="si">%(message)s</span><span class="s1">'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">logging</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">)</span>
<span class="go">MainThread hello</span>
</pre></div>
<p>It’s another reason why you might not want to use the <code>print()</code> function all the time.</p>
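<p>As a rough sketch of the same idea with several threads, the following example logs from three workers at once. The logger name <code>thread_demo</code>, the <code>worker-N</code> thread names, and the in-memory buffer are assumptions chosen for this illustration only:</p>

```python
import io
import logging
import threading

buffer = io.StringIO()  # Collect the log output in memory for inspection
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter('%(threadName)s %(message)s'))

logger = logging.getLogger('thread_demo')  # Hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # Keep the demo records away from the root logger


def task():
    for letter in 'ABC':
        logger.error(letter)


threads = [threading.Thread(target=task, name=f'worker-{i}') for i in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

lines = buffer.getvalue().splitlines()
print(len(lines))  # One intact line per logged message
```

<p>Each handler guards its output with its own lock, so the lines may arrive in a different order on each run, but none of them gets split apart.</p>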
<h2 id="python-print-counterparts">Python Print Counterparts</h2>
<p>By now, you know a lot of what there is to know about <code>print()</code>! The subject, however, wouldn’t be complete without talking about its counterparts a little bit. While <code>print()</code> is about the output, there are functions and libraries for the input.</p>
<h3 id="built-in">Built-In</h3>
<p>Python comes with a built-in function for accepting input from the user, predictably called <code>input()</code>. It accepts data from the standard input stream, which is usually the keyboard:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">name</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s1">'Enter your name: '</span><span class="p">)</span>
<span class="go">Enter your name: jdoe</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
<span class="go">jdoe</span>
</pre></div>
<p>The function always returns a string, so you might need to parse it accordingly:</p>
<div class="highlight python"><pre><span></span><span class="k">try</span><span class="p">:</span>
    <span class="n">age</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">input</span><span class="p">(</span><span class="s1">'How old are you? '</span><span class="p">))</span>
<span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span>
    <span class="k">pass</span>
</pre></div>
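<p>Silently ignoring a bad answer is rarely what you want in practice. One possible sketch, built around a hypothetical helper called <code>ask_for_int()</code>, asks again a limited number of times. Its <code>read</code> parameter is an assumption added here so that the function can be exercised without a keyboard; by default, it’s the built-in <code>input()</code>:</p>

```python
def ask_for_int(prompt, read=input, attempts=3):
    """Hypothetical helper: keep asking until the answer parses as an integer."""
    for _ in range(attempts):
        try:
            return int(read(prompt))
        except ValueError:
            print('Please enter a whole number.')
    return None  # Give up after too many invalid answers


# Simulate a user who first mistypes and then answers correctly
answers = iter(['forty', '40'])
age = ask_for_int('How old are you? ', read=lambda prompt: next(answers))
print(age)
```

<p>In a real program, you’d call <code>ask_for_int('How old are you? ')</code> and let it read from the standard input.</p>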
<p>The prompt parameter is completely optional, so nothing will show if you skip it, but the function will still work:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">x</span> <span class="o">=</span> <span class="nb">input</span><span class="p">()</span>
<span class="go">hello world</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="go">hello world</span>
</pre></div>
<p>Nevertheless, throwing in a descriptive call to action makes the user experience so much better.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> To read from the standard input in Python 2, you have to call <code>raw_input()</code> instead, which is yet another built-in. Unfortunately, there’s also a misleadingly named <code>input()</code> function, which does a slightly different thing.</p>
<p>In fact, it also takes the input from the standard stream, but then it tries to evaluate it as if it were Python code. Because that’s a potential <strong>security vulnerability</strong>, this function was completely removed from Python 3, while <code>raw_input()</code> got renamed to <code>input()</code>.</p>
<p>Here’s a quick comparison of the available functions and what they do:</p>
<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>Python 2</th>
<th>Python 3</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>raw_input()</code></td>
<td><code>input()</code></td>
</tr>
<tr>
<td><code>input()</code></td>
<td><code>eval(input())</code></td>
</tr>
</tbody>
</table>
</div>
<p>As you can tell, it’s still possible to simulate the old behavior in Python 3.</p>
</div>
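<p>If all you need from the old behavior is turning typed text into plain Python values, then a more cautious option is the standard library’s <code>ast.literal_eval()</code>, which accepts only literals such as numbers, strings, lists, and dictionaries. The sketch below uses hard-coded sample strings in place of real user input:</p>

```python
import ast

samples = ['[1, 2, 3]', '42', "__import__('os').getcwd()"]

values = []
for text in samples:
    try:
        # Only literals are accepted, so function calls are never executed
        values.append(ast.literal_eval(text))
    except (ValueError, SyntaxError):
        values.append('rejected')

print(values)
```

<p>The first two strings become a list and an integer, while the third one, which <code>eval()</code> would happily execute, is rejected.</p>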
<p>Asking the user for a password with <code>input()</code> is a bad idea because it’ll show up in plaintext as they’re typing it. In this case, you should be using the <code>getpass()</code> function instead, which doesn’t echo the typed characters. This function is defined in a module of the same name, which is also available in the standard library:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">getpass</span> <span class="k">import</span> <span class="n">getpass</span>
<span class="gp">>>> </span><span class="n">password</span> <span class="o">=</span> <span class="n">getpass</span><span class="p">()</span>
<span class="go">Password: </span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">password</span><span class="p">)</span>
<span class="go">s3cret</span>
</pre></div>
<p>The <code>getpass</code> module has another function for getting the user’s name from an environment variable:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">getpass</span> <span class="k">import</span> <span class="n">getuser</span>
<span class="gp">>>> </span><span class="n">getuser</span><span class="p">()</span>
<span class="go">'jdoe'</span>
</pre></div>
<p>Python’s built-in functions for handling the standard input are quite limited. At the same time, there are plenty of third-party packages, which offer much more sophisticated tools.</p>
<h3 id="third-party">Third-Party</h3>
<p>There are external Python packages out there that allow for building complex graphical interfaces specifically to collect data from the user. Some of their features include:</p>
<ul>
<li>Advanced formatting and styling</li>
<li>Automated parsing, validation, and sanitization of user data</li>
<li>A declarative style of defining layouts</li>
<li>Interactive autocompletion</li>
<li>Mouse support</li>
<li>Predefined widgets such as checklists or menus</li>
<li>Searchable history of typed commands</li>
<li>Syntax highlighting</li>
</ul>
<p>Demonstrating such tools is outside of the scope of this article, but you may want to try them out. I personally got to know about some of those through the <a href="https://pythonbytes.fm/">Python Bytes Podcast</a>. Here they are:</p>
<ul>
<li><a href="https://github.com/Mckinsey666/bullet"><code>bullet</code></a></li>
<li><a href="https://pypi.org/project/cooked-input/"><code>cooked-input</code></a></li>
<li><a href="https://pypi.org/project/prompt_toolkit/"><code>prompt_toolkit</code></a></li>
<li><a href="https://github.com/kylebebak/questionnaire"><code>questionnaire</code></a></li>
</ul>
<p>Nonetheless, it’s worth mentioning a command-line tool called <code>rlwrap</code> that adds powerful line editing capabilities to your Python scripts for free. You don’t have to change anything in your code for it to work!</p>
<p>Let’s assume you wrote a command-line interface that understands three instructions, including one for adding numbers:</p>
<div class="highlight python"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s1">'Type "help", "exit", "add a [b [c ...]]"'</span><span class="p">)</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="n">command</span><span class="p">,</span> <span class="o">*</span><span class="n">arguments</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s1">'~ '</span><span class="p">)</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">' '</span><span class="p">)</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">command</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">command</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s1">'exit'</span><span class="p">:</span>
            <span class="k">break</span>
        <span class="k">elif</span> <span class="n">command</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s1">'help'</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">'This is help.'</span><span class="p">)</span>
        <span class="k">elif</span> <span class="n">command</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s1">'add'</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">int</span><span class="p">,</span> <span class="n">arguments</span><span class="p">)))</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">'Unknown command'</span><span class="p">)</span>
</pre></div>
<p>At first glance, it seems like a typical prompt when you run it:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python calculator.py
<span class="go">Type "help", "exit", "add a [b [c ...]]"</span>
<span class="go">~ add 1 2 3 4</span>
<span class="go">10</span>
<span class="go">~ aad 2 3</span>
<span class="go">Unknown command</span>
<span class="go">~ exit</span>
<span class="gp">$</span>
</pre></div>
<p>But as soon as you make a mistake and want to fix it, you’ll see that none of the function keys work as expected. Hitting the <span class="keys"><kbd class="key-arrow-left">Left</kbd></span> arrow, for example, results in this instead of moving the cursor back:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python calculator.py
<span class="go">Type "help", "exit", "add a [b [c ...]]"</span>
<span class="go">~ aad^[[D</span>
</pre></div>
<p>Now, you can wrap the same script with the <code>rlwrap</code> command. Not only will you get the arrow keys working, but you’ll also be able to search through the persistent history of your custom commands, use autocompletion, and edit the line with shortcuts:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> rlwrap python calculator.py
<span class="go">Type "help", "exit", "add a [b [c ...]]"</span>
<span class="go">(reverse-i-search)`a': add 1 2 3 4</span>
</pre></div>
<p>Isn’t that great?</p>
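<p>If installing an external tool isn’t an option, then the standard library’s <code>readline</code> module may get you part of the way there. On platforms where it’s available, merely importing it lets <code>input()</code> use line editing and history. The sketch below guards the import, since the module is missing on some platforms, with Windows being the usual example. The <code>parse_command()</code> helper and the <code>HAS_LINE_EDITING</code> flag are hypothetical names that only mirror the splitting done in <code>calculator.py</code>:</p>

```python
try:
    import readline  # Importing it is enough for input() to gain line editing
    HAS_LINE_EDITING = True
except ImportError:
    # The module isn't available on every platform
    HAS_LINE_EDITING = False


def parse_command(line):
    """Hypothetical helper: split a line the same way calculator.py does."""
    command, *arguments = line.split(' ')
    return command.lower(), arguments


print(parse_command('add 1 2 3 4'))
```

<p>This doesn’t replace everything <code>rlwrap</code> offers, but it keeps the script self-contained.</p>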
<h2 id="conclusion">Conclusion</h2>
<p>You’re now armed with a body of knowledge about the <code>print()</code> function in Python, as well as many surrounding topics. You have a deep understanding of what it is and how it works, involving all of its key elements. Numerous examples gave you insight into its evolution from Python 2.</p>
<p>Apart from that, you learned how to:</p>
<ul>
<li>Avoid common mistakes with <code>print()</code> in Python</li>
<li>Deal with newlines, character encodings and buffering</li>
<li>Write text to files</li>
<li>Mock the <code>print()</code> function in unit tests</li>
<li>Build advanced user interfaces in the terminal</li>
</ul>
<p>Now that you know all this, you can make interactive programs that communicate with users or produce data in popular file formats. You’re able to quickly diagnose problems in your code and protect yourself from them. Last but not least, you know how to implement the classic snake game.</p>
<p>If you’re still thirsty for more information, have questions, or simply would like to share your thoughts, then feel free to reach out in the comments section below.</p>
Inheritance and Composition: A Python OOP Guide
https://realpython.com/inheritance-composition-python/
2019-08-07T14:00:00+00:00
In this step-by-step tutorial, you'll learn about inheritance and composition in Python. You'll improve your object-oriented programming (OOP) skills by understanding how to use inheritance and composition and how to leverage them in your designs.
<p>In this article, you’ll explore <strong>inheritance</strong> and <strong>composition</strong> in Python. <a href="https://en.wikipedia.org/wiki/Inheritance_(object-oriented_programming)">Inheritance</a> and <a href="https://en.wikipedia.org/wiki/Object_composition">composition</a> are two important concepts in object-oriented programming that model the relationship between two classes. They are the building blocks of <a href="https://realpython.com/python3-object-oriented-programming/">object-oriented design</a>, and they help programmers write reusable code.</p>
<p><strong>By the end of this article, you’ll know how to</strong>:</p>
<ul>
<li>Use inheritance in Python</li>
<li>Model class hierarchies using inheritance</li>
<li>Use multiple inheritance in Python and understand its drawbacks</li>
<li>Use composition to create complex objects</li>
<li>Reuse existing code by applying composition</li>
<li>Change application behavior at run-time through composition</li>
</ul>
<h2 id="what-are-inheritance-and-composition">What Are Inheritance and Composition?</h2>
<p><strong>Inheritance</strong> and <strong>composition</strong> are two major concepts in object-oriented programming that model the relationship between two classes. They drive the design of an application and determine how the application should evolve as new features are added or requirements change.</p>
<p>Both of them enable code reuse, but they do it in different ways.</p>
<h3 id="whats-inheritance">What’s Inheritance?</h3>
<p><strong>Inheritance</strong> models what is called an <strong>is a</strong> relationship. This means that when you have a <code>Derived</code> class that inherits from a <code>Base</code> class, you create a relationship where <code>Derived</code> <strong>is a</strong> specialized version of <code>Base</code>.</p>
<p>Inheritance is represented using the <a href="https://www.uml.org/">Unified Modeling Language</a> or UML in the following way:</p>
<p><a href="https://files.realpython.com/media/ic-basic-inheritance.f8dc9ffee4d7.jpg" target="_blank"><img class="img-fluid mx-auto d-block w-33" src="https://files.realpython.com/media/ic-basic-inheritance.f8dc9ffee4d7.jpg" width="242" height="343" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-basic-inheritance.f8dc9ffee4d7.jpg&w=60&sig=5b8ced824343b1ab983944639b080f1fb53c0ba4 60w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-basic-inheritance.f8dc9ffee4d7.jpg&w=121&sig=e32f6166c127193b0856fcb61f9d2677ff7aa293 121w, https://files.realpython.com/media/ic-basic-inheritance.f8dc9ffee4d7.jpg 242w" sizes="75vw" alt="Basic inheritance between Base and Derived classes"/></a></p>
<p>Classes are represented as boxes with the class name on top. The inheritance relationship is represented by an arrow from the derived class pointing to the base class. The word <strong>extends</strong> is usually added to the arrow.</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> In an inheritance relationship:</p>
<ul>
<li>Classes that inherit from another class are called derived classes, subclasses, or subtypes.</li>
<li>Classes from which other classes are derived are called base classes or superclasses.</li>
<li>A derived class is said to derive, inherit, or extend a base class.</li>
</ul>
</div>
<p>Let’s say you have a base class <code>Animal</code> and you derive from it to create a <code>Horse</code> class. The inheritance relationship states that a <code>Horse</code> <strong>is an</strong> <code>Animal</code>. This means that <code>Horse</code> inherits the interface and implementation of <code>Animal</code>, and <code>Horse</code> objects can be used to replace <code>Animal</code> objects in the application.</p>
<p>This is known as the <a href="https://en.wikipedia.org/wiki/Liskov_substitution_principle">Liskov substitution principle</a>. The principle states that “in a computer program, if <code>S</code> is a subtype of <code>T</code>, then objects of type <code>T</code> may be replaced with objects of type <code>S</code> without altering any of the desired properties of the program”.</p>
<p>You’ll see in this article why you should always follow the Liskov substitution principle when creating your class hierarchies, and the problems you’ll run into if you don’t.</p>
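<p>As a minimal sketch of that substitution, assume an <code>Animal</code> base class with a hypothetical <code>describe()</code> method and a <code>Horse</code> class that adds a hypothetical <code>gallop()</code> method. Code written against <code>Animal</code> keeps working when it receives a <code>Horse</code>:</p>

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f'{self.name} is an animal'


class Horse(Animal):
    # Inherits __init__() and describe() and adds behavior of its own
    def gallop(self):
        return f'{self.name} gallops'


def introduce(animal):
    # Written against Animal, yet any derived class works here too
    return animal.describe()


print(introduce(Animal('Generic')))
print(introduce(Horse('Spirit')))
```

<p>The <code>introduce()</code> function never needs to know which concrete class it was given, as long as the object honors the interface of <code>Animal</code>.</p>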
<h3 id="whats-composition">What’s Composition?</h3>
<p><strong>Composition</strong> is a concept that models a <strong>has a</strong> relationship. It enables creating complex types by combining objects of other types. This means that a class <code>Composite</code> can contain an object of another class <code>Component</code>. This relationship means that a <code>Composite</code> <strong>has a</strong> <code>Component</code>.</p>
<p>UML represents composition as follows:</p>
<p><a href="https://files.realpython.com/media/ic-basic-composition.8a15876f7db2.jpg" target="_blank"><img class="img-fluid mx-auto d-block w-33" src="https://files.realpython.com/media/ic-basic-composition.8a15876f7db2.jpg" width="249" height="348" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-basic-composition.8a15876f7db2.jpg&w=62&sig=fe9cee54da335ac7d82f147d947f595d9063864f 62w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-basic-composition.8a15876f7db2.jpg&w=124&sig=28fd4da7064494aa88f126b3b073aaab8a500b61 124w, https://files.realpython.com/media/ic-basic-composition.8a15876f7db2.jpg 249w" sizes="75vw" alt="Basic composition between Composite and Component classes"/></a></p>
<p>Composition is represented through a line with a diamond at the composite class pointing to the component class. The composite side can express the cardinality of the relationship. The cardinality indicates the number or valid range of <code>Component</code> instances the <code>Composite</code> class will contain.</p>
<p>In the diagram above, the <code>1</code> represents that the <code>Composite</code> class contains one object of type <code>Component</code>. Cardinality can be expressed in the following ways:</p>
<ul>
<li><strong>A number</strong> indicates the number of <code>Component</code> instances that are contained in the <code>Composite</code>.</li>
<li><strong>The * symbol</strong> indicates that the <code>Composite</code> class can contain a variable number of <code>Component</code> instances.</li>
<li><strong>A range 1..4</strong> indicates that the <code>Composite</code> class can contain a range of <code>Component</code> instances. The range is indicated with the minimum and maximum number of instances, or minimum and many instances like in <strong>1..*</strong>.</li>
</ul>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> Classes that contain objects of other classes are usually referred to as composites, whereas classes that are used to create more complex types are referred to as components.</p>
</div>
<p>For example, your <code>Horse</code> class can be composed of an object of type <code>Tail</code>. Composition allows you to express that relationship by saying a <code>Horse</code> <strong>has a</strong> <code>Tail</code>.</p>
<p>Composition enables you to reuse code by adding objects to other objects, as opposed to inheriting the interface and implementation of other classes. Both <code>Horse</code> and <code>Dog</code> classes can leverage the functionality of <code>Tail</code> through composition without deriving one class from the other.</p>
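<p>Here’s one way this relationship might look in code. The <code>wag()</code> and <code>greet()</code> methods are hypothetical and serve only to show that both composites delegate to their own <code>Tail</code> component:</p>

```python
class Tail:
    def wag(self):
        return 'wagging its tail'


class Horse:
    def __init__(self):
        self.tail = Tail()  # A Horse has a Tail

    def greet(self):
        return f'The horse is {self.tail.wag()}'


class Dog:
    def __init__(self):
        self.tail = Tail()  # A Dog has a Tail too

    def greet(self):
        return f'The dog is {self.tail.wag()}'


print(Horse().greet())
print(Dog().greet())
```

<p>Neither <code>Horse</code> nor <code>Dog</code> derives from the other or from <code>Tail</code>, yet both reuse the same implementation of <code>wag()</code>.</p>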
<h2 id="an-overview-of-inheritance-in-python">An Overview of Inheritance in Python</h2>
<p>Everything in Python is an object. Modules are objects, class definitions and functions are objects, and of course, objects created from classes are objects too.</p>
<p>Inheritance is a required feature of every object-oriented programming language. This means that Python supports inheritance, and as you’ll see later, it’s one of the few languages that support multiple inheritance.</p>
<p>When you write Python code using classes, you are using inheritance even if you don’t know you’re using it. Let’s take a look at what that means.</p>
<h3 id="the-object-super-class">The Object Super Class</h3>
<p>The easiest way to see inheritance in Python is to jump into the <a href="https://realpython.com/interacting-with-python/#using-the-python-interpreter-interactively">Python interactive shell</a> and write a little bit of code. You’ll start by writing the simplest class possible:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">class</span> <span class="nc">MyClass</span><span class="p">:</span>
<span class="gp">... </span>    <span class="k">pass</span>
<span class="gp">...</span>
</pre></div>
<p>You declared a class <code>MyClass</code> that doesn’t do much, but it will illustrate the most basic inheritance concepts. Now that you have the class declared, you can use the <code>dir()</code> function to list its members:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">c</span> <span class="o">=</span> <span class="n">MyClass</span><span class="p">()</span>
<span class="gp">>>> </span><span class="nb">dir</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
<span class="go">['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',</span>
<span class="go">'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',</span>
<span class="go">'__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__',</span>
<span class="go">'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',</span>
<span class="go">'__str__', '__subclasshook__', '__weakref__']</span>
</pre></div>
<p><a href="https://docs.python.org/3/library/functions.html#dir"><code>dir()</code></a> returns a list of all the members in the specified object. You have not declared any members in <code>MyClass</code>, so where is the list coming from? You can find out using the interactive interpreter:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="n">o</span> <span class="o">=</span> <span class="nb">object</span><span class="p">()</span>
<span class="gp">>>> </span><span class="nb">dir</span><span class="p">(</span><span class="n">o</span><span class="p">)</span>
<span class="go">['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__',</span>
<span class="go">'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',</span>
<span class="go">'__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__',</span>
<span class="go">'__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',</span>
<span class="go">'__subclasshook__']</span>
</pre></div>
<p>As you can see, the two lists are nearly identical. There are some additional members in <code>MyClass</code> like <code>__dict__</code> and <code>__weakref__</code>, but every single member of the <code>object</code> class is also present in <code>MyClass</code>.</p>
<p>This is because every class you create in Python implicitly derives from <code>object</code>. You could be more explicit and write <code>class MyClass(object):</code>, but it’s redundant and unnecessary.</p>
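<p>You can confirm the implicit base class without comparing the two lists by eye. This short sketch relies only on the built-in <code>issubclass()</code> function, the <code>__bases__</code> attribute, and set operations on the results of <code>dir()</code>:</p>

```python
class MyClass:
    pass


print(MyClass.__bases__)            # The only direct base class is object
print(issubclass(MyClass, object))  # True

# Every member of a plain object is also present in a MyClass instance
inherited = set(dir(object())) <= set(dir(MyClass()))
print(inherited)

# Members that MyClass adds on top of object
extra = sorted(set(dir(MyClass())) - set(dir(object())))
print(extra)
```

<p>The exact contents of the last list depend on your Python version, but it includes <code>__dict__</code> and <code>__weakref__</code>.</p>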
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> In Python 2, you have to explicitly derive from <code>object</code> for reasons beyond the scope of this article, but you can read about it in the <a href="https://docs.python.org/2/reference/datamodel.html#new-style-and-classic-classes">New-style and classic classes</a> section of the Python 2 documentation.</p>
</div>
<h3 id="exceptions-are-an-exception">Exceptions Are an Exception</h3>
<p>Every class that you create in Python will implicitly derive from <code>object</code>. Classes used to indicate errors by raising an <a href="https://realpython.com/courses/python-exceptions-101/">exception</a> are a special case, because deriving from <code>object</code> alone isn’t enough for them.</p>
<p>You can see the problem using the Python interactive interpreter:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">class</span> <span class="nc">MyError</span><span class="p">:</span>
<span class="gp">... </span>    <span class="k">pass</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="k">raise</span> <span class="n">MyError</span><span class="p">()</span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
<span class="gr">TypeError</span>: <span class="n">exceptions must derive from BaseException</span>
</pre></div>
<p>You created a new class to indicate a type of error. Then you tried to use it to raise an exception. An exception is raised, but the output states that the exception is of type <code>TypeError</code>, not <code>MyError</code>, and that all <code>exceptions must derive from BaseException</code>.</p>
<p><code>BaseException</code> is a base class provided for all error types. To create a new error type, you must derive your class from <code>BaseException</code> or one of its derived classes. The convention in Python is to derive your custom error types from <code>Exception</code>, which in turn derives from <code>BaseException</code>.</p>
<p>The correct way to define your error type is the following:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="k">class</span> <span class="nc">MyError</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
<span class="gp">... </span>    <span class="k">pass</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="k">raise</span> <span class="n">MyError</span><span class="p">()</span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
<span class="err">__main__.MyError</span>
</pre></div>
<p>As you can see, when you raise <code>MyError</code>, the output correctly states the type of error raised.</p>
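<p>As a minimal sketch of why deriving from <code>Exception</code> matters, the snippet below inspects the inheritance chain of <code>MyError</code> and shows that a generic <code>except Exception</code> handler catches it. The <code>risky()</code> helper is hypothetical and exists only for illustration:</p>

```python
class MyError(Exception):
    """Custom error type derived from Exception."""


def risky():
    # Hypothetical helper that always fails, for illustration only.
    raise MyError("something went wrong")


# Collect the class names along the method resolution order:
# MyError -> Exception -> BaseException -> object
mro_names = [cls.__name__ for cls in MyError.__mro__]
print(mro_names)

try:
    risky()
except Exception as error:
    # A handler for Exception also catches classes derived from it.
    caught = error
    print(type(error).__name__, "-", error)
```

<p>Because <code>MyError</code> sits below <code>Exception</code> in the hierarchy, code that already handles <code>Exception</code> keeps working when your custom error type is raised.</p>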
<h3 id="creating-class-hierarchies">Creating Class Hierarchies</h3>
<p>Inheritance is the mechanism you’ll use to create hierarchies of related classes. These related classes will share a common interface that will be defined in the base classes. Derived classes can specialize the interface by providing a particular implementation where it applies.</p>
<p>In this section, you’ll start modeling an HR system. The example will demonstrate the use of inheritance and how derived classes can provide a concrete implementation of the base class interface.</p>
<p>The HR system needs to process payroll for the company’s employees, but there are different types of employees depending on how their payroll is calculated.</p>
<p>You start by implementing a <code>PayrollSystem</code> class that processes payroll:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">PayrollSystem</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employees</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Calculating Payroll'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'==================='</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">employees</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Payroll for: </span><span class="si">{employee.id}</span><span class="s1"> - </span><span class="si">{employee.name}</span><span class="s1">'</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- Check amount: {employee.calculate_payroll()}'</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
</pre></div>
<p>The <code>PayrollSystem</code> implements a <code>.calculate_payroll()</code> method that takes a collection of employees and prints their <code>id</code>, <code>name</code>, and check amount using the <code>.calculate_payroll()</code> method exposed on each employee object.</p>
<p>Now, you implement a base class <code>Employee</code> that handles the common interface for every employee type:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">Employee</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
</pre></div>
<p><code>Employee</code> is the base class for all employee types. It is constructed with an <code>id</code> and a <code>name</code>. What you are saying is that every <code>Employee</code> must have an <code>id</code> assigned as well as a name.</p>
<p>The HR system requires that every <code>Employee</code> processed must provide a <code>.calculate_payroll()</code> interface that returns the weekly salary for the employee. The implementation of that interface differs depending on the type of <code>Employee</code>.</p>
<p>For example, administrative workers have a fixed salary, so every week they get paid the same amount:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">SalaryEmployee</span><span class="p">(</span><span class="n">Employee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">weekly_salary</span> <span class="o">=</span> <span class="n">weekly_salary</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">weekly_salary</span>
</pre></div>
<p>You create a derived class <code>SalaryEmployee</code> that inherits from <code>Employee</code>. The class is initialized with the <code>id</code> and <code>name</code> required by the base class, and you use <code>super()</code> to initialize the members of the base class. You can read all about <code>super()</code> in <a href="https://realpython.com/python-super/">Supercharge Your Classes With Python super()</a>.</p>
<p><code>SalaryEmployee</code> also requires a <code>weekly_salary</code> initialization parameter that represents the amount the employee makes per week.</p>
<p>The class provides the required <code>.calculate_payroll()</code> method used by the HR system. The implementation just returns the amount stored in <code>weekly_salary</code>.</p>
<p>The company also employs manufacturing workers who are paid by the hour, so you add an <code>HourlyEmployee</code> to the HR system:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">HourlyEmployee</span><span class="p">(</span><span class="n">Employee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">=</span> <span class="n">hours_worked</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hour_rate</span> <span class="o">=</span> <span class="n">hour_rate</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">hour_rate</span>
</pre></div>
<p>The <code>HourlyEmployee</code> class is initialized with <code>id</code> and <code>name</code>, like the base class, plus the <code>hours_worked</code> and the <code>hour_rate</code> required to calculate the payroll. The <code>.calculate_payroll()</code> method is implemented by returning the hours worked times the hour rate.</p>
<p>Finally, the company employs sales associates who are paid a fixed salary plus a commission based on their sales, so you create a <code>CommissionEmployee</code> class:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">CommissionEmployee</span><span class="p">(</span><span class="n">SalaryEmployee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">,</span> <span class="n">commission</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">commission</span> <span class="o">=</span> <span class="n">commission</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">fixed</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">fixed</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">commission</span>
</pre></div>
<p>You derive <code>CommissionEmployee</code> from <code>SalaryEmployee</code> because both classes have a <code>weekly_salary</code> to consider. At the same time, <code>CommissionEmployee</code> is initialized with a <code>commission</code> value that is based on the sales for the employee.</p>
<p><code>.calculate_payroll()</code> leverages the implementation of the base class to retrieve the <code>fixed</code> salary and adds the commission value.</p>
<p>Since <code>CommissionEmployee</code> derives from <code>SalaryEmployee</code>, you have access to the <code>weekly_salary</code> attribute directly, and you could’ve implemented <code>.calculate_payroll()</code> using the value of that attribute.</p>
<p>The problem with accessing the attribute directly is that if the implementation of <code>SalaryEmployee.calculate_payroll()</code> changes, then you’ll have to also change the implementation of <code>CommissionEmployee.calculate_payroll()</code>. It’s better to rely on the already implemented method in the base class and extend the functionality as needed.</p>
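<p>The following sketch illustrates that trade-off under a stated assumption: suppose <code>SalaryEmployee.calculate_payroll()</code> later starts adding a <code>bonus</code>. The <code>bonus</code> parameter and the two <code>CommissionVia...</code> classes are hypothetical simplifications, not part of the <code>hr</code> module:</p>

```python
class SalaryEmployee:
    def __init__(self, weekly_salary, bonus=0):
        self.weekly_salary = weekly_salary
        self.bonus = bonus  # Hypothetical later addition to the base class

    def calculate_payroll(self):
        # The base class implementation changed to include the bonus.
        return self.weekly_salary + self.bonus


class CommissionViaSuper(SalaryEmployee):
    def __init__(self, weekly_salary, commission, bonus=0):
        super().__init__(weekly_salary, bonus)
        self.commission = commission

    def calculate_payroll(self):
        # Delegates to the base class, so the bonus is picked up automatically.
        return super().calculate_payroll() + self.commission


class CommissionViaAttribute(CommissionViaSuper):
    def calculate_payroll(self):
        # Reads the attribute directly, so the bonus is silently ignored.
        return self.weekly_salary + self.commission


via_super = CommissionViaSuper(1000, 250, bonus=100).calculate_payroll()
via_attribute = CommissionViaAttribute(1000, 250, bonus=100).calculate_payroll()
print(via_super)      # 1350
print(via_attribute)  # 1250
```

<p>The version that calls <code>super().calculate_payroll()</code> stays correct after the base class changes, while the version that reads <code>weekly_salary</code> directly would need a matching edit.</p>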
<p>You created your first class hierarchy for the system. The UML diagram of the classes looks like this:</p>
<p><a href="https://files.realpython.com/media/ic-initial-employee-inheritance.b5f1e65cb8d1.jpg" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/ic-initial-employee-inheritance.b5f1e65cb8d1.jpg" width="785" height="744" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-initial-employee-inheritance.b5f1e65cb8d1.jpg&w=196&sig=58f0045964de008f5860e073922ac03b99fcaf9c 196w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-initial-employee-inheritance.b5f1e65cb8d1.jpg&w=392&sig=fce4157604c9d5ad7f8670d79b4a386bae166b02 392w, https://files.realpython.com/media/ic-initial-employee-inheritance.b5f1e65cb8d1.jpg 785w" sizes="75vw" alt="Inheritance example with multiple Employee derived classes"/></a></p>
<p>The diagram shows the inheritance hierarchy of the classes. The derived classes implement the <code>IPayrollCalculator</code> interface, which is required by the <code>PayrollSystem</code>. The <code>PayrollSystem.calculate_payroll()</code> implementation requires that the <code>employee</code> objects passed contain an <code>id</code>, <code>name</code>, and <code>calculate_payroll()</code> implementation. </p>
<p>Interfaces are represented similarly to classes with the word <strong>interface</strong> above the interface name. Interface names are usually prefixed with a capital <code>I</code>.</p>
<p>The application creates its employees and passes them to the payroll system to process payroll:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In program.py</span>
<span class="kn">import</span> <span class="nn">hr</span>
<span class="n">salary_employee</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">SalaryEmployee</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'John Smith'</span><span class="p">,</span> <span class="mi">1500</span><span class="p">)</span>
<span class="n">hourly_employee</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">HourlyEmployee</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Jane Doe'</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">15</span><span class="p">)</span>
<span class="n">commission_employee</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">CommissionEmployee</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Kevin Bacon'</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="mi">250</span><span class="p">)</span>
<span class="n">payroll_system</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">PayrollSystem</span><span class="p">()</span>
<span class="n">payroll_system</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">([</span>
    <span class="n">salary_employee</span><span class="p">,</span>
    <span class="n">hourly_employee</span><span class="p">,</span>
    <span class="n">commission_employee</span>
<span class="p">])</span>
</pre></div>
<p>You can run the program in the command line and see the results:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">Payroll for: 2 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1250</span>
</pre></div>
<p>The program creates three employee objects, one for each of the derived classes. Then, it creates the payroll system and passes a list of the employees to its <code>.calculate_payroll()</code> method, which calculates the payroll for each employee and prints the results.</p>
<p>Notice how the <code>Employee</code> base class doesn’t define a <code>.calculate_payroll()</code> method. This means that if you were to create a plain <code>Employee</code> object and pass it to the <code>PayrollSystem</code>, then you’d get an error. You can try it in the Python interactive interpreter:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">hr</span>
<span class="gp">>>> </span><span class="n">employee</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">Employee</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Invalid'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">payroll_system</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">PayrollSystem</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">payroll_system</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">([</span><span class="n">employee</span><span class="p">])</span>
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - Invalid</span>
<span class="gt">Traceback (most recent call last):</span>
  File <span class="nb">"&lt;stdin&gt;"</span>, line <span class="m">1</span>, in <span class="n">&lt;module&gt;</span>
  File <span class="nb">"/hr.py"</span>, line <span class="m">39</span>, in <span class="n">calculate_payroll</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- Check amount: {employee.calculate_payroll()}'</span><span class="p">)</span>
<span class="gr">AttributeError</span>: <span class="n">'Employee' object has no attribute 'calculate_payroll'</span>
</pre></div>
<p>While you can instantiate an <code>Employee</code> object, the object can’t be used by the <code>PayrollSystem</code>. Why? Because a plain <code>Employee</code> has no <code>.calculate_payroll()</code> method for the system to call. To meet the requirements of <code>PayrollSystem</code>, you’ll want to convert the <code>Employee</code> class, which is currently a concrete class, to an abstract class. That way, no employee is ever just an <code>Employee</code>, but one that implements <code>.calculate_payroll()</code>.</p>
<h3 id="abstract-base-classes-in-python">Abstract Base Classes in Python</h3>
<p>The <code>Employee</code> class in the example above is meant to serve as what is called an abstract base class. Abstract base classes exist to be inherited, but never instantiated. Python provides the <code>abc</code> module to define abstract base classes.</p>
<p>You can use <a href="https://dbader.org/blog/meaning-of-underscores-in-python">leading underscores</a> in your class name to communicate that objects of that class should not be created. Underscores provide a friendly way to prevent misuse of your code, but they don’t prevent eager users from creating instances of that class. </p>
<p>The <a href="https://docs.python.org/3/library/abc.html#module-abc"><code>abc</code> module</a> in the Python standard library provides functionality to prevent creating objects from abstract base classes.</p>
<p>You can modify the implementation of the <code>Employee</code> class to ensure that it can’t be instantiated:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="kn">from</span> <span class="nn">abc</span> <span class="k">import</span> <span class="n">ABC</span><span class="p">,</span> <span class="n">abstractmethod</span>

<span class="k">class</span> <span class="nc">Employee</span><span class="p">(</span><span class="n">ABC</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>

    <span class="nd">@abstractmethod</span>
    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">pass</span>
</pre></div>
<p>You derive <code>Employee</code> from <code>ABC</code>, making it an abstract base class. Then, you decorate the <code>.calculate_payroll()</code> method with the <code>@abstractmethod</code> <a href="https://realpython.com/primer-on-python-decorators/">decorator</a>.</p>
<p>This change has two nice side-effects:</p>
<ol>
<li>You’re telling users of the module that objects of type <code>Employee</code> can’t be created. </li>
<li>You’re telling other developers working on the <code>hr</code> module that if they derive from <code>Employee</code>, then they must override the <code>.calculate_payroll()</code> abstract method.</li>
</ol>
<p>You can see that objects of type <code>Employee</code> can’t be created using the interactive interpreter:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">hr</span>
<span class="gp">>>> </span><span class="n">employee</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">Employee</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'abstract'</span><span class="p">)</span>
<span class="gt">Traceback (most recent call last):</span>
  File <span class="nb">"&lt;stdin&gt;"</span>, line <span class="m">1</span>, in <span class="n">&lt;module&gt;</span>
<span class="gr">TypeError</span>: <span class="n">Can't instantiate abstract class Employee with abstract methods </span>
<span class="go">calculate_payroll</span>
</pre></div>
<p>The output shows that the class cannot be instantiated because it contains an abstract method <code>calculate_payroll()</code>. Derived classes must override the method to allow creating objects of their type.</p>
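<p>The same rule applies one level down the hierarchy. As a minimal sketch, the hypothetical <code>Intern</code> and <code>Contractor</code> classes below (neither is part of the <code>hr</code> module) show that a derived class stays abstract until it overrides <code>.calculate_payroll()</code>:</p>

```python
from abc import ABC, abstractmethod


class Employee(ABC):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    @abstractmethod
    def calculate_payroll(self):
        pass


class Intern(Employee):
    """Hypothetical subclass that forgets to override calculate_payroll()."""


class Contractor(Employee):
    """Hypothetical subclass that provides the required override."""

    def calculate_payroll(self):
        return 500


try:
    Intern(1, 'No Override')
    intern_created = True
except TypeError as error:
    # Intern inherits the abstract method, so it can't be instantiated.
    intern_created = False
    print(error)

contractor = Contractor(2, 'With Override')
print(contractor.calculate_payroll())
```

<p>Only <code>Contractor</code> can be instantiated, because it is the only class in the sketch with no abstract methods left to implement.</p>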
<h3 id="implementation-inheritance-vs-interface-inheritance">Implementation Inheritance vs Interface Inheritance</h3>
<p>When you derive one class from another, the derived class inherits both:</p>
<ol>
<li>
<p><strong>The base class interface:</strong> The derived class inherits all the methods, properties, and attributes of the base class.</p>
</li>
<li>
<p><strong>The base class implementation:</strong> The derived class inherits the code that implements the class interface.</p>
</li>
</ol>
<p>Most of the time, you’ll want to inherit the implementation of a class, but you will want to implement multiple interfaces, so your objects can be used in different situations.</p>
<p>Many modern programming languages, such as Java and C#, are designed with this basic concept in mind. They allow you to inherit from a single class, but you can implement multiple interfaces.</p>
<p>In Python, you don’t have to explicitly declare an interface. Any object that implements the desired interface can be used in place of another object. This is known as <a href="https://realpython.com/python-type-checking/#duck-typing"><strong>duck typing</strong></a>. Duck typing is usually explained as “if it behaves like a duck, then it’s a duck.”</p>
<p>To illustrate this, you will now add a <code>DisgruntledEmployee</code> class to the example above which doesn’t derive from <code>Employee</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In disgruntled.py</span>
<span class="k">class</span> <span class="nc">DisgruntledEmployee</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="mi">1000000</span>
</pre></div>
<p>The <code>DisgruntledEmployee</code> class doesn’t derive from <code>Employee</code>, but it exposes the same interface required by the <code>PayrollSystem</code>. The <code>PayrollSystem.calculate_payroll()</code> requires a list of objects that implement the following interface:</p>
<ul>
<li>An <strong><code>id</code></strong> property or attribute that returns the employee’s id</li>
<li>A <strong><code>name</code></strong> property or attribute that represents the employee’s name</li>
<li>A <strong><code>.calculate_payroll()</code></strong> method that doesn’t take any parameters and returns the payroll amount to process</li>
</ul>
<p>All these requirements are met by the <code>DisgruntledEmployee</code> class, so the <code>PayrollSystem</code> can still calculate its payroll.</p>
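<p>One way to make that implicit interface visible is a small check based on <code>hasattr()</code> and <code>callable()</code>. This is only a sketch: the <code>implements_payroll_interface()</code> helper and the <code>Customer</code> class are hypothetical, and the <code>PayrollSystem</code> itself performs no such check:</p>

```python
def implements_payroll_interface(obj):
    """Return True if obj exposes id, name, and a callable calculate_payroll."""
    return (
        hasattr(obj, 'id')
        and hasattr(obj, 'name')
        and callable(getattr(obj, 'calculate_payroll', None))
    )


class DisgruntledEmployee:
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def calculate_payroll(self):
        return 1000000


class Customer:
    """Hypothetical class with an id and a name but no payroll method."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


disgruntled_ok = implements_payroll_interface(DisgruntledEmployee(20000, 'Anonymous'))
customer_ok = implements_payroll_interface(Customer(1, 'Some Customer'))
print(disgruntled_ok)  # True
print(customer_ok)     # False
```

<p>The check looks only at what the object exposes, not at its base classes, which is the essence of duck typing.</p>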
<p>You can modify the program to use the <code>DisgruntledEmployee</code> class:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In program.py</span>
<span class="kn">import</span> <span class="nn">hr</span>
<span class="kn">import</span> <span class="nn">disgruntled</span>
<span class="n">salary_employee</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">SalaryEmployee</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'John Smith'</span><span class="p">,</span> <span class="mi">1500</span><span class="p">)</span>
<span class="n">hourly_employee</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">HourlyEmployee</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Jane Doe'</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">15</span><span class="p">)</span>
<span class="n">commission_employee</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">CommissionEmployee</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Kevin Bacon'</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="mi">250</span><span class="p">)</span>
<span class="n">disgruntled_employee</span> <span class="o">=</span> <span class="n">disgruntled</span><span class="o">.</span><span class="n">DisgruntledEmployee</span><span class="p">(</span><span class="mi">20000</span><span class="p">,</span> <span class="s1">'Anonymous'</span><span class="p">)</span>
<span class="n">payroll_system</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">PayrollSystem</span><span class="p">()</span>
<span class="n">payroll_system</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">([</span>
    <span class="n">salary_employee</span><span class="p">,</span>
    <span class="n">hourly_employee</span><span class="p">,</span>
    <span class="n">commission_employee</span><span class="p">,</span>
    <span class="n">disgruntled_employee</span>
<span class="p">])</span>
</pre></div>
<p>The program creates a <code>DisgruntledEmployee</code> object and adds it to the list processed by the <code>PayrollSystem</code>. You can now run the program and see its output:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">Payroll for: 2 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1250</span>
<span class="go">Payroll for: 20000 - Anonymous</span>
<span class="go">- Check amount: 1000000</span>
</pre></div>
<p>As you can see, the <code>PayrollSystem</code> can still process the new object because it meets the desired interface.</p>
<p>Since you don’t have to derive from a specific class for your objects to be reusable by the program, you may be asking why you should use inheritance instead of just implementing the desired interface. The following rules may help you:</p>
<ul>
<li>
<p><strong>Use inheritance to reuse an implementation:</strong> Your derived classes should leverage most of their base class implementation. They must also model an <strong>is a</strong> relationship. A <code>Customer</code> class might also have an <code>id</code> and a <code>name</code>, but a <code>Customer</code> is not an <code>Employee</code>, so you should not use inheritance.</p>
</li>
<li>
<p><strong>Implement an interface to be reused:</strong> When you want your class to be reused by a specific part of your application, you implement the required interface in your class, but you don’t need to provide a base class, or inherit from another class.</p>
</li>
</ul>
<p>You can now clean up the example above to move on to the next topic. You can delete the <code>disgruntled.py</code> file and then modify the <code>hr</code> module to its original state:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">PayrollSystem</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employees</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Calculating Payroll'</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'==================='</span><span class="p">)</span>
<span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">employees</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Payroll for: </span><span class="si">{employee.id}</span><span class="s1"> - </span><span class="si">{employee.name}</span><span class="s1">'</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- Check amount: {employee.calculate_payroll()}'</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Employee</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>

<span class="k">class</span> <span class="nc">SalaryEmployee</span><span class="p">(</span><span class="n">Employee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">weekly_salary</span> <span class="o">=</span> <span class="n">weekly_salary</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">weekly_salary</span>

<span class="k">class</span> <span class="nc">HourlyEmployee</span><span class="p">(</span><span class="n">Employee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">=</span> <span class="n">hours_worked</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hour_rate</span> <span class="o">=</span> <span class="n">hour_rate</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">hour_rate</span>

<span class="k">class</span> <span class="nc">CommissionEmployee</span><span class="p">(</span><span class="n">SalaryEmployee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">,</span> <span class="n">commission</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">commission</span> <span class="o">=</span> <span class="n">commission</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">fixed</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">fixed</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">commission</span>
</pre></div>
<p>You removed the import of the <code>abc</code> module since the <code>Employee</code> class doesn’t need to be abstract. You also removed the abstract <code>calculate_payroll()</code> method from it since it doesn’t provide any implementation.</p>
<p>Basically, you are inheriting the implementation of the <code>id</code> and <code>name</code> attributes of the <code>Employee</code> class in your derived classes. Since <code>.calculate_payroll()</code> is just an interface to the <code>PayrollSystem.calculate_payroll()</code> method, you don’t need to implement it in the <code>Employee</code> base class.</p>
<p>Notice how the <code>CommissionEmployee</code> class derives from <code>SalaryEmployee</code>. This means that <code>CommissionEmployee</code> inherits the implementation and interface of <code>SalaryEmployee</code>. You can see how the <code>CommissionEmployee.calculate_payroll()</code> method leverages the base class implementation because it relies on the result from <code>super().calculate_payroll()</code> to implement its own version.</p>
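<p>As a quick check of the points above, the following minimal sketch repeats only the classes needed to exercise <code>CommissionEmployee</code>. The variable names <code>kevin</code>, <code>payroll</code>, and <code>base_has_method</code> are illustrative and not part of the article's modules:</p>

```python
class Employee:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class SalaryEmployee(Employee):
    def __init__(self, id, name, weekly_salary):
        super().__init__(id, name)
        self.weekly_salary = weekly_salary

    def calculate_payroll(self):
        return self.weekly_salary


class CommissionEmployee(SalaryEmployee):
    def __init__(self, id, name, weekly_salary, commission):
        super().__init__(id, name, weekly_salary)
        self.commission = commission

    def calculate_payroll(self):
        # Reuse the salaried calculation, then add the commission on top.
        fixed = super().calculate_payroll()
        return fixed + self.commission


kevin = CommissionEmployee(3, 'Kevin Bacon', 1000, 250)
payroll = kevin.calculate_payroll()
print(payroll)  # 1250

# The plain base class no longer declares calculate_payroll() at all.
base_has_method = hasattr(Employee, 'calculate_payroll')
print(base_has_method)  # False
```

<p>The id and name come from <code>Employee</code>, the fixed part of the check comes from <code>SalaryEmployee</code>, and only the commission logic lives in <code>CommissionEmployee</code> itself.</p>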
<h3 id="the-class-explosion-problem">The Class Explosion Problem</h3>
<p>If you are not careful, inheritance can lead you to a huge hierarchical structure of classes that is hard to understand and maintain. This is known as the <strong>class explosion problem</strong>.</p>
<p>You started building a class hierarchy of <code>Employee</code> types used by the <code>PayrollSystem</code> to calculate payroll. Now, you need to add some functionality to those classes, so they can be used with the new <code>ProductivitySystem</code>. </p>
<p>The <code>ProductivitySystem</code> tracks productivity based on employee roles. There are different employee roles:</p>
<ul>
<li><strong>Managers:</strong> They walk around yelling at people telling them what to do. They are salaried employees and make more money.</li>
<li><strong>Secretaries:</strong> They do all the paperwork for managers and ensure that everything gets billed and paid on time. They are also salaried employees but make less money.</li>
<li><strong>Sales employees:</strong> They make a lot of phone calls to sell products. They have a salary, but they also get commissions for sales.</li>
<li><strong>Factory workers:</strong> They manufacture the products for the company. They are paid by the hour.</li>
</ul>
<p>With those requirements, you start to see that <code>Employee</code> and its derived classes might belong somewhere other than the <code>hr</code> module because now they’re also used by the <code>ProductivitySystem</code>.</p>
<p>You create an <code>employees</code> module and move the classes there:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="k">class</span> <span class="nc">Employee</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>

<span class="k">class</span> <span class="nc">SalaryEmployee</span><span class="p">(</span><span class="n">Employee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">weekly_salary</span> <span class="o">=</span> <span class="n">weekly_salary</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">weekly_salary</span>

<span class="k">class</span> <span class="nc">HourlyEmployee</span><span class="p">(</span><span class="n">Employee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">=</span> <span class="n">hours_worked</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hour_rate</span> <span class="o">=</span> <span class="n">hour_rate</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">hour_rate</span>

<span class="k">class</span> <span class="nc">CommissionEmployee</span><span class="p">(</span><span class="n">SalaryEmployee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">,</span> <span class="n">commission</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">commission</span> <span class="o">=</span> <span class="n">commission</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">fixed</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">fixed</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">commission</span>
</pre></div>
<p>The implementation remains the same, but you move the classes to the <code>employees</code> module. Now, you update your program to use the new module:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In program.py</span>
<span class="kn">import</span> <span class="nn">hr</span>
<span class="kn">import</span> <span class="nn">employees</span>
<span class="n">salary_employee</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">SalaryEmployee</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'John Smith'</span><span class="p">,</span> <span class="mi">1500</span><span class="p">)</span>
<span class="n">hourly_employee</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">HourlyEmployee</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Jane Doe'</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">15</span><span class="p">)</span>
<span class="n">commission_employee</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">CommissionEmployee</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Kevin Bacon'</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="mi">250</span><span class="p">)</span>
<span class="n">payroll_system</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">PayrollSystem</span><span class="p">()</span>
<span class="n">payroll_system</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">([</span>
    <span class="n">salary_employee</span><span class="p">,</span>
    <span class="n">hourly_employee</span><span class="p">,</span>
    <span class="n">commission_employee</span>
<span class="p">])</span>
</pre></div>
<p>You run the program and verify that it still works:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">Payroll for: 2 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1250</span>
</pre></div>
<p>With everything in place, you start adding the new classes:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="k">class</span> <span class="nc">Manager</span><span class="p">(</span><span class="n">SalaryEmployee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{self.name}</span><span class="s1"> screams and yells for </span><span class="si">{hours}</span><span class="s1"> hours.'</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Secretary</span><span class="p">(</span><span class="n">SalaryEmployee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{self.name}</span><span class="s1"> expends </span><span class="si">{hours}</span><span class="s1"> hours doing office paperwork.'</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">SalesPerson</span><span class="p">(</span><span class="n">CommissionEmployee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{self.name}</span><span class="s1"> expends </span><span class="si">{hours}</span><span class="s1"> hours on the phone.'</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">FactoryWorker</span><span class="p">(</span><span class="n">HourlyEmployee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{self.name}</span><span class="s1"> manufactures gadgets for </span><span class="si">{hours}</span><span class="s1"> hours.'</span><span class="p">)</span>
</pre></div>
<p>First, you add a <code>Manager</code> class that derives from <code>SalaryEmployee</code>. The class exposes a method <code>work()</code> that will be used by the productivity system. The method takes the <code>hours</code> the employee worked.</p>
<p>Then you add <code>Secretary</code>, <code>SalesPerson</code>, and <code>FactoryWorker</code> and then implement the <code>work()</code> interface, so they can be used by the productivity system.</p>
<p>Now, you can add the <code>ProductivitySystem</code> class:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In productivity.py</span>
<span class="k">class</span> <span class="nc">ProductivitySystem</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">track</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employees</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Tracking Employee Productivity'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'=============================='</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">employees</span><span class="p">:</span>
            <span class="n">employee</span><span class="o">.</span><span class="n">work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
</pre></div>
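<p>Because <code>track()</code> only calls <code>work(hours)</code> on each item it receives, any object that provides that method can be tracked, whether or not it derives from <code>Employee</code>. The sketch below illustrates this with a hypothetical <code>Contractor</code> class that is not part of the article's <code>employees</code> module:</p>

```python
class ProductivitySystem:
    def track(self, employees, hours):
        print('Tracking Employee Productivity')
        print('==============================')
        for employee in employees:
            employee.work(hours)
        print('')


class Contractor:
    # Hypothetical class for illustration only. It does not inherit
    # from Employee; it just happens to provide a work() method.
    def __init__(self, name):
        self.name = name
        self.hours_logged = 0

    def work(self, hours):
        self.hours_logged += hours
        print(f'{self.name} consults for {hours} hours.')


contractor = Contractor('Pat Example')
ProductivitySystem().track([contractor], 8)
```

<p>This is the interface the productivity system relies on: a <code>work()</code> method that accepts the number of hours.</p>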
<p>The class tracks employees in the <code>track()</code> method that takes a list of employees and the number of hours to track. You can now add the productivity system to your program:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In program.py</span>
<span class="kn">import</span> <span class="nn">hr</span>
<span class="kn">import</span> <span class="nn">employees</span>
<span class="kn">import</span> <span class="nn">productivity</span>
<span class="n">manager</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">Manager</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Mary Poppins'</span><span class="p">,</span> <span class="mi">3000</span><span class="p">)</span>
<span class="n">secretary</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">Secretary</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'John Smith'</span><span class="p">,</span> <span class="mi">1500</span><span class="p">)</span>
<span class="n">sales_guy</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">SalesPerson</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Kevin Bacon'</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="mi">250</span><span class="p">)</span>
<span class="n">factory_worker</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">FactoryWorker</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'Jane Doe'</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">15</span><span class="p">)</span>
<span class="n">company_employees</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">manager</span><span class="p">,</span>
    <span class="n">secretary</span><span class="p">,</span>
    <span class="n">sales_guy</span><span class="p">,</span>
    <span class="n">factory_worker</span><span class="p">,</span>
<span class="p">]</span>
<span class="n">productivity_system</span> <span class="o">=</span> <span class="n">productivity</span><span class="o">.</span><span class="n">ProductivitySystem</span><span class="p">()</span>
<span class="n">productivity_system</span><span class="o">.</span><span class="n">track</span><span class="p">(</span><span class="n">company_employees</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span>
<span class="n">payroll_system</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">PayrollSystem</span><span class="p">()</span>
<span class="n">payroll_system</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">(</span><span class="n">company_employees</span><span class="p">)</span>
</pre></div>
<p>The program creates a list of employees of different types. The employee list is sent to the productivity system to track their work for 40 hours. Then the same list of employees is sent to the payroll system to calculate their payroll.</p>
<p>You can run the program to see the output:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Tracking Employee Productivity</span>
<span class="go">==============================</span>
<span class="go">Mary Poppins screams and yells for 40 hours.</span>
<span class="go">John Smith expends 40 hours doing office paperwork.</span>
<span class="go">Kevin Bacon expends 40 hours on the phone.</span>
<span class="go">Jane Doe manufactures gadgets for 40 hours.</span>
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - Mary Poppins</span>
<span class="go">- Check amount: 3000</span>
<span class="go">Payroll for: 2 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1250</span>
<span class="go">Payroll for: 4 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
</pre></div>
<p>The program shows the employees working for 40 hours through the productivity system. Then it calculates and displays the payroll for each of the employees.</p>
<p>The program works as expected, but you had to add four new classes to support the changes. As new requirements come in, your class hierarchy will inevitably grow, leading to the class explosion problem, where your hierarchies become so big that they’re hard to understand and maintain.</p>
<p>The following diagram shows the new class hierarchy:</p>
<p><a href="https://files.realpython.com/media/ic-class-explosion.a3d42b8c9b91.jpg" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/ic-class-explosion.a3d42b8c9b91.jpg" width="1184" height="1199" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-class-explosion.a3d42b8c9b91.jpg&w=296&sig=3df3dc7dac2263ed3beba586e8731fc276936b39 296w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-class-explosion.a3d42b8c9b91.jpg&w=592&sig=e2d4c2f1d5145230cd248e731aca8cdfe5e25709 592w, https://files.realpython.com/media/ic-class-explosion.a3d42b8c9b91.jpg 1184w" sizes="75vw" alt="Class design explosion by inheritance"/></a></p>
<p>The diagram shows how the class hierarchy is growing. With this design, additional requirements could make the number of classes grow exponentially.</p>
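<p>To get a rough feel for that growth, consider a worst-case sketch in which every role could be paired with every payroll policy, and each pair needs its own class. The generated class names and the <code>contract_types</code> dimension are hypothetical and only serve the arithmetic:</p>

```python
from itertools import product

roles = ['Manager', 'Secretary', 'SalesPerson', 'FactoryWorker']
payroll_policies = ['Salary', 'Hourly', 'Commission']

# Worst case: one concrete class per (role, policy) pair.
combinations = [
    f'{policy}{role}' for role, policy in product(roles, payroll_policies)
]
print(len(combinations))  # 12

# A hypothetical third dimension multiplies the count again.
contract_types = ['Permanent', 'Temporary']
with_contracts = list(product(roles, payroll_policies, contract_types))
print(len(with_contracts))  # 24
```

<p>Each new independent dimension multiplies the number of potential classes, which is why a hierarchy built purely on inheritance gets out of hand so quickly.</p>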
<h3 id="inheriting-multiple-classes">Inheriting Multiple Classes</h3>
<p>Python is one of the few modern programming languages that supports multiple inheritance. Multiple inheritance is the ability to derive a class from multiple base classes at the same time.</p>
<p>Multiple inheritance has a bad reputation to the extent that most modern programming languages don’t support it. Instead, modern programming languages support the concept of interfaces. In those languages, you inherit from a single base class and then implement multiple interfaces, so your class can be re-used in different situations.</p>
<p>This approach puts some constraints on your designs. You can only inherit the implementation of one class by directly deriving from it. You can implement multiple interfaces, but you can’t inherit the implementation of multiple classes.</p>
<p>This constraint is good for software design because it forces you to design your classes with fewer dependencies on each other. You will see later in this article that you can leverage multiple implementations through composition, which makes software more flexible. This section, however, is about multiple inheritance, so let’s take a look at how it works.</p>
<p>It turns out that sometimes temporary secretaries are hired when there is too much paperwork to do. The <code>TemporarySecretary</code> class performs the role of a <code>Secretary</code> in the context of the <code>ProductivitySystem</code>, but for payroll purposes, it is an <code>HourlyEmployee</code>.</p>
<p>You look at your class design. It has grown a little bit, but you can still understand how it works. It seems you have two options:</p>
<ol>
<li>
<p><strong>Derive from <code>Secretary</code>:</strong> You can derive from <code>Secretary</code> to inherit the <code>.work()</code> method for the role, and then override the <code>.calculate_payroll()</code> method to implement it as an <code>HourlyEmployee</code>.</p>
</li>
<li>
<p><strong>Derive from <code>HourlyEmployee</code>:</strong> You can derive from <code>HourlyEmployee</code> to inherit the <code>.calculate_payroll()</code> method, and then override the <code>.work()</code> method to implement it as a <code>Secretary</code>.</p>
</li>
</ol>
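<p>For comparison, here is a minimal sketch of what the second option might look like with single inheritance. It repeats only the classes it needs, and the <code>temp</code> and <code>check</code> variables are illustrative:</p>

```python
class Employee:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class HourlyEmployee(Employee):
    def __init__(self, id, name, hours_worked, hour_rate):
        super().__init__(id, name)
        self.hours_worked = hours_worked
        self.hour_rate = hour_rate

    def calculate_payroll(self):
        return self.hours_worked * self.hour_rate


class TemporarySecretary(HourlyEmployee):
    def work(self, hours):
        # This duplicates the body of Secretary.work() instead of
        # inheriting it, which is the cost of this option.
        print(f'{self.name} expends {hours} hours doing office paperwork.')


temp = TemporarySecretary(5, 'Robin Williams', 40, 9)
temp.work(40)
check = temp.calculate_payroll()
print(check)  # 360
```

<p>The payroll logic is inherited, but the <code>.work()</code> behavior has to be copied from <code>Secretary</code>. The first option has the mirror-image drawback: it would inherit <code>.work()</code> and duplicate the hourly calculation.</p>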
<p>Then, you remember that Python supports multiple inheritance, so you decide to derive from both <code>Secretary</code> and <code>HourlyEmployee</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="k">class</span> <span class="nc">TemporarySecretary</span><span class="p">(</span><span class="n">Secretary</span><span class="p">,</span> <span class="n">HourlyEmployee</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
<p>Python allows you to inherit from two different classes by specifying them between parentheses in the class declaration.</p>
<p>Now, you modify your program to add the new temporary secretary employee:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">hr</span>
<span class="kn">import</span> <span class="nn">employees</span>
<span class="kn">import</span> <span class="nn">productivity</span>
<span class="n">manager</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">Manager</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Mary Poppins'</span><span class="p">,</span> <span class="mi">3000</span><span class="p">)</span>
<span class="n">secretary</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">Secretary</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'John Smith'</span><span class="p">,</span> <span class="mi">1500</span><span class="p">)</span>
<span class="n">sales_guy</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">SalesPerson</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Kevin Bacon'</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="mi">250</span><span class="p">)</span>
<span class="n">factory_worker</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">FactoryWorker</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'Jane Doe'</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">15</span><span class="p">)</span>
<span class="hll"><span class="n">temporary_secretary</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">TemporarySecretary</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s1">'Robin Williams'</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">9</span><span class="p">)</span>
</span><span class="n">company_employees</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">manager</span><span class="p">,</span>
    <span class="n">secretary</span><span class="p">,</span>
    <span class="n">sales_guy</span><span class="p">,</span>
    <span class="n">factory_worker</span><span class="p">,</span>
<span class="hll">    <span class="n">temporary_secretary</span><span class="p">,</span>
</span><span class="p">]</span>
<span class="n">productivity_system</span> <span class="o">=</span> <span class="n">productivity</span><span class="o">.</span><span class="n">ProductivitySystem</span><span class="p">()</span>
<span class="n">productivity_system</span><span class="o">.</span><span class="n">track</span><span class="p">(</span><span class="n">company_employees</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span>
<span class="n">payroll_system</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">PayrollSystem</span><span class="p">()</span>
<span class="n">payroll_system</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">(</span><span class="n">company_employees</span><span class="p">)</span>
</pre></div>
<p>You run the program to test it:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Traceback (most recent call last):</span>
<span class="go"> File ".\program.py", line 9, in <module></span>
<span class="go">    temporary_secretary = employees.TemporarySecretary(5, 'Robin Williams', 40, 9)</span>
<span class="go">TypeError: __init__() takes 4 positional arguments but 5 were given</span>
</pre></div>
<p>You get a <a href="https://docs.python.org/3/library/exceptions.html#TypeError"><code>TypeError</code></a> exception saying that <code>4</code> positional arguments were expected, but <code>5</code> were given.</p>
<p>This is because you derived <code>TemporarySecretary</code> first from <code>Secretary</code> and then from <code>HourlyEmployee</code>, so the interpreter is trying to use <code>Secretary.__init__()</code> to initialize the object.</p>
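<p>You can confirm which initializer Python picks with a small, self-contained sketch. It uses trimmed-down copies of the classes, and the names <code>inherited_init</code> and <code>message</code> are illustrative. Note that <code>Secretary</code> doesn’t define its own <code>__init__()</code>, so the one it inherits from <code>SalaryEmployee</code> is the one that runs:</p>

```python
import inspect


class Employee:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class SalaryEmployee(Employee):
    def __init__(self, id, name, weekly_salary):
        super().__init__(id, name)
        self.weekly_salary = weekly_salary


class HourlyEmployee(Employee):
    def __init__(self, id, name, hours_worked, hour_rate):
        super().__init__(id, name)
        self.hours_worked = hours_worked
        self.hour_rate = hour_rate


class Secretary(SalaryEmployee):
    def work(self, hours):
        print(f'{self.name} expends {hours} hours doing office paperwork.')


class TemporarySecretary(Secretary, HourlyEmployee):
    pass


# The first __init__ found along the inheritance chain is the salaried one.
inherited_init = TemporarySecretary.__init__
print(inherited_init is SalaryEmployee.__init__)  # True
print(inspect.signature(inherited_init))  # (self, id, name, weekly_salary)

message = None
try:
    TemporarySecretary(5, 'Robin Williams', 40, 9)
except TypeError as error:
    # The exact wording varies slightly between Python versions.
    message = str(error)
    print(message)
```

<p>Counting <code>self</code>, that initializer accepts four positional arguments, while the call supplies five.</p>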
<p>Okay, let’s reverse it:</p>
<div class="highlight python"><pre><span></span><span class="k">class</span> <span class="nc">TemporarySecretary</span><span class="p">(</span><span class="n">HourlyEmployee</span><span class="p">,</span> <span class="n">Secretary</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
<p>Now, run the program again and see what happens:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Traceback (most recent call last):</span>
<span class="go"> File ".\program.py", line 9, in <module></span>
<span class="go">    temporary_secretary = employees.TemporarySecretary(5, 'Robin Williams', 40, 9)</span>
<span class="go">  File "employees.py", line 16, in __init__</span>
<span class="go"> super().__init__(id, name)</span>
<span class="go">TypeError: __init__() missing 1 required positional argument: 'weekly_salary'</span>
</pre></div>
<p>Now it seems you are missing a <code>weekly_salary</code> parameter, which is necessary to initialize <code>Secretary</code>, but that parameter doesn’t make sense in the context of a <code>TemporarySecretary</code> because it’s an <code>HourlyEmployee</code>.</p>
<p>Maybe implementing <code>TemporarySecretary.__init__()</code> will help:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="k">class</span> <span class="nc">TemporarySecretary</span><span class="p">(</span><span class="n">HourlyEmployee</span><span class="p">,</span> <span class="n">Secretary</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">)</span>
</pre></div>
<p>Try it:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Traceback (most recent call last):</span>
<span class="go"> File ".\program.py", line 9, in <module></span>
<span class="go"> temporary_secretary = employee.TemporarySecretary(5, 'Robin Williams', 40, 9)</span>
<span class="go"> File "employee.py", line 54, in __init__</span>
<span class="go"> super().__init__(id, name, hours_worked, hour_rate)</span>
<span class="go"> File "employee.py", line 16, in __init__</span>
<span class="go"> super().__init__(id, name)</span>
<span class="go">TypeError: __init__() missing 1 required positional argument: 'weekly_salary'</span>
</pre></div>
<p>That didn’t work either. Okay, it’s time for you to dive into Python’s <strong>method resolution order</strong> (MRO) to see what’s going on.</p>
<p>When a method or attribute of a class is accessed, Python uses the class <a href="https://www.python.org/download/releases/2.3/mro/">MRO</a> to find it. The MRO is also used by <code>super()</code> to determine which method or attribute to invoke. You can learn more about <code>super()</code> in <a href="https://realpython.com/python-super/">Supercharge Your Classes With Python super()</a>.</p>
<p>You can evaluate the <code>TemporarySecretary</code> class MRO using the interactive interpreter:</p>
<div class="highlight python repl"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">employees</span> <span class="k">import</span> <span class="n">TemporarySecretary</span>
<span class="gp">>>> </span><span class="n">TemporarySecretary</span><span class="o">.</span><span class="vm">__mro__</span>
<span class="go">(<class 'employees.TemporarySecretary'>,</span>
<span class="go"> <class 'employees.HourlyEmployee'>,</span>
<span class="go"> <class 'employees.Secretary'>,</span>
<span class="go"> <class 'employees.SalaryEmployee'>,</span>
<span class="go"> <class 'employees.Employee'>,</span>
<span class="go"> <class 'object'></span>
<span class="go">)</span>
</pre></div>
<p>The MRO shows the order in which Python is going to look for a matching attribute or method. In the example, this is what happens when you create the <code>TemporarySecretary</code> object:</p>
<ol>
<li>
<p>The <code>TemporarySecretary.__init__(self, id, name, hours_worked, hour_rate)</code> method is called.</p>
</li>
<li>
<p>The <code>super().__init__(id, name, hours_worked, hour_rate)</code> call matches <code>HourlyEmployee.__init__(self, id, name, hours_worked, hour_rate)</code>.</p>
</li>
<li>
<p><code>HourlyEmployee</code> calls <code>super().__init__(id, name)</code>, which the MRO is going to match to <code>Secretary.__init__()</code>, which is inherited from <code>SalaryEmployee.__init__(self, id, name, weekly_salary)</code>.</p>
</li>
</ol>
<p>Because the parameters don’t match, a <code>TypeError</code> exception is raised.</p>
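<p>The lookup described above can be reproduced in a reduced, self-contained sketch. The class bodies below are simplified stand-ins for the ones in <code>employees.py</code> (an assumption, since only the initializers matter here), and the exact wording of the error message varies between Python versions:</p>

```python
# Simplified stand-ins for the classes in employees.py (assumption:
# only the initializers are needed to reproduce the error).
class Employee:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class SalaryEmployee(Employee):
    def __init__(self, id, name, weekly_salary):
        super().__init__(id, name)
        self.weekly_salary = weekly_salary


class HourlyEmployee(Employee):
    def __init__(self, id, name, hours_worked, hour_rate):
        # For a TemporarySecretary instance, the next class in the MRO
        # is Secretary, not Employee, so this call reaches
        # SalaryEmployee.__init__() with too few arguments.
        super().__init__(id, name)
        self.hours_worked = hours_worked
        self.hour_rate = hour_rate


class Secretary(SalaryEmployee):
    pass


class TemporarySecretary(HourlyEmployee, Secretary):
    def __init__(self, id, name, hours_worked, hour_rate):
        super().__init__(id, name, hours_worked, hour_rate)


mro_names = [cls.__name__ for cls in TemporarySecretary.__mro__]
print(mro_names)

error_message = None
try:
    TemporarySecretary(5, 'Robin Williams', 40, 9)
except TypeError as error:
    error_message = str(error)
    print(error_message)
```

<p>The printed list of class names follows the same order as the <code>__mro__</code> output shown earlier, and the caught <code>TypeError</code> mentions the missing <code>weekly_salary</code> argument.</p>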
<p>You can bypass the MRO by reversing the inheritance order and directly calling <code>HourlyEmployee.__init__()</code> as follows:</p>
<div class="highlight python"><pre><span></span><span class="k">class</span> <span class="nc">TemporarySecretary</span><span class="p">(</span><span class="n">Secretary</span><span class="p">,</span> <span class="n">HourlyEmployee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">):</span>
        <span class="n">HourlyEmployee</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">)</span>
</pre></div>
<p>That solves the problem of creating the object, but you will run into a similar problem when trying to calculate payroll. You can run the program to see the problem:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Tracking Employee Productivity</span>
<span class="go">==============================</span>
<span class="go">Mary Poppins screams and yells for 40 hours.</span>
<span class="go">John Smith expends 40 hours doing office paperwork.</span>
<span class="go">Kevin Bacon expends 40 hours on the phone.</span>
<span class="go">Jane Doe manufactures gadgets for 40 hours.</span>
<span class="go">Robin Williams expends 40 hours doing office paperwork.</span>
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - Mary Poppins</span>
<span class="go">- Check amount: 3000</span>
<span class="go">Payroll for: 2 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1250</span>
<span class="go">Payroll for: 4 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">Payroll for: 5 - Robin Williams</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go"> File ".\program.py", line 20, in <module></span>
<span class="go"> payroll_system.calculate_payroll(employees)</span>
<span class="go"> File "hr.py", line 7, in calculate_payroll</span>
<span class="go"> print(f'- Check amount: {employee.calculate_payroll()}')</span>
<span class="go"> File "employee.py", line 12, in calculate_payroll</span>
<span class="go"> return self.weekly_salary</span>
<span class="go">AttributeError: 'TemporarySecretary' object has no attribute 'weekly_salary'</span>
</pre></div>
<p>The problem now is that because you reversed the inheritance order, the MRO is finding the <code>.calculate_payroll()</code> method of <code>SalaryEmployee</code> before the one in <code>HourlyEmployee</code>. You need to override <code>.calculate_payroll()</code> in <code>TemporarySecretary</code> and invoke the right implementation from it:</p>
<div class="highlight python"><pre><span></span><span class="k">class</span> <span class="nc">TemporarySecretary</span><span class="p">(</span><span class="n">Secretary</span><span class="p">,</span> <span class="n">HourlyEmployee</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">):</span>
        <span class="n">HourlyEmployee</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">HourlyEmployee</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
</pre></div>
<p>The <code>calculate_payroll()</code> method directly invokes <code>HourlyEmployee.calculate_payroll()</code> to ensure that you get the correct result. You can run the program again to see it working:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Tracking Employee Productivity</span>
<span class="go">==============================</span>
<span class="go">Mary Poppins screams and yells for 40 hours.</span>
<span class="go">John Smith expends 40 hours doing office paperwork.</span>
<span class="go">Kevin Bacon expends 40 hours on the phone.</span>
<span class="go">Jane Doe manufactures gadgets for 40 hours.</span>
<span class="go">Robin Williams expends 40 hours doing office paperwork.</span>
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - Mary Poppins</span>
<span class="go">- Check amount: 3000</span>
<span class="go">Payroll for: 2 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1250</span>
<span class="go">Payroll for: 4 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">Payroll for: 5 - Robin Williams</span>
<span class="go">- Check amount: 360</span>
</pre></div>
<p>The program now works as expected because you’re forcing the method resolution order by explicitly telling the interpreter which method you want to use.</p>
<p>As you can see, multiple inheritance can be confusing, especially when you run into the <a href="https://en.wikipedia.org/wiki/Multiple_inheritance#The_diamond_problem">diamond problem</a>.</p>
<p>The following diagram shows the diamond problem in your class hierarchy:</p>
<p><a href="https://files.realpython.com/media/ic-diamond-problem.8e685f12d3c2.jpg" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/ic-diamond-problem.8e685f12d3c2.jpg" width="726" height="845" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-diamond-problem.8e685f12d3c2.jpg&w=181&sig=8c590866d2b03d475c750f72405d8fd9e325c350 181w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-diamond-problem.8e685f12d3c2.jpg&w=363&sig=13e5b6884a585b8f452fc384238507c40f9c97b8 363w, https://files.realpython.com/media/ic-diamond-problem.8e685f12d3c2.jpg 726w" sizes="75vw" alt="Diamond problem caused by multiple inheritance"/></a></p>
<p>The diagram shows the diamond problem with the current class design. <code>TemporarySecretary</code> uses multiple inheritance to derive from two classes that ultimately also derive from <code>Employee</code>. This causes two paths to reach the <code>Employee</code> base class, which is something you want to avoid in your designs.</p>
<p>The diamond problem appears when you’re using multiple inheritance and deriving from two classes that have a common base class. This can cause the wrong version of a method to be called. </p>
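<p>A minimal sketch of the diamond, using hypothetical classes (<code>Base</code>, <code>Left</code>, <code>Right</code>, <code>Bottom</code>, and <code>BottomReversed</code> aren’t part of the employee example), shows how the order of the base classes decides which version of a method is found first:</p>

```python
# Hypothetical classes, used only to illustrate the diamond shape.
class Base:
    def describe(self):
        return 'Base'


class Left(Base):
    def describe(self):
        return 'Left'


class Right(Base):
    def describe(self):
        return 'Right'


class Bottom(Left, Right):
    pass


class BottomReversed(Right, Left):
    pass


bottom_mro = [cls.__name__ for cls in Bottom.__mro__]
print(bottom_mro)

# The first class in the MRO that defines describe() wins.
print(Bottom().describe())
print(BottomReversed().describe())

# Calling through the class bypasses the MRO lookup entirely.
print(Right.describe(Bottom()))
```

<p>Both <code>Left</code> and <code>Right</code> reach <code>Base</code>, which is the two-path shape described above. Swapping the base classes changes the result of <code>describe()</code> without touching any method body.</p>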
<p>As you’ve seen, Python provides a way to force the right method to be invoked, and analyzing the MRO can help you understand the problem.</p>
<p>Still, when you run into the diamond problem, it’s better to re-think the design. You will now make some changes to leverage multiple inheritance, avoiding the diamond problem.</p>
<p>The <code>Employee</code> derived classes are used by two different systems:</p>
<ol>
<li>
<p><strong>The productivity system</strong> that tracks employee productivity.</p>
</li>
<li>
<p><strong>The payroll system</strong> that calculates the employee payroll.</p>
</li>
</ol>
<p>This means that everything related to productivity should be together in one module and everything related to payroll should be together in another. You can start making changes to the productivity module:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In productivity.py</span>
<span class="k">class</span> <span class="nc">ProductivitySystem</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">track</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employees</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Tracking Employee Productivity'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'=============================='</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">employees</span><span class="p">:</span>
            <span class="n">result</span> <span class="o">=</span> <span class="n">employee</span><span class="o">.</span><span class="n">work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{employee.name}</span><span class="s1">: </span><span class="si">{result}</span><span class="s1">'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">ManagerRole</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">f</span><span class="s1">'screams and yells for </span><span class="si">{hours}</span><span class="s1"> hours.'</span>


<span class="k">class</span> <span class="nc">SecretaryRole</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">f</span><span class="s1">'expends </span><span class="si">{hours}</span><span class="s1"> hours doing office paperwork.'</span>


<span class="k">class</span> <span class="nc">SalesRole</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">f</span><span class="s1">'expends </span><span class="si">{hours}</span><span class="s1"> hours on the phone.'</span>


<span class="k">class</span> <span class="nc">FactoryRole</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">f</span><span class="s1">'manufactures gadgets for </span><span class="si">{hours}</span><span class="s1"> hours.'</span>
</pre></div>
<p>The <code>productivity</code> module implements the <code>ProductivitySystem</code> class, as well as the related roles it supports. The classes implement the <code>work()</code> interface required by the system, but they don’t derive from <code>Employee</code>.</p>
<p>You can do the same with the <code>hr</code> module:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">PayrollSystem</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employees</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Calculating Payroll'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'==================='</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">employees</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Payroll for: </span><span class="si">{employee.id}</span><span class="s1"> - </span><span class="si">{employee.name}</span><span class="s1">'</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- Check amount: {employee.calculate_payroll()}'</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">SalaryPolicy</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">weekly_salary</span> <span class="o">=</span> <span class="n">weekly_salary</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">weekly_salary</span>


<span class="k">class</span> <span class="nc">HourlyPolicy</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">=</span> <span class="n">hours_worked</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hour_rate</span> <span class="o">=</span> <span class="n">hour_rate</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">hour_rate</span>


<span class="k">class</span> <span class="nc">CommissionPolicy</span><span class="p">(</span><span class="n">SalaryPolicy</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">,</span> <span class="n">commission</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">weekly_salary</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">commission</span> <span class="o">=</span> <span class="n">commission</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">fixed</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">fixed</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">commission</span>
</pre></div>
<p>The <code>hr</code> module implements the <code>PayrollSystem</code>, which calculates payroll for the employees. It also implements the policy classes for payroll. As you can see, the policy classes don’t derive from <code>Employee</code> anymore.</p>
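<p>Because the policy classes no longer depend on <code>Employee</code>, you can exercise them on their own. The following sketch repeats the three policy classes from <code>hr.py</code> so that it runs by itself. The constructor arguments are assumed values, picked only because they reproduce the check amounts from the program output:</p>

```python
# Policy classes repeated from hr.py so the sketch is self-contained.
class SalaryPolicy:
    def __init__(self, weekly_salary):
        self.weekly_salary = weekly_salary

    def calculate_payroll(self):
        return self.weekly_salary


class HourlyPolicy:
    def __init__(self, hours_worked, hour_rate):
        self.hours_worked = hours_worked
        self.hour_rate = hour_rate

    def calculate_payroll(self):
        return self.hours_worked * self.hour_rate


class CommissionPolicy(SalaryPolicy):
    def __init__(self, weekly_salary, commission):
        super().__init__(weekly_salary)
        self.commission = commission

    def calculate_payroll(self):
        # Reuse the fixed salary from SalaryPolicy, then add the commission.
        fixed = super().calculate_payroll()
        return fixed + self.commission


# Assumed sample values, not taken from program.py.
salary = SalaryPolicy(1500)
hourly = HourlyPolicy(40, 15)
commission = CommissionPolicy(1000, 250)

for policy in (salary, hourly, commission):
    print(type(policy).__name__, policy.calculate_payroll())
```

<p><code>CommissionPolicy</code> is the only policy that still uses inheritance, and its single-parent chain means <code>super()</code> resolves to <code>SalaryPolicy</code> without ambiguity.</p>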
<p>You can now add the necessary classes to the <code>employees</code> module:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="kn">from</span> <span class="nn">hr</span> <span class="k">import</span> <span class="p">(</span>
    <span class="n">SalaryPolicy</span><span class="p">,</span>
    <span class="n">CommissionPolicy</span><span class="p">,</span>
    <span class="n">HourlyPolicy</span>
<span class="p">)</span>
<span class="kn">from</span> <span class="nn">productivity</span> <span class="k">import</span> <span class="p">(</span>
    <span class="n">ManagerRole</span><span class="p">,</span>
    <span class="n">SecretaryRole</span><span class="p">,</span>
    <span class="n">SalesRole</span><span class="p">,</span>
    <span class="n">FactoryRole</span>
<span class="p">)</span>


<span class="k">class</span> <span class="nc">Employee</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>


<span class="k">class</span> <span class="nc">Manager</span><span class="p">(</span><span class="n">Employee</span><span class="p">,</span> <span class="n">ManagerRole</span><span class="p">,</span> <span class="n">SalaryPolicy</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">):</span>
        <span class="n">SalaryPolicy</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">)</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">Secretary</span><span class="p">(</span><span class="n">Employee</span><span class="p">,</span> <span class="n">SecretaryRole</span><span class="p">,</span> <span class="n">SalaryPolicy</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">):</span>
        <span class="n">SalaryPolicy</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">)</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">SalesPerson</span><span class="p">(</span><span class="n">Employee</span><span class="p">,</span> <span class="n">SalesRole</span><span class="p">,</span> <span class="n">CommissionPolicy</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">,</span> <span class="n">commission</span><span class="p">):</span>
        <span class="n">CommissionPolicy</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">,</span> <span class="n">commission</span><span class="p">)</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">FactoryWorker</span><span class="p">(</span><span class="n">Employee</span><span class="p">,</span> <span class="n">FactoryRole</span><span class="p">,</span> <span class="n">HourlyPolicy</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">):</span>
        <span class="n">HourlyPolicy</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">)</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">TemporarySecretary</span><span class="p">(</span><span class="n">Employee</span><span class="p">,</span> <span class="n">SecretaryRole</span><span class="p">,</span> <span class="n">HourlyPolicy</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">):</span>
        <span class="n">HourlyPolicy</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours_worked</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">)</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
</pre></div>
<p>The <code>employees</code> module imports policies and roles from the other modules and implements the different <code>Employee</code> types. You are still using multiple inheritance to inherit the implementation of the salary policy classes and the productivity roles, but the implementation of each class only needs to deal with initialization.</p>
<p>Notice that you still need to explicitly initialize the salary policies in the constructors. You probably saw that the initializations of <code>Manager</code> and <code>Secretary</code> are identical. Also, the initializations of <code>FactoryWorker</code> and <code>TemporarySecretary</code> are the same.</p>
<p>You will not want to have this kind of code duplication in more complex designs, so you have to be careful when designing class hierarchies.</p>
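<p>To check that the new hierarchy no longer contains a diamond, you can inspect the MRO of one of the new classes. The sketch below is a reduced, self-contained copy of the pieces that <code>TemporarySecretary</code> needs. It is meant as an illustration rather than a replacement for the modules above:</p>

```python
# Reduced copies of the classes from productivity.py, hr.py, and employees.py.
class SecretaryRole:
    def work(self, hours):
        return f'expends {hours} hours doing office paperwork.'


class HourlyPolicy:
    def __init__(self, hours_worked, hour_rate):
        self.hours_worked = hours_worked
        self.hour_rate = hour_rate

    def calculate_payroll(self):
        return self.hours_worked * self.hour_rate


class Employee:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class TemporarySecretary(Employee, SecretaryRole, HourlyPolicy):
    def __init__(self, id, name, hours_worked, hour_rate):
        # The policy is initialized explicitly; super() reaches Employee.
        HourlyPolicy.__init__(self, hours_worked, hour_rate)
        super().__init__(id, name)


temp = TemporarySecretary(5, 'Robin Williams', 40, 9)
new_mro = [cls.__name__ for cls in TemporarySecretary.__mro__]

print(new_mro)
print(f'{temp.name}: {temp.work(40)}')
print(f'- Check amount: {temp.calculate_payroll()}')
```

<p>The three base classes share only <code>object</code> as a common ancestor, so <code>Employee</code> is reachable through a single path, and each method (<code>work()</code>, <code>calculate_payroll()</code>) is defined in exactly one place in the MRO.</p>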
<p>Here’s the UML diagram for the new design:</p>
<p><a href="https://files.realpython.com/media/ic-inheritance-policies.0a0de2d42a25.jpg" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/ic-inheritance-policies.0a0de2d42a25.jpg" width="1221" height="1030" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-inheritance-policies.0a0de2d42a25.jpg&w=305&sig=b0ddba71545417031c5f11c94ea9b3bd5870e6fb 305w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-inheritance-policies.0a0de2d42a25.jpg&w=610&sig=eb49f48a49342bc70a46148ca0a780d517d69f39 610w, https://files.realpython.com/media/ic-inheritance-policies.0a0de2d42a25.jpg 1221w" sizes="75vw" alt="Policy based design using multiple inheritance"/></a></p>
<p>The diagram shows the relationships to define the <code>Secretary</code> and <code>TemporarySecretary</code> using multiple inheritance, but avoiding the diamond problem.</p>
<p>You can run the program and see how it works:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Tracking Employee Productivity</span>
<span class="go">==============================</span>
<span class="go">Mary Poppins: screams and yells for 40 hours.</span>
<span class="go">John Smith: expends 40 hours doing office paperwork.</span>
<span class="go">Kevin Bacon: expends 40 hours on the phone.</span>
<span class="go">Jane Doe: manufactures gadgets for 40 hours.</span>
<span class="go">Robin Williams: expends 40 hours doing office paperwork.</span>
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - Mary Poppins</span>
<span class="go">- Check amount: 3000</span>
<span class="go">Payroll for: 2 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1250</span>
<span class="go">Payroll for: 4 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">Payroll for: 5 - Robin Williams</span>
<span class="go">- Check amount: 360</span>
</pre></div>
<p>You’ve seen how inheritance and multiple inheritance work in Python. You can now explore the topic of composition.</p>
<h2 id="composition-in-python">Composition in Python</h2>
<p><strong>Composition</strong> is an object-oriented design concept that models a <strong>has a</strong> relationship. In composition, a class known as the <strong>composite</strong> contains an object of another class known as the <strong>component</strong>. In other words, a composite class <strong>has a</strong> component of another class.</p>
<p>Composition allows composite classes to reuse the implementation of the components they contain. The composite class doesn’t inherit the component class interface, but it can leverage its implementation.</p>
<p>The composition relation between two classes is considered loosely coupled. That means that changes to the component class rarely affect the composite class, and changes to the composite class never affect the component class.</p>
<p>This provides better adaptability to change and allows applications to introduce new requirements without affecting existing code.</p>
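<p>As a rough illustration of this loose coupling, consider the following sketch. <code>Engine</code> and <code>Car</code> are hypothetical classes that don’t belong to the employee example; they only show a composite delegating to a component without inheriting from it:</p>

```python
# Hypothetical classes, not part of the employee example.
class Engine:
    """Component: knows how to start, and nothing about cars."""

    def start(self):
        return 'engine started'


class Car:
    """Composite: has an Engine instead of deriving from it."""

    def __init__(self, engine):
        self.engine = engine

    def drive(self):
        # Reuse the component's implementation through delegation.
        return f'{self.engine.start()}, car is moving'


car = Car(Engine())
print(car.drive())

# The composite does not inherit the component's interface.
print(isinstance(car, Engine))
print(hasattr(car, 'start'))
```

<p>Because <code>Car</code> only relies on <code>engine.start()</code>, any object that provides that method could be passed in, and <code>Engine</code> never needs to know that <code>Car</code> exists.</p>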
<p>When looking at two competing software designs, one based on inheritance and another based on composition, the composition solution is usually the more flexible one. You can now look at how composition works.</p>
<p>You’ve already used composition in the examples. If you look at the <code>Employee</code> class, you’ll see that it contains two attributes:</p>
<ol>
<li><strong><code>id</code></strong> to identify an employee.</li>
<li><strong><code>name</code></strong> to contain the name of the employee.</li>
</ol>
<p>These two attributes are objects that the <code>Employee</code> class has. Therefore, you can say that an <code>Employee</code> <strong>has an</strong> <code>id</code> and <strong>has a</strong> <code>name</code>.</p>
<p>Another attribute for an <code>Employee</code> might be an <code>Address</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In contacts.py</span>
<span class="k">class</span> <span class="nc">Address</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">street</span><span class="p">,</span> <span class="n">city</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">zipcode</span><span class="p">,</span> <span class="n">street2</span><span class="o">=</span><span class="s1">''</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">street</span> <span class="o">=</span> <span class="n">street</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">street2</span> <span class="o">=</span> <span class="n">street2</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">city</span> <span class="o">=</span> <span class="n">city</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">state</span> <span class="o">=</span> <span class="n">state</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">zipcode</span> <span class="o">=</span> <span class="n">zipcode</span>

    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">lines</span> <span class="o">=</span> <span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">street</span><span class="p">]</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">street2</span><span class="p">:</span>
            <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">street2</span><span class="p">)</span>
        <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{self.city}</span><span class="s1">, </span><span class="si">{self.state}</span><span class="s1"> </span><span class="si">{self.zipcode}</span><span class="s1">'</span><span class="p">)</span>
        <span class="k">return</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">lines</span><span class="p">)</span>
</pre></div>
<p>You implemented a basic address class that contains the usual components for an address. You made the <code>street2</code> attribute optional because not all addresses will have that component.</p>
<p>You implemented <code>__str__()</code> to provide a pretty representation of an <code>Address</code>. You can see this implementation in the interactive interpreter:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">contacts</span> <span class="k">import</span> <span class="n">Address</span>
<span class="gp">>>> </span><span class="n">address</span> <span class="o">=</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'55 Main St.'</span><span class="p">,</span> <span class="s1">'Concord'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03301'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">address</span><span class="p">)</span>
<span class="go">55 Main St.</span>
<span class="go">Concord, NH 03301</span>
</pre></div>
<p>When you <code>print()</code> the <code>address</code> variable, the special method <code>__str__()</code> is invoked. Since you overrode the method to return a string formatted as an address, you get a nice, readable representation. <a href="https://realpython.com/operator-function-overloading/">Operator and Function Overloading in Custom Python Classes</a> gives a good overview of the special methods available in classes that can be implemented to customize the behavior of your objects.</p>
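<p>As a minimal sketch of the same idea, the snippet below repeats the <code>Address</code> class so it can run on its own and shows that <code>str()</code> and f-strings go through <code>__str__()</code> just like <code>print()</code> does. The <code>'Apt. 2'</code> value and the <code>as_text</code> and <code>label</code> names are made up for illustration:</p>

```python
class Address:
    def __init__(self, street, city, state, zipcode, street2=''):
        self.street = street
        self.street2 = street2
        self.city = city
        self.state = state
        self.zipcode = zipcode

    def __str__(self):
        # Build the address line by line, skipping street2 when it's empty
        lines = [self.street]
        if self.street2:
            lines.append(self.street2)
        lines.append(f'{self.city}, {self.state} {self.zipcode}')
        return '\n'.join(lines)


# 'Apt. 2' is an illustrative value, not taken from the tutorial
address = Address('55 Main St.', 'Concord', 'NH', '03301', 'Apt. 2')

as_text = str(address)            # str() calls Address.__str__()
label = f'Ship to:\n{address}'    # f-strings use __str__() as well
print(label)
```

<p>Because the optional <code>street2</code> is provided here, the formatted address has three lines instead of two.</p>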
<p>You can now add the <code>Address</code> to the <code>Employee</code> class through composition:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="k">class</span> <span class="nc">Employee</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">address</span> <span class="o">=</span> <span class="kc">None</span>
</pre></div>
<p>You initialize the <code>address</code> attribute to <code>None</code> for now to make it optional, but you can now assign an <code>Address</code> to an <code>Employee</code>. Also notice that there is no reference in the <code>employees</code> module to the <code>contacts</code> module.</p>
<p>Composition is a loosely coupled relationship that often doesn’t require the composite class to have knowledge of the component.</p>
<p>The UML diagram representing the relationship between <code>Employee</code> and <code>Address</code> looks like this:</p>
<p><a href="https://files.realpython.com/media/ic-employee-address.240e806b1101.jpg" target="_blank"><img class="img-fluid mx-auto d-block w-33" src="https://files.realpython.com/media/ic-employee-address.240e806b1101.jpg" width="284" height="469" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-employee-address.240e806b1101.jpg&w=71&sig=871e9fce940fb84544980c474624a07c5d52b8e8 71w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-employee-address.240e806b1101.jpg&w=142&sig=658578725c830eb5554ba1e349b27a198c2aa7e9 142w, https://files.realpython.com/media/ic-employee-address.240e806b1101.jpg 284w" sizes="75vw" alt="Composition example with Employee containing Address"/></a></p>
<p>The diagram shows the basic composition relationship between <code>Employee</code> and <code>Address</code>.</p>
<p>You can now modify the <code>PayrollSystem</code> class to leverage the <code>address</code> attribute in <code>Employee</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">PayrollSystem</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employees</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Calculating Payroll'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'==================='</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">employees</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Payroll for: </span><span class="si">{employee.id}</span><span class="s1"> - </span><span class="si">{employee.name}</span><span class="s1">'</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- Check amount: {employee.calculate_payroll()}'</span><span class="p">)</span>
<span class="hll">            <span class="k">if</span> <span class="n">employee</span><span class="o">.</span><span class="n">address</span><span class="p">:</span>
</span><span class="hll">                <span class="nb">print</span><span class="p">(</span><span class="s1">'- Sent to:'</span><span class="p">)</span>
</span><span class="hll">                <span class="nb">print</span><span class="p">(</span><span class="n">employee</span><span class="o">.</span><span class="n">address</span><span class="p">)</span>
</span>            <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
</pre></div>
<p>You check to see if the <code>employee</code> object has an address, and if it does, you print it. You can now modify the program to assign some addresses to the employees:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In program.py</span>
<span class="kn">import</span> <span class="nn">hr</span>
<span class="kn">import</span> <span class="nn">employees</span>
<span class="kn">import</span> <span class="nn">productivity</span>
<span class="kn">import</span> <span class="nn">contacts</span>
<span class="n">manager</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">Manager</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Mary Poppins'</span><span class="p">,</span> <span class="mi">3000</span><span class="p">)</span>
<span class="n">manager</span><span class="o">.</span><span class="n">address</span> <span class="o">=</span> <span class="n">contacts</span><span class="o">.</span><span class="n">Address</span><span class="p">(</span>
    <span class="s1">'121 Admin Rd'</span><span class="p">,</span>
    <span class="s1">'Concord'</span><span class="p">,</span>
    <span class="s1">'NH'</span><span class="p">,</span>
    <span class="s1">'03301'</span>
<span class="p">)</span>
<span class="n">secretary</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">Secretary</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'John Smith'</span><span class="p">,</span> <span class="mi">1500</span><span class="p">)</span>
<span class="n">secretary</span><span class="o">.</span><span class="n">address</span> <span class="o">=</span> <span class="n">contacts</span><span class="o">.</span><span class="n">Address</span><span class="p">(</span>
    <span class="s1">'67 Paperwork Ave.'</span><span class="p">,</span>
    <span class="s1">'Manchester'</span><span class="p">,</span>
    <span class="s1">'NH'</span><span class="p">,</span>
    <span class="s1">'03101'</span>
<span class="p">)</span>
<span class="n">sales_guy</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">SalesPerson</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Kevin Bacon'</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="mi">250</span><span class="p">)</span>
<span class="n">factory_worker</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">FactoryWorker</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'Jane Doe'</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">15</span><span class="p">)</span>
<span class="n">temporary_secretary</span> <span class="o">=</span> <span class="n">employees</span><span class="o">.</span><span class="n">TemporarySecretary</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s1">'Robin Williams'</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">9</span><span class="p">)</span>
<span class="n">employees</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">manager</span><span class="p">,</span>
    <span class="n">secretary</span><span class="p">,</span>
    <span class="n">sales_guy</span><span class="p">,</span>
    <span class="n">factory_worker</span><span class="p">,</span>
    <span class="n">temporary_secretary</span><span class="p">,</span>
<span class="p">]</span>
<span class="n">productivity_system</span> <span class="o">=</span> <span class="n">productivity</span><span class="o">.</span><span class="n">ProductivitySystem</span><span class="p">()</span>
<span class="n">productivity_system</span><span class="o">.</span><span class="n">track</span><span class="p">(</span><span class="n">employees</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span>
<span class="n">payroll_system</span> <span class="o">=</span> <span class="n">hr</span><span class="o">.</span><span class="n">PayrollSystem</span><span class="p">()</span>
<span class="n">payroll_system</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">(</span><span class="n">employees</span><span class="p">)</span>
</pre></div>
<p>You added a couple of addresses to the <code>manager</code> and <code>secretary</code> objects. When you run the program, you will see the addresses printed:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Tracking Employee Productivity</span>
<span class="go">==============================</span>
<span class="go">Mary Poppins: screams and yells for 40 hours.</span>
<span class="go">John Smith: expends 40 hours doing office paperwork.</span>
<span class="go">Kevin Bacon: expends 40 hours on the phone.</span>
<span class="go">Jane Doe: manufactures gadgets for 40 hours.</span>
<span class="go">Robin Williams: expends 40 hours doing office paperwork.</span>
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - Mary Poppins</span>
<span class="go">- Check amount: 3000</span>
<span class="go">- Sent to:</span>
<span class="go">121 Admin Rd</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 2 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">- Sent to:</span>
<span class="go">67 Paperwork Ave.</span>
<span class="go">Manchester, NH 03101</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1250</span>
<span class="go">Payroll for: 4 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">Payroll for: 5 - Robin Williams</span>
<span class="go">- Check amount: 360</span>
</pre></div>
<p>Notice how the payroll output for the <code>manager</code> and <code>secretary</code> objects shows the addresses where the checks were sent.</p>
<p>The <code>Employee</code> class leverages the implementation of the <code>Address</code> class without any knowledge of what an <code>Address</code> object is or how it’s represented. This type of design is so flexible that you can change the <code>Address</code> class without any impact on the <code>Employee</code> class.</p>
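<p>To make that flexibility concrete, here is a small sketch in which a different component takes the place of <code>Address</code>. The <code>PoBox</code> class is hypothetical and not part of the tutorial’s <code>contacts</code> module. It only assumes that whatever is stored in <code>.address</code> can be converted to a string:</p>

```python
class Employee:
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.address = None


class PoBox:
    """Hypothetical component used only for this illustration."""

    def __init__(self, number, city, state, zipcode):
        self.number = number
        self.city = city
        self.state = state
        self.zipcode = zipcode

    def __str__(self):
        return f'PO Box {self.number}\n{self.city}, {self.state} {self.zipcode}'


employee = Employee(1, 'Mary Poppins')
# Employee doesn't change at all: it just holds whatever component it's given
employee.address = PoBox(42, 'Concord', 'NH', '03301')

if employee.address:
    print('- Sent to:')
    print(employee.address)
```

<p>The <code>Employee</code> class is identical to the one above, which suggests that swapping or changing the component requires no changes in the composite.</p>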
<h3 id="flexible-designs-with-composition">Flexible Designs With Composition</h3>
<p>Composition is more flexible than inheritance because it models a loosely coupled relationship. Changes to a component class have minimal or no effects on the composite class. Designs based on composition adapt more easily to change.</p>
<p>You change behavior by providing new components that implement those behaviors instead of adding new classes to your hierarchy.</p>
<p>Take a look at the multiple inheritance example above. Imagine how new payroll policies will affect the design. Try to picture what the class hierarchy will look like if new roles are needed. As you saw before, relying too heavily on inheritance can lead to class explosion.</p>
<p>The biggest problem is not so much the number of classes in your design, but how tightly coupled the relationships between those classes are. Tightly coupled classes affect each other when changes are introduced.</p>
<p>In this section, you are going to use composition to implement a better design that still fits the requirements of the <code>PayrollSystem</code> and the <code>ProductivitySystem</code>.</p>
<p>You can start by implementing the functionality of the <code>ProductivitySystem</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In productivity.py</span>
<span class="k">class</span> <span class="nc">ProductivitySystem</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_roles</span> <span class="o">=</span> <span class="p">{</span>
            <span class="s1">'manager'</span><span class="p">:</span> <span class="n">ManagerRole</span><span class="p">,</span>
            <span class="s1">'secretary'</span><span class="p">:</span> <span class="n">SecretaryRole</span><span class="p">,</span>
            <span class="s1">'sales'</span><span class="p">:</span> <span class="n">SalesRole</span><span class="p">,</span>
            <span class="s1">'factory'</span><span class="p">:</span> <span class="n">FactoryRole</span><span class="p">,</span>
        <span class="p">}</span>

    <span class="k">def</span> <span class="nf">get_role</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">role_id</span><span class="p">):</span>
        <span class="n">role_type</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_roles</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">role_id</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">role_type</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="n">role_id</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">role_type</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">track</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employees</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Tracking Employee Productivity'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'=============================='</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">employees</span><span class="p">:</span>
            <span class="n">employee</span><span class="o">.</span><span class="n">work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
</pre></div>
<p>The <code>ProductivitySystem</code> class defines some roles using a string identifier mapped to a role class that implements the role. It exposes a <code>.get_role()</code> method that, given a role identifier, returns a new instance of the corresponding role class. If the role is not found, then a <code>ValueError</code> exception is raised.</p>
<p>It also exposes the previous functionality in the <code>.track()</code> method, where given a list of employees it tracks the productivity of those employees.</p>
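<p>The lookup behind <code>.get_role()</code> can be tried in isolation. The following is only a sketch: <code>RoleRegistry</code> is a hypothetical stand-in for <code>ProductivitySystem</code>, the <code>ManagerRole</code> here is an empty placeholder, and passing the unknown identifier itself to <code>ValueError</code> is a choice made for this example:</p>

```python
class ManagerRole:
    """Empty placeholder standing in for a real role class."""


class RoleRegistry:
    """Hypothetical stand-in that keeps only the role lookup."""

    def __init__(self):
        # Map string identifiers to role classes, not to instances
        self._roles = {'manager': ManagerRole}

    def get_role(self, role_id):
        role_type = self._roles.get(role_id)
        if not role_type:
            raise ValueError(role_id)
        # Calling the class creates a fresh role object on every lookup
        return role_type()


registry = RoleRegistry()
first = registry.get_role('manager')
second = registry.get_role('manager')

try:
    registry.get_role('intern')  # 'intern' is not registered
except ValueError as error:
    message = f'Unknown role: {error}'
    print(message)
```

<p>Since the dictionary stores classes, each call hands back a separate object, so two employees with the same role don’t share role state.</p>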
<p>You can now implement the different role classes:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In productivity.py</span>
<span class="k">class</span> <span class="nc">ManagerRole</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">perform_duties</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">f</span><span class="s1">'screams and yells for </span><span class="si">{hours}</span><span class="s1"> hours.'</span>

<span class="k">class</span> <span class="nc">SecretaryRole</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">perform_duties</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">f</span><span class="s1">'does paperwork for </span><span class="si">{hours}</span><span class="s1"> hours.'</span>

<span class="k">class</span> <span class="nc">SalesRole</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">perform_duties</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">f</span><span class="s1">'expends </span><span class="si">{hours}</span><span class="s1"> hours on the phone.'</span>

<span class="k">class</span> <span class="nc">FactoryRole</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">perform_duties</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">f</span><span class="s1">'manufactures gadgets for </span><span class="si">{hours}</span><span class="s1"> hours.'</span>
</pre></div>
<p>Each of the roles you implemented exposes a <code>.perform_duties()</code> method that takes the number of <code>hours</code> worked. The methods return a string representing the duties.</p>
<p>The role classes are independent of each other, but they expose the same interface, so they are interchangeable. You’ll see later how they are used in the application.</p>
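<p>A short sketch can illustrate what interchangeable means here. Two of the role classes are repeated so the snippet runs on its own, and <code>describe()</code> is a hypothetical helper that is not part of the application:</p>

```python
class ManagerRole:
    def perform_duties(self, hours):
        return f'screams and yells for {hours} hours.'


class SalesRole:
    def perform_duties(self, hours):
        return f'expends {hours} hours on the phone.'


def describe(name, role, hours):
    # Hypothetical helper: works with any object that has .perform_duties()
    return f'{name}: {role.perform_duties(hours)}'


lines = [
    describe('Mary Poppins', ManagerRole(), 40),
    describe('Kevin Bacon', SalesRole(), 40),
]
for line in lines:
    print(line)
```

<p>The helper never checks which role class it receives. It relies only on the shared <code>.perform_duties()</code> method, so a new role can be passed in without changing it.</p>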
<p>Now, you can implement the <code>PayrollSystem</code> for the application:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">PayrollSystem</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_employee_policies</span> <span class="o">=</span> <span class="p">{</span>
            <span class="mi">1</span><span class="p">:</span> <span class="n">SalaryPolicy</span><span class="p">(</span><span class="mi">3000</span><span class="p">),</span>
            <span class="mi">2</span><span class="p">:</span> <span class="n">SalaryPolicy</span><span class="p">(</span><span class="mi">1500</span><span class="p">),</span>
            <span class="mi">3</span><span class="p">:</span> <span class="n">CommissionPolicy</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="mi">100</span><span class="p">),</span>
            <span class="mi">4</span><span class="p">:</span> <span class="n">HourlyPolicy</span><span class="p">(</span><span class="mi">15</span><span class="p">),</span>
            <span class="mi">5</span><span class="p">:</span> <span class="n">HourlyPolicy</span><span class="p">(</span><span class="mi">9</span><span class="p">)</span>
        <span class="p">}</span>

    <span class="k">def</span> <span class="nf">get_policy</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employee_id</span><span class="p">):</span>
        <span class="n">policy</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_employee_policies</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">policy</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">policy</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employees</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Calculating Payroll'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'==================='</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">employees</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Payroll for: </span><span class="si">{employee.id}</span><span class="s1"> - </span><span class="si">{employee.name}</span><span class="s1">'</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- Check amount: {employee.calculate_payroll()}'</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">employee</span><span class="o">.</span><span class="n">address</span><span class="p">:</span>
                <span class="nb">print</span><span class="p">(</span><span class="s1">'- Sent to:'</span><span class="p">)</span>
                <span class="nb">print</span><span class="p">(</span><span class="n">employee</span><span class="o">.</span><span class="n">address</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
</pre></div>
<p>The <code>PayrollSystem</code> keeps an internal database of payroll policies for each employee. It exposes a <code>.get_policy()</code> method that, given an employee <code>id</code>, returns its payroll policy. If a specified <code>id</code> doesn’t exist in the system, then the method raises a <code>ValueError</code> exception.</p>
<p>The implementation of <code>.calculate_payroll()</code> works the same as before. It takes a list of employees, calculates the payroll, and prints the results.</p>
<p>You can now implement the payroll policy classes:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">PayrollPolicy</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">=</span> <span class="mi">0</span>

    <span class="k">def</span> <span class="nf">track_work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">+=</span> <span class="n">hours</span>

<span class="k">class</span> <span class="nc">SalaryPolicy</span><span class="p">(</span><span class="n">PayrollPolicy</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">weekly_salary</span> <span class="o">=</span> <span class="n">weekly_salary</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">weekly_salary</span>

<span class="k">class</span> <span class="nc">HourlyPolicy</span><span class="p">(</span><span class="n">PayrollPolicy</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hour_rate</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hour_rate</span> <span class="o">=</span> <span class="n">hour_rate</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">hour_rate</span>

<span class="k">class</span> <span class="nc">CommissionPolicy</span><span class="p">(</span><span class="n">SalaryPolicy</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">weekly_salary</span><span class="p">,</span> <span class="n">commission_per_sale</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">weekly_salary</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">commission_per_sale</span> <span class="o">=</span> <span class="n">commission_per_sale</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">commission</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">sales</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">hours_worked</span> <span class="o">/</span> <span class="mi">5</span>
        <span class="k">return</span> <span class="n">sales</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">commission_per_sale</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">fixed</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">fixed</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">commission</span>
</pre></div>
<p>You first implement a <code>PayrollPolicy</code> class that serves as a base class for all the payroll policies. This class tracks the <code>hours_worked</code>, which is common to all payroll policies.</p>
<p>The other policy classes derive from <code>PayrollPolicy</code>. You use inheritance here because you want to leverage the implementation of <code>PayrollPolicy</code>. Also, each of <code>SalaryPolicy</code>, <code>HourlyPolicy</code>, and <code>CommissionPolicy</code> <strong>is a</strong> <code>PayrollPolicy</code>.</p>
<p><code>SalaryPolicy</code> is initialized with a <code>weekly_salary</code> value that is then used in <code>.calculate_payroll()</code>. <code>HourlyPolicy</code> is initialized with the <code>hour_rate</code>, and implements <code>.calculate_payroll()</code> by leveraging the base class <code>hours_worked</code>.</p>
<p>The <code>CommissionPolicy</code> class derives from <code>SalaryPolicy</code> because it wants to inherit its implementation. It is initialized with the <code>weekly_salary</code> parameter, but it also requires a <code>commission_per_sale</code> parameter.</p>
<p>The <code>commission_per_sale</code> is used to calculate the <code>.commission</code>, which is implemented as a property so it gets calculated when requested. The example assumes that a sale happens every 5 hours worked, and the <code>.commission</code> is the number of sales times the <code>commission_per_sale</code> value.</p>
<p><code>CommissionPolicy</code> implements the <code>.calculate_payroll()</code> method by first leveraging the implementation in <code>SalaryPolicy</code> and then adding the calculated commission.</p>
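<p>As a worked example of that calculation, the sketch below repeats the relevant policy classes so it runs on its own and feeds in 40 hours for a policy created as <code>CommissionPolicy(1000, 100)</code>. The 40-hour figure matches the value passed to <code>.track()</code> earlier, and the variable names at the bottom are only for illustration:</p>

```python
class PayrollPolicy:
    def __init__(self):
        self.hours_worked = 0

    def track_work(self, hours):
        self.hours_worked += hours


class SalaryPolicy(PayrollPolicy):
    def __init__(self, weekly_salary):
        super().__init__()
        self.weekly_salary = weekly_salary

    def calculate_payroll(self):
        return self.weekly_salary


class CommissionPolicy(SalaryPolicy):
    def __init__(self, weekly_salary, commission_per_sale):
        super().__init__(weekly_salary)
        self.commission_per_sale = commission_per_sale

    @property
    def commission(self):
        sales = self.hours_worked / 5  # one sale assumed every 5 hours
        return sales * self.commission_per_sale

    def calculate_payroll(self):
        fixed = super().calculate_payroll()
        return fixed + self.commission


policy = CommissionPolicy(1000, 100)
policy.track_work(40)

sales = policy.hours_worked / 5       # 40 / 5 = 8.0 sales
commission = policy.commission        # 8.0 * 100 = 800.0
total = policy.calculate_payroll()    # 1000 + 800.0 = 1800.0
print(sales, commission, total)
```

<p>Before any hours are tracked, <code>hours_worked</code> is <code>0</code>, so the commission is zero and the policy pays only the fixed weekly salary.</p>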
<p>You can now add an <code>AddressBook</code> class to manage employee addresses:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In contacts.py</span>
<span class="k">class</span> <span class="nc">AddressBook</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_employee_addresses</span> <span class="o">=</span> <span class="p">{</span>
            <span class="mi">1</span><span class="p">:</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'121 Admin Rd.'</span><span class="p">,</span> <span class="s1">'Concord'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03301'</span><span class="p">),</span>
            <span class="mi">2</span><span class="p">:</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'67 Paperwork Ave'</span><span class="p">,</span> <span class="s1">'Manchester'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03101'</span><span class="p">),</span>
            <span class="mi">3</span><span class="p">:</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'15 Rose St'</span><span class="p">,</span> <span class="s1">'Concord'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03301'</span><span class="p">,</span> <span class="s1">'Apt. B-1'</span><span class="p">),</span>
            <span class="mi">4</span><span class="p">:</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'39 Sole St.'</span><span class="p">,</span> <span class="s1">'Concord'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03301'</span><span class="p">),</span>
            <span class="mi">5</span><span class="p">:</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'99 Mountain Rd.'</span><span class="p">,</span> <span class="s1">'Concord'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03301'</span><span class="p">),</span>
        <span class="p">}</span>

    <span class="k">def</span> <span class="nf">get_employee_address</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employee_id</span><span class="p">):</span>
        <span class="n">address</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_employee_addresses</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">address</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">address</span>
</pre></div>
<p>The <code>AddressBook</code> class keeps an internal database of <code>Address</code> objects for each employee. It exposes a <code>get_employee_address()</code> method that returns the address of the specified employee <code>id</code>. If the employee <code>id</code> doesn’t exist, then it raises a <code>ValueError</code>.</p>
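<p>The lookup-or-raise behavior can be tried in isolation. The sketch below uses a hypothetical <code>AddressBookSketch</code> class that stores plain strings instead of <code>Address</code> objects, so it runs on its own; the lookup logic mirrors <code>get_employee_address()</code>, and the ID <code>99</code> is an arbitrary value that is assumed not to exist:</p>

```python
class AddressBookSketch:
    """Hypothetical stand-in: stores plain strings, not Address objects."""

    def __init__(self):
        self._employee_addresses = {
            1: '121 Admin Rd., Concord, NH 03301',
        }

    def get_employee_address(self, employee_id):
        # dict.get() returns None for an unknown key instead of raising.
        address = self._employee_addresses.get(employee_id)
        if not address:
            raise ValueError(employee_id)
        return address


book = AddressBookSketch()
found = book.get_employee_address(1)

try:
    book.get_employee_address(99)
except ValueError as error:
    # The exception carries the unknown ID as its only argument.
    missing_id = error.args[0]

print(found)
print(f'No address on file for employee {missing_id}')
```

<p>Callers that expect unknown IDs can catch the <code>ValueError</code> as shown; callers that don’t can let it propagate and stop the program early.</p>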
<p>The <code>Address</code> class implementation remains the same as before:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In contacts.py</span>
<span class="k">class</span> <span class="nc">Address</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">street</span><span class="p">,</span> <span class="n">city</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">zipcode</span><span class="p">,</span> <span class="n">street2</span><span class="o">=</span><span class="s1">''</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">street</span> <span class="o">=</span> <span class="n">street</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">street2</span> <span class="o">=</span> <span class="n">street2</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">city</span> <span class="o">=</span> <span class="n">city</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">state</span> <span class="o">=</span> <span class="n">state</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">zipcode</span> <span class="o">=</span> <span class="n">zipcode</span>

    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">lines</span> <span class="o">=</span> <span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">street</span><span class="p">]</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">street2</span><span class="p">:</span>
            <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">street2</span><span class="p">)</span>
        <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{self.city}</span><span class="s1">, </span><span class="si">{self.state}</span><span class="s1"> </span><span class="si">{self.zipcode}</span><span class="s1">'</span><span class="p">)</span>
        <span class="k">return</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">lines</span><span class="p">)</span>
</pre></div>
<p>The class manages the address components and provides a pretty representation of an address.</p>
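<p>As a quick check of that representation, the following sketch repeats the <code>Address</code> class so that it runs on its own and formats two of the addresses from the address book, one with a second street line and one without:</p>

```python
class Address:
    # Same class as above, repeated so this snippet is self-contained.
    def __init__(self, street, city, state, zipcode, street2=''):
        self.street = street
        self.street2 = street2
        self.city = city
        self.state = state
        self.zipcode = zipcode

    def __str__(self):
        lines = [self.street]
        if self.street2:
            # The optional second line is only emitted when it is set.
            lines.append(self.street2)
        lines.append(f'{self.city}, {self.state} {self.zipcode}')
        return '\n'.join(lines)


with_unit = str(Address('15 Rose St', 'Concord', 'NH', '03301', 'Apt. B-1'))
without_unit = str(Address('121 Admin Rd.', 'Concord', 'NH', '03301'))

print(with_unit)     # three lines: street, unit, city/state/zip
print(without_unit)  # two lines: street, city/state/zip
```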
<p>So far, the new classes have been extended to support more functionality, but there are no significant changes to the previous design. This is going to change with the design of the <code>employees</code> module and its classes.</p>
<p>You can start by implementing an <code>EmployeeDatabase</code> class:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="kn">from</span> <span class="nn">productivity</span> <span class="k">import</span> <span class="n">ProductivitySystem</span>
<span class="kn">from</span> <span class="nn">hr</span> <span class="k">import</span> <span class="n">PayrollSystem</span>
<span class="kn">from</span> <span class="nn">contacts</span> <span class="k">import</span> <span class="n">AddressBook</span>

<span class="k">class</span> <span class="nc">EmployeeDatabase</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_employees</span> <span class="o">=</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s1">'id'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
                <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Mary Poppins'</span><span class="p">,</span>
                <span class="s1">'role'</span><span class="p">:</span> <span class="s1">'manager'</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s1">'id'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
                <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'John Smith'</span><span class="p">,</span>
                <span class="s1">'role'</span><span class="p">:</span> <span class="s1">'secretary'</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s1">'id'</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
                <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Kevin Bacon'</span><span class="p">,</span>
                <span class="s1">'role'</span><span class="p">:</span> <span class="s1">'sales'</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s1">'id'</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span>
                <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Jane Doe'</span><span class="p">,</span>
                <span class="s1">'role'</span><span class="p">:</span> <span class="s1">'factory'</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s1">'id'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
                <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Robin Williams'</span><span class="p">,</span>
                <span class="s1">'role'</span><span class="p">:</span> <span class="s1">'secretary'</span>
            <span class="p">},</span>
        <span class="p">]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">productivity</span> <span class="o">=</span> <span class="n">ProductivitySystem</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">payroll</span> <span class="o">=</span> <span class="n">PayrollSystem</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">employee_addresses</span> <span class="o">=</span> <span class="n">AddressBook</span><span class="p">()</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">employees</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">_create_employee</span><span class="p">(</span><span class="o">**</span><span class="n">data</span><span class="p">)</span> <span class="k">for</span> <span class="n">data</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_employees</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">_create_employee</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">role</span><span class="p">):</span>
        <span class="n">address</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">employee_addresses</span><span class="o">.</span><span class="n">get_employee_address</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
        <span class="n">employee_role</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">productivity</span><span class="o">.</span><span class="n">get_role</span><span class="p">(</span><span class="n">role</span><span class="p">)</span>
        <span class="n">payroll_policy</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">payroll</span><span class="o">.</span><span class="n">get_policy</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">Employee</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">address</span><span class="p">,</span> <span class="n">employee_role</span><span class="p">,</span> <span class="n">payroll_policy</span><span class="p">)</span>
</pre></div>
<p>The <code>EmployeeDatabase</code> keeps track of all the employees in the company. For each employee, it tracks the <code>id</code>, <code>name</code>, and <code>role</code>. It <strong>has an</strong> instance of the <code>ProductivitySystem</code>, the <code>PayrollSystem</code>, and the <code>AddressBook</code>. These instances are used to create employees.</p>
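<p>Each record in <code>_employees</code> is a plain dictionary whose keys match the parameter names of <code>._create_employee()</code>, which is what lets the class pass a record along with <code>**data</code>. Here is a minimal sketch of that unpacking. The module-level <code>create_employee()</code> function and the <code>bad_record</code> dictionary are hypothetical stand-ins used only for illustration:</p>

```python
def create_employee(id, name, role):
    # Hypothetical stand-in for EmployeeDatabase._create_employee().
    return f'{id}: {name} ({role})'


data = {'id': 1, 'name': 'Mary Poppins', 'role': 'manager'}

# ** turns each key/value pair into a keyword argument,
# so these two calls are equivalent.
unpacked = create_employee(**data)
explicit = create_employee(id=1, name='Mary Poppins', role='manager')

# A key that matches no parameter makes the call fail with TypeError.
bad_record = {'id': 2, 'name': 'John Smith', 'title': 'secretary'}
try:
    create_employee(**bad_record)
except TypeError:
    mismatch_detected = True
else:
    mismatch_detected = False

print(unpacked)
print(mismatch_detected)
```

<p>The practical consequence is that the dictionary keys and the parameter names have to stay in sync: renaming one without the other fails at call time rather than silently dropping data.</p>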
<p>The database exposes an <code>.employees</code> property that returns the list of employees. The <code>Employee</code> objects are created in an internal method <code>._create_employee()</code>. Notice that you don’t have different types of <code>Employee</code> classes. You just need to implement a single <code>Employee</code> class:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="k">class</span> <span class="nc">Employee</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">address</span><span class="p">,</span> <span class="n">role</span><span class="p">,</span> <span class="n">payroll</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">address</span> <span class="o">=</span> <span class="n">address</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">role</span> <span class="o">=</span> <span class="n">role</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">payroll</span> <span class="o">=</span> <span class="n">payroll</span>

    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="n">duties</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">role</span><span class="o">.</span><span class="n">perform_duties</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Employee </span><span class="si">{self.id}</span><span class="s1"> - </span><span class="si">{self.name}</span><span class="s1">:'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- </span><span class="si">{duties}</span><span class="s1">'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">payroll</span><span class="o">.</span><span class="n">track_work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">payroll</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>
</pre></div>
<p>The <code>Employee</code> class is initialized with the <code>id</code>, <code>name</code>, and <code>address</code> attributes. It also requires the productivity <code>role</code> for the employee and the <code>payroll</code> policy.</p>
<p>The class exposes a <code>.work()</code> method that takes the hours worked. This method first retrieves the <code>duties</code> from the <code>role</code>. In other words, it delegates to the <code>role</code> object to perform its duties.</p>
<p>In the same way, it delegates to the <code>payroll</code> object to track the work <code>hours</code>. The <code>payroll</code>, as you saw, uses those hours to calculate the payroll if needed.</p>
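<p>The delegation can be exercised without the rest of the system. In the sketch below, <code>Employee</code> is a trimmed variant that returns the duties instead of printing them, and <code>FactoryRoleSketch</code> and <code>HourlyPolicySketch</code> are hypothetical stand-ins for the real role and policy classes. The hourly rate of 15 is an assumed figure:</p>

```python
class Employee:
    # Trimmed variant of the class above: it returns the duties
    # instead of printing them, but delegates in the same way.
    def __init__(self, id, name, address, role, payroll):
        self.id = id
        self.name = name
        self.address = address
        self.role = role
        self.payroll = payroll

    def work(self, hours):
        duties = self.role.perform_duties(hours)  # delegate to the role
        self.payroll.track_work(hours)            # delegate to the policy
        return duties

    def calculate_payroll(self):
        return self.payroll.calculate_payroll()


class FactoryRoleSketch:
    # Hypothetical role: any object with perform_duties() will do.
    def perform_duties(self, hours):
        return f'manufactures gadgets for {hours} hours.'


class HourlyPolicySketch:
    # Hypothetical policy: needs track_work() and calculate_payroll().
    def __init__(self, hour_rate):
        self.hour_rate = hour_rate
        self.hours_worked = 0

    def track_work(self, hours):
        self.hours_worked += hours

    def calculate_payroll(self):
        return self.hours_worked * self.hour_rate


# No address is needed for this demonstration, so None is passed.
employee = Employee(4, 'Jane Doe', None, FactoryRoleSketch(), HourlyPolicySketch(15))
duties = employee.work(40)
amount = employee.calculate_payroll()  # 40 hours * 15 per hour
print(duties, amount)
```

<p><code>Employee</code> never checks the type of <code>role</code> or <code>payroll</code>. It only calls the methods it needs, so any object that provides them can be plugged in.</p>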
<p>The following diagram shows the composition design used:</p>
<p><a href="https://files.realpython.com/media/ic-policy-based-composition.6e78bdb5824f.jpg" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/ic-policy-based-composition.6e78bdb5824f.jpg" width="1219" height="1381" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-policy-based-composition.6e78bdb5824f.jpg&w=304&sig=8baa24bb7a519fc76f40f994c95c05833c8ded67 304w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/ic-policy-based-composition.6e78bdb5824f.jpg&w=609&sig=2a14268f94d82ac5f5c64d115f02f8eeea808580 609w, https://files.realpython.com/media/ic-policy-based-composition.6e78bdb5824f.jpg 1219w" sizes="75vw" alt="Policy based design using composition"/></a></p>
<p>The diagram shows the design of composition-based policies. There is a single <code>Employee</code> that is composed of other data objects like <code>Address</code> and depends on the <code>IRole</code> and <code>IPayrollCalculator</code> interfaces to delegate the work. There are multiple implementations of these interfaces.</p>
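<p>The two interfaces in the diagram are informal: nothing in the example code declares them. If you wanted to write them down, one option is <code>typing.Protocol</code>, assuming Python 3.8 or later. The sketch below is not part of the tutorial’s modules. The method names are inferred from the calls that <code>Employee</code> makes, and the <code>SecretaryRoleSketch</code> and <code>SalaryPolicySketch</code> classes are hypothetical:</p>

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class IRole(Protocol):
    def perform_duties(self, hours): ...


@runtime_checkable
class IPayrollCalculator(Protocol):
    def track_work(self, hours): ...

    def calculate_payroll(self): ...


class SecretaryRoleSketch:
    # Hypothetical role; note that it does not inherit from IRole.
    def perform_duties(self, hours):
        return f'does paperwork for {hours} hours.'


class SalaryPolicySketch:
    # Hypothetical policy; it does not inherit from IPayrollCalculator.
    def __init__(self, weekly_salary):
        self.weekly_salary = weekly_salary
        self.hours_worked = 0

    def track_work(self, hours):
        self.hours_worked += hours

    def calculate_payroll(self):
        return self.weekly_salary


# isinstance() against a runtime-checkable protocol only checks that
# the required methods exist, not their signatures or behavior.
role_ok = isinstance(SecretaryRoleSketch(), IRole)
policy_ok = isinstance(SalaryPolicySketch(1500), IPayrollCalculator)
mismatch = isinstance(SecretaryRoleSketch(), IPayrollCalculator)
print(role_ok, policy_ok, mismatch)
```

<p>Declaring the protocols is optional. The design works through duck typing either way, and the protocols mainly serve as documentation and as input for static type checkers.</p>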
<p>You can now use this design in your program:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In program.py</span>
<span class="kn">from</span> <span class="nn">hr</span> <span class="k">import</span> <span class="n">PayrollSystem</span>
<span class="kn">from</span> <span class="nn">productivity</span> <span class="k">import</span> <span class="n">ProductivitySystem</span>
<span class="kn">from</span> <span class="nn">employees</span> <span class="k">import</span> <span class="n">EmployeeDatabase</span>
<span class="n">productivity_system</span> <span class="o">=</span> <span class="n">ProductivitySystem</span><span class="p">()</span>
<span class="n">payroll_system</span> <span class="o">=</span> <span class="n">PayrollSystem</span><span class="p">()</span>
<span class="n">employee_database</span> <span class="o">=</span> <span class="n">EmployeeDatabase</span><span class="p">()</span>
<span class="n">employees</span> <span class="o">=</span> <span class="n">employee_database</span><span class="o">.</span><span class="n">employees</span>
<span class="n">productivity_system</span><span class="o">.</span><span class="n">track</span><span class="p">(</span><span class="n">employees</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span>
<span class="n">payroll_system</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">(</span><span class="n">employees</span><span class="p">)</span>
</pre></div>
<p>You can run the program to see its output:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Tracking Employee Productivity</span>
<span class="go">==============================</span>
<span class="go">Employee 1 - Mary Poppins:</span>
<span class="go">- screams and yells for 40 hours.</span>
<span class="go">Employee 2 - John Smith:</span>
<span class="go">- does paperwork for 40 hours.</span>
<span class="go">Employee 3 - Kevin Bacon:</span>
<span class="go">- expends 40 hours on the phone.</span>
<span class="go">Employee 4 - Jane Doe:</span>
<span class="go">- manufactures gadgets for 40 hours.</span>
<span class="go">Employee 5 - Robin Williams:</span>
<span class="go">- does paperwork for 40 hours.</span>
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - Mary Poppins</span>
<span class="go">- Check amount: 3000</span>
<span class="go">- Sent to:</span>
<span class="go">121 Admin Rd.</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 2 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">- Sent to:</span>
<span class="go">67 Paperwork Ave</span>
<span class="go">Manchester, NH 03101</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1800.0</span>
<span class="go">- Sent to:</span>
<span class="go">15 Rose St</span>
<span class="go">Apt. B-1</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 4 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">- Sent to:</span>
<span class="go">39 Sole St.</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 5 - Robin Williams</span>
<span class="go">- Check amount: 360</span>
<span class="go">- Sent to:</span>
<span class="go">99 Mountain Rd.</span>
<span class="go">Concord, NH 03301</span>
</pre></div>
<p>This design is what is called <a href="https://en.wikipedia.org/wiki/Modern_C%2B%2B_Design#Policy-based_design">policy-based design</a>, where classes are composed of policies, and they delegate to those policies to do the work.</p>
<p>Policy-based design was introduced in the book <a href="https://realpython.com/asins/B00AU3JUHG">Modern C++ Design</a>, and it uses template metaprogramming in C++ to achieve the results.</p>
<p>Python does not support templates, but you can achieve similar results using composition, as you saw in the example above.</p>
<p>This type of design gives you all the flexibility you’ll need as requirements change. Imagine you need to change the way payroll is calculated for an object at run-time.</p>
<h3 id="customizing-behavior-with-composition">Customizing Behavior With Composition</h3>
<p>If your design relies on inheritance, you need to find a way to change the type of an object to change its behavior. With composition, you just need to change the policy the object uses.</p>
<p>Imagine that our <code>manager</code> all of a sudden becomes a temporary employee that gets paid by the hour. You can modify the object during the execution of the program in the following way:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In program.py</span>
<span class="kn">from</span> <span class="nn">hr</span> <span class="k">import</span> <span class="n">PayrollSystem</span><span class="p">,</span> <span class="n">HourlyPolicy</span>
<span class="kn">from</span> <span class="nn">productivity</span> <span class="k">import</span> <span class="n">ProductivitySystem</span>
<span class="kn">from</span> <span class="nn">employees</span> <span class="k">import</span> <span class="n">EmployeeDatabase</span>
<span class="n">productivity_system</span> <span class="o">=</span> <span class="n">ProductivitySystem</span><span class="p">()</span>
<span class="n">payroll_system</span> <span class="o">=</span> <span class="n">PayrollSystem</span><span class="p">()</span>
<span class="n">employee_database</span> <span class="o">=</span> <span class="n">EmployeeDatabase</span><span class="p">()</span>
<span class="n">employees</span> <span class="o">=</span> <span class="n">employee_database</span><span class="o">.</span><span class="n">employees</span>
<span class="n">manager</span> <span class="o">=</span> <span class="n">employees</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">manager</span><span class="o">.</span><span class="n">payroll</span> <span class="o">=</span> <span class="n">HourlyPolicy</span><span class="p">(</span><span class="mi">55</span><span class="p">)</span>
<span class="n">productivity_system</span><span class="o">.</span><span class="n">track</span><span class="p">(</span><span class="n">employees</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span>
<span class="n">payroll_system</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">(</span><span class="n">employees</span><span class="p">)</span>
</pre></div>
<p>The program gets the employee list from the <code>EmployeeDatabase</code> and retrieves the first employee, which is the manager we want. Then it creates a new <code>HourlyPolicy</code> initialized at $55 per hour and assigns it to the manager object.</p>
<p>The <code>PayrollSystem</code> now uses the new policy, which changes the existing behavior. You can run the program again to see the result:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Tracking Employee Productivity</span>
<span class="go">==============================</span>
<span class="go">Employee 1 - Mary Poppins:</span>
<span class="go">- screams and yells for 40 hours.</span>
<span class="go">Employee 2 - John Smith:</span>
<span class="go">- does paperwork for 40 hours.</span>
<span class="go">Employee 3 - Kevin Bacon:</span>
<span class="go">- expends 40 hours on the phone.</span>
<span class="go">Employee 4 - Jane Doe:</span>
<span class="go">- manufactures gadgets for 40 hours.</span>
<span class="go">Employee 5 - Robin Williams:</span>
<span class="go">- does paperwork for 40 hours.</span>
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - Mary Poppins</span>
<span class="go">- Check amount: 2200</span>
<span class="go">- Sent to:</span>
<span class="go">121 Admin Rd.</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 2 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">- Sent to:</span>
<span class="go">67 Paperwork Ave</span>
<span class="go">Manchester, NH 03101</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1800.0</span>
<span class="go">- Sent to:</span>
<span class="go">15 Rose St</span>
<span class="go">Apt. B-1</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 4 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">- Sent to:</span>
<span class="go">39 Sole St.</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 5 - Robin Williams</span>
<span class="go">- Check amount: 360</span>
<span class="go">- Sent to:</span>
<span class="go">99 Mountain Rd.</span>
<span class="go">Concord, NH 03301</span>
</pre></div>
<p>The check for Mary Poppins, our manager, is now for $2200 instead of the fixed salary of $3000 that she had per week.</p>
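<p>The arithmetic behind the new check is 40 hours at $55 per hour. The following self-contained sketch reproduces the swap with a hypothetical <code>EmployeeSketch</code> class that keeps only the payroll-related parts of <code>Employee</code>. The policy classes follow the descriptions given earlier:</p>

```python
class PayrollPolicy:
    def __init__(self):
        self.hours_worked = 0

    def track_work(self, hours):
        self.hours_worked += hours


class SalaryPolicy(PayrollPolicy):
    def __init__(self, weekly_salary):
        super().__init__()
        self.weekly_salary = weekly_salary

    def calculate_payroll(self):
        return self.weekly_salary


class HourlyPolicy(PayrollPolicy):
    def __init__(self, hour_rate):
        super().__init__()
        self.hour_rate = hour_rate

    def calculate_payroll(self):
        return self.hours_worked * self.hour_rate


class EmployeeSketch:
    # Hypothetical cut-down Employee: payroll delegation only.
    def __init__(self, name, payroll):
        self.name = name
        self.payroll = payroll

    def work(self, hours):
        self.payroll.track_work(hours)

    def calculate_payroll(self):
        return self.payroll.calculate_payroll()


manager = EmployeeSketch('Mary Poppins', SalaryPolicy(3000))
type_before = type(manager)
manager.work(40)
salaried_check = manager.calculate_payroll()  # fixed weekly salary: 3000

# Swap the policy on the existing object, then track a new week.
manager.payroll = HourlyPolicy(55)
manager.work(40)
hourly_check = manager.calculate_payroll()    # 40 * 55 = 2200

same_type = type(manager) is type_before
print(salaried_check, hourly_check, same_type)
```

<p>One detail to keep in mind: each policy object keeps its own <code>hours_worked</code>, so hours tracked before the swap stay with the old policy. That is why the program above assigns the new policy before calling <code>productivity_system.track()</code>.</p>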
<p>Notice how we added that business rule to the program without changing any of the existing classes. Consider what type of changes would’ve been required with an inheritance design. </p>
<p>You would’ve had to create a new class and change the type of the manager employee. There is no chance you could’ve changed the policy at run-time.</p>
<h2 id="choosing-between-inheritance-and-composition-in-python">Choosing Between Inheritance and Composition in Python</h2>
<p>So far, you’ve seen how inheritance and composition work in Python. You’ve seen that derived classes inherit the interface and implementation of their base classes. You’ve also seen that composition allows you to reuse the implementation of another class.</p>
<p>You’ve implemented two solutions to the same problem. The first solution used multiple inheritance, and the second one used composition.</p>
<p>You’ve also seen that Python’s duck typing allows you to reuse objects with existing parts of a program by implementing the desired interface. In Python, it isn’t necessary to derive from a base class for your classes to be reused.</p>
<p>At this point, you might be asking when to use inheritance vs composition in Python. They both enable code reuse. Inheritance and composition can tackle similar problems in your Python programs.</p>
<p>The general advice is to use the relationship that creates fewer dependencies between two classes. That relationship is composition. Still, there will be times when inheritance makes more sense.</p>
<p>The following sections provide some guidelines to help you make the right choice between inheritance and composition in Python.</p>
<h3 id="inheritance-to-model-is-a-relationship">Inheritance to Model “Is A” Relationship</h3>
<p>Inheritance should only be used to model an <strong>is a</strong> relationship. Liskov’s substitution principle says that an object of type <code>Derived</code>, which inherits from <code>Base</code>, can replace an object of type <code>Base</code> without altering the desirable properties of a program.</p>
<p>Liskov’s substitution principle is the most important guideline to determine if inheritance is the appropriate design solution. Still, the answer might not be straightforward in all situations. Fortunately, there is a simple test you can use to determine if your design follows Liskov’s substitution principle.</p>
<p>Let’s say you have a class <code>A</code> that provides an implementation and interface you want to reuse in another class <code>B</code>. Your initial thought is that you can derive <code>B</code> from <code>A</code> and inherit both the interface and implementation. To be sure this is the right design, you follow these steps:</p>
<ol>
<li>
<p><strong>Evaluate <code>B</code> is an <code>A</code>:</strong> Think about this relationship and justify it. Does it make sense?</p>
</li>
<li>
<p><strong>Evaluate <code>A</code> is a <code>B</code>:</strong> Reverse the relationship and justify it. Does it also make sense?</p>
</li>
</ol>
<p>If you can justify both relationships, then you should never inherit those classes from one another. Let’s look at a more concrete example.</p>
<p>You have a class <code>Rectangle</code> which exposes an <code>.area</code> property. You need a class <code>Square</code>, which also has an <code>.area</code>. It seems that a <code>Square</code> is a special type of <code>Rectangle</code>, so maybe you can derive from it and leverage both the interface and implementation.</p>
<p>Before you jump into the implementation, you use Liskov’s substitution principle to evaluate the relationship.</p>
<p>A <code>Square</code> <strong>is a</strong> <code>Rectangle</code> because its area is calculated from the product of its <code>height</code> and its <code>length</code>. The constraint is that <code>Square.height</code> and <code>Square.length</code> must be equal.</p>
<p>It makes sense. You can justify the relationship and explain why a <code>Square</code> <strong>is a</strong> <code>Rectangle</code>. Let’s reverse the relationship to see if it makes sense.</p>
<p>A <code>Rectangle</code> <strong>is a</strong> <code>Square</code> because its area is calculated from the product of its <code>height</code> and its <code>length</code>. The difference is that <code>Rectangle.height</code> and <code>Rectangle.length</code> can change independently.</p>
<p>It also makes sense. You can justify the relationship and describe the special constraints for each class. This is a good sign that these two classes should never derive from each other.</p>
<p>You might have seen other examples that derive <code>Square</code> from <code>Rectangle</code> to explain inheritance. You might be skeptical of the little test you just did. Fair enough. Let’s write a program that illustrates the problem with deriving <code>Square</code> from <code>Rectangle</code>.</p>
<p>First, you implement <code>Rectangle</code>. You’re even going to <a href="https://en.wikipedia.org/wiki/Encapsulation_(computer_programming)">encapsulate</a> the attributes to ensure that all the constraints are met:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In rectangle_square_demo.py</span>
<span class="k">class</span> <span class="nc">Rectangle</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">length</span><span class="p">,</span> <span class="n">height</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_length</span> <span class="o">=</span> <span class="n">length</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_height</span> <span class="o">=</span> <span class="n">height</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">area</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_length</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">_height</span>
</pre></div>
<p>The <code>Rectangle</code> class is initialized with a <code>length</code> and a <code>height</code>, and it provides an <code>.area</code> property that returns the area. The <code>length</code> and <code>height</code> are encapsulated to avoid changing them directly.</p>
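<p>As a minimal sketch of what this encapsulation does and does not guarantee, the snippet below repeats the class and tries two kinds of modification. The variable name <code>assignment_blocked</code> is introduced here only for illustration:</p>

```python
class Rectangle:
    def __init__(self, length, height):
        self._length = length
        self._height = height

    @property
    def area(self):
        return self._length * self._height


rectangle = Rectangle(2, 4)
print(rectangle.area)  # 8

# The property has no setter, so assigning to it raises AttributeError.
try:
    rectangle.area = 10
except AttributeError:
    assignment_blocked = True
else:
    assignment_blocked = False

print(assignment_blocked)  # True

# The leading underscore is a naming convention only; Python does not
# prevent code from reaching into the attribute anyway.
rectangle._length = 10
print(rectangle.area)  # 40
```

<p>The underscore signals intent to other developers rather than enforcing access control, which is why the text speaks of avoiding direct changes instead of preventing them.</p>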
<p>Now, you derive <code>Square</code> from <code>Rectangle</code> and override the necessary interface to meet the constraints of a <code>Square</code>:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In rectangle_square_demo.py</span>
<span class="k">class</span> <span class="nc">Square</span><span class="p">(</span><span class="n">Rectangle</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">side_size</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">side_size</span><span class="p">,</span> <span class="n">side_size</span><span class="p">)</span>
</pre></div>
<p>The <code>Square</code> class is initialized with a <code>side_size</code>, which is used to initialize both components of the base class. Now, you write a small program to test the behavior:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In rectangle_square_demo.py</span>
<span class="n">rectangle</span> <span class="o">=</span> <span class="n">Rectangle</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">rectangle</span><span class="o">.</span><span class="n">area</span> <span class="o">==</span> <span class="mi">8</span>
<span class="n">square</span> <span class="o">=</span> <span class="n">Square</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">square</span><span class="o">.</span><span class="n">area</span> <span class="o">==</span> <span class="mi">4</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'OK!'</span><span class="p">)</span>
</pre></div>
<p>The program creates a <code>Rectangle</code> and a <code>Square</code> and asserts that their <code>.area</code> is calculated correctly. You can run the program and see that everything is <code>OK</code> so far:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python rectangle_square_demo.py
<span class="go">OK!</span>
</pre></div>
<p>The program executes correctly, so it seems that <code>Square</code> is just a special case of a <code>Rectangle</code>.</p>
<p>Later on, you need to support resizing <code>Rectangle</code> objects, so you make the appropriate changes to the class:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In rectangle_square_demo.py</span>
<span class="k">class</span> <span class="nc">Rectangle</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">length</span><span class="p">,</span> <span class="n">height</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_length</span> <span class="o">=</span> <span class="n">length</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_height</span> <span class="o">=</span> <span class="n">height</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">area</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_length</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">_height</span>

    <span class="k">def</span> <span class="nf">resize</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">new_length</span><span class="p">,</span> <span class="n">new_height</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_length</span> <span class="o">=</span> <span class="n">new_length</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_height</span> <span class="o">=</span> <span class="n">new_height</span>
</pre></div>
<p><code>.resize()</code> takes the <code>new_length</code> and <code>new_height</code> for the object. You can add the following code to the program to verify that it works correctly:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In rectangle_square_demo.py</span>
<span class="n">rectangle</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">rectangle</span><span class="o">.</span><span class="n">area</span> <span class="o">==</span> <span class="mi">15</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'OK!'</span><span class="p">)</span>
</pre></div>
<p>You resize the rectangle object and assert that the new area is correct. You can run the program to verify the behavior:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python rectangle_square_demo.py
<span class="go">OK!</span>
</pre></div>
<p>The assertion passes, and you see that the program runs correctly.</p>
<p>So, what happens if you resize a square? Modify the program, and try to modify the <code>square</code> object:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In rectangle_square_demo.py</span>
<span class="n">square</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Square area: </span><span class="si">{square.area}</span><span class="s1">'</span><span class="p">)</span>
</pre></div>
<p>You pass the same parameters to <code>square.resize()</code> that you used with <code>rectangle</code>, and print the area. When you run the program you see:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python rectangle_square_demo.py
<span class="go">Square area: 15</span>
<span class="go">OK!</span>
</pre></div>
<p>The program shows that the new area is <code>15</code> like the <code>rectangle</code> object. The problem now is that the <code>square</code> object no longer meets the <code>Square</code> class constraint that the <code>length</code> and <code>height</code> must be equal.</p>
<p>How can you fix that problem? You can try several approaches, but all of them will be awkward. You can override <code>.resize()</code> in <code>Square</code> and ignore the <code>new_height</code> parameter, but that will confuse people reading other parts of the program where rectangles are resized: some of those objects won’t end up with the expected area because they are really squares.</p>
<p>In a small program like this one, it might be easy to spot the causes of the weird behavior, but in a more complex program, the problem will be harder to find.</p>
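<p>To make the awkwardness concrete, here is a rough sketch of the override workaround described above. The helper <code>resize_all()</code> is hypothetical and stands in for any code elsewhere in a program that treats every shape as a <code>Rectangle</code>:</p>

```python
class Rectangle:
    def __init__(self, length, height):
        self._length = length
        self._height = height

    @property
    def area(self):
        return self._length * self._height

    def resize(self, new_length, new_height):
        self._length = new_length
        self._height = new_height


class Square(Rectangle):
    def __init__(self, side_size):
        super().__init__(side_size, side_size)

    def resize(self, new_length, new_height):
        # Workaround: keep the square constraint by silently ignoring
        # new_height and using new_length for both sides.
        super().resize(new_length, new_length)


def resize_all(shapes, new_length, new_height):
    # Hypothetical caller that only knows about the Rectangle interface.
    for shape in shapes:
        shape.resize(new_length, new_height)
    return [shape.area for shape in shapes]


areas = resize_all([Rectangle(2, 4), Square(2)], 3, 5)
print(areas)  # [15, 9]
```

<p>The caller asked for 3 by 5 in both cases, yet one of the results is <code>9</code>. Nothing at the call site hints that one of the shapes quietly discarded an argument.</p>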
<p>The reality is that if you’re able to justify an inheritance relationship between two classes both ways, you should not derive one class from another. </p>
<p>In the example, it doesn’t make sense that <code>Square</code> inherits the interface and implementation of <code>.resize()</code> from <code>Rectangle</code>. That doesn’t mean that <code>Square</code> objects can’t be resized. It means that the interface is different because it only needs a <code>side_size</code> parameter.</p>
<p>This difference in interface justifies not deriving <code>Square</code> from <code>Rectangle</code>, as the test above advised.</p>
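<p>One possible way to act on that conclusion, sketched here as an assumption rather than as the only design, is to keep the two classes independent so that each exposes the <code>.resize()</code> signature that fits it. The attribute name <code>_side_size</code> is chosen for this illustration:</p>

```python
class Rectangle:
    def __init__(self, length, height):
        self._length = length
        self._height = height

    @property
    def area(self):
        return self._length * self._height

    def resize(self, new_length, new_height):
        self._length = new_length
        self._height = new_height


class Square:
    # Deliberately not derived from Rectangle.
    def __init__(self, side_size):
        self._side_size = side_size

    @property
    def area(self):
        return self._side_size ** 2

    def resize(self, new_side_size):
        # A single parameter makes the equal-sides constraint impossible
        # to violate through this interface.
        self._side_size = new_side_size


square = Square(2)
square.resize(3)
print(square.area)  # 9
```

<p>Code that works with either class can still rely on <code>.area</code>, since both provide it, without one class pretending to be the other.</p>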
<h3 id="mixing-features-with-mixin-classes">Mixing Features With Mixin Classes</h3>
<p>One of the uses of multiple inheritance in Python is to extend a class’s features through <a href="https://en.wikipedia.org/wiki/Mixin">mixins</a>. A <strong>mixin</strong> is a class that provides methods to other classes but is not considered a base class.</p>
<p>A mixin allows other classes to reuse its interface and implementation without becoming a superclass. It implements a unique behavior that can be added to other, unrelated classes. Mixins are similar to composition, but they create a stronger relationship.</p>
<p>Let’s say you want to convert objects of certain types in your application to a dictionary representation of the object. You could provide a <code>.to_dict()</code> method in every class that you want to support this feature, but the implementation of <code>.to_dict()</code> would be very similar in each of them.</p>
<p>This could be a good candidate for a mixin. You start by slightly modifying the <code>Employee</code> class from the composition example:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="k">class</span> <span class="nc">Employee</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">address</span><span class="p">,</span> <span class="n">role</span><span class="p">,</span> <span class="n">payroll</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">address</span> <span class="o">=</span> <span class="n">address</span>
<span class="hll">        <span class="bp">self</span><span class="o">.</span><span class="n">_role</span> <span class="o">=</span> <span class="n">role</span>
</span><span class="hll">        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span> <span class="o">=</span> <span class="n">payroll</span>
</span>
    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="n">duties</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_role</span><span class="o">.</span><span class="n">perform_duties</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Employee </span><span class="si">{self.id}</span><span class="s1"> - </span><span class="si">{self.name}</span><span class="s1">:'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- </span><span class="si">{duties}</span><span class="s1">'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="o">.</span><span class="n">track_work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>
</pre></div>
<p>The change is very small. You just changed the <code>role</code> and <code>payroll</code> attributes to be internal by adding a leading underscore to their name. You will see soon why you are making that change.</p>
<p>Now, you add the <code>AsDictionaryMixin</code> class:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In representations.py</span>
<span class="k">class</span> <span class="nc">AsDictionaryMixin</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">to_dict</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="n">prop</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_represent</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
            <span class="k">for</span> <span class="n">prop</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="vm">__dict__</span><span class="o">.</span><span class="n">items</span><span class="p">()</span>
            <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">_is_internal</span><span class="p">(</span><span class="n">prop</span><span class="p">)</span>
        <span class="p">}</span>

    <span class="k">def</span> <span class="nf">_represent</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="nb">object</span><span class="p">):</span>
            <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s1">'to_dict'</span><span class="p">):</span>
                <span class="k">return</span> <span class="n">value</span><span class="o">.</span><span class="n">to_dict</span><span class="p">()</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">value</span>

    <span class="k">def</span> <span class="nf">_is_internal</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">prop</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">prop</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'_'</span><span class="p">)</span>
</pre></div>
<p>The <code>AsDictionaryMixin</code> class exposes a <code>.to_dict()</code> method that returns the representation of itself as a dictionary. The method is implemented as a <a href="https://www.python.org/dev/peps/pep-0274/"><code>dict</code> comprehension</a> that says, “Create a dictionary mapping <code>prop</code> to <code>value</code> for each item in <code>self.__dict__.items()</code> if the <code>prop</code> is not internal.”</p>
<div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> This is why we made the role and payroll attributes internal in the <code>Employee</code> class, because we don’t want to represent them in the dictionary.</p>
</div>
<p>As you saw at the beginning, every class you create inherits some members from <code>object</code>, and one of those members is <code>__dict__</code>, which is basically a mapping of all the attributes in an object to their values.</p>
<p>You iterate through all the items in <code>__dict__</code> and filter out the ones that have a name that starts with an underscore using <code>._is_internal()</code>.</p>
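<p>To see what that filtering operates on, the short sketch below inspects <code>__dict__</code> directly. The <code>Point</code> class and its <code>_cache</code> attribute are hypothetical and exist only for this illustration:</p>

```python
class Point:  # hypothetical class, used only for illustration
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._cache = None  # internal by convention


point = Point(1, 2)
print(point.__dict__)  # {'x': 1, 'y': 2, '_cache': None}

# The same filter the mixin applies: drop names with a leading underscore.
public = {
    name: value
    for name, value in point.__dict__.items()
    if not name.startswith('_')
}
print(public)  # {'x': 1, 'y': 2}
```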
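<p><code>._represent()</code> checks the specified value. If the value <strong>is an</strong> <code>object</code>, then it looks to see if it also has a <code>.to_dict()</code> member and uses it to represent the object. Otherwise, it returns a string representation. The final <code>else</code> branch would return the value unchanged, but every value in Python is an instance of <code>object</code>, so that branch is never reached. As a result, any attribute without a <code>.to_dict()</code> member, including a number, is converted to a string.</p>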
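<p>The behavior of the <code>isinstance()</code> check is easy to verify. The sketch below also shows one hypothetical variant, <code>TypePreservingMixin</code>, that passes JSON-friendly values through unchanged. That class and the <code>Item</code> class are assumptions made for this example and are not part of the tutorial’s code:</p>

```python
samples = [1, 'text', 3.5, None, [1, 2], {'a': 1}]
always_object = all(isinstance(value, object) for value in samples)
print(always_object)  # True


class TypePreservingMixin:
    """Hypothetical variant, not part of the tutorial's code."""

    def to_dict(self):
        return {
            prop: self._represent(value)
            for prop, value in self.__dict__.items()
            if not prop.startswith('_')
        }

    def _represent(self, value):
        if hasattr(value, 'to_dict'):
            return value.to_dict()
        if value is None or isinstance(value, (bool, int, float, str)):
            return value  # JSON-friendly values pass through unchanged
        return str(value)


class Item(TypePreservingMixin):  # hypothetical class for the demo
    def __init__(self, id, name):
        self.id = id
        self.name = name


item_dict = Item(1, 'Widget').to_dict()
print(item_dict)  # {'id': 1, 'name': 'Widget'}
```

<p>Whether to keep the original types or convert everything to strings depends on how the dictionary will be consumed, so treat this as one option rather than a correction.</p>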
<p>You can modify the <code>Employee</code> class to support this mixin:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="kn">from</span> <span class="nn">representations</span> <span class="k">import</span> <span class="n">AsDictionaryMixin</span>

<span class="k">class</span> <span class="nc">Employee</span><span class="p">(</span><span class="n">AsDictionaryMixin</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">address</span><span class="p">,</span> <span class="n">role</span><span class="p">,</span> <span class="n">payroll</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">address</span> <span class="o">=</span> <span class="n">address</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_role</span> <span class="o">=</span> <span class="n">role</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span> <span class="o">=</span> <span class="n">payroll</span>

    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="n">duties</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_role</span><span class="o">.</span><span class="n">perform_duties</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Employee </span><span class="si">{self.id}</span><span class="s1"> - </span><span class="si">{self.name}</span><span class="s1">:'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- </span><span class="si">{duties}</span><span class="s1">'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="o">.</span><span class="n">track_work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>
</pre></div>
<p>All you have to do is inherit from <code>AsDictionaryMixin</code> to support the functionality. It would be nice to support the same functionality in the <code>Address</code> class, so the <code>Employee.address</code> attribute is represented in the same way:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In contacts.py</span>
<span class="kn">from</span> <span class="nn">representations</span> <span class="k">import</span> <span class="n">AsDictionaryMixin</span>

<span class="k">class</span> <span class="nc">Address</span><span class="p">(</span><span class="n">AsDictionaryMixin</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">street</span><span class="p">,</span> <span class="n">city</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">zipcode</span><span class="p">,</span> <span class="n">street2</span><span class="o">=</span><span class="s1">''</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">street</span> <span class="o">=</span> <span class="n">street</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">street2</span> <span class="o">=</span> <span class="n">street2</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">city</span> <span class="o">=</span> <span class="n">city</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">state</span> <span class="o">=</span> <span class="n">state</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">zipcode</span> <span class="o">=</span> <span class="n">zipcode</span>

    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">lines</span> <span class="o">=</span> <span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">street</span><span class="p">]</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">street2</span><span class="p">:</span>
            <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">street2</span><span class="p">)</span>
        <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{self.city}</span><span class="s1">, </span><span class="si">{self.state}</span><span class="s1"> </span><span class="si">{self.zipcode}</span><span class="s1">'</span><span class="p">)</span>
        <span class="k">return</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">lines</span><span class="p">)</span>
</pre></div>
<p>You apply the mixin to the <code>Address</code> class to support the feature. Now, you can write a small program to test it:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In program.py</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">employees</span> <span class="k">import</span> <span class="n">EmployeeDatabase</span>

<span class="k">def</span> <span class="nf">print_dict</span><span class="p">(</span><span class="n">d</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>

<span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">EmployeeDatabase</span><span class="p">()</span><span class="o">.</span><span class="n">employees</span><span class="p">:</span>
    <span class="n">print_dict</span><span class="p">(</span><span class="n">employee</span><span class="o">.</span><span class="n">to_dict</span><span class="p">())</span>
</pre></div>
<p>The program implements a <code>print_dict()</code> that converts the dictionary to a <a href="http://json.org/">JSON</a> string using indentation so the output looks better.</p>
<p>Then, it iterates through all the employees, printing the dictionary representation provided by <code>.to_dict()</code>. You can run the program to see its output:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">{</span>
<span class="go">  "id": "1",</span>
<span class="go">  "name": "Mary Poppins",</span>
<span class="go">  "address": {</span>
<span class="go">    "street": "121 Admin Rd.",</span>
<span class="go">    "street2": "",</span>
<span class="go">    "city": "Concord",</span>
<span class="go">    "state": "NH",</span>
<span class="go">    "zipcode": "03301"</span>
<span class="go">  }</span>
<span class="go">}</span>
<span class="go">{</span>
<span class="go">  "id": "2",</span>
<span class="go">  "name": "John Smith",</span>
<span class="go">  "address": {</span>
<span class="go">    "street": "67 Paperwork Ave",</span>
<span class="go">    "street2": "",</span>
<span class="go">    "city": "Manchester",</span>
<span class="go">    "state": "NH",</span>
<span class="go">    "zipcode": "03101"</span>
<span class="go">  }</span>
<span class="go">}</span>
<span class="go">{</span>
<span class="go">  "id": "3",</span>
<span class="go">  "name": "Kevin Bacon",</span>
<span class="go">  "address": {</span>
<span class="go">    "street": "15 Rose St",</span>
<span class="go">    "street2": "Apt. B-1",</span>
<span class="go">    "city": "Concord",</span>
<span class="go">    "state": "NH",</span>
<span class="go">    "zipcode": "03301"</span>
<span class="go">  }</span>
<span class="go">}</span>
<span class="go">{</span>
<span class="go">  "id": "4",</span>
<span class="go">  "name": "Jane Doe",</span>
<span class="go">  "address": {</span>
<span class="go">    "street": "39 Sole St.",</span>
<span class="go">    "street2": "",</span>
<span class="go">    "city": "Concord",</span>
<span class="go">    "state": "NH",</span>
<span class="go">    "zipcode": "03301"</span>
<span class="go">  }</span>
<span class="go">}</span>
<span class="go">{</span>
<span class="go">  "id": "5",</span>
<span class="go">  "name": "Robin Williams",</span>
<span class="go">  "address": {</span>
<span class="go">    "street": "99 Mountain Rd.",</span>
<span class="go">    "street2": "",</span>
<span class="go">    "city": "Concord",</span>
<span class="go">    "state": "NH",</span>
<span class="go">    "zipcode": "03301"</span>
<span class="go">  }</span>
<span class="go">}</span>
</pre></div>
<p>You leveraged the implementation of <code>AsDictionaryMixin</code> in both the <code>Employee</code> and <code>Address</code> classes even though they are not related. Because <code>AsDictionaryMixin</code> only provides behavior, it is easy to reuse with other classes without causing problems.</p>
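<p>As a quick check of that claim, the sketch below applies the same mixin to a class that has nothing to do with employees or addresses. The <code>Project</code> class and its attributes are hypothetical, and the mixin is repeated so that the snippet runs on its own:</p>

```python
class AsDictionaryMixin:
    def to_dict(self):
        return {
            prop: self._represent(value)
            for prop, value in self.__dict__.items()
            if not self._is_internal(prop)
        }

    def _represent(self, value):
        if isinstance(value, object):
            if hasattr(value, 'to_dict'):
                return value.to_dict()
            else:
                return str(value)
        else:
            return value

    def _is_internal(self, prop):
        return prop.startswith('_')


class Project(AsDictionaryMixin):  # hypothetical, unrelated class
    def __init__(self, code, title, budget):
        self.code = code
        self.title = title
        self._budget = budget  # internal, so left out of the dictionary


project_dict = Project('P-1', 'Website redesign', 50000).to_dict()
print(project_dict)  # {'code': 'P-1', 'title': 'Website redesign'}
```

<p>The mixin needs nothing from <code>Project</code> beyond ordinary instance attributes, which is what makes it portable.</p>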
<h3 id="composition-to-model-has-a-relationship">Composition to Model “Has A” Relationship</h3>
<p>Composition models a <strong>has a</strong> relationship. With composition, a class <code>Composite</code> <strong>has an</strong> instance of class <code>Component</code> and can leverage its implementation. The <code>Component</code> class can be reused in other classes completely unrelated to the <code>Composite</code>.</p>
<p>In the composition example above, the <code>Employee</code> class <strong>has an</strong> <code>Address</code> object. <code>Address</code> implements all the functionality to handle addresses, and it can be reused by other classes.</p>
<p>Other classes like <code>Customer</code> or <code>Vendor</code> can reuse <code>Address</code> without being related to <code>Employee</code>. They can leverage the same implementation ensuring that addresses are handled consistently across the application.</p>
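<p>A rough sketch of that kind of reuse might look like the following. The <code>Customer</code> and <code>Vendor</code> classes, their methods, and the sample data are assumptions made for this example, and <code>Address</code> is a simplified stand-in for the class shown earlier:</p>

```python
class Address:
    # Simplified stand-in for the tutorial's Address class.
    def __init__(self, street, city, state, zipcode):
        self.street = street
        self.city = city
        self.state = state
        self.zipcode = zipcode

    def __str__(self):
        return f'{self.street}\n{self.city}, {self.state} {self.zipcode}'


class Customer:  # hypothetical
    def __init__(self, name, address):
        self.name = name
        self.address = address  # a Customer has an Address

    def shipping_label(self):
        return f'{self.name}\n{self.address}'


class Vendor:  # hypothetical
    def __init__(self, company, address):
        self.company = company
        self.address = address  # a Vendor has an Address too

    def remit_to(self):
        return f'Remit to: {self.company}\n{self.address}'


customer = Customer('Jane Doe', Address('39 Sole St.', 'Concord', 'NH', '03301'))
vendor = Vendor('Acme Supplies', Address('15 Rose St', 'Concord', 'NH', '03301'))

label = customer.shipping_label()
remittance = vendor.remit_to()
print(label)
print(remittance)
```

<p>Neither class inherits from the other or from <code>Employee</code>, yet both format addresses the same way because the formatting lives in <code>Address</code>.</p>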
<p>A problem you may run into when using composition is that some of your classes may start growing by using multiple components. Your classes may require multiple parameters in the constructor just to pass in the components they are made of. This can make your classes hard to use.</p>
<p>A way to avoid the problem is by using the <a href="https://realpython.com/factory-method-python/">Factory Method</a> to construct your objects. You did that with the composition example.</p>
<p>If you look at the implementation of the <code>EmployeeDatabase</code> class, you’ll notice that it uses <code>._create_employee()</code> to construct an <code>Employee</code> object with the right parameters.</p>
<p>This design will work, but ideally, you should be able to construct an <code>Employee</code> object just by specifying an <code>id</code>, for example <code>employee = Employee(1)</code>.</p>
<p>The following changes might improve your design. You can start with the <code>productivity</code> module:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In productivity.py</span>
<span class="k">class</span> <span class="nc">_ProductivitySystem</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_roles</span> <span class="o">=</span> <span class="p">{</span>
            <span class="s1">'manager'</span><span class="p">:</span> <span class="n">ManagerRole</span><span class="p">,</span>
            <span class="s1">'secretary'</span><span class="p">:</span> <span class="n">SecretaryRole</span><span class="p">,</span>
            <span class="s1">'sales'</span><span class="p">:</span> <span class="n">SalesRole</span><span class="p">,</span>
            <span class="s1">'factory'</span><span class="p">:</span> <span class="n">FactoryRole</span><span class="p">,</span>
        <span class="p">}</span>

    <span class="k">def</span> <span class="nf">get_role</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">role_id</span><span class="p">):</span>
        <span class="n">role_type</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_roles</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">role_id</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">role_type</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'role_id'</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">role_type</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">track</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employees</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Tracking Employee Productivity'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'=============================='</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">employees</span><span class="p">:</span>
            <span class="n">employee</span><span class="o">.</span><span class="n">work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>

<span class="c1"># Role classes implementation omitted</span>

<span class="n">_productivity_system</span> <span class="o">=</span> <span class="n">_ProductivitySystem</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">get_role</span><span class="p">(</span><span class="n">role_id</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">_productivity_system</span><span class="o">.</span><span class="n">get_role</span><span class="p">(</span><span class="n">role_id</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">track</span><span class="p">(</span><span class="n">employees</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
    <span class="n">_productivity_system</span><span class="o">.</span><span class="n">track</span><span class="p">(</span><span class="n">employees</span><span class="p">,</span> <span class="n">hours</span><span class="p">)</span>
</pre></div>
<p>First, you make the <code>_ProductivitySystem</code> class internal, and then you create an internal module-level variable, <code>_productivity_system</code>, that holds its only instance. The leading underscores communicate to other developers that they should not create or use <code>_ProductivitySystem</code> directly. Instead, you provide two functions, <code>get_role()</code> and <code>track()</code>, as the public interface to the module. This is what other modules should use.</p>
<p>What you are saying is that the <code>_ProductivitySystem</code> is a <a href="https://en.wikipedia.org/wiki/Singleton_pattern">Singleton</a>, and there should only be one object created from it.</p>
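<p>The module-level pattern described above can be sketched in isolation. The names <code>_Registry</code>, <code>_registry</code>, and <code>lookup()</code> are hypothetical stand-ins, not part of the tutorial's modules, and the underscore prefix is only a convention that Python does not enforce:</p>

```python
# Hypothetical stand-in for a module such as productivity.py.
class _Registry:
    """Internal class: the leading underscore signals 'do not instantiate'."""

    def __init__(self):
        # Illustrative data only.
        self._items = {"manager": "screams and yells"}

    def lookup(self, key):
        item = self._items.get(key)
        if not item:
            raise ValueError(key)
        return item


# The single, module-level instance that every caller shares.
_registry = _Registry()


def lookup(key):
    """Public function that delegates to the shared internal instance."""
    return _registry.lookup(key)


print(lookup("manager"))
```

<p>Because a module is executed only once per interpreter session, every module that imports <code>lookup()</code> ends up talking to the same <code>_registry</code> object.</p>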
<p>Now, you can do the same with the <code>hr</code> module:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">_PayrollSystem</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_employee_policies</span> <span class="o">=</span> <span class="p">{</span>
            <span class="mi">1</span><span class="p">:</span> <span class="n">SalaryPolicy</span><span class="p">(</span><span class="mi">3000</span><span class="p">),</span>
            <span class="mi">2</span><span class="p">:</span> <span class="n">SalaryPolicy</span><span class="p">(</span><span class="mi">1500</span><span class="p">),</span>
            <span class="mi">3</span><span class="p">:</span> <span class="n">CommissionPolicy</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="mi">100</span><span class="p">),</span>
            <span class="mi">4</span><span class="p">:</span> <span class="n">HourlyPolicy</span><span class="p">(</span><span class="mi">15</span><span class="p">),</span>
            <span class="mi">5</span><span class="p">:</span> <span class="n">HourlyPolicy</span><span class="p">(</span><span class="mi">9</span><span class="p">)</span>
        <span class="p">}</span>

    <span class="k">def</span> <span class="nf">get_policy</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employee_id</span><span class="p">):</span>
        <span class="n">policy</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_employee_policies</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">policy</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">policy</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employees</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Calculating Payroll'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'==================='</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">employee</span> <span class="ow">in</span> <span class="n">employees</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Payroll for: </span><span class="si">{employee.id}</span><span class="s1"> - </span><span class="si">{employee.name}</span><span class="s1">'</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- Check amount: {employee.calculate_payroll()}'</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">employee</span><span class="o">.</span><span class="n">address</span><span class="p">:</span>
                <span class="nb">print</span><span class="p">(</span><span class="s1">'- Sent to:'</span><span class="p">)</span>
                <span class="nb">print</span><span class="p">(</span><span class="n">employee</span><span class="o">.</span><span class="n">address</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>

<span class="c1"># Policy classes implementation omitted</span>

<span class="n">_payroll_system</span> <span class="o">=</span> <span class="n">_PayrollSystem</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">get_policy</span><span class="p">(</span><span class="n">employee_id</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">_payroll_system</span><span class="o">.</span><span class="n">get_policy</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="n">employees</span><span class="p">):</span>
    <span class="n">_payroll_system</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">(</span><span class="n">employees</span><span class="p">)</span>
</pre></div>
<p>Again, you make the <code>_PayrollSystem</code> internal and provide a public interface to it. The application will use the public interface to get policies and calculate payroll.</p>
<p>You will now do the same with the <code>contacts</code> module:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In contacts.py</span>
<span class="k">class</span> <span class="nc">_AddressBook</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_employee_addresses</span> <span class="o">=</span> <span class="p">{</span>
            <span class="mi">1</span><span class="p">:</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'121 Admin Rd.'</span><span class="p">,</span> <span class="s1">'Concord'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03301'</span><span class="p">),</span>
            <span class="mi">2</span><span class="p">:</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'67 Paperwork Ave'</span><span class="p">,</span> <span class="s1">'Manchester'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03101'</span><span class="p">),</span>
            <span class="mi">3</span><span class="p">:</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'15 Rose St'</span><span class="p">,</span> <span class="s1">'Concord'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03301'</span><span class="p">,</span> <span class="s1">'Apt. B-1'</span><span class="p">),</span>
            <span class="mi">4</span><span class="p">:</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'39 Sole St.'</span><span class="p">,</span> <span class="s1">'Concord'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03301'</span><span class="p">),</span>
            <span class="mi">5</span><span class="p">:</span> <span class="n">Address</span><span class="p">(</span><span class="s1">'99 Mountain Rd.'</span><span class="p">,</span> <span class="s1">'Concord'</span><span class="p">,</span> <span class="s1">'NH'</span><span class="p">,</span> <span class="s1">'03301'</span><span class="p">),</span>
        <span class="p">}</span>

    <span class="k">def</span> <span class="nf">get_employee_address</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employee_id</span><span class="p">):</span>
        <span class="n">address</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_employee_addresses</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">address</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">address</span>

<span class="c1"># Implementation of Address class omitted</span>

<span class="n">_address_book</span> <span class="o">=</span> <span class="n">_AddressBook</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">get_employee_address</span><span class="p">(</span><span class="n">employee_id</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">_address_book</span><span class="o">.</span><span class="n">get_employee_address</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
</pre></div>
<p>You are basically saying that there should only be one <code>_AddressBook</code>, one <code>_PayrollSystem</code>, and one <code>_ProductivitySystem</code>. Again, this is the <a href="https://en.wikipedia.org/wiki/Singleton_pattern">Singleton</a> design pattern, which comes in handy for classes that should only ever have a single instance.</p>
<p>Now, you can work on the <code>employees</code> module. You will also make a Singleton out of the <code>_EmployeeDatabase</code>, but you will make some additional changes:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="kn">from</span> <span class="nn">productivity</span> <span class="k">import</span> <span class="n">get_role</span>
<span class="kn">from</span> <span class="nn">hr</span> <span class="k">import</span> <span class="n">get_policy</span>
<span class="kn">from</span> <span class="nn">contacts</span> <span class="k">import</span> <span class="n">get_employee_address</span>
<span class="kn">from</span> <span class="nn">representations</span> <span class="k">import</span> <span class="n">AsDictionaryMixin</span>
<span class="k">class</span> <span class="nc">_EmployeeDatabase</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_employees</span> <span class="o">=</span> <span class="p">{</span>
            <span class="mi">1</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Mary Poppins'</span><span class="p">,</span>
                <span class="s1">'role'</span><span class="p">:</span> <span class="s1">'manager'</span>
            <span class="p">},</span>
            <span class="mi">2</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'John Smith'</span><span class="p">,</span>
                <span class="s1">'role'</span><span class="p">:</span> <span class="s1">'secretary'</span>
            <span class="p">},</span>
            <span class="mi">3</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Kevin Bacon'</span><span class="p">,</span>
                <span class="s1">'role'</span><span class="p">:</span> <span class="s1">'sales'</span>
            <span class="p">},</span>
            <span class="mi">4</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Jane Doe'</span><span class="p">,</span>
                <span class="s1">'role'</span><span class="p">:</span> <span class="s1">'factory'</span>
            <span class="p">},</span>
            <span class="mi">5</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Robin Williams'</span><span class="p">,</span>
                <span class="s1">'role'</span><span class="p">:</span> <span class="s1">'secretary'</span>
            <span class="p">}</span>
        <span class="p">}</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">employees</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">Employee</span><span class="p">(</span><span class="n">id_</span><span class="p">)</span> <span class="k">for</span> <span class="n">id_</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_employees</span><span class="p">)]</span>

    <span class="k">def</span> <span class="nf">get_employee_info</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">employee_id</span><span class="p">):</span>
        <span class="n">info</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_employees</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">info</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="n">employee_id</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">info</span>

<span class="k">class</span> <span class="nc">Employee</span><span class="p">(</span><span class="n">AsDictionaryMixin</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="n">info</span> <span class="o">=</span> <span class="n">employee_database</span><span class="o">.</span><span class="n">get_employee_info</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">info</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'name'</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">address</span> <span class="o">=</span> <span class="n">get_employee_address</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_role</span> <span class="o">=</span> <span class="n">get_role</span><span class="p">(</span><span class="n">info</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'role'</span><span class="p">))</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span> <span class="o">=</span> <span class="n">get_policy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="n">duties</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_role</span><span class="o">.</span><span class="n">perform_duties</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Employee </span><span class="si">{self.id}</span><span class="s1"> - </span><span class="si">{self.name}</span><span class="s1">:'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- </span><span class="si">{duties}</span><span class="s1">'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="o">.</span><span class="n">track_work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>

<span class="n">employee_database</span> <span class="o">=</span> <span class="n">_EmployeeDatabase</span><span class="p">()</span>
</pre></div>
<p>You first import the relevant functions and classes from other modules. The <code>_EmployeeDatabase</code> is made internal, and at the bottom, you create a single instance. This instance is public and part of the interface because you will want to use it in the application.</p>
<p>You changed the <code>_EmployeeDatabase._employees</code> attribute to be a dictionary where the key is the employee <code>id</code> and the value is the employee information. You also exposed a <code>.get_employee_info()</code> method that returns the information for the specified <code>employee_id</code>.</p>
<p>The <code>_EmployeeDatabase.employees</code> property now sorts the keys to return the employees sorted by their <code>id</code>. You replaced the method that constructed the <code>Employee</code> objects with calls to the <code>Employee</code> initializer directly.</p>
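<p>To see why sorting the dictionary gives an ordered list of employees, here is a small sketch. The <code>records</code> dictionary is hypothetical data that mirrors the shape of <code>_employees</code>, with the keys deliberately out of order:</p>

```python
# Hypothetical data shaped like _employees; IDs are out of order on purpose.
records = {
    3: {"name": "Kevin Bacon", "role": "sales"},
    1: {"name": "Mary Poppins", "role": "manager"},
    2: {"name": "John Smith", "role": "secretary"},
}

# Iterating over a dict yields its keys, so sorted(records) returns the sorted IDs.
ordered_ids = sorted(records)

# Build one item per ID, in ID order, much like the employees property does.
names = [records[id_]["name"] for id_ in ordered_ids]

print(ordered_ids)
print(names)
```

<p>The property in the tutorial follows the same idea, except that it builds an <code>Employee</code> object for each <code>id_</code> instead of looking up a name.</p>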
<p>The <code>Employee</code> class now is initialized with the <code>id</code> and uses the public functions exposed in the other modules to initialize its attributes.</p>
<p>You can now change the program to test the changes:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In program.py</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">hr</span> <span class="k">import</span> <span class="n">calculate_payroll</span>
<span class="kn">from</span> <span class="nn">productivity</span> <span class="k">import</span> <span class="n">track</span>
<span class="kn">from</span> <span class="nn">employees</span> <span class="k">import</span> <span class="n">employee_database</span><span class="p">,</span> <span class="n">Employee</span>
<span class="k">def</span> <span class="nf">print_dict</span><span class="p">(</span><span class="n">d</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
<span class="n">employees</span> <span class="o">=</span> <span class="n">employee_database</span><span class="o">.</span><span class="n">employees</span>
<span class="n">track</span><span class="p">(</span><span class="n">employees</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span>
<span class="n">calculate_payroll</span><span class="p">(</span><span class="n">employees</span><span class="p">)</span>
<span class="n">temp_secretary</span> <span class="o">=</span> <span class="n">Employee</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Temporary Secretary:'</span><span class="p">)</span>
<span class="n">print_dict</span><span class="p">(</span><span class="n">temp_secretary</span><span class="o">.</span><span class="n">to_dict</span><span class="p">())</span>
</pre></div>
<p>You import the relevant functions from the <code>hr</code> and <code>productivity</code> modules, as well as <code>employee_database</code> and the <code>Employee</code> class from the <code>employees</code> module. The program is cleaner because you exposed the required interface and encapsulated how objects are accessed.</p>
<p>Notice that you can now create an <code>Employee</code> object directly just using its <code>id</code>. You can run the program to see its output:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Tracking Employee Productivity</span>
<span class="go">==============================</span>
<span class="go">Employee 1 - Mary Poppins:</span>
<span class="go">- screams and yells for 40 hours.</span>
<span class="go">Employee 2 - John Smith:</span>
<span class="go">- does paperwork for 40 hours.</span>
<span class="go">Employee 3 - Kevin Bacon:</span>
<span class="go">- expends 40 hours on the phone.</span>
<span class="go">Employee 4 - Jane Doe:</span>
<span class="go">- manufactures gadgets for 40 hours.</span>
<span class="go">Employee 5 - Robin Williams:</span>
<span class="go">- does paperwork for 40 hours.</span>
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - Mary Poppins</span>
<span class="go">- Check amount: 3000</span>
<span class="go">- Sent to:</span>
<span class="go">121 Admin Rd.</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 2 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">- Sent to:</span>
<span class="go">67 Paperwork Ave</span>
<span class="go">Manchester, NH 03101</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1800.0</span>
<span class="go">- Sent to:</span>
<span class="go">15 Rose St</span>
<span class="go">Apt. B-1</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 4 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">- Sent to:</span>
<span class="go">39 Sole St.</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 5 - Robin Williams</span>
<span class="go">- Check amount: 360</span>
<span class="go">- Sent to:</span>
<span class="go">99 Mountain Rd.</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Temporary Secretary:</span>
<span class="go">{</span>
<span class="go">  "id": "5",</span>
<span class="go">  "name": "Robin Williams",</span>
<span class="go">  "address": {</span>
<span class="go">    "street": "99 Mountain Rd.",</span>
<span class="go">    "street2": "",</span>
<span class="go">    "city": "Concord",</span>
<span class="go">    "state": "NH",</span>
<span class="go">    "zipcode": "03301"</span>
<span class="go">  }</span>
<span class="go">}</span>
</pre></div>
<p>The program works the same as before, but now you can see that a single <code>Employee</code> object can be created from its <code>id</code> and display its dictionary representation.</p>
<p>Take a closer look at the <code>Employee</code> class:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="k">class</span> <span class="nc">Employee</span><span class="p">(</span><span class="n">AsDictionaryMixin</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="n">info</span> <span class="o">=</span> <span class="n">employee_database</span><span class="o">.</span><span class="n">get_employee_info</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">info</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'name'</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">address</span> <span class="o">=</span> <span class="n">get_employee_address</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_role</span> <span class="o">=</span> <span class="n">get_role</span><span class="p">(</span><span class="n">info</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'role'</span><span class="p">))</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span> <span class="o">=</span> <span class="n">get_policy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="n">duties</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_role</span><span class="o">.</span><span class="n">perform_duties</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Employee </span><span class="si">{self.id}</span><span class="s1"> - </span><span class="si">{self.name}</span><span class="s1">:'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- </span><span class="si">{duties}</span><span class="s1">'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="o">.</span><span class="n">track_work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>
</pre></div>
<p>The <code>Employee</code> class is a composite that contains multiple objects providing different functionality. It contains an <code>Address</code> that implements all the functionality related to where the employee lives.</p>
<p><code>Employee</code> also contains a productivity role provided by the <code>productivity</code> module, and a payroll policy provided by the <code>hr</code> module. These two objects provide implementations that are leveraged by the <code>Employee</code> class to track work in the <code>.work()</code> method and to calculate the payroll in the <code>.calculate_payroll()</code> method.</p>
<p>You are using composition in two different ways. The <code>Address</code> class provides additional data to <code>Employee</code>, whereas the role and payroll objects provide additional behavior.</p>
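<p>The difference between composing for data and composing for behavior can be shown with a stripped-down sketch. The classes <code>Location</code>, <code>PaperworkRole</code>, and <code>Worker</code> are hypothetical and only loosely modeled on the tutorial's <code>Address</code>, role, and <code>Employee</code> classes:</p>

```python
class Location:
    """Data component: it only stores values."""

    def __init__(self, city, state):
        self.city = city
        self.state = state


class PaperworkRole:
    """Behavior component: it only knows how to do something."""

    def perform_duties(self, hours):
        return f"does paperwork for {hours} hours."


class Worker:
    """Composite that holds one data component and one behavior component."""

    def __init__(self, name, location, role):
        self.name = name
        self.location = location  # composed for its data
        self._role = role  # composed for its behavior

    def work(self, hours):
        # Delegate the actual work to the role object.
        return f"{self.name} {self._role.perform_duties(hours)}"


worker = Worker("John Smith", Location("Manchester", "NH"), PaperworkRole())
print(worker.location.city)
print(worker.work(40))
```

<p>In this sketch, <code>Worker</code> reads attributes from the location object but calls a method on the role object, which mirrors the two uses of composition described above.</p>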
<p>Still, the relationship between <code>Employee</code> and those objects is loosely coupled, which provides some interesting capabilities that you’ll see in the next section.</p>
<h3 id="composition-to-change-run-time-behavior">Composition to Change Run-Time Behavior</h3>
<p>Inheritance, as opposed to composition, is a tightly coupled relationship. With inheritance, method overriding is the only way to change and customize the behavior of a base class. This creates rigid designs that are difficult to change.</p>
<p>Composition, on the other hand, provides a loosely coupled relationship that enables flexible designs and can be used to change behavior at run-time.</p>
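<p>As a minimal illustration of changing behavior at run-time, the sketch below swaps the object that a composite delegates to. All names here (<code>FlatRate</code>, <code>HalfRate</code>, <code>Payee</code>, <code>set_policy()</code>) are hypothetical and not part of the tutorial's code:</p>

```python
class FlatRate:
    """Hypothetical policy that always pays a fixed amount."""

    def calculate(self):
        return 1000


class HalfRate:
    """Hypothetical policy that pays half of the flat amount."""

    def calculate(self):
        return 500


class Payee:
    """Composite that delegates the calculation to whatever policy it holds."""

    def __init__(self, policy):
        self._policy = policy

    def set_policy(self, policy):
        # Replacing the component changes behavior without subclassing Payee.
        self._policy = policy

    def pay(self):
        return self._policy.calculate()


payee = Payee(FlatRate())
before = payee.pay()

payee.set_policy(HalfRate())  # same object, new behavior
after = payee.pay()

print(before, after)
```

<p>With inheritance, getting the second behavior would typically mean creating a new object of a different subclass. Here, the existing <code>payee</code> object keeps its identity and only its component changes.</p>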
<p>Imagine you need to support a long-term disability (LTD) policy when calculating payroll. The policy states that an employee on LTD should be paid 60% of their weekly salary assuming 40 hours of work.</p>
<p>With an inheritance design, this can be a very difficult requirement to support. Adding it to the composition example is a lot easier. Let’s start by adding the policy class:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In hr.py</span>
<span class="k">class</span> <span class="nc">LTDPolicy</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_base_policy</span> <span class="o">=</span> <span class="kc">None</span>

    <span class="k">def</span> <span class="nf">track_work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_check_base_policy</span><span class="p">()</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_base_policy</span><span class="o">.</span><span class="n">track_work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_check_base_policy</span><span class="p">()</span>
        <span class="n">base_salary</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_base_policy</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">base_salary</span> <span class="o">*</span> <span class="mf">0.6</span>

    <span class="k">def</span> <span class="nf">apply_to_policy</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">base_policy</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_base_policy</span> <span class="o">=</span> <span class="n">base_policy</span>

    <span class="k">def</span> <span class="nf">_check_base_policy</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">_base_policy</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">RuntimeError</span><span class="p">(</span><span class="s1">'Base policy missing'</span><span class="p">)</span>
</pre></div>
<p>Notice that <code>LTDPolicy</code> doesn’t inherit from <code>PayrollPolicy</code>, but implements the same interface. This is because the implementation is completely different, so you don’t want to inherit any of the <code>PayrollPolicy</code> implementation.</p>
<p>The <code>LTDPolicy</code> initializes <code>_base_policy</code> to <code>None</code>, and provides an internal <code>._check_base_policy()</code> method that raises an exception if the <code>._base_policy</code> has not been applied. Then, it provides a <code>.apply_to_policy()</code> method to assign the <code>_base_policy</code>.</p>
<p>The public interface first checks that the <code>_base_policy</code> has been applied, and then implements the functionality in terms of that base policy. The <code>.track_work()</code> method just delegates to the base policy, and <code>.calculate_payroll()</code> uses it to calculate the <code>base_salary</code> and then returns 60% of it.</p>
<p>You can now make a small change to the <code>Employee</code> class:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In employees.py</span>
<span class="k">class</span> <span class="nc">Employee</span><span class="p">(</span><span class="n">AsDictionaryMixin</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="n">info</span> <span class="o">=</span> <span class="n">employee_database</span><span class="o">.</span><span class="n">get_employee_info</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">info</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'name'</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">address</span> <span class="o">=</span> <span class="n">get_employee_address</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_role</span> <span class="o">=</span> <span class="n">get_role</span><span class="p">(</span><span class="n">info</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'role'</span><span class="p">))</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span> <span class="o">=</span> <span class="n">get_policy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">work</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hours</span><span class="p">):</span>
        <span class="n">duties</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_role</span><span class="o">.</span><span class="n">perform_duties</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'Employee </span><span class="si">{self.id}</span><span class="s1"> - </span><span class="si">{self.name}</span><span class="s1">:'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s1">'- </span><span class="si">{duties}</span><span class="s1">'</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="o">.</span><span class="n">track_work</span><span class="p">(</span><span class="n">hours</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">calculate_payroll</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="o">.</span><span class="n">calculate_payroll</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">apply_payroll_policy</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">new_policy</span><span class="p">):</span>
        <span class="n">new_policy</span><span class="o">.</span><span class="n">apply_to_policy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_payroll</span> <span class="o">=</span> <span class="n">new_policy</span>
</pre></div>
<p>You added an <code>.apply_payroll_policy()</code> method that applies the existing payroll policy to the new policy and then substitutes it. You can now modify the program to apply the policy to an <code>Employee</code> object:</p>
<div class="highlight python"><pre><span></span><span class="c1"># In program.py</span>
<span class="kn">from</span> <span class="nn">hr</span> <span class="k">import</span> <span class="n">calculate_payroll</span><span class="p">,</span> <span class="n">LTDPolicy</span>
<span class="kn">from</span> <span class="nn">productivity</span> <span class="k">import</span> <span class="n">track</span>
<span class="kn">from</span> <span class="nn">employees</span> <span class="k">import</span> <span class="n">employee_database</span>
<span class="n">employees</span> <span class="o">=</span> <span class="n">employee_database</span><span class="o">.</span><span class="n">employees</span>
<span class="n">sales_employee</span> <span class="o">=</span> <span class="n">employees</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
<span class="n">ltd_policy</span> <span class="o">=</span> <span class="n">LTDPolicy</span><span class="p">()</span>
<span class="n">sales_employee</span><span class="o">.</span><span class="n">apply_payroll_policy</span><span class="p">(</span><span class="n">ltd_policy</span><span class="p">)</span>
<span class="n">track</span><span class="p">(</span><span class="n">employees</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span>
<span class="n">calculate_payroll</span><span class="p">(</span><span class="n">employees</span><span class="p">)</span>
</pre></div>
<p>The program accesses <code>sales_employee</code>, which is located at index <code>2</code>, creates the <code>LTDPolicy</code> object, and applies the policy to the employee. When <code>.calculate_payroll()</code> is called, the change is reflected. You can run the program to evaluate the output:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> python program.py
<span class="go">Tracking Employee Productivity</span>
<span class="go">==============================</span>
<span class="go">Employee 1 - Mary Poppins:</span>
<span class="go">- screams and yells for 40 hours.</span>
<span class="go">Employee 2 - John Smith:</span>
<span class="go">- Does paperwork for 40 hours.</span>
<span class="go">Employee 3 - Kevin Bacon:</span>
<span class="go">- Expends 40 hours on the phone.</span>
<span class="go">Employee 4 - Jane Doe:</span>
<span class="go">- Manufactures gadgets for 40 hours.</span>
<span class="go">Employee 5 - Robin Williams:</span>
<span class="go">- Does paperwork for 40 hours.</span>
<span class="go">Calculating Payroll</span>
<span class="go">===================</span>
<span class="go">Payroll for: 1 - Mary Poppins</span>
<span class="go">- Check amount: 3000</span>
<span class="go">- Sent to:</span>
<span class="go">121 Admin Rd.</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 2 - John Smith</span>
<span class="go">- Check amount: 1500</span>
<span class="go">- Sent to:</span>
<span class="go">67 Paperwork Ave</span>
<span class="go">Manchester, NH 03101</span>
<span class="go">Payroll for: 3 - Kevin Bacon</span>
<span class="go">- Check amount: 1080.0</span>
<span class="go">- Sent to:</span>
<span class="go">15 Rose St</span>
<span class="go">Apt. B-1</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 4 - Jane Doe</span>
<span class="go">- Check amount: 600</span>
<span class="go">- Sent to:</span>
<span class="go">39 Sole St.</span>
<span class="go">Concord, NH 03301</span>
<span class="go">Payroll for: 5 - Robin Williams</span>
<span class="go">- Check amount: 360</span>
<span class="go">- Sent to:</span>
<span class="go">99 Mountain Rd.</span>
<span class="go">Concord, NH 03301</span>
</pre></div>
<p>The check amount for employee Kevin Bacon, who is the sales employee, is now for $1080 instead of $1800. That’s because the <code>LTDPolicy</code> has been applied to the salary.</p>
<p>As you can see, you were able to support the changes just by adding a new policy and modifying a couple of interfaces. This is the kind of flexibility that policy design based on composition gives you.</p>
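<p>To see the wrapping mechanism in isolation, here is a minimal, self-contained sketch. <code>FlatSalaryPolicy</code> is a hypothetical stand-in for the payroll policies used in this article, and the 1800 figure is chosen only to mirror the sales employee’s previous check. The <code>LTDPolicy</code> body follows the listing shown earlier, with the <code>__init__</code> written out as described in the text.</p>

```python
class FlatSalaryPolicy:
    """Hypothetical stand-in for one of the article's payroll policies."""

    def __init__(self, weekly_salary):
        self.weekly_salary = weekly_salary

    def track_work(self, hours):
        pass  # A flat salary does not depend on the hours worked.

    def calculate_payroll(self):
        return self.weekly_salary


class LTDPolicy:
    """Wraps another policy and pays 60% of what that policy would pay."""

    def __init__(self):
        self._base_policy = None

    def track_work(self, hours):
        self._check_base_policy()
        return self._base_policy.track_work(hours)

    def calculate_payroll(self):
        self._check_base_policy()
        return self._base_policy.calculate_payroll() * 0.6

    def apply_to_policy(self, base_policy):
        self._base_policy = base_policy

    def _check_base_policy(self):
        if not self._base_policy:
            raise RuntimeError('Base policy missing')


base = FlatSalaryPolicy(1800)
ltd = LTDPolicy()
ltd.apply_to_policy(base)  # Wrap the existing policy at run time.
ltd.track_work(40)         # Delegated to the wrapped policy.
reduced = ltd.calculate_payroll()
print(reduced)
```

<p>Because <code>LTDPolicy</code> only relies on the <code>.track_work()</code> and <code>.calculate_payroll()</code> interface, it can wrap any policy object that provides those two methods.</p>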
<h3 id="choosing-between-inheritance-and-composition-in-python_1">Choosing Between Inheritance and Composition in Python</h3>
<p>Python, as an object-oriented programming language, supports both inheritance and composition. You saw that inheritance is best used to model an <strong>is a</strong> relationship, whereas composition models a <strong>has a</strong> relationship.</p>
<p>Sometimes, it’s hard to see what the relationship between two classes should be, but you can follow these guidelines:</p>
<ul>
<li>
<p><strong>Use inheritance over composition in Python</strong> to model a clear <strong>is a</strong> relationship. First, justify the relationship between the derived class and its base. Then, reverse the relationship and try to justify it. If you can justify the relationship in both directions, then you should not use inheritance between them.</p>
</li>
<li>
<p><strong>Use inheritance over composition in Python</strong> to leverage both the interface and implementation of the base class.</p>
</li>
<li>
<p><strong>Use inheritance over composition in Python</strong> to provide <strong>mixin</strong> features to several unrelated classes when there is only one implementation of that feature.</p>
</li>
<li>
<p><strong>Use composition over inheritance in Python</strong> to model a <strong>has a</strong> relationship that leverages the implementation of the component class.</p>
</li>
<li>
<p><strong>Use composition over inheritance in Python</strong> to create components that can be reused by multiple classes in your Python applications.</p>
</li>
<li>
<p><strong>Use composition over inheritance in Python</strong> to implement groups of behaviors and policies that can be applied interchangeably to other classes to customize their behavior.</p>
</li>
<li>
<p><strong>Use composition over inheritance in Python</strong> to enable run-time behavior changes without affecting existing classes.</p>
</li>
</ul>
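<p>Two of these guidelines can be sketched in a few lines. All class names here (<code>AsDictMixin</code>, <code>Address</code>, <code>Customer</code>, <code>Supplier</code>) are hypothetical and exist only for illustration: the mixin gives unrelated classes a single shared implementation through inheritance, while <code>Address</code> is a component that several classes reuse through composition.</p>

```python
class AsDictMixin:
    """Mixin: one implementation shared by otherwise unrelated classes."""

    def to_dict(self):
        # Export public attributes only; nested components are converted too.
        return {
            name: value.to_dict() if isinstance(value, AsDictMixin) else value
            for name, value in vars(self).items()
            if not name.startswith('_')
        }


class Address(AsDictMixin):
    """Component that other classes reuse through composition."""

    def __init__(self, street, city):
        self.street = street
        self.city = city


class Customer(AsDictMixin):
    def __init__(self, name, address):
        self.name = name
        self.address = address  # A Customer *has an* Address.


class Supplier(AsDictMixin):
    def __init__(self, company, address):
        self.company = company
        self.address = address  # The same component type, reused.


office = Address('15 Rose St', 'Concord')
customer = Customer('Kevin Bacon', office)
supplier = Supplier('Gadgets Inc.', office)
customer_data = customer.to_dict()
print(customer_data)
```

<p>Neither <code>Customer</code> nor <code>Supplier</code> <em>is an</em> <code>Address</code>, so composition fits there, while the mixin is inherited purely to reuse one implementation of <code>.to_dict()</code>.</p>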
<h2 id="conclusion">Conclusion</h2>
<p>You explored <strong>inheritance and composition in Python</strong>. You learned about the type of relationships that inheritance and composition create. You also went through a series of exercises to understand how inheritance and composition are implemented in Python.</p>
<p>In this article, you learned how to:</p>
<ul>
<li>Use inheritance to express an <strong>is a</strong> relationship between two classes</li>
<li>Evaluate if inheritance is the right relationship</li>
<li>Use multiple inheritance in Python and evaluate Python’s MRO to troubleshoot multiple inheritance problems</li>
<li>Extend classes with mixins and reuse their implementation</li>
<li>Use composition to express a <strong>has a</strong> relationship between two classes</li>
<li>Provide flexible designs using composition</li>
<li>Reuse existing code through policy design based on composition</li>
</ul>
<h2 id="recommended-reading">Recommended Reading</h2>
<p>Here are some books and articles that further explore object-oriented design and can help you understand the correct use of inheritance and composition in Python or other languages:</p>
<ul>
<li><a href="https://realpython.com/asins/B000SEIBB8">Design Patterns: Elements of Reusable Object-Oriented Software</a></li>
<li><a href="https://realpython.com/asins/B00AA36RZY">Head First Design Patterns: A Brain-Friendly Guide</a></li>
<li><a href="https://realpython.com/asins/B001GSTOAM">Clean Code: A Handbook of Agile Software Craftsmanship</a></li>
<li><a href="https://en.wikipedia.org/wiki/SOLID">SOLID Principles</a></li>
<li><a href="https://en.wikipedia.org/wiki/Liskov_substitution_principle">Liskov Substitution Principle</a></li>
</ul>
11 Beginner Tips for Learning Python
https://realpython.com/courses/python-beginner-tips/
2019-08-06T14:00:00+00:00
In this course, you'll see several learning strategies and tips that will help you jumpstart your journey towards becoming a rockstar Python programmer!
<p>We are so excited that you have decided to embark on the journey of learning Python! One of the most common questions we receive from our readers is “What’s the best way to learn Python?”</p>
<p>The first step in learning any programming language is making sure that you understand how to learn. Learning how to learn is arguably the most critical skill involved in computer programming.</p>
<p>Why is knowing how to learn so important? Languages evolve, libraries are created, and tools are upgraded. Knowing how to learn will be essential to keeping up with these changes and becoming a successful programmer.</p>
<p>In this course, you’ll see several learning strategies that will help you jumpstart your journey towards becoming a rockstar Python programmer!</p>
What You Need to Know to Manage Users in Django Admin
https://realpython.com/manage-users-in-django-admin/
2019-08-05T14:00:00+00:00
In this Python tutorial, you'll learn what you need to know to manage users in Django admin. Out of the box, Django admin doesn't enforce special restrictions on the user admin. This can lead to dangerous scenarios that might compromise your system.
<p>User management in Django admin is a tricky subject. If you enforce too many permissions, then you might interfere with day-to-day operations. If you allow for permissions to be granted freely without supervision, then you put your system at risk.</p>
<p>Django provides a good authentication framework with tight integration to Django admin. Out of the box, Django admin does not enforce special restrictions on the user admin. This can lead to dangerous scenarios that might compromise your system.</p>
<p>Did you know staff users that manage other users in the admin can edit their own permissions? Did you know they can also make themselves superusers? There is nothing in Django admin that prevents that, so it’s up to you!</p>
<p><strong>By the end of this tutorial, you’ll know how to protect your system:</strong></p>
<ul>
<li><strong>Protect against permission escalation</strong> by preventing users from editing their own permissions</li>
<li><strong>Keep permissions tidy and maintainable</strong> by forcing users to manage permissions only through groups</li>
<li><strong>Prevent permissions from leaking through custom actions</strong> by explicitly enforcing the necessary permissions</li>
</ul>
<div class="alert alert-primary" role="alert">
<p><strong>Follow Along:</strong></p>
<p>To follow along with this tutorial, it’s best to set up a small project to play with. If you aren’t sure how to do that, then check out <a href="https://realpython.com/get-started-with-django-1/#hello-world">Get Started With Django</a>.</p>
<p>This tutorial also assumes a basic understanding of user management in Django. If you aren’t familiar with that, then check out <a href="https://docs.djangoproject.com/en/2.2/topics/auth/">the official documentation</a>.</p>
</div>
<div class="alert alert-warning" role="alert"><p><strong>Free Bonus:</strong> <a href="#" class="alert-link" data-toggle="modal" data-target="#modal-django-resources-learing-guide" data-focus="false">Click here to get access to a free Django Learning Resources Guide (PDF)</a> that shows you tips and tricks as well as common pitfalls to avoid when building Python + Django web applications.</p></div>
<h2 id="model-permissions">Model Permissions</h2>
<p>Permissions are tricky. If you don’t set permissions, then you put your system at risk of intruders, data leaks, and human errors. If you abuse permissions or use them too much, then you risk interfering with day-to-day operations.</p>
<p>Django comes with a built-in <a href="https://docs.djangoproject.com/en/2.2/topics/auth/">authentication system</a>. The authentication system includes users, groups, and permissions.</p>
<p>When a model is created, Django will automatically create four <a href="https://docs.djangoproject.com/en/2.2/topics/auth/default/#default-permissions">default permissions</a> for the following actions:</p>
<ol>
<li><strong><code>add</code>:</strong> Users with this permission can add an instance of the model.</li>
<li><strong><code>delete</code>:</strong> Users with this permission can delete an instance of the model.</li>
<li><strong><code>change</code>:</strong> Users with this permission can update an instance of the model.</li>
<li><strong><code>view</code>:</strong> Users with this permission can view instances of this model. This permission was a much anticipated one, and it was finally added in Django 2.1.</li>
</ol>
<p>Permission names follow a very specific naming convention: <code><app>.<action>_<modelname></code>.</p>
<p>Let’s break that down:</p>
<ul>
<li><strong><code><app></code></strong> is the name of the app. For example, the <code>User</code> model is imported from the <code>auth</code> app (<code>django.contrib.auth</code>).</li>
<li><strong><code><action></code></strong> is one of the actions above (<code>add</code>, <code>delete</code>, <code>change</code>, or <code>view</code>).</li>
<li><strong><code><modelname></code></strong> is the name of the model, in all lowercase letters.</li>
</ul>
<p>Knowing this naming convention can help you manage permissions more easily. For example, the name of the permission to change a user is <code>auth.change_user</code>.</p>
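<p>As a quick illustration of the naming convention, the following sketch builds permission names for the four default actions. The helper <code>permission_name()</code> is hypothetical and not part of Django. It only covers the default permissions, not custom ones you might define yourself.</p>

```python
DEFAULT_ACTIONS = ('add', 'change', 'delete', 'view')


def permission_name(app_label, action, model_name):
    """Build a name of the form <app>.<action>_<modelname>.

    Hypothetical helper for illustration; it is not part of Django.
    """
    if action not in DEFAULT_ACTIONS:
        raise ValueError(f'Unknown default action: {action!r}')
    # The model name is always lowercase in the permission name.
    return f'{app_label}.{action}_{model_name.lower()}'


change_user = permission_name('auth', 'change', 'User')
all_user_perms = [
    permission_name('auth', action, 'User') for action in DEFAULT_ACTIONS
]
print(change_user)
print(all_user_perms)
```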
<h3 id="how-to-check-permissions">How to Check Permissions</h3>
<p>Model permissions are granted to users or groups. To check if a user has a certain permission, you can do the following:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="gp">>>> </span><span class="n">u</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create_user</span><span class="p">(</span><span class="n">username</span><span class="o">=</span><span class="s1">'haki'</span><span class="p">)</span>
<span class="hll"><span class="gp">>>> </span><span class="n">u</span><span class="o">.</span><span class="n">has_perm</span><span class="p">(</span><span class="s1">'auth.change_user'</span><span class="p">)</span>
</span><span class="go">False</span>
</pre></div>
<p>It’s worth mentioning that <a href="https://github.com/django/django/blob/bf9e0e342da3ed2f74ee0ec34e75bdcbedde40a9/django/contrib/auth/models.py#L255"><code>.has_perm()</code> will always return <code>True</code></a> for an active superuser, even if the permission doesn’t really exist:</p>
<div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">>>></span><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="gp">>>> </span><span class="n">superuser</span> <span class="o">=</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create_superuser</span><span class="p">(</span>
<span class="gp">... </span>    <span class="n">username</span><span class="o">=</span><span class="s1">'superhaki'</span><span class="p">,</span>
<span class="gp">... </span>    <span class="n">email</span><span class="o">=</span><span class="s1">'me@hakibenita.com'</span><span class="p">,</span>
<span class="gp">... </span>    <span class="n">password</span><span class="o">=</span><span class="s1">'secret'</span><span class="p">,</span>
<span class="gp">... </span><span class="p">)</span>
<span class="hll"><span class="gp">>>> </span><span class="n">superuser</span><span class="o">.</span><span class="n">has_perm</span><span class="p">(</span><span class="s1">'does.not.exist'</span><span class="p">)</span>
</span><span class="go">True</span>
</pre></div>
<p>As you can see, when you’re checking permissions for a superuser, the permissions are not really being checked.</p>
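<p>A simplified, pure-Python model of that behavior might look like the sketch below. <code>SketchUser</code> is a hypothetical stand-in, not Django’s <code>User</code> model. It only mirrors the idea that an active superuser short-circuits the check before any permission is looked up.</p>

```python
class SketchUser:
    """Hypothetical stand-in for a user object; not Django's User model."""

    def __init__(self, is_active=True, is_superuser=False, permissions=()):
        self.is_active = is_active
        self.is_superuser = is_superuser
        self._permissions = set(permissions)

    def has_perm(self, perm):
        # Active superusers pass every check. The permission name is never
        # looked up, so even a nonexistent permission yields True.
        if self.is_active and self.is_superuser:
            return True
        # Everyone else needs to be active and hold the permission.
        return self.is_active and perm in self._permissions


regular = SketchUser(permissions={'auth.view_user'})
superuser = SketchUser(is_superuser=True)
inactive_superuser = SketchUser(is_active=False, is_superuser=True)

print(regular.has_perm('auth.change_user'))
print(superuser.has_perm('does.not.exist'))
print(inactive_superuser.has_perm('auth.view_user'))
```

<p>In this model, an inactive superuser gets no shortcut, which is why the text above speaks of an <em>active</em> superuser.</p>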
<h3 id="how-to-enforce-permissions">How to Enforce Permissions</h3>
<p>Django models don’t enforce permissions themselves. The only place where permissions are enforced out of the box is Django admin.</p>
<p>The reason models don’t enforce permissions is that, normally, the model is unaware of the user performing the action. In Django apps, the user is usually obtained from the request. This is why, most of the time, permissions are enforced at the view layer.</p>
<p>For example, to prevent a user without view permissions on the <code>User</code> model from accessing a view that shows user information, do the following:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">django.core.exceptions</span> <span class="k">import</span> <span class="n">PermissionDenied</span>
<span class="k">def</span> <span class="nf">users_list_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">has_perm</span><span class="p">(</span><span class="s1">'auth.view_user'</span><span class="p">):</span>
        <span class="k">raise</span> <span class="n">PermissionDenied</span><span class="p">()</span>
</pre></div>
<p>If the user making the request logged in and was authenticated, then <a href="https://docs.djangoproject.com/en/2.2/ref/request-response/#django.http.HttpRequest.user"><code>request.user</code></a> will hold an instance of <code>User</code>. If the user did not log in, then <code>request.user</code> will be an instance of <a href="https://docs.djangoproject.com/en/2.2/ref/contrib/auth/#anonymoususer-object"><code>AnonymousUser</code></a>. This is a special object used by Django to indicate an unauthenticated user. Using <code>has_perm</code> on <code>AnonymousUser</code> will always return <code>False</code>.</p>
<p>If the user making the request doesn’t have the <code>view_user</code> permission, then you raise a <code>PermissionDenied</code> exception, and a response with status <code>403</code> is returned to the client.</p>
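<p>To make it easier to enforce permissions in views, Django provides a shortcut <a href="https://realpython.com/primer-on-python-decorators/">decorator</a> called <a href="https://docs.djangoproject.com/en/2.2/topics/auth/default/#the-permission-required-decorator"><code>permission_required</code></a> that performs the same check. Note that by default it redirects users who fail the check to the login page rather than raising <code>PermissionDenied</code>, unless you pass <code>raise_exception=True</code>. The basic usage looks like this:</p>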
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib.auth.decorators</span> <span class="k">import</span> <span class="n">permission_required</span>
<span class="nd">@permission_required</span><span class="p">(</span><span class="s1">'auth.view_user'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">users_list_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
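<p>To illustrate how a decorator can wrap a permission check around a view, here’s a minimal sketch in plain Python. Everything in it (<code>require_perm</code>, <code>SketchPermissionDenied</code>, <code>FakeUser</code>, <code>FakeRequest</code>) is hypothetical and merely stands in for Django’s objects. It is not how <code>permission_required</code> is implemented.</p>

```python
import functools


class SketchPermissionDenied(Exception):
    """Hypothetical stand-in for django.core.exceptions.PermissionDenied."""


def require_perm(perm):
    """Hypothetical decorator: run the view only if the user has `perm`."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            if not request.user.has_perm(perm):
                raise SketchPermissionDenied(perm)
            return view(request, *args, **kwargs)
        return wrapper
    return decorator


class FakeUser:
    """Hypothetical user that holds a fixed set of permission names."""

    def __init__(self, permissions=()):
        self._permissions = set(permissions)

    def has_perm(self, perm):
        return perm in self._permissions


class FakeRequest:
    """Hypothetical request object that only carries a user."""

    def __init__(self, user):
        self.user = user


@require_perm('auth.view_user')
def users_list_view(request):
    return 'user list'


allowed = users_list_view(FakeRequest(FakeUser({'auth.view_user'})))
print(allowed)
```

<p>The view body stays free of permission logic, and the same decorator can guard any number of views with different permission names.</p>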
<p>To enforce permissions in templates, you can access the current user permissions through a special template variable called <a href="https://docs.djangoproject.com/en/2.2/topics/auth/default/#permissions"><code>perms</code></a>. For example, if you want to show a delete button only to users with delete permission, then do the following:</p>
<div class="highlight htmldjango"><pre><span></span><span class="cp">{%</span> <span class="k">if</span> <span class="nv">perms.auth.delete_user</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">button</span><span class="p">></span>Delete user!<span class="p"></</span><span class="nt">button</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endif</span> <span class="cp">%}</span>
</pre></div>
<p>Some popular third-party apps, such as <a href="https://www.django-rest-framework.org/">Django REST framework</a>, also provide <a href="https://www.django-rest-framework.org/api-guide/permissions/#djangomodelpermissions">useful integration with Django model permissions</a>.</p>
<h3 id="django-admin-and-model-permissions">Django Admin and Model Permissions</h3>
<p>Django admin has a very <a href="https://docs.djangoproject.com/en/2.2/topics/auth/default/#permissions-and-authorization">tight integration</a> with the built-in authentication system, and model permissions in particular. Out of the box, Django admin enforces model permissions:</p>
<ul>
<li>If the user has no permissions on a model, then they won’t be able to see it or access it in the admin.</li>
<li>If the user has view and change permissions on a model, then they will be able to view and update instances, but they won’t be able to add new instances or delete existing ones.</li>
</ul>
<p>With proper permissions in place, admin users are less likely to make mistakes, and intruders will have a harder time causing harm.</p>
<h2 id="implement-custom-business-roles-in-django-admin">Implement Custom Business Roles in Django Admin</h2>
<p>One of the most vulnerable places in every app is the authentication system. In Django apps, this is the <code>User</code> model. So, to better protect your app, you are going to start with the <code>User</code> model.</p>
<p>First, you need to take control over the <code>User</code> model admin page. Django already comes with a very nice admin page to manage users. To take advantage of that great work, you are going to extend the built-in <code>User</code> admin model.</p>
<h3 id="setup-a-custom-user-admin">Setup: A Custom User Admin</h3>
<p>To provide a custom admin for the <code>User</code> model, you need to unregister the existing model admin provided by Django, and register one of your own:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib</span> <span class="k">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.admin</span> <span class="k">import</span> <span class="n">UserAdmin</span>
<span class="c1"># Unregister the provided model admin</span>
<span class="n">admin</span><span class="o">.</span><span class="n">site</span><span class="o">.</span><span class="n">unregister</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
<span class="c1"># Register our own model admin, based on the default UserAdmin</span>
<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomUserAdmin</span><span class="p">(</span><span class="n">UserAdmin</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
<p>Your <code>CustomUserAdmin</code> is extending Django’s <code>UserAdmin</code>. You did that so you can take advantage of all the work already done by the Django developers.</p>
<p>At this point, if you log into your Django admin at <code>http://127.0.0.1:8000/admin/auth/user</code>, you should see the user admin unchanged:</p>
<p><a href="https://files.realpython.com/media/django-bare-boned-user-admin.4ac55297d529.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/django-bare-boned-user-admin.4ac55297d529.png" width="861" height="671" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/django-bare-boned-user-admin.4ac55297d529.png&w=215&sig=602db70adb2b1b9f4747c90ae4ba0c6e4860f4ad 215w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/django-bare-boned-user-admin.4ac55297d529.png&w=430&sig=c425295f2d8e0abb534dfe50a5f84dc40876726b 430w, https://files.realpython.com/media/django-bare-boned-user-admin.4ac55297d529.png 861w" sizes="75vw" alt="Django bare boned user admin"/></a></p>
<p>By extending <code>UserAdmin</code>, you are able to use all the built-in features provided by Django admin.</p>
<h3 id="prevent-update-of-fields">Prevent Update of Fields</h3>
<p>Unattended admin forms are a prime candidate for horrible mistakes. A staff user can easily update a model instance through the admin in a way the app does not expect. Most of the time, the user won’t even notice something is wrong. Such mistakes are usually very hard to track down and fix.</p>
<p>To prevent such mistakes from happening, you can prevent admin users from modifying certain fields in the model.</p>
<p>If you want to prevent any user, including superusers, from updating a field, you can mark the field as read only. For example, the field <code>date_joined</code> is set when a user registers. This information should never be changed by any user, so you mark it as read only:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib</span> <span class="k">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.admin</span> <span class="k">import</span> <span class="n">UserAdmin</span>
<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomUserAdmin</span><span class="p">(</span><span class="n">UserAdmin</span><span class="p">):</span>
    <span class="n">readonly_fields</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'date_joined'</span><span class="p">,</span>
    <span class="p">]</span>
</pre></div>
<p>When a field is added to <code>readonly_fields</code>, it will not be editable in the admin default change form. Instead of rendering an input element, Django displays the field’s value as plain text.</p>
<p>But, what if you want to prevent only some users from updating a field?</p>
<h3 id="conditionally-prevent-update-of-fields">Conditionally Prevent Update of Fields</h3>
<p>Sometimes it’s useful to update fields directly in the admin. But you don’t want to let just any user do it: only superusers should be allowed.</p>
<p>Let’s say you want to prevent non-superusers from changing a user’s username. To do that, you need to modify the change form generated by Django, and disable the username field based on the current user:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib</span> <span class="k">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.admin</span> <span class="k">import</span> <span class="n">UserAdmin</span>


<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomUserAdmin</span><span class="p">(</span><span class="n">UserAdmin</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_form</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">obj</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">form</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_form</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">obj</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">is_superuser</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">is_superuser</span>

<span class="hll">        <span class="k">if</span> <span class="ow">not</span> <span class="n">is_superuser</span><span class="p">:</span>
</span><span class="hll">            <span class="n">form</span><span class="o">.</span><span class="n">base_fields</span><span class="p">[</span><span class="s1">'username'</span><span class="p">]</span><span class="o">.</span><span class="n">disabled</span> <span class="o">=</span> <span class="kc">True</span>
</span>
        <span class="k">return</span> <span class="n">form</span>
</pre></div>
<p>Let’s break it down:</p>
<ul>
<li>To make adjustments to the form, you override <a href="https://docs.djangoproject.com/en/2.1/ref/contrib/admin/#django.contrib.admin.ModelAdmin.get_form"><code>get_form()</code></a>. This function is used by Django to generate a default change form for a model.</li>
<li>To conditionally disable the field, you first fetch the default form generated by Django, and then if the user is not a superuser, disable the username field.</li>
</ul>
<p>Now, when a non-superuser tries to edit a user, the username field will be disabled. Any attempt to modify the username through Django Admin will fail. When a superuser tries to edit the user, the username field will be editable and behave as expected.</p>
<h3 id="prevent-non-superusers-from-granting-superuser-rights">Prevent Non-Superusers From Granting Superuser Rights</h3>
<p>Superuser is a very strong permission that should not be granted lightly. However, any user with a change permission on the <code>User</code> model can make any user a superuser, including themselves. This goes against the whole purpose of the permission system, so you want to close this hole.</p>
<p>Based on the previous example, to prevent non-superusers from making themselves superusers, you add the following restriction:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="k">import</span> <span class="n">Set</span>

<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="k">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.admin</span> <span class="k">import</span> <span class="n">UserAdmin</span>


<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomUserAdmin</span><span class="p">(</span><span class="n">UserAdmin</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_form</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">obj</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">form</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_form</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">obj</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">is_superuser</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">is_superuser</span>
        <span class="n">disabled_fields</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>  <span class="c1"># type: Set[str]</span>

        <span class="k">if</span> <span class="ow">not</span> <span class="n">is_superuser</span><span class="p">:</span>
            <span class="n">disabled_fields</span> <span class="o">|=</span> <span class="p">{</span>
                <span class="s1">'username'</span><span class="p">,</span>
<span class="hll">                <span class="s1">'is_superuser'</span><span class="p">,</span>
</span>            <span class="p">}</span>

        <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">disabled_fields</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">form</span><span class="o">.</span><span class="n">base_fields</span><span class="p">:</span>
                <span class="n">form</span><span class="o">.</span><span class="n">base_fields</span><span class="p">[</span><span class="n">f</span><span class="p">]</span><span class="o">.</span><span class="n">disabled</span> <span class="o">=</span> <span class="kc">True</span>

        <span class="k">return</span> <span class="n">form</span>
</pre></div>
<p>Compared to the previous example, you made the following additions:</p>
<ol>
<li>
<p>You initialized an empty set <code>disabled_fields</code> that will hold the fields to disable. <code>set</code> is a data structure that holds unique values. It makes sense to use a set in this case, because you only need to disable a field once. The operator <code>|=</code> performs an in-place union: it adds every element of the set on the right to the set on the left. For more information about sets, check out <a href="https://realpython.com/python-sets/">Sets in Python</a>.</p>
</li>
<li>
<p>Next, if the user is not a superuser, you add two fields to the set (<code>username</code> from the previous example, and <code>is_superuser</code>). Disabling them prevents non-superusers from making themselves superusers.</p>
</li>
<li>
<p>Lastly, you iterate over the fields in the set, mark all of them as disabled, and return the form.</p>
</li>
</ol>
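<p>The set behavior described in the list above can be tried outside Django. The following is a minimal sketch in plain Python; the second update is a hypothetical later rule, added only to show that repeated names are stored once:</p>

```python
disabled_fields = set()

# First rule: non-superusers may not rename users or grant superuser status.
disabled_fields |= {"username", "is_superuser"}

# A hypothetical later rule can name a field again; the set keeps one copy.
disabled_fields |= {"is_superuser", "user_permissions"}

print(sorted(disabled_fields))
# ['is_superuser', 'user_permissions', 'username']
```

<p>Because the set ignores duplicates, each rule can list every field it cares about without checking what earlier rules already added.</p>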
<p><strong>Django User Admin Two-Step Form</strong></p>
<p>When you create a new user in Django admin, you go through a two-step form. In the first form, you fill in the username and password. In the second form, you update the rest of the fields.</p>
<p>This two-step process is unique to the <code>User</code> model. To accommodate this unique process, you must verify that the field exists before you try to disable it. Otherwise, you might get a <code>KeyError</code>. This is not necessary if you customize other model admins.</p>
<p>For more information about <code>KeyError</code>, check out <a href="https://realpython.com/python-keyerror/">Python KeyError Exceptions and How to Handle Them</a>.</p>
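<p>To see why the membership check matters, here is a small sketch that uses plain dictionaries in place of Django form fields. The field names in <code>add_form_fields</code> are an assumption about what the first step of the form contains, chosen for illustration:</p>

```python
# Assumed field names for the first "add user" step (illustration only).
add_form_fields = {"username": {}, "password1": {}, "password2": {}}

disabled_fields = {"username", "is_superuser"}

# Unguarded access fails for a field the form does not contain.
missing = None
try:
    add_form_fields["is_superuser"]["disabled"] = True
except KeyError as exc:
    missing = exc.args[0]

# Guarded access skips the missing field instead of raising.
for name in disabled_fields:
    if name in add_form_fields:
        add_form_fields[name]["disabled"] = True

print(missing)                      # is_superuser
print(add_form_fields["username"])  # {'disabled': True}
```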
<h3 id="grant-permissions-only-using-groups">Grant Permissions Only Using Groups</h3>
<p>The way permissions are managed is very specific to each team, product, and company. I found that it’s easier to manage permissions in groups. In my own projects, I create groups for support, content editors, analysts, and so on. I found that managing permissions at the user level can be a real hassle. When new models are added, or when business requirements change, it’s tedious to update each individual user.</p>
<p>To manage permissions only using groups, you need to prevent users from granting permissions to specific users. Instead, you want to only allow associating users with groups. To do that, disable the field <code>user_permissions</code> for all non-superusers:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="k">import</span> <span class="n">Set</span>

<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="k">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.admin</span> <span class="k">import</span> <span class="n">UserAdmin</span>


<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomUserAdmin</span><span class="p">(</span><span class="n">UserAdmin</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_form</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">obj</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">form</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_form</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">obj</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">is_superuser</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">is_superuser</span>
        <span class="n">disabled_fields</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>  <span class="c1"># type: Set[str]</span>

        <span class="k">if</span> <span class="ow">not</span> <span class="n">is_superuser</span><span class="p">:</span>
            <span class="n">disabled_fields</span> <span class="o">|=</span> <span class="p">{</span>
                <span class="s1">'username'</span><span class="p">,</span>
                <span class="s1">'is_superuser'</span><span class="p">,</span>
<span class="hll">                <span class="s1">'user_permissions'</span><span class="p">,</span>
</span>            <span class="p">}</span>

        <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">disabled_fields</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">form</span><span class="o">.</span><span class="n">base_fields</span><span class="p">:</span>
                <span class="n">form</span><span class="o">.</span><span class="n">base_fields</span><span class="p">[</span><span class="n">f</span><span class="p">]</span><span class="o">.</span><span class="n">disabled</span> <span class="o">=</span> <span class="kc">True</span>

        <span class="k">return</span> <span class="n">form</span>
</pre></div>
<p>You used the exact same technique as in the previous sections to implement another business rule. In the next sections, you’re going to implement more complex business rules to protect your system.</p>
<h3 id="prevent-non-superusers-from-editing-their-own-permissions">Prevent Non-Superusers From Editing Their Own Permissions</h3>
<p>Strong users are often a weak spot. They possess strong permissions, and the potential damage they can cause is significant. To prevent permission escalation in case of intrusion, you can prevent users from editing their own permissions:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="k">import</span> <span class="n">Set</span>

<span class="kn">from</span> <span class="nn">django.contrib</span> <span class="k">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.admin</span> <span class="k">import</span> <span class="n">UserAdmin</span>


<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomUserAdmin</span><span class="p">(</span><span class="n">UserAdmin</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_form</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">obj</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">form</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_form</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">obj</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">is_superuser</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">is_superuser</span>
        <span class="n">disabled_fields</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>  <span class="c1"># type: Set[str]</span>

        <span class="k">if</span> <span class="ow">not</span> <span class="n">is_superuser</span><span class="p">:</span>
            <span class="n">disabled_fields</span> <span class="o">|=</span> <span class="p">{</span>
                <span class="s1">'username'</span><span class="p">,</span>
                <span class="s1">'is_superuser'</span><span class="p">,</span>
                <span class="s1">'user_permissions'</span><span class="p">,</span>
            <span class="p">}</span>

        <span class="c1"># Prevent non-superusers from editing their own permissions</span>
        <span class="k">if</span> <span class="p">(</span>
<span class="hll">            <span class="ow">not</span> <span class="n">is_superuser</span>
</span><span class="hll">            <span class="ow">and</span> <span class="n">obj</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span>
</span><span class="hll">            <span class="ow">and</span> <span class="n">obj</span> <span class="o">==</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span>
</span>        <span class="p">):</span>
            <span class="n">disabled_fields</span> <span class="o">|=</span> <span class="p">{</span>
                <span class="s1">'is_staff'</span><span class="p">,</span>
                <span class="s1">'is_superuser'</span><span class="p">,</span>
                <span class="s1">'groups'</span><span class="p">,</span>
                <span class="s1">'user_permissions'</span><span class="p">,</span>
            <span class="p">}</span>

        <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">disabled_fields</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">form</span><span class="o">.</span><span class="n">base_fields</span><span class="p">:</span>
                <span class="n">form</span><span class="o">.</span><span class="n">base_fields</span><span class="p">[</span><span class="n">f</span><span class="p">]</span><span class="o">.</span><span class="n">disabled</span> <span class="o">=</span> <span class="kc">True</span>

        <span class="k">return</span> <span class="n">form</span>
</pre></div>
<p>The argument <code>obj</code> is the instance of the object you are currently operating on:</p>
<ul>
<li><strong>When <code>obj</code> is <code>None</code></strong>, the form is used to create a new user.</li>
<li><strong>When <code>obj</code> is not <code>None</code></strong>, the form is used to edit an existing user.</li>
</ul>
<p>To check if the user making the request is operating on themselves, you compare <code>request.user</code> with <code>obj</code>. Because this is the user admin, <code>obj</code> is either an instance of <code>User</code>, or <code>None</code>. When the user making the request, <code>request.user</code>, is equal to <code>obj</code>, then it means that the user is updating themselves. In this case, you disable all sensitive fields that can be used to gain permissions.</p>
<p>The ability to customize the form based on the object is very useful. It can be used to implement elaborate business rules.</p>
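<p>The rules in <code>get_form()</code> can also be written as a small pure function, which makes them easy to check without running Django. This is only a sketch: the name <code>fields_to_disable</code> and the <code>editing_self</code> flag (standing in for <code>obj is not None and obj == request.user</code>) are assumptions made for illustration and are not part of Django:</p>

```python
def fields_to_disable(is_superuser, editing_self):
    """Return the field names to lock, following the rules in get_form()."""
    disabled = set()

    # Rule 1: non-superusers can't rename users or grant direct permissions.
    if not is_superuser:
        disabled |= {"username", "is_superuser", "user_permissions"}

    # Rule 2: non-superusers can't change their own permissions.
    if not is_superuser and editing_self:
        disabled |= {"is_staff", "is_superuser", "groups", "user_permissions"}

    return disabled


print(sorted(fields_to_disable(is_superuser=False, editing_self=True)))
# ['groups', 'is_staff', 'is_superuser', 'user_permissions', 'username']
print(fields_to_disable(is_superuser=True, editing_self=True))
# set()
```

<p>Keeping the decision separate from the form handling is one possible design choice. It lets you test each business rule with simple assertions.</p>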
<h3 id="override-permissions">Override Permissions</h3>
<p>It can sometimes be useful to completely override the permissions in Django admin. A common scenario is when you use permissions in other places, and you don’t want staff users to make changes in the admin.</p>
<p>Django uses <a href="https://docs.djangoproject.com/en/2.1/ref/contrib/admin/#django.contrib.admin.ModelAdmin.has_view_permission">hooks for the four built-in permissions</a>. Internally, the hooks use the current user’s permissions to make a decision. You can override these hooks, and provide a different decision.</p>
<p>To prevent staff users from deleting a model instance, regardless of their permissions, you can do the following:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib</span> <span class="k">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.admin</span> <span class="k">import</span> <span class="n">UserAdmin</span>


<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomUserAdmin</span><span class="p">(</span><span class="n">UserAdmin</span><span class="p">):</span>
<span class="hll">    <span class="k">def</span> <span class="nf">has_delete_permission</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">obj</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
</span><span class="hll">        <span class="k">return</span> <span class="kc">False</span>
</span></pre></div>
<p>Just like with <code>get_form()</code>, <code>obj</code> is the instance you currently operate on:</p>
<ul>
<li><strong>When <code>obj</code> is <code>None</code></strong>, the user requested the list view.</li>
<li><strong>When <code>obj</code> is not <code>None</code></strong>, the user requested the change view of a specific instance.</li>
</ul>
<p>Having the instance of the object in this hook is very useful for implementing object-level permissions for different types of actions. Here are other use cases:</p>
<ul>
<li>Preventing changes during business hours</li>
<li>Implementing object-level permissions</li>
</ul>
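<p>As a rough sketch of the first use case, the helper below decides whether a change should be allowed at a given moment. The function name <code>changes_allowed</code> and the business hours (09:00 to 17:00, Monday to Friday) are assumptions chosen for illustration:</p>

```python
from datetime import datetime, time

# Assumed business hours for illustration: 09:00 to 17:00, Monday to Friday.
OPEN = time(9, 0)
CLOSE = time(17, 0)


def changes_allowed(now):
    """Return False during business hours and True at any other time."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    in_hours = OPEN <= now.time() < CLOSE
    return not (is_weekday and in_hours)


print(changes_allowed(datetime(2019, 9, 4, 14, 0)))  # Wednesday afternoon: False
print(changes_allowed(datetime(2019, 9, 4, 20, 0)))  # Wednesday evening: True
print(changes_allowed(datetime(2019, 9, 7, 14, 0)))  # Saturday afternoon: True
```

<p>A permission hook such as <code>has_change_permission()</code> could return <code>False</code> whenever a helper like this returns <code>False</code>, and otherwise defer to the default decision from <code>super()</code>.</p>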
<h3 id="restrict-access-to-custom-actions">Restrict Access to Custom Actions</h3>
<p><a href="https://docs.djangoproject.com/en/2.2/ref/contrib/admin/actions/#adding-actions-to-the-modeladmin">Custom admin actions</a> require special attention. Django is not familiar with them, so it can’t restrict access to them by default. A custom action will be accessible to any admin user with any permission on the model.</p>
<p>To illustrate, add a handy admin action to mark multiple users as active:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib</span> <span class="k">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.admin</span> <span class="k">import</span> <span class="n">UserAdmin</span>


<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomUserAdmin</span><span class="p">(</span><span class="n">UserAdmin</span><span class="p">):</span>
    <span class="n">actions</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'activate_users'</span><span class="p">,</span>
    <span class="p">]</span>

    <span class="k">def</span> <span class="nf">activate_users</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">queryset</span><span class="p">):</span>
        <span class="n">cnt</span> <span class="o">=</span> <span class="n">queryset</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">is_active</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">is_active</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">message_user</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'Activated </span><span class="si">{}</span><span class="s1"> users.'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">cnt</span><span class="p">))</span>
    <span class="n">activate_users</span><span class="o">.</span><span class="n">short_description</span> <span class="o">=</span> <span class="s1">'Activate Users'</span>  <span class="c1"># type: ignore</span>
</pre></div>
<p>Using this action, a staff user can mark one or more users, and activate them all at once. This is useful in all sorts of cases, such as if you had a bug in the registration process and needed to activate users in bulk.</p>
<p>This action updates user information, so you want only users with change permissions to be able to use it.</p>
<p>Django admin uses an internal function to get actions. To hide <code>activate_users()</code> from users without change permission, override <code>get_actions()</code>:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">django.contrib</span> <span class="k">import</span> <span class="n">admin</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="k">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.admin</span> <span class="k">import</span> <span class="n">UserAdmin</span>


<span class="nd">@admin</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CustomUserAdmin</span><span class="p">(</span><span class="n">UserAdmin</span><span class="p">):</span>
    <span class="n">actions</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s1">'activate_users'</span><span class="p">,</span>
    <span class="p">]</span>

    <span class="k">def</span> <span class="nf">activate_users</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">queryset</span><span class="p">):</span>
        <span class="k">assert</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">has_perm</span><span class="p">(</span><span class="s1">'auth.change_user'</span><span class="p">)</span>
        <span class="n">cnt</span> <span class="o">=</span> <span class="n">queryset</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">is_active</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">is_active</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">message_user</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'Activated </span><span class="si">{}</span><span class="s1"> users.'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">cnt</span><span class="p">))</span>
    <span class="n">activate_users</span><span class="o">.</span><span class="n">short_description</span> <span class="o">=</span> <span class="s1">'Activate Users'</span>  <span class="c1"># type: ignore</span>

    <span class="k">def</span> <span class="nf">get_actions</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">):</span>
        <span class="n">actions</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_actions</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="hll">        <span class="k">if</span> <span class="ow">not</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">has_perm</span><span class="p">(</span><span class="s1">'auth.change_user'</span><span class="p">):</span>
</span><span class="hll">            <span class="k">del</span> <span class="n">actions</span><span class="p">[</span><span class="s1">'activate_users'</span><span class="p">]</span>
</span>        <span class="k">return</span> <span class="n">actions</span>
</pre></div>
<p><code>get_actions()</code> returns an <code>OrderedDict</code>. The key is the name of the action, and the value is a tuple holding the action function, its name, and its description. To adjust the return value, you override the function, fetch the original value, and depending on the user permissions, remove the custom action <code>activate_users</code> from the <code>dict</code>. To be on the safe side, you assert the user permission in the action as well.</p>
<p>For staff users without the <code>change_user</code> permission, the action <code>activate_users</code> will not appear in the actions dropdown.</p>
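<p>The filtering step can be tried without Django by treating the actions as an ordinary <code>OrderedDict</code>. In this sketch, <code>filter_actions</code>, <code>make_actions</code>, and the placeholder values are hypothetical names used for illustration. It uses <code>pop()</code> with a default instead of <code>del</code>, a variation that does not raise if the action is already missing:</p>

```python
from collections import OrderedDict


def make_actions():
    # Hypothetical placeholder values standing in for Django's entries.
    return OrderedDict(
        [("delete_selected", "delete"), ("activate_users", "activate")]
    )


def filter_actions(actions, can_change_users):
    """Drop 'activate_users' for users lacking the change permission."""
    if not can_change_users:
        # pop() with a default does not raise if the key is already absent.
        actions.pop("activate_users", None)
    return actions


print(list(filter_actions(make_actions(), can_change_users=True)))
# ['delete_selected', 'activate_users']
print(list(filter_actions(make_actions(), can_change_users=False)))
# ['delete_selected']
```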
<h2 id="conclusion">Conclusion</h2>
<p>Django admin is a great tool for managing a Django project. Many teams rely on it to stay productive in managing day-to-day operations. If you use Django admin to perform operations on models, then it’s important to be aware of permissions. The techniques described in this article are useful for any model admin, not just the <code>User</code> model.</p>
<p>In this tutorial, you protected your system by making the following adjustments in Django Admin:</p>
<ul>
<li>You <strong>protected against permission escalation</strong> by preventing users from editing their own permissions.</li>
<li>You <strong>kept permissions tidy and maintainable</strong> by forcing users to manage permissions only through groups.</li>
<li>You <strong>prevented permissions from leaking through custom actions</strong> by explicitly enforcing the necessary permissions.</li>
</ul>
<p>Your <code>User</code> model admin is now much safer than when you started!</p>
<hr />
<p><em>[ Improve Your Python With Python Tricks: Get a short & sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&utm_medium=rss&utm_campaign=footer">>> Click here to learn more and see examples</a> ]</em></p>
Dictionaries in Python
https://realpython.com/courses/dictionaries-python/
2019-07-30T14:00:00+00:00
In this course on Python dictionaries, you'll cover the basic characteristics of dictionaries and learn how to access and manage dictionary data. Once you've finished this course, you'll have a good sense of when a dictionary is the appropriate data type to use and know how to use it.
<p>Python provides a composite <a href="https://realpython.com/python-data-types/">data type</a> called a <strong>dictionary</strong>, which is similar to a <a href="https://realpython.com/python-lists-tuples/">list</a> in that it is a collection of objects.</p>
<p><strong>Here’s what you’ll learn in this course:</strong> You’ll cover the basic characteristics of Python dictionaries and learn how to access and manage dictionary data. Once you’ve finished this course, you’ll have a good sense of when a dictionary is the appropriate data type to use and know how to use it.</p>
<p>Dictionaries and lists share the following characteristics:</p>
<ul>
<li>Both are mutable.</li>
<li>Both are dynamic. They can grow and shrink as needed.</li>
<li>Both can be nested. A list can contain another list. A dictionary can contain another dictionary. A dictionary can also contain a list, and vice versa.</li>
</ul>
<p>Dictionaries differ from lists primarily in how elements are accessed:</p>
<ul>
<li>List elements are accessed by their position in the list, via indexing.</li>
<li>Dictionary elements are accessed via keys.</li>
</ul>
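<p>The difference in access can be seen in a short sketch. The names and values below are made up for illustration:</p>

```python
fruits = ["apple", "banana", "cherry"]
prices = {"apple": 0.5, "banana": 0.25}

# List elements are accessed by position.
second_fruit = fruits[1]

# Dictionary elements are accessed by key.
banana_price = prices["banana"]

# Both are mutable and can grow as needed.
fruits.append("date")
prices["cherry"] = 3.0

print(second_fruit)   # banana
print(banana_price)   # 0.25
print(len(fruits), len(prices))  # 4 3
```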
Logging in Python
https://realpython.com/courses/logging-python/
2019-07-23T14:00:00+00:00
In this video course, you'll learn why and how to get started with Python's powerful logging module to meet the needs of beginners and enterprise teams alike.
<p>Logging is a very useful tool in a programmer’s toolbox. It can help you develop a better understanding of the flow of a program and discover scenarios that you might not even have thought of while developing.</p>
<p>Logs provide developers with an extra set of eyes that are constantly looking at the flow that an application is going through. They can store information, like which user or IP accessed the application. If an error occurs, then they can provide more insights than a stack trace by telling you what the state of the program was before it arrived at the line of code where the error occurred.</p>
<p>By logging useful data from the right places, you can not only debug errors easily but also use the data to analyze the performance of the application to plan for scaling or look at usage patterns to plan for marketing.</p>
<p>Python provides a logging system as a part of its standard library, so you can quickly add logging to your application. In this course, you’ll learn why using this module is the best way to add logging to your application as well as how to get started quickly, and you will get an introduction to some of the advanced features available.</p>
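<p>A minimal sketch of getting started with the standard library’s <code>logging</code> module might look like the following. The logger name <code>"shop"</code> and the <code>divide()</code> function are hypothetical and chosen only for illustration:</p>

```python
import logging

# Configure the root logger once, near the start of the program.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# "shop" is a made-up logger name for this sketch.
logger = logging.getLogger("shop")


def divide(a, b):
    """Divide a by b, logging the inputs and any failure."""
    # Record the program state before the risky operation.
    logger.info("divide called with a=%s, b=%s", a, b)
    try:
        return a / b
    except ZeroDivisionError:
        # logger.exception() logs at ERROR level and appends the traceback.
        logger.exception("division failed")
        return None


result = divide(6, 3)
failed = divide(1, 0)
```

<p>The <code>INFO</code> message written before the division shows which inputs led to the failure, which is the kind of context a stack trace alone does not always give you.</p>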
How to Write Pythonic Loops
https://realpython.com/courses/how-to-write-pythonic-loops/
2019-07-16T14:00:00+00:00
In this course, you'll see how you can make your loops more Pythonic if you're coming to Python from a C-style language. You'll learn how you can get the most out of using range(), xrange(), and enumerate(). You'll also see how you can avoid having to keep track of loop indexes manually.
<p>One of the easiest ways to spot a developer who has a background in C-style languages and only recently picked up Python is to look at how they loop through a list. In this course, you’ll learn how to take a C-style (Java, PHP, C, C++) loop and turn it into the sort of loop a Python developer would write.</p>
<p>You can use these techniques to refactor your existing Python <a href="https://realpython.com/python-for-loop/"><code>for</code> loops</a> and <a href="https://realpython.com/python-while-loop/"><code>while</code> loops</a> to make them easier to read and maintain. You’ll learn how to use Python’s <a href="https://realpython.com/python-range/"><code>range()</code>, <code>xrange()</code></a> (Python 2 only), and <code>enumerate()</code> built-ins to refactor your loops and how to avoid keeping track of loop indexes manually.</p>
<p>The main takeaways of this course are:</p>
<ol>
<li>
<p>Writing C-style loops in Python is not considered <a href="https://realpython.com/learning-paths/writing-pythonic-code/">Pythonic</a>. Avoid managing loop indexes and stop conditions manually if possible.</p>
</li>
<li>
<p>Python’s <code>for</code> loops are really “for each” loops that can iterate over items from a container or sequence directly.</p>
</li>
</ol>
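<p>To make the two takeaways concrete, here is a small sketch that contrasts a C-style loop with its Pythonic equivalents. The list <code>items</code> and the result variables are assumptions made up for this example:</p>

```python
items = ["a", "b", "c"]  # sample data for illustration

# C-style: manage the index and the stop condition by hand.
c_style = []
i = 0
while i < len(items):
    c_style.append((i, items[i]))
    i += 1

# Pythonic: enumerate() yields (index, item) pairs, so no manual bookkeeping.
pythonic = []
for index, item in enumerate(items):
    pythonic.append((index, item))

# When the index isn't needed, iterate over the items directly.
shouted = []
for item in items:
    shouted.append(item.upper())
```

<p>Both loops build the same list of pairs, but the <code>enumerate()</code> version has no counter to initialize, compare, or increment, which removes a common source of off-by-one errors.</p>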
Reading and Writing Files in Python
https://realpython.com/courses/reading-and-writing-files-python/
2019-07-09T14:00:00+00:00
In this course, you'll learn about reading and writing files in Python. You'll cover everything from what a file is made up of to which libraries can help you along the way. You'll also take a look at some basic scenarios of file usage as well as some advanced techniques.
<p>In this course, you’ll learn about reading and writing files in Python. You’ll cover everything from what a file is made up of to which libraries can help you along the way. You’ll also take a look at some basic scenarios of file usage as well as some advanced techniques.</p>
<p>One of the most common tasks that you can do with Python is reading and writing files. Whether it’s writing to a simple text file, reading a complicated server log, or even analyzing raw byte data, all of these situations require reading or writing a file.</p>
<p>By the end of this course, you’ll know:</p>
<ul>
<li>What makes up a file and why that’s important in Python</li>
<li>The basics of reading and writing files in Python</li>
<li>Some basic scenarios of reading and writing files</li>
</ul>
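<p>A basic read and write scenario could be sketched like this. The file name <code>example.txt</code> is hypothetical, and the file lives in a temporary directory so that the example cleans up after itself:</p>

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # "example.txt" is a made-up file name for this sketch.
    path = Path(tmp_dir) / "example.txt"

    # Text mode, writing: the with statement closes the file automatically.
    with open(path, "w", encoding="utf-8") as writer:
        writer.write("first line\n")
        writer.write("second line\n")

    # Text mode, reading: iterate over the file object line by line.
    with open(path, "r", encoding="utf-8") as reader:
        lines = [line.rstrip("\n") for line in reader]

    # Binary mode: read the same file as raw bytes.
    with open(path, "rb") as reader:
        raw = reader.read()
```

<p>Opening the same file in text mode and in binary mode shows the two views of a file: decoded lines of text on the one hand, and the underlying bytes on the other.</p>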
<p>This course is mainly for beginner to intermediate Pythonistas, but there are some tips in here that more advanced programmers may appreciate as well.</p>
Functional Programming in Python
https://realpython.com/courses/functional-programming-python/
2019-07-02T14:00:00+00:00
In this course, you'll learn how to approach functional programming in Python. You'll cover what functional programming is, how you can use immutable data structures to represent your data, as well as how to use filter(), map(), and reduce().
<p>In this course, you’ll learn how to approach functional programming in Python. You’ll start with the absolute basics of Functional Programming (FP). After that, you’ll see hands-on examples for common FP patterns available, like using immutable data structures and the <code>filter()</code>, <code>map()</code>, and <code>reduce()</code> functions. You’ll end the course with actionable tips for parallelizing your code to make it run faster.</p>
<p>You’ll cover:</p>
<ol>
<li>What functional programming is</li>
<li>How you can use immutable data structures to represent your data</li>
<li>How to use <code>filter()</code>, <code>map()</code>, and <code>reduce()</code></li>
<li>How to do parallel processing with <code>multiprocessing</code> and <code>concurrent.futures</code></li>
</ol>
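<p>As a rough sketch of the first three topics, the example below stores records in immutable <code>namedtuple</code> instances and processes them with <code>filter()</code>, <code>map()</code>, and <code>reduce()</code>. The <code>Scientist</code> type and its sample records are assumptions chosen for illustration and are not taken from the course. Note that in Python 3, <code>reduce()</code> lives in the <code>functools</code> module:</p>

```python
from collections import namedtuple
from functools import reduce

# Hypothetical record type; namedtuple instances are immutable.
Scientist = namedtuple("Scientist", ["name", "born", "nobel"])

# A tuple of records, so the collection itself is immutable too.
scientists = (
    Scientist("Ada Lovelace", 1815, False),
    Scientist("Marie Curie", 1867, True),
    Scientist("Emmy Noether", 1882, False),
)

# filter(): keep only the records that satisfy a predicate.
winners = tuple(filter(lambda s: s.nobel, scientists))

# map(): transform every record into a new value.
names = tuple(map(lambda s: s.name, scientists))

# reduce(): fold all records into a single accumulated value.
total_birth_years = reduce(lambda acc, s: acc + s.born, scientists, 0)
```

<p>None of these calls modifies <code>scientists</code>. Each one produces a new value, which is what makes this style easy to reason about and a natural fit for parallel processing.</p>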