Thursday, October 1, 2015

Copying lists (shallow copy vs. deep copy)

This post is an excerpt from my upcoming eBook "Mastering Python Lists".


Introduction

Python supports two types of list copying: “shallow” and “deep”. A shallow copy is a new list, but it shares all of its items with the original list. A deep copy, in contrast, recursively copies any nested objects (such as inner lists), so modifying them in the copy never affects the original.

Shallow copies

A shallow copy can be created in 3 different ways:

  • list() — passing the original list to the “list” function
  • [:] — taking a full slice of the original list
  • copy.copy() — using the “copy” function from the “copy” module
These all make a “shallow” copy of the original list.

import copy

original_list = [1, 2, 3]
shallow_copy_1 = original_list[:]
shallow_copy_2 = list(original_list)
shallow_copy_3 = copy.copy(original_list)
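A quick way to see what all three techniques produce is to compare the copies with the original using the is and == operators. This sketch checks that each copy is a brand-new list object with equal contents, while the items inside it are the very same objects as in the original.

```python
import copy

original_list = [1, 2, 3]
shallow_copy_1 = original_list[:]
shallow_copy_2 = list(original_list)
shallow_copy_3 = copy.copy(original_list)

for shallow_copy in (shallow_copy_1, shallow_copy_2, shallow_copy_3):
    # Each copy is a new list object...
    assert shallow_copy is not original_list
    # ...with equal contents...
    assert shallow_copy == original_list
    # ...whose items are the same objects as the original's items.
    assert all(a is b for a, b in zip(shallow_copy, original_list))

print("all three copies are new lists that share their items")
```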

Shallow copy example

Shallow copies can lead to surprising behavior if you don’t understand the difference between a “shallow” copy and a “deep” copy. Let’s create a list containing an inner list, then copy it and see what happens.

outer_list = [1, 2, ['a', 'b', 'c']]
copy_1 = list(outer_list)
copy_2 = outer_list[:]




Notice how the inner list ['a','b','c'] is shared between the original list and the two copies. This means any in-place change to that inner list, such as outer_list[2].append('d'), is also visible through copy_1[2] and copy_2[2].
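The sketch below makes the sharing visible. It also shows the other half of the story: rebinding a top-level item of outer_list does not affect the copies, because each copy has its own top-level slots that merely point at the same inner list.

```python
outer_list = [1, 2, ['a', 'b', 'c']]
copy_1 = list(outer_list)
copy_2 = outer_list[:]

# The inner list is one object, referenced from all three lists.
assert outer_list[2] is copy_1[2] is copy_2[2]

# Mutating the shared inner list is visible through every copy.
outer_list[2].append('d')
print(copy_1[2])  # ['a', 'b', 'c', 'd']
print(copy_2[2])  # ['a', 'b', 'c', 'd']

# Rebinding a top-level slot changes only outer_list.
outer_list[0] = 100
print(copy_1[0])  # 1
```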

Deep copy example

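If we want a copy of a list that is truly independent of the original, we must use the “deepcopy” function from the “copy” module. copy.deepcopy() recursively copies the nested objects inside the list (such as the inner list here), so changing them through the copy can never affect the original, or vice versa.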

Compare this “deep copy” diagram to the previous “shallow copy” diagram and notice that the inner list is no longer shared.

import copy
outer_list = [1,2,['a','b','c']]
copy_1 = copy.deepcopy(outer_list)
copy_2 = copy.deepcopy(outer_list)
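Repeating the earlier experiment with copy.deepcopy() shows the difference. In this minimal sketch the inner list is copied rather than shared, so appending to the original's inner list leaves the deep copy untouched.

```python
import copy

outer_list = [1, 2, ['a', 'b', 'c']]
deep = copy.deepcopy(outer_list)

# The inner list was duplicated, not shared.
assert deep[2] is not outer_list[2]
assert deep == outer_list

outer_list[2].append('d')
print(outer_list[2])  # ['a', 'b', 'c', 'd']
print(deep[2])        # ['a', 'b', 'c']
```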



Summary

Shallow copying is Python’s default behavior, since copying only the references to the items is much faster than recursively copying every nested object. Use caution whenever you modify a mutable object that a shallow copy shares with the original (such as an inner list), because the change may cause hard-to-find side effects in other parts of your program.

Monday, August 3, 2015

Python PEP 8

"The single most important formatting convention that you can follow is to indent your programs properly, so the indentation conveys the structure of the program to the reader at a glance. Indentation must be done carefully, however, lest you confuse rather than enlighten." -- The Elements of Programming Style

The PEP 8 guidelines were created to improve communication between programmers. Properly formatted source code is easier to understand for the same reason that books are easier to read when they are formatted into sentences and paragraphs.
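To make this concrete, here is a small hypothetical module laid out according to a few of PEP 8's best-known rules: 4 spaces per indentation level, lowercase_with_underscores for function names, CapWords for class names, UPPER_CASE for constants, no spaces around = in default arguments, at least two spaces before an inline comment, and two blank lines around top-level definitions. It is an illustration, not a complete checklist.

```python
"""A small module laid out according to common PEP 8 conventions."""

import math

DEFAULT_RADIUS = 1.0  # constants use UPPER_CASE names


class Circle:  # class names use CapWords
    """A circle with a given radius."""

    def __init__(self, radius=DEFAULT_RADIUS):
        # Default values take no spaces around '='.
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


def total_area(circles):  # function names use lowercase_with_underscores
    """Return the combined area of an iterable of circles."""
    return sum(circle.area() for circle in circles)
```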

Here is the PEP 8 document. Read this first.


PyLint
PyLint checks your code for errors and for compliance with many of the PEP 8 recommendations. It can also help you refactor your program by detecting duplicate code. PyLint is integrated with various IDEs and editors. You can view the complete list here.

You can configure PyLint if you don't like the default settings. Follow the instructions in the PyLint tutorial.


Online Tools
This online PyLint checker lets you upload your files and check them.


Videos
Beyond PEP 8 -- Best practices for beautiful intelligible code - PyCon 2015
Raymond Hettinger is always a popular speaker. In this talk he shows how to avoid some of the hazards of applying the PEP 8 style guide too rigidly, and explains what really matters for creating beautiful, intelligible code.


Formatters
autopep8 will reformat your source code so it follows the PEP 8 guidelines. As always, you should make a backup copy of your files (or put them under version control) before letting any tool modify them. You can download it here.


Google
Google has developed its own Python style guide.


More Information
The web is full of Python style guides. Doing a Google search for [python coding style] will find them.

Saturday, July 18, 2015

Python Variables

This issue is a little different. Instead of bringing you the best information out on the web, I'm bringing you some of the worst.

Python variables are actually easy to understand if you have the right mental model. Unfortunately, most people have the wrong model.

Bad Google Results

Python variables are a simple topic, but there is a huge amount of misinformation about them on the internet. If you do a search for 'python variables', most of the information you find is just plain wrong!

Let's look at some of the top search results ...

http://www.tutorialspoint.com/python/python_variable_types.htm

"Variables are nothing but reserved memory locations to store values. This means that when you create a variable you reserve some space in memory. Based on the data type of a variable, the interpreter allocates memory and decides what can be stored in the reserved memory"

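This is wrong. Creating a Python variable does not reserve memory for a value. Memory belongs to objects, and a variable is just a name that refers to an object that already exists.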

Let's look at another one ...

http://www.learnpython.org/en/Variables_and_Types

"Python is completely object oriented, and not "statically typed". You do not need to declare variables before using them, or declare their type. Every variable in Python is an object"

Yes and no. Yes, Python is object-oriented. No, variables are not objects.

Let's try another one ...

https://en.wikibooks.org/wiki/Python_Programming/Variables_and_Strings

"A variable is something that holds a value that may change. In simplest terms, a variable is just a box that you can put stuff in. You can use variables to store all kinds of stuff, but for now, we are just going to look at storing numbers in variables"

Nope. Not even close.


One last try ...

http://www.python-course.eu/variables.php

"As the name implies, a variable is something which can change. A variable is a way of referring to a memory location used by a computer program. A variable is a symbolic name for this physical location. This memory location contains values, like numbers, text or more complicated types.  A variable can be seen as a container (or some say a pigeonhole) to store certain values"


Wrong again. Python variables are not pigeonholes, or shoeboxes, or any other kind of container. 

How Variables Work In A Compiled Language Like C

So why is everyone so wrong about this? It's because they are taking what they know about a compiled language like C or Java and trying to apply it to Python.

Let's take a variable declaration in C and break it down into its parts.

Type, Name, and Value

Here is a typical C declaration

int i = 42;

We start with a type

int

This tells the compiler 2 things:

  1. How much memory is needed to store the data.
  2. What operations are allowed on the data.

Different data types require different amounts of memory storage. A char always needs 1 byte, an int typically needs 4 bytes on modern platforms, and so forth.
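You can check these sizes from Python itself with the standard ctypes module, which exposes C's data types. This sketch prints the sizes on whatever platform it runs on; c_char is always 1 byte, while the sizes of c_int and c_double are platform-dependent (the comments show the common values, not guarantees).

```python
import ctypes

# Sizes, in bytes, of C data types on the current platform.
print("char:  ", ctypes.sizeof(ctypes.c_char))    # always 1
print("int:   ", ctypes.sizeof(ctypes.c_int))     # typically 4
print("double:", ctypes.sizeof(ctypes.c_double))  # typically 8
```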

Next, we have a variable name

i

The compiler allocates 4 bytes of memory exclusively for this variable. No other variable can use this memory.

Finally, we assign a value

= 42

The integer value 42 is copied into the 4 bytes allocated for the variable i.

You can think of this as the "envelope/letter" model. The variable acts like an envelope and the data acts like a letter put inside the envelope.
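The contrast can be sketched in Python. A ctypes.c_int behaves like a C-style envelope: a fixed block of memory (its address comes from ctypes.addressof()) whose contents are overwritten in place. A plain Python name behaves differently: assigning a new value does not overwrite anything, it binds the name to a different object.

```python
import ctypes

# C-style "envelope": storage is allocated once, then values are copied into it.
envelope = ctypes.c_int(42)
address = ctypes.addressof(envelope)
envelope.value = 43                      # overwrite the same bytes
assert ctypes.addressof(envelope) == address
assert envelope.value == 43

# Python name: assignment binds the name to an object; nothing is overwritten.
x = 42
first_object = x                         # keep a second reference to the 42 object
x = 43                                   # x now refers to a different object
assert x is not first_object
assert first_object == 42                # the original object is unchanged
print("envelope at", hex(address), "now holds", envelope.value)
```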

This is a good model for C or Java, but if you try to apply it to Python you will be hopelessly confused.

Everything Is An Object

In Python, everything is an object. Python has 3 built-in functions for examining objects:

  • type()
  • id()
  • dir()

type()

The type() function returns an object's type. Every object has a specific type.

>>> type(42)
<type 'int'>

This tells us 42 is an integer object.

id()

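The id() function returns an object's ID number. Every object has an ID number that is unique among all objects that exist at the same time (an ID may be reused after its object is destroyed).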

>>> id(42)
15893832

The ID number of this integer object is 15893832. Notice that the ID has nothing to do with the value, which is 42.
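Two objects can have equal values but different IDs. In this sketch, two separately created lists compare equal with ==, but they are distinct objects, which the is operator and their id() values confirm.

```python
a = [1, 2, 3]
b = [1, 2, 3]

print(a == b)          # True: the values are equal
print(a is b)          # False: they are two different objects
print(id(a) == id(b))  # False: so their IDs differ
```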

dir()

The dir() function returns a list of the names of an object's attributes, including its methods.

>>> dir(42)
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', '__format__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__', '__index__', '__init__', '__int__', '__invert__', '__long__', '__lshift__', '__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'imag', 'numerator', 'real']

The integer object 42 is a true object with many built-in methods. It is much more than just a simple number.
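Because 42 is a full object, you can call its methods and read its attributes directly. bit_length, __add__, and real all appear in the dir() listing above. Note the parentheses around the literal: without them, Python would read 42. as the start of a floating-point number and report a syntax error.

```python
print((42).bit_length())  # 6: 42 is 0b101010, which needs 6 bits
print((42).__add__(8))    # 50: the method behind 42 + 8
print((42).real)          # 42
```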

So far, we've been operating directly on the object without creating any variables.

Objects can exist without variables. They do not depend on variables in any way.

How Variables Work In Python

Now we're ready to talk about variables.

First, let's look at the Python Language Reference. What does it say about variables?

https://docs.python.org/2/reference/simple_stmts.html?highlight=assignment#grammar-token-assignment_stmt

"Assignment statements are used to (re)bind names to values and to modify attributes or items of mutable objects"

A variable is a name bound to an object. Variables DO NOT create objects, nor do they allocate memory to hold objects. They simply give us a way to refer to objects by name.


For example

x = 42

says "we can refer to the integer object 42 by the name x ".

x = y = 42

says "we can refer to the integer object 42 by either the name x or the name y "
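Because a variable is only a name, binding a second name to an object does not copy anything. This sketch uses a list, which can be changed in place, to show that both names refer to one and the same object.

```python
x = y = 42
print(x is y)  # True: both names refer to the same integer object

a = [1, 2, 3]
b = a          # bind a second name; no new list is created
b.append(4)
print(a)       # [1, 2, 3, 4]: the change is visible through either name
print(a is b)  # True
```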

Any time we use a variable in our program, we are really referring to the object named by that variable.



Variables can refer to anything, not just numbers.

We can bind names to any type of object.

We can bind a name to a list object

my_list = [1,2,3]

We can bind a name to a function object

def my_function():
    pass


f = my_function

Then we can call the function through that name

f()

We can bind a name to a string object

greeting = 'hello'

A Word About del

You cannot directly destroy a Python object. The del statement, applied to a variable, deletes only the name, not the object. Python deletes objects automatically when they are no longer needed, that is, when nothing refers to them anymore.
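A short sketch of this behavior: after del removes the name a, the list is still alive because the name b still refers to it, and using a afterwards raises NameError.

```python
a = [1, 2, 3]
b = a          # a second name for the same list

del a          # removes the name 'a' only

print(b)       # [1, 2, 3]: the object still exists

try:
    print(a)
except NameError as error:
    print("NameError:", error)  # the name 'a' is gone
```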

Still Confused?

If you are still confused about this, watch this excellent video by Ned Batchelder
http://nedbatchelder.com/text/names1.html

Friday, June 26, 2015

Python Regular Expressions

Regular expressions (regexes) are a language for describing patterns in text. Although string methods like str.startswith() and str.endswith() can test for fixed prefixes and suffixes, a regex can recognize string patterns where the exact value is not known.
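For example, str.startswith() can only test for text you already know, while a regular expression describes a shape. This sketch uses the standard re module on a made-up log line to find every date written as YYYY-MM-DD, whatever the actual digits are.

```python
import re

line = "Backup started 2015-06-26, finished 2015-06-27."

# A fixed-substring test needs the exact text in advance.
print(line.startswith("Backup"))  # True

# A regex describes the pattern: 4 digits, dash, 2 digits, dash, 2 digits.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", line)
print(dates)  # ['2015-06-26', '2015-06-27']
```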

Books

Mastering Regular Expressions (O'Reilly Publishing)

This is truly the only book you will ever need. A massive encyclopedia (over 500 pages) covering every aspect of regular expressions in all the major programming languages. Affiliate link


Online Testers

Regex101

This is one of the best regular expression testers. It supports several regex dialects (PHP, JavaScript, and Python). You can build your expressions interactively and test them against sample text. Additional features include:
  • online regex reference guide
  • display regular expressions in plain English
  • display match groups
  • automatically generate Python code for any regular expression


Command Line Tools

grep

Written over 40 years ago, grep is one of the oldest regular expression tools. You can find free versions for all the major operating systems (Linux, Windows, Mac). With grep you can search through large numbers of text files, using regular expressions to narrow your search. If you work with text files a lot, grep will probably end up being your main search tool. If you need a quick overview of grep's options, look at this short tutorial. Read the full article


ack

This command line tool is very similar to grep, but it's optimized for searching source code. It's written in Perl, which means you'll need to install Perl if you don't already have it on your system. Read the full article


Puzzles and Games

RegEx Crossword

Test your regular expression skills! This is an online game similar to Sudoku, but all the clues are written as regular expressions. Start at the "beginner" level and see how far you can go. Play now


Videos

Python for Informatics : Regular Expressions

This 35-minute lecture is Lesson 11 in the Python for Informatics course by Charles Severance. Dr. Severance explains how to use the Python regular expression library to clean up "dirty" data. And he'll show you his tattoo. Watch the video


Libraries

Python "re" module

A detailed explanation of Python's standard "re" module with examples. Read the full article
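A minimal tour of a few core re features (compiling a pattern, named groups, finditer(), search(), and sub()) on made-up sample text. The e-mail pattern is deliberately simplified for illustration and is not a real address validator.

```python
import re

# Compile once, reuse many times. Named groups label the parts of each match.
email_pattern = re.compile(r"(?P<user>[\w.]+)@(?P<domain>[\w.]+\.\w+)")

text = "Contact alice@example.com or bob.smith@mail.example.org"

# finditer() yields a match object for each occurrence.
for match in email_pattern.finditer(text):
    print(match.group("user"), "at", match.group("domain"))
# alice at example.com
# bob.smith at mail.example.org

# search() returns the first match, or None if there is none.
first = email_pattern.search(text)
print(first.group(0))  # alice@example.com

# sub() replaces every match; \g<name> refers to a named group.
masked = email_pattern.sub(r"\g<user>@...", text)
print(masked)  # Contact alice@... or bob.smith@...
```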


Sunday, June 21, 2015

Python State Machines

State machines allow you to describe the high-level logic of your program while ignoring low-level implementation details. They are especially useful for modeling event-driven systems.
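As a minimal illustration (not taken from any of the resources below), here is the classic coin-operated turnstile modeled as a table-driven state machine. The states, events, and transition table are hypothetical; the point is that the high-level logic lives in one dictionary instead of being scattered through if-else statements.

```python
# Transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
    ("unlocked", "push"): "locked",
}


class Turnstile:
    def __init__(self):
        self.state = "locked"

    def handle(self, event):
        """Move to the next state for this event; unknown pairs raise KeyError."""
        self.state = TRANSITIONS[(self.state, event)]
        return self.state


turnstile = Turnstile()
for event in ["push", "coin", "push"]:
    print(event, "->", turnstile.handle(event))
# push -> locked
# coin -> unlocked
# push -> locked
```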

Theory and Concepts

Statecharts

Here is the original paper written by David Harel in 1986. This gave birth to the UML state diagrams we have today. Read the full article

UML State Diagrams

This Wikipedia article explains the basics of Events, States, Guards, and Actions. Read this if you need to get up to speed on basic concepts and terminology. Read the full article
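To tie those four terms together, here is a hypothetical door controller sketched in Python. Events are the strings passed to handle(), states are strings, a guard is a condition that must be true for a transition to fire, and an action is code that runs when it does. All names are illustrative, not taken from the article.

```python
class Door:
    """A door whose 'open' transition is guarded by a lock flag."""

    def __init__(self):
        self.state = "closed"
        self.locked = False
        self.log = []  # records the actions that have run
        # (state, event) -> (guard, action, next state)
        self.transitions = {
            ("closed", "open"): (lambda: not self.locked, self.on_open, "opened"),
            ("opened", "close"): (lambda: True, self.on_close, "closed"),
        }

    def on_open(self):
        self.log.append("door opened")

    def on_close(self):
        self.log.append("door closed")

    def handle(self, event):
        """Fire the transition for this event if one exists and its guard allows it."""
        transition = self.transitions.get((self.state, event))
        if transition is None:
            return self.state  # event not handled in this state
        guard, action, next_state = transition
        if guard():
            action()
            self.state = next_state
        return self.state


door = Door()
door.locked = True
print(door.handle("open"))   # closed: the guard blocks the transition
door.locked = False
print(door.handle("open"))   # opened
print(door.handle("close"))  # closed
print(door.log)              # ['door opened', 'door closed']
```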

State Diagram Crash Course

This 30-page whitepaper makes a strong argument for using state machines instead of giant if-else statements. It uses the Visual Basic calculator program to illustrate how state machines can simplify your code and eliminate bugs. Read the full article


Videos

UML State Diagrams

Derek Banas teaches you the UML notation in this 13-minute YouTube video. Watch the video

Coding a State Machine by Hand

You probably don't want to do this. Watch the video


Libraries and Tools

pystatemachine 1.2

A clever idea. This package uses Python @decorators to turn any class into a state machine. Read the full article

State Machine Compiler (SMC)

Writing state machine code is dull, repetitive, and error-prone. Let the computer do it for you. This tool generates Python code from a high-level description of your state machine. Read the full article