The Python Report: 2016

Tuesday, April 12, 2016

The Shocking Truth About The Del Statement

The del statement doesn’t actually delete objects

Wait … what?

To understand this, you need to know a little bit about how Python handles variables and assignment.

Objects, references, and the = operator

An assignment statement looks like this:

<variable-name> = <object>

The assignment statement has 3 parts

The variable name.
The object being assigned to the variable.
The = operator, which creates a reference from the variable name to the object.

Point #3 is key to understanding how all this works. It might help if you imagine drawing an arrowhead on the = sign, like this

<variable-name> ⇒ <object>

This emphasizes the fact that variables don’t contain objects, they refer to objects.

If I assign the same object to two variables, they each contain a reference to that object.

Then if I delete the "X" variable here’s what happens

Clearly, the int(29) object didn’t get deleted. You know that because you can still refer to it through the "Y" variable. It was the reference from X to 29 that got deleted. And because you can’t have a variable that doesn’t refer to anything, the variable “X” also got deleted.

So how do you delete an object?

Well, you really can’t. Python does something called “reference counting”. It keeps track of how many references exist to an object. When the number of references drops to zero, the object is marked for “garbage collection”. Exactly how and when garbage collection happens is outside your control.

References

Python Tutorial, "The del statement (lists)"

Python 2.7 Quick Reference, "The del statement"

Wednesday, March 16, 2016

The Most Important Programming Skill

Learn to type.

I mean REAL typing ... not hunt-and-peck-like-a-chicken typing. Typing without looking at your hands.

If you can't type, take a look at typing.com or do a Google search for "learn to type".

Why is this important? Because if you can't type, writing code will be harder ... writing documentation will be harder ... writing unit tests will be harder ... in fact, EVERYTHING will be harder.

Tuesday, March 8, 2016

Refactoring And Testing With Doctest

"If you can't test it, don't build it. If you don't test it, rip it out."
Boris Beizer

Python comes with a variety of testing frameworks which are dead simple to use. You really have no excuse not to test your code as you're writing it.

I've put together a short video where you get to look over my shoulder as I refactor the FizzBuzz program. Even though this problem is completely artificial, it still has enough complexity to make refactoring a worthwhile exercise.

Wednesday, March 2, 2016

Mini lesson: De Morgan's Laws

The any() and all() functions

You can apply the logic operators "and" and "or" to a whole series of values using Python's any() and all() functions.

Given a sequence of values S

all(S) returns True if every value in S is True
any(S) returns True if at least one value in S is True

This means

all(S) == (S[0] and S[1] and S[2] ...)
any(S) == (S[0] or S[1] or S[2] ...)

De Morgan's Laws

De Morgan's laws define the relationship between the and, or, and not operators.

(not A and not B and not C) == not (A or B or C)
(not A or not B or not C) == not (A and B and C)

By knowing these laws you can sometimes rewrite complicated logical expressions in a simpler form.

Applying De Morgan's Laws

The all() function lets you test if every value is True, but how would you test if every value is False? You could negate each value (possibly with a list comprehension)

all( [not s for s in S] )

But it's easier to apply De Morgan's law

not any(S)

Similarly, you could test for at least one False value by negating each value

any( [not s for s in S] )

But De Morgan's law allows you to rewrite this as

not all(S)

Summary

So, to summarize

all(S) is True when every value in S is True
not any(S) is True when every value in S is False
any(S) is True when at least one value in S is True
not all(S) is True when at least one value in S is False

References

https://en.wikipedia.org/wiki/De_Morgan%27s_laws

Tuesday, February 23, 2016

Functional Python (UPDATE)

I had some very interesting responses to my previous post.

One concern some people had was the fact that my solution didn't scale for large files because I was reading the entire file into memory all at once. That's certainly a valid point.

It took me a while to come up with an alternate implementation, but I think I've got it. The trick is using reduce() to sum the 3 columns in parallel.

from functools import reduce

def wc_func(filename):
'''
Description:

1. use a generator to read the file and form
a (1, word_count, char_count) tuple from each line

2. use reduce to calculate a running total of each column
'''

with open(filename, 'r') as f:
data = ((1, len(line.split()), len(line)) for line in f)
return reduce (lambda x,y: (x[0]+y[0], x[1]+y[1], x[2]+y[2]), data)

print (wc_func('bigdata.txt'))

What about performance?

It depends. Using the standard CPython, the performance is painfully slow (basically unusable). But using pypy tells a very different story. With a 1.7Gb test file, performance is actually a bit better than the native "wc" Linux command.

18:54 ~$ which wc                                                                                                   

/usr/bin/wc                                                            

18:54 ~$ which pypy                                                                                                 

/usr/local/bin/pypy                                                     

18:54 ~$ pypy --version                                                                                             

Python 2.7.9 (9c4588d731b7, Mar 23 2015, 16:30:30)                                                                  

[PyPy 2.5.1 with GCC 4.6.3]                                             

18:54 ~$ ls -lh bigdata.txt                                                                                         

-rw-rw-r-- 1 mpoulin registered_users 1.7G Feb 22 23:48 bigdata.txt     

18:54 ~$ time pypy wc_func.py ; time wc bigdata.txt                                                                 

(100000000, 400000000, 1800000000)                                                                                  

real    0m39.695s                                                                                                   

user    0m30.376s                                                                                                   

sys     0m1.051s                                                                                                    

 100000000  400000000 1800000000 bigdata.txt                                                                        

real    0m40.433s                                                                                                   

user    0m31.336s                                                                                                   

sys     0m0.533s                                                                                                    

18:56 ~$           

Monday, February 22, 2016

Functional Python

Python allows you to write code in a variety of styles: procedural, object oriented, and functional. What is functional programming and why should you care?

What is functional programming?

Functional programming is a declarative style of programming that tries to minimize side effects and the use of variables. You might already be doing functional programming without realizing it.

Have you ever written an SQL query for a database?
Have you ever piped a series of commands in a Linux shell?

If so, you've done functional programming.

A Python example

Linux has a "wc" command that counts lines, words, and characters in a text file. Let's compare two versions, one written in a procedural style and the other in a functional style.

Procedural wc

def wc_1(filename):

'''

Description:

1. initialize some counter variables

2. loop through a file line by line

3. for each line, increment the counters

'''

line_count = 0

word_count = 0

char_count = 0

with open(filename, 'r') as f:

for line in f:

line_count += 1

char_count += len(line)

word_count += len(line.split())

return (line_count, word_count, char_count)

Functional wc

def wc_2(filename):

'''

Description:

1. read the file into a list

2. get the total number of lines

3. sum the number of characters in each line

4. sum the number of words in each line

'''

with open(filename, 'r') as f:

lines = f.readlines()

line_count = len(lines)

char_count = sum(len(line) for line in lines)

word_count = sum(len(line.split()) for line in lines)

return (line_count, word_count, char_count)

Comparing the two versions

In the procedural version

you have to create 3 separate counter variables
you have to initialize them correctly
you have to update the counts for each line in the file

In the functional version

there are no counter variables
you let Python's sum() function tally the totals

I prefer the functional version. I can say "sum the number of characters in each line" or "sum the number of words in each line" and Python knows how to do that. And I know that Python won't make any stupid mistakes like forgetting to initialize a variable or increasing the count by the wrong amount.

Performance

The profiler shows us that performance is about the same for both versions. So with this particular input file there's no performance penalty for reading the entire file into memory.

prun wc_1('python_kwic.txt')

         6046 function calls in 0.020 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.011    0.011    0.020    0.020 wc.py:21(wc_1)

     2014    0.004    0.000    0.004    0.000 {method 'split' of 'str' objects}

     4028    0.004    0.000    0.004    0.000 {len}

        1    0.001    0.001    0.001    0.001 {open}

        1    0.000    0.000    0.020    0.020 <string>:1(<module>)

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

prun wc_2('python_kwic.txt')

         6050 function calls in 0.017 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.008    0.008    0.017    0.017 wc.py:1(wc_2)

     4029    0.004    0.000    0.004    0.000 {len}

     2014    0.004    0.000    0.004    0.000 {method 'split' of 'str' objects}

        1    0.001    0.001    0.001    0.001 {method 'readlines' of 'file' objects}

        1    0.001    0.001    0.001    0.001 {open}

        2    0.000    0.000    0.000    0.000 {sum}

        1    0.000    0.000    0.017    0.017 <string>:1(<module>)

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Additional Resources

Functional Programming HOWTO -- https://docs.python.org/2/howto/functional.html

Functional programming -- https://en.wikipedia.org/wiki/Functional_programming

Functional Programming in Python -- http://www.oreilly.com/programming/free/functional-programming-python.csp

Mastering Python Lists -- http://amzn.com/B01BIHNWI0

Thursday, February 18, 2016

Mastering Python Lists (Kindle version)

How to Get Rid of Software Bugs Once and For All

If you suffer from confusing, hard to understand code… if you spend hours tracking down bugs… if you keep missing your project deadlines… then this message is just for you. Here’s why…

I can show you how to use Python’s built in list functions to write code that is faster, easier to understand, and bug-free.

And you need to realize, there is a cost to not dealing with this…

If You Ignore It, It Just Gets Worse

What most people do when facing a looming deadline is start pulling all-nighters. But for most people, that doesn’t work.

you start making stupid mistakes
you deliver code that’s barely been tested (or not tested at all)
your end users are unhappy with your slow, buggy code

And what happens if you just do nothing? If you just keep doing what you’ve been doing? You develop a reputation for poor quality and no one wants to deal with your inferior code.

How I Learned To Unleash The Power Of Python

I’ve got an answer that works. Here’s the story:

Over the last 30 years I’ve programmed in most of the popular languages. I slowly realized that each language has its own “super power” that makes it unique. C has special syntax for working with pointers. Java is designed around classes and objects. C++ has a special template notation. SQL is built on relational algebra and data sets. My AHA! moment came when I realized Python also has a “super power” : working with lists..

It Worked For Me, And It Will Work For You

Here’s what it did for me… By using Python’s built in functions, I was writing less code. My programs started to get smaller. I was finally able to test them and PROVE they were correct.

Finally, It’s Your Turn

When you order Mastering Python Lists today, you’ll get the keys to…

eliminating long, hard to understand blocks of code
speeding up your programs by a factor of 2X, 3X, or more
delivering reliable software that “just works”

And you are 100% safe to try this out. That’s all I’m suggesting. Just try it for 7 days to see if it works for you. If it does, you’ll be delighted - and I think that’s exactly what’s about to happen. If for some reason you’re not delighted with writing clean, reliable code that your boss or clients will love, then just let Amazon know - and you’ll get all your money back.

It’s Decision Time

You have a choice to make: Do what you’ve been doing (or worse, do nothing at all). You know where that will lead. Is that really where you want to go? Take a new action, and get a new result. Finally start writing the kind of quality code you know you are capable of.

Which do you really want for yourself? Buy today and have Mastering Python Lists delivered instantly to your Kindle device or PC.

Wednesday, February 17, 2016

Slice assignment

Slice assignment is a feature of Python that allows you to make bulk updates to a list.

The FizzBuzz test is described here: http://c2.com/cgi/wiki?FizzBuzzTest

"Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”."

I'll implement a solution using slice assignment to demonstrate how it works.

Solution

Start with a list of numbers.

fizzbuzz = list(range(101))

Replace every multiple of 3 with the word "Fizz'. The length of the slice tells us how many replacement values we need,

fizzbuzz[3::3] = ['Fizz'] * len(fizzbuzz[3::3])

Do the same for the multiples of 5 and 15.

fizzbuzz[5::5] = ['Buzz'] * len(fizzbuzz[5::5])

fizzbuzz[15::15] = ['FizzBuzz'] * len(fizzbuzz[15::15])

Now print the results (skipping the first value of 0)

for i in fizzbuzz[1:]:

print(i)

You can paste this code into Python Tutor if you want to step through it line by line.

Tuesday, February 16, 2016

Solving Project Euler Problem #1

A procedural solution

Drew Davis presents a very nice solution to this problem at his blog
http://rdrewd.blogspot.com/2016/02/project-euler-problem-1-in-python.html?m=1

The final code he ends up with is

sum=0
for num in range(1000):
if (num % 3)==0 or (num % 5) == 0:
sum += num
print(sum)

There's nothing wrong with this solution, but it's very procedural and it requires examining every number from 1 to 1000.

A set based solution

There's another way to approach this. We're being asked to sum the multiples of 3 and 5. Python's range function can give us those numbers directly without looping and without using the % operator.

multiples_of_3 = range(3, 1000, 3)

multiples_of_5 = range(5, 1000, 5)

There will be some duplication because any number that is a multiple of both 3 and 5 (like 15) will appear in both lists. The fix for this is to convert each list to a set and combine the sets with the "union" operator. This will eliminate any duplicates.

multiples = set(multiples_of_3) | set(multiples_of_5)

Now the only thing left is to print the sum.

print sum(multiples)

What about speed?

Here are both implementations and the timing results. The set based approach is about 3X faster.

def euler1a():
multiples_of_3 = range(3,1000,3)
multiples_of_5 = range(5,1000,5)
multiples = set(multiples_of_3) | set(multiples_of_5)
return sum(multiples)

def euler1b():
sum=0
for num in range(1000):
if (num % 3)==0 or (num % 5) == 0:
sum += num
return sum

timeit euler1a()

10000 loops, best of 3: 30.4 µs per loop

timeit euler1b()

10000 loops, best of 3: 108 µs per loop

Monday, February 1, 2016

A List Oriented Approach To The Fizz Buzz Test

What is the FizzBuzz Test?

"Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”."
http://c2.com/cgi/wiki?FizzBuzzTest

Here is a classic solution.

for n in range(1, 101):
line = ""
if n%3 == 0:
line = line + "Fizz"
if n%5 == 0:
line = line + "Buzz"

if line:
print line
else:
print n

Can we do better?

The for-loop solution gives the correct answer, but it's completely untestable. There's no way to check the intermediate results.

Here is a list-based solution that will allow us to check the results to any level of confidence we desire.

(NOTE: I'm only using the numbers 1..15 to keep the examples short.)

First, we generate 2 lists of words.

fizz = ['Fizz' if i % 3 == 0 else '' for i in range(1,16)]

buzz = ['Buzz' if i % 5 == 0 else '' for i in range(1,16)]

You can check your work

list(enumerate(fizz,1))

[(1, ''),

(2, ''),

(3, 'Fizz'),

(4, ''),

(5, ''),

(6, 'Fizz'),

(7, ''),

(8, ''),

(9, 'Fizz'),

(10, ''),

(11, ''),

(12, 'Fizz'),

(13, ''),

(14, ''),

(15, 'Fizz')]

list(enumerate(buzz,1))
[(1, ''),

(2, ''),

(3, ''),

(4, ''),

(5, 'Buzz'),

(6, ''),

(7, ''),

(8, ''),

(9, ''),

(10, 'Buzz'),

(11, ''),

(12, ''),

(13, ''),

(14, ''),

(15, 'Buzz')]

Then we zip them and concatenate the strings to produce the output text.

fizzbuzz = [f+b for f,b in zip(fizz,buzz)]

['',

'',

 'Fizz',

'',

 'Buzz',

 'Fizz',

'',

'',

 'Fizz',

 'Buzz',

'',

 'Fizz',

'',

'',

 'FizzBuzz']

Enumerating this list gets us the final answer.

list(enumerate(fizzbuzz,1))

[(1, ''),

 (2, ''),

 (3, 'Fizz'),

 (4, ''),

 (5, 'Buzz'),

 (6, 'Fizz'),

 (7, ''),

 (8, ''),

 (9, 'Fizz'),

 (10, 'Buzz'),

 (11, ''),

 (12, 'Fizz'),

 (13, ''),

 (14, ''),

 (15, 'FizzBuzz')]

Now we just have to decide whether to print the text or the number.

Putting it all together

# processing

fizz = ['Fizz' if i % 3 == 0 else '' for i in range(1,16)]

buzz = ['Buzz' if i % 5 == 0 else '' for i in range(1,16)]

fizzbuzz = [f+b for f,b in zip(fizz,buzz)]

solution = [s if s else i for i,s in enumerate(fizzbuzz,1)]

# output

for x in solution:

print x

This solution is 4 lines shorter than the original for-loop solution, but I think there are two other advantages that are more important:

Advantage #1: clean separation between the calculations and output. It's easier to focus on the logic of solving the problem. The print statement is almost an afterthought.

Advantage #2: we've captured the intermediate calculations and we can check them. For example, what if you made this mistake in your logic (getting the modulo test backwards)

fizz = ['Fizz' if i % 3 else '' for i in range(1,16)]

It would be easy to see that the logic is backwards

[(1, 'Fizz'),

(2, 'Fizz'),

(3, ''),

(4, 'Fizz'),

(5, 'Fizz'),

(6, ''),

(7, 'Fizz'),

(8, 'Fizz'),

(9, ''),

(10, 'Fizz'),

(11, 'Fizz'),

(12, ''),

(13, 'Fizz'),

(14, 'Fizz'),

(15, '')]

That's a lot harder to see when the calculation is buried inside a for-loop.

A direct solution

There is a direct solution which is shorter and faster than anything else. It can be done with

no looping
no if tests
no use of the % operator

Start with a list of numbers.

fizzbuzz = range(16)

Replace every multiple of 3 with 'Fizz'.

fizzbuzz[3::3] = ['Fizz' for i in fizzbuzz[3::3]]

Replace every multiple of 5 with 'Buzz'.

fizzbuzz[5::5] = ['Buzz' for i in fizzbuzz[5::5]]

Replace every multiple of 15 with 'FizzBuzz'.

fizzbuzz[15::15] = ['FizzBuzz' for i in fizzbuzz[15::15]]

for i in fizzbuzz[1:] : print i

I have a lot more tips and tricks in my book Mastering Python Lists.

Thursday, January 28, 2016

Negative subscripts

Link to transcript

Here's a short 3 minute video that explains Python's negative subscripting capabilities. This is a useful ability only seen in a few other programming languages.

Was this information useful? Leave me a comment below.

Tuesday, April 12, 2016

The del statement doesn’t actually delete objects

Objects, references, and the = operator

So how do you delete an object?

References

Wednesday, March 16, 2016

Tuesday, March 8, 2016

"If you can't test it, don't build it. If you don't test it, rip it out."Boris Beizer

Wednesday, March 2, 2016

The any() and all() functions

De Morgan's Laws

Applying De Morgan's Laws

Summary

References

Tuesday, February 23, 2016

What about performance?

Monday, February 22, 2016

What is functional programming?

A Python example

Procedural wc

Functional wc

Comparing the two versions

Performance

Additional Resources

Thursday, February 18, 2016

How to Get Rid of Software Bugs Once and For All

If You Ignore It, It Just Gets Worse

How I Learned To Unleash The Power Of Python

It Worked For Me, And It Will Work For You

Finally, It’s Your Turn

It’s Decision Time

Wednesday, February 17, 2016

Solution

Other solutions

Tuesday, February 16, 2016

A procedural solution

A set based solution

What about speed?

Monday, February 1, 2016

What is the FizzBuzz Test?

Can we do better?

Putting it all together

A direct solution

Thursday, January 28, 2016

Blog Archive

"If you can't test it, don't build it. If you don't test it, rip it out."
Boris Beizer