Thursday, December 11, 2008

Get help porting your package!

To help the community with porting their packages to Python 3, we have created the python-porting mailing list. Many core developers are subscribed to the list, so you should be able to get excellent advice on 2to3, the bytes/unicode split, C-API, or other incompatibilities.

Wednesday, December 3, 2008

Is it all futile?

Today, the much anticipated Python 3.0 final was released. Truly, this is a historic release for the Python community, the first intentional incompatible Python release. It's been a long time in the making, and I applaud everybody who is responsible for proving Py3k is not vaporware. Guido and the other decision makers on Python-dev also deserve credit for not making py3k changes too gratuitous; many revolutionary ideas and features were proposed and rejected. I have every confidence that every incompatibility was well thought out. Much thought has also been given to making the transition as easy as possible for the community. The suggested migration path, fixing Py3k warnings in 2.6 and then applying 2to3, has been used with a fair amount of success on several projects. We've even had several Py3k love letters!

Still, I can't help being a little worried. I think the bytes and str divide will be difficult for people especially with IO where everything has to be dealt with in bytes. We may see many "x.encode('ascii')" lines popping up all over codebases. Userland libraries will need to maintain compatibility with 2.4 and 2.5 for a while; that significantly complicates the dream of maintaining just one branch for 2.x and py3k. 2to3 is not even close to perfect and will only correct the surface incompatibilities of syntax between the versions. I'm also concerned about burn out. The excitement of a new major version will certainly spur an interest in porting for a few months, but I suspect it won't be so fun after the aura wears off a bit. I hope common base libraries (PIL, Twisted, lxml, etc...) are ported soon. It will build the bridge for everything else to cross over too.

Of course, what I'm forgetting is the amazing Python community. Whatever the results, a new era has certainly begun. We just need time.

Sunday, October 19, 2008

Pure Python Dictionary Implementation

For those curious about how CPython's dict implementation works, I've written a Python implementation using the same algorithms. Aside from the education value, it's pretty useless because it doesn't support None as a value and is extremely slow. You can get the source in a Bazaar repo: http://code.python.org/python/users/benjamin.peterson/pydict/

"""
A Python dict implementation.
"""

import collections

MINSIZE = 8
PERTURB_SHIFT = 5
dummy = "<dummy key>"


class Entry(object):
"""
A hash table entry.

Attributes:
* key - The key for this entry.
* hash - The has of the key.
* value - The value associated with the key.
"""

__slots__ = ("key", "value", "hash")

def __init__(self):
self.key = None
self.value = None
self.hash = 0

def __repr__(self):
return "<Entry: key={0} value={1}>".format(self.key, self.value)



class Dict(object):
"""
A mapping interface implemented as a hash table.

Attributes:
* used - The number of entires used in the table.
* filled - used + number of entries with a dummy key.
* table - List of entries; contains the actual dict data.
* mask - Length of table - 1. Used to fetch values.
"""

__slots__ = ("filled", "used", "mask", "table")


def __init__(self, arg=None, **kwargs):
self.clear()
self._update(arg, kwargs)

@classmethod
def fromkeys(cls, keys, value=0):
"""
Return a new dictionary from a sequence of keys.
"""
d = cls()
for key in keys:
d[key] = value
return d

def clear(self):
"""
Clear the dictionary of all data.
"""
self.filled = 0
self.used = 0
self.mask = MINSIZE - 1
self.table = []
# Initialize the table to a clean slate of entries.
for i in range(MINSIZE):
self.table.append(Entry())

def pop(self, *args):
"""
Remove and return the value for a key.
"""
have_default = len(args) == 2
try:
v = self[args[0]]
except KeyError:
if have_default:
return args[1]
raise
else:
del self[args[0]]
return v

def popitem(self):
"""
Remove and return any key-value pair from the dictionary.
"""
if self.used == 0:
raise KeyError("empty dictionary")
entry0 = self.table[0]
entry = entry0
i = 0
if entry0.value is None:
# The first entry in the table's hash is abused to hold the index to
# the next place to look for a value to pop.
i = entry0.hash
if i > self.mask or i < i:
i = 1
entry = self.table[i]
while entry.value is None:
i += 1
if i > self.mask:
i = 1
entry = self.table[i]
res = entry.key, entry.value
self._del(entry)
# Set the next place to start.
entry0.hash = i + 1
return res

def setdefault(self, key, default=0):
"""
If key is in the dictionary, return it. Otherwise, set it to the default
value.
"""
val = self._lookup(key).value
if val is None:
self[key] = default
return default
return val

def _lookup(self, key):
"""
Find the entry for a key.
"""
key_hash = hash(key)
i = key_hash & self.mask
entry = self.table[i]
if entry.key is None or entry is key:
return entry
free = None
if entry.key is dummy:
free = entry
elif entry.hash == key_hash and key == entry.key:
return entry

perturb = key_hash
while True:
i = (i << 2) + i + perturb + 1;
entry = self.table[i & self.mask]
if entry.key is None:
return entry if free is None else free
if entry.key is key or \
(entry.hash == key_hash and key == entry.key):
return entry
elif entry.key is dummy and free is None:
free = dummy
perturb >>= PERTURB_SHIFT

assert False, "not reached"

def _resize(self, minused):
"""
Resize the dictionary to at least minused.
"""
newsize = MINSIZE
# Find the smalled value for newsize.
while newsize <= minused and newsize > 0:
newsize <<= 1
oldtable = self.table
# Create a new table newsize long.
newtable = []
while len(newtable) < newsize:
newtable.append(Entry())
# Replace the old table.
self.table = newtable
self.used = 0
self.filled = 0
# Copy the old data into the new table.
for entry in oldtable:
if entry.value is not None:
self._insert_into_clean(entry)
elif entry.key is dummy:
entry.key = None
self.mask = newsize - 1

def _insert_into_clean(self, entry):
"""
Insert an item in a clean dict. This is a helper for resizing.
"""
i = entry.hash & self.mask
new_entry = self.table[i]
perturb = entry.hash
while new_entry.key is not None:
i = (i << 2) + i + perturb + 1
new_entry = self.table[i & self.mask]
perturb >>= PERTURB_SHIFT
new_entry.key = entry.key
new_entry.value = entry.value
new_entry.hash = entry.hash
self.used += 1
self.filled += 1

def _insert(self, key, value):
"""
Add a new value to the dictionary or replace an old one.
"""
entry = self._lookup(key)
if entry.value is None:
self.used += 1
if entry.key is not dummy:
self.filled += 1
entry.key = key
entry.hash = hash(key)
entry.value = value

def _del(self, entry):
"""
Mark an entry as free with the dummy key.
"""
entry.key = dummy
entry.value = None
self.used -= 1

def __getitem__(self, key):
value = self._lookup(key).value
if value is None:
# Check if we're a subclass.
if type(self) is not Dict:
# Try to call the __missing__ method.
missing = getattr(self, "__missing__")
if missing is not None:
return missing(key)
raise KeyError("no such key: {0!r}".format(key))
return value

def __setitem__(self, key, what):
# None is used as a marker for empty entries, so it can't be in a
# dictionary.
assert what is not None and key is not None, \
"key and value must not be None"
old_used = self.used
self._insert(key, what)
# Maybe resize the dict.
if not (self.used > old_used and
self.filled*3 >= (self.mask + 1)*2):
return
# Large dictionaries (< 5000) are only doubled in size.
factor = 2 if self.used > 5000 else 4
self._resize(factor*self.used)

def __delitem__(self, key):
entry = self._lookup(key)
if entry.value is None:
raise KeyError("no such key: {0!r}".format(key))
self._del(entry)

def __contains__(self, key):
"""
Check if a key is in the dictionary.
"""
return self._lookup(key).value is not None

def __eq__(self, other):
if not isinstance(other, Dict):
try:
# Try to coerce the other to a Dict, so we can compare it.
other = Dict(other)
except TypeError:
return NotImplemented
if self.used != other.used:
# They're not the same size.
return False
# Look through the table and compare every entry, breaking out early if
# we find a difference.
for entry in self.table:
if entry.value is not None:
try:
bval = other[entry.key]
except KeyError:
return False
if not bval == entry.value:
return False
return True

def __ne__(self, other):
return not self == other

def keys(self):
"""
Return a list of keys in the dictionary.
"""
return [entry.key for entry in self.table if entry.value is not None]

def values(self):
"""
Return a list of values in the dictionary.
"""
return [entry.value for entry in self.table if entry.value is not None]

def items(self):
"""
Return a list of key-value pairs.
"""
return [(entry.key, entry.value) for entry in self.table
if entry.value is not None]

def __iter__(self):
return DictKeysIterator(self)

def itervalues(self):
"""
Return an iterator over the values in the dictionary.
"""
return DictValuesIterator(self)

def iterkeys(self):
"""
Return an iterator over the keys in the dictionary.
"""
return DictKeysIterator(self)

def iteritems(self):
"""
Return an iterator over key-value pairs.
"""
return DictItemsIterator(self)

def _merge(self, mapping):
"""
Update the dictionary from a mapping.
"""
for key in mapping.keys():
self[key] = mapping[key]

def _from_sequence(self, seq):
for double in seq:
if len(double) != 2:
raise ValueError("{0!r} doesn't have a length of 2".format(
double))
self[double[0]] = double[1]

def _update(self, arg, kwargs):
if arg:
if isinstance(arg, collections.Mapping):
self._merge(arg)
else:
self._from_sequence(arg)
if kwargs:
self._merge(kwargs)

def update(self, arg=None, **kwargs):
"""
Update the dictionary from a mapping or sequence containing key-value
pairs. Any existing values are overwritten.
"""
self._update(arg, kwargs)

def get(self, key, default=0):
"""
Return the value for key if it exists otherwise the default.
"""
try:
return self[key]
except KeyError:
return default

def __len__(self):
return self.used

def __repr__(self):
r = ["{0!r} : {1!r}".format(k, v) for k, v in self.iteritems()]
return "Dict({" + ", ".join(r) + "})"

collections.Mapping.register(Dict)


class DictIterator(object):

def __init__(self, d):
self.d = d
self.used = self.d.used
self.len = self.d.used
self.pos = 0

def __iter__(self):
return self

def next(self):
# Check if the dictionary has been mutated under us.
if self.used != self.d.used:
# Make this state permanent.
self.used = -1
raise RuntimeError("dictionary size changed during interation")
i = self.pos
while i <= self.d.mask and self.d.table[i].value is None:
i += 1
self.pos = i + 1
if i > self.d.mask:
# We're done.
raise StopIteration
self.len -= 1
return self._extract(self.d.table[i])

__next__ = next

def _extract(self, entry):
return getattr(entry, self.kind)

def __len__(self):
return self.len

class DictKeysIterator(DictIterator):
kind = "key"

class DictValuesIterator(DictIterator):
kind = "value"

class DictItemsIterator(DictIterator):

def _extract(self, entry):
return entry.key, entry.value

Monday, October 13, 2008

First impressions of darcs

This week, I've been playing around with the relatively little known distributed version control system, darcs. (That stands for David's Advanced Revision Control System.)

Darcs is based on David Roundy's, its creator, theory of patches. Simply put, darcs' fundamental type is a difference between two trees, a patch.

Creating a simple repo was quick and painless with "darcs initialize". I recorded a few patches easily, and was feeling quite happy about the fast pace with which darcs went about its business. Then, I decided to review my work. Apparently, darcs has no concept of a revision number; every "commit" is just a patch. This makes selecting patches to review rather difficult since everything is relative to the current state of the repo. Perhaps this isn't a problem in practice, though, because advanced patch matching (with regular expressions) is provided. Another thing I disliked was the lack of history in merging between repos. Although it is simple to do, no evidence besides the author's name in the log indicates that the patch was pulled.

Obviously, this is just a first step into the exciting darcs world; I'll continue to use it for some of my projects, and report back later.

Wednesday, October 1, 2008

Python 2.6 released!

Fire the cannons! Begin the fireworks! Scream at the top of your lungs! Clink your glasses! Python 2.6 is here! Download it, and learn what's new.

This is my first final python release on the core team, and I'm quite proud of our baby. :)

Sunday, September 14, 2008

2.6 RC 1 is out

If you didn't already know it, 2.6 RC 1 was released two days ago. There's nothing big to report (and there shouldn't be of course!). In fact, the biggest change you may notice is the absence of a 3.0 RC alongside. This is due to the fact that we believe although 2.6 is almost rock solid, 3.0 definitely still needs some more TLC. We are still aiming for a 2.6 final on October 1st, but 3.0 has been pushed back to the 15th. (You can, of course, see the whole release schedule in PEP 361.) Please, please, please take the time to download it and test it with your project; now is when we want to know about grotesque bugs and incompatibilities!

Thursday, September 4, 2008

Fun with 2to3

Recently, I've been working on 2to3 code refactoring tool. It's quite exciting really; how often does automatic editing of source code work?

2to3 is completely written in Python using the stdlib. The main steps in code translation are:

  1. 2to3 is given a Python source file and a list of transformations (in units called fixers) to apply to it.

  2. 2to3 generates a custom parse tree of the source based on a Python grammar that combines elements of 2.x and 3.x's syntax. It takes note of exact indentation and comment so it will be reproduced exactly later.

  3. Each fixer has a pattern that describes the nodes it wants to match in the parse tree. The tree is traversed while asking each fixer if it matches the given node.

  4. If the fixer's pattern matches the node, the fixer is ask to transform the code. The fixer can manipulate the parse tree directly.

  5. A diff against the original source is printed to stdout and optionally written back to the file.



Over the past few weeks, I've written a couple of fixers. It's pretty intuitive once you get the hang of it, but writing good tests is very important because Python's flexible syntax produces many possibilities you fixer must deal with. I also refactored lib2to3, so plugging a different system of fixers is much easier for custom applications. I've also written some documentation on it's usage. I hope to start documenting the API and writing a guide for creating fixers soon, so other people can start making use of lib2to3.

On the Google Open Source Blog

It seems my testing project has made the Google Open Source Blog! Once again, I'd like to give a huge round of applause everyone who made it possible!

Thursday, August 21, 2008

The third betas are out!

Last night, Barry released the third and final betas for Python 2.6 and 3.0. This release really saw a monumental effort from everyone. Tuesday was filled with coding, code reviews, and bug fixes, and we were able to close all 10 release blockers. It was looking very bright for a release yesterday morning. (Including the fantastic omen of a green Windows buildbot.) That didn't stop another release blocker from popping up hours before the release. However, we were able to get that one fixed due to some excellent debugging on the parts of Amaury Forgeot d'Arc, Antoine Pitrou, and Victor Stinner. Highlights:

  • A more completed memoryview implementation. (Once again thanks to Antoine.

  • I fixed up the symtable module and wrote unittests and docs for it. It should actually be functional now...

  • The threading API was renamed once again. Now daemon, and name are properties of the Thread object. (That's the last time I'm doing threading renaming I assure you!)

  • multiprocessing has gotten more stable.

  • Guido donated the SRE bytecode validator he wrote for the App Engine.

  • Many, many, many little bug fixes.



Of course, there's still lots of work to do before we can get to release candidates. Hours after the release we already have 11 release blockers. In the mean time, I strongly encourage you to download 2.6 and 3.0 and test them with your project. The bug tracker is still there: http://bugs.python.org

Friday, July 25, 2008

bzr vs. hg -- a different perspective

I'd say one of my favorite parts of F/OSS is the educational value I can get from it. There's much more practical programming knowledge in Python/import.c than in your average CS textbook.

My code curiosity recently turned to my favorite (D)VCSes, Mercurial, and Bazaar. (I've tried reading Subversion, but the C is just too much.) Much of the world (including me) likes to battle over their merits, but I'd like to talk about what I saw in the source. (Disclaimer: I do not claim to have an good knowledge of either project's design philosophy or why things were done the way they were.)

On the superficial side, Bazaar has a lot more code than Mercurial does: About 1 MB with 2.5 MB of tests for Mercurial and 6MB with 6 MB of tests for Bazaar. I'm not going to make anything of it, though, for fear of condemnation. Mercurial's non-capitalized class names also drive me a bit crazy...

Overall, Mercurial appears to be a much simpler system, true to the wisdom of "do one thing well". The one repository format is beautifully simple. Bazaar, on the other hand, has to have layers of abstraction in order to make access over many protocols and different repository, branch, and working tree formats possible. For this complexity, it gains much flexibility allowing it to be a hybrid distributed and central VCS. Bazaar's source code also has many more comments and docstrings. Mercurial's looks rather bare in comparison.

Bazaar and Mercurial both seem to implement dirstate the same way. They also both have C extensions targeted to speed themselves up.

In the end, the both work well and are a pleasure to use, so I'm not going to complain.

Friday, July 18, 2008

The Second betas are out!

Our release manager, Barry Warsaw, has just released (I prefer unleashed) Python 2.6 and 3.0 beta 2 in the big, wild world. Highlights (that I can recall) include:


  • Installing on Mac for 3.0 is fixed.

  • test_multiprocessing hangs have been fixed.

  • gettext has gotten a different API.

  • Tracebacks now display their causes and contexts. See exception chaining.

  • Antoine Pitrou got commit access. Yeah!



Even now, the planned final release date in October looms, so we need lots of people to test their application and libraries with the beta release and file bug reports.

Sunday, July 13, 2008

Learning Django

Quite honestly, I haven't done any web programming since I quit PHP. I do hear a lot of about all those Python web frameworks, so when I saw a link to the Django Book, I took the initiative and started reading. I haven't tried out any other frameworks, but my initial reaction is to love Django. It's obviously well designed around DRY and the Zen and very well documented. Django models are beautiful and makes SQL very painless.

Having worked through the examples in the book, I've started conceiving a larger project. I'd like to make my personal homepage. Obviously, it would have a lot of static content, and I want to do that in the clearest way possible. So, I'll steal a hint from the django docs: write my site in reST and dynamically generate the webpage with docutils. (It will be a good exercise with caching.) In fact, I have a version running locally.

[My first lazyweb query:] Where can I get good Python hosting? I'll need to upload Django, too because I'm running off SVN. Can it be gotten cheap or even for free?

BTW, I'm now aggregated on Planet Python!

Thursday, June 19, 2008

The betas are out!

Yesterday, Barry released the first betas for Python 2.6 and 3.0. Highlights that I remember (look in Misc/NEWS for all the gory details) include:

Now that the 2.6 and 3.0 are almost feature complete, (It was just noticed yesterday that some points of PEP 3118 have not been implemented. They will have to go in post-beta.) we can switch gears and start grinding on the bugs. What I seen so far on the tracker is already impressive. We've gotten at least 10 new issues mentioning the betas. Hopefully, the bug days this weekend will be able to stamp out many of them.