Thursday, September 4, 2008

Fun with 2to3

Recently, I've been working on the 2to3 code refactoring tool. It's quite exciting really; how often does automatic editing of source code work?

2to3 is written entirely in Python using the stdlib. The main steps in code translation are:

  1. 2to3 is given a Python source file and a list of transformations (in units called fixers) to apply to it.

  2. 2to3 generates a custom parse tree of the source based on a Python grammar that combines elements of 2.x and 3.x syntax. It records the exact indentation and comments so they can be reproduced exactly later.

  3. Each fixer has a pattern that describes the nodes it wants to match in the parse tree. The tree is traversed while asking each fixer if it matches the given node.

  4. If the fixer's pattern matches the node, the fixer is asked to transform the code. The fixer can manipulate the parse tree directly.

  5. A diff against the original source is printed to stdout and optionally written back to the file.
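To make the steps above a bit more concrete, here is a toy sketch of the match, transform, and diff flow. It is not 2to3's actual code and does not use lib2to3: as a simplifying assumption it works on tokens from the stdlib tokenize module rather than on a parse tree, and the names fix_xrange, diff, SOURCE, and FIXED are made up for this illustration. The "fixer" renames xrange to range while leaving comments, strings, and spacing untouched, then prints a diff against the original.

```python
import difflib
import io
import tokenize


def fix_xrange(source):
    """Toy 'fixer' (hypothetical, not part of lib2to3).

    Renames every NAME token `xrange` to `range`. Only the matched
    tokens are rewritten; all other text, including comments and
    spacing, is copied from the original source unchanged.
    """
    lines = source.splitlines(keepends=True)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # "Pattern matching": collect the (row, col) of each matching token.
    matches = [
        tok.start
        for tok in tokens
        if tok.type == tokenize.NAME and tok.string == "xrange"
    ]
    # "Transformation": edit right-to-left so earlier offsets stay valid.
    for row, col in sorted(matches, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + "range" + line[col + len("xrange"):]
    return "".join(lines)


def diff(old, new, filename="example.py"):
    """Return a unified diff of the original source against the result."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=filename + " (original)",
            tofile=filename + " (refactored)",
        )
    )


SOURCE = (
    "for i in xrange(3):   # loop; 'xrange' in a comment stays\n"
    "    print('xrange', i)\n"
)
FIXED = fix_xrange(SOURCE)
print(diff(SOURCE, FIXED), end="")
```

Only the xrange call on the first line changes; the occurrences inside the comment and the string literal are different token types, so the toy fixer never matches them. A real fixer gets the same kind of precision from its pattern over parse tree nodes.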



Over the past few weeks, I've written a couple of fixers. It's pretty intuitive once you get the hang of it, but writing good tests is very important because Python's flexible syntax produces many variations your fixer must deal with. I also refactored lib2to3, so plugging in a different set of fixers for custom applications is much easier. I've also written some documentation on its usage. I hope to start documenting the API and writing a guide for creating fixers soon, so other people can start making use of lib2to3.

2 comments:

Anonymous said...

Yay! Documentation! I wanted to write a fixer, but failed due to the complete lack of any explanation of how it actually works. :)

I'm hoping for a set of fixes that will actually take 2.6 code and make it into 2.6/3.0 code. This looks possible with code that isn't too complex.

http://code.google.com/p/python-incompatibility

Unknown said...

It's been a while since you wrote this article, but searching online for more information on writing custom fixers didn't turn up many results.
Have you had time to write that documentation you were talking about?