2to3 is written entirely in Python, using only the standard library. The main steps in code translation are:
- 2to3 is given a Python source file and a list of transformations (in units called fixers) to apply to it.
- 2to3 generates a custom parse tree of the source based on a Python grammar that combines elements of 2.x and 3.x syntax. It records the exact indentation and comments so they can be reproduced exactly later.
- Each fixer has a pattern that describes the nodes it wants to match in the parse tree. As the tree is traversed, each fixer is asked whether it matches the current node.
- If the fixer's pattern matches the node, the fixer is asked to transform the code. The fixer can manipulate the parse tree directly.
- A diff against the original source is printed to stdout and optionally written back to the file.
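To make the match/transform/diff cycle above a bit more concrete, here is a toy sketch. It does not use lib2to3 at all: it walks the token stream from the stdlib `tokenize` module instead of a real parse tree, and `RenameFixer`, `apply_fixers` and `make_diff` are names I made up for illustration, not part of any 2to3 API. Real fixers match tree patterns and are far more capable; this only shows the general shape of the idea.

```python
import difflib
import io
import tokenize


class RenameFixer:
    """Toy fixer (hypothetical, not a lib2to3 class): renames one identifier."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def match(self, tok):
        # Only real NAME tokens match; strings and comments are other token types.
        return tok.type == tokenize.NAME and tok.string == self.old

    def transform(self, tok):
        # Return the replacement text for the matched token.
        return self.new


def apply_fixers(source, fixers):
    """Ask each fixer about each token and splice replacements into the source."""
    lines = source.splitlines(keepends=True)
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    # Work backwards so column offsets of earlier tokens stay valid after an edit.
    for tok in reversed(tokens):
        for fixer in fixers:
            if fixer.match(tok):
                row, start = tok.start
                end = tok.end[1]
                line = lines[row - 1]
                lines[row - 1] = line[:start] + fixer.transform(tok) + line[end:]
                break  # first matching fixer wins for this token
    return "".join(lines)


def make_diff(old, new, filename="example.py"):
    """Build a unified diff of the original against the fixed source."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=filename + " (original)",
        tofile=filename + " (refactored)",
    ))


SOURCE = "for i in xrange(10):   # xrange is lazy\n    total = xrange (i)\n"
fixed = apply_fixers(SOURCE, [RenameFixer("xrange", "range")])
print(make_diff(SOURCE, fixed), end="")
```

Only the two `xrange` calls are rewritten. The comment that mentions `xrange` and the odd spacing in `xrange (i)` are left alone, which is the same "touch only what matched, keep everything else exactly as written" behavior described in the steps above.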
Over the past few weeks, I've written a couple of fixers. It's pretty intuitive once you get the hang of it, but writing good tests is very important because Python's flexible syntax produces many possibilities your fixer must deal with. I also refactored lib2to3, so plugging in a different system of fixers is much easier for custom applications. I've also written some documentation on its usage. I hope to start documenting the API and writing a guide for creating fixers soon, so other people can start making use of lib2to3.