Debriefing from PyCon2010
Presented By: Milad Fatenejad, Nico Preston, Katy Huff, Matthew McCormick, and Anthony Scopatz
Python with 'Big' Astronomy
Astronomers with the Giant Magellan Telescope in Chile have used Python for most of their simulations. They build sky simulations by linking existing C and Fortran code with Python.
- For discrete event simulations they use a Python module called SimPy.
- They also use MPI with the mpi4py module.
- The whole thing is destined to be an open source astronomy tool.
- The trac page where you can download this info is at http://dev.lsstcorp.org
- They use SWIG and ctypes. The main developer/professor told the creator of SWIG that he really likes ctypes...
- MILAD Put the link we got from the guy at the scientific computing Open Space here for python-condor integration, please.
The Mighty Dictionary
Python lists use contiguous segments of RAM, and RAM itself behaves much like a list (it is indexed, etc.). Dictionaries, on the other hand, are different. Behind the scenes, Python creates a hash table, which is an array of slots. A hash function reduces a key to a fixed-size integer (32 bits in the talk). Python uses the bottom n bits of the hash as the slot index.
- Lookup has three steps: compute the hash, truncate it to the last n bits (three in the talk's example), and look at that place in memory. This is why dictionaries print their keys in a crazy order: printing visits each slot in table order.
- If the dictionary contains many keys whose hashes end in the same three bits, lookups will be slow. When a key hashes to a slot that's already been used, it is stored in the next most logical free slot.
- The slides for this presentation were created with the python module doctest. A very interesting tool.
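A toy sketch of the lookup steps above, assuming a table of 8 slots and plain linear probing on collision. CPython's real probe sequence is more elaborate, and the class name ToyDict and its methods are made up for illustration:

```python
class ToyDict:
    """Toy hash table with 8 slots; a simplification, not CPython's dict."""

    def __init__(self):
        self.slots = [None] * 8  # each slot holds (key, value) or None

    def _index(self, key):
        # Keep only the bottom three bits of the hash (0b111 == 7).
        return hash(key) & 0b111

    def set(self, key, value):
        i = self._index(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % len(self.slots)  # collision: try the next slot
        raise RuntimeError("toy table is full")

    def get(self, key):
        i = self._index(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None:
                break
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)
        raise KeyError(key)


d = ToyDict()
d.set(1, "one")
d.set(9, "nine")  # hash(9) & 0b111 == 1, so this collides with key 1
d.set(2, "two")   # slot 2 is already taken by key 9, so this lands in slot 3
```

Small integers hash to themselves, which makes the collisions here predictable. The more keys share their bottom bits, the longer each probe loop runs.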
Deployment, Development, Dependencies, Packaging, and Stuff
This is an overview of dealing with your dependencies when packaging and deploying your code.
- Code without dependencies is easier to install, arguably more robust, and it's easy to figure out who's responsible if anything is broken.
- He reminds us about contracts, import * and module layouts. Don't use import * .
- What is a public function? Something is only explicitly marked private if it has a leading _ . What does this mean exactly?
- Similar commentary in terms of private modules.
- Double leading underscores are another organizing tool. Inside a class body they trigger name mangling (__x is stored as _ClassName__x), so they're for really obscure stuff.
- Python packages are a woonerf. From Wikipedia: "A Woonerf (plural woonerfs or woonerven) in the Netherlands and Flanders is a street where pedestrians and cyclists have legal priority over motorists." Don't focus on rules or conventions. Just know what's going on. If you're a car about to collide with a bike, you make eye contact and make a decision instead of blindly following rules... Awesome little story.
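The underscore conventions from the list above can be sketched like this. The class and attribute names are invented for the example:

```python
class Rover:
    def __init__(self):
        self.name = "public"       # part of the public interface
        self._battery = "private"  # convention only: "please don't touch"
        self.__serial = "mangled"  # name-mangled to _Rover__serial


r = Rover()

# Nothing is enforced: the names are all still reachable if you know them.
public_names = [n for n in vars(r) if not n.startswith("_")]
```

Here vars(r) contains name, _battery, and _Rover__serial. The single underscore is purely a signal to the reader, which fits the woonerf idea: know what's going on rather than rely on enforcement.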
The State of Packaging
From a developer point of view, the distutils and setuptools code is like a spaghetti bowl. From the user point of view, it's a disaster. Pip, setup.py, and easy_install all give different results in site-packages.
- The work here was on standardization.
- Metadata Standardization
- Version schemes
- Installed Distributions Index
The project (PEP 345) gives you options like what your code requires, what it's incompatible with, etc. There are also markers for compatibility across platforms and environments. In the future installs will be standardized and cleaned up. Uninstallation will also be possible (if they can get it to work :P ).
Using Python to Create Robotic Simulations for Planetary Exploration
The simulation models rovers, which are themselves modular. Python therefore makes a better fit than the language used previously (Tcl, pronounced "tickle"). Some complex problems for modeling:
- Gear boxes have components that move at different speeds.
- Wheels of rover must impact (dig into) soil in the correct way.
- Correct suspension response for 1, 2, 4, 6+ wheels.
Neat Tools:
- Terrain data stored in HDF5!
- Code generates movie of simulation
- Use numpy, matplotlib...
Maximize your program's laziness
Haskell is a lazy, purely functional language. Nothing gets computed until it is concretely needed. This allows lists to actually be infinitely long! In Python, the equivalent would be a generator or iterator that yields the elements of a list one at a time.
Some words on python iterators:
- You can take a slice of an iterator (with itertools.islice)! However, elements sliced out are gone (used up). Further access of the iterator won't return elements previously sliced out.
- 'Promise' objects are a promise to call some bit of code later. If the promise is never called, that code is never executed. This is useful for deferring very slow or expensive computations.
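A minimal sketch of both ideas. The generator uses only the standard library. The Promise class is a hypothetical helper written for this example, not something from the talk:

```python
from itertools import count, islice


def squares():
    """Infinite generator: each square is computed only when asked for."""
    for n in count(1):
        yield n * n


stream = squares()
first_three = list(islice(stream, 3))  # consumes 1, 4, 9
next_two = list(islice(stream, 2))     # continues where the last slice stopped


class Promise:
    """Hypothetical helper: runs the wrapped function at most once, on demand."""

    def __init__(self, func):
        self._func = func
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._func()
            self._done = True
        return self._value


calls = []  # records whether the deferred code actually ran
p = Promise(lambda: calls.append("ran") or 42)
```

Until p.force() is called, calls stays empty. After the first call the result is cached, so the wrapped function never runs twice.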
Deconstruction of an Object
This was a beginner talk discussing the way in which you should understand objects, inheritance, and classes in python.
- each class has def __init__
- docstrings are indicated with """
- super() is a useful builtin function that avoids calling Parent.__init__(self) explicitly when initializing parent classes.
- Namespaces
- locals()
- globals()
- in python each module has its own set of globals
- __main__ globals() and __builtin__ globals() are different
- assigning variables to namespaces is similar in classes and functions.
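A small sketch of the points above, using the Python 3 spelling of super(). The class names and values are invented for the example:

```python
class Parent:
    def __init__(self, name):
        """Store the name on the instance."""
        self.name = name


class Child(Parent):
    def __init__(self, name, age):
        # Python 3 spelling; in Python 2 this is super(Child, self).__init__(name)
        super().__init__(name)
        self.age = age


def show_locals():
    x = 1
    return sorted(locals())  # only the names local to this call


c = Child("Ada", 36)
```

Calling globals() at module level would list Parent, Child, show_locals, and c, while locals() inside the function sees only x.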
Powerful Python Patterns
"What is our work? We work to delight human beings..."
More details can be found in the document [www.aleax.it/goo_pydp.pdf].
First, he talks about the template method : He calls this more descriptively "self-delegation."
- Classically, this happens via inheritance: the base class has the organizing method while subclasses implement the hooks.
- In python :
- You can override data.
- "mixins" are extra classes that complicate the notion of inheritance and contribute organizing methods
- something called "runtime introspection"
- injection is something where the hook implementation is 'injected' with organizing class attributes. It is well used with Factory methods.
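A minimal sketch of the template method ("self-delegation") idea, assuming a made-up Report class. It is not taken from the slides:

```python
class Report:
    def render(self):
        # Organizing method: fixes the order of the steps.
        return "\n".join([self.header(), self.body(), self.footer()])

    def header(self):
        return "== report =="

    def body(self):
        # Hook: subclasses are expected to override this.
        raise NotImplementedError

    def footer(self):
        return "== end =="


class TalkReport(Report):
    title = "PyCon"  # plain data can be overridden as well

    def body(self):
        return "notes from " + self.title


class ShoutMixin:
    # Mixin supplying its own version of one hook.
    def header(self):
        return "== REPORT =="


class LoudTalkReport(ShoutMixin, TalkReport):
    pass
```

render() never changes. Subclasses and mixins only swap out the pieces it delegates to through self.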
Day 2
Saturday, February 20th, 2010
Lightning Talks
Points from each talk:
- MVC (in web development) is an Anti-pattern, the opposite of the right way of doing things.
- Package managers are VERY insecure.
- The default SSL options in the stdlib DO NOT verify the certificate.
- Twisted wants you to help review bugs...I remain skeptical.
- Largest PyCon ever at 1,025 with 10-11% female attendees.
Mark Shuttleworth
Good Open Source projects break into three components: Cadence, Leadership, and Maintenance.
Cadence
- The idea that you have predictable, timed releases.
- For non-web projects, a 3 to 6 month release cycle makes the most sense. (It divides into a year nicely; not immediate but not long.)
- Claim: the kernel functions on an unofficial 3 month cadence, almost down to the day.
In projects without a timed release schedule, there is real pressure to constantly merge stable and trunk.
Decorators and Decorator Classes
Decorators can make your code a lot clearer to read.
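- Decorators use the @ sign, placed on the line above a function definition, to wrap that function.
- Concrete decorators don't take arguments: @myDecorator is a concrete decorator, while @myDecorator() is another type, which is called first (possibly with arguments) to produce the actual decorator.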
- More details can be found here http://www.python.org/dev/peps/pep-0318/.
- There is a useful python package for making decorators called dectools.
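A sketch of the two kinds of decorator, using only functools from the standard library. The names logged and repeat are invented for the example:

```python
import functools


def logged(func):
    """Used without arguments: @logged"""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1  # count how often the function is called
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def repeat(times):
    """Used with arguments: @repeat(3) returns the real decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


@logged
def add(a, b):
    return a + b


@repeat(3)
def greet(name):
    return "hi " + name
```

functools.wraps keeps the original function's name and docstring on the wrapper, which helps when debugging decorated code.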
The speed of PyPy
PyPy has different performance characteristics from CPython. For instance,
- You shouldn't load a lot of stuff into your global namespace in PyPy. In both CPython and PyPy you take a performance hit for doing this, but in PyPy it is very large while in CPython it is quite small.
- Has version typing.
- Avoid *args and **kwargs; unpacking argument lists is expensive.
- Objects are smaller than on CPython
- Garbage Collector does not use reference counting, based on Java GCs w/ Python improvements.
Actors: What, Why, and How ?
- What
- Actors can create other actors.
- They can pass messages
- They can wait to receive messages
- Why
- Only an actor can change its own state
- Code is very simple and linear
- Message passing never shares any memory and is therefore easy to distribute
- Simplified Error Handling : Mostly Timeouts or Network Errors
- How
- Implementation in Python?
- This guy built his own [bitbucket.org/fzzzy/python-actors] ("for science!...")
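A toy sketch of the actor idea using threads and queues from the standard library. It is not the python-actors API linked above. Everything runs in one process, so it only illustrates "own state plus mailbox", not distribution:

```python
import queue
import threading


class Actor:
    """Toy actor: private state, a mailbox, and one thread draining it."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._total = 0  # only the actor's own thread touches this
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        self._mailbox.put(message)

    def _run(self):
        while True:
            kind, payload = self._mailbox.get()  # wait to receive a message
            if kind == "add":
                self._total += payload
            elif kind == "report":
                payload.put(self._total)  # reply through the given queue
            elif kind == "stop":
                break

    def join(self):
        self._thread.join()


counter = Actor()
for n in (1, 2, 3):
    counter.send(("add", n))
reply = queue.Queue()
counter.send(("report", reply))
total = reply.get(timeout=5)
counter.send(("stop", None))
counter.join()
```

Messages are handled one at a time in arrival order, so the code inside the actor stays linear and needs no locks of its own.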
Continuous Integration
(Titus Brown)
- Ingredients
- shell script / batch file
- cron job / scheduled task
- some notification for 0 or nonzero return
- Options for Continuous Integration Systems for Unit Testing
- Buildbot -> Second tier suggestion
- Hudson -> First tier suggestion
- CruiseControl
- Bamboo
- Bitten
- Apycot
- Continuum
- Quickbuild
- He's created something called pony-build. He suggests you not use it.
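The ingredients listed above can be sketched in Python instead of a shell script. The commands here are stand-ins, and notify is a placeholder where a real job would send e-mail or similar:

```python
import subprocess
import sys


def run_build(command):
    """Run the build/test command and return its exit status."""
    return subprocess.call(command)


def notify(status):
    # Placeholder notification based on a zero or nonzero return.
    return "PASS" if status == 0 else "FAIL (exit status %d)" % status


# Stand-in commands; a real job would run the project's test suite.
ok = notify(run_build([sys.executable, "-c", "raise SystemExit(0)"]))
bad = notify(run_build([sys.executable, "-c", "raise SystemExit(3)"]))
```

Scheduling is left to cron or a scheduled task, as in the list above.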
Basie
(Greg Wilson) His argument here is that you can get good code out of students. Not only that, but student projects will teach students the things that computer science programs will not.
- Basie came from Dr. Project
- The course is called UCOSP, an undergrad capstone course, offers credit, and is based around a project.
- If it takes a student more than 5 minutes to figure something out, they'll never use it.
- Review Board is a cool tool for code reviewing
- Bad working practices (all nighters) are encouraged by the teaching structure at universities.
- He quotes a friend... "Professors, we're here to do research, they pay us to teach, we waste our time in administration."
- Code Review! "We expect people to write code without ever reading any..."
- Suggests Karl Fogel's book on Producing Open Source Software
- [ucosp.wordpress.com] is asking for professors/professionals already involved in open source projects to lead students for a summer.
Hg and Git: Can't we all just get along
- Hg and git are similar.
- SVN is the enemy.
- hg-git (http://hg-git.github.com/) provides two-way translation, so a client of one system can work with a repository of the other.
- Github.com may have hg support via hg-git soon.
Vistrails
- Provenance and scientific visualization system.
- Stand-alone application or works with Paraview or Visit.
- Visualization with VTK, matplotlib, ....
- http://vistrails.org
Testing
- Good tools for testing: nose http://code.google.com/p/python-nose/ and py.test http://codespeak.net/py/dist/test/
- Coverage testing: figleaf http://darcs.idyll.org/~t/projects/figleaf/doc/ and coverage http://nedbatchelder.com/code/coverage/
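Both nose and py.test collect plain functions whose names start with test_, so a test file can be as small as this. The mean function is a made-up example:

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(values) / float(len(values))


def test_mean_of_integers():
    assert mean([1, 2, 3]) == 2.0


def test_mean_of_empty_sequence_fails():
    try:
        mean([])
    except ZeroDivisionError:
        return
    raise AssertionError("expected ZeroDivisionError")
```

Saved in a file such as test_mean.py (a hypothetical name), either runner should find and run both tests without any extra boilerplate.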
Distributed computing
- The python Global Interpreter Lock (GIL) is the suck.
- Good things were said about the multiprocessing module to work on multiple cores.
- execnet is a new system to do distributed computing or testing across heterogeneous systems. http://codespeak.net/py/dist/execnet.html
- mpi4py
- Pyro
- Parallel computing features in IPython.
python-docx
- http://github.com/mikemaccana/python-docx
- Tool to work with the new Microsoft Office document formats.
From the Meeting
mpi4py : (Milad): allows you to use MPI in python
- Useful when you couple MPI with fortran: can use fortran for the computationally intensive things, and use python to handle passing the data around
py.test & nose: good for testing your code - e.g., test-driven development
JSON : JavaScript Object Notation - the 'new' xml format
- e.g., useful for writing a configuration file
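A minimal sketch of reading a JSON configuration with the standard json module. The keys and values are invented for the example:

```python
import json

# Hypothetical configuration; in practice this would be read from a file.
config_text = """
{
    "solver": "rk4",
    "timestep": 0.01,
    "outputs": ["density", "pressure"]
}
"""

config = json.loads(config_text)  # plain dicts, lists, strings, numbers
round_trip = json.loads(json.dumps(config, indent=4))
```

The parsed result is ordinary Python data, so config["timestep"] can be used directly.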
YAML : similar to JSON, but whitespace-sensitive
Parallel computing with python: python has some issues right now with multithreading (i.e., multiple threads on a shared-memory machine): there is only one interpreter, protected by a single lock (the GIL), so only one thread can execute Python code at a time
- Can still get benefits from multithreaded I/O, since that's done in C, and the first thing the C code does is release the lock on the interpreter
- And this issue doesn't come up with MPI -- it just comes up with multithreading
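A rough sketch of why threads still help with I/O. Here time.sleep stands in for a blocking read, since it also releases the interpreter lock while waiting:

```python
import threading
import time


def fake_io(delay):
    time.sleep(delay)  # releases the interpreter lock, like blocking I/O


start = time.time()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start  # the four waits overlap instead of adding up
```

The same experiment with a pure-Python number-crunching loop in place of the sleep would show little or no speedup from the extra threads.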
Databases: (Nico): Lots of different database technologies....
Lists & dictionaries: (Katy)
- Lists: stored contiguously in memory (sounds like an array in a lower-level language)
o Although someone debates this...
- Dictionaries: stored using hashes, so that you can basically get O(1) time, like a list lookup
o hashes are 32-bit integers
+ Can print this using bin(hash(key))
+ Then just uses last 3 bits to determine index; if a collision, uses previous 3 bits, etc.
Jeff Klukas asked: He wrote a little python thing that might be useful for others, and wants to know how to distribute it
- The simplest thing seems to be distutils (there are problems with this, but for simple installations it works)
- Good to know about the __init__.py file: you put it in a directory to mark that directory as a package, and it can control what is publicly available, etc.
- Milad: also, good tools like sourceforge or launchpad to distribute things, or pypi (the latter is the official repository of python packages)
Matt Terry: Wrapping 'bucky' fortran code in python, to make it more interactive & easily accessible
- First, turned code into a library, then wrapped it in f2py
o It's not necessary to turn it into a library, but this is nice in that you can compile the library once, then have a variety of different versions of the main program - one in fortran, one in python, etc.
- Then, can do things like:
o Add a check in the main loop for some user-defined condition
o Can inject code to plot something in every cycle
- In order to still have access to the interpreter, spawns a new thread to run the code in the background - then, can type pause() at the command-line, and it will pause at the next iteration, so you can plot stuff, etc., mid-stream
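A hedged sketch of the background-thread idea, with a pure-Python loop standing in for the wrapped Fortran code. The Simulation class and its method names are hypothetical, not Matt's actual code:

```python
import threading
import time


class Simulation:
    """Hypothetical stand-in for the wrapped main loop."""

    def __init__(self, steps):
        self.steps = steps
        self.cycle = 0
        self._running = threading.Event()
        self._running.set()  # start unpaused
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        while self.cycle < self.steps:
            self._running.wait()  # blocks here while paused
            self.cycle += 1       # one iteration of the "physics"
            time.sleep(0.001)

    def start(self):
        self._thread.start()

    def pause(self):
        self._running.clear()  # takes effect at the next iteration

    def resume(self):
        self._running.set()

    def wait(self):
        self._thread.join()


sim = Simulation(steps=50)
sim.start()
sim.pause()
time.sleep(0.05)       # give the loop time to park at the next iteration
paused_at = sim.cycle  # safe to inspect or plot state here
time.sleep(0.05)
still_at = sim.cycle
sim.resume()
sim.wait()
```

Because the loop runs in its own thread, the interpreter prompt stays free for calls like sim.pause() and sim.resume().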
vistrails (http://vistrails.org): a tool for tracking your workflow, and keeping versions of EVERYTHING - e.g., when you change a parameter and re-create your figure, it keeps the different versions