BMI Students

Monday, April 10, 2006

On the installation of a proper Python environment

I recently had my hard drive fail, so I have just had the fun of reinstalling my Python environment from scratch. Here are instructions for setting up Python as a proper interactive development and data analysis environment. Some of these instructions are OS X-specific (I'll flag those), but the general procedure will work on any *nix-ish platform.

Note that on OS X, I don't really love Fink or DarwinPorts for building and installing software for me, especially software that I depend on and may need to patch or run at bleeding-edge versions. So, here is how to install the following, all from source:
  • Python -- The best.
  • IPython -- An interactive python shell. Really useful.
  • NumPy and SciPy -- Numerical and scientific computing packages. Key for serious Python data analysis.
  • Gnuplot -- This is the plotting package that I use, and that I actually really like. (Hint: it saves plots as SVG for editing in Illustrator.)
  • Gnuplot.py -- A Python to Gnuplot bridge.

Here are the installation instructions.

    Before doing anything, make sure that /usr/local/bin is first in the PATH environment variable, because that's where we'll be installing these things. To see your path, type echo $PATH, and you will see a colon-separated list of directory names. This list of directories is searched, in the order specified, for programs to execute when you type in a particular name, like python. Since we are leaving Apple's own old (and crappy) version of Python in /usr/bin, we need to make sure that the new shiny Python we install (in /usr/local/bin) will be the one that is used when we type python into the shell. Hence the need to put /usr/local/bin first on the PATH.
    If you need to add /usr/local/bin to the PATH and you're using the bash shell (the OS X default), you will need to create a file called .profile in your home directory (if it doesn't exist) and add the following line to it: export PATH=/usr/local/bin:$PATH. Here's a good way to do that (the single quotes keep the shell from expanding $PATH before the line is written to the file):
    echo 'export PATH=/usr/local/bin:$PATH' >> ~/.profile

    If you're using tcsh (you would know if you are since it's not the default; but you can type echo $SHELL to find out), you would want the following:
    echo 'setenv PATH /usr/local/bin:$PATH' >> ~/.tcshrc
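
To make the "searched in the order specified" part concrete, here is a minimal sketch of that lookup in Python. The function name find_on_path is made up for illustration, and this is a simplified model of what a shell does, not the shell's actual implementation. It sticks to old-style constructs, so it should run on both the Python 2.4 built below and a current Python 3.

```python
import os


def find_on_path(program, path_string):
    """Return the first executable named `program` in `path_string`, or None."""
    for directory in path_string.split(os.pathsep):
        if not directory:
            # Simplification: a real shell treats an empty entry as the
            # current directory; this sketch just skips it.
            continue
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    # Which python wins with the PATH as it is now?
    print(find_on_path("python", current))
    # Same lookup with /usr/local/bin searched first.
    print(find_on_path("python", "/usr/local/bin" + os.pathsep + current))
```

If the two printed paths differ, whatever comes earlier in your PATH is hiding the copy in /usr/local/bin, which is exactly the situation the .profile line is meant to prevent.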

    Also note that I've only tested this on OS X 10.4. Some of the stuff might not work right on 10.3. Finally, if you're using 10.4, make sure that you have the latest version of the developer tools.

    And now, the directions!
    1. Make a directory for the source code we'll be getting.
      cd ~
      mkdir Developer
      cd Developer

    2. Install GNU Readline. This library allows Python and other programs to use the arrow keys like you expect, and many other goodies. Most *nixes come with a good version of Readline, but Apple doesn't ship OS X with one, probably because it's GPL. We'll install the latest Readline, plus some patches to it that are pretty important to make IPython work.
      curl -O ftp://ftp.cwru.edu/pub/bash/readline-5.1.tar.gz
      tar -xzf readline-5.1.tar.gz
      cd readline-5.1
      curl -O ftp://ftp.cwru.edu/pub/bash/readline-5.1-patches/readline51-001
      curl -O ftp://ftp.cwru.edu/pub/bash/readline-5.1-patches/readline51-002
      curl -O ftp://ftp.cwru.edu/pub/bash/readline-5.1-patches/readline51-003
      curl -O ftp://ftp.cwru.edu/pub/bash/readline-5.1-patches/readline51-004
      cat readline51* | patch
      ./configure
      make
      sudo make install
      cd ..

    3. Now Python 2.4.3 (the latest released version). These instructions are specific to building Python as an OS X framework (the proper way to install Python on OS X).
      mkdir Python
      cd Python
      curl -O http://www.python.org/ftp/python/2.4.3/Python-2.4.3.tgz
      tar -xzf Python-2.4.3.tgz
      cd Python-2.4.3
      ./configure --enable-framework
      make
      sudo make frameworkinstall
      cd ../..

      If you're using tcsh, you'll need to type "rehash" so that the shell can find the just-installed python.

    4. Now we need to get Subversion (a CVS-like tool) to check out bleeding-edge versions of IPython, SciPy, and NumPy. (Trust me, the svn versions of these are better than the latest releases, and more bug-free, because I've been actively tracking down OS X bugs for these tools.)
      These instructions show how to (on OS X) download a .dmg disk image containing a .pkg installer, mount the image, install the package, and unmount the image, all from the command line. You could also just do it from the finder with double-clicking, but this shows how hard-core I am!
      curl -O http://metissian.com/downloads/macosx/subversion/subversion-client-1.3.1.dmg
      hdiutil attach subversion-client-1.3.1.dmg
      sudo installer -pkg /Volumes/Subversion\ Client\ 1.3.1/SubversionClient-1.3.1.pkg -target /
      hdiutil detach /Volumes/Subversion\ Client\ 1.3.1

    5. Now IPython. The "pythonw" part is OS X-specific (see the IPython manual for an explanation); on other platforms, just use "python".
      cd Python
      svn co http://ipython.scipy.org/svn/ipython/ipython/trunk ipython
      cd ipython
      sudo pythonw setup.py install --install-scripts=/usr/local/bin
      cd ..

      Don't forget to type "rehash" if you're using tcsh, otherwise the shell won't be able to find the newly-installed ipython script.

    6. Now NumPy.
      svn co http://svn.scipy.org/svn/numpy/trunk numpy
      cd numpy
      python setup.py build
      sudo python setup.py install
      cd ../..

    7. Here's a fun one. Apple ships GCC version 4 with Tiger. GCC 4 is OK, but it links object files together differently from GCC 3. We'll need to link together a lot of C and Fortran code for SciPy (which wraps lots of high-performance numerical libraries, mostly written in Fortran), so we need to pick a single linking style: that of gcc3 or that of gcc4. g77 is the GNU Fortran compiler that works with gcc3, and gfortran is the one for use with gcc4. Unfortunately, gfortran sort of sucks, in that it is known to generate incorrect code, especially for PPC chips. So, unless you've got an Intel Mac, we will have to use gcc3 and g77. (The gcc3 Apple supplies for Intel Macs is known to suck, so on Intel you should use gcc4 and gfortran.)
      Anyhow, this means that we'll need to tell gcc to use version 3 and not version 4 for the code we compile to link with SciPy. Skip this step if you're on an Intel Mac. Also skip it if you're on OS X 10.3, because gcc3 is all you've got in that case.
      sudo gcc_select 3.3

    8. Now we install FFTW (version 2, which is what SciPy needs). FFTW is a library for doing Fourier transforms.
      curl -O http://www.fftw.org/fftw-2.1.5.tar.gz
      tar -xzf fftw-2.1.5.tar.gz
      cd fftw-2.1.5
      ./configure
      make
      sudo make install
      cd ..

    9. Now we get the Fortran compiler (this is OS X-specific). We'll just grab a pre-built binary of the compiler, since even I agree that compiling a compiler is overkill.
      For PPC Macs:
      curl -O http://easynews.dl.sourceforge.net/sourceforge/hpc/g77v3.4-bin.tar.gz
      sudo tar -C / -xzf g77v3.4-bin.tar.gz

      For Intel Macs:
      curl -O http://easynews.dl.sourceforge.net/sourceforge/hpc/gfortran-intel-bin.tar.gz
      sudo tar -C / -xzf gfortran-intel-bin.tar.gz

    10. Now we compile SciPy.
      cd Python
      svn co http://svn.scipy.org/svn/scipy/trunk scipy
      cd scipy
      python setup.py build
      sudo python setup.py install
      cd ../..

      If for some reason you have both g77 and gfortran installed, and want to choose which one to use, the build line looks like:
      python setup.py config_fc --fcompiler=XXX build

      where XXX is gnu (for g77) or gnu95 (for gfortran).

    11. Now we can switch back to the default GCC version. (Skip on Intel Macs. Also skip on Macs running 10.3, since they don't have gcc4 anyway.)
      sudo gcc_select 4.0

    12. Now, install AquaTerm. This is a graphics terminal for Gnuplot to use that is very nice for OS X, and way better than using the X11 terminal, I promise. (It's anti-aliased, for example.)
      curl -O http://easynews.dl.sourceforge.net/sourceforge/aquaterm/AquaTerm1.0.0.dmg
      hdiutil attach AquaTerm1.0.0.dmg
      sudo installer -pkg /Volumes/AquaTerm/AquaTerm.pkg -target /
      hdiutil detach /Volumes/AquaTerm

    13. Now we grab a CVS version of Gnuplot (the CVS has better OS X support). The Sourceforge CVS servers are sometimes overloaded, so you might need to repeat these commands a few times until they succeed. (Thanks Brian for pointing this out.)
      cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/gnuplot login
      [press enter to leave a blank password]
      cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/gnuplot co -P gnuplot
      cd gnuplot
      ./prepare
      ./configure
      make
      sudo make install
      cd ..

    14. Last step! Install Gnuplot.py. Now, this library is designed for Numeric, which was NumPy's predecessor. So to make things work, we'll use a NumPy tool to fix the Gnuplot.py code, and do some search-and-replace to fix things that the numpy tool doesn't fix currently. (Ugh!) Note that in an earlier version of these instructions, the python command line was python -c 'import numpy.lib.convertcode; numpy.lib.convertcode.convertall()'. This has changed with recent versions of numpy.
      cd Python
      curl -O http://easynews.dl.sourceforge.net/sourceforge/gnuplot-py/gnuplot-py-1.7.tar.gz
      tar -xzf gnuplot-py-1.7.tar.gz
      cd gnuplot-py-1.7
      python -c 'import numpy.oldnumeric.alter_code1; numpy.oldnumeric.alter_code1.convertall()'
      sed -i -e "s/Float32/float32/g" *.py
      sed -i -e "s/Float64/float64/g" *.py
      sed -i -e "s/Float/float_/g" *.py
      sudo python setup.py install
      cd ../..
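
The order of the three sed commands in step 14 matters, and a small Python sketch shows why. It copies the three substitutions from the sed lines; the function name rename_numeric_types and the sample line are made up for illustration, and this is a model of the renaming, not a replacement for the conversion tool.

```python
# Ordered (old, new) pairs, copied from the three sed commands in step 14.
RENAMES = [
    ("Float32", "float32"),
    ("Float64", "float64"),
    ("Float", "float_"),
]


def rename_numeric_types(source, renames=RENAMES):
    """Apply each substitution in order, like running the sed commands in sequence."""
    for old, new in renames:
        source = source.replace(old, new)
    return source


if __name__ == "__main__":
    # Hypothetical line of Numeric-era code, not taken from Gnuplot.py.
    sample = "a = asarray(data, Float32); b = zeros(n, Float64); c = array(v, Float)"
    print(rename_numeric_types(sample))
    # Generic rule first: Float32 is rewritten to float_32, which is not a NumPy name.
    print(rename_numeric_types(sample, list(reversed(RENAMES))))
```

Running the specific names first lowercases them, so the generic Float rule can no longer touch them. With the order reversed, the generic rule mangles Float32 and Float64 before their own rules get a chance.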

    OK! Now how to use all of this stuff? Well... that's for a later post. Here's a hint, and a test to make sure everything works.
    Run ipython, and then try the following:
    import numpy, scipy, Gnuplot
    num_steps = 20
    # Named x_values / sin_values so the built-in range() is not shadowed.
    x_values = numpy.linspace(0, 2 * numpy.pi, num_steps)
    sin_values = numpy.sin(x_values)
    print x_values
    print sin_values

    g = Gnuplot.Gnuplot()
    # Gnuplot expects list of [x, y] pairs, not [x-list, y-list]
    points = numpy.transpose([x_values, sin_values])
    g.plot(points)
    # Feed text straight to Gnuplot to control plotting style.
    g('set data style linespoints')
    g.replot()

    # Fit a spline to the original data and interpolate the curve with finer spacing
    import scipy.interpolate
    spline = scipy.interpolate.InterpolatedUnivariateSpline(x=x_values, y=sin_values)
    more_steps = 200
    new_x_values = numpy.linspace(0, 2 * numpy.pi, more_steps)
    interpolated = spline(new_x_values)
    g.plot(numpy.transpose([new_x_values, interpolated]))

    # Compare the interpolated curve against the true sine values.
    true_values = numpy.sin(new_x_values)
    error = interpolated - true_values
    g.plot(numpy.transpose([new_x_values, error]))

    print "RMS Error = ", numpy.sqrt((error**2).mean())
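
If the test session dies on the first import, a quick check like the following can narrow down which piece is missing. This is only a sketch: the helper name missing_modules is made up, and the module list is my assumption (the names imported in the session above, plus readline from step 2). It avoids newer syntax, so it should run on both Python 2.4 and a current Python 3.

```python
import sys


def missing_modules(names):
    """Return the subset of `names` that cannot be imported by this interpreter."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Shows which interpreter actually ran; compare it with the one you installed.
    print("interpreter: %s" % sys.executable)
    print("version: %s" % sys.version.split()[0])
    # Assumed list: the modules used in the test session, plus readline.
    wanted = ["readline", "numpy", "scipy", "scipy.interpolate", "Gnuplot"]
    missing = missing_modules(wanted)
    if missing:
        print("not importable: %s" % ", ".join(missing))
    else:
        print("all modules importable")
```

If the interpreter line shows a path under /usr/bin rather than the Python you just built, the PATH setup from the beginning is the first thing to recheck.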

    5 Comments:

    • nice... worked great for me.

      a couple of minor points:

      cvs -z3... took a few tries. no error message, it just didn't do anything. server was down or something.

      ipython needs to be added to $PATH at the end

      By Blogger brian, at 8:33 PM  

    • cvs commands: Yeah, the sourceforge cvs server can be rather busy sometimes. Just keep trying the same command until it works.

      ipython $PATH: Really? the IPython install should have put it into /usr/local/bin, which we added to the path as the first step. Hmm.

      By Blogger zachrahan, at 10:03 PM  

    • oops -- ipython *is* in /usr/local/bin... must've not sourced .tcshrc for that term or something.

      By Blogger brian, at 10:19 PM  

    • Hi,

      Just wanted to let you know, since you mention being involved w/ gnuplot .. and also since it's not working on my MacBookPro:

      I've checked gnuplot out from CVS and every time I run the `./prepare` command, it fails with this:

      Some part of the preparation process failed.
      Please refer to INSTALL for details.

      Very descriptive, idn't?

      Before that, it dumps this (I have a feeling this will come through wonky, but what the heck):

      make: `Makefile.am' is up to date.
      make: `Makefile.am' is up to date.
      make: `Makefile.am' is up to date.
      make: `Makefile.am' is up to date.
      make: `Makefile.am' is up to date.
      src/Makefile.am:42: gnuplot_SOURCES was already defined in condition TRUE, which implies condition INCLUDE_BINARY_C_TRUE

      gnuplot_SOURCES (User, where = src/Makefile.am:42) +=
      {
      TRUE => alloc.c alloc.h ansichek.h axis.c axis.h \
      breaders.c breaders.h bitmap.c bitmap.h color.c color.h command.c \
      command.h contour.c contour.h datafile.c datafile.h dynarray.c dynarray.h \
      eval.c eval.h fit.c fit.h gadgets.c gadgets.h getcolor.c getcolor.h gp_hist.h \
      gp_time.h gp_types.h gplt_x11.h graph3d.c graph3d.h graphics.c graphics.h \
      help.c help.h hidden3d.c hidden3d.h history.c internal.c internal.h \
      interpol.c interpol.h matrix.c matrix.h misc.c misc.h mouse.c mouse.h \
      mousecmn.h national.h parse.c parse.h plot.c plot.h plot2d.c plot2d.h \
      plot3d.c plot3d.h pm3d.c pm3d.h readline.c readline.h save.c \
      save.h scanner.c scanner.h set.c setshow.h show.c specfun.c specfun.h \
      standard.c standard.h stdfn.c stdfn.h syscfg.h tables.c tables.h \
      template.h term_api.h term.c term.h time.c unset.c util.c util.h \
      util3d.c util3d.h variable.c variable.h version.c version.h

      }
      src/Makefile.am:42: gnuplot_SOURCES was already defined in condition TRUE, which implies condition BUILD_WXWIDGETS_TRUE

      gnuplot_SOURCES (User, where = src/Makefile.am:42) +=
      {
      INCLUDE_BINARY_C_TRUE => binary.c
      TRUE => alloc.c alloc.h ansichek.h axis.c axis.h \
      breaders.c breaders.h bitmap.c bitmap.h color.c color.h command.c \
      command.h contour.c contour.h datafile.c datafile.h dynarray.c dynarray.h \
      eval.c eval.h fit.c fit.h gadgets.c gadgets.h getcolor.c getcolor.h gp_hist.h \
      gp_time.h gp_types.h gplt_x11.h graph3d.c graph3d.h graphics.c graphics.h \
      help.c help.h hidden3d.c hidden3d.h history.c internal.c internal.h \
      interpol.c interpol.h matrix.c matrix.h misc.c misc.h mouse.c mouse.h \
      mousecmn.h national.h parse.c parse.h plot.c plot.h plot2d.c plot2d.h \
      plot3d.c plot3d.h pm3d.c pm3d.h readline.c readline.h save.c \
      save.h scanner.c scanner.h set.c setshow.h show.c specfun.c specfun.h \
      standard.c standard.h stdfn.c stdfn.h syscfg.h tables.c tables.h \
      template.h term_api.h term.c term.h time.c unset.c util.c util.h \
      util3d.c util3d.h variable.c variable.h version.c version.h

      }
      src/Makefile.am:42: warning: automake does not support conditional definition of gnuplot_SOURCES in gnuplot_SOURCES
      src/Makefile.am:42: warning: automake does not support conditional definition of gnuplot_SOURCES in gnuplot_SOURCES
      src/Makefile.am:42: warning: automake does not support conditional definition of gnuplot_SOURCES in gnuplot_SOURCES
      src/Makefile.am:42: warning: automake does not support conditional definition of gnuplot_SOURCES in gnuplot_SOURCES
      Use of uninitialized value in concatenation (.) or string at /usr/bin/automake line 8449.
      : am_gnuplot_OBJECTS was already defined in condition INCLUDE_BINARY_C_TRUE, which is implied by condition TRUE
      am_gnuplot_OBJECTS (Automake, where = undefined) =
      {
      BUILD_WXWIDGETS_TRUE => alloc$U.$(OBJEXT) axis$U.$(OBJEXT) breaders$U.$(OBJEXT) bitmap$U.$(OBJEXT) color$U.$(OBJEXT) command$U.$(OBJEXT) contour$U.$(OBJEXT) datafile$U.$(OBJEXT) dynarray$U.$(OBJEXT) eval$U.$(OBJEXT) fit$U.$(OBJEXT) gadgets$U.$(OBJEXT) getcolor$U.$(OBJEXT) graph3d$U.$(OBJEXT) graphics$U.$(OBJEXT) help$U.$(OBJEXT) hidden3d$U.$(OBJEXT) history$U.$(OBJEXT) internal$U.$(OBJEXT) interpol$U.$(OBJEXT) matrix$U.$(OBJEXT) misc$U.$(OBJEXT) mouse$U.$(OBJEXT) parse$U.$(OBJEXT) plot$U.$(OBJEXT) plot2d$U.$(OBJEXT) plot3d$U.$(OBJEXT) pm3d$U.$(OBJEXT) readline$U.$(OBJEXT) save$U.$(OBJEXT) scanner$U.$(OBJEXT) set$U.$(OBJEXT) show$U.$(OBJEXT) specfun$U.$(OBJEXT) standard$U.$(OBJEXT) stdfn$U.$(OBJEXT) tables$U.$(OBJEXT) term$U.$(OBJEXT) time$U.$(OBJEXT) unset$U.$(OBJEXT) util$U.$(OBJEXT) util3d$U.$(OBJEXT) variable$U.$(OBJEXT) version$U.$(OBJEXT) gp_cairo$U.$(OBJEXT) wxt_gui.$(OBJEXT)
      INCLUDE_BINARY_C_TRUE => binary$U.$(OBJEXT) alloc$U.$(OBJEXT) axis$U.$(OBJEXT) breaders$U.$(OBJEXT) bitmap$U.$(OBJEXT) color$U.$(OBJEXT) command$U.$(OBJEXT) contour$U.$(OBJEXT) datafile$U.$(OBJEXT) dynarray$U.$(OBJEXT) eval$U.$(OBJEXT) fit$U.$(OBJEXT) gadgets$U.$(OBJEXT) getcolor$U.$(OBJEXT) graph3d$U.$(OBJEXT) graphics$U.$(OBJEXT) help$U.$(OBJEXT) hidden3d$U.$(OBJEXT) history$U.$(OBJEXT) internal$U.$(OBJEXT) interpol$U.$(OBJEXT) matrix$U.$(OBJEXT) misc$U.$(OBJEXT) mouse$U.$(OBJEXT) parse$U.$(OBJEXT) plot$U.$(OBJEXT) plot2d$U.$(OBJEXT) plot3d$U.$(OBJEXT) pm3d$U.$(OBJEXT) readline$U.$(OBJEXT) save$U.$(OBJEXT) scanner$U.$(OBJEXT) set$U.$(OBJEXT) show$U.$(OBJEXT) specfun$U.$(OBJEXT) standard$U.$(OBJEXT) stdfn$U.$(OBJEXT) tables$U.$(OBJEXT) term$U.$(OBJEXT) time$U.$(OBJEXT) unset$U.$(OBJEXT) util$U.$(OBJEXT) util3d$U.$(OBJEXT) variable$U.$(OBJEXT) version$U.$(OBJEXT)
      }

      By Blogger Steve L., at 10:48 AM  

    • There's also activestate's DMG

      http://activestate.com/Products/ActivePython/

      By Blogger E, at 7:12 AM  
