Monday, April 27, 2015

Python split() Function

Splitsville

One of the most common things I do in GIS scripting is parsing data from one field into two fields, or converting a delimited text file exported from some other data storage system into a usable CSV or delimited text file that I can import into a GIS format.  Enter the split function.  I use the split function almost as much as Harry Potter uses the Expelliarmus charm.

It's fairly simple and crazy useful.  You can use it in a standalone script or in the field calculator.  The syntax looks like this:

'string'.split('delimiter', Max) or stringVariable.split('delimiter', Max)

The function returns a list of the pieces that result from the split.  Both parameters are optional (the Python documentation calls them sep and maxsplit).  The 'delimiter' parameter is the character or string that marks where to break the text.  If you leave it out, the function splits on any run of whitespace (spaces, tabs, newlines) and ignores leading and trailing whitespace.  The Max parameter is the maximum number of splits you want the function to perform; if you leave it out, there is no limit.
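A quick way to see how the optional parameters behave is to try them on a sample string. The sketch below (Python 3 syntax; the sample address text is made up) compares the default whitespace split with an explicit single-space delimiter, and shows that passing None as the delimiter lets you set Max while keeping the whitespace behavior.

```python
sample = '  Main St\tSuite 100\n'

# No delimiter: splits on runs of any whitespace and drops leading/trailing whitespace
default_parts = sample.split()
print(default_parts)  # ['Main', 'St', 'Suite', '100']

# Explicit ' ' delimiter: every single space is a split point, so the leading
# spaces produce empty strings, and tabs/newlines are NOT treated as delimiters
space_parts = sample.split(' ')
print(space_parts)  # ['', '', 'Main', 'St\tSuite', '100\n']

# None as the delimiter keeps the whitespace behavior while still setting Max
first_word_and_rest = sample.split(None, 1)
print(first_word_and_rest)  # ['Main', 'St\tSuite 100\n']
```

The empty strings in the second result are a common surprise when a file uses a variable number of spaces between columns; leaving the delimiter out avoids them.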

So if you want to split a ZIP code in the format '12345-6789' into its component parts, you could use:

myZip = '12345-6789'
zipComponents = myZip.split('-')  # returns the list ['12345', '6789']
zip5 = zipComponents[0]
zip4 = zipComponents[1]
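One catch with the ZIP example: not every ZIP code in a real table carries the +4 part. '12345'.split('-') returns ['12345'], so zipComponents[1] would raise an IndexError. Here is a minimal sketch that guards against that, assuming a missing +4 should become an empty string (split_zip is a hypothetical helper name, not part of any library):

```python
def split_zip(zip_code):
    """Return (zip5, zip4); zip4 is '' when the input has no '-' part."""
    # Split at most once, so any stray extra hyphens stay in the second piece
    parts = zip_code.strip().split('-', 1)
    zip5 = parts[0]
    zip4 = parts[1] if len(parts) > 1 else ''
    return zip5, zip4

print(split_zip('12345-6789'))  # ('12345', '6789')
print(split_zip('12345'))       # ('12345', '')
```

Returning a tuple also means the caller can unpack both values in one line, e.g. zip5, zip4 = split_zip(myZip).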

You could accomplish the above with less code using string slicing, but that's for another post.
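For a quick taste of the slicing approach, here is a minimal sketch. It assumes every value is exactly five digits, a hyphen, and four digits; split() does not need that assumption, which is one reason to prefer it for messy data.

```python
myZip = '12345-6789'

# Slicing relies on fixed positions: indexes 0-4 hold the ZIP5,
# index 5 is the hyphen, and everything after it is the +4
zip5 = myZip[:5]
zip4 = myZip[6:]
print(zip5, zip4)  # 12345 6789
```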

The Max parameter can be a bit misleading: it limits the number of splits, not the number of list elements.  If I specify a Max of 2, the list doesn't contain just two elements; it has two split elements plus a third element holding the rest of the string.  So if I run the following:

testString = '1-2-3-4-5'
testList = testString.split('-', 2)

The result would be:

['1', '2', '3-4-5']
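Put another way, a Max of n gives back at most n + 1 elements. If you want the leftover chunk at the front instead of the end, Python's rsplit() method (a separate string method, not an option of split()) takes the same arguments but counts splits from the right. A short comparison:

```python
testString = '1-2-3-4-5'

fromLeft = testString.split('-', 2)    # ['1', '2', '3-4-5']
fromRight = testString.rsplit('-', 2)  # ['1-2-3', '4', '5']

# Max limits the number of splits, so each list has at most Max + 1 elements
print(len(fromLeft), len(fromRight))  # 3 3
```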

I use the split function to parse pipe-delimited ('|') text files, dropping unneeded elements and putting the elements I want into separate fields.  Very useful.
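Here is a minimal sketch of that kind of workflow in Python 3. The field layout (ID|NAME|INTERNAL_CODE|ZIP) and the sample rows are hypothetical, and io.StringIO stands in for real files so the example runs on its own; in practice you would open the input and output files instead.

```python
import csv
import io

# Hypothetical pipe-delimited export: ID|NAME|INTERNAL_CODE|ZIP
raw = io.StringIO(
    "1|City Hall|X9|12345-6789\n"
    "2|Fire Station 3|X4|12346\n"
)

out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(['ID', 'NAME', 'ZIP5', 'ZIP4'])

for line in raw:
    # Strip the newline, then break the record on the pipe character
    fields = line.rstrip('\n').split('|')
    record_id, name, _unused, zip_code = fields  # third field is dropped
    # Split the ZIP once; a missing +4 becomes an empty field
    zip_parts = zip_code.split('-', 1)
    zip4 = zip_parts[1] if len(zip_parts) > 1 else ''
    writer.writerow([record_id, name, zip_parts[0], zip4])

print(out.getvalue())
```

If the source system ever quotes values that contain a pipe, a plain split('|') will break those records apart; csv.reader(file, delimiter='|') handles quoted fields and is the safer choice in that case.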

Good luck using the split function.  I use it all the time with very satisfactory results.  EXPELLIARMUS!

And any Python gurus who would like to comment and make the above code more python-y, please do; I'm always trying to become less of a hack.


2 comments:

  1. Hi Jeff, I often use zip5, zip4 = myZip.split('-') because it's shorter than [] notation. BTW, it seems that you're using Expelliarmus to disarm your strings ;)

    1. Ah, thanks for the tip Kuba. Just what I asked for - python pros love one liners. :)
