Wednesday, March 23, 2011

Using "with" statement for CSV files in Python

Is it possible to use the with statement directly with CSV files? It seems natural to be able to do something like this:

import csv
with csv.reader(open("myfile.csv")) as reader:
    # do things with reader

But csv.reader doesn't provide the __enter__ and __exit__ methods, so this doesn't work. I can however do it in two steps:

import csv
with open("myfile.csv") as f:
    reader = csv.reader(f)
    # do things with reader

Is this second way the ideal way to do it? Why wouldn't they make csv.reader directly compatible with the with statement?
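
As a quick check of the claim above, here is a minimal sketch, assuming Python 3. It uses an in-memory io.StringIO with made-up sample data in place of myfile.csv so that it runs anywhere. It confirms that the reader object lacks __enter__ and __exit__ while the file-like object has them, and then uses the two-step form:

```python
import csv
import io

# In-memory stand-in for open("myfile.csv"); the sample data is made up.
sample = io.StringIO("name,score\nalice,1\nbob,2\n")
reader = csv.reader(sample)

# csv.reader objects do not implement the context manager protocol...
reader_is_cm = hasattr(reader, "__enter__") and hasattr(reader, "__exit__")
# ...but file-like objects do.
file_is_cm = hasattr(sample, "__enter__") and hasattr(sample, "__exit__")
print(reader_is_cm, file_is_cm)

# The two-step form: the file is the context manager, the reader only iterates.
with io.StringIO("name,score\nalice,1\nbob,2\n") as f:
    rows = list(csv.reader(f))
print(rows)
```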

From stackoverflow
  • Yes. The second way is correct.

    As to why? Who ever knows. You're right, it's probably an easy change. It's not as high priority as other things.

    You can easily make your own patch kit and submit it.

    Kiv : Okay, perhaps I'll do that. Thanks.
    cdleary : I'm not sure this is a "patchable offense". The CSV reader is meant to act on an open file object and provide an iterable of rows -- there's no real resource acquisition and release going on. If you want to get out of the with block quickly, do rows = list(csv.reader(file_)) and use rows outside it.
    S.Lott : @cdleary: I think that the response to with does not have to reflect ACTUAL resource use, but only "resource"-like. All the "Data Compression" and "File Format" library modules should do this for simple consistency.
  • The primary use of the with statement is exception-safe cleanup of an object used in the statement. with makes sure that files are closed, locks are released, contexts are restored, etc.

    Does csv.reader have anything to clean up in case of an exception?

    I'd go with:

    import csv

    with open("myfile.csv", newline="") as f:
        for row in csv.reader(f):
            print(row)  # process row
    

    You don't need to submit a patch to use csv.reader and the with statement together.

    import contextlib
    

    Help on function contextmanager in module contextlib:

    contextmanager(func)
        @contextmanager decorator.
    

    Typical usage:

        @contextmanager
        def some_generator(<arguments>):
            <setup>
            try:
                yield <value>
            finally:
                <cleanup>
    

    This makes this:

        with some_generator(<arguments>) as <variable>:
            <body>
    

    equivalent to this:

        <setup>
        try:
            <variable> = <value>
            <body>
        finally:
            <cleanup>
    

    Here's a concrete example of how I've used it: curses_screen.

    S.Lott : @J.F. Sebastian: I think that all the "Data Compression" and "File Format" library modules should directly support with.
    J.F. Sebastian : @S.Lott: I agree that the standard library should provide context managers itself, where applicable. In the case of the `csv` module it could be `reader = csv.open(path)` but *not* `reader = csv.reader(iterable)`.
    technomalogical : @J.F. Sebastian +1 for in-depth explanation on how one might use contextlib to accomplish this, but check the response from @bluce for an actual implementation to use with csv.
    J.F. Sebastian : @technomalogical: Such an implementation is interesting only if it is in the stdlib. @bluce's implementation can't be in the stdlib. Its interface is too simplistic (some things to consider: encoding, buffering, restrictions on file-mode values, etc.). It saves one line of code but complicates the interface.
  • It's easy to create what you want using a generator function:

    
    import csv
    from contextlib import contextmanager
    
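    @contextmanager
    def opencsv(path):
        # Open the file in its own with block so it is closed when the caller's block exits.
        with open(path, newline="") as f:
            yield csv.reader(f)
    
    with opencsv("myfile.csv") as reader:
        for row in reader:
            print(row)  # do stuff with each row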
    
  • The problem is csv.reader doesn't really manage a context. It can accept any iterable, not just a file. Therefore it doesn't call close on its input (incidentally if it did you could use contextlib.closing). So it's not obvious what context support for csv.reader would actually do.
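
To make the last answer's point concrete, here is a small sketch, assuming Python 3; the sample rows and variable names are made up for illustration. It shows csv.reader parsing a plain list of strings, and shows that exhausting a reader leaves the underlying file-like object open, so closing it stays the caller's responsibility:

```python
import csv
import io

# csv.reader accepts any iterable of strings, not just a file.
lines = ["a,b,c", "1,2,3"]
rows_from_list = list(csv.reader(lines))

# When the input is a file-like object, the reader never closes it.
buf = io.StringIO("x,y\n10,20\n")
rows_from_file = list(csv.reader(buf))
still_open = not buf.closed

print(rows_from_list)
print(rows_from_file)
print(still_open)

buf.close()  # closing remains the caller's job
```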
