A CSV Reader for UTF-8 Files

I was using the Places to CSV plugin to dump my website visit logs, and needed to ingest this data. The problem was that the fields were in UTF-8, and when I decoded each line from str to unicode before parsing, the csv library wouldn't do its magic. The fix was to let csv.reader read the data as str (no decoding, just bytes) and then decode each field as UTF-8 to get a unicode string. Each row is still returned as a list of fields, just like csv.reader does.

This function is a generator that wraps the csv.reader() function. I'm posting it because it's a practical example of a generator and a list comprehension.

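import csv

# Reads a CSV file with UTF-8 cells (Python 2, where csv.reader yields byte strings)
def utf8_csv_reader(csvdata, **kwargs):
    # Parse the raw bytes first; the Python 2 csv module does not support unicode input
    csv_reader = csv.reader(csvdata, **kwargs)
    for row in csv_reader:
        # Decode each cell to unicode only after the row has been split into fields
        yield [cell.decode('utf-8') for cell in row]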
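
The function above relies on Python 2 behaviour, where csv.reader works on byte strings. On Python 3 the csv module expects text instead, so the order flips: decode the stream first, then parse. Here is a rough sketch of that variant, assuming the input is a binary file-like object. The name utf8_csv_reader_py3 and the sample data are made up for illustration and are not part of the original function.

```python
import csv
import io

def utf8_csv_reader_py3(binary_stream, **kwargs):
    """Yield rows of text cells from a binary stream holding UTF-8 CSV data."""
    # Python 3's csv.reader wants text, so decode the stream before parsing.
    # newline='' leaves line-ending handling to the csv module.
    text_stream = io.TextIOWrapper(binary_stream, encoding='utf-8', newline='')
    for row in csv.reader(text_stream, **kwargs):
        yield row

# Hypothetical sample data standing in for an exported file
sample = io.BytesIO(u"url,title\nhttp://example.com/,Caf\u00e9\n".encode('utf-8'))
rows = list(utf8_csv_reader_py3(sample))
```

It is still a generator, so rows are produced one at a time as the caller iterates, and keyword arguments such as delimiter are passed straight through to csv.reader.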
