TLDR: Convert your problem file with Sublime Text by opening the file and using “Save with Encoding” as
UTF-8. Alternatively, use
iconv -t UTF-8//TRANSLIT -c Zip_Zhvi_SingleFamilyResidence.csv > new_file.csv
I wanted to parse the housing data from Zillow's research page. Zip code is a handy level of detail for comparing single-family home values.
However, when I downloaded this data set as “Zip_Zhvi_SingleFamilyResidence.csv”, I could not simply load it into pandas with read_csv.
This last line seemed like the clue:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 4: invalid continuation byte
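To see what that message means, here is a minimal sketch that reproduces the same kind of error on a few made-up bytes (not the real Zillow data). It assumes the offending byte came from a Latin-1 style encoding, where 0xf1 is “ñ”; in UTF-8, 0xf1 announces a four-byte sequence, so the decoder complains when an ordinary letter follows it.

```python
# Made-up sample bytes, not the real Zillow file: "Española" encoded as Latin-1.
raw = b"Espa\xf1ola"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # err.start is the offset of the byte the decoder choked on.
    bad_offset = err.start
    bad_byte = raw[err.start]
    print(err)

# Latin-1 maps every single byte to a character, so the same bytes decode cleanly.
text = raw.decode("latin-1")
print(text)
```

Whether the Zillow file is really Latin-1 is an assumption here; the sketch only shows why a lone 0xf1 byte is not valid UTF-8.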
On a Mac, we can check the file's reported encoding with
file -I <file_name>
Oh, great! It's “us-ascii”, so we just pass that as the encoding.
Oh, maybe I need to specify the encoding I want. WHY, PANDAS, WHY!?
Some encoding error has occurred, maybe because you accidentally opened the file in Excel before opening
ipython, or because Zillow saves in a crazy format.
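One way to double-check the “us-ascii” verdict is to scan the raw bytes yourself. This is a rough sketch: `find_non_ascii` is a hypothetical helper name, and the demo file contents are made up rather than taken from the Zillow download.

```python
import tempfile
from pathlib import Path


def find_non_ascii(path, limit=5):
    """Return up to `limit` (offset, byte value) pairs for bytes outside the ASCII range."""
    hits = []
    for offset, byte in enumerate(Path(path).read_bytes()):
        if byte > 0x7F:  # ASCII only covers 0x00-0x7F
            hits.append((offset, byte))
            if len(hits) >= limit:
                break
    return hits


# Demo on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes(b"City,Zip\nCa\xf1on City,81212\n")
    hits = find_non_ascii(sample)
    print(hits)
```

If this returns anything at all for your file, the file is not pure ASCII, whatever the quick check said.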
Let’s use the *nix program iconv to convert the file. According to the man page (man iconv),
“The iconv program converts text from one encoding to another encoding.” Great!
Let’s use this.
iconv -f us-ascii -t utf-8 < Zip_Zhvi_SingleFamilyResidence.csv > new_zip_code_file.csv
iconv, that’s your only job… you know, Unix philosophy: one program, one job done well, etc.
Turns out that if you append “//TRANSLIT” to the target encoding, characters are transliterated when needed and
possible (man page).
Solution 1 –
> iconv -t UTF-8//TRANSLIT -c Zip_Zhvi_SingleFamilyResidence.csv > new_file.csv
> mv new_file.csv Zip_Zhvi_SingleFamilyResidence.csv
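If iconv isn't handy, roughly the same cleanup can be sketched in Python. This is not a drop-in equivalent of the command above: it assumes the input should be read as UTF-8 and, like the -c flag, simply drops any bytes that don't fit (it does no transliteration). `drop_invalid_utf8` is a hypothetical helper name and the sample data is made up.

```python
import tempfile
from pathlib import Path


def drop_invalid_utf8(src, dst):
    """Rewrite src to dst as UTF-8, silently dropping bytes that are not valid UTF-8.

    Returns the number of bytes that were dropped.
    """
    raw = Path(src).read_bytes()
    cleaned = raw.decode("utf-8", errors="ignore").encode("utf-8")
    Path(dst).write_bytes(cleaned)
    return len(raw) - len(cleaned)


# Demo on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "old_file.csv"
    dst = Path(tmp) / "new_file.csv"
    src.write_bytes(b"City,Zip\nEspa\xf1ola,87532\n")
    dropped = drop_invalid_utf8(src, dst)
    converted = dst.read_bytes()
    print(dropped, converted)
```

Note the trade-off: the stray byte is thrown away, so a name like “Española” comes out as “Espaola”. If you know the real source encoding, decoding with that encoding instead keeps the character.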
Solution 2 –
Is there a better free editor than Sublime? Be a good citizen and buy your license.
Step 1: Open your file in Sublime Text
Step 2: Save with Encoding > UTF-8
Now read_csv to your heart's desire:
ipython> import pandas as pd
ipython> data = pd.read_csv("new_file.csv")