Geography 353 Cartography and Visualization
...to Geog 353 Main Page and Course Description
...to Geog 353 Syllabus
...to Geog 353 Course Schedule and Lecture Outlines
...to Geog 353 Laboratory Information and Student Projects
Geog 353 Lab 3: Data Processing: Part 1: Excel
Update: 9/16/19
- 50 points
- ASSIGNED: Monday September 23
- DUE: Monday September 30
The data (historical population figures, by county, for your state or states)
downloaded from the WWW must be processed in order to use in ArcGIS. There are
two basic data processing tasks (and two labs) involved in this project.
In this first data processing exercise, you must transform your data (
text files and Excel files) into a single file in a format you can use in
ArcGIS. You will use a spreadsheet (Excel and or Open Office) to perform this exercise,
generating a DBF file. Exercise 5 (after you learn a bit about
ArcGIS in Exercise 4) will entail importing the file generated in this
exercise into ArcGIS, where you will generate
the data on population change that you will eventually map.
Lab 3 Goal: Properly formatted DBF file, entry in your Digital Lab Blog
about processing your data. Please be prepared to proudly show the instructor
your properly formatted DBF file (with county populations from 1900 to 2017)
when this exercise is due.
The Details:
1) Make sure you have all your census data: the 1900-1990 data (text file) and the
2000 and 2010 and 2017 data (Excel files) for your
state or states. Please don't use other forms of the data (PDF, etc.) for
this exercise. Ask Krygier if you don't think you have the right data
format. Keep these files in your Data folder. My Data folder contains these files for my state:
- wi190090.txt (1900-1990 Census numbers)
- CO-EST00INR-01-55.xls (2000 Census numbers)
- PEP_2015_PEPANNRES.xls (2010 Census numbers & 2017 estimates)
2) Open Microsoft Excel on your computer. Select Open Other Workbooks
(lower left) then Computer then Browse to your 1900-90 population data file
(.txt). Select All Files to see your txt file. If you have more than one state,
work with one state data file at a time, and combine them when all are cleaned up and in
the correct format file.
3) Open your file. A wizard window should pop up. This is a tool for turning a
simple text file into an Excel file.
- Original data type is Fixed Width (select if you need too, but you shouldn't),
then hit next.
- Set field widths (column breaks): click and add breaks (vertical lines) between data columns
that don't have these breaks: eg., between FIPS and 1990, 1990 and 1980, etc.
You can adjust the break lines by clicking and dragging the vertical lines. Do
make sure you don't lop off any numbers! That would be bad. Hit
next.
- Next window: column data format should be general...then hit
finish.
- Save as... an Excel file (.xlsx) and modify the file name to reflect
the inclusion of the 2000 and 2010 and 2017 data (mine is wi1900-2017.xlsx. Save
in your Data folder. So now my Data folder has these folders in it (yours won't be exactly the same):
- wi190090.txt
- CO-EST00INR-01-55.xls (2000 Census numbers)
- PEP_2017_PEPANNRES.xls (2010 Census numbers & 2017 estimates)
- wi1900-2017.xlsx (1900-2010 Census numbers + 2017 estimates)
4) Look at your data in Excel. Holy haunch of Helmetshrike!
Ponder this terrifying state of affairs for a moment. What are the implications of
free data from the WWW? Converting data is often a messy and horrifying
process. You have a relatively small file or files to deal with for your one
(or several) states. What if you had to process data like this for all the
counties in the US? Holy crap!
You now have to clean up your file, before you convert it to a format ArcGIS likes.
A sample of how your data should look is included below. You will have to
cut and paste and rearrange things to get it to look like it should. Beware of a
few things:
- Be very careful when cutting/pasting/deleting: if you screw up the data
at this point and don't notice, you probably won't notice until later, when you
will have to come back and start the project over from this point. Look at the
data and think about what you are doing. Look at the sample below and make sure
you carefully clean up the file. Refer back to the original text version of
your data to make sure you adjusted your data correctly.
- The FIPS code column: this is vital. It is a numerical identifier
for each county in the US and ArcGIS needs this information to link your
thematic data to the base map. The FIPS code should be 4 or 5 digits, and the
first FIPS code should end with 00 (the state FIPS code); the first county
should end with 01, the second county should end with 03 etc. In some cases,
due to the wonky nature of the original data file, the word FIPS may be over the
first column of population (1990), and 1990 may be over the 1980 population
figures and so on. Refer back to the original copy of your data to make sure you
adjust this problem correctly.
- Exciting Task: Please head back to the US Census WWW site and locate information about
FIPS codes and why they are so important, using the superb searching skills you
honed in Lab 1. Include what you find, along with a hyper link to the
information at the Census WWW site, in your lab blog.
- In most cases, the FIPS column, four columns of population data (1990, 1980
1970 1960) and the county names are on the top of your file, with the rest of
the population data below. Clean up this part of the data first, then (using
Insert Columns) and Copy and Paste to move the data so it
is like the sample file below.
- The county names column might be screwed up, split up so that the name of
the county (Waukesha) and the word "County" are in different columns. All you
need is the county name, so delete the column with the word "County" in it. You
can find "County" and replace it with nothing (using the find and replace
function in Excel). If the county name and the word "County" are together,
delete the word "county" from each cell. Also, if a county has more than one
word in its name (Fond du Lac) it may be split up. It is important to
get the complete county name in the county name column, so retype it (carefully,
paying attention to capitalization).
- Save often.
- Use the Delete command (under the Edit menu) to get rid of
stuff you don't need, such as the US and state population and junk at the top of
the file.
- Use copy and paste to move data around (don't retype - typos are bad)
- Please Delete the population figures for the US and the State
totals.
- Finally: carefully examine your data and compare it to the original
file of data you found it at the Census. If the numbers get screwed up at this
point and you don't notice, you will certainly notice later (and will have to
come back to this point start over).
5) A Few More important Modifications of your data for ArcGIS: Things must needs be
set up just so:
- Type a single apostrophe before every FIPS code number and hit
return: this lets Excel know to save the FIPS codes as text (rather than
numbers), and that is what ArcGIS likes when working with dBase files. After
you type in the apostrophe, it will not show up on the page, but your FIPS
number should left justify. If you are an fancy Excel connoisseur, you might think you can
just format the column as text instead of typing all these apostrophes: go ahead
and try, sucker. There is a way to do this tedious task that will work, and if you
figure it out, document it in your lab log.
- If your FIPS codes are only 4 numbers long, put a 0 (zero) before the
four digit number.
- The top row on the page, the header row, needs to have at least one
alphabetical character in the cell. The FIPS and County Name columns are
OK, but put a Y before each of the years (eg., Y1990). This, again, is what
ArcGIS likes.
- Counties that have no values for a particular year (eg., they were not a
county by that year) often show up with three dashes in the data. Badness!
Change these dashes to 9999999999 (ten nines). If for some reason you
find a cell with a zero (0) change that to 9999999999 also. This will help us
later on when we are calculating population change and classifying the data.
- Make sure you have any and all text formatting removed (no Bold, or
Italics, or Underlined text).
6) Add the Census 2000 and 2010 numbers as well as the 2017 estimates to your file. Add them as
new columns called Y2000, Y2010 and Y2017 - all adjacent to the Y1990
data in the Excel spreadsheet (as above). Lets assume you are using an Excel
file (.xls) downloaded from the Census Bureau: Open that file in Excel. Copy
the 2000 column but only the population of your first to your last county. Paste
it in your file with the other historical data. Figure out how to remove the
column outlines (if there are any) and commas. Ask your instructor
for help if you need it.
This is what your file should look like... (but with 2017 estimated data instead of 2013).
7) Really Important! More than one state? Create a cleaned up file for each, then open
the alphabetically 1st state and save it under a different name (as an Excel
file). Then open, copy, and paste your other states into the new file in
alphabetical order. Keep one set of headers (Y2017, Y2010, etc.) at the top of
the file. Keep the individual state text files. Again, ask for help if need be.
8) And...finally...save the file as a DBF4 or Dbase4 file:
- under the File menu select Save As...
- save as type DBF4 (in your Data folder).
- ... but where is the DBF choice in the list...? it's gone from this latest
wonderful version of Excel! Not excellent.
- Source
You should always be ready for some kind of suck when working with
computers and software. You can quote me on that.
- A few versions back, Excel removed the ability to export DBF
files.
- But there are always other options.
9) Don't panic. Document in your lab log how you did not panic.
10) Open your cleaned up file (statename.txt) in Excel. Save As... an
Excel spreadsheet format file (.xls) in your Data folder.
11) An alternative to Microsoft Office (Word, Excel, etc.) is Open Office. Open Office is open
source and free and well worth a look as an alternative to Microsoft products.
Open the software (shortcut on desktop).
12) Under the File menu select Open and navigate to your
.xls file of state (or states) population data. Open the file, then from the
File menu select Save As... and select DBF as the file
format. Save the file in your Data folder. For those of you with more
than one state, create a DBF file for the combined state file.
Your file should now work in ArcGIS (and we will certainly find out in Lab
5). Be aware that you may have to come back to Open Office to fix problems
in the data that will only show up later. Those of you with one state should
have four versions of your data safely stowed in your Data folder:
- the original downloaded 1900-1990 data text file
- mine is: wi190090.txt
- the two original downloaded 2000, 2010 and 2017 data excel files
- mine are: CO-EST00INR-01-55.xls (2000 Census numbers) and PEP_2017_PEPANNRES.xls (2010 Census numbers & 2017 estimates)
- the converted and cleaned up file with 2000 and 2010 and 2017 data added:
- mine is: wi1900-2017.xlsx
- and the xls file opened in Open Office and saved as a DBF, ready to go into
ArcGIS
- mine is: wi1900-2017.dbf
Those of you with more than one state should
have a few extra files in you Data folder:
- one original downloaded text file for each state
- the converted and cleaned up text file for each state
- a combined text file with the data for all your states
- the combined text file saved as an Excel (xls) file
- and the xls file opened in Open Office and saved as a DBF, ready to go into
ArcGIS
Make sure you have copies of these files backed up.
Yeesh! Jot down a few inspired paragraphs about the joy of processing data
for such mapping projects in your digital lab blog. What if you had a
really big project, like mapping all the counties in the entire US?
When you are done, by end of class on due date: email me the link to your blog entry for Lab 3. This should include
- comments on FIPS codes (step 4)
- comments on processing data, end of this exercise
- email, as an attachment, the Excel file with your combined 1900-2010 Census population data
E-mail: jbkrygier@owu.edu
...to Geog 353 Main Page and Course Description
...to krygier teaching page.
...to krygier top page.
OWU Home
OWU Geology and Geography Home