From Howard Johnson to the crm-dev list
Per our discussion on the con call this aft, attached is a very small sample from a local census file here in Mass. It's referred to as a 'Resident Extract' because it's a filtered extract of the municipal database kept by the Election Commission, which incorporates the Voter Registration file as well as the results of the annual resident census. It's filtered because certain portions of the file, such as data on children under the age of 18, cannot legally be provided to unauthorized parties, whereas the basic voter info and other data are considered public information.
These things vary by state and by city and town, but in Mass. a form is sent by the city or town every year to each address of record (ok, I don't remember the exact genesis of the address file, but I think it's based on the info in the master real estate/tax database, which records every building and real estate parcel in the city). You are required by law to fill it out and send it in, under penalty of being drawn and quartered in the town square. No, I don't have a copy of the form handy even though I just filled it out a month or two ago. But it captures lots of interesting information, including the names of the individuals living at the address, dogs, own or rent, etc.
The readme file is the record layout as supplied by the city, which BTW doesn't match the actual records exactly (typical); you'll see a phone number field in the data, for example, that isn't in the record layout as supplied. Also, the dataset should have included the fields for Congressional District, Senate District, State Senate District, and State Rep District but they're missing - I'll find another sample that includes them.
Of most interest here, however, are the ResID and HOHID fields. ResID is the unique ResidentID for the individual - if they have registered to vote, this is also their VoterID. If there is only one person at the address, their HOHID is the same as their ResID. Additional folks at the same address who are listed on the same census form for that address get the same HOHID as the first person entered. How is the decision made as to which ResID to use for the HOHID? That's something I've been meaning to ask the Secretary of the Election Commission, but it's probably done by the data entry clerk on an "intuitive" basis.
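To make the ResID/HOHID relationship concrete, here is a minimal Python sketch that groups residents into households by HOHID. Only the ResID and HOHID field names come from the extract; the Name field, the dictionary layout, and the sample records are made up for illustration, and the real file may be laid out quite differently.

```python
from collections import defaultdict

# Made-up records for illustration; only the ResID and HOHID field
# names are taken from the extract described above.
residents = [
    {"ResID": "1001", "HOHID": "1001", "Name": "Pat Example"},
    {"ResID": "1002", "HOHID": "1001", "Name": "Sam Example"},
    {"ResID": "2001", "HOHID": "2001", "Name": "Lee Sample"},
]


def group_households(records):
    """Map each HOHID to the list of ResIDs that share it."""
    households = defaultdict(list)
    for rec in records:
        households[rec["HOHID"]].append(rec["ResID"])
    return dict(households)


def orphaned_households(households):
    """Return HOHIDs that are not among their own household's ResIDs.

    If HOHID is always the ResID of someone on the form, this list
    should be empty; a non-empty result points at records worth a look.
    """
    return [hoh for hoh, members in households.items() if hoh not in members]


households = group_households(residents)
```

With the sample above, residents 1001 and 1002 land in one household and 2001 is a household of one. The orphan check is just a sanity test on an imported file, not something the city is known to run.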
Just a further comment. Two of the most frequent problems we encounter in political and non-profit advocacy and fundraising are:
- related groups or candidates trying to share data, where one campaign entered "J. Smith 123 Brookvale Rd" and another campaign entered "John Smith 123 Brookvale Dr", and we have to figure out whether they're really the same person and which entry is correct
- data entry folks doing the same kind of thing, creating different entries for the same person for the receipt of a check and the reservation for an event.
Having the actual person (J Smith) enter the data themselves is not necessarily going to help you, by the way. What does help is having the local voter and/or census file loaded, since it provides a standard and authoritative source for names of individuals, their current address, etc. that we can use to sort out the dupes, dead people and transients. It also has a standard set of street names and abbreviations that are at least recognized by the city or town or state, and it gives us things like Ward and Precinct, which most people don't even know themselves. That makes it possible to check whether there actually are two different valid addresses, 123 Brookvale Rd and 123 Brookvale Dr. So our standard practice is to load the voter/census file(s) for the relevant geography, and have folks first try to find the person in the existing data, with the usual search on last name, address, or whatever. If the person is already in the db, we've got a standardized record for them and also BTW a unique ID (ResID or some such) that maps to additional information, including things like voter history.
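As a rough illustration of that lookup, the Python sketch below checks an entered name and address against a loaded reference file and reports whether the street suffix agrees. It is a sketch under stated assumptions, not our actual process: the Name and Address field names, the tiny suffix table, the matching rules, and the reference records are all hypothetical, and only ResID comes from the extract.

```python
import re

# Hypothetical suffix table; a real one would come from the street
# names and abbreviations in the city's own file.
SUFFIXES = {"road": "rd", "rd": "rd", "drive": "dr", "dr": "dr",
            "street": "st", "st": "st"}


def _tokens(text):
    """Lowercase, strip punctuation, split on whitespace."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def parse_address(address):
    """Split '123 Brookvale Rd' into (house number, street name, suffix).

    Assumes the address is non-empty and starts with the house number.
    """
    tokens = _tokens(address)
    number, rest = tokens[0], tokens[1:]
    suffix = SUFFIXES.get(rest[-1], "") if rest else ""
    name = " ".join(rest[:-1] if suffix else rest)
    return number, name, suffix


def name_matches(entered, reference):
    """Last names must agree; first names agree or one is an initial."""
    e, r = _tokens(entered), _tokens(reference)
    if not e or not r or e[-1] != r[-1]:
        return False
    ef, rf = e[0], r[0]
    return (ef == rf
            or (len(ef) == 1 and rf.startswith(ef))
            or (len(rf) == 1 and ef.startswith(rf)))


def find_candidates(name, address, reference):
    """Reference records with a compatible name at the same house number
    and street name, flagged by whether the street suffix also agrees."""
    number, street, suffix = parse_address(address)
    hits = []
    for rec in reference:
        r_number, r_street, r_suffix = parse_address(rec["Address"])
        if (r_number, r_street) == (number, street) and name_matches(name, rec["Name"]):
            hits.append({**rec, "suffix_agrees": r_suffix == suffix})
    return hits


# Made-up reference records standing in for a loaded voter/census file.
reference = [
    {"ResID": "1001", "Name": "John Smith", "Address": "123 Brookvale Rd"},
    {"ResID": "3001", "Name": "Mary Jones", "Address": "123 Brookvale Dr"},
]
a = find_candidates("J. Smith", "123 Brookvale Rd", reference)
b = find_candidates("John Smith", "123 Brookvale Dr", reference)
```

In this made-up case both campaign entries resolve to ResID 1001; the second comes back with suffix_agrees set to False, which is the cue for a person to look at it, since 123 Brookvale Dr is also a valid address in the reference data. Matching this loosely will produce false positives on a real file, so the results are candidates for review rather than automatic merges.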
I mention all this because it implies a need for very flexible and robust import/export capabilities, among other things. I'd be happy to discuss further as desired, or furnish additional sample data.
