Redesigning the Location Data Model
Rationale
The existing data model for storing address, email, phone and IM data aggregates these pieces of contact data under "locations" - i.e. Home, Work, Main, etc. Although this model has worked reasonably well, it has several drawbacks.
- Forced grouping of email, phone etc. with a specific physical address does not reflect how many people look at this data. A person's mobile phone is not really a "Home" phone OR a "Work" phone.
- The current model forces the primary phone, email etc. to be linked to a primary "location". This also often doesn't match reality.
- Querying address, phone, email etc requires an extra lever of join and slightly less efficient indexing.
The proposed approach links addresses, phones etc directly to contacts - which represent 95%+ of the actual data. A join table will still be used for any other objects which require address, phone, email data - e.g. Events, Domains... Each address, phone, etc carries it's own location type, along with an optional descriptive name. The primary address, phone, email address for each contact are independent of location "type" - so one contact might have "Home Address" as primary physical address, and "Work Phone" as primary phone.
Location
Current Table Schema - civicrm_location
| Field |
Type |
Null |
Key |
Default |
Extra |
| id |
int(10) unsigned |
NO |
PRI |
NULL |
auto_increment |
| entity_table |
varchar(64) |
NO |
MUL |
|
|
| entity_id |
int(10) unsigned |
NO |
|
|
|
| location_type_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| is_primary |
tinyint(4) |
YES |
|
0 |
|
| name |
varchar(255) |
YES |
|
NULL |
|
New Schema - civicrm_location_block
Drop the civicrm_location table. Contacts are linked to their address, email and phone records using the contact_id foreign key in each table. A new table - civicrm_location_block - links objects other than contacts to address, email, etc. Current uses for this table are Event Location and Domain Location (add loc_block_id foreign key to these tables). Location blocks can be shared.
civicrm_loc_block
| Field |
Type |
Null |
Key |
Default |
Extra |
| id |
int(10) unsigned |
NO |
PRI |
NULL |
auto_increment |
| location_type_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| name |
varchar(255) |
YES |
|
NULL |
|
| address_1_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| address_2_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| email_1_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| email_2_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| phone_1_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| phone_2_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| im_1_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| im_2_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
civicrm_address
Current Table Schema
| Field |
Type |
Null |
Key |
Default |
Extra |
| id |
int(10) unsigned |
NO |
PRI |
NULL |
auto_increment |
| location_id |
int(10) unsigned |
NO |
UNI |
|
|
| street_address |
varchar(96) |
YES |
|
NULL |
|
| street_number |
int(11) |
YES |
|
NULL |
|
| street_number_suffix |
varchar(8) |
YES |
|
NULL |
|
| street_number_predirectional |
varchar(8) |
YES |
|
NULL |
|
| street_name |
varchar(64) |
YES |
MUL |
NULL |
|
| street_type |
varchar(8) |
YES |
|
NULL |
|
| street_number_postdirectional |
varchar(8) |
YES |
|
NULL |
|
| street_unit |
varchar(16) |
YES |
|
NULL |
|
| supplemental_address_1 |
varchar(96) |
YES |
|
NULL |
|
| supplemental_address_2 |
varchar(96) |
YES |
|
NULL |
|
| supplemental_address_3 |
varchar(96) |
YES |
|
NULL |
|
| city |
varchar(64) |
YES |
MUL |
NULL |
|
| county_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| state_province_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| postal_code_suffix |
varchar(12) |
YES |
|
NULL |
|
| postal_code |
varchar(12) |
YES |
|
NULL |
|
| usps_adc |
varchar(32) |
YES |
|
NULL |
|
| country_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| geo_coord_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| geo_code_1 |
double |
YES |
|
NULL |
|
| geo_code_2 |
double |
YES |
|
NULL |
|
| timezone |
varchar(8) |
YES |
|
NULL |
|
| note |
varchar(255) |
YES |
|
NULL |
|
New Schema
| Field |
Type |
Null |
Key |
Default |
Extra |
| id |
int(10) unsigned |
NO |
PRI |
NULL |
auto_increment |
| contact_id |
int(10) unsigned |
YES |
UNI |
|
|
| location_type_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| is_primary |
tinyint(4) |
YES |
|
0 |
|
| is_billing |
tinyint(4) |
YES |
|
0 |
|
| name |
varchar(255) |
YES |
|
NULL |
|
| street_address |
varchar(96) |
YES |
|
NULL |
|
| street_number |
int(11) |
YES |
|
NULL |
|
| street_number_suffix |
varchar(8) |
YES |
|
NULL |
|
| street_number_predirectional |
varchar(8) |
YES |
|
NULL |
|
| street_name |
varchar(64) |
YES |
MUL |
NULL |
|
| street_type |
varchar(8) |
YES |
|
NULL |
|
| street_number_postdirectional |
varchar(8) |
YES |
|
NULL |
|
| street_unit |
varchar(16) |
YES |
|
NULL |
|
| supplemental_address_1 |
varchar(96) |
YES |
|
NULL |
|
| supplemental_address_2 |
varchar(96) |
YES |
|
NULL |
|
| supplemental_address_3 |
varchar(96) |
YES |
|
NULL |
|
| city |
varchar(64) |
YES |
MUL |
NULL |
|
| county_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| state_province_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| postal_code_suffix |
varchar(12) |
YES |
|
NULL |
|
| postal_code |
varchar(12) |
YES |
|
NULL |
|
| usps_adc |
varchar(32) |
YES |
|
NULL |
|
| country_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| geo_coord_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| geo_code_1 |
double |
YES |
|
NULL |
|
| geo_code_2 |
double |
YES |
|
NULL |
|
| timezone |
varchar(8) |
YES |
|
NULL |
|
civicrm_email
Current Table Schema
| Field |
Type |
Null |
Key |
Default |
Extra |
| id |
int(10) unsigned |
NO |
PRI |
NULL |
auto_increment |
| location_id |
int(10) unsigned |
NO |
MUL |
|
|
| email |
varchar(64) |
YES |
MUL |
NULL |
|
| is_primary |
tinyint(4) |
YES |
|
0 |
|
| on_hold |
tinyint(4) |
NO |
|
0 |
|
| hold_date |
datetime |
YES |
|
NULL |
|
| reset_date |
datetime |
YES |
|
NULL |
|
New Schema
| Field |
Type |
Null |
Key |
Default |
Extra |
| id |
int(10) unsigned |
NO |
PRI |
NULL |
auto_increment |
| contact_id |
int(10) unsigned |
YES |
UNI |
|
|
| location_type_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| is_primary |
tinyint(4) |
YES |
|
0 |
|
| is_billing |
tinyint(4) |
YES |
|
0 |
|
| name |
varchar(255) |
YES |
|
NULL |
|
| email |
varchar(64) |
YES |
MUL |
NULL |
|
| on_hold |
tinyint(4) |
NO |
|
0 |
|
| hold_date |
datetime |
YES |
|
NULL |
|
| reset_date |
datetime |
YES |
|
NULL |
|
civicrm_phone
Current Table Schema
| Field |
Type |
Null |
Key |
Default |
Extra |
| id |
int(10) unsigned |
NO |
PRI |
NULL |
auto_increment |
| location_id |
int(10) unsigned |
NO |
MUL |
|
|
| is_primary |
tinyint(4) |
YES |
|
0 |
|
| mobile_provider_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| phone |
varchar(32) |
YES |
|
NULL |
|
| phone_type |
enum('Phone','Mobile','Fax','Pager') |
YES |
|
NULL |
|
New Schema
| Field |
Type |
Null |
Key |
Default |
Extra |
| id |
int(10) unsigned |
NO |
PRI |
NULL |
auto_increment |
| contact_id |
int(10) unsigned |
YES |
UNI |
|
|
| location_type_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| is_primary |
tinyint(4) |
YES |
|
0 |
|
| is_billing |
tinyint(4) |
YES |
|
0 |
|
| name |
varchar(255) |
YES |
|
NULL |
|
| mobile_provider_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| phone |
varchar(32) |
YES |
|
NULL |
|
| phone_type |
enum('Phone','Mobile','Fax','Pager') |
YES |
|
NULL |
|
civicrm_im
Current Table Schema
| Field |
Type |
Null |
Key |
Default |
Extra |
| id |
int(10) unsigned |
NO |
PRI |
NULL |
auto_increment |
| location_id |
int(10) unsigned |
NO |
MUL |
|
|
| name |
varchar(64) |
YES |
|
NULL |
|
| provider_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| is_primary |
tinyint(4) |
YES |
|
0 |
|
New Schema
| Field |
Type |
Null |
Key |
Default |
Extra |
| id |
int(10) unsigned |
NO |
PRI |
NULL |
auto_increment |
| contact_id |
int(10) unsigned |
YES |
UNI |
|
|
| location_type_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| is_primary |
tinyint(4) |
YES |
|
0 |
|
| is_billing |
tinyint(4) |
YES |
|
0 |
|
| name |
varchar(255) |
YES |
|
NULL |
|
| name |
varchar(64) |
YES |
|
NULL |
|
| provider_id |
int(10) unsigned |
YES |
MUL |
NULL |
|
| is_primary |
tinyint(4) |
YES |
|
0 |
|
Issues to Discuss
- Is the location block structure with fixed 2 columns for address 1/2 etc a good model?
- Do we need is_primary or the addreses, phones etc. in location_block.
- What is purpose of civicrm_group_contact.location_id ?
- Do we ever use civicrm_address.note (I've removed it from table).
- Is having a separate is_billing flag a good idea, or continue to use location_type = Billing?
I assume the two-entry limit is for non-contact addresses, right? I'm polling 17 email accounts and I have seven accounts in my IM client...
I'm wondering whether the civicrm_contact table couldn't be extended with an primary_email, primary_phone, etc. columns instead of the is_primiary ones; we could get rid of the 'only one per contact can be primary' logic.
+1 Multiple locations just adds rarely needed complexity for mid level users.
"The proposed approach links addresses, phones etc directly to contacts - which represent 95%+ of the actual data."
That matches our experience.
Neil
I've been thinking about this a bit
I think it might be easier to keep the is_primary and is_billing field in the tables (since we can potentially use it in other scenarios too) and then use a "Primary Contact View" and "Billing Contact View" which basically gives us a virtual: Contact + Address + Phone + EMail table for both cases.
I dont think we are saving in logic / space in either case. We'll still require a join and still need to keep track of which address / email / phone is primary for a contact
What are civicrm_address.name, civicrm_email.name, civicrm_phone.name ??? I might call my phone names sometimes, but it doesn't need to be tracked.
Why are there two name fields in civicrm_im?
I'm also not quite sure that I understanding the need for civicrm_location_block. Can't Event Location and Domain Location just be linked to contacts? (I've never used CiviEvent so forgive my ignorance).
I have to say that the prospect of a denormalised data model is a big turn off. If MySQL 5 is required, couldn't some of the "extra" joins be put into SQL views? If updating or deleting rows is a problem and you need to deal with the SQL view's results, just use triggers to modify the data in underlying tables.
Having two apparently identical and redundant foreign keys in the loc_block table doubles the indexing required. An index in MySQL is made for each field when they're foreign keys. Putting location_type_id in every contact table adds many additional indexes on the existing model. I can't speak to the performance cost of this, but indexes can be expensive. Often they want to be cached in memory.
In the current code base, location block is used in exactly 2 places: civicrm_domain and civicrm_event. As such for "most" systems the number of entries in location block will probably be just a few (or a couple of hundred if LOTS of events). However the number of contacts is way more, and hence we think the tradeoff is justified
Ok. Thanks for the clarity.
I agree that breaking out the phone/email fields so they're not tied to a specific address has value. Perhaps expand the ability to classify the type of phone (currently just phone/fax/mobile/pager, as I recall) and email, and give admin ability to add/modify that dropdown list. That should provide the necessary classification flexibility. One problem we find with the current structure is that when someone makes a purchase, their email is tracked under the billing address, which means you're often tracking the same email in two places (since billing address is rarely the primary address). Since billing email and address are a special case, maybe there should be a way to multi-classify an address? For example, mark a single address as home and billing?
The current flexibility in having multiple addresses, flagging them by type, and having a text field to add custom names, is invaluable. It's flexibility and functionality beyond many other systems we've used, and I would not want to lose that. The method of flagging one address as primary is also very well constructed. The proposed schema changes seem to make sense. Seems like it will simplify querying to have is_primary part of the actual address table/record to which it applies, without any harm to data integrity/normalization.
The location model described on this page has been implemented as of 2.0 - so probably best to discuss suggestions for additional improvements on the forums. I think being able to assign more than one location "type" (Home + Billing) to an address, email, etc. could quite useful - so perhaps you could sketch this out in more detail on a forum post or blog post - and include thoughts about how we might change the data model to support this as well as how the user interface for contributors, event registrants etc. might change.