Dashboard > CRM > ... > Data Model Changes for 2.x > Location Data
Location Data Log In | Sign Up   View a printable version of the current page.

Added by David Greenberg , last edited by David Greenberg on Oct 18, 2007  (view change)
Labels: 
(None)

Redesigning the Location Data Model

Rationale

The existing data model for storing address, email, phone and IM data aggregates these pieces of contact data under "locations" - i.e. Home, Work, Main, etc. Although this model has worked reasonably well, it has several drawbacks.

  • Forced grouping of email, phone etc. with a specific physical address does not reflect how many people look at this data. A person's mobile phone is not really a "Home" phone OR a "Work" phone.
  • The current model forces the primary phone, email etc. to be linked to a primary "location". This also often doesn't match reality.
  • Querying address, phone, email etc requires an extra lever of join and slightly less efficient indexing.

The proposed approach links addresses, phones etc directly to contacts - which represent 95%+ of the actual data. A join table will still be used for any other objects which require address, phone, email data - e.g. Events, Domains... Each address, phone, etc carries it's own location type, along with an optional descriptive name. The primary address, phone, email address for each contact are independent of location "type" - so one contact might have "Home Address" as primary physical address, and "Work Phone" as primary phone.

Location

Current Table Schema - civicrm_location

Field Type Null Key Default Extra
id int(10) unsigned NO PRI NULL auto_increment
entity_table varchar(64) NO MUL    
entity_id int(10) unsigned NO      
location_type_id int(10) unsigned YES MUL NULL  
is_primary tinyint(4) YES   0  
name varchar(255) YES   NULL  

New Schema - civicrm_location_block

Drop the civicrm_location table. Contacts are linked to their address, email and phone records using the contact_id foreign key in each table. A new table - civicrm_location_block - links objects other than contacts to address, email, etc. Current uses for this table are Event Location and Domain Location (add loc_block_id foreign key to these tables). Location blocks can be shared.

civicrm_loc_block

Field Type Null Key Default Extra
id int(10) unsigned NO PRI NULL auto_increment
location_type_id int(10) unsigned YES MUL NULL  
name varchar(255) YES   NULL  
address_1_id int(10) unsigned YES MUL NULL  
address_2_id int(10) unsigned YES MUL NULL  
email_1_id int(10) unsigned YES MUL NULL  
email_2_id int(10) unsigned YES MUL NULL  
phone_1_id int(10) unsigned YES MUL NULL  
phone_2_id int(10) unsigned YES MUL NULL  
im_1_id int(10) unsigned YES MUL NULL  
im_2_id int(10) unsigned YES MUL NULL  

civicrm_address

Current Table Schema

Field Type Null Key Default Extra
id int(10) unsigned NO PRI NULL auto_increment
location_id int(10) unsigned NO UNI    
street_address varchar(96) YES   NULL  
street_number int(11) YES   NULL  
street_number_suffix varchar(8) YES   NULL  
street_number_predirectional varchar(8) YES   NULL  
street_name varchar(64) YES MUL NULL  
street_type varchar(8) YES   NULL  
street_number_postdirectional varchar(8) YES   NULL  
street_unit varchar(16) YES   NULL  
supplemental_address_1 varchar(96) YES   NULL  
supplemental_address_2 varchar(96) YES   NULL  
supplemental_address_3 varchar(96) YES   NULL  
city varchar(64) YES MUL NULL  
county_id int(10) unsigned YES MUL NULL  
state_province_id int(10) unsigned YES MUL NULL  
postal_code_suffix varchar(12) YES   NULL  
postal_code varchar(12) YES   NULL  
usps_adc varchar(32) YES   NULL  
country_id int(10) unsigned YES MUL NULL  
geo_coord_id int(10) unsigned YES MUL NULL  
geo_code_1 double YES   NULL  
geo_code_2 double YES   NULL  
timezone varchar(8) YES   NULL  
note varchar(255) YES   NULL  

New Schema

Field Type Null Key Default Extra
id int(10) unsigned NO PRI NULL auto_increment
contact_id int(10) unsigned YES UNI    
location_type_id int(10) unsigned YES MUL NULL  
is_primary tinyint(4) YES   0  
is_billing tinyint(4) YES   0  
name varchar(255) YES   NULL  
street_address varchar(96) YES   NULL  
street_number int(11) YES   NULL  
street_number_suffix varchar(8) YES   NULL  
street_number_predirectional varchar(8) YES   NULL  
street_name varchar(64) YES MUL NULL  
street_type varchar(8) YES   NULL  
street_number_postdirectional varchar(8) YES   NULL  
street_unit varchar(16) YES   NULL  
supplemental_address_1 varchar(96) YES   NULL  
supplemental_address_2 varchar(96) YES   NULL  
supplemental_address_3 varchar(96) YES   NULL  
city varchar(64) YES MUL NULL  
county_id int(10) unsigned YES MUL NULL  
state_province_id int(10) unsigned YES MUL NULL  
postal_code_suffix varchar(12) YES   NULL  
postal_code varchar(12) YES   NULL  
usps_adc varchar(32) YES   NULL  
country_id int(10) unsigned YES MUL NULL  
geo_coord_id int(10) unsigned YES MUL NULL  
geo_code_1 double YES   NULL  
geo_code_2 double YES   NULL  
timezone varchar(8) YES   NULL  

civicrm_email

Current Table Schema

Field Type Null Key Default Extra
id int(10) unsigned NO PRI NULL auto_increment
location_id int(10) unsigned NO MUL    
email varchar(64) YES MUL NULL  
is_primary tinyint(4) YES   0  
on_hold tinyint(4) NO   0  
hold_date datetime YES   NULL  
reset_date datetime YES   NULL  

New Schema

Field Type Null Key Default Extra
id int(10) unsigned NO PRI NULL auto_increment
contact_id int(10) unsigned YES UNI    
location_type_id int(10) unsigned YES MUL NULL  
is_primary tinyint(4) YES   0  
is_billing tinyint(4) YES   0  
name varchar(255) YES   NULL  
email varchar(64) YES MUL NULL  
on_hold tinyint(4) NO   0  
hold_date datetime YES   NULL  
reset_date datetime YES   NULL  

civicrm_phone

Current Table Schema

Field Type Null Key Default Extra
id int(10) unsigned NO PRI NULL auto_increment
location_id int(10) unsigned NO MUL    
is_primary tinyint(4) YES   0  
mobile_provider_id int(10) unsigned YES MUL NULL  
phone varchar(32) YES   NULL  
phone_type enum('Phone','Mobile','Fax','Pager') YES   NULL  

New Schema

Field Type Null Key Default Extra
id int(10) unsigned NO PRI NULL auto_increment
contact_id int(10) unsigned YES UNI    
location_type_id int(10) unsigned YES MUL NULL  
is_primary tinyint(4) YES   0  
is_billing tinyint(4) YES   0  
name varchar(255) YES   NULL  
mobile_provider_id int(10) unsigned YES MUL NULL  
phone varchar(32) YES   NULL  
phone_type enum('Phone','Mobile','Fax','Pager') YES   NULL  

civicrm_im

Current Table Schema

Field Type Null Key Default Extra
id int(10) unsigned NO PRI NULL auto_increment
location_id int(10) unsigned NO MUL    
name varchar(64) YES   NULL  
provider_id int(10) unsigned YES MUL NULL  
is_primary tinyint(4) YES   0  

New Schema

Field Type Null Key Default Extra
id int(10) unsigned NO PRI NULL auto_increment
contact_id int(10) unsigned YES UNI    
location_type_id int(10) unsigned YES MUL NULL  
is_primary tinyint(4) YES   0  
is_billing tinyint(4) YES   0  
name varchar(255) YES   NULL  
name varchar(64) YES   NULL  
provider_id int(10) unsigned YES MUL NULL  
is_primary tinyint(4) YES   0  

Issues to Discuss

  • Is the location block structure with fixed 2 columns for address 1/2 etc a good model?
  • Do we need is_primary or the addreses, phones etc. in location_block.
  • What is purpose of civicrm_group_contact.location_id ?
  • Do we ever use civicrm_address.note (I've removed it from table).
  • Is having a separate is_billing flag a good idea, or continue to use location_type = Billing?

I assume the two-entry limit is for non-contact addresses, right? I'm polling 17 email accounts and I have seven accounts in my IM client...

I'm wondering whether the civicrm_contact table couldn't be extended with an primary_email, primary_phone, etc. columns instead of the is_primiary ones; we could get rid of the 'only one per contact can be primary' logic.

+1 Multiple locations just adds rarely needed complexity for mid level users.

"The proposed approach links addresses, phones etc directly to contacts - which represent 95%+ of the actual data."

That matches our experience.

Neil 

I've been thinking about this a bit

I think it might be easier to keep the is_primary and is_billing field in the tables (since we can potentially use it in other scenarios too) and then use a "Primary Contact View" and "Billing Contact View" which basically gives us a virtual: Contact + Address + Phone + EMail table for both cases.

I dont think we are saving in logic / space in either case. We'll still require a join and still need to keep track of which address / email / phone is primary for a contact

What are civicrm_address.name, civicrm_email.name, civicrm_phone.name ???  I might call my phone names sometimes, but it doesn't need to be tracked. 

Why are there two name fields in civicrm_im? 

I'm also not quite sure that I understanding the need for civicrm_location_block.  Can't Event Location and Domain Location just be linked to contacts?  (I've never used CiviEvent so forgive my ignorance). 

I have to say that the prospect of a denormalised data model is a big turn off. If MySQL 5 is required, couldn't some of the "extra" joins be put into SQL views? If updating or deleting rows is a problem and you need to deal with the SQL view's results, just use triggers to modify the data in underlying tables.

Having two apparently identical and redundant foreign keys in the loc_block table doubles the indexing required. An index in MySQL is made for each field when they're foreign keys. Putting location_type_id in every contact table adds many additional indexes on the existing model. I can't speak to the performance cost of this, but indexes can be expensive. Often they want to be cached in memory.

In the current code base, location block is used in exactly 2 places: civicrm_domain and civicrm_event. As such for "most" systems the number of entries in location block will probably be just a few (or a couple of hundred if LOTS of events). However the number of contacts is way more, and hence we think the tradeoff is justified

Ok. Thanks for the clarity.

I agree that breaking out the phone/email fields so they're not tied to a specific address has value. Perhaps expand the ability to classify the type of phone (currently just phone/fax/mobile/pager, as I recall) and email, and give admin ability to add/modify that dropdown list. That should provide the necessary classification flexibility. One problem we find with the current structure is that when someone makes a purchase, their email is tracked under the billing address, which means you're often tracking the same email in two places (since billing address is rarely the primary address). Since billing email and address are a special case, maybe there should be a way to multi-classify an address? For example, mark a single address as home and billing?

The current flexibility in having multiple addresses, flagging them by type, and having a text field to add custom names, is invaluable. It's flexibility and functionality beyond many other systems we've used, and I would not want to lose that. The method of flagging one address as primary is also very well constructed. The proposed schema changes seem to make sense. Seems like it will simplify querying to have is_primary part of the actual address table/record to which it applies, without any harm to data integrity/normalization.

The location model described on this page has been implemented as of 2.0 - so probably best to discuss suggestions for additional improvements on the forums. I think being able to assign more than one location "type" (Home + Billing) to an address, email, etc. could quite useful - so perhaps you could sketch this out in more detail on a forum post or blog post - and include thoughts about how we might change the data model to support this as well as how the user interface for contributors, event registrants etc. might change.

Powered by a free Atlassian Confluence Open Source Project License granted to CiviCRM . Evaluate Confluence today.
Powered by Atlassian Confluence 2.7.1, the Enterprise Wiki. Bug/feature request - Atlassian news - Contact administrators