Skip to end of metadata
Go to start of metadata

Redesigning the Location Data Model

Rationale

The existing data model for storing address, email, phone and IM data aggregates these pieces of contact data under "locations" - i.e. Home, Work, Main, etc. Although this model has worked reasonably well, it has several drawbacks.

  • Forced grouping of email, phone etc. with a specific physical address does not reflect how many people look at this data. A person's mobile phone is not really a "Home" phone OR a "Work" phone.
  • The current model forces the primary phone, email etc. to be linked to a primary "location". This also often doesn't match reality.
  • Querying address, phone, email etc requires an extra lever of join and slightly less efficient indexing.

The proposed approach links addresses, phones etc directly to contacts - which represent 95%+ of the actual data. A join table will still be used for any other objects which require address, phone, email data - e.g. Events, Domains... Each address, phone, etc carries it's own location type, along with an optional descriptive name. The primary address, phone, email address for each contact are independent of location "type" - so one contact might have "Home Address" as primary physical address, and "Work Phone" as primary phone.

Location

Current Table Schema - civicrm_location

Field

Type

Null

Key

Default

Extra

id

int(10) unsigned

NO

PRI

NULL

auto_increment

entity_table

varchar(64)

NO

MUL

 

 

entity_id

int(10) unsigned

NO

 

 

 

location_type_id

int(10) unsigned

YES

MUL

NULL

 

is_primary

tinyint(4)

YES

 

0

 

name

varchar(255)

YES

 

NULL

 

New Schema - civicrm_location_block

Drop the civicrm_location table. Contacts are linked to their address, email and phone records using the contact_id foreign key in each table. A new table - civicrm_location_block - links objects other than contacts to address, email, etc. Current uses for this table are Event Location and Domain Location (add loc_block_id foreign key to these tables). Location blocks can be shared.

civicrm_loc_block

Field

Type

Null

Key

Default

Extra

id

int(10) unsigned

NO

PRI

NULL

auto_increment

location_type_id

int(10) unsigned

YES

MUL

NULL

 

name

varchar(255)

YES

 

NULL

 

address_1_id

int(10) unsigned

YES

MUL

NULL

 

address_2_id

int(10) unsigned

YES

MUL

NULL

 

email_1_id

int(10) unsigned

YES

MUL

NULL

 

email_2_id

int(10) unsigned

YES

MUL

NULL

 

phone_1_id

int(10) unsigned

YES

MUL

NULL

 

phone_2_id

int(10) unsigned

YES

MUL

NULL

 

im_1_id

int(10) unsigned

YES

MUL

NULL

 

im_2_id

int(10) unsigned

YES

MUL

NULL

 

civicrm_address

Current Table Schema

Field

Type

Null

Key

Default

Extra

id

int(10) unsigned

NO

PRI

NULL

auto_increment

location_id

int(10) unsigned

NO

UNI

 

 

street_address

varchar(96)

YES

 

NULL

 

street_number

int(11)

YES

 

NULL

 

street_number_suffix

varchar(8)

YES

 

NULL

 

street_number_predirectional

varchar(8)

YES

 

NULL

 

street_name

varchar(64)

YES

MUL

NULL

 

street_type

varchar(8)

YES

 

NULL

 

street_number_postdirectional

varchar(8)

YES

 

NULL

 

street_unit

varchar(16)

YES

 

NULL

 

supplemental_address_1

varchar(96)

YES

 

NULL

 

supplemental_address_2

varchar(96)

YES

 

NULL

 

supplemental_address_3

varchar(96)

YES

 

NULL

 

city

varchar(64)

YES

MUL

NULL

 

county_id

int(10) unsigned

YES

MUL

NULL

 

state_province_id

int(10) unsigned

YES

MUL

NULL

 

postal_code_suffix

varchar(12)

YES

 

NULL

 

postal_code

varchar(12)

YES

 

NULL

 

usps_adc

varchar(32)

YES

 

NULL

 

country_id

int(10) unsigned

YES

MUL

NULL

 

geo_coord_id

int(10) unsigned

YES

MUL

NULL

 

geo_code_1

double

YES

 

NULL

 

geo_code_2

double

YES

 

NULL

 

timezone

varchar(8)

YES

 

NULL

 

note

varchar(255)

YES

 

NULL

 

New Schema

Field

Type

Null

Key

Default

Extra

id

int(10) unsigned

NO

PRI

NULL

auto_increment

contact_id

int(10) unsigned

YES

UNI

 

 

location_type_id

int(10) unsigned

YES

MUL

NULL

 

is_primary

tinyint(4)

YES

 

0

 

is_billing

tinyint(4)

YES

 

0

 

name

varchar(255)

YES

 

NULL

 

street_address

varchar(96)

YES

 

NULL

 

street_number

int(11)

YES

 

NULL

 

street_number_suffix

varchar(8)

YES

 

NULL

 

street_number_predirectional

varchar(8)

YES

 

NULL

 

street_name

varchar(64)

YES

MUL

NULL

 

street_type

varchar(8)

YES

 

NULL

 

street_number_postdirectional

varchar(8)

YES

 

NULL

 

street_unit

varchar(16)

YES

 

NULL

 

supplemental_address_1

varchar(96)

YES

 

NULL

 

supplemental_address_2

varchar(96)

YES

 

NULL

 

supplemental_address_3

varchar(96)

YES

 

NULL

 

city

varchar(64)

YES

MUL

NULL

 

county_id

int(10) unsigned

YES

MUL

NULL

 

state_province_id

int(10) unsigned

YES

MUL

NULL

 

postal_code_suffix

varchar(12)

YES

 

NULL

 

postal_code

varchar(12)

YES

 

NULL

 

usps_adc

varchar(32)

YES

 

NULL

 

country_id

int(10) unsigned

YES

MUL

NULL

 

geo_coord_id

int(10) unsigned

YES

MUL

NULL

 

geo_code_1

double

YES

 

NULL

 

geo_code_2

double

YES

 

NULL

 

timezone

varchar(8)

YES

 

NULL

 

civicrm_email

Current Table Schema

Field

Type

Null

Key

Default

Extra

id

int(10) unsigned

NO

PRI

NULL

auto_increment

location_id

int(10) unsigned

NO

MUL

 

 

email

varchar(64)

YES

MUL

NULL

 

is_primary

tinyint(4)

YES

 

0

 

on_hold

tinyint(4)

NO

 

0

 

hold_date

datetime

YES

 

NULL

 

reset_date

datetime

YES

 

NULL

 

New Schema

Field

Type

Null

Key

Default

Extra

id

int(10) unsigned

NO

PRI

NULL

auto_increment

contact_id

int(10) unsigned

YES

UNI

 

 

location_type_id

int(10) unsigned

YES

MUL

NULL

 

is_primary

tinyint(4)

YES

 

0

 

is_billing

tinyint(4)

YES

 

0

 

name

varchar(255)

YES

 

NULL

 

email

varchar(64)

YES

MUL

NULL

 

on_hold

tinyint(4)

NO

 

0

 

hold_date

datetime

YES

 

NULL

 

reset_date

datetime

YES

 

NULL

 

civicrm_phone

Current Table Schema

Field

Type

Null

Key

Default

Extra

id

int(10) unsigned

NO

PRI

NULL

auto_increment

location_id

int(10) unsigned

NO

MUL

 

 

is_primary

tinyint(4)

YES

 

0

 

mobile_provider_id

int(10) unsigned

YES

MUL

NULL

 

phone

varchar(32)

YES

 

NULL

 

phone_type

enum('Phone','Mobile','Fax','Pager')

YES

 

NULL

 

New Schema

Field

Type

Null

Key

Default

Extra

id

int(10) unsigned

NO

PRI

NULL

auto_increment

contact_id

int(10) unsigned

YES

UNI

 

 

location_type_id

int(10) unsigned

YES

MUL

NULL

 

is_primary

tinyint(4)

YES

 

0

 

is_billing

tinyint(4)

YES

 

0

 

name

varchar(255)

YES

 

NULL

 

mobile_provider_id

int(10) unsigned

YES

MUL

NULL

 

phone

varchar(32)

YES

 

NULL

 

phone_type

enum('Phone','Mobile','Fax','Pager')

YES

 

NULL

 

civicrm_im

Current Table Schema

Field

Type

Null

Key

Default

Extra

id

int(10) unsigned

NO

PRI

NULL

auto_increment

location_id

int(10) unsigned

NO

MUL

 

 

name

varchar(64)

YES

 

NULL

 

provider_id

int(10) unsigned

YES

MUL

NULL

 

is_primary

tinyint(4)

YES

 

0

 

New Schema

Field

Type

Null

Key

Default

Extra

id

int(10) unsigned

NO

PRI

NULL

auto_increment

contact_id

int(10) unsigned

YES

UNI

 

 

location_type_id

int(10) unsigned

YES

MUL

NULL

 

is_primary

tinyint(4)

YES

 

0

 

is_billing

tinyint(4)

YES

 

0

 

name

varchar(255)

YES

 

NULL

 

name

varchar(64)

YES

 

NULL

 

provider_id

int(10) unsigned

YES

MUL

NULL

 

is_primary

tinyint(4)

YES

 

0

 

Issues to Discuss

  • Is the location block structure with fixed 2 columns for address 1/2 etc a good model?
  • Do we need is_primary or the addreses, phones etc. in location_block.
  • What is purpose of civicrm_group_contact.location_id ?
  • Do we ever use civicrm_address.note (I've removed it from table).
  • Is having a separate is_billing flag a good idea, or continue to use location_type = Billing?
Labels:
  1. Jul 16, 2007

    I assume the two-entry limit is for non-contact addresses, right? I'm polling 17 email accounts and I have seven accounts in my IM client...

    I'm wondering whether the civicrm_contact table couldn't be extended with an primary_email, primary_phone, etc. columns instead of the is_primiary ones; we could get rid of the 'only one per contact can be primary' logic.

  2. Jul 26, 2007

    +1 Multiple locations just adds rarely needed complexity for mid level users.

    "The proposed approach links addresses, phones etc directly to contacts - which represent 95%+ of the actual data."

    That matches our experience.

    Neil 

  3. Jul 26, 2007

    I've been thinking about this a bit (smile)

    I think it might be easier to keep the is_primary and is_billing field in the tables (since we can potentially use it in other scenarios too) and then use a "Primary Contact View" and "Billing Contact View" which basically gives us a virtual: Contact + Address + Phone + EMail table for both cases.

    I dont think we are saving in logic / space in either case. We'll still require a join and still need to keep track of which address / email / phone is primary for a contact

  4. Jul 27, 2007

    What are civicrm_address.name, civicrm_email.name, civicrm_phone.name ???  I might call my phone names sometimes, but it doesn't need to be tracked.  (wink)

    Why are there two name fields in civicrm_im? 

    I'm also not quite sure that I understanding the need for civicrm_location_block.  Can't Event Location and Domain Location just be linked to contacts?  (I've never used CiviEvent so forgive my ignorance). 

  5. Oct 30, 2007

    I have to say that the prospect of a denormalised data model is a big turn off. If MySQL 5 is required, couldn't some of the "extra" joins be put into SQL views? If updating or deleting rows is a problem and you need to deal with the SQL view's results, just use triggers to modify the data in underlying tables.

    Having two apparently identical and redundant foreign keys in the loc_block table doubles the indexing required. An index in MySQL is made for each field when they're foreign keys. Putting location_type_id in every contact table adds many additional indexes on the existing model. I can't speak to the performance cost of this, but indexes can be expensive. Often they want to be cached in memory.

    1. Oct 31, 2007

      In the current code base, location block is used in exactly 2 places: civicrm_domain and civicrm_event. As such for "most" systems the number of entries in location block will probably be just a few (or a couple of hundred if LOTS of events). However the number of contacts is way more, and hence we think the tradeoff is justified

      1. Oct 31, 2007

        Ok. Thanks for the clarity.

  6. Apr 08, 2008

    I agree that breaking out the phone/email fields so they're not tied to a specific address has value. Perhaps expand the ability to classify the type of phone (currently just phone/fax/mobile/pager, as I recall) and email, and give admin ability to add/modify that dropdown list. That should provide the necessary classification flexibility. One problem we find with the current structure is that when someone makes a purchase, their email is tracked under the billing address, which means you're often tracking the same email in two places (since billing address is rarely the primary address). Since billing email and address are a special case, maybe there should be a way to multi-classify an address? For example, mark a single address as home and billing?

    The current flexibility in having multiple addresses, flagging them by type, and having a text field to add custom names, is invaluable. It's flexibility and functionality beyond many other systems we've used, and I would not want to lose that. The method of flagging one address as primary is also very well constructed. The proposed schema changes seem to make sense. Seems like it will simplify querying to have is_primary part of the actual address table/record to which it applies, without any harm to data integrity/normalization.

    1. Apr 09, 2008

      The location model described on this page has been implemented as of 2.0 - so probably best to discuss suggestions for additional improvements on the forums. I think being able to assign more than one location "type" (Home + Billing) to an address, email, etc. could quite useful - so perhaps you could sketch this out in more detail on a forum post or blog post - and include thoughts about how we might change the data model to support this as well as how the user interface for contributors, event registrants etc. might change.


Creative Commons License
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-Share Alike 3.0 United States Licence.