Skip to end of metadata
Go to start of metadata

Optimization Tips and Tricks

This page contains tips and tricks for optimizing various operations in CiviCRM when you are working with large datasets. Please feel free to make additions / clarifications to this information. Be sure and specify the version of CiviCRM your notes apply to. When possible describe your tuning goals and results achieved with your modifications.

Optimizing Contact Insert and Update (CiviCRM v1.4)

Contributed by Morbus Iff (morbus at disobey dot com).

Working with a 2 million + record dataset..."came up with a magical combination of things to make it speedy enough so that both create/updates are taking one second each:"

  • Disabling the dupe check (and probably a bunch of other useful stuff) by commenting the following lines in api/Contact/crm_create_contact:
  • Removing all foreign keys from all tables.
  • Added indexing on entity_id, date_data, char_data in custom_value.
  • Didn't pass any custom values through crm_create_contact. I have a lot of custom values in my database, which causes crm_custom_value to be my biggest table - 12 million records. Each imported contact gets anywhere from 5 to 15 custom values, which the API (understandably) does as individual queries. However, 90% of my custom values are int_data, so I was able to use MySQL's extended-insert (which it now uses on a mysqldump to speed import) to insert them all in one fell-swoop:
  • Added the domain_id/sort_name index: CREATE INDEX domain_sort_idx ON civicrm_contact (domain_id, sort_name);
  • Tweaked the count functions to use count(*) "...when you do a count() on a table, you're always using
    "count(civicrm_contact.id)". According to #mysql, this can be quickened by just using count(*), since MySQL will optimize this away without having to pick one particular column to look for. I've done this on my install (the 2 million contact dataset in question) and it appears to have made it slightly faster. Not enough to crow about though."

Other optimization tips and tricks by Lobo

Here are a couple of things I did to improve performance significantly. All of them are code level hacks, but I suspect we'll need to make a few code changes to deal with large db sites who are more interested in performance. We will incorporate some of these as config parameters in v1.5

  • If email is not relevant, commenting out the email part of the search gives the biggest improvement. Note that the default civicrm search includes sort_name and email (CRM/Contact/BAO\/Query.php)
  • If ordering of the results is not important comment out the order by clause. This reduces the number of rows examined significantly.
  • Not sure if adding an index to domain_id and sort_name is relevant, since mysql does not seem to use that index with wild characters (i.e. sort_name like '%lobo%'). Searching on exact name is definitely way faster and will use the domain_sort index, also searching on 'lobo%' is faster and will also use the index
  • Comment out alphabetical pager, which reduces one of the 3 big queries involved
  • Morbus figured that removing domain_id from the search could potentially result in a significant improvement. We will optimize this in v1.5 http://issues.civicrm.org/jira/browse/CRM-1004
Labels
  • None

Creative Commons License
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-Share Alike 3.0 United States Licence.