Optimization Tips and Tricks
This page contains tips and tricks for optimizing various operations in CiviCRM when you are working with large datasets. Please feel free to make additions / clarifications to this information. Be sure and specify the version of CiviCRM your notes apply to. When possible describe your tuning goals and results achieved with your modifications. |
Optimizing Contact Insert and Update (CiviCRM v1.4)
Contributed by Morbus Iff (morbus at disobey dot com).
Working with a 2 million + record dataset..."came up with a magical combination of things to make it speedy enough so that both create/updates are taking one second each:"
- Disabling the dupe check (and probably a bunch of other useful stuff) by commenting the following lines in api/Contact/crm_create_contact:
- Removing all foreign keys from all tables.
- Added indexing on entity_id, date_data, char_data in custom_value.
- Didn't pass any custom values through crm_create_contact. I have a lot of custom values in my database, which causes crm_custom_value to be my biggest table - 12 million records. Each imported contact gets anywhere from 5 to 15 custom values, which the API (understandably) does as individual queries. However, 90% of my custom values are int_data, so I was able to use MySQL's extended-insert (which it now uses on a mysqldump to speed import) to insert them all in one fell-swoop:
- Added the domain_id/sort_name index: CREATE INDEX domain_sort_idx ON civicrm_contact (domain_id, sort_name);
- Tweaked the count functions to use count(*) "...when you do a count() on a table, you're always using
"count(civicrm_contact.id)". According to #mysql, this can be quickened by just using count(*), since MySQL will optimize this away without having to pick one particular column to look for. I've done this on my install (the 2 million contact dataset in question) and it appears to have made it slightly faster. Not enough to crow about though."
Other optimization tips and tricks by Lobo
Here are a couple of things I did to improve performance significantly. All of them are code level hacks, but I suspect we'll need to make a few code changes to deal with large db sites who are more interested in performance. We will incorporate some of these as config parameters in v1.5
- If email is not relevant, commenting out the email part of the search gives the biggest improvement. Note that the default civicrm search includes sort_name and email (CRM/Contact/BAO\/Query.php)
- If ordering of the results is not important comment out the order by clause. This reduces the number of rows examined significantly.
- Not sure if adding an index to domain_id and sort_name is relevant, since mysql does not seem to use that index with wild characters (i.e. sort_name like '%lobo%'). Searching on exact name is definitely way faster and will use the domain_sort index, also searching on 'lobo%' is faster and will also use the index
- Comment out alphabetical pager, which reduces one of the 3 big queries involved
- Morbus figured that removing domain_id from the search could potentially result in a significant improvement. We will optimize this in v1.5 http://issues.civicrm.org/jira/browse/CRM-1004
