This is an attempt to push the discussion at http://civicrm.org/node/674 to a further level of detail.
This is intended to specify Phase I functionality and its implementation approach. A Phase II section is included as a catch-all for ideas that might be looked at later.
Phase I
Objective
To create a logging system that will provide a fairly complete record of all data changes: who made them, when they were made, and what the changes were. This logging system is intended to be the basis for the future development of user facing functionality.
Exclusions
When data structures change, eg dropping a custom field or dropping a group of custom fields, data in the eliminated structures will not be preserved.
Implementation
- The following tables will be dropped:
- civicrm_log
- A new page will be added Administer > Configure > Global Settings > Logging, which will allow users to enable and disable logging.
- When logging is disabled, the triggers for logging will be dropped.
- When the user requests to enable logging, a check will be made to ensure that the MySQL version and database user permissions support creating triggers. If these are not appropriate, an error message will be provided to users with instructions to upgrade, as well as the need to ensure permission grants are properly changed after a 5.0 -> 5.1 upgrade. When adequate db privileges exist, the gencode produced code to modify the database structure will be called by this page.
- When logging is enabled, if log tables exist they will be truncated before their base table has triggers added to it. The user will be required to approve a warning message before this action is carried out.
- After the database connection has been initialized, if logging is enabled, civicrm_initialize will set a MySQL variable to save the current UserID for later use in triggers set up for all loggable events, eg:
- $dao->query(sprintf("set @current_user = %d", $session->get('userID')));
- The gencode program that reads the database schema and creates the DAO layer and mysql to create the CiviCRM database will be modified to support logging in the following ways:
- The existing <log> tag for tables in the XML schema will be repurposed to support the new more extensive logging proposed here.
- Logging will be enabled for the following CiviCRM tables (generally, ones that contain contact relationship information and with records that can be edited and/or deleted through the UI), which will be indicated by adding <log>true</log> to their XML schema definitions:
civicrm_activity
civicrm_activity_assignment
civicrm_activity_target
civicrm_address
civicrm_case
civicrm_case_activity
civicrm_case_contact
civicrm_contact
civicrm_contribution
civicrm_contribution_page
civicrm_contribution_product
civicrm_contribution_recur
civicrm_contribution_soft
civicrm_contribution_type
civicrm_contribution_widget
civicrm_currency
civicrm_custom_field
civicrm_custom_group
civicrm_discount
civicrm_email
civicrm_entity_file
civicrm_entity_tag
civicrm_event
civicrm_file
civicrm_financial_trxn
civicrm_grant
civicrm_group
civicrm_group_contact
civicrm_group_nesting
civicrm_group_organization
civicrm_im
civicrm_line_item
civicrm_loc_block
civicrm_location_type
civicrm_membership
civicrm_membership_block
civicrm_membership_payment
civicrm_membership_status
civicrm_membership_type
civicrm_note
civicrm_option_group
civicrm_option_value
civicrm_participant
civicrm_participant_payment
civicrm_participant_status_type
civicrm_pcp
civicrm_pcp_block
civicrm_phone
civicrm_pledge
civicrm_pledge_block
civicrm_pledge_payment
civicrm_preferences
civicrm_preferences_date
civicrm_premiums
civicrm_premiums_product
civicrm_price_field
civicrm_price_set
civicrm_price_set_entity
civicrm_product
civicrm_project
civicrm_relationship
civicrm_relationship_type
civicrm_subscription_history
civicrm_tag
civicrm_task
civicrm_task_status
civicrm_uf_field
civicrm_uf_group
civicrm_uf_join
civicrm_uf_match
civicrm_value_*
- Tables with logging enabled (called base tables) will have an additional log table created for them, eg civicrm_contact will have a civicrm_contact_log table created to store the log of inserts, updates and deletes to its records. Each log table will contain all the fields in its base table, and the following additional fields:
`log_id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Unique Contact Log ID',
`log_ts` TIMESTAMP DEFAULT CURRENT_TIMESTAMP COMMENT 'Timestamp when Contact Log record created',
`log_contact` INT(10) unsigned NOT NULL COMMENT 'ID of Contact who acted to create the Contact Log record',
`log_action` ENUM ('Initialization', 'Insert', 'Update', 'Delete') NOT NULL COMMENT 'Action that created Contact Log record' - If data exists in a base table when logging for it is enabled, corresponding records will be inserted into the log table with log_action set to 'Initialization'.
- For performance reasons, indexes will kept to a minimum on log tables. Indexes will be selectively added as functionality to use the log tables is developed. For example, the following indexes exist on civicrm_contact: PRIMARY KEY (`id`),
UNIQUE KEY `UI_external_identifier` (`external_identifier`),
KEY `index_contact_type` (`contact_type`),
KEY `index_contact_sub_type` (`contact_sub_type`),
KEY `index_sort_name` (`sort_name`),
KEY `index_preferred_communication_method` (`preferred_communication_method`),
KEY `index_hash` (`hash`),
KEY `index_api_key` (`api_key`),
KEY `index_first_name` (`first_name`),
KEY `index_last_name` (`last_name`),
KEY `UI_prefix` (`prefix_id`),
KEY `UI_suffix` (`suffix_id`),
KEY `UI_gender` (`gender_id`),
KEY `index_household_name` (`household_name`),
KEY `index_organization_name` (`organization_name`),
KEY `FK_civicrm_contact_mail_to_household_id` (`mail_to_household_id`),
KEY `FK_civicrm_contact_primary_contact_id` (`primary_contact_id`),
KEY `FK_civicrm_contact_employer_id` (`employer_id`). The corresponding log table will have the following indexes (other log tables will only have the first four indexes): PRIMARY KEY (`id`, `log_id`),
KEY `index_id` (`id`),
KEY `index_log_id` (`log_id`),
KEY `index_log_ts` (`log_ts`),
KEY `index_contact_type` (`contact_type`),
KEY `index_contact_sub_type` (`contact_sub_type`),
KEY `index_sort_name` (`sort_name`),
KEY `index_household_name` (`household_name`),
KEY `index_organization_name` (`organization_name`). - Constraints on base tables will not be retained for log tables to improve performance.
- Here are samples of the three triggers that will be created for each log table:
- CREATE DEFINER='civicrm_db_user'@'localhost'
TRIGGER civicrm_contact_after_insert
AFTER INSERT ON civicrm_contact FOR EACH ROW
INSERT INTO civicrm_contact_log
(log_contact, log_action, `id` , `contact_type` , `contact_sub_type`,`do_not_email` , `do_not_phone`, `do_not_mail` , `do_not_sms` , `do_not_trade` , `is_opt_out`, `legal_identifier`, `external_identifier`, `sort_name`, `display_name` , `nick_name`, `legal_name`, `home_URL` , `image_URL` , `preferred_communication_method` , `preferred_mail_format` , `hash`, `api_key`, `source` , `first_name` , `middle_name` , `last_name` , `prefix_id` , `suffix_id` , `email_greeting_id` , `email_greeting_custom` , `email_greeting_display` , `postal_greeting_id`, `postal_greeting_custom` , `postal_greeting_display`, `addressee_id` , `addressee_custom`, `addressee_display` , `job_title` , `gender_id` , `birth_date` , `is_deceased` , `deceased_date` , `mail_to_household_id`, `household_name`, `primary_contact_id`, `organization_name` , `sic_code` , `user_unique_id` , `employer_id` )
VALUES (@current_user, 'INSERT', NEW.`id`, NEW.`contact_type`, NEW.`contact_sub_type`, NEW.`do_not_email`, NEW.`do_not_phone`, NEW.`do_not_mail`, NEW.`do_not_sms`, NEW.`do_not_trade`, NEW.`is_opt_out`, NEW.`legal_identifier`, NEW.`external_identifier`, NEW.`sort_name`, NEW.`display_name`, NEW.`nick_name`, NEW.`legal_name`, NEW.`home_URL`, NEW.`image_URL`, NEW.`preferred_communication_method`, NEW.`preferred_mail_format`, NEW.`hash`, NEW.`api_key`, NEW.`source`, NEW.`first_name`, NEW.`middle_name`, NEW.`last_name`, NEW.`prefix_id`, NEW.`suffix_id`, NEW.`email_greeting_id`, NEW.`email_greeting_custom`, NEW.`email_greeting_display`, NEW.`postal_greeting_id`, NEW.`postal_greeting_custom`, NEW.`postal_greeting_display`, NEW.`addressee_id`, NEW.`addressee_custom`, NEW.`addressee_display`, NEW.`job_title`, NEW.`gender_id`, NEW.`birth_date`, NEW.`is_deceased`, NEW.`deceased_date`, NEW.`mail_to_household_id`, NEW.`household_name`, NEW.`primary_contact_id`, NEW.`organization_name`, NEW.`sic_code`, NEW.`user_unique_id`, NEW.`employer_id`) ; - CREATE DEFINER='civicrm_db_user'@'localhost'
TRIGGER civicrm_contact_after_update
AFTER UPDATE ON civicrm_contact FOR EACH ROW
INSERT INTO civicrm_contact_log
(log_contact, log_action, `id` , `contact_type` , `contact_sub_type`,`do_not_email` , `do_not_phone`, `do_not_mail` , `do_not_sms` , `do_not_trade` , `is_opt_out`, `legal_identifier`, `external_identifier`, `sort_name`, `display_name` , `nick_name`, `legal_name`, `home_URL` , `image_URL` , `preferred_communication_method` , `preferred_mail_format` , `hash`, `api_key`, `source` , `first_name` , `middle_name` , `last_name` , `prefix_id` , `suffix_id` , `email_greeting_id` , `email_greeting_custom` , `email_greeting_display` , `postal_greeting_id`, `postal_greeting_custom` , `postal_greeting_display`, `addressee_id` , `addressee_custom`, `addressee_display` , `job_title` , `gender_id` , `birth_date` , `is_deceased` , `deceased_date` , `mail_to_household_id`, `household_name`, `primary_contact_id`, `organization_name` , `sic_code` , `user_unique_id` , `employer_id` )
VALUES (@current_user, 'INSERT', NEW.`id`, NEW.`contact_type`, NEW.`contact_sub_type`, NEW.`do_not_email`, NEW.`do_not_phone`, NEW.`do_not_mail`, NEW.`do_not_sms`, NEW.`do_not_trade`, NEW.`is_opt_out`, NEW.`legal_identifier`, NEW.`external_identifier`, NEW.`sort_name`, NEW.`display_name`, NEW.`nick_name`, NEW.`legal_name`, NEW.`home_URL`, NEW.`image_URL`, NEW.`preferred_communication_method`, NEW.`preferred_mail_format`, NEW.`hash`, NEW.`api_key`, NEW.`source`, NEW.`first_name`, NEW.`middle_name`, NEW.`last_name`, NEW.`prefix_id`, NEW.`suffix_id`, NEW.`email_greeting_id`, NEW.`email_greeting_custom`, NEW.`email_greeting_display`, NEW.`postal_greeting_id`, NEW.`postal_greeting_custom`, NEW.`postal_greeting_display`, NEW.`addressee_id`, NEW.`addressee_custom`, NEW.`addressee_display`, NEW.`job_title`, NEW.`gender_id`, NEW.`birth_date`, NEW.`is_deceased`, NEW.`deceased_date`, NEW.`mail_to_household_id`, NEW.`household_name`, NEW.`primary_contact_id`, NEW.`organization_name`, NEW.`sic_code`, NEW.`user_unique_id`, NEW.`employer_id`) ; - CREATE DEFINER='civicrm_db_user'@'localhost'
TRIGGER civicrm_contact_after_delete
AFTER DELETE ON civicrm_contact FOR EACH ROW
INSERT INTO civicrm_contact_log
(log_contact, log_action, `id` , `contact_type` , `contact_sub_type`,`do_not_email` , `do_not_phone`, `do_not_mail` , `do_not_sms` , `do_not_trade` , `is_opt_out`, `legal_identifier`, `external_identifier`, `sort_name`, `display_name` , `nick_name`, `legal_name`, `home_URL` , `image_URL` , `preferred_communication_method` , `preferred_mail_format` , `hash`, `api_key`, `source` , `first_name` , `middle_name` , `last_name` , `prefix_id` , `suffix_id` , `email_greeting_id` , `email_greeting_custom` , `email_greeting_display` , `postal_greeting_id`, `postal_greeting_custom` , `postal_greeting_display`, `addressee_id` , `addressee_custom`, `addressee_display` , `job_title` , `gender_id` , `birth_date` , `is_deceased` , `deceased_date` , `mail_to_household_id`, `household_name`, `primary_contact_id`, `organization_name` , `sic_code` , `user_unique_id` , `employer_id` )
VALUES (@current_user, 'DELETE', OLD.`id`, OLD.`contact_type`, OLD.`contact_sub_type`, OLD.`do_not_email`, OLD.`do_not_phone`, OLD.`do_not_mail`, OLD.`do_not_sms`, OLD.`do_not_trade`, OLD.`is_opt_out`, OLD.`legal_identifier`, OLD.`external_identifier`, OLD.`sort_name`, OLD.`display_name`, OLD.`nick_name`, OLD.`legal_name`, OLD.`home_URL`, OLD.`image_URL`, OLD.`preferred_communication_method`, OLD.`preferred_mail_format`, OLD.`hash`, OLD.`api_key`, OLD.`source`, OLD.`first_name`, OLD.`middle_name`, OLD.`last_name`, OLD.`prefix_id`, OLD.`suffix_id`, OLD.`email_greeting_id`, OLD.`email_greeting_custom`, OLD.`email_greeting_display`, OLD.`postal_greeting_id`, OLD.`postal_greeting_custom`, OLD.`postal_greeting_display`, OLD.`addressee_id`, OLD.`addressee_custom`, OLD.`addressee_display`, OLD.`job_title`, OLD.`gender_id`, OLD.`birth_date`, OLD.`is_deceased`, OLD.`deceased_date`, OLD.`mail_to_household_id`, OLD.`household_name`, OLD.`primary_contact_id`, OLD.`organization_name`, OLD.`sic_code`, OLD.`user_unique_id`, OLD.`employer_id`) ;
- CREATE DEFINER='civicrm_db_user'@'localhost'
- gencode.php will produce code to enable and disable logging, including the creation of 'Initialization' records in log tables as the log table is enabled.
- Where database constraints are used to cascade changes to other tables (if they are) from base tables, these cascading actions will be rewritten into triggers on the base tables. For the sake of simplicity, when logging is disabled, these cascading actions will still be implemented via triggers rather than constraints.
- The CiviCRM upgrade script will be modified in the following ways:
- If *_log tables exist, then all changes to their base tables will also be made to the *_log tables.
- Inserts, updates, and deletions of custom fields and custom groups will be mirrored to the corresponding log tables.
Unit Testing
- Gencode.php will produce the following PHP unit tests for each table that is loggable:
- Does enabling logging on a table for the first time create the appropriate *_log table?
- Does enabling logging on a table for the first time create appropriate 'Initialization' records for each record in the base table?
- Does enabling logging on a table after it has been disabled truncate the *_log table?
- Does enabling logging on a table after it has been disabled create appropriate 'Initialization' records for each record in the base table?
- Record level tests for insert, update and delete triggers:
- When logging is enabled, do the actions on the base table produce a record in the log table?
- When logging is disabled, do the actions on the base table produce no record in the log table?
- If there are cascading actions on a base table, do the cascading actions occur before logging is enabled?
- If there are cascading actions on a base table, do the cascading actions occur when logging is enabled?
- If there are cascading actions on a base table, do the cascading actions occur after logging has been enabled and disabled?
- Field level tests for insert, update and delete triggers:
- Does each base table field get accurately copied to log table?
- Do the values for the four log_* fields get accurately set?
Phase II
Administer > Configure > Global Settings > Logging will allow logging to be selectively enabled by table.
Functionality will be developed to make use of the logs. In particular, a Quality Assurance function will allow users with appropriate permissions (eg are members of a Data QA role ACL) to review and approve all data changes to all records since they were last QA'd. This will likely involve creating a new enum value of 'DataQA' for log_action.
- We will create permissions that decide who can commit changes without QA approval. If the user does not hav that permission their changes go into a queue.
- This permission will only affect demographic tables (like civicrm_contact, civicrm_address, civicrm_email, etc...) as we don't want to prevent Contributions, Event registrations, Actions and Memberships from hitting the database immediately.
- Administrators, with "commit changes" rights will have access to the queue and will have the ability to review. The queue will be a list of records that have been changed and with the date of the latest change.
- names in the queue will be links to the record in question or a the header on a cascading fieldset that will open to reveal a list of the fields that are changed grouped by date. So if you have changes on 1/1/10 and 12/31/09 you will have two groups of fields.
- the list of changes will be structured in a table/grid (using divs of course
) three rows, field name, the new value as an editable field and the old value. The admin will be able to edit the 2nd row (the new value) and save the entire record. One major trick I have worked through is what to do if the same field is edited on multiple days, hence it gets rendered in multiple groups. The simple answer is to take the latest revision and train admins to only edit that one. - How do we reject an edit so it doesn't show up each time the queue is visited. The logging tables will have to have a "reject" boolean field to track this.
- to this end we will want to save records for later consideration or for staff to ask someone else to look at while retaining the ability to mark records a reviewed to ensure they don't continue to show up.
- the list of changes will be structured in a table/grid (using divs of course
- We will likely want for the admin to be able to sort, but that's low priority for now.
- names in the queue will be links to the record in question or a the header on a cascading fieldset that will open to reveal a list of the fields that are changed grouped by date. So if you have changes on 1/1/10 and 12/31/09 you will have two groups of fields.

9 Comments
Hide/Show CommentsMay 04, 2010
Jim Taylor
It's worth noting that there is a membership_log table that does this already.
May 06, 2010
Jim Taylor
couple thoughts,
in reference to bullet 4 h gencode.php doesn't ship with the tar ball, so we will have to rely on something else to initialize the log tables.
We need to think through incremental updates as all alter table statements will have to update both the base and log tables. Further we need to think about what happens if logging is turned off and then back on. If you add a custom field after turning logging off your base and log tables schemas will not longer be synched.
I'm of the opinion that the logging tables should ship and be created on the fly for custom fields as groups/fields are edited. They don't necessarily have ot be used but just having them there makes them easier to keep synched. It means a whole bunch more tables ship with core but I think it's worth the sacrifice. This is a crazy though but can turning logging on push the changes in schema from base to logging using triggers or can trigger push alter tables statements form base to logging (wishful thinking I know).
May 25, 2010
JoeMurray
4 h) GenCode produces code that is shipped. The code to truncate a log table, then initialize it with records copying the base table values can be created without reference to the schema, so it does not need to be written by GenCode. The only exception might be a need to have a list of all of the base tables for which logging can be enabled or disabled, which can be worked out during coding.
Incremental updates: Agreed re: updates to base and log tables. Whether logging is on or off, field modifications should be made to both tables after warning user.
I think there is a significant overhead in logging, few will initially use it, and while there is not a lot of overhead in having unused empty log table in the db there is little advantage. I think when logging is enabled, there should be a drop table if exists followed by code to create the log table. I'd have to look if triggers can modify schema in MySQL; my sense is that no, they aren't allowed. But I don't see need for that.
May 26, 2010
Donald A. Lobo
disclaimer: using new cool toys is always fun and a good thing
since logging is more of a safety valve, how about using mongodb for archival purposes? Since logging is primarily a write once (and forget in most cases), its a good use case. Other advantages:
1. we are storing the values and a snapshot of the schema for that version. We basically dont really care too much about schema upgrades with this mechanism
2. fits the use case for something like mongo perfectly (a loggging engine).
3. avoids the schema explosion problem. potentially easier to code and expand what objects are logged
4. ultimately what we really want stored is a consistent state of each object, not the low level table details (since if we need to roll back we roll back at the object level). better match for a more flexible DB
just my initial thoughts
lobo
May 26, 2010
D D
Some ideas:
1) If speed is an issue can use MySQL memory tables for logging instead of InnoDB/ISAM, then periodically write them out to whatever with a background cron task.
2) Use subversion to track changes - serialize the object and revision it. That won't be fast if done in realtime, but maybe serialize to files which is fast and then periodic cron to svn. http://php.net/manual/en/book.svn.php
Jun 15, 2010
JoeMurray
I like the idea of memory tables for logging in general, though it would drive up server requirements.
With regard to svn, I'm concerned that I want to be able to easily use the data during data QA in phase II. Those who will be regularly using data are not very tech savvy. It's not to provide an occasional lookup in case of problems, but to regularly process all changes. So it's not quite a normal logging operation for svn versioning, or so it seems to me. It seems significantly more trouble to pull diff out of repo, present to user, and update current data if necessary compared to using existing structures and support in DAO layer and MySQL. But then perhaps I'm not seeing an elegant pattern for implementing this.
Jun 16, 2010
D D
Fair enough.
A clarification on what I meant by serialize won't answer the concern about the review/edit phase, but would simplify view-only uses. I meant serialize to some text format that makes the changes plain text obvious, not PHP's serialize() where the diff isn't really helpful.
Although if it was PHP's serialize() then you could e.g. pull r5 and instantiate, pull r6 and instantiate, run the object's diff() method on them (that method to-be-written - a civi not an svn feature), then display in a table for review/edit. But yes maybe less efficient and not necessarily easier, and you're no longer really leveraging svn.
Also while I'm here a clarification which on re-read isn't obvious: Repository url's would be file urls not http/svn, so there isn't a subversion server required. It would all happen on the civi server locally.
Jun 15, 2010
JoeMurray
The logging is intended more to enable the data QA functionality for this client, though yes, in general, most will use it as a safety valve.
This is intended to be part of regular workflow for a client that is very good at maximizing response rates in direct mail program by ensuring their data is in tip top shape, e.g. salutation is correct and appropriate for email versus mail communication. They also manually fix invalid addresses both to reduce mailing costs and also to ensure they have as big a prospect list as possible.
I've started learning about mongodb but haven't got my head around its advantages yet versus the issues with introducing a new technology dependence to project, learning curves, etc. Seems like it would complicate the QA in phase II significantly.
Jun 15, 2010
Jim Taylor
Yea I think it's a mistake to think of it as log it and forget about it.
Organizations that hire temp workers to do data entry en-masse or those that relay on members to keep data up to date are likely to use this kind of thing on a nearly daily basis. I've worked with orgs that add 150K names to their db every summer, you can only imagine the data entry issues we encountered.
I agree with Joe adding Mongodb would like speed up performance but be add a learning curve. But that said it's not like record level locking is essential here, as these records won't be edited, so MyISAM isn't out of question which should speed things up.