CiviCRM has a large codebase, which includes some great code, some decent code, and some terrible code. If you're reading this page, then you've stumbled into some code which is not great or decent or even borderline. It's so bad it's off the charts. It's probably a 12-page function with 20 variables and 200 control statements. It's the worst of the worst. You might call it toxic.
Toxic code creates a negative feedback cycle. Developers are afraid to look at it. When they do look at it, they're afraid to change it. When they do change it, they put in gratuitous conditionals and lookups and new variables (because they don't understand all the other conditionals and lookups and variables already there) - which minimizes the short-term risk but ultimately makes the problem even worse (because the code becomes longer and harder to understand). Once a change is prepared, it's hard to get meaningful code-review – because code-reviewers don't understand it – so it either gets weak review or slow review, which in turn discourages cleanup.
If you've patched toxic code, take a moment to appreciate the accomplishment. You've swum upstream. You've overcome a bunch of obstacles. It's hard, and many people can't do it.
Unfortunately, with toxic code, it is not sufficient to fix a bug, enhance an edge-case, or add a feature. We need to break the negative feedback cycle so that future developers won't have such a hard time. As someone who recently wrestled with the toxic code, you have a fresh understanding of it – which makes you particularly qualified to improve it.
Make the code better than how you found it.
After looking at a piece of toxic code, I often feel the urge to go all-in on refactoring or to throw it away (or maybe yell at every person that appears in "git blame"). But then I look at the clock and realize: "Damn, it's 4 am; I've got another project to work on; and there's no way to schedule in 3 days of unplanned refactoring. Besides, I don't want to deal with all those edge-cases." To be sure, sometimes we do find a way - which is awesome - but the Toxic Code Protocol cannot rely on it happening all the time.
Fortunately, we don't need to fix everything all at once – we need to break the negative cycle wherein toxic code grows a little more toxic with each revision. The solution: make it a little less toxic with each revision. Suppose you write a 3-line bugfix which adds a new conditional to a 400-line function. No one expects you to cleanup all 400 lines, but the bugfix increases toxicity. You can balance out the change by moving 6 lines into a separate, smaller function. Now our function is less buggy and more readable.
- Be opportunistic. Toxic functions are long. There should be several opportunities for improvement. Pick whichever looks easiest.
- Be proportionate. If you make a small change, then do a small cleanup. If you make a big change, then do a big cleanup.
- If you're doing two different things (such as fixing a bug at line 300 and cleaning some code at line 400), then put these in separate git commits.
- Use an IDE (like PHPStorm) which supports automated refactoring.
- The "Extract Method" technique is particularly useful in long functions.
- See, e.g. https://sourcemaking.com/refactoring/extract-method or https://www.jetbrains.com/phpstorm/help/extract-method.html
- Play around and preview different extractions. If you select a large chunk of code for extraction, it may pass through a large number of variables - which isn't good for complexity. But if you swap around a few lines or designate a different set of lines, you may be able to get code which is simpler but functionally equivalent.
- Look for duplicated code. This is often ideal for method extraction – you can replace two chunks of nearly identical code with one common function.
- Remove unnecessary temporary variables (eg variables that are only referenced once or twice).
- In ordinary code, temporary variables can help improve readability by giving a name to data.
- In toxic code, there are so many variables that it's hard to find a meaningful name which doesn't conflict with something else, and you'll find variables which are mentally indistinguishable (eg "$message->body_html" vs "$messageBodyHtml" vs "$body_html" – all in the same function). Moreover, it's hard to recognize all the ways the variable might be used or modified.
- Replacing temporary variables with function-calls or inline expressions can improve readability.
- The "Inline Variable" refactoring can be useful for this (eg https://www.jetbrains.com/phpstorm/help/inline.html ).
- Write a unit test. Even if you don't change the principle code, having a unit-test will protect your changes and will help future developers perform more aggressive cleanup.
Q: How are toxic functions identified?
A: One of the core or active developers periodically runs PHP Depend (pdepend) and sorts the results to identify a cohort of functions or classes which are consistently among the worst across multiple metrics. The selections are put in the ".toxic" file.
Q: Why should I share responsibility for toxic code - I only wrote a little piece! And it (fixed a bug | added a feature)!
A: That's probably true for everyone who ever touched it. Toxic code becomes toxic because no one feels individually responsible for improving it. Toxic code is a bit like the Tragedy of the Commons - lots of people are involved, and every step is individually rational and defensible, but the overall result hurts everyone.
Q: What makes toxic code so special that it deserves it's own protocol?
A: The protocol isn't special. "Make the code better than how you found it" is always a good principle... for all patches... to all code... by all developers... all the time. And I believe that most developers work hard to follow these kinds of principles. (And remember: CiviCRM is large. It has good code and bad code.) What makes toxic code special is its historical neglect, which kicked off a negative cycle. Correcting that cycle requires that everyone take special care to improve it.