I am Mohit Agarwal, a final-year undergraduate student pursuing a B.Tech in Computer Science and Engineering at the International Institute of Information Technology, Hyderabad, India. I will be working on the Predictive and Data Mining project as a 2015 Google Summer of Code student. I am being mentored by Xavier Dutoit and Owen Bowden.
The main goal of the project is to add prediction to the existing data, i.e. to be able to predict how likely an individual is to respond positively, negatively, or neutrally to a particular engagement action, given their relationship history.
A positive response means completing a user journey: making a donation, buying a membership, buying a ticket or sponsorship for an event, or responding positively to a major donor request to make a bequest sometime in the next 5 years.
A neutral response means something close to ignoring the outreach.
A negative response means the adverse reaction that comes from spamming, or from asking too much, too often, or too early in a relationship.
The relationship history consists of all of the organization's outreach actions toward an individual and all of that individual's responses back to the organization (these blend a bit). It includes bulk emails, personal emails, petition signings, survey responses, phone calls, meetings and other custom activities, contributions of various sorts (some are purchases of goods or services and others are donations), purchases and renewals of memberships, participation in events through registration, cases they have been involved in (these are sometimes used as a workflow for selling memberships, but they can also cover things like helping a person get housing), grants they may have applied for and/or received, and so on.
As CiviCRM is used by organizations of widely varying sizes with widely varying amounts of data on their users, it would be highly useful to know when insufficient information is available to make a prediction with a given level of confidence.
May 21, 2015
The scope of this project is quite large. After discussion with my mentor, we decided to start with something simple. I will first work on making predictions related to mailings. Once I have a model that can make mailing-related predictions with a certain confidence level, I will try to extend it to other entities such as events, campaigns, etc.
- Install CiviMail, civix, and Civisualize.
- Try out CiviMail. Understand the mailing database structure.
- Work on a database dump containing the tables that hold information about mail open rates, click rates, and so on.
- Use linear regression techniques to find a formula that can be used to make mailing-related predictions. For example, what is the best time to send mail to a particular group? How many users are likely to open a particular mail in the next 3 hours?
- While working on the model, take into account not only the absolute time of events, but also the delta time between when an email was sent and when it was actually opened.
- Once you have the equation, visualize it using the CiviCRM data visualization extension and identify patterns.
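
To make the regression and delta-time steps above more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample records are made up rather than taken from a CiviMail database, the helper names (`delta_hours`, `open_rate_within`, `fit_line`) are hypothetical, and using the send hour as the single predictor is only one possible choice of feature.

```python
from datetime import datetime


def delta_hours(sent, opened):
    """Hours between sending and opening, or None if the mail was never opened."""
    if opened is None:
        return None
    return (opened - sent).total_seconds() / 3600.0


def open_rate_within(events, hours):
    """Fraction of recipients who opened the mail within `hours` of sending."""
    if not events:
        return 0.0
    hits = sum(
        1
        for sent, opened in events
        if opened is not None and delta_hours(sent, opened) <= hours
    )
    return hits / len(events)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up (sent, opened) pairs for one mailing; opened is None if never opened.
sent = datetime(2015, 5, 21, 9, 0)
events = [
    (sent, datetime(2015, 5, 21, 9, 30)),   # opened after 0.5 hours
    (sent, datetime(2015, 5, 21, 13, 0)),   # opened after 4 hours
    (sent, None),                           # never opened
    (sent, datetime(2015, 5, 21, 11, 0)),   # opened after 2 hours
]
rate_3h = open_rate_within(events, 3)

# Made-up per-mailing summary: hour of day the mail was sent vs. 3-hour open rate.
send_hours = [8, 10, 12, 14, 16]
open_rates = [0.30, 0.26, 0.22, 0.18, 0.14]
slope, intercept = fit_line(send_hours, open_rates)

# Predicted 3-hour open rate for a mailing sent at 09:00.
predicted = slope * 9 + intercept
print(rate_3h, slope, intercept, predicted)
```

A straight line is the simplest possible model here. Its predictions are not constrained to stay between 0 and 1, so it should be read as a first approximation for spotting trends rather than a calibrated probability.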