Using PDI to extract, transform and load data

Introduction

At PUM Senior Experts we decided to do the reporting on a separate data warehouse (a MySQL database) and to use data cubes. We use Pentaho Data Integration (also known as Kettle, with Spoon as its user interface) to extract the data from a daily dump of the CiviCRM database, transform the data and load it into the data warehouse database. We work from a daily dump because the production database is accessed 24 hours per day from all over the globe and we do not want to cause any performance issues. Yes, it is an ETL tool. This section will show you how we have done this, based on a couple of user stories.
PUM Senior Experts sends their voluntary senior experts on short missions to share hands-on business knowledge with organisations in developing countries and emerging markets, taking away bottlenecks and facilitating sustainable paths for growth. The missions are divided into several types of main activities, as they are called at PUM:
Each of those main activities is a case type in CiviCRM. On top of that PUM Senior Experts use case types for:
In the examples you will see these various case types occur, along with quite a lot of custom data (for additional customer data, expert data and case data) and quite a few extensions, such as one that adds the entity Project to CiviCRM and allows the user to link cases to a project, or one that adds a sponsor (contributing organization) to a case. We do not aim to explain every little bit of data but assume the examples are clear against this background. Wherever I think something is not obvious I will explain it in Info panels.

User Stories

In this section we will show how we built our data warehouse transformation. We used the SCRUM project approach and created the data warehouse based on user stories. There are quite a few, but the ones mentioned in this section were the first.
Initially we only create the overviews for the user stories in Saiku Analytics, an ad-hoc reporting tool that you can use as part of the Pentaho suite. At a later stage we will create static reports for some of these. To be able to create the kind of ad-hoc reports required, we need:
We did all this in a Kettle transformation file (a file with the extension .ktr) using Spoon, the Data Integration user interface.
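Inside such a transformation, extraction typically starts from a Table Input step containing a SQL query against the restored daily dump. As a minimal sketch (not PUM's actual query, and assuming a standard CiviCRM 4.5+ schema where `civicrm_case.case_type_id` references the `civicrm_case_type` lookup table), a query pulling cases with their case type might look like:

```sql
-- Illustrative extract query for a PDI Table Input step.
-- Table and column names are the standard CiviCRM ones; any
-- PUM-specific custom data tables would be joined in as well.
SELECT
  c.id         AS case_id,
  ct.name      AS case_type,
  c.start_date AS case_start_date,
  c.end_date   AS case_end_date,
  c.status_id  AS case_status_id
FROM civicrm_case c
JOIN civicrm_case_type ct ON ct.id = c.case_type_id
WHERE c.is_deleted = 0;
```

The result set of such a step then flows through the transformation's later steps, where it is reshaped and loaded into the warehouse tables.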
Creating the transformation with PDI
Page: Adding connections
Page: Adding Date and Time tables
Page: Extracting the data
Page: Opening the transformation