Aller directement à la fin des métadonnées
Aller au début des métadonnées

In these examples we use the Pentaho Community Edition

Using PDI to extract, transform and load data

Introduction

At PUM Senior Experts we decided to do the reporting on a separate data warehouse (MySQL Database) and to use data cubes. We use Pentaho Data Integration (aka Kettle with the UI as Spoon) to extract the data from a daily dump of the CiviCRM database (as the production database is being accessed 24 hours per day from all over the globe and we do not want to cause any performance issues), transform the data and load it into the datawarehouse database. Yes, it is a ETL tool (sourire) (Extract, Transform and Load)

This section will show you how we have done this based on a couple of user stories.

aka Kettle and Spoon

Pentaho Data Integration is a part of the Pentaho overall solution, but can also be used as a stand alone application. It is also known as Kettle and the UI is also known as Spoon.

PUM Senior Experts sends their voluntary senior experts on short missions to  share hands-on business knowledge with organisations in developing countries and emerging markets, taking away bottle necks, and facilitating sustainable paths for growth. The missions are divided in several types of main activities as they are called at PUM:

  • advice
  • remote coaching
  • seminar
  • business visit to The Netherlands

Each of those main activities is a case type in CiviCRM. On top of that PUM Senior Experts use case types for:

  • handling the travel arrangements (a so called travel case, which is a case within as case)
  • managing the expert application process when a new expert applies
  • managing the project intake process when a customer requests a new mission
  • opportunities to get major donors
  • managing the grant applications they receive from other organizations.

In the example you will see these various case types occur, and you will see quite some custom data (for additional customer data, expert data and case stuff) and quite a few extensions like one that adds the entity Project to civicrm and allows the user to link cases to a project or the one that adds a sponsor (contributing organization) to a case. We do not aim to explain every litte bit of data but assume the examples are clear against this background. Wherever I think it is not obvious I will explain in Info panels.

User Stories

In this section we will show how we build our datawarehouse transformation. We have used the SCRUM project approach, and created the data warehouse based on user stories. There are quire a few, but the ones mentioned in this section were the first ones.

  1. Overview of main activities per country and period (As CFO I want to receive a report once a month showing (per country and per month) the count of received project intake cases (exclude status error) so I can monintor the progress per country on a monthly basis.)
  2. Overview of main activities per country coordinator and period (In my role as Country Coordinator I want to have an overview (per representative, country and month) of cases in stock (status is matching or preparation) with a window of 3 months so I can estimate the workload in the coming months.)
  3. Overview of main activity per sponsor and period (In my role as CFO I want to have an overview (per sponsor, month) of the realised cases (status is debriefing, execution or completed) in the curreny year.)

Initially we only create the overviews for the user stories in Saiku Analytics, which is an ad-hoc reporting toolset which you can use as part of the Pentaho set. At a later stage we will create static reports for some of these.

To be able to create the kind of ad-hoc reports required we need:

  • fact data on activity level (as the lowest data entity): number of case clients, number of cases and number of activities
  • dimensions on project, client, country coordinator, representative and sponsor. And for good measure we have added dimensions for sector coordinators, project officers and experts too. (Country coordinators are volunteers that coordinate a country, sector coordinators volunteers that coordinate a sector like Waste Management for example, project officers the PUM staff that manage the process of the project, travel etc., representatives local people in a country representing PUM Senior Experts)
  • we also want to filter out the travel cases (as they only exists to facilitate travel for their parent case) and the cases that are created by mistake (case status is Error).
Background on facts and dimensions

For a little background information on facts and dimensions check http://databases.about.com/od/datamining/a/Facts-Vs-Dimensions.htm

We did all this in a Kettle transformation file (a file with the extension .ktr) using Spoon, the Data Integration user interface.

Using Spoon

To install and use Spoon on Ubuntu, follow the instructions on Installation of Pentaho components and setting up connections#InstallPentahoDataIntegration%28PDI%29onUbuntuDesktop and run the Spoon.sh file:

Creating the transformation with PDI

Étiquette
  • Aucun