
Analyzing the user stories at Using PDI (Data Integration) to extract, transform and load data#UserStories, I need the following data:

  • cases (with their activities as this is the lowest level of data) with case type, case status and case client
  • for each case client, the country where the client is located
  • projects, which are a layer above cases, with the name and the name of the project manager
  • the representative connected to each case by name
  • the country coordinator connected to each case with the name, gender and birth date
  • the sector coordinator connected to each case with the name, gender and birth date
  • the project officer connected to the case by name
  • the expert connected to the case with name, gender and birth date
  • the sponsor connected to the case by name
  • I also need the case start and end dates to be able to build time dimensions. On top of that I need the main activity start and end dates. These are custom fields at case level that record when the actual activity in the country started and ended, which can and will differ from the case start and end dates.


Developing the query

Looking at the CiviCRM data model I will need data from a hefty number of tables, and I will need to write an SQL query that retrieves that data from the CiviCRM database. In the PUM setting this is the actual SQL query:

Obviously you need to know where to find the data in CiviCRM to do this, and knowing a bit of SQL helps too. We will not discuss all the details here; we assume you know how to get the data with SQL.

CAST date values

You see a couple of CAST lines for the date fields (like CAST(DATE_FORMAT(main.start_date, "%Y%m%d") AS unsigned) AS case_main_start_date_tk). Owen explained that this is required for date comparisons to work without running into all sorts of issues, and I immediately accepted the expert's advice and have not tried to do it any differently.
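What that CAST expression produces can be sketched in Python: the date is rendered as yyyymmdd and stored as an unsigned integer, so the keys compare and sort chronologically. The helper name `date_tk` is my own, just for illustration.

```python
from datetime import date

def date_tk(d):
    """Turn a date into a yyyymmdd integer, mirroring
    CAST(DATE_FORMAT(main.start_date, "%Y%m%d") AS unsigned)."""
    return int(d.strftime("%Y%m%d"))

# Integer keys keep chronological order, which is why they behave
# well in date comparisons and as time-dimension keys.
print(date_tk(date(2014, 3, 7)))  # 20140307
```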


Adding the Table Input step

In PDI, you now click on the Design tab and you will see a list of steps. There are quite a few! To extract the data, you will select one of the Input steps.

To get a step onto your transformation you select it from the menu and drag it onto your workspace.

In this case we will use a Table Input step, as the input data is coming from a MySQL table. But it could also come from an Excel file, be entered in a Data Grid, and so on.

Once the step is on my workspace I can select it to enter its properties. In this case I have given the step a name, selected the database connection and added my query. I can click on the Preview button to see if my query actually produces any results.

Transform the data

In the next steps I am going to replace the gender values from my data extract. As you will know, CiviCRM by default stores 1, 2 or 3 for gender. In my data warehouse I want to replace those values with Female, Male or Unknown. To do that I will add a Transform step of the type Value Mapper with properties like this:
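The logic of the Value Mapper step boils down to a lookup table. A minimal sketch in Python, assuming the 1/2/3 codes map to Female/Male/Unknown in that order (the exact assignment depends on your CiviCRM gender options):

```python
# Assumed mapping of CiviCRM gender codes to the labels used in
# this data warehouse; adjust to match your CiviCRM configuration.
GENDER_MAP = {1: "Female", 2: "Male", 3: "Unknown"}

def map_gender(value):
    # Anything outside the mapped codes (including null) falls back
    # to "Unknown", like a Value Mapper default target value.
    return GENDER_MAP.get(value, "Unknown")
```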

I will add a hop (a link) between the steps so they are connected (click on the step and select the output connector, or hold Shift and drag from one step to the other). Once I have completed all the required gender mappings the total will look like this:

Removing the null values

I do not want null values in my reports, so I will now add a step to replace all null values. This is a Utility step of the type If field value is null, in which I replace all possible null values like so:
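What the If field value is null step does per row can be sketched as follows; the `replace_nulls` helper and field names are illustrative, not part of PDI:

```python
def replace_nulls(row, defaults):
    """Swap None for a per-field replacement value, like the
    'If field value is null' step; unmapped fields get ""."""
    return {field: (defaults.get(field, "") if value is None else value)
            for field, value in row.items()}

row = {"client_name": None, "client_country": "Tanzania"}
print(replace_nulls(row, {"client_name": "Unknown"}))
```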


I will also add a filter to remove some of the case types I do not need in this data cube. I will add a Flow step of the type Filter Rows like so:
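The filter itself is a simple predicate on each row. A sketch, with made-up case type names standing in for the ones actually excluded:

```python
# Placeholder names; the real excluded case types depend on the cube.
EXCLUDED_CASE_TYPES = {"Grant", "Internal"}

def keep_row(row):
    # Filter Rows passes the row on only when the condition holds.
    return row["case_type"] not in EXCLUDED_CASE_TYPES

rows = [{"case_type": "Advice"}, {"case_type": "Grant"}]
print([r for r in rows if keep_row(r)])
```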

After adding this step my total transformation will look like this:

Creating the tables

Please note that the tables have to be created in your database before you can actually run your transformation! You can do this yourself, but you can also select Action/Get SQL from the PDI menu and it will suggest the SQL statements you have to execute to create, alter or drop tables. You can click on Execute SQL to execute all or only a selected query.

You can test each step (right click on the step and select Check) or do a preview (right click on the step and select Preview). You can see the transformation running and can see statistics in the bottom of your window.

Adding a dimension

Now that I have cleaned my data I need to add the dimensions. I will add a step from the Data Warehouse group called Dimension lookup/update. A separate table in my data warehouse will be created for each dimension. So I need to specify what the dimension is, what the table in the data warehouse will be called, what fields will be included etc. 

I have added the CaseClient Dimension as an example:

You can see I have entered the PUM Datawarehouse connection as this is where the dimension table needs to be added. I have named the table dim_case_client and selected the field case_client_id as the key field in my transformation stream, which will then be named CaseClientID in my resulting dimension table.

I have also added a technical key named CaseClientIdTK (Owen has taught me well that I should stick to a convention, so my technical keys, which are required, all end in TK). If I now click on the Fields tab (next to Keys) I can enter the fields from my transformation stream and name them in the table:

You can see that at the end of the line there is a value Punch through. This means that if the value of the field has changed on the next transformation run, the new value is punched through to all instances where this field is used in the data warehouse. Once the name of the client changes, it changes everywhere.

So after this step I will have a new table in my data warehouse called dim_case_client with the fields CaseClientID, ClientName and ClientCountry.
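What Dimension lookup/update does for this table can be sketched with an in-memory SQLite database: look up the natural key, insert a new row with a generated technical key when it is new, and otherwise punch the current attribute values through to the existing row. This is a simplification of the real step, which also supports versioned (slowly changing) attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_case_client (
    CaseClientIdTK INTEGER PRIMARY KEY AUTOINCREMENT,
    CaseClientID   INTEGER UNIQUE,
    ClientName     TEXT,
    ClientCountry  TEXT)""")

def lookup_or_insert(case_client_id, name, country):
    """Return the technical key for a case client, creating or
    updating the dimension row as needed."""
    row = conn.execute(
        "SELECT CaseClientIdTK FROM dim_case_client WHERE CaseClientID = ?",
        (case_client_id,)).fetchone()
    if row:
        # Punch through: changed attribute values overwrite the row.
        conn.execute(
            "UPDATE dim_case_client SET ClientName = ?, ClientCountry = ? "
            "WHERE CaseClientID = ?", (name, country, case_client_id))
        return row[0]
    cur = conn.execute(
        "INSERT INTO dim_case_client (CaseClientID, ClientName, ClientCountry) "
        "VALUES (?, ?, ?)", (case_client_id, name, country))
    return cur.lastrowid

tk = lookup_or_insert(42, "Acme Ltd", "Kenya")
assert tk == lookup_or_insert(42, "Acme Ltd", "Kenya")  # stable TK on re-run
```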

To complete the dimension I need to remove the fields I have just added to the dimension from my stream data, so I end up with just my fact data. I can do this with a Select Values step from the Transform group like so:

I will do a set of these for each dimension, so once I have added them all my transformation will look like this:

Load the remaining data into a fact table

The remaining data contains facts and keys, and should be loaded into the fact table in the data warehouse. I can do this by adding an Output step of the type Table Output like so:
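The Table Output step simply appends the remaining stream rows (technical keys plus facts) to the fact table. A sketch, again with SQLite, where the table and column names are illustrative rather than the actual PUM schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fact_case (
    CaseClientIdTK     INTEGER,
    ExpertIdTK         INTEGER,
    case_start_date_tk INTEGER,
    case_end_date_tk   INTEGER)""")

# Each row holds only dimension keys and date keys at this point.
rows = [(1, 7, 20140101, 20140315),
        (2, 9, 20140210, 20140422)]
conn.executemany("INSERT INTO fact_case VALUES (?, ?, ?, ?)", rows)
print(conn.execute("SELECT COUNT(*) FROM fact_case").fetchone()[0])  # 2
```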

With this step I have now completed my transformation.

Running the transformation

I can run the transformation by clicking the Run symbol or from the Actions menu. My connections determine whether I am running locally or on the server. It is also possible to run transformations on a schedule on the BI Server, but that is still on my to-do list and will be added here once I have figured it out. When running you can see the metrics:

And check the logging if you have any errors:

