Analyzing the user stories at Using PDI (Data Integration) to extract, transform and load data#UserStories, I need the following data:
Developing the query
Looking at the CiviCRM data model, I will need data from a hefty number of tables, and I will need to write an SQL query that retrieves that data from the CiviCRM database. In the PUM setting this is the actual SQL query:
Obviously you need to know where to find the data in CiviCRM to do this, and knowing a bit of SQL helps too. We will not discuss all the details here; we assume you know how to get the data with SQL.
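To give a feel for the kind of query a Table Input step runs (this is not the actual PUM query, and the tables below are heavily simplified stand-ins for the real CiviCRM schema), here is a small sketch using an in-memory SQLite database in place of MySQL:

```python
import sqlite3

# Hypothetical, simplified stand-ins for a few CiviCRM tables --
# the real schema has many more tables and columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE civicrm_contact (id INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE civicrm_case (id INTEGER PRIMARY KEY, case_type_id INTEGER);
CREATE TABLE civicrm_case_contact (case_id INTEGER, contact_id INTEGER);

INSERT INTO civicrm_contact VALUES (1, 'Jane Client');
INSERT INTO civicrm_case VALUES (10, 2);
INSERT INTO civicrm_case_contact VALUES (10, 1);
""")

# The sort of join such a query performs: each case plus its client
rows = conn.execute("""
SELECT cas.id AS case_id, con.display_name AS client_name
FROM civicrm_case cas
JOIN civicrm_case_contact cc ON cc.case_id = cas.id
JOIN civicrm_contact con ON con.id = cc.contact_id
""").fetchall()
print(rows)  # [(10, 'Jane Client')]
```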
Adding the Table Input step
In PDI, you now click on the Design tab and you will see a list of steps. There are quite a few! To extract the data, you will select one of the Input steps.
To get a step on your transformation you select it from the menu and drag it on to your workspace.
In this case we will use a Table Input step, as the input data is coming from a MySQL table. But it could also come from an Excel file, be entered in a Data Grid, etc.
Once the step is on my workspace I can select it to enter its properties. In this case I have given the step a name, selected the database connection and added my query. I can click the Preview button to see if my query actually produces any results.
Transform the data
In the next steps I am going to replace the gender values in my data extract. As you may know, CiviCRM by default stores 1, 2 or 3 for gender. In my data warehouse I want to replace those values with Female, Male or Unknown. To do that I will add a Transform step of the type Value Mapper with properties like this:
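In plain code, the Value Mapper step applies a simple lookup table per row. A sketch of that logic (field name and fallback behaviour are my assumptions, not PDI settings):

```python
# Mapping from CiviCRM's default gender codes to readable labels,
# matching the Value Mapper configuration described above.
GENDER_MAP = {1: "Female", 2: "Male", 3: "Unknown"}

def map_gender(row):
    """Return a copy of the row with the gender code replaced by its label."""
    row = dict(row)
    # Any unmapped or missing code falls back to "Unknown" in this sketch.
    row["gender"] = GENDER_MAP.get(row.get("gender"), "Unknown")
    return row

rows = [{"case_id": 10, "gender": 1}, {"case_id": 11, "gender": 2}]
print([map_gender(r) for r in rows])
```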
I will add a hop (a link) between the steps so they are connected (click on the step and select the output connector, or hold Shift and drag from one step to the other). Once I have completed all the required gender mappings the transformation will look like this:
Removing the null values
I do not want null values in my reports, so I will now add a step to replace all null values. This is a Utility step of the type If field value is null, which replaces all possible null values like so:
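The If field value is null step amounts to a per-field default lookup. A minimal sketch, with hypothetical field names and defaults:

```python
# Per-field replacement values for nulls (field names are illustrative,
# not the actual PUM configuration).
DEFAULTS = {"client_country": "Unknown", "case_type": "Unknown"}

def replace_nulls(row, defaults=DEFAULTS):
    """Replace None values with the configured default for that field."""
    return {k: (defaults.get(k, v) if v is None else v) for k, v in row.items()}

print(replace_nulls({"case_id": 5, "client_country": None}))
```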
I will also add a filter to remove some of the case types I do not need in this data cube. I will add a Flow step of the type Filter Rows like so:
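The Filter Rows step simply tests a condition per row and only passes matching rows downstream. A sketch of that behaviour (the excluded case type names here are made up):

```python
# Case types to drop from this cube (hypothetical names).
EXCLUDED_CASE_TYPES = {"Housekeeping", "Internal"}

def keep_row(row):
    """True if the row should stay in the stream."""
    return row.get("case_type") not in EXCLUDED_CASE_TYPES

rows = [{"case_id": 1, "case_type": "Advice"},
        {"case_id": 2, "case_type": "Internal"}]
kept = [r for r in rows if keep_row(r)]
print(kept)  # [{'case_id': 1, 'case_type': 'Advice'}]
```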
After adding this step my total transformation will look like this:
You can test each step (right click on the step and select Check) or do a preview (right click on the step and select Preview). You can see the transformation running and can see statistics in the bottom of your window.
Adding a dimension
Now that I have cleaned my data I need to add the dimensions. I will add a step from the Data Warehouse group called Dimension lookup/update. A separate table in my data warehouse will be created for each dimension. So I need to specify what the dimension is, what the table in the data warehouse will be called, what fields will be included etc.
I have added the CaseClient Dimension as an example:
You can see I have entered the PUM Datawarehouse connection as this is where the dimension table needs to be added. I have named the table dim_case_client and selected the field case_client_id as the key field in my transformation stream, which will then be named CaseClientID in my resulting dimension table.
I have also added a technical key with the name CaseClientIdTK (Owen has taught me well that I should stick to a convention, so all my technical keys, which are required, end with TK). If I now click on the Fields tab (next to Keys) I can enter the fields from my transformation stream and name them in the table:
You can see that at the end of the line there is the option Punch through. This means that if the value of the field has changed on the next transformation run, the new value is punched through to all instances where this field is used in the data warehouse. Once the name of the client changes, it changes everywhere.
So after this step I will have a new table in my data warehouse called dim_case_client with the fields CaseClientID, ClientName and ClientCountry.
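The core of the Dimension lookup/update behaviour can be sketched as: look up the natural key, generate a new technical key (TK) for unseen clients, and overwrite changed attribute values when punch through is set. A minimal in-memory sketch (the dictionary stands in for the dim_case_client table):

```python
# In-memory stand-in for the dim_case_client table:
# natural key (CaseClientID) -> dimension row with its technical key.
dim_case_client = {}
next_tk = [1]  # simple surrogate key counter

def lookup_or_update(case_client_id, client_name, client_country):
    """Return the technical key, inserting or punching through as needed."""
    row = dim_case_client.get(case_client_id)
    if row is None:
        # New client: assign the next technical key and insert the row.
        row = {"CaseClientIdTK": next_tk[0],
               "CaseClientID": case_client_id,
               "ClientName": client_name,
               "ClientCountry": client_country}
        next_tk[0] += 1
        dim_case_client[case_client_id] = row
    else:
        # Punch through: overwrite the attributes so the change is
        # visible everywhere the dimension row is used.
        row["ClientName"] = client_name
        row["ClientCountry"] = client_country
    return row["CaseClientIdTK"]

tk_first = lookup_or_update(1, "Jane Client", "NL")
tk_again = lookup_or_update(1, "Jane Client-Doe", "NL")  # name changed
print(tk_first, tk_again, dim_case_client[1]["ClientName"])
```

Note that the technical key is stable across runs; only the attribute values change.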
To complete the dimension I need to remove the fields I have just added to the dimension from my stream data, so I end up with just my fact data. I can do this with a Select Values step from the Transform group, like so:
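In effect, the Select Values step here just drops the dimension attributes from each row, leaving the facts and the technical key. A sketch (field names are illustrative):

```python
def remove_fields(row, to_remove=("client_name", "client_country")):
    """Drop the listed dimension fields from the stream row."""
    return {k: v for k, v in row.items() if k not in to_remove}

row = {"case_id": 1, "client_name": "Jane Client",
       "client_country": "NL", "CaseClientIdTK": 7}
print(remove_fields(row))  # {'case_id': 1, 'CaseClientIdTK': 7}
```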
I will do a set of these for each dimension, so once I have added them all my transformation will look like this:
Load the remaining data into a fact table
The remaining data contains facts and keys, and should be loaded into the fact table in the data warehouse. I can do this by adding an Output step of the type Table Output, like so:
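The Table Output step is a straightforward bulk insert of the remaining stream rows. A sketch with an in-memory SQLite database and hypothetical fact table columns:

```python
import sqlite3

# Hypothetical fact table: one dimension technical key plus a measure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_case (CaseClientIdTK INTEGER, case_count INTEGER)")

# The remaining stream rows: dimension TKs and fact values only.
facts = [(1, 1), (2, 1)]
conn.executemany("INSERT INTO fact_case VALUES (?, ?)", facts)

loaded = conn.execute("SELECT COUNT(*) FROM fact_case").fetchone()[0]
print(loaded)  # 2
```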
With this step I have now completed my transformation.
Running the transformation
I can run the transformation by clicking the Run symbol or from the Actions menu. My connections determine whether I am running locally or on the server. It is also possible to run transformations on a schedule on the BI Server, but that is still on my to-do list and will be added here once I have figured it out. When running you can see the metrics:
And check the logging if you have any errors: