How to deal with reference data
Reference or master data is an important topic for every application. Let’s have a look at the build-in options we have for creating these kinds of data when working with a CUBA application. Additionally: what possibilities do we have to extend this features to our own needs?
Different categories of data
First of all, although everyone might have a rough understanding about the term reference data, I had to look it up myself and look especially about the differences between reference data and master data. I found this great explanation about the different data types. Mainly there are the following categories of data types:
- reporting data (e.g. aggregated sales data)
- transactional data (e.g. order information)
- master data (e.g. customer information)
- reference data (e.g. order status)
- metadata (e.g. created timestamps)
In this article we will consider the reference data, because it oftentimes has specific requirements.
Problems with reference data
I will not go into much detail about the theory behind master data management, first of all because it is a very broad field and secondly I’m not 100% into it :). Instead I’ll redirect you to a starting point on master data management. But to get a basic understanding around that topic, let’s look at some instances of reference data and what potential problems arise when working with them.
So here’s the example we will go through in this blog post. Let’s imagine we have just another order management system. To manage orders we generally have the following data types that can be categorized into the above mentioned groups:
- transactional data
- Order
- master data
- customer
- reference data
- customer type
- payment method
- tax rates
- tenant information
The interesting part for this blog post is the reference data. CustomerType
is the first of it in the row. In this entity we can classify a customer with entries like new customer, potential customer, loyal customer etc. These classifications of customers are there for e.g. reporting purposes. The general problem with reference data is that it is very sensible towards changes.
Imagine what happens when the entry with the name “new customer” is changed to “Impulsive customer”. In this case not only new customer entries will be able to select this entry as the customer type, but all existing customers that had been previously classified as “new customer” are now “Impulsive”. Is this really what the reference data changer thought of when doing the change? Probably not.
The next example is the TaxRate
entity. Tax rates not only have a name and a code, but a rate value as well. Additionally, as the tax rates are only valid for a certain time period, there is a need to define validities for this type of data. In this case, when working with this data, only certain entries are allowed to be selected depending on a particular point in time.
Alphabet Inc. Ordermanagement
In order to have a look on how we can handle these kinds of situations within a CUBA application, I created an example for the ordermanagement of the Alphabet Inc. With this example we will care about the three requirements:
- Handle deactivated / deleted ReferenceEntity references
- Filter only for currenty valid TemporalReferenceEntities
- Display only allowed CustomerTypes per tenant
First of all, let’s take a look at the underlying domain model:
You’ll find most of the above mentioned entites. All reference entites are subclasses of the common base class ReferenceEntity
. In case of a reference entity with temporal validity information it is the specialization TemporalReferenceEntity
. PaymentMethod
and TaxRate
have validity information while CustomerType
is a normal reference entity.
Handle deactivated / deleted ReferenceEntity references
The first thing that we have to be aware of is the fact that reference entites can change over time. New entries are created and other entries get obsolete. Let’s look at the problematic situation of a reference entity that gets removed.
The immediate question that comes up then: what happens to the transactional data that has a reference to the ReferenceEntity instance. Let’s look at the Customer entity for this example: in this case, the reference data would be the CustomerType. What happens if the CustomerType gets removed from the system?
There are the main options:
- Remove the customers from the system that have this CustomerType
- Remove the reference to the CustomerType for all these customers
- Soft-delete the CustomerType and leave the customer references untouched
For transactional data, it might not even be a big deal, but if you start thinking about statistics and historical information, it is absolutely crucial that these transactional data stay intact.
For example, you have a statistic that gives you a pie chart of the turn over share for all customers in 2015, separated by their CustomerType. In 2016 you remove the main CustomerType with all its customers. Now, when you execute the 2015 statistic again, the pie chart will change significantly. Therefore removing all customers like in p.1 will not work out.
Depending on the type of statistic you want to make, the second option would not work as well. Let’s imagine we want total turn overs only for a certain customer region. In this case, the statistics be infected by the change as well.
So both options are probably not what we want for historical data.
Let’s take a look at the third option. In this case the data will not be deleted at all. Instead, they will be tagged as deleted (e.g. via deleteTs
and deleteBy
attributes). With this the transactional data will stay intanct and the statistics will remain the same, doesn’t matter how often and when they are executed.
Soft deletion in CUBA
So. let’s try to implement this one in CUBA. Soft deletion is a feature of the platform that is already in place if we use StandardEntity
as a base class for our entities. In our case CustomerType is soft deletable.
The default behavior of CUBA in case of a deletion of a soft deletable entity is as follows:
- The instance will not show up in the browse screens of the entity anymore
- References from the other entities will be displayed normally (probably the instance name)
- In case of the editor of an entity with the reference, when using the PickerField, in the lookup screen the option will not be available anymore
Basically this is exactly what we want in this case. The deleted CustomerType “new Customer” is stopped from using it in the future (new customers or changing existing customers), but existing entries remain correct (because customer “Juan Perkins” has been classified as a “new Customer” before). You can see the result in the pictures below.
After this is working out, the next requirement that we can take a look at is how references are handled that are only valid within a certain time area.
Filter only for currenty valid TemporalReferenceEntities
The main difference from the example above goes like this: when I edit an entity that has a reference to another entity - I should be able to select the instances that have been valid at this point in time.
An example of this would be: an order has a payment method. These payment methods may change overtime, because new payment methods are added as the time goes on. But when I want to change the payment method for an existing order, I should only be allowed to see the payment methods that have been available at the time of the order.
To get that going, the PaymentMethod
is a subclass of TemporalReferenceEntity
, that additionally has the two major attributes validFrom
and validUntil
. Instead of deleting an entity instance, it will be invalidated via setting the validUntil date.
This means, that we have to filter our default lookup list of the payment methods to see only the entries that are valid at a given reference date (which in case of the order is the orderDate).
For this to work I created a common superclass for the controllers of the browse screen: TemporalReferenceEntityBrowse. It takes a Date as a window parameter that gets used to filter the entries like this:
With this, the datasource filter query is changed to show only the entries that are valid at the given reference date. The subclasses of this are only required to define the datasource and the entity name with the corresponding abstract methods (see PaymentMethodBrowse for more details).
To pass the order date as the reference date to the lookup screen, take a look at the OrderEditor, which does exactly that:
With this, exactly the same behavior as above is achieved. When a new order gets created, it will show only the payment methods that are valid at the given order date. If the reference data changes (an entry gets deactivated), it will remain in the reference for existing orders.
Depeding on the use case, the reference date might not be a particular date within an entity. There is another example of this in the application: the customer can have a preferred payment method that should be used by default. In this case, there is no reference date like the orderDate, so the current calendar date is used to get only the currently valid options (see the CustomerEditor for details).
Display only allowed CustomerTypes per tenant
As a third example, that is only partly related but nevertheless very interesting is the following situation:
Let’s imagine we are in a multi-tenant environment (like in the cuba-sample-saas example). In this case, we have the ordermanagement for Alphabet Inc. Since this holding has different companies that are part of it, the application needs to have distinct ordermanagement systems for each company (tenant). A tenant on its own is allowed to define which customer types are relevant in the company.
The first option to solve this issue is to let every company create their own entries for CustomerType - this would definitely get the job done. CustomerType would be a subclass of TenantEntity
, and there we go.
But let’s imagine it is required to make statistics over the customers of all companies within the holding. In this case, let every company define their own CustomerTypes wouldn’t work.
So we take another approach. We will share the CustomerTypes for the whole holding. The companies are only allowed to blacklist some of the entries so that they will not get displayed since it might not make sense in their business to have such customer types.
Here’s how we can achieve it:
As described in the entity model from above, we create a class TenantCustomerType that is responsible for holding the blacklist of every tenant for the CustomerTypes.
Next, we will create a Security Group constraint that will kick out every CustomerType via security mechanisms that is in the list of the CustomerType blacklist like this:
Now, when we login as a specific tenant (google:google or fiber:fiber), we will only see and be able to assign references to non-blacklisted CustomerTypes.
With this, we have finished our three requirements for the Alphabet Inc. Ordermanagement system regarding reference data.
As always you’ll find the example application with the described solutions on Github: mariodavid/cuba-example-temporal-reference-data