In-Process Data Review Tool (DRT)
Extracted from Genie
This Data Review Tool (DRT) was created by the Interagency Collaborative for Program Improvement’s (ICPI) Data Access and Quality (DAQ) Workstream to help review MER data. These checks assess multiple aspects of the data and aim to identify potential issues and inconsistencies in the data. Each data quality check occurs at the site-by-implementing mechanism (IM) level (in other words, it occurs at the level at which data entry takes place).
The data used to populate this tool comes from DATIM. However, there is a delay before data that is entered into DATIM appears in this tool. The contents of the DRT Genie Extracts are updated & synchronized with DATIM daily. The Genie webpage shows a time stamp (similar to the one shown below) that shows when data from DATIM was pulled into this tool. Data that was entered after the timestamp displayed in your web browser will not be included in the tool until the next time data is refreshed/synced in Genie. As a best practice, we suggest that when you save this file to your computer you include the date and time displayed in Genie in the file name.
The table below details when the synchronization process from DATIM to the Genie begins and ends each day in several time zones: For example, if a user enters data in DATIM on Monday before 6:00PM Johannesburg time, the user will be able to query these data in the DATIM Genie on Tuesday after 1:30PM Johannesburg time once the full DATIM to Data Genie synchronization process is complete. Any data entered Monday after 6:00PM Johannesburg time will not be available in DATIM Genie on Tuesday, but would be available in DATIM Genie after the synchronization is completed on Wednesday at 1:30PM.
Security Trimming (User Permissions Levels)
The data visible to users in this tool will depend on their user roles and implementing mechanism approval levels - this is known as security trimming. The table below outlines the different approval levels, user roles, and data accessibility for each approval level and user role (Yes = data viewable. No = data not viewable to that user role).
*Please note that partner level users can only view data from their own partner. Regardless of approval level, users with partner level accounts cannot view data from other implementing partners.
Types of Data Checks
The data checks are categorized by different categories of validations (or data checks). Each category then has different types of checks that are performed on different indicators. The categories of checks are:
1. MER Logic Checks: Based originally on MER reporting Guidance documents. Includes checks such as:
-Logic checks comparing the relationship between reported totals within an indicator (ex: if the numerator is greater than the denominator)
-Logic checks comparing the relationship between related indicators (ex: there are results reported for the VMMC_CIRC indicator but not for the VMMC service delivery point for HTS_TST)
2. Checks Across Time Periods: Assesses the consistency of reporting data across time periods. Two primary types of checks are included to assess:
-Whether IMs at facilities have both targets and results reported
-Inconsistency of reporting results across time (e.g., an IM at a facility reported results in the previous time period but not in the current time period, or vice versa)
*Checks Across Time Periods may highlight some cases that are expected & explained by intentional pivots to/from certain partners and/or geographic locations. In these cases, there are not actual data quality issues.
3. Disaggregate Completeness Checks: Assesses whether disaggregate totals are greater than the total numerator.
*The disaggregate completeness checks have been temporarily disabled in the Genie. The disaggregate completeness checks should be turned back on and included in the DRT extracts following the release of the Site x Implementing Mechanisms Q2 Datasets.
4. Contextual Site by IM Information: Checks in this category do not imply a data quality error.
This set of checks provides contextual information about the number of sites that reported results, reported targets, and reported both targets and results for each indicator. The results of these checks are located on the "Contextual Site IM Info" tab and are supposed to make it easier to interpret the prevalence/commonness of other flags located in the "Main Site IM Checks" tab of this tool.
For a full list of the checks and explanations of each check, see the tab labeled "Data Checks Appendix."
How to Use This Tool
Main Site/IM Checks Tab
The "Main Site IM Checks" tab shows cases where an IM reporting at a site violated one a reporting rule. Slicers for PSNU Prioritization levels, PSNU, Site Name, Implementing Mechanism ID, Agency, Programmatic Area, and site types are included so that users can filter data to desired levels. The slicers on the "Contextual Site IM Info" and the "Main Site IM Checks" tabs are linked so using a slicer on one tab will cause the same filters to be applied on the other tab. Importantly, the only checks that appear in the pivot tables are those that have one or more Site/IMs that violated that check given the (a) filters selected in the DATIM Genie and (b) combination of filters/slicers that are selected. If a particular data quality check does not appear in the pivot table, it is because either 1) no site by IM combinations have violated that check or 2) any site/IM combinations that violated that check were excluded from analysis based on filters in DATIM Genie or in this tool's slicers/filters. The numbers in the "Number of Cases Violating the Check" column on the "Main Site IM Checks" tab represent the number of instances where an IM at a site violated each data check.
There are drill-down buttons (indicated by a '+') on each row next to each check that allows the user to drill-down to the PSNU, site name, Agency and then Implementing Mechanism(s) ID that violated each check at each site. Tip: Right click on any data quality check, PSNU, Agency, or IM in the pivot table and utilize the various expand & collapse options to quickly expand/collapse the entire pivot table to the level of your choice (see above).
Additionally, if the user is interested in seeing the raw data for particular row in the pivot table, the user can do so by double clicking the cell in the "Number of Cases Violating a Check' column. This will open a new excel sheet with only the row(s) that pertain to that geographic level and check. For instance, double clicking on the cell in the image below to the left will open a new sheet with only data from the PSNU Dorne that violated the check shown below. The user can do this for any level that is shown in the pivot table.
Contextual Site IM Info Tab
This tab includes the Contextual Site by IM Information checks. These provide contextual information that make it easier to interpret the prevalence/commonness of potential data quality issues flagged in the "Main Site IM Checks" tab. This tab can be useful if you compare the following two things across the "Contextual Site IM Info" and "Main Site IM Checks" tabs:
1) The number of cases where an indicator was reported or the number of cases where a check was violated, and
2) The total values for each indicator
Making these comparisons allows you to determine whether the potential data quality issues shown in the "Main Site IM Checks" tab make up a large or small proportion of the reported data. Knowing whether data quality issues highlighted in the "Main Site IM Checks" tab make up a large proportion of the total results can help to prioritize data cleaning efforts. For example, if someone saw 15 cases where a VMMC data quality check was violated (shown on the main tab) and there are only a total 25 site/IMs reporting results for VMMC_CIRC (shown on the "Contextual Site IM Info" tab), then they might make fixing the 15 data quality violations a high priority because 15 cases represents a large proportion of the VMMC results. However, if the "Contextual Site IM Info" tab showed that there were over 1,000 site/IMs reporting VMMC, then fixing the data quality issues might be prioritized differently because the 15 issues represents a relatively small proportion of the VMMC_CIRC results. Ultimately, the prioritization of data cleaning efforts will depend on a number of factors - the "Contextual Site IM Info" tab is just one tool that can help users prioritize data cleaning tasks.
As mentioned above, there are also multiple slicers at the top of the sheet that will allow the user to drill down to specific SNU prioritization levels, PSNUs, implementing mechanisms, etc.
Data Checks Appendix
This tab details all of the checks implemented, regardless of whether the check resulted in violations for a particular site. These checks are grouped by program area and further color coded by type of check. The specific data elements, explanations, and data labels of the variables in the dataset used within each check are also listed next to each check.
Caveats / Limitations around these Data Checks
These data checks have been created to help stakeholders identify potential data quality problems. However, there can be programmatic reasons why data that appears to violate a standard data quality check might not be problematic. Reviewing data and deterring what issues need to be fixed/cleaned often requires deep knowledge of the country context. When SI Advisors, in-country teams, and implementing partners identify sites with data quality issues, teams are encouraged to work together to determine the best plan to correct these problems. This may include updating of data in DATIM. Or, for more complex or systemic issue, it may include documenting that the issues were not possible to correct during that reporting period and identifying action steps that can be taken to improve the problems in future reporting periods.
It is also important to note that these data quality checks only assess some aspects of data quality; other aspects of data quality are not possible to verify with quantitative formulas. Not being “flagged” in these checks does not necessarily mean data are of quality. Instead, it just means that none of the initially prioritized checks have been triggered. For example, it is possible that at the service delivery point or in the reporting flow within a country, errors are introduced but reported results do not trigger any of the prioritized data quality checks. These checks are designed to facilitate and complement in-country and headquarter activities to assess and or improve data quality, not to replace them.
If you experience technical problems using this tool, please submit a ticket to the DATIM Help Desk.