
How to get clean data for your HR reporting


All the talk about people analytics has led to a resurgence of interest in better HR reporting. Unfortunately, most of the advice on how to do reporting reads like this: “Metrics should be based on accurate data, they should deliver clear insights, and you should tell a story around those insights that will drive action.” (Here the consultant pauses for applause, then sends in their invoice.)

The problem is not that these points are wrong; it’s that they are the very issues standing in your way. You don’t have accurate data, it doesn’t deliver clear insights, and there is no story that’s going to drive action.

In this article, we’ll look at the first of these issues: the lack of accurate data.

Lack of Accurate Data

Generating accurate data is expensive. Cleaning up old inaccurate data is even more expensive. If the organization wants that, then they have to pay for it. For example, if you want your recruiters to enter clean data in the applicant tracking system then you need to train them, monitor them, and motivate them…along with any necessary investment in technology and processes. On an ongoing basis, you have to pay for the time they spend entering data when they could be recruiting. You might even pay a certain cost in extra turnover since recruiters don’t like this part of the job. It’s not that getting clean data isn’t worth it, but let’s not pretend doing so is free.

There are practices that make getting clean data much less expensive, such as automating data capture, creating good user interfaces, and building carefully thought out processes—but of course, doing these things takes time and costs money.

Where does this leave us? It means we need to accept that having all our data clean isn’t a realistic business proposition. We need to do three things:

  • For the long run: Begin learning and adopting the practices that allow you to get clean (or at least cleaner) data as part of normal operating procedure. The better you get at this, the less costly it will be to capture data going forward. Implement these practices in different parts of HR depending on how important the data is.

  • In the short run: Make thoughtful use of the data you do have and be creative in digging up additional data as needed. In fact, my workshops on analytics end up focusing a lot on this step (rather than analysis per se), because being creative in getting data is usually a prerequisite to any analytics work. In the short run, it’s also super helpful to know which data is accurate enough to guide a decision and which is not. Perhaps you don’t have very good source-of-hire data, but it may still be good enough to show which sources are no good, and that’s all you need to make the decision to drop them. Another part of being thoughtful comes down to how you communicate to stakeholders. If you indicate where you know the data is weak, and what kind of errors likely exist, then they can feel confident you know what you’re doing, even when they do come across dirty data.

  • Most importantly: Decide what data you really need. Consider this: Is all the information in your personal address book accurate, complete and up-to-date? Probably not. However, the most important parts will be pretty clean. For example, the emails are probably fairly complete, accurate and up-to-date whereas “street address” may be in poor shape. That’s a good thing. It’s a poor business decision to have your contact information fully up to date because it’s too expensive to do so. If what really matters are the emails, then focus on that. When you’ve decided what data is really worth collecting it will narrow the scope of what you need to do to the point that getting clean data is manageable.

What good data collection processes look like

Frankly, I think anyone who simply looks at how a data capture process is being done will figure out why it’s going wrong. You don’t need any special education or tips to do so. But just to reassure you, here are the sorts of things you can do:

  • Use pull-down lists where possible to avoid typos.

  • Implement automatic checks. For example, a Canadian postal code has three letters and three digits in an alternating pattern (such as K1A 0B1), so if someone enters two digits and four letters, the system can flag the problem.

  • Set up standards so that everyone does a calculation (such as turnover) the same way.

  • Audit for errors, then work back to find why they happened: for example, if the DIV codes are wrong your data entry staff may explain “Oh I don’t know what the DIV code is so I always just put in 1.” The solution in this case would be to improve training.
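
To make the automatic-check and audit ideas in the list above concrete, here is a minimal sketch in Python. It is an illustration only, not a feature of any particular HR system: the function names and the 0.5 threshold are assumptions made up for this example. The first function checks that an entry has the letter-digit shape of a Canadian postal code; the second looks for a value that dominates a field, which can hint at a habit like always entering 1 for the DIV code.

```python
import re
from collections import Counter

# Canadian postal codes alternate letter-digit-letter, then digit-letter-digit.
# This checks the shape only, not whether the code is actually in use.
POSTAL_CODE_SHAPE = re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d")


def has_postal_code_shape(value: str) -> bool:
    """Return True if value looks like a Canadian postal code, e.g. 'K1A 0B1'."""
    return POSTAL_CODE_SHAPE.fullmatch(value.strip().upper()) is not None


def overused_values(values, max_share=0.5):
    """Return the values that make up more than max_share of all entries.

    The 0.5 default is an arbitrary starting point (an assumption for this
    sketch), not a recommended threshold.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return [v for v, n in counts.items() if total and n / total > max_share]
```

A flagged value is only a prompt to go and ask the data entry staff what is happening. Some divisions really are much larger than others, so a dominant code is not automatically an error.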

I could multiply these examples, but the point I’m trying to make is that it’s pretty self-evident what the problems are and how to fix them. You can figure this out on your own as long as you are willing to invest the time to do so.
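
As one illustration of the "set up standards" point, a shared calculation can be as simple as a single function that every report calls. The sketch below assumes turnover is defined as separations divided by average headcount over the period. That is one common definition, not necessarily the one your organization should adopt, and the function and parameter names are hypothetical.

```python
def turnover_rate(separations: int, headcount_start: int, headcount_end: int) -> float:
    """Turnover for a period: separations divided by average headcount.

    Assumed definition for this sketch; the point is that everyone uses
    the same one, whichever definition is chosen.
    """
    average_headcount = (headcount_start + headcount_end) / 2
    if average_headcount <= 0:
        raise ValueError("average headcount must be positive")
    return separations / average_headcount


# Example: 12 people left; headcount was 95 at the start and 105 at the end.
rate = turnover_rate(separations=12, headcount_start=95, headcount_end=105)
print(f"{rate:.1%}")
```

Whatever definition you settle on, writing it down once in a shared place matters more than the details of the formula.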

The key lesson

Clean data, everywhere, all the time, isn’t practical for most parts of HR. We can get much better over the years; however, we need to learn to live with what is realistic, and manage, rather than lament, the inevitable shortcomings.

What you should tell leadership


  • We have identified the most important data—that’s what we are focusing on.

  • We are not cleaning up old data. It’s too expensive given how that data would be used, so the ROI is low and we’ve stopped doing it.

  • We have a good handle on the quality of data and we will be clear about that in all our analyses. We can tell you what data is correct, what is incomplete but usable, and what is suspect.

  • If there is a need to do more data clean-up, then we can work up an estimate of what it will cost and how long it will take.

For the future

  • We are building our expertise in implementing processes that create clean data.

  • We are establishing common standards (e.g. for how attrition is calculated).

  • You will find that we are getting better every year. We’re on the right path, and it’s one that doesn’t demand a big increase in our operating budgets.

