The state Department of Revenue (DOR) has a variety of duties. One of them is to confirm that people and organizations with a presence in the state are fulfilling their civic responsibilities by filing, reporting, and paying their taxes correctly.
One way the DOR evaluates compliance is through audits. But the DOR’s traditional audit methods depend on a person or organization having filed taxes at least once in the past.
The DOR needed a way to find “true non-filers”: individuals and organizations who have never filed taxes, and so are invisible to the traditional audit methods.
To find true non-filers, the DOR needed to use other state sources of information to identify people and organizations with a presence in the state. These sources included real property records, driver’s license records, court records, vehicle registration and title records, real estate transfers, and business and practice licenses.
The DOR had access to many state record systems that it could use to try to identify true non-filers. However, these systems were not designed to work together.
For example, each of the state's counties maintained its own property record system—meaning the DOR needed to match records across dozens of systems for real property records alone.
The real property systems were highly variable: essential identifying information, such as names and addresses, was not recorded in a standardized form.
A “name” field, for instance, could contain a person’s full name, multiple people’s full names, or a business name, depending on the system. Names were entered in various formats between and within systems—and even within single records.
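To make this variability concrete, here is a minimal Python sketch of standardizing a single free-form name field. It assumes a hypothetical list of business suffixes (BUSINESS_MARKERS) and that multiple owners are joined by "&", "AND", ";" or "/". The function names are illustrative only, not MIOsoft’s implementation, and real source systems would need many more rules.

```python
import re

# Hypothetical business indicators; a real list would be tuned per source system.
BUSINESS_MARKERS = {"LLC", "INC", "CORP", "CORPORATION", "LTD", "LP", "LLP", "TRUST", "COMPANY"}
# Assumed separators that join several owners within one name field.
OWNER_SPLIT = re.compile(r"\s*(?:&|;|/|\bAND\b)\s*")


def standardize_person(part: str) -> str:
    """Return 'FIRST [MIDDLE] LAST' from either 'LAST, FIRST' or 'FIRST LAST' input."""
    if "," in part:
        last, _, rest = part.partition(",")
        part = f"{rest} {last}"
    # Drop periods and leftover commas, then collapse repeated whitespace.
    return " ".join(re.sub(r"[.,]", " ", part).split())


def parse_name_field(raw: str) -> list[dict]:
    """Split one free-form name field into typed, standardized name entries."""
    text = " ".join(raw.upper().split())
    cleaned = " ".join(re.sub(r"[.,]", " ", text).split())
    # A business marker anywhere in the field: treat the whole field as one organization.
    if set(cleaned.split()) & BUSINESS_MARKERS:
        return [{"kind": "business", "name": cleaned}]
    # Otherwise treat each separated segment as a separate person.
    return [
        {"kind": "person", "name": standardize_person(part)}
        for part in OWNER_SPLIT.split(text)
        if part.strip()
    ]
```

Even this sketch has blind spots: "JOHN AND MARY SMITH" comes out as "JOHN" and "MARY SMITH", the kind of case that calls for a system-specific rule.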
In all, the DOR had 25 sources of data, originating from many more actual systems, from which they needed to discover true non-filers. No common key existed for matching records across all of these systems.
Cross-system matching to identify non-filing individuals and organizations would be essential, but the systems were so disparate that basic matching techniques—like looking for identical names—were unusable.
We built a solution to discover the true identity of individuals and organizations, the relationships between them, and the extent of their presence in the state, using the messy data from the many state systems available to the DOR.
To identify individuals and organizations known to our client, we consolidated 5 years’ worth of tax filings (approximately 15 million records) to create a record of the tax filers.
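A consolidation step like this can be sketched in Python as collapsing several years of filings into one entry per filer. The Filing fields, and the assumption that DOR filings carry a stable taxpayer_id, are hypothetical; the source does not describe the filing schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    taxpayer_id: str  # hypothetical DOR-internal identifier
    name: str
    address: str
    tax_year: int


def consolidate_filers(filings):
    """Collapse multi-year filings into one entry per taxpayer.

    Keeps the most recent filing's name and address and lists every year filed.
    """
    latest = {}
    years = {}
    for f in filings:
        years.setdefault(f.taxpayer_id, set()).add(f.tax_year)
        current = latest.get(f.taxpayer_id)
        if current is None or f.tax_year > current.tax_year:
            latest[f.taxpayer_id] = f
    return {
        tid: {"name": f.name, "address": f.address, "years_filed": sorted(years[tid])}
        for tid, f in latest.items()
    }
```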
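We then identified individuals and organizations with a state presence by matching data across the non-DOR state data sources: real property, driver’s license, court, vehicle registration and title, real estate transfer, and business and practice license records.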
Potential non-filers were individuals and organizations who had a state presence, based on the state data sources, but no associated state tax history.
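Once records have been resolved into entities, this definition reduces to a set test. In this hypothetical Python sketch, each entity maps to the set of source labels that contributed records to it, and "dor_filings" is an assumed label for DOR tax filings.

```python
# Hypothetical label for records that came from DOR tax filings.
DOR_FILINGS = "dor_filings"


def potential_non_filers(entity_sources: dict[str, set[str]]) -> list[str]:
    """Return IDs of entities seen in some state source but never in DOR filings."""
    return sorted(
        entity_id
        for entity_id, sources in entity_sources.items()
        if sources and DOR_FILINGS not in sources
    )
```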
We identified the non-filers by using scenario-based matching, which embeds human judgment about the specific ways that records should and should not be allowed to match.
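MIOsoft’s actual scenarios are not published, so the following Python sketch only illustrates the idea. Each hypothetical scenario names the fields that must agree for two records to match, and separate veto rules encode cases where records must not match even when a scenario passes. The field names (name, address, dob, license_no, kind) and the specific rules are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    name: str
    required: tuple[str, ...]  # fields that must be present and equal in both records


# Hypothetical scenarios: each is one acceptable way for two records to match.
SCENARIOS = [
    Scenario("name+dob", ("name", "dob")),
    Scenario("name+address", ("name", "address")),
    Scenario("license+dob", ("license_no", "dob")),
]


def fields_agree(a: dict, b: dict, field: str) -> bool:
    return a.get(field) is not None and a.get(field) == b.get(field)


def vetoed(a: dict, b: dict) -> bool:
    """Illustrative 'must not match' rules based on contradictory evidence."""
    if a.get("kind") != b.get("kind"):  # a person never matches a business
        return True
    dob_a, dob_b = a.get("dob"), b.get("dob")
    return bool(dob_a and dob_b and dob_a != dob_b)  # known, differing birth dates


def match(a: dict, b: dict) -> Optional[str]:
    """Return the name of the first scenario that matches, or None."""
    if vetoed(a, b):
        return None
    for s in SCENARIOS:
        if all(fields_agree(a, b, f) for f in s.required):
            return s.name
    return None
```

Checking vetoes before scenarios means one strong piece of contrary evidence, such as differing birth dates, overrides any amount of agreement on names and addresses.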
Combined with our expertise in data preparation, cleansing, and standardization, this matching approach let us create the fullest possible picture of each entity represented in the system.
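One common way to turn accepted pairwise matches into whole entities is union-find (disjoint sets), which groups records linked either directly or through a chain of matches. The source does not say which technique MIOsoft used; this is a swapped-in illustration. In this Python sketch, the record IDs, pair list, and field layout are hypothetical, and each profile simply collects every distinct value seen for each field.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_profiles(records: dict, matched_pairs) -> list[dict]:
    """Group matched records into entities and merge their field values.

    records: record_id -> dict of standardized fields (hypothetical layout)
    matched_pairs: (record_id, record_id) pairs accepted by the matcher
    """
    uf = UnionFind()
    for rid in records:
        uf.find(rid)
    for a, b in matched_pairs:
        uf.union(a, b)
    clusters = defaultdict(list)
    for rid in records:
        clusters[uf.find(rid)].append(rid)
    profiles = []
    for members in clusters.values():
        fields = defaultdict(set)
        for rid in members:
            for key, value in records[rid].items():
                if value is not None:
                    fields[key].add(value)
        profiles.append({"record_ids": sorted(members), "fields": dict(fields)})
    return profiles
```

Because grouping is transitive, one bad pairwise match can merge two otherwise separate entities, which is why the rules about how records should not be allowed to match carry so much weight.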
With all entities identified, we ranked the non-filers: the stronger the state presence, and the higher the total asset value, the more highly a non-filer was ranked.
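The source states the ordering principle but not the exact weighting. This Python sketch makes one simple assumption: presence strength is the number of distinct state sources an entity appears in, and ties are broken by total asset value. Both the measure and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NonFiler:
    entity_id: str
    sources: set = field(default_factory=set)  # distinct state sources, e.g. {"property", "vehicle"}
    total_assets: float = 0.0                  # summed asset value (hypothetical)


def rank_non_filers(candidates: list[NonFiler]) -> list[NonFiler]:
    """Order by presence strength first, then by total asset value, both descending."""
    return sorted(candidates, key=lambda c: (len(c.sources), c.total_assets), reverse=True)
```

A lexicographic key like this lets presence dominate; a weighted score would be an equally plausible reading of the ranking rule.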
The DOR’s trained auditors were able to select individuals from MIOsoft’s rankings of non-filers and examine the data and records that made up each individual’s profile.
The auditor could then determine whether the individual was indeed a candidate for auditing. Ultimately, about 70% of MIOsoft’s highly ranked candidates were selected for an audit.
“The system enabled us to discover some of the most difficult to find non-filers. This led to audits that generated real revenue for the state.” — CIO, state Department of Revenue