In N.C. voting data, not everything is what it first seems

In a database with more than 30 million records, like the one that tracks voting participation in North Carolina, we’ve found thousands of records that look suspicious at first glance. To the untrained eye, some of these might seem not just suspicious but like outright voter fraud. But as UNC journalism professor Ryan Thornburg says, when you see anomalies in data you’ve got one of two things: either a good story or bad data. And upon closer examination, it looks like a lot of the oddities in the State Board of Elections and Ethics Enforcement’s voter files are not violations of election law but just dirty data.

Data can be dirty in a lot of different ways, and people who don’t understand the limits of data run the risk of drawing inaccurate conclusions from it. So when we see dirty data, we do what all good reporters should do: check it out. With help from The News & Observer’s David Raynor, I got in touch with Patrick Gannon at the state elections board and discovered that in some cases, what looks like faulty data is actually just a quirk in the system the board uses, or simply human error.

This is the first in a series of posts that will show you the pitfalls we’ve discovered and what you need to do to avoid them. In today’s first installment, we look at what appear in the database to be North Carolinians with amazingly long lives and one voter who seems to have traveled in time. Spoiler alert: neither is what it appears to be.

“Dead” Voters

With 5,340 registered voters aged 110 and older, also known as “supercentenarians,” and just 27 living supercentenarians nationwide, it was immediately clear to us that there was a discrepancy between the voter files and the census. As Gannon explained to us, the board doesn’t really think these people are that old. It just isn’t sure how old they are.

Upon further analysis of records of voters over the age of 115 — the oldest person in the United States is 114, so we knew those must be errors — we found that the vast majority of them were located in Guilford and Forsyth counties.

According to the elections board, though, there’s a simple explanation: most of them are so-called legacy voters, who “registered to vote before dates of birth were required on voter registration forms and have never updated their dates of birth in their voter records” which, for example, would have happened if they had moved from their original address. When the board moved over to a new voter registration system in the late 2000s, legacy voters without dates of birth automatically defaulted to birth dates of Jan. 1, 1900 or Jan. 1, 1901. These two explanations account for about 98 percent of the voters who appear to be more than 110 years old.
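One way to separate these placeholder records from genuinely puzzling ones is to test for the two default dates directly. The sketch below is only an illustration: it assumes a full date of birth is available in a field we call `birth_date` (a hypothetical name), and the sample records and analysis date are made up.

```python
from datetime import date

# The two placeholder birth dates the elections board described.
DEFAULT_BIRTH_DATES = {date(1900, 1, 1), date(1901, 1, 1)}


def is_default_birth_date(birth_date):
    """Return True when a birth date matches one of the system defaults."""
    return birth_date in DEFAULT_BIRTH_DATES


def age_on(birth_date, as_of):
    """Age in whole years on the date `as_of`."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


# Made-up records for illustration only; these are not real voters.
sample = [
    {"voter": "A", "birth_date": date(1900, 1, 1)},
    {"voter": "B", "birth_date": date(1901, 1, 1)},
    {"voter": "C", "birth_date": date(1950, 6, 15)},
]

as_of = date(2017, 11, 1)  # arbitrary analysis date, chosen for the example

flagged = [r["voter"] for r in sample if is_default_birth_date(r["birth_date"])]
ages = {r["voter"]: age_on(r["birth_date"], as_of) for r in sample}

print("Default birth dates:", flagged)
print("Apparent ages:", ages)
```

Records flagged this way show an apparent age well past 110 even though nothing is known about the voter's real age, which is why they are better treated as missing values than as evidence of anything.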

The remaining voters either are still alive, have died but have not yet been removed from the voter rolls, or had their ages typed incorrectly by election workers. If voters die in a state that does not share death records with North Carolina, they could remain on the rolls for up to eight years.

TABLE: Voters Who Appear to Be Older Than The Oldest Living American

County          Voters 'Over 114'
GUILFORD        2,225
FORSYTH         1,486
CUMBERLAND      461
RANDOLPH        267
HENDERSON       183
HALIFAX         155
JOHNSTON        85
SCOTLAND        61
PITT            54
BEAUFORT        31
RICHMOND        24
SAMPSON         13
GRANVILLE       13
ALEXANDER       9
WAYNE           9
EDGECOMBE       8
ROBESON         7
NORTHAMPTON     7
ORANGE          7
BRUNSWICK       7
YADKIN          6
VANCE           6
MCDOWELL        5
MARTIN          5
CHEROKEE        5
GATES           4
ALAMANCE        4
CHATHAM         4
BERTIE          4
HERTFORD        4
MONTGOMERY      3
ROWAN           3
HARNETT         3
NASH            3
COLUMBUS        3
DUPLIN          2
FRANKLIN        2
CURRITUCK       2
MECKLENBURG     2
NEW HANOVER     2
PERSON          2
CARTERET        2
BUNCOMBE        2
BLADEN          2
WILSON          2
WATAUGA         1
CABARRUS        1
BURKE           1
ANSON           1
JACKSON         1
DURHAM          1
DAVIDSON        1
STANLY          1
MITCHELL        1
ONSLOW          1
CASWELL         1
PASQUOTANK      1
DAVIE           1
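As a quick check on how concentrated these records are, the counts in the table above can be totaled and the share held by the top two counties computed. This is just arithmetic on the published figures, not a new query against the voter file.

```python
# Counts copied from the table above, in the same order (Guilford first).
counts = [
    2225, 1486, 461, 267, 183, 155, 85, 61, 54, 31, 24, 13, 13, 9, 9, 8,
    7, 7, 7, 7, 6, 6, 5, 5, 5, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
]

total = sum(counts)              # all counties listed in the table
top_two = counts[0] + counts[1]  # Guilford + Forsyth
share = top_two / total

print(f"{top_two:,} of {total:,} voters ({share:.1%}) are in Guilford and Forsyth")
```

By this count, Guilford and Forsyth together hold a bit more than 70 percent of the records in the table.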

Whether the state has a lot or a few voters who appear to be older than logic would suggest depends on your perspective. Gannon pointed out that these pseudo-supercentenarians account for just seven one-hundredths of one percent of North Carolina’s registered voters, and that in 2012 there were nearly four times as many voters on the rolls who had the default Jan. 1 birthdates.

SQL Query for Finding ‘Supercentenarian’ Voters:

SELECT birth_age FROM voter_ncvoter
WHERE birth_age >= 110
AND status_cd IN ('A','I')
ORDER BY birth_age;
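The query above returns one row per voter. To get county totals like those in the table, the same filter can be grouped by county. The sketch below runs that idea against a tiny in-memory SQLite table so it can be tried without the full voter file. The `county_desc` column name is our assumption for this example, the rows are made up, and the age cutoff is set to match the table's "over 114" threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE voter_ncvoter ("
    "county_desc TEXT, birth_age INTEGER, status_cd TEXT)"
)

# Made-up rows for illustration only; these are not real voters.
conn.executemany(
    "INSERT INTO voter_ncvoter VALUES (?, ?, ?)",
    [
        ("GUILFORD", 117, "A"),
        ("GUILFORD", 116, "I"),
        ("FORSYTH", 117, "A"),
        ("WAKE", 45, "A"),      # ordinary age, filtered out
        ("DURHAM", 118, "R"),   # status outside ('A','I'), filtered out
    ],
)

# Same status filter as the query above, grouped to one row per county.
query = """
    SELECT county_desc, COUNT(*) AS voters
    FROM voter_ncvoter
    WHERE birth_age > 114
      AND status_cd IN ('A', 'I')
    GROUP BY county_desc
    ORDER BY voters DESC, county_desc
"""
rows = conn.execute(query).fetchall()

for county, voters in rows:
    print(county, voters)
```

Sorting by the count in descending order puts the counties with the most placeholder records at the top, which is how the concentration in a couple of counties becomes obvious.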

BONUS: A Time-Traveling Voter?

We also discovered a voter who appears in the state’s database to have voted in the 2018 election, which had not yet been held. The elections board said that didn’t actually happen.

It was essentially a typo on the part of an election worker, according to Gannon. He said the board is working to get the voter’s history corrected.
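A record like this can be caught mechanically by comparing each election date in a voter's history with the date the data was pulled. The sketch below shows the idea. The field names, the sample rows and the snapshot date are all hypothetical, not taken from the board's files.

```python
from datetime import date


def future_votes(history, snapshot_date):
    """Return history rows whose election date falls after the snapshot date."""
    return [row for row in history if row["election_date"] > snapshot_date]


# Made-up vote-history rows; field names are hypothetical.
history = [
    {"voter": "A", "election_date": date(2016, 11, 8)},
    {"voter": "B", "election_date": date(2018, 11, 6)},
]

snapshot_date = date(2017, 10, 1)  # assumed date the data was pulled

suspects = future_votes(history, snapshot_date)
for row in suspects:
    print(row["voter"], row["election_date"].isoformat())
```

Anything this check returns is a data-entry problem to ask the elections board about, not proof that someone voted early.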

Send Us Your Questions

What would you like to know about voting or voter registration? Send your questions to us either via email to ncvotes@reesenewslab.org or on Twitter to @nc_votes with the hashtag #askncvotes.

Coming up in our series, we’ll address these known issues that you need to understand before attempting to analyze North Carolina’s voting and election records:

  • Voters with duplicate registrations or who appear to have voted twice
  • Changes in precinct names and shapes from election to election
  • The reasons you can’t just use the current voter registration file on the board’s website to analyze participation in older elections
  • The different ways that absentee and one-stop votes appear in different counties and different years
  • The problem of misspelled candidate names, the kinds of errors they can introduce into the analysis and how to avoid them
  • The reason you can’t simply take the person who received the most votes in an election and always call that person the “winner”
  • The correct way to analyze results of elections across counties, like this year’s Rhodhiss malt beverage vote and elections in the city of Durham every year
  • The problems we had just obtaining the correct names of candidates who would be on the ballot in this year’s municipal elections. For the first several weeks after the end of the candidate filing period, counties sometimes listed different candidates than the state elections board did
  • All the different ways that “write-in” candidates appear in the database: sometimes they are votes for one candidate and sometimes they are votes for multiple candidates
  • In the voter registration files, what’s the difference between mailing address, voting address and municipality? Why are they sometimes different, and under what circumstances should you use each?
  • What kind of data is missing from the voter registration and results files? Are there places where we have NULL values? And how do we deal with those?
  • A look at the percentage of people who list their gender, race, etc., as “other”: what different things might this mean, and how do we deal with it?
