Both government agencies and private companies keep vast databases containing very sensitive and very personal information about tens or hundreds of millions of subjects. For example, the FBI's National Crime Information Center (NCIC) keeps records on arrests, outstanding warrants, criminal histories, and other data that might be of use in investigating crime. It currently processes an average of 7.5 million transactions each day. 1 When subjects are stopped by the police, their identities are often checked against the NCIC to see if they are currently wanted for a crime, on probation, or considered dangerous. In the private sector, large credit agencies such as TransUnion and Equifax keep computerized credit histories on close to a hundred million people. These are searched hundreds of thousands of times each day by thousands of subscribers, whenever a customer requests credit of any kind, whether to apply for a loan or simply to make a credit card payment. 2
These databases are used to make many critical decisions affecting people's lives. Someone can be arrested and detained or denied a mortgage or the use of a credit card based on the data stored in them. Yet the sheer size of these databases, as well as the procedures used to collect, process and maintain the data in them, ensure that they will contain many inaccuracies. A study done by Kenneth Laudon for the Office of Technology Assessment (OTA) found that only 25.7 percent of the records sent by the FBI's identification division were "complete, accurate and unambiguous." A higher percentage, about 46 percent, of the criminal history records in the NCIC met these standards. When Laudon checked a sample of open arrest warrants on file with the FBI against records in the local courthouses where they originated, he found that over 15 percent of them were invalid, either because there was no record of them at all or because they had already been cleared or vacated. Thus 15 percent of the warrants on record put their subjects at risk of being arrested for no justifiable reason. 3
There are also documented problems with credit databases. For example, according to David Burnham, in 1980 TRW was receiving 350,000 complaints a year from consumers who felt their credit reports were inaccurate, which resulted in 100,000 of the records being changed. And these were only the errors detected by subjects and acknowledged by the company. 4 A 1990 sample of credit reports done by Consumers Union uncovered "major inaccuracies" in 19 percent of them. 5 In 2004 the National Association of State Public Interest Research Groups, in a study of consumer credit reports, estimated that as many as 79 percent of those reports could have some error and 25 percent of the reports could have an error serious enough to lead to the denial of credit. 6 A comprehensive study of the reliability of credit data done by the Federal Reserve Bank, also in 2004, while acknowledging the prevalence of errors in the data, found that the overall impact of the errors on the credit scores of the consumers affected was "modest." However, it also pointed out that the negative effects fell disproportionately on those who were most vulnerable, in that "individuals with relatively low credit history scores or those with thin files are more likely to experience significant effects when a data problem arises." 7
The errors in these massive databases, whether in the public or in the private sector, can be quite damaging. For example, Burnham describes the case of Michael DuCross, who was stopped for a routine traffic violation, when a check with the NCIC showed that he was AWOL from the Marine Corps. DuCross was arrested and held for five months before it was found that he had not been AWOL at all, but had been discharged legitimately. 8 Burnham also tells the story of Lucky Kellener, who, after paying his brother's rent for him, was mistakenly listed in the court papers when his brother was evicted from his apartment. When Lucky went to rent a new apartment for himself, he was turned away by several potential landlords, until he finally found that, because he was named in the eviction notice, he was marked as an undesirable tenant by an investigative service often used by landlords. 9 As another example, a man named Charles Zimmerman was charged 25 percent more than he should have been for medical insurance because a database used by insurers to investigate risk factors mistakenly identified him as being an alcoholic. 10
Inaccurate data can arise from simple data entry errors, from sloppy data collection at the source, or from the misunderstanding or misinterpretation of information, either at its origin or where it is used. Some of this is inevitable, given the volume of data involved; but those who collect and maintain the data contribute to it with their poor or nonexistent auditing and control procedures. A study by the OTA, for example, found that it is rare for federal agencies to audit data quality, and they generally have very low standards for accuracy. 11 When TRW was sued for transmitting an inaccurate credit report, it argued that it had no legal obligation to ensure the accuracy of the information it had received from its sources. 12 On the other hand, once the information has been entered in the database, those who use it seldom question its validity. The attitude is, if it is in the computer it must be right. For example, the company that overcharged Mr. Zimmerman for his health insurance later admitted that it had not conducted its own investigation to verify the data on him in the database, although it was expected to, according to the policies of the bureau maintaining the database. 13
Data can also be erroneous or misleading because it is incomplete. Sometimes the basic facts that are stored in the database may be accurate, but some critical supporting material is left out, either because it was neglected or unknown or because it did not fit into the database design. For example, a credit record may show that a customer has unpaid credit card bills but omit the important qualification that the bills are in dispute and that is why they are unpaid. Based on the material in the database, the customer could be labeled unfairly as a bad credit risk. Data can also be incomplete because more recent facts relevant to the case have not been added. A crime database might show an outstanding warrant for someone without showing that the charges have since been dropped, so that the warrant is no longer valid.
Another source of error is fraud. "Identity theft," in which malefactors collect enough personal information about victims to be able to masquerade as them, has been identified as the fastest growing crime in the United States. 14 Criminals can obtain this information from a variety of sources. 15 The most attractive targets are credit card numbers. Social Security numbers are also useful, because, combined with a name and birthdate, usually easily obtainable, they can be used to open an account or obtain other credentials in someone else's name. Such information can be obtained by breaking into the victim's personal computer; but it is far more productive to break into a corporate database that stores thousands of pieces of information. Or one can purchase them from a third party who has already done the dirty work. 16 So-called "carder" Web sites make this particularly convenient. 17 Sometimes this data is mistakenly made public by its custodian, put on the Web through some error or oversight for anyone who wants to access it. 18 Low-tech solutions work just as well. Credit card numbers, Social Security numbers and other identifying data can often be found on carelessly discarded records ("dumpster diving") or obtained by calling the victim and impersonating a bank or other trusted agent and asking for identification ("phishing"). 19 The thieves then use the victims' data to impersonate them and obtain credit in their names, running up big bills and ruining the credit ratings of their victims. Usually the scam is not detected unless and until the victim is denied credit somewhere and decides to investigate. It takes even longer to correct the record. 20 Sometimes these false identities are used when the impersonator is caught committing a crime, giving the victim an undeserved criminal record.
It is difficult in general to detect errors in databases. Usually the burden of identifying and substantiating the error is left up to the subject, the one to whom the data refers. It is even more difficult to get the database administrator to make the change. Moreover, because of the way data is shared among computers and propagated from one database to another, sometimes correcting the data in the original database is not enough. The bad data persists long after the correction has supposedly been made. As one example of that, Forester and Morrison relate the story of a man who was mistakenly arrested because someone had stolen his wallet, adopted his identity and subsequently committed a crime. Even after the victim was cleared of the charges, he was arrested five more times over the next 14 months. He then received a letter from the local authorities explaining that he was not a suspect in the crime, but still ran into trouble with law enforcement authorities when traveling in other states. It took a long court battle to get all traces of the erroneous record eradicated. 21 In cases such as this, the data seems to take on a life of its own, beyond the control even of those who were originally responsible for it.
Even when data is correct, it can be misinterpreted because it has been removed from its original context. A man, for example, might file charges against his estranged wife as retaliation in a bitter domestic dispute. That would most likely be known by law enforcement officials in their local community. But if she were traveling elsewhere and officials looked up her record, they would only see that she had charges against her. They might treat her very differently in that case. The problem is that once the information is entered in the computer system, it becomes divorced from its source and from the context that gives it its significance. As a result the information can be misused and people mistreated because of it. An example of the danger of taking information out of context occurred in Massachusetts, where an elderly woman had her Medicaid benefits terminated because the balance in her bank account was greater than the maximum assets allowable under Medicaid. However, part of her balance was held in trust for funeral expenses, which by law should not have counted in the calculation of her assets. Yet the origin and purpose of the assets did not appear in the bank records, just the balance; and based on what was in the record she was denied the benefits to which she was entitled. 22
Michael McFarland, S.J., a computer scientist with extensive liberal arts teaching experience and a special interest in the intersection of technology and ethics, served as the 31st president of the College of the Holy Cross.
2. Burnham, pp. 43-45.
3. Kenneth C. Laudon, "Data Quality and Due Process in Large Interorganizational Record Systems," Communications of the ACM, 29(1) (January, 1986): 4-11.
4. Burnham, pp. 44-45.
5. Stuart Silverstein, "Applicants: Past may Haunt You; Worried by Workplace Crime, More Employers are Using Background Checkers. These Firms Comb Court, Injury Records or Keep Databases -- Sometimes Faulty Ones -- Often Without Candidates' Knowledge," The Los Angeles Times (March 7, 1995): A1.
6. Alison Cassady and Edmund Mierzwinski, "Mistakes Do Happen: A Look at Errors in Consumer Credit Reports," National Association of State Public Interest Research Groups, (June 2004), www.uspirg.org.
7. Robert B. Avery, Paul S. Calem, and Glenn B. Canner, "Credit Report Accuracy and Access to Credit," Federal Reserve Bulletin, (Summer 2004), pp. 297-322, at p. 321.
8. Burnham, pp. 33-34.
9. Ibid., pp. 34-35.
10. Gary A. Seidman, "This is Your Life, Mr. Smith...: An Insurance Data Base Knows All. But Who Uses it? What If it is Wrong?" The New York Times, (August 1, 1993): 7.
11. Roger A. Clarke, "Information Technology and Dataveillance," Communications of the ACM, 31(5) (May, 1988): 498-512, p. 506.
12. Burnham, p. 44.
14. Rebecca T. Mercuri, "Scoping Identity Theft: The computer's role in identity theft incidents may have been misgauged through overestimates of reported losses," Communications of the ACM 49(5) (May 2006), pp. 17-21.
15. Adam Cohen, "Internet Insecurity: The Identity Thieves are Out There—and Someone Could Be Spying on You. Why Your Privacy on the Net is at Risk, and What You Can Do," Time, (July 2, 2001), pp. 45-51.
16. Reuters, "Hacker Pleads Guilty in Vast Theft of Card Numbers," The New York Times, (September 12, 2009), http://www.nytimes.com/2009/09/12/
17. Robert McMillan, "Three years undercover with the identity thieves: FBI's Cyber Initiative and Resource Fusion Unit infiltrates online fraud site DarkMarket," PCWorld, (January 20, 2009), http://www.pcworld.com/article/158005/
18. Kevin J. Delaney, "Identity Theft Made Easier," The Wall Street Journal, (March 29, 2005), p. B1.
19. Mindy Fetterman, "Most ID theft takes place offline," USA Today, (January 27, 2005), p. 5B.
20. Tom Forester and Perry Morrison, Computer Ethics: Cautionary Tales and Ethical Dilemmas in Computing, Cambridge, MA: MIT Press (1990), pp. 89-90.
21. Ibid., pp. 90-91.
22. Spiros Simitis, "Reviewing Privacy in an Information Society," University of Pennsylvania Law Review, 135 (1987): 707-746, p. 718.