The Ethics and Politics of Search Engines
This panel was co-sponsored by the Santa Clara University Markkula Center for Applied Ethics and the Santa Clara University Center for Science, Technology, and Society on February 27, 2006.
Participants included:
- Peter Norvig, director of research, Google
- Terry Winograd, professor of computer science, Stanford University
- Geoffrey Bowker, executive director, Center for Science, Technology, and Society
- Moderator: Kirk O. Hanson, executive director, Markkula Center for Applied Ethics
- Introduced by Susan Leigh Star, Senior Scholar, Center for Science, Technology, and Society
MS. LEIGH STAR: Welcome. My name is Leigh Star, and I'm from the STS Center and the Department of Computer Engineering, and also president of the Society for the Social Study of Science. This evening's presentation will look at all sorts of patterns in information retrieval created by search engines, including who understands them and who owns them.
Last fall, we sponsored an event examining the politics and ethics of video gaming, held in conjunction with the Tech Museum's "Game On" exhibit. We'll have one more event this year in this series on the ethics and social aspects of information. On May 16, we'll have a discussion of the Social Order of Databases: what we keep, what we throw away, and how.
In all of these talks, we have incorporated the axiom that even the most humble of information tools involves social organization, design, use, and politics. Who knew 10 years ago that "to Google" would become a transitive verb? I always thought it was just a big number. Information and its uses pervade all sorts of spaces once taken up exclusively by books or by people, or by other forms of action such as picking up the telephone and asking somebody for an answer, or even asking face to face.
At the same time, it has prompted new forms of communication and intimacy, and all the sorts of behaviors that extend and support these new ways of knowing.
MR. KIRK HANSON: Thank you very much, Leigh. My name is Kirk Hanson; I'm a university professor in the field of organizations and society here at Santa Clara and also executive director of the Markkula Center for Applied Ethics, which is one of the two co-sponsors for this meeting.
My role is to be the chair for this evening's discussion. We feel delighted that the topic of Search Engine Ethics and Politics is even more au courant than we anticipated when we set the title and the format and such for this session. Perhaps what I'd better do first is to give you a sense of our initial thoughts about the breadth of issues that might be covered tonight, because I think it's important for us to give ourselves the freedom of ranging over these.
It could be as broad as this: are there ethical decisions in terms of what comes up first when one does a search? What is truth if truth is represented by page one of a search? What about advertising and paid placement on search engines? Are there ethical and social dimensions to that? What about transparency regarding it?
What about self-censorship? What about decisions not to display as many porn sites that perhaps used to come up when one put almost any innocuous term into a search engine? What about the storage of search results and search inquiries; the issue which Google is confronting with the United States government currently?
What about preserving those inquiries, putting them into databases, and offering products based upon what searches are actually conducted? What about questions of intellectual property? Google was involved in a lawsuit, decided last week, over the presentation of photos through image searches, in which there was a conflict over intellectual property.
And what about libraries and the intellectual property in libraries and their presentation through search engines? What about relations with those Web masters who are trying to outsmart the search engines? Whether they try to find out the algorithms that Google and others use, when are they scamming the search engines, and when can you kick the so and so's off your search results because they have violated some kind of ethical norm of what's appropriate behavior and what's not?
And then finally, there is the government issue: the screening of sites and the screening of search results. A representative of the government of China said last week that China is doing nothing different from what every government in the world does. Is that true? What is the range of practices among governments, and what is the range of responsibilities that search engines have in that?
So that's just to suggest some of the breadth of issues that we may want to deal with this evening. We are very, very pleased to have with us a panel that has an extraordinary background to deal with these and many other questions. Peter Norvig will be our first speaker. Peter has been with Google since 2001 as director of machine learning, search quality, and research. He's a fellow of the American Association for Artificial Intelligence and coauthor of Artificial Intelligence: A Modern Approach, which is a leading textbook in the field.
Prior to coming to Google, Peter taught (as a professor at USC and a research faculty member at Berkeley), then served in a number of companies and was head of the 200-person Computational Sciences Division at NASA's Ames Research Center. So his experience in the field of artificial intelligence is deep and broad.
The second speaker will be Terry Winograd. Terry is a professor of computer science at Stanford University. Terry is well known for his work in natural language. His focus is on human-computer interaction and the design of that interaction, covering both its theoretical background and its conceptual models. He directs teaching programs in a variety of related areas. He was one of the founding members and is past president of Computer Professionals for Social Responsibility.
Our third speaker will be Geoff Bowker. Geoff is the executive director of the Center for Science, Technology, and Society, our co-hosts for this evening. He's the Regis and Diane McKenna Professor at Santa Clara University. Previously, he was professor and chair of the Department of Communication at U.C. San Diego. He holds a Ph.D. in the history and philosophy of science, and he has a great interest in the design of information systems and the social and ethical questions that arise in that design.
We're delighted to have the three of you here. Let me simply turn it over to Peter to begin our discussion.
MR. PETER NORVIG: Thank you Kirk. It's great to be here. So Kirk mentioned a wide range of topics. I don't think I can cover them all, but hopefully between our speakers and your questions, we'll get to all of them eventually.
So first I want to start by showing you Google's mission, which is to organize the world's information and make it universally accessible and useful. I think there are ethical implications to that, and we feel that we have a responsibility. We're providing the service, and we recognize that others are providing similar services as well. But we really see it as a mission: something that's our duty to do, and to do as well as we can.
That's the mission statement. On top of that is our informal motto: "don't be evil." This was coined by Paul Buchheit, one of our engineers, I think in 2001 or 2002. Having a motto like that is something like painting a big bull's-eye on your chest.
But we welcome it, because it's something that as a company we want to live up to, and we want to be held accountable. A while ago, Bill Gates got into a little bit of trouble because he meant to say that Microsoft was the opposite of Google in some respect, but what he said was, "We're the opposite of Google's motto," and I don't think that was quite what he intended.
In any company, there are a lot of stakeholders that you serve, and trying to do the right thing depends on who you're doing it for. The ones we identify are, first, the end users: all of you when you type a query into Google. There are our stockholders, to whom we're now accountable. There are the content providers, the Web masters who make all this material available, and we're in a sort of symbiotic relationship with them where we help them and they help us.
There are the advertisers that eventually pay the bill and there are the employees of the company. So how do you balance all that?
What we've chosen to do is to say that our number one focus is always going to be on the end users, and everything else is going to follow from that. So to the extent to which we make money and do good for the company and therefore help the stockholders, it's all by serving the users first. And I think that has implications for how we run the company, the decisions we make, and the ethical considerations.
So now let's look at a couple of the issues. One that Kirk mentioned was the idea of having impartial results. We've taken a strict line of saying that there's a separation between the editorial content and the advertising content. Just as a newspaper has an editorial staff and an advertising staff, we make that distinction, and we have a sort of firewall between them that we don't cross.
So one of the implications of that is we've decided to have no kind of paid placement. You can't buy your way into the search results. You can buy an ad and be shown on the side of the page, but you can't be shown in the regular search results. Nothing you can do can influence that.
The other part of it is that we've decided to have our search results be entirely algorithmically determined. In the end, of course, there are humans involved, because there's a set of us programmers who are trying to come up with the best results. But we write our algorithms the best way we can, we debug and evaluate them, and then the algorithm's word is final.
Some of the other companies do that and then on top of that, they do some hand editing. So maybe they write their algorithms and then they run it and they say, "Well, what's the number one result for the query 'weather'?" And maybe it's weather.com and maybe it's Weather Underground. And then they say, "Well, I really think this one should be number one. Let's just push that up to be the first result."
We've decided not to take that route because we think it's a slippery slope: if you do it in one case, then somebody else is going to complain and say, "Why aren't I the number one result for this other query?" We think it's simpler just to say that everything is going to be done algorithmically. There's no hand editing of the results, and so no human bias can creep in.
Another important consideration is legality. Vinton Cerf has said that part of "don't be evil" is "don't be illegal." And that's mostly true. Sometimes, to not be evil, you do have to break the law, but most of the time you want to follow the law.
One of the things we have to follow in the United States is the Digital Millennium Copyright Act, under which copyright holders can make a claim saying, "Another Website is showing material that I hold the copyright to, and I don't want them to do that." There's a formal process by which they can file these letters of complaint.
So to obey the law, we have to go along with that process. But what we've chosen to do is be transparent about it. When you do a search that would have shown one of the pages removed because of one of these letters, we say so at the bottom of the page and point you to a site, chillingeffects.org, that lists these complaints. So you'll see that notice at the end of a search page.
Here I've shown you what happens if you go directly to Chilling Effects and search for Google: you see a list of the notices that have come in. If you look at them, about one comes in every day or two asking for some pages to be removed. So we want to follow the law when someone says, "I'm the copyright holder of this material, therefore you can't show it from somebody else's site," but we also want to be transparent about it, so that it's clear to our users when we've removed any results. We do that for these kinds of notices, we do that in China, and we do that throughout the world.
Another important issue is the privacy of our users' information. We're a participant in the U.S.-European Union Safe Harbor privacy framework. In Europe there are very strict privacy laws, and there are different laws throughout the world. But the European Union has set up this safe harbor mechanism whereby if you agree to abide by certain standards, you can be certified as meeting their level of privacy protection.
These are the principles it's based on, and they're ones that we uphold. Notice: we let you know what we're going to do with your data. Choice: we give you the choice to opt out if you don't want us to collect that data. Onward transfer: we have to let you know if we're going to release your private data to another company.
We've taken an even stronger policy, which is that we're never going to do that. We'll never release private data, but we may release aggregate data that's anonymized. For example, we release the 10 most popular queries for the month. That's something that millions of people are contributing to, not just one, so we feel it doesn't compromise your privacy.
Security: we have to make sure nobody can steal the data. Data integrity and access: we have to give you access to the data so you can contest it. That seems to be not so important for a search engine. Nobody's saying, "Well, gee, I want you to strike out the fact that I made this query at this particular time." That's more important for things like credit-card companies, where you ask, "Is this a true fact on my record?"
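To make the distinction between private and aggregate data concrete, here is a minimal Python sketch of publishing only a "top queries" list from a search log. It assumes a hypothetical log of `(user_id, query)` pairs; the function name `top_queries`, the choice to measure popularity by distinct users, and the `min_users` threshold (which drops any query that only one or a few people issued) are illustrative assumptions, not a description of how Google actually produces its monthly list.

```python
from collections import Counter


def top_queries(log, n=10, min_users=2):
    """Return the n most popular queries from a log of (user_id, query) pairs.

    Hypothetical sketch: popularity is measured by the number of distinct
    users who issued a query. User identifiers are used only for counting
    and never appear in the output. Queries issued by fewer than
    `min_users` distinct users are dropped, so no published entry
    reflects a single person's searches.
    """
    users_per_query = {}
    for user_id, query in log:
        normalized = query.strip().lower()
        users_per_query.setdefault(normalized, set()).add(user_id)

    counts = Counter({
        query: len(users)
        for query, users in users_per_query.items()
        if len(users) >= min_users
    })
    # Only the query strings are released, never who searched for them.
    return [query for query, _ in counts.most_common(n)]


log = [
    ("a", "Bird flu"), ("b", "bird flu"), ("c", "bird flu"),
    ("a", "weather"), ("b", "weather"),
    ("c", "my diary entry"),
]
print(top_queries(log))
```

With the default threshold, the query that only one user issued is left out of the published list entirely.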
And then we have to have a means of enforcement whereby we agree to arbitration standards if people have complaints. So we've done all of that.
Another thing that I think has an ethical implication is the ability to help the world by enabling new content. Certainly we feel we're helping the world by providing access to information, by allowing you all to do searches and get results back. But we think we're also helping people create new content, which makes the Web more powerful.
One way is that we're driving traffic to sites, so we're helping Web masters become better known. Another is that we're allowing businesses to prosper online by taking out ads and getting traffic that way. And the third is a slightly different program, in which we allow content providers to show our Google ads on their sites. That gives them another way to support providing information.
And there are lots of good examples. I was reading a letter that was really touching, from a guy who said, "You know, I used to have a job, but in my spare time, I worked on this fishing Website. And my boss would always tell me, 'You're wasting your time. Why don't you come back and work harder and not waste your time on the Website?' And then I started showing Google ads on my fishing Website. And after a month or two, I figured out I was making enough money that I quit my job, and now I spend all my time either going out fishing or working on the Website and making it better."
So that was maybe a bad result for his boss, but for him, he gets to do the thing that he loves and for other fishermen, they get better information because he's able to spend more time making his Website and making it be a good thing. And that's the kind of thing we want to be able to enable.
We also think it's important to work on philanthropy. We've taken a sort of venture philanthropy approach, where we want to attack philanthropy as a business problem and do the best we can on it. So from google.org we are funding a number of businesses, and you can read there about the various things they're doing. And just this week we hired a director for the organization, Dr. Larry Brilliant, who was one of the eradicators of smallpox and has done a number of things throughout his life.
He was just awarded the TED Prize, which comes with a wish. What he wished for was an early-warning health alarm: a way to detect a pandemic as it's breaking out. Coincidentally, that's something we've been working on within Google, a way of detecting that kind of information. There's lots of information you can put together, and we have the computing power to be able to do that.
And one of the things that Dr. Brilliant said is that SARS was the pandemic that did not occur. And part of the reason it did not occur is because of the information technology that was available. So when it first started to break out in Asia, instead of this information being repressed, the doctors were able to communicate with each other and to start to shut it down. And I think that's a tribute to the power of modern communication, both Internet and cell phones and other types of communication that stopped that from happening.
So speaking of Asia, let's talk about China. When I agreed to do this talk, it was before this China stuff came up and I thought the room was going to be more empty than it is. But perhaps this is one of the reasons why some of you are here.
Now I think some of this coverage of Google's and other companies' stance with respect to China has been confused, because there are really two issues. And sometimes I think the press coverage has not differentiated them properly. So the two issues are the protection of users and censorship. And they are quite different things.
So first, the protection of users. Before we agreed to go into China with our Google.cn domain, we thought it was very important to have a policy to keep our users out of jail. Again, our users are number one, and having them in jail, we thought, was not a good thing.
So how did we do that? Well, a couple of things. One is that, with the current government the way it is, we're not going to offer any services in China that have user-identified accounts. So there'll be no Gmail service, no Blogger service, nothing where you have an account that identifies you as a person, because we thought that was just too dangerous. We didn't want to be in the position of having to hand over those kinds of records to the government, and we thought the only way we could handle that was not to have those records in the first place.
The second aspect is the search logs. Even without user accounts, our search logs, while they don't contain personally identifiable information, do have an IP address that is potentially identifiable with an individual, or maybe with a small set of individuals who share an IP address.
So the decision we made there is that we're not going to keep any search logs in China. We're going to ship them over to the U.S. If the Chinese government wants to get them, there's a process they have to go through that goes to the U.S. State Department; the State Department has to agree to it, and then they have to come to us. We have to agree to it, and if everyone agrees, then it probably is the right thing to do to release that information.
If anybody in that chain breaks the chain, then the information does not get sent back. And we recognize this is a hard choice to decide if this is the right way to go. But we thought this offered sufficient protection to our users.
We also support the U.N. Universal Declaration of Human Rights, the Global Sullivan Principles, and so on. In fact, we'd like to see stronger principles, and we'd like people to get together to develop them.
The second part of the issue is censorship, which, as I said, I think is sometimes confused with the first. Part of this confusion is that I think some people don't recognize that censorship is, in a way, unavoidable. This is a little diagram from the Washington Post that shows how the Internet works in China. If you're on the Web in China and you type in a URL, it first goes through your ISP, and the government has a back door into the ISP. It will immediately eliminate certain addresses, and it will also look at the contents of the pages that come back and eliminate certain pages if they have content that the government deems inappropriate.
So no matter what you do, censorship is there. We've chosen to have our site within China, and in order to do that, we have to agree to this level of censorship. Even if we didn't agree to it, it wouldn't make much difference, because it would still be censored. But what we have chosen to do, which as far as I know nobody else is doing, is add a level of transparency.
So when we do censor results, when we remove results from a page, just as we do everywhere else in the world, we say, "Some of your results have been removed." So at least the user knows what's going on.
We feel that the U.S. government can stand up and make stronger laws and we feel that corporate America can get together and have stronger principles as well. And we're supporting efforts on both those fronts. We feel it's difficult for us to do it on our own.
So we announced this Google.cn site this month, and in some parts of the press we were hammered for it. But another way of looking at our announcing it this month is to say that from 1998 up until this month, we resisted opening a Google.cn because we were worried about these issues.
And we didn't see a lot of press coverage saying how courageous Google was for resisting all those years. I think that shows that, as popular as we'd like to think we are, we really don't have that much power by ourselves to change the world in that way. We'd like to get together with governments and with other corporations in order to have more power in the aggregate.
Another thing we support is the OpenNet Initiative, which adds some more transparency. Here is an example: they have a page where you can compare the results on google.com in the Chinese-language interface, which is what you see around the world, with Google.cn, which is what you see in China.
Another thing we've done for transparency is that we don't geotarget. Some other sites, some company.cn, will give you different results depending on where you come from: whether your IP address is inside China or outside it. We give the same results no matter where you come from, so that as an outside observer, you can see what a user in China would see.
On the left is Google.cn; on the right is Google.com. They had a drop-down menu of queries, and I just chose one, which was Tibet. I was actually surprised that the results are pretty much the same. I guess we don't quite have local search and images up in China yet, so those don't show up on the left. But the first result here is Tibet Online, which "provides information about the plight of Tibet and serves as a virtual community to end the suffering of the Tibetan people by returning the right of self-determination."
So that gets through the censors and I was a little bit surprised at that. I thought that would probably be blocked. But it sort of shows that there is a lot of information that's getting through.
Here's another example, a search for bird flu. The results are the same, except that there's an extra ad on Google.com compared to Google.cn. If you searched with Chinese characters rather than English, there would probably be more of a difference. And there certainly is a difference: there are only 52 million results for bird flu on Google.cn versus 94 million on Google.com. But the first 10 are exactly the same.
As we've gone out, we've talked to a lot of people about this: a lot of our own Chinese engineers and other employees, a lot of dissidents, and so on. Most of the message we got back is that what's important for the Chinese end user is access to information. Sure, they want to know about democracy and Falun Gong and so on, but really they want their day-to-day information. They want to know about things like outbreaks of bird flu.
And so we're giving them that and we think that's the most important. We'd like to be able to give them all the information, but we just can't. And we thought it was more important to give them this information that they can use, even though we have to offer a compromised service.
This is a quote from Baidu.com, which is a Chinese search company. On their page, they talk about all the little things that they do differently. And I think this is actually very accurate: if you want to get the best results, it's very important to be on the ground in a country and be able to do all the little things.
So not only do you have to be able to speak Chinese, and we certainly have plenty of engineers here in Mountain View that speak Chinese, but you also have to know who's the Chinese pop star and what's the equivalent of "American Idol" in China in order to get all those queries right. Because let's face it, that's what most of the queries are about.
Some of the people will want to query about democracy, but most of them just want to know about their pop stars. And in order to get that right, we have to be able to hire people in China who know the culture as well as knowing the language. And so to serve our users better, we felt like we had to be there on the ground.
Liu Xiaobo, a jailed Chinese activist, says, "The Internet is God's present to China. It provides the best tool for the Chinese people in their project to cast off slavery and strive for freedom." We think this is really right. This is the message that hit me as I went out and talked to various Chinese people: certainly censorship is bad, and you don't want to collaborate with it, but given no choice, access to information is important and will change the world for these people.
And finally, here's Nicholas Kristof of the New York Times comparing the various companies and their actions. I think this was a very good article because it really differentiated between these two aspects, protecting the user versus censorship. His conclusion was, "Google strikes me as innocent of wrongdoing," and that by providing both the Google.com site and Google.cn, we're giving Chinese users more choices rather than fewer.
So it's a hard call to make, but we think this is the right one, and we'll continue to review it as we go forward.
MR. HANSON: Thank you, Peter. Terry Winograd.
MR. TERRY WINOGRAD: I apologize: on my slide here, I didn't list both sponsors. So it's the Center for Science, Technology, and Society and the Markkula Center. Thank you.
In the interest of full disclosure, I just want to start off by saying that my main affiliation, where I actually spend all my time is at Stanford, but I have had and do continue to have connections to Google. So this is not a completely independent view, although I'm coming at it from a somewhat different angle.
Larry Page was my graduate student; he never finished his degree, unfortunately. The Google project actually started as part of the Digital Libraries project, on which I was one of the principal investigators. Over the years, I've been on the technical advisory board, and I took a sabbatical there. I hang out a bit, but I have not been involved in any of the active engineering or decision-making for quite a while. So I have this sort of funny role as an insider-outsider.
I know a lot of the people; I've known Peter for many years. I've known many of the people there. But I have a separate stance really from my position as a university professor, somebody who's looking at these questions from an intellectual and academic point of view and trying to see how they then get reflected.
But I will admit that my exposure to and knowledge of what search engine companies do is strongly influenced by the fact that that's what I've seen.
So the question that the organizers asked us is: what are the ethical challenges? And there's a very nice handout that you've gotten that talks about this from a philosophical ethical perspective. And this is sort of a complementary cut at it; it's not really from a philosophical perspective, but more of a common sense perspective. Where do the ethical questions come from? What are the things that you might want to worry about if you're thinking about search engines or if you're designing one or if you own a company or part of a company that does one?
So the first, which Peter has talked about some, is this question of what's fair. It's important to recognize that search is inherently a socially conditioned process. It's not an objective process over an objectively given body of material. It is inherently an editorial activity, in how you go about finding the things you're going to search and how you organize them. And there is a ranking process that is inherently adversarial. What I mean is that if somebody comes first, somebody else doesn't.
So you're in a social setting where you have people, search engine optimizers as they're called, who are working very hard to see that their information gets put before somebody else's. So the search engine provider becomes an arbiter in some sense between these competing entities.
And you have a problem that is much more recent, because things like Google weren't around before: as you concentrate (and of course we've seen this much earlier in the mass media), there is a narrowing of alternatives, a narrowing of views, because there are just one or a few places that you go to see things.
So there are ethical questions that come up in all communications and mass media [inaudible], which are part of the scene for search engines as well.
So let's talk about bias first. What does it mean for a search result to be biased? One obvious thing that people raise, and Peter talked about this, is that if I can buy a place in your search results because I give you more money, then obviously your search is biased toward money.
And of course that's been true of advertising for years. Why do I get a big chunk of the New York Times? Not because I'm more important; I'm talking about the advertising chunk. But it's known to be advertising; it's very carefully marked. And as Peter pointed out, Google has made a very strong effort to say, "If you're going to buy space or position, then that will be made visible, so that the consumer, the user, knows what it is."
I think that to some extent this is a problem that has been dealt with very well today by a number of companies, and Google has been particularly good about it.
There's another kind of bias, which is that you have a point of view and you express it. And that's not always bad. Looking around on the Web, you'll find things like the Christian search engine. They put their bias right up front: it says at the top, "Agape, Christian Search." Now, there may be multiple versions of what's Christian, which is another, more subtle issue, but you know that you're going to find certain kinds of things here and not other kinds. And that's why you choose to go to this site.
So it's fine for a site to be biased in the sense of being selective, as long as that's part of who they are: part of their identity, part of what you know is going on.
There is censorship, and we've just heard quite a bit about the whole issue of censorship in China. I want to come back a little later to talk about why that is such a big issue for some people. But I think it's an obvious point that you will get bias if you're explicitly excluding things. And again, the question is sometimes: do you make that visible? Do people know that that's being excluded? In which case, they may not like what they're getting, but at least they aren't being fooled.
There is a lot of discussion as to whether that's true in China. Whether the average Chinese user really knows that the Internet they're seeing isn't the whole Internet, or they just think it is the whole Internet and the other stuff isn't there.
But I want to talk a little more about this notion of technology bias. This is a quote from Eric Schmidt, who's the CEO of Google. I was looking through my notes to try to find where it was from; it was an interview, and I've forgotten the exact date. He was being pressed about bias in Google News and he said, "This is not really a newspaper. These are computers." You can read it up there, so I'll skip through, but: "They're assembling it using algorithms. I can assure you it has no bias. These are just computers. They're boring. I'm sorry, you just don't get it."
He was a little annoyed with the question. And Peter made that same point about algorithmic things. But I think it's very important to take a step back and say, "What is the algorithm running on?" And, "What biases are built into an assumption about how it works?"
So take Google News, for example. If I decide to publish the Santa Clara Radical News and put it on the Web, will my articles get into Google News? Well, it turns out that they have a particular set of news sources that they consider. They don't just take anything they find on the Web, which is different from the regular search. So there's a bias, even though they try very hard to be balanced: you'll find Al Jazeera as well as right-wing American papers and so on. There is a bias inherent in the fact that you're selecting sources, and even with things that you don't select that way, there's a bias inherent in the technology.
One of Google's main advances in search engines was PageRank technology, the details of which have been covered in many other talks and aren't important here. It does a calculation based on the structure of links, so that pages with lots of links to them from other pages that themselves have lots of links get moved up in the ranking, and other pages therefore move down.
Now it's only one of the parts of the ranking. So it's not like that's the magic algorithm that does everything. But if you think about what it does, it does have a concentrating power. It says those people who get more attention get more attention because they got more attention. And you could make an argument that that's not necessarily the best for diversity.
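Winograd's description of PageRank, and his point about its concentrating power, can be made concrete with the textbook power-iteration formulation from the original PageRank publication. The Python below is a minimal sketch under stated assumptions, not Google's ranking code: the damping factor of 0.85, the treatment of pages with no outbound links, and the five-page link graph are all illustrative choices. In the toy graph, "fan1" and "niche" each receive exactly one link, yet fan1 ends up scored several times higher because its single link comes from the heavily linked "popular" page, which is the "more attention because they got more attention" effect.

```python
"""Toy power-iteration PageRank (textbook formulation, not Google's production ranking)."""


def pagerank(links, damping=0.85, iterations=50):
    """Return {page: score} for a graph given as {page: [pages it links to]}.

    damping=0.85 is the commonly cited textbook value, assumed here for illustration.
    """
    # Collect every page that appears either as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Each page first receives the "random jump" share of (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score evenly among the pages it links to,
                # so a link from a high-scoring page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical five-page web: "fan1" and "niche" each have exactly one inbound link,
# but fan1's link comes from the heavily linked "popular" page.
EXAMPLE_LINKS = {
    "popular": ["fan1"],
    "fan1": ["popular"],
    "fan2": ["popular"],
    "fan3": ["popular", "niche"],
    "niche": ["fan3"],
}

ranks = pagerank(EXAMPLE_LINKS)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:8s} {score:.3f}")
```

Running it prints the pages from highest to lowest score. As Winograd notes, a real engine treats a score like this as only one signal among many.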
I had a student who did an honors thesis in science, technology, and society at Stanford, a very astute analysis of search engines, and this is from his bachelor's thesis: "While the market mechanism is intended to most fully satisfy the pre-existing preferences of consumers," that is, what gets liked gets seen more and therefore gets liked more.
"The deliberative ideal," so here we're talking about issues of democracy, about how people find and debate information in the world, "requires that individuals be also exposed to material that is contrary to those predispositions. What's good for consumers in other words, is not always what's good for citizens."
So this is the kind of ethical question that you then have to balance. What serves the public benefit in terms of a diversity of views, which may or may not show up under a particular algorithm? That algorithm may be perfectly neutral in the sense that it's not designed to favor right wing versus left wing, or Chinese versus Tibetan, or whatever it is. But implicitly, just because of the way the calculations are done, it has this kind of bias, let's say for things that are better known over things that aren't. You could argue that that's better for most users as well, but it is an ethical question that you're dealing with.
I'm just going to go through a series of questions here and we can talk more about them in the discussions. Because to put it in simple terms, I have more questions than answers. I don't know what the right answers are, and I think that all of the search engine companies in their own ways are struggling with these as they go.
So what does it mean to own information? You heard about the case Google recently lost, brought by somebody who felt their pornographic images were being unfairly put up in the search results when people searched for them. But there have been more complex cases.
So this is a report from the Financial Times last month. "A group of newspaper, magazine, and book publishers is accusing Google and other aggregators of online news stories," this is back to the Google News, "of unfairly exploiting their content." Again, I won't read the whole thing, but the obvious tagline is, "They're building their business on the back of kleptomania."
I think it's a very complex question. Anybody who's studied intellectual property rights, copyright, and ownership of intangible property like this knows that it depends on what country you're in, what tradition you're in, and what kind of material it is. It's not as if you can come up with a simple set of rules to follow.
And then it's up to anybody who is displaying content to figure out what the ethical things to do are. I think Google has taken a strong position here based on that "user is number one" slide that Peter put up, which says: if we think it's going to be good for the consumers of content, we're going to push as hard as we can, even if the producers of that content are complaining. We believe it's probably good for them in the long run too, but we're not going to make that our primary consideration at the beginning.
This is a choice that other search engines and other online information providers are making differently. It's a great source for discussion and debate. We'll move on.
There is a view, this is Richard Stallman, many of you know of him, and this is a sort of famous quote: "I believe that all generally useful information should be free." By free he means the freedom to copy it and adapt it to one's own uses.
And the idea here is that when information is generally useful, redistributing it makes humanity wealthier, not just the owner of the information, no matter who is distributing and no matter who is receiving.
So there are strong views. This is a more subtle one, from Stewart Brand. It's actually earlier than Stallman's, from quite a while ago, when he was just looking at the future of the Internet: "Information wants to be expensive because it's so valuable, but it wants to be free because the cost of getting it out is getting lower and lower." So you have these two fighting against each other.
Classical ethical situation. You have two different sets of values. You could argue both sides, and how do you judge in a particular case which is the more important?
A third question I want to talk about in a slightly different way is how much you, and by "you" here I'm addressing these questions to the search engine providers, can control your future. Take issues like maintaining personal information. This is an article from the Electronic Privacy Information Center, which was originally an outgrowth of Computer Professionals for Social Responsibility, which I was involved with at the time.
They're looking at what happens when you store online information, in fact people's e-mail, and search through it. A lot of the debate when Google first started putting advertisements in e-mail, with computers going through and processing it, was, I think, misdirected, because it's really the fact that an Internet mail provider is storing your mail in the first place that creates this opportunity.
Whether or not they are using some algorithm that puts ads on it is not the key. The key is that it can be subpoenaed. As it says here, "Scanning personal communications in the way Google is proposing is letting the proverbial genie out of the bottle. Google could tomorrow by choice or by court order employ its scanning systems for law enforcement." So what is stopping Google from searching your e-mail to see if you use the word 'bomb' or whatever it is?
The answer is that the people who are there don't intend to do it. And I trust them; I know a lot of the people, and I know that's sincere. But how much can they control that? What happens, first of all, when there's a precedent in which some kinds of scanning are done, so that a court says, "Well, if this kind of scanning is done, we're not going to stop that kind of scanning"?
And what happens when that same company is taken over by different people or is put under different constraints by the government? That is, how much are you responsible in your conduct today to take steps that would prevent ethical problems in the future that you can't control?
And I think this is a big one. One of the things that Peter talked about is Google's logs. Google keeps extensive logs of searches, not tied to people, not with your name unless you've signed up specifically for personal search. But data-mining techniques could be used on those logs to, as they like to say in the intelligence business, connect the dots.
They are resisting a subpoena from the Department of Justice right now that would have them harvest that kind of information for the government, for a particular use having to do with pornography. That use isn't the issue. The issue is who has the right to take those logs, do data mining over them, and find things out.
As long as the logs are there and being collected, it's very hard for a company to have control. Ultimately, as Peter said, not being evil doesn't always mean following the law. But not going to jail usually does mean following the law, right?
You don't want to go to jail; you've got to follow the law. So there is, I think, a big ethical question about can you do things which, as long as you control them, you have perfectly good ethical justification to believe they will be fine, but where you really cannot be sure you'll control them. How much do you worry about the worst case? How much do you worry about disasters?
And the final question that I want to raise here is: to whom do you pledge allegiance? There are a variety of possible answers. The American, or the world, corporate system, the capitalist system, basically says it is the goal of the people who run a company to be responsible to the stockholders who own that company. Their duty is to provide value to those stockholders.
Now as we have seen in endless cases and discussions, and I'm not going to get into corporate ethics broadly here, there are limits to that. There are cases where you say, "This is not going to make the most money for the stockholders, but it's better for some other societal reasons," and often those reasons are ethical.
Google again has tried to set itself apart from a lot of other companies by taking a strong stand on that, and it's a constant question that comes up in looking at it. This is actually a quote, Sergey's "don't be evil" quote.
Second are governments. To what extent are you, as a company running a search engine, obedient or responsible to your government? And we've just seen the discussion about China. If you're going to operate in China, you do have to be responsible to the Chinese government. That's part of being a lawful citizen.
And this is again one of those deep, deep ethical questions that we can only touch on at the surface here: when do the laws and the rights of governments go against the ethics of the people? When is it right to resist as opposed to following?
And sometimes it's right to resist because you see a larger view. I said a little while ago that I was going to comment on why I think at least a certain population, and by this I mean the people I see around me, people in computing and the information world, found this China question so intense.
In fact, the kind of censorship that's going on in China is not completely out of range with what's going on in other places and settings. But I think it really triggered a kind of final battle between two views. I want to read a couple of longer quotes here because I think they get to the heart of what gives these issues a kind of bite for a lot of people.
This is from an article by Introna and Nissenbaum who in fact have been participants here in the center. They talk about the ideal Web. They say, "It's not merely a new communications infrastructure offering bandwidth, speed, connectivity and so on, but is a platform for social justice.
"It promises access to the kind of information that aids upward social mobility and helps people make better decisions about politics, health, education, and more. It facilitates associations and communications that could empower and give voice to those who traditionally have been weaker and ignored.
"It will empower the traditionally disempowered, giving them access both to typically unreachable nodes of power and to previously inaccessible troves of information."
Now this is pretty utopian. It's not the usual view of "build a bunch of telecommunications infrastructure and people will use it and make money on it." It's saying there's something different: the Internet really isn't the same old kind of thing. There's a shift, a change of mode.
A much stronger version of this, which some of you may have seen, is A Declaration of the Independence of Cyberspace. I think it's overstated, but in a way that captures the spirit. Some of you may even know of John Perry Barlow. He was, among other things, a songwriter for the Grateful Dead and a cattle rancher in Wyoming. He's a very interesting guy, very smart.
This is from really the early days of the Web catching hold. "Governments of the industrial world, you weary giants of flesh and steel, I come from Cyberspace, the new home of mind. On behalf of the future, I ask you of the past to leave us alone. You are not welcome among us. You have no sovereignty where we gather.
"We have no elected government, nor are we likely to have one. So I address you with no greater authority than that with which liberty itself always speaks. I declare the global social space we are building to be naturally independent of the tyrannies you seek to impose on us. You have no moral right to rule us, nor do you possess any method of enforcement that we have true reason to fear." And so on.
Is it big enough to read from the back? You can read the rest.
This is a really big strand of thought, especially in the early days of the Internet and the Web: that this was going to be the opportunity to really change the world social order. It's not a question of supporting industries; it's not a question of online commerce. It's not a question of just being able to talk to people somewhere else faster. It's really going to change the world, turn the power relations around so that we, the citizens of cyberspace, have more power than you, the government of China.
And I think that view is being questioned now in a way that it hadn't been until now. The whole Internet structure is being questioned by governments, through things like censorship. And it's bringing the realization that, in a way, you can't escape reality. The ethical questions of how you deal with governments and power and politics are not void in cyberspace. They just play out in somewhat different ways.
And I think all of the kinds of ethical questions that have come up in centuries or decades or millennia of thinking about people and democracy and governments are just as active here. They don't go away. They become strong.
The danger, and I just wanted to comment on this because it's sort of the Barlow thing, is that when you start to believe that what you're doing is for the good of humanity in general, then how do you judge that? What kinds of institutions are there? Governments are institutions. Religions are institutions. They all tell us in their ways what constitutes appropriate ethical behavior.
As Barlow said, "We, the citizens of cyberspace, have no elected government, and we're not going to have one." So does it just become a matter of personal preference? I liked this one: it's a quote from a famous Playboy interview. It's, I think, the only Playboy interview that's ever been filed as part of a Securities and Exchange Commission stock filing, because it came out at the wrong time, just as Google was making its public debut as a public company, and there's a whole bunch of stuff about that.
In the interview, Sergey says, "For example, we don't accept ads for hard liquor, but we accept ads for wine. It's just a personal preference. We don't allow gun ads, and the gun lobby got upset about that. We don't try to put our sense of ethics into the search results, but we do when it comes to advertising."
So again, they're making a clear division, and I admire this: "Okay. The search results are not going to have our biases." But even in the advertising, as soon as you say, "I'm doing this for the good of society," somebody says, "Guns are bad." Somebody else says, "Guns are good." Who decides between those?
If it's the stockholders, if it's the corporations, then the question is, "Which makes you more money?" It may be hard to calculate, but it has a clear kind of answer. If it's individual leaders, if you believe Sergey Brin, then you say, "Well, he doesn't like hard liquor, so that's good." But of course then you have leaders on all sides of issues.
Is it governments? Is it institutions? Is it just the way the society works out? I think that because of what I'll call the mythical origin of cyberspace, these questions have really come up in a new and active way. And I think that's one of the things that makes it a fun topic for people who look at ethics and how it applies: the old questions, which have been asked in the context of governments, businesses, and so on, really have this additional dimension. So I'll be interested to hear what you all have to say.
MR. HANSON: Thank you, Terry.
MR. HANSON: Geoff Bowker.
MR. GEOFF BOWKER: Okay. Hi. You've all been very patient listening to the two wonderful speakers before me, so I'm going to race through what I have to say so that we've got good time for questions and panel discussion at the end.
I'm going to cover three issues relatively quickly. The rights of a state to ask for and control information. The role of the information provider to make information available, either to the states or to citizens. And finally I'm going to talk a little bit about the problem of overseeing highly complex technologies, the kind of technologies that we're playing with today.
But let me start with this picture from last year. If you went on the Microsoft Network and tried to work out how to get from Halvorsen in Norway to Trondheim in Norway, you would find out that the best way to go would be south through Stockholm, across to Berlin, across into London, up to Amsterdam. I started to get worried when I was in Newcastle, I think, and was being asked to take the ferry back across. And finally, you find your way to Trondheim. It's a distance, according to Microsoft Map Serve, of 1,695 miles.
Today you can do that same search and find it's only 476 miles long. The problem is that the sites presenting this information to us are becoming highly trusted. We believe in them. They take on a certain moral weight; they do have weight in our society. And unless there's something obvious like, "Oh my God, I shouldn't be going to England in order to get to another place in Norway," how do you challenge them? How do you question them? How do you say what's wrong with them? And that's really what I'm going to be talking about today.
The first point I'm going to make is, I think, highly sympathetic to one of Peter's main points: I really do separate the issues of protecting users and protecting content. There's been a lot of hypocrisy about states, to put it mildly perhaps, in the discussions about China and censorship.
States are in the business of censorship and they always have been. The American state's done very well. We banned Ulysses, which was selected by the Modern Library as the best novel of the 20th century. It was banned as obscene for 15 years. Voltaire's Candide, written over 200 years ago, was banned and seized. Aristophanes' Lysistrata, Chaucer's Canterbury Tales: these are all highly recommended books. I love them all. I'm ashamed to say I've read all of... the obscene books here.
They were banned at various times. Jean-Jacques Rousseau made the trifecta. He was banned in this country, he was banned by the Catholic Church's Index of Prohibited Books, and he was banned in the Soviet Union.
States do this. They try to control the information that their populace can get access to. Is that a good thing? No, it's not. Is it the personal responsibility of folks in Google to make certain that the state isn't involved in this sort of behavior? Absolutely not.
That is not the responsibility of an individual company; it's the responsibility of us all to take care of our own democratic rights. And it's perhaps the responsibility of corporate America to make a joint statement about this. But we cannot fight state censorship through the corporations alone.
I'm just going to zip through some other aspects of censorship today. If you went to Google in China in 2002 and typed www.google.com, you actually got a site at Peking University, which had its own kind of search engine attached to it.
Not so today. As we know, you actually have pretty good access to a lot of information through Google China. Now, if I search for "democracy," which is one of those wonderful play words, on Google.com, I'll find 145 million hits in about 0.1 seconds. If I do that same search on Google China, I'll get 139 million hits. So there's a difference of six million hits.
Which six million? How am I going to get access to those six million? And, if you'll let me go forward a little bit here, I do somewhat take to task what Peter was saying about transparency. Yes, there is some transparency: the Chilling Effects website, which is a very interesting site, and I highly recommend it to you.
However, if you burrow down a little bit, you see this notice from February 2, 2006: sender information, private; recipient information, private. "On February 2, 2006, Google received a complaint about Web pages that allegedly violated section 86."
It's very useful that that's there, but it's really not telling me very much about what exactly is being banned, why is it being banned, who has demanded that it be banned. It is a step towards transparency, but it is certainly not acting as full transparency.
So a second issue that has come up lately with Google, one where I've been highly sympathetic to their stand, is Google's refusal to comply with the Department of Justice's subpoena of their search records. I think it has received less play than it might, but it's actually a thin-end-of-the-wedge issue.
I see this as part of a very long tradition of government overreach, of trying to get hold of personal information about searches, personal information about our own information behavior. The American Library Association has been fighting this since 1938, when it first came up with its statement about protecting library records. It is now in the American Library Association's Code of Ethics, and this is the sort of body that should be dealing with this, not the corporations. Item three: "We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted."
Unfortunately, they've not been all that successful, and they say right now that they're extremely worried. In June 2005, they reported that since October 2001 there had been 63 legal requests for records in public libraries and 74 in academic libraries. This is a constant battle, and not just on the search engine front. This is a front where we need to think as a society about what our relationship is with our information: how we use it, how we share it, who should know about it.
I'm going to touch on a final issue. It's a somewhat hairy one, but I'm going to have to go through it quickly just in the interest of discussion. There's a wonderful cataloger, Sanford Berman, who's been attacking the Library of Congress and the Dewey Decimal System for a number of years. And he talks about this wonderful concept "bibliocide by cataloging."
What he means by this is that if you can't get access to the book, if it's not well enough cataloged, well enough defined, it doesn't matter that it's in your library; nobody's ever going to know that it's there in the first place. So a standard cataloging record takes a radical book about street art, graffiti, ethnic art in society, and mural painting and makes it available only through "Art in society," "Art, amateur," and "Street art, United States."
Now one of the great advantages of Google is that they don't fall into that cataloging trap. They do allow you free access to lots of information, but they do end up doing a huge amount of filtering. In Google's preferences, your default is to use moderate filtering: "Filter explicit images only" is the default behavior.
That's highly problematic. Most people who use Google don't even know they have that set of preferences. Sure, they can change it to use either no filtering or strict filtering, but most of you, without knowing it, are actually using moderate filtering.
You have to be a highly engaged user to understand the issues and to understand the capabilities that Google is very well and rightly making available to you.
Let's look at some problems with this. Say I do an unfiltered search on breast cancer. Breast cancer, by the way, is a term the search engines have always had difficulty with in their safety programs, their nanny programs.
On the first page of unfiltered Google, I'll find the breast cancer site that was founded to help offer free mammograms to underprivileged women. You go to the site, you click on an advertisement, and that provides money, which is then channeled into providing free mammograms for underprivileged women.
If I go to filtered Google, that site's not there. It's not only not on the first page; it is not there, full stop. That site has been filtered out. I went to that site today, in one of those "well, I'll just try it for myself" moments, and I can't see anything wrong with it. Maybe that picture down there in the bottom right is a little bit risqué, but I'm sure most kids could probably handle it.
This is a mistake. They probably don't want their filter mechanism to be doing this. No, I don't want to be running that. Go away.
They probably don't want to be doing it, but it is happening. Who's got the control on this? How can we develop good controls so that we have good public understanding of these highly complex technical issues, and ways of dealing with them that are rich within our democracy?
Within our democracy today, within our lives today, our information infrastructure is central to the way in which we live and it's central to the way in which we act as citizens. Google has taken, I think, a very good and strong lead in protecting users; however, there are problems with Google.
There are problems with all search engines. There are problems that we as a society face, which I think we should have free and open discourse about right now, and I'm very grateful to all of you for taking part in that tonight. Thank you.
MR. HANSON: Thank you. Let me invite our panel to come up front; I'm going to remain over here and ask your questions. We've already received more questions in advance than we can handle in the time we have, but I will do my best to get all of the themes into this.
Let me just highlight two things for you. First, please do write down more questions and they will be picked up and handed to me. Second, there will be time for informal discussion after our question-and-answer period, because we're going to have a reception in the adjoining room. You are all welcome to join us for that reception.
Let me get right into it. Peter, you mentioned that Google is indeed in favor of an industry-wide effort to develop a set of principles for dealing with government censorship. Let me ask you and the other panelists to comment on what key elements ought to be in such a code or such a cooperative effort. What areas do you expect it to cover? Perhaps, what behaviors should it prohibit or declare to be evil?
MR. NORVIG: I guess we're just starting up the conversations now with some of the other companies, and I think it's really the things we've already addressed: protecting the privacy of users, censorship, and what stance the companies can take and what the legal status of it would be. But we're in discussions to try to work that out.
MR. HANSON: Do you Terry or Geoff have suggestions for Google and the other companies in the industry as to things that definitely ought to be addressed in such a cooperative code?
MR. WINOGRAD: Well, I have a concern and it has to do with the international structure. There is a certain attitude towards free speech and free information that we all sort of take for granted being in this culture, that's very, very different from what you have in a closed culture, like Iran or China. And even somewhat different from what they have in Germany and France and so on.
And there was a recent trial of a Holocaust denier going to jail for what we might consider free speech, and so on. And I think that it's very important for this not to become another place where the rest of the world sees it as Americans trying to impose their particular values on everybody. But it's hard, because the companies don't have national reach. They have global reach.
Trying to come up with ways that take into account global cultural differences along with the principles we believe in, without saying, "Well, we give those up," is a very hard problem.
MR. BOWKER: Yeah, I'd agree with what both Peter and Terry have said. All I'd add in particular is the issue of transparency, which I raised before and which I think is a highly complex issue. The industry as a whole should develop ways of making it obvious how and why censorship is occurring, or when and how records are actually being made available.
MR. HANSON: When Terry was talking about choices Google had made, Peter, he said that you've chosen to favor the consumer, even to the point of outraging some of the content providers. Now you have three professors here who are all content providers and write books, and we'd like you to pay attention to us as well and not always favor the user over us. What arguments would you give us that you indeed ought to pay attention to the user rather than to wise content providers like ourselves?
MR. NORVIG: Right. Well, I'm a book publisher too, or a book author, so I have that interest at heart. But I think in most of these cases, the providers of information themselves are on our side. They want more access to their information, and most of them seem to agree with the way we're providing that.
The place where we're being hit back is not from the authors of the information but from the aggregators of it. So for example, in the newspaper case, where we aggregate stories from all the newspapers on our Google News page, if you go to the newspapers, they love it. They're getting lots of click-through from the headlines that we show on the page, and they want more of it.
The people that are objecting are the news services, like the AP and particularly the French agency, where they say, "No. We don't want that." And they could certainly stop it. If they wanted to, they could just put it in their robots.txt or in the meta tags and tell us, "Don't index these pages." But the newspapers want them indexed. It's these aggregators who want more control, and they're following the money. They're saying, "Google's making a profit off of this and we aren't, so what can we do about that?"
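The opt-out Norvig mentions can be made concrete. The sketch below uses Python's standard `urllib.robotparser` to check a robots.txt rule, plus a small `html.parser`-based check for a `noindex` robots meta tag. The site, the `/wire/` path, and the rules themselves are hypothetical. Note the division of labor: robots.txt governs which URLs a crawler may fetch, while a `noindex` meta tag asks that a fetched page be kept out of the index.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that blocks one crawler from
# its wire-service section while allowing everything else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /wire/

User-agent: *
Allow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def page_allows_indexing(html: str) -> bool:
    """Return False if the page carries a noindex robots meta directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives
```

For example, `crawler_may_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/wire/story.html")` is `False`, while any other crawler falls through to the permissive `*` group.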
MR. HANSON: And to your co-panelists, do you agree with that position that, indeed, you're glad to have your content shared? And do you believe there's ever a circumstance in which Google should be paying for your intellectual property?
MR. WINOGRAD: Well, Google shouldn't be paying, in the sense that either they shouldn't be taking advantage of it or it should be free. They're not going to be in the business of a paid intermediary. But from a personal point of view, and I think this would be true for most people that I know, the feeling is you'd much rather have them making it accessible and visible to the world. And some readers might buy the books, but it's a net gain to everybody.
I think the big question that comes up here is that you have different opinions about how the future is going to work out and the question is whose opinion gets taken into account? It goes back to the question of who do you trust?
So Google says, and I actually believe most of this, that if you do this kind of broad indexing and putting everything online and so on, it's going to be good for everybody. The providers will, in fact, make more; the booksellers will sell more. But people who don't have that same view say, "We don't want to be forced to believe that now. We want to still have some control because we have institutions, we have courts," and so on. And that's where it's being fought out.
MR. BOWKER: Yeah, I think one of the weird things, in academia especially, is that we've got this huge publishing industry around the journals, which is relatively unnecessary and isn't providing enough added value to really make it worthwhile.
As someone who writes a lot, I'm desperate for people to read me. I'll do anything to have you read me and to have anything that I've written made available. I have no problem with that.
The bottom line as far as publishers are concerned in general, it's been shown that when you put full text of books up, people will buy more. So I think Terry's quite right in his thinking about that. Let me stop there.
MR. HANSON: This panel is very Google-oriented, one of our audience participants mentions. And so the question is that I'll add to that: Are the strategies or the choices made by other search engines different than those made by Google? And how would you characterize them and would you critique them?
MR. BOWKER: Well, maybe one of us should go before Peter on that.
MR. NORVIG: I think I can pass on this one. You guys can handle it.
MR. BOWKER: There are certainly differences in China, for example, where Yahoo has released user records that led to someone's arrest. I've not followed that case in huge detail, but I think that's very different in terms of protecting the rights of the users.
There are differences in this country between Microsoft and Yahoo on the one hand and Google on the other with respect to the subpoena for the search terms. And as I said before, I strongly support Google's stand on that, although it has been argued, and in the interest of truth in advertising I should say so, that Google's stand was more about protecting their algorithm than protecting the rights of their users in this case.
But whatever, I think their stand was absolutely right.
MR. WINOGRAD: Yeah, I think there is a difference in tone with some of the others, Yahoo in particular, on the issue that Peter raised about who advises you on what you should be reading. Google has taken this very strong stand, which says, "We're not going to say this particular thing is better than that, so let's put it first."
And I think that Yahoo in particular, but also Microsoft to some extent, has taken much more the traditional, what you think of as the newspaper view, which is you're going to hire editors who really have good judgment as to what should be first. And you go to them because you're buying their judgment rather than their objectivity.
And if those are clear, if you know that's what you're getting, as I said before, that's not bad, but I think those get mixed up because people do think of a search engine as being neutral in more of a sense.
MR. BOWKER: Can I just jump in there, because there was something you said, Peter, that I somewhat disagree with? I think just because you have an algorithm doesn't mean it's objective. And just because you have an algorithm doesn't mean there are not social, political, and ethical values written in. It's just that they're hidden in the algorithm rather than in the mind of the person who's making the decision, the editorial choice.
The editorial choices are still there, and they're much harder to access when they're in a technical algorithm most of us can't understand and which are actually secret, than they are if we could actually have access to the real live people making the editorial choices. So I'm not quite certain that Google's position is as strong there as it might be.
MR. HANSON: Peter, would you like to respond to that? Are your biases all hidden in the algorithms?
MR. NORVIG: So obviously there are biases, because somebody's got to come out number one and everyone else doesn't. And I think Terry made a good point that there is this bias towards pages that get links from other sites, and that's the PageRank part of the algorithm. There are lots of other parts to the algorithm that have other kinds of biases.
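The link-based bias Norvig describes is easiest to see in the simplified PageRank formulation Brin and Page published. The sketch below is that textbook power iteration, with the commonly used damping factor of 0.85, run on a made-up five-page web; it is not Google's production ranking, which, as Norvig says, has many other parts.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to. A page with no outgoing
    links (a "dangling" page) spreads its rank evenly over every page.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to an established portal,
# and only one of them also links to a newcomer.
TOY_WEB = {
    "a": ["portal"],
    "b": ["portal"],
    "c": ["portal", "newcomer"],
    "portal": ["a"],
    "newcomer": [],
}
```

Running `pagerank(TOY_WEB)` ranks `portal` first and `newcomer` far below it, purely because of who links to whom; nothing in the computation looks at what either page says.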
And I think that's right that we do our best to come up with what we think are the best results, but ultimately someone's got to make a decision of what goes into the algorithm and what doesn't.
I think where we do have a more defensible stance is when we get criticism, and we've gotten it from all sides. We have people saying, "Well, Google's too liberal." "Google's too conservative." "Google's too libertarian." And I can tell you, because I've looked at the code, that there are no lines of code that say, "If this page is liberal or conservative, then do this."
Now there may be subtle effects that would promote one page or another, but there's no bias of that kind.
MR. HANSON: All right. This question has to do with the China operation and operations like that. How can you guarantee that the IP addresses for searches in China are going to be safe either because you've moved them offshore? How do you know they're not being maintained in other ways within China? And by the way, why do you keep track of the IP addresses of searches anyway?
MR. NORVIG: So you're right. We don't know for sure. If the Chinese government or any other government wants to track you, they can track you through your ISP and make a request for the ISP to release that information, and then we have no say over that. So you're not 100 percent safe. And I think in most cases, if I were in law enforcement going after an individual, I would probably go to the ISP before I went to the search engine, because there you've got all of the user's interactions, not just his searches. So that would probably be the first place you'd look.
As to the question of why we keep it, it's to help us improve. So we look at our logs and ask: what kinds of queries have people been making, and what results are they getting? Are those the right results? And we keep on trying to improve.
And we need the IP address for two reasons. One is because we want to be able to look at people's sessions and we need something to identify them. We want to be able to say, "Well, look, this one person made, searched, A, B, C, and D, it looks like they're frustrated and they're not getting the right answer." And then they've finally made search F and they got the right answer.
So we'd look at cases like that and say, "Can we automate that?" Can we go directly from search A to the results on F that you were happy with? And in order to do that, we have to know-is this the same person making the same search as was making the one before? So we need some kind of identifying information.
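The session analysis Norvig outlines (a user tries several queries, then finally gets a good answer) can be sketched in three steps: group log lines by IP address, split them into sessions at long pauses, and pair each earlier query with the first query that got a click. The log format, the 30-minute gap, and the use of a click as the success signal are all assumptions for illustration, not a description of Google's pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogEntry:
    ip: str          # used as the session key, as described in the panel
    timestamp: int   # seconds (assumed unit)
    query: str
    clicked: bool    # hypothetical success signal: did the user click a result?


def split_sessions(entries: list[LogEntry], max_gap: int = 1800) -> list[list[LogEntry]]:
    """Group entries by IP, then start a new session after max_gap seconds of silence."""
    by_ip: dict[str, list[LogEntry]] = defaultdict(list)
    for entry in sorted(entries, key=lambda e: (e.ip, e.timestamp)):
        by_ip[entry.ip].append(entry)
    sessions = []
    for ip_entries in by_ip.values():
        current = [ip_entries[0]]
        for prev, entry in zip(ip_entries, ip_entries[1:]):
            if entry.timestamp - prev.timestamp > max_gap:
                sessions.append(current)
                current = []
            current.append(entry)
        sessions.append(current)
    return sessions


def reformulation_pairs(entries: list[LogEntry], max_gap: int = 1800) -> list[tuple[str, str]]:
    """Pair each query that preceded the session's first click with the clicked query."""
    pairs = []
    for session in split_sessions(entries, max_gap):
        for i, entry in enumerate(session):
            if entry.clicked:
                for earlier in session[:i]:
                    if earlier.query != entry.query:
                        pairs.append((earlier.query, entry.query))
                break
    return pairs


# Hypothetical log using reserved documentation IP addresses.
SAMPLE_LOG = [
    LogEntry("192.0.2.7", 0, "fast cat", False),
    LogEntry("192.0.2.7", 60, "fastest cat speed", False),
    LogEntry("192.0.2.7", 120, "cheetah top speed", True),
    LogEntry("198.51.100.4", 30, "weather", True),
    LogEntry("192.0.2.7", 10_000, "pizza near me", True),
]
```

Pairs such as `("fast cat", "cheetah top speed")` are the raw material for the question Norvig poses: can the first query be answered directly with the results the user was eventually happy with?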
And the other thing about having an IP address is that you can then localize it. You can map from an IP address to a country and usually to a city and then you can say, "Well, what are the right results there?" And then the right results in this country or in this city may be different than the right results someplace else.
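Mapping an IP address to a country or city is typically a longest-prefix lookup against a table of address blocks. The sketch below uses Python's standard `ipaddress` module with a tiny hypothetical table built from reserved documentation ranges; real systems rely on much larger geolocation databases, and the locations here are invented.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix-to-location table (documentation ranges only).
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): ("US", "San Jose"),
    ipaddress.ip_network("198.51.100.0/24"): ("FR", "Paris"),
    ipaddress.ip_network("198.51.100.128/25"): ("FR", "Lyon"),
    ipaddress.ip_network("203.0.113.0/24"): ("DE", None),  # country known, city not
}


def locate(ip: str) -> Optional[tuple[str, Optional[str]]]:
    """Return (country, city) for the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in GEO_TABLE if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    return GEO_TABLE[best]
```

The more specific `/25` block wins over the enclosing `/24`, so `198.51.100.200` resolves to Lyon while `198.51.100.5` resolves to Paris; a search engine could then pick results appropriate to that country or city.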
MR. HANSON: Any comment from either of our other panelists? Throughout, and this relates to the question of your trade secrets and your potential for commercialization. What's the proper balance between accountability by being as transparent as possible about everything that you do and maintaining secrecy around the algorithm and around some of your data so that you yourself can use it either to improve your product or to develop new products, data products?
MR. NORVIG: Right. So the pledge we have made is to try to deliver the best, unbiased results and we've chosen to keep the methodology for that secret for two reasons. One is competitive advantage in that we've put a lot of work into it and we wouldn't want to publish it and have other competitors being able to use it for free.
And another one is one I think Terry touched on, is this adversarial relationship. That for every search, there are thousands of people out there who think they should be the number one result for that search. And some of them just hope for the best, but others go to sort of devious means to try to promote themselves up to number one.
And if they knew exactly what we were doing, then they would write their page in order to move up in a way that we think would be unfair. So we have to, it's sort of an ongoing competitive war where they try some move and we have to try a counter move.
They have the advantage of not telling us what they're doing and we need that same advantage in order to stay on top of that battle.
MR. BOWKER: I fully understand that and I think that obviously they would want to keep the algorithm secret. However, I think it's a little bit like wanting to have access to cataloguing rules in a library. If it's going to be so important to us, I would at least like a trusted third party of some kind to have some kind of oversight of the algorithm.
Certainly not that I distrust Google, but that I think that when you have a company which is taking on such incredible importance in our society right now, and incredible importance in our relationship with our information rules around us, then I think it wouldn't hurt to have some kind of an oversight. But that oversight should not be one that is making the algorithm public, for all the reasons that Peter has given.
MR. WINOGRAD: I think it's also important just to be aware of what's in the algorithm in general. It doesn't, as Peter said, say if it's liberal or conservative. It's a complicated mathematical structure with lots of different factors like how many words appear how close to each other, in what part of the document and so on.
So even if you or somebody who knows computer science, somebody who knows algorithms were to read that, they wouldn't have the faintest idea how it would rank two particular things that they saw.
MR. NORVIG: Yeah. So I guess I-
MR. WINOGRAD: [Interposing] It really is not couched in human terms.
MR. NORVIG: Right. So I would agree with your point that I think it would be interesting to think of some kind of an oversight committee that could audit the code and say that yes, this is reliable. And I guess I wouldn't have objection to that. If they actually could audit it and understand it well, we'd want to hire them.
MR. HANSON: One more question regarding the algorithm. Isn't it true that many of the results are very close in the ranking that they would get? And if that's true and if there is such power in being number one or being on the first page, wouldn't it be fair or fairer to have a random assignment of the top category or the top group of search results?
MR. NORVIG: Yeah, and we do use some randomization and experimentation in our results. At any one time, we're probably running dozens of different experiments where we're trying out variations to see whether a variation is going to be better than the standard one. So you do see a lot of churn and mix, both because of our changes in the algorithms and because of changes in the Web. The result that is number one today may be different tomorrow for very subtle reasons having to do with both changes in the link structure of the Web and the changes we're experimenting with.
MR. HANSON: But not due to deliberate randomization?
MR. NORVIG: There is some randomization in it, but to a limited extent.
MR. BOWKER: Can I just comment on that for a second? Terry actually gave a talk last year on this topic which was extremely interesting. One of the issues with Google is certainly representing underrepresented voices and how that can be done, because there is an element of the Matthew effect: "to them that hath, shall be given."
If you already have links into you, you will get more links. And you will get the Google recognition. So it becomes very hard for groups from underrepresented parts of the world, or from particular kinds of interest groups, to get themselves represented on Google. So I actually think it's extremely good that they use some kind of randomization, if it's to address this sort of diversity issue.
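Norvig confirms only "some randomization ... to a limited extent" without describing how it works, so the sketch below illustrates Hanson's proposal rather than Google's method: sort results by score, then randomly permute each run of results whose scores fall within a tolerance `epsilon` of the best score in that run. The scores, the tolerance, and the grouping rule are hypothetical.

```python
import random


def shuffle_near_ties(results: list[tuple[str, float]], epsilon: float,
                      rng: random.Random) -> list[str]:
    """Order results by score, randomly permuting groups of near-ties.

    A group starts at the highest remaining score and includes every
    following result whose score is within epsilon of that top score.
    """
    ordered = sorted(results, key=lambda r: r[1], reverse=True)
    out: list[str] = []
    i = 0
    while i < len(ordered):
        group_top = ordered[i][1]
        j = i
        while j < len(ordered) and group_top - ordered[j][1] <= epsilon:
            j += 1
        group = [url for url, _ in ordered[i:j]]
        rng.shuffle(group)  # near-tied pages take turns at the top
        out.extend(group)
        i = j
    return out


# Hypothetical scored results: three near-ties and one clear loser.
SAMPLE_RESULTS = [("a", 0.90), ("b", 0.89), ("c", 0.88), ("d", 0.50)]
```

With `epsilon=0.0` the function reduces to an ordinary ranking; a larger `epsilon` lets less-linked pages that score almost as well occasionally take the top spot, which is the kind of diversity effect Bowker describes.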
MR. HANSON: This has to do with the choices that Google and other search engines do make to screen or filter certain kinds of results, be it pornographic results in standard searches, if you want to call it that, or searches seeking for how to make an atom bomb.
How does Google make those decisions? What set of filters are put in and is it fair for Google to make those decisions or shouldn't there be some kind of community input if not government input into the process of what filters Google chooses to use on its search results?
MR. NORVIG: So from our point of view, we do two things. One is the pornography that we filter for and there are settings for that. And as was pointed out the default setting is to eliminate images but try to give you all the text. And was also well pointed out, that filter is not perfect. It makes errors in both directions.
And then the other thing we deal with is legality. So in France and Germany you can't show Holocaust denial material and so on. Each location has its own set of laws and we obey those laws. Other than that, we're for free and open information.
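The two kinds of filtering Norvig describes can be sketched as two independent checks. The first is a user-adjustable explicit-content setting whose default drops explicit images but keeps text. The second is a set of per-country removals required by local law. The setting names `off`, `moderate`, and `strict`, the `flagged_explicit` classifier output, and the removal lists are hypothetical. As the panel notes, any such classifier makes errors in both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    is_image: bool
    flagged_explicit: bool  # output of an imperfect classifier (assumption)


# Hypothetical per-country removal lists standing in for legally required removals.
LEGAL_REMOVALS = {
    "DE": {"https://banned.example/page"},
    "FR": {"https://banned.example/page"},
}


def filter_results(results: list[Result], country: str,
                   safe_search: str = "moderate") -> list[Result]:
    """Apply country-specific legal removals, then the explicit-content setting.

    'off' keeps everything, 'moderate' (the default) drops explicit images
    only, and 'strict' drops every result flagged as explicit.
    """
    if safe_search not in {"off", "moderate", "strict"}:
        raise ValueError(f"unknown safe_search setting: {safe_search!r}")
    removed = LEGAL_REMOVALS.get(country, set())
    kept = []
    for r in results:
        if r.url in removed:
            continue
        if r.flagged_explicit and (
            safe_search == "strict" or (safe_search == "moderate" and r.is_image)
        ):
            continue
        kept.append(r)
    return kept


# Hypothetical result list.
SAMPLE = [
    Result("https://a.example/article", is_image=False, flagged_explicit=True),
    Result("https://a.example/photo.jpg", is_image=True, flagged_explicit=True),
    Result("https://banned.example/page", is_image=False, flagged_explicit=False),
    Result("https://b.example/", is_image=False, flagged_explicit=False),
]
```

Keeping the legal check separate from the user setting mirrors the distinction in the discussion: the first is dictated by each country's laws, while the second is a choice the user can change.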
MR. HANSON: Responses from either of the two of you? Would it be a good idea to have public input to the choices that Google makes regarding filtering?
MR. WINOGRAD: In principle I think it's a good idea. But to echo what Peter said, the things they are actually filtering now already have public input, from the governments. What they filter in Germany or France is set by those countries' laws. It's not the choice of Google. And for pornography, you can imagine a sort of commission, and I don't know who'd want to be a member of this, right? You'd have fun.
A commission that basically tried to judge, on the Internet as a whole, what's pornographic and what's not, and then all the search engines would agree to use that. But it's not obvious that putting that fine a point on it is going to help anybody. Pornography is a very vague concept, and I'm not sure I want the legislators to get involved.
MR. HANSON: All right. Anything more? All right. The next question has to do with the ideal of a completely open society for information. Is there any hope on the part of any of you that indeed we will achieve the kind of complete openness envisioned in some of the eloquent quotations that both Terry and Geoff used in their presentations.
MR. WINOGRAD: A quick answer is no. I think they're idealistic. I think that they don't really take into account the social interactions that people have. And I think that you can fight against particular abuses of information, ways of hiding information and so on.
But this sort of, I mean, John Perry Barlow, the fellow who wrote this, had this view. He said that there should be no privacy. Anything you ever do should be public, and he likened it to the small town that he grew up in where you go to the grocery store and they say, "Oh, did you have fun last night when you went out with Sara Jane?" and so on. That everybody knows everything.
And I think the ideal in there is that this would work in a large setting as opposed to a very small setting. Even in a small setting it has pathologies, but it just isn't realistic. And I think a lot of these notions about ideals of completely open information, as opposed to fighting the abuses of it, are just not realistic in that way.
MR. BOWKER: There are a few aspects to open information. One is that there was actually a very good article in The Santa Clara [SCU's student-run newspaper] the other week about Facebook.com and the problem that many people are leaving traces on Facebook that are now going to follow them for all of their lives. People are going to be able to search and find out that you were the budding alcoholic, or you were the one who liked to go out to the disco on Saturday night and get raving drunk, and things like that.
People are not thinking about their own relationship with their information rules right now and the sort of traces that they are leaving in the world. And that's something that we as a society need to develop some really good morals about and really good thinking about.
I think our concept of privacy is changing massively and I think we are losing many of the old concepts of privacy as being a castle behind which no one shall come.
But being completely free and open? Probably not. That's probably not going to happen. But let me put another spin on this, with respect to the availability of information. I get information now at my desktop that would have taken me years to get before. I would have had to go to libraries in 30 different countries to do the sorts of international research that I want to do.
I would have had to track down sources here and there. I would not have known who to talk to about what. Right now, I have fingertip access to massive amounts of information, in diverse forms, that people in the 18th and 19th centuries could only have dreamed about.
So I think that despite all the problems and despite all the difficulties, this is an absolutely wonderful time as far as that goes.
MR. NORVIG: So I can say I think there are idealists and realists in the world and John Perry Barlow is an idealist, but at Google, we're all engineers, and we're more realists. And so we'll root for the idealists, but we'll get back to our day-to-day work.
MR. HANSON: A couple of final questions. Our time is up but let me get a couple more in. Do you see a future in which Internet searches will not generally be done by the default settings but indeed will be very personalized and individuals will put their values into choices and therefore the search results will indeed come back with a whole set of values associated with the individual?
MR. NORVIG: Yes. So certainly you're seeing a lot of interest in personalization and in social networking related to search. And I think that is an interesting thing. It's something we're following very closely, and we're beginning to offer some personalized services.
I think it works less well for search than it does in other domains. So when I go to Amazon and I get recommendations, I'm pretty happy with them, and I think the reason is because it's a much smaller domain that you're searching. And if I bought jazz records last week, I'm probably going to want more jazz records this week and not rap all of a sudden.
But on Web search, it's quite a different thing where every search you're sort of by definition, you're asking about something that you don't know about. And many of the searches are about brand new things, not just more of the same. And so I think personalization there is less powerful than in some of the other collaborative filtering applications.
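Norvig contrasts Web search with recommendation in a narrow domain such as music purchases. One simple form of collaborative filtering counts how often items are bought together and recommends the items most often co-purchased with what a user already owns. The sketch below shows that co-occurrence approach on made-up purchase baskets; it is not a description of Amazon's system, and the album names are placeholders.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(baskets: list[list[str]]) -> dict[str, Counter]:
    """For every pair of items bought together, count it in both directions."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(history: list[str], baskets: list[list[str]], k: int = 3) -> list[str]:
    """Score items the user doesn't own by how often they co-occur with owned items."""
    counts = co_purchase_counts(baskets)
    owned = set(history)
    scores: Counter = Counter()
    for item in owned:
        for other, n in counts.get(item, Counter()).items():
            if other not in owned:
                scores[other] += n
    return [item for item, _ in scores.most_common(k)]


# Hypothetical purchase baskets: jazz buyers and rap buyers rarely overlap.
BASKETS = [
    ["kind_of_blue", "a_love_supreme"],
    ["kind_of_blue", "a_love_supreme", "time_out"],
    ["kind_of_blue", "time_out"],
    ["rap_album_1", "rap_album_2"],
]
```

A buyer of `kind_of_blue` is offered more jazz and never the rap albums. For a query about something brand new, there is no such purchase history to draw on, which is the limitation Norvig points to.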
MR. HANSON: And a final question, then, for any of you. Is there any way to control the tendency toward focusing on fewer and fewer sites, because of the self-reinforcing nature of the search process, the linkages, and the emphasis on the sites that come out at the top of the searches?
MR. BOWKER: One immediate response on that is one of the reasons why there's so many problems is that nobody ever goes past the first page of a hit screen. So they only have a look at the first 20 results that come back.
You yourself right now have the ability to go to the 15th, the 20th, the 30th, or the 40th page, which I will often do as a matter of principle. And I think this is one of the things where we need as a society to be teaching literacy in the information age: teaching the kind of information sophistication that allows us to take advantage of the possibilities that are there.
And I don't think Google is necessarily the answer in that case. I think we need to educate ourselves better.
MR. NORVIG: And I think you want to distinguish between sites that are sort of encyclopedic versus more newspaper-like. So certainly on our news site, links aren't that important because every article is new. It's just been published and doesn't have links to it yet. And you can look at other sites that specialize in bringing up these kinds of random new things.
So things like del.icio.us or Digg or reddit, where they have users voting on what's interesting today. And there you get a very wide variety, and then those start getting links to them. I think a similar kind of thing happens in the blogging community, where new things get published and very quickly get linked to and get pushed up. So I think there are a lot of voices if you look in different places.
MR. WINOGRAD: I think there's an ecology that develops over time. Back in the early days, you talked about surfing the Web. It was not like you were looking for something, you were just sort of wandering around seeing what waves crashed and what went on, what was interesting. Because if you wanted to look for something, it probably wasn't there.
And then that sort of shifted to the other side, which is now the Web searches. Okay. I need to know this fact, let me go find it. And in that direct search you're not looking around.
But then there's this sort of ecosystem and these new things come up, like blogs and like the tagging and so on, which are like the old surfing. There's a way that you can go and say, "What's interesting here today?"
So I think there's always going to be that back and forth.
Feb 27, 2006