The Ethics and Politics of Search Engines
This panel was co-sponsored by the Santa Clara University
Markkula Center for Applied Ethics and the Santa Clara University
Center for Science, Technology, and Society on February 27, 2006.
Participants included:
- Peter Norvig, director of research,
Google
- Terry Winograd, professor of computer science, Stanford
University
- Geoffrey Bowker, executive director, Center for Science,
Technology, and Society
- Moderator: Kirk O. Hanson, executive director, Markkula
Center for Applied Ethics
- Introduced by Susan Leigh Star, Senior Scholar, Center for
Science, Technology, and Society
MS. LEIGH STAR: Welcome. My name is Leigh Star, and
I'm from the STS Center and the Department of Computer Engineering,
and also president of the Society for Social Studies of Science.
This evening's presentation will look at all sorts of patterns
in information retrieval created by search engines, including
who understands them and who owns them.
Last fall, we sponsored an event examining similarly the politics
and ethics of video gaming, and we engaged that with the Tech
Museum's "Game On" exhibit. We'll have one more event
this year in that series on ethics and social aspects of information.
On May 16, we'll have a discussion of the Social Order of Databases,
what we keep and what we throw away and how.
In all of these talks, we have incorporated the axiom that even
the most humble of information tools involves social organization,
design, and use and politics. Who knew 10 years ago that to
Google would become a transitive verb? I always thought it was
just a big number. Information and its uses pervade all sorts
of spaces, once taken up exclusively by books or by people or
by other forms of action such as picking up the telephone and
asking somebody for an answer, or even doing it face to face.
At the same time, it's prompted new forms of communication
and intimacy, all the sorts of behaviors that extend and support
these new ways of knowing.
MR. KIRK HANSON: Thank you very much, Leigh. My name
is Kirk Hanson; I'm a university professor in the field of organizations
and society here at Santa Clara and also executive director
of the Markkula Center for Applied Ethics, which is one of the
two co-sponsors for this meeting.
My role is to be the chair for this evening's discussion. We
feel delighted that the topic of Search Engine Ethics and Politics
is even more au courant than we anticipated when we set the
title and the format and such for this session. Perhaps what
I'd better do first is to give you a sense of our initial thoughts
about the breadth of issues that might be covered tonight, because
I think it's important for us to give ourselves the freedom
of ranging over these.
It could be everything as broad as are there ethical decisions
in terms of what comes up first when one does a search? What
is truth if truth is represented by page one of a search? What
about advertising and paid placement on search engines? Are
there ethical and social dimensions to that? Transparency regarding
that?
What about self-censorship? What about decisions not to display
as many porn sites that perhaps used to come up when one put
almost any innocuous term into a search engine? What about the
storage of search results and search inquiries, the issue which
Google is confronting with the United States government currently?
What about preserving those and putting those into databases
and offering products based upon what searches are indeed conducted?
What about questions of intellectual property? Google was
involved in a lawsuit that had a decision last week regarding
the presentation of photos through image searches, in which
there was conflict over intellectual property.
And what about libraries and the intellectual property in libraries
and their presentation through search engines? What about relations
with those Web masters who are trying to outsmart the search
engines? Whether they try to find out the algorithms that Google
and others use, when are they scamming the search engines, and
when can you kick the so and so's off your search results because
they have violated some kind of ethical norm of what's appropriate
behavior and what's not?
And then finally, the government issue of the question of screening
of sites, screening of search results. The government of China
and the representative who spoke last week saying that China
is doing nothing different than every government in the world
does. Is that true? What is the range of practices amongst governments,
and what is the range of responsibilities that search engines
have in that?
So that's just to suggest some of the breadth of issues that
we may want to deal with this evening. We are very, very pleased
to have with us a panel that has extraordinary background to
deal with these and many other questions. Peter Norvig will
be our first speaker. Peter has been with Google since 2001
as director of machine learning, search quality and research.
He's a fellow of the American Association for Artificial Intelligence,
coauthor of Artificial Intelligence: A Modern Approach, which
is a leading textbook in the field.
Prior to coming to Google, Peter taught (as a professor at USC
and a research faculty member at Berkeley), then served in a
number of companies, and was head of the 200-person computational
sciences division at the Ames Research Center for NASA. So his
experience is deep and broad in the field of artificial intelligence.
The second speaker will be Terry Winograd. Terry is a professor
of computer science at Stanford University. Terry is well known
for his work in natural language. His focus is on the human-computer
interaction, the design of that interaction with a focus on
both theoretical background of that as well as conceptual models.
And he directs teaching programs in a variety of related areas.
He was one of the founding members and is past president of
the Computer Professionals for Social Responsibility.
Our third speaker will be Geoff Bowker. Geoff is the executive
director of the Center for Science, Technology and Society,
our co-hosts for this evening. He's the Regis and Diane McKenna
Professor at Santa Clara University. Previously, he was professor
and chair of the department of communication at U.C. San Diego.
He holds a Ph.D. in history and the philosophy of science; and
he has a great interest in the design of information systems
and the social and ethical questions that arise in that design.
We're delighted to have the three of you here. Let me simply
turn it over to Peter to begin our discussion.
MR. PETER NORVIG: Thank you Kirk. It's great to be here.
So Kirk mentioned a wide range of topics. I don't think I can
cover them all, but hopefully between our speakers and your
questions, we'll get to all of them eventually.
So first I want to start by showing you Google's mission, which
is to organize the world's information and make it universally
accessible and useful. And I think there are ethical implications
of that and that we feel that we have a responsibility. We're
providing the service and we recognize that others are providing
similar services as well. But we really see it as a mission.
That's something that's our duty to do and do as well as we
can.
Now on top of that, that's the mission statement. On top of
that is our informal motto: "don't be evil." And this
was something that was coined by Paul Buchheit, one of our engineers,
I think in 2001 or 2002. And having a motto like that is something
like painting a big bull's eye on your chest.
But we welcome it, because I think it's something that as a
company, we want to live up to and we want to be held accountable.
A while ago, Bill Gates got into a little bit of trouble because
he meant to refer to our mission statement and say, "We're the
opposite," but he said, "We're the opposite of Google's
motto," and I don't think that was quite what he intended.
So now in any company, there are a lot of stakeholders that
you serve, and trying to do the right thing depends on who you're
doing it for. And for us, we identify these: there are the end
users, that is, all of you when you type a query into Google.
There are our stockholders who we're now accountable to. There
are the content providers, the Web masters who make all this
material available. And we're in sort of a symbiotic relationship
with them where we help them and they help us.
There are the advertisers that eventually pay the bill and
there are the employees of the company. So how do you balance
all that?
What we've chosen to do is to say that our number one focus
is always going to be on the end users, and everything else
is going to follow from that. So to the extent to which we make
money and do good for the company and therefore help the stockholders,
it's all by serving the users first. And I think that has implications
for how we run the company, the decisions we make, and the ethical
considerations.
So now let's look at a couple of the issues. So one of the
ones that Kirk mentioned was this idea of having impartial results.
And we've taken a strict line of saying that there's a separation
of the editorial content and the advertising content. So just
like a newspaper has an editorial staff and an advertising staff,
we make that distinction and we have sort of a firewall between
them where we don't cross.
So one of the implications of that is we've decided to have
no kind of paid placement. You can't buy your way into the search
results. You can buy an ad and be shown on the side of the page,
but you can't be shown in the regular search results. Nothing
you can do can influence that.
The other part of it is that we've decided to have our search
results be entirely algorithmically determined. So in the end,
there is a human, because there's a set of us programmers who
are trying to come up with the best results. But we write our
algorithms the best way we can, we debug and evaluate them,
and then that word is final.
Some of the other companies do that and then on top of that,
they do some hand editing. So maybe they write their algorithms
and then they run it and they say, "Well, what's the number
one result for the query 'weather'?" And maybe it's weather.com
and maybe it's Weather Underground. And then they say, "Well,
I really think this one should be number one. Let's just push
that up to be the first result."
We've decided not to take that route because we think it's
just a slippery slope where if you do that one case, then somebody
else is going to complain and say, "Why aren't I the number
one result for this other query?" And we think it's simplest
just to say that everything's going to be done algorithmically.
There's no hand editing of the results. And so no human bias
can fit in.
Now another important consideration is legality, so Vinton Cerf
has said that part of "don't be evil" is don't be
illegal. And that's mostly true. Sometimes to not be evil, you
do have to break the law, but most of the time you want to follow
the law.
One of the things we have to follow in the United States is
the Digital Millennium Copyright Act where copyright holders
can make a claim to say that "this is something, another
Website is showing a result that I hold copyright to and I don't
want them to do that." And there's a formal process where
they can file these letters of complaint.
So to obey the law, we have to go along with that process.
But what we've chosen to do is be transparent about it. And
so when you do a search that would have shown one of these pages
that had been removed by one of these letters, we show that
at the bottom of the page and then we point you to a site, this
chillingeffects.org, that lists these complaints. And so you'll
see the results at the end of a search page.
Here I've shown you if you go directly to Chilling Effects
and you search for Google, you see a list of the notices that
have come in. And if you look at them, there's about one every
day or two that comes in for some pages that are removed. So we want
to follow the law and say, "This guy has made a claim saying,
'I'm the copyright holder of this material, therefore you can't show
it from somebody else's site,'" but we also want to be
transparent about that to make it clear to our users when we've
removed any results. And we do that for these kinds of things
and we do that in China and we do that throughout the world.
Another important issue is privacy of our users' information.
And we're a member of the European Union U.S. Privacy Safe Harbor
Consortium. So in Europe there are very strict laws for privacy.
There are different laws throughout the world. But the European
Union has set up this safe harbor mechanism whereby if you agree
to abide by these certain standards, then you can be certified
for their level of privacy.
And these are the important considerations that they go on,
and they're ones that we uphold. So notice: we let you know
what we're going to do with your data; choice: we give you the
choice to opt out if you don't want us to collect that data.
Onward transfer means that we have to let you know if we're
going to release your private data to another company.
Our policy is even stronger: we're never
going to do that. We'll never release private data, but we may
release aggregate data that's anonymized. So we do things like
we release the 10 most popular queries for the month. That's
something that millions of people are contributing to, not just
one. So we feel it's not compromising your privacy.
Security: we have to make sure nobody can steal the
data. Integrity and access: we have to give you access to the data
so you can contest it. That seems to be not so important for
a search engine. Nobody's saying, "Well gee, I want you
to strike out the fact that I made this query at this particular
time." That's more important for things like credit-card
companies where you say, "Is this a true fact on my record?"
And then we have to have a means of enforcement whereby we
agree to arbitration standards if people have complaints. So
we've done all of that.
Another thing that I think has an ethical implication is the
ability to help the world by enabling new content. So certainly
we feel we're helping the world by allowing access to information
by allowing you all to do searches and get results back. But
we think we're also helping by helping people to create new
content, to make the Web more powerful.
So one thing is we're driving traffic to sites, so we're helping
Web masters get better known. Another is we're allowing businesses
to prosper online by taking out ads and getting traffic that
way. And then the third is a slightly different program
where we're allowing content providers to show our Google ads
on their site. And so this allows another way to provide information.
And there are lots of good examples. I was reading a letter
that was really touching. A guy who said, "You know, I
used to have a job, but in my spare time, I worked on this fishing
Website. And my boss would always tell me, 'you're wasting your
time. Why don't you come back and work harder and not waste
your time on the Website?' And then I started showing Google
ads on my fishing Website. And after a month or two, I figured
out I was making enough money that I quit my job and now I spend
all my time either going out fishing or working on the Website
and making it better."
So that was maybe a bad result for his boss, but for him, he
gets to do the thing that he loves and for other fishermen,
they get better information because he's able to spend more
time making his Website and making it be a good thing. And that's
the kind of thing we want to be able to enable.
We also think it's important to work on philanthropy. We've
taken the sort of venture philanthropy approach where we want
to attack philanthropy sort of as a business problem and do
the best we can on it. So from google.org we are funding a number
of businesses, and you can read about the various things they're
doing there. And we just this week hired a director for the
organization, Dr. Larry Brilliant, who was one of the eradicators
of smallpox and has done a number of things throughout his life.
He was just awarded a wish from the TED Foundation Conference.
And what he wished for was an early-warning health alarm. So
if a pandemic was breaking out, a way to be able to detect that.
And that, actually coincidentally, is something that we've been
working on within Google, is a way of detecting that kind of
information. So there's lots of information you can put together.
We have the computing power to be able to do that.
And one of the things that Dr. Brilliant said is that SARS
was the pandemic that did not occur. And part of the reason
it did not occur is because of the information technology that
was available. So when it first started to break out in Asia,
instead of this information being repressed, the doctors were
able to communicate with each other and to start to shut it
down. And I think that's a tribute to the power of modern communication,
both Internet and cell phones and other types of communication
that stopped that from happening.
So speaking of Asia, let's talk about China. When I agreed
to do this talk, it was before this China stuff came up and
I thought the room was going to be more empty than it is. But
perhaps this is one of the reasons why some of you are here.
Now I think some of this coverage of Google's and other companies'
stance with respect to China has been confused, because there
are really two issues. And sometimes I think the press coverage
has not differentiated them properly. So the two issues are
the protection of users and censorship. And they are quite different
things.
So first, the protection of users. So before we agreed to go
into China with our Google.cn domain, we thought it was very
important that we needed a policy to keep our users out of jail.
So again, our users are number one. Having them in jail, we
thought, was not a good thing.
So how did we do that? Well, a couple of things. So one is
we're not going to offer any services in China with the current
government the way it is, that have user-identified accounts.
So there'll be no Gmail service, no Blogger service, nothing
where you have an account that identifies you as a person. Because
we just thought that was just too dangerous. We didn't want
to be in the position of having to hand over those kinds of
records to the government, and so we thought the only way we
could handle that is not to have those records in the first
place.
Then the second aspect of that is the search log. So we're
not going to have user accounts, and our search logs don't have
personally identifiable information, but they do have an IP
address that is potentially identifiable with an individual
or maybe a small set of individuals who share an IP address.
So the decision we made there is we're not going to keep any
search logs in China. We're going to ship them over to the U.S.
If the Chinese government wants to get it, there's a process
they have to go through that goes to the U.S. State Department,
they have to agree to it, then they have to come to us. We have
to agree to it and if everyone agrees, then probably it is the
right thing to do to release that information.
If anybody in that chain breaks the chain, then the information
does not get sent back. And we recognize this is a hard choice
to decide if this is the right way to go. But we thought this
offered sufficient protection to our users.
And then we support the U.N. Universal Declaration of Human
Rights, the Global Sullivan Principles and so on. And in fact,
we'd like to have stronger principles. We'd like to have people
get together to have stronger principles for them.
So then the second part of the issue is censorship, which, as I said,
I think is sometimes confused with the first. Now, part of this
confusion is I think some people don't recognize that censorship
is, in a way, unavoidable. And so this is a little diagram from
the Washington Post that shows how the Internet works in China:
if you're on the Web in China, you type in a URL, it
first goes through your ISP and the government has a back door
into the ISP. It will immediately eliminate certain addresses
and then it will also look at the contents of the pages that
come back and eliminate certain pages if they have content that
the government deems is inappropriate.
So no matter what you do, censorship is there. We've chosen
to have our site within China and in order to do that, we have
to agree to this level of censorship. Even if we didn't agree
to it, it wouldn't make much difference because it would still
be censored. But what we have chosen to do, that as far as I
know nobody else is doing, is add this level of transparency.
So when we do censor results, when we remove results from a
page, just as we do everywhere else in the world, we say, "Some
of your results have been removed." So at least the user
knows what's going on.
We feel that the U.S. government can stand up and make stronger
laws and we feel that corporate America can get together and
have stronger principles as well. And we're supporting efforts
on both those fronts. We feel it's difficult for us to do it
on our own.
So we announced this Google.cn site this month, and in
some parts of the press we were hammered for it. But another
way of looking at us announcing it this month is to say from
1998 up until this month we resisted opening a Google.cn because
we were worried about these issues.
And we didn't see a lot of press coverage saying how Google
was so courageous for resisting for that number of years. So
I think that shows that as popular as we'd like to think we
are, we really don't have that much power by ourselves to change
the world in that way. We'd like to get together with governments
and with other corporations in order to provide some more power
in the aggregate.
Another thing we support is the Open Net Initiative, and this
is adding some more transparency. And so this is an example
where they have a page where you can compare the results on
google.com in the Chinese language interface, that's what you
see around the world, and Google.cn, which is what you see in
China.
Another thing we've done for transparency is we don't geo-target.
So for some other sites, some company.cn, they'll give you different
results depending on where you come from; whether your IP address
is within China or without. We give the same results no matter
where you come from, so that that way as an outside observer,
you can see what a user in China would see.
So on the left is Google.cn, on the right is Google.com.
They had a drop-down menu of queries and I just chose
one, which was Tibet. And I was actually surprised that they're
pretty much the same. I guess we don't quite have the local
search and images up in China yet, so that doesn't show up on
the left. But the first one here is Tibet Online, which provides
information about the plight of Tibet and serves as a virtual
community to end the suffering of the Tibetan people by returning
the right of self-determination.
So that gets through the censors and I was a little bit surprised
at that. I thought that would probably be blocked. But it sort
of shows that there is a lot of information that's getting through.
Another example here is searching for bird flu. And here the results
are the same, except that there's an extra ad on the Google.com
compared to the Google.cn. Now probably if you searched with
Chinese characters rather than with the English characters,
there would be more of a difference there. And certainly there
is a difference. There are only 52 million results for bird
flu in Google.cn versus 94 million for Google.com. But the first
10 are exactly the same.
And as we've been out, we've talked to a lot of people about
this, a lot of our own internal Chinese engineers and other
employees, a lot of dissidents and so on. And most of the message
that we got back is what's important for the Chinese end user
is access to information. And sure they want to know about democracy
and Falun Gong and so on, but really they want to know about
their day-to-day information. And they want to know about things
like outbreaks of bird flu and so on.
And so we're giving them that and we think that's the most
important. We'd like to be able to give them all the information,
but we just can't. And we thought it was more important to give
them this information that they can use, even though we have
to offer a compromised service.
And this is a quote from Baidu.com, which is a Chinese search
company. On their page they talk about all the little
things that we do differently. And I think this is actually
very accurate. So it's very important if you want to get the
best results to be on the ground in a country and be able to
do all the little things.
So not only do you have to be able to speak Chinese, and we
certainly have plenty of engineers here in Mountain View that
speak Chinese, but you also have to know who's the Chinese pop
star and what's the equivalent of "American Idol"
in China in order to get all those queries right. Because let's
face it, that's what most of the queries are about.
Some of the people will want to query about democracy, but
most of them just want to know about their pop stars. And in
order to get that right, we have to be able to hire people in
China who know the culture as well as knowing the language.
And so to serve our users better, we felt like we had to be
there on the ground.
Liu Xiaobo, who's a jailed Chinese activist, says, "The
Internet is God's present to China. It provides the best tool
for the Chinese people in their project to cast off slavery
and strive for freedom." And we think this is really right.
This is really the message that hit me as I went out and talked
to various Chinese people, saying, certainly censorship is bad
and you don't want to collaborate with that, but given no choice,
access to information is important and will change the world
for these people.
And finally here's Nicholas Kristof from the New York Times
comparing the various companies and their actions. And I think
this was a very good article because it really differentiated
between these two aspects of protecting the user versus censorship.
And his conclusion was, "Google strikes me as innocent
of wrongdoing," and that by providing both the Google.com site
and the Google.cn, we're providing them more choices rather
than fewer.
So it's a hard call to make, but we think this is the right
one, and we'll continue to review it as we go forward.
MR. HANSON: Thank you, Peter. Terry Winograd.
MR. TERRY WINOGRAD: I apologize on my slide here, I
didn't put both sponsors. So it's the Science, Technology and
Society and the Markkula Center. Thank you.
In the interest of full disclosure, I just want to start off
by saying that my main affiliation, where I actually spend all
my time is at Stanford, but I have had and do continue to have
connections to Google. So this is not a completely independent
view, although I'm coming at it from a somewhat different angle.
Larry Page was my graduate student, never finished his degree,
unfortunately. And the Google project actually started as part
of the digital libraries project, which I was one of the principal
investigators on. And over the years, I've been on the technical
advisory board, I took a sabbatical there. I hang out a bit,
but I have not been for quite a while involved in any of the
active engineering or decision-making. So I have this sort of
funny role as an insider-outsider.
I know a lot of the people; I've known Peter for many years.
I've known many of the people there. But I have a separate stance
really from my position as a university professor, somebody
who's looking at these questions from an intellectual and academic
point of view and trying to see how they then get reflected.
But I will admit that my exposure to and knowledge of what
search engine companies do is strongly influenced by the fact
that that's what I've seen.
So the question that the organizers asked us is: what are the
ethical challenges? And there's a very nice handout that you've
gotten that talks about this from a philosophical ethical perspective.
And this is sort of a complementary cut at it; it's not really
from a philosophical perspective, but more of a common sense
perspective. Where do the ethical questions come from? What
are the things that you might want to worry about if you're
thinking about search engines or if you're designing one or
if you own a company or part of a company that does one?
So the first, which Peter has talked about some, is this question
of what's fair? It's important to recognize that search is inherently
a socially conditioned process. It's not an objective process
over an objectively given body of material. It is inherently
an editorial activity in how you go about finding the things you're
going to search, how you organize them. There is a ranking process
that is inherently adversarial. What I mean is somebody gets
first, somebody else doesn't get first.
So you're in a social setting where you have people, search
engine optimizers as they're called, who are working very hard
to see that their information gets put before somebody else's.
So the search engine provider becomes an arbiter in some sense
between these competing entities.
And you have the problem which is much more recent because things
like Google weren't around before, but where as you concentrate,
and of course, we've seen this in the mass media much earlier,
as you tend to concentrate, there is a narrowing of alternatives;
a narrowing of views, because there is just one or a few places
that you go to see things.
So there are ethical questions that come up in all communications,
mass media [inaudible]... which are part of the scene for
search engines as well.
So we talk about bias first. What does it mean for a search
result to be biased? And one obvious thing that people raise
and Peter talked about this is if I can buy a place in your
search because I give you more money, then obviously your search
is biased towards money.
And of course that's been true for advertising for years. Why
do I get a big chunk of the New York Times? Not because I'm
more important; I'm talking about the advertising chunk. It's
there, but it's known as the advertising. It's very carefully
marked. And I think as Peter pointed out, Google has made a
very strong effort to say, "If you're going to buy space,
position, then that will be made visible so that the consumer,
the user knows that's what it is."
I think that to some extent, this is a problem that today has
been dealt with very well by a number of companies, and Google's
been particularly there.
There's another kind of bias, which is that you have a point
of view and you express that bias. And that's not always bad.
So looking around on the Web, you'll find things like the Christian
search engine. Okay. They put their bias right in front. They
say, "If you want to find," and it says up at the
top, "Agape, Christian Search." Now there may be multiple
versions of what's Christian, that's another more subtle thing,
but you know that you're going to find certain kinds of things
here and not other kinds. And that's why you choose to go this
site.
So it's fine for a site to be biased in the sense that they're
selective, as long as that's part of who they are. That's part
of their identity; it's part of what you know is going on.
There is censorship, and we've just heard quite a bit about
the whole issue of censorship in China. I want to come back
a little later to talk about why that is such a big issue for
some people. But I think it's an obvious point that you will
get bias if you're explicitly excluding things. And again, the
question is sometimes: do you make that visible? Do people know
that that's being excluded? In which case, they may not like
what they're getting, but at least they aren't being fooled.
There is a lot of discussion as to whether that's true in China.
Whether the average Chinese user really knows that the Internet
they're seeing isn't the whole Internet, or they just think
it is the whole Internet and the other stuff isn't there.
But I want to talk a little more about this notion of technology
bias. So this is a quote from Eric Schmidt who's the president
of Google. Actually, I was looking through my notes to try to
find where it was from. It was an interview and I've forgotten
the exact date. But he was being pressed about bias in Google
News and he said, "This is not really a newspaper. These
are computers." And you can read it up there, I'll skip
through, but, "They're assembling it using algorithms.
I can assure you it has no bias. These are just computers. They're
boring. I'm sorry, you just don't get it."
He was a little annoyed with the question. And Peter made that
same point about algorithmic things. But I think it's very important
to take a step back and say, "What is the algorithm running
on?" And, "What biases are built into an assumption
about how it works?"
So take Google News for example. If I decide to publish the
Santa Clara Radical News and put it on the Web, will my articles
get into Google News? Well, turns out that they have a particular
set of news sources that they consider. They don't just take
anything they find on the Web. That's different from the regular
search. So there's a bias; they try very hard to be balanced.
You'll find Al-Jazeera as well as you'll find right wing American
papers and so on. But there is a bias inherent in the fact that
you're selecting it and even with things that you don't select
that way, there's a bias inherent in the technology.
So one of Google's main advances in search engines was the
notion of PageRank technology. There have been many other talks
on the details, which aren't important here, but it does a
calculation based on the structure of links, which leads certain
kinds of pages that have lots of links to them from certain other
pages that have lots of links to get moved up in the ranking, as
opposed to others that therefore move down.
Now it's only one of the parts of the ranking. So it's not
like that's the magic algorithm that does everything. But if
you think about what it does, it does have a concentrating power.
It says those people who get more attention get more attention
because they got more attention. And you could make an argument
that that's not necessarily the best for diversity.
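[Editor's illustration, not part of the talk.] The concentrating
effect described above can be seen in a small sketch of the textbook
link-analysis iteration usually associated with PageRank. This is an
assumption-laden toy, not Google's actual ranking: the five-page
graph, the page names, the damping value of 0.85, and the function
name pagerank are all made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration sketch of a PageRank-style score.

    links maps each page to the list of pages it links to.
    damping=0.85 is a commonly cited value, assumed here.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal attention

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its current score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "hub"; nobody links to "loner".
links = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "loner": ["c"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:6s} {score:.3f}")
```

In this toy graph the heavily linked "hub" page ends up with the
largest score and passes much of it on to the pages it links to,
while "loner", which nobody links to, stays at the baseline. That is
the "more attention because they got more attention" dynamic in
miniature.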
So I had a student who did an honors thesis in science, technology,
and society at Stanford, a very astute sort of analysis of search
engines. So this is from his bachelor's thesis, actually. "While
the market mechanism is intended to most fully satisfy the pre-existing
preferences of consumers," that's what gets liked gets
seen more and therefore gets liked more.
"The deliberative ideal," so here we're talking about
issues of democracy, about how people find and debate information
in the world, "requires that individuals be also exposed
to material that is contrary to those predispositions. What's
good for consumers in other words, is not always what's good
for citizens."
So this is the kind of ethical question that you then have
to balance. Is it good? What's good for the public benefit in
terms of a diversity of views that may or may not show up based
on a particular algorithm that is perfectly neutral in the sense
that it's not designed to have right wing versus left wing or
Chinese versus Tibetan or whatever it is. But which implicitly-just
because of the nature of the way the calculations are done-has
this kind of bias let's say for things that are better known
than things that aren't. You could make arguments that that's
better for most users as well, but it is an ethical question
that you're dealing with.
I'm just going to go through a series of questions here and
we can talk more about them in the discussions. Because to put
it in simple terms, I have more questions than answers. I don't
know what the right answers are, and I think that all of the
search engine companies in their own ways are struggling with
these as they go.
So what does it mean to own information? You heard about the
case that Google recently lost for somebody whose pornographic
images, actually they felt were getting unfairly put up in the
search results when people did searches for them. But there
have been more complex kinds of things.
So this is a report from the Financial Times last month. "A
group of newspaper, magazine, and book publishers is accusing
Google and other aggregators of online news stories," this
is back to the Google News, "of unfairly exploiting their
content." Again, I won't read the whole thing, but the
obvious tagline is, "They're building their business on
the back of kleptomania."
So I think it's a very complex question. Anybody who's studied
intellectual property rights and copyright and ownership of
intangible property like this knows that it depends on what
country you're in, what tradition you're in, what kind of material
it is. This is not like you can come up with a simple set of
rules to follow.
And then it's the question for anybody who is displaying content
to figure out what are the ethical things to do. And I think
Google has taken a strong position here based on, as you saw,
that "user is number one" slide that Peter put up,
which says: if we think it's going to be good for the consumers
of content, we're going to push as hard as we can, even if the
people who are, say, the producers of that content, are complaining.
And we believe it's probably good for them in the long run too,
but we're not going to take that into the primary account at
the beginning.
This is a choice that other search engines and other online
information providers are taking differently. Great source for
discussion and debate. We'll move on.
There is a view, this is Richard Stallman, many of you know
of him and this is a sort of famous quote, I believe, "All
generally useful information should be free." You should
have the freedom to copy and adapt it to one's own uses.
And the ideology here is when information is generally useful,
redistributing it makes humanity wealthier, not the owner of
the information wealthier; humanity wealthier, regardless of
who's distributing and no matter who is receiving.
So there are strong views. This is a more subtle one-Stewart
Brand. This is actually earlier than Stallman's, this is quite
a while ago when he was just looking at the future of the Internet.
"Information wants to be expensive because it's so valuable,
but it wants to be free because the cost of getting it out is
getting lower and lower." So you have these two fighting
against each other.
Classical ethical situation. You have two different sets of
values. You could argue both sides, and how do you judge in
a particular case which is the more important?
A third one I want to talk about in a slightly different way
is how much can you-and by 'you' here, I'm addressing these
questions to the search engine providers-control your future?
If you look at issues like maintaining personal information,
this is an article from the Electronic Privacy Information Center,
which is actually an outgrowth at one point of the Computer
Professionals for Social Responsibility, which I was involved
with at the time.
They're looking at what happens when you store online information,
in fact, peoples' e-mail, and you search through it. And a lot
of the debate when Google first started putting advertisements
in e-mail and the computers were going through and processing
it, I think it was misdirected because it really, the fact that
an Internet mail provider is storing your mail in the first
place creates this opportunity.
And whether they are using some algorithm that puts ads on
it or not is not the key. But it can be subpoenaed. That is,
it says here, "Scanning personal communications in the
way Google is proposing is letting the proverbial genie out
of the bottle. Google could tomorrow by choice or by court order
employ its scanning systems for law enforcement." So what
is stopping Google from searching your e-mail to see if you
use the word 'bomb' or whatever it is in it?
The answer is, the people who are there don't intend to do
it. And I trust them. I do know a lot of the people and I know
that's sincere. But how much can they control that? What's going
to happen, first of all, when there's a precedent so that some
kinds of scanning are done, which makes the court say, "Well,
if this kind of scanning is done, we're not going to stop that
kind of scanning."
And what happens when that same company is taken over by different
people or is put under different constraints by the government?
That is, how much are you responsible in your conduct today
to take steps that would prevent ethical problems in the future
that you can't control?
And I think this is a big one. One of the things that Peter
talked about is Google's logs. Google keeps extensive logs of
searches, not with people, not with your name unless you've
signed up specifically for personal search, but, in fact, data-mining
techniques could be used in those logs to, as they like to say
in the intelligence business, connect the dots.
They are resisting a subpoena from the Department of Justice
right now to collect for the government, to sort of harvest
that kind of information for a particular use which has to do
with pornography, which isn't the issue. It's who has the right
to go take those logs and do data mining over them and find
things out?
As long as the logs are there and being collected, it's very
hard for a company to have control. Ultimately, as Peter said,
being evil doesn't always, or not being evil doesn't always
mean following the law. But not going to jail usually means
not following or following or however it goes, right?
You don't want to go to jail; you've got to follow the law.
So there is, I think, a big ethical question about can you do
things which, as long as you control them, you have perfectly
good ethical justification to believe they will be fine, but
where you really cannot be sure you'll control them. How much
do you worry about the worst case? How much do you worry about
disasters?
And the sort of final question that I want to raise here is:
to whom do you pledge allegiance? And there are a variety of
possible answers here. So the American or the world corporate
system, the capital system, basically says it is the goal of
the people who run a company to be responsible to the stockholders
who own that company. Their duty is to provide value to those
stockholders.
Now as we have seen in endless cases and discussions, I'm not
going to get into a discussion of corporate ethics broadly here,
there are limits to that. There are cases where you say, "This
is not going to make the most money for the stockholders, but
it's better for some other societal reasons," and some
of those are often ethical.
Google again has sort of tried to set itself apart from a lot
of other companies in taking a strong stand on that. And it's
a constant question that comes up in looking at it. This is
actually a quote, the "don't be evil" quote of Sergey.
Second are governments. To what extent are you, as a company
running a search engine, obedient or responsible to your government?
And we've just seen the discussion about China. If you're going
to operate in China, you do have to be responsible to the Chinese
government. That's part of being a lawful citizen.
When, and this is again one of these deep, deep ethical questions
that's just sort of a little bit on the surface here, which
is when do the laws and the rights of governments go against
the ethics of the people? When is it right to resist as opposed
to following?
And when it's right to resist, sometimes is because you see
a larger view. I said a little while ago that I was going to
comment on why I think at least a certain population, and by
this I mean the ones that I see around me. People who are in
computing in the information world found this China question
so intense.
In fact, the kind of censorship that's going on in China is
not completely out of range with what's going on in other places,
other settings. But I think it really triggered a kind of final
battle between two views. And I want to read a couple of longer
quotes here because I think they get to this sort of heart of
what makes these have, these issues have a kind of bite for
a lot of people.
This is from an article by Introna and Nissenbaum who in fact
have been participants here in the center. They talk about the
ideal Web. They say, "It's not merely a new communications
infrastructure offering bandwidth, speed, connectivity and so
on, but is a platform for social justice.
"It promises access to the kind of information that aids
upward social mobility and helps people make better decisions
about politics, health, education, and more. It facilitates
associations and communications that could empower and give
voice to those who traditionally have been weaker and ignored.
"It will empower the traditionally disempowered, giving
them access both to typically unreachable nodes of power and
to previously inaccessible troves of information."
Now this is pretty utopian. This is a view that says this is
not the usual build a bunch of telecommunications infrastructure
and people will use it and make money on it. It's saying there's
something different. The Internet really isn't the same old
kind of thing. There's a shift, a change of mode.
A much stronger version of this, which some of you may have
seen, I think it's overstated but in a way that I think captures
the spirit, is this Declaration of Independence of Cyberspace.
Some of you may even know of John Perry Barlow. He was, among
other things, a songwriter for the Grateful Dead and a cattle
rancher in Wyoming. He's a very interesting guy, very smart.
And this is now back in really the early days of the Web sort
of catching hold. "Governments of the industrial world,
you weary giants of flesh and steel, I come from cyberspace,
the new home of mind. On behalf of the future, I ask you of
the past to leave us alone. You are not welcome among us. You
have no sovereignty where we gather.
"We have no elected government, nor are we likely to have
one. So I address you with no greater authority than that with
which liberty itself always speaks. I declare the global social
space we are building to be naturally independent of the tyrannies
you seek to impose on us. You have no moral right to rule us,
nor do you possess any method of enforcement that we have true
reason to fear." And so on.
Is it big enough to read from the back? You can read the rest.
This is a really big strand of thought, especially in the early
days of the Internet and the Web, which was that this was going
to be the opportunity to really change the world social order.
It's not a question of supporting industries; it's not a question
of online commerce. It's not a question of just being able to
talk to people somewhere else faster. It's really going to change,
turn the power relations around so that we, the citizens of
cyberspace, have more power than you, the government of China.
And I think that that is being questioned now in a way that
it hadn't been up until now. I think that the whole Internet
structure is being questioned by governments and I think these
things like censorship. And it's bringing the realization that,
in a way, you can't escape reality. The ethical questions of
how do you deal with governments and power and politics are
not void in cyberspace. They just play out in somewhat different
ways.
And I think all of the kinds of ethical questions that have
come up in centuries or decades or millennia of thinking about
people and democracy and governments are just as active here.
They don't go away. They become strong.
The danger is, and I just wanted to comment to this, it's sort
of the Barlow thing, that when you start to believe that what
you're doing is for the good of humanity in general, then how
do you judge that? What kinds of institutions are there? Governments
are institutions. Religions are institutions. They all tell
us in their ways what it is that constitutes appropriate ethical
behavior.
As he said, "We, the citizens of cyberspace have no elected
government, and we're not going to have one." So does it
just become a matter of personal preferences? And I liked this,
this is a quote from a Playboy interview that was famous. It's,
I think, the only Playboy interview that's ever been filed as
part of a Securities and Exchange Commission stock filing because
it came out at the wrong time. And there's a whole bunch of
stuff about that. But it was part of the public debut of Google
as a public company.
And it's an interview and Sergey says, "For example, we
don't accept ads for hard liquor, but we accept ads for wine.
It's just a personal preference. We don't allow gun ads and
the gun lobby got upset about that. We don't try to put our
sense of ethics into the search results, but we do when it comes
to advertising."
So again, they're making a clear division and I admire this,
to say, "Okay. The search results are not going to have
our biases." But even in the advertising, as soon as you
say, "I'm doing this for the good of society," somebody
says, "Guns are bad." Somebody else says, "Guns
are good." Who decides among those?
If it's the stockholders, if it's the corporations, then the
question is, "Which makes you more money?" It may
be hard to calculate, but it has a clear kind of answer. If
it's individual leaders, if you believe Sergey Brin, then you
say, "Well, he doesn't like hard liquor, so that's good."
But of course then you have leaders on all sides of issues.
Is it governments? Is it institutions? Is it just the way the
society works out? And I think that because of this mythical
origin I'll call it of cyberspace, these questions have really
come up in a new and active way. And I think it's one of the
things that makes it a fun topic for people who are looking
at ethics to see how they apply. That the old questions which
have been in a context of governments, businesses and so on,
really have this additional dimension. So I'll be interested
to hear what you all have to say.
MR. HANSON: Thank you, Terry. Geoff Bowker.
MR. GEOFF BOWKER: Okay. Hi. You've all been very patient
listening to these two wonderful speakers before, so I'm going
to race through a little bit what I have to say so that we've
got good time for question and panel discussion at the end.
I'm going to cover three issues relatively quickly. The rights
of a state to ask for and control information. The role of the
information provider to make information available, either to
the states or to citizens. And finally I'm going to talk a little
bit about the problem of overseeing highly complex technologies,
the kind of technologies that we're playing with today.
But let me start with, this is just a picture from last year.
And if you went on Microsoft Networks and you tried to work
out how to get from Haugesund in Norway to Trondheim in Norway,
you would find out that the best way to go would be south through
Stockholm, across to Berlin, across into London, up to Amsterdam.
And I started to get worried when I was in Newcastle, I think,
and you're being asked to take the ferry back across. And finally,
you'll find your way to Trondheim. It's a distance, according
to Microsoft Map Serve, of 1,695 miles.
Today you can do that same search and find it's only 476 miles
long. The problem is that the information that's presented to
us, these are becoming highly trusted sites. We believe in them.
They take a certain moral weight, they do have a weight in our
society. And unless there's something obvious like, "Oh
my God, I shouldn't be going to England in order to get to another
place in Norway," then how do you challenge them? How do
you question them? How do you say what's wrong about them? And
that's really what I'm going to be talking about today.
First point I'm going to make is a little bit, I think, well,
I think it's highly sympathetic to one of Peter's main points.
That I really do separate off issues of protecting the users
and protecting content. States, there's been a lot of hypocrisy,
to put it mildly perhaps, in the discussions about China and
censorship.
States are in the business of censorship and they always have
been. The American state's done very well. We banned Ulysses,
which was selected by the Modern Library as the best novel of
the 20th century. It was banned as obscene for 15 years. Voltaire's
Candide, written over 200 years ago, was banned and seized.
Aristophanes' Lysistrata, Chaucer's Canterbury Tales. These
are all highly recommended books. I love them all. I'm ashamed
to say I've read all of the obscene books here.
They were banned at various times. Jean-Jacques Rousseau made
the trifecta. He was banned in this country, he was banned by
the Catholic Church Index of Prohibited Books and he was banned
in the Soviet Union.
States do this. They try and control the information that their
populace can get access to. Is that a good thing? No, it's not.
Is it the personal responsibility of folks in Google to make
certain that the state isn't involved in this sort of behavior?
Absolutely not.
That is not the responsibility of an individual company, it's
the responsibility of us all to take care of our own democratic
rights. And it's the responsibility of corporate America, perhaps
to make a joint statement about this. But we cannot fight state
censorship through the corporations alone.
I'm just going to zip through some other aspects of censorship
today. If you went, in 2002, to Google in China and typed www.google.com,
you actually got a site at Peking University which was a site
which had its own kind of search engine attached to it.
Not so today. Now as we know, you actually have pretty good
access to a lot of information through Google China. Now here's,
if I do democracy, which is one of those wonderful play words,
if I do democracy on Google.com, I'll find 145 million hits
in about 0.1 seconds. If I do that same search in Google China,
I'll get 139 million hits. So there are six million hits different.
Which six million? How am I going to get access to those six
million? And I do somewhat take to task, if you'd let me go
forward a little bit here, take to task a little what Peter
was saying about transparency here. Because yes, there is some
transparency, the Chilling Effects Website, which is a very
interesting site and I highly recommend it to you.
However, if you burrow down a little bit and you see this February
2, 2006, we have sender information, private, recipient information,
private. On February 2, 2006, Google received a complaint about
Web pages that allegedly violated section 86.
It's very useful that that's there, but it's really not telling
me very much about what exactly is being banned, why is it being
banned, who has demanded that it be banned. It is a step towards
transparency, but it is certainly not acting as full transparency.
So a second issue which has come up lately with Google which
I've been highly sympathetic with their stand on, and which
I think is one which has received less play than it might, but
I think it's actually a thin end of the wedge-style issue, is
Google's refusal to accept the subpoena of their search records
from the American Department of Justice.
I see this as a very long tradition of government overreach
of trying to get hold of personal information about searches,
personal information about our own information behavior. The
American Library Association has been fighting this since 1938
when it first came up with its statement about protecting library
records. It is now in the Code of Ethics of the American Library
Association. This is the sort of body that should be dealing
with this, not the corporations. Number three: "We protect each
library user's right to privacy and confidentiality with respect
to information sought or received and resources consulted, borrowed,
acquired or transmitted."
And they've not unfortunately been all that successful. They
say right now they're extremely worried. In June 2005, they
say that since September 11, since October 2001, actually,
they've had 63 legal requests for records in public
libraries and 74 in academic libraries. This is a constant
battle; it's not just a search engine front. This is a front
where we need to think as a society about what our relationship
is with our information, how we use it, how we share it, who
should know about it.
I'm going to touch on a final issue. It's a somewhat hairy
one, but I'm going to have to go through it quickly just in
the interest of discussion. There's a wonderful cataloger, Sanford
Berman, who's been attacking the Library of Congress and the
Dewey Decimal System for a number of years. And he talks about
this wonderful concept "bibliocide by cataloging."
What he means by this is if you can't get access to the book,
if it's not well enough catalogued, well enough defined, it
doesn't matter that it's in your library, nobody's ever going
to know that it's there in the first place. So a standard cataloguing
record takes a radical book about street art, graffiti, ethnic
art in society and mural painting and just makes it available
through art in society, art amateur, and street art United States.
Now one of the great advantages of Google is that they don't
fall into that catalog trap. They do allow you to get free access
to lots of information, but they do end up doing a huge amount
of filtering. Your default on Google preferences, no, your default
is to use moderate filtering. Filter explicit images only, default
behavior.
That's highly problematic. Most people, when they use Google,
don't even know they have that set of preferences going. Sure,
they can change it and you can change so you use either no filtering
or strict filtering, but most of you without knowing it are
actually engaged in moderate filtering.
You have to be a highly engaged user to understand the issues
and to understand the capabilities that Google is very well
and rightly making available to you.
Let's look at some problems with this. So if I do an unfiltered
search on breast cancer, breast cancer is always one that the
search engines have had difficulty with by the way, with their
safety programs, their nanny programs.
I'll find on the first page of the unfiltered Google, the breast
cancer site that is founded to help offer free mammograms to
underprivileged women. You go to the site, you click on an advertisement
and that provides money, which will then be filtered into providing
free mammograms for underprivileged women.
If I go to filtered Google, that site's not there. It's not
only not there in the first page, it is not there, full stop.
That site has been filtered out. Now I went to that site today,
this was just, it's one of those operations well, I'll just
try it for myself. Went to the site, I can't see anything wrong
with it. Maybe that picture is a little bit risqué down
there in the bottom right, but I'm sure that Google could probably,
most kids could probably handle it.
This is a mistake. They probably don't want their filter mechanism
to be doing this. No I don't want to be running that. Go away.
They probably don't want, they don't want to be doing it, but
it is happening. Who's got the control on this? How can we develop
good controls so that we have good public understanding of these
highly complex technical issues and ways of dealing with them,
which are rich within our democracy?
Within our democracy today, within our lives today, our information
infrastructure is central to the way in which we live and it's
central to the way in which we act as citizens. Google has taken,
I think, a very good and strong lead in protecting users, however,
there are problems with Google.
There are problems with all search engines. There are problems
that we as a society face, which I think we should have free
and open discourse about right now. And I'm very grateful
to all of you for taking part in that tonight. Thank you.
MR. HANSON: Thank you. Let me invite our panel to come
up front and I'm going to remain over here and ask your questions.
And in advance we've already got more questions than we can
handle given the time. But I will do my best to get all of the
themes into this.
Let me just highlight for you, please do write down more questions
and they will be picked up and handed to me. Secondly, there
will be time for informal discussion after our question and
answer period, because we're going to have a reception next
door in the adjoining room. You all are welcome to join us for
that reception.
Let me get right into it. There was discussion, Peter, you
mentioned that Google was in favor indeed of an industry-wide
effort to develop a set of principles regarding dealing with
government censorship. Let me ask you and the other panelists
to comment on what are the key elements that ought to be in
such a code or in such a cooperative effort. What areas do you
expect it to cover? Perhaps, what behaviors should it prohibit
or declare to be evil?
MR. NORVIG: So I guess we're just starting up the conversations
now with some of the other companies and I think it's really
the things we've already addressed. There are issues of protecting
privacy of users and issues of censorship. And what they can
stake or what the legal status of them would be. But we're in
discussions to try to work that out.
MR. HANSON: Do you Terry or Geoff have suggestions for
Google and the other companies in the industry as to things
that definitely ought to be addressed in such a cooperative
code?
MR. WINOGRAD: Well, I have a concern and it has to do
with the international structure. There is a certain attitude
towards free speech and free information that we all sort of
take for granted being in this culture, that's very, very different
from what you have in a closed culture, like Iran or China.
And even somewhat different from what they have in Germany and
France and so on.
And there was a recent trial of a Holocaust denier going to
jail for what we might consider free speech, and so on. And
I think that it's very important for this not to become another
place where the rest of the world sees it as Americans trying
to impose their particular values on everybody. But it's hard,
because the companies don't have national reach. They have global
reach.
And I think trying to come up with ways that can take into
account the whole world-global, cultural differences along with
the principles we believe in. You don't want to say, "Well,
we give those up." I think it's a very hard problem.
MR. BOWKER: Yeah, I'd agree with what both Peter and
Terry had said. I think all that I'd add particularly is the
issue of transparency, which I raised before, which I think
is a highly complex issue. But the industry as a whole should
develop ways in which it's obvious how and why censorship is
occurring or when and how records are actually being made available.
MR. HANSON: When Terry was talking about choices Google
had made, Peter, he said that you've chosen to favor the consumer
as you said, even to the point of outraging some of the content
providers. Now you have three professors here who are all content
providers and write books, and we'd like you to pay attention
to us as well and not always to favor the user, rather than
us. What kind of arguments would you give us that indeed you
ought to pay attention to the user rather than to wise content
providers like ourselves?
MR. NORVIG: Right. Well, I'm a book publisher too, or
a book author, so I have that interest at heart. But I think
in most of these cases, the providers of information themselves
are on our side. That they want more access to their information,
and the way we're providing that most of them seem to agree
to.
The place where we're being hit back is not from the authors
of the information but it's from the aggregators of them. So
for example, in the newspaper case where we aggregate stories
from all the newspapers on our Google News page, if you go to
the newspapers, they love it. They're getting lots of click
through from those headlines that we show on the page, and they
want more of it.
The people that are objecting are like the news services, the
AP and particularly the French agency where they say, "No.
We don't want that." And they could certainly stop it.
If they wanted to, they could just say so in their robots.txt or in
the meta tags; they could tell us, "Don't index these pages."
But the newspapers want us to index it. It's these aggregators
who want more control and they want, they're following the money.
They're saying, "Google's making profit off of this and
we aren't, so what can we do about that?"
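[Editor's illustration: the opt-out Mr. Norvig mentions is the
robots.txt convention, a plain-text file at a site's root that
tells crawlers which paths to skip. The sketch below uses Python's
standard urllib.robotparser to show how a crawler might consult
such a file. The host news.example.com, the /wire/ path, and the
crawler name ExampleBot are hypothetical; this is not a description
of any search engine's actual crawler.]

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a wire service might publish to keep
# crawlers out of its /wire/ section while leaving the rest open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wire/
"""

def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# "ExampleBot" and both URLs are made up for illustration.
blocked = allowed(ROBOTS_TXT, "ExampleBot",
                  "https://news.example.com/wire/story.html")
open_page = allowed(ROBOTS_TXT, "ExampleBot",
                    "https://news.example.com/local/story.html")
print(blocked, open_page)
```

[The per-page alternative is a robots meta tag with the value
"noindex" in the page's head, which asks crawlers not to index
that one page.]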
MR. HANSON: And to your co-panelists, do you agree with
that position that, indeed, you're glad to have your content
shared? And do you believe there's ever a circumstance in which
Google should be paying for your intellectual property?
MR. WINOGRAD: Well, Google is not, they shouldn't be
paying in the sense they should either be not taking advantage
of it or it should be free. They're not going to be in the business
of a paid intermediary. But from a personal point of view, and
I think this would be true for most people that I know, the
feeling is you'd much rather have them making it accessible
and visible to the world. And some of them might buy the books,
but it's a net gain to everybody.
I think the big question that comes up here is that you have
different opinions about how the future is going to work out
and the question is whose opinion gets taken into account? It
goes back to the question of who do you trust?
So Google says, and I actually believe most of this, that if
you do this kind of broad indexing and putting everything online
and so on, it's going to be good for everybody. The providers
will, in fact, make more; the booksellers will sell more. But
people who don't have that same view say, "We don't want
to be forced to believe that now. We want to still have some
control because we have institutions, we have courts,"
and so on. And that's where it's being fought out.
MR. BOWKER: Yeah, I think it's one of the weird things
in academia especially is that we've got this huge publishing
industry around the journals, which is relatively unnecessary
and relatively not providing enough value added to really make
it worthwhile.
As someone who writes a lot, I'm desperate for people to read
me. I'll do anything to have you read me and have anything that
I have made available. I have no problem with that.
The bottom line as far as publishers are concerned in general,
it's been shown that when you put full text of books up, people
will buy more. So I think Terry's quite right in his thinking
about that. Let me stop there.
MR. HANSON: This panel is very Google-oriented, one
of our audience participants mentions. And so the question is
that I'll add to that: Are the strategies or the choices made
by other search engines different than those made by Google?
And how would you characterize them and would you critique them?
MR. BOWKER: Well, maybe one of us should go before Peter
on that.
MR. NORVIG: I think I can pass on this one. You guys
can handle it.
MR. BOWKER: There are certainly differences in China,
for example, where Yahoo has released user records that led
to someone's arrest. I've not followed that case in huge detail,
but I think that's very different in terms of protecting the
rights of the users.
There are differences in this country between Microsoft
and Yahoo with respect to Google on the subpoena for the search
terms. And I actually, as I said before, I strongly support
Google's stand on that. Although it has been argued and truth
in advertising that Google's stand was more about protecting
their algorithm than protecting the rights of their users in
this case.
But whatever, I think their stand was absolutely right.
MR. WINOGRAD: Yeah, I think there is a difference in
tone with some of the others, Yahoo in particular, on the issue
that Peter raised about who advises you on what you should be
reading. And that Google has taken this very strong stand, which
says, "We're not going to say this particular thing is
better than that; let's put it first."
And I think that Yahoo in particular, but also Microsoft to
some extent, has taken much more the traditional, what you think
of as the newspaper view, which is you're going to hire editors
who really have good judgment as to what should be first. And
you go to them because you're buying their judgment rather than
their objectivity.
And if those are clear, if you know that's what you're getting,
as I said before, that's not bad, but I think those get mixed
up because people do think of a search engine as being neutral
in more of a sense.
MR. BOWKER: Can I just jump in there because there was
something I actually somewhat disagree with what you said, Peter.
That I think just because you have an algorithm doesn't mean
it's objective. And just because you have an algorithm doesn't
mean there are not social, political, and ethical values written
in. It's just that it's hidden in the algorithm rather than
in the mind of the person that's making the decision-the editorial
choice.
The editorial choices are still there, and they're much harder
to access when they're in a technical algorithm most of us can't
understand and which are actually secret, than they are if we
could actually have access to the real live people making the
editorial choices. So I'm not quite certain that Google's position
is as strong there as it might be.
MR. HANSON: Peter, would you like to respond to that?
Are your biases all hidden in the algorithms?
MR. NORVIG: So obviously there are biases because somebody's
got to come out number one and everyone else doesn't. And I
think Terry made a good point that there is this bias towards
pages that get links from other sites, and that's part
of the PageRank part of the algorithm. There are lots of other
parts to the algorithm that have other kinds of biases.
And I think that's right that we do our best to come up with
what we think are the best results, but ultimately someone's
got to make a decision of what goes into the algorithm and what
doesn't.
I think where we do have a more defensible stance is when we
get criticism, and we've gotten this from all sides. We've said,
we have people saying, "Well, Google's too liberal."
"Google's too conservative." "Google's too libertarian."
And I can tell you because I've looked at the code, that there
are no lines of code that say, "If this page is liberal
or conservative, then do this."
Now there may be subtle effects that would promote one page
or another, but there's no bias of that kind.
MR. HANSON: All right. This question has to do with
the China operation and operations like that. How can you guarantee
that the IP addresses for searches in China are going to be
safe either because you've moved them offshore? How do you know
they're not being maintained in other ways within China? And
by the way, why do you keep track of the IP addresses of searches
anyway?
MR. NORVIG: So you're right. We don't know for sure.
If the Chinese government or any other government wants to track
you, they can track you through your ISP and make a request
for the ISP to release that information. And then we have no
say over that. So you're not 100 percent safe and, I think,
in most cases, if I were a law enforcement officer going after an individual,
I would probably go to the ISP first before I went to the search
engine, because then you've got all the users' interactions,
not just his searches. So that would probably be the first place
you'd look.
As to the question of why do we keep it, it's to help us improve.
So we look at our logs, we say, what people have been making
what kind of queries, what results are they getting? Are those
the right results? And we keep on trying to improve.
And we need the IP address for two reasons. One is because
we want to be able to look at people's sessions and we need
something to identify them. We want to be able to say, "Well,
look, this one person made, searched, A, B, C, and D, it looks
like they're frustrated and they're not getting the right answer."
And then they've finally made search F and they got the right
answer.
So we'd look at cases like that and say, "Can we automate
that?" Can we go directly from search A to the results
on F that you were happy with? And in order to do that, we have
to know-is this the same person making the same search as was
making the one before? So we need some kind of identifying information.
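[Editor's illustration: a minimal sketch of the kind of session
analysis described above, assuming a log of (IP address, time,
query) records. The 30-minute gap, the function names, and the
sample log are all assumptions made for illustration, not Google's
method. An IP address is only a rough stand-in for one person,
since several people can share one address.]

```python
from collections import defaultdict

# Assumed cutoff: a pause longer than this starts a new session.
SESSION_GAP_SECONDS = 30 * 60

def split_sessions(log):
    """Group (ip, unix_time, query) records into per-IP sessions."""
    by_ip = defaultdict(list)
    for ip, ts, query in log:
        by_ip[ip].append((ts, query))

    sessions = []
    for ip, events in by_ip.items():
        events.sort()  # time order within one IP
        current, last_ts = [], None
        for ts, query in events:
            if last_ts is not None and ts - last_ts > SESSION_GAP_SECONDS:
                sessions.append((ip, current))
                current = []
            current.append(query)
            last_ts = ts
        sessions.append((ip, current))
    return sessions

def first_and_last(sessions):
    """Pair each multi-query session's first query with its last one."""
    return [(qs[0], qs[-1]) for _, qs in sessions if len(qs) > 1]

# Hypothetical log: one user reformulates twice, another searches once.
LOG = [
    ("192.0.2.7", 0, "jaguar"),
    ("192.0.2.7", 40, "jaguar speed"),
    ("192.0.2.7", 95, "jaguar animal top speed"),
    ("198.51.100.3", 50, "weather"),
    ("192.0.2.7", 90000, "train times"),
]
sessions = split_sessions(LOG)
pairs = first_and_last(sessions)
print(pairs)
```

[Each pair links the query a frustrated user started with to the
one they ended on, which is the raw material for asking whether
the first query could have led straight to the final results.]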
And the other thing about having an IP address is that you
can then localize it. You can map from an IP address to a country
and usually to a city and then you can say, "Well, what
are the right results there?" And then the right results
in this country or in this city may be different than the right
results someplace else.
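[Editor's illustration: mapping an IP address to a place is, at
its simplest, a lookup of the address against a table of network
prefixes. The sketch below uses Python's standard ipaddress
module. The table is hypothetical: the networks are reserved
documentation ranges and the places are made up. Real
geolocation data is far larger and less certain.]

```python
import ipaddress

# Hypothetical prefix table. The networks are reserved documentation
# ranges and the places are made up for illustration.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), ("US", "Santa Clara")),
    (ipaddress.ip_network("198.51.100.0/24"), ("FR", "Paris")),
    (ipaddress.ip_network("203.0.113.0/24"), ("DE", None)),  # city unknown
]

def locate(ip):
    """Return (country, city) for an IP string, or None if unmapped."""
    addr = ipaddress.ip_address(ip)
    for network, place in GEO_TABLE:
        if addr in network:
            return place
    return None

print(locate("198.51.100.20"))
print(locate("10.1.2.3"))
```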
MR. HANSON: Any comment from either of our other panelists?
Throughout, and this relates to the question of your trade secrets
and your potential for commercialization. What's the proper
balance between accountability by being as transparent as possible
about everything that you do and maintaining secrecy around
the algorithm and around some of your data so that you yourself
can use it either to improve your product or to develop new
products, data products?
MR. NORVIG: Right. So the pledge we have made is to
try to deliver the best, unbiased results and we've chosen to
keep the methodology for that secret for two reasons. One is
competitive advantage in that we've put a lot of work into it
and we wouldn't want to publish it and have other competitors
being able to use it for free.
And another one is one I think Terry touched on, is this adversarial
relationship. That for every search, there are thousands of
people out there who think they should be the number one result
for that search. And some of them just hope for the best, but
others go to sort of devious means to try to promote themselves
up to number one.
And if they knew exactly what we were doing, then they would
write their page in order to move up in a way that we think
would be unfair. So we have to, it's sort of an ongoing competitive
war where they try some move and we have to try a counter move.
They have the advantage of not telling us what they're doing
and we need that same advantage in order to stay on top of that
battle.
MR. BOWKER: I fully understand that and I think that
obviously they would want to keep the algorithm secret. However,
I think it's a little bit like wanting to have access to cataloguing
rules in a library. If it's going to be so important to us,
I would at least like a trusted third party of some kind to
have some kind of oversight of the algorithm.
Certainly not that I distrust Google, but that I think that
when you have a company which is taking on such incredible importance
in our society right now, and incredible importance in our relationship
with our information rules around us, then I think it wouldn't
hurt to have some kind of an oversight. But that oversight should
not be one that is making the algorithm public, for all the
reasons that Peter has given.
MR. WINOGRAD: I think it's also important just to be
aware of what's in the algorithm in general. It doesn't, as
Peter said, say if it's liberal or conservative. It's a complicated
mathematical structure with lots of different factors like how
many words appear how close to each other, in what part of the
document and so on.
So even if you or somebody who knows computer science, somebody
who knows algorithms were to read that, they wouldn't have the
faintest idea how it would rank two particular things that they
saw.
MR. NORVIG: Yeah. So I guess I-
MR. WINOGRAD: [Interposing] It really is not couched
in human terms.
MR. NORVIG: Right. So I would agree with your point
that I think it would be interesting to think of some kind of
an oversight committee that could audit the code and say that
yes, this is reliable. And I guess I wouldn't have objection
to that. If they actually could audit it and understand it well,
we'd want to hire them.
MR. HANSON: One more question regarding the algorithm.
Isn't it true that many of the results are very close in the
ranking that they would get? And if that's true and if there
is such power in being number one or being on the first page,
wouldn't it be fair or fairer to have a random assignment of
the top category or the top group of search results?
MR. NORVIG: Yeah, and we do use some randomization and
experimentation in our results. So at any one time, we're probably
running dozens of different experiments where we're trying out
variations to see is this variation going to be better than
the standard one? So you do see a lot of churn and mix, both
because of our changes in the algorithms and also because of
the changes in the Web. So the results that are number one today
may be different than the results tomorrow for very subtle reasons
having to do with both changes in the link structure of the
Web and with the changes we're experimenting with.
MR. HANSON: But not due to deliberate randomization?
MR. NORVIG: There is some randomization part in it,
but to a limited extent.
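[Editor's illustration: the questioner's proposal, random
ordering among results whose scores are nearly tied, can be
sketched as below. This illustrates the proposal only; nothing in
the discussion says this is how Google randomizes. The function
name, the tolerance value, and the sample scores are assumptions.]

```python
import random

def shuffle_near_ties(ranked, tolerance, rng):
    """Shuffle runs of results whose scores are within `tolerance`
    of the best score in the run.

    ranked: list of (url, score) pairs sorted by score, descending.
    """
    out = []
    i = 0
    while i < len(ranked):
        top_score = ranked[i][1]
        j = i
        # Extend the run while scores stay close to the run's top score.
        while j < len(ranked) and top_score - ranked[j][1] <= tolerance:
            j += 1
        group = ranked[i:j]
        rng.shuffle(group)  # order inside a near-tie is random
        out.extend(group)
        i = j
    return out

# Hypothetical scores: a, b, c are nearly tied; d is clearly behind.
RANKED = [("a", 0.910), ("b", 0.905), ("c", 0.900), ("d", 0.700)]
result = shuffle_near_ties(RANKED, tolerance=0.02, rng=random.Random(0))
print(result)
```

[The three near-tied pages trade places from run to run, while
the clearly weaker page never jumps ahead of them.]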
MR. BOWKER: Can I just comment on that for a second? There was
actually a talk Terry gave last year which was extremely interesting
on this topic. And one of the issues with Google is certainly
the representing of unrepresented voices and how that can be done.
Because there is an element of the Matthew effect: "to
them that hath shall be given."
If you already have links into you, you will get more links.
And you will get the Google recognition. So it becomes very
hard for groups from underrepresented parts of the world or
from particular kinds of interest groups to get themselves actually
represented on Google. So I actually think it's extremely good
that they go through some kind of randomization if it's to address
this sort of diversity issue.
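[Editor's illustration: the rich-get-richer dynamic can be seen
in a toy version of link-based ranking. The sketch below is the
textbook PageRank iteration on a five-page graph, not Google's
production algorithm. The page names, the damping value of 0.85,
and the iteration count are assumptions chosen for illustration.]

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by repeated redistribution of rank.

    links: dict mapping each page to the list of pages it links to.
    Every linked-to page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical web: everyone links to "hub"; nobody links to "newcomer".
LINKS = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "newcomer": ["hub"],
}
ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

[The page that already has the most incoming links ends up with
the highest score, and the page nobody links to stays at the
floor no matter what it contains.]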
MR. HANSON: This has to do with the choices that Google
and other search engines do make to screen or filter certain
kinds of results, be it pornographic results in standard searches,
if you want to call it that, or searches seeking for how to
make an atom bomb.
How does Google make those decisions? What set of filters are
put in and is it fair for Google to make those decisions or
shouldn't there be some kind of community input if not government
input into the process of what filters Google chooses to use
on its search results?
MR. NORVIG: So from our point of view, we do two things.
One is the pornography that we filter for and there are settings
for that. And as was pointed out the default setting is to eliminate
images but try to give you all the text. And as was also well pointed
out, that filter is not perfect. It makes errors in both directions.
And then the other thing we deal with is in terms of legality.
And so in France and Germany you can't show Holocaust deniers
and so on. And each location has its own set of laws and we
obey those laws. Other than those, we're for free and open information.
MR. HANSON: Responses from either of the two of you?
Would it be a good idea to have public input to the choices
that Google makes regarding filtering?
MR. WINOGRAD: In principle I think it's a good idea.
I think to just say what Peter said, the things they are actually
filtering now already have public input, from the governments.
So what they can filter in Germany or France, that's by their
laws. It's not the choice of Google. And pornography, you can
imagine the sort of, I don't know who'd want to be a member
of this, right? You'd have fun.
But a commission that basically tried to judge on the Internet
as a whole what's pornographic and what's not. And then all
the search engines would agree to use that. But it's not obvious
that putting that fine a point on it's going to help anybody.
Pornography is a very vague concept, and I'm not sure I want
the legislators to get involved.
MR. HANSON: All right. Anything more? All right. The
next question has to do with the ideal of a completely open
society for information. Is there any hope on the part of any
of you that indeed we will achieve the kind of complete openness
envisioned in some of the eloquent quotations that both Terry
and Geoff used in their presentations?
MR. WINOGRAD: A quick answer is no. I think they're
idealistic. I think that they don't really take into account
the social interactions that people have. And I think that you
can fight against particular abuses of information, ways of
hiding information and so on.
But this sort of, I mean, John Perry Barlow, the fellow who
wrote this, had this view. He said that there should be no privacy.
Anything you ever do should be public, and he likened it to
the small town that he grew up in where you go to the grocery
store and they say, "Oh, did you have fun last night when
you went out with Sara Jane?" and so on. That everybody
knows everything.
And I think the ideals in there are that that would work in
a large setting as opposed to a very small setting. Even in
a small setting, it has pathologies, but it just doesn't, it's
not realistic. And I think a lot of these notions about ideals
of open, completely open information as opposed to fighting
the abuses of it, I think they're just not realistic in that
way.
MR. BOWKER: There are a few aspects on the open information.
One is that there's actually a very good article in The Santa
Clara [SCU's Student-run newspaper] the other week about Facebook.com
and the difficulty that many people are actually leaving traces
on Facebook that are now going to go with them for all of their
lives. People are going to be able to search and find out that
you were the one who, you were the budding alcoholic or you
were the one who liked to go out to the disco on Saturday night
and get raving drunk and things like that.
People are not thinking about their own relationship with their
information rules right now and the sort of traces that they
are leaving in the world. And that's something that we as a
society need to develop some really good morals about and really
good thinking about.
I think our concept of privacy is changing massively and I
think we are losing many of the old concepts of privacy as being
a castle behind which no one shall come.
But on being completely free and open, probably not. That's
probably not going to happen, but let me put another spin on
this and this was with respect to the availability of information.
I get information now at my desktop that it would have taken
me years to get before. I would have had to go to libraries
in 30 different countries to do the sorts of international research
that I want to do.
I would have had to track down sources here and there. I would
not have known who to talk to about what. Right now, I have
fingertip access to massive amounts of information that in the
19th, 18th centuries they could only have dreamed about. And
in diverse form.
So I think that despite all the problems and despite all the
difficulties, this is an absolutely wonderful time as far as
that goes.
MR. NORVIG: So I can say I think there are idealists
and realists in the world and John Perry Barlow is an idealist,
but at Google, we're all engineers, and we're more realists.
And so we'll root for the idealists, but we'll get back to our
day-to-day work.
MR. HANSON: A couple of final questions. Our time is
up but let me get a couple more in. Do you see a future in which
Internet searches will not generally be done by the default
settings but indeed will be very personalized and individuals
will put their values into choices and therefore the search
results will indeed come back with a whole set of values associated
with the individual?
MR. NORVIG: Yes. So certainly you're seeing a lot of
interest in personalization and in social networking related
to search. And I think that is an interesting thing. It's something
we're following very closely-beginning to offer some personalized
services.
I think it works less well for search than it does in other
domains. So when I go to Amazon and I get recommendations, I'm
pretty happy with them, and I think the reason is because it's
a much smaller domain that you're searching. And if I bought
jazz records last week, I'm probably going to want more jazz
records this week and not rap all of a sudden.
But on Web search, it's quite a different thing where every
search you're sort of by definition, you're asking about something
that you don't know about. And many of the searches are about
brand new things, not just more of the same. And so I think
personalization there is less powerful than in some of the other
collaborative filtering applications.
MR. HANSON: And the final question, then, goes to any of
you. Is there any way to control the tendency towards focusing
on fewer and fewer sites because of the self-reinforcing nature
of the search process and the linkages and the emphasis on those
that come out of the top of the searches?
MR. BOWKER: One immediate response on that is one of
the reasons why there's so many problems is that nobody ever
goes past the first page of a hit screen. So they only have
a look at the first 20 results that come back.
If you decide you yourself right now have the ability to go
to the 15th, the 20th, the 30th or the 40th page, which I will
often do as a matter of principle, and I think this is one of
the things where we need as a society to be teaching literacy
in the information age. To be teaching that kind of information
sophistication which allows us to take advantage of the possibilities
that are there.
And I don't think Google's, Google's not necessarily the answer
in that case. I think we need to educate ourselves better.
MR. NORVIG: And I think you want to distinguish between
sites that are sort of encyclopedic versus more newspaper like.
So certainly on our news site, links aren't that important because
every article is new. It's just published and it doesn't have
links to it yet. And you can look at other sites that specialize
in bringing up these kind of random new things.
So things like del.icio.us or Digg or reddit, where they
have users voting on what's interesting today. And there you
get a very wide variety and then those start getting links to
them. I think a similar kind of thing happens in the blogging
community where new things get published and very quickly get
linked to and get pushed up. So I think there are a lot of voices
if you look in different places.
MR. WINOGRAD: I think there's an ecology that develops
over time. Back in the early days, you talked about surfing
the Web. It was not like you were looking for something, you
were just sort of wandering around seeing what waves crashed
and what went on, what was interesting. Because if you wanted
to look for something, it probably wasn't there.
And then that sort of shifted to the other side, which is now
the Web searches. Okay. I need to know this fact, let me go
find it. And in that direct search you're not looking around.
But then there's this sort of ecosystem and these new things
come up, like blogs and like the tagging and so on, which are
like the old surfing. There's a way that you can go and say,
"What's interesting here today?"
So I think there's always going to be that back and forth.