Thursday, March 22, 2007

Next-Generation Search: An Interview with Google's Matt Cutts

Hi Friends,

Here I am posting a recent interview in which Richard MacManus talks with Matt Cutts, head of Google's Webspam team, about next-generation search.
Written by Richard MacManus

Last week I had the pleasure of interviewing the head of Google's Webspam team, Matt Cutts. The topic of our conversation was next-generation search. In my pitch for an interview with someone at Google, I explained that Read/WriteWeb has been covering next-gen search a lot, so it would (obviously) be great to get Google's views on the topic!

Matt Cutts is a well-known Google figure who apparently gets mobbed by fans at SEO conferences. His Wikipedia page states that he co-invented one of Google's best-known patent filings, involving search engines and web spam. One note about the following interview: Google has a policy of not discussing competitors, so a few of my original questions had to be dropped or rephrased.

Richard: When we write about 'next-generation search' on Read/WriteWeb, we often frame it as: how can a startup become the next Google? But obviously Google is also hard at work on next-generation search technologies. Can you give us an overview of what Google is working on in next-gen search, e.g. personalized search, AI, etc.?

Matt: I think personalization has a very high chance of improving search for the average user. One of the great things about it is that you don't really have to do a lot of work. Once you decide this is something you're interested in, Google can take care of a lot of the details. I recently saw a post online where somebody was complaining about having to create metadata, and the nice thing about personalization is that it's free for the user. So as far as the next generation of search goes, I think personalization is very exciting.

Richard: Can you give us a couple of examples of how Google is implementing personalization?

Matt: I think of localization as a type of personalization. If you type in a query like "football", it will give you different results in the US than in the UK. Likewise, a query like "bank" on Google will return New Zealand banks in New Zealand and Australian banks in Australia. Knowing those sorts of things makes a big difference. That's just personalization at the country level, but it already shows the kind of potential you can reach.

Richard: Google also recently tied personalization to Google Accounts, so I believe personalization can now happen through the main Google search?

Matt: Absolutely, yes. It's nice because it simplifies the mental model users have to keep. Now, if you're signed in to Google search, we will be able to help personalize your search results. That's a really nice win, because it's much easier for people to know whether personalization is on. If I don't want personalized search results, I can just click in the top right and sign out. If I am signed in, I can confirm that just by seeing my email address in the top right; then I know I'm benefiting from personalization automatically.

Richard: What do you think about semantic technologies (for example, Hakia)? How important is natural language understanding for search, and is Google doing anything in this direction?

Matt: We pay a lot of attention to many different technologies, so I would describe Google's approach as very pragmatic. We keep an eye on the entire space and ask, 'OK, what are the areas that are most promising for users?' Historically, it's always been interesting to watch the progress of semantic technologies. For example, take a search like 'how many states are in America?' Some search engines that claim to be semantic won't do a good job of delivering the right results, whereas Google can do a very good job, even though you might wonder, 'OK, how can they handle natural language, or the semantics of that search?' I think what Google benefits from is the sheer size of the Web and the sheer amount of data, which really does help us understand the meanings of words and their synonyms. So our approach is pragmatic, and we don't necessarily place all our bets on one particular way of doing things. We are exploring a lot of different things all at once.

Richard: So you would say that Google is already doing that kind of semantic technology, that it's just integrated into the current service you provide?

Matt: Yeah, I would say there's a lot of semantic technology already built in, under the hood of Google.

Richard: One of the most popular posts on R/WW this year was "The Top 100 Alternative Search Engines". What are some of the "alternative" search engines that have impressed you most lately? Or, if you can't mention names, what are some of the technologies that impress you? The February list had 32 changes, which perhaps indicates the sheer speed of innovation in search.

Matt: You also did a really good job in another post, where you ran a poll asking what would be next [in search]. It was interesting that personalized search got 209 votes, followed by artificial intelligence. I think a lot of those trends are very interesting. Because we have a lot of data, we can try things ranging from visualization to clustering or query refinement. Sometimes, if we think it's relevant, we'll take the user's query and suggest other related queries at the bottom of our search results. Google held off on launching that for a while, because we wanted to test it and get the best possible result. It didn't make sense to launch it until we found a combination that we thought was very good for the user. But we do watch a lot of those different technologies and try to stay aware of what people in the industry are doing and trying.

Richard: SearchMash is an experimental site from Google [introduced around Oct/Nov 2006], with some new Ajax-powered UI ideas. Can we expect any of the SearchMash features to be implemented in the main UI any time soon?

Matt: There is a possibility, but no guarantee, that the features you see on SearchMash will show up on Google search. It's always a trade-off: we have to consider how well something is supported by different browsers, how much users like it, and how much screen real estate it takes or how long users need to ramp up on it. For example, there was an interesting feature on SearchMash where you could start typing anywhere on the page and it would start filling in the search box for you. But that wouldn't work in every single browser. I think the big value of SearchMash is that it lets us try a lot of very different user interfaces, things that might throw your average user. We can try out those really unusual interfaces and see how people respond.

Richard: On our Alt Search Engine list, there were some search engines with amazing UIs; one, for example, had a talking avatar. So I guess you could experiment with that kind of UI on SearchMash in the future...

Matt: Yeah, it's fun because once you step off the Google domain, you've got a lot more freedom to try different things, including bringing in image results, results from news, all sorts of fun things. So it's a fun playground to have, and I'm glad that we introduced it.

Richard: Google Base is essentially a database of structured content and is currently home to many different verticals (jobs, vehicles, classifieds). There's also GData and the Google Base API. Can you explain how all these things fit together and what impact, if any, they will have on search going forward? I presume structured data will become very useful for Google search over time, so perhaps you could help our readers understand that a bit more...

Matt: Structured data is certainly really interesting, because once you have data in different fields, you can imagine doing different types of searches over it. GData is especially interesting, because it almost provides a way to plug data into Google, which opens up a lot of interesting possibilities. For example, Google has had a few other types of search, such as patent search, code search and book search, and those are slightly different verticals, a little more free-form. But you could certainly imagine searching over new verticals, and having that fielded search, or structured content (however you want to refer to it), can definitely be really useful in giving people more flexibility. So I'm pretty excited about it, but it's always hard to say which direction things will go in the future.

Richard: Do you have any plans for vertical search beyond blogs? I mean the major verticals: for example, Microsoft recently bought a health search company. So is Google going to do anything in those major verticals?

Matt: Well, there are two answers to that. First, whether you want to call things like patent search, code search and book search "vertical search" is kind of up for dispute. They search over different types of data. For example, being able to search over calendar data in Google Calendar, or over email in Gmail, is an entirely new capability, and really, really interesting. I'll let you decide whether to call that vertical search or not.

My second answer, though, is that I think it's really interesting that Google has taken a step back to look at the general issue of vertical search and, as a result, has introduced Google Custom Search Engine (CSE). It's built on the power of Google Co-op, and the wonderful thing about it is that it lets anybody define their own custom search engine. And not something feeble: we're talking about the ability to add 5,000 URLs very easily, and not just to filter over them, but to boost some sets of URLs and downgrade others.

What's really interesting to me is this: if you think about a new vertical, for example podcasts, you could certainly have Google ask, 'Well, OK, how do we search over podcasts?' But if you go into Google Custom Search Engine, I think dozens of people have actually made their own podcast search engines using the power of CSE. For example, the other day I found a search engine for 'engineering podcasts', so you could search for Google and get all the podcasts about tech talks, etc. I think that's a really interesting approach. We certainly want to return the best results to users, so in some cases it might make sense for Google to look at individual areas. But the general issue is often well addressed by giving the power to the people, so to speak, and letting them build their own search engines. It's really been fun to see how many people have signed up for it, and how much growth the custom search engine area is getting.

Richard: Your particular area of expertise is fighting spam. Can you tell us the latest on how Google is trying to keep its results pure? What are some of the trends in fighting spam?

Matt: Over the last year we've done a lot to return better search results for users, including on web spam. For example, we track internal metrics showing that we're doing a much better job than even a couple of years ago at making sure users don't randomly come across spam. One of the big trends last year, continuing into this year, is internationalization. It's really important for us to be able to offer spam-free search in any language, whether it's French, Italian, German, Chinese or Japanese. So a lot of what my team does is make sure that any new approach we develop can also be applied in a scalable and robust way across many languages. That's probably the biggest trend.

