Searching

Search engines act as our digital viewport to the world’s information. It’s amazing to think of the transition made by society in the last few decades. We’ve moved from paper files stored in a boring greyish cabinet to instantly accessible online information on our devices.

Filing cabinet
We navigate to this online information through a variety of sources. Sometimes we know the exact URL, other times we click on a link from a webpage or social media feed.
 
Nothing has done more to make the internet’s information accessible and navigable than the search engine. For the majority of the world’s searchers this means Google, which holds nearly 90% of the global search market. Did somebody say monopoly?
Search engine global market share
 
Anyway, there are lots of other search engines out there. In China, Baidu is the most popular (and Google is banned). In Russia, Yandex is prominent. For environmentalists, Ecosia may be the foremost choice and DuckDuckGo is popular for those worried about their data. Oh and there’s Bing and Yahoo!, whatever they’re used for.
 
We’ll focus on Google, but the good news is that most search engines today work off the same structure.

What's the purpose of a search engine?

For the searcher, it is to provide the most relevant and valuable online results, according to their individual search query. Search engines exist to allow a searcher to navigate the internet’s information, usually for free.
 
For search engines, the purpose is a little different. Their user base allows them to charge for advertising space and generate revenue. This doesn’t diminish the importance of their value to the user. In fact, it’s the most important factor for them. If a search engine provides the best results for a query, more searchers use it. The more searchers (and searches per day), the more real estate it has to offer for advertising. Couple this with the value offered by its advertising platform and it can charge more, sell more and earn more.
 
So the purpose of a search engine can vary depending on which side of the search you sit. Though at the end of the day, the best search experience wins.

How do Search Engines generate their results?

Search engines work through a process of crawling, indexing and ranking the web’s information. They then present this information to the searcher, usually in the form of links or other types of content.
 
It’s important to note that, for the vast majority of searches, search engines do not provide their own information. Yes, search engines do have features such as Google Translate and conversion tables, but don’t let this mislead you.
 
We’ve all heard someone search and say, “Google says this…”. Most of the time this is incorrect. What Google and other search engines are doing is displaying information provided by other websites. Even Google’s Knowledge Graph is generated from information scraped from other websites.
Google currency conversion table

We can see here that the data for this currency conversion is provided by Morningstar.

Google knowledge graph of SEO

Knowledge Graph information is amalgamated from other sites, such as Wikipedia in this case.

Google translate

Even a Google product like Google Translate sometimes relies on a community to verify things. 

Crawling the web

So we know that search engines are organising the web’s information into a searchable format. We also know that on the whole, they aren’t creating the results themselves. They’re just displaying, and providing access to, other webpages.
 
So how do they find this information in the first place?
 
That’s where crawlers come in. You may also hear them referred to as search engine crawlers, webcrawlers, robots, bots or spiders. Google’s crawler is most often referred to as Googlebot.
 
Search engine crawlers are software applications that systematically navigate the web. Their job is to find and document the internet’s information. They tend to start off with already known sites, or sites where a webmaster has submitted a sitemap to the search engine. The crawler documents the key content and metrics of each page. It then follows hyperlinks to other pages on the website, and hyperlinks to other websites. The process then repeats, allowing the crawlers to explore the web.
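To make that loop concrete, here is a minimal sketch of a crawler in Python. It is not how Googlebot works. It assumes a tiny made-up web held in memory (the `TOY_WEB` dictionary and the `example` URLs are invented for illustration), so nothing is fetched over the network. It starts from a seed URL, pulls out the links on each page and queues any it hasn’t seen before.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
TOY_WEB = {
    "https://a.example/": '<a href="/about">About</a> <a href="https://b.example/">B</a>',
    "https://a.example/about": '<a href="/">Home</a>',
    "https://b.example/": '<a href="https://a.example/about">A about</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch=TOY_WEB.get):
    """Breadth-first crawl from the seeds; returns URLs in the order visited."""
    queue = deque(seed_urls)
    seen = set(seed_urls)   # URLs already queued, so no page is visited twice
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:    # unknown page in the toy web: skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links like "/about"
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


print(crawl(["https://a.example/"]))
```

The `seen` set is what stops the crawler going round in circles when pages link back to each other, which they almost always do.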

Indexing the web

It’s no good having all of these bots crawling the web if there’s nowhere to store the information. It’s a common misunderstanding that the search engine searches the web in real time when you search. This would take far too long. Instead, what you’re actually searching is an index of the web.
 
When a crawler gets to a page it will “render the content of the page, just as a browser does”. The crawler will extract key data from the page, from keywords, headings and title tags, through to page speed, freshness and content. This process is called parsing: the extracted data is sent to the index and the URL is added to a schedule for recrawling.
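As a rough illustration of the extraction step, the sketch below pulls the title tag and headings out of a page using Python’s built-in HTML parser. The `PageSignals` class and `extract_signals` function are names invented for this example, and a real search engine extracts far more than this.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Records the <title> text and the h1-h3 headings of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []      # list of (tag, text) pairs in page order
        self._current = None    # tag we are currently inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.title = text
            else:
                self.headings.append((tag, text))
            self._current = None


def extract_signals(html):
    parser = PageSignals()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "headings": parser.headings}


page = (
    "<html><head><title>How Search Works</title></head>"
    "<body><h1>Crawling</h1><p>Some text</p><h2>Indexing</h2></body></html>"
)
print(extract_signals(page))
```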
 
The important part is that all of this key information about a webpage is then indexed. In Google’s case, this is imaginatively called the Search Index.
 
The Search Index is a way of storing and categorising the web’s information, ready to be retrieved when a user searches. It significantly reduces the time between a user searching and the results being returned. Webpages aren’t just organised in alphabetical order, although exactly how they are organised is a closely guarded secret. All you need to know is that the information is stored in a fashion that makes it far easier to return on the search engine results pages.
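Google’s actual index structure isn’t public, but a classic textbook way to make text quickly searchable is an inverted index: a lookup from each word to the pages that contain it. The sketch below is only that textbook idea, with invented page URLs and text, and it shows why searching an index is so much faster than reading every page at query time.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, invented for this example.
PAGES = {
    "a.example/pasta": "Fresh pasta recipes",
    "a.example/wine": "Wine bar in central London",
    "b.example/italian": "Italian restaurant in central London with fresh pasta",
}

index = build_index(PAGES)
print(sorted(search(index, "central London")))
```

The work of reading every page happens once, when the index is built. Each search after that is just a handful of dictionary lookups.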

Ranking the web

So the search engines have sent out their industrious crawlers to explore the web. They’ve then made a record of everything they find and organised it in their index. They’ve also set up a schedule to recrawl each URL so that they can spot any updates and re-index.
 
We also know that when you enter a search query, you’re searching the index, not the web. The index is a pretty good representation of the web, though: it holds billions of webpages.
 
This is where it gets complex (although it sort of already has). When a search is performed, Google isn’t just matching keywords. We’ve moved far past that. Famously, Google has over 200 ranking factors, although that phrase has been around for at least a decade. With RankBrain, the Knowledge Graph and machine learning, it feels far more organic than that now. Regardless, it’s not a simple process. Google employs a swathe of algorithms to help match your search with the best possible result.
 
It’s these algorithms that are doing the final piece of leg work. They’re looking at your search and deciding what is most relevant and useful. Google state that “the weight of each factor varies depending on the nature of your query”. In other words, what you see on the SERPs is tailored to what you searched… but you already knew that.
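Here is a toy sketch of what “the weight of each factor varies” could look like in code. Everything in it is made up for illustration: the three factors, the weights, the query types and the page scores bear no relation to Google’s real ranking factors. It simply scores each page as a weighted sum and shows the order flipping when the weights change.

```python
# Hypothetical weights per type of query. A news-style query leans on freshness,
# an evergreen query leans on relevance and authority.
WEIGHTS = {
    "news": {"relevance": 0.4, "freshness": 0.5, "authority": 0.1},
    "evergreen": {"relevance": 0.6, "freshness": 0.1, "authority": 0.3},
}

# Hypothetical pages with factor scores between 0 and 1.
PAGES = {
    "old-guide": {"relevance": 0.9, "freshness": 0.1, "authority": 0.8},
    "todays-report": {"relevance": 0.7, "freshness": 1.0, "authority": 0.5},
}


def score(page, weights):
    """Weighted sum of the page's factor scores."""
    return sum(weights[factor] * page[factor] for factor in weights)


def rank(pages, query_type):
    """Return page names ordered from highest to lowest score."""
    weights = WEIGHTS[query_type]
    return sorted(pages, key=lambda name: score(pages[name], weights), reverse=True)


print(rank(PAGES, "news"))
print(rank(PAGES, "evergreen"))
```

The same two pages come back in a different order depending on the kind of query, even though nothing about the pages themselves has changed.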
 
Search engines aren’t just matching the best content with your query. They’re also deciding on what types of content would be most useful. It’s all about matching your searcher intent with results. What is the searcher actually trying to achieve?
 
Let’s look at some examples.
 
This is the example that Google give. If you’re searching for current events, then there may be a heavier weighting towards the freshness of the content. This makes sense considering the rapid time decay associated with current affairs. They may also prefer established news sites, or those with the correct schema markup. In this instance, a search for sports results shows a results table and news stories first.
SERP for sports results

Another example would be location-based search. You might search ‘Italian restaurant in central London’, or ‘Wine bar near me’. This search has a different intent, and therefore will need different weighting. In this scenario, other factors may be more relevant, for example your location or whether the restaurant or wine bar is open. This information may be crawled on the site or be known through Google My Business.

SERP for restaurant search
The point is that search results are more complex than just 10 blue links. Google has invested heavily in maps, news carousels, image carousels, video, the Knowledge Graph, featured snippets and much more. All in the name of providing you the most relevant and valuable results for your specific search. Pretty clever huh?
 
Hopefully that gives you a good overview of how search engines work through their process of crawling, indexing and ranking. We would also recommend having a browse of Google’s rather stylish explanation of search.
