A large amount of Yandex source code, covering many of its technologies, was leaked, allegedly by disgruntled employees. Some of it comes from Yandex Search, Russia’s largest search engine. As you can imagine, SEOs and others are digging through the source code to see what they can learn.
I haven’t personally downloaded the source code, so I haven’t looked through it myself, but I thought I’d share what others are posting on Twitter about what they learned from researching it.
This is an alpha version of a tool for exploring the leaked #yandex search code.
You can browse ranking factors, view by tag, find connections, and more.
If there is something you want to see, you can easily add new functions! https://t.co/AjbYnrDl9P pic.twitter.com/pQ4scOkP6w
— Rob Ousbey : @RobOusbey@mastodon.social (@RobOusbey) January 28, 2023
After downloading and analyzing the code, I found a lot of useful information for Google SEO as well. pic.twitter.com/RWrgnnlpj6
— Alex Buraks (@alex_buraks) January 27, 2023
In theory, what are the differences between the algorithms used by Google and Yandex?
They are very similar:
– There is a RankBrain analogue – MatrixNet;
– Uses PageRank (much like Google).
– Many text algorithms are the same. pic.twitter.com/Djjl8Bmjwn
— Alex Buraks (@alex_buraks) January 27, 2023
According to Statcounter, Yandex is close to Yahoo and Bing in market share. pic.twitter.com/5GKIvKIvAo
— Alex Buraks (@alex_buraks) January 27, 2023
Key insights after analyzing this list:
#1 Link age is a ranking factor. pic.twitter.com/U47uWvEq9w
— Alex Buraks (@alex_buraks) January 27, 2023
#3 Numbers in URLs are bad for rankings. pic.twitter.com/ECgwGeGUfb
— Alex Buraks (@alex_buraks) January 27, 2023
#5 Hard pessimization equals PR=0 pic.twitter.com/RRbhuJyZr1
— Alex Buraks (@alex_buraks) January 27, 2023
#7 Fun Fact – There’s Another Ranking Factor That Boosts Wikipedia pic.twitter.com/799F8KFpkE
— Alex Buraks (@alex_buraks) January 27, 2023
#9 A document’s age and last update are both ranking factors. pic.twitter.com/ay1GTMVEtJ
— Alex Buraks (@alex_buraks) January 27, 2023
We’ve now reviewed about 40% of the list, but there are many more (text relevance, behavioral factors, page rank, internal links, etc.).
I will continue this thread in a while.
— Alex Buraks (@alex_buraks) January 27, 2023
The first thread got a lot of impressions (500k views so far, thanks for the retweets and likes!) so I decided to finalize it. https://t.co/UQiQsnpWd2
— Alex Buraks (@alex_buraks) January 28, 2023
#2 Added: Ranking factor for orphaned pages.
Orphaned pages can be easily found with Screaming Frog and other crawlers. pic.twitter.com/zIPwAelpD0
— Alex Buraks (@alex_buraks) January 28, 2023
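For readers who have not run into the term, an orphaned page is one that no other page on the site links to. Here is a minimal sketch of how a crawler might flag them, assuming you already have a list of known pages (say, from a sitemap) and the internal links found during a crawl. The function name, the input shapes, and the `entry_points` default are all hypothetical and are not taken from the Yandex code or from any particular crawler.

```python
def find_orphan_pages(pages, internal_links, entry_points=("/",)):
    """Return known pages that no other crawled page links to.

    pages: iterable of URL paths known to exist (e.g. from a sitemap).
    internal_links: iterable of (source, target) pairs found by a crawl.
    entry_points: pages treated as reachable by definition (assumption).
    """
    # A page linking to itself does not count as an inbound link.
    linked = {target for source, target in internal_links if source != target}
    return sorted(p for p in set(pages) - linked if p not in entry_points)


pages = ["/", "/about", "/blog", "/blog/old-post"]
links = [("/", "/about"), ("/", "/blog"), ("/about", "/")]
print(find_orphan_pages(pages, links))  # ['/blog/old-post']
```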
#4 Site/URL search query count is a ranking factor.
Obviously, more = better. pic.twitter.com/xXQ6FMDghP
— Alex Buraks (@alex_buraks) January 28, 2023
#6 If your URL ends up in a search session (users will find what they need) – it can affect rankings.
There are hard and predictable factors in this. pic.twitter.com/Zx3sBZORCs
— Alex Buraks (@alex_buraks) January 28, 2023
#8 Special ranking factor for short videos (tiktok, short, reel) pic.twitter.com/oKPzL09MID
— Alex Buraks (@alex_buraks) January 28, 2023
#10 Keywords in URLs are a ranking factor.
As you can see from the description, it’s best to include up to three words from your search query. pic.twitter.com/Q1euKWSiST
— Alex Buraks (@alex_buraks) January 28, 2023
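To make the "up to three words" idea concrete, here is a rough sketch that counts how many distinct query words appear in a URL path and caps the count at three. Treating the limit as a simple cap is an assumption based on the tweet's wording. The function name and the ASCII-only tokenization are hypothetical and say nothing about how Yandex actually computes this factor.

```python
import re
from urllib.parse import urlparse


def query_words_in_url(url, query, cap=3):
    """Count distinct query words found in the URL path, capped at `cap`.

    The cap of 3 mirrors the "up to three words" wording in the tweet;
    the real factor may be computed quite differently.
    """
    path = urlparse(url).path.lower()
    url_tokens = set(re.findall(r"[a-z0-9]+", path))
    query_tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    return min(len(url_tokens & query_tokens), cap)


print(query_words_in_url(
    "https://example.com/blog/best-running-shoes-for-flat-feet",
    "best running shoes for flat feet",
))  # 3
```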
#14 Another ranking factor for content quality – whether videos embedded in the page are broken.
Embedded video – good for ranking.
Broken embedded video – bad. pic.twitter.com/2SUys65PHp
— Alex Buraks (@alex_buraks) January 28, 2023
#16 Good for SEO if your backlink anchor contains all the words of your keyword.
It’s more informative if it’s in one link. Especially when the word order is the same. pic.twitter.com/WrbESJ8Da5
— Alex Buraks (@alex_buraks) January 28, 2023
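As an illustration of the two conditions in that tweet (all keyword words present in a single anchor, and in the same order), here is a small sketch. The function name and the whitespace tokenization are assumptions made for the example; nothing here is taken from the leaked code.

```python
def anchor_match(anchor_text, keyword):
    """Return (contains_all, in_order) for one backlink anchor.

    contains_all: every keyword word appears somewhere in the anchor.
    in_order: the keyword words appear in the anchor in the same relative
              order (other words may sit in between).
    """
    anchor = anchor_text.lower().split()
    words = keyword.lower().split()
    contains_all = all(w in anchor for w in words)
    # Subsequence check: each `in` consumes the iterator up to the match,
    # so later words must be found after earlier ones.
    remaining = iter(anchor)
    in_order = all(w in remaining for w in words)
    return contains_all, in_order


print(anchor_match("buy cheap running shoes online", "cheap running shoes"))
print(anchor_match("shoes for cheap running", "cheap running shoes"))
```

The first call reports both conditions as met; the second contains every word but not in the keyword's order.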
#18 The quality rank of text on a domain is a ranking factor.
Pages with poor quality content affect the entire domain. pic.twitter.com/MJUCTVB9CH
— Alex Buraks (@alex_buraks) January 28, 2023
#20 Interesting, another ranking factor is randomness.
If you don’t know why some pages are at the top, it could be random (to test behavior factors). pic.twitter.com/TGtzFrmBOV
— Alex Buraks (@alex_buraks) January 28, 2023
#22 Backlinks from the top 100 websites by PageRank impact your ranking.
That’s not news. pic.twitter.com/ikxldWLJqy
— Alex Buraks (@alex_buraks) January 28, 2023
Wow, I found a list of initial weights for Yandex ranking factors.
Need one more thread? 😁
P.S. The final weights are computed by AI (MatrixNet), but the initial values are also useful. pic.twitter.com/WeroYQy7Yu
— Alex Buraks (@alex_buraks) January 28, 2023
That said, I’ve been digging through the codebase myself and finding interesting stuff.
I’m doing this live, so I don’t know how long it will be before my next tweet.
— Mike King (@iPullRank) January 27, 2023
Much of the code related to Yandex Search can be found in the Kernel, ExtSearch, Search, and Robot archives, but again, it’s impossible to comprehensively describe them until you’ve seen them all.
— Mike King (@iPullRank) January 27, 2023
The web_meta_factors_info/factors_gen.in file has some very interesting things related to content features and elements.
There are things we expect, for example the minimum expectation that words in the title are close to words in the query. pic.twitter.com/YRsrCpVsqU
— Mike King (@iPullRank) January 27, 2023
Interestingly, there are many scrapers on Google News, Shopping, YouTube, and even other Yandex services.
— Mike King (@iPullRank) January 27, 2023
Hmm… this could be the structure of how Yandex stores documents in the version of the document server.
I’m still looking for ideas on how to construct the inverted index. pic.twitter.com/1lwTbOirnx
— Mike King (@iPullRank) January 27, 2023
Here is the protobuffer for the link element: pic.twitter.com/1RM6o1xzRg
— Mike King (@iPullRank) January 27, 2023
“Link Priority Code” talks about lowering the priority of links with the same text from the same host. In other words, don’t count links from duplicate content. pic.twitter.com/dQTUnScCUy
— Mike King (@iPullRank) January 27, 2023
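One way to picture "lowering the priority of links with the same text from the same host" is to group links by host and anchor text and give each repeat a smaller weight. The sketch below does that. The `1 / n` decay, the function name, and the input format are all hypothetical; the tweet does not say how much the priority is actually reduced.

```python
from collections import defaultdict
from urllib.parse import urlparse


def prioritize_links(links):
    """Assign a priority to each (source_url, anchor_text) link.

    The first link with a given (host, anchor text) pair gets 1.0 and each
    repeat gets less. The 1/n decay is an illustrative assumption.
    """
    seen = defaultdict(int)
    result = []
    for source_url, anchor_text in links:
        key = (urlparse(source_url).hostname, anchor_text.strip().lower())
        seen[key] += 1
        result.append((source_url, anchor_text, 1.0 / seen[key]))
    return result


links = [
    ("https://a.example/page-1", "Best Shoes"),
    ("https://a.example/page-2", "best shoes"),  # same host, same text
    ("https://b.example/review", "best shoes"),  # different host
]
for link in prioritize_links(links):
    print(link)
```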
How did you come up with that number of ranking factors?
481 factors related to ‘rapid clicks’ are shown pic.twitter.com/sw5A3ia3Bk
— Mike King (@iPullRank) January 28, 2023
Like Googs, Yandex has multiple ranking models to choose from.
This select_ranking_models.cpp file describes using different models for different languages and locations. pic.twitter.com/m210tpOUDb
— Mike King (@iPullRank) January 28, 2023
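Conceptually, picking a model by language and location can be as simple as a lookup with fallbacks. The sketch below is only an illustration of that idea. The model names, the key format, and the fallback order are all made up and do not reflect what select_ranking_models.cpp actually does.

```python
def select_ranking_model(language, region, models, default="default_model"):
    """Pick a model name for a (language, region) pair.

    models: dict keyed by (language, region); None acts as a wildcard.
    Fallback order (an assumption): exact match, language only,
    region only, then the default.
    """
    for key in ((language, region), (language, None), (None, region)):
        if key in models:
            return models[key]
    return default


# Hypothetical model registry, for illustration only.
models = {
    ("ru", "RU"): "model_ru_ru",
    ("tr", None): "model_tr_any_region",
}
print(select_ranking_model("ru", "RU", models))
print(select_ranking_model("tr", "DE", models))
print(select_ranking_model("en", "US", models))
```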
I’m going to watch TV but obviously I need to add this to the book so I’ll be adding more in the next few days
— Mike King (@iPullRank) January 28, 2023
I’m digging into how this robot’s archive is structured.
There seems to be a lot of interesting things going on with the Zora directory. I have a limits.pb.txt file that stores requests per second for hosts and IP addresses for 204k hosts. pic.twitter.com/0oulKm58dx
— Mike King (@iPullRank) January 28, 2023
Here, document and query elements are collected and scored.
After this, it seems to move to the warehouse. pic.twitter.com/qJAiLfSrsU
— Mike King (@iPullRank) January 29, 2023
Okay, really easy: the top 5 most positively and negatively weighted ranking factors and their coefficients in the initial weighting of Yandex’s document relevance calculation. Negative first:
#1 FI_ADV: -0.2509284637
This factor indicates that the site has ads.
— Mike King (@iPullRank) January 29, 2023
#3 FI_QURL_STAT_POWER: -0.1943768768
This factor is the number of impressions of the URL for the query.
— Mike King (@iPullRank) January 29, 2023
#5 FI_GEO_CITY_URL_REGION_COUNTRY: -0.168645758
The factor is the geographic match between the document and the country the user searched for.
Let’s take a look at the top 5 positive weighting factors.
— Mike King (@iPullRank) January 29, 2023
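To show how coefficients like these could feed into a score, here is a minimal sketch that treats the initial weighting as a plain weighted sum. The three coefficients are the ones quoted in the tweets above. The linear form, the function name, and the example factor values are assumptions, and per the earlier tweet the final weights are produced by MatrixNet rather than used as-is.

```python
# Coefficients quoted in the tweets above (negative factors only).
INITIAL_WEIGHTS = {
    "FI_ADV": -0.2509284637,
    "FI_QURL_STAT_POWER": -0.1943768768,
    "FI_GEO_CITY_URL_REGION_COUNTRY": -0.168645758,
}


def initial_score(factor_values):
    """Weighted sum of factor values; missing factors count as 0.0.

    Assumes a simple linear combination, which is an illustration only.
    """
    return sum(
        weight * factor_values.get(name, 0.0)
        for name, weight in INITIAL_WEIGHTS.items()
    )


# Hypothetical page: has ads, half the maximum impressions signal.
page = {"FI_ADV": 1.0, "FI_QURL_STAT_POWER": 0.5}
print(initial_score(page))
```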
The starting point for link-related factors is here https://t.co/fwP8TxuOrM
— Christoph C. Cemper 🇺🇦 🧡 SEO (@cemper) January 30, 2023
Will this help me do SEO on Google? Probably not, but hey, that’s very interesting.
Ah, but once you find the optimal word count…
boom
— John Mueller keeps an eye on Google+ 🐀 (@JohnMu) January 29, 2023
Forum discussion on WebmasterWorld.