
Remove "robots=noindex" meta tag to allow translated pages in search results #6260

Merged · 1 commit · Sep 20, 2023


smikitky
Member

@smikitky smikitky commented Aug 26, 2023

Currently, the translated content of each language version of the docs explicitly opts out of web crawlers until almost all of the site's translation is completed. The relevant code is as follows:

```jsx
const deployedTranslations = [
  'en',
  'zh-hans',
  'es',
  // We'll add more languages when they have enough content.
  // Please DO NOT edit this list without a discussion in the reactjs/react.dev repo.
  // It must be the same between all translations.
];
let shouldPreventIndexing = false;
if (
  siteConfig.languageCode !== 'en' &&
  !deployedTranslations.includes(siteConfig.languageCode)
) {
  shouldPreventIndexing = true;
}

{shouldPreventIndexing && <meta name="robots" content="noindex" />}
```
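To make the effect of that list concrete, here is a minimal sketch that pulls the condition into a plain function. `robotsMetaTag` is a hypothetical helper that returns the tag as an HTML string instead of JSX; the language list is copied from the snippet above. Any language not in the list (for example `ja`) gets `noindex`:

```javascript
// Language list copied from the snippet above.
const deployedTranslations = ['en', 'zh-hans', 'es'];

// The original condition as a pure function. 'en' is already in the list,
// so the first check is redundant but kept to mirror the original code.
function shouldPreventIndexing(languageCode, deployed = deployedTranslations) {
  return languageCode !== 'en' && !deployed.includes(languageCode);
}

// Hypothetical helper: returns the tag as an HTML string (instead of JSX),
// or null when nothing would be rendered.
function robotsMetaTag(languageCode) {
  return shouldPreventIndexing(languageCode)
    ? '<meta name="robots" content="noindex" />'
    : null;
}

console.log(robotsMetaTag('ja')); // <meta name="robots" content="noindex" />
console.log(robotsMetaTag('es')); // null
```

Removing the tag, as this PR does, is equivalent to making `robotsMetaTag` return `null` for every language.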

Due to this, except for very few languages whose translation has been completed (currently only 2), most language versions are still not displayed in Google search results. They are still not linked from react.dev, either, so it's extremely hard to find the new translated site.

We have been translating the docs into Japanese for several months, but the results are still almost "invisible" on the web. Sadly, from what I can see on Twitter, many Japanese React users still believe that the effort to translate react.dev into Japanese hasn't even started!

Even if fewer than 10 articles have been translated, those articles are translated, and there is no need to hide them. If someone searches for "React useEffect" and a translated reference exists, they should be able to view it. After all, it's Google's job to evaluate the value of each page; there's no need for us to intentionally specify "robots=noindex".

What's worse, because of this "noindex", when you search for "React" in Japan (either with Google or Bing), the legacy Japanese version still happily comes at the top:

[Screenshot: search results for "React" in Japan]

According to @gaearon's comment here, the React team would rather have the latest content rank higher than the legacy Japanese content. However, Google's algorithm disagrees; it judges the legacy Japanese site to be more useful for the average Japanese developer (and I must agree). Removing the meta tag will help the latest content appear near the top even if it's not translated yet.

For example, if you search for "React useEffect" in Japan, the official legacy page still comes at the top, followed by many unofficial articles. The latest official reference (in English) ranks only 11th. The Japanese translated version has been published for more than 3 months, but it's nowhere to be found in the search results. Likewise, if you search for "React tutorial" in Japan, the class-based legacy official tutorial is still at the top, and the latest tutorial ranks 37th because it's only available in English.

@smikitky smikitky changed the title Remove "robot=noindex" meta tag Remove "robots=noindex" meta tag Aug 26, 2023
@github-actions

Size changes

📦 Next.js Bundle Analysis for react-dev

This analysis was generated by the Next.js Bundle Analysis action. 🤖

Three Pages Changed Size

The following pages changed size from the code in this PR compared to its base branch:

| Page | Size (compressed) | First Load |
|---|---|---|
| /404 | 76.97 KB (🟢 -49 B) | 180.92 KB |
| /500 | 76.96 KB (🟢 -49 B) | 180.91 KB |
| /[[...markdownPath]] | 78.43 KB (🟢 -49 B) | 182.38 KB |

Only the gzipped size is provided here based on an expert tip.

First Load is the size of the global bundle plus the bundle for the individual page. If a user were to show up to your website and land on a given page, the first load size represents the amount of JavaScript that user would need to download. If next/link is used, subsequent page loads would only need to download that page's bundle (the number in the "Size" column), since the global bundle has already been downloaded.

Any third-party scripts you have added directly to your app using the <script> tag are not accounted for in this analysis.

Next to the size is how much the size has increased or decreased compared with the base branch of this PR. If this percentage has increased by 10% or more, there will be a red status indicator applied, indicating that special attention should be given to this.
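As a worked check of the rules described above, this sketch applies them to the three rows in the table: since First Load is the global bundle plus the page bundle, the shared global bundle is the difference of the two columns (about 103.95 KB for every page). It also computes the percentage change and the 10% red-indicator rule. The 1 KB = 1024 bytes conversion is an assumption; the report does not state its unit.

```javascript
// Figures copied from the bundle analysis table (compressed sizes).
const pages = [
  { page: '/404', sizeKB: 76.97, firstLoadKB: 180.92, deltaBytes: -49 },
  { page: '/500', sizeKB: 76.96, firstLoadKB: 180.91, deltaBytes: -49 },
  { page: '/[[...markdownPath]]', sizeKB: 78.43, firstLoadKB: 182.38, deltaBytes: -49 },
];

// Assumption: the report uses 1 KB = 1024 bytes.
const BYTES_PER_KB = 1024;
// Increase (in percent) at or above which the bot shows a red indicator.
const RED_THRESHOLD_PERCENT = 10;

function analyze({ page, sizeKB, firstLoadKB, deltaBytes }) {
  // First Load = global bundle + page bundle, so the shared part is the difference
  // (rounded to two decimals to hide floating-point noise).
  const globalKB = Math.round((firstLoadKB - sizeKB) * 100) / 100;
  // Percent change relative to the base-branch size (current size minus the delta).
  const baseBytes = sizeKB * BYTES_PER_KB - deltaBytes;
  const changePercent = (deltaBytes / baseBytes) * 100;
  return { page, globalKB, changePercent, red: changePercent >= RED_THRESHOLD_PERCENT };
}

for (const row of pages.map(analyze)) {
  console.log(`${row.page}: global ${row.globalKB} KB, change ${row.changePercent.toFixed(3)}%, red=${row.red}`);
}
```

A 49-byte reduction is well under a tenth of a percent of each page bundle, so none of the rows comes close to the red threshold.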

@smikitky smikitky changed the title Remove "robots=noindex" meta tag Remove "robots=noindex" meta tag to allow translated pages in search results Aug 26, 2023
@lunaleaps
Contributor

lunaleaps commented Sep 9, 2023

> it believes that the legacy Japanese site is more useful for average Japanese developers (which I must agree).

I'm confused by this statement or perhaps I don't fully understand the implications of this change.

If we do allow robots to index the half-translated pages of the new react site (aka merging this PR), you're saying we're letting Google choose the rank of what is more relevant to Japanese users? That may be the old docs that are fully translated, or it may be the untranslated new docs, or in the ideal case, the new translated docs?

That logic makes sense to me but let me know if I'm understanding correctly.

Also, I wonder whether we should make this opt-in per language instead of enabling it for all languages, but I'm still open to discussion on that.

@lunaleaps lunaleaps self-assigned this Sep 9, 2023
@smikitky
Member Author

smikitky commented Sep 12, 2023

@lunaleaps Yes, I think it's always best to let Google decide which articles are most useful. There's no point at all in specifying noindex.

  • Sites like the Japanese version, where >50% of the translation has been completed, will appear at the top of Google as soon as this meta tag is removed (of course, when searched within Japan). Many people will finally realize that there is an official tutorial that is not class-based! While some less important articles haven't been translated yet, this must not prevent the translated tutorial from being accessible for months. There is no reason to "release" the translation of Quickstart and useInsertionEffect on the same day. Besides, people who are lucky enough to come across mentions of the Japanese version on SNS can already read it. It doesn't make sense to allow traffic from SNS and block traffic from search engines.

  • Sites where only a few articles have been translated can also benefit from becoming searchable. This will make more people aware of the translated site, accelerate volunteer translations, and naturally raise its page rank over time. It's a win-win situation. With noindex, translators not only go unpaid but also go largely unnoticed. Few people want to work for free on translations that might not be read until a year later. I suspect this is what is happening in many language versions.

  • While it's true that there are language versions that have been half-abandoned, they won't get a high page rank anyway. The absence of noindex won't cause any substantial harm in this regard, either. Even without noindex, depending on the country, legacy docs, unofficial articles in their local language, or whatever Google finds useful will continue to appear at the top. That's how the Web works, and you shouldn't do anything special. Just because Mongolian or Czech translations are slow does not mean the entire Japanese audience has to be penalized for it.

@gaearon You are the one who introduced this meta tag. Could you explain why you thought this was necessary? You changed the URL of all the old sites to legacy as soon as the new English docs were released. Yet, you are trying hard to keep these legacy sites at the top of Google searches in many places around the world, as shown in the screenshot above. Is this really your intention?

@gaearon
Member

gaearon commented Sep 12, 2023

The main issue at the time was stale English pages on abandoned and/or incomplete translation sites showing up above the actual recent English docs (e.g., because the exact corresponding page no longer existed on the main site). I’m happy to experiment with this again if y’all would like, but let’s keep an eye on that.

@gaearon
Member

gaearon commented Sep 12, 2023

I think ideally (?) we’d have noindex for individual pages that have not been translated yet. We could track this by having a “lang” field in the Markdown metadata for each page that defaults to English and gets changed when translating. Then we could mark pages as noindex if their lang differs from the site lang. We could also maybe (?) use this info as a source of truth for “which pages are translated” instead of checklists in the issue template.
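The per-page idea above could look roughly like this. It is a sketch only: the `lang` field name comes from the proposal, while `pageLang`, `shouldNoindexPage`, the paths, and the frontmatter shape are hypothetical and not react.dev's actual content pipeline. A page without `lang` defaults to English, and it is marked noindex only when that differs from the site language:

```javascript
// Hypothetical: `lang` is the proposed frontmatter field; a page without it
// is treated as untranslated English content.
function pageLang(frontmatter) {
  return frontmatter.lang ?? 'en';
}

// A page gets noindex only when its content language differs from the site's.
function shouldNoindexPage(frontmatter, siteLang) {
  return pageLang(frontmatter) !== siteLang;
}

// Illustrative frontmatter for two pages on a Japanese site.
const siteLang = 'ja';
const pages = {
  '/learn': { title: 'Quick Start', lang: 'ja' },
  '/reference/react/useInsertionEffect': { title: 'useInsertionEffect' },
};

for (const [path, frontmatter] of Object.entries(pages)) {
  console.log(path, shouldNoindexPage(frontmatter, siteLang) ? 'noindex' : 'index');
}
```

On the English site, every page defaults to `en`, so nothing is marked noindex there.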

@smikitky
Member Author

smikitky commented Sep 13, 2023

@gaearon Ah, so you're talking about the "Concurrent Mode" articles, right? Ironically, outside English-speaking countries, your measures to hide some articles that may become stale in the future are currently causing even older legacy articles to show up in large numbers at the top of searches. I think it's best to remove noindex altogether for now, and address your concern before some articles are removed from react.dev. Some language versions may have to be shut down eventually, but such versions should not inconvenience speakers of other languages.

Adding metadata like lang: ja to every page seems like a great idea! I agree with adding noindex on a per-article basis if they are not translated. We can easily add a "Looking for translation contributors" sticker to untranslated articles. In the long run, this would also be very helpful in building a single site across all languages and repositories, similar to MDN, where the availability of translations can be displayed for each article. I think the "git submodule" mechanism could be used to achieve this.
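If every page carried such a `lang` field, the same data could drive both the per-site "Looking for translation contributors" notice and an MDN-style per-article list of available translations. A minimal sketch, assuming a hypothetical in-memory map from site language to page frontmatter; the function names, paths, and sample data are illustrative only:

```javascript
// Hypothetical input: for each language site, the frontmatter of every page
// (a missing `lang` means the page is still in English).
const sites = {
  ja: {
    '/learn': { lang: 'ja' },
    '/learn/tutorial-tic-tac-toe': { lang: 'ja' },
    '/reference/react/useInsertionEffect': {},
  },
  ko: {
    '/learn': { lang: 'ko' },
    '/learn/tutorial-tic-tac-toe': {},
    '/reference/react/useInsertionEffect': {},
  },
};

const isTranslated = (frontmatter, siteLang) => (frontmatter.lang ?? 'en') === siteLang;

// Per-site summary: rounded coverage, plus pages that could show a
// "Looking for translation contributors" notice.
function translationStatus(siteLang, pages) {
  const paths = Object.keys(pages);
  const needsContributors = paths.filter((p) => !isTranslated(pages[p], siteLang));
  const translatedCount = paths.length - needsContributors.length;
  return {
    coveragePercent: paths.length === 0 ? 0 : Math.round((100 * translatedCount) / paths.length),
    needsContributors,
  };
}

// Per-article view: which language sites already have this page translated.
function availableLanguages(path, allSites) {
  return Object.entries(allSites)
    .filter(([siteLang, pages]) => pages[path] && isTranslated(pages[path], siteLang))
    .map(([siteLang]) => siteLang);
}

console.log(translationStatus('ja', sites.ja));
console.log(availableLanguages('/learn', sites));
```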

@smikitky
Member Author

Here are some screenshots of the search results for "React Tutorial" in languages where a significant amount of translation has already been completed:

[Screenshots of "React Tutorial" search results: Japanese, Korean, Indonesian, Turkish, Finnish, Arabic]

@gaearon So the code you added to prevent potential future issues is actually causing severe problems right now, in many countries all around the world. Yesterday, I saw an X post saying "I can't find the Japanese documentation for some reason, so I have to manually type ja.react.dev into the browser each time". I believe this needs to be addressed immediately.

@gaearon gaearon merged commit dfd15e8 into reactjs:main Sep 20, 2023
@gaearon
Member

gaearon commented Sep 20, 2023

sure, let's try it!

@smikitky smikitky deleted the patch-3 branch September 20, 2023 17:15
@smikitky
Member Author

Looks like the preferred way to lower the rank of an untranslated page is not "noindex" but "rel=canonical".

How to specify a canonical with rel="canonical" and other methods

Of course, using this method won't solve the problem where articles in language forks may continue to appear in search results even after the corresponding article has been removed from the English docs...
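A rough sketch of the canonical approach under stated assumptions: `canonicalLinkTag` is a hypothetical helper, the English docs are assumed to live at `https://react.dev`, and the page's language would come from something like the per-page `lang` field discussed above. A translated page names itself as canonical, while an untranslated page on a language site points at the English original. Search engines treat `rel="canonical"` as a strong hint rather than a directive, so this nudges ranking instead of hiding the page outright:

```javascript
// Assumption: the English docs live at this origin.
const ENGLISH_ORIGIN = 'https://react.dev';

// Hypothetical helper: a translated page is its own canonical URL, while an
// untranslated page on a language site points at the English original.
function canonicalLinkTag(path, pageLang, siteLang, siteOrigin) {
  const base = pageLang === siteLang ? siteOrigin : ENGLISH_ORIGIN;
  const href = new URL(path, base).href;
  return `<link rel="canonical" href="${href}" />`;
}

console.log(canonicalLinkTag('/learn', 'ja', 'ja', 'https://ja.react.dev'));
console.log(canonicalLinkTag('/reference/react/useInsertionEffect', 'en', 'ja', 'https://ja.react.dev'));
```

As noted above, this does nothing for a page that has been removed from the English docs: the canonical URL would then point at a page that no longer exists.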


4 participants