Remove "robots=noindex" meta tag to allow translated pages in search results #6260
Size changes
📦 Next.js Bundle Analysis for react-dev
This analysis was generated by the Next.js Bundle Analysis action. 🤖
Three Pages Changed Size
The following pages changed size from the code in this PR compared to its base branch:
Details
Only the gzipped size is provided here based on an expert tip. First Load is the size of the global bundle plus the bundle for the individual page. If a user were to show up to your website and land on a given page, the first load size represents the amount of JavaScript that user would need to download. If […] Any third-party scripts you have added directly to your app using the […] Next to the size is how much the size has increased or decreased compared with the base branch of this PR. If this percentage has increased by 10% or more, a red status indicator is applied, indicating that special attention should be given to this.
I'm confused by this statement, or perhaps I don't fully understand the implications of this change. If we do allow robots to index the half-translated pages of the new React site (i.e., merge this PR), are you saying we're letting Google decide what is most relevant to Japanese users? That may be the old docs, which are fully translated, or the untranslated new docs, or, in the ideal case, the new translated docs. That logic makes sense to me, but let me know if I'm understanding correctly. Also, I wonder whether we should instead opt individual languages into that behavior rather than enabling it for all languages, but I'm still open to discussion on that.
@lunaleaps Yes, I think it's always best to let Google decide which articles are most useful. There's no point at all in specifying […]
@gaearon You are the one who introduced this meta tag. Could you explain why you thought it was necessary? You changed the URL of all the old sites to […]
The main issue at the time was stale English pages on abandoned and/or incomplete translation sites showing up above the actual, recent English docs, e.g., because the exact corresponding page was no longer found on the main site. I’m happy to experiment with this again if y’all would like, but let’s keep an eye on that.
I think ideally (?) we’d have noindex for individual pages that have not been translated yet. We could track this by having a “lang” field in the Markdown metadata for each page that defaults to English and gets changed when translating. Then we could mark pages as noindex if their lang differs from the site lang. We could also maybe (?) use this info as a source of truth for “which pages are translated” instead of checklists in the issue template.
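A minimal sketch of the per-page idea above, assuming a hypothetical `lang` front-matter field and hypothetical helpers (`PageMeta`, `shouldNoIndex`, `translatedPages`); none of these names come from the react.dev codebase:

```typescript
// Hypothetical sketch: a `lang` front-matter field on each Markdown page.
// A missing field means the page is still the untranslated English original.
interface PageMeta {
  path: string;
  lang?: string; // set by translators once a page is translated
}

const DEFAULT_LANG = "en";

// A page gets noindex only when its content language differs from the site language.
function shouldNoIndex(meta: PageMeta, siteLang: string): boolean {
  return (meta.lang ?? DEFAULT_LANG) !== siteLang;
}

// The same field can serve as the source of truth for "which pages are translated".
function translatedPages(pages: PageMeta[], siteLang: string): string[] {
  return pages.filter((p) => !shouldNoIndex(p, siteLang)).map((p) => p.path);
}

// Usage on a hypothetical ja.react.dev build:
const pages: PageMeta[] = [
  { path: "/learn", lang: "ja" },
  { path: "/reference/react/useEffect", lang: "ja" },
  { path: "/learn/tutorial-tic-tac-toe" }, // not translated yet
];
for (const p of pages) {
  console.log(p.path, shouldNoIndex(p, "ja") ? "noindex" : "index");
}
console.log(translatedPages(pages, "ja")); // only the two pages marked lang: "ja"
```

Because the field defaults to English, untranslated pages need no edits at all; translators set it when they finish a page, which also keeps the list of translated pages current without a manual checklist.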
@gaearon Ah, so you're talking about the "Concurrent Mode" articles, right? Ironically, outside English-speaking countries, your measures to hide some articles that may become stale in the future are currently causing even older […] Adding metadata like […]
Here are some screenshots of the search results for "React Tutorial" in languages where a significant amount of translation has already been completed: Japanese, Korean, Indonesian, Turkish, Finnish, and Arabic (images not preserved).
@gaearon So the code you added to prevent potential future issues is actually causing severe problems right now in many countries around the world. Yesterday, I saw an X post saying, "I can't find the Japanese documentation for some reason, so I have to manually type ja.react.dev into the browser each time." I believe this needs to be addressed immediately.
Sure, let's try it!
For future reference, here are links to directly preview search results for each country:
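Preview links like these can be built from Google's `hl` (interface language) and `gl` (country) query parameters. A minimal sketch, assuming those two parameters are enough for a rough per-country preview (results may still vary with personalization); the helper name and the language/country pairs are illustrative:

```typescript
// Hypothetical helper: a Google search URL that previews results for a given
// interface language (`hl`) and country (`gl`).
function searchPreviewUrl(query: string, hl: string, gl: string): string {
  return (
    "https://www.google.com/search?q=" +
    encodeURIComponent(query) +
    "&hl=" + encodeURIComponent(hl) +
    "&gl=" + encodeURIComponent(gl)
  );
}

// Illustrative language/country pairs for translations mentioned in this thread.
const locales: Array<[string, string]> = [
  ["ja", "jp"],
  ["ko", "kr"],
  ["id", "id"],
  ["tr", "tr"],
  ["fi", "fi"],
  ["ar", "sa"],
];

for (const [hl, gl] of locales) {
  console.log(searchPreviewUrl("React Tutorial", hl, gl));
}
```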
It looks like the preferred way to lower the rank of an untranslated page is not "noindex" but "rel=canonical" (see Google's guide "How to specify a canonical with rel="canonical" and other methods"). Of course, this method won't solve the problem where articles in language forks may continue to appear in search results even after the corresponding article has been removed from the English docs...
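A minimal sketch of the canonical approach for a single page, assuming a hypothetical per-page `isTranslated` flag and that every untranslated page has an English twin at the same path on https://react.dev. It builds the tag as a string to stay framework-free rather than reproducing Seo.tsx:

```typescript
// Hypothetical sketch: untranslated pages on a language site declare the English
// original as canonical instead of opting out with noindex.
// Assumption: every page has an English twin at the same path on react.dev.
const ENGLISH_ORIGIN = "https://react.dev";

function canonicalHref(siteOrigin: string, path: string, isTranslated: boolean): string {
  // Translated pages are their own canonical; untranslated ones defer to English.
  return (isTranslated ? siteOrigin : ENGLISH_ORIGIN) + path;
}

// Rendered as a string to keep the sketch framework-free.
function canonicalLinkTag(siteOrigin: string, path: string, isTranslated: boolean): string {
  return `<link rel="canonical" href="${canonicalHref(siteOrigin, path, isTranslated)}" />`;
}

// Usage on a hypothetical ja.react.dev page that is not translated yet:
console.log(canonicalLinkTag("https://ja.react.dev", "/learn/tutorial-tic-tac-toe", false));
// <link rel="canonical" href="https://react.dev/learn/tutorial-tic-tac-toe" />
```

Google treats rel="canonical" as a strong hint rather than a directive, so this nudges ranking toward the English page without hiding the whole language site the way noindex does.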
Currently, each language version of the docs explicitly opts out of web crawlers until almost all of the site's content has been translated. The relevant code is as follows:
react.dev/src/components/Seo.tsx, lines 20 to 35 at commit 3189529
react.dev/src/components/Seo.tsx, line 72 at commit 3189529
As a result, apart from the very few languages whose translation has been completed (currently only two), most language versions still do not appear in Google search results. They are not linked from react.dev either, so the new translated sites are extremely hard to find.
We have been translating the docs into Japanese for several months, but the results are still almost "invisible" on the web. Sadly, from what I can see on Twitter, many Japanese React users still believe that the effort to translate react.dev into Japanese hasn't even started!
Even if fewer than 10 articles have been translated, those articles are translated, and there is no need to hide them. If someone searches for "React useEffect" and there is a translated reference for it, they should be able to view it. After all, it's Google's job to evaluate the value of each page; there's no need for us to intentionally specify "robots=noindex".
What's worse, because of this "noindex", when you search for "React" in Japan (with either Google or Bing), the legacy Japanese version still happily appears at the top (screenshot not preserved).
According to @gaearon's comment here, the React team would rather have the latest content rank higher than the legacy Japanese content. However, Google's algorithm disagrees: it judges the legacy Japanese site to be more useful for the average Japanese developer (and I have to agree). Removing the meta tag will help the latest content appear near the top even if it's not translated yet.
For example, if you search for "React useEffect" in Japan, the official legacy page still comes out on top, followed by many unofficial articles. The latest official reference (in English) ranks only 11th. The Japanese translation has been published for more than three months, but it's nowhere to be found in the search results. Likewise, if you search for "React tutorial" in Japan, the class-based legacy official tutorial is still at the top, and the latest tutorial is in 37th place because it's only available in English.