-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EES-4823: Pre-empt content urls that will redirect #5013
base: dev
Are you sure you want to change the base?
EES-4823: Pre-empt content urls that will redirect #5013
Conversation
…aching frontend content
|
||
const redirectService = { | ||
async list(): Promise<Redirects> { | ||
return (await fetch(`${contentApiUrl}/redirects`)).json(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eurgh, so apparently this causes build errors in the frontend. We could re-implement the frontend version of this service and have two, but that's messy. Problem as things stand I need to make this call available from ContentHtml
which is in common.
@ntsim I'm gonna hold off solving this pending your thoughts on my big TODO: comment in AnchorHtml
, as I suspect there might be a few changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This service was implemented like this because Axios doesn't work with Next's middleware runtime (which is using this service).
There's potentially other libraries that work across all runtimes, but it's probably not worth investing any time on this as Axios works fine for like 99% of our use-cases.
Would just revert this to how it was previously as this is the most prudent approach.
If we go with the context approach for redirects, we'll also want to move this service back into the frontend module as well.
await waitFor(() => { | ||
expect(screen.getByText('Featured table')).toBeInTheDocument(); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we implement the context based redirects as suggested, we don't need to do any waiting around for link text to be correct. Make sure to undo all these changes in this PR!
|
||
const redirectService = { | ||
async list(): Promise<Redirects> { | ||
return (await fetch(`${contentApiUrl}/redirects`)).json(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This service was implemented like this because Axios doesn't work with Next's middleware runtime (which is using this service).
There's potentially other libraries that work across all runtimes, but it's probably not worth investing any time on this as Axios works fine for like 99% of our use-cases.
Would just revert this to how it was previously as this is the most prudent approach.
If we go with the context approach for redirects, we'll also want to move this service back into the frontend module as well.
const [sanitisedHref, setSanitisedHref] = useState<string | null>(null); | ||
|
||
// TODO: Discuss how we want to achieve this in PR review | ||
// One option is to make this a hook | ||
// Another is to make this call during frontend's GetServerSideProps and expose to common through a context or something | ||
// Another is to leave it like this - but there's still a question of what to render until the API has responded | ||
// Also, can we handle caching a bit more smartly? | ||
useEffect(() => { | ||
const sanitiseHref = async (rawHref: string) => { | ||
setSanitisedHref(await applyRedirectRules(rawHref)); | ||
}; | ||
|
||
sanitiseHref(href); | ||
}, [href]); | ||
|
||
if (!sanitisedHref) { | ||
return <a href={undefined}>[link loading...]</a>; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a fair few problems with this approach which you've already identified, but a fundamental problem is that it's not likely to be parsed correctly by crawlers and could break all our content links.
There aren't strong guarantees that a crawler can read client rendered content correctly. Ensuring crawlers have everything they need in the initial server rendered document is the safest approach.
There's some additional issues too:
- Without any caching, making an API call per link is just a non-starter as content could have many links inside it
- These links would also be rendered in admin too. This would likely break without adjusting the implementation
I think the options for a server-rendered approach are either:
- Context (as you've mentioned)
- Transforming HTML strings server-side e.g. in the backend API, or even frontend
Given we're already transforming the HTML strings as part of React rendering, it's probably more efficient to go down the context route.
Implementation
If we agree the context approach is correct, then I'd propose the following implementation:
The context itself
This should be declared in a similar way to our other contexts, which should consist of a provider component (RedirectsContextProvider
) and a hook (useRedirectsContext
) to access the context.
The context should not fetch its own redirects from the service. It should just receive these as prop so we can leave it to the consuming application to implement the fetching.
Typically, we implement these contexts so that if the provider hasn't been set up, an exception is thrown by the hook when it's used. However, this will be tougher to do as a fair few tests may need to be updated to add the context provider in (without any redirects).
It might be simpler to implement this so that it doesn't throw, but you might want to try the throwing approach first to see how many tests would need to be updated.
If we do decide to throw and to update the tests, the simplest approach would be to update our custom render
function in common. This can then be swapped in really easily into the tests:
+ import render from '@common-test/render'
- import { render } from '@testing-library/react'
Adding globally
It'd be good for to implement the context globally so we can provide the redirects across any link, not just links in content.
This approach is less fiddly as we don't need to wrap areas of the page with the context, which would otherwise create lots of changes. It also means that we don't forget to add the context for new pages.
To do this, we can just add the context in _app.tsx
. We can fetch the redirects in App.getInitialProps
and pass these through as props to Providers
. The Providers
component will need to be modified pass the props through to the context.
Caching
A missing part of the puzzle will be to cache the redirects rather than re-fetch them on every getInitialProps
(which is on every page render).
The simplest cache implementation will be to store the redirects in a module-scoped variable using a simple time-based expiration. We're already doing this type of caching in the redirectPages
middleware, but we could move this to redirectsService
so other places can benefit from this caching as well.
We don't have to do this, but for a nice bonus, I'd personally be tempted to re-implement the caching via a util function that could be used for this type of caching elsewhere e.g.
import createMemoryCache from '@common/utils/cache/createMemoryCache';
const redirects = createMemoryCache({
get: () => (await fetch(`${contentApiUrl}/redirects`)).json(),
duration: getCacheTime()
});
const redirectService = {
list(): Promise<Redirects> {
return redirects.get();
}
}
Admin
To simplify things, I'd probably suggest that we don't bother fetching the redirects in admin. The SEO penalty doesn't apply to admin so it's probably not worth the hassle to implement these here as well.
If we implement the useRedirectsContext
hook to not throw when there isn't a provider, we can just omit setting up a RedirectsContextProvider
for the admin.
href: string; | ||
} | ||
|
||
export default function AnchorHtml({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would just call this Link
, or ContentLink
if you want to avoid confusion with the other Link
components.
@@ -87,11 +88,11 @@ export default function ContentHtml({ | |||
return !node.attribs.href.includes( | |||
'explore-education-statistics.service.gov.uk', | |||
) && typeof node.attribs['data-featured-table'] === 'undefined' ? ( | |||
<a href={url} target="_blank" rel="noopener noreferrer"> | |||
<AnchorHtml href={url} target="_blank" rel="noopener noreferrer"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need redirects for external links? We don't have redirects for external URLs, so this doesn't seem necessary
let newUrl: string = url; | ||
|
||
newUrl = await noTrailingSlashUnlessOnlyHost(newUrl); | ||
newUrl = await pathMustNotEndInSlash1000(newUrl); | ||
newUrl = await publicSiteHostMustNotStartWithWww(newUrl); | ||
newUrl = await mustNotTriggerAContentApiRedirect(newUrl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this approach is potentially over-complicating things by trying to be functional.
I'm not really sure it's worth the effort as we don't typically write code like this, and it's less performant as it has to parse into a URL
object multiple times and is using regexes over the entire URL in multiple places.
Would suggest that we just construct the URL
object once and potentially just simplifying this back to some if
statements (with comments) which modify the url
i.e.
const newUrl = new URL(url);
// Do a check
if (blah) {
newUrl.pathname = ...;
}
// Do another check
if (blah) {
newUrl.hostname = ...;
}
// Using functions is still fine for more
// contained pieces of logic.
await callAFunction(newUrl);
return newUrl.toString();
export default async function applyRedirectRules(url: string): Promise<string> { | ||
let newUrl: string = url; | ||
|
||
newUrl = await noTrailingSlashUnlessOnlyHost(newUrl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't modify as suggested in comment below, then we should remove any unnecessary awaits
. This is only necessary on the function that calls redirectService.list
.
redirectPathStarts, | ||
} from '@common/services/redirectService'; | ||
|
||
const prodPublicUrl = 'https://explore-education-statistics.service.gov.uk/'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the right approach to use this as the base? If we're on non-prod environments, won't this potentially result in incorrect links in those environments?
I mentioned this in your other PR, but we should probably set the host's URL using setHostUrl
/ getHostUrl
functions and use this instead?
if ( | ||
Object.values(redirectPathStarts).find(path => | ||
parsedPath.startsWith(path), | ||
) && | ||
parsedPath.split('/').length > 1 | ||
) { | ||
const redirectPath = Object.keys(redirectPathStarts).reduce((acc, key) => { | ||
const redirectType = key as RedirectType; | ||
// If starts with "methodologies/" or "publications/" | ||
if (parsedPath.startsWith(redirectPathStarts[redirectType])) { | ||
const pathSegments = parsedPath.split('/'); | ||
|
||
const rewriteRule = redirectsFromContentApi[redirectType]?.find( | ||
({ fromSlug }) => pathSegments[2].toLowerCase() === fromSlug, | ||
); | ||
|
||
if (rewriteRule) { | ||
return pathSegments | ||
.map(segment => | ||
segment.toLowerCase() === rewriteRule?.fromSlug | ||
? rewriteRule?.toSlug | ||
: segment.toLowerCase(), | ||
) | ||
.join('/'); | ||
} | ||
} | ||
return acc; | ||
}, ''); | ||
|
||
if (redirectPath) { | ||
return new URL(`${redirectPath}${parsedUrl.search}`, prodPublicUrl).href; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need the reduce
and we're splitting out the path segments too many times.
This can be simplified to:
const redirectPathStart = Object.entries(redirectPathStarts).find(
([_, path]) => parsedPath.startsWith(path),
);
const pathSegments = parsedPath.split('/');
if (redirectPathStart && pathSegments.length > 1) {
const [key] = redirectPathStart;
const redirectType = key as RedirectType;
const rewriteRule = redirectsFromContentApi[redirectType].find(
({ fromSlug }) => pathSegments[2].toLowerCase() === fromSlug,
);
if (rewriteRule) {
const path = pathSegments
.map(segment =>
segment.toLowerCase() === rewriteRule?.fromSlug
? rewriteRule?.toSlug
: segment.toLowerCase(),
)
.join('/');
return new URL(`${path}${parsedUrl.search}`, prodPublicUrl).href;
}
}
Google has crawled over 200 links on our site which hit redirects, and is marking us down for this (in Michael's spreadsheet)
A sizable chunk of these are renamed publications/methodologies as per
<contentApi>/redirects
, and a few others also hit redirect rules introduced toserver.js
(i.e., ending in a trailing slash or having a hostname starting with www).This PR applies all our redirect rules to hrefs before they are rendered by
ContentHtml
to pre-empt these redirects. This unfortunately means making an API call to the content API each time we render a link, which is hardly ideal. I've left a TODO comment in the files below to prompt a discussion on this - there's a few ways we could go.