Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EES-4823 - Add rel keywords to links based on externality #5001

Open
wants to merge 8 commits into
base: dev
Choose a base branch
from

Conversation

sambiramairelogic
Copy link
Contributor

This PR modifies the html attributes of anchor and link tags as they're displayed by the public site. It also fixes a minor bug because we're currently assessing whether a link is external by whether it includes explore-education-statistics.service.gov.uk, which means the admin site is being counted as an internal link.

Links to external sites which give 404s, 500s, or 401s (in the case of the admin site) are being followed by Googlebot and showing up as errors in the Google Search Console.

We should:

  • Add noreferrer, and noopener to any links to external sites, including admin. This tells the browser not to provide a referrer when the user clicks the link (a security issue in older browsers), and to begin a new context when it does.
  • Add nofollow to any external sites, including admin. This tells Googlebot not to follow them.
  • Add external to any sites we don't trust. This tells Googlebot that we don't endorse (not that we necessarily condemn it either!) the target of this link. I've chosen to exempt admin and gov.uk from this, as we're all under the umbrella of the larger gov.uk domain.
  • Add target="_blank" to any external links, including admin.
  • Add (opens in a new tab) to the link body text to any external links, including admin.
  • Continue to adhere to all existing link formatting, including maintaining upper case paths on external links, and trimming all links.

Pic for proof of new behaviour:
Links

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do appreciate there's lots of ways we could approach this - we may not need to make the distinction between external-admin and external-trusted, for example. We might also want an array of trusted sites so we can include more than just gov.uk, or we may want to drop this distinction entirely and only have internal and external.

I think this is a good starter for ten and allows us to tweak things going forward - but if anyone disagrees let's chat it through :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with this approach 👍

Comment on lines 72 to 101
if (node.name === 'a') {
const targetUrl = node.attribs.href;

if (
transformFeaturedTableLinks &&
typeof node.attribs['data-featured-table'] !== 'undefined'
) {
const text = domToReact(node.children);
return isMounted && typeof text === 'string'
? transformFeaturedTableLinks(targetUrl, text)
: undefined;
}

if (formatLinks) {
const text = domToReact(node.children);
const { url, target, rel, externality } =
getPropsForExternality(targetUrl);

return externality !== 'internal' &&
typeof node.attribs['data-featured-table'] === 'undefined' ? (
// eslint-disable-next-line react/jsx-no-target-blank
<a href={url} target={target} rel={rel}>
{text} (opens in a new tab)
</a>
) : (
<a href={url} rel={rel}>
{text}
</a>
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could extract this into a module-scoped renderLink function to make this a bit easier to read

Comment on lines +265 to +277

const rel = link.getAttribute('rel');
expect(rel).toContain('noopener');
expect(rel).toContain('noreferrer');
expect(rel).toContain('nofollow');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be safer to keep the assertion like it was previously (i.e. using toHaveAttribute) there could be additional stuff inside of the rel that we're not expecting and could point to an issue.

The current assertions would fail to detect these issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not revert back to toHaveAttribute because I want to be order agnostic (i.e. the rel could be noopener noreferrer or noreferrer noopener and the test should still pass).

We could maybe do expect(rel.split(' ')).toHaveLength(3) to catch the issues you raise?

Comment on lines +284 to +296
const rel = link.getAttribute('rel');
if (rel !== null) {
expect(rel).not.toContain('noopener');
expect(rel).not.toContain('noreferrer');
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar thing to above. Would use not.toHaveAttribute to be more precise. Also it's a bit strange to wrap assertions in a condition like this.

Same applies to test case further down as well.

});

test('returns new values if no existing rel values', () => {
const result = buildRel(['noopener', 'noreferrer'], undefined);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just drop the undefined parameter as it's optional

});

test('removes empty values', () => {
const result = buildRel([''], 'noopener noreferrer');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to test multiple values. One value could be a string that only contains spaces i.e. ' '

Comment on lines 22 to 30
case 'internal':
return {
url: formattedUrl,
target: undefined,
rel: undefined,
externality,
text,
};
case 'external-admin':
Copy link
Collaborator

@ntsim ntsim Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in other comments, an admin URL isn't always external (i.e. if you're already in admin), and vice-versa for public URLs. I think we need to go further and figure out what our host URL is and compare this with the external link's URL to figure out the externality.

Unfortunately, the implementation for this isn't super straightforward.

One initial idea was to use window.location, but I suspect this won't work well on the public frontend due to it being incompatible with SSR.

Environment variables don't really work as these can't be changed in the admin where they get baked into the build. As a result, I don't think they can be changed server or client-side in the admin.

Currently, I think the simplest approach would be to implement some common config that can set the host URL globally e.g.

// @common/utils/hostUrl.ts

let hostUrl: URL;

export function setHostUrl(value: string): void {
  hostUrl = new URL(value);
}

export function getHostUrl(): URL {
  return hostUrl;
}

We can then call the setHostUrl function in the admin and public frontend (in the correct places) to set the hostUrl.

In the public:

// pages/_app.tsx

// Call this at module-scope
setHostUrl(process.env.PUBLIC_URL);

In the admin:

// src/config.ts

import { setHostUrl } from '@common/utils/clientUrl';

export async function getConfig(): Promise<Config> {
  if (!config) {
    config = await fetch('/api/config').then(r => r.json());

    setHostUrl(config.url);
  }

  return config;
}

The admin will require us to add a new url property to the response from /api/config. This would need to be set from the AdminUri configuration variable, which is already set up for us (see appsettings.Development.json). It just needs to be fully qualified with a https://.

Once we have these bits in place, we should be able to check the URLs using getHostUrl e.g.

import { getHostUrl } from '@common/utils/hostUrl';

const url = new URL(options.url);
const hostUrl = getHostUrl();

if (url.host === hostUrl.host) {
  // internal
} else {
  // external
}

Comment on lines +10 to +11
PUBLIC_API_BASE_URL=https://public-api-base-url
PUBLIC_API_DOCUMENTATION_URL=https://public-api-base-url/docs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like a mistake. We already have variables for the public API above.

Comment on lines 349 to 352
const { url, target, rel, text } =
getPropsForExternality(
legacyLinkUrl,
description,
);

return (
<li key={id} data-testid="other-release-item">
<a href={url} target={target} rel={rel}>
{text}
</a>
</li>
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're in the frontend and we've already added the external linking props within the Link component, we can just simplify this all down to:

return (
  <li key={id} data-testid="other-release-item">
    <Link to={legacyLinkUrl}>{description}</Link>
  </li>
);

Comment on lines 422 to 431
const { url, target, rel, text } =
getPropsForExternality(link.url, link.description);

return (
<li key={link.id}>
<a href={url} rel={rel} target={target}>
{text}
</a>
</li>
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above - can replace this with the Link component

...props
}: LinkProps) => {
const { target, rel } = getPropsForExternality(
to.toString(),
undefined,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per other comment - if we use an options object for the function parameter, we don't need to have an undefined here

@sambiramairelogic sambiramairelogic force-pushed the EES-4823/update-broken-links branch from 1cfe056 to 6c6ee01 Compare July 16, 2024 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants