AUTHORS/mailmap update (with update_authors.pl improvements) #17316

Open: wants to merge 3 commits into base: master
8 changes: 6 additions & 2 deletions .mailmap
@@ -80,11 +80,13 @@ Youzhong Yang <[email protected]>

# Signed-off-by: overriding Author:
Alexander Ziaee <[email protected]> <[email protected]>
Ryan <[email protected]> <error.nointernet@gmail.com>
Sietse <[email protected]> <[email protected]>
Felix Schmidt <[email protected]> <f.sch.prototype@gmail.com>
Olivier Certner <[email protected]> <[email protected]>
Phil Sutter <[email protected]> <[email protected]>
poscat <[email protected]> <[email protected]>
Qiuhao Chen <[email protected]> <[email protected]>
Ryan <[email protected]> <[email protected]>
Sietse <[email protected]> <[email protected]>
Yuxin Wang <[email protected]> <[email protected]>
Zhenlei Huang <[email protected]> <[email protected]>

@@ -101,6 +103,7 @@ Tulsi Jain <[email protected]> <[email protected]>
# Mappings from Github no-reply addresses
ajs124 <[email protected]> <[email protected]>
Alek Pinchuk <[email protected]> <[email protected]>
Aleksandr Liber <[email protected]> <[email protected]>
Alexander Lobakin <[email protected]> <[email protected]>
Alexey Smirnoff <[email protected]> <[email protected]>
Allen Holl <[email protected]> <[email protected]>
@@ -137,6 +140,7 @@ Fedor Uporov <[email protected]> <[email protected]
Felix Dörre <[email protected]> <[email protected]>
Felix Neumärker <[email protected]> <[email protected]>
Finix Yan <[email protected]> <[email protected]>
Friedrich Weber <[email protected]> <[email protected]>
Gaurav Kumar <[email protected]> <[email protected]>
George Gaydarov <[email protected]> <[email protected]>
Georgy Yakovlev <[email protected]> <[email protected]>
10 changes: 9 additions & 1 deletion AUTHORS
@@ -29,6 +29,7 @@ CONTRIBUTORS:
Alejandro Colomar <[email protected]>
Alejandro R. Sedeño <[email protected]>
Alek Pinchuk <[email protected]>
Aleksandr Liber <[email protected]>
Aleksa Sarai <[email protected]>
Alexander Eremin <[email protected]>
Alexander Lobakin <[email protected]>
@@ -81,6 +82,7 @@ CONTRIBUTORS:
Arne Jansen <[email protected]>
Aron Xu <[email protected]>
Arshad Hussain <[email protected]>
Artem <[email protected]>
Arun KV <[email protected]>
Arvind Sankar <[email protected]>
Attila Fülöp <[email protected]>
@@ -227,10 +229,12 @@ CONTRIBUTORS:
Fedor Uporov <[email protected]>
Felix Dörre <[email protected]>
Felix Neumärker <[email protected]>
Felix Schmidt <[email protected]>
Feng Sun <[email protected]>
Finix Yan <[email protected]>
Francesco Mazzoli <[email protected]>
Frederik Wessels <[email protected]>
Friedrich Weber <[email protected]>
Frédéric Vanniere <[email protected]>
Gabriel A. Devenyi <[email protected]>
Garrett D'Amore <[email protected]>
@@ -484,7 +488,7 @@ CONTRIBUTORS:
Olaf Faaland <[email protected]>
Oleg Drokin <[email protected]>
Oleg Stepura <[email protected]>
Olivier Certner <olce[email protected]>
Olivier Certner <olce@FreeBSD.org>
Olivier Mazouffre <[email protected]>
omni <[email protected]>
Orivej Desh <[email protected]>
@@ -522,6 +526,7 @@ CONTRIBUTORS:
P.SCH <[email protected]>
Qiuhao Chen <[email protected]>
Quartz <[email protected]>
Quentin Thébault <[email protected]>
Quentin Zdanis <[email protected]>
Rafael Kitover <[email protected]>
RageLtMan <[email protected]>
@@ -573,6 +578,7 @@ CONTRIBUTORS:
Scot W. Stevenson <[email protected]>
Sean Eric Fagan <[email protected]>
Sebastian Gottschall <[email protected]>
Sebastian Pauka <[email protected]>
Sebastian Wuerl <[email protected]>
Sebastien Roy <[email protected]>
Sen Haerens <[email protected]>
@@ -589,6 +595,7 @@ CONTRIBUTORS:
Shen Yan <[email protected]>
Sietse <[email protected]>
Simon Guest <[email protected]>
Simon Howard <[email protected]>
Simon Klinkert <[email protected]>
Sowrabha Gopal <[email protected]>
Spencer Kinny <[email protected]>
@@ -610,6 +617,7 @@ CONTRIBUTORS:
Stéphane Lesimple <[email protected]>
Suman Chakravartula <[email protected]>
Sydney Vanda <[email protected]>
Syed Shahrukh Hussain <[email protected]>
Sören Tempel <[email protected]>
Tamas TEVESZ <[email protected]>
Teodor Spæren <[email protected]>
104 changes: 80 additions & 24 deletions scripts/update_authors.pl
@@ -59,6 +59,17 @@
# the display version. We use this slug to update two maps, one of email->name,
# the other of name->email.
#
# Where possible, we also consider Signed-off-by: trailers in the commit
# message, and if they match the commit author, enter them into the maps also.
# Because a commit can contain multiple signoffs, we only track one if either
# the name or the email address match the commit author (by slug). This is
# mostly aimed at letting an explicit signoff override a generated name or
# email on the same commit (usually a Github noreply), while avoiding every
# signoff ever being treated as a possible canonical ident for some other
# committer. (Also note that this behaviour only works for signoffs that can be
# extracted with git-interpret-trailers, which misses many seen in the OpenZFS
# git history, for various reasons).
#
# Once collected, we then walk all the emails we've seen and get all the names
# associated with every instance. Then for each of those names, we get all the
# emails associated, and so on until we've seen all the connected names and
@@ -118,41 +129,62 @@
}
}

# Next, we load all the commit authors. and form name<->email mappings, keyed
# on slug. Note that this format is getting the .mailmap-converted form. This
# lets us control the input to some extent by making changes there.
my %git_names;
my %git_emails;

for my $line (reverse qx(git log --pretty=tformat:'%aN:::%aE')) {
# Next, we load all the commit authors and signoff pairs, and form name<->email
# mappings, keyed on slug. Note that this format is getting the
# .mailmap-converted form. This lets us control the input to some extent by
# making changes there.
my %seen_names;
my %seen_emails;

# The true email address from commits, by slug. We do this so we can generate
# mailmap entries, which will only match the exact address from the commit,
# not anything "prettified". This lets us remember the prefix part of Github
# noreply addresses, while not including it in AUTHORS if that is truly the
# best option we have.
my %commit_email;

for my $line (reverse qx(git log --pretty=tformat:'%aN:::%aE:::%(trailers:key=signed-off-by,valueonly,separator=:::)')) {
chomp $line;
my ($name, $email) = $line =~ m/^(.*):::(.*)/;
my ($name, $email, @signoffs) = split ':::', $line;
next unless $name && $email;

my $semail = email_slug($email);
my $sname = name_slug($name);

$git_names{$semail}{$sname} = 1;
$git_emails{$sname}{$semail} = 1;

# Update the "best looking" display value, but only if we don't already
# have something from the AUTHORS file. If we do, we must not change it.
if (!$authors_name{email_slug($email)}) {
update_display_email($email);
}

if (!$authors_email{name_slug($name)}) {
update_display_name($name);
# Track the committer name and email.
$seen_names{$semail}{$sname} = 1;
$seen_emails{$sname}{$semail} = 1;

# Keep the original commit address.
$commit_email{$semail} = $email;

# Consider if these are the best we've ever seen.
update_display_name($name);
update_display_email($email);

# Check signoffs. If exactly one of a signoff's name or email matches the
# committer's (by slug), track that signoff too.
for my $signoff (@signoffs) {
my ($soname, $soemail) = $signoff =~ m/^([^<]+)\s+<(.+)>$/;
next unless $soname && $soemail;
my $ssoname = name_slug($soname);
my $ssoemail = email_slug($soemail);
if (($semail eq $ssoemail) ^ ($sname eq $ssoname)) {
$seen_names{$ssoemail}{$ssoname} = 1;
$seen_emails{$ssoname}{$ssoemail} = 1;
update_display_name($soname);
update_display_email($soemail);
}
}
}

# Now collect unique committers by all names+emails we've ever seen for them.
# We start with emails and resolve all possible names, then we resolve the
# emails for those names, and round and round until there's nothing left.
my @committers;
for my $start_email (sort keys %git_names) {
for my $start_email (sort keys %seen_names) {
# it might have been deleted already through a cross-reference
next unless $git_names{$start_email};
next unless $seen_names{$start_email};

my %emails;
my %names;
@@ -163,12 +195,12 @@
while (my $email = shift @check_emails) {
next if $emails{$email}++;
push @check_names,
sort keys %{delete $git_names{$email}};
sort keys %{delete $seen_names{$email}};
}
while (my $name = shift @check_names) {
next if $names{$name}++;
push @check_emails,
sort keys %{delete $git_emails{$name}};
sort keys %{delete $seen_emails{$name}};
}
}

@@ -190,11 +222,24 @@

$authors_email{$name} = $email;
$authors_name{$email} = $name;

# We've now selected our canonical name going forward. If there
# were other options from commit authors only (not signoffs),
# emit mailmap lines for the user to paste into .mailmap.
my $cemail = $display_email{email_slug($authors_email{$name})};
for my $alias (@$emails) {
next if $alias eq $email;

my $calias = $commit_email{$alias};
next unless $calias;

my $cname = $display_name{$name};
say "$cname <$cemail> <$calias>";
}
}

# Now output the new AUTHORS file
open my $fh, '>', 'AUTHORS' or die "E: couldn't open AUTHORS for write: $!\n";
#my $fh = \*STDOUT;
say $fh join("\n", @authors_header, "");
for my $name (sort keys %authors_email) {
my $cname = $display_name{$name};
@@ -233,9 +278,18 @@ sub email_slug {
return lc $email;
}

# As we accumulate new names and addresses, record the "best looking" version
# of each. Once we decide to add a committer to AUTHORS, we'll take the best
# version of their name and address from here.
#
# Note that we don't record them if they're already in AUTHORS (that is, in
# %authors_name or %authors_email) because that file already contains the
# "best" version, by definition. So we return immediately if we've seen it
# there already.
sub update_display_name {
my ($name) = @_;
my $sname = name_slug($name);
return if $authors_email{$sname};

# For names, "more specific" means "has more non-lower-case characters"
# (in ASCII), guessing that if a person has gone to some effort to
@@ -252,9 +306,11 @@ sub update_display_name {
sub update_display_email {
my ($email) = @_;
my $semail = email_slug($email);
return if $authors_name{$semail};

# Like names, we prefer uppercase when possible. We also remove any
# leading "plus address" for Github noreply addresses.

$email =~ s/^[^\+]*\+//g if $email =~ m/\.noreply\.github\.com$/;

my $cemail = $display_email{$semail};
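The signoff rule described in the update_authors.pl comments can be illustrated in isolation. The following is a minimal sketch, not the script's actual code: the helper `signoff_is_alias` is hypothetical, the two slug functions are simplified stand-ins for the real `name_slug` and `email_slug`, and the names and addresses are invented. It shows a signoff being treated as an alias of the commit author only when exactly one of its name or email matches by slug.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Simplified stand-ins for the script's slug helpers (assumption: the real
# ones normalise more aggressively than this).
sub name_slug  { my ($n) = @_; $n = lc $n; $n =~ s/\s+//g; return $n }
sub email_slug { my ($e) = @_; return lc $e }

# Hypothetical helper: returns 1 if the signoff should be tracked as another
# identity of the commit author, 0 otherwise.
sub signoff_is_alias {
    my ($name, $email, $signoff) = @_;

    # Same "Name <email>" pattern the diff uses to parse a trailer value.
    my ($soname, $soemail) = $signoff =~ m/^([^<]+)\s+<(.+)>$/;
    return 0 unless $soname && $soemail;

    my $same_email = email_slug($email) eq email_slug($soemail);
    my $same_name  = name_slug($name)   eq name_slug($soname);

    # Track only when exactly one side matches: both matching adds nothing
    # new, and neither matching means the signoff is someone else's.
    return ($same_email xor $same_name) ? 1 : 0;
}

# Invented sample data: author name, author email, Signed-off-by value.
my @cases = (
    ['Jane Doe', '1234+jdoe@users.noreply.github.com', 'Jane Doe <jane@example.org>'],
    ['jd',       'jane@example.org',                   'Jane Doe <Jane@Example.org>'],
    ['Jane Doe', 'jane@example.org',                   'Jane Doe <jane@example.org>'],
    ['Jane Doe', 'jane@example.org',                   'Sam Roe <sam@example.org>'],
);

for my $case (@cases) {
    my ($name, $email, $signoff) = @$case;
    printf "%d  %s <%s> / %s\n",
        signoff_is_alias($name, $email, $signoff), $name, $email, $signoff;
}
```

In this sketch the first two cases are tracked (a matching name with a different address, and a matching address with a different name), while an identical signoff and an unrelated signoff are both ignored. That is the behaviour the comment block aims for: an explicit signoff can override a generated name or noreply address on the same commit without every signoff becoming a candidate identity for another committer.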