Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sink serialization: Do not provide the type explicitly #2498

Merged
merged 1 commit into from
Nov 26, 2020

Conversation

Geod24
Copy link
Contributor

@Geod24 Geod24 commented Nov 24, 2020

Let IFTI do its thing, as it needs to for 'auto ref'.

Let IFTI do its thing, as it needs to for 'auto ref'.
@s-ludwig
Copy link
Member

Hm, nothing to do with the changes in here, but the "sink serialization" as it stands still looks a bit strange to me. Shouldn't it just support the same range based toString as format et al.? I think rather than falling back to isStringSerializable the serialization framework should also just use appender!string to present a string to serializers that don't support sink based toString to force making the results consistent. The public API changes could then also be dropped completely, since the fallback would be implicit.

Sorry for coming up with this now, I briefly saw the two PRs, but didn't manage to look through them in time.

On a related note, #2493 and #2495 appear to have been merged without a merge commit, which makes it hard to track it back to their PRs. We should avoid that and should just require the branch to be rebased before the merge instead if it is too out of date.

@Geod24
Copy link
Contributor Author

Geod24 commented Nov 25, 2020

Shouldn't it just support the same range based toString as format et al.?

toString also support this sink approach. I'm not fully happy with the sink approach at the moment, because I think the type it accepts should be more specialized, but it's currently the best middle ground we have for our use case.

Using appender is not really solving the issue, it's still forcing one allocation strategy over another.
For reference, I discussed this at DConf: https://www.youtube.com/watch?v=9lOtOtiwXY4

Sorry for coming up with this now, I briefly saw the two PRs, but didn't manage to look through them in time.

No worries, I went ahead with the knowledge that you might come in and provide an alternative if it really wasn't working :)

On a related note, #2493 and #2495 appear to have been merged without a merge commit, which makes it hard to track it back to their PRs. We should avoid that and should just require the branch to be rebased before the merge instead if it is too out of date.

I'm not sure I follow ? You can see the PR from the commit in Github UI: d5d47d1 (you see which branches it's in, which tags, and the PR that introduced it).
I tend to prefer rebasing because it keeps a linear history. That's how git was meant to be used AFAIK, Github's PR just set a new default. They have started to add support for more git-friendly workflow over the past few years, including requiring linear history, this rebase approach, etc...

Obviously, if you prefer merge commit, I'll go back to that, but be aware that it's also dlang-bot default nowadays (when the PR has a single commit).

@s-ludwig
Copy link
Member

Shouldn't it just support the same range based toString as format et al.?

toString also support this sink approach. I'm not fully happy with the sink approach at the moment, because I think the type it accepts should be more specialized, but it's currently the best middle ground we have for our use case.

Yeah, sure, what I actually meant was to support all applicable variants, although I worded that poorly.

Using appender is not really solving the issue, it's still forcing one allocation strategy over another.
For reference, I discussed this at DConf: https://www.youtube.com/watch?v=9lOtOtiwXY4

But that's only for the case where the serializer doesn't support sink delegate based strings. It might in theory make certain structs use appender+sink when before they would use the basic toString, which could in turn work without allocations. The question is why would it then define a sink based toString in the first place?

BTW, I managed to watch your talk "live" with one ear, but missed a few parts. I have quite a number of places in mind where it would really help, though (opApply of course being the most prominent example). I'll watch it properly when I get the chance!

On a related note, #2493 and #2495 appear to have been merged without a merge commit, which makes it hard to track it back to their PRs. We should avoid that and should just require the branch to be rebased before the merge instead if it is too out of date.

I'm not sure I follow ? You can see the PR from the commit in Github UI: d5d47d1 (you see which branches it's in, which tags, and the PR that introduced it).

It may still be visible in the GitHub UI, but not in the repository (log) itself. I almost always rely on the latter and being forced to go through the web interface would be rather annoying. Also, this knowledge is lost as soon as this isn't hosted on GitHub anymore (who knows how things change).

Another benefit is that there is a history of merge commits that are known to pass the tests, whereas intermediate commits in a branch may actually fail, making bisecting more difficult.

I agree that a linear history is nice, though, and what we use internally is forcing a linear history of merge commits, which IMO is a quite acceptable middle ground, but AFAIK GitHub doesn't support that.

@Geod24
Copy link
Contributor Author

Geod24 commented Nov 25, 2020

It may still be visible in the GitHub UI, but not in the repository (log) itself. I almost always rely on the latter and being forced to go through the web interface would be rather annoying. Also, this knowledge is lost as soon as this isn't hosted on GitHub anymore (who knows how things change).

Correct, but then, what use is the PR number if you don't have access to the PR itself ?

Another benefit is that there is a history of merge commits that are known to pass the tests, whereas intermediate commits in a branch may actually fail, making bisecting more difficult.

I only use it on single commit PRs, or when I'm 100% sure that the commits are atomic. And it's also why dlang-bot does rebase only for single-commit PRs, and use merges otherwise. git bisect is a godsend.

I agree that a linear history is nice, though, and what we use internally is forcing a linear history of merge commits, which IMO is a quite acceptable middle ground, but AFAIK GitHub doesn't support that.

You can require a linear history in the branch protections rules:
Screen Shot 2020-11-25 at 11 44 52

If I understood correctly what you meant.

@Geod24
Copy link
Contributor Author

Geod24 commented Nov 25, 2020

The question is why would it then define a sink based toString in the first place?

Because that's the best way to compose. You can intercept in the delegate and do whatever you want. You can even write directly to a socket, bypassing any intermediary buffer. You can write to multiple destinations at once, etc...

@s-ludwig
Copy link
Member

Correct, but then, what use is the PR number if you don't have access to the PR itself ?

For example having it mentioned in the change log, also the important information is all in the commit itself - author, merger, merge date, PR description. Discussion comments and other GitHub-only things would have to be backed up separately, of course.

You can require a linear history in the branch protections rules:

Yeah, but doesn't that specifically disable the possibility to emit merge commits? I though that I had tried that once, but I'm not completely sure.

What I'm after is this:

  o---o--o   o--o-o   o---o
 /        \ /      \ /     \
o----------o--------o-------o

and only disallow this:

   o-----o-----o---o
  /          o--o-o \   o---o
 /          /      \ \ /     \
o----------o--------o-o------o

@s-ludwig
Copy link
Member

The question is why would it then define a sink based toString in the first place?

Because that's the best way to compose. You can intercept in the delegate and do whatever you want. You can even write directly to a socket, bypassing any intermediary buffer. You can write to multiple destinations at once, etc...

I was thinking of a case like this:

struct S {
   string field;
   string toString() { return field; }
   void toString(void delegate(string) del) { del(field); } // why would anyone define this?
}

@Geod24
Copy link
Contributor Author

Geod24 commented Nov 25, 2020

I was thinking of a case like this: [...]

Well in this case there's little use for it, because the caller can do this itself. But that's a very rare case. If field is int for example, suddenly, you have to allocate. Or if it's char[] or const char[], you have to allocate (or cast but that leads to other issues).

@Geod24
Copy link
Contributor Author

Geod24 commented Nov 25, 2020

For example having it mentioned in the change log

I guess that depends on your workflow, but in order to have it in the changelog, I still do the commit => PR things online. But if you want to do everything from the command line, that might make things a bit more annoying.

also the important information is all in the commit itself - author, merger, merge date, PR description.

Aside from the PR description (which should be the commit message), it's all there too:

$ git show --pretty=fuller d5d47d15
commit d5d47d15ca36fc824bfbc0b7520f0535419e2d5a
Author:     Daniel Graczer <[email protected]>
AuthorDate: Thu Nov 12 09:14:24 2020 +0100
Commit:     Mathias LANG <[email protected]>
CommitDate: Mon Nov 23 15:41:38 2020 +0100

    Added sink type toString to the serialization

    This change should reduce the number of memory allocations when using toString custom serialization
[...]

But okay, I'll make sure to use auto-merge in the future.

What I'm after is this:
[...]
and only disallow this:
[...]

The closest you can come to is "Require branches to be up to date before merging". But IMO that nullifies one big advantage of having a DVCS in the first place .

@dlang-bot dlang-bot merged commit ac7824c into vibe-d:master Nov 26, 2020
@Geod24
Copy link
Contributor Author

Geod24 commented Nov 26, 2020

Merging this, since it fixes a bug. Will use the auto-merge label from now or merge commit.
If you want to revisit the sink interface, I'm more than happy to see how it could be improved. I'm going to use it in our server in the meantime to reduce amount of allocations.

@Geod24 Geod24 deleted the serialization-fixup branch November 26, 2020 10:03
@PetarKirov
Copy link
Contributor

PetarKirov commented Nov 26, 2020

also the important information is all in the commit itself - author, merger, merge date, PR description.

BTW, @dlang-bot adds Merged-on-behalf-of to commits messages when doing merge commits (or at least it used to). I suppose it can do the same for auto-merge-rebase / auto-merge-squash and also add additional info like PR number and GitHub link, PR title, etc.

@Geod24
Copy link
Contributor Author

Geod24 commented Nov 26, 2020

That'd be great. Now we just need to raise a PR (or at least, an issue so it isn't forgotten)

@PetarKirov
Copy link
Contributor

The closest you can come to is "Require branches to be up to date before merging". But IMO that nullifies one big advantage of having a DVCS in the first place .

At work we use exclusively fast-forward-only, as the idea is that before merging each developer needs to rebase on the dev (our main development branch). This can be done locally, or via GitLab's UI. In both cases the CI will always validate the result of a fast-forward merge, so it's possible for another merge request to sneak in and break the current one. As soon something is merged in dev we rebase again. In practice, this works great and we haven't found the need to look at the merge request themselves, as the idea is that each commit should have all the context necessary to understand the change. Though perhaps this is easier in a company, then an open-source projects where not everyone has time for follow things closely.

image

@PetarKirov
Copy link
Contributor

That'd be great. Now we just need to raise a PR (or at least, an issue so it isn't forgotten)

@Geod24 there you go: dlang/dlang-bot#262. Feel free to update the description.

@s-ludwig
Copy link
Member

s-ludwig commented Nov 26, 2020

I was thinking of a case like this: [...]

Well in this case there's little use for it, because the caller can do this itself. But that's a very rare case. If field is int for example, suddenly, you have to allocate. Or if it's char[] or const char[], you have to allocate (or cast but that leads to other issues).

Okay, so in that case, if the serializer supports the sink approach, it will use that overload to avoid allocations, only if that's not the case appender will be used as a fallback. The question is how serious or frequent that case is - using a custom serializer that doesn't support the sink overload and having a type that uses a custom allocation strategy instead of format or similar in its toString.

The closest you can come to is "Require branches to be up to date before merging". But IMO that nullifies one big advantage of having a DVCS in the first place .

I don't get that. You want to rebase everything directly on top of (say) master. Why would rebasing and having an additional merge commit suddenly nullify any DVCS advantages? edit: The only difference I can see is that the first case is directly supported by GitHub using the "rebase and merge" button, while the second one needs to be done manually by the contributor (in contrast to GitLab).

But aside from that question this indeed seems to be the setting that I was searching for. I overlooked that completely, probably because it is oddly placed above the status check list.

@Geod24
Copy link
Contributor Author

Geod24 commented Nov 26, 2020

Why would rebasing and having an additional merge commit suddenly nullify any DVCS advantages?

DVCS are great at parallelizing work. Remember the days of SVN ? I do, and it wasn't great to collaborate. OTOH, having the contributor to rebase has the opposite effect, it creates "choke points".
Nowadays Github has an option to allow a maintainer to push to one's fork, luckily, but it's not always enabled, and just creates more work for the maintainer(s).

I still use merge commits BTW, because I don't trust most people to make truly atomic commits (commits that address a single concern, compile, and will pass the test-suite). But for single-commit PRs, that's not an issue.

@s-ludwig
Copy link
Member

I agree, at least from an open source perspective, that this is a problem for contributors to a certain degree. For that reason I usually don't ask for it, if a branch is just slightly outdated. But it would be really nice if that "rebase and merge" button could just be configured to create a merge request. Having a "rebase now" button like GitLab has would also solve this.

Yeah, but SVN was really not great for parallel work and branching. But it did have a certain merit to always have a single source of truth (not saying that this outweighs its disadvantages in any way, though).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants