New Raku file extensions #108

Closed
AlexDaniel opened this issue Sep 24, 2019 · 22 comments

Comments

@AlexDaniel
Member

This is a separate ticket for discussing solutions for the filename extensions issue in the Path to Raku PR. I'm starting a separate ticket so that we don't pollute the PR which is already hard to follow, and I hope that the PR will be eventually adjusted based on the feedback in this ticket. Because previous tickets¹ ² were a bit noisy, it was recommended to lock this discussion (so that only devs can comment).

Please see my review.

@AlexDaniel AlexDaniel added the language Changes to the Raku Programming Language label Sep 24, 2019
@Raku Raku locked as too heated and limited conversation to collaborators Sep 24, 2019
@lizmat
Collaborator

lizmat commented Sep 24, 2019

I feel I don't have anything valuable to add to this discussion.

@duncand

duncand commented Sep 24, 2019

Given that every solution we come up with is going to include keeping near-term support for the current-to-be-legacy file extensions anyway, I suggest one way to avoid an impasse is to just make the PR say that existing extensions will continue to be supported for the short term, and that adding new extensions to better fit the name Raku will be the subject of a separate PR to follow soon after. It's not like the extension changes HAVE to happen at the same time, since they are backwards-compatible additions, not just substitutions. I personally don't want the main language renaming PR to be held up any further by extensions discussion.

@jnthn
Contributor

jnthn commented Sep 24, 2019

Some comments.

The chosen extensions are too long/“ugly”. ... Change them to something likable

Thing is, this is a subjective judgement; if the PR had picked shorter ones, somebody could have written the exact same argument, just exchanging "long" for "short", to advocate the opposite position.

My initial reaction was also "hmm, those are a bit long", but:

  • The (admittedly small sample size) poll showed a preference for them.
  • I couldn't find an objective reason against them that felt very compelling. "Too long to type" didn't feel that convincing given things like tab completion and multiple ways to have a META6.json maintained for you.
  • It seemed like all of the shorter options had conflicts. The .rk, .rkm, etc. series breaks down when considering a test extension (.rkt is used by Racket, and the last thing we want to trample on is another programming language). I put forward an alternative (.ra...), but there were concerns over conflicts with RealMedia extensions, which are perhaps still bound widely enough to be an issue.

people will keep asking for alternatives

Maybe, but, as language responsible, I can say that problem-solving issues that attempt to revisit decisions made in the renaming PR will need to make a very compelling case to avoid being dismissed out of hand.

Do we need a test extension? Why?

I don't know for sure. To me it's fairly clear that:

  • prove is already just one of various ways to run tests, so we shouldn't really constrain ourselves by its behavior, or assume it's the only thing being used. (Evidence: prove6 exists, and anybody running tests from Comma IDE is using its runner.)
  • Expecting people to write shebang lines in their tests feels unreasonable in general. If folks choose to so they can use prove to run tests in one directory that are written in multiple languages, then that's fine. But I don't think it's a good default expectation or recommendation; it'll just feel like clutter/boilerplate, and I'd rather not have to explain it. The common case is that folks will have a single-language codebase, or be using quite different tools for running the tests of different parts of the system (frontend, backend).

While I accept that there exist technical mechanisms in many editors, IDEs, GitHub source rendering, etc. for making a distinction based on file content, that doesn't make this approach ideal. As noted above, I don't really feel "you should write shebang lines in your test files" is a recommendation we should be making. By contrast, a given file extension mapping to a certain programming language is unambiguous. In summary: I'm not arguing on the basis of "what's possible", but rather "what will tend to Just Work".

Justify why a separate extension for tests is needed

I'm not sure it's strictly possible to argue that it's needed, but that doesn't mean one can't argue that it would have value.

I'm open to all kinds of solutions, but not the ones that attempt to rush this PR by ignoring the details

As noted above, I did spend some time thinking the extensions in the PR through before approving it. So it wasn't a rush on my part.

@AlexDaniel
Member Author

Just a quick note regarding shebang lines, I'm getting some weird hints. @jnthn, I think you're projecting your own habits as if it was common practice.

  • Expecting people to write shebang lines in their tests feels unreasonable in general. If folks choose to so they can use prove to run tests in one directory that are written in multiple languages, then that's fine. But I don't think it's a good default expectation or recommendation; it'll just feel like clutter/boilerplate, and I'd rather not have to explain it. The common case is that folks will have a single-language codebase, or be using quite different tools for running the tests of different parts of the system (frontend, backend).

While I accept that there exist technical mechanisms in many editors, IDEs, GitHub source rendering, etc. for making a distinction based on file content, that doesn't make this approach ideal. As noted above, I don't really feel "you should write shebang lines in your test files" is a recommendation we should be making. By contrast, a given file extension mapping to a certain programming language is unambiguous. In summary: I'm not arguing on the basis of "what's possible", but rather "what will tend to Just Work".

I wrote a quick script to gather some stats and ran it on a slightly outdated perl6-all-modules repo. Out of 1318 script files (p6, pl6, pl), 54.6% (719) have a shebang line and 35.9% (473) have executable bit set. This doesn't count files which have no extension, but if it did, the percentage would've been even higher.
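
A count like the one above could be gathered roughly as follows. This is a minimal sketch in Python, not the script that was actually used: the function names are hypothetical, and it assumes "has a shebang" means the file starts with the two bytes `#!` and "executable" means any execute permission bit is set.

```python
import os
import stat

# Extensions treated as "script files" in the comment above.
SCRIPT_EXTENSIONS = (".p6", ".pl6", ".pl")


def has_shebang(path):
    """Return True if the file starts with the two bytes '#!'."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"#!"


def is_executable(path):
    """Return True if any of the user/group/other execute bits is set."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def script_stats(root):
    """Walk `root` and return (total, with_shebang, executable) for script files."""
    total = shebang = executable = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(SCRIPT_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            total += 1
            shebang += has_shebang(path)
            executable += is_executable(path)
    return total, shebang, executable
```

Calling `script_stats` on a checkout of the modules repository would give the three raw counts, from which the percentages follow by division.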

By contrast, 0 scripts in all of your CPAN and GitHub repos have a shebang line (there are around 20, I can't be bothered to collect the exact number). And yes, even those scripts that have no extension still don't have a shebang. Just because you've never written a shebang line for a public perl6 project doesn't mean that it's unthinkable that others will, because roughly half have no problem with it.

We can also do the same for .t/.t6 files, but we have to keep in mind that prove was misinterpreting perl6 shebangs as perl5, so there was never a good enough incentive to use it. That said: out of 6319 .t6 and .t files, 23.1% (1457) have a shebang and 3.6% (227) are executable.

So I totally don't see how it's unreasonable.

@AlexDaniel
Member Author

To be fair, I probably made a dent in the number because over the years I've done many pull requests to various modules. If I ever saw any inconsistencies, I'd usually fix them (file extensions, executable bits, sometimes shebangs, etc.). I'd be surprised if my contribution is more than 1%, but I guess it's fair to mention it anyway.

@jnthn
Contributor

jnthn commented Sep 25, 2019

I wrote a quick script to gather some stats and ran it on a slightly outdated perl6-all-modules repo. Out of 1318 script files (p6, pl6, pl), 54.6% (719) have a shebang line and 35.9% (473) have executable bit set.

I was talking exclusively about test files.

Having slept on it, I'm thinking we could:

  • Offer an extension for those who simply wish to declare "this is Raku code, and it's intended as a test file".
  • For those who might wish to have multi-language test suites, recommend .t + shebang.

@AlexDaniel
Member Author

AlexDaniel commented Sep 25, 2019

Having slept on it, I'm thinking we could:

  • Offer an extension for those who simply wish to declare "this is Raku code, and it's intended as a test file".

  • For those who might wish to have multi-language test suites, recommend .t + shebang.

And what changed? Yesterday you said exactly the same thing. Basically, “If you really want to use .t, you can”. From the discussion yesterday:

<jnthn> 1) Nothing about the presence of an alternative for .t for those who wish to use it prevents them using it […]

@jnthn
Contributor

jnthn commented Sep 25, 2019

And what changed?

Yesterday I was considering it, and today I'm feeling more decisive about it.

@AlexDaniel
Member Author

Does anybody know any good example of long extensions that are similar to our use case?

I know there's .markdown, but then .md seems to be more popular. In the case of .asciidoc, their own documentation recommends .adoc and says that it is the one most often used.

@AlexDaniel
Member Author

AlexDaniel commented Sep 25, 2019

So I decided to gather some stats. On Debian, the apt-file tool has an index of all files in all existing packages (even those not installed). So accumulating some stats from the paths spewed by apt-file is somewhat reasonable, but it does depend on the repos that are configured on a particular system.

Anyway, on my system apt-file search . > all-files gives a 938.9 M output (11 699 782 files listed).

Then I wrote a quick and dirty raku script to count files with particular extensions. Resulting json file.
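
The counting step might look something like this sketch. It is written in Python rather than Raku and is not the original script. It assumes that `apt-file search` prints one `package: /path/to/file` pair per line, and it treats everything from the last dot of the file name onward as the extension. `count_extensions` is a hypothetical name.

```python
import os
from collections import Counter


def count_extensions(lines):
    """Count file extensions in lines of the form 'package: /path/to/file'."""
    counts = Counter()
    for line in lines:
        # Split off the package name; skip lines that do not match the format.
        _package, sep, path = line.rstrip("\n").partition(": ")
        if not sep:
            continue
        # splitext works on the last dot and ignores a leading dot (".bashrc").
        ext = os.path.splitext(os.path.basename(path))[1]
        if ext:
            counts[ext] += 1
    return counts
```

Fed the `all-files` dump line by line, this yields an extension-to-count mapping that could then be serialized to JSON.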

But that's just raw data, now what? Well, let's look at extensions that are equal in length to or longer than .rakumod. Also, there are 5576 pm/pm6 files in all of our modules, so assuming that Debian eventually includes most of them, let's only keep extensions with more than 5000 files (not unreasonable, there are 58k “.pm” files which are probably all perl5-related).

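#!/usr/bin/env perl6
use JSON::Fast;
# Load the extension => file-count map produced by the counting script.
my %extensions = from-json slurp 'extension-stats.json';

.say for
%extensions
.grep({.key.chars >= 7+1})      # at least as long as ".rakumod" (7 letters plus the dot)
.grep({.value > 5000})          # only extensions with more than 5000 files
.sort({-.key.chars, -.value})   # longest first, then most common first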

Result:

.shader_test => 33939
.properties => 6559
.kicad_mod => 14518
.docbook => 22575
.desktop => 19462

.shader_test files come from a single package piglit, and all kicad footprints come from 3 kicad packages. .desktop and .properties are different beasts, but .docbook is probably the closest thing we'll find to compare our extensions to.

Some conclusions:

  • In the stupid competition for the longest extension for a common file type we get the 2nd place, yay 🎉🥈 (.docbook wins)
  • Answering my own question, .docbook seems to be an interesting example. It seems that .dbk is an alternative, yet everyone seems to prefer .docbook? I'd love to know more about it.

Edit: Oops, I got confused. These are kicad footprints, not 3d models

@AlexDaniel
Member Author

Now that I think about it, I shouldn't be dismissing .kicad_mod files. It just happens that all of these files are gathered in just a few packages, but every KiCad project will often have its own custom footprints.

Which makes me wonder, why not .raku-mod or .raku_mod? I couldn't find an objective reason against them that feels very compelling.

@vrurg
Contributor

vrurg commented Sep 25, 2019

I would like to note that I strongly disagree with the following point: "Note that mentioning some random text editor as a reason is not going to work, you have to actually know why it won't work with that editor." Jonathan used the right term for this situation: things had better just work. So, where are we with editors? Alex demands that they all understand shebangs. I would be happy if they did, but unfortunately we can't rely on this. The shebang itself is very much a Unix thing. Windows-bound editors are not obliged to understand it. It's good if they do, but I wouldn't expect it.

Some other editors can be fixed with additional configuration, which sometimes means time and effort. Perhaps most people reading this are ready to adjust their tooling for their needs, and capable of doing so. But if we aim at wider adoption of Raku we must consider the army of juniors, or just lazy ones, who would rather adapt themselves to the tools. They do so for various reasons, but the result is this: they won't try something that requires breaking their habits.

And my bottom line for the above: the extension is the primary and most used way of defining the file type. And even if one has an editor which supports shebangs or other ways of detecting the file format, OSes are not that smart. Most of them are solely dependent on extensions. And whether we want this or not, a lot of people are used to just double-clicking and getting... what? RealPlayer or VLC? Another language's runtime?

Long extensions aren't what I wanted in the first place. But neither did I want all this renaming headache. It just so happens that the reasoning is in favor of these two things. It's not about our wishes. My preference, BTW, is for twextensions for .rk because it is the clearest way, similar to namespacing. I.e. .t.rk defines 'test file of Raku', .m.rk is 'module of Raku', and so on. A decent editor would see a Raku file, whereas more advanced tools would also know exactly what kind of file it is. And I see no big deal in making prove understand *.t.* as test files.

But the whole point about twextensions is that they got no support from the community and even strong opposition from @lizmat! Perhaps the reason for this was that the idea was born too late and got no time to convince people.

As to lengthiness, and the fact that no other language is using such extensions: is it the first thing where Perl6 is the first of its kind? Saying "this is bad because no one else is doing this" is like saying "this is bad because our grandfathers didn't do it".

@patrickbkr
Member

For me type-ability is not the same as word length. I find it easier to type words only consisting of letters than words also containing special characters such as -, _ or .. Symbols that need the shift key (like the _) are the ones I prefer least. So typing-wise .rakumod is preferable to .raku_mod, .m.rk or similar to me, but that's highly subjective. It'd be interesting to know how others perceive this.

@ugexe
Contributor

ugexe commented Sep 25, 2019

Twigiled extensions are just asking for more ambiguity (just like having multiple extensions represent the same thing, which I'll also adamantly reject). If you have .t.rk and .rk then anything grepping for e.g. scripts can't just look for .rk, they have to also filter out anything with t..
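
To make the grepping concern concrete, here is a small sketch (Python, with made-up file names) of what a tool looking for scripts would have to do under a `.t.rk` scheme. `is_script` is a hypothetical helper, not part of any existing tool.

```python
def is_script(filename):
    """Under a two-part scheme, '.rk' alone is not enough to identify a script."""
    # A plain suffix check also matches 'basic.t.rk', so tests must be excluded.
    return filename.endswith(".rk") and not filename.endswith(".t.rk")


# Invented file names, for illustration only.
names = ["deploy.rk", "basic.t.rk", "Foo.m.rk"]

naive_matches = [n for n in names if n.endswith(".rk")]
scripts = [n for n in names if is_script(n)]
```

Here the naive suffix check returns all three names, and even the stricter `is_script` still lets `Foo.m.rk` through, so every additional "kind" prefix needs its own exclusion.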

The proposed solution was fine, and extending this to get everything you want is not generally how compromises that succeed end.

@AlexDaniel
Member Author

Another quick note:

  • prove is already just one of various ways to run tests, so we shouldn't really constrain ourselves by its behavior, or assume it's the only thing being used. (Evidence: prove6 exists, and anybody running tests from Comma IDE is using its runner.)

  • Expecting people to write shebang lines in their tests feels unreasonable in general

Rereading it, it's pretty weird. So we shouldn't expect people to write shebangs, but we're also not constraining ourselves to the behavior of prove… But that means .t without shebangs is also OK because we can recommend our own tooling which defaults to raku.


@vrurg I asked you for concrete examples, yet you come up with more mythical stuff. Please be more specific.

@AlexDaniel
Member Author

FWIW, I think we didn't really get further than the brainstorming phase on this issue, there are probably still some potentially good ideas out there.

For example, just as an idea, .rk.t (yes, in that order). Include a shebang and prove works. All other tools can immediately see what's inside without peeking into the file. *.t glob finds all test files, *.rk.t finds raku test files. You can drop .rk part if you hate it.

For example: .rk, .rkm, .rkd, .rk.t
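
The glob behaviour described above can be checked with a short sketch. The file names are invented for illustration, and `fnmatchcase` from Python's standard library stands in for a shell glob.

```python
from fnmatch import fnmatchcase

# Invented file names, for illustration only.
names = ["basic.rk.t", "legacy.t", "deploy.rk", "Foo.rkm"]

# '*.t' finds every test file, whatever language it is written in.
all_tests = [n for n in names if fnmatchcase(n, "*.t")]

# '*.rk.t' narrows that down to test files that declare themselves as Raku.
raku_tests = [n for n in names if fnmatchcase(n, "*.rk.t")]
```

With this naming, `all_tests` contains both `basic.rk.t` and `legacy.t`, while `raku_tests` contains only `basic.rk.t`.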

@vrurg
Contributor

vrurg commented Sep 25, 2019

If you have .t.rk and .rk then anything grepping for e.g. scripts can't just look for .rk, they have to also filter out anything with t..

Actually, I can come up with a counter-example: finding all Raku files. And whereas filtering out .t.rk is a solution for your task, in my task there is no other solution but to analyze file content. Yet:

The proposed solution was fine, and extending this to get everything you want is not generally how compromises that succeed end.

I'm totally ok with long extensions. I would be ok with short ones if they're pushed through even if they overlap with other formats. What I'm not ok with is another discussion on this subject as if we haven't had enough of #89, #101, and #106.

@AlexDaniel concrete examples of what? I have concrete examples of people around me behaving the way I described. I know that if they were used to Atom and had the language-perl6 package installed, only to find out their Perl5 files are now 'plain text' with no support whatsoever, they would just drop the idea of trying Perl6.

@jnthn
Contributor

jnthn commented Sep 25, 2019

Well, if you want concrete, I also play product lead/manager for Comma. Wearing that hat, a language rename is, in the near term, nothing but a cost. I've already made a list of things we'll need to take care of as a result of the rename; it's not terribly short, and it's not like we suddenly have more resources to deal with them. If the rename helps the userbase increase in the future, that's great, but the costs are to be paid now.

As @vrurg notes, in the context of working with files, distinguishing content by an extension - usually defined as something after the final . in a filename - is overwhelmingly the common case. It's simple, and there's no question about whether an editor is able to do it. Anything more than that is more effort. For example, does the two-part extension name thing work out naturally or not? I don't actually know without trying it, even on the IDEA platform, which I know fairly well. I know we can do distinctions based on file content, but even then I'd have to check what happens if one plugin tries to take an extension without looking at the content, while another wants to look at the content (e.g. does every plugin have to play ball, or is there precedence given to content-based checks).
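
As a small illustration of the "something after the final `.`" convention, here is how Python's standard `pathlib` treats single and two-part extensions. The file names are invented, and this says nothing about how any particular editor or the IDEA platform actually behaves.

```python
from pathlib import PurePosixPath


def final_extension(filename):
    """The conventional extension: everything from the last dot onward."""
    return PurePosixPath(filename).suffix


def all_extensions(filename):
    """Every dot-separated suffix, which a two-part scheme would need to inspect."""
    return PurePosixPath(filename).suffixes


# Invented file names, for illustration only.
single = final_extension("t/basic.rakutest")
two_part_final = final_extension("t/basic.t.rk")
two_part_all = all_extensions("t/basic.t.rk")
```

A tool that only looks at the final suffix sees `.rakutest` in the first case but just `.rk` in the second; telling a test from a script under the two-part scheme requires the extra step of looking at `[".t", ".rk"]`.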

Comma has it good compared to the situation for, I believe, Perl 6 support in every other editor/IDE. I have some budget/team to work on it; we'll maybe not be able to do some other feature we'd hoped to work on quite so soon, but the required adaptations can be done on a relatively short time schedule. Every other editor/IDE plugin is volunteer-driven, and it's been noted that there are few volunteers. Do we really want to make this any harder than it needs to be?

What I'm not ok with is another discussion on this subject as if we haven't had enough of #89, #101, and #106.

Indeed, and - unfairly or not - I find @AlexDaniel more unreasonable with every post I read. That probably means I need some days away from this topic. I'll catch up with it again next week. (I've unsubscribed. Please respect that by not @-ing me in the meantime.)

@AlexDaniel
Member Author

AlexDaniel commented Sep 26, 2019

I find @AlexDaniel more unreasonable with every post I read

You've got to be kidding me. People tell me about some mythical issues in editors, I'm the one who investigates and looks at what these editors do, and it turns out that there are no issues, or at least the situation is definitely not as extreme as people want to depict it. Then you tell me that it's unreasonable to expect people to use shebangs, so I go and gather some stats that show the opposite. Then I wonder if there's any other project that has extensions as long as ours will be, and nobody has a clue, so I go and gather more real stats and find such an example (which doesn't really support my position, right, but whatever, I'm the one unreasonable!)

Alright, if my posts are unreasonable, then please people, bring some reason into this issue.

Edit: It has been pointed out that I misunderstood the last comment, and I did.

@vrurg
Contributor

vrurg commented Sep 26, 2019

This will likely be my last comment on this issue. I will try to make things as clear as possible. First, we have to define a clear target, or otherwise the discussion will never end and no conclusion will be reached.

The whole renaming thing is about gaining a wider user base. #81 is dedicated to this subject. I'm thinking of this target when I consider any problem related to renaming. This thread is no exception. When considering the target one must remember that enthusiasts are a minority of the programming community these days. It's not like the days before the dotcom boom anymore. This is why I'm trying to think in a user-centric way. In other words, I think of decent juniors and how they make decisions. And, perhaps, this is the difference in approaching the problem. I know that a problem exists and I'm trying to figure out how it would impact an average guy trying a new language. These are not "mythical issues" but my attempt to model a situation. My model could be right or it could be wrong. But because I believe it is right, I can't stand aside and have to try to show the situation from this point of view.

Look, I'm not saying your statistics are wrong. I just don't understand why the fact that nobody did something before should convince anybody that it shouldn't be done. After all, few, if any, were developing a multi-paradigm language. Nobody was doing grammars the way Perl6 does them. Heck, nobody considered regexes seriously before Perl!

Or the editors. I know there are many of them. But then, after all, I'm almost like one of those users I mentioned above. My days of puppy-like happiness about configuring another application for my needs are basically over. I'm in love with Apple products because they just work(tm). When I got tired of zsh and decided to switch to fish, it took me about a month to accumulate enough bravery to tune it for my needs. I'm currently happy about all shell configs being synced over a git repo, but I regret the time I lost implementing this.

What would you expect from those who are just making money at their day job? Those who consider a new language for basically three reasons: the boss said so; it could pay back in wages; or because in a competitive environment one must always learn. None of the three is about "it's a fantastic language, I fell in love with it!", all are pragmatic. Would you ask them to change the tooling they're used to before starting with Perl6/Raku? The only case where this would work is the first one. And even then they'd be likely to quietly sabotage it.

BTW, this is a common problem of many professional communities: they tend to enclose themselves in their own informational bubble and don't see things from the outside.

To conclude, in my view "being unreasonable" doesn't mean you can't reason your point. Rather, it means that you allow a minor problem to stop us from resolving the big one. When I started #106 I expected a clean and fast discussion coming to a concrete conclusion. When things went the other way I came up with the voting idea exactly because this is what people do when consensus is too hard to achieve. The number of votes turns out to be rather small and unrepresentative? Well, those who chose to opt out of voting chose to live with decisions made by others. That's not always bad, as some matters aren't worth bothering about. In this case the most active part of the community voiced their preference. It's not in favor of your choice? Actually, as you can see, it's not what I'd like to see either. But this is the common problem of democracy and consensus: not everybody likes the outcomes.

So, Alex, I would like to ask you to pull back your RFC on #89 and let it all be over. After all, nobody can guarantee that voting will be in favor of the PR. Perhaps this whole discussion doesn't make sense because there will be no renaming. And if it happens there will be a grace period afterwards when it would not be too late to initiate another vote, give it more time, and see the outcome.

That's all I wanted to say on this.

@AlexDaniel
Member Author

@vrurg if you want this to go faster, then please start tweaking the PR. First step is easy, you remove the sentence that says that .t will be deprecated. Then it gets a bit harder, because you'll likely need to define what's going to happen with .t. I don't understand jnthn's position, is it now that both .rakutest and .t will be supported at the same time?

Anyway, see my review for some hints on what else should be tweaked.

@AlexDaniel
Member Author

See #89 (review).
