Data wide/long reshape functions #142
Comments
Sounds interesting? I’d love to see a sketch of what these might look like. Perhaps some combination of array.flatMap and d3.group?
Using tidyr's pivoting examples I started putting together some ideas here: https://github.com/mhkeller/pivoting. I think the most readable is 2b but that's an older style.
I’ve dropped 1 and 2a into Observable notebooks for easy tinkering: https://observablehq.com/d/41bc065377cb7e36
If I were to write the relig_income example in vanilla JavaScript, I’d probably use array.flatMap like so:

    data.columns.slice(1).flatMap(income => data.map(({religion, [income]: count}) => ({religion, income, count})))

Here’s another take on your first pivot function:

    function pivot(data, columns, name, value) {
      const keep = data.columns.filter(c => !columns.includes(c));
      return data.flatMap(d => {
        const base = keep.map(k => [k, d[k]]);
        return columns.map(c => {
          return Object.fromEntries([
            ...base,
            [name, c],
            [value, d[c]]
          ]);
        });
      });
    }

I haven’t evaluated the performance of any approach yet. https://observablehq.com/d/3ea8d446f5ba96fe

Not directly related to this issue, but I’m also interested in making columnar data easier to use in JavaScript, since that should offer better performance. A column-oriented data structure is typically what I think of as a “data frame”.
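To make the pivot function above easier to try outside a notebook, here is a small usage sketch. The sample rows are shaped like tidyr's relig_income but are illustrative only, and the `columns` property is set by hand because the function reads `data.columns`, which d3.csvParse would normally attach.

```javascript
// Illustrative rows in the shape of relig_income (two income brackets only).
const wide = [
  {religion: "Agnostic", "<$10k": 27, "$10-20k": 34},
  {religion: "Atheist", "<$10k": 12, "$10-20k": 27}
];
// d3.csvParse attaches a `columns` array to its result; here it is set manually.
wide.columns = ["religion", "<$10k", "$10-20k"];

// Same logic as the pivot function quoted in the thread.
function pivot(data, columns, name, value) {
  // Columns that are not pivoted are copied onto every output row.
  const keep = data.columns.filter(c => !columns.includes(c));
  return data.flatMap(d => {
    const base = keep.map(k => [k, d[k]]);
    // Emit one output row per pivoted column.
    return columns.map(c => Object.fromEntries([...base, [name, c], [value, d[c]]]));
  });
}

const long = pivot(wide, ["<$10k", "$10-20k"], "income", "count");
console.log(long);
```

Note that this version emits rows in input-row order (all brackets for one religion, then the next), whereas the flatMap one-liner iterates columns first, so the two produce the same rows in a different order.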
Very neat destructuring in the vanilla js example. The question I have that came up working through number two was 'What's the best API to handle transform arguments?' What I did with nested arrays I thought was a bit unwieldy and I hadn't yet gotten to implementing all of the features, such as An alternative would be to limit the scope of this function and say it doesn't handle column name cleaning or casting (although I could see something like
For large datasets, maybe going through the data multiple times is a pain, performance-wise? For the casual user, it can be nice just having one data transformation step, for sure. I think my preference would be that if there's a manageable API, it would be handy to do these transformations within
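One possible shape for a single-pass API that also handles name cleaning and casting is sketched below. This is purely illustrative: `pivotLonger` and the option names `nameTransform` and `valueTransform` are hypothetical and do not exist in d3-array, and the sample row is made up.

```javascript
// Hypothetical API sketch; none of these names are part of d3-array.
function pivotLonger(data, columns, {
  name = "name",
  value = "value",
  nameTransform = d => d,  // e.g. strip a prefix from the column name
  valueTransform = d => d  // e.g. cast strings to numbers
} = {}) {
  const pivoted = new Set(columns);
  return data.flatMap(row => {
    // Keep every field that is not being pivoted.
    const base = Object.entries(row).filter(([key]) => !pivoted.has(key));
    // Reshape, rename, and cast in the same pass over the data.
    return columns.map(column => Object.fromEntries([
      ...base,
      [name, nameTransform(column)],
      [value, valueTransform(row[column])]
    ]));
  });
}

// Made-up row: week columns hold string values, as they would after CSV parsing.
const rows = [{id: "a", wk1: "3", wk2: "5"}];
const result = pivotLonger(rows, ["wk1", "wk2"], {
  name: "week",
  value: "rank",
  nameTransform: c => +c.replace("wk", ""),
  valueTransform: Number
});
console.log(result);
```

Passing the transforms as plain callbacks in an options object avoids the nested-array arguments mentioned above, at the cost of leaving any pattern matching on column names to the caller.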
I made pivot 1 as a generator for your amusement https://observablehq.com/d/ac2a320cf2b0adc4
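For readers who cannot open the notebook, a generator-based pivot might look roughly like the sketch below. This is not the notebook's code: `pivotLazy` is a hypothetical name, and the sample row and its `low`/`high` columns are made up.

```javascript
// Sketch of a lazy wide-to-long pivot; rows are produced only when requested.
function* pivotLazy(data, columns, name, value) {
  const pivoted = new Set(columns);
  for (const row of data) {
    // Fields that are not pivoted are repeated on every yielded row.
    const base = Object.entries(row).filter(([key]) => !pivoted.has(key));
    for (const column of columns) {
      yield Object.fromEntries([...base, [name, column], [value, row[column]]]);
    }
  }
}

// Made-up input row for illustration.
const source = [{religion: "Agnostic", low: 27, high: 34}];
const iterator = pivotLazy(source, ["low", "high"], "income", "count");

const first = iterator.next().value; // pull a single row
const rest = [...iterator];          // drain whatever remains
console.log(first, rest);
```

Because nothing is materialized until the iterator is consumed, this form can avoid building the full long array when only part of the output is needed.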
An example here: https://observablehq.com/@didoesdigital/16-july-2020-data-wrangling-for-population-pyramids. In that case the "best" strategy, it seems, is a flatMap: https://observablehq.com/@didoesdigital/16-july-2020-data-wrangling-for-population-pyramids#pyramid
Regarding the inverse operation (long to wide), is there a more elegant alternative to using d3.groups with array.reduce, as below?

    d3.groups(data, d => d.religion)
      .map(([religion, x]) => {
        return {
          religion,
          ...x.reduce((acc, { income, count }) => {
            acc[income] = count;
            return acc;
          }, {}),
        };
      });
@nachocab Can you enable link sharing on the notebook so we can see?
Here’s another take on the inverse operation, replacing array.map with Array.from, and replacing array.reduce with Object.fromEntries:

    Array.from(
      d3.group(data, d => d.religion),
      ([religion, group]) => Object.fromEntries(
        [["religion", religion]].concat(
          group.map(d => [d.income, d.count])
        )
      )
    )
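The same long-to-wide idea can be generalized into a reusable helper. The sketch below is an assumption about what such a function could look like, not an existing d3 API: `pivotWider` is a hypothetical name, it uses a plain Map in place of d3.group so that it runs without d3, and the sample rows are illustrative.

```javascript
// Hypothetical helper: collapse long rows into one wide row per distinct key.
function pivotWider(data, key, name, value) {
  const rowsByKey = new Map();
  for (const d of data) {
    let row = rowsByKey.get(d[key]);
    // Start a new wide row the first time a key value is seen.
    if (!row) rowsByKey.set(d[key], row = {[key]: d[key]});
    // The `name` field supplies the new column name, `value` its content.
    row[d[name]] = d[value];
  }
  // A Map iterates in insertion order, so output follows first appearance.
  return Array.from(rowsByKey.values());
}

// Illustrative long-format rows.
const longData = [
  {religion: "Agnostic", income: "<$10k", count: 27},
  {religion: "Agnostic", income: "$10-20k", count: 34},
  {religion: "Atheist", income: "<$10k", count: 12}
];
const wideData = pivotWider(longData, "religion", "income", "count");
console.log(wideData);
```

Combinations that never occur in the input (here, Atheist with "$10-20k") are simply absent from the corresponding wide row; a fuller implementation would probably need an option for a fill value.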
@mbostock That's beautiful! Thank you for helping me understand those functions more deeply and for pointing out the link sharing bit. I'll remember it for next time. 👍
Just moved to data visualisation and realised I'm a noob with respect to data manipulation... My conversion from SQLite-based normalised long data to wide was, uhhh... less than optimal (to put it mildly) :-( Built-in long/wide reshape functions would be very welcome. Btw, thanks for this incredible library!
What are your thoughts on adding data reshape functions similar to the melt and wide_to_long functions in pandas or pivoting, gather and spread in the tidyverse?
It's a very common pattern when loading data for charts, such as in the multiline example. I find myself frequently writing these reshape functions in each project and they're often some of the least literate parts of my code. They're especially distracting when trying to teach people chart concepts and they hit a big speed bump right off the bat.
Anyway, it would be a great addition to the JavaScript world. If there are other packages that have already done this that I missed, let me know. I've seen a few "let's rewrite pandas/dplyr in js" packages over the years but none ever gets completed, let alone maintained. Happy to be wrong, though, if someone has broken off these functions somewhere!