This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

[BUG] Multi index groupby with sort=True inconsistent with pandas #37

Open
ChengjieLi28 opened this issue Oct 8, 2022 · 0 comments

Describe the bug

A groupby on multiple keys with sort=True returns a result whose MultiIndex is not sorted by the group keys, which is inconsistent with pandas.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.9.12
  2. The version of Mars you use: latest
  3. Versions of crucial packages, such as numpy, scipy and pandas: follow mars
  4. Full stack of the error.
    No error.
  5. Minimized code to reproduce the error.
In [8]: mdf.groupby(["b", "c"], sort=True)["a"].agg(['nunique', 'mean']).execute()
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0/100 [00:00<00:00, 1064.07it/s]
Out[8]:
     nunique      mean
b c
a d        7  3.636364
b b        7  4.000000
c d        6  5.000000
d b        3  5.000000
a a        5  5.166667
  c        2  4.000000
c a        6  4.285714
  c        4  5.166667
a b        5  2.857143
b d        2  3.500000
c b        5  5.300000
d a        1  7.000000
  c        5  4.125000
b a        5  2.666667
  c        3  6.666667
d d        8  4.230769

In [9]: df.groupby(["b", "c"], sort=True)["a"].agg(['nunique', 'mean'])
Out[9]:
     nunique      mean
b c
a a        5  5.166667
  b        5  2.857143
  c        2  4.000000
  d        7  3.636364
b a        5  2.666667
  b        7  4.000000
  c        3  6.666667
  d        2  3.500000
c a        6  4.285714
  b        5  5.300000
  c        4  5.166667
  d        6  5.000000
d a        1  7.000000
  b        3  5.000000
  c        5  4.125000
  d        8  4.230769

Expected behavior

The Mars result should be sorted by the group keys (b, c), matching the pandas output in Out[9].

Additional context

Sorting the fetched result explicitly gives the same result as pandas:

mdf.groupby(["b", "c"], sort=True)["a"].agg(['nunique', 'mean']).execute().fetch().sort_index()
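
The expected ordering and the sort_index() workaround can be sketched with pandas alone. This is only an illustration under assumptions: the issue does not show how df was built, so the random data below is hypothetical, and a pandas groupby with sort=False stands in for the unsorted Mars output rather than calling Mars itself.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the issue does not include the construction of df.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 10, size=100),
    "b": rng.choice(list("abcd"), size=100),
    "c": rng.choice(list("abcd"), size=100),
})

# pandas behaviour with sort=True: the MultiIndex is ordered by (b, c).
expected = df.groupby(["b", "c"], sort=True)["a"].agg(["nunique", "mean"])
keys_sorted = expected.index.is_monotonic_increasing

# Stand-in for an unsorted result (assumption: this is not the Mars code path).
unsorted = df.groupby(["b", "c"], sort=False)["a"].agg(["nunique", "mean"])

# Workaround from the issue: sort the index after the fact.
workaround = unsorted.sort_index()

# Raises AssertionError if the workaround does not match the sorted result.
pd.testing.assert_frame_equal(workaround, expected)
```

A regression test for the Mars fix could take the same shape, comparing the fetched Mars result against the pandas result without calling sort_index() first.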
