Aggregation results vary based on duplicate relationships between nodes #6005

Open
MacondoExpress opened this issue Feb 14, 2025 · 1 comment
Labels: bug (Something isn't working), confirmed (Confirmed bug)

MacondoExpress commented Feb 14, 2025

Describe the bug
Aggregations over related nodes currently depend on how many relationships exist between the same two nodes. If a node is connected to its parent through several relationships, its values are included once per relationship, so aggregates such as the sum of ages count that node multiple times. Results are therefore inconsistent whenever duplicate relationships exist. The expected outcome is that each unique related node is counted only once, regardless of how many relationships link the two nodes.

Type definitions

type Movie @node {
  title: String
  actors: [Actor!]!
    @relationship(type: "ACTED_IN", direction: IN, properties: "ActedIn")
}

type Actor @node {
  name: String
  age: Int
  born: DateTime
  movies: [Movie!]!
    @relationship(type: "ACTED_IN", direction: OUT, properties: "ActedIn")
}

type ActedIn @relationshipProperties {
  screentime: Int
  character: String
}

To Reproduce

Populate the DB with the following data.

Data

CREATE (m:Movie { title: "Terminator"})
CREATE (m)<-[:ACTED_IN { screentime: 60, character: "Terminator" }]-(arnold:Actor { name: "Arnold", age: 54, born: datetime('1980-07-02')})
CREATE (m)<-[:ACTED_IN { screentime: 120, character: "Sarah" }]-(:Actor {name: "Linda", age:37, born: datetime('2000-02-02')})
CREATE (m)<-[:ACTED_IN { screentime: 10, character: "Future Terminator" }]-(arnold)

Run the following query:

query {
  movies {
    actorsAggregate {
      node {
        age {
          sum
        }
      }
    }
  }
}

Expected behavior
Each unique actor connected to a movie should contribute only once to the aggregated sum of ages. In this scenario, Arnold has two separate ACTED_IN relationships to the movie (one per character), but his age should still be summed once alongside Linda's. The expected sum is therefore 91 (54 for Arnold + 37 for Linda). The current behavior counts Arnold twice and returns 145 (54 + 54 + 37).

Current output

{
  "data": {
    "movies": [
      {
        "actorsAggregate": {
          "node": {
            "age": {
              "sum": 145
            }
          }
        }
      }
    ]
  }
}
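
The arithmetic behind the two values can be checked with a small in-memory model. This is only an illustrative Python sketch, not the library's code: the `rows` list is a hypothetical stand-in for the rows matched by the relationship pattern (one per ACTED_IN relationship), filled with the data from the reproduction steps.

```python
# Hypothetical stand-in for the rows the pattern match yields:
# one (actor name, age) row per ACTED_IN relationship, not per actor.
rows = [
    ("Arnold", 54),  # character "Terminator"
    ("Linda", 37),   # character "Sarah"
    ("Arnold", 54),  # character "Future Terminator"
]

# Summing per relationship row reproduces the current output.
sum_per_relationship = sum(age for _, age in rows)

# Keeping one entry per actor first gives the expected result.
# Assumption: the name identifies the actor in this toy data; a real fix
# would deduplicate on node identity instead.
unique_actors = dict(rows)
sum_per_actor = sum(unique_actors.values())

print(sum_per_relationship)  # 145
print(sum_per_actor)         # 91
```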

Additional information

The generated Cypher does not apply DISTINCT when computing the sum:

MATCH (this:Movie)
CALL {
    WITH this
    MATCH (this)<-[this0:ACTED_IN]-(this1:Actor)
    RETURN { sum: sum(this1.age) } AS var2
}
RETURN this { actorsAggregate: { node: { age: var2 } } } AS this

This issue is not specific to sum; count is affected as well:

query {
  movies {
    actorsAggregate {
      count
    }
  }
}

The above will count 3 actors rather than 2.
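
The count discrepancy can be modelled the same way. The Python sketch below is illustrative only and is not the library's implementation: the integer actor ids are hypothetical stand-ins for node identity, and the third actor in the second half is invented purely to show a pitfall. It suggests that deduplication probably needs to happen on the matched nodes, since removing duplicate values alone would also merge two different actors who happen to share an age.

```python
# Hypothetical rows, one per ACTED_IN relationship: (actor id, age).
# Ids 1 and 2 mirror Arnold and Linda from the reproduction data.
rows = [(1, 54), (2, 37), (1, 54)]

count_per_relationship = len(rows)                          # current behaviour
count_per_actor = len({actor_id for actor_id, _ in rows})   # expected behaviour

print(count_per_relationship)  # 3
print(count_per_actor)         # 2

# Pitfall, using an invented third actor (id 3) who is also 37:
# deduplicating on the age value merges two different actors.
rows_same_age = rows + [(3, 37)]
sum_distinct_values = sum({age for _, age in rows_same_age})
sum_distinct_actors = sum(dict(rows_same_age).values())

print(sum_distinct_values)  # 91, one actor's age is lost
print(sum_distinct_actors)  # 128, each actor counted once
```

In Cypher terms, this is the difference between applying DISTINCT to the aggregated expression, which removes duplicate values, and deduplicating the matched nodes before aggregating.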

MacondoExpress added the bug and confirmed labels Feb 14, 2025

neo4j-team-graphql (Collaborator) commented

We've been able to confirm this bug using the steps to reproduce that you provided - many thanks @MacondoExpress! 🙏 We will now prioritise the bug and address it appropriately.
