
Make calculating of Map Analysis Faster and less resource intensive #587

Open
northdpole opened this issue Feb 22, 2025 · 3 comments
Labels
enhancement (New feature or request) · GSOC (this feature is a potential Google Summer of Code candidate) · help wanted (Extra attention is needed)

Comments

@northdpole
Collaborator

Currently, gap analysis requires 64 GB of RAM on a large server with an external Neo4j cluster and an external Redis instance in order to calculate ~10 GB worth of graph shortest paths.

This crashes most commercial laptops and takes more than 24 hours on a GCP medium machine.

There are many micro-optimizations we can do to make the gap analysis faster and less resource intensive, such as:

  • Preload only the relevant subgraphs in Neo4j
  • Re-use precalculated paths
  • Optimize the Cypher queries and the Redis usage
  • For any pair of standards, avoid calculating a path between every node of standard A and every node of standard B
  • Experiment with cutting out the standards and calculating the gap analysis between the relevant CREs; since we only have ~400 CREs, this should be much faster than calculating gaps between thousands of standard nodes
  • Optimize the Python code to avoid repeated memory accesses
  • Reduce the amount of information reported in the final result
  • etc.
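To make the "re-use precalculated paths" and CRE-centric ideas concrete, here is a minimal pure-Python sketch: a memoised BFS shortest-path lookup over a toy adjacency list. The graph data and node names are hypothetical stand-ins (the real graph lives in Neo4j); the point is only that repeated path queries become cache hits instead of recomputation.

```python
from collections import deque
from functools import lru_cache

# Toy adjacency list standing in for the CRE graph (hypothetical data;
# the real graph lives in Neo4j and is far larger).
GRAPH = {
    "CRE-1": ["CRE-2", "CRE-3"],
    "CRE-2": ["CRE-4"],
    "CRE-3": ["CRE-4"],
    "CRE-4": [],
}

@lru_cache(maxsize=None)
def shortest_path_len(src, dst):
    """BFS shortest-path length; memoised so repeated queries are free."""
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nxt in GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # no path between src and dst

print(shortest_path_len("CRE-1", "CRE-4"))  # 2
```

In the real service the cache would live in Redis (keyed on the node pair) rather than in-process, so results survive restarts and are shared across workers.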
@northdpole added the enhancement, GSOC, and help wanted labels on Feb 22, 2025
@Hardik301002

Hi @northdpole,
Here are some potential optimizations to improve efficiency and reduce resource usage:

  1. Preload Relevant Subgraphs – Load only necessary subgraphs in Neo4j instead of the entire graph to reduce memory usage.
  2. Reuse Precomputed Paths – Implement caching to store and reuse previously calculated paths, avoiding redundant computations.
  3. Optimize Cypher Queries & Redis Usage – Improve query efficiency and ensure Redis is used effectively to reduce latency.
  4. Limit Pairwise Node Calculations – Focus only on critical node pairs instead of computing paths between all nodes.

Let me know if you'd like me to explore any of these solutions in more detail!

@northdpole
Collaborator Author

Hey @Hardik301002, I think I've tried several of these, but I'm by no means an expert in graph DBs; if you want, you can take a stab at it.

Btw, if you used an LLM to get these suggestions, some details:

  1. The query is matching subgraphs already -- if you know Cypher in any depth, perhaps you can optimise this further.
  2. The whole gap analysis is cached once calculated; I'm not sure how to cache precomputed paths in Neo4j.
  3. This is too vague for me to give you more pointers.
  4. I think this is the same as 1.
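On point 4, a quick back-of-the-envelope calculation shows why the CRE-centric approach from the issue description matters. The node counts below are made up for illustration (the issue only says ~400 CREs and thousands of standard nodes):

```python
# Hypothetical sizes: ~400 CREs per the issue; the standard-node
# counts are invented for illustration.
nodes_in_standard_a = 2_000
nodes_in_standard_b = 3_000
cres = 400

# Naive approach: one shortest-path computation per cross-standard pair.
naive_pairs = nodes_in_standard_a * nodes_in_standard_b  # 6,000,000

# CRE-centric approach: unordered pairs among the CREs only.
cre_pairs = cres * (cres - 1) // 2  # 79,800

print(naive_pairs // cre_pairs)  # ~75x fewer path computations
```

The exact ratio depends on the real node counts, but any per-standard node total in the thousands makes the CRE-to-CRE pair space orders of magnitude smaller.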

@Hardik301002

Hardik301002 commented Mar 16, 2025

Please assign this issue to me; I want to work on it further under your guidance.
