This repo contains the data and code for the paper submitted to ACM TKDD, titled "A Compact Vulnerability Knowledge Graph for Risk Assessment".
VulKG
├── README.md
├── import
│ ├── AffectsAddProperty.csv
│ ├── DomainNodes_Vulnerabiliy_HAS_REFERENCE_Domain_relationship.csv
│ ├── ExploitNodes.csv
│ ├── ProductNodes_VendorNodes_Vulnerability_AFFECTS_Product_BELONGS_TO_Vendor.csv
│ ├── VulnerabilityNodes.csv
│ ├── VulnerabilityNodesAddProperties.csv
│ ├── Vulnerabiliy_HAS_EXPLOIT_Exploit_relationship.csv
│ └── WeaknessNodes.csv
├── DescriptionEmbedding
│ └── VulnerabilityNodesTextEmbedding20.pkl
├── 2.VulKG_Deployment_Cypher.cypher
├── 3.3.VCBD_link_prediction.py
├── 3.4.1.Plot_F1_score_comparison.py
├── 3.5.1.select_co_exploitation_links_in_training_and_test_sets.py
├── 3.5.2.CO_AFFECT_subgraph_and_topological_feature_extraction.py
└── 3.5.3.non_topological_feature_extraction.py
The import folder contains all original data for VulKG deployment.
The DescriptionEmbedding folder contains 20-dimensional features extracted from vulnerability descriptions with a pre-trained BERT model. This file is used by 3.5.3.non_topological_feature_extraction.py.
The file 2.VulKG_Deployment_Cypher.cypher contains the Cypher code for deploying VulKG on the Neo4j graph database platform, as described in Section 2.
Files 3.3 to 3.5.3 contain the Python code for the use case, Vulnerability Co-Exploitation Behaviour Discovery (VCBD), on VulKG.
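The internal layout of VulnerabilityNodesTextEmbedding20.pkl is not documented here, so it can help to look at the file before using it. Below is a minimal sketch that loads a pickle file and reports the type and size of its top-level object. The helper name describe_pickle is hypothetical and not part of this repo. If the file stores a NumPy or pandas object, that library must be installed for loading to succeed.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load a pickle file and return a short description of its top-level object."""
    # Only unpickle files you trust: pickle can execute code while loading.
    with open(path, "rb") as handle:
        obj = pickle.load(handle)
    info = {"type": type(obj).__name__}
    if hasattr(obj, "shape"):      # e.g. a NumPy array or pandas DataFrame
        info["shape"] = tuple(obj.shape)
    elif hasattr(obj, "__len__"):  # e.g. a dict or list
        info["length"] = len(obj)
    return info


if __name__ == "__main__":
    # Path taken from the repository listing; run from the repository root.
    path = Path("DescriptionEmbedding") / "VulnerabilityNodesTextEmbedding20.pkl"
    if path.is_file():
        print(describe_pickle(path))
    else:
        print(f"{path} not found; run this from the repository root.")
```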
This section describes how to deploy VulKG on the Neo4j graph database platform.
Neo4j Desktop
- Download and install Neo4j Desktop 1.4.15 or later.
- Create a project named VulKG Project with Neo4j Desktop.
- Add a local DBMS named Graph DBMS with Neo4j Desktop and set the password to Neo4j. Choose version 4.4.11 for Graph DBMS.
- Start Graph DBMS.
- Install the APOC and Graph Data Science Library plugins for Graph DBMS with Neo4j Desktop.
- Open the settings file of Graph DBMS and add the following line:
apoc.import.file.enabled=true
This avoids the following error:
Failed to invoke procedure `apoc.periodic.iterate`: Caused by:
java.lang.RuntimeException: Import from files not enabled, please set
apoc.import.file.enabled=true in your apoc.conf
- Copy all files from the import folder of this repo into the import folder of Graph DBMS.
- Open Graph DBMS with Neo4j Browser. Neo4j Browser ships with Neo4j Desktop, so no separate installation is required.
- In the Neo4j Browser settings, check Enable multi statement query editor so that multiple Cypher statements separated by semicolons (;) can be run together.
- Run the Cypher statements in 2.VulKG_Deployment_Cypher.cypher with Neo4j Browser to deploy VulKG.
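Before running the Cypher statements, it may be worth confirming that every CSV file reached the import folder of Graph DBMS. The following is a minimal sketch of such a check. The file names are copied from the repository listing, while the function name missing_import_files and the example path are hypothetical and must be adapted to your installation.

```python
from pathlib import Path

# File names copied from the import folder listing in this README.
EXPECTED_FILES = [
    "AffectsAddProperty.csv",
    "DomainNodes_Vulnerabiliy_HAS_REFERENCE_Domain_relationship.csv",
    "ExploitNodes.csv",
    "ProductNodes_VendorNodes_Vulnerability_AFFECTS_Product_BELONGS_TO_Vendor.csv",
    "VulnerabilityNodes.csv",
    "VulnerabilityNodesAddProperties.csv",
    "Vulnerabiliy_HAS_EXPLOIT_Exploit_relationship.csv",
    "WeaknessNodes.csv",
]


def missing_import_files(import_dir):
    """Return the expected CSV files that are not present in import_dir."""
    folder = Path(import_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Hypothetical path: replace with the import folder of your Graph DBMS.
    dbms_import_dir = "path/to/Graph DBMS/import"
    missing = missing_import_files(dbms_import_dir)
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected CSV files are in place.")
```

An empty result means all eight files are present. The check does not validate their contents.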
This section describes how to run the use case, Vulnerability Co-Exploitation Behaviour Discovery (VCBD), on VulKG.
Python and Cypher
numpy==1.22.4 (this version requires Python 3.9 or 3.10)
scikit-learn==1.1.1
matplotlib==3.5.2
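To see whether the current environment matches these pins, something like the following sketch could be used. It relies only on the standard-library importlib.metadata module. The helper names installed_version and compare_pins are hypothetical and not part of this repo.

```python
from importlib import metadata

# Pins copied from the list above.
PINS = {
    "numpy": "1.22.4",
    "scikit-learn": "1.1.1",
    "matplotlib": "3.5.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_pins(pins):
    """Map each package to a tuple (pinned, installed, matches)."""
    report = {}
    for package, pinned in pins.items():
        found = installed_version(package)
        report[package] = (pinned, found, found == pinned)
    return report


if __name__ == "__main__":
    for package, (pinned, found, ok) in compare_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: pinned {pinned}, installed {found} [{status}]")
```

A mismatch does not necessarily mean the scripts will fail. It only flags a difference from the versions listed here.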
Data is provided in folder GD_VCBD_Ready_to_go. The generation process of this subgraph dataset is described in Section 3.5.
Run 3.3.VCBD_link_prediction.py to reproduce the results reported in Table 7 and Table 8.
Run 3.4.1.Plot_F1_score_comparison.py to reproduce the visualization reported in Fig. 4.
Run 3.4.2.Plot_ROC.py to reproduce the visualization reported in Fig. 5.
This subsection describes how to generate the raw and ready-to-go versions of the graph datasets for the VCBD task, which are provided in the folders GD_VCBD_Raw and GD_VCBD_Ready_to_go. It is intended for readers who want the details of how the subgraph datasets are extracted from VulKG.
Python and Cypher
py2neo==2021.2.3 (pip does not offer this version; 2021.2.4 is available)
pandas==1.4.2
numpy==1.22.4
- Open Neo4j Desktop and start Graph DBMS.
- Open the settings file of Graph DBMS, then find and change the memory settings as follows:
dbms.memory.heap.initial_size=4G
dbms.memory.heap.max_size=4G
This avoids the following error:
py2neo.errors.ClientError: [Procedure.ProcedureCallFailed] Failed to invoke procedure
`gds.graph.project.cypher`: Caused by: java.lang.OutOfMemoryError: Java heap space
- Run 3.5.1.select_co_exploitation_links_in_training_and_test_sets.py to construct the link head-tail pairs in the training and test sets.
- Run 3.5.2.CO_AFFECT_subgraph_and_topological_feature_extraction.py to extract the CO_AFFECT subgraph and topological features.
- Run 3.5.3.non_topological_feature_extraction.py to extract non-topological features.
Once done, the generated GD-VCBD subgraph datasets will be saved in the corresponding folders, GD_VCBD_Raw and GD_VCBD_Ready_to_go.
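The three extraction scripts depend on one another's output, so they need to run in the order given. One way to do that is sketched below, stopping at the first failure. The script names are copied from the repository listing. The helper names build_commands and run_in_order are hypothetical, and the sketch assumes that the working directory is the repository root and that Graph DBMS is already running.

```python
import subprocess
import sys
from pathlib import Path

# Script names copied from the repository listing, in the order of the steps above.
SCRIPTS = [
    "3.5.1.select_co_exploitation_links_in_training_and_test_sets.py",
    "3.5.2.CO_AFFECT_subgraph_and_topological_feature_extraction.py",
    "3.5.3.non_topological_feature_extraction.py",
]


def build_commands(scripts, python=sys.executable):
    """Return one command (argument list) per script, preserving order."""
    return [[python, script] for script in scripts]


def run_in_order(commands):
    """Run each command in turn; check=True raises on the first failure."""
    for command in commands:
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)


if __name__ == "__main__":
    missing = [script for script in SCRIPTS if not Path(script).is_file()]
    if missing:
        print("Run this from the repository root; missing:", missing)
    else:
        run_in_order(build_commands(SCRIPTS))
```

Because subprocess.run is called with check=True, a script that exits with an error raises subprocess.CalledProcessError and the later scripts are not started.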