diff --git a/modules/ROOT/images/data-source-fields.png b/modules/ROOT/images/data-source-fields.png new file mode 100644 index 000000000..d42dd3197 Binary files /dev/null and b/modules/ROOT/images/data-source-fields.png differ diff --git a/modules/ROOT/images/data-sources-interaction.png b/modules/ROOT/images/data-sources-interaction.png new file mode 100644 index 000000000..5ed384d8f Binary files /dev/null and b/modules/ROOT/images/data-sources-interaction.png differ diff --git a/modules/ROOT/images/sources.png b/modules/ROOT/images/sources.png index b401e3460..90e7300dd 100644 Binary files a/modules/ROOT/images/sources.png and b/modules/ROOT/images/sources.png differ diff --git a/modules/ROOT/pages/import/file-provision.adoc b/modules/ROOT/pages/import/file-provision.adoc index 880585f76..f1e8be875 100644 --- a/modules/ROOT/pages/import/file-provision.adoc +++ b/modules/ROOT/pages/import/file-provision.adoc @@ -2,16 +2,64 @@ = Data provision :description: This section describes how to provide files for import. -You start by connecting your data source, which can be a relational database or local flat files. +The *Data sources* tab is where you can add new data sources and see sources you have already added. -In essence, you provide the data in some format to be imported and Import imports this into your instance. +You start by connecting a data source, which can be a relational database or a cloud data warehouse, or you can stream local flat files. + +In essence, you provide the data in some format to be imported and the Import service imports this into your instance. Import supports relational databases and flat files, i.e. files that contain data in a tabular format where each row represents a record and each column represents a field in that record. The most common format for flat files is CSV (comma-separated values), but Import also supports TSV (tab-separated values). 
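+As an illustration of the flat-file format described above, here is a minimal CSV file (the column names and values are hypothetical examples, not a required layout): + +[source,csv] +---- +id,name,city +1,Alice,Berlin +2,Bob,London +---- + +The first line is the header row that names each field; each following line is one record. +A TSV file follows the same structure, with tab characters as separators instead of commas. +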
-When connecting to a remote data source, the tables are provided for you from that database. +== Connecting to remote data sources + +When you use the *New data source* button, you are presented with the following options for remote sources: + +* PostgreSQL +* MySQL +* SQL Server +* Oracle +* Snowflake + +Regardless of which one you select, you are required to provide roughly the same information to allow the Import service to load the tables from your remote source. + +.Example data source +[.shadow] +image::data-source-fields.png[] + +First, you need to give the data source a name to identify it. + +Second, you need to complete various fields with information from your data source. +These differ depending on which source you are using. + +The *Host* field is the same for all sources: it is your database server's hostname or IP address, which can normally be found in your account details with the vendor in question. + +The *Port* is pre-populated for you and defines which network port is used to connect to your database server. -But when you stream your local CSV files, the process can be more iterative and manual. +*Database* or *Service* (for Oracle) is the name of the database that contains the tables you want to import to your Aura instance. + +The *Schema* of your tabular data is needed for Import to know how your tables relate to each other. +Note that this field is not included for MySQL. + +Additionally, for cloud data warehouses (Snowflake), you can optionally provide both a *Warehouse* name and a *Role*. +If no information is provided in these fields, the default values are used. + +Third, you need to provide user credentials for your data source. +These are the username and password used to access the remote data source, *not* your Aura credentials. + +Once you have entered all the required information, the Import service can connect to your remote source, fetch all the tables, and import them to your Aura instance. 
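+For example, the connection details for a PostgreSQL data source could look like the following (all values are hypothetical placeholders; use the details from your own database account): + +---- +Name:     my-postgres-source +Host:     db.example.com +Port:     5432 +Database: sales +Schema:   public +---- + +Here 5432 is the default PostgreSQL port, which is why the *Port* field can be pre-populated for you. +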
+ +When you have added a data source, you need to create a data model before you can import data. +See xref:import/modeling.adoc[] for more information. +Your added data sources are listed on the *Data sources* tab and you can interact with them via the *[...]* menu. + +.Interact with data source +[.shadow] +image::data-sources-interaction.png[] + +== Streaming local files + +When you stream your local CSV files, the process can be more iterative and manual. Import requires all CSV files to have a header row and at least one row of data. The header row specifies how the data in the file should be interpreted and contains information for each field. @@ -28,4 +76,7 @@ image::files.png[] When you provide CSV files to Import, only a reference to your local files is kept. This is used to send the necessary commands to stream the data to your target database when you run the import. Your local files themselves are *not* uploaded anywhere; therefore, if you reload the page before running the import, the files are no longer available to the page and you need to provide them again. -This is due to security features of your browser. \ No newline at end of file +This is due to security features of your browser. + +Once the tables or files are in place, you need to specify how they should be organized in relation to each other; in other words, you need to *model your data*. +See xref:import/modeling.adoc[] for more information. \ No newline at end of file diff --git a/modules/ROOT/pages/import/introduction.adoc b/modules/ROOT/pages/import/introduction.adoc index 8cf5872a4..bb81d8581 100644 --- a/modules/ROOT/pages/import/introduction.adoc +++ b/modules/ROOT/pages/import/introduction.adoc @@ -9,6 +9,7 @@ It allows you to import data without using any code from: * MySQL * SQL Server * Oracle +* Snowflake * .CSV This service is also available as a standalone tool for importing data into self-managed Neo4j instances. 
diff --git a/modules/ROOT/pages/import/mapping.adoc b/modules/ROOT/pages/import/mapping.adoc index 608c82825..c3640d52e 100644 --- a/modules/ROOT/pages/import/mapping.adoc +++ b/modules/ROOT/pages/import/mapping.adoc @@ -2,9 +2,13 @@ :description: This section describes how to map files to a data model. = Mapping -Mapping is the process of associating a file with an element in your data model. +Mapping is the process of associating a table or file with an element in your data model. This is what allows Import to construct the Cypher statements needed to load your data. -Your source files may contain data that is not relevant to your data model, and when you build your model, you can select what data to use. + +When you generate a data model, the mapping is largely done for you. +However, as you add elements to your model, whether you are creating a model manually or adding to a generated one, the new elements need to be mapped to their corresponding tables or files. + +Your data source may contain data that is not relevant to your data model, and when you build your model, you can select what data to use. Only data that is mapped correctly to the elements in your model is imported, so it is important to get the mapping right. [NOTE] diff --git a/modules/ROOT/pages/import/modeling.adoc b/modules/ROOT/pages/import/modeling.adoc index 130147eea..02b57837c 100644 --- a/modules/ROOT/pages/import/modeling.adoc +++ b/modules/ROOT/pages/import/modeling.adoc @@ -2,14 +2,26 @@ = Data modeling The data model is the blueprint for the database. -It defines how the data is organized and is the key to create a graph from your files. -The data model is what you map your files to. +It defines how the data is organized and is the key to creating a graph from your data. +The data model is what you map your tables or files to. It consists of nodes and relationships. Nodes need to have a _label_ and one or more _properties_. 
Relationships need to have a _type_ and one or more _properties_. -When you connect a remote data source, you are offered to have the model generated and mapped for you. -You can also upload a model, with or without data, in a _.json_ format. +You can add a new model and see your available models (if you have any) in the *Graph models* tab. +If you add a new model, you can either *Connect to a data source* to select a source from your list (see xref:import/file-provision.adoc[] for information about data sources), or you can drag and drop (or browse for) local flat files. + +Once you have defined your data source, you have three options to create your model: + +* *Define manually* - sketch your model and map your data +* *Generate from schema* - automatically define and map a model based on tables and constraints in your data +* *Generate with AI* - if you have enabled *Generative AI assistance* (in the xref:visual-tour/index.adoc#org-settings[Organization settings]), your model and mapping are automatically defined using AI, based on available metadata in your data source, to give you a more complete model. + +When you have a model generated, with or without AI, always review it and make sure the model and mapping are as expected. +The generated model is meant to be a starting point; you add elements to it as needed, since a relational database often does not contain sufficient information for the Import service to generate a complete model. +Depending on what metadata your data contains, you may get a more complete model if you generate with AI. + +If you are streaming local files, you can also upload a model, with or without data, in a _.json_ format via the *[...]* menu. But if you want full control of how your data is modeled, you can create the model manually. The data model panel is located in the center of the screen. 
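+To make the node and relationship concepts concrete, the following Cypher fragment sketches what a small model could correspond to in the resulting graph: a `Person` node connected to a `Movie` node by an `ACTED_IN` relationship. +The labels, type, and properties here are hypothetical examples; the Import service constructs the actual statements for you based on your model and mapping. + +[source,cypher] +---- +MERGE (p:Person {name: 'Alice'}) +MERGE (m:Movie {title: 'Example Film'}) +MERGE (p)-[:ACTED_IN {role: 'Lead'}]->(m) +---- +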
diff --git a/modules/ROOT/pages/import/quick-start.adoc b/modules/ROOT/pages/import/quick-start.adoc index ed0417832..0608bf39f 100644 --- a/modules/ROOT/pages/import/quick-start.adoc +++ b/modules/ROOT/pages/import/quick-start.adoc @@ -2,7 +2,9 @@ :description: This section gives an overview of the Import service. = Quick start -The Import service consists of three tabs; *Data sources*, *Graph models*, and *Import jobs*. +The Import service UI consists of three tabs: *Data sources*, *Graph models*, and *Import jobs*. +These reflect the three stages of importing data: providing the data (configuring a source to fetch the data from), modeling the data (defining how the data is organized), and finally running the import. + If you haven't previously imported any data, all three are empty; otherwise, sources, models, and import jobs are listed here. [.shadow] @@ -16,10 +18,13 @@ Import supports PostgreSQL, MySQL, SQL Server, as well as locally hosted flat fi [.shadow] image::sources.png[width=400] -For SQL-files, you need to configure the data source, add user credentials for the SQL-database, and give the data source a name. +For relational databases and cloud data warehouses, you need to give the data source a name, configure the data source, and add user credentials for the database account. +The data source configuration is essentially the same for both relational databases and data warehouses: you specify a *host* for your database, a *port* to connect to, the name of the *database*/*service*, and a *schema* that contains your tables (except for MySQL data sources). +See xref:import/file-provision.adoc[] for more information. + If you want to stream local files, you can drag and drop them into the data source panel or browse for them. -== Model the data +== Model and map the data When you have connected a data source, you have the option to have a model generated based on primary and foreign key constraints in the source database. 
The quickest way is to have a model generated for you, but you can also draw your own later; see xref:import/modeling.adoc[] for more information.