
Snowflake #675


Open
wants to merge 10 commits into base: console
Binary file added modules/ROOT/images/data-source-fields.png
Binary file added modules/ROOT/images/data-sources-interaction.png
Binary file modified modules/ROOT/images/sources.png
61 changes: 56 additions & 5 deletions modules/ROOT/pages/import/file-provision.adoc
@@ -2,16 +2,64 @@
= Data provision
:description: This section describes how to provide files for import.

You start by connecting your data source, which can be a relational database or local flat files.
The *Data sources* tab is where you can add new data sources and see sources you have already added.

In essence, you provide the data in some format to be imported and Import imports this into your instance.
You start by connecting a data source, which can be a relational database or a cloud data warehouse, or you can stream local flat files.

In essence, you provide the data in some format to be imported and the Import service imports this into your instance.

Import supports relational databases and flat files, i.e., files that contain data in a tabular format where each row represents a record and each column represents a field in that record.
The most common format for flat files is CSV (comma-separated values), but Import also supports TSV (tab-separated values).

When connecting to a remote data source, the tables are provided for you from that database.
== Connecting to remote data sources

When you use the *New data source* button, you are presented with the following options for remote sources:

* PostgreSQL
* MySQL
* SQL Server
* Oracle
* Snowflake

Regardless of which one you select, you provide roughly the same information so that the Import service can load the tables from your remote source.

.Example data source
[.shadow]
image::data-source-fields.png[]

First, you need to give the data source a name to identify it.

Second, you need to complete various fields with information from your data source.
These fields differ depending on which source you are using.

The *Host* field is the same for all sources: it is your database server's hostname or IP address, and it can normally be found in your account details with the vendor in question.

The *Port* is pre-populated for you and defines which network port is used to connect to your database server.

But when you stream your local CSV files, the process can be more iterative and manual.
*Database* or *Service* (for Oracle) is the name of the database that contains the tables you want to import to your Aura instance.

The *Schema* of your tabular data is needed for Import to know how your tables relate to each other.
Note that this field is not included for MySQL.

Additionally, for cloud data warehouses (Snowflake), you can optionally provide both a *Warehouse* name and a *Role*.
If no information is provided in these fields, the default values are used.

Third, you need to provide user credentials for your data source.
These are the username and password used to access the remote data source, *not* your Aura credentials.
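For illustration, a completed configuration for a hypothetical PostgreSQL source might look as follows (all values here are invented placeholders; 5432 is PostgreSQL's default port):

----
Name:     orders-db
Host:     db.example.com
Port:     5432
Database: orders
Schema:   public
Username: import_reader
Password: ********
----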

Once you have entered all the required information, the Import service can connect to your remote source, fetch the tables, and import them to your Aura instance.

When you have added a data source, you need to create a data model before you can import data.
See xref:import/modeling.adoc[] for more information.
Your added data sources are listed on the *Data sources* tab, and you can interact with them via the *[...]* menu.

.Interact with data source
[.shadow]
image::data-sources-interaction.png[]

== Streaming local files

When you stream your local CSV files, the process is more manual and may take more than one iteration, unlike remote data sources, where the tables are fetched automatically in one go.
Collaborator: What does "iterative and manual." mean? @AlexicaWright

Collaborator (author): That you do this manually in more than one iteration, unlike with the remote data sources where things are automatic and in one go.

Import requires all CSV files to have a header row and at least one row of data.
The header row specifies how the data in the file should be interpreted and contains information for each field.
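For example, a minimal CSV file that meets these requirements could look like this (invented sample data):

[source,csv]
----
personId,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
----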
@@ -28,4 +76,7 @@ image::files.png[]
When you provide CSV files to Import, only a reference to your local files is kept.
This is used to send the necessary commands to stream the data to your target database when you run the import.
Your local files themselves are *not* uploaded anywhere.
Therefore, if you reload the page before running the import, the files are no longer available to the page and you need to provide them again.
This is due to security features of your browser.

Once the tables or files are in place, you need to specify how they should be organized in relation to each other; in other words, you need to *model your data*.
See xref:import/modeling.adoc[] for more information.
1 change: 1 addition & 0 deletions modules/ROOT/pages/import/introduction.adoc
@@ -9,6 +9,7 @@ It allows you to import data without using any code from:
* MySQL
* SQL Server
* Oracle
* Snowflake
* .CSV

This service is also available as a standalone tool for importing data into self-managed Neo4j instances.
8 changes: 6 additions & 2 deletions modules/ROOT/pages/import/mapping.adoc
@@ -2,9 +2,13 @@
:description: This section describes how to map files to a data model.
= Mapping

Mapping is the process of associating a file with an element in your data model.
Mapping is the process of associating a table or file with an element in your data model.
This is what allows Import to construct the Cypher statements needed to load your data.
Your source files may contain data that is not relevant to your data model, and when you build your model, you can select what data to use.

When you generate a data model, the mapping is largely done for you.
However, as you add elements to your model, whether you create it manually or add to a generated model, the new elements need to be mapped to their corresponding tables or files.

Your data source may contain data that is not relevant to your data model, and when you build your model, you can select what data to use.
Only data that is mapped correctly to the elements in your model is imported, so it is important to get the mapping right.
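As a rough illustration of the kind of statement Import constructs, a node mapped from a CSV file might be loaded with Cypher along these lines (a minimal sketch with invented file and property names, not the service's exact output):

[source,cypher]
----
// Create or update one Person node per CSV row (names invented)
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (p:Person {personId: toInteger(row.personId)})
SET p.name = row.name, p.email = row.email;
----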

[NOTE]
20 changes: 16 additions & 4 deletions modules/ROOT/pages/import/modeling.adoc
@@ -2,14 +2,26 @@
= Data modeling

The data model is the blueprint for the database.
It defines how the data is organized and is the key to create a graph from your files.
The data model is what you map your files to.
It defines how the data is organized and is the key to creating a graph from your data.
The data model is what you map your tables or files to.
It consists of nodes and relationships.
Nodes need to have a _label_ and one or more _properties_.
Relationships need to have a _type_ and one or more _properties_.
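For instance, in Cypher pattern notation, a tiny model with two labeled nodes and a typed relationship (names invented for illustration) could be sketched as:

[source,cypher]
----
(:Person {name: "Alice"})-[:WORKS_AT {since: 2020}]->(:Company {name: "Acme"})
----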

When you connect a remote data source, you are offered the option to have the model generated and mapped for you.
You can also upload a model, with or without data, in a _.json_ format.
You can add a new model and see your available models (if you have any) in the *Graph models* tab.
If you add a new model, you can either *Connect to a data source* to select a source from your list (see xref:import/file-provision.adoc[] for information about data sources), or you can drag and drop (or browse for) local flat files.

Once you have defined your data source, you have three options to create your model:

* *Define manually* - sketch your model and map your data
* *Generate from schema* - automatically define and map a model based on tables and constraints in your data
* *Generate with AI* - if you have enabled *Generative AI assistance* (in the xref:visual-tour/index.adoc#org-settings[Organization settings]), your model and mapping are automatically defined using AI on the available metadata in your data source, to give you a more complete model.
Collaborator: What does "a more complete model." mean?

Collaborator (author): That you get a model that is closer to completion. Keep reading on line 21.

When you have a model generated, with or without AI, always review it and make sure the model and mapping are as expected.
The generated model is meant to be a starting point, and you add elements to it as needed, since a relational database often does not contain sufficient information for the Import service to generate a complete model.
Depending on what metadata your data contains, you may get a more complete model if you generate with AI.

If you are streaming local files, you can also upload a model, with or without data, in a _.json_ format via the [*...*] more menu.
But if you want full control of how your data is modeled, you can create the model manually.

The data model panel is located in the center of the screen.
11 changes: 8 additions & 3 deletions modules/ROOT/pages/import/quick-start.adoc
@@ -2,7 +2,9 @@
:description: This section gives an overview of the Import service.
= Quick start

The Import service consists of three tabs; *Data sources*, *Graph models*, and *Import jobs*.
The Import service UI consists of three tabs: *Data sources*, *Graph models*, and *Import jobs*.
These reflect the three stages of importing data: providing the data (configuring a source to fetch it from), modeling the data (defining how the data is organized), and finally running the import.

If you haven't previously imported any data, all three are empty; otherwise, sources, models, and import jobs are listed here.

[.shadow]
@@ -16,10 +18,13 @@ Import supports PostgreSQL, MySQL, SQL Server, as well as locally hosted flat files.
[.shadow]
image::sources.png[width=400]

For SQL-files, you need to configure the data source, add user credentials for the SQL-database, and give the data source a name.
For relational databases and cloud data warehouses, you need to give the data source a name, configure the data source, and add user credentials for the database account.
The data source configuration is essentially the same for both relational databases and data warehouses: you specify a *host* for your database, a *port* to connect to, the name of the *database*/*service*, and a *schema* that contains your tables (the schema is not needed for MySQL data sources).
See xref:import/file-provision.adoc[] for more information.

If you want to stream local files, you can drag and drop them into the data source panel or browse for them.

== Model the data
== Model and map the data

When you have connected a data source, you have the option to have a model generated based on primary and foreign key constraints in the source database.
The quickest way is to accept the generated model, but you can draw your own later; see xref:import/modeling.adoc[] for more information.