-
Notifications
You must be signed in to change notification settings - Fork 339
SPL:Access MongoDB
MongoDB is a non-relational database storing data in BSON format, which is similar to JSON format, and it provides a full set of commands to process data. SPL, the lightweight computation engine language, supports nested data structure and is very easy to load JSON data. Therefore, we can extend the computational abilities of MongoDB with SPL. SPL provides the mongo_shell function, through which we can execute Mongo commands to read and write Mongo data.
Similar to the JDBC connection of relational database, the method of paired “open/close” is also used in SPL to connect to MongoDB.
mongo_open(connectionString), please refer to the official website for detailed parameters: Connection String.
mongo_close(mongoConn), mongoConn is the connection to be closed.
Sample: A1 creates the connection, and A3 closes the connection after some reading, writing, and calculation operations are done in the middle steps.
A | |
---|---|
1 | =mongo_open("mongodb://127.0.0.1:27017/mymongo") |
2 | …… |
3 | =mongo_close(A1) |
Mongo defines hundreds of commands including querying, updating data, managing database users, etc. All the commands are inputted and outputted in a specific JSON format.
mongo_shell(inputJson), inputJson is the input of the command.
Sample:
A | |
---|---|
1 | =mongo_open("mongodb://127.0.0.1:27017/mymongo") |
2 | =mongo_shell(A1,"{'find':'orders',filter:{OrderID: { $gte: 50}},batchSize:10}") |
3 | =A2.cursor.firstBatch |
4 | =mongo_shell(A1,"{'getMore':"+string(A2.cursor.id)+",batchSize:20}") |
5 | =create(OrderID,Client,SellerId,Amount,OrderDate) |
6 | >A5.insert(0,26,"TAS",1,2142.4,"2009-08-05") |
7 | >A5.insert(0,28,"DSGC",21,2125.4,"2009-09-05") |
8 | =mongo_shell(A1,"{'insert':'orders',documents:"+json(A5)+"}") |
9 | =mongo_close(A1) |
A2 queries the data that satisfy OrdersID>50 from the “orders” table and returns the first 10 rows of data. The result is a nested SPL table sequence which is identical to the JSON outputted by the “find” command. The following compares the JSON format and SPL table sequence structure returned by the “find” command:
{
"cursor": {
"firstBatch": [{
"_id": 61 adf78694c8ba530702472e,
"OrderID": 84,
"Client": "GC",
"SellerId": 1,
"Amount": 88.5,
"OrderDate": "2009-10-16"
}
,......other 9 rows data ……
],
"id": 8676451393605378424,
"ns": "mymongo.orders"
},
"ok": 1
}
A3 is the data table sequence firstBatch, the third layer of A2.
A4 executes the getMore command to fetch another 20 rows of data using the cursor ID (A2.cursor.id) in the result table sequence of A2.
A5 creates an SPL table sequence in the same structure as “orders” table. A6 and A7 initialize two pieces of data to the table sequence of A5.
A8 uses the “insert” command to update A5 table sequence in the “orders” table of mango. And the SPL function json(A5) converts A5 table sequence to json strings, thus concatenating them to a complete “insert” command JSON string. “insert” command is executed to return {"n":2,"ok":1}, indicting two rows of data are successfully added to the “orders” table of Mongo.
We can see that the value of A8 is still the table sequence equivalent to the JSON result.
A complete query usually gets all the data through combining “find” command and multiple subsequent “getMore” commands. The mongo_shell function returns all the data directly with the @d option.
A | |
---|---|
1 | =mongo_open("mongodb://127.0.0.1:27017/mymongo") |
2 | =mongo_shell@d(A1,"{'find':'orders',filter:{OrderID: { $gte: 50}},batchSize:10}") |
3 | =mongo_close(A1) |
A2 directly gets the data of “first ‘find’ + N ‘getMore’”, and encapsulates the getMore command executed N times inside the mongo_shell function which automatically loops through until all the data are obtained:
Returning big result set as a whole may lead to memory overflow. In this case, we can add @c option along with @d on the mongo_shell to return the SPL cursor, and then use the fetch function to retrieve data gradually:
A | |
---|---|
1 | =mongo_open("mongodb://127.0.0.1:27017/mymongo") |
2 | =mongo_shell@dc(A1,"{'find':'orders',filter:{OrderID: { $gte: 50}},batchSize:10}") |
3 | =if(A2==null, null, A2.fetch(5)) |
4 | =if(A3==null, null, A2.fetch(10)) |
5 | =mongo_close(A1) |
A2 is executed to get the SPL cursor, A3 fetches the 1^st^ to 5^th^ data, and
A4 fetches the 6^th^ to 15^th^ data.
Please note that @dc cannot return SPL cursor if A2 is not the “mongo” command for querying data, and we need to identify whether A2 is null when fetching data in A3 and A4.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code