Draft: Model Definition and Model Zoo

Yi Wang edited this page Oct 18, 2019 · 1 revision

Design Doc: Model Definition and Model Zoo

Motivations

By extending the SQL syntax, SQLFlow allows users to train, use, or explain models by writing SQL programs. SQL programs cannot define models, but they can call models defined in Python. It is therefore essential to have a guide that tells Python developers (data scientists) how to write model definitions that SQL developers (analysts) can use.

The Usage of Models

The Perspective from Analysts

An analyst could train a model by writing the following SQL statement with the extended syntax:

SELECT * FROM employee WHERE onboard_year < 2019
TO TRAIN MyModelDef 
COLUMN age, address, gender LABEL salary 
WITH param1=150, param2=0.1 
INTO my_first_model;

The identifier MyModelDef names a Python class. Without loss of generality, let us assume it is a class derived from tf.keras.Model.
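
To make this mapping concrete, the following minimal sketch shows one way the identifier after TO TRAIN and the WITH attributes could be turned into a Python object. The registry and the helpers parse_with_clause and instantiate are hypothetical names chosen for illustration, not SQLFlow's actual implementation. A plain Python class stands in for a tf.keras.Model subclass so that the sketch runs without TensorFlow.

```python
import ast


class MyModelDef:
    """Stand-in for a tf.keras.Model subclass (assumption: no TensorFlow here)."""

    def __init__(self, param1, param2):
        self.param1 = param1
        self.param2 = param2


# Hypothetical registry mapping identifiers usable in SQL to Python classes.
MODEL_REGISTRY = {"MyModelDef": MyModelDef}


def parse_with_clause(clause):
    """Turn 'param1=150, param2=0.1' into {'param1': 150, 'param2': 0.1}."""
    params = {}
    for item in clause.split(","):
        name, _, value = item.partition("=")
        # literal_eval accepts only Python literals, never arbitrary code.
        params[name.strip()] = ast.literal_eval(value.strip())
    return params


def instantiate(model_name, with_clause):
    """Look up the class named in TO TRAIN and pass the WITH attributes to it."""
    model_class = MODEL_REGISTRY[model_name]
    return model_class(**parse_with_clause(with_clause))


model = instantiate("MyModelDef", "param1=150, param2=0.1")
print(type(model).__name__, model.param1, model.param2)
```

Under these assumptions, the WITH attributes become keyword arguments of the constructor, so the parameter names an analyst writes in SQL must match the parameter names of the class's __init__ method.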

After the training completes, the analyst can use the model for prediction or model explanation, for example:

SELECT * FROM employee WHERE onboard_year >= 2019 
TO PREDICT salary
USING my_first_model;

The Perspective from Data Scientists

The author of class MyModelDef writes a Python source file like the following.

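import tensorflow as tf
import some_dependency  # placeholder for any third-party package the model needs

class MyModelDef(tf.keras.Model):
    def __init__(self, param1, param2, ...):
        super().__init__()  # required before assigning layers on a Keras model
        self.layer1 = tf.keras.SomeLayer(param1)  # SomeLayer is a placeholder
        self.layer2 = some_dependency.SomeOtherLayer(param2)

    def call(self, input):  # Keras subclasses override call(), not __call__()
        return self.layer2(self.layer1(input))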

The Perspective of SQLFlow

Upon receiving the above SQL statement, the SQLFlow server would generate and run a submitter program. The submitter, written in Python or Bash, retrieves and exports data from the database system and submits a distributed training job to a cluster.
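
The generation step can be pictured with a small sketch. It assumes the statement has already been parsed into its parts, and the template, the generate_submitter function, and the variable names in the generated program are all hypothetical. A real submitter would also export the data and submit a distributed job; here the generated program only prints what it would train.

```python
from string import Template

# Hypothetical template of a submitter program; a real one would also
# export the data and submit a distributed job to a cluster.
SUBMITTER_TEMPLATE = Template('''\
# Generated submitter (illustrative only)
SELECT_STMT = $select
MODEL_CLASS = $model
FEATURES = $features
LABEL = $label
MODEL_PARAMS = $params
SAVE_AS = $into

print("would train", MODEL_CLASS, "on", FEATURES, "->", LABEL)
''')


def generate_submitter(select, model, features, label, params, into):
    """Render Python source text for a submitter from a parsed statement."""
    return SUBMITTER_TEMPLATE.substitute(
        select=repr(select),
        model=repr(model),
        features=repr(features),
        label=repr(label),
        params=repr(params),
        into=repr(into),
    )


program = generate_submitter(
    select="SELECT * FROM employee WHERE onboard_year < 2019",
    model="MyModelDef",
    features=["age", "address", "gender"],
    label="salary",
    params={"param1": 150, "param2": 0.1},
    into="my_first_model",
)

# Running the generated source stands in for launching the training job.
namespace = {}
exec(program, namespace)
```

Generating source text keeps the server independent of the model's dependencies: only the environment that runs the submitter needs them installed.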

The training process