Draft: Model Definition and Model Zoo
By extending the SQL syntax, SQLFlow allows users to train, use, or explain models by writing SQL programs. SQL programs cannot define models, but they can call models defined in Python. It is therefore essential to have a guide that helps Python developers, or data scientists, write model definitions that SQL developers, or analysts, can use.
An analyst could train a model by writing the following SQL statement with the extended syntax:
SELECT * FROM employee WHERE onboard_year < 2019
TO TRAIN MyModelDef
COLUMN age, address, gender LABEL salary
WITH param1=150, param2=0.1
INTO my_first_model;
The identifier MyModelDef names a Python class. Without loss of generality, let us assume it is a class derived from tf.keras.Model.
After training completes, analysts can use the model for prediction or model explanation:
SELECT * FROM employee WHERE onboard_year >= 2019
TO PREDICT salary
USING my_first_model;
The author of class MyModelDef writes a Python source file like the following.
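import tensorflow as tf
import some_dependency  # placeholder for any third-party package the model needs

class MyModelDef(tf.keras.Model):
    def __init__(self, param1, param2, ...):
        super().__init__()  # required before assigning layers on a Keras model
        # SomeLayer and SomeOtherLayer are placeholders, not real layer classes.
        self.layer1 = tf.keras.SomeLayer(param1)
        self.layer2 = some_dependency.SomeOtherLayer(param2)

    # Keras subclasses override call(); the base class __call__ wraps it.
    def call(self, inputs):
        return self.layer2(self.layer1(inputs))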
Upon receiving the training statement above, the SQLFlow server would generate and run a submitter program. The submitter, written in Python or Bash, retrieves and exports data from the database system and submits a distributed training job to a cluster.
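To make the link between the SQL statement and the Python class concrete, the following is a minimal sketch of how the identifier after TO TRAIN and the attributes in the WITH clause might be turned into a constructor call. It is an illustration under stated assumptions, not SQLFlow's actual parser or code generator: the MODEL_DEFS registry, the parse_train_clause helper, and the regular expressions are all hypothetical, and the MyModelDef here is a plain stand-in class so that the sketch runs without TensorFlow.

```python
import ast
import re

# Stand-in for a real model definition. In practice this class would derive
# from tf.keras.Model; it is simplified here so the sketch has no dependencies.
class MyModelDef:
    def __init__(self, param1, param2):
        self.param1 = param1
        self.param2 = param2

# Hypothetical registry that maps identifiers used in SQL to Python classes.
MODEL_DEFS = {"MyModelDef": MyModelDef}

def parse_train_clause(sql):
    """Return (model_name, attributes) from a TO TRAIN statement.

    Assumes the simple layout used in this document: an identifier after
    TO TRAIN, and comma-separated key=value pairs between WITH and INTO.
    """
    name_match = re.search(r"TO\s+TRAIN\s+(\w+)", sql)
    if name_match is None:
        raise ValueError("statement has no TO TRAIN clause")

    attrs = {}
    with_match = re.search(r"WITH\s+(.*?)\s+INTO", sql, re.S)
    if with_match:
        for pair in with_match.group(1).split(","):
            key, value = pair.split("=")
            # literal_eval turns "150" into an int and "0.1" into a float.
            attrs[key.strip()] = ast.literal_eval(value.strip())
    return name_match.group(1), attrs

STATEMENT = """
SELECT * FROM employee WHERE onboard_year < 2019
TO TRAIN MyModelDef
COLUMN age, address, gender LABEL salary
WITH param1=150, param2=0.1
INTO my_first_model;
"""

name, attrs = parse_train_clause(STATEMENT)
# Look up the class by its SQL identifier and pass the attributes as keywords.
model = MODEL_DEFS[name](**attrs)
print(name, attrs)
```

Passing the attributes as keyword arguments means the names in the WITH clause must match the constructor's parameter names, which is why the model author's choice of parameter names is visible to the analyst writing SQL.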
The training process