# Databricks notebook source
# MAGIC %md
# MAGIC
# MAGIC # Local Development
# COMMAND ----------
# MAGIC %md ### Clone the Webinar Project
# MAGIC
# MAGIC The Webinar repo is available at the following location:
# MAGIC
# MAGIC ```
# MAGIC https://github.com/databricks-edu/Deploying-ML-Models-At-Scale
# MAGIC ```
# MAGIC Use `git` to clone the project to your system with this command:
# MAGIC
# MAGIC ```
# MAGIC git clone https://github.com/databricks-edu/Deploying-ML-Models-At-Scale.git
# MAGIC ```
# MAGIC
# MAGIC When it completes cloning, change directories to work in the new project:
# MAGIC
# MAGIC ```
# MAGIC cd Deploying-ML-Models-At-Scale/
# MAGIC ```
# COMMAND ----------
# MAGIC %md ### Configuring Conda
# MAGIC We will use Conda in this webinar to configure both our local and Databricks environments.
# MAGIC
# MAGIC This webinar does not cover Conda installation and assumes that Conda is already running on your system.
# MAGIC
# MAGIC You can visit [https://docs.conda.io/projects/conda/en/latest/user-guide/install](https://docs.conda.io/projects/conda/en/latest/user-guide/install)
# MAGIC for more information about installing Conda.
# MAGIC
# MAGIC #### Conda Version
# MAGIC
# MAGIC You can see which version of conda you have installed on your system
# MAGIC with the command `conda --version`.
# MAGIC
# MAGIC #### Conda Environment
# MAGIC
# MAGIC Before we begin, you should create a new conda environment
# MAGIC for this webinar.
# MAGIC
# MAGIC You can create the new environment with this command:
# MAGIC
# MAGIC ```
# MAGIC conda create --name building-deploying python=3.7.6
# MAGIC ```
# COMMAND ----------
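import sys
# Show the Python version of the interpreter running this cell, for comparison with python=3.7.6 above.
sys.version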
# COMMAND ----------
# MAGIC %md
# MAGIC #### Activate Conda Environment
# MAGIC
# MAGIC Activate the new conda environment with this command:
# MAGIC ```
# MAGIC conda activate building-deploying
# MAGIC ```
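# MAGIC
# MAGIC To confirm that the activation took effect, a small check like the following can be run with the
# MAGIC environment's Python. This is only a sketch: `environment_report` is a hypothetical helper, and it
# MAGIC assumes that `conda activate` has set the `CONDA_DEFAULT_ENV` environment variable.
# MAGIC
# MAGIC ```python
# MAGIC import os
# MAGIC import sys
# MAGIC
# MAGIC
# MAGIC def environment_report(environ=None, expected_env="building-deploying", expected_python=(3, 7)):
# MAGIC     """Compare the active conda environment and Python version with the expected ones."""
# MAGIC     environ = os.environ if environ is None else environ
# MAGIC     # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
# MAGIC     active_env = environ.get("CONDA_DEFAULT_ENV")
# MAGIC     return {
# MAGIC         "active_env": active_env,
# MAGIC         "env_ok": active_env == expected_env,
# MAGIC         # Only major.minor is compared; the patch version is ignored.
# MAGIC         "python_ok": tuple(sys.version_info[:2]) == tuple(expected_python),
# MAGIC     }
# MAGIC
# MAGIC
# MAGIC if __name__ == "__main__":
# MAGIC     print(environment_report())
# MAGIC ```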
# COMMAND ----------
# MAGIC %md ### Install Databricks Interfaces
# MAGIC
# MAGIC ```
# MAGIC pip install databricks-cli databricks-connect mlflow
# MAGIC ```
# COMMAND ----------
# MAGIC %md ### Configure Databricks CLI
# MAGIC
# MAGIC Use this command to display options available to you with the
# MAGIC Databricks CLI:
# MAGIC
# MAGIC ```
# MAGIC databricks -h
# MAGIC ```
# MAGIC
# MAGIC Use this command to connect the CLI to your Workspace:
# MAGIC
# MAGIC ```
# MAGIC databricks configure
# MAGIC ```
# MAGIC
# MAGIC Use these options:
# MAGIC
# MAGIC - Databricks Host: the URL of your Databricks Workspace
# MAGIC - Username: your username in that Workspace
# MAGIC - Password: An [Access Token](https://docs.databricks.com/dev-tools/api/latest/authentication.html) generated for your Workspace User
# MAGIC
# MAGIC You can view the contents of your Databricks CLI configuration file with this command:
# MAGIC
# MAGIC ```
# MAGIC less ~/.databrickscfg
# MAGIC ```
# MAGIC
# MAGIC Press `q` to exit.
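# MAGIC
# MAGIC The configuration file is a plain INI file, so it can also be inspected from Python. The sketch
# MAGIC below assumes the settings live in a `[DEFAULT]` section with keys such as `host`, `username`,
# MAGIC and `password`; `read_databricks_profile` and `redact` are hypothetical helper names, not part
# MAGIC of the Databricks CLI.
# MAGIC
# MAGIC ```python
# MAGIC import configparser
# MAGIC from pathlib import Path
# MAGIC
# MAGIC
# MAGIC def read_databricks_profile(path, profile="DEFAULT"):
# MAGIC     """Return the settings stored for one profile in a Databricks CLI config file."""
# MAGIC     # Interpolation is disabled so that a "%" inside a secret is read literally.
# MAGIC     parser = configparser.ConfigParser(interpolation=None)
# MAGIC     parser.read(str(path))
# MAGIC     # DEFAULT is a special section in configparser, so read it via defaults().
# MAGIC     if profile == "DEFAULT":
# MAGIC         return dict(parser.defaults())
# MAGIC     return dict(parser[profile])
# MAGIC
# MAGIC
# MAGIC def redact(settings):
# MAGIC     """Mask secret values so that the result is safe to print."""
# MAGIC     secret_keys = {"password", "token"}
# MAGIC     return {key: ("***" if key in secret_keys else value) for key, value in settings.items()}
# MAGIC
# MAGIC
# MAGIC # Example: print(redact(read_databricks_profile(Path.home() / ".databrickscfg")))
# MAGIC ```
# MAGIC
# MAGIC Redacting before printing avoids leaking the access token into terminal history or logs.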
# COMMAND ----------
# MAGIC %md ### Identify Cluster to Use with Databricks Connect
# MAGIC
# MAGIC This command lists your clusters and filters them by name to identify the one to use with
# MAGIC Databricks Connect. Replace `joshua` with part of your own cluster's name:
# MAGIC
# MAGIC ```
# MAGIC databricks clusters list | grep joshua
# MAGIC ```
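# MAGIC
# MAGIC The same lookup can be sketched in Python. This assumes the CLI accepts `--output JSON` for
# MAGIC `clusters list` and returns a `clusters` array whose entries carry `cluster_id`, `cluster_name`,
# MAGIC and `state`; `clusters_matching` and `list_clusters` are hypothetical helper names. Unlike
# MAGIC `grep`, which matches anywhere on the line, this version matches on the cluster name only.
# MAGIC
# MAGIC ```python
# MAGIC import json
# MAGIC import subprocess
# MAGIC
# MAGIC
# MAGIC def clusters_matching(listing, name_part):
# MAGIC     """Return (cluster_id, cluster_name, state) for clusters whose name contains name_part."""
# MAGIC     return [
# MAGIC         (c.get("cluster_id"), c.get("cluster_name"), c.get("state"))
# MAGIC         for c in listing.get("clusters", [])
# MAGIC         if name_part in c.get("cluster_name", "")
# MAGIC     ]
# MAGIC
# MAGIC
# MAGIC def list_clusters():
# MAGIC     """Run the Databricks CLI and parse its JSON output (assumed output format)."""
# MAGIC     result = subprocess.run(
# MAGIC         ["databricks", "clusters", "list", "--output", "JSON"],
# MAGIC         check=True, capture_output=True, text=True,
# MAGIC     )
# MAGIC     return json.loads(result.stdout)
# MAGIC
# MAGIC
# MAGIC # Example: print(clusters_matching(list_clusters(), "joshua"))
# MAGIC ```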
# COMMAND ----------
# MAGIC %md ### Configure Databricks Connect
# MAGIC
# MAGIC Use the command
# MAGIC
# MAGIC ```
# MAGIC databricks-connect configure
# MAGIC ```
# MAGIC
# MAGIC You should be able to use the default options.
# MAGIC
# MAGIC #### Test Databricks Connect
# MAGIC
# MAGIC Run `databricks-connect test` to check the connection. The test may fail if you don't
# MAGIC have JDK 8 installed. You can install OpenJDK 8 from https://adoptopenjdk.net/
# MAGIC
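# MAGIC To check which Java version is on your path before testing, a sketch like the following can
# MAGIC help. The helper names are hypothetical, and the parsing assumes the usual banner format in
# MAGIC which Java 8 reports itself as `1.8.x` while later releases report the major version first.
# MAGIC
# MAGIC ```python
# MAGIC import re
# MAGIC import subprocess
# MAGIC
# MAGIC
# MAGIC def java_major_version(version_output):
# MAGIC     """Extract the major Java version from `java -version` output, or None if absent."""
# MAGIC     match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
# MAGIC     if match is None:
# MAGIC         return None
# MAGIC     first, second = match.group(1), match.group(2)
# MAGIC     # Java 8 and earlier use the 1.x scheme, for example 1.8.0_292.
# MAGIC     if first == "1" and second is not None:
# MAGIC         return int(second)
# MAGIC     return int(first)
# MAGIC
# MAGIC
# MAGIC def installed_java_major_version():
# MAGIC     """Run `java -version`, which writes its banner to stderr."""
# MAGIC     try:
# MAGIC         result = subprocess.run(["java", "-version"], capture_output=True, text=True)
# MAGIC     except FileNotFoundError:
# MAGIC         return None  # no java executable on the PATH
# MAGIC     return java_major_version(result.stderr)
# MAGIC
# MAGIC
# MAGIC # Example: print(installed_java_major_version())
# MAGIC ```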
# COMMAND ----------
# MAGIC %md ### Test the ETL Spark Job
# MAGIC
# MAGIC Use this command to test the ETL Spark job:
# MAGIC
# MAGIC ```
# MAGIC spark-submit includes/main/python/etl.py --username FILL_IN_YOUR_USERNAME
# MAGIC ```
# COMMAND ----------
# MAGIC %md ### Configure MLflow
# MAGIC
# MAGIC Run the following to configure MLflow to use Databricks.
# MAGIC
# MAGIC 1. `export MLFLOW_TRACKING_URI=databricks`
# MAGIC    - this tells MLflow to use Databricks to manage experiments
# MAGIC 1. `mlflow experiments create -n /Users/YOUR_DATABRICKS_USER/building-deploying`
# MAGIC    - this creates a new experiment in the Databricks Workspace
# MAGIC    - make a note of the resulting experiment ID
# MAGIC 1. `export EXPERIMENT_NAME=/Users/YOUR_DATABRICKS_USER/building-deploying`
# MAGIC    - this stores the experiment name for the experiment job below
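# MAGIC
# MAGIC The steps above can also be sketched with the MLflow Python API. This is an illustration
# MAGIC rather than part of the webinar project: `experiment_path` and `create_experiment` are
# MAGIC hypothetical helpers, and the sketch assumes that the Databricks CLI is already configured
# MAGIC so that MLflow can authenticate.
# MAGIC
# MAGIC ```python
# MAGIC def experiment_path(databricks_user, name="building-deploying"):
# MAGIC     """Build the Workspace path used as the MLflow experiment name."""
# MAGIC     return "/Users/{}/{}".format(databricks_user, name)
# MAGIC
# MAGIC
# MAGIC def create_experiment(databricks_user):
# MAGIC     """Create the experiment through the MLflow Python API and return its ID."""
# MAGIC     import mlflow  # imported lazily so experiment_path works without MLflow installed
# MAGIC
# MAGIC     # Equivalent to `export MLFLOW_TRACKING_URI=databricks` for this process.
# MAGIC     mlflow.set_tracking_uri("databricks")
# MAGIC     return mlflow.create_experiment(experiment_path(databricks_user))
# MAGIC
# MAGIC
# MAGIC # Example: print(create_experiment("YOUR_DATABRICKS_USER"))
# MAGIC ```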
# COMMAND ----------
# MAGIC %md ### Install Scikit-Learn in Conda Environment
# MAGIC
# MAGIC ```
# MAGIC pip install scikit-learn
# MAGIC ```
# COMMAND ----------
# MAGIC %md ### Test the Experiment Spark Job
# MAGIC
# MAGIC Use this command to test the experiment Spark job:
# MAGIC
# MAGIC ```
# MAGIC spark-submit includes/main/python/experiment.py --penalty l1 --max-iter 10000 --username FILL_IN_YOUR_USERNAME --experiment-name $EXPERIMENT_NAME
# MAGIC ```
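# MAGIC
# MAGIC To illustrate how a job script might receive these flags, here is a minimal `argparse`
# MAGIC sketch. It is not the project's `experiment.py`, whose contents are not shown here; the
# MAGIC `build_parser` name, the defaults, and the allowed `--penalty` values are all assumptions.
# MAGIC
# MAGIC ```python
# MAGIC import argparse
# MAGIC
# MAGIC
# MAGIC def build_parser():
# MAGIC     """Build a parser that accepts the flags used in the spark-submit command above."""
# MAGIC     parser = argparse.ArgumentParser(description="Hypothetical experiment job arguments")
# MAGIC     parser.add_argument("--username", required=True, help="Databricks username")
# MAGIC     parser.add_argument("--experiment-name", required=True, help="MLflow experiment path")
# MAGIC     # Assumed choices and defaults, chosen only for illustration.
# MAGIC     parser.add_argument("--penalty", choices=["l1", "l2"], default="l2")
# MAGIC     parser.add_argument("--max-iter", type=int, default=100)
# MAGIC     return parser
# MAGIC
# MAGIC
# MAGIC if __name__ == "__main__":
# MAGIC     # argparse exposes --max-iter as args.max_iter and --experiment-name as args.experiment_name.
# MAGIC     args = build_parser().parse_args()
# MAGIC     print(args.username, args.experiment_name, args.penalty, args.max_iter)
# MAGIC ```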