Caused by: java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.String at com.salesforce.op.features.types.FeatureTypeSparkConverter$$anonfun$2.apply(FeatureTypeSparkConverter.scala:146) #520
Comments
If I change it to this:
val dataFrame1 = loadmodel.setInputDataset(frame)
.score()
dataFrame1.show(false)
it works. So when I use the model to predict on data, I can't change the order of the columns? |
In your example you do not seem to be using the
// Drop id column
val frame = dataFrame.drop("id")
// Extract response and predictor Features
val (irisClass, predictors) = FeatureBuilder.fromDataFrame[Text](frame, response = "irisClass")
// Automated feature engineering
val featureVector = predictors.transmogrify()
// Automated feature validation and selection
val index = irisClass.indexed("__unknown", StringIndexerHandleInvalid.Keep)
val checkedFeatures = index.sanityCheck(featureVector, removeBadFeatures = true)
val pred = MultiClassificationModelSelector
.withTrainValidationSplit()
.setInput(index, checkedFeatures)
.setOutputFeatureName("pred")
.getOutput()
// Setting up a TransmogrifAI workflow and training the model
val model: OpWorkflowModel = new OpWorkflow()
.setInputDataset(frame)
.setResultFeatures(pred)
.train()
val scored = model.setInputDataset(frame).score()
scored.show(false) |
Sorry, that was a typo on my part. This:
Your example is right, but when I change this frame (reorder the columns and rename it frame_new) and then use the model to predict, I get a bug:
So when we predict, does the data have to keep the same column order? |
Also, can we use this like a Spark ML pipeline? For example:
val (irisClass, predictors1) = FeatureBuilder.fromDataFrame[Text](dataFrame, response = name)
val strindex = new OpStringIndexer()
.setInput(irisClass)
.setOutputFeatureName("index")
val strModel = strindex.fit(dataFrame)
val mm = strModel.getSparkMlStage() match {
case Some ( x ) => x
}
val opdt = new OpDecisionTreeClassifier()
.setInput(strindex.getOutput(), featureVector1)
.setOutputFeatureName("dtPred")
val labels = mm.labels
val inde = new OpIndexToString()
.setInput(strindex.getOutput())
.setLabels(labels)
.setOutputFeatureName("pred")
val pipelineModel = new Pipeline("getAlgorithmType")
.setStages(Array(strindex, opdt, inde))
.fit(dataFrame)
Do you have an example like that? |
We never tried reordering the columns. In general, this should not be an issue, since we refer to the columns by name. Why would you need to do it? TransmogrifAI stages can be used in Spark ML pipelines as long as you maintain the naming conventions on the columns. |
After we train the model, we reuse it to predict on a batch of data. The column names of this batch are the same, but the column order is different. If the model cannot accept data whose columns are in a different order, that reduces its generality. |
OK, I just went through the code. Each Feature that was constructed from a DataFrame Row has an
One option I see to overcome this is to recreate the features prior to scoring, using the new dataset, and then use them as input for the model. |
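A different workaround from the one suggested above, shown only as a minimal sketch: put the columns of the new DataFrame back into the training order before scoring. It uses only the standard Spark `select` and `col` calls. The `ColumnOrder` object and `alignColumns` helper are hypothetical names introduced here, not part of TransmogrifAI, and whether this avoids the ClassCastException in every case has not been verified in this thread.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper, not a TransmogrifAI API.
object ColumnOrder {
  /** Returns `df` with its columns selected in the order given by `expected`.
    * Fails fast if `df` is missing any of the expected columns. */
  def alignColumns(df: DataFrame, expected: Seq[String]): DataFrame = {
    val missing = expected.filterNot(c => df.columns.contains(c))
    require(missing.isEmpty, s"Missing columns: ${missing.mkString(", ")}")
    // Select by name, so the result follows the order of `expected`.
    df.select(expected.map(col): _*)
  }
}
```

Assuming `frame` is the training DataFrame and `frame_new` has the same column names in a different order, usage would look like `model.setInputDataset(ColumnOrder.alignColumns(frame_new, frame.columns.toSeq)).score()`.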
I don't quite understand. Do you mean using the new dataset to create the features and then using the original model to predict?
When I use: |
When I used the iris.csv data:
I created a StructType like this:
Next I get the label column and the feature columns:
id is neither the label nor a feature, but used this way id is also treated as a feature column, which I don't want.
So I selected only the columns that are the label or features and dropped the other columns,
but got this bug: