Changed prediction to run with multithreading #54

Open · wants to merge 6 commits into base: stable
Conversation

@skjerns commented on May 15, 2019

I noticed that predict and integrity_score run quite slowly.

  1. I've added the option to run the predictions with threading, which makes them much faster.
     This adds a dependency on joblib; however, joblib is already a dependency of sklearn, so effectively no new dependency is introduced. This change makes the code ~8x faster (with 8 threads).

  2. I've changed the call from Shell.check_output to subprocess.check_output. Shell calls subprocess.check_output in the background anyway, but calling it directly gives another ~3-4x speedup.

So a total speedup of ~30x is possible.

Example:

import numpy as np
import sklearn_porter
from sklearn.ensemble import RandomForestClassifier

train_x = np.random.rand(1000, 8)
train_y = np.random.randint(0, 4, 1000)

rfc = RandomForestClassifier(n_estimators=10)
rfc.fit(train_x, train_y)
        
porter = sklearn_porter.Porter(rfc, language='c')
porter.integrity_score(train_x) # ~30 times faster.
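
Roughly, the threaded loop looks like the following sketch (not the exact diff; the binary path ./estimator and the helper names predict_one/predict_parallel are just placeholders, and it assumes the compiled program prints the predicted label):

import subprocess
from joblib import Parallel, delayed

def predict_one(features, binary='./estimator'):
    # One call to the compiled estimator per sample; the work happens in the child process.
    args = [binary] + [str(float(f)) for f in features]
    out = subprocess.check_output(args)
    return int(out.strip())

def predict_parallel(X, binary='./estimator', n_jobs=8):
    # Threads are enough here: each thread mostly waits on its subprocess,
    # so the GIL is not a bottleneck and 8 threads give roughly the ~8x above.
    return Parallel(n_jobs=n_jobs, prefer='threads')(
        delayed(predict_one)(x, binary) for x in X
    )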

I've also seen that integrity_score runs perfectly fine on Windows, given that gcc is installed (and the hard-coded blocking of Windows is removed). Do you think we can remove the blocking of this function on Windows platforms?

@nok (Owner) commented on Jun 25, 2019

Hello @skjerns,

This is great! I will merge your PR and adapt it to the new major release. Until that's done I will keep this PR open.

Best, Darius

@skjerns (Author) commented on Jun 25, 2019

In the meantime I have found another solution that speeds things up to almost real-time predictions:

I altered int main() {...} so that it accepts several data points as input instead of just one. This way I can verify several hundred inputs in a single call. I'll open another PR proposing this soon if you want. However, it's a deeper alteration of the generated code and needs to be done for each language individually, so it might not be preferable.

Example for C:

int main(int argc, const char * argv[]) {
    if ((argc - 1) % n_features != 0) {
        printf("Need to supply N x %d features flattened, %d were given", n_features, argc - 1);
        return 1;
    }
    double features[n_features];
    int n_rows = (argc - 1) / n_features;
    for (int row = 0; row < n_rows; row++) {
        printf("row: %d\n", row);
        for (int i = 0; i < n_features; i++) {
            features[i] = atof(argv[i + row * n_features + 1]);
        }
        // calculate outputs for debugging
        int class_idx = predict_class_idx(features);
        // same as calling label = predict(features)
        int label = labels[class_idx];

        // now we print the results
        printf("labels: ");
        for (int i = 0; i < n_classes; i++) {
            printf("%d ", labels[i]);
        }
        printf("\n");
        printf("class_idx: %d\n", class_idx);
        printf("label: %d", label);
        printf("\n\n");
    }
    return 0;
}
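
Driving the batch version from Python is then a single subprocess call, roughly like this (a sketch; ./estimator is again a placeholder and the parsing assumes the printf format above):

import subprocess
import numpy as np

def predict_batch(X, binary='./estimator'):
    # Flatten N x n_features samples into one argument list -> one process call.
    args = [binary] + [str(float(v)) for v in np.asarray(X).ravel()]
    out = subprocess.check_output(args).decode()
    # Pick up the "label: <int>" line printed per row.
    return [int(line.split(':')[1]) for line in out.splitlines()
            if line.startswith('label:')]

One caveat: the operating system limits the total command-line length, so very large batches would still have to be chunked.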

@nok (Owner) commented on Dec 19, 2019

In the next release all internal predictions will be multiprocessed by default. Here is the relevant part:
https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L652-L682

> I altered int main() {...} so that it accepts several data points as input instead of just one. This way I can verify several hundred inputs in a single call. I'll open another PR proposing this soon if you want. However, it's a deeper alteration of the generated code and needs to be done for each language individually, so it might not be preferable.

Yes, SIMD operations would be nice. But for now I prefer a simple and intuitive starting point where a developer can change and extend the generated source code easily. Nevertheless I see and understand the need, so I would suggest that we create an additional interactive example (something like that) where we demonstrate the customization and the final benefit. The current scaffold of a template is here.

What do you think?

@nok (Owner) commented on Dec 19, 2019

> I've also seen that integrity_score runs perfectly fine on Windows, given that gcc is installed (and the hard-coded blocking of Windows is removed). Do you think we can remove the blocking of this function on Windows platforms?

Thanks for the note! That sounds great. I removed all checks that are related to the operating system:
https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L701

@skjerns (Author) commented on Dec 21, 2019

>> I've also seen that integrity_score runs perfectly fine on Windows, given that gcc is installed (and the hard-coded blocking of Windows is removed). Do you think we can remove the blocking of this function on Windows platforms?
>
> Thanks for the note! That sounds great. I removed all checks that are related to the operating system:
> https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L701

Great! It might be handy to include a gcc_installed() function that prints a warning etc.
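
Something along these lines would do, as a rough sketch (gcc_installed is just a placeholder name here, not an existing sklearn-porter function):

import shutil
import warnings

def gcc_installed():
    # Warn if no gcc executable is available on the PATH.
    if shutil.which('gcc') is None:
        warnings.warn('gcc was not found on the PATH; the generated C code '
                      'cannot be compiled for predict/integrity_score.')
        return False
    return True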

Edit: Ah, I guess that's handled by DEPENDENCIES.

@skjerns (Author) commented on Dec 21, 2019

> In the next release all internal predictions will be multiprocessed by default. Here is the relevant part:
> https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L652-L682

Great! Nice.

>> I altered int main() {...} so that it accepts several data points as input instead of just one. This way I can verify several hundred inputs in a single call. I'll open another PR proposing this soon if you want. However, it's a deeper alteration of the generated code and needs to be done for each language individually, so it might not be preferable.
>
> Yes, SIMD operations would be nice. But for now I prefer a simple and intuitive starting point where a developer can change and extend the generated source code easily. Nevertheless I see and understand the need, so I would suggest that we create an additional interactive example (something like that) where we demonstrate the customization and the final benefit. The current scaffold of a template is here.
>
> What do you think?

I'll leave that up to you. Making the source code of the individual language templates available would be feasible, I guess?
