pre-trained embedding features #110
Comments
Also crema... which I think is gonna be my use case. Why would this involve an API change?
Absolutely. Though with crema, there's a circular dependency problem: crema depends on pumpp, so we can't have pumpp depend on crema. Probably the best option here is to add a pumpp feature extractor class to each crema model. This should be fairly easy to do. Extending foreign objects (i.e. from another package) is generally a bad idea, but in this case we're in control of both packages, so it shouldn't be a huge deal.
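The pattern above (attaching a pumpp-style extractor to each crema model) could be sketched roughly as follows. All class names here are simplified stand-ins for illustration, not the real pumpp or crema APIs:

```python
class FeatureExtractor:
    """Minimal stand-in for pumpp's FeatureExtractor base class."""
    def __init__(self, name, sr, hop_length):
        self.name, self.sr, self.hop_length = name, sr, hop_length

    def transform_audio(self, y):
        raise NotImplementedError


class CremaModel:
    """Minimal stand-in for a crema model with fixed analysis parameters."""
    sr = 44100
    hop_length = 4096

    def outputs(self, y, sr):
        # real crema models return a dict of named output arrays;
        # this stand-in just emits one frame per hop
        return {'chord': [0.0] * (len(y) // self.hop_length)}


class CremaExtractor(FeatureExtractor):
    """Wraps a crema model so a Pump can drive it like any other extractor."""
    def __init__(self, model, name='crema'):
        super().__init__(name, sr=model.sr, hop_length=model.hop_length)
        self.model = model

    def transform_audio(self, y):
        return self.model.outputs(y=y, sr=self.sr)


# The "extending foreign objects" part: each crema model class grows a
# method that hands back its pumpp extractor.
CremaModel.pump = lambda self: CremaExtractor(self)
```

Since both packages are under the same maintainers' control, bolting `pump()` onto the model classes avoids any import of crema from inside pumpp.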
Caveat about wrapping pre-trained embeddings in general:
I like the idea of making the integrations their own opt-in submodule:

```python
import pumpp
from pumpp.feature.integrations import vggish, openl3, keras

pump = pumpp.Pump(vggish.VGGish(...), openl3.OpenL3(...), keras.H5Model(...))
```

If we do that, then we don't have to worry about circular dependencies, so we could add crema there as well. (Or alternatively we can have them imported by default and do:

```python
import os

from pumpp.feature.base import FeatureExtractor


class Crema(FeatureExtractor):
    def __init__(self, name='crema', model_dir='./model', *a, **kw):
        from crema.models.base import CremaModel
        self.model = CremaModel()
        self.model._instantiate(os.path.abspath(model_dir))
        super().__init__(name, *a, **kw)

    def transform_audio(self, y):
        return self.model.outputs(y=y, sr=self.sr)
```

)
Actually maybe the best option would be to do what @bmcfee suggested (crema model extending Pump) and then add vggish and openl3 as crema models. |
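A crema model extending Pump might look something like the sketch below. `Pump` and the model internals here are simplified stand-ins for illustration only, not the real pumpp or crema APIs:

```python
class Pump:
    """Minimal stand-in for pumpp.Pump: applies a list of feature ops."""
    def __init__(self, *ops):
        self.ops = list(ops)

    def transform(self, y, sr):
        data = {}
        for op in self.ops:
            data.update(op(y, sr))
        return data


class ChordModel(Pump):
    """A 'crema model' built as a Pump over its own feature ops."""
    def __init__(self):
        super().__init__(self._features)

    def _features(self, y, sr):
        # stand-in for the model's real front-end + network outputs
        return {'chord_pitch': [sum(y) / max(len(y), 1)]}
```

Under this design, vggish and openl3 would just be more model classes of the same shape, and pumpp never has to import any of them.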
Description
pumpp features currently rely on low-level librosa implementations, but we could also have wrappers for pre-trained feature extractors like openl3 and vggish (the latter as implemented by openmic).
There are some details to work out in terms of standardizing the parameters (hop size, etc.).
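One way to standardize hop sizes across wrapped extractors is to resample each extractor's frame sequence onto a common target hop, e.g. by nearest-neighbor lookup. This is only an illustrative sketch; the function and parameter names are hypothetical, not part of pumpp:

```python
def align_frames(frames, hop_src, hop_dst, n_dst):
    """Map frames spaced `hop_src` samples apart onto a grid of `n_dst`
    frames spaced `hop_dst` samples apart, taking the nearest source frame
    for each target position."""
    out = []
    for i in range(n_dst):
        t = i * hop_dst                              # target position (samples)
        j = min(round(t / hop_src), len(frames) - 1) # nearest source frame
        out.append(frames[j])
    return out
```

In practice you'd likely interpolate feature values rather than copy the nearest frame, but the time-grid bookkeeping is the same either way.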