vector #102
Comments
That's the intended behavior, sorry if it's not documented well enough. (This is still very much a WIP!) The intended use case here is for things like parametric embedding approximation, where you map an entire recording down to a fixed-dimensional vector. This comes up in things like latent factor approximation in collaborative filtering. If you want to broadcast the vector out over time, you'd have to know the target extent (number of frames), which isn't generally known with independent transformers. E.g., your vector transformer would have to know about the CQT transformer inside the pump, and they don't currently support that kind of behavior. Probably your best bet is to reshape it at run-time (e.g., by sampling during training), or design your model to be time-invariant to cut down on redundant memory consumption.
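The run-time reshaping suggested above could look something like the following minimal sketch, which broadcasts a single per-recording vector out over the frame axis once the number of frames is known (the shapes and values are taken from the example in this issue; the variable names are made up for illustration):

```python
import numpy as np

# One static 6-dimensional vector per recording
vec = np.array([0, 2, 0, 3, 10, -1])  # shape (6,)

# The target extent (number of CQT frames) is only known
# after the features have been computed.
n_frames = 185815

# Broadcast the static vector over time without copying memory:
vec_frames = np.broadcast_to(vec, (n_frames, vec.shape[0]))
print(vec_frames.shape)  # (185815, 6)
```

`np.broadcast_to` returns a read-only view, so this stays cheap even for long recordings; copy it explicitly if you need to modify frames independently.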
Let me explain my problem more generally; maybe you will be able to help me. I want my work to be easy to benchmark, so I naively tried to fit my annotations into the JAMS standard, but I am having a hard time. The basic chord namespace did the trick in the original specification, but as I understand the changes, I cannot use it anymore, because chords now follow a precise syntax (things like A7, G9, ...). I then thought the vector namespace would work, but it looks like I indeed misunderstood what it stood for. Any suggestions? Regarding your last paragraph ("if you want..."), I don't see the difference between the broadcasting of chords and what I am asking for (apart from the chord syntax issue). Surely your ChordTransformer synchronizes with your CQT transformer, right? Is there a way I could use that one?
Ah, I see. That's an interesting setup, and not one that I've thought too carefully about, but yeah, it ought to be possible.
One option might be to model them as tags, rather than dense vector data. If you have some scalar value associated with them (e.g. the amount of vibrato, or something like that), you could pack that into the
It doesn't actually synchronize to the feature transformer. Rather, it samples the annotations at a specified rate (given clumsily in terms of sampling rate and hop length, to make it easier to parametrize in terms of audio). The reason for this decision is that the typical use case for pumpp has features going through a model, and the task outputs then being compared against the model's predictions. Models often have some change of resolution associated with them (e.g. pooling in time or downsampling), so this lets us generate output frames to match whatever the rate of the model is, rather than being tied to the rate given by the input features. As I said above, the vector transformer wasn't really designed for this kind of use case, because I hadn't considered time-varying vector data. We definitely could add a
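To make the rate-based sampling concrete, here is a rough sketch of the idea (not pumpp's actual implementation): frame times are derived from `sr` and `hop_length` alone, and each frame picks up whichever annotation interval covers its time stamp. The interval list here is hypothetical:

```python
import numpy as np

sr, hop_length = 44100, 512

# Hypothetical interval annotations: (start_time, end_time, value)
annotations = [(0.0, 10.0, 1), (10.0, 25.0, 2)]

# Frame time stamps come from sr and hop_length, not from any
# feature transformer:
n_frames = 5
times = np.arange(n_frames) * hop_length / sr

# Sample the annotation value active at each frame time
values = []
for t in times:
    hits = [v for (start, end, v) in annotations if start <= t < end]
    values.append(hits[0] if hits else None)

print(values)  # [1, 1, 1, 1, 1]
```

Changing `hop_length` (or `sr`) changes the output frame rate independently of the input features, which is what lets the task outputs match a model that pools or downsamples in time.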
I can use tag_open with a string made up of my 6 integers separated by, say, spaces. I don't even need the confidence field. Then, down the line, I can process the tensors to separate the 6 dimensions.
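The packing scheme described above amounts to a simple round-trip, sketched here in plain Python (the encoding is the poster's convention, not anything prescribed by JAMS or pumpp):

```python
# Pack six integers into a single space-separated tag string...
vec = [0, 2, 0, 3, 10, -1]
tag = ' '.join(str(v) for v in vec)
print(tag)  # 0 2 0 3 10 -1

# ...and split it back into a vector downstream.
decoded = [int(v) for v in tag.split()]
print(decoded)  # [0, 2, 0, 3, 10, -1]
```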
Oh, I was just thinking of each of your six integers getting its own tag.
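That alternative, one tag per integer, could be sketched like this, with the dimension index baked into the tag name so the six values stay distinguishable (the `dim<i>:<value>` naming is made up for illustration, not part of any JAMS namespace):

```python
# Each of the six integers gets its own tag, tied to its dimension index
vec = [0, 2, 0, 3, 10, -1]
tags = ['dim{}:{}'.format(i, v) for i, v in enumerate(vec)]
print(tags)  # ['dim0:0', 'dim1:2', 'dim2:0', 'dim3:3', 'dim4:10', 'dim5:-1']
```

The trade-off versus the packed-string approach is that the tag vocabulary must enumerate every (dimension, value) pair that can occur, but each tag then maps to exactly one output unit.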
In contemporary music, it's a bit more complex than that:
This model could be used for all the strings, but we would need other models for wind instruments, brass, and percussion.
Sorry to bother you again. You said: "Then you could use DynamicTaskTransformer directly."
Hi. More importantly: once this is done, I have over 18 hours of cello to process...
Following up on this: why use integers instead of independent tags for each of the values? |
Sorry, being originally a composer, I am a bit new to all this.
Description
Example: strange behavior when trying to use JAMS vector annotations in pumpp
Steps/Code to Reproduce
```python
from __future__ import division, print_function, unicode_literals

import numpy as np
import jams
import pumpp
from librosa import note_to_hz

audio_f = 'VSL440Rev0.aif'
jams_f = 'VSL440Rev0.jams'
sr, hop_length = 44100, 512

p_cqt = pumpp.feature.CQT(name='cqt', sr=sr, hop_length=hop_length,
                          n_octaves=8, over_sample=1,
                          fmin=note_to_hz('C2'), log=True)
p_vector = pumpp.task.VectorTransformer(name='classes', namespace='vector',
                                        dimension=6, dtype=np.int32)

pump = pumpp.Pump(p_cqt, p_vector)
data = pump(audio_f=audio_f, jam=jams_f)

print(data['cqt/mag'].shape)
print(data['cqt/phase'].shape)
print(data['classes/vector'].shape)
print(data['classes/vector'])
print(data['classes/_valid'].shape)
print(data['classes/_valid'])
```
Expected Results
My 2157-second-long audio has been annotated with a 6-dimensional vector using the JAMS format.
I ask for a 96-bin CQT and get two tensors with 185815 frames of magnitude and phase CQT; so far, so good.
But when I then look at the (vector) annotations, I have only one frame, which matches the first annotation in my file, and that's it. I was obviously expecting a tensor with a possibly smaller number of valid frames, shaped, say, 185652 x 6.
I used pumpp.task.VectorTransformer because it seemed the most obvious candidate for the job. But I saw that it inherits from a class called BaseTaskTransformer, whose __init__ forces sr and hop_length to 1.
I tried to change them with these two lines of code:

```python
p_vector.sr = sr
p_vector.hop_length = hop_length
```

but it did not change the result.
What am I missing? If this is not the right class for processing vector annotations, please tell me there is another one... ;-)
Actual Results
```
(1, 185815, 96)
(1, 185815, 96)
(1, 6)
[[ 0 2 0 3 10 -1]]
(1, 2)
[[ 0 185652]]
```
Versions
```
Darwin-16.7.0-x86_64-i386-64bit
Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
NumPy 1.13.1
SciPy 1.2.0
librosa 0.6.2
```
Here is my actual JAMS vector annotation file (I had to add the .txt extension to upload it here):
VSL440Rev0.jams.txt
NB: its syntax was checked with jams.load and it is correct.