
Cross-label distances #3

Open
PatCH0816 opened this issue Jun 5, 2022 · 3 comments

Comments

@PatCH0816

Cross-label weight function

Regarding the cross-label distances, we were able to reproduce the plots in figure 3 on page 7. A weight function is mentioned in "3.2 Fixing feature space parameters", which only selects values relatively close to zero, but no exact formula is provided. We therefore tried a linear function, a quadratic function, and an exponentially decaying function, but unfortunately we were not able to reproduce the results presented in figure 4 on page 8. Could you please give us a hint as to what kind of function you used to obtain the plots in figure 4?
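For concreteness, the three candidate weight functions we tried look roughly like this (a minimal sketch; the function names, cutoffs, and the decay rate are our own guesses, since the paper gives no formula):

```python
import numpy as np

def linear_weight(d, d_max):
    """Decays linearly from 1 at d = 0 to 0 at d = d_max."""
    return np.clip(1.0 - d / d_max, 0.0, 1.0)

def quadratic_weight(d, d_max):
    """Quadratic decay: emphasises distances close to zero more strongly."""
    return np.clip(1.0 - d / d_max, 0.0, 1.0) ** 2

def exponential_weight(d, rate=1.0):
    """Exponentially decaying weight; `rate` is a free parameter we chose."""
    return np.exp(-rate * d)

# Weighted area under a cross-label distance histogram (example distances):
distances = np.array([0.1, 0.5, 2.0, 5.5])
weighted_area = np.sum(exponential_weight(distances))
```

All three put weight near zero as the paper describes, yet none reproduced figure 4 for us, which is why we suspect a different functional form.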

X-axis of figure 3 discussion

For our most promising attempt to reproduce figure 3 on page 7, we used the exact same feature set as described in chapter "3.1 Feature space" on page 5. We would therefore have expected a histogram of distances between 0 and π · number_of_features_in_feature_set. Since we have two features in our feature set, we would have expected the x-axis to span from 0 to 2π. However, in figure 3 the range is between 0 and 6 (close to 2π, but not exact). Is it a statistical coincidence that no cross-label distance falls between 6 and 2π, or is the x-axis simply trimmed at 6.0?
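Our reading of the metric, sketched in code (an assumption on our part: each feature behaves like an angle, so the per-feature distance is bounded by π and the total by π times the number of features):

```python
import numpy as np

def angular_distance(a, b):
    """Distance between two angle-valued features, bounded by pi."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def feature_distance(x, y):
    """Sum of per-feature angular distances; maximum is pi * n_features."""
    return float(np.sum(angular_distance(np.asarray(x), np.asarray(y))))

# With 2 features the maximum possible distance is 2*pi (about 6.28),
# which would make an x-axis trimmed at 6.0 plausible.
```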

X-axis of figure 4 discussion

We were not able to reproduce figure 4 on page 8 at all. As described in chapter "3.3 Chosen market representation", 28 standard price and volume indicators have been used. We would therefore expect the x-axis to range between 0 and 3·28, or something like that. Obviously, we did not grasp the point. Could you please elaborate in detail how we can reproduce this figure? If it contains a subset of all 28 features, how should we select those "best" features?

Caption of figure 4

There is a sentence in the caption of figure 4: "Only 0.005% of cross-label-class distances is below 3 on the same dataset as Fig. 3 histogram." We understand how the percentage 0.005% is computed, but why is it important to report the fraction of the area below the very specific value 3? What information does this number provide?
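For reference, this is how we compute that percentage (a trivial sketch; the function name and threshold argument are ours):

```python
import numpy as np

def fraction_below(distances, threshold=3.0):
    """Percentage of cross-label distances strictly below the threshold."""
    d = np.asarray(distances, dtype=float)
    return 100.0 * float(np.mean(d < threshold))

# Example: half of these distances fall below 3.0.
example = fraction_below([1.0, 2.0, 4.0, 5.0], threshold=3.0)
```

So the computation itself is clear to us; our question is only about the significance of the threshold 3.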

@m1balcerak
Owner

m1balcerak commented Jun 16, 2022

Caption of figure 4: the features do separate the given labels, so it is possible to build a model that separates labels using these features.
X-axis of figure 4 discussion: the figure is zoomed in; the full range is wider. You calculate the label separation and find the right features with an optimiser, i.e. BOHB, mentioned in section 3.2 of the paper.
X-axis of figure 3 discussion: it is trimmed.
Cross-label weight function: Please post your figure - it will be easier for me to comment.

@PatCH0816
Author

Caption of figure 4: Yes, that is clear, but why is the specific value 3 on the x-axis so important? And what specifically does the 0.005% of the area indicate?

X-axis of figure 4 discussion: Thanks for the clarification :)

X-axis of figure 3 discussion: Thanks for the clarification :)

Cross-label weight function: Of course, these are our figures:
full_paper_cross_label_distances_buy_nothing
full_paper_cross_label_distances_buy_sell
full_paper_cross_label_distances_nothing_sell
Is your figure just trimmed as well?

@quant2008

I am also curious about how the "Label separation power of a feature set" is calculated. The paper says it is the "inverse of an area under a cross-label-class distances histogram weighted by a function to only select values relatively close to zero. Choice of the weighting function depends on the label and the numbers of feature in the feature space".
Could balcerak give the exact formula? Thanks very much in advance.
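My reading of that sentence as code (a sketch only; the weight function itself is exactly the unknown being asked about):

```python
import numpy as np

def separation_power(distances, weight_fn):
    """Inverse of the weighted area under the cross-label distance histogram.

    `distances` are all pairwise distances between points of different labels;
    `weight_fn` is the unspecified function that emphasises values near zero.
    """
    d = np.asarray(distances, dtype=float)
    area = float(np.sum(weight_fn(d)))
    return 1.0 / area if area > 0.0 else float("inf")
```

Under this reading, a feature set whose cross-label distances are mostly far from zero gets a small weighted area and hence a large separation power, which would explain why the optimiser maximises it.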
