How to define floating point numbers as expressions in grammar #21

Open
vonjd opened this issue May 24, 2022 · 17 comments

Comments

@vonjd

vonjd commented May 24, 2022

How do I define floating point numbers as valid expressions in the grammar? Thank you

@mytarmail
Contributor

The question is not very clear.
Can you give an example of what you have and what you want to get?

@vonjd
Author

vonjd commented May 26, 2022

I want to get formulas where numbers like 1.434539 or 0.95846 or 27.476503 or any other floating point number but also integers like 1 or 5 are possible coefficients.

Have a look at the following example:

ruleDef <- list(expr = grule(op(expr, expr), func(expr), var),
                func = grule(sin, cos, tan, log, sqrt),
                op = grule('+', '-', '*', '/', '^'),
                var = grule(distance, n),
                n = grule(1, 2, 3, 4, 5, 6, 7, 8, 9))

var is of type integer. I want var2 to be of type floating point.

Thank you

@mytarmail
Contributor

mytarmail commented May 26, 2022

This is not really a solution, more of a trick.
You cannot supply too large a vector of doubles, otherwise the grammar becomes huge.

my_var <- round(seq(-10,10,length.out = 100),2)
print(my_var)

[1] -10.00  -9.80  -9.60  -9.39  -9.19  -8.99  -8.79  -8.59  -8.38  -8.18  -7.98  -7.78  -7.58  -7.37  -7.17
 [16]  -6.97  -6.77  -6.57  -6.36  -6.16  -5.96  -5.76  -5.56  -5.35  -5.15  -4.95  -4.75  -4.55  -4.34  -4.14
 [31]  -3.94  -3.74  -3.54  -3.33  -3.13  -2.93  -2.73  -2.53  -2.32  -2.12  -1.92  -1.72  -1.52  -1.31  -1.11
...
ruleDef <- list(expr = grule(op(expr, expr), func(expr), var),
                func = grule(sin, cos, tan, log, sqrt),
                op = grule('+', '-', '*', '/', '^'),
                var = grule(distance, n),
                #n = grule(1, 2, 3, 4, 5, 6, 7, 8, 9)
                n = do.call(gsrule, as.list(my_var)) )
grammarDef <- CreateGrammar(ruleDef)
GrammarRandomExpression(grammarDef,numExpr = 10)
[[1]]
expression(tan(-0.71))

[[2]]
expression(log(-3.33 + (distance/1.52/distance + -9.19)))

[[3]]
expression(sqrt(sqrt(tan(distance)/6.57)))

[[4]]
expression(sqrt(distance))

[[5]]
expression(tan(log(distance) + log(-8.99 + distance - sin(tan(-9.8/distance)))))

[[6]]
expression((-3.74)^cos(-9.8) + 2.12)

Even in this humble form, the number of expressions is simply overwhelming.

gramEvol::summary.grammar(grammarDef)
Start Symbol:                <expr> 
Is Recursive:                TRUE 
Tree Depth:                  Limited to 5 
Maximum Rule Choices:        100 
Maximum Sequence Length:     30 
Maximum Sequence Variation:  3 5 100 5 100 5 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 5 100 2 3 2 
No. of Unique Expressions:   19382472011 
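
To make the size versus resolution trade-off of this trick concrete, here is a small base-R check (gramEvol is not needed for it). It only inspects the grid of constants; the names n_choices, step and n_fine are illustrative and not part of the package.

```r
# Same grid as in the comment above: 100 values between -10 and 10.
my_var <- round(seq(-10, 10, length.out = 100), 2)

# Every distinct grid value becomes one choice in rule `n`.
n_choices <- length(unique(my_var))

# Spacing between neighbouring constants before rounding: 20 / 99, roughly 0.2.
step <- 20 / (100 - 1)

# A grid that resolves 0.01 over the same range needs far more choices.
# Integers -1000..1000 stand in for -10.00..10.00 to avoid rounding issues.
n_fine <- length(seq(-1000, 1000))

print(c(choices = n_choices, fine_grid = n_fine))
print(step)
```

So the 100-value grid can only hit constants about 0.2 apart, and tightening that to 0.01 multiplies the number of rule choices by roughly twenty.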

@vonjd
Author

vonjd commented May 27, 2022

Interesting approach... I was hoping for a more elegant solution... won't there be performance issues this way?... anyway, thank you

@mytarmail
Contributor

I'm far from an expert, but it seems to me that there is no other way in grammatical evolution.
I would like to hear the opinion of fnoorian

@fnoorian
Owner

Thanks @mytarmail, but I'm not an expert either! Your implementation is straightforward, generalizable, and readable.

One other idea is to implement it digit by digit:

floater =  grule(digit * 100 + digit * 10 + digit + digit * 0.1 + digit * 0.01),
digit = grule(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

Search space will be more or less the same (10 ^ 5), but it might be a good idea to test this in an actual toy problem against a gvrule(seq(0, 999, by=0.01)).
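
The 10 ^ 5 estimate can be checked without gramEvol. The sketch below enumerates every five-digit combination in base R and compares it with a flat list of all two-decimal values from 0.00 to 999.99. It works in integer hundredths, which is an assumption made here purely to sidestep floating-point comparison issues.

```r
# All combinations of five digits: hundreds, tens, ones, tenths, hundredths.
digits <- expand.grid(d1 = 0:9, d2 = 0:9, d3 = 0:9, d4 = 0:9, d5 = 0:9)

# Value of each combination, expressed in hundredths (so 31456 means 314.56).
hundredths <- with(digits, d1 * 10000 + d2 * 1000 + d3 * 100 + d4 * 10 + d5)

n_digit_values <- length(unique(hundredths))  # distinct values from the digit rule
n_grid_values  <- length(0:99999)             # 0.00, 0.01, ..., 999.99 as one flat list

print(c(digit_rule = n_digit_values, flat_grid = n_grid_values))
print(range(hundredths) / 100)
```

Both encodings cover the same set of values. What differs is how a genome reaches them: five choices among 10 options each versus one choice among 100000 options, which is presumably why a toy benchmark is worth running.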

But reducing the search space will really depend on @vonjd's problem. For example, what is the search space range? Does some sort of non-linear transformation (exponential, tan, or even sin/cos) help map a limited range (e.g. 10 values between -1 and 1) to a broader range? What will be the minimum resolution? Can it be combined with some sort of deterministic optimization (Newton's method etc.) that runs as a part of GE?
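
As a rough illustration of the non-linear transformation idea, the base-R sketch below maps ten raw constants between -1 and 1 onto six orders of magnitude. The exponential form and the scaling factor 3 are arbitrary assumptions chosen for the example, not something prescribed in this thread.

```r
# Ten evenly spaced "raw" constants between -1 and 1 (what the grammar would list).
raw <- seq(-1, 1, length.out = 10)

# Hypothetical transformation: use the raw value as a scaled exponent.
# The factor 3 is an arbitrary choice that spreads the values from 1e-3 to 1e3.
scaled <- 10^(3 * raw)

print(signif(scaled, 3))
```

Note that this particular mapping only yields positive constants and gives much finer resolution near zero than near 1000, so whether it helps depends on the range the problem actually needs.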

@mytarmail
Contributor

> Thanks @mytarmail but I'm not an expert either! Your implementation is straight-forward, generalizable, and readable.
>
> One other idea is to implement it digit by digit:
>
> floater =  grule(digit * 100 + digit * 10 + digit + digit * 0.1 + digit * 0.01),
> digit = grule(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>
> Search space will be more or less the same (10 ^ 5), but it might be a good idea to test this in an actual toy problem against a gvrule(seq(0, 999, by=0.01)).

Wow, thanks, cool solution, it was really interesting to see the comparison.

One question: is it possible to have this cumbersome formula displayed as a single number?

@fnoorian
Owner

I'm not really sure, but making a function may make it look slightly nicer:

make_fixed_point <- function(a, b, c, d, e) {
    # a: hundreds, b: tens, c: ones, d: tenths, e: hundredths
    return(a * 100 + b * 10 + c + d * 0.1 + e * 0.01)
}

Then use it in grammar:

floater = grule(make_fixed_point(digit, digit, digit, digit, digit)),
digit = grule(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

This will render to something slightly more readable:
make_fixed_point(3, 1, 4, 5, 6)

I named it fixed point as this literally has a fixed decimal point rather than being a true floating point.
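
For reference, here is a self-contained base-R check of what such a rendered expression evaluates to. It also shows one possible way to get a single number into the final formula: evaluate the call and splice the value into the surrounding expression with bquote. The names rendered, value and collapsed are illustrative only.

```r
# Fixed-point helper from above, in valid R syntax.
make_fixed_point <- function(a, b, c, d, e) {
  a * 100 + b * 10 + c + d * 0.1 + e * 0.01
}

# An expression as the grammar might render it.
rendered <- quote(make_fixed_point(3, 1, 4, 5, 6))

# Evaluating the call yields the plain number (approximately 314.56).
value <- eval(rendered)
print(value)

# Splice the number into a larger expression so it displays as one constant.
collapsed <- bquote(sin(.(value)) + distance)
print(collapsed)
```

This is only a display-side step applied after the grammar has produced its expression; it does not change how the search itself works.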

@vonjd
Author

vonjd commented May 31, 2022

Perhaps it would be more efficient to define reserved placeholders for different numeric types within the package so that you only have to add int or single or double to your grammar which will then be replaced by random numbers during the evaluation of the grammar? I think other symbolic regression engines (e.g. Eureqa) do it this way.

@fnoorian
Owner

fnoorian commented Jun 1, 2022

@vonjd, I'm open to suggestions and happy to merge any pull requests.

How do you see these random numbers working? Will these be simple calls to runif, or will they be some sort of constants selected from an already populated list?

@vonjd
Author

vonjd commented Jun 1, 2022

Just one idea: How about reserving a special word code (perhaps enumerated, so that you can have more than one) that can then be defined by custom R code in the grammar? This way you would be totally flexible. Every time the grammar is evaluated, the code gets executed and the result is saved in the respective expression.

@vonjd
Author

vonjd commented Jun 1, 2022

Even better would be to have something like func() but with code() so that every word can be defined as custom R code. Would that be possible?

@vonjd
Author

vonjd commented Jun 1, 2022

Ok, so here is my final idea:

ruleDef <- list(expr = grule(op(expr, expr), func(expr), var),
                func = grule(sin, cos, tan, log, sqrt),
                op = grule('+', '-', '*', '/', '^'),
                var = grule(distance, n),
                n = grule(code(runif(1))))

code signals to the evaluation routine that the enclosed code should be executed first and its result written into the respective expression.

Does this make sense?
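
To illustrate what such a code() placeholder could mean in practice, here is a minimal base-R sketch. It is not part of gramEvol and does not touch the package internals: resolve_code is a hypothetical helper that post-processes an already generated expression, replacing each code(...) call with the value of the enclosed R code.

```r
# Hypothetical helper, NOT part of gramEvol: walk an expression and replace
# every call of the form code(<R code>) with the value of <R code>.
resolve_code <- function(e, env = parent.frame()) {
  force(env)  # fix the evaluation environment before recursing
  if (is.call(e)) {
    if (identical(e[[1]], as.name("code"))) {
      return(eval(e[[2]], env))
    }
    # Rebuild the call from its recursively resolved parts.
    return(as.call(lapply(as.list(e), resolve_code, env = env)))
  }
  e  # symbols and constants are returned unchanged
}

set.seed(42)  # only to make the illustration repeatable
template <- quote(sin(distance * code(runif(1))) + code(runif(1)))
resolved <- resolve_code(template)
print(resolved)
```

Each placeholder is drawn independently here, so two code(runif(1)) calls give two different constants. Whether the drawn values should then be stored with the genome so that they survive across generations is a design question this sketch leaves open.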

@vonjd
Author

vonjd commented Jun 1, 2022

Or exec instead of code as the reserved word.

@vonjd
Author

vonjd commented Jun 1, 2022

Another thing I tried is to define floating-point numbers from the ground up (like other BNF grammars do it, see e.g. here: https://www.gentee.com/doc/syntax/bnf.htm)

ruleDef <- list(expr = grule(op(expr, expr), func(expr), var),
                func = grule(sin, cos, tan, log, sqrt),
                op = grule('+', '-', '*', '/', '^'),
                var = grule(distance, n, 'n.n'),
                n = grule(1, 2, 3, 4, 5, 6, 7, 8, 9))

But for some reason the 'n.n' doesn't work. Is there any way to make it work?

@vonjd
Author

vonjd commented Jun 1, 2022

Finally one promising workaround could be the following: frac = grule('/'(n, n))

@vonjd
Author

vonjd commented Jun 1, 2022

period <- pi

ruleDef <- list(expr = grule(op(expr, expr), func(expr), var),
                func = grule(sin, cos, tan, log, sqrt),
                op = grule('+', '-', '*', '/', '^'),
                var = grule(n, frac),
                n = grule(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25),
                frac = grule('/'(n, n)))

grammarDef <- CreateGrammar(ruleDef)
grammarDef

SymRegFitFunc <- function(expr) {
  result <- eval(expr)
  if (any(is.nan(result)))
    return(Inf)
  return (mean(log(1 + abs(period - result))))
}

This way you can find all kinds of approximations for pi, like the well known 22/7, sqrt(10), log(23), and more complicated ones.
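
The fitness function can be sanity-checked on the approximations mentioned above without running the evolution at all. The sketch below uses only base R; the candidates list and the extra sqrt(-1) entry (added to exercise the NaN branch) are illustrative choices.

```r
period <- pi

# Same cost as above: log-scaled absolute error, Inf for invalid results.
SymRegFitFunc <- function(expr) {
  result <- eval(expr)
  if (any(is.nan(result)))
    return(Inf)
  mean(log(1 + abs(period - result)))
}

# Hand-picked candidates; sqrt(-1) yields NaN and should be rejected.
candidates <- expression(22 / 7, sqrt(10), log(23), sqrt(-1))
costs <- suppressWarnings(
  vapply(as.list(candidates), SymRegFitFunc, numeric(1))
)
names(costs) <- vapply(as.list(candidates), deparse, character(1))

# Lower cost means a better approximation of pi.
print(sort(costs))
```

Among these candidates 22/7 gets the lowest cost, followed by log(23) and then sqrt(10), while the invalid expression is pushed to the end with a cost of Inf.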
