Special Escaping Sequences

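PCRE escape sequences such as \d, \s, and \w need special care: inside string literals, the backslash character \ is stripped by both the Lua language parser and the Nginx config file parser before the pattern reaches the regex engine. As a result, the following snippet does not work as expected: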

 # nginx.conf
 ? location /test {
 ?     content_by_lua '
 ?         local regex = "\d+"  -- THIS IS WRONG!!
 ?         local m = ngx.re.match("hello, 1234", regex)
 ?         if m then ngx.say(m[0]) else ngx.say("not matched!") end
 ?     ';
 ? }
 # evaluates to "not matched!"

To avoid this, escape the backslash twice:

 # nginx.conf
 location /test {
     content_by_lua '
         local regex = "\\\\d+"
         local m = ngx.re.match("hello, 1234", regex)
         if m then ngx.say(m[0]) else ngx.say("not matched!") end
     ';
 }
 # evaluates to "1234"

Here, \\\\d+ is first reduced to \\d+ by the Nginx config file parser, then reduced to \d+ by the Lua language parser, and only then executed.
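
The two passes can be traced in plain Lua, outside Nginx. This is only a sketch under stated assumptions: `nginx_strip` is a hypothetical helper that models a single rule of the Nginx config file parser (a pair of backslashes inside a quoted string collapses to one), and `load_chunk` stands in for the Lua language parser.

```lua
-- Hypothetical, simplified model of the Nginx config file parser:
-- it only collapses each pair of backslashes into one (assumption).
local function nginx_strip(src)
    return (src:gsub("\\\\", "\\"))
end

-- loadstring exists in Lua 5.1/LuaJIT; load accepts strings in Lua 5.2+.
local load_chunk = loadstring or load

-- The Lua source exactly as typed inside content_by_lua '...'.
local in_conf = [[return "\\\\d+"]]

-- Pass 1: the text that Nginx hands over to the Lua parser.
local lua_src = nginx_strip(in_conf)

-- Pass 2: the Lua parser resolves the remaining escape sequence.
local regex = load_chunk(lua_src)()

print(lua_src)  -- return "\\d+"
print(regex)    -- \d+
```

The final value is three characters long (a backslash, `d`, and `+`), which is what the regex engine needs to see.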

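Alternatively, the regex pattern can be written as a Lua "long bracket" string literal of the form [[...]]. In that case the backslash needs to be escaped only once, for the Nginx config file parser.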

 # nginx.conf
 location /test {
     content_by_lua '
         local regex = [[\\d+]]
         local m = ngx.re.match("hello, 1234", regex)
         if m then ngx.say(m[0]) else ngx.say("not matched!") end
     ';
 }
 # evaluates to "1234"

Here, [[\\d+]] is reduced to [[\d+]] by the Nginx config file parser, which is the intended pattern.
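
A plain-Lua trace of the long-bracket case, again as a sketch only: `nginx_strip` is a hypothetical stand-in for the Nginx config file parser that models just the collapsing of backslash pairs. The point it illustrates is that Lua performs no escape processing inside long brackets, so one Nginx-level escape is enough.

```lua
-- Hypothetical, simplified model of the Nginx config file parser (assumption).
local function nginx_strip(src)
    return (src:gsub("\\\\", "\\"))
end

-- loadstring exists in Lua 5.1/LuaJIT; load accepts strings in Lua 5.2+.
local load_chunk = loadstring or load

-- As typed inside content_by_lua '...'. Level-1 brackets wrap the sample
-- only so that the sample itself can contain [[ and ]].
local in_conf = [=[return [[\\d+]]]=]

-- Nginx pass, then Lua pass; Lua leaves the long-bracket body untouched.
local regex = load_chunk(nginx_strip(in_conf))()

print(regex)  -- \d+
```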

Note that when the regex pattern contains [...] sequences, the longer form of the long bracket, [=[...]=], may be required. The [=[...]=] form can be used as the default if desired.

 # nginx.conf
 location /test {
     content_by_lua '
         local regex = [=[[0-9]+]=]
         local m = ngx.re.match("hello, 1234", regex)
         if m then ngx.say(m[0]) else ngx.say("not matched!") end
     ';
 }
 # evaluates to "1234"
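
The conflict shows up whenever the pattern text itself contains `]]`, for instance when a character class sits at the very end of the pattern. The following plain-Lua sketch uses the illustrative pattern `[0-9]` (not taken from the example above) and compiles the level-0 form from a string so that the failure can be observed instead of aborting the script.

```lua
-- loadstring exists in Lua 5.1/LuaJIT; load accepts strings in Lua 5.2+.
local load_chunk = loadstring or load

-- Level-0 brackets: the first "]]" ends the string after "[0-9",
-- leaving a stray "]" behind, so the chunk fails to compile.
local broken = load_chunk("return [[[0-9]]]")

-- Level-1 brackets: only "]=]" closes the string.
local regex = [=[[0-9]]=]

print(broken)  -- nil
print(regex)   -- [0-9]
```

Using [=[...]=] everywhere avoids having to check each pattern for a trailing `]`.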

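Another way to handle PCRE escape sequences is to put the Lua code in an external script file and run it through one of the *_by_lua_file directives. With this approach the backslashes are processed only by the Lua language parser, so each needs to be escaped just once.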

 -- test.lua
 local regex = "\\d+"
 local m = ngx.re.match("hello, 1234", regex)
 if m then ngx.say(m[0]) else ngx.say("not matched!") end
 -- evaluates to "1234"

In external script files, PCRE sequences written as long-bracketed Lua string literals need no modification.

 -- test.lua
 local regex = [[\d+]]
 local m = ngx.re.match("hello, 1234", regex)
 if m then ngx.say(m[0]) else ngx.say("not matched!") end
 -- evaluates to "1234"
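
A short plain-Lua check of the two external-file forms, plus one pitfall worth keeping in mind when moving code out of nginx.conf. The variable names are illustrative only; the sketch needs no Nginx APIs.

```lua
-- In an external file only the Lua parser touches the backslashes.
local quoted  = "\\d+"     -- escape resolved by Lua: \d+
local bracket = [[\d+]]    -- taken literally:        \d+
local copied  = [[\\d+]]   -- nginx.conf form reused: \\d+

print(quoted == bracket)   -- true
print(#bracket, #copied)   -- 3 and 4, tab-separated
```

The `copied` string keeps both backslashes, so as a PCRE pattern it would describe a literal backslash followed by one or more `d` characters, not a run of digits.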

English source:

PCRE sequences such as \d, \s, or \w require special attention because, in string literals, the backslash character \ is stripped out by both the Lua language parser and the Nginx config file parser before processing. So the following snippet will not work as expected:

 # nginx.conf
 ? location /test {
 ?     content_by_lua '
 ?         local regex = "\d+"  -- THIS IS WRONG!!
 ?         local m = ngx.re.match("hello, 1234", regex)
 ?         if m then ngx.say(m[0]) else ngx.say("not matched!") end
 ?     ';
 ? }
 # evaluates to "not matched!"

To avoid this, double escape the backslash:

 # nginx.conf
 location /test {
     content_by_lua '
         local regex = "\\\\d+"
         local m = ngx.re.match("hello, 1234", regex)
         if m then ngx.say(m[0]) else ngx.say("not matched!") end
     ';
 }
 # evaluates to "1234"

Here, \\\\d+ is stripped down to \\d+ by the Nginx config file parser and this is further stripped down to \d+ by the Lua language parser before running.

Alternatively, the regex pattern can be presented as a long-bracketed Lua string literal by encasing it in "long brackets", [[...]], in which case backslashes need to be escaped only once, for the Nginx config file parser.

 # nginx.conf
 location /test {
     content_by_lua '
         local regex = [[\\d+]]
         local m = ngx.re.match("hello, 1234", regex)
         if m then ngx.say(m[0]) else ngx.say("not matched!") end
     ';
 }
 # evaluates to "1234"

Here, [[\\d+]] is stripped down to [[\d+]] by the Nginx config file parser and this is processed correctly.

Note that a longer form of the long bracket, [=[...]=], may be required if the regex pattern contains [...] sequences. The [=[...]=] form may be used as the default form if desired.

 # nginx.conf
 location /test {
     content_by_lua '
         local regex = [=[[0-9]+]=]
         local m = ngx.re.match("hello, 1234", regex)
         if m then ngx.say(m[0]) else ngx.say("not matched!") end
     ';
 }
 # evaluates to "1234"

An alternative approach to escaping PCRE sequences is to ensure that Lua code is placed in external script files and executed using the various *_by_lua_file directives. With this approach, the backslashes are only stripped by the Lua language parser and therefore only need to be escaped once each.

 -- test.lua
 local regex = "\\d+"
 local m = ngx.re.match("hello, 1234", regex)
 if m then ngx.say(m[0]) else ngx.say("not matched!") end
 -- evaluates to "1234"

Within external script files, PCRE sequences presented as long-bracketed Lua string literals do not require modification.

 -- test.lua
 local regex = [[\d+]]
 local m = ngx.re.match("hello, 1234", regex)
 if m then ngx.say(m[0]) else ngx.say("not matched!") end
 -- evaluates to "1234"
