Previous in the series: Predefined Character Classes

Next in the series: Capturing Groups

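Quantifiers

Quantifiers let you specify the number of occurrences to match against. For convenience, the three sections of the Pattern API specification that describe greedy, reluctant, and possessive quantifiers are presented below.

At first glance it may appear that the quantifiers X?, X?? and X?+ do exactly the same thing, since they all promise to match "X, once or not at all". There are subtle implementation differences, which are explained near the end of this section.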

Greedy	Reluctant	Possessive	Meaning
`X?`	`X??`	`X?+`	`X`, once or not at all
`X*`	`X*?`	`X*+`	`X`, zero or more times
`X+`	`X+?`	`X++`	`X`, one or more times
`X{n}`	`X{n}?`	`X{n}+`	`X`, exactly `n` times
`X{n,}`	`X{n,}?`	`X{n,}+`	`X`, at least `n` times
`X{n,m}`	`X{n,m}?`	`X{n,m}+`	`X`, at least `n` but not more than `m` times

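Let's start by creating three different regular expressions, each made of the letter "a" followed by ?, *, or +. Let's see what happens when these expressions are tested against an empty input string "".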

Enter your regex: a?
Enter input string to search: 
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: a*
Enter input string to search: 
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: a+
Enter input string to search: 
No match found.

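Zero-Length Matches

In the example above, the match succeeds in the first two cases because the expressions a? and a* both allow for zero occurrences of the letter a. You will also notice that the start and end indices are both zero, which is unlike any of the examples we have seen so far. The empty input string "" has no length, so the test simply matches nothing at index 0. Matches of this sort are known as zero-length matches.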
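The harness output above can be reproduced with a few lines of Java. The following is a minimal sketch, not the tutorial's own test harness: the class name `EmptyInputDemo` and the helper `findAll` are hypothetical names chosen for illustration, and the `text@start-end` format is only a compact stand-in for the harness messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyInputDemo {
    // Collects every match as "text@start-end", looping until find() fails,
    // which mirrors how the test harness keeps searching.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        // a? and a* accept zero occurrences, so both match "nothing" at index 0;
        // a+ needs at least one "a" and therefore finds no match.
        for (String regex : new String[] {"a?", "a*", "a+"}) {
            System.out.println(regex + " on empty input: " + findAll(regex, ""));
        }
    }
}
```

An empty list for a+ corresponds to the harness message "No match found."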

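Zero-length matches can occur in several cases:

in an empty input string,
at the beginning of an input string,
after the last character of an input string, or
in between any two characters of an input string.

Zero-length matches are easy to identify because they always start and end at the same index position.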
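Because a zero-length match starts and ends at the same index, it can be detected by comparing `Matcher.start()` with `Matcher.end()`. The sketch below assumes a hypothetical helper `zeroLengthIndexes`, and it uses the pattern x* on the input "ab" (neither appears in this tutorial) purely so that every position yields a zero-length match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthPositions {
    // Returns the index of every match whose start and end coincide.
    static List<Integer> zeroLengthIndexes(String regex, String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            if (matcher.start() == matcher.end()) {
                indexes.add(matcher.start());
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        // "ab" contains no "x", so x* matches nothing at the beginning (0),
        // between the two characters (1), and after the last character (2).
        System.out.println(zeroLengthIndexes("x*", "ab"));
        // An empty input string has a single position, index 0.
        System.out.println(zeroLengthIndexes("x*", ""));
    }
}
```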

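Let's explore zero-length matches with a few more examples. Change the input string to a single letter "a" and you will notice something interesting.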

Enter your regex: a?
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.

Enter your regex: a*
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.

Enter your regex: a+
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

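All three quantifiers found the letter "a", but the first two also found a zero-length match at index 1; that is, after the last character of the input string. Remember, the matcher sees the character "a" as sitting in the cell between index 0 and index 1, and our test harness loops until it can no longer find a match. Depending on the quantifier used, the presence of "nothing" at the index after the last character may or may not trigger a match.

Now change the input string to the letter "a" five times in a row and you will get the following.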

Enter your regex: a?
Enter input string to search: aaaaa
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 1 and ending at index 2.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "a" starting at index 3 and ending at index 4.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "" starting at index 5 and ending at index 5.

Enter your regex: a*
Enter input string to search: aaaaa
I found the text "aaaaa" starting at index 0 and ending at index 5.
I found the text "" starting at index 5 and ending at index 5.

Enter your regex: a+
Enter input string to search: aaaaa
I found the text "aaaaa" starting at index 0 and ending at index 5.

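The expression a? finds an individual match for each character, since it matches when "a" appears zero or one times, and then a zero-length match after the last character at index 5. The expression a* finds two separate matches: all of the letter a's in the first match, then the zero-length match after the last character at index 5. Finally, a+ matches all occurrences of the letter "a" in a single match, ignoring the presence of "nothing" at the last index.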
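To make the difference concrete, the number of matches each quantifier produces on "aaaaa" can be counted directly. This is a small sketch under the assumption that a plain `find()` loop is an adequate stand-in for the test harness; `RepeatedInputDemo` and `countMatches` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedInputDemo {
    // Counts how many times find() succeeds, zero-length matches included.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "aaaaa";
        System.out.println("a? -> " + countMatches("a?", input)); // five single a's plus one empty match
        System.out.println("a* -> " + countMatches("a*", input)); // one run of a's plus one empty match
        System.out.println("a+ -> " + countMatches("a+", input)); // one run of a's only
    }
}
```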

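At this point, you might be wondering what the results would be if the first two quantifiers encounter a letter other than "a". For example, what happens if they encounter the letter "b", as in "ababaaaab"?

Let's find out.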

Enter your regex: a?
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "a" starting at index 5 and ending at index 6.
I found the text "a" starting at index 6 and ending at index 7.
I found the text "a" starting at index 7 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex: a*
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex: a+
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.

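Even though the letter "b" appears in cells 1, 3, and 8, the output reports a zero-length match at those locations. The regular expression a? is not specifically looking for the letter "b"; it is merely looking for the presence (or lack thereof) of the letter "a". If the quantifier allows "a" to occur zero times, anything in the input string that is not an "a" shows up as a zero-length match. The remaining a's are matched according to the rules discussed in the previous examples.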
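One way to see that the zero-length matches sit exactly where a non-"a" character (or the end of the input) is found is to print the character located at each zero-length match. The helper `zeroLengthSpots` and the `index:character` format below are assumptions made for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonMatchingCharsDemo {
    // For each zero-length match, records "index:char", where char is the
    // character sitting at that index, or "end" past the last character.
    static List<String> zeroLengthSpots(String regex, String input) {
        List<String> spots = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            if (matcher.start() == matcher.end()) {
                int index = matcher.start();
                String next = index < input.length()
                        ? String.valueOf(input.charAt(index))
                        : "end";
                spots.add(index + ":" + next);
            }
        }
        return spots;
    }

    public static void main(String[] args) {
        System.out.println(zeroLengthSpots("a?", "ababaaaab"));
        System.out.println(zeroLengthSpots("a*", "ababaaaab"));
        System.out.println(zeroLengthSpots("a+", "ababaaaab")); // a+ never matches "nothing"
    }
}
```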

To match a pattern exactly n times, simply specify the number inside a set of braces.

Enter your regex: a{3}
Enter input string to search: aa
No match found.

Enter your regex: a{3}
Enter input string to search: aaa
I found the text "aaa" starting at index 0 and ending at index 3.

Enter your regex: a{3}
Enter input string to search: aaaa
I found the text "aaa" starting at index 0 and ending at index 3.

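Here, the regular expression a{3} searches for three occurrences of the letter "a" in a row. The first test fails because the input string does not have enough a's to match against. The second test contains exactly 3 a's in the input string, which triggers a match. The third test also triggers a match because there are exactly 3 a's at the beginning of the input string. Anything following that is irrelevant to the first match. If the pattern appears again after that point, it triggers subsequent matches.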

Enter your regex: a{3}
Enter input string to search: aaaaaaaaa
I found the text "aaa" starting at index 0 and ending at index 3.
I found the text "aaa" starting at index 3 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.
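The four a{3} runs above can be checked in one go. The following sketch assumes a hypothetical helper `countMatches` built on a `find()` loop; it is meant only to confirm the match counts shown in the transcripts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactCountDemo {
    // Counts successive, non-overlapping matches of the regex in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Each match consumes exactly three a's; leftovers shorter than
        // three characters cannot start another match.
        for (String input : new String[] {"aa", "aaa", "aaaa", "aaaaaaaaa"}) {
            System.out.println(input + " -> " + countMatches("a{3}", input));
        }
    }
}
```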

To require a pattern to appear at least n times, add a comma after the number.

Enter your regex: a{3,}
Enter input string to search: aaaaaaaaa
I found the text "aaaaaaaaa" starting at index 0 and ending at index 9.

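With the same input string, this test finds only one match, because the 9 a's in a row satisfy the need for "at least" 3 a's.

Finally, to specify an upper limit on the number of occurrences, add a second number inside the braces.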

Enter your regex: a{3,6} // find at least 3 (but no more than 6) a's in a row
Enter input string to search: aaaaaaaaa
I found the text "aaaaaa" starting at index 0 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.

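Here the first match is forced to stop at the upper limit of 6 characters. The second match includes whatever is left over, which happens to be three a's: the minimum number of characters allowed for this match. If the input string were one character shorter, there would be no second match, since only two a's would remain.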
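The effect of the lower and upper bounds, including the "one character shorter" case, can be sketched as follows. `BoundedCountDemo` and `findAll` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedCountDemo {
    // Returns the text of every match, in the order find() reports them.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String nine = "aaaaaaaaa";
        String eight = "aaaaaaaa";
        System.out.println(findAll("a{3,}", nine));   // no upper bound: one long match
        System.out.println(findAll("a{3,6}", nine));  // capped at 6, then the remaining 3
        System.out.println(findAll("a{3,6}", eight)); // only 2 a's remain after the first match
    }
}
```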

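Capturing Groups and Character Classes with Quantifiers

Until now, we have only tested quantifiers attached to a single character. In fact, a quantifier attaches to only one character at a time, so the regular expression abc+ means "a, followed by b, followed by c one or more times". It does not mean "abc" one or more times. However, quantifiers can also attach to character classes and capturing groups, such as [abc]+ (a or b or c, one or more times) or (abc)+ (the group "abc", one or more times).

Let's illustrate by specifying the group (dog) three times in a row.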

Enter your regex: (dog){3}
Enter input string to search: dogdogdogdogdogdog
I found the text "dogdogdog" starting at index 0 and ending at index 9.
I found the text "dogdogdog" starting at index 9 and ending at index 18.

Enter your regex: dog{3}
Enter input string to search: dogdogdogdogdogdog
No match found.

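Here the first example finds two matches, each made up of three consecutive occurrences of "dog", since the quantifier applies to the entire capturing group. Remove the parentheses, however, and the match fails because the quantifier {3} now applies only to the letter "g".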
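A short sketch can confirm what each pattern is really asking for. The input "doggg" is not part of this tutorial; it is an assumption added here to show the kind of text that dog{3} does match. `GroupQuantifierDemo` and `countMatches` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupQuantifierDemo {
    // Counts successive matches of the regex in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String dogs = "dogdogdogdogdogdog";
        // {3} applies to the whole group: "dog" three times per match.
        System.out.println(countMatches("(dog){3}", dogs));
        // {3} applies only to "g": the pattern means "do" followed by "ggg".
        System.out.println(countMatches("dog{3}", dogs));
        System.out.println(countMatches("dog{3}", "doggg"));
    }
}
```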

Similarly, we can apply a quantifier to an entire character class.

Enter your regex: [abc]{3}
Enter input string to search: abccabaaaccbbbc
I found the text "abc" starting at index 0 and ending at index 3.
I found the text "cab" starting at index 3 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.
I found the text "ccb" starting at index 9 and ending at index 12.
I found the text "bbc" starting at index 12 and ending at index 15.

Enter your regex: abc{3}
Enter input string to search: abccabaaaccbbbc
No match found.

Here the quantifier {3} applies to the entire character class in the first example, but only to the letter "c" in the second.
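The same contrast holds for character classes. In the sketch below, the input "abccc" is an assumption introduced only to show a string that abc{3} accepts; `ClassQuantifierDemo` and `findAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassQuantifierDemo {
    // Returns the text of every match, in the order find() reports them.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "abccabaaaccbbbc";
        // Any three characters drawn from a, b, or c.
        System.out.println(findAll("[abc]{3}", input));
        // Literally "ab" followed by exactly three c's.
        System.out.println(findAll("abc{3}", input));
        System.out.println(findAll("abc{3}", "abccc"));
    }
}
```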

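Differences Among Greedy, Reluctant, and Possessive Quantifiers

There are subtle differences among greedy, reluctant, and possessive quantifiers.

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string before attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.

The reluctant quantifiers, however, take the opposite approach: they start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.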
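The "never back off" rule is easiest to see on a tiny input. The patterns a*ab, a*?ab, a*+ab, and a*+b and the input "aaab" are assumptions chosen for this sketch and do not come from the tutorial; `Pattern.matches` requires the whole input to match.

```java
import java.util.regex.Pattern;

class BackoffDemo {
    // True only when the regex matches the entire input string.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String input = "aaab";
        // Greedy: a* eats "aaa", then gives one "a" back so that "ab" can match.
        System.out.println(matchesWhole("a*ab", input));
        // Reluctant: a*? starts with nothing and grows until "ab" can match.
        System.out.println(matchesWhole("a*?ab", input));
        // Possessive: a*+ eats "aaa" and refuses to give any back, so "ab" fails.
        System.out.println(matchesWhole("a*+ab", input));
        // Possessive still succeeds when no backing off is needed.
        System.out.println(matchesWhole("a*+b", input));
    }
}
```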

To illustrate, consider the input string xfooxxxxxxfoo.

Enter your regex: .*foo  // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo  // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

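The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f", "o", "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters ("f", "o", "o") have already been consumed. So the matcher slowly backs off one letter at a time until the rightmost occurrence of "foo" has been regurgitated, at which point the match succeeds and the search ends.

The second example, however, is reluctant, so it starts by first consuming "nothing". Because "foo" does not appear at the beginning of the string, it is forced to swallow the first letter (an "x"), which triggers the first match at 0 and 4. Our test harness continues the process until the input string is exhausted. It finds another match at 4 and 13.

The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+, leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found.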
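The three runs on xfooxxxxxxfoo can be reproduced side by side. This is a minimal sketch rather than the tutorial's test harness; `QuantifierKindsDemo` and `findAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierKindsDemo {
    // Returns the text of every match, in the order find() reports them.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println("greedy     " + findAll(".*foo", input));  // backs off to the last "foo"
        System.out.println("reluctant  " + findAll(".*?foo", input)); // stops at each "foo" in turn
        System.out.println("possessive " + findAll(".*+foo", input)); // nothing left for "foo"
    }
}
```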

Last update: January 10, 2022