Regular expression

Regular expressions are typically used to search for and replace text that matches a certain pattern (rule).

Assertions

Definition: an assertion indicates that a match can only occur when certain conditions are met. Assertions include lookahead assertions, lookbehind assertions and conditional expressions.

Boundaries are one kind of assertion: they mark where lines and words begin or end.

Boundary-type assertions

  • ^ : Matches the beginning of input. If the multiline (m) flag is set, ^ also matches immediately after a line-break character. For example, /^A/ does not match the "A" in "an A", but does match the first "A" in "An A".
  • $ : Matches the end of input. If the multiline (m) flag is set, $ also matches immediately before a line-break character. For example, /t$/ does not match the "t" in "eater", but does match it in "eat".
  • \b : Matches a word boundary: a position where a word character is next to a non-word character or to the start or end of the input, for example between a letter and a space. A matched word boundary is not included in the match; in other words, it has zero length. Examples: (1) /\bm/ matches the "m" in "moon"; (2) /oo\b/ does not match the "oo" in "moon", because "oo" is followed by "n", which is a word character; (3) /oon\b/ matches the "oon" in "moon", because "oon" ends the string, so no word character follows it; (4) /\w\b\w/ never matches anything, because a word character can never be followed by both a non-word character and a word character.
  • \B : Matches a non-word boundary: a position where the previous and next characters are of the same type, either both word characters or both non-word characters, for example between two letters or between two spaces. The beginning and end of the input are treated as non-word characters. Like a matched word boundary, a matched non-word boundary is not included in the match. For example, /\Bon/ matches "on" in "at noon", and /ye\B/ matches "ye" in "possibly yesterday".
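
The boundary examples described above can be checked with RegExp.prototype.test and exec. This is a minimal sketch assuming any modern JavaScript runtime; the object and key names, and the "A\nB" input used for the multiline case, are illustrative only.

```javascript
// Evaluate the boundary-assertion examples described above.
// Each value is the result of testing one pattern against one input.
const boundaryChecks = {
  caretNoMatch: /^A/.test("an A"),          // false: "A" is not at the start
  caretMatch: /^A/.test("An A"),            // true: the first "A" is at the start
  caretSingleLine: /^B/.test("A\nB"),       // false: without m, ^ only matches at index 0
  caretMultiline: /^B/m.test("A\nB"),       // true: with m, ^ also matches after "\n"
  dollarNoMatch: /t$/.test("eater"),        // false: "t" is not at the end
  dollarMatch: /t$/.test("eat"),            // true
  wordStart: /\bm/.test("moon"),            // true
  ooBoundary: /oo\b/.test("moon"),          // false: "oo" is followed by "n"
  oonBoundary: /oon\b/.test("moon"),        // true: "oon" ends the string
  neverMatches: /\w\b\w/.test("moon moon"), // false: \w\b\w can never match
  nonBoundary: /\Bon/.exec("at noon")[0],   // "on", taken from inside "noon"
};

console.log(boundaryChecks);
```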

Other assertions

  • x(?=y) : Lookahead assertion. Matches "x" only if "x" is followed by "y". For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat"; "Sprat" itself is not part of the match.
  • x(?!y) : Negative lookahead assertion. Matches "x" only if "x" is not followed by "y". For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point, so /\d+(?!\.)/.exec("3.141") matches "141" rather than "3".
  • (?<=y)x : Lookbehind assertion. Matches "x" only if "x" is preceded by "y". For example, /(?<=Jack)Sprat/ matches "Sprat" only if it is immediately preceded by "Jack"; "Jack" itself is not part of the match.
  • (?<!y)x : Negative lookbehind assertion. Matches "x" only if "x" is not preceded by "y". For example, /(?<!-)\d+/ matches a number only if it is not immediately preceded by a minus sign. /(?<!-)\d+/.exec("3") matches "3", while /(?<!-)\d+/.exec("-3") finds no match because the number is preceded by a minus sign.
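
A minimal sketch of the four lookaround forms, reusing the examples described above. Lookbehind ((?<=y) and (?<!y)) was added in ES2018, so this assumes a runtime that supports it, such as a current Node.js or browser. The inputs "JackFrost" and "TomSprat" are made-up strings chosen only to show a failing case.

```javascript
const lookaroundResults = {
  // x(?=y): "Jack" only when followed by "Sprat"; "Sprat" is not consumed.
  lookahead: /Jack(?=Sprat)/.exec("JackSprat")[0],    // "Jack"
  lookaheadFails: /Jack(?=Sprat)/.test("JackFrost"),  // false
  // x(?!y): digits not followed by a literal dot.
  negativeLookahead: /\d+(?!\.)/.exec("3.141")[0],    // "141"
  // (?<=y)x: "Sprat" only when preceded by "Jack".
  lookbehind: /(?<=Jack)Sprat/.exec("JackSprat")[0],  // "Sprat"
  lookbehindFails: /(?<=Jack)Sprat/.test("TomSprat"), // false
  // (?<!y)x: digits not preceded by a minus sign.
  negativeLookbehind: /(?<!-)\d+/.exec("3")[0],       // "3"
  negativeLookbehindFails: /(?<!-)\d+/.exec("-3"),    // null
};

console.log(lookaroundResults);
```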

Example

```javascript
// Use regular-expression boundaries to fix a buggy string
let buggyMultiline = `tey, ihe light-greon apple
tangs on ihe greon traa`;

// 1) Use ^ to fix matches at the start of the string and right after a line break.
buggyMultiline = buggyMultiline.replace(/^t/gim, "h");
console.log(1, buggyMultiline); // fixes 'tey' => 'hey' (start of string) and 'tangs' => 'hangs' (after the line break)

// 2) Use $ to fix the match at the end of the text.
buggyMultiline = buggyMultiline.replace(/aa$/gim, "ee.");
console.log(2, buggyMultiline); // fixes 'traa' => 'tree.'

// 3) Use \b to match characters right on the boundary between a word and a space.
buggyMultiline = buggyMultiline.replace(/\bi/gim, "t");
console.log(3, buggyMultiline); // fixes 'ihe' => 'the' without touching 'light'

// 4) Use \B to match characters inside the boundaries of a word.
const fixedMultiline = buggyMultiline.replace(/\Bo/gim, "e");
console.log(4, fixedMultiline); // fixes 'greon' => 'green' without touching 'on'
```

Character classes

Character classes distinguish kinds of characters, for example letters and digits.

  • . (dot) : Has one of the following meanings: (1) Matches any single character except line terminators: \n, \r, \u2028 or \u2029. For example, /.y/ matches "my" and "ay", but not "yes", in "yes make my day". (2) Inside a character set, the dot loses its special meaning and matches a literal dot. Note that the m (multiline) flag does not change the behavior of the dot. To match a pattern across multiple lines, you can use the character set [^], which matches any character, including line terminators. ES2018 added the s ("dotAll") flag, which lets the dot also match line terminators.
  • \d : Matches any digit (Arabic numeral). Equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches the "2" in "B2 is the suite number".
  • \D : Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ matches the "B" in "B2 is the suite number".
  • \w : Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ matches the "a" in "apple", the "5" in "$5.28", the "3" in "3D" and the "m" in "Émanuel".
  • \W : Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_]. For example, /\W/ or /[^A-Za-z0-9_]/ matches the "%" in "50%" and the "É" in "Émanuel".
  • \s : Matches a single whitespace character, including space, tab, form feed, line feed and other Unicode spaces. Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\s\w*/ matches " bar" (including the leading space) in "foo bar".
  • \S : Matches a single character other than whitespace. Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\S\w*/ matches "foo" in "foo bar".
  • \t : Matches a horizontal tab.
  • \r : Matches a carriage return.
  • \n : Matches a line feed (newline).
  • \v : Matches a vertical tab.
  • \f : Matches a form feed.
  • [\b] : Matches a backspace character (U+0008).

Example

```javascript
var randomData = "015 354 8787 687351 3512 8735";
var regexpFourDigits = /\b\d{4}\b/g;
// \b indicates a boundary (i.e. do not start matching in the middle of a word)
// \d{4} indicates a digit, four times
// \b indicates another boundary (i.e. do not end matching in the middle of a word)

console.table(randomData.match(regexpFourDigits));
// ['8787', '3512', '8735']
```
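
The rules described above for the dot, \w and \s can be checked with a short sketch. It assumes a runtime with ES2018 support for the s flag; the sample strings and variable names are illustrative.

```javascript
const twoLines = "line one\nline two";

// "." does not match the line feed, and the m flag does not change that.
const dotDefault = /one.line/.test(twoLines);    // false
const dotMultiline = /one.line/m.test(twoLines); // false
// [^] (an empty negated set) matches any character, including "\n".
const anyChar = /one[^]line/.test(twoLines);     // true
// The s ("dotAll") flag lets "." match line terminators.
const dotAll = /one.line/s.test(twoLines);       // true

// \w only covers [A-Za-z0-9_], so "É" is not a word character.
const wordRun = "Émanuel".match(/\w+/)[0];       // "manuel"
// \s\w* includes the leading space in its match.
const spaceWord = /\s\w*/.exec("foo bar")[0];    // " bar"

console.log({ dotDefault, dotMultiline, anyChar, dotAll, wordRun, spaceWord });
```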

Groups and ranges

Groups and ranges let you group parts of an expression and match characters from a set or range.

  • x|y : Matches either "x" or "y". For example, /green|red/ matches "green" in "green apple" and "red" in "red apple".
  • [xyz] or [a-c] : Character set. Matches any one of the enclosed characters. You can specify a range of characters with a hyphen, but if the hyphen appears as the first or last character inside the square brackets, it is treated as a literal hyphen and included in the set as a normal character. A character set can also include character classes. For example, [abcd] is the same as [a-d]. They match the "b" in "brisket" and the "c" in "chop".
  • [^xyz] or [^a-c] : Negated (complemented) character set. Matches any character that is not enclosed in the square brackets. You can specify a range of characters with a hyphen, but if the hyphen appears as the first or last character inside the square brackets, it is treated as a normal character included in the set. For example, [^abc] is the same as [^a-c]. They initially match the "o" in "bacon" and the "h" in "chop".
  • (x) : Capturing group. Matches "x" and remembers the match. For example, /(foo)/ matches and remembers "foo" in "foo bar". A regular expression can have multiple capturing groups. In the results, the captured matches are stored in an array whose members are in the same order as the left parentheses of the capturing groups, which is usually just the order of the groups themselves. This matters when capturing groups are nested. The matches can be accessed through the indexes of the result array ([1], ..., [n]) or through the properties of the predefined RegExp object ($1, ..., $9), which are a deprecated legacy feature.
  • (?:x) : Non-capturing group. Matches "x" but does not remember the match. The matched substring cannot be recalled from the elements of the result array ([1], ..., [n]) or from the properties of the predefined RegExp object ($1, ..., $9).

Example

```javascript
var aliceExcerpt = "There was a long silence after this, and Alice could only hear whispers now and then.";
var regexpVowels = /[aeiouy]/g;

console.log("Number of vowels:", aliceExcerpt.match(regexpVowels).length);
// Number of vowels: 25

let personList = `First_Name: John, Last_Name: Doe
First_Name: Jane, Last_Name: Smith`;

let regexpNames = /First_Name: (\w+), Last_Name: (\w+)/mg;
let match = regexpNames.exec(personList);
do {
  console.log(`Hello ${match[1]} ${match[2]}`);
} while ((match = regexpNames.exec(personList)) !== null);
// Hello John Doe
// Hello Jane Smith
```
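
The example above covers character sets and capturing groups. This sketch, with illustrative inputs and variable names, shows the remaining items described above: alternation, the difference between (x) and (?:x) in the result array, ranges, a literal trailing hyphen, and a negated set.

```javascript
// x|y: alternation matches either side.
const colors = "green apple, red apple".match(/green|red/g); // ["green", "red"]

// (x) adds an element to the result array; (?:x) groups without capturing.
const capturing = /(ab)+c/.exec("ababc");      // ["ababc", "ab"]
const nonCapturing = /(?:ab)+c/.exec("ababc"); // ["ababc"]

// [a-d] is a range; a hyphen at the end of a set is a literal "-".
const rangeMatch = "brisket".match(/[a-d]/)[0]; // "b"
const literalHyphen = "x-y".match(/[xy-]+/)[0]; // "x-y"

// [^abc] matches the first character that is not a, b or c.
const negated = "bacon".match(/[^abc]/)[0];     // "o"

console.log(colors, capturing.length, nonCapturing.length, rangeMatch, literalHyphen, negated);
```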

Quantifiers

Quantifiers indicate the number of characters or expressions to match.

  • x* : Matches the preceding item "x" 0 or more times. For example, /bo*/ matches "boooo" in "A ghost booooed" and "b" in "A bird warbled", but nothing in "A goat grunted".
  • x+ : Matches the preceding item "x" 1 or more times. Equivalent to {1,}. For example, /a+/ matches the "a" in "candy" and all the "a"s in "caaaaaaandy".
  • x? : Matches the preceding item "x" 0 or 1 times. For example, /e?le?/ matches "el" in "angel" and "le" in "angle". If used immediately after any of the quantifiers *, +, ? or {}, it makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default greedy behavior (matching the maximum number of times).
  • x{n} : Where "n" is a positive integer, matches exactly "n" occurrences of the preceding item "x". For example, /a{2}/ does not match the "a" in "candy", but it matches all the "a"s in "caandy" and the first two "a"s in "caaandy".
  • x{n,} : Where "n" is a positive integer, matches at least "n" occurrences of the preceding item "x". For example, /a{2,}/ does not match the "a" in "candy", but matches all the "a"s in "caandy" and in "caaaaaaandy".
  • x{n,m} : Where "n" is 0 or a positive integer and "m" is a positive integer not less than "n", matches at least "n" and at most "m" occurrences of the preceding item "x". For example, /a{1,3}/ matches nothing in "cndy", the "a" in "candy", the two "a"s in "caandy", and the first three "a"s in "caaaaaaandy". Note that when matching "caaaaaaandy", the match is "aaa", even though the original string contains more "a"s.
  • x*?, x+?, x??, x{n}?, x{n,}? or x{n,m}? : By default, quantifiers like * and + are "greedy", meaning they try to match as much of the string as possible. A ? placed after the quantifier makes it "non-greedy": it stops as soon as it finds a match. For example, given the string "some <foo> <bar> new </bar> </foo> thing": (1) /<.*>/ matches "<foo> <bar> new </bar> </foo>"; (2) /<.*?>/ matches "<foo>".

Example

```javascript
var singleLetterWord = /\b\w\b/g;
var notSoLongWord = /\b\w{1,6}\b/g;
var loooongWord = /\b\w{13,}\b/g;

var sentence = "Why do I have to learn multiplication table?";

console.table(sentence.match(singleLetterWord)); // ["I"]
console.table(sentence.match(notSoLongWord)); // ["Why", "do", "I", "have", "to", "learn", "table"]
console.table(sentence.match(loooongWord)); // ["multiplication"]
```
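
The example above uses {n,m} and {n,}. This sketch checks a few other claims described above: x?, the "caaaaaaandy" case of x{n,m}, and greedy versus non-greedy matching. The variable names are illustrative.

```javascript
const markup = "some <foo> <bar> new </bar> </foo> thing";

// Greedy: ".*" runs as far as it can, so the match ends at the last ">".
const greedyTag = markup.match(/<.*>/)[0]; // "<foo> <bar> new </bar> </foo>"
// Non-greedy: ".*?" stops at the first ">" that lets the pattern succeed.
const lazyTag = markup.match(/<.*?>/)[0];  // "<foo>"

// x?: the "e" before and after "l" are both optional.
const optionalE = ["angel", "angle"].map((word) => word.match(/e?le?/)[0]); // ["el", "le"]

// x{n,m}: at most three "a"s, even though more are available.
const upToThree = "caaaaaaandy".match(/a{1,3}/)[0]; // "aaa"

console.log({ greedyTag, lazyTag, optionalE, upToThree });
```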

Regular expression modifiers

Modifiers (flags) change how a search is performed, for example making it global or case-insensitive.

  • i : Performs case-insensitive matching.
  • g : Performs a global match (finds all matches instead of stopping after the first one).
  • m : Performs multiline matching: ^ and $ also match at the start and end of each line, not only at the start and end of the whole input.
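
A minimal sketch of how the three flags change the result of String.prototype.match on the same made-up input; only standard JavaScript is assumed, and the variable names are illustrative.

```javascript
const flagsText = "Apple apple\napple";

// Without i, "Apple" does not match "apple".
const caseSensitive = flagsText.match(/apple/g).length;    // 2
const caseInsensitive = flagsText.match(/apple/gi).length; // 3

// Without g, match returns only the first match (plus its index).
const firstOnly = flagsText.match(/apple/i);               // ["Apple"], index 0

// With m, ^ matches at the start of every line, not just at index 0.
const lineStarts = flagsText.match(/^apple/gim).length;    // 2
const inputStart = flagsText.match(/^apple/gi).length;     // 1

console.log({ caseSensitive, caseInsensitive, firstIndex: firstOnly.index, lineStarts, inputStart });
```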

Using regular expressions

Regular expressions can be used with the exec and test methods of RegExp and with the match, matchAll, replace, search and split methods of String.

  • exec : A RegExp method that searches for a match in a string. It returns an array of information, or null if no match is found.
  • test : A RegExp method that tests for a match in a string. It returns true or false.
  • match : A String method that searches for a match in a string. It returns an array, or null if there is no match.
  • matchAll : A String method that finds all matches in a string, including capturing groups, and returns an iterator. The regular expression must have the g flag.
  • search : A String method that tests for a match in a string. It returns the index of the match, or -1 if the search fails.
  • replace : A String method that searches for a match in a string and replaces the matched substring with a replacement string.
  • split : A String method that uses a regular expression or a fixed string to break a string into an array of substrings.
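
Each method described above can be exercised on one sample string. This is a sketch with an illustrative date-like input; a fresh regular-expression literal is used for each call so that the lastIndex state of a g regular expression is never shared between calls.

```javascript
const str = "Dates: 2023-01 and 2024-06";

// exec: first match plus capturing groups; index is the match position.
const execResult = /(\d{4})-(\d{2})/.exec(str);            // ["2023-01", "2023", "01"], index 7
// test: only reports whether a match exists.
const tested = /\d{4}/.test(str);                          // true
// match with g: all full matches, without capturing groups.
const matched = str.match(/\d{4}-\d{2}/g);                 // ["2023-01", "2024-06"]
// matchAll: an iterator of match arrays; the g flag is required.
const years = [...str.matchAll(/(\d{4})-\d{2}/g)].map((m) => m[1]); // ["2023", "2024"]
// search: index of the first match, or -1.
const position = str.search(/\d/);                         // 7
// replace: $1 and $2 refer to capturing groups in the replacement string.
const replaced = str.replace(/(\d{4})-(\d{2})/g, "$2/$1"); // "Dates: 01/2023 and 06/2024"
// split: the separator may be a regular expression.
const parts = "a, b,c".split(/\s*,\s*/);                   // ["a", "b", "c"]

console.log({ execResult, tested, matched, years, position, replaced, parts });
```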

Greedy matching and non-greedy matching

Quantifiers can match in two modes:

  • Greedy matching: matches the longest possible string
  • Non-greedy (lazy) matching: matches the shortest possible string

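Quantifiers are greedy by default. A greedy quantifier first consumes as much of the input as it can and then backtracks, giving characters back from the end one at a time, until the rest of the pattern can match; this yields the longest possible match. Adding ? after a quantifier makes it lazy: it starts by consuming as little as possible and extends the match one character at a time from the front, which yields the shortest possible match.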

Whether to use greedy or non-greedy mode depends on what you need to match. Neither mode is always faster: performance depends on how much backtracking the particular pattern and input cause.

Example

Greedy match

```javascript
var s = "abbbaabbbaaabbb1234";
var re1 = /.*bbb/g; // * is a greedy quantifier
re1.test(s);
```

Conceptually, the greedy .* first takes the entire string and then gives characters back from the end, one at a time, until the rest of the pattern (bbb) can match:

```text
abbbaabbbaaabbb1234   // does not end in bbb: give back "4"
abbbaabbbaaabbb123    // does not end in bbb: give back "3"
abbbaabbbaaabbb12     // does not end in bbb: give back "2"
abbbaabbbaaabbb1      // does not end in bbb: give back "1"
abbbaabbbaaabbb       // ends in bbb: match found
```

Non-greedy match

```javascript
var s = "abbbaabbbaaabbb1234";
var re1 = /.*?bbb/g; // *? is a lazy quantifier
re1.test(s);
```

Conceptually, the lazy .*? works the other way round: it starts with as little as possible and takes one more character at a time until bbb can match. Each subsequent search then resumes right after the previous match:

```text
a        // no match yet: take one more character
ab       // take one more
abb      // take one more
abbb     // ends in bbb: match, record "abbb" and continue after it
a        // take one more
aa       // take one more
aab      // take one more
aabb     // take one more
aabbb    // ends in bbb: match, record "aabbb" and continue after it
```
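
The traces above are a simplified picture of backtracking. The end result can be checked with String.prototype.match and the g flag, which collects every match; this sketch uses the same sample string under an illustrative variable name.

```javascript
const sample = "abbbaabbbaaabbb1234";

// Greedy: one long match that ends at the last "bbb".
const greedyMatches = sample.match(/.*bbb/g); // ["abbbaabbbaaabbb"]
// Lazy: each match ends at the nearest "bbb", so there are three.
const lazyMatches = sample.match(/.*?bbb/g);  // ["abbb", "aabbb", "aaabbb"]

console.log(greedyMatches, lazyMatches);
```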