Regular Expression (RegEx) Basics
RegEx is used in programming languages to match specific parts of strings.
A regular expression (sometimes called a rational expression) is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations.
- We create a pattern that matches the strings we want.
- JavaScript has multiple ways to use regexes.
The test() Method
The `.test()` method checks whether a pattern is present in a string and returns `true` or `false`.
```js
let myString = "I am Perumal Jegan!";
let myRegex = /Jegan/;
myRegex.test(myString); // true: "Jegan" appears in myString
```
Match Literal Strings
```js
let string = "Hello, my name is perumal jegan and i am 27";
let regex = /Name/;
let result = regex.test(string); // false
```
The result is `false` because matching is case-sensitive: `/Name/` does not match `name`.
Match Literal Strings with different possibilities
Searching for a single literal string is useful, but it is limited to one pattern. You can search for multiple patterns using the alternation, or OR, operator: `|`.
This operator matches patterns either before or after it. For example, to match the strings `Perumal` or `Jegan`, the regex you want is `/Perumal|Jegan/`.
```js
let myString = "I am Perumal Jegan!";
let myRegex = /Jegan|Perumal/;
myRegex.test(myString); // true: at least one of the alternatives is found
```
Ignore case while matching
Use the ignore-case flag `i` by appending it to the regex. With the `i` flag, differences between uppercase and lowercase letters are ignored, so the pattern matches regardless of case.
```js
let myString = "freeCodeCamp";
let fccRegex = /freecodecaMp/i; // i appended after the closing slash: ignore case
let result = fccRegex.test(myString);
console.log(result); // true
```
Extract Matches
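To extract the actual matches, use the `.match()` method. Note that `.match()` is called on the string and takes the regex as its argument, the reverse of `.test()`.
```js
let extractStr = "I am perumal Jegan and i'm 27";
let codingRegex = /Jegan/;
let result = extractStr.match(codingRegex);
console.log(result); // an array whose first element is 'Jegan'
```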
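Without the `g` flag, the array returned by `.match()` also carries a few extra properties. A small sketch that reads them, reusing `extractStr` from above; `/Smith/` is an arbitrary pattern chosen so that nothing matches:

```js
let extractStr = "I am perumal Jegan and i'm 27";
let result = extractStr.match(/Jegan/);

console.log(result[0]);    // "Jegan": the matched text
console.log(result.index); // 13: position where the match starts
console.log(result.input); // the original string that was searched

// When nothing matches, match() returns null instead of an array
console.log(extractStr.match(/Smith/)); // null
```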
Find more than the first match
To search or extract a pattern more than once, use the `g` (global) flag.
```js
let testStr = "Repeat, Repeat, Repeat";
let ourRegex = /Repeat/g;
let result = testStr.match(ourRegex);
console.log(result);
```
This logs `[ 'Repeat', 'Repeat', 'Repeat' ]`: with the `g` flag, `.match()` returns every match instead of only the first one.
You can combine flags on one regex, for example `ig` for a case-insensitive global search.
```js
let testStr = "Repeat, RepeAt, repeat";
let ourRegex = /Repeat/ig;
let result = testStr.match(ourRegex);
console.log(result);
```
This logs `[ 'Repeat', 'RepeAt', 'repeat' ]`.
Match Anything with wildcard period
The wildcard character `.` matches any one character (except line terminators such as `\n`). The wildcard is also called `dot` and `period`. You can use the wildcard character just like any other character in the regex.
```js
let humStr = "I'll hum a song";
let hugStr = "Bear hug";
let huRegex = /hu./; // `.` is the wildcard period in RegEx
let result1 = huRegex.test(humStr);
let result2 = huRegex.test(hugStr);
console.log(result1);
console.log(result2);
```
Results: `true` `true`
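A short sketch of three details about the wildcard: it needs exactly one character, it does not match a line break unless the `s` (dotAll) flag is set, and an escaped `\.` matches a literal period. The test strings are made up for illustration:

```js
let huRegex = /hu./;

console.log(huRegex.test("hum"));  // true: '.' matches 'm'
console.log(huRegex.test("hu"));   // false: '.' needs exactly one character
console.log(huRegex.test("hu\n")); // false: '.' does not match a line break
console.log(/hu./s.test("hu\n"));  // true: the s flag lets '.' match line breaks

// To match a literal period, escape it with a backslash
let dotRegex = /end\./;
console.log(dotRegex.test("The end.")); // true
console.log(dotRegex.test("The ends")); // false
```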
Match Single Character with Multiple Possibilities
Suppose you want to match `bag`, `big`, and `bug` but not `bog`. You can create the regex `/b[aiu]g/` to do this. `[aiu]` is a character class that matches only the characters `a`, `i`, or `u`.
```js
let quoteSample = "Beware of bugs in the above code; I have only proved it correct, not tried it.";
let vowelRegex = /[aeiou]/ig;
let result = quoteSample.match(vowelRegex);
console.log(result);
```
The code above extracts all the vowels (`aeiou`), in either case because of the `i` flag. The result is:
`[ 'e', 'a', 'e', 'o', 'u', 'i', 'e', 'a', 'o', 'e', 'o', 'e', 'I', 'a', 'e', 'o', 'o', 'e', 'i', 'o', 'e', 'o', 'i', 'e', 'i' ]`
```js
let bigStr = "big bag bug";
let bgRegex = /b[aiu]g/g;
let result1 = bigStr.match(bgRegex);
console.log(result1);
```
This logs `[ 'big', 'bag', 'bug' ]`.
Match Letters of the Alphabet (Range of characters)
Inside a character set, you can define a range of characters to match using a hyphen character: `-`.
For example, to match lowercase letters `a` through `e`, you would use `[a-e]`.
```js
let catStr = "cat bat Eat mat pat";
let xatStr = "Xat";
let bgRegex = /[a-e]at/ig;
let result1 = catStr.match(bgRegex);
let result2 = xatStr.match(bgRegex);
console.log(result1);
console.log(result2);
```
This logs `[ 'cat', 'bat', 'Eat' ]` and `null`. `Eat` matches because the `i` flag lets `[a-e]` accept `E` as well.
Match Numbers and Letters of the Alphabet
Using the hyphen (`-`) to match a range of characters is not limited to letters. It also works to match a range of digits.
For example, `/[0-5]/` matches any digit between `0` and `5`, including `0` and `5`.
It is also possible to combine a range of letters and a range of digits in a single character set.
```js
let peruStr = "perumal17091995";
let myRegex = /[a-z0-6]/ig;
let result = peruStr.match(myRegex);
console.log(result);
```
Result: `[ 'p', 'e', 'r', 'u', 'm', 'a', 'l', '1', '0', '1', '5' ]` (the digits `7` and `9` fall outside `0-6`).
Match Single Characters Not Specified (Negation)
You can also create a set of characters that you do not want to match. These types of character sets are called negated character sets.
To create one, place a caret character (`^`) after the opening bracket and before the characters you do not want to match.
For example, `/[^aeiou]/gi` matches all characters that are not a vowel. Note that characters like `.`, `!`, `[`, `@`, `/` and white space are matched: the negated vowel character set only excludes the vowel characters.
```js
let quoteSample = "3 blind mice.";
let myRegex = /[^0-9aeiou]/ig; // anything except digits and vowels
let result = quoteSample.match(myRegex);
console.log(result);
```
Result: `[ ' ', 'b', 'l', 'n', 'd', ' ', 'm', 'c', '.' ]`
Match Characters that Occur One or More Times (consecutive characters)
Use the `+` character to check whether a character or pattern occurs one or more times in a row, that is, consecutively.
For example, `/a+/g` would find one match in `abc` and return `["a"]`. Because of the `+`, it would also find a single match in `aabc` and return `["aa"]`.
If it were instead checking the string `abab`, it would find two matches and return `["a", "a"]` because the `a` characters are not in a row; there is a `b` between them.
```js
let difficultSpelling = "Mississippsi";
let myRegex = /s+/g; // matches runs of consecutive 's' characters
let result = difficultSpelling.match(myRegex);
console.log(result);
```
Result: `[ 'ss', 'ss', 's' ]`
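The `abc`, `aabc`, and `abab` cases described above can be checked directly. A minimal sketch; `xyz` is an extra made-up input showing that `+` needs at least one occurrence:

```js
let oneOrMore = /a+/g;

console.log("abc".match(oneOrMore));  // [ 'a' ]
console.log("aabc".match(oneOrMore)); // [ 'aa' ]: consecutive a's form one match
console.log("abab".match(oneOrMore)); // [ 'a', 'a' ]: the b splits them
console.log("xyz".match(oneOrMore));  // null: at least one 'a' is required
```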
Match Characters that Occur Zero or More Times
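The `*` character matches a character or pattern that occurs zero or more times.
```js
let soccerWord = "gooooooooal!";
let gPhrase = "gut feeling";
let oPhrase = "over the moon";
let goRegex = /go*/;
let result1 = soccerWord.match(goRegex);
let result2 = gPhrase.match(goRegex);
let result3 = oPhrase.match(goRegex);
console.log(result1);
console.log(result2);
console.log(result3);
```
Results: `[ 'goooooooo' ]`, `[ 'g' ]`, `null`. Without the `g` flag, `.match()` stops at the first match, so `gut feeling` yields only its first `g`. (The returned arrays also carry `index`, `input`, and `groups` properties, omitted here.)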
```js
let chewieQuote = "Aaaaaaaaaaaaaaaarrrgh!";
let chewieRegex = /Aa*/; // 'A' followed by zero or more occurrences of 'a'
let result = chewieQuote.match(chewieRegex);
console.log(result);
```
Result: `[ 'Aaaaaaaaaaaaaaaa' ]`
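To see the effect of the `g` flag on the `gut feeling` example, and the fact that `*` also accepts zero occurrences, here is a minimal sketch (the `xyz` input is made up):

```js
let goRegex = /go*/;
let goRegexGlobal = /go*/g;
let gPhrase = "gut feeling";

console.log(gPhrase.match(goRegex)[0]);    // "g": first match only
console.log(gPhrase.match(goRegexGlobal)); // [ 'g', 'g' ]: every match

// '*' allows zero occurrences, so /o*/ succeeds even with an empty match
console.log(/o*/.test("xyz")); // true
```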
Find Characters with Lazy Matching & Greedy Matching
- A greedy match finds the longest possible part of a string that fits the regex pattern.
- The alternative is called a lazy match, which finds the smallest possible part of the string that satisfies the regex pattern.
- Regular expressions are greedy by default. Adding `?` after a quantifier (for example `*?`) makes it lazy.
- You can apply the regex `/t[a-z]*i/` to the string `"titanic"`. This regex is a pattern that starts with `t`, ends with `i`, and has some letters in between.
Greedy Match
```js
let text = "titanic";
let myRegex = /t[a-z]*i/; // [a-z]* takes as many letters as possible
let result = text.match(myRegex);
console.log(result);
```
Result: `[ 'titani' ]`
Lazy Match: `*?` chooses zero occurrences of `[a-z]` here, so the result is `ti` alone.
```js
let text = "titanic";
let myRegex = /t[a-z]*?i/; // [a-z]*? takes as few letters as possible
let result = text.match(myRegex);
console.log(result);
```
Result: `[ 'ti' ]`
Greedy Match
```js
let text = "<h1>Winter is coming</h1>";
let myRegex = /<.*>/; // .* ---> zero or more occurrences of anything, as many as possible
let result = text.match(myRegex);
console.log(result);
```
Result: `[ '<h1>Winter is coming</h1>' ]`
Lazy Match
```js
let text = "<h1>Winter is coming</h1>";
let myRegex = /<.*?>/; // .*? ---> zero or more occurrences of anything, as few as possible
let result = text.match(myRegex);
console.log(result);
```
Result: `[ '<h1>' ]`
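Combining the lazy pattern with the `g` flag finds every tag separately, while the greedy pattern still returns one long match. A sketch for illustration only; regexes like these are not a robust way to parse HTML:

```js
let html = "<h1>Winter is coming</h1>";

// Greedy: one match from the first '<' to the last '>'
console.log(html.match(/<.*>/g));  // [ '<h1>Winter is coming</h1>' ]

// Lazy: each match stops at the nearest '>', so every tag is found separately
console.log(html.match(/<.*?>/g)); // [ '<h1>', '</h1>' ]
```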
Find One or More Criminals in a Hunt (Character: `+`)
Scenario
A group of criminals escaped from jail and ran away, but you don’t know how many. However, you do know that they stay close together when they are around other people. You are responsible for finding all of the criminals at once.
A criminal is represented by the capital letter `C`.
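`/z+/` matches the letter `z` when it appears one or more times in a row, so `/C+/` matches a whole group of criminals standing together.
```javascript
let escapedCriminals = /C+/; // one or more consecutive 'C' characters
```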
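A quick check of the criminal pattern against a few crowds. The crowd strings are hypothetical, with `P` plus a digit standing for an ordinary person:

```js
let escapedCriminals = /C+/; // one or more consecutive 'C' characters

let crowd = "P1P5P4CCCcP2P6P3";
console.log(crowd.match(escapedCriminals)[0]); // "CCC": the lowercase 'c' is not included

console.log("C".match(escapedCriminals)[0]);   // "C": a single criminal also matches
console.log("P1P2P3".match(escapedCriminals)); // null: no criminals in this crowd
```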
Match Beginning String Patterns
Regexes are also used to **search for patterns in specific positions in strings**. Earlier, you used the caret character (`^`) inside a character set to create a negated character set in the form `[^thingsThatWillNotBeMatched]`. Outside of a character set, the caret `^` is **used to search for patterns at the beginning of strings**.
```javascript
let firstString = "Ricky is first and can be found.";
let firstRegex = /^Ricky/; // ^ caret sign: match only at the beginning of the string
let result1 = firstRegex.test(firstString);
let notFirst = "You can't find Ricky now.";
let result2 = firstRegex.test(notFirst);
console.log(result1);
console.log(result2);
```
Returns `true` and `false`.
Match Ending String Patterns
You can search the end of strings using the dollar sign character `$` at the end of the regex.
```javascript
let theEnding = "This is a never ending story";
let storyRegex = /story$/; // $ sign: match only at the end of the string
let result1 = storyRegex.test(theEnding);
let noEnding = "Sometimes a story will have to end";
let result2 = storyRegex.test(noEnding);
console.log(result1);
console.log(result2);
```
Returns `true` and `false`.
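Using `^` and `$` together forces the pattern to cover the whole string, and the `m` (multiline) flag makes them match at the start and end of each line instead. A minimal sketch with made-up inputs:

```js
// ^ and $ together: the whole string must consist of digits
let onlyDigits = /^[0-9]+$/;
console.log(onlyDigits.test("2024"));  // true
console.log(onlyDigits.test("20a24")); // false

// With the m (multiline) flag, ^ and $ also match around line breaks
let lines = "first\nsecond";
console.log(lines.match(/^[a-z]+$/g));  // null: the whole string is not one run of letters
console.log(lines.match(/^[a-z]+$/gm)); // [ 'first', 'second' ]
```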
Match All Letters and Numbers: `\w`, a Shorthand Character Class
The character set `[A-Za-z0-9_]+` matches one or more letters, digits, or underscores.
`\w+` is equivalent to `[A-Za-z0-9_]+`. Note that this character class also includes the underscore character (`_`).
```js
let longHand = /[A-Za-z0-9_]+/;
let shortHand = /\w+/; // shorthand for the same character set
let numbers = "42";
let varNames = "important_var";
let result1 = numbers.match(longHand);
let result2 = numbers.match(shortHand);
let result3 = varNames.match(longHand);
let result4 = varNames.match(shortHand);
console.log(result1);
console.log(result2);
console.log(result3);
console.log(result4);
```
Results:
`[ '42', index: 0, input: '42', groups: undefined ]` (twice), then
`[ 'important_var', index: 0, input: 'important_var', groups: undefined ]` (twice)
To count the matches, read the `length` property of the array returned by `.match()`.
```js
let quoteSample = "The five boxing wizards jump quickly.";
let alphabetRegexV2 = /\w/g; // every single word character
let result = quoteSample.match(alphabetRegexV2).length;
console.log(result);
```
Result: `31`
Match Everything Except Letters, Numbers & Underscore: `\W` (Opposite of `\w`)
`\W` matches every character except `[A-Za-z0-9_]`; it is equivalent to `[^A-Za-z0-9_]`.
```js
let shortHand = /\W/;
let numbers = "42%";
let sentence = "Coding!";
let result1 = numbers.match(shortHand);
let result2 = sentence.match(shortHand);
console.log(result1);
console.log(result2);
```
Results: `[ '%' ]` and `[ '!' ]`
```js
let quoteSample = "The five boxing wizards jump quickly.";
let nonAlphabetRegex = /\W/g; // \W is equivalent to [^A-Za-z0-9_]
let result = quoteSample.match(nonAlphabetRegex).length;
console.log(result);
```
Result: `6` (five spaces and the final period)
Match All Numbers
Another common shortcut is looking for just *digits or numbers*: `\d`, which is equal to the character class `[0-9]`.
```js
let movieName = "2001: A Space Odyssey";
let numRegex = /\d/g; // each single digit
let result = movieName.match(numRegex).length;
console.log(result);
```
Result: `4`
```js
let quoteSample = "2001: A Space Odyssey";
let digitsRegex = /\d+/g; // \d is equivalent to [0-9]; + joins consecutive digits
let result = quoteSample.match(digitsRegex);
console.log(result);
```
Result: `[ '2001' ]`
Match All Non-Numbers: `\D` (Opposite of `\d`)
`\D` is equivalent to `[^0-9]`.
```js
let movieName = "2001: A Space Odyssey";
let noNumRegex = /\D/g; // each single non-digit character
let result = movieName.match(noNumRegex).length;
console.log(result);
```
Result: `17`
```js
let quoteSample = "2001: A Space Odyssey";
let nonDigitsRegex = /\D+/g; // runs of consecutive non-digit characters
let result = quoteSample.match(nonDigitsRegex);
console.log(result);
```
Result: `[ ': A Space Odyssey' ]`
Restrict Possible Usernames (Username Validator)
- Usernames are used everywhere on the internet. They are what give users a unique identity on their favorite sites.
- You need to check all the usernames in a database. Here are some simple rules that users have to follow when creating their username.
- Usernames can only use alpha-numeric characters. The only numbers in the username have to be at the end. There can be zero or more of them at the end. Username cannot start with the number.
- Username letters can be lowercase and uppercase.
- Usernames have to be at least two characters long. A two-character username can only use alphabet letters as characters.
`{}` *indicates number of times the previous thing can match*
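```js
// 1. Usernames can only use alpha-numeric characters.
// 2. The only numbers in the username have to be at the end. There can be zero or more of them at the end. Username cannot start with the number.
// 3. Username letters can be lowercase and uppercase.
// 4. Usernames have to be at least two characters long. A two-character username can only use alphabet letters as characters.
let username = "JackOfAllTrades";
// ^[a-z]       one letter first (the i flag allows uppercase too)
// [0-9]{2,}    ...then either two or more digits,
// [a-z]+\d*    ...or one or more letters followed by zero or more digits
let userCheck = /^[a-z]([0-9]{2,}|[a-z]+\d*)$/i;
let result = username.match(userCheck);
console.log(result);
```
The pattern above fulfills all four conditions. A simpler version such as `/^[A-Za-z]{2,}\d*$/` (where `{2,}` means "two or more times", and `{2,8}` would mean "between two and eight times") also accepts `JackOfAllTrades`, but it wrongly rejects valid usernames like `Z97`, which start with one letter followed by digits.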
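A sketch that runs the rule-based pattern against a few hypothetical usernames, printing each name, the test result, and the result the four rules call for:

```js
let userCheck = /^[a-z]([0-9]{2,}|[a-z]+\d*)$/i;

// Hypothetical usernames mapped to the outcome the rules require
const samples = {
  JackOfAllTrades: true,
  Jo: true,           // two letters
  Z97: true,          // one letter followed by two or more digits
  A1: false,          // a two-character username must be all letters
  "9lives": false,    // cannot start with a number
  BadUs3rname: false, // numbers are only allowed at the end
};

for (const [name, expected] of Object.entries(samples)) {
  console.log(name, userCheck.test(name), expected);
}
```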
Match Whitespace
You can search for whitespace using `\s`, which is a lowercase `s`.
This pattern not only matches whitespace, but also carriage return, tab, form feed, and new line characters.
You can think of it as similar to the character class `[ \r\t\f\n\v]`.
```js
let whiteSpace = "Whitespace. Whitespace everywhere!";
let spaceRegex = /\s/g;
let result = whiteSpace.match(spaceRegex);
console.log(result);
```
Result: `[ ' ', ' ' ]`
```js
let sample = "Whitespace is important in separating words";
let countWhiteSpace = /\s/g;
let result = sample.match(countWhiteSpace);
console.log(result);
```
Result: `[ ' ', ' ', ' ', ' ', ' ' ]`
Match Non-Whitespace Characters: `\S` (Opposite of `\s`)
```js
let whiteSpace = "Whitespace. Whitespace everywhere!";
let nonSpaceRegex = /\S/g;
let result = whiteSpace.match(nonSpaceRegex);
console.log(result);
```
Result: `[ 'W', 'h', 'i', 't', 'e', 's', 'p', 'a', 'c', 'e', '.', 'W', 'h', 'i', 't', 'e', 's', 'p', 'a', 'c', 'e', 'e', 'v', 'e', 'r', 'y', 'w', 'h', 'e', 'r', 'e', '!' ]`
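A common practical use of `\s` and `\S` is splitting text into words and counting visible characters. A minimal sketch; the `messy` string, with extra spaces, a tab, and a newline, is made up for illustration:

```js
let messy = "Whitespace   is\timportant\nin separating words";

// Split wherever one or more whitespace characters appear
let words = messy.split(/\s+/);
console.log(words); // [ 'Whitespace', 'is', 'important', 'in', 'separating', 'words' ]

// Count the non-whitespace characters
console.log(messy.match(/\S/g).length); // 38
```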
Specify Upper and Lower Number of Matches
You can use `+` to look for one or more characters and `*` to look for zero or more characters. These are convenient, but sometimes you want to match a certain range of patterns.
You can specify the lower and upper number of patterns with *quantity* specifiers.
Quantity specifiers are used with curly brackets (`{` and `}`). You put two numbers between the curly brackets for the lower and upper number of patterns.
For example, to match only the letter `a` appearing between `3` and `5` times followed by `h`, your regex would be `/a{3,5}h/`.
```js
let A4 = "aaaah";
let A2 = "aah";
let multipleA = /a{3,5}h/;
let result1 = multipleA.test(A4);
let result2 = multipleA.test(A2);
console.log(result1, result2); // true false
```
```js
let ohStr = "Ohhh no";
let ohRegex = /Oh{3,6}\sno/; // 3 to 6 'h' characters, then \s for the space character
let result = ohRegex.test(ohStr);
console.log(result);
```
Result: `true`
Specify Only the Lower Number of Matches
You can specify the lower and upper number of patterns with quantity specifiers using curly brackets. **Sometimes you only want to specify the lower number of patterns with no upper limit.**
For example, to match only the string `hah` with the letter `a` appearing at least `3` times, your regex would be `/ha{3,}h/`
```js
let A4 = "haaaah";
let A2 = "haah";
let A100 = "h" + "a".repeat(100) + "h"; // repeat(n) repeats the string n times
let multipleA = /ha{3,}h/; // lower limit specified, upper limit left empty
multipleA.test(A4);   // true
multipleA.test(A2);   // false
multipleA.test(A100); // true
```
Results: `true` `false` `true`
```js
let A4 = "haaaah";
let A2 = "haah";
let A100 = "h" + "a".repeat(100) + "h";
let multipleA = /ha{3,}h/;
let result1 = multipleA.test(A4);
let result2 = multipleA.test(A2);
let result3 = A100.match(multipleA);
console.log(result1, result2, result3);
```
Results: `true`, `false`, and a match array whose first element is the whole `A100` string (`h`, then 100 `a` characters, then `h`).
Specify Exact Number of Matches
For example, to match only the word `hah` with the letter `a` exactly `3` times, your regex would be `/ha{3}h/`. The code below checks for exactly `4` times with `/ha{4}h/`.
```js
let A4 = "haaaah";
let A2 = "haah";
let A4Built = "h" + "a".repeat(4) + "h"; // builds "haaaah"
let multipleA = /ha{4}h/; // exactly 4 'a' characters
let result1 = multipleA.test(A4);
let result2 = multipleA.test(A2);
let result3 = A4Built.match(multipleA);
console.log(result1, result2, result3);
```
Results: `true false [ 'haaaah' ]`
Check for All or None
- Sometimes the patterns you want to search for may have parts of it that may or may not exist. However, it may be important to check for them nonetheless.
- You can specify the possible existence of an element with a question mark, `?`. This checks for zero or one of the preceding element. You can think of this symbol as saying the previous element is optional.
- For example, **there are slight differences in American and British English and you can use the question mark to match both spellings.**
```js
// Both statements return true
let american = "color";
let british = "colour";
let rainbowRegex = /colou?r/; // ? makes the preceding 'u' optional
rainbowRegex.test(american); // true
rainbowRegex.test(british);  // true
```
```js
let favWord = "favorite";
let favRegex = /favou?rite/;
let result = favWord.match(favRegex);
console.log(result);
```
Result: `[ 'favorite' ]`
Positive and Negative Lookahead
- *Lookaheads* are patterns that tell JavaScript to look-ahead in your string to check for patterns further along. This can be useful when you want to search for multiple patterns over the same string.
- There are two kinds of lookaheads: *positive lookahead* and *negative lookahead*.
- **A positive lookahead will look to make sure the element in the search pattern is there, but won’t actually match it.** A positive lookahead is used as `(?=…)` where the `…` is the required part that is not matched.
- **On the other hand, a negative lookahead will look to make sure the element in the search pattern is not there.** A negative lookahead is used as `(?!…)` where the `…` is the pattern that you do not want to be there.
```js
// Both matches return 'q'
let quit = "qu";
let noquit = "qt";
let quRegex = /q(?=u)/; // 'q' only if followed by 'u'
let qRegex = /q(?!u)/;  // 'q' only if NOT followed by 'u'
quit.match(quRegex);   // Returns ["q"]
noquit.match(qRegex);  // Returns ["q"]
```
* A more practical use of lookaheads is to **check two or more patterns in one string. Here is a (naively) simple password checker that looks for between 3 and 6 characters and at least one number**:
```js
// between 3 and 6 characters
// at least one number
let password = "abc123";
let checkPass = /(?=\w{3,6})(?=\D*\d)/; // two lookaheads checked at the same position
checkPass.test(password); // true
```
Result: `true`. Because the pattern is not anchored with `^` and `$`, it effectively checks for at least 3 word characters, so longer passwords also pass.
```js
// Use lookaheads in pwRegex to match passwords that are
// greater than 5 characters long and have two consecutive digits.
let sampleWord = "astronaut";
let pwRegex = /(?=\w{6})(?=\w*\d{2})/; // 6+ word characters AND two digits in a row
let result = pwRegex.test(sampleWord);
console.log(result); // false: "astronaut" has no digits
```
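A sketch that runs the two-lookahead password pattern against a few hypothetical passwords; both lookaheads must succeed at the same starting position for `test()` to return `true`:

```js
let pwRegex = /(?=\w{6})(?=\w*\d{2})/;

console.log(pwRegex.test("astronaut")); // false: no digits
console.log(pwRegex.test("bana12"));    // true
console.log(pwRegex.test("abc123"));    // true
console.log(pwRegex.test("12345"));     // false: only 5 characters
console.log(pwRegex.test("a1b2c3d4"));  // false: no two digits in a row
```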
Check For Mixed Grouping of Characters
* Sometimes we want to check for groups of characters using a Regular Expression and to achieve that we use parentheses `()`
* If you want to find either `Penguin` or `Pumpkin` in a string, you can use the following Regular Expression: `/P(engu|umpk)in/g`
```js
let testStr = "Pumpkin";
let testRegex = /P(engu|umpk)in/;
testRegex.test(testStr); // Returns true
```
```js
let myString = "Eleanor Roosevelt";
let myRegex = /(Eleanor|Franklin)\sRoosevelt/;
let result = myRegex.test(myString);
console.log(result); // Returns true
```
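Parentheses also let a quantifier apply to a whole group of characters rather than to a single one. A minimal sketch with a made-up string:

```js
// (ha)+ repeats the two-character group "ha"
let laughRegex = /(ha)+/;
console.log("hahaha!".match(laughRegex)[0]); // "hahaha"

// Without the group, + applies only to the 'a'
console.log("hahaha!".match(/ha+/)[0]); // "ha"
```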
Reuse Patterns Using Capture Groups
- Say you want to match a word that occurs multiple times, like below.
- `let repeatStr = "row row row your boat";`
- You could use `/row row row/`, but what if you don't know the specific word repeated? Capture groups can be used to find repeated substrings.
- **Capture groups are constructed by enclosing the regex pattern to be captured in parentheses.**
- In this case, the goal is to capture a word consisting of alphanumeric characters, so the capture group will be `\w+` enclosed by parentheses: `/(\w+)/`.
- The substring matched by the group is saved to a temporary “**variable**”, which can be accessed within the same regex using *a backslash and the number of the capture group* (e.g. `\1`).
- Capture groups are automatically numbered by the position of their opening parentheses (left to right), starting at 1.
```js
let repeatStr = "row row row your boat";
let repeatRegex = /(\w+) \1 \1/; // \1 reuses whatever group 1 matched
repeatRegex.test(repeatStr); // Returns true
repeatStr.match(repeatRegex); // Returns ["row row row", "row"]; the 2nd element is the content of the capture group
```
```js
let repeatNum = "42 42 42";
let reRegex = /(\w+)\s\1\s\1/;
let result = repeatNum.match(reRegex); // [ '42 42 42', '42' ]; the 2nd element is the content of the capture group
console.log(result);
```
```js
let repeatNum = "42 42 42";
// ^ and $ anchor the pattern, so the whole string must be one number repeated
// exactly three times, separated by single whitespace characters
let reRegex = /^(\d+)\s\1\s\1$/;
let result = reRegex.test(repeatNum);
console.log(result); // true
```
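Backreferences are handy for spotting accidentally doubled words. A sketch, assuming a made-up sentence; `\b` is a word boundary, used here so that only whole words are compared:

```js
// (\w+) captures a word; \1 requires the same word again after some whitespace
let doubledWord = /\b(\w+)\s+\1\b/;

let typo = "Paris in the the spring";
let found = typo.match(doubledWord);
console.log(found[0]); // "the the"
console.log(found[1]); // "the": content of capture group 1
```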
Use Capture Groups to Search and Replace
- Searching is useful. However, *you can make searching even more powerful when it also changes (or replaces) the text you match.*
- You can search and replace text in a string using `.replace()` on a string.
- The first input to `.replace()` is *the regex pattern you want to search for*. *The second parameter is the string to replace the match, or a function to do something.*
```js
let wrongText = "The sky is silver.";
let silverRegex = /silver/;
wrongText.replace(silverRegex, "blue"); // Returns "The sky is blue."
```
You can also access capture groups in the replacement string with dollar signs (`$`).
```js
"Code Camp".replace(/(\w+)\s(\w+)/, '$2 $1'); // Returns "Camp Code"
```
* Write a regex `fixRegex` using three capture groups that will search for each word in the string `one two three`. Then update the `replaceText` variable to replace `one two three` with the string `three two one` and assign the result to the `result` variable. Make sure you are utilizing capture groups in the replacement string using the dollar sign (`$`) syntax.
```js
let str = "one two three";
let fixRegex = /(\w+)\s(\w+)\s(\w+)/; // three capture groups, one per word
let replaceText = "$3 $2 $1"; // reverse the order of the groups
let result = str.replace(fixRegex, replaceText);
console.log(result); // Returns "three two one"
```
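Two more `.replace()` sketches: a function as the second argument (it receives each match and returns its replacement), and capture groups reordering a date. The date string is a made-up example in `YYYY-MM-DD` form:

```js
// Function replacement: uppercase every word
let shout = "one two three".replace(/\w+/g, (word) => word.toUpperCase());
console.log(shout); // "ONE TWO THREE"

// Capture groups: YYYY-MM-DD to DD/MM/YYYY
let date = "2024-03-15".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
console.log(date); // "15/03/2024"
```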
Remove Whitespace from Start and End
```js
let hello = " Hello, World! ";
let wsRegex = /^\s+|\s+$/g; // leading OR trailing whitespace; g removes both
let result = hello.replace(wsRegex, ''); // replace each match with an empty string
console.log(result); // "Hello, World!" with leading and trailing spaces removed
```
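The `g` flag matters here: without it, `.replace()` stops after the first match, so only the leading whitespace is removed. A sketch comparing both forms with the built-in `String.prototype.trim()`; `JSON.stringify` is used only to make the remaining spaces visible:

```js
let hello = "   Hello, World!   ";

// Without g: only the first match (the leading spaces) is replaced
console.log(JSON.stringify(hello.replace(/^\s+|\s+$/, "")));  // "Hello, World!   "
// With g: leading and trailing whitespace are both removed
console.log(JSON.stringify(hello.replace(/^\s+|\s+$/g, ""))); // "Hello, World!"
// trim() gives the same result for this string
console.log(JSON.stringify(hello.trim()));                    // "Hello, World!"
```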
Create a regular expression to check for a valid image file extension, as described below:
regex = `/^([^\s]+(\.(jpe?g|png|gif|bmp))$)/i`
Where:
- `^` represents the start of the string.
- `(` represents the start of group 1.
- `[^\s]+` means the file name must contain at least one character, and no whitespace.
- `(` represents the start of group 2.
- `\.` means the name must be followed by a dot (`.`).
- `(` represents the start of group 3.
- `jpe?g|png|gif|bmp` means the string must end with a `jpg`, `jpeg`, `png`, `gif`, or `bmp` extension.
- `)` represents the end of group 3.
- `)` represents the end of group 2.
- `$` represents the end of the string.
- `)` represents the end of group 1.
- The `i` flag makes the match case-insensitive. Some engines, such as Java's, write this as an inline `(?i)` inside the pattern; JavaScript does not support that form, so use the `i` flag instead.
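A sketch of the image-extension check in JavaScript, with case-insensitivity expressed through the `i` flag. The file names are hypothetical, each mapped to the result the rules above call for:

```js
// i flag after the closing slash: JPEG, Png, etc. are accepted too
let imageRegex = /^([^\s]+(\.(jpe?g|png|gif|bmp))$)/i;

const fileNames = {
  "abc.png": true,
  "im.JPEG": true,        // extension case is ignored
  "photo.jpg": true,
  ".gif": false,          // needs at least one character before the dot
  "my photo.jpg": false,  // whitespace is not allowed
  "notes.txt": false,     // not an image extension
  "image.png.zip": false, // must END with an image extension
};

for (const [name, expected] of Object.entries(fileNames)) {
  console.log(name, imageRegex.test(name), expected);
}
```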