Thursday, March 17, 2011

Regular Expression in JavaScript

I was struggling to get familiar with JavaScript's regular expression handling and thought what I learned might help others. Here I'll explain the exec() function, which searches a given string for matches of a regular expression.

A regular expression literal in JavaScript is written between two "/" characters, i.e. /REG_EXP/; you don't need to wrap it in ' or " characters. The flags "i" and "g" placed after the last "/" request a case-insensitive search and a global search respectively. REG_EXP can contain the usual regular expression syntax. Another important thing: the backslash "\" introduces special sequences such as \d in /\d{3}/, and it is also how you escape a character with a special meaning when you want to match it literally, e.g. /\./ matches a real dot. That's it! We are now almost ready to use regular expressions. The following is a quick reference of the most used RegEx literals:

Position Matching: 
^ (Start of String), $ (End of String), \b (Word Boundary) and \B (Non-Word Boundary)
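To see the position anchors in action, here is a small sketch. The sample string and the variable names are made up for illustration.

```javascript
var s = "cat concat category";

var startsWithCat = /^cat/.test(s);    // true: the string begins with "cat"
var endsWithCat = /cat$/.test(s);      // false: the string ends with "category"
var wholeWord = s.match(/\bcat\b/g);   // ["cat"]: only the standalone word
var wordStart = s.match(/\bcat/g);     // ["cat", "cat"]: from "cat" and "category"
var insideWord = s.match(/\Bcat/g);    // ["cat"]: the one inside "concat"

console.log(startsWithCat, endsWithCat, wholeWord, wordStart, insideWord);
```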

RegEx Special Literals: \w (word character), \W (non-word character), \d (digit), \D (non-digit), \s (whitespace), \S (non-whitespace)

{1}- exactly one time. {1,}- one or more times, * (Zero or More), + (One or More) and ? (Zero or One). Repetition symbols are used along with RegEx literals, e.g. \w* (Zero or More Word Characters), \d{3} (exactly 3 numeric characters)
[abc]- matches any one of the characters listed between the brackets. [a]- matches only "a". [^a]- matches any single character other than "a". [^#]+ matches a run of one or more characters, none of which is "#".
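The classes, repetition symbols and bracket sets above can be tried out with String.match(). This is only a sketch; the sample strings and variable names are my own.

```javascript
var text = "Order #123 shipped to Zip 90210 on 2011-03-17";

var digitRuns = text.match(/\d+/g);           // ["123", "90210", "2011", "03", "17"]
var exactlyThree = text.match(/\b\d{3}\b/g);  // ["123"]: a standalone run of 3 digits
var vowels = "regular".match(/[aeiou]/g);     // ["e", "u", "a"]
var betweenHashes = "a#bc#d".match(/[^#]+/g); // ["a", "bc", "d"]
var optionalU = /colou?r/.test("color");      // true: "u" may appear zero or one time
var realDot = /\./.test("no dots here");      // false: "\." means a literal dot

console.log(digitRuns, exactlyThree, vowels, betweenHashes, optionalU, realDot);
```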

Logical OR and Grouping:
/(ab)/- any occurrence of "ab". /(ab)|(xy)/ matches either "ab" or "xy".
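Here is a short sketch of grouping and OR together with the "i" flag. Note that each pair of parentheses also captures what it matched, which shows up as extra entries in the result of exec(). The input strings are invented for the example.

```javascript
// Each (...) group gets its own slot in the match array.
var m = /(ab)|(xy)/.exec("--xy--");
var whole = m[0];        // "xy"
var firstGroup = m[1];   // undefined: the (ab) branch did not take part
var secondGroup = m[2];  // "xy"
var position = m.index;  // 2

// A repetition symbol after a group repeats the whole group.
var repeated = /(ab)+/.exec("xababab")[0];   // "ababab"

// The "i" flag makes the same pattern ignore case.
var withoutFlag = /(ab)|(xy)/.test("XY");    // false
var withFlag = /(ab)|(xy)/i.test("XY");      // true

console.log(whole, firstGroup, secondGroup, position, repeated, withoutFlag, withFlag);
```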

Now, let's start with some example scenarios. The following code snippet finds every occurrence of the substring "ex":
var str = "some string example with expression"; // change this to your HTML string
var rg = /ex/g; // replace this with your regular expression
var ma = null;
// console.log writes to the browser console (I'm using Firebug)
while ((ma = rg.exec(str)) != null) {
  console.log("Match Index=" + ma.index + " #" + ma[0]);
  console.log("REG LastIndex=" + rg.lastIndex + " GL=" + rg.global + " Source=" + rg.source);
}
The exec() method stores the index at which the next search should resume in the lastIndex property of the regular expression variable, i.e. "rg". Make sure you make it a global search with /g; otherwise lastIndex is never advanced, every call starts again from the beginning, and you end up in an infinite loop!
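To make the lastIndex behaviour visible without a console, here is a small sketch that collects the values into an array and then compares a global and a non-global regular expression. The names globalRx, plainRx and found are mine, not part of any API.

```javascript
var str = "some string example with expression";

// Global regex: exec() moves lastIndex forward after every match.
var globalRx = /ex/g;
var found = [];
var m;
while ((m = globalRx.exec(str)) !== null) {
  found.push({ index: m.index, lastIndex: globalRx.lastIndex });
}
// found is [{index: 12, lastIndex: 14}, {index: 25, lastIndex: 27}]
// The final failed exec() resets lastIndex to 0.
var afterLoop = globalRx.lastIndex; // 0

// Non-global regex: lastIndex stays at 0, so exec() keeps
// returning the first match. In a while loop this never ends.
var plainRx = /ex/;
var firstCall = plainRx.exec(str).index;   // 12
var secondCall = plainRx.exec(str).index;  // 12 again
var plainLastIndex = plainRx.lastIndex;    // 0

console.log(found, afterLoop, firstCall, secondCall, plainLastIndex);
```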

Let's look at regular expressions that replace all width and height attributes in an HTML string. I'll replace all width and height values with 285 and 200 respectively and keep the "px" postfix (if there is any) as it is.
var html = '[div style="width: 400px; height: 500px"][iframe width="400" height="500"][/div]';

// Regular Expression for width and height attributes
// [Word Start]width="[One or More Digit]"
var wdrx = /\bwidth="\d+"/ig;
var htrx = /\bheight="\d+"/ig;
// Regular Expression for width and height within Style
// [Word Start]width:[Zero Or more space][One or more Digit]
var style_wdrx = /\bwidth:\s*\d+/ig;
var style_htrx = /\bheight:\s*\d+/ig;  

//Replace statements-
html = html.replace(wdrx, 'width="285"');
html = html.replace(htrx, 'height="200"');
html = html.replace(style_wdrx, 'width: 285');
html = html.replace(style_htrx, 'height: 200');
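For this sample string, the four replace statements leave the "px" postfix untouched because the style expressions stop right after the digits. As an alternative sketch (not the approach used above), the four expressions could be folded into one by capturing the attribute name and its separator and passing a function to replace(). The names sizes, combined and out are hypothetical, and unlike the expressions above this one does not insist on a closing quote after the digits.

```javascript
var html = '[div style="width: 400px; height: 500px"][iframe width="400" height="500"][/div]';

// Target value for each attribute name (hypothetical lookup table).
var sizes = { width: 285, height: 200 };

// Group 1: the name. Group 2: either =" (attribute) or : plus spaces (style).
var combined = /\b(width|height)(="|:\s*)\d+/ig;

var out = html.replace(combined, function (match, name, separator) {
  // Keep the name and separator as found, swap only the digits.
  return name + separator + sizes[name.toLowerCase()];
});

console.log(out);
// [div style="width: 285px; height: 200px"][iframe width="285" height="200"][/div]
```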

It seems to be going well... :-) Now let's try something different: say you want to replace all HREF links ([a href=""]I'm a Link[/a]) in your HTML code snippet.

