XRegExp 0.2: Now With Named Capture

Update: A beta version of XRegExp 0.3 is now available as part of the RegexPal download package.

JavaScript's regular expression flavor doesn't support named capture. Well, says who? XRegExp 0.2 brings named capture support, along with several other new features. But first of all, if you haven't seen the previous version, make sure to check out my post on XRegExp 0.1, because not all of the documentation is repeated below.

Highlights

  • Comprehensive named capture support (New)
  • Supports regex literals through the addFlags method (New)
  • Free-spacing and comments mode (x)
  • Dot matches all mode (s)
  • Several other minor improvements over v0.1

Named capture

There are several different syntaxes in the wild for named capture. I've compiled the following table based on my understanding of the regex support of the libraries in question. XRegExp's syntax is included at the top.

Library           Capture                   Backreference                 In replacement  Stored at
XRegExp           (<name>…)                 \k<name>                      ${name}         result.name
.NET              (?<name>…), (?'name'…)    \k<name>, \k'name'            ${name}         Matcher.Groups('name')
Perl 5.10 (beta)  (?<name>…), (?'name'…)    \k<name>, \k'name', \g{name}  $+{name}        ??
Python            (?P<name>…)               (?P=name)                     \g<name>        result.group('name')
PHP preg (PCRE)   (.NET, Perl, and Python styles)                         $regs['name']   $result['name']

No other major regex library currently supports named capture, although the JGsoft engine (used by products like RegexBuddy) supports both .NET and Python syntax. XRegExp does not use a question mark at the beginning of a named capturing group because that would prevent it from being used in regex literals (JavaScript would immediately throw an "invalid quantifier" error).

XRegExp supports named capture on an on-request basis. You can add named capture support to any regex through the use of the new "k" flag. This is done for compatibility reasons and to ensure that regex compilation time remains as fast as possible in all situations.

Following are several examples of using named capture:

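// Add named capture support using the XRegExp constructor
var repeatedWords = new XRegExp("\\b (<word> \\w+ ) \\s+ \\k<word> \\b", "gixk");

// Add named capture support using RegExp, after overriding the native constructor
XRegExp.overrideNative();
var repeatedWords = new RegExp("\\b (<word> \\w+ ) \\s+ \\k<word> \\b", "gixk");

// Add named capture support to a regex literal
var repeatedWords = /\b (<word> \w+ ) \s+ \k<word> \b/.addFlags("gixk");

var data = "The the test data.";

// Check whether data contains repeated words
var hasDuplicates = repeatedWords.test(data);

// Use the regex to remove repeated words
var output = data.replace(repeatedWords, "${word}");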

In the above code, I've also used the x flag provided by XRegExp, to improve readability. Note that the addFlags method can be called multiple times on the same regex (e.g., /pattern/g.addFlags("k").addFlags("s")), but I'd recommend adding all flags in one shot, for efficiency.
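For comparison, the repeated-words example can also be written with an ordinary numbered group. This is only a rough native approximation, not XRegExp's actual compiled output: the names repeatedWordsNative, hasDuplicatesNative, and outputNative are made up for this sketch, and the whitespace permitted by the x flag has been removed by hand.

```javascript
// Native-JavaScript approximation of the named-capture example:
// (<word> ...) becomes group 1, \k<word> becomes \1, and ${word} becomes $1.
var repeatedWordsNative = /\b(\w+)\s+\1\b/gi;

var data = "The the test data.";

var hasDuplicatesNative = repeatedWordsNative.test(data);

// test() advances lastIndex on a global regex, so reset it before reusing the regex.
repeatedWordsNative.lastIndex = 0;

var outputNative = data.replace(repeatedWordsNative, "$1");
```

The numbered form works, but the pattern no longer says what group 1 means, which is the readability gap named capture is meant to close.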

Here are a few more examples of using named capture, with an overly simplistic URL-matching regex (for comprehensive URL parsing, see parseUri):

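var url = "http://microsoft.com/path/to/file?q=1";
var urlParser = new XRegExp("^(<protocol>[^:/?]+)://(<host>[^/?]*)(<path>[^?]*)\\?(<query>.*)", "k");
// Named groups are stored as properties of the result array (parts.protocol, parts.host, parts.path, parts.query)
var parts = urlParser.exec(url);

// Named backreferences are also available in replace() callback functions, as properties of the first argument
var newUrl = url.replace(urlParser, function(match){
	return match.replace(match.host, "yahoo.com");
});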

Note that XRegExp's named capture functionality does not support deprecated JavaScript features including the lastMatch property of the global RegExp object and the RegExp.prototype.compile() method.
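To make the mechanism more concrete, here is a minimal sketch of how named groups could be mapped onto ordinary numbered groups. It is a simplified illustration rather than XRegExp's own code: compileNamed and execNamed are hypothetical helpers, and the sketch assumes the pattern uses only the (<name>…) and \k<name> syntax described above (no native (?<name>…) groups).

```javascript
// Hypothetical helper: rewrite (<name>...) and \k<name> into numbered-group syntax.
function compileNamed(pattern, flags) {
  var names = [];

  // Pass 1: skip escapes and character classes; strip <name> from capturing groups
  // and record each capturing group's name (or null if it is unnamed).
  var groupToken = /\\[\s\S]|\[(?:\\[\s\S]|[^\\\]])*\]|\((?:\?|<([$\w]+)>)?/g;
  var source = pattern.replace(groupToken, function (m, name) {
    if (m.charAt(0) !== "(" || m === "(?") return m; // escape, class, or (?...) group
    names.push(name || null);
    return "(";
  });

  // Pass 2: turn \k<name> into \N. If digits follow, separate them with an empty
  // group so that they are not read as part of the backreference number.
  var backrefToken = /\\k<([$\w]+)>(\d*)|\\[\s\S]|\[(?:\\[\s\S]|[^\\\]])*\]/g;
  source = source.replace(backrefToken, function (m, name, digits) {
    var index = name ? names.indexOf(name) : -1;
    return index > -1 ? "\\" + (index + 1) + (digits ? "(?:)" + digits : "") : m;
  });

  return { regex: new RegExp(source, flags), names: names };
}

// Hypothetical helper: run exec() and copy each named group onto the result array.
function execNamed(compiled, str) {
  var result = compiled.regex.exec(str);
  if (!result) return result;
  for (var i = 1; i < result.length; i++) {
    var name = compiled.names[i - 1];
    if (name) result[name] = result[i];
  }
  return result;
}

var sketchParser = compileNamed(
  "^(<protocol>[^:/?]+)://(<host>[^/?]*)(<path>[^?]*)\\?(<query>.*)", "");
var sketchParts = execNamed(sketchParser, "http://microsoft.com/path/to/file?q=1");
// sketchParts.protocol, sketchParts.host, sketchParts.path, and sketchParts.query
// now mirror sketchParts[1] through sketchParts[4].
```

Doing the rewrite in two passes keeps the list of names complete before any backreference is resolved, so a \k<name> that appears before its group in the pattern can still be looked up.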

Singleline (s) and extended (x) modes

The other non-native flags XRegExp supports are s (singleline) for "dot matches all" mode, and x (extended) for "free-spacing and comments" mode. For full details about these modifiers, see the FAQ in my XRegExp 0.1 post. However, one difference from the previous version is that XRegExp 0.2, when using the x flag, now allows whitespace between a regex token and its quantifier (quantifiers are, e.g., +, *?, or {1,3}). Although the previous version's handling/limitation in this regard was documented, it was atypical compared to other regex libraries. This has been fixed.
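As a rough illustration of what these two flags involve, the sketch below rewrites a pattern string before it is compiled. The functions dotMatchesAll and freeSpacing are hypothetical names introduced for this example and use a simpler tokenizer than the library's own regexes. Both skip escaped characters and character classes, and freeSpacing joins a quantifier to the preceding token when only whitespace or comments separate them.

```javascript
// Hypothetical helper for "s" mode: replace each unescaped dot outside a
// character class with [\S\s], which matches any character including newlines.
function dotMatchesAll(source) {
  var token = /\\[\s\S]|\[(?:\\[\s\S]|[^\\\]])*\]|\./g;
  return source.replace(token, function (m) {
    return m === "." ? "[\\S\\s]" : m;
  });
}

// Hypothetical helper for "x" mode: drop whitespace and #-comments outside
// character classes. If a quantifier follows the removed run, keep only the
// quantifier so that it applies to the preceding token; otherwise leave an empty
// group behind so that neighboring tokens (e.g. \1 and 0) do not merge.
function freeSpacing(source) {
  var token = /\\[\s\S]|\[(?:\\[\s\S]|[^\\\]])*\]|(?:\s|#[^\n\r]*)+([?*+]|\{\d+(?:,\d*)?\})?/g;
  return source.replace(token, function (m, quantifier) {
    var first = m.charAt(0);
    if (first === "\\" || first === "[") return m; // escape or character class
    return quantifier || "(?:)";
  });
}

var anyCharRegex = new RegExp(dotMatchesAll("a.b"));
var spacedSource = freeSpacing("\\d {2} - \\d +  # digits");
var spacedRegex = new RegExp("^" + spacedSource + "$");
```

Here anyCharRegex matches "a\nb", which the native /a.b/ does not, and spacedSource comes out as \d{2}(?:)-(?:)\d+(?:), so the quantifiers still bind to \d despite the spaces in front of them.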

The code


if (window.XRegExp === undefined) {
	var XRegExp;

	(function () {
		var native = {
			RegExp: RegExp,
			exec: RegExp.prototype.exec,
			match: String.prototype.match,
			replace: String.prototype.replace
		};

		XRegExp = function (pattern, flags) {
			return native.RegExp(pattern).addFlags(flags);
		};

		RegExp.prototype.addFlags = function (flags) {
			var pattern = this.source,
				useNamedCapture = false,
				re = XRegExp._re;

			flags = (flags || "") + native.replace.call(this.toString(), /^[\S\s]+\//, "");

			if (flags.indexOf("x") > -1) {
				pattern = native.replace.call(pattern, re.extended, function ($0, $1, $2) {
					return $1 ? ($2 ? $2 : "(?:)") : $0;
				});
			}

			if (flags.indexOf("k") > -1) {
				var captureNames = [];
				pattern = native.replace.call(pattern, re.capturingGroup, function ($0, $1) {
					if (/^\((?!\?)/.test($0)) {
						if ($1) useNamedCapture = true;
						captureNames.push($1 || null);
						return "(";
					} else {
						return $0;
					}
				});
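				if (useNamedCapture) {
					// Replace named backreferences (\k<name>) with numbered ones; trailing digits are
					// separated by an empty group so they are not read as part of the number
					pattern = native.replace.call(pattern, re.namedBackreference, function ($0, $1, $2) {
						var index = $1 ? captureNames.indexOf($1) : -1;
						return index > -1 ? "\\" + (index + 1).toString() + ($2 ? "(?:)" + $2 : "") : $0;
					});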
				}
			}

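			pattern = native.replace.call(pattern, re.characterClass, function ($0, $1) {
				// If ] is the leading character in a character class, escape it as \] for consistent cross-browser handling
				return $1 ? native.replace.call($0, /^(\[\^?)]/, "$1\\]") : $0;
			});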

			if (flags.indexOf("s") > -1) {
				pattern = native.replace.call(pattern, re.singleline, function ($0) {
					return $0 === "." ? "[\\S\\s]" : $0;
				});
			}

			var regex = native.RegExp(pattern, native.replace.call(flags, /[sxk]+/g, ""));

			if (useNamedCapture) {
				regex._captureNames = captureNames;
			// Preserve capture names when adding flags to a regex that has already been through addFlags("k")
			} else if (this._captureNames) {
				regex._captureNames = this._captureNames.valueOf();
			}

			return regex;
		};

		String.prototype.replace = function (search, replacement) {
			// If search is not a regex that uses named capture, defer to the native replace method
			if (!(search instanceof native.RegExp && search._captureNames)) {
				return native.replace.apply(this, arguments);
			}

			if (typeof replacement === "function") {
				return native.replace.call(this, search, function () {
					// Convert arguments[0] from a string primitive to a String object that can carry properties
					arguments[0] = new String(arguments[0]);

					for (var i = 0; i < search._captureNames.length; i++) {
						if (search._captureNames[i]) arguments[0][search._captureNames[i]] = arguments[i + 1];
					}
					return replacement.apply(window, arguments);
				});
			} else {
				return native.replace.call(this, search, function () {
					var args = arguments;
					return native.replace.call(replacement, XRegExp._re.replacementVariable, function ($0, $1, $2) {

						if ($1) {
							switch ($1) {
								case "$": return "$";
								case "&": return args[0];
								case "`": return args[args.length - 1].substring(0, args[args.length - 2]);
								case "'": return args[args.length - 1].substring(args[args.length - 2] + args[0].length);

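								default:
									// Numbered backreference: while the number exceeds the count of capturing groups,
									// move its last digit into a literal suffix (e.g. $10 with one group becomes $1 followed by "0")
									var literalNumbers = "";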
									$1 = +$1;
									while ($1 > search._captureNames.length) {
										literalNumbers = $1.toString().match(/\d$/)[0] + literalNumbers;
										$1 = Math.floor($1 / 10);
									}
									return ($1 ? args[$1] : "$") + literalNumbers;
							}

						} else if ($2) {
							// Named backreference (${name}): look up the group's position by name
							var index = search._captureNames.indexOf($2);
							return index > -1 ? args[index + 1] : $0;
						} else {
							return $0;
						}
					});
				});
			}
		};

		RegExp.prototype.exec = function (str) {
			var result = native.exec.call(this, str);
			if (!(this._captureNames && result && result.length > 1)) return result;

			for (var i = 1; i < result.length; i++) {
				var name = this._captureNames[i - 1];
				if (name) result[name] = result[i];
			}

			return result;
		};

		String.prototype.match = function (regexp) {
			if (!regexp._captureNames || regexp.global) return native.match.call(this, regexp);
			return regexp.exec(this);
		};
	})();
}

XRegExp._re = {
	extended: /(?:[^[#\s\\]+|\\(?:[\S\s]|$)|\[\^?]?(?:[^\\\]]+|\\(?:[\S\s]|$))*]?)+|(\s*#[^\n\r]*\s*|\s+)([?*+]|{\d+(?:,\d*)?})?/g,
	singleline: /(?:[^[\\.]+|\\(?:[\S\s]|$)|\[\^?]?(?:[^\\\]]+|\\(?:[\S\s]|$))*]?)+|\./g,
	characterClass: /(?:[^\\[]+|\\(?:[\S\s]|$))+|\[\^?(]?)(?:[^\\\]]+|\\(?:[\S\s]|$))*]?/g,
	capturingGroup: /(?:[^[(\\]+|\\(?:[\S\s]|$)|\[\^?]?(?:[^\\\]]+|\\(?:[\S\s]|$))*]?|\((?=\?))+|\((?:<([$\w]+)>)?/g,
	namedBackreference: /(?:[^\\[]+|\\(?:[^k]|$)|\[\^?]?(?:[^\\\]]+|\\(?:[\S\s]|$))*]?|\\k(?!<[$\w]+>))+|\\k<([$\w]+)>(\d*)/g,
	replacementVariable: /(?:[^$]+|\$(?![1-9$&`']|{[$\w]+}))+|\$(?:([1-9]\d*|[$&`'])|{([$\w]+)})/g
};

XRegExp.overrideNative = function () {
	// Replace the global RegExp constructor so that new RegExp(...) accepts the extra flags
	RegExp = XRegExp;
};

Array.prototype.indexOf = Array.prototype.indexOf || function (item, from) {
	var len = this.length;
	for (var i = (from < 0) ? Math.max(0, len + from) : from || 0; i < len; i++) {
		if (this[i] === item) return i;
	}
	return -1;
};

You can download it, or get the packed version (2.7 KB).

XRegExp has been tested in IE 5.5–7, Firefox 2.0.0.4, Opera 9.21, Safari 3.0.2 beta for Windows, and Swift 0.2.

Finally, note that the XRE object from v0.1 has been removed. XRegExp now only creates one global variable: XRegExp. To permanently override the native RegExp constructor/object, you can now run XRegExp.overrideNative();
