The lexer hack

The lexer hack

From Wikipedia, the free encyclopedia

In computer programmingthe
lexer hack
 (as opposed to "a lexer hack") describes a common solution to the problems in parsing ANSI
C
, due to the reference lexical grammar being context-sensitive. In C, classifying
a sequence of characters as a variable name or a type name requires contextual information of the phrase structure, which prevents one from having a context-free lexer based
only on the regular grammar of token classes.

Contents

  [hide

Problem[edit]

The problem is that in the following code, the lexical class of A cannot be determined without
further contextual information:

(A)*B

This code could be multiplication of two variables, in which case A is a variable;
written unambiguously:

A * B

Alternatively, it could be casting the dereferenced value of B to the type A,
in which case A is a typedef-name; written in
usual human-readable form, but still ambiguously from the point of view of the grammar:

(A) *B

In more detail, in a compiler, the lexer performs one
of the earliest stages of converting the source code to a program. It scans the text to
extract meaningful tokens, such as words, numbers, and strings. The parser analyzes sequences of tokens attempting to match them to syntax rules representing language structures, such as loops and variable declarations. A problem occurs
here if a single sequence of tokens can ambiguously match more than one syntax rule.

This ambiguity can happen in C if the lexer does not distinguish between variable and typedef identifiers.[1] For
example, in the C expression:

(A) * B

the lexer may find these tokens:

  1. left parenthesis
  2. identifier 'A'
  3. right parenthesis
  4. operator '*'
  5. identifier 'B'

The problem is precisely that the lexical class of A cannot be determined without further context: the parser can interpret this as variable A multiplied
by B or as type A casting the dereferenced value of B. This is known as the "typedef-name: identifier" problem, due to the name of the problematic production
rule
.[2]

The hack solution[edit]

The solution generally consists of feeding information from the parser's symbol
table
 back into the lexer. That is, rather than functioning as a pure one-way pipeline from
the lexer to the parser, there is a backchannel from the parser back to the lexer.
This mixing of the lexer and parser is generally regarded as inelegant, which is why it is called a "hack".

Without added context, the lexer cannot distinguish type identifiers from other identifiers without extra context because all identifiers have the same format. With the hack in the above example, when the
lexer finds the identifier A it should be able to classify the token as a type identifier. The rules of the language would be clarified by specifying that typecasts require a type identifier and the ambiguity disappears.

The problem also exists in C++ and parsers can
use the same hack.[1]

Alternative solutions[edit]

This problem does not arise (and hence needs no "hack" in order to solve) when using lexerless
parsing
 techniques, as these are intrinsically contextual. These are generally seen as less elegant designs, however, because they lack the modularity of having a concurrent lexer
and parser in a pipeline.

Some parser generators, such as the yacc-derived BtYacc ("Backtracking
Yacc"), give the generated parser the ability to try multiple attempts to parse the tokens. In the problem described here, if an attempt fails because of semantic information about the identifier, it can backtrack and attempt other rules.[3]

The relatively new LLVM-based parser Clang handles
the situation a completely different way, namely by using a non-reference lexical grammar. Clang's lexer does not attempt to differentiate between type names and variable names: it simply reports the current token as an identifier. The parser then uses Clang's semantic
analysis
 library to determine the nature of the identifier. This allows a much cleaner separation
of concerns
 and encapsulation of the lexer and parser, and is therefore
considered a much more elegant solution than The Lexer Hack by most modern software design metrics.[4] This
is also the approach used in most other modern languages, which do not distinguish different classes of identifiers in the lexical grammar, but instead defer them to the parsing or semantic analysis phase, when sufficient information is available.

See also[edit]

References[edit]

  1. Jump
    up to:a
     b Roskind,
    James A. (1991-07-11). "A YACC-able C++
    2.1 GRAMMAR, AND THE RESULTING AMBIGUITIES"
    .
  2. Jump
    up^
     Bendersky, Eli (2007-11-24). "The
    context sensitivity of C's grammar"
    .
  3. Jump
    up^
     "BtYacc
    3.0"
    . Based
    on yacc with modifications by Chris Dodd and Vadim Maslov.
  4. Jump
    up^
     Bendersky, Eli. "How
    Clang handles the type / variable name ambiguity of C/C++"
    .

Citations[edit]

时间: 2024-09-10 22:13:35

The lexer hack的相关文章

How Clang handles the type / variable name ambiguity of C/C++

原文地址: http://eli.thegreenplace.net/2012/07/05/how-clang-handles-the-type-variable-name-ambiguity-of-cc/ How Clang handles the type / variable name ambiguity of C/C++ July 5th, 2012 at 7:35 pm My previous articles on the context sensitivity and ambigu

实例解析一款针对IE6的CSS hack用法

本文通过实例向大家描述一下IE6的CSS hack用法,我们以一个LOGO为实例,讲解了针对IE6,应用CSS HACK设置有别于IE7和FF的效果.具体内容请看本文详细介绍,相信本文介绍一定会让你有所收获. 一款针对IE6的CSS hack用法实例 CSS网页布局的兼容性问题一直困扰着大家,其实在51cto.com以前的文章中也有着丰富的CSS HACK与CSS兼容性的文章,只是有些文章可能讲的比较抽象而没有实例化,不便于大家的理解. 现在要讲解的是一个关于IE6的CSS HACK的用法.我们

探究针对GoogleChrome的CSS hack写法

本节和大家一起学习一下针对GoogleChrome谷歌浏览器的CSS hack的使用,CSS hack是因为现有浏览器对标准的解析不同,为了兼容各浏览器,所采用的一种补救方法:也有人说CSS hack是一种类似作弊的手段,以欺骗浏览器的方式达到兼容的目的,是用浏览器的兼容性差异来解决浏览器的兼容性问题. CSS hack简介 CSS hack是因为现有浏览器对标准的解析不同,为了兼容各浏览器,所采用的一种补救方法. CSS hack是一种类似作弊的手段,以欺骗浏览器的方式达到兼容的目的,是用浏览

关于CSS hack的思考

css 我已经习惯了做好页面之后去解决不同浏览器的兼容性问题,不断的测试,不停的修改CSS hack以保证在大部分的浏览器上得到最佳效果.光IE就需要兼顾IE5.X与IE6,以后也许还要为IE7来写单独的CSS hack,或许是这样的工作做得多了开始讨厌这样的没有效益的劳动.就是为了去满足那些少数的IE5.0用户或是为了满足那些极端的Firefox或是 Opera的推崇者们,我需要花费多一倍的时间来研究这些,我开始思考当浏览器不断成长,不断更新,我们的CSS hack是不是要没完没了的写下去,并

实例代码:使用@media实现IE hack的方法

文章简介:众所周知,有些时候为了实现IE下的某些效果与现代浏览器一致,我们不得不使用一些hack手段来实现目的.比如说使用"\0","\"和"\9"来仅让IE某些版本识别,而对于现代浏览器来说,他会直接无视这些代码.今天我想为大家介绍的是使用@media实现IE hack的方法. 随着Responsive设计的流行,Medial Queries可算是越来越让人观注了.他可以让Web前端工程实现不同设备下的样式选择,让站点在不同的设备中实现不同的效

css hack:通用的css hack

文章简介:css hack . .all-IE{property:value\9;} :root .IE-9{property:value\0/;} .gte-IE-8{property:value\0;} .lte-IE-7{*property:value;} .IE-7{+property:value;} .IE-6{_property:value;} .not-IE{property//:value;} @-moz-document url-prefix() { .firefox{prop

AJAX Hacks之Hack 4. 接收XML格式的数据

ajax|xml|数据 AJAX Hacks之Hack 4. 接收XML格式的数据 当前的许多交换数据的技术都使用XML格式的数据,那是因为XML格式的数据被广泛的使用和支持.因此,不同用户可以已有的技术来生成.发送.接收XML数据而不需要使用别的工具转换数据的格式. 一个典型的例子就是一个GPS设备可以在任何地方共享它需要的数据.无论是在远行.或是户外活动,当把设备插入到计算机的UBS接口后,就可以向web发送数据了.GPS软件被设置为默认支持XML格式的数据.而web也使用xml格式的数据.

Ajax Hack 之hack 11 动态产生样式

ajax|动态 Ajax Hack 之hack 11 动态产生样式 为web内容动态定义和制定CSS样式. JavaScript和DOM编程允许用户定义CSS样式属性,并应用于页面元素.一个典型的例子是一个wiki页面允许用户设计自己页面的方案和样式. 通常情况下,比较好的方法是将样式定义从JavaScript代码中分离出来.这样的习惯可以使元素独立扩展,降低web页面元素的复杂性,使之更高效. 本hack和上一个相似,根据用户选择的样式,动态显示服务器信息.和前一个不同之处就是:这里是在代码里

Ajax Hack 之hack 12不刷新浏览器的情况下向服务器提交text或textarea的值

ajax|服务器|浏览器|刷新 Ajax Hack 之hack 12不刷新浏览器的情况下向服务器提交text或textarea的值 本节主要讲的是:将text或textarea的值平滑地传递给服务器. 当用户输入text或textarea的值以后,Ajax能将这些值自动的发给服务器.程序等待text的onblur 事件,然后使用request对象向服务器发送数据.在常用的情况是,用户点击一个按钮,然后将 整个form作为一个大的数据包向服务器发送.服务器相应也与此类似.例如,在线测试或者 教程能