Stream Tokenizing (Splitting Strings into Tokens)

Stream Tokenizing, as seen on the Sun website
In Tech Tips: June 23, 1998, an example of string tokenization was presented, using the class java.util.StringTokenizer.

There's also another way to do tokenization, using java.io.StreamTokenizer. StreamTokenizer operates on input streams (or, as in the example below, on a Reader) rather than strings. When it reads from a byte stream, each byte in the input stream is regarded as a character in the range '\u0000' through '\u00FF'.

StreamTokenizer is lower level than StringTokenizer, but offers more control over the tokenization process. The class uses an internal table to control how tokens are parsed, and this syntax table can be modified to change the parsing rules. Here's an example of how StreamTokenizer works:

import java.io.*;
import java.util.*;

public class streamtoken {
  public static void main(String args[])
  {
    if (args.length == 0) {
      System.err.println("missing input filename");
      System.exit(1);
    }

    // Hashtable used as a set: its keys are the unique words
    Hashtable<String, Object> wordlist = new Hashtable<String, Object>();

    try {
      FileReader fr = new FileReader(args[0]);
      BufferedReader br = new BufferedReader(fr);

      StreamTokenizer st = new StreamTokenizer(br);
      //StreamTokenizer st =
      //    new StreamTokenizer(new StringReader(
      //    "this is a test"));
      st.resetSyntax();        // clear the default syntax table
      st.wordChars('A', 'Z');  // only letters may form words
      st.wordChars('a', 'z');
      int type;
      Object dummy = new Object();
      while ((type = st.nextToken()) !=
        StreamTokenizer.TT_EOF) {
        if (type == StreamTokenizer.TT_WORD)
          wordlist.put(st.sval, dummy);
      }
      br.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

    // "enum" has been a reserved word since Java 5,
    // so the Enumeration variable is named "words"
    Enumeration<String> words = wordlist.keys();
    while (words.hasMoreElements())
      System.out.println(words.nextElement());
  }
}

In this example, a StreamTokenizer is created on top of a FileReader / BufferedReader pair that represents a text file. Note that a StreamTokenizer can also be made to read from a String by using StringReader as illustrated in the commented-out code shown above (StringBufferInputStream also works, although this class has been deprecated).
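The StringReader variant mentioned above can be tried without any input file. The following is a minimal sketch, not part of the original tip: the class name StringSource and the helper method words are made up for illustration, and the syntax table is set up the same way as in the program above.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only for this illustration.
class StringSource {
    // Returns the letter-only words found in an in-memory string.
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();
        st.wordChars('A', 'Z');
        st.wordChars('a', 'z');
        List<String> result = new ArrayList<String>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // ttype holds the type of the token just read
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    result.add(st.sval);
                }
            }
        } catch (IOException e) {
            // Not expected for an open StringReader; rethrown unchecked
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("this is a test")); // prints [this, is, a, test]
    }
}
```

Unlike the Hashtable version, this sketch keeps the words in a list, so duplicates and input order are preserved.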

The method resetSyntax is used to clear the internal syntax table, so that StreamTokenizer forgets any rules that it knows about parsing tokens and treats every character as "ordinary". Then wordChars is used to declare that only upper and lower case letters should be considered to form words. That is, the only word tokens (TT_WORD) that StreamTokenizer recognizes are sequences of upper and lower case letters. Every other character, including spaces and digits, comes back as a single-character token, which the program ignores.
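To see what the tokenizer actually returns with this syntax table, it can help to print every token rather than only the words. The sketch below is an illustration under the same resetSyntax / wordChars setup; the class name SyntaxTableDemo, the method describe, and the "WORD:" / "CHAR:" labels are invented for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, used only for this illustration.
class SyntaxTableDemo {
    // Labels each token as a word or as a single ordinary character.
    static List<String> describe(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();           // every character becomes "ordinary"
        st.wordChars('A', 'Z');     // letters are then declared word characters
        st.wordChars('a', 'z');
        List<String> tokens = new ArrayList<String>();
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_WORD) {
                    tokens.add("WORD:" + st.sval);
                } else {
                    // For an ordinary character, the token type is the character itself
                    tokens.add("CHAR:" + (char) type);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(describe("ab+c9")); // prints [WORD:ab, CHAR:+, WORD:c, CHAR:9]
    }
}
```

Note that the digit 9 is reported as an ordinary character here, because resetSyntax also switches off the default number parsing.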

nextToken is called repeatedly to retrieve words, and each resulting word is found in the public instance variable "st.sval". The words are inserted into a Hashtable, and at the end of processing the contents of the table are displayed, using an Enumeration as illustrated in Tech Tips: June 23, 1998. So the action of this program is to find all the unique words in a text file and display them.

StreamTokenizer also has special facilities for parsing numbers, quoted strings, and comments. It's a useful alternative to StringTokenizer, and is especially applicable if you are tokenizing input streams, or wish to exercise finer control over the tokenization process.
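As a rough illustration of those facilities, the sketch below leaves the default syntax table in place (no resetSyntax call), in which numbers are parsed and double quotes delimit strings, and additionally turns on C++-style comments with slashSlashComments. The class name DefaultSyntaxDemo, the method describe, and the output labels are assumptions made for this example, not part of the original tip.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, used only for this illustration.
class DefaultSyntaxDemo {
    // Labels words, numbers, double-quoted strings, and other characters.
    static List<String> describe(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.slashSlashComments(true);    // skip "//" comments to end of line
        List<String> tokens = new ArrayList<String>();
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_WORD) {
                    tokens.add("WORD:" + st.sval);
                } else if (type == StreamTokenizer.TT_NUMBER) {
                    tokens.add("NUMBER:" + st.nval);    // nval is a double
                } else if (type == '"') {
                    // For a quoted string, the token type is the quote character
                    // and sval holds the text between the quotes
                    tokens.add("STRING:" + st.sval);
                } else {
                    tokens.add("CHAR:" + (char) type);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(describe("total = 42 \"two words\" // ignored"));
        // prints [WORD:total, CHAR:=, NUMBER:42.0, STRING:two words]
    }
}
```

Numbers are always delivered as double values in the public field nval, which is why 42 is shown as 42.0.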

Time: 2024-10-28 20:17:55
