最长的组合词 Trie -- 谁能改写此算法？或者有更好的算法？

问题描述

找出txt文件中最长和第二长的组合词，文件有10万条记录，样本如下：以下样本中，最长的组合词是ratcatdogcatwordsforproblem.txt———————catcatscatsdogcatscatxdogcatsratdogdogcatsdoghippopotamusesratratcatdogcat————————————————————有Main.java和Trie.java两个文件，能通过编译并执行。哪位大牛能有更好的算法？或者修改一下此代码，令其有所改变？要注意以下几点：1。效率2。内存溢出Main.java————————importjava.io.BufferedReader;importjava.io.FileNotFoundException;importjava.io.FileReader;importjava.io.IOException;importjava.util.ArrayList;publicclassMain{publicstaticfinalStringFILENAME="wordsforproblem.txt";publicstaticvoidmain(String[]args){Trietrie=newTrie();intcount=0;Stringfirst="";Stringsecond="";loadFileToTrie(trie,FILENAME);try{BufferedReaderbr=newBufferedReader(newFileReader(FILENAME));Stringline;while((line=br.readLine())!=null){//Foreachwordinthefile,findallitsprefix.//Foreachprefix,ifthecorrespondingsuffixisvalid(i.e.contained//inthetrieorconstructedfromthewordsintrie),compareitwith//thelongestcompoundandmakecorrespondingoperations.ArrayList<String>prefix=trie.getAllPrefix(line);for(Stringp:prefix){Stringsuffix=line.substring(p.length());if(trie.isValidWord(suffix)){count++;if(line.length()>first.length())first=line;elseif(line.length()>second.length())second=line;}}}br.close();}catch(FileNotFoundExceptione){e.printStackTrace();}catch(IOExceptione){e.printStackTrace();}System.out.println("Numberofwordsthatcanbeconstructedusingotherwords:"+count);System.out.println("1stlongest:""+first+""");System.out.println("2ndlongest:""+second+""");}/***loadthewordscontainedinthefiletotheTrie.**@paramtrie*@paramfilename*/publicstaticvoidloadFileToTrie(Trietrie,Stringfilename){intcount=0;try{BufferedReaderbr=newBufferedReader(newFileReader(filename));Stringline;while((line=br.readLine())!=null){trie.insert(line);count++;}br.close();System.out.println("Totalnumberofwordsinthefile:"+count);}catch(FileNotFoundExceptione){e.printStackTrace();}catch(IOExceptione){e.printStackTrace();}}}————————————————————————————Trie.java——————————importjava.util.ArrayList;/***Trie-theknownprefixtreeordictionarytree.**@authorXuan*@version:Jun/3/2014*/publicclassTrie{privateNoderoot;publicTrie(){root=newNode('',false);}/***insertawordtothetrie*/publicvoidinsert(Stringword){intlen=word.length();char[]letters=word.toCharArray();Nodecur=root;for(inti=0;i<len;i++){if(cur.children[letters[i]-'a']==null)cur.children[letters[i]-'a']=newNode(letters[i],i==len-1?true:false);cur=cur.children[letters[i]-'a'];}}/****@paramword*@returntrueifthetriecontainstheinputword*/publicbooleancontains(Stringword){intlen=word.length();char[]letters=word.toCharArray();Nodecur=root;inti;for(i=0;i<len;i++){if(cur.children[letters[i]-'a']==null)returnfalse;elsecur=cur.children[letters[i]-'a'];}if(cur==null||!cur.terminal)returnfalse;returntrue;}/***@paramword*@returnallthevalidprefixofthewordwithinthetrie*/publicArrayList<String>getAllPrefix(Stringword){ArrayList<String>ret=newArrayList<String>();intlen=word.length();char[]letters=word.toCharArray();Nodecur=root;for(inti=0;i<len;i++){if(cur.children[letters[i]-'a']==null)returnret;else{cur=cur.children[letters[i]-'a'];if(cur.terminal&&i<len-1)ret.add(word.substring(0,i+1));}}returnret;}/***isValidWordreturnstrueeitherwhentheinputwordiscontainedinthetrie*orwhenthewordisacompoundofwordsinthetrie.*/publicbooleanisValidWord(Stringword){if(this.contains(word))returntrue;ArrayList<String>prefix=this.getAllPrefix(word);for(Stringp:prefix){if(isValidWord(word.substring(p.length())))returntrue;}returnfalse;}}classNode{charval;Node[]children;booleanterminal;Node(charvalue,booleanisTerminal){this.val=value;this.terminal=isTerminal;children=newNode[26];}}

解决方案

解决方案二：
自己顶，有大神吗？
解决方案三：
先声明一点，并没有看源代码因为你需要的更好的算法，而不是优化代码有以下思路:对数据记录进行拆分，并行处理这些记录疑问：——组合词是否实际存在？——最长组合词是指单词个数最多？
解决方案四：
疑问：——组合词是否实际存在？——最长组合词是指单词个数最多？_______________回答：1。组合词存在。2。字母个数最多。
解决方案五：
就是酱紫的啊
解决方案六：
先声明，并没有看源码。疑问：你的这么多字符之间，有","分割么？在读取过程中不会变成一整个字符串么？
解决方案七：
txt文件，换行的。这样是样板wordsforproblem.txt———————catcatscatsdogcatscatxdogcatsratdogdogcatsdoghippopotamusesratratcatdogcat

时间： 2024-07-31 08:07:56

最长的组合词 Trie -- 谁能改写此算法？或者有更好的算法？

问题描述

解决方案

最长的组合词 Trie -- 谁能改写此算法？或者有更好的算法？的相关文章

通过哪几种方式可以扩展出更多的关键词

深度学习性能提升的诀窍

如何准确选择长尾词

大众点评：长三角地区的消费幸福感相对较高

长尾关键字多长更能吸引用户点击

如何选择易出效果的长尾词

B2C网站通过长尾词让流量飞涨

关于长尾词优化分析

怎么比较两个字符串中相同的词？