问题描述
找出txt文件中最长和第二长的组合词,文件有10万条记录,样本如下:以下样本中,最长的组合词是ratcatdogcatwordsforproblem.txt———————catcatscatsdogcatscatxdogcatsratdogdogcatsdoghippopotamusesratratcatdogcat————————————————————有Main.java和Trie.java两个文件,能通过编译并执行。哪位大牛能有更好的算法?或者修改一下此代码,令其有所改变?要注意以下几点:1。效率2。内存溢出Main.java————————importjava.io.BufferedReader;importjava.io.FileNotFoundException;importjava.io.FileReader;importjava.io.IOException;importjava.util.ArrayList;publicclassMain{publicstaticfinalStringFILENAME="wordsforproblem.txt";publicstaticvoidmain(String[]args){Trietrie=newTrie();intcount=0;Stringfirst="";Stringsecond="";loadFileToTrie(trie,FILENAME);try{BufferedReaderbr=newBufferedReader(newFileReader(FILENAME));Stringline;while((line=br.readLine())!=null){//Foreachwordinthefile,findallitsprefix.//Foreachprefix,ifthecorrespondingsuffixisvalid(i.e.contained//inthetrieorconstructedfromthewordsintrie),compareitwith//thelongestcompoundandmakecorrespondingoperations.ArrayList<String>prefix=trie.getAllPrefix(line);for(Stringp:prefix){Stringsuffix=line.substring(p.length());if(trie.isValidWord(suffix)){count++;if(line.length()>first.length())first=line;elseif(line.length()>second.length())second=line;}}}br.close();}catch(FileNotFoundExceptione){e.printStackTrace();}catch(IOExceptione){e.printStackTrace();}System.out.println("Numberofwordsthatcanbeconstructedusingotherwords:"+count);System.out.println("1stlongest:""+first+""");System.out.println("2ndlongest:""+second+""");}/***loadthewordscontainedinthefiletotheTrie.**@paramtrie*@paramfilename*/publicstaticvoidloadFileToTrie(Trietrie,Stringfilename){intcount=0;try{BufferedReaderbr=newBufferedReader(newFileReader(filename));Stringline;while((line=br.readLine())!=null){trie.insert(line);count++;}br.close();System.out.println("Totalnumberofwordsinthefile:"+count);}catch(FileNotFoundExceptione){e.printStackTrace();}catch(IOExceptione){e.printStackTrace();}}}————————————————————————————Trie.java——————————importjava.util.ArrayList;/***Trie-theknownprefixtreeordictionarytree.**@authorXuan*@version:Jun/3/2014*/publicclassTrie{privateNoderoot;publicTrie(){root=newNode('',false);}/***insertawordtothetrie*/publicvoidinsert(Stringword){intlen=word.length();char[]letters=word.toCharArray();Nodecur=root;for(inti=0;i<len;i++){if(cur.children[letters[i]-'a']==null)cur.children[letters[i]-'a']=newNode(letters[i],i==len-1?true:false);cur=cur.children[letters[i]-'a'];}}/****@paramword*@returntrueifthetriecontainstheinputword*/publicbooleancontains(Stringword){intlen=word.length();char[]letters=word.toCharArray();Nodecur=root;inti;for(i=0;i<len;i++){if(cur.children[letters[i]-'a']==null)returnfalse;elsecur=cur.children[letters[i]-'a'];}if(cur==null||!cur.terminal)returnfalse;returntrue;}/***@paramword*@returnallthevalidprefixofthewordwithinthetrie*/publicArrayList<String>getAllPrefix(Stringword){ArrayList<String>ret=newArrayList<String>();intlen=word.length();char[]letters=word.toCharArray();Nodecur=root;for(inti=0;i<len;i++){if(cur.children[letters[i]-'a']==null)returnret;else{cur=cur.children[letters[i]-'a'];if(cur.terminal&&i<len-1)ret.add(word.substring(0,i+1));}}returnret;}/***isValidWordreturnstrueeitherwhentheinputwordiscontainedinthetrie*orwhenthewordisacompoundofwordsinthetrie.*/publicbooleanisValidWord(Stringword){if(this.contains(word))returntrue;ArrayList<String>prefix=this.getAllPrefix(word);for(Stringp:prefix){if(isValidWord(word.substring(p.length())))returntrue;}returnfalse;}}classNode{charval;Node[]children;booleanterminal;Node(charvalue,booleanisTerminal){this.val=value;this.terminal=isTerminal;children=newNode[26];}}
解决方案
解决方案二:
自己顶,有大神吗?
解决方案三:
先声明一点,并没有看源代码因为你需要的更好的算法,而不是优化代码有以下思路:对数据记录进行拆分,并行处理这些记录疑问:——组合词是否实际存在?——最长组合词是指单词个数最多?
解决方案四:
疑问:——组合词是否实际存在?——最长组合词是指单词个数最多?_______________回答:1。组合词存在。2。字母个数最多。
解决方案五:
就是酱紫的啊
解决方案六:
先声明,并没有看源码。疑问:你的这么多字符之间,有","分割么?在读取过程中不会变成一整个字符串么?
解决方案七:
txt文件,换行的。这样是样板wordsforproblem.txt———————catcatscatsdogcatscatxdogcatsratdogdogcatsdoghippopotamusesratratcatdogcat