编写一个Java站内搜索引擎系统
优采云 发布时间: 2022-06-06 17:16编写一个Java站内搜索引擎系统
如果用我们的小服务器去搞百度,搜狗那种引擎肯定是不行的,那些属于全网搜索,我们这里做一个站内搜索。类似于我们对网站里的资源进行搜索。
一、搜索引擎怎么搜索
搜索引擎就像一个小蜜蜂每天不停的采摘蜂蜜,就是去爬虫各个网页,然后通过爬取之后建立索引,以供于我们去搜索。建议大家别爬虫要不然被告了。
这里我们可以使用Python,或者下载文档压缩包。
为什么要用索引呢?
因为爬的数据太多了,不索引,难道去遍历吗?时间复杂度太大了。
这里我们需要建立索引,索引分别为正排索引,和倒排索引。
拿LOL举个例子吧,正排就相当于,我们提到无极剑圣的技能就可以联想到
故根据名字选技能
倒排索引就是LOL里面谁有剑
蛮王无极剑圣剑姬
故根据特点选择英雄
二、模块划分1、索引模块
1)扫描下载到的文档,分析内容,构建出,正排索引和倒排索引。并且把索引内容保存到文件中。
2)加载制作i好的索引。并提供一些API实现查正排和查倒排这样的功能。
2、搜索模块
1)调用索引模块,实现一个搜索的完整过程。
3、web模块
需要实现一个简单的web程序,能够通过网页的形式和用户进行交互。
包含了前端和后端。
三、怎么实现分词
分词的原理
1、基于词库
尝试把所有的词都进行穷举,把这些结果放到词典文件中。
2、基于统计
收集到很多的语料库,进行人工标注,知道了那些字在一起的概率比较大~
java中能够实现分词的第三方工具也是有很多的
比如ansj(听说唱的兄弟可能听过ansj,哈哈)这个就是一个maven中央仓库的分词第三方库。
我们直接下载最新版本然后放入pom.xml里面
test包里直接操作:我们使用这个测试代码直接搞。试一下这个包咋用。
import org.ansj.domain.Term;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />import org.ansj.splitWord.analysis.ToAnalysis;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />import java.util.List;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />public class TastAnsj {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> public static void main(String[] args) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> String str = "易大师是一个有超高机动性的刺客、战士型英雄,擅长利用快速的打击迅速击溃对手,易大师一般打野和走单人路,作为无极剑道的最后传人,易可以迅速砍出大量伤害,同时还能利用技能躲避猛烈的攻击,避开敌人的集火。";<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> List terms = ToAnalysis.parse(str).getTerms();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> for (Term term : terms) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> System.out.println(term.getName());<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />}
四、文件读取
把刚刚下载好的文档的路径复制到String中并且用常量标记。
这一步是为了用遍历的方法把所有html文件搞出来,我们这里用了一个递归,如果是绝对路径,就填加到文件链表,如果不是就递归,继续添加里面的值。
import java.io.File;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />import java.util.ArrayList;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /><br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /><br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />//读取刚刚文档<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />public class Parser {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> private static final String INPUT_PATH="D:/test/docs/api";<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> public void run(){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //整个Parser类的入口<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //1.根据路径,去枚举出所有的文件.(html);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> ArrayList fileList=new ArrayList();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> enumFile(INPUT_PATH,fileList);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> System.out.println(fileList);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> System.out.println(fileList.size());<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //2.针对上面罗列出的文件,打开文件,读取文件内容,并进行解析<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //3.把在内存中构造好的索引数据结构,保定到指定的文件中。<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //第一个参数表示从哪里开始遍历 //第二个表示结果。<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> private void enumFile(String inputPath,ArrayListfileList){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> File rootPath=new File(inputPath);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //listFiles 能够获取到一层目录下的文件<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> File[] files= rootPath.listFiles();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> for(File f:files){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //根据当前f的类型判断是否递归。<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //如果f是一个普通文件,就把f加入到fileList里面<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //如果不是就调用递归<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> if(f.isDirectory()){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> enumFile(f.getAbsolutePath(),fileList);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }else {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> fileList.add(f);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> public static void main(String[] args) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //通过main方法来实现整个制作索引的过程<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> Parser parser=new Parser();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> parser.run();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />}<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />
我们尝试运行一下,这里的文件也太多了吧,而且无论是什么都打印出来了。所以我们下一步就是把这些文件进行筛选,选择有用的。
else {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> if(f.getAbsolutePath().endsWith(",html"))<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> fileList.add(f);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />}<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />
这个代码就是只是针对末尾为html的文件,下图就是展示结果。
4.1 打开文件,解析内容。
这里分为三个分别是解析Title,解析Url,解析内容Content
4.1.1解析Title
f.getName()是直接读取文件名字的方法。
我们用的name.substring(0,f.getName().length()-5);为什么要用总的文件名字长度减去5呢,因为.HTML刚好就是五。
private String parseTitle(File f) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> String name= f.getName();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> return name.substring(0,f.getName().length()-5);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /><br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />
4.1.2解析Url操作
这里的url就是我们平时去一个浏览器输入一个东西下面会有一个url,这个url就是我们的绝对路径经过截取获得出我们的相对的目录,然后与我们的http进行拼接,这样就可以直接得到一个页面。
private String parseUrl(File f) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> String part1="https://docs.oracle.com/javase/8/docs/api/";<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> String part2=f.getAbsolutePath().substring(INPUT_PATH.length());<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> return part1+part2;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />
4.1.3解析内容
以为开关进行对数据的读取,以int类型读取,为什么要用int而不是char呢因为int类型读完之后就变成-1可以判断一下是否读取完毕。
具体代码如下很容易理解。
private String parseContent(File f) throws IOException {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //先按照一个一个字符来读取,以作为开关<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> try(FileReader fileReader=new FileReader(f)) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //加上一个是否拷贝的开关.<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> boolean isCopy=true;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //还需要准备一个结果保存<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> StringBuilder content=new StringBuilder();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> while (true){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //此处的read的返回值是int,不是char<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //如果读到文件末尾,就会返回-1,这是用int的好处;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> int ret = 0;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> try {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> ret = fileReader.read();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> } catch (IOException e) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> e.printStackTrace();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> if(ret==-1) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> break;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> char c=(char) ret;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> if(isCopy){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> if(c==''){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> isCopy=true;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /><br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> return content.toString();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> } catch (FileNotFoundException e) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> e.printStackTrace();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> return "";<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />
这一模块总的代码块如下:
import java.io.File;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />import java.io.FileNotFoundException;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />import java.io.FileReader;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />import java.io.IOException;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />import java.util.ArrayList;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /><br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />//读取刚刚文档<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />public class Parser {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> private static final String INPUT_PATH="D:/test/docs/api";<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> public void run(){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //整个Parser类的入口<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //1.根据路径,去枚举出所有的文件.(html);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> ArrayList fileList=new ArrayList();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> enumFile(INPUT_PATH,fileList);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> System.out.println(fileList);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> System.out.println(fileList.size());<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //2.针对上面罗列出的文件,打开文件,读取文件内容,并进行解析<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> for (File f:fileList){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> System.out.println("开始解析"+f.getAbsolutePath());<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> parseHTML(f);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //3.把在内存中构造好的索引数据结构,保定到指定的文件中。<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /><br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /><br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> private String parseTitle(File f) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> String name= f.getName();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> return name.substring(0,f.getName().length()-5);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /><br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> private String parseUrl(File f) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> String part1="https://docs.oracle.com/javase/8/docs/api/";<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> String part2=f.getAbsolutePath().substring(INPUT_PATH.length());<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> return part1+part2;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> private String parseContent(File f) throws IOException {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //先按照一个一个字符来读取,以作为开关<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> try(FileReader fileReader=new FileReader(f)) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //加上一个是否拷贝的开关.<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> boolean isCopy=true;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //还需要准备一个结果保存<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> StringBuilder content=new StringBuilder();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> while (true){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //此处的read的返回值是int,不是char<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //如果读到文件末尾,就会返回-1,这是用int的好处;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> int ret = 0;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> try {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> ret = fileReader.read();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> } catch (IOException e) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> e.printStackTrace();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> if(ret==-1) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> break;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> char c=(char) ret;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> if(isCopy){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> if(c==''){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> isCopy=true;<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /><br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> return content.toString();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> } catch (FileNotFoundException e) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> e.printStackTrace();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> return "";<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> private void parseHTML (File f){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //解析出标题<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> String title=parseTitle(f);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //解析出对应的url<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> String url=parseUrl(f);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //解析出对应的正文<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> try {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> String content=parseContent(f);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> } catch (IOException e) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> e.printStackTrace();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //第一个参数表示从哪里开始遍历 //第二个表示结果。<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> private void enumFile(String inputPath,ArrayListfileList){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> File rootPath=new File(inputPath);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //listFiles 能够获取到一层目录下的文件<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> File[] files= rootPath.listFiles();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> for(File f:files){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //根据当前f的类型判断是否递归。<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //如果f是一个普通文件,就把f加入到fileList里面<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //如果不是就调用递归<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> if(f.isDirectory()){<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> enumFile(f.getAbsolutePath(),fileList);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }else {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> if(f.getAbsolutePath().endsWith(".html"))<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> fileList.add(f);<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> public static void main(String[] args) {<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> //通过main方法来实现整个制作索引的过程<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> Parser parser=new Parser();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> parser.run();<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /> }<br style="outline: 0px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" />}
--- EOF ---