<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments for My experience while making AjaxTrend.com</title>
	<atom:link href="http://ajaxtrend.wordpress.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://ajaxtrend.wordpress.com</link>
	<description>AjaxTrend.com : making</description>
	<lastBuildDate>Tue, 18 Nov 2008 07:42:36 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on How to merge nutch indexes v 0.9 by Kimvais</title>
		<link>http://ajaxtrend.wordpress.com/2007/11/29/how-to-merge-nutch-indexes-v-09/#comment-12</link>
		<dc:creator>Kimvais</dc:creator>
		<pubDate>Tue, 18 Nov 2008 07:42:36 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxtrend.wordpress.com/2007/11/29/how-to-merge-nutch-indexes-v-09/#comment-12</guid>
		<description>I get an error when I run:

$ bin/nutch index new/indexes new/linkdb/ new/crawldb/ new/segments/*
Indexer: starting
Indexer: linkdb: new/crawldb
Indexer: adding segment: new/segments/20081118093959
Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : /opt/nutch-0.9/new/segments/20081118093959/crawl_fetch
Input path doesnt exist : /opt/nutch-0.9/new/segments/20081118093959/parse_data
Input path doesnt exist : /opt/nutch-0.9/new/segments/20081118093959/parse_text
        at org.apache.hadoop.mapred.InputFormatBase.validateInput(InputFormatBase.java:138)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:326)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:273)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:295)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:278)</description>
		<content:encoded><![CDATA[<p>I get an error when I run:</p>
<p>$ bin/nutch index new/indexes new/linkdb/ new/crawldb/ new/segments/*<br />
Indexer: starting<br />
Indexer: linkdb: new/crawldb<br />
Indexer: adding segment: new/segments/20081118093959<br />
Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : /opt/nutch-0.9/new/segments/20081118093959/crawl_fetch<br />
Input path doesnt exist : /opt/nutch-0.9/new/segments/20081118093959/parse_data<br />
Input path doesnt exist : /opt/nutch-0.9/new/segments/20081118093959/parse_text<br />
        at org.apache.hadoop.mapred.InputFormatBase.validateInput(InputFormatBase.java:138)<br />
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:326)<br />
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)<br />
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:273)<br />
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:295)<br />
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)<br />
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:278)</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How to merge nutch indexes v 0.9 by tigertail</title>
		<link>http://ajaxtrend.wordpress.com/2007/11/29/how-to-merge-nutch-indexes-v-09/#comment-11</link>
		<dc:creator>tigertail</dc:creator>
		<pubDate>Thu, 28 Aug 2008 20:19:32 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxtrend.wordpress.com/2007/11/29/how-to-merge-nutch-indexes-v-09/#comment-11</guid>
		<description>Good stuff on index merging. My question is about the last step, where we have to index all fetched URLs again. If crawl1 and crawl2 are both large, this takes a long time. Is there any way to avoid re-indexing? I tried

nutch merge mergeaall/index crawl1/indexes crawl2/indexes

It seems to work at first glance, because searches return some results. But when I click "cached" to view the detailed content for a URL, it returns an error.</description>
		<content:encoded><![CDATA[<p>Good stuff on index merging. My question is about the last step, where we have to index all fetched URLs again. If crawl1 and crawl2 are both large, this takes a long time. Is there any way to avoid re-indexing? I tried</p>
<p>nutch merge mergeaall/index crawl1/indexes crawl2/indexes</p>
<p>It seems to work at first glance, because searches return some results. But when I click "cached" to view the detailed content for a URL, it returns an error.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
