Best way to pre-process text messages using Hadoop
I am using Hadoop to process text messages (SMS). I am not sure of the best way to pre-process this data so that it can be searched efficiently. For example, after pre-processing, if someone searches for 'ny', they should be shown all messages containing the word 'ny'. Is it advisable to write the pre-processed data to an XML file rather than to a database?
Note: I have around 200k text messages in a .csv file.
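A minimal sketch of one common way to make word lookups like 'ny' efficient: an inverted index built as a map/reduce job. The mapper emits a (word, message id) pair for every distinct word in a message, and the reducer collects the ids for each word, so a search becomes a single dictionary lookup. This runs locally in plain Python and stands in for Hadoop's shuffle/sort phase with an ordinary sort; the CSV layout (an `id` column and a `text` column), the tokenizer, and the names `mapper`, `reducer`, `build_index` and `search` are hypothetical, chosen for illustration only.

```python
import csv
import io
import re
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: assumes the CSV has an "id" column and a "text" column.
SAMPLE_CSV = """id,text
1,Meet me in NY tomorrow
2,Running late. Be there soon
3,"ny trip booked, call me"
"""

# Assumed tokenizer: lower-cased runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def mapper(message_id, text):
    """Map step: emit (word, message_id) once per distinct word in a message."""
    for word in set(TOKEN_RE.findall(text.lower())):
        yield word, message_id


def reducer(word, message_ids):
    """Reduce step: collect the sorted, de-duplicated ids of messages containing word."""
    return word, sorted(set(message_ids))


def build_index(csv_file):
    """Simulate map -> shuffle/sort -> reduce locally over a CSV of messages."""
    pairs = []
    for row in csv.DictReader(csv_file):
        pairs.extend(mapper(int(row["id"]), row["text"]))
    pairs.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle/sort phase
    index = {}
    for word, group in groupby(pairs, key=itemgetter(0)):
        key, ids = reducer(word, (mid for _, mid in group))
        index[key] = ids
    return index


def search(index, word):
    """Look up the ids of all messages containing word (case-insensitive)."""
    return index.get(word.lower(), [])


if __name__ == "__main__":
    index = build_index(io.StringIO(SAMPLE_CSV))
    print(search(index, "ny"))
```

On a real cluster the same `mapper`/`reducer` logic would run as a Hadoop job over the 200k messages, and the resulting word-to-ids index could be written to HDFS or loaded into whatever store serves the searches.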
One way to get pre-processed data into HDFS is to first import the data (the CSV file, in your case) into a database, create a table or view that fine-tunes it to your needs, and then import that into HDFS using Sqoop. More information on Sqoop can be found here:
http://www.cloudera.com/blog/2009/06/introducing-sqoop/
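A minimal sketch of the "load the CSV into a database and create a view" step, assuming a CSV with `id`, `sender` and `text` columns. The table name `messages`, the view name `messages_for_hdfs`, and the clean-up done in the view (lower-casing, trimming, dropping empty messages) are hypothetical examples of "fine-tuning". SQLite is used only so the sketch runs standalone; for the actual Sqoop import you would load the data into a database server that Sqoop can connect to, as described in the Sqoop user guide.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: assumes "id", "sender" and "text" columns.
SAMPLE_CSV = """id,sender,text
1,555-0100,Meet me in NY tomorrow
2,555-0101,Running late. Be there soon
3,555-0100,"ny trip booked, call me"
"""


def load_messages(conn, csv_file):
    """Load the raw CSV into a table, then define a view shaped for export."""
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, text TEXT)"
    )
    rows = ((int(r["id"]), r["sender"], r["text"]) for r in csv.DictReader(csv_file))
    conn.executemany("INSERT INTO messages (id, sender, text) VALUES (?, ?, ?)", rows)
    # Example "fine-tuning": keep only id and lower-cased, trimmed, non-empty text.
    conn.execute(
        "CREATE VIEW messages_for_hdfs AS "
        "SELECT id, LOWER(TRIM(text)) AS text FROM messages WHERE TRIM(text) <> ''"
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_messages(conn, io.StringIO(SAMPLE_CSV))
    for row in conn.execute("SELECT id, text FROM messages_for_hdfs ORDER BY id"):
        print(row)
```

Doing the clean-up in a view keeps the raw messages untouched while giving the import a single, already-shaped source to read from.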
For doing a Sqoop import from a database, take a look at:
http://archive.cloudera.com/cdh/3/sqoop/sqoopuserguide.html#_connecting_to_a_database_server