- manOfSteelReview - classic word count example, using Man of Steel reviews from various websites. A planned extension is to use Flume to ingest hashtags from Twitter streams.
- averageWordLength - another classic MapReduce sample that computes the average word length
- invertedIndex - creates an inverted index of every word from a list of files
- logAnalysisCounter - example project demonstrating MapReduce Counters
- logAnalysis - example project demonstrating MapReduce partitioners, where individual log files are generated on a monthly basis
- tidem - text analyzer that processes text and reports on its word contents. It generates key-value pairs showing how many times each word occurs in the text. The result has a primary sort by word length and a secondary sort by ASCII order.
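
The ordering described for tidem can be illustrated outside Hadoop. The following is a minimal plain-Java sketch, not the project's code: the class name TidemOrderingSketch and its methods are hypothetical, and splitting on whitespace is an assumption. It counts words and keeps them sorted by length first, then by natural String order, which matches ASCII order for ASCII-only input.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class for illustration only; not taken from the tidem project.
class TidemOrderingSketch {

    // Primary sort: word length. Secondary sort: natural String order,
    // which matches ASCII order for ASCII-only words.
    static final Comparator<String> BY_LENGTH_THEN_ASCII =
            Comparator.comparingInt(String::length)
                      .thenComparing(Comparator.<String>naturalOrder());

    // Counts word occurrences; the TreeMap keeps keys in the order above.
    // Assumption: words are separated by whitespace.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>(BY_LENGTH_THEN_ASCII);
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one "word<TAB>count" pair per line in sorted order.
        countWords("to be or not to be That is the question")
                .forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In an actual MapReduce job the same ordering would typically be expressed through the key type or a custom sort comparator rather than a TreeMap; that part is not shown here.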

Note: For detailed instructions on setting up Hadoop in your local environment, see:
http://hadoop.apache.org/docs/stable/single_node_setup.html
- Clone this repository and download a Hadoop distribution from http://hadoop.apache.org/releases.html. You will need the libraries from the ${hadoop_home}/lib directory.
- Add the Hadoop /lib libraries to your classpath in your IDE of choice.

For example, to run the "logAnalysis" MapReduce job:
- Run mvn clean package from the project directory (e.g. ${user_workspace}/hadoop-mapreduce-local/code/logAnalysis/target)
- You should then be able to run the MapReduce job using appropriate input and output directories.
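
Since logAnalysis is described as generating log files per month, its output directory would be expected to hold one part file per reducer. The following plain-Java sketch shows one way month-based partitioning could work. It is an illustration under assumptions, not the project's code: the class and method names are hypothetical, and the log lines are assumed to carry a timestamp such as [10/Oct/2000:13:55:36 -0700]. The partitionFor method mirrors the shape of Hadoop's Partitioner.getPartition(key, value, numPartitions).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; names and log format are assumptions.
class MonthPartitionSketch {

    static final List<String> MONTHS = Arrays.asList(
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    // Extracts the month abbreviation from a line containing a timestamp
    // like [10/Oct/2000:13:55:36 -0700]; returns null if none is found.
    static String monthOf(String logLine) {
        int open = logLine.indexOf('[');
        if (open < 0) {
            return null;
        }
        String[] parts = logLine.substring(open + 1).split("/");
        return parts.length >= 2 ? parts[1] : null;
    }

    // Maps a month to a partition index in [0, numPartitions).
    // Unknown months fall into partition 0 in this sketch.
    static int partitionFor(String month, int numPartitions) {
        int index = MONTHS.indexOf(month);
        if (index < 0) {
            index = 0;
        }
        return index % numPartitions;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] \"GET /a.gif HTTP/1.0\" 200 2326";
        System.out.println(monthOf(line) + " -> partition " + partitionFor(monthOf(line), 12));
    }
}
```

With twelve partitions each month maps to its own index; with fewer, several months share a partition because of the modulo.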