rohitjoshi edited this page Sep 10, 2012 · 16 revisions

lua-mapreduce is a fast and easy MapReduce implementation for Lua, inspired by other map-reduce implementations and particularly by octopy in Python.

It doesn't aim to meet all your distributed computing needs, but its simple approach is amenable to a large proportion of parallelizable tasks. If your code has a for-loop, there's a good chance that you can make it distributed with just a few small changes.
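To illustrate the for-loop claim, here is a minimal local sketch in plain Lua: a word-count loop and the same computation split into an independent map step and a reduce step. The names `map_line`, `reduce_counts` and `count_words_mapreduce` are hypothetical and are not lua-mapreduce's task-file API; the local driver only stands in for what the server and clients would do over the network.

```lua
-- Sequential version: a plain for-loop that counts words.
local function count_words_loop(lines)
  local counts = {}
  for _, line in ipairs(lines) do
    for word in line:gmatch("%a+") do
      counts[word] = (counts[word] or 0) + 1
    end
  end
  return counts
end

-- Hypothetical map step: turn one input line into (word, 1) pairs.
-- Each call is independent, so different lines could run on different clients.
local function map_line(line, emit)
  for word in line:gmatch("%a+") do
    emit(word, 1)
  end
end

-- Hypothetical reduce step: combine all values emitted for one key.
local function reduce_counts(word, values)
  local total = 0
  for _, v in ipairs(values) do
    total = total + v
  end
  return total
end

-- Local driver standing in for the server: map every line,
-- group intermediate values by key, then reduce each group.
local function count_words_mapreduce(lines)
  local groups = {}
  local function emit(key, value)
    groups[key] = groups[key] or {}
    table.insert(groups[key], value)
  end
  for _, line in ipairs(lines) do
    map_line(line, emit)
  end
  local counts = {}
  for word, values in pairs(groups) do
    counts[word] = reduce_counts(word, values)
  end
  return counts
end

local lines = { "the quick brown fox", "the lazy dog", "the end" }
local a = count_words_loop(lines)
local b = count_words_mapreduce(lines)
print(a.the, b.the) -- prints 3 and 3, tab-separated
```

The body of the original loop moves almost unchanged into the map step; only the accumulation is pulled out into the reduce step, which is what makes the work distributable.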

It uses the following Lua modules:

  1. luasocket: TCP client-server connectivity
  2. copas: Coroutine Oriented Portable Asynchronous Services for Lua
  3. lualogging: logging
  4. serialize (included in this project)
  5. luafilesystem: used only in the task-file example to list the files in a directory; the lua-mapreduce client and server don't depend on this module

On Windows, you can install luaforwindows, which includes these modules.

On Linux/Unix/macOS as well as Windows, you can use LuaDist.
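After installing, a quick way to confirm the dependencies are visible to your Lua interpreter is to `require` each one inside `pcall`, so a missing module is reported instead of aborting the script. This is a minimal sketch; it assumes the usual `require` names for these packages (luasocket as `socket`, lualogging as `logging`, luafilesystem as `lfs`).

```lua
-- Usual require names (assumption): luasocket -> "socket",
-- lualogging -> "logging", luafilesystem -> "lfs".
-- lfs is only needed by the word-count example task-file.
local modules = { "socket", "copas", "logging", "lfs" }

-- Try to load each module; report it and collect the ones that fail.
local function check_modules(names)
  local missing = {}
  for _, name in ipairs(names) do
    local ok = pcall(require, name)
    print(string.format("%-8s %s", name, ok and "ok" or "MISSING"))
    if not ok then
      table.insert(missing, name)
    end
  end
  return missing
end

local missing = check_modules(modules)
if #missing > 0 then
  print("missing modules: " .. table.concat(missing, ", "))
end
```

Note that copas itself depends on luasocket, so if `socket` is missing, `copas` will usually be reported as missing too.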

Directory structure:

  1. lua-mapreduce-server.lua : the map-reduce server. It accepts connections from clients, sends each of them the task-file, and then sends them tasks to perform the map/reduce functions.

  2. lua-mapreduce-client.lua : connects to the server, receives tasks, and executes the map/reduce functions defined in the task-file.

  3. utils/utils.lua : Provides utility functions

  4. utils/serialize.lua : Provides table serialization functionality

  5. example/word-count-taskfile.lua : example task-file that counts words in all .lua files in the current directory. More details on how to create a task-file are given on the word-count example page of the wiki.
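Tasks and results travel between server and client over TCP, so Lua tables have to be turned into strings and back. The sketch below shows one common way to do that: write the table out as a Lua table constructor and compile it back with `load` (or `loadstring` on Lua 5.1). The functions `serialize` and `deserialize` are hypothetical; this is not the API or implementation of the project's `utils/serialize.lua`.

```lua
-- Minimal serializer sketch (NOT the project's utils/serialize.lua).
-- Handles nested tables with string, number and boolean keys/values.
-- Cyclic tables and functions are not supported; tostring may round
-- non-integer floats, and inf/nan will not load back.
local function serialize(value)
  local t = type(value)
  if t == "string" then
    return string.format("%q", value)
  elseif t == "number" or t == "boolean" then
    return tostring(value)
  elseif t == "table" then
    local parts = {}
    for k, v in pairs(value) do
      table.insert(parts, "[" .. serialize(k) .. "]=" .. serialize(v))
    end
    return "{" .. table.concat(parts, ",") .. "}"
  else
    error("cannot serialize a " .. t)
  end
end

-- Rebuild the table on the receiving side by compiling the string as a
-- Lua expression. loadstring exists in Lua 5.1; load accepts strings from 5.2 on.
local function deserialize(str)
  local compile = loadstring or load
  local chunk, err = compile("return " .. str)
  if not chunk then
    error(err)
  end
  return chunk()
end

-- Round-trip a message shaped like a hypothetical map result.
local msg = serialize({ task = "map", key = "file1.lua", counts = { the = 3 } })
local copy = deserialize(msg)
print(copy.task, copy.counts.the)
```

Compiling received text executes it, so this approach is only reasonable between peers that trust each other, as a server and its own clients do here.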

Example:

  1. Start Server: lua-mapreduce-server.lua -t task-file.lua [-s server-ip -p port -l loglevel]

  2. Start Client: lua-mapreduce-client.lua [-s server-ip -p port -l loglevel]
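Once a client is started, it receives the task-file from the server and then runs the map/reduce functions it defines on each task. The sketch below imitates the client side of that exchange locally: the task-file arrives as source text, is compiled once into a table of functions, and is then used to run a map task. The function names `mapfn` and `reducefn`, the `emit` callback, and the helpers `load_taskfile` and `run_map_task` are all assumptions for illustration; see the word-count example page for the real task-file format.

```lua
-- Hypothetical task-file source, as the server might send it over the socket.
-- The names mapfn/reducefn and the emit callback are assumptions.
local taskfile_source = [[
  local M = {}
  function M.mapfn(key, text, emit)
    for word in text:gmatch("%a+") do emit(word, 1) end
  end
  function M.reducefn(key, values)
    local total = 0
    for _, v in ipairs(values) do total = total + v end
    return total
  end
  return M
]]

-- Client side: compile the received source once into a table of functions.
local function load_taskfile(source)
  local compile = loadstring or load
  local chunk, err = compile(source, "=taskfile")
  if not chunk then
    error(err)
  end
  return chunk()
end

-- Client side: run one map task and collect its intermediate (key, value)
-- pairs so they could be sent back to the server.
local function run_map_task(taskfile, key, value)
  local pairs_out = {}
  taskfile.mapfn(key, value, function(k, v)
    table.insert(pairs_out, { k, v })
  end)
  return pairs_out
end

local tf = load_taskfile(taskfile_source)
local out = run_map_task(tf, "a.lua", "local x local y")
print(#out, tf.reducefn("local", { 1, 1 }))
```

Compiling the task-file once per connection, rather than once per task, keeps per-task overhead down to a function call.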

Todo

  1. Add support for handling failed tasks. Currently, if a client disconnects, the task it was handling is lost.
  2. Support multiple client connections based on the number of cores available on the computer, using copas for async I/O.
  3. Ability to send multiple task-files to the server.
  4. Add more example task-files.
  5. Add support for a filter step after reduce is performed.
  6. Possibly integrate with Apache Mesos.