
Recommendation regarding running local EDSL jobs in parallel. #1536

Open
zer0dss opened this issue Jan 30, 2025 · 0 comments
Labels
wontfix This will not be worked on

Comments


zer0dss commented Jan 30, 2025

Running Parallel EDSL Jobs Locally

Solution 1.

The recommended solution for running parallel EDSL jobs locally is to start each job as a separate process using Python’s multiprocessing module.

To achieve this, define your job logic within a worker function. Then, from the main script, start multiple processes that will execute this function.

Example:

from edsl import Jobs
import multiprocessing

def worker_function(crt_process_id, extra_data):
    """Create the job logic and run it."""
    job = Jobs.example()
    results = job.run()

    print(f"Process {crt_process_id} ended", results.select("answer.*"))

if __name__ == '__main__':
    processes = []
    extra_data = {}
    # Start each job in its own process so the jobs run truly in parallel.
    for i in range(10):
        new_process_job = multiprocessing.Process(target=worker_function, args=(i, extra_data))
        new_process_job.start()
        processes.append(new_process_job)

    # Wait for every process to finish.
    for process in processes:
        process.join()
        print(f"Process {process.pid} joined")
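If you want results returned to the parent process rather than only printed inside each worker, a process pool is a common alternative to managing `Process` objects by hand. The sketch below uses the standard library's `concurrent.futures.ProcessPoolExecutor`; `simulate_job` is a hypothetical stand-in for the EDSL job logic (replace its body with `Jobs.example().run()` and a `results.select(...)` call).

```python
import concurrent.futures

def simulate_job(crt_process_id):
    """Hypothetical stand-in: replace with Jobs.example().run() logic."""
    return f"result-{crt_process_id}"

def run_all(n_jobs=10, max_workers=4):
    # Each submitted call runs in its own worker process; results come back
    # to the parent in completion order via as_completed().
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(simulate_job, i) for i in range(n_jobs)]
        return [f.result() for f in concurrent.futures.as_completed(futures)]

if __name__ == "__main__":
    print(sorted(run_all()))
```

The pool also caps how many jobs run at once (`max_workers`), which is useful if each EDSL job itself spawns significant work.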

Solution 2 (slower if the jobs are CPU-intensive)

This solution uses async tasks to run multiple jobs concurrently within a single process.

import asyncio
from edsl import Jobs

async def worker_function(crt_task_id):
    """Create and run the job asynchronously."""
    job = Jobs.example()
    results = await job.run_async(disable_remote_inference=True, verbose=True, cache=False)
    print(f"Task {crt_task_id} ended", results.select("answer.*"))

async def main():
    """Create and run multiple async jobs concurrently."""
    tasks = [asyncio.create_task(worker_function(i)) for i in range(10)]
    await asyncio.gather(*tasks)
    
if __name__ == "__main__":
    asyncio.run(main())
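When launching many async jobs at once, you may also want to cap how many are in flight at the same time. A minimal sketch using `asyncio.Semaphore`, with `asyncio.sleep` standing in for the awaited `job.run_async(...)` call (the limit of 3 is an arbitrary choice for illustration):

```python
import asyncio

async def limited_worker(crt_task_id, semaphore, results):
    # The semaphore lets at most N workers past this point at once.
    async with semaphore:
        await asyncio.sleep(0.01)  # stand-in for: await job.run_async(...)
        results.append(crt_task_id)

async def main():
    semaphore = asyncio.Semaphore(3)  # at most 3 jobs in flight
    results = []
    await asyncio.gather(*(limited_worker(i, semaphore, results) for i in range(10)))
    return results

if __name__ == "__main__":
    print(asyncio.run(main()))
```

All tasks still run in one event loop, so this bounds concurrency without the per-process overhead of Solution 1.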

Not recommended: using Python threads

Using Python threads triggers issues in the async event loops used by the OpenAI library. Inside EDSL's OpenaiService we declare a single async client for each api_key, and because all threads execute calls through that same client, this triggers the issue in the async library.
Example notebook that triggers this issue: https://www.expectedparrot.com/content/f376bb2c-dfed-4361-9ac7-d4b2c4d38de4

@zer0dss zer0dss added the wontfix This will not be worked on label Jan 30, 2025