The following for loop is part of an iterative simulation process and is the main bottleneck in terms of computation time:

```
import numpy as np

n_int = 10

class Simulation(object):
    def loop(self):
        for itr in range(n_int):
            cols_red_list = []
            rows_list = list(range(2500))
            diff = np.random.uniform(-1, 1, (2500, 300))
            for row in rows_list:
                col = next(idx for idx, val in enumerate(diff[row, :]) if val < 0)
                cols_red_list.append(col)
            print(len(cols_red_list))

sim1 = Simulation()
sim1.loop()
```

Hence, I tried to parallelize it with the multiprocessing package, hoping to reduce the computation time:

```
import numpy as np
from multiprocessing import Pool, cpu_count
from functools import partial

n_int = 10

def crossings(row, diff):
    return next(idx for idx, val in enumerate(diff[row, :]) if val < 0)

class Simulation(object):
    def loop(self):
        for itr in range(n_int):
            rows_list = list(range(2500))
            diff = np.random.uniform(-1, 1, (2500, 300))
            if __name__ == '__main__':
                num_of_workers = cpu_count()
                print('number of CPUs : ', num_of_workers)
                pool = Pool(num_of_workers)
                cols_red_list = pool.map(partial(crossings, diff=diff), rows_list)
                pool.close()
                print(len(cols_red_list))
            # some code.....

sim1 = Simulation()
sim1.loop()
```

Unfortunately, the parallelized version turns out to be much slower than the sequential code. Hence my question: did I use the multiprocessing package properly in this particular example? Are there alternative ways to speed up the above for loop?
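For context, one alternative I am aware of is to vectorize the inner loop with NumPy instead of parallelizing it: `diff < 0` gives a boolean mask, and `np.argmax` along axis 1 returns the index of the first `True` in each row, which matches what the `next(...)` generator computes. A minimal sketch (variable names mirror the original code; the seeded generator is my addition for reproducibility):

```python
import numpy as np

n_int = 10
rng = np.random.default_rng(0)  # seeded RNG, added only for reproducibility

for itr in range(n_int):
    diff = rng.uniform(-1, 1, (2500, 300))
    # For each row, index of the first negative entry:
    # (diff < 0) is a boolean array, and argmax returns the first True per row.
    cols_red = np.argmax(diff < 0, axis=1)
    print(len(cols_red))
```

One caveat: if a row contains no negative value, `np.argmax` returns 0, whereas the original `next(...)` would raise `StopIteration`; with 300 uniform draws per row that case is extremely unlikely, but it is a behavioral difference.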