Profiling Code: What lines, and what functions are taking the most time?

Author
Dr. Nicholas Del Grosso

%load_ext line_profiler
import numpy as np

When code feels slow, guessing is almost always wrong. Performance problems rarely live where we expect them to. This notebook introduces a disciplined workflow for improving runtime without breaking correctness: first lock in behavior, then measure where time is actually spent, and only then optimize. The goal is not cleverness — it is controlled, evidence-based improvement.

The notebook is structured in three stages. First, you will learn how to use approval testing to safely refactor internal implementations without changing external behavior. Second, you will use line-level profiling to identify bottlenecks inside individual functions and make targeted improvements. Finally, you will move to full-script profiling, analyzing complete call stacks with professional profiling tools and applying the full workflow — protect, measure, optimize, verify — to a realistic simulation script.

Section 1: Utility Functions

def _approve(function, *args, save=False, **kwargs):
    """A (very) basic approval testing function."""
    from pathlib import Path
    from hashlib import sha256

    result = function(*args, **kwargs)
    result_code = sha256(bytes(str(result), encoding='utf8')).hexdigest()[:64]
    input_code = sha256(bytes(str((args, kwargs)), encoding='utf8')).hexdigest()[:64]
    approval_filepath = Path(f".approvaltests/{function.__name__}__{input_code}.verify.pkl")
    
    approval_filepath.parent.mkdir(exist_ok=True)
    if save or not approval_filepath.exists():
        approval_filepath.write_text(result_code)
        return "Saved results."
    
    approved_result_code = approval_filepath.read_text()
    if result_code == approved_result_code:
        return "Approved: No changes detected."
    
    raise AssertionError("New Result is different from Original Result. Either fix or override.")
    

class utils:
    approve = _approve

Section 2: Approval Testing: Changing Functions’ Internal Behavior without Changing Their External Behavior

Optimization almost always involves rewriting code. The danger is subtle breakage: changing internal implementation details that accidentally alter external behavior. Approval testing flips the focus from “is this implementation correct?” to “did the observable result change?” By recording known-good outputs and comparing against them after modification, we can refactor implementations freely while preserving externally visible behavior. In this section, you’ll practice separating what a function does from how it does it.
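The core mechanism can be sketched in a few lines: hash the known-good output once as a recording, then re-hash after every refactor and compare. (The `fingerprint` helper below is illustrative only; the notebook's `utils.approve` wraps the same idea with on-disk storage keyed by the inputs.)

```python
from hashlib import sha256

def fingerprint(result):
    # Reduce any result to a short, comparable hash string
    return sha256(str(result).encode("utf8")).hexdigest()[:16]

# Record the known-good output of the current implementation
approved = fingerprint(sum([1, 2, 3]))

# After refactoring, recompute and compare: same observable
# result means the test passes, regardless of implementation
assert fingerprint(3 + 2 + 1) == approved
```

Note that only the *output* is fingerprinted: any rewrite that preserves observable behavior passes, and any rewrite that changes it fails.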

Code Description
utils.approve(fun, input1, save=True) Records the function’s output for the given inputs into a file for later comparison.
utils.approve(fun, input1) Raises an error if the result has changed; otherwise, reports that no changes were detected.

Exercises

In each of these exercises, the goal is to take a given function and follow the approval testing procedure:

  1. Run the function with some inputs, saving the results.
  2. Change the code, taking one of the suggested changes (note: some of the suggestions are totally wrong.)
  3. Run the approval tests again and check if the function’s results changed. If so, then something went wrong; you’ll need to change it to something else.

Exercise 1 (Example): “The Sum Total”

def total(x, y):
    return x + y

Task A: Run three different approval tests, so that the results of three different combinations of inputs are saved:

  • total(2, 6)
  • total(4, 10)
  • total(-3, 100)
utils.approve(total, 2, 6, save=True)
utils.approve(total, 4, 10, save=True)
utils.approve(total, -3, 100, save=True)
'Saved results.'

Task B: Update the code to a different method of computing total:

def total(x, y):
    """
    Possible Changes:
      1. np.sub(x, y)
      2. np.sum(x, y)
      3. np.sum([x, y])
    """
    return np.sum([x, y])

Task C: Run the same approval tests again (this time don’t save the results). If no errors appear, then it all worked! If there is an error, then something is wrong with the changed function; best to go back and fix it.

utils.approve(total, 2, 6)
utils.approve(total, 4, 10)
utils.approve(total, -3, 100)
'Approved: No changes detected.'

Exercise 2: “Spike Counting”

def count_spikes(spikes):
    total = 0
    for spike_count in spikes:
        total += spike_count
    return total

Task A: Run three different approval tests, so that the results of three different combinations of inputs are saved:

  • count_spikes([1, 0, 0, 2, 0, 1])
  • count_spikes([1])
  • count_spikes([1, 2])
Solution
utils.approve(count_spikes, [1, 0, 0, 2, 0, 1], save=True)
utils.approve(count_spikes, [1], save=True)
utils.approve(count_spikes, [1, 2], save=True)
'Saved results.'

Task B: Update the code to a different method of computing total:

def count_spikes(spikes):
    """
    Possible Changes:
      1. len(spikes)
      2. sum(spikes)
      3. list(spikes)
    """
    total = 0
    for spike_count in spikes:
        total += spike_count
    return total
Solution
def count_spikes(spikes):
    """
    Possible Changes:
      1. len(spikes)
      2. sum(spikes)
      3. list(spikes)
    """
    return sum(spikes)

Task C: Run the same approval tests again (this time don’t save the results). If no errors appear, then it all worked! If there is an error, then something is wrong with the changed function; best to go back and fix it.

Solution
utils.approve(count_spikes, [1, 0, 0, 2, 0, 1])
utils.approve(count_spikes, [1])
utils.approve(count_spikes, [1, 2])
'Approved: No changes detected.'

Section 3: Line Profiling with line_profiler: Which Lines are Taking the Most Time?

Code Description
%load_ext line_profiler loads the line_profiler tool into the notebook
%lprun -f fun1 fun1(1000) Runs the code fun1(1000) in the line profiler, while the function fun1 is being profiled.

Once correctness is protected, the next step is measurement. Performance intuition is unreliable, especially in numerical or scientific code. Line-level profiling allows us to see exactly where time is spent inside a function. Instead of arguing about style or speculating about bottlenecks, we can identify the specific lines that dominate runtime. This section focuses on developing the habit of measuring before optimizing, and prioritizing changes based on real data rather than assumptions.
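As a preview of the kind of fact profiling surfaces: the same membership test can cost wildly different amounts depending on the data structure behind it. A small illustration, using only the standard library (the `best_of` helper is made up for this sketch):

```python
import time

def best_of(fn, repeats=5):
    # Wall-clock best-of-N timing for a zero-argument callable
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)

# Membership in a list is a linear scan; in a set it is a hash lookup
t_list = best_of(lambda: 99_999 in haystack_list)
t_set = best_of(lambda: 99_999 in haystack_set)
```

Timing whole calls like this tells you *that* something is slow; line profiling tells you *which line* inside the call is responsible.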

Exercise 1 (Example): Searching through Data

def find_unique_values(data):    
    unique = []
    for x in data:
        if x not in unique:
            unique.append(x)
    return set(unique)


def main():
    data = np.random.randint(0, 10_000, 40_000)
    return find_unique_values(data)

Task A: Use line profiler’s lprun command to find how much time each line in find_unique_values() takes when main() is called.

Which line(s) take up the most time?

Solution
%lprun -f find_unique_values main()
Timer unit: 1e-09 s

Total time: 4.53463 s
File: /tmp/ipykernel_833931/264886108.py
Function: find_unique_values at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def find_unique_values(data):    
     2         1       1530.0   1530.0      0.0      unique = []
     3     40001   55057940.0   1376.4      1.2      for x in data:
     4     40000 4466187952.0 111654.7     98.5          if x not in unique:
     5      9825   12710811.0   1293.7      0.3              unique.append(x)
     6         1     671589.0 671589.0      0.0      return set(unique)

Task B: Write three approval tests, to save the behavior of the function:

Solution
np.random.seed(43)
utils.approve(find_unique_values, np.random.randint(0, 500, 2_000), save=True)
utils.approve(find_unique_values, np.random.randint(0, 500, 2_000), save=True)
utils.approve(find_unique_values, np.random.randint(0, 500, 2_000), save=True)
'Saved results.'

Task C: Below is a copy of the program. Update the code and profile it again, with the goal of reducing the runtime to less than 5% of the original runtime.

def find_unique_values(data):
    """
    Solutions:
      1. `set(data)` automatically finds unique values; no need to search!
    """
    unique = []
    for x in data:
        if x not in unique:
            unique.append(x)
    return set(unique)


def main():
    data = np.random.randint(0, 10_000, 40_000)
    find_unique_values(data)
Solution
def find_unique_values(data):
    """
    Solutions:
      1. `set(data)` automatically finds unique values; no need to search!
    """
    return set(data)


def main():
    data = np.random.randint(0, 10_000, 40_000)
    find_unique_values(data)
new_result = %lprun -f find_unique_values main()
Timer unit: 1e-09 s

Total time: 0.00446078 s
File: /tmp/ipykernel_833931/1239050040.py
Function: find_unique_values at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def find_unique_values(data):
     2                                               """
     3                                               Solutions:
     4                                                 1. `set(data)` automatically finds unique values; no need to search!
     5                                               """
     6         1    4460779.0 4.46e+06    100.0      return set(data)

Task D: Run the approval tests again (without saving): Confirm that the new version of the function gets the same result as the original version!

Solution
np.random.seed(43)
utils.approve(find_unique_values, np.random.randint(0, 500, 2_000))
utils.approve(find_unique_values, np.random.randint(0, 500, 2_000))
utils.approve(find_unique_values, np.random.randint(0, 500, 2_000))
'Approved: No changes detected.'

Exercise 2: Time-Wasters

import time
import os

def analyze_graph(n=4000):
    edges = {(i, (i * 37) % n) for i in range(n)}
    adjacency = {}

    with open(os.devnull, 'w') as f:
        for a, b in edges:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
            print('Current A:', a, 'Current B', b, file=f, flush=True)


    degrees = {node: len(neigh) for node, neigh in adjacency.items()}
    time.sleep(1.5) 
    heavy_nodes = [k for k, v in degrees.items() if v > 1]

    return sum(heavy_nodes)

def main():
    print("Number of Nodes:", analyze_graph(4000))

Task A: Use line profiler’s lprun command to find how much time each line in analyze_graph() takes when main() is called.

Which line(s) take up the most time?

Solution
%lprun -f analyze_graph main()
Number of Nodes: 7998000
Timer unit: 1e-09 s

Total time: 1.56272 s
File: /tmp/ipykernel_833931/2387032650.py
Function: analyze_graph at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def analyze_graph(n=4000):
     5         1    1573945.0 1.57e+06      0.1      edges = {(i, (i * 37) % n) for i in range(n)}
     6         1       1123.0   1123.0      0.0      adjacency = {}
     7                                           
     8         2     780427.0 390213.5      0.0      with open(os.devnull, 'w') as f:
     9      4001    6565795.0   1641.0      0.4          for a, b in edges:
    10      4000    6040365.0   1510.1      0.4              adjacency.setdefault(a, []).append(b)
    11      4000    5117400.0   1279.3      0.3              adjacency.setdefault(b, []).append(a)
    12      4000   40445346.0  10111.3      2.6              print('Current A:', a, 'Current B', b, file=f, flush=True)
    13                                           
    14                                           
    15         1    1210623.0 1.21e+06      0.1      degrees = {node: len(neigh) for node, neigh in adjacency.items()}
    16         1 1500328829.0  1.5e+09     96.0      time.sleep(1.5) 
    17         1     601420.0 601420.0      0.0      heavy_nodes = [k for k, v in degrees.items() if v > 1]
    18                                           
    19         1      54434.0  54434.0      0.0      return sum(heavy_nodes)

Task B: Write three approval tests, to save the behavior of the function:

Solution
utils.approve(analyze_graph, 10, save=True)
utils.approve(analyze_graph, 30, save=True)
utils.approve(analyze_graph, 95, save=True)
'Saved results.'

Task C: Below is a copy of the program. Update the code and profile it again, with the goal of reducing the runtime to less than 5% of the original runtime.

import time
import os

def analyze_graph(n=4000):
    """
    Solutions:
      1. Remove the biggest time-waster.
      2. Remove the (here) needless output
    
    """
    edges = {(i, (i * 37) % n) for i in range(n)}
    adjacency = {}

    with open(os.devnull, 'w') as f:
        for a, b in edges:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
            print('Current A:', a, 'Current B', b, file=f, flush=True)


    degrees = {node: len(neigh) for node, neigh in adjacency.items()}
    time.sleep(1.5) 
    heavy_nodes = [k for k, v in degrees.items() if v > 1]

    return sum(heavy_nodes)

def main():
    print("Number of Nodes:", analyze_graph(4000))
Solution
import time
import os

def analyze_graph(n=4000):
    """
    Solutions:
      1. Remove the biggest time-waster.
      2. Remove the (here) needless output
    
    """
    edges = {(i, (i * 37) % n) for i in range(n)}
    adjacency = {}

    with open(os.devnull, 'w') as f:
        for a, b in edges:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)


    degrees = {node: len(neigh) for node, neigh in adjacency.items()}
    heavy_nodes = [k for k, v in degrees.items() if v > 1]

    return sum(heavy_nodes)

def main():
    print("Number of Nodes:", analyze_graph(4000))
%lprun -f analyze_graph main()
Number of Nodes: 7998000
Timer unit: 1e-09 s

Total time: 0.0335783 s
File: /tmp/ipykernel_833931/472454989.py
Function: analyze_graph at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def analyze_graph(n=4000):
     5                                               """
     6                                               Solutions:
     7                                                 1. Remove the biggest time-waster.
     8                                                 2. Remove the (here) needless output
     9                                           
    10                                               """
    11         1    5670841.0 5.67e+06     16.9      edges = {(i, (i * 37) % n) for i in range(n)}
    12         1       1288.0   1288.0      0.0      adjacency = {}
    13                                           
    14         2    5913581.0 2.96e+06     17.6      with open(os.devnull, 'w') as f:
    15      4001    6615350.0   1653.4     19.7          for a, b in edges:
    16      4000    6938171.0   1734.5     20.7              adjacency.setdefault(a, []).append(b)
    17      4000    6770221.0   1692.6     20.2              adjacency.setdefault(b, []).append(a)
    18                                           
    19                                           
    20         1     986702.0 986702.0      2.9      degrees = {node: len(neigh) for node, neigh in adjacency.items()}
    21         1     548647.0 548647.0      1.6      heavy_nodes = [k for k, v in degrees.items() if v > 1]
    22                                           
    23         1     133538.0 133538.0      0.4      return sum(heavy_nodes)

Task D: Run the approval tests again (without saving): Confirm that the new version of the function gets the same result as the original version!

utils.approve(analyze_graph, 10)
utils.approve(analyze_graph, 30)
utils.approve(analyze_graph, 95)
'Approved: No changes detected.'

Exercise 3: Picking Your Battles

Below is a function that runs far too slowly. It has two sections, and let’s pretend that we only have time to speed up one of the two. (Note: it’s always this way in the real world; we just don’t have time to fix everything!) Which section would give us the biggest benefit from speeding up?

import numpy as np


def sum_of_squares(data):
    # Square all values
    squares = np.array([])
    for x in data:
        square = x ** 2
        squares = np.append(squares, square)

    # Sum them
    total = 0
    for x in squares:
        total += x
        
    return total


def main():
    data = np.random.random(100_000)
    print(sum_of_squares(data))

Task A: Use line profiler’s lprun command to find how much time each line in sum_of_squares() takes when main() is called.

Which line(s) take up the most time? Which section should, therefore, be improved?

Solution
%lprun -f sum_of_squares main()
33273.384700646435
Timer unit: 1e-09 s

Total time: 2.51441 s
File: /tmp/ipykernel_833931/871225197.py
Function: sum_of_squares at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def sum_of_squares(data):
     5                                               # Square all values
     6         1      16837.0  16837.0      0.0      squares = np.array([])
     7    100001   41269794.0    412.7      1.6      for x in data:
     8    100000   44149165.0    441.5      1.8          square = x ** 2
     9    100000 2371913885.0  23719.1     94.3          squares = np.append(squares, square)
    10                                           
    11                                               # Sum them
    12         1        437.0    437.0      0.0      total = 0
    13    100001   28136865.0    281.4      1.1      for x in squares:
    14    100000   28916922.0    289.2      1.2          total += x
    15                                           
    16         1       3790.0   3790.0      0.0      return total
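The table points squarely at `np.append`: each call allocates a fresh array and copies every existing element into it, so building an array this way costs quadratic time overall. A side-by-side sketch of the same loop with preallocation (sizes here chosen small just for illustration):

```python
import numpy as np

data = np.random.random(1_000)

# np.append copies the whole array on every call: O(n^2) in total
squares_appended = np.array([])
for x in data:
    squares_appended = np.append(squares_appended, x ** 2)

# Preallocating the output writes each element exactly once: O(n)
squares_prealloc = np.empty_like(data)
for i, x in enumerate(data):
    squares_prealloc[i] = x ** 2
```

Preallocation removes the quadratic copying; vectorizing the whole expression, as in the solution below, removes the Python loop as well.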

Task B: Write three approval tests, to save the behavior of the function:

  • sum_of_squares(np.array([1, 2, 3], dtype=np.float64))
  • sum_of_squares(np.array([42], dtype=np.float64))
  • sum_of_squares(np.arange(100, dtype=np.float64))
Solution
utils.approve(sum_of_squares, np.array([1, 2, 3], dtype=np.float64), save=True)
utils.approve(sum_of_squares, np.array([42], dtype=np.float64), save=True)
utils.approve(sum_of_squares, np.arange(100, dtype=np.float64), save=True)
'Saved results.'

Task C: Below is a copy of the program. Changing only one of the two original sections, update the code and profile it again, with the goal of reducing the runtime to less than 5% of the original runtime.

import numpy as np


def sum_of_squares(data):
    # Square all values (solution: squares = data ** 2)
    squares = np.array([])
    for x in data:
        square = x ** 2
        squares = np.append(squares, square)

    # Sum them (solution: total = np.sum(squares))
    total = 0
    for x in squares:
        total += x
        
    return total


def main():
    data = np.random.random(100_000)
    print(sum_of_squares(data))
Solution
import numpy as np


def sum_of_squares(data):
    squares = data ** 2
    total = np.sum(squares)
        
    return total


def main():
    data = np.random.random(100_000)
    print(sum_of_squares(data))
%lprun -f sum_of_squares main()
33197.880365912424
Timer unit: 1e-09 s

Total time: 0.00128098 s
File: /tmp/ipykernel_833931/850249149.py
Function: sum_of_squares at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def sum_of_squares(data):
     5         1     996206.0 996206.0     77.8      squares = data ** 2
     6         1     282927.0 282927.0     22.1      total = np.sum(squares)
     7                                           
     8         1       1845.0   1845.0      0.1      return total

Task D: Run the approval tests again (without saving): Confirm that the new version of the function gets the same result as the original version!

Solution
utils.approve(sum_of_squares, np.array([1, 2, 3], dtype=np.float64))
utils.approve(sum_of_squares, np.array([42], dtype=np.float64))
utils.approve(sum_of_squares, np.arange(100, dtype=np.float64))
'Approved: No changes detected.'

Section 4: Profiling Full Scripts with cProfile and snakeviz or pyinstrument

Line profiling is powerful for individual functions, but real systems involve call stacks across many layers. When performance problems span multiple functions, we need higher-level tools. In this section, you will profile an entire script using deterministic profiling (cProfile) and statistical sampling (pyinstrument). The objective is to understand where time accumulates across the full call hierarchy — not just within isolated functions — and to learn how different profilers present performance data.

Code Description
python -m cProfile -o output.prof my_script.py Runs the profiler on my_script.py and saves the output to output.prof
snakeviz output.prof Launches the Snakeviz viewer in the web browser
pyinstrument neural_simulation.py Profiles and outputs the analysis into the terminal
pyinstrument -r html neural_simulation.py Profiles and outputs the analysis in the web browser.
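When no terminal or browser is handy, cProfile can also be driven from Python and summarized with the standard-library pstats module. A minimal sketch (the `slow_part`/`workload` functions are invented for illustration):

```python
import cProfile
import io
import pstats

def slow_part():
    # Deliberately heavy: dominates the profile
    return sum(i * i for i in range(50_000))

def fast_part():
    return 42

def workload():
    slow_part()
    fast_part()

# Collect deterministic profiling data for the workload
pr = cProfile.Profile()
pr.enable()
workload()
pr.disable()

# Summarize the call stack, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(pr, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

The same `.prof` files that snakeviz visualizes can be loaded with `pstats.Stats("output.prof")` and inspected this way.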

Exercises

Run the cell below to create the file neural_simulation.py. This is the script that we’ll be profiling.

%%writefile neural_simulation.py

import time
import numpy as np


def generate_stimulus(n_trials: int, n_timepoints: int) -> np.ndarray:
    """Simulate a noisy stimulus time series."""
    base = np.sin(np.linspace(0, 10 * np.pi, n_timepoints))
    noise = np.random.randn(n_trials, n_timepoints) * 0.5
    return base + noise


def simulate_neural_response(stimulus: np.ndarray) -> np.ndarray:
    """Apply a simple nonlinear transformation."""
    response = np.tanh(stimulus)
    time.sleep(.02)
    return response


def compute_mean_firing_rate_loop(response: np.ndarray) -> np.ndarray:
    """Deliberately slow mean computation using Python loops."""
    n_trials, n_timepoints = response.shape
    means = np.zeros(n_trials)
    for i in range(n_trials):
        total = 0.0
        for j in range(n_timepoints):
            total += response[i, j]
        means[i] = total / n_timepoints
    return means


def compute_mean_firing_rate_vectorized(response: np.ndarray) -> np.ndarray:
    """Fast NumPy version."""
    return response.mean(axis=1)


def compute_pairwise_correlations(response: np.ndarray) -> float:
    """Compute average pairwise correlation (intentionally heavy)."""
    corr_matrix = np.corrcoef(response)
    return np.mean(corr_matrix)

def compute_statistics(response):
    slow_means = compute_mean_firing_rate_loop(response)
    fast_means = compute_mean_firing_rate_vectorized(response)
    return slow_means, fast_means


def analyze_trial_batch(n_trials: int, n_timepoints: int) -> float:
    stimulus = generate_stimulus(n_trials, n_timepoints)
    response = simulate_neural_response(stimulus)

    # Compare slow vs fast
    means1, means2 = compute_statistics(response)
    corr_value = compute_pairwise_correlations(response)

    return means1.mean() + means2.mean() + corr_value


def main():
    np.random.seed(42)

    results = []
    for _ in range(5):
        value = analyze_trial_batch(n_trials=200, n_timepoints=1000)
        results.append(value)

    print("Final result:", np.mean(results))


if __name__ == "__main__":
    main()
Overwriting neural_simulation.py

Exercise: Profile the Call Stack using cProfile and snakeviz

Exercise: In the terminal, use cProfile to profile this code, and use snakeviz to view the profiling results in the browser. Which two functions are taking up the most runtime?

pixi run python -m cProfile -o output.prof neural_simulation.py
pixi run snakeviz output.prof
Solution

Exercise: Profile the Call Stack using a Statistical Sampling Profiler: pyinstrument

Exercise: In the terminal, use pyinstrument to profile this code, and view the profiling results in the terminal. Based on the terminal output, which two functions are taking up the most runtime?

pixi run pyinstrument neural_simulation.py
Final result: 0.702670134829342

  _     ._   __/__   _ _  _  _ _/_   Recorded: 12:00:47  Samples:  267
 /_//_/// /_\ / //_// / //_'/ //     Duration: 1.692     CPU time: 20.528
/   _/                      v5.1.2

Program: neural_simulation.py

1.687 <module>  neural_simulation.py:1
├─ 1.402 main  neural_simulation.py:59
│  └─ 1.394 analyze_trial_batch  neural_simulation.py:48
│     ├─ 0.617 compute_pairwise_correlations  neural_simulation.py:37
│     │  └─ 0.617 corrcoef  numpy/lib/function_base.py:2757
│     │     └─ 0.607 cov  numpy/lib/function_base.py:2530
│     ├─ 0.402 compute_statistics  neural_simulation.py:42
│     │  └─ 0.402 compute_mean_firing_rate_loop  neural_simulation.py:20
│     ├─ 0.226 simulate_neural_response  neural_simulation.py:13
│     │  ├─ 0.121 [self]  neural_simulation.py
│     │  └─ 0.105 sleep  <built-in>
│     └─ 0.149 generate_stimulus  neural_simulation.py:6
│        ├─ 0.120 RandomState.randn  <built-in>
│        └─ 0.023 [self]  neural_simulation.py
└─ 0.284 <module>  numpy/__init__.py:1
      [22 frames hidden]  numpy

To view this report with different options, run:
    pyinstrument --load-prev 2026-03-02T12-00-47 [options]

Solution
├─ 1.402 main  neural_simulation.py:59
│  └─ 1.394 analyze_trial_batch  neural_simulation.py:48

Exercise: In the terminal, use pyinstrument to profile this code, and open the saved HTML file to view the profiling results in the browser. Which two functions are taking up the most runtime?

pixi run pyinstrument -r html neural_simulation.py
Solution

Project: Speed up the Code!

Now you combine everything: protect behavior with approval testing, measure bottlenecks with a profiler, zoom in with line profiling, and refactor strategically. The provided script intentionally mixes vectorized NumPy code with slow Python loops and artificial delays. The goal is not just to “make it faster,” but to follow a disciplined workflow: measure → prioritize → optimize → verify. If you skip steps, you risk either breaking correctness or optimizing the wrong thing.
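The verify step of that workflow can itself be automated. A tiny harness in that spirit (all names here are invented for illustration, and the "optimized" version is just a stand-in):

```python
import time

def protected_speedup(original, optimized, args, atol=1e-9):
    # Verify first: the optimized version must reproduce the original behavior
    expected = original(*args)
    result = optimized(*args)
    assert abs(result - expected) <= atol, "Optimization changed behavior!"

    # Only then measure: compare wall-clock time of both versions
    start = time.perf_counter(); original(*args); t_orig = time.perf_counter() - start
    start = time.perf_counter(); optimized(*args); t_new = time.perf_counter() - start
    return t_orig, t_new

slow = lambda xs: sum(x * x for x in xs)
fast = lambda xs: sum(x * x for x in xs)  # stand-in for a real optimization
t_orig, t_new = protected_speedup(slow, fast, ([1.0, 2.0, 3.0],))
```

Putting the assertion before the timing enforces the rule of the whole notebook: a speedup that changes behavior is not a speedup, it is a bug.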

Task: Below is the same code again, this time saved as neural_simulation2.py. This is a chance to try out the full workflow: using the profiler of your choice, and doing approval testing and line profiling, speed up the code!

%%writefile neural_simulation2.py

import time
import numpy as np


def generate_stimulus(n_trials: int, n_timepoints: int) -> np.ndarray:
    """Simulate a noisy stimulus time series."""
    base = np.sin(np.linspace(0, 10 * np.pi, n_timepoints))
    noise = np.random.randn(n_trials, n_timepoints) * 0.5
    return base + noise


def simulate_neural_response(stimulus: np.ndarray) -> np.ndarray:
    """Apply a simple nonlinear transformation."""
    response = np.tanh(stimulus)
    time.sleep(.02)
    return response


def compute_mean_firing_rate_loop(response: np.ndarray) -> np.ndarray:
    """Deliberately slow mean computation using Python loops."""
    n_trials, n_timepoints = response.shape
    means = np.zeros(n_trials)
    for i in range(n_trials):
        total = 0.0
        for j in range(n_timepoints):
            total += response[i, j]
        means[i] = total / n_timepoints
    return means


def compute_mean_firing_rate_vectorized(response: np.ndarray) -> np.ndarray:
    """Fast NumPy version."""
    return response.mean(axis=1)


def compute_pairwise_correlations(response: np.ndarray) -> float:
    """Compute average pairwise correlation (intentionally heavy)."""
    corr_matrix = np.corrcoef(response)
    return np.mean(corr_matrix)

def compute_statistics(response):
    slow_means = compute_mean_firing_rate_loop(response)
    fast_means = compute_mean_firing_rate_vectorized(response)
    return slow_means, fast_means


def analyze_trial_batch(n_trials: int, n_timepoints: int) -> float:
    stimulus = generate_stimulus(n_trials, n_timepoints)
    response = simulate_neural_response(stimulus)

    # Compare slow vs fast
    means1, means2 = compute_statistics(response)
    corr_value = compute_pairwise_correlations(response)

    return means1.mean() + means2.mean() + corr_value


def main():
    np.random.seed(42)

    results = []
    for _ in range(5):
        value = analyze_trial_batch(n_trials=200, n_timepoints=1000)
        results.append(value)

    print("Final result:", np.mean(results))


if __name__ == "__main__":
    main()
Writing neural_simulation2.py