It probably depends a bit on the sizes of a and k, but often the fastest appears to be combining partition with flatnonzero or where:
>>> a = np.random.random(10000)
>>> k = 5
>>>
>>> timeit("np.flatnonzero(a >= np.partition(a, len(a) - k)[len(a) - k])", globals=globals(), number=10000)
0.8328661819687113
>>> timeit("np.sort(np.argpartition(a, len(a) - k)[len(a) - k:])", globals=globals(), number=10000)
1.0577796879806556
>>> np.flatnonzero(a >= np.partition(a, len(a) - k)[len(a) - k])
array([2527, 4299, 5531, 6945, 7174])
>>> np.sort(np.argpartition(a, len(a) - k)[len(a) - k:])
array([2527, 4299, 5531, 6945, 7174])
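For completeness, the where variant mentioned above is essentially the same comparison against the pivot value; a minimal sketch (np.where called with only a condition behaves like np.nonzero and returns a tuple of index arrays):

import numpy as np

a = np.random.random(10000)
k = 5

# pivot = value of the k-th largest element, found without a full sort
pivot = np.partition(a, len(a) - k)[len(a) - k]

# np.where with a single condition returns a tuple, so take its first
# element to get the flat indices of the k largest values
idx = np.where(a >= pivot)[0]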
Note 1: this highlights the significant performance cost of indirect indexing.
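One way to get a feel for that cost in isolation (a rough sketch of my own, not the benchmark above): gathering elements through an index array is noticeably slower than a plain copy of the same data.

import numpy as np
from timeit import timeit

a = np.random.random(10000)
idx = np.random.permutation(len(a))  # an arbitrary index array

# direct, contiguous copy of the values
t_copy = timeit(lambda: a.copy(), number=10000)

# indirect (fancy) indexing: every element is gathered through idx
t_gather = timeit(lambda: a[idx], number=10000)

print(t_copy, t_gather)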
Note 2: since we only use the pivot element and discard the actual partition, np.percentile should in theory be at least as fast, but in practice it is much slower.
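For reference, the percentile-based variant can be written roughly like this (a sketch; the exact quantile and rounding method are my choice here, picked so the requested quantile lands on the k-th largest element, and older NumPy versions spell the keyword interpolation= instead of method=):

import numpy as np

a = np.random.random(10000)
k = 5

# the fractional rank (1 - k/len(a)) * (len(a) - 1) falls strictly between
# the (k+1)-th and k-th largest elements; method='higher' rounds it up,
# so the pivot is exactly the k-th largest value (assuming no ties)
pivot = np.percentile(a, 100 * (1 - k / len(a)), method='higher')

idx = np.flatnonzero(a >= pivot)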