Abstract

The Cache Complexity of Multithreaded Cache Oblivious Algorithms

by Matteo Frigo and Volker Strumpen.

Theory of Computing Systems August 2009, Volume 45, Issue 2, pp 203--233.
An earlier version of this paper appears in the Proceedings of the 18th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'06).
[Full text]


Abstract

We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs $O(n^3/\sqrt{Z} + (P n)^{1/3} n^2)$ cache misses when executed by the Cilk scheduler on a machine with~$P$ processors, each with a cache of size~$Z$, with high probability. This bound is tighter than previously published bounds. We also present a new multithreaded cache oblivious algorithm for 1D stencil computations incurring $O(n^2/Z + n + \sqrt{P n^{3+\epsilon}})$ cache misses with high probability, one for Gaussian elimination and back substitution, and one for the length computation part of the longest common subsequence problem incurring $O(n^2/Z + \sqrt{P n^{3.58}})$ cache misses with high probability.


This page maintained by: Matteo Frigo. Last updated: Mon Sep 16 16:14:09 2019