The provenance of a piece of data is useful to a wide range of applications. Its availability can be drastically increased by automatically collecting lineage information during filesystem operations. However, when data is processed by multiple users in independent administrative domains, the resulting filesystem metadata can be trusted only if it has been cryptographically certified. This has three ramifications: it slows down filesystem operations, it requires more storage for metadata, and it makes verification depend on attestations from remote nodes. We show that current schemes do not scale in a distributed environment. In particular, as data is processed, the latency of filesystem operations degrades exponentially, and the storage needed for lineage metadata grows at a similar rate. Next, we examine a completely decentralized scheme that offers fast filesystem operations with minimal storage overhead. We demonstrate that its verification operation fails with exponentially increasing likelihood as more nodes become unreachable (because they are powered off or disconnected from the network). Finally, we present a new scheme, Bonsai, which significantly reduces the likelihood of verification failure at the cost of a small increase in filesystem latency and certification storage overhead relative to filesystems without lineage certification.