Cloud Science

At GRAIL, our mission is to detect cancer early, when it can be cured. Our approach is data-intensive: we sequence cell-free DNA in blood to detect minuscule evidence of tumors. To support our many workloads (among them ad hoc analyses, model training, and complicated bioinformatics pipelines), we built Reflow, a system and language for scientific computing in the cloud. Reflow’s data processing engine is fully incremental and focuses on efficiency, reproducibility, and ease of use. With Reflow, scientists and engineers write ordinary programs that compose existing tools; these programs are then transparently parallelized, memoized, and distributed across many workers on your favorite cloud computing provider. Reflow is vertically integrated: a single binary evaluates the program and is also responsible for elastic cluster management and execution coordination. This makes Reflow very simple to deploy, operate, and retarget to different cloud providers.

I’ll describe how Reflow’s language semantics and runtime are co-designed to yield a simple, robust implementation. I’ll also talk about how Reflow is used for data-intensive scientific computing at GRAIL, primarily to analyze next-generation sequencing (NGS) data sets.
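
To make "memoized" and "fully incremental" concrete, here is a minimal Python sketch of the general idea. It is not Reflow's language, API, or implementation. Every name in it (`Evaluator`, `digest`, `align`, `call_variants`, `pipeline`) is hypothetical, and the "tools" are stand-in string functions rather than real bioinformatics programs. Each step is keyed by a digest of its name and its inputs, so re-running a pipeline executes only the steps whose inputs are new.

```python
import hashlib


def digest(*parts: str) -> str:
    """Return a SHA-256 hex digest over length-prefixed string parts."""
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # length prefix avoids ambiguous joins
        h.update(data)
    return h.hexdigest()


class Evaluator:
    """Hypothetical evaluator that caches step results by content digest."""

    def __init__(self):
        self.cache = {}     # step key -> cached result
        self.executed = []  # names of steps that actually ran

    def step(self, name, fn, *inputs):
        # The key depends only on the step name and the content of its inputs.
        key = digest(name, *(digest(i) for i in inputs))
        if key not in self.cache:
            self.executed.append(name)
            self.cache[key] = fn(*inputs)
        return self.cache[key]


# Stand-ins for external tools (assumptions for illustration only).
def align(reads: str) -> str:
    return f"aligned({reads})"


def call_variants(alignment: str) -> str:
    return f"variants({alignment})"


def pipeline(ev: Evaluator, samples):
    # An ordinary program composing two tools per sample.
    return [
        ev.step("call", call_variants, ev.step("align", align, s))
        for s in samples
    ]


ev = Evaluator()
first = pipeline(ev, ["sampleA", "sampleB"])
ran_first = len(ev.executed)

# Adding one sample re-executes only the steps for that sample.
second = pipeline(ev, ["sampleA", "sampleB", "sampleC"])
ran_second = len(ev.executed) - ran_first
```

In this sketch the second run reuses the cached results for the first two samples. A real system would also have to account for tool versions, resource requirements, and distribution of steps across workers, all of which are omitted here.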