Discussion about this post

samdiago

Great article! I really enjoyed the visual approach used to explain on-disk storage concepts. The breakdown of storage formats, blocks, and file organization makes a complex topic much easier to understand, especially for data engineers and database professionals. The discussion around Parquet, ORC, and other storage mechanisms provides valuable insight into how modern analytics platforms optimize performance and storage efficiency. A highly informative read for anyone looking to deepen their understanding of data storage fundamentals and big data architectures.

Jason

While this is a nice explanation of row vs columnar file formats and the concept of row groups, it doesn't really say much about partitioning. Nor does it give any recommendations on how choices like row group size and file size impact cost and efficiency, or how to optimize them. I really appreciate that you are willing to tackle these complex topics, but I am also hopeful this can go deeper than simple concept explanations (we have a lot of those already).
