While this is a nice explanation of row vs columnar file formats and the concept of row-groups, it doesn't really say much about partitioning. Nor does it give any recommendations on how choices like row group size and file size impact cost and efficiency, or how to optimize them. I really appreciate that you are willing to tackle these complex topics, but I am also hopeful this can go deeper than simple concept explanations (we have a lot of those already).
Thanks for the feedback, Jason! I've been debating how shallow or deep to go on these topics and am still trying to figure out my target audience, so I really appreciate your thoughts.
Here's what I'll do: I'm going to make this Part 1 in a series on partitioning. In subsequent posts, I'll dive into these topics at a deeper, more hands-on level.
Thanks again and please keep the feedback coming!