<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Vinoo Ganesh]]></title><description><![CDATA[The unglamorous truth about data, AI, and building things that actually work.]]></description><link>https://blog.vinoo.io</link><image><url>https://substackcdn.com/image/fetch/$s_!jESO!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8403d0-5108-45b7-b53e-ef7a679af264_1280x1280.png</url><title>Vinoo Ganesh</title><link>https://blog.vinoo.io</link></image><generator>Substack</generator><lastBuildDate>Thu, 16 Apr 2026 12:28:17 GMT</lastBuildDate><atom:link href="https://blog.vinoo.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Vinoo Ganesh]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[vinooganesh@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[vinooganesh@substack.com]]></itunes:email><itunes:name><![CDATA[Vinoo Ganesh]]></itunes:name></itunes:owner><itunes:author><![CDATA[Vinoo Ganesh]]></itunes:author><googleplay:owner><![CDATA[vinooganesh@substack.com]]></googleplay:owner><googleplay:email><![CDATA[vinooganesh@substack.com]]></googleplay:email><googleplay:author><![CDATA[Vinoo Ganesh]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Efficiently Guide to Snowflake (Top Down)]]></title><description><![CDATA[4 changes you can make *right now* to run Snowflake more Efficiently.]]></description><link>https://blog.vinoo.io/p/snowflake-top-down</link><guid isPermaLink="false">https://blog.vinoo.io/p/snowflake-top-down</guid><dc:creator><![CDATA[Vinoo Ganesh]]></dc:creator><pubDate>Thu, 02 Feb 2023 22:15:52 
GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/924b2ceb-6203-4f07-b341-766c755c83d4_670x670.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most of my career has been focused on making data systems more efficient across several dimensions: performance, scalability, and cost. Through these experiences, I&#8217;ve developed a number of mental frameworks that, until today, I&#8217;ve kept mostly in my head.</p><p>The goal of this series has always been to democratize knowledge about how to <em><strong>Efficiently</strong></em> operationalize data, which involves understanding how to approach data optimization problems in an effective and meaningful way.</p><p>This post will focus on what I call the &#8220;Top Down&#8221; approach to optimizing one of the biggest players in the industry: Snowflake.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3YYX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3YYX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png 424w, https://substackcdn.com/image/fetch/$s_!3YYX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png 848w, https://substackcdn.com/image/fetch/$s_!3YYX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3YYX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3YYX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png" width="443" height="134.59882005899706" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:412,&quot;width&quot;:1356,&quot;resizeWidth&quot;:443,&quot;bytes&quot;:91543,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3YYX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png 424w, https://substackcdn.com/image/fetch/$s_!3YYX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png 848w, https://substackcdn.com/image/fetch/$s_!3YYX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3YYX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf26a59-6a26-43e3-9732-b5d235fe9306_1356x412.png 1456w" sizes="100vw" loading="lazy" fetchpriority="high"></picture><div></div></div></a></figure></div><p>Snowflake has rapidly become one of the top tools for data processing in the ecosystem. It is powerful, easy to use, SQL-based, seems to scale up and down easily, and most importantly, mostly just works.</p><p>This won&#8217;t be an architecture post, as the Select.dev team has done an incredible job of that <a href="https://select.dev/posts/snowflake-architecture">here</a>. I&#8217;d strongly recommend reading that post before this one.</p><p>Rather, the goal of this post is to lay out the clear and concrete drivers of cost in Snowflake, and then give you a few strategies that you can use to ensure that you&#8217;re getting the best ROI from your Snowflake investment.</p><h1>TLDR</h1><p>&#8220;Vinoo, just tell me what to do. Don&#8217;t tell me why.&#8221;</p><p>Okay, here&#8217;s what I think you should do right now.</p><p><em>Disclaimer: These steps are <strong>very</strong> opinionated and work well on a majority of workloads. However, it is up to you to make sure that they work in your unique setup. 
</em></p><ol><li><p>File a ticket with Snowflake to get access to <a href="https://docs.snowflake.com/en/LIMITEDACCESS/get_query_stats.html">GET_QUERY_STATS</a>.</p></li><li><p>Minimize your warehouse AUTO_SUSPEND time (the value is in seconds):</p><ol><li><p><code>ALTER WAREHOUSE &lt;warehouseName&gt; SET AUTO_SUSPEND = 60;</code></p></li></ol></li><li><p>For all multi-cluster warehouses:</p><ol><li><p><code>ALTER WAREHOUSE &lt;warehouseName&gt; SET MIN_CLUSTER_COUNT = 1;</code></p></li><li><p><code>ALTER WAREHOUSE &lt;warehouseName&gt; SET SCALING_POLICY = ECONOMY;</code></p></li></ol></li><li><p>Set STATEMENT_TIMEOUT_IN_SECONDS to a value lower than the default of 2 days (172800 seconds), for example 10 hours:</p><ol><li><p><code>ALTER WAREHOUSE &lt;warehouseName&gt; SET STATEMENT_TIMEOUT_IN_SECONDS = 36000;</code></p></li></ol></li></ol><p>Now, let me tell you why.</p><h1>Snowflake + Driving</h1><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sXlD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sXlD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sXlD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sXlD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sXlD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!sXlD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg" width="417" height="234.5625" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:417,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Snowflake: The Race is On! - YouTube&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Snowflake: The Race is On! - YouTube" title="Snowflake: The Race is On! - YouTube" srcset="https://substackcdn.com/image/fetch/$s_!sXlD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sXlD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sXlD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sXlD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F701002b5-339f-4e26-8623-70a00f15e57c_1280x720.jpeg 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a></figure></div><p>If you&#8217;ve spoken to me about this space in the past, you&#8217;ve likely heard me describe the Snowflake ecosystem as a busy city.</p><p>You, as a query author, would like to get from Point A to Point B in the most gas-efficient way possible, but you do have a few limitations:</p><ol><li><p>Your car&#8217;s overall gas efficiency</p><ol><li><p>Hummers tend to be less cost effective than, say, a Hyundai Elantra.</p></li></ol></li><li><p>Your skill as a driver</p><ol><li><p>Slamming the brakes at stop lights or hitting the gas as soon as the light turns green eats up a lot of gas.</p></li></ol></li><li><p>How busy the roads are</p><ol><li><p>Sitting in traffic eats up gas.</p></li></ol></li><li><p>The route you take</p><ol><li><p>Taking a non-optimal route results in burning more gas than necessary.</p></li></ol></li></ol><p>In Snowflake land, you have roughly the same set of challenges. Your ability to write a query that retrieves results in an optimal way has to do with:</p><ol><li><p>Your choice of warehouse for the query you&#8217;re running</p><ol><li><p>Big warehouses are like Hummers: they consume a lot of gas and may not be the optimal choice unless you need them.</p></li></ol></li><li><p>Your skill as a query author</p><ol><li><p>Some folks know the ins and outs of writing optimal SQL queries.</p></li></ol></li><li><p>How saturated your warehouse is</p><ol><li><p>If a number of other queries are running on your warehouse, they can slow down your query&#8217;s time to results.</p></li></ol></li><li><p>How you construct your set of queries to yield results</p><ol><li><p>This has to do with how &#8220;easy&#8221; your data is to query (schema, layout, partitioning, etc.) and how well your query takes advantage of that.</p></li></ol></li></ol><h3>Top Down vs. Bottom Up</h3><p>Looking at the world this way, it is clear that there are two areas to optimize: 
</p><ol><li><p>The &#8220;Environment&#8221;</p></li><li><p>The &#8220;Driver&#8221;</p></li></ol><p>In other words, there are things we can do to optimize the car and the environment it drives in without affecting the driver in any way.</p><p>I&#8217;m going to call optimizing the Environment the &#8220;Top Down&#8221; approach, since it involves looking at the top elements (the core infrastructure) of the system. I&#8217;m going to call optimizing the Driver the &#8220;Bottom Up&#8221; approach, given that it involves the operations of the system.</p><p>Picking the <strong>ideal</strong> car for the job can be complicated, so let&#8217;s focus on optimizing our current car for now.</p><p>Specifically:</p><ol><li><p>Let&#8217;s ensure we have the data we need to debug.</p></li><li><p>Let&#8217;s ensure the car is on for as little time as possible when it is not needed.</p></li><li><p>Let&#8217;s ensure that our engine cylinder count is optimal.</p></li><li><p>Let&#8217;s ensure that trips that are unduly long are stopped at some point.</p></li></ol><h2>Insight #1: Get Data</h2><p>Solving problems requires data. While most of this post is about the Top Down portion, it is equally important to set ourselves up for the subsequent Bottom Up portion.</p><p>There are two functions in particular you will want access to:</p><ol><li><p><a href="https://docs.snowflake.com/en/LIMITEDACCESS/get_query_stats.html">GET_QUERY_STATS</a></p></li><li><p>GET_QUERY_OPERATOR_STATS</p></li></ol><p><strong>Insight: File a Snowflake ticket to get access to GET_QUERY_STATS. 
</strong>GET_QUERY_OPERATOR_STATS is a preview feature enabled on all accounts.</p><h2>Insight #2: Turn Your Car Off</h2><p>A warehouse is a logical grouping of cloud servers. That&#8217;s really it. What makes warehouses complicated, though, is that they are the driver of Snowflake cost.</p><p>Think about a warehouse like a car. When the car is on, regardless of whether it is doing anything, you&#8217;re burning gas. As such, it&#8217;s a good idea to turn your car on only when you need to drive somewhere.</p><p>However, there is one important detail. People generally don&#8217;t forget to turn their car off, but they do forget to turn their warehouses off.</p><p>Luckily, Snowflake has built an <strong>autosuspend</strong> time into warehouses.</p><p>The <strong>autosuspend</strong> time (the AUTO_SUSPEND parameter, in seconds) is how long an idle warehouse waits before it automatically suspends.</p><p><strong>Insight:</strong> Your car (warehouse) should only be on when you need it. Minimize the time that it is on and doing nothing.</p><p>At any given moment, a warehouse is in one of the following states:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ts23!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ts23!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png 424w, https://substackcdn.com/image/fetch/$s_!ts23!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png 848w, 
https://substackcdn.com/image/fetch/$s_!ts23!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png 1272w, https://substackcdn.com/image/fetch/$s_!ts23!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ts23!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png" width="1171" height="932" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:932,&quot;width&quot;:1171,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:112428,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ts23!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png 424w, https://substackcdn.com/image/fetch/$s_!ts23!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png 848w, 
https://substackcdn.com/image/fetch/$s_!ts23!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png 1272w, https://substackcdn.com/image/fetch/$s_!ts23!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6bdd53c-dfff-4d91-bec9-c2516a157ab1_1171x932.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Let&#8217;s break this down:</p><ol><li><p><strong>Suspended - </strong>The warehouse is off. 
And, of course, it is not charging you anything.</p></li><li><p><strong>Running</strong> - The warehouse is on and you&#8217;re being charged.</p><ol><li><p><strong>Idle - </strong>The warehouse is on, and no queries are running. You&#8217;re still being charged.</p><ol><li><p><strong>Will Suspend</strong> - No queries are running and the <strong>AUTO_SUSPEND</strong> timer is counting down; the warehouse will suspend once it expires. You are paying for nothing during this window, so this is normally a <strong>bad</strong> state that we want to minimize.</p></li><li><p><strong>Won&#8217;t Suspend</strong> - No queries are running and the <strong>AUTO_SUSPEND</strong> timer is counting down, but a new query will arrive before it expires (because of orchestration or something else), so the warehouse never suspends. Removing this state generally involves some investigative work or changes to the query orchestration schedule.</p></li></ol></li><li><p><strong>Active - </strong>The warehouse is on, and queries are running. This is good.</p></li></ol></li></ol><p>What does this look like in practice?</p><p>Let&#8217;s say you have a consumption history that looks like this.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zi8d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zi8d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png 424w, https://substackcdn.com/image/fetch/$s_!zi8d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png 848w, 
https://substackcdn.com/image/fetch/$s_!zi8d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png 1272w, https://substackcdn.com/image/fetch/$s_!zi8d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zi8d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png" width="529" height="325.53846153846155" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:896,&quot;width&quot;:1456,&quot;resizeWidth&quot;:529,&quot;bytes&quot;:62513,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zi8d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png 424w, https://substackcdn.com/image/fetch/$s_!zi8d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png 848w, 
https://substackcdn.com/image/fetch/$s_!zi8d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png 1272w, https://substackcdn.com/image/fetch/$s_!zi8d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F396dda54-265e-4f50-b03b-cf6790d6beba_1551x954.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Let&#8217;s break this down a bit. Here&#8217;s what&#8217;s really happening. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UjRn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UjRn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png 424w, https://substackcdn.com/image/fetch/$s_!UjRn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png 848w, https://substackcdn.com/image/fetch/$s_!UjRn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!UjRn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UjRn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png" width="571" height="443.03245749613603" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1004,&quot;width&quot;:1294,&quot;resizeWidth&quot;:571,&quot;bytes&quot;:107996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UjRn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png 424w, https://substackcdn.com/image/fetch/$s_!UjRn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png 848w, https://substackcdn.com/image/fetch/$s_!UjRn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!UjRn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb9dd6e-7524-4122-9613-a4cc947900bc_1294x1004.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is the simplest case and matches the above. 
Let&#8217;s pick a slightly more complicated case.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RN-g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RN-g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png 424w, https://substackcdn.com/image/fetch/$s_!RN-g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png 848w, https://substackcdn.com/image/fetch/$s_!RN-g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png 1272w, https://substackcdn.com/image/fetch/$s_!RN-g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RN-g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png" width="593" height="457.80610795454544" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1087,&quot;width&quot;:1408,&quot;resizeWidth&quot;:593,&quot;bytes&quot;:128750,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RN-g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png 424w, https://substackcdn.com/image/fetch/$s_!RN-g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png 848w, https://substackcdn.com/image/fetch/$s_!RN-g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png 1272w, https://substackcdn.com/image/fetch/$s_!RN-g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29e628-60a9-44ff-a524-bf3cb34c9d0a_1408x1087.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In this situation, there is a period of time (the pink &#8220;No Queries Running&#8221;) when the warehouse is up/active, but with no queries running. During this period, however, the warehouse is <strong>unable</strong> to suspend because the gap is shorter than the warehouse auto-suspend time. </p><p><strong>To optimize your Snowflake efficiency, your goal is to minimize the Red &#8220;No Queries Running&#8221;, while looking for ways to also minimize the Pink &#8220;No Queries Running.&#8221; </strong></p><p>The minimum auto-suspend that Snowflake supports is 30 seconds. However, Snowflake bills a minimum of 60 seconds each time a warehouse starts or resumes, so you pay for the full minute either way. For that reason, it&#8217;s best to keep auto_suspend at 60 seconds. 
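Before changing anything, it can help to check what a warehouse is set to today. A minimal sketch, assuming your role is allowed to see the warehouse (the placeholder is the same one used in the statement that follows):</p><pre><code>-- Lists matching warehouses; the auto_suspend column shows the current value in seconds
SHOW WAREHOUSES LIKE '&lt;warehouseName&gt;';</code></pre><p>If that column already reads 60, there is nothing to change here.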
</p><p><code>ALTER WAREHOUSE &lt;warehouseName&gt; SET AUTO_SUSPEND = 60;</code></p><h2>Insight #3: Car Engines + Cylinders</h2><p>Car engines come in a variety of options: 2 cylinders, 4 cylinders, 6 cylinders, and 8+ cylinders for some trucks. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-QJ2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-QJ2!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif 424w, https://substackcdn.com/image/fetch/$s_!-QJ2!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif 848w, https://substackcdn.com/image/fetch/$s_!-QJ2!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif 1272w, https://substackcdn.com/image/fetch/$s_!-QJ2!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!-QJ2!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif" width="451" height="253.6875" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:800,&quot;resizeWidth&quot;:451,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Here's How Your Car's Engine Works&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Here's How Your Car's Engine Works" title="Here's How Your Car's Engine Works" srcset="https://substackcdn.com/image/fetch/$s_!-QJ2!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif 424w, https://substackcdn.com/image/fetch/$s_!-QJ2!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif 848w, https://substackcdn.com/image/fetch/$s_!-QJ2!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif 1272w, https://substackcdn.com/image/fetch/$s_!-QJ2!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f7e9e2-8dc8-491c-8e0c-7da1786604e8_800x450.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The more cylinders you have, the more power the car has, and the more gas it consumes. A V6 (6 cylinder) car will have more power than a V4 (4 cylinder) car, but the latter will generally have better fuel economy. </p><p>Snowflake has a somewhat similar concept, called <strong>clusters</strong>. Clusters, though, are effectively full copies of your warehouse. In fact, Snowflake bills at <code># Warehouse * # Clusters</code>. In Snowflake land, the more clusters you have, the more you can do at a time under one warehouse. </p><p>Now, Snowflake offers one pretty powerful feature: the ability to scale up the number of clusters in a warehouse. 
Intuitively, you can think of this as only adding an additional cylinder when you need it.</p><p>In the <em>Efficiently</em> case, we want to maximize our cost efficiency, meaning we only want resources when we need them and want to scale only when necessary. </p><p><strong>First, let&#8217;s make sure we always start with only one cluster. No point in starting with more resources than we need.</strong> </p><p><code>ALTER WAREHOUSE &lt;warehouseName&gt; SET MIN_CLUSTER_COUNT = 1;</code></p><p><strong>Next, let&#8217;s make sure to only scale up when we really need to. You can read more about scaling policies <a href="https://docs.snowflake.com/en/user-guide/warehouses-multicluster.html#setting-the-scaling-policy-for-a-multi-cluster-warehouse">here</a>.</strong> </p><p><code>ALTER WAREHOUSE &lt;warehouseName&gt; SET SCALING_POLICY = ECONOMY;</code></p><h2>Insight #4: Restrict Trip Distance</h2><p>Our last top down insight focuses on the &#8220;something is clearly wrong&#8221; case. The Federal Motor Carrier Safety Administration <a href="https://www.fmcsa.dot.gov/regulations/hours-service/summary-hours-service-regulations">has a clear rule</a> about drivers who are driving too long. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TaJ2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TaJ2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png 424w, https://substackcdn.com/image/fetch/$s_!TaJ2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png 848w, https://substackcdn.com/image/fetch/$s_!TaJ2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png 1272w, https://substackcdn.com/image/fetch/$s_!TaJ2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TaJ2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png" width="685" height="267.59284731774414" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:568,&quot;width&quot;:1454,&quot;resizeWidth&quot;:685,&quot;bytes&quot;:106741,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TaJ2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png 424w, https://substackcdn.com/image/fetch/$s_!TaJ2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png 848w, https://substackcdn.com/image/fetch/$s_!TaJ2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png 1272w, https://substackcdn.com/image/fetch/$s_!TaJ2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8c1e1e-aaf9-4510-ba93-99844d108f71_1454x568.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Driving for 10 hours straight without a break is a fairly large ordeal for many drivers, so why let our queries operate this way?</p><p>Luckily, Snowflake has a parameter called <a href="https://docs.snowflake.com/en/sql-reference/parameters.html#statement-timeout-in-seconds">STATEMENT_TIMEOUT_IN_SECONDS</a>. This specifies the maximum time that a query is allowed to run before it is cancelled. In other words, it is the longest amount of time our driver is allowed to drive. </p><p>Strangely though, Snowflake&#8217;s default value for this parameter is 172,800 seconds&#8230; which is 2 days. These poor drivers! </p><p>Let&#8217;s give them (and your wallet) a break. </p><p>This value can be configured on a per-warehouse basis. 
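To see what a warehouse is using right now, a sketch like the following should do. It uses the same placeholder name as elsewhere in this post; the session-level line is my own addition for illustration, and the 3600 in it is an arbitrary example value, not a recommendation:</p><pre><code>-- Shows the current value, the default, and the level at which it was set
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN WAREHOUSE &lt;warehouseName&gt;;

-- Optional: the same parameter can also be set for just the current session
ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;</code></pre><p>As I understand the docs linked above, when the parameter is set on both a warehouse and a session, the lower non-zero value is the one enforced, so it is worth checking both before assuming which limit applies.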
</p><p><strong>I prefer something more reasonable - let&#8217;s just use the same 10 hour (36,000 seconds) driving limit that drivers of passenger-carrying vehicles are permitted.</strong> </p><p><code>ALTER WAREHOUSE &lt;warehouseName&gt; SET STATEMENT_TIMEOUT_IN_SECONDS = 36000;</code></p><h2>Conclusion</h2><p>There is a prevalent idea floating around that Snowflake is expensive. That can be true, but as is the case in most of these systems, it really comes down to how effectively and <em>Efficiently</em> you use Snowflake.</p><p>Thoughts, ideas, hate my car analogy? Leave a comment and let me know.</p>]]></content:encoded></item><item><title><![CDATA[Hands-On: Predicate Pushdown ]]></title><description><![CDATA[Query optimizers do a lot for us. Let's see how.]]></description><link>https://blog.vinoo.io/p/hands-on-predicate-pushdown</link><guid isPermaLink="false">https://blog.vinoo.io/p/hands-on-predicate-pushdown</guid><dc:creator><![CDATA[Vinoo Ganesh]]></dc:creator><pubDate>Sat, 28 Jan 2023 22:28:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;ve spoken a lot about on-disk and distributed storage, as well as blocks. All of this theory is great, but let&#8217;s see how it works in practice. 
Here&#8217;s what we&#8217;re going to do:</p><ol><li><p>We&#8217;ll read a CSV dataset into Spark</p></li><li><p>We&#8217;ll write the dataset into 5 Parquet files (treating each file as a block)</p><ol><li><p>We&#8217;re going to have 1 row group per block for simplicity&#8217;s sake.</p></li></ol></li><li><p>We&#8217;ll introspect some of the metadata that exists on the files.</p></li><li><p>We&#8217;ll run queries that show the power of predicate pushdown.</p></li></ol><p>You can follow along using the commands below.</p><h1>Hands-On: Setup</h1><p>I&#8217;m first going to pick a random example dataset.</p><pre><code><code>$ wget https://raw.githubusercontent.com/curran/data/gh-pages/vegaExamples/airports.csv -O dataset.csv</code></code></pre><pre><code><code>$ head dataset.csv
iata,name,city,state,country,latitude,longitude
00M,Thigpen,Bay Springs,MS,USA,31.95376472,-89.23450472
00R,Livingston Municipal,Livingston,TX,USA,30.68586111,-95.01792778
00V,Meadow Lake,Colorado Springs,CO,USA,38.94574889,-104.5698933
01G,Perry-Warsaw,Perry,NY,USA,42.74134667,-78.05208056
01J,Hilliard Airpark,Hilliard,FL,USA,30.6880125,-81.90594389
01M,Tishomingo County,Belmont,MS,USA,34.49166667,-88.20111111
02A,Gragg-Wade,Clanton,AL,USA,32.85048667,-86.61145333
02C,Capitol,Brookfield,WI,USA,43.08751,-88.17786917
02G,Columbiana County,East Liverpool,OH,USA,40.67331278,-80.64140639</code></code></pre><p>Next, I&#8217;m going to open my Spark shell and read this data in, inferring its schema.</p><pre><code><code>$ ./spark-shell
scala&gt; val dataset = spark.read.option("header","true").option("inferSchema","true").csv("dataset.csv")
dataset: org.apache.spark.sql.DataFrame = [iata: string, name: string ... 5 more fields]
</code></code></pre><p>And let&#8217;s check out the inferred schema. We can see that Spark has inferred the data type of the <code>latitude</code> and <code>longitude</code> columns as doubles. </p><pre><code><code>scala&gt; dataset.printSchema
root
 |-- iata: string (nullable = true)
 |-- name: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- country: string (nullable = true)
 |-- latitude: double (nullable = true)
 |-- longitude: double (nullable = true)</code></code></pre><p>For purposes of this example, I&#8217;m going to force our hand a bit and write this dataset into 5 <a href="https://parquet.apache.org">Parquet</a> files. By writing the files in Parquet, extra metadata will be added to each file that gives readers information about the contents of the file.  </p><pre><code><code>scala&gt; dataset.repartition(5).write.parquet("/root/parquet_dataset")</code></code></pre><p>Let&#8217;s see how these files look on disk. </p><pre><code><code>$ ls /root/parquet_dataset
part-00000-2c88417e-fb93-4b83-8912-0027ade7804c-c000.snappy.parquet
part-00001-2c88417e-fb93-4b83-8912-0027ade7804c-c000.snappy.parquet
part-00002-2c88417e-fb93-4b83-8912-0027ade7804c-c000.snappy.parquet
part-00003-2c88417e-fb93-4b83-8912-0027ade7804c-c000.snappy.parquet
part-00004-2c88417e-fb93-4b83-8912-0027ade7804c-c000.snappy.parquet
_SUCCESS</code></code></pre><p>Let&#8217;s ignore the <code>_SUCCESS</code> file for now given that it&#8217;s mostly just a flag that our write job completed successfully (and yes, write jobs can and do fail at times). </p><p>Unfortunately, we can&#8217;t use something like vim or emacs to open these files, given that they are in the binary Parquet file format.</p><p>To aid in our investigative process, we&#8217;re going to install two packages that will help us introspect Parquet data. </p><pre><code><code>pip3 install parquet-tools
pip3 install parquet-metadata</code></code></pre><p>Now, let&#8217;s look at one of the files. Note: the names of your <em>part</em> files, if you are following along at home, may be different from the names of mine above.</p><pre><code><code>$ parquet-tools show part-00000-53b27d15-b049-41db-a8aa-fa3033763836-c000.snappy.parquet</code></code></pre><p>You should see the data contained in this file nicely printed out. </p><p>At this point, we have 5 parquet files (5 Blocks) each with a distinct subset of our initial file. </p><h1>Hands-On: Query Plans  </h1><p>Let&#8217;s query this data and see how the magic of partitioning can help us. </p><p>Let&#8217;s first read our CSV dataset:</p><pre><code><code>scala&gt; val dataset = spark.read.option("header","true").option("inferSchema","true").csv("dataset.csv")</code></code></pre><p>And run a simple filter operation.</p><pre><code><code>scala&gt; val simpleFilter = dataset.filter($"latitude" &gt; 30)</code></code></pre><p>You can see the results with the following command:</p><pre><code><code>scala&gt; simpleFilter.show()</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A5wt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A5wt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png 424w, https://substackcdn.com/image/fetch/$s_!A5wt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png 848w, 
https://substackcdn.com/image/fetch/$s_!A5wt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png 1272w, https://substackcdn.com/image/fetch/$s_!A5wt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A5wt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png" width="1456" height="922" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:922,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:475725,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!A5wt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png 424w, https://substackcdn.com/image/fetch/$s_!A5wt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png 848w, 
https://substackcdn.com/image/fetch/$s_!A5wt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png 1272w, https://substackcdn.com/image/fetch/$s_!A5wt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b85603-e7b0-46b1-a9e0-859f10b83c8c_1670x1058.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You can see that all of our values for the <code>latitude</code> column are &gt; 30.</p><p>Now, let&#8217;s dig a little deeper and 
see what&#8217;s going on behind the scenes.</p><pre><code><code>scala&gt; simpleFilter.queryExecution
res3: org.apache.spark.sql.execution.QueryExecution =
== Parsed Logical Plan ==
'Filter ('latitude &gt; 30)
+- Relation[iata#16,name#17,city#18,state#19,country#20,latitude#21,longitude#22] csv

== Analyzed Logical Plan ==
iata: string, name: string, city: string, state: string, country: string, latitude: double, longitude: double
Filter (latitude#21 &gt; cast(30 as double))
+- Relation[iata#16,name#17,city#18,state#19,country#20,latitude#21,longitude#22] csv

== Optimized Logical Plan ==
Filter (isnotnull(latitude#21) AND (latitude#21 &gt; 30.0))
+- Relation[iata#16,name#17,city#18,state#19,country#20,latitude#21,longitude#22] csv

== Physical Plan ==
*(1) Filter (isnotnull(latitude#21) AND (latitude#21 &gt; 30.0))
+- FileScan csv [iata#16,name#17,city#18,state#19,country...</code></code></pre><p>This seems like a lot, so let me make it easier.</p><p>The parsed logical plan has our filter condition. So does the analyzed logical plan, but it has resolved the column reference and cast our predicate value to a double (which makes sense, since schema inference has deemed that <code>latitude</code> is a double). Finally, the optimized logical plan has added a null check, which is consistent with our predicate, since a null <code>latitude</code> can never be &gt; 30.</p><pre><code><code>  == Parsed Logical Plan ==
  'Filter ('latitude &gt; 30)
  ...

  == Analyzed Logical Plan ==
  ...
  Filter (latitude#21 &gt; cast(30 as double))
  ...

  == Optimized Logical Plan ==
  Filter (isnotnull(latitude#21) AND (latitude#21 &gt; 30.0))
  ...</code></code></pre><p>Let&#8217;s make our query a bit more complicated and look for a set of values in a range on the same column.</p><pre><code><code>scala&gt; val complexFilter = dataset.filter($"latitude" &gt; 30).filter($"latitude" &lt; 40)</code></code></pre><p>Once again, let&#8217;s look at the query plan.</p><pre><code><code>scala&gt; complexFilter.queryExecution
res4: org.apache.spark.sql.execution.QueryExecution =
== Parsed Logical Plan ==
'Filter ('latitude &lt; 40)
+- Filter (latitude#21 &gt; cast(30 as double))
   +- Relation[iata#16,name#17,city#18,state#19,country#20,latitude#21,longitude#22] csv

== Analyzed Logical Plan ==
iata: string, name: string, city: string, state: string, country: string, latitude: double, longitude: double
Filter (latitude#21 &lt; cast(40 as double))
+- Filter (latitude#21 &gt; cast(30 as double))
   +- Relation[iata#16,name#17,city#18,state#19,country#20,latitude#21,longitude#22] csv

== Optimized Logical Plan ==
Filter ((isnotnull(latitude#21) AND (latitude#21 &gt; 30.0)) AND (latitude#21 &lt; 40.0))
+- Relation[iata#16,name#17,city#18,state#19,country#20,latitude#21,longitude#22] csv

== Physical Plan...</code></code></pre><p>Let&#8217;s simplify it again. As you can see below, the optimizer has combined both of our predicates into a single <code>Filter</code> step (meaning that what was written as two separate filters is now evaluated as one). In other words, we&#8217;re looking for values where <code>latitude</code> is &gt; 30 AND &lt; 40 during the same pass over the data. </p><pre><code><code>  == Parsed Logical Plan ==
  'Filter ('latitude &lt; 40)
  +- Filter (latitude#21 &gt; cast(30 as double))
  ...

  == Analyzed Logical Plan ==
  ...
  Filter (latitude#21 &lt; cast(40 as double))
  +- Filter (latitude#21 &gt; cast(30 as double))
  ...

  == Optimized Logical Plan ==
  Filter ((isnotnull(latitude#21) AND (latitude#21 &gt; 30.0)) AND (latitude#21 &lt; 40.0))
  ...</code></code></pre><p>Let&#8217;s exit out of our Spark shell and play a bit more with Parquet.</p><h1>Hands-On: Querying with Parquet </h1><p> Remember, we have 5 Parquet files. Let&#8217;s inspect one of them.</p><pre><code><code>$ parquet-metadata /root/parquet_dataset/part-00000-53b27d15-b049-41db-a8aa-fa3033763836-c000.snappy.parquet</code></code></pre><p>You will get output that looks like the following. </p><pre><code><code>file    created_by      parquet-mr version 1.10.1 (build a89df8f9932b6ef6633d06069e50c9b7970bebd1)
file    columns 7
file    row_groups      1
file    rows    675
...
row_group       0       latitude        type    DOUBLE
row_group       0       latitude        num_values      675
row_group       0       latitude        compression     SNAPPY
row_group       0       latitude        encodings       BIT_PACKED,PLAIN,RLE
row_group       0       latitude        compressed_size 5476
row_group       0       latitude        uncompressed_size       5471
row_group       0       latitude        stats:min       14.1743075
row_group       0       latitude        stats:max       70.46727611
...</code></code></pre><p>The two things I'd like you to focus on for now are the <code>stats:min</code> and <code>stats:max</code> attributes.</p><p>These attributes contain the minimum and maximum values of a given column within a given <code>row_group</code>. That is, the smallest <code>latitude</code> value in this group of rows is equal to <code>stats:min</code> (14.1743075) and the largest is equal to <code>stats:max</code> (70.46727611).</p><p>This information is a huge performance win! <strong>If we are running a query that only asks for values outside of this range, this entire row group can be excluded without reading its rows.</strong></p><p>Let&#8217;s say we have another file with the following metadata. </p><pre><code><code>file    created_by      parquet-mr version 1.10.1 (build a89df8f9932b6ef6633d06069e50c9b7970bebd1)
file    columns 7
file    row_groups      1
file    rows    675
...
row_group       0       latitude        type    DOUBLE
row_group       0       latitude        num_values      674
row_group       0       latitude        compression     SNAPPY
row_group       0       latitude        encodings       BIT_PACKED,PLAIN,RLE
row_group       0       latitude        compressed_size 5176
row_group       0       latitude        uncompressed_size       5771
row_group       0       latitude        stats:min       44.4430157
row_group       0       latitude        stats:max       74.46727611
...</code></code></pre><p>In our original query (where <code>latitude</code> is &gt; 30 and &lt; 40) we are able to exclude this whole file, because its smallest <code>latitude</code> (44.4430157) is already above 40. </p><pre><code><code>scala&gt; val complexFilter = dataset.filter($"latitude" &gt; 30).filter($"latitude" &lt; 40)</code></code></pre><p>In practice, this is called <strong>predicate pushdown</strong>. The requirements of the predicate (the filter conditions of the query) have been pushed down to the data source, allowing the reader to look at the metadata on the row groups themselves to decide which row groups to read and which can be ignored.</p><h1>Conclusion</h1><p>There is a lot of magic that goes into our ability to query data quickly and <em>Efficiently</em>. In future posts, we&#8217;ll cover some of it in more detail. </p>]]></content:encoded></item><item><title><![CDATA[Distributed Data and Blocks]]></title><description><![CDATA[Tuning the "Chunks" of data that live on distributed file systems.]]></description><link>https://blog.vinoo.io/p/distributed-data-and-blocks</link><guid isPermaLink="false">https://blog.vinoo.io/p/distributed-data-and-blocks</guid><dc:creator><![CDATA[Vinoo Ganesh]]></dc:creator><pubDate>Tue, 24 Jan 2023 17:55:33 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9e30346e-8315-4f72-8301-7bc9deca907f_782x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is a continuation of <a href="https://vinooganesh.substack.com/p/efficient-data-partitioning">this</a> blog post. </p><p>In that post, I wrote a bit about how the layout of data on disk can impact the performance of your analytics jobs. This post will be focused on how that works in practice using open source technologies. 
</p><p>In this post, I&#8217;ll talk a bit about:</p><ol><li><p>HDFS </p></li><li><p>Blocks + Block Size</p></li><li><p>Block sizes + tradeoffs</p></li></ol><h1>Background</h1><p>Let&#8217;s take a step back and review what we have learned. We know that the way that data is laid out on disk can drastically affect the performance of analytics. We also know that there are a few strategies for laying data out on disk (we explored row-oriented, columnar, and hybrid models in the last post), but how is this actually relevant to us in our modern data stack?</p><h2>Hadoop</h2><p>To really understand what I&#8217;m about to describe, it&#8217;ll help to have some history. I&#8217;m going to start with <a href="https://hadoop.apache.org/">Hadoop</a>. If you&#8217;re reading this post, you&#8217;ve likely heard of Hadoop, but in the event you haven&#8217;t, I&#8217;m going to describe Hadoop in the <em>Efficiently</em> way. 
</p><p>Hadoop is an ecosystem of products that allow you to:</p><ol><li><p>Store data in a distributed way (HDFS, the Hadoop Distributed File System)</p></li><li><p>Query that data (MapReduce)</p></li><li><p>Manage compute resources for those queries (YARN)</p></li><li><p>[There is also a Common library and an object store called Ozone].</p></li></ol><p>For our purposes, we&#8217;re choosing to focus on HDFS, since it&#8217;s the most directly relevant piece of the partitioning ecosystem from the storage side.</p><h2>HDFS </h2><p>Let&#8217;s say you have an invaluable file that contains the names, addresses, and phone numbers of employees at your company. </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7chj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7chj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png 424w, https://substackcdn.com/image/fetch/$s_!7chj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png 848w, https://substackcdn.com/image/fetch/$s_!7chj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png 1272w, https://substackcdn.com/image/fetch/$s_!7chj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7chj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png" width="639" height="181.4798206278027" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:380,&quot;width&quot;:1338,&quot;resizeWidth&quot;:639,&quot;bytes&quot;:91534,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7chj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png 424w, https://substackcdn.com/image/fetch/$s_!7chj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png 848w, https://substackcdn.com/image/fetch/$s_!7chj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png 1272w, https://substackcdn.com/image/fetch/$s_!7chj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c0062de-7218-49da-a59a-5debfed07dc8_1338x380.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>You find this file very valuable and you want to access it from many places. 
Therefore, you make the logical decision to put this file somewhere that makes it easy to access. Now these are the days before Google Drive/Dropbox (kind of), so you had to build things your own way. You decide to upload this file on a server. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zEhm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zEhm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png 424w, https://substackcdn.com/image/fetch/$s_!zEhm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png 848w, https://substackcdn.com/image/fetch/$s_!zEhm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png 1272w, https://substackcdn.com/image/fetch/$s_!zEhm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zEhm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png" width="1456" height="570" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:111689,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zEhm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png 424w, https://substackcdn.com/image/fetch/$s_!zEhm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png 848w, https://substackcdn.com/image/fetch/$s_!zEhm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png 1272w, https://substackcdn.com/image/fetch/$s_!zEhm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0a7aff-ddf6-46bd-a430-0216f4cb328c_1538x602.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>One day, you come into work and realize that the server is down, meaning you can&#8217;t access your file anymore. That's not good. </p><p>So, you decide to store 2 copies of that file on two different servers to keep this from happening again. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5dpt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5dpt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png 424w, https://substackcdn.com/image/fetch/$s_!5dpt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png 848w, https://substackcdn.com/image/fetch/$s_!5dpt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png 1272w, https://substackcdn.com/image/fetch/$s_!5dpt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5dpt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png" width="1456" height="500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:200748,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5dpt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png 424w, https://substackcdn.com/image/fetch/$s_!5dpt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png 848w, https://substackcdn.com/image/fetch/$s_!5dpt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png 1272w, https://substackcdn.com/image/fetch/$s_!5dpt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb364f817-d7ce-4500-b0a5-5fba38de4ede_1804x620.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But wait, now you need to update the file&#8230; which copy should be updated? 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z89C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z89C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png 424w, https://substackcdn.com/image/fetch/$s_!Z89C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png 848w, https://substackcdn.com/image/fetch/$s_!Z89C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png 1272w, https://substackcdn.com/image/fetch/$s_!Z89C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z89C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png" width="705" height="275.5116758241758" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:569,&quot;width&quot;:1456,&quot;resizeWidth&quot;:705,&quot;bytes&quot;:203787,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z89C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png 424w, https://substackcdn.com/image/fetch/$s_!Z89C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png 848w, https://substackcdn.com/image/fetch/$s_!Z89C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png 1272w, https://substackcdn.com/image/fetch/$s_!Z89C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd02517e-25eb-47b7-bde2-eb7110a3cb02_1800x704.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>How do you keep the two in sync? Furthermore, let&#8217;s say you increase the number of people accessing/updating this file. The orchestration around all of this becomes extraordinarily difficult. </p><p>And now we have a problem. </p><p>Luckily, there are a bunch of mechanisms in HDFS that allow you to work through these challenges (which I&#8217;ll likely cover in subsequent posts). </p><p>But for now, let&#8217;s get back to partitioning. </p><h2>Blocks</h2><p>HDFS stores data in blocks. These are the same blocks from the last post: indivisible segments of data. A block is also, just like in the last post, the minimum amount of data that Hadoop can read in a single read. 
Meaning, any read requires me to read at <strong>minimum</strong> the whole block.</p><p>In order to store data into HDFS, Hadoop takes the files and divides them into blocks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!59TE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6156900b-468a-4619-b709-823699c3b3f5_1356x386.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!59TE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6156900b-468a-4619-b709-823699c3b3f5_1356x386.png 424w, https://substackcdn.com/image/fetch/$s_!59TE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6156900b-468a-4619-b709-823699c3b3f5_1356x386.png 848w, https://substackcdn.com/image/fetch/$s_!59TE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6156900b-468a-4619-b709-823699c3b3f5_1356x386.png 1272w, https://substackcdn.com/image/fetch/$s_!59TE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6156900b-468a-4619-b709-823699c3b3f5_1356x386.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!59TE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6156900b-468a-4619-b709-823699c3b3f5_1356x386.png" width="1356" height="386" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6156900b-468a-4619-b709-823699c3b3f5_1356x386.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:386,&quot;width&quot;:1356,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:49870,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!59TE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6156900b-468a-4619-b709-823699c3b3f5_1356x386.png 424w, https://substackcdn.com/image/fetch/$s_!59TE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6156900b-468a-4619-b709-823699c3b3f5_1356x386.png 848w, https://substackcdn.com/image/fetch/$s_!59TE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6156900b-468a-4619-b709-823699c3b3f5_1356x386.png 1272w, https://substackcdn.com/image/fetch/$s_!59TE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6156900b-468a-4619-b709-823699c3b3f5_1356x386.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It then takes these blocks and distributes them across multiple nodes in its cluster. 
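</p><p>To make that concrete, here is a minimal sketch in plain Scala (no Hadoop required) of cutting a file into fixed-size blocks and handing those blocks out to nodes. The names <code>Block</code>, <code>splitIntoBlocks</code>, and <code>assignRoundRobin</code> are made up for illustration, and the round-robin placement is an assumption to keep the example small; it is not how HDFS actually picks nodes.</p><pre><code><code>// Toy model only: every name below is hypothetical, not part of any Hadoop API.
final case class Block(index: Int, offset: Long, length: Long)

// Cut a file of `fileSize` bytes into blocks of at most `blockSize` bytes.
// The last block is shorter when the file size is not a multiple of the block size.
def splitIntoBlocks(fileSize: Long, blockSize: Long): Vector[Block] = {
  require(blockSize &gt; 0, "blockSize must be positive")
  (0L until fileSize by blockSize).zipWithIndex.map { case (offset, i) =&gt;
    Block(i, offset, math.min(blockSize, fileSize - offset))
  }.toVector
}

// Assumption: blocks are handed to nodes in simple round-robin order.
def assignRoundRobin(blocks: Vector[Block], nodes: Vector[String]): Map[String, Vector[Block]] =
  blocks.groupBy(b =&gt; nodes(b.index % nodes.size))

// Deliberately tiny numbers so the result is easy to check by hand.
val blocks = splitIntoBlocks(fileSize = 300L, blockSize = 128L)
val placement = assignRoundRobin(blocks, Vector("node-1", "node-2"))</code></code></pre><p>With these toy numbers, a 300-byte file and a 128-byte block size give three blocks (128, 128, and 44 bytes); blocks 0 and 2 land on node-1 and block 1 lands on node-2. 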
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fTFL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f45378-b640-4061-8abd-6af913f207ac_890x668.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fTFL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f45378-b640-4061-8abd-6af913f207ac_890x668.png 424w, https://substackcdn.com/image/fetch/$s_!fTFL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f45378-b640-4061-8abd-6af913f207ac_890x668.png 848w, https://substackcdn.com/image/fetch/$s_!fTFL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f45378-b640-4061-8abd-6af913f207ac_890x668.png 1272w, https://substackcdn.com/image/fetch/$s_!fTFL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f45378-b640-4061-8abd-6af913f207ac_890x668.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fTFL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f45378-b640-4061-8abd-6af913f207ac_890x668.png" width="471" height="353.51460674157306" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95f45378-b640-4061-8abd-6af913f207ac_890x668.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:668,&quot;width&quot;:890,&quot;resizeWidth&quot;:471,&quot;bytes&quot;:108222,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fTFL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f45378-b640-4061-8abd-6af913f207ac_890x668.png 424w, https://substackcdn.com/image/fetch/$s_!fTFL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f45378-b640-4061-8abd-6af913f207ac_890x668.png 848w, https://substackcdn.com/image/fetch/$s_!fTFL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f45378-b640-4061-8abd-6af913f207ac_890x668.png 1272w, https://substackcdn.com/image/fetch/$s_!fTFL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f45378-b640-4061-8abd-6af913f207ac_890x668.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>&#8230; Technically, it also replicates each block, so in practice it looks more like this (depending on how many replicas you set). In the image below, we&#8217;re assuming two replicas. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3QCb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3QCb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png 424w, https://substackcdn.com/image/fetch/$s_!3QCb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png 848w, https://substackcdn.com/image/fetch/$s_!3QCb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png 1272w, https://substackcdn.com/image/fetch/$s_!3QCb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3QCb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png" width="491" height="356.3008849557522" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:656,&quot;width&quot;:904,&quot;resizeWidth&quot;:491,&quot;bytes&quot;:126532,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3QCb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png 424w, https://substackcdn.com/image/fetch/$s_!3QCb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png 848w, https://substackcdn.com/image/fetch/$s_!3QCb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png 1272w, https://substackcdn.com/image/fetch/$s_!3QCb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1258808f-d5a4-4a6f-91b6-f028f84656cf_904x656.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Note: Those of you experienced with HDFS will notice that my architecture diagram is intentionally simple: no NameNode / DataNode separation, no clients, etc. The purpose of this guide is to describe partitioning, not to dive deep into HDFS. </strong></p><p>Once that process is complete, your file is officially stored on HDFS. This process seems pretty straightforward, but these blocks play a critical role in the performance considerations associated with reads. </p><h3>Block Size</h3><p>So far, we&#8217;ve glossed over a somewhat important concept. 
We know that files are divided into blocks&#8230; but exactly how many blocks? Are all of the blocks the same size? How big should a block be?</p><p>Let&#8217;s start with some details. The <strong>default</strong> block size in Hadoop is 128MB (it used to be 64MB). When I first heard this number, I remember thinking, &#8220;Wow, that&#8217;s massive.&#8221; And at the time, it felt like it was. 128MB feels like a huge amount of data, especially when you&#8217;re working with tiny local CSVs. </p><p>That is, until data scale grew and we started seeing folks store files that are hundreds of GBs in HDFS. </p><p>The block size is configurable on a per-client basis (the <code>dfs.blocksize</code> setting) and applies to the files that client writes. </p><h3>Tradeoffs </h3><p>The choice of block size determines the number of blocks that a file is &#8220;chunked&#8221; into (every block is full-size except, possibly, the last one) and, as such, the number of I/O operations required to read or write the file. </p><p>Intuitively, a <strong>larger</strong> block size means the file is divided into <strong>fewer</strong> blocks, and thus requires fewer I/O operations to read or write, but increases the amount of memory required to hold each block during processing.  </p><p>Conversely, a <strong>smaller</strong> block size means the file is divided into <strong>more</strong> blocks. This requires more I/O operations to read or write, but less memory to hold each block during processing. This can be beneficial when working with small files (but not too small) or when performing random access on a large file. However, more blocks also means more metadata for the NameNode to track, which can become a scalability bottleneck.</p><p>As an aside, there is a well-documented problem associated with having too many small files, called the <em>small files problem</em>. I don&#8217;t think I can do a better job of explaining why this is problematic than this article does: https://blog.cloudera.com/the-small-files-problem/ </p><p>Block size also impacts HDFS&#8217; fault tolerance, but that&#8217;s outside the scope of this article. </p><h2>Tuning for Efficiency </h2><p>As you&#8217;ve seen, reducing I/O through partition pruning is a powerful way to query your data <em>Efficiently</em>. Block size affects I/O in a similar way, so selecting a good block size can yield strong results too. </p><p>Unfortunately, there isn&#8217;t a one-size-fits-all solution. Selecting a block size depends on the dataset, usage patterns, type of data, use case, and many other factors.</p><p>At a high level though, here&#8217;s what I&#8217;d recommend. </p><ol><li><p>Files smaller than a few hundred MB: a smaller block size, such as 64MB or even 128MB, can minimize the amount of wasted space in a block. </p></li><li><p>Files that are several GB or more: a larger block size, such as 256MB or 512MB, can help minimize the number of blocks generated by the &#8220;chunking&#8221; of a dataset and, ideally, minimize unnecessary I/O. </p></li></ol><p><strong>Disclaimer</strong>: These are just <strong>guidelines</strong>, and selecting your optimal block size will likely require some testing on your side. </p><h2>Feedback</h2><p>Thanks for reading! I&#8217;d love any feedback. Feel free to comment below. 
</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.vinoo.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Efficiently is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[On-Disk Storage Methods (w/ visualizations) ]]></title><description><![CDATA[The way you write data can affect your performance!]]></description><link>https://blog.vinoo.io/p/on-disk-storage-methods</link><guid isPermaLink="false">https://blog.vinoo.io/p/on-disk-storage-methods</guid><dc:creator><![CDATA[Vinoo Ganesh]]></dc:creator><pubDate>Sat, 14 Jan 2023 13:40:06 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0d3dab60-981a-4e13-8fa9-5484794b24fd_688x618.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://vinooganesh.substack.com/p/partitioning-part-2">Part 2 here.</a> </p><p>A few years ago, I gave a talk at <a href="https://www.databricks.com/session_na20/the-apache-spark-file-format-ecosystem">Spark Summit 2020</a> about file formats: Avro, ORC, and Parquet. I got a bunch of questions about that topic, which I promptly responded to point-to-point, ensuring that the knowledge lives only in those forums&#8230; and nowhere else. </p><p>That isn&#8217;t helpful for most people. 
This post is my attempt to fix that.</p><p>In this series, I&#8217;ll lay out some of the primitives of this topic and then dive into the hands-on details. </p><h2>Problem</h2><p>In the efficiency space, minimizing &#8220;work&#8221; is key. Whether work requires compute, network, or storage, <strong>the goal of efficient data usage is to get the most accurate answer in the fastest and cheapest way possible.</strong> </p><p>One of the ways the ecosystem has helped data practitioners accomplish this is through the development of file formats. When you think of a file format, you may think of an extension that looks like .xlsx, .pdf, .pptx&#8230; and you&#8217;d be right.</p><p>Each of these formats tells the &#8220;reader&#8221; of the format how to interpret the data that has been written to disk. Similarly, technologies like Parquet, Avro, and ORC help practitioners store their data in a way that minimizes work. </p><h2>Background / Example Data</h2><p><s>The word partition is derived from the Latin</s> Let&#8217;s get to the actionable details. </p><p>A partition is a logical segment of data. 
In the big data world, this usually means a piece of a larger dataset. To illustrate this, I&#8217;m going to use the example dataset below.</p><p>This dataset has 3 columns (Column A, Column B, and Column C) and 4 rows (Row 0, Row 1, Row 2, and Row 3). </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IWOW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IWOW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png 424w, https://substackcdn.com/image/fetch/$s_!IWOW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png 848w, https://substackcdn.com/image/fetch/$s_!IWOW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png 1272w, https://substackcdn.com/image/fetch/$s_!IWOW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IWOW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png" width="1456" height="569" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:569,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95318,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IWOW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png 424w, https://substackcdn.com/image/fetch/$s_!IWOW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png 848w, https://substackcdn.com/image/fetch/$s_!IWOW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png 1272w, https://substackcdn.com/image/fetch/$s_!IWOW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1bf432-64af-41c8-9302-e9b7ad986093_1974x772.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This table should look just like something you&#8217;ve seen in Excel, in Pandas, etc. 
Let&#8217;s take this example a bit further and split up the individual elements into their own logical &#8220;pieces.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ig7a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ig7a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png 424w, https://substackcdn.com/image/fetch/$s_!Ig7a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png 848w, https://substackcdn.com/image/fetch/$s_!Ig7a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png 1272w, https://substackcdn.com/image/fetch/$s_!Ig7a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ig7a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png" width="1456" height="606" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:606,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130212,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ig7a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png 424w, https://substackcdn.com/image/fetch/$s_!Ig7a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png 848w, https://substackcdn.com/image/fetch/$s_!Ig7a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png 1272w, https://substackcdn.com/image/fetch/$s_!Ig7a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F272553a8-b0b0-4000-8360-aa364f14ad38_1874x780.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We can refer to each &#8220;cell&#8221; by its &#8220;&lt;column&gt;&lt;row&gt;.&#8221; For example, the second row in Column B (Row 1, since rows are numbered from 0) is called B1. </p><h2>Storage</h2><p>Let&#8217;s talk a bit about storage. In the data ecosystem, storage is thought of as cheap (at least cheaper than compute), and as such, few people look into ways of actually optimizing storage. </p><p>Storage - in this post - refers to the way that this data is actually stored on disk. Now, whatever represents &#8220;disk&#8221; may differ from system to system (for example, even though the concept of a disk isn&#8217;t exposed to you on S3, the data eventually ends up on one). </p><p>For this example, however, and to learn more about partitioning, I&#8217;m going to focus on the simplest storage mechanism - data that is stored on your local hard disk. From there, most everything else can be extrapolated.</p><h3>Background</h3><p>Data is stored on hard disks in what is called a <strong>block. 
</strong>A block is the minimum amount of data read during any read operation. </p><p>I&#8217;ve always thought about a block as a suitcase. When you go on a trip and have to check a bag, you pay the same price regardless of how full or empty your suitcase is. That being said, it&#8217;s optimal to fill your suitcase with as many objects relevant to your trip as possible, arranged so they are as easy to find as possible. </p><p>Extending this analogy further, packing a bunch of unnecessary stuff in your suitcase isn&#8217;t great. Additionally, bringing a ton of suitcases on your trip (unless everything is strictly necessary) also isn&#8217;t great. Finally, inside of the suitcase, you want to &#8220;group&#8221; similar things together - i.e. the two socks of a pair should be next to each other in the same suitcase, rather than one sock being in one suitcase and one sock being in the other.</p><p>In the land of hard drives, all of these insights apply. Reading unnecessary data is expensive. Reading fragmented data is expensive. Random seeks (sock example above) are expensive as well. </p><p>Our goal is to lay data out in a manner optimized for our workflows.</p><h3>Row-wise Storage</h3><p>In database land, the common way to store data used to be row-wise. It&#8217;s pretty easy to understand why. Most people think about datasets as a list of rows. 
</p><p>Taking our dataset above, let&#8217;s store this in a row-wise method.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jhJE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jhJE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png 424w, https://substackcdn.com/image/fetch/$s_!jhJE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png 848w, https://substackcdn.com/image/fetch/$s_!jhJE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png 1272w, https://substackcdn.com/image/fetch/$s_!jhJE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jhJE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png" width="1456" height="632" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:632,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114559,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jhJE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png 424w, https://substackcdn.com/image/fetch/$s_!jhJE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png 848w, https://substackcdn.com/image/fetch/$s_!jhJE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png 1272w, https://substackcdn.com/image/fetch/$s_!jhJE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6bb086-1692-4a1e-aa15-23b45af891f2_2106x914.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>As you can see, I have taken each row in order and packed as much of the rows as fits into a block before moving to the next block.  </p><p>This method works great when my goal is to read the data sequentially, row by row. All that&#8217;s required is a simple linear scan of the blocks in order. It doesn&#8217;t work as well if, for example, I want to look only at Column C. In that case, I&#8217;m required to read all of the blocks (i.e. read all of the data) and filter down to Column C.</p><p>This is the <strong>row-wise</strong> storage methodology.</p><h3>Columnar (Column-wise) Storage</h3><p>Column-wise storage takes the opposite approach and orients around columns. 
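</p><p>To make the contrast concrete, here&#8217;s a minimal sketch in Python. It&#8217;s a toy model, not how a real disk or file format works: the block capacity of 4 cells and the helper names (<code>pack</code>, <code>blocks_touched</code>) are assumptions I made up for illustration, applied to the example dataset from above.</p>

```python
from typing import List

# Example dataset from the post: 3 columns (A, B, C) and 4 rows (0..3).
COLUMNS = ["A", "B", "C"]
ROWS = [0, 1, 2, 3]
BLOCK_CAPACITY = 4  # assumed: cells per block, chosen only for illustration


def pack(cells: List[str], capacity: int = BLOCK_CAPACITY) -> List[List[str]]:
    """Split an ordered list of cells into fixed-size blocks."""
    return [cells[i:i + capacity] for i in range(0, len(cells), capacity)]


def blocks_touched(blocks: List[List[str]], wanted: List[str]) -> int:
    """Count the blocks that hold at least one wanted cell."""
    wanted_set = set(wanted)
    return sum(1 for block in blocks if wanted_set.intersection(block))


# Row-wise order: A0 B0 C0 A1 B1 C1 ...
row_wise = pack([f"{c}{r}" for r in ROWS for c in COLUMNS])
# Column-wise order: A0 A1 A2 A3 B0 B1 ...
column_wise = pack([f"{c}{r}" for c in COLUMNS for r in ROWS])

column_c = [f"C{r}" for r in ROWS]   # every cell of Column C
row_0 = [f"{c}0" for c in COLUMNS]   # every cell of Row 0

for name, blocks in (("row-wise", row_wise), ("column-wise", column_wise)):
    print(name, blocks)
    print("  blocks read for Column C:", blocks_touched(blocks, column_c))
    print("  blocks read for Row 0:   ", blocks_touched(blocks, row_0))
```

<p>With these assumed numbers, reading Column C touches all three blocks in the row-wise layout but only one in the column-wise layout, and reconstructing Row 0 is exactly the reverse. 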
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DckT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DckT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png 424w, https://substackcdn.com/image/fetch/$s_!DckT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png 848w, https://substackcdn.com/image/fetch/$s_!DckT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png 1272w, https://substackcdn.com/image/fetch/$s_!DckT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DckT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png" width="1456" height="624" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:624,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:117456,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DckT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png 424w, https://substackcdn.com/image/fetch/$s_!DckT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png 848w, https://substackcdn.com/image/fetch/$s_!DckT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png 1272w, https://substackcdn.com/image/fetch/$s_!DckT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a4d0e9c-6079-4343-bede-5d53087c8861_2148x920.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As you can see, we first take the entire column, pack it into a block, and then move on to the next column. </p><p>This method works great when the data is read in a columnar way (i.e., one column at a time). It doesn&#8217;t work as well if, for example, I want to reconstruct Row 0. In that situation, I&#8217;d need to read all of the data and filter down to the elements that make up Row 0.</p><p>Now we&#8217;re in a dilemma - one approach seems <s>too hot</s> to favor a row oriented workflow, and the other seems <s>too cold</s> to favor a column oriented workflow. 
Luckily for us (and Goldilocks), there&#8217;s a middle ground.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!elWW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!elWW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg 424w, https://substackcdn.com/image/fetch/$s_!elWW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg 848w, https://substackcdn.com/image/fetch/$s_!elWW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!elWW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!elWW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg" width="480" height="320" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The Goldilocks Principle: Getting Your Software Implementation Just Right -  Wilson Allen&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Goldilocks Principle: Getting Your Software Implementation Just Right -  Wilson Allen" title="The Goldilocks Principle: Getting Your Software Implementation Just Right -  Wilson Allen" srcset="https://substackcdn.com/image/fetch/$s_!elWW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg 424w, https://substackcdn.com/image/fetch/$s_!elWW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg 848w, https://substackcdn.com/image/fetch/$s_!elWW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!elWW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa62069f4-5403-425d-ab8f-366845d37b61_480x320.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Hybrid Storage</h3><p>A hybrid storage model gives us the best of both worlds. First, we group a fixed number of rows together, and then we further group the data within each set of rows by column. We call these segments &#8220;Row Groups&#8221; (at least in Parquet terminology).</p><p>In this example, we first selected two rows - Row 0 and Row 1. 
We then grouped those rows by column and inserted them into our first Row Group.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dbR3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dbR3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png 424w, https://substackcdn.com/image/fetch/$s_!dbR3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png 848w, https://substackcdn.com/image/fetch/$s_!dbR3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png 1272w, https://substackcdn.com/image/fetch/$s_!dbR3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dbR3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png" width="1456" height="619" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:619,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114242,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dbR3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png 424w, https://substackcdn.com/image/fetch/$s_!dbR3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png 848w, https://substackcdn.com/image/fetch/$s_!dbR3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png 1272w, https://substackcdn.com/image/fetch/$s_!dbR3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f0fdca-b73d-4708-a0b3-0a598f4562d3_2122x902.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I call these logical Row Groups because this is how we should think about them, not necessarily how they end up on disk. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dhRd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dhRd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png 424w, https://substackcdn.com/image/fetch/$s_!dhRd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png 848w, https://substackcdn.com/image/fetch/$s_!dhRd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png 1272w, https://substackcdn.com/image/fetch/$s_!dhRd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dhRd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png" width="1456" height="542" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:542,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134806,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dhRd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png 424w, https://substackcdn.com/image/fetch/$s_!dhRd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png 848w, https://substackcdn.com/image/fetch/$s_!dhRd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png 1272w, https://substackcdn.com/image/fetch/$s_!dhRd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7db58e-9392-4fd0-bfd7-771f6ce73c36_2164x806.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This representation of data is immensely powerful. It lets us optimize for both row oriented and column oriented workflows.</p><p>Let&#8217;s talk about how this works. </p><p>In the case of a row oriented workflow, let&#8217;s say I want to recreate Row 2. To do this, I would simply need to look at Block 1 and Block 2. If I were operating in a Columnar storage model, I would need to look at Block 1, Block 2, and Block 3. I&#8217;ve saved a whole Block! </p><p>In the case of a column oriented workflow, let&#8217;s say I want to recreate Column B. In this case, I would simply need to look at Block 1 and Block 2. If I were operating in a Row-wise storage model, I would need to look at Block 1, Block 2, and Block 3. 
I&#8217;ve once again saved a whole Block!</p><p>Our examples used very small data; you can imagine how these savings grow with larger data sets.</p><h2>Data Workflows</h2><p>Throughout this post, I&#8217;ve referred to my data workflows as &#8220;row oriented&#8221; or &#8220;column oriented.&#8221; Luckily for us, the big data community has come up with some terminology that should help bring these two workflows to life.</p><h3>OLTP</h3><p>Online Transaction Processing (OLTP) workloads generally involve large numbers of short queries/transactions. These tend to be more focused on processing than analytics and, as such, involve more data updates and deletes. Roughly - we can consider OLTP workflows as &#8220;row oriented&#8221; workflows.</p><h3>OLAP</h3><p>Online Analytical Processing (OLAP) workloads are more analysis focused than processing focused. As such, there tends to be more analytical complexity per query and fewer CRUD transactions. Roughly - we can consider OLAP workflows as &#8220;column oriented&#8221; workflows.</p><h2>Conclusion</h2><p>Using data efficiently relies on using all levels of the &#8220;data stack&#8221; (storage, network, compute) efficiently. Reducing the amount of unnecessary data read during a query can have compounding effects on the speed and efficiency of your analytics process. </p><h2>Coming soon</h2><p>In subsequent parts of this series, I&#8217;ll be digging more into the details of how everything we have covered thus far can be applied in analytics workloads. </p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.vinoo.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Efficiently is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Introducing: Efficiently ]]></title><description><![CDATA[Efficiently: A blog dedicated to operationalizing data in a world of limited resources.]]></description><link>https://blog.vinoo.io/p/efficiently</link><guid isPermaLink="false">https://blog.vinoo.io/p/efficiently</guid><dc:creator><![CDATA[Vinoo Ganesh]]></dc:creator><pubDate>Wed, 28 Dec 2022 20:39:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f4a6952f-885f-40d3-832a-17f9e121bc83_870x872.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.vinoo.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.vinoo.io/subscribe?"><span>Subscribe now</span></a></p><p>My name is Vinoo and I&#8217;ll be your guide, writer, and (likely) ranter throughout this series. </p><h3>0. What this is </h3><p>Recently, I had a friend recommend Austin Kleon&#8217;s <a href="https://www.amazon.com/Show-Your-Work-Austin-Kleon/dp/076117897X">Show Your Work</a>. I read it over the course of a short plane ride and realized&#8230; it&#8217;s time to start showing my work. Publicly. </p><p>This blog series, at its core, is a collection of the things that I&#8217;ve learned while building analytical tools, creating data products, and consuming data products. 
Mostly, it&#8217;s intended as a technical series for technical audiences, but I&#8217;m not sure where it&#8217;ll actually go. </p><p>But, at its core, this blog is me challenging myself to extract meaningful and applicable lessons from the challenges I&#8217;ve faced working in the data ecosystem for over a decade.</p><p><strong>In short, this blog is a public forcing function for me to democratize the best practices I&#8217;ve learned along the way.</strong> </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!An0p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!An0p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png 424w, https://substackcdn.com/image/fetch/$s_!An0p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png 848w, https://substackcdn.com/image/fetch/$s_!An0p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png 1272w, https://substackcdn.com/image/fetch/$s_!An0p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!An0p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png" width="1334" height="256" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:256,&quot;width&quot;:1334,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:87404,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!An0p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png 424w, https://substackcdn.com/image/fetch/$s_!An0p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png 848w, https://substackcdn.com/image/fetch/$s_!An0p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png 1272w, https://substackcdn.com/image/fetch/$s_!An0p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f346618-1593-4759-a6cc-9ec4c12f58d2_1334x256.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><h3>1. Why this, why now </h3><p>We grew up in the golden age of data. 
The cloud migration was well underway. Organizations transformed and decisions became data driven. The modern data stack was cool, powerful, and new. The potential for compute was endless. Money was cheap and plentiful. It was worse to do nothing than spend 7 figures on building a data platform. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3JVL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3JVL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3JVL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3JVL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3JVL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3JVL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Anne Lamott Quote: &#8220;If you have a problem you can solve by throwing money  at it,&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Anne Lamott Quote: &#8220;If you have a problem you can solve by throwing money  at it," title="Anne Lamott Quote: &#8220;If you have a problem you can solve by throwing money  at it," srcset="https://substackcdn.com/image/fetch/$s_!3JVL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3JVL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3JVL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!3JVL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6f4be2-42ba-4d16-a046-a99c8728474a_3840x2160.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Then, things got real. The economic downturn hit&#8230; and it hit hard. Layoffs started. Money got tight. Many startups had to look up the word &#8220;profit&#8221; for the first time (joking, but not really). 
</p><p>In these hard times though, companies tend to focus on <a href="https://fortune.com/2022/11/09/top-budget-items-companies-cutting-costs-recession/">cutting</a> only what they can &#8220;see.&#8221; Specifically, they cut office space, employee travel, and employee expense budgets. They cut marketing budgets, third-party contractors, and of course, sadly, employees. </p><p>All the while, their compute bills are increasing by an order of magnitude year over year. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ewBp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ewBp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ewBp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ewBp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ewBp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ewBp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg" width="500" height="649" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:649,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:83434,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ewBp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ewBp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ewBp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ewBp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21a06a8-8010-4ca8-8aaf-e34798830bef_500x649.jpeg 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>The compounding cost of cloud is an existential threat for enterprises. Without enough capital to fund business operations, and without a way to assess their current cloud and data investments, organizations are left in a precarious position. Data practitioners need to know about this methodology because it can be the very thing that saves their company from, at best, an unexpected bill and, at worst, a cessation of business.</p><h3>2. 
Who may benefit from this</h3><p>YC just sent this to its founders.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F4Vj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F4Vj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!F4Vj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!F4Vj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!F4Vj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F4Vj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg" width="1022" height="1600" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1600,&quot;width&quot;:1022,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:377220,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F4Vj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!F4Vj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!F4Vj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!F4Vj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F02210889-291c-471e-89b5-b85f05b58050_1022x1600.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are a lot of scary things going on in this post, but the high-level takeaway is that founders need to <a href="https://www.youtube.com/watch?v=yvHYWD29ZNY">$ave Dat Money</a>. This series is for you if your enterprise would like to $ave Dat Money. </p><h3>3. Why me</h3><p>I&#8217;ve gotten it wrong&#8230; a lot. And as such, I&#8217;ve had to learn a lot. </p><p>I have always been fascinated by the power of data to drive business decisions, inform research, and improve our understanding of the world around us. My career has taken me from the tech sector to the financial sector. I&#8217;ve built analytical tools, sold data as a broker, consumed data from brokers, and most recently, helped folks optimize their internal tech stacks. </p><p>That's where this blog comes in. 
My goal is to explore how we can use data more efficiently, both in terms of the resources we use to collect and process it, and in terms of how we use it to drive meaningful results.</p><h3>4. Where I&#8217;ll start</h3><p>I&#8217;m likely going to start with some of the areas I&#8217;ve worked with personally and tell (more than) a few stories along the way. I'll be covering topics ranging from data storage and management to data analysis and visualization, and sharing case studies and best practices from my own experiences and those of others in the industry. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZFbP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZFbP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZFbP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZFbP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!ZFbP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZFbP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg" width="576" height="433" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/a86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:433,&quot;width&quot;:576,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54962,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZFbP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZFbP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZFbP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!ZFbP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa86b6081-7890-4e8e-9a62-e1192f695b51_576x433.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I'm thrilled to have you join me on this journey, and I hope you'll find value in the content we'll be sharing. 
</p>]]></content:encoded></item></channel></rss>