<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <generator uri="http://jekyllrb.com" version="4.3.3">Jekyll</generator>
  
  
  <link href="/feed.xml" rel="self" type="application/atom+xml" />
  <link href="/" rel="alternate" type="text/html" />
  <updated>2025-11-20T12:58:55+00:00</updated>
  <id>//</id>

  
    <title type="html">SCALED VOID</title>
  

  

  

  
  
    <entry>
      
      <title type="html">[DRAFT] Debugging FLINK-36808 Issue</title>
      
      
      <link href="/2025/11/20/flink-36808/" rel="alternate" type="text/html" title="[DRAFT] Debugging FLINK-36808 Issue" />
      
      <published>2025-11-20T00:00:00+00:00</published>
      <updated>2025-11-20T00:00:00+00:00</updated>
      <id>/2025/11/20/flink-36808</id>
      <content type="html" xml:base="/2025/11/20/flink-36808/">&lt;p&gt;Recently I picked up the &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-36808&quot;&gt;FLINK-36808&lt;/a&gt; bug in the Apache Flink project. The solution to the bug covers topics such as SQL LookupJoin, the Flink SQL planner, and the Volcano optimizer (which is provided by the Apache Calcite project).&lt;/p&gt;

&lt;p&gt;In this post I describe my experience debugging the issue and the lessons learned along the way. The pull request with the fix is already available: &lt;a href=&quot;https://github.com/apache/flink/pull/26514&quot;&gt;PR #26514&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The issue was internally reported by my colleague &lt;a href=&quot;&quot;&gt;Jun Qin&lt;/a&gt; and initial investigations were done by my teammate &lt;a href=&quot;&quot;&gt;Qingsheng Ren&lt;/a&gt;. Let us start by first trying to understand the issue.&lt;/p&gt;

&lt;p&gt;As mentioned in the ticket, a union of two lookup joins against the same dimension table, each with a different filter, returns wrong results.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Data of table `stream`:&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- (1, Alice)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- (2, Bob)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TEMPORARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`stream`&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;nv&quot;&gt;`id`&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;nv&quot;&gt;`name`&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;nv&quot;&gt;`txn_time`&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;proctime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;`id`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ENFORCED&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&apos;connector&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;jdbc&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&apos;url&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;jdbc:postgresql://localhost:5432/postgres&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&apos;table-name&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;stream&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&apos;username&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;postgres&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&apos;password&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;postgres&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Data of table `dim`:&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- (1, OK)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- (2, OK)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TEMPORARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`dim`&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;nv&quot;&gt;`id`&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;nv&quot;&gt;`status`&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;`id`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ENFORCED&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&apos;connector&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;jdbc&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&apos;url&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;jdbc:postgresql://localhost:5432/postgres&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&apos;table-name&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;dim&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&apos;username&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;postgres&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&apos;password&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;postgres&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Lookup join two tables twice with different filter, and union them together&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;txn_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`stream`&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`s`&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INNER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`dim`&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SYSTEM_TIME&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OF&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`s`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;`txn_time`&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`d`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt;
     &lt;span class=&quot;nv&quot;&gt;`s`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;`id`&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`d`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;`id`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
     &lt;span class=&quot;nv&quot;&gt;`d`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;`status`&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;OK&apos;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;UNION&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ALL&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;txn_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;status&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`stream`&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`s`&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INNER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`dim`&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SYSTEM_TIME&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OF&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`s`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;`txn_time`&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`d`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt;
     &lt;span class=&quot;nv&quot;&gt;`s`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;`id`&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`d`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;`id`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
     &lt;span class=&quot;nv&quot;&gt;`d`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;`status`&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;NOT_EXISTS&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We expect to get the following results:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;1, Alice, 2024-11-27 11:52:19.332, OK
2, Bob,   2024-11-27 11:52:19.332, OK
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;However, the actual results we got were:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;1, Alice, 2024-11-27 11:52:19.332, OK
2, Bob,   2024-11-27 11:52:19.332, OK
1, Alice, 2024-11-27 11:52:19.333, NOT_EXISTS
2, Bob,   2024-11-27 11:52:19.333, NOT_EXISTS
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is obviously wrong, since no row in the dimension table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dim&lt;/code&gt; has the status &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT_EXISTS&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Yet the SQL plans reported on the ticket seem correct.&lt;/p&gt;

&lt;p&gt;Abstract syntax tree:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;LogicalUnion(all=[true])
:- LogicalProject(id=[$0], name=[$1], txn_time=[$2], status=[$4])
:  +- LogicalFilter(condition=[=($4, _UTF-16LE&apos;OK&apos;)])
:     +- LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0, 2}])
:        :- LogicalProject(id=[$0], name=[$1], txn_time=[PROCTIME()])
:        :  +- LogicalTableScan(table=[[default_catalog, default_database, stream]])
:        +- LogicalFilter(condition=[=($cor0.id, $0)])
:           +- LogicalSnapshot(period=[$cor0.txn_time])
:              +- LogicalTableScan(table=[[default_catalog, default_database, dim]])
+- LogicalProject(id=[$0], name=[$1], txn_time=[$2], status=[$4])
   +- LogicalFilter(condition=[=($4, _UTF-16LE&apos;NOT_EXISTS&apos;)])
      +- LogicalCorrelate(correlation=[$cor1], joinType=[inner], requiredColumns=[{0, 2}])
         :- LogicalProject(id=[$0], name=[$1], txn_time=[PROCTIME()])
         :  +- LogicalTableScan(table=[[default_catalog, default_database, stream]])
         +- LogicalFilter(condition=[=($cor1.id, $0)])
            +- LogicalSnapshot(period=[$cor1.txn_time])
               +- LogicalTableScan(table=[[default_catalog, default_database, dim]])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Optimized physical plan:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Calc(select=[id, name, PROCTIME_MATERIALIZE(txn_time) AS txn_time, status])
+- Union(all=[true], union=[id, name, txn_time, status])
   :- Calc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;OK&apos;:VARCHAR(2147483647) CHARACTER SET &quot;UTF-16LE&quot; AS VARCHAR(2147483647) CHARACTER SET &quot;UTF-16LE&quot;) AS status])
   :  +- LookupJoin(table=[default_catalog.default_database.dim], joinType=[InnerJoin], lookup=[id=id], select=[id, name, txn_time, id])
   :     +- Calc(select=[id, name, PROCTIME() AS txn_time])
   :        +- TableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
   +- Calc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647) CHARACTER SET &quot;UTF-16LE&quot; AS VARCHAR(2147483647) CHARACTER SET &quot;UTF-16LE&quot;) AS status])
      +- LookupJoin(table=[default_catalog.default_database.dim], joinType=[InnerJoin], lookup=[id=id], select=[id, name, txn_time, id])
         +- Calc(select=[id, name, PROCTIME() AS txn_time])
            +- TableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And optimized execution plan:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Calc(select=[id, name, PROCTIME_MATERIALIZE(txn_time) AS txn_time, status])
+- Union(all=[true], union=[id, name, txn_time, status])
   :- Calc(select=[id, name, txn_time, CAST(&apos;OK&apos; AS VARCHAR(2147483647)) AS status])
   :  +- LookupJoin(table=[default_catalog.default_database.dim], joinType=[InnerJoin], lookup=[id=id], select=[id, name, txn_time, id])(reuse_id=[1])
   :     +- Calc(select=[id, name, PROCTIME() AS txn_time])
   :        +- TableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
   +- Calc(select=[id, name, txn_time, CAST(&apos;NOT_EXISTS&apos; AS VARCHAR(2147483647)) AS status])
      +- Reused(reference_id=[1])
&lt;/code&gt;&lt;/pre&gt;
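
&lt;p&gt;Plans like the three above can be printed with the Flink SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; statement, whose output is split into an abstract syntax tree, an optimized physical plan, and an optimized execution plan. The snippet below is a minimal sketch, assuming the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;stream&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dim&lt;/code&gt; tables from the DDL above are registered in the current session. To keep it short, it explains only the second branch of the union.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Sketch: print the plans for the NOT_EXISTS branch only.
-- Assumes `stream` and `dim` were created with the DDL shown earlier.
EXPLAIN PLAN FOR
SELECT
     s.id,
     s.name,
     s.txn_time,
     d.status
FROM `stream` AS `s` INNER JOIN `dim` FOR SYSTEM_TIME AS OF `s`.`txn_time` AS `d`
ON
     `s`.`id` = `d`.`id`
WHERE
     `d`.`status` = &apos;NOT_EXISTS&apos;;
&lt;/code&gt;&lt;/pre&gt;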

&lt;p&gt;I also ran the same query on our internal Flink engines and with MySQL as the database, and the results were the same. Fixing this bug in open-source Apache Flink is therefore important: all enterprise Flink distributions will benefit as well.&lt;/p&gt;

&lt;h2 id=&quot;understanding-the-issue&quot;&gt;Understanding the Issue&lt;/h2&gt;

&lt;p&gt;From a quick look at the optimized execution plan, we can see that the first lookup join is reused. To make sure that this is not the source of the problem, we can disable the reuse by setting the following configuration:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;SET &apos;table.optimizer.reuse-sub-plan-enabled&apos; = &apos;false&apos;;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We still get the wrong results, so sub-plan reuse is not the cause.&lt;/p&gt;

&lt;p&gt;To better debug the issue, it is a good idea to reproduce it in a test. After looking into the planner tests, I came up with a test case using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;values&apos;&lt;/code&gt; connector for both tables. For example, we can define the dimension table as below:&lt;/p&gt;

&lt;div class=&quot;language-scala highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;dimTableId&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;TestValuesTableFactory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;registerData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Seq&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;OK&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;OK&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;dimTableDDL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;
     | CREATE TABLE `dim` (
     |   `id` BIGINT,
     |   `status` STRING,
     |   PRIMARY KEY (`id`) NOT ENFORCED
     | ) WITH (
     |   &apos;connector&apos; = &apos;values&apos;,
     |   &apos;data-id&apos; = &apos;$dimTableId&apos;
     | )
     |&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;stripMargin&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;executeSql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dimTableDDL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Lo and behold, the issue does not happen! Okay, this is a good sign: we can now try to find out why the behavior differs between the two connectors.&lt;/p&gt;

&lt;h3 id=&quot;comparing-planner-transformations&quot;&gt;Comparing Planner Transformations&lt;/h3&gt;

&lt;p&gt;We know that the same query behaves differently with different connectors. To investigate further, let’s compare each query transformation. To do so, enable debug logging or add log statements in the &lt;a href=&quot;https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/optimize/program/FlinkChainedProgram.scala#L62-L66&quot;&gt;FlinkChainedProgram#optimize&lt;/a&gt; method.&lt;/p&gt;

&lt;p&gt;(There are many optimization steps in the program; they deserve a post of their own.)&lt;/p&gt;

&lt;p&gt;Listing all the optimization steps and comparing each plan (with minor simplifications), we can see the following differences.&lt;/p&gt;

&lt;h4 id=&quot;logical-rewrite&quot;&gt;Logical Rewrite&lt;/h4&gt;

&lt;p&gt;The first difference is in the logical rewrite step.&lt;/p&gt;

&lt;style&gt;
.diff {
    pre { background-color: #282c34; color: #abb2bf; font-size: 10px; }
    .DiffChange {background-color: #44403c; color: #e0af68}
    .DiffText {background-color: #564d41}
}

/* Tree styles */
ul.tree {
  font-size: 14px;
  list-style: none;
  margin: 0;
  padding-top: 5px;
  padding-bottom: 20px;
  padding-left: 1em;
  border-radius: 5px;
  background-color: #2b2b2b;
  color: #a9b7c6;
}
ul.tree ul {
  list-style: none;
  margin-left: .6em;
  padding-left: .6em;
}
ul.tree li {
  position: relative;
  line-height: 1.4em;
}

/* Common row styling */
.row {
  display: flex;
  align-items: center;
  border-radius: 3px;
}

/* Only this .row gets the highlight */
.row.highlighted {
  /* pull the blue back out to the very left edge */
  margin-left: -1em;
  padding-left: 1em;
  background-color: #214283;
}

icon {
  width: 16px;
  height: 16px;
  margin-right: 4px;
}

/* Toggle icons */
.toggle {
  display: inline-block;
  width: 1em;
  color: #BBB;
}

/* colouring */
.field-name { color: #CC7832; padding-right: 3px }  /* orange fields */
.type       { color: #9876AA; padding-left: 3px }   /* purple type names */
.value      { color: #6A8759; padding-left: 0 }     /* green values */
.id         { color: #d2d2cc; padding-left: 3px }     /* white values */

li.highlighted {
  /* pull the blue back out to the very left edge */
  margin-left: -7em;
  padding-left: 7em;
  background-color: #214283;
  border-radius: 3px;
}
&lt;/style&gt;

&lt;table style=&quot;table-layout:fixed; width: 1500px;&quot;&gt;
&lt;tr style=&quot;text-align: center;&quot;&gt;
    &lt;th&gt;&lt;em&gt;Logical Rewrite rule for union all query using JDBC connector&lt;/em&gt;&lt;/th&gt;
    &lt;th&gt;&lt;em&gt;Logical Rewrite rule for union all query using test &apos;values&apos; connector&lt;/em&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr class=&quot;diff&quot;&gt;
&lt;td&gt;
&lt;pre&gt;
optimize &apos;logical_rewrite&apos; cost 29 ms.

original input:

FlinkLogicalUnion(all=[true])
:- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;OK&apos;:VARCHAR(2147483647)) AS status])
:  +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
:     :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
:     :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
:     +- FlinkLogicalCalc(select=[id], where=[=(status, _UTF-16LE&apos;OK&apos;:VARCHAR(2147483647))])
:        +- FlinkLogicalSnapshot(period=[$cor0.txn_time])
:           +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim]], fields=[id, status])
+- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647)) AS status])
   +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
      :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
      :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
      +- FlinkLogicalCalc(select=[id], where=[=(status, _UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647))])
         +- FlinkLogicalSnapshot(period=[$cor1.txn_time])
            +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim]], fields=[id, status])

optimized output:

FlinkLogicalUnion(all=[true])
:- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;OK&apos;:VARCHAR(2147483647)) AS status])
:  +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
:     :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
:     :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
:     +- FlinkLogicalSnapshot(period=[$cor0.txn_time])
&lt;span class=&quot;DiffChange&quot;&gt;:        +- FlinkLogicalCalc(select=[id])&lt;/span&gt;
&lt;span class=&quot;DiffChange&quot;&gt;:           +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim, filter=[&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;=(status, _UTF-16LE&apos;OK&apos;:VARCHAR(2147483647))&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;]]], fields=[id, status])&lt;/span&gt;
+- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647)) AS status])
   +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
      :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
      :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
      +- FlinkLogicalSnapshot(period=[$cor1.txn_time])
&lt;span class=&quot;DiffChange&quot;&gt;         +- FlinkLogicalCalc(select=[id])&lt;/span&gt;
&lt;span class=&quot;DiffChange&quot;&gt;            +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim, filter=[&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;=(status, _UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647))&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;]]], fields=[id, status])&lt;/span&gt;

&lt;/pre&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;pre&gt;
optimize &apos;logical_rewrite&apos; cost 29 ms.

original input:

FlinkLogicalUnion(all=[true])
:- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;OK&apos;:VARCHAR(2147483647)) AS status])
:  +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
:     :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
:     :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
:     +- FlinkLogicalCalc(select=[id], where=[=(status, _UTF-16LE&apos;OK&apos;:VARCHAR(2147483647))])
:        +- FlinkLogicalSnapshot(period=[$cor0.txn_time])
:           +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim]], fields=[id, status])
+- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647)) AS status])
   +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
      :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
      :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
      +- FlinkLogicalCalc(select=[id], where=[=(status, _UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647))])
         +- FlinkLogicalSnapshot(period=[$cor1.txn_time])
            +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim]], fields=[id, status])

optimized output:

FlinkLogicalUnion(all=[true])
:- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;OK&apos;:VARCHAR(2147483647)) AS status])
:  +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
:     :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
:     :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
:     +- FlinkLogicalSnapshot(period=[$cor0.txn_time])
&lt;span class=&quot;DiffChange&quot;&gt;:        +- FlinkLogicalCalc(select=[id]&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;, where=[=(status, _UTF-16LE&apos;OK&apos;:VARCHAR(2147483647))]&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;DiffChange&quot;&gt;:           +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim, filter=[]]], fields=[id, status])&lt;/span&gt;
+- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647)) AS status])
   +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
      :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
      :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
      +- FlinkLogicalSnapshot(period=[$cor1.txn_time])
&lt;span class=&quot;DiffChange&quot;&gt;         +- FlinkLogicalCalc(select=[id]&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;, where=[=(status, _UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647))]&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;DiffChange&quot;&gt;            +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim, filter=[]]], fields=[id, status])&lt;/span&gt;

&lt;/pre&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Well, what is going on here?&lt;/p&gt;

&lt;p&gt;We have the same logical input, but the optimized results are different. After discussing with my colleagues, I realized that the JDBC connector supports the &lt;a href=&quot;https://github.com/apache/flink-connector-jdbc/blob/main/flink-connector-jdbc-core/src/main/java/org/apache/flink/connector/jdbc/core/table/source/JdbcDynamicTableSource.java#L70&quot;&gt;filter push-down&lt;/a&gt; capability. Because of this, the Flink SQL planner pushes the filter condition down into the table scan when using the JDBC connector. This is not the case for the test &lt;strong&gt;values&lt;/strong&gt; connector, where the filter/where condition stays in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FlinkLogicalCalc&lt;/code&gt; node, which is applied after the table scan.&lt;/p&gt;

&lt;!-- credits:
    - https://github.com/tomjoht/documentation-theme-jekyll
    - https://github.com/antfu/markdown-it-github-alerts/tree/main
    - https://idratherbewriting.com/documentation-theme-jekyll/mydoc_alerts.html
--&gt;
&lt;div class=&quot;markdown-alert markdown-alert-note&quot; role=&quot;alert&quot;&gt;
    &lt;p class=&quot;markdown-alert-title&quot; dir=&quot;auto&quot;&gt;
        &lt;svg class=&quot;octicon octicon-info mr-2&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot; height=&quot;16&quot; aria-hidden=&quot;true&quot;&gt;
            &lt;path d=&quot;M0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8Zm8-6.5a6.5 6.5 0 1 0 0 13 6.5 6.5 0 0 0 0-13ZM6.5 7.75A.75.75 0 0 1 7.25 7h1a.75.75 0 0 1 .75.75v2.75h.25a.75.75 0 0 1 0 1.5h-2a.75.75 0 0 1 0-1.5h.25v-2h-.25a.75.75 0 0 1-.75-.75ZM8 6a1 1 0 1 1 0-2 1 1 0 0 1 0 2Z&quot;&gt;&lt;/path&gt;
        &lt;/svg&gt;
        Note
    &lt;/p&gt;
    &lt;div&gt;&lt;em&gt;The filter push-down optimization is important because it lets us filter the data at the source, which reduces the amount of data transferred over the network and improves the performance of SQL queries.&lt;/em&gt;&lt;/div&gt;
&lt;/div&gt;
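
&lt;p&gt;As a rough illustration of what push-down means for a plain scan of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dim&lt;/code&gt;, the two statements below show what the external database would be asked in each case. They are a sketch for intuition only and an assumption on my part, not the exact SQL that the JDBC connector generates.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Without filter push-down: the source reads every row,
-- and Flink applies the predicate afterwards in a Calc node.
SELECT id, status FROM dim;

-- With filter push-down: the predicate travels with the scan,
-- so the database evaluates it and returns only matching rows.
SELECT id, status FROM dim WHERE status = &apos;OK&apos;;
&lt;/code&gt;&lt;/pre&gt;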

&lt;p&gt;Okay, this is fine. Let’s check the next difference.&lt;/p&gt;

&lt;h4 id=&quot;physical-optimization&quot;&gt;Physical Optimization&lt;/h4&gt;

&lt;p&gt;The second difference happens in the physical optimization step.&lt;/p&gt;

&lt;table style=&quot;table-layout:fixed; width:1500px;&quot;&gt;
&lt;tr style=&quot;text-align: center;&quot;&gt;
    &lt;th&gt;&lt;em&gt;Physical Optimization rule for union all query using JDBC connector&lt;/em&gt;&lt;/th&gt;
    &lt;th&gt;&lt;em&gt;Physical Optimization rule for union all query using test &apos;values&apos; connector&lt;/em&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr class=&quot;diff&quot;&gt;
&lt;td&gt;
&lt;pre&gt;
optimize &apos;physical&apos; cost 29 ms.

original input:

FlinkLogicalCalc(select=[id, name, PROCTIME_MATERIALIZE(txn_time) AS txn_time, status])
+- FlinkLogicalUnion(all=[true])
   :- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;OK&apos;:VARCHAR(2147483647)) AS status])
   :  +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
   :     :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
   :     :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
   :     +- FlinkLogicalSnapshot(period=[$cor0.txn_time])
&lt;span class=&quot;DiffChange&quot;&gt;   :        +- FlinkLogicalCalc(select=[id])&lt;/span&gt;
&lt;span class=&quot;DiffChange&quot;&gt;   :           +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim, filter=[&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;=(status, _UTF-16LE&apos;OK&apos;:VARCHAR(2147483647))&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;]]], fields=[id, status])&lt;/span&gt;
   +- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647)) AS status])
      +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
         :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
         :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
         +- FlinkLogicalSnapshot(period=[$cor1.txn_time])
&lt;span class=&quot;DiffChange&quot;&gt;            +- FlinkLogicalCalc(select=[id])&lt;/span&gt;
&lt;span class=&quot;DiffChange&quot;&gt;               +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim, filter=[&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;=(status, _UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647))&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;]]], fields=[id, status])&lt;/span&gt;

optimized output:

Calc(select=[id, name, PROCTIME_MATERIALIZE(txn_time) AS txn_time, status])
+- Union(all=[true], union=[id, name, txn_time, status])
   :- Calc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;OK&apos;:VARCHAR(2147483647)) AS status])
&lt;span class=&quot;DiffChange&quot;&gt;   :  +- LookupJoin(table=[default_catalog.default_database.dim], joinType=[InnerJoin], lookup=[&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;id=id&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;], select=[id, name, txn_time, id], upsertKey=[[0]])&lt;/span&gt;
   :     +- Calc(select=[id, name, PROCTIME() AS txn_time])
   :        +- TableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
   +- Calc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647)) AS status])
&lt;span class=&quot;DiffChange&quot;&gt;      +- LookupJoin(table=[default_catalog.default_database.dim], joinType=[InnerJoin], lookup=[&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;id=id&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;], select=[id, name, txn_time, id], upsertKey=[[0]])&lt;/span&gt;
         +- Calc(select=[id, name, PROCTIME() AS txn_time])
            +- TableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])

&lt;/pre&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;pre&gt;
optimize &apos;physical&apos; cost 29 ms.

original input:

FlinkLogicalCalc(select=[id, name, PROCTIME_MATERIALIZE(txn_time) AS txn_time, status])
+- FlinkLogicalUnion(all=[true])
   :- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;OK&apos;:VARCHAR(2147483647)) AS status])
   :  +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
   :     :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
   :     :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
   :     +- FlinkLogicalSnapshot(period=[$cor0.txn_time])
&lt;span class=&quot;DiffChange&quot;&gt;   :        +- FlinkLogicalCalc(select=[id]&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;, where=[=(status, _UTF-16LE&apos;OK&apos;:VARCHAR(2147483647))]&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;DiffChange&quot;&gt;   :           +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim, filter=[]]], fields=[id, status])&lt;/span&gt;
   +- FlinkLogicalCalc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647)) AS status])
      +- FlinkLogicalJoin(condition=[=($0, $3)], joinType=[inner])
         :- FlinkLogicalCalc(select=[id, name, PROCTIME() AS txn_time])
         :  +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
         +- FlinkLogicalSnapshot(period=[$cor1.txn_time])
&lt;span class=&quot;DiffChange&quot;&gt;            +- FlinkLogicalCalc(select=[id]&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;, where=[=(status, _UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647))]&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;DiffChange&quot;&gt;               +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dim, filter=[]]], fields=[id, status])&lt;/span&gt;

optimized output:

Calc(select=[id, name, PROCTIME_MATERIALIZE(txn_time) AS txn_time, status])
+- Union(all=[true], union=[id, name, txn_time, status])
   :- Calc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;OK&apos;:VARCHAR(2147483647)) AS status])
&lt;span class=&quot;DiffChange&quot;&gt;   :  +- LookupJoin(table=[default_catalog.default_database.dim], joinType=[InnerJoin], lookup=[&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;status=_UTF-16LE&apos;OK&apos;, id=id], where=[=(status, _UTF-16LE&apos;OK&apos;:VARCHAR(2147483647))&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;], select=[id, name, txn_time, id], upsertKey=[[0]])&lt;/span&gt;
   :     +- Calc(select=[id, name, PROCTIME() AS txn_time])
   :        +- TableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])
   +- Calc(select=[id, name, txn_time, CAST(_UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647)) AS status])
&lt;span class=&quot;DiffChange&quot;&gt;      +- LookupJoin(table=[default_catalog.default_database.dim], joinType=[InnerJoin], lookup=[&lt;/span&gt;&lt;span class=&quot;DiffText&quot;&gt;status=_UTF-16LE&apos;NOT_EXISTS&apos;, id=id], where=[=(status, _UTF-16LE&apos;NOT_EXISTS&apos;:VARCHAR(2147483647))&lt;/span&gt;&lt;span class=&quot;DiffChange&quot;&gt;], select=[id, name, txn_time, id], upsertKey=[[0]])&lt;/span&gt;
         +- Calc(select=[id, name, PROCTIME() AS txn_time])
            +- TableSourceScan(table=[[default_catalog, default_database, stream]], fields=[id, name])

&lt;/pre&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;This transformation also looks fine. But let’s pay closer attention to understand what is going on here.&lt;/p&gt;

&lt;p&gt;This step converts the join into a lookup join and adds additional information, e.g., the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;joinType&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;select&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lookup&lt;/code&gt; definitions. It adds &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;id&lt;/code&gt; as a lookup key in both plans. However, in the &lt;strong&gt;values&lt;/strong&gt; connector plan (on the right) the lookup definition also includes the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status&lt;/code&gt; column as a lookup key, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;where&lt;/code&gt; definition carries the filter condition.&lt;/p&gt;

&lt;p&gt;The JDBC connector plan (on the left), by contrast, includes neither the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;where&lt;/code&gt; definition nor &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lookup&lt;/code&gt; definition. This shouldn’t be a problem for JDBC or any other source that supports filter pushdown, since the filter condition (here the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status&lt;/code&gt; column comparison) is pushed down into the table scan.&lt;/p&gt;

&lt;p&gt;But still, it would be helpful to include the filter pushdowns in the lookup join definitions in query explanations.&lt;/p&gt;
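
&lt;p&gt;As a rough illustration of why the two plan shapes should be equivalent, here is a toy model in plain Scala. It is not Flink code: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;StreamRow&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dim&lt;/code&gt; and both lookup functions are hypothetical names, and the stream row is made up. One function filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status&lt;/code&gt; during the lookup, the other looks up by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;id&lt;/code&gt; in a scan that was already filtered.&lt;/p&gt;

&lt;div class=&quot;language-scala highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Toy model, not Flink code: two ways to apply the status filter in a lookup join.
final case class StreamRow(id: Long, name: String)
final case class Dim(id: Long, status: String)

object LookupJoinSketch {
  // Same dimension rows as in the test data of this post.
  val dim: Seq[Dim] = Seq(Dim(1L, &quot;OK&quot;), Dim(2L, &quot;OK&quot;))

  // Shape of the values plan: status is checked as part of the lookup.
  def lookupWithStatusKey(s: StreamRow, status: String): Seq[Dim] =
    dim.filter(_.id == s.id).filter(_.status == status)

  // Shape of the JDBC plan: the scan is filtered first, the lookup uses id only.
  def lookupOnFilteredScan(s: StreamRow, status: String): Seq[Dim] = {
    val filteredScan = dim.filter(_.status == status) // stands in for the pushed-down filter
    filteredScan.filter(_.id == s.id)
  }

  def main(args: Array[String]): Unit = {
    val row = StreamRow(1L, &quot;Alice&quot;) // hypothetical stream row
    def show(status: String): Unit =
      println(s&quot;$status: ${lookupWithStatusKey(row, status)} vs ${lookupOnFilteredScan(row, status)}&quot;)
    Seq(&quot;OK&quot;, &quot;NOT_EXISTS&quot;).foreach(show)
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OK&lt;/code&gt; both functions return the row with id 1, and for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT_EXISTS&lt;/code&gt; both return nothing. The equivalence only holds as long as each lookup really runs against its own filtered scan.&lt;/p&gt;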

&lt;div class=&quot;markdown-alert markdown-alert-warning&quot; role=&quot;alert&quot;&gt;
    &lt;p class=&quot;markdown-alert-title&quot; dir=&quot;auto&quot;&gt;
        &lt;svg class=&quot;octicon octicon-alert mr-2&quot; viewBox=&quot;0 0 16 16&quot; version=&quot;1.1&quot; width=&quot;16&quot; height=&quot;16&quot; aria-hidden=&quot;true&quot;&gt;
            &lt;path d=&quot;M6.457 1.047c.659-1.234 2.427-1.234 3.086 0l6.082 11.378A1.75 1.75 0 0 1 14.082 15H1.918a1.75 1.75 0 0 1-1.543-2.575Zm1.763.707a.25.25 0 0 0-.44 0L1.698 13.132a.25.25 0 0 0 .22.368h12.164a.25.25 0 0 0 .22-.368Zm.53 3.996v2.5a.75.75 0 0 1-1.5 0v-2.5a.75.75 0 0 1 1.5 0ZM9 11a1 1 0 1 1-2 0 1 1 0 0 1 2 0Z&quot;&gt;&lt;/path&gt;
        &lt;/svg&gt;
        Warning
    &lt;/p&gt;
    &lt;div&gt;&lt;em&gt;This is also foreshadowing for the source of the bug.&lt;/em&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;At this point, I had gone through several debugging sessions and discussions with my colleagues, but I still could not identify the root cause of the bug.&lt;/p&gt;

&lt;p&gt;I suspected that some other optimization rule was interfering with the filter pushdowns, e.g., discarding them. So I tried to create a minimal test case that applies only the relevant physical optimization rules. You can find the example file &lt;a href=&quot;&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While debugging and reading other test cases, I noticed that the &lt;strong&gt;values&lt;/strong&gt; connector can simulate filter pushdown simply by adding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;filterable-fields&lt;/code&gt; property to the table definition.&lt;/p&gt;

&lt;p&gt;So I updated the dimension table create statement to:&lt;/p&gt;

&lt;div class=&quot;language-scala highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;dimTableId&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;TestValuesTableFactory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;registerData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Seq&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;OK&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;OK&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;dimTableDDL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;
     | CREATE TABLE `dim` (
     |   `id` BIGINT,
     |   `status` STRING,
     |   PRIMARY KEY (`id`) NOT ENFORCED
     | ) WITH (
     |   &apos;connector&apos; = &apos;values&apos;,
     |   &apos;filterable-fields&apos; = &apos;id;status&apos;,
     |   &apos;data-id&apos; = &apos;$dimTableId&apos;
     | )
     |&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;stripMargin&lt;/span&gt;

&lt;span class=&quot;nv&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;executeSql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dimTableDDL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We now have a reproducible test case that does not depend on an external database. Okay, good, but we still have no clue about the root cause of the bug.&lt;/p&gt;

&lt;p&gt;Continuing to debug, I added a breakpoint at &lt;a href=&quot;https://github.com/apache/flink/blob/0ce8cb1f2bd13c7cfca8b777972db0cbaa99af46/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/rules/physical/stream/StreamPhysicalUnionRule.scala#L43&quot;&gt;StreamPhysicalUnionRule#L43&lt;/a&gt; and noticed that the logical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FlinkLogicalJoin&lt;/code&gt; relations for both inputs are the same instance (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@13242&lt;/code&gt;, id 873, in the debugger view below).&lt;/p&gt;

&lt;ul class=&quot;tree&quot;&gt;
  &lt;li&gt;
    &lt;div class=&quot;row&quot;&gt;
      &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
      &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
      &lt;span class=&quot;field-name&quot;&gt;this&lt;/span&gt; =
      &lt;span class=&quot;type&quot;&gt;StreamPhysicalUnionRule&lt;/span&gt;
      &lt;span class=&quot;value&quot;&gt;@13203&lt;/span&gt;
    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;div class=&quot;row&quot;&gt;
      &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
      &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
      &lt;span class=&quot;field-name&quot;&gt;rel&lt;/span&gt; =
      &lt;span class=&quot;type&quot;&gt;FlinkLogicalUnion&lt;/span&gt;
      &lt;span class=&quot;value&quot;&gt;@13204&lt;/span&gt;
    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;div class=&quot;row highlighted&quot;&gt;
      &lt;span class=&quot;toggle&quot;&gt;▾&lt;/span&gt;
      &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/variablesTab.svg&quot; /&gt;
      &lt;span class=&quot;field-name&quot;&gt;union&lt;/span&gt; =
      &lt;span class=&quot;type&quot;&gt;FlinkLogicalUnion&lt;/span&gt;
      &lt;span class=&quot;value&quot;&gt;@13204&lt;/span&gt;
    &lt;/div&gt;
    &lt;ul&gt;
      &lt;li&gt;
        &lt;div class=&quot;row&quot;&gt;
          &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
          &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
          &lt;span class=&quot;field-name&quot;&gt;id&lt;/span&gt; =
          &lt;span class=&quot;id&quot;&gt;888&lt;/span&gt;
        &lt;/div&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;div class=&quot;row&quot;&gt;
          &lt;span class=&quot;toggle&quot;&gt;▾&lt;/span&gt;
          &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
          &lt;span class=&quot;field-name&quot;&gt;inputs&lt;/span&gt; =
          &lt;span class=&quot;type&quot;&gt;RegularImmutableList&lt;/span&gt;
          &lt;span class=&quot;value&quot;&gt;@13218&lt;/span&gt;
        &lt;/div&gt;
        &lt;ul&gt;
          &lt;li&gt;
            &lt;div class=&quot;row&quot;&gt;
              &lt;span class=&quot;toggle&quot;&gt;▾&lt;/span&gt;
              &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/variablesTab.svg&quot; /&gt;
              &lt;span class=&quot;field-name&quot;&gt;0&lt;/span&gt; =
              &lt;span class=&quot;type&quot;&gt;RelSubset&lt;/span&gt;
              &lt;span class=&quot;value&quot;&gt;@13227&lt;/span&gt;
            &lt;/div&gt;
            &lt;ul&gt;
              &lt;li&gt;
                &lt;div class=&quot;row&quot;&gt;
                  &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
                  &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                  &lt;span class=&quot;field-name&quot;&gt;id&lt;/span&gt; =
                  &lt;span class=&quot;id&quot;&gt;876&lt;/span&gt;
                &lt;/div&gt;
              &lt;/li&gt;
              &lt;li&gt;
                &lt;div class=&quot;row&quot;&gt;
                  &lt;span class=&quot;toggle&quot;&gt;▾&lt;/span&gt;
                  &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                  &lt;span class=&quot;field-name&quot;&gt;best&lt;/span&gt; =
                  &lt;span class=&quot;type&quot;&gt;FlinkLogicalCalc&lt;/span&gt;
                  &lt;span class=&quot;value&quot;&gt;@13232&lt;/span&gt;
                &lt;/div&gt;
                &lt;ul&gt;
                  &lt;li&gt;
                    &lt;div class=&quot;row&quot;&gt;
                      &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
                      &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                      &lt;span class=&quot;field-name&quot;&gt;id&lt;/span&gt; =
                      &lt;span class=&quot;id&quot;&gt;875&lt;/span&gt;
                    &lt;/div&gt;
                  &lt;/li&gt;
                  &lt;li&gt;
                    &lt;div class=&quot;row&quot;&gt;
                      &lt;span class=&quot;toggle&quot;&gt;▾&lt;/span&gt;
                      &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                      &lt;span class=&quot;field-name&quot;&gt;input&lt;/span&gt; =
                      &lt;span class=&quot;type&quot;&gt;RelSubset&lt;/span&gt;
                      &lt;span class=&quot;value&quot;&gt;@13237&lt;/span&gt;
                    &lt;/div&gt;
                    &lt;ul&gt;
                      &lt;li&gt;
                        &lt;div class=&quot;row&quot;&gt;
                          &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
                          &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                          &lt;span class=&quot;field-name&quot;&gt;id&lt;/span&gt; =
                          &lt;span class=&quot;id&quot;&gt;874&lt;/span&gt;
                        &lt;/div&gt;
                      &lt;/li&gt;
                      &lt;li class=&quot;highlighted&quot;&gt;
                        &lt;div class=&quot;row&quot;&gt;
                          &lt;span class=&quot;toggle&quot;&gt;▾&lt;/span&gt;
                          &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                          &lt;span class=&quot;field-name&quot;&gt;best&lt;/span&gt; =
                          &lt;span class=&quot;type&quot;&gt;FlinkLogicalJoin&lt;/span&gt;
                          &lt;span class=&quot;value&quot;&gt;@13242&lt;/span&gt;
                        &lt;/div&gt;
                        &lt;ul&gt;
                          &lt;li&gt;
                            &lt;div class=&quot;row&quot;&gt;
                              &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
                              &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                              &lt;span class=&quot;field-name&quot;&gt;id&lt;/span&gt; =
                              &lt;span class=&quot;id&quot;&gt;873&lt;/span&gt;
                            &lt;/div&gt;
                          &lt;/li&gt;
                          &lt;li&gt;
                            &lt;div class=&quot;row&quot;&gt;
                              &lt;span class=&quot;toggle&quot;&gt;▸&lt;/span&gt;
                              &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                              &lt;span class=&quot;field-name&quot;&gt;joinType&lt;/span&gt; =
                              &lt;span class=&quot;type&quot;&gt;JoinRelType&lt;/span&gt;
                              &lt;span class=&quot;value&quot;&gt;@13255&lt;/span&gt;
                            &lt;/div&gt;
                          &lt;/li&gt;
                          &lt;li&gt;
                            &lt;div class=&quot;row&quot;&gt;
                              &lt;span class=&quot;toggle&quot;&gt;▸&lt;/span&gt;
                              &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                              &lt;span class=&quot;field-name&quot;&gt;condition&lt;/span&gt; =
                              &lt;span class=&quot;type&quot;&gt;RexCall&lt;/span&gt;
                              &lt;span class=&quot;value&quot;&gt;@13256&lt;/span&gt;
                            &lt;/div&gt;
                          &lt;/li&gt;
                          &lt;li&gt;
                            &lt;div class=&quot;row&quot;&gt;
                              &lt;span class=&quot;toggle&quot;&gt;▸&lt;/span&gt;
                              &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                              &lt;span class=&quot;field-name&quot;&gt;joinInfo&lt;/span&gt; =
                              &lt;span class=&quot;type&quot;&gt;JoinInfo&lt;/span&gt;
                              &lt;span class=&quot;value&quot;&gt;@13258&lt;/span&gt;
                            &lt;/div&gt;
                          &lt;/li&gt;
                          &lt;li&gt;
                            &lt;div class=&quot;row&quot;&gt;
                              &lt;span class=&quot;toggle&quot;&gt;▸&lt;/span&gt;
                              &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                              &lt;span class=&quot;field-name&quot;&gt;left&lt;/span&gt; =
                              &lt;span class=&quot;type&quot;&gt;RelSubset&lt;/span&gt;
                              &lt;span class=&quot;value&quot;&gt;@13259&lt;/span&gt;
                            &lt;/div&gt;
                          &lt;/li&gt;
                          &lt;li&gt;
                            &lt;div class=&quot;row&quot;&gt;
                              &lt;span class=&quot;toggle&quot;&gt;▸&lt;/span&gt;
                              &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                              &lt;span class=&quot;field-name&quot;&gt;right&lt;/span&gt; =
                              &lt;span class=&quot;type&quot;&gt;RelSubset&lt;/span&gt;
                              &lt;span class=&quot;value&quot;&gt;@13260&lt;/span&gt;
                            &lt;/div&gt;
                          &lt;/li&gt;
                        &lt;/ul&gt;
                      &lt;/li&gt;
                    &lt;/ul&gt;
                  &lt;/li&gt;
                &lt;/ul&gt;
              &lt;/li&gt;
            &lt;/ul&gt;
          &lt;/li&gt;
          &lt;li&gt;
            &lt;div class=&quot;row&quot;&gt;
              &lt;span class=&quot;toggle&quot;&gt;▾&lt;/span&gt;
              &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/variablesTab.svg&quot; /&gt;
              &lt;span class=&quot;field-name&quot;&gt;1&lt;/span&gt; =
              &lt;span class=&quot;type&quot;&gt;RelSubset&lt;/span&gt;
              &lt;span class=&quot;value&quot;&gt;@13228&lt;/span&gt;
            &lt;/div&gt;
            &lt;ul&gt;
              &lt;li&gt;
                &lt;div class=&quot;row&quot;&gt;
                  &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
                  &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                  &lt;span class=&quot;field-name&quot;&gt;id&lt;/span&gt; =
                  &lt;span class=&quot;id&quot;&gt;886&lt;/span&gt;
                &lt;/div&gt;
              &lt;/li&gt;
              &lt;li&gt;
                &lt;div class=&quot;row&quot;&gt;
                  &lt;span class=&quot;toggle&quot;&gt;▾&lt;/span&gt;
                  &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                  &lt;span class=&quot;field-name&quot;&gt;best&lt;/span&gt; =
                  &lt;span class=&quot;type&quot;&gt;FlinkLogicalCalc&lt;/span&gt;
                  &lt;span class=&quot;value&quot;&gt;@13248&lt;/span&gt;
                &lt;/div&gt;
                &lt;ul&gt;
                  &lt;li&gt;
                    &lt;div class=&quot;row&quot;&gt;
                      &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
                      &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                      &lt;span class=&quot;field-name&quot;&gt;id&lt;/span&gt; =
                      &lt;span class=&quot;id&quot;&gt;885&lt;/span&gt;
                    &lt;/div&gt;
                  &lt;/li&gt;
                  &lt;li&gt;
                    &lt;div class=&quot;row&quot;&gt;
                      &lt;span class=&quot;toggle&quot;&gt;▾&lt;/span&gt;
                      &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                      &lt;span class=&quot;field-name&quot;&gt;input&lt;/span&gt; =
                      &lt;span class=&quot;type&quot;&gt;RelSubset&lt;/span&gt;
                      &lt;span class=&quot;value&quot;&gt;@13237&lt;/span&gt;
                    &lt;/div&gt;
                    &lt;ul&gt;
                      &lt;li&gt;
                        &lt;div class=&quot;row&quot;&gt;
                          &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
                          &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                          &lt;span class=&quot;field-name&quot;&gt;id&lt;/span&gt; =
                          &lt;span class=&quot;id&quot;&gt;874&lt;/span&gt;
                        &lt;/div&gt;
                      &lt;/li&gt;
                      &lt;li class=&quot;highlighted&quot;&gt;
                        &lt;div class=&quot;row&quot;&gt;
                          &lt;span class=&quot;toggle&quot;&gt;▾&lt;/span&gt;
                          &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                          &lt;span class=&quot;field-name&quot;&gt;best&lt;/span&gt; =
                          &lt;span class=&quot;type&quot;&gt;FlinkLogicalJoin&lt;/span&gt;
                          &lt;span class=&quot;value&quot;&gt;@13242&lt;/span&gt;
                        &lt;/div&gt;
                        &lt;ul&gt;
                          &lt;li&gt;&lt;div class=&quot;row&quot;&gt;
                            &lt;span class=&quot;toggle&quot;&gt;&lt;/span&gt;
                            &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                            &lt;span class=&quot;field-name&quot;&gt;id&lt;/span&gt; =
                            &lt;span class=&quot;id&quot;&gt;873&lt;/span&gt;
                          &lt;/div&gt;&lt;/li&gt;
                          &lt;li&gt;&lt;div class=&quot;row&quot;&gt;
                            &lt;span class=&quot;toggle&quot;&gt;▸&lt;/span&gt;
                            &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                            &lt;span class=&quot;field-name&quot;&gt;joinType&lt;/span&gt; =
                            &lt;span class=&quot;type&quot;&gt;JoinRelType&lt;/span&gt;
                            &lt;span class=&quot;value&quot;&gt;@13255&lt;/span&gt;
                          &lt;/div&gt;&lt;/li&gt;
                          &lt;li&gt;&lt;div class=&quot;row&quot;&gt;
                            &lt;span class=&quot;toggle&quot;&gt;▸&lt;/span&gt;
                            &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                            &lt;span class=&quot;field-name&quot;&gt;condition&lt;/span&gt; =
                            &lt;span class=&quot;type&quot;&gt;RexCall&lt;/span&gt;
                            &lt;span class=&quot;value&quot;&gt;@13256&lt;/span&gt;
                          &lt;/div&gt;&lt;/li&gt;
                          &lt;li&gt;&lt;div class=&quot;row&quot;&gt;
                            &lt;span class=&quot;toggle&quot;&gt;▸&lt;/span&gt;
                            &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                            &lt;span class=&quot;field-name&quot;&gt;joinInfo&lt;/span&gt; =
                            &lt;span class=&quot;type&quot;&gt;JoinInfo&lt;/span&gt;
                            &lt;span class=&quot;value&quot;&gt;@13258&lt;/span&gt;
                          &lt;/div&gt;&lt;/li&gt;
                          &lt;li&gt;&lt;div class=&quot;row&quot;&gt;
                            &lt;span class=&quot;toggle&quot;&gt;▸&lt;/span&gt;
                            &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                            &lt;span class=&quot;field-name&quot;&gt;left&lt;/span&gt; =
                            &lt;span class=&quot;type&quot;&gt;RelSubset&lt;/span&gt;
                            &lt;span class=&quot;value&quot;&gt;@13259&lt;/span&gt;
                          &lt;/div&gt;&lt;/li&gt;
                          &lt;li&gt;&lt;div class=&quot;row&quot;&gt;
                            &lt;span class=&quot;toggle&quot;&gt;▸&lt;/span&gt;
                            &lt;img class=&quot;icon&quot; src=&quot;/assets/icons/field.svg&quot; /&gt;
                            &lt;span class=&quot;field-name&quot;&gt;right&lt;/span&gt; =
                            &lt;span class=&quot;type&quot;&gt;RelSubset&lt;/span&gt;
                            &lt;span class=&quot;value&quot;&gt;@13260&lt;/span&gt;
                          &lt;/div&gt;&lt;/li&gt;
                        &lt;/ul&gt;
                      &lt;/li&gt;
                    &lt;/ul&gt;
                  &lt;/li&gt;
                &lt;/ul&gt;
              &lt;/li&gt;
            &lt;/ul&gt;
          &lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
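
&lt;p&gt;The debugger view above can also be cross-checked mechanically. The following is a minimal sketch in plain Scala, not planner code: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Node&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ids&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sharedIds&lt;/code&gt; are hypothetical helpers, and only the path from each union input down to the join is modelled, using the ids shown in the tree.&lt;/p&gt;

&lt;div class=&quot;language-scala highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Toy model of the debugger tree, not Calcite or Flink code.
final case class Node(id: Int, kind: String, inputs: Seq[Node] = Nil)

object SharedSubtreeSketch {
  // Ids of a node and of everything reachable below it.
  def ids(n: Node): Set[Int] = Set(n.id) ++ n.inputs.flatMap(ids)

  // Ids that occur under both nodes, in ascending order.
  def sharedIds(a: Node, b: Node): Seq[Int] = ids(a).intersect(ids(b)).toSeq.sorted

  // Ids copied from the debugger view; everything below the join is left out.
  val join: Node = Node(873, &quot;FlinkLogicalJoin&quot;)
  val joinSubset: Node = Node(874, &quot;RelSubset&quot;, Seq(join))
  val input0: Node = Node(876, &quot;RelSubset&quot;, Seq(Node(875, &quot;FlinkLogicalCalc&quot;, Seq(joinSubset))))
  val input1: Node = Node(886, &quot;RelSubset&quot;, Seq(Node(885, &quot;FlinkLogicalCalc&quot;, Seq(joinSubset))))

  def main(args: Array[String]): Unit =
    println(sharedIds(input0, input1).mkString(&quot;, &quot;))
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running it prints &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;873, 874&lt;/code&gt;: the two inputs have different Calc nodes on top, but everything from the join subset downwards is shared.&lt;/p&gt;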

&lt;p&gt;&lt;br /&gt;
Somehow the best (cheapest) plan that the SQL optimizer finds for the union has both inputs pointing at the same logical join.&lt;/p&gt;

&lt;p&gt;Indeed, we can confirm this by printing the logical union relation just before it is converted to the physical union, using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;System.out.println(union.explain())&lt;/code&gt;:&lt;/p&gt;

&lt;table&gt;
&lt;tr class=&quot;diff&quot;&gt;
&lt;td&gt;
&lt;pre&gt;
Union: FlinkLogicalUnion(all=[true])
  FlinkLogicalCalc(subset=[rel#876:RelSubset#19.LOGICAL.any.None: 0.[NONE].[NONE]], select=[id, name, CAST(&apos;OK&apos; AS VARCHAR(2147483647)) AS status])
    FlinkLogicalJoin(subset=[rel#874:RelSubset#18.LOGICAL.any.None: 0.[NONE].[NONE]], condition=[=($0, $2)], joinType=[inner])
      FlinkLogicalTableSourceScan(subset=[rel#867:RelSubset#14.LOGICAL.any.None: 0.[NONE].[NONE]], table=[[default_catalog, default_database, stream]], fields=[id, name])
      FlinkLogicalSnapshot(subset=[rel#872:RelSubset#17.LOGICAL.any.None: 0.[NONE].[NONE]], period=[$cor0.txn_time])
        FlinkLogicalCalc(subset=[rel#870:RelSubset#16.LOGICAL.any.None: 0.[NONE].[NONE]], select=[id])
&lt;span class=&quot;DiffChange&quot;&gt;          FlinkLogicalTableSourceScan(subset=[rel#868:RelSubset#15.LOGICAL.any.None: 0.[NONE].[NONE]], table=[[default_catalog, default_database, dim, filter=[=(status, _UTF-16LE&apos;OK&apos;:VARCHAR(2147483647) CHARACTER SET &quot;UTF-16LE&quot;)]]], fields=[id, status])&lt;/span&gt;
  FlinkLogicalCalc(subset=[rel#886:RelSubset#24.LOGICAL.any.None: 0.[NONE].[NONE]], select=[id, name, CAST(&apos;NOT_EXISTS&apos; AS VARCHAR(2147483647)) AS status])
    FlinkLogicalJoin(subset=[rel#874:RelSubset#18.LOGICAL.any.None: 0.[NONE].[NONE]], condition=[=($0, $2)], joinType=[inner])
      FlinkLogicalTableSourceScan(subset=[rel#867:RelSubset#14.LOGICAL.any.None: 0.[NONE].[NONE]], table=[[default_catalog, default_database, stream]], fields=[id, name])
      FlinkLogicalSnapshot(subset=[rel#872:RelSubset#17.LOGICAL.any.None: 0.[NONE].[NONE]], period=[$cor0.txn_time])
        FlinkLogicalCalc(subset=[rel#870:RelSubset#16.LOGICAL.any.None: 0.[NONE].[NONE]], select=[id])
&lt;span class=&quot;DiffChange&quot;&gt;          FlinkLogicalTableSourceScan(subset=[rel#868:RelSubset#15.LOGICAL.any.None: 0.[NONE].[NONE]], table=[[default_catalog, default_database, dim, filter=[=(status, _UTF-16LE&apos;OK&apos;:VARCHAR(2147483647) CHARACTER SET &quot;UTF-16LE&quot;)]]], fields=[id, status])&lt;/span&gt;
&lt;/pre&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;We can see that the &lt;em&gt;FlinkLogicalJoin&lt;/em&gt; nodes of both inputs are exactly the same, all the way down to the table scan. Both branches keep only the rows whose &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status&lt;/code&gt; column equals the string &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OK&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is a good finding!&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;At this point, I spent extra time debugging the LogicalJoin optimization rules, but it didn’t lead to anything.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;markdown-alert markdown-alert-tip&quot; role=&quot;alert&quot;&gt;
    &lt;p class=&quot;markdown-alert-title&quot; dir=&quot;auto&quot;&gt;
        &lt;svg class=&quot;octicon octicon-light-bulb mr-2&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot; height=&quot;16&quot; aria-hidden=&quot;true&quot;&gt;
            &lt;path d=&quot;M8 1.5c-2.363 0-4 1.69-4 3.75 0 .984.424 1.625.984 2.304l.214.253c.223.264.47.556.673.848.284.411.537.896.621 1.49a.75.75 0 0 1-1.484.211c-.04-.282-.163-.547-.37-.847a8.456 8.456 0 0 0-.542-.68c-.084-.1-.173-.205-.268-.32C3.201 7.75 2.5 6.766 2.5 5.25 2.5 2.31 4.863 0 8 0s5.5 2.31 5.5 5.25c0 1.516-.701 2.5-1.328 3.259-.095.115-.184.22-.268.319-.207.245-.383.453-.541.681-.208.3-.33.565-.37.847a.751.751 0 0 1-1.485-.212c.084-.593.337-1.078.621-1.489.203-.292.45-.584.673-.848.075-.088.147-.173.213-.253.561-.679.985-1.32.985-2.304 0-2.06-1.637-3.75-4-3.75ZM5.75 12h4.5a.75.75 0 0 1 0 1.5h-4.5a.75.75 0 0 1 0-1.5ZM6 15.25a.75.75 0 0 1 .75-.75h2.5a.75.75 0 0 1 0 1.5h-2.5a.75.75 0 0 1-.75-.75Z&quot;&gt;&lt;/path&gt;
        &lt;/svg&gt;
        Tip
    &lt;/p&gt;
    &lt;div&gt;&lt;em&gt;Now is also a good time to update our tests and add a unit test for the case when the filter conditions are reversed. That is, what happens if we filter for the non-existing&lt;/em&gt; &lt;b&gt;status&lt;/b&gt; &lt;em&gt;values in the first part of the union query?&lt;/em&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;But if both logical joins filter data matching the status &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;OK&apos;&lt;/code&gt;, why do we get the following result with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT_EXISTS&lt;/code&gt; status rows?&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1, Alice, 2024-11-27 11:52:19.332, OK
2, Bob,   2024-11-27 11:52:19.332, OK
1, Alice, 2024-11-27 11:52:19.333, NOT_EXISTS
2, Bob,   2024-11-27 11:52:19.333, NOT_EXISTS
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Shouldn’t all four rows have status as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OK&lt;/code&gt;?&lt;/p&gt;

&lt;h3 id=&quot;code-generation&quot;&gt;Code Generation&lt;/h3&gt;</content>

      
      
      
      
      

      

      

      
        <category term="apache-flink" />
      
        <category term="flink" />
      
        <category term="java" />
      

      
        <summary type="html">Recently I picked up the FLINK-36808 bug in the Apache Flink project. The solution to the bug covers topics such as SQL LookupJoin, the Flink SQL Planner, and the Volcano optimizer (which is provided by the Apache Calcite project).</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Prospero Challenge: JIT Codegen with LLVM and AsmJit</title>
      
      
      <link href="/2025/08/27/prospero-challenge/" rel="alternate" type="text/html" title="Prospero Challenge: JIT Codegen with LLVM and AsmJit" />
      
      <published>2025-08-27T00:00:00+00:00</published>
      <updated>2025-08-27T00:00:00+00:00</updated>
      <id>/2025/08/27/prospero-challenge</id>
      <content type="html" xml:base="/2025/08/27/prospero-challenge/">&lt;p&gt;I wanted to explore and dive deep into Just-In-Time (JIT) compilation and code generation. To get some hands-on experience, I was looking for a fun project and came across the &lt;a href=&quot;https://www.mattkeeter.com/projects/prospero/&quot;&gt;Prospero Challenge&lt;/a&gt; from &lt;a href=&quot;https://www.mattkeeter.com&quot;&gt;Matt Keeter&lt;/a&gt;, which was a perfect fit.&lt;/p&gt;

&lt;p&gt;The source code for my implementation is available at &lt;a href=&quot;https://github.com/morazow/prospero-rust&quot;&gt;morazow/prospero-rust&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The challenge is simple: you are given a list of ~8k mathematical expressions that must be evaluated for each pixel &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(x, y)&lt;/code&gt; in an image (e.g., of size 1024x1024). The pixel is then colored black or white based on the sign of the final result.&lt;/p&gt;

&lt;p&gt;The input is a series of operations that build up the expression:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Text of a monologue from The Tempest
_0 const 2.95
_1 var-x
_2 const 8.13008
_3 mul _1 _2
_4 add _0 _3
_5 const 3.675
# ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When rendered, these expressions produce the following image:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/prospero-challenge/prospero.png&quot; alt=&quot;Prospero Output&quot; width=&quot;400&quot; height=&quot;400&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can read more about the challenge and see solutions from other developers on the &lt;a href=&quot;https://www.mattkeeter.com/projects/prospero/&quot;&gt;Prospero site&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;my-approach&quot;&gt;My Approach&lt;/h2&gt;

&lt;p&gt;Since this was a learning project for me rather than a pure optimization challenge, I took a multi-stage approach, starting with the simplest implementation and progressively adding different JIT backends.&lt;/p&gt;

&lt;h3 id=&quot;a-baseline-virtual-machine-vm&quot;&gt;A Baseline Virtual Machine (VM)&lt;/h3&gt;

&lt;p&gt;I started by implementing a Virtual Machine (VM). This approach is straightforward and served as a baseline for verifying the correctness of the results.&lt;/p&gt;
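
&lt;p&gt;To give a feel for how small such a baseline can be, here is a minimal sketch of an interpreter for the input format shown earlier. It is written in Java for illustration and is not the code from the Rust repository. The class name, the method names, and the assumption that slots are numbered sequentially from zero are all hypothetical. It handles only the opcodes visible in the excerpt, plus &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;var-y&lt;/code&gt;, which is assumed by analogy with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;var-x&lt;/code&gt;.&lt;/p&gt;

```java
public class TinyProsperoVm {
    // Evaluates the program for a single pixel (x, y) and returns the value
    // of the last slot that was written. Assumes slots are named _0, _1, ...
    static double eval(String program, double x, double y) {
        String[] lines = program.split("\n");
        double[] slots = new double[lines.length];
        double last = 0.0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] parts = trimmed.split("\\s+");
            int out = slot(parts[0]);
            switch (parts[1]) {
                case "const":
                    last = Double.parseDouble(parts[2]);
                    break;
                case "var-x":
                    last = x;
                    break;
                case "var-y": // assumed by analogy with var-x
                    last = y;
                    break;
                case "add":
                    last = slots[slot(parts[2])] + slots[slot(parts[3])];
                    break;
                case "mul":
                    last = slots[slot(parts[2])] * slots[slot(parts[3])];
                    break;
                default:
                    throw new IllegalArgumentException("unsupported opcode: " + parts[1]);
            }
            slots[out] = last;
        }
        return last;
    }

    // Turns a slot name such as "_4" into the array index 4.
    static int slot(String name) {
        return Integer.parseInt(name.substring(1));
    }

    public static void main(String[] args) {
        String program = String.join("\n",
                "# first instructions of the excerpt",
                "_0 const 2.95",
                "_1 var-x",
                "_2 const 8.13008",
                "_3 mul _1 _2",
                "_4 add _0 _3");
        System.out.println(eval(program, 0.0, 0.0)); // prints 2.95
    }
}
```

&lt;p&gt;A renderer would call such a function once per pixel and color the pixel by the sign of the returned value, which is exactly the repeated work that the JIT backends try to speed up.&lt;/p&gt;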

&lt;h3 id=&quot;jit-with-llvm&quot;&gt;JIT with LLVM&lt;/h3&gt;

&lt;p&gt;For my first Just-In-Time (JIT) compilation backend, I chose the powerful and popular &lt;a href=&quot;https://llvm.org&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LLVM&lt;/code&gt;&lt;/a&gt; framework. It is a portable, cross-platform ecosystem with mature optimization passes. LLVM also provides a rich set of intrinsic functions (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;llvm.sqrt.f64&lt;/code&gt; for square roots), which simplifies code generation.&lt;/p&gt;

&lt;p&gt;To connect the Rust frontend with LLVM, I used the &lt;a href=&quot;https://crates.io/crates/inkwell&quot;&gt;inkwell&lt;/a&gt; crate, which helped generate LLVM Intermediate Representation (IR) from the project’s Abstract Syntax Tree (AST).&lt;/p&gt;

&lt;h3 id=&quot;low-level-jit-with-asmjit&quot;&gt;Low-Level JIT with AsmJit&lt;/h3&gt;

&lt;p&gt;Next, I wanted to experiment with &lt;a href=&quot;https://asmjit.com&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AsmJit&lt;/code&gt;&lt;/a&gt;. It’s a lightweight C++ library designed for low-latency machine code generation, supporting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X86&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X86_64&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AArch64&lt;/code&gt; architectures.&lt;/p&gt;

&lt;p&gt;This part of the project required integrating C++ code with Rust. I built a &lt;a href=&quot;https://doc.rust-lang.org/rust-by-example/std_misc/ffi.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Foreign Function Interface&lt;/code&gt;&lt;/a&gt; (FFI) bridge that allowed the AsmJit compiler to consume the Rust AST and generate native code.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This was a fun exercise for learning the fundamentals of just-in-time compilation and progressing from a high-level VM to low-level machine code generation.&lt;/p&gt;

&lt;p&gt;A special thanks to Matt Keeter for creating such an inspiring challenge!&lt;/p&gt;

&lt;h2 id=&quot;future-work&quot;&gt;Future Work&lt;/h2&gt;

&lt;p&gt;While I’m happy with the result, there are always more things to explore:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;AST-level optimizations:&lt;/strong&gt; Implementing duplicate expression elimination before code generation.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;AsmJit enhancements:&lt;/strong&gt; Adding a constant pool for the AsmJit backend (e.g., using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.data&lt;/code&gt; section) to reduce code size and improve performance.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Parallelism:&lt;/strong&gt; Parallelizing the image rendering by processing tiles in multiple threads.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;A C++ version:&lt;/strong&gt; Re-implementing the project in C++ to avoid FFI in AsmJit and to use the LLVM C++ API directly.&lt;/li&gt;
&lt;/ul&gt;</content>

      
      
      
      
      

      

      

      
        <category term="jit" />
      
        <category term="codegen" />
      
        <category term="llvm" />
      
        <category term="asmjit" />
      

      
        <summary type="html">I wanted to explore and dive deep into Just-In-Time (JIT) compilation and code generation. To get some hands-on experience, I was looking for a fun project and came across the Prospero Challenge from Matt Keeter, which was a perfect fit.</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Finding Semicolons: Examples From 1BRC Submissions</title>
      
      
      <link href="/2024/02/11/finding-semicolons/" rel="alternate" type="text/html" title="Finding Semicolons: Examples From 1BRC Submissions" />
      
      <published>2024-02-11T00:00:00+00:00</published>
      <updated>2024-02-11T00:00:00+00:00</updated>
      <id>/2024/02/11/finding-semicolons</id>
      <content type="html" xml:base="/2024/02/11/finding-semicolons/">&lt;p&gt;&lt;a href=&quot;https://twitter.com/gunnarmorling&quot;&gt;Gunnar Morling&lt;/a&gt; launched the &lt;a href=&quot;https://github.com/gunnarmorling/1brc&quot;&gt;One Billion Row Challenge (1BRC)&lt;/a&gt; at the beginning of the year. The goal is to calculate temperature aggregates (min, mean, max) for each weather station. The data is one billion rows of measurements in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;string: station&amp;gt;;&amp;lt;double: temperature&amp;gt;&lt;/code&gt; format.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John&apos;s;15.2
Cracow;12.6
Bridgetown;26.9
Istanbul;6.2
Roseau;34.4
Conakry;31.2
Istanbul;23.0
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The station names are arbitrary-length strings, and the temperatures are double values formatted as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X.X&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XX.X&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The submitted solutions were golden; there were many interesting, optimized submissions. I enjoyed reading them and trying to understand how they work and why they are fast. If only they were easier to understand :-)&lt;/p&gt;

&lt;h2 id=&quot;parsing&quot;&gt;Parsing&lt;/h2&gt;

&lt;p&gt;One of the tasks is to parse the input data and separate the station name from its temperature value. For that, we have to find the location of the semicolon &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;;&apos;&lt;/code&gt; in each input line!&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;byte&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;SEMICOLON&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;&apos;;&apos;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;findSemicolonPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;SEMICOLON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;blockquote&gt;
  &lt;p&gt;Linear scan to find the semicolon position in a byte array&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Apparently, the simple task of finding a byte in a given string can be heavily optimized.&lt;/p&gt;

&lt;p&gt;In this post I am going to look into two options that were used in &lt;strong&gt;1BRC&lt;/strong&gt; submissions: the first is the “SIMD Within A Register (SWAR)” technique, and the second is the &lt;a href=&quot;https://openjdk.org/jeps/460&quot;&gt;Java Vector API&lt;/a&gt;. Both techniques process multiple data elements with a single instruction.&lt;/p&gt;

&lt;h2 id=&quot;swar&quot;&gt;SWAR&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/SWAR&quot;&gt;SWAR&lt;/a&gt; is a technique for processing multiple bytes at once by taking advantage of &lt;em&gt;64-bit&lt;/em&gt; CPU architectures.&lt;/p&gt;

&lt;p&gt;The idea is to load multiple bytes into a single 64-bit register and then perform bitwise operations to find the index of the matching byte. On &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;little-endian&lt;/code&gt; machines, we want the index of the first matching byte from the right end of the register, since on these machines the first byte in memory ends up in the least significant (rightmost) byte of the loaded word.&lt;/p&gt;
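
&lt;p&gt;To make the byte order concrete, here is a small sketch that loads eight ASCII bytes into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;long&lt;/code&gt; in little-endian order using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java.nio.ByteBuffer&lt;/code&gt;. It is not taken from any submission; the class name, the method name, and the sample line are made up for illustration.&lt;/p&gt;

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class LittleEndianLoad {
    // Reads the first eight bytes of the text as one little-endian 64-bit word.
    // The text must contain at least eight ASCII characters.
    static long loadWord(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        // 'O' = 0x4f is the first byte in memory and lands in the lowest byte.
        long word = loadWord("Oslo;9.5");
        System.out.println(Long.toHexString(word)); // prints 352e393b6f6c734f
    }
}
```

&lt;p&gt;Reading the hex output from right to left gives back the original byte order: the semicolon &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x3b&lt;/code&gt; at string index 4 is the fifth byte counted from the right end of the word.&lt;/p&gt;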

&lt;p&gt;Essentially, we are looking for the following function, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;indexOfFirstMatched(word, pattern)&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;                                      /
                                     /  0, word = XXXXXXXXXXXXXXOO
                                    |   1, word = XXXXXXXXXXXXOONN
                                    |   2, word = XXXXXXXXXXOONNNN
                                   /    3, word = XXXXXXXXOONNNNNN
indexOfFirstMatched(word, 0xOO) = &amp;lt;     4, word = XXXXXXOONNNNNNNN
                                   \    5, word = XXXXOONNNNNNNNNN
                                    |   6, word = XXOONNNNNNNNNNNN
                                    |   7, word = OONNNNNNNNNNNNNN
                                     \  8, word = NNNNNNNNNNNNNNNN // return byte length
                                      \                            // if no match is found
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OO&lt;/code&gt; denotes the matching byte, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NN&lt;/code&gt; denotes a non-matching byte, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XX&lt;/code&gt; denotes a byte that may or may not match. If no match is found, the function returns the length of the word in bytes, which is 8.&lt;/p&gt;
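
&lt;p&gt;The table above can also be written down as a plain byte-by-byte loop. The following is a hypothetical reference implementation (the class name is an assumption, and this is not how the submissions do it) that is slow but easy to reason about, which makes it handy for checking the faster SWAR versions against.&lt;/p&gt;

```java
public class IndexOfFirstMatched {
    // Returns the index of the first byte, counted from the right end of the
    // word, that equals the pattern, or 8 when no byte matches.
    static int indexOfFirstMatched(long word, byte pattern) {
        for (int i = 0; i != Long.BYTES; i++) {
            // The narrowing cast keeps only the lowest eight bits after the shift.
            byte current = (byte) (word >>> (8 * i));
            if (current == pattern) {
                return i;
            }
        }
        return Long.BYTES;
    }

    public static void main(String[] args) {
        byte semicolon = (byte) ';';
        // Little-endian word for the ASCII text "Oslo;9.5".
        System.out.println(indexOfFirstMatched(0x352E393B6F6C734FL, semicolon)); // prints 4
        // A word without any semicolon yields the byte length.
        System.out.println(indexOfFirstMatched(0x4141414141414141L, semicolon)); // prints 8
    }
}
```

&lt;p&gt;The SWAR variants discussed next compute the same result without a loop or a per-byte branch.&lt;/p&gt;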

&lt;p&gt;This technique is a perfect fit for finding the semicolon while parsing each line of the 1BRC input. Most of the submissions used the technique from &lt;a href=&quot;https://twitter.com/richardstartin&quot;&gt;Richard Startin&lt;/a&gt;’s &lt;a href=&quot;https://richardstartin.github.io/posts/finding-bytes.html&quot;&gt;“Finding Bytes”&lt;/a&gt; blog post.&lt;/p&gt;

&lt;p&gt;For example, &lt;a href=&quot;https://twitter.com/thomaswue&quot;&gt;Thomas Würthinger&lt;/a&gt;’s early &lt;a href=&quot;https://github.com/thomaswue/1brc/blob/b3b88515475bc71f4b11564e62ebdf24120a8088/src/main/java/dev/morling/onebrc/CalculateAverage_thomaswue.java#L224-L229&quot;&gt;submission&lt;/a&gt; (slightly modified by me):&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;findDelimiter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;^&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x3B3B3B3B3B3B3B3B&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x7F7F7F7F7F7F7F7F&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x7F7F7F7F7F7F7F7F&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;~(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x7F7F7F7F7F7F7F7F&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;numberOfTrailingZeros&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;match&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;However, there is an improved version of the above approach that works on little-endian architectures. An example is &lt;a href=&quot;https://twitter.com/royvanrijn&quot;&gt;Roy van Rijn&lt;/a&gt;’s early &lt;a href=&quot;https://github.com/gunnarmorling/1brc/blob/5570f1b60a557baf9ec6af412f8d5bd75fc44891/src/main/java/dev/morling/onebrc/CalculateAverage_royvanrijn.java#L178-L184&quot;&gt;submission&lt;/a&gt; (slightly modified by me):&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;firstAnyPattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;^&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x3B3B3B3B3B3B3B3B&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mask&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x0101010101010101&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x8080808080808080&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;numberOfTrailingZeros&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mask&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Both of these methods find the index of the first byte in the 64-bit long word that matches the semicolon &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;;&apos;&lt;/code&gt; pattern encoded as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x3B3B3B3B3B3B3B3BL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To learn more about why this works, please read &lt;a href=&quot;https://richardstartin.github.io/posts/finding-bytes&quot;&gt;Richard Startin’s explanation&lt;/a&gt; and check the visuals from &lt;a href=&quot;http://0x80.pl/notesen/2023-03-06-swar-find-any.html&quot;&gt;Wojciech Muła’s “SWAR find any byte from set”&lt;/a&gt; post.&lt;/p&gt;

&lt;h3 id=&quot;finding-0-byte&quot;&gt;Finding 0-Byte&lt;/h3&gt;

&lt;p&gt;These optimizations are used in many applications that need to find the index of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;\0&apos;&lt;/code&gt; character in a string. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strlen&lt;/code&gt; function in C, instead of checking each byte separately, can load multiple bytes as a word and check for the matching byte pattern.&lt;/p&gt;

&lt;p&gt;An early version of the SWAR algorithm was presented by &lt;a href=&quot;https://lamport.azurewebsites.net/pubs/pubs.html&quot;&gt;Leslie Lamport&lt;/a&gt; in his paper titled &lt;a href=&quot;https://lamport.azurewebsites.net/pubs/pubs.html#multiple-byte&quot;&gt;“Multiple byte processing with full-word instructions”&lt;/a&gt; in 1975.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;It’s a neat hack, and it’s more useful now than it was then for two reasons.  The obvious reason is that word size is larger now, with many computers having 64-bit words.  The less obvious reason is that conditional operations are implemented with masking rather than branching.  Instead of branching around the operation when the condition is not met, masks are constructed so the operation is performed only on those data items for which the condition is true.  Branching is more costly on modern multi-issue computers than it was on the computers of the 70s.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;a href=&quot;https://en.wikipedia.org/wiki/Hacker%27s_Delight&quot;&gt;Hacker’s Delight&lt;/a&gt; book, in Chapter 6 &lt;em&gt;“Find First 0-Byte”&lt;/em&gt;, describes both of the above approaches using 32-bit words. The first method is attributed to Leslie Lamport because he uses similar tricks in his paper. The second method was proposed by &lt;a href=&quot;https://www.cl.cam.ac.uk/~am21/&quot;&gt;Alan Mycroft&lt;/a&gt; on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comp.arch&lt;/code&gt; newsgroup in 1987.&lt;/p&gt;

&lt;p&gt;Let’s try to understand Mycroft’s version.&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;findFirstSemicolon&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;^&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x3B3B3B3B3B3B3B3B&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mask&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x0101010101010101&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x8080808080808080&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;numberOfTrailingZeros&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mask&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x3B&lt;/code&gt; is our pattern, it is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;;&apos;&lt;/code&gt; in ASCII&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;word ^ 0x3B&lt;/code&gt; will set all matching &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;;&apos;&lt;/code&gt; bytes in word to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(x - 0x01)&lt;/code&gt; converts all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00&lt;/code&gt; bytes to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0xFF&lt;/code&gt;, sets the highest bit of a byte to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;
    &lt;ul&gt;
      &lt;li&gt;It converts the set of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{ 0x00, 0x81, 0x82, 0x83, ..., 0xFF }&lt;/code&gt; bytes into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{ 0xFF, 0x80, 0x81, 0x82, ..., 0xFE }&lt;/code&gt; set.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~x&lt;/code&gt; converts all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00&lt;/code&gt; bytes to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0xFF&lt;/code&gt;, sets the highest bit of a byte to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;
    &lt;ul&gt;
      &lt;li&gt;Similarly, this instruction converts the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{ 0x00, 0x01, 0x02, 0x03, ..., 0x7F }&lt;/code&gt; byte set into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{ 0xFF, 0xFE, 0xFD, 0xFC, ..., 0x80 }&lt;/code&gt; set.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(x - 0x01) &amp;amp; ~x&lt;/code&gt; keeps the highest bit of a byte set only if the byte is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00&lt;/code&gt;
    &lt;ul&gt;
      &lt;li&gt;Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00&lt;/code&gt; is the only byte that appears in both input sets above, applying bitwise &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AND&lt;/code&gt; to the two results keeps the highest bit set only for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00&lt;/code&gt; byte.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;amp; 0x80&lt;/code&gt; then zeros all bits except the highest bit of each byte&lt;/li&gt;
  &lt;li&gt;By counting the number of trailing zero bits and dividing it by 8, we find the index of the first &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00&lt;/code&gt; byte&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a nice explanation from the &lt;a href=&quot;https://bits.stephan-brumme.com/null.html&quot;&gt;“Detects zero bytes inside a 32 bit integer”&lt;/a&gt; article.&lt;/p&gt;
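
&lt;p&gt;The steps above can be put together in a minimal sketch for a single 64-bit word. The class name, the constants, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;toWord&lt;/code&gt; helper are assumptions made for illustration and are not taken from any submission. The sketch also assumes that the word is read in little-endian order, so that byte 0 is the first byte of the input.&lt;/p&gt;

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class SwarSemicolon {
    // 0x3B (';') repeated in every byte of a 64-bit word
    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL;
    // 0x01 and 0x80 repeated in every byte
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    // Returns the index (0..7) of the first ';' byte in the word,
    // or 8 if the word contains no ';' at all.
    static int firstSemicolon(long word) {
        long x = word ^ SEMICOLONS;                 // matching bytes become 0x00
        long match = (x - ONES) & ~x & HIGH_BITS;   // high bit marks the first 0x00 byte
        return Long.numberOfTrailingZeros(match) >>> 3; // bit index to byte index
    }

    // Hypothetical helper: packs the first 8 bytes of an ASCII string
    // into a little-endian long.
    static long toWord(String eightChars) {
        byte[] bytes = eightChars.getBytes(StandardCharsets.US_ASCII);
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        System.out.println(firstSemicolon(toWord("Oslo;3.2"))); // 4
        System.out.println(firstSemicolon(toWord("Helsinki"))); // 8
    }
}
```

&lt;p&gt;A borrow from the subtraction can flag bytes that sit above a real match, but it never flags a byte below it, which is why counting trailing zeros is enough to locate the first semicolon.&lt;/p&gt;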

&lt;h2 id=&quot;java-vector-api&quot;&gt;Java Vector API&lt;/h2&gt;

&lt;p&gt;The second optimization is to use the &lt;strong&gt;Java Vector API&lt;/strong&gt; to find the index of the first matching pattern in a byte array.&lt;/p&gt;

&lt;p&gt;The Java Vector API is an incubating feature in Java 21 (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdk.incubator.vector&lt;/code&gt; module) that enables developers to take advantage of &lt;a href=&quot;https://en.wikipedia.org/wiki/Single_instruction,_multiple_data&quot;&gt;Single Instruction Multiple Data (SIMD)&lt;/a&gt; processors. Instead of relying on the JIT compiler&apos;s auto-vectorization, with the Vector API you express the vector computation explicitly, and the compiler maps it to SIMD instructions.&lt;/p&gt;

&lt;p&gt;To see how to use the Java Vector API, let us calculate the norm of two arrays. The scalar implementation looks as follows:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// Assuming arrays are of the same length&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;scalarNormComputation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.0f&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And a possible vectorized implementation:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;VectorSpecies&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;SPECIES&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;FloatVector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;SPECIES_PREFERRED&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;vectorNormComputation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upperBound&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;SPECIES&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;loopBound&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upperBound&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;SPECIES&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;va&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;FloatVector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromArray&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;SPECIES&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;FloatVector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromArray&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;SPECIES&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;va&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;mul&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;va&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;mul&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;neg&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;vc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;intoArray&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.0f&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;At the end, a scalar loop performs the calculation for the leftover elements that do not fit into the vectorized loop.&lt;/p&gt;
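
&lt;p&gt;To make the split between the two loops concrete, here is a small sketch of the arithmetic behind &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SPECIES.loopBound(a.length)&lt;/code&gt; for non-negative lengths. It does not use the Vector API itself. The class name, the array length, and the lane count of 8 are assumed values chosen for illustration, since the real lane count depends on the hardware.&lt;/p&gt;

```java
final class LoopBoundSketch {
    // Mirrors what VectorSpecies.loopBound(length) computes for a
    // non-negative length: the largest multiple of the lane count
    // that is not greater than length.
    // `lanes` stands in for SPECIES.length() and is an assumed value.
    static int loopBound(int length, int lanes) {
        return length - (length % lanes);
    }

    public static void main(String[] args) {
        int length = 21; // hypothetical array length
        int lanes = 8;   // e.g. 8 floats in a 256-bit vector
        int upperBound = loopBound(length, lanes);
        System.out.println("vectorized: [0, " + upperBound + ")");
        System.out.println("scalar tail: [" + upperBound + ", " + length + ")");
    }
}
```

&lt;p&gt;With these values the vectorized loop covers indices 0 to 15 in two iterations, and the scalar loop handles the remaining five elements.&lt;/p&gt;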

&lt;p&gt;Similar to the norm computation, we can use the Vector API to compare multiple bytes against the semicolon &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;;&lt;/code&gt; at once with a single instruction.&lt;/p&gt;

&lt;p&gt;An example from &lt;a href=&quot;https://twitter.com/melgenek&quot;&gt;Yevhenii Melnyk&lt;/a&gt;&apos;s &lt;a href=&quot;https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_melgenek.java#L162-L178&quot;&gt;submission&lt;/a&gt; (slightly modified by me):&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;byte&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;SEMICOLON&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;&apos;;&apos;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;VectorSpecies&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Byte&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;BYTE_SPECIES&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ByteVector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;SPECIES_PREFERRED&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;BYTE_SPECIES_BYTE_SIZE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;BYTE_SPECIES&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;vectorByteSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Vector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Byte&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;SEMICOLON_VECTOR&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;BYTE_SPECIES&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;broadcast&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;SEMICOLON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Finds the position of first semicolon in the given byte array&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;findDelimiter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;BufferedFile&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;startPos&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;startPos&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vectorLoopBound&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;startPos&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;BYTE_SPECIES&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;loopBound&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;bufferLimit&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;startPos&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vectorLoopBound&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;BYTE_SPECIES_BYTE_SIZE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vector&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ByteVector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromArray&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;BYTE_SPECIES&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;buffer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comparisonResult&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;compare&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;VectorOperators&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;EQ&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;SEMICOLON_VECTOR&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;comparisonResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;anyTrue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comparisonResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;firstTrue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;buffer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;SEMICOLON&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I have also run these methods through &lt;a href=&quot;https://github.com/openjdk/jmh&quot;&gt;JMH&lt;/a&gt; benchmarks. The GitHub repository with the code and benchmarks is &lt;a href=&quot;https://github.com/morazow/java-simd-benchmarks&quot;&gt;morazow/java-simd-benchmarks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The benchmark evaluates the average running time (lower is faster) of each method on randomly generated data with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;100K&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;100M&lt;/code&gt; measurements.&lt;/p&gt;

&lt;table style=&quot;table-layout: fixed;&quot;&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img alt=&quot;Evaluation of finding semicolons in 100K measurements data.&quot; src=&quot;/files/semicolon/semicolon-evaluations-100K.png&quot; /&gt;&lt;/th&gt;
&lt;th&gt;&lt;img alt=&quot;Evaluation of finding semicolons in 100M measurements data.&quot; src=&quot;/files/semicolon/semicolon-evaluations-100M.png&quot; /&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Benchmark                                         (filename)  Mode  Cnt       Score       Error  Units
FindingSemicolonBenchmark.linearScan   measurements-100K.txt  avgt   10     831,308 ±     5,847  us/op
FindingSemicolonBenchmark.swarLamport  measurements-100K.txt  avgt   10     837,476 ±     6,740  us/op
FindingSemicolonBenchmark.swarMycroft  measurements-100K.txt  avgt   10     755,864 ±     6,198  us/op
FindingSemicolonBenchmark.vectorAPI    measurements-100K.txt  avgt   10      41,236 ±     0,827  us/op
FindingSemicolonBenchmark.linearScan   measurements-100M.txt  avgt   10  850736,385 ± 13573,874  us/op
FindingSemicolonBenchmark.swarLamport  measurements-100M.txt  avgt   10  847787,746 ± 24557,818  us/op
FindingSemicolonBenchmark.swarMycroft  measurements-100M.txt  avgt   10  767232,577 ± 19555,154  us/op
FindingSemicolonBenchmark.vectorAPI    measurements-100M.txt  avgt   10   42925,814 ±  1993,941  us/op

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;blockquote&gt;
  &lt;p&gt;The benchmark was run using OpenJDK 21 Temurin, on an Apple M2 Pro with 12 (8 performance and 4 efficiency) CPU cores and 16 GB of memory.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The linear scan having a running time similar to the first SWAR method could be explained by the branch predictor learning the benchmark data. The Java Vector API implementation is the fastest, roughly 20 times faster than the linear scan on both data sets.&lt;/p&gt;

&lt;p&gt;In this post we only looked at a couple of the optimizations. The participants used many other interesting techniques. The challenge was fun and a great learning experience.&lt;/p&gt;

&lt;p&gt;Thanks a lot Gunnar for your efforts and time organizing it!&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://richardstartin.github.io/posts/finding-bytes.html&quot;&gt;Finding Bytes in Arrays&lt;/a&gt; by &lt;em&gt;Richard Startin&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://0x80.pl/notesen/2023-03-06-swar-find-any.html&quot;&gt;SWAR find any byte from set&lt;/a&gt; by &lt;em&gt;Wojciech Muła&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://stackoverflow.com/questions/76401479/speed-up-strlen-using-swar-in-x86-64-assembly&quot;&gt;Speed up strlen using SWAR in x86-64 Assembly&lt;/a&gt; &lt;em&gt;StackOverflow Question&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://graphics.stanford.edu/~seander/bithacks.html&quot;&gt;Bithacks: Determine if a word has a zero byte&lt;/a&gt; by &lt;em&gt;Sean Eron Anderson&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lamport.azurewebsites.net/pubs/pubs.html#multiple-byte&quot;&gt;Multiple Byte Processing with Full-Word Instructions&lt;/a&gt; by &lt;em&gt;Leslie Lamport&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://bits.stephan-brumme.com/null.html&quot;&gt;Detects zero bytes inside a 32 bit integer&lt;/a&gt; by &lt;em&gt;Stephan Brumme&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
      
      
      
      

      

      

      
        <category term="java" />
      
        <category term="swar" />
      
        <category term="simd" />
      
        <category term="java-vector-api" />
      

      
        <summary type="html">Gunnar Morling launched One Billion Row Challenge (1BRC) in the beginning of the year. The goal is to calculate temperature aggregates (min, max, sum) of weather stations. The data is one billion rows of measurements in &amp;lt;string: station&amp;gt;;&amp;lt;double: temperature&amp;gt; format.</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Understanding Java Memory Model</title>
      
      
      <link href="/2023/10/13/java-memory-model/" rel="alternate" type="text/html" title="Understanding Java Memory Model" />
      
      <published>2023-10-13T00:00:00+00:00</published>
      <updated>2023-10-13T00:00:00+00:00</updated>
      <id>/2023/10/13/java-memory-model</id>
      <content type="html" xml:base="/2023/10/13/java-memory-model/">&lt;p&gt;These are notes taken to better understand the &lt;a href=&quot;https://docs.oracle.com/javase/specs/jls/se21/html/jls-17.html#jls-17.4&quot;&gt;Java Memory Model&lt;/a&gt;, which I now publish as a blog post.&lt;/p&gt;

&lt;p&gt;Programming language memory models, such as the Java Memory Model, attempt to define the behavior of multi-threaded programs. These specifications help to reason about code execution in a concurrent environment, even when the code runs on different hardware architectures or undergoes numerous compiler optimizations.&lt;/p&gt;

&lt;p&gt;For example, given the following multi-threaded program:&lt;/p&gt;

&lt;table style=&quot;table-layout: fixed;&quot;&gt;
&lt;tr&gt;
&lt;th&gt;Thread I&lt;/th&gt;
&lt;th&gt;Thread II&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;done&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;done&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;cm&quot;&gt;/** **/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nc&quot;&gt;System&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Initially, all variables are set to zero, and each thread runs on its own processor. Can we reason about the output of the program?&lt;/p&gt;

&lt;p&gt;It appears that the output depends on the hardware and compiler optimizations. On x86 architecture, the assembly version of this code will always print &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;. However, on ARM architecture, it may print &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt;. Additionally, compiler optimizations can cause this program to either print zero or enter an infinite loop.&lt;/p&gt;

&lt;p&gt;As a programmer, it would be frustrating if programs don’t work on new hardware (mobile devices, cloud servers) or with new compilers. Thus, high-level programming language memory models are defined to provide a set of guarantees that programmers, while writing code in that language, can rely upon.&lt;/p&gt;

&lt;p&gt;But first, let’s understand what guarantees are provided by the hardware.&lt;/p&gt;

&lt;h2 id=&quot;hardware-guarantees&quot;&gt;Hardware Guarantees&lt;/h2&gt;

&lt;p&gt;Let us imagine we are writing assembly code for our multiprocessor computer. What kind of guarantees could we expect from the hardware?&lt;/p&gt;

&lt;h3 id=&quot;sequential-consistency&quot;&gt;Sequential Consistency&lt;/h3&gt;

&lt;p&gt;From &lt;a href=&quot;https://www.microsoft.com/en-us/research/uploads/prod/2016/12/How-to-Make-a-Multiprocessor-Computer-That-Correctly-Executes-Multiprocess-Programs.pdf&quot;&gt;Leslie Lamport’s 1979 paper&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The customary approach to designing and proving the correctness of
multiprocess algorithms for such a computer assumes that the following
condition is satisfied: the result of any execution is the same as if the
operations of all the processors were executed in some sequential order, and
the operations of each individual processor appear in this sequence in the
order specified by its program. A multiprocessor satisfying this condition
will be called sequentially consistent.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This definition is natural to a programmer. It states that operations will be executed in the order they appear in a written program, and threads will be interleaved in some order.&lt;/p&gt;

&lt;p&gt;For example, given the following program with two threads:&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Thread I&lt;/th&gt;
&lt;th&gt;Thread II&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;On sequentially consistent hardware, the operations of the two threads can interleave in the following six ways. The comments show the value each read observes.&lt;/p&gt;

&lt;table style=&quot;table-layout: fixed;&quot;&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 1&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 0&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 0&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 0&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;    &lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;As you can see, in the sequentially consistent model the execution result where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r1 = 1, r2 = 0&lt;/code&gt; is not possible: the six interleavings produce only three distinct outcomes, and this is not one of them.&lt;/p&gt;
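
&lt;p&gt;The enumeration above can also be checked mechanically. The following is a minimal sketch, assuming a toy model in which each thread is just a program counter and every operation is atomic. The class name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ScInterleavings&lt;/code&gt; and its methods are hypothetical names introduced only for this illustration.&lt;/p&gt;

```java
// Enumerates every sequentially consistent interleaving of
// Thread I (x = 1; y = 1) and Thread II (r1 = y; r2 = x).
class ScInterleavings {

    // seen[r1][r2] is true when at least one interleaving ends with that pair.
    static boolean[][] outcomes() {
        boolean[][] seen = new boolean[2][2];
        explore(0, 0, 0, 0, 0, 0, seen);
        return seen;
    }

    // pc1 and pc2 count how many operations each thread has already executed.
    static void explore(int pc1, int pc2, int x, int y, int r1, int r2, boolean[][] seen) {
        if (pc1 + pc2 == 4) {            // both threads have finished
            seen[r1][r2] = true;
            return;
        }
        if (pc1 == 0) explore(1, pc2, 1, y, r1, r2, seen);   // Thread I:  x = 1
        if (pc1 == 1) explore(2, pc2, x, 1, r1, r2, seen);   // Thread I:  y = 1
        if (pc2 == 0) explore(pc1, 1, x, y, y, r2, seen);    // Thread II: r1 = y
        if (pc2 == 1) explore(pc1, 2, x, y, r1, x, seen);    // Thread II: r2 = x
    }

    public static void main(String[] args) {
        boolean[][] seen = outcomes();
        for (int r1 = 0; r1 != 2; r1++) {
            for (int r2 = 0; r2 != 2; r2++) {
                String verdict = seen[r1][r2] ? "possible" : "impossible";
                System.out.println("r1 = " + r1 + ", r2 = " + r2 + ": " + verdict);
            }
        }
    }
}
```

&lt;p&gt;Under these assumptions the only pair reported as impossible is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r1 = 1, r2 = 0&lt;/code&gt;, which matches the table.&lt;/p&gt;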

&lt;p&gt;We can envision this model as one where all processors are directly linked to a single shared memory. There are no caches involved: each write or read operation goes straight to the memory.&lt;/p&gt;

&lt;div class=&quot;img-group&quot;&gt;
&lt;div class=&quot;&quot;&gt;
  &lt;img alt=&quot;Sequentially Consistent Hardware&quot; src=&quot;https://research.swtch.com/mem-sc.png&quot; /&gt;
&lt;/div&gt;
&lt;div class=&quot;caption&quot; style=&quot;text-align:left&quot;&gt;
  Sequentially Consistent Hardware Model. Image © &lt;a href=&quot;https://research.swtch.com/hwmm&quot;&gt;Russ Cox - Hardware Memory Models&lt;/a&gt;.
&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;As we’ll see, modern hardware designs often give up on strict sequential consistency for performance reasons.&lt;/p&gt;

&lt;h3 id=&quot;x86-total-store-order-tso&quot;&gt;x86 Total Store Order (TSO)&lt;/h3&gt;

&lt;p&gt;The modern x86 architecture memory model is based on the following hardware structure.&lt;/p&gt;

&lt;div class=&quot;img-group&quot;&gt;
&lt;div class=&quot;&quot;&gt;
  &lt;img alt=&quot;x86 Architecture.&quot; src=&quot;https://research.swtch.com/mem-tso.png&quot; /&gt;
&lt;/div&gt;
&lt;div class=&quot;caption&quot; style=&quot;text-align:left&quot;&gt;
  x86 Architecture. Image © &lt;a href=&quot;https://research.swtch.com/hwmm&quot;&gt;Russ Cox - Hardware Memory Models&lt;/a&gt;.
&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;In this model, each processor has a local write queue. Every write is first placed in that queue in first-in, first-out (FIFO) order before being written to the shared memory. Similarly, each read first checks the local queue and only then queries the shared memory. The local queue is flushed to the shared memory in FIFO order, ensuring that a processor’s writes reach memory in the same order in which the processor executed them.&lt;/p&gt;

&lt;p&gt;This results in &lt;strong&gt;total store order&lt;/strong&gt; (TSO). Once a write reaches the shared memory, every subsequent read sees it until it is overwritten, or until it is shadowed by a newer write to the same location that is still buffered in the reading processor’s local write queue.&lt;/p&gt;
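
&lt;p&gt;To make the write queue concrete, here is a toy simulation of the model described above. It is a sketch under stated assumptions, not how real x86 hardware is built: the class &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TsoModel&lt;/code&gt;, the fixed queue capacity of 8, and the explicit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flushOne&lt;/code&gt; call are all hypothetical and exist only to show the FIFO behaviour.&lt;/p&gt;

```java
// Toy model of x86-TSO: every processor owns a FIFO write queue in front of shared memory.
class TsoModel {
    static final int X = 0, Y = 1;
    final int[] memory = new int[2];      // shared memory; x and y start at 0

    class Cpu {
        final int[] addr = new int[8];    // queued writes, oldest at index head (capacity is an assumption)
        final int[] val = new int[8];
        int head = 0, tail = 0;

        void write(int a, int v) { addr[tail] = a; val[tail] = v; tail++; }

        // A read returns the newest queued write to the address, else the shared memory value.
        int read(int a) {
            for (int i = tail - 1; i != head - 1; i--) {
                if (addr[i] == a) return val[i];
            }
            return memory[a];
        }

        // Moves the oldest queued write into shared memory (FIFO order).
        void flushOne() {
            if (head != tail) { memory[addr[head]] = val[head]; head++; }
        }
    }

    // Each processor writes one variable, then reads the other before anything is flushed.
    static int[] storeBuffering() {
        TsoModel m = new TsoModel();
        Cpu p0 = m.new Cpu();
        Cpu p1 = m.new Cpu();
        p0.write(X, 1);
        p1.write(Y, 1);
        int r1 = p0.read(Y);   // p1's write to y is still queued
        int r2 = p1.read(X);   // p0's write to x is still queued
        return new int[] { r1, r2 };
    }

    // One processor writes x then y; only the oldest write has been flushed when the other reads.
    static int[] messagePassing() {
        TsoModel m = new TsoModel();
        Cpu p0 = m.new Cpu();
        Cpu p1 = m.new Cpu();
        p0.write(X, 1);
        p0.write(Y, 1);
        p0.flushOne();         // FIFO: x = 1 reaches memory before y = 1
        int r1 = p1.read(Y);
        int r2 = p1.read(X);
        return new int[] { r1, r2 };
    }

    public static void main(String[] args) {
        int[] sb = storeBuffering();
        int[] mp = messagePassing();
        System.out.println("store buffering: r1 = " + sb[0] + ", r2 = " + sb[1]);
        System.out.println("message passing: r1 = " + mp[0] + ", r2 = " + mp[1]);
    }
}
```

&lt;p&gt;In this sketch both reads in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;storeBuffering&lt;/code&gt; return 0 because neither queue has been flushed, while in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;messagePassing&lt;/code&gt; the write to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is visible before the write to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;, never the other way around.&lt;/p&gt;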

&lt;h3 id=&quot;arm-relaxed-memory-order&quot;&gt;ARM Relaxed Memory Order&lt;/h3&gt;

&lt;p&gt;ARM processors have weaker memory models.&lt;/p&gt;

&lt;div class=&quot;img-group&quot;&gt;
&lt;div class=&quot;&quot;&gt;
  &lt;img alt=&quot;ARM Architecture.&quot; src=&quot;https://research.swtch.com/mem-weak.png&quot; /&gt;
&lt;/div&gt;
&lt;div class=&quot;caption&quot; style=&quot;text-align:left&quot;&gt;
  ARM Architecture. Image © &lt;a href=&quot;https://research.swtch.com/hwmm&quot;&gt;Russ Cox - Hardware Memory Models&lt;/a&gt;.
&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;As depicted in the picture, each processor has its own copy of memory. Each write propagates independently to other processors and there is a possibility of write reordering. Additionally, a read can be delayed until it’s needed or until after a write.&lt;/p&gt;

&lt;p&gt;In the ARM hardware model, there is no total store order. It only guarantees a total order over writes to a single memory location (&lt;strong&gt;coherence&lt;/strong&gt;), which we will see later.&lt;/p&gt;

&lt;h3 id=&quot;data-race-free-sequential-consistency-drf-sc&quot;&gt;Data Race Free Sequential Consistency (DRF-SC)&lt;/h3&gt;

&lt;p&gt;In their 1990 paper called &lt;a href=&quot;https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.5567&quot;&gt;Weak Ordering — A New Definition&lt;/a&gt;, Sarita Adve and Mark Hill introduced a synchronization model known as the data-race-free (DRF) model.&lt;/p&gt;

&lt;p&gt;This model assumes that there are distinct hardware memory synchronization operations, and memory read or write operations can be rearranged between these synchronization operations. However, reads and writes must not be moved across the synchronization operations.&lt;/p&gt;

&lt;p&gt;A program is considered data-race-free if any two accesses to the same memory location from two different threads are either both read operations or are separated by a synchronization operation that forces one to happen before the other.&lt;/p&gt;

&lt;p&gt;This provides an agreement between hardware and software: given a data-race-free program, the hardware will execute it in a sequentially consistent manner. The above paper provides a proof that even weakly ordered hardware, such as ARM, will appear sequentially consistent to data-race-free programs. This guarantee is abbreviated as Data Race Free Sequential Consistency (&lt;strong&gt;DRF-SC&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;So far, we have been assuming that we are programming in an assembly language that is close to the hardware. Let’s now examine the guarantees offered by a high-level programming language, using Java’s memory model as the example.&lt;/p&gt;

&lt;h2 id=&quot;java-memory-model&quot;&gt;Java Memory Model&lt;/h2&gt;

&lt;p&gt;In the preceding section, we learned that if a programming language offers synchronization mechanisms to coordinate different threads, we can use these to create &lt;em&gt;data-race-free&lt;/em&gt; (DRF) programs. The operations of a data-race-free multi-threaded program may be interleaved arbitrarily, yet the outcome of the program can always be explained by some sequentially consistent execution, as if the operations were run on a single processor.&lt;/p&gt;

&lt;p&gt;Java offers various options for synchronization mechanisms to develop DRF programs. It’s crucial to understand that these are “synchronizing instructions” which establish a &lt;strong&gt;happens-before&lt;/strong&gt; relationship between code executed on one thread and code executed on another.&lt;/p&gt;

&lt;p&gt;The main Java synchronization operations are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The creation of a thread happens before the first action in the thread.&lt;/li&gt;
  &lt;li&gt;Unlock of mutex &lt;strong&gt;m&lt;/strong&gt; happens before any following lock of &lt;strong&gt;m&lt;/strong&gt;.&lt;/li&gt;
  &lt;li&gt;Write to volatile variable &lt;strong&gt;v&lt;/strong&gt; happens before any following read of &lt;strong&gt;v&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Of course, there are more synchronization operations in Java, but in this blog
we will focus on `volatile` since it demonstrates the main idea.
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&quot;happens-before&quot;&gt;Happens Before&lt;/h3&gt;

&lt;p&gt;The Java Memory Model (JMM) specifies which &lt;strong&gt;outcomes&lt;/strong&gt; are permitted by the Java language. The outcomes are the results of executions that contain different orderings of the operations of the entire program.&lt;/p&gt;

&lt;p&gt;We could arrange all operations, including lock, unlock, volatile write, and volatile read, in some interleaving order. Then, as an example, we could write to a volatile variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v&lt;/code&gt; and, later in the ordering, read &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v&lt;/code&gt;, which &lt;em&gt;observes that write&lt;/em&gt;. This creates a happens-before edge in that particular execution.&lt;/p&gt;

&lt;p&gt;These edges define whether an execution has a data race; if there is no data race, then the execution behaves in a sequentially consistent manner.&lt;/p&gt;

&lt;p&gt;This conforms to the definition of &lt;strong&gt;DRF-SC&lt;/strong&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Two events occurring on separate processors and not ordered by the
happens-before relationship may happen at the same moment; the exact order is
unclear. We refer to them as executing concurrently. A data race occurs when a
write to a variable executes concurrently with a read or another write of the
same variable.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Two important points are worth noting:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;strong&gt;happens-before&lt;/strong&gt; edges also synchronize the rest of the program, ordering ordinary operations across threads.&lt;/li&gt;
  &lt;li&gt;The &lt;strong&gt;happens-before&lt;/strong&gt; edges are not established by locking or unlocking different mutexes, or accessing different volatile variables.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;compiler-optimizations&quot;&gt;Compiler Optimizations&lt;/h3&gt;

&lt;p&gt;These JMM rules define which optimizations compilers are allowed to perform.&lt;/p&gt;

&lt;p&gt;Given the following program with two threads, which instruction reorderings are allowed?&lt;/p&gt;

&lt;div class=&quot;img-group&quot;&gt;
&lt;div class=&quot;&quot;&gt;
  &lt;img alt=&quot;Multi-threaded program with happens-before edges.&quot; src=&quot;/files/jmm/jmm-reorderings-hb-edges.jpg&quot; /&gt;
&lt;/div&gt;
&lt;div class=&quot;caption&quot; style=&quot;text-align:left&quot;&gt;
  Multi-threaded program with happens-before edges.
&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;These two orderings are allowed.&lt;/p&gt;

&lt;table style=&quot;table-layout: fixed;&quot;&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img alt=&quot;Happens-before allowed reordering, case one.&quot; src=&quot;/files/jmm/jmm-reorderings-hb-ordering01.jpg&quot; /&gt;&lt;/th&gt;
&lt;th&gt;&lt;img alt=&quot;Happens-before allowed reordering, case two.&quot; src=&quot;/files/jmm/jmm-reorderings-hb-ordering02.jpg&quot; /&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;The write to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; can be moved before the write to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;volatile v&lt;/code&gt;: this doesn’t break the happens-before (HB) edge, and the read of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; could already observe the write to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; through a race.&lt;/p&gt;

&lt;p&gt;Similarly, the read of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; can be moved after the read of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;volatile v&lt;/code&gt;, since it could already observe the result of the write to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; through a race.&lt;/p&gt;

&lt;p&gt;These two orderings are not allowed.&lt;/p&gt;

&lt;table style=&quot;table-layout: fixed;&quot;&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img alt=&quot;Happens-before forbidden reordering, case one.&quot; src=&quot;/files/jmm/jmm-reorderings-hb-ordering03.jpg&quot; /&gt;&lt;/th&gt;
&lt;th&gt;&lt;img alt=&quot;Happens-before forbidden reordering, case two.&quot; src=&quot;/files/jmm/jmm-reorderings-hb-ordering04.jpg&quot; /&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Both of these orderings break the &lt;strong&gt;happens-before&lt;/strong&gt; edge.&lt;/p&gt;

&lt;p&gt;In the first case, before moving the write to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;, implementations cannot know whether some read of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is required to observe it. Similarly, in the second case, implementations are unable to determine whether the moved read of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; should observe a preceding write to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;litmus-tests&quot;&gt;Litmus Tests&lt;/h2&gt;

&lt;p&gt;In this section, we are going to run several &lt;strong&gt;litmus tests&lt;/strong&gt; to evaluate how different models behave. Each test answers the question of whether a certain outcome is possible under a specific model.&lt;/p&gt;

&lt;p&gt;In these examples, we assume that each shared variable starts at zero and each thread runs on its own dedicated processor. Each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rN&lt;/code&gt; is a thread-local variable, and we check whether a given combination of thread-local results is possible at the end of execution.&lt;/p&gt;

&lt;h3 id=&quot;message-passing&quot;&gt;Message Passing&lt;/h3&gt;

&lt;p&gt;Can this program see the following result &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r1 = 1, r2 = 0&lt;/code&gt;?&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Thread I&lt;/th&gt;
&lt;th&gt;Thread II&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan=&quot;2&quot;&gt;
- On sequentially consistent hardware: no&lt;br /&gt;
- On x86-TSO: no&lt;br /&gt;
- On ARM: yes&lt;br /&gt;
- Java language (plain): yes&lt;br /&gt;
- Java language (volatile y): no&lt;br /&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;This outcome isn’t possible in a sequentially consistent hardware model. We can imagine that each processor is directly connected to the shared memory: there are no caches, registers, or write buffers. Thus, if we see the update to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;, then we must also see the earlier update to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Similar reasoning applies to the x86-TSO model. The write buffer of a single processor is flushed in FIFO order, ensuring that the update to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; becomes visible no later than the update to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Please be aware that we are writing these programs in an assembly-like language, where each instruction is executed by the processor. In the ARM model, these instructions can be reordered. The writes can become visible in a different order, which makes the outcome possible.&lt;/p&gt;

&lt;p&gt;For the Java language, using plain variables, the outcome mentioned above is possible. The optimizing compilers can reorder updates on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; variables.&lt;/p&gt;

&lt;p&gt;However, if we declare &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;volatile&lt;/code&gt;, then a write to the volatile variable and a read that observes it produce a happens-before edge. This edge precludes the optimizing Java compiler from moving the update to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; after the write to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;By making variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; volatile, the above outcome is not possible. Similar to x86-TSO, a write to a volatile variable makes all updates that precede it, including those to ordinary variables, visible in memory first. If the read of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; results in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;, then the read of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; must also result in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;.&lt;/p&gt;

&lt;h4 id=&quot;message-passing-test--happens-before-edges&quot;&gt;Message Passing Test — Happens Before Edges&lt;/h4&gt;

&lt;p&gt;We could easily visualize the happens-before (HB) edges in the possible executions of the message passing test.&lt;/p&gt;

&lt;table style=&quot;table-layout: fixed;&quot;&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img alt=&quot;Message passing test happens-before outcome one.&quot; src=&quot;/files/jmm/jmm-message-passing-hb-outcome01.jpg&quot; /&gt;&lt;/th&gt;
&lt;th&gt;&lt;img alt=&quot;Message passing test happens-before outcome two.&quot; src=&quot;/files/jmm/jmm-message-passing-hb-outcome02.jpg&quot; /&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;padding-right:30px&quot;&gt;&lt;em&gt;HB consistent, reads observe latest writes on the happens-before edge.&lt;/em&gt;&lt;/td&gt;
&lt;td style=&quot;vertical-align:top;&quot;&gt;&lt;em&gt;HB consistent, reads observe the initial values.&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img alt=&quot;Message passing test happens-before outcome three.&quot; src=&quot;/files/jmm/jmm-message-passing-hb-outcome03.jpg&quot; /&gt;&lt;/th&gt;
&lt;th&gt;&lt;img alt=&quot;Message passing test happens-before outcome four.&quot; src=&quot;/files/jmm/jmm-message-passing-hb-outcome04.jpg&quot; /&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;padding-right:30px&quot;&gt;
&lt;em&gt;
HB consistent, but it is a racy read. There is no happens-before edge between the write on x and the read on x, so the value is read via a race. HB allows observing &quot;unsynchronized&quot; writes via a race.
&lt;/em&gt;
&lt;/td&gt;
&lt;td style=&quot;vertical-align:top; padding-right:30px&quot;&gt;
&lt;em&gt;
HB &lt;b&gt;inconsistent&lt;/b&gt;. We cannot use this particular execution to reason about program outcomes. This outcome is impossible.
&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;As expected, the outcome &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r1 = 1, r2 = 0&lt;/code&gt; is not possible in Java when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;volatile&lt;/code&gt; variable.&lt;/p&gt;
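
&lt;p&gt;The message passing test with a volatile &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; can be written as a small Java program. This is only a sketch: the class name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MessagePassing&lt;/code&gt;, the helper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runOnce&lt;/code&gt;, and the iteration count are assumptions made for illustration. Repeated runs can only fail to find the forbidden outcome; they do not prove its absence. The guarantee comes from the happens-before rules, not from the experiment.&lt;/p&gt;

```java
// Message passing litmus test with y declared volatile.
class MessagePassing {
    static int x;            // plain variable carrying the data
    static volatile int y;   // volatile flag; an observed write forms the happens-before edge

    // Runs Thread I and Thread II once and returns {r1, r2}.
    static int[] runOnce() {
        x = 0;
        y = 0;
        int[] r = new int[2];
        Thread t1 = new Thread(() -> { x = 1; y = 1; });        // Thread I
        Thread t2 = new Thread(() -> { r[0] = y; r[1] = x; });  // Thread II
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return r;
    }

    public static void main(String[] args) {
        int forbidden = 0;
        for (int i = 0; i != 10000; i++) {
            int[] r = runOnce();
            // Count the outcome r1 = 1, r2 = 0 that the JMM rules out.
            if (r[0] == 1) {
                if (r[1] == 0) forbidden++;
            }
        }
        System.out.println("forbidden outcomes: " + forbidden);
    }
}
```

&lt;p&gt;Removing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;volatile&lt;/code&gt; modifier turns this into the plain-variable case, where the outcome is permitted, although a given JVM and machine may still never exhibit it.&lt;/p&gt;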

&lt;h3 id=&quot;store-buffering&quot;&gt;Store Buffering&lt;/h3&gt;

&lt;p&gt;Can this program see the following result &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r1 = 0, r2 = 0&lt;/code&gt;?&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Thread I&lt;/th&gt;
&lt;th&gt;Thread II&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan=&quot;2&quot;&gt;
- On sequentially consistent hardware: no&lt;br /&gt;
- On x86-TSO: yes&lt;br /&gt;
- On ARM: yes&lt;br /&gt;
- Java language (plain): yes&lt;br /&gt;
- Java language (volatile x &lt;b&gt;and&lt;/b&gt; y): no&lt;br /&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;On sequentially consistent hardware, this outcome isn’t possible: all operations execute in a single total order that respects each thread’s program order, so one of the two writes must come before both reads. However, on x86-TSO, the outcome is possible because the write buffer of a processor may not have been flushed to shared memory yet.&lt;/p&gt;

&lt;p&gt;Due to the reordering of instructions, the outcome is also possible on ARM and in Java (using plain variables).&lt;/p&gt;

&lt;p&gt;By making both variables &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; volatile, the above outcome is not possible. There exists a happens-before edge between the write and read operations on both volatile variables, so one of the writes should be the first to be executed. Remember that reordering of instructions is not allowed across synchronization operations.&lt;/p&gt;
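
&lt;p&gt;The volatile variant of the store buffering test can be sketched in Java as follows. The class name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;StoreBuffering&lt;/code&gt;, the helper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runOnce&lt;/code&gt;, and the number of iterations are hypothetical choices for this illustration. As with any stress-style experiment, an empty result does not prove the outcome is impossible; the happens-before reasoning above does.&lt;/p&gt;

```java
// Store buffering litmus test with both x and y declared volatile.
class StoreBuffering {
    static volatile int x;
    static volatile int y;

    // Runs Thread I and Thread II once and returns {r1, r2}.
    static int[] runOnce() {
        x = 0;
        y = 0;
        int[] r = new int[2];
        Thread t1 = new Thread(() -> { x = 1; r[0] = y; });  // Thread I
        Thread t2 = new Thread(() -> { y = 1; r[1] = x; });  // Thread II
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return r;
    }

    public static void main(String[] args) {
        int forbidden = 0;
        for (int i = 0; i != 10000; i++) {
            int[] r = runOnce();
            // r1 and r2 are each 0 or 1, so a sum of 0 means r1 = 0, r2 = 0.
            if (r[0] + r[1] == 0) forbidden++;
        }
        System.out.println("forbidden outcomes: " + forbidden);
    }
}
```

&lt;p&gt;Making only one of the two variables volatile is not enough here, which is why the table requires volatile &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;.&lt;/p&gt;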

&lt;h3 id=&quot;load-buffering&quot;&gt;Load Buffering&lt;/h3&gt;

&lt;p&gt;Can this program see the following result &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r1 = 1, r2 = 1&lt;/code&gt;?&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Thread I&lt;/th&gt;
&lt;th&gt;Thread II&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan=&quot;2&quot;&gt;
- On sequentially consistent hardware: no&lt;br /&gt;
- On x86-TSO: no&lt;br /&gt;
- On ARM: yes&lt;br /&gt;
- Java language (plain): yes&lt;br /&gt;
- Java language (volatile x or y): no&lt;br /&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;This outcome isn’t possible in the sequentially consistent and x86-TSO models because neither of them allows a read to be delayed past a later write from the same thread.&lt;/p&gt;

&lt;p&gt;On an ARM model and with Java using plain variables, the outcome is possible because reads can be delayed until after writes.&lt;/p&gt;

&lt;p&gt;In Java, once one of the variables is declared volatile, the above outcome is not possible. A write to a volatile variable that is observed by a read of that variable creates a happens-before edge, which prevents the compiler from reordering the instructions across it.&lt;/p&gt;

&lt;h3 id=&quot;coherence&quot;&gt;Coherence&lt;/h3&gt;

&lt;p&gt;Can this program see the following result &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r1 = 1, r2 = 2, r3 = 2, r4 = 1&lt;/code&gt;?&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Thread I&lt;/th&gt;
&lt;th&gt;Thread II&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;Thread III&lt;/th&gt;
&lt;th&gt;Thread IV&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;r3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan=&quot;2&quot;&gt;
- On sequentially consistent hardware: no&lt;br /&gt;
- On x86-TSO: no&lt;br /&gt;
- On ARM: no&lt;br /&gt;
- Java language (plain): yes&lt;br /&gt;
- Java language (volatile x): no&lt;br /&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Here is something that is not possible in the ARM model. This test checks whether two threads can observe updates to a single memory location in different orders.&lt;/p&gt;

&lt;p&gt;The threads agree on the total order of writes to a single memory location. One of the writes overwrites the other, and all the hardware models agree on the order.&lt;/p&gt;

&lt;p&gt;However, an optimizing compiler could reorder the two reads in thread III or thread IV. This makes the outcome possible in Java with plain variables, but not with volatile variables.&lt;/p&gt;

&lt;h3 id=&quot;independent-reads-of-independent-writes-iriw&quot;&gt;Independent Reads of Independent Writes (IRIW)&lt;/h3&gt;

&lt;p&gt;Can this program see the following result &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r1 = 1, r2 = 0, r3 = 1, r4 = 0&lt;/code&gt;?&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Thread I&lt;/th&gt;
&lt;th&gt;Thread II&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;Thread III&lt;/th&gt;
&lt;th&gt;Thread IV&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;r1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;td&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;r3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan=&quot;2&quot;&gt;
- On sequentially consistent hardware: no&lt;br /&gt;
- On x86-TSO: no&lt;br /&gt;
- On ARM: yes&lt;br /&gt;
- Java language (plain): yes&lt;br /&gt;
- Java language (volatile x &lt;b&gt;and&lt;/b&gt; y): no&lt;br /&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;This is similar to the coherence test, but with two distinct memory locations. Primarily, we’re verifying whether threads III and IV observe the updates on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; in different orders.&lt;/p&gt;

&lt;p&gt;On ARM, there is no total store order guarantee across writes to different locations. Similarly, the optimizing compiler could reorder the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r3&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r4&lt;/code&gt; reads, allowing a thread interleaving that produces the above outcome.&lt;/p&gt;

&lt;p&gt;When both variables are declared &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;volatile&lt;/code&gt;, the outcome isn’t possible, since the volatile accesses create happens-before edges that prevent the compiler from reordering the reads.&lt;/p&gt;

&lt;h3 id=&quot;jcstress-tests&quot;&gt;JCStress Tests&lt;/h3&gt;

&lt;p&gt;All of the litmus tests are validated using the &lt;a href=&quot;https://github.com/openjdk/jcstress&quot;&gt;JCStress&lt;/a&gt; framework. The tests are available in the GitHub repository &lt;a href=&quot;https://github.com/morazow/jmm-litmus-tests&quot;&gt;github.com/morazow/jmm-litmus-tests&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;locks&quot;&gt;Locks&lt;/h2&gt;

&lt;p&gt;Java locks also provide ordering: releasing a lock (lock exit) happens-before every subsequent acquisition (lock enter) of the same lock, which is similar to the behavior of a volatile write and read. In addition, they ensure &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mutual exclusion&lt;/code&gt;, preventing two threads from concurrently executing the locked or synchronized section.&lt;/p&gt;
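
&lt;p&gt;A minimal sketch of both properties, using a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SynchronizedCounter&lt;/code&gt; class that is not part of the litmus tests: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;synchronized&lt;/code&gt; blocks keep the two threads from interleaving inside an increment, and each lock release makes the updated count visible to the next thread that acquires the lock.&lt;/p&gt;

```java
// Hypothetical example class: a counter protected by a single lock.
final class SynchronizedCounter {
    private final Object lock = new Object();
    private int count; // only accessed while holding lock

    void increment() {
        synchronized (lock) {
            count++; // the read-modify-write is safe under mutual exclusion
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // Starts two threads that each increment the counter perThread times.
    static int runTwoThreads(final int perThread) {
        final SynchronizedCounter counter = new SynchronizedCounter();
        Runnable work = new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i != perThread; i++) {
                    counter.increment();
                }
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

&lt;p&gt;Calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SynchronizedCounter.runTwoThreads(100000)&lt;/code&gt; returns 200000. Without the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;synchronized&lt;/code&gt; blocks, updates could be lost.&lt;/p&gt;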

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;With this blog post I tried to summarize my understanding of the Java Memory Model.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;We began by learning about the guarantees offered by the hardware models.&lt;/li&gt;
  &lt;li&gt;We learned that by using proper synchronization mechanisms to ensure data-race-free implementations, the program outcomes can be explained as though they were executed in a sequentially consistent manner.&lt;/li&gt;
  &lt;li&gt;We looked into the Java Memory Model and learned how &lt;strong&gt;happens-before&lt;/strong&gt; edges are established.&lt;/li&gt;
  &lt;li&gt;We also ran several litmus tests to better understand how different models behave.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h2&gt;

&lt;p&gt;I could not have understood this topic without the help of the following resources.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://research.swtch.com/mm&quot;&gt;https://research.swtch.com/mm — Memory Models&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://shipilev.net/blog/2014/jmm-pragmatics/&quot;&gt;https://shipilev.net/blog/2014/jmm-pragmatics/ — JMM Pragmatics&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/&quot;&gt;https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/ — Close Encounters of JMM Kind&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;YT: &lt;a href=&quot;https://www.youtube.com/playlist?app=desktop&amp;amp;list=PLC5OGTO4dWxYC9Eh9RJYRSP85GKRoho3S&quot;&gt;Hydra Conference 2021, JCStress Workshop by Aleksey Shipilev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
      
      
      
      

      

      

      
        <category term="java" />
      
        <category term="memory-model" />
      

      
        <summary type="html">These are notes taken to better understand the Java Memory Model, which I now publish as a blog post.</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">E in REPL</title>
      
      
      <link href="/2018/07/29/e-in-eval/" rel="alternate" type="text/html" title="E in REPL" />
      
      <published>2018-07-29T00:00:00+00:00</published>
      <updated>2018-07-29T00:00:00+00:00</updated>
      <id>/2018/07/29/e-in-eval</id>
      <content type="html" xml:base="/2018/07/29/e-in-eval/">&lt;p class=&quot;meta&quot;&gt;06 August 2018 - Lagos, Portugal&lt;/p&gt;

&lt;h2 id=&quot;eval-in-repl&quot;&gt;Eval in REPL&lt;/h2&gt;

&lt;p&gt;There are many strategies for evaluating source code in interpreted
languages.&lt;/p&gt;

&lt;p&gt;The most common and easiest method to implement is a &lt;strong&gt;tree-walking&lt;/strong&gt; interpreter.
Interpreters working in this way just evaluate a provided &lt;strong&gt;Abstract Syntax
Tree&lt;/strong&gt; (&lt;a href=&quot;https://en.wikipedia.org/wiki/Abstract_syntax_tree&quot;&gt;AST&lt;/a&gt;). Often there are preceding steps that
optimize the AST, such as rewriting or transforming it so that it is more
suitable for repeated or recursive evaluation.&lt;/p&gt;
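
&lt;p&gt;To make the idea concrete, here is a minimal tree-walking evaluator for a tiny arithmetic language. The node types (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TwNum&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TwAdd&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TwMul&lt;/code&gt;) are invented for this sketch and do not come from any real interpreter.&lt;/p&gt;

```java
// Hypothetical AST for a tiny arithmetic language; every node evaluates itself.
interface TwExpr {
    int eval();
}

final class TwNum implements TwExpr {
    private final int value;

    TwNum(int value) {
        this.value = value;
    }

    public int eval() {
        return value; // a literal evaluates to itself
    }
}

final class TwAdd implements TwExpr {
    private final TwExpr left;
    private final TwExpr right;

    TwAdd(TwExpr left, TwExpr right) {
        this.left = left;
        this.right = right;
    }

    public int eval() {
        return left.eval() + right.eval(); // recurse into both children
    }
}

final class TwMul implements TwExpr {
    private final TwExpr left;
    private final TwExpr right;

    TwMul(TwExpr left, TwExpr right) {
        this.left = left;
        this.right = right;
    }

    public int eval() {
        return left.eval() * right.eval();
    }
}

final class TreeWalkDemo {
    static int run() {
        // AST for the expression (1 + 2) * 4
        TwExpr ast = new TwMul(new TwAdd(new TwNum(1), new TwNum(2)), new TwNum(4));
        return ast.eval();
    }
}
```

&lt;p&gt;Here &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TreeWalkDemo.run()&lt;/code&gt; returns 12. Evaluation is just a recursive walk over the tree, which is why this style is so simple to implement.&lt;/p&gt;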

&lt;p&gt;Other interpreters first convert the AST to &lt;a href=&quot;https://en.wikipedia.org/wiki/Bytecode&quot;&gt;bytecode&lt;/a&gt;. Bytecode is
composed of opcodes, which are similar to mnemonics of assembly language.
The bytecode is then emulated by a &lt;a href=&quot;https://en.wikipedia.org/wiki/Virtual_machine&quot;&gt;virtual machine&lt;/a&gt; that is
part of the interpreter. This approach can be more performant than
tree-walking evaluation.&lt;/p&gt;
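
&lt;p&gt;As a sketch of what emulating bytecode means, the loop below fetches one opcode at a time and dispatches on it. The opcode set is made up for this example, and the machine is stack-based for brevity. Real virtual machines are considerably more involved.&lt;/p&gt;

```java
// Hypothetical stack-based VM with an invented four-opcode instruction set.
final class TinyStackVm {
    static final int PUSH = 0; // followed by one operand: the value to push
    static final int ADD = 1;  // pops two values, pushes their sum
    static final int MUL = 2;  // pops two values, pushes their product
    static final int HALT = 3; // stops and returns the top of the stack

    static int run(int[] code) {
        int[] stack = new int[16]; // fixed size is enough for the demo
        int sp = 0;                // index of the next free stack slot
        int pc = 0;                // index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    static int demo() {
        // Bytecode for the expression (1 + 2) * 4
        int[] program = {PUSH, 1, PUSH, 2, ADD, PUSH, 4, MUL, HALT};
        return run(program);
    }
}
```

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TinyStackVm.demo()&lt;/code&gt; returns 12. The program is a flat array instead of a tree, so the evaluator is a tight loop with no recursion.&lt;/p&gt;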

&lt;p&gt;However, some interpreters do not build an AST at all. The parser directly emits
bytecode, and then it gets emulated by a virtual machine.&lt;/p&gt;

&lt;p&gt;Yet other language implementations parse source code, build an AST and convert
the AST to bytecode. But instead of emulating the opcodes specified by the bytecode, the
VM compiles them into native machine code right before executing them, that is, just in
time. Usually these are called &lt;a href=&quot;https://en.wikipedia.org/wiki/Just-in-time_compilation&quot;&gt;JIT&lt;/a&gt; (just in time) interpreters or
compilers.&lt;/p&gt;

&lt;p&gt;Some interpreters recursively traverse the AST but convert specific branches of
it into native code, then execute that branch just in time. A slight variation of
this is where a particular branch is compiled to native code only after
it has been traversed multiple times.&lt;/p&gt;

&lt;h2 id=&quot;evaluation-in-real-world-programming-language&quot;&gt;Evaluation in Real-World Programming Languages&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.ruby-lang.org/en/&quot;&gt;Ruby&lt;/a&gt; started as a tree-walking interpreter, executing the AST while
traversing it, until version &lt;strong&gt;1.9&lt;/strong&gt;. With version 1.9, they introduced a
virtual machine. Since then, the Ruby interpreter parses source code, builds an AST
and then compiles the AST into bytecode, which gets interpreted by the virtual
machine.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.lua.org/&quot;&gt;Lua&lt;/a&gt; started out as an interpreter that compiles to bytecode without
building an AST, and then the bytecode is executed in a &lt;strong&gt;register-based&lt;/strong&gt; virtual
machine. However, with the introduction of &lt;a href=&quot;https://github.com/LuaJIT/LuaJIT&quot;&gt;LuaJIT&lt;/a&gt;, the bytecode is
compiled to highly optimized machine code for several architectures.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In summary, it is a trade-off between performance and portability. If you want a
performant language, it is better to choose a bytecode VM that JIT compiles to
native code for different machine architectures. Tree-walking interpreters, on
the other hand, are less performant but more portable, since you do not have to
target different architectures and only need to evaluate an AST.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;p&gt;Much of this material was adapted from Thorsten Ball’s &lt;a href=&quot;https://interpreterbook.com/&quot;&gt;Writing An
Interpreter&lt;/a&gt; book. Additionally, &lt;a href=&quot;https://craftinginterpreters.com/&quot;&gt;Crafting Interpreters&lt;/a&gt; by
Robert Nystrom was a great help. I thank them both dearly!&lt;/p&gt;

      
      
      
      
      

      

      

      
        <category term="repl" />
      
        <category term="evaluation" />
      
        <category term="interpreter" />
      
        <category term="jit" />
      
        <category term="compiler" />
      

      
        <summary type="html">06 August 2018 - Lagos, Portugal</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Using MultiGroupBy with Scalding</title>
      
      
      <link href="/2014/11/14/scalding-multi-groupby/" rel="alternate" type="text/html" title="Using MultiGroupBy with Scalding" />
      
      <published>2014-11-14T00:00:00+00:00</published>
      <updated>2014-11-14T00:00:00+00:00</updated>
      <id>/2014/11/14/scalding-multi-groupby</id>
      <content type="html" xml:base="/2014/11/14/scalding-multi-groupby/">&lt;p class=&quot;meta&quot;&gt;18 November 2014 - Nuremberg&lt;/p&gt;

&lt;h2 id=&quot;tldr&quot;&gt;TL;DR&lt;/h2&gt;

&lt;p&gt;Scalding &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.groupBy&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.join&lt;/code&gt; operations can be combined into a single operation
using &lt;a href=&quot;https://github.com/LiveRamp/cascading_ext#multigroupby&quot;&gt;MultiGroupBy&lt;/a&gt;
from the &lt;a href=&quot;https://github.com/LiveRamp/cascading_ext&quot;&gt;Cascading extension&lt;/a&gt; project, which
improves the job performance. A Scalding job example using MultiGroupBy can be
found &lt;a href=&quot;https://github.com/morazow/ScaldingExamples/tree/master/src/main/scala/com/morazow/multigroupby&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Let’s imagine we have two data sources. The first contains the purchase
records of users per timestamp and per geographical state. It is formatted
as follows: &lt;span id=&quot;backcolor&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;user_id, timestamp, state,
purchases&amp;gt;&lt;/code&gt;&lt;/span&gt;. The second contains user demographic information.
For this particular example, it only contains the user age: &lt;span id=&quot;backcolor&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;user_id, age&amp;gt;&lt;/code&gt;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;The main goal of this map reduce job is to count the number of purchases per
state and per age group.&lt;/p&gt;

&lt;p&gt;In Scalding, we can implement this job as follows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MultiGroupByExample1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Job&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;Purchases&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;Tsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;purchasesPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;TIMESTAMP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;PURCHASE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;read&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;UserAges&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;Tsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;userAgesPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;AGE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;read&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;MyJob&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Purchases&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;COUNT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;joinWithSmaller&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;UserAges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;AGE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;](&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;COUNT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This is an elegant and concise solution; however, it is not very efficient.&lt;/p&gt;

&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;/h2&gt;

&lt;p&gt;In Scalding each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.groupBy&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.join&lt;/code&gt; operation introduces another map reduce
phase. That is, with the code above, the data will be shuffled, sorted and reduced
three times before the computation finishes. Therefore, when there is very
&lt;span id=&quot;backcolor&quot;&gt;big data&lt;/span&gt; to be processed, the overall job
will perform poorly.&lt;/p&gt;

&lt;p&gt;Luckily we can do better!&lt;/p&gt;

&lt;h2 id=&quot;multigroupby-operation&quot;&gt;MultiGroupBy Operation&lt;/h2&gt;

&lt;p&gt;The desired solution is to perform aggregation operations while joining two data
sources. Fortunately, it can be achieved using the &lt;strong&gt;MultiGroupBy&lt;/strong&gt; operation. In
the rest of this blog post I will show how to use MultiGroupBy in Scalding by
reducing the three steps of the job above into a single map reduce phase.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Recently I was reading tips for optimizing Cascading
flows (at https://nathanmarz.com/blog/tips-for-optimizing-cascading-flows.html)
and recalled the &lt;a href=&quot;https://github.com/LiveRamp/cascading_ext&quot;&gt;Cascading extensions&lt;/a&gt;
project, which I saw several months ago.  It offers additional operations on top
of Cascading. Here I will only show MultiGroupBy (maybe BloomJoin in some other
blog post). It is great!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The API of MultiGroupBy is defined in &lt;a href=&quot;https://github.com/LiveRamp/cascading_ext/blob/master/src/main/java/com/liveramp/cascading_ext/assembly/MultiGroupBy.java#L35-L55&quot; target=&quot;_blank&quot; data-proofer-ignore=&quot;&quot;&gt;MultiGroupBy.java#L35-L55&lt;/a&gt;. It accepts
two pipes, two field definitions as joining fields, the renamed join field(s) and
an aggregation operation. We will have to write a Cascading multi buffer operation in
Java, but it is worth the effort.&lt;/p&gt;

&lt;p&gt;The updated Scalding job looks as follows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.liveramp.cascading_ext.assembly.MultiGroupBy&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MultiGroupByExample2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Job&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;MyJob&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MultiGroupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;nc&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;UserAges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Purchases&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;nc&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Fields&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;USERID&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Fields&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;USERID&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Fields&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;USERID&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyMultiBufferOp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Fields&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;STATE&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;AGE&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;COUNT&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;discard&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Because MultiGroupBy performs a join operation, it keeps the join fields.
Therefore, on line 13 we just discard the &lt;em&gt;‘USERID&lt;/em&gt; column.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Please notice the smooth Scala/Scalding and Java/Cascading interop. &lt;em&gt;new
Fields(“USERID”)&lt;/em&gt; and &lt;em&gt;‘USERID&lt;/em&gt; are the same.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Next we write our multi buffer operation, &lt;strong&gt;MyMultiBufferOp&lt;/strong&gt;.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;table class=&quot;rouge-table&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=&quot;gutter gl&quot;&gt;&lt;pre class=&quot;lineno&quot;&gt;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;com.liveramp.cascading_ext.multi_group_by.MultiBuffer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.commons.collections.keyvalue.MultiKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyMultiBufferOp&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MultiBuffer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;operate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// First pipe: UserAges &amp;lt;USERID, AGE&amp;gt;&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;Iterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userAges&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getArgumentsIterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;userAges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;hasNext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;Tuple&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userAgesTuple&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userAges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;user_age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userAgesTuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getInteger&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// second field is age&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// Data structure to store the count&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;MultiKey&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;MultiKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;countMap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;HashMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;MultiKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;();&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// Second pipe: Purchases &amp;lt;USERID, TIMESTAMP, STATE, PURCHASES&amp;gt;&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;Iterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;purchases&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getArgumentsIterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;purchases&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;hasNext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;nc&quot;&gt;Tuple&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;purchasesTuple&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;purchases&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
            &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;purchasesTuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getInteger&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// third column is state&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MultiKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;user_age&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;countMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;containsKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;countMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;countMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;countMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// We just calculated &amp;lt;STATE, AGE, COUNT&amp;gt; results stored in &apos;countMap&apos;&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// Now we just have to emit COUNT, because we gave &amp;lt;STATE, AGE&amp;gt;&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// as grouping names when calling this buffer operation&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;Entry&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;MultiKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;entry&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;countMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;entrySet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;entry&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
            &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;entry&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;emit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;age&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;First, we obtain tuple iterators for the two data sources. Then we keep updating
the hashmap &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HashMap(&amp;lt;state, age&amp;gt;, count)&lt;/code&gt; until the purchases iterator is exhausted.
Finally, we emit the hashmap contents as the results of this buffer operation.&lt;/p&gt;
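
&lt;p&gt;The counting step can also be tried outside Cascading. The following is a minimal plain-Java sketch, not the buffer itself: the names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CountSketch&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;countByStateAndAge&lt;/code&gt; are made up for illustration, and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;List&amp;lt;Integer&amp;gt;&lt;/code&gt; stands in for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MultiKey&lt;/code&gt;.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountSketch {
    // Hypothetical helper: counts the purchases of one user per (state, age) key.
    // The List key is an assumption that replaces MultiKey; it is not Cascading API.
    static Map&amp;lt;List&amp;lt;Integer&amp;gt;, Integer&amp;gt; countByStateAndAge(int userAge, Iterator&amp;lt;Integer&amp;gt; purchaseStates) {
        Map&amp;lt;List&amp;lt;Integer&amp;gt;, Integer&amp;gt; countMap = new LinkedHashMap&amp;lt;&amp;gt;();
        while (purchaseStates.hasNext()) {
            List&amp;lt;Integer&amp;gt; key = Arrays.asList(purchaseStates.next(), userAge);
            // merge() stores 1 for a new key, otherwise adds 1 to the stored count
            countMap.merge(key, 1, Integer::sum);
        }
        return countMap;
    }

    public static void main(String[] args) {
        // One user aged 30 with purchases in states 6, 48 and 6 again
        Map&amp;lt;List&amp;lt;Integer&amp;gt;, Integer&amp;gt; counts =
            countByStateAndAge(30, Arrays.asList(6, 48, 6).iterator());
        System.out.println(counts); // prints {[6, 30]=2, [48, 30]=1}
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Here &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;merge&lt;/code&gt; replaces the containsKey/put branches with a single call; the resulting counts are the same.&lt;/p&gt;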

&lt;p&gt;You can find the full code for the multi buffer operation &lt;a href=&quot;https://github.com/morazow/ScaldingExamples/tree/master/src/main/scala/com/morazow/multigroupby&quot;&gt;here&lt;/a&gt; (Scala) and &lt;a href=&quot;https://github.com/morazow/ScaldingExamples/tree/master/src/main/java/com/morazow/multigroupby&quot;&gt;here&lt;/a&gt; (Java).
To test the MultiGroupBy example, you will have to assemble a
fat jar and run it in a Hadoop environment.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I find this kind of pattern, a join before or after a groupBy, a lot in our map
reduce job chains. Using MultiGroupBy we achieved a considerable performance
increase. Additionally, it resulted in more efficient cluster utilization.&lt;/p&gt;

&lt;p&gt;I strongly believe this operation should be available by default in both Cascading and
Scalding.&lt;/p&gt;

</content>

      
      
      
      
      

      

      
        <category term="en" />
      

      
        <category term="scalding" />
      
        <category term="cascading" />
      
        <category term="hadoop" />
      
        <category term="bigdata" />
      

      
        <summary type="html">18 November 2014 - Nuremberg</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Scalding TemplatedTsv And Hadoop Many Files Problem</title>
      
      
      <link href="/2014/10/03/scalding-templated-tsv/" rel="alternate" type="text/html" title="Scalding TemplatedTsv And Hadoop Many Files Problem" />
      
      <published>2014-10-03T00:00:00+00:00</published>
      <updated>2014-10-03T00:00:00+00:00</updated>
      <id>/2014/10/03/scalding-templated-tsv</id>
      <content type="html" xml:base="/2014/10/03/scalding-templated-tsv/">&lt;p class=&quot;meta&quot;&gt;03 October 2014 - Nuremberg&lt;/p&gt;

&lt;h2 id=&quot;tldr&quot;&gt;TL;DR&lt;/h2&gt;

&lt;p&gt;If Scalding TemplatedTsv tap creates lots of output files, do a groupBy on
template column(s) just before writing the tap.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;n&quot;&gt;pipe&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// some other etl&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;g&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;pass&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;TemplatedTsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;baseOutputPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;%02d&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;You can find the full pseudocode at the bottom of this page.&lt;/p&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;My daily job involves writing Hadoop map reduce jobs. I use
&lt;a href=&quot;https://github.com/twitter/scalding&quot;&gt;Scalding&lt;/a&gt; and
&lt;a href=&quot;https://www.cascading.org/projects/cascading/&quot;&gt;Cascading&lt;/a&gt;. They are really
really really awesome. I cannot recommend them enough.&lt;/p&gt;

&lt;p&gt;Usually we have several chains of map reduce jobs running. One of the jobs
performs daily aggregation of the incoming data. The result of this job is then
used as input for other jobs that run weekly or monthly.&lt;/p&gt;

&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Let’s imagine the input data is formatted as &lt;span id=&quot;backcolor&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;user_id,
timestamp, state, transactions&amp;gt;&lt;/code&gt;&lt;/span&gt;. That is, each record describes a user making
transactions at a particular timestamp (epoch) in a particular place, which is a
geographical state.&lt;/p&gt;

&lt;p&gt;The main goal of this job is to count the number of transactions each user made per
day and per state. In Scalding that takes a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt; and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;groupBy&lt;/code&gt; operation.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;n&quot;&gt;pipe&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;DAY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3600&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;DAY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;g&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;COUNT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;baseOutputPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
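
&lt;p&gt;A note on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DAY&lt;/code&gt; field: assuming the timestamp is in epoch seconds (UTC), the day bucket is the integer quotient of the timestamp by 3600 * 24, whereas the remainder would only give the seconds elapsed within that day. A small plain-Java sketch of the arithmetic, with the hypothetical names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DaySketch&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dayOf&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;public class DaySketch {
    static final long SECONDS_PER_DAY = 3600L * 24;

    // Day number since 1970-01-01 (UTC) for a timestamp given in epoch seconds
    static long dayOf(long epochSeconds) {
        return Math.floorDiv(epochSeconds, SECONDS_PER_DAY);
    }

    public static void main(String[] args) {
        // First and last second of 2014-10-03 (UTC) fall into the same day bucket
        System.out.println(dayOf(1412294400L)); // prints 16346
        System.out.println(dayOf(1412380799L)); // prints 16346
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;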

&lt;p&gt;One other requirement for the job is that it needs to store its results into
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/year/month/day/state/&lt;/code&gt; partitions.&lt;/p&gt;

&lt;p&gt;Depending on the incoming input data we need to partition the aggregated data.
That is, all the transactions for a particular state should be in a single
partition. The input data may not contain all states, and we should not create
folders for states that do not occur.&lt;/p&gt;

&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;/h2&gt;

&lt;p&gt;To achieve the goal we can use the &lt;a href=&quot;https://twitter.github.io/scalding/api/#com.twitter.scalding.TemplatedTsv&quot; target=&quot;_blank&quot; data-proofer-ignore=&quot;&quot;&gt;TemplatedTsv&lt;/a&gt; tap from Scalding. Just
replace the Tsv tap with it:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;TemplatedTsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;baseOutputPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;%02d&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;When running the job jar, just give the base output path as &lt;span id=&quot;backcolor&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--output /year/month/day/&lt;/code&gt;&lt;/span&gt;, and it will create the state
folders inside that path.&lt;/p&gt;

&lt;p&gt;However, this approach will create &lt;a href=&quot;https://blog.cloudera.com/blog/2009/02/the-small-files-problem/&quot;&gt;lots
of files&lt;/a&gt;. Because
the data is not organized by state in any way, each reducer will receive records for
several states, so every reducer will create a file in each of those state folders.&lt;/p&gt;

&lt;p&gt;This is very bad for the next jobs in the chain if they use the results of the
above job as input. For instance, a weekly job will be very slow because
of the large number of files it has to read.&lt;/p&gt;

&lt;p&gt;Can we mitigate this problem somehow?&lt;/p&gt;

&lt;p&gt;Yes, sure. When a reducer is done processing the data and is about to write, we
want its data to belong to one (or at most two) states, so
that it creates only one or two files.&lt;/p&gt;

&lt;p&gt;To achieve this, let Hadoop sort the data by state before writing. In
other words, we just add another &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;groupBy&lt;/code&gt; operation in Scalding and do not
perform any aggregation in that grouping.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;g&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;pass&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;TemplatedTsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;baseOutputPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;%02d&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This solves the many-files problem at the cost of another map reduce phase.&lt;/p&gt;
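
&lt;p&gt;The effect of the extra grouping on the file count can be estimated with a rough plain-Java simulation. It is only a sketch under simplifying assumptions: records are partitioned by a single integer key (the user id or the state) using a modulo, and every reducer writes one file per distinct state it receives. The class &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FileCountSketch&lt;/code&gt;, its method and the data are made up for illustration.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FileCountSketch {
    // Each record is {partitionKey, state}. A record goes to reducer
    // floorMod(partitionKey, reducers), and every reducer is assumed to
    // write one file per distinct state it receives.
    static int countFiles(int[][] records, int reducers) {
        Set&amp;lt;List&amp;lt;Integer&amp;gt;&amp;gt; reducerStatePairs = new HashSet&amp;lt;&amp;gt;();
        for (int[] record : records) {
            int reducer = Math.floorMod(record[0], reducers);
            reducerStatePairs.add(Arrays.asList(reducer, record[1]));
        }
        return reducerStatePairs.size();
    }

    public static void main(String[] args) {
        // Made-up data: 12 users spread evenly over 3 states
        int[][] byUser = new int[12][];
        int[][] byState = new int[12][];
        for (int user = 0; user &amp;lt; 12; user++) {
            int state = user % 3;
            byUser[user] = new int[] {user, state};   // partitioned by user id
            byState[user] = new int[] {state, state}; // partitioned by state
        }
        System.out.println(countFiles(byUser, 4));  // prints 12
        System.out.println(countFiles(byState, 4)); // prints 3
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With four reducers, partitioning by user produces a file for every reducer and state combination, while partitioning by state produces one file per state.&lt;/p&gt;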

&lt;p&gt;However, there is another problem with this solution. Because the data is not
balanced with respect to state, some reducers will have to process all the records
(which might be a lot) of a single large state and delay the whole job.&lt;/p&gt;

&lt;p&gt;Now the problem at hand is that some reducers process a considerably large
percentage of the data while others process a very small percentage.
Therefore, our next goal is to process the states with lots of data in parallel
with several reducers instead of a single reducer handling that state.&lt;/p&gt;

&lt;p&gt;After analyzing the incoming data or the results of the previous aggregation
jobs, we can determine the states containing a large portion of the data. We can then
distribute their load over a number of reducers (of our choice) with the following
trick:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;SORTER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;modulo&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;_2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;37&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
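  &lt;span class=&quot;c1&quot;&gt;// hashCode can be negative, so keep the salt in the range 0 until modulo&lt;/span&gt;
  &lt;span class=&quot;nv&quot;&gt;math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;abs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;hashCode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;modulo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;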
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;SORTER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;g&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;pass&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;discard&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;SORTER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;TemplatedTsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;baseOutputPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;%02d&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;For instance, we redistribute the data of California (state 6) across five
reducers. Therefore, instead of a single reducer, five of them will be writing
into the output partition, thus creating five smaller files.&lt;/p&gt;
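
&lt;p&gt;The salt is only useful if it stays in a fixed range. In both Java and Scala the remainder operator keeps the sign of its left operand, and a hash code can be negative, so a sketch that wants values from 0 up to the bucket count has to normalise the sign. Below is a minimal plain-Java version of the idea; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SaltSketch&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bucketsFor&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sorter&lt;/code&gt; are hypothetical names, and the bucket counts simply mirror the match expression above.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;public class SaltSketch {
    // Number of reducers to spread each state over; any state that is not
    // listed stays on a single reducer.
    static int bucketsFor(int state) {
        switch (state) {
            case 6:  return 5;
            case 48: return 7;
            case 37: return 3;
            default: return 1;
        }
    }

    // Salt in the range [0, buckets). floorMod is used because the plain
    // remainder of a negative hashCode would be negative.
    static int sorter(String userId, int state) {
        return Math.floorMod(userId.hashCode(), bucketsFor(state));
    }

    public static void main(String[] args) {
        // Hypothetical user ids; each salt for state 6 lies between 0 and 4
        for (String userId : new String[] {&quot;user-1&quot;, &quot;user-2&quot;, &quot;user-3&quot;}) {
            System.out.println(userId + &quot; : &quot; + sorter(userId, 6));
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Because the salt depends only on the user id, all records of one user within a state still end up in the same group.&lt;/p&gt;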

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;TemplatedTsv is great. However, it can create lots of small output files, which
hurts the performance of the next jobs in the chain. Fortunately,
the number of files can be reduced by grouping the data by the template column(s)
before writing to the tap. Furthermore, if the data is skewed you can apply some
tricks to balance the templated data among reducers. This adds the overhead of
another map reduce phase.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;pipeSource&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Tsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;InputSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;TIMESTAMP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;TRANSACTIONS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;pipeETL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pipeSource&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;read&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;DAY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3600&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;DAY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;g&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;](&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;COUNT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;USERID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;SORTER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;modulo&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;_2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;37&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;nv&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;hashCode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;modulo&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;SORTER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;g&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;pass&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;discard&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;&apos;SORTER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;TemplatedTsv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;baseOutputPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;%02d&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;&apos;STATE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

</content>

      
      
      
      
      

      

      
        <category term="en" />
      

      
        <category term="scalding" />
      
        <category term="cascading" />
      
        <category term="hadoop" />
      

      
        <summary type="html">03 October 2014 - Nuremberg</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">A Trip to Madeira Island</title>
      
      
      <link href="/2012/05/17/madeira-gezelenc/" rel="alternate" type="text/html" title="A Trip to Madeira Island" />
      
      <published>2012-05-17T00:00:00+00:00</published>
      <updated>2012-05-17T00:00:00+00:00</updated>
      <id>/2012/05/17/madeira-gezelenc</id>
      <content type="html" xml:base="/2012/05/17/madeira-gezelenc/">&lt;p class=&quot;meta&quot;&gt;17 May 2012 - Lisbon&lt;/p&gt;

&lt;p&gt;Another trip: &lt;a href=&quot;https://en.wikipedia.org/wiki/Madeira&quot;&gt;Madeira&lt;/a&gt;,
an island in the middle of the ocean that belongs to Portugal. This time I went
with my classmate and friend Vaidas Brundza. We spent about three days touring
the island thoroughly; its nature turned out to be amazingly beautiful.&lt;/p&gt;

&lt;p&gt;On the first day we walked around
&lt;a href=&quot;https://en.wikipedia.org/wiki/Funchal&quot;&gt;Funchal&lt;/a&gt;, the city we flew
into, and tasted several local dishes. On the second day we rented a car and set
off to drive around the island. Here I want to note a few things in the hope
that they help readers: first, do not rent a car at the airport, because you pay
twice the price; the best option is to take the bus to the city center and rent
there. We used the &lt;a href=&quot;https://www.europcar.com/&quot;&gt;Europcar&lt;/a&gt; car rental
service, which was cheap and convenient. Well, now that we had a car, we toured
the island from one end. First we set off toward its highest peak.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/madeira/01.JPG&quot; alt=&quot;madeira1&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/madeira/02.JPG&quot; alt=&quot;madeira2&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/madeira/03.JPG&quot; alt=&quot;madeira3&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/madeira/04.JPG&quot; alt=&quot;madeira4&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/madeira/05.JPG&quot; alt=&quot;madeira5&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In this way we covered almost half of the island. It was one of the more
interesting trips.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/madeira/route.png&quot; alt=&quot;madeira route&quot; /&gt;&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="tk" />
      

      

      
        <summary type="html">17 May 2012 - Lisbon</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">The Westernmost Point of Europe, Cabo da Roca</title>
      
      
      <link href="/2012/02/20/yewropanyn-in-gunbatar-cunki/" rel="alternate" type="text/html" title="The Westernmost Point of Europe, Cabo da Roca" />
      
      <published>2012-02-20T00:00:00+00:00</published>
      <updated>2012-02-20T00:00:00+00:00</updated>
      <id>/2012/02/20/yewropanyn-in-gunbatar-cunki</id>
      <content type="html" xml:base="/2012/02/20/yewropanyn-in-gunbatar-cunki/">&lt;p class=&quot;meta&quot;&gt;20 February 2012 - Lisbon&lt;/p&gt;

&lt;p&gt;This morning the guys and I had set out to go to &lt;a href=&quot;https://en.wikipedia.org/wiki/Cabo_da_Roca&quot;&gt;Cabo da
Roca&lt;/a&gt;, the westernmost point of Europe. Of course you cannot walk there from
Lisbon; the most convenient way is by bicycle. First we took the train from
Lisbon to Cascais. There are places there that lend bicycles for free, so we
picked up our bikes at one of them and hit the road.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/cabo-da-roca/06.png&quot; alt=&quot;cabadaroca6&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Five of us went, and I was the worst rider among them. (:D) It had been
seven or eight years since I last rode a bicycle. I even fell twice along the
way, and one of those times I ran into people working on the road.. Still, the
trip was great fun. We were cycling along the western coast of Portugal (and
indeed of Europe), right at the edge of the ocean. Sometimes you have to climb
hills along the way, which is tiring. But the way back is much more pleasant:
there is no need to pedal, it was enough just to steer.&lt;/p&gt;

&lt;p&gt;Once we reached &lt;a href=&quot;https://en.wikipedia.org/wiki/Cabo_da_Roca&quot;&gt;Cabo da Roca&lt;/a&gt;
we had a good rest, sitting by the ocean to eat and drink. We walked around the
area for a while and then headed back. On the way we stopped at a beach, where
we spent some time watching the ocean waves and playing in the sand.&lt;/p&gt;

&lt;p&gt;There are so many places worth visiting in Portugal..&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/cabo-da-roca/01.JPG&quot; alt=&quot;cabadaroca1&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/cabo-da-roca/02.JPG&quot; alt=&quot;cabadaroca2&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/cabo-da-roca/03.JPG&quot; alt=&quot;cabadaroca3&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/cabo-da-roca/04.JPG&quot; alt=&quot;cabadaroca4&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/cabo-da-roca/05.JPG&quot; alt=&quot;cabadaroca5&quot; /&gt;&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="tk" />
      

      

      
        <summary type="html">20 February 2012 - Lisbon</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">EMDC Winter Event</title>
      
      
      <link href="/2012/02/15/EMDC-Winter-Event/" rel="alternate" type="text/html" title="EMDC Winter Event" />
      
      <published>2012-02-15T00:00:00+00:00</published>
      <updated>2012-02-15T00:00:00+00:00</updated>
      <id>/2012/02/15/EMDC-Winter-Event</id>
      <content type="html" xml:base="/2012/02/15/EMDC-Winter-Event/">&lt;p class=&quot;meta&quot;&gt;15 February 2012 - Sintra&lt;/p&gt;

&lt;p&gt;After traveling around Europe, I eventually arrived in Lisbon. There
was an &lt;a href=&quot;https://www.ac.upc.edu/en/academics/master/master-emdc-european-master-in-distributed-computing&quot;&gt;EMDC&lt;/a&gt; event in &lt;a href=&quot;https://en.wikipedia.org/wiki/Sintra&quot;&gt;Sintra&lt;/a&gt;
that I was supposed to attend. I was excited since it would gather everyone
involved in the program. Some students from the previous batch, students from
the Barcelona and Lisbon tracks, the professors of course, and some invited
speakers were there.&lt;/p&gt;

&lt;p&gt;Sintra was a great place for such an event. Meeting and talking with fellow
EMDC-ers and getting tips from the seniors was something I enjoyed very much.
Moreover, we had some talks about &lt;a href=&quot;https://en.wikipedia.org/wiki/Distributed_computing&quot;&gt;Distributed
Computing&lt;/a&gt;, how to write a
master’s thesis, etc. Another presentation was about Stockholm, Sweden,
and &lt;a href=&quot;https://www.kth.se/en&quot;&gt;KTH&lt;/a&gt;, where we will be studying for one semester.&lt;/p&gt;

&lt;p&gt;One thing I really liked was that the seniors wore t-shirts while giving
their presentations, and the tagline on them was&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“no tagline yet, still running &lt;a href=&quot;https://en.wikipedia.org/wiki/Paxos_%28computer_science%29&quot;&gt;paxos&lt;/a&gt;”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Maybe I will use it as a tagline for this blog.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/emdc.jpg&quot; alt=&quot;emdc&quot; /&gt;&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="en" />
      

      

      
        <summary type="html">15 February 2012 - Sintra</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Conclusions from the Eurail Trip</title>
      
      
      <link href="/2012/02/11/eurail-gezelenjin-netijesi/" rel="alternate" type="text/html" title="Conclusions from the Eurail Trip" />
      
      <published>2012-02-11T00:00:00+00:00</published>
      <updated>2012-02-11T00:00:00+00:00</updated>
      <id>/2012/02/11/eurail-gezelenjin-netijesi</id>
      <content type="html" xml:base="/2012/02/11/eurail-gezelenjin-netijesi/">&lt;p class=&quot;meta&quot;&gt;11 February 2012 - Lisbon&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/yollar.png&quot; alt=&quot;yollar&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And so I finished the 15-day trip. At times I was tired and cold, but even
so it was very interesting.&lt;/p&gt;

&lt;p&gt;Several members had asked whether a visa is needed or whether buying a
&lt;a href=&quot;https://www.eurail.com/en&quot;&gt;Eurail&lt;/a&gt; ticket is enough: “A visa is
needed”! Earlier I had asked by email myself, and I will quote the reply here:
“The Eurail Pass is no visa; travelers are responsible for
ensuring that they have all necessary travel documents and visas.” In other
words, after buying the ticket you must still get a tourist visa from one of the
Schengen countries. There is good news too, though: if a citizen of Turkmenistan
has lived in Turkey (or in any other European country) for more than six months,
they can simply buy an &lt;a href=&quot;https://www.interrail.eu/en&quot;&gt;Interrail&lt;/a&gt; ticket
(which is cheaper than Eurail), take their passport and residence permit along,
and travel freely, because Turkey is also counted as part of the European
zone.&lt;/p&gt;

&lt;h2 id=&quot;kiçijek-maslahatlarym&quot;&gt;A Few Small Tips&lt;/h2&gt;

&lt;p&gt;I traveled in winter, so be sure to take warm clothes with you: gloves, a
scarf, a hat. Another must-have is a pair of slippers, so that after a tiring
day of walking you can take off your heavy shoes and rest comfortably. Second, I
had taken along a blanket that you climb into to sleep (I do not know its name),
and it was a real comfort at times when I had to wait. Besides these, I used a
paper map and the phone’s maps application, but sometimes there is no internet
to be found. For example, I also bought a SIM card from an operator in Italy in
order to use 3G. It is also useful to learn at least a little about each city in
advance; reading Wikipedia is enough. For instance, in Prague, if you cross on a
red light and get caught, there is a fine of 60 crowns (~ 3 euro). Most
important of all, keep an eye on your valuables: money, laptop, camera, and so
on.&lt;/p&gt;

&lt;p&gt;So, how much did it cost? I added it up: including the 15-day Eurail ticket,
about 1300 euro. The main thing I spent a lot on was souvenirs. I do not recall
paying much for food and drink; only in Italy did I have a really good pizza in
one place and pasta in another. The rest of the time a sandwich and tea were
enough for me. I stayed in 4 hostels, and they were not expensive either, 15
euro a night. If you book in advance it gets even cheaper.&lt;/p&gt;

&lt;p&gt;However much I spent, I do not regret it at all, because I enjoyed the trip
enormously and it was a unique experience for me. I made several friends and
chatted with many strangers; it is a remarkable thing when two people from
different parts of the world talk and share what they know..&lt;/p&gt;

&lt;p&gt;My final conclusion: if I again have the money and the time, I intend to
travel again..&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="tk" />
      

      

      
        <summary type="html">11 February 2012 - Lisbon</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Berlin and Amsterdam Trip</title>
      
      
      <link href="/2012/02/07/berlin-we-amsterdam-gezelenci/" rel="alternate" type="text/html" title="Berlin and Amsterdam Trip" />
      
      <published>2012-02-07T00:00:00+00:00</published>
      <updated>2012-02-07T00:00:00+00:00</updated>
      <id>/2012/02/07/berlin-we-amsterdam-gezelenci</id>
      <content type="html" xml:base="/2012/02/07/berlin-we-amsterdam-gezelenci/">&lt;p class=&quot;meta&quot;&gt;07 February 2012 - Amsterdam&lt;/p&gt;

&lt;h2 id=&quot;berlin&quot;&gt;Berlin&lt;/h2&gt;

&lt;p&gt;I arrived at the Berlin station around ten at night and went straight to the
&lt;a href=&quot;https://www.meininger-hotels.com/en/home/&quot;&gt;Meininger&lt;/a&gt; hostel I had
reserved in advance; it was right next to the station. I stayed one night for 10
euro. (The regular price is actually a bit higher, but they give a discount to
Eurail ticket holders.)&lt;/p&gt;

&lt;p&gt;After a night’s rest, the next day I called several Turkmen guys living in
Berlin and met up with them. During the day we walked around Alexanderplatz and
its surroundings. Then, of course, I went to see the remains of the Berlin Wall,
which is unique to Berlin. Traces of the wall are still standing; they have left
much of it in place.&lt;/p&gt;

&lt;p&gt;Also, I think Berlin’s metro system is the most complicated one; my head was
spinning by the time I figured it out.&lt;/p&gt;

&lt;h2 id=&quot;amsterdam&quot;&gt;Amsterdam&lt;/h2&gt;

&lt;p&gt;After staying two days in Berlin with the Turkmen guys, I set off for
Amsterdam at night. I arrived around 11 in the morning and checked in at a
hostel, Inner Amsterdam.&lt;/p&gt;

&lt;p&gt;Amsterdam, in a word, is fantastic. Honestly, you have to come and see it
with your own eyes. They have laid out the city wonderfully, with several canals
running through it. (Take a look at its map.) As I was walking along, I saw that
one of the canals had frozen over and people were skating on it.. (:D) Another
special thing about the city is that there are bicycles everywhere. I wondered
whether there were more bikes than people. They even have their own dedicated
lanes.&lt;/p&gt;

&lt;p&gt;The Van Gogh Museum was right next to the hostel where I stayed, but I did
not go; they wanted 14 euro for admission. Thinking “I would rather buy a
t-shirt than go in there”, I bought two little t-shirts as souvenirs.&lt;/p&gt;

&lt;p&gt;Among the many things to do when you come to Amsterdam are going to a
coffeeshop and walking through the famous Red Light Street.&lt;/p&gt;

&lt;p&gt;Because I had stayed an extra day in Berlin, I could no longer use the
ticket; the 15 days were already up. So I bought a plane ticket from Amsterdam
to Lisbon. But the trip was not over yet: ahead of me was a two-day visit to
&lt;a href=&quot;https://en.wikipedia.org/wiki/Sintra&quot;&gt;Sintra, near Lisbon&lt;/a&gt;, which is under UNESCO protection..&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="tk" />
      

      

      
        <summary type="html">07 February 2012 - Amsterdam</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Zurich-Vienna-Prague Trip</title>
      
      
      <link href="/2012/02/05/Zurich-Vienna-Prague/" rel="alternate" type="text/html" title="Zurich-Vienna-Prague Trip" />
      
      <published>2012-02-05T00:00:00+00:00</published>
      <updated>2012-02-05T00:00:00+00:00</updated>
      <id>/2012/02/05/Zurich-Vienna-Prague</id>
      <content type="html" xml:base="/2012/02/05/Zurich-Vienna-Prague/">&lt;p class=&quot;meta&quot;&gt;2 February 2012 - Berlin&lt;/p&gt;

&lt;h2 id=&quot;zürich&quot;&gt;Zürich&lt;/h2&gt;

&lt;p&gt;Around 7 in the morning I boarded the train to Zürich at the Milan station.
A three-hour journey lay ahead. The ride between Milan and Zürich was especially
beautiful because we were passing through the Alps in Switzerland. That was also
when I saw this year’s first snow; the weather had started to get a bit colder,
and I was feeling the cold much more than in the earlier cities.&lt;/p&gt;

&lt;p&gt;The first thing I noticed when I arrived at the Zürich station was the
luggage storage. For some reason I had not seen one in the earlier cities, or it
had not occurred to me; I immediately dropped off my big bag and went to explore
the city with just my camera. There was a small map beside the road I was
walking along; I saw that there was a river nearby and headed that way. The city
is different from the others anyway: there are not many national sights or
historical places to visit. It is plainly an industrial city.&lt;/p&gt;

&lt;p&gt;After walking around a little, I looked up the location of the Google
offices in the phone’s maps app and headed for that building. Honestly, I fell
in love hundreds of times along the way (:D). I went to the Google offices and
asked “May I look around inside?”, but they did not allow it. So I wandered
around the area again and went back, booked a seat on the ten o’clock evening
train to Vienna, and settled in to wait.&lt;/p&gt;

&lt;p&gt;One of the best things you can do when you come to Switzerland is to go
skiing or snowboarding. The Alps are right beside you; while I had a big bag on
my back, many of the people around me were carrying skis and snowboards. Some
other time, if fate allows..&lt;/p&gt;

&lt;h2 id=&quot;wiýenna&quot;&gt;Vienna&lt;/h2&gt;

&lt;p&gt;On the way to Vienna there was also a girl from the Czech Republic in my
compartment, and I got some information from her about Vienna and Prague. In
turn I told her about Turkmenistan and our national traditions. The time flew
by, and in the morning I arrived at the Vienna station.&lt;/p&gt;

&lt;p&gt;This station is one of my favorites: it had free internet and dedicated
spots with power outlets for using a computer. But just like Zürich it is a
rather expensive city; I could tell from buying a tea, which was about two
euro.&lt;/p&gt;

&lt;p&gt;Since the weather was rather cold I did not feel like going out much;
sitting in the station, writing the blog, and being online felt better. Still, I
went out and walked around for an hour or two. Vienna turns out to be the city
where artists like Mozart emerged, as is obvious from the souvenirs and the
Mozart-shaped chocolates. At one metro station I came across an unusual toilet
that played classical and opera music. (:D) I did not go in, but I could picture
doing one’s business to music..&lt;/p&gt;

&lt;p&gt;Vienna, too, is one of the cities where I fell in love many times.. After a
quick look around I set off for Prague.&lt;/p&gt;

&lt;h2 id=&quot;prague&quot;&gt;Prague&lt;/h2&gt;

&lt;p&gt;I arrived in the city at dawn, around five o’clock. The first thing I
noticed was that I could understand Czech words; since they too were once under
Soviet domination, many things felt somehow familiar. Even their songs sounded
like Russian songs. Many people understand Russian, but my companion on the
train had said “Do not speak Russian much, they do not like it”. So in most
places I got by in English.&lt;/p&gt;

&lt;p&gt;Here too the weather was very cold, and I waited in the station until
sunrise. Then I got a map from a tourist shop and started exploring. Prague
turns out to have a great many historical places. I went around this tower and
that tower, Charles Bridge, and so on. The cold started to affect both me and
the trip; even with gloves on, my fingers froze at times. Around noon the sun
came out and warmed things up at least a little; otherwise it was bitterly cold.
After a quick tour of the city I reserved a seat on the four o’clock afternoon
train to Berlin.&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="tk" />
      

      

      
        <summary type="html">2 February 2012 - Berlin</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">A Few Cities in Italy</title>
      
      
      <link href="/2012/01/29/italiyada-birnace-saher/" rel="alternate" type="text/html" title="A Few Cities in Italy" />
      
      <published>2012-01-29T00:00:00+00:00</published>
      <updated>2012-01-29T00:00:00+00:00</updated>
      <id>/2012/01/29/italiyada-birnace-saher</id>
      <content type="html" xml:base="/2012/01/29/italiyada-birnace-saher/">&lt;p class=&quot;meta&quot;&gt;29 January 2012 - On the train between Milan and Zurich&lt;/p&gt;

&lt;h2 id=&quot;pisa&quot;&gt;Pisa&lt;/h2&gt;

&lt;p&gt;Shortly before midnight we arrived at the port of Livorno; I got off the
ship right away and took a taxi to the train station. At first I had thought I
would walk to the station, but when I asked around it turned out to be very
far.. I had asked the customs officers at the terminal to call a taxi for me. On
the way I chatted a bit with the taxi driver in broken phrases, and it turned
out the station staff were on strike that day. Thinking “oh no, this is bad”, I
reached the station and found that there were ticket machines, and that a train
was leaving for Pisa just ten minutes after my arrival. It was a local train
(Eurail ticket holders do not need to buy tickets for local services), so I went
straight to the platform and waited.&lt;/p&gt;

&lt;p&gt;Fifteen minutes later it brought me to the Pisa station, and I went and got
a room at a nearby hotel (30 euro). The next morning around eleven I started
touring the city. I really liked how Pisa is laid out (in fact I liked the
layout of every city I visited in Italy): if you go straight along the first
road facing you as you leave the station, you come to a river; cross the bridge
and keep going straight, and it takes you to the Tower of Pisa, one of the
wonders of the world..&lt;/p&gt;

&lt;p&gt;The city is not very big, but it is very beautiful and simple .. I walked
around and took pictures. Traveling alone is fun, but it is better with two or
more people, since you could take more unusual photos :D; everyone poses as if
pushing the Tower of Pisa or keeping it from falling. There were all sorts of
poses: some kicking it, some holding it up with a finger, and so on. I had the
fantasy that it would be awesome to hang a Turkmen flag from the top of the
tower and have someone photograph it from below, but I was on my own, and I
would have needed a friend down below while I was up top .. Wanting a photo of
myself pushing the tower too, I asked two girls passing by to take my picture.
After chatting with them a little, they asked where I was from. When you say
Turkmenistan, not everyone knows it, so you end up explaining where it is:
Kazakhstan and Uzbekistan are neighboring countries, and so on. When they said
“But you speak english quite good!”, I was delighted. Many people do not know
our country, and they are amazed that people who come from those parts can hold
a conversation with them..&lt;/p&gt;

&lt;p&gt;Tired, I got to the station around ten in the evening. Intending to go to
Rome, I checked the train times and found there was one around five in the
morning. I waited at the station until then: until 12 at McDonalds, and after
that in the cold station, a gathering place for the homeless .. I claimed a
corner too and passed the time reading a book and listening to music ..&lt;/p&gt;

&lt;h2 id=&quot;rim&quot;&gt;Rome&lt;/h2&gt;

&lt;p&gt;I set off at five and arrived at Rome’s main station around ten. The first
thing I noticed was how big the station was; it had about 30 platforms. I went
straight outside and asked the first person which way the Colosseum was, then
walked in the direction I was shown. Along the way I paid attention to the
buildings, and I was speechless: honestly, back when we were building fortresses
out of mud, these people were already building stone houses, marble monuments,
statues, and so on.&lt;/p&gt;

&lt;p&gt;When I reached the Colosseum I was amazed all over again .. Oh, and the
Colosseum, you know, is the place where the gladiators used to fight; this is
what remains of it (I plan to add a photo). I walked around the surroundings;
there is another splendid building there, a magnificent marble one that I think
they called Caesar’s house or something like that. I toured that as well,
climbed to the top, looked out over the city, and then headed back, because
after a certain point your legs stop obeying; besides, I had barely managed
about 3 hours of sleep on the train that day.&lt;/p&gt;

&lt;p&gt;Back at the station, there was a train leaving for Florence at half past
five; I boarded it and went to sleep.&lt;/p&gt;

&lt;h2 id=&quot;florentina&quot;&gt;Florence&lt;/h2&gt;

&lt;p&gt;It was already around nine in the evening when I arrived. I had taken two
hostel addresses from the internet and went to both (thanks to the phone’s maps
app). But nobody opened the door at either one. Oh well, I thought, and set
about strolling through the city.&lt;/p&gt;

&lt;p&gt;Honestly, this place was somehow even more impressive than the others. Maybe
because it was Saturday, everyone seemed to be outdoors: young and old, pretty
girls (I did not pay attention to the guys), everyone out on the streets .. The
city itself even more so. I did not take any photos here (perhaps because it was
night, I did not walk around much with the camera in hand). The city’s buildings
are huge and packed tightly together; whatever space is left by the streets is
filled with beautiful, grand buildings. Personally, I would study or live
here ..&lt;/p&gt;

&lt;p&gt;Then I went back to the station; I meant to take the four o’clock night
train to Milan. It was only around half past one, so to wait once again I went
to the McDonalds two steps from the station.&lt;/p&gt;

&lt;h2 id=&quot;milan&quot;&gt;Milan&lt;/h2&gt;

&lt;p&gt;Around nine in the morning I arrived at Milan’s central station. I quickly
got one or two hostel addresses from the internet and took the metro to the
nearest one. The hostel I went to had no vacancies, but they pointed me to
another hostel nearby, saying it might have room. The hostel I found, California
Hostel, also had no free beds at that moment, but they said some would open up
after three in the afternoon. I asked whether I could leave my bag, got
directions to San Siro, and went off to explore.&lt;/p&gt;

&lt;p&gt;The stadium is located on the outskirts of the city. I walked around it, and
from talking to a few people I learned that there was an AC Milan - Cagliari
match that very evening. I asked about ticket prices: the cheapest was 20 euro
(the last, 5th row), so I went and bought a 22 euro ticket in the 3rd row. But I
did not have a single Milan item on me, and when I went to the AC Milan store it
was closed because it was Sunday. Nearby there were plenty of vendors selling
football club odds and ends, so I bought a club scarf from one of them. And so
everything suddenly fell into place by itself: I had a ticket for the evening’s
match, and hoping that a bed would free up at the hostel too, I took the metro
to the city center.&lt;/p&gt;

&lt;p&gt;In the center of Milan there is a famous cathedral (a large church) with
remarkable architecture; they call it a Gothic cathedral. I wandered here and
there and then went back to the hostel.&lt;/p&gt;

&lt;p&gt;In the evening I went to the stadium to watch the match; I realized it was
actually my first time going to a stadium to see a football game. I found my
seat and sat down. I am not sure whether I cheered well (:D); I unfurled the
Turkmen flag at every goal, and so on.. A couple of local fans sitting next to
me asked, “Where are you from, and whose flag is this?” I said “From
Turkmenistan”, and since it was the first time they had heard of the country, I
told them where it is and named the neighboring countries. One asked, “Did you
come all the way from there to cheer for AC Milan?”, and I said “Of course!”
Actually I fibbed; seeing the match had not been planned at all. Oh well, I
figured, let the Milan fans be happy..&lt;/p&gt;

&lt;p&gt;There was a free bed at the hostel after all, and I took it for two nights,
30 euro in total. I was planning a good day of rest. There I also met 3 Russian
guys who had come for a fencing competition; they were competing for Kyrgyzstan.
They spoke to me in Russian, and I spoke to them in a mix of English and
Russian. (Honestly, I need to improve my Russian!) We sat laughing in the hostel
kitchen all day long; later some Japanese travelers showed up from somewhere,
they joined in, and I had a good day of rest.&lt;/p&gt;

&lt;p&gt;In between I also found time to visit the AC Milan store at San Siro. The
club museum was right where you come out of the store. I went over to see it,
but they were already about to close; it was around five in the afternoon, and I
was too late.&lt;/p&gt;

&lt;p&gt;The next day, 31 January, I returned to the station in the morning. I got a
ticket for the seven o’clock train to Zurich and set off. And right now I am
writing all this from inside the train..&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="tk" />
      

      

      
        <summary type="html">29 January 2012 - On the train between Milan and Zurich</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Barcelona and What I Went Through</title>
      
      
      <link href="/2012/01/26/barcelona-we-basdan-gecirenlerim/" rel="alternate" type="text/html" title="Barcelona and What I Went Through" />
      
      <published>2012-01-26T00:00:00+00:00</published>
      <updated>2012-01-26T00:00:00+00:00</updated>
      <id>/2012/01/26/barcelona-we-basdan-gecirenlerim</id>
      <content type="html" xml:base="/2012/01/26/barcelona-we-basdan-gecirenlerim/">&lt;p class=&quot;meta&quot;&gt;26 January 2012 - Somewhere in the Mediterranean Sea&lt;/p&gt;

&lt;h2 id=&quot;şäher-barada&quot;&gt;About the City&lt;/h2&gt;

&lt;p&gt;I left Lisbon at night, changed trains in Madrid, and finally arrived in
Barcelona around noon, safe and sound. The weather was sunny and cool. I left
the station right away and went to explore the surroundings. At the end of one
road I spotted a roundabout and, above it on a hill, a splendid building, so I
walked that way. If I am not mistaken the city is surrounded by large hills
anyway; there were green hills on every side. The place on the hill turned out
to be a national museum, with tourist spots all around. I walked around for an
hour or two, photographing the beautiful places.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/barcelona/01.JPG&quot; alt=&quot;roundabout&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/barcelona/02.JPG&quot; alt=&quot;looks like the archery building in Age of Empires&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/barcelona/03.JPG&quot; alt=&quot;hill&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/barcelona/04.JPG&quot; alt=&quot;columns 1&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/barcelona/05.JPG&quot; alt=&quot;dome&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/barcelona/06.JPG&quot; alt=&quot;columns 2&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;pulymyň-ogurlanmagy&quot;&gt;My Money Gets Stolen&lt;/h2&gt;

&lt;p&gt;I had decided that was enough and was heading back when four or five young girls (14-15 years old) stepped in front of me and said,
“We are collecting donations for disabled children; could you write your signature and your city and give
5 euros?” I said fine, took the money out of my wallet and gave it to them. Meanwhile
they kept saying “show us your ID too”, grabbing at the wallet and trying to take my identity card.
I said no, that is enough, and walked away. I had not walked for even a minute when it hit me:
they have played some trick on me. I checked the wallet, and the four fifties in its other pocket were gone!
I ran back at once, grabbed one of them by the arm and shouted, “Come on, give the money back or I am telling the police
right now!” We yelled at each other for a minute or a minute and a half; then the girl I was holding slipped out of her jacket
and they all ran off at once. I tried to run after them, but
with 7-8 kg on my back I could not. All I had left in my hand was a small black jacket.
I went straight to a nearby shop and asked them to call the police; they had already
heard and seen what happened and had called. Three or four minutes later the police arrived, asked some questions, and said that
if I wanted to file a report I would have to come to the station; I agreed and went.
There, just like in the movies (:D), they showed me photos of a number of juvenile offenders.
I pointed out the ones I remembered and the ones that looked similar, told them
everything, and they wrote a report, gave me a copy as well, and I left.&lt;/p&gt;

&lt;p&gt;Before I set off, one of the people closest to me had told me not to forget to give alms (sadaka).
I had not given any, with the excuse that there were no Muslims around to give alms to,
and that person had said it is enough if your intention is charity. That was the first
thing that came to my mind after the theft. Hoping it was for the best, I told myself to let it be alms for the journey and for my own well-being, and
went on enjoying the trip. (Actually I am angry at myself too:
keeping all your cash in one place is very risky, and keeping a little money in several
different places is more convenient and safer. For the best, anyway.)&lt;/p&gt;

&lt;h2 id=&quot;indi-niräk&quot;&gt;Where To Now&lt;/h2&gt;

&lt;p&gt;I walked around the city for about another hour. The sun had set. Asking for directions, I made my way back
to Barcelona Sants station. I wanted to leave for Italy right away, but there was no train
until the next evening, and there were no trains to the nearer French cities of Marseille and Nice
either. So I decided to go by ship (ferryboat) and took the metro to the port of
Barcelona.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/barcelona/07.JPG&quot; alt=&quot;port of Barcelona&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I asked about tickets straight away. It turned out a ship was leaving at midnight that night for Livorno (a city in Italy
close to Pisa), so I bought the ticket at once (a 19-hour
sea voyage from Barcelona to Livorno across the Mediterranean). Since it was still only
around ten, I had something to eat and walked around; the port area turned out to be
really great, perfect for couples (:D).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/barcelona/08.JPG&quot; alt=&quot;little bridge&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/barcelona/09.JPG&quot; alt=&quot;awesome old ship&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After all that exhaustion I slept like the dead, and now I am writing this from the ship,
somewhere in the Mediterranean. (There is no internet right now, so I will have to
&lt;code&gt;git push&lt;/code&gt; once there is.)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/barcelona/10.JPG&quot; alt=&quot;ferryboat&quot; /&gt;&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="tk" />
      

      

      
        <summary type="html">26 January 2012 - Somewhere in the Mediterranean Sea</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Travelling with Eurail</title>
      
      
      <link href="/2012/01/24/eurail-bilen-gezelenc/" rel="alternate" type="text/html" title="Travelling with Eurail" />
      
      <published>2012-01-24T00:00:00+00:00</published>
      <updated>2012-01-24T00:00:00+00:00</updated>
      <id>/2012/01/24/eurail-bilen-gezelenc</id>
      <content type="html" xml:base="/2012/01/24/eurail-bilen-gezelenc/">&lt;p class=&quot;meta&quot;&gt;24 January 2012 - Lisbon&lt;/p&gt;

&lt;h2 id=&quot;giriş&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Many of you will have heard of Interrail, that is, travelling around European countries
by train. Eurail offers the same opportunity to people who live outside
Europe. On &lt;a href=&quot;https://www.eurail.com/en&quot;&gt;Eurail.com&lt;/a&gt; you can pick the pass type you want
and start travelling. The &lt;a href=&quot;https://www.eurail.com/en/eurail-passes&quot;&gt;pass types&lt;/a&gt; on offer include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Global Pass&lt;/strong&gt;: travel through any countries in Europe within 15 days&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Regional Pass&lt;/strong&gt;: pick 2-3 countries in Europe and travel for a number of days&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Flexi Pass&lt;/strong&gt;: travel anywhere on any days within six months, 15 days
in total&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Passes are bought online and delivered to the address you give within two weeks.
You can then start travelling at any time in the following six months; after that
the pass expires. One question that may come up is “Do I need a visa?” for the trip. As far as
I know I do not need one myself, and I wonder whether the pass itself counts as a visa;
you just arrive in any European country and start travelling around. But this needs to be
confirmed, and I plan to ask in the coming days. You can ask any question on the &lt;a href=&quot;https://www.facebook.com/eurorail&quot;&gt;Eurail
Facebook&lt;/a&gt; page; there are also chat
hours on weekdays, about 5 hours a day or so; or you can ask by email (sales@eurail.com),
and they reply within two working days.&lt;/p&gt;

&lt;h2 id=&quot;taýýarlyk&quot;&gt;Preparation&lt;/h2&gt;

&lt;p&gt;There are hundreds of tips on preparing for a trip, but here I want to write down what
I did myself. First of all, what I packed in the backpack:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Clothes: things wearable in winter, a scarf, a hat, etc.&lt;/li&gt;
  &lt;li&gt;Toiletries: soap, shampoo, toothpaste and, most importantly, a
shaver :D&lt;/li&gt;
  &lt;li&gt;Book: &lt;a href=&quot;https://www.amazon.com/Game-Thrones-Song-Fire-Book/dp/0553573403&quot;&gt;Game of Thrones&lt;/a&gt; (A Song of Ice and Fire, Book One), which I will
hopefully manage to finish reading on the road&lt;/li&gt;
  &lt;li&gt;Music: I downloaded all the &lt;a href=&quot;https://en.wikipedia.org/wiki/AC/DC&quot;&gt;AC/DC&lt;/a&gt; albums
and intend to listen to every song, and I also took the &lt;a href=&quot;https://en.wikipedia.org/wiki/Pink_Floyd&quot;&gt;Pink
Floyd&lt;/a&gt; albums I had not listened to
before&lt;/li&gt;
  &lt;li&gt;Laptop: I cannot travel without it, and it is also the heaviest item&lt;/li&gt;
  &lt;li&gt;Camera&lt;/li&gt;
  &lt;li&gt;Passport and pass&lt;/li&gt;
  &lt;li&gt;And a bunch of other odds and ends: a map, chargers, a compass, a flashlight,
bandages for first aid, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Besides these, I installed the &lt;a href=&quot;https://www.bahn.com&quot;&gt;Bahn&lt;/a&gt; (the German rail
system) &lt;a href=&quot;https://www.bahn.com/en/view/booking-information/booking/db-navigator-app.shtml&quot;&gt;app&lt;/a&gt; on my phone, which lets you look up train times between
all cities in Europe, as well as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Free Wi-Fi Finder&lt;/code&gt; app and the
Lonely Planet &lt;a href=&quot;https://www.lonelyplanet.com&quot;&gt;apps&lt;/a&gt;, which include
several city guides.&lt;/p&gt;

&lt;p&gt;Fifteen difficult, dangerous and at the same time exciting days await.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Tourists don’t know where they’ve been, travellers don’t know where they’re
going.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;img src=&quot;/files/eurail/01.JPG&quot; alt=&quot;eurail planning&quot; /&gt;&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="tk" />
      

      

      
        <summary type="html">24 January 2012 - Lisbon</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">SOPA in Brief</title>
      
      
      <link href="/2012/01/18/sopa-barada-gysgaca/" rel="alternate" type="text/html" title="SOPA in Brief" />
      
      <published>2012-01-18T00:00:00+00:00</published>
      <updated>2012-01-18T00:00:00+00:00</updated>
      <id>/2012/01/18/sopa-barada-gysgaca</id>
      <content type="html" xml:base="/2012/01/18/sopa-barada-gysgaca/">&lt;p class=&quot;meta&quot;&gt;18 January 2012 - Lisbon&lt;/p&gt;

&lt;p&gt;Recently, questions such as why the English Wikipedia is shutting down were asked in the comments.
In this post I want to explain the reason, at least briefly and as best I can. It is
important for every internet user to be informed about it!&lt;/p&gt;

&lt;p&gt;SOPA (Stop Online Piracy Act) has been on its way to becoming law in America for
several months now. In short, SOPA is meant to become law in order to protect copyright.
You may well ask what that has to do with us.&lt;/p&gt;

&lt;p&gt;If SOPA is passed into law in America, then once US rights holders
file a complaint against a site that has broken the law, search engines
(Google, Yahoo, Bing) can be barred from indexing that site, online advertising
companies can be barred from doing business with it and, most importantly, ISPs (internet service
providers) can be made to block access to it.&lt;/p&gt;

&lt;p&gt;That is in fact the main aim of SOPA: to block sites outside the USA that share
movies and similar content. If it becomes law, it would mean many of the sites we use a lot
(torrent sites and so on) being shut down.&lt;/p&gt;

&lt;p&gt;Google, Facebook, Twitter, Zynga, eBay, Mozilla, Yahoo and LinkedIn
openly declared their opposition to this bill and wrote a letter to several US
congressmen.&lt;/p&gt;

&lt;p&gt;If the law is introduced, it would first of all have a big impact on YouTube, and for sites
like Vimeo and Flickr it would mean being shut down.&lt;/p&gt;

&lt;p&gt;That is roughly it, in brief.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/wiki-blackout.png&quot; alt=&quot;wiki-blackout&quot; /&gt;&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="tk" />
      

      

      
        <summary type="html">18 January 2012 - Lisbon</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">CS Papers - BitTorrent</title>
      
      
      <link href="/2012/01/18/bit-torrent-paper/" rel="alternate" type="text/html" title="CS Papers - BitTorrent" />
      
      <published>2012-01-18T00:00:00+00:00</published>
      <updated>2012-01-18T00:00:00+00:00</updated>
      <id>/2012/01/18/bit-torrent-paper</id>
      <content type="html" xml:base="/2012/01/18/bit-torrent-paper/">&lt;p class=&quot;meta&quot;&gt;18 January 2012 - Lisbon&lt;/p&gt;

&lt;h2 id=&quot;1-introduction&quot;&gt;1 Introduction&lt;/h2&gt;

&lt;p&gt;The original paper is
&lt;a href=&quot;https://github.com/bittorrent/bittorrent.org/blob/master/bittorrentecon.pdf&quot;&gt;here&lt;/a&gt;.
As you know, BitTorrent is a
&lt;a href=&quot;https://en.wikipedia.org/wiki/Peer-to-peer&quot;&gt;peer-to-peer&lt;/a&gt; file-sharing
protocol. &lt;!--In this blog I will try to give brief summary of this protocol.--&gt;&lt;/p&gt;

&lt;h2 id=&quot;2-technical-framework&quot;&gt;2 Technical Framework&lt;/h2&gt;

&lt;h3 id=&quot;21-publishing-content&quot;&gt;2.1 Publishing Content&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;.torrent&lt;/code&gt; file contains information about the file: its length,
name, hashing information and the URL of a tracker. Trackers are responsible for
helping downloaders find each other. A downloader sends information about
what file it is downloading and what port it is listening on, and the tracker responds
with a list of contact information for peers which are downloading the same
file. Downloaders use this information to find and connect to each other. To
make a file available, a ‘downloader’ which happens to have the complete file,
known as a seed, must be started.&lt;/p&gt;

&lt;h3 id=&quot;22-peer-distribution&quot;&gt;2.2 Peer Distribution&lt;/h3&gt;

&lt;p&gt;The tracker’s responsibilities are strictly limited to helping peers to find
each other.  All logistical problems of file downloading are handled in the
interactions between peers.  In order to keep track of which peers have what,
BitTorrent cuts files into pieces of fixed size. Each downloader reports to all
of its peers what pieces it has. To verify data integrity,
&lt;a href=&quot;https://en.wikipedia.org/wiki/SHA-1&quot;&gt;SHA1&lt;/a&gt; hashes of all pieces are included in
the &lt;code&gt;.torrent&lt;/code&gt; file, and peers don’t report that they have a piece
until they’ve checked the hash. Peers continuously download pieces from all
peers which they can.&lt;/p&gt;
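
&lt;p&gt;As a minimal sketch of the integrity check described above (not the reference
client’s code), the Python below cuts a payload into fixed-size pieces, computes one
SHA-1 digest per piece with the standard &lt;code&gt;hashlib&lt;/code&gt; module, and checks a
received piece against the expected digest. The function names and the tiny four-byte
piece size are assumptions made for illustration only.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import hashlib

PIECE_SIZE = 4  # tiny piece size, chosen only to keep the example readable


def split_into_pieces(data, piece_size=PIECE_SIZE):
    # Cut the payload into fixed-size pieces; the last one may be shorter.
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(data, piece_size=PIECE_SIZE):
    # One SHA-1 digest per piece, as would be listed in the .torrent file.
    return [hashlib.sha1(p).digest() for p in split_into_pieces(data, piece_size)]


def verify_piece(index, piece, expected_hashes):
    # A peer announces a piece only after this check succeeds.
    return hashlib.sha1(piece).digest() == expected_hashes[index]


payload = bytes(range(10))
hashes = piece_hashes(payload)
pieces = split_into_pieces(payload)
good = verify_piece(0, pieces[0], hashes)  # matching piece
bad = verify_piece(0, pieces[1], hashes)   # wrong piece for index 0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;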

&lt;h3 id=&quot;23-pipelining&quot;&gt;2.3 Pipelining&lt;/h3&gt;

&lt;p&gt;BitTorrent facilitates pipelining by breaking pieces further into sub-pieces
over the wire, typically sixteen kilobytes in size, and always keeping some
number of requests, typically five, pipelined at once. This avoids a delay
between pieces being sent, which is disastrous for transfer rates. The number
of pipelined requests can be set to a value that will reliably saturate most
connections.&lt;/p&gt;
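
&lt;p&gt;The idea can be sketched roughly as follows, assuming a request is just a
(piece index, byte offset, length) tuple. The &lt;code&gt;Pipeline&lt;/code&gt; class and its
method names are hypothetical and not taken from any real client; only the sixteen-kilobyte
sub-piece size and the depth of five come from the text.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from collections import deque

SUB_PIECE_SIZE = 16 * 1024  # sixteen kilobytes, as in the text
PIPELINE_DEPTH = 5          # requests kept in flight, as in the text


def sub_piece_requests(piece_index, piece_length):
    # Describe each sub-piece as (piece index, byte offset, length).
    return [
        (piece_index, offset, min(SUB_PIECE_SIZE, piece_length - offset))
        for offset in range(0, piece_length, SUB_PIECE_SIZE)
    ]


class Pipeline:
    def __init__(self, requests, depth=PIPELINE_DEPTH):
        self.pending = deque(requests)  # not yet sent
        self.in_flight = []             # sent, waiting for data
        self.depth = depth
        self.fill()

    def fill(self):
        # Send new requests until the pipeline is full or nothing is left.
        while self.pending and len(self.in_flight) != self.depth:
            self.in_flight.append(self.pending.popleft())

    def on_sub_piece(self, request):
        # A sub-piece arrived: retire its request and top the pipeline up.
        self.in_flight.remove(request)
        self.fill()


requests = sub_piece_requests(0, 256 * 1024)
pipe = Pipeline(requests)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each time a sub-piece arrives, &lt;code&gt;on_sub_piece&lt;/code&gt; immediately sends the next
pending request, so the connection never sits idle waiting for a new request.&lt;/p&gt;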

&lt;h3 id=&quot;24-piece-selection&quot;&gt;2.4 Piece Selection&lt;/h3&gt;

&lt;p&gt;Selecting pieces to download in a good order is very important for good
performance.&lt;/p&gt;

&lt;h4 id=&quot;241-strict-priority&quot;&gt;2.4.1 Strict Priority&lt;/h4&gt;

&lt;p&gt;BitTorrent’s first policy for piece selection is that once a single sub-piece
has been requested, the remaining sub-pieces from that particular piece are
requested before sub-pieces from any other piece. This helps get
complete pieces as quickly as possible.&lt;/p&gt;

&lt;h4 id=&quot;242-rarest-first&quot;&gt;2.4.2 Rarest First&lt;/h4&gt;

&lt;p&gt;When selecting which piece to start downloading next, peers generally download
pieces which the fewest of their own peers have first, a technique referred to as
“rarest first”.  This technique does a good job of making sure that peers have
pieces which all of their peers want, so uploading can be done when wanted. It
also makes sure that pieces which are more common are left for later, so the
likelihood that a peer which currently is offering upload will later not have
anything of interest is reduced.&lt;/p&gt;

&lt;h4 id=&quot;243-random-first-piece&quot;&gt;2.4.3 Random First Piece&lt;/h4&gt;

&lt;p&gt;An exception to rarest first is when downloading starts. At that time, the peer
has nothing to upload, so it’s important to get a complete piece as quickly as
possible. Rare pieces are generally present on only one peer, so they would be
downloaded more slowly than pieces which are present on multiple peers, for which it
is possible to download sub-pieces from different places. Until the first
complete piece is assembled, pieces to download are selected at random, and then
the strategy changes to rarest first.&lt;/p&gt;
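
&lt;p&gt;The rarest-first and random-first-piece policies can be sketched together as
below. This is an illustrative sketch under simplifying assumptions: each peer’s
advertised pieces are given as a set, &lt;code&gt;next_piece&lt;/code&gt; is a hypothetical name,
and ties between equally rare pieces are broken by the lowest index only to keep the
result deterministic.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import random
from collections import Counter


def next_piece(have, peer_pieces, rng=random):
    # have: set of piece indices this peer has completed.
    # peer_pieces: dict mapping a peer id to the set of pieces it reports.
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces - have)
    if not availability:
        return None  # no connected peer has anything we still need
    candidates = sorted(availability)
    if not have:
        # Random first piece: nothing to upload yet, so pick any piece.
        return rng.choice(candidates)
    # Rarest first: fewest holders wins; ties go to the lowest index.
    return min(candidates, key=lambda piece: (availability[piece], piece))


peers = {1: {0, 1, 2}, 2: {0, 1}, 3: {0, 3}}
choice = next_piece({0}, peers)  # pieces 2 and 3 each have one holder
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;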

&lt;h4 id=&quot;244-endgame-mode&quot;&gt;2.4.4 Endgame Mode&lt;/h4&gt;

&lt;p&gt;Close to the end of a download, a peer with very slow transfer rates may delay
the download’s completion.  To keep that from happening, once all sub-pieces which a peer
doesn’t have are actively being requested, it sends requests for all sub-pieces
to all peers. Cancels are sent for sub-pieces which arrive, to keep too much
bandwidth from being wasted on redundant sends.&lt;/p&gt;

&lt;h2 id=&quot;3-choking-algorithms&quot;&gt;3 Choking Algorithms&lt;/h2&gt;

&lt;p&gt;To cooperate, peers upload; to not cooperate, they ‘choke’ peers. Choking is a
temporary refusal to upload; it stops uploading, but downloading can still happen
and the connection doesn’t need to be renegotiated when choking stops.  A good
choking algorithm should utilize all available resources, provide reasonably
consistent download rates for everyone, and be somewhat resistant to peers only
downloading and not uploading.&lt;/p&gt;

&lt;h3 id=&quot;32-bittorrents-choking-algortihm&quot;&gt;3.2 BitTorrent’s Choking Algorithm&lt;/h3&gt;

&lt;p&gt;Each BitTorrent peer always unchokes a fixed number of other peers (the default is
four), so the issue becomes which peers to unchoke. Decisions as to which peers
to unchoke are based strictly on current download rate. Calculating the current
download rate meaningfully is a surprisingly difficult problem; the current
implementation essentially uses a rolling 20-second average.  BitTorrent peers
recalculate who they want to choke once every ten seconds, and then leave the
situation as is until the next ten-second period is up.&lt;/p&gt;
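
&lt;p&gt;A rough sketch of this decision follows. It assumes the rolling average is kept
as one byte count per second over the last twenty seconds, which is a simplification
chosen here and not necessarily how the real client measures rates. The names
&lt;code&gt;RollingRate&lt;/code&gt; and &lt;code&gt;choose_unchoked&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from collections import deque

WINDOW_SECONDS = 20  # rolling average window, as in the text
UNCHOKE_SLOTS = 4    # default number of unchoked peers, as in the text


class RollingRate:
    def __init__(self, window=WINDOW_SECONDS):
        # One bucket per second; old buckets fall off automatically.
        self.buckets = deque(maxlen=window)

    def tick(self, bytes_this_second):
        self.buckets.append(bytes_this_second)

    def rate(self):
        # Average bytes per second over the buckets currently held.
        return sum(self.buckets) / len(self.buckets) if self.buckets else 0.0


def choose_unchoked(download_rates, slots=UNCHOKE_SLOTS):
    # download_rates: dict mapping a peer id to the rolling average rate
    # at which we are currently downloading from that peer.
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    return set(ranked[:slots])


rates = {1: 50, 2: 10, 3: 70, 4: 30, 5: 90, 6: 20}
unchoked = choose_unchoked(rates)  # would be re-run once per ten-second period
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;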

&lt;h3 id=&quot;33-optimistic-unchoking&quot;&gt;3.3 Optimistic Unchoking&lt;/h3&gt;

&lt;p&gt;Simply uploading to the peers which provide the best download rate would suffer
from having no method of discovering if currently unused connections are better
than the ones being used. To fix this, at all times a BitTorrent peer has a
single ‘optimistic unchoke’ which is unchoked regardless of the current download
rate from it. Which peer is the optimistic unchoke is rotated every third
rechoke period (30 seconds).&lt;/p&gt;
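
&lt;p&gt;The rotation schedule can be illustrated with a small sketch. The text only says
that the optimistic unchoke rotates every third rechoke period; picking the next peer
in round-robin order is an assumption made here for simplicity, and the
&lt;code&gt;OptimisticUnchoke&lt;/code&gt; class is hypothetical. It expects at least one peer.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import itertools

ROTATE_EVERY = 3  # rechoke periods between rotations (30 seconds)


class OptimisticUnchoke:
    def __init__(self, peers):
        # Round-robin rotation is an assumption made for this sketch.
        self._cycle = itertools.cycle(sorted(peers))
        self._periods = 0
        self.current = next(self._cycle)

    def on_rechoke(self):
        # Called once per ten-second rechoke period.
        self._periods += 1
        if self._periods % ROTATE_EVERY == 0:
            self.current = next(self._cycle)
        return self.current


optimistic = OptimisticUnchoke([7, 8, 9])
history = [optimistic.on_rechoke() for _ in range(6)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;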

&lt;h3 id=&quot;34-anti-snubbing&quot;&gt;3.4 Anti-snubbing&lt;/h3&gt;

&lt;p&gt;Occasionally a BitTorrent peer will be choked by all peers which it was formerly
downloading from.  In such cases it will usually continue to get poor download
rates until the optimistic unchoke finds better peers. Therefore, if over a
minute goes by without getting a single piece from a particular peer, BitTorrent
assumes it is ‘snubbed’ by that peer and doesn’t upload to it except as an
optimistic unchoke.&lt;/p&gt;

&lt;h3 id=&quot;35-upload-only&quot;&gt;3.5 Upload Only&lt;/h3&gt;

&lt;p&gt;Once a peer is done downloading, it no longer has useful download rates to
decide which peers to upload to. The current implementation then switches to
preferring peers which it has better upload rates to, which does a decent job of
utilizing all available upload capacity and preferring peers which no one else
happens to be uploading to at the moment.&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="en" />
      

      

      
        <summary type="html">18 January 2012 - Lisbon</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">EMDC Courses</title>
      
      
      <link href="/2012/01/15/EMDC-Courses/" rel="alternate" type="text/html" title="EMDC Courses" />
      
      <published>2012-01-15T00:00:00+00:00</published>
      <updated>2012-01-15T00:00:00+00:00</updated>
      <id>/2012/01/15/EMDC-Courses</id>
      <content type="html" xml:base="/2012/01/15/EMDC-Courses/">&lt;p class=&quot;meta&quot;&gt;15 January 2012 - Lisbon&lt;/p&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;In the summer of 2011, after graduating from &lt;a href=&quot;https://www.metu.edu.tr&quot;&gt;METU&lt;/a&gt;
Computer Engineering, I was accepted to the European Master in Distributed
Computing (&lt;a href=&quot;https://www.ac.upc.edu/en/academics/master/master-emdc-european-master-in-distributed-computing&quot;&gt;EMDC&lt;/a&gt;), a joint Erasmus Mundus programme between &lt;a href=&quot;https://www.kth.se/&quot;&gt;KTH Royal
Institute of Technology (KTH)&lt;/a&gt; in Sweden, &lt;a href=&quot;https://www.ist.utl.pt/&quot;&gt;Instituto
Superior Técnico in Portugal (IST)&lt;/a&gt; and &lt;a href=&quot;https://www.upc.edu/en&quot;&gt;Universitat
Politècnica de Catalunya (UPC)&lt;/a&gt; in Spain.  It is a
two-year Master’s programme that includes compulsory mobility for the students.&lt;/p&gt;

&lt;p&gt;I was accepted to the IST - KTH track, which means I will study at
Instituto Superior Técnico for two semesters, spend another semester at KTH Royal
Institute of Technology, and return to Instituto Superior Técnico for the final
semester to do my thesis.&lt;/p&gt;

&lt;p&gt;After the long effort and paperwork of getting a visa, I managed to get to
Lisbon, adapt to this beautiful city and to the university, and meet great
friends.&lt;/p&gt;

&lt;p&gt;This semester (Fall 2011) at &lt;a href=&quot;https://www.ist.utl.pt/&quot;&gt;IST&lt;/a&gt; I took four
master’s courses:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Parallel and Distributed Computing&lt;/li&gt;
  &lt;li&gt;Peer-to-Peer Systems and Overlay Networks&lt;/li&gt;
  &lt;li&gt;Cloud Computing&lt;/li&gt;
  &lt;li&gt;Network and Computer Security&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these courses were challenging and interesting. I learned entirely new
technologies and concepts, and got hands-on practice through the projects.&lt;/p&gt;

&lt;h2 id=&quot;parallel-and-distributed-computing&quot;&gt;Parallel and Distributed Computing&lt;/h2&gt;

&lt;p&gt;This course was the most familiar to me, since I had taken a parallel computing course
during my Bachelor’s degree. Nevertheless, I learned new things about
distributed computing: distributed architectures, OpenMP, MPI, and designing and
implementing distributed and parallel algorithms and analyzing them using
several metrics.&lt;/p&gt;

&lt;p&gt;The course project was to implement a network slack computation using both OpenMP
(shared memory) and MPI (distributed memory).&lt;/p&gt;

&lt;h2 id=&quot;peer-to-peer-systems-and-overlay-networks&quot;&gt;Peer-to-Peer Systems and Overlay Networks&lt;/h2&gt;

&lt;p&gt;This was one of the interesting courses. We started with BitTorrent and eMule,
moved on to unstructured P2P systems such as Gnutella, and then to structured ones:
Chord, Pastry, Kademlia and CAN. The topics after that were a little more challenging: we
studied gossip protocols, one-hop protocols, distance estimation (the Vivaldi algorithm),
content distribution (Akamai), load balancing in P2P systems, etc.&lt;/p&gt;

&lt;p&gt;The course project was to design an HTTP P2P proxy.&lt;/p&gt;

&lt;h2 id=&quot;cloud-computing&quot;&gt;Cloud Computing&lt;/h2&gt;

&lt;p&gt;This course was also very interesting. We started with clusters and grids and
eventually moved on to cloud technologies: implementing a mini MapReduce using Hadoop,
and working with Amazon AWS, Google App Engine and MS Azure.&lt;/p&gt;

&lt;p&gt;The course project was to design and implement a mini web indexing system that
uses all of these cloud systems. Our project team implemented the PageRank algorithm
on Amazon EC2, stored the data in an MS Azure database and implemented the web front end on
Google App Engine, which communicated with the Azure database. Trying out and learning these
was very interesting.&lt;/p&gt;

&lt;h2 id=&quot;network-and-computer-security&quot;&gt;Network and Computer Security&lt;/h2&gt;

&lt;p&gt;This was the course I found least interesting. We learned a lot of protocols and
encryption algorithms, both symmetric and asymmetric, as well as about the
network certification organizations.&lt;/p&gt;

&lt;p&gt;Now that the semester is over and we have had the first round of exams (I hope we pass them), we
can concentrate on the next learning adventures.&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="en" />
      

      

      
        <summary type="html">15 January 2012 - Lisbon</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">Miradouro da Graça</title>
      
      
      <link href="/2012/01/10/miradouro-da-graca/" rel="alternate" type="text/html" title="Miradouro da Graça" />
      
      <published>2012-01-10T00:00:00+00:00</published>
      <updated>2012-01-10T00:00:00+00:00</updated>
      <id>/2012/01/10/miradouro-da-graca</id>
      <content type="html" xml:base="/2012/01/10/miradouro-da-graca/">&lt;p class=&quot;meta&quot;&gt;10 January 2012 - Lisbon&lt;/p&gt;

&lt;p&gt;The weather in Lisbon is mostly mild. We are in the middle of winter, yet so far there has been
neither snow nor cold weather.  Apart from the occasional rain, it is mostly sunny and
mild.&lt;/p&gt;

&lt;p&gt;Today one of my friends said there was a nice place nearby for having tea or reading a book,
and we went there to study for an exam.  I know
you are laughing, but I am glad I went there instead of sitting at home studying; your head
gets at least a little rest, and you step out of your comfort zone too.&lt;/p&gt;

&lt;p&gt;Miradouro da Graça translates as the Graça viewpoint (locals call it a miradouro),
that is, a place from which you can look out over part of the city. Lisbon sits on
very hilly terrain, and there are several hills from which to view the city.
One of them is Miradouro da Graça, a spot where tourists, students and many
others stop by to relax. There is also a small café for tea, toast, and other
food and drink.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/graca/01.JPG&quot; alt=&quot;photo 1&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/graca/02.JPG&quot; alt=&quot;photo 2&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/graca/03.JPG&quot; alt=&quot;photo 3&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/graca/04.JPG&quot; alt=&quot;photo 4&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/files/graca/05.JPG&quot; alt=&quot;photo 5&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After that I went there several more times (to study for exams :D).&lt;/p&gt;</content>

      
      
      
      
      

      

      
        <category term="tk" />
      

      

      
        <summary type="html">10 January 2012 - Lisbon</summary>
      

      
      
    </entry>
  
  
</feed>
