<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://kerrickstaley.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://kerrickstaley.com/" rel="alternate" type="text/html" /><updated>2026-02-26T08:16:19+00:00</updated><id>https://kerrickstaley.com/feed.xml</id><title type="html">Kerrick Staley</title><subtitle>Kerrick Staley&apos;s blog and homepage, serving up juicy posts about programming, Mandarin, learning (machine and human), and more!
</subtitle><entry><title type="html">Can frontier LLMs solve CAD tasks?</title><link href="https://kerrickstaley.com/2026/02/22/can-frontier-llms-solve-cad-tasks" rel="alternate" type="text/html" title="Can frontier LLMs solve CAD tasks?" /><published>2026-02-22T00:00:00+00:00</published><updated>2026-02-22T00:00:00+00:00</updated><id>https://kerrickstaley.com/2026/02/22/can-frontier-llms-solve-cad-tasks</id><content type="html" xml:base="https://kerrickstaley.com/2026/02/22/can-frontier-llms-solve-cad-tasks"><![CDATA[<p>Frontier LLMs like GPT-5.3-Codex, Gemini 3.1 Pro, and Claude Opus 4.6 have <a href="https://www.robinlinacre.com/llms_in_2025/">spiky</a> capabilities, performing several stddevs above median human performance at <a href="https://x.com/alexwei_/status/1946477742855532918">some tasks</a> while failing at some <a href="https://www.reddit.com/r/singularity/comments/1r2ndfz/the_car_wash_test_a_new_and_simple_benchmark_for/">“easy”</a> ones.</p>

<p>LLM pretraining datamixes often emphasize general knowledge, reasoning, and coding. The human “training mix” includes far more samples of visual/spatial/motor tasks, which come about naturally in the embodied human experience.</p>

<p>World models like Sora and Genie are pretrained on video and 3D video game data and excel at predicting the behavior of the real world. But no current model is at the frontier of both reasoning/coding and spatial reasoning/world modeling.</p>

<p>We’d expect (and it seems empirically true) that LLMs trained primarily on text are worse than humans on visual/spatial tasks. Computer-aided design (CAD) tasks require strong 3D reasoning ability as well as common-sense world knowledge, so LLMs might struggle with these tasks.</p>

<h2 id="the-experiment">The experiment</h2>

<p>I started with a practical CAD task I wanted done: designing a 3D-printable wall mount for my bike pump. Could some LLM do this task for me?</p>

<p>Current models can’t use graphical CAD programs like <a href="https://www.freecad.org/">FreeCAD</a>, but they’re great at writing code, so I had the models use <a href="https://openscad.org/">OpenSCAD</a>. Here’s the prompt:</p>

<blockquote>
  <p>Design a wall mount for this Lezyne Steel Floor Drive bike pump that I can 3D print. […] It should hold the bike pump by the handle, so that the bike pump hangs with the dial facing outward. It should hold the pump far enough away from the wall that the valve (which sticks out from the bottom of the pump) doesn’t touch the wall. Orient and position the design so that the wall is the YZ plane, and the mount protrudes into the positive X direction and is symmetric about the XZ plane. […]</p>

  <p>Implement your design in openscad. […]
Keep iterating on your design using the provided tool(s) until your most recent mujoco_mount_sim call returns *ONLY* the status “object_held” and *NO OTHER STATUSES*.
If you get any other status, it means your design was not successful.
Before each call to mujoco_mount_sim, write 1-3 sentences about how your design will work and/or how you will fix the issue(s) with previous versions of your design.
[…]
<br /><br />
<img src="/images/can-frontier-llms-solve-cad-tasks/pump1.jpeg" width="150px" />
<img src="/images/can-frontier-llms-solve-cad-tasks/pump2.jpeg" width="150px" />
<img src="/images/can-frontier-llms-solve-cad-tasks/pump3.jpeg" width="200px" />
<img src="/images/can-frontier-llms-solve-cad-tasks/pump4.jpeg" width="200px" /></p>
</blockquote>

<p>To test the designs, I made a 3D scan of the bike pump using <a href="https://lumalabs.ai/interactive-scenes">Luma AI</a> and created a simulation using <a href="https://mujoco.org/">MuJoCo</a> to check whether the mount holds the pump.</p>

<p style="display: flex; justify-content: center">
<img src="/images/can-frontier-llms-solve-cad-tasks/simulator.png" width="300px" />
</p>

<p>I put each model in an agentic loop where it could call the simulator up to 10 times. If the mount held the pump (i.e. the pump was touching the mount after 5 seconds) then the design passed.</p>
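<p>The loop described above can be sketched roughly like this (<code class="language-plaintext highlighter-rouge">ask_model</code> and <code class="language-plaintext highlighter-rouge">run_simulation</code> are hypothetical stand-ins for the harness's actual model call and simulator tool, not the project's real code):</p>

```python
# Minimal sketch of the agentic evaluation loop. ask_model and
# run_simulation are hypothetical stand-ins: ask_model returns OpenSCAD
# source given the transcript so far, run_simulation returns the set of
# statuses from the MuJoCo check.
MAX_TURNS = 10

def run_agent_loop(ask_model, run_simulation):
    """Let the model iterate on its design until the sim passes or turns run out."""
    transcript = []
    for turn in range(MAX_TURNS):
        design = ask_model(transcript)      # model emits OpenSCAD source
        statuses = run_simulation(design)   # e.g. {"object_held"} or {"object_fell"}
        transcript.append((design, statuses))
        if statuses == {"object_held"}:     # pass only if NO other statuses
            return True, turn + 1
    return False, MAX_TURNS
```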

<p>I also tried two other objects: a model of a pan from <a href="https://amazon-berkeley-objects.s3.amazonaws.com/index.html">Amazon Berkeley Objects</a> and a mug from <a href="https://research.google/blog/scanned-objects-by-google-research-a-dataset-of-3d-scanned-common-household-items/">Google Scanned Objects</a>. I evaluated 7 LLMs and did 10 trials per (LLM x object) pair.</p>

<p><a href="https://github.com/kerrickstaley/llm-cad-mount">Code for this project is here.</a></p>

<h2 id="results-and-commentary">Results and commentary</h2>

<table>
  <thead>
    <tr>
      <th rowspan="2">Model</th>
      <th colspan="3">Pass rate (out of 10)</th>
      <th colspan="3">Examples</th>
      <th rowspan="2">Total cost</th>
      <th rowspan="2">Total time</th>
    </tr>
    <tr>
      <th>Pump</th>
      <th>Mug</th>
      <th>Pan</th>
      <th>Pump</th>
      <th>Mug</th>
      <th>Pan</th>
    </tr>
  </thead>
  <tbody>
    <tr><td>Claude Opus 4.6</td><td>10</td><td>10</td><td>10</td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__anthropic_claude-opus-4.6__2026-02-22T20:17:39Z.html">pass</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__anthropic_claude-opus-4.6__2026-02-22T19:39:44Z.html">pass</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__anthropic_claude-opus-4.6__2026-02-22T18:49:52Z.html">pass</a></td><td>$41.11</td><td>5.2h</td></tr>
    <tr><td>Gemini 3 Flash</td><td>6</td><td>4</td><td>5</td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__google_gemini-3-flash-preview__2026-02-22T10:30:05Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__google_gemini-3-flash-preview_2026-02-22T05:17:38Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__google_gemini-3-flash-preview__2026-02-22T19:42:52Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__google_gemini-3-flash-preview_2026-02-22T07:32:58Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__google_gemini-3-flash-preview__2026-02-22T10:20:12Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__google_gemini-3-flash-preview_2026-02-22T05:07:31Z.html">fail</a></td><td>$4.01</td><td>3.7h</td></tr>
    <tr><td>Gemini 3.1 Pro</td><td>5</td><td>6</td><td>4</td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__google_gemini-3.1-pro-preview_2026-02-22T03:10:15Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__google_gemini-3.1-pro-preview_2026-02-22T05:32:20Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__google_gemini-3.1-pro-preview_2026-02-22T05:24:05Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__google_gemini-3.1-pro-preview_2026-02-22T04:09:30Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__google_gemini-3.1-pro-preview__2026-02-22T08:02:26Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__google_gemini-3.1-pro-preview__2026-02-22T18:50:24Z.html">fail</a></td><td>$7.06</td><td>3.0h</td></tr>
    <tr><td>GLM-4.6V</td><td>1</td><td>0</td><td>1</td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__z-ai_glm-4.6v_2026-02-22T02:09:21Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__z-ai_glm-4.6v__2026-02-22T09:40:26Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__z-ai_glm-4.6v_2026-02-22T06:24:42Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__z-ai_glm-4.6v_2026-02-22T03:51:36Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__z-ai_glm-4.6v__2026-02-22T18:55:36Z.html">fail</a></td><td>$1.49</td><td>6.3h</td></tr>
    <tr><td>GPT-5.2</td><td>8</td><td>9</td><td>9</td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__openai_gpt-5.2__2026-02-22T23:52:08Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__openai_gpt-5.2__2026-02-22T14:07:18Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__openai_gpt-5.2__2026-02-22T23:31:19Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__openai_gpt-5.2__2026-02-22T14:34:17Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__openai_gpt-5.2__2026-02-23T00:11:53Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__openai_gpt-5.2__2026-02-22T11:18:54Z.html">fail</a></td><td>$12.15</td><td>7.7h</td></tr>
    <tr><td>Kimi K2.5</td><td>4</td><td>2</td><td>0</td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__moonshotai_kimi-k2.5__2026-02-22T20:34:45Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__moonshotai_kimi-k2.5_2026-02-22T07:13:46Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__moonshotai_kimi-k2.5_2026-02-22T06:59:07Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__moonshotai_kimi-k2.5_2026-02-22T06:56:15Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__moonshotai_kimi-k2.5_2026-02-22T06:48:23Z.html">fail</a></td><td>$3.39</td><td>8.5h</td></tr>
    <tr><td>Qwen 3.5 397B</td><td>2</td><td>1</td><td>0</td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__qwen_qwen3.5-397b-a17b_2026-02-22T05:24:18Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__qwen_qwen3.5-397b-a17b__2026-02-22T20:39:23Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__qwen_qwen3.5-397b-a17b_2026-02-21T23:24:21Z.html">pass</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__qwen_qwen3.5-397b-a17b__2026-02-22T23:43:12Z.html">fail</a></td><td><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__qwen_qwen3.5-397b-a17b__2026-02-22T09:12:19Z.html">fail</a></td><td>$2.64</td><td>5.6h</td></tr>
  </tbody>
</table>
<p><em><a href="https://kerrickstaley.com/ai-cad-design-mount-viz/">See all results here</a></em></p>

<p>Claude Opus 4.6 is best at this task. In the table I only evaluate whether the mount held the object, and Claude gets perfect marks. Subjectively, most of its designs are not directly usable but almost all are <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__anthropic_claude-opus-4.6_2026-02-22T00:51:05Z.html">close</a>. They are sometimes <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__anthropic_claude-opus-4.6_2026-02-21T22:53:07Z.html">too large</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__anthropic_claude-opus-4.6__2026-02-22T19:01:07Z.html">too small</a>, would be <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__anthropic_claude-opus-4.6_2026-02-21T18:34:17Z.html">too weak if 3D-printed in plastic</a>, or are <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__anthropic_claude-opus-4.6__2026-02-22T18:41:02Z.html">random shapes that coincidentally work</a>. This capability seems new; I did a smaller run with Claude Opus 4.1 and it failed 100% of the time.</p>

<p>GPT-5.2 has a good pass rate but its designs are subjectively quite bad and almost all would need to be completely reworked. Most have <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__openai_gpt-5.2__2026-02-22T13:04:53Z.html">redundant parts</a>, and they are often <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__openai_gpt-5.2__2026-02-22T13:31:52Z.html">too weak</a> or have <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__openai_gpt-5.2__2026-02-22T09:50:03Z.html">“floating” components</a> that are physically impossible (I could have checked for this but wanted to avoid scope creep).</p>

<p>Gemini 3.1 Pro and 3 Flash sometimes produce <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__google_gemini-3.1-pro-preview_2026-02-21T14:20:43Z.html">great designs</a>. For example, here is <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__google_gemini-3-flash-preview_2026-02-21T19:45:56Z.html">Flash one-shotting a usable design for 2.5 cents</a>. However, these models often <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__google_gemini-3-flash-preview_2026-02-22T03:02:11Z.html">end up in loops</a> or <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__google_gemini-3.1-pro-preview_2026-02-22T00:25:35Z.html">fail to make use of all 10 turns</a>. Other times they produce <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__google_gemini-3-flash-preview__2026-02-22T10:28:02Z.html">garbled designs similar to GPT-5.2</a>. They often act erratically, producing <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/abo__pan__google_gemini-3.1-pro-preview_2026-02-21T18:34:42Z.html">random words in their commentary</a>. Pro and Flash perform and behave very similarly.</p>

<p>All the open-weight models do poorly. Even in cases where they technically hold the object, their designs are subjectively quite bad: they are often <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__qwen_qwen3.5-397b-a17b_2026-02-21T23:24:21Z.html">overly simplistic</a>, <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__z-ai_glm-4.6v_2026-02-22T02:09:21Z.html">only work by accident</a>, or have <a href="https://kerrickstaley.com/ai-cad-design-mount-viz/task_002_mounting_bike_pump__moonshotai_kimi-k2.5_2026-02-22T02:49:05Z.html">floating parts</a>. However, Kimi K2.5 is noticeably closer to producing usable designs than the other two.</p>

<h2 id="under-the-hood">Under the hood</h2>

<p>Creating the simulator and building the agentic harness was the bulk of the work on this project. MuJoCo is complex and powerful and often has <a href="https://github.com/google-deepmind/mujoco/discussions/3000">surprising behaviors</a>. LLMs often make mistakes when calling tools and I had to carefully validate the simulator input to distinguish tool call failures from legitimate bugs in my code.</p>

<p>One surprising bottleneck was convex decomposition. MuJoCo can only simulate objects composed of convex components, and so concave geometries have to be broken down into multiple convex geoms. The SOTA method for this is <a href="https://github.com/SarahWeiii/CoACD">CoACD</a>, and it’s quite slow. Generating the above table took 15.9 hours of CoACD processing time on my potato-class Hetzner server (almost as long as the 21.8 hours spent calling the model providers).</p>

<h2 id="future-work">Future work</h2>

<p>I built a simple custom agent harness for this, but it’s possible I could get better results by using an off-the-shelf harness like Codex or Claude Code and turning my MuJoCo simulator into a CLI or MCP tool. An off-the-shelf harness could provide a better system prompt and tools like memory to help keep the agent on track.</p>

<p>Including more objects would make this into a better, more realistic eval. Amazon Berkeley Objects and Google Scanned Objects have ~8k and ~1k 3D models respectively, and although some are irrelevant (e.g. couches), the set of objects could be expanded without much effort.</p>

<p>The biggest thing that could be improved is the grading of results, by checking many aspects of each design and scoring them on a rubric. Here’s a non-exhaustive list of additional things that could be checked:</p>

<ul>
  <li>Does the mount have “floating” parts or multiple parts that would have to separately be attached to the wall?</li>
  <li>Does the mount still hold the object if the object is perturbed?</li>
  <li>Can the object be easily lifted off the mount? (Try moving the object along several reasonable trajectories and see if the object hits the mount).</li>
  <li>Can the object be easily grabbed while in the mount? (Define an exclusion zone around the point where one would grab the object and see if it intersects the mount).</li>
  <li>Does the mount have a big enough contact patch with the wall?</li>
  <li>Does the mount intersect the wall?</li>
  <li>How much material does the mount use? (Actually slice it with PrusaSlicer and check the estimated filament usage).</li>
  <li>Does the mount fit in the build volume of a typical 3D printer?</li>
  <li>Are there thin sections of the model which would be weak when printed?</li>
  <li>How much weight does the model hold before deforming, using a <a href="https://en.wikipedia.org/wiki/Finite_element_method">finite element analysis</a>?</li>
  <li>Can the screw / nail holes in the mount be accessed by a screwdriver / hammer? (Define exclusion zones around the holes and see if they intersect the mount).</li>
</ul>
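<p>Checks like these could be aggregated into a weighted rubric score; here's a minimal sketch (the check names and weights are invented for illustration, not part of the project):</p>

```python
# Hypothetical rubric scorer: each named check is a boolean result with a
# weight. The specific checks and weights here are illustrative only.
RUBRIC = {
    "no_floating_parts": 3.0,
    "holds_when_perturbed": 3.0,
    "object_removable": 2.0,
    "fits_build_volume": 1.0,
}

def score_design(check_results: dict) -> float:
    """Return the weighted fraction of rubric checks passed, in [0, 1]."""
    total = sum(RUBRIC.values())
    earned = sum(w for name, w in RUBRIC.items() if check_results.get(name))
    return earned / total
```

A real grader would populate <code class="language-plaintext highlighter-rouge">check_results</code> from simulation runs (perturbation, removal trajectories) and slicer output (filament usage, thin walls).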

<h2 id="related-stuff">Related stuff</h2>

<ul>
  <li><a href="https://arxiv.org/pdf/2505.14646">CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation</a></li>
  <li><a href="https://storygold.com/blog/gpt-5_on_3d">How good is GPT-5 at 3D?</a></li>
</ul>]]></content><author><name></name></author><category term="ai" /><category term="3d-printing" /><summary type="html"><![CDATA[Frontier LLMs like GPT-5.3-Codex, Gemini 3.1 Pro, and Claude Opus 4.6 have spiky capabilities, performing several stddevs above median human performance at some tasks while failing at some “easy” ones.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://kerrickstaley.com/images/can-frontier-llms-solve-cad-tasks/simulator.png" /><media:content medium="image" url="https://kerrickstaley.com/images/can-frontier-llms-solve-cad-tasks/simulator.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Law of Total Variance Is Counterintuitive</title><link href="https://kerrickstaley.com/2026/01/21/the-law-of-total-variance-is-counterintuitive" rel="alternate" type="text/html" title="The Law of Total Variance Is Counterintuitive" /><published>2026-01-21T00:00:00+00:00</published><updated>2026-01-21T00:00:00+00:00</updated><id>https://kerrickstaley.com/2026/01/21/the-law-of-total-variance-is-counterintuitive</id><content type="html" xml:base="https://kerrickstaley.com/2026/01/21/the-law-of-total-variance-is-counterintuitive"><![CDATA[<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>

<p>The <a href="https://en.wikipedia.org/wiki/Law_of_total_variance">law of total variance</a> states that for two random variables \(X\) and \(Y\),</p>

\[\mathrm{Var}(Y) = E[\mathrm{Var}(Y | X)] + \mathrm{Var}(E[Y | X])\]

<p>If \(Y\) is a <a href="https://en.wikipedia.org/wiki/Mixture_distribution">mixture</a> of distributions, this tells you that \(\mathrm{Var}(Y)\) is the mean variance of the components plus the variance of the means of the components (\(X\) in this case is a discrete variable indicating which component you selected).</p>

<p>At first blush this makes sense: jumping from \(\bar{Y}\) to \(\mu_i\) (the mean of component \(i\)) gives you some variance, and then jumping from \(\mu_i\) to your sample gives you some additional variance. But! These are <em>dependent</em> jumps, so you can’t naively add the variances.</p>

<h3 id="some-pretty-graphs">Some pretty graphs</h3>

<p>Let’s concretely say we have an equal mixture of 4 Gaussians with means -8, -2, 2, and 8. If the 2 outer Gaussians have variance 9 and the inner Gaussians have variance 1, you get this distribution:</p>

<p style="display: flex; justify-content: center">
<img src="/images/law-of-total-variance/high-var-at-edges.png" width="600px" />
</p>

<p>Compare this to a version where the inner Gaussians have high variance:</p>

<p style="display: flex; justify-content: center">
<img src="/images/law-of-total-variance/high-var-in-middle.png" width="600px" />
</p>

<p>The law of total variance says these two distributions have exactly the same overall variance (which works out to 39). That’s surprising to me! I would intuitively expect the first graph to have higher variance because the wider Gaussians on the edges lead to more extreme values.</p>
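<p>The shared value of 39 follows from plugging the component means and variances into the law of total variance; a quick numeric check (equal mixture weights, as above):</p>

```python
# Law of total variance for an equal mixture: total variance =
# mean of component variances + variance of component means.
def mixture_variance(means, variances):
    n = len(means)
    mean_of_vars = sum(variances) / n
    grand_mean = sum(means) / n
    var_of_means = sum((m - grand_mean) ** 2 for m in means) / n
    return mean_of_vars + var_of_means

# Wide Gaussians at the edges vs. in the middle: both come out to 39,
# because swapping the variances doesn't change their mean (5) and the
# variance of the means (34) is untouched.
assert mixture_variance([-8, -2, 2, 8], [9, 1, 1, 9]) == 39
assert mixture_variance([-8, -2, 2, 8], [1, 9, 9, 1]) == 39
```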

<h3 id="a-discrete-version">A discrete version</h3>
<p>Let’s try to understand what’s happening by replacing these Gaussians with <a href="https://en.wikipedia.org/wiki/Bernoulli_distribution">Bernoulli distributions</a> <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> with the same means and variances. For example, a Gaussian with mean 2 and variance 9 becomes a coin flip where heads is 5 and tails is -1.</p>

<table>
  <tr>
    <th></th>
    <th colspan="2">scenario 1</th>
    <th colspan="2">scenario 2</th>
  </tr>
  <tr>
    <th>$$\mu_i$$</th>
    <th>$$\sigma_i$$</th>
    <th>samples</th>
    <th>$$\sigma_i$$</th>
    <th>samples</th>
  </tr>
  <tr><td>-8</td><td>3</td><td>-11, -5</td><td>1</td><td>-9, -7</td></tr>
  <tr><td>-2</td><td>1</td><td>-3, -1</td><td>3</td><td>-5, 1</td></tr>
  <tr><td>2</td><td>1</td><td>1, 3</td><td>3</td><td>-1, 5</td></tr>
  <tr><td>8</td><td>3</td><td>5, 11</td><td>1</td><td>7, 9</td></tr>
</table>

<p>The variances of the two lists [-11, -5, -3, -1, 1, 3, 5, 11] and [-9, -7, -5, -1, 1, 5, 7, 9] are indeed both 39 (both lists are symmetric about 0, so the variance is just the mean of the squares, 312/8). Cool I guess? But it’s not really illuminating.</p>
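<p>You can verify this mechanically (population variance, since the eight samples are the entire distribution):</p>

```python
from statistics import pvariance

scenario1 = [-11, -5, -3, -1, 1, 3, 5, 11]  # wide pairs at the edges
scenario2 = [-9, -7, -5, -1, 1, 5, 7, 9]    # wide pairs in the middle

# Population variance, since these 8 samples ARE the whole distribution.
assert pvariance(scenario1) == pvariance(scenario2) == 39
```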

<p>Instead let’s sic ✨algebra✨ on the problem. We’ll zoom in on one of these Bernoulli distributions, which is a coin flip between \(\mu_i - \sigma_i\) and \(\mu_i + \sigma_i\), and add up the two outcomes’ contributions to the variance (here \(N=8\) is the total number of samples):</p>

\[\frac{(\mu_i - \sigma_i)^2}{N} + \frac{(\mu_i + \sigma_i)^2}{N} = 2\frac{\mu_i^2 + \sigma_i^2}{N}\]

<p>So on average, each sample contributes \(\frac{\mu_i^2 + \sigma_i^2}{N}\) to the total variance. That’s what the law of total variance says! It works, at least in this simple case.</p>

<h3 id="ok-so-why-does-it-actually-work">OK so why does it actually work</h3>
<p>\(\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)\) if \(X\) and \(Y\) are <em>uncorrelated</em>. They don’t necessarily need to be independent (which is a stronger condition). You can see this from the more general formula</p>

\[\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y)\]

<p>which holds for all \(X\) and \(Y\). Here, no matter which \(X=\mu_i\) we pick, the distribution of \((Y \vert X = \mu_i) - \mu_i\) is zero-mean by definition; a variable whose conditional mean is zero for every value of \(X\) has zero covariance with \(X\). So the two jumps are uncorrelated even though they are dependent, and the variances add.</p>
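<p>The eight-sample scenario from the table makes this concrete: the size of the within-component jump depends on which component was chosen (so the jumps are dependent), yet their covariance is exactly zero:</p>

```python
from statistics import pvariance, fmean

# Scenario 1: each sample is mu_i (the first jump) plus a within-component
# deviation (the second jump). The deviation's magnitude depends on mu_i,
# so the jumps are dependent -- but their covariance is zero, so the
# variances add anyway.
mus  = [-8, -8, -2, -2, 2, 2, 8, 8]
devs = [-3,  3, -1,  1, -1, 1, -3, 3]
samples = [m + d for m, d in zip(mus, devs)]

cov = fmean(m * d for m, d in zip(mus, devs)) - fmean(mus) * fmean(devs)
assert cov == 0
assert pvariance(samples) == pvariance(mus) + pvariance(devs) == 39
```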

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Technically two-point distributions, but eh. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="stats" /><category term="math" /><summary type="html"><![CDATA[]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://kerrickstaley.com/images/law-of-total-variance/high-var-in-middle.png" /><media:content medium="image" url="https://kerrickstaley.com/images/law-of-total-variance/high-var-in-middle.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Money Myth</title><link href="https://kerrickstaley.com/2025/05/08/the-money-myth" rel="alternate" type="text/html" title="The Money Myth" /><published>2025-05-08T00:00:00+00:00</published><updated>2025-05-08T00:00:00+00:00</updated><id>https://kerrickstaley.com/2025/05/08/the-money-myth</id><content type="html" xml:base="https://kerrickstaley.com/2025/05/08/the-money-myth"><![CDATA[<p>Matt Levine <a href="https://www.bloomberg.com/opinion/articles/2022-02-28/russia-s-money-is-gone">once wrote</a>:</p>

<blockquote>
  <p>money is a social construct, a way to keep track of what society thinks you deserve in terms of goods and services</p>
</blockquote>

<p>That is: Bob makes me a sandwich. I give Bob $5. In doing so, I proclaim to the world that Bob’s sandwich has increased my happiness by five units. The whole of society is indebted to Bob for the Good he has done.</p>

<p>Bob then goes to Amazon and buys a pair of socks with the $5. Society repays his debt by giving him a pair of socks that make him five units happier. The money cycle continues: Jeff Bezos has done $5 of Good, and Jeff calls in the favor by buying (part of) a G700, etc.</p>

<p>A challenge to our modern free market system is that <em>people do not believe this</em>. People do not believe that the money you have reflects how much goodwill you deserve. For good reason! Wealth is often earned in ways that have minimal or negative effects on global utility. Wealth is often inherited by heirs who did nothing to earn it. Even a self-made fortune through honest work is highly random and path-dependent.</p>

<p>That money equals social good is a <em>myth</em> of the free market system. A myth in the sense that it is a meme, which, though not entirely true, reflects the ethos of the system.</p>

<p>If you believe that free market capitalism is the least-bad economic system and want it to succeed (and I do <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>), it is imperative to continually push for the myth to be more true. That means taxing externalities, raising inheritance taxes, and closing loopholes and regulatory arbitrage. <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Caveat 1: In groups <a href="https://en.wikipedia.org/wiki/Dunbar%27s_number">smaller than 150</a>, you can mentally track your mutual social balance with everyone, and money is cumbersome. But you need a more rigorous accounting system to scale it to 8 billion people. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Caveat 2: In a post-ASI world where humans produce negligible economic output, it is unclear that free market capitalism is the best system, because wealth will be frozen at pre-singularity levels. But this is a topic for another essay :) <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Caveat 3: I think society should owe you a balance of goodwill merely for existing. That is, we should have <a href="https://en.wikipedia.org/wiki/Universal_basic_income">UBI</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Saving Capitalism by Robert Reich is a great read that goes further on this topic. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="policy" /><category term="economics" /><summary type="html"><![CDATA[Matt Levine once wrote:]]></summary></entry><entry><title type="html">Drug Dilution</title><link href="https://kerrickstaley.com/2025/01/04/drug-dilution" rel="alternate" type="text/html" title="Drug Dilution" /><published>2025-01-04T00:00:00+00:00</published><updated>2025-01-04T00:00:00+00:00</updated><id>https://kerrickstaley.com/2025/01/04/drug-dilution</id><content type="html" xml:base="https://kerrickstaley.com/2025/01/04/drug-dilution"><![CDATA[<p>Caffeine and alcohol are the most commonly consumed psychoactive drugs in the world. They are most often consumed as beverages, which has a few benefits:</p>
<ul>
  <li><em>dosing</em>: It’s easy to judge how much you have consumed.</li>
  <li><em>moderation</em>: It’s physically difficult to consume a lethal dose.</li>
  <li><em>hydration</em>: Users consume water alongside the drugs.</li>
</ul>

<p>I’d argue that American society would have a healthier relationship with most recreational drugs if they were legal but could only be sold diluted in drinks. This approach could work for popular but currently scheduled drugs like MDMA, LSD, psilocybin, mescaline, ketamine, cocaine, amphetamine, heroin<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, and morphine<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, and could be an alternative delivery route for legal drugs like nicotine and THC.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>

<p>If legalized in this fashion, I’d propose the “beer standard” for intoxicating drugs. Drugs like LSD and psilocybin would be required to be diluted such that one 12 oz drink produces a subjectively similar level of intoxication for the average user as one 12 oz 5% ABV beer. For stimulant drugs like cocaine and amphetamine the standard would be similar to weak coffee.</p>

<p>Dosing is tricky for many recreational drugs because milligram/microgram-scale effective doses can’t be accurately measured without special tools. Illicit drugs generally aren’t sold in standard packaging listing the drug(s) and dosage. This contributes to the <a href="https://nida.nih.gov/research-topics/trends-statistics/overdose-death-rates">over 100,000 US drug overdose deaths per year</a>.</p>

<p><a href="https://en.wikipedia.org/wiki/Coca_tea">Coca tea</a>, <a href="https://en.wikipedia.org/wiki/Poppy_tea">poppy tea</a>, <a href="https://en.wikipedia.org/wiki/Mushroom_tea">mushroom tea</a>, and <a href="https://en.wikipedia.org/wiki/Cannabis_edible#Drink">cannabis drinks</a> are existing incarnations of this idea.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>As a salt. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>When emulsified. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="policy" /><category term="drugs" /><summary type="html"><![CDATA[Caffeine and alcohol are the most commonly consumed psychoactive drugs in the world. They are most often consumed as beverages, which has a few benefits: dosing: It’s easy to judge how much you have consumed. moderation: It’s physically difficult to consume a lethal dose. hydration: Users consume water alongside the drugs.]]></summary></entry><entry><title type="html">100 replicas</title><link href="https://kerrickstaley.com/2025/01/03/100-replicas" rel="alternate" type="text/html" title="100 replicas" /><published>2025-01-03T00:00:00+00:00</published><updated>2025-01-03T00:00:00+00:00</updated><id>https://kerrickstaley.com/2025/01/03/100-replicas</id><content type="html" xml:base="https://kerrickstaley.com/2025/01/03/100-replicas"><![CDATA[<p>I have a thought experiment I do when I’m uncertain about a life decision: I imagine I have 100 replicas of myself that I could send off into the world to explore the branching life paths I could take. Where would these 100 replicas live? What vocations or lifestyles would they pursue?</p>

<p>For instance, I recently accepted an offer for a team at <a href="https://www.anthropic.com/">Anthropic</a> that is distributed across San Francisco, New York, and Seattle, and I have the flexibility to live and work in any of those cities. Where should I choose to live? San Francisco is the center of mass of Anthropic and has great nature and weather, New York is where I and most of my friends currently are, and Seattle also has great nature and is lower-tax/lower-cost-of-living.</p>

<p>If I had a “portfolio” of 100 of me and I could allocate them to these different strategies, I’d probably have 45 move to San Francisco, 35 move to Seattle, 15 stay in New York, and 5 renege on the Anthropic offer to start a startup, pursue a PhD, travel the world, or make art. These numbers are from my gut, but what I’m trying to maximize is the “return” in the sense of expected happiness for the whole portfolio.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>

<p>I only have one of me, so it makes most sense to do the modal thing, and I’m in fact planning to move to San Francisco in a few months. (The 45 replicas in San Francisco would in turn make different sub-decisions about which neighborhood to live in, whom to live with, etc.)</p>

<p>This exercise helps you think through decisions, and it also acknowledges the paths you didn’t take, which helps with FOMO (at least for me). Yes, one of the hundred would move abroad and pursue a singing career, but you can take only one route, and generally the mode is the best.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>There is a way to put this into more precise mathematical terms but I struggled to write it up succinctly. If there is non-zero demand I will do a follow-on post :) <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="decision-theory" /><category term="philosophy" /><category term="economics" /><summary type="html"><![CDATA[I have a thought experiment I do when I’m uncertain about a life decision: I imagine I have 100 replicas of myself that I could send off into the world to explore the branching life paths I could take. Where would these 100 replicas live? What vocations or lifestyles would they pursue?]]></summary></entry><entry><title type="html">homegit</title><link href="https://kerrickstaley.com/2023/11/24/homegit" rel="alternate" type="text/html" title="homegit" /><published>2023-11-24T00:00:00+00:00</published><updated>2023-11-24T00:00:00+00:00</updated><id>https://kerrickstaley.com/2023/11/24/homegit</id><content type="html" xml:base="https://kerrickstaley.com/2023/11/24/homegit"><![CDATA[<p>I use a system I call <strong>homegit</strong> to manage config files and scripts in my home directory on all my machines. The idea is simple: a Git repository rooted at <code class="language-plaintext highlighter-rouge">~</code> that <a href="https://github.com/kerrickstaley/homedir">I push to GitHub</a>. I’ve used this system for 6 years and like it a lot. See below to set it up for yourself!</p>

<p>The core functionality comes from 3 lines in my <code class="language-plaintext highlighter-rouge">~/.bashrc</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>homegit() {
    git --git-dir=$HOME/.homegit --work-tree=$HOME "$@"
}
</code></pre></div></div>

<p>This defines a command <code class="language-plaintext highlighter-rouge">homegit</code> that works like regular <code class="language-plaintext highlighter-rouge">git</code> except that it saves its state in <code class="language-plaintext highlighter-rouge">~/.homegit</code> (as opposed to <code class="language-plaintext highlighter-rouge">~/.git</code>). Having a special command helps avoid accidents, like thinking I’m in a project repo, running <code class="language-plaintext highlighter-rouge">git clean -dfx</code>, and deleting everything in my home. I also use <code class="language-plaintext highlighter-rouge">git_prompt_info</code> in my Zsh <code class="language-plaintext highlighter-rouge">PS1</code> to show my current Git branch, and because the repo’s state lives in <code class="language-plaintext highlighter-rouge">~/.homegit</code> rather than <code class="language-plaintext highlighter-rouge">~/.git</code>, my prompt doesn’t show branch information everywhere under my home directory.</p>

<p>I also define this command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>homegit-private() {
    git --git-dir=$HOME/.homegit-private --work-tree=$HOME "$@"
}
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">homegit-private</code> works just like <code class="language-plaintext highlighter-rouge">homegit</code>, with the same Git toplevel dir (!). The difference is that I push the contents to a private repo. This allows me to version things like my SSH config and keep them in sync across devices. It’s also where I put anything that isn’t the same between my home and work setups.</p>

<p>When I want to pull my dotfiles onto a new device, I run these commands:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone git@github.com:kerrickstaley/homedir
mv homedir/.git ~/.homegit
git --git-dir=$HOME/.homegit diff  # check that the next command won't overwrite anything
git --git-dir=$HOME/.homegit reset --hard HEAD
</code></pre></div></div>

<p>The last piece that helps with all this is the <a href="https://github.com/kerrickstaley/homedir/blob/main/bin/runningon"><code class="language-plaintext highlighter-rouge">runningon</code> command</a>. This allows me to put conditional blocks in my rc files like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Default to GNU binaries on macOS.
if runningon macos; then
    export PATH="/opt/homebrew/opt/coreutils/libexec/gnubin:$PATH"  # coreutils
    export PATH="/opt/homebrew/opt/grep/libexec/gnubin:$PATH"       # grep
    export PATH="/opt/homebrew/opt/gnu-sed/libexec/gnubin:$PATH"    # sed
    export PATH="/opt/homebrew/opt/gnu-tar/libexec/gnubin:$PATH"    # tar
fi
</code></pre></div></div>
<p>That way, I can use the same rc files on both macOS and Linux, at both home and work.</p>
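I haven’t shown <code class="language-plaintext highlighter-rouge">runningon</code> itself; it’s a small script in the linked repo. As an illustration of the idea only (written here in Python, with made-up group names and hostname prefixes), a runningon-style check just maps group names to predicates on the current hostname and OS:

```python
import platform
import socket

# Map group names to predicates on (hostname, OS name).
# These groups and the 'work-' prefix are invented for this example.
GROUPS = {
    'macos': lambda host, os_name: os_name == 'Darwin',
    'linux': lambda host, os_name: os_name == 'Linux',
    'work': lambda host, os_name: host.startswith('work-'),
}

def runningon(group, host=None, os_name=None):
    """Return True if this machine belongs to the named group."""
    if host is None:
        host = socket.gethostname()
    if os_name is None:
        os_name = platform.system()  # 'Darwin' on macOS, 'Linux' on Linux
    check = GROUPS.get(group)
    return bool(check and check(host, os_name))

print(runningon('macos', host='mbp', os_name='Darwin'))  # True
```

The real script exits 0 on a match and nonzero otherwise, which is what lets it compose with <code class="language-plaintext highlighter-rouge">if</code> in the rc-file snippet above.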

<p>If you want to use this system yourself, you just need to copy <a href="https://github.com/kerrickstaley/homedir/blob/5638b40e250645af0d449dd025c2c175f2a4ba35/.bashrc#L7-L13">these 7 lines</a> into your <code class="language-plaintext highlighter-rouge">.bashrc</code> or <code class="language-plaintext highlighter-rouge">.zshrc</code>, copy the <code class="language-plaintext highlighter-rouge">runningon</code> script into your <code class="language-plaintext highlighter-rouge">~/bin</code>, and modify its list of hostnames.</p>

<h4 id="update-2023-11-25">Update 2023-11-25:</h4>
<p><a href="https://vcs-home.branchable.com/">The vcs-home wiki page</a> links to many other approaches to this idea.</p>

<p>To avoid messy <code class="language-plaintext highlighter-rouge">git status</code> output, I’ve run</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>homegit config status.showUntrackedFiles no
homegit-private config status.showUntrackedFiles no
</code></pre></div></div>
<p>on all my machines as suggested in <a href="https://news.ycombinator.com/item?id=11071754">this HN post</a>.</p>]]></content><author><name></name></author><category term="sysadmin" /><category term="utilities" /><category term="git" /><summary type="html"><![CDATA[I use a system I call homegit to manage config files and scripts in my home directory on all my machines. The idea is simple: a Git repository rooted at ~ that I push to GitHub. I’ve used this system for 6 years and like it a lot. See below to set it up for yourself!]]></summary></entry><entry><title type="html">Timeline of a Train Departure</title><link href="https://kerrickstaley.com/2022/11/26/timeline-of-a-train-departure" rel="alternate" type="text/html" title="Timeline of a Train Departure" /><published>2022-11-26T00:00:00+00:00</published><updated>2022-11-26T00:00:00+00:00</updated><id>https://kerrickstaley.com/2022/11/26/timeline-of-a-train-departure</id><content type="html" xml:base="https://kerrickstaley.com/2022/11/26/timeline-of-a-train-departure"><![CDATA[<p>Mid-morning last Sunday, I took a <a href="https://www.panynj.gov/path/en/index.html">PATH train</a> from <a href="https://goo.gl/maps/zE9eX2tQ2ooDi9Fk7">Hoboken station</a> to <a href="https://goo.gl/maps/c81AFeRpviPamjnS8">Christopher Street station</a> en route to a coffee shop to hack on projects with a friend.</p>

<p>I enjoy micro-optimizing things, so I have a <a href="/2022/02/25/transit-panel">device that tells me when to leave my apartment to catch the train</a>. At 10:53 AM, it told me to leave, so I did; it was lining me up for a train scheduled to depart at 11:01 AM. The station is 10 minutes away walking but the transit panel thought the train was running 2 minutes late.</p>

<p>At the station I waited about 2 minutes for the train to appear and another 2 for it to leave once I boarded. 4 minutes <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> of precious time which could have been spent blogging was instead wasted waiting in the near-freezing station. What happened—why did my transit panel mislead me so?</p>

<p><a href="https://en.wikipedia.org/wiki/Garbage_in,_garbage_out">A system’s output is only as good as the input you feed to it</a>, and it turns out the transit panel was itself misled by the API it was using to get realtime PATH departures. That API had an overly-optimistic view of how late the train was running. Let’s look at the data!</p>

<p>Here’s a graph of how many minutes were left before the train departed according to 4 sources, in the 10-odd minutes leading up to departure:</p>

<p style="display: flex; justify-content: center">
<img src="/images/timeline-of-a-train-departure/minutes_to_departure.png" width="900px" />
</p>

<p>The gray line is how much time was left until departure according to the train schedule. By “departure” I mean when the doors closed. The green line uses data from an <a href="https://github.com/mrazza/path-data">API made by Matt Razza</a>.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> The blue line uses data from an “official” API.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> The orange line is the actual time left (as if we had a crystal ball which knew the exact departure time). The dashed red line is the departure.</p>

<p>This graph is easier to interpret if you instead look at how late the sources predict the train will be:</p>

<p style="display: flex; justify-content: center">
<img src="/images/timeline-of-a-train-departure/lateness.png" width="900px" />
</p>

<p>The train was ultimately 3m32s late, but initially the two APIs estimated it at 1m48s late. As the actual departure time drew closer, the APIs became more accurate and eventually overshot, predicting it would leave 13 seconds after it did.<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></p>

<p>To catch the train I needed to leave 10 minutes before departure. Here’s what that looks like superimposed on the first graph:</p>

<p style="display: flex; justify-content: center">
<img src="/images/timeline-of-a-train-departure/minutes_to_departure_with_leave_time_plus_intercepts.png" width="900px" />
</p>

<p>The intersections of the magenta line with the 4 timelines are the times I should leave my apartment according to these 4 sources:</p>

<table>
  <tr>
    <th>source</th>
    <th>when to leave according to source</th>
  </tr>
  <tr>
    <td>schedule</td>
    <td>10:51:00</td>
  </tr>
  <tr>
    <td>"official" API</td>
    <td>10:52:46</td>
  </tr>
  <tr>
    <td>mrazza API</td>
    <td>10:53:46</td>
  </tr>
  <tr>
    <td>ground truth</td>
    <td>10:54:32</td>
  </tr>
</table>

<p>Following the schedule, I would have left 3m32s too early. Since my transit panel uses the “official” API, I was able to save half of that. If mrazza’s API—which gives better data (see below) but is down—were available, I could have saved another minute. And if the API made perfect predictions, it would have saved the remaining 46 seconds.</p>
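The leave times in the table are simple arithmetic: scheduled departure, plus the source’s predicted lateness at that moment, minus the 10-minute walk. A quick sketch (the date is arbitrary; the train was scheduled for 11:01 AM):

```python
from datetime import datetime, timedelta

WALK = timedelta(minutes=10)
# 11:01 AM scheduled departure; the date itself doesn't matter here.
scheduled = datetime(2022, 11, 20, 11, 1, 0)

def leave_time(predicted_lateness):
    """When to leave: predicted departure minus walk time."""
    return scheduled + predicted_lateness - WALK

# The schedule predicts zero lateness; ground truth was 3m32s late.
print(leave_time(timedelta(0)).strftime('%H:%M:%S'))                      # 10:51:00
print(leave_time(timedelta(minutes=3, seconds=32)).strftime('%H:%M:%S'))  # 10:54:32
```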

<p>Squinting at that second graph, the “official” API looks an awful lot like a lagged version of Matt Razza’s API. Indeed, if we scoot it ahead 80 seconds, we see that it aligns pretty well:</p>

<p style="display: flex; justify-content: center">
<img src="/images/timeline-of-a-train-departure/lateness_with_official_scooted_forward.png" width="900px" />
</p>

<p>I hypothesize that the same data source feeds both the mrazza and “official” APIs (<a href="https://medium.com/@mrazza/programmatic-path-real-time-arrival-data-5d0884ae1ad6">Matt’s blog post</a> has more information about the data infra that PATH uses on the backend) and something in the “official” API adds a random delay of 70 to 90 seconds. So if you’re looking at <a href="https://www.panynj.gov/path/en/index.html">the real-time departures on PATH’s website</a>, the info you’re looking at is a little stale compared to what the backend actually knows.</p>
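One way to quantify that lag, rather than eyeballing the scoot, is to slide one series against the other and take the shift that minimizes the mean squared difference. A toy sketch with synthetic data (not the actual scraped series):

```python
def best_lag(a, b, max_lag):
    """Return the shift (in samples) of b relative to a that minimizes
    the mean squared difference over the overlapping region."""
    def mse(lag):
        pairs = [(a[i], b[i + lag]) for i in range(len(a))
                 if 0 <= i + lag < len(b)]
        return sum((x - y) ** 2 for x, y in pairs) / len(pairs)
    return min(range(-max_lag, max_lag + 1), key=mse)

# Synthetic check: b is a copy of a delayed by 8 samples.
a = [0, 0, 1, 3, 6, 10, 15, 18, 20, 21, 21, 20, 18, 15, 10, 6, 3, 1, 0, 0]
b = [0] * 8 + a[:-8]
print(best_lag(a, b, max_lag=10))  # 8
```

With the real data, running this on the two lateness series sampled at a common interval would put a number on the 70-to-90-second delay hypothesized above.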

<p>That’s all for now,<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">5</a></sup> but maybe in a future episode I’ll revisit the data that we get from PATH’s API. The Jupyter notebook where I did this analysis is <a href="https://github.com/kerrickstaley/transit-panel/blob/main/analysis/Predicted%20vs%20actual%20departure.ipynb">here</a>. The scraper that recorded the data is <a href="https://github.com/kerrickstaley/transit-panel/blob/main/scripts/api_compare.py">here</a>. To record the train departure time I used a simple Google Form (Google Forms records the timestamp when you submit a form).</p>

<p><a href="https://news.ycombinator.com/item?id=33755892">Discussion on Hacker News</a></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>There’s a 2-minute buffer baked into the 10-minute walk time, so I only waited an extra 2 minutes here.<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">6</a></sup> You may have noticed that I waited 4 minutes but the API’s error only explained ~2 minutes. Can’t let a technicality interrupt my narrative flow. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>mrazza’s API is currently down so I hosted it locally to collect data, but my transit panel doesn’t use it. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>The “official” API is at <a href="https://www.panynj.gov/bin/portauthority/ridepath.json">https://www.panynj.gov/bin/portauthority/ridepath.json</a>. I’m using scare quotes because I don’t think this API is meant for public consumption: it’s made for displaying <a href="https://www.panynj.gov/path/en/index.html">realtime info on panynj.gov</a>. It lacks an <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Origin"><code class="language-plaintext highlighter-rouge">Access-Control-Allow-Origin: *</code> header</a> and so I have to run a proxy on my Raspberry Pi in order to access it from my transit panel web app. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Maybe the APIs use some other definition of “departure”? I think “the time the doors close” is the definition they should be using because it’s the most relevant to me as a rider, but maybe they use “the time the train starts moving”? <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Here’s a bonus graph, showing lateness like the 2nd graph but adding the magenta line. The magenta line has a slope of positive 1 in these transformed coordinates: <img src="/images/timeline-of-a-train-departure/lateness_with_leave_time_plus_intercepts.png" width="900px" /> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>I’ve probably spent more time working on the transit panel and blogging about it than it will ever save me, but eh, you have to find something to spend your time on. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="home-improvement" /><category term="public-transit" /><category term="data-science" /><summary type="html"><![CDATA[Mid-morning last Sunday, I took a PATH train from Hoboken station to Christopher Street station en route to a coffee shop to hack on projects with a friend.]]></summary></entry><entry><title type="html">Transit Panel</title><link href="https://kerrickstaley.com/2022/02/25/transit-panel" rel="alternate" type="text/html" title="Transit Panel" /><published>2022-02-25T00:00:00+00:00</published><updated>2022-02-25T00:00:00+00:00</updated><id>https://kerrickstaley.com/2022/02/25/transit-panel</id><content type="html" xml:base="https://kerrickstaley.com/2022/02/25/transit-panel"><![CDATA[<p>I live in <a href="https://en.wikipedia.org/wiki/Jersey_City,_New_Jersey">Jersey City</a> and many of my days start with a short commute into New York City on the <a href="https://www.panynj.gov/path/en/index.html">PATH train</a>. On weekdays, trains run frequently, but for weekend visits to <a href="https://fatcatfablab.org/">Fat Cat Fab Lab</a> or <a href="https://www.vitalclimbinggym.com/brooklyn-gym">Vital</a>, trains come every 12 minutes. Waiting 6 minutes on average isn’t bad, but not knowing how long I’ll wait bugs me.</p>

<p>To make my life a little easier, I built a web app that runs on a tablet by my front door that tells me when I should leave my apartment to catch the next train. This is what it looks like:</p>

<p style="display: flex; justify-content: center">
<img src="/images/transit-panel/transit-panel-mounted.jpg" width="500px" />
</p>

<p>I’ve been using it for 3 months and I’m really happy with it. It feels great to walk at a leisurely pace to the station and show up 2 minutes before the train leaves, every time. Instead of waiting in a cold station I can wait in my warm apartment.</p>

<p style="display: flex; flex-direction: column; align-items: center">
<img src="/images/transit-panel/2-minutes-before-departure.jpg" width="500px" />
<em>2 minutes before departure, train waiting in station</em>
</p>

<p>The rest of this post will explain how I made this thing, which I call a transit panel. It wasn’t hard, and with a bit of HTML/Javascript knowledge and handiness you can make one of your own!</p>

<h2 id="hardware">Hardware</h2>
<p>I used an <a href="https://www.amazon.com/gp/product/B08BX7FV5L">Amazon Fire HD 10 tablet</a>. This tablet is cheap ($150 at time of writing, but frequently on sale for $110 or less) and has a large-ish screen, which is important because I need to be able to read the screen from 10 meters away. The web app isn’t demanding and the system is always plugged in, so specs like processor and battery life don’t matter. The app runs full-time so the lockscreen ads aren’t an annoyance.</p>

<p>To mount it on the wall, I used the <a href="https://www.amazon.com/gp/product/B01BX5YU7Y">Dockem Koala Wall Mount 2.0</a>, which worked well. It can be screwed to the wall or adhered using Command Strips; I chose to use <a href="https://www.homedepot.com/p/E-Z-Ancor-Twist-N-Lock-8-x-1-1-4-in-White-Nylon-Phillips-Flat-Head-75-Medium-Duty-Drywall-Anchors-with-Screws-20-Pack-25210/100140114">these drywall anchors</a> because Command Strips sometimes fall off after several months of use.</p>

<p style="display: flex; flex-direction: column; align-items: center">
<img src="/images/transit-panel/dockem-koala.jpg" width="500px" />
<em>mount with tablet removed</em>
</p>

<p>I routed the power cable using <a href="https://www.monoprice.com/product?p_id=5834">these Monoprice cable clips</a>.</p>

<h2 id="software">Software</h2>
<p>The transit panel runs a simple web app without any frameworks. The <a href="https://github.com/kerrickstaley/transit-panel">source code is here</a>, with most of the logic in <a href="https://github.com/kerrickstaley/transit-panel/blob/main/main.js">main.js</a>. I’m not a JS wizard; don’t judge my code 😅 (but constructive comments are appreciated! Send me a PR).</p>

<p style="display: flex; flex-direction: column; align-items: center">
<img src="/images/transit-panel/transit-panel-ui.png" width="500px" />
<em>currently 3 transit options are supported</em>
</p>

<h3 id="data-source">Data Source</h3>
<p>The app hard-codes schedules for the PATH and ferry (in <a href="https://github.com/kerrickstaley/transit-panel/blob/main/departure_times.js">departure_times.js</a>). I wanted to use real-time data, but NY Waterway ferries don’t have real-time tracking at all, and the PATH has real-time information but they don’t have a Javascript-accessible web API (if you work at PATH, please implement this!).</p>

<p>Luckily, a true internet hero named Matthew Razza runs a <a href="https://github.com/mrazza/path-data">web service</a> that exposes the PATH data in a JSON HTTPS API. A second problem, however, is that the live PATH data doesn’t have a long enough time horizon. I take 10 minutes to walk to the station, and oftentimes the only departures returned by the API are less than 10 minutes in the future. (This affects everyone, including people using the official website and app—if you work at PATH, please fix this!).</p>

<p>I could combine the approaches, using the live API and falling back to the schedule if there is no data, but for now the hardcoded schedule works well enough.</p>
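A sketch of that combined approach (in Python for illustration, though the app itself is JavaScript; the merge policy and names are my own assumptions): treat departures as minutes-from-now, prefer live predictions where they exist, and fill the rest of the horizon from the schedule.

```python
def upcoming_departures(live, schedule, horizon=20):
    """Combine live predictions with the static schedule. All values are
    minutes from now; this merge policy is just one plausible choice.
    Live data wins where it exists; past the last live prediction,
    fall back to trusting the schedule."""
    live = sorted(t for t in live if 0 <= t <= horizon)
    if not live:
        return sorted(t for t in schedule if 0 <= t <= horizon)
    cutoff = live[-1]
    return live + sorted(t for t in schedule if cutoff < t <= horizon)

# Live feed only sees a few minutes out; the schedule fills in the
# rest of the 12-minute weekend headway.
print(upcoming_departures(live=[4], schedule=[3, 15], horizon=20))  # [4, 15]
```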

<h3 id="fonts">Fonts</h3>
<p>I spent a ton of time trying to center the numbers vertically in their rows (e.g. the 4 in “leave in 4 min”). I would tweak the CSS to center it in Firefox, but then I would open it in Chrome on the same computer and it would look different, and the tablet would be different from both.</p>

<p>I thus learned the hard way that if you want your app’s text to have a consistent appearance across platforms, you need to use a font from a font service like Google Fonts instead of relying on the browser’s built-in font library.</p>

<p>It turns out that <a href="https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align">font geometry is complicated</a> and compensating for different fonts in CSS/JavaScript to vertically center text <a href="https://stackoverflow.com/questions/36891362/is-it-possible-to-vertically-center-text-in-its-bounding-box/">is hard</a>. The simplest solution is to use a consistent font and tweak the margins in CSS to make it look like you want.</p>

<h3 id="libraries">Libraries</h3>
<p>I found the <a href="https://moment.github.io/luxon/#/">Luxon</a> Javascript library really helpful for working with date/time values.</p>

<h3 id="browser">Browser</h3>
<p>I used the <a href="https://f-droid.org/en/packages/org.mozilla.fennec_fdroid/">Fennec</a> browser from the F-Droid app store, which is a reskin of Firefox for Android. This was the only browser I found that behaved correctly when full-screened (hiding both the top and bottom system UI bars).</p>

<h2 id="reliability">Reliability</h2>
<p>I’ve kept the tablet running continuously for about 4 months. Once or twice in that period, it’s gotten into a bad state and needed a reboot. I’ve noticed no issues with burn-in on the screen. I’m hoping to get several years of use before the hardware needs replacement.</p>

<h2 id="future-improvements">Future Improvements</h2>
<p>At some point (maybe when it’s warmer) I’m planning to add a row showing bike availability at the nearest <a href="https://citibikenyc.com/homepage">Citi Bike</a> station. Citi Bike has a delightful and easy-to-use REST API. Speaking of weather, I’m also planning to add a row that shows the weather and current time.</p>]]></content><author><name></name></author><category term="home-improvement" /><category term="web" /><category term="javascript" /><summary type="html"><![CDATA[I live in Jersey City and many of my days start with a short commute into New York City on the PATH train. On weekdays, trains run frequently, but for weekend visits to Fat Cat Fab Lab or Vital, trains come every 12 minutes. Waiting 6 minutes on average isn’t bad, but not knowing how long I’ll wait bugs me.]]></summary></entry><entry><title type="html">“Prestudy”: Learning Chinese through Reading</title><link href="https://kerrickstaley.com/2018/09/04/chinese-prestudy" rel="alternate" type="text/html" title="“Prestudy”: Learning Chinese through Reading" /><published>2018-09-04T00:00:00+00:00</published><updated>2018-09-04T00:00:00+00:00</updated><id>https://kerrickstaley.com/2018/09/04/chinese-prestudy</id><content type="html" xml:base="https://kerrickstaley.com/2018/09/04/chinese-prestudy"><![CDATA[<p>In May 2017, I sat and passed the <a href="http://www.chinesetest.cn">Hanyu Shuiping Kaoshi</a> Level 4. The HSK is a Mandarin Chinese proficency test similar to the TOEFL, and passing Level 4 means that one can <a href="http://www.chinesetest.cn/gonewcontent.do?id=677487">“discuss a relatively wide range of topics in Chinese and [is] capable of communicating with Chinese speakers at a high standard”</a>. Just one level higher, at Level 5, you’re supposedly able to “read Chinese newspapers and magazines, [and] watch Chinese films”.</p>

<p>This is a little optimistic. A year ago I could barely bumble through a basic conversation in Mandarin, and any sort of “real” Chinese text was totally inaccessible—I could only read things that were designed for language learners. Continuing to cram vocabulary didn’t seem to help. I couldn’t make it more than a sentence into a newspaper article or book without hitting an unfamiliar word and needing to pull out a dictionary.</p>

<p>This article is about some software I wrote, based on the <a href="https://apps.ankiweb.net/">Anki flashcard app</a>, to help me leapfrog from HSK 4 to reading a real Chinese book. If you’re learning Chinese, you can download and use this software too! Fair warning, it’s still a lot of work. Chinese is hard.</p>

<h2 id="the-three-body-problem">The Three Body Problem</h2>
<p>The novel <em><a href="https://en.wikipedia.org/wiki/The_Three-Body_Problem_(novel)">The Three Body Problem</a></em> (三体 [Sān Tǐ] in
Chinese) is moderately famous in sci-fi circles. The English translation won the <a href="http://www.thehugoawards.org/hugo-history/2015-hugo-awards/">2015 Hugo Award for Best
Novel</a> and to date it is the only novel in translation
to have done so. <a href="https://www.nytimes.com/2017/01/16/books/transcript-president-obama-on-what-books-mean-to-him.html">It’s one of Barack Obama’s favorite books</a>. And <a href="https://www.independent.co.uk/arts-entertainment/tv/news/the-three-body-problem-tv-adaptation-show-amazon-a8278066.html">Amazon was reported in May 2018 to be eyeing film rights to the book</a> for $1 billion USD, in a bid to boost Amazon Prime Video’s original content (no word on whether that did happen).</p>

<p style="display: flex; justify-content: center">
<img src="/images/chinese-prestudy/san-ti-cover.png" height="350px" />
</p>

<p>One of my friends (hi <a href="https://medium.com/@ttommyliu">Tommy</a>!) recommended <em>The Three Body Problem</em> to me, and he also mentioned that the book reads a little more fluidly in the original Chinese. And so I decided to try reading the Chinese version. So far, I’m 45 pages in and actually kinda enjoying it. Which is a start, right?</p>

<h2 id="the-prestudy-technique">The “Prestudy” Technique</h2>
<p>I call the method I’m using to read the book “prestudy”. Here’s how it works:</p>
<ol>
  <li>You come up with a list of vocabulary words you want to learn. Using the 3500 most common words works pretty well.</li>
  <li>You take the first 3 or so pages of the book you’re reading, and you find out which words from (1) are in those pages.</li>
  <li>You generate <a href="https://apps.ankiweb.net/">Anki</a> flashcards for all the words from (2) that you don’t already have flashcards for.</li>
  <li>You study the flashcards and learn the words.</li>
  <li>You read the pages. If you encounter a word you don’t know, you look it up.</li>
  <li>You repeat steps 2-5. You can pipeline it so that you’re studying words from page N+3 on the same day you’re reading page N.</li>
</ol>
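Steps 1–3 boil down to set logic over a frequency-ordered vocab list. A simplified sketch, assuming the passage has already been segmented into words (the real add-on uses jieba for that); the function name and toy data are made up:

```python
def words_to_prestudy(passage_words, vocab_by_freq, known, target_size):
    """Pick words worth making flashcards for: words that appear in the
    passage, fall within the target vocabulary (the target_size most
    common words), and aren't already known. Frequency order is kept so
    the most useful words come first."""
    in_passage = set(passage_words)
    return [w for w in vocab_by_freq[:target_size]
            if w in in_passage and w not in known]

# Toy example with pinyin standing in for Chinese words:
vocab = ['de', 'shi', 'wo', 'santi', 'wenming']
print(words_to_prestudy(
    passage_words=['wo', 'shi', 'santi', 'wenming'],
    vocab_by_freq=vocab, known={'wo', 'shi'}, target_size=4))  # ['santi']
```

Note that ‘wenming’ is dropped even though it appears in the passage, because it falls outside the target vocabulary size; per step 5, you’d look such words up while reading instead of making cards for them.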

<p>With this technique you can read a page or so a day with moderate effort. The bottleneck is acquiring vocabulary; each page will generally have about 5-10 new words (starting from a HSK 4 base), and it’s difficult to memorize more than 5-10 new words per day. It gets gradually easier as you build up a vocabulary base and encounter fewer and fewer new words per page.</p>

<p>The technique also works pretty well for newspaper articles and TV shows (for shows, you’ll want to look for a .srt subtitles file).</p>

<p>This is not a totally new idea. It’s similar to many Chinese textbooks where each chapter presents a text passage and some related vocabulary. The difference is that you can make your own study guide, in the form of Anki flashcards, for anything you want to read.</p>

<h2 id="the-tool">The Tool</h2>
<p>The tool that does all this is an Anki add-on which you can get <a href="https://ankiweb.net/shared/info/882364911">here</a>. You copy/paste in your text (so you’ll need a PDF if it’s a book), enter your target vocabulary size, and select which deck and tags you want to apply to the cards:</p>

<div style="display: flex; justify-content: space-around">
<div style="display: flex; flex-direction: column">
<figure>
<img src="/images/chinese-prestudy/usage_1.png" width="350px" />
<figcaption>1. Paste the text you want to read</figcaption>
</figure>

<figure>
<img src="/images/chinese-prestudy/usage_2.png" width="350px" />
<figcaption>2. Select target vocabulary size</figcaption>
</figure>
</div>

<div style="display: flex; flex-direction: column">
<figure>
<img src="/images/chinese-prestudy/usage_3.png" width="350px" />
<figcaption>3. Select deck and tags</figcaption>
</figure>

<figure>
<img src="/images/chinese-prestudy/usage_4.png" width="350px" />
<figcaption>4. Study!</figcaption>
</figure>
</div>
</div>

<p>It’s <em>only</em> compatible with the beta-channel Anki 2.1, not the stable-channel Anki 2.0. Sorry if this is a dealbreaker for you. I know some people are stuck on 2.0 because certain add-ons only support 2.0. If I have time and Anki 2.1 continues to be stuck in beta, I’ll look at making a 2.0-compatible version.</p>

<p>It also only supports texts with simplified characters. I’ll eventually add support for traditional characters. The silver lining is that when you add a flashcard for a simplified character, you’ll also get a flashcard for the traditional character. It’ll be suspended by default so you’ll have to unsuspend if you want to study it.</p>

<h2 id="the-code">The Code</h2>
<p>All the code behind this is open-source, and it’s split across several components that can be re-used in other projects:</p>
<ul>
  <li><a href="https://github.com/kerrickstaley/Chinese-Prestudy">Chinese-Prestudy</a>: The Anki plugin itself.</li>
  <li><a href="https://github.com/kerrickstaley/Chinese-Vocab-List">Chinese-Vocab-List</a>: A giant YAML file of the 4000+ most common Chinese words (in descending order of frequency), with definitions and example sentences. Plus a small Python wrapper for accessing the data.</li>
  <li><a href="https://github.com/kerrickstaley/chineseflashcards">chineseflashcards</a>: A Python library, built on genanki, for creating Chinese flashcards in Anki.</li>
  <li><a href="https://github.com/kerrickstaley/genanki">genanki</a>: A Python library for creating Anki flashcards.</li>
</ul>

<p>With the exception of genanki, none of these projects is in a very contributor-friendly state. Most of their code isn’t very readable or well documented and could use more unit tests. Still, I’d encourage interested people to dive in and make contributions; I’ll do my best to help you out and make the code more hackable as we go.</p>

<p>This project also leans heavily on a lot of great open-source projects:</p>
<ul>
  <li><a href="https://cc-cedict.org/wiki/">CC-CEDICT</a>: A free (Creative Commons) Chinese/English dictionary.</li>
  <li><a href="https://tatoeba.org/">Tatoeba</a>: Multilingual database of example sentences.</li>
  <li>The <a href="http://www.chinesetest.cn/">Hanyu Shuiping Kaoshi</a> vocab list.</li>
  <li><a href="http://crr.ugent.be/programs-data/subtitle-frequencies/subtlex-ch">SUBTLEX-CH</a>: List of Chinese words from most to least common (in spoken usage).</li>
  <li><a href="https://github.com/fxsjy/jieba">jieba</a>: Chinese word segmentation library.</li>
  <li><a href="https://github.com/mozillazg/python-pinyin">pypinyin</a>: Character-to-pinyin library.</li>
</ul>

<h2 id="final-thoughts">Final Thoughts</h2>
<p>Learning this way still demands a <em>lot</em> of perseverance, bordering on masochism, but that’s Chinese. On the bright side, I feel like I’m developing proficiency faster than at any point other than when I was in undergrad taking Chinese classes 5 days a week.</p>

<p>Reading better has also lifted my listening and speaking abilities, even though I haven’t spent much time on those recently.</p>

<p>I still have to stop and look up a word every 3 or 4 sentences, which is a pain. I usually use the OCR feature in the <a href="https://play.google.com/store/apps/details?id=com.embermitre.hanping.app.reader.pro">Hanping Chinese Camera</a> app, and on an unsteady train (where I normally read) this gets frustrating fast. I’m working on a solution for this too: a “cheatsheet” that lists all the advanced words so you don’t have to study them with flashcards. But it’s not done yet.</p>

<p>I hope that you find this tool useful on your Chinese learning journey! Feel free to leave feedback on <a href="https://github.com/kerrickstaley/Chinese-Prestudy/issues">GitHub’s issue tracker</a> or by mail to <a href="mailto:k@kerrickstaley.com">k@kerrickstaley.com</a>. Upvotes on <a href="https://news.ycombinator.com/item?id=17914723">Hacker News</a> are also appreciated!</p>]]></content><author><name></name></author><category term="chinese" /><category term="learning" /><category term="anki" /><category term="python" /><category term="reading" /><category term="prestudy" /><summary type="html"><![CDATA[In May 2017, I sat and passed the Hanyu Shuiping Kaoshi Level 4. The HSK is a Mandarin Chinese proficency test similar to the TOEFL, and passing Level 4 means that one can “discuss a relatively wide range of topics in Chinese and [is] capable of communicating with Chinese speakers at a high standard”. Just one level higher, at Level 5, you’re supposedly able to “read Chinese newspapers and magazines, [and] watch Chinese films”.]]></summary></entry><entry><title type="html">Extracting Chinese Hard Subs from a Video, Part 1</title><link href="https://kerrickstaley.com/2017/05/29/extracting-chinese-subs-part-1" rel="alternate" type="text/html" title="Extracting Chinese Hard Subs from a Video, Part 1" /><published>2017-05-29T00:00:00+00:00</published><updated>2017-05-29T00:00:00+00:00</updated><id>https://kerrickstaley.com/2017/05/29/extracting-chinese-subs-part-1</id><content type="html" xml:base="https://kerrickstaley.com/2017/05/29/extracting-chinese-subs-part-1"><![CDATA[<p>I’ve been watching the Chinese TV show 他来了，请闭眼 (<em>Love Me If You Dare</em>). It’s a good show, kinda reminiscent of the BBC series Sherlock, likewise a crime drama centered around an eccentric crime-solving protagonist and a sympathetic sidekick. You should check it out if you’re into Chinese film or are learning Chinese and want something interesting to watch.</p>

<p>I wanted to get a transcript of the episode’s dialog so I could study the unfamiliar vocabulary. Unfortunately, the video files I have only contain hard subtitles, i.e. the subtitles are images composited directly into the video stream. After an hour spent scouring both the English- and Chinese-language webs, I couldn’t find any soft subs (e.g. SRT format) for the show.</p>

<p>So I thought it’d be interesting to try to convert the hard subs in the video files to text. For example, here’s a frame of the video:</p>

<p><img src="https://kerrickstaley.com/images/extracting-chinese-subs-part-1/car_scene.png" alt="car scene" /></p>

<p>From this frame, we want to extract the text “怎么去这么远的地方”. To approach this, we’re going to use the <a href="https://github.com/tesseract-ocr/tesseract">Tesseract library</a> and the <a href="https://github.com/openpaperwork/pyocr">PyOCR binding</a> for it.</p>

<p>We could just try throwing Tesseract at it and see what comes out:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">pyocr</span>
<span class="kn">from</span> <span class="nn">PIL</span> <span class="kn">import</span> <span class="n">Image</span>

<span class="n">LANG</span> <span class="o">=</span> <span class="s">'chi_sim'</span>

<span class="n">tool</span> <span class="o">=</span> <span class="n">pyocr</span><span class="p">.</span><span class="n">get_available_tools</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="n">tool</span><span class="p">.</span><span class="n">image_to_string</span><span class="p">(</span><span class="n">Image</span><span class="p">.</span><span class="nb">open</span><span class="p">(</span><span class="s">'car_scene.png'</span><span class="p">),</span> <span class="n">lang</span><span class="o">=</span><span class="n">LANG</span><span class="p">))</span></code></pre></figure>

<p>Running it:</p>

<figure class="highlight"><pre><code class="language-shell_session" data-lang="shell_session"><span class="gp">$</span><span class="w"> </span>python snippet_1.py
<span class="go">
</span><span class="gp">$</span><span class="w"> </span></code></pre></figure>

<p>Hmm, so that didn’t work. What’s happening?</p>

<p>Tesseract requires that you <a href="https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality">clean your input image before you do OCR</a>. Our input image is full of irrelevant background features, but Tesseract expects clean black text on a white background (or white on black).</p>

<p>To remove the background image and get just the subtitles, we turn to <a href="http://opencv.org/">OpenCV</a>. The easiest part is cropping the image. We keep a larger left/right border because some frames have more text:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">cv2</span>

<span class="n">TEXT_TOP</span> <span class="o">=</span> <span class="mi">621</span>
<span class="n">TEXT_BOTTOM</span> <span class="o">=</span> <span class="mi">684</span>
<span class="n">TEXT_LEFT</span> <span class="o">=</span> <span class="mi">250</span>
<span class="n">TEXT_RIGHT</span> <span class="o">=</span> <span class="mi">1030</span>


<span class="n">img</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">imread</span><span class="p">(</span><span class="s">'car_scene.png'</span><span class="p">)</span>

<span class="n">cropped</span> <span class="o">=</span> <span class="n">img</span><span class="p">[</span><span class="n">TEXT_TOP</span><span class="p">:</span><span class="n">TEXT_BOTTOM</span><span class="p">,</span> <span class="n">TEXT_LEFT</span><span class="p">:</span><span class="n">TEXT_RIGHT</span><span class="p">]</span>

<span class="n">cv2</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="s">'cropped'</span><span class="p">,</span> <span class="n">cropped</span><span class="p">)</span>
<span class="n">cv2</span><span class="p">.</span><span class="n">waitKey</span><span class="p">(</span><span class="mi">10000</span><span class="p">)</span></code></pre></figure>

<p>The result:</p>

<p><img src="https://kerrickstaley.com/images/extracting-chinese-subs-part-1/car_scene_cropped.png" alt="car scene cropped" /></p>

<p>Now we want to isolate the text. The text is white, so we can mask out all the areas in the image that aren’t white:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">white_region</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">inRange</span><span class="p">(</span><span class="n">cropped</span><span class="p">,</span> <span class="p">(</span><span class="mi">200</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">200</span><span class="p">),</span> <span class="p">(</span><span class="mi">255</span><span class="p">,</span> <span class="mi">255</span><span class="p">,</span> <span class="mi">255</span><span class="p">))</span></code></pre></figure>

<p>This uses the <a href="http://docs.opencv.org/2.4/modules/core/doc/operations_on_arrays.html#inrange">OpenCV <code class="language-plaintext highlighter-rouge">inRange</code> function</a>. <code class="language-plaintext highlighter-rouge">inRange</code> returns a value of 255 (pure white in an 8-bit grayscale context) for pixels where the blue, green, and red components (OpenCV stores images in BGR order) are all between 200 and 255, and 0 (black) for pixels outside this range. This is called <em>thresholding</em>. Here’s what we get:</p>

<p><img src="https://kerrickstaley.com/images/extracting-chinese-subs-part-1/car_scene_white_region.png" alt="car scene white region" /></p>
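<p>As an aside, if you want to see exactly what this thresholding does without pulling in OpenCV, the same behavior can be emulated in a few lines of NumPy (a simplified re-implementation for illustration, not OpenCV’s actual code):</p>

```python
import numpy as np

def in_range(img, lower, upper):
    # Returns 255 where every channel of a pixel lies within [lower, upper],
    # and 0 elsewhere, mirroring cv2.inRange on a 3-channel image.
    lower = np.asarray(lower, dtype=img.dtype)
    upper = np.asarray(upper, dtype=img.dtype)
    mask = np.all((img >= lower) & (img <= upper), axis=-1)
    return mask.astype(np.uint8) * 255

# A 1x2 "image": one near-white pixel and one dark pixel.
img = np.array([[[230, 240, 250], [30, 40, 50]]], dtype=np.uint8)
print(in_range(img, (200, 200, 200), (255, 255, 255)))
# The near-white pixel maps to 255; the dark pixel maps to 0.
```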

<p>A lot better! Let’s run Tesseract again:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">extracted_text</span> <span class="o">=</span> <span class="n">tool</span><span class="p">.</span><span class="n">image_to_string</span><span class="p">(</span><span class="n">Image</span><span class="p">.</span><span class="n">fromarray</span><span class="p">(</span><span class="n">white_region</span><span class="p">),</span> <span class="n">lang</span><span class="o">=</span><span class="n">LANG</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">extracted_text</span><span class="p">)</span></code></pre></figure>

<p>And Tesseract returns (drumroll…):</p>

<figure class="highlight"><pre><code class="language-plaintext" data-lang="plaintext">′…′二′′′'′ 怎么去逯么远的地方 '/′</code></pre></figure>

<p>Now we’re getting somewhere! Several areas in the background are white, so when we pass those through to Tesseract it interprets them as assorted punctuation. Let’s strip out these non-Chinese characters using the built-in <a href="https://docs.python.org/3/library/unicodedata.html">Python unicodedata library</a>:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">unicodedata</span>

<span class="n">chinese_text</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">extracted_text</span><span class="p">:</span>
  <span class="k">if</span> <span class="n">unicodedata</span><span class="p">.</span><span class="n">category</span><span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="o">==</span> <span class="s">'Lo'</span><span class="p">:</span>
    <span class="n">chinese_text</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
<span class="n">chinese_text</span> <span class="o">=</span> <span class="s">''</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">chinese_text</span><span class="p">)</span>

<span class="k">print</span><span class="p">(</span><span class="n">chinese_text</span><span class="p">)</span></code></pre></figure>

<p>The <code class="language-plaintext highlighter-rouge">'Lo'</code> here is one of the <a href="https://en.wikipedia.org/wiki/Unicode_character_property#General_Category">General Categories that Unicode assigns to characters</a> and stands for “Letter, other”. It’s good for extracting East Asian characters. From this code we get:</p>

<figure class="highlight"><pre><code class="language-plaintext" data-lang="plaintext">二怎么去逯么远的地方</code></pre></figure>

<p>There are two mistakes here: a spurious 二 character at the front, and a misrecognized character in the middle (that 逯 should be 这). Still, not bad!</p>
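<p>Incidentally, the spurious 二 survives our filter because every CJK ideograph, including numeral characters like 二, is in category 'Lo'. A quick check with the same unicodedata module:</p>

```python
import unicodedata

# Hanzi (including the numeral 二) are 'Lo'; Latin letters are 'Lu'/'Ll';
# CJK punctuation lands in the 'P*' categories, which is why it gets stripped.
for c in '怎二A。':
    print(c, unicodedata.category(c))
# 怎 Lo
# 二 Lo
# A Lu
# 。 Po
```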

<p>That’s all for now, but in Part 2 (and maybe Part 3?) of this post series I’ll discuss how we can use some more advanced techniques to perfect the above example and also handle cases where extracting the text isn’t so straightforward. If you can’t wait until then, the <a href="https://github.com/kerrickstaley/extracting-chinese-subs/tree/master">code is on GitHub</a>.</p>

<p>If you have any comments about this post, <a href="https://news.ycombinator.com/item?id=14440849">join the discussion on Hacker News</a>, and if you enjoyed it, please upvote on HN!</p>]]></content><author><name></name></author><category term="ocr" /><category term="python" /><category term="opencv" /><category term="chinese" /><summary type="html"><![CDATA[I’ve been watching the Chinese TV show 他来了，请闭眼 (Love Me If You Dare). It’s a good show, kinda reminiscent of the BBC series Sherlock, likewise a crime drama centered around an eccentric crime-solving protagonist and a sympathetic sidekick. You should check it out if you’re into Chinese film or are learning Chinese and want something interesting to watch.]]></summary></entry></feed>