Tuesday, June 02, 2026

Generating Podcast Transcripts with Whisper

In my previous post, I wrote about building my own LLM Wiki. I definitely need to include Avocado Toast, the podcast I co-host. The problem is that podcast content is mostly audio, while all the skills I have written are built to process Markdown-based text. So how do I include audio?

After discussing this with ChatGPT and Claude, I settled on OpenAI’s open-source Whisper model to generate transcripts from audio files. On macOS, I only need to install whisper.cpp, a C/C++ port of Whisper, with brew install whisper-cpp, then call whisper-cli to process audio files. It sounds simple, but in practice there are always pitfalls.

The large-v3 Model Can Produce Loops

Whisper currently has two mainstream large models: large-v3 and large-v3-turbo. The former is larger and slower but produces better output. The latter is smaller and 2 to 3 times faster, but its output is a little less accurate. I asked Claude Code to write the code that calls whisper-cli and to test both models. It found that large-v3 was indeed better, so we chose it first, even though each podcast episode took about 30 minutes to process.

After processing a few episodes, Claude Code noticed a loop problem in the transcripts: a sentence appears only once in the audio, but the model transcribes it multiple times, so the same line shows up repeatedly in the transcript. If Claude Code hadn’t been watching the transcript output for me, I definitely would not have noticed this myself. There is no practical way to manually read through hundreds or thousands of transcript lines. After noticing the problem, it explained to me that larger models are more likely to produce loops and suggested trying large-v3-turbo. After switching, the problem did go away.

In the end, Claude Code and I made a decision: process each episode with large-v3 first, then analyze the transcript after it finishes. If the same line appears 5 or more times in a row, treat that as a loop and regenerate the transcript with large-v3-turbo. We used this method to process the remaining audio files. About one third of the transcripts had loops and needed to be regenerated.
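The loop check is simple enough to sketch. Below is a minimal TypeScript version. It assumes the transcript is plain text with one segment per line and no timestamps, so identical sentences compare equal. `hasLoop` and `transcribeWithFallback` are hypothetical names. `transcribe` stands in for whatever actually runs whisper-cli; this is not the code Claude Code wrote for me.

```typescript
// Whisper model names used in this post.
type Model = "large-v3" | "large-v3-turbo";

// True if the same non-empty line appears `threshold` or more times in a row.
// Assumes plain-text output with one segment per line and no timestamps.
function hasLoop(transcript: string, threshold = 5): boolean {
  let previous: string | null = null;
  let run = 0;
  for (const raw of transcript.split("\n")) {
    const line = raw.trim();
    if (line === "") continue; // blank lines neither count nor break a run
    run = line === previous ? run + 1 : 1;
    previous = line;
    if (run >= threshold) return true;
  }
  return false;
}

// Try large-v3 first; if its transcript loops, regenerate with large-v3-turbo.
// `transcribe` is a hypothetical callback that runs whisper-cli with the given model.
async function transcribeWithFallback(
  audioPath: string,
  transcribe: (audioPath: string, model: Model) => Promise<string>,
): Promise<{ model: Model; text: string }> {
  const text = await transcribe(audioPath, "large-v3");
  if (!hasLoop(text)) {
    return { model: "large-v3", text };
  }
  return { model: "large-v3-turbo", text: await transcribe(audioPath, "large-v3-turbo") };
}
```

Keeping `transcribe` as a parameter keeps the loop rule separate from how whisper-cli is invoked. It also makes the fallback easy to test with canned transcripts.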

Whisper Heard My Name as “Kat”

For episodes I recorded, the opening always includes a line like “大家好,我是 Cat” (“Hello, this is Cat”), and Whisper would hear “Cat” as “Kat”. I asked Claude Code what to do. It found that Whisper accepts a text prompt, so I wrote the core podcast information into the prompt, including the correct spelling of my name:

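《牛油果烤面包》播客聊科技发展趋势,聊各行业来龙去脉。我们坐标硅谷,邀请第一线的资深专家分享给大家听!主持人:Cat、斯图亚特、Sean、Vindy、David。

(In English: “The Avocado Toast podcast talks about technology trends and the ins and outs of different industries. We’re based in Silicon Valley and invite seasoned experts from the front lines to share with everyone! Hosts: Cat, 斯图亚特, Sean, Vindy, David.”)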

With this prompt, Whisper became more likely to recognize my name correctly, although it is still not perfect.

Since I can provide a text prompt, I also append each episode’s description to the prompt. That way, when Whisper hears related information, it has a better chance of recognizing it correctly. I took a rough look at the results, and they seem pretty good.
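Here is a sketch of how the prompt and the whisper-cli arguments could be assembled.

- The flag names (-m, -f, -l, --prompt, -otxt) are from whisper.cpp’s help output as I remember it, so check whisper-cli --help on your install.
- The models/ggml-<name>.bin path is an assumption.
- So is `-l zh`: as far as I know whisper-cli defaults to English, and these episodes are in Mandarin.
- `buildPrompt`, `whisperCliArgs` and `runWhisper` are hypothetical helpers.

```typescript
import { spawnSync } from "node:child_process";

// The base prompt from this post: podcast name, topics, and host name spellings.
const BASE_PROMPT =
  "《牛油果烤面包》播客聊科技发展趋势,聊各行业来龙去脉。我们坐标硅谷,邀请第一线的资深专家分享给大家听!主持人:Cat、斯图亚特、Sean、Vindy、David。";

// Append the episode description, if any, after the base prompt.
function buildPrompt(episodeDescription: string): string {
  const description = episodeDescription.trim();
  return description === "" ? BASE_PROMPT : `${BASE_PROMPT} ${description}`;
}

// Arguments for whisper-cli. The models/ggml-<name>.bin path is an assumption;
// point it at wherever your converted models live.
function whisperCliArgs(model: string, audioPath: string, prompt: string): string[] {
  return [
    "-m", `models/ggml-${model}.bin`,
    "-f", audioPath,
    "-l", "zh", // assumption: Mandarin audio; whisper-cli defaults to English
    "--prompt", prompt,
    "-otxt", // also write the transcript to a .txt file
  ];
}

// Runs whisper-cli synchronously and fails loudly on a non-zero exit.
function runWhisper(args: string[]): void {
  const result = spawnSync("whisper-cli", args, { stdio: "inherit" });
  if (result.status !== 0) {
    throw new Error(`whisper-cli exited with status ${result.status}`);
  }
}
```

Passing arguments as an array to spawnSync avoids shell quoting problems with the Chinese punctuation and any quotes in episode descriptions.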

Wednesday, May 20, 2026

Pitfalls I Hit Writing LLM Skills

I recently started building my own LLM Wiki. My document structure is different enough from existing examples that I had to write my own skills from scratch. (I also just like tinkering.) Along the way I hit a few pitfalls worth documenting. These may not stay relevant for long — LLM capabilities move fast, and what requires a workaround today might just work in six months.

LLMs Make Mistakes Copying Semantic-Free Text

I track which blog posts have changed and need their summaries regenerated by computing a SHA256 hash of each post in JavaScript, writing it into the summary document’s front matter, and regenerating only when the hash doesn’t match. The skill takes the computed hash from JavaScript, gives it to the LLM and asks it to copy the hash into the front matter. Every so often, it would copy the hash with one character wrong.
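A minimal sketch of the change check, using Node’s built-in crypto module. The `hash:` front matter field name and the naive front matter parsing are assumptions for illustration, not my skill’s actual format.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the post's Markdown, as 64 lowercase hex characters.
function hashPost(markdown: string): string {
  return createHash("sha256").update(markdown, "utf8").digest("hex");
}

// Reads a `hash: <hex>` line from front matter delimited by `---` lines.
// Deliberately naive: no YAML library, and the field name is hypothetical.
function readStoredHash(summary: string): string | null {
  const frontMatter = summary.match(/^---\n([\s\S]*?)\n---/);
  if (!frontMatter) return null;
  const line = frontMatter[1].split("\n").find((l) => l.startsWith("hash:"));
  return line ? line.slice("hash:".length).trim() : null;
}

// Regenerate when there is no summary yet or the stored hash is stale.
function needsRegeneration(post: string, existingSummary: string | null): boolean {
  return existingSummary === null || readStoredHash(existingSummary) !== hashPost(post);
}
```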

I asked Claude Code why this happens. It explained that LLMs are good at predicting the next token based on context, but a SHA256 hash has no semantic content — it’s effectively random characters. With no meaning to latch onto, the LLM occasionally produces a wrong character. Claude Code updated the skill to strongly emphasize that the hash must be copied exactly. That reduced the errors. Eventually I’ll rework the flow to have JavaScript generate the front matter with the hash already in place and have the LLM fill in only the content — so there’s nothing to copy wrong.
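A sketch of that planned rework, under the same kind of assumptions: a hypothetical `hash:` field and a placeholder comment the LLM replaces. Code writes the whole front matter, so the hash never passes through the LLM. The `generated` timestamp field is also hypothetical. It comes from `Date`, which sidesteps the stale-time problem described in the next section.

```typescript
import { createHash } from "node:crypto";

// The only part of the file the LLM is asked to write.
const SUMMARY_PLACEHOLDER = "<!-- SUMMARY -->";

// Code writes the front matter, including the hash and a UTC timestamp,
// so neither value ever passes through the LLM.
function renderSummaryTemplate(post: string, generatedAt: Date = new Date()): string {
  const hash = createHash("sha256").update(post, "utf8").digest("hex");
  return [
    "---",
    `hash: ${hash}`,
    `generated: ${generatedAt.toISOString()}`,
    "---",
    SUMMARY_PLACEHOLDER,
    "",
  ].join("\n");
}

// Swap the placeholder for the LLM's summary. A replacer function is used so
// that `$` sequences in the summary are inserted literally.
function fillSummary(template: string, llmSummary: string): string {
  if (!template.includes(SUMMARY_PLACEHOLDER)) {
    throw new Error("template is missing the summary placeholder");
  }
  return template.replace(SUMMARY_PLACEHOLDER, () => llmSummary.trim());
}
```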

LLMs Have No Sense of Time Passing

My skill writes the current UTC timestamp into the document front matter. I ran it once, waited an hour, ran it again, and found it had written the same timestamp as before.

An LLM running in the same conversation remembers the current time from when it first looked it up and doesn’t check again. Once it has a timestamp, it treats that as the current time indefinitely — like a stopped clock. To fix this, I had the skill explicitly call a script to get the timestamp instead of asking the LLM to produce it. Since I’m already using JavaScript in the skill, I just have the LLM run:

node -e 'console.log(new Date().toISOString())'

Self-Contradiction and Over-Cleverness

I wrote an interactive skill that finds candidate concept pairs in my wiki that could be merged (synonyms, related concepts, or parent-child relationships) and asks me whether to merge them and in which direction. The final call is mine.

The skill presents a 4-option menu for each pair:

  • Merge A into B
  • Merge B into A
  • Dismiss (don’t suggest merging A and B again)
  • Skip (suggest again next time)

I also asked the LLM to recommend a direction if it could decide, placing that option first with a “(Recommended)” label. So the expected output looks like:

  • Merge B into A (Recommended)
  • Merge A into B
  • Dismiss
  • Skip

What I actually got was:

  • Dismiss (Recommended)
  • Merge B into A
  • Merge A into B
  • Skip

That’s self-contradictory. The skill instructs the LLM to surface only pairs worth merging. If it recommends not merging, it’s undermining its own earlier judgment.

Even funnier: sometimes the LLM would bundle two pairs together and present options like:

  • Skip both (neither A+B nor C+D)
  • Review each pair individually

I went back and had Claude Code tighten the constraints in the skill prompt. The behaviors went away. I still don’t understand exactly why more constraints help, or at what point adding more constraints starts degrading the quality of the skill’s output.
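A different approach, not what I did here, is to stop asking the LLM to write the menu at all. The harness builds the menu in code, one pair at a time, and the LLM only answers which direction it recommends, if any. Below is a minimal TypeScript sketch of that idea; `buildMergeMenu`, `Direction` and the action names are hypothetical. Dismiss can never get the “(Recommended)” label, and each call covers exactly one pair. So neither of the odd outputs above could appear.

```typescript
// Which merge direction the LLM recommends, if any.
type Direction = "AintoB" | "BintoA";

type Action = "merge-a-into-b" | "merge-b-into-a" | "dismiss" | "skip";

interface MenuOption {
  label: string;
  action: Action;
}

// Builds the fixed 4-option menu for a single pair. Only merge options can be
// marked "(Recommended)", and the recommended one is moved to the top.
function buildMergeMenu(a: string, b: string, recommended?: Direction): MenuOption[] {
  const aIntoB: MenuOption = { label: `Merge ${a} into ${b}`, action: "merge-a-into-b" };
  const bIntoA: MenuOption = { label: `Merge ${b} into ${a}`, action: "merge-b-into-a" };
  const merges = recommended === "BintoA" ? [bIntoA, aIntoB] : [aIntoB, bIntoA];
  if (recommended !== undefined) {
    merges[0] = { ...merges[0], label: `${merges[0].label} (Recommended)` };
  }
  return [
    ...merges,
    { label: "Dismiss", action: "dismiss" },
    { label: "Skip", action: "skip" },
  ];
}
```

The trade-off is flexibility: the LLM can no longer phrase the options, but the menu’s shape becomes a guarantee rather than a request.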

Takeaways

All of these happened on Sonnet 4.6. Whether the same issues occur on Opus 4.7, I don’t know. That’s the frustrating thing about LLM pitfalls — you can never be sure if a problem is model-specific or version-specific. A fix that needs to live at the harness level today might be unnecessary six months from now. Whether this post ages well is genuinely unclear.

Saturday, April 08, 2023

Strongly Typed String Literal Split-Map-Join in TypeScript

The Problem: Write a strongly typed TypeScript function that takes a valid CSS property name in kebab-case (used in CSS) and returns the same property name in camelCase (usually used in CSS-in-JS). For example, calling this function with "font-size" should return "fontSize".

The Strongly Typed String Literal Requirement: A TypeScript type that contains all valid kebab-case CSS property names is provided. Make sure TypeScript can infer the correct camelCase output type from this function. To explain it in code:

type kebabCasePropertyName =
  | 'align-content' 
  | 'align-items'
  | 'align-self'
  | 'background'
  | 'background-attachment'
  | 'background-color'
  | 'background-image'
  /* remaining valid property names */

function convert(propertyName: kebabCasePropertyName) /*: define return type */ {
  /* implement function */
}

const camelCasePropertyName = convert('align-content');
// typeof camelCasePropertyName should be 'alignContent'

convert('invalid-property-name');
// TypeScript should throw a compile time error

Here is a TypeScript Playground with the same code. If you want to try solving this problem yourself, go ahead and try it out. You may find a solution better than the one I’m going to share below.

The Solution: This is the TypeScript Playground with the solution code. Now we can go into understanding how it works.

type Split<S, D>

This generic type splits string literal S with string literal delimiter D.

type Split<S extends string, D extends string>
  = string extends S ? Array<string>
  : S extends '' ? []
  : S extends `${infer T}${D}${infer U}` ? [T, ...Split<U, D>]
  : [S];

Here S extends string means S has to be a string. More precisely, S has to be a subtype of string: some subset of all possible string values. The same applies to D, so S and D have to be strings or TypeScript will report a compile-time error.

The next few lines use the ternary conditional operator several times to pattern match different possible types of S. The following pseudo-code may help you follow the logic:

type Split<S extends string, D extends string>
if (string extends S) then return Array<string>
else if (S extends '') then return []
else if (S extends `${infer T}${D}${infer U}`) then return [T, ...Split<U, D>]
else return [S];

First, we try to match string extends S. If it passes, that means S isn’t a string literal. (Previously we already knew S is a subset of string. If string is also a subset of S then S is exactly string. Nothing more. Nothing less.) It’s a string and its value is unknown at compile time. There’s nothing we can do here. Split<S, D> can only be narrowed down to Array<string>.

Then we try to match S extends ''. This matches only when S is the empty string literal '', because no other string literal type is assignable to ''. In that case we can narrow Split<S, D> down to an empty array, [].

And then we try to match S extends `${infer T}${D}${infer U}`. There are two concepts we need to understand here:

  1. TypeScript template literal types. They use the same syntax as JavaScript template literals but operate on types, so TypeScript can compute all the possible string interpolation outcomes.
  2. The infer keyword. It can only be used in the extends clause of a conditional type. It lets us deconstruct a type that’s constructed from other types.

So here we try to deconstruct S into the template literal type `${infer T}${D}${infer U}`. For example, in Split<'hello-world', '-'>, D is '-', so S can be deconstructed into T = 'hello' and U = 'world', because `${T}${D}${U}` constructs 'hello-world'. If D appears more than once, T gets the part before the first occurrence and U gets the rest. By using infer, we ask TypeScript to figure out T and U for us.

If the deconstructing works, we can narrow down Split<S, D> into [T, ...Split<U, D>]. This is very similar to how we would implement a JavaScript split function with recursion:

function split(string, delimiter) {
  const index = string.indexOf(delimiter);
  return index >= 0
    ? [
      string.substring(0, index),
      // Skip the whole delimiter, not just one character.
      ...split(string.substring(index + delimiter.length), delimiter)
    ]
    : [string];
}

If the deconstructing doesn’t work, the last line in the pseudo-code is just like the last line inside the JavaScript above. It means S is a string literal but it doesn’t contain D, so we return [S]. We can see the similarity between JavaScript and TypeScript type expressions.
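Hovering over types in an editor works, but we can also pin down what Split<S, D> infers with compile-time assertions. Equal and Expect below are a widely used community idiom from the type-challenges project, not built into TypeScript. If any assertion is wrong, the file fails to compile. This is a sketch for checking the cases discussed above.

```typescript
// Split<S, D> exactly as defined above.
type Split<S extends string, D extends string>
  = string extends S ? Array<string>
  : S extends '' ? []
  : S extends `${infer T}${D}${infer U}` ? [T, ...Split<U, D>]
  : [S];

// Compile-time equality: Expect<Equal<X, Y>> only compiles when X and Y are identical.
type Equal<X, Y> =
  (<T>() => T extends X ? 1 : 2) extends (<T>() => T extends Y ? 1 : 2) ? true : false;
type Expect<T extends true> = T;

type SplitCases = [
  Expect<Equal<Split<'hello-world', '-'>, ['hello', 'world']>>,
  Expect<Equal<Split<'background-attachment', '-'>, ['background', 'attachment']>>,
  Expect<Equal<Split<'background', '-'>, ['background']>>, // no delimiter: [S]
  Expect<Equal<Split<'', '-'>, []>>,                       // empty string: []
  Expect<Equal<Split<string, '-'>, string[]>>,             // not a literal: Array<string>
];

// The inferred tuple also constrains values; ['hello', 'world', 'x'] would not compile.
const parts: Split<'hello-world', '-'> = ['hello', 'world'];
```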

type Join<A, D>

This is like reversing Split<S, D>, in a very similar recursive manner. `${T}${D}${Join<U, D>}` represents that recursion.

type Join<A extends Array<string>, D extends string>
  = A extends [] ? ''
  : A extends [infer T extends string] ? `${T}`
  : A extends [infer T extends string, ...infer U extends Array<string>] ? `${T}${D}${Join<U, D>}`
  : string;

Just as Split<S, D> needs S and D to be string literal types to produce a precise result, Join<A, D> needs A to be a tuple type whose elements are all string literal types. If A doesn’t satisfy these requirements, we can only narrow Join<A, D> down to string.

Here we use template literal type `${T}${D}${Join<U, D>}` to construct one string type from multiple string types. This is the opposite operation of how we deconstruct in Split<S, D>.
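The same compile-time assertion idiom can confirm how Join<A, D> behaves, including the fallback to string when A is a plain string[] instead of a tuple. As before, Equal and Expect are a community idiom rather than built-ins. `infer ... extends` needs TypeScript 4.7 or later.

```typescript
// Join<A, D> exactly as defined above.
type Join<A extends Array<string>, D extends string>
  = A extends [] ? ''
  : A extends [infer T extends string] ? `${T}`
  : A extends [infer T extends string, ...infer U extends Array<string>] ? `${T}${D}${Join<U, D>}`
  : string;

// Compile-time equality: Expect<Equal<X, Y>> only compiles when X and Y are identical.
type Equal<X, Y> =
  (<T>() => T extends X ? 1 : 2) extends (<T>() => T extends Y ? 1 : 2) ? true : false;
type Expect<T extends true> = T;

type JoinCases = [
  Expect<Equal<Join<['font', 'size'], '-'>, 'font-size'>>,
  Expect<Equal<Join<['a', 'b', 'c'], ''>, 'abc'>>,
  Expect<Equal<Join<[], '-'>, ''>>,
  Expect<Equal<Join<string[], '-'>, string>>, // not a tuple: falls through to string
];

// A value typed by Join must be exactly the joined literal.
const joined: Join<['font', 'size'], '-'> = 'font-size';
```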

type LowercaseArray<A>

Again we are using recursion to iterate through an array. This is similar to Join<A, D>. However, we don’t return a template literal type. We return a new array type that contains new string literal types.

type LowercaseArray<A extends Array<string>>
  = A extends [] ? []
  : A extends [infer T extends string, ...infer U extends Array<string>] ? [Lowercase<T>, ...LowercaseArray<U>]
  : A;

TypeScript has a built-in Lowercase<T> that converts a string literal type to its lowercase version, so we don’t need to implement this ourselves.

type CapitalizeArray<A>

It’s very similar to LowercaseArray<A>. We combine the built-in Capitalize<T> with Lowercase<T>, so each word comes out lowercase with its first letter capitalized.

type CapitalizeArray<A extends Array<string>>
  = A extends [] ? []
  : A extends [infer T extends string, ...infer U extends Array<string>] ? [Capitalize<Lowercase<T>>, ...CapitalizeArray<U>]
  : A;

type CamelCaseArray<A>

In camelCase, the first word is all lowercase and each subsequent word is lowercase with its first character capitalized. This can be achieved by combining LowercaseArray<A> and CapitalizeArray<A> into a new type CamelCaseArray<A>.

Finally, we can combine this type with Split<S, D> and Join<A, D> to create CamelCase<S>. It’s just like how we would implement this as a JavaScript function: a chained split-map-join operation.

function convert(propertyName) {
  return propertyName
    .split('-')
    .map((word, index) => index === 0
      ? word
      : `${word.charAt(0).toUpperCase()}${word.substring(1)}`)
    .join('');
}
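Putting the pieces together, here is one way to write CamelCaseArray<A>, CamelCase<S> and a typed convert. It’s a sketch that may differ from the Playground solution:

- It needs TypeScript 4.7 or later for `infer ... extends`.
- The property list is trimmed to a few names.
- The runtime body is the JavaScript split-map-join above, with a type assertion on the result.

```typescript
// Split, Join, and CapitalizeArray as defined earlier in this post.
type Split<S extends string, D extends string>
  = string extends S ? Array<string>
  : S extends '' ? []
  : S extends `${infer T}${D}${infer U}` ? [T, ...Split<U, D>]
  : [S];

type Join<A extends Array<string>, D extends string>
  = A extends [] ? ''
  : A extends [infer T extends string] ? `${T}`
  : A extends [infer T extends string, ...infer U extends Array<string>] ? `${T}${D}${Join<U, D>}`
  : string;

type CapitalizeArray<A extends Array<string>>
  = A extends [] ? []
  : A extends [infer T extends string, ...infer U extends Array<string>] ? [Capitalize<Lowercase<T>>, ...CapitalizeArray<U>]
  : A;

// First word lowercased (what LowercaseArray does to each element),
// every later word run through CapitalizeArray.
type CamelCaseArray<A extends Array<string>>
  = A extends [infer T extends string, ...infer U extends Array<string>] ? [Lowercase<T>, ...CapitalizeArray<U>]
  : A;

// Split on '-', camel-case the words, join with ''. The `infer ... extends`
// steps restate the Array<string> constraint so each nested generic type-checks.
type CamelCase<S extends string>
  = Split<S, '-'> extends infer Words extends Array<string>
    ? CamelCaseArray<Words> extends infer Camel extends Array<string>
      ? Join<Camel, ''>
      : never
    : never;

// Trimmed-down stand-in for the full list of valid property names.
type KebabCasePropertyName = 'align-content' | 'align-items' | 'background' | 'background-color';

// Runtime split-map-join; the cast tells TypeScript the result matches CamelCase<P>.
function convert<P extends KebabCasePropertyName>(propertyName: P): CamelCase<P> {
  return propertyName
    .split('-')
    .map((word, index) => index === 0
      ? word
      : `${word.charAt(0).toUpperCase()}${word.substring(1)}`)
    .join('') as CamelCase<P>;
}

const alignContent: 'alignContent' = convert('align-content'); // type-checks
// convert('invalid-property-name'); // compile-time error: not a KebabCasePropertyName
```

The type assertion (as CamelCase<P>) is the one place where we trust ourselves. TypeScript can’t prove that the runtime string matches the computed literal type, so the runtime split-map-join and the type-level version have to be kept in agreement by hand.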

How about the opposite operation? How can we create a TypeScript generic type that converts camelCase back to kebab-case? That’s an exercise for you. There’s no clear delimiter like '-' in this operation. Think about how you would do it in JavaScript and use pattern matching in TypeScript to achieve the same result.

Friday, June 19, 2020

Job Promotion: Scaling-bound or Opportunity-bound?

These are my thoughts after reading a conversation about why promotion to some levels at Facebook is harder than to others. Promotion is easier when the next level is the same job as your current level, just a more mature version of it. It’s harder when the next level is a different job that requires new skills.

Based on this I would break down promotion requirements into two categories:

  1. Scaling your current skills.
  2. Learning and practicing new skills.

Promotions that mostly require the first category are the easier ones. The next level is more or less the same job. Let’s call it scale-bound promotion. Promotions that heavily involve the second category are the harder ones. The next level is like a different job. I’ll call it opportunity-bound promotion.

The first step to optimize your promotion is to identify whether your next level is scale-bound or opportunity-bound. In the conversation I read, people divided Facebook levels into buckets – [3, 4], [5], [6, 7], [8, 9] – and levels in the same bucket are the same job. Crossing buckets requires you to learn and practice new skills because it’s a different job.


Opportunity-bound promotion is harder because you need opportunities to learn and practice new skills that are required by your next level while your current level doesn’t provide such opportunities. Usually, those opportunities come to you naturally when you are at your next level. This becomes a chicken and egg problem – you are not at the next level so you don’t get opportunities required by your next level.

Even if the promotion to your next level is opportunity-bound, you need to deal with the scale-bound part well first. Otherwise, when someone offers you an opportunity you might fail because your current skills don’t scale enough. That person will regret offering you the opportunity. The next opportunity will be harder to come by. It’s better to make sure your current skills are mostly scaled to meet the expectation of your next level first.


When you are confronted with an opportunity-bound situation, the best-case scenario is having a great manager. The manager should connect you with the right opportunities to develop the skills you don’t have. These opportunities should push you out of your comfort zone but not too far away. That’s the tailwind setup. You are in good hands.

The meh scenario is when the team is functioning okay but your manager isn’t actively optimizing the part of the team you are in, which means your manager isn’t actively helping you find new opportunities. This scenario is common when your manager is too aggressive and spreads themselves too thin. It’s also common when your manager is complacent and doesn’t want to further develop the team.

In this scenario, you need better-than-average soft skills (compared to your peers at your level) to build relationships with a broader group of people in your company. Learn about the broader business your team is in and discover opportunities yourself. You might also need to outperform your peers when there are fewer opportunities than qualified people.

The worst-case scenario is when the team is malfunctioning. People don’t know what they are supposed to do or what they can do to meaningfully help the team.

It’s like war. There are casualties. People can’t perform well because the team is broken, but they get fired anyway. People who don’t get fired realize the situation they are in and might jump ship as fast as they can. Interestingly, war heroes only emerge from war. A field promotion can fast-track you at a speed that’s not achievable in peacetime. Opportunities open up when there are casualties and deserters.

Tuesday, November 26, 2019

Front-End Learning for Programmers

If you are an existing programmer (fluent in one common programming language) and want to learn front-end development (HTML+CSS+JS), I would recommend using freeCodeCamp and picking only the modules you need. If you want to learn just enough to work on modern front-end projects or start a new project with Create React App, below are the modules I think you should learn on freeCodeCamp.

“✔️” means you should learn it. “❌” means you could skip it. “❗” means you could skip if you are in a hurry but you should learn if you have time.

  • Responsive Web Design
    • ✔️ Basic HTML and HTML5
    • ✔️ Basic CSS
    • ✔️ Applied Visual Design
    • ❌ Applied Accessibility
    • ❌ Responsive Web Design Principles
    • ✔️ CSS Flexbox
    • ❌ CSS Grid
    • ✔️ Responsive Web Design Projects
  • JavaScript Algorithms and Data Structures
    • ✔️ Basic JavaScript
    • ✔️ ES6
    • ❌ Regular Expressions
    • ✔️ Debugging
    • ✔️ Basic Data Structures
    • ❗ Basic Algorithm Scripting
      • (Use it as practice for writing more JavaScript code.)
    • ✔️ Object Oriented Programming
    • ✔️ Functional Programming
    • ❗ Intermediate Algorithm Scripting
      • (Use it as practice for writing more JavaScript code.)
    • ❗ JavaScript Algorithms and Data Structures Projects
      • (Use it as practice for writing more JavaScript code.)
  • Front End Libraries
    • ✔️ Bootstrap
    • ❌ jQuery
    • ❌ Sass
    • ✔️ React
    • ✔️ Redux
    • ✔️ React and Redux
    • ✔️ Front End Libraries Projects

Tuesday, November 12, 2019

Progressive Web App as Share Target on Android

I built a PWA (Progressive Web App) to trace a shortened URL back to its original URL. (You can find it here.) I don’t want to copy a URL and paste it into my app. I want to use Android’s sharesheet to send any link in Chrome straight to my app. How do I do that?

Google provides good documentation on this. We need to add a share_target section to manifest.json to declare that our PWA can act as a share target. Most of the properties in this section can be thought of as attributes with the same name on a <form> element. For example, { "action": "/share", "method": "POST" } is like <form action="/share" method="POST">.

The params subsection lets us rename the parameters if we already have a convention for naming query parameters in GET requests or form fields in POST requests. Otherwise, we can keep their original names. One caveat is that Android doesn’t use the url parameter, so a shared URL comes through the text parameter. In my app I need to coalesce these two parameters to get the input from the user.
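A minimal sketch of that coalescing step. It assumes a GET share target whose params map to the names url and text, so the values arrive as query parameters. `getSharedUrl` is a hypothetical helper. Pulling the first http(s) link out of text is a defensive assumption, in case the shared text contains more than just the link.

```typescript
// Returns the shared link from a GET share-target request, or null if none.
// Assumes the manifest's params map to the names "url" and "text".
function getSharedUrl(params: URLSearchParams): string | null {
  // Prefer `url`, fall back to `text` (where Android puts links); `||` also skips empty strings.
  const candidate = params.get("url") || params.get("text") || "";
  // Defensive: take the first http(s) URL in case `text` holds more than the link.
  const match = candidate.match(/https?:\/\/\S+/);
  return match ? match[0] : null;
}

// Usage on the share target page:
// const shared = getSharedUrl(new URL(location.href).searchParams);
```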

Is there more? Yes! Twitter has a great PWA, and we can look at its manifest.json. Here’s the beautified version of the share_target section:

"share_target": {
  "action": "compose/tweet",
  "enctype": "multipart/form-data",
  "method": "POST",
  "params": {
    "title": "title",
    "text": "text",
    "url": "url",
    "files": [
      {
        "name": "externalMedia",
        "accept": [
          "image/jpeg",
          "image/png",
          "image/gif",
          "video/quicktime",
          "video/mp4"
        ]
      }
    ]
  }
}

It has a files subsection under the params section. This is part of Web Share Target Level 2. It lets us accept files from the sharesheet and assign them to different parameter names based on MIME type or file extension. My app doesn’t need this capability, but it’s good to know what’s possible.

If you like my post, you can subscribe through email or RSS/Atom. That makes sure you won’t miss my future posts.

Wednesday, November 06, 2019

Batch Sending Email with Attachments through AppleScript

I wanted to learn a little AppleScript, and I needed to help a friend send out emails to welcome new students to the school. The requirements are:

  1. addressing each recipient by their name in email content;
  2. attaching the same file in all emails.

After some research and tinkering I have a script to send emails:

theRecipients is the list of recipient names and email addresses. This is for requirement #1. theAttachment isn’t hardcoded to any file path. It will prompt and let me choose a file when I run the AppleScript.

The trickiest part is the delay 1. Without this line, emails will be sent without the attachment. It’s a hack to make sure each email has the attachment. I don’t know why it works and I can’t find an explanation online.

After building this AppleScript, I learned that Google Apps Script is another great way to automate sending emails through Gmail (or G Suite). I will learn Apps Script and write a post about it next time. If you like this kind of post, you can subscribe through email or RSS/Atom.