Josh Pitzalis

A Programmer’s Learning Log

Read this first

Trusting your LLM-as-a-Judge

The problem with using LLM Judges is that it’s hard to trust them. If an LLM judge rates your output as “clear”, how do you know what it means by clear? How clear is clear for an LLM? What kinds of things does it let slide? How reliable is it over time?

In this post, I’m going to show you how to align your LLM Judges so that you trust them to some measurable degree of confidence. I’m going to do this with as little setup and tooling as possible, and I’m writing it in TypeScript, because there aren’t enough posts about this for non-Python developers.
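One way to put a number on that trust is to compare the judge's verdicts with labels you assigned by hand. The sketch below is not the method from the full post, just a minimal illustration under that assumption; the `LabeledExample` type, the `agreementStats` function, and the sample data are all made up for the example.

```typescript
// Hypothetical labels: true means "pass", false means "fail".
type LabeledExample = { human: boolean; judge: boolean };

// Compares judge verdicts against human labels and reports how often they agree.
function agreementStats(examples: LabeledExample[]) {
  let tp = 0;
  let tn = 0;
  let fp = 0;
  let fn = 0;
  for (const { human, judge } of examples) {
    if (human && judge) tp++;
    else if (!human && !judge) tn++;
    else if (!human && judge) fp++;
    else fn++;
  }
  const total = examples.length;
  return {
    // Share of examples where judge and human gave the same verdict.
    agreement: total === 0 ? 0 : (tp + tn) / total,
    // Of the outputs a human passed, how many the judge also passed.
    truePositiveRate: tp + fn === 0 ? 0 : tp / (tp + fn),
    // Of the outputs a human failed, how many the judge also failed.
    trueNegativeRate: tn + fp === 0 ? 0 : tn / (tn + fp),
  };
}

// Made-up sample data for illustration only.
const sample: LabeledExample[] = [
  { human: true, judge: true },
  { human: true, judge: false },
  { human: false, judge: false },
  { human: false, judge: true },
  { human: true, judge: true },
];

const stats = agreementStats(sample);
console.log(stats);
```

Splitting agreement into a true positive rate and a true negative rate shows which way the judge leans: a low true negative rate means it lets things slide that a human would have failed.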

Step 0 — Setting up your project

Let’s create a simple command-line customer support bot. You ask it a question, and it uses some context to respond with a helpful reply.

mkdir SupportBot
cd SupportBot
pnpm init

Install the necessary dependencies (we’re going to use the ai-sdk and evalite for testing).

pnpm add ai @ai-sdk/openai dotenv
...



Setting Up Your First Eval with Typescript

One big barrier to testing prompts systematically is that writing evaluations usually requires a ton of setup and maintenance. Also, if you’re a TypeScript engineer, there aren’t many practical guides on the topic, as most of the literature out there is for Python developers.

I want to show you how to write your first AI evaluation framework with as little setup as possible.
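To give a rough idea of the shape of an eval, here is a minimal sketch in plain TypeScript: a list of test cases, a task to run, and a scorer. It is not the framework from the full post. `fakeSummarize` is a hypothetical stand-in for a real model call so the sketch runs offline, and `runEval`, `containsExpected`, and the sample cases are assumptions made up for illustration.

```typescript
type EvalCase = { input: string; expected: string };
type Task = (input: string) => Promise<string>;
type Scorer = (output: string, expected: string) => number;

// Hypothetical stand-in for a model call: returns the first sentence of the input.
const fakeSummarize: Task = async (input) => input.split(".")[0].trim();

// Scores 1 when the expected phrase appears in the output, otherwise 0.
const containsExpected: Scorer = (output, expected) =>
  output.toLowerCase().includes(expected.toLowerCase()) ? 1 : 0;

// Runs every case through the task and returns the average score.
async function runEval(cases: EvalCase[], task: Task, scorer: Scorer): Promise<number> {
  let total = 0;
  for (const testCase of cases) {
    const output = await task(testCase.input);
    total += scorer(output, testCase.expected);
  }
  return cases.length === 0 ? 0 : total / cases.length;
}

// Made-up cases for illustration only.
const cases: EvalCase[] = [
  { input: "Cats sleep a lot. They also eat.", expected: "cats sleep" },
  { input: "Rain is expected tomorrow. Bring an umbrella.", expected: "umbrella" },
];

runEval(cases, fakeSummarize, containsExpected).then((score) => {
  console.log(`Average score: ${score}`);
});
```

Swapping `fakeSummarize` for a function that calls your model is the only change needed to point the same loop at a real prompt.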

What you will need

  • LLM API key with a little credit on it (I use Gemini for this walkthrough).

Step 0 — Set up your project

Let’s start with the most basic AI feature. A simple text completion feature that runs on the command line.

mkdir Summarizer
cd Summarizer
pnpm init

Install the AI SDK package, ai, along with other necessary dependencies.

pnpm i ai dotenv @types/node tsx typescript

Once you have the API key, create a .env file and save your API key:

GOOGLE_GENERATIVE_AI_API_KEY=your_api_key

Create an...



Fuzzy Best Practices

Getting back into development after years, I started writing a little Express server for a new project. I realised I don’t have an implicit checklist of best practices in my head anymore.

I know I need to handle errors on my endpoints and functions, especially the async ones. I’ve forgotten what errors I need to defend against. The specifics are all fuzzy. It feels vague and overwhelming.
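As one example of what an item on that checklist might look like: in Express 4, a rejected promise inside an async route handler is not passed to the error middleware unless you forward it yourself. The sketch below shows a small wrapper that does the forwarding. The `Req`, `Res`, and `Next` types are simplified stand-ins for Express's own types, declared here only so the example runs without installing anything, and `catchAsync` is a name made up for illustration.

```typescript
// Simplified stand-ins for Express's request, response, and next types.
type Req = { params: Record<string, string> };
type Res = { json: (body: unknown) => void };
type Next = (err?: unknown) => void;
type AsyncHandler = (req: Req, res: Res, next: Next) => Promise<void>;

// Wraps an async handler so a rejected promise is passed to next(err)
// instead of becoming an unhandled rejection.
const catchAsync =
  (handler: AsyncHandler) =>
  (req: Req, res: Res, next: Next): void => {
    handler(req, res, next).catch(next);
  };

// Hypothetical handler that fails, to show the error being forwarded.
const getProject = catchAsync(async (req, res) => {
  if (!req.params.id) throw new Error("Missing project id");
  res.json({ id: req.params.id });
});

getProject(
  { params: {} },
  { json: (body) => console.log(body) },
  (err) => console.log("Forwarded to error middleware:", err)
);
```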

What I need is an explicit checklist. Like a list of 16 things I must check before publishing a commit.

Maybe it’s not 16 things; it could be 36. The point is once I have a checklist, it will be easier to add, adjust, or change things as needed. Now I’m just guessing and I can already see the mess I’m going to get myself into.



typescript

Variables

let apples = 5;
let speed: string = 'fast';
let hasName: boolean = true;
let nothingMuch: null = null;
let nothing: undefined = undefined;

Built in objects

let now: Date = new Date();

Arrays

let colors: string[] = ['red', 'green', 'blue'];
let myNumbers: number[] = [1, 2, 3];
let truths: boolean[] = [true, true, false];

Classes

class Car {}
let car: Car = new Car();

Object literals

let point: { x: number; y: number } = {
  x: 10,
  y: 20,
};

Functions

const logNumber: (i: number) => void = (i: number) => {
  console.log(i);
};

// or

const logNumber = (i: number): void => {
  console.log(i);
};



Fakes, spies, stubs & mocks

I learned how to test code using Jest. Today I’m working on a codebase that doesn’t have Jest and I have to use Jasmine and Sinon instead.

The most confusing bit is going through the documentation and trying to understand what fakes, spies, stubs, and mocks are. In Jest, I just dealt with mocks. I thought everything was a mock.

I’ve learned that all of these things fall under the umbrella of test doubles. I think I’ve figured out what the difference between each type of test double is now…

Spies - Listen

So a spy is when you just listen to the implementation of a dependency. You run the original function, but you just want to test whether it was run, how many times it was run, and what parameters it was run with.
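To make that concrete, here is a hand-rolled spy in plain TypeScript. It is only a sketch of the idea, not how Jasmine or Sinon implement theirs; `createSpy` and `add` are names made up for the example.

```typescript
// Wraps a function so every call is recorded, then calls through to the original.
function createSpy<A extends unknown[], R>(original: (...args: A) => R) {
  const calls: A[] = [];
  const spy = (...args: A): R => {
    calls.push(args); // remember the parameters it was run with
    return original(...args); // still run the real implementation
  };
  return { spy, calls };
}

const add = (a: number, b: number): number => a + b;
const { spy: spiedAdd, calls } = createSpy(add);

spiedAdd(1, 2);
spiedAdd(3, 4);

console.log(calls.length); // 2
console.log(calls[1]); // [ 3, 4 ]
```

The original function still does its work. The spy only adds a record of whether it ran, how many times, and with which parameters.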

Mocks - Bypass

A mock is when you bypass a function altogether. If you have a dependency that’s coupled to your code that does important stuff but is unrelated to what...



React Testing Library and Redux Observable

You can use the test scheduler to test sequences in epics, but integration tests break down if you don’t include redux observable in your test redux wrapper.

Here is how I successfully managed to integrate redux observable with react testing library by mocking out the store. This wrapper replaces the default react-testing-library render method with a render method that gives components access to redux, react-router and redux observable in a test environment.

import React from 'react';
import { render as rtlRender } from '@testing-library/react';
import { Router } from 'react-router-dom';
import { createMemoryHistory } from 'history';
import { createStore, applyMiddleware } from 'redux';
import { Provider } from 'react-redux';
import { createEpicMiddleware } from 'redux-observable';
import { rootReducer, rootEpic, dependencies } from './store';

function configureStore(initialState) {
...



Redux Observable Loops

You can make redux complicated but at the most fundamental level, you have a view layer that lets you fire actions that can update the state. There is middleware and actions hit reducers which update the store, but let’s consider those implementation details. Practically speaking, it’s a tiny action loop where View > Action > State.

Redux observable introduces a separate action loop that runs alongside the tiny loop. The view layer lets you fire actions that trigger epics, which can fire off more actions, that can update the state. So it’s View > Action > Epic > Action > State.
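The two loops can be simulated in a few lines of plain TypeScript. This is a deliberate simplification and not the redux-observable API: real epics receive and return streams of actions, while the hypothetical `Epic` type here maps a single action to an array of actions. `createTinyStore` and `fetchEpic` are made up for illustration.

```typescript
type Action = { type: string };
type State = { log: string[] };
type Epic = (action: Action) => Action[];

// Tiny loop: the reducer records every action it sees.
const reducer = (state: State, action: Action): State => ({
  log: [...state.log, action.type],
});

// Epic loop: reacts to one action type and emits a different one.
// Emitting the same type it received would make it trigger itself forever.
const fetchEpic: Epic = (action) =>
  action.type === "fetch/start" ? [{ type: "fetch/done" }] : [];

function createTinyStore(epics: Epic[]) {
  let state: State = { log: [] };
  const dispatch = (action: Action): void => {
    // The action hits the reducer first.
    state = reducer(state, action);
    // Then the epics see it, and whatever they emit re-enters the loop.
    for (const epic of epics) {
      epic(action).forEach((next) => dispatch(next));
    }
  };
  return { dispatch, getState: () => state };
}

const store = createTinyStore([fetchEpic]);
store.dispatch({ type: "fetch/start" });
console.log(store.getState().log); // [ 'fetch/start', 'fetch/done' ]
```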

This diagram helped me put this all together.

PNG image-0FBFB8DA8517-1.png

A few important details:

  1. All actions will run through the tiny loop before they run through the epic loop.
  2. All Epics must return an action or it’s doom.
  3. If your epic returns the same action it received, you will create an infinite loop of doom.
  4. If your...



Using Redux Observable For Async Stuff

Redux doesn’t handle async work too well. Thunk is the go-to solution, but it’s not always great for testing.

Here is how to do a basic async data fetch with redux observable:

export const exampleEpic = (action$, state$, { later }) =>
  action$.pipe(
    ofType('projects/updateTitle'),
    debounceTime(1000),
    switchMap(({ payload }) =>
      from(later(2000, payload)).pipe(
        map(res => ({
          type: 'projects/fetchFulfilled',
          payload: res,
        }))
      )
    )
  );

Epics are redux observable’s counterpart to thunks. All epics take three parameters (action$, state$, { dependencies }). The last two are optional.
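The epic above pulls a later function out of its third parameter but the excerpt doesn't show it. Here is a guess at what such a helper might look like, assuming it resolves with the given value after a delay. Both `later` and `fakeDependencies` are hypothetical. In redux-observable, an object like `dependencies` is typically handed over with createEpicMiddleware({ dependencies }).

```typescript
// Hypothetical helper: resolves with `value` after `delay` milliseconds.
const later = <T>(delay: number, value: T): Promise<T> =>
  new Promise((resolve) => setTimeout(() => resolve(value), delay));

// The kind of object an epic would destructure as its third argument.
const dependencies = { later };

// In a test, the same shape can be swapped for a version that resolves immediately.
const fakeDependencies = {
  later: <T>(_delay: number, value: T): Promise<T> => Promise.resolve(value),
};

dependencies.later(10, "real").then((value) => console.log(value));
fakeDependencies.later(2000, "fake").then((value) => console.log(value));
```

Because the epic only sees whatever is injected, a test can pass in the fake and skip the wait.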

The action$ parameter is a stream of all redux actions emitted over time. state$ is a stream of your redux store’s state. dependencies can contain any side effects that you want to use in your epic. Passing in side effects as dependencies is super handy...



Setting Up Redux Observable

If you are adding redux observable to a new redux project, the first step is to install it along with RxJS:

npm i redux-observable rxjs

The next step is to set up the middleware.

Create a middleware and pass it to the createStore function from Redux. Then you create a root epic that combines all your epics and call epicMiddleware.run() with the rootEpic.

import { createStore, compose, applyMiddleware } from 'redux';
import { createEpicMiddleware, combineEpics } from 'redux-observable';
import { epicA } from './epicA';
import { epicB } from './epicB';

export const rootEpic = combineEpics(
  epicA,
  epicB
);

const epicMiddleware = createEpicMiddleware();

const composeEnhancers = window.__REDUX_DEVTOOLS_EXTENSION_COMPOSE__ || compose;

export default function configureStore() {
  const store = createStore(
    rootReducer,
    composeEnhancers(
...



A Basic Smart Contract

pragma solidity ^0.4.24;
contract Campaign {

    address public owner;
    uint public deadline;
    uint public goal;
    uint public fundsRaised;
    bool public refundsSent;

    event LogContribution(address sender, uint amount);
    event LogRefundsSent(address funder, uint amount);
    event LogWithdrawal(address beneficiary, uint amount);

    struct FunderStruct {
        address funder;
        uint amount;
    }

    FunderStruct[] public funderStructs;

    constructor ( uint _duration, uint _goal) public {
        owner = msg.sender;
        deadline = block.number + _duration;
        goal = _goal;
    }

    function isSuccess() public constant returns(bool isIndeed) {
        return (fundsRaised >= goal);
    }

    function hasFailed() public constant returns(bool hasIndeed) {
        return (fundsRaised < goal && block.number > deadline );
    }

    function
...
