sliding_correlator example
A sliding correlator computes the cross-correlation between a running input window and a fixed reference sequence:
y[n] = Σ conj(ref[k]) · x[n−k]
k=0..N-1
When ref equals the complex conjugate of a known waveform, the output peaks
whenever that waveform appears in the input. Practical uses include preamble
detection, CDMA despreading, and radar/sonar matched filtering.
The goal of this example is to show that JM_DEFINE_STEPS is algorithm-agnostic:
the correlator uses complex multiply (not real×complex like FIR), a different
state layout, and different semantics — and the macro handles it identically.
TL;DR — see it work first
. <(curl -fsSL https://just-buildit.github.io/just-makeit/install.sh)
just-makeit example sliding_correlator
# sliding_correlator: PASSED
Prerequisites
. <(curl -fsSL https://just-buildit.github.io/just-makeit/install.sh)
Pass a custom path to keep the venv somewhere persistent:
. <(curl -fsSL https://just-buildit.github.io/just-makeit/install.sh) -- ~/my-venv
Or with pip if just-makeit is already installed:
pip install just-makeit && just-makeit install-deps
source /tmp/jm-venv/bin/activate
1. Scaffold
just-makeit new my_corr \
--object sliding_correlator \
--state "ref:float _Complex[16]" \
--state "delay:float _Complex[16]"
cd my_corr
2. Implement sliding_correlator_step
Open native/inc/sliding_correlator/sliding_correlator_core.h and replace the
stub with the implementation below.
The stub:
// before
static inline float complex sliding_correlator_step(const sliding_correlator_state_t *state, float complex x) {
(void)state; /* TODO: implement DSP using state variables */
return x;
}
The implementation:
// after
static inline float complex sliding_correlator_step(sliding_correlator_state_t *state, float complex x) {
memmove(&state->delay[1], &state->delay[0], 15 * sizeof(float complex));
state->delay[0] = x;
float complex acc = 0.0f + 0.0f * I;
for (int k = 0; k < 16; k++)
acc += conjf(state->ref[k]) * state->delay[k];
return acc;
}
Or apply the patch:
"""Patch sliding_correlator_step stub with the implementation.
Run from the project root: python3 .steps/02_patch.py
"""
import pathlib
import re
import sys
header = pathlib.Path("native/inc/sliding_correlator/sliding_correlator_core.h")
impl = pathlib.Path(__file__).with_name("02_step_after.c")
stub_re = re.compile(
r"(static inline|JM_FORCEINLINE JM_HOT) float complex\s*\n"
r"sliding_correlator_step\((?:const )?sliding_correlator_state_t \*state.*?\n\}",
re.DOTALL,
)
text = header.read_text()
m = stub_re.search(text)
if not m:
print("ERROR: stub not found — already patched or file changed", file=sys.stderr)
sys.exit(1)
qualifier = m.group(1)
replacement = impl.read_text().strip().replace("static inline", qualifier, 1)
patched = stub_re.sub(replacement, text)
header.write_text(patched)
print(f"patched {header}")
3. Build and verify
cmake -B build -S . -DCMAKE_BUILD_TYPE=Debug \
-DPython3_EXECUTABLE=$(python3 -c "import sys; print(sys.executable)")
cmake --build build --parallel
cmake --build build --target test ARGS="--output-on-failure"
pip install -e .
Quick sanity check — identity correlator (ref = [1, 0, …, 0]) passes the input through unchanged:
import numpy as np
from my_corr import SlidingCorrelator
c = SlidingCorrelator()
ref = np.zeros(16, dtype=np.complex64)
ref[0] = 1.0
c.set_ref(ref)
impulse = np.zeros(16, dtype=np.complex64)
impulse[0] = 1.0
print(c.steps(impulse)[:4].tolist())
# [(1+0j), 0j, 0j, 0j]
4. JM_DEFINE_STEPS
From inside my_corr:
# run from inside my_corr
just-makeit perf
Add CORR_TAPS, CORR_LENGTH, CORR_BATCH, and sliding_correlator_step_batch()
to native/inc/sliding_correlator/sliding_correlator_core.h just after
sliding_correlator_step():
#define CORR_TAPS 16
#define CORR_LENGTH (CORR_TAPS - 1)
/* Complex samples per step_batch() call. Same derivation as FIR_BATCH:
* JM_SIMD_WIDTH_F32/2 ≥ 1 on AVX2/AVX-512; _JM_STEPS_SIMD_ is a no-op
* on scalar targets so the value there doesn't matter. */
#define CORR_BATCH (JM_SIMD_WIDTH_F32 / 2)
/* No ISA guard needed — _JM_STEPS_SIMD_ only calls this when width > 1.
* The inner loop is auto-vectorisable; the compiler picks the best ISA. */
JM_FORCEINLINE JM_HOT void
sliding_correlator_step_batch(
sliding_correlator_state_t *state,
const float complex *window,
float complex *out)
{
for (int b = 0; b < CORR_BATCH; b++) {
float complex acc = 0.0f + 0.0f * I;
for (int k = 0; k < CORR_TAPS; k++)
acc += conjf(state->ref[k]) * window[b + CORR_LENGTH - k];
out[b] = acc;
}
}
CORR_LENGTH = CORR_TAPS - 1 is the history depth — the quantity JM_DEFINE_STEPS
needs. CORR_TAPS and CORR_BATCH are algorithm-specific; the macro never
sees them directly.
step_batch() computes the same dot product as step() but reads from a
pre-built contiguous window instead of the delay line, so the inner loop has
no scatter-gather and the compiler can auto-vectorise it freely. No #ifdef
guard is needed: _JM_STEPS_SIMD_ only calls step_batch() when
JM_SIMD_WIDTH_F32 > 1, so scalar builds simply use the step() path.
Replace sliding_correlator_steps in native/src/sliding_correlator/sliding_correlator_core.c:
#define CORR_CHUNK 256
JM_DEFINE_STEPS(sliding_correlator, sliding_correlator_state_t, float complex,
CORR_LENGTH, CORR_BATCH, CORR_CHUNK)
JM_DEFINE_STEPS generates sliding_correlator_steps() — scratch buffer,
chunked fill, SIMD dispatch (AVX-512 or AVX2), scalar tail. The three
constants keep each concern separate:
| constant | concern | meaning |
|---|---|---|
CORR_LENGTH |
algorithm | history depth (delay[] entries) |
CORR_BATCH |
parallelism | complex samples per call (JM_SIMD_WIDTH_F32 / 2) |
CORR_CHUNK |
tuning | samples per scratch-buffer fill |