oxsv

xsv, mass assembly.

Drop-in replacement for xsv. Zero dependencies. SIMD-native. Single binary.

Why

Your 8GB CSV crashed VSCode. Excel can't open it. pandas ate 16GB of RAM and died.

oxsv counts the rows of a 1GB CSV in under 0.2 seconds.

Benchmark

1GB CSV, 6.27M rows, 18 columns (WSL2, DDR4-3200)

Command     oxsv      xsv (Rust)   Speedup
count       0.17s     13.14s       76x
headers     0.002s    0.044s       22x
head        0.002s    0.023s       12x
select      0.82s     19.27s       24x
search      1.31s     25.63s       20x
frequency   1.14s     29.17s       26x
index       1.11s     14.22s       13x
stats       12.97s    74.25s       5.7x
sort        2.15s     47.58s       22x
fmt         2.43s     37.26s       15x

SIMD microbenchmark (ASM vs naive C, 16MB buffer)

Function          ASM         C          Speedup
count_byte        8.9 GB/s    3.3 GB/s   2.7x
find_byte         11.3 GB/s   2.3 GB/s   5.0x
find_byte early   8 ns        88 ns      10.8x

Binary size

Binary       Size
oxsv         27 KB
xsv (Rust)   4.8 MB

178x smaller. oxsv is smaller than most favicons.

Install

make && sudo make install

Requires: nasm, gcc, Linux x86-64.

Usage

oxsv count huge.csv                             # count rows
oxsv headers huge.csv                           # show column names
oxsv head -n 20 huge.csv                        # first 20 rows
oxsv select name,email huge.csv                 # extract columns
oxsv search -s status "active" huge.csv         # filter rows
oxsv search -s name "田中" huge.csv              # UTF-8 works
oxsv stats huge.csv                             # column statistics
oxsv frequency -s status huge.csv               # value counts
oxsv sort -s amount -N huge.csv                 # numeric sort
oxsv index huge.csv                             # build index → fast slice
oxsv slice -i 99999000 -l 100 huge.csv          # instant with index

Pipe from S3:

aws s3 cp s3://bucket/huge.csv - | oxsv search --no-mmap -s status "active"
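
A pipe cannot be memory-mapped, so a streaming path has to read it in chunks. The following is a minimal C sketch of that idea, not oxsv's actual code: `count_newlines_fd` is a hypothetical helper, and the 64 KiB buffer size is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes arriving on fd through a fixed
 * buffer, so memory use does not grow with the input size.
 * Returns the count, or -1 on a read error. */
long count_newlines_fd(int fd)
{
    assert(fd >= 0);
    char buf[1 << 16];                 /* 64 KiB, reused for every read */
    long rows = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            return -1;
        }
        if (n == 0) break;                  /* end of stream */
        for (ssize_t i = 0; i < n; i++)
            rows += (buf[i] == '\n');
    }
    return rows;
}
```

Calling `count_newlines_fd(0)` would count lines on standard input, which is the shape of the pipeline shown above.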

How It Works

mmap -- The file is memory-mapped, not loaded. Physical RAM usage stays in single-digit megabytes regardless of file size. A 64GB CSV on a 4GB machine works fine.
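
As a rough illustration of the mmap approach, here is a C sketch that maps a file read-only and counts newline bytes. It is an assumption about the general technique, not a copy of oxsv.asm; `count_newlines_mmap` is a hypothetical name, and the `madvise` hint is optional.

```c
#define _DEFAULT_SOURCE            /* exposes madvise() under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and count '\n' bytes.
 * Pages are faulted in on demand, so the file is never copied into
 * a heap buffer. Returns the count, or -1 on error. */
long count_newlines_mmap(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }   /* mmap rejects length 0 */

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED) return -1;

    /* Hint that access is sequential; the kernel may read ahead and
     * may free pages soon after they are touched. */
    madvise((void *)p, len, MADV_SEQUENTIAL);

    long rows = 0;
    for (size_t i = 0; i < len; i++)
        rows += (p[i] == '\n');

    munmap((void *)p, len);
    return rows;
}
```

The mapped pages belong to the page cache, so the kernel can reclaim them under memory pressure; that is why the resident set can stay small even when the file is larger than RAM.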

AVX2 -- Delimiters are scanned 32 bytes at a time using SIMD instructions. The hard limit is memory bandwidth (25.6 GB/s per channel on DDR4-3200). That's the theoretical ceiling on throughput -- in the microbenchmark above, the scan kernels reach 8.9 to 11.3 GB/s.
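
The 32-byte scan can be sketched in C with AVX2 intrinsics. This is an illustrative equivalent, not the hand-written assembly itself: the function names are hypothetical, and the runtime check uses the GCC/Clang builtin `__builtin_cpu_supports` as a stand-in for oxsv's own CPUID dispatch.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar fallback: one byte per iteration. */
static size_t count_byte_scalar(const uint8_t *p, size_t n, uint8_t c)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (p[i] == c);
    return hits;
}

/* AVX2 path: compare 32 bytes at once, collapse the result to a
 * 32-bit mask (one bit per byte), and count the set bits. */
__attribute__((target("avx2")))
static size_t count_byte_avx2(const uint8_t *p, size_t n, uint8_t c)
{
    const __m256i needle = _mm256_set1_epi8((char)c);
    size_t hits = 0, i = 0;

    for (; i + 32 <= n; i += 32) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(p + i));
        __m256i eq    = _mm256_cmpeq_epi8(chunk, needle);
        uint32_t mask = (uint32_t)_mm256_movemask_epi8(eq);
        hits += (size_t)__builtin_popcount(mask);
    }
    /* Fewer than 32 bytes remain: finish with the scalar loop. */
    return hits + count_byte_scalar(p + i, n - i, c);
}

/* Dispatch once per call on the CPU's reported feature set. */
size_t count_byte(const uint8_t *p, size_t n, uint8_t c)
{
    assert(p != NULL || n == 0);
    if (__builtin_cpu_supports("avx2"))
        return count_byte_avx2(p, n, c);
    return count_byte_scalar(p, n, c);
}
```

Counting rows is then `count_byte(data, len, '\n')`. The `target("avx2")` attribute lets the file compile without `-mavx2`, so the scalar path still runs on older x86-64 CPUs.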

No runtime -- The C layer parses arguments. Everything that touches your data is hand-written x86-64 assembly. No libc in the hot path, no allocator, no GC.

Architecture

main.c          argument parsing only (never touches file data)
oxsv.asm        mmap + CPUID dispatch
cmd/*.asm       one file per subcommand
core/*.asm      SIMD scanner, CSV parser, buffered I/O, syscall wrappers

The binary is 27KB stripped.

Flags

Flag            Short   Description
--delimiter     -d      Field delimiter (default: ,)
--quote         -q      Enable quoted field parsing
--no-headers    -n      File has no header row
--output        -o      Write to file instead of stdout
--no-mmap               Read from stdin/pipe
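
Quoted-field parsing matters because a quoted field may contain the delimiter or a newline. The sketch below shows one common way to handle that for row counting, assuming RFC 4180 style quoting; it is not taken from oxsv's parser, and `count_rows_quoted` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count row terminators, ignoring newlines inside double-quoted fields.
 * An escaped quote ("") toggles the state twice, so it leaves the
 * parser in the same state and needs no special case. */
size_t count_rows_quoted(const char *p, size_t n)
{
    assert(p != NULL || n == 0);
    bool in_quotes = false;
    size_t rows = 0;

    for (size_t i = 0; i < n; i++) {
        if (p[i] == '"')
            in_quotes = !in_quotes;
        else if (p[i] == '\n' && !in_quotes)
            rows++;
    }
    return rows;
}
```

A plain newline count would report one row too many for any field that spans two lines, which is the case this state flag guards against.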

Index

oxsv index huge.csv       # creates huge.csv.idx in same directory

With an index, slice and tail are O(1). Without it, they stream. If the source file changes, the index is silently ignored.
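
The on-disk layout of the .idx file is not described here, so the following C sketch only illustrates why an index makes slicing constant-time. It assumes the index is an array of row start offsets; `row_index`, `row_index_build`, and `row_index_slice` are hypothetical names, and quoted newlines are ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t *starts;  /* starts[r] = byte offset of row r; starts[rows] = data length */
    size_t rows;       /* number of rows, header included */
} row_index;

/* Build the index in two passes: count rows, then record each start.
 * Returns 0 on success, -1 if allocation fails. */
int row_index_build(row_index *ix, const char *p, size_t n)
{
    assert(ix != NULL);
    size_t rows = 0;
    for (size_t i = 0; i < n; i++) rows += (p[i] == '\n');
    if (n > 0 && p[n - 1] != '\n') rows++;     /* last row has no newline */

    uint64_t *s = malloc((rows + 1) * sizeof *s);
    if (!s) return -1;

    size_t r = 0, start = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i] == '\n') { s[r++] = start; start = i + 1; }
    }
    if (start < n) s[r++] = start;
    s[rows] = n;                                /* sentinel: end of data */

    ix->starts = s;
    ix->rows = rows;
    return 0;
}

/* O(1): byte range [*begin, *end) covering up to `count` rows from `first`.
 * Returns 0, or -1 if `first` is past the last row. */
int row_index_slice(const row_index *ix, size_t first, size_t count,
                    uint64_t *begin, uint64_t *end)
{
    if (first >= ix->rows) return -1;
    size_t last = (count > ix->rows - first) ? ix->rows : first + count;
    *begin = ix->starts[first];
    *end = ix->starts[last];
    return 0;
}
```

With the offsets in hand, a slice is two array lookups followed by one write of the byte range, independent of how deep into the file the rows sit. The caller releases the index with `free(ix.starts)`.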

UTF-8

oxsv assumes UTF-8. SIMD scans for ASCII delimiters (, \n ") which never appear inside UTF-8 multibyte sequences. Japanese, emoji, whatever -- it just works.
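
The reason byte-wise scanning is safe: every byte of a UTF-8 multibyte sequence has its high bit set (0x80 or above), while ASCII delimiters are below 0x80. A small C sketch of that property follows; `find_delim` is a hypothetical helper, not an oxsv function.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first `delim` byte in p[0..n), or n if absent.
 * Safe on UTF-8 input only when delim is ASCII, because lead and
 * continuation bytes of multibyte sequences are all >= 0x80. */
size_t find_delim(const unsigned char *p, size_t n, unsigned char delim)
{
    assert(delim < 0x80);   /* a non-ASCII delimiter could match mid-character */
    for (size_t i = 0; i < n; i++)
        if (p[i] == delim)
            return i;
    return n;
}
```

For a row such as `田中,東京`, the first field is six bytes long and none of those bytes can be mistaken for the comma. Shift_JIS does not share this guarantee, since its trail bytes can fall in the ASCII range.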

Shift_JIS is not supported. Convert the file first with iconv -f SHIFT_JIS -t UTF-8.

xsv Compatibility

oxsv implements the core xsv commands with the same names and similar flags. Differences:

  • search uses exact match by default (xsv uses regex)
  • join, sample, split, flatten, table are not implemented

License

MIT. Copyright (c) 2026 rxxuzi.

Acknowledgments

  • xsv by BurntSushi -- command interface design