khipu-computational-toolkit

Phase 1: Corpus Foundation

Generated: 2026-03-08
Database: K-CAT SQLite database (built from KFG source data)
Script: scripts/corpus_statistics.py
Status: ✅ Complete


Research Question

What does the khipu corpus look like at baseline? How many objects are represented, how complete are the data, and what are the distributional properties across cords, knots, colors, and provenance? This phase establishes the empirical foundation that all subsequent phases build on.


Methodology

The baseline statistics script (scripts/corpus_statistics.py) queries the K-CAT SQLite database, built from KFG source data, and reports:

  1. Corpus size — total khipus, cords, knots, and color records
  2. Cord hierarchy — breakdown by structural level (group cord L0, pendant L1, subsidiary L2+)
  3. Numeric coverage — fraction of cords with decoded non-zero values; treatment of value=0 as a null placeholder
  4. Knot type distribution — S (single/hundreds), L (long/tens), E (figure-eight/units), and special types
  5. Geographic distribution — provenance, museum country, institution counts
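
The first three report items above can be sketched as plain SQL run through Python's sqlite3 module. This is a minimal illustration, not the contents of scripts/corpus_statistics.py: the table names (khipu, cord), the id and provenance columns, and the toy rows are assumptions made for the example. Only hierarchy_level and value are names that appear in this report.

```python
import sqlite3

# Hypothetical schema and toy rows; the real K-CAT / KFG layout may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE khipu (khipu_id INTEGER PRIMARY KEY, provenance TEXT);
    CREATE TABLE cord (
        cord_id INTEGER PRIMARY KEY,
        khipu_id INTEGER REFERENCES khipu(khipu_id),
        hierarchy_level INTEGER,
        value INTEGER
    );
""")
conn.executemany("INSERT INTO khipu VALUES (?, ?)", [(1, "Pachacamac"), (2, None)])
conn.executemany(
    "INSERT INTO cord VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 0), (2, 1, 1, 120), (3, 1, 1, 0),
     (4, 1, 2, 7), (5, 2, 1, 35), (6, 2, 3, 2)],
)


def corpus_size(conn):
    """Item 1: total khipus and cords."""
    n_khipus = conn.execute("SELECT COUNT(*) FROM khipu").fetchone()[0]
    n_cords = conn.execute("SELECT COUNT(*) FROM cord").fetchone()[0]
    return {"khipus": n_khipus, "cords": n_cords}


def cords_by_level(conn):
    """Item 2: cord counts for L0, L1, and everything at level 2 or deeper."""
    rows = conn.execute("""
        SELECT CASE WHEN hierarchy_level >= 2 THEN 'L2+'
                    ELSE 'L' || hierarchy_level END AS level,
               COUNT(*)
        FROM cord
        GROUP BY level
        ORDER BY level
    """).fetchall()
    return dict(rows)


def numeric_coverage(conn):
    """Item 3: fraction of cords whose value is not the 0 placeholder."""
    return conn.execute(
        "SELECT AVG(CASE WHEN value <> 0 THEN 1.0 ELSE 0.0 END) FROM cord"
    ).fetchone()[0]


print(corpus_size(conn))
print(cords_by_level(conn))
print(round(numeric_coverage(conn), 3))
```

Collapsing levels 2 and deeper into one bucket mirrors the L0 / L1 / L2+ breakdown used in this report.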

Numeric decoding follows the Ascher & Ascher positional (base-10) notation, described fully in Ascher & Ascher (1978, 1981).
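
As a rough illustration of positional base-10 reading, the sketch below combines one digit per decimal position into an integer. The function name and the input layout (a list of per-position counts, highest power of ten first) are assumptions for the example. It does not show how the toolkit maps knot types to positions.

```python
def decode_positional(digit_counts):
    """Combine per-position counts (highest power of ten first) into an integer.

    Hypothetical helper: each entry is the digit read at one decimal position,
    with 0 standing for an empty position.
    """
    value = 0
    for count in digit_counts:
        if not 0 <= count <= 9:
            raise ValueError(f"a decimal position holds 0-9, got {count}")
        value = value * 10 + count
    return value


print(decode_positional([3, 2, 5]))  # 325
print(decode_positional([4, 0, 7]))  # 407
```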


KFG Corpus Results

Corpus Size

Component Count
Khipus 709
Total cords 62,746
— Group / top cords (L0) 45,096
— Pendant cords (L1) 15,465
— Subsidiary cords (L2+) 2,162
Knot clusters 70,143
Decoded knot instances 238,099
Color records 76,258
Unique color codes 2,830

Numeric Coverage

Size distribution across khipus:

Metric Value
Mean cords per khipu 88.5
Median cords per khipu 42
Minimum cords 1
Maximum cords 1,831

The mean (88.5) is roughly twice the median (42), which indicates a right-skewed distribution: most khipus are modest in size, but a small number are very large (up to 1,831 cords).
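
The effect of a few very large khipus on the mean can be reproduced with a small made-up sample. The cord counts below are illustrative only and are not taken from the database; size_summary is a hypothetical helper name.

```python
from statistics import mean, median


def size_summary(cords_per_khipu):
    """Return the four size metrics used in the table above."""
    return {
        "mean": round(mean(cords_per_khipu), 1),
        "median": median(cords_per_khipu),
        "min": min(cords_per_khipu),
        "max": max(cords_per_khipu),
    }


# Toy sample: six modest khipus and one very large one.
sample = [1, 12, 30, 42, 55, 90, 1831]
summary = size_summary(sample)
print(summary)

# A single large object pulls the mean far above the median.
print(summary["mean"] > summary["median"])
```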

Knot Type Distribution

Type                      Count   Interpretation
L (long)                  30,651  Tens position (value = turns × 10)
S (single)                29,278  Hundreds position (value = 100)
E (figure-eight)           9,580  Units position (value = 1)
SP (special/pendant)         217  Non-numeric marker
U (unknown)                  152  Not yet classified
EE (double figure-eight)     117  Variant units
TF                           110  Terminal figure-eight
LL                            27  Double long
BL                            11  Blank / spacer

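The L:S ratio is 1.05 (30,651 / 29,278), so long and single knots occur in nearly equal numbers. The nine type counts sum to 70,143, matching the knot cluster total reported under Corpus Size.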
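
The ratio and the per-type shares can be checked directly from the counts in the table. The snippet below is a small arithmetic check; the variable names are chosen for the example.

```python
# Counts copied from the Knot Type Distribution table.
knot_counts = {
    "L": 30651, "S": 29278, "E": 9580, "SP": 217, "U": 152,
    "EE": 117, "TF": 110, "LL": 27, "BL": 11,
}

total = sum(knot_counts.values())
ls_ratio = knot_counts["L"] / knot_counts["S"]
shares = {kind: count / total for kind, count in knot_counts.items()}

print(f"total = {total}")          # total = 70143
print(f"L:S = {ls_ratio:.2f}")     # L:S = 1.05
for kind, share in shares.items():
    print(f"{kind}: {share:.1%}")
```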

Geographic Distribution

Metric Count
Unique provenances 89
Museum countries 12
Institutions 73

Top provenances:

Provenance Khipus
Unknown 236
Pachacamac 86
Ica 52
Incahuasi 52
Leymebamba 22
Huaquerones 19
Nazca 13
Huacones 11
Armatambo, Huaca San Pedro 11
Eduard Gaffron 10

Top museum countries:

Country Khipus
Peru 105
Germany 70
USA 64
France 8
Israel 4
Great Britain 4
Switzerland 4
Holland 1

Data Quality Notes

  1. value = 0 ambiguity. The KFG database stores value = 0 for cords where no knot value was decoded. This is a null placeholder, not a true zero. Downstream analyses must account for this before computing numeric statistics. See the summation patterns phase (Phase 2) for how this is handled.

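  2. level = 0 cords. 45,096 cords have hierarchy_level = 0. In KFG structure these represent group/top-level cords that organize pendants but do not themselves carry knot values. They should be excluded from pendant-level numeric analyses. Note that the three level counts reported under Corpus Size (45,096 + 15,465 + 2,162) sum to 62,723, which is 23 fewer than the 62,746 total cords; this report does not state how the remaining cords are classified.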

  3. Unknown provenance. 236 khipus (33.3%) have no recorded provenance. Geographic analyses should note this coverage gap.

  4. Unique color codes: 2,830. This is higher than expected from the ~30 Ascher base codes. The KFG database stores compound codes (e.g., MB:W, KB-DB) as single strings; downstream color analyses should normalize these before counting distinct colors.
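
Notes 1, 2, and 4 above each imply a small cleaning step before analysis. The sketch below shows one way these steps might look. It is not the toolkit's implementation: the function names, the dict layout of a cord record, and the assumption that ":" and "-" are the only separators in compound color codes are all hypothetical.

```python
import re
from statistics import mean


def null_zero_values(values):
    """Note 1: treat the value = 0 placeholder as missing (None)."""
    return [None if v == 0 else v for v in values]


def exclude_level_zero(cords):
    """Note 2: drop group/top-level cords (hierarchy_level = 0)."""
    return [c for c in cords if c["hierarchy_level"] != 0]


def split_color_code(code):
    """Note 4: split a compound code such as 'MB:W' or 'KB-DB' into parts.

    Assumes ':' and '-' are the only separators; real codes may need more rules.
    """
    return [p.strip() for p in re.split(r"[:\-]", code) if p.strip()]


# Toy records with an assumed layout.
cords = [
    {"hierarchy_level": 0, "value": 0, "color": "W"},
    {"hierarchy_level": 1, "value": 120, "color": "MB:W"},
    {"hierarchy_level": 1, "value": 0, "color": "KB-DB"},
    {"hierarchy_level": 2, "value": 7, "color": "W"},
]

kept = exclude_level_zero(cords)
decoded = [v for v in null_zero_values([c["value"] for c in kept]) if v is not None]
base_colors = sorted({part for c in kept for part in split_color_code(c["color"])})

print(len(kept))      # 3
print(mean(decoded))  # 63.5
print(base_colors)    # ['DB', 'KB', 'MB', 'W']
```

Nulling zeros before averaging keeps the placeholder from dragging the mean toward zero, and counting distinct parts rather than whole strings gives a figure comparable to the base color inventory.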


How to Re-run

python scripts/corpus_statistics.py             # console output only
python scripts/corpus_statistics.py --report    # also writes this report
python scripts/corpus_statistics.py --db path/to/other.db --report  # any corpus

Limitations


See Citations and Acknowledgments in the project README for primary sources, data attribution, and toolkit provenance.


Report generated automatically by scripts/corpus_statistics.py. Re-run to refresh with the latest database state.