Generated: 2026-03-08
Database: K-CAT SQLite database (built from KFG source data)
Script: scripts/corpus_statistics.py
Status: ✅ Complete
What does the khipu corpus look like at baseline? How many objects are represented, how complete are the data, and what are the distributional properties across cords, knots, colors, and provenance? This phase establishes the empirical foundation that all subsequent phases build on.
The baseline statistics script queries the KFG SQLite database and reports:
- value = 0 as a null placeholder

Numeric decoding convention (Ascher & Ascher positional notation):

- S knot = hundreds position = 100
- L knot = tens position = NUM_TURNS × 10
- E knot = units position = 1
- value = 0 is used as a null/missing-value placeholder in the KFG database

This methodology is described fully in Ascher & Ascher (1978, 1981).
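The convention above can be sketched in Python as follows. This is a minimal illustration, not the code in scripts/corpus_statistics.py: the `decode_cord` function and the `(knot_type, num_turns)` tuple layout are hypothetical, and every knot type other than S, L, and E is assumed to be non-numeric and skipped.

```python
from typing import Iterable, Optional, Tuple

# (knot_type, num_turns) pairs; this tuple layout is an assumption for illustration.
Knot = Tuple[str, int]


def decode_cord(knots: Iterable[Knot]) -> Optional[int]:
    """Sum knot contributions using the convention stated above.

    S -> 100 per knot, L -> num_turns * 10, E -> 1.
    Other knot types are treated as non-numeric and skipped.
    Returns None instead of 0, mirroring the null-placeholder rule.
    """
    total = 0
    for knot_type, num_turns in knots:
        if knot_type == "S":
            total += 100
        elif knot_type == "L":
            total += num_turns * 10
        elif knot_type == "E":
            total += 1
    return total if total != 0 else None


# Two S knots, one L knot with 4 turns, one E knot: 200 + 40 + 1
print(decode_cord([("S", 1), ("S", 1), ("L", 4), ("E", 1)]))  # 241
print(decode_cord([]))  # None
```

Returning `None` rather than 0 keeps undecoded cords out of any later sum or mean, which is the distinction the null-placeholder rule is meant to preserve.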
| Component | Count |
|---|---|
| Khipus | 709 |
| Total cords | 62,746 |
| — Group / top cords (L0) | 45,096 |
| — Pendant cords (L1) | 15,465 |
| — Subsidiary cords (L2+) | 2,162 |
| Knot clusters | 70,143 |
| Decoded knot instances | 238,099 |
| Color records | 76,258 |
| Unique color codes | 2,830 |
value = 0 (30.3%): treated as null/undecoded

Size distribution across khipus:
| Metric | Value |
|---|---|
| Mean cords per khipu | 88.5 |
| Median cords per khipu | 42 |
| Minimum cords | 1 |
| Maximum cords | 1,831 |
The wide gap between mean (88.5) and median (42) indicates a right-skewed distribution — most khipus are modest in size but a small number are very large.
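The effect of a few very large khipus on the mean can be reproduced on a toy dataset. The sketch below assumes a hypothetical `cord` table with a `khipu_id` column; the real KFG schema may differ, and the numbers are invented purely to show the skew.

```python
import sqlite3
import statistics

# Toy in-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cord (cord_id INTEGER PRIMARY KEY, khipu_id INTEGER)")

# Four small khipus and one very large one (cord counts: 2, 3, 4, 5, 60).
sizes = {1: 2, 2: 3, 3: 4, 4: 5, 5: 60}
for khipu_id, n_cords in sizes.items():
    conn.executemany(
        "INSERT INTO cord (khipu_id) VALUES (?)", [(khipu_id,)] * n_cords
    )

# One row per khipu, holding its cord count.
counts = [
    row[0]
    for row in conn.execute("SELECT COUNT(*) FROM cord GROUP BY khipu_id")
]

mean_cords = statistics.mean(counts)      # pulled upward by the large khipu
median_cords = statistics.median(counts)  # robust to the large khipu
print(mean_cords, median_cords)
```

In this toy case the single 60-cord khipu lifts the mean well above the median, the same pattern the corpus shows at a larger scale.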
| Type | Count | Interpretation |
|---|---|---|
| L (long) | 30,651 | Tens position (value = turns × 10) |
| S (single) | 29,278 | Hundreds position (value = 100) |
| E (figure-eight) | 9,580 | Units position (value = 1) |
| SP (special/pendant) | 217 | Non-numeric marker |
| U (unknown) | 152 | Not yet classified |
| EE (double figure-eight) | 117 | Variant units |
| TF | 110 | Terminal figure-eight |
| LL | 27 | Double long |
| BL | 11 | Blank / spacer |
The L:S ratio is 1.05.
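The ratio follows directly from the table counts. The short check below also confirms that the per-type counts sum to the 70,143 knot clusters reported in the corpus overview; the dictionary name is illustrative only.

```python
# Counts copied from the knot-type table above.
knot_counts = {
    "L": 30651, "S": 29278, "E": 9580, "SP": 217, "U": 152,
    "EE": 117, "TF": 110, "LL": 27, "BL": 11,
}

total = sum(knot_counts.values())
ls_ratio = knot_counts["L"] / knot_counts["S"]

print(total)               # 70143
print(round(ls_ratio, 2))  # 1.05
```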
| Metric | Count |
|---|---|
| Unique provenances | 89 |
| Museum countries | 12 |
| Institutions | 73 |
Top provenances:
| Provenance | Khipus |
|---|---|
| Unknown | 236 |
| Pachacamac | 86 |
| Ica | 52 |
| Incahuasi | 52 |
| Leymebamba | 22 |
| Huaquerones | 19 |
| Nazca | 13 |
| Huacones | 11 |
| Armatambo, Huaca San Pedro | 11 |
| Eduard Gaffron | 10 |
Top museum countries:
| Country | Khipus |
|---|---|
| Peru | 105 |
| Germany | 70 |
| USA | 64 |
| France | 8 |
| Israel | 4 |
| Great Britain | 4 |
| Switzerland | 4 |
| Holland | 1 |
value = 0 ambiguity. The KFG database stores value = 0 for cords where no knot value was decoded. This is a null placeholder, not a true zero. Downstream analyses must account for this before computing numeric statistics. See the summation patterns phase (Phase 2) for how this is handled.
level = 0 cords. 45,096 cords have hierarchy_level = 0. In KFG structure these represent group/top-level cords that organize pendants but do not themselves carry knot values. They should be excluded from pendant-level numeric analyses.
Unknown provenance. 236 khipus (33.3%) have no recorded provenance. Geographic analyses should note this coverage gap.
Unique color codes: 2,830. This is higher than expected from the ~30 Ascher base codes. The KFG database stores compound codes (e.g., MB:W, KB-DB) as single strings; downstream color analyses should normalize these before counting distinct colors.
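The caveats above translate into small preprocessing steps. The sketch below is one possible approach under stated assumptions, not the project's implementation: the `(hierarchy_level, value)` row layout and both function names are hypothetical, and `:` and `-` are assumed to be the only separators used in compound color codes.

```python
import re
from typing import Iterable, List, Tuple

# (hierarchy_level, value) pairs; this row layout is an assumption.
CordRow = Tuple[int, int]


def decodable_values(rows: Iterable[CordRow]) -> List[int]:
    """Keep values usable for numeric statistics.

    Drops level-0 cords and treats value == 0 as missing.
    """
    return [value for level, value in rows if level != 0 and value != 0]


def base_color_codes(code: str) -> List[str]:
    """Split a compound color string into its component codes.

    Assumes ':' and '-' are the only separators, as in 'MB:W' and 'KB-DB'.
    """
    return [part for part in re.split(r"[:\-]", code) if part]


rows = [(0, 0), (1, 241), (1, 0), (2, 30)]
print(decodable_values(rows))      # [241, 30]
print(base_color_codes("MB:W"))    # ['MB', 'W']
print(base_color_codes("KB-DB"))   # ['KB', 'DB']
```

Counting distinct colors over the output of `base_color_codes` rather than over the raw strings would give a figure comparable to the base code set, provided the separator assumption holds for the whole corpus.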
python scripts/corpus_statistics.py # console output only
python scripts/corpus_statistics.py --report # also writes this report
python scripts/corpus_statistics.py --db path/to/other.db --report # any corpus
The database is built by scripts/build_kfg_database.py; re-run that script first to incorporate any upstream KFG updates.

See Citations and Acknowledgments in the project README for primary sources, data attribution, and toolkit provenance.
Report generated automatically by scripts/corpus_statistics.py. Re-run to refresh with the latest database state.