Dominate the Structured and Semi-Structured Data Explosion
Unlock the power of semi-structured data with BigQuery’s JSON type. Data processing, storage, and query engines have traditionally needed custom transformation pipelines to handle semi-structured and unstructured data because of its diversity and volume.
This post discusses the architectural concepts behind BigQuery’s JSON type for semi-structured data, which eliminates complex preprocessing and combines schema flexibility and intuitive querying with the scalability of structured data.
BigQuery’s storage architecture relies on Capacitor, a columnar storage format. After a decade of research and optimization, this format stores exabytes of data and serves millions of queries.
Because the order of rows in a table rarely matters, Capacitor can permute rows to make run-length encoding (RLE) more effective. An embedded expression library takes advantage of the columnar layout to process data in a block-oriented, vectorized fashion.
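To see why row order affects RLE, here is a minimal Python sketch. It is not Capacitor's actual reordering algorithm, which is not described here; plain lexicographic sorting stands in as one simple permutation, and the table contents and the function names rle_encode and total_runs are made up for illustration.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode one column as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def total_runs(rows):
    """Count RLE runs summed over every column of a row-oriented table."""
    columns = zip(*rows)
    return sum(len(rle_encode(column)) for column in columns)


# Hypothetical table with two low-cardinality columns: (country, channel).
rows = [
    ("US", "web"),
    ("DE", "app"),
    ("US", "app"),
    ("DE", "web"),
    ("US", "web"),
    ("DE", "app"),
]

# Assumption: sorting is used as a stand-in permutation. The set of rows is
# unchanged; only their order differs.
reordered = sorted(rows)

runs_before = total_runs(rows)
runs_after = total_runs(reordered)
print(runs_before, runs_after)
```

Fewer runs mean fewer (value, length) pairs to store. A real engine would pick the permutation with a cost model rather than a full sort, but the effect on run counts is the same in kind.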
During ingestion, JSON is shredded into virtual columns as far as possible. Most JSON keys are therefore written once per column rather than once per row, and the column data omits syntactic overhead such as colons and whitespace.
This greatly reduces storage costs and query-time IO costs. The format natively understands JSON nulls and arrays, which further optimizes virtual column storage.
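The idea of shredding can be illustrated with a short Python sketch. This is a conceptual model only, not BigQuery's on-disk format: the functions flatten and shred and the sample documents are hypothetical, arrays are kept whole as leaf values, and a missing path and an explicit JSON null both appear as None, a distinction a real format would preserve.

```python
import json


def flatten(obj, prefix=""):
    """Yield (path, value) pairs for each leaf of a parsed JSON object."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            # Scalars, nulls, and arrays are treated as leaves in this sketch.
            yield path, value


def shred(json_rows):
    """Turn JSON strings into one value list per path, aligned by row."""
    parsed = [dict(flatten(json.loads(row))) for row in json_rows]
    paths = sorted({path for row in parsed for path in row})
    # Each key path is stored once, as the column name, not once per row.
    return {path: [row.get(path) for row in parsed] for path in paths}


json_rows = [
    '{"user": {"id": 1, "country": "US"}, "tags": ["a", "b"]}',
    '{"user": {"id": 2, "country": null}, "tags": []}',
    '{"user": {"id": 3}}',
]

columns = shred(json_rows)
for path, values in columns.items():
    print(path, values)
```

Each path such as user.id appears once as a column name, and the column holds only values, with no braces, colons, or whitespace repeated per row.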
When JSON is stored as a STRING instead, filtering or projecting specific JSON paths requires loading the entire JSON STRING row from storage, decompressing it, and evaluating each filter and projection expression one row at a time.
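The cost difference can be sketched in Python. This is an illustration under assumptions, not BigQuery's execution engine: zlib stands in for whatever compression the storage layer uses, the documents are invented, and the function names filter_string_rows and filter_columns are hypothetical.

```python
import json
import zlib

# Hypothetical documents; only order.id and order.total matter to the query.
docs = [
    {"order": {"id": 101, "total": 25.0}, "note": "gift wrap"},
    {"order": {"id": 102, "total": 5.5}, "note": "none"},
    {"order": {"id": 103, "total": 40.0}, "note": "express"},
]

# STRING-style storage: each row is one compressed JSON string.
stored_rows = [zlib.compress(json.dumps(doc).encode("utf-8")) for doc in docs]


def filter_string_rows(rows, min_total):
    """Decompress and parse every full row, then filter one row at a time."""
    matches = []
    for blob in rows:
        doc = json.loads(zlib.decompress(blob).decode("utf-8"))
        if doc["order"]["total"] >= min_total:
            matches.append(doc["order"]["id"])
    return matches


# Shredded storage: only the two columns the query references are read.
order_id = [doc["order"]["id"] for doc in docs]
order_total = [doc["order"]["total"] for doc in docs]


def filter_columns(ids, totals, min_total):
    """Filter using only the needed columns; the note column is never read."""
    return [i for i, total in zip(ids, totals) if total >= min_total]


from_strings = filter_string_rows(stored_rows, 20)
from_columns = filter_columns(order_id, order_total, 20)
print(from_strings, from_columns)
```

Both paths return the same order ids, but the first one touches every byte of every document, while the second reads only the two referenced columns.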