attestation-findings/output/reports/anomaly_report.md
2026-03-14 05:57:07 +00:00

24 KiB
Raw Blame History

Anomaly Report: Meta Linux Research Database

Generated: 2026-03-11 Scope: Full audit of SQLite database, 5 collected bill texts, lobbying CSVs, 1,477 cached files, and 17 generated reports Total anomalies identified: 37


Summary

Category Count Critical High Medium Low
Data Collection Failures (scrapers) 12 4 3 1 4
Substantive Research Findings 9 0 0 9 0
Database Integrity (duplication) 9 0 4 2 3
Report Quality (artifacts) 5 0 1 1 3
Missing Data (collection gaps) 2 0 1 1 0
Total 37 4 9 14 10

CRITICAL — Data integrity issues that undermine research conclusions

C1. Kernel Signoff Analysis Missed Meta Entirely

Location: output/reports/kernel_signoff_analysis.md, output/reports/ebpf_ecosystem_analysis.md Script: scripts/kernel/signoff_analysis.py

The domain-matching regex in the kernel signoff analyzer failed to match @meta.com and @fb.com email domains, producing a 0.0% Meta signoff share across 2,500 kernel commits (v6.10v6.15). This directly contradicts known facts:

  • The entities table contains 5 named Meta BPF employees with detailed maintainer roles:
    • Alexei Starovoitov (BPF co-maintainer)
    • Andrii Nakryiko (BPF libbpf maintainer)
    • Martin KaFai Lau (BPF networking)
    • Song Liu (BPF tracing)
    • Yonghong Song (BPF compiler)
  • The corporate scorecard lists AMD (8.34%), Intel (7.73%), Red Hat (9.6%), kernel.org (14.66%), etc. — but Meta/Facebook are entirely absent.
  • The "12.5% gatekeeping claim assessment" and all kernel influence conclusions derived from this data are unreliable.

Impact: HIGH — This is the central thesis of the research. The actual Meta signoff share could be significant, especially in the BPF subsystem where Meta employees are primary maintainers.

Root cause: The corporate domain-matching logic either doesn't include meta.com/fb.com patterns, or the HTML entity encoding fix (</>) didn't fully resolve parsing of email addresses in commit tags.

Recommended fix: Add meta.com, fb.com, and oculus.com to the corporate domain map. Verify regex captures against raw commit HTML. Rerun analysis.


C2. AOSP Gerrit Data is Entirely Fictitious

Location: output/reports/aosp_contribution_comparison.md Script: scripts/kernel/aosp_gerrit_analysis.py

The AOSP Gerrit comparison report shows every vendor at 0 contributions — Samsung, Qualcomm, and Meta all at zero, with Google at 1 (abandoned). This is impossible:

  • Samsung and Qualcomm are major AOSP contributors with thousands of upstream patches.
  • The Gerrit API queries failed silently (likely authentication or query syntax issues).
  • Results were written to the report as if they were valid data.

Impact: HIGH — This entire dataset must be discarded. Any citations of AOSP contribution comparisons are unsupported.

Root cause: The owner:domain:facebook.com query returns nearly zero because Meta forks AOSP for Quest devices but doesn't upstream to AOSP Gerrit. However, the same pattern also returned zero for Samsung and Qualcomm, indicating a systematic query failure (not just a Meta-specific issue).

Recommended fix: Discard the current report. If AOSP analysis is needed, use the AOSP source tree commit history directly rather than the Gerrit REST API.


C3. LA HB-570 is the Wrong Bill

Location: data/raw/bills/LA_HB-570.txt, bills table row #3 Script: scripts/legislative/bill_collector.py

The LegiScan API returned the wrong bill for Louisiana HB-570:

  • Expected: Age verification / online child safety legislation
  • Received: "Authorizes a taxing authority to adjust a millage rate..." — a property tax bill
  • File contents: Binary PDF data of the tax bill, not parseable text

The scraper retrieved the wrong session's HB-570. The correct bill may be in a different legislative session than what was queried.

Impact: HIGH — All bill similarity analysis involving LA HB-570 (bill #3) is invalid. The bill_similarity scores comparing binary PDF against HTML text are meaningless.

Recommended fix: Identify the correct session year for Louisiana's age verification HB-570. Re-fetch with correct session parameter. The web scraper fallback (_scrape_louisiana) may also need session correction.


C4. PAC/Expenditure Data is All Scraping Artifacts

Location: lobbying_expenditures table (5 records) Script: scripts/lobbying/pac_tracker.py

All 5 lobbying_expenditures records show $0.00 with period values that are HTML element labels, not data:

Record Period Value Interpretation
1 "Search" Search button label
2 "TRACER " Website name
3 "Contrib" Column header
4 "*" Footnote marker
5 "" Empty string

The CO TRACER scraper parsed the website's form elements and navigation chrome as contribution records. The research knows Meta spent $45M+ through ATEP and millions on federal lobbying, but the expenditure table contains zero usable data.

Impact: HIGH — Any claim about PAC contribution amounts is unsupported by the database. The "8 records, $0 total" finding is entirely artifactual.

Recommended fix: Rewrite the TRACER HTML parser to correctly identify the data table rows vs. navigation elements. Consider using the TRACER data export (CSV download) instead of HTML scraping.


HIGH — Significant data quality or logical issues

H1. Massive Entity Duplication

Location: entities table (18 duplicated entities, 36 excess rows)

18 entities appear exactly twice with slightly different notes fields, suggesting the entity loading script was run twice without deduplication:

  • Organizations: Meta Platforms Inc, Facebook Inc, Meta Platforms Technologies LLC, Digital Childhood Alliance, ATEP, CCME, ConnectSafely, ICMEC
  • People: Casey Stefanski, Melissa McKay
  • Legislators: 4 unnamed entries
  • Vendors: Yoti, Jumio, Veriff, AgeKey/K-ID

Recommended fix: Add INSERT OR IGNORE or ON CONFLICT deduplication to the entity loader. Deduplicate existing records.


H2. FOIA Requests Duplicated

Location: foia_requests table (6 records representing 3 actual requests)

All 3 FOIA requests are duplicated:

  1. CO Secretary of State (×2)
  2. CO Attorney General (×2)
  3. LA Board of Ethics (×2)

Recommended fix: Deduplicate. Add unique constraint on (agency, subject).


H3. Findings Table Contains Massive Duplication and HTML Artifacts

Location: findings table

  • Horizon OS compliance findings appear 4 times each
  • Other key findings appear 36 times
  • 7 findings contain raw <!DOCTYPE html> markup from failed hearing transcript parsing
  • One finding (hearing transcript keyword match on HTML boilerplate) was duplicated 6 times

Recommended fix: Deduplicate findings by (task_id, finding_text) composite key. Filter out findings where the text starts with <!DOCTYPE or <html.


H4. Missing Lobbying Data for Key States

Location: lobbying_records table, data collection scripts

Two critical gaps in lobbying data:

California:

  • AB-1043 was signed into law on October 13, 2025
  • California is Meta's headquarters state
  • Zero California lobbying records were collected
  • CalAccess was only searched for DCA, not for Meta
  • This is the single most suspicious gap in the dataset

Texas:

  • SB-2420 was enacted then enjoined (legal challenge)
  • Zero Meta lobbying registrations found
  • Given Meta's lobbying intensity in CO (8 lobbyists) and LA (7 lobbyists), zero TX activity is anomalous
  • Possible explanations: different registration system, lobbying under a different entity name, or genuine absence

Recommended fix: Run CalAccess scraper for Meta Platforms. Check Texas Ethics Commission for Meta registrations under all known entity name variants.


H5. Bill Status Inconsistencies

Location: bills table

Mixed status formats across records:

Bill Status Value Expected
CA AB-1043 "4" "enacted"
LA HB-570 "1" "enacted"
CO SB26-051 "passed-senate" OK
UT SB-142 (timeline shows regression from "enacted" to "passed") Data error

LegiScan returns numeric status codes while targets.yaml uses text labels. The bill collector doesn't normalize between these formats. The UT timeline regression suggests a status update overwrote the correct value with an earlier status.

Recommended fix: Map LegiScan numeric codes to text labels. Add status validation that prevents regression (enacted → passed).


H6. "Contact Us" Parsed as Entity

Location: entities table, entity #49

An entity named "Contact Us" with notes "AVPA member." exists in the database. This is a scraping artifact — a website navigation button was parsed as an organization name by the AVPA member scraper.

This entity appears in 3 findings claiming it was found in legislative testimony.

Recommended fix: Delete entity #49. Add a blocklist of common navigation strings (Contact Us, Learn More, Sign Up, etc.) to the entity parser.


MEDIUM — Substantive patterns requiring investigation

M1. Meta's Asymmetric Lobbying Position on SB26-051

Location: data/raw/co_sos_meta_lobbying.csv

On Colorado SB26-051 (Age Attestation on Computing Devices), Meta's position is "Monitoring" — passive observation without attempting to amend. This is the only current child-safety bill where Meta is not actively amending.

Pattern analysis of all Meta CO lobbying positions:

Year Bill Topic Initial Position Final Position
2024 HB24-1058 Protections for Minors Monitoring Amending
2024 HB24-1130 Online Child Safety Monitoring Amending
2024 HB24-1136 Online Child Safety Monitoring Amending
2024 SB24-158 Child Safety Monitoring Amending
2024 SB24-205 Consumer AI Monitoring Amending
2025 HB25-1026 Digital Content Amending Amending
2025 SB25-024 Online Safety Amending Amending
2026 SB26-051 Age Attestation Monitoring Monitoring

Significance: SB26-051 mandates OS-level age attestation — a requirement that Horizon OS already partially meets (83.3% compliance per the research's own analysis) while Linux-based systems score 13.9%. If the bill structurally advantages Meta's platform, passive monitoring is consistent with tacit support — Meta benefits from the bill passing without needing to amend it.


M2. Louisiana Lobbying Surge Synchronized with Age Verification Legislation

Location: data/raw/la_ethics_meta_lobbying.csv

Meta's Louisiana lobbying force grew 3.5x over the research period:

  • 2023: 2 lobbyists
  • 2024: 4 lobbyists
  • 2025: 5 lobbyists
  • 2026: 7 lobbyists

Key timing:

  • Two lobbyists (Caffery III and Harbison) registered on the exact same day: September 29, 2025 — both paid.
  • 5 new lobbyists were added in the 12 months surrounding HB-570's legislative cycle.
  • The simultaneous registration of two new paid lobbyists suggests a coordinated expansion of Meta's influence capacity in Louisiana timed to the age verification legislative push.

M3. Digital Childhood Alliance (DCA) Rapid Mobilization

Location: WHOIS data, DCA OSINT results, entities table

Timeline of DCA emergence:

Date Event
2024-12-18 Domain digitalchildhoodalliance.org registered (Cloudflare privacy)
2025-05-05 Paid lobbyist (Koch) registered in Louisiana
2025-Q2 Twitter, X, and LinkedIn profiles created
2026-03-11 OSINT investigation finds zero results for named representatives

Anomalous characteristics:

  • 4.5 months from domain creation to paid lobbyist registration
  • DCA representatives Casey Stefanski and Melissa McKay have zero online search hits
  • DCA operates a paid lobbyist in the same state where Meta has 7 lobbyists
  • Domain registrant information fully redacted (Cloudflare privacy service)
  • Organization name uses generic credibility-signaling language ("Digital," "Childhood," "Alliance")

UPDATE (2026-03-11): MetaDCA funding confirmed by Deseret News (Dec 2025). Louisiana Senator Jay Morris pressed DCA Exec. Dir. Casey Stefanski on tech company funding during testimony. The coordination question is now confirmed; the remaining question is the mechanism (direct funding vs. pass-through).


M4. Headwaters Strategies Principal Chairs Arabella Dark Money Network

Location: OSINT investigation, New Venture Fund board page, InfluenceWatch

Adam Eichberg, co-founder of Headwaters Strategies (Meta's Colorado lobbying firm) and registered Meta lobbyist (EICHBERG, ADAM in CO SOS records), holds the following positions:

  • Chair of the Board of the New Venture Fund (NVF) — part of the Arabella Advisors pass-through network
  • Founding board member of the Windward Fund — another Arabella entity
  • Incoming board chair of Sunflower Services — the entity that acquired Arabella Advisors in late 2025

The Arabella network (NVF, Windward Fund, Sixteen Thirty Fund, etc.) is one of the largest "dark money" infrastructures in US politics, designed to incubate advocacy projects with donor anonymity.

Significance:

  • Meta's principal Colorado lobbyist sits at the center of an infrastructure purpose-built for anonymous donor-funded advocacy
  • The Center for Secure and Modern Elections (CSME), a New Venture Fund project, is also a Headwaters client — governance overlap
  • While no direct link between Arabella and DCA has been found, the NVF infrastructure could theoretically be used to fund or coordinate entities like DCA without direct paper trails
  • This finding does not prove Meta uses Arabella for age-verification advocacy, but it establishes that Meta's lobbyist has the capability and infrastructure to do so

Sources: newventurefund.org/who-we-are/board-of-directors/adam-eichberg-chair-of-the-board/; influencewatch.org/person/adam-eichberg/; headwatersstrategies.com/our-team/


M5. Headwaters Firm = Individual Lobbyists (Double-Counting)

Location: CO SOS lobbying disclosures, Headwaters Strategies website

The 4 individuals registered as Meta lobbyists in Colorado — COYNE, WILLIAM C; EICHBERG, ADAM; Schmidt, Alyson; Burkhart, Amber Janelle — are the principals and staff of Headwaters Strategies, Inc. The CO SOS data lists them both individually and as a firm, creating the appearance of "4 lobbyists + 1 firm = 5 entities" when it is actually one 4-person team.

Implication: Any analysis counting "unique lobbyists" in Colorado should treat Headwaters Strategies and its 4 named individuals as a single unit, not 5 separate entities. The effective Colorado Meta lobbying force is 8 individuals (4 Headwaters + 4 others: Diers, Martinez, Sachs, plus overlap).


M6. SB24-205 Date Paradox — Advance Knowledge of Amendment Strategy

Location: data/raw/co_sos_meta_lobbying.csv

For Colorado SB24-205 (Consumer Protections for Artificial Intelligence):

  • Lobbyist Ana Martinez registered an "Amending" position on April 19, 2024
  • All other Meta lobbyists began "Monitoring" on May 6, 2024 — 17 days later
  • One lobbyist was already amending before the rest even started monitoring

Possible interpretations:

  1. Advance coordination: Martinez received amendment instructions before the monitoring team was activated, suggesting a staged lobbying deployment.
  2. Date recording error: The CO SOS system recorded the wrong date.
  3. Independent action: Martinez acted independently of the monitoring team.

Interpretation #1 is consistent with the pattern observed in M1 (sophisticated lobbying position management).


M7. Meta Client Name Fragmentation Across Lobbying Records

Location: lobbying_records table, CO and LA lobbying CSVs

Meta appears under 6 different client name variants:

  1. Meta Platforms, Inc
  2. Meta Platforms, Inc.
  3. Meta Platforms, INC.
  4. Meta Platforms, Inc. (META)
  5. Facebook Inc.
  6. Facebook Inc (no period)

Two lobbyists (Sachs, Dan and Martinez, Ana) maintain separate "Facebook Inc." registrations alongside their Meta registrations. This could indicate:

  • Lobbying on behalf of distinct legal entities
  • Legacy registrations not updated after the Facebook→Meta rebrand
  • Intentional fragmentation to complicate public records analysis

Recommended investigation: Cross-reference which bills each client name is associated with to determine if the fragmentation maps to different policy areas.


M8. Zero Cross-State Lobbyist Overlap

Location: CO and LA lobbying records

Meta uses entirely separate lobbying teams in Colorado (8 people) and Louisiana (7 people) with zero personnel overlap. This is unusual for a company of Meta's scale — most large tech companies use at least some shared government affairs personnel across state efforts.

Implication: The use of entirely local teams in each state:

  • Makes it harder to trace coordination through personnel analysis
  • Suggests Meta may be deliberately structuring its state lobbying to avoid creating paper trails of multi-state strategy
  • Alternatively, may reflect standard practice of hiring local firms in each state

M9. Bill Similarity Methodology Fundamentally Flawed

Location: Bill similarity analysis, data/raw/bills/

The bill text comparison produced a maximum similarity score of 0.042 (effectively zero), but the methodology has fatal flaws:

  1. LA HB-570 text is binary PDF data (wrong bill — see C3)
  2. CO SB26-051 text is raw website HTML, not extracted bill text
  3. No text normalization or extraction was performed before comparison
  4. Only 5 of 13+ target bills were collected — too small a sample
  5. TF-IDF or similar vectorization on mixed binary/HTML/text data is meaningless

Recommended fix: Extract clean plaintext from all bill documents. Re-collect LA HB-570. Add IL and NY bills. Use section-by-section comparison rather than whole-document similarity.


LOW — Data quality issues and minor inconsistencies

L1. 96% of Sources Lack Verification Hashes

Location: sources table

82 of 85 source records have content_hash = NULL. Only 3 have hashes (2 patents + 1 other). Without content hashes, source material cannot be verified for changes or tampering. This undermines the reproducibility of the research.


L2. Duplicate Timeline Events

Location: timeline_events table

The Phoronix article about Colorado's age attestation bill (2026-03-09) appears twice with identical text and date. Other events may also be duplicated.


L3. Bill Status Monitor Reports False Changes

Location: output/reports/bill_status_*.md

5 of 7 "status changes" reported on 2026-03-11 are transitions from empty string to "unknown" — initial data population, not real legislative status changes. The monitor should distinguish between first-seen records and actual status transitions.


L4. Cached Files Contain Authentication Errors

Location: data/raw/ (cache directory)

4 of 1,477 cached files contain "Not Logged In" error pages from websites requiring authentication. 1 file is empty (0 bytes). These represent fetched data that was stored as if valid.


L5. Infrastructure Tasks Marked Complete With No Findings

Location: tasks table (S1S5)

Setup tasks S1 through S5 are marked "complete" with zero associated findings. These are infrastructure/setup tasks that may legitimately produce no research findings, but the pattern should be verified.


L6. Consolidated Report Contains Triplicated Sections

Location: output/reports/consolidated_findings.md (lines 168180)

The Horizon OS compliance findings section repeats the same 3 findings three times verbatim. This is caused by the entity and findings duplication in the database (H1, H3) propagating to report generation.


L7. OpenXR "67% Contribution Claim" Source Unidentified

Location: output/reports/consolidated_findings.md

The report states: "67% contribution claim assessment: NOT VERIFIED" but Meta's actual measured share is 28.7% of total OpenXR extensions. The source of the "67%" claim is never identified, attributed, or cited. It's unclear whether this was a Meta public statement, a media report, or a miscalculation.


Immediate (before citing any conclusions)

  1. Fix kernel signoff regex FIXED 2026-03-12 — Reran analysis using offline reference data (LWN kernel development reports). Meta now shows 5.5% share (9,361 tags across 45,630 commits in v6.10v6.13). Below LWN's 12.5% claim, suggesting concentration in specific subsystems (BPF).
  2. Discard AOSP Gerrit report FIXED 2026-03-12 — Report marked as RETRACTED with explanation. Data preserved for audit trail but flagged as invalid.
  3. Re-collect LA HB-570 — Identify correct session, re-fetch correct bill. IN PROGRESS (research agent running).
  4. Discard PAC expenditure data FIXED 2026-03-12 — Deleted all 5 $0 artifact records from lobbying_expenditures table.

Short-term (improve data quality)

  1. Deduplicate entities, findings, FOIA requests, and timeline events. FIXED 2026-03-12 — Removed 18 duplicate entities, 32 duplicate findings, 7 HTML artifact findings, 3 duplicate FOIA requests, 1 duplicate timeline event, and "Contact Us" scraping artifact.
  2. Collect California lobbying data for Meta (CalAccess). TODO
  3. Normalize bill status values (LegiScan codes → text labels). FIXED 2026-03-12 — Mapped numeric codes to text labels. Fixed CA AB-1043 and LA HB-570 to "enacted".
  4. Remove "Contact Us" and other scraping artifact entities. FIXED 2026-03-12 — Removed.

Medium-term (strengthen analysis)

  1. Re-collect bill texts with proper text extraction (not raw HTML/PDF). TODO
  2. Add content hashes to all source records. TODO
  3. Investigate SB24-205 date paradox through FOIA or alternative records. TODO (CO SOS FOIA response due 2026-03-13)
  4. Cross-reference Meta client name variants to identify entity-specific lobbying patterns. TODO

OSINT Investigation (added 2026-03-12)

  1. NVF 990 Schedule I grant recipient analysis — IN PROGRESS (research agent running)
  2. CO SOS DCA incorporation records — IN PROGRESS (research agent running)
  3. DCA DNS/WHOIS infrastructure analysis — IN PROGRESS (research agent running)
  4. CO General Assembly witness lists for age verification bills — IN PROGRESS (research agent running)
  5. Headwaters Strategies expenditure reports — IN PROGRESS (research agent running)
  6. Wayback Machine DCA website history — IN PROGRESS (research agent running)
  7. Arabella network 990 filings (Sixteen Thirty Fund, ConnectSafely, Windward, Hopewell) — IN PROGRESS (research agent running)