# Anomaly Report: Meta Linux Research Database

**Generated:** 2026-03-11

**Scope:** Full audit of SQLite database, 5 collected bill texts, lobbying CSVs, 1,477 cached files, and 17 generated reports

**Total anomalies identified:** 37

---

## Summary

| Category                            | Count  | Critical | High  | Medium | Low    |
|-------------------------------------|--------|----------|-------|--------|--------|
| Data Collection Failures (scrapers) | 12     | 4        | 3     | 1      | 4      |
| Substantive Research Findings       | 9      | 0        | 0     | 9      | 0      |
| Database Integrity (duplication)    | 9      | 0        | 4     | 2      | 3      |
| Report Quality (artifacts)          | 5      | 0        | 1     | 1      | 3      |
| Missing Data (collection gaps)      | 2      | 0        | 1     | 1      | 0      |
| **Total**                           | **37** | **4**    | **9** | **14** | **10** |

---

## CRITICAL: Data integrity issues that undermine research conclusions

### C1. Kernel Signoff Analysis Missed Meta Entirely

**Location:** `output/reports/kernel_signoff_analysis.md`, `output/reports/ebpf_ecosystem_analysis.md`

**Script:** `scripts/kernel/signoff_analysis.py`

The domain-matching regex in the kernel signoff analyzer failed to match `@meta.com` and `@fb.com` email domains, producing a **0.0% Meta signoff share** across 2,500 kernel commits (v6.10–v6.15). This directly contradicts known facts:

- The `entities` table contains 5 named Meta BPF employees with detailed maintainer roles:
  - Alexei Starovoitov (BPF co-maintainer)
  - Andrii Nakryiko (BPF libbpf maintainer)
  - Martin KaFai Lau (BPF networking)
  - Song Liu (BPF tracing)
  - Yonghong Song (BPF compiler)
- The corporate scorecard lists AMD (8.34%), Intel (7.73%), Red Hat (9.6%), kernel.org (14.66%), etc., but Meta/Facebook are entirely absent.
- The "12.5% gatekeeping claim assessment" and all kernel influence conclusions derived from this data are unreliable.

**Impact:** HIGH. This is the central thesis of the research. The actual Meta signoff share could be significant, especially in the BPF subsystem where Meta employees are primary maintainers.

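
The domain-matching gap described above can be illustrated with a minimal sketch. It is not the code in `scripts/kernel/signoff_analysis.py`: the map contents, the `company_for` and `signoff_shares` helpers, and the regex are all hypothetical, and it assumes commit text may arrive HTML-escaped, so entities are unescaped before matching.

```python
import html
import re
from collections import Counter

# Hypothetical domain map; labels and structure are assumptions,
# not the analyzer's actual configuration.
CORPORATE_DOMAINS = {
    "meta.com": "Meta",
    "fb.com": "Meta",
    "oculus.com": "Meta",
    "intel.com": "Intel",
    "amd.com": "AMD",
    "redhat.com": "Red Hat",
    "kernel.org": "kernel.org",
}

# Captures the address inside <...> on a Signed-off-by line.
SIGNOFF_RE = re.compile(
    r"^\s*Signed-off-by:\s*.*?<([^<>\s]+@[^<>\s]+)>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def company_for(email):
    """Map an email address to a company, accepting subdomains."""
    domain = email.rsplit("@", 1)[-1].lower()
    parts = domain.split(".")
    # Try the full domain first, then each parent (never the bare TLD).
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in CORPORATE_DOMAINS:
            return CORPORATE_DOMAINS[candidate]
    return "other"


def signoff_shares(commit_text):
    """Return each company's share of Signed-off-by tags in commit_text."""
    # Decode &lt; / &gt; so escaped tags match the same regex as raw ones.
    text = html.unescape(commit_text)
    counts = Counter(company_for(addr) for addr in SIGNOFF_RE.findall(text))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Placeholder addresses, not real contributors.
    sample = (
        "Signed-off-by: A Dev &lt;a@meta.com&gt;\n"
        "Signed-off-by: B Dev <b@fb.com>\n"
        "Signed-off-by: C Dev <c@linux.intel.com>\n"
    )
    print(signoff_shares(sample))
```

Running a check like this against a handful of raw commits with known Meta signoffs would show quickly whether the zero share comes from the domain map or from the parsing step.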
**Root cause:** The corporate domain-matching logic either doesn't include `meta.com`/`fb.com` patterns, or the HTML entity encoding fix (`<`/`>`) didn't fully resolve parsing of email addresses in commit tags.

**Recommended fix:** Add `meta.com`, `fb.com`, and `oculus.com` to the corporate domain map. Verify regex captures against raw commit HTML. Rerun the analysis.

---

### C2. AOSP Gerrit Data is Entirely Fictitious

**Location:** `output/reports/aosp_contribution_comparison.md`

**Script:** `scripts/kernel/aosp_gerrit_analysis.py`

The AOSP Gerrit comparison report shows Samsung, Qualcomm, and Meta at **0 contributions** each, with Google at 1 (abandoned). This is impossible:

- Samsung and Qualcomm are major AOSP contributors with thousands of upstream patches.
- The Gerrit API queries failed silently (likely authentication or query syntax issues).
- Results were written to the report as if they were valid data.

**Impact:** HIGH. This entire dataset must be discarded. Any citations of AOSP contribution comparisons are unsupported.

**Root cause:** The `owner:domain:facebook.com` query returns nearly zero because Meta forks AOSP for Quest devices but doesn't upstream to AOSP Gerrit. However, the same query pattern also returned zero for Samsung and Qualcomm, which indicates a systematic query failure rather than a Meta-specific result.

**Recommended fix:** Discard the current report. If AOSP analysis is needed, use the AOSP source tree commit history directly rather than the Gerrit REST API.

---

### C3. LA HB-570 is the Wrong Bill

**Location:** `data/raw/bills/LA_HB-570.txt`, `bills` table row #3

**Script:** `scripts/legislative/bill_collector.py`

The LegiScan API returned the wrong bill for Louisiana HB-570:

- **Expected:** Age verification / online child safety legislation
- **Received:** "Authorizes a taxing authority to adjust a millage rate...", a property tax bill
- **File contents:** Binary PDF data of the tax bill, not parseable text

The scraper retrieved HB-570 from the wrong session. The correct bill may belong to a different legislative session than the one queried.

**Impact:** HIGH. All bill similarity analysis involving LA HB-570 (bill #3) is invalid. The `bill_similarity` scores compare binary PDF data against HTML text and are meaningless.

**Recommended fix:** Identify the correct session year for Louisiana's age verification HB-570 and re-fetch with the correct session parameter. The web scraper fallback (`_scrape_louisiana`) may also need a session correction.

---

### C4. PAC/Expenditure Data is All Scraping Artifacts

**Location:** `lobbying_expenditures` table (5 records)

**Script:** `scripts/lobbying/pac_tracker.py`

All 5 `lobbying_expenditures` records show `$0.00`, with period values that are HTML element labels, not data:

| Record | Period Value | Interpretation      |
|--------|--------------|---------------------|
| 1      | `"Search"`   | Search button label |
| 2      | `"TRACER "`  | Website name        |
| 3      | `"Contrib"`  | Column header       |
| 4      | `"*"`        | Footnote marker     |
| 5      | `""`         | Empty string        |

The CO TRACER scraper parsed the website's form elements and navigation chrome as contribution records. The research elsewhere documents $45M+ in Meta spending through ATEP and millions on federal lobbying, yet the expenditure table contains zero usable data.

**Impact:** HIGH. Any claim about PAC contribution amounts is unsupported by the database. The "8 records, $0 total" finding is entirely artifactual.

**Recommended fix:** Rewrite the TRACER HTML parser so that it distinguishes data table rows from navigation elements. Consider using the TRACER data export (CSV download) instead of HTML scraping.

---

## HIGH: Significant data quality or logical issues

### H1. Massive Entity Duplication

**Location:** `entities` table (18 duplicated entities, 36 rows in total, 18 of them redundant)

18 entities appear exactly twice with slightly different `notes` fields, suggesting the entity loading script was run twice without deduplication:

- Organizations: Meta Platforms Inc, Facebook Inc, Meta Platforms Technologies LLC, Digital Childhood Alliance, ATEP, CCME, ConnectSafely, ICMEC
- People: Casey Stefanski, Melissa McKay
- Legislators: 4 unnamed entries
- Vendors: Yoti, Jumio, Veriff, AgeKey/K-ID

**Recommended fix:** Add `INSERT OR IGNORE` or `ON CONFLICT` deduplication to the entity loader, backed by a unique constraint on the columns that identify an entity (without one, neither clause has a conflict to act on). Deduplicate existing records.

---

### H2. FOIA Requests Duplicated

**Location:** `foia_requests` table (6 records representing 3 actual requests)

All 3 FOIA requests are duplicated:

1. CO Secretary of State (×2)
2. CO Attorney General (×2)
3. LA Board of Ethics (×2)

**Recommended fix:** Deduplicate. Add a unique constraint on `(agency, subject)`.

---

### H3. Findings Table Contains Massive Duplication and HTML Artifacts

**Location:** `findings` table

- Horizon OS compliance findings appear **4 times each**
- Other key findings appear 3–6 times
- 7 findings contain raw HTML markup from failed hearing transcript parsing
- One finding (hearing transcript keyword match on HTML boilerplate) was duplicated **6 times**

**Recommended fix:** Deduplicate findings by `(task_id, finding_text)` composite key. Filter out findings where the text starts with `