Superfund Map draft
Abstract
There are over a thousand “Superfund” sites in the U.S.: the most contaminated toxic waste sites, being cleaned up by the EPA. Many of these sites have hundreds of pages of technical documentation available, but no realistic tools exist by which regular interested citizens can learn about and engage with their surroundings—digital maps by major tech companies exist to drive commerce, not educate. I am making an easily navigable map of hazardous Superfund sites with clear summaries of their histories and impact on local health. To accomplish this, I’m archiving the dwindling documentation from government agencies about these sites, and using embeddings of these documents to ground an LLM that can answer questions through a novel spatial interface.
Introduction
America has over a thousand sites so contaminated with toxic matter that they have been taken over by the federal government for cleanup under the CERCLA designation, created in 1980; these are known as “Superfund sites.” New Jersey, California, and Pennsylvania have the highest concentrations of contaminated sites—all states where my family and I live. These are former and current ports, mines, landfills, factories, sewage treatment plants, power plants, etc. Since in most cases private entities caused the damage, the federal government attempts to get those entities to pay for cleanup costs, but public money is used as a backstop. Because of the EPA’s role here, the government has published extensive documentation on why sites deserve the Superfund designation, along with the results of inspections, cleanup plans, and more. These documents are all available to the public, but are housed on websites designed for Ph.D. researchers and EPA employees, using a long list of jargon and acronyms, coupled with inscrutable visual design, on high-bandwidth, low-accessibility pages. The information is available, but not realistically accessible, to the public. Moreover, many sites have hundreds of pages of documentation that take hours to comb through even for readers who can understand them. The result is that most Americans are unfamiliar with Superfund sites near them and the potential impact on their health.
The default tool most people reach for to understand the geographic world is digital maps like Google and Apple Maps. But because their creation is funded by massive technology companies looking to make ad revenue or license data for commercial use, our digital maps show landmarks one can drive a car to or spend money at; preferably, for commerce, both (Campbell). Besides parks, there is sparse information on other features of our land: industrial hazards, infrastructure, and everything that powers our world—and will outlast us—goes unmarked, including Superfund sites, much less the hazards they have posed and continue to pose to public health and the environment. While we understand these maps as fairly complete indexes of our surroundings, they are indexes of our current commercial surroundings. This project is about building a map of the inverse, focusing exclusively on the land and the hazards.
Background
Prior to the first Trump administration, the federal government published a tool called TOXMAP to make the Superfund dataset more easily navigable; it went offline before my research took me to this topic (Post-Tribune). The current standard tool from the EPA is an ArcGIS map, and states like California have their own specialized tools, but all suffer from high bandwidth requirements, difficult-to-read typography and visual design, and oftentimes poor accessibility. The second Trump administration is now working to rapidly take down content about climate, health, and the environment, including content it largely left online on federal websites during the first administration (Santarsiero). The current political climate demonstrates the urgency of the project, as data—data I have argued is already too inaccessible, even when available to the American public—is rapidly disappearing from the websites of the EPA and other environment-related agencies.
Kate Crawford’s 2023 essay “Earth” in Datapolis: Exploring the Footprint of Data on Our Planet and Beyond traces the connection between the tech industry of San Francisco and Silicon Valley and Earth’s land. The section that stuck with me reminds us how wealth first came to San Francisco through the Gold Rush—a dangerous, polluting, extractive process based on hype and individualism that made certain men wealthy. Then this spring, journalist Justine Calma wrote a longform piece for The Verge, “The women who made America's microchips and the children who paid for it,” which re-ignited my interest in Superfund. It follows the generation of women who worked in the chip fabrication labs of Silicon Valley in the 1980s, the hazardous materials they worked with, and the generation of children they had with frighteningly high rates of birth defects and developmental disabilities. The article brought to my attention one specific result of Superfund cleanups: land use restrictions, which future tenants of properties oftentimes do not follow, e.g. building schools or churches over contaminated soil.
While Silicon Valley in my lifetime has had a reputation for hype cycles, these essays reveal how this pattern repeats back to its very inception: cycles of primarily men chasing a dream of individualist extraction that produces outsized wealth, often with lofty ambitions for how this extraction will improve the world. The Gold Rush in the 19th century saw people flock to the unknown West to extract gold, leaving behind polluted waterways. The silicon boom in the 20th century manufactured microprocessors with a slew of toxic ingredients and few worker protections, minting billionaires and household names like Intel alongside a trail of polluted Superfund sites. Now in the 21st century, AI labs have switched to extracting digital information, outsourcing the toll of manufacturing to Asia and the energy and water costs to more rural areas of America, where they are celebrated as job creators. In each century, a nationally-hyped dream brought a new generation of young men to Silicon Valley, who built their dreams no matter the environmental costs, while a few concentrated the wealth for themselves. Each movement did change the world profoundly, though not without consequences their backers mostly omitted from the narrative. Tech bubbles are built on hype that materializes into technical innovations and fortunes for a lucky few, but leave someplace in shambles.
This understanding of technology as rooted in environmental extraction and destruction, and the grounding of the “cloud” in the land, is a key factor in my work. No digital computer has ever existed without exacting a toll on the natural environment and the people who made it—from Alan Turing in the British military to the modern TSMC iPhone processor’s manufacturing in Taiwan. Though the physical chips continue to shrink, the toll of their manufacturing continues to grow. And as our expectations of compute grow—the more data we accumulate, and the more processing we do of it—the more servers are needed, and the more electricity and water they consume. LLMs are novel in the scale of the toll they exact on the environment—in fossil-fuel-powered electricity usage, water usage for cooling, and the extraction and production that go into the servers’ microprocessors—completely removed from the interface or site of usage.
LLMs are also novel at answering wide-ranging questions rapidly, if not without asterisks. The latest research indicates that leading AI “answer engines” such as Perplexity answer only about two in three queries correctly, and some popular models such as Grok answer 94% of queries incorrectly (Jaźwińska and Chandrasekar). While models’ default training data leads them to make frequent errors, these can be mitigated with techniques including retrieval-augmented generation (RAG), where an LLM is pointed at a corpus of text and grounds its answers in those texts instead of general world knowledge.
The state of the art for interacting with LLMs is the scene of much experimentation. Amelia Wattenberger, an interface researcher and prototyper, writes about LLMs as providing a service to text similar to zooming a map: different landmarks are revealed or hidden as you zoom, and LLMs can keep track of what’s important in text and change the length and level of detail at any granularity. This closely tracks the concept of ingesting many source documents and composing high-level summaries of site histories, with the ability to drill into more detail by asking questions: zooming in on the map. One of the visualizations she explores as part of this concept is an alternative mechanism to the hyperlink, exploding out a handful of cards that reorganize when highlighted words are clicked.
Two other recent technical demos show new ways of interacting with LLMs. At South Park Commons in January 2025, Toby Brown showed off the idea of using text selection to initiate follow-up queries to an LLM, as opposed to the linear lists of suggestions employed by “traditional” LLM UIs like Perplexity. The selection allows you to choose any part of the response as a jumping-off point, and could lend itself to asking multiple new questions per answer instead of having one linear conversation. Eddie Jiao has prototyped an LLM/infinite canvas experience designed for documents and reading. Jiao’s demos, while early, bring LLMs into a context of reading and learning, using branching results and follow-up questions.
Methodology
LLMs should never be used in artwork unthinkingly. There are valid artistic reasons to include them—but like toxic metals or other corrosive physical art materials, they need to be weighed against the societal cost of the extraction, manufacturing, storage, and use of those materials. LLMs are largely being used and developed to accelerate extractive capitalism, yet they also make fantastic tools for exploring curiosities with lower friction than traditional research. While even RAG does not guarantee zero hallucinations or mistakes, I believe LLMs can make tools for learning about the world so much more accessible that the realistic alternative for most people is a void of knowledge, not a perfect research process. This project attempts to justify the environmental cost of using LLMs by flipping the motives behind their overuse on their head, while using technical measures like caching responses and on-device processing to reduce the pollution generated by the project itself.
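The response caching mentioned above can be sketched as a thin layer keyed on site and question. This is a hypothetical illustration, not the project's actual code; names like `ResponseCache` and `site_id` are my assumptions:

```python
import hashlib
import json

class ResponseCache:
    """Cache LLM answers so repeat questions reuse prior compute.
    In-memory here; a real deployment might persist to disk or a database."""

    def __init__(self):
        self._store = {}

    def _key(self, site_id, question):
        # Normalize the question so trivially different phrasings collide.
        raw = json.dumps([site_id, question.strip().lower()])
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, site_id, question):
        """Return a cached answer, or None on a cache miss."""
        return self._store.get(self._key(site_id, question))

    def set(self, site_id, question, answer):
        self._store[self._key(site_id, question)] = answer
```

Every cache hit is an LLM call, and its share of electricity and water, avoided.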
Laurie Voss, a web developer working on a popular RAG tool, wrote two principles to keep in mind about LLMs: “LLMs are good at transforming text into less text” and “LLMs only reliably know what you just told them, don't rely on training data” (Voss). This archive of documents provides plenty of information to feed into an LLM, and this project relies on the concept of turning a lot of text into less. To my knowledge, LLMs have not been applied to the Superfund dataset; this is a novel contribution to both the environmental and AI/web technology/design fields.
While my initial research indicates the data this project relies on continues to be available, the National Security Archive’s preservation work, combined with that of Archive.org and other nonprofit initiatives, could become necessary for documentation related to environmental justice that needs to be part of this project’s LLM corpus. I expect to use a range of both open and closed source tools in this process, including the Chroma vector database, PostgreSQL with pg_vector, OpenAI embeddings and responses, the Perplexity Sonar API, the Vercel AI SDK, Next.js, Tailwind CSS, and more.
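The retrieval step these tools support can be sketched end to end without any external service. The toy bag-of-words embedding below is a stand-in for real embeddings (such as OpenAI's), and the prompt template is my assumption, not the project's actual wording:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=2):
    """Return the k document chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question, chunks):
    """Ground the model in retrieved text rather than its training data."""
    context = "\n\n".join(retrieve(question, chunks))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

In the real pipeline, `embed` would call an embedding model, and the chunks would live in Chroma or pg_vector rather than a Python list, but the shape of the flow is the same: embed, rank, stuff the top results into the prompt.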
The project focuses on the most dangerous, extreme sites, those on Superfund’s National Priorities List, and leverages advancements in LLMs not only to render statistics, but to communicate the gist of these sites’ histories and statuses to the public. I am creating a tool for connecting with the land around us not as consumers, but as organisms living in an ecosystem we experience, and one which makes it easy to be curious and learn about our surroundings. The web tool shows a satellite map of the visitor’s area, grounding them in the texture of the land itself and highlighting the Superfund sites. On opening one, visitors can read a brief synopsis of the site’s history and present, and ask questions about it. The LLM powering these answers is grounded in the documentation published for each Superfund site, stored as vector embeddings, using RAG to base answers on per-site documentation and mitigate hallucinations.
The interface includes a novel interaction where selecting text in LLM answers branches off new lines of inquiry, allowing visitors to ask wide-ranging questions about the world using this map as a grounding space, physically and digitally. The panels float over the map’s surface, anchored to points in space and associated with each site; as visitors navigate around, they can build a web of knowledge over the land itself. For each question, visitors can choose between the site-specific RAG model and world knowledge from models like Perplexity. Answers can easily be dismissed or saved.
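This branching interaction can be modeled as a tree of question/answer panels, where selecting text in one answer spawns a child inquiry. This is a hypothetical data-structure sketch; the field names are my assumptions, not the project's implementation:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InquiryNode:
    """One floating question/answer panel, tied to a site on the map."""
    question: str
    answer: str = ""
    selection: Optional[str] = None  # text selected in the parent's answer
    children: List["InquiryNode"] = field(default_factory=list)

    def branch(self, selected_text, follow_up):
        """Selecting text in this answer opens a new line of inquiry."""
        child = InquiryNode(question=follow_up, selection=selected_text)
        self.children.append(child)
        return child

    def count(self):
        """Size of the web of knowledge rooted at this panel."""
        return 1 + sum(c.count() for c in self.children)
```

Because each answer can spawn multiple children, dismissing a panel prunes one subtree while the rest of the visitor's web over the map survives.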
The website exists as a portal out of the commercial web, out of the AI productivity hype cycle, and into a space designed for discovery and learning. No two journeys through this map will be the same, driven by the visitor’s path through physical space, their intellectual curiosity, and the non-deterministic answers of LLMs. Instead of government databases with rigid tables, acronyms, difficult typography, and dated designs, this space combines brand-new AI technology and novel interface design with decades of historical archives, attempting to bridge environmental history and our present surroundings with the digital space we now primarily occupy.
Works Cited
Brown, Toby. “Beem Demo.” Twitter, 25 January 2025, https://x.com/gopalkraman/status/1883248138745741750. Accessed 3 March 2025.
Calma, Justine, and Amelia Holowaty Krales. “The women who made America's microchips and the children who paid for it.” The Verge, 19 February 2025, https://www.theverge.com/features/611297/manufacturing-workers-semiconductor-computer-chip-birth-defect. Accessed 3 March 2025.
Campbell, Lachlan. “Remapping Our Landscapes.” @lachlanjc/edu, 15 May 2023, https://edu.lachlanjc.com/2023-05-15_dis_remapping_our_landscapes. Accessed 3 March 2025.
Crawford, Kate. “Earth.” Datapolis: Exploring the Footprint of Data on Our Planet and Beyond, edited by Paul Cournet and Negar Sanaan Bensi, Nai010 Publishers, 2023. Accessed 4 March 2025.
Jaźwińska, Klaudia, and Aisvarya Chandrasekar. “AI Search Has A Citation Problem.” Columbia Journalism Review, 6 March 2025, https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php. Accessed 11 March 2025.
Jiao, Eddie. “Common Knowledge Demo.” Twitter, 25 January 2025, https://x.com/gopalkraman/status/1883248154491166912. Accessed 3 March 2025.
Post-Tribune. “Federal agency shuts down website with interactive map showing pollution sources and Superfund sites.” Chicago Tribune, 6 January 2020, https://www.chicagotribune.com/2020/01/06/federal-agency-shuts-down-website-with-interactive-map-showing-pollution-sources-and-superfund-sites/. Accessed 3 March 2025.
Santarsiero, Rachel. “Disappearing Data: Trump Administration Removing Climate Information from Government Websites | National Security Archive.” National Security Archive, 6 February 2025, https://nsarchive.gwu.edu/briefing-book/climate-change-transparency-project-foia/2025-02-06/disappearing-data-trump. Accessed 3 March 2025.
Voss, Laurie. “What I've learned about writing AI apps so far.” Seldo.com, 20 January 2025, https://seldo.com/posts/what-ive-learned-about-writing-ai-apps-so-far. Accessed 11 March 2025.
Wattenberger, Amelia. “Fish Eyes.” Amelia Wattenberger, 6 December 2024, https://wattenberger.com/thoughts/fish-eye. Accessed 3 March 2025.