Júlia Keserű is an independent researcher and writer working at the intersection of technology and justice. She was a senior tech policy fellow at Mozilla Foundation from 2023 to 2025.
I remember clearly the voice of the doctor who called to tell me I had advanced breast cancer. It was a September afternoon in 2021. In the midst of shock, I heard words that comforted me. She was calming, her tone sweet; she handled my pain with the grace of a seasoned health professional: with wisdom, poise, empathy. I was speaking to a human who, I believed, felt compassion for my agony.
The moment was fleeting. Within days of the diagnosis, I entered the inner workings of a weird machine. Every milestone of my journey through cancer was conveyed to me through a digital health record system set up by the government in Hungary, where I am from. The system delivered a digitized pathology report that informed me of local metastasis (the cancer’s spread to my lymphatic system) and how that would be treated. In effect, I was being told of the increased likelihood of my early death, and also how that might possibly be avoided, by a database that had been designed to free up my doctors’ time.
The way in which frightening information was delivered over those months deepened my depression and worsened the feeling that I was losing control of my own health and life. But in navigating EESZT, as the Hungarian database is known, something else came into view: I realized I had no idea who had access to my information. Was it just my oncologists — or other doctors too? What about the rest of my medical history: my mental health treatments and reproductive journey, past lab results and CT scans?
I soon learned that any doctor who had ever treated me — and anyone with a doctor’s permission to access EESZT — could view my entire record, unless I changed my privacy settings. Even with the European Union’s strict data protection regulations, the system’s default made much of my health data available to a large number of people.
Private companies also entered the fray. My doctors offered me AI-powered software to predict how likely my cancer was to respond to chemotherapy. I ultimately didn’t use it — mainly because of the specifics of my cancer, not the technology — but if I had, it would have gathered a vast genetic dataset about me. Later, I did turn to ChatGPT, which archives and uses every user input, to distill complicated research and to ask the questions I didn’t want to saddle my family with: Are these symptoms really serious? How do I cope with my dark thoughts? Through my search history, Google too began to accumulate a clear picture of my situation at any moment, of how my mental and physical health was changing, like a digital diary of my dual fight against cancer and depression.
And so it was no longer just health professionals who had access to sensitive information about my health. By collecting troves of data about me, Silicon Valley companies did too — and so did whoever they decided to share that information with. This rapidly expanding shadow industry, as I came to learn in the years after my illness, is wielding increasing influence over our bodies and futures.
“A major paradigm shift is needed in how we regard our bodies and their boundaries online.”
Over the past decade, the ways in which we share information about our health have undergone a profound shift. There is now broad acceptance of the idea of depositing health data — from fitness levels, breathing and sleep patterns to mental health conditions, sexually transmitted diseases and genetic predispositions — into commercial and public databases, often with varying levels of security and questionable consent policies.
Once my tumors were removed and I went into remission, I began using digital tools that offered the faint promise of a long life: exercise and diet trackers that nudged me to eat better and remain active; sleep and meditation apps that promised help in alleviating my anxiety. I was 38 at the time of my diagnosis; facing the prospect of dying young, the feeling of control offered by these apps was comforting.
But I didn’t use them lightly. I had by then been working for years as a digital rights activist focused on the societal impact of emerging data-driven technologies. The private companies behind such technologies often justify their use of body-centric data as essential for scientific research, population health and online safety. Yet around the time of my diagnosis, stories were surfacing of unsecured information from services like Google’s Fitbit and Apple HealthKit ending up online. Even therapy apps like Talkspace or period-tracking apps like Flo Health were caught sharing sensitive data with third parties without user consent. In some cases, data was left exposed accidentally; in others, the companies shared it intentionally.
As I became more dependent on these apps, a feeling grew that I was handing sophisticated insights about myself to an opaque, underregulated, rapidly expanding industry. The market for electronic health record systems like EESZT or Epic Systems in the U.S. — developed by private tech companies and managed by public entities — is projected to reach almost $50 billion by 2032, while the mobile health industry, which includes fitness and other apps as well as remote health devices, could grow to $350 billion by the end of the decade. Add in other forms of biometric data collection, such as facial recognition or genetic testing, and the combined market for body- and health-focused data could reach between $500 billion and $600 billion over the same period. For comparison, the global pharmaceutical market was valued at around $1.7 trillion in 2024.
As the health tech sector has surged, enormous biopolitical power has become concentrated in the hands of a few influential companies. These entities have the potential to dramatically influence individual health behaviors and shape broader population health outcomes.
Several years after my illness, during a fellowship at the Mozilla Foundation, I began a research project exploring the growth, governance and societal impact of what I now refer to as “body-focused” technologies. Out of this research, two scenarios for the future emerged.
In one, advancements in data collection and analysis dramatically improve health outcomes to the benefit of many. Mass collection of data is already advancing medical research by, for instance, revealing how patients respond to treatments, detecting diseases in individual patients earlier and predicting mass outbreaks. The U.S. Centers for Disease Control and Prevention’s FluView tool collects data from hospitals, clinics and labs to monitor influenza circulation and detect virus changes; AI tools are able to estimate a tumor’s molecular profile with high accuracy, supporting and enhancing the work of radiologists and oncologists. Innovations of this kind are likely to continue: Work is underway to use voice recognition to detect potential cognitive and respiratory health issues.
But in the other scenario, a rapid expansion of the health tech industry and an accompanying increase in cybersecurity incidents compromise the sensitive data of millions of patients and potentially endanger the medical services of entire nations. Such a future is already here, in fact: In 2021, a massive ransomware attack on Ireland’s public health service paralyzed healthcare institutions across the country.
As time passes, such incidents are only more likely to recur, with public disquiet growing in tandem. Surveys I have undertaken reveal widespread unease about unauthorized data sharing, particularly of health and biometric information, and a strong demand for protections that current systems often fail to provide. Existing legal frameworks fall short in addressing these emerging challenges, and that gap appears to be growing.
“Our willingness to surrender health information so freely not only jeopardizes individual privacy and autonomy but also poses significant risks to societal health.”
Complicating this uncertain new world is another rising industry: data brokers who purchase, sell and trade insights about people’s mental and physical well-being, often without their knowledge. Using a variety of methods, companies like Acxiom, Experian, Equifax and LexisNexis gather vast amounts of personal data, then package and sell it to advertisers, insurers, credit agencies and even law enforcement or government agencies. Hundreds of data broker companies operate worldwide, with the U.S. among the most developed and least regulated markets.
This brokerage practice has a long history. Before the internet, companies scoured public records, phone books and other physical data resources, such as land registers, with the same intent: to get information about people that could be used to determine their credit scores or sell them something. The internet shifted the means by which data was gathered, yet for most of the lifespan of the web, brokers tended to focus on publicly available information: social media posts, public records, loyalty programs and other sources that enabled them to categorize individuals in ways that made sense for advertising and targeting. If a woman’s Facebook profile showed she had reached childbearing age, for instance, data brokers might classify her as potentially pregnant and sell that information to fertility and parenthood companies.
While these strategies enabled targeted advertising, the data was often flawed, relying on superficial or partial information and making sweeping assumptions that failed to reflect individual realities. But soon, collection methods became more sophisticated. Software developers began embedding tracking tools, such as software development kits (SDKs), into mobile apps that could silently gather data on phone users’ network activity, usage patterns, interaction history, and health and fitness. Many SDK providers then aggregated and analyzed this data, and some leased or sold access to third parties, including data brokers. With this, brokers could craft highly detailed and nuanced profiles of individual users, mostly without their knowledge or consent, and sell these to advertising firms. Although most of this data was aggregated or anonymized, research has repeatedly shown that re-identification — the process of matching anonymized data back to individual identities — is surprisingly easy.
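To see why re-identification is so easy, consider a minimal sketch of a linkage attack, using entirely hypothetical records and column names: an “anonymized” health table that keeps quasi-identifiers such as ZIP code, birth date and sex can simply be joined against a public dataset, such as a voter roll, that lists the same attributes next to people’s names.

```python
# Minimal sketch of a linkage (re-identification) attack.
# All records and column names here are hypothetical.
import pandas as pd

# "Anonymized" health records: direct identifiers removed,
# but quasi-identifiers (ZIP code, birth date, sex) retained.
health = pd.DataFrame([
    {"zip": "94107", "birth_date": "1983-05-02", "sex": "F", "diagnosis": "breast cancer"},
    {"zip": "94107", "birth_date": "1990-11-17", "sex": "M", "diagnosis": "depression"},
])

# A public dataset, such as a voter roll, that contains the same
# quasi-identifiers alongside names.
public = pd.DataFrame([
    {"name": "Jane Doe", "zip": "94107", "birth_date": "1983-05-02", "sex": "F"},
    {"name": "John Roe", "zip": "94107", "birth_date": "1990-11-17", "sex": "M"},
])

# Joining on the shared quasi-identifiers re-attaches names to diagnoses.
reidentified = health.merge(public, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```

The privacy scholar Latanya Sweeney famously estimated that ZIP code, birth date and gender alone are enough to uniquely identify the large majority of the U.S. population, which is why stripping names from a dataset offers far weaker protection than the word “anonymized” suggests.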
Information brokerage is not inherently evil. Brokers have served as a crucial resource for pharmaceutical companies and health insurers, providing information that influences their interventions and policies.
“Data is now much more versatile, and machine learning tools are capable of generating remarkably accurate behavioral and health predictions from even the most ordinary information.”
But while the end results can sometimes be beneficial, the ways in which medical data is being gathered, sometimes from the dark web, have drawn attention to the industry’s questionable legality, if not morality. And acquiring data that way is getting easier by the hour. In the U.S. alone, health-related online breaches skyrocketed from just 18 in 2008 to 734 in 2024, when the data of 275 million people was stolen by hackers who often then put it up for sale.
In particular, numerous cases have involved the theft of genetic information, which is a gold mine for private firms given its ability to reveal individuals’ predispositions to various cancers and hereditary conditions and also their behavioral tendencies and even their susceptibility to mental health conditions. In 2023, the San Francisco-based genomic and biotech company 23andMe, which this year filed for bankruptcy, confirmed that about a million genetic data points had been stolen in an attack and made available for purchase on the dark web. The hackers gained access to the data of almost 7 million users and offered for sale millions of detailed, individualized genetic profiles, including ethnic, phenotypic and familial data.
Unsecured physiological health data is just one part of this problem. Increasingly, people intentionally share their mental health journeys, deepest thoughts and suicidal impulses online, and brokers trade that information too. Researchers from Duke University found that many data brokers openly advertise highly sensitive data about individuals’ experiences with depression, attention disorders, insomnia, anxiety, ADHD and bipolar disorder. They are known to sell this data on the open market without properly vetting buyers and with few controls on its use. “Many of the studied data brokers at least seem to imply that they have the capabilities to provide identifiable data,” the Duke University report warned.
The implications of this are profound. Insurance companies, which have already used broker-provided data on race, marital status and even media consumption to set premiums, are now able to use detailed health and genetic information. In one case from Australia, a man was refused life insurance solely because of his genetic makeup. It isn’t a stretch to see this soon extending to other areas — hiring processes, education admissions, housing eligibility — where biological data becomes a weapon for bias and exclusion.
At the onset of my illness in late 2021, I found myself more at ease sharing certain types of personal information with private companies. Details like my fitness routines, sleep patterns or lifestyle choices seemed relatively innocuous if shared with mobile app developers. The worst consequence, I thought, might be a targeted ad for a local gym or a diet regimen. I felt secure that the EU’s General Data Protection Regulation (GDPR) and other established protections would shield more sensitive data about my core personal identity from misuse.
Legal data privacy regimes are primarily founded on the idea that some information is harmless while other information is inherently more sensitive. In the early days of the internet, this distinction made sense: Without advanced computational capabilities to link together strands of data and identify patterns, information about breathing patterns, for example, was less sensitive than details regarding your sexual orientation.
Not anymore. Data is now much more versatile, and machine learning tools are capable of generating remarkably accurate behavioral and health predictions from even the most ordinary information. When combined with your facial expressions, for instance, data on your breathing patterns can yield alarmingly accurate predictions about who you may be sexually attracted to. In a 2023 paper, privacy scholar Daniel J. Solove writes of this new era in data analysis:
Personal data is akin to a grand tapestry, with different types of data interwoven to a degree that makes it impossible to separate out the strands. With Big Data and powerful machine learning algorithms, most nonsensitive data give rise to inferences about sensitive data.
In other words, in the age of AI, even innocent bits of information about our bodies and our health can be combined to make sophisticated predictions about us.
If we accept that nearly every piece of data we share can signify potentially sensitive information, the concept of sensitivity becomes obsolete. Where does that leave us in terms of our approach to data protection?
It’s clear that a major paradigm shift is needed in how we regard our bodies and their boundaries online. Instead of fixating on the specific types of data we share, the focus should turn to the potential impact of any data that reveals clues about our overall health and well-being. Questions that I hadn’t had to consider at the start of my illness now need to be asked: What harm could a single data point cause if combined with other datasets? What are the consequences if such data is aggregated?
There are straightforward ways a person can assert greater control over their own data and where it goes: by limiting the use of data-hungry mobile apps, by opting for platforms with a proven track record of prioritizing privacy, by considering how third parties can access data before choosing what to share. Cloud storage, for instance, makes personal data more susceptible to being stolen or hacked. By reverting to local data storage solutions like user-controlled hard drives, people can limit how much companies or governments can know about them.
“In the age of AI, even innocent bits of information about our bodies and our health can be combined to make sophisticated predictions about us.”
Then there are laws. A key reason data can fall into the wrong hands or be used against us is the absence of robust legal frameworks that address the unique challenges posed by the surge in health data collection, particularly as this data is no longer gathered only by traditional healthcare entities. In various jurisdictions, including the U.S., some organizations that collect health data — such as mobile health apps and certain genetic testing companies — are not, as we’ve seen, subject to the same stringent privacy and security laws as hospitals and doctors.
But privacy experts are increasingly in agreement that while data protection laws are essential, current regulations often lead to confusing and inconsistent protections that overlook the complexities of data use and harm. There is a growing chorus of voices arguing for laws to focus less on types of data and more on the risks and harms that any data can generate when used in a certain way. In other words: Don’t just restrict particular data categories — place more responsibility on tech companies to prove that their data practices pose no risk of harm to any of their users.
Washington state has already made headway with its My Health, My Data Act, which extends broad protections to all health-related inferences, even those derived from non-health data, such as information on grocery purchases or social media posts. The scope of scrutiny also needs to broaden, and significantly greater burdens need to be placed on tech companies than in the past. The Health Insurance Portability and Accountability Act (HIPAA) was passed in 1996 to govern the actions of traditional healthcare providers; its mandates now need to cover the tech companies managing health data or making health inferences, including mobile health app developers, which currently operate outside the scope of the law.
As we confront the shortcomings of our current privacy regimes, the call for innovative technological approaches to bolster consumer trust and control of data also grows louder. Companies are coming to understand that by going beyond mere legal compliance — by, for instance, integrating human-centric consent mechanisms into their platforms that grant users full transparency over the use of their data — they are much more likely to retain users and therefore stay competitive. But none of this will happen unless consumers loudly demand that their privacy be respected, protected and prioritized as a fundamental right.
Even long before my diagnosis, I’d been preoccupied by the ways in which the digital landscape intersects with our bodily autonomy and integrity. When I brought my daughter into the world in 2017, this concern only intensified. I began to notice how even small instances of digital intrusion, such as strangers taking photos of her without my permission, evoked a visceral, almost primal response within me.
In the past, such breaches might not have disturbed me so profoundly. But with facial recognition technology able to identify someone and retrieve online information within seconds, the sense of losing control over my daughter’s body, as well as mine, became all the more disturbing. Parenthood is an inherently nerve-wracking experience that forces us to relinquish power over many aspects of our lives, but such an extreme feeling of “digital powerlessness” left me, as a new mother, with a dread that I’d not felt before.
What I now realize, through personal experience and research, is that nuanced biological data, while vastly expanding our understanding of the human body and mind, threatens to transform Orwellian dystopias from fiction into reality. Our willingness to surrender this information so freely not only jeopardizes individual privacy and autonomy but also poses significant risks to societal health, undermining the integrity and fairness of our healthcare systems and threatening our safety and security as patients. This is especially troubling when we consider a plausible future in which data-driven innovation genuinely enhances our well-being, making us ever more eager to hand over data, which is then turned against us and used as fuel for sweeping commercial surveillance and the consolidation of immense biopolitical power in the hands of a few companies.
And yet, I believe it’s not too late to correct course. Beyond pushing for legislative reforms to curb industry overreach, people ought to fundamentally rethink their digital consumption, questioning how much of their private lives they are willing to relinquish.
I want my daughter to grow up in a world where she can exercise genuine control over what happens to her body online — whether in her interactions with strangers on social media or with private companies and government entities. I envision a future for her where respect for her bodily autonomy is the default in digital spaces, not an exception justified by the mantra of “moving fast and breaking things.” Only then can we begin to steer the trajectory of technological progress toward a truly equitable and humane future.