Human Computer Integration Lab

Computer Science Department, University of Chicago


Essays (long-form writings from our lab)


Essay #7: The Missing Substrate: Nervous System Data Bottleneck (Prof. Pedro Lopes)

April 29th, 2026

[Photo: chilling at UIST 2025]

1. Concrete Bottleneck

Over the past decade, multiple fields have undergone paradigm shifts due to the availability of large-scale, shared datasets paired with foundation models. Computer vision was transformed by ImageNet and subsequent web-scale image corpora; natural language processing by Common Crawl and Wikipedia; protein science by the Protein Data Bank (PDB); and robotics is beginning to benefit from shared action datasets, with the improvements rapidly becoming tangible in that field.

However, at the intersection of neuroscience and Human–AI interfaces, an equivalent paradigm is conspicuously absent.

The bottleneck is not simply a lack of data, but the lack of structured, shared, multimodal nervous system datasets that link stimulation, sensing, and perception or action outcomes. Critically missing datasets include:

  • Neural signals (EEG, ECoG, intracortical)
  • Peripheral signals (EMG, skin conductance, nerve activity)
  • Stimulation parameters (FES/EMS, TMS, tDCS, haptics)
  • Kinematics (body pose, torque/forces)
  • Ground-truth outcomes (perception, behavior, adaptation, contextual cues from the environment, etc.)

None of these exist at large scale—the scale needed to remove this bottleneck.

Today, progress remains fragmented across clinical labs, Human Computer Interaction research groups (like mine), and psychophysics communities, each producing small, siloed datasets that cannot support generalizable models. While I am proud of my group's work in this area (e.g., we have 10+ years of data on how the nervous system reacts to electrical muscle stimulation, brain stimulation, tactile stimulation, and much more), I do worry that these datasets are small, incompatible with those of other labs (simply put: we have no way to share them, nor an agreed-upon format for doing so), and locked away on my lab's servers.

2. Why Existing Structures Fail

This bottleneck persists due to systems-level constraints rather than purely technical limitations.

  • Incentive misalignment: Academic systems reward novelty over dataset curation.
  • Hardware heterogeneity: Data is collected using incompatible, custom setups (even in our lab, we struggle to reuse the same templates and hardware/firmware setups across multiple generations of students and hardware variants).
  • Lack of shared schemas: No standard representation exists (moreover, academia again does not tend to reward the arduous task of creating these "codecs"/schemas).
  • Small-N culture: Studies rely on limited participants.
  • Perceived irreproducibility: High variability discourages sharing.
  • Regulatory barriers: Privacy and ethics constraints make nervous system data difficult to share, so it often remains tied to the individual papers it was collected for.

The result, I argue, is that no single lab can resolve these issues, and most incremental efforts thus far have failed to produce a shared data substrate.


3. Hypothesis

If we create a minimally viable, shared, multimodal repository of nervous system data, paired with a standardized schema, and train foundation models on top of it, we will observe emergent generalization across tasks, devices, and users. We might then expect, in return, an explosion of data sharing in this space, leading to a wave of innovation.


4. Proposed Experiment

Phase 1: Minimum Viable Dataset

Aggregate data from 5–10 labs, ensuring a wide range of modalities is included, such as EMG, haptics, and neural data where available. Initiate a dialogue to establish and standardize time-synced multimodal streams.
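To make "standardized time-synced multimodal streams" concrete, here is a minimal sketch of what such a shared schema could look like. All names and fields (`Stream`, `Trial`, the modality labels, the shared-clock convention) are illustrative assumptions, not an agreed-upon standard:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Stream:
    """One time-synced signal stream (e.g., EMG, EEG, a stimulation log)."""
    modality: str            # e.g., "emg", "eeg", "ems_params", "pose"
    sample_rate_hz: float
    samples: list            # raw samples; shape depends on modality
    t0_unix: float           # start time on a shared clock, in seconds

@dataclass
class Trial:
    """One stimulation/sensing trial plus its ground-truth outcome."""
    lab_id: str
    participant_id: str
    streams: dict = field(default_factory=dict)   # modality -> Stream
    outcome: Optional[dict] = None                # e.g., {"perceived": True}

    def add_stream(self, s: Stream) -> None:
        self.streams[s.modality] = s

    def has(self, modality: str) -> bool:
        return modality in self.streams

# Example: a trial recorded with EMG plus EMS parameters, but no EEG.
trial = Trial(lab_id="lab_a", participant_id="p01")
trial.add_stream(Stream("emg", 1000.0, [0.1, 0.2, 0.15], t0_unix=0.0))
trial.add_stream(Stream("ems_params", 100.0, [(25.0, 0.2)], t0_unix=0.0))
```

The key design choice is that every stream carries its own sample rate and a timestamp on a shared clock, so labs with different hardware can contribute without resampling to a single master rate.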


Phase 2: Train Models

In the second phase, we train models that jointly capture the relationships between sensing, stimulation, and resulting outcomes. Rather than treating these components in isolation, the goal is to learn a unified representation that links neural and peripheral signals, stimulation parameters, and behavioral or perceptual responses within a single framework. Because real-world datasets in this domain are inherently incomplete, the models must also be designed to handle missing modalities—learning from whatever combination of signals is available while still maintaining coherent internal structure.
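One common way to handle missing modalities is to encode each available modality separately and fuse only what is present. The sketch below illustrates that idea under stated assumptions: the per-modality "encoders" are random linear maps standing in for trained networks, and the dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8

# One linear "encoder" per modality (placeholder weights, hypothetical
# input dimensions: 16 EMG channels, 32 EEG channels, 6 pose values).
encoders = {
    "emg":  rng.normal(size=(16, EMBED_DIM)),
    "eeg":  rng.normal(size=(32, EMBED_DIM)),
    "pose": rng.normal(size=(6, EMBED_DIM)),
}

def fuse(trial_signals: dict) -> np.ndarray:
    """Embed each present modality and average; missing ones are skipped."""
    embeddings = [
        x @ encoders[m] for m, x in trial_signals.items() if m in encoders
    ]
    if not embeddings:
        raise ValueError("trial has no known modalities")
    return np.mean(embeddings, axis=0)

# A trial missing EEG still yields a fixed-size representation, so the
# same downstream model can consume any subset of signals.
z = fuse({"emg": rng.normal(size=16), "pose": rng.normal(size=6)})
```

Averaging over available embeddings is the simplest such scheme; attention-based fusion would be a natural next step, but the invariant is the same: the representation has a coherent structure regardless of which modalities a given lab recorded.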


Phase 3: Generalization Tests

The final phase evaluates whether the learned models genuinely generalize beyond the conditions under which they were trained. To do this, we adopt a leave-one-lab-out protocol: models are trained on data aggregated from multiple labs and then tested on a completely held-out lab whose data was never seen during training. This setup provides a stringent test of whether the model has captured underlying structure in nervous system signals, rather than overfitting to specific hardware setups, participant populations, or experimental conventions.

Performance is compared against baselines trained only on the target lab’s local data. This comparison is critical: it distinguishes between models that merely benefit from more data and those that truly leverage cross-lab diversity to learn transferable representations. Improvements over local-only baselines would indicate that shared datasets provide additive value beyond scale, while failures to outperform them would suggest limits to generalization under current assumptions.
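The leave-one-lab-out split itself is simple to state in code. In this sketch the "model" is a toy mean predictor and the per-lab values are invented placeholders; the point is the evaluation structure, not the model:

```python
# Hypothetical per-lab outcome values (stand-ins for real recordings).
labs = {
    "lab_a": [1.0, 1.2, 0.9],
    "lab_b": [2.0, 2.1, 1.9],
    "lab_c": [1.5, 1.4, 1.6],
}

def mean_predict(train_values, test_values):
    """Toy 'model': predict the training mean; return mean absolute error."""
    pred = sum(train_values) / len(train_values)
    return sum(abs(v - pred) for v in test_values) / len(test_values)

# Leave-one-lab-out: train on all labs except one, test on the held-out lab.
results = {}
for held_out, test_vals in labs.items():
    pooled = [v for lab, vals in labs.items() if lab != held_out for v in vals]
    results[held_out] = mean_predict(pooled, test_vals)
```

In the real experiment, each `results[held_out]` would be compared against a baseline trained only on that lab's own data (with its own train/test split), which is exactly the cross-lab vs. local comparison described above.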


Possible Outcomes

This experiment is intentionally designed so that all outcomes are informative. If strong generalization emerges across labs, it would provide compelling evidence that shared datasets and foundation models can serve as a viable path forward for the field. If generalization is present but limited, it would suggest that the approach is directionally correct but constrained by insufficient standardization, dataset scale, or alignment across collection protocols. Finally, if models fail to generalize altogether, this would indicate that variability across users, devices, or contexts may be more fundamental than currently assumed, motivating a shift toward personalization-first approaches or entirely different modeling paradigms.

5. Why Now

Several converging developments make this an unusually timely moment to address this bottleneck. First, advances in multimodal AI now enable learning from heterogeneous and partially aligned data sources, a capability that was largely out of reach just a few years ago. Second, the rapid proliferation of wearable sensors and stimulation devices has made it significantly easier to collect peripheral nervous system data at scale. Third, there is growing momentum around embodied AI and human–machine integration, increasing the demand for models that operate directly on the body rather than solely on text or images. Finally, progress in key application domains—such as haptics, prosthetics, and neuromodulation—has remained comparatively stagnant, suggesting that the limiting factor is no longer hardware innovation alone, but the absence of a shared data substrate to support cumulative advances.

6. Broader Impact

Solving this bottleneck would not simply add another dataset to the field—it would change the way progress is made. A shared, multimodal nervous system data substrate would enable the emergence of reusable models for human–AI interaction, analogous to how pretrained models reshaped vision and language. Instead of rebuilding systems from scratch for each study, lab, or device, researchers could build on top of learned representations of the human body—its signals, responses, and adaptations—dramatically lowering the barrier to entry and accelerating iteration cycles.

In practical terms, this could unlock a new generation of assistive and augmentative technologies. Prosthetic control systems could generalize across users with far less calibration. Haptic interfaces in virtual and augmented reality could move beyond handcrafted effects toward perceptually grounded rendering. Neuromodulation therapies for pain, rehabilitation, or motor recovery could be informed by models trained across populations rather than tuned in isolation. Even modest gains in generalization would compound quickly in these domains, where current systems are brittle, slow to personalize, and difficult to scale.

Equally important, this shift would move the field from a pattern of isolated demonstrations to one of cumulative progress. Today, many results in computing interfaces (especially neural interfaces) are difficult to compare, reproduce, or extend because they depend on idiosyncratic setups and small datasets. A shared substrate would create a common reference point, allowing results to stack, models to improve incrementally, and insights to transfer across domains—from clinical neuroscience to HCI to robotics. This is the difference between a field that advances paper by paper and one that advances infrastructure-first.

Crucially, even a negative result would be highly valuable. If shared datasets and foundation models fail to produce meaningful generalization, this would provide concrete evidence that variability across individuals, contexts, or devices is not just a nuisance but a defining property of the system. That insight would justify a decisive pivot toward alternative paradigms, such as fully personalized models, adaptive closed-loop systems, or simulation-driven approaches. In other words, the proposed experiment does not risk ambiguity: it forces the field to learn whether it is missing scale and structure—or pursuing the wrong abstraction entirely.

In both success and failure, the outcome is the same: a re-envisioned map of the problem space and a path away from the current local optima. What is at stake is not just faster progress, but a transition toward a more principled, shared foundation for understanding and engineering human–AI systems that operate on and through the body.