1. Introduction
Pain and pain chronification are incompletely understood and unresolved medical problems
that continue to have a high prevalence.14 It is widely accepted that pain is a complex
phenomenon.2,32,72 Contemporary methods of computational science51 can use complex clinical
and experimental data to better understand the complexity of pain. Among data science
techniques, machine learning refers to a set of methods (Fig. 1) that can automatically
detect patterns in data and then use the uncovered patterns to predict or classify future
data, to detect structures such as subgroups in the data, or to extract information from
the data suitable for deriving new knowledge.11,43 Together with (bio)statistics, artificial
intelligence and machine learning aim at learning from data.
Figure 1.
Overview and classification of machine learning methods, selected for their use in
a pain research context. The figure structures machine learning according to its main
uses, comprising (1) classification tasks, used for example to obtain a clinical diagnosis,
(2) data structure detection, including the identification of clusters, and (3) knowledge
discovery in experimental or clinical data or in large, hierarchically structured databases
such as ontologies. Short descriptions of key methods are provided in Box 1. The icons
at the right of each main application field symbolize typical respective machine learning
methods, that is, from top to bottom: (1) an SVM, where the grouping (classification)
is obtained by placing a border (hyperplane) between classes (subsymbolic classifier),
(2) a decision tree, where the classification is obtained through hierarchical rules
(symbolic classifier), and (3) an emergent self-organizing map, an unsupervised machine
learning method able to find interesting structure, such as clusters, in high-dimensional
data. In this figure, the map was colored using a geographical analogy, with
brown (up to snow-covered) heights and green valleys, on which clusters can be separated
(from Ref. 36). Finally, (4) a directed acyclic graph is drawn, depicting the polyhierarchy
of, for example, the functions of pain-relevant genes (from Ref. 69). CART, classification
and regression tree; DAG, directed acyclic graph; DT, decision tree; ESOM, emergent
self-organizing map; HMM, hidden Markov models; k-NN, k nearest neighbor; LVQ, learning
vector quantization; MLP, multilayer perceptron; PCA, principal component analysis;
SVM, support vector machine.
Although statistics can be regarded as a branch of mathematics, artificial intelligence
and machine learning have developed from computer science (Ref. 58; see also https://en.wikipedia.org/wiki/Artificial_intelligence).
The initial definition of artificial intelligence originates from Alan Turing, who
proposed an experiment in which 2 players, who can be either human or artificial, try
to convince a human third player that they, too, are human.68
The test of artificial intelligence is passed if the third player cannot tell who
is the machine. Important steps in the development of machine learning were the creation
of the first computer learning program, which played checkers,54 and the first neural
network, called the perceptron.53
Statistics uses mathematical equations to model probability relationships between
data variables, whereas machine learning learns from data without requiring previous
knowledge. It aims at the optimization and performance of an algorithm rather than at
the analysis of the probabilities of observations given a known underlying data
distribution. Nevertheless, machine learning and statistical techniques work in concert
for pattern recognition, knowledge discovery, and data mining, and partly share the same
methods, such as regression, which is widely used in statistics but is also considered
a classification method in machine learning (Fig. 1).
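As a hedged illustration of regression used as a classification method in the machine learning sense, the following minimal Python sketch fits a logistic regression to synthetic data and evaluates it as a classifier; the data, the feature meanings, and the scikit-learn implementation are assumptions for illustration only and do not come from any of the reviewed studies.

```python
# Minimal sketch: logistic regression, a statistical workhorse, used and
# evaluated as a classifier in the machine learning sense. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # e.g., three pain-related covariates (hypothetical)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)       # "regression" model ...
print("classification accuracy:", clf.score(X_test, y_test))  # ... scored as a classifier
```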
In the present research context, when provided with pain-related data, machine learning
methods can learn a mapping of complex features to a known class, that is,
to predict a pain phenotype class from a complex pattern of acquired parameters. After
the machine has learned the prediction of a pain-related phenotype, the algorithm
can subsequently be used on new data from which the class membership of a novel yet
unclassified subject can be identified. However, machine learning methods can also
be used for pattern recognition in complex pain-related data to reveal traces of an
underlying molecular background or for knowledge discovery in big data in a drug discovery
or repurposing context. The increasing use of contemporary methods of computational
science is reflected in the rising number of reports using machine learning for pain
research (Table 1). This review is focused on machine-learned technologies applied
to general pain research that allow one to analyze and predict pain phenotypes and
to obtain knowledge from experimental and clinical pain-related data.
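The generic supervised workflow just described can be sketched as follows: learn a mapping from acquired parameters to a known pain phenotype class, then assign a new, unclassified subject to a class. The data, the toy phenotype labels, and the random forest implementation below are synthetic, illustrative assumptions rather than the method of any reviewed study.

```python
# Sketch of the generic supervised workflow: learn a mapping from acquired
# parameters to a known pain phenotype class, then classify a new subject.
# All data and labels are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_subjects, n_features = 120, 8                  # e.g., QST and demographic parameters (hypothetical)
X = rng.normal(size=(n_subjects, n_features))
y = (X[:, 0] - X[:, 3] > 0).astype(int)          # toy labels: 0 = "low", 1 = "high" pain sensitivity

model = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

new_subject = rng.normal(size=(1, n_features))   # a novel, yet unclassified case
print("predicted phenotype class:", model.predict(new_subject)[0])
```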
Table 1
Reports of pain research in which machine-learned methods were used, ordered by year
of publication.
2. Pain research involving machine learning
A literature search was conducted using PubMed at https://www.ncbi.nlm.nih.gov/pubmed
on July 22, 2017 for “([machine-learn*] OR machine learn*) AND pain.” One hundred
ten results published between 2002 and 2017, with an increasing number of publications
over time, were obtained (Table 1), and a few more reports were found through reference
tracking. After elimination of editorials, reviews, and repeated reports of the same
machine-learned analysis, 88 original reports of the use of machine learning in a pain
context were identified. Twenty-two articles that regarded pain only as a symptom of
interest in another context, such as chest pain as a diagnostic criterion for pneumonia10
or coronary syndromes,4 or phantom limb pain as an indicator of prosthesis functioning,1
were excluded, as were 14 reports about neuroimaging of pain, a topic that has been
reviewed separately.26,39
This resulted in 52 reports that were analyzed for the use of several different methods
of machine learning in pain research (Table 1). For a short description of the mentioned
machine learning methods, please refer to Box 1.
Text box 1.
Definitions and descriptions of key methods of machine learning most frequently used
so far in the pain research context (Table 1 and Fig. 1). For a detailed description
of these and further machine learning methods, see, for example, Refs. 11 and 43.
(1) Classification solves the problem of identifying to which category (diagnosis)
a new case belongs, based on a training set of data containing cases whose category
is known.
(2) A Bayes classifier minimizes the probability of misclassification, given the prerequisites
of the theorem of Bayes, that is, distributions and (conditional) probabilities.
(3) Decision tree methods output a tree-structured graph consisting of variables (features)
in the decision nodes (points of split) and conditions in the edges.
(4) Random forests use a multitude of simple decision trees usually based on a random
selection of a small set of features.
(5) Projection methods represent the data space in a lower-dimensional space with
the aim of conserving important structural properties.
(6) Focusing projection methods are learned using a function of the neighborhood of
points in the data space.
(7) K nearest-neighbors (k-NN) methods use k (classified) prototypes to which new
cases are assigned depending on their distances to all prototypes.
(8) Artificial neural networks (ANNs) are computer programs that operate a multitude
of simple processing elements (neurons), which are connected to each other by (weighted)
synapses.
(9) Multilayer perceptrons are ANNs in which the connections are structured in layers.
The neurons are of the McCulloch-Pitts type, that is, nonlinear decisions using hyperplanes
are made.
(10) Support vector machines are multilayer perceptrons with 1 layer of McCulloch-Pitts–type
neurons, where the input data are projected into a (possibly infinite-dimensional)
vector space in which (1) scalar products are easily computable and (2) the decision
surface can be more complex than simple hyperplanes (see the code sketch after this box).
(11) A self-organizing map (SOM) is an unsupervised learning ANN producing a 2-dimensional
discretized representation of the data space through a focusing projection.
(12) Emergent SOMs are SOMs able to show emergent structures in the form of (U-, P-,
and U*-) matrix representations, which display structural features of the data space
using a geographical map metaphor.
(13) Knowledge in data science is a symbolic representation of taxonomic categorizations
and decisions using an algorithmically treatable (ie, decidable or provable) part of natural
human language, such as (a subset of) predicate logic, with the requirement to generalize
to unseen data.
(14) Ontologies use data science knowledge in the form of a naming and definition
of the terms and semantic interrelationships of the entities that really or fundamentally
exist for a particular domain of discourse.
(15) Ontology directed acyclic graphs are graph-based representations of a polyhierarchy
of terms of an ontology.
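As a hedged illustration of two of the classifiers defined above (items 7 and 10), the following Python sketch trains a k-NN and an SVM on the same synthetic toy data. The data, the parameter choices, and the scikit-learn implementation are assumptions made for illustration only, not taken from any of the reviewed studies.

```python
# Sketch comparing two classifiers from Box 1: k-NN (item 7) and an SVM with
# a nonlinear kernel (item 10), applied to the same synthetic data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = (X[:, 0] ** 2 + X[:, 1] > 1).astype(int)     # toy nonlinear class boundary

for name, clf in [("k-NN (k=5)", KNeighborsClassifier(n_neighbors=5)),
                  ("SVM (RBF kernel)", SVC(kernel="rbf"))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy = {acc:.2f}")
```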
3. Pain phenotype prediction from complex case data
Machine learning addresses the so-called data space, including an input space X comprising
vectors x_i = ⟨x_{i,1}, …, x_{i,d}⟩ with d > 0 different parameters (variables and
features24) acquired from n > 0 cases. In supervised machine learning, algorithms enable
a mapping of the input parameters x_i ∈ X to the output classes y_i ∈ Y in the data
space. The information consisting of several biomedical parameters is used to derive a
mapping that allows assigning future cases to the correct class (prediction and generalization;
Ref. 11), for example, a pain phenotype group or a clinical diagnosis. The main
types of classifiers provided by supervised machine learning are symbolic45 or
subsymbolic63 classifiers.
In symbolic classifiers, the way a classification is obtained can be interpreted
by a domain expert as a combination of conditions on the features. For example, a
symbolic45 classifier composed of a decision tree was created to predict patient-controlled
analgesia consumption from approximately 30 acquired features, including demographic (age,
sex, and weight), biomedical (eg, blood pressure, diabetes, and arterial hypertension),
surgery-related (eg, type of surgery, duration, and details of anesthesia), and analgesic
therapy-related (eg, consumption of analgesics before the surgery and dose demands during
the first 24 hours after surgery) parameters.27
Importantly, for each of the parameters, the value range underlying the decision with
respect to analgesic demands remained accessible (see Tables 1 and 11 in Ref. 27).
In decision trees, the features are also weighted according to their importance (most
important first). Another example of a symbolic classifier is a Bayesian diagnostic
tool created from demographic-, pain-, and surgery-related parameters for the prediction
of persistence of pain in a breast cancer surgery context. It provided a sensitivity
and specificity of 33% and 95%, respectively.61
Again, the classification procedure was accessible to direct interpretation through
the Bayesian decision limits calculated for the single parameters.
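To illustrate the interpretability of symbolic classifiers described above, the following minimal sketch fits a small decision tree and prints its learned split rules so that a domain expert could read them. It is not the model of Ref. 27; the feature names, the data, and the scikit-learn implementation are invented placeholders.

```python
# Sketch (not Ref. 27's actual model): a small decision tree whose learned
# rules remain human-readable, illustrating the "symbolic classifier" idea.
# Feature names and data are hypothetical placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
features = ["age", "weight_kg", "surgery_duration_min", "preop_analgesic_use"]
X = rng.normal(size=(300, 4))
y = (X[:, 2] + 0.8 * X[:, 3] > 0.5).astype(int)   # toy "high analgesia demand" label

tree = DecisionTreeClassifier(max_depth=3, random_state=3).fit(X, y)
print(export_text(tree, feature_names=features))   # readable split conditions per node
```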
In subsymbolic classifiers, better performance of a machine-learned algorithm is
sought by waiving the possibility of understanding the details, that is, it is impossible
to obtain biomedical explanations for the functioning of the algorithm. For example,
random forests use hundreds or thousands of simple decision trees that escape interpretation;
the classification is obtained through the complete set of trees, that is, the "forest."6
Such a classifier was created from various stool-based markers to diagnose a bladder
pain syndrome.5 Similarly, a projection method for high-dimensional data, specifically
minimum curvilinear embedding, was applied to complex proteomics data to separate patients
with neuropathic pain from controls and to further distinguish different types of
neuropathy, such as neuropathy associated with amyotrophic lateral sclerosis and peripheral
neuropathy with or without pain.8
Machine-learned algorithms were further applied to predict thermal pain sensitivity
from bioresponses acquired through electromyography, skin conductance level, and
electrocardiography.23 Specifically, using support vector machines (SVMs56), individual
pain threshold and tolerance to thermal stimulation could be predicted from the noninvasive
measurements at accuracies of >91% and 79%, respectively.23 This aimed at obtaining
information about pain in subjects with verbal and/or cognitive impairments in whom
queries of pain such as standard visual rating scales cannot be applied. Moreover,
predicting which patients required high opioid doses for analgesia, based on a
next-generation sequencing–derived opioid receptor genotype, was achieved with a
subsymbolic classifier based on k-nearest neighbor calculations.34
Further subsymbolic classifiers have been implemented as neural networks and regression
models. So-called elastic net regression models and SVMs predicted pain scores measured
between 40 and 120 minutes after the administration of 10 mg oxycodone from interpolated
pain score values obtained before drug administration.46 The elastic net regression model
provided pain scores that had a correlation coefficient of 0.6 with the observed scores.
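As a hedged sketch in the spirit of the elastic net example above (not a reimplementation of Ref. 46), the following code fits an elastic net regression to synthetic "pre-dose" scores and reports the correlation between predicted and observed values; all data, feature meanings, and parameter settings are assumptions.

```python
# Sketch of elastic net regression predicting a continuous pain score and
# reporting the correlation with observed scores. Data are synthetic.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))                        # e.g., interpolated pre-dose scores (hypothetical)
y = X @ np.array([0.6, 0.3, 0.0, 0.0, -0.4, 0.1]) + rng.normal(scale=1.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_tr, y_tr)
r = np.corrcoef(model.predict(X_te), y_te)[0, 1]     # predicted vs. observed correlation
print(f"correlation of predicted vs. observed scores: {r:.2f}")
```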
4. Structure detection in complex pain-related data
Detecting structures in the d-dimensional data space that point at patterns or subgroups
accessible to biomedical interpretation is a typical
application of unsupervised machine learning. In contrast to the supervised learning
setting, the class information Y is absent or ignored; the task is to find “interesting”
data structures that can be interpreted as subgroups (clusters and strata) in the
studied cases or made accessible for biomedical interpretation by domain experts,
including the discovery of new knowledge in data-driven research approaches.
For example, in a data matrix comprising several quantitative sensory testing (QST)
parameters acquired from healthy subjects, a pattern was detected allowing one to
identify a subgroup of healthy subjects who reacted to hypersensitization with topical
capsaicin with a shift in QST parameters that resembled the parameter pattern observed
in patients with neuropathic pain.35
Similarly, in a set of pain phenotype data comprising responses to experimental heat,
cold, mechanical, and electrical pain stimuli applied in 125 healthy subjects, structures
were detected using unsupervised machine learning implemented as emergent self-organizing
maps.71
These data structures could be associated with a complex genotype composed of 30 reportedly
pain-relevant variants in 10 genes, which was able to correctly identify 80% of the
subjects as belonging to an extreme pain phenotype in an independent and prospectively
assessed cohort of 89 other subjects.38
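The studies above used emergent self-organizing maps; as a simpler, hedged stand-in for unsupervised structure detection, the following sketch applies k-means clustering to a synthetic QST-like data matrix containing a hidden "sensitized" subgroup. The data, the scikit-learn implementation, and the choice of k-means instead of an ESOM are illustrative assumptions.

```python
# Sketch of unsupervised structure detection: find subgroup structure in a
# synthetic QST-like matrix without using any class labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
group_a = rng.normal(loc=0.0, size=(60, 10))     # e.g., 10 QST parameters per subject
group_b = rng.normal(loc=1.5, size=(40, 10))     # a shifted, "sensitized" subgroup
X = StandardScaler().fit_transform(np.vstack([group_a, group_b]))

labels = KMeans(n_clusters=2, n_init=10, random_state=5).fit_predict(X)
print("subjects assigned to each cluster:", np.bincount(labels))
```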
5. Knowledge discovery and exploration of pain-related data
Machine learning methods can be used to explore data sets by reversing the analytical
focus of classifier building and pattern detection. Supervised machine learning methods
are suitable for data exploration under the assumption that if a biomedical parameter
qualifies for inclusion in a classifier, then it is probably important for the addressed pain-related
problem. In contrast to classic statistical methods, where knowledge or at least presumptions
about the distributions and/or functional dependencies of the data are necessary,
machine learning methods allow for data-driven research approaches. Hence, techniques
of feature selection, which are common in machine learning, enable one to identify
relevant modulators of pain-related outcomes in data-driven and hypothesis-free explorative
research approaches. For example, a machine-learned analysis identified, among hundreds
of biomedical parameters, demographic-, psychological-, and pain-related parameters
as the most relevant for explaining the persistence of pain in women who underwent
breast cancer surgery.61
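As a hedged sketch of the feature-selection idea (not the analysis of Ref. 61), the following code ranks synthetic candidate parameters by their importance in a random forest classifier and keeps the most relevant ones; the data, parameter counts, and the scikit-learn implementation are assumptions.

```python
# Sketch of data-driven feature selection: rank many candidate parameters by
# their contribution to a learned classifier. Data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
n_subjects, n_params = 150, 50
X = rng.normal(size=(n_subjects, n_params))
y = (1.2 * X[:, 7] - 0.9 * X[:, 23] > 0).astype(int)   # only two parameters truly matter

forest = RandomForestClassifier(n_estimators=300, random_state=6).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:5]
print("most relevant parameter indices:", top)          # should include 7 and 23
```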
Moreover, unsupervised machine learning methods can be used to assess, at a whole-study
level, whether the acquired biomedical parameters demonstrate the efficacy of a treatment
applied during a research project. The rationale is to detect data structures that
are congruent with a known preclassification such as the presence of a modulator of
the pain phenotype. For example, after treatment of 82 subjects with local UV-B irradiation
or capsaicin application and assessing the pain phenotype using 10 different QST parameters,
a 246 × 10-sized data matrix was obtained in a human experimental pain study.36
Using unsupervised machine learning implemented as emergent self-organizing maps,71
data structures were detected that coincided with the known applied treatments, indicating
that a modulation of the complex pain phenotype had been obtained.36
A machine learning algorithm consisting of a classification and regression tree analysis
was applied to 8034 independent observations of baseline thermal nociceptive sensitivity
in mice.9
The analysis identified the mouse genotype as predictive of the pain phenotype; however,
it also revealed that the experimenter performing the test and additional laboratory
factors, including season/humidity, cage density, time of day, sex, and within-cage
order of testing, modulated the pain phenotypes.9
Finally, natural language processing methods,73 which combine linguistics with computer
science to analyze human language in speech or written text, were used to extract signs
from clinical notes, such as the occurrence of terms in a document, for example, keywords
that hint at a clinical incident.48 The prediction accuracy of this method for the
patient's pain level was reported to be better than 99%.
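As a deliberately simplified, hedged sketch of keyword-based extraction from clinical notes (not the pipeline of Ref. 48), the following code flags pain-related terms in free text; the keyword list and example notes are invented.

```python
# Greatly simplified sketch of keyword spotting in clinical notes.
import re

pain_terms = ["pain", "ache", "burning", "tender"]   # assumed, illustrative keyword list
notes = [
    "Patient reports burning pain in the left foot since yesterday.",
    "No complaints; wound healing well.",
]

pattern = re.compile("|".join(pain_terms), flags=re.IGNORECASE)
for note in notes:
    hits = pattern.findall(note)                     # pain-related terms found in the note
    print(f"{len(hits)} pain-related term(s):", note)
```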
6. Limitations of machine learning in pain research
Machine learning is vulnerable to overfitting and may end up describing noise or
irrelevant relationships rather than the true relationship between features and classes.
In that case, only the actual data on which the mapping has been learned are successfully
classified, but the algorithm fails to classify new data. This can be addressed by
building the classifier on a training data set and testing its performance on a test
data set obtained in a separate experiment or through splitting the available data,
and/or by cross-validation using data subsets randomly resampled from the
original data set. Furthermore, machine learning may be fooled by data sets containing
dominant but irrelevant features. A classic example is the training of a neural
network to recognize camouflaged tanks hidden in trees.13 The network was apparently
successfully trained with a set of photographs of tanks in trees and of trees without
tanks. However, on a new set of photographs of trees with or without tanks hidden among
them, the neural network failed. It turned out that in the training set, photographs of
camouflaged tanks had been taken on cloudy days, whereas photographs of trees without
tanks had been taken on sunny days. The neural network had learned to recognize the
weather rather than to distinguish tanks among trees. In the new set of photographs,
forests with and without tanks had been photographed in the same weather; hence, a
neural network merely able to distinguish the weather was unable to identify tanks.
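The overfitting problem and the cross-validation remedy described above can be illustrated with a hedged sketch: a flexible classifier trained on pure noise reaches near-perfect training accuracy but only chance-level cross-validated accuracy. The data and the scikit-learn implementation are illustrative assumptions.

```python
# Sketch of the overfitting check: on pure-noise data, a flexible classifier
# achieves high training accuracy but chance-level cross-validated accuracy,
# revealing that only noise was learned.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 20))               # features unrelated to the labels
y = rng.integers(0, 2, size=100)             # random class labels

clf = DecisionTreeClassifier(random_state=7).fit(X, y)
print("training accuracy:        ", clf.score(X, y))                          # close to 1.0
print("cross-validated accuracy: ", cross_val_score(clf, X, y, cv=5).mean())  # near 0.5
```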
Furthermore, applications of machine learning in pain research may be limited by the
availability and quality of data; they depend on the maintenance of knowledge bases
or on the success of enrolling the necessary large number of subjects in clinical
studies. The latter has become easier thanks to the funding of concerted large-scale
pain research projects.33
However, even the analysis of apparently large data sets can quickly be confronted
with small-sample problems when data structure detection results in many subgroups
of small sizes. Then, the rather typical setting of many more features than cases
poses challenges to valid data analysis. Possibly, generative machine learning methods17
are able to reduce this problem. Such models range from Gaussian mixture models, as
a simple form of a generative model, up to more complex approaches such as generative
adversarial networks,20 generative restricted Boltzmann machines,62 or generative
emergent self-organizing neuronal networks.70
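As a hedged sketch of the simplest generative approach mentioned above, the following code fits a Gaussian mixture model to a small synthetic subgroup and samples additional synthetic cases; it is an illustration under assumed data, not a validated remedy for small-sample problems.

```python
# Sketch of a simple generative model: fit a Gaussian mixture to a small
# subgroup and sample additional synthetic cases. Data are synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
small_subgroup = rng.normal(loc=2.0, scale=0.5, size=(15, 4))   # few cases, 4 features

gmm = GaussianMixture(n_components=1, random_state=8).fit(small_subgroup)
synthetic, _ = gmm.sample(n_samples=50)                         # 50 generated cases
print("synthetic sample mean per feature:", synthetic.mean(axis=0).round(2))
```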
7. Conclusions
The emerging discipline of computational pain research provides contemporary tools
to understand pain. This discipline uses computer-based processing of complex pain-related
data and relies on “intelligent” learning algorithms. By extracting information from
complex pain-related data and generating knowledge from this information, pain research
will be facilitated. Therefore, machine learning has the ability to profoundly influence
the study and treatment of pain. Indeed, the application of machine learning to pain
research–related nonimaging problems has been mentioned in publications in scientific
journals since 2002 (Table 1). Among machine learning methods,11,43 a subset has so far
been applied to pain research–related problems (Fig. 1), with SVMs, regression models,
and several kinds of neural networks being those most frequently mentioned
in the pain literature. Machine learning receives increasing general interest and
appears to penetrate many parts of daily life and natural sciences. This tendency
is likely to extend to pain research. The present review aims to acquaint pain domain
experts with the methods and current applications of machine learning in pain research,
possibly raising awareness of these methods in current and future projects.
Conflict of interest statement
This work has been funded by the European Union Seventh Framework Programme (FP7/2007-2013)
under grant agreement no. 602919 (“GLORIA”, J.L.) and by the Landesoffensive zur Entwicklung
wissenschaftlich-ökonomischer Exzellenz (LOEWE), LOEWE-Zentrum für Translationale
Medizin und Pharmakologie (J.L.).
The authors have declared that no further conflicts of interest exist.