5. COMPUTATIONAL EXPERIENCE
Given a set of k training samples (x, y) ∈ T (for example, each x a sentence and each y a
bit string of label values), the goal is to find a set of weights w to maximize an objective function
with two parts: one term R(w) incorporates constraints and minimizes errors on the training set,
and one term Ω(w) favors simple weighting schemes to avoid overfitting to T. In the training
phase the learning algorithm iterates to modify the weight set w incrementally. At each iteration a
query is sent to the quantum annealing hardware to find an optimal solution to a QUBO instance
(see Section 3.2.1) and thereby improve w.
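The training loop just described can be sketched in a few lines. The sketch below is an illustration, not the authors' implementation: `solve_qubo` is a brute-force stand-in for the quantum annealing hardware, and `Q_of_w` and `update_w` are hypothetical callbacks for building the QUBO from the current weights and applying the returned optimum.

```python
import itertools

import numpy as np

def solve_qubo(Q):
    """Brute-force stand-in for the annealer: return the bit vector z
    in {0,1}^n minimizing z^T Q z (feasible only for tiny n)."""
    n = Q.shape[0]
    best_z, best_val = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        z = np.array(bits)
        val = float(z @ Q @ z)
        if val < best_val:
            best_z, best_val = z, val
    return best_z

def train(Q_of_w, update_w, w, iterations=10):
    """Skeleton of the iterative training phase: each iteration builds a
    QUBO from the current weights, queries the (simulated) annealer for
    an optimal solution, and uses it to improve w incrementally."""
    for _ in range(iterations):
        z = solve_qubo(Q_of_w(w))   # query sent to the annealing hardware
        w = update_w(w, z)          # incremental modification of the weights
    return w
```

In the real system the call to `solve_qubo` is replaced by a query to the D-Wave hardware; everything else about the loop structure is a plausible reading of the description above.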
The authors point out that theirs is an unusual objective function because it has quadratic
terms, which allows it to account for pairwise correlations between elements of the test set. Their
experiments test whether the introduction of a quadratic objective function yields better classifiers.
They compare their approach to an alternative method involving support vector machines (SVMs)
that does not incorporate the quadratic interaction term.
Their evaluation of the classifier looks at two input classes: one with labels based on solu-
tions to random satisfiability problems, and one from a standard test set in image recognition.
Maximum instance sizes are |x| = 20, |y| = 34 in the random problems and |x| = 294, |y| = 6
for the image data. The quality of a learning algorithm is evaluated in terms of relative Hamming
error, which measures the distance between true assignments and those returned by the classifier on
the evaluation set R. On these tests their learning algorithm produced classifiers with 28 and 10
percent better relative Hamming error than those from the SVM approach.
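A small helper makes the evaluation measure concrete. The text does not give the exact normalization, so the sketch below assumes the natural definition: the fraction of label bits on which the classifier disagrees with the true assignment, averaged over the evaluation set.

```python
def relative_hamming_error(y_true, y_pred):
    """Fraction of bits where the predicted label string disagrees with
    the true one (assumed normalization: mismatches / total bits)."""
    assert len(y_true) == len(y_pred)
    mismatches = sum(a != b for a, b in zip(y_true, y_pred))
    return mismatches / len(y_true)

def mean_relative_hamming_error(pairs):
    """Average the per-example error over an evaluation set of
    (true, predicted) label-string pairs."""
    errors = [relative_hamming_error(t, p) for t, p in pairs]
    return sum(errors) / len(errors)
```

For example, a prediction that flips one bit of a four-bit label has relative Hamming error 0.25.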
More discussion of this use of D-Wave quantum annealers to train classifiers may be found
in Nevin et al., Nevin et al., and Nevin et al.
5.1.2 FINDING RAMSEY NUMBERS
Suppose you have a guest list for a party of N people. For given m and n, you want to determine
whether there is a subgroup of size m who are all mutually acquainted, forming a clique, or a
subgroup of size n who are all strangers, forming an independent set. (If so, your party will be a
flop because large cliques and large groups of strangers do not mingle well.)
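The party question above is a brute-force graph search. The sketch below (an illustration; `adjacent` is a hypothetical predicate saying whether two guests know each other) checks every candidate subgroup directly, which is exactly the combinatorial explosion that makes Ramsey computations hard.

```python
from itertools import combinations

def has_clique_or_independent_set(adjacent, guests, m, n):
    """Return True if some m guests are all mutually acquainted (a clique)
    or some n guests are all mutual strangers (an independent set)."""
    for group in combinations(guests, m):
        if all(adjacent(u, v) for u, v in combinations(group, 2)):
            return True  # found a clique of size m
    for group in combinations(guests, n):
        if all(not adjacent(u, v) for u, v in combinations(group, 2)):
            return True  # found an independent set of size n
    return False
```

As a check, a 5-guest party whose acquaintances form a 5-cycle has neither 3 mutual acquaintances nor 3 mutual strangers, while adding a sixth guest always produces one or the other, reflecting the known value R(3, 3) = 6.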
A Ramsey number R(m, n) is a threshold value for N such that if N ≥ R(m, n), then every
party of size N will contain either m mutual acquaintances or n mutual strangers. The
computation of R(m, n) is a notoriously difficult problem; for example, with n, m > 3 only nine
Ramsey numbers are known. Besides party planning, the problem has important applications in
number theory, combinatorics, and complexity theory (see Roberts ).
The complexity of this problem is unknown: Schaefer  shows that a more general ver-
sion of the decision problem (for general subgraphs) is coNP^NP-complete, that is, complete for
the second level of the polynomial hierarchy.
Gaitan and Clark  show that the problem of computing Ramsey numbers belongs in the
quantum complexity class QMA (see Section 2.4). They describe a quantum annealing algorithm
to compute Ramsey numbers, as follows. The first step is to cast the computation of R(m, n) as
an optimization problem. The party list can be represented by a graph G on N vertices and M