The first problem that arises when dealing with a collection of molecular graphs, such as those stored in a database, is duplication. The best approach to this task is to canonicalize the molecular graphs: the nodes are reordered into a unique canonical numbering, so that every possible permutation of the atoms of the same molecule yields the same representation. Once the canonicalized forms are stored in a database, checking whether an input molecule already exists becomes much cheaper. Instead of running a graph-isomorphism test against every stored entry, the input is canonicalized once and compared for exact equality, a lookup that an index can answer directly. Textual representations of molecules, such as absolute SMILES or InChI, can also be stored in databases and reduce the cost of searches in the same way. It is essential, though, to ensure that the molecular graphs are chemically valid, with the correct assignment of aromatic bond types and stereochemical descriptors, before canonicalization takes place. InChI strings in particular are an ideal tool for this task because, by definition, they must be produced by the single published InChI algorithm developed under IUPAC, and they were designed specifically to solve the problem of chemical identity.
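The duplicate check described above can be illustrated with a minimal sketch. It assumes a toy representation (hypothetical, not taken from any cheminformatics library) in which a molecule is a list of element symbols plus a list of bonds given as `(i, j, order)` tuples. The canonical form used here is the lexicographically smallest encoding over all atom orderings. This is correct but costs O(n!), so it only works for tiny graphs, and it ignores aromaticity perception and stereochemistry. Real toolkits use much faster canonical-labelling algorithms.

```python
from itertools import permutations

# Hypothetical toy representation, for illustration only:
#   atoms: element symbols indexed 0..n-1
#   bonds: (i, j, bond_order) tuples


def encode(atoms, bonds, order):
    """Encode the molecule with its atoms renumbered according to `order`.

    order[k] is the original index of the atom that receives new index k.
    """
    new_index = {old: new for new, old in enumerate(order)}
    labels = tuple(atoms[old] for old in order)
    edges = tuple(sorted(
        (min(new_index[i], new_index[j]), max(new_index[i], new_index[j]), bo)
        for i, j, bo in bonds
    ))
    return labels, edges


def canonical_form(atoms, bonds):
    """Smallest encoding over all atom permutations.

    Every numbering of the same molecular graph gives the same result, so
    the value can serve as a database key. Exponential cost: toy use only.
    """
    return min(encode(atoms, bonds, order)
               for order in permutations(range(len(atoms))))


class MoleculeStore:
    """Duplicate-free store keyed by canonical form (hypothetical helper)."""

    def __init__(self):
        self._by_key = {}  # canonical form -> name of first stored molecule

    def add(self, name, atoms, bonds):
        """Store a molecule; return the name of an existing duplicate, or None."""
        key = canonical_form(atoms, bonds)
        if key in self._by_key:
            return self._by_key[key]
        self._by_key[key] = name
        return None


if __name__ == "__main__":
    store = MoleculeStore()
    # Ethanol (heavy atoms only) numbered C-C-O ...
    print(store.add("ethanol", ["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]))
    # ... and the same graph numbered O-C-C is recognised as a duplicate.
    print(store.add("ethanol (renumbered)", ["O", "C", "C"], [(0, 1, 1), (1, 2, 1)]))
    # Dimethyl ether (C-O-C) has the same atoms but a different graph.
    print(store.add("dimethyl ether", ["C", "O", "C"], [(0, 1, 1), (1, 2, 1)]))
```

Run as a script, this prints `None`, `ethanol`, `None`. The only lookup work after canonicalization is a dictionary access, which is the saving the text describes. In practice the key would be a canonical string such as an InChI produced by a validated toolkit rather than this brute-force encoding.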
Another problem often encountered when dealing with a collection of molecular graphs is finding the molecules that are chemically similar to an input molecule. Such a comparison may involve finding molecules that:
(a) are supergraphs or subgraphs of the input molecule (substructure search),
(b) share a large part of their structure (maximum common subgraph),
(c) share many similar subgraphs (common fragments).
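Criteria (a) and (c) can be sketched on the same hypothetical toy representation: element symbols plus `(i, j, order)` bonds. Substructure search is shown as plain backtracking subgraph matching, in which query atoms map to distinct target atoms of the same element and every query bond must map onto a target bond of the same order. For common fragments, the sketch assumes a deliberately simple scheme in which each bond is a fragment such as `('C', 1, 'O')`, compared with a multiset Tanimoto coefficient. Both the fragment scheme and the function names are illustrative assumptions, and real systems use richer fragments, screening fingerprints, and optimized matchers.

```python
from collections import Counter


def adjacency(atoms, bonds):
    """Map each atom index to {neighbour index: bond order}."""
    adj = {i: {} for i in range(len(atoms))}
    for i, j, bo in bonds:
        adj[i][j] = bo
        adj[j][i] = bo
    return adj


def is_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """True if the query graph maps into the target graph (criterion (a)).

    Backtracking search, exponential in the worst case.
    """
    q_adj, t_adj = adjacency(q_atoms, q_bonds), adjacency(t_atoms, t_bonds)
    mapping = {}  # query atom index -> target atom index

    def extend(k):
        if k == len(q_atoms):
            return True  # every query atom has been placed
        used = set(mapping.values())
        for t in range(len(t_atoms)):
            if t in used or t_atoms[t] != q_atoms[k]:
                continue
            # Bonds from k to already-mapped query atoms must exist in the target.
            if all(t_adj[t].get(mapping[n]) == bo
                   for n, bo in q_adj[k].items() if n in mapping):
                mapping[k] = t
                if extend(k + 1):
                    return True
                del mapping[k]  # undo and try the next candidate
        return False

    return extend(0)


def bond_fragments(atoms, bonds):
    """Count bond fragments such as ('C', 1, 'O') (hypothetical fragment scheme)."""
    frags = Counter()
    for i, j, bo in bonds:
        a, b = sorted((atoms[i], atoms[j]))
        frags[(a, bo, b)] += 1
    return frags


def fragment_similarity(frags_a, frags_b):
    """Multiset Tanimoto: shared fragment counts over combined counts (criterion (c))."""
    union = sum((frags_a | frags_b).values())  # per-fragment maximum counts
    if union == 0:
        return 0.0  # convention used here for two molecules without bonds
    return sum((frags_a & frags_b).values()) / union  # per-fragment minimum counts


if __name__ == "__main__":
    ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
    propanol = (["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
    ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
    print(is_substructure(*ethanol, *propanol))  # True
    print(is_substructure(*ether, *propanol))    # False
    print(round(fragment_similarity(bond_fragments(*ethanol),
                                    bond_fragments(*propanol)), 3))  # 0.667
```

Ethanol is found inside propan-1-ol, while dimethyl ether is not, because no oxygen in propan-1-ol has two carbon neighbours. The fragment score is 2/3: the two molecules share one C-C and one C-O bond, out of a combined maximum of two C-C and one C-O. Criterion (b), the maximum common subgraph, needs a considerably more involved search and is not sketched here.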
A related class of problems is to ...