A successful image captioning system needs a way to translate a given image into a sequence of words. To extract relevant features from images, we can leverage a deep convolutional neural network (DCNN) and, by coupling it with a recurrent sequence model such as an RNN or LSTM, build a hybrid generative model that generates a sequence of words as a caption for a given source image.
Thus, conceptually, the idea is to build a single hybrid model that takes a source image I as input and is trained to maximize the likelihood P(S|I), where S is the target output sequence of words, represented as S = {S1, S2, ..., Sn}, with each word Sw drawn from a given dictionary, which forms our vocabulary. ...
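Since a caption is generated one word at a time, the likelihood P(S|I) factorizes over time steps via the chain rule: log P(S|I) = Σ_t log P(St | I, S1, ..., St-1), and training maximizes this sum. As a minimal sketch of that objective, the snippet below computes the log-likelihood of a caption from hypothetical per-step word probabilities, which in practice a trained CNN+LSTM decoder would emit at each step (the probability values here are illustrative, not from a real model):

```python
import math

def sequence_log_likelihood(step_probs):
    """Log-likelihood of a caption given an image:
    log P(S|I) = sum over t of log P(S_t | I, S_1, ..., S_{t-1}).

    step_probs: hypothetical per-word probabilities that a trained
    CNN+LSTM decoder would assign to each caption word in turn.
    """
    return sum(math.log(p) for p in step_probs)

# Toy example: a four-word caption whose words the model scores highly.
probs = [0.9, 0.8, 0.7, 0.95]
print(sequence_log_likelihood(probs))
```

Because log is monotonic, maximizing this sum of per-step log-probabilities is equivalent to maximizing P(S|I) itself, and it is numerically far more stable than multiplying many small probabilities together.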