
Large-Scale Visual Semantic Extraction
Samy Bengio
Pages 61-68

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 61...
... Such a tree will be different from semantic-only trees, such as WordNet, which do not take into account the visual appearance of concepts.

INTRODUCTION

The emergence of the Web as a tool for sharing information has caused a massive increase in the size of potential data sets available for machines to learn from.
From page 62...
... is very large and where even an algorithm linear in the number of classes can become computationally infeasible. We propose an algorithm for learning a tree structure over the labels in the previously proposed joint embedding space which, by optimizing the overall tree loss, achieves better accuracy than existing tree-labeling methods.
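To see where the speedup comes from: at test time such a label tree is traversed from the root, and only the children of the current node are scored, so the number of score evaluations grows with the depth of the tree rather than the number of classes. Below is a minimal sketch of that traversal; the node structure and the scoring callable are hypothetical stand-ins, as neither is specified in this excerpt.

def tree_label(image_embedding, root, score):
    """Greedily descend a label tree, following the best-scoring child.

    root:  a node with a .children list; leaves carry a .label
    score: callable (image_embedding, node) -> float
    Only the children along one root-to-leaf path are scored, so the
    cost is roughly logarithmic in the number of labels for a
    balanced tree, instead of linear.
    """
    node = root
    while node.children:                 # descend until a leaf is reached
        node = max(node.children, key=lambda c: score(image_embedding, c))
    return node.label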
From page 63...
... , largest first.

Image Labeling as a Learning-to-Rank Task

Labeling an image can be viewed as a ranking task where, given an image, one needs to order labels such that the top ones correspond to the image while the bottom ones are unrelated to it.
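As an illustration of this view, here is a minimal sketch of scoring and ranking all labels for one image with a joint embedding; the linear image map, the per-label embedding matrix, and the dimensions are hypothetical placeholders rather than the chapter's actual model.

import numpy as np

def rank_labels(image_features, img_proj, label_embeddings):
    """Return label indices ordered best-first for one image.

    image_features:   (d_img,) raw feature vector for the image
    img_proj:         (d_emb, d_img) linear map into the joint space
    label_embeddings: (n_labels, d_emb) one embedding row per label
    """
    img_emb = img_proj @ image_features      # map the image into the joint space
    scores = label_embeddings @ img_emb      # dot-product score for every label
    return np.argsort(-scores)               # highest-scoring labels first

# Tiny usage example with random data: 1,000 labels, 100-dim joint space.
rng = np.random.default_rng(0)
ranking = rank_labels(rng.normal(size=512),
                      rng.normal(size=(100, 512)),
                      rng.normal(size=(1000, 100)))
print(ranking[:5])   # the five labels the model would propose first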
From page 64...
... The resulting model is much faster to train and achieves much better performance at the top of the ranking.

Large-Scale Learning

We trained an embedding model with the WARP loss on a very large data set containing more than 10 million training images and more than 100,000 labels, where the labels correspond to queries issued on Google Image Search and the images attributed to each label were those often clicked for that query.
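The WARP (Weighted Approximate-Rank Pairwise) loss is not spelled out in this excerpt; the sketch below shows the sampling trick it is built on, as described in Weston et al. (2010): draw random negative labels until one violates the margin against the positive label, estimate the positive label's rank from the number of draws, and weight the pairwise hinge loss by that rank. The score array and parameter names here are hypothetical.

import numpy as np

def warp_weight(scores, pos, rng, margin=1.0):
    """Sample a violating negative for one (image, positive-label) pair.

    scores: (n_labels,) precomputed scores of the image against every label
    Returns (loss_weight, negative_index); the update then acts on the
    hinge loss  max(0, margin - scores[pos] + scores[neg]).
    """
    n_labels = len(scores)
    for n_draws in range(1, n_labels):
        neg = int(rng.integers(n_labels))
        if neg == pos:
            continue
        if scores[neg] > scores[pos] - margin:       # margin is violated
            est_rank = (n_labels - 1) // n_draws     # rank estimated from draw count
            weight = float(np.sum(1.0 / np.arange(1, est_rank + 1)))
            return weight, neg
    return 0.0, None    # no violator found: this pair contributes no gradient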
From page 65...
... [Table: four example target images and the ten nearest labels to each in the joint embedding space; the images themselves (a dolphin, Barack Obama, the Eiffel Tower, and an iPod) cannot be reproduced here.]

Target image (dolphin): delfini, orca, dolphin, mar, delfin, dauphin, whale, cancun, killer whale, sea world
Target image (Barack Obama): barrack obama, barack obama, barack hussein obama, barack obama, james marsden, jay z, obama, nelly, falco, barack
Target image (Eiffel Tower): eiffel tower, statue, eiffel, mole antoneliana, la tour eiffel, londra, cctv tower, big ben, calatrava, tokyo tower
Target image (iPod): ipod, ipod nano, nokia, i pod, nintendo ds, nintendo, lg, pc, nokia 7610, vino

Source: Adapted from Weston et al., 2010.

... compared to naive linear time approaches.
From page 66...
... We do so by computing the confusion matrix between all labels, in which we count the number of times our classifiers confuse class i with class j, and we use this matrix to apply spectral clustering (Ng et al., 2002).
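A minimal sketch of this clustering step, assuming the confusion counts have already been collected; scikit-learn's SpectralClustering accepts a precomputed affinity matrix, and the confusion matrix is symmetrized first because confusing class i with class j need not happen as often as the reverse.

import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_confusable_labels(confusion, n_clusters):
    """Group labels that the classifiers tend to confuse with one another.

    confusion: (n_labels, n_labels) array; entry (i, j) counts how often
               class i was predicted as class j
    Returns one integer cluster id per label.
    """
    affinity = confusion + confusion.T      # make the affinity symmetric
    np.fill_diagonal(affinity, 0)           # ignore self-confusions
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity="precomputed",
                               random_state=0)
    return model.fit_predict(affinity)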
From page 67...
... where each image can now be labeled either with its original labels or with any of the nodes of the tree that contain them. Moreover, whenever an internal node is selected as a positive label for a given image during training, we select a competing negative label from among its sibling nodes in the label tree, as this corresponds to how the tree is used at test time.
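A short sketch of that negative-sampling rule, assuming each tree node records its parent and children (the node structure is hypothetical):

import random

def sample_sibling_negative(pos_node):
    """Pick a sibling of the positive internal node as the negative label.

    Training against siblings mirrors test time, where the model must
    prefer the correct branch over the other children of the same parent.
    """
    if pos_node.parent is None:
        return None                          # the root has no siblings
    siblings = [c for c in pos_node.parent.children if c is not pos_node]
    return random.choice(siblings) if siblings else None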
From page 68...
... Weston, J., S. Bengio, and N. Usunier. 2010. Large scale image annotation: Learning to rank with joint word-image embeddings. Machine Learning 81(1):21-35.

