arXiv Paper Daily: Wed, 22 Jan 2020

Neural and Evolutionary Computing

DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm

Hao Xu , Haibin Chang , Dongxiao Zhang Subjects : Neural and Evolutionary Computing (cs.NE) ; Machine Learning (cs.LG)

Data-driven methods have recently been developed to discover underlying

partial differential equations (PDEs) of physical problems. However, for these

methods, a complete candidate library of potential terms in a PDE are usually

required. To overcome this limitation, we propose a novel framework combining

deep learning and genetic algorithm, called DLGA-PDE, for discovering PDEs. In

the proposed framework, a deep neural network that is trained with available

data of a physical problem is utilized to generate meta-data and calculate

derivatives, and the genetic algorithm is then employed to discover the

underlying PDE. Owing to the merits of the genetic algorithm, such as mutation

and crossover, DLGA-PDE can work with an incomplete candidate library. The

proposed DLGA-PDE is tested for discovery of the Korteweg-de Vries (KdV)

equation, the Burgers equation, the wave equation, and the Chaffee-Infante

equation, respectively, for proof-of-concept. Satisfactory results are obtained

without the need for a complete candidate library, even in the presence of

noisy and limited data.

Multi-factorial Optimization for Large-scale Virtual Machine Placement in Cloud Computing

Zhengping Liang , Jian Zhang , Liang Feng , Zexuan Zhu Subjects : Neural and Evolutionary Computing (cs.NE)

The placement scheme of virtual machines (VMs) to physical servers (PSs) is

crucial to lowering operational cost for cloud providers. Evolutionary

algorithms (EAs) have been performed promising-solving on virtual machine

placement (VMP) problems in the past. However, as growing demand for cloud

services, the existing EAs fail to implement in large-scale virtual machine

placement (LVMP) problem due to the high time complexity and poor scalability.

Recently, the multi-factorial optimization (MFO) technology has surfaced as a

new search paradigm in evolutionary computing. It offers the ability to evolve

multiple optimization tasks simultaneously during the evolutionary process.

This paper aims to apply the MFO technology to the LVMP problem in

heterogeneous environment. Firstly, we formulate a deployment cost based VMP

problem in the form of the MFO problem. Then, a multi-factorial evolutionary

algorithm (MFEA) embedded with greedy-based allocation operator is developed to

address the established MFO problem. After that, a re-migration and merge

operator is designed to offer the integrated solution of the LVMP problem from

the solutions of MFO problem. To assess the effectiveness of our proposed

method, the simulation experiments are carried on large-scale and extra

large-scale VMs test data sets. The results show that compared with various

heuristic methods, our method could shorten optimization time significantly and

offer a competitive placement solution for the LVMP problem in heterogeneous

environment.

Population-based metaheuristics for Association Rule Text Mining

Iztok Fister Jr. , Suash Deb , Iztok Fister Subjects : Neural and Evolutionary Computing (cs.NE)

Nowadays, the majority of data on the Internet is held in an unstructured

format, like websites and e-mails. The importance of analyzing these data has

been growing day by day. Similar to data mining on structured data, text mining

methods for handling unstructured data have also received increasing attention

from the research community. The paper deals with the problem of Association

Rule Text Mining. To solve the problem, the PSO-ARTM method was proposed, that

consists of three steps: Text preprocessing, Association Rule Text Mining using

population-based metaheuristics, and text postprocessing. The method was

applied to a transaction database obtained from professional triathlon

athletes’ blogs and news posted on their websites. The obtained results reveal

that the proposed method is suitable for Association Rule Text Mining and,

therefore, offers a promising way for further development.

EdgeNets:Edge Varying Graph Neural Networks

Elvin Isufi , Fernando Gama , Alejandro Ribeiro Subjects : Machine Learning (cs.LG) ; Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP); Machine Learning (stat.ML)

Driven by the outstanding performance of neural networks in the structured

Euclidean domain, recent years have seen a surge of interest in developing

neural networks for graphs and data supported on graphs. The graph is leveraged

as a parameterization to capture detail at the node level with a reduced number

of parameters and complexity. Following this rationale, this paper puts forth a

general framework that unifies state-of-the-art graph neural networks (GNNs)

through the concept of EdgeNet. An EdgeNet is a GNN architecture that allows

different nodes to use different parameters to weigh the information of

different neighbors. By extrapolating this strategy to more iterations between

neighboring nodes, the EdgeNet learns edge- and neighbor-dependent weights to

capture local detail. This is the most general local operation that a node can

do and encompasses under one formulation all graph convolutional neural

networks (GCNNs) as well as graph attention networks (GATs). In writing

different GNN architectures with a common language, EdgeNets highlight specific

architecture advantages and limitations, while providing guidelines to improve

their capacity without compromising their local implementation. For instance,

we show that GCNNs have a parameter sharing structure that induces permutation

equivariance. This can be an advantage or a limitation, depending on the

application. When it is a limitation, we propose hybrid approaches and provide

insights to develop several other solutions that promote parameter sharing

without enforcing permutation equivariance. Another interesting conclusion is

the unification of GCNNs and GATs -approaches that have been so far perceived

as separate. In particular, we show that GATs are GCNNs on a graph that is

learned from the features. This particularization opens the doors to develop

alternative attention mechanisms for improving discriminatory power.

Ensemble Genetic Programming

Nuno M. Rodrigues , João E. Batista , Sara Silva

Comments: eurogp 2020 submission

Subjects

:

Machine Learning (cs.LG)

; Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Ensemble learning is a powerful paradigm that has been usedin the top

state-of-the-art machine learning methods like Random Forestsand XGBoost.

Inspired by the success of such methods, we have devel-oped a new Genetic

Programming method called Ensemble GP. The evo-lutionary cycle of Ensemble GP

follows the same steps as other GeneticProgramming systems, but with

differences in the population structure,fitness evaluation and genetic

operators. We have tested this method oneight binary classification problems,

achieving results significantly betterthan standard GP, with much smaller

models. Although other methodslike M3GP and XGBoost were the best overall,

Ensemble GP was able toachieve exceptionally good generalization results on a

particularly hardproblem where none of the other methods was able to succeed.

An Efficient Framework for Automated Screening of Clinically Significant Macular Edema

Renoh Johnson Chalakkal , Faizal Hafiz , Waleed Abdulla , Akshya Swain Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The present study proposes a new approach to automated screening of

Clinically Significant Macular Edema (CSME) and addresses two major challenges

associated with such screenings, i.e., exudate segmentation and imbalanced

datasets. The proposed approach replaces the conventional exudate segmentation

based feature extraction by combining a pre-trained deep neural network with

meta-heuristic feature selection. A feature space over-sampling technique is

being used to overcome the effects of skewed datasets and the screening is

accomplished by a k-NN based classifier. The role of each data-processing step

(e.g., class balancing, feature selection) and the effects of limiting the

region-of-interest to fovea on the classification performance are critically

analyzed. Finally, the selection and implication of operating point on Receiver

Operating Characteristic curve are discussed. The results of this study

convincingly demonstrate that by following these fundamental practices of

machine learning, a basic k-NN based classifier could effectively accomplish

the CSME screening.

Memory capacity of neural networks with threshold and ReLU activations

Roman Vershynin

Comments: 25 pages

Subjects

:

Machine Learning (cs.LG)

; Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Overwhelming theoretical and empirical evidence shows that mildly

overparametrized neural networks — those with more connections than the size

of the training data — are often able to memorize the training data with

(100\%) accuracy. This was rigorously proved for networks with sigmoid

activation functions and, very recently, for ReLU activations. Addressing a

1988 open question of Baum, we prove that this phenomenon holds for general

multilayered perceptrons, i.e. neural networks with threshold activation

functions, or with any mix of threshold and ReLU activations. Our construction

is probabilistic and exploits sparsity.

Computer Vision and Pattern Recognition

SAUNet: Shape Attentive U-Net for Interpretable Medical Image Segmentation

Jesse Sun , Fatemeh Darbeha , Mark Zaidi , Bo Wang Subjects : Computer Vision and Pattern Recognition (cs.CV)

Medical image segmentation is a difficult but important task for many

clinical operations such as cardiac bi-ventricular volume estimation. More

recently, there has been a shift to utilizing deep learning and fully

convolutional neural networks (CNNs) to perform image segmentation that has

yielded state-of-the-art results in many public benchmark datasets. Despite the

progress of deep learning in medical image segmentation, standard CNNs are

still not fully adopted in clinical settings as they lack robustness and

interpretability. Shapes are generally more meaningful features than solely

textures of images, which are features regular CNNs learn, causing a lack of

robustness. Likewise, previous works surrounding model interpretability have

been focused on post hoc gradient-based saliency methods. However,

gradient-based saliency methods typically require additional computations post

hoc and have been shown to be unreliable for interpretability. Thus, we present

a new architecture called Shape Attentive U-Net (SAUNet) which focuses on model

interpretability and robustness. The proposed architecture attempts to address

these limitations by the use of a secondary shape stream that captures rich

shape-dependent information in parallel with the regular texture stream.

Furthermore, we suggest multi-resolution saliency maps can be learned using our

dual-attention decoder module which allows for multi-level interpretability and

mitigates the need for additional computations post hoc. Our method also

achieves state-of-the-art results on the two large public cardiac MRI image

segmentation datasets of SUN09 and AC17.

PatchPerPix for Instance Segmentation

Peter Hirsch , Lisa Mais , Dagmar Kainmueller Subjects : Computer Vision and Pattern Recognition (cs.CV)

In this paper we present a novel method for proposal free instance

segmentation that can handle sophisticated object shapes that span large parts

of an image and form dense object clusters with crossovers. Our method is based

on predicting dense local shape descriptors, which we assemble to form

instances. All instances are assembled simultaneously in one go. To our

knowledge, our method is the first non-iterative method that guarantees

instances to be composed of learnt shape patches. We evaluate our method on a

variety of data domains, where it defines the new state of the art on two

challenging benchmarks, namely the ISBI 2012 EM segmentation benchmark, and the

BBBC010 C. elegans dataset. We show furthermore that our method performs well

also on 3d image data, and can handle even extreme cases of complex shape

clusters.

Geometric Proxies for Live RGB-D Stream Enhancement and Consolidation

Adrien Kaiser , José Alonso Ybanez Zepeda , Tamy Boubekeur

Comments: extension of our ECCV 2018 paper at this http URL

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

We propose a geometric superstructure for unified real-time processing of

RGB-D data. Modern RGB-D sensors are widely used for indoor 3D capture, with

applications ranging from modeling to robotics, through augmented reality.

Nevertheless, their use is limited by their low resolution, with frames often

corrupted with noise, missing data and temporal inconsistencies. Our approach

consists in generating and updating through time a single set of compact local

statistics parameterized over detected geometric proxies, which are fed from

raw RGB-D data. Our proxies provide several processing primitives, which

improve the quality of the RGB-D stream on the fly or lighten further

operations. Experimental results confirm that our lightweight analysis

framework copes well with embedded execution as well as moderate memory and

computational capabilities compared to state-of-the-art methods. Processing

RGB-D data with our proxies allows noise and temporal flickering removal, hole

filling and resampling. As a substitute of the observed scene, our proxies can

additionally be applied to compression and scene reconstruction. We present

experiments performed with our framework in indoor scenes of different natures

within a recent open RGB-D dataset.

Multimodal Deep Unfolding for Guided Image Super-Resolution

Iman Marivani , Evaggelia Tsiligianni , Bruno Cornelis , Nikos Deligiannis Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)

The reconstruction of a high resolution image given a low resolution

observation is an ill-posed inverse problem in imaging. Deep learning methods

rely on training data to learn an end-to-end mapping from a low-resolution

input to a high-resolution output. Unlike existing deep multimodal models that

do not incorporate domain knowledge about the problem, we propose a multimodal

deep learning design that incorporates sparse priors and allows the effective

integration of information from another image modality into the network

architecture. Our solution relies on a novel deep unfolding operator,

performing steps similar to an iterative algorithm for convolutional sparse

coding with side information; therefore, the proposed neural network is

interpretable by design. The deep unfolding architecture is used as a core

component of a multimodal framework for guided image super-resolution. An

alternative multimodal design is investigated by employing residual learning to

improve the training efficiency. The presented multimodal approach is applied

to super-resolution of near-infrared and multi-spectral images as well as depth

upsampling using RGB images as side information. Experimental results show that

our model outperforms state-of-the-art methods.

P(^2)-GAN: Efficient Style Transfer Using Single Style Image

Zhentan Zheng , Jianyi Liu

Comments: 5 pages, 5 figures

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Style transfer is a useful image synthesis technique that can re-render given

image into another artistic style while preserving its content information.

Generative Adversarial Network (GAN) is a widely adopted framework toward this

task for its better representation ability on local style patterns than the

traditional Gram-matrix based methods. However, most previous methods rely on

sufficient amount of pre-collected style images to train the model. In this

paper, a novel Patch Permutation GAN (P(^2)-GAN) network that can efficiently

learn the stroke style from a single style image is proposed. We use patch

permutation to generate multiple training samples from the given style image. A

patch discriminator that can simultaneously process patch-wise images and

natural images seamlessly is designed. We also propose a local texture

descriptor based criterion to quantitatively evaluate the style transfer

quality. Experimental results showed that our method can produce finer quality

re-renderings from single style image with improved computational efficiency

compared with many state-of-the-arts methods.

Detecting Face2Face Facial Reenactment in Videos

Prabhat Kumar , Mayank Vatsa , Richa Singh

Comments: 9 pages

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Visual content has become the primary source of information, as evident in

the billions of images and videos, shared and uploaded on the Internet every

single day. This has led to an increase in alterations in images and videos to

make them more informative and eye-catching for the viewers worldwide. Some of

these alterations are simple, like copy-move, and are easily detectable, while

other sophisticated alterations like reenactment based DeepFakes are hard to

detect. Reenactment alterations allow the source to change the target

expressions and create photo-realistic images and videos. While technology can

be potentially used for several applications, the malicious usage of automatic

reenactment has a very large social implication. It is therefore important to

develop detection techniques to distinguish real images and videos with the

altered ones. This research proposes a learning-based algorithm for detecting

reenactment based alterations. The proposed algorithm uses a multi-stream

network that learns regional artifacts and provides a robust performance at

various compression levels. We also propose a loss function for the balanced

learning of the streams for the proposed network. The performance is evaluated

on the publicly available FaceForensics dataset. The results show

state-of-the-art classification accuracy of 99.96%, 99.10%, and 91.20% for no,

easy, and hard compression factors, respectively.

Learning Diverse Features with Part-Level Resolution for Person Re-Identification

Ben Xie , Xiaofu Wu , Suofei Zhang , Shiliang Zhao , Ming Li

Comments: 8 pages, 5 figures, submitted to IEEE TCSVT

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Artificial Intelligence (cs.AI)

Learning diverse features is key to the success of person re-identification.

Various part-based methods have been extensively proposed for learning local

representations, which, however, are still inferior to the best-performing

methods for person re-identification. This paper proposes to construct a strong

lightweight network architecture, termed PLR-OSNet, based on the idea of

Part-Level feature Resolution over the Omni-Scale Network (OSNet) for achieving

feature diversity. The proposed PLR-OSNet has two branches, one branch for

global feature representation and the other branch for local feature

representation. The local branch employs a uniform partition strategy for

part-level feature resolution but produces only a single identity-prediction

loss, which is in sharp contrast to the existing part-based methods. Empirical

evidence demonstrates that the proposed PLR-OSNet achieves state-of-the-art

performance on popular person Re-ID datasets, including Market1501,

DukeMTMC-reID and CUHK03, despite its small model size.

Evaluating Weakly Supervised Object Localization Methods Right

Junsuk Choe , Seong Joon Oh , Seungho Lee , Sanghyuk Chun , Zeynep Akata , Hyunjung Shim Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)

Weakly-supervised object localization (WSOL) has gained popularity over the

last years for its promise to train localization models with only image-level

labels. Since the seminal WSOL work of class activation mapping (CAM), the

field has focused on how to expand the attention regions to cover objects more

broadly and localize them better. However, these strategies rely on full

localization supervision to validate hyperparameters and for model selection,

which is in principle prohibited under the WSOL setup. In this paper, we argue

that WSOL task is ill-posed with only image-level labels, and propose a new

evaluation protocol where full supervision is limited to only a small held-out

set not overlapping with the test set. We observe that, under our protocol, the

five most recent WSOL methods have not made a major improvement over the CAM

baseline. Moreover, we report that existing WSOL methods have not reached the

few-shot learning baseline, where the full-supervision at validation time is

used for model training instead. Based on our findings, we discuss some future

directions for WSOL.

An End-to-end Deep Learning Approach for Landmark Detection and Matching in Medical Images

Monika Grewal , Timo M. Deist , Jan Wiersma , Peter A. N. Bosman , Tanja Alderliesten

Comments: SPIE Medical Imaging Conference – 2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Anatomical landmark correspondences in medical images can provide additional

guidance information for the alignment of two images, which, in turn, is

crucial for many medical applications. However, manual landmark annotation is

labor-intensive. Therefore, we propose an end-to-end deep learning approach to

automatically detect landmark correspondences in pairs of two-dimensional (2D)

images. Our approach consists of a Siamese neural network, which is trained to

identify salient locations in images as landmarks and predict matching

probabilities for landmark pairs from two different images. We trained our

approach on 2D transverse slices from 168 lower abdominal Computed Tomography

(CT) scans. We tested the approach on 22,206 pairs of 2D slices with varying

levels of intensity, affine, and elastic transformations. The proposed approach

finds an average of 639, 466, and 370 landmark matches per image pair for

intensity, affine, and elastic transformations, respectively, with spatial

matching errors of at most 1 mm. Further, more than 99% of the landmark pairs

are within a spatial matching error of 2 mm, 4 mm, and 8 mm for image pairs

with intensity, affine, and elastic transformations, respectively. To

investigate the utility of our developed approach in a clinical setting, we

also tested our approach on pairs of transverse slices selected from follow-up

CT scans of three patients. Visual inspection of the results revealed landmark

matches in both bony anatomical regions as well as in soft tissues lacking

prominent intensity gradients.

From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds

Christiane Sommer , Yumin Sun , Leonidas Guibas , Daniel Cremers , Tolga Birdal

Comments: Accepted to IEEE Robotics and Automation Letters 2020 | Video: this https URL | Code: this https URL

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Robotics (cs.RO)

We propose a new method for segmentation-free joint estimation of orthogonal

planes, their intersection lines, relationship graph and corners lying at the

intersection of three orthogonal planes. Such unified scene exploration under

orthogonality allows for multitudes of applications such as semantic plane

detection or local and global scan alignment, which in turn can aid robot

localization or grasping tasks. Our two-stage pipeline involves a rough yet

joint estimation of orthogonal planes followed by a subsequent joint refinement

of plane parameters respecting their orthogonality relations. We form a graph

of these primitives, paving the way to the extraction of further reliable

features: lines and corners. Our experiments demonstrate the validity of our

approach in numerous scenarios from wall detection to 6D tracking, both on

synthetic and real data.

VMRFANet:View-Specific Multi-Receptive Field Attention Network for Person Re-identification

Honglong Cai , Yuedong Fang , Zhiguan Wang , Tingchun Yeh , Jinxing Cheng

Comments: Accepted by ICAART2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Person re-identification (re-ID) aims to retrieve the same person across

different cameras. In practice, it still remains a challenging task due to

background clutter, variations on body poses and view conditions, inaccurate

bounding box detection, etc. To tackle these issues, in this paper, we propose

a novel multi-receptive field attention (MRFA) module that utilizes filters of

various sizes to help network focusing on informative pixels. Besides, we

present a view-specific mechanism that guides attention module to handle the

variation of view conditions. Moreover, we introduce a Gaussian horizontal

random cropping/padding method which further improves the robustness of our

proposed network. Comprehensive experiments demonstrate the effectiveness of

each component. Our method achieves 95.5% / 88.1% in rank-1 / mAP on

Market-1501, 88.9% / 80.0% on DukeMTMC-reID, 81.1% / 78.8% on CUHK03 labeled

dataset and 78.9% / 75.3% on CUHK03 detected dataset, outperforming current

state-of-the-art methods.

Transfer Learning using Neural Ordinary Differential Equations

Rajath S , Sumukh Aithal K , Natarajan Subramanyam Subjects : Computer Vision and Pattern Recognition (cs.CV)

A concept of using Neural Ordinary Differential Equations(NODE) for Transfer

Learning has been introduced. In this paper we use the EfficientNets to explore

transfer learning on CIFAR-10 dataset. We use NODE for fine-tuning our model.

Using NODE for fine tuning provides more stability during training and

validation.These continuous depth blocks can also have a trade off between

numerical precision and speed .Using Neural ODEs for transfer learning has

resulted in much stable convergence of the loss function.

Face Verification via learning the kernel matrix

Ning Yuan , Xiao-Jun Wu , He-Feng Yin

Comments: 10 pages

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

The kernel function is introduced to solve the nonlinear pattern recognition

problem. The advantage of a kernel method often depends critically on a proper

choice of the kernel function. A promising approach is to learn the kernel from

data automatically. Over the past few years, some methods which have been

proposed to learn the kernel have some limitations: learning the parameters of

some prespecified kernel function and so on. In this paper, the nonlinear face

verification via learning the kernel matrix is proposed. A new criterion is

used in the new algorithm to avoid inverting the possibly singular within-class

which is a computational problem. The experimental results obtained on the

facial database XM2VTS using the Lausanne protocol show that the verification

performance of the new method is superior to that of the primary method Client

Specific Kernel Discriminant Analysis (CSKDA). The method CSKDA needs to choose

a proper kernel function through many experiments, while the new method could

learn the kernel from data automatically which could save a lot of time and

have the robust performance.

Neural Style Difference Transfer and Its Application to Font Generation

Gantugs Atarsaikhan , Brian Kenji Iwana , Seiichi Uchida

Comments: Submitted to DAS2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Machine Learning (cs.LG)

Designing fonts requires a great deal of time and effort. It requires

professional skills, such as sketching, vectorizing, and image editing.

Additionally, each letter has to be designed individually. In this paper, we

will introduce a method to create fonts automatically. In our proposed method,

the difference of font styles between two different fonts is found and

transferred to another font using neural style transfer. Neural style transfer

is a method of stylizing the contents of an image with the styles of another

image. We proposed a novel neural style difference and content difference loss

for the neural style transfer. With these losses, new fonts can be generated by

adding or removing font styles from a font. We provided experimental results

with various combinations of input fonts and discussed limitations and future

development for the proposed method.

Recovering Geometric Information with Learned Texture Perturbations

Jane Wu , Yongxu Jin , Zhenglin Geng , Hui Zhou , Ronald Fedkiw Subjects : Computer Vision and Pattern Recognition (cs.CV)

Regularization is used to avoid overfitting when training a neural network;

unfortunately, this reduces the attainable level of detail hindering the

ability to capture high-frequency information present in the training data.

Even though various approaches may be used to re-introduce high-frequency

detail, it typically does not match the training data and is often not time

coherent. In the case of network inferred cloth, these sentiments manifest

themselves via either a lack of detailed wrinkles or unnaturally appearing

and/or time incoherent surrogate wrinkles. Thus, we propose a general strategy

whereby high-frequency information is procedurally embedded into low-frequency

data so that when the latter is smeared out by the network the former still

retains its high-frequency detail. We illustrate this approach by learning

texture coordinates which when smeared do not in turn smear out the

high-frequency detail in the texture itself but merely smoothly distort it.

Notably, we prescribe perturbed texture coordinates that are subsequently used

to correct the over-smoothed appearance of inferred cloth, and correcting the

appearance from multiple camera views naturally recovers lost geometric

information.

UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description without Local Correspondence Supervision

Tsun-Yi Yang , Duy-Kien Nguyen , Huub Heijnen , Vassileios Balntas Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)

In this paper, we explore how three related tasks, namely keypoint detection,

description, and image retrieval can be jointly tackled using a single unified

framework, which is trained without the need of training data with point to

point correspondences. By leveraging diverse information from sequential layers

of a standard ResNet-based architecture, we are able to extract keypoints and

descriptors that encode local information using generic techniques such as

local activation norms, channel grouping and dropping, and self-distillation.

Subsequently, global information for image retrieval is encoded in an

end-to-end pipeline, based on pooling of the aforementioned local responses. In

contrast to previous methods in local matching, our method does not depend on

pointwise/pixelwise correspondences, and requires no such supervision at all

i.e. no depth-maps from an SfM model nor manually created synthetic affine

transformations. We illustrate that this simple and direct paradigm, is able to

achieve very competitive results against the state-of-the-art methods in

various challenging benchmark conditions such as viewpoint changes, scale

changes, and day-night shifting localization.

Autocamera Calibration for traffic surveillance cameras with wide angle lenses

Aman Gajendra Jain , Nicolas Saunier Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Robotics (cs.RO)

We propose a method for automatic calibration of a traffic surveillance

camera with wide-angle lenses. Video footage of a few minutes is sufficient for

the entire calibration process to take place. This method takes in the height

of the camera from the ground plane as the only user input to overcome the

scale ambiguity. The calibration is performed in two stages, 1. Intrinsic

Calibration 2. Extrinsic Calibration. Intrinsic calibration is achieved by

assuming an equidistant fisheye distortion and an ideal camera model. Extrinsic

calibration is accomplished by estimating the two vanishing points, on the

ground plane, from the motion of vehicles at perpendicular intersections. The

first stage of intrinsic calibration is also valid for thermal cameras.

Experiments have been conducted to demonstrate the effectiveness of this

approach on visible as well as thermal cameras.

Index Terms: fish-eye, calibration, thermal camera, intelligent

transportation systems, vanishing points

Spectral Pyramid Graph Attention Network for Hyperspectral Image Classification

Tinghuai Wang , Guangming Wang , Kuan Eeik Tan , Donghui Tan

Comments: 7 pages, 6 figures, 4 tables

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Convolutional neural networks (CNN) have made significant advances in

hyperspectral image (HSI) classification. However, standard convolutional

kernel neglects the intrinsic connections between data points, resulting in

poor region delineation and small spurious predictions. Furthermore, HSIs have

a unique continuous data distribution along the high dimensional spectrum

domain – much remains to be addressed in characterizing the spectral contexts

considering the prohibitively high dimensionality and improving reasoning

capability in light of the limited amount of labelled data. This paper presents

a novel architecture which explicitly addresses these two issues. Specifically,

we design an architecture to encode the multiple spectral contextual

information in the form of spectral pyramid of multiple embedding spaces. In

each spectral embedding space, we propose graph attention mechanism to

explicitly perform interpretable reasoning in the spatial domain based on the

connection in spectral feature space. Experiments on three HSI datasets

demonstrate that the proposed architecture can significantly improve the

classification accuracy compared with the existing methods.

Active and Incremental Learning with Weak Supervision

Clemens-Alexander Brust , Christoph Käding , Joachim Denzler

Comments: Accepted for publication in KI – K”unstliche Intelligenz

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Large amounts of labeled training data are one of the main contributors to

the great success that deep models have achieved in the past. Label acquisition

for tasks other than benchmarks can pose a challenge due to requirements of

both funding and expertise. By selecting unlabeled examples that are promising

in terms of model improvement and only asking for respective labels, active

learning can increase the efficiency of the labeling process in terms of time

and cost.

In this work, we describe combinations of an incremental learning scheme and

methods of active learning. These allow for continuous exploration of newly

observed unlabeled data. We describe selection criteria based on model

uncertainty as well as expected model output change (EMOC). An object detection

task is evaluated in a continuous exploration context on the PASCAL VOC

dataset. We also validate a weakly supervised system based on active and

incremental learning in a real-world biodiversity application where images from

camera traps are analyzed. Labeling only 32 images by accepting or rejecting

proposals generated by our method yields an increase in accuracy from 25.4% to

42.6%.

BARNet: Bilinear Attention Network with Adaptive Receptive Field for Surgical Instrument Segmentation

Zhen-Liang Ni , Gui-Bin Bian , Guan-An Wang , Xiao-Hu Zhou , Zeng-Guang Hou , Xiao-Liang Xie , Zhen Li , Yu-Han Wang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG); Robotics (cs.RO); Image and Video Processing (eess.IV)

Surgical instrument segmentation is extremely important for computer-assisted

surgery. Different from common object segmentation, it is more challenging due

to the large illumination and scale variation caused by the special surgical

scenes. In this paper, we propose a novel bilinear attention network with

adaptive receptive field to solve these two challenges. For the illumination

variation, the bilinear attention module can capture second-order statistics to

encode global contexts and semantic dependencies between local pixels. With

them, semantic features in challenging areas can be inferred from their

neighbors and the distinction of various semantics can be boosted. For the

scale variation, our adaptive receptive field module aggregates multi-scale

features and automatically fuses them with different weights. Specifically, it

encodes the semantic relationship between channels to emphasize feature maps

with appropriate scales, changing the receptive field of subsequent

convolutions. The proposed network achieves the best performance 97.47% mean

IOU on Cata7 and comes first place on EndoVis 2017 by 10.10% IOU overtaking

second-ranking method.

Multiplication fusion of sparse and collaborative-competitive representation for image classification

Zi-Qi Li , Jun Sun , Xiao-Jun Wu , He-Feng Yin

Comments: submitted to International Journal of Machine Learning and Cybernetics

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Representation based classification methods have become a hot research topic

during the past few years, and the two most prominent approaches are sparse

representation based classification (SRC) and collaborative representation

based classification (CRC). CRC reveals that it is the collaborative

representation rather than the sparsity that makes SRC successful.

Nevertheless, the dense representation of CRC may not be discriminative which

will degrade its performance for classification tasks. To alleviate this

problem to some extent, we propose a new method called sparse and

collaborative-competitive representation based classification (SCCRC) for image

classification. Firstly, the coefficients of the test sample are obtained by

SRC and CCRC, respectively. Then the fused coefficient is derived by

multiplying the coefficients of SRC and CCRC. Finally, the test sample is

designated to the class that has the minimum residual. Experimental results on

several benchmark databases demonstrate the efficacy of our proposed SCCRC. The

source code of SCCRC is accessible at this https URL .

Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models

Moshiur R. Farazi , Salman H. Khan , Nick Barnes Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Computational Complexity (cs.CC)

Visual Question Answering (VQA) has emerged as a Visual Turing Test to

validate the reasoning ability of AI agents. The pivot to existing VQA models

is the joint embedding that is learned by combining the visual features from an

image and the semantic features from a given question. Consequently, a large

body of literature has focused on developing complex joint embedding strategies

coupled with visual attention mechanisms to effectively capture the interplay

between these two modalities. However, modelling the visual and semantic

features in a high dimensional (joint embedding) space is computationally

expensive, and more complex models often result in trivial improvements in the

VQA accuracy. In this work, we systematically study the trade-off between the

model complexity and the performance on the VQA task. VQA models have a diverse

architecture comprising of pre-processing, feature extraction, multimodal

fusion, attention and final classification stages. We specifically focus on the

effect of “multi-modal fusion” in VQA models that is typically the most

expensive step in a VQA pipeline. Our thorough experimental evaluation leads us

to two proposals, one optimized for minimal complexity and the other one

optimized for state-of-the-art VQA performance.

Plane Pair Matching for Efficient 3D View Registration

Adrien Kaiser , José Alonso Ybanez Zepeda , Tamy Boubekeur Subjects : Computer Vision and Pattern Recognition (cs.CV)

We present a novel method to estimate the motion matrix between overlapping

pairs of 3D views in the context of indoor scenes. We use the Manhattan world

assumption to introduce lightweight geometric constraints under the form of

planes into the problem, which reduces complexity by taking into account the

structure of the scene. In particular, we define a stochastic framework to

categorize planes as vertical or horizontal and parallel or non-parallel. We

leverage this classification to match pairs of planes in overlapping views with

point-of-view agnostic structural metrics. We propose to split the motion

computation using the classification and estimate separately the rotation and

translation of the sensor, using a quadric minimizer. We validate our approach

on a toy example and present quantitative experiments on a public RGB-D

dataset, comparing against recent state-of-the-art methods. Our evaluation

shows that planar constraints only add low computational overhead while

improving results in precision when applied after a prior coarse estimate. We

conclude by giving hints towards extensions and improvements of current

results.

Adaptive Dithering Using Curved Markov-Gaussian Noise in the Quantized Domain for Mapping SDR to HDR Image

Subhayan Mukherjee , Guan-Ming Su , Irene Cheng

Comments: 2018 International Conference on Smart Multimedia

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Image and Video Processing (eess.IV)

High Dynamic Range (HDR) imaging is gaining increased attention due to its

realistic content, for not only regular displays but also smartphones. Before

sufficient HDR content is distributed, HDR visualization still relies mostly on

converting Standard Dynamic Range (SDR) content. SDR images are often

quantized, or bit depth reduced, before SDR-to-HDR conversion, e.g. for video

transmission. Quantization can easily lead to banding artefacts. In some

computing and/or memory I/O limited environment, the traditional solution using

spatial neighborhood information is not feasible. Our method includes noise

generation (offline) and noise injection (online), and operates on pixels of

the quantized image. We vary the magnitude and structure of the noise pattern

adaptively based on the luma of the quantized pixel and the slope of the

inverse-tone mapping function. Subjective user evaluations confirm the superior

performance of our technique.

FD-GAN: Generative Adversarial Networks with Fusion-discriminator for Single Image Dehazing

Yu Dong , Yihao Liu , He Zhang , Shifeng Chen , Yu Qiao

Comments: Accepted by AAAI2020 (with supplementary files)

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Recently, convolutional neural networks (CNNs) have achieved great

improvements in single image dehazing and attained much attention in research.

Most existing learning-based dehazing methods are not fully end-to-end, which

still follow the traditional dehazing procedure: first estimate the medium

transmission and the atmospheric light, then recover the haze-free image based

on the atmospheric scattering model. However, in practice, due to lack of

priors and constraints, it is hard to precisely estimate these intermediate

parameters. Inaccurate estimation further degrades the performance of dehazing,

resulting in artifacts, color distortion and insufficient haze removal. To

address this, we propose a fully end-to-end Generative Adversarial Networks

with Fusion-discriminator (FD-GAN) for image dehazing. With the proposed

Fusion-discriminator which takes frequency information as additional priors,

our model can generator more natural and realistic dehazed images with less

color distortion and fewer artifacts. Moreover, we synthesize a large-scale

training dataset including various indoor and outdoor hazy images to boost the

performance and we reveal that for learning-based dehazing methods, the

performance is strictly influenced by the training data. Experiments have shown

that our method reaches state-of-the-art performance on both public synthetic

datasets and real-world images with more visually pleasing dehazed results.

A hybrid algorithm for disparity calculation from sparse disparity estimates based on stereo vision

Subhayan Mukherjee , Ram Mohana Reddy Guddeti

Comments: 2014 SPCOM

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

In this paper, we have proposed a novel method for stereo disparity

estimation by combining the existing methods of block based and region based

stereo matching. Our method can generate dense disparity maps from disparity

measurements of only 18% pixels of either the left or the right image of a

stereo image pair. It works by segmenting the lightness values of image pixels

using a fast implementation of K-Means clustering. It then refines those

segment boundaries by morphological filtering and connected components

analysis, thus removing a lot of redundant boundary pixels. This is followed by

determining the boundaries’ disparities by the SAD cost function. Lastly, we

reconstruct the entire disparity map of the scene from the boundaries’

disparities through disparity propagation along the scan lines and disparity

prediction of regions of uncertainty by considering disparities of the

neighboring regions. Experimental results on the Middlebury stereo vision

dataset demonstrate that the proposed method outperforms traditional disparity

determination methods like SAD and NCC by up to 30% and achieves an improvement

of 2.6% when compared to a recent approach based on absolute difference (AD)

cost function for disparity calculations [1].

G2MF-WA: Geometric Multi-Model Fitting with Weakly Annotated Data

Chao Zhang , Xuequan Lu , Katsuya Hotta , Xi Yang Subjects : Computer Vision and Pattern Recognition (cs.CV)

In this paper we attempt to address the problem of geometric multi-model

fitting with resorting to a few weakly annotated (WA) data points, which has

been sparsely studied so far. In weak annotating, most of the manual

annotations are supposed to be correct yet inevitably mixed with incorrect

ones. The WA data can be naturally obtained in an interactive way for specific

tasks, for example, in the case of homography estimation, one can easily

annotate points on the same plane/object with a single label by observing the

image. Motivated by this, we propose a novel method to make full use of the WA

data to boost the multi-model fitting performance. Specifically, a graph for

model proposal sampling is first constructed using the WA data, given the prior

that the WA data annotated with the same weak label has a high probability of

being assigned to the same model. By incorporating this prior knowledge into

the calculation of edge probabilities, vertices (i.e., data points) lie on/near

the latent model are likely to connect together and further form a

subset/cluster for effective proposals generation. With the proposals

generated, the (alpha)-expansion is adopted for labeling, and our method in

return updates the proposals. This works in an iterative way. Extensive

experiments validate our method and show that the proposed method produces

noticeably better results than state-of-the-art techniques in most cases.

A Novel Image Dehazing and Assessment Method

Saad Bin Sami , Abdul Muqeet , Humera Tariq

Comments: Accepted in IBA-ICICT 2019

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Images captured in hazy weather conditions often suffer from color contrast

and color fidelity. This degradation is represented by transmission map which

represents the amount of attenuation and airlight which represents the color of

additive noise. In this paper, we have proposed a method to estimate the

transmission map using haze levels instead of airlight color since there are

some ambiguities in estimation of airlight. Qualitative and quantitative

results of proposed method show competitiveness of the method given. In

addition we have proposed two metrics which are based on statistics of natural

outdoor images for assessment of haze removal algorithms.

SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions

Ramprasaath R. Selvaraju , Purva Tendulkar , Devi Parikh , Eric Horvitz , Marco Ribeiro , Besmira Nushi , Ece Kamar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Existing VQA datasets contain questions with varying levels of complexity.

While the majority of questions in these datasets require perception for

recognizing existence, properties, and spatial relationships of entities, a

significant portion of questions pose challenges that correspond to reasoning

tasks — tasks that can only be answered through a synthesis of perception and

knowledge about the world, logic and / or reasoning. This distinction allows us

to notice when existing VQA models have consistency issues — they answer the

reasoning question correctly but fail on associated low-level perception

questions. For example, models answer the complex reasoning question “Is the

banana ripe enough to eat?” correctly, but fail on the associated perception

question “Are the bananas mostly green or yellow?” indicating that the model

likely answered the reasoning question correctly but for the wrong reason. We

quantify the extent to which this phenomenon occurs by creating a new Reasoning

split of the VQA dataset and collecting Sub-VQA, a new dataset consisting of

200K new perception questions which serve as sub questions corresponding to the

set of perceptual tasks needed to effectively answer the complex reasoning

questions in the Reasoning split. Additionally, we propose an approach called

Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the

model to attend do the same parts of the image when answering the reasoning

question and the perception sub questions. We show that SQuINT improves model

consistency by 7.8%, also marginally improving its performance on the Reasoning

questions in VQA, while also displaying qualitatively better attention maps.

MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning

Simon Vandenhende , Stamatios Georgoulis , Luc Van Gool Subjects : Computer Vision and Pattern Recognition (cs.CV)

In this paper, we highlight the importance of considering task interactions

at multiple scales when distilling task information in a multi-task learning

setup. In contrast to common belief, we show that tasks with high pattern

affinity at a certain scale are not guaranteed to retain this behaviour at

other scales, and vice versa. We propose a novel architecture, MTI-Net, that

builds upon this finding in three ways. First, it explicitly models task

interactions at every scale via a multi-scale multi-modal distillation unit.

Second, it propagates distilled task information from lower to higher scales

via a feature propagation module. Third, it aggregates the refined task

features from all scales via a feature aggregation unit to produce the final

per-task predictions.

Extensive experiments on two multi-task dense labeling datasets show that,

unlike prior work, our multi-task model delivers on the full potential of

multi-task learning, that is, smaller memory footprint, reduced number of

calculations, and better performance w.r.t. single-task learning.

Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences

Zhu Zhang , Zhou Zhao , Yang Zhao , Qi Wang , Huasheng Liu , Lianli Gao Subjects : Computer Vision and Pattern Recognition (cs.CV)

In this paper, we consider a novel task, Spatio-Temporal Video Grounding for

Multi-Form Sentences (STVG). Given an untrimmed video and a

declarative/interrogative sentence depicting an object, STVG aims to localize

the spatiotemporal tube of the queried object. STVG has two challenging

settings: (1) We need to localize spatio-temporal object tubes from untrimmed

videos, where the object may only exist in a very small segment of the video;

(2) We deal with multi-form sentences, including the declarative sentences with

explicit objects and interrogative sentences with unknown objects. Existing

methods cannot tackle the STVG task due to the ineffective tube pre-generation

and the lack of object relationship modeling. Thus, we then propose a novel

Spatio-Temporal Graph Reasoning Network (STGRN) for this task. First, we build

a spatio-temporal region graph to capture the region relationships with

temporal object dynamics, which involves the implicit and explicit spatial

subgraphs in each frame and the temporal dynamic subgraph across frames. We

then incorporate textual clues into the graph and develop the multi-step

cross-modal graph reasoning. Next, we introduce a spatio-temporal localizer

with a dynamic selection method to directly retrieve the spatiotemporal tubes

without tube pre-generation. Moreover, we contribute a large-scale video

grounding dataset VidSTG based on video relation dataset VidOR. The extensive

experiments demonstrate the effectiveness of our method.

RGB-D Odometry and SLAM

Javier Civera , Seong Hun Lee

Comments: This is the pre-submission version of the manuscript that was later edited and published as a chapter in RGB-D Image Analysis and Processing

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Robotics (cs.RO)

The emergence of modern RGB-D sensors had a significant impact in many

application fields, including robotics, augmented reality (AR) and 3D scanning.

They are low-cost, low-power and low-size alternatives to traditional range

sensors such as LiDAR. Moreover, unlike RGB cameras, RGB-D sensors provide the

additional depth information that removes the need of frame-by-frame

triangulation for 3D scene reconstruction. These merits have made them very

popular in mobile robotics and AR, where it is of great interest to estimate

ego-motion and 3D scene structure. Such spatial understanding can enable robots

to navigate autonomously without collisions and allow users to insert virtual

entities consistent with the image stream. In this chapter, we review common

formulations of odometry and Simultaneous Localization and Mapping (known by

its acronym SLAM) using RGB-D stream input. The two topics are closely related,

as the former aims to track the incremental camera motion with respect to a

local map of the scene, and the latter to jointly estimate the camera

trajectory and the global map with consistency. In both cases, the standard

approaches minimize a cost function using nonlinear optimization techniques.

This chapter consists of three main parts: In the first part, we introduce the

basic concept of odometry and SLAM and motivate the use of RGB-D sensors. We

also give mathematical preliminaries relevant to most odometry and SLAM

algorithms. In the second part, we detail the three main components of SLAM

systems: camera pose tracking, scene mapping and loop closing. For each

component, we describe different approaches proposed in the literature. In the

final part, we provide a brief discussion on advanced research topics with the

references to the state-of-the-art.

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

Junjie Yan , Ruosi Wan , Xiangyu Zhang , Wei Zhang , Yichen Wei , Jian Sun

Comments: Published in ICLR2020; code: this https URL

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Machine Learning (stat.ML)

Batch Normalization (BN) is one of the most widely used techniques in Deep

Learning field. But its performance can awfully degrade with insufficient batch

size. This weakness limits the usage of BN on many computer vision tasks like

detection or segmentation, where batch size is usually small due to the

constraint of memory consumption. Therefore many modified normalization

techniques have been proposed, which either fail to restore the performance of

BN completely, or have to introduce additional nonlinear operations in

inference procedure and increase huge consumption. In this paper, we reveal

that there are two extra batch statistics involved in backward propagation of

BN, on which has never been well discussed before. The extra batch statistics

associated with gradients also can severely affect the training of deep neural

network. Based on our analysis, we propose a novel normalization method, named

Moving Average Batch Normalization (MABN). MABN can completely restore the

performance of vanilla BN in small batch cases, without introducing any

additional nonlinear operations in inference procedure. We prove the benefits

of MABN by both theoretical analysis and experiments. Our experiments

demonstrate the effectiveness of MABN in multiple computer vision tasks

including ImageNet and COCO. The code has been released in

this https URL .

Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement

Chunle Guo , Chongyi Li , Jichang Guo , Chen Change Loy , Junhui Hou , Sam Kwong , Runmin Cong Subjects : Computer Vision and Pattern Recognition (cs.CV)

The paper presents a novel method, Zero-Reference Deep Curve Estimation

(Zero-DCE), which formulates light enhancement as a task of image-specific

curve estimation with a deep network. Our method trains a lightweight deep

network, DCE-Net, to estimate pixel-wise and high-order curves for dynamic

range adjustment of a given image. The curve estimation is specially designed,

considering pixel value range, monotonicity, and differentiability. Zero-DCE is

appealing in its relaxed assumption on reference images, i.e., it does not

require any paired or unpaired data during training. This is achieved through a

set of carefully formulated non-reference loss functions, which implicitly

measure the enhancement quality and drive the learning of the network. Our

method is efficient as image enhancement can be achieved by an intuitive and

simple nonlinear curve mapping. Despite its simplicity, we show that it

generalizes well to diverse lighting conditions. Extensive experiments on

various benchmarks demonstrate the advantages of our method over

state-of-the-art methods qualitatively and quantitatively. Furthermore, the

potential benefits of our Zero-DCE to face detection in the dark are discussed.

Code and model will be available at this https URL .

SlideImages: A Dataset for Educational Image Classification

David Morris , Eric Müller-Budack , Ralph Ewerth

Comments: 8 pages, 2 figures, to be presented at ECIR 2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Information Retrieval (cs.IR)

In the past few years, convolutional neural networks (CNNs) have achieved

impressive results in computer vision tasks, which however mainly focus on

photos with natural scene content. Besides, non-sensor derived images such as

illustrations, data visualizations, figures, etc. are typically used to convey

complex information or to explore large datasets. However, this kind of images

has received little attention in computer vision. CNNs and similar techniques

use large volumes of training data. Currently, many document analysis systems

are trained in part on scene images due to the lack of large datasets of

educational image data. In this paper, we address this issue and present

SlideImages, a dataset for the task of classifying educational illustrations.

SlideImages contains training data collected from various sources, e.g.,

Wikimedia Commons and the AI2D dataset, and test data collected from

educational slides. We have reserved all the actual educational images as a

test dataset in order to ensure that the approaches using this dataset

generalize well to new educational images, and potentially other domains.

Furthermore, we present a baseline system using a standard deep neural

architecture and discuss dealing with the challenge of limited training data.

Deep Semantic Face Deblurring

Ziyi Shen , Wei-Sheng Lai , Tingfa Xu , Jan Kautz , Ming-Hsuan Yang

Comments: Submitted to International Journal of Computer Vision (IJCV). arXiv admin note: text overlap with arXiv:1803.03345

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose an effective and efficient face deblurring

algorithm by exploiting semantic cues via deep convolutional neural networks.

As the human faces are highly structured and share unified facial components

(e.g., eyes and mouths), such semantic information provides a strong prior for

restoration. We incorporate face semantic labels as input priors and propose an

adaptive structural loss to regularize facial local structures within an

end-to-end deep convolutional neural network. Specifically, we first use a

coarse deblurring network to reduce the motion blur on the input face image. We

then adopt a parsing network to extract the semantic features from the coarse

deblurred image. Finally, the fine deblurring network utilizes the semantic

information to restore a clear face image. We train the network with perceptual

and adversarial losses to generate photo-realistic results. The proposed method

restores sharp images with more accurate facial features and details.

Quantitative and qualitative evaluations demonstrate that the proposed face

deblurring algorithm performs favorably against the state-of-the-art methods in

terms of restoration quality, face recognition and execution speed.

Gated Path Selection Network for Semantic Segmentation

Qichuan Geng , Hong Zhang , Xiaojuan Qi , Ruigang Yang , Zhong Zhou , Gao Huang Subjects : Computer Vision and Pattern Recognition (cs.CV)

Semantic segmentation is a challenging task that needs to handle large scale

variations, deformations and different viewpoints. In this paper, we develop a

novel network named Gated Path Selection Network (GPSNet), which aims to learn

adaptive receptive fields. In GPSNet, we first design a two-dimensional

multi-scale network – SuperNet, which densely incorporates features from

growing receptive fields. To dynamically select desirable semantic context, a

gate prediction module is further introduced. In contrast to previous works

that focus on optimizing sample positions on the regular grids, GPSNet can

adaptively capture free form dense semantic contexts. The derived adaptive

receptive fields are data-dependent, and are flexible that can model different

object geometric transformations. On two representative semantic segmentation

datasets, i.e., Cityscapes, and ADE20K, we show that the proposed approach

consistently outperforms previous methods and achieves competitive performance

without bells and whistles.

Human-Aware Motion Deblurring

Ziyi Shen , Wenguan Wang , Xiankai Lu , Jianbing Shen , Haibin Ling , Tingfa Xu , Ling Shao

Comments: ICCV2019 paper. Website: this https URL

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a human-aware deblurring model that disentangles the

motion blur between foreground (FG) humans and background (BG). The proposed

model is based on a triple-branch encoder-decoder architecture. The first two

branches are learned for sharpening FG humans and BG details, respectively;

while the third one produces global, harmonious results by comprehensively

fusing multi-scale deblurring information from the two domains. The proposed

model is further endowed with a supervised, human-aware attention mechanism in

an end-to-end fashion. It learns a soft mask that encodes FG human information

and explicitly drives the FG/BG decoder-branches to focus on their specific

domains. To further benefit the research towards Human-aware Image Deblurring,

we introduce a large-scale dataset, named HIDE, which consists of 8,422 blurry

and sharp image pairs with 65,784 densely annotated FG human bounding boxes.

HIDE is specifically built to span a broad range of scenes, human object sizes,

motion patterns, and background complexities. Extensive experiments on public

benchmarks and our dataset demonstrate that our model performs favorably

against the state-of-the-art motion deblurring methods, especially in capturing

semantic details.

GTNet: Generative Transfer Network for Zero-Shot Object Detection

Shizhen Zhao , Changxin Gao , Yuanjie Shao , Lerenhan Li , Changqian Yu , Zhong Ji , Nong Sang

Comments: Accepted by AAAI 2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

We propose a Generative Transfer Network (GTNet) for zero shot object

detection (ZSD). GTNet consists of an Object Detection Module and a Knowledge

Transfer Module. The Object Detection Module can learn large-scale seen domain

knowledge. The Knowledge Transfer Module leverages a feature synthesizer to

generate unseen class features, which are applied to train a new classification

layer for the Object Detection Module. In order to synthesize features for each

unseen class with both the intra-class variance and the IoU variance, we design

an IoU-Aware Generative Adversarial Network (IoUGAN) as the feature

synthesizer, which can be easily integrated into GTNet. Specifically, IoUGAN

consists of three unit models: Class Feature Generating Unit (CFU), Foreground

Feature Generating Unit (FFU), and Background Feature Generating Unit (BFU).

CFU generates unseen features with the intra-class variance conditioned on the

class semantic embeddings. FFU and BFU add the IoU variance to the results of

CFU, yielding class-specific foreground and background features, respectively.

We evaluate our method on three public datasets and the results demonstrate

that our method performs favorably against the state-of-the-art ZSD approaches.

See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks

Xiankai Lu , Wenguan Wang , Chao Ma , Jianbing Shen , Ling Shao , Fatih Porikli

Comments: CVPR2019. Weblink: this https URL

Journal-ref: CVPR2019

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

We introduce a novel network, called CO-attention Siamese Network (COSNet),

to address the unsupervised video object segmentation task from a holistic

view. We emphasize the importance of inherent correlation among video frames

and incorporate a global co-attention mechanism to improve further the

state-of-the-art deep learning based solutions that primarily focus on learning

discriminative foreground representations over appearance and motion in

short-term temporal segments. The co-attention layers in our network provide

efficient and competent stages for capturing global correlations and scene

context by jointly computing and appending co-attention responses into a joint

feature space. We train COSNet with pairs of video frames, which naturally

augments training data and allows increased learning capacity. During the

segmentation stage, the co-attention model encodes useful information by

processing multiple reference frames together, which is leveraged to infer the

frequently reappearing and salient foreground objects better. We propose a

unified and end-to-end trainable framework where different co-attention

variants can be derived for mining the rich context within videos. Our

extensive experiments over three large benchmarks manifest that COSNet

outperforms the current alternatives by a large margin.

Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks

Wenguan Wang , Xiankai Lu , Jianbing Shen , David Crandall , Ling Shao

Comments: ICCV2019(Oral). Website: this https URL

Journal-ref: ICCV2019(Oral)

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

This work proposes a novel attentive graph neural network (AGNN) for

zero-shot video object segmentation (ZVOS). The suggested AGNN recasts this

task as a process of iterative information fusion over video graphs.

Specifically, AGNN builds a fully connected graph to efficiently represent

frames as nodes, and relations between arbitrary frame pairs as edges. The

underlying pair-wise relations are described by a differentiable attention

mechanism. Through parametric message passing, AGNN is able to efficiently

capture and mine much richer and higher-order relations between video frames,

thus enabling a more complete understanding of video content and more accurate

foreground estimation. Experimental results on three video segmentation

datasets show that AGNN sets a new state-of-the-art in each case. To further

demonstrate the generalizability of our framework, we extend AGNN to an

additional task: image object co-segmentation (IOCS). We perform experiments on

two famous IOCS datasets and observe again the superiority of our AGNN model.

The extensive experiments verify that AGNN is able to learn the underlying

semantic/appearance relationships among video frames or related images, and

discover the common objects.

Learning Compositional Neural Information Fusion for Human Parsing

Wenguan Wang , Zhijie Zhang , Siyuan Qi , Jianbing Shen , Yanwei Pang , Ling Shao

Comments: ICCV2019. Websie: this https URL

Journal-ref: ICCV2019

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

This work proposes to combine neural networks with the compositional

hierarchy of human bodies for efficient and complete human parsing. We

formulate the approach as a neural information fusion framework. Our model

assembles the information from three inference processes over the hierarchy:

direct inference (directly predicting each part of a human body using image

information), bottom-up inference (assembling knowledge from constituent

parts), and top-down inference (leveraging context from parent nodes). The

bottom-up and top-down inferences explicitly model the compositional and

decompositional relations in human bodies, respectively. In addition, the

fusion of multi-source information is conditioned on the inputs, i.e., by

estimating and considering the confidence of the sources. The whole model is

end-to-end differentiable, explicitly modeling information flows and

structures. Our approach is extensively evaluated on four popular datasets,

outperforming the state-of-the-arts in all cases, with a fast processing speed

of 23fps. Our code and results have been released to help ease future research

in this direction.

Image denoising via K-SVD with primal-dual active set algorithm

Quan Xiao , Canhong Wen , Zirui Yan

Comments: 9 pages, 6 figures. The paper was accepted by IEEE. WACV 2020 and will placed in the IEEE Xplore

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Machine Learning (stat.ML)

K-SVD algorithm has been successfully applied to image denoising tasks dozens

of years but the big bottleneck in speed and accuracy still needs attention to

break. For the sparse coding stage in K-SVD, which involves (ell_{0})

constraint, prevailing methods usually seek approximate solutions greedily but

are less effective once the noise level is high. The alternative (ell_{1})

optimization is proved to be powerful than (ell_{0}), however, the time

consumption prevents it from the implementation. In this paper, we propose a

new K-SVD framework called K-SVD(_P) by applying the Primal-dual active set

(PDAS) algorithm to it. Different from the greedy algorithms based K-SVD, the

K-SVD(_P) algorithm develops a selection strategy motivated by KKT

(Karush-Kuhn-Tucker) condition and yields to an efficient update in the sparse

coding stage. Since the K-SVD(_P) algorithm seeks for an equivalent solution to

the dual problem iteratively with simple explicit expression in this denoising

problem, speed and quality of denoising can be reached simultaneously.

Experiments are carried out and demonstrate the comparable denoising

performance of our K-SVD(_P) with state-of-the-art methods.

Towards More Efficient and Effective Inference: The Joint Decision of Multi-Participants

Hui Zhu , Zhulin An , Kaiqiang Xu , Xiaolong Hu , Yongjun Xu Subjects : Computer Vision and Pattern Recognition (cs.CV)

Existing approaches to improve the performances of convolutional neural

networks by optimizing the local architectures or deepening the networks tend

to increase the size of models significantly. In order to deploy and apply the

neural networks to edge devices which are in great demand, reducing the scale

of networks are quite crucial. However, It is easy to degrade the performance

of image processing by compressing the networks. In this paper, we propose a

method which is suitable for edge devices while improving the efficiency and

effectiveness of inference. The joint decision of multi-participants, mainly

contain multi-layers and multi-networks, can achieve higher classification

accuracy (0.26% on CIFAR-10 and 4.49% on CIFAR-100 at most) with similar total

number of parameters for classical convolutional neural networks.

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recogntion

Kaiyu Shan , Yongtao Wang , Zhuoying Wang , Tingting Liang , Zhi Tang , Ying Chen , Yangyan Li

Comments: Under Review

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

To efficiently extract spatiotemporal features of video for action

recognition, most state-of-the-art methods integrate 1D temporal convolution

into a conventional 2D CNN backbone. However, they all exploit 1D temporal

convolution of fixed kernel size (i.e., 3) in the network building block, thus

have suboptimal temporal modeling capability to handle both long-term and

short-term actions. To address this problem, we first investigate the impacts

of different kernel sizes for the 1D temporal convolutional filters. Then, we

propose a simple yet efficient operation called Mixed Temporal Convolution

(MixTConv), which consists of multiple depthwise 1D convolutional filters with

different kernel sizes. By plugging MixTConv into the conventional 2D CNN

backbone ResNet-50, we further propose an efficient and effective network

architecture named MSTNet for action recognition, and achieve state-of-the-art

results on multiple benchmarks.

NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection

Yazhao Li , Yanwei Pang , Jianbing Shen , Jiale Cao , Ling Shao

Comments: 10 pages, 8 figures

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Due to the advantages of real-time detection and improved performance,

single-shot detectors have gained great attention recently. To solve the

complex scale variations, single-shot detectors make scale-aware predictions

based on multiple pyramid layers. However, the features in the pyramid are not

scale-aware enough, which limits the detection performance. Two common problems

in single-shot detectors caused by object scale variations can be observed: (1)

small objects are easily missed; (2) the salient part of a large object is

sometimes detected as an object. With this observation, we propose a new

Neighbor Erasing and Transferring (NET) mechanism to reconfigure the pyramid

features and explore scale-aware features. In NET, a Neighbor Erasing Module

(NEM) is designed to erase the salient features of large objects and emphasize

the features of small objects in shallow layers. A Neighbor Transferring Module

(NTM) is introduced to transfer the erased features and highlight large objects

in deep layers. With this mechanism, a single-shot network called NETNet is

constructed for scale-aware object detection. In addition, we propose to

aggregate nearest neighboring pyramid features to enhance our NET. NETNet

achieves 38.5% AP at a speed of 27 FPS and 32.0% AP at a speed of 55 FPS on MS

COCO dataset. As a result, NETNet achieves a better trade-off for real-time and

accurate object detection.

Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video

Jie Wu , Guanbin Li , Si Liu , Liang Lin

Comments: To appear in AAAI2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Temporally language grounding in untrimmed videos is a newly-raised task in

video understanding. Most of the existing methods suffer from inferior

efficiency, lacking interpretability, and deviating from the human perception

mechanism. Inspired by human’s coarse-to-fine decision-making paradigm, we

formulate a novel Tree-Structured Policy based Progressive Reinforcement

Learning (TSP-PRL) framework to sequentially regulate the temporal boundary by

an iterative refinement process. The semantic concepts are explicitly

represented as the branches in the policy, which contributes to efficiently

decomposing complex policies into an interpretable primitive action.

Progressive reinforcement learning provides correct credit assignment via two

task-oriented rewards that encourage mutual promotion within the

tree-structured policy. We extensively evaluate TSP-PRL on the Charades-STA and

ActivityNet datasets, and experimental results show that TSP-PRL achieves

competitive performance over existing state-of-the-art methods.

ENAS U-Net: Evolutionary Neural Architecture Search for Retinal Vessel Segmentation

Zhun Fan , Jiahong Wei , Guijie Zhu , Jiajie Mo , Wenji Li Subjects : Computer Vision and Pattern Recognition (cs.CV)

The accurate retina vessel segmentation (RVS) is of great significance to

assist doctors in the diagnosis of ophthalmology diseases and other systemic

diseases, and manually designing a valid neural network architecture for

retinal vessel segmentation requires high expertise and a large workload. In

order to further improve the performance of vessel segmentation and reduce the

workload of manually designing neural network. We propose a specific search

space based on encoder-decoder framework and apply neural architecture search

(NAS) to retinal vessel segmentation. The search space is a macro-architecture

search that involves some operations and adjustments to the entire network

topology. For the architecture optimization, we adopt the modified evolutionary

strategy which can evolve with limited computing resource to evolve the

architectures. During the evolution, we select the elite architectures for the

next generation evolution based on their performances. After the evolution, the

searched model is evaluated on three mainstream datasets, namely DRIVE, STARE

and CHASE_DB1. The searched model achieves top performance on all three

datasets with fewer parameters (about 2.3M). Moreover, the results of

cross-training between above three datasets show that the searched model is

with considerable scalability, which indicates that the searched model is with

potential for clinical disease diagnosis.

Multi-View Photometric Stereo: A Robust Solution and Benchmark Dataset for Spatially Varying Isotropic Materials

Min Li , Zhenglong Zhou , Zhe Wu , Boxin Shi , Changyu Diao , Ping Tan Subjects : Computer Vision and Pattern Recognition (cs.CV)

We present a method to capture both 3D shape and spatially varying

reflectance with a multi-view photometric stereo (MVPS) technique that works

for general isotropic materials. Our algorithm is suitable for perspective

cameras and nearby point light sources. Our data capture setup is simple, which

consists of only a digital camera, some LED lights, and an optional automatic

turntable. From a single viewpoint, we use a set of photometric stereo images

to identify surface points with the same distance to the camera. We collect

this information from multiple viewpoints and combine it with

structure-from-motion to obtain a precise reconstruction of the complete 3D

shape. The spatially varying isotropic bidirectional reflectance distribution

function (BRDF) is captured by simultaneously inferring a set of basis BRDFs

and their mixing weights at each surface point. In experiments, we demonstrate

our algorithm with two different setups: a studio setup for highest precision

and a desktop setup for best usability. According to our experiments, under the

studio setting, the captured shapes are accurate to 0.5 millimeters and the

captured reflectance has a relative root-mean-square error (RMSE) of 9%. We

also quantitatively evaluate state-of-the-art MVPS on a newly collected

benchmark dataset, which is publicly available for inspiring future research.

Text-to-Image Generation with Attention Based Recurrent Neural Networks

Tehseen Zia , Shahan Arif , Shakeeb Murtaza , Mirza Ahsan Ullah Subjects : Computer Vision and Pattern Recognition (cs.CV)

Conditional image modeling based on textual descriptions is a relatively new

domain in unsupervised learning. Previous approaches use a latent variable

model and generative adversarial networks. While the formers are approximated

by using variational auto-encoders and rely on the intractable inference that

can hamper their performance, the latter is unstable to train due to Nash

equilibrium based objective function. We develop a tractable and stable

caption-based image generation model. The model uses an attention-based encoder

to learn word-to-pixel dependencies. A conditional autoregressive based decoder

is used for learning pixel-to-pixel dependencies and generating images.

Experimentations are performed on Microsoft COCO, and MNIST-with-captions

datasets and performance is evaluated by using the Structural Similarity Index.

Results show that the proposed model performs better than contemporary

approaches and generate better quality images. Keywords: Generative image

modeling, autoregressive image modeling, caption-based image generation, neural

attention, recurrent neural networks.

Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval

Anubha Pandey , Ashish Mishra , Vinay Kumar Verma , Anurag Mittal , Hema A. Murthy

Comments: Accepted in WACV’2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)

Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that

the data of all the classes are available during training. The assumption may

not always be practical since the data of a few classes may be unavailable, or

the classes may not appear at the time of training. Zero-Shot Sketch-Based

Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to

handle previously unseen classes during the test. This paper proposes a

generative approach based on the Stacked Adversarial Network (SAN) and the

advantage of Siamese Network (SN) for ZS-SBIR. While SAN generates a

high-quality sample, SN learns a better distance metric compared to that of the

nearest neighbor search. The capability of the generative model to synthesize

image features based on the sketch reduces the SBIR problem to that of an

image-to-image retrieval problem. We evaluate the efficacy of our proposed

approach on TU-Berlin, and Sketchy database in both standard ZSL and

generalized ZSL setting. The proposed method yields a significant improvement

in standard ZSL as well as in a more challenging generalized ZSL setting (GZSL)

for SBIR.

Deep Metric Structured Learning For Facial Expression Recognition

Pedro D. Marrero Fernandez , Tsang Ing Ren , Tsang Ing Jyh , Fidel A. Guerrero Peña , Alexandre Cunha Subjects : Computer Vision and Pattern Recognition (cs.CV)

We propose a deep metric learning model to create embedded sub-spaces with a

well defined structure. A new loss function that imposes Gaussian structures on

the output space is introduced to create these sub-spaces thus shaping the

distribution of the data. Having a mixture of Gaussians solution space is

advantageous given its simplified and well established structure. It allows

fast discovering of classes within classes and the identification of mean

representatives at the centroids of individual classes. We also propose a new

semi-supervised method to create sub-classes. We illustrate our methods on the

facial expression recognition problem and validate results on the FER+,

AffectNet, Extended Cohn-Kanade (CK+), BU-3DFE, and JAFFE datasets. We

experimentally demonstrate that the learned embedding can be successfully used

for various applications including expression retrieval and emotion

recognition.

A Foreground-background Parallel Compression with Residual Encoding for Surveillance Video

Lirong Wu , Kejie Huang , Haibin Shen , Lianli Gao Subjects : Computer Vision and Pattern Recognition (cs.CV)

The data storage has been one of the bottlenecks in surveillance systems. The

conventional video compression algorithms such as H.264 and H.265 do not fully

utilize the low information density characteristic of the surveillance video.

In this paper, we propose a video compression method that extracts and

compresses the foreground and background of the video separately. The

compression ratio is greatly improved by sharing background information among

multiple adjacent frames through an adaptive background updating and

interpolation module. Besides, we present two different schemes to compress the

foreground and compare their performance in the ablation study to show the

importance of temporal information for video compression. In the decoding end,

a coarse-to-fine two-stage module is applied to achieve the composition of the

foreground and background and the enhancements of frame quality. Furthermore,

an adaptive sampling method for surveillance cameras is proposed, and we have

shown its effects through software simulation. The experimental results show

that our proposed method requires 69.5% less bpp (bits per pixel) than the

conventional algorithm H.265 to achieve the same PSNR (36 dB) on the HECV

dataset.

A GAN-based Tunable Image Compression System

Lirong Wu , Kejie Huang , Haibin Shen Subjects : Computer Vision and Pattern Recognition (cs.CV)

The method of importance map has been widely adopted in DNN-based lossy image

compression to achieve bit allocation according to the importance of image

contents. However, insufficient allocation of bits in non-important regions

often leads to severe distortion at low bpp (bits per pixel), which hampers the

development of efficient content-weighted image compression systems. This paper

rethinks content-based compression by using Generative Adversarial Network

(GAN) to reconstruct the non-important regions. Moreover, multiscale pyramid

decomposition is applied to both the encoder and the discriminator to achieve

global compression of high-resolution images. A tunable compression scheme is

also proposed in this paper to compress an image to any specific compression

ratio without retraining the model. The experimental results show that our

proposed method improves MS-SSIM by more than 10.3% compared to the recently

reported GAN-based method to achieve the same low bpp (0.05) on the Kodak

dataset.

Harmonic Convolutional Networks based on Discrete Cosine Transform

Matej Ulicny , Vladimir A. Krylov , Rozenn Dahyot

Comments: arXiv admin note: substantial text overlap with arXiv:1812.03205

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Machine Learning (cs.LG)

Convolutional neural networks (CNNs) learn filters in order to capture local

correlation patterns in feature space. In this paper we propose to revert to

learning combinations of preset spectral filters by switching to CNNs with

harmonic blocks. We rely on the use of the Discrete Cosine Transform (DCT)

filters which have excellent energy compaction properties and are widely used

for image compression. The proposed harmonic blocks rely on DCT-modeling and

replace conventional convolutional layers to produce partially or fully

harmonic versions of new or existing CNN architectures. We demonstrate how the

harmonic networks can be efficiently compressed in a straightforward manner by

truncating high-frequency information in harmonic blocks which is possible due

to the redundancies in the spectral domain. We report extensive experimental

validation demonstrating the benefits of the introduction of harmonic blocks

into state-of-the-art CNN models in image classification, segmentation and edge

detection applications.

Media Forensics and DeepFakes: an overview

Luisa Verdoliva Subjects : Computer Vision and Pattern Recognition (cs.CV)

With the rapid progress of recent years, techniques that generate and

manipulate multimedia content can now guarantee a very advanced level of

realism. The boundary between real and synthetic media has become very thin. On

the one hand, this opens the door to a series of exciting applications in

different fields such as creative arts, advertising, film production, video

games. On the other hand, it poses enormous security threats. Software packages

freely available on the web allow any individual, without special skills, to

create very realistic fake images and videos. So-called deepfakes can be used

to manipulate public opinion during elections, commit fraud, discredit or

blackmail people. Potential abuses are limited only by human imagination.

Therefore, there is an urgent need for automated tools capable of detecting

false multimedia content and avoiding the spread of dangerous false

information. This review paper aims to present an analysis of the methods for

visual media integrity verification, that is, the detection of manipulated

images and videos. Special emphasis will be placed on the emerging phenomenon

of deepfakes and, from the point of view of the forensic analyst, on modern

data-driven forensic methods. The analysis will help to highlight the limits of

current forensic tools, the most relevant issues, the upcoming challenges, and

suggest future directions for research.

Adapting Grad-CAM for Embedding Networks

Lei Chen , Jianhui Chen , Hossein Hajimirsadeghi , Greg Mori

Comments: WACV 2020 camera ready

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

The gradient-weighted class activation mapping (Grad-CAM) method can

faithfully highlight important regions in images for deep model prediction in

image classification, image captioning and many other tasks. It uses the

gradients in back-propagation as weights (grad-weights) to explain network

decisions. However, applying Grad-CAM to embedding networks raises significant

challenges because embedding networks are trained by millions of dynamically

paired examples (e.g. triplets). To overcome these challenges, we propose an

adaptation of the Grad-CAM method for embedding networks. First, we aggregate

grad-weights from multiple training examples to improve the stability of

Grad-CAM. Then, we develop an efficient weight-transfer method to explain

decisions for any image without back-propagation. We extensively validate the

method on the standard CUB200 dataset in which our method produces more

accurate visual attention than the original Grad-CAM method. We also apply the

method to a house price estimation application using images. The method

produces convincing qualitative results, showcasing the practicality of our

approach.

Temporal Interlacing Network

Hao Shao , Shengju Qian , Yu Liu

Comments: Accepted to AAAI 2020. Winning entry of ICCV Multi-Moments in Time Challenge 2019. Code is available at this https URL

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

For a long time, the vision community tries to learn the spatio-temporal

representation by combining convolutional neural network together with various

temporal models, such as the families of Markov chain, optical flow, RNN and

temporal convolution. However, these pipelines consume enormous computing

resources due to the alternately learning process for spatial and temporal

information. One natural question is whether we can embed the temporal

information into the spatial one so the information in the two domains can be

jointly learned once-only. In this work, we answer this question by presenting

a simple yet powerful operator — temporal interlacing network (TIN). Instead

of learning the temporal features, TIN fuses the two kinds of information by

interlacing spatial representations from the past to the future, and vice

versa. A differentiable interlacing target can be learned to control the

interlacing process. In this way, a heavy temporal model is replaced by a

simple interlacing operator. We theoretically prove that with a learnable

interlacing target, TIN performs equivalently to the regularized temporal

convolution network (r-TCN), but gains 4% more accuracy with 6x less latency on

6 challenging benchmarks. These results push the state-of-the-art performances

of video understanding by a considerable margin. Not surprising, the ensemble

model of the proposed TIN won the (1^{st}) place in the ICCV19 – Multi Moments

in Time challenge. Code is made available to facilitate further research at

this https URL

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Kihyuk Sohn , David Berthelot , Chun-Liang Li , Zizhao Zhang , Nicholas Carlini , Ekin D. Cubuk , Alex Kurakin , Han Zhang , Colin Raffel Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Semi-supervised learning (SSL) provides an effective means of leveraging

unlabeled data to improve a model’s performance. In this paper, we demonstrate

the power of a simple combination of two common SSL methods: consistency

regularization and pseudo-labeling. Our algorithm, FixMatch, first generates

pseudo-labels using the model’s predictions on weakly-augmented unlabeled

images. For a given image, the pseudo-label is only retained if the model

produces a high-confidence prediction. The model is then trained to predict the

pseudo-label when fed a strongly-augmented version of the same image. Despite

its simplicity, we show that FixMatch achieves state-of-the-art performance

across a variety of standard semi-supervised learning benchmarks, including

94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 — just

4 labels per class. Since FixMatch bears many similarities to existing SSL

methods that achieve worse performance, we carry out an extensive ablation

study to tease apart the experimental factors that are most important to

FixMatch’s success. We make our code available at

this https URL .

Generate High-Resolution Adversarial Samples by Identifying Effective Features

Sizhe Chen , Peidong Zhang , Chengjin Sun , Jia Cai , Xiaolin Huang Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

As the prevalence of deep learning in computer vision, adversarial samples

that weaken the neural networks emerge in large numbers, revealing their

deep-rooted defects. Most adversarial attacks calculate an imperceptible

perturbation in image space to fool the DNNs. In this strategy, the

perturbation looks like noise and thus could be mitigated. Attacks in feature

space produce semantic perturbation, but they could only deal with low

resolution samples. The reason lies in the great number of coupled features to

express a high-resolution image. In this paper, we propose Attack by

Identifying Effective Features (AIEF), which learns different weights for

features to attack. Effective features, those with great weights, influence the

victim model much but distort the image little, and thus are more effective for

attack. By attacking mostly on them, AIEF produces high resolution adversarial

samples with acceptable distortions. We demonstrate the effectiveness of AIEF

by attacking on different tasks with different generative models.

batchboost: regularization for stabilizing training with resistance to underfitting & overfitting

Maciej A. Czyzewski

Comments: 6 pages; 5 figures

Subjects

:

Machine Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Overfitting & underfitting and stable training are an important challenges in

machine learning. Current approaches for these issues are mixup, SamplePairing

and BC learning. In our work, we state the hypothesis that mixing many images

together can be more effective than just two. Batchboost pipeline has three

stages: (a) pairing: method of selecting two samples. (b) mixing: how to create

a new one from two samples. (c) feeding: combining mixed samples with new ones

from dataset into batch (with ratio (gamma)). Note that sample that appears in

our batch propagates with subsequent iterations with less and less importance

until the end of training. Pairing stage calculates the error per sample, sorts

the samples and pairs with strategy: hardest with easiest one, than mixing

stage merges two samples using mixup, (x_1 + (1-lambda)x_2). Finally, feeding

stage combines new samples with mixed by ratio 1:1. Batchboost has 0.5-3%

better accuracy than the current state-of-the-art mixup regularization on

CIFAR-10 & Fashion-MNIST. Our method is slightly better than SamplePairing

technique on small datasets (up to 5%). Batchboost provides stable training on

not tuned parameters (like weight decay), thus its a good method to test

performance of different architectures. Source code is at:

this https URL

Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data

Kyoung-Woon On , Eun-Sol Kim , Yu-Jung Heo , Byoung-Tak Zhang

Comments: 8 pages, 3 figures, Association for the Advancement of Artificial Intelligence (AAAI2020). arXiv admin note: substantial text overlap with arXiv:1907.01709

Subjects

:

Machine Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Conventional sequential learning methods such as Recurrent Neural Networks

(RNNs) focus on interactions between consecutive inputs, i.e. first-order

Markovian dependency. However, most of sequential data, as seen with videos,

have complex dependency structures that imply variable-length semantic flows

and their compositions, and those are hard to be captured by conventional

methods. Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for

learning video data by discovering these complex structures of the video. The

CB-GLNs represent video data as a graph, with nodes and edges corresponding to

frames of the video and their dependencies respectively. The CB-GLNs find

compositional dependencies of the data in multilevel graph forms via a

parameterized kernel with graph-cut and a message passing framework. We

evaluate the proposed method on the two different tasks for video

understanding: Video theme classification (Youtube-8M dataset) and Video

Question and Answering (TVQA dataset). The experimental results show that our

model efficiently learns the semantic compositional structure of video data.

Furthermore, our model achieves the highest performance in comparison to other

baseline methods.

Motif Difference Field: A Simple and Effective Image Representation of Time Series for Classification

Yadong Zhang , Xin Chen Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Time series motifs play an important role in the time series analysis. The

motif-based time series clustering is used for the discovery of higher-order

patterns or structures in time series data. Inspired by the convolutional

neural network (CNN) classifier based on the image representations of time

series, motif difference field (MDF) is proposed. Compared to other image

representations of time series, MDF is simple and easy to construct. With the

Fully Convolution Network (FCN) as the classifier, MDF demonstrates the

superior performance on the UCR time series dataset in benchmark with other

time series classification methods. It is interesting to find that the triadic

time series motifs give the best result in the test. Due to the motif

clustering reflected in MDF, the significant motifs are detected with the help

of the Gradient-weighted Class Activation Mapping (Grad-CAM). The areas in MDF

with high weight in Grad-CAM have a high contribution from the significant

motifs with the desired ordinal patterns associated with the signature patterns

in time series. However, the signature patterns cannot be identified with the

neural network classifiers directly based on the time series.

Joint Learning of Instance and Semantic Segmentation for Robotic Pick-and-Place with Heavy Occlusions in Clutter

Kentaro Wada , Kei Okada , Masayuki Inaba

Comments: 7 pages, 13 figures, IEEE International Conference on Robotics and Automation (ICRA) 2019

Subjects

:

Robotics (cs.RO)

; Computer Vision and Pattern Recognition (cs.CV)

We present joint learning of instance and semantic segmentation for visible

and occluded region masks. Sharing the feature extractor with instance

occlusion segmentation, we introduce semantic occlusion segmentation into the

instance segmentation model. This joint learning fuses the instance- and

image-level reasoning of the mask prediction on the different segmentation

tasks, which was missing in the previous work of learning instance segmentation

only (instance-only). In the experiments, we evaluated the proposed joint

learning comparing the instance-only learning on the test dataset. We also

applied the joint learning model to 2 different types of robotic pick-and-place

tasks (random and target picking) and evaluated its effectiveness to achieve

real-world robotic tasks.

Breast lesion segmentation in ultrasound images with limited annotated data

Bahareh Behboodi , Mina Amiri , Rupert Brooks , Hassan Rivaz

Comments: Accepted to ISBI 2020

Subjects

:

Image and Video Processing (eess.IV)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Ultrasound (US) is one of the most commonly used imaging modalities in both

diagnosis and surgical interventions due to its low-cost, safety, and

non-invasive characteristic. US image segmentation is currently a unique

challenge because of the presence of speckle noise. As manual segmentation

requires considerable efforts and time, the development of automatic

segmentation algorithms has attracted researchers attention. Although recent

methodologies based on convolutional neural networks have shown promising

performances, their success relies on the availability of a large number of

training data, which is prohibitively difficult for many applications.

Therefore, in this study we propose the use of simulated US images and natural

images as auxiliary datasets in order to pre-train our segmentation network,

and then to fine-tune with limited in vivo data. We show that with as little as

19 in vivo images, fine-tuning the pre-trained network improves the dice score

by 21% compared to training from scratch. We also demonstrate that if the same

number of natural and simulation US images is available, pre-training on

simulation data is preferable.

Digital synthesis of histological stains using micro-structured and multiplexed virtual staining of label-free tissue

Yijie Zhang , Kevin de Haan , Yair Rivenson , Jingxi Li , Apostolos Delis , Aydogan Ozcan

Comments: 19 pages, 5 figures, 2 tables

Subjects

:

Image and Video Processing (eess.IV)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph); Quantitative Methods (q-bio.QM)

Histological staining is a vital step used to diagnose various diseases and

has been used for more than a century to provide contrast to tissue sections,

rendering the tissue constituents visible for microscopic analysis by medical

experts. However, this process is time-consuming, labor-intensive, expensive

and destructive to the specimen. Recently, the ability to virtually-stain

unlabeled tissue sections, entirely avoiding the histochemical staining step,

has been demonstrated using tissue-stain specific deep neural networks. Here,

we present a new deep learning-based framework which generates

virtually-stained images using label-free tissue, where different stains are

merged following a micro-structure map defined by the user. This approach uses

a single deep neural network that receives two different sources of information

at its input: (1) autofluorescence images of the label-free tissue sample, and

(2) a digital staining matrix which represents the desired microscopic map of

different stains to be virtually generated at the same tissue section. This

digital staining matrix is also used to virtually blend existing stains,

digitally synthesizing new histological stains. We trained and blindly tested

this virtual-staining network using unlabeled kidney tissue sections to

generate micro-structured combinations of Hematoxylin and Eosin (H&E), Jones

silver stain, and Masson’s Trichrome stain. Using a single network, this

approach multiplexes virtual staining of label-free tissue with multiple types

of stains and paves the way for synthesizing new digital histological stains

that can be created on the same tissue cross-section, which is currently not

feasible with standard histochemical staining methods.

Recommending Themes for Ad Creative Design via Visual-Linguistic Representations

Yichao Zhou , Shaunak Mishra , Manisha Verma , Narayan Bhamidipati , Wei Wang

Comments: 7 pages, 8 figures, 2 tables, accepted by The Web Conference 2020

Subjects

:

Computation and Language (cs.CL)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)

There is a perennial need in the online advertising industry to refresh ad

creatives, i.e., images and text used for enticing online users towards a

brand. Such refreshes are required to reduce the likelihood of ad fatigue among

online users, and to incorporate insights from other successful campaigns in

related product categories. Given a brand, to come up with themes for a new ad

is a painstaking and time consuming process for creative strategists.

Strategists typically draw inspiration from the images and text used for past

ad campaigns, as well as world knowledge on the brands. To automatically infer

ad themes via such multimodal sources of information in past ad campaigns, we

propose a theme (keyphrase) recommender system for ad creative strategists. The

theme recommender is based on aggregating results from a visual question

answering (VQA) task, which ingests the following: (i) ad images, (ii) text

associated with the ads as well as Wikipedia pages on the brands in the ads,

and (iii) questions around the ad. We leverage transformer based cross-modality

encoders to train visual-linguistic representations for our VQA task. We study

two formulations for the VQA task along the lines of classification and

ranking; via experiments on a public dataset, we show that cross-modal

representations lead to significantly better classification accuracy and

ranking precision-recall metrics. Cross-modal representations show better

performance compared to separate image and text representations. In addition,

the use of multimodal information shows a significant lift over using only

textual or visual information.

Learning Deformable Registration of Medical Images with Anatomical Constraints

Lucas Mansilla , Diego H. Milone , Enzo Ferrante

Comments: Accepted for publication in Neural Networks (Elsevier). Source code and resulting segmentation masks for the NIH Chest-XRay14 dataset with estimated quality index available at this https URL

Subjects

:

Image and Video Processing (eess.IV)

; Computer Vision and Pattern Recognition (cs.CV)

Deformable image registration is a fundamental problem in the field of

medical image analysis. During the last years, we have witnessed the advent of

deep learning-based image registration methods which achieve state-of-the-art

performance, and drastically reduce the required computational time. However,

little work has been done regarding how can we encourage our models to produce

not only accurate, but also anatomically plausible results, which is still an

open question in the field. In this work, we argue that incorporating

anatomical priors in the form of global constraints into the learning process

of these models, will further improve their performance and boost the realism

of the warped images after registration. We learn global non-linear

representations of image anatomy using segmentation masks, and employ them to

constraint the registration process. The proposed AC-RegNet architecture is

evaluated in the context of chest X-ray image registration using three

different datasets, where the high anatomical variability makes the task

extremely challenging. Our experiments show that the proposed anatomically

constrained registration model produces more realistic and accurate results

than state-of-the-art methods, demonstrating the potential of this approach.

A deep network for sinogram and CT image reconstruction

Wei Wang , Xiang-Gen Xia , Chuanjiang He , Zemin Ren , Jian Lu , Tianfu Wang , Baiying Lei Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

A CT image can be well reconstructed when the sampling rate of the sinogram

satisfies the Nyquist criteria and the sampled signal is noise-free. However,

in practice, the sinogram is usually contaminated by noise, which degrades the

quality of a reconstructed CT image. In this paper, we design a deep network

for sinogram and CT image reconstruction. The network consists of two cascaded

blocks that are linked by a filter backprojection (FBP) layer, where the former

block is responsible for denoising and completing the sinograms while the

latter is used to removing the noise and artifacts of the CT images.

Experimental results show that the reconstructed CT images by our methods have

the highest PSNR and SSIM in average compared to state of the art methods.

Deep Image Clustering with Tensor Kernels and Unsupervised Companion Objectives

Daniel J. Trosten , Michael C. Kampffmeyer , Robert Jenssen

Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems

Subjects

:

Machine Learning (stat.ML)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this paper we develop a new model for deep image clustering, using

convolutional neural networks and tensor kernels. The proposed Deep Tensor

Kernel Clustering (DTKC) consists of a convolutional neural network (CNN),

which is trained to reflect a common cluster structure at the output of its

intermediate layers. Encouraging a consistent cluster structure throughout the

network has the potential to guide it towards meaningful clusters, even though

these clusters might appear to be nonlinear in the input space. The cluster

structure is enforced through the idea of unsupervised companion objectives,

where separate loss functions are attached to layers in the network. These

unsupervised companion objectives are constructed based on a proposed

generalization of the Cauchy-Schwarz (CS) divergence, from vectors to tensors

of arbitrary rank. Generalizing the CS divergence to tensor-valued data is a

crucial step, due to the tensorial nature of the intermediate representations

in the CNN. Several experiments are conducted to thoroughly assess the

performance of the proposed DTKC model. The results indicate that the model

outperforms, or performs comparable to, a wide range of baseline algorithms. We

also empirically demonstrate that our model does not suffer from objective

function mismatch, which can be a problematic artifact in autoencoder-based

clustering models.

An Efficient Framework for Automated Screening of Clinically Significant Macular Edema

Renoh Johnson Chalakkal , Faizal Hafiz , Waleed Abdulla , Akshya Swain Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The present study proposes a new approach to automated screening of

Clinically Significant Macular Edema (CSME) and addresses two major challenges

associated with such screenings, i.e., exudate segmentation and imbalanced

datasets. The proposed approach replaces the conventional exudate segmentation

based feature extraction by combining a pre-trained deep neural network with

meta-heuristic feature selection. A feature space over-sampling technique is

being used to overcome the effects of skewed datasets and the screening is

accomplished by a k-NN based classifier. The role of each data-processing step

(e.g., class balancing, feature selection) and the effects of limiting the

region-of-interest to fovea on the classification performance are critically

analyzed. Finally, the selection and implication of operating point on Receiver

Operating Characteristic curve are discussed. The results of this study

convincingly demonstrate that by following these fundamental practices of

machine learning, a basic k-NN based classifier could effectively accomplish

the CSME screening.

Towards Augmented Reality-based Suturing in Monocular Laparoscopic Training

Chandrakanth Jayachandran Preetha , Jonathan Kloss , Fabian Siegfried Wehrtmann , Lalith Sharan , Carolyn Fan , Beat Peter Müller-Stich , Felix Nickel , Sandy Engelhardt

Comments: Accepted for SPIE Medical Imaging 2020

Subjects

:

Image and Video Processing (eess.IV)

; Computer Vision and Pattern Recognition (cs.CV)

Minimally Invasive Surgery (MIS) techniques have gained rapid popularity

among surgeons since they offer significant clinical benefits including reduced

recovery time and diminished post-operative adverse effects. However,

conventional endoscopic systems output monocular video which compromises depth

perception, spatial orientation and field of view. Suturing is one of the most

complex tasks performed under these circumstances. Key components of this tasks

are the interplay between needle holder and the surgical needle. Reliable 3D

localization of needle and instruments in real time could be used to augment

the scene with additional parameters that describe their quantitative geometric

relation, e.g. the relation between the estimated needle plane and its rotation

center and the instrument. This could contribute towards standardization and

training of basic skills and operative techniques, enhance overall surgical

performance, and reduce the risk of complications. The paper proposes an

Augmented Reality environment with quantitative and qualitative visual

representations to enhance laparoscopic training outcomes performed on a

silicone pad. This is enabled by a multi-task supervised deep neural network

which performs multi-class segmentation and depth map prediction. Scarcity of

labels has been conquered by creating a virtual environment which resembles the

surgical training scenario to generate dense depth maps and segmentation maps.

The proposed convolutional neural network was tested on real surgical training

scenarios and showed to be robust to occlusion of the needle. The network

achieves a dice score of 0.67 for surgical needle segmentation, 0.81 for needle

holder instrument segmentation and a mean absolute error of 6.5 mm for depth

estimation.

Gradient Surgery for Multi-Task Learning

Tianhe Yu , Saurabh Kumar , Abhishek Gupta , Sergey Levine , Karol Hausman , Chelsea Finn Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)

While deep learning and deep reinforcement learning (RL) systems have

demonstrated impressive results in domains such as image classification, game

playing, and robotic control, data efficiency remains a major challenge.

Multi-task learning has emerged as a promising approach for sharing structure

across multiple tasks to enable more efficient learning. However, the

multi-task setting presents a number of optimization challenges, making it

difficult to realize large efficiency gains compared to learning tasks

independently. The reasons why multi-task learning is so challenging compared

to single-task learning are not fully understood. In this work, we identify a

set of three conditions of the multi-task optimization landscape that cause

detrimental gradient interference, and develop a simple yet general approach

for avoiding such interference between task gradients. We propose a form of

gradient surgery that projects a task’s gradient onto the normal plane of the

gradient of any other task that has a conflicting gradient. On a series of

challenging multi-task supervised and multi-task RL problems, this approach

leads to substantial gains in efficiency and performance. Further, it is

model-agnostic and can be combined with previously-proposed multi-task

architectures for enhanced performance.

A Transfer Learning Approach to Cross-Modal Object Recognition: From Visual Observation to Robotic Haptic Exploration

Pietro Falco , Shuang Lu , Ciro Natale , Salvatore Pirozzi , Dongheui Lee

Journal-ref: IEEE Transactions on Robotics ( Volume: 35 , Issue: 4 , Aug. 2019

)

Subjects

:

Robotics (cs.RO)

; Computer Vision and Pattern Recognition (cs.CV)

In this work, we introduce the problem of cross-modal visuo-tactile object

recognition with robotic active exploration. With this term, we mean that the

robot observes a set of objects with visual perception and, later on, it is

able to recognize such objects only with tactile exploration, without having

touched any object before. Using a machine learning terminology, in our

application we have a visual training set and a tactile test set, or vice

versa. To tackle this problem, we propose an approach constituted by four

steps: finding a visuo-tactile common representation, defining a suitable set

of features, transferring the features across the domains, and classifying the

objects. We show the results of our approach using a set of 15 objects,

collecting 40 visual examples and five tactile examples for each object. The

proposed approach achieves an accuracy of 94.7%, which is comparable with the

accuracy of the monomodal case, i.e., when using visual data both as training

set and test set. Moreover, it performs well compared to the human ability,

which we have roughly estimated carrying out an experiment with ten

participants.

OIAD: One-for-all Image Anomaly Detection with Disentanglement Learning

Shuo Wang , Tianle Chen , Shangyu Chen , Carsten Rudolph , Surya Nepal , Marthie Grobler Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Machine Learning (stat.ML)

Anomaly detection aims to recognize samples with anomalous and unusual

patterns with respect to a set of normal data, which is significant for

numerous domain applications, e.g. in industrial inspection, medical imaging,

and security enforcement. There are two key research challenges associated with

existing anomaly detention approaches: (1) many of them perform well on

low-dimensional problems however the performance on high-dimensional instances

is limited, such as images; (2) many of them depend on often still rely on

traditional supervised approaches and manual engineering of features, while the

topic has not been fully explored yet using modern deep learning approaches,

even when the well-label samples are limited. In this paper, we propose a

One-for-all Image Anomaly Detection system (OIAD) based on disentangled

learning using only clean samples. Our key insight is that the impact of small

perturbation on the latent representation can be bounded for normal samples

while anomaly images are usually outside such bounded intervals, called

structure consistency. We implement this idea and evaluate its performance for

anomaly detention. Our experiments with three datasets show that OIAD can

detect over (90\%) of anomalies while maintaining a high low false alarm rate.

It can also detect suspicious samples from samples labeled as clean, coincided

with what humans would deem unusual.

Accelerating the Registration of Image Sequences by Spatio-temporal Multilevel Strategies

Hari Om Aggrawal , Jan Modersitzki

Comments: Accepted at ISBI 2020

Subjects

:

Signal Processing (eess.SP)

; Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Multilevel strategies are an integral part of many image registration

algorithms. These strategies are very well-known for avoiding undesirable local

minima, providing an outstanding initial guess, and reducing overall

computation time. State-of-the-art multilevel strategies build a hierarchy of

discretization in the spatial dimensions. In this paper, we present a

spatio-temporal strategy, where we introduce a hierarchical discretization in

the temporal dimension at each spatial level. This strategy is suitable for a

motion estimation problem where the motion is assumed smooth over time. Our

strategy exploits the temporal smoothness among image frames by following a

predictor-corrector approach. The strategy predicts the motion by a novel

interpolation method and later corrects it by registration. The prediction step

provides a good initial guess for the correction step, hence reduces the

overall computational time for registration. The acceleration is achieved by a

factor of 2.5 on average, over the state-of-the-art multilevel methods on three

examined optical coherence tomography datasets.

CHAOS Challenge — Combined (CT-MR) Healthy Abdominal Organ Segmentation

Segmentation of abdominal organs has been a comprehensive, yet unresolved,

research field for many years. In the last decade, intensive developments in

deep learning (DL) have introduced new state-of-the-art segmentation systems.

Despite outperforming the overall accuracy of existing systems, the effects of

DL model properties and parameters on the performance is hard to interpret.

This makes comparative analysis a necessary tool to achieve explainable studies

and systems. Moreover, the performance of DL for emerging learning approaches

such as cross-modality and multi-modal tasks have been rarely discussed. In

order to expand the knowledge in these topics, CHAOS — Combined (CT-MR)

Healthy Abdominal Organ Segmentation challenge has been organized in the IEEE

International Symposium on Biomedical Imaging (ISBI), 2019, in Venice, Italy.

Despite a large number of the previous abdomen related challenges, the majority

of which are focused on tumor/lesion detection and/or classification with a

single modality, CHAOS provides both abdominal CT and MR data from healthy

subjects. Five different and complementary tasks have been designed to analyze

the capabilities of the current approaches from multiple perspectives. The

results are investigated thoroughly, compared with manual annotations and

interactive methods. The outcomes are reported in detail to reflect the latest

advancements in the field. CHAOS challenge and data will be available online to

provide a continuous benchmark resource for segmentation.

Artificial Intelligence

Stochastic Finite State Control of POMDPs with LTL Specifications

Mohamadreza Ahmadi , Rangoli Sharan , Joel W. Burdick Subjects : Artificial Intelligence (cs.AI) ; Formal Languages and Automata Theory (cs.FL); Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)

Partially observable Markov decision processes (POMDPs) provide a modeling

framework for autonomous decision making under uncertainty and imperfect

sensing, e.g. robot manipulation and self-driving cars. However, optimal

control of POMDPs is notoriously intractable. This paper considers the

quantitative problem of synthesizing sub-optimal stochastic finite state

controllers (sFSCs) for POMDPs such that the probability of satisfying a set of

high-level specifications in terms of linear temporal logic (LTL) formulae is

maximized. We begin by casting the latter problem into an optimization and use

relaxations based on the Poisson equation and McCormick envelopes. Then, we

propose an stochastic bounded policy iteration algorithm, leading to a

controlled growth in sFSC size and an any time algorithm, where the performance

of the controller improves with successive iterations, but can be stopped by

the user based on time or memory considerations. We illustrate the proposed

method by a robot navigation case study.

Adequate and fair explanations

Nicholas Asher , Soumya Paul , Chris Russell Subjects : Artificial Intelligence (cs.AI)

Explaining sophisticated machine-learning based systems is an important issue

at the foundations of AI. Recent efforts have shown various methods for

providing explanations. These approaches can be broadly divided into two

schools: those that provide a local and human interpreatable approximation of a

machine learning algorithm, and logical approaches that exactly characterise

one aspect of the decision. In this paper we focus upon the second school of

exact explanations with a rigorous logical foundation. There is an

epistemological problem with these exact methods. While they can furnish

complete explanations, such explanations may be too complex for humans to

understand or even to write down in human readable form. Interpretability

requires epistemically accessible explanations, explanations humans can grasp.

Yet what is a sufficiently complete epistemically accessible explanation still

needs clarification. We do this here in terms of counterfactuals, following

[Wachter et al., 2017]. With counterfactual explanations, many of the

assumptions needed to provide a complete explanation are left implicit. To do

so, counterfactual explanations exploit the properties of a particular data

point or sample, and as such are also local as well as partial explanations. We

explore how to move from local partial explanations to what we call complete

local explanations and then to global ones. But to preserve accessibility we

argue for the need for partiality. This partiality makes it possible to hide

explicit biases present in the algorithm that may be injurious or unfair.We

investigate how easy it is to uncover these biases in providing complete and

fair explanations by exploiting the structure of the set of counterfactuals

providing a complete local explanation.

Implementations in Machine Ethics: A Survey

Suzanne Tolmeijer , Markus Kneer , Cristina Sarasua , Markus Christen , Abraham Bernstein

Comments: 37 pages, 7 tables, 4 figures, draft version

Subjects

:

Artificial Intelligence (cs.AI)

Increasingly complex and autonomous systems require machine ethics to

maximize the benefits and minimize the risks to society arising from the new

technology. It is challenging to decide which type of ethical theory to employ

and how to implement it effectively. This survey provides a threefold

contribution. Firstly, it introduces a taxonomy to analyze the field of machine

ethics from an ethical, implementational, and technical perspective. Secondly,

an exhaustive selection and description of relevant works is presented.

Thirdly, applying the new taxonomy to the selected works, dominant research

patterns and lessons for the field are identified, and future directions for

research are suggested.

Channels' Confirmation and Predictions' Confirmation: from the Medical Test to the Raven Paradox

Chenguang Lu

Comments: 12 tables, 7 figures

Subjects

:

Artificial Intelligence (cs.AI)

; Information Theory (cs.IT); Logic (math.LO)

After long arguments between positivism and falsificationism, the

verification of universal hypotheses was replaced with the confirmation of

uncertain major premises. Unfortunately, Hemple discovered the Raven Paradox

(RP). Then, Carnap used the logical probability increment as the confirmation

measure. So far, many confirmation measures have been proposed. Measure F among

them proposed by Kemeny and Oppenheim possesses symmetries and asymmetries

proposed by Elles and Fitelson, monotonicity proposed by Greco et al., and

normalizing property suggested by many researchers. Based on the semantic

information theory, a measure b* similar to F is derived from the medical test.

Like the likelihood ratio, b* and F can only indicate the quality of channels

or the testing means instead of the quality of probability predictions. And, it

is still not easy to use b*, F, or another measure to clarify the RP. For this

reason, measure c* similar to the correct rate is derived. The c* has the

simple form: (a-c)/max(a, c); it supports the Nicod Criterion and undermines

the Equivalence Condition, and hence, can be used to eliminate the RP. Some

examples are provided to show why it is difficult to use one of popular

confirmation measures to eliminate the RP. Measure F, b*, and c* indicate that

fewer counterexamples’ existence is more essential than more positive examples’

existence, and hence, are compatible with Popper’s falsification thought.

AI Trust in business processes: The need for process-aware explanations

Steve T.K. Jan , Vatche Ishakian , Vinod Muthusamy Subjects : Artificial Intelligence (cs.AI)

Business processes underpin a large number of enterprise operations including

processing loan applications, managing invoices, and insurance claims. There is

a large opportunity for infusing AI to reduce cost or provide better customer

experience, and the business process management (BPM) literature is rich in

machine learning solutions including unsupervised learning to gain insights on

clusters of process traces, classification models to predict the outcomes,

duration, or paths of partial process traces, extracting business process from

documents, and models to recommend how to optimize a business process or

navigate decision points. More recently, deep learning models including those

from the NLP domain have been applied to process predictions.

Unfortunately, very little of these innovations have been applied and adopted

by enterprise companies. We assert that a large reason for the lack of adoption

of AI models in BPM is that business users are risk-averse and do not

implicitly trust AI models. There has, unfortunately, been little attention

paid to explaining model predictions to business users with process context. We

challenge the BPM community to build on the AI interpretability literature, and

the AI Trust community to understand

Combining Federated and Active Learning for Communication-efficient Distributed Failure Prediction in Aeronautics

Nicolas Aussel (INF, ACMES-SAMOVAR, IP Paris), Sophie Chabridon (IP Paris, INF, ACMES-SAMOVAR), Yohan Petetin (TIPIC-SAMOVAR, CITI, IP Paris) Subjects : Artificial Intelligence (cs.AI) ; Distributed, Parallel, and Cluster Computing (cs.DC)

Machine Learning has proven useful in the recent years as a way to achieve

failure prediction for industrial systems. However, the high computational

resources necessary to run learning algorithms are an obstacle to its

widespread application. The sub-field of Distributed Learning offers a solution

to this problem by enabling the use of remote resources but at the expense of

introducing communication costs in the application that are not always

acceptable. In this paper, we propose a distributed learning approach able to

optimize the use of computational and communication resources to achieve

excellent learning model performances through a centralized architecture. To

achieve this, we present a new centralized distributed learning algorithm that

relies on the learning paradigms of Active Learning and Federated Learning to

offer a communication-efficient method that offers guarantees of model

precision on both the clients and the central server. We evaluate this method

on a public benchmark and show that its performances in terms of precision are

very close to state-of-the-art performance level of non-distributed learning

despite additional constraints.

A multi-agent ontologies-based clinical decision support system

Ying Shen (UPN), Jacquet-Andrieu Armelle , Joël Colloc (IDEES)

Comments: in French

Journal-ref: AMINA’2012, Jan 2012, Mahdia, Tunisie

Subjects

:

Artificial Intelligence (cs.AI)

Clinical decision support systems combine knowledge and data from a variety

of sources, represented by quantitative models based on stochastic methods, or

qualitative based rather on expert heuristics and deductive reasoning. At the

same time, case-based reasoning (CBR) memorizes and returns the experience of

solving similar problems. The cooperation of heterogeneous clinical knowledge

bases (knowledge objects, semantic distances, evaluation functions, logical

rules, databases…) is based on medical ontologies. A multi-agent decision

support system (MADSS) enables the integration and cooperation of agents

specialized in different fields of knowledge (semiology, pharmacology, clinical

cases, etc.). Each specialist agent operates a knowledge base defining the

conduct to be maintained in conformity with the state of the art associated

with an ontological basis that expresses the semantic relationships between the

terms of the domain in question. Our approach is based on the specialization of

agents adapted to the knowledge models used during the clinical steps and

ontologies. This modular approach is suitable for the realization of MADSS in

many areas.

On Algorithmic Decision Procedures in Emergency Response Systems in Smart and Connected Communities

Geoffrey Pettet , Ayan Mukhopadhyay , Mykel Kochenderfer , Yevgeniy Vorobeychik , Abhishek Dubey

Comments: Accepted at AAMAS 2020 (International Conference on Autonomous Agents and Multiagent Systems)

Subjects

:

Artificial Intelligence (cs.AI)

Emergency Response Management (ERM) is a critical problem faced by

communities across the globe. Despite its importance, it is common for ERM

systems to follow myopic and straight-forward decision policies in the real

world. Principled approaches to aid decision-making under uncertainty have been

explored in this context but have failed to be accepted into real systems. We

identify a key issue impeding their adoption – algorithmic approaches to

emergency response focus on reactive, post-incident dispatching actions, i.e.

optimally dispatching a responder after incidents occur. However, the critical

nature of emergency response dictates that when an incident occurs, first

responders always dispatch the closest available responder to the incident. We

argue that the crucial period of planning for ERM systems is not post-incident,

but between incidents. However, this is not a trivial planning problem – a

major challenge with dynamically balancing the spatial distribution of

responders is the complexity of the problem. An orthogonal problem in ERM

systems is to plan under limited communication, which is particularly important

in disaster scenarios that affect communication networks. We address both the

problems by proposing two partially decentralized multi-agent planning

algorithms that utilize heuristics and the structure of the dispatch problem.

We evaluate our proposed approach using real-world data, and find that in

several contexts, dynamic re-balancing the spatial distribution of emergency

responders reduces both the average response time as well as its variance.

Sampling and Learning for Boolean Function

Chuyu Xiong Subjects : Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)

In this article, we continue our study on universal learning machine by

introducing new tools. We first discuss boolean function and boolean circuit,

and we establish one set of tools, namely, fitting extremum and proper sampling

set. We proved the fundamental relationship between proper sampling set and

complexity of boolean circuit. Armed with this set of tools, we then introduce

much more effective learning strategies. We show that with such learning

strategies and learning dynamics, universal learning can be achieved, and

requires much less data.

AutoMATES: Automated Model Assembly from Text, Equations, and Software

Adarsh Pyarelal , Marco A. Valenzuela-Escarcega , Rebecca Sharp , Paul D. Hein , Jon Stephens , Pratik Bhandari , HeuiChan Lim , Saumya Debray , Clayton T. Morrison

Comments: 8 pages, 6 figures, accepted to Modeling the World’s Systems 2019

Subjects

:

Artificial Intelligence (cs.AI)

; Multimedia (cs.MM); Software Engineering (cs.SE)

Models of complicated systems can be represented in different ways – in

scientific papers, they are represented using natural language text as well as

equations. But to be of real use, they must also be implemented as software,

thus making code a third form of representing models. We introduce the

AutoMATES project, which aims to build semantically-rich unified

representations of models from scientific code and publications to facilitate

the integration of computational models from different domains and allow for

modeling large, complicated systems that span multiple domains and levels of

abstraction.

Towards Social Identity in Socio-Cognitive Agents

Diogo Rato , Samuel Mascarenhas , Rui Prada Subjects : Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)

Current architectures for social agents are designed around some specific

units of social behaviour that address particular challenges. Although their

performance might be adequate for controlled environments, deploying these

agents in the wild is difficult. Moreover, the increasing demand for autonomous

agents capable of living alongside humans calls for the design of more robust

social agents that can cope with diverse social situations. We believe that to

design such agents, their sociality and cognition should be conceived as one.

This includes creating mechanisms for constructing social reality as an

interpretation of the physical world with social meanings and selective

deployment of cognitive resources adequate to the situation. We identify

several design principles that should be considered while designing agent

architectures for socio-cognitive systems. Taking these remarks into account,

we propose a socio-cognitive agent model based on the concept of Cognitive

Social Frames that allow the adaptation of an agent’s cognition based on its

interpretation of its surroundings, its Social Context. Our approach supports

an agent’s reasoning about other social actors and its relationship with them.

Cognitive Social Frames can be built around social groups, and form the basis

for social group dynamics mechanisms and construct of Social Identity.

The Incentives that Shape Behaviour

Ryan Carey , Eric Langlois , Tom Everitt , Shane Legg

Comments: 12 pages, 7 figures, accepted to SafeAI workshop at AAAI

Subjects

:

Artificial Intelligence (cs.AI)

; Machine Learning (cs.LG)

Which variables does an agent have an incentive to control with its decision,

and which variables does it have an incentive to respond to? We formalise these

incentives, and demonstrate unique graphical criteria for detecting them in any

single decision causal influence diagram. To this end, we introduce structural

causal influence models, a hybrid of the influence diagram and structural

causal model frameworks. Finally, we illustrate how these incentives predict

agent incentives in both fairness and AI safety applications.

Measuring Diversity of Artificial Intelligence Conferences

Ana Freire , Lorenzo Porcaro , Emilia Gómez Subjects : Artificial Intelligence (cs.AI)

The lack of diversity of the Artificial Intelligence (AI) field is nowadays a

concern, and several initiatives such as funding schemes and mentoring programs

have been designed to fight against it. However, there is no indication on how

these initiatives actually impact AI diversity in the short and long term. This

work studies the concept of diversity in this particular context and proposes a

small set of diversity indicators (i.e. indexes) of AI scientific events. These

indicators are designed to quantify the lack of diversity of the AI field and

monitor its evolution. We consider diversity in terms of gender, geographical

location and business (understood as the presence of academia versus industry).

We compute these indicators for the different communities of a conference:

authors, keynote speakers and organizing committee. From these components we

compute a summarized diversity indicator for each AI event. We evaluate the

proposed indexes for a set of recent major AI conferences and we discuss their

values and limitations.

MOEA/D with Random Partial Update Strategy

Yuri Lavinas , Claus Aranha , Marcelo Ladeira , Felipe Campelo Subjects : Artificial Intelligence (cs.AI)

Recent studies on resource allocation suggest that some subproblems are more

important than others in the context of the MOEA/D, and that focusing on the

most relevant ones can consistently improve the performance of that algorithm.

These studies share the common characteristic of updating only a fraction of

the population at any given iteration of the algorithm. In this work we

investigate a new, simpler partial update strategy, in which a random subset of

solutions is selected at every iteration. The performance of the MOEA/D using

this new resource allocation approach is compared experimentally against that

of the standard MOEA/D-DE and the MOEA/D with relative improvement-based

resource allocation. The results indicate that using the MOEA/D with this new

partial update strategy results in improved HV and IGD values, and a much

higher proportion of non-dominated solutions, particularly as the number of

updated solutions at every iteration is reduced.

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

Amit Kumar Mondal , Nadeem Jamali Subjects : Artificial Intelligence (cs.AI)

Reinforcement learning is one of the core components in designing an

artificial intelligent system emphasizing real-time response. Reinforcement

learning influences the system to take actions within an arbitrary environment

either having previous knowledge about the environment model or not. In this

paper, we present a comprehensive study on Reinforcement Learning focusing on

various dimensions including challenges, the recent development of different

state-of-the-art techniques, and future directions. The fundamental objective

of this paper is to provide a framework for the presentation of available

methods of reinforcement learning that is informative enough and simple to

follow for the new researchers and academics in this domain considering the

latest concerns. First, we illustrated the core techniques of reinforcement

learning in an easily understandable and comparable way. Finally, we analyzed

and depicted the recent developments in reinforcement learning approaches. My

analysis pointed out that most of the models focused on tuning policy values

rather than tuning other things in a particular state of reasoning.

Correcting Knowledge Base Assertions

Jiaoyan Chen , Xi Chen , Ian Horrocks , Ernesto Jimenez-Ruiz , Erik B. Myklebus

Comments: Accepted by The Web Conference (WWW) 2020

Subjects

:

Artificial Intelligence (cs.AI)

The usefulness and usability of knowledge bases (KBs) is often limited by

quality issues. One common issue is the presence of erroneous assertions, often

caused by lexical or semantic confusion. We study the problem of correcting

such assertions, and present a general correction framework which combines

lexical matching, semantic embedding, soft constraint mining and semantic

consistency checking. The framework is evaluated using DBpedia and an

enterprise medical KB.

FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using Human Feedback

Baicen Xiao , Qifan Lu , Bhaskar Ramasubramanian , Andrew Clark , Linda Bushnell , Radha Poovendran

Comments: Accepted as Full Paper to International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2020

Subjects

:

Artificial Intelligence (cs.AI)

Reinforcement learning has been successful in training autonomous agents to

accomplish goals in complex environments. Although this has been adapted to

multiple settings, including robotics and computer games, human players often

find it easier to obtain higher rewards in some environments than reinforcement

learning algorithms. This is especially true of high-dimensional state spaces

where the reward obtained by the agent is sparse or extremely delayed. In this

paper, we seek to effectively integrate feedback signals supplied by a human

operator with deep reinforcement learning algorithms in high-dimensional state

spaces. We call this FRESH (Feedback-based REward SHaping). During training, a

human operator is presented with trajectories from a replay buffer and then

provides feedback on states and actions in the trajectory. In order to

generalize feedback signals provided by the human operator to previously unseen

states and actions at test-time, we use a feedback neural network. We use an

ensemble of neural networks with a shared network architecture to represent

model uncertainty and the confidence of the neural network in its output. The

output of the feedback neural network is converted to a shaping reward that is

augmented to the reward provided by the environment. We evaluate our approach

on the Bowling and Skiing Atari games in the arcade learning environment.

Although human experts have been able to achieve high scores in these

environments, state-of-the-art deep learning algorithms perform poorly. We

observe that FRESH is able to achieve much higher scores than state-of-the-art

deep learning algorithms in both environments. FRESH also achieves a 21.4%

higher score than a human expert in Bowling and does as well as a human expert

in Skiing.

Fair Transfer of Multiple Style Attributes in Text

Karan Dabas , Nishtha Madan , Vijay Arya , Sameep Mehta , Gautam Singh , Tanmoy Chakraborty Subjects : Artificial Intelligence (cs.AI)

To preserve anonymity and obfuscate their identity on online platforms users

may morph their text and portray themselves as a different gender or

demographic. Similarly, a chatbot may need to customize its communication style

to improve engagement with its audience. This manner of changing the style of

written text has gained significant attention in recent years. Yet these past

research works largely cater to the transfer of single style attributes. The

disadvantage of focusing on a single style alone is that this often results in

target text where other existing style attributes behave unpredictably or are

unfairly dominated by the new style. To counteract this behavior, it would be

nice to have a style transfer mechanism that can transfer or control multiple

styles simultaneously and fairly. Through such an approach, one could obtain

obfuscated or written text incorporated with a desired degree of multiple soft

styles such as female-quality, politeness, or formalness.

In this work, we demonstrate that the transfer of multiple styles cannot be

achieved by sequentially performing multiple single-style transfers. This is

because each single style-transfer step often reverses or dominates over the

style incorporated by a previous transfer step. We then propose a neural

network architecture for fairly transferring multiple style attributes in a

given text. We test our architecture on the Yelp data set to demonstrate our

superior performance as compared to existing one-style transfer steps performed

in a sequence.

Intelligent Bandwidth Allocation for Latency Management in NG-EPON using Reinforcement Learning Methods

Qi Zhou , Jingjie Zhu , Junwen Zhang , Zhensheng Jia , Bernardo Huberman , Gee-Kung Chang Subjects : Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)

A novel intelligent bandwidth allocation scheme in NG-EPON using

reinforcement learning is proposed and demonstrated for latency management. We

verify the capability of the proposed scheme under both fixed and dynamic

traffic loads scenarios to achieve <1ms average latency. The RL agent

demonstrates an efficient intelligent mechanism to manage the latency, which

provides a promising IBA solution for the next-generation access network.

Deceptive AI Explanations: Creation and Detection

Johannes Schneider , Joshua Handali , Michalis Vlachos , Christian Meske Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Artificial intelligence comes with great opportunities and but also great

risks. We investigate to what extent deep learning can be used to create and

detect deceptive explanations that either aim to lure a human into believing a

decision that is not truthful to the model or provide reasoning that is

non-faithful to the decision. Our theoretical insights show some limits of

deception and detection in the absence of domain knowledge. For empirical

evaluation, we focus on text classification. To create deceptive explanations,

we alter explanations originating from GradCAM, a state-of-art technique for

creating explanations in neural networks. We evaluate the effectiveness of

deceptive explanations on 200 participants. Our findings indicate that

deceptive explanations can indeed fool humans. Our classifier can detect even

seemingly minor attempts of deception with accuracy that exceeds 80\% given

sufficient domain knowledge encoded in the form of training data.

Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning

Stefan Ultes

Comments: Published at SIGDIAL 2019

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Learning suitable and well-performing dialogue behaviour in statistical

spoken dialogue systems has been in the focus of research for many years. While

most work which is based on reinforcement learning employs an objective measure

like task success for modelling the reward signal, we use a reward based on

user satisfaction estimation. We propose a novel estimator and show that it

outperforms all previous estimators while learning temporal dependencies

implicitly. Furthermore, we apply this novel user satisfaction estimation model

live in simulated experiments where the satisfaction estimation model is

trained on one domain and applied in many other domains which cover a similar

task. We show that applying this model results in higher estimated

satisfaction, similar task success rates and a higher robustness to noise.

Turing analogues of Gödel statements and computability of intelligence

Yasha Savelyev

Comments: This is rewrite from a completely mathematical viewpoint of arXiv:1810.06985 , which is to be withdrawn

Subjects

:

Logic in Computer Science (cs.LO)

; Artificial Intelligence (cs.AI)

We show that there is a mathematical obstruction to complete Turing

computability of intelligence. This obstruction can be circumvented only if

human reasoning is fundamentally unsound. The most compelling original argument

for existence of such an obstruction was proposed by Penrose, however Gödel,

Turing and Lucas have also proposed such arguments. We first partially

reformulate the argument of Penrose. In this formulation we argue that his

argument works up to possibility of construction of a certain Gödel

statement. We then completely re-frame the argument in the language of Turing

machines, and by partially defining our subject just enough, we show that a

certain analogue of a Gödel statement, or a Gödel string as we call it in

the language of Turing machines, can be readily constructed directly, without

appeal to the Gödel incompleteness theorem, and thus removing the final

objection.

Provenance for the Description Logic ELHr

Camille Bourgaux , Ana Ozaki , Rafael Peñaloza , Livia Predoiu

Comments: 24 pages

Subjects

:

Logic in Computer Science (cs.LO)

; Artificial Intelligence (cs.AI)

We address the problem of handling provenance information in ELHr ontologies.

We consider a setting recently introduced for ontology-based data access, based

on semirings and extending classical data provenance, in which ontology axioms

are annotated with provenance tokens. A consequence inherits the provenance of

the axioms involved in deriving it, yielding a provenance polynomial as an

annotation. We analyse the semantics for the ELHr case and show that the

presence of conjunctions poses various difficulties for handling provenance,

some of which are mitigated by assuming multiplicative idempotency of the

semiring. Under this assumption, we study three problems: ontology completion

with provenance, computing the set of relevant axioms for a consequence, and

query answering.

Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping

Eugenio Bargiacchi , Timothy Verstraeten , Diederik M. Roijers , Ann Nowé Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

We present a new model-based reinforcement learning algorithm, Cooperative

Prioritized Sweeping, for efficient learning in multi-agent Markov decision

processes. The algorithm allows for sample-efficient learning on large problems

by exploiting a factorization to approximate the value function. Our approach

only requires knowledge about the structure of the problem in the form of a

dynamic decision network. Using this information, our method learns a model of

the environment and performs temporal difference updates which affect multiple

joint states and actions at once. Batch updates are additionally performed

which efficiently back-propagate knowledge throughout the factored Q-function.

Our method outperforms the state-of-the-art algorithm sparse cooperative

Q-learning algorithm, both on the well-known SysAdmin benchmark and randomized

environments.

Domain-Aware Dialogue State Tracker for Multi-Domain Dialogue Systems

Vevake Balaraman , Bernardo Magnini Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)

In task-oriented dialogue systems the dialogue state tracker (DST) component

is responsible for predicting the state of the dialogue based on the dialogue

history. Current DST approaches rely on a predefined domain ontology, a fact

that limits their effective usage for large scale conversational agents, where

the DST constantly needs to be interfaced with ever-increasing services and

APIs. Focused towards overcoming this drawback, we propose a domain-aware

dialogue state tracker, that is completely data-driven and it is modeled to

predict for dynamic service schemas. The proposed model utilizes domain and

slot information to extract both domain and slot specific representations for a

given dialogue, and then uses such representations to predict the values of the

corresponding slot. Integrating this mechanism with a pretrained language model

(i.e. BERT), our approach can effectively learn semantic relations.

Node Masking: Making Graph Neural Networks Generalize and Scale Better

Pushkar Mishra , Aleksandra Piktus , Gerard Goossen , Fabrizio Silvestri Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Graph Neural Networks (GNNs) have received a lot of interest in the recent

times. From the early spectral architectures that could only operate on

undirected graphs per a transductive learning paradigm to the current state of

the art spatial ones that can apply inductively to arbitrary graphs, GNNs have

seen significant contributions from the research community. In this paper, we

discuss some theoretical tools to better visualize the operations performed by

state of the art spatial GNNs. We analyze the inner workings of these

architectures and introduce a simple concept, node masking, that allows them to

generalize and scale better. To empirically validate the theory, we perform

several experiments on two widely used benchmark datasets for node

classification in both transductive and inductive settings.

Engineering AI Systems: A Research Agenda

Jan Bosch , Ivica Crnkovic , Helena Holmström Olsson

Comments: 13 pages, 3 figures, highlights section

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Deploying machine-, and in particular deep-learning, (ML/DL) solutions in

industry-strength, production quality contexts proves to challenging. This

requires a structured engineering approach to constructing and evolving systems

that contain ML/DL components. In this paper, we provide a conceptualization of

the typical evolution patterns that companies experience when employing ML/DL

well as a framework for integrating ML/DL components in systems consisting of

multiple types of components. In addition, we provide an overview of the

engineering challenges surrounding AI/ML/DL solutions and, based on that, we

provide a research agenda and overview of open items that need to be addressed

by the research community at large.

Unsupervisedly Learned Representations: Should the Quest be Over?

Daniel N. Nissani (Nissensohn)

Comments: submitted to Pattern Recognition Letters

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI)

There exists a Classification accuracy gap of about 20% between our best

methods of generating Unsupervisedly Learned Representations and the accuracy

rates achieved by (naturally Unsupervisedly Learning) humans. We are at our

fourth decade at least in search of this class of paradigms. It thus may well

be that we are looking in the wrong direction. We present in this paper a

possible solution to this puzzle. We demonstrate that Reinforcement Learning

schemes can learn representations, which may be used for Pattern Recognition

tasks such as Classification, achieving practically the same accuracy as that

of humans. Our main modest contribution lies in the observations that: a. when

applied to a real world environment (e.g. nature itself) Reinforcement Learning

does not require labels, and thus may be considered a natural candidate for the

long sought, accuracy competitive Unsupervised Learning method, and b. in

contrast, when Reinforcement Learning is applied in a simulated or symbolic

processing environment (e.g. a computer program) it does inherently require

labels and should thus be generally classified, with some exceptions, as

Supervised Learning. The corollary of these observations is that further search

for Unsupervised Learning competitive paradigms which may be trained in

simulated environments like many of those found in research and applications

may be futile.

Fast Sequence-Based Embedding with Diffusion Graphs

Benedek Rozemberczki , Rik Sarkar

Comments: Source code available at: this https URL

Journal-ref: CompleNet 2018

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

A graph embedding is a representation of graph vertices in a low-dimensional

space, which approximately preserves properties such as distances between

nodes. Vertex sequence-based embedding procedures use features extracted from

linear sequences of nodes to create embeddings using a neural network. In this

paper, we propose diffusion graphs as a method to rapidly generate vertex

sequences for network embedding. Its computational efficiency is superior to

previous methods due to simpler sequence generation, and it produces more

accurate results. In experiments, we found that the performance relative to

other methods improves with increasing edge density in the graph. In a

community detection task, clustering nodes in the embedding space produces

better results compared to other sequence-based embedding methods.

Designing for the Long Tail of Machine Learning

Martin Lindvall , Jesper Molin

Comments: Accepted for presentation in poster format for the ACM CHI’19 Workshop &lt;Emerging Perspectives in Human-Centered Machine Learning&gt;

Subjects

:

Human-Computer Interaction (cs.HC)

; Artificial Intelligence (cs.AI)

Recent technical advances has made machine learning (ML) a promising

component to include in end user facing systems. However, user experience (UX)

practitioners face challenges in relating ML to existing user-centered design

processes and how to navigate the possibilities and constraints of this design

space. Drawing on our own experience, we characterize designing within this

space as navigating trade-offs between data gathering, model development and

designing valuable interactions for a given model performance. We suggest that

the theoretical description of how machine learning performance scales with

training data can guide designers in these trade-offs as well as having

implications for prototyping. We exemplify the learning curve’s usage by

arguing that a useful pattern is to design an initial system in a bootstrap

phase that aims to exploit the training effect of data collected at increasing

orders of magnitude.

Learning Diverse Features with Part-Level Resolution for Person Re-Identification

Ben Xie , Xiaofu Wu , Suofei Zhang , Shiliang Zhao , Ming Li

Comments: 8 pages, 5 figures, submitted to IEEE TCSVT

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Artificial Intelligence (cs.AI)

Learning diverse features is key to the success of person re-identification.

Various part-based methods have been extensively proposed for learning local

representations, which, however, are still inferior to the best-performing

methods for person re-identification. This paper proposes to construct a strong

lightweight network architecture, termed PLR-OSNet, based on the idea of

Part-Level feature Resolution over the Omni-Scale Network (OSNet) for achieving

feature diversity. The proposed PLR-OSNet has two branches, one branch for

global feature representation and the other branch for local feature

representation. The local branch employs a uniform partition strategy for

part-level feature resolution but produces only a single identity-prediction

loss, which is in sharp contrast to the existing part-based methods. Empirical

evidence demonstrates that the proposed PLR-OSNet achieves state-of-the-art

performance on popular person Re-ID datasets, including Market1501,

DukeMTMC-reID and CUHK03, despite its small model size.

Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach

Carlos Fernandez , Foster Provost , Xintian Han Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Lack of understanding of the decisions made by model-based AI systems is an

important barrier for their adoption. We examine counterfactual explanations as

an alternative for explaining AI decisions. The counterfactual approach defines

an explanation as a set of the system’s data inputs that causally drives the

decision (meaning that removing them changes the decision) and is irreducible

(meaning that removing any subset of the inputs in the explanation does not

change the decision). We generalize previous work on counterfactual

explanations, resulting in a framework that (a) is model-agnostic, (b) can

address features with arbitrary data types, (c) is able explain decisions made

by complex AI systems that incorporate multiple models, and (d) is scalable to

large numbers of features. We also propose a heuristic procedure to find the

most useful explanations depending on the context. We contrast counterfactual

explanations with another alternative: methods that explain model predictions

by weighting features according to their importance (e.g., SHAP, LIME). This

paper presents two fundamental reasons why explaining model predictions is not

the same as explaining the decisions made using those predictions, suggesting

we should carefully consider whether importance-weight explanations are

well-suited to explain decisions made by AI systems. Specifically, we show that

(1) features that have a large importance weight for a model prediction may not

actually affect the corresponding decision, and (2) importance weights are

insufficient to communicate whether and how features influence system

decisions. We demonstrate this using several examples, including three detailed

studies using real-world data that compare the counterfactual approach with

SHAP and illustrate various conditions under which counterfactual explanations

explain data-driven decisions better than feature importance weights.

Lyceum: An efficient and scalable ecosystem for robot learning

Colin Summers , Kendall Lowrey , Aravind Rajeswaran , Siddhartha Srinivasa , Emanuel Todorov Subjects : Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)

We introduce Lyceum, a high-performance computational ecosystem for robot

learning. Lyceum is built on top of the Julia programming language and the

MuJoCo physics simulator, combining the ease-of-use of a high-level programming

language with the performance of native C. In addition, Lyceum has a

straightforward API to support parallel computation across multiple cores and

machines. Overall, depending on the complexity of the environment, Lyceum is

5-30x faster compared to other popular abstractions like OpenAI’s Gym and

DeepMind’s dm-control. This substantially reduces training time for various

reinforcement learning algorithms; and is also fast enough to support real-time

model predictive control through MuJoCo. The code, tutorials, and demonstration

videos can be found at: this http URL .

Dynamic Epistemic Logic Games with Epistemic Temporal Goals

Bastien Maubert , Aniello Murano , Sophie Pinchinat , François Schwarzentruber , Silvia Stranieri Subjects : Logic in Computer Science (cs.LO) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Dynamic Epistemic Logic (DEL) is a logical framework in which one can

describe in great detail how actions are perceived by the agents, and how they

affect the world. DEL games were recently introduced as a way to define classes

of games with imperfect information where the actions available to the players

are described very precisely. This framework makes it possible to define

easily, for instance, classes of games where players can only use public

actions or public announcements. These games have been studied for reachability

objectives, where the aim is to reach a situation satisfying some epistemic

property expressed in epistemic logic; several (un)decidability results have

been established. In this work we show that the decidability results obtained

for reachability objectives extend to a much more general class of winning

conditions, namely those expressible in the epistemic temporal logic LTLK. To

do so we establish that the infinite game structures generated by DEL public

actions are regular, and we describe how to obtain finite representations on

which we rely to solve them.

An interpretable neural network model through piecewise linear approximation

Mengzhuo Guo , Qingpeng Zhang , Xiuwu Liao , Daniel Dajun Zeng Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Most existing interpretable methods explain a black-box model in a post-hoc

manner, which uses simpler models or data analysis techniques to interpret the

predictions after the model is learned. However, they (a) may derive

contradictory explanations on the same predictions given different methods and

data samples, and (b) focus on using simpler models to provide higher

descriptive accuracy at the sacrifice of prediction accuracy. To address these

issues, we propose a hybrid interpretable model that combines a piecewise

linear component and a nonlinear component. The first component describes the

explicit feature contributions by piecewise linear approximation to increase

the expressiveness of the model. The other component uses a multi-layer

perceptron to capture feature interactions and implicit nonlinearity, and

increase the prediction performance. Different from the post-hoc approaches,

the interpretability is obtained once the model is learned in the form of

feature shapes. We also provide a variant to explore higher-order interactions

among features to demonstrate that the proposed model is flexible for

adaptation. Experiments demonstrate that the proposed model can achieve good

interpretability by describing feature shapes while maintaining

state-of-the-art accuracy.

Synergizing Domain Expertise with Self-Awareness in Software Systems: A Patternized Architecture Guideline

Tao Chen , Rami Bahsoon , Xin Yao

Comments: submitted; 31 pages, 10 tables and 29 figures

Subjects

:

Software Engineering (cs.SE)

; Artificial Intelligence (cs.AI)

Architectural patterns provide a reusable architectural solution for commonly

recurring problems that can assist in designing software systems. In this

regard, self-awareness architectural patterns are specialized patterns that

leverage good engineering practices and experiences to help in designing

self-awareness and self-adaptation of a software system. However, domain

knowledge and engineers’ expertise that is built over time are not explicitly

linked to these patterns and the self-aware process. This linkage is important,

as it can enrich the design patterns of these systems, which consequently leads

to more effective and efficient self-aware and self-adaptive behaviours. This

paper is an introductory work that highlights the importance of synergizing

domain expertise into the self-awareness in software systems, relying on

well-defined underlying approaches. In particular, we present a holistic

framework that classifies widely known representations used to obtain and

maintain the domain expertise, documenting their nature and specifics rules

that permits different levels of synergies with self-awareness. Drawing on

such, we describe mechanisms that can enrich existing patterns with engineers’

expertise and knowledge of the domain. This, together with the framework, allow

us to codify an intuitive step-by-step methodology that guides engineer in

making design decisions when synergizing domain expertise into self-awareness

and reveal their importances, in an attempt to keep ‘engineers-in-the-loop’.

Through three case studies, we demonstrate how the enriched patterns, the

proposed framework and methodology can be applied in different domains, within

which we quantitatively compare the actual benefits of incorporating engineers’

expertise into self-awareness, at alternative levels of synergies.

A point-wise linear model reveals reasons for 30-day readmission of heart failure patients

Yasuho Yamashita , Takuma Shibahara , Junichi Kuwata

Comments: 8 pages, 3 figures

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Heart failures in the United States cost an estimated 30.7 billion dollars

annually and predictive analysis can decrease costs due to readmission of heart

failure patients. Deep learning can predict readmissions but does not give

reasons for its predictions. Ours is the first study on a deep-learning

approach to explaining decisions behind readmission predictions. Additionally,

it provides an automatic patient stratification to explain cohorts of

readmitted patients. The new deep-learning model called a point-wise linear

model is a meta-learning machine of linear models. It generates a logistic

regression model to predict early readmission for each patient. The custom-made

prediction models allow us to analyze feature importance. We evaluated the

approach using a dataset that had 30-days readmission patients with heart

failures. This study has been submitted in PLOS ONE. In advance, we would like

to share the theoretical aspect of the point-wise linear model as a part of our

study.

SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions

Ramprasaath R. Selvaraju , Purva Tendulkar , Devi Parikh , Eric Horvitz , Marco Ribeiro , Besmira Nushi , Ece Kamar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Existing VQA datasets contain questions with varying levels of complexity.

While the majority of questions in these datasets require perception for

recognizing existence, properties, and spatial relationships of entities, a

significant portion of questions pose challenges that correspond to reasoning

tasks — tasks that can only be answered through a synthesis of perception and

knowledge about the world, logic and / or reasoning. This distinction allows us

to notice when existing VQA models have consistency issues — they answer the

reasoning question correctly but fail on associated low-level perception

questions. For example, models answer the complex reasoning question “Is the

banana ripe enough to eat?” correctly, but fail on the associated perception

question “Are the bananas mostly green or yellow?” indicating that the model

likely answered the reasoning question correctly but for the wrong reason. We

quantify the extent to which this phenomenon occurs by creating a new Reasoning

split of the VQA dataset and collecting Sub-VQA, a new dataset consisting of

200K new perception questions which serve as sub questions corresponding to the

set of perceptual tasks needed to effectively answer the complex reasoning

questions in the Reasoning split. Additionally, we propose an approach called

Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the

model to attend do the same parts of the image when answering the reasoning

question and the perception sub questions. We show that SQuINT improves model

consistency by 7.8%, also marginally improving its performance on the Reasoning

questions in VQA, while also displaying qualitatively better attention maps.

Teaching Software Engineering for AI-Enabled Systems

Christian Kästner , Eunsuk Kang

Comments: to be published in ICSE-SEET 2020

Subjects

:

Software Engineering (cs.SE)

; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Software engineers have significant expertise to offer when building

intelligent systems, drawing on decades of experience and methods for building

systems that are scalable, responsive and robust, even when built on unreliable

components. Systems with artificial-intelligence or machine-learning (ML)

components raise new challenges and require careful engineering. We designed a

new course to teach software-engineering skills to students with a background

in ML. We specifically go beyond traditional ML courses that teach modeling

techniques under artificial conditions and focus, in lecture and assignments,

on realism with large and changing datasets, robust and evolvable

infrastructure, and purposeful requirements engineering that considers ethics

and fairness as well. We describe the course and our infrastructure and share

experience and all material from teaching the course for the first time.

How do Data Science Workers Collaborate? Roles, Workflows, and Tools

Amy X. Zhang , Michael Muller , Dakuo Wang

Comments: CSCW’2020

Subjects

:

Human-Computer Interaction (cs.HC)

; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE); Machine Learning (stat.ML)

Today, the prominence of data science within organizations has given rise to

teams of data science workers collaborating on extracting insights from data,

as opposed to individual data scientists working alone. However, we still lack

a deep understanding of how data science workers collaborate in practice. In

this work, we conducted an online survey with 183 participants who work in

various aspects of data science. We focused on their reported interactions with

each other (e.g., managers with engineers) and with different tools (e.g.,

Jupyter Notebook). We found that data science teams are extremely collaborative

and work with a variety of stakeholders and tools during the six common steps

of a data science workflow (e.g., clean data and train model). We also found

that the collaborative practices workers employ, such as documentation, vary

according to the kinds of tools they use. Based on these findings, we discuss

design implications for supporting data science team collaborations and future

research directions.

Learning to See Analogies: A Connectionist Exploration

Douglas S. Blank

Comments: 191 pages, PhD thesis

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

This dissertation explores the integration of learning and analogy-making

through the development of a computer program, called Analogator, that learns

to make analogies by example. By “seeing” many different analogy problems,

along with possible solutions, Analogator gradually develops an ability to make

new analogies. That is, it learns to make analogies by analogy. This approach

stands in contrast to most existing research on analogy-making, in which

typically the a priori existence of analogical mechanisms within a model is

assumed. The present research extends standard connectionist methodologies by

developing a specialized associative training procedure for a recurrent network

architecture. The network is trained to divide input scenes (or situations)

into appropriate figure and ground components. Seeing one scene in terms of a

particular figure and ground provides the context for seeing another in an

analogous fashion. After training, the model is able to make new analogies

between novel situations. Analogator has much in common with lower-level

perceptual models of categorization and recognition; it thus serves as a

unifying framework encompassing both high-level analogical learning and

low-level perception. This approach is compared and contrasted with other

computational models of analogy-making. The model’s training and generalization

performance is examined, and limitations are discussed.

Graph Ordering: Towards the Optimal by Learning

Kangfei Zhao , Yu Rong , Jeffrey Xu Yu , Junzhou Huang , Hao Zhang

Comments: 11 pages

Subjects

:

Social and Information Networks (cs.SI)

; Artificial Intelligence (cs.AI)

Graph representation learning has achieved a remarkable success in many

graph-based applications, such as node classification, link prediction, and

community detection. These models are usually designed to preserve the vertex

information at different granularity and reduce the problems in discrete space

to some machine learning tasks in continuous space. However, regardless of the

fruitful progress, for some kind of graph applications, such as graph

compression and edge partition, it is very hard to reduce them to some graph

representation learning tasks. Moreover, these problems are closely related to

reformulating a global layout for a specific graph, which is an important

NP-hard combinatorial optimization problem: graph ordering. In this paper, we

propose to attack the graph ordering problem behind such applications by a

novel learning approach. Distinguished from greedy algorithms based on

predefined heuristics, we propose a neural network model: Deep Order Network

(DON) to capture the hidden locality structure from partial vertex order sets.

Supervised by sampled partial order, DON has the ability to infer unseen

combinations. Furthermore, to alleviate the combinatorial explosion in the

training space of DON and make the efficient partial vertex order sampling , we

employ a reinforcement learning model: the Policy Network, to adjust the

partial order sampling probabilities during the training phase of DON

automatically. To this end, the Policy Network can improve the training

efficiency and guide DON to evolve towards a more effective model

automatically. Comprehensive experiments on both synthetic and real data

validate that DON-RL outperforms the current state-of-the-art heuristic

algorithm consistently. Two case studies on graph compression and edge

partitioning demonstrate the potential power of DON-RL in real applications.

Multi-agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning

Samaneh Hosseini Semnani , Hugh Liu , Michael Everett , Anton de Ruiter , Jonathan P. How Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)

This paper introduces a hybrid algorithm of deep reinforcement learning (RL)

and Force-based motion planning (FMP) to solve distributed motion planning

problem in dense and dynamic environments. Individually, RL and FMP algorithms

each have their own limitations. FMP is not able to produce time-optimal paths

and existing RL solutions are not able to produce collision-free paths in dense

environments. Therefore, we first tried improving the performance of recent RL

approaches by introducing a new reward function that not only eliminates the

requirement of a pre supervised learning (SL) step but also decreases the

chance of collision in crowded environments. That improved things, but there

were still a lot of failure cases. So, we developed a hybrid approach to

leverage the simpler FMP approach in stuck, simple and high-risk cases, and

continue using RL for normal cases in which FMP can’t produce optimal path.

Also, we extend GA3C-CADRL algorithm to 3D environment. Simulation results show

that the proposed algorithm outperforms both deep RL and FMP algorithms and

produces up to 50% more successful scenarios than deep RL and up to 75% less

extra time to reach goal than FMP.

The Risk to Population Health Equity Posed by Automated Decision Systems: A Narrative Review

Mitchell Burger

Comments: 22 pages (12 pages excluding references), 1 figure

Subjects

:

Computers and Society (cs.CY)

; Artificial Intelligence (cs.AI)

Artificial intelligence is already ubiquitous, and is increasingly being used

to autonomously make ever more consequential decisions. However, there has been

relatively little research into the consequences for equity of the use of

narrow AI and automated decision systems in medicine and public health. A

narrative review using a hermeneutic approach was undertaken to explore current

and future uses of AI in medicine and public health, issues that have emerged,

and longer-term implications for population health. Accounts in the literature

reveal a tremendous expectation on AI to transform medical and public health

practices, especially regarding precision medicine and precision public health.

Automated decisions being made about disease detection, diagnosis, treatment,

and health funding allocation have significant consequences for individual and

population health and wellbeing. Meanwhile, it is evident that issues of bias,

incontestability, and erosion of privacy have emerged in sensitive domains

where narrow AI and automated decision systems are in common use. As the use of

automated decision systems expands, it is probable that these same issues will

manifest widely in medicine and public health applications. Bias,

incontestability, and erosion of privacy are mechanisms by which existing

social, economic and health disparities are perpetuated and amplified. The

implication is that there is a significant risk that use of automated decision

systems in health will exacerbate existing population health inequities. The

industrial scale and rapidity with which automated decision systems can be

applied to whole populations heightens the risk to population health equity.

There is a need therefore to design and implement automated decision systems

with care, monitor their impact over time, and develop capacities to respond to

issues as they emerge.

Siamese Graph Neural Networks for Data Integration

Evgeny Krivosheev , Mattia Atzeni , Katsiaryna Mirylenka , Paolo Scotton , Fabio Casati Subjects : Databases (cs.DB) ; Artificial Intelligence (cs.AI)

Data integration has been studied extensively for decades and approached from

different angles. However, this domain still remains largely rule-driven and

lacks universal automation. Recent development in machine learning and in

particular deep learning has opened the way to more general and more efficient

solutions to data integration problems. In this work, we propose a general

approach to modeling and integrating entities from structured data, such as

relational databases, as well as unstructured sources, such as free text from

news articles. Our approach is designed to explicitly model and leverage

relations between entities, thereby using all available information and

preserving as much context as possible. This is achieved by combining siamese

and graph neural networks to propagate information between connected entities

and support high scalability. We evaluate our method on the task of integrating

data about business entities, and we demonstrate that it outperforms standard

rule-based systems, as well as other deep learning approaches that do not use

graph-based representations.

Activism by the AI Community: Analysing Recent Achievements and Future Prospects

Haydn Belfield

Comments: Forthcoming in Proceedings of the 2020 AAAI/ACM Conference on Artificial Intelligence, Ethics and Society. 7 pages

Subjects

:

Computers and Society (cs.CY)

; Artificial Intelligence (cs.AI)

The artificial intelligence community (AI) has recently engaged in activism

in relation to their employers, other members of the community, and their

governments in order to shape the societal and ethical implications of AI. It

has achieved some notable successes, but prospects for further political

organising and activism are uncertain. We survey activism by the AI community

over the last six years; apply two analytical frameworks drawing upon the

literature on epistemic communities, and worker organising and bargaining; and

explore what they imply for the future prospects of the AI community. Success

thus far has hinged on a coherent shared culture, and high bargaining power due

to the high demand for a limited supply of AI talent. Both are crucial to the

future of AI activism and worthy of sustained attention.

Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory

Yunlong Lu , Kai Yan

Comments: 14 pages, review. arXiv admin note: text overlap with arXiv:1810.05587 by other authors

Subjects

:

Computer Science and Game Theory (cs.GT)

; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Deep reinforcement learning (RL) has achieved outstanding results in recent

years, which has led a dramatic increase in the number of methods and

applications. Recent works are exploring learning beyond single-agent scenarios

and considering multi-agent scenarios. However, they are faced with lots of

challenges and are seeking for help from traditional game-theoretic algorithms,

which, in turn, show bright application promise combined with modern algorithms

and boosting computing power. In this survey, we first introduce basic concepts

and algorithms in single agent RL and multi-agent systems; then, we summarize

the related algorithms from three aspects. Solution concepts from game theory

give inspiration to algorithms which try to evaluate the agents or find better

solutions in multi-agent systems. Fictitious self-play becomes popular and has

a great impact on the algorithm of multi-agent reinforcement learning.

Counterfactual regret minimization is an important tool to solve games with

incomplete information, and has shown great strength when combined with deep

learning.

A Survey on the Use of Preferences for Virtual Machine Placement in Cloud Data Centers

Abdulaziz Alashaikh , Eisa Alanazi , Ala Al-Fuqaha

Comments: 40 pages, 5 figures, 6 tables

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

; Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

With the rapid development of virtualization techniques, cloud data centers

allow for cost effective, flexible, and customizable deployments of

applications on virtualized infrastructure. Virtual machine (VM) placement aims

to assign each virtual machine to a server in the cloud environment. VM

Placement is of paramount importance to the design of cloud data centers.

Typically, VM placement involves complex relations and multiple design factors

as well as local policies that govern the assignment decisions. It also

involves different constituents including cloud administrators and customers

that might have disparate preferences while opting for a placement solution.

Thus, it is often valuable to not only return an optimized solution to the VM

placement problem but also a solution that reflects the given preferences of

the constituents. In this paper, we provide a detailed review on the role of

preferences in the recent literature on VM placement. We further discuss key

challenges and identify possible research opportunities to better incorporate

preferences within the context of VM placement.

Information Retrieval

Hybrid Semantic Recommender System for Chemical Compounds

Marcia Barros , André Moitinho , Francisco M. Couto

Comments: 8 pages, 3 figures, accepted as short-paper to the 42nd European Conference on Information Retrieval (ECIR), Lisbon 2020

Subjects

:

Information Retrieval (cs.IR)

Recommending Chemical Compounds of interest to a particular researcher is a

poorly explored field. The few existent datasets with information about the

preferences of the researchers use implicit feedback. The lack of Recommender

Systems in this particular field presents a challenge for the development of

new recommendations models. In this work, we propose a Hybrid recommender model

for recommending Chemical Compounds. The model integrates

collaborative-filtering algorithms for implicit feedback (Alternating Least

Squares (ALS) and Bayesian Personalized Ranking(BPR)) and semantic similarity

between the Chemical Compounds in the ChEBI ontology (ONTO). We evaluated the

model in an implicit dataset of Chemical Compounds, CheRM. The Hybrid model was

able to improve the results of state-of-the-art collaborative-filtering

algorithms, especially for Mean Reciprocal Rank, with an increase of 6.7% when

comparing the collaborative-filtering ALS and the Hybrid ALS_ONTO.

BiOnt: Deep Learning using Multiple Biomedical Ontologies for Relation Extraction

Diana Sousa , Francisco M. Couto

Comments: Accepted as ECIR 2020 Short Paper

Subjects

:

Information Retrieval (cs.IR)

Successful biomedical relation extraction can provide evidence to researchers

and clinicians about possible unknown associations between biomedical entities,

advancing the current knowledge we have about those entities and their inherent

mechanisms. Most biomedical relation extraction systems do not resort to

external sources of knowledge, such as domain-specific ontologies. However,

using deep learning methods, along with biomedical ontologies, has been

recently shown to effectively advance the biomedical relation extraction field.

To perform relation extraction, our deep learning system, BiOnt, employs four

types of biomedical ontologies, namely, the Gene Ontology, the Human Phenotype

Ontology, the Human Disease Ontology, and the Chemical Entities of Biological

Interest, regarding gene-products, phenotypes, diseases, and chemical

compounds, respectively. We tested our system with three data sets that

represent three different types of relations of biomedical entities. BiOnt

achieved, in F-score, an improvement of 4.93 percentage points for drug-drug

interactions (DDI corpus), 4.99 percentage points for phenotype-gene relations

(PGR corpus), and 2.21 percentage points for chemical-induced disease relations

(BC5CDR corpus), relatively to the state-of-the-art. The code supporting this

system is available at this https URL .

Quantum-like Structure in Multidimensional Relevance Judgements

Sagar Uprety , Prayag Tiwari , Shahram Dehdashti , Lauren Fell , Dawei Song , Peter Bruza , Massimo Melucci

Comments: To be published at 42nd European Conference on Research in IR, ECIR 2020

Subjects

:

Information Retrieval (cs.IR)

A large number of studies in cognitive science have revealed that

probabilistic outcomes of certain human decisions do not agree with the axioms

of classical probability theory. The field of Quantum Cognition provides an

alternative probabilistic model to explain such paradoxical findings. It posits

that cognitive systems have an underlying quantum-like structure, especially in

decision-making under uncertainty. In this paper, we hypothesise that relevance

judgement, being a multidimensional, cognitive concept, can be used to probe

the quantum-like structure for modelling users’ cognitive states in information

seeking. Extending from an experiment protocol inspired by the Stern-Gerlach

experiment in Quantum Physics, we design a crowd-sourced user study to show

violation of the Kolmogorovian probability axioms as a proof of the

quantum-like structure, and provide a comparison between a quantum

probabilistic model and a Bayesian model for predictions of relevance.

Common Conversational Community Prototype: Scholarly Conversational Assistant

Krisztian Balog , Lucie Flekova , Matthias Hagen , Rosie Jones , Martin Potthast , Filip Radlinski , Mark Sanderson , Svitlana Vakulenko , Hamed Zamani Subjects : Information Retrieval (cs.IR)

This paper discusses the potential for creating academic resources (tools,

data, and evaluation approaches) to support research in conversational search,

by focusing on realistic information needs and conversational interactions.

Specifically, we propose to develop and operate a prototype conversational

search system for scholarly activities. This Scholarly Conversational Assistant

would serve as a useful tool, a means to create datasets, and a platform for

running evaluation challenges by groups across the community. This article

results from discussions of a working group at Dagstuhl Seminar 19461 on

Conversational Search.

On the Minimum Achievable Age of Information for General Service-Time Distributions

Jaya Prakash Champati , Ramana R. Avula , Tobias J. Oechtering , James Gross Subjects : Information Retrieval (cs.IR) ; Information Theory (cs.IT)

There is a growing interest in analysing the freshness of data in networked

systems. Age of Information (AoI) has emerged as a popular metric to quantify

this freshness at a given destination. There has been a significant research

effort in optimizing this metric in communication and networking systems under

different settings. In contrast to previous works, we are interested in a

fundamental question, what is the minimum achievable AoI in any

single-server-single-source queuing system for a given service-time

distribution? To address this question, we study a problem of optimizing AoI

under service preemptions. Our main result is on the characterization of the

minimum achievable average peak AoI (PAoI). We obtain this result by showing

that a fixed-threshold policy is optimal in the set of all randomized-threshold

causal policies. We use the characterization to provide necessary and

sufficient conditions for the service-time distributions under which

preemptions are beneficial.

Information Foraging for Enhancing Implicit Feedback in Content-based Image Recommendation

Amit Kumar Jaiswal , Haiming Liu , Ingo Frommholz

Comments: FIRE ’19: Proceedings of the 11th Forum for Information Retrieval Evaluation

Subjects

:

Information Retrieval (cs.IR)

; Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

User implicit feedback plays an important role in recommender systems.

However, finding implicit features is a tedious task. This paper aims to

identify users’ preferences through implicit behavioural signals for image

recommendation based on the Information Scent Model of Information Foraging

Theory. In the first part, we hypothesise that the users’ perception is

improved with visual cues in the images as behavioural signals that provide

users’ information scent during information seeking. We designed a

content-based image recommendation system to explore which image attributes

(i.e., visual cues or bookmarks) help users find their desired image. We found

that users prefer recommendations predicated by visual cues and therefore

consider the visual cues as good information scent for their information

seeking. In the second part, we investigated if visual cues in the images

together with the images itself can be better perceived by the users than each

of them on its own. We evaluated the information scent artifacts in image

recommendation on the Pinterest image collection and the WikiArt dataset. We

find our proposed image recommendation system supports the implicit signals

through Information Foraging explanation of the information scent model.

Ranking Significant Discrepancies in Clinical Reports

Sean MacAvaney , Arman Cohan , Nazli Goharian , Ross Filice

Comments: ECIR 2020 (short)

Subjects

:

Information Retrieval (cs.IR)

; Computation and Language (cs.CL)

Medical errors are a major public health concern and a leading cause of death

worldwide. Many healthcare centers and hospitals use reporting systems where

medical practitioners write a preliminary medical report and the report is

later reviewed, revised, and finalized by a more experienced physician. The

revisions range from stylistic to corrections of critical errors or

misinterpretations of the case. Due to the large quantity of reports written

daily, it is often difficult to manually and thoroughly review all the

finalized reports to find such errors and learn from them. To address this

challenge, we propose a novel ranking approach, consisting of textual and

ontological overlaps between the preliminary and final versions of reports. The

approach learns to rank the reports based on the degree of discrepancy between

the versions. This allows medical practitioners to easily identify and learn

from the reports in which their interpretation most substantially differed from

that of the attending physician (who finalized the report). This is a crucial

step towards uncovering potential errors and helping medical practitioners to

learn from such errors, thus improving patient-care in the long run. We

evaluate our model on a dataset of radiology reports and show that our approach

outperforms both previously-proposed approaches and more recent language models

by 4.5% to 15.4%.

Random-walk Based Generative Model for Classifying Document Networks

Takafumi J. Suzuki

Comments: 7pages, 3 figures

Subjects

:

Physics and Society (physics.soc-ph)

; Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

Document networks are found in various collections of real-world data, such

as citation networks, hyperlinked web pages, and online social networks. A

large number of generative models have been proposed because they offer

intuitive and useful pictures for analyzing document networks. Prominent

examples are relational topic models, where documents are linked according to

their topic similarities. However, existing generative models do not make full

use of network structures because they are largely dependent on topic modeling

of documents. In particular, centrality of graph nodes is missing in generative

processes of previous models. In this paper, we propose a novel generative

model for document networks by introducing random walkers on networks to

integrate the node centrality into link generation processes. The developed

method is evaluated in semi-supervised classification tasks with real-world

citation networks. We show that the proposed model outperforms existing

probabilistic approaches especially in detecting communities in connected

networks.

Finding temporal patterns using algebraic fingerprints

Suhas Thejaswi , Aristides Gionis Subjects : Data Structures and Algorithms (cs.DS) ; Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)

In this paper we study a family of pattern-detection problems in

vertex-colored temporal graphs. In particular, given a vertex-colored temporal

graph and a multi-set of colors as a query, we search for temporal paths in the

graph that contain the colors specified in the query. These types of problems

have several interesting applications, for example, recommending tours for

tourists, or searching for abnormal behavior in a network of financial

transactions.

For the family of pattern-detection problems we define, we establish

complexity results and design an algebraic-algorithmic framework based on

constrained multilinear sieving. We demonstrate that our solution can scale to

massive graphs with up to hundred million edges, despite the problems being

NP-hard. Our implementation, which is publicly available, exhibits practical

edge-linear scalability and highly optimized. For example, in a real-world

graph dataset with more than six million edges and a multi-set query with ten

colors, we can extract an optimal solution in less than eight minutes on a

haswell desktop with four cores.

Audio Summarization with Audio Features and Probability Distribution Divergence

Carlos-Emiliano González-Gallardo , Romain Deveaud , Eric SanJuan , Juan-Manuel Torres

Comments: 20th International Conference on Computational Linguistics and Intelligent Text Processing

Subjects

:

Computation and Language (cs.CL)

; Information Retrieval (cs.IR)

The automatic summarization of multimedia sources is an important task that

facilitates the understanding of an individual by condensing the source while

maintaining relevant information. In this paper we focus on audio summarization

based on audio features and the probability of distribution divergence. Our

method, based on an extractive summarization approach, aims to select the most

relevant segments until a time threshold is reached. It takes into account the

segment’s length, position and informativeness value. Informativeness of each

segment is obtained by mapping a set of audio features issued from its

Mel-frequency Cepstral Coefficients and their corresponding Jensen-Shannon

divergence score. Results over a multi-evaluator scheme shows that our approach

provides understandable and informative summaries.

SlideImages: A Dataset for Educational Image Classification

David Morris , Eric Müller-Budack , Ralph Ewerth

Comments: 8 pages, 2 figures, to be presented at ECIR 2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Information Retrieval (cs.IR)

In the past few years, convolutional neural networks (CNNs) have achieved

impressive results in computer vision tasks, which however mainly focus on

photos with natural scene content. Besides, non-sensor derived images such as

illustrations, data visualizations, figures, etc. are typically used to convey

complex information or to explore large datasets. However, this kind of images

has received little attention in computer vision. CNNs and similar techniques

use large volumes of training data. Currently, many document analysis systems

are trained in part on scene images due to the lack of large datasets of

educational image data. In this paper, we address this issue and present

SlideImages, a dataset for the task of classifying educational illustrations.

SlideImages contains training data collected from various sources, e.g.,

Wikimedia Commons and the AI2D dataset, and test data collected from

educational slides. We have reserved all the actual educational images as a

test dataset in order to ensure that the approaches using this dataset

generalize well to new educational images, and potentially other domains.

Furthermore, we present a baseline system using a standard deep neural

architecture and discuss dealing with the challenge of limited training data.

Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval

Anubha Pandey , Ashish Mishra , Vinay Kumar Verma , Anurag Mittal , Hema A. Murthy

Comments: Accepted in WACV’2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)

Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that

the data of all the classes are available during training. The assumption may

not always be practical since the data of a few classes may be unavailable, or

the classes may not appear at the time of training. Zero-Shot Sketch-Based

Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to

handle previously unseen classes during the test. This paper proposes a

generative approach based on the Stacked Adversarial Network (SAN) and the

advantage of Siamese Network (SN) for ZS-SBIR. While SAN generates a

high-quality sample, SN learns a better distance metric compared to that of the

nearest neighbor search. The capability of the generative model to synthesize

image features based on the sketch reduces the SBIR problem to that of an

image-to-image retrieval problem. We evaluate the efficacy of our proposed

approach on TU-Berlin, and Sketchy database in both standard ZSL and

generalized ZSL setting. The proposed method yields a significant improvement

in standard ZSL as well as in a more challenging generalized ZSL setting (GZSL)

for SBIR.

Adaptive Parameterization for Neural Dialogue Generation

Hengyi Cai , Hongshen Chen , Cheng Zhang , Yonghao Song , Xiaofang Zhao , Dawei Yin

Comments: Published as a long paper in EMNLP 2019

Subjects

:

Computation and Language (cs.CL)

; Information Retrieval (cs.IR); Machine Learning (cs.LG)

Neural conversation systems generate responses based on the

sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with

a single set of learned parameters to generate responses for given input

contexts. When confronting diverse conversations, its adaptability is rather

limited and the model is hence prone to generate generic responses. In this

work, we propose an {f Ada}ptive {f N}eural {f D}ialogue generation

model, extsc{AdaND}, which manages various conversations with

conversation-specific parameterization. For each conversation, the model

generates parameters of the encoder-decoder by referring to the input context.

In particular, we propose two adaptive parameterization mechanisms: a

context-aware and a topic-aware parameterization mechanism. The context-aware

parameterization directly generates the parameters by capturing local semantics

of the given context. The topic-aware parameterization enables parameter

sharing among conversations with similar topics by first inferring the latent

topics of the given context and then generating the parameters with respect to

the distributional topics. Extensive experiments conducted on a large-scale

real-world conversational dataset show that our model achieves superior

performance in terms of both quantitative metrics and human evaluations.

SEPT: Improving Scientific Named Entity Recognition with Span Representation

Tan Yan , Heyan Huang , Xian-Ling Mao Subjects : Computation and Language (cs.CL) ; Information Retrieval (cs.IR)

We introduce a new scientific named entity recognizer called SEPT, which

stands for Span Extractor with Pre-trained Transformers. In recent papers, span

extractors have been demonstrated to be a powerful model compared with sequence

labeling models. However, we discover that with the development of pre-trained

language models, the performance of span extractors appears to become similar

to sequence labeling models. To keep the advantages of span representation, we

modified the model by under-sampling to balance the positive and negative

samples and reduce the search space. Furthermore, we simplify the origin

network architecture to combine the span extractor with BERT. Experiments

demonstrate that even simplified architecture achieves the same performance and

SEPT achieves a new state of the art result in scientific named entity

recognition even without relation information involved.

Computation and Language

Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference

Timo Schick , Hinrich Schütze Subjects : Computation and Language (cs.CL)

Some NLP tasks can be solved in a fully unsupervised fashion by providing a

pretrained language model with “task descriptions” in natural language (e.g.,

Radford et al., 2019). While this approach underperforms its supervised

counterpart, we show in this work that the two ideas can be combined: We

introduce Pattern-Exploiting Training (PET), a semi-supervised training

procedure that reformulates input examples as cloze-style phrases which help

the language model understand the given task. Theses phrases are then used to

assign soft labels to a large set of unlabeled examples. Finally, regular

supervised training is performed on the resulting training set. On several

tasks, we show that PET outperforms both supervised training and unsupervised

approaches in low-resource settings by a large margin.

Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning

Stefan Ultes

Comments: Published at SIGDIAL 2019

Subjects

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Learning suitable and well-performing dialogue behaviour in statistical

spoken dialogue systems has been in the focus of research for many years. While

most work which is based on reinforcement learning employs an objective measure

like task success for modelling the reward signal, we use a reward based on

user satisfaction estimation. We propose a novel estimator and show that it

outperforms all previous estimators while learning temporal dependencies

implicitly. Furthermore, we apply this novel user satisfaction estimation model

live in simulated experiments where the satisfaction estimation model is

trained on one domain and applied in many other domains which cover a similar

task. We show that applying this model results in higher estimated

satisfaction, similar task success rates and a higher robustness to noise.

Generating Sense Embeddings for Syntactic and Semantic Analogy for Portuguese

Jessica Rodrigues da Silva , Helena de Medeiros Caseli

Comments: 14 pages, STIL 2019 Full paper

Journal-ref: STIL XII (2019) 104-113

Subjects

:

Computation and Language (cs.CL)

Word embeddings are numerical vectors which can represent words or concepts

in a low-dimensional continuous space. These vectors are able to capture useful

syntactic and semantic information. The traditional approaches like Word2Vec,

GloVe and FastText have a strict drawback: they produce a single vector

representation per word ignoring the fact that ambiguous words can assume

different meanings. In this paper we use techniques to generate sense

embeddings and present the first experiments carried out for Portuguese. Our

experiments show that sense vectors outperform traditional word vectors in

syntactic and semantic analogy tasks, proving that the language resource

generated here can improve the performance of NLP tasks in Portuguese.

Domain-Aware Dialogue State Tracker for Multi-Domain Dialogue Systems

Vevake Balaraman , Bernardo Magnini Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)

In task-oriented dialogue systems the dialogue state tracker (DST) component

is responsible for predicting the state of the dialogue based on the dialogue

history. Current DST approaches rely on a predefined domain ontology, a fact

that limits their effective usage for large scale conversational agents, where

the DST constantly needs to be interfaced with ever-increasing services and

APIs. Focused towards overcoming this drawback, we propose a domain-aware

dialogue state tracker, that is completely data-driven and it is modeled to

predict for dynamic service schemas. The proposed model utilizes domain and

slot information to extract both domain and slot specific representations for a

given dialogue, and then uses such representations to predict the values of the

corresponding slot. Integrating this mechanism with a pretrained language model

(i.e. BERT), our approach can effectively learn semantic relations.

A Physical Embedding Model for Knowledge Graphs

Caglar Demir , Axel-Cyrille Ngonga Ngomo

Comments: 9th Joint International Conference, JIST 2019, Hangzhou, China

Subjects

:

Computation and Language (cs.CL)

Knowledge graph embedding methods learn continuous vector representations for

entities in knowledge graphs and have been used successfully in a large number

of applications. We present a novel and scalable paradigm for the computation

of knowledge graph embeddings, which we dub PYKE . Our approach combines a

physical model based on Hooke’s law and its inverse with ideas from simulated

annealing to compute embeddings for knowledge graphs efficiently. We prove that

PYKE achieves a linear space complexity. While the time complexity for the

initialization of our approach is quadratic, the time complexity of each of its

iterations is linear in the size of the input knowledge graph. Hence, PYKE’s

overall runtime is close to linear. Consequently, our approach easily scales up

to knowledge graphs containing millions of triples. We evaluate our approach

against six state-of-the-art embedding approaches on the DrugBank and DBpedia

datasets in two series of experiments. The first series shows that the cluster

purity achieved by PYKE is up to 26% (absolute) better than that of the state

of art. In addition, PYKE is more than 22 times faster than existing embedding

solutions in the best case. The results of our second series of experiments

show that PYKE is up to 23% (absolute) better than the state of art on the task

of type prediction while maintaining its superior scalability. Our

implementation and results are open-source and are available at

this http URL .

Length-controllable Abstractive Summarization by Guiding with Summary Prototype

Itsumi Saito , Kyosuke Nishida , Kosuke Nishida , Atsushi Otsuka , Hisako Asano , Junji Tomita , Hiroyuki Shindo , Yuji Matsumoto Subjects : Computation and Language (cs.CL)

We propose a new length-controllable abstractive summarization model. Recent

state-of-the-art abstractive summarization models based on encoder-decoder

models generate only one summary per source text. However, controllable

summarization, especially of the length, is an important aspect for practical

applications. Previous studies on length-controllable abstractive summarization

incorporate length embeddings in the decoder module for controlling the summary

length. Although the length embeddings can control where to stop decoding, they

do not decide which information should be included in the summary within the

length constraint. Unlike the previous models, our length-controllable

abstractive summarization model incorporates a word-level extractive module in

the encoder-decoder model instead of length embeddings. Our model generates a

summary in two steps. First, our word-level extractor extracts a sequence of

important words (we call it the “prototype text”) from the source text

according to the word-level importance scores and the length constraint.

Second, the prototype text is used as additional input to the encoder-decoder

model, which generates a summary by jointly encoding and copying words from

both the prototype text and source text. Since the prototype text is a guide to

both the content and length of the summary, our model can generate an

informative and length-controlled summary. Experiments with the CNN/Daily Mail

dataset and the NEWSROOM dataset show that our model outperformed previous

models in length-controlled settings.

A Hierarchical Location Normalization System for Text

Dongyun Liang , Guohua Wang , Jing Nie , Binxu Zhai , Xiusen Gu

Comments: 7 pages, submitted to conference

Subjects

:

Computation and Language (cs.CL)

It’s natural these days for people to know the local events from massive

documents. Many texts contain location information, such as city name or road

name, which is always incomplete or latent. It’s significant to extract the

administrative area of the text and organize the hierarchy of area, called

location normalization. Existing detecting location systems either exclude

hierarchical normalization or present only a few specific regions. We propose a

system named ROIBase that normalizes the text by the Chinese hierarchical

administrative divisions. ROIBase adopts a co-occurrence constraint as the

basic framework to score the hit of the administrative area, achieves the

inference by special embeddings, and expands the recall by the ROI (region of

interest). It has high efficiency and interpretability because it mainly

establishes on the definite knowledge and has less complex logic than the

supervised models. We demonstrate that ROIBase achieves better performance

against feasible solutions and is useful as a strong support system for

location normalization.

Multi-level Head-wise Match and Aggregation in Transformer for Textual Sequence Matching

Shuohang Wang , Yunshi Lan , Yi Tay , Jing Jiang , Jingjing Liu

Comments: AAAI 2020, 8 pages

Subjects

:

Computation and Language (cs.CL)

Transformer has been successfully applied to many natural language processing

tasks. However, for textual sequence matching, simple matching between the

representation of a pair of sequences might bring in unnecessary noise. In this

paper, we propose a new approach to sequence pair matching with Transformer, by

learning head-wise matching representations on multiple levels. Experiments

show that our proposed approach can achieve new state-of-the-art performance on

multiple tasks that rely only on pre-computed sequence-vector-representation,

such as SNLI, MNLI-match, MNLI-mismatch, QQP, and SQuAD-binary.

Text-based inference of moral sentiment change

Jing Yi Xie , Renato Ferreira Pinto Jr. , Graeme Hirst , Yang Xu

Comments: In Proceedings of EMNLP 2019

Subjects

:

Computation and Language (cs.CL)

We present a text-based framework for investigating moral sentiment change of

the public via longitudinal corpora. Our framework is based on the premise that

language use can inform people’s moral perception toward right or wrong, and we

build our methodology by exploring moral biases learned from diachronic word

embeddings. We demonstrate how a parameter-free model supports inference of

historical shifts in moral sentiment toward concepts such as slavery and

democracy over centuries at three incremental levels: moral relevance, moral

polarity, and fine-grained moral dimensions. We apply this methodology to

visualizing moral time courses of individual concepts and analyzing the

relations between psycholinguistic variables and rates of moral sentiment

change at scale. Our work offers opportunities for applying natural language

processing toward characterizing moral sentiment change in society.

Recommending Themes for Ad Creative Design via Visual-Linguistic Representations

Yichao Zhou , Shaunak Mishra , Manisha Verma , Narayan Bhamidipati , Wei Wang

Comments: 7 pages, 8 figures, 2 tables, accepted by The Web Conference 2020

Subjects

:

Computation and Language (cs.CL)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)

There is a perennial need in the online advertising industry to refresh ad

creatives, i.e., images and text used for enticing online users towards a

brand. Such refreshes are required to reduce the likelihood of ad fatigue among

online users, and to incorporate insights from other successful campaigns in

related product categories. Given a brand, to come up with themes for a new ad

is a painstaking and time consuming process for creative strategists.

Strategists typically draw inspiration from the images and text used for past

ad campaigns, as well as world knowledge on the brands. To automatically infer

ad themes via such multimodal sources of information in past ad campaigns, we

propose a theme (keyphrase) recommender system for ad creative strategists. The

theme recommender is based on aggregating results from a visual question

answering (VQA) task, which ingests the following: (i) ad images, (ii) text

associated with the ads as well as Wikipedia pages on the brands in the ads,

and (iii) questions around the ad. We leverage transformer based cross-modality

encoders to train visual-linguistic representations for our VQA task. We study

two formulations for the VQA task along the lines of classification and

ranking; via experiments on a public dataset, we show that cross-modal

representations lead to significantly better classification accuracy and

ranking precision-recall metrics. Cross-modal representations show better

performance compared to separate image and text representations. In addition,

the use of multimodal information shows a significant lift over using only

textual or visual information.

Audio Summarization with Audio Features and Probability Distribution Divergence

Carlos-Emiliano González-Gallardo , Romain Deveaud , Eric SanJuan , Juan-Manuel Torres

Comments: 20th International Conference on Computational Linguistics and Intelligent Text Processing

Subjects

:

Computation and Language (cs.CL)

; Information Retrieval (cs.IR)

The automatic summarization of multimedia sources is an important task that

facilitates the understanding of an individual by condensing the source while

maintaining relevant information. In this paper we focus on audio summarization

based on audio features and the probability of distribution divergence. Our

method, based on an extractive summarization approach, aims to select the most

relevant segments until a time threshold is reached. It takes into account the

segment’s length, position and informativeness value. Informativeness of each

segment is obtained by mapping a set of audio features issued from its

Mel-frequency Cepstral Coefficients and their corresponding Jensen-Shannon

divergence score. Results over a multi-evaluator scheme shows that our approach

provides understandable and informative summaries.

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Ruiyi Zhang , Changyou Chen , Zhe Gan , Zheng Wen , Wenlin Wang , Lawrence Carin

Comments: Accepted by AISTATS2020

Subjects

:

Computation and Language (cs.CL)

; Machine Learning (cs.LG)

Reinforcement learning (RL) has been widely studied for improving

sequence-generation models. However, the conventional rewards used for RL

training typically cannot capture sufficient semantic information and therefore

render model bias. Further, the sparse and delayed rewards make RL exploration

inefficient. To alleviate these issues, we propose the concept of

nested-Wasserstein distance for distributional semantic matching. To further

exploit it, a novel nested-Wasserstein self-imitation learning framework is

developed, encouraging the model to exploit historical high-rewarded sequences

for enhanced exploration and better semantic matching. Our solution can be

understood as approximately executing proximal policy optimization with

Wasserstein trust-regions. Experiments on a variety of unconditional and

conditional sequence-generation tasks demonstrate the proposed approach

consistently leads to improved performance.

A multimodal deep learning approach for named entity recognition from social media

Meysam Asgari-Chenaghlu , M.Reza Feizi-Derakhshi , Leili Farzinvash , Cina Motamed Subjects : Computation and Language (cs.CL) ; Machine Learning (cs.LG); Multimedia (cs.MM); Social and Information Networks (cs.SI)

Named Entity Recognition (NER) from social media posts is a challenging task.

User generated content which forms the nature of social media, is noisy and

contains grammatical and linguistic errors. This noisy content makes it much

harder for tasks such as named entity recognition. However some applications

like automatic journalism or information retrieval from social media, require

more information about entities mentioned in groups of social media posts.

Conventional methods applied to structured and well typed documents provide

acceptable results while compared to new user generated media, these methods

are not satisfactory. One valuable piece of information about an entity is the

related image to the text. Combining this multimodal data reduces ambiguity and

provides wider information about the entities mentioned. In order to address

this issue, we propose a novel deep learning approach utilizing multimodal deep

learning. Our solution is able to provide more accurate results on named entity

recognition task. Experimental results, namely the precision, recall and F1

score metrics show the superiority of our work compared to other

state-of-the-art NER solutions.

From Speech-to-Speech Translation to Automatic Dubbing

Marcello Federico , Robert Enyedi , Roberto Barra-Chicote , Ritwik Giri , Umut Isik , Arvindh Krishnaswamy

Comments: 5 pages, 4 figures

Subjects

:

Computation and Language (cs.CL)

; Sound (cs.SD); Audio and Speech Processing (eess.AS)

We present enhancements to a speech-to-speech translation pipeline in order

to perform automatic dubbing. Our architecture features neural machine

translation generating output of preferred length, prosodic alignment of the

translation with the original speech segments, neural text-to-speech with fine

tuning of the duration of each utterance, and, finally, audio rendering to

enriches text-to-speech output with background noise and reverberation

extracted from the original audio. We report on a subjective evaluation of

automatic dubbing of excerpts of TED Talks from English into Italian, which

measures the perceived naturalness of automatic dubbing and the relative

importance of each proposed enhancement.

Capturing Evolution in Word Usage: Just Add More Clusters?

Syrielle Montariol , Elaine Zosa , Matej Martinc , Lidia Pivovarova

Comments: 7 pages

Subjects

:

Computation and Language (cs.CL)

The way the words are used evolves through time, mirroring cultural or

technological evolution of society. Semantic change detection is the task of

detecting and analysing word evolution in textual data, even in short periods

of time. This task has recently become a popular task in the NLP community. In

this paper we focus on a new set of methods relying on contextualised

embedding, a type of semantic modelling that revolutionised the field recently.

We leverage the ability of the transformer-based BERT model to generate

contextualised embeddings suitable to detect semantic change of words across

time. We compare our results to other approaches from the literature in a

common setting in order to establish strengths and weaknesses for each of them.

We also propose several ideas for improvements, managing to drastically improve

the performance of existing approaches.

Adaptive Parameterization for Neural Dialogue Generation

Hengyi Cai , Hongshen Chen , Cheng Zhang , Yonghao Song , Xiaofang Zhao , Dawei Yin

Comments: Published as a long paper in EMNLP 2019

Subjects

:

Computation and Language (cs.CL)

; Information Retrieval (cs.IR); Machine Learning (cs.LG)

Neural conversation systems generate responses based on the

sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with

a single set of learned parameters to generate responses for given input

contexts. When confronting diverse conversations, its adaptability is rather

limited and the model is hence prone to generate generic responses. In this

work, we propose an {f Ada}ptive {f N}eural {f D}ialogue generation

model, extsc{AdaND}, which manages various conversations with

conversation-specific parameterization. For each conversation, the model

generates parameters of the encoder-decoder by referring to the input context.

In particular, we propose two adaptive parameterization mechanisms: a

context-aware and a topic-aware parameterization mechanism. The context-aware

parameterization directly generates the parameters by capturing local semantics

of the given context. The topic-aware parameterization enables parameter

sharing among conversations with similar topics by first inferring the latent

topics of the given context and then generating the parameters with respect to

the distributional topics. Extensive experiments conducted on a large-scale

real-world conversational dataset show that our model achieves superior

performance in terms of both quantitative metrics and human evaluations.

R2DE: a NLP approach to estimating IRT parameters of newly generated questions

Luca Benedetto , Andrea Cappelli , Roberto Turrin , Paolo Cremonesi Subjects : Machine Learning (cs.LG) ; Computation and Language (cs.CL); Machine Learning (stat.ML)

The main objective of exams consists in performing an assessment of students’

expertise on a specific subject. Such expertise, also referred to as skill or

knowledge level, can then be leveraged in different ways (e.g., to assign a

grade to the students, to understand whether a student might need some support,

etc.). Similarly, the questions appearing in the exams have to be assessed in

some way before being used to evaluate students. Standard approaches to

questions’ assessment are either subjective (e.g., assessment by human experts)

or introduce a long delay in the process of question generation (e.g.,

pretesting with real students). In this work we introduce R2DE (which is a

Regressor for Difficulty and Discrimination Estimation), a model capable of

assessing newly generated multiple-choice questions by looking at the text of

the question and the text of the possible choices. In particular, it can

estimate the difficulty and the discrimination of each question, as they are

defined in Item Response Theory. We also present the results of extensive

experiments we carried out on a real world large scale dataset coming from an

e-learning platform, showing that our model can be used to perform an initial

assessment of newly created questions and ease some of the problems that arise

in question generation.

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300

Zoltán Tüske , George Saon , Kartik Audhkhasi , Brian Kingsbury

Comments: 5 pages, 2 figures

Subjects

:

Audio and Speech Processing (eess.AS)

; Computation and Language (cs.CL)

It is generally believed that direct sequence-to-sequence (seq2seq) speech

recognition models are competitive with hybrid models only when a large amount

of data, at least a thousand hours, is available for training. In this paper,

we show that state-of-the-art recognition performance can be achieved on the

Switchboard-300 database using a single headed attention, LSTM based model.

Using a cross-utterance language model, our single-pass speaker independent

system reaches 6.4% and 12.5% word error rate (WER) on the Switchboard and

CallHome subsets of Hub5’00, without a pronunciation lexicon. While careful

regularization and data augmentation are crucial in achieving this level of

performance, experiments on Switchboard-2000 show that nothing is more useful

than more data.

SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions

Ramprasaath R. Selvaraju , Purva Tendulkar , Devi Parikh , Eric Horvitz , Marco Ribeiro , Besmira Nushi , Ece Kamar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Existing VQA datasets contain questions with varying levels of complexity.

While the majority of questions in these datasets require perception for

recognizing existence, properties, and spatial relationships of entities, a

significant portion of questions pose challenges that correspond to reasoning

tasks — tasks that can only be answered through a synthesis of perception and

knowledge about the world, logic and / or reasoning. This distinction allows us

to notice when existing VQA models have consistency issues — they answer the

reasoning question correctly but fail on associated low-level perception

questions. For example, models answer the complex reasoning question “Is the

banana ripe enough to eat?” correctly, but fail on the associated perception

question “Are the bananas mostly green or yellow?” indicating that the model

likely answered the reasoning question correctly but for the wrong reason. We

quantify the extent to which this phenomenon occurs by creating a new Reasoning

split of the VQA dataset and collecting Sub-VQA, a new dataset consisting of

200K new perception questions which serve as sub questions corresponding to the

set of perceptual tasks needed to effectively answer the complex reasoning

questions in the Reasoning split. Additionally, we propose an approach called

Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the

model to attend do the same parts of the image when answering the reasoning

question and the perception sub questions. We show that SQuINT improves model

consistency by 7.8%, also marginally improving its performance on the Reasoning

questions in VQA, while also displaying qualitatively better attention maps.

Ranking Significant Discrepancies in Clinical Reports

Sean MacAvaney , Arman Cohan , Nazli Goharian , Ross Filice

Comments: ECIR 2020 (short)

Subjects

:

Information Retrieval (cs.IR)

; Computation and Language (cs.CL)

Medical errors are a major public health concern and a leading cause of death

worldwide. Many healthcare centers and hospitals use reporting systems where

medical practitioners write a preliminary medical report and the report is

later reviewed, revised, and finalized by a more experienced physician. The

revisions range from stylistic to corrections of critical errors or

misinterpretations of the case. Due to the large quantity of reports written

daily, it is often difficult to manually and thoroughly review all the

finalized reports to find such errors and learn from them. To address this

challenge, we propose a novel ranking approach, consisting of textual and

ontological overlaps between the preliminary and final versions of reports. The

approach learns to rank the reports based on the degree of discrepancy between

the versions. This allows medical practitioners to easily identify and learn

from the reports in which their interpretation most substantially differed from

that of the attending physician (who finalized the report). This is a crucial

step towards uncovering potential errors and helping medical practitioners to

learn from such errors, thus improving patient-care in the long run. We

evaluate our model on a dataset of radiology reports and show that our approach

outperforms both previously-proposed approaches and more recent language models

by 4.5% to 15.4%.

Distributed, Parallel, and Cluster Computing

Lattice QCD on a novel vector architecture

Benjamin Huth , Nils Meyer , Tilo Wettig

Comments: Proceedings of Lattice 2019, 7 pages, 6 colorful figures

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

; High Energy Physics – Lattice (hep-lat)

The SX-Aurora TSUBASA PCIe accelerator card is the newest model of NEC’s SX

architecture family. Its multi-core vector processor features a vector length

of 16 kbits and interfaces with up to 48 GB of HBM2 memory in the current

models, available since 2018. The compute performance is up to 2.45 TFlop/s

peak in double precision, and the memory throughput is up to 1.2 TB/s peak. New

models with improved performance characteristics are announced for the near

future. In this contribution we discuss key aspects of the SX-Aurora and

describe how we enabled the architecture in the Grid Lattice QCD framework.

An IoT Platform-as-a-service for NFV Based — Hybrid Cloud / Fog Systems

Carla Mouradian , Fereshteh Ebrahimnezhad , Yassine Jebbar , Jasmeen Kaur Ahluwalia , Seyedeh Negar Afrasiabi , Roch H. Glitho , Ashok Moghe Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Software Engineering (cs.SE)

Cloud computing, despite its inherent advantages (e.g., resource efficiency)

still faces several challenges. the wide are network used to connect the cloud

to end-users could cause high latency, which may not be tolerable for some

applications, especially Internet of Things (IoT applications. Fog computing

can reduce this latency by extending the traditional cloud architecture to the

edge of the network and by enabling the deployment of some application

components on fog nodes. Application providers use Platform-as-a-Service (PaaS)

to provision (i.e., develop, deploy, manage, and orchestrate) applications in

cloud. However, existing PaaS solutions (including IoT PaaS) usually focus on

cloud and do not enable provisioning of applications with components spanning

cloud and fog. provisioning such applications require novel functions, such as

application graph generation, that are absent from existing PaaS. Furthermore,

several functions offered by existing PaaS (e.g., publication/discovery) need

to be significantly extended in order to fit in a hybrid cloud/fog environment.

In this paper, we propose a novel architecture for PaaS for hybrid cloud/fog

system. It is IoT use case-driven, and its applications’ components are

implemented as Virtual Network Functions (VNFs) with execution sequences

modeled s graphs with sub-structures such as selection and loops. It automates

the provisioning of applications with components spanning cloud and fog. In

addition, it enables the discovery of existing cloud and fog nodes and

generates application graphs. A proof of concept is built based on Cloudify

open source. Feasibility is demonstrated by evaluating its performance when

PaaS modules and application components are placed in clouds and fogs in

different geographical locations.

Self Organization Agent Oriented Dynamic Resource Allocation on Open Federated Clouds Environment

Kemchi Sofiane , Abdelhafid Zitouni (LIRE), Mahieddine Djoudi (TECHNÉ – EA 6316)

Journal-ref: International Conference on Cloud Computing Technologies and

Applications, CLOUDTECH 2016, May 2016, Marrakech, Morocco

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

To ensure uninterrupted services to the cloud clients from federated cloud

providers, it is important to guarantee an efficient allocation of the cloud

resources to users to improve the rate of client satisfaction and the quality

of the service provisions. It is better to get as more computing and storage

resources as possible. In cloud domain several Multi Agent Resource Allocation

methods have been proposed to implement the problem of dynamic resource

allocation. However the problem is still open and many works to do in this

field. In cloud computing robustness is important so in this paper we focus on

auto-adaptive method to deal with changes of open federated cloud computing

environment. Our approach is hybrid, we first adopt an existing organizations

optimization approach for self organization in broker agent organization to

combine it with already existing Multi Agent Resource Allocation approach on

Federated Clouds. We consider an open clouds federation environment which is

dynamic and in constant evolution, new cloud operators can join the federation

or leave this one. At the same time our approach is multi criterion which can

take in account various parameters (i.e. computing load balance of mediator

agent, geographical distance (network delay) between costumer and provider…).

Serverless Straggler Mitigation using Local Error-Correcting Codes

Vipul Gupta , Dominic Carrano , Yaoqing Yang , Vaishaal Shankar , Thomas Courtade , Kannan Ramchandran Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Information Theory (cs.IT)

Inexpensive cloud services, such as serverless computing, are often

vulnerable to straggling nodes that increase end-to-end latency for distributed

computation. We propose and implement simple yet principled approaches for

straggler mitigation in serverless systems for matrix multiplication and

evaluate them on several common applications from machine learning and

high-performance computing. The proposed schemes are inspired by

error-correcting codes and employ parallel encoding and decoding over the data

stored in the cloud using serverless workers. This creates a fully distributed

computing framework without using a master node to conduct encoding or

decoding, which removes the computation, communication and storage bottleneck

at the master. On the theory side, we establish that our proposed scheme is

asymptotically optimal in terms of decoding time and provide a lower bound on

the number of stragglers it can tolerate with high probability. Through

extensive experiments, we show that our scheme outperforms existing schemes

such as speculative execution and other coding theoretic methods by at least

25%.

Transparently Capturing Request Execution Path for Anomaly Detection

Yong Yang , Long Wang , Jing Gu , Ying Li

Comments: 13pages, 7 figures

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

With the increasing scale and complexity of cloud systems and big data

analytics platforms, it is becoming more and more challenging to understand and

diagnose the processing of a service request in such distributed platforms. One

way that helps to deal with this problem is to capture the complete end-to-end

execution path of service requests among all involved components accurately.

This paper presents REPTrace, a generic methodology for capturing such

execution paths in a transparent fashion. We analyze a comprehensive list of

execution scenarios, and propose principles and algorithms for generating the

end-to-end request execution path for all the scenarios. Moreover, this paper

presents an anomaly detection approach exploiting request execution paths to

detect anomalies of the execution during request processing. The experiments on

four popular distributed platforms with different workloads show that REPTrace

can transparently capture the accurate request execution path with reasonable

latency and negligible network overhead. Fault injection experiments show that

execution anomalies are detected with high recall (96%).

A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels

Lorenz Braun , Sotirios Nikas , Chen Song , Vincent Heuveline , Holger Fröning Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Machine Learning (cs.LG); Performance (cs.PF)

Characterizing compute kernel execution behavior on GPUs for efficient task

scheduling is a non trivial task. We address this with a simple model enabling

portable and fast predictions among different GPUs using only

hardware-independent features extracted. This model is built based on random

forests using 189 individual compute kernels from benchmarks such as Parboil,

Rodinia, Polybench-GPU and SHOC. Evaluation of the model performance using

cross-validation yields a median Mean Average Percentage Error (MAPE) of

[13.45%, 44.56%] and [1.81%, 2.91%], for time respectively power prediction on

five different GPUs, while latency for a single prediction varies between 0.1

and 0.2 seconds.

OpenMP Parallelization of Dynamic Programming and Greedy Algorithms

Claude Tadonki Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

Multicore has emerged as a typical architecture model since its advent and

stands now as a standard. The trend is to increase the number of cores and

improve the performance of the memory system. Providing an efficient multicore

implementation for a important algorithmic kernel is a genuine contribution.

From a methodology standpoint, this should be done at the level of the

underlying paradigm if any. In this paper, we study the cases of {em dynamic

programming} and {em greedy algorithms}, which are two major algorithmic

paradigms. We exclusively consider directives-based loop parallelization

through OpenMP and investigate necessary pre-transformations to reach a regular

parallel form. We evaluate our methodology with a selection of well-known

combinatorial optimization problems on an INTEL Broadwell processor. Key points

for scalability are discussed before and after experimental results. Our

immediate perspective is to extend our study to the manycore case, with a

special focus on NUMA configurations.

Blockchain Consensuses Algorithms: A Survey

Md Sadek Ferdous , Mohammad Jabed Morshed Chowdhury , Mohammad A. Hoque , Alan Colman Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

In recent years, blockchain technology has received unparalleled attention

from academia, industry, and governments all around the world. It is considered

a technological breakthrough anticipated to disrupt several application

domains. This has resulted in a plethora of blockchain systems for various

purposes. However, many of these blockchain systems suffer from serious

shortcomings related to their performance and security, which need to be

addressed before any wide-scale adoption can be achieved. A crucial component

of any blockchain system is its underlying consensus algorithm, which in many

ways, determines its performance and security. Therefore, to address the

limitations of different blockchain systems, several existing as well novel

consensus algorithms have been introduced. A systematic analysis of these

algorithms will help to understand how and why any particular blockchain

performs the way it functions. However, the existing studies of consensus

algorithms are not comprehensive. Those studies have incomplete discussions on

the properties of the algorithms and fail to analyse several major blockchain

consensus algorithms in terms of their scopes. This article fills this gap by

analysing a wide range of consensus algorithms using a comprehensive taxonomy

of properties and by examining the implications of different issues still

prevalent in consensus algorithms in detail. The result of the analysis is

presented in tabular formats, which provides a visual illustration of these

algorithms in a meaningful way. We have also analysed more than hundred top

crypto-currencies belonging to different categories of consensus algorithms to

understand their properties and to implicate different trends in these

crypto-currencies. Finally, we have presented a decision tree of algorithms to

be used as a tool to test the suitability of consensus algorithms under

different criteria.

2PS: High-Quality Edge Partitioning with Two-Phase Streaming

Ruben Mayer , Kamil Orujzade , Hans-Arno Jacobsen

Comments: in submission

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

Graph partitioning is an important preprocessing step to distributed graph

processing. In edge partitioning, the edge set of a given graph is split into

(k) equally-sized partitions, such that the replication of vertices across

partitions is minimized. Streaming is a viable approach to partition graphs

that exceed the memory capacities of a single server. The graph is ingested as

a stream of edges, and one edge at a time is immediately and irrevocably

assigned to a partition based on a scoring function. However, streaming

partitioning suffers from the uninformed assignment problem: At the time of

partitioning early edges in the stream, there is no information available about

the rest of the edges. As a consequence, edge assignments are often driven by

balancing considerations, and the achieved replication factor is comparably

high. In this paper, we propose 2PS, a novel two-phase streaming algorithm for

high-quality edge partitioning. In the first phase, vertices are separated into

clusters by a lightweight streaming clustering algorithm. In the second phase,

the graph is re-streamed and edge partitioning is performed while taking into

account the clustering of the vertices from the first phase. Our evaluations

show that 2PS can achieve a replication factor that is comparable to

heavy-weight random access partitioners while inducing orders of magnitude

lower memory overhead.

Distributed Vehicular Computing at the Dawn of 5G: a Survey

Ahmad Alhilal , Tristan Braud , Pan Hui

Comments: 34 pages, 10 figures, Journal

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

Recent advances in information technology have revolutionized the automotive

industry, paving the way for next-generation smart and connected vehicles.

Connected vehicles can collaborate to deliver novel services and applications.

These services and applications require 1) massive volumes of data that

perceive ambient environments, 2) ultra-reliable and low-latency communication

networks, 3) real-time data processing which provides decision support under

application-specific constraints. Addressing such constraints introduces

significant challenges with current communication and computation technologies.

Coincidentally, the fifth generation of cellular networks (5G) was developed to

respond to communication challenges by providing an infrastructure for

low-latency, high-reliability, and high bandwidth communication. At the core of

this infrastructure, edge computing allows data offloading and computation at

the edge of the network, ensuring low-latency and context-awareness, and

pushing the utilization efficiency of 5G to its limit. In this paper, we aim at

providing a comprehensive overview of the state of research on vehicular

computing in the emerging age of 5G. After reviewing the main vehicular

applications requirements and challenges, we follow a bottom-up approach,

starting with the promising technologies for vehicular communications, all the

way up to Artificial Intelligence (AI) solutions. We explore the various

architectures for vehicular computing, including centralized Cloud Computing,

Vehicular Cloud Computing, and Vehicular Edge computing, and investigate the

potential data analytics technologies and their integration on top of the

vehicular computing architectures. We finally discuss several future research

directions and applications for vehicular computation systems.

BAASH: Enabling Blockchain-as-a-Service on High-Performance Computing Systems

Abdullah Al-Mamun , Dongfang Zhao Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

The state-of-the-art approach to manage blockchains is to process blocks of

transactions in a shared-nothing environment. Although blockchains have the

potential to provide various services for high-performance computing (HPC)

systems, HPC will not be able to embrace blockchains before the following two

missing pieces become available: (i) new consensus protocols being aware of the

shared-storage architecture in HPC, and (ii) new fault-tolerant mechanisms

compensating for HPC’s programming model—the message passing interface

(MPI)—that is vulnerable for blockchain-like workloads. To this end, we

design a new set of consensus protocols crafted for the HPC platforms and a new

fault-tolerance subsystem compensating for the failures caused by faulty MPI

processes. Built on top of the new protocols and fault-tolerance mechanism, a

prototype system is implemented and evaluated with two million transactions on

a 500-core HPC cluster, showing (6 imes), (12 imes), and (75 imes) higher

throughput than Hyperldeger, Ethereum, and Parity, respectively.

Contract-connection:An efficient communication protocol for Distributed Ledger Technology

Yibin Xu , Yangyu Huang

Journal-ref: 2019 IEEE 38th International Performance Computing and

Communications Conference (IPCCC)

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

; Cryptography and Security (cs.CR)

Distributed Ledger Technology (DLT) is promising to become the foundation of

many decentralised systems. However, the unbalanced and unregulated network

layout contributes to the inefficiency of DLT especially in the Internet of

Things (IoT) environments, where nodes connect to only a limited number of

peers. The data communication speed globally is unbalanced and does not live up

to the constraints of efficient real-time distributed systems. In this paper,

we introduce a new communication protocol, which enables nodes to calculate the

tradeoff between connecting/disconnecting a peer in a completely decentralised

manner. The network layout globally is continuously re-balancing and optimising

along with nodes adjusting their peers. This communication protocol weakened

the inequality of the communication network. The experiment suggests this

communication protocol is stable and efficient.

The Parallelism Motifs of Genomic Data Analysis

Katherine Yelick , Aydin Buluc , Muaaz Awan , Ariful Azad , Benjamin Brock , Rob Egan , Saliya Ekanayake , Marquita Ellis , Evangelos Georganas , Giulia Guidi , Steven Hofmeyr , Oguz Selvitopi , Cristina Teodoropol , Leonid Oliker Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

Genomic data sets are growing dramatically as the cost of sequencing

continues to decline and small sequencing devices become available. Enormous

community databases store and share this data with the research community, but

some of these genomic data analysis problems require large scale computational

platforms to meet both the memory and computational requirements. These

applications differ from scientific simulations that dominate the workload on

high end parallel systems today and place different requirements on programming

support, software libraries, and parallel architectural design. For example,

they involve irregular communication patterns such as asynchronous updates to

shared data structures. We consider several problems in high performance

genomics analysis, including alignment, profiling, clustering, and assembly for

both single genomes and metagenomes. We identify some of the common

computational patterns or motifs that help inform parallelization strategies

and compare our motifs to some of the established lists, arguing that at least

two key patterns, sorting and hashing, are missing.

75,000,000,000 Streaming Inserts/Second Using Hierarchical Hypersparse GraphBLAS Matrices

Jeremy Kepner , Tim Davis , Chansup Byun , William Arcand , David Bestor , William Bergeron , Vijay Gadepally , Matthew Hubbell , Michael Houle , Michael Jones , Anna Klein , Peter Michaleas , Lauren Milechin , Julie Mullen , Andrew Prout , Antonio Rosa , Siddharth Samsi , Charles Yee , Albert Reuther

Comments: 3 pages, 2 figures, 28 references, accepted to Northeast Database Day (NEDB) 2020. arXiv admin note: substantial text overlap with arXiv:1907.04217

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

; Databases (cs.DB); Data Structures and Algorithms (cs.DS); Performance (cs.PF); Social and Information Networks (cs.SI)

The SuiteSparse GraphBLAS C-library implements high performance hypersparse

matrices with bindings to a variety of languages (Python, Julia, and

Matlab/Octave). GraphBLAS provides a lightweight in-memory database

implementation of hypersparse matrices that are ideal for analyzing many types

of network data, while providing rigorous mathematical guarantees, such as

linearity. Streaming updates of hypersparse matrices put enormous pressure on

the memory hierarchy. This work benchmarks an implementation of hierarchical

hypersparse matrices that reduces memory pressure and dramatically increases

the update rate into a hypersparse matrices. The parameters of hierarchical

hypersparse matrices rely on controlling the number of entries in each level in

the hierarchy before an update is cascaded. The parameters are easily tunable

to achieve optimal performance for a variety of applications. Hierarchical

hypersparse matrices achieve over 1,000,000 updates per second in a single

instance. Scaling to 31,000 instances of hierarchical hypersparse matrices

arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update

rate of 75,000,000,000 updates per second. This capability allows the MIT

SuperCloud to analyze extremely large streaming network data sets.

CycLedger: A Scalable and Secure Parallel Protocol for Distributed Ledger via Sharding

Mengqian Zhang , Jichen Li , Zhaohua Chen , Hongyin Chen , Xiaotie Deng

Comments: short version in IPDPS 2020

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

; Cryptography and Security (cs.CR)

Traditional public distributed ledgers have not been able to scale-out well

and work efficiently. Sharding is deemed as a promising way to solve this

problem. By partitioning all nodes into small committees and letting them work

in parallel, we can significantly lower the amount of communication and

computation, reduce the overhead on each node’s storage, as well as enhance the

throughput of distributed ledger. Existing sharding-based protocols still

suffer from several serious drawbacks. The first thing is that all honest nodes

must connect well with each other, which demands a huge number of communication

channels in the network. Moreover, previous protocols have face great loss in

efficiency in the case where the honesty of each committee’s leader is in

question. At the same time, no explicit incentive is provided for nodes to

actively participate in the protocol.

We present CycLedger, a scalable and secure parallel protocol for distributed

ledger via sharding. Our protocol selects a leader and a partial set for each

committee, who are in charge of maintaining intra-shard consensus and

communicating with other committees, to reduce the amortized complexity of

communication, computation and storage on all nodes. We introduce a novel

commitment scheme between committees and a recovery procedure to prevent the

system from crashing even when leaders of committees are malicious. To add

incentive for the network, we use the concept of reputation, which measures

each node’s computing power. As nodes with higher reputation receive more

rewards, there is an encouragement for nodes with strong computing ability to

work honestly so as to gain reputation. In this way, we strike out a new path

to establish scalability, security and incentive for the sharding-based

distributed ledger.

Combining Federated and Active Learning for Communication-efficient Distributed Failure Prediction in Aeronautics

Nicolas Aussel (INF, ACMES-SAMOVAR, IP Paris), Sophie Chabridon (IP Paris, INF, ACMES-SAMOVAR), Yohan Petetin (TIPIC-SAMOVAR, CITI, IP Paris) Subjects : Artificial Intelligence (cs.AI) ; Distributed, Parallel, and Cluster Computing (cs.DC)

Machine Learning has proven useful in the recent years as a way to achieve

failure prediction for industrial systems. However, the high computational

resources necessary to run learning algorithms are an obstacle to its

widespread application. The sub-field of Distributed Learning offers a solution

to this problem by enabling the use of remote resources but at the expense of

introducing communication costs in the application that are not always

acceptable. In this paper, we propose a distributed learning approach able to

optimize the use of computational and communication resources to achieve

excellent learning model performances through a centralized architecture. To

achieve this, we present a new centralized distributed learning algorithm that

relies on the learning paradigms of Active Learning and Federated Learning to

offer a communication-efficient method that offers guarantees of model

precision on both the clients and the central server. We evaluate this method

on a public benchmark and show that its performances in terms of precision are

very close to state-of-the-art performance level of non-distributed learning

despite additional constraints.

Fast Sequence-Based Embedding with Diffusion Graphs

Benedek Rozemberczki , Rik Sarkar

Comments: Source code available at: this https URL

Journal-ref: CompleNet 2018

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

A graph embedding is a representation of graph vertices in a low-dimensional

space, which approximately preserves properties such as distances between

nodes. Vertex sequence-based embedding procedures use features extracted from

linear sequences of nodes to create embeddings using a neural network. In this

paper, we propose diffusion graphs as a method to rapidly generate vertex

sequences for network embedding. Its computational efficiency is superior to

previous methods due to simpler sequence generation, and it produces more

accurate results. In experiments, we found that the performance relative to

other methods improves with increasing edge density in the graph. In a

community detection task, clustering nodes in the embedding space produces

better results compared to other sequence-based embedding methods.

PoAh: A Novel Consensus Algorithm for Fast Scalable Private Blockchain for Large-scale IoT Frameworks

Deepak Puthal , Saraju P. Mohanty , Venkata P. Yanambaka , Elias Kougianos

Comments: 26 pages, 18 figures

Subjects

:

Cryptography and Security (cs.CR)

; Distributed, Parallel, and Cluster Computing (cs.DC)

In today’s connected world, resource constrained devices are deployed for

sensing and decision making applications, ranging from smart cities to

environmental monitoring. Those recourse constrained devices are connected to

create real-time distributed networks popularly known as the Internet of Things

(IoT), fog computing and edge computing. The blockchain is gaining a lot of

interest in these domains to secure the system by ignoring centralized

dependencies, where proof-of-work (PoW) plays a vital role to make the whole

security solution decentralized. Due to the resource limitations of the

devices, PoW is not suitable for blockchain-based security solutions. This

paper presents a novel consensus algorithm called Proof-of-Authentication

(PoAh), which introduces a cryptographic authentication mechanism to replace

PoW for resource constrained devices, and to make the blockchain

application-specific. PoAh is thus suitable for private as well as permissioned

blockchains. Further, PoAh not only secures the systems, but also maintains

system sustainability and scalability. The proposed consensus algorithm is

evaluated theoretically in simulation scenarios, and in real-time hardware

testbeds to validate its performance. Finally, PoAh and its integration with

the blockchain in the IoT and edge computing scenarios is discussed. The

proposed PoAh, while running in limited computer resources (e.g. single-board

computing devices like the Raspberry Pi) has a latency in the order of 3 secs.

Bivariate Polynomial Coding for Exploiting Stragglers in Heterogeneous Coded Computing Systems

Burak Hasircioglu , Jesus Gomez-Vilardebo , Deniz Gunduz Subjects : Information Theory (cs.IT) ; Distributed, Parallel, and Cluster Computing (cs.DC)

Polynomial coding has been proposed as a solution to the straggler mitigation

problem in distributed matrix multiplication. Previous works in the literature

employ univariate polynomials to encode matrix partitions. Such schemes greatly

improve the speed of distributed computing systems by making the task

completion time to depend only on the fastest workers. However, the work done

by the slowest workers, which fails to finish the task assigned to them, is

completely ignored. In order to exploit the partial computations of the slower

workers, we further decompose the overall matrix multiplication task into even

smaller subtasks to better fit workers’ storage and computation capacities. In

this work, we show that univariate schemes fail to make an efficient use of the

storage capacity and we propose bivariate polynomial codes. We show that

bivariate polynomial codes are a more natural choice to accommodate the

additional decomposition of subtasks, as well as, heterogeneous storage and

computation resources at workers. However, in contrast to univariate polynomial

decoding, for multivariate interpolation guarantying decodability is much

harder. We propose two bivartiate polynomial schemes. The first scheme exploits

the fact that bivariate interpolation is always possible for rectangular grid

of points. We obtain the rectangular grid of points at the cost of allowing

some redundant computations. For the second scheme, we relax the decoding

constraint, and require decodability for almost all choices of evaluation

points. We present interpolation sets satisfying the almost decodability

conditions for certain storage configurations of workers. Our numerical results

show that bivariate polynomial coding considerably reduces the completion time

of distributed matrix multiplication.

Finding temporal patterns using algebraic fingerprints

Suhas Thejaswi , Aristides Gionis Subjects : Data Structures and Algorithms (cs.DS) ; Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)

In this paper we study a family of pattern-detection problems in

vertex-colored temporal graphs. In particular, given a vertex-colored temporal

graph and a multi-set of colors as a query, we search for temporal paths in the

graph that contain the colors specified in the query. These types of problems

have several interesting applications, for example, recommending tours for

tourists, or searching for abnormal behavior in a network of financial

transactions.

For the family of pattern-detection problems we define, we establish

complexity results and design an algebraic-algorithmic framework based on

constrained multilinear sieving. We demonstrate that our solution can scale to

massive graphs with up to hundred million edges, despite the problems being

NP-hard. Our implementation, which is publicly available, exhibits practical

edge-linear scalability and highly optimized. For example, in a real-world

graph dataset with more than six million edges and a multi-set query with ten

colors, we can extract an optimal solution in less than eight minutes on a

haswell desktop with four cores.

High-Quality Hierarchical Process Mapping

Marcelo Fonseca Faraj , Alexander van der Grinten , Henning Meyerhenke , Jesper Larsson Träff , Christian Schulz Subjects : Data Structures and Algorithms (cs.DS) ; Distributed, Parallel, and Cluster Computing (cs.DC)

Partitioning graphs into blocks of roughly equal size such that few edges run

between blocks is a frequently needed operation when processing graphs on a

parallel computer. When a topology of a distributed system is known an

important task is then to map the blocks of the partition onto the processors

such that the overall communication cost is reduced. We present novel

multilevel algorithms that integrate graph partitioning and process mapping.

Important ingredients of our algorithm include fast label propagation, more

localized local search, initial partitioning, as well as a compressed data

structure to compute processor distances without storing a distance matrix.

Experiments indicate that our algorithms speed up the overall mapping process

and, due to the integrated multilevel approach, also find much better solutions

in practice. For example, one configuration of our algorithm yields better

solutions than the previous state-of-the-art in terms of mapping quality while

being a factor 62 faster. Compared to the currently fastest iterated multilevel

mapping algorithm Scotch, we obtain 16% better solutions while investing

slightly more running time.

Segment blockchain: A size reduced storage mechanism for blockchain

Yibin Xu , Yangyu Huang

Journal-ref: IEEE Access,2020. https://ieeexplore.ieee.org/document/8957450/

Subjects

:

Cryptography and Security (cs.CR)

; Distributed, Parallel, and Cluster Computing (cs.DC)

The exponential growth of the blockchain size has become a major contributing

factor that hinders the decentralisation of blockchain and its potential

implementations in data-heavy applications. In this paper, we propose segment

blockchain, an approach that segmentises blockchain and enables nodes to only

store a copy of one blockchain segment. We use emph{PoW} as a membership

threshold to limit the number of nodes taken by an Adversary—the Adversary

can only gain at most (n/2) of nodes in a network of (n) nodes when it has

(50\%) of the calculation power in the system (the Nakamoto blockchain security

threshold). A segment blockchain system fails when an Adversary stores all

copies of a segment, because the Adversary can then leave the system, causing a

permanent loss of the segment. We theoretically prove that segment blockchain

can sustain a ((AD/n)^m) failure probability when the Adversary has no more

than (AD) number of nodes and every segment is stored by (m) number of nodes.

The storage requirement is mostly shrunken compared to the traditional design

and therefore making the blockchain more suitable for data-heavy applications.

BlockHouse: Blockchain-based Distributed Storehouse System

Doriane Perard , Lucas Gicquel , Jérôme Lacan

Comments: Published in “9TH Latin-American Symposium on Dependable Computing”, 2019

Subjects

:

Cryptography and Security (cs.CR)

; Distributed, Parallel, and Cluster Computing (cs.DC)

We propose in this paper BlockHouse, a decentralized/P2P storage system fully

based on private blockchains. Each participant can rent his unused storage in

order to host data of other members. This system uses a dual Smart Contract and

Proof of Retrievability system to automatically check at a fixed frequency if

the file is still hosted. In addition to transparency, the blockchain allows a

better integration with all payments associated to this type of system (

regular payments, sequestration to ensure good behaviors of users, …). Except

the data transferred between the client and the server, all the actions go

through a smart contract in the blockchain in order to log, pay and secure the

entire storage process.

Learning

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Kihyuk Sohn , David Berthelot , Chun-Liang Li , Zizhao Zhang , Nicholas Carlini , Ekin D. Cubuk , Alex Kurakin , Han Zhang , Colin Raffel Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Semi-supervised learning (SSL) provides an effective means of leveraging

unlabeled data to improve a model’s performance. In this paper, we demonstrate

the power of a simple combination of two common SSL methods: consistency

regularization and pseudo-labeling. Our algorithm, FixMatch, first generates

pseudo-labels using the model’s predictions on weakly-augmented unlabeled

images. For a given image, the pseudo-label is only retained if the model

produces a high-confidence prediction. The model is then trained to predict the

pseudo-label when fed a strongly-augmented version of the same image. Despite

its simplicity, we show that FixMatch achieves state-of-the-art performance

across a variety of standard semi-supervised learning benchmarks, including

94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 — just

4 labels per class. Since FixMatch bears many similarities to existing SSL

methods that achieve worse performance, we carry out an extensive ablation

study to tease apart the experimental factors that are most important to

FixMatch’s success. We make our code available at

this https URL .

Deceptive AI Explanations: Creation and Detection

Johannes Schneider , Joshua Handali , Michalis Vlachos , Christian Meske Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Artificial intelligence comes with great opportunities and but also great

risks. We investigate to what extent deep learning can be used to create and

detect deceptive explanations that either aim to lure a human into believing a

decision that is not truthful to the model or provide reasoning that is

non-faithful to the decision. Our theoretical insights show some limits of

deception and detection in the absence of domain knowledge. For empirical

evaluation, we focus on text classification. To create deceptive explanations,

we alter explanations originating from GradCAM, a state-of-art technique for

creating explanations in neural networks. We evaluate the effectiveness of

deceptive explanations on 200 participants. Our findings indicate that

deceptive explanations can indeed fool humans. Our classifier can detect even

seemingly minor attempts of deception with accuracy that exceeds 80\% given

sufficient domain knowledge encoded in the form of training data.

Mobility Inference on Long-Tailed Sparse Trajectory

Lei Shi Subjects : Machine Learning (cs.LG) ; Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Analyzing the urban trajectory in cities has become an important topic in

data mining. How can we model the human mobility consisting of stay and travel

from the raw trajectory data? How can we infer such a mobility model from the

single trajectory information? How can we further generalize the mobility

inference to accommodate the real-world trajectory data that is sparsely

sampled over time?

In this paper, based on formal and rigid definitions of the stay/travel

mobility, we propose a single trajectory inference algorithm that utilizes a

generic long-tailed sparsity pattern in the large-scale trajectory data. The

algorithm guarantees a 100\% precision in the stay/travel inference with a

provable lower-bound in the recall. Furthermore, we introduce an

encoder-decoder learning architecture that admits multiple trajectories as

inputs. The architecture is optimized for the mobility inference problem

through customized embedding and learning mechanism. Evaluations with three

trajectory data sets of 40 million urban users validate the performance

guarantees of the proposed inference algorithm and demonstrate the superiority

of our deep learning model, in comparison to well-known sequence learning

methods. On extremely sparse trajectories, the deep learning model achieves a

2( imes) overall accuracy improvement from the single trajectory inference

algorithm, through proven scalability and generalizability to large-scale

versatile training data.

Generate High-Resolution Adversarial Samples by Identifying Effective Features

Sizhe Chen , Peidong Zhang , Chengjin Sun , Jia Cai , Xiaolin Huang Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

As the prevalence of deep learning in computer vision, adversarial samples

that weaken the neural networks emerge in large numbers, revealing their

deep-rooted defects. Most adversarial attacks calculate an imperceptible

perturbation in image space to fool the DNNs. In this strategy, the

perturbation looks like noise and thus could be mitigated. Attacks in feature

space produce semantic perturbation, but they could only deal with low

resolution samples. The reason lies in the great number of coupled features to

express a high-resolution image. In this paper, we propose Attack by

Identifying Effective Features (AIEF), which learns different weights for

features to attack. Effective features, those with great weights, influence the

victim model much but distort the image little, and thus are more effective for

attack. By attacking mostly on them, AIEF produces high resolution adversarial

samples with acceptable distortions. We demonstrate the effectiveness of AIEF

by attacking on different tasks with different generative models.

batchboost: regularization for stabilizing training with resistance to underfitting & overfitting

Maciej A. Czyzewski

Comments: 6 pages; 5 figures

Subjects

:

Machine Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Overfitting & underfitting and stable training are an important challenges in

machine learning. Current approaches for these issues are mixup, SamplePairing

and BC learning. In our work, we state the hypothesis that mixing many images

together can be more effective than just two. Batchboost pipeline has three

stages: (a) pairing: method of selecting two samples. (b) mixing: how to create

a new one from two samples. (c) feeding: combining mixed samples with new ones

from dataset into batch (with ratio (gamma)). Note that sample that appears in

our batch propagates with subsequent iterations with less and less importance

until the end of training. Pairing stage calculates the error per sample, sorts

the samples and pairs with strategy: hardest with easiest one, than mixing

stage merges two samples using mixup, (x_1 + (1-lambda)x_2). Finally, feeding

stage combines new samples with mixed by ratio 1:1. Batchboost has 0.5-3%

better accuracy than the current state-of-the-art mixup regularization on

CIFAR-10 & Fashion-MNIST. Our method is slightly better than SamplePairing

technique on small datasets (up to 5%). Batchboost provides stable training on

not tuned parameters (like weight decay), thus its a good method to test

performance of different architectures. Source code is at:

this https URL

EdgeNets:Edge Varying Graph Neural Networks

Elvin Isufi , Fernando Gama , Alejandro Ribeiro Subjects : Machine Learning (cs.LG) ; Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP); Machine Learning (stat.ML)

Driven by the outstanding performance of neural networks in the structured

Euclidean domain, recent years have seen a surge of interest in developing

neural networks for graphs and data supported on graphs. The graph is leveraged

as a parameterization to capture detail at the node level with a reduced number

of parameters and complexity. Following this rationale, this paper puts forth a

general framework that unifies state-of-the-art graph neural networks (GNNs)

through the concept of EdgeNet. An EdgeNet is a GNN architecture that allows

different nodes to use different parameters to weigh the information of

different neighbors. By extrapolating this strategy to more iterations between

neighboring nodes, the EdgeNet learns edge- and neighbor-dependent weights to

capture local detail. This is the most general local operation that a node can

do and encompasses under one formulation all graph convolutional neural

networks (GCNNs) as well as graph attention networks (GATs). In writing

different GNN architectures with a common language, EdgeNets highlight specific

architecture advantages and limitations, while providing guidelines to improve

their capacity without compromising their local implementation. For instance,

we show that GCNNs have a parameter sharing structure that induces permutation

equivariance. This can be an advantage or a limitation, depending on the

application. When it is a limitation, we propose hybrid approaches and provide

insights to develop several other solutions that promote parameter sharing

without enforcing permutation equivariance. Another interesting conclusion is

the unification of GCNNs and GATs -approaches that have been so far perceived

as separate. In particular, we show that GATs are GCNNs on a graph that is

learned from the features. This particularization opens the doors to develop

alternative attention mechanisms for improving discriminatory power.

Simple and Effective Graph Autoencoders with One-Hop Linear Models

Guillaume Salha , Romain Hennequin , Michalis Vazirgiannis

Comments: Under review. A preliminary version of this work has previously been presented at a NeurIPS 2019 workshop without proceedings: arXiv:1910.00942

Subjects

:

Machine Learning (cs.LG)

; Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged

as powerful node embedding methods, with promising performances on challenging

tasks such as link prediction and node clustering. Graph AE, VAE and most of

their extensions rely on graph convolutional networks (GCN) encoders to learn

vector space representations of nodes. In this paper, we propose to replace the

GCN encoder by a significantly simpler linear model w.r.t. the direct

neighborhood (one-hop) adjacency matrix of the graph. For the two

aforementioned tasks, we show that this approach consistently reaches

competitive performances w.r.t. GCN-based models for numerous real-world

graphs, including all benchmark datasets commonly used to evaluate graph AE and

VAE. We question the relevance of repeatedly using these datasets to compare

complex graph AE and VAE. We also emphasize the effectiveness of the proposed

encoding scheme, that appears as a simpler and faster alternative to GCN

encoders for many real-world applications.

Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data

Kyoung-Woon On , Eun-Sol Kim , Yu-Jung Heo , Byoung-Tak Zhang

Comments: 8 pages, 3 figures, Association for the Advancement of Artificial Intelligence (AAAI2020). arXiv admin note: substantial text overlap with arXiv:1907.01709

Subjects

:

Machine Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Conventional sequential learning methods such as Recurrent Neural Networks

(RNNs) focus on interactions between consecutive inputs, i.e. first-order

Markovian dependency. However, most of sequential data, as seen with videos,

have complex dependency structures that imply variable-length semantic flows

and their compositions, and those are hard to be captured by conventional

methods. Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for

learning video data by discovering these complex structures of the video. The

CB-GLNs represent video data as a graph, with nodes and edges corresponding to

frames of the video and their dependencies respectively. The CB-GLNs find

compositional dependencies of the data in multilevel graph forms via a

parameterized kernel with graph-cut and a message passing framework. We

evaluate the proposed method on the two different tasks for video

understanding: Video theme classification (Youtube-8M dataset) and Video

Question and Answering (TVQA dataset). The experimental results show that our

model efficiently learns the semantic compositional structure of video data.

Furthermore, our model achieves the highest performance in comparison to other

baseline methods.

Analytic Properties of Trackable Weak Models

Mark Chilenski , George Cybenko , Isaac Dekine , Piyush Kumar , Gil Raz

Comments: 10 pages, 9 figures

Subjects

:

Machine Learning (cs.LG)

; Combinatorics (math.CO); Machine Learning (stat.ML)

We present several new results on the feasibility of inferring the hidden

states in strongly-connected trackable weak models. Here, a weak model is a

directed graph in which each node is assigned a set of colors which may be

emitted when that node is visited. A hypothesis is a node sequence which is

consistent with a given color sequence. A weak model is said to be trackable if

the worst case number of such hypotheses grows as a polynomial in the sequence

length. We show that the number of hypotheses in strongly-connected trackable

models is bounded by a constant and give an expression for this constant. We

also consider the problem of reconstructing which branch was taken at a node

with same-colored out-neighbors, and show that it is always eventually possible

to identify which branch was taken if the model is strongly connected and

trackable. We illustrate these properties by assigning transition probabilities

and employing standard tools for analyzing Markov chains. In addition, we

present new results for the entropy rates of weak models according to whether

they are trackable or not. These theorems indicate that the combination of

trackability and strong connectivity dramatically simplifies the task of

reconstructing which nodes were visited. This work has implications for any

problem which can be described in terms of an agent traversing a colored graph,

such as the reconstruction of hidden states in a hidden Markov model (HMM).

Understanding the Limitations of Network Online Learning

Timothy LaRock , Timothy Sakharov , Sahely Bhadra , Tina Eliassi-Rad Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Studies of networked phenomena, such as interactions in online social media,

often rely on incomplete data, either because these phenomena are partially

observed, or because the data is too large or expensive to acquire all at once.

Analysis of incomplete data leads to skewed or misleading results. In this

paper, we investigate limitations of learning to complete partially observed

networks via node querying. Concretely, we study the following problem: given

(i) a partially observed network, (ii) the ability to query nodes for their

connections (e.g., by accessing an API), and (iii) a budget on the number of

such queries, sequentially learn which nodes to query in order to maximally

increase observability. We call this querying process Network Online Learning

and present a family of algorithms called NOL*. These algorithms learn to

choose which partially observed node to query next based on a parameterized

model that is trained online through a process of exploration and exploitation.

Extensive experiments on both synthetic and real world networks show that (i)

it is possible to sequentially learn to choose which nodes are best to query in

a network and (ii) some macroscopic properties of networks, such as the degree

distribution and modular structure, impact the potential for learning and the

optimal amount of random exploration.

Motif Difference Field: A Simple and Effective Image Representation of Time Series for Classification

Yadong Zhang , Xin Chen Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Time series motifs play an important role in the time series analysis. The

motif-based time series clustering is used for the discovery of higher-order

patterns or structures in time series data. Inspired by the convolutional

neural network (CNN) classifier based on the image representations of time

series, motif difference field (MDF) is proposed. Compared to other image

representations of time series, MDF is simple and easy to construct. With the

Fully Convolution Network (FCN) as the classifier, MDF demonstrates the

superior performance on the UCR time series dataset in benchmark with other

time series classification methods. It is interesting to find that the triadic

time series motifs give the best result in the test. Due to the motif

clustering reflected in MDF, the significant motifs are detected with the help

of the Gradient-weighted Class Activation Mapping (Grad-CAM). The areas in MDF

with high weight in Grad-CAM have a high contribution from the significant

motifs with the desired ordinal patterns associated with the signature patterns

in time series. However, the signature patterns cannot be identified with the

neural network classifiers directly based on the time series.

R2DE: a NLP approach to estimating IRT parameters of newly generated questions

Luca Benedetto , Andrea Cappelli , Roberto Turrin , Paolo Cremonesi Subjects : Machine Learning (cs.LG) ; Computation and Language (cs.CL); Machine Learning (stat.ML)

The main objective of exams consists in performing an assessment of students’

expertise on a specific subject. Such expertise, also referred to as skill or

knowledge level, can then be leveraged in different ways (e.g., to assign a

grade to the students, to understand whether a student might need some support,

etc.). Similarly, the questions appearing in the exams have to be assessed in

some way before being used to evaluate students. Standard approaches to

questions’ assessment are either subjective (e.g., assessment by human experts)

or introduce a long delay in the process of question generation (e.g.,

pretesting with real students). In this work we introduce R2DE (which is a

Regressor for Difficulty and Discrimination Estimation), a model capable of

assessing newly generated multiple-choice questions by looking at the text of

the question and the text of the possible choices. In particular, it can

estimate the difficulty and the discrimination of each question, as they are

defined in Item Response Theory. We also present the results of extensive

experiments we carried out on a real world large scale dataset coming from an

e-learning platform, showing that our model can be used to perform an initial

assessment of newly created questions and ease some of the problems that arise

in question generation.

Classifying Wikipedia in a fine-grained hierarchy: what graphs can contribute

Tiphaine Viard , Thomas McLachlan , Hamidreza Ghader , Satoshi Sekine

Comments: 7 pages

Subjects

:

Machine Learning (cs.LG)

; Machine Learning (stat.ML)

Wikipedia is a huge opportunity for machine learning, being the largest

semi-structured base of knowledge available. Because of this, countless works

examine its contents, and focus on structuring it in order to make it usable in

learning tasks, for example by classifying it into an ontology. Beyond its

textual contents, Wikipedia also displays a typical graph structure, where

pages are linked together through citations. In this paper, we address the task

of integrating graph (i.e. structure) information to classify Wikipedia into a

fine-grained named entity ontology (NE), the Extended Named Entity hierarchy.

To address this task, we first start by assessing the relevance of the graph

structure for NE classification. We then explore two directions, one related to

feature vectors using graph descriptors commonly used in large-scale network

analysis, and one extending flat classification to a weighted model taking into

account semantic similarity. We conduct at-scale practical experiments, on a

manually labeled subset of 22,000 pages extracted from the Japanese Wikipedia.

Our results show that integrating graph information succeeds at reducing

sparsity of the input feature space, and yields classification results that are

comparable or better than previous works.

Ensemble Genetic Programming

Nuno M. Rodrigues , João E. Batista , Sara Silva

Comments: eurogp 2020 submission

Subjects

:

Machine Learning (cs.LG)

; Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Ensemble learning is a powerful paradigm that has been usedin the top

state-of-the-art machine learning methods like Random Forestsand XGBoost.

Inspired by the success of such methods, we have devel-oped a new Genetic

Programming method called Ensemble GP. The evo-lutionary cycle of Ensemble GP

follows the same steps as other GeneticProgramming systems, but with

differences in the population structure,fitness evaluation and genetic

operators. We have tested this method oneight binary classification problems,

achieving results significantly betterthan standard GP, with much smaller

models. Although other methodslike M3GP and XGBoost were the best overall,

Ensemble GP was able toachieve exceptionally good generalization results on a

particularly hardproblem where none of the other methods was able to succeed.

Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping

Eugenio Bargiacchi , Timothy Verstraeten , Diederik M. Roijers , Ann Nowé Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

We present a new model-based reinforcement learning algorithm, Cooperative

Prioritized Sweeping, for efficient learning in multi-agent Markov decision

processes. The algorithm allows for sample-efficient learning on large problems

by exploiting a factorization to approximate the value function. Our approach

only requires knowledge about the structure of the problem in the form of a

dynamic decision network. Using this information, our method learns a model of

the environment and performs temporal difference updates which affect multiple

joint states and actions at once. Batch updates are additionally performed

which efficiently back-propagate knowledge throughout the factored Q-function.

Our method outperforms the state-of-the-art algorithm sparse cooperative

Q-learning algorithm, both on the well-known SysAdmin benchmark and randomized

environments.

Node Masking: Making Graph Neural Networks Generalize and Scale Better

Pushkar Mishra , Aleksandra Piktus , Gerard Goossen , Fabrizio Silvestri Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Graph Neural Networks (GNNs) have received a lot of interest in the recent

times. From the early spectral architectures that could only operate on

undirected graphs per a transductive learning paradigm to the current state of

the art spatial ones that can apply inductively to arbitrary graphs, GNNs have

seen significant contributions from the research community. In this paper, we

discuss some theoretical tools to better visualize the operations performed by

state of the art spatial GNNs. We analyze the inner workings of these

architectures and introduce a simple concept, node masking, that allows them to

generalize and scale better. To empirically validate the theory, we perform

several experiments on two widely used benchmark datasets for node

classification in both transductive and inductive settings.

The gap between theory and practice in function approximation with deep neural networks

Ben Adcock , Nick Dexter Subjects : Machine Learning (cs.LG) ; Numerical Analysis (math.NA); Machine Learning (stat.ML)

Deep learning (DL) is transforming whole industries as complicated

decision-making processes are being automated by Deep Neural Networks (DNNs)

trained on real-world data. Driven in part by a rapidly-expanding literature on

DNN approximation theory showing that DNNs can approximate a rich variety of

functions, these tools are increasingly being considered for problems in

scientific computing. Yet, unlike more traditional algorithms in this field,

relatively little is known about DNNs from the principles of numerical

analysis, namely, stability, accuracy, computational efficiency and sample

complexity. In this paper we introduce a computational framework for examining

DNNs in practice, and use it to study their empirical performance with regard

to these issues. We examine the performance of DNNs of different widths and

depths on a variety of test functions in various dimensions, including smooth

and piecewise smooth functions. We also compare DL against best-in-class

methods for smooth function approximation based on compressed sensing. Our main

conclusion is that there is a crucial gap between the approximation theory of

DNNs and their practical performance, with trained DNNs performing relatively

poorly on functions for which there are strong approximation results (e.g.

smooth functions), yet performing well in comparison to best-in-class methods

for other functions. Finally, we present a novel practical existence theorem,

which asserts the existence of a DNN architecture and training procedure which

offers the same performance as current best-in-class schemes. This result

indicates the potential for practical DNN approximation, and the need for

future research into practical architecture design and training strategies.

Engineering AI Systems: A Research Agenda

Jan Bosch , Ivica Crnkovic , Helena Holmström Olsson

Comments: 13 pages, 3 figures, highlights section

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Deploying machine-, and in particular deep-learning, (ML/DL) solutions in

industry-strength, production quality contexts proves to challenging. This

requires a structured engineering approach to constructing and evolving systems

that contain ML/DL components. In this paper, we provide a conceptualization of

the typical evolution patterns that companies experience when employing ML/DL

well as a framework for integrating ML/DL components in systems consisting of

multiple types of components. In addition, we provide an overview of the

engineering challenges surrounding AI/ML/DL solutions and, based on that, we

provide a research agenda and overview of open items that need to be addressed

by the research community at large.

Unsupervisedly Learned Representations: Should the Quest be Over?

Daniel N. Nissani (Nissensohn)

Comments: submitted to Pattern Recognition Letters

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI)

There exists a Classification accuracy gap of about 20% between our best

methods of generating Unsupervisedly Learned Representations and the accuracy

rates achieved by (naturally Unsupervisedly Learning) humans. We are at our

fourth decade at least in search of this class of paradigms. It thus may well

be that we are looking in the wrong direction. We present in this paper a

possible solution to this puzzle. We demonstrate that Reinforcement Learning

schemes can learn representations, which may be used for Pattern Recognition

tasks such as Classification, achieving practically the same accuracy as that

of humans. Our main modest contribution lies in the observations that: a. when

applied to a real world environment (e.g. nature itself) Reinforcement Learning

does not require labels, and thus may be considered a natural candidate for the

long sought, accuracy competitive Unsupervised Learning method, and b. in

contrast, when Reinforcement Learning is applied in a simulated or symbolic

processing environment (e.g. a computer program) it does inherently require

labels and should thus be generally classified, with some exceptions, as

Supervised Learning. The corollary of these observations is that further search

for Unsupervised Learning competitive paradigms which may be trained in

simulated environments like many of those found in research and applications

may be futile.

Fast Sequence-Based Embedding with Diffusion Graphs

Benedek Rozemberczki , Rik Sarkar

Comments: Source code available at: this https URL

Journal-ref: CompleNet 2018

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

A graph embedding is a representation of graph vertices in a low-dimensional

space, which approximately preserves properties such as distances between

nodes. Vertex sequence-based embedding procedures use features extracted from

linear sequences of nodes to create embeddings using a neural network. In this

paper, we propose diffusion graphs as a method to rapidly generate vertex

sequences for network embedding. Its computational efficiency is superior to

previous methods due to simpler sequence generation, and it produces more

accurate results. In experiments, we found that the performance relative to

other methods improves with increasing edge density in the graph. In a

community detection task, clustering nodes in the embedding space produces

better results compared to other sequence-based embedding methods.

Learning to Control PDEs with Differentiable Physics

Philipp Holl , Vladlen Koltun , Nils Thuerey

Comments: Published as a conference paper at ICLR 2020. Main text: 10 pages, 6 figures, 3 tables. Total: 28 pages, 18 figures

Subjects

:

Machine Learning (cs.LG)

; Fluid Dynamics (physics.flu-dyn); Machine Learning (stat.ML)

Predicting outcomes and planning interactions with the physical world are

long-standing goals for machine learning. A variety of such tasks involves

continuous physical systems, which can be described by partial differential

equations (PDEs) with many degrees of freedom. Existing methods that aim to

control the dynamics of such systems are typically limited to relatively short

time frames or a small number of interaction parameters. We present a novel

hierarchical predictor-corrector scheme which enables neural networks to learn

to understand and control complex nonlinear physical systems over long time

frames. We propose to split the problem into two distinct tasks: planning and

control. To this end, we introduce a predictor network that plans optimal

trajectories and a control network that infers the corresponding control

parameters. Both stages are trained end-to-end using a differentiable PDE

solver. We demonstrate that our method successfully develops an understanding

of complex physical systems and learns to control them for tasks involving PDEs

such as the incompressible Navier-Stokes equations.

Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects

Fredrik D. Johansson , Uri Shalit , Nathan Kallus , David Sontag Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Practitioners in diverse fields such as healthcare, economics and education

are eager to apply machine learning to improve decision making. The cost and

impracticality of performing experiments and a recent monumental increase in

electronic record keeping has brought attention to the problem of evaluating

decisions based on non-experimental observational data. This is the setting of

this work. In particular, we study estimation of individual-level causal

effects, such as a single patient’s response to alternative medication, from

recorded contexts, decisions and outcomes. We give generalization bounds on the

error in estimated effects based on distance measures between groups receiving

different treatments, allowing for sample re-weighting. We provide conditions

under which our bound is tight and show how it relates to results for

unsupervised domain adaptation. Led by our theoretical results, we devise

representation learning algorithms that minimize our bound, by regularizing the

representation’s induced treatment group distance, and encourage sharing of

information between treatment groups. We extend these algorithms to

simultaneously learn a weighted representation to further reduce treatment

group distances. Finally, an experimental evaluation on real and synthetic data

shows the value of our proposed representation architecture and regularization

scheme.

Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach

Carlos Fernandez , Foster Provost , Xintian Han Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Lack of understanding of the decisions made by model-based AI systems is an

important barrier for their adoption. We examine counterfactual explanations as

an alternative for explaining AI decisions. The counterfactual approach defines

an explanation as a set of the system’s data inputs that causally drives the

decision (meaning that removing them changes the decision) and is irreducible

(meaning that removing any subset of the inputs in the explanation does not

change the decision). We generalize previous work on counterfactual

explanations, resulting in a framework that (a) is model-agnostic, (b) can

address features with arbitrary data types, (c) is able explain decisions made

by complex AI systems that incorporate multiple models, and (d) is scalable to

large numbers of features. We also propose a heuristic procedure to find the

most useful explanations depending on the context. We contrast counterfactual

explanations with another alternative: methods that explain model predictions

by weighting features according to their importance (e.g., SHAP, LIME). This

paper presents two fundamental reasons why explaining model predictions is not

the same as explaining the decisions made using those predictions, suggesting

we should carefully consider whether importance-weight explanations are

well-suited to explain decisions made by AI systems. Specifically, we show that

(1) features that have a large importance weight for a model prediction may not

actually affect the corresponding decision, and (2) importance weights are

insufficient to communicate whether and how features influence system

decisions. We demonstrate this using several examples, including three detailed

studies using real-world data that compare the counterfactual approach with

SHAP and illustrate various conditions under which counterfactual explanations

explain data-driven decisions better than feature importance weights.

Understanding Why Neural Networks Generalize Well Through GSNR of Parameters

Jinlong Liu , Guoqing Jiang , Yunzhi Bai , Ting Chen , Huayan Wang

Comments: 14 pages, 8 figures, ICLR2020 accepted as spotlight presentation

Subjects

:

Machine Learning (cs.LG)

; Machine Learning (stat.ML)

As deep neural networks (DNNs) achieve tremendous success across many

application domains, researchers tried to explore in many aspects on why they

generalize well. In this paper, we provide a novel perspective on these issues

using the gradient signal to noise ratio (GSNR) of parameters during training

process of DNNs. The GSNR of a parameter is defined as the ratio between its

gradient’s squared mean and variance, over the data distribution. Based on

several approximations, we establish a quantitative relationship between model

parameters’ GSNR and the generalization gap. This relationship indicates that

larger GSNR during training process leads to better generalization performance.

Moreover, we show that, different from that of shallow models (e.g. logistic

regression, support vector machines), the gradient descent optimization

dynamics of DNNs naturally produces large GSNR during training, which is

probably the key to DNNs’ remarkable generalization ability.

On the infinite width limit of neural networks with a standard parameterization

Jascha Sohl-Dickstein , Roman Novak , Samuel S. Schoenholz , Jaehoon Lee Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

There are currently two parameterizations used to derive fixed kernels

corresponding to infinite width neural networks, the NTK (Neural Tangent

Kernel) parameterization and the naive standard parameterization. However, the

extrapolation of both of these parameterizations to infinite width is

problematic. The standard parameterization leads to a divergent neural tangent

kernel while the NTK parameterization fails to capture crucial aspects of

finite width networks such as: the dependence of training dynamics on relative

layer widths, the relative training dynamics of weights and biases, and a

nonstandard learning rate scale. Here we propose an improved extrapolation of

the standard parameterization that preserves all of these properties as width

is taken to infinity and yields a well-defined neural tangent kernel. We show

experimentally that the resulting kernels typically achieve similar accuracy to

those resulting from an NTK parameterization, but with better correspondence to

the parameterization of typical finite width networks. Additionally, with

careful tuning of width parameters, the improved standard parameterization

kernels can outperform those stemming from an NTK parameterization. We release

code implementing this improved standard parameterization as part of the Neural

Tangents library at this https URL .

Mixed integer programming formulation of unsupervised learning

Arturo Berrones-Santos Subjects : Machine Learning (cs.LG) ; Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)

A novel formulation and training procedure for full Boltzmann machines in

terms of a mixed binary quadratic feasibility problem is given. As a proof of

concept, the theory is analytically and numerically tested on XOR patterns.

SGLB: Stochastic Gradient Langevin Boosting

Aleksei Ustimenko , Liudmila Prokhorenkova Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

In this paper, we introduce Stochastic Gradient Langevin Boosting (SGLB) – a

powerful and efficient machine learning framework, which may deal with a wide

range of loss functions and has provable generalization guarantees. The method

is based on a special form of Langevin Diffusion equation specifically designed

for gradient boosting. This allows us to guarantee the global convergence,

while standard gradient boosting algorithms can guarantee only local optima,

which is a problem for multimodal loss functions. To illustrate the advantages

of SGLB, we apply it to a classification task with 0-1 loss function, which is

known to be multimodal, and to a standard Logistic regression task that is

convex. The algorithm is implemented as a part of the CatBoost gradient

boosting library and outperforms classic gradient boosting methods.

Heterogeneous Transfer Learning in Ensemble Clustering

Vladimir Berikov

Comments: 10 pages, 5 figures

Subjects

:

Machine Learning (cs.LG)

; Machine Learning (stat.ML)

This work proposes an ensemble clustering method using transfer learning

approach. We consider a clustering problem, in which in addition to data under

consideration, “similar” labeled data are available. The datasets can be

described with different features. The method is based on constructing

meta-features which describe structural characteristics of data, and their

transfer from source to target domain. An experimental study of the method

using Monte Carlo modeling has confirmed its efficiency. In comparison with

other similar methods, the proposed one is able to work under arbitrary feature

descriptions of source and target domains; it has smaller complexity.

Model Reuse with Reduced Kernel Mean Embedding Specification

Xi-Zhu Wu , Wenkai Xu , Song Liu , Zhi-Hua Zhou Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Given a publicly available pool of machine learning models constructed for

various tasks, when a user plans to build a model for her own machine learning

application, is it possible to build upon models in the pool such that the

previous efforts on these existing models can be reused rather than starting

from scratch? Here, a grand challenge is how to find models that are helpful

for the current application, without accessing the raw training data for the

models in the pool. In this paper, we present a two-phase framework. In the

upload phase, when a model is uploading into the pool, we construct a reduced

kernel mean embedding (RKME) as a specification for the model. Then in the

deployment phase, the relatedness of the current task and pre-trained models

will be measured based on the value of the RKME specification. Theoretical

results and extensive experiments validate the effectiveness of our approach.

An interpretable neural network model through piecewise linear approximation

Mengzhuo Guo , Qingpeng Zhang , Xiuwu Liao , Daniel Dajun Zeng Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Most existing interpretable methods explain a black-box model in a post-hoc

manner, which uses simpler models or data analysis techniques to interpret the

predictions after the model is learned. However, they (a) may derive

contradictory explanations on the same predictions given different methods and

data samples, and (b) focus on using simpler models to provide higher

descriptive accuracy at the sacrifice of prediction accuracy. To address these

issues, we propose a hybrid interpretable model that combines a piecewise

linear component and a nonlinear component. The first component describes the

explicit feature contributions by piecewise linear approximation to increase

the expressiveness of the model. The other component uses a multi-layer

perceptron to capture feature interactions and implicit nonlinearity, and

increase the prediction performance. Different from the post-hoc approaches,

the interpretability is obtained once the model is learned in the form of

feature shapes. We also provide a variant to explore higher-order interactions

among features to demonstrate that the proposed model is flexible for

adaptation. Experiments demonstrate that the proposed model can achieve good

interpretability by describing feature shapes while maintaining

state-of-the-art accuracy.

Projection based Active Gaussian Process Regression for Pareto Front Modeling

Zhengqi Gao , Jun Tao , Yangfeng Su , Dian Zhou , Xuan Zeng Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Pareto Front (PF) modeling is essential in decision making problems across

all domains such as economics, medicine or engineering. In Operation Research

literature, this task has been addressed based on multi-objective optimization

algorithms. However, without learning models for PF, these methods cannot

examine whether a new provided point locates on PF or not. In this paper, we

reconsider the task from Data Mining perspective. A novel projection based

active Gaussian process regression (P- aGPR) method is proposed for efficient

PF modeling. First, P- aGPR chooses a series of projection spaces with

dimensionalities ranking from low to high. Next, in each projection space, a

Gaussian process regression (GPR) model is trained to represent the constraint

that PF should satisfy in that space. Moreover, in order to improve modeling

efficacy and stability, an active learning framework has been developed by

exploiting the uncertainty information obtained in the GPR models. Different

from all existing methods, our proposed P-aGPR method can not only provide a

generative PF model, but also fast examine whether a provided point locates on

PF or not. The numerical results demonstrate that compared to state-of-the-art

passive learning methods the proposed P-aGPR method can achieve higher modeling

accuracy and stability.

A point-wise linear model reveals reasons for 30-day readmission of heart failure patients

Yasuho Yamashita , Takuma Shibahara , Junichi Kuwata

Comments: 8 pages, 3 figures

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Heart failures in the United States cost an estimated 30.7 billion dollars

annually and predictive analysis can decrease costs due to readmission of heart

failure patients. Deep learning can predict readmissions but does not give

reasons for its predictions. Ours is the first study on a deep-learning

approach to explaining decisions behind readmission predictions. Additionally,

it provides an automatic patient stratification to explain cohorts of

readmitted patients. The new deep-learning model called a point-wise linear

model is a meta-learning machine of linear models. It generates a logistic

regression model to predict early readmission for each patient. The custom-made

prediction models allow us to analyze feature importance. We evaluated the

approach using a dataset that had 30-days readmission patients with heart

failures. This study has been submitted in PLOS ONE. In advance, we would like

to share the theoretical aspect of the point-wise linear model as a part of our

study.

Finding the Sparsest Vectors in a Subspace: Theory, Algorithms, and Applications

Qing Qu , Zhihui Zhu , Xiao Li , Manolis C. Tsakiris , John Wright , René Vidal

Comments: QQ and ZZ contributed equally to the work. Invited review paper for IEEE Signal Processing Magazine Special Issue on non-convex optimization for signal processing and machine learning. This article contains 26 pages with 11 figures

Subjects

:

Machine Learning (cs.LG)

; Information Theory (cs.IT); Image and Video Processing (eess.IV); Optimization and Control (math.OC); Machine Learning (stat.ML)

The problem of finding the sparsest vector (direction) in a low dimensional

subspace can be considered as a homogeneous variant of the sparse recovery

problem, which finds applications in robust subspace recovery, dictionary

learning, sparse blind deconvolution, and many other problems in signal

processing and machine learning. However, in contrast to the classical sparse

recovery problem, the most natural formulation for finding the sparsest vector

in a subspace is usually nonconvex. In this paper, we overview recent advances

on global nonconvex optimization theory for solving this problem, ranging from

geometric analysis of its optimization landscapes, to efficient optimization

algorithms for solving the associated nonconvex optimization problem, to

applications in machine intelligence, representation learning, and imaging

sciences. Finally, we conclude this review by pointing out several interesting

open problems for future research.

Reinforcement Learning with Probabilistically Complete Exploration

Philippe Morere , Gilad Francis , Tom Blau , Fabio Ramos Subjects : Machine Learning (cs.LG) ; Robotics (cs.RO); Machine Learning (stat.ML)

Balancing exploration and exploitation remains a key challenge in

reinforcement learning (RL). State-of-the-art RL algorithms suffer from high

sample complexity, particularly in the sparse reward case, where they can do no

better than to explore in all directions until the first positive rewards are

found. To mitigate this, we propose Rapidly Randomly-exploring Reinforcement

Learning (R3L). We formulate exploration as a search problem and leverage

widely-used planning algorithms such as Rapidly-exploring Random Tree (RRT) to

find initial solutions. These solutions are used as demonstrations to

initialize a policy, then refined by a generic RL algorithm, leading to faster

and more stable convergence. We provide theoretical guarantees of R3L

exploration finding successful solutions, as well as bounds for its sampling

complexity. We experimentally demonstrate the method outperforms classic and

intrinsic exploration techniques, requiring only a fraction of exploration

samples and achieving better asymptotic performance.

Memory capacity of neural networks with threshold and ReLU activations

Roman Vershynin

Comments: 25 pages

Subjects

:

Machine Learning (cs.LG)

; Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Overwhelming theoretical and empirical evidence shows that mildly

overparametrized neural networks — those with more connections than the size

of the training data — are often able to memorize the training data with

(100\%) accuracy. This was rigorously proved for networks with sigmoid

activation functions and, very recently, for ReLU activations. Addressing a

1988 open question of Baum, we prove that this phenomenon holds for general

multilayered perceptrons, i.e. neural networks with threshold activation

functions, or with any mix of threshold and ReLU activations. Our construction

is probabilistic and exploits sparsity.

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

Jie Gui , Zhenan Sun , Yonggang Wen , Dacheng Tao , Jieping Ye Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Generative adversarial networks (GANs) are a hot research topic recently.

GANs have been widely studied since 2014, and a large number of algorithms have

been proposed. However, there is few comprehensive study explaining the

connections among different GANs variants, and how they have evolved. In this

paper, we attempt to provide a review on various GANs methods from the

perspectives of algorithms, theory, and applications. Firstly, the motivations,

mathematical representations, and structure of most GANs algorithms are

introduced in details. Furthermore, GANs have been combined with other machine

learning algorithms for specific applications, such as semi-supervised

learning, transfer learning, and reinforcement learning. This paper compares

the commonalities and differences of these GANs methods. Secondly, theoretical

issues related to GANs are investigated. Thirdly, typical applications of GANs

in image processing and computer vision, natural language processing, music,

speech and audio, medical field, and data science are illustrated. Finally, the

future open research problems for GANs are pointed out.

Infrequent adverse event prediction in low carbon energy production using machine learning

Stefano Coniglio , Anthony J. Dunn , Alain B. Zemkoho Subjects : Machine Learning (cs.LG) ; Optimization and Control (math.OC)

Machine Learning is one of the fastest growing fields in academia. Many

industries are aiming to incorporate machine learning tools into their day to

day operation. However the keystone of doing so, is recognising when you have a

problem which can be solved using machine learning. Adverse event prediction is

one such problem. There are a wide range of methods for the production of

sustainable energy. In many of which adverse events can occur which can impede

energy production and even damage equipment. The two examples of adverse event

prediction in sustainable energy production we examine in this paper are foam

formation in anaerobic digestion and condenser fouling in steam turbines as

used in nuclear power stations. In this paper we will propose a framework for:

formalising a classification problem based around adverse event prediction,

building predictive maintenance models capable of predicting these events

before they occur and testing the reliability of these models.

Distributionally Robust Bayesian Quadrature Optimization

Thanh Tang Nguyen , Sunil Gupta , Huong Ha , Santu Rana , Svetha Venkatesh

Comments: AISTATS2020

Journal-ref: AISTATS2020

Subjects

:

Machine Learning (cs.LG)

; Machine Learning (stat.ML)

Bayesian quadrature optimization (BQO) maximizes the expectation of an

expensive black-box integrand taken over a known probability distribution. In

this work, we study BQO under distributional uncertainty in which the

underlying probability distribution is unknown except for a limited set of its

i.i.d. samples. A standard BQO approach maximizes the Monte Carlo estimate of

the true expected objective given the fixed sample set. Though Monte Carlo

estimate is unbiased, it has high variance given a small set of samples; thus

can result in a spurious objective function. We adopt the distributionally

robust optimization perspective to this problem by maximizing the expected

objective under the most adversarial distribution. In particular, we propose a

novel posterior sampling based algorithm, namely distributionally robust BQO

(DRBQO) for this purpose. We demonstrate the empirical effectiveness of our

proposed framework in synthetic and real-world problems, and characterize its

theoretical convergence via Bayesian regret.

Discriminator Soft Actor Critic without Extrinsic Rewards

Daichi Nishio , Daiki Kuyoshi , Toi Tsuneda , Satoshi Yamane Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

It is difficult to be able to imitate well in unknown states from a small

amount of expert data and sampling data. Supervised learning methods such as

Behavioral Cloning do not require sampling data, but usually suffer from

distribution shift. The methods based on reinforcement learning, such as

inverse reinforcement learning and generative adversarial imitation learning

(GAIL), can learn from only a few expert data. However, they often need to

interact with the environment. Soft Q imitation learning addressed the

problems, and it was shown that it could learn efficiently by combining

Behavioral Cloning and soft Q-learning with constant rewards. In order to make

this algorithm more robust to distribution shift, We propose Discriminator Soft

Actor Critic (DSAC). It uses a reward function based on adversarial inverse

reinforcement learning instead of constant rewards. We evaluated it on PyBullet

environments with only four expert trajectories.

Learning Options from Demonstration using Skill Segmentation

Matthew Cockcroft , Shahil Mawjee , Steven James , Pravesh Ranchod

Comments: To be published in SAUPEC/RobMech/PRASA 2020. Consists of 6 pages, 5 figures

Subjects

:

Machine Learning (cs.LG)

; Machine Learning (stat.ML)

We present a method for learning options from segmented demonstration

trajectories. The trajectories are first segmented into skills using

nonparametric Bayesian clustering and a reward function for each segment is

then learned using inverse reinforcement learning. From this, a set of inferred

trajectories for the demonstration are generated. Option initiation sets and

termination conditions are learned from these trajectories using the one-class

support vector machine clustering algorithm. We demonstrate our method in the

four rooms domain, where an agent is able to autonomously discover usable

options from human demonstration. Our results show that these inferred options

can then be used to improve learning and planning.

Gradient Surgery for Multi-Task Learning

Tianhe Yu , Saurabh Kumar , Abhishek Gupta , Sergey Levine , Karol Hausman , Chelsea Finn Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)

While deep learning and deep reinforcement learning (RL) systems have

demonstrated impressive results in domains such as image classification, game

playing, and robotic control, data efficiency remains a major challenge.

Multi-task learning has emerged as a promising approach for sharing structure

across multiple tasks to enable more efficient learning. However, the

multi-task setting presents a number of optimization challenges, making it

difficult to realize large efficiency gains compared to learning tasks

independently. The reasons why multi-task learning is so challenging compared

to single-task learning are not fully understood. In this work, we identify a

set of three conditions of the multi-task optimization landscape that cause

detrimental gradient interference, and develop a simple yet general approach

for avoiding such interference between task gradients. We propose a form of

gradient surgery that projects a task’s gradient onto the normal plane of the

gradient of any other task that has a conflicting gradient. On a series of

challenging multi-task supervised and multi-task RL problems, this approach

leads to substantial gains in efficiency and performance. Further, it is

model-agnostic and can be combined with previously-proposed multi-task

architectures for enhanced performance.

Algebraic and Analytic Approaches for Parameter Learning in Mixture Models

Akshay Krishnamurthy , Arya Mazumdar , Andrew McGregor , Soumyabrata Pal

Comments: 22 pages, Accepted at Algorithmic Learning Theory (ALT) 2020

Subjects

:

Machine Learning (cs.LG)

; Machine Learning (stat.ML)

We present two different approaches for parameter learning in several mixture

models in one dimension. Our first approach uses complex-analytic methods and

applies to Gaussian mixtures with shared variance, binomial mixtures with

shared success probability, and Poisson mixtures, among others. An example

result is that (exp(O(N^{1/3}))) samples suffice to exactly learn a mixture of

(k<N) Poisson distributions, each with integral rate parameters bounded by (N).

Our second approach uses algebraic and combinatorial tools and applies to

binomial mixtures with shared trial parameter (N) and differing success

parameters, as well as to mixtures of geometric distributions. Again, as an

example, for binomial mixtures with (k) components and success parameters

discretized to resolution (epsilon), (O(k^2(N/epsilon)^{8/sqrt{epsilon}}))

samples suffice to exactly recover the parameters. For some of these

distributions, our results represent the first guarantees for parameter

estimation.

Effects of sparse rewards of different magnitudes in the speed of learning of model-based actor critic methods

Juan Vargas , Lazar Andjelic , Amir Barati Farimani

Comments: 10 pages, 5 figures

Subjects

:

Machine Learning (cs.LG)

; Robotics (cs.RO); Machine Learning (stat.ML)

Actor critic methods with sparse rewards in model-based deep reinforcement

learning typically require a deterministic binary reward function that reflects

only two possible outcomes: if, for each step, the goal has been achieved or

not. Our hypothesis is that we can influence an agent to learn faster by

applying an external environmental pressure during training, which adversely

impacts its ability to get higher rewards. As such, we deviate from the

classical paradigm of sparse rewards and add a uniformly sampled reward value

to the baseline reward to show that (1) sample efficiency of the training

process can be correlated to the adversity experienced during training, (2) it

is possible to achieve higher performance in less time and with less resources,

(3) we can reduce the performance variability experienced seed over seed, (4)

there is a maximum point after which more pressure will not generate better

results, and (5) that random positive incentives have an adverse effect when

using a negative reward strategy, making an agent under those conditions learn

poorly and more slowly. These results have been shown to be valid for Deep

Deterministic Policy Gradients using Hindsight Experience Replay in a well

known Mujoco environment, but we argue that they could be generalized to other

methods and environments as well.

A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints

Marek Śmieja , Łukasz Struski , Mário A. T. Figueiredo Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

In this paper, we introduce a neural network framework for semi-supervised

clustering (SSC) with pairwise (must-link or cannot-link) constraints. In

contrast to existing approaches, we decompose SSC into two simpler

classification tasks/stages: the first stage uses a pair of Siamese neural

networks to label the unlabeled pairs of points as must-link or cannot-link;

the second stage uses the fully pairwise-labeled dataset produced by the first

stage in a supervised neural-network-based clustering method. The proposed

approach, S3C2 (Semi-Supervised Siamese Classifiers for Clustering), is

motivated by the observation that binary classification (such as assigning

pairwise relations) is usually easier than multi-class clustering with partial

supervision. On the other hand, being classification-based, our method solves

only well-defined classification problems, rather than less well specified

clustering tasks. Extensive experiments on various datasets demonstrate the

high performance of the proposed method.

Learning to See Analogies: A Connectionist Exploration

Douglas S. Blank

Comments: 191 pages, PhD thesis

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

This dissertation explores the integration of learning and analogy-making

through the development of a computer program, called Analogator, that learns

to make analogies by example. By “seeing” many different analogy problems,

along with possible solutions, Analogator gradually develops an ability to make

new analogies. That is, it learns to make analogies by analogy. This approach

stands in contrast to most existing research on analogy-making, in which

typically the a priori existence of analogical mechanisms within a model is

assumed. The present research extends standard connectionist methodologies by

developing a specialized associative training procedure for a recurrent network

architecture. The network is trained to divide input scenes (or situations)

into appropriate figure and ground components. Seeing one scene in terms of a

particular figure and ground provides the context for seeing another in an

analogous fashion. After training, the model is able to make new analogies

between novel situations. Analogator has much in common with lower-level

perceptual models of categorization and recognition; it thus serves as a

unifying framework encompassing both high-level analogical learning and

low-level perception. This approach is compared and contrasted with other

computational models of analogy-making. The model’s training and generalization

performance is examined, and limitations are discussed.

OIAD: One-for-all Image Anomaly Detection with Disentanglement Learning

Shuo Wang , Tianle Chen , Shangyu Chen , Carsten Rudolph , Surya Nepal , Marthie Grobler Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Machine Learning (stat.ML)

Anomaly detection aims to recognize samples with anomalous and unusual

patterns with respect to a set of normal data, which is significant for

numerous domain applications, e.g. in industrial inspection, medical imaging,

and security enforcement. There are two key research challenges associated with

existing anomaly detention approaches: (1) many of them perform well on

low-dimensional problems however the performance on high-dimensional instances

is limited, such as images; (2) many of them depend on often still rely on

traditional supervised approaches and manual engineering of features, while the

topic has not been fully explored yet using modern deep learning approaches,

even when the well-label samples are limited. In this paper, we propose a

One-for-all Image Anomaly Detection system (OIAD) based on disentangled

learning using only clean samples. Our key insight is that the impact of small

perturbation on the latent representation can be bounded for normal samples

while anomaly images are usually outside such bounded intervals, called

structure consistency. We implement this idea and evaluate its performance for

anomaly detention. Our experiments with three datasets show that OIAD can

detect over (90\%) of anomalies while maintaining a high low false alarm rate.

It can also detect suspicious samples from samples labeled as clean, coincided

with what humans would deem unusual.

Multi-agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning

Samaneh Hosseini Semnani , Hugh Liu , Michael Everett , Anton de Ruiter , Jonathan P. How Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)

This paper introduces a hybrid algorithm of deep reinforcement learning (RL)

and Force-based motion planning (FMP) to solve distributed motion planning

problem in dense and dynamic environments. Individually, RL and FMP algorithms

each have their own limitations. FMP is not able to produce time-optimal paths

and existing RL solutions are not able to produce collision-free paths in dense

environments. Therefore, we first tried improving the performance of recent RL

approaches by introducing a new reward function that not only eliminates the

requirement of a pre supervised learning (SL) step but also decreases the

chance of collision in crowded environments. That improved things, but there

were still a lot of failure cases. So, we developed a hybrid approach to

leverage the simpler FMP approach in stuck, simple and high-risk cases, and

continue using RL for normal cases in which FMP can’t produce optimal path.

Also, we extend GA3C-CADRL algorithm to 3D environment. Simulation results show

that the proposed algorithm outperforms both deep RL and FMP algorithms and

produces up to 50% more successful scenarios than deep RL and up to 75% less

extra time to reach goal than FMP.

Regularized Cycle Consistent Generative Adversarial Network for Anomaly Detection

Ziyi Yang , Iman Soltani Bozchalooi , Eric Darve

Comments: the 24th European Conference on Artificial Intelligence (ECAI 2020)

Subjects

:

Machine Learning (cs.LG)

; Machine Learning (stat.ML)

In this paper, we investigate algorithms for anomaly detection. Previous

anomaly detection methods focus on modeling the distribution of non-anomalous

data provided during training. However, this does not necessarily ensure the

correct detection of anomalous data. We propose a new Regularized Cycle

Consistent Generative Adversarial Network (RCGAN) in which deep neural networks

are adversarially trained to better recognize anomalous samples. This approach

is based on leveraging a penalty distribution with a new definition of the loss

function and novel use of discriminator networks. It is based on a solid

mathematical foundation, and proofs show that our approach has stronger

guarantees for detecting anomalous examples compared to the current

state-of-the-art. Experimental results on both real-world and synthetic data

show that our model leads to significant and consistent improvements on

previous anomaly detection benchmarks. Notably, RCGAN improves on the

state-of-the-art on the KDDCUP, Arrhythmia, Thyroid, Musk and CIFAR10 datasets.

FlexiBO: Cost-Aware Multi-Objective Optimization of Deep Neural Networks

Md Shahriar Iqbal , Jianhai Su , Lars Kotthoff , Pooyan Jamshidi

Comments: 19 pages

Subjects

:

Machine Learning (cs.LG)

; Optimization and Control (math.OC); Machine Learning (stat.ML)

One of the key challenges in designing machine learning systems is to

determine the right balance amongst several objectives, which also oftentimes

are incommensurable and conflicting. For example, when designing deep neural

networks (DNNs), one often has to trade-off between multiple objectives, such

as accuracy, energy consumption, and inference time. Typically, there is no

single configuration that performs equally well for all objectives.

Consequently, one is interested in identifying Pareto-optimal designs. Although

different multi-objective optimization algorithms have been developed to

identify Pareto-optimal configurations, state-of-the-art multi-objective

optimization methods do not consider the different evaluation costs attending

the objectives under consideration. This is particularly important for

optimizing DNNs: the cost arising on account of assessing the accuracy of DNNs

is orders of magnitude higher than that of measuring the energy consumption of

pre-trained DNNs. We propose FlexiBO, a flexible Bayesian optimization method,

to address this issue. We formulate a new acquisition function based on the

improvement of the Pareto hyper-volume weighted by the measurement cost of each

objective. Our acquisition function selects the next sample and objective that

provides maximum information gain per unit of cost. We evaluated FlexiBO on 7

state-of-the-art DNNs for object detection, natural language processing, and

speech recognition. Our results indicate that, when compared to other

state-of-the-art methods across the 7 architectures we tested, the Pareto front

obtained using FlexiBO has, on average, a 28.44% higher contribution to the

true Pareto front and achieves 25.64% better diversity.

Scalable Bid Landscape Forecasting in Real-time Bidding

Aritra Ghosh , Saayan Mitra , Somdeb Sarkhel , Jason Xie , Gang Wu , Viswanathan Swaminathan

Comments: Appeared in ECML-PKDD 2019

Subjects

:

Machine Learning (cs.LG)

; Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)

In programmatic advertising, ad slots are usually sold using second-price

(SP) auctions in real-time. The highest bidding advertiser wins but pays only

the second-highest bid (known as the winning price). In SP, for a single item,

the dominant strategy of each bidder is to bid the true value from the bidder’s

perspective. However, in a practical setting, with budget constraints, bidding

the true value is a sub-optimal strategy. Hence, to devise an optimal bidding

strategy, it is of utmost importance to learn the winning price distribution

accurately. Moreover, a demand-side platform (DSP), which bids on behalf of

advertisers, observes the winning price if it wins the auction. For losing

auctions, DSPs can only treat its bidding price as the lower bound for the

unknown winning price. In literature, typically censored regression is used to

model such partially observed data. A common assumption in censored regression

is that the winning price is drawn from a fixed variance (homoscedastic)

uni-modal distribution (most often Gaussian). However, in reality, these

assumptions are often violated. We relax these assumptions and propose a

heteroscedastic fully parametric censored regression approach, as well as a

mixture density censored network. Our approach not only generalizes censored

regression but also provides flexibility to model arbitrarily distributed

real-world data. Experimental evaluation on the publicly available dataset for

winning price estimation demonstrates the effectiveness of our method.

Furthermore, we evaluate our algorithm on one of the largest demand-side

platforms and significant improvement has been achieved in comparison with the

baseline solutions.

Inference for Network Structure and Dynamics from Time Series Data via Graph Neural Network

Mengyuan Chen , Jiang Zhang , Zhang Zhang , Lun Du , Qiao Hu , Shuo Wang , Jiaqi Zhu Subjects : Machine Learning (cs.LG) ; Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Network structures in various backgrounds play important roles in social,

technological, and biological systems. However, the observable network

structures in real cases are often incomplete or unavailable due to measurement

errors or private protection issues. Therefore, inferring the complete network

structure is useful for understanding complex systems. The existing studies

have not fully solved the problem of inferring network structure with partial

or no information about connections or nodes. In this paper, we tackle the

problem by utilizing time series data generated by network dynamics. We regard

the network inference problem based on dynamical time series data as a problem

of minimizing errors for predicting future states and proposed a novel

data-driven deep learning model called Gumbel Graph Network (GGN) to solve the

two kinds of network inference problems: Network Reconstruction and Network

Completion. For the network reconstruction problem, the GGN framework includes

two modules: the dynamics learner and the network generator. For the network

completion problem, GGN adds a new module called the States Learner to infer

missing parts of the network. We carried out experiments on discrete and

continuous time series data. The experiments show that our method can

reconstruct up to 100% network structure on the network reconstruction task.

While the model can also infer the unknown parts of the structure with up to

90% accuracy when some nodes are missing. And the accuracy decays with the

increase of the fractions of missing nodes. Our framework may have wide

application areas where the network structure is hard to obtained and the time

series data is rich.

Neighborhood Structure Assisted Non-negative Matrix Factorization and its Application in Unsupervised Point Anomaly Detection

Imtiaz Ahmed , Xia Ben Hu , Mithun P. Acharya , Yu Ding Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Dimensionality reduction is considered as an important step for ensuring

competitive performance in unsupervised learning such as anomaly detection.

Non-negative matrix factorization (NMF) is a popular and widely used method to

accomplish this goal. But NMF, together with its recent, enhanced version, like

graph regularized NMF or symmetric NMF, do not have the provision to include

the neighborhood structure information and, as a result, may fail to provide

satisfactory performance in presence of nonlinear manifold structure. To

address that shortcoming, we propose to consider and incorporate the

neighborhood structural similarity information within the NMF framework by

modeling the data through a minimum spanning tree. What motivates our choice is

the understanding that in the presence of complicated data structure, a minimum

spanning tree can approximate the intrinsic distance between two data points

better than a simple Euclidean distance does, and consequently, it constitutes

a more reasonable basis for differentiating anomalies from the normal class

data. We label the resulting method as the neighborhood structure assisted NMF.

By comparing the formulation and properties of the neighborhood structure

assisted NMF with other versions of NMF including graph regularized NMF and

symmetric NMF, it is apparent that the inclusion of the neighborhood structure

information using minimum spanning tree makes a key difference. We further

devise both offline and online algorithmic versions of the proposed method.

Empirical comparisons using twenty benchmark datasets as well as an industrial

dataset extracted from a hydropower plant demonstrate the superiority of the

neighborhood structure assisted NMF and support our claim of merit.

Trust in AutoML: Exploring Information Needs for Establishing Trust in Automated Machine Learning Systems

Jaimie Drozdal , Justin Weisz , Dakuo Wang , Gaurav Dass , Bingsheng Yao , Changruo Zhao , Michael Muller , Lin Ju , Hui Su

Comments: IUI 2020

Subjects

:

Machine Learning (cs.LG)

; Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

We explore trust in a relatively new area of data science: Automated Machine

Learning (AutoML). In AutoML, AI methods are used to generate and optimize

machine learning models by automatically engineering features, selecting

models, and optimizing hyperparameters. In this paper, we seek to understand

what kinds of information influence data scientists’ trust in the models

produced by AutoML? We operationalize trust as a willingness to deploy a model

produced using automated methods. We report results from three studies —

qualitative interviews, a controlled experiment, and a card-sorting task — to

understand the information needs of data scientists for establishing trust in

AutoML systems. We find that including transparency features in an AutoML tool

increased user trust and understandability in the tool; and out of all proposed

features, model performance metrics and visualizations are the most important

information to data scientists when establishing their trust with an AutoML

tool.

K-NN active learning under local smoothness assumption

Boris Ndjia Njike , Xavier Siebert

Comments: arXiv admin note: substantial text overlap with arXiv:1902.03055

Subjects

:

Machine Learning (cs.LG)

; Statistics Theory (math.ST); Machine Learning (stat.ML)

There is a large body of work on convergence rates either in passive or

active learning. Here we first outline some of the main results that have been

obtained, more specifically in a nonparametric setting under assumptions about

the smoothness of the regression function (or the boundary between classes) and

the margin noise. We discuss the relative merits of these underlying

assumptions by putting active learning in perspective with recent work on

passive learning. We design an active learning algorithm with a rate of

convergence better than in passive learning, using a particular smoothness

assumption customized for k-nearest neighbors. Unlike previous active learning

algorithms, we use a smoothness assumption that provides a dependence on the

marginal distribution of the instance space. Additionally, our algorithm avoids

the strong density assumption that supposes the existence of the density

function of the marginal distribution of the instance space and is therefore

more generally applicable.

TopRank+: A Refinement of TopRank Algorithm

Victor de la Pena , Haolin Zou Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)

Online learning to rank is a core problem in machine learning. In Lattimore

et al. (2018), a novel online learning algorithm was proposed based on

topological sorting. In the paper they provided a set of self-normalized

inequalities (a) in the algorithm as a criterion in iterations and (b) to

provide an upper bound for cumulative regret, which is a measure of algorithm

performance. In this work, we utilized method of mixtures and asymptotic

expansions of certain implicit function to provide a tighter, iterated-log-like

boundary for the inequalities, and as a consequence improve both the algorithm

itself as well as its performance estimation.

Multimodal Deep Unfolding for Guided Image Super-Resolution

Iman Marivani , Evaggelia Tsiligianni , Bruno Cornelis , Nikos Deligiannis Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)

The reconstruction of a high resolution image given a low resolution

observation is an ill-posed inverse problem in imaging. Deep learning methods

rely on training data to learn an end-to-end mapping from a low-resolution

input to a high-resolution output. Unlike existing deep multimodal models that

do not incorporate domain knowledge about the problem, we propose a multimodal

deep learning design that incorporates sparse priors and allows the effective

integration of information from another image modality into the network

architecture. Our solution relies on a novel deep unfolding operator,

performing steps similar to an iterative algorithm for convolutional sparse

coding with side information; therefore, the proposed neural network is

interpretable by design. The deep unfolding architecture is used as a core

component of a multimodal framework for guided image super-resolution. An

alternative multimodal design is investigated by employing residual learning to

improve the training efficiency. The presented multimodal approach is applied

to super-resolution of near-infrared and multi-spectral images as well as depth

upsampling using RGB images as side information. Experimental results show that

our model outperforms state-of-the-art methods.

A Comprehensive Study on Temporal Modeling for Online Action Detection

Wen Wang , Xiaojiang Peng , Yu Qiao , Jian Cheng Subjects : Human-Computer Interaction (cs.HC) ; Machine Learning (cs.LG)

Online action detection (OAD) is a practical yet challenging task, which has

attracted increasing attention in recent years. A typical OAD system mainly

consists of three modules: a frame-level feature extractor which is usually

based on pre-trained deep Convolutional Neural Networks (CNNs), a temporal

modeling module, and an action classifier. Among them, the temporal modeling

module is crucial which aggregates discriminative information from historical

and current features. Though many temporal modeling methods have been developed

for OAD and other topics, their effects are lack of investigation on OAD

fairly. This paper aims to provide a comprehensive study on temporal modeling

for OAD including four meta types of temporal modeling methods, ie temporal

pooling, temporal convolution, recurrent neural networks, and temporal

attention, and uncover some good practices to produce a state-of-the-art OAD

system. Many of them are explored in OAD for the first time, and extensively

evaluated with various hyper parameters. Furthermore, based on our

comprehensive study, we present several hybrid temporal modeling methods, which

outperform the recent state-of-the-art methods with sizable margins on

THUMOS-14 and TVSeries.

Variational Dropout Sparsification for Particle Identification speed-up

Artem Ryzhikov , Denis Derkach , Mikhail Hushchyn Subjects : Data Analysis, Statistics and Probability (physics.data-an) ; Machine Learning (cs.LG)

Accurate particle identification (PID) is one of the most important aspects

of the LHCb experiment. Modern machine learning techniques such as neural

networks (NNs) are efficiently applied to this problem and are integrated into

the LHCb software. In this research, we discuss novel applications of neural

network speed-up techniques to achieve faster PID in LHC upgrade conditions. We

show that the best results are obtained using variational dropout

sparsification, which provides a prediction (feedforward pass) speed increase

of up to a factor of sixteen even when compared to a model with shallow

networks.

Pruning Neural Belief Propagation Decoders

Andreas Buchberger , Christian Häger , Henry D. Pfister , Laurent Schmalen , Alexandre Graell i Amat Subjects : Information Theory (cs.IT) ; Machine Learning (cs.LG)

We consider near maximum-likelihood (ML) decoding of short linear block codes

based on neural belief propagation (BP) decoding recently introduced by

Nachmani et al.. While this method significantly outperforms conventional BP

decoding, the underlying parity-check matrix may still limit the overall

performance. In this paper, we introduce a method to tailor an overcomplete

parity-check matrix to (neural) BP decoding using machine learning. We consider

the weights in the Tanner graph as an indication of the importance of the

connected check nodes (CNs) to decoding and use them to prune unimportant CNs.

As the pruning is not tied over iterations, the final decoder uses a different

parity-check matrix in each iteration. For Reed-Muller and short low-density

parity-check codes, we achieve performance within 0.27 dB and 1.5 dB of the ML

performance while reducing the complexity of the decoder.

Evaluating Weakly Supervised Object Localization Methods Right

Junsuk Choe , Seong Joon Oh , Seungho Lee , Sanghyuk Chun , Zeynep Akata , Hyunjung Shim Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)

Weakly-supervised object localization (WSOL) has gained popularity over the

last years for its promise to train localization models with only image-level

labels. Since the seminal WSOL work of class activation mapping (CAM), the

field has focused on how to expand the attention regions to cover objects more

broadly and localize them better. However, these strategies rely on full

localization supervision to validate hyperparameters and for model selection,

which is in principle prohibited under the WSOL setup. In this paper, we argue

that WSOL task is ill-posed with only image-level labels, and propose a new

evaluation protocol where full supervision is limited to only a small held-out

set not overlapping with the test set. We observe that, under our protocol, the

five most recent WSOL methods have not made a major improvement over the CAM

baseline. Moreover, we report that existing WSOL methods have not reached the

few-shot learning baseline, where the full-supervision at validation time is

used for model training instead. Based on our findings, we discuss some future

directions for WSOL.

Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities

Kaixuan Chen , Dalin Zhang , Lina Yao , Bin Guo , Zhiwen Yu , Yunhao Liu Subjects : Human-Computer Interaction (cs.HC) ; Machine Learning (cs.LG)

The vast proliferation of sensor devices and Internet of Things enables the

applications of sensor-based activity recognition. However, there exist

substantial challenges that could influence the performance of the recognition

system in practical scenarios. Recently, as deep learning has demonstrated its

effectiveness in many areas, plenty of deep methods have been investigated to

address the challenges in activity recognition. In this study, we present a

survey of the state-of-the-art deep learning methods for sensor-based human

activity recognition. We first introduce the multi-modality of the sensory data

and provide information for public datasets that can be used for evaluation in

different challenge tasks. We then propose a new taxonomy to structure the deep

methods by challenges. Challenges and challenge-related deep methods are

summarized and analyzed to form an overview of the current research progress.

At the end of this work, we discuss the open issues and provide some insights

for future directions.

Explicit agreement extremes for a (2 imes2) table with given marginals

José E. Chacón

Comments: 8 pages, 2 figures

Subjects

:

Machine Learning (stat.ML)

; Machine Learning (cs.LG); Statistics Theory (math.ST)

The problem of maximizing (or minimizing) the agreement between clusterings,

subject to given marginals, can be formally posed under a common framework for

several agreement measures. Until now, it was possible to find its solution

only through numerical algorithms. Here, an explicit solution is shown for the

case where the two clusterings have two clusters each.

Estimating Latent Demand of Shared Mobility through Censored Gaussian Processes

Daniele Gammelli , Inon Peled , Filipe Rodrigues , Dario Pacino , Haci A. Kurtaran , Francisco C. Pereira

Comments: 17 pages, 7 figures

Subjects

:

Machine Learning (stat.ML)

; Machine Learning (cs.LG)

Transport demand is highly dependent on supply, especially for shared

transport services where availability is often limited. As observed demand

cannot be higher than available supply, historical transport data typically

represents a biased, or censored, version of the true underlying demand

pattern. Without explicitly accounting for this inherent distinction,

predictive models of demand would necessarily represent a biased version of

true demand, thus less effectively predicting the needs of service users. To

counter this problem, we propose a general method for censorship-aware demand

modeling, for which we devise a censored likelihood function. We apply this

method to the task of shared mobility demand prediction by incorporating the

censored likelihood within a Gaussian Process model, which can flexibly

approximate arbitrary functional forms. Experiments on artificial and

real-world datasets show how taking into account the limiting effect of supply

on demand is essential in the process of obtaining an unbiased predictive model

of user demand behavior.

Lyceum: An efficient and scalable ecosystem for robot learning

Colin Summers , Kendall Lowrey , Aravind Rajeswaran , Siddhartha Srinivasa , Emanuel Todorov Subjects : Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)

We introduce Lyceum, a high-performance computational ecosystem for robot

learning. Lyceum is built on top of the Julia programming language and the

MuJoCo physics simulator, combining the ease-of-use of a high-level programming

language with the performance of native C. In addition, Lyceum has a

straightforward API to support parallel computation across multiple cores and

machines. Overall, depending on the complexity of the environment, Lyceum is

5-30x faster compared to other popular abstractions like OpenAI’s Gym and

DeepMind’s dm-control. This substantially reduces training time for various

reinforcement learning algorithms; and is also fast enough to support real-time

model predictive control through MuJoCo. The code, tutorials, and demonstration

videos can be found at: this http URL .

Breast lesion segmentation in ultrasound images with limited annotated data

Bahareh Behboodi , Mina Amiri , Rupert Brooks , Hassan Rivaz

Comments: Accepted to ISBI 2020

Subjects

:

Image and Video Processing (eess.IV)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Ultrasound (US) is one of the most commonly used imaging modalities in both

diagnosis and surgical interventions due to its low-cost, safety, and

non-invasive characteristic. US image segmentation is currently a unique

challenge because of the presence of speckle noise. As manual segmentation

requires considerable efforts and time, the development of automatic

segmentation algorithms has attracted researchers attention. Although recent

methodologies based on convolutional neural networks have shown promising

performances, their success relies on the availability of a large number of

training data, which is prohibitively difficult for many applications.

Therefore, in this study we propose the use of simulated US images and natural

images as auxiliary datasets in order to pre-train our segmentation network,

and then to fine-tune with limited in vivo data. We show that with as little as

19 in vivo images, fine-tuning the pre-trained network improves the dice score

by 21% compared to training from scratch. We also demonstrate that if the same

number of natural and simulation US images is available, pre-training on

simulation data is preferable.

Neural Style Difference Transfer and Its Application to Font Generation

Gantugs Atarsaikhan , Brian Kenji Iwana , Seiichi Uchida

Comments: Submitted to DAS2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Machine Learning (cs.LG)

Designing fonts requires a great deal of time and effort. It requires

professional skills, such as sketching, vectorizing, and image editing.

Additionally, each letter has to be designed individually. In this paper, we

will introduce a method to create fonts automatically. In our proposed method,

the difference of font styles between two different fonts is found and

transferred to another font using neural style transfer. Neural style transfer

is a method of stylizing the contents of an image with the styles of another

image. We proposed a novel neural style difference and content difference loss

for the neural style transfer. With these losses, new fonts can be generated by

adding or removing font styles from a font. We provided experimental results

with various combinations of input fonts and discussed limitations and future

development for the proposed method.

DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm

Hao Xu , Haibin Chang , Dongxiao Zhang Subjects : Neural and Evolutionary Computing (cs.NE) ; Machine Learning (cs.LG)

Data-driven methods have recently been developed to discover underlying

partial differential equations (PDEs) of physical problems. However, for these

methods, a complete candidate library of potential terms in a PDE are usually

required. To overcome this limitation, we propose a novel framework combining

deep learning and genetic algorithm, called DLGA-PDE, for discovering PDEs. In

the proposed framework, a deep neural network that is trained with available

data of a physical problem is utilized to generate meta-data and calculate

derivatives, and the genetic algorithm is then employed to discover the

underlying PDE. Owing to the merits of the genetic algorithm, such as mutation

and crossover, DLGA-PDE can work with an incomplete candidate library. The

proposed DLGA-PDE is tested for discovery of the Korteweg-de Vries (KdV)

equation, the Burgers equation, the wave equation, and the Chaffee-Infante

equation, respectively, for proof-of-concept. Satisfactory results are obtained

without the need for a complete candidate library, even in the presence of

noisy and limited data.

Digital synthesis of histological stains using micro-structured and multiplexed virtual staining of label-free tissue

Yijie Zhang , Kevin de Haan , Yair Rivenson , Jingxi Li , Apostolos Delis , Aydogan Ozcan

Comments: 19 pages, 5 figures, 2 tables

Subjects

:

Image and Video Processing (eess.IV)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph); Quantitative Methods (q-bio.QM)

Histological staining is a vital step used to diagnose various diseases and

has been used for more than a century to provide contrast to tissue sections,

rendering the tissue constituents visible for microscopic analysis by medical

experts. However, this process is time-consuming, labor-intensive, expensive

and destructive to the specimen. Recently, the ability to virtually-stain

unlabeled tissue sections, entirely avoiding the histochemical staining step,

has been demonstrated using tissue-stain specific deep neural networks. Here,

we present a new deep learning-based framework which generates

virtually-stained images using label-free tissue, where different stains are

merged following a micro-structure map defined by the user. This approach uses

a single deep neural network that receives two different sources of information

at its input: (1) autofluorescence images of the label-free tissue sample, and

(2) a digital staining matrix which represents the desired microscopic map of

different stains to be virtually generated at the same tissue section. This

digital staining matrix is also used to virtually blend existing stains,

digitally synthesizing new histological stains. We trained and blindly tested

this virtual-staining network using unlabeled kidney tissue sections to

generate micro-structured combinations of Hematoxylin and Eosin (H&E), Jones

silver stain, and Masson’s Trichrome stain. Using a single network, this

approach multiplexes virtual staining of label-free tissue with multiple types

of stains and paves the way for synthesizing new digital histological stains

that can be created on the same tissue cross-section, which is currently not

feasible with standard histochemical staining methods.

UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description without Local Correspondence Supervision

Tsun-Yi Yang , Duy-Kien Nguyen , Huub Heijnen , Vassileios Balntas Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)

In this paper, we explore how three related tasks, namely keypoint detection,

description, and image retrieval can be jointly tackled using a single unified

framework, which is trained without the need of training data with point to

point correspondences. By leveraging diverse information from sequential layers

of a standard ResNet-based architecture, we are able to extract keypoints and

descriptors that encode local information using generic techniques such as

local activation norms, channel grouping and dropping, and self-distillation.

Subsequently, global information for image retrieval is encoded in an

end-to-end pipeline, based on pooling of the aforementioned local responses. In

contrast to previous methods in local matching, our method does not depend on

pointwise/pixelwise correspondences, and requires no such supervision at all

i.e. no depth-maps from an SfM model nor manually created synthetic affine

transformations. We illustrate that this simple and direct paradigm, is able to

achieve very competitive results against the state-of-the-art methods in

various challenging benchmark conditions such as viewpoint changes, scale

changes, and day-night shifting localization.

Counter-example Guided Learning of Bounds on Environment Behavior

Yuxiao Chen , Sumanth Dathathri , Tung Phan-Minh , Richard M. Murray Subjects : Systems and Control (eess.SY) ; Machine Learning (cs.LG); Robotics (cs.RO)

There is a growing interest in building autonomous systems that interact with

complex environments. The difficulty associated with obtaining an accurate

model for such environments poses a challenge to the task of assessing and

guaranteeing the system’s performance. We present a data-driven solution that

allows for a system to be evaluated for specification conformance without an

accurate model of the environment. Our approach involves learning a

conservative reactive bound of the environment’s behavior using data and

specification of the system’s desired behavior. First, the approach begins by

learning a conservative reactive bound on the environment’s actions that

captures its possible behaviors with high probability. This bound is then used

to assist verification, and if the verification fails under this bound, the

algorithm returns counter-examples to show how failure occurs and then uses

these to refine the bound. We demonstrate the applicability of the approach

through two case-studies: i) verifying controllers for a toy multi-robot

system, and ii) verifying an instance of human-robot interaction during a

lane-change maneuver given real-world human driving data.

Recommending Themes for Ad Creative Design via Visual-Linguistic Representations

Yichao Zhou , Shaunak Mishra , Manisha Verma , Narayan Bhamidipati , Wei Wang

Comments: 7 pages, 8 figures, 2 tables, accepted by The Web Conference 2020

Subjects

:

Computation and Language (cs.CL)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)

There is a perennial need in the online advertising industry to refresh ad

creatives, i.e., images and text used for enticing online users towards a

brand. Such refreshes are required to reduce the likelihood of ad fatigue among

online users, and to incorporate insights from other successful campaigns in

related product categories. Given a brand, to come up with themes for a new ad

is a painstaking and time consuming process for creative strategists.

Strategists typically draw inspiration from the images and text used for past

ad campaigns, as well as world knowledge on the brands. To automatically infer

ad themes via such multimodal sources of information in past ad campaigns, we

propose a theme (keyphrase) recommender system for ad creative strategists. The

theme recommender is based on aggregating results from a visual question

answering (VQA) task, which ingests the following: (i) ad images, (ii) text

associated with the ads as well as Wikipedia pages on the brands in the ads,

and (iii) questions around the ad. We leverage transformer based cross-modality

encoders to train visual-linguistic representations for our VQA task. We study

two formulations for the VQA task along the lines of classification and

ranking; via experiments on a public dataset, we show that cross-modal

representations lead to significantly better classification accuracy and

ranking precision-recall metrics. Cross-modal representations show better

performance compared to separate image and text representations. In addition,

the use of multimodal information shows a significant lift over using only

textual or visual information.

A deep network for sinogram and CT image reconstruction

Wei Wang , Xiang-Gen Xia , Chuanjiang He , Zemin Ren , Jian Lu , Tianfu Wang , Baiying Lei Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

A CT image can be well reconstructed when the sampling rate of the sinogram

satisfies the Nyquist criteria and the sampled signal is noise-free. However,

in practice, the sinogram is usually contaminated by noise, which degrades the

quality of a reconstructed CT image. In this paper, we design a deep network

for sinogram and CT image reconstruction. The network consists of two cascaded

blocks that are linked by a filter backprojection (FBP) layer, where the former

block is responsible for denoising and completing the sinograms while the

latter is used to removing the noise and artifacts of the CT images.

Experimental results show that the reconstructed CT images by our methods have

the highest PSNR and SSIM in average compared to state of the art methods.

The Incentives that Shape Behaviour

Ryan Carey , Eric Langlois , Tom Everitt , Shane Legg

Comments: 12 pages, 7 figures, accepted to SafeAI workshop at AAAI

Subjects

:

Artificial Intelligence (cs.AI)

; Machine Learning (cs.LG)

Which variables does an agent have an incentive to control with its decision,

and which variables does it have an incentive to respond to? We formalise these

incentives, and demonstrate unique graphical criteria for detecting them in any

single decision causal influence diagram. To this end, we introduce structural

causal influence models, a hybrid of the influence diagram and structural

causal model frameworks. Finally, we illustrate how these incentives predict

agent incentives in both fairness and AI safety applications.

A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels

Lorenz Braun , Sotirios Nikas , Chen Song , Vincent Heuveline , Holger Fröning Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Machine Learning (cs.LG); Performance (cs.PF)

Characterizing compute kernel execution behavior on GPUs for efficient task

scheduling is a non trivial task. We address this with a simple model enabling

portable and fast predictions among different GPUs using only

hardware-independent features extracted. This model is built based on random

forests using 189 individual compute kernels from benchmarks such as Parboil,

Rodinia, Polybench-GPU and SHOC. Evaluation of the model performance using

cross-validation yields a median Mean Average Percentage Error (MAPE) of

[13.45%, 44.56%] and [1.81%, 2.91%], for time respectively power prediction on

five different GPUs, while latency for a single prediction varies between 0.1

and 0.2 seconds.

BARNet: Bilinear Attention Network with Adaptive Receptive Field for Surgical Instrument Segmentation

Zhen-Liang Ni , Gui-Bin Bian , Guan-An Wang , Xiao-Hu Zhou , Zeng-Guang Hou , Xiao-Liang Xie , Zhen Li , Yu-Han Wang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG); Robotics (cs.RO); Image and Video Processing (eess.IV)

Surgical instrument segmentation is extremely important for computer-assisted

surgery. Different from common object segmentation, it is more challenging due

to the large illumination and scale variation caused by the special surgical

scenes. In this paper, we propose a novel bilinear attention network with

adaptive receptive field to solve these two challenges. For the illumination

variation, the bilinear attention module can capture second-order statistics to

encode global contexts and semantic dependencies between local pixels. With

them, semantic features in challenging areas can be inferred from their

neighbors and the distinction of various semantics can be boosted. For the

scale variation, our adaptive receptive field module aggregates multi-scale

features and automatically fuses them with different weights. Specifically, it

encodes the semantic relationship between channels to emphasize feature maps

with appropriate scales, changing the receptive field of subsequent

convolutions. The proposed network achieves the best performance 97.47% mean

IOU on Cata7 and comes first place on EndoVis 2017 by 10.10% IOU overtaking

second-ranking method.

Pairwise Discriminative Neural PLDA for Speaker Verification

Shreyas Ramoji , Prashant Krishnan V , Prachi Singh , Sriram Ganapathy

Comments: Submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing 2020 (ICASSP 2020). Link to GitHub Repository: this https URL

Subjects

:

Audio and Speech Processing (eess.AS)

; Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)

The state-of-art approach to speaker verification involves the extraction of

discriminative embeddings like x-vectors followed by a generative model

back-end using a probabilistic linear discriminant analysis (PLDA). In this

paper, we propose a Pairwise neural discriminative model for the task of

speaker verification which operates on a pair of speaker embeddings such as

x-vectors/i-vectors and outputs a score that can be considered as a scaled

log-likelihood ratio. We construct a differentiable cost function which

approximates speaker verification loss, namely the minimum detection cost. The

pre-processing steps of linear discriminant analysis (LDA), unit length

normalization and within class covariance normalization are all modeled as

layers of a neural model and the speaker verification cost functions can be

back-propagated through these layers during training. We also explore

regularization techniques to prevent overfitting, which is a major concern in

using discriminative back-end models for verification tasks. The experiments

are performed on the NIST SRE 2018 development and evaluation datasets. We

observe average relative improvements of 8% in CMN2 condition and 30% in VAST

condition over the PLDA baseline system.

Deep Image Clustering with Tensor Kernels and Unsupervised Companion Objectives

Daniel J. Trosten , Michael C. Kampffmeyer , Robert Jenssen

Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems

Subjects

:

Machine Learning (stat.ML)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this paper we develop a new model for deep image clustering, using

convolutional neural networks and tensor kernels. The proposed Deep Tensor

Kernel Clustering (DTKC) consists of a convolutional neural network (CNN),

which is trained to reflect a common cluster structure at the output of its

intermediate layers. Encouraging a consistent cluster structure throughout the

network has the potential to guide it towards meaningful clusters, even though

these clusters might appear to be nonlinear in the input space. The cluster

structure is enforced through the idea of unsupervised companion objectives,

where separate loss functions are attached to layers in the network. These

unsupervised companion objectives are constructed based on a proposed

generalization of the Cauchy-Schwarz (CS) divergence, from vectors to tensors

of arbitrary rank. Generalizing the CS divergence to tensor-valued data is a

crucial step, due to the tensorial nature of the intermediate representations

in the CNN. Several experiments are conducted to thoroughly assess the

performance of the proposed DTKC model. The results indicate that the model

outperforms, or performs comparable to, a wide range of baseline algorithms. We

also empirically demonstrate that our model does not suffer from objective

function mismatch, which can be a problematic artifact in autoencoder-based

clustering models.

An Efficient Framework for Automated Screening of Clinically Significant Macular Edema

Renoh Johnson Chalakkal , Faizal Hafiz , Waleed Abdulla , Akshya Swain Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The present study proposes a new approach to automated screening of

Clinically Significant Macular Edema (CSME) and addresses two major challenges

associated with such screenings, i.e., exudate segmentation and imbalanced

datasets. The proposed approach replaces the conventional exudate segmentation

based feature extraction by combining a pre-trained deep neural network with

meta-heuristic feature selection. A feature space over-sampling technique is

being used to overcome the effects of skewed datasets and the screening is

accomplished by a k-NN based classifier. The role of each data-processing step

(e.g., class balancing, feature selection) and the effects of limiting the

region-of-interest to fovea on the classification performance are critically

analyzed. Finally, the selection and implication of operating point on Receiver

Operating Characteristic curve are discussed. The results of this study

convincingly demonstrate that by following these fundamental practices of

machine learning, a basic k-NN based classifier could effectively accomplish

the CSME screening.

CNN-Based Real-Time Parameter Tuning for Optimizing Denoising Filter Performance

Subhayan Mukherjee , Navaneeth Kamballur Kottayil , Xinyao Sun , Irene Cheng

Comments: 2019 International Conference on Image Analysis and Recognition

Subjects

:

Image and Video Processing (eess.IV)

; Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a novel direction to improve the denoising quality of

filtering-based denoising algorithms in real time by predicting the best filter

parameter value using a Convolutional Neural Network (CNN). We take the use

case of BM3D, the state-of-the-art filtering-based denoising algorithm, to

demonstrate and validate our approach. We propose and train a simple, shallow

CNN to predict in real time, the optimum filter parameter value, given the

input noisy image. Each training example consists of a noisy input image

(training data) and the filter parameter value that produces the best output

(training label). Both qualitative and quantitative results using the widely

used PSNR and SSIM metrics on the popular BSD68 dataset show that the

CNN-guided BM3D outperforms the original, unguided BM3D across different noise

levels. Thus, our proposed method is a CNN-based improvement on the original

BM3D which uses a fixed, default parameter value for all images.

CNN-based InSAR Coherence Classification

Subhayan Mukherjee , Aaron Zimmer , Xinyao Sun , Parwant Ghuman , Irene Cheng

Comments: 2018 IEEE SENSORS

Subjects

:

Image and Video Processing (eess.IV)

; Machine Learning (cs.LG); Machine Learning (stat.ML)

Interferometric Synthetic Aperture Radar (InSAR) imagery based on microwaves

reflected off ground targets is becoming increasingly important in remote

sensing for ground movement estimation. However, the reflections are

contaminated by noise, which distorts the signal’s wrapped phase. Demarcation

of image regions based on degree of contamination (“coherence”) is an important

component of the InSAR processing pipeline. We introduce Convolutional Neural

Networks (CNNs) to this problem domain and show their effectiveness in

improving coherence-based demarcation and reducing misclassifications in

completely incoherent regions through intelligent preprocessing of training

data. Quantitative and qualitative comparisons prove superiority of proposed

method over three established methods.

CNN-based InSAR Denoising and Coherence Metric

Subhayan Mukherjee , Aaron Zimmer , Navaneeth Kamballur Kottayil , Xinyao Sun , Parwant Ghuman , Irene Cheng

Comments: 2018 IEEE SENSORS

Subjects

:

Image and Video Processing (eess.IV)

; Machine Learning (cs.LG); Machine Learning (stat.ML)

Interferometric Synthetic Aperture Radar (InSAR) imagery for estimating

ground movement, based on microwaves reflected off ground targets is gaining

increasing importance in remote sensing. However, noise corrupts microwave

reflections received at satellite and contaminates the signal’s wrapped phase.

We introduce Convolutional Neural Networks (CNNs) to this problem domain and

show the effectiveness of autoencoder CNN architectures to learn InSAR image

denoising filters in the absence of clean ground truth images, and for artefact

reduction in estimated coherence through intelligent preprocessing of training

data. We compare our results with four established methods to illustrate

superiority of proposed method.

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Ruiyi Zhang , Changyou Chen , Zhe Gan , Zheng Wen , Wenlin Wang , Lawrence Carin

Comments: Accepted by AISTATS2020

Subjects

:

Computation and Language (cs.CL)

; Machine Learning (cs.LG)

Reinforcement learning (RL) has been widely studied for improving

sequence-generation models. However, the conventional rewards used for RL

training typically cannot capture sufficient semantic information and therefore

render model bias. Further, the sparse and delayed rewards make RL exploration

inefficient. To alleviate these issues, we propose the concept of

nested-Wasserstein distance for distributional semantic matching. To further

exploit it, a novel nested-Wasserstein self-imitation learning framework is

developed, encouraging the model to exploit historical high-rewarded sequences

for enhanced exploration and better semantic matching. Our solution can be

understood as approximately executing proximal policy optimization with

Wasserstein trust-regions. Experiments on a variety of unconditional and

conditional sequence-generation tasks demonstrate the proposed approach

consistently leads to improved performance.

Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective

Shun-ichi Amari Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)

It is known that any target function is realized in a sufficiently small

neighborhood of any randomly connected deep network, provided the width (the

number of neurons in a layer) is sufficiently large. There are sophisticated

theories and discussions concerning this striking fact, but rigorous theories

are very complicated. We give an elementary geometrical proof by using a simple

model for the purpose of elucidating its structure. We show that

high-dimensional geometry plays a magical role: When we project a

high-dimensional sphere of radius 1 to a low-dimensional subspace, the uniform

distribution over the sphere reduces to a Gaussian distribution of negligibly

small covariances.

Memristor Hardware-Friendly Reinforcement Learning

Nan Wu , Adrien Vincent , Dmitri Strukov , Yuan Xie Subjects : Emerging Technologies (cs.ET) ; Machine Learning (cs.LG)

Recently, significant progress has been made in solving sophisticated

problems among various domains by using reinforcement learning (RL), which

allows machines or agents to learn from interactions with environments rather

than explicit supervision. As the end of Moore’s law seems to be imminent,

emerging technologies that enable high performance neuromorphic hardware

systems are attracting increasing attention. Namely, neuromorphic architectures

that leverage memristors, the programmable and nonvolatile two-terminal

devices, as synaptic weights in hardware neural networks, are candidates of

choice to realize such highly energy-efficient and complex nervous systems.

However, one of the challenges for memristive hardware with integrated learning

capabilities is prohibitively large number of write cycles that might be

required during learning process, and this situation is even exacerbated under

RL situations. In this work we propose a memristive neuromorphic hardware

implementation for the actor-critic algorithm in RL. By introducing a two-fold

training procedure (i.e., ex-situ pre-training and in-situ re-training) and

several training techniques, the number of weight updates can be significantly

reduced and thus it will be suitable for efficient in-situ learning

implementations. As a case study, we consider the task of balancing an inverted

pendulum, a classical problem in both RL and control theory. We believe that

this study shows the promise of using memristor-based hardware neural networks

for handling complex tasks through in-situ reinforcement learning.

SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions

Ramprasaath R. Selvaraju , Purva Tendulkar , Devi Parikh , Eric Horvitz , Marco Ribeiro , Besmira Nushi , Ece Kamar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Existing VQA datasets contain questions with varying levels of complexity.

While the majority of questions in these datasets require perception for

recognizing existence, properties, and spatial relationships of entities, a

significant portion of questions pose challenges that correspond to reasoning

tasks — tasks that can only be answered through a synthesis of perception and

knowledge about the world, logic and / or reasoning. This distinction allows us

to notice when existing VQA models have consistency issues — they answer the

reasoning question correctly but fail on associated low-level perception

questions. For example, models answer the complex reasoning question “Is the

banana ripe enough to eat?” correctly, but fail on the associated perception

question “Are the bananas mostly green or yellow?” indicating that the model

likely answered the reasoning question correctly but for the wrong reason. We

quantify the extent to which this phenomenon occurs by creating a new Reasoning

split of the VQA dataset and collecting Sub-VQA, a new dataset consisting of

200K new perception questions which serve as sub questions corresponding to the

set of perceptual tasks needed to effectively answer the complex reasoning

questions in the Reasoning split. Additionally, we propose an approach called

Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the

model to attend do the same parts of the image when answering the reasoning

question and the perception sub questions. We show that SQuINT improves model

consistency by 7.8%, also marginally improving its performance on the Reasoning

questions in VQA, while also displaying qualitatively better attention maps.

Optimal Rate of Convergence for Deep Neural Network Classifiers under the Teacher-Student Setting

Tianyang Hu , Zuofeng Shang , Guang Cheng Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)

Classifiers built with neural networks handle large-scale high-dimensional

data, such as facial images from computer vision, extremely well while

traditional statistical methods often fail miserably. In this paper, we attempt

to understand this empirical success in high dimensional classification by

deriving the convergence rates of excess risk. In particular, a teacher-student

framework is proposed that assumes the Bayes classifier to be expressed as ReLU

neural networks. In this setup, we obtain a dimension-independent and

un-improvable rate of convergence, i.e., (O(n^{-2/3})), for classifiers trained

based on either 0-1 loss or hinge loss. This rate can be further improved to

(O(n^{-1})) when data is separable. Here, (n) represents the sample size.

A multimodal deep learning approach for named entity recognition from social media

Meysam Asgari-Chenaghlu , M.Reza Feizi-Derakhshi , Leili Farzinvash , Cina Motamed Subjects : Computation and Language (cs.CL) ; Machine Learning (cs.LG); Multimedia (cs.MM); Social and Information Networks (cs.SI)

Named Entity Recognition (NER) from social media posts is a challenging task.

User generated content which forms the nature of social media, is noisy and

contains grammatical and linguistic errors. This noisy content makes it much

harder for tasks such as named entity recognition. However some applications

like automatic journalism or information retrieval from social media, require

more information about entities mentioned in groups of social media posts.

Conventional methods applied to structured and well typed documents provide

acceptable results while compared to new user generated media, these methods

are not satisfactory. One valuable piece of information about an entity is the

related image to the text. Combining this multimodal data reduces ambiguity and

provides wider information about the entities mentioned. In order to address

this issue, we propose a novel deep learning approach utilizing multimodal deep

learning. Our solution is able to provide more accurate results on named entity

recognition task. Experimental results, namely the precision, recall and F1

score metrics show the superiority of our work compared to other

state-of-the-art NER solutions.

A meta-algorithm for classification using random recursive tree ensembles: A high energy physics application

Vidhi Lalchand

Comments: MPhil Thesis (Scientific Computing, Physics, Machine Learning)

Subjects

:

Machine Learning (stat.ML)

; Machine Learning (cs.LG); Applications (stat.AP)

The aim of this work is to propose a meta-algorithm for automatic

classification in the presence of discrete binary classes. Classifier learning

in the presence of overlapping class distributions is a challenging problem in

machine learning. Overlapping classes are described by the presence of

ambiguous areas in the feature space with a high density of points belonging to

both classes. This often occurs in real-world datasets, one such example is

numeric data denoting properties of particle decays derived from high-energy

accelerators like the Large Hadron Collider (LHC). A significant body of

research targeting the class overlap problem use ensemble classifiers to boost

the performance of algorithms by using them iteratively in multiple stages or

using multiple copies of the same model on different subsets of the input

training data. The former is called boosting and the latter is called bagging.

The algorithm proposed in this thesis targets a challenging classification

problem in high energy physics – that of improving the statistical significance

of the Higgs discovery. The underlying dataset used to train the algorithm is

experimental data built from the official ATLAS full-detector simulation with

Higgs events (signal) mixed with different background events (background) that

closely mimic the statistical properties of the signal generating class

overlap. The algorithm proposed is a variant of the classical boosted decision

tree which is known to be one of the most successful analysis techniques in

experimental physics. The algorithm utilizes a unified framework that combines

two meta-learning techniques – bagging and boosting. The results show that this

combination only works in the presence of a randomization trick in the base

learners.

Finding Optimal Points for Expensive Functions Using Adaptive RBF-Based Surrogate Model Via Uncertainty Quantification

Ray-Bing Chen , Yuan Wang , C. F. Jeff Wu

Comments: 35 pages, 5 figures, 5 tables

Subjects

:

Machine Learning (stat.ML)

; Machine Learning (cs.LG); Methodology (stat.ME)

Global optimization of expensive functions has important applications in

physical and computer experiments. It is a challenging problem to develop

efficient optimization scheme, because each function evaluation can be costly

and the derivative information of the function is often not available. We

propose a novel global optimization framework using adaptive Radial Basis

Functions (RBF) based surrogate model via uncertainty quantification. The

framework consists of two iteration steps. It first employs an RBF-based

Bayesian surrogate model to approximate the true function, where the parameters

of the RBFs can be adaptively estimated and updated each time a new point is

explored. Then it utilizes a model-guided selection criterion to identify a new

point from a candidate set for function evaluation. The selection criterion

adopted here is a sample version of the expected improvement (EI) criterion. We

conduct simulation studies with standard test functions, which show that the

proposed method has some advantages, especially when the true surface is not

very smooth. In addition, we also propose modified approaches to improve the

search performance for identifying global optimal points and to deal with the

higher dimension scenarios.

SQLFlow: A Bridge between SQL and Machine Learning

Yi Wang , Yang Yang , Weiguo Zhu , Yi Wu , Xu Yan , Yongfeng Liu , Yu Wang , Liang Xie , Ziyao Gao , Wenjing Zhu , Xiang Chen , Wei Yan , Mingjie Tang , Yuan Tang Subjects : Databases (cs.DB) ; Machine Learning (cs.LG)

Industrial AI systems are mostly end-to-end machine learning (ML) workflows.

A typical recommendation or business intelligence system includes many online

micro-services and offline jobs. We describe SQLFlow for developing such

workflows efficiently in SQL. SQL enables developers to write short programs

focusing on the purpose (what) and ignoring the procedure (how). Previous

database systems extended their SQL dialect to support ML. SQLFlow

( this https URL ) takes another strategy to work as a bridge over

various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute,

and ML engines like TensorFlow, XGBoost, and scikit-learn. We extended SQL

syntax carefully to make the extension working with various SQL dialects. We

implement the extension by inventing a collaborative parsing algorithm. SQLFlow

is efficient and expressive to a wide variety of ML techniques — supervised

and unsupervised learning; deep networks and tree models; visual model

explanation in addition to training and prediction; data processing and feature

extraction in addition to ML. SQLFlow compiles a SQL program into a

Kubernetes-native workflow for fault-tolerable execution and on-cloud

deployment. Current industrial users include Ant Financial, DiDi, and Alibaba

Group.

Dual Stochastic Natural Gradient Descent

Borja Sánchez-López , Jesús Cerquides

Comments: 16 pages

Subjects

:

Optimization and Control (math.OC)

; Machine Learning (cs.LG); Machine Learning (stat.ML)

Although theoretically appealing, Stochastic Natural Gradient Descent (SNGD)

is computationally expensive, it has been shown to be highly sensitive to the

learning rate, and it is not guaranteed to be convergent. Convergent Stochastic

Natural Gradient Descent (CSNGD) aims at solving the last two problems.

However, the computational expense of CSNGD is still unacceptable when the

number of parameters is large. In this paper we introduce the Dual Stochastic

Natural Gradient Descent (DSNGD) where we take benefit of dually flat manifolds

to obtain a robust alternative to SNGD which is also computationally feasible.

Big-Data Science in Porous Materials: Materials Genomics and Machine Learning

Kevin Maik Jablonka , Daniele Ongari , Seyed Mohamad Moosavi , Berend Smit

Comments: 159 pages, 46 figures

Subjects

:

Materials Science (cond-mat.mtrl-sci)

; Machine Learning (cs.LG)

By combining metal nodes with organic linkers we can potentially synthesize

millions of possible metal organic frameworks (MOFs). At present, we have

libraries of over ten thousand synthesized materials and millions of in-silico

predicted materials. The fact that we have so many materials opens many

exciting avenues to tailor make a material that is optimal for a given

application. However, from an experimental and computational point of view we

simply have too many materials to screen using brute-force techniques. In this

review, we show that having so many materials allows us to use big-data methods

as a powerful technique to study these materials and to discover complex

correlations. The first part of the review gives an introduction to the

principles of big-data science. We emphasize the importance of data collection,

methods to augment small data sets, how to select appropriate training sets. An

important part of this review are the different approaches that are used to

represent these materials in feature space. The review also includes a general

overview of the different ML techniques, but as most applications in porous

materials use supervised ML our review is focused on the different approaches

for supervised ML. In particular, we review the different method to optimize

the ML process and how to quantify the performance of the different methods. In

the second part, we review how the different approaches of ML have been applied

to porous materials. In particular, we discuss applications in the field of gas

storage and separation, the stability of these materials, their electronic

properties, and their synthesis. The range of topics illustrates the large

variety of topics that can be studied with big-data science. Given the

increasing interest of the scientific community in ML, we expect this list to

rapidly expand in the coming years.

Adaptive Stochastic Optimization

Frank E. Curtis , Katya Scheinberg Subjects : Optimization and Control (math.OC) ; Machine Learning (cs.LG); Machine Learning (stat.ML)

Optimization lies at the heart of machine learning and signal processing.

Contemporary approaches based on the stochastic gradient method are

non-adaptive in the sense that their implementation employs prescribed

parameter values that need to be tuned for each application. This article

summarizes recent research and motivates future work on adaptive stochastic

optimization methods, which have the potential to offer significant

computational savings when training large-scale systems.

Teaching Software Engineering for AI-Enabled Systems

Christian Kästner , Eunsuk Kang

Comments: to be published in ICSE-SEET 2020

Subjects

:

Software Engineering (cs.SE)

; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Software engineers have significant expertise to offer when building

intelligent systems, drawing on decades of experience and methods for building

systems that are scalable, responsive and robust, even when built on unreliable

components. Systems with artificial-intelligence or machine-learning (ML)

components raise new challenges and require careful engineering. We designed a

new course to teach software-engineering skills to students with a background

in ML. We specifically go beyond traditional ML courses that teach modeling

techniques under artificial conditions and focus, in lecture and assignments,

on realism with large and changing datasets, robust and evolvable

infrastructure, and purposeful requirements engineering that considers ethics

and fairness as well. We describe the course and our infrastructure and share

experience and all material from teaching the course for the first time.

How do Data Science Workers Collaborate? Roles, Workflows, and Tools

Amy X. Zhang , Michael Muller , Dakuo Wang

Comments: CSCW’2020

Subjects

:

Human-Computer Interaction (cs.HC)

; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE); Machine Learning (stat.ML)

Today, the prominence of data science within organizations has given rise to

teams of data science workers collaborating on extracting insights from data,

as opposed to individual data scientists working alone. However, we still lack

a deep understanding of how data science workers collaborate in practice. In

this work, we conducted an online survey with 183 participants who work in

various aspects of data science. We focused on their reported interactions with

each other (e.g., managers with engineers) and with different tools (e.g.,

Jupyter Notebook). We found that data science teams are extremely collaborative

and work with a variety of stakeholders and tools during the six common steps

of a data science workflow (e.g., clean data and train model). We also found

that the collaborative practices workers employ, such as documentation, vary

according to the kinds of tools they use. Based on these findings, we discuss

design implications for supporting data science team collaborations and future

research directions.

Efficient Neural Architecture Search: A Broad Version

Zixiang Ding , Yaran Chen , Nannan Li , Dongbin Zhao , C.L. Philip Chen

Comments: 8 pages, 2 figures, 2 tables

Subjects

:

Machine Learning (stat.ML)

; Machine Learning (cs.LG)

Efficient Neural Architecture Search (ENAS) achieves novel efficiency for

learning architecture with high-performance via parameter sharing, but suffers

from an issue of slow propagation speed of search model with deep topology. In

this paper, we propose a Broad version for ENAS (BENAS) to solve the above

issue, by learning broad architecture whose propagation speed is fast with

reinforcement learning and parameter sharing used in ENAS, thereby achieving a

higher search efficiency. In particular, we elaborately design Broad

Convolutional Neural Network (BCNN), the search paradigm of BENAS with fast

propagation speed, which can obtain a satisfactory performance with broad

topology, i.e. fast forward and backward propagation speed. The proposed BCNN

extracts multi-scale features and enhancement representations, and feeds them

into global average pooling layer to yield more reasonable and comprehensive

representations so that the achieved performance of BCNN with shallow topology

can be promised. In order to verify the effectiveness of BENAS, several

experiments are performed and experimental results show that 1) BENAS delivers

0.23 day which is 2x less expensive than ENAS, 2) the architecture learned by

BENAS based small-size BCNNs with 0.5 and 1.1 millions parameters obtain

state-of-the-art performance, 3.63% and 3.40% test error on CIFAR-10, 3) the

learned architecture based BCNN achieves 25.3\% top-1 error on ImageNet just

using 3.9 millions parameters.

Deep Collaborative Embedding for information cascade prediction

Yuhui Zhao , Ning Yang , Tao Lin , Philip S. Yu

Journal-ref: Knowledge-Based Systems, 2020, 105502

Subjects

:

Social and Information Networks (cs.SI)

; Machine Learning (cs.LG)

Recently, information cascade prediction has attracted increasing interest

from researchers, but it is far from being well solved partly due to the three

defects of the existing works. First, the existing works often assume an

underlying information diffusion model, which is impractical in real world due

to the complexity of information diffusion. Second, the existing works often

ignore the prediction of the infection order, which also plays an important

role in social network analysis. At last, the existing works often depend on

the requirement of underlying diffusion networks which are likely unobservable

in practice. In this paper, we aim at the prediction of both node infection and

infection order without requirement of the knowledge about the underlying

diffusion mechanism and the diffusion network, where the challenges are

two-fold. The first is what cascading characteristics of nodes should be

captured and how to capture them, and the second is that how to model the

non-linear features of nodes in information cascades. To address these

challenges, we propose a novel model called Deep Collaborative Embedding (DCE)

for information cascade prediction, which can capture not only the node

structural property but also two kinds of node cascading characteristics. We

propose an auto-encoder based collaborative embedding framework to learn the

node embeddings with cascade collaboration and node collaboration, in which way

the non-linearity of information cascades can be effectively captured. The

results of extensive experiments conducted on real-world datasets verify the

effectiveness of our approach.

Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval

Anubha Pandey , Ashish Mishra , Vinay Kumar Verma , Anurag Mittal , Hema A. Murthy

Comments: Accepted in WACV’2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)

Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that

the data of all the classes are available during training. The assumption may

not always be practical since the data of a few classes may be unavailable, or

the classes may not appear at the time of training. Zero-Shot Sketch-Based

Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to

handle previously unseen classes during the test. This paper proposes a

generative approach based on the Stacked Adversarial Network (SAN) and the

advantage of Siamese Network (SN) for ZS-SBIR. While SAN generates a

high-quality sample, SN learns a better distance metric compared to that of the

nearest neighbor search. The capability of the generative model to synthesize

image features based on the sketch reduces the SBIR problem to that of an

image-to-image retrieval problem. We evaluate the efficacy of our proposed

approach on TU-Berlin, and Sketchy database in both standard ZSL and

generalized ZSL setting. The proposed method yields a significant improvement

in standard ZSL as well as in a more challenging generalized ZSL setting (GZSL)

for SBIR.

Adaptive Parameterization for Neural Dialogue Generation

Hengyi Cai , Hongshen Chen , Cheng Zhang , Yonghao Song , Xiaofang Zhao , Dawei Yin

Comments: Published as a long paper in EMNLP 2019

Subjects

:

Computation and Language (cs.CL)

; Information Retrieval (cs.IR); Machine Learning (cs.LG)

Neural conversation systems generate responses based on the

sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with

a single set of learned parameters to generate responses for given input

contexts. When confronting diverse conversations, its adaptability is rather

limited and the model is hence prone to generate generic responses. In this

work, we propose an {f Ada}ptive {f N}eural {f D}ialogue generation

model, extsc{AdaND}, which manages various conversations with

conversation-specific parameterization. For each conversation, the model

generates parameters of the encoder-decoder by referring to the input context.

In particular, we propose two adaptive parameterization mechanisms: a

context-aware and a topic-aware parameterization mechanism. The context-aware

parameterization directly generates the parameters by capturing local semantics

of the given context. The topic-aware parameterization enables parameter

sharing among conversations with similar topics by first inferring the latent

topics of the given context and then generating the parameters with respect to

the distributional topics. Extensive experiments conducted on a large-scale

real-world conversational dataset show that our model achieves superior

performance in terms of both quantitative metrics and human evaluations.

Machine Learning in Quantitative PET Imaging

Tonghe Wang , Yang Lei , Yabo Fu , Walter J. Curran , Tian Liu , Xiaofeng Yang

Comments: 25 pages, 2 tables

Subjects

:

Image and Video Processing (eess.IV)

; Machine Learning (cs.LG); Medical Physics (physics.med-ph); Machine Learning (stat.ML)

This paper reviewed the machine learning-based studies for quantitative

positron emission tomography (PET). Specifically, we summarized the recent

developments of machine learning-based methods in PET attenuation correction

and low-count PET reconstruction by listing and comparing the proposed methods,

study designs and reported performances of the current published studies with

brief discussion on representative studies. The contributions and challenges

among the reviewed studies were summarized and highlighted in the discussion

part followed by.

Harmonic Convolutional Networks based on Discrete Cosine Transform

Matej Ulicny , Vladimir A. Krylov , Rozenn Dahyot

Comments: arXiv admin note: substantial text overlap with arXiv:1812.03205

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Machine Learning (cs.LG)

Convolutional neural networks (CNNs) learn filters in order to capture local

correlation patterns in feature space. In this paper we propose to revert to

learning combinations of preset spectral filters by switching to CNNs with

harmonic blocks. We rely on the use of the Discrete Cosine Transform (DCT)

filters which have excellent energy compaction properties and are widely used

for image compression. The proposed harmonic blocks rely on DCT-modeling and

replace conventional convolutional layers to produce partially or fully

harmonic versions of new or existing CNN architectures. We demonstrate how the

harmonic networks can be efficiently compressed in a straightforward manner by

truncating high-frequency information in harmonic blocks which is possible due

to the redundancies in the spectral domain. We report extensive experimental

validation demonstrating the benefits of the introduction of harmonic blocks

into state-of-the-art CNN models in image classification, segmentation and edge

detection applications.

Privacy Amplification of Iterative Algorithms via Contraction Coefficients

Shahab Asoodeh , Mario Diaz , Flavio P. Calmon

Comments: Submitted for publication

Subjects

:

Information Theory (cs.IT)

; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)

We investigate the framework of privacy amplification by iteration, recently

proposed by Feldman et al., from an information-theoretic lens. We demonstrate

that differential privacy guarantees of iterative mappings can be determined by

a direct application of contraction coefficients derived from strong data

processing inequalities for (f)-divergences. In particular, by generalizing the

Dobrushin’s contraction coefficient for total variation distance to an

(f)-divergence known as (E_{gamma})-divergence, we derive tighter bounds on

the differential privacy parameters of the projected noisy stochastic gradient

descent algorithm with hidden intermediate updates.

Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition

Sebastian Raschka , Benjamin Kaufman Subjects : Biomolecules (q-bio.BM) ; Machine Learning (cs.LG)

In the last decade, machine learning and artificial intelligence applications

have received a significant boost in performance and attention in both academic

research and industry. The success behind most of the recent state-of-the-art

methods can be attributed to the latest developments in deep learning. When

applied to various scientific domains that are concerned with the processing of

non-tabular data, for example, image or text, deep learning has been shown to

outperform not only conventional machine learning but also highly specialized

tools developed by domain experts. This review aims to summarize AI-based

research for GPCR bioactive ligand discovery with a particular focus on the

most recent achievements and research trends. To make this article accessible

to a broad audience of computational scientists, we provide instructive

explanations of the underlying methodology, including overviews of the most

commonly used deep learning architectures and feature representations of

molecular data. We highlight the latest AI-based research that has led to the

successful discovery of GPCR bioactive ligands. However, an equal focus of this

review is on the discussion of machine learning-based technology that has been

applied to ligand discovery in general and has the potential to pave the way

for successful GPCR bioactive ligand discovery in the future. This review

concludes with a brief outlook highlighting the recent research trends in deep

learning, such as active learning and semi-supervised learning, which have

great potential for advancing bioactive ligand discovery.

Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory

Yunlong Lu , Kai Yan

Comments: 14 pages, review. arXiv admin note: text overlap with arXiv:1810.05587 by other authors

Subjects

:

Computer Science and Game Theory (cs.GT)

; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Deep reinforcement learning (RL) has achieved outstanding results in recent

years, which has led a dramatic increase in the number of methods and

applications. Recent works are exploring learning beyond single-agent scenarios

and considering multi-agent scenarios. However, they are faced with lots of

challenges and are seeking for help from traditional game-theoretic algorithms,

which, in turn, show bright application promise combined with modern algorithms

and boosting computing power. In this survey, we first introduce basic concepts

and algorithms in single agent RL and multi-agent systems; then, we summarize

the related algorithms from three aspects. Solution concepts from game theory

give inspiration to algorithms which try to evaluate the agents or find better

solutions in multi-agent systems. Fictitious self-play becomes popular and has

a great impact on the algorithm of multi-agent reinforcement learning.

Counterfactual regret minimization is an important tool to solve games with

incomplete information, and has shown great strength when combined with deep

learning.

Unsupervised Deep Features for Privacy Image Classification

Chiranjibi Sitaula , Yong Xiang , Sunil Aryal , Xuequan Lu

Comments: Accepted in PSIVT2019 Conference

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Machine Learning (cs.LG)

Sharing images online poses security threats to a wide range of users due to

the unawareness of privacy information. Deep features have been demonstrated to

be a powerful representation for images. However, deep features usually suffer

from the issues of a large size and requiring a huge amount of data for

fine-tuning. In contrast to normal images (e.g., scene images), privacy images

are often limited because of sensitive information. In this paper, we propose a

novel approach that can work on limited data and generate deep features of

smaller size. For training images, we first extract the initial deep features

from the pre-trained model and then employ the K-means clustering algorithm to

learn the centroids of these initial deep features. We use the learned

centroids from training features to extract the final features for each testing

image and encode our final features with the triangle encoding. To improve the

discriminability of the features, we further perform the fusion of two proposed

unsupervised deep features obtained from different layers. Experimental results

show that the proposed features outperform state-of-the-art deep features, in

terms of both classification accuracy and testing time.

Information Theory

Caching at Base Stations with Multi-Cluster Multicast Wireless Backhaul via Accelerated First-Order Algorithm

Yang Li , Minghua Xia , Yik-Chung Wu

Comments: 14 pages, 8 figures, accepted by IEEE Transactions on Wireless Communications

Subjects

:

Information Theory (cs.IT)

Cloud radio access network (C-RAN) has been recognized as a promising

architecture for next-generation wireless systems to extcolor{black}{support}

the rapidly increasing demand for higher data rate. However, the performance of

C-RAN is limited by the backhaul capacities, especially for the wireless

deployment. While C-RAN with fixed BS caching has been demonstrated to reduce

backhaul consumption, it is more challenging to further optimize the cache

allocation at BSs with multi-cluster multicast backhaul, where the

inter-cluster interference induces additional non-convexity to the cache

optimization problem. Despite the challenges, we propose an accelerated

first-order algorithm, which achieves much higher content downloading sum-rate

than a second-order algorithm running for the same amount of time. Simulation

results demonstrate that, by simultaneously delivering the required contents to

different multicast clusters, the proposed algorithm achieves significantly

higher downloading sum-rate than those of time-division single-cluster

transmission schemes. Moreover, it is found that the proposed algorithm

allocates larger cache sizes to the farther BSs within the nearer clusters,

which provides insight to the superiority of the proposed cache allocation.

On the Capacity of the Oversampled Wiener Phase Noise Channel

Luca Barletta , Stefano Rini Subjects : Information Theory (cs.IT)

In this paper, the capacity of the oversampled Wiener phase noise (OWPN)

channel is investigated. The OWPN channel is a discrete-time point-to-point

channel with a multi-sample receiver in which the channel output is affected by

both additive and multiplicative noise. The additive noise is a white standard

Gaussian process while the multiplicative noise is a Wiener phase noise

process. This channel generalizes a number of channel models previously studied

in the literature which investigate the effects of phase noise on the channel

capacity, such as the Wiener phase noise channel and the non-coherent channel.

We derive upper and inner bounds to the capacity of OWPN channel: (i) an upper

bound is derived through the I-MMSE relationship by bounding the Fisher

information when estimating a phase noise sample given the past channel outputs

and phase noise realizations, then (ii) two inner bounds are shown: one relying

on coherent combining of the oversampled channel outputs and one relying on

non-coherent combining of the samples. After capacity, we study generalized

degrees of freedom (GDoF) of the OWPN channel for the case in which the

oversampling factor grows with the average transmit power (P) as (P)? and the

frequency noise variance as (P^{alpha})?. Using our new capacity bounds, we

derive the GDoF region in three regimes: regime (i) in which the GDoF region

equals that of the classic additive white Gaussian noise (for (eta leq 1)),

one (ii) in which GDoF region reduces to that of the non-coherent channel (for

(eta geq min {alpha,1})) and, finally, one in which partially-coherent

combining of the over-samples is asymptotically optimal (for (2 alpha-1leq

eta leq 1)). Overall, our results are the first to identify the regimes in

which different oversampling strategies are asymptotically optimal.

Pruning Neural Belief Propagation Decoders

Andreas Buchberger , Christian Häger , Henry D. Pfister , Laurent Schmalen , Alexandre Graell i Amat Subjects : Information Theory (cs.IT) ; Machine Learning (cs.LG)

We consider near maximum-likelihood (ML) decoding of short linear block codes

based on neural belief propagation (BP) decoding recently introduced by

Nachmani et al.. While this method significantly outperforms conventional BP

decoding, the underlying parity-check matrix may still limit the overall

performance. In this paper, we introduce a method to tailor an overcomplete

parity-check matrix to (neural) BP decoding using machine learning. We consider

the weights in the Tanner graph as an indication of the importance of the

connected check nodes (CNs) to decoding and use them to prune unimportant CNs.

As the pruning is not tied over iterations, the final decoder uses a different

parity-check matrix in each iteration. For Reed-Muller and short low-density

parity-check codes, we achieve performance within 0.27 dB and 1.5 dB of the ML

performance while reducing the complexity of the decoder.

Efficient Angle-Domain Processing for FDD-based Cell-free Massive MIMO Systems

Asmaa Abdallah , Mohammad M. Mansour Subjects : Information Theory (cs.IT)

Cell-free massive MIMO communications is an emerging network technology for

5G wireless communications wherein distributed multi-antenna access points

(APs) serve many users simultaneously. Most prior work on cell-free massive

MIMO systems assume time-division duplexing mode, although frequency-division

duplexing (FDD) systems dominate current wireless standards. The key challenges

in FDD massive MIMO systems are channel-state information (CSI) acquisition and

feedback overhead. To address these challenges, we exploit the so-called angle

reciprocity of multipath components in the uplink and downlink, so that the

required CSI acquisition overhead scales only with the number of served users,

and not the number of AP antennas nor APs. We propose a low complexity

multipath component estimation technique and present linear angle-of-arrival

(AoA)-based beamforming/combining schemes for FDD-based cell-free massive MIMO

systems. We analyze the performance of these schemes by deriving closed-form

expressions for the mean-square-error of the estimated multipath components, as

well as expressions for the uplink and downlink spectral efficiency. Using

semi-definite programming, we solve a max-min power allocation problem that

maximizes the minimum user rate under per-user power constraints. Furthermore,

we present a user-centric (UC) AP selection scheme in which each user chooses a

subset of APs to improve the overall energy efficiency of the system.

Simulation results demonstrate that the proposed multipath component estimation

technique outperforms conventional subspace-based and gradient-descent based

techniques. We also show that the proposed beamforming and combining techniques

along with the proposed power control scheme substantially enhance the spectral

and energy efficiencies with an adequate number of antennas at the APs.

The 2-adic complexity of Yu-Gong sequences with interleaved structure and optimal autocorrelation magnitude

Yuhua Sun , Tongjiang Yan Subjects : Information Theory (cs.IT)

In 2008, a class of binary sequences of period (N=4(2^k-1)(2^k+1)) with

optimal autocorrelation magnitude has been presented by Yu and Gong based on an

(m)-sequence, the perfect sequence ((0,1,1,1)) of period (4) and interleaving

technique. In this paper, we study the 2-adic complexities of these sequences.

Our results show that they are larger than (N-2lceilmathrm{log}_2N

ceil)

(which is far larger than (N/2)) and could attain the maximum value (N) if

suitable parameters are chosen, i.e., the 2-adic complexity of this class of

interleaved sequences is large enough to resist the Rational Approximation

Algorithm.

Coordinated Multi-Point Transmission: A Poisson-Delaunay Triangulation Based Approach

Yan Li , Minghua Xia , Sonia Aïssa

Comments: 14 pages, 11 figures, accepted by IEEE Transactions on Wireless Communications

Subjects

:

Information Theory (cs.IT)

Coordinated multi-point (CoMP) transmission is a cooperating technique among

base stations (BSs) in a cellular network, with outstanding capability at

inter-cell interference (ICI) mitigation. ICI is a dominant source of error,

and has detrimental effects on system performance if not managed properly.

Based on the theory of Poisson-Delaunay triangulation, this paper proposes a

novel analytical model for CoMP operation in cellular networks. Unlike the

conventional CoMP operation that is dynamic and needs on-line updating

occasionally, the proposed approach enables the cooperating BS set of a user

equipment (UE) to be fixed and off-line determined according to the location

information of BSs. By using the theory of stochastic geometry, the coverage

probability and spectral efficiency of a typical UE are analyzed, and

simulation results corroborate the effectiveness of the proposed CoMP scheme

and the developed performance analysis.

Enabling Panoramic Full-Angle Reflection via Aerial Intelligent Reflecting Surface

Haiquan Lu , Yong Zeng , Shi Jin , Rui Zhang Subjects : Information Theory (cs.IT)

This paper proposes a new three dimensional (3D) networking architecture

enabled by aerial intelligent reflecting surface (AIRS) to achieve panoramic

signal reflection from the sky. Compared to the conventional terrestrial IRS,

AIRS not only enjoys higher deployment flexibility, but also is able to achieve

360(^circ) panoramic full-angle reflection and requires fewer reflections in

general due to its higher likelihood of having line of sight (LoS) links with

the ground nodes. We focus on the problem to maximize the worst-case

signal-to-noise ratio (SNR) in a given coverage area by jointly optimizing the

transmit beamforming, AIRS placement and phase shifts. The formulated problem

is non-convex and the optimization variables are coupled with each other in an

intricate manner. To tackle this problem, we first consider the special case of

single-location SNR maximization to gain useful insights, for which the optimal

solution is obtained in closed-form. Then for the general case of area

coverage, an efficient suboptimal solution is proposed by exploiting the

similarity between phase shifts optimization for IRS and analog beamforming for

the conventional phase array. Numerical results show that the proposed design

can achieve significant performance gain than heuristic AIRS deployment

schemes.

Secure Index Coding with Security Constraints on Receivers

Yucheng Liu , Parastoo Sadeghi , Neda Aboutorab , Arman Sharififar

Comments: 6 pages; submitted to IEEE International Conference on Communications 2020

Subjects

:

Information Theory (cs.IT)

Index coding is concerned with efficient broadcast of a set of messages to

receivers in the presence of receiver side information. In this paper, we study

the secure index coding problem with security constraints on the receivers

themselves. That is, for each receiver there is a single legitimate message it

needs to decode and a prohibited message list, none of which should be decoded

by that receiver. To this end, our contributions are threefold. We first

introduce a secure linear coding scheme, which is an extended version of the

fractional local partial clique covering scheme that was originally devised for

non-secure index coding. We then develop two information-theoretic bounds on

the performance of any valid secure index code, namely secure polymatroidal

outer bound (on the capacity region) and secure maximum acyclic induced

subgraph lower bound (on the broadcast rate). The structure of these bounds

leads us to further develop two necessary conditions for a given index coding

problem to be securely feasible (i.e., to have nonzero rates).

Bivariate Polynomial Coding for Exploiting Stragglers in Heterogeneous Coded Computing Systems

Burak Hasircioglu , Jesus Gomez-Vilardebo , Deniz Gunduz Subjects : Information Theory (cs.IT) ; Distributed, Parallel, and Cluster Computing (cs.DC)

Polynomial coding has been proposed as a solution to the straggler mitigation

problem in distributed matrix multiplication. Previous works in the literature

employ univariate polynomials to encode matrix partitions. Such schemes greatly

improve the speed of distributed computing systems by making the task

completion time to depend only on the fastest workers. However, the work done

by the slowest workers, which fails to finish the task assigned to them, is

completely ignored. In order to exploit the partial computations of the slower

workers, we further decompose the overall matrix multiplication task into even

smaller subtasks to better fit workers’ storage and computation capacities. In

this work, we show that univariate schemes fail to make an efficient use of the

storage capacity and we propose bivariate polynomial codes. We show that

bivariate polynomial codes are a more natural choice to accommodate the

additional decomposition of subtasks, as well as, heterogeneous storage and

computation resources at workers. However, in contrast to univariate polynomial

decoding, for multivariate interpolation guarantying decodability is much

harder. We propose two bivartiate polynomial schemes. The first scheme exploits

the fact that bivariate interpolation is always possible for rectangular grid

of points. We obtain the rectangular grid of points at the cost of allowing

some redundant computations. For the second scheme, we relax the decoding

constraint, and require decodability for almost all choices of evaluation

points. We present interpolation sets satisfying the almost decodability

conditions for certain storage configurations of workers. Our numerical results

show that bivariate polynomial coding considerably reduces the completion time

of distributed matrix multiplication.

Systematic Maximum Sum Rank Codes

Paulo Almeida , Umberto Martínez-Penas , Diego Napp Subjects : Information Theory (cs.IT)

In the last decade there has been a great interest in extending results for

codes equipped with the Hamming metric to analogous results for codes endowed

with the rank metric. This work follows this thread of research and studies the

characterization of systematic generator matrices (encoders) of codes with

maximum rank distance. In the context of Hamming distance these codes are the

so-called Maximum Distance Separable (MDS) codes and systematic encoders have

been fully investigated. In this paper we investigate the algebraic properties

and representation of encoders in systematic form of Maximum Rank Distance

(MRD) codes and Maximum Sum Rank Distance (MSRD) codes. We address both block

codes and convolutional codes separately and present necessary and sufficient

conditions for an encoder in systematic form to generate a code with maximum

(sum) rank distance. These characterizations are given in terms of certain

matrices that must be superregular in a extension field and that preserve

superregularity after some transformations performed over the base field. We

conclude the work presenting some examples of Maximum Sum Rank convolutional

codes over small fields. For the given parameters the examples obtained are

over smaller fields than the examples obtained by other authors.

Enabling optimal access and error correction for the repair of Reed-Solomon codes

Zitan Chen , Min Ye , Alexander Barg Subjects : Information Theory (cs.IT)

Recently Reed-Solomon (RS) codes were shown to possess a repair scheme that

supports repair of failed nodes with optimal repair bandwidth. In this paper,

we extend this result in two directions. First, we propose a new repair scheme

for the RS codes constructed in [Tamo-Ye-Barg, {em IEEE Transactions on

Information Theory}, vol. 65, May 2019] and show that our new scheme is robust

to erroneous information provided by the helper nodes while maintaining the

optimal repair bandwidth. Second, we construct a new family of RS codes with

optimal access for the repair of any single failed node. We also show that the

constructed codes can accommodate both features, supporting optimal-access

repair with optimal error-correction capability. Going beyond RS codes, we also

prove that any scalar MDS code with optimal repair bandwidth allows for a

repair scheme with optimal access property.

Physical Layer Security: Authentication, Integrity and Confidentiality

Mahdi Shakiba-Herfeh , Arsenia Chorti , H. Vince Poor Subjects : Information Theory (cs.IT)

The goal of physical layer security (PLS) is to make use of the properties of

the physical layer, including the wireless communication medium and the

transceiver hardware, to enable critical aspects of secure communications. In

particular, PLS can be employed to provide i) node authentication, ii) message

authentication, and, iii) message confidentiality. Unlike the corresponding

classical cryptographic approaches which are all based on computational

security, PLS’s added strength is that it is based on information theoretic

security, in which no limitation with respect to the opponent’s computational

power is assumed and is therefore inherently quantum resistant. In this survey,

we review the aforementioned fundamental aspects of PLS, starting with node

authentication, moving to the information theoretic characterization of message

integrity, and finally, discussing message confidentiality both in the secret

key generation from shared randomness and from the wiretap channel point of

view. The aim of this review is to provide a comprehensive roadmap on important

relevant results by the authors and other contributors and discuss open issues

on the applicability of PLS in sixth generation systems.

Computational Code-Based Single-Server Private Information Retrieval

Lukas Holzbaur , Camilla Hollanti , Antonia Wachter-Zeh Subjects : Information Theory (cs.IT)

A new computational private information retrieval (PIR) scheme based on

random linear codes is presented. A matrix of messages from a McEliece scheme

is used to query the server with carefully chosen errors. The server responds

with the sum of the scalar multiple of the rows of the query matrix and the

files. The user recovers the desired file by erasure decoding the response.

Contrary to code-based cryptographic systems, the scheme presented here enables

to use truly random codes, not only codes disguised as such. Further, we show

the relation to the so-called error subspace search problem and quotient error

search problem, which we assume to be difficult, and show that the scheme is

secure against attacks based on solving these problems.

Uncertainty of Reconstructing Multiple Messages from Uniform-Tandem-Duplication Noise

Yonatan Yehezkeally , Moshe Schwartz

Comments: 6 pages, no figures. Conference paper

Subjects

:

Information Theory (cs.IT)

A growing number of works have, in recent years, been concerned with in-vivo

DNA as medium for data storage. This paper extends the concept of

reconstruction codes for uniform-tandem-duplication noise to the model of

associative memory, by finding the uncertainty associated with (m>2) strings

(where a previous paper considered (m=2)). That uncertainty is found as an

asymptotic expression assuming code-words belong to a typical set of strings,

consisting a growing fraction of the space size, converging to (1). Our

findings show that this uncertainty scales polynomially in the message length.

Coded Caching with Polynomial Subpacketization

Wentu Song , Kui Cai , Long Shi Subjects : Information Theory (cs.IT)

Consider a centralized caching network with a single server and (K) users.

The server has a database of (N) files with each file being divided into (F)

packets ((F) is known as subpacketization), and each user owns a local cache

that can store (frac{M}{N}) fraction of the (N) files. We construct a family

of centralized coded caching schemes with polynomial subpacketization.

Specifically, given (M), (N) and an integer (ngeq 0), we construct a family of

coded caching schemes for any ((K,M,N)) caching system with (F=O(K^{n+1})).

More generally, for any (tin{1,2,cdots,K-2}) and any integer (n) such that

(0leq nleq t), we construct a coded caching scheme with

(frac{M}{N}=frac{t}{K}) and (Fleq

Kinom{left(1-frac{M}{N}

ight)K+n}{n}).

On the Joint Typicality of Permutations of Sequences of Random Variables

Farhad Shirani , Siddharth Garg , Elza Erkip Subjects : Information Theory (cs.IT)

Permutations of correlated sequences of random variables appear naturally in

a variety of applications such as graph matching and asynchronous

communications. In this paper, the asymptotic statistical behavior of such

permuted sequences is studied. It is assumed that a collection of random

vectors is produced based on an arbitrary joint distribution, and the vectors

undergo a permutation operation. The joint typicality of the resulting permuted

vectors with respect to the original distribution is investigated. As an

initial step, permutations of pairs of correlated random vectors are

considered. It is shown that the probability of joint typicality of the

permuted vectors depends only on the number and length of the disjoint cycles

of the permutation. Consequently, it suffices to study typicality for a class

of permutations called ‘standard permutations’, for which, upper-bounds on the

probability of joint typicality are derived. The notion of standard

permutations is extended to a class of permutation vectors called ‘Bell

permutation vectors’. By investigating Bell permutation vectors, upper-bounds

on the probability of joint typicality of permutations of arbitrary collections

of random sequences are derived.

Non-Orthogonal Multiple Access with Wireless Caching for 5G-Enabled Vehicular Networks

Sanjeev Gurugopinath , Sami Muhaidat , Yousof Al-Hammadi , Paschalis C. Sofotasios , Octavia A. Dobre Subjects : Information Theory (cs.IT) ; Signal Processing (eess.SP)

The proliferation of connected vehicles along with the high demand for rich

multimedia services constitute key challenges for the emerging 5G-enabled

vehicular networks. These challenges include, but are not limited to, high

spectral efficiency and low latency requirements. Recently, the integration of

cache-enabled networks with non-orthogonal multiple access (NOMA) has been

shown to reduce the content delivery time and traffic congestion in wireless

networks. Ac-cordingly, in this article, we envisage cache-aided NOMA as a

technology facilitator for 5G-enabled vehicular networks. In particular, we

present a cache-aided NOMA architecture, which can address some of the

aforementioned challenges in these networks. We demonstrate that the spectral

efficiency gain of the proposed architecture, which depends largely on the

cached contents, significantly outperforms that of conventional vehicular

networks. Finally, we provide deep insights into the challenges, opportunities,

and future research trends that will enable the practical realization of

cache-aided NOMA in 5G-enabled vehicular networks.

Optimal Resource Allocation in Ground Wireless Networks Supporting Unmanned Aerial Vehicle Transmissions

Yulin Hu , Guodong Sun , Guohua Zhang , M. Cenk Gursoy , Anke Schmeink

Comments: 13 pages, 13 figures

Subjects

:

Information Theory (cs.IT)

We consider a fully-loaded ground wireless network supporting unmanned aerial

vehicle (UAV) transmission services. To enable the overload transmissions to a

ground user (GU) and a UAV, two transmission schemes are employed, namely

non-orthogonal multiple access (NOMA) and relaying, depending on whether or not

the GU and UAV are served simultaneously. Under the assumption of the system

operating with infinite blocklength (IBL) codes, the IBL throughputs of both

the GU and the UAV are derived under the two schemes. More importantly, we also

consider the scenario in which data packets are transmitted via finite

blocklength (FBL) codes, i.e., data transmission to both the UAV and the GU is

performed under low-latency and high reliability constraints. In this setting,

the FBL throughputs are characterized again considering the two schemes of NOMA

and relaying. Following the IBL and FBL throughput characterizations, optimal

resource allocation designs are subsequently proposed to maximize the UAV

throughput while guaranteeing the throughput of the cellular user.Moreover, we

prove that the relaying scheme is able to provide transmission service to the

UAV while improving the GU’s performance, and that the relaying scheme

potentially offers a higher throughput to the UAV in the FBL regime than in the

IBL regime. On the other hand, the NOMA scheme provides a higher UAV throughput

(than relaying) by slightly sacrificing the GU’s performance.

Privacy-Utility Tradeoff in a Guessing Framework Inspired by Index Coding

Yucheng Liu , Ni Ding , Parastoo Sadeghi , Thierry Rakotoarivelo

Comments: 6 pages; submitted to IEEE International Symposium on Information Theory 2020

Subjects

:

Information Theory (cs.IT)

This paper studies the tradeoff in privacy and utility in a single-trial

multi-terminal guessing (estimation) framework using a system model that is

inspired by index coding. There are (n) independent discrete sources at a data

curator. There are (m) legitimate users and one adversary, each with some side

information about the sources. The data curator broadcasts a distorted function

of sources to legitimate users, which is also overheard by the adversary. In

terms of utility, each legitimate user wishes to perfectly reconstruct some of

the unknown sources and attain a certain gain in the estimation correctness for

the remaining unknown sources. In terms of privacy, the data curator wishes to

minimize the maximal leakage: the worst-case guessing gain of the adversary in

estimating any target function of its unknown sources after receiving the

broadcast data. Given the system settings, we derive fundamental performance

lower bounds on the maximal leakage to the adversary, which are inspired by the

notion of confusion graph and performance bounds for the index coding problem.

We also detail a greedy privacy enhancing mechanism, which is inspired by the

agglomerative clustering algorithms in the information bottleneck and privacy

funnel problems.

Characterization of Conditional Independence and Weak Realizations of Multivariate Gaussian Random Variables: Applications to Networks

Charalambos D. Charalambous , Jan H. van Schuppen

Comments: 6 pages

Subjects

:

Information Theory (cs.IT)

The Gray and Wyner lossy source coding for a simple network for sources that

generate a tuple of jointly Gaussian random variables (RVs) (X_1 : Omega

ightarrow {mathbb R}^{p_1}) and (X_2 : Omega

ightarrow {mathbb

R}^{p_2}), with respect to square-error distortion at the two decoders is

re-examined using (1) Hotelling’s geometric approach of Gaussian RVs-the

canonical variable form, and (2) van Putten’s and van Schuppen’s

parametrization of joint distributions ({f P}_{X_1, X_2, W}) by Gaussian RVs

(W : Omega

ightarrow {mathbb R}^n ) which make ((X_1,X_2)) conditionally

independent, and the weak stochastic realization of ((X_1, X_2)). Item (2) is

used to parametrize the lossy rate region of the Gray and Wyner source coding

problem for joint decoding with mean-square error distortions ({f

E}ig{||X_i-hat{X}_i||_{{mathbb R}^{p_i}}^2 ig}leq Delta_i in

[0,infty], i=1,2), by the covariance matrix of RV (W). From this then follows

Wyner’s common information (C_W(X_1,X_2)) (information definition) is achieved

by (W) with identity covariance matrix, while a formula for Wyner’s lossy

common information (operational definition) is derived, given by

(C_{WL}(X_1,X_2)=C_W(X_1,X_2)

= frac{1}{2} sum_{j=1}^n

ln

left(

frac{1+d_j}{1-d_j}

ight),) for the distortion region ( 0leq Delta_1 leq

sum_{j=1}^n(1-d_j)), (0leq Delta_2 leq sum_{j=1}^n(1-d_j)), and where (1 >

d_1 geq d_2 geq ldots geq d_n>0) in ((0,1)) are {em the canonical

correlation coefficients} computed from the canonical variable form of the

tuple ((X_1, X_2)). The methods are of fundamental importance to other problems

of multi-user communication, where conditional independence is imposed as a

constraint.

3D Beamforming in Reconfigurable Intelligent Surfaces-assisted Wireless Communication Networks

S. Mohammad Razavizadeh , Tommy Svensson Subjects : Information Theory (cs.IT)

Reconfigurable Intelligent Surfaces (RIS) or Intelligent Reflecting Surfaces

(IRS) are metasurfaces that can be deployed in various places in wireless

environments to make these environments controllable and reconfigurable. In

this paper, we investigate the problem of using 3D beamforming in RIS-empowered

wireless networks and propose a new scheme that provides more degrees of

freedom in designing and deploying the RIS-based networks. In the proposed

scheme, a base station (BS) equipped with a full dimensional array of antennas

optimizes its radiation pattern in the three-dimensional space to maximize the

received signal to noise ratio at a target user. We also study the effect of

angle of incidence of the received signal by the RIS on its reflecting

properties and find a relation between this angle and the BS antenna array’s

tilt and elevation angles. The user receives the signal from a reflected path

from the RIS as well as from a direct path from the BS which both depend on the

BS antenna array’s tilt and elevation angles. These angles and also the RIS

element’s phase shifts are jointly numerically optimized. Our simulation

results show that using RIS-assisted 3D beamforming with optimized phase shifts

and radiation angles can considerably improve the performance of wireless

networks.

Optimal Codes Correcting a Burst of Deletions of Variable Length

Andreas Lenz , Nikita Polyanskii

Comments: 6 pages

Subjects

:

Information Theory (cs.IT)

In this paper, we present an efficiently encodable and decodable code

construction that is capable of correction a burst of deletions of length at

most (k). The redundancy of this code is (log n + k(k+1)/2log log n+c_k) for

some constant (c_k) that only depends on (k) and thus is scaling-optimal. The

code can be split into two main components. First, we impose a constraint that

allows to locate the burst of deletions up to an interval of size roughly (log

n). Then, with the knowledge of the approximate location of the burst, we use

several {shifted Varshamov-Tenengolts} codes to correct the burst of deletions,

which only requires a small amount of redundancy since the location is already

known up to an interval of small size. Finally, we show how to efficiently

encode and decode the code.

On Optimal Multi-user Beam Alignment in Millimeter Wave Wireless Systems

Abbas Khalili , Shahram Shahsavari , Mohammad A. (Amir)

Khojastepour , Elza Erkip Subjects : Information Theory (cs.IT)

Directional transmission patterns (a.k.a. narrow beams) are the key to

wireless communications in millimeter wave (mmWave) frequency bands which

suffer from high path loss and severe shadowing. In addition, the propagation

channel in mmWave frequencies incorporates only a few number of spatial

clusters requiring a procedure to align the corresponding narrow beams with the

angle of departure (AoD) of the channel clusters. The objective of this

procedure, called beam alignment (BA) is to increase the beamforming gain for

subsequent data communication. Several prior studies consider optimizing BA

procedure to achieve various objectives such as reducing the BA overhead,

increasing throughput, and reducing power consumption. While these studies

mostly provide optimized BA schemes for scenarios with a single active user,

there are often multiple active users in practical networks. Consequently, it

is more efficient in terms of BA overhead and delay to design multi-user BA

schemes which can perform beam management for multiple users collectively. This

paper considers a class of multi-user BA schemes where the base station

performs a one shot scan of the angular domain to simultaneously localize

multiple users. The objective is to minimize the average of expected width of

remaining uncertainty regions (UR) on the AoDs after receiving users’

feedbacks. Fundamental bounds on the optimal performance are analyzed using

information theoretic tools. Furthermore, a beam design optimization problem is

formulated and a practical BA scheme, which provides significant gains compared

to the beam sweeping used in 5G standard is proposed.

Privacy Amplification of Iterative Algorithms via Contraction Coefficients

Shahab Asoodeh , Mario Diaz , Flavio P. Calmon

Comments: Submitted for publication

Subjects

:

Information Theory (cs.IT)

; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)

We investigate the framework of privacy amplification by iteration, recently

proposed by Feldman et al., from an information-theoretic lens. We demonstrate

that differential privacy guarantees of iterative mappings can be determined by

a direct application of contraction coefficients derived from strong data

processing inequalities for (f)-divergences. In particular, by generalizing the

Dobrushin’s contraction coefficient for total variation distance to an

(f)-divergence known as (E_{gamma})-divergence, we derive tighter bounds on

the differential privacy parameters of the projected noisy stochastic gradient

descent algorithm with hidden intermediate updates.

Robust Gaussian Joint Source-Channel Coding Under the Near-Zero Bandwidth Regime

Mohammadamin Baniasadi , Ertem Tuncel Subjects : Information Theory (cs.IT)

Minimum power required to achieve a distortion-noise profile, i.e., a

function indicating the maximum allowed distortion value for each noise level,

is studied for the transmission of Gaussian sources over Gaussian channels

under a regime of bandwidth approaching zero. A simple but instrumental lower

bound to the minimum required power for a given profile is presented. For an

upper bound, a dirty-paper based coding scheme is proposed and its

power-distortion tradeoff is analyzed. Finally, upper and lower bounds to the

minimum power is analyzed and compared for specific distortion-noise profiles,

namely rational profiles with order one and two.

Channels' Confirmation and Predictions' Confirmation: from the Medical Test to the Raven Paradox

Chenguang Lu

Comments: 12 tables, 7 figures

Subjects

:

Artificial Intelligence (cs.AI)

; Information Theory (cs.IT); Logic (math.LO)

After long arguments between positivism and falsificationism, the

verification of universal hypotheses was replaced with the confirmation of

uncertain major premises. Unfortunately, Hemple discovered the Raven Paradox

(RP). Then, Carnap used the logical probability increment as the confirmation

measure. So far, many confirmation measures have been proposed. Measure F among

them proposed by Kemeny and Oppenheim possesses symmetries and asymmetries

proposed by Elles and Fitelson, monotonicity proposed by Greco et al., and

normalizing property suggested by many researchers. Based on the semantic

information theory, a measure b* similar to F is derived from the medical test.

Like the likelihood ratio, b* and F can only indicate the quality of channels

or the testing means instead of the quality of probability predictions. And, it

is still not easy to use b*, F, or another measure to clarify the RP. For this

reason, measure c* similar to the correct rate is derived. The c* has the

simple form: (a-c)/max(a, c); it supports the Nicod Criterion and undermines

the Equivalence Condition, and hence, can be used to eliminate the RP. Some

examples are provided to show why it is difficult to use one of popular

confirmation measures to eliminate the RP. Measure F, b*, and c* indicate that

fewer counterexamples’ existence is more essential than more positive examples’

existence, and hence, are compatible with Popper’s falsification thought.

Discreteness-aware Receivers for Overloaded MIMO Systems

Hiroki Iimori , Razvan-Andrei Stoica , Giuseppe Thadeu Freitas de Abreu , David Gonzalez G. , Andreas Andrae , Osvaldo Gonsa

Comments: Submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects

:

Signal Processing (eess.SP)

; Information Theory (cs.IT)

We describe three new high-performance receivers suitable for symbol

detection of large-scaled and overloaded multidimensional wireless

communication systems, which are designed upon the usual perfect channel state

information (CSI) assumption at the receiver. Using this common assumption, the

maximum likelihood (ML) detection problem is first formulated in terms of an

l0-norm-based optimization problem, subsequently transformed using a

recently-proposed fractional programming (FP) technique referred to as

quadratic transform (QT), in which the l0-norm is not relaxed into an l1-norm,

in three distinct ways so as to offer a different performance-complexity

trade-off. The first algorithm, dubbed the discreteness-aware penalized

zero-forcing (DAPZF) receiver, aims at outperforming state-of-the-arts (SotAs)

while minimizing the computational complexity. The second solution, referred to

as the discreteness-aware probabilistic soft-quantization detector (DAPSD), is

designed to improve the recovery performance via a soft-quantization method,

and is found via numerical simulations to achieve the best performance of the

three. Finally, the third scheme, named the discreteness-aware generalized

eigenvalue detector (DAGED), not only offers a trade-off between performance

and complexity compared to the others, but also differs from them by not

requiring a penalization parameter to be optimized offline. Simulation results

demonstrate that all three methods outperform the state-of-the-art receivers,

with the DAPZF exhibiting significantly lower complexity.

Serverless Straggler Mitigation using Local Error-Correcting Codes

Vipul Gupta , Dominic Carrano , Yaoqing Yang , Vaishaal Shankar , Thomas Courtade , Kannan Ramchandran Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Information Theory (cs.IT)

Inexpensive cloud services, such as serverless computing, are often

vulnerable to straggling nodes that increase end-to-end latency for distributed

computation. We propose and implement simple yet principled approaches for

straggler mitigation in serverless systems for matrix multiplication and

evaluate them on several common applications from machine learning and

high-performance computing. The proposed schemes are inspired by

error-correcting codes and employ parallel encoding and decoding over the data

stored in the cloud using serverless workers. This creates a fully distributed

computing framework without using a master node to conduct encoding or

decoding, which removes the computation, communication and storage bottleneck

at the master. On the theory side, we establish that our proposed scheme is

asymptotically optimal in terms of decoding time and provide a lower bound on

the number of stragglers it can tolerate with high probability. Through

extensive experiments, we show that our scheme outperforms existing schemes

such as speculative execution and other coding theoretic methods by at least

25%.

Q-ary Multi-Mode OFDM with Index Modulation

Ferhat Yarkin , Justin P. Coon

Comments: 5 pages, 2 figures

Subjects

:

Signal Processing (eess.SP)

; Information Theory (cs.IT)

In this paper, we propose a novel orthogonal frequency division multiplexing

with index modulation (OFDM-IM) scheme, which we call Q-ary multi-mode OFDM-IM

(Q-MM-OFDM-IM). In the proposed scheme, Q disjoint M-ary constellations are

used repeatedly on each subcarrier, and a maximum-distance separable code is

applied to the indices of these constellations to achieve the highest number of

index symbols. A low-complexity subcarrier-wise detection is shown possible for

the proposed scheme. Spectral efficiency (SE) and the error rate performance of

the proposed scheme are further analyzed. It is shown that the proposed scheme

exhibits a very flexible structure that is capable of encompassing conventional

OFDM as a special case. It is also shown that the proposed scheme is capable of

considerably outperforming the other OFDM-IM schemes and conventional OFDM in

terms of error and SE performance while preserving a low-complexity structure.

Finding the Sparsest Vectors in a Subspace: Theory, Algorithms, and Applications

Qing Qu , Zhihui Zhu , Xiao Li , Manolis C. Tsakiris , John Wright , René Vidal

Comments: QQ and ZZ contributed equally to the work. Invited review paper for IEEE Signal Processing Magazine Special Issue on non-convex optimization for signal processing and machine learning. This article contains 26 pages with 11 figures

Subjects

:

Machine Learning (cs.LG)

; Information Theory (cs.IT); Image and Video Processing (eess.IV); Optimization and Control (math.OC); Machine Learning (stat.ML)

The problem of finding the sparsest vector (direction) in a low dimensional

subspace can be considered as a homogeneous variant of the sparse recovery

problem, which finds applications in robust subspace recovery, dictionary

learning, sparse blind deconvolution, and many other problems in signal

processing and machine learning. However, in contrast to the classical sparse

recovery problem, the most natural formulation for finding the sparsest vector

in a subspace is usually nonconvex. In this paper, we overview recent advances

on global nonconvex optimization theory for solving this problem, ranging from

geometric analysis of its optimization landscapes, to efficient optimization

algorithms for solving the associated nonconvex optimization problem, to

applications in machine intelligence, representation learning, and imaging

sciences. Finally, we conclude this review by pointing out several interesting

open problems for future research.

Quantitative Aspects of Programming Languages and Systems over the past (2^4) years and beyond

Alessandro Aldini (University of Urbino)

Comments: In Proceedings QAPL 2019, arXiv:2001.06163

Journal-ref: EPTCS 312, 2020, pp. 1-19

Subjects

:

Programming Languages (cs.PL)

; Information Theory (cs.IT); Logic in Computer Science (cs.LO)

Quantitative aspects of computation are related to the use of both physical

and mathematical quantities, including time, performance metrics, probability,

and measures for reliability and security. They are essential in characterizing

the behaviour of many critical systems and in estimating their properties.

Hence, they need to be integrated both at the level of system modeling and

within the verification methodologies and tools. Along the last two decades a

variety of theoretical achievements and automated techniques have contributed

to make quantitative modeling and verification mainstream in the research

community. In the same period, they represented the central theme of the series

of workshops entitled Quantitative Aspects of Programming Languages and Systems

(QAPL) and born in 2001. The aim of this survey is to revisit such achievements

and results from the standpoint of QAPL and its community.

Memory capacity of neural networks with threshold and ReLU activations

Roman Vershynin

Comments: 25 pages

Subjects

:

Machine Learning (cs.LG)

; Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Overwhelming theoretical and empirical evidence shows that mildly

overparametrized neural networks — those with more connections than the size

of the training data — are often able to memorize the training data with

(100\%) accuracy. This was rigorously proved for networks with sigmoid

activation functions and, very recently, for ReLU activations. Addressing a

1988 open question of Baum, we prove that this phenomenon holds for general

multilayered perceptrons, i.e. neural networks with threshold activation

functions, or with any mix of threshold and ReLU activations. Our construction

is probabilistic and exploits sparsity.

Covert Communication in Continuous-Time Systems

Ke Li , Don Towsley , Dennis Goeckel

Comments: 8 pages, 5 figures, concise version submitted to 2020 IEEE International Symposium on Information Theory (ISIT)

Subjects

:

Cryptography and Security (cs.CR)

; Information Theory (cs.IT)

Recent works have considered the ability of transmitter Alice to communicate

reliably to receiver Bob without being detected by warden Willie. These works

generally assume a standard discrete-time model. But the assumption of a

discrete-time model in standard communication scenarios is often predicated on

its equivalence to a continuous-time model, which has not been established for

the covert communications problem. Here, we consider the continuous-time

channel directly and study if efficient covert communication can still be

achieved. We assume that an uninformed jammer is present to assist Alice, and

we consider additive white Gaussian noise (AWGN) channels between all parties.

For a channel with approximate bandwidth W, we establish constructions such

that O(WT) information bits can be transmitted covertly and reliably from Alice

to Bob in T seconds for two separate scenarios: 1) when the path-loss between

Alice and Willie is known; and 2) when the path-loss between Alice and Willie

is unknown.

Minimum Symbol-Error Probability Symbol-Level Precoding with Intelligent Reflecting Surface

Mingjie Shao , Qiang Li , Wing-Kin Ma Subjects : Signal Processing (eess.SP) ; Information Theory (cs.IT)

Recently, the use of intelligent reflecting surface (IRS) has gained

considerable attention in wireless communications. By intelligently adjusting

the passive reflection angle, IRS is able to assist the base station (BS) to

extend the coverage and improve spectral efficiency. This paper considers a

joint symbol-level precoding (SLP) and IRS reflecting design to minimize the

symbol-error probability (SEP) of the intended users in an IRS-aided multiuser

MISO downlink. We formulate the SEP minimization problems to pursue uniformly

good performance for all users for both QAM and PSK constellations. The

resulting problem is non-convex and we resort to alternating minimization to

obtain a stationary solution. Simulation results demonstrate that under the aid

of IRS our proposed design indeed enhances the bit-error rate performance. In

particular, the performance improvement is significant when the number of IRS

elements is large.

On the Minimum Achievable Age of Information for General Service-Time Distributions

Jaya Prakash Champati , Ramana R. Avula , Tobias J. Oechtering , James Gross Subjects : Information Retrieval (cs.IR) ; Information Theory (cs.IT)

There is a growing interest in analysing the freshness of data in networked

systems. Age of Information (AoI) has emerged as a popular metric to quantify

this freshness at a given destination. There has been a significant research

effort in optimizing this metric in communication and networking systems under

different settings. In contrast to previous works, we are interested in a

fundamental question, what is the minimum achievable AoI in any

single-server-single-source queuing system for a given service-time

distribution? To address this question, we study a problem of optimizing AoI

under service preemptions. Our main result is on the characterization of the

minimum achievable average peak AoI (PAoI). We obtain this result by showing

that a fixed-threshold policy is optimal in the set of all randomized-threshold

causal policies. We use the characterization to provide necessary and

sufficient conditions for the service-time distributions under which

preemptions are beneficial.

Predictability limit of partially observed systems

Andrés Abeliuk , Zhishen Huang , Emilio Ferrara , Kristina Lerman Subjects : Computers and Society (cs.CY) ; Information Theory (cs.IT)

Applications from finance to epidemiology and cyber-security require accurate

forecasts of dynamic phenomena, which are often only partially observed. We

demonstrate that a system’s predictability degrades as a function of temporal

sampling, regardless of the adopted forecasting model. We quantify the loss of

predictability due to sampling, and show that it cannot be recovered by using

external signals. We validate the generality of our theoretical findings in

real-world partially observed systems representing infectious disease

outbreaks, online discussions, and software development projects. On a variety

of prediction tasks—forecasting new infections, the popularity of topics in

online discussions, or interest in cryptocurrency projects—predictability

irrecoverably decays as a function of sampling, unveiling fundamental

predictability limits in partially observed systems.

欢迎加入我爱机器学习QQ14群:336582044

微信扫一扫,关注我爱机器学习公众号

微博:我爱机器学习

我来评几句
登录后评论

已发表评论数()

相关站点

热门文章