Missing values are common in the Internet of Things (IoT) environment for various reasons, including regular maintenance or malfunction. In time-series prediction in the IoT, missing values may be related to the target labels, so their missing patterns are themselves informative (informative missingness). Thus, missing values can be a barrier to achieving high prediction accuracy and analysis quality in IoT data mining. Although several methods have been proposed to estimate missing values, few studies have compared interpolation methods based on conventional and neural network models, and there has thus far been relatively little research into interpolation methods in the IoT environment. To address these problems, this work uses linear regression, artificial neural networks, and long short-term memory (LSTM) networks to make time-series predictions for missing values. Finally, a full comparison and analysis of the interpolation methods is presented. We believe that these findings can be of value to future work in IoT applications.
Time-series data are found in a wide range of real scenarios and applications. Missing values have many causes, such as regular maintenance, malfunction, and cost reduction. Missing values might be related to the target categories in machine learning tasks; in that case the missingness is informative, and ignoring the patterns of missing data discards useful information [1]. Missing values can thus be a barrier to achieving high prediction and analysis accuracy in data mining tasks. For example, air quality prediction is an IoT time-series application. To avoid harm caused by PM2.5, people can consult a prediction and then decide whether to go outside or stay indoors. However, air quality prediction is difficult because of several complex factors. One of these factors is that existing sensors are not sufficiently reliable and produce missing readings [2]. Therefore, we cannot ignore missing values arbitrarily, because each air quality data point is valuable and the order of the data sequence matters. In such situations, the missing patterns will lead to inaccurate predictions, so missing values must be handled with approaches such as interpolation. However, there has thus far been relatively little research into the use of interpolation methods in the IoT environment. To analyze existing interpolation methods and identify the most competitive ones, this study compares conventional and deep learning interpolation models, and we believe that these findings will benefit future IoT applications. The problems of missing values can be divided into three types, each with a corresponding handling method [4]:
·
Missing Completely at Random (MCAR): The probability that a value is missing is independent of both the value itself and the values of other attributes.
·
Missing at Random (MAR): The probability that a value is missing might depend on the observed values of other attributes, but not on the missing value itself.
·
Missing Not at Random (MNAR): The probability that a value is missing depends on the unobserved value itself [5, 6].
To deal with missing values, a simple technique is to remove them from the data analysis or machine learning task. This approach might be feasible in large, balanced datasets.
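To make the three mechanisms concrete, the following Python sketch generates "is missing" masks for a humidity column under each mechanism and then applies listwise deletion. The sensor values, thresholds, and function names are hypothetical illustrations, not taken from the cited works.

```python
import random
from statistics import mean

def make_masks(temps, humid, p=0.2, seed=0):
    """Return 'is missing' masks for the humidity column under three
    hypothetical missingness mechanisms."""
    rng = random.Random(seed)
    # MCAR: every humidity reading has the same chance p of being lost,
    # independent of any value in the data.
    mcar = [rng.random() < p for _ in humid]
    # MAR: humidity is lost when the *observed* temperature is high
    # (e.g., a hypothetical sensor that shuts down when it overheats).
    mar = [t > 30.0 for t in temps]
    # MNAR: humidity is lost when the humidity value itself is high
    # (e.g., a hypothetical sensor that saturates above 80%).
    mnar = [h > 80.0 for h in humid]
    return mcar, mar, mnar

def listwise_delete(rows, missing):
    """Drop every (temperature, humidity) row whose humidity is missing."""
    return [row for row, m in zip(rows, missing) if not m]

temps = [25.0, 31.0, 28.0, 35.0]
humid = [60.0, 70.0, 85.0, 90.0]
rows = list(zip(temps, humid))
mcar, mar, mnar = make_masks(temps, humid)

kept = listwise_delete(rows, mnar)
print(mean(h for _, h in kept), mean(humid))  # 65.0 76.25
```

Under MNAR, deleting the incomplete rows removes exactly the high-humidity readings, so the mean of the remaining data underestimates the true mean; this is why deletion is generally safe only for MCAR data.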
However, in most cases, data are imbalanced and only limited amounts of data are available. Missing values are also hard to ignore in IoT applications, because the most common applications are monitoring or time-series tasks under continuous operation. Therefore, the most frequently used techniques are interpolation-based. Interpolation replaces each missing value with an appropriate estimate [6]. Interpolation methods often use the mean, the median, or a predefined value of the attribute that has the missing values, or predicted values calculated from the patterns in the observed data [7]. Such interpolation techniques can be used when the pattern of missing data is of the MAR or MCAR type. They are valuable when every transaction or record in the dataset is important, or when few transactions are complete across many data attributes [8]. Listwise deletion or maximum likelihood methods can be used to deal with MCAR patterns. In contrast, there are no general methods to deal with missing patterns of the MNAR type. In general, missing air quality data might belong to any of the three types, depending on the area, because the generation of air pollutants is complex. Therefore, previous works usually discuss all three types. Interpolation approaches can be divided into single interpolation and model-based interpolation as follows:
1. Single interpolation usually uses a constant value, such as the sample mean, to replace missing values. Although single interpolation is easy to perform, it distorts estimates of variance and covariance, because it neglects the relationships between the missing values and other attributes in the data. Therefore, the non-missing values or features can instead be used to train regression models that predict the missing values, because regression models capture the relationships between data attributes.
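The variance problem of single interpolation can be seen with a few numbers. The sketch below uses hypothetical readings, with `None` marking a missing value; it fills the gaps with the sample mean and compares the population variance before and after.

```python
from statistics import mean, pvariance

def mean_impute(series):
    """Single interpolation: replace every None with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in series]

readings = [10.0, None, 14.0, None, 18.0, 22.0]
observed = [v for v in readings if v is not None]
filled = mean_impute(readings)

print(filled)               # [10.0, 16.0, 14.0, 16.0, 18.0, 22.0]
print(pvariance(observed))  # 20.0
print(pvariance(filled))    # about 13.33: the imputed copies of the mean shrink the spread
```

Every imputed value sits exactly at the mean, so it adds nothing to the sum of squared deviations while increasing the count, which pulls the variance estimate down.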
2. Model-based interpolation methods estimate the most probable value for a missing value by maximizing the probability given the non-missing values. Such methods aim to recover the real data as accurately as possible, and they are usually the most powerful interpolation approaches. For example, a recursive method has been proposed to build and update a linear regression model without revisiting previous transactions or records.
Therefore, this study focused mainly on model-based interpolation approaches. The most well-known conventional regression models are linear regression and support vector regression, and both have previously been used to deal with missing values. Recent work has shown that deep learning models can achieve better performance in time-series prediction than conventional methods. This study aimed to analyze and compare interpolation performance in IoT applications using conventional learning and deep learning models.
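A model-based alternative predicts each missing value from a correlated attribute. The sketch below is a generic incremental least-squares fit for a single feature, not the specific recursive method mentioned above: it keeps only running sums, so the model can be updated record by record without storing earlier transactions. The class name, function name, and data are hypothetical.

```python
class RecursiveLinearRegression:
    """One-feature linear regression y = a + b*x, updated one record at a time
    from running sums so that earlier records never need to be stored."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y):
        # Fold one fully observed record into the sufficient statistics.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def coefficients(self):
        # Closed-form least-squares solution from the running sums.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b

    def predict(self, x):
        a, b = self.coefficients()
        return a + b * x


def impute_stream(pairs):
    """Replace missing y values (None) with the current model's prediction;
    only fully observed (x, y) pairs are used to update the model."""
    model = RecursiveLinearRegression()
    out = []
    for x, y in pairs:
        if y is None:
            y = model.predict(x)
        else:
            model.update(x, y)
        out.append((x, y))
    return out

stream = [(0.0, 1.0), (1.0, 3.0), (2.0, None), (3.0, 7.0)]
print(impute_stream(stream))  # the None becomes 5.0 (fit so far: y = 1 + 2x)
```

A missing value that arrives before two distinct observed x values makes the fit undefined; a practical version would fall back to a simpler estimate in that case.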
IoT device identification
Device identification refers to a mechanism that predicts the type of an Internet of Things (IoT) device from the device's characteristics. Identifying IoT devices is critical both to service providers (e.g., mobile apps) for commercial purposes (e.g., advertising) and to infrastructure (e.g., system/network) managers for security (e.g., finding vulnerable devices). Specifically, we define the IoT device identification problem as follows: the input is various data collected from a device, e.g., sensor data and network data; the output is a label indicating the type of the device. Figure 2 also shows the model for device identification. This problem has received extensive attention in recent years due to the proliferation of mobile computing, IoT deployments, and smart everything. Since this area is evolving rapidly with fast wireless and mobile technology innovations, we review recent efforts over the last five years that leverage machine learning to identify IoT devices. Table 2 presents a short summary of the reviewed works. It is worth noting that proactive approaches based on IP addresses, MAC addresses, manufacturer-assigned unique device numbers, or operating systems are not stable; thus, researchers turned to machine learning approaches, which can also perform passive identification. In the following, we first review approaches that try to identify mobile phones, and then we review works that aim to identify general IoT devices.
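Framed this way, device identification is a supervised classification problem. As a minimal sketch, the nearest-centroid classifier below labels a device by comparing its feature vector with the average feature vector of each known device type. The two features (mean packet size in bytes, packets per minute), their values, and the device names are hypothetical; real systems use richer features, normalize them, and typically rely on stronger classifiers.

```python
import math

def train_centroids(samples):
    """samples: list of (feature_vector, device_type) pairs.
    Returns the mean feature vector (centroid) of each device type."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def identify(centroids, features):
    """Assign the device type whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

training = [
    ((1200.0, 900.0), "camera"), ((1100.0, 850.0), "camera"),
    ((90.0, 4.0), "smart_plug"), ((110.0, 6.0), "smart_plug"),
    ((300.0, 20.0), "thermostat"), ((280.0, 25.0), "thermostat"),
]
centroids = train_centroids(training)
print(identify(centroids, (1150.0, 880.0)))  # camera
print(identify(centroids, (100.0, 5.0)))     # smart_plug
```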
IoT fast and streaming data
Many research efforts have proposed streaming data analytics that are deployed mainly on high-performance computing systems or cloud platforms. Streaming data analytics on such frameworks is based on data parallelism and incremental processing [17]. In data parallelism, a large dataset is partitioned into several smaller datasets, on which analytics are performed in parallel. Incremental processing refers to fetching a small batch of data and processing it quickly in a pipeline of computation tasks. Although these techniques reduce the latency of responses from the streaming data analytics framework, they are not the best possible solution for time-critical IoT applications. By bringing streaming data analytics closer to the source of the data (i.e., IoT devices or edge devices), the need for data parallelism and incremental processing becomes less pressing, because the data at the source are small enough to be processed rapidly. However, bringing fast analytics to IoT devices introduces its own challenges, such as limited computing, storage, and power resources at the source of the data.
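The two techniques can be sketched in a few lines of Python. The hypothetical `parallel_mean` illustrates data parallelism by splitting a dataset, computing partial results on each partition concurrently, and combining them; the hypothetical `IncrementalMean` illustrates incremental processing by folding each small batch into a running result. Threads are used only to keep the sketch self-contained: for CPU-bound work in CPython, a process pool or a distributed framework would be needed for real speedups.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(chunk):
    """Work done independently on one partition: its sum and size."""
    return sum(chunk), len(chunk)

def parallel_mean(data, n_parts=4):
    """Data parallelism: partition, process partitions concurrently, combine."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(partial_stats, parts))
    total = sum(s for s, _ in results)
    count = sum(c for _, c in results)
    return total / count

class IncrementalMean:
    """Incremental processing: update a running result as each small batch
    arrives instead of reprocessing the whole stream."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add_batch(self, batch):
        self.total += sum(batch)
        self.count += len(batch)
        return self.total / self.count

print(parallel_mean(list(range(1, 101))))  # 50.5
stream = IncrementalMean()
print(stream.add_batch([1, 2, 3]))         # 2.0
print(stream.add_batch([4, 5]))            # 3.0
```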
Recurrent Neural Networks (RNNs):
In many tasks, the prediction depends on several previous samples, so in addition to classifying individual samples, we also need to analyze sequences of inputs. In such applications, a feed-forward neural network is not applicable, since it assumes that successive inputs are independent of one another. RNNs have been developed to address this issue in sequential (e.g., speech or text) or time-series (e.g., sensor data) problems of varying lengths. Detecting drivers' behaviors in smart vehicles, identifying individuals' movement patterns, and estimating the energy consumption of a household are some examples where RNNs can be applied. At each time step, the input to an RNN consists of both the current sample and information from the previously observed samples. In other words, the output of an RNN at time step t−1 affects the output at time step t. Each neuron is equipped with a feedback loop that returns the current output as an input for the next step. Equivalently, each neuron in an RNN has an internal memory that keeps the information computed from previous inputs.
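The feedback loop can be written as the recurrence h_t = tanh(w_x·x_t + w_h·h_{t−1} + b), where the hidden state h_t plays the role of the internal memory. The sketch below runs a single-unit RNN of this (Elman-style) form over a sequence; the weights are arbitrary illustrative values, not trained parameters. Setting w_h = 0 removes the feedback loop and leaves a feed-forward unit that sees only the current input.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a single-unit RNN over a sequence and return all hidden states.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # new state mixes current input and memory
        states.append(h)
    return states

# Two sequences with identical later inputs but different first inputs.
with_memory_a = rnn_forward([1.0, 0.0, 0.0], w_x=0.8, w_h=0.5, b=0.0)[-1]
with_memory_b = rnn_forward([0.0, 0.0, 0.0], w_x=0.8, w_h=0.5, b=0.0)[-1]
# With w_h = 0 there is no feedback loop, so the first input is forgotten.
no_memory = rnn_forward([1.0, 0.0, 0.0], w_x=0.8, w_h=0.0, b=0.0)[-1]
print(with_memory_a, with_memory_b, no_memory)
```

The final state of the first sequence is still non-zero two steps after its only non-zero input, while the feed-forward variant has already returned to zero.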
Autoencoders (AEs): AEs consist of an input layer and an output layer that are connected through one or more hidden layers. AEs have the same number of input and output units. The network aims to reconstruct the input by transforming inputs into outputs in the simplest possible way, so that the input is not distorted very much. These networks have been used mainly for unsupervised learning problems as well as transfer learning. Because they learn to reconstruct the input at the output layer, AEs are mainly used for diagnosis and fault detection tasks: an input that the network reconstructs poorly deviates from the patterns seen during training. This is of great interest for industrial IoT, serving applications such as fault diagnosis in hardware devices and machines and anomaly detection in the performance of assembly lines.
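The link between reconstruction and fault detection can be sketched with a tiny linear autoencoder (two inputs, one hidden unit, two outputs) trained by gradient descent on hypothetical readings from two redundant sensors that normally agree. After training, a reading that follows the normal pattern is reconstructed almost perfectly, while a faulty reading in which the sensors disagree yields a large reconstruction error. The data, initial weights, learning rate, and epoch count are illustrative assumptions; practical AEs are nonlinear, deeper, and trained with a deep learning framework.

```python
def reconstruct(w_enc, w_dec, x):
    """Encode x into one hidden value z, then decode z back to len(x) outputs."""
    z = sum(w * v for w, v in zip(w_enc, x))
    return z, [w * z for w in w_dec]

def reconstruction_error(w_enc, w_dec, x):
    _, x_hat = reconstruct(w_enc, w_dec, x)
    return sum((xh - v) ** 2 for xh, v in zip(x_hat, x))

def train_linear_autoencoder(samples, lr=0.1, epochs=500):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    dim = len(samples[0])
    w_enc = [0.3] + [0.1] * (dim - 1)  # arbitrary non-zero starting weights
    w_dec = [0.3] + [0.1] * (dim - 1)
    n = len(samples)
    for _ in range(epochs):
        g_enc = [0.0] * dim
        g_dec = [0.0] * dim
        for x in samples:
            z, x_hat = reconstruct(w_enc, w_dec, x)
            r = [xh - v for xh, v in zip(x_hat, x)]             # residual
            dz = 2 * sum(wd * ri for wd, ri in zip(w_dec, r))   # dLoss/dz
            for i in range(dim):
                g_dec[i] += 2 * r[i] * z / n
                g_enc[i] += dz * x[i] / n
        for i in range(dim):
            w_enc[i] -= lr * g_enc[i]
            w_dec[i] -= lr * g_dec[i]
    return w_enc, w_dec

# Hypothetical healthy readings: the two redundant sensors agree.
normal = [(t, t) for t in (-1.0, -0.5, 0.5, 1.0)]
w_enc, w_dec = train_linear_autoencoder(normal)
print(reconstruction_error(w_enc, w_dec, (0.5, 0.5)))   # close to 0
print(reconstruction_error(w_enc, w_dec, (0.5, -0.5)))  # large: sensors disagree
```

In practice, a threshold on the reconstruction error, chosen from errors on held-out normal data, would decide when to raise an alarm.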
Application domain
IoT has already generated considerable interest among businesses. Not only big players but also SMEs see lucrative potential in adopting IoT. It promises to bring value to all types of businesses by reinventing business processes and operations, which will eventually enhance the level and quality of products and services as well as the customer experience. Because IoT has provided organizations with the most important element of business, the data acquisition cog, they are free to collect any sort of data (e.g., contextual or locational) related to either business processes or customers. IoT can contribute to improving business operations in several directions:
Improved business process: The massive amount of connected data from IoT makes business processes smarter. Analyzing the data collected from every division of the business will give new insights and knowledge.
Increased business opportunities: IoT opens the door to new and innovative business opportunities and creates further revenue streams. By exploiting knowledge acquired through IoT, corporations will be able to develop new and advanced business models, locate new markets to extend their services, and diversify their product lines.
Uplifting business momentum: Businesses can gain competitive velocity and agility by capitalizing on, and keeping pace with, the influx of dynamic and crucial business data generated by IoT devices across domains.
Increased productivity: IoT helps to identify needs and gaps in workforce expertise and also enables organizations to train employees just in time. This improves workers' efficiency and reduces skill mismatches, which in turn increases organizational productivity.
Improved operational efficiencies: Real-time sensor data from IoT devices enable organizations to monitor business operations closely while minimizing human intervention. If IoT data collected from the logistics network, factory floor, and supply chain are used judiciously, inventory management can be optimized, and both time to market and downtime due to maintenance can be curtailed significantly.
Enhanced asset utilization: Industrial IoT enables tracking of production equipment, machinery, and tools. By examining their real-time status, organizations can achieve better asset utilization.
Faster decision-making: Real-time knowledge of business processes and operations will help organizations make faster and smarter business decisions. The connected nature of IoT facilitates distributing intelligence, so decision-makers are able to prioritize all business decisions.
Machine learning has great potential to be the key technology for IoT, as it can provide analytics for IoT applications. Despite the recent wave of success of machine learning for networking, there is a scarcity of machine learning literature on its applications to IoT services and systems, which this survey aims to address. This paper differs from previously published survey papers in focus, scope, and breadth; we have written it to emphasize the application of machine learning to IoT and to cover recent advances. Due to the versatility and evolving nature of IoT, it is impossible to cover each and every application. However, this paper attempts to cover the major applications of machine learning for IoT and the relevant techniques, including traffic profiling, IoT device identification, security, edge computing infrastructure, network management based on SDN, and typical IoT applications. We have presented a thorough study of recent research on the application of machine learning to IoT, its technical progress, and its application domains. We have also presented concise research challenges and open issues that are critical to the application of machine learning for IoT.