
latex_5: An IEEE Template, with Problems Encountered and Their Solutions

Date: 2023-04-22 22:59:51

Template link: /climb-the-wind/others/blob/master/IEEE-open-journal-template/IEEE-open-journal-template.zip. The template can also be downloaded directly from the IEEE Computer Society website.

Modified template: /climb-the-wind/others/blob/master/EnglishWork/EnglishWork.zip

Template contents:

\documentclass{IEEEoj}
\usepackage{cite}
\usepackage{amsmath,amssymb,amsfonts}
\usepackage{algorithmic}
\usepackage{graphicx,color}
\usepackage{textcomp}
\usepackage{float}
\usepackage{makecell}
\bibliographystyle{unsrt} % sets the bibliography style
\UseRawInputEncoding
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08emT\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
\AtBeginDocument{\definecolor{ojcolor}{cmyk}{0.93,0.59,0.15,0.02}}
\def\OJlogo{\vspace{-14pt}\includegraphics[height=28pt]{OJIM.png}}
\begin{document}
\title{A Trust Update Mechanism Based on Reinforcement Learning in Underwater Acoustic Sensor Networks}
\begin{abstract}
Underwater acoustic sensor networks (UASNs) have been widely applied in marine scenarios, such as offshore exploration, auxiliary navigation and marine military operations. Due to the limitations in communication, computation, and storage of underwater sensor nodes, traditional security mechanisms are not applicable to UASNs. Recently, various trust models have been investigated as effective tools for improving the security of UASNs. However, the existing trust models lack flexible trust update rules, particularly when facing the inevitable dynamic fluctuations in the underwater environment and a wide spectrum of potential attack modes. In this study, a novel trust update mechanism for UASNs based on reinforcement learning (TUMRL) is proposed. The scheme is developed in three phases. First, an environment model is designed to quantify the impact of underwater fluctuations on the sensor data, which assists in updating the trust scores. Then, the definition of key degree is given; in the process of trust update, nodes with higher key degree react more sensitively to malicious attacks, thereby better protecting important nodes in the network. Finally, a novel trust update mechanism based on reinforcement learning is presented, to withstand changing attack modes while achieving efficient trust update.
The experimental results show that the proposed scheme improves both trust update efficiency and network security.
\end{abstract}
\begin{IEEEkeywords}
Underwater acoustic sensor networks, reinforcement learning, trust update, environment model
\end{IEEEkeywords}
%\IEEEspecialpapernotice{(Invited Paper)}
\maketitle
\section{Introduction}
\IEEEPARstart{U}{nderwater} acoustic sensor networks (UASNs) are expected to play an increasingly important role in many fields, such as marine environmental monitoring, offshore exploration, auxiliary navigation, tsunami warning and marine military operations \cite{heidemann_underwater_},\cite{5714973},\cite{8777101},\cite{8585407}. As shown in Fig.~\ref{fig1}, a typical UASN generally comprises numerous underwater sensor nodes, which cooperate to accomplish environment awareness, information collection and packet transmission \cite{820738},\cite{4907458}. Because of their open and unattended nature, sensor nodes can be easily compromised and attacked \cite{6757189},\cite{8093608}. Network security issues have gradually become the main obstacle restricting the development of UASNs.
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig1}}
\caption{Structure of a typical underwater acoustic sensor network\label{fig1}}
\end{figure}
Traditional security mechanisms such as key management and identity authentication are effective against external attacks. However, they are powerless against internal attackers who have successfully invaded the network \cite{8307107},\cite{7180508},\cite{5039583},\cite{he_robust_},\cite{6007138}. An internal attacker can compromise normal nodes to obtain their keys and IDs, which allows the attacker to neutralize most traditional security systems. Recently, various trust models have been investigated as effective tools for confronting internal attacks. A trust model typically comprises three parts: evidence accumulation, trust score computation and trust update.
Considerable research has been done on the first two, whereas the last remains weak in complex underwater environments.\\\indent In this paper, we propose a novel trust update mechanism based on reinforcement learning (TUMRL) for UASNs. First, the impact of the underwater environment is quantified by a specific environment model, which regulates the trust score update process. Then, considering that the nodes with higher importance in the network may be preferentially subject to malicious attacks, the concept of key degree is presented to protect key nodes by increasing their detection sensitivity to attacks. Finally, the overall method is integrated into a decision-making trust score update mechanism by introducing reinforcement learning, thereby achieving efficient trust update.\\\indent The main contributions of this paper are summarized as follows: (i) we propose an environment model that quantifies the abstract environmental impact as a probabilistic quantity, which is used to mitigate trust misclassification caused by environmental factors rather than by attacks; (ii) we propose the concept of key degree to increase the trust sensitivity of key nodes in the network, thereby minimizing the losses caused by malicious attacks; (iii) reinforcement learning is combined with the environment model and the key degree method to drive the trust update process, improving the efficiency and adaptivity of the trust update mechanism.\\\indent The rest of this paper is organized as follows. In Section 2, we provide an overview of related work on trust models. Section 3 introduces the network model and some reasonable assumptions. Then, detailed descriptions of the proposed scheme and the simulation results are given in Sections 4 and 5, respectively.
Conclusions are drawn in Section 6.\section{Related Work}Although research on trust models in UASNs is still in its infancy, trust models have been extensively studied–and proved to have important value in network security–in the terrestrial wireless sensor networks (TWSNs), cloud environments, and other related subjects. According to their differences in network structure, trust models can be divided into two main categories: 1) trust models in distributed networks, and 2) trust models in clustered networks.\\\indent Trust Models in Distributed Networks. In a distributed network, each node calculates the trust of neighbor nodes (the nodes in the communication range) and maintains the trust update. The results of trust calculation are shared with other nodes through active broadcast or passive transmission, thereby achieving the global security. Ganeriwal et al. \cite{ganeriwal_reputation-based_} first paid attention to the issue of trust security in sensor networks and proposed a reputation-based trust framework. In this framework, each node updates the reputation of neighbor nodes based on their past behaviors, and predicts the future behaviors of neighbors based on the latest reputation. In order to improve the accuracy of trust calculation, Yao et al. \cite{4053930} presented a trust model based on direct interaction and recommendation trust. Both past behavior of the neighbor nodes and recommended trust evidence from other nodes are considered when updating the trust scores. To optimize the coupling between direct trust and indirect trust, Feng et al. \cite{feng_trust_} proposed a trust model based on modified evidence theory. In their system, each node first obtains the relevant direct trust and indirect trust, and then aggregates the information by way of fuzzy sets, which yield the final trust score.\\\indent A different approach for distributed TWSNs was studied by Ren et al. \cite{6463408}. 
In their method, subjective logic based on consensus techniques is adopted, to mitigate trust fluctuations caused by environmental factors. In addition, each node performs distributed trust storage by maintaining a geographic hash table, thereby reducing the cost of trust update. Further, Jiang et al. \cite{6805612} proposed an efficient distributed trust model (EDTM) for TWSNs. In the EDTM, the final trust result is integrated by communication trust, energy trust, and data trust to improve the accuracy. Each node maintains a sliding time window, which comprises several time-slots. Trust value is calculated in each time-slot and updated by sliding the time window iteratively. Lastly, the impact of poor link quality on trust evaluation was examined by Wu et al. \cite{8668833}. In their work, a beta and link quality indicator (LQI)-based trust model (BLTM) was proposed. In the BLTM, the LQI between two nodes decides whether to update the trust score within a current period, and the weight of trust evidence based on the beta probability density is considered in the process of trust update.\\\indent Trust Models in Clustered Networks. In a clustered network, the trust scores are maintained and updated by agent nodes (e.g., cluster heads) instead of the individual nodes. The agent nodes set trust scores for the remaining nodes within a given vicinity, and share their scores with other agent nodes-to achieve collaborative network security.\\\indent In order to minimize the overhead of trust models in terms of extra messages and time delays, Boukerch et al. \cite{boukerch_trust-based_} put forth an agent-based trust and reputation management scheme (ATRM) for TWSNs. In the ATRM, each node stores its trust information, and a mobile agent is responsible for trust calculation and trust update for each node in a local range. However, the reliability of the mobile agent leaves a great hidden peril for the network. 
For solving the high resource consumption of trust models in TWSNs, Shaikh et al. \cite{4721432} proposed a lightweight group-based trust management scheme (GTMS). The trust score of a group of nodes is evaluated conjointly, instead of individual trust evaluation, to reduce the cost of trust records at each node. Nevertheless, the interaction between groups generates additional communication costs. Zhang et al. \cite{8306887} presented a trust model for cloud environment to reduce management overhead and detect malicious nodes. In this scheme, nodes are partitioned into domains, to decrease storage and computational costs. Furthermore, a filter procedure is adopted to remove malicious trust evaluations and malicious nodes from a domain.\\\indent Another example is the trust management scheme for open systems introduced by Fan et al. \cite{7572209}, in which a feedback confirmation method-based on pairwise similarity-is developed to deal with dishonest ratings. In the same publication, a trust propagation strategy based on Susceptible-Infected-Recovered model is employed, to control the process of trust propagation. For the sake of secure data storage in cloud environment, Ghafoorian et al. \cite{8466653} presented a role-based access control model based on trust and reputation, which provides accurate direct and indirect trust evaluation at the same time. The security goal that should be considered in an efficient trust-based system is presented, which has certain enlightening significance for the design of similar programs.\\\indent To achieve accurate and energy efficient trust evaluation in UASNs, an attack-resistant trust model based on multidimensional trust metrics (ARTMM) was proposed in \cite{7038144}. In the ARTMM, the trust metrics including link trust, data trust, and node trust are analyzed to obtain an integrated trust. Moreover, the effect of communication channel and node mobility is utilized, as a factor to improve the accuracy of trust evaluation. 
However, the impact of malicious attacks on the acquisition of trust metrics is neglected in this work.\\\indent Other researchers have also considered the uncertainties associated to the acoustic channel, dynamic network structure and weak link connectivity. For example, a trust model based on cloud theory (TMC) for UASNs was proposed in \cite{7360179}, which works as follows. First, in the process of generating trust evidences, malicious attacks are analyzed layer by layer. Then, the trust value is calculated based on the cloud model. The TMC effectively solves the ambiguity and uncertainty of trust. However, the lack of consideration for the changeable attack modes leaves hidden security risks. Therefore, we will consider an attack model with changing offensive modes in this paper. Additionally, we adopt a flexible trust update mechanism based on reinforcement learning, to incorporate adaptive properties to trust score evaluation.\\\indent The clustered network structure is of importance towards improving network scalability. However, assigning trust to third parties (cluster heads in unattended UASNs) generates new types of security challenges.\section{Network Model and Assumptions}Because of the shortcomings present in clustered networks (Section 2), this study is based on a distributed UASN architecture (Fig.1), where the underwater nodes are deployed at different depth levels by means of anchoring-buoys, and each node is allotted a unique ID. All underwater nodes are homogeneous–i.e., they all have the same but limited capabilities of energy, communication, computation, and storage. The locations of the underwater nodes are determined by established positioning algorithms \cite{8948247},\cite{8848378}, and are periodically exchanged with neighboring nodes to update a neighbor table. The underwater nodes collect information, and send data packets to the surface sinks, by collaborating with neighbor nodes. 
Further, the data packets are transmitted to the base station on land via satellite relay. Since radio signals are strongly attenuated in saltwater, and often scattered by suspended particles, acoustic waves are the primary communication medium used in underwater environments.\\\indent As a result of the open and unattended nature of UASNs, they are highly vulnerable to attacks. Once the attackers successfully invade a network, they can compromise and take control of normal nodes. Here we assume the worst-case scenario, in which the attackers first monitor and trace the network traffic and then target the nodes located in high-traffic hotspots. Moreover, the compromised nodes can launch attacks independently, and can also switch from one attack mode to another (DoS, selective forwarding, packet tampering, Sybil, wormhole attacks, etc.). As shown in Fig.~\ref{fig1}, a normal node B is sending packets to a compromised node A. Assuming that A is performing a selective forwarding attack, the anomalous behavior of A can be detected by B through changes in the packet delivery rate. If A switches to packet tampering in a following cycle, the historical behavior evidence collected by B no longer reflects the type of attack launched by A, which ultimately leads to inaccurate trust evaluation. To detect nodes compromised by internal attackers, each underwater node is embedded with a trust management program. The historical interaction information between nodes is used for trust calculation, and a sliding time window is maintained by each node during trust update (Fig.~\ref{fig5}). The trust value is represented by a real number ranging from 0 to 1.
The simulated network is initialized without compromised nodes, and every node starts with a trust score of 0.5.
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig2}}
\caption{Workflow schematic of the proposed TUMRL.\label{fig2}}
\end{figure}
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig3}}
\caption{Mobility model of underwater sensor node.\label{fig3}}
\end{figure}
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig4}}
\caption{Conceptual schematic of direct and indirect link connectivity with respect to a node $n_i$.\label{fig4}}
\end{figure}
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig5}}
\caption{Trust update cycle with $m$ update time-slots.\label{fig5}}
\end{figure}
\section{Trust Evaluation Architecture (TUMRL)}
The overall workflow of the trust evaluation architecture, the TUMRL, is displayed in Fig.~\ref{fig2}. In summary, the trust update cycle of a node starts by computing, according to the environment model, an impact factor that expresses the current environmental impact on trust score evaluation. If the impact factor is higher than a specified threshold, the trust update for the current time-slot is skipped. Otherwise, the interaction information stored in the sliding time window is used to compute the trust evidences, and the trust update based on reinforcement learning is executed. Finally, the key degree of a node affects its state transition probabilities: the trust of nodes with a higher key degree is more sensitive to attacks, which limits the damage an attack can cause.\\\indent In what follows, a detailed description of the building blocks of the TUMRL is given.
\subsection{Environment Model}
UASNs are usually deployed in complex, dynamic and harsh underwater environments. Thus the environmental conditions may strongly affect the characteristics of node communication, data transmission and even network security.
Therefore, it is necessary to consider the impact of underwater deployment environment on node trust update. In this section, we propose an environment model to quantify the impact of the underwater environment. The model quantifies the impact into two categories: the mobility of water flow and the instability of the acoustic communication. Based on the impact factors, a comprehensive environmental impact value is obtained, which regulates the trust update process.\subsubsection{The Mobility of Water Flow}\indent Under the influence of ocean currents, tides and other factors, underwater sensor nodes exhibit dynamic displacements in their relative positions. Hence a mobility model of underwater sensor nodes \cite{chang_reinforcement_} is here adopted. As shown in Fig.3, an underwater sensor node is anchored to a certain position $O$ on the seabed by a tethered wire; $O$ is the origin of the coordinate system. The position of the underwater sensor node and its subpoint on the undersurface are expressed as $S$ and $S^{\prime }$, respectively. The angle between the vector $OS^{\prime }$ and the $X$ axis ($\varphi$), and the angle between the vector $OS$ and the $Z$ axis ($\theta$) are also indicated. It is assumed that the speed of the underwater sensor node $i$ at time-slot $t$ is expressed as ${v_i}\left(t \right)$, which obeys the normal distribution $N\left({{\mu _1},{\sigma _1}^2} \right)$ and the effective range is truncated to $\left({0,2{\mu _1}} \right)$. The moving direction of node $i$ at time-slot $t$ is denoted as $\left({d{\theta _i}\left(t \right),d{\varphi _i}\left(t \right)} \right)$, where $d{\theta _i}\left(t \right)$ and $d{\varphi _i}\left(t \right)$ are assumed to obey uniform distributions; $U\left({0,\pi } \right)$ and $U\left({0,2\pi } \right)$ respectively. The length of the tethered wire of node $i$ is expressed as ${R_i}$. 
According to the current location $\left({{x_i}\left(t \right),{y_i}\left(t \right),{z_i}\left(t \right)} \right)$, the location of node $i$ in next time-slot is expressed as follows:\begin{equation}\label{eq1}{c_i}\left({t + 1} \right) = \left\lbrace \begin{array}{l}x_i\left(t \right) + {v_i}\left(t \right)\sin d{\theta _i}\left(t \right)\cos d{\varphi _i}\left(t \right)\\ {y_i}\left(t \right) + {v_i}\left(t \right)\sin d{\theta _i}\left(t \right)\sin d{\varphi _i}\left(t \right)\\ {z_i}\left(t \right) + {v_i}\left(t \right)\cos d{\theta _i}\left(t \right) \end{array} \right. \tag{1}\end{equation}Due to the constraint imposed by the tethered wire $\left| {{c_i}\left(t \right) - {O_i}} \right| \leq {R_i}$, the limit locus of node $i$ in polar coordinates is $\left({{R_i},{\theta _i}\left(t \right) + d{\theta _i}\left(t \right),{\varphi _i}\left(t \right) + d{\varphi _i}\left(t \right)} \right)$. The distance between node $i$ and $j$ is given by\begin{equation}\label{eq2}{d_{ij}}\left({t+1} \right) = \left| {{c_i}\left({t+1} \right) - {c_j}\left({t+1} \right)} \right|\tag{2}\end{equation}The faster the distance between the two nodes changes with time, the stronger the impact of the mobility of water flow. Therefore, the impact of the mobility of water flow is defined as follows:\begin{equation}\label{eq3}{I_{mw}} = \frac{{\left| {{d_{ij}}\left(t \right) - {d_{ij}}\left({t + 1} \right)} \right|}}{r} \tag{3}\end{equation}where $r$ represents the communication radius of the underwater sensor nodes. When the numerator is greater than $r$, there is no communication between the two nodes, hence trust score update is not possible.\subsubsection{The Instability of the Acoustic Communication}Instabilities in underwater acoustic communication are mainly caused by channel fading and environmental noise \cite{von_looz_querying_}. Underwater acoustic channels display both, distance and frequency selective fading. 
This is caused by spreading loss and by the conversion of signal energy into heat, which increases with the transmission distance. The signal-to-heat conversion also increases with the transmission frequency. Based on the signal frequency $f$ (in kHz) and the distance $d_{ij}$ between transmitter $i$ and receiver $j$, the attenuation occurring in an underwater acoustic channel can be estimated by the relation
\begin{equation}\label{eq4}
A\left({{d_{ij}},f} \right) = {A_0}d_{ij}^ka{\left(f \right)^{{d_{ij}}}}\tag{4}
\end{equation}
where ${A_0}$ is the unit-normalizing constant, and $k$ is the spreading factor, which is commonly set to 2 for spherical spreading, 1 for cylindrical spreading, and 1.5 for practical spreading. In addition, $a\left(f \right)$ (in dB/km) is the absorption coefficient, which can be estimated by a well-known empirical formula
\begin{align*}
10\log a\left(f \right) &= 0.11\frac{{{f^2}}}{{1 + {f^2}}} + 44\frac{{{f^2}}}{{4100 + {f^2}}} \\
&\quad +\;2.75 \times {10^{ - 4}}{f^2} + 0.003. \tag{5}
\end{align*}
Another factor that influences the quality of underwater acoustic channels is the plurality of noise sources present in underwater environments, which typically include turbulence noise ${N_t}\left(f \right)$, ship-induced noise ${N_s}\left(f \right)$, wave motion noise ${N_w}\left(f \right)$ caused by surface winds, and thermal noise ${N_{th}}\left(f \right)$. These noise categories can be described by Gaussian statistics, and estimated by the empirical relations
\begin{align*}
10\log {N_t}\left(f \right) &= 17 - 30\log f \\
10\log {N_s}\left(f \right) &= 40 + 20\left({s - 0.5} \right) + 26\log f \\
&\quad \;-60\log \left({f + 0.03} \right) \\
10\log {N_w}\left(f \right) &= 50 + 7.5{w^{\frac{1}{2}}} + 20\log f \\
&\quad \;-40\log \left({f + 0.4} \right) \\
10\log {N_{th}}\left(f \right) &= - 15 + 20\log f\tag{6}
\end{align*}
where $s \in \left[ {0,1} \right]$ is a factor representing the level of surface vessel activity, and $w$ is the wind speed (in m/s).
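As a quick numerical check of the first and last relations in Eq. (6), take a frequency of $f = 10$ kHz (a value chosen only for illustration) and assume base-10 logarithms, as is usual for decibel quantities:
\begin{align*}
10\log {N_t}\left(10 \right) &= 17 - 30\log 10 = -13, \\
10\log {N_{th}}\left(10 \right) &= -15 + 20\log 10 = 5,
\end{align*}
so that, in linear units, ${N_t}\left(10 \right) = 10^{-1.3} \approx 0.050$ and ${N_{th}}\left(10 \right) = 10^{0.5} \approx 3.16$. Turbulence noise falls and thermal noise rises with frequency, which follows from the opposite signs of their $\log f$ terms.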
The effective noise level at a frequency $f$ is the sum of the contributions of the above factors
\begin{equation}\label{eq7}
N\left(f \right) = {N_t}\left(f \right) + {N_s}\left(f \right) + {N_w}\left(f \right) + {N_{th}}\left(f \right). \tag{7}
\end{equation}
For moderate signaling bandwidth $B$ and transmitted power $P$, the average signal-to-noise ratio (SNR) at the receiver is calculated as
\begin{equation}\label{eq8}
\text{SNR}\left({{d_{ij}},f} \right) = \frac{P / A\left({{d_{ij}},f} \right)}{N\left(f \right)B}. \tag{8}
\end{equation}
For a Rayleigh fading sub-channel, the probability of symbol error during the transmission can then be approximated by
\begin{equation}\label{eq9}
{P_e} = \frac{1}{{4\,\text{SNR}}}. \tag{9}
\end{equation}
The instability of underwater acoustic communication eventually causes errors in the packet transmission. Therefore, the impact of unstable acoustic communication is defined as follows:
\begin{equation}\label{eq10}
{I_{ac}} = 1 - {\left({1 - {P_e}} \right)^N}, \tag{10}
\end{equation}
where $N$ is the number of symbols in a packet. Finally, according to the impact of the mobility of water flow and the instability of the acoustic communication, the integrated environmental impact ${I_e}$, that is, the output of the environment model, is defined as
\begin{equation}\label{eq11}
{I_e} = {k_1}{I_{mw}} + {k_2}{I_{ac}}, \tag{11}
\end{equation}
where ${k_1},{k_2} \in \left({0,1} \right)$ are the weights of the two factors, and ${k_1} + {k_2} = 1$.
\subsection{Definition of Key Degree}
UASNs are open systems, in which the communication between underwater nodes can be easily monitored by potential attackers through acoustic receivers. Therefore, an attacker can find hotspots in the network by monitoring, analyzing and tracing network traffic, and preferentially compromise nodes in the hotspots to maximize the effect of the attack.
To counter such priority attacks, the concept of key degree is proposed to quantify the importance of a node to the network. Based on the difference in key degree, a differentiated trust update strategy is given in the next section, so as to achieve priority protection for the nodes with high importance.\\\indent Fig.~\ref{fig4} shows an example of a node ${n_i}$ whose only neighbors are ${n_1}$, ${n_2}$ and ${n_3}$. The nodes ${n_2}$ and ${n_3}$ are also neighbors of each other, so there is a direct acoustic link between them. Node ${n_1}$ cannot communicate directly with node ${n_2}$, but it can communicate indirectly by routing through ${n_i}$, so there is an indirect link between them. Similarly, there is also an indirect link between node ${n_1}$ and node ${n_3}$. It can be seen that once node ${n_i}$ is removed from the network, the indirect links within its communication range disappear. Therefore, the ratio of the number of indirect links to the total number of links is an indicator of the importance, or key degree, of node ${n_i}$.\\\indent Now, let us assume that the number of neighbors of node ${n_i}$ is $k$, and that the coordinates of the neighbors are denoted as ${C_i} = \left\lbrace {{c_1},{c_2}, \cdots,{c_k}} \right\rbrace$. Within the communication range of node ${n_i}$, the total number of links, direct and indirect, is $\sum \nolimits _{i = 1}^{k - 1} i = \frac{{k\left({k - 1} \right)}}{2}$, and the set of direct links is expressed as $L = \left\lbrace {{l_{mn}} \mid \left| {{c_m} - {c_n}} \right| \leq r,\ \forall {c_m},{c_n} \in {C_i},\ m \ne n} \right\rbrace$, where ${{l_{mn}}}$ represents the direct link between nodes $m$ and $n$.
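For the configuration of Fig.~\ref{fig4}, the count works out as follows (a worked illustration that uses only the links described above). With $k = 3$ neighbors there are $\frac{3 \cdot 2}{2} = 3$ links in total. The only direct link is the one between ${n_2}$ and ${n_3}$, so the set $L$ contains a single element, and the remaining $3 - 1 = 2$ links (between ${n_1}$ and ${n_2}$, and between ${n_1}$ and ${n_3}$) are indirect. The ratio of indirect links to the total number of links is therefore
\begin{equation*}
\frac{3 - 1}{3} = \frac{2}{3}.
\end{equation*}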
Finally, the key degree of the node ${n_i}$ is defined by\begin{equation}\label{eq12}{K_i} = \frac{{\frac{{k\left({k - 1} \right)}}{2} - \left| L \right|}}{{\frac{{k\left({k - 1} \right)}}{2}}} = 1 - 2\frac{{\left| L \right|}}{{k\left({k - 1} \right)}}, \tag{12}\end{equation}where ${\left| L \right|}$ represents the cardinality of the set $L$.\subsection{ Trust Update Based on Reinforcement Learning}The trust management system must not only confront challenges associated to the complexity of the underwater environment, but also various types of offensive attacks. Therefore, it is necessary to design an efficient and adaptive trust update mechanism. Reinforcement learning is a machine learning framework that has become ubiquitous in situations where online weight adaptation is required \cite{mnih_human-level_},\cite{kaelbling_reinforcement_1996}. In this case, the reinforcement learning model acquires information-to update the trust model parameters-by receiving environmental feedbacks iteratively. This characteristic makes it suitable for practical underwater environment, where the exact information is difficult to obtain. Therefore, reinforcement learning is introduced into the process of trust update. The process generally comprises three phases. First, an update regulation mechanism is presented to mitigate trust misclassification caused by environmental factors, rather than offensive attacks. Then, update benefit is accumulated based on the states of the evaluated nodes and the obtained trust evidences. Last, a state transition mechanism is elaborately designed to achieve optimal trust update.\subsubsection{Update Regulation}As shown in Fig.5, each node updates trust scores for neighbor nodes periodically, and each trust update cycle is divided into $m$ update time-slots of equal length. That is, the node performs at most $m$ update rounds in each trust update cycle. 
Each update time-slot records the interaction information with other nodes during that period of time. Outside the trust update cycle, the node uses the results of the last trust update to evaluate other nodes. During each update time-slot, the environmental impact ${I_e}$ is first obtained by the TUMRL according to the environment model described in Section 4.1. If ${I_e} > {\theta _e}$, where ${\theta _e} \in \left({0,1} \right)$ represents the threshold of environmental stability, the update time-slot is skipped, and the system proceeds to the next update time-slot. This is because when the environmental stability is poor (i.e., ${I_e} > {\theta _e}$), the system could mistake environmental effects for misbehavior and assign low trust scores to normal nodes. If ${I_e} \leq {\theta _e}$, the trust evidences, including communication trust ${T_c}$, energy trust ${T_e}$ and data trust ${T_d}$, are first calculated according to the relations (from our previous work in [33])
\begin{equation}\label{eq13a}
{T_c} = \frac{{2s + 1}}{{2\left({s + f + 1} \right)}} \tag{13a}
\end{equation}
\begin{equation}\label{eq13b}
{T_d = 2\int \limits _{v_j}^\infty f\left(v \right)\;dv } \tag{13b}
\end{equation}
\begin{equation}\label{eq13c}
T_e=\left\lbrace \begin{array}{ll}0 & \text{if } E_{res} < \theta \\ 1 - \left| {{r_e} - {r_N}} \right| & \text{otherwise}, \end{array}\right. \tag{13c}
\end{equation}
where ${T_c},{T_d},{T_e} \in \left[ {0,1} \right]$. Here $s$ and $f$ are the numbers of successful and failed communications in the time-slot, respectively. ${{v_j}}$ represents the value in the packet received from the $j$th neighbor, and ${f\left(v \right)}$ represents the degree of data differentiation.
${E_{res}}$ is the residual energy of the evaluated node, $\theta$ is the energy threshold, ${{r_e}}$ is the current energy consumption rate and ${{r_N}}$ is the normal energy consumption rate.
\subsubsection{Benefit Accumulation}
After obtaining the trust evidences, the TUMRL updates the trust score through the reinforcement learning model. Specifically, it dynamically adjusts the weights of the trust evidences to adaptively counter the unpredictable attack modes mentioned in Section 3. The main process of trust update, which takes place within a given time-slot, is now described. First, a reward is obtained based on the current state and the latest performed action, and these values are fed to the Bellman Equation in order to update the accumulated benefit. Then, the state is updated according to the latest accumulated benefit.\\\indent We now describe the states, actions, and rewards of the TUMRL in more detail. The weights assigned to the trust evidences in each state are given in Table~\ref{Table1}. The states, defined as ${S_i},i = 1,2, \ldots,7$, are the different weight combinations applied to the latest trust evidences. The actions are denoted by ${A_j},j = 1,2, \ldots,8$. For example, $\mathit{state} = {S_1}$ means that the current trust score of the evaluated node is $T = \frac{1}{3}{T_c} + \frac{1}{3}{T_e} + \frac{1}{3}{T_d}$, and $\mathit{action} = {A_1}$ means that the latest trust evidences satisfy ${T_c} < 0.5$, ${T_e} \geq 0.5$ and ${T_d} \geq 0.5$.
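As a numerical illustration with hypothetical evidence values (not taken from the experiments), suppose that the latest time-slot yields ${T_c} = 0.4$, ${T_e} = 0.7$ and ${T_d} = 0.7$. Only ${T_c}$ falls below 0.5, so the action is ${A_1}$, and under state ${S_1}$ the trust score is
\begin{equation*}
T = \tfrac{1}{3}\left(0.4 \right) + \tfrac{1}{3}\left(0.7 \right) + \tfrac{1}{3}\left(0.7 \right) = 0.6.
\end{equation*}
Assuming that the weight triples of the states are ordered as $\left({T_c},{T_e},{T_d} \right)$, as in the expression for ${S_1}$, a state with weights $\left(1,0,0 \right)$ would instead give $T = {T_c} = 0.4$ for the same evidences. The state therefore determines how strongly a single degraded evidence lowers the score.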
The corresponding rewards table is also given (Table 2), in which ${S_i}{A_j} = v$ signifies that when the current state is ${S_i}$ and the action is ${A_j}$, the reward is $v$.\begin{table}[H]\caption{Parallel Table of States and Actions}\label{Table1}\footnotesize\begin{tabular}{| c | c || c | c | }\hline$States$ & $Expressions$ & $Actions$ & $Expressions$ \\\hline${S_1}$ & $(\frac{1}{3},\frac{1}{3},\frac{1}{3})$ & ${A_1}$ & $\{{T_c} < 0.5$,${T_e} \geq 0.5$,${T_d} \geq 0.5$\} \\\hline${S_2}$ & $(\frac{1}{2},\frac{1}{2},$0$)$& ${A_2}$ & $\{{T_c} \geq 0.5$,${T_e} < 0.5$,${T_d} \geq 0.5$\} \\\hline${S_3}$ & $(\frac{1}{2},$0$,\frac{1}{2})$& ${A_3}$ & $\{{T_c} \geq 0.5$,${T_e} \geq 0.5$,${T_d} < 0.5$\} \\\hline${S_4}$ & $($0$,\frac{1}{2},\frac{1}{2})$& ${A_4}$ & $\{{T_c} < 0.5$,${T_e} < 0.5$,${T_d} \geq 0.5$\} \\\hline${S_5}$ & $($1$,$0$,$0$)$& ${A_5}$ & $\{{T_c} < 0.5$,${T_e} \geq 0.5$,${T_d} < 0.5$\} \\\hline${S_6}$ & $($0$,$1$,$0$)$& ${A_6}$ & $\{{T_c} \geq 0.5$,${T_e} < 0.5$,${T_d} < 0.5$\} \\\hline${S_7}$ & $($0$,$0$,$1$)$& ${A_7}$ & $\{{T_c} < 0.5$,${T_e} < 0.5$,${T_d} < 0.5$\} \\\hline& & ${A_8}$ & $\{{T_c} \geq 0.5$,${T_e} \geq 0.5$,${T_d} \geq 0.5$\} \\\hline\end{tabular}\centering\end{table}\begin{table}[H]\caption{Reward Table}\label{Table2}\normalsize\begin{tabular}{ c | c | c | c | c | c | c | c | c }\hline$R$ & ${A_1}$ & ${A_2}$ & ${A_3}$ & ${A_4}$ & ${A_5}$ & ${A_6}$ & ${A_7}$ & ${A_8}$ \\\hline${S_1}$ & $2$ & $2$ & $2$ & $1$ & $1$ & $1$ & $0$ & $10$ \\\hline${S_2}$ & $1$ & $1$ & $0$ & $2$ & $0$ & $0$ & $1$ & $10$ \\\hline${S_3}$ & $1$ & $0$ & $1$ & $0$ & $2$ & $0$ & $1$ & $10$ \\\hline${S_4}$ & $0$ & $1$ & $0$ & $0$ & $0$ & $2$ & $1$ & $10$ \\\hline${S_5}$ & $2$ & $0$ & $0$ & $1$ & $1$ & $0$ & $1$ & $10$ \\\hline${S_6}$ & $0$ & $2$ & $0$ & $2$ & $0$ & $1$ & $1$ & $10$ \\\hline${S_7}$ & $0$ & $0$ & $2$ & $0$ & $1$ & $1$ & $1$ & $10$ \\\hline\end{tabular}\centering\end{table}Actions performed over different states produce different results, hence they 
have different rewards. For example, when the state is ${S_1}$, actions ${A_1}$, ${A_2}$, and ${A_3}$ have in common that there is only one trust evidence below the threshold 0.5 (Table 1). This situation is likely to be caused by a compromised node, so the reward is set to 2. In ${A_4}$, ${A_5}$, and ${A_6}$, there are two types of trust evidences below the threshold, which is less likely to be caused by compromised nodes, as these can only perform one type of attack at a time, so the reward is set to 1 (Table 2). The three types of trust evidences in ${A_7}$ are all below the threshold, so the reward is set to 0. The three types of trust evidences in ${A_8}$ are all above the threshold, which is the desired state corresponding to a healthy UASN, so the reward is set to 10.\\
\indent After the state, action and reward in the current time-slot are obtained by the TUMRL, the Bellman Equation (Eq. (14)) is employed to update the accumulated benefit:
\begin{align*}
Q\left({S,A} \right) &= \left({1 - \alpha } \right)Q\left({S,A} \right) \\
&\quad +\;\alpha \left[ {R + \gamma \max \;Q\left({S^{\prime },A^{\prime }} \right)} \right], \tag{14}
\end{align*}
where $\alpha \in \left[ {0,1} \right]$ is the learning rate, which weighs the past experience against the current learning, and $\gamma \in \left[ {0,1} \right)$ is the discount factor, which indicates the importance of the future benefit. Additionally, $Q\left({S,A} \right)$ represents the accumulated benefit, and ${\max\; Q\left({S^{\prime },A^{\prime }} \right)}$ denotes the maximum accumulated benefit under all actions when the next state is ${S^{\prime }}$.
\subsubsection{State Transition}
In the TUMRL, the choice of the next state $ {S^{\prime }}$ is of great significance for dealing with offensive attacks.
The state transition mechanism consists of two stages: (1) the initial state transition, that occurs in the first effective update time-slot (the time-slot that is never skipped due to the environment model) of each update cycle; and (2), the probabilistic state transition that occurs in the subsequent update time-slots.\\\indent The initial state transition is shown in Table 3. According to the definition of key degree given in Section 4.2, the key node is defined as the evaluated node whose key degree is higher than the average key degree. Non-key nodes are those whose key degree is lower than the average. In Table 3, in most cases, non-key nodes can select one of the four states as the next state with equal probability, and the initial transition state of the key node is unique. The purpose of this is to increase the reaction rate of key nodes to attacks, in order to minimize the losses caused by compromised nodes.\begin{table}[H]\caption{Initial State Transition}\label{Table3}\tiny\begin{tabular}{ c | c | c | c | c | c | c | c | c }\hline\makecell[c]{$Next$ \\ $state$} & ${A_1}$ & ${A_2}$ & ${A_3}$ & ${A_4}$ & ${A_5}$ & ${A_6}$ & ${A_7}$ & ${A_8}$ \\\hline\makecell[c]{$Non-$ \\ $key$ \\ $node$} & \makecell[c]{${S_1}{S_2}$ \\ ${S_3}{S_5}$} & \makecell[c]{${S_1}{S_2}$ \\ ${S_4}{S_6}$} & \makecell[c]{${S_1}{S_3}$ \\ ${S_4}{S_7}$} & \makecell[c]{${S_1}{S_2}$ \\ ${S_5}{S_6}$} & \makecell[c]{${S_1}{S_3}$ \\ ${S_6}{S_7}$} & \makecell[c]{${S_1}{S_4}$ \\ ${S_5}{S_7}$} & \makecell[c]{${S_1}{S_2}$ \\ ${S_3}{S_4}$} & ${S_1}$ \\\hline\makecell[c]{$Key$ \\ $node$} & ${S_5}$ & ${S_6}$ & ${S_7}$ & ${S_2}$ & ${S_3}$ & ${S_4}$ & ${S_1}$ & ${S_1}$ \\\hline\end{tabular}\centering\end{table}As an example, assuming that there is a key node $i$ and a non-key node $j$ queuing for state transition, and that the current state of both nodes is ${S_1}$, their latest action is ${A_1}$, then the next state of the key node is ${S_5}$, and the next state of the non-key node can be chosen to 
be ${S_2}$ (Table 3). According to Table 1, $action = {A_1}$ means that ${T_c} < 0.5$, ${T_e} \geq 0.5$ and ${T_d} \geq 0.5$. Since only the communication trust ${T_c}$ is below the threshold, it can be suspected that the evaluated node is performing an attack that causes communication failure, e.g., a selective forwarding attack. In addition, from the states ${S_1}$, ${S_5}$ and ${S_2}$, we can infer that the trust scores in the current time-slot are $T_i^k = T_j^k = \frac{1}{3}{T_c} + \frac{1}{3}{T_e} + \frac{1}{3}{T_d}$, and those for the next time-slot are $T_i^{k + 1} = {T_c}$ and $T_j^{k + 1} = \frac{1}{2}{T_c} + \frac{1}{2}{T_e}$. Since $T_i^{k + 1} - T_j^{k + 1} = \frac{1}{2}\left({{T_c} - {T_e}} \right) < 0$, it can be inferred that although the trust values of the key node and the non-key node are the same in the current time-slot, the trust value of the key node is lower than that of the non-key node in the next time-slot, which shows that the key node presents lower time delays when reacting to offensive attacks.\\
\indent Although the initial state transition can quickly react to attacks, it is unable to cope with the changing attack mode mentioned in Section 3. In the subsequent time-slots of the update cycle, probabilistic state transition is used to address this problem. The main idea of the probabilistic state transition is to make a decision on the current state transition by using the accumulated benefit of the historical state transitions: the higher the accumulated benefit, the greater the probability that the corresponding state is selected.
Suppose that the current state is $S$ and the latest action is $A$ in the current time-slot (a time-slot after the first effective time-slot of the current update cycle). The probabilistic state transition is then defined as follows:
\begin{equation}\label{eq15}
P\left({S^{\prime }|S,A} \right) = \frac{{Q\left({S^{\prime },A} \right)}}{{Q\left({\hat{S},A} \right)}}, \tag{15}
\end{equation}
where $P\left({S^{\prime }|S,A} \right)$ is the probability of transition from the state $S$ to the state $ {S^{\prime }}$ when the action is $A$, ${Q\left({\hat{S},A} \right)}$ is the sum of the accumulated benefit of all states when the action is $A$, and ${Q\left({S^{\prime },A} \right)}$ is the accumulated benefit when the action is $A$ and the state is $ {S^{\prime }}$.\\
\indent A correct state selection obtains a greater reward, thereby accelerating the benefit accumulation mechanism (Eq. (14)). Therefore, the state with a high accumulated benefit is often the state that can effectively deal with a specific attack mode. Although a compromised node may change its attack mode in different cycles, the probabilistic state transition makes the TUMRL show a preference towards states with greater accumulated benefit. Therefore, the probabilistic state transition can effectively deal with the changing attack mode of the compromised node.
\section{Simulation Results and Analysis}
In this study, the proposed TUMRL and other established trust models were simulated on MATLAB Ra to evaluate and compare their performance. First, the performance of the different mechanisms in the TUMRL was evaluated. Specifically, both the impact of the key degree mechanism on the reaction rate to offensive attacks and the impact of the environment model on the detection accuracy rate of compromised nodes were verified. Then, the performance of the TUMRL was compared with other related work: the ARTMM \cite{7038144}, TMC \cite{7360179} and BLTM \cite{8668833} trust models.
To the best of our knowledge, the ARTMM and the TMC represent the state-of-the-art in trust modelling for UASNs. The BLTM has a simple update mechanism, which has partly motivated this study. The performance was compared in terms of detection accuracy rate, false alarm rate and energy efficiency. The deployment area was set to $500 \times 500 \times 500\ \mathrm{m}^3$, in which 500 sensor nodes were randomly deployed, and the communication radius of the sensor nodes was set to 100 m.
\subsection{Performance of TUMRL}
\subsubsection{Evaluation of Key Degree}
The efficacy of key degree assignment during trust updates was evaluated by measuring the time delay required by the TUMRL to react to attacks. Three typical attack modes were simulated to evaluate the performance of key nodes and non-key nodes. The attack modes were selective forwarding, DoS and packet tampering, which mainly affect the communication success rate, energy consumption rate and packet error rate, respectively. The nodes operated normally from 0 to 100 s, after which attacks were launched by the compromised nodes during the subsequent 50 s. The trust value of the key nodes decreased faster than that of the non-key nodes during the attack (Fig. 6). This is because the state transition mechanism makes key nodes more sensitive to anomalous behavior, and the attack consequences are mitigated by quickly reducing their trust value. In the period from 100 to 150 s, the curve of the key node in Fig. 6a is smoother than that in the other two figures. The reason is that the selective forwarding attack not only affects the communication success rate, but also causes abnormal energy consumption.
Thus, the key node in Fig. 6a has more choices for transitioning among states.
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig6}}
\caption{Efficacy of key degree assignment on the TUMRL under various attack modes.\label{fig6}}
\end{figure}
\subsubsection{Evaluation of Environment Model}
In order to evaluate the performance of the environment model, a set of controlled experiments was designed. The detection accuracy rate was evaluated with an increasing proportion of compromised nodes, under two conditions: with and without the environment model. The detection accuracy rate was taken to be the true positive rate of compromised node classification. In Fig. 7, it is clear that the trust model with the environment model outperforms the trust model without it. This is because the environment model quantifies the impact of the underwater fluctuations and reduces the frequency of trust update when these are large, thereby effectively alleviating the trust decline caused by environmental factors. In addition, the environment model was simulated under different positioning accuracies (100, 85, 70 percent) to observe how errors in the underwater positioning algorithms would affect the proposed security algorithm. As shown in Fig. 7, the detection accuracy rate decreases as the positioning accuracy is reduced. This is because the positioning accuracy directly affects the mobility factor $I_{mw}$ in the environment model, which increases the decision error of the environment model and thereby reduces the detection accuracy for compromised nodes.
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig7}}
\caption{Performance of the TUMRL, with and without environment model, for an increasing proportion of compromised nodes.\label{fig7}}
\end{figure}
\subsection{Comparison With Related Work}
\subsubsection{Comparison of Detection Accuracy Rate}
The TUMRL was compared with the TMC, ARTMM and BLTM in terms of detection accuracy rate.
The detection accuracy rate was measured under the simulated condition that 30 percent of the nodes, randomly selected, are compromised. As shown in Fig. 8, the TUMRL performs worse than the other schemes before about 180 s. The memory properties introduced by the reinforcement learning method imply that the TUMRL states need to be populated before sufficient evidence has accumulated. Therefore, the performance of the TUMRL is relatively poor initially, but it shortly reaches higher performance. Thus the TUMRL is more robust in a realistic setting, where the emphasis is on the long-term credibility of the sensor data.
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig8}}
\caption{Detection accuracy versus simulation time of the examined trust models (TUMRL, TMC, ARTMM, and BLTM).\label{fig8}}
\end{figure}
In order to assess the impact on trust evaluation caused by the changing attack mode, the detection accuracy rate was simulated with a varying changing frequency between attack modes. For example, a changing frequency of 2 means that the compromised node changes the attack mode twice during each time cycle. As shown in Fig. 9, when the changing frequency gradually increases from 0 to 2, the detection accuracy rate of the TUMRL declines more slowly than that of the other trust models. Once the attack mode of a compromised node changes, the difference in performance of this node can be detected by its neighbor nodes, which is reflected in the trust evidence in the TUMRL. Moreover, the weight of the trust evidences can be flexibly changed to confront the unpredictable attack mode.
Therefore, the simulations suggest that the performance of the TUMRL is more robust in detecting compromised nodes, under the condition that the attack mode is changeable.
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig9}}
\caption{Detection accuracy rate with changing frequency of attack mode.\label{fig9}}
\end{figure}
\subsubsection{Comparison of False Alarm Rate}
To determine the impact of the dynamic environment on trust evaluation, the TUMRL was compared with the TMC and ARTMM by evaluating their false alarm rate under the simulated condition of different moving speeds between the sensor nodes. As shown in Fig. 10, the simulations produce a lower false alarm rate for the TUMRL when the speed of the sensor nodes is gradually increased from 0 to 50 m/s. Thus the use of an environment model seems to be an effective extension of trust models when monitoring underwater dynamic environments. As the speed of the sensor nodes increases, so does the response of the environment model. Once the output exceeds the specified threshold, the trust scores are not updated during the current time-slot. Hence the TUMRL distinguishes the impact of attacks from that of environment fluctuations. However, there are no effective mechanisms in the TMC and ARTMM to make a similar distinction. Therefore, when the node movement speed increases, that is, when the impact of the environment on the trust evaluation increases, the false alarm rate gradually increases in the TMC and ARTMM.
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig10}}
\caption{False alarm rate versus moving speed of sensor nodes for the compared models.\label{fig10}}
\end{figure}
An additional simulation was implemented to exhibit the effect of the distribution density of the UASN on the false alarm rate. As shown in Fig. 11, the TUMRL outperforms the other schemes when the distribution density of sensor nodes is greater than 100.
However, the false alarm rate of the TUMRL is worse than that of the other schemes for smaller densities. The reason is that the sensor nodes lack sufficient information for trust score evaluation when the distribution density of sensor nodes is relatively low. Further, the TUMRL has higher requirements on the amount of data. Hence the TUMRL is more robust in compact networks, where information between the nodes highly contributes to the overall health factor of the network.
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig11}}
\caption{False alarm rate versus network density.\label{fig11}}
\end{figure}
\subsubsection{Comparison of Energy Efficiency}
With respect to energy efficiency, the four models (TUMRL, TMC, ARTMM and BLTM) were compared under different percentages of compromised nodes and network densities. Since the energy consumption of a node when receiving or sending packets is much greater than the energy consumption for storage and computation, only the communication energy consumption is considered here. The energy efficiency is defined as the ratio of the remaining energy of the network to the initial energy after detecting all the compromised nodes.\\
\indent As shown in Fig. 12, when the proportion of malicious nodes is less than 8 percent, the energy efficiency of the TUMRL is slightly inferior to that of the other three schemes. As stated, the reinforcement learning adopted in the TUMRL is more effective in dense attacks; a few attacked nodes only produce a small increase in energy expenditure. However, as the proportion of compromised nodes increases, the energy consumption caused by attacked nodes far exceeds the computational costs of the reinforcement learning model.
The energy efficiency of the TUMRL is better than that of the other schemes when the proportion of compromised nodes is higher than 12 percent.
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig12}}
\caption{Energy efficiency with different proportion of compromised nodes.\label{fig12}}
\end{figure}
The energy efficiency was also analyzed under different network densities (Fig. 13). The performance of the TUMRL is similar to that of the other schemes, and even slightly lower, when the distribution density of sensor nodes is less than $90\ \mathrm{nodes/km^3}$. For higher densities, the TUMRL outperforms the other methods. As before, the reason is that the reinforcement learning algorithm in the TUMRL requires sufficient data.
\begin{figure}
\centerline{\includegraphics[width=3.5in]{Figures/fig13}}
\caption{Energy efficiency with different distribution density.\label{fig13}}
\end{figure}
\section{Conclusion}
This study explores the problem of effective trust update in the face of unstable underwater environment fluctuations and offensive manoeuvres that consist of switching attack modes. A novel trust update mechanism based on reinforcement learning is proposed. In the scheme, the impact of the underwater environment is first analyzed and quantified by an environment model. The quantified result is used to regulate and improve the trust score update mechanism. Further, the concept of key degree is introduced. The key degree of the sensor nodes determines their relative priority during trust score update. The trust score update is completed by way of a reinforcement learning model, which integrates the role of the environment model and the key degree in order to adapt to switching attack modes and anomalous behaviour from compromised nodes. Moreover, the experimental analysis indicates that the TUMRL offers increased performance with respect to previously established work, in terms of complex underwater environment management and defective node detection.
This is particularly true when measuring the long-term performance of the TUMRL, in networks with relatively high density, networks subject to widespread global attacks and varying attack conditions.
\nocite{*}
\bibliography{Reference}
\end{document}
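
To make the two update rules in the sample paper above concrete, here is a small worked illustration of Eq. (14) and Eq. (15), written as a LaTeX fragment that could be pasted into the body of the template (it relies on the `amsmath` package, which the template already loads). The reward $R = 10$ for the pair $(S_1, A_8)$ is taken from Table 2. The learning rate $\alpha = 0.5$, the discount factor $\gamma = 0.9$, the zero initial values, and the three $Q$-values used for Eq. (15) are assumptions chosen only for illustration; they are not values from the paper.

```latex
% Worked example for Eq. (14).
% Assumptions (not from the paper): alpha = 0.5, gamma = 0.9,
% and all accumulated benefits start at 0.
% From Table 2: R = 10 for state S_1 and action A_8.
\begin{align*}
Q(S_1,A_8) &= (1-\alpha)\,Q(S_1,A_8) + \alpha\left[R + \gamma \max Q(S',A')\right] \\
           &= (1-0.5)\cdot 0 + 0.5\,\left[\,10 + 0.9\cdot 0\,\right] = 5 .
\end{align*}

% Worked example for Eq. (15).
% Hypothetical accumulated benefits for one action A:
% Q(S_1,A) = 2, Q(S_2,A) = 6, Q(S_5,A) = 2, all other states 0,
% so the sum over all states is Q(\hat{S},A) = 10.
\begin{align*}
P(S_2 \mid S, A) &= \frac{Q(S_2,A)}{Q(\hat{S},A)} = \frac{6}{10} = 0.6, \\
P(S_1 \mid S, A) &= P(S_5 \mid S, A) = \frac{2}{10} = 0.2 .
\end{align*}
```

Under these assumed numbers the three probabilities sum to 1, and the state with the largest accumulated benefit ($S_2$) is the most likely next state, which is the preference the paper describes.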

Problems encountered and their solutions:

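1. Pay attention to the editor options when switching between Chinese and English papers. Under the Build options you can change the default compiler, the PDF viewer, the default bibliography tool, and so on. For a Chinese paper, set the default compiler to XeLaTeX; for an English paper, use pdfLaTeX.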
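
As an alternative to changing the global default, some editors let you pin the compiler per file with a "magic comment" on the first line. The following is a minimal sketch, assuming your editor honours such comments (TeXstudio does, as far as I know; check your own editor's documentation) and assuming the `ctex` bundle is installed for Chinese typesetting. The file content is purely illustrative.

```latex
% !TeX program = xelatex
% The magic comment above asks the editor to build this file with XeLaTeX
% (assumption: the editor supports magic comments).
\documentclass{ctexart} % article class from the ctex bundle, for Chinese text
\begin{document}
这是一个中文测试文档。
\end{document}
```

For an English paper such as the IEEE template above, the corresponding first line would be `% !TeX program = pdflatex`.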

2. Problem: Something's wrong--perhaps a missing \item. \end{thebibliography}

The solution is: