{"text": "\\section{Randomized controlled trial}\\label{RCT}\nA standard approach in recommendation systems literature is to evaluate a counterfactual policy using off-policy evaluation methods \\citep{swaminathan2015batch, swaminathan2015self,gilotte2018offline}. Conceptually this involves identifying which observed user-item interactions would have also occurred under the counterfactual policy and using observed utilities from these interactions to compute the mean utility under the counterfactual policy. In our context, this is problematic for two reasons. First, recommendation policies impact both the outcomes of user-story interactions and the number of interactions. Thus, the natural metric for evaluating a recommendation policy is the \\emph{total utility}, which we cannot reliably estimate with standard off-policy metrics that do not capture the change in the number of interactions (see \\cite{forbes} for a similar argument). Second, even if we abstract away from changes on the extensive margin and assume that the impact of a new recommendation policy can be summarized by additive effects across user-story interactions we will systematically miss some of them. We can adjust for differences between user-item interactions captured in the off-policy evaluation and those in the population at large but this is likely to be incomplete due to data sparsity. \n\nBoth these challenges can be thought of as a problem of the overlap between the data generated using the baseline policy and data that would have been generated under the counterfactual policy. 
Considering a general case with effects on the extensive margin and interaction effects between stories, reliable off-policy estimates can be obtained only when the baseline and counterfactual policies coincide, making the method impracticable.\\footnote{An alternative is to consider a structural model.} When one is willing to consider the case of simple additive utilities, the extent of the overlap between user-story interactions in the baseline and the counterfactual policy determines how reliable this approach is; \\cite{contrastingoffon} and \\cite{offonecomm} show these limitations in empirical studies.\\footnote{By construction, this approach is more suitable for evaluating small changes in the policy. A large change that results in new user-story interactions will imply a low overlap.}\n\nAn alternative approach to evaluating a new policy is an A/B experiment in which the targeted metric is the \\emph{total utility} of a user. This is the method we use in this paper. This section discusses the design of the experiment and presents the results.\n\n\\subsection{Design of the experiment}\n\nIn the experiment, 7750 users were randomized into treatment and control. We considered only users that had at least sixty story interactions before the experiment. The treatment group received personalized recommendations in the \\emph{Recommended Story} tray, while the control group remained with the baseline system of stories selected randomly from a list specified by editors. The tray's UI was consistent across the control and treatment groups; the only element that varied exogenously was the set of stories displayed in the tray. Content presented in the other trays of the app was unchanged. Treated users were not aware of the change in the recommendation system. The experiment lasted for two weeks, a duration agreed in advance with the partner. 
Based on the analysis of past data the minimum detectable effect on total utility (per user sum of utility over two weeks) was 0.08 standard deviation.\n\nThe experiment started on the 22nd of July 2021 and lasted until the 4th of August.\\footnote{After the experiment, our system of personalized recommendations was launched for all eligible users on the \\emph{Recommended Story} tray.} During the experiment, 3023 users from the experimental groups launched the app at least once and of them, 525 viewed at least one story in \\emph{Recommended Story} tray.\\footnote{The large difference in the number of randomized students and the number of students who were active during the experimental period is because one, a number of students were only active on other trays and two, there is continuous churn and students drop off the app over time.} We report the balance of observable characteristics between the treatment and control groups in Appendix \\ref{cobal}. \n\nIn the evaluation of the experiment, we consider subjects that launched the app at least once during the experiment. This means that we exclude users that did not launch the app in the experiment period, but we include users who launched the app but did not click on any of the stories in the \\emph{Recommended Story} tray. The reason for including the latter group is that users can see the front page of the first story in \\emph{Recommended Story} tray without starting to interact with any of the stories in the tray. Thus, we also capture the change from not interacting at all with content in the \\emph{Recommended Story} to having some non-zero utility interaction.\n\n\\subsection{Outcome metrics}\n\nWe focus on two types of outcomes: first, outcomes specific to \\emph{Recommended Story} tray and, second, overall app usage. 
Even though other trays in the app remained unchanged, we are interested in the impact on overall app usage to understand whether changes in one tray are compensated by altered utilization of content elsewhere, or the overall time spent on the app also shifts. In this specific context where many users are consuming content based on the recommendation of parents or teachers, understanding the overall elasticity of consumption with respect to changes in the app quality is an important, strategic metric that can guide app development.\n\n\n\nWe consider the following outcome metrics: (i) \\emph{total utility} - per user sums of utility from all user-story interactions in \\emph{Recommended Story} tray during the experiment, (ii) \\emph{total utility all trays} - per user sums of utility from all user-story interactions in all trays of the app during the experiment, (iii) \\emph{total stories} - per user sums of completed stories in \\emph{Recommended Story} tray during the experiment, (iv) \\emph{total stories all trays} - per user sums of completed stories in all trays, (v) \\emph{total reading time} - per user sums of estimated reading time of stories completed in \\emph{Recommended Story} tray, (vi) \\emph{total reading time all trays} - per user sums of estimated reading time of stories completed in all trays.\\footnote{The estimates of the reading time per story are provided by \\emph{S2M} as intervals, e.g., from two to four minutes. For each story we take the mid point of the interval.}\n\nAll metrics relate to total app utilization per user. This approach assigns the same weight to each user without distinguishing between users of varying consumption patterns. In \\Cref{section:otherutils}, we additionally consider mean utility from user-story interactions.\n\n\n\nWe constructed all variables based on raw log files provided by \\emph{S2M}. 
These log files are internal data used by \\emph{S2M} data analytics teams and constitute the most accurate available picture of users' behavior on the platform. Nevertheless, occasional instrumentation errors occur. The instrumentation errors that are problematic for our analysis are incorrect attributions of user-story interactions.\\footnote{This can, for example, take the form of a user being assigned interactions of another user, or assigned completions instead of views.} This results in some users having spurious, very high utilization during specific sessions. To avoid including such sessions in the analysis, we drop users that had at least one session in which they completed more than 10 stories. As a result, we drop 40 users. \\Cref{tab:sum_stats_exp} provides summary statistics of variables describing utilization in the \\emph{Recommended Stories} tray.\n\n\\begin{table}[!htbp] \\centering \n \\caption{Summary statistics of outcome variables describing activity on the \\emph{Recommended Stories} tray per group.} \n \\label{tab:sum_stats_exp} \n \\resizebox{\\textwidth}{!}{\n\\begin{tabular}{>{}l|lrrrrrr}\n\\toprule\nnames & group & min & mean & percentile 75th & percentile 90th & percentile 95th & max\\\\\n\\midrule\n {\\textcolor{black}{\\textbf{Total utility}}} & control & 0 & 0.28 & 0 & 0.51 & 1.6 & 13.6\\\\\n {\\textcolor{black}{\\textbf{Total utility}}} & treatment & 0 & 0.45 & 0 & 1.30 & 3.0 & 23.9\\\\\n {\\textcolor{black}{\\textbf{Total stories}}} & control & 0 & 0.15 & 0 & 0.00 & 1.0 & 11.0\\\\\n {\\textcolor{black}{\\textbf{Total stories}}} & treatment & 0 & 0.27 & 0 & 1.00 & 2.0 & 21.0\\\\\n {\\textcolor{black}{\\textbf{Total reading time}}} & control & 0 & 1.04 & 0 & 0.00 & 7.5 & 78.5\\\\\n {\\textcolor{black}{\\textbf{Total reading time}}} & treatment & 0 & 1.94 & 0 & 7.25 & 11.0 & 148.5\\\\\n\\bottomrule\n\\end{tabular}\n}\n\\caption*{\\footnotesize{\\textit{Note: Summary statistics of variables measuring utilization of the 
\\emph{Recommended Story} tray during the experiment. Sample includes only users that launched the app during the experiment period. }}}\n\\end{table}\n \nEven though we consider only users that launched the app during the experiment, most of them had zero utilization of the app in the \\emph{Recommended Story} tray. Nevertheless, we still include them in the experiment evaluation as different recommendation policies might impact the share of users consuming any content in the tray. From \\Cref{tab:sum_stats_exp} we can notice that the treatment group has higher mean utilization and higher utilization on the 90th and 95th percentiles. \n\nWe are also interested in the impact of the personalization of content recommendations in \\emph{Recommended Stories} tray on the overall app usage. \\Cref{tab:sum_stats_exp_allpaths} presents summary statistics of variables describing utilization on all trays in the app.\n\n\\begin{table}[!htbp] \\centering \n \\caption{Summary statistics of outcome variables describing activity on all trays per group.} \n \\label{tab:sum_stats_exp_allpaths} \n \\resizebox{\\textwidth}{!}{\n\\begin{tabular}{>{}l|lrrrrrr}\n\\toprule\nnames & group & min & mean & percentile 75th & percentile 90th & percentile 95th & max\\\\\n\\midrule\n {\\textcolor{black}{\\textbf{Total utility}}} & control & 0 & 3.79 & 4.47 & 11.4 & 17.77 & 81.5\\\\\n {\\textcolor{black}{\\textbf{Total utility}}} & treatment & 0 & 4.32 & 5.50 & 13.5 & 19.02 & 63.7\\\\\n {\\textcolor{black}{\\textbf{Total stories}}} & control & 0 & 1.89 & 2.00 & 6.0 & 9.00 & 41.0\\\\\n {\\textcolor{black}{\\textbf{Total stories}}} & treatment & 0 & 2.25 & 3.00 & 7.0 & 11.00 & 42.0\\\\\n {\\textcolor{black}{\\textbf{Total reading time}}} & control & 0 & 12.65 & 14.50 & 40.0 & 65.32 & 273.0\\\\\n {\\textcolor{black}{\\textbf{Total reading time}}} & treatment & 0 & 15.15 & 15.00 & 46.0 & 73.12 & 365.0\\\\\n\\bottomrule\n\\end{tabular}\n}\n\\caption*{\\footnotesize{\\textit{Note: Summary statistics of 
variables measuring the overall app utilization during the experiment. Sample includes only users that launched the app during the experiment period. }}}\n\\end{table}\n\nIn \\Cref{tab:sum_stats_exp_allpaths} we see that mean outcomes are higher in the treatment group for all outcome variables. The treatment group has higher or equal outcomes at the 75th, 90th, and 95th percentiles.\n\nTo compare the distributions of total utility in treatment and control, we carry out a Wilcoxon rank-sum test (one-sided alternative). Using the total utility in the \\emph{Recommended Story} tray, we reject the hypothesis that the true location shift is less than zero, with a p-value of 0.0007, and for all trays in the app with a p-value of 0.05.\nIn \\Cref{fig:total_utility_distribution} we present the entire distributions of total utility. Panel A shows cumulative distribution functions of total utility from \\emph{Recommended Story} tray per experimental group; panel B shows the difference between probability density functions of the treatment and the control group. We can notice that a larger share of control group users did not have any positive-utility content interaction during the experiment. The treatment group has a higher probability mass for almost any non-zero utility.\n\n\\begin{figure}%\n \\centering\n \\caption{Distribution of total utility in \\emph{Recommended Stories} tray per group.}%\n \\subfloat[\\centering Cumulative distribution function per group. Treatment in blue, control in red.]{{\\includegraphics[width=15cm]{images/cdfs_total_utils.png} }}%\n \\qquad\n \\subfloat[\\centering Difference between the probability density functions of treatment and control groups. Treatment in blue, control in red. ]{{\\includegraphics[width=15cm]{images/diff_pdfs_total_utility.png} }}%\n\n \\label{fig:total_utility_distribution}%\n\\end{figure}\n\n\n\n \t\n\\subsection{Average treatment effects}\\label{ATE_section}\nEstimates of the average treatment effects are presented in \\Cref{tab:ATE}. 
We use the difference-in-means, linear regression, and augmented inverse propensity weighting (AIPW) estimators.\n\nWe find a strong positive effect of personalization on all outcome metrics. The impact on utilization of the \\emph{Recommended Stories} tray has high economic and statistical significance. Total utility increases by 63\\% ($\\pm$ 28\\%), the number of stories completed in the tray by 78\\% ($\\pm$ 39\\%), and total reading time by 87\\% ($\\pm$ 41\\%).\\footnote{Confidence intervals in brackets. Standard errors based on the difference-in-means estimator.} \n\nWe also find an increase in the utilization of the app across all trays; total utility increases by 14\\% ($\\pm$ 12\\%), the number of stories completed by 19\\% ($\\pm$ 14\\%), and the reading time in all trays by 20\\% ($\\pm$ 14\\%). Thus, the increase in consumption of content in \\emph{Recommended Stories} did not come entirely at the expense of consumption in other trays; on the contrary, this evidence suggests that users started using the app more.\\footnote{In \\Cref{ap_out} we provide a robustness check of these estimates by trimming the top 5\\% of users with the highest daily number of completed stories instead of applying the cap of 10 stories.}\n\n\\begin{table}[!htbp] \\centering \n \\caption{Estimates of average treatment effects for all outcome variables} \n \\label{tab:ATE} \n\\resizebox{\\textwidth}{!}{%\n\\begin{tabular}{>{}l|rrrrrrrr}\n\\toprule\nvariable & ATE & std.err. & p.value & ATE \\% & ATE reg adj. & std. err. reg adj. & ATE AIPW adj. & std. err. 
AIPW adj.\\\\\n\\midrule\n {\\textcolor{black}{\\textbf{Total utility RS}}} & 0.17 & 0.05 & 0.00 & 60 & 0.18 & 0.05 & 0.18 & 0.05\\\\\n {\\textcolor{black}{\\textbf{Total stories RS}}} & 0.12 & 0.04 & 0.00 & 78 & 0.13 & 0.03 & 0.13 & 0.04\\\\\n {\\textcolor{black}{\\textbf{Total reading time RS}}} & 0.90 & 0.26 & 0.00 & 87 & 0.96 & 0.25 & 0.98 & 0.25\\\\\n\\addlinespace\n {\\textcolor{black}{\\textbf{Total utility all trays}}} & 0.52 & 0.27 & 0.05 & 14 & 0.50 & 0.26 & 0.51 & 0.26\\\\\n {\\textcolor{black}{\\textbf{Total stories all trays}}} & 0.36 & 0.16 & 0.03 & 19 & 0.36 & 0.15 & 0.35 & 0.15\\\\\n {\\textcolor{black}{\\textbf{Total reading time all trays}}} & 2.50 & 1.10 & 0.02 & 20 & 2.47 & 1.07 & 2.49 & 1.06\\\\\n\\bottomrule\n\\end{tabular}\n\n}\n\\caption*{\\footnotesize{\\textit{Note: Estimates of the average treatment effect using difference-in-means estimator (first column), adjusting for covariates with a linear regression (fifth column), and adjusting for covariates using Augmented Inverse Propensity Weighting - AIPW (column seven); covariates used: users' grade, user type (B2B, B2C, or paid), past utilization, niche type (indicator whether user consumes content that is popular amongst other users or more niche content), past usage of the \\emph{Recommended Story} tray. Columns two, six, and eight show standard errors. Column three presents p-values. Three first rows describe outcomes in \\emph{Recommended Story} tray, three bottom rows overall app utilization.}}}\n\\end{table}\n\n\n Additionally, we review differences in total utility in the most popular trays in the app across treatment and control. \\Cref{fig:ate_all_paths} shows differences in average total utility in treatment and control groups in other popular trays in the app. The experiment period is marked in blue; we can see that the difference between the two groups is statistically significant only for Recommended Story tray. 
We carry out this comparison for the same users in a pre-experiment period; before the experiment, differences in average utility across treatment and control are insignificant in all of the trays (which is expected since the users were randomly assigned).\n \n \n \\begin{figure}[!ht]\n \\centering\n \\caption{Difference in average total utility in treatment and control groups for eight most popular trays.}\n \\includegraphics[scale = 0.63]{images/plot_al_stories.png}\n \\caption*{\\footnotesize{\\textit{Note: Difference in average total utility in treatment and control groups for eight most popular trays. Experimental period in blue, pre-experimental in red. Pre-experimental period is 7-19.06.2021; there are approximately twice as many users in the pre-experimental period (this date is chosen on the basis of being the closest two-week-long period without other major experiments and alterations in the app).}}}\n \\label{fig:ate_all_paths}\n \\end{figure} \n\n \n \\paragraph{Impact on time spent on the app.} In \\Cref{tab:ATE}, we see a strongly significant positive effect on the time spent, both in the \\emph{Recommended Story} tray and across all trays.\\footnote{Note that this outcome metric is a sum of the duration of completed stories. We do not include stories that were started but not completed, since we do not observe the moment in which users stopped engaging with a specific story. The average number of stories started but not completed is roughly the same in both groups: 2.33 in treatment and 2.17 in control; the test for difference in means has a p-value of 0.55.} \n \n The increase in total time spent on the app is particularly interesting because it means that students prefer spending time on the app to engaging in other activities outside of the app. 
In our context, this result suggests that if the content is interesting to students, they are willing to go beyond the time prescribed by parents or teachers.\n \n Generally, we can consider users responding to the improvements in the app quality on the intensive and the extensive margins. Gains on the intensive margin would be due to users better allocating their time; in our case, that is reallocation of the time to more attractive, personalized content in the \\emph{Recommended Story} tray. The impact on the extensive margin, in contrast, means that users substitute away from other activities and start using the app more. The effect on the extensive margin highlights that the app quality matters to the users, and improving it will result in more time spent with the app.\n \n \n\\section*{Appendix}\n\n\\section{Covariate balance check}\\label{cobal}\n\nTable \\ref{tab:baltab} presents a comparison of means of user characteristics across treatment and control. We find that differences between treatment and control are small and statistically insignificant.\n\n\\begin{table}[!htbp] \\centering \n \\caption{Balance of covariates across treatment and control} \n \\label{tab:baltab} \n\\resizebox{0.75\\textwidth}{!}{%\n\\begin{tabular}{>{}l|rrrrr}\n\\toprule\ncovariate & mean treatment & sd treatment & mean control & sd control & p value\\\\\n\\midrule\n {\\textcolor{black}{\\textbf{past utility}}} & 101.23 & 139.38 & 94.67 & 114.35 & 0.17\\\\\n {\\textcolor{black}{\\textbf{past stories}}} & 57.14 & 89.03 & 52.75 & 77.22 & 0.16\\\\\n {\\textcolor{black}{\\textbf{max streak}}} & 18.58 & 85.00 & 17.79 & 84.00 & 0.80\\\\\n {\\textcolor{black}{\\textbf{share b2b}}} & 0.31 & 0.46 & 0.29 & 0.46 & 0.46\\\\\n {\\textcolor{black}{\\textbf{share b2c}}} & 0.41 & 0.49 & 0.41 & 0.49 & 0.79\\\\\n\n {\\textcolor{black}{\\textbf{share paid}}} & 0.25 & 0.43 & 0.26 & 0.44 & 0.66\\\\\n {\\textcolor{black}{\\textbf{share grade 2}}} & 0.24 & 0.43 & 0.24 & 0.43 & 0.82\\\\\n 
{\\textcolor{black}{\\textbf{share grade 3}}} & 0.22 & 0.42 & 0.22 & 0.41 & 0.69\\\\\n\\bottomrule\n\\end{tabular}\n}\n\\caption*{\\footnotesize{\\textit{Note: Means of users' characteristics in treatment and control. The last column reports the p-value from a t-test for difference in means. Category paid includes users from a paid fLive program and regular paying users; category b2b includes regular b2b customers and club 1br users, a B2B promotion.}}}\n\\end{table}\n\n\\section{Robustness check of the average treatment effect estimates}\\label{ap_out}\nIn Table \\ref{tab:ATE_rob} we present estimates of the average treatment effect based on data trimmed at the 95th percentile of daily stories completed, i.e., we remove users that are in the top 5\\% of users with the highest daily number of completed stories across all paths of the app.\n\nWe find very similar estimates of the ATE for outcomes across all paths in the app. The path-specific estimates are smaller, but still high and statistically significant. 
The confidence intervals include the point estimates from the baseline specification.\n\n\\begin{table}[!htbp] \\centering \n \\caption{Estimates of average treatment effects for all outcome variables} \n \\label{tab:ATE_rob} \n\\resizebox{0.8\\textwidth}{!}{%\n\\begin{tabular}{>{}l|rrrr}\n\\toprule\nvariable & ATE & std.error & p.value & ATE percentage\\\\\n\\midrule\n {\\textcolor{black}{\\textbf{Total utility RS}}} & 0.14 & 0.04 & <0.001 & 58\\\\\n {\\textcolor{black}{\\textbf{Total stories completed RS}}} & 0.09 & 0.03 & <0.001 & 68\\\\\n {\\textcolor{black}{\\textbf{Total reading time RS}}} & 0.67 & 0.21 & <0.001 & 77\\\\\n\\addlinespace\n {\\textcolor{black}{\\textbf{Total utility all paths}}} & 0.50 & 0.23 & 0.03 & 15\\\\\n {\\textcolor{black}{\\textbf{Total stories completed all paths}}} & 0.32 & 0.13 & 0.01 & 21\\\\\n {\\textcolor{black}{\\textbf{Total reading time all paths}}} & 2.07 & 0.88 & 0.02 & 20\\\\\n\\bottomrule\n\\end{tabular}\n}\n\\caption*{\\footnotesize{\\textit{Note: Estimates of the average treatment effect using difference-in-means estimator. The first three rows describe outcomes in the \\emph{Recommended Story} path, the bottom three rows overall app utilization. The last column shows the ATE estimate as a percent share of the baseline.}}}\n\\end{table}\n\n\n \\section{Alternative utility metrics}\\label{section:otherutils} \n The utility metric that we have analyzed so far is the per user sum of utility from all user-story interactions during the experiment period. This metric assigns the same weight to each user, irrespective of the number of stories consumed by that user. It also captures the fact that a new policy might impact the number of stories consumed by users. However, a firm introducing a new recommendation system might have a different objective, for example to weigh each user-story observation equally, or simply focus on maximizing the mean utility each user receives. 
In \\Cref{tab:ATE_alt}, we provide treatment effects on such alternative utility metrics.\n \n \\begin{table}[]\n\t\t\\centering\n\t\t \\caption{Average treatment effects: alternative utility metrics}\\label{tab:ATE_alt}\n\\resizebox{0.7\\textwidth}{!}{%\n\\begin{tabular}{>{}l|rrrr}\n\\toprule\nvariable & ATE & std. error & p. value & ATE \\% \\\\\n\\midrule\n {\\textcolor{black}{\\textbf{Mean utility RS}}} & 0.015 & 0.006 & 0.013 & 0.31\\\\\n {\\textcolor{black}{\\textbf{Utility RS}}} & 0.006 & 0.013 & 0.626 & 0.01\\\\\n\\bottomrule\n\\end{tabular}\n}\n\n\\caption*{\\footnotesize{\\textit{Note: Estimates of the average treatment effect using difference-in-means estimator. First row mean utility per user in \\emph{Recommended Story} path (mean of the mean utilities within the group); only users that launched the app considered. Second row mean utility per user-story interaction. }}}\n \\end{table}\n\nIn the first row of \\Cref{tab:ATE_alt} we present the average treatment effect on mean utility per user; it is insignificant. This metric weights each user equally, but does not capture the increase in the number of user-story interactions. Finally, in the second row, we present the treatment's impact on mean utility per user-story interaction; this metric puts more weight on heavy users, as they have more interactions. We find that there is a strongly positive and statistically significant treatment effect. This suggests that the personalized policy had a stronger positive effect on heavy users, who consume many stories, than on infrequent users.\n\n\\section{Heterogeneous treatment effects: regression analysis}\\label{robust}\n\nTo further probe the robustness of the finding that heavy and niche users benefit from personalized recommendations, we present results of regressions of AIPW scores based on total utility on users' past utilization (see \\cite{athey2019estimating} for methodology). See \\cref{tab:aipw_reg} for a summary of results. 
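\n\nAs a hedged sketch of what such scores typically look like (the notation below is our own assumption and is not taken from the implementation used in the experiment): let $W_i$ denote the treatment indicator, $Y_i$ the total utility of user $i$, $X_i$ the pre-experiment covariates, $\\hat{\\mu}_w(x)$ an estimate of $E[Y_i \\mid X_i = x, W_i = w]$, and $\\hat{e}(x)$ an estimate of the treatment propensity. The standard AIPW score is then\n\\[\n\\hat{\\Gamma}_i = \\hat{\\mu}_1(X_i) - \\hat{\\mu}_0(X_i) + \\frac{W_i\\,\\bigl(Y_i - \\hat{\\mu}_1(X_i)\\bigr)}{\\hat{e}(X_i)} - \\frac{(1 - W_i)\\,\\bigl(Y_i - \\hat{\\mu}_0(X_i)\\bigr)}{1 - \\hat{e}(X_i)},\n\\]\nand each column of \\cref{tab:aipw_reg} would correspond to an OLS regression of the form $\\hat{\\Gamma}_i = \\alpha + \\beta^{\\top} Z_i + \\varepsilon_i$, where $Z_i$ collects the user characteristics listed in the rows of the table. Under these assumptions, a positive entry of $\\beta$ indicates that users with a higher value of the corresponding characteristic have a larger estimated treatment effect.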
 \n\n\\begin{table}[!htbp] \\centering \n \\caption{Results of regressions of AIPW scores on user characteristics.} \n \\label{tab:aipw_reg} \n \\resizebox{\\textwidth}{!}{\n\\begin{tabular}{@{\\extracolsep{5pt}}lccccccc} \n\\\\[-1.8ex]\\hline \n\\hline \\\\[-1.8ex] \n & \\multicolumn{7}{c}{\\textit{Dependent variable:}} \\\\ \n\\cline{2-8} \n\\\\[-1.8ex] & \\multicolumn{7}{c}{Total utility (aipw.scores)} \\\\ \n\\\\[-1.8ex] & (1) & (2) & (3) & (4) & (5) & (6) & (7)\\\\ \n\\hline \\\\[-1.8ex] \n past utility & 0.001$^{***}$ (0.0003) & & & & & & 0.001 (0.0004) \\\\ \n stories completed & & 0.002$^{***}$ (0.001) & & & & 0.001$^{**}$ (0.001) & \\\\ \n heavy user utility & & & 0.366$^{***}$ (0.075) & & & & \\\\ \n heavy user completions & & & & 0.352$^{***}$ (0.076) & & & \\\\ \n niche type & & & & & 0.342$^{***}$ (0.074) & 0.231$^{***}$ (0.088) & 0.256$^{***}$ (0.091) \\\\ \n \\hline \\\\[-1.8ex] \nObservations & 2,661 & 2,661 & 2,661 & 2,661 & 2,661 & 2,661 & 2,661 \\\\ \nR$^{2}$ & 0.006 & 0.007 & 0.009 & 0.008 & 0.008 & 0.010 & 0.009 \\\\ \n\\hline \n\\hline \\\\[-1.8ex] \n\\textit{Note:} & \\multicolumn{7}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\\\ \n\\end{tabular} \n}\n\\caption*{\\footnotesize{\\textit{Note: Outcome variable is AIPW score of total utility per user. OLS estimator. All covariates defined based on the pre-experiment app usage. $^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01}}}\n\\end{table} \n\nColumns one to four of \\cref{tab:aipw_reg} show that users with high past utilization have higher treatment effects. Column five shows a higher treatment effect for niche type users. 
Finally, columns six and seven control both for heavy utilization and niche type; the coefficient on niche type remains large and statistically significant.\n\n\\section{Data-driven treatment effects heterogeneity}\\label{hte_appendix}\n\n\n\nWe use the estimated causal forest to divide our users into tertiles according to their estimated CATE predictions (see \\cite{chernozhukov2018generic} for details of this approach). To avoid using a model that was fitted on the observations for which we make predictions, we use honest sample splitting with 10 folds. \n\n\\Cref{fig:CATE_4} shows the predicted CATEs in the four groups. First of all, the treatment effects are quite similar for the four groups. The fourth quartile appears to have higher treatment effects, but the differences are small.\n\n\t\\begin{figure}[H]\n\t\t\t\\centering\n\t\t\t\\caption{Average CATE within each ranking (as defined by predicted CATE). Predictions with OLS in blue and AIPW scores in red.}\n\t\t\t\\includegraphics[height=3.5in]{images/cate_3.png}\n\t\t\t\\label{fig:CATE_4}\n\t\\end{figure}\t\n\t\nFinally, we can also compare average characteristics for individuals in the four quartiles. We present such a comparison in \\Cref{fig:average_CATEs}.\n\nHeavy users (high maximal streak and freq-user indicator) appear more frequently in the highest quartile. We also see more niche users in the fourth group. We look in detail into these groups in the next subsections.\n\n\t\\begin{figure}[H]\n\t\t\t\\centering\n\t\t\t\\caption{Average covariate values within group (based on CATE estimate ranking).}\n\t\t\t\\includegraphics[height=5in]{images/average_CATEs.png}\n\t\t\t\\label{fig:average_CATEs}\n\t\\end{figure}\t\n\n\\section{Model calibration with experimental data}\\label{calibration}\nThe main component of the recommendation system is the collaborative filtering model that predicts user utility from user-story interactions. 
In our analysis, high treatment effects suggest that the model has successfully identified user preferences and selected stories that users liked.\nIn this section, we further evaluate the calibration of the model by correlating the model's predicted user utilities with observed utilities from the experiment. Figure \\ref{hist} shows the histogram of the predicted utilities; we can notice that they vary from very high values of around 1 to lower values of $0.2$. We do not see values lower than 0.2 because in the experiment we considered only stories ranked at the top of the ranking of predicted utilities.\n\n\\begin{figure}\n \\centering\n \\caption{Histogram of predicted utility}\n \\includegraphics[scale = 0.6]{images/his_pred.png}\n \\label{hist}\n\\end{figure}\n \nTo evaluate the model calibration, we regress the observed utility from the experiment on the predicted utility. Regression results are in Table \\ref{tab:ev_pred_all}.\n\n\n\\begin{table}[!htbp] \\centering \n \\caption{Correlation between utility predicted by the collaborative filtering model and observed in the experiment. $^{***}: p < 0.01$} \n\\begin{tabular}{@{\\extracolsep{5pt}}lccc} \n\\\\[-1.8ex]\\hline \n\\hline \\\\[-1.8ex] \n & \\multicolumn{3}{c}{\\textit{Dependent variable:} utility} \\\\ \n\\cline{2-4} \n & All users & Frequent users & Infrequent users \\\\ \n\\hline \\\\[-1.8ex] \npred. utility & 0.44$^{***}$ (0.01) & 0.46$^{***}$ (0.01) & 0.43$^{***}$ (0.01) \\\\ \n \\hline \\\\[-1.8ex] \nObservations & 9,344 & 2,268 & 7,076 \\\\ \nR$^{2}$ & 0.40 & 0.38 & 0.41 \\\\ \n\\hline \n\\hline \\\\[-1.8ex] \n\\end{tabular}\n \\label{tab:ev_pred_all} \n \n \\end{table}\n \\Cref{tab:ev_pred_all} shows that the model predictions are strongly correlated with observed utilities. The model is better calibrated for frequent users, for whom we have longer consumption histories (albeit the difference is small). 
We also break down the analysis by the experimental groups, which is shown in \\Cref{tab:ev_pred_groups}.\n \n\\begin{table}[!htp] \\centering \n \\caption{Results from linear regressions of actual utility on predicted utility. Column (1) treatment group, column (2) control group.} \n\\begin{tabular}{@{\\extracolsep{5pt}}lcc} \n\\\\[-1.8ex]\\hline \n\\hline \\\\[-1.8ex] \n & \\multicolumn{2}{c}{\\textit{Dependent variable:} utility} \\\\ \n\\cline{2-3} \n & treatment & control \\\\\n\\hline \\\\[-1.8ex] \n pred. utility & 0.33$^{***}$ (0.01) & 0.60$^{***}$ (0.01) \\\\ \n \\hline \\\\[-1.8ex] \nObservations & 4,695 & 1,773 \\\\ \nR$^{2}$ & 0.32 & 0.51 \\\\ \n\\hline \n\\hline \\\\[-1.8ex] \n\\end{tabular} \n \\label{tab:ev_pred_groups} \n\\end{table} \n\nIn \\Cref{tab:ev_pred_groups} we can notice that the predictions from the model correlate strongly with observed utility in both treatment and control groups. The model is much better calibrated in the control group; this is not surprising, because the model is trained on similar data.\n\n\\section{Introduction}\nRecommendation systems, the algorithms that determine which pieces of content will be displayed to each user, have been widely deployed in online services and credited with being an important factor in determining user engagement with the service. Personalized content recommendations have contributed to the success of some of the most valuable companies in the world. 
Market leaders in the entertainment sector (e.g., \\emph{Netflix} or \\emph{Spotify}) and in online retail (e.g., \\emph{Amazon}) are at the forefront of developing algorithms that provide personalized recommendations, and reap high benefits from implementing them.\\footnote{See \\citep{gomez2015netflix} for a discussion of the purpose and business value of personalized recommendation algorithms at \\emph{Netflix}.} Recommendations, particularly personalized ones, in principle, have the potential to create significant value in other settings where user preferences for items vary.\n\n\\par However, the incremental benefits of personalization have also been challenged, and the empirical question of its impact remains open in many settings, particularly in education. In real-world educational applications, the user base may be orders of magnitude smaller than popular entertainment applications, and so it is unclear whether data-driven personalization would be effective in such settings. In addition, the benefits of personalization depend on the fundamental preferences of the users (e.g. students); if their preferences are homogeneous, then human curation or simple popularity-based algorithms may be sufficient. Therefore, empirical evidence is required to understand the importance of personalization in a given setting. \n\n\\par In education and training, students might spend more time with educational material (and potentially learn more) if it matches their interests. Yet perhaps surprisingly, the publicly available evidence of the impact of personalized content recommendations in education is limited. This paper aims to fill some of these gaps by providing evidence from a large-scale randomized controlled trial (RCT) designed to measure the impact of the introduction of personalized recommendations in place of editor based manually curated recommendations into \\emph{Freadom}, an educational app designed to help children in India learn to read in English. 
In particular, we conducted a two-week-long randomized experiment, where the control group was exposed to stories based on the status quo, a system in which editors select content for all users (the ``editorial-based'' system), while the treatment group was exposed to stories from a personalized recommendation system in one section of the app.\n\n\\par Our most important finding is that personalization of recommended content leads to a substantial increase in user engagement with the app compared to the editorial-based system: our estimate of the increase in usage of the personalized section is 63\\% ($\\pm$ 28\\%).\\footnote{In the brackets we show a 95\\% confidence interval.} A key element of the experiment is that the personalized content was shown in one section of the app; thus, it is possible that users might simply shift from consuming editor-based content to personalized content without increasing overall engagement. We also estimate the total increase in app usage which includes all sections of the app and estimate an increase of approximately 14\\% ($\\pm$ 12\\%). \n\n\\par Increases in the consumption of educational content of the magnitude that we estimate can lead to substantial societal benefits. Notably, as the app's content is curated by pedagogy experts, higher levels of engagement are likely to accelerate learning. It is worth noting that Freadom has wide reach at a low cost; therefore, improving its efficiency can potentially benefit a large user base.\n\n\\par Personalization of content selection in the ed-tech context typically takes the form of either assigning learning materials at the difficulty level that is right for the specific user or adjusting the content's style so that it matches the user's preferences.\\footnote{See \\cite{escueta2020upgrading} for a review of the literature on the impact of personalization of learning content difficulty on learning outcomes.} In this paper, we focus on the latter. 
It is not a priori clear that personalized content increases app usage. Notably, learners might engage with ed-tech products following a specific routine or, in the case of children, the recommendation of parents or teachers. The finding that overall usage of the app increases following the introduction of personalized recommendations suggests that investments in recommendation systems in the ed-tech context can create substantial value. \n\n\\par To understand better the potential impact of the intervention, consider the context of the \\emph{Freadom} app. It is developed by \\emph{Stones2Milestones (S2M)}, and it is targeted at children aged 3 to 12 years old. Short illustrated stories are the main content of the app. Each story is a self-contained learning unit, generally consisting of a reading part and a quiz. Stories are curated by \\emph{S2M} pedagogy experts; they are grade-appropriate and have clear educational goals. \\emph{Freadom} is mostly used on smartphones, where the main page of the app consists of various sections. Each section contains a tray of stories. A tray is a sequence of stories sorted by an algorithm. Trays are labelled with different names e.g., \\emph{Trending Now}, \\emph{New Releases}, or \\emph{Recommended Story} and display stories following different algorithms (e.g., \\emph{New Releases} features stories recently added to the app). At the time we conducted this research, the algorithms assigning stories to trays were not personalized, and either manual curation or simple algorithms such as the most recently added stories, were used to select stories.\n\t\nThe first step of the project was to develop a personalized recommendation system using data on historical user-story interactions. 
We compared several alternative approaches, selecting an approach based on collaborative filtering \\citep{mnih2007probabilistic, rendle2010factorization} which performed best of the alternatives we considered in terms of estimated policy values (estimated using doubly robust off-line policy evaluation \\citep{gilotte2018offline, zhan2021policy}). However, off-line analysis is tailored to understanding the impact of recommending different individual stories to users on their engagement with the particular story, but it does not capture the effects of sustained exposure to a personalized recommendation system. In addition, off-line policy evaluation of recommendation systems has known limitations in terms of both bias and variance. This motivates our next step, which was to design a Randomized Controlled Trial (RCT) in order to compare the status quo system of manually curated recommendations to the personalized algorithm.\n\t\nTo evaluate the impact of personalized content recommendations on the utilization of the app, we carried out a randomized experiment. Since collaborative filtering requires substantial user history to perform well, the experiment included users who interacted with at least sixty stories before the start of the experiment. The main outcome metric is a user's total utilization of the app, defined as the sum of utilities from all user-story interactions during the experiment. Utility is a constructed metric, which assigns a value of one if a user completed a story, 0.5 if a user started the story but did not finish it, and 0.2 if the user clicked on the story to view the description but did not start it. Otherwise, the user is assigned the utility of zero. The experiment lasted for two weeks. We summarize our findings next.\n\t\n We find that users in the treatment group had a 63\\% ($\\pm$ 28\\% ) higher total utility from content interactions in the personalized tray compared to users in the control group. 
Treated users also completed 78\\% ($\\pm$ 39\\%) more stories and spent 87\\% ($\\pm$ 41\\%) more time consuming content on the personalized tray. We document significant patterns of heterogeneity in treatment effects. Users who consumed more niche content (i.e., content that is less popular overall) in the pre-experimental period had substantially higher treatment effects than users who like popular content. This is an expected result as the editorial team selects content targeted to typical tastes. Therefore, users with preferences that are different from those of the majority are likely to benefit more from personalization. Furthermore, users with long histories of content interactions also gained more from the personalization of content. This is because the performance of the collaborative filtering model improves when more information about past interactions is available. Last, we compare outcomes of users that had used the \\emph{Recommended Story} tray in the past and users that had not. We find statistically significant treatment effects in both groups. The positive treatment effect for users that were not interacting with stories in this tray in the past suggests that users explore the app enough to notice content even in trays they rarely use and adjust their consumption decisions.\n\t\nUsers who received personalized recommendations in the \\emph{Recommended Story} tray increased utilization of the app across all trays. We find a 14\\% ($\\pm$ 12\\%) increase in the total utilization of the app, a 19\\% ($\\pm$ 14\\%) increase in the number of completed stories, and a 20\\% ($\\pm$ 14\\%) growth in the time spent reading stories. Also, users in the treatment group who did not read any stories on the \\emph{Recommended Story} tray prior to the experiment exhibited a much larger (statistically significant) propensity to start reading on this tray compared to users in the control group. 
These results suggest that the increased usage of the \\emph{Recommended Story} tray is not driven entirely by substitution away from other trays in the app. On the contrary, we find that users substitute away from other non-app activities to use the app more. In summary, better content selection can increase the overall utilization of an ed-tech app, justifying investments in developing recommendation systems.\n\n \n\n \n \n \n\\paragraph{Literature review.} This paper relates to several strands of literature. Personalized recommendation systems have been studied intensively in entertainment \\citep{davidson2010youtube, gomez2015netflix, jacobson2016music, holtz2020engagement} and in retail shopping \\citep{linden2003amazon, sharma2015estimating, smith2017two,greenstein2018personal, ursu2018power}. For example, in the entertainment context and using a similar approach to our paper, \\citep{holtz2020engagement} show that personalized recommendations increase consumption of podcasts on Spotify. However, there is little empirical evidence of the usefulness of recommendation engines beyond entertainment platforms and e-commerce. This paper attempts to fill this gap by providing evidence from the ed-tech sector.\\footnote{\\cite{DBLP:reference/sp/DrachslerVSM15} provide an extensive review of literature on recommendation systems in ed-tech and point out a shortage of papers documenting the efficiency of recommendation systems using reliable evaluation methods. They conclude the review by calling for more comprehensive user studies in a controlled experimental environment.} Additionally, we show that personalized recommendations can be an effective method of boosting user engagement in settings with moderate amounts of data.\\par\n\nThe existing evidence of the efficacy of recommendation systems in education is generally based on small studies that combine the introduction of personalized recommendations with other changes to the user interface. 
\\citep{ISIS} use a recruited group of university students to study the effect of showing personalized recommendations of course materials to not showing any recommendations at all. While this study is an A/B experiment, it bundles two changes in one treatment: adding a user interface element and personalizing recommendations. Furthermore, this study is based on a relatively small sample of 250 subjects. \\citep{Ruiz-Iniesta2018} develop and test a recommendation system on an ed-tech platform called \\textit{Smile and Learn}, and evaluate it in an observational study. Their proposed treatment is a new user interface component with recommendations generated using collaborative filtering. The newly introduced system helps users navigate the app and reach desired content quicker. They find substantial increases in consumption of recommended items versus non-recommended items. However, the treatment in \\citep{Ruiz-Iniesta2018} has two elements: the part simplifying app navigation by adding a user interface component and a personalization component. Our work provides results that isolate the impact of personalization on the consumption of learning items.\\footnote{Contexts of \\citep{Ruiz-Iniesta2018} and of this paper also differ substantially. In our setting, we have thousands of stories to choose from as compared to around one hundred games. This seemingly technical difference results in problems of data sparsity, which is a serious challenge in creating recommendations for stories that are relatively new. In section \\ref{sectionoffline}, we present the methodology for designing and evaluating a recommendation system in such settings.} To the best of our knowledge, our paper is the first large-scale study in the ed-tech context that estimates the effect of personalization on user engagement in isolation from other changes in the app. 
\n\nSecond, our work contributes to the growing literature assessing the effects of personalized recommendation systems on the diversity of consumed content. To our knowledge, we are the first to do so in an ed-tech context. \\cite{anderson2020algorithmic, holtz2020engagement} provide evidence from a randomized experiment indicating that personalized recommendations reduce the diversity of content consumed on \\emph{Spotify}. In the context of retail, \\citep{AMAZONREC} show that, while recommendations reduce within-consumer diversity, their effect on aggregate diversity is ambiguous. \\citep{NEWSREC} find that recommendations reduce consumption diversity in the context of news consumption. In this paper, we show that users with niche preferences are recommended more niche content and less often interact with stories liked by the majority of users. This closely relates to the literature documenting 'filter-bubbles' due to the personalization of content on media platforms \\citep{haim2018burst, moller2018not}.\n\n\nLast, this paper relates to a rich literature on technology-assisted language learning.\\footnote{See \\cite{garrett2009computer}, \\citep{zhao2003recent}, and \\citep{tafazoli2019technology} for reviews of this literature.} Personalization in the language learning context has been shown to be effective in task assignment \\citep{xie2019personalized} and learning resource recommendations \\citep{sun2020vocabulary}. We contribute to this literature by bringing causal evidence of the impact of personalization on time spent interacting with language learning content.\n\n\n\nThe rest of the paper is organized as follows. \\Cref{empdata} details the empirical setting. \\Cref{sectionoffline} presents the methodology used to develop and test the recommendation model using offline data. \\Cref{RCT} describes the design of the randomized experiment and presents the results. 
Finally, \\cref{conclusion} concludes.\n\n\\section{Empirical setting}\\label{empdata}\n\\emph{Stones2Milestones (S2M)} was founded in 2009 in India. The company provides technology-enabled English education through a variety of programs serving a diverse set of users. The main product of \\emph{S2M} is a smartphone app called \\emph{Freadom}, aimed at 3 to 12-year-old children. Throughout 2021, the average daily number of users amounted to approximately 7,500. Users come to the app through two main channels: customer acquisition through schools, where the \\emph{S2M} sales team reaches out to schools that later recommend the app to their students (B2B), and independent users who download the app from the app store (B2C). Additionally, there is a paid version of the app which gives access to some additional non-essential features. \n\nThe main content of \\emph{Freadom} is short illustrated stories. Stories are organized in different trays based on various themes such as \\emph{Trending now} or \\emph{Recommended Story}. Figure \\ref{screenshot} presents screenshots from the app. \\Cref{fig:f1} shows the landing page that a user sees when launching the app. The landing page contains trays of stories and news, but also occasional promotions and announcements. \\Cref{fig:f2} presents the \\emph{Stories} subpage, which contains only stories. The tray displayed at the top is \\emph{Recommended Story}.\n\nEach tray is a slate of stories that a user can browse, and choose the ones to read. The selection of stories into trays follows various rule-based algorithms. For example, \\emph{Trending now} displays stories that are currently consumed by many users. \\Cref{fig:f3} presents the top part of the \\emph{Recommended Story} tray. 
Importantly, during the pre-experimental period, none of the trays of the app assigned students to content in a personalized fashion.\n\n\t\t\t\n\t\t\\begin{figure}[!ht]\n\t\t\n \n\t\t\t\\caption{Screenshots from \\emph{Freadom}.}\\label{screenshot}\n\n\t\t\t\\subfloat[Home Feed page: users open the app on this page.]{\\includegraphics[scale = 0.13]{images/freadom_home_tray.jpg}\\label{fig:f1}}\n\t\t\t\\hfill\n\t\t\t\\subfloat[Stories page: contains all story trays.]{\\includegraphics[scale = 0.13]{images/freadom_stories_tray.jpg}\\label{fig:f2}}\n\t\t\t\\hfill\n\t\t\t\\subfloat[\\emph{Recommended Stories}: one of the most popular trays.]{\\includegraphics[scale = 0.13]{images/freadom_rs_path.jpg}\\label{fig:f3}}\n\t\t\\end{figure}\n\n\n\n\\emph{Freadom} stories are curated by the \\emph{S2M} pedagogical team together with publishers specializing in educational content for kids. They are age-appropriate and created with a pedagogical goal in mind. Therefore, \\emph{S2M} operates under the premise that maximizing the consumption of content on the app helps learners achieve their educational goals. \n\n\\emph{Freadom} users can browse stories in the selected tray before deciding on which one to click. Clicking allows the user to open the story and view its description. Many users that view a description decide to go back to browsing; others start the story but do not finish it. Only a small minority of user-story interactions lead to the completion of a story.\n\n\\Cref{fig:utility_funnel} shows a content interaction funnel representing frequencies of users' content consumption decisions. We divide users into three main categories: \\emph{B2C}, \\emph{B2B}, and \\emph{paid} and show frequencies of different outcomes from interactions with stories. Thus, the unit of interest is the interaction between a user and a story. 
Users decide whether to view a story or not (second column), whether to start reading it (third column), and whether to complete it or not (final column). Users tend to explore many stories and acquire information about them through viewing or starting before deciding which stories to complete. \n\n\n\n\t\t\t\\begin{figure}[H]\n\t\t\t\\centering\n\t\t\t\\caption{User-story interaction utility funnel}\n\t\t\t\\includegraphics[scale = 0.60]{images/alluvial_new.png}\n\t\t\t\\caption*{\\footnotesize{\\textit{Note: Utility funnel broken down by the type of user (B2B, B2C, paid) and the outcome of user-story interactions. Intense colors mark the shares of user-story interactions that resulted in story completion. B2B users are shown in red, B2C users in blue, and paid users in green. The first column shows shares of user categories, the second one is the share of users that viewed the story, the third that started the story, and the fourth that completed it.}}}\n\t\t\t\\label{fig:utility_funnel}\n\t\t\\end{figure}\n\n\n\n\\section{Using offline data to develop a recommendation system}\\label{sectionoffline}\nThis section describes how we chose the recommendation system that we deployed, and why. Its objective is twofold: to document the process and decisions behind the recommendation system that we eventually implemented, and to serve as a guide for practitioners interested in building a similar system in a setting similar to ours.\n\n\\subsection{Target metric and datasets}\n\\paragraph{Our goal.} With a story catalog as large as \\emph{Freadom}'s, it is impractical for a child to manually choose which stories to consume. Just as in the context of entertainment (movie recommendations) or e-commerce (product recommendations), serving \\emph{personalized} recommendations will potentially elicit the most child engagement on the app. 
Since stories are curated with a focus on pedagogy, this potentially accelerates child learning. Before the experiment, \\emph{Freadom} served stories based on editorial recommendations by experts, which were the same across users with no personalization. Therefore, the goal of our research was to develop a personalized recommendation system and evaluate its efficacy.\n\n\\paragraph{Datasets available.} We have historical log data of children's interactions with stories. Every entry of this dataset records an interaction of a child with a story, as well as to what extent they consumed the story; specifically whether a child did not consume it at all, considered reading it by viewing the story description card, started reading it or completed reading it. We have information about a child's grade level, as well as a tag recording the collections a story belongs to; a collection is a theme such as \\textit{animal} or \\textit{sport}.\\footnote{This is analogous to the type of data in the \\textit{movielens} benchmark dataset\\citep{movielens}; however, a key difference is that our dataset does not contain a rich set of child and story characteristics.}\n\n\\paragraph{Utility.} Based on our interaction data, it is unclear what our goal is in maximizing engagement. There are numerous apparent options; such as maximizing story card view rate or start rate or completion rate. While the ultimate goal of recommending stories to users is that they complete them, viewing and starting a story are prerequisites to completing it, and these are outcomes on a continuum rather than unrelated outcomes. Therefore, we define a metric \\emph{utility}, determined together with \\emph{S2M} to reflect their organizational objectives. 
The utility is derived from user-story interactions as follows\n\\begin{itemize}\n \\item If a story was not shown or was shown to a user who did not interact with any story in that specific session, we do not assign any value: $NA$,\\footnote{User-item interactions database contains records of only users' sessions that resulted in at least one click on a story. Thus, sessions in which a user launched the app and skipped all shown stories are not recorded in the data that we have access to.}\n \\item If the story was shown to the user, but the user skipped the story and viewed another story later in the session: $0$,\n \\item If the user viewed the story page, but did not start the story: $0.3$,\n \\item If the user viewed and started, but did not complete the story: $0.5$,\n \\item If the user viewed, started, and completed the story: $1$.\n\\end{itemize}\nWe note that we distinguish between the user choosing not to engage with a story that was shown (0) and the user never having an opportunity to interact with a story because it was not shown (NA), a critical distinction for understanding user preferences not always made in previous studies.\nThe above utility assignment can be thought of as giving us a utility matrix, with a child represented by a row, and a story by a column. This is the main building block of the recommendation system.\n\t\t\n\\subsection{Recommendation System}\nWe now present how we used observational data on user-story interactions to design the recommendation system for \\emph{S2M}. Our dataset does not contain rich user and story characteristics; therefore, we chose a classic collaborative filtering model \\citep{mnih2007probabilistic, rendle2010factorization} as the basis for our personalized recommendation system. We start by describing the collaborative-filtering model.\n\n\\paragraph{Model Description}\nConsider two models for our recommendation system. We evaluate them based on out-of-sample performance in the observational data. 
In what follows, $\\epsilon_{ij}$ is an unobserved error drawn iid for each child/story pair, and $\\sigma(x) = 1/(1 + e^{-x})$, the sigmoid function.\n\nFirst, is a popularity-based model, (also called a two-way fixed effects model: TWFE).\n\\begin{equation}\\label{eq:twfe}\n U_{ij} = \\sigma(\\beta_{0} + \\Psi_{i} + \\Gamma_{j} + \\epsilon_{ij})\n\\end{equation}\n, where $\\Gamma_{j}$ and $\\Psi_{i}$ are user and story fixed effects, respectively; Note, the popularity-based model is non-personalized, stories are simply ranked by their mean popularity and users receive stories at the top of the rank. We include this model for two reasons: first, it is a useful benchmark for evaluating personalized models. Second, such a model is simpler to implement; thus, to justify the development and introduction of a more complicated personalized model, it is useful to show that simpler models do not achieve similar performance.\n\nSecond, our main candidate model is the collaborative filtering approach.\n\\begin{equation}\\label{eq:cf}\n U_{ij} = \\sigma(\\Lambda_{j} \\times \\Theta_{i} + \\beta_{0} + \\Psi_{i} + \\Gamma_{j} + \\epsilon_{ij})\n\\end{equation}\n, where $\\Lambda_{j}$ is a latent preferences vector per user, $\\Theta_{i}$ a latent vector per story. This approach follows the seminal model proposed in \\citep{rendle2010factorization}. \nThe latent vectors, $\\Lambda_{j}$ and $\\Theta_{i}$, are of length $k$ and are rows and columns of matrices $\\Lambda$ and $\\Theta$. Columns of $\\Lambda$ and rows of $\\Theta$ are k-dimensional representations of user and story latent preference characteristics, respectively. \n\nThis approach allows for simplifying the utility matrix. Instead of modeling the preferences of each user for each story, we express user preferences and story features as each having k-dimensions. 
These dimensions (or axes of variation) could be thought of as characteristics (e.g., a serious or a funny story); each story and every user is placed along these axes. A higher value in a particular dimension for a story results in a higher expected user valuation in that dimension, and a higher expected preference for that story among users that also have a high value on that dimension. In sum, the collaborative filtering model identifies a low-dimensional representation of both users and stories, so that users with preferences for a particular type of story are located close (in the sense of Euclidean distance) to one another and to their preferred story types. \n\nThe collaborative filtering model can achieve high performance if the matrices $\\Lambda$ and $\\Theta$ represent underlying preferences well. The more data we have on users' and stories' past interactions, the higher the chance of arriving at an accurate representation of the utility matrix. Crucially, this depends on how well-structured the data is; if there are clear repetitive patterns of user preferences and story types, we are more likely to capture them with this approach.\n\n\\paragraph{System Implementation Details.}\nWe build our collaborative filtering system using the PyTorch \\citep{pytorch} framework in Python. We train the model by stochastic gradient descent with the Adam optimizer. To regularize, we use an L2 penalty on the model parameters. We tune the number of latent dimensions, $k$ (the dimension of $\\Lambda_{j}$ and $\\Theta_{i}$), and our L2 penalty parameter using a randomly held out validation set.\\footnote{We split our dataset into a train, test, and validation set at random. Another approach is to split the data by time into a train dataset and a test dataset, so that we test in the period following our training data. In this setting, the train data is randomly split into a train set and a validation set. 
We also executed this approach; this leads to similar results.} Once we have our optimal learning hyperparameters, we relearn our model on the entire dataset which gives us the final model.\n\n\\paragraph{Personalized and baseline model performance on offline data.}\nTo test the accuracy of the prediction models, we compare the performance of the popularity-based model from \\Cref{eq:twfe} to the performance of the collaborative filtering from \\Cref{eq:cf}, additionally for completeness we include the performance of a model with just a constant term (mean model).\n\nWe compare the performance of these models in terms of Mean Squared Error (MSE) calculated using randomly held out historical data. See \\Cref{tab:mse-table} for results. We find that collaborative filtering outperforms the other models. \n\n\\begin{table}\n\\caption{MSE values for collaborative filtering (PYTF), two-way fixed effects (TWFE), and a simple mean model.}\n\\begin{center}\n\\resizebox{0.33\\textwidth}{!}{%\n\\begin{tabular}{@{}rrr@{}}\n\\toprule\n\\textbf{PYTF} & \\textbf{TWFE} & \\textbf{Mean Model} \\\\ \\midrule\n0.0962 & 0.1022 & 0.1309 \\\\ \\bottomrule\n\\end{tabular}%\n}\n\\label{tab:mse-table}\n\\caption*{\\footnotesize{\\textit{Note: Models are trained and evaluated on the dataset including all users and stories (no filtering based on user-item history length).}}}\n\\end{center}\n\\end{table}\n\n\n\\paragraph{Determining the target audience.}\nThe performance of the collaborative filtering model depends on the length of histories of interactions of users and stories.\\footnote{Our approach is generally not suitable for new users and new stories. The so-called cold-start problem of assigning content recommendations to users that have not yet revealed preferences from content interactions or stories whose latent style is still unknown is well-documented in recommendation systems literature, see e.g., \\cite{lam2008addressing, lika2014facing, bobadilla2012collaborative}. 
} To determine the right set of users and stories for the deployment of the recommendation model, we compared the MSEs of utility predictions from the selected utility model trained over different amounts of data and tested on a held-out test set. The training sets differ by the minimum histories of interactions of stories and users. This analysis tells us how much user and story history is necessary for the recommendation model to provide high-quality recommendations.\n\n\\Cref{tab:my-table} presents MSEs for nine specifications depending on the length of the history of stories (columns) and of users (rows). We evaluate all specifications on the same dataset with thresholds (20,20).\n\n\n\\begin{table}\n\\caption{Collaborative Filtering Model Mean Squared Error for various user and story histories.}\n\\begin{center}\n\\resizebox{0.4\\textwidth}{!}{%\n\\begin{tabular}{@{}crrrr@{}}\\toprule\n & \\multicolumn{1}{c}{} & \\multicolumn{3}{c}{\\textbf{Stories}} \\\\\n & \\textbf{} & \\textbf{20} & \\textbf{60} & \\textbf{100} \\\\ \\midrule\n\\multicolumn{1}{c|}{\\multirow{3}{*}{\\rotatebox[origin=c]{90}{\\textbf{Users}}}} & \\multicolumn{1}{r|}{\\textbf{20}} & 0.0967 & 0.0931 & 0.0932 \\\\\n\\multicolumn{1}{c|}{} & \\multicolumn{1}{r|}{\\textbf{60}} & 0.0964 & 0.0931 & 0.0931 \\\\\n\\multicolumn{1}{c|}{} & \\multicolumn{1}{r|}{\\textbf{100}} & 0.0959 & 0.0930 & 0.0931 \\\\ \\bottomrule \n\\end{tabular}%\n\n}\n\\caption*{\\footnotesize{\\textit{Note: The rows represent the minimum interactions per user, and the columns represent minimum interactions per story. We use a single trained model on the largest dataset (20, 20), and report MSEs on different test sets.}}}\n\\label{tab:my-table}\n\n\\end{center}\n\\end{table}\n\n\nBased on the results from \\Cref{tab:my-table}, we decided to restrict the deployment of personalized recommendations to users and stories with at least 60 interactions in our historical data. 
Two factors contributed to this decision; first, high-quality predictions as measured by MSE and, second, the sample size requirement for the A/B experiment.\\footnote{Approximately 15\\% of users in the entire user base and 92\\% of all stories in the app have at least 60 interactions.}\n\n\n\\paragraph{Choosing the right tray for the new recommendation system.}\n\\emph{Freadom} is built based on multiple horizontally scrollable trays. Trays vary by popularity; one important driver of the tray's popularity is its position on the page. The most popular trays are \\emph{Popular}, \\emph{Trending Now}, \\emph{Recommended Story}, and \\emph{Today For you}.\n\nWe chose to deploy the recommendation model in the tray called \\emph{Recommended Story}. This tray was popular amongst more experienced users of the app, which meant that we were able to deploy the new system to many users of this tray. \\emph{S2M} had also originally intended the tray to be for personalized recommendations hence the name - \\emph{Recommended Story}. Before deploying our recommendation model, the tray content was chosen by \\emph{editors} on a weekly basis. \n\n\n\n\n\n\n\\paragraph{Re-ranking over time.}\nOn our chosen tray \\emph{Recommended Story}, stories are presented in a slate of 15 entries. The slate design task consists of deciding how to rank the stories and how frequently to update the ranking. We wanted to keep the ranking and refreshing module similar to the baseline one, so we can focus on isolating the effects of personalization.\n\nDue to computational constraints, new utility predictions were generated once per week. Thus, every week we would rank stories in decreasing order of predicted utility and the top 15 stories would make the slate. Within the week we would remove completed stories every day. 
Completed stories were replaced by stories that appeared next in the ranking of predicted utility.\n\nWhenever the user was active in the tray for two days but did not engage with any of the top 3 stories, we would remove those stories from their tray. This decision was motivated by the limitations of our data collection process, which does not allow for observing story-skipping behavior when the user did not click on any story during the session. This prevents us from accurately determining which stories users chose to ignore. Last, the ranking did not change if the user was inactive on the tray. The ranking algorithm was run every day.\n\n\\input{ATE_section}\n\n\n\\subsection{Heterogeneous treatment effects}\\label{HTE_section}\n\nThe evidence presented so far relates to the average impact of personalization. In this section, we analyze heterogeneity in treatment effects across past usage intensity, taste for popular vs. niche content, and the usage of the \\emph{Recommended Story} tray prior to the experiment. In \\Cref{hte_appendix}, we carry out a data-driven analysis of treatment heterogeneity and find a moderate amount of treatment heterogeneity.\n\nWe expect that the personalization of content recommendations will mostly benefit heavy and niche-type users. Frequent users leave a long record of user-story interactions, which allows us to understand their tastes well. Additionally, we expect niche users to have high benefits because, in the baseline system, stories are targeted at a typical user, whereas in the personalized system, their niche tastes are taken into account.\n\n\n\n\n\\paragraph{Definitions of users' types.} To determine whether someone is a heavy user we analyze the pre-experimental app usage. For each user, we compute the total utility and the total number of completed stories prior to the start of the experiment. 
\nAdditionally, we construct indicator variables: \\emph{high utility user} and \\emph{high story completion user}, which take a value of one when a user is in the top half of the distribution of past utilization (past number of completed stories) and zero otherwise.\n\n\nNiche-type users are users that consume content that is generally not very popular. We consider a story to be a popular story if it is one of the top 25\\% of stories in terms of pre-experiment completions.\\footnote{Top 25\\% of stories correspond to 67\\% of impressions in the \\emph{Recommended Story} during the experiment.} \\Cref{fig:hist_popularity} shows the histogram of shares of popular content consumption per user prior to the experiment. There are some users whose content is largely niche. We consider a user to be a niche type if the share of niche content in her pre-experiment consumption is more than 50\\% (in red in \\Cref{fig:hist_popularity}). Note that all users were receiving the same recommendations prior to the experiment; thus, finding niche stories required searching beyond the top of the recommendation list. \n\t\n\t\\begin{figure}[!ht]\n\t\\centering\n\t\t\t\\caption{Histogram of the share of popular stories consumed by users.}\n\t\t\t\\includegraphics[height=3.5in]{images/hist_pop.png}\n\t\t\t\\caption*{\\footnotesize{\\textit{Note: A popular story is a story in the top 25\\% of stories ranked by the number of pre-experiment completions. Niche users in red.}}}\n\t\t\t\\label{fig:hist_popularity}\n\t\\end{figure}\n\t\n\\paragraph{Treatment effects per group.} We start by providing estimates of the average treatment effects per group of interest. We consider total utility in the \\emph{Recommended Story} tray as the outcome variable of interest and use a difference-in-means estimator. 
\\Cref{tab:HTE_groups} presents the results.\n\n\n\\begin{table}[!htbp] \\centering \n \\caption{Estimates of average treatment effects per group.} \n \\label{tab:HTE_groups}\n \\resizebox{0.85\\textwidth}{!}{\n\\begin{tabular}{>{}l|lrrr}\n\\toprule\ncategory & group & ATE & std. error & p. value\\\\\n\\midrule\n {\\textcolor{black}{\\textbf{Type}}} & Niche users & 0.334 & 0.089 & 0.000\\\\\n {\\textcolor{black}{\\textbf{}}} & Non-niche users & 0.044 & 0.063 & 0.487\\\\ \n\\addlinespace\n {\\textcolor{black}{\\textbf{Past utilization}}} & High utility users & 0.299 & 0.094 & 0.001\\\\\n {\\textcolor{black}{\\textbf{}}} & Low utility users & 0.092 & 0.056 & 0.103\\\\\n {\\textcolor{black}{\\textbf{}}} & High story completion users & 0.249 & 0.094 & 0.008\\\\\n {\\textcolor{black}{\\textbf{}}} & Low story completion users & 0.120 & 0.056 & 0.032\\\\\n\\addlinespace\n {\\textcolor{black}{\\textbf{Type and past utilization}}} & High utility and niche users & 0.508 & 0.132 & 0.000\\\\\n {\\textcolor{black}{\\textbf{}}} & High utility and not niche users & -0.023 & 0.122 & 0.848\\\\\n\\bottomrule\n\\end{tabular}\n}\n\\caption*{\\footnotesize{\\textit{Note: Outcome variable is total utility per user. ATE is estimated using a difference-in-means estimator. All groups are defined based on the pre-experiment app usage.}}}\n\\end{table}\nWe find that the gains from personalization are higher for niche users than for non-niche users and for heavy users than for light users. The niche dimension is of higher magnitude and statistical significance. In the last two rows of \\Cref{tab:HTE_groups}, we focus on the distinction between niche and non-niche users in the heavy utility group and find that niche users in this group have much higher treatment effects. 
This highlights that niche users form a distinct category rather than being merely heavy users who have completed all popular stories and need to explore less popular ones.\\footnote{One might argue that a user becomes niche after having seen all the popular stories. Note that there are 839 users in the high utility and niche group and 573 in the high utility and non-niche group. This indicates that niche users are indeed a distinct category of users.} In \\Cref{robust} we provide a further robustness check of this result by regressing AIPW scores on past utilization and user type.\\footnote{To estimate AIPW scores we use the \\emph{grf} package (see \\cite{athey2019generalized}). This methodology allows us to flexibly adjust for individual characteristics and estimate conditional average treatment effects. We consider users' school grade, type (B2B, B2C, paid), max streak (maximal number of consecutive days in which users completed at least one story), past utilization (the total number of completed stories prior to the experiment, and total utility prior to the experiment), and whether a user is a niche type. To determine the variables based on past consumption we consider a period of app usage between July 2020 and the start of the experiment.}\n\n\nLast, in \\Cref{fig:aipw_utilization} we show how AIPW scores change across users depending on their past utilization. Panel A shows how AIPW scores change depending on the percentile of the pre-experiment number of story completions and panel B on users' past utility. Both panels show upward trends. The differences are, however, moderate.\n\n\n\\begin{figure}%\n \\caption{AIPW scores across past utilization. AIPW scores for users with past utility higher than the percentile.}%\n \\centering\n \\subfloat[\\centering Past story completions. 
AIPW scores for users with the past number of completions higher than the percentile.]{{\\includegraphics[width=15cm]{images/aipw_completions.png} }}%\n \\qquad\n \\subfloat[\\centering Past utility. AIPW scores for users with past utility higher than the percentile. ]{{\\includegraphics[width=15cm]{images/aipw_utility.png} }}%\n \\label{fig:aipw_utilization}%\n\\end{figure}\n\n\n\\paragraph{Niche-type users see more niche content.}\n Personalized recommendations benefit niche users because they do not need to seek out their favorite niche stories away from the top of the list of recommended stories, but receive them right away. In \\Cref{tab:nicher}, we confirm this intuition by comparing the popularity of stories shown to popular and niche types in the two experimental groups.\n \n For each story, we compute the share of its impressions in total impressions in an experimental group and rank stories by it (\\emph{Rank of impressions}). Additionally, we compute each story's percentile in the distribution of impressions within the experimental group (the total number of stories per experimental group differs).\n\n \\begin{table}\n \\begin{center}\n \\caption{Type of stories shown to niche and popular-type users across treatment and control.}\\label{tab:nicher}\n \n\\begin{tabular}{>{}l|lrrrr}\n\\toprule\ngroup & variable & mean niche & mean non-niche & std. error & p. 
value\\\\\n\\midrule\n {\\textcolor{black}{\\textbf{Treatment}}} & Rank of impressions & 379.184 & 343.442 & 17.313 & 0.040\\\\\n {\\textcolor{black}{\\textbf{Control}}} & Rank of impressions & 498.730 & 489.391 & 18.357 & 0.611\\\\\n\\addlinespace\n {\\textcolor{black}{\\textbf{Treatment}}} & Percentile of impressions & 0.390 & 0.447 & 0.028 & 0.040\\\\\n {\\textcolor{black}{\\textbf{Control}}} & Percentile of impressions & 0.401 & 0.412 & 0.022 & 0.611\\\\\n\\bottomrule\n\\end{tabular}\n \n \t\t\t\\caption*{\\footnotesize{\\textit{Note: Type of stories shown to niche and popular-type users across treatment and control. The rank of impressions - stories ranked by the number of impressions during the experiment in the experimental group. Percentile refers to the percentile of the distribution of the share of impressions per story in the total impressions in the experimental group.}}}\n\t\t\t\n\t\\end{center}\n \\end{table}\n In the control group, popular and niche type users see stories of similar popularity, while in the treatment group niche users are shown more niche stories; the difference is statistically significant.\n \n\\paragraph{New and old users of \\emph{Recommended Story} tray.} Another important layer of heterogeneity is between users that have been consuming stories in \\emph{Recommended Story} tray before the experiment and those that started using this tray because of the personalized recommendations. 
Out of all experimental subjects only 14\\% interacted with at least one story from the \\emph{Recommended Story} tray in the two weeks prior to the experiment, and 47\\% have never interacted with a story in this tray.\n\n \\begin{table}\n \\begin{center}\n \\caption{ATE by past usage of \\emph{Recommended Story} tray.}\\label{tab:new_old}\n \n\\begin{tabular}{>{}l|lrrrr}\n\\toprule\ngroup & variable & ATE & ATE \\% baseline & std.error & p.value\\\\\n\\midrule\n {\\textcolor{black}{\\textbf{Past users of RS}}} & Total utility & 0.788 & 54.461 & 0.322 & 0.015\\\\\n {\\textcolor{black}{\\textbf{Past users of RS}}} & Total stories & 0.497 & 56.182 & 0.248 & 0.046\\\\\n {\\textcolor{black}{\\textbf{Past users of RS}}} & Total time reading & 3.980 & 65.417 & 1.777 & 0.026\\\\\n\\addlinespace\n {\\textcolor{black}{\\textbf{New RS users}}} & Total utility & 0.132 & 87.423 & 0.039 & 0.001\\\\\n {\\textcolor{black}{\\textbf{New RS users}}} & Total stories & 0.097 & 143.136 & 0.026 & <0.001\\\\\n {\\textcolor{black}{\\textbf{New RS users}}} & Total time reading & 0.692 & 148.707 & 0.189 & <0.001\\\\\n\\bottomrule\n\\end{tabular}\n \n \t\t\t\\caption*{\\footnotesize{\\textit{Note: Average treatment effects estimates using Difference-in-Means estimator. Subjects were grouped based on the usage of \\emph{Recommended Story} tray two weeks prior to the experiment. Three first rows show results for users that viewed at least one story in the tray; three bottom rows users that did not interact with any stories in the \\emph{Recommended Story} tray in this period. }}}\n\t\t\t\n\t\\end{center}\n \\end{table}\n\n\\Cref{tab:new_old} presents estimates of conditional average treatment effects. We consider only outcomes specific to the utilization of \\emph{Recommended Story} tray. Three top rows present results for users that have interacted with at least one story in the \\emph{Recommended Story} tray in the two weeks prior to the experiment. 
We find high and statistically significant treatment effects for this group.\n\nThe three bottom rows of \\cref{tab:new_old} present the results for users that were not actively using this tray prior to the experiment. We find that the treatment effects for such users are highly statistically significant and have high economic magnitudes. While the point estimates are small, the percentage change compared to the baseline (usage in the control group) is very high. These results suggest that the introduction of personalized recommendations attracted to the tray users who otherwise would not have used it at all. \n\n\\paragraph{Stories that drive the treatment effect.}\nIs the increase in total utility driven by a few stories liked by many users or a better assignment of many stories? To answer this question, we group stories into frequently and rarely shown ones and compare user utilities in these categories, in both the treatment and control groups.\n \n \\Cref{fig:reg_buckets} shows estimates of the conditional expectation of utility from user-story interactions for stories in different buckets of popularity. We use a linear regression where we adjust for users' grade, type, and past utilization. Buckets are constructed according to the rank of the number of story impressions in the experimental group (the total number of impressions in the treatment group is approximately equal in each bucket). Differences across experimental groups in the average utility in a bucket are (apart from personalization) due to the selection of stories into buckets and differences in users that see stories in these buckets. 
Adjusting for user features allows us to isolate the effect of the story selection.\n \n \t\\begin{figure}[H]\n \t\\caption{Estimates of the conditional expectation of utility per bucket.}\n\t\t\t\\centering\n\t\t\t\\includegraphics[height=4in]{images/summs_plot.png}\n\t\t\t\\caption*{\\footnotesize{\\textit{Note: Utility estimates adjusted for the difference in grades, user types, and past usage intensity across buckets.}}}\n\t\t\t\\label{fig:reg_buckets}\n\t\t\\end{figure}\n\t\t\n We find that utilities in the treatment group are higher in all buckets. There is a high and statistically significant difference in the two first buckets. This suggests that our model picked up stories that were liked by many users. However, there is also a substantial increase in utility from the least impressed, niche stories. This means that there is a component of personalized niche content driving higher utility in the treatment group. In sum, we see that there are two mechanisms in story selection that increase the utility in the treatment group: (i) stories that are shown to many users on average lead to higher utility in the treatment group, and (ii) personalization of niche, infrequent stories in the treatment group leads on average to higher utility from interactions with these stories. \n \n\n\n\n\\section{Conclusion}\\label{conclusion}\nIn this paper, we provide evidence from a randomized controlled trial of the efficacy of personalized recommendations in promoting user engagement on an ed-tech app. We show that children learning to read in English engage more with content when it is selected based on their preferences. We find an effect of an over 60\\% increase in the utilization of the personalized content as compared to the baseline system of content selected by editors. We also find a 15\\% boost in overall app usage. \n\nWe evaluate the effects of the treatment on different user subgroups in the experiment and find interesting patterns of heterogeneity. 
First, we find that heavy users have substantially higher treatment effects. We have more data about such users; thus, we know their preferences better and can provide them with higher-quality recommendations. Second, we find that users that ex-ante prefer niche stories are the main beneficiaries of the personalized system; we also find that the personalized recommendation system makes it easier for them to discover niche content on the platform. Third, we show that both users who had been using the personalized section of the app prior to the experiment and those who had not benefit from the personalization.\n\nWe examine whether the increased utilization comes with increased diversity, and find that while the recommendation algorithm picks up on stories that are popular, it also increases utility from the least shown, niche stories.\n\n\nThis paper contributes to the recommendation systems literature by bringing evidence from the educational sector and a setting with limited data (as compared to big-tech environments where such systems are typically deployed). We carefully discuss the recommendation system design process hoping to allow practitioners to develop and deploy similar recommendation systems in other contexts.\n\n\n\n\nThe main limitation of this paper is that we focus on students that are heavy app users (interacted with at least sixty stories) and on stories that have been already shown to many users. This is a limitation of any system based on the collaborative filtering model as the model's performance improves with the number of past user-content interactions. Furthermore, the approach is not applicable to new users and new stories. Developing and implementing recommendations for new users and new items is a valuable extension of this work.\n\nLast, the proposed approach optimizes for user engagement rather than for learning. 
The recommendation system assigns stories that the user is most likely to complete, but these might not necessarily be the stories that will maximize learning. Optimizing the story selection for learning would be a preferable approach; however, because of difficulties in accurately measuring learning outcomes and slower feedback loops, we focused on engagement.\\footnote{This relates to the literature on surrogates \\citep{surrogates_eckles}, where a surrogate metric that closely tracks the target metric is optimized instead because the target metric is infeasible to access.} Bridging the gap between optimizing for short-term outcomes vs. long-term learning, for example by using surrogates, is a promising next step on this research agenda.\n\\newpage\n\n\\bibliographystyle{apalike}\n\n", "meta": {"timestamp": "2022-08-31T02:05:17", "yymm": "2208", "arxiv_id": "2208.13940", "language": "en", "url": "https://arxiv.org/abs/2208.13940"}} {"text": "\\section{Introduction}\nAs part of the Integrable Optics Test Accelerator (IOTA) a string of octupoles (Fig. \\ref{fig:oct}) is installed in a configuration to maintain the Hamiltonian as a constant of motion. During IOTA run 2 unexpected deviations in the closed orbit while the octupoles were energized suggested misalignment in the magnets or deviations in construction generating large low-order (quadrupole and dipole) transverse multipole components. The nominal values for the octupoles are in Table~\\ref{tab:octParams} \\cite{antipov2016design}.\n\n\\begin{figure}[h]\n \\centering\n \\includegraphics[width = 0.4 \\textwidth]{octupoleTsquareCrop.jpg}\n \\caption{A single octupole from the string.}\n \\label{fig:oct}\n\\end{figure}\n\nThere are a number of conventions for presenting the multipole components, and the following format will be used in this paper, Eqs. 
(\\ref{eq:harmDef}) \\& (\\ref{eq:compDef}).\n\n\\begin{equation}\n B_y + iB_x = \\sum_{n=1}^\\infty C_n \\left(\\frac{x+iy}{R_{ref}}\\right)^{n-1}\n \\label{eq:harmDef}\n\\end{equation}\n\n\\begin{equation}\n C_n = B_n + iA_n\n \\label{eq:compDef}\n\\end{equation}\nWhere $B_n$ and $A_n$ are the normal and skew terms respectively, $R_{ref}$ is the reference radius for the measurements, and the multipole index \"$n$\" follows the European convention, i.e. $n$ = 1 corresponds to the dipole term. The longitudinal component of the field was not considered in the characterization. The magnets were removed and characterized using a hall probe to determine potential outliers and align a set of nine magnets for installation in a new configuration before IOTA run 4. The figure of merit for selecting the magnets was the magnitude of low order multipoles.\n\n\\begin{table}[h]\n \\centering\n \\caption{Nominal Octupole Parameters}\n \\begin{tabular}{@{}lc@{}}\n \\toprule\n \\textbf{Octupole Parameter} & \\textbf{Design Value}\\\\\n \\midrule\n Length & \\SI{70}{mm} \\\\\n \\midrule\n Aperture & \\SI{28}{mm} \\\\\n \\midrule\n Coil Turns per Pole & 88 \\\\\n \\midrule\n Maximum Excitation Current & \\SI{2}{A} \\\\\n \\midrule\n Maximum Octupole Gradient & \\SI{1.4}{kG/cm^3} \\\\\n \\midrule\n Effective Field Length & \\SI{75}{mm} \\\\\n \\bottomrule\n \\end{tabular}\n \\label{tab:octParams}\n\\end{table}\n\n\\section{Test Stand Measurements}\n\\subsection{Methods}\n\nThe multipole components of the magnets were determined using a hall probe mounted on a three-axis test stand based on a procedure described in reference \\cite{campmany2014determination}. The test stand was composed of three, perpendicular rails actuated by linear stepper motors with a hall probe mounted along the nominal z-axis. The magnets were mounted to a support stand with alignment features for all degrees of freedom next to the test stand, see Fig. \\ref{fig:testCartoon}. 
Before any measurements were taken, the test stand was calibrated to the support stand using a precise flat and dial indicator to ensure that the axes of motion were perpendicular to each other. \n\n\\begin{figure}[h]\n \\centering\n \\includegraphics[width = 0.35 \\textwidth]{octTestStand.pdf}\n \\caption{Cartoon of hall probe and support stand with octupole.}\n \\label{fig:testCartoon}\n\\end{figure}\n\nAll measurements were taken at an energizing current of \\SI{2}{A}, the maximum for these octupoles. The test stand measured the magnetic field at a preprogrammed set of points. In practice, this was an equidistant set of points on a circle in a number of planes along the magnet's axis. The field from the x and y hall sensors was combined into azimuthal data based on the relative angle of the points on the circle. A Fourier decomposition was then performed on the magnetic field data to find the multipole components. A coarse scan (smaller radius, fewer points) was performed first and the relevant offset was calculated using Eq. (\\ref{eq:octCenter}) assuming that the sextupole component was all due to feed-down. The probe would then be centered in the magnet based on this offset and proceed to a second, higher-fidelity scan. The high-fidelity scan was 32 points at a reference radius of \\SI{8}{mm}, the largest radius which did not risk hitting any pole tips. In total, six circular scans were performed in the magnet at three different longitudinal positions, at each end of the pole tips and at the center of the magnet so the integrated field could be calculated. The measurements were taken moving forward and backwards through the magnet and averaged at each position to account for any potential backlash in the test stand. 
\n\n\\begin{equation}\n x_o + i y_o = \\left(\\frac{1}{n-1}\\right)\\left(\\frac{C'_{n-1}}{C'_n}\\right)R_{ref}\n \\label{eq:octCenter}\n\\end{equation}\n\nOnce the magnets' multipole compositions had all been determined the best magnets could be selected. As the sextupole components had been deliberately minimized, the quadrupole and dipole components were used for selecting the best subset. Any outliers were excluded, and from the remainder ten magnets were selected (nine for installation and one spare). These magnets were then remeasured on the stand for alignment. The same basic procedure was followed, but the probe was left in the same position for each magnet. The relative magnetic centers could then be calculated by Eq. (\\ref{eq:octCenter}) and the magnets shimmed against matching alignment surfaces on the installation mount.\n\n\\subsection{Results}\nInitially, all magnet decompositions demonstrated abnormally large low-order multipole components, especially the dipole term (see Fig. \\ref{fig:azDecomp}).\n\n\\begin{figure}[h]\n \\centering\n \\includegraphics[width = 0.45 \\textwidth]{mag17decomp.pdf}\n \\caption{Initial multipole decomposition.}\n \\label{fig:azDecomp}\n\\end{figure}\n\nThis did not match empirical measurements: the abnormal dipole term was similar at all longitudinal positions in the magnet, but fields of this magnitude were not observed at the center, where higher order multipoles are negligible. The source of this error was found to be the use of the calculated azimuthal fields. The hall probe consists of three individual sensors in the probe tip, which have a significant offset with respect to one another. The initial calculation assumed these measurements were taken at the same point in the probe. To remedy this, the measurements of the individual hall sensors were decomposed separately, so each pass on the test stand effectively took X and Y measurements. The longitudinal component was not used. 
These measurements were not aligned to the magnetic center as the centering movement of the probe was based on the azimuthal calculation. In the interest of time, the measurements were centered in software using Eq. (\\ref{eq:recenter}) which applies the feed-down of all higher order multipoles to find the components in a new set of coordinates \\cite{jain1997basic}. As the sampling rate of the circular scan was 32 points the discrete Fourier transform yielded multipole components up to n=16, but the n=8 multipole is the maximum which demonstrates good sensitivity. A comparison of the centering calculation was done with both up to n=16 and n=8 and no significant deviations were observed.\n\n\\begin{equation}\n C_n = \\sum_{k=n}^{\\infty} C'_k \\left(\\frac{(k-1)!}{(n-1)!(k-n)!}\\right)\\left(\\frac{x_o +i y_o}{R_{ref}}\\right)^{k-n}\n \\label{eq:recenter}\n\\end{equation}\n\nThe new decomposition yielded much cleaner results (Figs. \\ref{fig:xDecomp} and \\ref{fig:yDecomp}), and was used for the centering measurements. \n\n\\begin{figure}[h]\n \\centering\n \\includegraphics[width = 0.45 \\textwidth]{mag13xDecomp.pdf}\n \\caption{Multipole components from X sensor.}\n \\label{fig:xDecomp}\n\\end{figure}\n\n\\begin{figure}[h]\n \\centering\n \\includegraphics[width = 0.45 \\textwidth]{mag13yDecomp.pdf}\n \\caption{Multipole components from Y sensor.}\n \\label{fig:yDecomp}\n\\end{figure}\n\n\\subsection{Alignment}\nOnce a subset of octupoles had been selected the alignment measurement was performed. During the course of these measurements the repeatability of the test stand positioning at the center was found to be on the order of \\SI{5}{\\micro m}. This was well within the alignment threshold of \\SI{400}{\\micro m}. To determine the size of the shims, the relative offset compared to the magnet with the center furthest from the alignment feature was found. 
Figure \\ref{fig:xOctCenters} shows the offsets in the x direction for the selected magnets, here Magnet 12 is the reference.\n\n\\begin{figure}[h]\n \\centering\n \\includegraphics[width = 0.48 \\textwidth]{octXCenterOffsets.png}\n \\caption{Relative offset of octupole magnets.}\n \\label{fig:xOctCenters}\n\\end{figure}\n\nShims matching these offsets could then be inserted to align the relative centers of the octupoles (see Fig. \\ref{fig:octAlign}).\n\n\\begin{figure}\n \\centering\n \\includegraphics{octAlign.pdf}\n \\caption{Cartoon of relative alignment procedure.}\n \\label{fig:octAlign}\n\\end{figure}\n\n\\section{Summary and Future Work}\nThe octupoles for the IOTA quasi integrable lattice element were characterized using a hall probe mounted to a test stand. An error related to the mismatched position of the hall sensors in the probe was identified and remedied using an alternative decomposition of the field. Once the satisfactory magnets had been selected, their relative centers were measured and aligned for installation in IOTA prior to run 4. In the course of run 4, the magnet alignment will be confirmed using beam based measurements.\n\n\n", "meta": {"timestamp": "2022-08-31T02:03:30", "yymm": "2208", "arxiv_id": "2208.13883", "language": "en", "url": "https://arxiv.org/abs/2208.13883"}} {"text": "\\section{Introduction}\nGiven an infinite graph $G$ and a probability measure $\\nu$ supported on $[0,\\infty)$, we can create a random metric $T$\non the vertex set $V$ of $G$ in a natural way; the ``length'' or ``weight'' of each edge is an independent random variable sampled\naccording to $\\nu$, and the distance between two vertices is simply the total ``length'' of the ``shortest'' edge path between them.\nThis model is called first passage percolation and was introduced by Hammersley and Welsh \\cite{HW} as a\nmodel for the spread of fluid through a porous medium. 
These distances are thought of as the time it takes for fluid\nto flow from one point to another, and $T(x,y)$ is often called the ``passage time''. For a survey on this model,\nthe reader is directed to \\cite{Aspects, ADH}.\n\nOne is interested in the large scale geometry of the random metric $T$; \nfor instance, if $G=(V,E)$ is a Cayley graph of a virtually nilpotent\ngroup, then under mild assumptions the sequence of random metric spaces $(V, \\frac{1}{t} T)$ almost surely converges\nto a deterministic limit space $(G_{\\infty}, d_{\\infty})$ (\\cite{BenjaminiTessera},\n\\cite{CD, Aspects} for the standard Cayley graph of $\\mathbb{Z}^d$) as $t \\to \\infty$.\nIn the case that the group is $\\mathbb{Z}^d$,\nthe limit space is $\\mathbb{R}^d$ with some norm. \nThe norm is determined by the ``time constants''\n\\[\n \\mu_v := \\lim_{n \\to \\infty} \\frac{ T(0,nv) }{ n } = \\lim_{n \\to \\infty} \\frac{ \\mathbb{E} T(0,nv) }{n},\n\\]\nwhere $v$ ranges over unit vectors in $\\mathbb{R}^d$.\nDue to the second equality above, we see that the scaling limit only depends on $T$ through $\\mathbb{E} T$,\nand in fact throughout this paper we focus on $\\mathbb{E} T$ rather than $T$.\n\nThe metric $\\mathbb{E} T$ does depend on the weight distribution $\\nu$, and understanding this dependence is the subject of this paper.\nVan den Berg and Kesten \\cite{vdBK} showed that if a probability measure $\\tilde{\\nu}$ is ``strictly more variable'' than\na probability measure $\\nu$, both have finite mean, and $\\nu$ is subcritical (in a sense to be described later), then for $G$ the standard Cayley graph of $\\mathbb{Z}^d$, $d \\ge 2$,\nwe have a strict inequality of time constants $\\tilde{\\mu}_v < \\mu_v$ for all $v \\ne 0$.\nIn fact, their proof shows that\n\\[\n \\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y) }{d(x,y)} > 0,\n\\]\nwhere $d(x,y) := \\sum_{i=1}^d |x_i - y_i|$ is the graph distance between the vertices $x,y \\in 
\\mathbb{Z}^d$ in the standard Cayley graph of $\\mathbb{Z}^d$.\nNote that this inequality makes sense for any graph; one does not require the existence of any scaling limits or\neven ``time constants.'' We will often abbreviate the above inequality as $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$.\nThis naturally raises the question: for which other graphs $G$ does the same conclusion hold?\nLet us say that such graphs have the \\emph{van den Berg-Kesten (vdBK) property}. \n(For precise definitions of ``more variable'', ``subcritical'', and the vdBK property, see Section \\ref{sec:def}). \nA main result of this paper is a generalization of van den Berg-Kesten's result:\n\n\\begin{thm} \\label{thm:nilpotentvdBK}\n Let $G$ be any Cayley graph of a finitely generated virtually nilpotent group.\n If $G$ is not isomorphic as a graph to the standard Cayley graph of $\\mathbb{Z}$, then $G$ has the vdBK property.\n\\end{thm}\n\nOne might wonder if \\emph{all} graphs have the vdBK property. This is not true; the easiest counterexample is when\nthe graph $G$ is a tree. In this case, since there is only one self-avoiding path between any two points, we have\nthat $\\frac{\\mathbb{E} T(x,y) }{ d(x,y) }$ is a \\emph{constant} equal to the mean of $\\nu$.\nIt is easy to produce two different probability measures $\\nu, \\tilde{\\nu}$ with the same mean such that $\\tilde{\\nu}$\nis more variable than $\\nu$ (see the proof of Theorem \\ref{thm:notvdBK}).\nThis is, of course, why the standard Cayley graph of $\\mathbb{Z}$ had to be excluded from the above theorem.\n\nHowever, trees are not the only counterexample. 
Consider the Cayley graph of the free group $F(a,b)$ on the\ntwo letters $a,b$ which is associated to the [redundant] generating set $\\{a, b, ab\\}$ (see Figure \\ref{fig:F2Cayley}).\nIt is not hard to see that, although there is more than one self-avoiding path between any two points, each self-avoiding\npath between two points must pass through every vertex of the edge-geodesic path between those two points,\nand that at each step, one only has the choice to travel along the edge lying in the geodesic, or to take a particular path of length\ntwo. Hence, in this case $\\frac{ \\mathbb{E} T(x,y) }{ d(x,y) }$ is a constant given by $\\mathbb{E} \\min( w_1, w_2 + w_3 )$, where $w_1,w_2,w_3$\nare independent variables with distribution $\\nu$. One can again produce two distinct distributions, one more variable than\nthe other, such that their ``time constants'' are equal, contradicting the vdBK property.\n\n\\begin{figure}[t]\n \\centering\n \\includegraphics[scale=.45]{F2_redundant_Cayley_more_with_path}\n \\caption{The Cayley graph of the free group $F(a,b)$ with respect to the generating set\n $\\{a,b,ab\\}$. In green is the unique edge-geodesic path from 1 to $ab^{-1}ab$. Every self-avoiding \n path from 1 to $ab^{-1}ab$ in this graph must visit all the vertices of the green\n path and can only use green or black edges.}\n \\label{fig:F2Cayley}\n\\end{figure}\n\nIt turns out that the crucial property for determining whether a graph is vdBK is a property of the graph which we\ncall ``admitting detours'' (defined in Section \\ref{sec:detours}). If a bounded degree graph does not admit detours, then it is not vdBK\n(Theorem \\ref{thm:notvdBK} below). On the other hand, our main theorems prove that given one of two quite different ``large scale''\nassumptions on the geometry of $G$, admitting detours \\emph{implies} the vdBK property:\n\n\\begin{restatable}{thm}{polygrowthvdBK} \\label{thm:polygrowthvdBK}\n Let $G$ be a graph of strict polynomial growth. 
Then $G$ is vdBK if and only if $G$ admits detours.\n \n \n\\end{restatable}\n\n\\begin{restatable}{thm}{qitree} \\label{thm:qitree}\n Let $G$ be a bounded degree graph which is quasi-isometric to a tree.\n Then $G$ is vdBK if and only if $G$ admits detours. In fact, if $G$ admits detours, then whenever $\\tilde{\\nu}$ is strictly more variable than $\\nu$, we have\n $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$.\n \n \n \n\\end{restatable}\n\nThe first theorem is used to prove Theorem \\ref{thm:nilpotentvdBK}: the latter follows from the former once we prove\nthat all Cayley graphs of virtually nilpotent groups which are not isomorphic to the standard Cayley graph of $\\mathbb{Z}$ admit\ndetours, which is proven in Appendix \\ref{app:grouptheory}.\nThe second theorem can be combined with results proved in Appendix \\ref{app:grouptheory} to show the corollary:\n\\begin{thm} \\label{thm:virtfree}\n Let $G$ be a Cayley graph for a group $\\Gamma$ which is virtually free, \n not isomorphic to $\\mathbb{Z}$ or $\\mathbb{Z}/2 * \\mathbb{Z}/2$, and which either contains\n a finite index subgroup with nontrivial center or contains a nontrivial finite normal \n subgroup. Then $G$ has the van den Berg-Kesten property.\n For example, if $\\Gamma$ contains $F_k \\times F$ as a finite-index subgroup, where $F_k$ is the free group on $k \\ge 1$ letters\n and $F$ is any nontrivial finite group, or if $\\Gamma$ is isomorphic to a semidirect product\n $F \\rtimes F_k$, then any Cayley graph of $\\Gamma$ is vdBK.\n\\end{thm}\n\nFinally, as noted in \\cite{vdBK}, the question of strict inequalities of time constants is related to\n``absolute continuity with respect to the expected empirical measure.'' What precisely we mean by this is explained in Section \\ref{sec:abscont};\nnote that this condition does not imply \\emph{existence} of a limiting expected empirical measure. 
In any case, the methods of our\npaper easily prove absolute continuity of the weight distribution with respect to the expected empirical measure under the same ``large-scale'' assumptions (see\nSection \\ref{sec:def} for the definition of exponential-subcriticality):\n\n\n\\begin{restatable}{thm}{qitreeabscont} \\label{thm:qitreeabscont}\n Let $G$ be a bounded degree graph which is quasi-isometric to a tree. Then for any probability measure $\\nu$\n on $[0, \\infty)$ with finite mean, $\\nu$ is absolutely continuous with respect to\n the expected empirical measure of the associated first passage percolation $T$.\n Moreover, if $\\nu$ strictly stochastically dominates a measure $\\tilde{\\nu}$ with finite mean, then $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$.\n\\end{restatable}\n\n\\begin{restatable}{thm}{polygrowthabscont} \\label{thm:polygrowthabscont}\n Let $G$ be a graph of strict polynomial growth. Suppose that $\\nu$ has finite mean \n and is exponential-subcritical.\n \n Then $\\nu$ is absolutely continuous with respect to\n the expected empirical measure of the associated first passage percolation $T$.\n Moreover, if $\\nu$ strictly stochastically dominates a measure $\\tilde{\\nu}$ with finite mean, then $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$.\n\\end{restatable}\n\n\n\n\n\nThe layout of the paper is as follows: in Section \\ref{sec:def} we establish definitions and notations, particularly the definition of the vdBK property.\nIn Section \\ref{sec:general} we collect various lemmata, most of which are essentially proven in \\cite{vdBK}, which\nwe will need to prove our main theorems. The key conclusion of this section is that, in order to prove that $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$, it suffices to show \nthat the expected number of times the $T$-geodesic between two points $x$ and $y$ passes through a certain type of configuration \ncalled a ``feasible pair''\nis \\emph{linear} in $d(x,y)$. 
In Section \\ref{sec:detours} we introduce the concept of ``admitting detours,'' show that this\nis a necessary condition for a graph to be vdBK, and then give examples of graphs which admit detours.\nIn Section \\ref{sec:qitrees} we prove Theorem \\ref{thm:qitree}, which will follow almost immediately from the results of Section \\ref{sec:general} \ncombined with a characterization of graphs quasi-isometric to trees.\nBecause paths are so constrained in this setting, it is not hard to produce local events which imply that the $T$-geodesic from $x$ to $y$\npasses through a feasible pair, and this makes the proof quite simple.\n\nIn Section \\ref{sec:polygrowth} we prove Theorem \\ref{thm:polygrowthvdBK}, which is much more involved.\nThe three key components are a Peierls-type lemma, a resampling argument, and a ``geometric construction'' (a \nconstruction of a set of weights suitable for use in the resampling argument).\nAlthough this general strategy is the same as in \\cite{vdBK}, the methods given here apply to general graphs\nof strict polynomial growth which are not necessarily almost-transitive.\nThe geometric constructions in particular are rather different from those of \\cite{vdBK} and are quite involved,\nsince we are given the task of manipulating the geodesic while remaining largely agnostic to the fine geometry of the graph.\nIndeed, these geometric constructions are the most involved part of the proof of Theorem \\ref{thm:polygrowthvdBK}.\nAt the end of this section we give some examples of graphs which are not almost-transitive to which our results apply.\n\nLastly, in Section \\ref{sec:abscont} we prove absolute continuity with respect to the expected empirical measure for graphs of strict polynomial growth\nand for graphs quasi-isometric to trees, which implies a strict monotonicity theorem\nwith respect to stochastic domination, regardless of whether the graph in question\nadmits detours. 
The proofs are just easier versions of the proofs of the main theorems of the paper.\nAppendix \\ref{app:grouptheory} gives proofs of the statements in Section \\ref{sec:detours} regarding which Cayley graphs admit detours.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\\section{Passage times and the van den Berg-Kesten property} \\label{sec:def}\n\nBy a \\emph{graph} we mean a pair $G = (V,E)$ of sets and an ``endpoint'' or ``boundary'' map from $E$ \nto the set of subsets of $V$ of size 2.\nIn particular, we allow more than one edge between each pair of vertices but we do not allow self-loops. (Disallowing self-loops\nis simply a matter of convenience; virtually all questions considered here are easily seen to be equivalent for a graph $G$ with \nself-loops and the graph $G'$ obtained from $G$ by deleting all self-loops).\nThroughout, the ``ambient graph'' $G$ is tacitly assumed to be connected, locally finite (i.e. each vertex has finite degree)\nand infinite (that is, $V$ is countably infinite); we will often however consider subgraphs of $G$ which are finite and/or disconnected.\n\nA path $\\pi$ in $G$ is an alternating sequence of vertices and edges (starting and ending with a vertex) such that\nthe vertices immediately preceding and following an edge comprise the edge's boundary.\nIf $\\pi$ starts at $x \\in V$ and ends at $y \\in V$, we often write $\\pi: x \\to y$.\nWe will typically abuse notation and use the same symbol $\\pi$ to refer to the set of edges appearing in the path $\\pi$ (so $\\pi \\subset E$).\n$|S|$ denotes the cardinality of the set $S$, so in particular, if $\\pi$ is a path, $|\\pi|$ is the number of edges appearing in the path\n(again abusing notation and considering $\\pi$ as a subset of $E$). If $\\pi$ does not contain any repeated edges,\nthen this agrees with the usual notion of length of a path. 
In fact, we will mostly be concerned with paths\nwhich do not have any repeated vertices; we call such paths \\emph{self-avoiding} (or \\emph{vertex-self-avoiding}).\n\n\nA graph $G$ gives a natural metric on $V$ by\n\\[\n d(v,w) := \\inf \\{ |\\gamma| : \\gamma: v \\to w \\} = \\inf \\{ |\\gamma| : \\gamma: v \\to w \\mbox{ self-avoiding} \\}.\n\\]\nWe write $B(x,R)$ for the ball $\\{ y \\in V : d(x,y) \\le R \\}$ in this metric and write $S(x,R)$ for the sphere $\\{ y \\in V : d(x,y) = R\\}$.\n\nMore generally, given a function $w:E \\to [0,\\infty)$ we can define\n\\[\n T(\\pi) := \\sum_{e \\in \\pi} w(e).\n\\]\n(For general $\\pi$, the sum should be over the \\emph{sequence} of edges given by $\\pi$; if $\\pi$ contains no repeated edges,\nthis may again be considered as an edge \\emph{set}). We then get a pseudo-metric on $V$ given by\n\\[\n T(v,w) := \\inf \\{ T(\\gamma) : \\gamma: v \\to w \\} = \\inf \\{ T(\\gamma) : \\gamma: v \\to w \\mbox{ self-avoiding} \\}.\n\\]\nWe call $w(e)$ the \\emph{weight} of the edge $e$.\nWe often call $T(\\pi)$ the \\emph{passage time} of the path $\\pi$ and we call $T(v,w)$ the \\emph{passage time} from $v$ to $w$.\nIf $\\nu$ is a probability measure on $[0,\\infty)$ then we get a \\emph{random} function $w:E \\to [0,\\infty)$ \nby taking the $\\{ w(e) \\}_{e \\in E}$ to be an independent family of $\\nu$-distributed random variables.\nThis gives a \\emph{random} pseudometric $T$ on $V$. This model is called \\emph{[independent] first passage percolation}.\nThroughout the paper, $w$ and $T$ will represent the random weights and pseudo-metric given by a probability measure $\\nu$.\nSimilarly, $\\tilde{w}$ and $\\tilde{T}$ will represent the random weights and pseudo-metric given by a probability measure $\\tilde{\\nu}$,\nand similarly for any other diacritics.\n\n\n\nLet $\\nu$ and $\\tilde{\\nu}$ be two probability measures on $[0,\\infty)$. 
We say that $\\tilde{\\nu}$ is \\emph{more variable} than $\\nu$\nif for every concave nondecreasing function $f: \\mathbb{R} \\to \\mathbb{R}$ we have\n\\[\n \\int f d\\tilde{\\nu} \\le \\int f d\\nu\n\\]\nas long as both integrals converge absolutely. We say that $\\tilde{\\nu}$ is \\emph{strictly more variable} than $\\nu$\nif $\\tilde{\\nu}$ is more variable than $\\nu$ and $\\tilde{\\nu} \\ne \\nu$.\n\n\nWe now define some percolation thresholds associated to a graph $G$. For $p \\in [0,1]$, denote by $G_p$ the random subgraph\nof $G$ given by including each edge $e \\in E(G)$ in $G_p$ independently with probability $p$, excluding with probability $1-p$.\nWe define the \\emph{exponential percolation threshold} for $G$ to be\n\\[\n \\underline{p_c} := \n \\sup \n \\left\\{ p \\in [0,1] : \\limsup_{R \\to \\infty} \\sup_{o \\in V} \\frac{1}{R} \\log \\mathbb{P}(G_p \\mbox{ contains an edge path from } o \\mbox{ to } B_G(o,R)^c) < 0 \\right\\}\n\\]\nand we define the \\emph{exponential geodesic percolation threshold} for $G$ to be\n\\[\n \\vec{\\underline{p_c}} :=\n \\sup \\left\\{ p \\in [0,1] : \\limsup_{R \\to \\infty} \\sup_{o \\in V} \\frac{1}{R} \\log \n \\mathbb{P} \\left( \\begin{array}{c} G_p \\mbox{ contains an edge path from } o \\mbox{ to } \\\\\n B_G(o,R)^c \\mbox{ which is edge-geodesic in } G\n \\end{array} \\right) < 0 \\right\\}.\n\\]\nBelow the exponential percolation thresholds, we have uniform exponential upper bounds on connection events.\nWe call a measure $\\nu$ \\emph{exponential-subcritical} if either $\\inf := \\inf \\mathrm{supp} \\hspace{2pt} \\nu = 0$ and $\\nu(\\{0\\}) < \\underline{p_c}$\nor $\\inf > 0$ and $\\nu(\\{\\inf\\}) < \\vec{\\underline{p_c}}$.\n\n\\begin{defn}\n We say that an infinite graph $G$ \\emph{has the van den Berg-Kesten (vdBK) property}\n if for every $\\nu, \\tilde{\\nu}$ with finite mean such that\n $\\nu$ is exponential-subcritical and $\\tilde{\\nu}$ is strictly more variable than $\\nu$, we have\n 
\\begin{equation} \\label{eq:strictineq}\n \\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y) }{d(x,y)} > 0.\n \\end{equation}\n\\end{defn}\nWe will often abbreviate the ``asymptotic strict inequality'' \\eqref{eq:strictineq} as $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$.\nThe main theorems of this paper give sufficient or necessary conditions for a graph to be vdBK.\n\n\\subsection{Remarks on the condition of exponential subcriticality}\n Here are some remarks which, while not necessary to the proofs below, are\n worth noting, on the condition of exponential subcriticality, its relationship to various other percolation thresholds,\n and the extent to which it is a necessary assumption to get a strict monotonicity result.\n \n First, it is clear from the definitions that for any graph, $\\vec{\\underline{p_c}} \\ge \\underline{p_c}$, and a simple union bound (counting self-avoiding paths\n from a fixed vertex) shows that if $G$ has degree at most $D$, then $\\underline{p_c} \\ge 1/D > 0$.\n It is also clear that for any connected graph, $\\underline{p_c} \\le p_c$, where $p_c$ is the percolation threshold as usually defined:\n \\[\n p_c := \\inf \\{ p \\in [0,1] : \\mathbb{P}( G_p \\mbox{ contains an infinite edge path from } o ) > 0 \\}.\n \\]\n For almost-transitive graphs, the sharpness of the percolation threshold \\cite{DuminilCopinTassion} shown by Duminil-Copin and\n Tassion implies that $\\underline{p_c} = p_c$.\n (The proof of sharpness in \\cite{DuminilCopinTassion} is stated for transitive graphs, but is not hard to generalize to almost-transitive graphs;\n here by \\emph{almost-transitive graph} we mean a graph $G$ such that the action of $\\mathrm{Aut}(G)$ on $V$ has finitely many orbits.)\n \n \n Furthermore, on amenable almost-transitive graphs (in particular graphs of polynomial growth),\n the original argument of Burton-Keane (\\cite{BurtonKeane},\n see also \\cite{HaggstromJonasson} for an explicitly general 
proof) shows that $p_c = p_u$, where $p_u$ is the \\emph{uniqueness threshold}\n \\[\n p_u := \\inf \\{ p \\in [0,1] : \\mathbb{P}( G_p \\mbox{ contains a unique infinite connected component } ) = 1 \\}.\n \\]\n If $\\nu(\\{0\\}) \\ge p_u$, then one expects that $\\lim_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) }{ d(x,y) } = 0$ (although this has \n only been proven in certain cases, see e.g. Theorem 6.1 of \\cite{Aspects}). In that case, it is impossible that $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$,\n so for polynomial growth almost-transitive graphs, the assumption on the atom at 0 is really as weak as one could hope for.\n \n In the case of the standard Cayley graph of $\\mathbb{Z}^d$, $\\vec{\\underline{p_c}}$ is the classical oriented percolation threshold $\\vec{p_c}$; this is because\n of the nature of edge-geodesics in this graph, combined with the sharpness results of Aizenman and Barsky \\cite{AizenmanBarsky}.\n In fact, if $G$ is the standard Cayley graph of $\\mathbb{Z}^d$, the condition here of being exponential-subcritical is precisely the condition of being\n ``useful'' in \\cite{vdBK}.\n Furthermore, in this case, if $\\inf \\mathrm{supp} \\hspace{2pt} \\nu =: a > 0$ and $\\nu(\\{a\\}) \\ge \\vec{p_c}$, then $\\lim_{n \\to \\infty} \\frac{ \\mathbb{E} T(0, (n,\\dots,n)) }{ dn } = a$ \\cite{DurrettLiggett, Marchand}.\n So if $\\nu \\ne \\delta_a$, we can\n take $\\tilde{\\nu} = \\delta_a$ to get $\\tilde{\\nu}$ strictly more variable than $\\nu$ but $\\mathbb{E} \\tilde{T} \\nll \\mathbb{E} T$.\n \n Thus, the assumption $\\nu(\\{a\\}) < \\vec{\\underline{p_c}}$ in the definition of the vdBK property is also necessary at least in this setting.\\footnote{\n Of course, this is because $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$ is equivalent to a strict inequality of time constants \\emph{in all directions simultaneously};\n if one instead only cares about strict inequality of a time constant in a fixed direction, the assumption $\\nu(\\{a\\}) < 
\\vec{\\underline{p_c}}$ may not be necessary;\n for instance Marchand \\cite{Marchand} proved that for $G$ the standard Cayley graph of $\\mathbb{Z}^2$, we get strict inequality in the $e_1$ direction\n without that assumption (as long as still $\\nu(\\{0\\}) < p_c$).}\n \n It is reasonable to conjecture that similar behavior happens more generally, i.e. that for, say, almost-transitive polynomial growth graphs\n one has $\\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) }{ d(x,y) } = \\inf =: \\inf \\mathrm{supp} \\hspace{2pt} \\nu$ whenever $\\inf > 0$ and $\\nu(\\{\\inf\\}) \\ge \\vec{\\underline{p_c}}$,\n but to show this would require defining an appropriate analogue of the oriented percolation threshold (which will likely not in general be associated\n to a literal oriented percolation model) and showing some sort of sharpness, and this is not explored here.\n \n On the other hand, for graphs quasi-isometric to a tree, we will see in Theorem \\ref{thm:qitree} that exponential subcriticality as defined here is not necessary at all.\n In fact, in this setting, if $G$ admits detours (see Section \\ref{sec:detours}), then\n $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$\n \n whenever $\\tilde{\\nu}$ is strictly more variable\n than $\\nu$, with no further assumptions needed on either measure.\n This is consistent with the perspective that generally the \\emph{uniqueness} threshold, rather than $p_c$, is the correct threshold to consider for the atom at 0,\n since almost-transitive graphs quasi-isometric to trees can have $p_c <1$ but always have $p_u = 1$ (since they have more than one end, see\n page 86 of \\cite{HaggstromPeresSchonmann}). 
The proper ``uniqueness'' analogue of $\\vec{\\underline{p_c}}$ outside of the amenable\n case is unclear.\n \n Finally, percolation on graphs which are not almost-transitive is poorly understood, and so it is entirely unclear how close exponential\n subcriticality is to the ``right'' condition on $\\nu$ to consider in this general setting. However, if $G$ has degree at most $D$ then\n the inequalities $\\vec{\\underline{p_c}} \\ge \\underline{p_c} \\ge 1/D > 0$ tell us that our main theorems are never vacuous for bounded degree graphs;\n in particular, we get sufficient conditions to conclude strict monotonicity, even if the parameters $\\vec{\\underline{p_c}}$ and $\\underline{p_c}$ are quite mysterious.\n \n \n \n\n \n\n\n\n\\section{Reduction to a lower bound on expected number of traversed ``feasible pairs''}\n\\label{sec:general}\nIn this section, we reduce the task of deducing a strict inequality $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$ to the task of showing that the $T$-geodesic traverses linearly many ``feasible pairs''\nin expectation. Most of the argument from this section can be transferred directly from \\cite{vdBK}, with the main difference being that we define a weaker notion\nof ``feasible pair.'' Proofs are given where the necessary modifications from \\cite{vdBK} are not obvious.\nNote that the arguments of this section allow us to stop considering $\\tilde{w}$ or $\\tilde{T}$ and simply focus on understanding the $T$-geodesic.\n\nFirst, note that we have the following theorem of van den Berg and Kesten:\n\\begin{thm}[\\cite{vdBK}, Theorem 2.9a]\n Let $\\nu$ and $\\tilde{\\nu}$ be probability measures on $[0,\\infty)$ with finite mean such that $\\tilde{\\nu}$ is more variable than $\\nu$. 
Then for all $x,y \\in V$\n \\[\n \\mathbb{E} \\tilde{T}(x,y) \\le \\mathbb{E} T(x,y).\n \\]\n\\end{thm}\n\\noindent\nAlthough the proof in \\cite{vdBK} is stated only for $G = \\mathbb{Z}^d$, it easily extends to all locally finite graphs.\n\nWe also have\n\\begin{thm}[\\cite{Strassen, Whitt}]\n Let $\\nu$ and $\\tilde{\\nu}$ be probability measures on $[0,\\infty)$ with finite mean such that $\\tilde{\\nu}$ is strictly more variable than $\\nu$.\n Then there exists a coupling $(w(e), \\tilde{w}(e))$ such that $w(e)$ is $\\nu$-distributed, $\\tilde{w}(e)$ is $\\tilde{\\nu}$-distributed,\n and \n \\[\n \\mathbb{E}[ \\tilde{w}(e) | w(e) ] \\le w(e)\n \\]\n almost surely.\n\\end{thm}\n\nAnother lemma from \\cite{vdBK} which we will need is the following:\n\\begin{lemma}[\\cite{vdBK}, Lemma 4.5] \\label{lem:wlog}\n We may assume without loss of generality that in our coupling \n \\begin{equation} \\label{eq:extraassumption}\n \\mathbb{P}( \\tilde{w}(e) > w(e) ) > 0.\n \\end{equation}\n Explicitly, either this holds, or there exists some $\\bar{w}(e)$ such that $\\mathbb{E} \\tilde{T}(x,y) \\le \\mathbb{E} \\bar{T}(x,y)$ for all $x,y \\in V$,\n the distribution of $\\bar{w}(e)$ is strictly more variable than $\\nu$, and such that \\eqref{eq:extraassumption}\n holds with $\\tilde{w}$ replaced by $\\bar{w}$, i.e. $\\mathbb{P}( \\bar{w}(e) > w(e) ) > 0$.\n\\end{lemma}\n\nA key technical lemma we will use is the following:\n\\begin{lemma}[\\cite{vdBK}, Lemma 4.8] \\label{lem:technicallemma}\n Let $\\nu$, $\\tilde{\\nu}$ be probability measures with finite mean such that $\\tilde{\\nu}$ is more variable than $\\nu$ and such that \n \\eqref{eq:extraassumption} holds. 
Then there exist $\\epsilon > 0$, $a>0$, $b>0$, $g > 0$, and\n a bounded Borel set $I_0 \\subset [0, \\infty)$ and $y_0 \\in I_0$ with the following properties:\n \\begin{itemize}\n \\item For all $\\delta > 0$, $\\nu(I_0 \\cap (y_0 - \\delta, y_0 + \\delta)) > 0$.\n \\item For all $y \\in I_0$,\n \\[\n \\mathbb{P}( \\tilde{w}(e) > y + a | w(e) = y ) \\ge b.\n \\]\n \\item For any $k \\ge 1$, $\\delta > 0$, and \n any $y_1,...,y_k, y'_1,...,y'_{\\lfloor (1 + \\epsilon) k \\rfloor} \\in I_0 \\cap (y_0 - \\delta, y_0 + \\delta)$, we have\n \\[\n \\sum_{i=1}^{k} (y_i + a) - \\sum_{i=1}^{\\lfloor (1 + \\epsilon) k \\rfloor} y'_i > kg \\ge g.\n \\]\n \\end{itemize}\n\\end{lemma}\n\\begin{proof}\n The proof is essentially the same as that given in \\cite{vdBK}, but simpler since we do not actually\n need as many conditions. By \\eqref{eq:extraassumption}, \n for some sufficiently small $a,b > 0$ there is\n some Borel set $B \\subset [0, \\infty)$ such that $\\nu(B) > 0$ and for all $y \\in B$,\n \\[\n \\mathbb{P}(\\tilde{w}(e) > y + a | w(e) = y) \\ge b.\n \\]\n Let $y_0$ be a point of support for $B$, that is, a point such that $\\nu(B \\cap (y_0 - \\delta, y_0 + \\delta)) > 0$\n for all $\\delta > 0$. Choose $\\epsilon > 0$ sufficiently small such that $\\epsilon y_0 < a$. 
Then choose $\\delta_0 > 0$\n sufficiently small that\n \\[\n \\epsilon y_0 + 2 \\delta_0 + \\epsilon \\delta_0 < a,\n \\]\n and choose\n \\[\n 0 < g < a - (\\epsilon y_0 + 2 \\delta_0 + \\epsilon \\delta_0).\n \\]\n Then we can take $I_0 := B \\cap (y_0 - \\delta_0, y_0 + \\delta_0)$.\n The first two conditions clearly hold by construction; let us show the last condition:\n \\begin{align*}\n \\sum_{i=1}^{k} (y_i + a) - \\sum_{i=1}^{\\lfloor (1 + \\epsilon) k \\rfloor} y'_i &\\ge k(y_0 - \\delta_0 + a) - k(1 + \\epsilon)(y_0 + \\delta_0) \\\\\n &= k( a - 2\\delta_0 - \\epsilon y_0 - \\epsilon \\delta_0 ) > kg \\ge g.\n \\end{align*}\n\\end{proof}\n\n\\begin{defn}\n Let $\\pi, \\pi'$ be a pair of paths with the same starting and ending point such that $\\pi \\ne \\pi'$ (as edge sets). \n We say that $\\pi'$ is an \\emph{$\\epsilon$-detour for $\\pi$} if\n \\[\n |\\pi' \\setminus \\pi| \\le (1 + \\epsilon) |\\pi \\setminus \\pi'|.\n \\]\n Here, $\\pi'$ and $\\pi$ are identified with the sets of edges they contain, $\\setminus$ denotes set difference, and $| \\cdot |$ is the cardinality of a set.\n\\end{defn}\nNote that the condition that $\\pi' \\ne \\pi$ implies that $|\\pi \\setminus \\pi'| \\ge 1$. For otherwise we would have\nthat $|\\pi' \\setminus \\pi| \\le (1 + \\epsilon) |\\pi \\setminus \\pi'| = (1+ \\epsilon) \\cdot 0 = 0$, that is,\n$|\\pi' \\setminus \\pi| = |\\pi \\setminus \\pi'| = 0$ and hence $\\pi' = \\pi$.\nIntuitively, one should think of $\\pi'$ as an ``alternate path'' which misses a fair number of edges of the original path\nbut is not much longer than the original path. 
A simple but useful observation is:\n\\begin{prop}\n $\\pi'$ is an $\\epsilon$-detour for $\\pi$ if and only if $\\pi$ and $\\pi'$ have the same endpoints and\n \\[\n |\\pi'| - |\\pi| \\le \\epsilon |\\pi \\setminus \\pi'|.\n \\]\n\\end{prop}\n\\begin{proof}\n This follows immediately from the facts that $|\\pi \\setminus \\pi'| = |\\pi| - |\\pi \\cap \\pi'|$ and $|\\pi' \\setminus \\pi| = |\\pi'| - |\\pi \\cap \\pi'|$, together\n with some algebraic manipulation.\n\\end{proof}\n\n\\begin{defn}\n Let $\\epsilon$ and $I_0$ all be as in Lemma \\ref{lem:technicallemma},\n and let $C$ be a constant.\n We call $(\\alpha, \\gamma)$ a \\emph{feasible pair} with respect to the $T$-geodesic\\footnote{\n A $T$-geodesic from $x$ to $y$ is a path $\\pi:x \\to y$ with $T(\\pi) = T(x,y)$.\n In general there may be more than one $T$-geodesic; we implicitly fix an arbitrary well-ordering\n on self-avoiding paths in $G$ and define ``the'' $T$-geodesic $\\pi$ from $x$ to $y$ to be the $T$-geodesic\n which is least in this ordering.\n On the other hand, it is not a priori obvious that a $T$-geodesic exists. 
If $\\nu(\\{0\\}) < p_c(G)$ and $G$ is locally finite, then it is\n easily shown (see Proposition 4.4 in \\cite{ADH}) that all pairs of points $x,y \\in V$ admit a $T$-geodesic;\n in particular, if $\\nu$ is exponential-subcritical, $T$-geodesics exist.\n In Section \\ref{sec:qitrees} we do not assume that $\\nu$ is exponential-subcritical, but the arguments are easily modified\n to avoid the assumption that $T$-geodesics exist, by considering paths $\\pi:x \\to y$ with $T(\\pi) \\le T(x,y) + \\epsilon$\n and letting $\\epsilon \\to 0$.}\n $\\pi$ from $x$ to $y$ if both $\\alpha$ and $\\gamma$ are self-avoiding,\n $\\gamma$ is an $\\epsilon$-detour for $\\alpha$ of length at most $C(1+\\epsilon)$, \n $\\alpha$ is a subpath of $\\pi$, and for all $e \\in (\\alpha \\cup \\gamma) \\setminus (\\alpha \\cap \\gamma)$, $w(e) \\in I_0$.\n\\end{defn}\nThis notion of course depends on $C$, $\\epsilon$, and $I_0$ even though this is suppressed in the notation.\nHere $C$ is an unspecified constant, but in practice there will be one particular $C=C(\\epsilon)$ that we end up using.\nThese detours turn out to be key to proving the strict inequalities we want to show, as we shall see in the next lemma.\n\nThe following lemma is essentially contained within Lemma 5.19 and the proof of Theorem 2.9(b)\nfrom Proposition 5.22 in \\cite{vdBK}, but we write it here to be \nexplicit about what the necessary modifications are.\n\\begin{lemma} \\label{lem:feasibletogap}\n Let $\\nu, \\tilde{\\nu}$ have finite mean and be such that\n $\\tilde{\\nu}$ is strictly more variable than $\\nu$ and \\eqref{eq:extraassumption} holds. \n Let $\\epsilon$ and $I_0$ be given as in Lemma \\ref{lem:technicallemma} and let $C$\n be fixed. 
Then there exists some constant $c_0 > 0$ such that if $G=(V,E)$ is a graph and $\\{B_i\\}_{i \\in I} \\subset E$\n is a family of disjoint subsets, for any $x,y \\in V$ we have\n \\[\n \\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y) \\ge c_0 \\sum_{i \\in I} \\mathbb{P}( B_i \\mbox{ contains a feasible pair for the } T\\mbox{-geodesic } \\pi: x \\to y ).\n \\]\n\\end{lemma}\n\\begin{proof}\n As in \\cite{vdBK}, let $\\hat{w}:E \\to [0,\\infty)$ be given by\n \\[\n \\hat{w}(e) := \\mathbbm{1}_{\\xi(e) = 0} w(e) + \\mathbbm{1}_{\\xi(e) = 1} \\tilde{w}(e),\n \\]\n where $\\{\\xi(e)\\}_{e \\in E}$ is a family of i.i.d. $\\mathrm{Unif}(\\{0,1\\})$ variables, also independent of $w,\\tilde{w}$.\n As shown in Lemma 5.19 of \\cite{vdBK}, $\\mathbb{E} \\tilde{T}(x,y) \\le \\mathbb{E} \\hat{T}(x,y)$ for all $x,y \\in V$,\n so it suffices to show the desired inequality with $\\tilde{T}$ replaced by $\\hat{T}$.\n \n We will call a pair $(\\alpha,\\gamma)$ \\emph{advantageous} for the $T$-geodesic $\\pi: x \\to y$ if it is feasible for $\\pi$\n and furthermore $\\xi(e) = 0$ for all $e \\in \\gamma \\setminus \\alpha$, $\\xi(e) = 1$ for all $e$ in a subset $S \\subset \\alpha \\setminus \\gamma$\n of size at least $|S| \\ge \\frac{1}{1+\\epsilon} |\\gamma \\setminus \\alpha|$, and if for all $e \\in S$ we have $\\tilde{w}(e) > w(e) + a$,\n where $a>0$ is as given in Lemma \\ref{lem:technicallemma}.\n Note that, by Lemma \\ref{lem:technicallemma}, if $(\\alpha,\\gamma)$ is advantageous then\n \\[\n \\hat{T}(\\alpha) - \\hat{T}(\\gamma) = \\hat{T}(\\alpha \\setminus \\gamma) - \\hat{T}(\\gamma \\setminus \\alpha) \n \\ge \\tilde{T}(S) - T(\\gamma \\setminus \\alpha) \\ge g,\n \\]\n where $g > 0$ is as in the lemma.\n Furthermore, for any pair $(\\alpha, \\gamma)$ we have\n \\[\n \\mathbb{E} \\left[ \\mathbbm{1}_{\\{(\\alpha,\\gamma) \\mbox{ is advantageous}\\}} \\middle| w \\right] \\ge \n 2^{-(C(1+\\epsilon) + C)} b^C \\mathbbm{1}_{\\{(\\alpha,\\gamma) \\mbox{ is 
feasible}\\}}.\n \\]\n (Here we have used that $|\\gamma| \\le C(1+\\epsilon)$).\n \n Therefore, consider a $T$-geodesic $\\pi$ from $x$ to $y$. Construct another (random) path $\\pi'$ by starting with $\\pi$\n and, for each $B_i$, if $B_i$ contains an advantageous pair $(\\alpha_i, \\gamma_i)$ for $\\pi$, replacing the subsegment $\\alpha_i$\n with $\\gamma_i$. (If $B_i$ contains more than one advantageous pair, choose the least one in some arbitrary ordering).\n We then have\n \\begin{align*}\n \\hat{T}(\\pi) - \\hat{T}(\\pi') \\ge g \\sum_i \\mathbbm{1}_{\\{ B_i \\mbox{ contains an advantageous pair for } \\pi \\}}. \n \\end{align*}\n Since, as shown in Lemma 5.19 of \\cite{vdBK}, $\\mathbb{E} T(\\pi) \\ge \\mathbb{E} \\hat{T}(\\pi)$, we have\n \\[\n \\mathbb{E} T(x,y) - \\mathbb{E} \\hat{T}(x,y) \\ge \\mathbb{E} \\hat{T}(\\pi) - \\mathbb{E} \\hat{T}(\\pi') \\ge g \\sum_{i \\in I} \\mathbb{P}(B_i \\mbox{ contains an advantageous pair for } \\pi).\n \\]\n But (again using some fixed ordering on pairs inside $B_i$) we have\n \\begin{align*}\n &\\mathbb{P}(B_i \\mbox{ contains an advantageous pair for } \\pi) \\\\\n \\ge &\\sum_{(\\alpha,\\gamma) \\subset B_i} \\mathbb{E} \\left[ \\mathbbm{1}_{\\{(\\alpha,\\gamma) \\mbox{ is the least pair in } B_i \\mbox{ which is feasible}\\}}\n \\mathbbm{1}_{\\{(\\alpha,\\gamma) \\mbox{ is advantageous} \\}} \\right] \\\\\n = &\\sum_{(\\alpha,\\gamma) \\subset B_i} \\mathbb{E} \\left[ \\mathbbm{1}_{\\{(\\alpha,\\gamma) \\mbox{ is the least pair in } B_i \\mbox{ which is feasible}\\}}\n \\mathbb{E}[\\mathbbm{1}_{\\{(\\alpha,\\gamma) \\mbox{ is advantageous} \\}} | w] \\right] \\\\\n \\ge &2^{-(C(1+\\epsilon) + C)} b^C \n \\sum_{(\\alpha,\\gamma) \\subset B_i} \\mathbb{E} \\left[ \\mathbbm{1}_{\\{(\\alpha,\\gamma) \\mbox{ is the least pair in } B_i \\mbox{ which is feasible}\\}} \\right] \\\\\n = & 2^{-(C(1+\\epsilon) + C)} b^C \\mathbb{P}(B_i \\mbox{ contains a feasible pair for } \\pi).\n \\end{align*}\n Thus we have the lemma 
with $c_0 := g \\cdot 2^{-(C(1+\\epsilon) + C)} b^C > 0$.\n\\end{proof}\nInequalities up to a constant factor will appear many times in this paper, so from here we fix the following notation.\nFor two functions $f$ and $g$ of a parameter $t$, we will write $f(t) \\lessim g(t)$ or $g(t) \\gtrsim f(t)$ if there are\nconstants $c>0$ and $t_0 < \\infty$ such that $f(t) \\le c g(t)$ for all $t \\ge t_0$.\nIn this paper our parameter $t$ is typically either $d(x,y)$ or $R$, and which it is should be clear from context.\n\nFinally, it will be convenient to ``upgrade'' to the following lemma (where the same hypotheses on $\\nu$ and $\\tilde{\\nu}$ are assumed):\n\\begin{lemma} \\label{lem:feasibletovdBK}\nLet $\\{B_i\\}_{i \\in I}$ be a family of subgraphs of $G$ and suppose that \n\\[\n \\sup_{i \\in I} \\# \\{ j \\in I : B_j \\cap B_i \\ne \\emptyset \\} < \\infty.\n\\]\nIf\n\\[\n \\sum_{i \\in I} \\mathbb{P}( B_i \\mbox{ contains a feasible pair for the } T\\mbox{-geodesic } \\pi : x \\to y) \\gtrsim d(x,y)\n\\]\nfor all $x,y \\in V$ with $d(x,y)$ sufficiently large, then\n\\[\n \\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y) }{ d(x,y) } > 0.\n\\]\n\\end{lemma}\n\\begin{proof}\n First, consider the graph whose vertex set is $I$ and whose edges are the pairs $\\{i, j\\}$ with $i \\ne j$ such that $B_i \\cap B_j \\ne \\emptyset$.\n Our first assumption states precisely that this graph has degree bounded by some constant, which we call $D' < \\infty$.\n Then this graph can be colored by $D'+1$ colors using a greedy coloring.\n Hence we get a decomposition $I = \\bigsqcup_{\\ell=1}^{D' + 1} I_{\\ell}$ such that for each fixed $\\ell$, for all $i,j \\in I_{\\ell}$,\n if $i \\ne j$ then $B_i \\cap B_j = \\emptyset$. 
Moreover, we have\n \\begin{align*}\n & \\max_{\\ell \\in \\{1,\\ldots,D'+1\\}} \\sum_{i \\in I_{\\ell}} \\mathbb{P}( B_i \\mbox{ contains a feasible pair for the geodesic } \\pi : x \\to y) \\\\\n \\ge &\\frac{1}{D' + 1} \\sum_{i \\in I} \\mathbb{P}( B_i \\mbox{ contains a feasible pair for the geodesic } \\pi : x \\to y) \n \\gtrsim d(x,y).\n \\end{align*}\n Thus we will have our lemma once we show that for each $\\ell$\n \\[\n \\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y) \\gtrsim \\sum_{i \\in I_{\\ell}} \\mathbb{P}(B_i \\mbox{ contains a feasible pair for the geodesic } \\pi : x \\to y).\n \\]\n But since $\\{B_i\\}_{i \\in I_{\\ell}}$ is a family of disjoint sets, this follows immediately from Lemma \\ref{lem:feasibletogap}.\n\\end{proof}\n\nIn light of the previous lemma, our strategy for proving our main theorems will be to find suitable subgraphs $B_i$ of $G$\nand then prove that the expected number of $B_i$ containing a feasible pair for the $T$-geodesic from \n$x$ to $y$ is at least a constant times $d(x,y)$.\n\n\\section{Graphs that admit detours} \\label{sec:detours}\nHere we introduce and prove facts about the key fine-geometric condition on our graphs.\n\\begin{defn}\n We say that a graph $G$ \\emph{admits detours} if for every $\\epsilon > 0$, there exists some $C$ such that\n for every self-avoiding path $\\pi$ in $G$ of length $C$, there exists a self-avoiding $\\epsilon$-detour $\\pi'$ for $\\pi$.\n\\end{defn}\nOur main theorems say that at least in certain coarse-geometric settings, this fine-geometric condition on the graph is equivalent to the\nvdBK property.\nWe first give an equivalent condition and show that this condition is \\emph{necessary} for a graph to have the vdBK property.\nWe then give examples of graphs which admit detours.\n\n\\subsection{vdBK graphs admit detours}\nRecall that a path $\\pi$ from $v$ to $w$ is 
called an \\emph{edge-geodesic} if for all paths $\\pi'$ from\n$v$ to $w$, $|\\pi| \\le |\\pi'|$. $\\pi$ is called a \\emph{unique (edge)-geodesic} if for all $\\pi' \\ne \\pi$ from $v$ to $w$, $|\\pi| < |\\pi'|$.\n\n\\begin{prop} \\label{prop:detourequiv}\n $G$ admits detours if and only if $G$ \\emph{admits detours along unique geodesics} in the following sense: for all\n $\\epsilon > 0$, there exists $C < \\infty$ such that for every unique edge-geodesic $\\pi$ of length $C$, there exists \n an $\\epsilon$-detour $\\pi'$ for $\\pi$.\n\\end{prop}\n\\begin{proof}\n The forward implication is clear. Now assume that $G$ admits detours along unique geodesics.\n Note that the $\\epsilon$-detour $\\pi'$ for a unique geodesic $\\pi$ can be made self-avoiding simply by loop erasing;\n the resulting path is still an $\\epsilon$-detour for $\\pi$ because the process of loop erasing cannot\n increase $|\\pi' \\setminus \\pi|$ and cannot decrease $|\\pi \\setminus \\pi'| \\ge 1$.\n So it only remains to construct self-avoiding $\\epsilon$-detours for self-avoiding paths which are not unique geodesics.\n Let $\\pi$ be a self-avoiding path which is not a unique geodesic, and let $\\pi'$ be an edge-geodesic connecting the endpoints of $\\pi$\n which is not equal to $\\pi$.\n Since $\\pi'$ is a geodesic, we have\n \\[\n 0 \\le |\\pi| - |\\pi'| = |\\pi \\setminus \\pi'| - |\\pi' \\setminus \\pi|,\n \\]\n so that\n \\[\n |\\pi' \\setminus \\pi| \\le |\\pi \\setminus \\pi'| \\le (1+ \\epsilon)|\\pi \\setminus \\pi'|\n \\]\n for all $\\epsilon > 0$.\n\\end{proof}\nWe can now easily prove that admitting detours is a \\emph{necessary} condition for a graph to be vdBK.\n\n\\begin{thm} \\label{thm:notvdBK}\n Let $G$ be a graph which does not admit detours.\n Then there exists a sequence of pairs $(x_n,y_n) \\in V^2$ with $d(x_n,y_n) \\tendsto{n}{\\infty} \\infty$\n and a pair of atomless measures $\\nu, \\tilde{\\nu}$ with finite mean which are supported away from $0$ \n and such that 
$\\tilde{\\nu}$ is strictly more variable than $\\nu$ but\n \\[\n \\mathbb{E} T(x_n, y_n) = \\mathbb{E} \\tilde{T}(x_n, y_n)\n \\]\n for all $n$.\n In particular, if $G$ has bounded degree, then $G$ does not satisfy the vdBK property.\n\\end{thm}\n\\begin{proof}\n Since $G$ does not admit detours, by Proposition \\ref{prop:detourequiv} there exists $\\epsilon_0 > 0$ such that\n for each $n$ we have a unique geodesic $\\pi_n$ of length $n$ which\n does not admit an $\\epsilon_0$-detour, which is to say that, if $x_n$ and $y_n$ are the endpoints of $\\pi_n$, then any other \n self-avoiding $\\pi'_n$ from\n $x_n$ to $y_n$ satisfies\n \\[\n |\\pi'_n \\setminus \\pi_n| \\ge (1 + \\epsilon_0)|\\pi_n \\setminus \\pi'_n|.\n \\]\n Note that, canceling a term of $T(\\pi_n \\cap \\pi'_n)$, we always have\n \\[\n T(\\pi'_n) - T(\\pi_n) = T(\\pi'_n \\setminus \\pi_n) - T(\\pi_n \\setminus \\pi'_n).\n \\]\n Now assume $\\nu$ is supported on $[1,1+\\epsilon_0]$. Then for any $\\pi'_n \\ne \\pi_n$ we have\n \\begin{align*}\n T(\\pi'_n \\setminus \\pi_n) - T(\\pi_n \\setminus \\pi'_n) &\\ge 1 \\cdot |\\pi'_n \\setminus \\pi_n| - (1+ \\epsilon_0)|\\pi_n \\setminus \\pi'_n| \\\\\n & \\ge(1 + \\epsilon_0) |\\pi_n \\setminus \\pi'_n| - (1 + \\epsilon_0)|\\pi_n \\setminus \\pi'_n| = 0.\n \\end{align*}\n Thus, when $\\nu$ is supported on $[1, 1+ \\epsilon_0]$, $\\pi_n$ almost surely has optimal passage time, that is,\n \\[\n T(x_n,y_n) = T(\\pi_n) \\mbox{ a.s.}\n \\]\n But then\n \\[\n \\mathbb{E} T(x_n,y_n) = \\mathbb{E} T(\\pi_n) = (\\mathbb{E} w) d(x_n,y_n).\n \\]\n In particular, if both $\\nu$ and $\\tilde{\\nu}$ are supported on $[1,1+\\epsilon_0]$ and $\\mathbb{E} w = \\mathbb{E} \\tilde{w}$, we get\n \\[\n \\mathbb{E} T(x_n,y_n) = (\\mathbb{E} w) d(x_n,y_n) = (\\mathbb{E} \\tilde{w}) d(x_n,y_n) = \\mathbb{E} \\tilde{T}(x_n,y_n),\n \\]\n so to complete our proof we just need to find two such measures $\\nu, \\tilde{\\nu}$ for which $\\tilde{\\nu}$ is strictly more variable than 
$\\nu$.\n For example, we can take $\\tilde{\\nu}$ to be the uniform measure on $[1, 1+\\epsilon_0]$ and $\\nu$ to be the uniform measure\n on $[1+(\\epsilon_0/4), 1+(3\\epsilon_0/4)]$ (see Example 2.17 in \\cite{vdBK}).\n Finally, if $G$ has bounded degree, then $\\nu(\\{1+(\\epsilon_0/4)\\}) = 0 < 1/D \\le \\vec{\\underline{p_c}}$, so $\\nu$ is exponential-subcritical\n and the pair $\\nu, \\tilde{\\nu}$ contradicts the vdBK property.\n\\end{proof}\n\n\\subsection{Examples of graphs which admit detours}\nProposition \\ref{prop:detourequiv} gives us an easy way to produce graphs which admit detours, namely by ``doubling'' edges.\nSimply take any graph $G$ and create a new graph $G'$ by taking the edge set of $G$ and adding an extra edge between each pair $v, w \\in V$ which is\nconnected by an edge in $G$. Since every edge has a ``parallel'' edge, $G'$ has no unique geodesics, and hence\nby Proposition \\ref{prop:detourequiv} admits detours.\n\nThis is a rather ``cheap'' way to get a graph that admits detours, especially since in first-passage percolation the graphs one is interested in\nare often simple, i.e. contain no parallel edges. However, this is a simple way to see that the property of admitting detours is not a quasi-isometry invariant;\nevery graph $G$ is quasi-isometric to a graph $G'$ which admits detours, so admitting detours is a ``fine'' rather than a ``coarse'' geometric property.\n\nThe property is not group-theoretic either; that is, for some groups $\\Gamma$, some Cayley graphs of $\\Gamma$ admit detours and others do not.\nThis can be seen using the same technique as above if one allows Cayley graphs to have double edges (for discussion on what exactly is meant\nby ``Cayley graph'' see Appendix \\ref{app:grouptheory}). 
But even if one restricts to simple Cayley graphs, there are counterexamples.\nThe standard Cayley graph of $\\mathbb{Z}$ does not admit detours (since it is a tree), but every Cayley graph of $\\mathbb{Z}$ not isomorphic to this one does.\nSimilarly, Cayley graphs of $\\mathbb{Z}/2 * \\mathbb{Z}/2$ which are isomorphic to the standard Cayley graph of $\\mathbb{Z}$ do not admit detours, but all others do. \nThis is proven in Appendix \\ref{app:grouptheory}.\n\nOn the other hand, there are several properties of groups which ensure that \\emph{all} of their Cayley graphs admit detours. For instance, we have\n\\begin{restatable*}{prop}{finitenormalsubgroup} \\label{prop:finitenormalsubgroup}\n Let $\\Gamma$ be a finitely generated group, and suppose that $\\Gamma$ contains $F \\unlhd \\Gamma$ a nontrivial finite normal subgroup.\n Then any Cayley graph of $\\Gamma$ admits detours.\n\\end{restatable*}\n\n\\begin{restatable*}{prop}{withcenter} \\label{prop:withcenter}\n Let $\\Gamma$ be a finitely generated group not isomorphic to $\\mathbb{Z}$ or $\\mathbb{Z}/2 * \\mathbb{Z}/2 \\cong \\mathbb{Z} \\rtimes \\mathbb{Z}/2$ \n with a finite index subgroup $H$ such that $H$ has nontrivial center. Then any Cayley graph $G$ \n of $\\Gamma$ admits detours.\n\\end{restatable*}\nThese allow us to conclude:\n\n\\begin{restatable*}{thm}{nilpotentdetours} \\label{thm:nilpotentdetours}\n Let $G$ be a Cayley graph of a virtually nilpotent group. 
If $G$ is not isomorphic as a graph\n to the standard Cayley graph of $\\mathbb{Z}$, then $G$ admits detours.\n\\end{restatable*}\n\nThe proofs of all of these facts are entirely combinatorial and group-theoretic, and are deferred to Appendix \\ref{app:grouptheory}.\nThere may be many weaker group-theoretic conditions which ensure that every Cayley graph of a group admits detours;\nthe ones proven here were mostly chosen in order to prove Theorem \\ref{thm:nilpotentdetours}, since this is needed to \nprove Theorem \\ref{thm:nilpotentvdBK}.\nOf course, they readily apply to many groups which are not virtually nilpotent.\n\n\n\n\n\n\n\n\n\\section{Proof of Theorem \\ref{thm:qitree}} \\label{sec:qitrees}\nIn this section, we prove the following:\n\\qitree*\n\nA metric space which is quasi-isometric to a tree (where the tree is given the usual graph metric) is called a \\emph{quasi-tree}.\nThe following is a well-known equivalent condition for a geodesic metric space to be a quasi-tree\n(the original, slightly weaker condition is due to Manning \\cite{Manning}; the following extension is a well-known\nconsequence, see e.g. \\cite{BBF}):\n\\begin{thm}[Manning's bottleneck criterion]\n A geodesic metric space $X$ is a quasi-tree if and only if there exists some $\\Delta < \\infty$ such that\n for every $x,y \\in X$, for every geodesic $[x,y]$ from $x$ to $y$, for every $z \\in [x,y]$, any path $\\pi$ from $x$ to $y$\n intersects $B(z,\\Delta)$.\n\\end{thm}\n\n\\begin{cor}\n Let $G = (V,E)$ be a graph which is a quasi-tree. 
Then there exists $R < \\infty$ such that for any $x,y \\in V$, for any edge geodesic $[x,y]$ from $x$ to $y$\n and any $z \\in V([x,y])$, every path $\\pi$ from $x$ to $y$ intersects $E(B(z,R))$.\n\\end{cor}\nHere (and later in the paper), if $S \\subset V$, then $E(S) \\subset E$ is defined to be the set of edges of $G$ which have both endpoints lying in $S$,\nand if $S \\subset E$, then $V(S) \\subset V$ is defined to be the set of vertices which are an endpoint of an edge in $S$.\n\\begin{proof}\n $(V,d)$ is naturally a subspace of the geodesic metric space $(G,d)$ given by the geometric realization of $G$ (i.e. the 1-dimensional metric cell complex where each $e \\in E$\n corresponds to 1-cell in $G$ isometric to $[0,1]$, joining 0-cells corresponding to the endpoints of $e$). The combinatorial edge-geodesics we study in this paper\n correspond to geodesics in $(G,d)$, and one quickly sees that the corollary holds with $R = \\Delta + 1$.\n\\end{proof}\n\n\n\\begin{proof}[Proof of Theorem \\ref{thm:qitree}]\n We only need to prove that, if $G$ admits detours, then we have $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$ whenever $\\tilde{\\nu}$ is strictly\n more variable than $\\nu$, since the other direction is given by\n Theorem \\ref{thm:notvdBK}.\n \n To this end, let $\\nu, \\tilde{\\nu}$ have finite mean with $\\tilde{\\nu}$ strictly more variable than $\\nu$ and first assume that\n \\eqref{eq:extraassumption} holds. 
\n Then let $\\epsilon>0, I_0$ be given\n as in Lemma \\ref{lem:technicallemma}.\n Since $G$ admits detours, there is some $C$ such that every self-avoiding path $\\pi$ of length $C$ admits a self-avoiding $\\epsilon$-detour\n (which is necessarily of length at most $(1+\\epsilon)C$).\n Since $G$ is a quasi-tree, take $R < \\infty$ such that for all $x,y \\in V$, for any geodesic $[x,y]$ from $x$ to $y$,\n any path $\\pi: x \\to y$ intersects $E(B(z,R))$ for all $z \\in V([x,y])$.\n \n Now, define the family $\\{ B_v := B(v,R+C(2+\\epsilon)) : v \\in V \\}$.\n First we claim that for any $v$,\n \\begin{align*}\n &\\mathbb{P}( B_v \\mbox{ contains a feasible pair for the geodesic } \\pi:x \\to y ) \\\\\n \\ge &\\mathbb{P}( \\pi \\mbox{ visits } B(v,R) \\mbox{ and leaves } B_v, w(e) \\in I_0 \\mbox{ for all } e \\in E(B_v)).\n \\end{align*}\n To see this, note that if $\\pi$ visits $B(v,R)$ and exits $B_v$, there is a segment $\\alpha$ of $\\pi$ of length at least $C$\n contained in $B(v,R+C)$; this segment admits a self-avoiding $\\epsilon$-detour $\\gamma$ contained in $B(v,R+C(2+\\epsilon))$.\n Then, if also $w(e) \\in I_0$ for all $e \\in E(B_v)$, $(\\alpha,\\gamma)$ forms a feasible pair.\n \n Next note that if $v \\in V([x,y]) \\setminus B_y$ then \\emph{any} path from $x$ to $y$ visits $B(v,R)$ and exits $B_v$,\n and so for such $v$ we have\n \\begin{align*}\n &\\mathbb{P}( B_v \\mbox{ contains a feasible pair for the geodesic } \\pi:x \\to y ) \\\\\n \\ge &\\mathbb{P}( w(e) \\in I_0 \\mbox{ for all } e \\in E(B_v)) = \\nu(I_0)^{|E(B_v)|} \\ge (\\nu(I_0))^{(D+1)^{R+C(2+\\epsilon)+1}} =: c\n \\end{align*}\n where $D$ is the maximum degree of $G$.\n Therefore for $d(x,y)$ sufficiently large we have\n \\begin{align*}\n &\\sum_{v \\in V} \\mathbb{P}(B_v \\mbox{ contains a feasible pair for the geodesic } \\pi:x \\to y ) \\\\ \n \\ge &\\sum_{v \\in V([x,y]) \\setminus B_y} \\mathbb{P}( w(e) \\in I_0 \\mbox{ for all } e \\in E(B_v)) \\\\\n \\ge &|V([x,y]) \\setminus 
B_y| c \\ge c d(x,y) - c(D+1)^{R+C(2+\\epsilon)} \\gtrsim d(x,y).\n \\end{align*}\n Moreover, we have that\n \\[\n \\sup_v \\# \\{ w : B_w \\cap B_v \\ne \\emptyset \\} \\le \\sup_v |B(v,2(R + C(2 + \\epsilon)))| \\le (D+1)^{2(R + C(2 + \\epsilon))} < \\infty,\n \\]\n and so by Lemma \\ref{lem:feasibletovdBK} we have that\n \\[\n \\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y) }{d(x,y)} > 0,\n \\]\n as desired.\n \n On the other hand, if $w$ and $\\tilde{w}$ do not satisfy \\eqref{eq:extraassumption}, then take $\\bar{w}$ as in Lemma \\ref{lem:wlog};\n applying our above argument to $\\bar{w}$ gives\n \\[\n \\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y) }{d(x,y)} \\ge \n \\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) - \\mathbb{E} \\bar{T}(x,y) }{d(x,y)}> 0,\n \\]\n and so we are done.\n\\end{proof}\n\nAs a corollary we also obtain Theorem \\ref{thm:virtfree}:\n\\begin{proof}[Proof of Theorem \\ref{thm:virtfree}]\n Let $\\Gamma$ be a virtually free group. 
Since free groups have Cayley graphs which are regular trees, any\n Cayley graph of $\\Gamma$ is quasi-isometric to a regular tree, and so by Theorem \\ref{thm:qitree} a Cayley\n graph of $\\Gamma$ is vdBK if and only if it admits detours.\n If $\\Gamma$ has a finite index subgroup with nontrivial center and is not isomorphic to $\\mathbb{Z}$ or $\\mathbb{Z}/2 * \\mathbb{Z}/2$, \n then by Proposition \\ref{prop:withcenter},\n all its Cayley graphs admit detours.\n If $\\Gamma$ has a nontrivial finite normal subgroup, then by Proposition \\ref{prop:finitenormalsubgroup},\n all its Cayley graphs admit detours; hence under either condition all Cayley graphs of $\\Gamma$ are vdBK.\n\\end{proof}\n\n\\section{Proof of Theorem \\ref{thm:polygrowthvdBK}} \\label{sec:polygrowth}\nWe say that a graph has \\emph{strict polynomial growth} if there exist constants $0 < c_1 \\le C_1 < \\infty$ and $d$ such that $c_1 R^d \\le |B(v,R)| \\le C_1 R^d$ for all $v \\in V$ and all $R \\ge 1$.\nOur first step is to show that, with high probability, every path from $x$ to $y$ must pass through $\\gtrsim d(x,y)$ many regions where certain prescribed events hold.\nThis is a Peierls-type argument, a special case of which was used in \\cite{vdBK}.\n\n\\begin{lemma} \\label{lem:peierls}\n Let $G = (V,E)$ be a graph. 
Suppose that for each sufficiently large $R < \\infty$ we have the following:\n \\begin{itemize}\n \\item A partition $V = \\bigsqcup_i B_i^R$, with $\\sup_i \\mathrm{diam} (B_i^R) < \\infty$.\n \\item A collection of subsets $\\tilde{B}_i^R \\subset E$ indexed by the same index set as the $B_i^R$.\n \\item A collection of events $A_i^R$, each of which depends only upon the weights of the edges in $\\tilde{B}_i^R$.\n \\end{itemize}\n For each $R$ then construct the simple graph $G^R$ whose vertex set is $\\{B_i^R\\}_i$ and is such that two distinct vertices\n $B_i^R$ and $B_j^R$ are joined by an edge if and only if there is an edge in the original graph with one endpoint lying\n in $B_i^R$ and the other lying in $B_j^R$.\n Also construct the simple graph $\\tilde{G}^R$ with the same vertex set which is such that two distinct vertices\n $B_i^R$ and $B_j^R$ are joined by an edge if and only if $\\tilde{B}_i^R \\cap \\tilde{B}_j^R \\ne \\emptyset$.\n Suppose that there exists $D < \\infty$ such that for all $R$, the degree of both $G^R$ and $\\tilde{G}^R$\n is bounded by $D$. Suppose also that\n \\[\n \\rho(R) := \\sup_i \\mathbb{P} \\left( (A_i^R)^c \\right) \\tendsto{R}{\\infty} 0.\n \\]\n Then, for all sufficiently large $R$, there exist $c_2(R), \\epsilon_2(R) > 0$ such that for all sufficiently large $d(x,y)$,\n \\[\n \\mathbb{P} \\left( \n \\begin{array}{c} \\exists \\gamma:x \\to y \\mbox{ visiting at most } c_2 d(x,y) \\mbox{ distinct } \\\\\n B_i^R \\mbox{ such that } A_i^R \\mbox{ holds } \\end{array}\n \\right)\n \\le e^{-\\epsilon_2 d(x,y)}\n \\]\n\\end{lemma}\n\\begin{rmk} \\label{rmk:strongerpeierls}\n From the proof it will be clear that the following weaker condition is sufficient: let $D(R)$ be the maximum degree of $G^R$\n and let $\\tilde{D}(R)$ be the maximum degree of $\\tilde{G}^R$. 
Then for each $R$ such that\n \\[\n 2D(R) \\rho(R)^{\\frac{1}{\\tilde{D}(R) + 1}} < 1,\n \\]\n we have the desired bound.\n For instance, in our later geometric constructions we can ensure $\\rho(R) \\le e^{-cR}$, which\n allows us to extend Theorem \\ref{thm:polygrowthvdBK} from graphs\n of strict polynomial growth to graphs with $R^{d'} \\lessim |B(R)| \\lessim R^d$ and $d-d'<1$:\n in that case our Voronoi construction below gives $\\tilde{D}(R), D(R) \\lessim R^{d-d'} = o(R)$. However, it\n is difficult to come up with an example of a graph which has such a property but is not already of strict polynomial growth.\n\\end{rmk}\n\\begin{proof}\n Let $\\gamma$ be a path from $x$ to $y$. This induces a path $\\tilde{\\gamma}$ in $G^R$ in a natural way: \n $\\tilde{\\gamma}$ starts at the unique $B_{i_1}^R$ containing $x$, and each time $\\gamma$ crosses an edge\n from a vertex in some $B_i^R$ to a vertex in some distinct $B_{i'}^R$, $\\tilde{\\gamma}$ crosses an \n edge from $B_i^R$ to $B_{i'}^R$.\n Note that since the diameter of the $B_i^R$ is bounded uniformly in $i$, there exists $\\eta(R) > 0$ such\n that if $\\gamma: x \\to y$, then $\\tilde{\\gamma}$ visits at least $\\eta d(x,y)$ distinct $B_i^R$.\n We want to bound the probability that (for some $c_2(R)$ to be chosen later) \n some such $\\tilde{\\gamma}$ visits at most $c_2 d(x,y)$ $B_i^R$\n such that $A_i^R$ holds. 
First, note that if $\\tilde{\\gamma}$ visits at most $c_2 d(x,y)$ $B_i^R$ such\n that $A_i^R$ holds, a self-avoiding path obtained from $\\tilde{\\gamma}$ by erasing loops has the same property.\n So it suffices to bound the probability that some self-avoiding path $\\tilde{\\gamma}$ in $G^R$ which starts\n at $B_{i_1}^R \\ni x$ and ends at $B_{i_2}^R \\ni y$ visits at most $c_2 d(x,y)$ $B_i^R$ such that $A_i^R$ holds;\n to reduce clutter, let us write $B_i$ and $A_i$ instead of $B_i^R$ and $A_i^R$.\n \n Now, for a fixed self-avoiding path $\\tilde{\\gamma}$ visiting $k$ distinct $B_i$, we have\n \\begin{align*}\n \\mathbb{P} \\left( \\begin{array}{c} \\tilde{\\gamma} \\mbox{ visits at most } c_2 d(x,y) \\\\\n B_i \\mbox{ such that } A_i \\mbox{ holds } \\end{array} \\right)\n \\le \\sum_{S \\subset V(\\tilde{\\gamma}), |S| = k - c_2 d(x,y)} \\mathbb{P} \\left( \\bigcap_{B_i \\in S} A_i^c \\right).\n \\end{align*}\n Each such $S \\subset \\{ B_i \\}_i$ contains a subset $S'$ which is independent in $\\tilde{G}^R$ (that is,\n no two elements of $S'$ are joined by an edge of $\\tilde{G}^R$) and which has size $|S'| \\ge \\frac{1}{D + 1}|S|$.\n From the definition of $\\tilde{G}^R$ we see that if $S'$ is an independent set in $\\tilde{G}^R$ then the collection\n of events $\\{A_i^R\\}_{B_i^R \\in S'}$ is independent. Hence the above is bounded by\n \\begin{align*}\n \\sum_{\\substack{S \\subset V(\\tilde{\\gamma}), \\\\ |S| = k - c_2 d(x,y)}} \\mathbb{P} \\left( \\bigcap_{B_i \\in S'} A_i^c \\right) \n = \\sum_{\\substack{S \\subset V(\\tilde{\\gamma}), \\\\ |S| = k - c_2 d(x,y)}} \\prod_{B_i \\in S'} \\mathbb{P}(A_i^c) \n \\le {k \\choose c_2 d(x,y)} \\rho^{\\frac{k - c_2 d(x,y)}{D + 1}}.\n \\end{align*}\n On the other hand, the number of self-avoiding paths of length $k$ in $G^R$ starting at the unique $B_{i_1} \\ni x$\n is at most $D^k$. 
Thus we have\n \\begin{align*}\n \\mathbb{P} \\left( \n \\begin{array}{c} \\exists \\gamma:x \\to y \\mbox{ visiting at most } c_2 d(x,y) \\mbox{ distinct } \\\\\n B_i^R \\mbox{ such that } A_i^R \\mbox{ holds } \\end{array}\n \\right)\n &\\le \\sum_{k = \\lceil \\eta d(x,y) \\rceil}^{\\infty} D^k {k \\choose c_2 d(x,y)} \\left(\\rho^{\\frac{1}{D + 1}}\\right)^{k - c_2 d(x,y)} \\\\\n &\\le \\left( \\rho^{\\frac{1}{D + 1}} \\right)^{-c_2 d(x,y)} \\sum_{k=\\lceil \\eta d(x,y) \\rceil}^{\\infty} \\left( 2D \\rho^{\\frac{1}{D+1}} \\right)^k,\n \\end{align*}\n where the second inequality uses ${k \\choose c_2 d(x,y)} \\le 2^k$.\n For $R$ sufficiently large we have $2D \\rho^{\\frac{1}{D + 1}} < 1$, and then the right-hand side above is equal to \n \\[\n \\left( \\rho^{\\frac{1}{D + 1}} \\right)^{-c_2 d(x,y)} \\left( 2D \\rho^{\\frac{1}{D+1}} \\right)^{\\lceil \\eta d(x,y) \\rceil} \\cdot \\frac{1}{1 - 2D\\rho^{\\frac{1}{D+1}}}.\n \\]\n If we choose $c_2 > 0$ sufficiently small that\n \\[\n -c_2 \\log (\\rho^{1/(D+1)}) + \\eta \\log ( 2D \\rho^{1/(D+1)} ) < 0\n \\]\n then our upper bound decays exponentially in $d(x,y)$, and so we are done.\n\\end{proof}\n\nNow, assuming that $G$ has strict polynomial growth, we construct suitable families $\\{B_i\\}, \\{\\tilde{B}_i\\}$.\nFirst, for each $R$, choose a maximal subset $\\{ o^R_i \\}_i \\subset V$ which is $R$-separated, that is, such that\nif $i \\ne j$, then $d(o_i,o_j) \\ge R$. Also fix an arbitrary well-ordering on the indices $i$.\nMaximality implies that for each vertex $v \\in V$,\nthere exists some $i$ such that $d(o_i,v) \\le R$. 
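As a simple illustration of such a set (an example chosen here for concreteness, not one used in the arguments below): in the standard Cayley graph of $\\mathbb{Z}$, with $R$ a positive integer, the set $\\{ o_k := kR : k \\in \\mathbb{Z} \\}$ is a maximal $R$-separated set, since\n\\[\n d(o_k, o_{k'}) = |k - k'| R \\ge R \\mbox{ for } k \\ne k', \\qquad \\min_k d(o_k, v) \\le \\lfloor R/2 \\rfloor < R \\mbox{ for all } v \\in \\mathbb{Z},\n\\]\nso no further vertex can be added without violating the separation.\n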
For each $i$, let $B_i$ be the ``Voronoi tile'' containing $o_i$,\nthat is, set \n\\[\n B_i := \\{ v \\in V : d(o_i,v) < d(o_j,v) \\mbox{ for all } j < i, d(o_i,v) \\le d(o_j,v) \\mbox{ for all } j \\ge i \\}.\n\\]\n($B_i$ consists of the vertices which are closer to $o_i$ than to any other $o_j$, but we ``break ties'' when $v$\nis equidistant from $o_i$ and $o_j$ using the ordering on indices).\nWe see that $V = \\bigsqcup_i B_i$ and that $\\sup_i \\mathrm{diam} B_i \\le 2R$ (since each $B_i \\subset B(o_i,R)$).\nNext, we fix $0 < \\Sigma < \\infty$ (a scaling parameter that will be chosen to suit our separate constructions below).\nWe have the following:\n\\begin{prop} \\label{prop:voronoiworks}\n Taking $B_i$ to be the Voronoi tiles and $\\tilde{B}_i := E(B(o_i, \\Sigma R))$ gives families satisfying the hypotheses of \n Lemma \\ref{lem:peierls}. That is, the associated graphs $G^R$ and $\\tilde{G}^R$ both have degree uniformly bounded in $R$.\n\\end{prop}\n\\begin{proof}\n Fixing some $o_i$, we have\n \\begin{align*}\n \\{ j : B_j \\sim B_i \\mbox{ in } G^R \\} \\subset \\{ j : d(o_i,o_j) \\le 2R + 1 \\} \\subset \\{ j : B_j \\subset B(o_i,3R+1) \\},\n \\end{align*}\n as well as\n \\begin{align*}\n \\{j : B_j \\sim B_i \\mbox{ in } \\tilde{G}^R \\} \\subset \\{ j : d(o_i,o_j) \\le 2\\Sigma R \\} \\subset \\{ j : B_j \\subset B(o_i, (2\\Sigma + 1)R) \\}.\n \\end{align*}\n Thus in order to bound both degrees it suffices to show that, given any constant $\\Sigma'$, the quantity\n \\[\n \\# \\{ j : B_j \\subset B(o_i, \\Sigma' R) \\}\n \\]\n is uniformly bounded in both $i$ and $R$. To this end, note that, since the $o_i$ are $R$-separated, it follows that\n $B(o_i, (R/2) - 1) \\subset B_i$. 
So using our volume bounds and the fact that the $B_j$ are disjoint we have\n \\begin{align*}\n \\# \\{ j : B_j \\subset B(o_i, \\Sigma' R) \\} c_1[(R/2)-1]^d \\le \\sum_{B_j \\subset B(o_i, \\Sigma' R)} |B_j|\n \\le |B(o_i, \\Sigma' R)| \\le C_1(\\Sigma' R)^d,\n \\end{align*}\n so that\n \\[\n \\# \\{ j : B_j \\subset B(o_i, \\Sigma' R) \\} \\le \\frac{C_1(\\Sigma' R)^d}{c_1[(R/2)-1]^d} \\tendsto{R}{\\infty} \\frac{C_1}{c_1}(2 \\Sigma')^d,\n \\]\n so we are done.\n\\end{proof}\n\\begin{rmk}\n This is actually the only point in the proof where we use strict polynomial growth. In every other part of the proof,\n we will only use that we have a uniform strictly subexponential volume bound and bounded degree (which\n is equivalent to a uniform bound on $|B(v,1)|$).\n If one could find a suitable family for more general subexponential growth graphs, the methods in this paper would\n immediately show that such graphs are vdBK if and only if they admit detours.\n However, constructing such a family would take some ingenuity; if for instance we attempted to do the Voronoi\n construction for a graph with growth of order $e^{\\sqrt{R}}$, the degree bounds we get from the above analysis\n are superpolynomial, and even using the stronger form of Lemma \\ref{lem:peierls} (see Remark \\ref{rmk:strongerpeierls})\n will require at least a strictly sublinear bound on the degree of $\\tilde{G}^R$ to use our geometric constructions below,\n where the failure probabilities have order $-\\log \\rho(R) \\sim R$.\n\\end{rmk}\n\nLastly, let us use these lemmata to prove the following, which will be very important to our later constructions.\n\\begin{lemma} \\label{lem:bddawayfrominf}\n Let $G$ be a graph with strict polynomial growth, and suppose that $\\nu$ is exponential-subcritical.\n \n Then, there exist $q, c>0$ such that, for all $x,y \\in V$ with $d(x,y)$ sufficiently large, \n \\[\n \\mathbb{P}( T(x,y) < (\\inf + q)d(x,y) ) \\le e^{-cd(x,y)}.\n \\]\n\\end{lemma}\n\\begin{rmk}\n 
The conclusion of the above lemma also holds for \\emph{any} graph $G$ of degree at most $D$ if one assumes $\\nu(\\{\\inf\\}) < 1/D$;\n this is proved in the course of proving Lemma A.1 in \\cite{Tessera}.\n\\end{rmk}\n\\begin{proof}\n First, suppose $\\inf = 0$; since $\\nu$ is exponential-subcritical, $\\nu(\\{0\\}) < \\underline{p_c}$, and we can pick $q' > 0$ sufficiently small that\n $\\nu([\\inf,\\inf+q']) < \\underline{p_c}$. Then, by the definition of $\\underline{p_c}$,\n there is some $c'>0$ such that for any $R$ sufficiently large, for any $v \\in V$,\n \\[\n \\mathbb{P}(v \\mbox{ is connected to } B(v,R)^c \\mbox{ by a path of edges which each have weight} < \\inf + q')\n \\le e^{-c'R}.\n \\]\n In particular, for any $\\Sigma \\ge 2$, we have that \n \\begin{align*}\n \\mathbb{P}(\\exists p \\in S(v, \\Sigma R), x \\in B(v,R), \\mbox{ path } \\alpha:p \\to x \\mbox{ in } B(v,\\Sigma R) \n \\mbox{ s.t. } w(e) < \\inf + q' \\mbox{ for all } e \\in \\alpha) \\\\\n \\le \\mathbb{P}( \\exists x \\in B(v,R), p' \\in S(x, (\\Sigma - 1)R), \\mbox{ path } \\alpha:x \\to p' \\mbox{ s.t. 
}\n w(e) < \\inf + q' \\mbox{ for all } e \\in \\alpha) \\\\\n \\le |B(v,R)| e^{-c'(\\Sigma - 1) R} \\le C_1R^d e^{-c'(\\Sigma -1)R} \\tendsto{R}{\\infty} 0.\n \\end{align*}\n In particular,\n \\begin{align*}\n \\inf_{v \\in V} \\mathbb{P}(\\mbox{all paths from } S(v,\\Sigma R) \\mbox{ to } B(v,R) \\mbox{ contain at least one edge of weight } \\ge \\inf + q')\n \\tendsto{R}{\\infty} 1,\n \\end{align*}\n and so by Lemma \\ref{lem:peierls}, for all sufficiently large $R$, there exist $c_2(R) >0, \\epsilon_2(R) > 0$ such that\n for all sufficiently large $d(x,y)$,\n \\[\n \\mathbb{P} \\left( \n \\begin{array}{l} \\exists \\gamma:x \\to y \\mbox{ visiting at most } c_2 d(x,y) \\mbox{ distinct } B_i \\mbox{ such that } \\\\\n \\mbox{ all paths from } S(o_i,\\Sigma R) \\mbox{ to } B(o_i,R) \\mbox{ contain at least one} \\\\\n \\mbox{ edge of weight } \\ge \\inf + q' \\end{array}\n \\right)\n \\le e^{-\\epsilon_2 d(x,y)}.\n \\]\n Now, each $B(o_i, \\Sigma R)$ intersects at most $D'$ other $B(o_{i'}, \\Sigma R)$ by Proposition \\ref{prop:voronoiworks}, \n and so if a path $\\gamma$ visits at least $c_2 d(x,y)$ $B_i$ such that all paths from $S(o_i,\\Sigma R)$ to $B(o_i,R)$ contain\n at least one edge with weight at least $\\inf + q'$, there is some collection of at least $\\frac{1}{D'+1} c_2 d(x,y)$ \\emph{disjoint}\n $B(o_i, \\Sigma R)$ with this property such that $\\gamma$ visits $B_i$. If $x \\notin B(o_i,\\Sigma R)$, then $\\gamma$\n starts outside of $B(o_i, \\Sigma R)$; since $\\gamma$ visits $B_i \\subset B(o_i,R)$, some subpath of $\\gamma$\n joins $S(o_i,\\Sigma R)$ to $B(o_i,R)$, and so some edge of $\\gamma \\cap E(B(o_i, \\Sigma R))$ has weight at least $\\inf + q'$.\n So by disjointness we conclude that $\\gamma$ has at least $\\frac{c_2}{D'+1} d(x,y) - 1$ edges of weight at least $\\inf + q'$,\n and so\n \\[\n T(\\gamma) \\ge (\\inf) d(x,y) + q' \\left( \\frac{c_2}{D'+1} d(x,y) - 1 \\right)\n \\]\n in this case. 
So taking $q := q' c_2/(2(D'+1))$, we see that whenever $d(x,y) \\ge 2(D'+1)/c_2$ we have\n \\[\n \\mathbb{P}(T(\\gamma) < (\\inf + q)d(x,y)) \\le e^{-\\epsilon_2 d(x,y)},\n \\]\n and the lemma follows.\n \n Now, suppose that $\\inf > 0$. Then choose $q' > 0$ such that $\\nu([\\inf, \\inf + q']) < \\vec{\\underline{p_c}}$ to obtain $c' > 0$ such that for any $R$ sufficiently large,\n for any $v \\in V$ we have \n \\[\n \\mathbb{P} \\left(\n \\begin{array}{c}\n v \\mbox{ is connected to } B(v,R)^c \\mbox{ by an edge-geodesic path} \\\\ \n \\mbox{of edges which each have weight} < \\inf + q'\n \\end{array}\n \\right)\n \\le e^{-c'R}.\n \\]\n Then arguing similarly as above, by Lemma \\ref{lem:peierls}, for all sufficiently large $R$, there exist $c_2(R) >0, \\epsilon_2(R) > 0$ such that\n for all sufficiently large $d(x,y)$,\n \\[\n \\mathbb{P} \\left( \n \\begin{array}{l} \\exists \\gamma:x \\to y \\mbox{ visiting at most } c_2 d(x,y) \\mbox{ distinct } B_i \\mbox{ such that } \\\\\n \\mbox{ all edge-geodesic paths from } S(o_i,\\Sigma R) \\mbox{ to } B(o_i,R) \\mbox{ contain} \\\\ \n \\mbox{ at least one edge of weight } \\ge \\inf + q' \\end{array} \\\\\n \\right)\n \\le e^{-\\epsilon_2 d(x,y)}.\n \\]\n Similar to above, we then see that (except on an exponentially small event) every path $\\gamma$ from $x$ to $y$ contains at least\n $\\frac{c_2}{D'+1} d(x,y) - 1$ disjoint subpaths which are either not edge-geodesic, or contain an edge of weight at least $\\inf + q'$.\n Each such subpath $\\gamma_i$ has passage time $T(\\gamma_i) \\ge (\\inf)|\\gamma_i| + \\min(\\inf, q')$.\n So taking $q := \\min(q', \\inf) c_2/(2(D'+1)) > 0$ and $c = \\epsilon_2$ gives the lemma.\n \n\\end{proof}\n\\begin{rmk}\n This is the only part of the proof where we use the exponential subcriticality of $\\nu$. 
\n \\end{rmk}\n\\begin{rmk}\n This lemma implies in particular that if $G$ is a Cayley graph of a finitely generated virtually nilpotent group and if $\\nu(\\{0\\}) < p_c$,\n then there exists $a>0$ such that for all $x,y \\in V$, $\\mathbb{E} T(x,y) \\ge a d(x,y)$.\n This means, for instance, that the results of \\cite{BenjaminiTessera} giving the existence of a scaling limit apply when\n $\\nu$ has an exponential moment and $\\nu(\\{0\\}) < p_c$ (a weaker condition than the condition $\\nu(\\{0\\}) < 1/D$ quoted in that paper).\n\\end{rmk}\n\n\n\n\n\n\n\n\n\\subsection{Proof strategy: a resampling scheme}\nNote that if we have any family of events $A_i^R$ as in Lemma \\ref{lem:peierls}, we have that in particular\n\\begin{align*}\n &\\sum_i \\mathbb{P}( \\{\\mbox{the geodesic } \\pi:x \\to y \\mbox{ visits } B^R_i\\} \\cap A^R_i ) \\\\\n = &\\mathbb{E}[ \\# B_i \\mbox{ such that } \\pi \\mbox{ visits } B_i \\mbox{ and } A^R_i \\mbox{ holds} ] \\\\\n \\ge &(c_2 d(x,y)) \\mathbb{P}( \\pi \\mbox{ visits at least } c_2d(x,y) \\mbox{ } B_i \\mbox{ such that } A^R_i \\mbox{ holds}) \\\\\n \\ge &(c_2 d(x,y)) (1 - e^{-\\epsilon_2 d(x,y)}) \\ \\lower4pt\\hbox{$\\buildrel{\\displaystyle >}\\over\\sim$}\\ d(x,y).\n\\end{align*}\nWe will say that $\\pi$ \\emph{crosses} $B(o_i, \\Sigma R)$ if $\\pi$ starts at a vertex outside $B(o_i, \\Sigma R)$, ends at a vertex outside\n$B(o_i, \\Sigma R)$, and visits $B_i$. 
Since the number of $o_i$ such that $x \\in B(o_i, \\Sigma R)$ or $y \\in B(o_i, \\Sigma R)$\nis bounded independently of $x$ and $y$, we see also from the above that\n\\[\n \\sum_i \\mathbb{P}( \\{\\mbox{the geodesic } \\pi:x \\to y \\mbox{ crosses } B(o_i, \\Sigma R) \\} \\cap A^R_i ) \\ \\lower4pt\\hbox{$\\buildrel{\\displaystyle >}\\over\\sim$}\\ d(x,y).\n\\]\n\nThus, if we find a family of events $\\{A^R_i\\}$ such that for each $i$, $\\mathbb{P}( B(o_i, \\Sigma R) \\mbox{ contains a feasible pair})$ is at least \na positive constant (independent of $x,y,i$, but possibly depending on $R$) times \n\\\\ $\\mathbb{P}( \\{ \\pi \\mbox{ crosses } B(o_i, \\Sigma R) \\} \\cap A^R_i )$, we will have\n\\[\n \\sum_i \\mathbb{P}( B(o_i, \\Sigma R) \\mbox{ contains a feasible pair for } \\pi: x \\to y ) \\ \\lower4pt\\hbox{$\\buildrel{\\displaystyle >}\\over\\sim$}\\ d(x,y),\n\\]\nand hence by Lemma \\ref{lem:feasibletovdBK} we will have our theorem. (Note that here the role of the $B_i$ from \nLemma \\ref{lem:feasibletovdBK} is played by $B(o_i, \\Sigma R)$, not the Voronoi tiles $B_i$ we defined in the last section).\n\nWe will obtain a bound of the form \n\\[\n \\mathbb{P}(B(o_i, \\Sigma R) \\mbox{ contains a feasible pair}) \\ge c(R) \\mathbb{P}( \\{ \\pi \\mbox{ crosses } B(o_i, \\Sigma R) \\} \\cap A^R_i )\n\\]\nby introducing a resampling scheme, as in \\cite{vdBK}. Explicitly, fix some $o_i$;\nthroughout the rest of the paper, we abbreviate $B(s) := B(o_i,s)$. Define new random weights \n$w^*:E \\to [0,\\infty)$ as follows: $w^*|_{E(B(\\Sigma R))^c} = w|_{E(B(\\Sigma R))^c}$, but the $w^*(e), e \\in E(B(\\Sigma R))$\nare i.i.d. $\\nu$-distributed random variables, also independent of $w$. 
(Recall that for $S \\subset V$, we define $E(S) \\subset E$ to\nbe the set of edges of $G$ with endpoints lying in $S$).\nNote that $w$ and $w^*$ are equal in distribution.\nFor each $R$ we will define a $w$-measurable random set of configurations $E_w \\subset [0, \\infty)^{E(B(\\Sigma R))}$ such that\n\\begin{equation} \\label{eq:resampleevents}\n \\{ \\pi \\mbox{ crosses } B(o_i, \\Sigma R) \\} \\cap A_i^R \\cap \\{ w^*|_{E(B(\\Sigma R))} \\in E_w \\} \n \\subset \\{ B(\\Sigma R) \\mbox{ contains a feasible pair for } \\pi^* \\},\n\\end{equation}\nwhere $\\pi$ is the $T$-geodesic from $x$ to $y$ and $\\pi^*$ is the $T^*$-geodesic from $x$ to $y$.\nTo reduce clutter, let us abbreviate the event $\\{ w^*|_{E(B(\\Sigma R))} \\in E_w\\}$ by $\\{ w^* \\in E_w\\}$.\nIf in addition we ensure that the conditional probability $\\mathbb{P}( w^* \\in E_w | w) \\ge c(R) > 0$ on the event\n$\\{ \\pi \\mbox{ crosses } B(\\Sigma R) \\} \\cap A_i^R$ (where $c(R)$ is some non-random constant), we get\n\\begin{align*}\n \\mathbb{P}( B(\\Sigma R) \\mbox{ contains a feasible pair for } \\pi )\n &= \\mathbb{P}( B(\\Sigma R) \\mbox{ contains a feasible pair for } \\pi^* ) \\\\\n &\\ge \\mathbb{P}( \\{ \\pi \\mbox{ crosses } B(o_i, \\Sigma R) \\} \\cap A_i^R \\cap \\{ w^* \\in E_w \\} )\\\\\n &= \\mathbb{E}\\left[ \\mathbbm{1}_{\\{ \\pi \\mbox{ crosses } B(o_i, \\Sigma R) \\} \\cap A_i^R} \\mathbb{E}[ \\mathbbm{1}_{\\{ w^* \\in E_w \\}} | w ]\\right] \\\\\n &\\ge c(R) \\mathbb{P}( \\{ \\pi \\mbox{ crosses } B(o_i, \\Sigma R) \\} \\cap A^R_i ),\n\\end{align*}\nas desired. 
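For the reader's convenience, the last two steps of the preceding display can be unpacked as follows; this is only a sketch, and the shorthand $F$ and $G$ is introduced for this remark alone. Write $F := \\{ \\pi \\mbox{ crosses } B(o_i, \\Sigma R) \\} \\cap A_i^R$ and $G := \\{ w^* \\in E_w \\}$. Assuming, as in the setup above, that $F$ is $w$-measurable, the tower property of conditional expectation gives\n\\[\n \\mathbb{P}(F \\cap G) = \\mathbb{E}\\left[ \\mathbb{E}[ \\mathbbm{1}_F \\mathbbm{1}_G | w ] \\right] = \\mathbb{E}\\left[ \\mathbbm{1}_F \\, \\mathbb{P}(G | w) \\right] \\ge c(R) \\mathbb{P}(F),\n\\]\nwhere the final inequality uses that $\\mathbb{P}(G | w) \\ge c(R)$ on $F$.\n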
The discussion in this section is summarized in the following proposition:\n\\begin{prop} \\label{prop:conditionaltovdBK}\n Suppose there exist $w$-measurable events $A_i^R$ satisfying the conditions of Lemma \\ref{lem:peierls} \n and $w$-measurable random sets of configurations $E_w$ such that for sufficiently large $R$ \n \\eqref{eq:resampleevents} holds and\n $\\mathbb{P}(w^* \\in E_w | w) \\ge c(R)$ on the event $\\{ \\pi \\mbox{ crosses } B(\\Sigma R) \\} \\cap A_i^R$,\n where $c(R) > 0$ is a constant depending only on $R, \\nu, \\tilde{\\nu},$ and $G$. Then \n \\[\n \\liminf_{d(x,y) \\to \\infty} \\frac{\\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y)}{d(x,y)} > 0.\n \\]\n\\end{prop}\nThus the meat of the proof of Theorem \\ref{thm:polygrowthvdBK} \nconsists of performing a ``geometric'' construction to obtain suitable $A_i^R$ and $E_w$.\n\n\\subsection{Geometric construction: bounded case} \\label{bddsupp}\nFirst, suppose that $\\nu$ has bounded support. We want to construct $A_i^R$ and $E_w$ satisfying the hypotheses\nof Proposition \\ref{prop:conditionaltovdBK}.\nDenote by $\\inf$ the infimum of the support of $\\nu$\nand denote by $\\sup$ the supremum of the support of $\\nu$.\nAssume that \\eqref{eq:extraassumption} holds, and then choose $\\epsilon>0, y_0, I_0$\nas in Lemma \\ref{lem:technicallemma}. \nAssuming that $G$ admits detours, let $C$ be such that every self-avoiding path of length $C$\nadmits a self-avoiding $\\epsilon$-detour. Set $C' := C(3 + 2\\epsilon)$.\nAssume that $\\nu$ is exponential-subcritical, and\nthen let $q>0$ be the parameter given by Lemma \\ref{lem:bddawayfrominf}.\nDenote by $D$ the maximum degree of $G$.\n\nFirst, let us consider the case that $y_0 = \\sup$; in fact this allows us to do a much simpler construction. 
\nIn this case, choose $\\Sigma > 2$ large enough that $(\\inf + q)\\left(1 - \\frac{1}{\\Sigma} \\right) > \\inf$ and\nchoose $\\delta > 0$ such that $\\inf + \\delta < (\\inf + q)\\left(1 - \\frac{1}{\\Sigma}\\right)$.\nChoose a sequence $\\delta_{\\sup}(R) \\tendsto{R}{\\infty} 0$ such that for each $R$ \n$\\nu([\\sup - \\delta_{\\sup}(R), \\sup]) > 0$ but $\\lim_{R \\to \\infty} \\nu([\\inf,\\sup - \\delta_{\\sup}(R)])^{D C_1 (\\Sigma R)^d} = 1$.\nThen let $A_i^R := A_1 \\cap A_2$ where $A_1$ and $A_2$ are as follows:\n\\[\n A_1 := \\left\\{ \\begin{array}{c} \n \\mbox{For all vertices } v,w \\in B(\\Sigma R) \\mbox{ with } d(v,w) \\ge R, \\\\ \n \\mbox{ all paths } \\gamma \\mbox{ from } v \\mbox{ to } w \n \\mbox{ in } B(\\Sigma R) \\mbox{ have } T(\\gamma) \\ge (\\inf + q)d(v,w)\n \\end{array} \\right\\}.\n\\]\n\\[\n A_2 := \\left\\{ w(e) \\le \\sup - \\delta_{\\sup} \\mbox{ for all } e \\in E(B(\\Sigma R)) \\right\\}.\n\\]\nWe see that both events only depend on the weights of edges in $B(\\Sigma R)$, by choice of $\\delta_{\\sup}(R)$ we\nhave $\\mathbb{P}(A_2) \\tendsto{R}{\\infty} 1$, and by Lemma \\ref{lem:bddawayfrominf} we have that for sufficiently large $R$\n\\[\n \\mathbb{P}(A_1^c) \\le \\sum_{\\substack{v,w \\in B(\\Sigma R),\\\\ d(v,w) \\ge R}} \\mathbb{P}( T(v,w) < (\\inf + q) d(v,w) )\n \\le (C_1 R^d)^2 e^{-\\epsilon_2 R} \\tendsto{R}{\\infty} 0\n\\]\nuniformly in $i$, so the hypotheses of Lemma \\ref{lem:peierls} are satisfied.\n\nNow in this case the set of configurations $E_w$ does not actually depend on $w$; we simply set\n\\[\n E_w := \\left\\{ \\omega \\in [0,\\infty)^{E(B(\\Sigma R))} : \\omega(e) \\in \n \\begin{array}{lc}\n [\\inf, \\inf + \\delta) &\\mbox{ if } e \\in E(B(\\Sigma R - C')), \\\\ \\relax\n [\\sup - \\delta_{\\sup}, \\sup] \\cap I_0 &\\mbox{ otherwise} \n \\end{array} \\right\\}.\n\\]\n\n\\begin{figure}[t]\n \\centering\n \\includegraphics[scale=.5]{boundedconstruction1_cropped}\n \\caption{A schematic diagram of the 
prescribed set of configurations $E_w$ in the case\n that $\\nu$ has bounded support and $y_0 = \\sup$.}\n \\label{fig:boundedconstruction1}\n\\end{figure}\n\nLet us show that, for sufficiently large $R$, \non the event $\\{ \\pi \\mbox{ crosses } B(\\Sigma R) \\} \\cap A_1 \\cap A_2 \\cap \\{w^* \\in E_w\\}$, $B(\\Sigma R)$ contains\na feasible pair for any $T^*$-geodesic.\n\nFor a subset $S \\subset E$, denote by $T_S(p,q)$ the infimal weight of a path from $p$ to $q$ which only uses edges lying in $S$.\nFirst, let $a$ and $b$ be points of $S(\\Sigma R)$ such that $T_{E(B(\\Sigma R))^c}(x,a)$ and $T_{E(B(\\Sigma R))^c}(b,y)$\nare infimal. Fix a $T$-geodesic $\\alpha \\subset E(B(\\Sigma R))^c$ from $x$ to $a$, an edge geodesic $[a,o_i]$ from $a$ to $o_i$,\nan edge-geodesic $[o_i,b]$ from $o_i$ to $b$, and a $T$-geodesic $\\beta \\subset E(B(\\Sigma R))^c$ from $b$ to $y$,\nand define $\\pi' := \\alpha * [a,o_i] * [o_i,b] * \\beta$.\nWe claim that $T^*(\\pi') < T(\\pi)$ when $R$ is sufficiently large. 
To see this, first note that, if $v$ and $w$ are the first\nand last vertices of $\\pi$ lying on $S(\\Sigma R)$, we have\n\\[\n T^*(\\pi'_{x,a}) + T^*(\\pi'_{b,y}) = T(\\pi'_{x,a}) + T(\\pi'_{b,y}) \\le T(\\pi_{x,v}) + T(\\pi_{w,y}),\n\\]\nwhere here and elsewhere, for a path $\\gamma$ and vertices $p,q \\in V(\\gamma)$, $\\gamma_{p,q}$ denotes the\nsubpath of $\\gamma$ starting at $p$ and ending at $q$.\n\nNext, since $\\pi$ crosses $B_i$, $\\pi_{v,w}$ contains at least two subsegments connecting $S(\\Sigma R)$ and $S(R)$,\nand so since $A_1$ holds we have\n\\[\n T(\\pi_{v,w}) \\ge 2(\\inf + q)(\\Sigma - 1)R = (\\inf + q)\\left(1 - \\frac{1}{\\Sigma}\\right) 2 \\Sigma R,\n\\]\nwhile if $w^* \\in E_w$, we have\n\\[\n T^*(\\pi'_{a,b}) \\le 2 \\Sigma R(\\inf + \\delta) + (\\sup)C'.\n\\]\nSince by construction $\\inf + \\delta < (\\inf + q)\\left(1 - \\frac{1}{\\Sigma}\\right)$ and $(\\sup)C' = o(R)$,\nfor sufficiently large $R$ we have $T^*(\\pi') < T(\\pi)$.\n\nNow, consider a $T^*$-geodesic $\\pi^*$ from $x$ to $y$. On our event, we have $w^* \\ge w$ on $E(B(\\Sigma R - C'))^c$,\nso if $\\pi^*$ did not intersect $E(B(\\Sigma R - C'))$, we would have $T^*(\\pi^*) \\ge T(\\pi) > T^*(\\pi')$, a contradiction.\nThus, $\\pi^*$ must visit $B(\\Sigma R - C')$. In particular, it contains a subpath connecting $S(\\Sigma R)$ and $S(\\Sigma R - C')$,\nand so a subpath connecting $S(\\Sigma R - C(1+\\epsilon))$ and $S(\\Sigma R - C' + C(1 + \\epsilon))$, which must\nhave length at least $C' - 2C(1+\\epsilon) = C$. Choose a self-avoiding $\\epsilon$-detour $\\gamma$ for such a segment.\nSince $\\gamma$ has length at most $C(1+\\epsilon)$, it is contained in $E(B(\\Sigma R)) \\setminus E(B(\\Sigma R - C'))$.\nBut since $w^* \\in E_w$, this means that all the edges of both $\\gamma$ and the subsegment of $\\pi^*$ \nhave weights in $I_0$. 
Hence $B(\\Sigma R)$ contains a feasible pair for $\\pi^*$.\n\nFurthermore, since $y_0 = \\sup$, by the construction of $I_0$ we have\n\\[\n \\mathbb{P}( w^* \\in E_w ) \\ge \\min \\left( \\nu([\\inf, \\inf + \\delta)), \\nu([\\sup - \\delta_{\\sup}, \\sup] \\cap I_0) \\right)^{D C_1 (\\Sigma R)^d } > 0\n\\]\nindependent of $o_i$, so both hypotheses of Proposition \\ref{prop:conditionaltovdBK} hold.\n\nNow we suppose that $y_0 < \\sup$ and do a different construction of the $A_i^R$ and $E_w$. \nAgain take $\\epsilon, y_0, I_0, C, C', q$ as above.\nThen take some large $\\Sigma_0 > 2$ such that\n\\[\n \\inf < \\left(1 - \\frac{1}{\\Sigma_0} \\right) (\\inf + q) < \\sup;\n\\]\nthen take some $\\delta_0 > 0$ sufficiently small that\n\\[\n \\inf + \\delta_0 < \\left(1 - \\frac{1}{\\Sigma_0} \\right) (\\inf + q) < \\sup,\n\\]\n\\[\n \\sup - \\mathbb{E} w - 2\\delta_0 > 0,\n\\]\nand\n\\[\n \\sup - y_0 - 2\\delta_0 > 0.\n\\]\n(Note that $\\mathbb{E} w < \\sup$ since in the case that $\\nu$ is Dirac, $y_0 = \\sup$).\nNext, fix some $0 < s < \\left( 1 - \\frac{1}{\\Sigma_0} \\right) \\frac{(\\inf+q)}{\\sup}$ such that\n\\[\n (\\inf + \\delta_0) + s \\sup < \\left(1 - \\frac{1}{\\Sigma_0}\\right) (\\inf + q).\n\\]\nThen fix some $\\Sigma \\ge \\Sigma_0$ such that $s\\Sigma > 1$.\nAlso fix\nsome $0 < \\kappa < \\frac{\\sup - \\delta_0 - \\mathbb{E} w }{ \\sup - \\inf } s$.\n\nThe event $A_i^R$ will be defined as the intersection of three events $A_1 \\cap A_2 \\cap A_3$.\nWe set\n\\[\n A_1 := \\left\\{ \\begin{array}{c} \n \\mbox{For all vertices } v,w \\in B(\\Sigma R) \\mbox{ with } d(v,w) \\ge R, \\\\ \n \\mbox{ all paths } \\gamma \\mbox{ from } v \\mbox{ to } w \n \\mbox{ in } B(\\Sigma R) \\mbox{ have } T(\\gamma) \\ge (\\inf + q)d(v,w)\n \\end{array} \\right\\},\n\\]\njust as in the first case.\nWe set\n\\[\n A_2 := \\left\\{ \\begin{array}{c} \n \\mbox{For all vertices } v,w \\in B(\\Sigma R) \\mbox{ with } d_{E(B(\\Sigma R))}(v,w) \\ge R, \\\\ \n T_{E(B(\\Sigma 
R))}(v,w) \\le (\\mathbb{E} w + \\delta_0) d_{E(B(\\Sigma R))}(v,w)\n \\end{array} \\right\\}.\n\\]\nFor this, note that for each fixed pair of points $v,w$ with $d_{E(B(\\Sigma R))}(v,w) \\ge R$, fixing an edge-minimal path $\\gamma: v \\to w$ in $B(\\Sigma R)$,\nwe have \n\\begin{align*}\n \\mathbb{P}(T_{E(B(\\Sigma R))}(v,w) > (\\mathbb{E} w + \\delta_0) d_{E(B(\\Sigma R))}(v,w)) \n &\\le \\mathbb{P}( T(\\gamma) > (\\mathbb{E} w + \\delta_0) |\\gamma| ), \\\\\n\\end{align*}\nwhich, since $T(\\gamma)$ is just a sum of $|\\gamma|$ i.i.d. $\\nu$-distributed random variables, decays exponentially in $|\\gamma|$, (hence $R$),\nby a standard Chernoff bound ($\\nu$ has bounded support and hence exponential moments).\nSince the number of pairs of such $(v,w)$ is strictly subexponential in $R$, we have $\\mathbb{P}(A_2^c) \\tendsto{R}{\\infty} 0$, as desired.\nClearly also $A_2$ only depends on the weights of edges in $B(\\Sigma R)$.\n\nLastly we choose for each $R$ some $0 \\le \\delta_{\\sup}(R) < \\delta_0$ such that $\\nu([\\sup - \\delta_{\\sup}, \\sup]) > 0$\nand $\\nu([\\inf,\\sup - \\delta_{\\sup}(R)])^{D C_1 (\\Sigma R)^d} \\tendsto{R}{\\infty} 1$, and then set\n\\[\n A_3 := \\left\\{ w(e) \\le \\sup - \\delta_{\\sup} \\mbox{ for all } e \\in E(B(\\Sigma R)) \\right\\}.\n\\]\nClearly $A_3$ only depends on edges in $B(\\Sigma R)$ and by our construction of $\\delta_{\\sup}(R)$, \nwe have $\\mathbb{P}(A_3) \\tendsto{R}{\\infty} 1$ uniformly in $i$, as desired.\n\nNow, let $a',b' \\in S(\\Sigma R)$ be such that $T_{E(B(\\Sigma R))^c}(x,a')$ and $T_{E(B(\\Sigma R))^c}(b',y)$ are minimal.\nChoose edge geodesics $[a',o_i]$ and $[b',o_i]$. 
Let $a \\in V([a',o_i]), b \\in V([b',o_i])$ be the unique vertices such that\n$d(a,a'), d(b,b') = \\lceil s \\Sigma R \\rceil$.\nMoreover, for each $t \\in [0,\\Sigma R - \\lceil s \\Sigma R \\rceil] \\cap \\mathbb{Z}$, let $a_t \\in V([a,o_i]), b_t \\in V([b,o_i])$ be the unique vertices such that\n$d(a,a_t), d(b,b_t) = t$.\nNow, let $t_a \\ge 0$ be minimal such that\n\\[\n d(a_{t_a+1}, [b,o_i]) \\le 2C',\n\\]\nand let $t_b \\ge 0$ be minimal such that\n\\[\n d(b_{t_b+1}, [a,o_i]) \\le 2C',\n\\]\nand set $ c:= a_{t_a}$, $d:=b_{t_b}$.\nNote that minimality implies that for all $0 \\le t \\le t_a$ we have $d(a_t,[b,o_i]) \\ge 2C' + 1$ and for all $0 \\le t \\le t_b$\nwe have $d(b_t,[a,o_i]) \\ge 2C' + 1$.\nHere we have tacitly used the fact that $d(a,b) \\ge d(a',b') - 2\\lceil s \\Sigma R \\rceil \\ \\lower4pt\\hbox{$\\buildrel{\\displaystyle >}\\over\\sim$}\\ R$\nis strictly larger than $2C'$ for sufficiently large $R$. \nTo see the bound $d(a',b') - 2\\lceil s \\Sigma R \\rceil \\ \\lower4pt\\hbox{$\\buildrel{\\displaystyle >}\\over\\sim$}\\ R$, \nlet $v$ and $w$ be the entry and exit points from $B(\\Sigma R)$ of the $T$-geodesic $\\gamma: x \\to y$, and\nnote that\n\\[\n d(a',b') \\ge \\frac{1}{\\sup} T(a',b') \\ge \\frac{1}{\\sup} T(v,w) \\ge \\frac{\\inf + q}{\\sup} \\left(1 - \\frac{1}{\\Sigma}\\right) 2 \\Sigma R,\n\\]\nso \n\\[\n d(a,b) \\ge d(a',b') - 2\\lceil s\\Sigma R \\rceil \\ge \\left[ \\frac{\\inf + q}{\\sup} \\left(1 - \\frac{1}{\\Sigma}\\right) - s \\right](2 \\Sigma R) - 1,\n\\]\nwhich is $\\ \\lower4pt\\hbox{$\\buildrel{\\displaystyle >}\\over\\sim$}\\ R$ by choice of $s$.\nThe bound $T(a',b') \\ge T(v,w)$ comes from the fact that, since $v,w$ lie on the $T$-geodesic from $x$ to $y$,\n$T(x,y) = T(x,v) + T(v,w) + T(w,y) \\le T(x,a') + T(a',b') + T(b',y)$, and by definition of $a',b'$ we have\n$T(x,v) + T(w,y) \\ge T(x,a') + T(b',y)$.\nThe bound $T(v,w) \\ge (\\inf + q) \\left(1 - \\frac{1}{\\Sigma}\\right) 2 \\Sigma R$ comes from the fact that\n$\\pi$ crosses $B_i$ and hence contains at 
least two paths connecting $S(\\Sigma R)$ and $S(R)$,\nwhich, since $A_1$ holds, have total passage time at least $2(\\inf + q)(\\Sigma - 1)R$.\n\n\nNow consider the sets of integers\n\\[\n S_n(C',\\kappa) := \\{ n \\lfloor \\kappa \\Sigma R \\rfloor + j : j \\in [0,C'] \\cap \\mathbb{Z} \\} \\subset \\mathbb{Z}\n\\]\nand\n\\[\n S'_n(C',\\kappa) := \\{ n \\lfloor \\kappa \\Sigma R \\rfloor + j : j \\in [C(1+\\epsilon),C(2+\\epsilon)] \\cap \\mathbb{Z} \\} \\subset \\mathbb{Z},\n\\]\nwhere $n \\ge 0, n \\in \\mathbb{Z}$. Then let $\\alpha_n$ and $\\beta_n$ respectively be the subpaths of $[a,c]$ and $[b,d]$ respectively induced by the \nvertex sets $\\{ a_t : t \\in S_n \\}$ and $\\{ b_t : t \\in S_n \\}$. \nSimilarly let $\\alpha'_n$ and $\\beta'_n$ be induced by $\\{ a_t : t \\in S'_n \\}$ and $\\{ b_t : t \\in S'_n \\}$.\nFor each $n \\ge 0$ with $(n+1)\\lfloor \\kappa \\Sigma R \\rfloor \\le t_a, t_b$,\nfix a self-avoiding $\\epsilon$-detour $\\gamma_n$ for $\\alpha'_n$ and a self-avoiding $\\epsilon$-detour $\\delta_n$ for $\\beta'_n$.\nNote that by construction each $\\alpha_n \\cup \\gamma_n$ is disjoint from $[b,d]$ and all $\\beta_m \\cup \\delta_m$, and vice versa.\nMoreover, $\\alpha_n \\cup \\gamma_n$ is disjoint from $\\alpha_m \\cup \\gamma_m$ for $n \\ne m$, and the same is true for the\n$\\beta_n \\cup \\delta_n$.\n\nFinally, define \n\\[\n S_I := \\bigcup_{\\substack{n \\ge 0, \\\\ (n+1)\\lfloor \\kappa \\Sigma R \\rfloor \\le t_a, t_b}} (\\alpha_n \\cup \\gamma_n) \\cup (\\beta_n \\cup \\delta_n),\n\\]\ndefine\n\\[\n S_{\\inf} := ([a,c] \\cup [b,d]) \\setminus S_I,\n\\]\nand set $S_{\\sup} := E(B(\\Sigma R)) \\setminus (S_{\\inf} \\cup S_I)$.\nFor each $R$ we choose $0<\\delta_{\\inf}(R) < \\delta_0$ sufficiently small that \n$(D C_1 R^d +2)\\delta_{\\inf} < \\sup - \\delta_0 - y_0$.\nWe finally define our random set of configurations by\n\\begin{align*}\n E_w :=\n \\left \\{ \\omega \\in [0, \\infty)^{E(B(\\Sigma R))} : \n \\omega(e) \\in \n 
\\begin{array}{lc}\n I_0 \\cap (y_0 - \\frac{\\delta_{\\inf}}{2}, y_0 + \\frac{\\delta_{\\inf}}{2}) & e \\in S_I \\\\ \\relax\n [\\inf, \\inf + \\delta_{\\inf}] & e \\in S_{\\inf} \\\\ \\relax\n [\\sup - \\delta_{\\sup}, \\sup] & e \\in S_{\\sup}\n \\end{array} \n \\right \\}.\n\\end{align*}\n\n\\begin{figure}[t]\n \\centering\n \\includegraphics[scale=.65]{boundedconstruction2_cropped_fixed}\n \\caption{A schematic diagram of the prescribed set of configurations $E_w$ in the case\n that $\\nu$ has bounded support and $y_0 < \\sup$.}\n \\label{fig:boundedconstruction2}\n\\end{figure}\n\n\nNow let us prove that $A_1 \\cap A_2 \\cap A_3 \\cap \\{ \\pi \\mbox{ crosses } B_i \\} \\cap \\{ w^* \\in E_w \\}$\nis contained in the event that $B(\\Sigma R)$ contains a feasible pair with respect to $T^*$.\n\nFirst, define a path $\\pi'$ by taking a $T$-geodesic from $x$ to $a'$ in $B(\\Sigma R)^c$, then taking the path $[a',c]$,\ntaking an edge-geodesic from $c$ to $d$, taking $[d,b']$ and then taking a $T$-geodesic from $b'$ to $y$ in $B(\\Sigma R)^c$.\nFor all sufficiently large $R$, on the event $\\{\\pi \\mbox{ crosses } B_i\\} \\cap A_i^R \\cap \\{ w^* \\in E_w \\}$, we \nhave that $T^*(\\pi') < T(\\pi)$. To see this, first note that by definition of $a',b'$, if $v,w$ are the first entrance and last exit of $\\pi$ from $B(\\Sigma R)$\nthen we have $T(\\pi_{x,v}) + T(\\pi_{w,y}) = T(x,v) + T(w,y) \\ge T(x,a') + T(b',y) = T^*(\\pi'_{x,a'}) + T^*(\\pi'_{b',y})$.\nThus it suffices to show that $T(\\pi_{v,w}) > T^*(\\pi'_{a',b'})$ for sufficiently large $R$. 
Since $\\pi$ visits $B_i \\subset B(R)$\nand since $A_1$ holds we have\n\\[\n T(\\pi_{v,w}) \\ge (\\inf + q)2(\\Sigma - 1)R = (\\inf + q)\\left( 1 - \\frac{1}{\\Sigma} \\right) 2 \\Sigma R,\n\\]\nwhereas \n\\begin{align*}\n T^*(\\pi'_{a',b'}) &\\le \\left[ (\\inf + \\delta_{\\inf}) + \\frac{C'}{\\lfloor \\kappa \\Sigma R \\rfloor} (y_0 + \\delta_0) \\right](d(a,c) + d(b,d)) \n + (\\sup) (2s \\Sigma R + d(c,d)) \\\\\n &\\le (\\inf + \\delta_{\\inf} + (\\sup)s ) 2 \\Sigma R + o(R),\n\\end{align*}\nso this follows from our choice to ensure $\\inf + \\delta_{\\inf} + (\\sup) s < (\\inf + q)\\left( 1 - \\frac{1}{\\Sigma} \\right)$.\n(We get the bound $d(c,d) = o(R)$ as follows: assume that $t_a \\le t_b$; in the opposite case the argument is analogous.\nBy definition there exists some $t' \\ge t_b + 1$ such that $d(a_{t_a + 1}, b_{t'}) \\le 2C'$. But then \n\\[ |t' - (t_a + 1)| = |d(o_i, b_{t'}) - d(o_i, a_{t_a + 1})| \\le d(b_{t'}, a_{t_a + 1}) \\le 2C', \\] \nthat is, $t' \\le t_a + 1 + 2C' \\le t_b + 1 + 2C'$,\nand so \n\\[\n d(c,d) \\le d(c,a_{t_a +1}) + d(a_{t_a + 1}, b_{t'}) + d(b_{t'}, b_{t_b}) \\le 4C' + 2 = O(C') = o(R).)\n\\]\n\nNow, let $\\pi^*$ be a $T^*$-geodesic from $x$ to $y$. 
We show that $\\pi^*$ traverses a feasible pair.\n\nWe first show that if $p,q \\in V(\\pi^*) \\cap V(S_{\\inf})$ with $p$ and $q$ lying in the same connected component of $S_{\\inf} \\cup S_I$,\nthen $\\pi^*_{p,q} \\subset S_{\\inf} \\cup S_I $.\nTo see this, note that, when $w^* \\in E_w$, if $e$ is an edge in $[a,c]$ or $[b,d]$ with one endpoint in $S(t)$ and one in $S(t+1)$, then\n\\[\n w^*(e) \\le \\inf \\{ w^*(e') : e' \\mbox{ has one endpoint in } S(t) \\mbox{ and the other in } S(t+1) \\} + \\delta_{\\inf}.\n\\]\nThis is because, if $e \\in S_{\\inf}$, then $e' \\in S_{\\inf}$ or $e' \\in S_{\\sup}$ and if $e \\in S_I$ then $e' \\in S_I$ or $e' \\in S_{\\sup}$.\n\nSince every path from $p$ to $q$ must have at least one edge connecting $S(t)$ to $S(t+1)$ for all $t,t+1$\nbetween $d(a,p)$ and $d(a,q)$, we see that\n\\[\n T^*([p,q]) \\le T^*(\\alpha) + \\delta_{\\inf} |\\alpha|\n\\]\nfor any path $\\alpha$ from $p$ to $q$. If furthermore $\\alpha$ leaves $S_{\\inf} \\cup S_I$, then it contains at least one edge\nof weight at least $\\sup - \\delta_{\\sup}$; such an edge has weight at least $\\sup - \\delta_{\\sup} - y_0 - \\delta_{\\inf}$\ngreater than any edge in $[p,q]$. 
Hence in this case we get the bound\n\\[\n T^*([p,q]) + \\sup - \\delta_{\\sup} - y_0 - \\delta_{\\inf} \\le T^*(\\alpha) + \\delta_{\\inf} (|\\alpha| - 1).\n\\]\nBut applying our assumption on $\\delta_{\\inf}(R)$ we get\n\\[\n T^*(\\alpha) - T^*([p,q]) \\ge \\sup - \\delta_{\\sup} - y_0 - (|\\alpha| + 2)\\delta_{\\inf} \\ge \\sup - \\delta_0 - y_0 - (|B(\\Sigma R)|+2)\\delta_{\\inf}\n > 0.\n\\]\nThat is, such an $\\alpha$ is not optimal, and hence an optimal $T^*$-path $\\pi^*_{p,q}$ must lie in $S_{\\inf} \\cup S_I$.\n\nHence, if we can show that $V(\\pi^*)$ contains some $p$ and $q$ which lie in the same connected component of $S_{\\inf} \\cup S_I$\nbut lie in different components of $S_{\\inf}$, then we can apply the previous argument to deduce that\n$\\pi^*$ passes through some $\\alpha_n \\cup \\gamma_n$ or $\\beta_n \\cup \\delta_n$, and then use\nthe following proposition to conclude that $\\pi^*_{p,q}$ contains a feasible pair:\n\\begin{prop} \\label{prop:detourtodetour}\n Let $\\xi$ be an edge geodesic in $G$, and let $\\gamma$ be a self-avoiding $\\epsilon$-detour for a subpath of $\\xi$.\n Suppose that $w^*(e) \\in I_0$ for all $e \\in \\xi \\cup \\gamma$.\n Let $\\pi^*$ be a $T^*$-geodesic, and suppose that some subpath of $\\pi^*$ has the same endpoints as $\\xi$\n and that this subpath is contained in $\\xi \\cup \\gamma$. 
Then $\\xi \\cup \\gamma$ contains a feasible pair for $\\pi^*$.\\footnote{\n Technically we should include assumptions controlling the lengths of these paths to satisfy our definition of a feasible pair;\n in our applications of this proposition it is easy to see that the length of the detour is at most $C'(1+\\epsilon)$.}\n\\end{prop}\n\\begin{proof}[Proof of Proposition]\n Let $\\xi'$ be the subpath of $\\xi$ such that $\\gamma$ is an $\\epsilon$-detour for $\\xi'$, and write $\\xi = \\xi_1 * \\xi' * \\xi_2$.\n Let us also abuse notation and denote by $\\pi^*$ the subpath of $\\pi^*$ contained in $\\xi \\cup \\gamma$ which has the same endpoints as $\\xi$.\n If $\\pi^* = \\xi$,\n then $\\xi_1 * \\gamma * \\xi_2$ is an $\\epsilon$-detour for $\\pi^*$; loop-erasing then gives a \\emph{self-avoiding} $\\epsilon$-detour\n $\\gamma'$ for $\\xi$ (see the proof of Proposition\n \\ref{prop:detourequiv}), so $(\\pi^*, \\gamma')$ forms a feasible pair.\n If $\\pi^* \\ne \\xi$, then since $\\xi$ is an edge geodesic, $\\xi$ is a self-avoiding $\\epsilon$-detour for $\\pi^*$ (see the proof of Proposition\n \\ref{prop:detourequiv}), and hence $(\\pi^*, \\xi)$ forms a feasible pair.\n\\end{proof}\nSo it only remains to find such $p$ and $q$.\nThe idea is that, in order to make up for the slow edges $\\pi^*$ runs over when it enters and exits $B(\\Sigma R)$,\n$\\pi^*$ must visit many fast edges; we will then use the pigeonhole principle to conclude that it must contain suitable $p$ and $q$.\n\nExplicitly, first note that since $T^*(\\pi^*) \\le T^*(\\pi') < T(\\pi) \\le T(\\pi^*)$, we have $T(\\pi^*) - T^*(\\pi^*) > 0$.\nSince $w^* \\ge w$ on $E(B(\\Sigma R))^c \\cup S_{\\sup}$, $\\pi^*$ must therefore contain some edges in $S_I \\cup S_{\\inf}$.\nBut note that by construction, any path connecting $S(\\Sigma R)$ and $S_I \\cup S_{\\inf}$ contains\na subpath which lies in $S_{\\sup}$ and connects two points in $B(\\Sigma R)$ of distance at least $s \\Sigma R > R$.\nSince 
$\\pi^*$ starts and ends outside of $B(\\Sigma R)$ and visits $S_{\\inf} \\cup S_I$, it contains at least\ntwo such subpaths, $\\alpha$ and $\\beta$. We then have\n\\[\n T^*(\\alpha) \\ge (\\sup - \\delta_{\\sup}) |\\alpha|, T^*(\\beta) \\ge (\\sup - \\delta_{\\sup}) |\\beta|\n\\]\nand \n\\[\n T(\\alpha) \\le (\\mathbb{E} w) |\\alpha|, T(\\beta) \\le (\\mathbb{E} w) |\\beta|\n\\]\n(since $A_2$ holds). Since $w^* \\ge w$ on $S_{\\sup}$ we then have\n\\[\n T^*(\\pi^* \\cap S_{\\sup}) - T(\\pi^* \\cap S_{\\sup}) \\ge T^*(\\alpha \\cup \\beta) - T(\\alpha \\cup \\beta)\n \\ge (\\sup - \\mathbb{E} w - \\delta_{\\sup}) s (2 \\Sigma R).\n\\]\nSince $T^*(\\pi^* \\cap E(B(\\Sigma R))^c) - T(\\pi^* \\cap E(B(\\Sigma R))^c) = 0$, in order to ensure that $T^*(\\pi^*) - T(\\pi^*) < 0$,\nit must be the case that\n\\[\n T(\\pi^* \\cap (S_{\\inf} \\cup S_I)) - T^*(\\pi^* \\cap (S_{\\inf} \\cup S_I)) > (\\sup - \\mathbb{E} w - \\delta_{\\sup}) s (2 \\Sigma R).\n\\]\nSince each edge $e$ admits savings at most $w(e) - w^*(e) \\le \\sup - \\inf$, this gives\n\\[\n |\\pi^* \\cap (S_{\\inf} \\cup S_I)| > \\frac{ \\sup - \\mathbb{E} w - \\delta_{\\sup} }{ \\sup - \\inf } s(2 \\Sigma R).\n\\]\nMoreover, since each component of $S_I$ is composed of less than $2C'$ edges\n\\[\n |S_I| \\le 2C' \\frac{2 \\Sigma R}{\\lfloor \\kappa \\Sigma R \\rfloor} = O(C') = o(R),\n\\]\nand so\n\\[\n |\\pi^* \\cap S_{\\inf}| \\ge \\frac{ \\sup - \\mathbb{E} w - \\delta_{\\sup} }{ \\sup - \\inf } s(2 \\Sigma R) - o(R);\n\\]\nsince by assumption $\\kappa < \\frac{ \\sup - \\delta_0 - \\mathbb{E} w }{\\sup - \\inf} s$, for sufficiently large $R$ we have in particular\n\\[\n |\\pi^* \\cap S_{\\inf}| > 2 \\kappa \\Sigma R.\n\\]\nSince $S_{\\inf} \\cup S_I$ has two connected components, at least one of the components contains more than\n$\\kappa \\Sigma R$ edges of $\\pi^* \\cap S_{\\inf}$. 
But each connected component of $S_{\\inf}$ contains at most \n$\\lfloor \\kappa \\Sigma R \\rfloor - C'$ edges, so $V(\\pi^*)$ must contain some pair of points $p,q$ which lie in \ndifferent connected components of $S_{\\inf}$ but in the same connected component of $S_{\\inf} \\cup S_I$, as desired.\nThus, this construction satisfies \\eqref{eq:resampleevents}. \n\nTo see that the construction satisfies the other hypothesis of Proposition \\ref{prop:conditionaltovdBK}, note that\n\\begin{align*}\n \\mathbb{P}(w^* \\in E_w | w) &= \n \\nu([\\inf, \\inf + \\delta_{\\inf}])^{|S_{\\inf}|} \n \\nu(I_0 \\cap (y_0 - \\frac{\\delta_{\\inf}}{2}, y_0 + \\frac{\\delta_{\\inf}}{2}))^{|S_I|} \n \\nu([\\sup - \\delta_{\\sup}, \\sup])^{|S_{\\sup}|} \\\\\n &\\ge \n \\min\\left( \\nu([\\inf, \\inf + \\delta_{\\inf}]), \\nu(I_0 \\cap (y_0 - \\frac{\\delta_{\\inf}}{2}, y_0 + \\frac{\\delta_{\\inf}}{2}) ), \\nu([\\sup - \\delta_{\\sup}, \\sup])\\right)^{\n D C_1 (\\Sigma R)^d}\n\\end{align*}\nis bounded away from $0$ independently of $o_i$ and $x,y$, as desired.\n\n\\subsection{Geometric construction: unbounded case} \\label{unbddsupp}\nNow, suppose $\\nu$ has unbounded support. We construct the relevant events $A_i^R$ and configurations $E_w$ and show that\nthey satisfy \\eqref{eq:resampleevents}.\nThe main challenge for the case that $\\nu$ has unbounded support is in ensuring that the beginning and end of our\nprescribed path are far enough away from each other that we ``have enough room'' to make a segment and a detour which\ndon't collide with the rest of the path. Once we construct our prescribed path it will not be hard to force the \nresampled geodesic $\\pi^*$ to take it, since we can resample the prescribed path to have very small passage time\nand resample the surrounding edges to have arbitrarily large passage time.\n\nAgain assume that \\eqref{eq:extraassumption} holds, and then choose $\\epsilon>0, y_0, I_0$\nas in Lemma \\ref{lem:technicallemma}. 
\nAssume that $\\nu$ is exponential-subcritical and\nlet $q > 0$ be the parameter from Lemma \\ref{lem:bddawayfrominf}.\nThen fix $\\sigma > \\max(2, \\frac{2(\\inf + q)}{q})$ and $\\Sigma > \\sigma$.\nThe event $A_i^R$ will be constructed as the intersection of five events $A_i^R := A_1 \\cap A_2 \\cap A_3 \\cap A_4 \\cap A_5$.\nThe first event is\n\\[\n A_1 := \\{\\mbox{every path } \\gamma: v \\to w \\mbox{ in } B(\\Sigma R) \\mbox{ with } d(v,w) \\ge R \\mbox{ satisfies } \n T(\\gamma) \\ge (\\inf + q)d(v,w)\\}.\n\\]\nThis evidently only depends on the edges in $E(B(\\Sigma R))$. Moreover, by Lemma \\ref{lem:bddawayfrominf}, \nfor all sufficiently large $R$ we have\n\\[\n \\mathbb{P}(A_1^c) \\le \\sum_{\\substack{ v,w \\in B(o_i, \\Sigma R), \\\\ d(v,w) \\ge R}} \\mathbb{P}(T(v,w) < (\\inf + q)d(v,w))\n \\le |B(o_i,\\Sigma R)| e^{-c R} \\le C_1 R^d e^{-c R} \\tendsto{R}{\\infty} 0.\n\\]\nFor the next event we choose $\\delta_{\\inf}(R) \\tendsto{R}{\\infty} 0$ such that $\\nu([\\inf, \\inf+\\delta_{\\inf}]) > 0$ and\n${DC_1(\\Sigma R)^d \\delta_{\\inf}(R) \\le 1}$ for all $R$,\nand $ {\\left(\\nu([\\inf + \\delta_{\\inf}(R), \\infty))\\right)^{DC_1(\\Sigma R)^d} \\tendsto{R}{\\infty} 1}$. (Note that if there is an atom at $\\inf$, then eventually\nwe will have $\\delta_{\\inf}(R)=0$, but $\\delta_{\\inf}(R) \\ge 0$ always). Note that the second condition\nimplies in particular that $|E(B(\\Sigma R))| \\delta_{\\inf}(R) \\le 1$. 
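To spell out this implication (a short sketch, using the maximum degree $D$ of $G$ and assuming the volume bound $|B(o_i, r)| \\le C_1 r^d$ in the form used elsewhere in this section): every edge of $E(B(\\Sigma R))$ is incident to a vertex of $B(\\Sigma R)$, and each vertex is incident to at most $D$ edges, so\n\\[\n |E(B(\\Sigma R))| \\delta_{\\inf}(R) \\le D |B(\\Sigma R)| \\delta_{\\inf}(R) \\le D C_1 (\\Sigma R)^d \\delta_{\\inf}(R) \\le 1.\n\\]\n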
We define\n\\[\n A_2 := \\{ w(e) \\ge \\inf + \\delta_{\\inf} \\mbox{ for all } e \\in E(B(\\Sigma R)) \\}.\n\\]\nThis clearly only depends on the weights in $E(B(\\Sigma R))$ and the third condition on $\\delta_{\\inf}(R)$ implies that \n\\[\n \\mathbb{P}(A_2) = \\nu([\\inf + \\delta_{\\inf}, \\infty))^{|E(B(\\Sigma R))|} \\ge \\nu([\\inf + \\delta_{\\inf}, \\infty))^{DC_1(\\Sigma R)^d} \\tendsto{R}{\\infty} 1.\n\\]\nFor the third event, we choose $M(R) \\tendsto{R}{\\infty} \\infty$ such that $\\nu^{*DC_1(\\Sigma R)^d}([0,M(R)]) \\tendsto{R}{\\infty} 1$.\nWe set\n\\[\n A_3 := \\left\\{ \\sum_{e \\in E(B(\\Sigma R))} w(e) \\le M \\right\\}.\n\\]\nIt is clear by the choice of $M(R)$ that $\\mathbb{P}(A_3) \\tendsto{R}{\\infty} 1$. Also note that since $\\nu$ is assumed to have infinite support,\n$\\nu( (M(R), \\infty)) > 0$ for all $R$.\n\nLet us call a value $p \\in \\mathrm{supp} \\hspace{2pt} \\nu$ \\emph{$(\\delta,\\eta)$-resamplable} if $\\nu([p,p+\\delta)) \\ge \\eta$. \nSet $\\delta_{sim}(R) := (D C_1 R^d)^{-1}$.\nThen, using Proposition \\ref{prop:resampling} below, choose $\\eta(R) > 0$ such that\n\\[\n \\nu( \\{ p : p \\mbox{ is } (\\delta_{sim}(R), \\eta(R)) \\mbox{-resamplable} \\} )^{DC_1 R^d} \\ge 1 - e^{-R}.\n\\]\nSet\n\\[\n A_4 := \\left\\{ w(e) \\mbox{ is } (\\delta_{sim},\\eta) \\mbox{-resamplable for all } e \\in E(B(\\Sigma R)) \\right\\}.\n\\]\nClearly $A_4$ only depends on weights of edges in $E(B(\\Sigma R))$, and by our choice of $\\eta(R)$ we have\n\\[\n \\mathbb{P}(A_4) \\ge 1 - e^{-R} \\tendsto{R}{\\infty} 1.\n\\]\n\nThe event $A_5$ is more complicated to describe, so we delay its description and the proof that $\\mathbb{P}(A_5) \\tendsto{R}{\\infty} 1$\nuntil the end of the section.\n\nNext we describe the construction of $E_w$. 
Denote by $\\pi$ the geodesic from $x$ to $y$, and denote by $v$ and $w$\nthe first vertex of $\\pi$ which lies in $B(\\Sigma R)$ and the last vertex of $\\pi$ which lies in $B(\\Sigma R)$, respectively.\nAs will be proved in Lemma \\ref{lem:a5} at the end of the section, the event $A_5$ implies that, for some $\\Sigma R \\ge r \\ge \\sigma R$,\nwe have disjoint self-avoiding paths $\\alpha$ and $\\beta$ with the following properties:\n\\begin{enumerate} \\label{pathconditions}\n \\item $\\alpha$ starts at $x$ and ends at a point $v' \\in S(r)$; moreover $V(\\alpha) \\cap B(r-1) = \\emptyset$.\n \\item $\\beta$ starts at a point $w' \\in S(r)$ and ends at $y$; moreover $V(\\beta) \\cap B(r-1) = \\emptyset$.\n \\item $d_{E(B(r))}(v',w') > K := 4C(1 + \\epsilon)$.\n \\item $\\alpha$ coincides with $\\pi$ until its last entrance into $B(\\Sigma R)$\n and $\\beta$ coincides with $\\pi$ after its first exit from $B(\\Sigma R)$.\n Explicitly,\n choose $\\tilde{v}$ to be the last entrance of $\\alpha$ into $B(\\Sigma R)$, so that $\\alpha_{\\tilde{v},v'}$\n is the connected component of $E(B(\\Sigma R)) \\cap \\alpha$ containing $v'$.\n Similarly choose $\\tilde{w}$ to be the first exit of $\\beta$ from $B(\\Sigma R)$, so that \n $\\beta_{w',\\tilde{w}}$ is the connected component of $E(B(\\Sigma R)) \\cap \\beta$ containing $w'$.\n We have that $\\tilde{v},\\tilde{w} \\in V(\\pi)$, and $\\pi_{x,\\tilde{v}} = \\alpha_{x,\\tilde{v}}$\n and $\\pi_{\\tilde{w},y} = \\beta_{\\tilde{w},y}$.\n \\item Let $v_r \\in S(r)$ be the vertex of $\\pi$ immediately preceding the first vertex of $\\pi$ which lies in $B(r-1)$, \n and let $w_r \\in S(r)$ be the vertex of $\\pi$ immediately following the last vertex of $\\pi$ which lies in $B(r-1)$.\n Then $|\\alpha_{\\tilde{v},v'}| \\le |\\pi_{\\tilde{v},v_r}|$ and $|\\beta_{w',\\tilde{w}}| \\le |\\pi_{w_r,\\tilde{w}}|$.\n\\end{enumerate}\nNow choose edge-geodesics $[v',o_i]$ from $v'$ to $o_i$ and $[o_i,w']$ from $o_i$ to $w'$.\nAgain let 
$C$ be such that every self-avoiding path of length $C$ admits a self-avoiding $\\epsilon$-detour.\nLet $a$ be the vertex of $[v',o_i]$ which is distance $C(1+\\epsilon)$ from $v'$.\nLet $b$ be the vertex of $[v',o_i]$ which is distance $C(2+\\epsilon)$ from $v'$.\nThen $[a,b]:=[v',o_i]_{a,b}$ is a self-avoiding path of length $C$, and hence it admits a self-avoiding $\\epsilon$-detour $\\gamma$.\n\\begin{prop}\n $\\gamma$ is contained in $B(r-1)$ and $V(\\gamma) \\cap V([o_i,w']) = \\emptyset$.\n\\end{prop}\n\\begin{proof}\n The first claim follows from the fact that $\\gamma$ has length at most $C(1+\\epsilon)$;\n To see the second claim, suppose to the contrary\n that there was some $z \\in V(\\gamma) \\cap V([o_i,w'])$. Since $d_{B(r)}(z,a) \\le C(1+\\epsilon)$ and $z$ and $a$ both lie on edge-geodesics\n to $o_i$, we have that \n \\begin{align*}\n |d_{B(r)}(v',a) - d_{B(r)}(w',z)| &= |[d_{B(r)}(v',o_i) - d_{B(r)}(a,o_i)] - [d_{B(r)}(w',o_i) - d_{B(r)}(z,o_i)]| \\\\\n &= |d_{B(r)}(a,o_i) - d_{B(r)}(z,o_i)| \\le d_{B(r)}(a,z) \\le C(1+\\epsilon),\n \\end{align*}\n and therefore\n \\[\n d_{B(r)}(w',z) \\le d_{B(r)}(v',a) + C(1+\\epsilon) = 2C(1 + \\epsilon),\n \\]\n hence\n \\[\n d_{B(r)}(v',w') \\le d_{B(r)}(v',a) + d_{B(r)}(a,z) + d_{B(r)}(z,w') \\le 2C(1+\\epsilon) + 2C(1 + \\epsilon) = 4C(1 + \\epsilon) = K,\n \\]\n contradicting the fact that $d(v',w') > K$.\n\\end{proof}\nSet $b'$ to be the vertex in $V([v',o_i])$ which has distance $C' = C(3 + 2\\epsilon)$ from $v'$.\nSet $o'$ to be the first intersection of $V([v',o_i])$ with $V([o_i,w'])$; the previous proposition shows\nthat $o'$ is strictly closer to $o_i$ than $b'$.\nDefine as usual $[v',o']:=[v',o_i]_{v',o'}$, $[o',w']:=[o_i,w']_{o',w'}$, $[v',b'] := [v',o_i]_{v',b'}$.\nWe now define the following subsets of $E(B(\\Sigma R))$:\n\\begin{align*}\n S_I &:= [v',b'] \\cup \\gamma, \\\\\n S_{\\inf} &:= \\left(\\alpha_{\\tilde{v},v'} * [v',o'] * [o',w'] * \\beta_{w',\\tilde{w}}\\right) \\setminus S_I, 
\\\\\n S_{sim} &:= (\\alpha \\cup \\beta) \\cap E(B(\\Sigma R)) \\setminus S_{\\inf} \\\\\n S_{M} &:= E(B(\\Sigma R)) \\setminus (S_I \\cup S_{\\inf} \\cup S_{sim}).\n\\end{align*}\nNote that these sets are all pairwise disjoint and cover $E(B(\\Sigma R))$.\nNow we can finally define our set of configurations $E_w$:\n\\begin{align*}\n E_w :=\n \\left \\{ \\omega \\in [0, \\infty)^{E(B(\\Sigma R))} : \n \\omega(e) \\in \n \\begin{array}{lc}\n I_0 & e \\in S_I \\\\ \\relax\n [\\inf, \\inf + \\delta_{\\inf}] & e \\in S_{\\inf} \\\\ \\relax\n [w(e), w(e) + \\delta_{sim}) & e \\in S_{sim} \\\\ \\relax\n [M, \\infty) & e \\in S_M \n \\end{array} \n \\right \\}.\n\\end{align*}\n(We have used the assumption that $w \\in A_1 \\cap A_2 \\cap A_3 \\cap A_4 \\cap A_5$ to construct $E_w$, and this is really the only case\nwe care about; off of this event we may define $E_w = \\emptyset$).\n\\begin{figure}[t]\n \\centering\n \\includegraphics[scale=.45]{unboundedconstructionjustgeodesic_cropped}\n \\includegraphics[scale=.45]{unboundedconstruction_cropped}\n \\caption{If the $T$-geodesic from $x$ to $y$ is as in the diagram on the left, the prescribed weights $E_w$\n might be given by the diagram on the right.}\n \\label{fig:unboundedconstruction}\n\\end{figure}\n We now show that this choice of $A_i^R$ and $E_w$ satisfies \\eqref{eq:resampleevents}.\n \n Set\n \\[\n \\pi' := \\alpha * [v',o'] * [o',w'] * \\beta.\n \\]\n First we show that $T^*(\\pi') < T(\\pi)$.\n By construction we have $\\pi'_{x,\\tilde{v}} = \\pi_{x,\\tilde{v}}$ and $\\pi'_{\\tilde{w},y} = \\pi_{\\tilde{w},y}$.\n Moreover, each edge in either of those paths is also by construction either in $E(B(\\Sigma R))^c$ or $S_{sim}$, and hence \n when $w^* \\in E_w$,\n \\begin{align*}\n T^*(\\pi'_{x,\\tilde{v}} \\sqcup \\pi'_{\\tilde{w},y}) &\\le T(\\pi'_{x,\\tilde{v}} \\sqcup \\pi'_{\\tilde{w},y}) + |E(B(\\Sigma R))| \\delta_{sim} \\\\\n &\\le T(\\pi_{x,\\tilde{v}} \\sqcup \\pi_{\\tilde{w},y}) + 1.\n \\end{align*}\n 
Next, since $|\\alpha_{\\tilde{v},v'}| \\le |\\pi_{\\tilde{v},v_r}|$, $|\\beta_{w',\\tilde{w}}| \\le |\\pi_{w_r,\\tilde{w}}|$, since $A_2$ holds,\n and since $\\alpha_{\\tilde{v},v'}, \\beta_{w',\\tilde{w}} \\subset S_{\\inf}$, we have\n \\[\n T^*(\\alpha_{\\tilde{v},v'} \\sqcup \\beta_{w',\\tilde{w}}) \\le \n (\\inf + \\delta_{\\inf})|\\alpha_{\\tilde{v},v'} \\sqcup \\beta_{w',\\tilde{w}}|\n \\le (\\inf + \\delta_{\\inf})|\\pi_{\\tilde{v},v_r} \\sqcup \\pi_{w_r,\\tilde{w}}|\n \\le T(\\pi_{\\tilde{v},v_r} \\sqcup \\pi_{w_r,\\tilde{w}}).\n \\]\n Now, since $\\pi'_{v',w'} \\setminus [v',b'] \\subset S_{\\inf}$ and $[v',b'] \\subset S_I$, we have\n \\[\n T^*(\\pi'_{v',w'}) \\le (\\inf + \\delta_{\\inf})2r + (\\sup I_0)C(3+2\\epsilon),\n \\]\n while since $A_1$ holds and $\\pi_{v_r,w_r}$ starts and ends at $S(r)$ ($r \\ge \\sigma R > 2R$) and visits $S(R)$, we have\n \\[\n T(\\pi_{v_r,w_r}) \\ge 2(\\inf + q)(r - R),\n \\]\n so that \n \\begin{align} \\label{eq:pathfaster}\n T(\\pi_{v_r,w_r}) - T^*(\\pi'_{v',w'}) &\\ge 2R[ (q - \\delta_{\\inf})\\frac{r}{R} - (\\inf + q) ] - (\\sup I_0)C(3+2\\epsilon) \\nonumber \\\\\n &\\ge 2R[ (q - \\delta_{\\inf}) \\sigma - (\\inf + q) ] - (\\sup I_0)C(3+2\\epsilon).\n \\end{align}\n For $R$ sufficiently large we have $\\delta_{\\inf} < q/2$ so that \n \\[\n (q - \\delta_{\\inf}) \\sigma - (\\inf + q) > (q/2) \\sigma - (\\inf + q) > 0,\n \\]\n that is, the coefficient of $R$ in \\eqref{eq:pathfaster} is strictly positive.\n Altogether we have\n \\[\n T(\\pi) - T^*(\\pi') \\ge 2R[ (q / 2) \\sigma - (\\inf + q) ] - (\\sup I_0)C(3+2\\epsilon) - 1 \\gtrsim R,\n \\]\n so in particular $T^*(\\pi') < T(\\pi)$ for all sufficiently large $R$.\n \n From this, we can conclude that the $T^*$-geodesic $\\pi^*$ must contain some edges in $S_{\\inf} \\cup S_I$.\n For suppose it did not;\n since $w^* \\ge w$ on $(S_{\\inf} \\cup S_I)^c$, we would have\n \\[\n T^*(\\pi^*) \\ge T(\\pi^*) \\ge T(\\pi) > T^*(\\pi'),\n 
\\]\n contradicting $T^*$-geodesicity of $\\pi^*$.\n \n Next, we know that $\\pi^*$ contains no edge in $S_M$. For suppose that it did; then, since $A_3$ holds,\n \\begin{align*}\n T^*(\\pi^*) &\\ge T^*(\\pi^* \\cap E(B(\\Sigma R))^c) + M \\\\\n &\\ge T(\\pi^* \\cap E(B(\\Sigma R))^c) + \\sum_{e \\in E(B(\\Sigma R))} w(e) \\\\\n &\\ge T(\\pi^*) \\ge T(\\pi) > T^*(\\pi'),\n \\end{align*}\n again contradicting $T^*$-geodesicity of $\\pi^*$.\n \n Note that $S_{\\inf} \\cup S_I$ and $S_{sim}$ by construction share no vertices, and so we see\n that $\\pi^*$, as a self-avoiding path which enters $S_{\\inf}$, does not intersect $S_M$ and eventually exits\n $B(\\Sigma R)$, must contain $\\pi'_{\\tilde{v},v'}$ and $\\pi'_{b',\\tilde{w}}$ or their reverses as a subpath.\n In particular, some subpath of $\\pi^*$ has endpoints $v',b'$ and is restricted to $S_I = [v',b'] \\cup \\gamma$\n and hence by Proposition \\ref{prop:detourtodetour} $S_I$ contains a feasible pair for $\\pi^*$, and we are done\n showing that \\eqref{eq:resampleevents} is satisfied.\n\nTo complete the proof that the $A_i^R, E_w$ satisfy\nthe hypotheses of Proposition \\ref{prop:conditionaltovdBK},\nit remains to prove the ``resampling lemma'' \nrelevant to $A_4$, to describe and prove the relevant properties of $A_5$,\nand to give a lower bound on the conditional probability of $\\{ w^* \\in E_w\\}$.\n\\begin{prop} \\label{prop:resampling}\n For any fixed $\\delta > 0$, we have\n \\[\n \\lim_{\\eta \\to 0} \\nu( \\{ p : p \\mbox{ is } (\\delta,\\eta) \\mbox{-resamplable}\\}) = 1.\n \\]\n\\end{prop}\n\\begin{proof}\n By continuity of measure we have that\n \\[\n \\lim_{\\eta \\to 0} \\nu( \\{p : \\nu([p,p+\\delta)) > \\eta \\}) = \\nu( \\{p : \\nu([p,p+\\delta)) > 0 \\}),\n \\]\n so it will suffice to show that\n \\[\n \\nu( \\{ p : \\nu([p,p+\\delta)) = 0 \\}) = 0.\n \\]\n Set $N := \\{ p : \\nu([p,p+\\delta)) = 0 \\}$. 
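 For orientation, here is a hypothetical example which is not used elsewhere in the argument: suppose $\\nu$ is the uniform distribution on $[0,1]$. Then $\\nu([p,p+\\delta)) = \\min(\\delta, 1-p)$ for $p \\in [0,1)$ and $\\nu([p,p+\\delta)) = 0$ for $p \\ge 1$, so $N = [1,\\infty)$ and $\\nu(N) = 0$. Moreover, for $0 < \\eta \\le \\min(\\delta,1)$,\n \\[\n \\nu( \\{ p : p \\mbox{ is } (\\delta,\\eta) \\mbox{-resamplable} \\}) = \\nu([0,1-\\eta]) = 1 - \\eta,\n \\]\n which tends to $1$ as $\\eta \\to 0$, in agreement with the proposition. The general argument follows.\n 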
We claim that there is a countable subset $X \\subset N$ such that\n \\[\n N \\subset \\bigcup_{p \\in X} [p,p+\\delta).\n \\]\n Once we know this, the proposition follows, since then\n \\[\n \\nu(N) \\le \\nu \\left( \\bigcup_{p \\in X} [p,p+\\delta) \\right) \\le \\sum_{p \\in X} \\nu([p,p+\\delta)) = 0.\n \\]\n To construct $X$, first set $X_0 := \\emptyset$. \n For each $i \\ge 0$, consider $n_{i+1} := \\inf N \\setminus \\left( \\bigcup_{p \\in X_i} [p,p+\\delta) \\right)$.\n If $n_{i+1} \\in N$, then set $X_{i+1} := X_i \\cup \\{n_{i+1}\\}$. Otherwise choose a (countable) sequence $S_{i+1}$\n of points of $N$ approaching $n_{i+1}$ and set $X_{i+1} := X_i \\cup S_{i+1}$.\n It is simple to inductively check that each $X_i$ is countable and that $\\bigcup_{p \\in X_i} [p,p+\\delta)$\n covers at least $N \\cap [0,i\\delta)$, so $X := \\cup_{i=1}^{\\infty} X_i$ is a countable subset of $N$\n with $\\bigcup_{p \\in X} [p,p+\\delta) \\supset N$, as desired.\n\\end{proof}\n\nNow we describe the event $A_5$ and its properties.\nThe intuition is as follows: considering the $T$-geodesic $\\pi: x \\to y$, for each ball $B(r)$, if the first entrance of $\\pi$ into that ball\nis far from the last exit of $\\pi$ from that ball, then we have ``enough room'' to do our construction, that is, we have\npaths satisfying (1)-(5) above. So we want to bound the probability that, to the contrary, for all radii $r$, the first entry and last exit\nare close. 
In fact, an even weaker event gives us ``enough room,'' and we bound the probability of the failure of this event\nby showing that it would entail that $\\pi$ is constrained to a ``narrow'' subgraph as it crosses $B(o_i, \\Sigma R)$, making it unlikely\nthat the geodesic would enter so deep into $B(o_i, \\Sigma R)$ before turning around.\n\nFor the formal construction of the event, first, given a pair of points $p,q \\in S(\\Sigma R)$, take edge-geodesics $[p,o_i]$ and $[q,o_i]$ from $p$ and $q$ respectively to $o_i$.\nFor each $\\Sigma R \\ge r \\ge \\sigma R$, let $p^r$ and $q^r$ be the unique elements of $V([p,o_i]) \\cap S(r)$ \nand $V([q,o_i]) \\cap S(r)$ respectively. We then define\n\\[\n S^r_0(p,q) := \\left(B_{E(B(r))}(p^r,3K) \\cup B_{E(B(r))}(q^r,3K)\\right) \\cap S(r),\n\\]\nwhere $K := 4C(1 + \\epsilon)$. Then for each $\\ell \\ge 0$ we define\n\\[\n S^r_{\\ell}(p,q) := \\left\\{ z \\in B(\\Sigma R) \\setminus B(r-1) : d_{E(B(\\Sigma R) \\setminus B(r-1))}(z,S^r_0(p,q)) = \\ell \\right\\}.\n\\]\nLastly, for $\\Sigma R - 3K \\ge r \\ge \\sigma R$, set\n\\[\n S^r(p,q) := \\bigsqcup_{\\ell=0}^{3K} S^r_{\\ell}(p,q)\n\\]\nand define the event\n\\[\n C^r(p,q) := \n \\left \\{ \\begin{array}{c} \\mbox{ there exist paths }\\gamma_1,\\gamma_2 \\mbox{ in } S^r(p,q) \\mbox{ such that } \\\\\n \\mbox{ the endpoints } a_1,b_1 \\mbox{ of } \\gamma_1 \\mbox{ lie in } S^r_{2K} \\mbox{ and} \\\\\n |\\gamma_1| \\le K, \\mbox{ one endpoint of } \\gamma_2 \\mbox{ lies in} \\\\\n S^r_{2K} \\mbox{ and the other lies in } S^r_0, \\mbox{ and } T(\\gamma_2) \\le T(\\gamma_1)\n \\end{array}\n \\right\\}.\n\\]\nWe now define the event $A_5$ by\n\\[\n A_5 := \\bigcap_{\\substack{p,q \\in S(\\Sigma R), \\\\ d_{B(\\Sigma R)}(p,q) \\le K}} \n \\left( \\bigcap_{r=\\sigma R}^{\\Sigma R - 3K} C^r(p,q) \\right)^c,\n\\]\nthat is, $A_5$ is the event that for each pair $p,q$ of close points on $S(\\Sigma R)$, $C^r(p,q)$ fails for at least some $r$.\nNote that $A_5$ only depends on the 
weights of edges in $E(B(\\Sigma R))$.\n\\begin{prop}\n There exists some constant $\\rho < 1$ depending only on the degree $D$ of $G$, $\\nu$, and $K$ such that \n \\[\n \\mathbb{P}(C^r(p,q)) \\le \\rho\n \\]\n for all $R,p,q,r$.\n\\end{prop}\n\\begin{proof}\n First note that, since each $S^r_0$ lies in the union of two balls of radius $3K$, $S^r_0$ contains at most $2(D+1)^{3K}$ vertices.\n Since the entirety of $S^r$ lies within distance $3K$ of $S^r_0$, we further have that\n \\[\n |S^r| \\le |S^r_0| (D+1)^{3K} \\le 2(D+1)^{6K}.\n \\]\n That is, we have a uniform bound on the possible number of vertices in $S^r$, and so it is not hard to see\n that the subgraph induced by $S^r(p,q)$ can only take on finitely many isomorphism types as all parameters except $D$ and $K$ vary.\n Hence, to show our claim, it suffices to show that for each fixed isomorphism type, $\\mathbb{P}(C^r(p,q)) < 1$.\n (Here ``isomorphism type'' includes the relevant extra data of which subsets correspond to $S^r_0$ and $S^r_{2K}$,\n but even with this extra data it is easy to see that a bound on the number of vertices implies a bound on the number of \n possible isomorphism types).\n \n To this end, fix an isomorphism type, and let $E'$ be the set of edges in $S^r$ which lie in some path in $S^r$ of length\n at most $K$ joining two vertices of $S^r_{2K}$. Since $\\nu$ is assumed to have unbounded support (in particular it is not Dirac),\n there is some $a>0$ such that $\\nu([0,a))>0$ and $\\nu([a,\\infty)) > 0$. Then the event\n \\[\n \\{ w(e) < a \\mbox{ for all } e \\in E', w(e) \\ge a \\mbox{ for all } e \\notin E' \\}\n \\]\n has nonzero probability. 
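 To spell out why (a short sketch: the notation $E(S^r)$ for the finite edge set of the subgraph induced by $S^r$ is introduced only for this remark, and we read the condition $e \\notin E'$ as ranging over $E(S^r) \\setminus E'$), note that the edge weights are independent with law $\\nu$, so\n \\[\n \\mathbb{P}\\left( w(e) < a \\mbox{ for all } e \\in E', \\ w(e) \\ge a \\mbox{ for all } e \\in E(S^r) \\setminus E' \\right)\n = \\nu([0,a))^{|E'|} \\, \\nu([a,\\infty))^{|E(S^r) \\setminus E'|} > 0,\n \\]\n because both factors are positive and both exponents are finite, being bounded in terms of $D$ and $K$ by the vertex bound above. 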
Moreover, this event entails the failure of $C^r(p,q)$, since on it all candidates for $\\gamma_1$\n necessarily have edges in $E'$ and hence have $T(\\gamma_1) < aK$, while all candidates for $\\gamma_2$\n must have at least $K$ edges lying in $E'^c$, and hence $T(\\gamma_2) \\ge aK > T(\\gamma_1)$.\n\\end{proof}\n\\begin{prop}\n $\\mathbb{P}(A_5) \\tendsto{R}{\\infty} 1$.\n\\end{prop}\n\\begin{proof}\n For each fixed $p,q$, note that whenever $S \\subset [\\sigma R, \\Sigma R - 3K] \\cap \\mathbb{Z}$ is such that each element\n has distance at least $3K$ from every other element, the subgraphs\n $\\{ S^r(p,q) : r \\in S \\}$ are all disjoint and hence the events $\\{ C^r(p,q) : r \\in S \\}$ are all independent.\n Since $K, \\sigma, \\Sigma$ are constants fixed independent of $R$, it is easy to see that there is some $c_3 >0$ such that\n for all large $R$ we can pick such an $S$ with $|S| \\ge c_3 R$, and so\n \\begin{align*}\n \\mathbb{P} \\left( \\bigcap_{r=\\sigma R}^{\\Sigma R - 3K} C^r(p,q) \\right) &\\le \\mathbb{P} \\left( \\bigcap_{r \\in S} C^r(p,q) \\right) \\\\\n &= \\prod_{r \\in S} \\mathbb{P}( C^r(p,q) ) \\le \\rho^{c_3 R},\n \\end{align*}\n where $\\rho < 1$ is provided by the previous proposition. 
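 One concrete choice of $S$ can be sketched as follows, assuming that the integer spacing $m := \\lceil 3K \\rceil$ is enough for the separation requirement stated above (the symbols $m$, $r_j$, and $J$ are introduced only for this remark). Take\n \\[\n S := \\{ r_j := \\lceil \\sigma R \\rceil + jm : 0 \\le j \\le J \\}, \\qquad J := \\left\\lfloor \\frac{\\Sigma R - 3K - \\lceil \\sigma R \\rceil}{m} \\right\\rfloor,\n \\]\n so that every $r_j$ lies in $[\\sigma R, \\Sigma R - 3K] \\cap \\mathbb{Z}$ and\n \\[\n |S| = J + 1 \\ge \\frac{(\\Sigma - \\sigma) R - 3K - 1}{m} \\ge \\frac{\\Sigma - \\sigma}{2m} R\n \\]\n as soon as $(\\Sigma - \\sigma) R \\ge 2(3K + 1)$. With this choice the bound $\\rho^{c_3 R}$ above holds with $c_3 = (\\Sigma - \\sigma)/(2m)$. 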
But then we have\n \\begin{align*}\n \\mathbb{P}(A_5^c) \\le \\sum_{p,q \\in S(\\Sigma R)} \\mathbb{P} \\left( \\bigcap_{r=\\sigma R}^{\\Sigma R - 3K} C^r(p,q) \\right)\n \\le (C_1 (\\Sigma R)^d)^2 \\rho^{c_3 R} \\tendsto{R}{\\infty} 0,\n \\end{align*}\n as desired.\n\\end{proof}\n\nNow we prove the key property of $A_5$.\n\\begin{lemma} \\label{lem:a5}\n On the event $A_5 \\cap \\{ x,y \\notin B(\\Sigma R), \\pi \\mbox{ visits } B_i \\}$, \n for some $\\Sigma R \\ge r \\ge \\sigma R$, there exist paths $\\alpha$ and $\\beta$ satisfying conditions 1 through 5 above.\n\\end{lemma}\n\\begin{proof}\n Denote by $v$ and $w$ respectively the first and last vertices of $\\pi$ which lie in $B(\\Sigma R)$.\n Now, for each $\\Sigma R \\ge r \\ge \\sigma R$, define\n $v_r \\in S(r)$ to be the vertex of $\\pi$ immediately preceding the first vertex of $\\pi$ lying in $B(r-1)$,\n and define $w_r \\in S(r)$ to be the vertex of $\\pi$ immediately following the last vertex of $\\pi$ lying in $B(r-1)$.\n All of these are well defined, since $\\pi$ starts and ends outside of $B(\\Sigma R)$ and visits $B_i \\subset B(R) \\subset B(\\sigma R)$.\n Then define $\\alpha_r := \\pi_{x,v_r}$, $\\beta_r := \\pi_{w_r,y}$.\n If for some $r$, $d_{E(B(r))}(v_r,w_r) > K$, then we can just take $\\alpha = \\alpha_r$ and $\\beta = \\beta_r$ and we are done.\n So from here on assume that $d_{E(B(r))}(v_r,w_r) \\le K$ for all $\\Sigma R \\ge r \\ge \\sigma R$.\n \n Next, for each $r$ we define the set\n \\[\n \\tilde{S}^r := \n \\left( \\bigcup_{p \\in V(\\alpha_r \\cup \\beta_r) \\cap B(\\Sigma R)} \n \\bigcup_{\\substack{\\gamma: p \\to o_i \\\\ \\mbox{edge geodesic}}} V(\\gamma) \\right) \\cap S(r).\n \\]\n Suppose that for some $\\Sigma R \\ge r \\ge \\sigma R$, there is some $z \\in \\tilde{S}^r$ \n with $d_{E(B(r))}(z,v_r), d_{E(B(r))}(z,w_r) > K$. 
Then we can construct $\\alpha$ and $\\beta$ as follows.\n $z$ by definition lies on some edge-geodesic $\\gamma$ from some point $p \\in V(\\alpha_r \\cup \\beta_r)$ to $o_i$.\n Consider the last vertex of $V(\\gamma) \\cap (V(\\alpha_r \\cup \\beta_r))$ (that is, the nearest vertex to $o_i$), and call it $z'$.\n If $z' \\in V(\\alpha_r)$, set $\\alpha := (\\alpha_r)_{x,z'} * \\gamma_{z',z}$ and $\\beta := \\beta_r$.\n If $z' \\in V(\\beta_r)$, set $\\beta := \\overline{\\gamma}_{z,z'} * (\\beta_r)_{z',y}$ and $\\alpha := \\alpha_r$ (here an overline denotes the reverse of a path).\n In either case, $\\alpha$ and $\\beta$ give disjoint self-avoiding paths because the original paths were disjoint and self-avoiding\n and because by construction $V(\\gamma_{z',z})$ only intersects $V(\\alpha_r \\cup \\beta_r)$ at $z'$.\n Conditions (1)-(3) are satisfied by choice of $z$, (4) is satisfied because $\\alpha$ and $\\overline{\\beta}$ agree with $\\alpha_r$ and \n $\\overline{\\beta_r}$ until one of them reaches $z'$, \n and from that point the path follows $\\gamma$; in particular, it stays inside $B(\\Sigma R)$ \n until it reaches its endpoint. For (5), note that, since $\\gamma$ is an edge-geodesic from $z'$ to $o_i$,\n \\[\n |\\gamma_{z',z}| = d(z',o_i) - d(z,o_i) = d(z',o_i) - r.\n \\]\n Since $(\\alpha_r)_{z', v_r}$ (or $(\\overline{\\beta}_r)_{z',w_r}$, if $z' \\in V(\\beta_r)$) is a path from $z'$ to $S(r)$,\n it must have length at least $d(z',o_i) - r$ by the triangle inequality, and so we get (5).\n \n Lastly, we show that, if both of the above conditions fail, i.e. 
for all $\\Sigma R \\ge r \\ge \\sigma R$ we have\n \\begin{equation} \\label{eq:closeentryexit}\n d_{E(B(r))}(v_r,w_r) \\le K\n \\end{equation}\n and\n \\begin{equation} \\label{eq:tightspace}\n \\tilde{S}^r \\subset \\left(B_{E(B(r))}(v_r, K) \\cup B_{E(B(r))}(w_r,K) \\right) \\cap S(r),\n \\end{equation}\n then the event $A_5$ fails.\n \n For this, first note that, since every $V(\\alpha_r \\cup \\beta_r) \\cap B(\\Sigma R)$ contains the entry and exit points $v$ and $w$,\n every $\\tilde{S}^r$ contains $v^r, w^r$ (in the notation used in defining the set $S^r(p,q)$ in the case $(p,q) = (v,w)$).\n Then \\eqref{eq:tightspace} implies that $v^r$ and $w^r$ are each distance at most $K$ from either $v_r$ or $w_r$.\n A general element $z \\in \\tilde{S}^r$ has the same property, and combining with \\eqref{eq:closeentryexit} gives\n \\[\n d(z,v^r) \\le \\min(d(z,v_r),d(z,w_r)) + d(v_r,w_r) + \\min(d(v_r,v^r),d(w_r,v^r)) \\le 3K,\n \\]\n and similarly $d(z,w^r) \\le 3K$. Hence $\\tilde{S}^r \\subset S^r_0(v,w)$. Moreover we have\n \\begin{claim}\n If $\\Sigma R \\ge r+\\ell, r \\ge \\sigma R$, then $v_{r+\\ell},w_{r+\\ell} \\in S^r_{\\ell}(v,w)$.\n \\end{claim}\n \\begin{proof}[Proof of Claim]\n Since $v_{r+\\ell},w_{r+\\ell} \\in S(r+\\ell)$ and $S^r_0(v,w) \\subset S(r)$, we have that\n \\[\n d_{E(B(\\Sigma R) \\setminus B(r-1))}(v_{r+\\ell}, S^r_0(v,w)), d_{E(B(\\Sigma R) \\setminus B(r-1))}(w_{r+\\ell}, S^r_0(v,w)) \\ge \\ell,\n \\]\n so we only have to show the opposite inequality. For this, let $(v_{r+\\ell})^r$ be as usual the intersection of $S(r)$\n with an edge geodesic from $v_{r + \\ell}$ to $o_i$. 
Since $v_{r+\\ell} \\in V(\\alpha_{r+\\ell}) \\subset V(\\alpha_r)$,\n we have that $(v_{r + \\ell})^r \\in \\tilde{S}^r \\subset S^r_0(v,w)$; moreover, the geodesic from $v_{r+\\ell}$ to\n $(v_{r + \\ell})^r$ is a path of length $\\ell$ which lies in $B(\\Sigma R) \\setminus B(r-1)$, and so we have\n \\[\n d_{B(\\Sigma R) \\setminus B(r-1)}(v_{r+\\ell}, S^r_0(v,w)) \\le d_{B(\\Sigma R) \\setminus B(r-1)}(v_{r+\\ell}, (v_{r+\\ell})^r) = \\ell,\n \\]\n as desired. The argument for $w_{r+\\ell}$ is the same.\n \\end{proof}\n Finally, we contradict $A_5$. For each $\\Sigma R - 3K \\ge r \\ge \\sigma R$, consider $\\gamma_3 := \\pi_{v_{r+2K},v_r}$.\n Since $\\gamma_3$ by construction does not visit $B(r-1)$, and since it starts at a point with distance\n $d_{E(B(\\Sigma R) \\setminus B(r-1))}(v_{r+2K}, S^r_0(v,w)) = 2K$ and ends at a point $v_r \\in S^r_0(v,w)$,\n some subpath $\\gamma_2$ of $\\gamma_3$ is contained in $S^r(v,w)$, starts at $S^r_{2K}(v,w)$ and ends\n at $S^r_0(v,w)$.\n On the other hand, let $\\gamma_1$ be an edge-geodesic from $v_{r+2K}$ to $w_{r+2K}$. By assumption,\n $d(v_{r+2K},w_{r+2K}) \\le K$, so $|\\gamma_1| \\le K$; therefore $\\gamma_1$ does not intersect $B(r-1)$,\n and since the endpoints of $\\gamma_1$ lie in $S^r_{2K}(v,w)$, $\\gamma_1$ is totally contained in $S^r(v,w)$.\n But since $\\pi$ is a $T$-geodesic, we have\n \\[\n T(\\gamma_1) \\ge T(\\pi_{v_{r+2K},w_{r+2K}}) \\ge T(\\gamma_3) \\ge T(\\gamma_2),\n \\]\n and so $C^r(v,w)$ holds. 
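 To unpack this chain of inequalities (a brief sketch that uses only nonnegativity of the weights): along $\\pi$ the four vertices appear in the order $v_{r+2K}, v_r, w_r, w_{r+2K}$, so\n \\[\n T(\\gamma_1) \\ge T(v_{r+2K},w_{r+2K}) = T(\\pi_{v_{r+2K},w_{r+2K}}) = T(\\gamma_3) + T(\\pi_{v_r,w_{r+2K}}) \\ge T(\\gamma_3) \\ge T(\\gamma_2),\n \\]\n where the first equality holds because a subpath of a $T$-geodesic is again a $T$-geodesic between its endpoints, and the last inequality holds because $\\gamma_2$ is a subpath of $\\gamma_3$. 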
But then $C^r(v,w)$ holds for all $\\Sigma R - 3K \\ge r \\ge \\sigma R$, so $A_5$ fails.\n\\end{proof}\n\nTo apply Proposition \\ref{prop:conditionaltovdBK}\nit only remains to obtain a lower bound on $\\mathbb{P}( w^* \\in E_w | w)$ on the event $A_i^R$ which is\nindependent of $o_i$.\nBut on $A_i^R$ we have\n\\begin{align*}\n \\mathbb{P}(w^* \\in E_w | w) \n &\\ge \\nu(I_0)^{|S_I|} \\nu([\\inf, \\inf + \\delta_{\\inf}])^{|S_{\\inf}|} \\eta^{|S_{sim}|} \\nu([M,\\infty))^{|S_M|} \\\\\n &\\ge \\min( \\nu(I_0), \\nu([\\inf,\\inf+\\delta_{\\inf}]), \\eta, \\nu([M,\\infty)))^{D C_1 (\\Sigma R)^d} > 0,\n\\end{align*}\nas desired.\n\n\n\\subsection{Proof of Theorem \\ref{thm:polygrowthvdBK}}\n\nLet $G$ be a graph of strict polynomial growth which admits detours. \nLet $\\nu$ be an exponential-subcritical measure with finite mean, and let $\\tilde{\\nu}$ be a measure which has finite mean\nand is strictly more variable than $\\nu$.\nFirst assume \\eqref{eq:extraassumption}. Then\nlet $\\epsilon > 0, I_0, y_0$ be as in Lemma \\ref{lem:technicallemma}.\nIn case $\\nu$ has bounded support, construct $B_i, B(o_i, \\Sigma R), A_i^R,$ and $E_w$ as in Section \\ref{bddsupp}.\nIn case $\\nu$ has unbounded support, construct $B_i, B(o_i, \\Sigma R), A_i^R,$ and $E_w$ as in Section \\ref{unbddsupp}.\nIn their respective sections, we prove that both constructions satisfy the hypotheses of Proposition \\ref{prop:conditionaltovdBK},\nand so \n\\[\n \\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y) }{d(x,y)} > 0.\n\\]\nNow, if $w, \\tilde{w}$ do not satisfy \\eqref{eq:extraassumption}, \ntake $\\bar{w}$ as in Lemma \\ref{lem:wlog}. Then we have\n\\[\n \\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y) }{d(x,y)} \\ge \n \\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E} T(x,y) - \\mathbb{E} \\bar{T}(x,y) }{d(x,y)} \n > 0.\n\\]\nThus $G$ is vdBK. 
The reverse implication is given by Theorem \\ref{thm:notvdBK}.\n\n\n\n\n\n\n\\subsection{Non-homogeneous graphs of polynomial growth}\nTheorem \\ref{thm:polygrowthvdBK} does not require almost-transitivity (although almost-transitivity does give us more information\nabout the condition of exponential-subcriticality, namely that $\\underline{p_c} = p_c$ \\cite{DuminilCopinTassion}).\nThis means that the theorem applies to a very broad class of graphs, but it can be difficult to produce examples of non-transitive\ngraphs which have \\emph{strict} polynomial growth and for which it is easy to check whether the graph admits detours.\nHere we give two examples (or one example and one counterexample).\n\nFirst, the theorem can be applied to a broad range of subgraphs of the standard Cayley graph of $\\mathbb{Z}^d$.\nFor instance $G := \\mathbb{Z}_{\\ge 0}^{d_1} \\times \\mathbb{Z}^{d_2} \\subset \\mathbb{Z}^{d_1 + d_2}$ will be vdBK whenever $d_1 + d_2 \\ge 2$.\nThese graphs have growth bounds $B_G(R) \\le B_{\\mathbb{Z}^{d_1+d_2}}(R) \\le 2^{-d_1} B_G(R)$, from which we can deduce\nthat $G$ has strict polynomial growth. Moreover, the unique geodesics in $G$ are all also unique geodesics in $\\mathbb{Z}^d$,\n(that is, they are represented by words of the form $e_i^k$, where $\\{e_i\\}$ is the standard generating set),\nand when $d_1 + d_2 \\ge 2$ one can easily see that these admit detours.\n\nMoreover, we can apply the theorem to ``sectors'', that is, graphs $G_{\\theta,\\theta'}$ induced by the vertex subset\n\\[\n V_{\\theta,\\theta'} := \\{ (x,y) \\in \\mathbb{Z}^2 : \\theta \\le \\arctan(y/x) \\le \\theta' \\}\n\\]\nfor fixed $\\theta < \\theta'$. 
Again we see that this is of strict polynomial growth.\nMoreover, the unique geodesics in this graph are either already unique geodesics in $\\mathbb{Z}^2$, or\nthey run along the ``boundary'' $\\{ (x,y) : \\arctan(y/x) \\approx \\theta \\mbox{ or } \\theta' \\}$ (in fact most geodesics along\nthe boundary are also not unique). But again it is simple to check that these admit detours, and hence $G$ is vdBK.\nSimilar constructions can be done in higher dimensions, and in fact many more subgraphs of $\\mathbb{Z}^d$ satisfy the\nhypotheses of the theorem.\n\nThe next obvious candidate for a non-almost-transitive graph of strict polynomial growth is the infinite cluster\nof a supercritical Bernoulli percolation on a Cayley graph of a virtually nilpotent group.\nIn fact, this will not have strict polynomial growth as we have defined here, since we require \\emph{uniform}\nvolume lower bounds. But beyond that, one can see that (for $p < 1$) almost surely the cluster does \\emph{not}\nadmit detours, and hence by Theorem \\ref{thm:notvdBK} it is \\emph{not} vdBK.\n\nThis can be seen by a simple ``finite energy'' type argument. For any $C < \\infty$, choose a large radius $R \\ge C$ such that\nthe probability that $B(R)$ intersects the infinite cluster is positive; this event is actually independent of the configuration of edges\ninside $E(B(R))$, so choose a particular configuration in $E(B(R))$ such that all edges in contact with the vertex boundary of $B(R)$ are open,\nsuch that all these edges are connected to each other by open edges, and such that these open edges on the boundary are connected\nto an open path of length $\\ge 3C$ which is otherwise surrounded by closed edges.\nThe probability that the boundary of $B(R)$ is connected to infinity \\emph{and} that the restriction of the sampled configuration\nto $E(B(R))$ is our prescribed configuration, is also positive. 
One quickly sees that on this event, the infinite cluster\ncontains a self-avoiding path of length $C$ which does not admit a detour of length at most, say, $(3/2)C$.\nThe event that the infinite cluster contains such a path is clearly a translation-invariant event, and so by ergodicity,\nsince this event occurs with positive probability, it occurs with probability $1$.\nIntersecting all these events for a countable collection $C \\to \\infty$ shows that almost surely the infinite cluster does not admit detours.\n\nOf course, for graphs which are not almost-transitive, the vdBK condition is quite strong. One may ask the following question\n(which is equivalent in the case of almost-transitive graphs):\nfix $o \\in V$. If $\\tilde{\\nu}$ is strictly more variable than $\\nu$ and $\\nu$ is exponential-subcritical,\nis\n\\[\n \\liminf_{x \\to \\infty} \\frac{ \\mathbb{E} T(o,x) - \\mathbb{E} \\tilde{T}(o,x)}{d(o,x)} > 0?\n\\]\nIt is conceivable that the answer might be ``yes'' in the case of supercritical percolation clusters on nilpotent Cayley graphs,\nsince supercritical clusters ``generally behave like their underlying graph'' at large scales.\nPerhaps the proofs in this paper could be adapted to this case, but it would require ``large scale'' and perhaps ``statistical'' weakenings\nof the geometric properties used.\n\n\n\\section{Absolute continuity with respect to the expected empirical measure} \\label{sec:abscont}\nFor a graph $G$ and a probability measure $\\nu$ on $[0, \\infty)$, we say that the associated first passage percolation \n\\emph{has weight distribution absolutely continuous with respect to the expected empirical measure} if for any Borel set $A \\subset [0, \\infty)$ with $\\nu(A) > 0$ we have\n\\[\n \\liminf_{d(x,y) \\to \\infty} \\frac{ \\mathbb{E}[ \\sum_{e \\in \\pi} \\mathbbm{1}_{w(e) \\in A} ] }{ d(x, y) } > 0,\n\\]\nwhere $\\pi$ denotes the $T$-geodesic from $x$ to $y$.\nNote that this does not imply or presuppose that a literal expected 
empirical measure, that is, a weak limit of the expected empirical measures \n$\\frac{1}{d(x,y)} \\mathbb{E} \\sum_{e \\in \\pi} \\delta_{w(e)}$, exists,\\footnote{In the $G=\\mathbb{Z}^d$ case it was recently proven by Bates ~\\cite{Bates} that for ``generic''\n$\\nu$, the sequence of random empirical measures $\\frac{1}{d(0,nv)} \\sum_{e \\in \\pi} \\delta_{w(e)}$ in a fixed direction almost surely\nweakly converges to a deterministic limit measure, an even stronger result than the existence of an \\emph{expected} empirical measure in a particular direction.}\nalthough it does imply that $\\nu$ is absolutely continuous\nwith respect to any subsequential weak limit of this collection of measures. \nAs noted in \\cite{vdBK}, the above property implies strict monotonicity with respect to stochastic domination:\n\n\\begin{prop} \\label{prop:stochdommonotonicity}\n Suppose that $\\nu$ is absolutely continuous with respect to the expected empirical measure. Then whenever $\\nu$ strictly\n stochastically dominates $\\tilde{\\nu}$, that is, whenever $\\tilde{\\nu} \\ne \\nu$ and there exists some coupling $(\\tilde{w},w)$\n of $\\tilde{\\nu}$ and $\\nu$ such that $\\tilde{w} \\le w$ almost surely, we have $\\mathbb{E} \\tilde{T} \\ll \\mathbb{E} T$.\n\\end{prop}\n\\begin{proof}\n \n Fix a coupling $(\\tilde{w}, w)$ with $\\tilde{w} \\le w$; since $\\tilde{\\nu} \\ne \\nu$,\n $\\mathbb{P}(\\tilde{w} < w) > 0$, and so one can find sufficiently small $a > 0, b > 0,$ and Borel set $A \\subset [0, \\infty)$ such that $\\nu(A) > 0$\n and such that for every $y \\in A$,\n \\[ \\mathbb{P}( \\tilde{w} < y - a | w = y) \\ge b. 
\\]\n We thus have\n \\begin{align*}\n \\mathbb{E} T(x,y) - \\mathbb{E} \\tilde{T}(x,y) &\\ge \\mathbb{E}[ T(\\pi) - \\tilde{T}(\\pi)] \\\\\n &= \\mathbb{E} \\sum_{e \\in \\pi} (w(e) - \\tilde{w}(e)) \\\\\n &= \\mathbb{E}\\left[ \\sum_{e \\in \\pi} \\mathbb{E}[ w(e) - \\tilde{w}(e) | w(e) ] \\right] \\\\\n &\\ge \\mathbb{E} \\left[ \\sum_{e \\in \\pi} ab \\mathbbm{1}_{w(e) \\in A} \\right] \\\\\n &\\gtrsim d(x,y),\n \\end{align*}\n where the last inequality follows from the fact that $\\nu$ is absolutely continuous with respect to the expected empirical measure.\n\\end{proof}\nIf $\\nu$ strictly stochastically dominates $\\tilde{\\nu}$, then $\\tilde{\\nu}$ is strictly more variable than $\\nu$,\nso our theorems above already prove strict monotonicity with respect to stochastic domination for \ngraphs which admit detours and are either quasi-trees or have strict polynomial growth (in the latter case, on the\ncondition that $\\nu$ is also exponential-subcritical).\nHowever, we can prove absolute continuity with respect to the empirical measure---and hence strict monotonicity with respect to\nstochastic domination---directly, whether or not $G$ admits detours, by using essentially\nidentical methods to those above. \n\n\\qitreeabscont*\n\\begin{proof}[Proof sketch]\n Let $A \\subset [0, \\infty)$ be Borel with $\\nu(A) > 0$. Set $I_0 = A$. Set $C=1, \\epsilon = 1$. Then do the same construction as in the proof of\n Theorem \\ref{thm:qitree}. 
Whereas Theorem \\ref{thm:qitree} gives $\\gtrsim d(x,y)$ detours in expectation, now this construction\n gives $\\gtrsim d(x,y)$ edges of $\\pi$ which have weights in $I_0 = A$, as desired.\n More explicitly, the construction gives a family of subgraphs \n $\\{ B_i \\}$ with \n \\[\n \\sum_{i} \\mathbb{P}(\\mbox{the geodesic } \\pi:x \\to y \\mbox{ contains an edge } e \\mbox{ in } B_i \\mbox{ of weight } w(e) \\in A) \\gtrsim d(x,y).\n \\]\n Then arguing similarly as in the proof of Lemma \\ref{lem:feasibletovdBK}, one gets a \\emph{disjoint} family $\\{ B_i \\}$ with this property,\n and so one concludes that $\\pi$ contains $\\gtrsim d(x,y)$ edges with weight lying in $A$ in expectation, as desired.\n Lastly, the stochastic domination statement follows from Proposition \\ref{prop:stochdommonotonicity}.\n\\end{proof}\n\n\\polygrowthabscont*\n\\begin{proof}[Proof sketch]\n Let $A \\subset [0, \\infty)$ be Borel with $\\nu(A) > 0$ (one may without loss of generality replace $A$ with a bounded positive $\\nu$-measure subset). 
\n Assume $\\nu$ is exponential-subcritical,\n and choose $q, \\Sigma,$ and $\\delta$ as in the first and simplest construction in the proof of\n Theorem \\ref{thm:polygrowthvdBK}, that is, in the case that $\\nu$ has bounded support and $y_0 = \\sup$.\n Also define the event $A_1$ as in that construction.\n Then define $E_w = E$ by\n \\[\n E := \\left\\{ \\omega \\in [0,\\infty)^{E(B(\\Sigma R))} : \\omega(e) \\in [\\inf, \\inf + \\delta) \\mbox{ if } e \\in E(B(\\Sigma R - 1)), \n \\omega(e) \\in A \\mbox{ otherwise} \\right\\}.\n \\]\n\n \n Using this construction, we have that for sufficiently large $R$, whenever $A_1$ holds, \n the $T$-geodesic $\\pi$ crosses $B_i$, and $w^* \\in E_w = E$,\n there is a path entering $B(o_i, \\Sigma R)$ which has $T^*$-weight strictly smaller than any path not entering $B(o_i, \\Sigma R)$;\n hence the $T^*$-geodesic enters $B(o_i, \\Sigma R)$ and therefore contains\n an edge with weight valued in $A$. \n (Note that this is much simpler than in the proof of Theorem \\ref{thm:polygrowthvdBK} when $y_0 \\ne \\sup$; this is because\n in that proof, we had to ensure that the $T^*$-geodesic made a long excursion away from the boundary of $B(o_i, \\Sigma R)$,\n whereas here we only need it to hit a single edge with weight in $A$).\n \n The Peierls lemma (Lemma \\ref{lem:peierls}) gives, in expectation, $\\gtrsim d(x,y)$ $B_i$\n such that the $T$-geodesic visits $B_i$ and $A_1$ holds; combining this with resampling \n (similar to Proposition \\ref{prop:conditionaltovdBK}) thus gives, in expectation, $\\gtrsim d(x,y)$ balls $B(o_i, \\Sigma R)$ which contain an\n edge of the $T$-geodesic with weight lying in $A$. 
Again arguing similarly as in the proof of Lemma \\ref{lem:feasibletovdBK} then\n gives that, in expectation, the $T$-geodesic contains $\\gtrsim d(x,y)$ edges with weight lying in $A$, as desired.\n The stochastic domination statement again follows from Proposition \\ref{prop:stochdommonotonicity}.\n\\end{proof}\n\n\\subsection*{Acknowledgements}\nThe author thanks Antonio Auffinger for suggesting the problem of generalizing \\cite{vdBK}, as well as for helpful conversations.\nThe author also thanks Alice Kerr, from whom he learned Manning's bottleneck criterion.\n", "meta": {"timestamp": "2022-08-31T02:04:40", "yymm": "2208", "arxiv_id": "2208.13922", "language": "en", "url": "https://arxiv.org/abs/2208.13922"}} {"text": "\\section{Introduction}\nNotwithstanding fifty years of quantum chromodynamics (QCD), the spectrum of light quark mesons remains obscure. This is especially true if one includes states with strangeness, \\emph{e.g}., Refs.\\,\\cite{Ketzer:2019wmd} and \\cite[Sec.\\,63]{Workman:2022ynf}, and despite the claim, made long ago \\cite{Godfrey:1985xj}, ``\\ldots that all mesons -- from the pion to the upsilon -- can be described in a unified framework.'' The latter statement was made with reference to a quark model potential with one-gluon-like exchange plus linear confinement. Some features of such potential models can be understood to have a qualitative connection with QCD. 
For instance, regarding the lighter degrees-of-freedom, $g=u$, $d$, $s$ quarks, one might draw a link between a model's constituent quarks and QCD's dressed-quarks, insofar as dressed quarks are described by a momentum-dependent running-mass, $M_g(k^2)$, which is large at infrared momenta \\cite[Fig.\\,2.5]{Roberts:2021nhw}: $M_{u,d}(0)\\simeq 0.41\\,$GeV, $M_s(0) \\simeq 0.53\\,$GeV.\n\nThere are problems with the quark model position, however.\nOne issue is the quantum mechanics description of light-quark systems in terms of a potential. Owing to the fact that light-particle annihilation and creation effects are essentially nonperturbative in QCD, it is impossible to calculate a quantum mechanical potential between two light quarks \\cite{Bali:2005fu, Prkacin:2005dc, Chang:2009ae}.\nAnother recognises that whilst it might be possible to connect a linear potential with the Wilson area law applicable to infinitely heavy colour-sources and -sinks \\cite{Wilson:1974sk}, the associated flux tube picture \\cite{Isgur:1983wj} has neither a mathematical nor physical connection with the confinement of light quarks. In this sector, at least, confinement is more subtle, arguably manifested in analytic properties of dressed-gluon and \\mbox{-quark} propagators that are markedly different from those of asymptotic states\n\\cite{Krein:1990sf, Burden:1991gd, Roberts:2007ji, Brodsky:2012ku, Qin:2013ufa, Lucha:2016vte, Gao:2017uox, Binosi:2019ecz, Dudal:2019gvn, Fischer:2020xnb, Roberts:2020hiw}.\nA third notes that even if these and many other problems could be overcome, there is the Gell-Mann--Oakes--Renner identity \\cite{GellMann:1968rz}, which linearly relates the pion mass-squared, $m_\\pi^2$, to Nature's explicit source of chiral symmetry breaking, $\\hat m$, \\emph{viz}.\\ $m_\\pi^2 \\propto \\hat m$. 
Such behaviour is impossible in a potential model \\cite{Roberts:2012sv, Horn:2016rip}.\n\nChallenges like that can be surmounted by using continuum Schwinger function methods (CSMs) \\cite{Eichmann:2016yit, Qin:2020rad}; yet, others arise. The most notable relates to construction of a sound, symmetry preserving approximation to the quark+antiquark Bethe-Salpeter kernel. A straightforward systematic approach \\cite{Munczek:1994zz, Bender:1996bb} serves well for ground-state hadrons with little rest-frame orbital angular momentum between the dressed valence constituents \\cite{Holl:2004fr, Holl:2005vu, Fischer:2009jm, Krassnigg:2009zh, Qin:2011dd, Qin:2011xq, Blank:2011ha, Hilger:2014nma, Fischer:2014cfa, Eichmann:2016hgl, Qin:2019hgk}. It is limited, however, by a failure to express consequences of emergent hadron mass (EHM) \\cite{Roberts:2020hiw, Roberts:2021nhw, Binosi:2022djx, Papavassiliou:2022wrb}. Improved schemes are being developed \\cite{Chang:2009zb, Chang:2011ei, Binosi:2014aea, Williams:2015cvx, Binosi:2016rxz, Qin:2020jig}; and in applications to mesons constituted from $u$, $d$ valence quarks and/or antiquarks, to which they have been limited, the new approaches have shown promise.\n\n\n\\begin{figure*}[!t]\n\\centerline{%\n\\includegraphics[clip, width=0.9\\textwidth]{F1.pdf}}\n\\caption{\\label{figAllstrange}\nGold six-pointed stars -- spectrum of low-lying $u$, $d$, $s$ mesons predicted by the Bethe-Salpeter kernel developed herein;\nand black five-pointed stars -- same spectrum computed using rainbow-ladder truncation.\nComparison empirical spectrum \\cite[Summary Tables]{Workman:2022ynf}:\nblue circles (bars) -- $u$, $d$ systems; and green diamonds (bar) -- mesons with $s$ and/or $\\bar s$ quarks. 
The open red diamond is the $K(1460)$, about which little is known.\n(All numerical values are listed in Supplemental Material, Table~\\ref{TableMassPredictions}, along with an explanation of state selection and identification.)\n}\n\\end{figure*}\n\nHerein, we explore the capacity of a new kernel construction \\cite{Qin:2020jig} to simultaneously treat ground- and first-excited-states of light-quark mesons plus those containing $s$ and/or $\\bar s$ quarks. As shown in Fig.\\,\\ref{figAllstrange}, the empirical spectrum displays some curious features. For instance:\nwhilst $m_\\rho < m_{K^\\ast}$, this ordering is reversed for the first excitations of these states;\nthe first excited state of the $\\rho$ is heavier than that of the $\\pi$, but this is reversed for $K^\\ast$, $K$;\nand there is a near degeneracy between axialvector mesons, with the heavier mass of the $s$ quarks seeming to have little or no impact.\nIn delivering the first Poincar\\'e covariant, symmetry-preserving analysis of this collection of mesons to employ an EHM-improved kernel, we provide fresh insights into the physical foundations of the spectrum and structure of lighter-quark mesons.\n\n\\section{Rainbow-ladder truncation}\nMesons appear as poles in the quark+antiquark scattering matrix: the pole location reveals the mass (and width); and the residue of the pole is the Poincar\\'e-covariant bound-state wave function. The scattering matrix is the solution of an integral equation, whose driving term is the Bethe-Salpeter kernel and which features the dressed-propagators of the valence degrees-of-freedom that constitute the system.\n\nThe simplest symmetry-preserving approximation to the meson bound-state problem is provided by the rainbow-ladder (RL) truncation, which is leading-order in the scheme identified in Refs.\\,\\cite{Munczek:1994zz, Bender:1996bb}. 
It is readily introduced by focusing on the gap equation for a quark with flavour $g$:\n\\begin{subequations}\n\\label{EqGap}\n\\begin{align}\nS_g^{-1}(k) &= i\\gamma\\cdot k + m_g + \\Sigma_g(k) \\,,\\\\\n\\Sigma_g(k) & =\\int_{dq}\n4 \\pi \\alpha\\, D_{\\mu\\nu}(l)\\gamma_\\mu\\frac{\\lambda^{a}}{2} S(q) \\Gamma_\\nu^g(q,k)\\frac{\\lambda^{a}}{2}\\,,\n\\end{align}\n\\end{subequations}\nwhere $l=k-q$,\n$m_g$ is the Higgs-produced quark current-mass;\n$\\{\\tfrac{1}{2}\\lambda^a|a=1,\\ldots,8\\}$ are the generators of SU$(3)$-colour in the fundamental representation;\n$\\alpha$ is the QCD coupling; $D_{\\mu\\nu}$ is the dressed-gluon propagator; and $\\Gamma_\\nu^g$ is the relevant dressed-gluon-quark vertex. The solution of Eq.\\,\\eqref{EqGap} is often written\n$S_g(k) = 1/[i\\gamma\\cdot k\\,A_g(k^2) + B_g(k^2)]$.\n\nFollowing Refs.\\,\\cite{Munczek:1994zz, Bender:1996bb}, the Bethe-Salpeter kernel is determined once the diagrammatic content of the kernel in Eq.\\,\\eqref{EqGap} is specified. The rainbow-truncation is obtained by writing:\n$4 \\pi \\alpha D_{\\mu\\nu}(l) \\Gamma_\\nu^g(q,k) \\to {\\mathpzc G}_{\\mu\\nu}(l)\\gamma_\\nu$,\nwhere ${\\mathpzc G}_{\\mu\\nu}$ is a vector-boson exchange-interaction informed by analyses of QCD's gauge sector \\cite{Qin:2011dd}. In the associated ladder truncation, the Bethe-Salpeter kernel is $(y=l^2)$\n\\begin{subequations}\n\\label{EqRLInteraction}\n\\begin{align}\n\\label{KDinteraction}\n\\mathscr{K}_{tu}^{rs} & = {\\mathpzc G}_{\\mu\\nu}(l) [i\\gamma_\\mu\\frac{\\lambda^{a}}{2} ]_{ts} [i\\gamma_\\nu\\frac{\\lambda^{a}}{2} ]_{ru}\\,,\\\\\n {\\mathpzc G}_{\\mu\\nu}(l) & = \\tilde{\\mathpzc G}(y) T_{\\mu\\nu}(l)\\,,\n\\end{align}\n\\end{subequations}\n$l^2 T_{\\mu\\nu}(l) = l^2 \\delta_{\\mu\\nu} - l_\\mu l_\\nu$. 
This tensor structure specifies Landau gauge, which is used because \\cite{Bashir:2009fv}: (\\emph{i}) it is a fixed point of the renormalisation group; (\\emph{ii}) that gauge for which corrections to RL truncation are least noticeable; and (\\emph{iii}) most readily implemented in lattice-regularised QCD.\nIn Eq.\\,\\eqref{EqRLInteraction}, $r,s,t,u$ represent colour, spinor, and flavour matrix indices (as necessary).\n\nThe form of ${\\mathpzc G}_{\\mu\\nu}(l)$ is explained elsewhere \\cite{Qin:2011xq, Binosi:2014aea}:\n\\begin{align}\n\\label{defcalG}\n \\tilde{\\mathpzc G}(y) & =\n \\frac{8\\pi^2 D}{\\omega^4} e^{-y/\\omega^2} + \\frac{8\\pi^2 \\gamma_m \\mathcal{F}(y)}{\\ln\\big[ \\tau+(1+y/\\Lambda_{\\rm QCD}^2)^2 \\big]}\\,,\n\\end{align}\nwhere $\\gamma_m=4/\\beta_0$, $\\beta_0=25/3$,\n$\\Lambda_{\\rm QCD}=0.234\\,$GeV,\n$\\ln(\\tau+1)=2$,\nand ${\\cal F}(y) = \\{1 - \\exp(-y/[4 m_t^2])\\}/y$, $m_t=0.5\\,$GeV.\nRegarding Eq.\\,\\eqref{defcalG}:\n(\\emph{i}) $0 < \\tilde{\\mathpzc G}(0) < \\infty$ because a nonzero gluon mass-scale appears as a consequence of EHM in QCD \\cite{Binosi:2014aea, Cui:2019dwv, Roberts:2021nhw, Binosi:2022djx, Papavassiliou:2022wrb};\nand (\\emph{ii}) the large-$y$ behaviour ensures that the one-loop renormalisation group flow of QCD is preserved.\nProperty (\\emph{ii}) is crucial when considering, \\emph{e.g}., hadron elastic and transition form factors at large momentum transfer \\cite{Chen:2018rwz, Ding:2018xwy, Xu:2019ilh, Xu:2021mju} and the behaviour of parton distribution functions and amplitudes near the endpoints of their support domains \\cite{Ding:2019qlr, Cui:2020dlm, Cui:2020tdf}.\nIt is less important when calculating masses, which are global, integrated properties. Regarding masses, (\\emph{i}) is critical: even a symmetry-preserving treatment of a momentum-independent interaction can deliver good results \\cite{Yin:2021uom, Xu:2021iwv, Gutierrez-Guerrero:2021rsx}. 
Hence, we follow Refs.\\,\\cite{Chang:2009zb, Chang:2010hb, Qin:2020jig}, hereafter retaining only the first term on the right-hand-side of Eq.\\,\\eqref{defcalG}. By obviating the need for renormalisation, this simplifies analysis without materially affecting results.\n\nA typical RL truncation result for the spectrum of $u$, $d$, $s$ mesons is depicted in Fig.\\,\\ref{figAllstrange}. It was obtained by solving the coupled gap and Bethe-Salpeter equations with the interaction just described, using $\\omega = 0.8\\,$GeV, a value chosen because it matches results from analyses of QCD's gauge sector \\cite{Binosi:2014aea, Cui:2019dwv}, $D=D_{\\rm RL}$, with\n\\begin{equation}\n\\label{DRL}\n \\omega D_{\\rm RL} = (1.01\\,{\\rm GeV})^3\\,,\n\\end{equation}\nand current-quark masses\n$m_u = m_d = 2.7\\,$MeV, $m_s=72\\,$MeV.\nSince we have chosen to work with an interaction that eliminates the need for renormalisation, these current masses need not match those inferred from experiment. Nevertheless, compared with such values \\cite{Workman:2022ynf}: our mean $u$, $d$ current mass is not markedly different, \\emph{cf}.\\ $3.45_{-0.15}^{+0.35}\\,$MeV; and our value for the ratio $2 m_s/(m_u+m_d) = 26.7$ is a good match, \\emph{cf}.\\ $27.3_{-0.8}^{+0.7}$.\n\nThe features and flaws of RL truncation are evident in Fig.\\,\\ref{figAllstrange}. Overall, the mean absolute relative difference between RL masses and central experimental values is $13(8)$\\%. Whilst this might appear to be fair agreement, there is substantial scatter; and there are many qualitative discrepancies. 
For instance, labelling the first excited state with an apostrophe, then:\n$m_{K^\\prime} < m_{\\pi^\\prime}$ in RL truncation, whereas the empirical ordering is opposite, and\nthe same is true for $(m_{\\rho^\\prime},m_{\\pi^\\prime})$, $(m_{\\rho^\\prime},m_{K^{\\ast \\prime}})$;\nRL truncation $a_1$-$\\rho$ and $b_1$-$\\rho$ mass splittings are one-third of the empirical values because the $b_1$ and $a_1$ mesons are much too light;\n$m_{\\phi^\\prime}-m_\\phi$ is half the experimental value;\nand the level ordering of the $K_1^{+-}$, $K_1^{++}$ states is incorrect.\n\nFurthermore, RL truncation produces light quark+antiquark scalar mesons, which are not seen in Nature. As explained elsewhere \\cite[Sec.\\,64]{Workman:2022ynf}, the lightest scalar mesons are now considered to be complicated systems with material meson+antimeson components. Thus, the apparent agreement between experiment and the masses of the purely quark+antiquark $f_0$ and $K_0^\\ast$ mesons generated by RL truncation is misleading. Viewed from our perspective, the kernels described herein generate a hadron's dressed-quark core. They do not include the resonant contributions which are typically associated with a meson-cloud. Hence, the quark-core masses of any purely quark+antiquark $f_0$ and $K_0^\\ast$ mesons should be significantly greater than the empirical value because adding resonant contributions to the bound-state kernels will generate a large amount of attraction. This is illustrated, \\emph{e.g}., in Refs.\\,\\cite{Holl:2005st, Santowsky:2020pwd}.\n\n\nA technical remark is in order.\nThe Bethe-Salpeter equation can figuratively be written as an eigenvalue problem, \\emph{viz}.\\ $\\lambda(m) \\Gamma(m) = K(m) \\Gamma(m)$, where $K$ is the kernel and $\\Gamma$ is the bound-state amplitude. 
The on-shell solution is obtained at that mass $m$ for which the eigenvalue is unity, \\emph{i.e}., $d(m):=1-\\lambda(m)=0$.\nUsing RL truncation, one must employ an artificially inflated value of $D$ in Eq.\\,\\eqref{DRL} in order to approach a realistic expression of EHM \\cite{Binosi:2014aea}.\nConsequently, poles in the dressed-quark propagators enter the complex-plane integration domain sampled by the Bethe-Salpeter equation at lower values of $m$ than might otherwise be the case \\cite{Maris:1997tm, Windisch:2016iud}.\nThis limits the number of states for which the Bethe-Salpeter equation can be solved directly using simple numerical algorithms.\nIn our case, this is the set of all meson ground-states in Fig.\\,\\ref{figAllstrange}.\nFor each excited-state meson, we develop a $[n,l]$ Pad\\'e approximant to $d(m)$, $1\\leq n\\leq 4$, $l\\leq n$, and determine the zero by extrapolation. Using a one-point jackknife procedure, we select the result obtained with those values of $n$, $l$ for which the uncertainty in the zero's location is smallest, reporting the associated uncertainty.\n\n\\section{EHM improved kernel}\n\\label{SecEHM}\nThe key to a realistic expression of EHM in mesons lies in using gap equation kernels with a more direct connection to QCD. 
Today, $\\alpha D_{\\mu \\nu}$ is well understood \\cite{Binosi:2014aea, Cui:2019dwv, Roberts:2021nhw, Binosi:2022djx, Papavassiliou:2022wrb}; and although much remains to be learnt about $\\Gamma_\\nu^g$, EHM is known to generate a large anomalous chromomagnetic moment (ACM) for the lighter quarks \\cite{Chang:2010hb, Singh:1985sg, Bicudo:1998qb, Bashir:2011dp, Binosi:2016wcx, Kizilersu:2021jen} and, as illustrated elsewhere \\cite{Chang:2011ei, Williams:2015cvx, Qin:2020jig}, this ACM has a marked impact on the $u$, $d$ meson spectrum.\n\nTo expand existing studies and expose ACM effects on mesons containing $s$ and/or $\\bar s$ quarks, we write \\cite{Qin:2020jig}\n\\begin{equation}\n\\label{EqVertexGap}\n\\Gamma_\\nu^g(q,k) = \\gamma_\\nu + \\tau_\\nu(l)\\,,\\; \\tau_\\nu(l) = \\eta \\kappa(l^2)\\sigma_{l\\nu} \\,,\n\\end{equation}\n$\\sigma_{l\\nu} = \\sigma_{\\rho\\nu} l_\\rho$, $\\kappa(l^2) = (1/\\omega)\\exp{(-l^2/\\omega^2)}$.\nHere, $\\tau_\\nu(l)$ is the ACM term, with $\\eta$ its strength.\nIn QCD, $\\kappa(l^2)$ is power-law suppressed in the ultraviolet; but the Gaussian form, alike with the infrared-dominant term in Eq.\\,\\eqref{defcalG}, is sufficient for illustrative purposes.\nFollowing RL truncation convention, any overall dressing factor $F_1$, as in $F_1(l^2)[\\gamma_\\nu + \\tau_\\nu(l)]$, is implicitly absorbed into $\\tilde{\\mathpzc G}(l^2)$.\nEquation~\\eqref{EqVertexGap} assumes the vertex is flavour-independent. This is a good approximation for the lighter quarks \\cite{Bhagwat:2004hn, Williams:2014iea}.\n\nHaving specified the gap equation kernel, without reference to its diagrammatic content, the method of Ref.\\,\\cite{Qin:2020jig} can be used to obtain a (continuous and discrete) symmetry-consistent closed-form for the Bethe-Salpeter kernel. 
To proceed, consider the inhomogeneous Bethe-Salpeter equation, written figuratively ($g,h=u,d,s$):\n\\begin{align}\n\\Gamma_{H\\alpha\\beta}^{gh}(k;P) & = {\\mathpzc g}_H + \\int_{dq}K^{(2)}_{\\alpha\\alpha^\\prime,\\beta^\\prime\\beta}\n\\chi_{H\\alpha^\\prime \\beta^\\prime}^{gh} (q,P)\\,, \\label{BSequation}\n\\end{align}\nwhere $P=k_+-k_-$ is the total momentum of the quark($k_+$)+antiquark($k_-$) system;\n${\\mathpzc g}_H$ is a Dirac matrix combination that specifies the $J^{P(C)}$ of the channel under consideration;\n$K^{(2)}$ is the two-particle irreducible quark+antiquark scattering kernel, carrying one index for each of the four fermion legs;\n$\\int_{dq}$ denotes a four dimensional integral;\nand $\\chi_H^{gh} (k,P) = S_g(k_+)\\Gamma_H^{gh}(k,P)S_h(k_-)$ is the unamputated vertex. On an $H$-meson mass-shell, $\\Gamma_H^{gh}$ is the bound-state amplitude, with $\\chi_H^{gh}$ the associated Poincar\\'e-covariant wave function.\n\nIn systems that may include nondegenerate valence quarks, Ref.\\,\\cite[Eqs.\\,(6)]{Qin:2020jig} must be generalised by considering the following forms for the Ward-Green-Takahashi (WGT) identities satisfied, respectively, by the unamputated inhomogeneous vector and axialvector vertices -- ${\\mathpzc g}_H = i \\gamma_\\mu, \\gamma_\\mu\\gamma_5$ in Eq.\\,\\eqref{BSequation}:\n{\\allowdisplaybreaks\n\\begin{subequations}\n\\label{eq:WTI}\n\\begin{align}\n\tP_\\mu \\chi_{\\mu}^{gh}& (k_+,k_-) \\nonumber \\\\\n & = i\\Delta_{S_{gh}}^{\\pm}(k) + i (m_g-m_h) \\chi_{0}^{gh}(k_+,k_-) \\,, \\\\\nP_\\mu \\chi_{5\\mu}^{gh}&(k_+,k_-) \\nonumber \\\\\n& = i \\Delta_{S5_{gh}}^{\\pm}(k) - i (m_g+m_h) \\chi_{5}^{gh}(k_+,k_-)\\,, \\label{eqAVWTI}\n\\end{align}\n\\end{subequations}\nwhere\n$\\Delta_{F_{gh}}^{\\pm}(k) = F_g(k_+)-F_h(k_-)$,\n$\\Delta_{F5_{gh}}^{\\pm}(k) = F_g(k_+) \\gamma_5 + \\gamma_5 F_h(k_-)$,\nand $\\chi_{0/5}^{gh}$ are the unamputated scalar/pseudoscalar vertices (${\\mathpzc g}_H = {\\mathbb I}/ \\gamma_5$).\n}\n\nSubsequently 
repeating the algebra that leads from Eqs.\\,(7) to (10) in Ref.\\,\\cite{Qin:2020jig}, one arrives at analogues of the four entries in Ref.\\,\\cite[Eq.\\,(S.6)]{Qin:2020jig}. Importantly, when $m_g \\neq m_h$, Ref.\\,\\cite[Eq.\\,(S.6a)]{Qin:2020jig} is no longer related to \\cite[Eq.\\,(S.6d)]{Qin:2020jig} by charge conjugation; so, \\cite[Eqs.\\,(10)]{Qin:2020jig} become four independent constraint equations. (The flavour labels are readily attached by following the connections $g\\leftrightarrow k_+$, $h\\leftrightarrow k_-$.)\n\nNow introducing Eq.\\,\\eqref{EqVertexGap} into the gap equations, Eq.\\,\\eqref{EqGap}, and using the algebraic equations obtained thereby to construct explicit forms for the WGT identity constraints, Ref.\\,\\cite[Eqs.\\,(S.6)]{Qin:2020jig}, one finds\n\\begin{align}\n\t{K}^{(2)} &= - \\mathcal{G}_{\\mu\\nu}(l)\\gamma_\\mu\\otimes\\gamma_\\nu - \\mathcal{G}_{\\mu\\nu}(l)\\gamma_\\mu \\otimes \\tau_\\nu(l) \\notag\\\\\n\t& + ~ \\mathcal{G}_{\\mu\\nu}(l) \\tau_\\nu(l) \\otimes \\gamma_\\mu + {K}_{\\rm ad} \\,.\n\\end{align}\nRef.\\,\\cite[Eqs.\\,(S.6c)]{Qin:2020jig} is blind to ${K}_{\\rm ad}$, but that is not the case for the other entries in the block. 
So, writing\n\\begin{align}\n\t{K}_{\\rm ad} &= [ \\mathbf{1} \\otimes_+ \\mathbf{1} ] f^{(+)}_{p0} + [ -\\mathcal{G}_{\\mu\\nu}(l)\\gamma_\\mu \\otimes_+ \\gamma_\\nu ] f^{(-)}_{p1} \\notag\\\\\n\t& \\quad + [ \\mathbf{1} \\otimes_- \\mathbf{1} ] f^{(+)}_{n0} + [ -\\mathcal{G}_{\\mu\\nu}(l)\\sigma_{l\\mu} \\otimes_- \\sigma_{l\\nu} ] f^{(+)}_{n1}\\,,\n\t\\label{eq:kernel_Ad}\n\\end{align}\nwhere $\\otimes_\\pm := \\tfrac{1}{2}(\\otimes \\pm \\gamma_5 \\otimes \\gamma_5)$ and\n$f_{pj}^{(\\pm)} = {\\mathsf u}_{pj}^{(\\pm)}(l^2,P^2) + i {\\mathsf v}_{pj}^{(\\pm)}(l^2,P^2)$,\n$f_{nj}^{(+)} = {\\mathsf u}_{nj}^{(+)}(l^2,P^2)$, $ j=0,1$, with ${\\mathsf u}(l^2;P^2)$, ${\\mathsf v}(l^2;P^2)\\in \\mathbb{R}$ for $\\{l^2,P^2\\}\\in \\mathbb{R}$,\none arrives at the following integral equations,\n{\\allowdisplaybreaks\n\\begin{subequations}\n\\label{KernelCompletion}\n\\begin{align}\n& \\int_{dq} \\, {\\mathpzc G}_{\\mu\\nu}(l) \\gamma_\\mu {\\mathpzc s}_A^g(q_+) \\tau_\\nu(l) \\nonumber\\\\\n= & \\int_{dq} \\, [{\\mathpzc s}_B^g(q_+) f_{p0}^{(+)} + {\\mathpzc G}_{\\mu\\nu}(l)\\gamma_\\mu {\\mathpzc s}_B^h(q_-) \\gamma_\\nu f_{p1}^{(-)}]\\,, \\\\\n& \\int_{dq} \\, {\\mathpzc G}_{\\mu\\nu}(l) \\gamma_\\mu {\\mathpzc s}_B^g(q_+) \\tau_\\nu(l) \\nonumber\\\\\n= & \\int_{dq} \\, [{\\mathpzc s}_A^g(q_+) f_{n0}^{(+)} - {\\mathpzc G}_{\\mu\\nu}(l)\\sigma_{l\\mu} {\\mathpzc s}_A^h(q_+) \\sigma_{l\\nu} f_{n1}^{(+)}]\\,.\n\\end{align}\n\\end{subequations}\nHere, $S_g(k) =: {\\mathpzc s}_A^g(k) + {\\mathpzc s}_B^g(k)$,\n$\\{{\\mathpzc s}_A^g,\\gamma_5\\}=0=[{\\mathpzc s}_B^g,\\gamma_5]$.\n}\n\nEquations~\\eqref{KernelCompletion} are a pair of complex-valued integral equations whose solutions complete $K_{\\rm ad}$ and hence $K^{(2)}$.\nIn being determined by resolving WGT identities, Eqs.\\,\\eqref{eq:WTI}, the results are minimal \\emph{Ans\\\"atze} for the kernels.\nThis is analogous to results obtained in analyses that attempt to determine a three-point function from similar 
identities, \\emph{e.g}., Refs.\\,\\cite{Ball:1980ay, Curtis:1990zs, Maris:1997hd, Bashir:2011dp, Qin:2013mta, Qin:2014vya, Aguilar:2014lha, Aguilar:2019jsj}.\nThere and here, notwithstanding their minimal character, the \\emph{Ans\\\"atze} deliver material improvements over leading-order results; in many cases, restoring crucial symmetries that would otherwise be broken.\n\nIn $u$, $d$ channels, assuming isospin symmetry, Eqs.\\,\\eqref{KernelCompletion} yield four real-valued functions because ${\\mathsf v}_{pj}^{(\\pm)}\\equiv 0$.\nSimilarly, there are four solution functions for $s \\bar s$ scattering. In general, they are different from those associated with $u$, $d$ channels.\nIn $u \\bar s$ and kindred channels, Eqs.\\,\\eqref{KernelCompletion} yield six real-valued scalar functions, the number needed to complete the kernel in this case.\nWith these fourteen scalar functions in hand, then for an arbitrary vertex in the family specified by Eqs.\\,\\eqref{EqVertexGap}, one has symmetry-preserving EHM-improved Bethe-Salpeter kernels for use in calculating $u$, $d$, $s$ meson bound-state properties.\nIt should be remembered that the solutions of Eqs.\\,\\eqref{KernelCompletion} depend on $P^2$; hence, must be obtained anew at each value of the total quark+antiquark momentum. 
Of course, they can be obtained on a $P^2$-grid and then interpolated.\n\nTo explore the impact of EHM-induced dressed-quark ACMs on meson properties, we solved for the spectrum of $u$, $d$, $s$ mesons as a function of $\\eta$ in Eq.\\,\\eqref{EqVertexGap} whilst adjusting $D$ in Eq.\\,\\eqref{defcalG} so as to maintain $m_\\rho = 0.77\\,$GeV.\nGiven that $\\eta > 0 $ adds EHM strength to the gap equation's kernel, then $D$ must decrease with increasing $\\eta$ in order to achieve this outcome:\n\\begin{equation}\n\\label{RunD}\nD(\\eta) \\stackrel{\\eta \\in [0, 1.6]}= D_{\\rm RL} \\frac{1 + 0.27 \\eta }{1+1.47 \\eta}\\,.\n\\end{equation}\n\nThe spectrum obtained with $\\eta = 1.2$ is displayed in Fig.\\,\\ref{figAllstrange}.\n(The masses and amplitudes of all meson ground-states in Fig.\\,\\ref{figAllstrange} can be computed directly. Extrapolation is used for their lightest radial excitations.)\nA demonstration that the EHM-improved kernels ensure preservation of QCD's Gell-Mann--Oakes--Renner and Goldberger-Treiman relations \\cite{GellMann:1968rz, Maris:1997hd, Qin:2014vya} may be found elsewhere \\cite[Supplemental Material]{Qin:2020jig}.\n\n\\begin{figure*}[!t]\n\\centerline{%\n\\includegraphics[clip, width=0.9\\textwidth]{F2.pdf}}\n\\caption{\\label{FigLeptonic}\nLeptonic decay constants for all states considered herein: ground states, $n=0$; and lowest lying radial excitations, $n=1$.\nFor the excited states, we present two extrapolation results for each state, \\emph{viz}.\\ one obtained with Pad\\'e approximants, as used for meson masses, and the other employing the Schlessinger point method \\cite{Cui:2022fyr, Binosi:2018rht, Yao:2021pyf, Yao:2021pdy}.\nWhere available, \\emph{i.e}., for some ground states, results inferred from data are also plotted \\cite[PDG]{Workman:2022ynf}.\n(Numerical values listed in Supplemental Material, Tables~\\ref{TableDecayConstantPredictionsn0}, \\ref{TableDecayConstantPredictionsn1}.)\n}\n\\end{figure*}\n\nRegarding the 
EHM-improved results in Fig.\\,\\ref{figAllstrange}, compared with central experimental values, the overall mean absolute relative difference is $2.9(2.7)$\\%, \\emph{i.e}., the EHM-improved kernels deliver a factor of $4.6$ improvement over the RL spectrum.\nFurther,\n$m_{K^\\prime} > m_{\\pi^\\prime}$,\n$m_{\\rho^\\prime} > m_{\\pi^\\prime}$,\n$m_{\\rho^\\prime} \\approx m_{K^{\\ast \\prime}}$,\nmatching empirical results;\nthe $a_1$-$\\rho$ and $b_1$-$\\rho$ mass splittings are commensurate with empirical values because including EHM effects in the kernel has substantially increased the masses of the $b_1$ and $a_1$ mesons, whilst $m_\\rho$ was deliberately kept unchanged -- Eq.\\,\\eqref{RunD};\n$m_{\\phi^\\prime}-m_\\phi$ matches experiment to within 2\\%;\nthe level ordering of the $K_1^{+-}$, $K_1^{++}$ states is correct;\nand quark+antiquark scalar mesons are heavy.\n\nGiven that our EHM-improved kernels deliver a realistic meson spectrum, their predictions for the masses of the as-yet unseen radial excitations of the $b_1$ and $K_1^{++}$ states should be reasonable (in GeV):\n\\begin{subequations}\n\\label{newmasses}\n\\begin{align}\nm_{b_1^\\prime} & = 1.67(3)\\,, \\;\nm_{b_1^\\prime} -m_{b_1} = 0.51(3)\\,, \\\\\nm_{K_1^{++ \\prime}} & = 1.63(1)\\,,\\;\nm_{K_1^{++ \\prime}} - m_{K_1^{++}} = 0.33(1) \\,.\n\\end{align}\n\\end{subequations}\nThe mass splittings in the partner channels are:\n$m_{a_1^\\prime} -m_{a_1} = 0.47(4)$,\n$m_{K_1^{+- \\prime}} - m_{K_1^{+-}} = 0.39(3)$.\nAnother potentially useful observation is that the EHM-improved kernels predict $m_{b_1^\\prime} \\approx m_{a_1^\\prime}$, $m_{K_1^{++\\prime}}\\approx m_{K_1^{+-\\prime}}$.\n\n\nEmploying the Bethe-Salpeter equation solutions used to produce the spectrum, with the amplitudes canonically normalised in the standard fashion \\cite[Sec.\\,3]{Nakanishi:1969ph}, one can calculate the leptonic decay constant, $f_H$, associated with each state.\nThe definitions of these observables may be 
found, \\emph{e.g}., in Refs.\\,\\cite[Eqs.\\,(2)]{Yin:2021uom}, \\cite[Eq.\\,(6)]{Bhagwat:2006py}. They reveal that such decay constants are identically zero for $J^{PC}=0^{++}$, $1^{+-}$ states constituted from mass-degenerate valence degrees-of-freedom.\nRegarding radially excited states, we choose the phase by requiring that the zeroth Chebyshev moment of the leading term in the system's Bethe-Salpeter amplitude is positive in the ultraviolet. This is opposite to the convention in Refs.\\,\\cite{Holl:2004fr, Holl:2005vu}.\nOur results are collected in Fig.\\,\\ref{FigLeptonic}.\n\nFor ground states, the leptonic decay constants can be calculated directly. However, as with meson masses, the decay constants for excited states must be obtained by extrapolation. To achieve that, we wrote $f_H(\\lambda(m))$, where $\\lambda(m)$ is the eigenvalue used to define the Bethe-Salpeter equation, computed $f_H(\\lambda(m))$ on a set of equally spaced $m$-values within the domain on which the Bethe-Salpeter amplitude can be straightforwardly obtained, then extrapolated the result to $\\lambda(m) = 1$.\nHere we provide two sets of results. One was obtained with the Pad\\'e-approximant method used above to determine meson masses. The other set was calculated using the Schlessinger point method (SPM), discussed in detail elsewhere \\cite{Cui:2022fyr, Binosi:2018rht, Yao:2021pyf, Yao:2021pdy} and used herein as described briefly in the supplemental material. Owing to its foundations in analytic function theory and the powerful statistical aspect introduced in modern applications, the SPM provides reliable extrapolations with a rigorously quantified uncertainty. 
The important observation here is that both methods yield consistent results in all cases.\n\nGiven that we have simplified the interaction in Eq.\\,\\eqref{defcalG}, keeping only the first term, then the comparison between our results for ground-state meson decay constants and the few known empirical values is favourable, especially since decay constants are sensitive to ultraviolet physics, which we have omitted. Further, there is a hint that the EHM-improved kernels deliver better agreement.\n\nThe results for radially excited states are especially interesting. In quantum mechanics models of positronium-like systems, one typically finds that, owing to zeros in the associated radial wave functions, the decay constant of a first radial excitation is $(1/8)$-times that of the ground state. Our Bethe-Salpeter equation predictions are broadly consistent with this pattern, except in the case of pseudoscalar mesons. In $J^P=0^-$ channels, the leptonic decay constant must vanish in the chiral limit \\cite{Holl:2004fr, Holl:2005vu, McNeile:2006qy, Ballon-Bayona:2014oma}. Whilst our results are in accord with this prediction, quantum mechanics models cannot deliver such an outcome; thus, our values for unmeasured decay constants warrant testing.\n\n\\section{Perspectives}\n\\label{epilogue}\nIt is worth reiterating that the method employed herein to calculate Bethe-Salpeter kernels for meson bound-state problems is both flexible and certain to give results that are symmetry consistent with any realistic gluon-quark vertex, $\\Gamma_\\nu$. This is true whether or not the diagrammatic content of $\\Gamma_\\nu$ is known. Hence, the approach paves a way to new synergies between continuum and lattice approaches to strong interactions.\n\nThe kernels are not unique; but they are closed-form \\emph{Ans\\\"atze} that enable one to reliably reveal and understand how key features of emergent hadron mass (EHM), contained in $\\Gamma_\\nu$, are expressed in the meson spectrum. 
As an example, we highlighted the multifarious impacts on meson properties of the simple fact that EHM forces dressed-quarks to possess a large anomalous chromomagnetic moment [Sec.\\,\\ref{SecEHM}]. Extending these ideas to baryon bound-state problems would be valuable and attempts are underway.\\n\\nThis study delivers the first Poincar\\'e-invariant treatment of the spectrum and decay constants of the ground- and first-excited states of $u$, $d$, $s$ mesons along with predictions for masses of as-yet unseen states and many unmeasured decay constants. To expedite the analysis, we used a simplified treatment of quark-antiquark scattering. It would therefore be natural to repeat the work using a more realistic interaction. Also, having benchmarked the method against known lighter-quark states, extension of the approach to heavy+light mesons \\cite{Binosi:2018rht, Chen:2019otg, Qin:2019oar}, hybrid mesons \\cite{Burden:2002ps, Qin:2011xq, Hilger:2015hka, Xu:2018cor} and glueballs \\cite{Meyers:2012ka, Souza:2019ylx, Kaptari:2020qlt, Huber:2021yfy} is desirable, especially given world-wide investments in studies of and searches for such states \\cite{Denisov:2018unj, BESIII:2020nme, Anderle:2021wcy, AbdulKhalek:2021gbh, Pauli:2021gde, Quintans:2022}.\\n\\n\\begin{acknowledgments}\\nWe are grateful for constructive comments from \\mbox{C.~Xu}.\\nWork supported by:\\nNational Natural Science Foundation of China (grant nos.\\ 12135007, 11805024);\\nand\\nNatural Science Foundation of Jiangsu Province (grant no.\\ BK20220122).\\n\\end{acknowledgments}\\n\\n\\n\", \"meta\": {\"timestamp\": \"2022-09-02T02:05:24\", \"yymm\": \"2208\", \"arxiv_id\": \"2208.13903\", \"language\": \"en\", \"url\": \"https://arxiv.org/abs/2208.13903\"}} {\"text\": \"\\section{Introduction}\\n\\label{sec:introduction}\\n\\IEEEPARstart{T}{he} composition of materials can be analyzed in a destructive way using traditional chemical processes; nevertheless, we use an innovative material analysis system, based on PGNAA technology and machine 
learning classification algorithms. The composition of waste material can now be classified non-destructively and at high speed.\\n\\nNeutron activation analysis (NAA) uses neutron-induced reactions and examines the spectra of the resulting emissions to determine the element concentrations or material composition of a sample. The sample is first bombarded with neutrons, causing its elements to form radioactive isotopes. The newly formed activation products can be characterized through their radioactive emissions and radioactive decay chains, and the energy spectrum is detected by a sensor; therefore, metal compositions can be found. This project utilizes PGNAA \\cite{pgnaa1}, which is a subcategory of NAA. The additional prefix \u201cPG\u201d stands for prompt gamma-ray and points to state-of-the-art sensors whose resolution is good enough to detect the emission.\\n\\nThere is a strong market demand to apply PGNAA machines to production applications such as recycling plants or ore refineries \\cite{pgnaa1,pgnaa2,pgnaa3}. If the necessary measurement time for PGNAA is a few hours, such measurements are only useful in a laboratory setting. The measuring time for production must be \u201cheavily reduced\u201d. However, cutting the measuring time of PGNAA means the output spectrum is incomplete and noisy \\cite{noise}. \\n\\nTo deal with this incompleteness and noise, we devise Random Sampling Methods (RSMs) for generating the \u201cdown-sizing\u201d training samples, by treating the spectral data as probability distributions.\\nThe resulting spectra keep the intrinsic down-sized nature while the measurement time of the target sample is drastically cut. \\n\\nAn additional trick to downsize the samples is visualizing and indicating the importance of each energy level using a Class Activation Map (CAM). 
We can then discard the less important energy range in order to reduce the necessary number of channels in detectors or to allow a finer resolution in the remaining energy range. We use these downsized samples to train Convolutional Neural Network (CNN) models.\\nSince downsizing too much causes negative effects, we closely monitor loss, accuracy, training time, and prediction time throughout the training process. Visualization such as a Confusion Matrix was used to ensure the prediction is accurate even when the sampling time is minimized. Though other strategies may achieve better results in speed and accuracy \\cite{helmand}, CNN is still an attractive tool for general-purpose classification due to its interpretability of channel importance. \\n\\n\\section{Dataset}\\nWe use three datasets in total for our experiments in Section \\ref{sec:results}. At the beginning of the calibration test, the first dataset, with \\textit{10 species of substances (e.g. scrap metal powder, cement and stucco) and two species of soil}, was used for Experiment I. However, we also collected two data sets of samples in the \u201csame species\u201d. The second dataset is only with \\textit{aluminium} (for Experiment II and IIIa) and the third is with \\textit{copper alloys} from Wieland Electric GmbH (for Experiment IIIb). Each species creates a distinctive PGNAA spectrum. The kinetic energies of electron emission spread over a wide range of energy levels. An individual CSV file is dedicated to each species, which records \u201cEnergy (keV)\u201d vs \u201cCounts (intensities)\u201d. The following characteristics can be learned from a PGNAA spectrum:\\n\\n\\begin{enumerate}\\n\\item Each constituent material of a sample could be identified by the characteristic peaks of PGNAA spectrum output.\\n\\item Classification with sufficient measurement counts is more accurate than with fewer counts; however, the resultant spectrum contains more information than necessary when the measurement time is too long. 
If the measurement time is too short, the accuracy of the peaks will be poor, and noise will appear in the spectrum. The minimum value of sample count rate to provide sufficient accuracy will be further discussed in Section \\ref{count}.\\n\\item The spectra of substances with similar compounds are also close to each other because of their chemical similarity. This can be a challenge in this project.\\n\\end{enumerate}\\n\\n\\section{Methods}\\nIn order to construct a well-defined and modularized system architecture, we started our workflow with crucial procedures such as \\textbf{designing Random Sampling Methods}, \\textbf{constructing model architectures}, \\textbf{training} and \\textbf{optimizing the training set-up}. During this phase, we also have to foresee some of the future needs such as maintenance, revision, or expansion.\\n\\subsection{Random Sampling Methods (RSMs)}\\label{sec:rsm}\\n\\begin{figure}[t]\\n\\centerline{\\includegraphics[width=2.2in]{sampling.png}}\\n\\caption{Sampling PGNAA spectrum with different sample count rates. Spectra show clearly visible characteristic peaks for high count rates. Those peaks vanish in noise for low count rates. }\\n\\label{fig1}\\n\\end{figure}\\nRandom Sampling Methods (RSMs) are a novel idea that could fulfil our main goal of reducing the measurement time \\cite{helmand}, which corresponds to \u201ccounts\u201d. The reason for creating downsized samples with random methods is that their nature resembles the original PGNAA spectrum. This similarity helps to identify the constituent substances and expedites the classification process. A complete data source can be collected all at once. The resulting sample spectrum is a skimmed version of the original, and its measuring time is reduced tremendously. In other words, spectrum scanning of the target material originally takes a few hours, whereas only a few seconds are allowed for our sampling process. 
Hence, we generate down-sized samples for every training iteration. By applying RSM on the dataset, one detailed spectrum for each composition material is already enough to train a model \\cite{dl,cnn}, e.g. a CNN.\\n\\nConsider an imaginary list of distinct energy levels = [1.3 keV, 2.6 keV, 3.9 keV, 5.2 keV].\\nThen we have an over-simplified spectrum from the PGNAA equipment, which we treat as the fully measured sample data corresponding to a longer measuring time: [3.9 keV, 2.6 keV, 2.6 keV, 2.6 keV, 2.6 keV, 2.6 keV, 2.6 keV, 2.6 keV, 3.9 keV, 2.6 keV, 2.6 keV, 1.3 keV, 2.6 keV].\\n\\n\\begin{itemize}\\n\\item To save memory space, the data are stored in a CSV file (in our data source) as two arrays: the above list of distinct energy levels and its corresponding count list = [1, 10, 2, 0].\\n\\end{itemize}\\n\\nFor RSM, we randomly choose a smaller sample, corresponding to a shorter measuring time, for example five samples (k = 5). The output of this sampling method is a downsized sample as a lengthy list of energy levels, and it might look like this: [2.6 keV, 1.3 keV, 2.6 keV, 3.9 keV, 2.6 keV]. Using the above list of distinct energy levels = [1.3 keV, 2.6 keV, 3.9 keV, 5.2 keV] and counting the corresponding occurrences, the output of RSM is: \\n\\begin{itemize}\\n\\item Result of RSM = corresponding count list = [1, 3, 1, 0]\\n\\end{itemize}\\nRSM can be repeated arbitrarily often, yielding arbitrarily many data points for training machine learning models.\\n\\subsection{CNN design for one-dimensional data}\\nConvolutional Neural Networks (CNNs) have been widely used in the field of supervised learning \\cite{cnn}, especially for image classification. The filters (weights) are matrix values that can be considered feature extractors and are learned during the training phase of the model. 
A convolution between a filter and an image can induce effects such as sharpness, edge detection, and even the characteristic peaks of our spectrum.\\n\\nAfter rounds of testing and comparisons, see Table \\ref{tab2} and \\cite{benchmark}, the Inception-ResNet-V2 model was usually significantly faster than its competitors and offered a better throughput-accuracy tradeoff. It also has relatively fewer model blocks, thus less work is needed for the modification in comparison with other complex architectures.\\n\\nThe following steps are taken to handle one-dimensional input data so that it can be compatible with our speedy RSM:\\n\\begin{itemize}\\n\\item Replace or add an untrained convolution layer with custom variants \\cite{replace} at the initial point of the pretrained network (\u201cStem\u201d in the Inception model), so that input data of the required dimension can be fed.\\n\\n\\item Set desired output classes.\\n\\n\\item Reduce all 2D layers to 1D including the kernel dimension. \\n\\end{itemize}\\n\\n\\subsection{Training and Fine Tuning}\\nBesides optimizing the common training hyper-parameters, like batch size, optimizer, learning rate, etc., we also want to minimize our PGNAA data by (1) seeking the lowest count rate (i.e. live time) of PGNAA samples and (2) seeking energy channels that can be discarded for both CNN training and prediction.\\n\\n\\begin{figure}[t]\\n\\centerline{\\includegraphics[width=3.8in]{cam_detail.png}}\\n\\caption{Visualize CNN prediction process with CAM. \\textit{Class Activation Map (CAM) of a specific category illustrates how CNN recognizes the distinguished image regions of that category, by using Global Average Pooling (GAP) performed before the last convolutional network layer.}}\\n\\label{fig2}\\n\\end{figure}\\n\\n\\begin{enumerate}\\n\\item The measurement count of a sample corresponds linearly to the measurement time of a sample; the fewer the count of measurements, the shorter the measurement time, and the more difficult it is to classify the sample. 
In short, the reduction of the sampling time is essential for this project, but excessive reduction of measurement time could lead to very noisy data, see \\figurename~\\ref{fig1}. \\n\\n\\item Class Activation Map (CAM) can be used to illustrate the prediction decisions made by CNN, since Class Activation Map of a specific category indicates how CNN recognizes the distinguished image regions of that category \\cite{cam}. \\n\\n\\end{enumerate}\\n\\nAs shown in \\figurename~\\ref{fig2}, CAM is applied to squeeze the data further in order to expedite the classification process. The program\u2019s original setting for the CAM is two dimensions; the setting was changed to one dimension because our selected CNN model requires one dimension; for instance, the input and the convolutions are one-dimensional. Based on the result graph, we discard Channel 8000 to Channel 16384 (5641.92 keV to 11552.48 keV) on the high energy band in the later setting for Experiment I. \\n\\nThe final decision is that Channels 0 to 103 on the low energy band and Channels 8000 to 16384 on the high energy band should be filtered out so that the accuracy and training time can be optimized.\\n\\nIn summary, the final tuning of the best training for \u201c10 species of substances and two species of soil\u201d is shown in Table \\ref{table}.\\n\\begin{table}[htbp]\\n\\caption{Training Set-up for Experiment I}\\n\\label{table}\\n\\setlength{\\tabcolsep}{3pt}\\n\\begin{tabular}{|p{95pt}|p{145pt}|}\\n\\hline\\nRandom Sampling Method & Random weighted selection in counts\\\\\\nCNN Model & CNN Inception-Resnet-V2 modified 1D\\\\\\nBatch size & 128 (hardware limit)\\\\\\nOptimizer & Adam \\cite{adam}\\\\\\nLearning rate \\cite{lr} & 0.01\\\\\\nEpoch & 150 (stop when the desired accuracy is reached)\\\\\\nDiscarded energy range & 0 - 103 and 8000 - 16384 \\\\\\nSample count rate & 19650 (live time $\\approx 2.479 s$)\\\\\\n\\hline\\n\\end{tabular}\\n\\label{tab1}\\n\\end{table}\\n\\section{Experimental 
results}\\label{sec:results}\\n\\n\\subsection{Experiment I: Train CNN with Dataset of 10 species of substances and two species of soil (live time $\\approx$ 2.479 sec)}\\nTwelve CSV files, covering 10 species of sample substances and two species of soil, are used for training. These files are fed into the RSM to be downsized and \u201csmashed\u201d. In total, 19,000 samples are generated and fed into the CNN model for training, which requires 150 epochs. \\n\\nAs shown in Table \\ref{cm-exp1}, the trained CNN model makes good predictions; the average accuracy is about 96.88\\%. The prediction mostly matches the true label. However, some misclassifications ($\\approx$ 68.5\\%) occurred between \u201cErdreich-11-15-30\u201d and \u201cErdreich-HgS-inhomogen\u201d. This \u201csame species\u201d classification problem will be handled in the next experiment.\\n\\n\\n\\subsection{Experiment II: Dataset Aluminium with 20,000 as sample count rate (live time $\\approx$ 1.11 sec)}\\nWe use the same set-up as Experiment I, including the same CNN model and the same RSM; only the input dataset changes to \u201cDataset Aluminium\u201d, with the discarded energy band of \u201c0 - 103\u201d for this new training. The result is shown in Table \\ref{cm-exp2}.\\n\\nEven when the training is only focused on the substances with the same species, the performance remains about 65 \\% and seems impossible to raise. The problem can be caused by the setting of the sample count rate, in which the resulting sample data from RSM are too noisy and do not provide enough information to distinguish between similar species.\\n\\n\\subsection{Experiment IIIa: Dataset Aluminium with increased sample count rate to 500,000 (live time $\\approx$ 27.74 s)}\\label{count}\\n\\n\\begin{figure}[t]\\n\\centerline{\\includegraphics[width=3.8in]{alu_setup.png}}\\n\\caption{Optimizing sampling strategy for Dataset Aluminium. 
\\textit{Cross Entropy Loss between input and target is computed while training on a classification problem. The aim of training is to minimize the loss.}} \\n\\label{fig3}\\n\\end{figure}\\n\\nIn total, five different sample count rates were tested. The curve with the lowest loss function and the best convergence is chosen. We started with the count rate (20,000) suggested by the last experiment. In the end, we chose 500,000 as our sample count rate according to \\figurename~\\ref{fig3}a, in which the loss function descends stably. The result is shown in Table \\ref{cm-exp3a}.\\n\\nIn addition, we visualized in \\figurename~\\ref{fig3}b with CAM which energy ranges have more influence on classification and filtered out the irrelevant energy range to save training time.\\n\\n\\subsection{Experiment IIIb: Dataset Copper with increased sample count rate to 500,000 (live time $\\approx$ 24.44 s)}\\nFor the copper dataset from Wieland, we repeated the same process as Experiment IIIa to choose the optimal sample count rate for training. The only difference is that the CNN\u2019s high energy discarded range is set to the range of 14,000 to 16,384 (based on the suggestion from CAM) in order to improve the training speed. The result is shown in Table \\ref{cm-exp3b}.\\n\\n\\section{Discussion}\\nAll of our designed models, which are equipped with tested and chosen RSMs, can classify the PGNAA spectrum and find out the constituent materials speedily. The CNNs can classify the sampled PGNAA spectrum based on automatically extracted aggregate features, which is different from common Machine Learning methods like Support Vector Machine (SVM) \\cite{svm}. Other classical approaches such as Normalized Cross Correlation (NCC) \\cite{ncc} and Chi-squared matching perform the classification by comparing the input data with templates iteratively. 
In order to achieve better accuracy, the number of these templates must be increased since they are proportionally related.\\n\\n\\section{Conclusion}\\nThe essence of this work is to devise a suitable Random Sampling Method (RSM) to generate downsized samples to train the CNN models instead of the cumbersome fully measured samples. The downsized samples proved easier to handle and improved speed. Also, the downsized samples resemble the test spectrum read from the PGNAA because both are heavily downsized data. These two properties boost PGNAA\u2019s reading speed so much that it becomes a high-speed scanner.\\n\\nIn the end, the reduced model (CNN Inception-ResNet-V2 modified 1D), combined with the RSM (random weighted selection in counts) to extract our data from CSV, was determined to be our best matching pair after several test runs.\\n\\nBesides tuning the model with different batch sizes, optimizers, and learning rates, the downsized training samples were also inspected with different sampling count rates so as to minimize our necessary sample live time for a prediction. In addition, to strengthen the downsizing process, some less important energy channels were filtered out according to the energy information provided by the Class Activation Map. For example, the highest energy range (Channel 8000 - 16384) and the lowest energy range are discarded.\\n\\n\\appendices\\n\\section*{Data Availability}\\nThe MetalClas project is funded by the German Federal Ministry of Education and Research (BMBF) with grant number 01IS20082B. The copper alloys were provided by Wieland-Werke AG. The PGNAA data used to support the findings of this study are available in a public repository for AI Challenge Days 2021.\\n\\n\\section*{Appendix}\\n\\subsection{Comparing RSM and their best models}\\nFour RSMs were designed, but in Section \\ref{sec:rsm} only RSM-3 was introduced. 
The other three RSMs are implemented in Python as follows: \\n\\begin{itemize}\\n\\item RSM-1 is simply the lengthy measured energy value list. \\n\\n\\item RSM-2 applies a random binomial function to each row in the column \u2019Counts\u2019 of the CSV files. Nevertheless, the sum of measurement counts within a sample remains similar for each material.\\n\\n\\item RSM-4a or RSM-4b is a scatter-plot format or a histogram format converted from the sampling result from RSM-1. \\n\\end{itemize}\\nThe four RSMs have to be tested together with DL models. In this section, CNN and other DL models were tested together with RSM. \\n\\begin{table}[H]\\n\\caption{Comparing RSM and their best models}\\n\\n\\setlength{\\tabcolsep}{3pt}\\n\\begin{tabular}{|c|p{125pt}|c|c|c|}\\n\\hline\\n& Combination of model and RSM & Accuracy & Sampling & Iteration\\\\\\n&&& time (s) & time (s)\\\\\\n\\hline\\n1 & CNN ResNet-101 + RSM-1 & $\\approx$ 65\\% & 0.012 & -\\\\\\n2 & CNN custom 1D + RSM-3 & $\\approx$ 65\\% & 0.073 & -\\\\\\n3 & \\scriptsize{CNN ResNet-101 (modified 1D) + RSM-2} & - & 1.488 & -\\\\\\n4 & \\scriptsize{CNN ResNet-101 (modified 1D) + RSM-3} & $\\approx$ 80\\% & 0.073 & - \\\\\\n5 & \\scriptsize{CNN Inception-ResNet-V2 \\cite{incp}+ RSM-2} & - & 1.488 & - \\\\\\n\\textbf{6} & \\textbf{CNN Inception-ResNet-V2 + RSM-3} & $\\approx$ 90\\% & 0.073 & 2.1 \\\\\\n7 & CNN NasnetALarge \\cite{nasnet} + RSM-4a & $<$ 80\\% & 0.696 & -\\\\\\n8 & CNN PNasnet5Large \\cite{pnas}+ RSM-4a & $<$ 80\\% & 0.696 & -\\\\\\n9 & RNN \\cite{transformer} custom + RSM-3 & $<$ 40\\% & 0.073 & - \\\\\\n10 & Transformer \\cite{transformer} (custom) + RSM-3 & $\\approx$ 90\\% & 0.073 &13.8 \\\\\\n\\hline\\n\\end{tabular}\\n\\label{tab2}\\n\\end{table}\\n\\n\\subsection{Confusion Matrices for Experiment I, II, III$a$ and III$b$}\\n\\newcommand\\items{12}\\n\\arrayrulecolor{white}\\n\\noindent\\n\\begin{table}[H]\\n\\caption{Confusion Matrix for Experiment 
I}\\label{cm-exp1}\n\\centering\n\\resizebox{9cm}{!}{\n\\begin{tabular}{cc*{\\items}{|E}|}\n\n\\multicolumn{1}{c}{} &\\multicolumn{1}{c}{} &\\multicolumn{\\items}{c}{Predicted} \\\\ \\hhline{~*\\items{|-}|}\n\\multicolumn{1}{c}{} & \n\\multicolumn{1}{c}{} & \n\\multicolumn{1}{c}{\\rot{Scrap metal powder}} & \n\\multicolumn{1}{c}{\\rot{Cement}} & \n\\multicolumn{1}{c}{\\rot{Stucco}} &\n\\multicolumn{1}{c}{\\rot{Al-1}} &\n\\multicolumn{1}{c}{\\rot{Cu-1}} &\n\\multicolumn{1}{c}{\\rot{Melamine}} &\n\\multicolumn{1}{c}{\\rot{Asilikos}} &\n\\multicolumn{1}{c}{\\rot{PVC}} & \n\\multicolumn{1}{c}{\\rot{Soil-1}} &\n\\multicolumn{1}{c}{\\rot{Soil Hgs inhomogeneous}} &\n\\multicolumn{1}{c}{\\rot{Battery NiCd}} &\n\\multicolumn{1}{c}{\\rot{Copper ore-A-prod}} \\\\ \n\\hhline{~*\\items{|-}|}\n\\multirow{\\items}{*}{\\rotatebox{90}{Actual}} \n&Scrap metal powder & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&Cement & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&Stucco & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&Al-AW7075 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&Cu & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&Melamine & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&Asilikos & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&PVC & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&Soil-1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&Soil Hgs inhomogeneous & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.28 & 0.72 & 0 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&Battery NiCd & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\\\ \\hhline{~*\\items{|-}|}\n&Copper ore-A-prod & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\\\ \\hhline{~*\\items{|-}|}\n\\end{tabular}\n}\n\\end{table}\n\n\\newcommand\\fouritems{4} 
\n\\noindent\n\\begin{table}[H]\n\\caption{Confusion Matrix for Experiment II}\\label{cm-exp2}\n\\centering\n\\resizebox{4.8cm}{!}{\n\\begin{tabular}{cc*{\\fouritems}{|E}|}\n\\multicolumn{1}{c}{} &\\multicolumn{1}{c}{} &\\multicolumn{\\fouritems}{c}{Predicted} \\\\ \\hhline{~*\\fouritems{|-}|}\n\\multicolumn{1}{c}{} & \n\\multicolumn{1}{c}{} & \n\\multicolumn{1}{c}{\\rot{Al-1}} & \n\\multicolumn{1}{c}{\\rot{Al-2}} & \n\\multicolumn{1}{c}{\\rot{Al-3}} &\n\\multicolumn{1}{c}{\\rot{Al-4}} \\\\ \n\\hhline{~*\\fouritems{|-}|}\n\\multirow{\\fouritems}{*}{\\rotatebox{90}{Actual}} \n&Al-1 & 0.97 & 0 & 0 & 0.03 \\\\ \\hhline{~*\\fouritems{|-}|}\n&Al-2 & 0.06 & 0.38 & 0.41 & 0.16 \\\\ \\hhline{~*\\fouritems{|-}|}\n&Al-3 & 0 & 0.12 & 0.88 & 0 \\\\ \\hhline{~*\\fouritems{|-}|}\n&Al-4 & 0.53 & 0.22 & 0.03 & 0.22 \\\\ \\hhline{~*\\fouritems{|-}|}\n\\end{tabular}\n}\n\\end{table}\n\n\n\\noindent\n\\begin{table}[H]\n\\caption{Confusion Matrix for Experiment III$a$}\\label{cm-exp3a}\n\\centering\n\\resizebox{4.8cm}{!}{\n\\begin{tabular}{cc*{\\fouritems}{|E}|}\n\\multicolumn{1}{c}{} &\\multicolumn{1}{c}{} &\\multicolumn{\\fouritems}{c}{Predicted} \\\\ \\hhline{~*\\fouritems{|-}|}\n\\multicolumn{1}{c}{} & \n\\multicolumn{1}{c}{} & \n\\multicolumn{1}{c}{\\rot{Al-1}} & \n\\multicolumn{1}{c}{\\rot{Al-2}} & \n\\multicolumn{1}{c}{\\rot{Al-3}} &\n\\multicolumn{1}{c}{\\rot{Al-4}} \\\\ \n\\hhline{~*\\fouritems{|-}|}\n\\multirow{\\fouritems}{*}{\\rotatebox{90}{Actual}} \n&Al-1 & 0.98 & 0 & 0 & 0.02 \\\\ \\hhline{~*\\fouritems{|-}|}\n&Al-2 & 0 & 1 & 0 & 0 \\\\ \\hhline{~*\\fouritems{|-}|}\n&Al-3 & 0 & 0.1 & 0.9 & 0 \\\\ \\hhline{~*\\fouritems{|-}|}\n&Al-4 & 0.06 & 0.09 & 0 & 0.84 \\\\ \\hhline{~*\\fouritems{|-}|}\n\\end{tabular}\n}\n\\end{table}\n\n\\newcommand\\fiveitems{5} \n\\noindent\n\\begin{table}[H]\n\\caption{Confusion Matrix for Experiment III$b$}\\label{cm-exp3b}\n\\centering\n\\resizebox{6.0cm}{!}{\n\\begin{tabular}{cc*{\\fiveitems}{|E}|}\n\\multicolumn{1}{c}{} 
&\\multicolumn{1}{c}{} &\\multicolumn{\\fiveitems}{c}{Predicted} \\\\ \\hhline{~*\\fiveitems{|-}|}\n\\multicolumn{1}{c}{} & \n\\multicolumn{1}{c}{} & \n\\multicolumn{1}{c}{\\rot{Cu-1}} & \n\\multicolumn{1}{c}{\\rot{Cu-2}} & \n\\multicolumn{1}{c}{\\rot{Cu-3}} &\n\\multicolumn{1}{c}{\\rot{Cu-4}} &\n\\multicolumn{1}{c}{\\rot{Cu-5}}\\\\ \n\\hhline{~*\\fiveitems{|-}|}\n\\multirow{\\fiveitems}{*}{\\rotatebox{90}{Actual}} \n&Cu-1 & 0.97 & 0.03 & 0 & 0 & 0 \\\\ \\hhline{~*\\fiveitems{|-}|}\n&Cu-2 & 0.09 & 0.91 & 0 & 0 & 0 \\\\ \\hhline{~*\\fiveitems{|-}|}\n&Cu-3 & 0 & 0 & 1 & 0 & 0 \\\\ \\hhline{~*\\fiveitems{|-}|}\n&Cu-4 & 0 & 0 & 0 & 1 & 0 \\\\ \\hhline{~*\\fiveitems{|-}|}\n&Cu-5 & 0 & 0 & 0 & 0 & 1 \\\\ \\hhline{~*\\fiveitems{|-}|}\n\\end{tabular}\n}\n\\end{table}\n\n", "meta": {"timestamp": "2022-08-31T02:04:12", "yymm": "2208", "arxiv_id": "2208.13909", "language": "en", "url": "https://arxiv.org/abs/2208.13909"}} {"text": "\\section{Introduction}\r\n\r\n\r\n\r\nRadio frequency (RF) based wireless communication is inevitably susceptible to eavesdropping due to the broadcast nature of the electromagnetic waves.\r\nWith the ever-growing Internet of Things (IoT) applications, the security issue has become more and more crucial in the future sixth-generation\r\n(6G) wireless networks \\cite{Nguyen21Security}. 
For example, in large enterprise buildings, hospitals, factories, communication may be very sensitive to a hostile adversary.\r\n The conventional cryptography\r\napproaches \\cite{Chen17Survey} usually focus on protecting the transmission content or increasing message decoding complexity, while the physical-layer security \\cite{Barros11Physical, Wang16Physical} approaches exploit the intrinsic wireless fading channels properties\r\n to minimize\r\nthe information leakage to the eavesdroppers.\r\nIn fact, a higher level of security is to hide the existence of the communication, which not only can be applied in all aforementioned application scenarios {\\cite{Du2021Optimal,Du2022Reconfigurable,Du2022performance,Xie2022sec}}, but also meets more critical demands from military or security agencies.\r\nTo address this high level security, covert communications \\cite{Yan2019Gaussian,Bloch16}, which shields the existence of message transmissions against the detection of a warden,\r\nare emerging as a cutting-edge wireless communication security technique, and have recently attracted significant research attention. {{Note that, covert communication aims to hide the communication behavior from the eavesdropper, while physical layer security tries to reduce the interception information of the eavesdropper.\r\n}}\r\n\r\n\r\n\r\n\r\n\r\nThe basic idea of covert communications is as follows. The legitimate transmitter (Alice) transmits messages to the paired receiver (Bob), while guaranteeing a low detection probability for a Warden (Willie).\r\n Although such idea has been realized by spread-spectrum techniques \\cite{Simon94}\r\nfor several decades, the information-theoretic limits of covert communications, which is also referred to as low probability of detection (LPD) communications in some literature, were only recently derived \\cite{Bash13,Bloch16,Wornell16,Wu17}. 
In particular, the authors in \\cite{Bash13} firstly demonstrated that in additive white Gaussian noise\r\n(AWGN) channels, Alice can reliably\r\nsend at most $\\mathcal{O}\\left(\\sqrt n\\right)$ bits to Bob in $n$ channel usages under the covert requirement. Such result is also called the square root law (SRL).\r\nSubsequent works have extended this result\r\n to various channel models such as binary symmetric channels \\cite{Che2013Reliable}, broadcast communications \\cite{Arumugam_TIFS_2019}, multiple access channels \\cite{Arumugam_TIT_2019}, and interference channels\\cite{Cho_TIFS_2020}.\r\n\r\n\r\n\r\n\r\nAlthough the SRL indicates that the asymptotic achievable rate under the covert requirement approaches zero,\r\n many researchers have shown that the SRL limit can be beaten by exploiting additional techniques in the considered covert communication scenario. These methods include: taking advantage of the ignorance of the transmission time at Willie \\cite{Bash2016Covert}; applying an intelligent reflecting surface \\cite{Wang_TOM_2021,Chen_WCL_2021,Si_TCOM_2021}; exploring the molecular absorption or scattering feature of the Terahertz\r\nspectrum \\cite{Gao_TWC_2020,Liu_IoT_2020}; cooperative jamming \\cite{Zheng_TWC_2021} or uninformed jamming \\cite{Sobers2017Covert,Li_TWC_2020,Shmuel_TCOM_2021,Li_TWC_2021}; jointly optimizing the beam\r\ntraining and data transmission for millimeter-wave communication \\cite{Zhang_TIFS_2021};\r\n exploiting the uncertainty noise power or channel state information (CSI) at Willie \\cite{Goeckel16cov,Lee15ach,Yan17on,Shahzad_TIFS_2020,Cheng_TCOM_2021}; robust beamforming design \\cite{Zheng2019Multi-Antenna,MA_TIFS_2021}, over a finite number of\r\nchannel uses\\cite{Yan19Delay}; applying a full-duplex transceiver \\cite{Shu2019Delay,Wang_TWC_2021,SUN_TCOM_2021,Zheng_TCOM_2020}; intermediate relay \\cite{Hu2019Covert,Sheikholeslami2018Multi-Hop} or exploring unmanned aerial\r\nvehicle (UAV) as mobile relay 
\\cite{WangHM_TWC_2019,Yan_ISAC_2021}.\r\n\r\n To be more specific, it is shown in \\cite{Bash2016Covert} that Alice can covertly transmit ${\\cal O}\\left( {\\min \\left\\{ {n,\\sqrt {n\\log T\\left( n \\right)} } \\right\\}} \\right)$\r\nbits to Bob, where $T\\left( n \\right)$ denotes the number of candidate transmission slots.\r\nWhen Willie lacks the knowledge of his noise power, Alice can reliably transmit $\\mathcal{O}\\left( n\\right)$\r\nbits \\cite{Goeckel16cov,Lee15ach,Yan17on}.\r\nWith the aid of an uninformed jammer \\cite{Sobers2017Covert},\r\nAlice can also achieve a positive transmission rate.\r\n With a finite number of channel uses, a uniformly distributed power allocation scheme was proposed \\cite{Yan19Delay} to enhance the covert transmission.\r\n In \\cite{Shu2019Delay}, the authors showed that the effective throughput under delay constraints can be improved by adding artificial noise (AN) at the full-duplex receiver.\r\nFor a one-way relay network, the authors in \\cite{Hu2019Covert} studied the performance limits of covert communications with an energy-harvesting relay.\r\nIn \\cite{Sheikholeslami2018Multi-Hop}, a multiple-relay network was considered, and both the maximum throughput and the minimum\r\nend-to-end delay routing algorithms were developed in the presence of multiple Willies.\r\n In \\cite{Zheng2019Multi-Antenna}, two probabilistic metrics, called the covert outage\r\nprobability and the connectivity probability, were analyzed for multi-antenna covert communications\r\nwith randomly located wardens and\r\ninterferers.\r\n Using the Kullback-Leibler divergence, i.e., $D\\left( {{p_1}||{p_0}} \\right)$ or $D\\left( {{p_0}||{p_1}} \\right)$,\r\n to measure the covertness, a Gaussian input distribution\r\n was shown to be optimal for the covert metric $D\\left( {{p_1}||{p_0}} \\right)$, but not optimal for the covert metric $D\\left( {{p_0}||{p_1}} \\right)$ \\cite{Yan2019Gaussian},\r\nwhere ${p_0}$ and ${p_1}$ represent Willie's received signal distributions when\r\ncovert communication is absent and present, 
respectively.\r\n\r\n\r\n\r\n\r\nThe aforementioned research advances in covert communications\r\nmainly assume a Gaussian input distribution at the transmitter side,\r\nwhich can hardly be realized in practical communication\r\nsystems.\r\nIn fact, the information symbols in practical communication systems are realized in\r\nthe form of discrete constellation points, i.e., finite alphabet inputs, such as pulse amplitude modulation (PAM)\r\n and $M$-ary quadrature amplitude modulation ($M$-QAM).\r\nIn \\cite{Topal_WCL_2020}, both the lower bound of the covert transmission probability and throughput maximization have been analyzed with discrete constellation inputs. There, the discrete constellation points are assumed to be equally likely, which is not optimal for practical covert communications, especially for high-order modulation schemes.\r\n So far, the optimal discrete constellation inputs for covert communication have not been thoroughly investigated in the literature.\r\n\r\nMotivated by the above background, we develop information-theoretic limits of covert communications with probabilistic constellation shaping. First, we\r\nderive the achievable rate expression of the system with\r\ndiscrete constellation input signals, rather than the Gaussian\r\ninputs adopted in most of the existing works. Then, we\r\ninvestigate the performance with the optimal input distribution. Our results provide a\r\npractical design framework for covert communication systems.\r\n The main contributions of this\r\npaper are summarized as follows:\r\n\\begin{itemize}\r\n\\item Generally, the inputs of practical communication\r\nsystems follow a finite-set discrete distribution rather\r\nthan a Gaussian distribution. To evaluate performance, we derive the achievable\r\nrate expressions for an arbitrarily distributed discrete input. Compared with the existing rate expressions with\r\nequiprobable discrete constellation points, the derived\r\nexpressions are more general and practical. 
Since the derived\r\nrate expression is not in closed-form, we further derive\r\nboth lower and upper bounds. All these results\r\ncan be used as performance metrics for the considered covert communication\r\nsystem.\r\n\\item\r\nFurthermore, we design optimal discrete constellation inputs to\r\n maximize the exact covert rate under the covertness constraints, transmit power limitations, and the signal distribution requirements, which is a challenging problem since neither the exact covert rate nor the covertness constraint has an analytical expression. To\r\nefficiently solve it, we conservatively transform the covertness constraint into its upper bound with closed-form expression.\r\n Then, we adopt the numerical integration method\r\nto approximate the covert rate objective function and its gradient.\r\nAfterwards, the optimal\r\nprobability distributions of the discrete constellation are\r\ncalculated by the approximate gradient descent method, where the step sizes are calculated\r\nby the backtracking line search.\r\n\\item To reduce the computation complexity of the\r\ndesign problem, we further adopt the derived lower\r\nbound as the covert rate performance metric. To overcome the non-convexity challenge, this problem is iteratively\r\nsolved by the proposed Frank-Wolfe method.\r\n\r\n\r\n\r\n \\end{itemize}\r\n\r\n The rest of this paper is organized as follows. The\r\nsystem model and the derivation of Bob's achievable rate are presented in Section II.\r\nThe optimal probabilistic constellation shaping design for covert communications is provided in Section III.\r\n The probabilistic constellation shaping design and its approximations are presented in Section IV.\r\n In Section V, we evaluate the proposed probabilistic constellation shaping design using numerical results. 
Finally, we conclude the paper in Section VI.\r\n\r\n\\emph{Notations}: Vectors and matrices are represented by\r\nboldfaced lowercase and uppercase letters, respectively.\r\nThe notations ${\\left( \\cdot \\right)^{\\rm{*}}}$, ${{\\mathbb E}}\\left\\{ \\cdot \\right\\}$,\r\n$\\left\\| \\cdot \\right\\|$, ${\\rm{Tr}}\\left( \\cdot \\right)$, ${\\mathop{\\rm Re}\\nolimits} \\left( \\cdot \\right)$ and ${\\mathop{\\rm Im}\\nolimits} \\left( \\cdot \\right)$ represent the conjugate,\r\nthe expectation, the Frobenius norm, the trace, the real part, and the imaginary part of their arguments, respectively.\r\nThe symbol $\\odot$ denotes the Hadamard product, i.e., ${A_{m \\times n}}\\left[ {{a_{ij}}} \\right] \\odot {B_{m \\times n}}\\left[ {{b_{ij}}} \\right] = {C_{m \\times n}}\\left[ {{a_{ij}}{b_{ij}}} \\right]$.\r\n The relation ${\\bf{A}} \\succeq {\\bf{0}}$ means that ${\\bf{A}}$ is positive semidefinite.\r\nThe notation $\\mathcal{CN}\\left( {\\mu ,{\\sigma ^2}} \\right)$ denotes a complex-valued circularly symmetric Gaussian distribution with mean $\\mu$ and variance ${\\sigma ^2}$.\r\n\r\n\r\n\r\n\r\n\r\n\\section{System Model}\r\n\r\n\r\n\\begin{figure}[h]\r\n \\centering\r\n\t\\includegraphics[width=7cm]{figures/system_model}\r\n \\caption{ {The system model of covert communication.}}\r\n \\label{model}\r\n\\end{figure}\r\nConsider a typical covert communication scenario as illustrated in Fig.~\\ref{model}, where Alice and Bob are a legitimate communication pair, and Willie is the warden. Each of them is equipped with a single antenna. 
Let ${{{g}}_{\\rm{b}}} \\sim {\\cal CN}\\left( {{{0}},\\sigma_1^2} \\right)$ and ${{{g}}_{\\rm{w}}} \\sim {\\cal CN}\\left( {{{0}},\\sigma_2^2} \\right)$ denote the Rayleigh flat fading channels from Alice to Bob and to Willie, respectively \\cite{Shahzad2017Covert}, where $\\sigma_1^2$ and $\\sigma_2^2$ are the variances of ${{{g}}_{\\rm{b}}}$ and ${{{g}}_{\\rm{w}}}$.\r\nLet $x\\left[ i \\right]$ denote Alice's transmitted symbol at the $i$-th channel use, where $i = 1,...,N$, and $N$ is the total number of channel uses.\r\n\r\n\\subsection{Achievable Rate of Bob}\r\n As is the case in practice, $x\\left[ i \\right] \\in \\Omega $\r\n follows a discrete constellation distribution instead of a Gaussian distribution. Here, $\\Omega$ denotes a discrete constellation set with $K$ discrete points\r\n ${\\left\\{ {{x_k}} \\right\\}_{1 \\le k \\le K}}$, i.e.,\r\n\\begin{align} \\label{Omega}\r\n\\Omega = \\left\\{ {X\\left| {\\begin{array}{*{20}{l}}\r\n\\begin{array}{l}\r\n\\Pr \\left( {X = {x_k}} \\right) = {p_k} \\ge 0,~\\sum\\limits_{k = 1}^K {{p_k}} = 1,\\\\\r\n\\sum\\limits_{k = 1}^K {{p_k}} {\\left| {{x_k}} \\right|^2} \\le {P_{\\rm{A}}},~{x_k} \\in {\\mathbb {C}},~k = 1,...,K\r\n\\end{array}\r\n\\end{array}} \\right.} \\right\\},\r\n\\end{align}\r\n where ${{x_k}}$ denotes the $k$th discrete point, ${{p_k}}$ denotes the corresponding probability, and ${P_{\\rm{A}}}$ denotes the maximum average transmit power.\r\n\r\n\r\n\r\n For the $i$-th channel use, the received signal at Bob ${y_{\\rm{b}}}\\left[ i \\right]$\r\n is given as\r\n\\begin{align}\r\n{y_{\\rm{b}}}\\left[ i \\right]\r\n = {g_{\\rm{b}}^{\\rm{*}}}{x}\\left[ i \\right] + {z_{\\rm{b}}}\\left[ i \\right],\r\n\\end{align}\r\nwhere ${z_{\\rm{b}}}\\left[ i \\right] \\sim \\mathcal{CN}\\left( {0,\\sigma _{\\rm{b}}^2} \\right)$ denotes the received noise at Bob.\r\nSince $x\\left[ i \\right] \\in \\Omega$, the likelihood function of ${y_{\\rm{b}}}\\left[ i \\right]$ is given 
as\r\n\\begin{align}\\label{pdf}\r\n{p}\\left( {{y_{\\rm{b}}}} \\right) = \\frac{1}{{{\\pi } {{\\sigma _{\\rm{b}}^2}}}}\\sum\\limits_{k = 1}^K {{p_k}\\exp \\left( { - \\frac{{{{\\left| {{y_{\\rm{b}}} - {g_{\\rm{b}}^{\\rm{*}}}{x_k}} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right)}.\r\n\\end{align}\r\n\r\nTherefore, given the discrete constellation, the achievable rate of Bob ${R_{\\rm{b}}}$ is given by\r\n\\begin{subequations}\\label{MI}\r\n\\begin{align}\r\n&{R_{\\rm{b}}}=I\\left( {{y_{\\rm{b}}}\\left[ i \\right];{x}\\left[ i \\right]} \\right) \\\\\r\n&= h\\left( {{y_{\\rm{b}}}\\left[ i \\right]} \\right) - h\\left( {{z_{\\rm{b}}}\\left[ i \\right]} \\right) \\\\\r\n&= - \\int_{ - \\infty }^\\infty {{p}\\left( {{y_{\\rm{b}}}} \\right)} {\\log _2}{p}\\left( {{y_{\\rm{b}}}} \\right)d{y_{\\rm{b}}} - {\\log _2}\\left( {\\pi e\\sigma _{\\rm{b}}^2} \\right)\\\\\r\n&= - \\sum\\limits_{k = 1}^K {{p_k}} {\\mathbb{E}_{{z_{\\rm{b}}}}}\\left\\{ {{{\\log }_2}\\sum\\limits_{j = 1}^K {{p_j}} \\exp \\left( { - \\frac{{{{\\left| {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right) + {z_{\\rm{b}}}} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right)} \\right\\}\\nonumber\\\\\r\n&\\quad - \\frac{1}{{\\ln 2}},\r\n\\end{align}\r\n\\end{subequations}\r\n{where $h\\left( X \\right) = - \\int {f\\left( x \\right)} {\\log _2}f\\left( x \\right)dx$ denotes the differential entropy of $X$, and\r\n ${f\\left( x \\right)}$ represents its probability density function (PDF).}\r\n\r\nBased on the achievable rate expression {in \\eqref{MI}}, we will further investigate the optimal probability distribution over the discrete constellation points for covert communications.\r\n\\subsection{Hypothesis Testing}\r\n\r\n\r\nBased on the received signals,\r\nWillie attempts to decide whether or not Alice covertly transmits information\r\nto Bob by performing an optimal statistical hypothesis\r\ntest (such as the Neyman-Pearson test).\r\nSpecifically, Willie needs to distinguish between two hypotheses: 1) the null\r\nhypothesis ${{\\cal{H}}_0}$ indicating no 
transmission; 2) the alternative hypothesis ${{\\cal{H}}_1}$ indicating transmission. Let $y_{\\rm{w}}\\left[ i \\right]$ denote the received signal at Willie in the $i$-th channel use.\r\n Under the two hypotheses, the signal received at Willie is given as\r\n\\begin{subequations}\\label{hypothesis}\r\n\\begin{align}\r\n&\\quad {{\\cal{H}}_0}:y_{\\rm{w}}\\left[ i \\right] = {z_{\\rm{w}}}\\left[ i \\right],\\\\\r\n&\\quad {{\\cal{H}}_1}:y_{\\rm{w}}\\left[ i \\right] = g_{\\rm{w}}^{\\rm{*}}x\\left[ i \\right] + {z_{\\rm{w}}}\\left[ i \\right],\r\n\\end{align}\r\n\\end{subequations}\r\nwhere\r\n${z_{\\rm{w}}}\\left[ i \\right]\\sim \\mathcal{CN}\\left( {0,\\sigma _{\\rm{w}}^2} \\right)$ denotes the received noise at Willie.\r\nLet ${{\\cal{D}}_1}$ and ${{\\cal{D}}_0}$ denote\r\nWillie's binary decisions in favor of ${{\\cal{H}}_1}$ and ${{\\cal{H}}_0}$, respectively.\r\nThus, the total detection error probability of Willie is defined as \\cite{Yan2019Gaussian,T.M.Cover02,Lehmann_2005_Testing}\r\n\\begin{align}\r\n\\xi = \\Pr \\left( {{{\\cal{D}}_1}|{{\\cal{H}}_0}} \\right) + \\Pr \\left( {{{\\cal{D}}_0}|{{\\cal{H}}_1}} \\right).\r\n\\end{align}\r\n\r\nNote that $\\Pr \\left( {{{\\cal{D}}_1}|{{\\cal{H}}_0}} \\right)$\r\n denotes the false alarm probability, i.e., the probability that Willie decides ${{\\cal{H}}_1}$ when Alice does not transmit,\r\n and $\\Pr \\left( {{{\\cal{D}}_0}|{{\\cal{H}}_1}} \\right)$ denotes the\r\n missed detection probability, i.e., the probability that Willie decides ${{\\cal{H}}_0}$ when Alice transmits.\r\nMoreover, let\r\n${p_{y,0}} = f\\left( {{y_{\\rm{w}}}\\left| {{{\\cal{H}}_0}} \\right.} \\right)$ and ${p_{y,1}}= f\\left( {{y_{\\rm{w}}}\\left| {{{\\cal{H}}_1}} \\right.} \\right)$ denote the likelihood functions of $y_{\\rm{w}}$ under ${{\\cal{H}}_0}$ and ${{\\cal{H}}_1}$, respectively.\r\nAccording to \\eqref{hypothesis}, we have\r\n\\begin{subequations}\\label{likehood_function}\r\n\\begin{align}\r\n&{p_{y,0}} = \\frac{1}{{ {\\pi } {{\\sigma _{\\rm{w}}^2}}}}\\exp \\left( { - \\frac{\\left|{{y_{\\rm{w}}}}\\right|^2}{{\\sigma 
_{\\rm{w}}^2}}} \\right),\\\\\r\n&{p_{y,1}} = \\frac{1}{{ {\\pi } {{\\sigma _{\\rm{w}}^2}}}}\\sum\\limits_{k = 1}^K {{p_k}\\exp \\left( { - \\frac{{{{\\left| {y_{\\rm{w}} - {g_{\\rm{w}}^{\\rm{*}}}{x_k}} \\right|}^2}}}{{\\sigma _{\\rm{w}}^2}}} \\right)}.\r\n\\end{align}\r\n\\end{subequations}\r\nLet ${V_T}\\left( {{{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.}} \\right) = \\frac{1}{2}{\\left\\| {{p_{y,0}} - {p_{y,1}}} \\right\\|_1}$ denote\r\nthe total variation distance between ${p_{y,0}}$ and ${p_{y,1}}$.\r\nAccording to Theorem 13.1.1 in \\cite{Lehmann_2005_Testing}, the minimum detection error probability achieved by Willie's optimal test is given as\r\n\\begin{align}\r\n{\\xi ^ {\\rm{opt}} } = 1 - {V_T}\\left( {{{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.}} \\right) = 1 - \\frac{1}{2}{\\left\\| {{p_{y,0}} - {p_{y,1}}} \\right\\|_1}.\r\n\\end{align}\r\n\r\n\r\nHowever, in general, ${V_T}\\left( {{{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.}} \\right)$ is difficult to analyze. To address this issue,\r\nwe apply Pinsker's inequality \\cite{T.M.Cover02} to obtain the upper bounds\r\n\\begin{align}\r\n &{V_T}\\left( {{{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.}} \\right) \\le \\sqrt {\\frac{1}{2}D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)}, \\\\\r\n &{V_T}\\left( {{{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.}} \\right) \\le \\sqrt {\\frac{1}{2}D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)},\r\n\\end{align}\r\nwhere $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right) = \\int_y {{p_{y,0}}{{\\log }_2}\\frac{{{p_{y,0}}}}{{{p_{y,1}}}}} d{y_{\\rm{w}}}$ denotes\r\n the Kullback-Leibler (KL) divergence from ${p_{y,0}}$ to ${p_{y,1}}$, and $ D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) = \\int_y {{p_{y,1}}{\\log _2} \\frac{{p_{y,1}}}{{p_{y,0}}}} d{y_{\\rm{w}}}$\r\n denotes the KL divergence from ${p_{y,1}}$ to ${p_{y,0}}$.\r\n\r\n\r\nBased on the likelihood functions of $y_{\\rm{w}}$ in \\eqref{likehood_function}, $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} 
\\right)$ and $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ are, respectively, given as\r\n\\begin{subequations}\r\n\\begin{align}\r\n&D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right) = - \\frac{1}{{\\ln 2}} - {\\mathbb{E}_{{z_{\\rm{w}}}}}\\left\\{ {{{\\log }_2}\\sum\\limits_{k = 1}^K {{p_k}} } \\right.\\nonumber\\\\\r\n&\\quad \\times \\left. {\\exp \\left( { - \\frac{{{{\\left| {{z_{\\rm{w}}} - g_{\\rm{w}}^{\\rm{*}}{x_k}} \\right|}^2}}}{{\\sigma _{\\rm{w}}^2}}} \\right)} \\right\\}\\label{D_p01},\\\\\r\n&D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) = \\sum\\limits_{k = 1}^K {{p_k}} {\\mathbb{E}_{{z_{\\rm{w}}}}}\\left\\{ {{{\\log }_2}\\sum\\limits_{j = 1}^K {{p_j}} } \\right.\\nonumber\\\\\r\n&\\quad \\left. { \\times \\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}\\left( {{x_k} - {x_j}} \\right) + {z_{\\rm{w}}}} \\right|}^2}}}{{\\sigma _{\\rm{w}}^2}}} \\right)} \\right\\} + \\frac{{\\left| {g_{\\rm{w}}^2{P_{\\rm{A}}}} \\right|}}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}} + \\frac{1}{{\\ln 2}}.\\label{D_p10}\r\n\\end{align}\r\n\\end{subequations}\r\n\r\nCovert communication is achieved for a given $\\varepsilon$ if the detection error probability\r\n$ \\xi $ is no less than $1 - \\varepsilon $, i.e.,\r\n\\begin{align}\r\n\\xi \\ge 1 - \\varepsilon, \\varepsilon \\in \\left[ {0,1} \\right],\r\n\\end{align}\r\nwhere $\\varepsilon $ is a small number determining the required covertness level.\r\n\r\nTherefore, to achieve covert communication with the given $\\varepsilon$, i.e., $ \\xi \\ge 1 - \\varepsilon $, the KL divergences of the likelihood functions should satisfy one of the following constraints:\r\n\\begin{subequations}\\label{Dp0p1}\r\n\\begin{align}\r\n&D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)\\le 2{\\varepsilon ^2},\\\\\r\n&D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le 2{\\varepsilon ^2}.\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n{Both} of the above 
constraints can meet the requirements of covert communication, but these two constraints are not exactly the same, as we discuss next.\r\n\r\n\r\n\\section{Optimal Signaling Design under Covert Constraints}\r\n\r\nIn this section, we investigate the design of optimal probability of discrete constellation points\r\n for covert transmission with covertness constraints $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right) \\le 2{\\varepsilon ^2}$ or\r\n$D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le 2{\\varepsilon ^2}$ \\cite{Bash13,Wornell16,Bloch16,Yan2019Gaussian}.\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\\subsection{Case of ${D}\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right) \\le 2{\\varepsilon ^2}$}\r\n\r\nFor the probabilistic constellation shaping scheme, we aim to maximize\r\nthe achievable rate of Bob ${R_{\\rm{b}}}$ by optimizing the distribution of discrete constellation inputs, while satisfying both the covert\r\ntransmission constraint and the discrete distribution constraint. 
Mathematically, the covert discrete constellation input optimization problem can be formulated as follows\r\n\\begin{subequations}\\label{orig_pro 01 1}\r\n\\begin{align}\r\n \\mathop { \\max }\\limits_{\\left\\{ {{p_k}} \\right\\}} ~& {R_{\\rm{b}}}\\\\\r\n{\\rm{s.t.}}~&D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)\\le 2{\\varepsilon ^2},\\label{orig1a}\\\\\r\n&\\Pr \\left( {X = {x_k}} \\right) = {p_k} \\ge {\\rm{0}},~k = 1,...,K,\\label{orig1b}\\\\\r\n& \\sum\\limits_{k = 1}^K {{p_k}{{\\left| {{x_k}} \\right|}^2}} \\le {P_{\\rm{A}}},\\label{orig1c}\\\\\r\n& \\sum\\limits_{k = 1}^K {{p_k}} = 1.\\label{orig1d}\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n\r\n\r\nSince the KL divergence $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ in \\eqref{D_p01} does not admit a closed-form expression, constraint \\eqref{orig1a} is intractable.\r\n To circumvent this, we\r\n derive an explicit upper bound for the KL divergence $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$, which is given by\r\n \\begin{align}\\label{KL_p0p1_ub}\r\n{D_{{\\rm{U}}}}\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)\r\n= - {\\log _2}\\sum\\limits_{k = 1}^K {{p_k}} \\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}{x_k}} \\right|}^{\\rm{2}}}}}{{\\sigma _{\\rm{w}}^2}}} \\right).\r\n\\end{align}\r\n\r\n{The} detailed derivation of \\eqref{KL_p0p1_ub} can be found in Appendix A.\r\nReplacing the KL divergence in \\eqref{orig1a} with the upper bound \\eqref{KL_p0p1_ub}, problem \\eqref{orig_pro 01 1} can be conservatively rewritten as\r\n\\begin{subequations}\\label{orig_2}\r\n\\begin{align}\r\n \\mathop { \\max }\\limits_{\\left\\{ {{p_k}} \\right\\}} ~& {R_{\\rm{b}}}\\\\\r\n{\\rm{s.t.}}~&- {\\log _2}\\sum\\limits_{k = 1}^K {{p_k}} \\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}{x_k}} \\right|}^{\\rm{2}}}}}{{\\sigma _{\\rm{w}}^2}}} \\right) \\le 2{\\varepsilon ^2},\\\\\r\n&\\eqref{orig1b}, \\eqref{orig1c}, \\eqref{orig1d}.\\nonumber\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n In order to put problem 
\\eqref{orig_2} in a more concise form, we define the following variables\r\n\\begin{subequations}\r\n\\begin{align}\r\n&{\\bf{x}} \\buildrel \\Delta \\over = {[{x_1},...,{x_K}]^T},\\\\\r\n&{\\bf{p}} \\buildrel \\Delta \\over = {[{p_1},...,{p_K}]^T},\\\\\r\n &{{ {\\bf{q}}}} \\buildrel \\Delta \\over ={\\left[ {{{\\log }_2}{{\\bf{p}}^T}{{{\\bf{\\hat q}}}_1},...,{{\\log }_2}{{\\bf{p}}^T}{{{\\bf{\\hat q}}}_K}} \\right]^T},\\\\\r\n&{{\\hat {\\bf{q}}}_l} \\buildrel \\Delta \\over = \\left[ \\begin{array}{l}\r\n\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{b}}^{\\rm{*}}\\left( {{x_l} - {x_1}} \\right) + {z_{\\rm{b}}}} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right)\\\\\r\n~~~~~~~~~~~~~~~\\vdots\\\\\r\n\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{b}}^{\\rm{*}}\\left( {{x_l} - {x_K}} \\right) + {z_{\\rm{b}}}} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right)\r\n\\end{array} \\right],~l = 1,...,K,\\\\\r\n &{\\bf{t}} \\buildrel \\Delta \\over = {\\left[ {\\exp \\left( { - \\frac{{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}{x_1}} \\right|}}^2}}}{{\\sigma _{\\rm{w}}^2}}} \\right),...,\\exp \\left( { - \\frac{{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}{x_K}} \\right|}}^2}}}{{\\sigma _{\\rm{w}}^2}}} \\right)} \\right]^T},\\\\\r\n &\\phi \\left( {\\bf{p}} \\right)\\buildrel \\Delta \\over ={\\mathbb{E}_{{z_{\\rm{b}}}}}\\left\\{ {{{\\bf{p}}^T}{\\bf{q}}} \\right\\}.\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n{Furthermore}, the rate of Bob ${R_{\\rm{b}}}$ and the upper bound of the KL divergence ${D_{\\rm{U}}}\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$\r\n can be, respectively, rewritten as follows\r\n \\begin{subequations}\r\n\\begin{align}\r\n &{R_{\\rm{b}}}= - \\phi \\left( {\\bf{p}} \\right) - \\frac{1}{{\\ln 2}},\\\\\r\n &{D_{\\rm{U}}}\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right) = - {\\log _2}{{\\bf{p}}^T}{\\bf{t}}.\r\n\\end{align}\r\n\\end{subequations}\r\n\r\nSince $\\frac{1}{{\\ln 2}}$ is constant and maximizing $ - \\phi \\left( {\\bf{p}} \\right)$ is equivalent to minimizing $ \\phi 
\\left( {\\bf{p}} \\right)$, problem \\eqref{orig_2} can be reformulated as\r\n\\begin{subequations} \\label{pro_01 2}\r\n\\begin{align}\r\n\\mathop { \\min }\\limits_{{\\bf{p}}} &~ \\phi \\left( {\\bf{p}} \\right) \\label{obj_01}\\\\\r\n{\\rm{s}}.{\\rm{t}}.&~-{\\log _2}{{\\bf{p}}^T}{\\bf{t}} \\le 2{\\varepsilon ^2},\\label{pro_01 2a}\\\\\r\n &~{{\\bf{p}}^T}{{\\bf{1}}_K} = 1,\\label{pro_01 2b}\\\\\r\n &~{\\bf{p}}^T\\left( {{\\bf{x}} \\odot {{\\bf{x}}^{\\rm{*}}}} \\right) \\le {P_{\\rm{A}}},\\label{pro_01 2c}\\\\\r\n &~{\\bf{p}} \\ge {\\bf{0}},\\label{pro_01 2d}\r\n\\end{align}\r\n\\end{subequations}\r\nwhere ${{{\\bf{1}}_K}}$ is a $K \\times 1$ vector with all elements equal to $1$.\r\n\r\nProblem \\eqref{pro_01 2} is now a convex problem, and we adopt the gradient projection method to solve it. Specifically, let $\\nabla \\phi \\left( {\\bf{p}} \\right)$ denote the gradient of the objective function \\eqref{obj_01}, which is given by\r\n\\begin{subequations}\r\n\\begin{align}\r\n\\nabla \\phi \\left( {\\bf{p}} \\right) &= {\\mathbb{E}_{{z_{\\rm{b}}}}}\\left\\{ {{\\bf{q}} + {\\bf{Qp}}} \\right\\}=\\int_{ - \\infty }^\\infty {{f_{{z_{\\rm{b}}}}}\\left( {{z_{\\rm{b}}}} \\right)\\left( {{\\bf{q}} + {\\bf{Qp}}} \\right)d} {z_{\\rm{b}}}.\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n{Here}, ${f_{{z_{\\rm{b}}}}}\\left( {{z_{\\rm{b}}}} \\right){\\rm{ = }}\\frac{{\\rm{1}}}{{\\pi \\sigma _{\\rm{b}}^2}}\\exp \\left( { - \\frac{{{{\\left| {{z_{\\rm{b}}}} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right)$ denotes the probability density function of ${z_{\\rm{b}}}$, and ${\\bf{Q}}\\buildrel \\Delta \\over =\\left[ {{Q_{i,j}}} \\right]$ with ${Q_{i,j}} \\buildrel \\Delta \\over = \\frac{{{\\bf{\\hat q}}_j^T{{\\bf{e}}_i}}}{{{\\bf{\\hat q}}_j^T{\\bf{p}}\\ln 2}}$, where ${{\\bf{e}}_i}$ is the unit vector whose $i$th element is $1$ and whose other elements are $0$.\r\n\r\nHowever, neither the objective function $\\phi \\left( {\\bf{p}} \\right)$ nor the gradient $\\nabla \\phi \\left( {\\bf{p}} 
\\right)$ has an analytic expression. To tackle this challenge, we adopt the numerical integration method to approximate $\\phi \\left( {\\bf{p}} \\right)$ and $\\nabla \\phi \\left( {\\bf{p}} \\right)$, i.e.,\r\n\\begin{subequations}\r\n\\begin{align}\r\n\\quad \\tilde \\phi \\left( {\\bf{p}} \\right) &= \\int_{ - {\\tau _1} }^{ {\\tau _1}} {{f_{{z_{\\rm{b}}}}}\\left( {{z_{\\rm{b}}}} \\right)\\left( {{{\\bf{p}}^T}{\\bf{q}}} \\right)d} {z_{\\rm{b}}} ,\\label{obj_func}\\\\\r\n\\quad \\nabla \\tilde \\phi \\left( {\\bf{p}} \\right) &= \\int_{ - {\\tau _2} }^{ {\\tau _2}} {{f_{{z_{\\rm{b}}}}}\\left( {{z_{\\rm{b}}}} \\right)\\left( {{\\bf{q}} + {\\bf{Qp}}} \\right)d} {z_{\\rm{b}}}, \\label{obj_func_gradient}\r\n\\end{align}\r\n\\end{subequations}\r\nwhere $\\left[ { - {\\tau _1} , {\\tau _1}} \\right]$ and $\\left[ { - {\\tau _2} , {\\tau _2}} \\right]$ denote the truncated integration intervals with ${\\tau _1} > 0$ and ${\\tau _2} > 0$, and $\\tilde \\phi \\left( {\\bf{p}} \\right)$ and $\\nabla \\tilde \\phi \\left( {\\bf{p}} \\right)$ denote the approximations of the objective function and its gradient, respectively.\r\n\r\nLet ${{\\bf{p}}_0}$ denote a feasible starting point, and let ${{\\bf{p}}_n}$ denote the feasible point at the $n$th iteration. With the approximate gradient $ \\nabla \\tilde \\phi \\left( {\\bf{p}} \\right)$,\r\nthe gradient descent iteration step is given as\r\n \\begin{align}{{{\\bf{\\hat p}}}_{n + 1}} = {{\\bf{p}}_n} - {\\alpha _n}\\nabla \\tilde \\phi \\left( {{\\bf{p}}_n} \\right),\\end{align}\r\n where ${\\alpha _n} \\in \\left( {0,1} \\right]$ is the step size of the $n$th iteration. 
To choose a proper step size ${\\alpha _n}$ that guarantees a sufficient decrease, we adopt the backtracking line search algorithm given in Algorithm $1$.\r\n\r\nThen, we project ${{{{\\bf{\\hat p}}}_{n + 1}}}$ onto the feasible region of problem \\eqref{pro_01 2}.\r\nSpecifically, when ${{{{\\bf{\\hat p}}}_{n + 1}}}$ satisfies constraints \\eqref{pro_01 2a}-\\eqref{pro_01 2d}, it already lies in the feasible region and we set ${{{{\\bf{ p}}}_{n + 1}}}={{{{\\bf{\\hat p}}}_{n + 1}}}$. Otherwise, we need to find the closest point ${{{{\\bf{ p}}}_{n + 1}}}$ in the feasible region as the projection of ${{{{\\bf{\\hat p}}}_{n + 1}}}$.\r\n\r\nMathematically, the projection operation of ${{{{\\bf{\\hat p}}}_{n + 1}}}$ can be formulated as follows\r\n\\begin{subequations}\\label{pro_01 3}\r\n\\begin{align}\r\n \\mathop {\\min }\\limits_{{{\\bf{p}}_{n + 1}}} &~ {\\left\\| {{{\\bf{p}}_{n + 1}} - {{{\\bf{\\hat p}}}_{n + 1}}} \\right\\|^2}\\\\\r\n{\\rm{s}}.{\\rm{t}}.&~-{\\log _2}{{{\\bf{p}}_{n + 1}^T}}{\\bf{t}} \\le 2{\\varepsilon ^2},\\\\\r\n &~{{{\\bf{p}}_{n + 1}^T}}{{\\bf{1}}_K} = 1,\\\\\r\n &~{{\\bf{p}}_{n + 1}^T}\\left( {{\\bf{x}} \\odot {{\\bf{x}}^{\\rm{*}}}} \\right) \\le {P_{\\rm{A}}},\\\\\r\n &~{{\\bf{p}}_{n + 1}} \\ge {\\bf{0}}.\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n\\begin{algorithm}[htb]\r\n\t\\caption{Backtracking Line Search for Stepsize ${\\alpha _n}$.}\r\n\t\\begin{algorithmic}[1]\r\n\t\t\\State {\\bf{Input:}} ${{{\\bf{ p}}}_n}$, $\\tilde \\phi {\\left( {{{\\bf{p}}_{n }}} \\right)}$, $\\nabla \\tilde \\phi \\left( {{\\bf{p}}_n} \\right)$ and $\\bar \\alpha > 0$, $\\rho ,c \\in \\left( {0,1} \\right)$;\r\n\t\t\\State Update ${{{\\bf{\\hat p}}}_{n+1}}={{\\bf{p}}_n} - \\bar \\alpha \\nabla \\tilde \\phi \\left( {{\\bf{p}}_n} \\right)$;\r\n\t\t\\State {\\bf{if}} ${{{\\bf{\\hat p}}}_{n+1}}$ satisfies constraints \\eqref{pro_01 2a}-\\eqref{pro_01 2d}\r\n\t\t\\State ~~~${{{{\\bf{ p}}}_{n + 1}}}={{{{\\bf{\\hat p}}}_{n + 1}}}$;\r\n\t\t\\State {\\bf{else}}\r\n\t\t\\State ~~~Solve problem \\eqref{pro_01 3} 
over ${{{\\bf{\\hat p}}}_{n+1}}$ to obtain ${{{{\\bf{ p}}}_{n + 1}}}$;\r\n\t\t\\State {\\bf{end}};\r\n\t\t\\State {\\bf{While}}\r\n\t\t$\\tilde \\phi \\left( {{{{\\bf{ p}}}_{n + 1}}} \\right) > \\tilde \\phi \\left( {{{\\bf{p}}_{n }}} \\right) + c\\bar \\alpha \\nabla \\tilde \\phi {\\left( {{{\\bf{p}}_{n }}} \\right)^T}\\left( {{{{{\\bf{ p}}}_{n + 1}}} - {{\\bf{p}}_{n }}} \\right)$\r\n\t\t\\State ~~~$\\bar \\alpha \\leftarrow \\rho \\bar \\alpha$;\r\n\t\t\\State ~~~Update ${{{\\bf{\\hat p}}}_{n+1}}={{\\bf{p}}_n} - \\bar \\alpha \\nabla \\tilde \\phi \\left( {{\\bf{p}}_n} \\right)$;\r\n\t\t\\State ~~~{\\bf{if}} ${{{\\bf{\\hat p}}}_{n+1}}$ satisfies constraints \\eqref{pro_01 2a}-\\eqref{pro_01 2d}\r\n\t\t\\State ~~~~~~${{{{\\bf{ p}}}_{n + 1}}}={{{{\\bf{\\hat p}}}_{n + 1}}}$;\r\n\t\t\\State ~~~{\\bf{else}}\r\n\t\t\\State ~~~~~~solve problem \\eqref{pro_01 3} over ${{{\\bf{\\hat p}}}_{n+1}}$ to obtain ${{{{\\bf{ p}}}_{n + 1}}}$;\r\n\t\t\\State ~~~{\\bf{end}};\r\n\t\t\\State {\\bf{end}};\r\n\t\t\\State {\\bf{return}} ${\\alpha _{n }} = \\bar \\alpha $.\r\n\t\\end{algorithmic}\r\n\\end{algorithm}\r\n\r\n\r\n\r\n\r\n\r\n\r\nCombining the above steps, we propose an inexact gradient descent projection method to\r\n efficiently solve problem \\eqref{pro_01 2}, which is summarized\r\nin Algorithm 2.\r\n \\begin{algorithm}[htb]\r\n \t\\caption{ Inexact Gradient Descent Projection Method.}\r\n \t\\begin{algorithmic}[1]\r\n \t\t\\State {\\bf{Input}}:\r\n \t\t\\parbox[t]{\\dimexpr\\linewidth-\\algorithmicindent - 0.45cm}{choose $K \\ge 2$ and a random starting point ${{\\bf{p}}_0}$ which satisfies constraints \\eqref{pro_01 2a}-\\eqref{pro_01 2d}, set ${c_2}$ as the stopping parameter and $n=0$;\\strut}\t\t\t\r\n \t\t\\State {\\bf{Repeat}}\r\n \t\t\\State ~~~Let $n \\leftarrow n + 1$;\r\n \t\t\\State ~~~Update $\\tilde \\phi \\left( {{{\\bf{p}}_{n - 1}}} \\right)= \\int_{ - {\\tau _1} }^{ {\\tau _1}} {{f_{{z_{\\rm{b}}}}}\\left( {{z_{\\rm{b}}}} \\right)\\left( {{{{\\bf{p}}_{n - 
1}^T}}{\\bf{q}}} \\right)d} {z_{\\rm{b}}}$;\r\n \t\t\\State ~~~Update ${\\nabla \\tilde \\phi \\left( {{{\\bf{p}}_{n - 1}}} \\right)}= \\int_{ - {\\tau _2} }^{ {\\tau _2}} {{f_{{z_{\\rm{b}}}}}\\left( {{z_{\\rm{b}}}} \\right)\\left( {{\\bf{q}} + {\\bf{Q}}{{\\bf{p}}_{n - 1}}} \\right)d} {z_{\\rm{b}}}$;\r\n \t\t\\State ~~~Compute stepsize ${\\alpha _{n - 1}}$ by Algorithm $1$;\r\n \t\t\\State ~~~Update ${{{\\bf{\\hat p}}}_{n}}={{\\bf{p}}_{n-1}} - {\\alpha _{n-1}}\\nabla \\tilde \\phi \\left( {{\\bf{p}}_{n-1}} \\right)$;\r\n \t\t\\State ~~~{\\bf{if}} ${{{\\bf{\\hat p}}}_{n}}$ satisfies constraints \\eqref{pro_01 2a}-\\eqref{pro_01 2d}\r\n \t\t\\State ~~~~~~${{{{\\bf{ p}}}_{n }}}={{{{\\bf{\\hat p}}}_{n }}}$;\r\n \t\t\\State ~~~{\\bf{else}}\r\n \t\t\\State ~~~~~~solve problem \\eqref{pro_01 3} over ${{{\\bf{\\hat p}}}_{n}}$ to obtain ${{{{\\bf{ p}}}_{n }}}$;\r\n \t\t\\State ~~~{\\bf{end}};\r\n \t\t\\State {\\bf{Until}} $\\left\\| {{{\\bf{p}}_n} - {{\\bf{p}}_{n - 1}}} \\right\\| \\le {c_2}$;\r\n \t\t\\State {\\bf{Output}} ${{\\bf{p}}^{{\\rm{opt}}}} = {{\\bf{p}}_n}$.\r\n \t\\end{algorithmic}\r\n \\end{algorithm}\r\n\r\n\\subsection{Case of ${D}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le 2{\\varepsilon ^2}$}\r\n\r\nIn this subsection, we consider the other covertness constraint ${D}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le 2{\\varepsilon ^2}$, for which the covert rate optimization problem can be formulated as\r\n\\begin{subequations}\\label{orig_pro 10 1}\r\n\\begin{align}\r\n \\mathop { \\max }\\limits_{\\left\\{ {{p_k}} \\right\\}} ~& {R_{\\rm{b}}}\\\\\r\n{\\rm{s.t.}}~&{D}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le 2{\\varepsilon ^2},\\label{orig2a}\\\\\r\n&\\Pr \\left( {X = {x_k}} \\right) = {p_k} \\ge {\\rm{0}},\\label{orig2b}\\\\\r\n& \\sum\\limits_{k = 1}^K {{p_k}{{\\left| {{x_k}} \\right|}^2}} \\le {P_{\\rm{A}}},\\label{orig2c}\\\\\r\n& \\sum\\limits_{k = 1}^K {{p_k}} = 1,k = 
1,...,K.\\label{orig2d}\r\n\\end{align}\r\n\\end{subequations}\r\n\r\nTo handle intractable constraint \\eqref{orig2a}, we first derive an upper bound on $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ denoted by ${D_{{\\rm{U}}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$, which is given by\r\n\\begin{align}\\label{KL_p1p0_ub}\r\n{D_{{\\rm{U}}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)\r\n=&\\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}}\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right) \\right|}^{\\rm{2}}}}}{{2\\sigma _{\\rm{w}}^2}}} \\right)\\nonumber\\\\\r\n&+ \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}}-1.\r\n\\end{align}\r\n\r\n{The} details of derivations for \\eqref{KL_p1p0_ub} are given in Appendix B.\r\nThen, problem \\eqref{orig_pro 10 1} can be reformulated as\r\n\\begin{subequations}\\label{orig_3}\r\n\\begin{align}\r\n \\mathop { \\max }\\limits_{\\left\\{ {{p_k}} \\right\\}} ~& {R_{\\rm{b}}}\\\\\r\n{\\rm{s.t.}}~&{D_{{\\rm{U}}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)\\le 2{\\varepsilon ^2},\\label{orig_3a}\\\\\r\n&\\eqref{orig2b}, \\eqref{orig2c}, \\eqref{orig2d}.\\nonumber\r\n\\end{align}\r\n\\end{subequations}\r\n\r\nBy defining the following equations to simplify ${D_{{\\rm{U}}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$\r\n\\begin{subequations}\r\n\\begin{align}\r\n &{{\\bf{s}}_{{\\rm{w}},k}} \\buildrel \\Delta \\over = \\left[ \\begin{array}{l}\r\n\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}\\left( {{x_k} - {x_1}} \\right)} \\right|}^2}}}{{2\\sigma _{\\rm{w}}^2}}} \\right),...,\r\n\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}\\left( {{x_k} - {x_K}} \\right)} \\right|}^2}}}{{2\\sigma _{\\rm{w}}^2}}} \\right)\r\n\\end{array} \\right]^T,\\\\\r\n &{{\\bf{v}}_{\\rm{w}}}\\left( {\\bf{p}} 
\\right)\\buildrel \\Delta \\over = {\\left[ {{{\\log }_2}{{\\bf{p}}^T}{{\\bf{s}}_{{\\rm{w}},1}},...,{{\\log }_2}{{\\bf{p}}^T}{{\\bf{s}}_{{\\rm{w}},K}}} \\right]^T},\r\n\\end{align}\r\n\\end{subequations}\r\nwe obtain\r\n\\begin{align}\r\n &{D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) = {{\\bf{p}}^T}{{\\bf{v}}_{\\rm{w}}}\\left( {\\bf{p}} \\right) + \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}}-1.\r\n\\end{align}\r\n\r\nUnfortunately, ${D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ is non-convex in ${\\bf{p}}$, and thus the covert constraint \\eqref{orig_3a} is also non-convex.\r\nTo handle this issue, we apply the first-order Taylor expansion to ${D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$. Specifically,\r\nthe gradient of ${D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ at ${{{{\\bf{\\bar p}}}_n}}$ is given by\r\n\\begin{align}\r\n\\nabla {D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)\\left| {_{{\\bf{p}} = {{{{\\bf{\\bar p}}}_n}}}} \\right.\r\n= {{\\bf{v}}_{\\rm{w}}}\\left( {{{\\bf{\\bar p}}}_n} \\right) + \\nabla {{\\bf{v}}_{\\rm{w}}}{{{\\bf{\\bar p}}}_n},\r\n\\end{align}\r\nwhere $\\nabla {\\bf{v}}_{\\rm{w}} = {\\left[ {\\frac{{{{\\bf{s}}_{{\\rm{w}},1}}}}{{{{{\\bf{\\bar p}}}_n^T}{{\\bf{s}}_{{\\rm{w}},1}}}},...,\\frac{{{{\\bf{s}}_{{\\rm{w}},K}}}}{{{{{\\bf{\\bar p}}}_n^T}{{\\bf{s}}_{{\\rm{w}},K}}}}} \\right]_{K \\times K}}$. 
Then, the first order Taylor expansion of ${{\\bf{p}}^T}{{\\bf{v}}_{\\rm{w}}}\\left( {\\bf{p}} \\right)$ is given as follows\r\n\\begin{align}\r\nL\\left( {\\bf{p}} \\right)\\approx {\\bf{\\bar p}}_n^T{\\bf{v}}_{\\rm{w}}\\left( {{{\\bf{\\bar p}}}_n} \\right) + {\\left( {{\\bf{v}}_{\\rm{w}}\\left( {{{\\bf{\\bar p}}}_n} \\right) + \\nabla {\\bf{v}}_{\\rm{w}}{{{\\bf{\\bar p}}}_n}} \\right)^T}\\left( {{\\bf{p}} - {{{\\bf{\\bar p}}}_n}} \\right).\r\n\\end{align}\r\n\r\n\r\nThen, constraint \\eqref{orig_3a} can be recast to a convex form as\r\n\\begin{align}\r\nL\\left( {\\bf{p}} \\right) + \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}}-1 \\le 2{\\varepsilon ^2}\\label{constraint_1_2}.\r\n\\end{align}\r\n\r\nThus, problem \\eqref{orig_pro 10 1} can be reformulated as\r\n\\begin{subequations}\\label{pro_10 2}\r\n\\begin{align}\r\n \\mathop {\\min }\\limits_{{{\\bf{p}}}} &~ \\phi \\left( {\\bf{p}} \\right) \\\\\r\n\\quad {\\rm{s}}{\\rm{.t}}{\\rm{.}} &~L\\left( {\\bf{p}} \\right) + \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}}-1 \\le 2{\\varepsilon ^2},\\\\\r\n\\quad &~ {\\bf{p}}^T{{\\bf{1}}_K} = 1,\\\\\r\n\\quad & ~{\\bf{p}}^T\\left( {{\\bf{x}} \\odot {\\bf{x}}} \\right) \\le {P_{\\rm{A}}},\\\\\r\n\\quad & ~{{\\bf{p}}} \\ge 0,\r\n\\end{align}\r\n\\end{subequations}\r\nwhich is convex.\r\n\r\n Similarly, problem \\eqref{pro_10 2} can be efficiently solved by the inexact gradient descent projection method. 
The details are omitted due to space limitation.\r\n\r\n\\section{Signaling Design with Approximate Covert Rate Expression}\r\n\r\n\r\nDue to the expectation operation,\r\nthe achievable rate in \\eqref{MI} does not have a closed-form expression, and can only be computed numerically using the approximate gradient descent method at the expense of high computational complexity.\r\n To strike a balance\r\nbetween complexity and performance, we further derive analytical upper bound and lower bound on the achievable rate in \\eqref{MI}.\r\n\r\n\\textbf{Lemma 1}: An upper bound ${R_{\\rm{b}}^{\\text{U}}}$ on the rate ${R_{\\rm{b}}}$ is given by\r\n\\begin{align}\\label{mutual_I ub}\r\n{R_{\\rm{b}}^{\\text{U}}}\r\n=- \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} \\exp \\left( { - \\frac{{{{\\left| {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right)} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right),\r\n\\end{align}\r\nwhile a lower bound ${R_{\\rm{b}}^{\\text{L}}}$ on the rate ${R_{\\rm{b}}}$ is given as\r\n\\begin{align}\\label{mutual_I lb}\r\n{R_{\\rm{b}}^{\\text{L}}}\r\n=& - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}}\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right) \\right|}^{\\rm{2}}}}}{{2\\sigma _{\\rm{b}}^2}}} \\right)\\nonumber\\\\\r\n&- \\frac{1}{{\\ln 2}}+1.\r\n\\end{align}\r\nPlease find the derivation in Appendices C and D.\r\n\r\nIn this section, we adopt the upper bound and lower bound on the achievable rate ${R_{\\rm{b}}}$ in our following analysis.\r\n\r\n\\subsection{Maximizing ${R_{\\rm{b}}^{\\text{U}}}$ }\r\n\r\nIn this subsection, we consider the upper bound on the achievable rate for Bob ${R_{\\rm{b}}^{\\text{U}}}$ as the objective function to find the optimal probability of discrete constellation points set. 
Specifically, we study beamforming design with the objective of maximizing ${R_{\\rm{b}}^{\\text{U}}}$, subject to the covert transmission\r\nconstraint, and the discrete constellation set.\r\n\r\n\\subsubsection{${D_{\\rm{U}}}\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right) \\le 2{\\varepsilon ^2}$}\r\n\r\nFinding the optimal probability of discrete constellation set can be equivalently written as the following optimization problem\r\n\\begin{subequations}\\label{uper_pro 01 2}\r\n\\begin{align}\r\n \\mathop { \\max }\\limits_{\\left\\{ {{p_k}} \\right\\}} &~ {R_{\\rm{b}}^{\\text{U}}}\\\\\r\n{\\rm{s}}{\\rm{.t}}.&~{D_{\\rm{U}}}\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right) \\le 2{\\varepsilon ^2},\\\\\r\n&~\\Pr \\left( {X = {x_k}} \\right) = {p_k} \\ge {\\rm{0}},\\label{up1b}\\\\\r\n&~ \\sum\\limits_{k = 1}^K {{p_k}{{\\left| {{x_k}} \\right|}^2}} \\le {P_{\\rm{A}}},\\label{up1c}\\\\\r\n&~ \\sum\\limits_{k = 1}^K {{p_k}} = 1,k = 1,...,K.\\label{up1d}\r\n\\end{align}\r\n\\end{subequations}\r\n\r\nIn order to solve problem \\eqref{uper_pro 01 2}, we first define the following variables\r\n\\begin{subequations}\r\n\\begin{align}\r\n &{{\\bf{r}}_k} \\buildrel \\Delta \\over = \\left[ \\begin{array}{l}\r\n \\exp \\left( { - \\frac{{{{\\left| {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_1}} \\right)} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right),...,\r\n \\exp \\left( { - \\frac{{{{\\left| {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_K}} \\right)} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right)\r\n \\end{array} \\right]^T,\\\\\r\n &{\\bf{u}}\\left( {\\bf{p}} \\right) \\buildrel \\Delta \\over = {\\left[ {{{\\log }_2}{{\\bf{p}}^T}{{\\bf{r}}_1},...,{{\\log }_2}{{\\bf{p}}^T}{{\\bf{r}}_K}} \\right]^T}.\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n{In} this case, we can obtain\r\n\\begin{align}\r\n &{R_{\\rm{b}}^{\\text{U}}} = - {{\\bf{p}}^T}{\\bf{u}}\\left( {\\bf{p}} \\right).\r\n\\end{align}\r\n\r\nTherefore, problem \\eqref{uper_pro 01 2} can be 
reformulated as follows\r\n\\begin{subequations}\\label{approxi_pro 01}\r\n\\begin{align}\r\n \\mathop { \\min }\\limits_{{\\bf{p}}} &~ {f_{\\rm{U}}}\\left( {\\bf{p}} \\right) \\label{obj_01 1}\\\\\r\n{\\rm{s}}{\\rm{.t}}.&~- {\\log _2}{{\\bf{p}}^T}{\\bf{t}} \\le 2{\\varepsilon ^2},\\label{approxi_pro 01a}\\\\\r\n &~{{\\bf{p}}^T}{{\\bf{1}}_K} = 1,\\label{approxi_pro 01b}\\\\\r\n &~{\\bf{p}}^T\\left( {{\\bf{x}} \\odot {\\bf{x}}} \\right) \\le {P_{\\rm{A}}},\\label{approxi_pro 01c}\\\\\r\n &~{\\bf{p}} \\ge {\\bf{0}},\\label{approxi_pro 01d}\r\n\\end{align}\r\n\\end{subequations}\r\nwhere ${f_{\\rm{U}}}\\left( {\\bf{p}} \\right)={{{\\bf{p}}^T}{\\bf{u}}}\\left( {\\bf{p}} \\right)$.\r\nThe Frank-Wolfe method is an algorithm for solving linearly constrained problems: it makes a linear approximation of the objective function, obtains a feasible descent direction by solving a linear program, and conducts a one-dimensional search in the feasible region along this direction.\r\nTherefore, we apply the Frank-Wolfe method to solve problem \\eqref{approxi_pro 01}.\r\n\r\n We use a Taylor expansion to make a linear approximation of the objective function ${f_{\\rm{U}}}\\left( {\\bf{p}} \\right)$.\r\n The first-order Taylor expansion at ${{{\\bf{p}}_n}}$ is as follows\r\n\\begin{subequations}\r\n\\begin{align}\r\n&{f_{\\rm{U}}}\\left( {\\bf{p}} \\right) \\approx {\\bf{p}}_n^T{\\bf{u}}\\left( {{\\bf{p}}_n} \\right) + \\nabla {f_{\\rm{U}}}{\\left( {{{\\bf{p}}_n}} \\right)^T}\\left( {{\\bf{p}} - {{\\bf{p}}_n}} \\right),\\\\\r\n&\\nabla {f_{\\rm{U}}}\\left( {{{\\bf{p}}_n}}\\right) = {\\bf{u}}\\left( {{\\bf{p}}_n} \\right) + \\nabla {\\bf{u}}{{\\bf{p}}_n},\r\n\\end{align}\r\n\\end{subequations}\r\nwhere $\\nabla {\\bf{u}} = {\\left[ {\\frac{{{{\\bf{r}}_{1}}}}{{{{\\bf{p}}_n^T}{{\\bf{r}}_{1}}}},...,\\frac{{{{\\bf{r}}_{K}}}}{{{{\\bf{p}}_n^T}{{\\bf{r}}_{K}}}}} \\right]_{K \\times K}}$, and ${{\\bf{p}}_n}$ denotes the current iteration point.\r\nThen, we reformulate the 
optimization problem of \\eqref{approxi_pro 01} as follows\r\n\\begin{align}\\label{pro_U 01}\r\n \\mathop { \\min }\\limits_{{\\bf{p}}}& ~\r\n \\nabla {f_{\\rm{U}}}{\\left( {{{\\bf{p}}_n}} \\right)^T}{\\bf{p}} \\\\\r\n{\\rm{s}}{\\rm{.t}}.&~\\eqref{approxi_pro 01a}, \\eqref{approxi_pro 01b}, \\eqref{approxi_pro 01c}, \\eqref{approxi_pro 01d}\\notag.\r\n\\end{align}\r\n\r\nBy applying the Frank-Wolfe method, the detailed procedures for solving \\eqref{pro_U 01} are summarized in Algorithm $3$. Note that ${\\lambda _n}$ is the stepsize of the $n$th iteration and ${{\\bf{d}}_n}$ denotes the feasible descent\r\ndirection of the $n$th iteration.\r\n\\begin{algorithm}[htb]\r\n \\caption{: Solving \\eqref{pro_U 01} by the Frank-Wolfe method.}\r\n \\begin{algorithmic}[1]\r\n \\State {\\bf{Initialization:}} Choose a feasible starting point ${{\\bf{p}}_0}$, set $\\delta > 0 $ as the stopping parameter, let $n=0$;\r\n \\State {\\bf{Repeat}}\r\n \\State ~~~Solve the linear programming problem \\eqref{pro_U 01} and obtain the optimal solution ${{\\bf{\\bar p}}_n}$;\r\n \\State ~~~Construct the feasible descent direction ${{\\bf{d}}_n} = {\\bf{\\bar p}}_n - {\\bf{p}}_n$;\r\n \\State ~~~{\\bf{if}} $\\left| {\\nabla {f_{\\rm{U}}}{{\\left( {{{\\bf{p}}_n}} \\right)}^T}{{\\bf{d}}_n}} \\right| \\le \\delta $, stop;\r\n \\State ~~~Obtain the stepsize ${\\lambda _n}=\\mathop {\\arg \\min }\\limits_{0 \\le \\lambda \\le 1} {f_{\\rm{U}}}\\left( {{{\\bf{p}}_n} + \\lambda {{\\bf{d}}_n}} \\right)$;\r\n \\State ~~~Let ${{\\bf{p}}_{n + 1}} = {{\\bf{p}}_n} + {\\lambda _n}{{\\bf{d}}_n}$, $n \\leftarrow n + 1;$\r\n \\State {\\bf{end}}\r\n \\State {\\bf{Output}} ${{\\bf{p}}_n}$\r\n \\end{algorithmic}\r\n\\end{algorithm}\r\n\r\n\r\n\\subsubsection{${D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le 2{\\varepsilon ^2}$}\r\nFurthermore, we consider the optimal probabilistic constellation shaping for covert communications with covert constraint ${D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le 
2{\\varepsilon ^2}$, such that\r\n\\begin{subequations}\\label{uper_pro 10 2}\r\n\\begin{align}\r\n \\mathop { \\max }\\limits_{\\left\\{ {{p_k}} \\right\\}}&~ {R_{\\rm{b}}^{\\text{U}}}\\\\\r\n{\\rm{s}}{\\rm{.t}}.&~{D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le 2{\\varepsilon ^2},\\label{uper_pro 10 2a}\\\\\r\n&~\\eqref{up1b}, \\eqref{up1c}, \\eqref{up1d}\\notag,\r\n\\end{align}\r\n\\end{subequations}\r\n which is non-convex.\r\n\r\n Similar to problem \\eqref{orig_3}, by replacing constraint \\eqref{uper_pro 10 2a} with constraint \\eqref{constraint_1_2}, we can obtain the optimization problem\r\n\\begin{subequations} \\label{pro_U 10}\r\n\\begin{align}\r\n \\mathop { \\min }\\limits_{{\\bf{p}}}&~\\nabla {f_{\\rm{U}}}{\\left( {{{\\bf{p}}_n}} \\right)^T}{\\bf{p}} \\\\\r\n{\\rm{s}}{\\rm{.t}}.& ~L\\left( {\\bf{p}} \\right) + \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}}-1 \\le 2{\\varepsilon ^2},\\\\\r\n &~\\eqref{approxi_pro 01b}, \\eqref{approxi_pro 01c}, \\eqref{approxi_pro 01d}\\notag.\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n{Then}, we apply the Frank-Wolfe method to solve problem \\eqref{uper_pro 10 2}, and the details are omitted since the corresponding algorithm is\r\nsimilar to Algorithm 3.\r\n\r\n\r\n\\subsection{Maximizing $ {R_{\\rm{b}}^{\\text{L}}}$}\r\n\r\nIn this subsection, we further study the probabilistic constellation shaping design for covert communication that maximizes the lower bound ${R_{\\rm{b}}^{\\text{L}}}$, while satisfying the covert transmission requirement and the discrete constellation set with $K$ points.\r\n\r\n\\subsubsection{${D_{\\rm{U}}}\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right) \\le 2{\\varepsilon ^2}$}\r\n Under the covert constraint ${D_{\\rm{U}}}\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)\\le 2{\\varepsilon ^2}$,\r\n the optimal probabilistic constellation shaping for covert 
communications is formulated as follows\r\n\\begin{subequations}\\label{low_pro 01 1}\r\n\\begin{align}\r\n \\mathop { \\max }\\limits_{\\left\\{ {{p_k}} \\right\\}} &~ {R_{\\rm{b}}^{\\text{L}}}\\\\\r\n{\\rm{s}}{\\rm{.t}}.&~{D_{\\rm{U}}}\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right) \\le 2{\\varepsilon ^2},\\label{lowa}\\\\\r\n&~\\Pr \\left( {X = {x_k}} \\right) = {p_k} \\ge {\\rm{0}},\\label{lowb}\\\\\r\n&~ \\sum\\limits_{k = 1}^K {{p_k}{{\\left| {{x_k}} \\right|}^2}} \\le {P_{\\rm{A}}},\\label{lowc}\\\\\r\n&~ \\sum\\limits_{k = 1}^K {{p_k}} = 1,k = 1,...,K.\\label{lowd}\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n\r\nTo solve the problem, we define the following equations\r\n\\begin{subequations}\r\n\\begin{align}\r\n &{{\\bf{s}}_{{\\rm{b}},k}} \\buildrel \\Delta \\over = \\left[ \\begin{array}{l}\r\n \\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_1}} \\right) \\right|}^{\\rm{2}}}}}{{2\\sigma _{\\rm{b}}^2}}} \\right),...,\r\n \\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_K}} \\right) \\right|}^{\\rm{2}}}}}{{2\\sigma _{\\rm{b}}^2}}} \\right)\r\n \\end{array} \\right]^T,\\\\\r\n &{{\\bf{v}}_{\\rm{b}}}\\left( {\\bf{p}} \\right) \\buildrel \\Delta \\over = {\\left[ {{{\\log }_2}{{\\bf{p}}^T}{{\\bf{s}}_{{\\rm{b}},1}},...,{{\\log }_2}{{\\bf{p}}^T}{{\\bf{s}}_{{\\rm{b}},K}}} \\right]^T},\r\n\\end{align}\r\n\\end{subequations}\r\nand then the lower bound ${R_{\\rm{b}}^{\\text{L}}}$ can be transformed to\r\n\\begin{align}\r\n &{R_{\\rm{b}}^{\\text{L}}} = - {{\\bf{p}}^T}{{\\bf{v}}_{\\rm{b}}}\\left( {\\bf{p}} \\right)- \\frac{1}{{\\ln 2}}+1.\r\n\\end{align}\r\n\r\nThus, the covert optimization problem is recast as follows\r\n\\begin{subequations}\\label{low_pro 01 2}\r\n\\begin{align}\r\n \\mathop { \\max }\\limits_{\\bf{p}} & - {{\\bf{p}}^T}{{\\bf{v}}_{\\rm{b}}}\\left( {\\bf{p}} \\right)\\\\\r\n{\\rm{s}}{\\rm{.t}}. 
&~- {\\log _2}{{\\bf{p}}^T}{\\bf{t}} \\le 2{\\varepsilon ^2},\\label{low02a}\\\\\r\n &~{{\\bf{p}}^T}{{\\bf{1}}_K} = 1,\\label{low02b}\\\\\r\n &~{\\bf{p}}^T\\left( {{\\bf{x}} \\odot {\\bf{x}}} \\right) \\le {P_{\\rm{A}}},\\label{low02c}\\\\\r\n &~{\\bf{p}} \\ge {\\bf{0}},\\label{low02d}\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n{Similar} to problem \\eqref{uper_pro 01 2}, we can apply the Frank-Wolf method to solve problem \\eqref{low_pro 01 1}, and the details are omitted.\r\n\r\n\\subsubsection{${D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)\\le 2{\\varepsilon ^2}$}\r\n With the covert constraint ${D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)\\le 2{\\varepsilon ^2}$,\r\n the optimal probabilistic constellation shaping for covert communications is given as\r\n\\begin{subequations}\\label{low_pro 10 1}\r\n\\begin{align}\r\n \\mathop { \\max }\\limits_{\\left\\{ {{p_k}} \\right\\}} &~ {R_{\\rm{b}}^{\\text{L}}}\\\\\r\n{\\rm{s}}{\\rm{.t}}.&{D_{\\rm{U}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le 2{\\varepsilon ^2},\\label{low10a}\\\\\r\n&\\eqref{lowb}, \\eqref{lowc}, \\eqref{lowd}\\notag.\r\n\\end{align}\r\n\\end{subequations}\r\n\r\nBy replacing the constraint \\eqref{low10a} with \\eqref{constraint_1_2}, we obtain the optimization problem\r\n\\begin{subequations}\\label{low_pro 10 2}\r\n\\begin{align}\r\n\\mathop { \\max }\\limits_{\\bf{p}} & ~ - {{\\bf{p}}^T}{{\\bf{v}}_{\\rm{b}}}\\left( {\\bf{p}} \\right)\\\\\r\n{\\rm{s}}{\\rm{.t}}.&~L\\left( {\\bf{p}} \\right) + \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}}-1 \\le 2{\\varepsilon ^2},\\\\\r\n&~\\eqref{low02b}, \\eqref{low02c}, \\eqref{low02d}\\notag.\r\n\\end{align}\r\n\\end{subequations}\r\n\r\n{Similar} to problem \\eqref{uper_pro 10 2}, we can apply a similar method to solve problem \\eqref{low_pro 10 1}.\r\n\r\n\\section{Numerical Results}\r\nIn this 
section, we present and discuss numerical results to assess the performance of the proposed probabilistic constellation shaping designs. In our simulations, the discrete constellation input is QAM modulation,\r\n the total transmit power of Alice\r\nis ${P_{\\rm{A}}} = 10\\rm{W}$,\r\nthe noise variance of Willie is $\\sigma _{\\rm{w}}^2=1\\rm{W}$, and the noise variance of Bob is $\\sigma _{\\rm{b}}^{\\rm{2}}{\\rm{ = }}\\frac{{{P_{\\rm{A}}}}}{{{{10}^{{{{\\rm{SNR}}} \\mathord{\\left/\r\n {\\vphantom {{{\\rm{SNR}}} {10}}} \\right.\r\n \\kern-\\nulldelimiterspace} {10}}}}}}$. Moreover, we assume that all channels experience Rayleigh flat fading, and $\\sigma _1=\\sigma _2=1$ \\cite{Shahzad2017Covert}.\r\n\r\nWe first compare the proposed optimal probabilistic constellation shaping design with the equiprobable design, starting from the empirical CDF of KL divergence and the rate comparison.\r\n\r\n\\begin{figure}\r\n \\begin{minipage}[htbp]{0.45\\textwidth}\r\n \\centering\r\n \\includegraphics[height=7.5cm,width=7.5cm]{figures/cdfp0p1}\r\n \\vskip-0.2cm\\centering {\\footnotesize (a)}\r\n \\end{minipage}\r\n \\begin{minipage}[htbp]{0.45\\textwidth}\r\n \\centering\r\n \\includegraphics[height=7.5cm,width=7.5cm]{figures/cdfp1p0}\r\n \\vskip-0.2cm\\centering {\\footnotesize (b)}\r\n \\end{minipage}\\hfill\r\n \\caption{ The empirical CDF of a) $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and (b) $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ with the covertness threshold $2{\\varepsilon ^2} = 0.02$ for the proposed optimal probabilistic constellation shaping design and the equiprobable design. }\r\n \\label{1} \n\\end{figure}\r\nFig. 
\\ref{1} shows the empirical CDF of the achieved $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$, respectively, for both the proposed optimal probabilistic constellation shaping design and the equiprobable design with ${\\rm{SNR}}=10\\rm{dB}$ and $K=8$, where the covertness threshold is\r\n$2{\\varepsilon ^2} = 0.02$, i.e., $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right) \\le 0.02$ and $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le 0.02$.\r\nAs observed\r\nfrom Fig. \\ref{1}, the proposed optimal probabilistic constellation shaping design satisfies\r\nthe covertness constraint. On the other hand, the equiprobable design\r\ncannot satisfy the covertness constraints.\r\n\r\n\\begin{figure}\r\n \\begin{minipage}[htbp]{0.45\\textwidth}\r\n \\centering\r\n \\includegraphics[height=7.5cm,width=7.5cm]{figures/equip01}\r\n \\vskip-0.2cm\\centering {\\footnotesize (a)}\r\n \\end{minipage}\r\n \\begin{minipage}[htbp]{0.45\\textwidth}\r\n \\centering\r\n \\includegraphics[height=7.5cm,width=7.5cm]{figures/equip10}\r\n \\vskip-0.2cm\\centering {\\footnotesize (b)}\r\n \\end{minipage}\\hfill\r\n \\caption{ The covert rate of a) $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and (b) $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ for the proposed optimal probabilistic constellation shaping design and the equiprobable design. }\r\n \\label{2} \n\\end{figure}\r\n Fig. \\ref{2} (a) and (b) show the achievable covert rate of Bob ${R_{\\rm{b}}}$ for the proposed optimal probabilistic constellation shaping design and the equiprobable design with $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$, respectively. We observe that the optimal probabilistic constellation shaping design is superior when the signal-to-noise ratio is low. 
Therefore, in practical applications, the proposed optimal probabilistic constellation shaping design is advantageous in medium and low SNR, and the equiprobable design is suitable for high SNR.\r\n\r\n\r\nNext, we evaluate the performance of the proposed optimal probabilistic constellation shaping design.\r\n\r\n\\begin{figure}\r\n \\begin{minipage}[htbp]{0.45\\textwidth}\r\n \\centering\r\n \\includegraphics[height=7.5cm,width=7.5cm]{figures/SNR12k16-2}\r\n \\vskip-0.2cm\\centering {\\footnotesize (a)}\r\n \\end{minipage}\r\n \\begin{minipage}[htbp]{0.45\\textwidth}\r\n \\centering\r\n \\includegraphics[height=7.5cm,width=7.5cm]{figures/SNR12K16}\r\n \\vskip-0.2cm\\centering {\\footnotesize (b)}\r\n \\end{minipage}\\hfill\r\n \\caption{ The optimal probability of discrete constellation points with (a) $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and (b) $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ for the proposed optimal probabilistic constellation shaping design.}\r\n \\label{3} \n\\end{figure}\r\nFig. \\ref{3} shows the optimal probability distribution of\r\ninput $\\left\\{{p_{{i,j}}}\\right\\}$ with $\\rm{SNR=12dB}$ of the proposed optimal probabilistic constellation shaping design, for $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ in Fig. \\ref{3} (a), and for $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ in Fig. \\ref{3} (b). As it can be seen from Fig. \\ref{3}, for the proposed optimal probabilistic constellation shaping design, the optimal probability distribution is not equiprobable, and the symmetrical points have equal probabilities. Specifically, when the number of discrete constellation points is sixteen, the probability of each constellation point for $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ is given in Table I. 
From the table, we can clearly see that constellation points with symmetrical coordinates have the same probabilities.\r\n\r\n\r\n \\begin{table*}[htbp]\r\n \\centering\r\n \\caption{The optimal probability distribution of the optimal probabilistic constellation shaping design for $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ with $\\rm{SNR=12dB}$ and $K=16$ }\r\n \\begin{tabular}{| m{3.5cm}<{\\centering}|m{1cm}<{\\centering} | m{1cm}<{\\centering} | m{1cm}<{\\centering} | m{1cm}<{\\centering}|m{1cm}<{\\centering} | m{1cm}<{\\centering} | m{1cm}<{\\centering} | m{1cm}<{\\centering}|}\r\n \\hline\r\n {\\multirow{2}*{\\diagbox{${\\rm{Im}}\\left\\{{x_{{i,j}}}\\right\\}$}{$\\left\\{{p_{{i,j}}}\\right\\}$}{${\\rm{Re}}\\left\\{{x_{{i,j}}}\\right\\}$}}}\r\n &\\multicolumn{4}{|c|}{$D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$} &\\multicolumn{4}{|c|}{$D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$}\\\\[1.5ex]\r\n \\cline{2-9}\r\n \\multicolumn{1}{|c|}{}&$-3$ &$-1$ &$1$ &$3$ &$-3$ &$-1$ &$1$ &$3$\\\\[0.8ex]\r\n \\hline\r\n$-3$&$0.0484$&$0.0524$&$0.0524$&$0.0484$&$0.0454$&$0.0502$&$0.0502$&$0.0454$\\\\\r\n\\hline\r\n$-1$&$0.0524$&$0.0968$&$0.0968$&$0.0524$&$0.0502$&$0.1041$&$0.1041$&$0.0502$\\\\\r\n\\hline\r\n$1$&$0.0524$&$0.0968$&$0.0968$&$0.0524$&$0.0502$&$0.1041$&$0.1041$&$0.0502$\\\\\r\n\\hline\r\n$3$&$0.0484$&$0.0524$&$0.0524$&$0.0484$&$0.0454$&$0.0502$&$0.0502$&$0.0454$\\\\\r\n\\hline\r\n \\end{tabular}\r\n \\end{table*}\r\n\r\n\r\n\r\n\r\n\r\n\r\n\\begin{figure}[htpb]\r\n \\centering\r\n\t\\includegraphics[width=8cm]{figures/K-SNR}\r\n \n \\caption{ The achievable covert rate of Bob versus $\\rm{SNR}$ with different numbers of points $K$ for $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$.}\r\n \\label{4}\r\n\\end{figure}\r\nFig. 
\\ref{4} depicts the achievable covert rate of Bob ${R_{\\rm{b}}}$ versus $\\rm{SNR}$ for the proposed optimal probabilistic constellation shaping design with different numbers of constellation points\r\n$K = 2, 4, 8, 16$ for $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$. It can be seen from\r\nFig. \\ref{4} that when ${\\rm{SNR}}$ increases, the covert rate of Bob ${R_{\\rm{b}}}$ increases.\r\n{{In addition, we observe that a larger number of points $K$ results in a higher covert rate of Bob ${R_{\\rm{b}}}$, especially at high SNRs. Thus, as the modulation order increases, the rate of covert communication increases.}}\r\n\r\n Fig. \\ref{5} shows the covert rate versus $\\varepsilon$ for the proposed optimal probabilistic constellation shaping design for $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$, where $K=8$ and ${\\rm{SNR}}=6{\\rm{dB}}$. It can be observed from the figure that as $\\varepsilon$ increases, the covert constraint becomes looser, resulting in a higher covert rate. The rate under the covert constraint $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)\\le 2{\\varepsilon ^2}$ is higher than that under the constraint $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)\\le 2{\\varepsilon ^2}$. 
This is because $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ is less than $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ for the same probability distribution, and thus $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)\\le 2{\\varepsilon ^2}$ is more stringent than $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)\\le 2{\\varepsilon ^2}$.\r\n\\begin{figure}[htpb]\r\n \\centering\r\n\t\\includegraphics[width=8cm]{figures/varepsilon}\r\n \n \\caption{ The covert rate versus $\\varepsilon$ for the proposed optimal probabilistic constellation shaping design with the cases of $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$.}\r\n \\label{5}\r\n\\end{figure}\r\n\r\nFinally, we compare the performance and complexity of the three objective functions.\r\n\r\n\\begin{figure}\r\n \\begin{minipage}[htbp]{0.45\\textwidth}\r\n \\centering\r\n \\includegraphics[height=7.5cm,width=7.5cm]{figures/SNR-p0-p1}\r\n \\vskip-0.2cm\\centering {\\footnotesize (a)}\r\n \\end{minipage}\r\n \\begin{minipage}[htbp]{0.45\\textwidth}\r\n \\centering\r\n \\includegraphics[height=7.5cm,width=7.5cm]{figures/SNR-p1-p0}\r\n \\vskip-0.2cm\\centering {\\footnotesize (b)}\r\n \\end{minipage}\\hfill\r\n \\caption{ The achievable rate of Bob versus $\\rm{SNR}$ for (a) $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and (b) $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ with the covertness threshold $2{\\varepsilon ^2} = 0.1$.}\r\n \\label{6} \n\\end{figure}\r\nFig. 
\\ref{6} depicts the achievable rate of Bob ${R_{\\rm{b}}}$ with the proposed optimal probabilistic constellation shaping design, as well as the objective functions ${R_{\\rm{b}}^{\\text{L}}}$ and ${R_{\\rm{b}}^{\\text{U}}}$, versus the ${\\rm{SNR}}$ for the cases of $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ and $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$, respectively.\r\n It can be observed that the mutual information of Bob ${R_{\\rm{b}}}$ increases as\r\nthe ${\\rm{SNR}}$ increases, while ${R_{\\rm{b}}}$ of the proposed optimal probabilistic constellation shaping design lies between the objective functions ${R_{\\rm{b}}^{\\text{L}}}$ and ${R_{\\rm{b}}^{\\text{U}}}$, and the objective function ${R_{\\rm{b}}^{\\text{U}}}$ is higher than\r\n the objective function ${R_{\\rm{b}}^{\\text{L}}}$.\r\n\r\nNext, we compare the computational complexity of the three designs in terms of computational time\r\nin Table II. All simulations of the three methods are performed using MATLAB 2016b on a machine with 2.30GHz, 2.29GHz dual CPUs and 128GB of RAM, where $K=8$.\r\nSpecifically, Table II shows that the computational time of objective functions ${R_{\\rm{b}}}$, ${R_{\\rm{b}}^{\\text{L}}}$ and ${R_{\\rm{b}}^{\\text{U}}}$ for the covert constraint condition $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ is 55.04, 2.845 and 3.012 seconds, respectively. 
The computational time of the latter two cases is approximately 95 percent shorter than that of the probabilistic constellation shaping design.\r\nUnder the covert constraint condition $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$, the computational time of objective functions ${R_{\\rm{b}}}$, ${R_{\\rm{b}}^{\\text{L}}}$ and ${R_{\\rm{b}}^{\\text{U}}}$ is 107.77, 3.219 and 3.438 seconds, respectively. The computational time of the latter two cases is approximately 97 percent shorter than that of the probabilistic constellation shaping design.\r\n Moreover,\r\n the computational time of the design for $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ is less than that of $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$.\r\n\\begin{table}[htbp]\r\n \\centering\r\n \\caption{ Computational time comparison among the objective functions ${R_{\\rm{b}}}$, ${R_{\\rm{b}}^{\\text{L}}}$ and ${R_{\\rm{b}}^{\\text{U}}}$}\r\n \\begin{tabular}{|c|c|c|c|}\r\n \\hline\r\n \\diagbox{Constraint}{Time/second}{Objective}&${R_{\\rm{b}}}$ &${R_{\\rm{b}}^{\\text{L}}}$ &${R_{\\rm{b}}^{\\text{U}}}$ \\\\\r\n \\hline\r\n$D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ &$55.04$&$2.845$&$3.012$\\\\\r\n\\hline\r\n$D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$&$107.77$&$3.219$&$3.438$\\\\\r\n\\hline\r\n \\end{tabular}\r\n \\end{table}\r\n\r\n\r\n\\section{Conclusions}\r\nIn this paper, we propose an optimal probabilistic constellation shaping design for covert communications, where Alice\r\ncovertly sends a message to Bob while avoiding\r\nbeing discovered by Willie. 
We derive the achievable\r\nrate expressions of the covert communications system, and we study the covert rate maximization problem via\r\noptimizing the constellation distribution.\r\nIn addition, to strike a balance between the computational complexity and\r\nthe transmission performance,\r\nwe further develop a framework\r\nthat maximizes the upper and lower bounds of the achievable rate.\r\nNumerical\r\nresults quantify the gains of the proposed probabilistic constellation shaping\r\ndesign over the equiprobable design in terms of the achievable covert rate.\r\n\r\n\r\n\r\n\\begin{appendices}\r\n\\section{Derivation of the formulation \\eqref{KL_p0p1_ub}}\r\n The upper bound ${D_{{\\rm{U}}}}\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ on the KL divergence $D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)$ is derived as follows\r\n\\begin{subequations}\r\n\\begin{align}\r\n&D\\left( {{p_{y,0}}\\left\\| {{p_{y,1}}} \\right.} \\right)\r\n\\le - \\frac{1}{{\\ln 2}} \\nonumber\\\\\r\n& - {\\log _2}\\sum\\limits_{k = 1}^K {{p_k}} \\exp \\left( { - {\\mathbb{E}_{{z_{\\rm{w}}}}}\\left\\{ {\\frac{{{{\\left| {{z_{\\rm{w}}} - {g_{\\rm{w}}^{\\rm{*}}}{x_k}} \\right|}^2}}}{{\\sigma _{\\rm{w}}^2}}} \\right\\}} \\right)\\label{Aa}\\\\\r\n &= - \\frac{1}{{\\ln 2}} - {\\log _2}\\sum\\limits_{k = 1}^K {{p_k}} \\exp \\left( { - \\frac{{{\\mathbb{E}_{{z_{\\rm{w}}}}}\\left\\{ {{{\\left| {z_{\\rm{w}}} \\right|}^2}} \\right\\} + {{\\left| {g_{\\rm{w}}^{\\rm{*}}{x_k}} \\right|}^{\\rm{2}}}}}{{\\sigma _{\\rm{w}}^2}}} \\right)\\\\\r\n & = - \\frac{1}{{\\ln 2}} - {\\log _2}\\sum\\limits_{k = 1}^K {{p_k}} \\exp \\left( { - 1- \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}{x_k}} \\right|}^{\\rm{2}}}}}{{\\sigma _{\\rm{w}}^2}}} \\right)\\\\\r\n & = - \\frac{1}{{\\ln 2}} - \\left( { - \\frac{1}{{\\ln 2}} + {{\\log }_2}\\sum\\limits_{k = 1}^K {{p_k}} \\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}{x_k}} \\right|}^{\\rm{2}}}}}{{\\sigma _{\\rm{w}}^2}}} \\right)} \\right)\\\\\r\n & = - {\\log _2}\\sum\\limits_{k 
= 1}^K {{p_k}} \\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}{x_k}} \\right|}^{\\rm{2}}}}}{{\\sigma _{\\rm{w}}^2}}} \\right),\r\n\\end{align}\r\n\\end{subequations}\r\n where inequality \\eqref{Aa} holds due to Jensen's Inequality.\r\n\r\n\\section{Derivation of the formulation \\eqref{KL_p1p0_ub} }\r\nThe upper bound ${D_{{\\rm{U}}}}\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ of the KL divergence $D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right)$ is given as\r\n\\begin{subequations}\r\n\\begin{align}\r\n&D\\left( {{p_{y,1}}\\left\\| {{p_{y,0}}} \\right.} \\right) \\le \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} \\nonumber\\\\\r\n& \\times {\\mathbb{E}_{{z_{\\rm{w}}}}}\\left\\{ {\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}\\left( {{x_k} - {x_j}} \\right) + {z_{\\rm{w}}}} \\right|}^2}}}{{\\sigma _{\\rm{w}}^2}}} \\right)} \\right\\}\r\n+ \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}}\\label{Ba}\\\\\r\n& =\\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} {\\mathbb{E}_{{z_{{\\rm{w}},R}}}}\\left\\{ {\\exp \\left( { - \\frac{{{{\\left( {{c_R} + {z_{{{\\rm{w}},R}}}} \\right)}^2}}}{{\\sigma _{\\rm{w}}^2}}} \\right)} \\right\\}\\nonumber\\\\\r\n & \\times{\\mathbb{E}_{{z_{{\\rm{w}},I}}}}\\left\\{ {\\exp \\left( { - \\frac{{{{\\left( {{c_I} + {z_{{\\rm{w}},I}}} \\right)}^2}}}{{\\sigma _{\\rm{w}}^2}}} \\right)} \\right\\}\r\n+ \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}}\\label{Bb}\r\n \\end{align}\r\n\\begin{align}\r\n & = \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} \\left[ {\\int\\limits_{{\\rm{ - }}\\infty }^\\infty {\\frac{{d{z_{{\\rm{w}},R}}}}{{\\sqrt \\pi {\\sigma _{\\rm{w}}}}}\\exp \\left( { - \\frac{{{{\\left( {{c_R} + {z_{{\\rm{w}},R}}} \\right)}^2}}}{{\\sigma 
_{\\rm{w}}^2}}} \\right.} } \\right.\\nonumber\\\\\r\n& \\left. {\\left. { - \\frac{{z_{{\\rm{w}},R}^2}}{{\\sigma _{\\rm{w}}^2}}} \\right)} \\right] \\times \\left[ {\\int\\limits_{{\\rm{ - }}\\infty }^\\infty {\\frac{{d{z_{{\\rm{w}},I}}}}{{\\sqrt \\pi {\\sigma _{\\rm{w}}}}}\\exp \\left( { - \\frac{{{{\\left( {{c_I} + {z_{{\\rm{w}},I}}} \\right)}^2} + z_{{\\rm{w}},I}^2}}{{\\sigma _{\\rm{w}}^2}}} \\right)} } \\right]\\nonumber \\\\\r\n &+ \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}}\r\n \\end{align}\r\n\\begin{align}\r\n & = \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} \\left[ {\\frac{{\\rm{1}}}{{\\sqrt \\pi {\\sigma _{\\rm{w}}}}}\\frac{1}{2}\\sqrt {\\frac{{\\sigma _{\\rm{w}}^2\\pi }}{2}} \\exp \\left( { - \\frac{{c_R^2}}{{2\\sigma _{\\rm{w}}^2}}} \\right)2} \\right]\\nonumber\\\\\r\n& \\times \\left[ {\\frac{{\\rm{1}}}{{\\sqrt \\pi {\\sigma _{\\rm{w}}}}}\\frac{1}{2}\\sqrt {\\frac{{\\sigma _{\\rm{w}}^2\\pi }}{2}} \\exp \\left( { - \\frac{{c_I^2}}{{2\\sigma _{\\rm{w}}^2}}} \\right)2} \\right]\r\n+ \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}} \\\\\r\n & = \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}}\\frac{1}{2}\\exp \\left( { - \\frac{{c_R^2 + c_I^2}}{{2\\sigma _{\\rm{w}}^2}}} \\right)+ \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}} \\\\\r\n &= \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\frac{1}{2} + \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}}\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right) \\right|}^{\\rm{2}}}}}{{2\\sigma _{\\rm{w}}^2}}} \\right)\\nonumber\\\\\r\n& + \\frac{1}{{\\ln 2}} + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma 
_{\\rm{w}}^2}} \\\\\r\n&= \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}}\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{w}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right) \\right|}^{\\rm{2}}}}}{{2\\sigma _{\\rm{w}}^2}}} \\right)+ \\frac{1}{{\\ln 2}}\\nonumber\\\\\r\n& + \\frac{\\left|{{ {{g_{\\rm{w}}^2}}}{P_{\\rm{A}}}}\\right|}{{\\left( {\\ln 2} \\right)\\sigma _{\\rm{w}}^2}}-1,\r\n\\end{align}\r\n\\end{subequations}\r\n where inequality \\eqref{Ba} is true due to Jensen's Inequality. Equality \\eqref{Bb} holds because of the definitions ${z_{{\\rm{w}},R}} \\buildrel \\Delta \\over = {\\rm{Re}}\\left\\{{{z_{\\rm{w}}}} \\right\\}$ and\r\n${z_{{\\rm{w}},I}}\\buildrel \\Delta \\over = {\\rm{Im}}\\left\\{{{z_{\\rm{w}}}} \\right\\}$, ${z_{{\\rm{w}},R}},{z_{{\\rm{w}},I}} \\sim {\\cal{N}}\\left( {0,\\frac{1}{2}\\sigma _{\\rm{w}}^2} \\right)$, and ${c_R}\\buildrel \\Delta \\over ={\\mathop{\\rm Re}\\nolimits} \\left( {{g_{\\rm{w}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right)} \\right)$, ${c_I}\\buildrel \\Delta \\over ={\\mathop{\\rm Im}\\nolimits} \\left( {{g_{\\rm{w}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right)} \\right)$.\r\n\r\n\r\n \\section{Derivation of the formulation \\eqref{mutual_I ub}}\r\nThe upper bound ${{R_{\\rm{b}}^{{\\rm{U}}}}}$ of the covert rate ${R_{\\rm{b}}}$ is derived as follows\r\n\\begin{subequations}\r\n\\begin{align}\r\n&{R_{\\rm{b}}}\\le - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}}\\nonumber\\\\\r\n& \\times \\exp \\left( { - {\\mathbb{E}_{{z_{\\rm{b}}}}}\\left\\{ {\\frac{{{{\\left| {g_{\\rm{b}}^{\\rm{*}}\\left( {{x_k} - {x_j}} \\right) + {z_{\\rm{b}}}} \\right|}^2}}}{{\\sigma _b^2}}} \\right\\}} \\right)\r\n- \\frac{1}{{\\ln 2}}\\label{Ca}\\\\\r\n & = - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} \\exp \\left( { - \\frac{{{{\\left| {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right)} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}} - 
\\frac{{{\\mathbb{E}_{{z_{\\rm{b}}}}}\\left\\{ {\\left|z_{\\rm{b}}\\right|^2} \\right\\}}}{{\\sigma _{\\rm{b}}^2}}} \\right)\\nonumber \\\\\r\n& - \\frac{1}{{\\ln 2}}\\\\\r\n & = - \\frac{1}{{\\ln 2}} - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} \\exp \\left( { - \\frac{{{{\\left| {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right)} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}} - 1} \\right)\r\n \\end{align}\r\n\\begin{align}\r\n& = - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} \\exp \\left( { - \\frac{{{{\\left| {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right)} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right)\\exp \\left( { - 1} \\right)\\nonumber \\\\\r\n&- \\frac{1}{{\\ln 2}} \\\\\r\n & = - \\sum\\limits_{k = 1}^K {{p_k}} \\left( {{{\\log }_2}\\sum\\limits_{j = 1}^K {{p_j}} \\exp \\left( { - \\frac{{{{\\left| {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right)} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right) - \\frac{1}{{\\ln 2}}} \\right)\\nonumber\\\\\r\n& - \\frac{1}{{\\ln 2}}\r\n \\end{align}\r\n\\begin{align}\r\n& = - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} \\exp \\left( { - \\frac{{{{\\left| {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right)} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right),\r\n\\end{align}\r\n\\end{subequations}\r\nwhere inequality \\eqref{Ca} is true due to Jensen's Inequality.\r\n\r\n\r\n \\section{Derivation of the formulation \\eqref{mutual_I lb}}\r\nThe lower bound ${{R_{\\rm{b}}^{{\\rm{L}}}}}$ on the covert rate ${R_{\\rm{b}}}$ is given as\r\n\\begin{subequations}\r\n\\begin{align}\r\n&{R_{\\rm{b}}} \\ge - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} \\nonumber\\\\\r\n& \\times {\\mathbb{E}_{{z_{\\rm{b}}}}}\\left\\{ {\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{b}}^{\\rm{*}}\\left( {{x_k} - {x_j}} \\right) + {z_{\\rm{b}}}} \\right|}^2}}}{{\\sigma _{\\rm{b}}^2}}} \\right)} \\right\\}\r\n- 
\\frac{1}{{\\ln 2}} \\label{Da}\\\\\r\n& = - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}} \\left[ {\\int\\limits_{{\\rm{ - }}\\infty }^\\infty {\\frac{{{d{z_{{\\rm{b}},R}}}}}{{\\sqrt \\pi {\\sigma _{\\rm{b}}}}}\\exp \\left( { - \\frac{{{{\\left( {{a_R} + {z_{{\\rm{b}},R}}} \\right)}^2} + z_{{\\rm{b}},R}^2}}{{\\sigma _{\\rm{b}}^2}}} \\right)} } \\right]\\nonumber\\\\\r\n& \\times\\left[ {\\int\\limits_{{\\rm{ - }}\\infty }^\\infty {\\frac{{{d{z_{{\\rm{b}},I}}}}}{{\\sqrt \\pi {\\sigma _{\\rm{b}}}}}\\exp \\left( { - \\frac{{{{\\left( {{a_I} + {z_{{\\rm{b}},I}}} \\right)}^2} + z_{{\\rm{b}},I}^2}}{{\\sigma _{\\rm{b}}^2}}} \\right)} } \\right]- \\frac{1}{{\\ln 2}}\\label{Db}\\\\\r\n&= - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}}\\left[ {\\frac{{\\rm{1}}}{{\\sqrt \\pi {\\sigma _{\\rm{b}}}}}\\frac{1}{2}\\sqrt {\\frac{{\\sigma _{\\rm{b}}^2\\pi }}{2}} \\exp \\left( { - \\frac{{a_R^2}}{{2\\sigma _{\\rm{b}}^2}}} \\right)2} \\right]\\nonumber\\\\\r\n& \\times\\left[ {\\frac{{\\rm{1}}}{{\\sqrt \\pi {\\sigma _{\\rm{b}}}}}\\frac{1}{2}\\sqrt {\\frac{{\\sigma _{\\rm{b}}^2\\pi }}{2}} \\exp \\left( { - \\frac{{a_I^2}}{{2\\sigma _{\\rm{b}}^2}}} \\right)2} \\right] - \\frac{1}{{\\ln 2}}\\\\\r\n& = - \\frac{1}{{\\ln 2}} - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}}\\frac{1}{2}\\exp \\left( { - \\frac{{a_R^2 + a_I^2}}{{2\\sigma _{\\rm{b}}^2}}} \\right) \\\\\r\n & =- \\frac{1}{{\\ln 2}}+1 - \\sum\\limits_{k = 1}^K {{p_k}} {\\log _2}\\sum\\limits_{j = 1}^K {{p_j}}\\exp \\left( { - \\frac{{{{\\left| {g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right) \\right|}^{\\rm{2}}}}}{{2\\sigma _{\\rm{b}}^2}}} \\right),\r\n\\end{align}\r\n\\end{subequations}\r\nwhere inequality \\eqref{Da} is true due to Jensen's inequality, equality \\eqref{Db} holds because of the definitions ${z_{{\\rm{b}},R}} \\buildrel \\Delta \\over = {\\rm{Re}}\\left\\{{{z_{\\rm{b}}}} \\right\\}$, ${z_{{\\rm{b}},I}} \\buildrel \\Delta 
\\over = {\\rm{Im}}\\left\\{{{z_{\\rm{b}}}} \\right\\}$,\r\n${z_{{\\rm{b}},R}},{z_{{\\rm{b}},I}} \\sim {\\cal{N}}\\left( {0,\\frac{1}{2}\\sigma _{\\rm{b}}^2} \\right)$, and ${a_R} \\buildrel \\Delta \\over = {\\mathop{\\rm Re}\\nolimits} \\left( {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right)} \\right) $, ${a_I} \\buildrel \\Delta \\over = {\\mathop{\\rm Im}\\nolimits} \\left( {{g_{\\rm{b}}^{\\rm{*}}}\\left( {{x_k} - {x_j}} \\right)} \\right) $.\r\n\r\n\r\n\r\n\r\n\r\n \\end{appendices}\r\n\r\n\\bibliographystyle{IEEE-unsorted}\r\n", "meta": {"timestamp": "2022-08-31T02:09:30", "yymm": "2208", "arxiv_id": "2208.14027", "language": "en", "url": "https://arxiv.org/abs/2208.14027"}} {"text": "\\section{Introduction}\\label{introduction}\n\n\n The rapid increase in the usage of artificial intelligence (AI) in critical applications has brought about a need to consider the ethics of how AI is used, and, whether it would make the right choice while encountering ethical dilemmas \\cite{cervantes_artificial_2020, formosa_making_2021}. The ethics of AI usage has been studied extensively by lawyers, philosophers, and technologists to develop policies to account for the ethical implications of an AI application. However, the development of inherent ethics in AI is still in its infancy; it has been discussed and debated in the last couple of decades \\cite{allen_artificial_2005, fisher_engineering_2016, gips_towards_2011}, resulting in few real-world implementations \\cite{tolmeijer_implementations_2021, stenseke_artificial_2021}. This question of how to make AI inherently ethical falls under the umbrella of \\textit{machine ethics}, often referred to as \\textit{artificial morality} or \\textit{AI alignment}. The majority of the frameworks discussed in machine ethics are based on rule-based (deontological), or consequentialist ethical theories \\cite{tolmeijer_implementations_2021}. 
In deontological implementations, an artificial agent abides by a set of rules which dictate its action, regardless of what happens as a result of this action. On the other hand, a consequentialist agent tends to focus on the utility value as a deciding factor of the goodness of an action. While there are advantages from using these theories, they have shortcomings, and we argue how these could be overcome by using virtue ethics.\n \n Virtue ethics centers morality on an individual's character, an individual who behaves such that he/she exercises virtues to manifest the character of a virtuous person. Generosity, truthfulness and bravery are examples of virtues; a virtuous person will know how to balance the extremes of these virtues, by striving towards a \\textit{golden mean}. An important advantage is that a virtuous person strives to make better choices when similar situations present themselves in the future. We posit that the trait of life-long learning in virtue ethics makes it compatible with modern day artificial intelligence, whose goal is also to learn from experiences to give the best possible predictions given its environment or data.\n \n There are several implementations of the dominant ethical theories mentioned above. However, these have been developed by demonstration on toy examples and very specific problems \\cite{cervantes_artificial_2020}. To expand the conversation and to apply these theories in more general scenarios, we propose to seek inspiration from the world of gaming, in particular role-playing games that compel a player to make ethical choices. Some examples of such games are \\textit{Witcher 3} \\cite{red_witcher_2015}, Fallout \\cite{bethesda_game_studios_fallout_2015}, \\textit{Batman: The Telltale Series} \\cite{telltale_games_batman_2016} and \\textit{Papers, Please} \\cite{lucas_pope_papers_2013}. These video games are usually based on a mechanism where game-play is dictated by the players' choices. 
One such mechanism follows a scripted approach, where the developer handcrafts moral dilemmas based on the story-line of the game \\cite{formosa_papers_2016}. Another follows a systemic approach, in which the player initially faces no explicit moral dilemmas; the dilemmas become apparent only as the story unfolds \\cite{tancred_player_2018}. \n \n With respect to implementation of virtues, previous works \\cite{berberich_virtuous_2018,abel_reinforcement_2016, stenseke_artificial_2021} have advocated for the reinforcement learning (RL) paradigm because it fits well with virtue ethics, since an agent can learn behaviour from experience. We motivate the use of affinity-based RL, in which agents are incentivized to learn virtues by modifying the objective function through policy regularization \\cite{maree_reinforcement_2022}, rather than by redesigning the reward function itself. And since virtue ethics involves performing the right action in the right situation for the right reasons \\cite{annas_virtue_2007}, we also highlight the importance of interpretability, especially since we opt for opaque deep neural networks.\n \n In the subsequent sections, we discuss state-of-the-art machine ethics and make the case for artificial virtuous agents as a viable alternative to the dominant theories. Next, we review the literature on role-playing games that integrate aspects of ethics and morality; in particular, we discuss \\textit{Papers, Please}. Finally, we explain how systemic environments in role-playing games can be used to train artificial agents to develop virtues, and how RL can be leveraged to train such agents.\n\n\\section{Background and Related Work} \\label{virtue}\nMost of the machine ethics literature \\cite{cervantes_artificial_2020} refers to artificial agents with inherent morals as \\textit{artificial moral agents} (AMA). 
In this section, we introduce artificial morality and argue for the development of artificial virtuous agents (AVA), where an artificial agent reasons in terms of virtues instead of labelling an act as \\textit{right} or \\textit{wrong}. We first discuss the current implementations of AMAs, then introduce virtue ethics as an alternative paradigm, and finally make the case for AVAs.\n\n \\subsection{Artificial Morality}\n In machine ethics, the conversation revolves mainly around morality: whether an artificial agent's choice is right or wrong. If an agent violates certain rules or fails to meet certain standards, it is said to be morally wrong. A famous example of rules for moral agents is Asimov's Laws, a set of laws that a robot must never violate. This approach is inspired by deontological ethics \\cite{alexander_deontological_2021}, where the right actions are chosen based on specific rules regardless of the consequences of the action. In contrast, utilitarians hold that the action with the best consequences is the moral one, e.g., the action that yields maximum pleasure and minimum pain. For example, a utilitarian might prioritize the needs of the majority over those of the few through utility maximization. For a computer or an artificial agent, following rules or calculating the best consequence is straightforward; this may be one of the reasons why most of the implementations in machine ethics are based on deontological and consequentialist ethics \\cite{tolmeijer_implementations_2021}.\n \n Approaches to machine ethics include top-down, bottom-up and hybrid approaches \\cite{allen_artificial_2005}. As the name suggests, a top-down approach defines a set of rules for an artificial agent to follow. The environment gives no feedback for learning; the rules are presumed to be adequate to ensure an agent\u2019s moral behavior. 
Bottom-up approaches are preferred in the sense that they allow the agent to learn and adapt to new situations, although the designer has little control over how learning happens. This coincides with the premise of the use of machine learning: it is the preferred system design paradigm when not all future situations can be defined and thus accounted for during the design phase. Lastly, a hybrid approach strives to integrate the strengths of top-down and bottom-up approaches while mitigating their respective weaknesses. See \\cite{cervantes_artificial_2020} and \\cite{tolmeijer_implementations_2021} for reviews on machine ethics implementations based on their approaches.\n\n It is still early days for this field; while there have been several attempts to develop machine ethics systems, the challenges relating to machine ethics have not yet been adequately addressed. The disagreements among scientists and philosophers about ethical artificial intelligence design have not yet been resolved. Therefore, there is no obvious direction for the research to proceed in. Some may go as far as to claim that state-of-the-art AI cannot be ethical, either because artificial agents lack moral agency or because they did not program themselves \\cite{hew_artificial_2014}. Implementations based on deontological or consequentialist ethics are thus unable to circumvent the limitations of these theories, and the dominance of these theories means that most implementations are top-down \\cite{tolmeijer_implementations_2021} or hybrid, rather than bottom-up. Given this current state, we suggest \\textit{virtue ethics} as a strong bottom-up alternative.\n\n \\subsection{Virtue Ethics}\n In his classic \\textit{The Nicomachean Ethics} \\cite{ross_oxford_1980}, Aristotle defined a virtue as an excellent trait of character that enables a person to perform the right actions in the right situations for the right reasons. 
A person can behave virtuously in a given situation by asking themselves: \u201cWhat would a virtuous person do in the same situation?\u201d. Such a person practices virtues by habituation, thus striving towards excellence in character. According to Aristotle, a child or a young person is inexperienced and thus lacks the wisdom to make virtuous decisions. However, with learning experiences from consistent practice of virtues, the youth will exhibit practical wisdom (phronesis).\n\n In virtue ethics, virtues are central and practical wisdom is a must, thus providing a framework to achieve eudaimonia, which translates to flourishing or happiness. Eudaimonia may sound like the utilitarian pleasure, but it is not. Eudaimonia refers to well-being of the individual and the overall society; it thus differs from consequentialist thinking \\cite{foot_virtues_2002}. Unlike a consequentialist, a virtuous person does not practice virtues for the sake of eudaimonia, but for the sake of the virtues in themselves. Some examples of virtues are honesty, bravery, and temperance. Another feature of a virtue is that there are often no absolute right or wrong actions in a given situation; a virtue is exercised in degree. A virtuous person knows to live by this golden mean, while a non-virtuous person might not find that balance. For example, a brave person would exercise the right amount of bravery required for a situation (golden mean), rather than being absolutely cowardly or reckless. This is unlike deontological ethics, where an action is deemed right or wrong based on its adherence to pre-defined rules.\n\n We propose that virtue ethics is the more felicitous ethical theory. For instance, consequentialism is about maximizing net utility of a given situation. As a result of the utility oriented approach, an action may favor the majority at the cost of the few. 
In such situations, a deontologist may vehemently disagree with the consequentialist means to such an end; to deontologists, the end is irrelevant but the means to such ends are vital. Means that follow universal norms are said to be of moral worth. \u201cAlways speak the truth\u201d is an example of a deontological norm, where speaking the truth must be the means, regardless of the end. While universal norms may inform moral behaviour, opponents of deontology may point out that we cannot define rules for every single situation; it is practically impossible. A bottom-up approach of learning and improving may thus offer a viable alternative paradigm, and this is where virtue ethics will be relevant \\cite{macintyre_after_2007}.\n\n Moor \\cite{moor_nature_2006} classified artificial agents into four levels: ethical impact agents (e.g., ATMs), implicit ethical agents (e.g., airplane autopilots), explicit ethical agents (e.g., agents with ethical knowledge and reasoning), and fully ethical agents (e.g., humans). In this scheme, Aristotelean virtue ethical agents are implicit ethical agents, i.e., agents that come preloaded with safety features. An issue with such a categorization is that Moor \\cite{moor_nature_2006} excludes the possibility that a virtuous agent may have the ability to learn from experience and exhibit virtues based on the situation. We thus place AVAs within the category of explicit ethical agents, which are presumably able to gather knowledge and reason about ethical behavior.\n\n\\subsection{Related Works: Artificial Intelligence and Virtues}\nVirtue ethics was resurrected in a powerful piece by Elizabeth Anscombe \\cite{anscombe_modern_2022} in 1958, where she highlighted the weaknesses in contemporary ethics. 
Thereafter, philosophers such as Foot \\cite{foot_virtues_2002}, MacIntyre \\cite{macintyre_after_2007} and Hursthouse \\cite{hursthouse_virtue_2001} followed suit to develop a modern account of virtue ethics. In parallel, virtue ethics was introduced in the form of teleology (central to Aristotelean ethics) developed in cybernetics during the mid-twentieth century \\cite{rosenblueth_behavior_1943, berberich_virtuous_2018}. Artificial intelligence developed around this time in the form of symbolic AI and the scientific conversation started to expand to value alignment \\cite{wiener_moral_1960}. \n \nThe symbolic, rule-based AI was followed by the connectionist architectures. The former was heavily criticized by Dreyfus \\cite{dreyfus_what_1992} for being limited in its learning and perception; however, he was sympathetic towards connectionist architectures, and motivated the development of a Heideggerian AI. The rebirth of virtue ethics, and the birth of artificial intelligence followed by value alignment, may seem like a coincidence, but it was not. A manuscript titled \\textit{Android arete} \\cite{coleman_android_2001}, a name given to virtuous machines inspired from the Greek word for virtues (arete) used by Aristotle \\cite{ross_oxford_1980}, spoke about machines and possible virtues they can exhibit; this is a good point of departure towards artificial virtues in intelligent systems. In this context, Berberich and Diepold \\cite{berberich_virtuous_2018} took inspiration from Aristotelean virtue ethics, where they drew parallels with lifelong learning in virtue ethics and the RL paradigm. 
They define how virtues such as temperance and friendship can be realised in contemporary AI.\n \nStenseke \\cite{stenseke_artificial_2021} argued further and advocated for a connectionist approach to realising artificial virtues in which, depending on the application of the ethical agent, dedicated neural networks for specific virtues can combine to form an artificial virtuous agent. Such architectures, inspired by cognitive science and philosophy, serve to motivate research on, and progress towards, virtue-based approaches to machine ethics that address formalization, codifiability, and resolution of ethical dilemmas within the virtue ethics framework. While Stenseke \\cite{stenseke_artificial_2021} defined a framework, we propose an alternative paradigm to that architecture, together with the use of role-playing game environments to train artificial virtuous agents. In the following sections, we shed further light on our hypothesis.\n\n\\section{Design of Games with Ethical Dilemmas} \\label{design}\nIn this section, we explore morality in games and look at some examples of how these can be used to invoke moral reasoning in players. Video games, especially role-playing games, that force players to make difficult choices in moral dilemmas have become widespread. For example, \\textit{Witcher 3} \\cite{red_witcher_2015}, \\textit{Batman: The Telltale Series} \\cite{telltale_games_batman_2016} and \\textit{Life is Strange} \\cite{dotnod_entertainment_life_2022} have become popular for enabling moral engagement among players \\cite{nay_meaning_2017, tancred_player_2018}. We will briefly discuss how these games are designed to invoke moral engagement and go through examples of games such as \\textit{Papers, Please} (PP) \\cite{lucas_pope_papers_2013}.\n\\subsection{Mechanisms of Choices and Narratives}\n\\textit{Ultima IV: The Quest of the Avatar} \\cite{origin_systems_ultima_1985} was one of the earliest role-playing computer games. 
It featured player choices based on virtues such as compassion, honor, and humility \\cite{zagal_ethically_2009}. In this game, a player is successful when he/she consistently makes virtuous choices; failure to do so brings with it undesirable consequences. \\textit{Ultima IV} is based on scripted choices, where the developer has designed sophisticated scenarios to test whether the right virtues are exercised.\n\nToday, video games with moral dilemmas following a scripted narrative are the most popular. For instance, in \\textit{Batman: The Telltale Series}, the player assumes the role of Batman. A series of interactions with non-player characters (NPCs) is followed by the player's selection of dialogue. This choice determines the reaction of the NPC and how subsequent scenes are presented. Overall, the game follows a \\textit{linear} narrative with \\textit{scripted} choices, since the ending is the same regardless of the player's choices. The alternative to a linear narrative is a branching narrative, in which different endings are possible within the game. Examples of \\textit{branching} narratives are \\textit{Fallout 4} and PP \\cite{lucas_pope_papers_2013}. However, unlike \\textit{Fallout 4}, where choices are hardcoded by the developer, PP is based on \\textit{systemic} choices presented to the player, where the ethical considerations within the game become evident as the game progresses \\cite{formosa_papers_2016}. Below, we analyze the game mechanism in PP to understand why systemic choices in moral dilemmas are interesting.\n\\subsection{Case Study: Papers, Please}\nIn PP, the player assumes the role of an immigration officer whose job is to assess documents and decide whether the entrant is legal or illegal (Fig.~\\ref{fig:pp}). For each correct evaluation, the player is rewarded, but for an incorrect decision, they are penalized. The reward takes the form of salary, which is then used to pay the rent and cover other family expenses. 
If the player does not make the correct decisions as an officer, the family gets sick and hungry, and eventually a family member dies. If the player has no family members left, then the game is over. Also, there is the dichotomy between loyalty and justice: the player could choose to take bribes from illegal entrants, thus increasing their income. At the same time, these illegal entrants might be spies sent by revolutionaries trying to overthrow the ruling government. For more details, see \\cite{formosa_papers_2016}.\n\n \\begin{figure}[ht!]\n \\centering\n \\includegraphics[width=0.6\\linewidth]{pp.png}\n \\caption{An example scenario from \\textit{Papers, Please}, where the player looks at multiple documents to make a decision on whether to allow or reject the entrant. Source: \\cite{formosa_papers_2016}}\n \\label{fig:pp}\n \\end{figure}\n\nPrior to Formosa et al. \\cite{formosa_papers_2016}, Heron et al. \\cite{heron_you_2014} wrote a critique of scripted approaches, describing PP as a refreshing deviation from the plethora of script-oriented games. Formosa et al. \\cite{formosa_papers_2016} then analyzed the inner mechanisms of the game, distinguishing the impact of scripted and systemic approaches along four dimensions: moral focus, sensitivity, judgement and action. These dimensions are based on the four-component model in moral psychology and education \\cite{narvaez_moral_2008}. However, since our focus is on game mechanisms rather than a player's moral engagement, we refrain from discussing the model details; instead, we examine the systemic and scripted approaches and their impact on moral choices. 
We summarize the ethical dimensions within PP below:\n \n\\begin{itemize}\n \\item \\textbf{Dehumanization}: Performing document checks for an extended period of time can challenge the human element in the game, thus affecting how a player assesses entrants.\n \\item \\textbf{Privacy}: The use of X-ray scans on entrants to check their gender or to look for weapons might unnecessarily violate privacy.\n \\item \\textbf{Fairness}: An important aspect of the game, which allows a player to bend the rules for humane reasons. This makes the game more interesting.\n \\item \\textbf{Loyalty}: Whether the player is loyal to the country, their family or themselves.\n \\end{itemize}\n These moral aspects of PP become evident as we play the game, which is characteristic of a systemic approach. For example, the officer's loyalty is tested only after around 30 entrants have been processed at the immigration office, when a spy asks to enter the country to overthrow the current corrupt regime. The player (officer) will assess their situation based on their finances, family situation and job, and all these aspects develop in the game over time.\n \n Formosa et al. \\cite{formosa_papers_2016} also highlight the pros and cons of systemic approaches. While systemic approaches allow morality to arise from the aggregation of choices made over a period of time in which players are expected to explore moral themes, they prevent the formulation of apparent ethical problems. For example, a player who is presented with a single instance of having to choose between the interests of the ruling party and the country\u2019s safety and security may not be aware of the high-stakes nature of the decision; a sequence of many such choices, however, will make this obvious. 
While this may be considered a disadvantage, it can also be an advantage, since such deep exploration of ethics may encourage a player to develop creative solutions to these problems.\n\n\n\\section{Development of Virtues through Games} \\label{develop}\nThis section aims to briefly demonstrate how artificial virtues can be brought about using a systemic approach in a role-playing game environment and how virtues could be implemented using deep RL methods. We bring together the various concepts discussed in Sections \\ref{virtue} and \\ref{design} by outlining possible ways to design a suitable environment, to solve such environments, and, finally, to explain the agents' decisions.\n\\subsection{Environment Design}\nSince we aim to design an environment, a starting point could be to consider how we would judge a player (X) as being virtuous. We might observe how X responds to different situations, or perhaps a series of ethical dilemmas that gives us an impression that X is either \\textit{just}, \\textit{truthful} or \\textit{courageous}, for example, on a consistent basis. By ethical dilemmas, we do not refer to extreme dilemmas, such as the trolley problem or \\textit{Sophie's Choice}. Instead, we consider situations in everyday life, such as choosing between individual and collective goals when there is a conflict between the two. Such scenarios can be witnessed in some of the games discussed earlier. By presenting similar sequential dilemmas, we hypothesize that an artificial agent can learn to be virtuous in such environments.\n\nTraining an artificial agent to play a \\textit{linear} narrative with \\textit{scripted} player choices is straightforward for, say, a utilitarian RL agent. We need to think about a state-space complex enough to bring about learning and, at the same time, introduce moral dilemmas into the environment. Hence, a \\textit{branching} narrative with \\textit{systemic} player choices will ensure complexity of the state space. 
For example, in PP an artificial agent might process dozens of immigrants and, as the game progresses, encounter dilemmas that test virtues such as loyalty and honesty. Through repeated encounters with such dilemmas, the agent is incentivized to develop an inclination towards specific virtues.\n\nIn addition to the branching narrative, the ability to go back in time and redo the choices makes a game more sophisticated and allows the agent to make virtuous choices \\cite{nay_meaning_2017}. This can be witnessed in games like \\textit{Life is Strange} \\cite{dotnod_entertainment_life_2022} where better choices can be made with hindsight that lead to similar outcomes. Overall, these design elements make it difficult for an agent to hack the game, thus creating an environment with a complex state space. In such environments, agents that use optimization algorithms cannot explore the entire state space; instead, they require more sophisticated architectures.\n \n \\subsection{Artificial Virtuous Agents}\n In addition to the existence of virtues that could be applied across domains, virtuous behavior is also dependent on the situation. As Aristotle argues:\n \\\\\n \\begin{quotation}\n ``[...] a young man of practical wisdom cannot be found. The cause is that such wisdom is concerned not only with universals but with particulars, which become familiar from experience'' (NE 1141b 10)\\\\\n \\end{quotation}\n \nThrough practice and habituation of virtues, an agent can fulfill their quest for \\textit{eudaimonia}, which translates to ``a combination of well-being, happiness and flourishing'' \\cite{hursthouse_virtue_2001}. In other words, it is not about getting the behavior right every time, but about striving towards virtuous behavior and improving oneself when the opportunity presents itself. Similarly, Berberich and Diepold \\cite{berberich_virtuous_2018} use Aristotle's teleological form of virtue ethics to make the link to goal-oriented RL. 
An RL agent strives towards maximizing a reward function, given the states and actions available in its environment; the agent improves its actions over time through learning. Here, we use the word \\textit{goal} cautiously as Aristotle uses it: no one strives for \\textit{eudaimonia} for the sake of some higher goal; instead, \\textit{eudaimonia} itself is the highest goal, and other ends, such as physical health, money, and career success, are only possible means to being \\textit{eudaimon}. When it comes to an RL agent, the reward function should be defined in a similar fashion, but the objective function of the agent is to strive for excellence in the virtues.\n\nFor example, in a simplified version of the game PP, an artificial agent acts as an immigration officer with a family. The environment has states $ S = $ \\{Office, Restaurant, Home\\} and actions $A = $ \\{Allow, Deny, Feed, Don't Feed, Heat, Don't Heat, Accept Bribe, Reject Bribe\\}. A dilemma can be introduced in the form of bribery or loyalty to family. Since this is a systemic game, the dilemmas are not apparent until the agent has processed multiple entrants. The virtues in this context are honesty (accepting or rejecting bribes) and compassion (allowing or denying food/heat).\n\nNote that an artificial agent playing PP does not understand the concept of immigration, family, compassion, or food; it does not have to. The goal of a virtuous agent playing the game is to achieve excellence in relevant virtues, by processing inputs in the form of binary and numeric values, and then to output a decision in the form of discrete or continuous actions (which are, again, numbers). The agent must strive to be virtuous, given such a context. 
In addition to being an inspiration for developing environments that teach artificial agents virtues, the purpose of using a role-playing game is to give meaning to these binary and numeric inputs and outputs, thus making it easier for developers, researchers, and philosophers to \\textit{understand} the artificial virtuous agents.\n \\subsection{Deep Reinforcement Learning}\n \n In a single-agent RL setting, the states $S$, actions $A$, transition probabilities $T$, and rewards $R$ are modeled in a Markov Decision Process (MDP) $\\{S, A, T, R\\}$ framework. Using optimization algorithms, an RL agent learns the best policy by optimizing either the policy or a value function (the return from being in a particular state $S$, or a state-action pair $[S,A]$). When the state-space is very large, for example in Chess ($10^{43}$ complexity), approximations are applied to simplify this state-space. These approximations are possible using neural networks whose inputs are the states and whose outputs are either the predicted value or the policy. These networks optimize an objective function parameterized by $\\theta$ using algorithms such as backpropagation. Various RL agents can be deployed to play systemic role-playing games, ranging from deep Q-learners (value optimizers) to actor-critic models (policy optimizers).\n \n The deep deterministic policy gradient algorithm (DDPG \\cite{lillicrap_continuous_2019}) is an RL algorithm that learns, by trial and error, the value of state-action pairs. It uses this learned state-action value function to select those actions that maximize the expected discounted future rewards. The value function is learned by a neural network $Q(\\theta_Q)$ (critic), while the policy is learned by a distinct and separate neural network $\\mu(\\theta_{\\mu})$ (actor). 
It uses a duplicate pair of neural networks $Q'(\\theta_{Q'})$ and $\\mu'(\\theta_{\\mu'})$ during learning, for which the network parameters $\\theta_{Q'}$ and $\\theta_{\\mu'}$ are updated slowly according to a soft-update function: $\\theta'_i \\gets \\tau \\theta_i + (1-\\tau) \\theta'_i$, where $\\theta'_i$ denotes the parameters of a duplicate network, $\\theta_i$ those of the corresponding learned network, and $\\tau \\in [0,1]$ is usually a small number. In the following subsection, we briefly discuss affinity-based RL and how it may be applied to represent virtues in AI.\n \n \n \\subsection{Affinity-Based Reinforcement Learning}\n Affinity-based RL learns policies that are, at least partially, disjoint from the reward function resulting in a homogeneous set of locally-optimal policies for solving the same problem \\cite{aubret_survey_2019}. Contrary to constrained RL, which discourages agents from visiting given states \\cite{miryoosefi_reinforcement_2019, chow_risk-constrained_2015}, affinity-based learning encourages behavior that mimics a defined prior. It is a calculus that is suitable for modelling situations where the desirable behavior is somewhat decoupled from the global optimum. For example, a delivery van in Manhattan may prefer to take right turns over left turns, on the premise that this is a prudent safety measure \\cite{lu_safety_2001}. While it reaches the destination in the end, it navigates along a different route than the global optimum: the shortest distance is typically promoted by reward functions. The reasoning is that the deviation from the global optimum, and any corresponding penalty, is justified by other incentives, such as reduced risk in this case. It is thus compelling to motivate an agent to behave according to a given virtue either globally, or in a state-dependent fashion. For example, in PP, the prior might define an action distribution that favors honesty 95\\% of the time and loyalty 5\\% of the time. 
An agent that selects actions according to this distribution can be classified as honest, compared to an agent that was encouraged to act more loyally during learning. \n \nAffinity-based learning uses policy regularization, which has significant potential for this application. It expedites learning by encouraging exploration of the state space and is never detrimental to convergence \\citep{andres_collaborative_2022, vieillard_leverage_2020}. Haarnoja et al. \\cite{haarnoja_reinforcement_2017} proposed an entropy-based regularization method that penalizes any deviation from a uniform action distribution; it increases the entropy in the policy, thereby encouraging exploration of the entire state space. Galashov et al. \\cite{galashov_information_2019} generalized this method with a regularization term that penalizes the Kullback-Leibler (KL) divergence $D_{KL}$ between the state-action distribution of the policy and that of a given prior: $D_{KL}(P \\Vert Q) = \\sum_{x \\in X} P(x) \\log\\left(\\frac{P(x)}{Q(x)}\\right)$. Maree and Omlin \\cite{maree_reinforcement_2022} extended this concept to instill a global action affinity into learned policies, rather than to improve learning performance. 
They extended the DDPG objective function with a regularization term based on a specific prior:\n \\begin{align} \\label{eqn:regularized_obj_func}\n J(\\theta) &= \\mathbb{E}_{s,a \\sim \\mathcal{D}} \\left[ R(s,a) \\right] - \\lambda L \\\\\n L &= \\frac{1}{M} \\sum_{j=0}^{M} \\left[ \\mathbb{E}_{a \\sim \\pi_{\\theta}}(a_j) - (a_{j} \\vert \\pi_{0}(a)) \\right]^2 \\nonumber\n \\end{align}\n where $J$ is the objective function governed by parameters $\\theta$, $\\mathcal{D}$ is the replay buffer, $R(s,a)$ is the expected reward for action $a$ in state $s$, $\\lambda$ is a hyperparameter that scales the effect of the regularization term $L$, $M$ is the number of actions in the action space, $\\pi_{\\theta}$ is the current policy, and $\\pi_0$ is the prior action distribution that defines the desired behavior. Maree and Omlin \\cite{maree_can_2022} demonstrated their method in a financial advisory application, where they trained several prototypical agents to invest according to the preferences associated with a set of personality traits; each agent invested in those assets that might appeal to a given personality trait. For instance, a highly conscientious agent preferred to invest in property while an extraverted agent preferred buying stocks. While these agents optimized a singular reward function (the maximization of profit), they learned vastly different strategies. To personalize investment strategies, Maree and Omlin \\cite{maree_can_2022} combined these agents according to individual customers' personality profiles. The final strategy was a unique linear combination of the investment actions of the prototypical agents.\n\n \\begin{figure}[ht!]\n \\centering\n \\includegraphics[width=\\linewidth]{architecture.png}\n \\caption{Affinity-based RL agent solving a systemic role-playing game. The agent takes virtuous action by optimizing the regularized objective function and receives next state and reward information from the game. 
Here, the observations 1 to n represent the state. The text highlighted in \\textcolor{red}{red} represents the affinity of the agent for taking action 2 when encountering a particular combination of observations.}\n \\label{fig:architecture}\n \\end{figure}\n \n The combination of prototypical agents seems a promising approach to learning virtuous behavior: while individual virtues can be learned using policy regularization, a combination of these virtues might represent a rational agent; we are not equally brave or honorable all the time. This way, an agent actually becomes virtuous, rather than utilitarian through sole dependence on the reward function. The other aspect in virtue ethics is \\textit{practical wisdom}, which is to know to what degree an agent must exhibit a virtue depending on the situation. As opposed to the work done in \\cite{maree_can_2022}, the combinations of virtues may therefore vary in time as well as between individuals. One way of attaining such combinations could be through decision trees with a (partially observable) state vector as input. Another approach could be to extend the policy regularization term in Equation~\\ref{eqn:regularized_obj_func} to specify a state-specific action distribution (Fig.~\\ref{fig:architecture}), resembling KL-regularization. Formally, the regularization term $L$ in Equation~\\ref{eqn:regularized_obj_func} could be replaced by:\n \\begin{align}\n L = \\sum_{s \\in S} \\pi_{\\theta}(s) \\cdot \\log \\left( \\frac{\\pi_{\\theta}(s)}{\\pi_0(s)} \\right)\n \\nonumber\n \\end{align}\n Thus, an agent may learn to act honorably in certain states, and bravely in others. Such a prior $\\pi_0$ should specify the desired action distribution as a function of the state variables, e.g., in PP a sick family member might prompt an agent to consider bravery 50\\% of the time, whereas a dying family member might elicit a higher rate. 
This is a compelling generalization of global affinity-based RL to local affinity-based RL. Fig.~\\ref{fig:architecture} illustrates the flow of information from the systemic role-playing game to the policy-regularized deep RL agent. Finally, once the agent is trained to make virtuous decisions in the game, it is crucial to investigate what the agent has learned from these experiences.\n\n \\subsection{Interpretation of Reinforcement Learning Agents}\n A virtuous agent is required to perform the right actions for the right reasons; it therefore becomes critical that the decisions made be scrutinized. At the same time, black-box architectures, such as recurrent neural networks (RNN) within the RL framework, are necessary to maintain a good performance. Such a trade-off between interpretability and performance means that an agent must learn to balance the two. In this paper, we use the words ``explainability'' and ``interpretability'' interchangeably, but we acknowledge the differences expressed in the literature \\cite{heuillet_explainability_2021}. The composition of prototypical agents is one way of achieving RL interpretability; other methods include the causal lens \\cite{madumal_explainable_2020}, reward decomposition \\cite{juozapaitis_explainable_2019} and reward redistribution \\cite{patil_align-rudder_2022}. \n\n The action influence model, introduced by Madumal et al. \\cite{madumal_explainable_2020}, takes inspiration from cognitive science to encode cause-effect relations by using counterfactuals, i.e., events that could have happened along with the ones that did. We may define the causal model for PP and, based on the action influence model, explain the decisions made by the agent. An alternative approach is the reward decomposition technique, where, in addition to the rewards associated with winning a game, the agent is also incentivised to maximize other reward functions. 
This maximization is done by decomposing the overall Q-function into multiple elemental Q-functions and calculating differences in rewards using a reward difference explanation technique introduced in \\cite{juozapaitis_explainable_2019}. Finally, another interesting approach is reward redistribution \\cite{patil_align-rudder_2022}, where the expected return is approximated using an LSTM or alignment methods. In reward redistribution, the agent receives delayed rewards at the end of an episode, after every sub-goal, until, finally, the full reward after achieving the main goal. Hence, this approach is useful in episodic games such as PP, where salary (reward) is paid at the end of the day, and the main goal of the agent is to keep their family alive using the salary.\n\n\\section{Conclusions and Future Research Directions} \\label{discuss}\nIn this section, we outline some questions that arise as a result of our work, for instance: how could an artificial agent possibly exhibit virtuous behavior when it clearly lacks human agency and consciousness? At the same time, which virtues are artificial, and which are not? While these questions deserve articles of their own, we attempt to briefly discuss them here. After making the case for virtue ethics, we presented examples of role-playing games such as \\textit{Papers, Please} which include ethics as moral dilemmas and we suggested possible approaches to solving such games. Here, we also suggest fruitful directions for future research in virtuous game design and learning algorithms.\n\nWe have purposely side-stepped the question of consciousness and moral agency. We are not concerned with conscious artificial agents, but with AI that exists \\textit{today}. And once again we stress that the virtues we present here are different from human virtues. For example, in the \\textit{Nicomachean Ethics} \\cite{ross_oxford_1980}, Aristotle argues for the existence of virtues such as temperance and bravery. 
Such virtues can be thought of as exclusively human, because we show emotions such as anger and fear, whereas at this point, one cannot fathom an artificial agent exhibiting such emotions. Thus, it makes sense to think about a different set of virtues for artificial agents.\n\nArtificial virtues can be thought of as character traits for current-day artificial intelligence. A starting point is to consider virtues such as honesty (degree of truthfulness), perseverance (how much to compute), and optimization (how much to fine-tune), demonstrated in \\cite{coleman_android_2001}. However, unlike \\cite{coleman_android_2001}, we are compelled to progress from mere machine learning towards designing virtuous AI. We consider virtues to be continuous variables; an agent\u2019s challenge is to find the golden mean for a given virtue. We will elaborate on this aspect of virtues in future work.\n\nPrevious work has proposed POMDP \\cite{abel_reinforcement_2016}, inverse RL \\cite{berberich_virtuous_2018} and deep neural network frameworks \\cite{stenseke_artificial_2021} as possible means to implement artificial virtues. While these are widely adopted models of machine learning, we do recognize that there is a danger that these models might be perceived as consequentialist. There needs to be something more besides the reward function motivating virtuous behaviour. Techniques that work directly on the objective function to encourage certain behaviours may be needed to work in tandem with the reward function. For example, \\cite{maree_reinforcement_2022} have shown theoretical evidence of agent characterization through policy regularization. Such affinity-based RL methods also help improve the explainability of models, and this is crucial with respect to virtues, as we highlighted in earlier sections.\n\nFinally, it is important to consider the data or environment used to train such agents, as these influence the model's performance. 
The framework of systemic role-player games highlighted in \\textit{Papers, Please} \\cite{formosa_papers_2016}, provides a reasonable model on how to integrate ethical dilemmas into an environment, such that these ethical aspects arise as the agent plays the game and learns to adjust its decision-making based on feedback received from the environment. Depending on the model and the environment used, it may be fruitful to see how multiple virtuous agents behave when they are at odds. Overall, this paper furthers the conversation on the implementation of ethical machines, which is a nascent research area.\n\n\n\\section*{Acknowledgments}\n\nFinancial support for this project is provided by the University of Agder. We would like to thank Marija Slavkovik, Department of Information Science and Media Studies, University of Bergen, and, Henrik Martin Hansen Torjusen, Department of Foreign Languages and Translation, University of Agder, for useful discussions and feedback on this manuscript.\n\n\n\n\n\\section*{Declarations}\nThe authors have no competing interests.\n", "meta": {"timestamp": "2022-08-31T02:09:41", "yymm": "2208", "arxiv_id": "2208.14037", "language": "en", "url": "https://arxiv.org/abs/2208.14037"}} {"text": "\\section{Introduction} \\label{sec:intro}\nIn a recent work, we investigated the weak interaction of neutrinos in the homogeneous neutron-star (NS) matter within the framework of Korea-IBS-Daegu-SKKU (KIDS) density functional~\\cite{Hutauruk:2022bii}. The work focused on the effect of uncertainties and/or corrections of the nuclear matter equation of state (EoS), i.e., symmetry energy and nucleon effective mass, to the neutrino mean free path (NMFP) within the NS systematically. We found that the NMFP depends strongly on these uncertainties and/or corrections. Compared with the NS radius, the NMFP could be as short as about half of the NS radius, but also could be larger than the NS radius. 
Such a wide range of NMFP values can lead to different results for the cooling behavior of the NS. Overall, the work of Ref.~\\cite{Hutauruk:2022bii} demonstrated the importance of the accurate determination of nuclear matter EoS in the neutrino weak interaction at finite density and zero temperature. However, it is worth noting that in Ref.~\\cite{Hutauruk:2022bii}, the nucleons are treated as point particles, while, in fact, the nucleon has a structure, characterized by quantities such as the electromagnetic form factors in the transverse momentum and parton distribution functions in the longitudinal momentum. Such structure of the nucleon has been confirmed not only theoretically~\\cite{Miller:2007kt}, but also experimentally~\\cite{Hyde:2004gef,Andivahis:1994rq,Litt:1969my,SAMPLE:1997dds}. Moreover, many past studies have investigated the neutral- and charged-current weak interactions of a neutrino with matter by considering the electromagnetic form factors in free space~\\cite{Horowitz:2003yx,Reddy:1998hb,Sulaksono:2005wv,Guo:2020tgx,Sulaksono:2006eu,Hutauruk:2006re}. In the present work, we consider the structure of the nucleon, which originates from the non-perturbative aspect of quantum chromodynamics (QCD) in hadron and particle physics, in both free space and medium. It is well known that the electromagnetic form factors are expected to be modified in the medium~\\cite{Cloet:2009tx,Lu:1998tn,Geesaman:1995yd,EuropeanMuon:1983wih,Malace:2008gf,Hutauruk:2018cgu,Hutauruk:2020mhl}. Such a medium effect can be non-negligible in the model~\\cite{plb2009,jpg2010}, providing another source of corrections that can affect the NMFP.\n\n\nIn most standard studies, neutrinos are assumed to be elementary particles. 
However, several experiments show non-zero values for the magnetic moment and charge radius of the neutrino~\\cite{Super-Kamiokande:2004wqk,TEXONO:2002pra,MUNU:2003peb,Beda:2013mta,XENON:2020rca,Borexino:2017fbd,Allen:1992qe}, although the evidence is not firmly established yet. More theoretical studies and experiments using modern facilities like CONUS~\\cite{CONUS:2022qbb}, DUNE~\\cite{Jana:2022tsa}, NOMAD~\\cite{NOMAD:2009qmu}, MiniBooNE~\\cite{MiniBooNE:2010xqw}, MINERvA~\\cite{MINERvA:2013kdn}, Hyper-Kamiokande, and other reactor or accelerator experiments, as well as observations of galactic or atmospheric neutrinos, are needed to collect more data in order to establish the properties of the neutrino. If the neutrinos have an internal structure, this evidence of the neutrino magnetic moment (NMM) and charge radius (NCR) will significantly impact elementary particle physics, nuclear physics, astrophysics, and cosmology in the standard model calculation.\n\n\nIn this study, we take into account the aforementioned uncertainties and/or corrections from nuclear physics, hadron physics, and elementary particle physics, i.e., the nucleon form factor (free space and medium), the NMM, and the NCR, in the description of the neutrino electro-weak interaction at finite density and zero temperature. The values of the NMM and NCR are obtained from experimental constraints~\\cite{Beda:2013mta,Allen:1992qe}. We then calculate the differential cross-section (DCS) for the neutrino-nucleon scattering and the NMFP of the neutrinos in the core of the NS. The roles of the nucleon and neutrino corrections in the neutrino propagation inside the NS are explored in further detail. We find that the in-medium corrections of the nucleon form factor and the NCR have a significant impact on the DCS and NMFP. Compared to the vacuum nucleon form factor (VFF), the density dependence of the nucleon form factor (DDFF) decreases the DCS, implying an increase in the NMFP. 
The increase of the NMFP is more significant at higher matter density. This means that the neutrino with the DDFF will escape more freely from the core of the NS. When an NMM value is taken from the experimental constraints~\\cite{Beda:2013mta}, the DCS does not change significantly. This indicates that the NMM plays a small role in the neutrino emission from the NS core. However, the DCS changes drastically when we consider a finite NCR value, which is obtained from the experimental constraint~\\cite{Allen:1992qe}. The NCR increases the DCS significantly in comparison with other scenarios without the NCR. This shows that the neutrinos interact more strongly with the matter if the NCR contribution is taken into account appropriately. \n\nThe present work is organized as follows: Section II is devoted to a brief description of the theoretical framework. The numerical results and related discussions are given in Section III. The final Section is devoted to the summary. \n\n\n\\section{Neutrino-nucleon interaction} \\label{sec:ermf}\n\n\nHere we adopt four non-relativistic energy density functional (EDF) models: KIDS0, KIDS-A, KIDS0-m*87, and SLy4~\\cite{Hutauruk:2022bii}. All the models yield identical values of the saturation density $\\rho_0=0.16$~fm$^{-3}$ and the binding energy per nucleon $E_{\\rm B}=16$~MeV. However, they have distinctive behavior for the EoS at densities below and above the saturation. 
A conventional expansion of the energy per nucleon can be written as\n\\begin{eqnarray}\n \\label{eq1}\n {\\cal E}(\\rho,\\, \\delta) &=& E(\\rho) + S(\\rho) \\delta^2 + O(\\delta^3), \\\\\n E(\\rho) &=& E_{\\rm B} + \\frac{1}{2} K_0 x^2 + O(x^3), \\\\\n S(\\rho) &=& J + L x + \\frac{1}{2}K_{\\rm sym} x^2 + O(x^3),\\,\\,\\,\\,\\,\\, x = \\frac{\\rho - \\rho_0}{3 \\rho_0},\\,\\, \\delta = \\frac{\\rho_n - \\rho_p}{\\rho}.\n\\end{eqnarray}\n\\begin{table}\n \\begin{center}\n \\begin{tabular}{|c|c|c|c|c|c|c|}\\hline\n & $K_0$ & $J$ & $L$ & $K_{\\rm sym}$ & $m^*_s/M$ & $m^*_v/M$ \\\\ \\hline\n KIDS0 & 240 & 32.8 & 49.1 & $-156.7$ & 1.0 & 0.8 \\\\ \n KIDS-A & 230 & 33 & 66 & $-139.5$ & 1.0 & 0.8 \\\\\n KIDS0-m*87 & 240 & 32.8 & 49.1 & $-156.7$ & 0.8 & 0.7 \\\\\n SLy4 & 229.9 & 32 & 45.9 & $-119.7$ & 0.7 & 0.8 \\\\\n \\hline\n \\end{tabular}\n \\end{center}\n \\caption{Nuclear matter parameters of the models.\n $K_0$, $J$, $L$ and $K_{\\rm sym}$ are in the units of MeV.\n $m^*_s$, $m^*_v$ and $M$ denote the isoscalar effective mass, isovector effective mass, and nucleon mass in free space,\n respectively}\n \\label{tab1}\n\\end{table}\nThe parameters that characterize the density-dependence of EoS are summarized in Tab.~\\ref{tab1}. The KIDS0, KIDS0-m*87, and SLy4 models have similar density dependencies for the symmetry energy but are very different in the effective mass. On the other hand, the KIDS0 and KIDS-A models have similar effective masses, whereas the symmetry energy for the KIDS-A model is much stiffer than that for the KIDS0 model. 
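\n\nAs a rough illustration of what ``stiffer'' means here, one may evaluate the expansion of $S(\\rho)$ given above at $\\rho = 2\\rho_0$, i.e., $x = 1/3$, with the parameters of Tab.~\\ref{tab1}. This is only a sketch under the assumption that the series is truncated at second order in $x$; the neglected $O(x^3)$ terms need not be small at this density, and the full functionals are not restricted to this form:\n\\begin{eqnarray}\n S(2\\rho_0) &\\approx& 32.8 + \\frac{49.1}{3} - \\frac{156.7}{18} \\approx 40.46~{\\rm MeV} \\quad (\\mbox{KIDS0}), \\nonumber \\\\\n S(2\\rho_0) &\\approx& 33 + \\frac{66}{3} - \\frac{139.5}{18} \\approx 47.25~{\\rm MeV} \\quad (\\mbox{KIDS-A}). \\nonumber\n\\end{eqnarray}\nWithin this truncation, the KIDS-A symmetry energy at twice the saturation density exceeds the KIDS0 value by roughly 7~MeV, mainly because of the larger slope parameter $L$.\n\n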
Comparison among these models will show the role of the effective mass and symmetry energy in the neutrino-nucleon interaction in the nuclear medium.\n\n\nThe neutral-current weak and electromagnetic (EM) interactions for the neutrino-nucleon scattering in the nuclear medium can be described in terms of the following effective Lagrangian:\n\\begin{eqnarray}\n \\label{eq2}\n {\\mathscr L}^N_{\\rm int} &=& \\frac{G_F}{\\sqrt{2}} \\left(\\bar{\\nu} \\Gamma^\\mu_{\\rm W} \\nu \\right)\n \\left( \\bar{N} J^{\\rm W}_\\mu N \\right) \n + \\frac{4 \\alpha_{\\rm EM}}{q^2} \\left(\\bar{\\nu}\\Gamma^\\mu_{\\rm EM} \\nu\\right)\n \\left(\\bar{N} J^{\\rm EM}_\\mu N\\right),\n\\end{eqnarray}\nwhere $\\nu$ and $N=(n,p)$ denote the neutrino and nucleon fields, respectively. The weak and EM currents for the nucleon, $J^{\\rm W}_\\mu$ and $J^{\\rm EM}_\\mu$, are defined by\n\\begin{eqnarray}\n \\label{eq3}\n J^{\\rm W}_\\mu &=& F^{\\rm W}_1(q^2) \\gamma_\\mu - G_A(q^2) \\gamma_\\mu \\gamma^5\n + i F^{\\rm W}_2(q^2)\\frac{\\sigma_{\\mu \\nu} q^\\nu}{2M} + \\frac{G_p(q^2)}{2M} q_\\mu \\gamma^5, \\nonumber \\\\\n J^{\\rm EM}_\\mu &=& F^{\\rm EM}_1(q^2) \\gamma_\\mu + i F^{\\rm EM}_2(q^2)\\frac{\\sigma_{\\mu\\nu}q^\\nu}{2M}.\n\\end{eqnarray}\nThe values of the nucleon form factors $G_A$, $F_{1,2}^{\\rm W}$, and $F_{1,2}^{\\rm EM}$ at $q^2 =0$ in vacuum are summarized in Tab.~\\ref{tab2}.\n\\begin{table}\n \\begin{center} \n \\begin{tabular}{|c|c|c|c|c|c|}\\hline\n Target & $G_A$ & $F^{\\rm W}_1$ & $F^{\\rm W}_2$ & $F^{\\rm EM}_1$ & $F^{\\rm EM}_2$ \\\\ \\hline\n $n$ &$-\\frac{g_A}{2}$ & $-0.5$ & $-\\frac{1}{2}(\\kappa_p -\\kappa_n) - 2 \\sin^2 \\theta_w \\kappa_n$ & 0 & $\\kappa_n$ \\\\\n $p$ & $\\frac{g_A}{2}$ & $0.5 - 2 \\sin^2 \\theta_w$ & $\\frac{1}{2}(\\kappa_p - \\kappa_n)-2 \\sin^2 \\theta_w \\kappa_p$ & 1 & $\\kappa_p$ \\\\\n \n \n \\hline\n \\end{tabular}\n\\end{center}\n\\caption{Vacuum form factor values at $q^2=0$. 
In the numerical calculation, we use $\\sin^2\\theta_w = 0.231$, $g_A=1.260$, $\\kappa_p=1.793$ and $\\kappa_n=-1.913$.}\n\\label{tab2}\n\\end{table}\n\nSince we are interested in the density effects of the neutrino-nucleon scattering, the nucleon form factors should be described as functions of density. \nFor this purpose, the DDFFs are calculated in the quark-meson coupling (QMC) model~\\cite{Hutauruk:2018cgu}. In the QMC model, the standard values of the current quark mass $m_u =$ 5 MeV and nucleon bag radius in free space $R_N =$ 0.8 fm are used. For more details on the model and related calculations, refer to Ref.~\\cite{Hutauruk:2018cgu} and references therein.\n\n\n\n\n\nThe weak-interaction vertex of the Dirac neutrino in Eq.~(\\ref{eq2}) can be written in the standard $V-A$ form as follows:\n\\begin{equation}\n \\Gamma^\\mu_{\\rm W} = \\gamma^\\mu (1-\\gamma^5),\n\\end{equation}\nwhile the EM-interaction vertex is constructed in terms of the four independent form factors in general\n\\begin{eqnarray}\n\\label{GAMMAEM}\n \\Gamma^\\mu_{\\rm EM} = f_1 \\gamma^\\mu - \\frac{i}{2m_e} f_2\\sigma^{\\mu\\nu}q_\\nu\n + g_1\\left(g^{\\mu\\nu}-\\frac{q^\\mu q^\\nu}{q^2} \\right)\\gamma_\\nu \\gamma^5\n -\\frac{i}{2m_e} g_2\\sigma^{\\mu\\nu}q_\\nu \\gamma^5.\n\\end{eqnarray}\nHere, $f_1$, $g_1$, $f_2$, and $g_2$ are called the Dirac, anapole, magnetic, and electric form factors as functions of $q^2$, respectively. The NCR is simply defined by\n\\begin{eqnarray}\n \\left< R^2_\\nu \\right> \\equiv \\left< R^2_V \\right> + \\left< R^2_A \\right>,\n\\end{eqnarray}\nwhere $R_V$ and $R_A$ are the vector and axial-vector charge radii, which are defined by\n\\begin{eqnarray}\n \\left< R^2_V \\right> &=& \\left. 6 \\frac{d f_1(q^2)}{d q^2} \\right|_{q^2=0},\\,\\,\\,\\,\\left< R^2_A \\right> = \\left. 6 \\frac{d g_1(q^2)}{d q^2} \\right|_{q^2=0}.\n\\end{eqnarray}\nBy doing this, we can explore the effects of the spatial extension of the neutrino in the medium. 
In the Breit frame with $q_0=0$, we can use the approximate relation:\n\\begin{eqnarray}\n f_1(q^2) &\\simeq& - \\frac{1}{6} \\left< R^2_V \\right> \\mathbf{q}^2,\\,\\,\\,\\,g_1(q^2) \\simeq - \\frac{1}{6} \\left< R^2_A \\right> \\mathbf{q}^2.\n\\end{eqnarray}\nAt $q^2=0$, $f_2(q^2)$ and $g_2(q^2)$ define the NMM and charge-parity violating electric dipole moment as\n\\begin{equation}\n\\mu^m_\\nu = f_2(0) \\mu_B \\,\\,\\,\\,\\, {\\rm and} \\,\\,\\,\\,\\, \\mu^e_\\nu = g_2(0) \\mu_B,\n\\end{equation}\nfrom which we can define the effective NMM as\n\\begin{equation}\n\\mu^2_\\nu \\equiv (\\mu^m_\\nu)^2 + (\\mu^e_\\nu)^2\n\\end{equation}\nwith the Bohr magneton $\\mu_B = \\frac{e}{2m_e}$, where $e$ and $m_e$ are the elementary charge and the electron mass, respectively.\n\n\n\nThe DCS density for the neutrino-nucleon scattering in the weak and EM neutral-current interactions is given by\n\\begin{eqnarray}\n \\label{eq4}\n \\frac{1}{V} \\frac{d^3\\sigma}{dE'_\\nu \\, d^2\\Omega} &=& \n - \\frac{1}{16\\pi^2} \\frac{E'_\\nu}{E_\\nu} \\Big[ \\frac{G_F^2}{2} \\left(L_\\nu^{\\alpha \\beta} \\Pi_{\\alpha \\beta}^{\\mathrm{Im}} \\right)_{\\mathrm{W}} + \\frac{16 \\pi^2 \\alpha_{\\mathrm{EM}}^2}{q^2} \\left( L_\\nu^{\\alpha \\beta} \\Pi_{\\alpha \\beta}^{\\mathrm{Im}} \\right)_{\\mathrm{EM}} \\nonumber \\\\\n &+& \\frac{8\\pi G_F \\alpha_{\\mathrm{EM}}}{q^2 \\sqrt{2}} \\left( L_\\nu^{\\alpha \\beta} \\Pi_{\\alpha \\beta}^{\\mathrm{Im}} \\right)_{\\mathrm{INT}} \\Big],\n\\end{eqnarray}\nwhere $E'_\\nu$ and $E_\\nu$ are the final and initial neutrino energies, respectively. 
The detailed forms of $L_\\nu$ and $\\Pi^{\\rm Im}$ for the weak, EM, and interference terms are given in Refs.~\\cite{Reddy:1998hb,Sulaksono:2006eu,Hutauruk:2018cgu}.\n\nThe inverse of the neutrino mean free path (NMFP) is determined by integrating the DCS in Eq.~(\\ref{eq4}) over $q_0$ and $|\\textbf{q}|$, resulting in\n\\begin{eqnarray}\n \\label{eq5}\n \\lambda^{-1} (E_\\nu) &=& 2 \\pi \\int_{q_0}^{(2 E_\\nu - q_0)} d|\\mathbf{q}| \\int_0^{2E_\\nu} dq_0 \\frac{|\\mathbf{q}|}{E_\\nu E_\\nu'} \\left[ \\frac{1}{V} \\frac{d^3 \\sigma}{dE_\\nu' d^2\\Omega} \\right].\n\\end{eqnarray}\nHere, $E_\\nu' = E_\\nu -q_0$.\n\n\n\n\\section{Numerical result and discussion} \\label{sec:NRCC}\nIn this section, we present and discuss the numerical results for the DDFF, DCS, and NMFP. \n\n\\subsection{Nucleon DDFF}\n\\begin{figure}[t]\n\\begin{tabular}{ccc}\n\\includegraphics[width=5cm]{fig1a.pdf}\n\\includegraphics[width=5cm]{fig1b.pdf}\n\\includegraphics[width=5cm]{fig1c.pdf}\n\\end{tabular}\n \\caption{Normalized density-dependent weak form factors (DDFF) for the nucleon as functions of $Q^2$ and $\\rho$ \n from the QMC model~\\cite{Saito:2005rv}: (a) $G_A(Q^2,\\rho)/G_A(0,0)$, (b) $F^W_{1}(Q^2,\\rho)/F^W_{1}(0,0)$, and (c) $F^W_{2}(Q^2,\\rho)/F^W_{2}(0,0)$.}\n \\label{fig1}\n \\end{figure}\n\nAs mentioned already, since we are interested in the neutrino-nucleon scattering inside the nuclear medium, density effects should be taken into account carefully for the nucleon form factors. For this purpose, we employ the QMC model~\\cite{Saito:2005rv}. In Fig.~\\ref{fig1}, we depict the numerical results for the normalized density-dependent weak form factors for the nucleon as functions of $Q^2$ and $\\rho/\\rho_0$, showing (a) $G_A(Q^2,\\rho)/G_A(0,0)$, (b) $F^W_{1}(Q^2,\\rho)/F^W_{1}(0,0)$, and (c) $F^W_{2}(Q^2,\\rho)/F^W_{2}(0,0)$ for the proton. All three DDFFs show the typical $Q^2$ dependence, while their density dependencies differ considerably from each other. 
Note that $F^W_1$ is almost independent of the density in the low-$Q^2$ region, which is our main interest, whereas $G_A$ decreases with increasing density, as shown in panel (a) of Fig.~\\ref{fig1}. This decrease of $G_A$ can be understood as follows: the lower component of the quark spinor, $\\mathcal{L}(r)\\propto\\mathcal{O}(1/M^*_q)$, is enhanced more than its upper component, $\\mathscr{U}(r)\\propto\\mathcal{O}(1)$, as the density increases, when $G_A$ is calculated using the three-point quark operator~\\cite{Saito:2005rv,Lu:2001mf}: $G_A \\equiv \\int d^3r \\left[ \\mathscr{U}^2(r) -\\mathcal{L}^2(r)\\right]>1$. In contrast, $F_2^W$ increases with the density, a tendency that originates from $F_2^W \\equiv \\int d^3r \\mathscr{U}(r) \\mathcal{L}(r)$. From the numerical calculations, we verified that the contribution from $G_A$ dominates the scattering cross-section. Hence, with the DDFF, the scattering cross-section is reduced as the density increases, in comparison to that without it. It is worth mentioning that the result for $G_A$ at saturation density, for instance, turns out to be consistent with other theoretical calculations~\\cite{Hutauruk:2018qku,Lu:2001mf,Rakhimov:1998hu}. Although we do not show the EM form factors here, we verified that their density dependencies do not affect the cross-section significantly. \n\n\n\\subsection{Neutrino-nucleon scattering DCS}\n\\begin{figure}\n\\begin{tabular}{cc}\n\\includegraphics[width=8cm]{fig2a.pdf}\n\\includegraphics[width=8cm]{fig2b.pdf}\n\\\\\n\\includegraphics[width=8cm]{fig2c.pdf}\n\\includegraphics[width=8cm]{fig2d.pdf}\n\\end{tabular}\n\\caption{Triple isotropic differential cross-section (DCS) of the neutrino-nucleon scattering as a function of the transferred energy $q_0$ for $\\rho= (1.0-3.0)\\,\\rho_0$ for the (a) KIDS0, (b) KIDS-A, (c) KIDS0-m*87, and (d) SLy4 models. 
The numerical results are given with the vacuum form factor DCS+VFF, DCS+DDFF, DCS+DDFF+NMM, and DCS+DDFF+NCR.}\n\\label{fig2}\n\\end{figure}\n\nThe numerical results for the DCS are depicted as a function of $q_0$ for the densities $\\rho=\\rho_0$ (left), $2\\rho_0$ (middle), and $3\\rho_0$ (right) in Fig.~\\ref{fig2}. In each row, we compute the DCS using different EDF models, i.e., KIDS0, KIDS-A, KIDS0-m*87, and SLy4 from top to bottom. In order to explore the effects of the new ingredients DDFF, NMM, and NCR, we separately show the results with the vacuum form factor (VFF), DDFF, DDFF+NMM, and DDFF+NMM+NCR (long-dashed). Note that the VFF indicates the DDFF at $\\rho=0$. \n\nFirst, we explain the overall tendency depending on the different EDF models. It is clear that the DCS from KIDS0 and KIDS-A are larger than those from KIDS0-m*87 and SLy4, although quantitative differences among the models remain. As discussed in detail in Ref.~\\cite{Hutauruk:2022bii}, the larger DCS originates from the effective neutron mass satisfying the condition $M^*_n\\gtrsim M_n$, where $M_n$\nis the neutron mass in vacuum, as in KIDS0 and KIDS-A. This observation can be explained by the fact that the DCS is proportional to the $\\mathcal{O}(M^{*}_N)$ and $\\mathcal{O}(M^{*2}_N)$ terms from the spin summation over the nucleonic tensor given in Eq.~(\\ref{eq4}). Physically, an increasing nucleon mass in the scattering process results in a decreasing energy transfer $\\Delta q_0$ in the $t$ channel at a given $\\sqrt{s}$ value. In turn, the interaction time increases, resulting in a larger cross-section from $\\Delta q_0\\Delta \\tau\\gtrsim\\hbar/s$, where $\\tau$ stands for the interaction time. The difference between the models becomes more obvious as the density increases, as also reported previously in Ref.~\\cite{Hutauruk:2022bii}. The different endpoints of the cross-sections are determined by the model-dependent effective nucleon masses. 
\n\nAs for the effects of the DDFF, it turns out that the inclusion of the DDFF provides a non-negligible reduction of the cross-section in comparison to that of VFF. Moreover, the reduction becomes more significant as the density increases. If we compare the maximum values of the DCS+DDFF with those of the DCS+VFF, the ratios are 0.83, 0.77, and 0.71 at $\\rho=\\rho_0$, $2\\rho_0$, and $3\\rho_0$, respectively, i.e., reductions of about 17\\%, 23\\%, and 29\\%, and they depend only weakly on the EDF models, as shown in the figure. Consequently, the decreasing behavior of the DCS can be understood in terms of the in-medium behavior of the DDFF, especially $G_A$, as already discussed in the previous subsection.\n\nNow we are in a position to discuss the effects of NMM in the DCS. Here, we used $\\mu_\\nu = 2.9 \\times 10^{-11} \\mu_{\\rm B}$ for the numerical calculations. This value is constrained by the experiment~\\cite{Beda:2013mta} and is close to the value from astronomical observation~\\cite{Raffelt:1999gv}. By comparing the DCS+DDFF (dotted) and DCS+DDFF+NMM (dashed) in Fig.~\\ref{fig2}, it is obvious that the NMM gives negligible contributions to the scattering process. Note that the neutrino magnetic tensor currents including the NMM are proportional to $q_0$ as shown in Eq.~(\\ref{GAMMAEM}). Hence, the effects of the NMM in the DCS are considerably suppressed in the low $q_0$ region as shown in Fig.~\\ref{fig2}, in addition to its extremely small value $\\sim10^{-11}\\,\\mu_B$, in comparison to other scales such as $\\mu_n$ and $\\mu_p$ in the scattering process.\n\nThe NCR relates to the neutrino electric vector current as in Eq.~(\\ref{GAMMAEM}) and indicates the EM structure of the particle, although the neutrino has generally been believed to be a point-like particle. In the LAMPF experiment for the measurement of $\\nu_ee^-\\to\\nu_ee^-$~\\cite{Allen:1992qe}, the NCR was estimated to be $R_\\nu = 3.5 \\times 10^{-5}$ MeV$^{-1}$. 
The results of the numerical calculations with this value are given by the long-dashed lines in the figure. Very interestingly, the DCS increase considerably with the NCR for all the EDF models. The effects of the NCR are especially pronounced for the SLy4 model. We also note that the increase of the DCS due to the NCR is less sensitive to the density, as expected, since the neutrino EM current does not depend on the density. The drastic changes observed in the DCS due to the NCR can be understood physically by the spatial extension of the neutrino wave function, which increases the overlap with that of the nucleon and results in a rapidly growing DCS. \n\n\\subsection{NMFP}\n\\begin{figure}\n\\begin{tabular}{cc}\n\\includegraphics[width=8cm]{fig3a.pdf}\n\\includegraphics[width=8cm]{fig3b.pdf}\n\\\\\n\\includegraphics[width=8cm]{fig3c.pdf}\n\\includegraphics[width=8cm]{fig3d.pdf}\n\\end{tabular}\n\\caption{Neutrino mean free path (NMFP) as a function of density for the (a) KIDS0, (b) KIDS-A, (c) KIDS0-m*87, and (d) SLy4 models. The numerical results are given with the vacuum form factor NMFP+VFF (dotted), NMFP+DDFF (dashed), and NMFP+DDFF+NCR (long-dashed). The shaded area stands for the region $R_{\\rm NS}=13$~km, an upper bound for the radius of $1.4M_\\odot$ mass NS.}\n\\label{fig3}\n\\end{figure}\n\n\nFinally, we present the numerical results for the NMFP in Fig.~\\ref{fig3} in the same manner as that for the DCS given above. The NMFP is one of the critical quantities that affect the cooling rate of the NS. It is also proportional by construction to the inverse of the DCS, so one expects a shorter NMFP at a given density for $M^*_n\\gtrsim M_n$ in the KIDS0 and KIDS-A models in comparison to the others, as shown in the figure. 
In all the cases, the NMFP is a decreasing function of the density, except for the NMFP+DDFF and NMFP+DDFF+NMM in the SLy4 model, which is consistent with the expectation that the neutrino-nucleon interaction rate increases in dense matter. \n\nIn the low-density region, where VFF$\\sim$DDFF, the NMFP+VFF and NMFP+DDFF are close to each other as expected, and the difference gets larger as the density grows for all the cases. The effects of the NCR turn out to be critical at all densities, decreasing the NMFP by $\\sim5$ km on average. Again, the NMM does not make any significant contribution to the NMFP, which is consistent with the DCS results. Taking into account that the radius of the NS whose mass is $1.4M_\\odot$ is $R_{1.4M_\\odot} \\lesssim 13$~km~\\cite{miller2021}, indicated by the horizontal shaded line, the results from the KIDS0 and KIDS-A models imply that the neutrino cannot escape freely from the NS beyond $\\rho\\approx1.5 \\rho_0$. In contrast, the neutrino seldom experiences weak and EM interactions until it escapes from the NS for the KIDS0-m*87 and SLy4 models. For instance, there is almost no delay of the neutrino emission for $R_{\\rm NS}\\approx10$ km because the NMFP is always larger than $R_{\\rm NS}$.\n\nInterestingly, the NMFP+DDFF for the SLy4 model shows an increasing curve beyond $\\rho\\approx\\rho_0$, unlike the others. This difference in the SLy4 model is caused by its neutron effective mass, the smallest among the EDF models~\\cite{Hutauruk:2022bii}, in addition to the considerable DDFF effects, which reduce the DCS. This observation shows that the effective mass is crucial in analyzing the cooling processes of the NS via neutrinos. 
Since the range of the NMFP predicted by the different EDF models is very wide, applying the present results to the calculation of the thermal evolution of the NS is a necessary follow-up.\n\n\nWe now discuss the effect of the symmetry energy. A comparison of the KIDS0 and KIDS-A models shows clearly the role of the symmetry energy because the two models differ in symmetry energy but have similar effective masses. In the results for both the DCS and the NMFP, the two models show similar behavior up to $3\\rho_0$, so the stiffness of the symmetry energy appears to be insignificant. The symmetry energy has a direct effect on the particle fraction, giving a larger proton fraction with stiffer symmetry energy. A large proton fraction can ignite the direct URCA process, which leads to very fast cooling of the NS. In addition, it has been shown in Ref.~\\cite{Hutauruk:2022bii} that the particle fraction shows a sizable dependence on the symmetry energy\nat densities higher than $3\\rho_0$. Therefore, the effect of the symmetry energy could be probed correctly when the NS cooling is considered explicitly.\n\n\n\n\\section{Summary and conclusion} \\label{sec:summary}\nIn the present work, we have investigated the neutral-current neutrino-nucleon scattering in the nuclear medium using the quark-meson coupling (QMC) model together with the four different energy-density functional (EDF) models, i.e., KIDS0, KIDS-A, KIDS0-m*87, and SLy4. The nucleon density-dependent form factor (DDFF), differential cross-section (DCS), and neutrino mean free path (NMFP) were computed numerically at various densities. In addition, we also explored the effects of the finite neutrino magnetic moment (NMM) and its EM size via the charge radius (NCR). 
Below, we list relevant observations found in the present work:\n\\begin{itemize}\n\\item Among the weak DDFFs, $G_A$ is a decreasing function of the density and $F^W_2$ an increasing one, whereas $F^W_1$ is almost insensitive to the density in the small $Q^2$ region. The opposite behaviors of $G_A$ and $F^W_2$ can be understood by the different combinations of the density-dependent lower and upper components of the quark spinor in the QMC model. We also find that the lower component is more sensitive to the density and increases with it. The density dependencies in the EM DDFFs turn out to be weak. \n\\item The DCS increases with the density in general and is larger in the KIDS0 and KIDS-A models, which have $M^*_n\\gtrsim M_n$, in comparison to the other EDF models. The dominant contribution among the weak DDFFs turns out to be $G_A$, and it reduces the DCS as the density increases, as understood from the above discussions. The effect of the NMM is almost negligible since it is highly suppressed in the small $q_0$ region. The finite NCR indicates a larger overlap with the nucleon wave function, resulting in a drastic increase of the DCS. \n\\item The NMFP, which is inversely proportional to the DCS, is scrutinized in the same manner as the DCS. The weak DDFF $G_A$ makes the NMFP increase, as understood from its density dependence. The inclusion of the NCR drastically decreases the NMFP by about $5$ km on average within the models. If we take $R_{\\rm NS}\\approx 13$ km for instance in the present theoretical framework, the neutrino escapes from the NS almost without interactions up to $\\rho\\approx3\\rho_0$ for $M^*_n\\lesssim M_n$. In contrast, for $M^*_n\\gtrsim M_n$, the DCS increases as the density increases, which leads to an NMFP shorter than the NS radius. 
A decrease in NMFP implies that the interaction of the neutrino with NS matter becomes more probable, and it can impose a non-negligible effect\nto the thermal evolution of the NS\n\\end{itemize}\n\nAs discussed previously, there are considerable theoretical uncertainties in the present theoretical framework, depending on the EDF models, density-dependent form factors, and neutrino properties. Hence, to construct a more realistic theoretical model and determine model parameters consistently, it is necessary to investigate the NS thermal evolution. Related works will appear elsewhere. \n\n\\section*{Acknowledgments}\nThis work was supported by the National Research Foundation of Korea (NRF) Grants No.~2018R1A5A1025563, No.~2020R1F1A1052495, and No.~2022R1A2C1003964.\n\n", "meta": {"timestamp": "2022-08-31T02:07:17", "yymm": "2208", "arxiv_id": "2208.13971", "language": "en", "url": "https://arxiv.org/abs/2208.13971"}} {"text": "\\section{Introduction}\n\\label{sec:intro}\nIn the local universe, the density of galaxies spans several orders of magnitude, from $\\sim0.2\\,\\rho_{0}$ (where $\\rho_{0} \\sim 10^{-29.7} g/\\mathrm{cm}^{3}$ is the mean field density) in sparse void regions all the way up to $\\sim100\\,\\rho_{0}$ in the cores of massive clusters and $\\sim1000\\,\\rho_{0}$ in most compact groups \\citep{1989Sci...246..897G}.\nA large variety of galaxy properties are observed to correlate with galaxy environments such as star formation or quenched galaxy fraction \\citep{2006MNRAS.373..469B, 2008MNRAS.385.1903L, 2010ApJ...721..193P, 2013MNRAS.430.1447K, 2013MNRAS.428.3306W, 2017ApJ...838...87C}, morphology \\citep{1978ApJ...226..559B, 1980ApJ...236..351D, 1999ApJ...518..576P, 2009MNRAS.393.1324B, 2011MNRAS.416.1680C}, kinematics \\citep{2011MNRAS.416.1680C, 2020MNRAS.495.1958W}, interstellar medium \\citep{2011MNRAS.415.1797C, 2012MNRAS.425..273P, 2015MNRAS.453.2399W, 2017MNRAS.466.1275B, 2019MNRAS.483.5409D} and nuclear activity 
\\citep{2004MNRAS.353..713K, 2011MNRAS.418.2043E, 2013MNRAS.430..638S, 2015MNRAS.448L..72S}.\nGenerally, red galaxies with early-type morphology and little cold gas content tend to populate the inner part of group \\footnote{hereafter group refers to the structure where galaxies are bound within one large dark matter halo while it does not indicate the group mass or richness.\nCluster refers to a massive group.} environment while blue, late-type and gas-rich galaxies are mainly found away from crowded regions.\n\nAll these apparent links encourage the idea that environment-related processes are an important driver of the galaxy evolution.\nIndeed there are abundant pieces of evidence from both observational and theoretical point of view showing the existence of multiple environmental effects (see the review by \\citealt{2006PASP..118..517B}).\nSources of these effects can be broadly classified into two types.\nThe first type is through gravitational interactions with both galaxies and the entire group potential well.\nGravitational tides from neighbours may supply angular momentum to galaxies \\citep{1969ApJ...155..393P,1984ApJ...286...38W} and can condition their overall shape \\citep{1979MNRAS.188..273B}.\nDepending on velocity dispersion within the group, galaxy-galaxy interactions can either have long duration in small groups, such as during preprocessing \\citep{2004ogci.conf..341D,2004PASJ...56...29F}, or have higher frequency but short duration in massive clusters, the so-called galaxy harassment \\citep{1996Natur.379..613M,1998ApJ...495..139M}.\nWhen the group mass is large, the tidal force exerted by the entire group potential well becomes effective for perturbing group galaxies \\citep{1984ApJ...276...26M,1996ApJ...459...82H}.\nThe second type is through various kinds of hydrodynamic interactions occurring between gaseous components of galaxies and the hot intergalactic medium (hereafter IGM).\nIts importance has been suggested ever since when it became 
clear that hot IGM is ubiquitous among clusters \\citep{1977ApJ...215..401M,1977egsp.conf..369O}.\nSuch type of interaction can happen in various forms, including ram-pressure stripping \\citep{1972ApJ...176....1G,2017ApJ...844...48P}, viscous stripping \\citep{1982MNRAS.198.1007N,2015ApJ...806..104R} and thermal evaporation \\citep{1977Natur.266..501C,2007MNRAS.382.1481N} all of which are able to remove cold gas of galaxies, particularly for the low-mass ones \\citep[e.g.,][]{2013AJ....146..124H, 2020MNRAS.494.2090J}.\nSeveral prototypical galaxies under gas stripping in the Virgo cluster are highlighted in a series of works based on radio interferometry \\citep{2004AJ....127.3361K, 2007ApJ...659L.115C, 2009AJ....138.1741C, 2012A&A...537A.143V}.\nThough originating from different processes, in some cases several mechanisms can have similar effects to galaxies.\nOne example is galaxy starvation \\citep{1980ApJ...237..692L}, in which the loosely bound outer gaseous halos of galaxies are removed by both tidal interactions and ram-pressure stripping preventing further gas accretion \\citep{2002ApJ...577..651B}.\n\nIt is difficult to discern the relative importance of all these mechanisms in certain environments.\nBut one consensus reached by the majority of previous studies is that they are more effective on satellite galaxies, i.e. 
the less massive galaxies that are gravitationally bound by more massive galaxies.\nThe high-speed relative motion in hot IGM and their shallow potential well both make them more vulnerable to these effects.\nEarly studies of M31/M32 system \\citep[e.g.,][]{1962AJ.....67..471K,1973ApJ...179..423F} and Milky Way/Magellanic clouds system \\citep[e.g.,][]{1976ApJ...203...72T,1982MNRAS.198..707L} have been classic paradigm showing such vulnerability of satellites.\nThe most massive galaxy in the gravitationally bounded system is often called a \"central\" galaxy.\nAnalyses of environmental effects are thus commonly undertaken with the satellite and central galaxy dichotomy \\citep[e.g.,][]{2009MNRAS.394.1213W,2012ApJ...757....4P,2013MNRAS.428.3306W}, which is also adopted in this work.\n\nDespite the fact that these environment-related mechanisms are able to partly explain the various correlations with galaxy environment, it is still under debate to what extent they have played a role.\nIs there strong causality between environment and various galaxy properties just like what is shown by those superficial correlations?\nOr is this apparent link with environment merely a by-product of other more fundamental processes?\nThis question lies at the heart of the \"nature or nurture\" problem.\nOne embodiment of this problem is the controversy over morphology-density relation \\citep{1980ApJ...236..351D, 2003MNRAS.346..601G} which was originally thought to be caused by environmental effects.\nFollowing studies argued for the existence of other more important drivers \\citep[e.g.,][]{2009MNRAS.393.1324B,2016ApJ...818..180C,2017ApJ...851L..33G,2019MNRAS.485..666B} such as stellar mass, colour and sSFR.\nWithout doubt we are still not clear how important these environmental effects are.\n\nUseful information comes from studying the environmental dependence of specific star formation rate (sSFR) radial gradient ($\\nabla\\,\\mathrm{sSFR}$), because various mechanisms at work in 
group environments can affect different parts of the galactic star-forming discs.\nFor example, ram-pressure stripping is thought to be more efficient at removing loose peripheral atomic hydrogen gas (HI) than affecting inner dense molecular gas disks \\citep{2017MNRAS.467.4282M, 2022arXiv220505698Z}, thus probably tending to suppress outer star formation.\nWhile tidal force by cluster potential well can induce gas inflows and boost star formation in galactic central regions \\citep[e.g.,][]{1990ApJ...350...89B}.\nSo, studying environmental dependence of $\\nabla\\,\\mathrm{sSFR}$ helps to figure out what processes in group environment are important in terms of affecting galactic star formation histories.\nOr if we eventually find only weak dependence on environment, the effectiveness of those proposed mechanisms should be doubted.\n\nPrevious studies along this thread have been carried out using narrow-band $\\mathrm{H}\\alpha$ imaging \\citep[e.g.,][]{2004ApJ...613..851K,2004ApJ...613..866K,2013A&A...553A..91F}, resolved photometry \\citep[e.g.,][]{2007ApJ...658.1006M,2008ApJ...677..970W} and more recently integral field spectroscopy \\citep[IFS; e.g.,][]{2013MNRAS.435.2903B,2017MNRAS.464..121S,2018MNRAS.476..580S,2019A&A...621A..98C,2019ApJ...872...50L}.\nHowever, these studies have acquired very different and sometimes discrepant knowledge about how star formation distributions of galaxies are affected in group environment.\nThe conclusions include 1) outside-in truncation of star formation \\citep[e.g.,][]{2004ApJ...613..851K,2013A&A...553A..91F,2017MNRAS.464..121S,2019A&A...621A..98C}, 2) preferential suppression of star formation in inner regions \\citep[e.g.,][]{2008ApJ...677..970W,2019A&A...621A..98C} and 3) weak or no effect \\citep[e.g.,][]{2007ApJ...658.1006M,2013MNRAS.435.2903B,2018MNRAS.476..580S}.\nEven when the general conclusions are similar, the signals they found can still be in tension.\nFor instance, both using IFS data, 
\\citealt{2017MNRAS.464..121S} found outside-in truncation for massive galaxies with stellar mass in the range $10<\\mathrm{log}\\,\\mathcal{M}_{\\star}/\\mathcal{M}_{\\odot}<11$ while the outside-in signal in \\citealt{2019A&A...621A..98C} is for less-massive galaxies only ($9<\\mathrm{log}\\,\\mathcal{M}_{\\star}/\\mathcal{M}_{\\odot}<10$), and they found preferential central suppression for massive galaxies.\n\nIn this work, we revisit the environmental dependence of the spatial distribution of star formation by combining SDSS fiber spectral indices (for galaxy central region) and global sSFR measurements to indicate the (relative) shape of sSFR\\footnote{We approach the profiles of sSFR instead of SFR because characterizing the stellar population by the fraction of newborn stars is more representative of star formation status of galaxies.} profiles.\nThis brings sufficient statistics to the investigation, which is crucial, because unambiguous environmental dependence can only be extracted when other important factors, such as stellar mass and total star formation level, are properly controlled.\nCurrent IFS samples can still lack such statistics, especially for low-mass galaxies among which the environmental effects are usually the strongest.\nEven with currently the largest IFS survey MaNGA \\citep{2015ApJ...798....7B}, the sample size is at least an order of magnitude smaller than the sample studied in this work, and would limit the parameter control when we aim to explore in more detail how the sSFR profiles correlate with galaxy environment (see section \\ref{subsec:env}).\n\nThroughout this paper we adopt cosmological parameters from WMAP-9 \\citep{2013ApJS..208...20B} in which $\\mathrm{H}_0=69.3\\,\\mathrm{km}\\,\\mathrm{s}^{-1}\\,\\mathrm{Mpc}^{-1}$, $\\Omega_\\mathrm{m}=0.286$ and $\\Omega_{\\Lambda}=0.714$ and a Chabrier IMF.\n\n\n\\section{Sample}\n\\label{sec:data}\n\n\\subsection{MPA-JHU and GSWLC catalogues}\n\\label{subsec:cat}\n\nOur galaxy 
sample is assembled from the MPA-JHU catalogue and version 2 of the GALEX-SDSS-WISE Legacy Catalogue \\citep[GSWLC-2,][]{2016ApJS..227....2S,2018ApJ...859...11S}.\n\nThe MPA-JHU catalogue is based on the Sloan Digital Sky Survey Data Release 7 \\citep[SDSS DR7,][]{2000AJ....120.1579Y,2009ApJS..182..543A}, providing both spectral and photometric measurements from SDSS as well as value-added derived quantities for more than 800,000 unique galaxies. We make heavy use of the spectral indices (more details in section \\ref{subsec:less}) measured from SDSS spectra, which were extracted from fibers of 3 arcsec diameter centered on galaxies. We also take the radius enclosing 50\\% of the total r-band Petrosian flux $\\mathrm{R_{50}}$ as the apparent angular size of galaxies.\n\nAlthough the MPA-JHU catalogue does provide SFRs, we use the values from GSWLC-2 instead. GSWLC-2 is a value-added catalogue for SDSS galaxies within the GALEX \\citep[Galaxy Evolution Explorer,][]{2005ApJ...619L...1M} footprint.\nIt provides better SFR measurements overall by including ultra-violet (UV) data in the multi-band spectral energy distribution (SED) fitting. The UV data are from GALEX, a space telescope mapping the sky in two UV bands, FUV (1350-1750 {\\rm \\AA}) and NUV (1750-2800 {\\rm \\AA}). Compared with the optical SDSS bands, these UV bands are more sensitive to short-lived massive stars, and thus to recent star formation.\nGSWLC-2 also uses the 22 \\ensuremath{\\mathrm{\\mu m}}\\ mid-infrared (MIR) band taken by WISE \\citep[Wide-field Infrared Survey Explorer,][]{2010AJ....140.1868W}, another space telescope, which provides all-sky images in MIR bands. 
The 22 \\ensuremath{\\mathrm{\\mu m}}\\ band can trace the absorbed UV light re-emitted by the dust, improving the estimation of recent SFR.\nFor consistency, we also use the stellar mass from GSWLC-2 which is derived by the same SED fitting procedure.\n\nWe use the medium UV depth version of the GSWLC catalogue, taking a balance between the depth of GALEX images and the sky coverage. Our sample thus have a sSFR detection limit of $\\mathrm{sSFR} > 10^{-11.7}\\,\\mathrm{yr^{-1}}$, satisfying the main goal of studying galaxies at low star formation level. The matching between MPA-JHU and GSWLC-M2 is done with a 3 arcsec searching radius, giving a sample of 343,791 galaxies. Changing the matching radius has negligible effect to our sample (differing by less than 0.03\\% when matching radius ranges from 1 arcsec to 5 arcsec).\n\nWe further constrain our sample with the following criteria:\n\\begin{equation}\n\\label{equ:cut}\n \\begin{aligned}\n \\qquad \\qquad \\qquad \\qquad 0.01&10^{10}\\,\\mathcal{M}_{\\odot}$ \\citep{2006ApJS..167....1B}.\nThis magnitude limit is the same as the one adopted for group galaxies defined as halo proxy in the group catalogue used in this work (see section \\ref{subsec:yang}), making the halo mass more reliable below $z=0.085$.\nEven though the sample is not complete for galaxies with $\\mathcal{M}_{\\star}<10^{10}\\,\\mathcal{M}_{\\odot}$ out to $z=0.085$, the analyses throughout this work make proper control of stellar mass and sSFR so that the low-mass galaxies in different environment are compared in the same subvolume where they are complete.\nThe lower redshift limit and the brighter apparent r-band Petrosian magnitude limit are applied to exclude nearby galaxies with too large angular size as their photometry are not properly handled by the SDSS pipeline \\citep{2011AJ....142...31B}.\nAfter this cut, our sample size reduces to 119,820.\n\nOur analysis is applied only to galaxies with $\\ensuremath{\\mathrm{sSFR}} > 
10^{-11.7}\\,\\mathrm{yr^{-1}}$, the nominal detection limit of the GSWLC-M2 catalogue.\nBelow this limit, the error in the total SFR surges to 0.7 dex and probing the sSFR radial profile by central spectral indices and total sSFR thus becomes highly uncertain.\n\n\n\\subsection{Galaxy environment}\n\\label{subsec:yang}\n\nWe use the group catalogue constructed by \\citet{2012ApJ...752...41Y} to classify the environment of each galaxy. It was built by applying an iterative group finder algorithm to SDSS galaxies. In each iteration the halo properties of the tentative galaxy groups (identified via friends-of-friends algorithm) are computed and then used to update the group membership for next iteration \\citep{2007ApJ...671..153Y}. The catalogue associates each galaxy to one galaxy group, hence one dark matter halo as well. Based on this, we classify the galaxies into three categories: central, satellite and isolated galaxies. Centrals and satellites are the members of multi-member groups, with the former to be the most massive one. The isolated galaxies belong to the groups with only one member.\n\nThe catalogue also provides dark matter halo mass estimation, based on the total stellar mass or luminosity of bright group members (absolute r-band magnitude $\\mathrm{M}_\\mathrm{r}<-19.5$) via abundance matching. A mock test suggests its typical uncertainty is about 0.3 dex \\citep{2012ApJ...752...41Y}. 
The halo mass links with a certain virial radius of the halo $R_{200}$:\n\n\\begin{equation}\\label{r200}\n\\qquad \\qquad \\qquad \\mathrm{R}_{200}=\\Bigg[\\frac{\\mathcal{M}_{200}}{\\frac{4\\pi}{3}200\\Omega _\\mathrm{m} \\frac{3\\mathrm{H}_0^2}{8\\pi \\mathrm{G}}}\\Bigg]^{\\frac{1}{3}}\\,\\,(1+z)^{-1}.\n\\end{equation}\n\nAmong the several catalogues with slightly different redshift completeness, we take the group catalogue constructed with SDSS redshifts only, which contains 599,301 galaxies.\nUsing the other versions makes negligible difference.\nAfter matching with the group catalogue, we get a sample of 112,028 galaxies.\n\n\n\n\n\\begin{figure*}\n\t\\begin{center}\n \\includegraphics[width=0.48\\textwidth]{D4000_100BSmedian}\n \\includegraphics[width=0.48\\textwidth]{HdA_100BSmedian}\n \\caption{Top panel: \\ensuremath{\\mathrm{D4000_n}}\\ (left) and \\ensuremath{\\mathrm{H\\delta_A}}\\ (right) as a function of sSFR for isolated (dashed line) and satellite galaxies (solid line), divided into three stellar mass bins (red, yellow and blue colours). The isolated and satellite galaxies are matched with $\\mathrm{R_{50}}$. The background grey contour is derived from kernel density estimation with $V_\\mathrm{max}$ correction, enclosing 68\\% and 95\\% of probability for galaxies in stellar mass range $10^{9}-10^{11.5}\\,\\ensuremath{\\mathcal{M}_\\mathrm{\\odot}}$ and sSFR range $10^{-11.7}-10^{-9}\\,\\mathrm{yr}^{-1}$. Blue shaded region marks the span of the SFMS for galaxies in the lowest mass bin. 
Bottom panel: the difference between satellite and isolated galaxies.\n The bins with fewer than 20 galaxies are discarded.}\n \\label{fig:100bs1}\n\t\\end{center}\n\\end{figure*}\n\n\n\\section{Results}\n\\label{sec:result}\n\n\\subsection{Suppressed star formation in the center of satellite galaxies}\n\\label{subsec:less}\n\nThe SDSS single-fiber spectra are extracted from the central part of the galaxies, within a physical radius of 0.3, 1.5 and 2.4 kpc at $z=0.01$, $0.05$ and $0.085$ respectively, where 0.05 is about the mean redshift of our sample.\nWe use the \\ensuremath{\\mathrm{D4000_n}}\\ and the Balmer absorption feature \\ensuremath{\\mathrm{H\\delta_A}}\\ to indicate the central sSFR (see also \\citealt{2004MNRAS.353..713K}).\n\\ensuremath{\\mathrm{D4000_n}}\\ is a break feature at around $4000$ {\\rm \\AA} mainly due to a series of metal absorption lines on the blueward side of $4000$ {\\rm \\AA}.\nThese lines are most prominent for stars with spectral types later than K \\citep{1985ApJ...297..371H}, i.e. old stellar populations.\nBy contrast, the opacity at the Balmer line \\ensuremath{\\mathrm{H\\delta_A}}\\ peaks among young massive stars with spectral types around A.\nTherefore if galaxies are more dominated by young stars (i.e. 
high sSFR), \\ensuremath{\\mathrm{H\\delta_A}}\\ and \\ensuremath{\\mathrm{D4000_n}}\\ are respectively higher and lower.\nThese two indices are insensitive to dust extinction as they are flux ratios in adjacent and narrow spectral windows.\nThis is particularly important because the central regions of galaxies are usually highly dust obscured, which may introduce large uncertainty in the measured sSFR \\citep{2017MNRAS.469.4063W}.\nWith the central sSFR indicated by SDSS spectral indices and total sSFR from SED fitting, it becomes possible to roughly probe the gradient of the sSFR radial profiles.\nThough the central and total sSFR are not measured in a consistent way, we demonstrate in Appendix \\ref{app:fea} the feasibility of this approach in a statistical sense with a smaller sample of galaxies with IFS data.\n\nWe investigate the environmental dependence of the relative difference in sSFR radial gradient by comparing the central sSFR of satellite and isolated galaxies at fixed total sSFR and stellar mass.\nTo ensure that the fiber measurements are on similar scales, we match the apparent angular size of galaxies, $\\mathrm{R_{50}}$, so that fibers cover similar fractions of the total light of each galaxy.\nAn alternative way to control the aperture is to match redshift, so that fibers cover the same physical scales.\nWe have tested both and found that they lead to the same conclusion.\n\nSpecifically, in a given bin of stellar mass and total sSFR, we minimally trim the satellite and isolated galaxy samples to reach the same $\\mathrm{R_{50}}$ distribution at 0.2 arcsec resolution (i.e. getting the maximally overlapping distribution).\nThe trimming is done in every $\\mathrm{R_{50}}$ bin by sampling with replacement the same number (i.e. 
minimum of $[\\mathrm{N_{sat},N_{iso}}]$) of the isolated and satellite galaxies.\nWe repeat this matching process 1000 times to estimate the statistical uncertainty in distribution moments (see also \\citealt{2008MNRAS.385.1903L} and \\citealt{2015MNRAS.448L..72S}).\nWe compute the median \\ensuremath{\\mathrm{D4000_n}}\\ and \\ensuremath{\\mathrm{H\\delta_A}}\\ for each matched isolated and satellite sample respectively, and the mean and the standard deviation of the 1000 values are taken as the final measurement and its uncertainty.\n\nIn Fig. \\ref{fig:100bs1}, we show the relation between the central sSFR, indicated by \\ensuremath{\\mathrm{D4000_n}}\\ (left panel) and \\ensuremath{\\mathrm{H\\delta_A}}\\ (right panel), and the total sSFR for satellite and isolated galaxies matched in $\\mathrm{R_{50}}$.\nAt given total sSFR, more massive galaxies have lower central sSFR (i.e. higher \\ensuremath{\\mathrm{D4000_n}}\\ and lower \\ensuremath{\\mathrm{H\\delta_A}}).\nThis is consistent with the well established observation that massive galaxies generally show more positive sSFR profiles \\citep[e.g.,][]{2016ApJ...819...91P,2018MNRAS.474.2039E,2018ApJ...856..137W}.\n\nNotably, in the lowest mass bin and at given total sSFR, satellite galaxies show markedly higher \\ensuremath{\\mathrm{D4000_n}}\\ (hereafter we term this ``the central \\ensuremath{\\mathrm{D4000_n}}\\ excess'') when compared to their isolated counterparts (left panel).\nThis signal of environmental dependence of sSFR radial gradients is strongest when the total sSFR of galaxies is well below the star formation main sequence (SFMS; shown by the blue shaded region), whose location is defined by the peak sSFR at given mass of the volume-corrected number density distribution of our sample galaxies (see also Appendix A of \\citet{2020MNRAS.495.1958W}).\nA similar trend is also seen in the \\ensuremath{\\mathrm{H\\delta_A}}\\ versus sSFR diagram (right panel), where the low-mass satellite 
galaxies have systematically lower \\ensuremath{\\mathrm{H\\delta_A}}\\ values.\nThis suggests that environmental effects preferentially suppress the central star formation of galaxies, making the sSFR profile gradient more positive in a relative sense.\nThe conclusion remains the same if we use total SFR derived from a different recipe, for example measured directly from UV and MIR luminosity (see Appendix \\ref{app:nuvmir}).\n\n\n\\subsection{The dependence on galaxy environment}\n\\label{subsec:env}\n\nIn this section we further explore what suppresses the central star formation in low-mass satellite galaxies by studying how the \\ensuremath{\\mathrm{D4000_n}}\\ excess correlates with galaxy environment.\nWe investigate this environmental dependence in two sSFR windows: $10^{-10.4}-10^{-9.4}\\,\\mathrm{yr}^{-1}$ and $10^{-11.4}-10^{-10.4}\\,\\mathrm{yr}^{-1}$.\nThese two windows respectively cover normal star-forming galaxies around the SFMS, and galaxies below the SFMS but still with detectable star formation activity.\nOur galaxies in the low sSFR bin have a median NUV-r colour index of $\\sim4$, which falls onto the conventional green valley on the colour-magnitude diagram (e.g., as in \\citealt{2007ApJS..173..267S}).\nWe use three parameters to quantify the environment of satellites: the halo mass of the group \\ensuremath{\\mathcal{M}_\\mathrm{h}}, the normalized projected distance to the central galaxy \\ensuremath{R_\\mathrm{bcg}/R_{200}}\\ (which is effectively the distance to the halo center\\footnote{In groups with few members the weighted-geometric center can be a better tracer of the bottom of the group potential well, as there may not be a dominating central galaxy. We have tested for small groups using this alternative definition of group center and found consistent results that leave our conclusion unchanged.}) and the group richness \\ensuremath{N_\\mathrm{member}}\\ (i.e. 
number of galaxies within the group).\n\n\\begin{figure*}\n\t\\begin{center}\n\t\t\\includegraphics[width=0.95\\textwidth]{DD4000_3Envs3}\n \\caption{\n Central \\ensuremath{\\mathrm{D4000_n}}\\ excess as a function of the halo mass (\\ensuremath{\\mathcal{M}_\\mathrm{h}}, left panel), the normalized projected distance to the central galaxy (\\ensuremath{R_\\mathrm{bcg}/R_{200}}, middle panel) and the group richness (\\ensuremath{N_\\mathrm{member}}, right panel). Galaxies are split into three stellar mass bins (blue, yellow and red lines), and two sSFR ranges (top and bottom panels), respectively. The horizontal error bar shows the width of the parameter bin, and the vertical error bar shows the statistical uncertainty estimated with the bootstrap technique. The bins with fewer than 20 galaxies are discarded.\n\t\t}\n\t\t\\label{fig:Edep}\n\t\\end{center}\n\\end{figure*}\n\nFig. \\ref{fig:Edep} presents the \\ensuremath{\\mathrm{D4000_n}}\\ excess as a function of these group properties in low and high sSFR bins. The \\ensuremath{\\mathrm{D4000_n}}\\ excess is again calculated by comparing satellite galaxies and their matched isolated counterparts with $\\Delta\\log(\\ensuremath{\\mathcal{M}_\\star}) < 0.1$, $\\Delta\\log(\\mathrm{sSFR}) < 0.1$ and $\\Delta \\mathrm{R_{50}} < 0.2\\,\\mathrm{arcsec}$.\nFor galaxies with low sSFR, the \\ensuremath{\\mathrm{D4000_n}}\\ excess evidently correlates with all three environmental properties.\nThe satellite galaxies have redder cores (i.e. 
more suppressed central star formation) when they are: 1) in more massive halos; 2) closer to the center of galaxy groups; 3) in groups with more members.\nThe correlation steepens toward lower stellar mass.\n\nFor galaxies in the high sSFR bin, the environmental dependence is much weaker.\nA clear \\ensuremath{\\mathrm{D4000_n}}\\ excess exists only in the largest \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ and \\ensuremath{N_\\mathrm{member}}\\ bins, and only for low-mass galaxies.\nWe note that for massive galaxies with high sSFR shown by the red lines in the bottom panels, in the most massive groups, the \\ensuremath{\\mathrm{D4000_n}}\\ signal is not an excess but a deficiency, indicating enhanced central star formation compared with galaxies in the field environment.\n\nTo further break down the environmental dependences of the \\ensuremath{\\mathrm{D4000_n}}\\ excess of low-mass satellites of low sSFR, in Fig. \\ref{fig:Edep1} we apply additional environmental control to the correlation between the \\ensuremath{\\mathrm{D4000_n}}\\ excess and each environmental property.\nIn the first panel, we show the central \\ensuremath{\\mathrm{D4000_n}}\\ excess as a function of \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ in bins of high/low \\ensuremath{R_\\mathrm{bcg}/R_{200}}\\ and \\ensuremath{N_\\mathrm{member}}\\ respectively (split by the median values, i.e. 0.44 and 30, of the low-mass and low-sSFR satellite sample).\nThe second and third panels show the other two dependences on \\ensuremath{R_\\mathrm{bcg}/R_{200}}\\ and \\ensuremath{N_\\mathrm{member}}\\ with further environmental control in a similar manner.\nWe note that there are 88 individual massive groups included in the $\\mathcal{M}_h>10^{13.7}\\,\\mathcal{M}_{\\odot}$ bin, making the result in this bin statistically representative of large groups.\nThe relations for the low-mass and low-sSFR satellites without further environmental control in Fig. 
\\ref{fig:Edep} are shown for reference by black symbols.\n\nWe find that \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ and \\ensuremath{N_\\mathrm{member}}\\ are almost interchangeable.\nIn the first panel, the relations between \\ensuremath{\\mathrm{D4000_n}}\\ excess and \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ in bins of high/low \\ensuremath{N_\\mathrm{member}}\\ (light red and light blue) are just the general relation (black symbols) at the higher and lower \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ ends.\nThe same is seen in the third panel, and in the second panel the binning by \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ or \\ensuremath{N_\\mathrm{member}}\\ gives the same relations.\nThis results from the tight correlation between \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ and \\ensuremath{N_\\mathrm{member}}: among satellite galaxies with non-zero host halo mass catalogued in \\citealt{2012ApJ...752...41Y}, the Spearman rank correlation coefficient between \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ and \\ensuremath{N_\\mathrm{member}}\\ is as high as 0.92.\n\nComparing the left two panels, the \\ensuremath{\\mathrm{D4000_n}}\\ excess generally depends more on \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ than on \\ensuremath{R_\\mathrm{bcg}/R_{200}}.\nThe \\ensuremath{\\mathrm{D4000_n}}\\ excess is small in less massive halos, nearly irrespective of groupcentric radius.\nThis is shown by the overlapping dark red and dark blue bands at the low \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ end in the first panel and also the relatively flat relation in the second panel (dark blue band).\nThe \\ensuremath{\\mathrm{D4000_n}}\\ excess is present in massive halos even at very large \\ensuremath{R_\\mathrm{bcg}/R_{200}}.\nThe dependence on \\ensuremath{R_\\mathrm{bcg}/R_{200}}\\ starts to become significant in massive halos, especially at the center where we observe a \\ensuremath{\\mathrm{D4000_n}}\\ excess as high as 0.2.\nThese results together seem to suggest the first-order 
importance of halo mass and also that the physical mechanism is strongly enhanced in the cluster center.\n\n\\begin{figure*}\n\t\\begin{center}\n\t\t\\includegraphics[width=0.99\\textwidth]{DD4000_Mhrc1_scatter3}\n\t\t\\caption{\n\t\tThe same as in Fig. \\ref{fig:Edep} but only for low-mass low-sSFR galaxies (black squares). The sample is further split into sub-samples with different environmental parameters (red and blue stripes, see legend in each panel).\n\t\t}\n\t\t\\label{fig:Edep1}\n\t\\end{center}\n\\end{figure*}\n\n\nTaking a step further, we introduce the relative velocity of satellites into the analysis to try to link the central \\ensuremath{\\mathrm{D4000_n}}\\ excess to the dynamical state of satellites in their host halos.\nFig. \\ref{fig:psd} shows low-mass satellites ($10^9-10^{9.8}\\,\\mathcal{M}_{\\odot}$) in massive halos ($\\mathcal{M}_h>10^{13.7}\\,\\mathcal{M}_{\\odot}$) on the phase-space diagram \\citep[i.e. normalized relative velocity versus normalized projected distance; see also][]{2015MNRAS.448.1715J}.\nWe calculate the absolute difference of line-of-sight velocities between the satellite and cluster as $|\\Delta v| = c|z-z_c|/(1+z_c)$, where $z_c$ is the luminosity-weighted redshift of cluster member galaxies.\nThe velocity difference is then normalized by the cluster velocity dispersion $\\sigma_{200}$ (equation 6 of \\citealt{2007ApJ...671..153Y}).\n\n\\begin{figure*}\n\t\\begin{center}\n\t\t\\includegraphics[width=0.45\\textwidth]{PSD_o_final1.pdf}\n\t\t\\includegraphics[width=0.45\\textwidth]{PSD_s_final1.pdf} \\\\\n\t\t\\includegraphics[width=0.45\\textwidth]{PSD_o_SF_final1.pdf}\n\t\t\\includegraphics[width=0.45\\textwidth]{PSD_s_SF_final1.pdf}\n\t\t\\caption{\n\t\t The central \\ensuremath{\\mathrm{D4000_n}}\\ excess on the phase-space diagram for the low-mass galaxies ($10^9 - 10^{9.8}\\,\\ensuremath{\\mathcal{M}_\\mathrm{\\odot}}$) in massive halos ($\\ensuremath{\\mathcal{M}_\\mathrm{h}} > 
10^{13.7}\\,\\ensuremath{\\mathcal{M}_\\mathrm{\\odot}}$). The y-axis is the line-of-sight velocity of satellites relative to their host clusters, normalized by cluster velocity dispersion.\n The upper and lower rows are for galaxies in the low and high sSFR ranges as in Fig. \\ref{fig:Edep}.\n The right panels are locally averaged versions of the left panels, showing the underlying trend.\n Satellites in the lower triangle region are considered a virialized part of the clusters.\n The black dashed curve is the projected escape velocity normalized by cluster velocity dispersion and galaxies far beyond the curve are not gravitationally bound by the clusters.\n\t\t}\n\t\t\\label{fig:psd}\n\t\\end{center}\n\\end{figure*}\n\nWe mark the boundary of the virialized area by a black straight line, below which galaxies are approximately within the part of the cluster in dynamical equilibrium.\nThe black dashed curve represents the normalized projected escape velocity $v_\\mathrm{esc}/\\sigma_{200}$ based on a Navarro-Frenk-White halo \\citep{1996ApJ...462..563N} of concentration $c_\\mathrm{NFW}=6$.\nStarting from the mass profile of a halo one can calculate the potential and thus the escape velocity:\n\\begin{equation}\\label{equa:vesc}\n\\qquad\nv_\\mathrm{esc,3D}=\\sqrt{\\frac{2GM_{200}}{R_{200}}\\times g(c_\\mathrm{NFW}) \\times \\frac{\\ln(1+c_\\mathrm{NFW}x)}{x}},\n\\end{equation}\nwhere\n\\begin{equation}\n\\qquad\ng(c_\\mathrm{NFW})=\\Big [ \\ln(1+c_\\mathrm{NFW})-\\frac{c_\\mathrm{NFW}}{1+c_\\mathrm{NFW}} \\Big ] ^{-1}\n\\end{equation}\nand\n\\begin{equation}\n\\qquad\nx=r_\\mathrm{3D}/R_{200}.\n\\end{equation}\nWe project the velocity along the line of sight and the distance onto the sky plane using the average relations $v_\\mathrm{esc} = \\frac{1}{\\sqrt{3}}v_\\mathrm{esc,3D}$ and $r = \\frac{\\pi}{4}r_\\mathrm{3D}$.\n\nAs in the previous analyses, we match each satellite with isolated galaxies whose stellar mass, sSFR and $R_{50}$ differ by less than 0.1 dex, 0.1 dex and 0.2 
arcsec respectively.\nThe central \\ensuremath{\\mathrm{D4000_n}}\\ excess averaged over 100 matching realizations is recorded for every satellite and we do this analysis separately for satellites in low ($10^{-11.4}-10^{-10.4}\\,\\mathrm{yr}^{-1}$; the upper row of Fig. \\ref{fig:psd}) and high ($10^{-10.4}-10^{-9.4}\\,\\mathrm{yr}^{-1}$; the bottom row of Fig. \\ref{fig:psd}) sSFR ranges.\nThe right column shows the locally averaged results using the locally weighted regression method LOESS by \\citet{Cleveland1988} as implemented\\footnote{We use the Python package \\textsc{loess} v2.0.11 available from https://pypi.org/project/loess/} by \\citet{2013MNRAS.432.1862C}, to reveal the underlying trend.\nWe adopt a smoothing factor \\texttt{frac} = 0.3, and a linear local approximation, but the conclusion does not depend on these particular parameter choices.\n\nIn the upper right panel of Fig. \\ref{fig:psd} for satellites of low sSFR, LOESS reveals a distinct structure of \\ensuremath{\\mathrm{D4000_n}}\\ excess at low groupcentric radii.\nThe largest \\ensuremath{\\mathrm{D4000_n}}\\ excess is not seen evenly for all the galaxy populations near the cluster center, but is particularly linked with the satellites of either small or large relative velocities.\nSatellites of intermediate velocities of about $|\\Delta v|/\\sigma_{200} = 0.7$ only show a moderate \\ensuremath{\\mathrm{D4000_n}}\\ excess comparable to those at much larger groupcentric radii.\nThis result indicates an apparent connection between the \\ensuremath{\\mathrm{D4000_n}}\\ excess and the orbital configuration of satellites.\nIn the lower right panel for satellites of high sSFR, which are probably in the early stages of environmental processing, the \\ensuremath{\\mathrm{D4000_n}}\\ excess is low but notably shows the same pattern as the low-sSFR satellites.\nThe consistency suggests that the observed pattern of locally averaged \\ensuremath{\\mathrm{D4000_n}}\\ excess reflects the true trend underlying the 
noisy data in the left column.\n\n\n\\section{Summary and discussion}\n\\label{sec:discuss}\n\nIn this paper, we have investigated the environmental dependence of the relative difference in sSFR radial gradient for about 0.1 million SDSS galaxies at $z \\sim 0$.\nWe compare the central sSFR, indicated by the indices \\ensuremath{\\mathrm{D4000_n}}\\ and \\ensuremath{\\mathrm{H\\delta_A}}\\ measured from SDSS fiber spectra, between satellite and isolated galaxies at the same total sSFR, so that we extract how galaxy environment affects the sSFR radial gradient in a relative sense.\nWith fiber coverage properly matched for the comparison, the large sample size facilitates the study of detailed correlations with a variety of environmental properties when the mass and star formation level of galaxies are controlled.\nOur findings are summarized as follows:\n\\begin{enumerate}[(i)]\n \\item Low-mass satellite galaxies ($\\mathcal{M}_{\\star}=10^9-10^{9.8}\\,\\mathcal{M}_{\\odot}$) below the SFMS have lower central sSFR compared to isolated counterpart galaxies at given total sSFR (Fig. \\ref{fig:100bs1}).\n \\item The phenomenon of more suppressed central star formation (i.e. the central \\ensuremath{\\mathrm{D4000_n}}\\ excess at given total sSFR) among low-mass satellites becomes more noticeable in host halos of higher mass (equivalently with more member galaxies), and closer to the group center, while more massive galaxies below the SFMS show a consistent trend but with smaller amplitude (Fig. \\ref{fig:Edep}).\n The dependence on halo mass is of first-order importance and the dependence on groupcentric radius is secondary (Fig. \\ref{fig:Edep1}).\n \\item In the center of massive halos, the phase-space diagram reveals that the phenomenon is strongest among satellites of either lowest or highest relative velocities to the halo (Fig. 
\\ref{fig:psd}), indicating the connection between the suppressed central star formation and orbital configuration of satellite galaxies.\n\\end{enumerate}\n\n\n\\begin{figure}\n \\centering\n \\includegraphics[width=\\linewidth]{Sigma_fiber_lm.pdf}\n \\caption{The stellar mass surface density within the fiber aperture as a function of total sSFR for low-mass satellite and isolated galaxies.}\n \\label{fig:fiber_mu}\n\\end{figure}\n\n\n\\begin{figure*}\n\t\\begin{center}\n\t\t\\includegraphics[width=0.9\\textwidth]{Environmentx2.pdf}\n\t\t\\caption{\n A schematic illustration of the proposed scenario explaining how gas stripping in massive halos can render the quenching of satellites more inside-out.\n An actively star-forming galaxy in isolation (panel $\\spadesuit$) has an extended cold gas disk and a more extended hot gas halo.\n Cold gas disk is replenished by infalling gas cooled out of hot gas which also drives significant gas radial flow (denoted as white arrows) on the disk due to mismatch of angular momentum.\n If such a galaxy falls into the hot gas halo of another much more massive galaxy and becomes a satellite ($\\spadesuit \\Rightarrow \\clubsuit$) both its hot gas halo and the outskirt of its cold gas disk can be largely stripped during orbiting motion after which gas infalling and gas radial inflowing stop.\n In such starvation the central part of this galaxy quenches first due to the high star formation efficiency there (panel $\\diamondsuit$).\n By contrast if this galaxy keeps evolving in isolation ($\\spadesuit \\Rightarrow \\heartsuit$), the diminishing gas cooling and infalling (expected for quenching) do not terminate gas radial inflow immediately.\n Thus the central part can still be replenished during the overall quenching and so the quenching can be radially synchronized.\n\t\t}\n\t\t\\label{fig:illus}\n\t\\end{center}\n\\end{figure*}\n\n\\subsection{The physical mechanisms}\\label{subsec:phy}\n\nThe more suppressed central star formation 
of satellites compared to field galaxies of the same total sSFR suggests that additional physical processes in galaxy groups make the quenching of star formation happen more inside-out.\nThe environmentally promoted inside-out quenching is especially shown by the sharp increase of central \\ensuremath{\\mathrm{D4000_n}}\\ with decreasing total sSFR among the low-mass satellites (Fig. \\ref{fig:100bs1}).\nThe SFR profiles of low-mass satellites can deviate even more from the profiles of their field counterparts because we find, as shown in Fig. \\ref{fig:fiber_mu}, that the central stellar mass density within the fiber area of low-mass satellites of low sSFR is lower than that of field galaxies, which is consistent with \\citealt{2017MNRAS.464.1077W}.\nThe stellar mass measurements inside the fiber area are taken from the MPA-JHU catalogue, with a small mean difference of $\\sim0.1$ dex compared to GSWLC stellar mass \\citep{2016ApJS..227....2S}.\nThe lower central stellar mass density of satellites seems to result from the integrated effect of their suppressed central star formation.\n\nSo far it is unclear which of the various physical processes occurring in the group environment is mainly responsible for the central \\ensuremath{\\mathrm{D4000_n}}\\ excess of low-mass satellite galaxies.\nIn Fig. \\ref{fig:Edep1}, we see that the high \\ensuremath{\\mathrm{D4000_n}}\\ excess is preferentially found in massive clusters, especially in the cluster center.\nThe strongest effect in the cluster center is seen among satellites with either lowest or highest velocities on the phase-space diagram.\nThe former satellite population with lowest velocity generally has low orbital energy as a result of its low potential energy (i.e. 
at the bottom of potential well) and low kinetic energy.\nAs suggested by simulations \\citep[e.g.,][]{2013MNRAS.431.2307O}, these satellites joined the cluster during ancient infalls and have thus been trapped in the center for a long time.\nThe latter satellite population, with high velocity in the vicinity of the cluster center, are suggested to be recent infallers that are experiencing their first or second pericenter.\nProjection of the velocity and position of satellites can smear such a connection between orbital properties and the position on the phase-space diagram.\nHowever, the clear consistency across satellite populations of high and low sSFR living in a large number of different groups argues against the possibility that the result is due to random projection.\nFrom the perspective of environmental effects, the former satellite population experiences over the long term the enormous tidal force from the massive cluster, which scales inversely with the cube of the groupcentric distance and can play an important role in shaping the star formation and morphology of galaxies \\citep{1984ApJ...276...26M,1990ApJ...350...89B}.\nThe latter satellite population, when passing the orbital pericenter, feel on short timescales not only the strong cluster tidal field but also large ram pressure due to both the high density of the intracluster medium and their high velocities.\nThe middle panel of Fig. \\ref{fig:Edep1} shows that there is a non-negligible \\ensuremath{\\mathrm{D4000_n}}\\ excess even at the outskirts of massive halos, where the cluster tidal field weakens dramatically.\nHydrodynamic gas stripping, however, can still be effective in the outskirts of halos for satellites with high velocities, and some cases have indeed been caught in action \\citep[e.g.,][]{2018MNRAS.476.4753J}.\nThis is also consistent with the upper right panel of Fig. 
\\ref{fig:psd} where we see that at large groupcentric radii satellites of higher velocities manifest larger central \\ensuremath{\\mathrm{D4000_n}}\\ excess.\nThese together seem to suggest that both tidal and hydrodynamic interactions are responsible for the phenomenon of suppressed central star formation of satellite galaxies.\n\n\\begin{figure*}\n\t\\begin{center}\n\t\t\\includegraphics[width=0.96\\textwidth]{xxx561.pdf}\n\t\t\\caption{\n Comparison between the probability density functions (filled contours) of low-mass satellite (right panel) and isolated (left panel) galaxies in sSFR range $10^{-11.4}-10^{-10.4}\\,\\mathrm{yr}^{-1}$ on the \\ensuremath{\\mathrm{H\\delta_A}}-\\ensuremath{\\mathrm{D4000_n}}\\ plane with evolutionary tracks (black lines) generated from BC03 models.\n The probability density functions are derived via kernel density estimation with $V_{\\mathrm{max}}$ corrections.\n The contour enclosing 68\\% of total probability is shown as a dotted line.\n The ridges (shaded regions) of density distributions are identified, following \\citealt{2016ApJ...823...18C}, as the most prominent track for each galaxy population.\n The uncertainty of ridges is assessed by 1,000 bootstrap samples and one sigma confidence intervals are shown.\n Model tracks are generated by convolving model spectra with exponentially declining SFHs ($\\mathrm{SFR}\\,\\propto\\,\\mathrm{exp}(-t/\\tau)$) of short (dashed lines; from top to bottom, $\\tau=0.2,0.4,0.6\\,\\mathrm{Gyr}$) and long (solid lines; from top to bottom, $\\tau=2,4,6\\,\\mathrm{Gyr}$) characteristic timescales.\n See more details of model tracks in the text.\n\t\t}\n\t\t\\label{fig:sfhs1}\n\t\\end{center}\n\\end{figure*}\n\nIt is known that tidal interactions can strip the loosely bound peripheral gas of galaxies in synergy with the hydrodynamic gas stripping, which together result in galaxy starvation and prevent further gas accretion \\citep{2002ApJ...577..651B}.\nIn starvation, galaxies tend to quench 
inside-out due to the one order of magnitude faster gas depletion in the center than in the outer part \\citep{2008AJ....136.2782L}.\nStarvation also promotes inside-out quenching because the radial gas inflows on galactic disks may be largely reduced.\nAccretion of gas from gaseous halos can drive radial gas inflows owing to even a small mismatch of angular momentum between the accreted gas and the disks \\citep{2016MNRAS.455.2308P}.\n\\citealt{2012MNRAS.426.2266B} reports that this process is one of the most dominant processes inducing radial inflows, making it an important channel for fuelling central star formation.\nThe central star formation is therefore less supported in a satellite with a largely stripped gaseous halo (i.e. in starvation).\nBy contrast, during the quenching of isolated galaxies, as long as the hot gaseous halos still exist, their central parts are more likely to be fed by cold gas than those of highly stripped satellites.\nWe illustrate this scenario in Fig. 
\\ref{fig:illus} ($\\spadesuit - \\clubsuit - \\diamondsuit$ for satellites and $\\spadesuit - \\heartsuit$ for isolated galaxies).\n\nStarvation as an explanation for the phenomenon shown in this work seems to be in line with \\citealt{2015Natur.521..192P, 2019MNRAS.tmp.2878T} which point out the major role of starvation in quenching the low-mass galaxy populations and the growing significance of starvation in denser environments.\nThough not reporting on the spatial distribution of star formation, \\citealt{2017MNRAS.464..508B} found that the same mechanism drives the enhancement of gas metallicity of satellite galaxies in the EAGLE simulations \\citep{2015MNRAS.446..521S}.\nThey found that the central gas metallicity is enhanced effectively when starvation suppresses the radial inflow of gas, which is predominantly metal-poor.\n\n\\subsubsection{Evidence in recent star formation history}\\label{subsubsec:sfhs}\n\nThe scenario above can have detectable consequences for the recent star formation history (SFH) in the central part of satellite and isolated galaxies.\nWe probe the recent SFH by the combination of \\ensuremath{\\mathrm{D4000_n}}\\ and \\ensuremath{\\mathrm{H\\delta_A}}\\ which trace stellar populations of different ages (see also \\citealt{2003MNRAS.341...33K}).\nFig. 
\\ref{fig:sfhs1} shows satellite and isolated galaxies of low mass and low sSFR on the \\ensuremath{\\mathrm{H\\delta_A}}-\\ensuremath{\\mathrm{D4000_n}}\\ plane, overlaid with evolutionary tracks of \\citealt{2003MNRAS.344.1000B} models (BC03).\n\nThe probability density function of galaxies (filled contours) is derived via kernel density estimation with $V_{\\mathrm{max}}$ corrections.\nWe use a Gaussian kernel with width determined by Scott's rule \\citep{Scott2015Multivariate}.\nThen, we identify the ridge line (following \\citealt{2016ApJ...823...18C}; shown by the hatched area) for each density distribution as the representative track for the galaxy population.\nIn producing model tracks of exponentially declining SFHs (black dashed lines: declining timescale $\\tau=0.2,0.4,0.6\\,\\mathrm{Gyr}$; black solid lines: $\\tau=2,4,6\\,\\mathrm{Gyr}$), we use the MILES stellar library of solar metallicity and the Padova 1994 library for the stellar evolution prescription.\nUsing other empirical or theoretical stellar libraries and other stellar evolution prescriptions provided in BC03 generates model tracks significantly incompatible with our data.\n\nThe contours show that, compared to isolated galaxies (left panel), a significantly higher fraction of satellites (right panel) populate the lower right area, indicating again the suppressed central star formation of satellite galaxies.\nMoreover, the distribution of isolated galaxies is more concentrated around the ridge while that of satellites has a broader shape.\nThis may imply that the group environment can diversify the SFH of galaxies.\nNotably, while the ridge line of satellites can be overall matched by continuously declining SFHs of long timescales over Gyrs, the ridge line of isolated galaxies clearly deviates toward models of shorter timescales.\nSuch deviation is due to a non-negligible fraction of isolated galaxies with high \\ensuremath{\\mathrm{H\\delta_A}}\\ at given \\ensuremath{\\mathrm{D4000_n}}.\nAs 
\\ensuremath{\\mathrm{H\\delta_A}}\\ mainly traces A-type young stars, this elevated \\ensuremath{\\mathrm{H\\delta_A}}\\ indicates the significance of recent bursts of star formation (see also Fig. 6 in \\citealt{2003MNRAS.341...33K}) in the central parts of isolated galaxies.\n\nThe observed difference in recent SFH between satellite and isolated galaxies fits the scenario described above.\nThe existing hot gas halos of low-sSFR isolated galaxies can still fuel small bursts of star formation when inefficient gas cooling (expected from the low sSFR) is only able to drive radial gas flows episodically.\nBy contrast, the central parts of satellites undergoing starvation are more likely to redden quiescently and smoothly in the absence of further gas supply.\n\n\\subsection{Comparison with previous works}\\label{subsec:comparison}\n\nThe discussion above does not incorporate outside-in quenching caused by gas stripping as a major driver of the cessation of total star formation in group environments.\nInstead, environments are observed to render the quenching of low-mass galaxies more inside-out.\nHowever, we stress that the results do not imply that gas stripping has no influence on outer star formation.\nThe results only suggest that the inner parts of galaxies contribute primarily to the total decline of star formation under environmental effects, while the suppression of star formation in the outskirts is only secondary.\nThis conclusion is echoed by \\citealt{2019ApJ...872...50L}, who found that inside-out quenching is by far the dominant channel even for satellites in massive halos and that the fraction of galaxies experiencing outside-in quenching does not depend on halo mass at all.\n\nThe same conclusion was not reached by many other works in the literature, which also contradict one another.\nUsing 1,494 MaNGA galaxies, \\citealt{2018MNRAS.476..580S} compared the sSFR radial profiles of central and satellite galaxies.\nTheir Fig. 
7 indicates that, in the intermediate and high mass bins, the sSFRs of satellites are systematically lower than those of central galaxies, particularly outside 0.5 effective radii.\nFor galaxies in the low-mass bin, this pattern appears to be reversed, showing more inside-out quenching for satellites.\nDespite the general consistency for low-mass galaxies between \\citealt{2018MNRAS.476..580S} and our work, our data do not indicate outside-in quenching for massive satellite galaxies.\n\\citealt{2019A&A...621A..98C} used a smaller sample of 275 late-type CALIFA galaxies and carried out similar analyses.\nIn direct contrast to the results of \\citealt{2018MNRAS.476..580S}, they found that, compared with galaxies in the field, star formation in low-mass group galaxies is more suppressed in the outer parts, while in massive group galaxies it is more suppressed in the inner parts.\nRather than being suppressed, the low-mass satellite galaxies studied by \\citealt{2019MNRAS.489.1436L} show centrally enhanced star formation in the densest environments.\nApart from these recent works based on IFS data, \\citealt{2009MNRAS.394.1213W} studied the $g-r$ colour profiles of galaxies in the SDSS Data Release 4.\nThey found an outside-in quenching pattern for the satellite galaxies in their high mass bin.\nIn their low-mass bin, the colour profiles of the satellites are globally redder than those of the central galaxies.\nTheir sample barely covers the low-mass range of our data.\n\nThe intricate discrepancies between works in the literature can arise for a variety of reasons.\nNotably, the samples were selected with diverse criteria.\nFor example, \\citealt{2017MNRAS.464..121S} only selected galaxies with central regions classified as star-forming by emission line diagnostics.\nThis may have biased their sample against centrally quenched galaxies, which would have weak emission lines in the center.\n\\citealt{2019MNRAS.489.1436L} introduced thresholds for signal to noise of 
emission lines during the sample selection.\nThe sample of \\citealt{2019A&A...621A..98C} was preselected by Hubble type.\nMoreover, a problem in some previous studies is that sSFR radial profiles are not compared at the same level of total sSFR for galaxies in different environments, even though many IFS studies \\citep[e.g.,][]{2018MNRAS.477.3014B,2018ApJ...856..137W} have shown that sSFR radial gradients clearly depend on the level of total sSFR.\nTherefore, extracting a less ambiguous dependence on environment requires better control of total sSFR, as we have done in this work.\n\n\n\\section*{Acknowledgements}\nBW acknowledges the detailed and constructive comments from the anonymous referee, which significantly helped to improve this manuscript.\nBW thanks Li Shao for his insightful and decisive comments on this work, and thanks Jing Wang, Min Du, and Jingjing Shi for fruitful discussions.\n\nFunding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England.\nThe SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. 
The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.\n\n\n\\section*{Data Availability}\nThe data used in this work are all publicly available.\nWe take the MPA-JHU catalogue from https://wwwmpa.mpa-garching.mpg.de/SDSS/DR7/ and the GSWLC catalogue from https://salims.pages.iu.edu/gswlc/ and the group catalogue from https://gax.sjtu.edu.cn/data/Group.html for SDSS galaxies.\n\n\n\n\n\n\\bibliographystyle{mnras}\n\n\\section{Introduction}\n\\label{sec:intro}\nIn the local universe, the density of galaxies spans several orders of magnitude, from $\\sim0.2\\,\\rho_{0}$ (where $\\rho_{0} \\sim 10^{-29.7} g/\\mathrm{cm}^{3}$ is the mean field density) in sparse void regions all the way up to $\\sim100\\,\\rho_{0}$ in the cores of massive clusters and $\\sim1000\\,\\rho_{0}$ in most compact groups \\citep{1989Sci...246..897G}.\nA large variety of galaxy properties are observed to correlate with galaxy environments such as star formation or quenched galaxy fraction \\citep{2006MNRAS.373..469B, 2008MNRAS.385.1903L, 2010ApJ...721..193P, 2013MNRAS.430.1447K, 2013MNRAS.428.3306W, 2017ApJ...838...87C}, morphology \\citep{1978ApJ...226..559B, 1980ApJ...236..351D, 1999ApJ...518..576P, 2009MNRAS.393.1324B, 
2011MNRAS.416.1680C}, kinematics \\citep{2011MNRAS.416.1680C, 2020MNRAS.495.1958W}, interstellar medium \\citep{2011MNRAS.415.1797C, 2012MNRAS.425..273P, 2015MNRAS.453.2399W, 2017MNRAS.466.1275B, 2019MNRAS.483.5409D} and nuclear activity \\citep{2004MNRAS.353..713K, 2011MNRAS.418.2043E, 2013MNRAS.430..638S, 2015MNRAS.448L..72S}.\nGenerally, red galaxies with early-type morphology and little cold gas content tend to populate the inner parts of group\\footnote{Hereafter, ``group'' refers to a structure in which galaxies are bound within one large dark matter halo; the term does not imply any particular group mass or richness.\n``Cluster'' refers to a massive group.} environments, while blue, late-type and gas-rich galaxies are mainly found away from crowded regions.\n\nAll these apparent links encourage the idea that environment-related processes are an important driver of galaxy evolution.\nIndeed, there is abundant evidence, from both observational and theoretical points of view, for the existence of multiple environmental effects (see the review by \\citealt{2006PASP..118..517B}).\nSources of these effects can be broadly classified into two types.\nThe first type acts through gravitational interactions with both other galaxies and the entire group potential well.\nGravitational tides from neighbours may supply angular momentum to galaxies \\citep{1969ApJ...155..393P,1984ApJ...286...38W} and can condition their overall shape \\citep{1979MNRAS.188..273B}.\nDepending on the velocity dispersion within the group, galaxy-galaxy interactions can either have long durations in small groups, such as during preprocessing \\citep{2004ogci.conf..341D,2004PASJ...56...29F}, or have higher frequency but short durations in massive clusters, the so-called galaxy harassment \\citep{1996Natur.379..613M,1998ApJ...495..139M}.\nWhen the group mass is large, the tidal force exerted by the entire group potential well becomes effective at perturbing group galaxies 
\\citep{1984ApJ...276...26M,1996ApJ...459...82H}.\nThe second type acts through various kinds of hydrodynamic interactions between the gaseous components of galaxies and the hot intergalactic medium (hereafter IGM).\nIts importance has been suggested ever since it became clear that the hot IGM is ubiquitous in clusters \\citep{1977ApJ...215..401M,1977egsp.conf..369O}.\nThis type of interaction can take various forms, including ram-pressure stripping \\citep{1972ApJ...176....1G,2017ApJ...844...48P}, viscous stripping \\citep{1982MNRAS.198.1007N,2015ApJ...806..104R} and thermal evaporation \\citep{1977Natur.266..501C,2007MNRAS.382.1481N}, all of which are able to remove cold gas from galaxies, particularly low-mass ones \\citep[e.g.,][]{2013AJ....146..124H, 2020MNRAS.494.2090J}.\nSeveral prototypical galaxies undergoing gas stripping in the Virgo cluster are highlighted in a series of works based on radio interferometry \\citep{2004AJ....127.3361K, 2007ApJ...659L.115C, 2009AJ....138.1741C, 2012A&A...537A.143V}.\nThough originating from different processes, several mechanisms can in some cases have similar effects on galaxies.\nOne example is galaxy starvation \\citep{1980ApJ...237..692L}, in which the loosely bound outer gaseous halos of galaxies are removed by both tidal interactions and ram-pressure stripping, preventing further gas accretion \\citep{2002ApJ...577..651B}.\n\nIt is difficult to discern the relative importance of all these mechanisms in certain environments.\nHowever, one consensus reached by the majority of previous studies is that they are more effective on satellite galaxies, i.e. 
the less massive galaxies that are gravitationally bound to more massive galaxies.\nTheir high-speed motion relative to the hot IGM and their shallow potential wells both make them more vulnerable to these effects.\nEarly studies of the M31/M32 system \\citep[e.g.,][]{1962AJ.....67..471K,1973ApJ...179..423F} and the Milky Way/Magellanic Clouds system \\citep[e.g.,][]{1976ApJ...203...72T,1982MNRAS.198..707L} are classic examples of such vulnerability of satellites.\nThe most massive galaxy in the gravitationally bound system is often called a ``central'' galaxy.\nAnalyses of environmental effects are thus commonly undertaken with the satellite and central galaxy dichotomy \\citep[e.g.,][]{2009MNRAS.394.1213W,2012ApJ...757....4P,2013MNRAS.428.3306W}, which is also adopted in this work.\n\nAlthough these environment-related mechanisms can partly explain the various correlations with galaxy environment, the extent to which they have played a role is still under debate.\nIs there a strong causal link between environment and various galaxy properties, as those superficial correlations suggest?\nOr is this apparent link with environment merely a by-product of other more fundamental processes?\nThis question lies at the heart of the ``nature or nurture'' problem.\nOne embodiment of this problem is the controversy over the morphology-density relation \\citep{1980ApJ...236..351D, 2003MNRAS.346..601G}, which was originally thought to be caused by environmental effects.\nSubsequent studies argued for the existence of other more important drivers \\citep[e.g.,][]{2009MNRAS.393.1324B,2016ApJ...818..180C,2017ApJ...851L..33G,2019MNRAS.485..666B} such as stellar mass, colour and sSFR.\nIt thus remains unclear how important these environmental effects are.\n\nUseful information comes from studying the environmental dependence of the specific star formation rate (sSFR) radial gradient ($\\nabla\\,\\mathrm{sSFR}$), because various mechanisms at work in 
group environments can affect different parts of the galactic star-forming discs.\nFor example, ram-pressure stripping is thought to be more efficient at removing loose peripheral atomic hydrogen gas (HI) than at affecting inner dense molecular gas discs \\citep{2017MNRAS.467.4282M, 2022arXiv220505698Z}, and thus probably tends to suppress outer star formation.\nBy contrast, the tidal force of the cluster potential well can induce gas inflows and boost star formation in galactic central regions \\citep[e.g.,][]{1990ApJ...350...89B}.\nStudying the environmental dependence of $\\nabla\\,\\mathrm{sSFR}$ therefore helps to identify which processes in the group environment are important in affecting galactic star formation histories.\nConversely, if we eventually find only a weak dependence on environment, the effectiveness of those proposed mechanisms should be doubted.\n\nPrevious studies along this line have been carried out using narrow-band $\\mathrm{H}\\alpha$ imaging \\citep[e.g.,][]{2004ApJ...613..851K,2004ApJ...613..866K,2013A&A...553A..91F}, resolved photometry \\citep[e.g.,][]{2007ApJ...658.1006M,2008ApJ...677..970W} and, more recently, integral field spectroscopy \\citep[IFS; e.g.,][]{2013MNRAS.435.2903B,2017MNRAS.464..121S,2018MNRAS.476..580S,2019A&A...621A..98C,2019ApJ...872...50L}.\nHowever, these studies have reached very different and sometimes discrepant conclusions about how the star formation distributions of galaxies are affected in the group environment.\nThe conclusions include 1) outside-in truncation of star formation \\citep[e.g.,][]{2004ApJ...613..851K,2013A&A...553A..91F,2017MNRAS.464..121S,2019A&A...621A..98C}, 2) preferential suppression of star formation in inner regions \\citep[e.g.,][]{2008ApJ...677..970W,2019A&A...621A..98C} and 3) weak or no effect \\citep[e.g.,][]{2007ApJ...658.1006M,2013MNRAS.435.2903B,2018MNRAS.476..580S}.\nEven when the general conclusions are similar, the signals they found can still be in tension.\nFor instance, both using IFS data, 
\\citealt{2017MNRAS.464..121S} found outside-in truncation for massive galaxies with stellar mass in the range $10<\\mathrm{log}\\,\\mathcal{M}_{\\star}/\\mathcal{M}_{\\odot}<11$, while the outside-in signal in \\citealt{2019A&A...621A..98C} is for less-massive galaxies only ($9<\\mathrm{log}\\,\\mathcal{M}_{\\star}/\\mathcal{M}_{\\odot}<10$), and they found preferential central suppression for massive galaxies.\n\nIn this work, we revisit the environmental dependence of the spatial distribution of star formation by combining SDSS fiber spectral indices (for the central regions of galaxies) and global sSFR measurements to indicate the (relative) shape of sSFR\\footnote{We study the profiles of sSFR rather than SFR because characterizing the stellar population by the fraction of newborn stars is more representative of the star formation status of galaxies.} profiles.\nThis brings sufficient statistics to the investigation, which is crucial because an unambiguous environmental dependence can only be extracted when other important factors, such as stellar mass and total star formation level, are properly controlled.\nCurrent IFS samples can still lack such statistics, especially for low-mass galaxies, among which the environmental effects are usually the strongest.\nEven with MaNGA \\citep{2015ApJ...798....7B}, currently the largest IFS survey, the sample size is at least an order of magnitude smaller than the sample studied in this work, which would limit the parameter control when we aim to explore in more detail how the sSFR profiles correlate with galaxy environment (see section \\ref{subsec:env}).\n\nThroughout this paper we adopt the cosmological parameters from WMAP-9 \\citep{2013ApJS..208...20B}, in which $\\mathrm{H}_0=69.3\\,\\mathrm{km}\\,\\mathrm{s}^{-1}\\,\\mathrm{Mpc}^{-1}$, $\\Omega_\\mathrm{m}=0.286$ and $\\Omega_{\\Lambda}=0.714$, and a Chabrier IMF.\n\n\n\\section{Sample}\n\\label{sec:data}\n\n\\subsection{MPA-JHU and GSWLC catalogues}\n\\label{subsec:cat}\n\nOur galaxy 
sample is assembled from the MPA-JHU catalogue and version 2 of the GALEX-SDSS-WISE Legacy Catalogue \\citep[GSWLC-2,][]{2016ApJS..227....2S,2018ApJ...859...11S}.\n\nThe MPA-JHU catalogue is based on the Sloan Digital Sky Survey Data Release 7 \\citep[SDSS DR7,][]{2000AJ....120.1579Y,2009ApJS..182..543A}, providing both spectral and photometric measurements from SDSS as well as value-added derived quantities for more than 800,000 unique galaxies. We make heavy use of the spectral indices (more details in section \\ref{subsec:less}) measured from SDSS spectra, which were extracted from fibers of 3 arcsec diameter centered on galaxies. We also take the radius enclosing 50\\% of the total r-band Petrosian flux, $\\mathrm{R_{50}}$, as the apparent angular size of galaxies.\n\nAlthough the MPA-JHU catalogue provides SFRs, we use the values from GSWLC-2 instead. GSWLC-2 is a value-added catalogue for SDSS galaxies within the GALEX \\citep[Galaxy Evolution Explorer,][]{2005ApJ...619L...1M} footprint.\nIt provides better SFR measurements overall by including ultraviolet (UV) data in the multi-band spectral energy distribution (SED) fitting. The UV data are from GALEX, a space telescope mapping the sky in two UV bands, FUV (1350--1750 {\\rm \\AA}) and NUV (1750--2800 {\\rm \\AA}). Compared with the optical SDSS bands, these UV bands are more sensitive to short-lived massive stars, and thus to recent star formation.\nGSWLC-2 also uses the 22 \\ensuremath{\\mathrm{\\mu m}}\\ mid-infrared (MIR) band taken by WISE \\citep[Wide-field Infrared Survey Explorer,][]{2010AJ....140.1868W}, which is another space telescope providing all-sky images in MIR bands. 
The 22 \\ensuremath{\\mathrm{\\mu m}}\\ band can trace the absorbed UV light re-emitted by the dust, improving the estimation of recent SFR.\nFor consistency, we also use the stellar mass from GSWLC-2, which is derived with the same SED fitting procedure.\n\nWe use the medium UV depth version of the GSWLC catalogue, striking a balance between the depth of the GALEX images and the sky coverage. Our sample thus has an sSFR detection limit of $\\mathrm{sSFR} > 10^{-11.7}\\,\\mathrm{yr^{-1}}$, satisfying the main goal of studying galaxies at low star formation levels. The matching between MPA-JHU and GSWLC-M2 is done with a 3 arcsec search radius, giving a sample of 343,791 galaxies. Changing the matching radius has a negligible effect on our sample (the sample size differs by less than 0.03\\% when the matching radius ranges from 1 arcsec to 5 arcsec).\n\nWe further constrain our sample with the following criteria:\n\\begin{equation}\n\\label{equ:cut}\n \\begin{aligned}\n \\qquad \\qquad \\qquad \\qquad 0.01&10^{10}\\,\\mathcal{M}_{\\odot}$ \\citep{2006ApJS..167....1B}.\nThis magnitude limit is the same as the one adopted for group galaxies defined as halo proxy in the group catalogue used in this work (see section \\ref{subsec:yang}), making the halo mass more reliable below $z=0.085$.\nEven though the sample is not complete for galaxies with $\\mathcal{M}_{\\star}<10^{10}\\,\\mathcal{M}_{\\odot}$ out to $z=0.085$, the analyses throughout this work properly control for stellar mass and sSFR, so that low-mass galaxies in different environments are compared in the same subvolume where they are complete.\nThe lower redshift limit and the brighter apparent r-band Petrosian magnitude limit are applied to exclude nearby galaxies with excessively large angular sizes, as their photometry is not properly handled by the SDSS pipeline \\citep{2011AJ....142...31B}.\nAfter this cut, our sample size is reduced to 119,820.\n\nOur analysis is applied only to galaxies with $\\ensuremath{\\mathrm{sSFR}} > 
10^{-11.7}\\,\\mathrm{yr^{-1}}$, the nominal detection limit of the GSWLC-M2 catalogue.\nBelow this limit, the error in the total SFR surges to 0.7 dex, and probing the sSFR radial profile with central spectral indices and total sSFR thus becomes highly uncertain.\n\n\n\\subsection{Galaxy environment}\n\\label{subsec:yang}\n\nWe use the group catalogue constructed by \\citet{2012ApJ...752...41Y} to classify the environment of each galaxy. It was built by applying an iterative group finder algorithm to SDSS galaxies. In each iteration the halo properties of the tentative galaxy groups (identified via a friends-of-friends algorithm) are computed and then used to update the group membership for the next iteration \\citep{2007ApJ...671..153Y}. The catalogue associates each galaxy with one galaxy group, and hence with one dark matter halo. Based on this, we classify the galaxies into three categories: central, satellite and isolated galaxies. Centrals and satellites are the members of multi-member groups, with the central being the most massive member. The isolated galaxies belong to groups with only one member.\n\nThe catalogue also provides dark matter halo mass estimates, based on the total stellar mass or luminosity of bright group members (absolute r-band magnitude $\\mathrm{M}_\\mathrm{r}<-19.5$) via abundance matching. A mock test suggests that their typical uncertainty is about 0.3 dex \\citep{2012ApJ...752...41Y}. 
The halo mass corresponds to a virial radius of the halo, $\\mathrm{R}_{200}$:\n\n\\begin{equation}\\label{r200}\n\\qquad \\qquad \\qquad \\mathrm{R}_{200}=\\Bigg[\\frac{\\mathcal{M}_{200}}{\\frac{4\\pi}{3}200\\Omega _\\mathrm{m} \\frac{3\\mathrm{H}_0^2}{8\\pi \\mathrm{G}}}\\Bigg]^{\\frac{1}{3}}\\,\\,(1+z)^{-1}.\n\\end{equation}\n\nAmong the several catalogues with slightly different redshift completeness, we take the group catalogue constructed with SDSS redshifts only, which contains 599,301 galaxies.\nUsing the other versions makes a negligible difference.\nAfter matching with the group catalogue, we obtain a sample of 112,028 galaxies.\n\n\n\n\n\\begin{figure*}\n\t\\begin{center}\n \\includegraphics[width=0.48\\textwidth]{D4000_100BSmedian}\n \\includegraphics[width=0.48\\textwidth]{HdA_100BSmedian}\n \\caption{Top panel: \\ensuremath{\\mathrm{D4000_n}}\\ (left) and \\ensuremath{\\mathrm{H\\delta_A}}\\ (right) as a function of sSFR for isolated (dashed line) and satellite galaxies (solid line), divided into three stellar mass bins (red, yellow and blue colours). The isolated and satellite galaxies are matched in $\\mathrm{R_{50}}$. The background grey contours are derived from kernel density estimation with $V_\\mathrm{max}$ correction, enclosing 68\\% and 95\\% of the probability for galaxies in the stellar mass range $10^{9}-10^{11.5}\\,\\ensuremath{\\mathcal{M}_\\mathrm{\\odot}}$ and the sSFR range $10^{-11.7}-10^{-9}\\,\\mathrm{yr}^{-1}$. The blue shaded region marks the span of the SFMS for galaxies in the lowest mass bin. 
Bottom panel: the difference between satellite and isolated galaxies.\n Bins with fewer than 20 galaxies are discarded.}\n \\label{fig:100bs1}\n\t\\end{center}\n\\end{figure*}\n\n\n\\section{Results}\n\\label{sec:result}\n\n\\subsection{Suppressed star formation in the center of satellite galaxies}\n\\label{subsec:less}\n\nThe SDSS single-fiber spectra are extracted from the central part of the galaxies, within physical radii of 0.3, 1.5 and 2.4 kpc at $z=0.01$, $0.05$ and $0.085$, respectively, where 0.05 is about the mean redshift of our sample.\nWe use \\ensuremath{\\mathrm{D4000_n}}\\ and the Balmer absorption feature \\ensuremath{\\mathrm{H\\delta_A}}\\ to indicate the central sSFR (see also \\citealt{2004MNRAS.353..713K}).\n\\ensuremath{\\mathrm{D4000_n}}\\ is a break feature at around $4000$ {\\rm \\AA}, mainly due to a series of metal absorption lines on the blueward side of $4000$ {\\rm \\AA}.\nThese lines are most prominent for stars with spectral types later than K \\citep{1985ApJ...297..371H}, i.e. old stellar populations.\nBy contrast, the opacity at the Balmer line \\ensuremath{\\mathrm{H\\delta_A}}\\ peaks among young massive stars with spectral types around A.\nTherefore, if galaxies are more dominated by young stars (i.e. 
high sSFR), \\ensuremath{\\mathrm{H\\delta_A}}\\ and \\ensuremath{\\mathrm{D4000_n}}\\ are higher and lower, respectively.\nThese two indices are insensitive to dust extinction as they are flux ratios in adjacent and narrow spectral windows.\nThis is particularly important because the central regions of galaxies are usually highly dust obscured, which may introduce large uncertainties in the measured sSFR \\citep{2017MNRAS.469.4063W}.\nWith the central sSFR indicated by SDSS spectral indices and the total sSFR from SED fitting, it becomes possible to roughly probe the gradient of the sSFR radial profiles.\nThough the central and total sSFR are not measured in a consistent way, we demonstrate in Appendix \\ref{app:fea} the feasibility of this approach in a statistical sense, using a smaller sample of galaxies with IFS data.\n\nWe investigate the environmental dependence of the relative difference in sSFR radial gradient by comparing the central sSFR of satellite and isolated galaxies at fixed total sSFR and stellar mass.\nTo ensure that the fiber measurements are on similar scales, we match the apparent angular size of galaxies, $\\mathrm{R_{50}}$, so that fibers cover similar fractions of the total light of galaxies.\nAn alternative aperture control is to match redshift, so that fibers cover the same physical scales.\nWe have tested both and found that the two approaches lead to the same conclusion.\n\nSpecifically, in a given bin of stellar mass and total sSFR, we minimally trim the satellite and isolated galaxy samples to reach the same $\\mathrm{R_{50}}$ distribution at 0.2 arcsec resolution (i.e. obtaining the maximally overlapping distribution).\nThe trimming is done in every $\\mathrm{R_{50}}$ bin by sampling, with replacement, the same number (i.e. 
minimum of $[\\mathrm{N_{sat},N_{iso}}]$) of the isolated and satellite galaxies.\nWe repeat this matching process 1000 times to estimate the statistical uncertainty in distribution moments (see also \\citealt{2008MNRAS.385.1903L} and \\citealt{2015MNRAS.448L..72S}).\nWe compute the median \\ensuremath{\\mathrm{D4000_n}}\\ and \\ensuremath{\\mathrm{H\\delta_A}}\\ for each matched isolated and satellite sample, and take the mean and the standard deviation of the 1000 values as the final measurement and its uncertainty.\n\nIn Fig. \\ref{fig:100bs1}, we show the relation between the central sSFR, indicated by \\ensuremath{\\mathrm{D4000_n}}\\ (left panel) and \\ensuremath{\\mathrm{H\\delta_A}}\\ (right panel), and the total sSFR for satellite and isolated galaxies matched in $\\mathrm{R_{50}}$.\nAt a given total sSFR, more massive galaxies have lower central sSFR (i.e. higher \\ensuremath{\\mathrm{D4000_n}}\\ and lower \\ensuremath{\\mathrm{H\\delta_A}}).\nThis is consistent with the well-established observation that massive galaxies generally show more positive sSFR profiles \\citep[e.g.,][]{2016ApJ...819...91P,2018MNRAS.474.2039E,2018ApJ...856..137W}.\n\nNotably, in the lowest mass bin and at a given total sSFR, satellite galaxies show markedly higher \\ensuremath{\\mathrm{D4000_n}}\\ (hereafter ``the central \\ensuremath{\\mathrm{D4000_n}}\\ excess'') than their isolated counterparts (left panel).\nThis signal of environmental dependence of sSFR radial gradients is strongest when the total sSFR of galaxies is well below the star formation main sequence (SFMS; shown by the blue shaded region), whose location is defined by the peak sSFR, at a given mass, of the volume-corrected number density distribution of our sample galaxies (see also Appendix A of \\citet{2020MNRAS.495.1958W}).\nA similar trend is also seen in the \\ensuremath{\\mathrm{H\\delta_A}}\\ versus sSFR diagram (right panel), where the low-mass satellite 
galaxies have systematically lower \\ensuremath{\\mathrm{H\\delta_A}}\\ values.\nThis suggests that environmental effects preferentially suppress the central star formation of galaxies, making the sSFR profile gradient more positive in a relative sense.\nThe conclusion remains the same if we use a total SFR derived from a different recipe, for example one measured directly from the UV and MIR luminosities (see Appendix \\ref{app:nuvmir}).\n\n\n\\subsection{The dependence on galaxy environment}\n\\label{subsec:env}\n\nIn this section we further explore what suppresses the central star formation in low-mass satellite galaxies by studying how the \\ensuremath{\\mathrm{D4000_n}}\\ excess correlates with galaxy environment.\nWe investigate this environmental dependence in two sSFR windows: $10^{-10.4}-10^{-9.4}\\,\\mathrm{yr}^{-1}$ and $10^{-11.4}-10^{-10.4}\\,\\mathrm{yr}^{-1}$.\nThese two windows respectively cover normal star-forming galaxies around the SFMS, and galaxies below the SFMS but still with detectable star formation activity.\nOur galaxies in the low sSFR bin have a median NUV-r colour index of $\\sim4$, which falls in the conventional green valley on the colour-magnitude diagram (e.g., as in \\citealt{2007ApJS..173..267S}).\nWe use three parameters to quantify the environment of satellites: the halo mass of the group \\ensuremath{\\mathcal{M}_\\mathrm{h}}, the normalized projected distance to the central galaxy \\ensuremath{R_\\mathrm{bcg}/R_{200}}\\ (which is effectively the distance to the halo center\\footnote{In groups with few members the weighted-geometric center can be a better tracer of the bottom of the group potential well, as there may not be a dominant central galaxy. We have tested this alternative definition of the group center for small groups and found consistent results that leave our conclusion unchanged.}) and the group richness \\ensuremath{N_\\mathrm{member}}\\ (i.e. 
number of galaxies within the group).\n\n\\begin{figure*}\n\t\\begin{center}\n\t\t\\includegraphics[width=0.95\\textwidth]{DD4000_3Envs3}\n \\caption{\n Central \\ensuremath{\\mathrm{D4000_n}}\\ excess as a function of the halo mass (\\ensuremath{\\mathcal{M}_\\mathrm{h}}, left panel), the normalized projected distance to the central galaxy (\\ensuremath{R_\\mathrm{bcg}/R_{200}}, middle panel) and the group richness (\\ensuremath{N_\\mathrm{member}}, right panel). Galaxies are split into three stellar mass bins (blue, yellow and red lines) and two sSFR ranges (top and bottom panels). The horizontal error bar shows the width of the parameter bin, and the vertical error bar shows the statistical uncertainty estimated with the bootstrapping technique. Bins with fewer than 20 galaxies are discarded.\n\t\t}\n\t\t\\label{fig:Edep}\n\t\\end{center}\n\\end{figure*}\n\nFig. \\ref{fig:Edep} presents the \\ensuremath{\\mathrm{D4000_n}}\\ excess as a function of these group properties in the low and high sSFR bins. The \\ensuremath{\\mathrm{D4000_n}}\\ excess is again calculated by comparing satellite galaxies and their matched isolated counterparts with $\\Delta\\log(\\ensuremath{\\mathcal{M}_\\star}) < 0.1$, $\\Delta\\log(\\mathrm{sSFR}) < 0.1$ and $\\Delta \\mathrm{R_{50}} < 0.2\\,\\mathrm{arcsec}$.\nFor galaxies with low sSFR, the \\ensuremath{\\mathrm{D4000_n}}\\ excess evidently correlates with all environment properties.\nThe satellite galaxies have redder cores (i.e. 
more suppressed central star formation) when they are: 1) in more massive halos; 2) closer to the center of galaxy groups; 3) in groups with more members.\nThe correlation steepens toward lower stellar mass.\n\nFor galaxies in the high sSFR bin, the environmental dependence is much weaker.\nA clear \\ensuremath{\\mathrm{D4000_n}}\\ excess only exists in the largest \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ and \\ensuremath{N_\\mathrm{member}}\\ bins, and only for low-mass galaxies.\nWe note that for massive galaxies with high sSFR (shown by the red lines in the bottom panels) in the most massive groups, the \\ensuremath{\\mathrm{D4000_n}}\\ signal is not an excess but a deficiency, indicating enhanced central star formation compared with galaxies in the field environment.\n\nTo further break down the environmental dependences of the \\ensuremath{\\mathrm{D4000_n}}\\ excess of low-mass satellites with low sSFR, in Fig. \\ref{fig:Edep1} we apply more environment control to the correlation between the \\ensuremath{\\mathrm{D4000_n}}\\ excess and each environment property.\nIn the first panel, we show the central \\ensuremath{\\mathrm{D4000_n}}\\ excess as a function of \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ in bins of high/low \\ensuremath{R_\\mathrm{bcg}/R_{200}}\\ and \\ensuremath{N_\\mathrm{member}}, respectively (split by the median values, i.e. 0.44 and 30, of the low-mass and low-sSFR satellite sample).\nThe second and third panels show the other two dependences, on \\ensuremath{R_\\mathrm{bcg}/R_{200}}\\ and \\ensuremath{N_\\mathrm{member}}, with further environment control in a similar manner.\nWe note that there are 88 individual massive groups included in the $\\mathcal{M}_h>10^{13.7}\\,\\mathcal{M}_{\\odot}$ bin, making the result in this bin statistically representative of large groups.\nThe relations for the low-mass and low-sSFR satellites without further environment control in Fig. 
\\ref{fig:Edep} are shown for reference by black symbols.\n\nWe find that \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ and \\ensuremath{N_\\mathrm{member}}\\ are almost interchangeable.\nIn the first panel, the relations of \\ensuremath{\\mathrm{D4000_n}}\\ excess and \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ in bins of high/low \\ensuremath{N_\\mathrm{member}}\\ (light red and light blue) are just the general relation (black symbols) at the higher and lower \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ ends.\nThe same is seen in the third panel, and in the second panel the binning by \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ or \\ensuremath{N_\\mathrm{member}}\\ gives the same relations.\nThis results from the tight correlation between \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ and \\ensuremath{N_\\mathrm{member}}: among satellite galaxies with non-zero host halo mass catalogued in \\citealt{2012ApJ...752...41Y}, the Spearman rank correlation coefficient between \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ and \\ensuremath{N_\\mathrm{member}}\\ is as high as 0.92.\n\nComparing the left two panels, the \\ensuremath{\\mathrm{D4000_n}}\\ excess generally depends more on \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ than on \\ensuremath{R_\\mathrm{bcg}/R_{200}}.\nThe \\ensuremath{\\mathrm{D4000_n}}\\ excess is small in less massive halos, nearly irrespective of groupcentric radius.\nThis is shown by the overlapping dark red and dark blue bands at the low \\ensuremath{\\mathcal{M}_\\mathrm{h}}\\ end in the first panel and also the relatively flat relation in the second panel (dark blue band).\nThe \\ensuremath{\\mathrm{D4000_n}}\\ excess is present in massive halos even at very large \\ensuremath{R_\\mathrm{bcg}/R_{200}}.\nThe dependence on \\ensuremath{R_\\mathrm{bcg}/R_{200}}\\ starts to become significant in massive halos, especially at the center where we observe \\ensuremath{\\mathrm{D4000_n}}\\ excess as high as 0.2.\nThese results together seem to suggest the first-order 
importance of halo mass and also that the physical mechanism gets strongly enhanced in cluster center.\n\n\\begin{figure*}\n\t\\begin{center}\n\t\t\\includegraphics[width=0.99\\textwidth]{DD4000_Mhrc1_scatter3}\n\t\t\\caption{\n\t\tThe same as in Fig. \\ref{fig:Edep} but only for low-mass low-sSFR galaxies (black squares). The sample is further split into sub-samples with different environmental parameters (red and blue stripes, see legend in each panel).\n\t\t}\n\t\t\\label{fig:Edep1}\n\t\\end{center}\n\\end{figure*}\n\n\nTaking a step further we introduce the relative velocity of satellites into the analysis to try to link the central \\ensuremath{\\mathrm{D4000_n}}\\ excess to the dynamic status of satellites in their host halos.\nFig. \\ref{fig:psd} shows low-mass satellites ($10^9-10^{9.8}\\,\\mathcal{M}_{\\odot}$) in massive halos ($\\mathcal{M}_h>10^{13.7}\\,\\mathcal{M}_{\\odot}$) on the phase-space diagram \\citep[i.e. normalized relative velocity versus normalized projected distance; See also][]{2015MNRAS.448.1715J}.\nWe calculate the absolute difference of line-of-sight velocities between the satellite and cluster as $|\\Delta v| = c|z-z_c|/(1+z_c)$ where $z_c$ is the luminosity weighted redshift of cluster member galaxies.\nThe velocity difference is then normalized by the cluster velocity dispersion $\\sigma_{200}$ (equation 6 of \\citealt{2007ApJ...671..153Y}).\n\n\\begin{figure*}\n\t\\begin{center}\n\t\t\\includegraphics[width=0.45\\textwidth]{PSD_o_final1.pdf}\n\t\t\\includegraphics[width=0.45\\textwidth]{PSD_s_final1.pdf} \\\\\n\t\t\\includegraphics[width=0.45\\textwidth]{PSD_o_SF_final1.pdf}\n\t\t\\includegraphics[width=0.45\\textwidth]{PSD_s_SF_final1.pdf}\n\t\t\\caption{\n\t\t The central \\ensuremath{\\mathrm{D4000_n}}\\ excess on the phase-space diagram for the low-mass galaxies ($10^9 - 10^{9.8}\\,\\ensuremath{\\mathcal{M}_\\mathrm{\\odot}}$) in massive halos ($\\ensuremath{\\mathcal{M}_\\mathrm{h}} > 
10^{13.7}\\,\\ensuremath{\\mathcal{M}_\\mathrm{\\odot}}$). The y-axis is the line-of-sight velocity of satellites relative to their host clusters, normalized by cluster velocity dispersion.\n The upper and lower rows are for galaxies in the low and high sSFR ranges as in Fig. \\ref{fig:Edep}.\n The right panels are locally averaged versions of the left panels, showing the underlying trend.\n Satellites in the lower triangle region are considered a virialized part of the clusters.\n The black dashed curve is the projected escape velocity normalized by cluster velocity dispersion and galaxies far beyond the curve are not gravitationally bound by the clusters.\n\t\t}\n\t\t\\label{fig:psd}\n\t\\end{center}\n\\end{figure*}\n\nWe mark the boundary of the virialized area by a black straight line, below which galaxies are approximately within the part of the cluster in dynamical equilibrium.\nThe black dashed curve represents the normalized projected escape velocity $v_\\mathrm{esc}/\\sigma_{200}$ based on a Navarro-Frenk-White halo \\citep{1996ApJ...462..563N} of concentration $c_\\mathrm{NFW}=6$.\nStarting from the mass profile of a halo one can calculate the potential and thus the escape velocity:\n\\begin{equation}\\label{equa:vesc}\n\\qquad\nv_\\mathrm{esc,3D}=\\sqrt{\\frac{2GM_{200}}{R_{200}}\\times g(c_\\mathrm{NFW}) \\times \\frac{\\ln(1+c_\\mathrm{NFW}x)}{x}}\n\\end{equation}\nwhere\n\\begin{equation}\n\\qquad\ng(c_\\mathrm{NFW})=\\Big [ \\ln(1+c_\\mathrm{NFW})-\\frac{c_\\mathrm{NFW}}{1+c_\\mathrm{NFW}} \\Big ] ^{-1}\n\\end{equation}\nand\n\\begin{equation}\n\\qquad\nx=r_\\mathrm{3D}/R_{200}\n\\end{equation}\nWe project velocity along the line of sight and project the distance on the sky plane using the average relations $v_\\mathrm{esc} = \\frac{1}{\\sqrt{3}}v_\\mathrm{esc,3D}$ and $r = \\frac{\\pi}{4}r_\\mathrm{3D}$.\n\nAs in the previous analyses, we match the satellites with isolated galaxies whose stellar mass, sSFR and $R_{50}$ differ by less than 0.1 dex, 0.1 dex and 0.2 
arcsec respectively.\nThe central \\ensuremath{\\mathrm{D4000_n}}\\ excess averaged over 100 times matching is recorded for every satellite and we do this analysis separately for satellites in low ($10^{-11.4}-10^{-10.4}\\,\\mathrm{yr}^{-1}$; the upper row of Fig. \\ref{fig:psd}) and high ($10^{-10.4}-10^{-9.4}\\,\\mathrm{yr}^{-1}$; the bottom row of Fig. \\ref{fig:psd}) sSFR ranges.\nThe right column shows the locally averaged results using the locally weighted regression method LOESS by \\citet{Cleveland1988} as implemented\\footnote{We use the Python package \\textsc{loess} v2.0.11 available from https://pypi.org/project/loess/} by \\citet{2013MNRAS.432.1862C}, to reveal the underlying trend.\nWe adopt a smoothing factor \\texttt{frac} = 0.3, and a linear local approximation, but the conclusion does not depend on these certain parameter choices.\n\nIn the upper right panel of Fig. \\ref{fig:psd} for satellites of low sSFR, LOESS reveals certain structure of \\ensuremath{\\mathrm{D4000_n}}\\ excess at low groupcentric radii.\nThe largest \\ensuremath{\\mathrm{D4000_n}}\\ excess is not seen evenly for all the galaxy populations near cluster center, but is particularly linked with the satellites of either small or large relative velocities.\nSatellites of intermediate velocities of about $|\\Delta v|/\\sigma_{200} = 0.7$ only show moderate \\ensuremath{\\mathrm{D4000_n}}\\ excess comparable to those at much larger groupcentric radii.\nThis result indicates an apparent connection between the \\ensuremath{\\mathrm{D4000_n}}\\ excess and the orbit configuration of satellites.\nIn the lower right panel for satellites of high sSFR, which are probably in the early stages of environmental processing, the \\ensuremath{\\mathrm{D4000_n}}\\ excess is low but noteworthily shows the same pattern as the low-sSFR satellites.\nThe consistency suggests that the observed pattern of locally averaged \\ensuremath{\\mathrm{D4000_n}}\\ excess reflects the true trend underlying the 
noisy data in the left column.\n\n\n\\section{Summary and discussion}\n\\label{sec:discuss}\n\nIn this paper, we have investigated the environmental dependence of the relative difference in sSFR radial gradient for 0.1 million SDSS galaxies at $z \\sim 0$.\nWe compare the central sSFR, indicated by indices \\ensuremath{\\mathrm{D4000_n}}\\ and \\ensuremath{\\mathrm{H\\delta_A}}\\ measured from SDSS fiber spectra, between satellite and isolated galaxies at the same total sSFR, so that we extract how galaxy environment affects the sSFR radial gradient in a relative sense.\nWith fiber coverage properly matched for the comparison, the large sample size facilitates the study of detailed correlations with a variety of environmental properties when the mass and star formation level of galaxies are controlled.\nOur findings are summarized as below:\n\\begin{enumerate}[(i)]\n \\item Low-mass satellite galaxies ($\\mathcal{M}_{\\star}=10^9-10^{9.8}\\,\\mathcal{M}_{\\odot}$) below the SFMS have lower central sSFR compared to isolated counterpart galaxies at given total sSFR (Fig. \\ref{fig:100bs1}).\n \\item The phenomenon of more suppressed central star formation (i.e. the central \\ensuremath{\\mathrm{D4000_n}}\\ excess at given total sSFR) among low-mass satellites becomes more noticeable in host halos of higher mass (equivalently of more member galaxies), and when closer to the group center, while more massive galaxies below the SFMS show consistent trend but with smaller amplitude (Fig. \\ref{fig:Edep}).\n The dependence on halo mass is of first-order importance and the dependence on groupcentric radius is secondary (Fig. \\ref{fig:Edep1}).\n \\item In the center of massive halos, phase-space diagram reveals that the phenomenon is strongest among satellites of either lowest or highest relative velocities to the halo (Fig. 
\\ref{fig:psd}), indicating the connection between the suppressed central star formation and orbital configuration of satellite galaxies.\n\\end{enumerate}\n\n\n\\begin{figure}\n \\centering\n \\includegraphics[width=\\linewidth]{Sigma_fiber_lm.pdf}\n \\caption{The stellar mass surface density within the fiber aperture as a function of total sSFR for low-mass satellite and isolated galaxies.}\n \\label{fig:fiber_mu}\n\\end{figure}\n\n\n\\begin{figure*}\n\t\\begin{center}\n\t\t\\includegraphics[width=0.9\\textwidth]{Environmentx2.pdf}\n\t\t\\caption{\n A schematic illustration of the proposed scenario explaining how gas stripping in massive halos can render the quenching of satellites more inside-out.\n An actively star-forming galaxy in isolation (panel $\\spadesuit$) has an extended cold gas disk and a more extended hot gas halo.\n Cold gas disk is replenished by infalling gas cooled out of hot gas which also drives significant gas radial flow (denoted as white arrows) on the disk due to mismatch of angular momentum.\n If such a galaxy falls into the hot gas halo of another much more massive galaxy and becomes a satellite ($\\spadesuit \\Rightarrow \\clubsuit$) both its hot gas halo and the outskirt of its cold gas disk can be largely stripped during orbiting motion after which gas infalling and gas radial inflowing stop.\n In such starvation the central part of this galaxy quenches first due to the high star formation efficiency there (panel $\\diamondsuit$).\n By contrast if this galaxy keeps evolving in isolation ($\\spadesuit \\Rightarrow \\heartsuit$), the diminishing gas cooling and infalling (expected for quenching) do not terminate gas radial inflow immediately.\n Thus the central part can still be replenished during the overall quenching and so the quenching can be radially synchronized.\n\t\t}\n\t\t\\label{fig:illus}\n\t\\end{center}\n\\end{figure*}\n\n\\subsection{The physical mechanisms}\\label{subsec:phy}\n\nThe more suppressed central star formation 
of satellites compared to field galaxies of the same total sSFR suggests that additional physical processes in galaxy groups make the quenching of star formation happen more inside-out.\nThe environmentally promoted inside-out quenching is especially shown by the sharp increase of central \\ensuremath{\\mathrm{D4000_n}}\\ with decreasing total sSFR among the low-mass satellites (Fig. \\ref{fig:100bs1}).\nThe SFR profiles of low-mass satellites can even deviate more from the profiles of their field counterparts because we find, as shown in Fig. \\ref{fig:fiber_mu}, that the central stellar mass density within fiber area of low-mass satellites of low sSFR is smaller than field galaxies which is consistent with \\citealt{2017MNRAS.464.1077W}.\nThe stellar mass measurements inside fiber area are taken from the MPA-JHU catalogue, with a small mean difference of $\\sim0.1$ dex compared to GSWLC stellar mass \\citep{2016ApJS..227....2S}.\nThe lower central stellar mass density of satellites seems to result from the integrated effect of their suppressed central star formation.\n\nSo far it is unclear, among miscellaneous physical processes occurring in group environment, which mechanism is mainly responsible for the central \\ensuremath{\\mathrm{D4000_n}}\\ excess of low-mass satellite galaxies.\nIn Fig. \\ref{fig:Edep1}, we see that the high \\ensuremath{\\mathrm{D4000_n}}\\ excess is preferentially found in massive clusters, especially in the cluster center.\nThe strongest effect in the cluster center is seen among satellites with either lowest or highest velocities on the phase-space diagram.\nThe former satellite population with lowest velocity generally have low orbital energy as a result of their low potential energy (i.e. 
at the bottom of the potential well) and low kinetic energy.\nAs suggested by simulations \\citep[e.g.,][]{2013MNRAS.431.2307O}, these satellites joined the cluster during ancient infalls and have thus been trapped in the center for a long time.\nThe latter satellite population with high velocity in the vicinity of the cluster center are suggested to be recent infallers that are experiencing their first or second pericenter.\nProjection of velocity and position of satellites can smear such a connection between orbital properties and the position on the phase-space diagram.\nHowever, the clear consistency across satellite populations of high and low sSFR living in a large number of different groups rejects the possibility that the result is due to random projection.\nFrom the perspective of environmental effect, the former satellite population experience over the long term the enormous tidal force from the massive cluster, which scales inversely with the cube of the groupcentric distance and can play an important role in shaping the star formation and morphology of galaxies \\citep{1984ApJ...276...26M,1990ApJ...350...89B}.\nWhen the latter satellite population pass the orbital pericenter, on short timescales they feel not only the strong cluster tidal field but also large ram pressure due to both the high density of the intracluster medium and their high velocities.\nThe middle panel of Fig. \\ref{fig:Edep1} shows that there is non-negligible \\ensuremath{\\mathrm{D4000_n}}\\ excess even in the outskirts of massive halos, where the cluster tidal field weakens dramatically.\nHydrodynamic gas stripping, however, can still be effective in the outskirts of halos for satellites with high velocities, and some cases were indeed caught in action \\citep[e.g.,][]{2018MNRAS.476.4753J}.\nThis also coincides with the fact shown in the upper right panel of Fig. 
\\ref{fig:psd} where we see that at large groupcentric radii satellites of higher velocities manifest larger central \\ensuremath{\\mathrm{D4000_n}}\\ excess.\nThese together seem to suggest that both tidal and hydrodynamic interactions are responsible for the phenomenon of suppressed central star formation of satellite galaxies.\n\n\\begin{figure*}\n\t\\begin{center}\n\t\t\\includegraphics[width=0.96\\textwidth]{xxx561.pdf}\n\t\t\\caption{\n Comparison between the probability density functions (filled contours) of low-mass satellite (right panel) and isolated (left panel) galaxies in sSFR range $10^{-11.4}-10^{-10.4}\\,\\mathrm{yr}^{-1}$ on the \\ensuremath{\\mathrm{H\\delta_A}}-\\ensuremath{\\mathrm{D4000_n}}\\ plane with evolutionary tracks (black lines) generated from BC03 models.\n The probability density functions are derived via kernel density estimation with $V_{\\mathrm{max}}$ corrections.\n The contour enclosing 68\\% of total probability is shown as a dotted line.\n The ridges (shaded regions) of density distributions are identified, following \\citealt{2016ApJ...823...18C}, as the most prominent track for each galaxy population.\n The uncertainty of ridges is assessed by 1,000 bootstrap samples and one sigma confidence intervals are shown.\n Model tracks are generated by convolving model spectra with exponentially declining SFHs ($\\mathrm{SFR}\\,\\propto\\,\\mathrm{exp}(-t/\\tau)$) of short (dashed lines; from up to down, $\\tau=0.2,0.4,0.6\\,\\mathrm{Gyr}$) and long (solid lines; from up to down, $\\tau=2,4,6\\,\\mathrm{Gyr}$) characteristic timescales.\n See more details of model tracks in the text.\n\t\t}\n\t\t\\label{fig:sfhs1}\n\t\\end{center}\n\\end{figure*}\n\nIt is known that tidal interactions can strip the loosely bound peripheral gas of galaxies in synergy with the hydrodynamic gas stripping, which together result in galaxy starvation and prevent further gas accretion \\citep{2002ApJ...577..651B}.\nIn starvation, galaxies tend to quench 
inside-out because gas depletion is one order of magnitude faster in the center than in the outer part \\citep{2008AJ....136.2782L}.\nStarvation also promotes inside-out quenching because the radial gas inflows on galactic disks may be largely reduced: accretion of gas from gaseous halos can drive radial gas inflows through even a small mismatch of angular momentum between the accreted gas and the disks \\citep{2016MNRAS.455.2308P}.\n\\citealt{2012MNRAS.426.2266B} reports that this process is one of the most dominant processes inducing radial inflows, making it an important channel for fuelling central star formation.\nThe central star formation is therefore less supported in a satellite with a largely stripped gaseous halo (i.e. in starvation).\nBy contrast, during the quenching of isolated galaxies, as long as the hot gaseous halos still exist, their central parts are more likely to be fed by cold gas than those of highly stripped satellites.\nWe illustrate this scenario in Fig. 
\\ref{fig:illus} ($\\spadesuit - \\clubsuit - \\diamondsuit$ for satellites and $\\spadesuit - \\heartsuit$ for isolated galaxies).\n\nStarvation as an explanation for the phenomenon shown in this work seems to be in line with \\citealt{2015Natur.521..192P, 2019MNRAS.tmp.2878T} which point out the major role of starvation in quenching the low-mass galaxy populations and the growing significance of starvation in denser environments.\nThough not reporting on the spatial distribution of star formation, \\citealt{2017MNRAS.464..508B} found that the same mechanism drives the enhancement of gas metallicity of satellite galaxies in the EAGLE simulations \\citep{2015MNRAS.446..521S}.\nThey found that the central gas metallicity is enhanced effectively when starvation suppresses the radial inflow of gas, which is predominantly metal-poor.\n\n\\subsubsection{Evidence in recent star formation history}\\label{subsubsec:sfhs}\n\nThe scenario above can have detectable consequences for the recent star formation history (SFH) in the central part of satellite and isolated galaxies.\nWe probe the recent SFH by the combination of \\ensuremath{\\mathrm{D4000_n}}\\ and \\ensuremath{\\mathrm{H\\delta_A}}\\ which trace stellar populations of different ages (see also \\citealt{2003MNRAS.341...33K}).\nFig. 
\\ref{fig:sfhs1} shows satellite and isolated galaxies of low mass and low sSFR on the \\ensuremath{\\mathrm{H\\delta_A}}-\\ensuremath{\\mathrm{D4000_n}}\\ plane, overlaid with evolutionary tracks of \\citealt{2003MNRAS.344.1000B} models (BC03).\n\nThe probability density function of galaxies (filled contours) is derived via kernel density estimation with $V_{\\mathrm{max}}$ corrections.\nWe use Gaussian kernel of width determined by Scott's rule \\citep{Scott2015Multivariate}.\nThen, we identify the ridge line (following \\citealt{2016ApJ...823...18C} and is shown by hatched area) for each density distribution as the representative track for the galaxy population.\nIn producing model tracks of exponentially declining SFHs (black dashed lines: declining timescale $\\tau=0.2,0.4,0.6\\,\\mathrm{Gyr}$; black solid lines: $\\tau=2,4,6\\,\\mathrm{Gyr}$), we use MILES stellar library of solar metallicity and Padova 1994 library for stellar evolution prescription.\nUsing other empirical or theoretical stellar libraries and other stellar evolution prescriptions provided in BC03 generates model tracks significantly incompatible with our data.\n\nThe contours show that, compared to isolated galaxies (left panel), a significantly higher fraction of satellites (right panel) populate the lower right area indicating again the suppressed central star formation of satellite galaxies.\nMoreover, the distribution of isolated galaxies is more concentrated around the ridge while that of satellites has a broader shape.\nThis may imply that group environment can diversify the SFH of galaxies.\nNoteworthily, while the ridge line of satellites can be overall matched by continuously declining SFHs of long timescales over Gyrs, the ridge line of isolated galaxies deviates obviously toward models of shorter timescales.\nSuch deviation is due to a non-negligible fraction of isolated galaxies with high \\ensuremath{\\mathrm{H\\delta_A}}\\ at given \\ensuremath{\\mathrm{D4000_n}}.\nAs 
\\ensuremath{\\mathrm{H\\delta_A}}\\ mainly traces A-type young stars, this elevated \\ensuremath{\\mathrm{H\\delta_A}}\\ indicates the significance of recent burst of star formation (see also the Fig. 6 in \\citealt{2003MNRAS.341...33K}) in the central part of isolated galaxies.\n\nThe observed difference in recent SFH between satellite and isolated galaxies fits into the scenario described before.\nThe existing hot gas halo of low-sSFR isolated galaxies can still fuel some small bursts of star formation, when the inefficient gas cooling (expected from low sSFR) is only able to drive gas radial flows episodically.\nBy contrast, the central part of satellites in starvation are more likely to turn red quiescently and smoothly when without further gas supply.\n\n\\subsection{Comparison with previous works}\\label{subsec:comparison}\n\nThe discussion above does not incorporate gas stripping caused outside-in quenching as a major driver of the cessation of total star formation in group environments.\nInstead, environments are observed to render quenching of low-mass galaxies more inside-out.\nHowever, it has to be clarified that the results do not indicate that gas stripping does not influence outer star formation to any extent.\nThe results only suggest that the inner parts of galaxies contribute primarily to the total decline of star formation under environmental effects, while the suppression of star formation in the outskirt is only secondary.\nThe conclusion is echoed by \\citealt{2019ApJ...872...50L}, who found that inside-out quenching is the highly dominant channel even for satellites in massive halos and the fraction of galaxies experiencing outside-in quenching does not depend on halo mass at all.\n\nThe same conclusion was not reached by many other works in the literature, which are also contradicting among themselves.\nUsing 1,494 MaNGA galaxies, \\citealt{2018MNRAS.476..580S} compared the sSFR radial profiles of central and satellite galaxies.\nTheir Fig. 
7 indicates that, in the intermediate and high mass bins, the sSFR of satellites are systematically lower than the central galaxies particularly outside 0.5 effective radius.\nFor galaxies in the low-mass bin, this pattern appears to be reversed, showing more inside-out quenching for satellites.\nIn spite of the general consistency among low-mass galaxies between \\citealt{2018MNRAS.476..580S} and our work, our data do not indicate the outside-in quenching for massive satellite galaxies.\n\\citealt{2019A&A...621A..98C} used a smaller sample of 275 late-type CALIFA galaxies and carried out similar analyses.\nAs entirely opposed to the results in \\citealt{2018MNRAS.476..580S}, for low-mass galaxies in groups they found more suppressed star formation in the outer parts compared with galaxies in the field, and for the massive galaxies, more suppressed in the inner parts.\nRather than being suppressed, the low-mass satellite galaxies studied by \\citealt{2019MNRAS.489.1436L} show centrally enhanced star formation in the densest environments.\nApart from these recent works based on IFS data, \\citealt{2009MNRAS.394.1213W} studied the g-r colour profiles of galaxies in the SDSS Data Release 4.\nThey found outside-in quenching pattern for the satellite galaxies in their high mass bin.\nIn their low-mass bin, the colour profiles of the satellites are globally redder compared to the central galaxies.\nTheir sample almost does not cover the low-mass range of our data.\n\nThe intricate discrepancies between works in the literature can result from a variety of reasons.\nNoteworthily, the samples were selected with diverse criteria.\nFor example, \\citealt{2017MNRAS.464..121S} only selected galaxies with central regions classified as star-forming by emission line diagnostics.\nThis may have biased their sample against centrally quenched galaxies, which would have weak emission lines in the center.\n\\citealt{2019MNRAS.489.1436L} introduced thresholds for signal to noise of 
emission lines during the sample selection.\nThe sample of \\citealt{2019A&A...621A..98C} was preselected by Hubble type.\nMoreover, a problem in some previous studies is that sSFR radial profiles are not compared at the same level of total sSFR for galaxies in different environments, even though many IFS studies \\citep[e.g.,][]{2018MNRAS.477.3014B,2018ApJ...856..137W} have shown that sSFR radial gradients clearly depend on the level of total sSFR.\nTherefore, extracting a more unambiguous dependence on environment needs better control of total sSFR, as we have done in this work.\n\n\n\\section*{Acknowledgements}\nBW acknowledges the detailed and constructive comments from the anonymous referee, which significantly helped improve this manuscript.\nBW thanks Li Shao for his insightful and decisive comments on this work, and thanks Jing Wang, Min Du, and Jingjing Shi for fruitful discussions.\n\nFunding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England.\nThe SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. 
The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.\n\n\n\\section*{Data Availability}\nThe data used in this work are all publicly available.\nWe take the MPA-JHU catalogue from https://wwwmpa.mpa-garching.mpg.de/SDSS/DR7/ and the GSWLC catalogue from https://salims.pages.iu.edu/gswlc/ and the group catalogue from https://gax.sjtu.edu.cn/data/Group.html for SDSS galaxies.\n\n\n\n\n\n\\bibliographystyle{mnras}\n", "meta": {"timestamp": "2022-08-31T02:08:17", "yymm": "2208", "arxiv_id": "2208.14004", "language": "en", "url": "https://arxiv.org/abs/2208.14004"}} {"text": "\\section{Introduction}\n\\label{sec:intro}\n\nMany areas of environmental science seek to predict a space-time variable of interest from observations at scattered points in the space-time field of study. 
Among modern techniques proposing efficient methods for estimation and prediction in a spatio-temporal framework, there is a clear distinction between two possible ways of constructing and treating spatio-temporal models \\citep{wikle2010}: either one follows the traditional geostatistical paradigm, using joint space-time covariance functions (see for example \\citet{cressie99}, \\citet{gneiting2002}, \\citet{stein2005}), or one uses dynamical models, by combining time series and spatial statistics (see for example \\citet{wikle99}, \\citet{sigrist2012}).\n\nWhile the theoretical aspects of spatio-temporal geostatistics show good progress \\citep{cressie2011}, implementations lag behind. The geostatistical paradigm can be computationally expensive for large spatio-temporal datasets, due to the factorization of dense $(N_SN_T,N_SN_T)$ covariance matrices, whose cost is of the order of $\\mathcal{O}((N_SN_T)^3)$, $N_S$ and $N_T$ being the number of points in space and time respectively. \\citet{banerjee2014} called this issue the ``big $n$ problem''. Moreover, it is hard to define complex space-time covariance functions. For this reason, separable space-time covariance functions have often been applied to spatio-temporal models in order to take advantage of their computational convenience, even when they are not realistic in describing the processes because they cannot represent space-time interaction in the covariance. Recent studies have focused on constructing non-separable models, which are physically more realistic, albeit computationally more expensive. Non-separable space-time covariance models can be constructed from Fourier transforms of permissible spectral densities, mixtures of separable models, and partial differential equations (PDEs) representing physical laws \\citep{chen2021,lindgren2022}. They can be fully symmetric or asymmetric, stationary or non-stationary, univariate or multivariate, and in the Euclidean space or on the sphere. 
See \\citet{porcu202130} for a recent comprehensive review.\n\nIn this paper, we follow the dynamic approach that makes use of physical laws and study models which are defined through Stochastic Partial Differential Equations (SPDEs). The SPDE approach relies on the representation of a continuously indexed Gaussian Field (GF) as a discretely indexed random process, i.e. a Gaussian Markov Random Field (GMRF, see \\citet{rue2005}). Passing from a GF to a GMRF, the covariance function and the dense covariance matrix are replaced respectively by a neighborhood structure and a sparse precision matrix. The advantage of using GMRFs is that sparse precision matrices allow computationally efficient numerical methods, especially for matrix factorization. The link between GFs and GMRFs in the purely spatial case has been pioneered by \\citet{lindgren2011}, who proposed to construct a GMRF representation of the spatial Mat\\'ern field on a triangulated lattice of the domain through the discretization of a diffusion SPDE with a Finite Element Method (FEM). We refer to \\citet{bakka2018tuto} for a simple explanation of FEM applied to the spatial SPDE or to Section \\ref{sec:discretization} for a detailed generalization to spatio-temporal SPDE. \n\nIn the spatial framework, major mathematical and algorithmic advances in the SPDE approach have been made \\citep{pereira2019thesis, pereira2019, fulgstad2015}, making it possible to efficiently process very large datasets, even in the presence of non-stationarities and varying local anisotropy. The development of SPDE-based approaches to Gaussian processes has led to several practical solutions, among which we find the R package for approximate Bayesian inference R-INLA \\citep{lindgren2015} that uses SPDEs to sample from spatial and separable spatio-temporal models. 
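\n\nFor concreteness, the purely spatial construction of \\citet{lindgren2011} can be sketched as follows, in generic notation that is only illustrative and need not match the notation used in Section \\ref{sec:discretization}. A stationary Mat\\'ern field $X$ in $d$ spatial dimensions is the stationary solution of\n\\[\n(\\kappa^2 - \\Delta)^{\\alpha/2} X(\\mathbf{s}) = \\tau\\, \\mathcal{W}(\\mathbf{s}),\n\\]\nwhere $\\Delta$ is the Laplacian, $\\mathcal{W}$ is a Gaussian white noise, $\\kappa > 0$ is a scale parameter governing the correlation range, $\\tau > 0$ scales the standard deviation of the field, and the smoothness of the resulting Mat\\'ern covariance is $\\nu = \\alpha - d/2$. Discretizing this equation with the FEM is what replaces the dense covariance matrix by a sparse precision matrix.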
\n\nWhen generalizing to the spatio-temporal framework, a direct space-time formulation of the SPDE approach was first suggested in \\citet{lindgren2011}, without any precise detail on estimation and prediction methods. The SPDE approach was coupled with the Bayesian framework by \\citet{cameletti2011} to provide a separable space-time model. Non-separable spatio-temporal models have been elaborated in \\citet{krainski2019} and \\citet{bakka2020diffusion} as a spatio-temporal generalization of the diffusion-Mat\\'ern model of \\citet{lindgren2011}. \n\nIn all the approaches overviewed above, the space-time processes are symmetric in the sense that the spatio-temporal covariance does not change when the sign of the space and/or time lag changes. However, atmospheric and geophysical processes are often asymmetric due to transport effects, such as air and water flows. \\citet{vergara2022general} defined new spatio-temporal models incorporating the physical processes linked to the studied phenomena (advection, diffusion, etc.). Problems relating to the estimation of the parameters and the conditioning on the observed data nevertheless remained open. \\citet{sigrist2015} built non-symmetric and non-separable space-time Gaussian fields as solutions to an advection-diffusion SPDE, with computationally efficient algorithms for statistical estimation using fast Fourier transforms and Kalman filters. The applicability of this approach remains limited, however, especially in a non-stationary context or with scattered data, as it relies on the Fourier transform of the data. \\citet{liu2020} extended the approach to spatially-varying advection-diffusion and non-zero mean source-sink, leading to a space-time covariance which is non-stationary in space. \n\nIn this work, we propose an alternative approach for dealing with spatio-temporal SPDEs that include both a diffusion and an advection term. 
In contrast to \\citet{sigrist2015}, we make use of the sparse formulation of the spatio-temporal field which is the approximate solution of the SPDE obtained by a combination of FEM and finite differences. This sparse formulation yields fast algorithms for parameter estimation and spatio-temporal prediction. We also treat the case of an advection-dominated SPDE, by introducing the Streamline Diffusion stabilization term in the SPDE. We apply our method to a practical dataset of solar irradiance, where a transport effect due to wind is clearly present.\n\nThe paper is organized as follows: Section \\ref{sec:spatiotemp_spde} first presents the background of the spatio-temporal SPDE approach, then defines the spatio-temporal advection-diffusion model developed in the paper and its discretization. Moreover, the stabilization of advection-dominated SPDEs is introduced. Section \\ref{sec:estimation} explores fast and scalable estimation methods, the kriging formula for prediction and conditional simulations. Section \\ref{sec:application} presents an application of the proposed spatio-temporal SPDE approach to a solar radiation dataset. Section \\ref{sec:discussion} discusses the advantages and the limitations of the approach and opens the way to further work on the subject. 
\n\n\n\\section{The spatio-temporal advection-diffusion SPDE and its discretization}\n\\label{sec:spatiotemp_spde}\n\n\\subsection{Background}\n\\label{sec:background_spde}\n\nThe spatial SPDE-based approach is a formalism introduced by \\citet{lindgren2011} (from an idea of \\citet{whittle54,whittle63}), that considers the solution $X(\\mathbf{s})$, $\\mathbf{s} \\in \\Omega\\subseteq\\mathbb{R}^d$, $d$ being the space dimension, of the SPDE \n\\begin{align}\n(\\kappa^2 - \\Delta)^{\\alpha/2} X(\\mathbf{s}) = \\tau W(\\mathbf{\\cdot}),\n\\label{eq:spatial_spde}\n\\end{align} \nwhere $\\Delta = \\sum_{i=1}^d \\frac{\\partial^2}{\\partial s_i^2}$ is the Laplacian operator and $W(\\cdot)$ is a standard spatial Gaussian white noise defined as a (zero-mean) Generalized Gaussian Random Field (GRF) with the property that the covariance measure of $W({\\cdot})$ over two subregions $A, B \\subseteq \\Omega$ is equal to the area measure of their intersection, $\\lvert A \\cap B \\rvert_\\Omega$. In principle, $W(\\cdot)$ has no element-wise definition, but for the sake of clarity, we will abuse notation and write $W(\\sbold)$.\n\n\\citet{whittle54,whittle63} showed that $X(\\mathbf{s})$, solution to Equation \\eqref{eq:spatial_spde}, is a spatial GF with Mat\\'ern covariance\n\\begin{equation}\n C(h)=\\frac{\\sigma^2}{2^{\\nu-1}\\Gamma(\\nu)}\\left(\\frac{h}{a}\\right)^\\nu \\mathcal{K}_\\nu\\left(\\frac{h}{a}\\right),\n\\label{eq:matern_spatial} \n\\end{equation}\nhaving regularity parameter $\\nu=\\alpha-d/2>0$, scale parameter $a=\\kappa^{-1}$ and variance $\\sigma^2=\\tau^{2}(4\\pi)^{-d/2}\\Gamma(\\nu) \\Gamma(\\nu+d/2)^{-1}\\kappa^{-2\\nu}$, and where $h=\\norm{\\mathbf{s}-\\mathbf{s}'}$ is the distance between two spatial locations and $\\mathcal{K}_\\nu$ is the modified Bessel function of the second kind of order $\\nu$. 
In particular, when $\\nu=1/2$ we get the exponential covariance function and when $\\nu\\rightarrow +\\infty$, after proper renormalization, we get the Gaussian covariance function.\n\nIn \\citet{lindgren2011}, the smoothness parameter $\\nu$ considered in the Mat\\'ern covariance function corresponds to integer values of $\\alpha$, hence $\\nu = 0, 1, 2, \\dots$ in $\\bbR^2$. When non-integer values of $\\nu$ are introduced in the modeling, the SPDE is said to be fractional. A review of results and applications of the fractional SPDE approach is available in \\citet{bolin2019, roques2022}, but this case will not be treated further in this work.\n\nWhen generalizing to spatio-temporal processes $X(\\sbold,t)$, we consider the general framework proposed in \\citet{vergara2022general} for extending the SPDE approach to a wide class of linear spatio-temporal SPDEs. Let us denote $\\xibold \\in \\bbR^2$ a spatial frequency and $\\omega \\in \\bbR$ a temporal frequency. The space-time white noise with unit variance, denoted $W(\\sbold,t)$, is characterized by its spectral measure $d\\mu_W(\\xibold, \\omega) = (2\\pi)^{-(d+1)/2} d\\xibold d\\omega$.\nNew spatio-temporal models were obtained from known PDEs describing physical processes, such as diffusion, advection, and oscillations with stochastic forcing terms. 
In particular, Theorem 1 in \\citet{vergara2022general} provides necessary and sufficient conditions for the existence and uniqueness of stationary solutions to the following general SPDE\n\\begin{equation}\n \\left[\\frac{\\partial^\\beta}{\\partial t^\\beta} + \\mathcal{L}_g\\right] X(\\sbold,t) = W_S(\\sbold) \\otimes W_T(t).\n \\label{eq:evol_eq}\n\\end{equation}\nIn \\eqref{eq:evol_eq}, the spatial operator $\\mathcal{L}_g$ is defined using the spatial Fourier transform on $\\bbR^2$ (denoted $\\mathcal{F}_S$):\n$$\\mathcal{L}_g(\\cdot)=\\mathcal{F}_S^{-1}(g\\mathcal{F}_S(\\cdot)),$$\nwhere $g : \\bbR^2 \\to \\bbC$ is a sufficiently regular and Hermitian-symmetric function called the \\textit{symbol function} of the operator $\\mathcal{L}_g$. The temporal operator $\\frac{\\partial^\\beta}{\\partial t^\\beta}$ is \n$$\\frac{\\partial^\\beta}{\\partial t^\\beta}(\\cdot)=\\mathcal{F}_T^{-1}((i\\omega)^\\beta\\mathcal{F}_T(\\cdot)),$$\nwhere $\\mathcal{F}_T$ is the temporal Fourier transform on $\\mathbb{R}$ and where we have used the symbol function over $\\bbR$\n$$\\omega \\mapsto (i\\omega)^\\beta = \\lvert \\omega\\rvert^\\beta e^{i\\sgn(\\omega)\\beta\\pi/2}.$$\nThe spatio-temporal symbol function of the operator involved in \\eqref{eq:evol_eq} is thus\n\\begin{equation*}\n (\\xibold, \\omega) \\mapsto (i\\omega)^\\beta + g(\\xibold) = \\lvert\\omega\\rvert^\\beta \\cos\\left(\\frac{\\beta \\pi}{2}\\right) + g_R(\\xibold) + i\\left(\\sgn(\\omega)\\lvert\\omega\\rvert^\\beta \\sin\\left(\\frac{\\beta \\pi}{2}\\right) + g_I(\\xibold)\\right)\n\\end{equation*}\nwhere $g_R$ and $g_I$ are the real and imaginary parts of the spatial symbol function $g(\\xibold)$. 
Theorem 1 and Proposition 3 in \\citet{vergara2022general} state that \\eqref{eq:evol_eq} admits a unique stationary solution if and only if $g_R$ is such that $\\lvert g_R\\rvert$ is bounded from below by the inverse of a strictly positive polynomial and $g_R\\cos\\left(\\frac{\\beta \\pi}{2}\\right)\\geq 0$.\n\n\\subsection{The spatio-temporal advection-diffusion SPDE}\n\\label{sec:model}\nThe advection-diffusion equation is a Partial Differential Equation (PDE) that describes physical phenomena where particles, energy, or other physical quantities evolve inside a physical system due to two processes: diffusion and advection. Advection represents the mass transport due to the average velocity of all molecules, and diffusion represents the mass transport due to the instantaneously varying velocity of individual molecules, compared to the average velocity of the fluid as a whole.\n\nWe here introduce the non-separable space-time models (with $\\sbold \\in \\bbR^2$), derived from advection-diffusion SPDEs, i.e., advection-diffusion PDEs where a stochastic noise is added as a forcing term. The model is similar to the one presented in \\citet{lindgren2011}, \\citet{sigrist2015}, \\citet{liu2020} and \\citet{vergara2022general} and reads \n\\begin{equation}\n \\left[\\frac{\\partial}{\\partial t} + \\frac{1}{c}(\\kappa^2 - \\nabla \\cdot \\Hbold\\nabla)^{\\alpha} + \\frac{1}{c}\\gammabold \\cdot \\nabla \\right] X(\\sbold,t) = \\frac{\\tau}{\\sqrt{c}} Z(\\sbold,t),\n \\label{eq:adv_diff} \n\\end{equation}\nwhere\n\\begin{itemize}\n \\item the operator $\\gammabold \\cdot \\nabla$ models the \\textit{advection}, where $\\gammabold=(\\gamma_x,\\gamma_y) \\in \\bbR^2$ is a velocity vector;\n \\item the operator $\\nabla \\cdot \\Hbold\\nabla$ is a \\textit{diffusion} term that can incorporate \\textit{anisotropy} in the matrix $\\Hbold$. 
When the field is isotropic, i.e., $\\Hbold = \\lambda\\Ibold$, this term reduces to the Laplacian operator $\\lambda\\Delta$;\n \\item $\\kappa^2 >0$ accounts for \\textit{damping};\n \\item $c$ is a time-scale parameter;\n \\item $\\alpha$ is either equal to 1 in the presence of diffusion or equal to 0 when there is no diffusion behavior;\n \\item $\\tau$ is a standard deviation factor;\n \\item $Z$ is a stochastic forcing term.\n\\end{itemize}\n\nSince Equation \\eqref{eq:adv_diff} explicitly models phenomena such as transport and diffusion, the parameters can be given a physical interpretation if desired, but the use of the SPDE is not restricted to situations where it is a priori known that phenomena such as transport and diffusion occur \\citep{sigrist2015}. The stochastic forcing term $Z(\\sbold,t)$ is assumed separable with\n\\begin{align*}\nZ(\\mathbf{s},t) = W_T(t) \\otimes Z_S(\\mathbf{s}),\n\\end{align*}\nwhere $Z_S$ is a spatial Generalized Random Field and $W_T$ is a temporal white noise. $Z_S$ is often chosen to be a spatial white noise, denoted $W_S$ in this case. In order to ensure a given regularity for $Z$, $Z_S$ can alternatively be chosen to be a \\textit{colored noise}, such as for example the solution to the spatial Whittle-Mat\\'ern SPDE \\citep{lindgren2011}\n\\begin{equation}\n(\\kappa^2 - \\nabla \\cdot \\Hbold\\nabla)^{\\alpha_S/2} Z_S(\\mathbf{s}) = W_S(\\mathbf{s}).\n\\label{eq:col_noise}\n\\end{equation}\nThe spatial dependence of $Z_S$ ensures a desired regularity of the solutions, and is physically interpreted as a smoothing mechanism for introducing new mass into the system. \n\n\\begin{proposition}\n\\label{prop:spatial_trace_sep}\nLet the coefficients of the SPDE \\eqref{eq:adv_diff} be such that $\\alpha=0$ and $\\gammabold=\\0bold$; the spatial operator applied to the spatio-temporal field $X(\\sbold,t)$ is then the constant value $c^{-1}$. 
Let $Z(\\sbold,t)$ be a spatio-temporal noise colored in space, with $Z_S(\\sbold)$ satisfying \\eqref{eq:col_noise}. If $\\alpha_S>1$, the solution of the SPDE is a separable spatio-temporal field whose covariance is the product of an exponential temporal covariance (with scale parameter equal to $c$) and a Mat\\'ern spatial covariance \\eqref{eq:matern_spatial} with scale parameter equal to $a = \\kappa^{-1}$, regularity parameter $\\nu=\\alpha_S-1$ and marginal variance equal to \n$$\\sigma^2 = \\frac{\\tau^{2}\\Gamma(1/2)\\Gamma(\\alpha_S -1)}{\\Gamma(1)\\Gamma(\\alpha_S)8\\pi^{3/2}\\kappa^ {2(\\alpha_S-1)}\\lvert\\Hbold\\rvert^{1/2}}.$$\n\\end{proposition}\n\nIn order to get an identifiable model, we will always set $\\Hbold[1,1]=1$. Proposition \\ref{prop:spatial_trace_sep} is a particular case of Corollary 3.3 of \\citet{bakka2020diffusion}, hence the proof is not reported here. \n\nWhen $\\alpha=1$ and $\\gammabold$ is non-null, $X(\\sbold,t)$ is a non-separable spatio-temporal field. \nThe advection-diffusion equation \\eqref{eq:adv_diff} is a particular first order evolution model as in Equation \\eqref{eq:evol_eq} with $\\beta=1$. Its spatial symbol function \n$$g(\\xibold) = \\frac{1}{c}(\\kappa^2 + \\xibold^\\top \\Hbold \\xibold + i\\gammabold^\\top\\xibold),$$\nsatisfies the necessary and sufficient condition for existence and uniqueness of a stationary solution recalled at the end of Section \\ref{sec:background_spde}. The spatial behavior of this model is described by the spatial SPDE \\citep{vergara2022general}\n\\begin{equation*}\n\\frac{\\sqrt{2}}{c}(\\kappa^2 - \\nabla \\cdot \\Hbold\\nabla)^{1/2} X_S(\\sbold) = \\frac{\\tau}{\\sqrt{c}}Z_S(\\sbold).\n\\end{equation*}\n\n\\begin{proposition}\n\\label{prop:spatial_trace}\nLet $Z(\\sbold,t)$ be a spatio-temporal noise colored in space (such that $Z_S(\\sbold)$ satisfies \\eqref{eq:col_noise}), and let us define $\\alpha_{tot}=\\alpha + \\alpha_S$. 
If $\\alpha_{tot}>1$, the spatial trace of the solution $X(\\sbold,t)$ of the SPDE \\eqref{eq:adv_diff} is a Mat\\'ern field with covariance \\eqref{eq:matern_spatial} with $a=\\kappa^{-1}$, $\\nu = \\alpha_{tot}-1$ and marginal variance $\\sigma^2$ equal to \n\\begin{equation}\n\t\\sigma^2 = \\frac{\\tau^{2}\\Gamma(1/2)\\Gamma(\\alpha_{tot}-1)}{\\Gamma(1)\\Gamma(\\alpha_{tot})8\\pi^{3/2} \\kappa^{2(\\alpha_{tot}-1)} \\lvert\\Hbold\\rvert^{1/2}}.\n\t\\label{eq:variance}\n\\end{equation}\n\\end{proposition}\n\nA similar proposition, along with its proof, can be found in Proposition 3.1 in \\citet{bakka2020diffusion}. \n\n\n\\subsection{Discretization}\n\\label{sec:discretization}\n\nThe advection-diffusion SPDE \\eqref{eq:adv_diff} can be discretized in time and space using a Finite Difference Method (FDM) and a Finite Element Method (FEM), respectively. Since implicit solvers are usually less sensitive to numerical instability than explicit solvers, we choose the implicit Euler scheme for the temporal discretization. This choice prevents errors in the convergence of the algorithm. The FEM method for the spatial discretization is the Continuous Galerkin method with Neumann Boundary Conditions. This choice is justified in detail in \\citet{lindgren2011}.\n\nFirst, the Implicit Euler scheme is applied to the temporal derivative in order to obtain an approximation of the temporal behavior defined at each time step of the discretization. Let us consider a triangulation $\\cal T$ of the spatial domain $\\Omega \\subset \\mathbb{R}^2$ with $N_S$ vertices $\\{\\sbold_1,\\dots,\\sbold_{N_S}\\} \\subset \\mathbb{R}^2$ and define $h$ as the size of the triangulation. Let us then consider the purely spatial SPDE defined at each discretization time step. 
By writing its stochastic weak formulation, we construct a finite element representation $X_h$ of the solution of the spatial SPDE as a linear combination of piecewise linear basis functions $\\{\\psi_i\\}_{i=1}^{N_S}$ equal to $1$ at the node $\\sbold_i$ and $0$ at all the other nodes, and Gaussian distributed weights $\\{x_i\\}_{i=1}^{N_S}$, such that\n$$X_h = \\sum_{i=1}^{N_S} x_i\\psi_i.$$\nThe weights determine the values of the field at the vertices, while the values in the interior of the triangles are determined by linear interpolation. The continuous Galerkin finite dimensional solution is obtained by finding the distribution for the representation weights that fulfills the stochastic weak SPDE formulation for the specific set of test functions equal to the basis functions, $\\{\\psi_i\\}_{i=1}^{N_S}$.\n\nGiven the spatial white noise $W_S(\\mathbf{s})$, for any set of test functions $\\{\\psi_i\\}_{i=1}^{N_S}$, the integrals $$\\int_\\Omega \\psi_i(\\sbold) W_S(\\mathbf{s})d\\sbold \\quad i=1,\\dots,N_S$$\nare jointly Gaussian, with expectation and covariance measures given by\n\\begin{eqnarray*}\n\t& \\mathbb{E}\\left(\\int_\\Omega \\psi_i(\\sbold) W_S(\\mathbf{s})d\\sbold\\right)= 0,\\\\\n\t& \\text{Cov}\\left(\\int_\\Omega \\psi_i(\\sbold) W_S(\\mathbf{s})d\\sbold, \\int_\\Omega \\psi_j(\\sbold) W_S(\\mathbf{s})d\\sbold\\right) = \\int_\\Omega \\psi_i(\\sbold)\\psi_j(\\sbold)d\\sbold.\n\\end{eqnarray*}\n\n\n\\begin{proposition}\n\\label{prop:discr_adv_diff}\nLet $X(\\sbold,t)$ be the spatio-temporal process of Equation \\eqref{eq:adv_diff} with $\\alpha=1$ and $\\Hbold=\\Ibold$. Let $Z(\\sbold,t)$ be a spatio-temporal white noise $W(\\sbold,t)$. Let $\\mathcal{T}$ be a triangulation of the spatial domain $\\Omega$ of the process $X(\\sbold,t)$. Let $\\{\\psi_i\\}_{i=1}^{N_S}$ be the piecewise linear basis functions defined over $\\mathcal{T}$. 
Finally, let $\\Mbold$ (mass matrix), $\\Gbold$ (stiffness matrix), $\\Bbold$ (advection matrix) and $\\Kbold$ be the $(N_S, N_S)$ matrices, with entries\n\\begin{align*}\n M_{ij}&= \\int_\\Omega \\psi_i(\\sbold) \\psi_j(\\sbold)d\\sbold, \\nonumber \\\\\n G_{ij}&= \\int_\\Omega \\nabla\\psi_i(\\sbold)\\cdot \\nabla\\psi_j(\\sbold)d\\sbold, \\\\ \n B_{ij}&= \\int_\\Omega \\gammabold\\cdot\\nabla\\psi_i(\\sbold)\\psi_j(\\sbold)d\\sbold, \\\\\n (K_{\\kappa^2})_{ij} &= \\kappa^2 M_{ij} + G_{ij}.\n\\end{align*}\nThen, the continuous Galerkin finite element solution vector $\\xbold = \\{x_i\\}_{i=1}^{N_S}$, defined on the vertices of the triangulation $\\mathcal{T}$ satisfies at each discretization time step $(t+dt)$\n\\begin{equation}\n\\left(\\Mbold+\\frac{dt}{c}(\\Kbold+\\Bbold)\\right)\\mathbf{x}_{t+dt}=\\Mbold\\mathbf{x}_{t}+\\frac{\\tau\\sqrt{dt}}{\\sqrt{c}} \\Mbold^{1/2}\\mathbf{z}_{t},\n\\label{eq:implicit_scheme}\n\\end{equation}\nwhere $\\mathbf{z}_t$ is a standard independent Gaussian vector, $\\mathbf{z}_t \\sim \\mathcal{N}(\\0bold,\\Ibold)$, and $\\Mbold^{1/2}$ is any matrix such that $\\Mbold^{1/2}\\Mbold^{1/2} = \\Mbold$. \nWhen the noise on the right-hand side is colored in space, the discretization reads\n\\begin{equation*}\n\t\\left(\\Mbold+\\frac{dt}{c}(\\Kbold+\\Bbold)\\right)\\mathbf{x}_{t+dt}=\\Mbold\\mathbf{x}_{t}+\\frac{\\tau\\sqrt{dt}}{\\sqrt{c}} \\Lbold_S^\\top\\mathbf{z}_{t},\n\\end{equation*}\nwhere $\\Lbold_S$ is the Cholesky decomposition of the discretized spatial precision matrix $\\Qbold_{S}$ of the solution of the spatial SPDE $(\\kappa^2 -\\nabla \\cdot \\Hbold \\nabla)^{\\alpha_S/2}X_S(\\sbold) = W(\\sbold)$, obtained with the Continuous Galerkin FEM \\citep{lindgren2011}.\n\\end{proposition}\n\n\\begin{proof}\nThe proof is available in Appendix \\ref{sec:A1}.\n\\end{proof}\n\nWe remark that the elements of the matrices $\\Mbold$, $\\Gbold$ and $\\Bbold$ are non-zero only for pairs of basis functions which share common triangles. 
This implies that the matrix $(\\Mbold+\\frac{dt}{c}(\\Kbold+\\Bbold))$ is sparse and that Equation \\eqref{eq:implicit_scheme} can be solved by Cholesky decomposition in an efficient way. \n\n\n\\subsection{Stabilization of advection-dominated SPDE}\n\\label{sec:adv_domination}\nWhen the advection term is too large relative to the diffusion term, we obtain an advection-dominated SPDE. The advection-dominated flow is defined with respect to the P\\'eclet number $\\text{Pe}^h = \\frac{\\norm{\\gammabold} h}{2\\lambda}$, where $\\lambda$ is the coefficient of the isotropic Laplacian operator: when $\\text{Pe}^h > 1$, the flow is advection-dominated. When the advection dominates the diffusion, it is well known that the stability of the Finite Element method with the continuous Galerkin method is unsatisfactory (see, for example, \\citet{mekuria2016} or \\citet{quarteroni2008}, Chapter 5). This is due to the fact that the advection term is non-symmetric, and causes the condition number of the matrix $\\left[\\Mbold + \\frac{dt}{c}(\\Kbold +\\Bbold)\\right]$ to increase, which leads to oscillations in the solution and may induce instability. A possible solution could be to refine the triangulation until the advection no longer dominates at the element level. However, in many cases this is not feasible because it would be too computationally demanding. Several stabilization terms can be introduced, some of which are more accurate than others. A detailed explanation of the stabilization approach is reported in Appendix \\ref{sec:A2}.\n\nIn our case, as a trade-off between order of accuracy and computational complexity, we opt for the Streamline Diffusion stabilization term, which stabilizes the extra advection by introducing an artificial diffusion term along the advection direction. \n\n\\begin{proposition}\n\\label{prop:discr_adv_diff_stab}\nLet us consider the same hypotheses as in Proposition \\ref{prop:discr_adv_diff}. 
The introduction of the Streamline Diffusion stabilization implies that the discretization defined in Equation \\eqref{eq:implicit_scheme} becomes\n\\begin{eqnarray}\n\\left(\\Mbold+\\frac{dt}{c}(\\Kbold+\\Bbold+\\Sbold)\\right) \\mathbf{x}_{t+dt}=\\Mbold\\mathbf{x}_{t} + \\frac{\\tilde\\tau\\sqrt{dt}}{\\sqrt{c}}\\Mbold^{1/2} \\mathbf{z}_{t},\n\\label{eq:implicit_scheme_stab}\n\\end{eqnarray} \nwhere $\\Sbold = [S_{ij}]_{i,j=1}^{N_S}$ is the matrix of the Streamline Diffusion stabilization operator $\\mathcal{S}$, such that $S_{ij} = \\mathcal{S}(\\psi_i,\\psi_j)=h\\lvert\\gammabold\\rvert^{-1}\\int_\\Omega (\\gammabold\\cdot \\nabla \\psi_i )(\\gammabold\\cdot \\nabla \\psi_j) d\\sbold$, and $\\tilde\\tau= \\tau \\left( \\lvert\\Hbold + h\\gammabold\\lvert\\gammabold\\rvert^{-1}\\gammabold^\\top \\rvert \\right)^{-1/4} \\left( \\lvert\\Hbold\\rvert \\right)^{1/4}$ (see, for example, \\citet{fulgstad2015}). \nIf the noise on the right-hand side of Equation \\eqref{eq:adv_diff} is colored in space, the discretization reads\n\\begin{equation*}\n\\left(\\Mbold+\\frac{dt}{c}(\\Kbold+\\Bbold+\\Sbold)\\right) \\mathbf{x}_{t+dt}=\\Mbold\\mathbf{x}_{t} + \\frac{\\tilde\\tau\\sqrt{dt}}{\\sqrt{c}} \\Lbold_S^\\top \\mathbf{z}_{t},\n\\end{equation*}\nwhere $\\Lbold_S$ is the Cholesky decomposition of the discretized spatial precision matrix $\\Qbold_S$ of the solution of the spatial SPDE $(\\kappa^2 -\\nabla \\cdot \\Hbold \\nabla)^{\\alpha_S/2}X_S(\\sbold) = W(\\sbold)$, obtained with the Continuous Galerkin FEM \\citep{lindgren2011}. With this parametrization, the marginal variance of the spatial trace of the field is the same as the one defined in Equation \\eqref{eq:variance} with $\\tilde{\\tau}$ instead of $\\tau$. \n\\end{proposition}\n\nThe Streamline Diffusion operator $\\mathcal{S}$ can be considered from another point of view as a perturbation of the original SPDE \\citep{bank1990}. 
In fact, by making the classical hypothesis of Neumann boundary condition on $\\Omega$ and by using Green's first identity, we have that\n$$\\int_\\Omega(\\gammabold\\cdot \\nabla x )(\\gammabold\\cdot \\nabla v) d\\sbold = -\\int_\\Omega\\nabla \\cdot (\\gammabold\\gammabold^\\top) \\nabla x vd\\sbold.$$\nThis means that the original SPDE \\eqref{eq:adv_diff} can be rewritten with an additional diffusion term as \n\\begin{equation}\n \\left[\\frac{\\partial}{\\partial t} + \\frac{1}{c} \\left(\\kappa^2 - \\nabla \\cdot \\left(\\Hbold + h\\lvert\\gammabold\\rvert^{-1}\\gammabold\\gammabold^\\top\\right) \\nabla + \\gammabold \\cdot \\nabla \\right)\\right] X(\\sbold,t) = \\frac{\\tilde\\tau}{\\sqrt{c}}Z(\\sbold,t).\n \\label{eq:diff_adv_sl_spde}\n\\end{equation}\nThe term $(h\\lvert\\gammabold\\rvert^{-1}\\gammabold\\gammabold^\\top)$ acts as an anisotropic ``diffusion'' tensor that is added to the anisotropy (or identity) matrix $\\Hbold$ of the original diffusion. This extra diffusion stabilizes the advection directed along the direction $\\gammabold$.\n\n\n\\subsection{Spatio-temporal Gaussian Markov Random Field approximation}\n\\label{sec:prec_mat}\n\n\\begin{proposition}\nIn the presence of an advection-dominated flow and a spatio-temporal white noise on the right-hand side of Equation \\eqref{eq:adv_diff}, the discretized vector $\\xbold_{t+dt}$ on the mesh $\\cal T$ at each time step can be found as the solution of the following equation:\n\\begin{align}\n\\mathbf{x}_0 &\\sim \\mathcal{N}(\\0bold,\\Sigmabold_0), \\nonumber\\\\\n\\mathbf{x}_{t+dt}&=\\Dbold\\mathbf{x}_{t} + \\Ebold \\mathbf{z}_{t},\n\\label{eq:timestep}\n\\end{align}\nwhere \n\\begin{eqnarray}\n\\Dbold & = & (\\Mbold + \\frac{dt}{c}(\\Kbold + \\Bbold + \\Sbold))^{-1}\\Mbold, \\nonumber \\\\\n\\Ebold & = & \\frac{\\tilde\\tau \\sqrt{dt}}{ \\sqrt{c}}(\\Mbold + \\frac{dt}{c}(\\Kbold + \\Bbold + \\Sbold))^{-1}\\Mbold^{1/2},\n\\label{eq:matrices_timestep}\n\\end{eqnarray}\nand 
$\\mathbf{z}_{t}\\sim\\mathcal{N}(\\0bold,\\Ibold)$ is independent of $\\mathbf{x}_0,\\dots,\\mathbf{x}_t$. In the presence of a spatio-temporal noise colored in space on the right-hand side of Equation \\eqref{eq:adv_diff}, the matrix $\\Ebold$ reads \n$$\\Ebold = \\frac{\\tilde\\tau \\sqrt{dt}}{ \\sqrt{c}}(\\Mbold + \\frac{dt}{c}(\\Kbold + \\Bbold + \\Sbold))^{-1}\\Lbold_S^\\top,$$\nwith $\\Lbold_S$ defined in Proposition \\ref{prop:discr_adv_diff_stab}.\n\\end{proposition}\n\n\\begin{proof}\nStarting from Equation \\eqref{eq:implicit_scheme_stab}, which represents the numerical scheme for the advection-diffusion spatio-temporal SPDE with stabilization, it is straightforward to obtain \\eqref{eq:timestep}.\n\\end{proof}\n\nWhen the SPDE is not advection-dominated (and hence no stabilization term is needed), Equation \\eqref{eq:matrices_timestep} is replaced by the analogous equation in which the matrix $\\Sbold$ is removed and $\\tilde\\tau$ is replaced by $\\tau$.\n\nThe covariance matrix $\\Sigmabold_0$ can be taken to be equal to any admissible positive definite matrix. The speed of convergence to the stationary solution of the equation will depend on the proximity of the covariance of the spatial trace of $X(\\sbold,t)$ to $\\Sigmabold_0$. When the noise is colored in space, we can set $\\Sigmabold_0$ to be equal to the Mat\\'ern covariance of the spatial trace defined in Proposition \\ref{prop:spatial_trace}.\n\nTo make the precision matrix sparse and thus obtain a GMRF model, $\\Mbold$ should be replaced by the diagonal matrix $\\tilde{\\Mbold}$, where $\\tilde{\\Mbold}_{ii} = \\langle \\psi_i,1 \\rangle$. This technique is called mass lumping and is common practice in FEM. 
From now on, we always use the lumped matrix $\\tilde{\\Mbold}$, but for ease of reading, it will still be denoted $\\Mbold$.\n\n\\begin{proposition}\\label{prop:global_prec}\nLet us denote by $\\xbold_{0:t}=[\\mathbf{x}_0,\\dots,\\mathbf{x}_{t}]^\\top$ the vector containing all spatial solutions until time step $t$. \nThe global precision matrix $\\Qbold$ of the vector $\\xbold_{0:t}$ reads\n\\begin{equation}\n\\Qbold = \\begin{pmatrix}\n\\Sigmabold_0^{-1}+\\Dbold^\\top\\Fbold^{-1}\\Dbold & -\\Dbold^\\top\\Fbold^{-1} & 0 & \\dots & 0 \\\\\n-\\Fbold^{-1}\\Dbold & \\Fbold^{-1}+\\Dbold^\\top\\Fbold^{-1}\\Dbold & -\\Dbold^\\top\\Fbold^{-1} & \\ddots & \\vdots \\\\\n\\vdots & \\ddots & \\ddots & \\ddots & 0 \\\\\n\\vdots & \\ddots & -\\Fbold^{-1}\\Dbold & \\Fbold^{-1}+\\Dbold^\\top\\Fbold^{-1}\\Dbold & -\\Dbold^\\top\\Fbold^{-1} \\\\\n0 & \\dots & 0 & -\\Fbold^{-1}\\Dbold & \\Fbold^{-1}\n\\end{pmatrix},\n\\label{eq:prec_matrix}\n\\end{equation}\nwhere $\\Fbold=\\Ebold \\Ebold^\\top$.\n\\end{proposition}\n\n\\begin{proof}\nThe proof is available in Appendix \\ref{sec:A3}.\n\\end{proof}\n\n\n\\section{Estimation, prediction and simulation}\n\\label{sec:estimation}\n\nThis section explores the techniques for estimation and prediction of spatio-temporal processes following the SPDE approach described in Section \\ref{sec:spatiotemp_spde}. We will consider the SPDE \\eqref{eq:adv_diff} with $\\alpha=1$, $\\Hbold = \\Ibold$ (isotropic diffusion) and colored noise in space with $\\alpha_S=2$. Similar computations can be generalized to other values of $\\alpha_S$ such that $\\alpha_S/2$ is an integer or to anisotropic diffusion.\n\nWe consider $n$ spatio-temporal data $\\zbold$ scattered in the spatio-temporal domain $\\Omega \\times [0,T]$, discretized in space with a triangulation $\\cal T$ with $N_S$ nodes and discretized in time by means of $N_T$ regular time steps. We shall denote this space-time discretization ${\\cal T}' = {\\cal T} \\times \\{ 1/T,\\dots,N_T/T\\}$. 
We assume a statistical model with fixed and random effects. The fixed effect is a regression on a set of covariates and the random effect is modeled as the FEM discretization of a random field described by the SPDE \\eqref{eq:adv_diff} with the addition of random noise:\n\\begin{equation}\n \\zbold=\\mathbf{\\boldsymbol{\\eta}}\\bbold + \\mathbf{A}^\\top \\xbold + \\sigma_0 \\varepsilonbold,\n \\label{eq:stat_model}\n\\end{equation}\nwhere $\\bbold$ is the vector of $q$ fixed effects and $\\boldsymbol{\\eta}$ is a $(n,q)$ matrix of covariates with $[\\boldsymbol{\\eta}]_{jk} = \\eta_k(\\sbold_j,t_j)$, $j=1,\\dots,n$ and $k=1,\\dots,q$. The matrix $\\Abold$ is the $(N_SN_T,n)$ projection matrix between the points in ${\\cal T}'$ and the data, and $\\varepsilonbold$ is a standard Gaussian random vector with independent components.\n\n\\subsection{Estimation of the parameters}\n\\label{sec:estimation_scattered}\n\nThe parameters of the SPDE are estimated using Maximum Likelihood (ML). We collect the parameters of the SPDE in the vector $\\thetabold^\\top= (\\kappa, \\gamma_x, \\gamma_y, c, \\tau)$, while the parameters of the statistical model are collected in $\\psibold^\\top = (\\thetabold^\\top,\\bbold^\\top,\\sigma_0)$.\nFollowing \\eqref{eq:stat_model}, $\\zbold$ is a Gaussian vector with expectation $\\boldsymbol{\\eta}\\bbold$ and covariance matrix \n$$\\Sigmabold_{\\zbold}=\\Abold^\\top \\Qbold^{-1}(\\thetabold)\\Abold+\\sigma_0^2 \\Ibold_{(n,n)},$$\nwhere $\\Qbold(\\thetabold)$ is a precision matrix of size $(N_SN_T, N_SN_T)$ depending on the parameters $\\thetabold$. For ease of notation, we use $\\Qbold$ instead of $\\Qbold(\\thetabold)$. 
The log-likelihood is equal to\n\\begin{equation}\n\\mathcal{L}(\\psibold)=-\\frac{n}{2}\\log(2\\pi)-\\frac{1}{2}\\log \\lvert\\Sigmabold_{\\zbold}(\\psibold)\\rvert -\\frac{1}{2}(\\zbold-\\boldsymbol{\\eta}\\bbold)^\\top\\Sigmabold_{\\zbold}^{-1}(\\psibold)(\\zbold-\\boldsymbol{\\eta}\\bbold).\n\\label{eq:loglike_latent}\n\\end{equation}\n\nWe use the Broyden, Fletcher, Goldfarb, and Shanno (BFGS) optimization algorithm, a quasi-Newton method that builds an approximation of the second-order derivative (Hessian) of the objective function from successive gradient evaluations. The gradients of the log-likelihood function \\eqref{eq:loglike_latent} with respect to the different parameters included in $\\psibold$ are approximated with a Finite Difference Method.\n\nWe now propose a simpler formulation of each term of the profile log-likelihood \\eqref{eq:loglike_latent} to speed up the computations. Concerning the term $\\log\\lvert\\Sigmabold_{\\zbold}\\rvert$ we derive the following property:\n\n\\begin{proposition}\nIn the framework outlined above, we have \n\\begin{equation}\n \\log\\lvert\\Sigmabold_{\\zbold}\\rvert=n\\log\\sigma_0^2 -\\log\\lvert\\Qbold\\rvert+\\log\\lvert\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top\\rvert.\n \\label{eq:logdet_sigmaz}\n\\end{equation}\n\\end{proposition}\n\n\\begin{proof}\nTo compute $\\log\\lvert\\Sigmabold_{\\zbold}\\rvert$, let us consider the augmented matrix \n\\begin{equation}\n\\Sigmabold_c=\\begin{pmatrix} \\Qbold^{-1} & \\Qbold^{-1}\\Abold\\\\ \\Abold^\\top \\Qbold^{-1} & \\Sigmabold_{\\zbold}\\end{pmatrix}.\n\\label{eq:augmented_mat}\n\\end{equation}\nBy using block formulas, we have \n$$\\log\\lvert\\Sigmabold_c\\rvert=-\\log\\lvert\\Qbold\\rvert+n\\log\\sigma_0^2,$$\nand\n$$\\log\\lvert\\Sigmabold_c\\rvert=\\log\\lvert\\Sigmabold_{\\zbold}\\rvert+\\log\\lvert\\Qbold^{-1}-\\Qbold^{-1}\\Abold\\Sigmabold_{\\zbold}^{-1}\\Abold^\\top \\Qbold^{-1}\\rvert=\\log\\lvert\\Sigmabold_{\\zbold}\\rvert-\\log\\lvert\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top\\rvert,$$\nwhere the last equality is a 
consequence of the Woodbury identity.\nThis leads to the result.\n\\end{proof}\n\n\\begin{proposition}\nThe term $\\log\\lvert\\Qbold\\rvert$ of Equation \\eqref{eq:logdet_sigmaz} can be computed with the computationally cheap formula\n\\begin{equation}\n \\log\\lvert \\Qbold\\rvert = \\log\\lvert \\Sigmabold_0^{-1}\\rvert+(N_T-1)\\log\\lvert \\Fbold^{-1}\\rvert,\n \\label{eq:logdet_Q}\n\\end{equation}\nwhere $N_T$ is the number of time steps and $\\Fbold^{-1}=\\frac{c}{\\tilde\\tau^2 dt}(\\Mbold+\\frac{dt}{c}(\\Kbold+\\Bbold+\\Sbold))^\\top \\Mbold^{-1} (\\Mbold+\\frac{dt}{c}(\\Kbold+\\Bbold+\\Sbold))$. $\\Mbold^{-1}$ is replaced by $\\Qbold_S$ when the noise is colored in space. $\\lvert \\Fbold^{-1}\\rvert$ is the determinant of an $(N_S, N_S)$ sparse, symmetric and positive definite matrix. It can be computed from the Cholesky decomposition of $\\Fbold^{-1}$.\n\\end{proposition}\n\n\\begin{proof}\nFollowing \\citet{powell2011}, let $\\Nbold_N = [\\Nbold_{ij}]_{i,j=1}^N$ be an $(nN, nN)$ matrix, which is partitioned into $N\\times N$ blocks, each of size $(n, n)$. Then the determinant of $\\Nbold_N$ is \n$$\\lvert \\Nbold_N\\rvert = \\prod_{k=1}^N \\lvert \\alpha_{kk}^{(N-k)}\\rvert,$$\nwhere the $\\alpha^{(k)}$ are defined by\n\\begin{align*}\n\\alpha_{ij}^{(0)} &= \\Nbold_{ij} \\\\\n\\alpha_{ij}^{(k+1)} &= \\alpha_{ij}^{(k)} - \\alpha_{i,N-k}^{(k)} (\\alpha_{N-k,N-k}^{(k)})^{-1} \\alpha_{N-k,j}^{(k)},\\quad k \\geq 0.\n\\end{align*}\n$\\Qbold$ is a block matrix organized as $\\Nbold_N$, with $N=N_T$ and blocks of size $(N_S,N_S)$. Hence, the formula for $\\lvert \\Qbold\\rvert$ is \n\\begin{equation}\n\\lvert \\Qbold\\rvert= \\lvert \\Sigmabold_0^{-1}\\rvert\\lvert \\Fbold^{-1}\\rvert^{N_T-1}.\n\\label{eq:det_Q}\n\\end{equation}\nApplying the logarithm, we obtain Equation \\eqref{eq:logdet_Q}.\n\\end{proof}\n\nThe term $\\log\\lvert\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top\\rvert$ must be analyzed in detail. 
The term $\\sigma_0^{-2}\\Abold\\Abold^\\top$ is an $(N_SN_T,N_SN_T)$ block-diagonal matrix, whose $(N_S,N_S)$ blocks are sparse. The computation of $\\log\\lvert\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top\\rvert$ is not as straightforward as in the case of $\\log\\lvert\\Qbold\\rvert$, since there is no way of reducing the computation to purely spatial matrices. Depending on the size $N_SN_T$, we can either compute a Cholesky decomposition of the $(N_SN_T, N_SN_T)$ matrix $(\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)$ or use a matrix-free approach that is detailed below.\n\nWhen the number of mesh points prevents us from computing the Cholesky decomposition of $(\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)$, we can approximate $\\log\\lvert \\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top\\rvert$ by expressing it as $\\text{tr}[\\log(\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)]$, approximating the log function with a Chebyshev polynomial and using Hutchinson's estimator \\citep{hutchinson1990} to obtain a stochastic estimate of the trace of the matrix $[\\log(\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)]$. 
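Schematically, and only as a sketch of the idea (the number of probe vectors $m$, the polynomial degree $d$ and the notations $p_d$, $\\mathbf{u}_i$ and $\\mathbf{B}$ are illustrative and not taken from the cited algorithm), writing $\\Nbold=\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top$, the approximation reads\n$$\\log\\lvert\\Nbold\\rvert=\\text{tr}[\\log\\Nbold]\\approx\\frac{1}{m}\\sum_{i=1}^m \\mathbf{u}_i^\\top p_d(\\Nbold)\\,\\mathbf{u}_i,$$\nwhere $p_d$ is a Chebyshev polynomial approximation of degree $d$ of the log function on an interval containing the eigenvalues of $\\Nbold$, and the $\\mathbf{u}_i$ are independent random vectors with i.i.d. entries equal to $\\pm 1$ with probability $1/2$, so that $\\mathbb{E}[\\mathbf{u}_i^\\top \\mathbf{B}\\,\\mathbf{u}_i]=\\text{tr}[\\mathbf{B}]$ for any square matrix $\\mathbf{B}$. Each term only requires products between $\\Nbold$ and a vector, which is what makes the approach matrix-free. 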
The method is detailed in Algorithm 5 of \\citet{pereira2022hal}.\n\nConcerning the computation of the quadratic term of the log-likelihood, we note that, using the Woodbury formula again, we can work with the more convenient expression\n$$\\Sigmabold_{\\zbold}^{-1}=\\sigma_0^{-2}\\Ibold_{(n,n)}-\\sigma_0^{-4}\\Abold^\\top(\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)^{-1}\\Abold$$\nthat leads to the formula\n\\begin{align*}\n(\\zbold-\\boldsymbol{\\eta}\\bbold)^\\top \\Sigmabold_{\\zbold}^{-1}(\\psibold) (\\zbold-\\boldsymbol{\\eta}\\bbold) ={}& \\sigma_0^{-2}(\\zbold-\\boldsymbol{\\eta}\\bbold)^\\top (\\zbold-\\boldsymbol{\\eta}\\bbold) \\\\\n& - \\sigma_0^{-4}(\\zbold-\\boldsymbol{\\eta}\\bbold)^\\top \\Abold^\\top(\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)^{-1}\\Abold(\\zbold-\\boldsymbol{\\eta}\\bbold),\n\\end{align*}\nwhich is obtained by first computing the second term either by Cholesky decomposition or with the Conjugate Gradient method. The latter method solves $\\Nbold \\vbold=\\wbold$ for $\\vbold$ and computes $\\vbold_{sol} = \\wbold^\\top \\vbold$, with $\\Nbold = (\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)$ and $\\wbold=\\Abold(\\zbold-\\boldsymbol{\\eta}\\bbold)$. In this case, it is useful to find a good preconditioner for the matrix $(\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)$ to ensure fast convergence of the Conjugate Gradient method. A temporal block Gauss-Seidel preconditioner is a good choice here. A detailed explanation of the Conjugate Gradient method and the Gauss-Seidel preconditioner is available in Appendix \\ref{sec:A4}. \n\nIn Table \\ref{tab:ml_spatio_temp} we report some estimation results for the parameters $\\thetabold^\\top = (\\kappa, \\gamma_x, \\gamma_y, c, \\tau)$ of a spatio-temporal model simulated with the SPDE \\eqref{eq:adv_diff}. 
We set $\\Hbold=\\Ibold$, $\\alpha=1$, $\\alpha_S=2$ and the other parameters of the spatio-temporal discretization to the following: $N_S=900$, $dx=dy=1$, $dt=1$, $N_T = 10$ and $n_S=500$ observations in the spatial domain at the same locations for the $N_T$ time steps (hence $n=5000$). Since the size of both the dataset and the spatio-temporal mesh is quite small, we report only the estimates computed with the Cholesky decomposition.\n\nThe initial values for the estimation have been set in the following way: first, a Mat\u00e9rn variogram with regularity parameter fixed at $\\nu = \\alpha +\\alpha_S - 1=2$ is fitted to one (or more) spatial traces of the data in order to set the value of $\\kappa$ equal to $\\kappa = \\sqrt{12\\nu}/r_M$, where $r_M$ is the practical range (the distance at which the correlation is approximately equal to 0.05), and the value of $\\sigma^2$ equal to the sill. Then an exponential variogram is fitted to the temporal trace and $c$ is set to $c = r_E/3$, where $r_E$ is the practical range of the exponential variogram. In other words, we initialize the parameters of the SPDE model to the values obtained through the method of moments applied under the hypotheses of Proposition \\ref{prop:spatial_trace}. 
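For instance, with purely illustrative values that are not taken from the experiments reported here: if the fitted Mat\u00e9rn variogram with $\\nu=2$ has practical range $r_M=24.5$, the initial value is $\\kappa=\\sqrt{24}/24.5\\approx 0.2$, and if the fitted exponential variogram has practical range $r_E=3$, then $c=r_E/3=1$. 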
Finally, $\\gammabold$ is set to $\\gammabold=\\0bold$, i.e., to the purely diffusive case.\n\n\n\\begin{table}[hbt]\n\\begin{adjustbox}{width=\\columnwidth,center}\n\\begin{tabular}{ccccccccccc}\n\\toprule\n$\\kappa$ & $\\hat{\\kappa}$ & $\\gamma_x$ & $\\hat{\\gamma}_x$ & $\\gamma_y$ & $\\hat{\\gamma}_y$ & $c$ & $\\hat{c}$ & $\\tau$ & $\\hat{\\tau}$ & average time $(s)$\\\\\n\\midrule\n\t0.2 & 0.203 (0.051) & -2 & -2.043 (0.262) & 3 & 2.979 (0.351) & 1 & 1.037 (0.093) & 1 & 1.036 (0.092) & 120 \\\\\n\t\\midrule\n\t0.33 & 0.328 (0.059) & -1 & -1.008 (0.134) & 1 & 1.018 (0.143) & 0.5 & 0.546 (0.041) & 1.2 & 1.221 (0.037) & 124\\\\\n\\botrule\n\\end{tabular}\n\\end{adjustbox}\n\\caption{Mean (and standard deviation) of ML estimates $\\hat{\\thetabold}^\\top = (\\hat{\\kappa}, \\hat{\\gamma}_x, \\hat{\\gamma}_y, \\hat{c}, \\hat{\\tau})$ over 10 simulations for two different sets of parameters of the advection-diffusion model}\n\n\\label{tab:ml_spatio_temp}\n\\end{table}\n\n\n\\subsection{Kriging}\n\\label{sec:kriging}\n\nFrom the vector $\\zbold$ of data at the observed locations defined in Equation \\eqref{eq:stat_model}, we aim to predict the spatio-temporal vector $\\xbold$ on the entire spatial mesh during the time window $\\left[0,T\\right]$, i.e., on ${\\cal T}'$. The prediction is made through kriging, which computes the conditional expectation of $\\xbold$ knowing $\\zbold$ as\n\\begin{equation}\n\\xbold^\\star = \\mathbb{E}(\\xbold\\lvert \\zbold) = \\sigma_0^{-2}(\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)^{-1}\\Abold (\\zbold-\\boldsymbol{\\eta}\\bbold).\n \\label{eq:kriging}\n\\end{equation}\nThis formula comes from the definition of the augmented matrix $\\Sigmabold_c$ of Equation \\eqref{eq:augmented_mat}: the conditional expectation is $\\Qbold^{-1}\\Abold\\Sigmabold_{\\zbold}^{-1}(\\zbold-\\boldsymbol{\\eta}\\bbold)$, and $\\Qbold^{-1}\\Abold\\Sigmabold_{\\zbold}^{-1}=\\sigma_0^{-2}(\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)^{-1}\\Abold$. The computation of \\eqref{eq:kriging} requires solving a linear system with the $(N_SN_T, N_SN_T)$ matrix $(\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)$, which can be achieved by Cholesky decomposition or with the Conjugate Gradient method as explained in Section \\ref{sec:estimation_scattered}. 
The conditional covariance matrix, whose diagonal is the kriging variance, is \n$$\n\\text{Var}(\\xbold \\lvert \\zbold) = (\\Qbold+\\sigma_0^{-2}\\Abold\\Abold^\\top)^{-1}.\n$$\n\nThe extraction of the diagonal of an inverse matrix is not straightforward when only the Cholesky decomposition of the matrix is available. Some methods exist, such as the Takahashi method described in \\citet{takahashi73} and \\citet{erisman75}. Another way of computing the kriging variance is through conditional simulations, as will be detailed in Section \\ref{sec:conditional_simu}.\n\nIf we want to predict the $(N_S)$ vector $\\xbold_{T+1}$ at time $(T+1)$ on the mesh, we can define $\\xbold_{T+1}$ using a time discretization step as defined in Equation \\eqref{eq:timestep}\n$$\\xbold_{T+1} = \\Dbold\\xbold_{T} + \\Ebold \\varepsilonbold_{T+1},$$\nwhere $\\xbold_T$ is the last subset of size $(N_S)$ of vector $\\xbold$ related to time $T$ and $\\varepsilonbold_{T+1}$ is an $(N_S)$ standardized Gaussian vector.\nIf we want to link $\\xbold_{T+1}$ to $\\xbold$ (the vector containing all time steps), we must replace the $(N_S,N_S)$ matrix $\\Dbold$ with a sparse $(N_S,N_SN_T)$ matrix $\\Dbold_{ST}$ with $N_T$ blocks, all of which are null except the last $(N_S,N_S)$ block, which is equal to $\\Dbold$. The same procedure is applied to $\\Ebold$. 
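Written out explicitly (a sketch of the construction just described, where $\\mathbf{0}$ denotes the $(N_S,N_S)$ zero block, a notation introduced here for illustration only),\n$$\\Dbold_{ST}=\\begin{pmatrix}\\mathbf{0} & \\cdots & \\mathbf{0} & \\Dbold\\end{pmatrix},\\qquad \\Dbold_{ST}\\xbold=\\Dbold\\xbold_T,$$\nso that multiplying by $\\Dbold_{ST}$ simply selects the last time step of $\\xbold$ and propagates it by one step; $\\Ebold_{ST}$ has the same block structure, with $\\Ebold$ in the last block.\n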
Hence, we obtain \n$$\\xbold_{T+1} = \\Dbold_{ST}\\xbold + \\Ebold_{ST} \\varepsilonbold_{ST},$$\nwhere $\\varepsilonbold_{ST}$ is an $(N_SN_T)$ standardized Gaussian vector.\nThe covariance between $\\xbold_{T+1}$ and $\\zbold$ (the observation vector of size $n$) is then given by $\\Dbold_{ST}\\Qbold^{-1}\\Abold$.\nIf we consider the augmented covariance matrix $\\Sigmabold_{c+1}$ of the vectors $\\xbold$, $\\xbold_{T+1}$ and $\\zbold$, we have\n\n\\begin{equation*}\n\\Sigmabold_{c+1}=\\begin{pmatrix} \\Qbold^{-1} & \\Qbold^{-1}\\Dbold_{ST}^\\top & \\Qbold^{-1}\\Abold\\\\ \n\\Dbold_{ST} \\Qbold^{-1} & \\Dbold_{ST} \\Qbold^{-1}\\Dbold_{ST}^\\top + \\Ebold_{ST}\\Ebold_{ST}^\\top & \\Dbold_{ST} \\Qbold^{-1}\\Abold\\\\\n\\Abold^\\top \\Qbold^{-1} & \\Abold^\\top \\Qbold^{-1} \\Dbold_{ST}^\\top & \\Sigmabold_{\\zbold}\\end{pmatrix}.\n\\end{equation*}\n\nFinally, we obtain the formula for the kriging predictor $\\xbold_{T+1}^\\star$ at $(T+1)$ as \n\\begin{equation}\n\\xbold_{T+1}^\\star = \\mathbb{E}(\\xbold_{T+1}\\lvert \\zbold) = \\Dbold_{ST}\\,\\mathbb{E}(\\xbold\\lvert \\zbold) = \\Dbold_{ST}\\xbold^\\star,\n \\label{eq:kriging_t1}\n\\end{equation}\nwhere $\\xbold^\\star$ is the kriging predictor of Equation \\eqref{eq:kriging}. This formula means that we can obtain the kriging $\\xbold_{T+1}^\\star$ at time $(T+1)$ just by applying one Implicit Euler step to the kriging result $\\xbold_{T}^\\star$ at time $T$. The same procedure can be used to predict at time $(T+2)$ and so on.\n\n\\subsection{Conditional simulations}\n\\label{sec:conditional_simu}\n\nTo realize a conditional simulation, we use the conditioning by kriging paradigm detailed below. First, we generate a non-conditional simulation $\\xbold_{NC}$ on the spatio-temporal grid ${\\cal T}'$. 
\nAt the observation points $(\\sbold, t) \\in \\mathcal{I}$, we compute the vector $\\zbold_\\mathcal{I}$\n$$\\zbold_\\mathcal{I}=\\Abold^\\top\\xbold_{NC}+\\varepsilonbold.$$\nWe define the residuals as $ \\rbold_\\mathcal{I} = \\zbold_\\mathcal{I} - \\xbold_{NC}(\\mathcal{I})$, where the last term represents $\\xbold_{NC}$ considered at the observation locations only. Then we apply kriging to the residuals and obtain the values $\\rbold^\\star$ over the entire spatio-temporal grid $\\mathcal{T}'$; here we use the method explained in Section \\ref{sec:kriging}. Finally, the conditional simulation is obtained as \n$$\\zbold_{C}(\\mathcal{I}) = \\Abold^\\top\\xbold_{NC} + \\Abold^\\top\\rbold^\\star.$$ \n\n\n\\section{Application to a solar radiation dataset}\n\\label{sec:application}\n\nThe approach detailed in the previous sections is now applied to a solar radiation dataset for which experts agree on the presence of advection due to prevailing westerly winds transporting the clouds from one side of the domain to the other.\nThe HOPE campaign \\citep{macke2017} recorded Global Horizontal Irradiance (GHI) (or SSI, Surface Solar Irradiance) over a $10000 \\times 16000 \\ \\mathrm{m}^2$ region in West Germany near the city of J\\\"ulich from the 2nd of April to the 2nd of July 2013. The sensors were installed at 99 stations, shown in Figure \\ref{fig:locations}, and GHI was recorded every 15 seconds. A detailed description of the campaign can be found in \\citet{macke2017}.\n\n\\begin{figure}\n\\centering\n\\includegraphics[width=0.45\\textwidth]{mymap_large.pdf}\n\\includegraphics[width=0.4\\textwidth]{mymap.pdf}\n\\caption{Stations over the spatial domain}\n\\label{fig:locations}\n\\end{figure}\n\nThe dataset was cleaned of outlying values and non-operating sensors, and the temporal resolution was reduced from 15 seconds to 1 minute. 
Figure \\ref{fig:4_stations} (left panel) shows GHI as a function of time (in minutes, over a full day -- the 28th of May 2013) at 4 different stations. These stations, represented in color in Figure \\ref{fig:locations}, are located at the border of the domain, far from each other. The GHI starts close to 0, increases after sunrise, peaks at midday and tends to 0 at sunset. The maximal theoretical amount of irradiance reaching the sensor follows an ideal concave curve. The divergence between the measured irradiance and this ideal curve can be small or large, depending on the presence of clouds. One can see in this example that the evolution at the 4 stations is similar, with differences reflecting the spatio-temporal variations of the clouds.\n\n\\begin{figure}\n\\centering\n\\includegraphics[width=0.45\\textwidth]{GHI_d_40.png}\n\\includegraphics[width=0.45\\textwidth]{CS_index_d_40.png}\n\\caption{GHI $G$ and Clear Sky Index $K_c$ for 4 different stations on the 28th of May 2013}\n\\label{fig:4_stations}\n\\end{figure}\n\nA first preprocessing step was applied in order to make the phenomenon stationary. \\citet{oumbe2014} showed that the solar irradiance at ground level, GHI (denoted $G$ for short from now on), computed by a radiative transfer model can be approximated by the product of the irradiance under clear atmosphere (called Clear Sky GHI, or $G_c$) and a modification factor due to cloud properties and ground albedo only (Clear Sky Index, or $K_c$, \\citet{beyer96}):\n\\begin{equation}\n G \\simeq G_c K_c.\n\\label{eq:ghi_general} \n\\end{equation}\nThe error made in using this approximation depends mostly on the solar zenith angle, the ground albedo and the cloud optical depth. 
In most cases, the maximum errors (95th percentile) on global and direct surface irradiances are less than 15 $\\mathrm{W\\,m^{-2}}$ and less than 2 to 5\\% in relative value, as recommended by the World Meteorological Organization for high-quality measurements of the solar irradiance \\citep{oumbe2014}. Practically, it means that a model for fast calculation of surface solar irradiance may be separated into two distinct and independent models: i) a deterministic model for $G_c$, the irradiance under clear-sky conditions, as computed according to \\citet{gschwind2019}, considered as known in this study; ii) a model for $K_c$ which accounts for cloud influence on the downwelling radiation and is expected to change in time and space. $K_c$ is modeled as a random spatio-temporal process and will be the subject of our analysis. Figure \\ref{fig:4_stations} (right panel) shows the variable $K_c$ at the same 4 stations and on the same day as in the left panel. In general, $K_c$ lies between 0 and 1, but on rare occasions, values above 1 can be observed. This phenomenon is called \\textit{overshooting} \\citep{schade2007}.\n\n\\subsection{Estimation and prediction}\n\\label{sec:results}\n\nAs an example of application of the estimation and prediction methods described in Section \\ref{sec:estimation}, we extract a time window of 16 minutes around 4 p.m. on the 28th of May 2013 and we consider observations every minute at the 73 stations with well-recorded values. We compute the spatial prediction by kriging on a fine grid over the $T=16$ time steps and the spatio-temporal prediction at the two following minutes: $(T+1)$ and $(T+2)$. The estimation of the SPDE parameters is done according to the method described in Section \\ref{sec:estimation_scattered}. The kriging equations are solved as presented in Section \\ref{sec:kriging} for the 16 time steps on a grid of 900 points. Results are shown in Figure \\ref{fig:krig} for the last five minutes (from $t=12$ to $t=16$). 
To assess the accuracy of the predictions in the direction of the prevailing winds, we apply the kriging procedure to a restricted dataset where a subset of observations in the south-eastern region has been removed. The kriging formulas are then computed with the remaining observations. The two step-ahead predictions are made by applying the Implicit Euler discretization scheme to the kriging vector at time $T$ as presented at the end of Section \\ref{sec:kriging}. Prediction maps at time $T$, $(T+1)$ and $(T+2)$ are plotted in the upper panel of Figure \\ref{fig:error}, along with the observed values, shown with a black outline. In the lower panel, we plot the signed error $(\\text{Real} \\ K_c - \\text{Pred} \\ K_c)$.\n\n\\begin{figure}\n\\centering\n\\includegraphics[width=0.4\\textwidth]{hist_d_40_t_985_ns_16.png}\n\\caption{Histogram of GHI over 20 time steps}\n\\label{fig:hist}\n\\end{figure}\n\n\\begin{figure}\n\\centering\n\\includegraphics[width=0.99\\textwidth]{krig_partial_d_40_t_985_ns_16.png}\n\\caption{Kriging of $K_c$ over the last 5 time steps. The points with black outline are the real values of $K_c$}\n\\label{fig:krig}\n\\end{figure}\n\n\\begin{figure}\n\\centering\n\\includegraphics[width=0.99\\textwidth]{krig_partial_t012_d_40_t_985_ns_16.png}\n\\includegraphics[width=0.99\\textwidth]{error_partial_t012_d_40_t_985_ns_16.png}\n\\caption{Top: Predicted kriging of $K_c$ at $T$, $(T+1)$ and $(T+2)$. The points with black outline are the real values of $K_c$. Bottom: signed error $(\\text{Real} \\ K_c - \\text{Pred} \\ K_c)$}\n\\label{fig:error}\n\\end{figure}\n\n\\subsection{Conditional simulations}\n\nWe present here the results of 100 conditional simulations for time $T$, $(T+1)$ and $(T+2)$. We consider a subset of stations that are located along the advection direction, i.e., the stations represented in red in the first panel of Figure \\ref{fig:cond_simu}. 
We then plot the real $K_c$, the mean of the predicted $K_c$ over 10 simulations, along with a $\\pm 2\\sigma$ envelope, where $\\sigma^2$ is the kriging variance estimated from the simulations. The results for time steps $T$, $(T+1)$ and $(T+2)$ are shown in the last three panels of Figure \\ref{fig:cond_simu}. We remark that the kriging variance is very small at time $T$ and that the predicted values almost coincide with the real values; at time $(T+1)$ and $(T+2)$ the variance is larger, but all the real values lie in the $\\pm 2\\sigma$ envelope.\n\n\\begin{figure}\n\\centering\n\\includegraphics[width=0.4\\textwidth]{adv_dir_d_40_t_985_ns_16.png}\\\\\n\\includegraphics[width=0.32\\textwidth]{cond_simu_t_d_40_t_985_ns_16.png}\n\\includegraphics[width=0.32\\textwidth]{cond_simu_t1_d_40_t_985_ns_16.png}\n\\includegraphics[width=0.32\\textwidth]{cond_simu_t2_d_40_t_985_ns_16.png}\n\\caption{Top: Stations with estimated advection direction. Bottom: Real $K_c$, mean of predicted $K_c$ and $\\pm2\\sigma$ envelope at time $T$, $(T+1)$ and $(T+2)$}\n\\label{fig:cond_simu}\n\\end{figure}\n\n\n\n\\section{Discussion}\n\\label{sec:discussion}\nThe spatio-temporal SPDE approach based on advection-diffusion equations proposed in this work combines elements of physics, numerical analysis and statistics. It can be seen as a first step toward \\textit{physics-informed geostatistics}, which introduces physical dynamics into the statistical model, accounting for possible hidden structures governing the evolution of the spatio-temporal phenomenon. \nThe different terms of the SPDE (advection, diffusion) directly influence the spatio-temporal dependencies of the process, by controlling its variability in space and time. 
Compared to spatio-temporal models built on covariance functions such as the Gneiting class \\citep{gneiting2002}, we gain in interpretability since the parameters of the model can be directly linked to the physical coefficients of SPDEs.\n\nWe showed that it is possible to build accurate space-time approximations of the process driven by the SPDE using a combination of FEM in space and an implicit Euler scheme in time. This leads to sparse structured linear systems. We obtained\npromising results for the estimation and for the prediction of processes both in terms of precision and speed. When the size of the dataset is moderate, direct matrix implementation is possible. We showed how matrix-free methods can be implemented in order to obtain scalable computations even for very large datasets. \n\nFurther work would be necessary to better assess the prediction accuracy and the computational complexity.\nApplications to larger and more complex datasets should be considered and comparison to models expressing the advection in a Lagrangian framework \\citep{ailliot2011space,salvana2021} should be performed.\n\nOne of the main advantages of the SPDE formulation is that it is easy to generalize to non-stationary settings. Non-stationary fields can be defined by letting the parameters ($\\kappa(\\sbold,t)$, $\\vbold(\\sbold,t)$) be space-time-dependent. This generalization implies only minimal changes to the method used in the stationary case concerning the simulation, but needs more work for estimation and prediction, since the maximum likelihood approach becomes much more expensive. We can also incorporate models of spatially varying anisotropy by modifying the general operator $\\nabla \\cdot \\Hbold(\\sbold,t)\\nabla X(\\sbold,t)$ with a non-stationary anisotropic matrix $\\Hbold(\\sbold,t)$. The introduction of non-stationarities could allow a better description of phenomena where local variations are clearly present. 
The approaches by \\citet{fulgstad2015} and \\citet{pereira2019thesis} should be investigated and generalized to the spatio-temporal framework.\n\nAnother interesting consequence of defining the models through local stochastic partial differential equations is that the SPDEs still make sense when $\\bbR^d$ is replaced by a space that is only locally flat. We can define non-stationary Gaussian fields on manifolds, and still obtain a GMRF representation. Important improvements were obtained in the spatial case \\citep{pereira2022hal}. The generalization to space-time processes could be explored further.\n\nA possible generalization to spatio-temporal SPDEs with a fractional exponent in the diffusion term could also be considered. A development of the methods proposed by \\citet{bolin2019} and \\citet{vabishchevich2015} should be explored.\n\n\\backmatter\n\n\\bmhead{Acknowledgments}\n\nWe are grateful to the O.I.E. center of Mines Paris -- ARMINES, especially to Yves-Marie Saint-Drenan, Philippe Blanc and Hadrien Verbois, for providing the data and for inspiring discussions about renewable resources evaluation. In addition, we thank the Mines Paris / INRAE chair ``Geolearning'' for its constant support. 
\n\n\\bibliographystyle{apalike}\n", "meta": {"timestamp": "2022-08-31T02:08:40", "yymm": "2208", "arxiv_id": "2208.14015", "language": "en", "url": "https://arxiv.org/abs/2208.14015"}} {"text": "\\section{Introduction}\n\\noindent Given graphs $G$ and $F$, if $G$ does not contain a copy of $F$, then we say that $G$ is $F$-free.\nWe call an $n$-vertex $F$-free graph with the maximum number of edges an extremal graph for $F$.\nTur\\'{a}n \\cite{turan1941} showed that the $n$-vertex complete $(t-1)$-partite graph with part sizes as equal as possible, denoted by $T(n,t-1)$, is the unique extremal graph for $K_{t}$, where $K_{t}$ is the complete graph on $t$ vertices.\nSince then, determining the extremal graphs for a given graph with some additional conditions has become one of the most important topics in combinatorics.\n\nWe will consider the following extremal problem.\nLet $[r]=\\{1,2,\\ldots,r\\}$.\nIn the rest of this paper, we will mostly consider $n$-vertex $r$-partite graphs with a partition $\\mathcal{V}=(V_1,\\ldots,V_r)$ such that $|V_i|=n_i$ for $i\\in[r]$ and $n_1\\geq \\ldots\\geq n_r$, that is, we consider spanning subgraphs of $K_{n_1,\\ldots,n_r}$, where $K_{n_1,\\ldots,n_r}$ is the complete $r$-partite graph containing all edges between different $V_i$'s.\nDenote by ex$(n_1,\\ldots,n_r,F)$ the maximum number of edges in an $F$-free $r$-partite graph with parts of sizes $n_1,\\ldots,n_r$.\n\nResearch on extremal problems in multi-partite graphs can be traced back to the 1950s.\nIn 1954, Zarankiewicz \\cite{Zarakiewicz1954} studied ex$(n,n,K_{2,2})$.\nLater, K\\\"{o}v\\'{a}ri, S\\'{o}s and Tur\\'{a}n \\cite{Kovari1954} gave an upper bound on ex$(n,n,K_{s,t})$.\nThe above two results are strongly connected to the Tur\\'{a}n numbers of bipartite graphs.\nVery recently, based on a quantitative variant of the random algebraic method, Conlon \\cite{Conlon} gave good lower bounds for ex$(n,m,K_{s,t})$ when $n,m$ satisfy some additional conditions.\nFor related topics 
of the Zarankiewicz problems, we refer the interested reader to \\cite{Conlon} and references therein.\n\nThe following problem is related to the Tur\\'{a}n numbers of non-bipartite graphs.\nFor a set of integers $I$, let $n_I:=\\sum_{i\\in I} n_i$.\nGiven $r \\geq t \\geq 3$ and $k \\geq 2$, let $n_1\\geq \\ldots \\geq n_r$.\nFor $I\\subseteq [r]$, write $m_I = \\min_{i\\in I} \\{n_i\\}$.\nGiven a partition $\\mathcal{P}$ of $[r]$, let $n_{\\mathcal{P}}= \\max_{I\\in \\mathcal{P}}\\{n_I-m_I\\}.$\nDefine\n$$f(n_1,\\ldots,n_r,k,t):=\\max\\limits_{\\mathcal{P}} \\left\\{(k-1)n_\\mathcal{P} + \\sum_{I\\neq I'\\in \\mathcal{P}} n_I\\cdot n_{I'} \\right\\},$$\nwhere the maximum is taken over all partitions $\\mathcal{P}$ of $[r]$ into $t-1$ parts.\nIn particular, define\n$$f(n_1,\\ldots,n_r,1,t):=\\max\\limits_{\\mathcal{P}} \\left\\{ \\sum_{I\\neq I'\\in \\mathcal{P}} n_I\\cdot n_{I'} \\right\\},$$\nwhere the maximum is taken over all partitions $\\mathcal{P}$ of $[r]$ into $t-1$ parts.\n\nBollob\\'{a}s, Erd\\H{o}s and Straus \\cite{BES1974} considered the extremal graphs for $K_{t}$ in multi-partite graphs.\n\n\\begin{theorem}[Bollob\\'{a}s, Erd\\H{o}s and Straus \\cite{BES1974}]\\label{extremal number 1}\nLet $r\\geq t$.\nThen\n $${\\rm ex}(n_1,\\ldots,n_r,K_{t})=f(n_1,\\ldots,n_r,1,t).$$\n\\end{theorem}\n\nWe will characterize the extremal $(t-1)$-partitions in Theorem~\\ref{extremal number 1}.\nFirst, we introduce some notation.\n\nConsider two partitions $\\mathcal{P}=(P_1,\\ldots ,P_{t-1})$ and $\\mathcal{V}=(V_1,\\ldots,V_r)$ of a vertex set $V$.\nFor $i\\in[r]$ and $j\\in[t-1]$, we say that $V_i$ is \\textcolor{blue}{{\\it integral}} in $P_j$ or $P_j$ is \\textcolor{blue}{{\\it integral}} to $V_i$ if $V_i \\subseteq P_j$, and that $V_i$ is \\textcolor{blue}{{\\it partial}} in $P_j$ or $P_j$ is \\textcolor{blue}{{\\it partial}} to $V_i$ if $V_i \\cap P_j\\neq \\emptyset$ and $V_i \\nsubseteq P_j$.\n\nFor a fixed $P_j$, the \\textcolor{blue}{{\\it integral part}} of $P_j$ is the union of 
$V_i$'s which are integral in $P_j$, and the \\textcolor{blue}{{\\it partial part}} of $P_j$ is the union of $V_i\\cap P_j$'s such that $V_i$ is partial in $P_j$.\nWe say $P_j$ is partial to $\\mathcal{V}$ (or simply, $P_j$ is partial) if the partial part of $P_j$ is not empty, and is integral to $\\mathcal{V}$ (or simply, $P_j$ is integral) otherwise.\n\n\n\n\nWe say $\\mathcal{P}$ is \\textcolor{blue}{{\\it 1-partial}} to $\\mathcal{V}$ (or simply, $\\mathcal{P}$ is 1-partial) if each $P_j$ contains at most one partial $V_i$.\nThe \\textcolor{blue}{\\it internalization} of $\\mathcal{P}$ is a partition, denoted by $I(\\mathcal{P})$, obtained from $\\mathcal{P}$ by putting all vertices of each partial $V_i$ into one $P_j$ containing some vertices of $V_i$.\n\nFurthermore, we say $\\mathcal{P}$ is \\textcolor{blue}{{\\it stable}} to $\\mathcal{V}$ if the following hold:\n\\begin{itemize}\n \\item $\\mathcal{P}$ is $1$-partial to $\\mathcal{V}$,\n \\item the integral parts of all partial classes of $\\mathcal{P}$ have the same size, which is no more than the size of any integral class of $\\mathcal{P}$,\n \\item the size of the integral part of any class of $\\mathcal{P}$ is no less than the size of any partial class of $\\mathcal{V}$,\n \\item after removing any integral $V_i$ from a class of $\\mathcal{P}$, the size of the resulting set is no more than the size of the integral part of any other class of $\\mathcal{P}$.\n\\end{itemize}\n\n\n\nGiven a graph $H$ with partitions $\\mathcal{V}$ and $\\mathcal{P}$, let $H[\\mathcal{P}]$ denote the induced $(t-1)$-partite subgraph of $H$, that is, the edge set of $H[\\mathcal{P}]$ is\n$$\\big\\{ xy:xy\\in E(H), x\\in P_j\\cap V_{i},y\\in P_{j'}\\cap V_{i'},i\\neq i'\\in[r],j\\neq j'\\in[t-1] \\big\\}.$$\n\n\nWe say a partition $\\mathcal{P}$ is \\textcolor{blue}{{\\it an extremal $(t-1)$-partition}} for $\\mathcal{V}$ if $e(K_{n_1,\\ldots,n_r}[\\mathcal{P}])=f(n_1,\\ldots,n_r,1,t)$.\nThe following theorem characterizes the extremal structures in 
Theorem~\\ref{extremal number 1}.\n\n\n\n\n\n\n\\begin{theorem}\\label{extremal (t-1)-partition}\nLet $\\mathcal{V}=(V_1,\\ldots,V_r)$ such that $|V_i|=n_i$ for $i\\in[r]$ and $n_1\\geq \\ldots\\geq n_r$.\nThe $(t-1)$-partition $\\mathcal{P}$ is an extremal partition for $\\mathcal{V}$ if and only if\n\\begin{itemize}\n \\item $\\mathcal{P}$ is stable to $\\mathcal{V}$ and\n \\item $I(\\mathcal{P})$ is still an extremal $(t-1)$-partition for $\\mathcal{V}$.\n\\end{itemize}\n\\end{theorem}\n\n\nLet $K_t^s=T(st,t)$.\nErd\\H{o}s and Stone \\cite{erdHos1946} showed that the extremal graphs for $K_t^s$ have $e(T(n,t-1))+o(n^2)$ edges.\nIn 1968, Erd\\H{o}s and Simonovits \\cite{erdHos1967,erdHos1968,Simonovits1968} proved the following well-known stability theorem.\n\n\\begin{theorem}[Erd\\H{o}s-Simonovits Weak Stability Theorem \\cite{erdHos1967,erdHos1968,Simonovits1968}]\\label{stability theorem}\nLet $F$ be a graph with chromatic number $t\\geq 3$.\nFor every $0<\\epsilon<1$, there exists a constant $\\delta >0$ such that every $F$-free graph $H$ with $n\\geq1/\\delta$ vertices and at least $e(T(n,t-1))-\\delta n^2$ edges contains a $(t-1)$-partite subgraph with at least $e(T(n,t-1))-\\epsilon n^2$ edges and can be obtained from an extremal $F$-free graph by changing at most $\\epsilon n^2$ edges.\n\\end{theorem}\n\n\nErd\\H{o}s and Simonovits essentially got a more detailed version of Theorem~\\ref{stability theorem} when $H$ is an extremal graph for $F$.\n\\begin{theorem}[Erd\\H{o}s-Simonovits Strong Stability Theorem \\cite{erdHos1967,erdHos1968,Simonovits1968}]\\label{s stability theorem}\nLet $F$ be a graph with chromatic number $t\\geq 3$.\nFor every $0<\\epsilon<1$ there exists a constant $\\delta >0$ such that every extremal $F$-free graph $H$ with $n\\geq1/\\delta$ vertices can be partitioned into $t-1$ classes each containing $n/(t-1)+o(n)$ vertices, and, with the exception of at most $c_\\epsilon$ vertices, each vertex of $H$ is joined to at most 
$\\epsilon n$ vertices in its own class and to all but $\\epsilon n$ vertices in the other classes.\n\\end{theorem}\n\nTheorems~\\ref{stability theorem} and~\\ref{s stability theorem} are powerful tools in extremal graph theory.\nFor example, applying Theorem~\\ref{s stability theorem}, Erd\\H{o}s and Simonovits~\\cite{ES1971} determined the extremal graphs for $K_{2,2,2}$.\nApplying Theorem~\\ref{stability theorem}, Mubayi \\cite{mubayi2010} and Pikhurko and Yilma \\cite{pikhurko2017} considered supersaturation problems (the minimum number of copies of $F$ in an $n$-vertex graph $H$ on ex$(n,F)+q$ edges) for some specific graphs.\nWe do not try to list more applications of Theorems~\\ref{stability theorem} and~\\ref{s stability theorem}.\nWe only mention that Theorems~\\ref{stability theorem} and~\\ref{s stability theorem} often help when we consider extremal problems (not only Tur\\'{a}n type problems) for non-bipartite graphs.\n\n\n\nFor a set $X$ and a partition $\\mathcal{P}$, we define:\n$$\\mathcal{P}_X:=(P_1\\setminus X,\\ldots, P_{t-1}\\setminus X).$$\n\n\nLet $A\\bigtriangleup B$ stand for the symmetric difference of the sets $A$ and $B$.\nGiven two graphs $H$ and $F$ on the same vertex set, we say $F$ is \\textcolor{blue}{$\\alpha$-$close$} to $H$ if every vertex $v$ satisfies $|N_H(v) \\bigtriangleup N_F(v)|\\leq \\alpha$.\n\n\nFor a spanning subgraph $G$ of $K_{n_1,\\ldots,n_r}$, we say a partition $\\mathcal{P}=(P_1,\\ldots,P_{t-1})$ of $G$ is an \\textcolor{blue}{$(X,\\epsilon)$-$stable$} partition (see Figure 1) if there exist a vertex set $X$ and a small constant $0<\\epsilon<1$ with $|X|\\leq\\epsilon n_{t-1}$ such that\n\\begin{itemize}\n \\item $G-X$ is $\\epsilon n_{t-1}$-close to $K_{n_1,\\ldots,n_r}[\\mathcal{P}]-X$ and\n \\item $\\mathcal{P}_X$ is stable to $\\mathcal{V}_X$.\n \n\\end{itemize}\n\n\n\n\n\n\nWe will establish the following stability result in multi-partite graphs.\n\n\\begin{theorem}[Weak Multi-partite Stability 
Theorem]\\label{weak stability}\nLet $F$ be a graph with chromatic number $t\\geq 3$.\nLet $G$ be an $F$-free $r$-partite graph with parts $\\mathcal{V}=(V_1,\\ldots,V_r)$.\nFor every $0<\\epsilon<1$ there exists a constant $\\delta >0$ such that if $n_{t-1}\\geq1/\\delta$ and\n$$e(G)\\geq f(n_1,\\ldots,n_r,1,t)-\\delta n_{t-1}^2,$$\nthen, after removing $\\epsilon n_{t-1}^2$ edges, $G$ has an $(X,\\epsilon )$-stable $(t-1)$-partition $\\mathcal{P}$ and the integral part of any class of $\\mathcal{P}_X$ is larger than $(1-\\epsilon)n_{t-1}$.\n\\end{theorem}\n\n\\begin{center}\n\\begin{tikzpicture}[scale = 1]\n\\tikzstyle{every node}=[scale=1]\n\n\\draw (1,2.7) ellipse (0.5 and 0.2);\n\\draw node at (1,3.2) {$X$};\n\n\\draw (-1,0) ellipse (0.5 and 2.2);\n\\draw node at (-2.8,0) {$V_2$};\n\\draw (-1,-3.6) ellipse (0.5 and 1.2);\n\\draw node at (-2.8,-3.6) {$V_1$};\n\n\\draw (3,0) ellipse (0.5 and 2.2);\n\\draw node at (4.8,0) {$V_3$};\n\\draw (3,-3.3) ellipse (0.5 and 0.9);\n\\draw node at (4.8,-3.3) {$V_1$};\n\n\\draw[style=dashed] (-4,-5) rectangle (6,2.4);\n\\draw node at (-4.5,-1) {$\\mathcal{P}_X$};\n\n\n\\draw node at (1,-5.5) {Figure 1. 
A form of $(X,\\epsilon)$-stable partition $\\mathcal{P}$ with $V_1$ {\\it partial} in it.};\n\n\\draw [line width=0.2cm, dotted,blue] (-0.5,0)--(2.5,0);\n\\draw [line width=0.2cm, dotted,blue] (-0.5,-3.6)--(2.5,0);\n\\draw [line width=0.2cm, dotted,blue] (-0.5,0)--(2.5,-3.3);\n\n\\end{tikzpicture}\n\\end{center}\n\n\\medskip\\par\\noindent {\\bf Remark.~~} Different from Theorem~\\ref{stability theorem}, our Theorem~\\ref{weak stability} shows that there exist graphs $H$ with $\\chi(H)=t$ such that if $e(G)\\geq f(n_1,\\ldots,n_r,1,t)-\\delta n_{t-1}^2,$ then $G$ may be far away from the extremal graph for $H$, see the following example.\n\n\\medskip\n\n\\noindent{\\bf Example.} Let $K=K_{n_1,n_2,n_3}$ with classes $V_1$, $V_2$ and $V_3$.\nLet $n_1=m$, $n_2=m-1$ and $n_3=m-1$.\nFrom Theorem~\\ref{extremal number 1} we have ex$(n_1,n_2,n_3,K_3)=2m(m-1)$.\nMoreover, from Theorems~\\ref{extremal (t-1)-partition} and~\\ref{strong bollobas}, we can easily deduce that the unique extremal graph is $K_{m,2(m-1)}$.\nOn the other hand, if a $K_3$-free subgraph $H$ of $K$ has $2m(m-1)-o(m^2)$ edges, then $H$ may be a subgraph of $K[\\mathcal{P}]$, where $\\mathcal{P}=(V_1 \\cup X,V_2 \\cup Y)$ and $(X,Y)$ is a partition of $V_3$ with $|X|=\\lfloor|V_3|/2\\rfloor$ and $|Y|=\\lceil|V_3|/2\\rceil$.\nThus $H$ is far away from (changing $m^2/2$ edges) the unique extremal graph for $K_3$.\n\n\\medskip\n\nGiven $a,b \\in (0,1)$, we use $a\\mathop{\\ll}b$ to denote that $a0$ there exist constants $\\delta_1 \\ll \\delta_2 \\ll \\delta_3 \\ll \\epsilon$ and $c_{\\epsilon,F}$ depending on $r$, $\\epsilon$ and $F$ such that if $n_{t-1}\\geq 1 / \\delta_1$ then\n\\begin{itemize}\n\\item there exists an $(X,\\delta_2)$-stable $(t-1)$-partition $\\mathcal{P}$,\n\\item there exists a set $Y$ with $|Y|\\leq c_{\\epsilon,F}$ such that every vertex $v$ of $G-Y$ satisfies $|N_G(v)\\bigtriangleup N_{K_{n_1,\\ldots,n_r}[\\mathcal{P}]}(v)|\\leq \\epsilon n_{t-1}$,\n \\item the difference of the 
degrees of vertices of $G-Y$ in the same class of $\\mathcal{V}$ is at most $\\epsilon n_{t-1}$ and\n\\item every vertex of $Y$ is adjacent to at least $\\delta_3 n_{t-1}$ vertices of each class of $\\mathcal{P}$ that together induce a complete $(t-1)$-partite graph.\n\\end{itemize}\n\n\\end{theorem}\n\n\nAs an application of our stability theorems, we strengthen Theorem~\\ref{extremal number 1} as follows.\n\n\\begin{theorem}\\label{strong bollobas}\nLet $r\\geq t$ and $n_{t-1}$ be sufficiently large.\nAll extremal graphs in Theorem~\\ref{extremal number 1} are $(t-1)$-partite.\n\\end{theorem}\n\nThe second application of our stability theorems is the extremal problem of vertex-disjoint copies of a clique in multi-partite graphs.\nLet $kK_{t}$ be the vertex-disjoint union of $k$ copies of $K_{t}$.\nChen, Li and Tu \\cite{Chen} gave the value of ex$(n_1,n_2,kK_2)$.\nLater, De Silva, Heysse and Young \\cite{Silva} strengthened Chen, Li and Tu's result and posed a problem about ex$(n_1, \\ldots,n_r,kK_t)$ when $r>t$.\nMost recently, Han and Zhao \\cite{Han} determined ex$(n_1,n_2,n_3,n_4,kK_3)$ and posed the following conjecture.\n\n\\begin{conjecture}\\label{-9}\nGiven $r \\geq t \\geq 3$ and $k \\geq 2$, let $n_r$ be sufficiently large.\nThen \\begin{equation*}\n{\\rm ex}(n_1,\\ldots,n_r,kK_{t})=f(n_1,\\ldots,n_r,k,t).\n\\end{equation*}\n\\end{conjecture}\nThey gave a graph achieving the lower bound, which is nearly (see the remark below) the lower bound graph of a $K_{t}$-free extremal graph with $O(n)$ edges added to it.\n\n\n\\medskip\\par\\noindent {\\bf Remark.~~} There are extremal graphs for $kK_t$ which are not obtained by adding edges to an extremal $(t-1)$-partition.\nLet $K=K_{n_1,\\ldots,n_6}$ with classes $V_1,\\ldots,V_6$.\nLet $n_1=n_2=m+1$ and $n_3=\\ldots=n_6=m$.\nNote that one extremal graph for $3K_3$ is achieved by adding edges to the partition $\\mathcal{P}=(V_1\\cup V_2\\cup V_3,V_4\\cup V_5\\cup V_6)$.\nHowever, $\\mathcal{P}$ is not an 
extremal 2-partition.\n\\medskip\n\n\nLet $\\mathcal{H}$ be the family of $(t-1)$-partite graphs in $K_{n_1,\\ldots,n_r}$.\nLet $\\mathcal{H}^{k-1}$ be the family of graphs obtained from $H\\in \\mathcal{H}$ by joining all possible edges incident with $k-1$ fixed vertices.\nDefine\n$$g(n_1,\\ldots,n_r,k,t):=\\max\\left\\{ e(F):F \\in\\mathcal{H}^{k-1} \\right\\}.$$\n\nAs an application of Theorem~\\ref{strong stability}, we will confirm Han and Zhao's conjecture in the following stronger form.\n\\begin{theorem}\\label{conjecture}\nGiven $r \\geq t \\geq 3$ and $k \\geq 2$, let $n_1 \\geq \\ldots \\geq n_r$ and $n_{t-1}$ be sufficiently large.\nThen \\begin{equation*}\n{\\rm ex}(n_1,\\ldots,n_r,kK_{t})=g(n_1,\\ldots,n_r,k,t).\n\\end{equation*}\nMoreover, all extremal graphs are obtained from $(t-1)$-partite graphs by joining all possible edges incident with $k-1$ fixed vertices.\n\\end{theorem}\n\nThe organization of this paper is as follows.\nIn Section~\\ref{section 2}, we introduce some lemmas.\nIn Section~\\ref{section 3}, we present the properties of extremal graphs for $K^s_t$ and prove Theorem~\\ref{extremal (t-1)-partition}.\nIn Sections~\\ref{section 4} and \\ref{section 6}, we give the proofs of Theorems~\\ref{weak stability}, \\ref{strong stability}, \\ref{strong bollobas} and \\ref{conjecture}.\n\n\\section{Preliminaries}\\label{section 2}\n\\subsection{Definitions and Notation}\n\\noindent Recall that we mostly consider $r$-partite graphs with parts $\\mathcal{V}=(V_1,\\ldots,V_r)$.\nLet $V=\\bigcup_{i=1}^rV_i$.\nFor a vertex $x$ of a given graph $G$, the \\textcolor{blue}{{\\it neighborhood}} of $x$ in $G$ is denoted by $N_G(x)=\\{y\\in V(G):(x,y)\\in E(G)\\}$.\nThe \\textcolor{blue}{{\\it degree}} of $x$ in $G$, denoted by $d_{G}(x)$, is the size of $N_G(x)$.\nGiven a graph $G$ and a subset $A$ of $V(G)$, let $N_G[A]$ be the set of common neighbours of $A$ in $G$.\nGiven two disjoint independent sets $A,B$ of $G$, let $G_{A\\rightarrow B}$ be the 
graph obtained from $G$ by deleting all edges incident with $A$ and adding each possible edge between $A$ and $B$.\nFor an independent set $A$ of $G$, let $G_{A}=G_{A\\rightarrow N_G(u)}$ where $u$ is a vertex with maximum degree in $A$.\n\n\t\nGiven a partition $\\mathcal{P}=(P_1,\\ldots,P_{t-1})$ of $V$, we define $\\mathcal{P} \\wedge \\mathcal{V}$ as a refinement of $\\mathcal{V}$:\n$$\\mathcal{P} \\wedge \\mathcal{V}:=(P_i\\cap V_j: i\\in [t-1],j\\in [r]).$$\n\n\n\nWe say a family of numbers $a_1,\\ldots,a_s$ with $a_1\\geq \\ldots\\geq a_s$ is \\textcolor{blue}{{$L$-$balance$}} if $a_i\\geq a_{i-1}/L^4$ for each $i\\in \\{2,\\ldots,s\\}$.\nWe partition the family of numbers $n_1,\\ldots,n_r$ into maximal $L$-balance families $B_i$ such that each member of $B_{i-1}$ is larger than each member of $B_i$.\nIf $|B_1|\\geq t-1$, then we set $\\tau(n_1,\\ldots,n_r,t,L)=0$.\nOtherwise, let\n$$\\tau(n_1,\\ldots,n_r,t,L):=\\max \\left\\{x: \\sum\\limits^x_{i=1}|B_i|< t-1\\right\\}.$$\n\n\nThe Graph Removal Lemma is widely used in extremal graph theory and related topics.\nWe will apply the following simple form of the Removal Lemma.\n\n\\begin{lemma}[A simple form of the Removal Lemma \\cite{Furedi2015}]\\label{removal lemma}\nFor every $\\alpha>0$, $s$ and $t\\geq 3$, there is a $\\delta$ such that if $n> 1/\\delta$ and $G$ is an $n$-vertex $K_t^s$-free graph, then it contains a $K_t$-free subgraph $H$ with $e(H)>e(G)-\\alpha n^2$.\n\\end{lemma}\n\nThe following lemma can be found in the classic book of Bollob\\'{a}s \\cite{B1976}.\n\n\\begin{lemma}[Selection lemma \\cite{B1976}]\\label{selection lemma}\nLet $\\epsilon_1,\\ldots,\\epsilon_k$ be positive numbers, $0<\\alpha<1$ and $t$ be a natural number.\nThere exists a natural number $N=N(\\epsilon_1,\\ldots,\\epsilon_k;\\alpha;t)$ with the following property.\nFor $V_1,\\ldots,V_k$ with $|V_i|=n_i$ and $A_{ij}\\subset V_i$ with $j\\in [N]$, if\n$$\\frac{1}{N}\\sum_{j=1}^N |A_{ij}|\\geq \\epsilon_i n_i \\quad\\mbox{ for }\\quad 
i=1,\\ldots,k,$$\nthen there is a subset $T\\subset [N]$ with $|T|=t$, such that if $S\\subseteq T$ then\n$$\\left|\\bigcap\\nolimits_{j\\in S} A_{ij}\\right|\\geq (\\alpha \\epsilon_i)^{|S|}n_i \\quad\\mbox{ for }\\quad i=1,\\ldots,k.$$\n\\end{lemma}\n\nThe following proposition which will be used when the differences of $V_i$'s are large.\n\\begin{proposition}\\label{size gape}\nLet $N\\leq \\sum\\nolimits_{i=1}^m u_i$.\nIf $ {\\epsilon}<(1/m)^{10^m}$, then there exists an $\\eta\\in [\\epsilon,\\sqrt[10^m]{\\epsilon}]$ such that either $u_i< \\eta N$ or $u_i\\geq \\sqrt[10]{\\eta}N$ for each $i\\in[m]$.\n\\end{proposition}\n\\begin{proof}\nLet $0=u_0\\leq u_1\\leq \\ldots \\leq u_m$.\nIf $u_i\\leq \\sqrt[10^i]{\\epsilon}N$ and $u_{i+1}\\geq \\sqrt[10^{i+1}]{\\epsilon}N$ for some $i\\in \\{0,1,\\ldots,m-1\\}$, then let $\\eta=\\sqrt[10^i]{\\epsilon}$.\nOtherwise, we have $\\sum\\nolimits_{i=1}^m u_i\\leq m\\sqrt[10^m]{\\epsilon}Nm}$ be the integral class of $\\mathcal{P}$.\nLet $X_i$ denote the integral part of $P_i$ and $Y_i$ be the partial part of $P_i$.\n\nLet $m=\\min\\{|X_{i\\leq m}|,|P_{i>m}|\\}$.\nSince $\\mathcal{P}_X$ is $\\epsilon $-stable to $\\mathcal{V}$, we only need to remove at most $2\\epsilon$ vertices from each $X_{i\\leq m}$ to make all of them with size $m$.\nMoreover, we can remove at most $4\\epsilon$ vertices from each $Y_{i\\leq m}$ to make sure that\n\\begin{itemize}\n\\item the size of each partial class of $\\mathcal{V}$ is no more than $m$ and\n\\item after removing any integral $V_j$ in $P_{i\\leq m}$ the size of the resulting set is no more than $m$.\n\\end{itemize}\n\nLet $i>m$.\nIf $|P_i|\\geq 4tr\\epsilon+m$, then we remove $3\\epsilon$ vertices from every integral $V_j$ of $P_{i}$, otherwise we do nothing.\nDenote the obtained class by $\\widetilde{P}_i$.\nNote that each class of $\\mathcal{P}\\wedge \\mathcal{V}$ is of size at least $10tr\\epsilon$.\nClearly, after deleting all vertices of any $V_i$ in $\\widetilde{P}_i$, the 
resulting set is of size at most $m$.\nTherefore, we only need to remove $X$ with $|X|\\leq 4tr\\epsilon$ to ensure $\\mathcal{P}_X$ be stable to $ \\mathcal{V}_X$.\nWe complete the proof of Lemma \\ref{remove vertices to make it stable}.\n\\end{proof}\n\n\\section{Properties of extremal graphs in multi-partite graphs}\\label{section 3}\n\n\nFirst we present some properties of the extremal graphs for $K_t$ in multi-partite graphs.\n\n\n\\begin{proposition}\\label{pro:f(n,t)}\nLet $n_1\\geq\\ldots \\geq n_r$.\nThen\n$f(n_1,\\ldots,n_r,1,t)\\geq f(n_1,\\ldots,n_r,1,t-1)+n_{t-1}^2$.\n\\end{proposition}\n\\begin{proof}\nLet $\\mathcal{P}=(P_1,\\ldots,P_{t-2})$ be a $(t-2)$-partition of $[r]$ which attains $f(n_1,\\ldots,n_r,1,t-1)$.\nThus, by the Pigeonhole Principle, one part of $\\mathcal{P}$, say $P_s$, contains at least two integers $i,j\\in [t-1]$.\nHence $\\mathcal{P}^\\ast=(P_1,\\ldots,\\{i\\},P_s\\setminus\\{i\\},\\ldots,P_{t-2})$ is a $(t-1)$-partition of $[r]$.\nNote that $n_i\\geq n_{t-1}$ and $n_j\\geq n_{t-1}$.\nTherefore, we have $f(n_1,\\ldots,n_r,1,t-1)+n_{t-1}^2\\leq f(n_1,\\ldots,n_r,1,t-1)+n_i n_j \\leq f(n_1,\\ldots,n_r,1,t)$.\nThe proof of this proposition is complete.\n\\end{proof}\n\nNow we present some properties of the extremal graphs for $K^s_t$.\n\n\\begin{proposition}\\label{neighbour copy keeps free}\nGiven a graph $F$ with $\\chi(F) \\geq 3$.\nLet $G$ be an $r$-partite graph with parts $\\mathcal{V}=(V_1,\\ldots,V_r)$.\nSuppose that $G$ is $F$-free with two vertex-disjoint independent sets $A,B$ with $|B| \\geq |F|$.\nLet $X=N_G[B]\\setminus A$.\nThen $G_{A \\rightarrow X}$ is also $F$-free.\nIn particular, if $G$ is $K_t$-free, then $G_{V_i}$ is still $K_t$-free.\n\\end{proposition}\n\\begin{proof}\nSuppose for a contradiction that $G_{A \\rightarrow X}$ contains a copy of $F$.\nClearly, $V(F)\\cap A$ is not empty, as otherwise $G$ contains a copy of $F$, a contradiction.\nWe can see that $G[X]$ contains a copy of $F-A$, implying 
$G[X\\cup B]$ contains a copy of $F$ (note that $A$ is an independent set in $G_{A \\rightarrow X}$), a contradiction ($G$ is $F$-free).\nThe proof of this proposition is complete.\n\\end{proof}\t\n\nNow we are ready to prove Theorem~\\ref{extremal (t-1)-partition}.\n\n\\medskip\n\n\\noindent {\\bf Proof of Theorem~\\ref{extremal (t-1)-partition}.}\nLet $G=K_{n_1,\\ldots,n_r}[\\mathcal{P}]$ and $\\mathcal{P}=(P_1,\\ldots,P_{t-1})$ be an extremal $(t-1)$-partition of $V(G)$ (See Figure 2).\nWe denote $X_{i,j}=V_i\\cap P_j$ for $i\\in [r],j\\in [k]$.\nWe only prove the ``only if'' part of Theorem~\\ref{extremal (t-1)-partition} since the ``if'' part is trivial.\n\nFirst, if $V_i$ is non-empty in $P_j$, then in any $P_{j^\\prime}$ with $j\\neq j^\\prime$ we have $|P_j\\setminus V_i|\\leq |P_{j^\\prime}\\setminus V_i|.$\nOtherwise, $G_{X_{i,j}\\rightarrow N_G[X_{i,j^\\prime}]}$ has more edges than $G$, a contradiction to the maximality of $G$.\nThus if $V_i$ is partial in $P_j,P_{j^\\prime}$ with $j\\neq j^\\prime$, then $|P_j\\setminus V_i|=|P_{j^\\prime}\\setminus V_i|,$ and if $V_i$ is integral in $P_j$ then for every distinct $P_{j^\\prime}$, we have\n\\begin{equation}\\label{eq 0}\n|P_j\\setminus V_i|\\leq |P_{j^\\prime}|.\n\\end{equation}\n\t\n\tAssume there exists a part $P_{j_1}$ such that two sets $V_{i_1},V_{i_2}$ with $i_1\\neq i_2$ are both partial in it.\n\tLet $V_{i_1}$ be partial in $P_{j_2}$ with $j_2\\neq j_1$ and $V_{i_2}$ be partial in $P_{j_3}$ with $j_3\\neq j_1$ (it is possible that $j_2=j_3$).\n\tFrom the last paragraph, we have $|P_{j_1}\\setminus V_{i_1}|=|P_{j_2}\\setminus V_{i_1}|$ and hence $G^1=G_{X_{i_1,j_2}\\rightarrow N_G[X_{i_1,j_1}]}$ has same number of edges with $G$.\n Note that $|P_{j_1}\\setminus V_{i_2}|=|P_{j_3}\\setminus V_{i_2}|$.\n Thus $G^1_{X_{i_2,j_1}\\rightarrow N_{G^1}[X_{i_2,j_3}]}$ has more edges than $G^1$, and hence has more edges than $G$, a contradiction.\n\tTherefore, $\\mathcal{P}$ is $1$-partial.\n\t\n\tNow, we 
may suppose that $V_{i_1}$ is partial in $P_{j_1},P_{j_2}$ and $V_{i_2}$ is partial in $P_{j_3},P_{j_4}$ with $i_1\\neq i_2$ and $j_10$ depending on $t$, $r$ and $\\epsilon$ such that if $n_{t-1}\\geq 1/\\delta$ and $e(G)\\geq f(n_1,\\ldots,n_r,1,t)-\\delta n_{t-1}^2$, then $G$ has an $(X,\\epsilon)$-stable $(t-1)$-partition $\\mathcal{P}$.\nMoreover, the size of the integral part of each class of $(\\mathcal{P} \\wedge \\mathcal{V})_X$ is larger than $(1-\\epsilon)n_{t-1}$.\n\\end{theorem}\n\n\\begin{proof}\nLet $$\\delta \\ll \\epsilon_r \\ll \\epsilon_{r-1} \\ll\\ldots \\ll \\epsilon_1 \\ll \\epsilon_0=\\epsilon \\mbox{\\quad and \\quad} n_{t-1}\\geq 1/\\delta.$$\n\n\nLet $G$ be a $K_t$-free spanning subgraph of $K_{n_1,\\ldots,n_r}$ with $e(G)\\geq f(n_1,\\ldots,n_r,1,t)-\\delta n_{t-1}^2$.\nLet $G^0=G$ and define $G^i=G^{i-1}_{V_i}$ recursively.\nClearly, by Proposition \\ref{neighbour copy keeps free} all of them are $K_t$-free and\n\\begin{equation}\\label{eq 1 for weak stability}\ne(G^r)\\geq \\ldots \\geq e(G^0)=e(G)\\geq f(n_1,\\ldots,n_r,1,t)-\\delta n_{t-1}^2.\n\\end{equation}\n\nWe first prove that $G^r$ has an $(X_r,\\epsilon_r )$-stable $(t-1)$-partition $\\mathcal{P}^r$ such that each $V_{i\\leq r}$ is integral in $\\mathcal{P}^r$.\nThe hierarchy of constants in the proof satisfy $\\delta \\ll \\xi \\ll \\epsilon_r$.\nSince $\\delta$ is sufficiently small, by Lemma \\ref{size gape} for $\\delta$ there exists a $\\xi\\in [\\delta,\\sqrt[10^{r}]{\\delta}]$, such that every set of $\\mathcal{V}$ with size either less than $\\xi n_{t-1}$ or at least $\\sqrt[10]{\\xi} n_{t-1}$.\nLet $b$ be the minimum integer such that $n_b\\leq \\xi n_{t-1}$.\nLet $X_r$ be the union of sets in $\\mathcal{V}$ with size less than $\\xi n_{t-1}$ and let $|X_r|=m_r= \\sum_{i=b}^{r}n_i$.\nLet $H_0=G^r-X_r$.\nDefine $H_{i+1}$ be the subgraph of $H_i$ induced by the neighbours of a vertex with maximum degree in $H_{i}$ recursively.\nSince $G^r$ is $K_t$-free, we define graphs 
$H_1,\\ldots,H_{s-1}$ with $s\\leq t-1$.\nLet $A_i=V(H_{i-1})-V(H_{i})$ for $i \\in [s]$.\nClearly, $A_s$ is an independent set in $G^r$.\n\nNote that every set of $\\mathcal{V}_{X_r}$ is integral in $(A_1,\\ldots,A_s)$ (each vertex in $V_i$ has common neighbors in $G^r$).\nThus\\begin{equation}\\label{eq 2 for weak stability}\ne(H_0)\\leq \\sum\\limits_{i=1}^{s-1} \\left(\\sum\\limits_{v\\in A_i}\\frac{d_{A_i}(v)}{2}+d_{H_i}(v)\\right)\\leq \\sum\\limits_{1\\leq i f(n_1,\\ldots,n_r,1,t)$, hence $G^k$ contains a copy of $K_t$, a contradiction.\n\nClearly, the vertices in $\\widetilde{V}_j$ have same neighbours in $G^{k-1}$ for $j \\leq k-1$.\nFor $j \\geq k+1$, similar as the above proof (the proof for the bound of $Z_1$), there exists a set of vertices $Z_2$ with size at most $\\sqrt{\\epsilon_{k}} n_{t-1}$ such that for each vertex $a\\in\\widetilde{V}_j \\setminus Z_2$ in $G^{k-1}$, we have\n\\begin{equation}\\label{minimal degree 1}\nd_{G^{k-1}}(a)\\geq \\sum_{i\\neq i_j} |P_{i}^k|- \\sqrt{\\epsilon_{k}} n_{t-1}.\n\\end{equation}\nLet $\\mathcal{\\widetilde{V}}=(\\widetilde{V}_1,\\ldots,\\widetilde{V}_r)$.\nBy Proposition \\ref{size gape}, for $\\sqrt[4]{\\epsilon_{k}}$ there exists $\\xi$ with a $\\xi\\in [\\sqrt[4]{\\epsilon_{k}},\\sqrt[4 \\times 10^{(t-1)r}]{\\epsilon_{k}}]$, such that every set of $(\\mathcal{P}^k\\wedge \\mathcal{\\widetilde{V}})_{X_k \\cup Z_1 \\cup Z_2}$ with size either less than $\\xi n_{t-1}$ or more than $\\sqrt[10]{\\xi} n_{t-1}$.\nLet $Z_3$ be the union of sets in $(\\mathcal{P}^k\\wedge \\mathcal{\\widetilde{V}})_{X_k \\cup Z_1 \\cup Z_2}$ with size less than $\\xi n_{t-1}$.\nLet $X_{k-1}^1=X_{k}\\cup Z_1\\cup Z_2\\cup Z_3$.\nThus $|X_{k-1}^1|\\leq 4tr\\xi n_{t-1}$.\n\nLet $S=V_{k}\\cup X_{k-1}^1$.\nNow we try to construct the desired $(t-1)$-partition of $G^{k-1}$ by distributing vertices in $V_{k}\\setminus X_{k-1}^1$ to each part of $\\mathcal{P}^k_{S}=(\\widetilde{P}_1,\\ldots,\\widetilde{P}_{t-1})$.\nWe partition 
$V_k\\setminus X_{k-1}^1$ into $B_1\\cup \\ldots\\cup B_{t-1}$ such that if $v\\in B_i$ then $v$ has minimum number of neighbours in $\\widetilde{P}_i$.\n\n\\medskip\n\n\\noindent{\\bf Claim 1.} There is no edge between $B_i$ and $\\widetilde{P}_i$ for $i\\in [t-1]$.\n\n\n\\medskip\n\n\\begin{proof}\nClearly, we may suppose that $|\\widetilde{P}_i|\\geq \\sqrt[10]{\\xi}n_{t-1}$, as otherwise there is nothing to show.\nLet $a\\in B_i$.\nDue to (\\ref{minimal degree}) and $\\mathcal{P}^k$ is $\\sqrt{\\xi} n_{t-1}$-stable to $\\mathcal{V}$, by Proposition~\\ref{lemma for Q} there exists an $m(a)$ such that $a$ has more than $(\\sqrt[4]{ \\epsilon_k}/4)n_{t-1}$ vertices of the integral part of every $\\widetilde{P}_{i\\neq m(a)}$.\nIf there is an edge between $a$ and $\\widetilde{P}_i$, then by the definition of $B_i$, there is an edge $ab$ with $b \\in\\widetilde{P}_{m(a)}$.\nSince $\\mathcal{P}^k_{X_k}$ is $(X_k,\\epsilon_{k})$-stable, we can construct a copy of $K_t$ in $G^{k-1}$ with vertices $a,b$ and their common neighbours in the integral parts of $\\widetilde{P}_{i\\neq m(a)}$'s, a contradiction.\nHence we finish the proof of Claim 1.\n\\end{proof}\n\n\nWe construct the $(t-1)$-partition $\\mathcal{P}_{X_{k-1}^1}^{k-1}$ with each part $P^{k-1}_j=\\widetilde{P}_j\\cup B_j$.\nBy Claim 1, each part of $\\mathcal{P}^{k-1}_{X_{k-1}^1}$ is an independent set in $G^{k-1}$.\nMoreover, since $\\mathcal{P}^k_{X_k}$ is $(X_k,\\epsilon_{k})$-stable, the size of $P^k_{i_k} \\setminus V_{k}$ is no more than the size of the integral part of other class of $\\mathcal{P}^k_{X_k}$.\n\nBy Proposition \\ref{size gape} for $\\xi$ there exists a $\\zeta$ with $\\zeta\\in [{\\xi},\\sqrt[10^{(t-1)r}]{\\xi}]$ such that every set of $(\\mathcal{P}^{k-1}\\wedge \\mathcal{V})_{X_{k-1}^1}$ with size either less than $\\zeta n_{t-1}$ or more than $\\sqrt[10]{\\zeta} n_{t-1}$.\nLet $Z_4$ be the union of sets in $(\\mathcal{P}^{k-1}\\wedge \\mathcal{V})_{X_{k-1}^1}$ with size less than 
$\\zeta n_{t-1}$ and $X_{k-1}^2=X_{k-1}^1\\cup Z_4$.\nThus $|X_{k-1}^2|\\leq 6tr\\zeta n_{t-1}$.\n\n\nWe define $\\widehat{V}_i=V_i\\setminus X_{k-1}^2,\\widehat{P}_j=P^{k-1}_j\\setminus X_{k-1}^2$ and let $X_{i,j}=\\widehat{V}_i\\cap \\widehat{P}_j$ for $i\\in [r],j\\in [t-1]$.\nNow we will show that $\\mathcal{P}^{k-1}_{X^2_{k-1}}$ is $\\sqrt[8]{\\zeta}n_{t-1}$-stable to $\\mathcal{V}$.\nThe proofs of the following claims are quite similar to the proof in Theorem~\\ref{extremal (t-1)-partition}.\n\n\n\\medskip\n\n\n\\noindent{\\bf Claim 2.} If $\\widehat{V}_i$ is partial in distinct $\\widehat{P}_s,\\widehat{P}_\\ell$, then $\\big||\\widehat{P}_s\\setminus \\widehat{V}_i|-|\\widehat{P}_\\ell\\setminus \\widehat{V}_i|\\big |\\leq \\sqrt{\\zeta} n_{t-1}$.\n\n\\medskip\n\n\\begin{proof}\nWithout loss of generality, suppose for a contradiction that $\\widehat{V}_1$ is partial in parts $\\widehat{P}_1,\\widehat{P}_2$ with $|\\widehat{P}_2\\setminus \\widehat{V}_1|-|\\widehat{P}_1\\setminus \\widehat{V}_1|> \\sqrt{\\zeta} n_{t-1}$.\nLet $S=\\bigcup\\nolimits_{i=2}^{t-1} (\\widehat{P}_i\\setminus \\widehat{V}_1)$.\nNote that $|X_{1,1}|\\geq \\sqrt[10]{\\zeta}n_{t-1}$.\nThus by \\eqref{eq 1 for weak stability} and Proposition \\ref{neighbour copy keeps free}, we know that $H=G^{k-1}_{X_{1,1}\\to S}$ is a $K_t$-free graph with\n\\begin{eqnarray*}\ne(H)&\\geq& e(G^{k-1})+ \\sqrt{\\zeta}n_{t-1}|X_{1,1}|\\\\\n&\\geq& f(n_1,\\ldots,n_r,1,t)-\\delta n_{t-1}^2+ \\sqrt{\\zeta} n_{t-1} \\sqrt[10]{\\zeta} n_{t-1}\\\\\n&>& f(n_1,\\ldots,n_r,1,t),\n\\end{eqnarray*}\n a contradiction.\nWe finish the proof of Claim 2.\n\\end{proof}\n\n\n\n\n\n\n\\medskip\n\n\n\\noindent{\\bf Claim 3.} $\\mathcal{P}^{k-1}_{X_{k-1}^2}$ is $1$-partial to $\\mathcal{V}_{X_{k-1}^2}$.\n\n\\medskip\n\n\\begin{proof}\nWithout loss of generality, suppose for a contradiction that both $\\widehat{V}_1,\\widehat{V}_2$ are partial in $\\widehat{P}_1$.\nLet $\\widehat{V}_1$ be partial in $\\widehat{P}_{s}$ and 
$\\widehat{V}_2$ be partial in $\\widehat{P}_{\\ell}$ (it is possible that $s=\\ell$).\nSince $\\widehat{V}_2$ is partial in $\\widehat{P}_1,\\widehat{P}_{\\ell}$, it follows from Claim 2 that $|\\widehat{P}_1\\setminus \\widehat{V}_2|-|\\widehat{P}_{\\ell}\\setminus \\widehat{V}_2|\\geq -\\sqrt{\\zeta} n_{t-1}$.\nLet $S_1=\\bigcup\\limits_{i\\neq \\ell} (\\widehat{P}_i\\setminus \\widehat{V}_2)$.\nThus $H^1=G^{k-1}_{X_{2,1}\\to S_1}$ is a $K_t$-free graph with\n$$e(H^1)\\geq e(G^{k-1})+ (|\\widehat{P}_1\\setminus \\widehat{V}_2|-|\\widehat{P}_{\\ell}\\setminus \\widehat{V}_2|-|X_{k-2}^2|)|X_{2,1}|\n\\geq e(G^{k-1}) -2\\sqrt{\\zeta}n_{t-1}|X_{2,1}|.$$\nLet $S_2=\\bigcup\\limits_{i=2}^{t-2}(\\widehat{P}_i\\setminus \\widehat{V}_1)\\cup X_{2,1}$.\nSince $\\widehat{V}_1$ is partial in $\\widehat{P}_1,\\widehat{P}_{s}$, Claim 2 implies $|\\widehat{P}_1\\setminus X_{1,1}|-|\\widehat{P}_{s}\\setminus X_{1,s}|\\leq \\sqrt{\\zeta} n_{t-1}$.\nNote that $|X_{2,1}|,|X_{1,s}|\\geq \\sqrt[10]{\\zeta}n_{t-1}$.\nThus $H^2=H^1_{X_{1,s}\\to S_2}$ is a $K_t$-free graph with\n\\begin{eqnarray*}\ne(H^2)&\\geq& e(H^1)+(|X_{2,1}|-\\sqrt{\\zeta}n_{t-1}-|X_{k-2}^2|)|X_{1,s}|\\\\\n&\\geq& e(H^1)+\\dfrac{1}{2}|X_{2,1}||X_{1,s}|\\\\\n& \\geq& e(G^{k-1})+\\dfrac{1}{2}\\sqrt[5]{\\zeta}n_{t-1}^2\\\\\n&>&f(n_1,\\ldots,n_r,1,t),\n\\end{eqnarray*}\na contradiction.\nThus $\\mathcal{P}^{k-1}_{X_{k-1}^2}$ is $1$-partial to $\\mathcal{V}_{X_{k-1}^2}$.\n\\end{proof}\n\n\nBy Claim 3 we know that if $\\widehat{V}_i$ is partial in $\\widehat{P}_j$, then $\\widehat{P}_j\\setminus \\widehat{V}_i$ is the integral part of $\\widehat{P}_j$.\n\n\\medskip\n\n\n\\noindent{\\bf Claim 4.} If $\\widehat{P}_{s},\\widehat{P}_{\\ell}$ are distinct partial classes of $\\mathcal{P}^{k-1}_{X_{k-1}^2}$, then the integral parts of $\\widehat{P}_{s}$ and $\\widehat{P}_{\\ell}$ are less than $\\sqrt[5]{\\zeta} n_{t-1}$ difference in sizes.\n\n\\medskip\n\n\\begin{proof}\nIn Claim 2, we have already showed that Claim 4 holds 
when $\\widehat{V}_i$ is partial in both $\\widehat{P}_s,\\widehat{P}_\\ell$.\nWithout loss of generality, we only need to show that if $\\widehat{V}_1$ is partial in $\\widehat{P}_1$ and $\\widehat{V}_2$ is partial in $\\widehat{P}_2$, then $\\big||\\widehat{P}_1\\setminus \\widehat{V}_1|-|\\widehat{P}_2\\setminus \\widehat{V}_2|\\big|\\leq \\sqrt[5]{\\zeta} n_{t-1}.$\nWithout loss of generality, suppose that $\\widehat{V}_2$ is also partial in $\\widehat{P}_3$ and let $a\\in X_{2,2}$ be a vertex with maximum degree among $X_{2,2}\\cup X_{2,3}$ in the graph $G^{k-1}$.\nThus $H^1=G^{k-1}_{X_{2,3}\\to N_{G^{k-1}}(a)}$ remains $K_t$-free with $e(H^1)\\geq e(G^{k-1})$.\nLet $S=\\bigcup\\limits_{i\\neq 3} (\\widehat{P}_i\\setminus \\widehat{V}_1)\\cup X_{2,3} $.\nNote that $H^2=H^1_{X_{1,1}\\to S}$ is a $K_t$-free graph.\nThus\n$$f(n_1,\\ldots,n_r,1,t)\\geq e(H^2)\\geq e(G^{k-1})+(|\\widehat{P}_1\\setminus \\widehat{V}_1|-|\\widehat{P}_3\\setminus \\widehat{V}_2|-|X_{k-2}^2|)|X_{1,1}|,$$\nwhich implies $|\\widehat{P}_1\\setminus \\widehat{V}_1|-|\\widehat{P}_3\\setminus \\widehat{V}_2|\\leq \\sqrt[4]{\\zeta}n_{t-1}$.\nBy Claim 2 we know $|\\widehat{P}_3\\setminus \\widehat{V}_2|-|\\widehat{P}_2\\setminus \\widehat{V}_2|\\leq \\sqrt{\\zeta}n_{t-1}$.\nThus $|\\widehat{P}_1\\setminus \\widehat{V}_1|-|\\widehat{P}_2\\setminus \\widehat{V}_2|\\leq \\sqrt[5]{\\zeta}n_{t-1}$.\nBy the symmetry of $\\widehat{P}_1,\\widehat{P}_2$, we have $|\\widehat{P}_2\\setminus \\widehat{V}_2|-|\\widehat{P}_1\\setminus \\widehat{V}_1|\\leq \\sqrt[5]{\\zeta}n_{t-1}$, which finishes the proof of Claim 4.\n\\end{proof}\n\n\n\n\n\n\\noindent{\\bf Claim 5.} If $\\widehat{V}_i$ is partial in $\\widehat{P}_j$, then $|\\widehat{P}_j\\setminus \\widehat{V}_i|\\geq |\\widehat{V}_i|-\\sqrt[7]{\\zeta}n_{t-1}$.\n\n\\medskip\n\n\\begin{proof}\nWithout loss of generality, suppose for a contradiction that $\\widehat{V}_1$ is partial in $\\widehat{P}_1,\\widehat{P}_2,\\ldots,\\widehat{P}_s$ with $|\\widehat{P}_1\\setminus \\widehat{V}_1|\\leq 
|\\widehat{V}_1|-\\sqrt[7]{\\zeta}n_{t-1}$.\n\n\n\nIn the graph $G^{k-1}$, we change the neighbourhood of every vertex of $\\widehat{V}_1$ to $\\bigcup_{j=1}^{t-1} (\\widehat{P}_j\\setminus \\widehat{V}_1)$ and delete the edges between $\\widehat{P}_1\\setminus \\widehat{V}_1$ and $\\widehat{P}_2\\setminus \\widehat{V}_1$.\nDenote the resulting graph by $J$.\nNote that $J$ is a $K_t$-free graph with\n\\begin{displaymath}\n\\begin{aligned}\ne(J)&\\geq e(G^{k-1})+(|\\widehat{P}_1\\setminus \\widehat{V}_1|-2\\sqrt[5]{\\zeta}n_{t-1})|\\widehat{V}_1|-|\\widehat{P}_1\\setminus \\widehat{V}_1||\\widehat{P}_2\\setminus \\widehat{V}_1|\\\\\n&\\geq e(G^{k-1})+|\\widehat{P}_1\\setminus \\widehat{V}_1|( \\sqrt[7]{\\zeta}n_{t-1}-3\\sqrt[5]{\\zeta}n_{t-1})-2\\sqrt[7]{\\zeta}\\cdot \\sqrt[5]{\\zeta} n_{t-1}^2\\\\\n&\\geq e(G^{k-1})+\\sqrt{\\zeta}n_{t-1}^2>f(n_1,\\ldots,n_r,1,t),\n\\end{aligned}\n\\end{displaymath}\na contradiction.\nThe proof of Claim 5 is complete.\n\\end{proof}\n\n\n\n\\noindent{\\bf Claim 6.} If $\\widehat{V}_i$ is partial in $\\widehat{P}_\\ell$ and $\\widehat{P}_s$ is an integral class, then $|\\widehat{P}_\\ell\\setminus \\widehat{V}_i|\\leq |\\widehat{P}_s|+\\sqrt[7]{\\zeta}n_{t-1}$.\n\n\\medskip\n\\begin{proof}\nLet $S=\\bigcup\\limits_{\\iota\\neq s}(\\widehat{P}_\\iota\\setminus \\widehat{V}_i)$.\nIf the claim fails, then the graph $H=G^{k-1}_{X_{i,\\ell}\\to S}$ is a $K_t$-free graph with more than $e(G^{k-1})+(\\sqrt[7]{\\zeta}-\\sqrt{\\zeta})\\sqrt[10]{\\zeta}n_{t-1}^2> f(n_1,\\ldots,n_r,1,t)$ edges, a contradiction.\nThis completes the proof of Claim 6.\n\\end{proof}\n\n\n\n\n\\noindent{\\bf Claim 7.} If $\\widehat{V}_i$ is integral in $\\widehat{P}_\\ell$ and $\\widehat{V}_j$ is partial in $\\widehat{P}_s$, then $|\\widehat{P}_\\ell \\setminus \\widehat{V}_i| \\leq |\\widehat{P}_s \\setminus \\widehat{V}_j|+\\sqrt[7]{\\zeta}n_{t-1}$.\n\n\n\n\\medskip\n\\begin{proof}\nLet $\\widehat{V}_j$ also be partial in $\\widehat{P}_{s^\\prime}$.\nWe first suppose $s^\\prime \\neq 
\\ell$.\nWithout loss of generality, let $a\\in X_{j,s}$ be a vertex with maximum degree among set $X_{j,s}\\cup X_{j,s^\\prime}$ in graph $G^{k-1}$.\nThus $H^1=G^{k-1}_{X_{j,s^\\prime}\\to N_{G^{k-1}(a)}}$ keeps $K_t$-free with $e(H^1)\\geq e(G^{k-1})$.\nLet $S=\\bigcup_{\\iota\\neq s^\\prime}(\\widehat{P}_\\iota\\setminus \\widehat{V}_i)\\cup X_{j,s^\\prime}$ and $H^2=H^1_{X_{i,\\ell}\\to S}$.\nNote that $H^2$ is a $K_t$-free graph.\nWe have $f(n_1,\\ldots,n_r,1,t)\\geq e(H^2)\\geq e(G^{k-1})+(|\\widehat{P}_\\ell \\setminus \\widehat{V}_i|- |\\widehat{P}_{s^\\prime} \\setminus \\widehat{V}_j|-|X_{k-2}^2|)|X_{i,\\ell}|$, implying $|\\widehat{P}_\\ell \\setminus \\widehat{V}_i|- |\\widehat{P}_{s^\\prime} \\setminus \\widehat{V}_j|\\leq \\sqrt[5]{\\zeta}n_{t-1}$.\nSince $\\big| |\\widehat{P}_s \\setminus \\widehat{V}_j|- |\\widehat{P}_{s^\\prime} \\setminus \\widehat{V}_j| \\big| \\leq \\sqrt{\\zeta}n_{t-1}$, we have $|\\widehat{P}_\\ell \\setminus \\widehat{V}_i| \\leq |\\widehat{P}_s \\setminus \\widehat{V}_j|+\\sqrt[7]{\\zeta}n_{t-1}$.\n\nSuppose $s^\\prime =\\ell$.\nLet $S_1=\\bigcup_{\\iota\\neq \\ell} \\widehat{P}_\\iota\\setminus \\widehat{V}_j$ and $H^1=G^{k-1}_{X_{j,s}\\to S_1}$.\nLet $S_2=\\bigcup_{\\iota\\neq s} (\\widehat{P}_\\iota\\setminus \\widehat{V}_i)\\cup X_{j,s}$ and $H^2=H^1_{\\widehat{V}_i\\to S_2}$.\nClearly, $H^1$ and $H^2$ are both $K_t$-free.\nNote that $|\\widehat{V}_i|\\geq \\sqrt[10]{\\zeta}n_{t-1}$.\nBy Claim 2 we have $|\\widehat{P}_s\\setminus \\widehat{V}_j|-|\\widehat{P}_\\ell\\setminus \\widehat{V}_j|\\geq -\\sqrt{\\zeta}n_{t-1}$, hence\n\\begin{equation}\n\\begin{aligned}\nf(n_1,\\ldots,n_r,1,t)&\\geq e(H^2)\\geq e(H^1)+(|\\widehat{P}_\\ell\\setminus \\widehat{V}_i|-|\\widehat{P}_s\\setminus \\widehat{V}_i|+|X_{j,s}|-|X_{k-1}^2|)|\\widehat{V}_i|\\\\\n&\\geq e(G^{k-1})+(|\\widehat{P}_s\\setminus \\widehat{V}_j|-|\\widehat{P}_\\ell\\setminus \\widehat{V}_j|-|X^2_{k-1}|)|X_{j,s}|\\\\\n&+(|\\widehat{P}_\\ell\\setminus 
\\widehat{V}_i|-|\\widehat{P}_s\\setminus \\widehat{V}_i|+|X_{j,s}|-|X_{k-1}^2|)|\\widehat{V}_i|\\\\\n&\\geq e(G^{k-1})+(|\\widehat{P}_\\ell\\setminus \\widehat{V}_i|-|\\widehat{P}_s\\setminus \\widehat{V}_i|)|\\widehat{V}_i|.\n\\end{aligned}\n\\end{equation}\nTherefore, we have $|\\widehat{P}_\\ell\\setminus \\widehat{V}_i|\\leq |\\widehat{P}_s\\setminus \\widehat{V}_j|+\\sqrt[7]{\\zeta}n_{t-1}$, and we finish the proof of Claim 7.\n\\end{proof}\n\nBy Claims 3-7, $\\mathcal{P}^{k-1}_{X^2_{k-1}}$ is $\\sqrt[8]{\\zeta}n_{t-1}$-stable to $\\mathcal{V}$.\nBy Lemma \\ref{remove vertices to make it stable}, we can remove at most $2tr\\sqrt[7]{\\zeta}n_{t-1}$ vertices, say $Z_5$, to ensure that $(\\mathcal{P} \\wedge \\mathcal{V})_{X_{k-1}^2\\cup Z_5}$ satisfies the following.\n\\begin{itemize}\n\\item the sizes of integral parts of partial classes are equal to each other,\n\\item the size of the integral part of any class is at least the size of any partial class and\n\\item after removing any integral $V_i$, the size of the resulting set is at most the size of the integral part of any other class.\n\\end{itemize}\nLet $X_{k-1}=X_{k-1}^2\\cup Z_5$.\nThus $|X_{k-1}|\\leq \\sqrt[8]{\\zeta}n_{t-1}\\leq \\epsilon_{k-1} n_{t-1}$.\nCombining (\\ref{minimal degree}) and (\\ref{minimal degree 1}), we can see that the partition $\\mathcal{P}^{k-1}$ is an $(X_{k-1},\\epsilon_{k-1})$-stable partition of $G$.\nMoreover, every $V_{i\\leq k-1}\\setminus X_{k-1}$ keeps integral in $\\mathcal{P}^{k-1}_{X_{k-1}}$.\nFurthermore, similar as the proof of Proposition~\\ref{larger than n_{t-1}}, the size of integral part of each class of $\\mathcal{P}_{X_{k-1}}$ is at least $(1-\\epsilon_{k-1}) n_{t-1}$.\n\n\nThus recursively, we obtain an $(X_0,\\epsilon)$-stable partition $\\mathcal{P}^0$ of $G$, and we finish the proof of Theorem~\\ref{weak stability n case}.\n\\end{proof}\n\nNow we start to prove Theorem \\ref{weak stability}.\n\n\\medskip\n\n\n\n\\noindent{\\bf Proof of 
Theorem~\\ref{weak stability}.}\nNote that each graph with chromatic number $t$ is a subgraph of a $t$-partite graph.\nWe only need to prove that the theorem holds for $K_t^s$ where $s$ is an integer depending on $F$.\n\n\nLet $G$ be a $K_t^s$-free $r$-partite graph with parts $\\mathcal{V}=(V_1,\\ldots,V_r)$.\nLet $0<\\delta\\ll \\delta_1 \\ll\\epsilon<1$, where $\\delta$ and $\\delta_1$ will be determined later by $\\epsilon$, Lemma \\ref{removal lemma} and Theorem \\ref{weak stability n case}.\nLet $n \\geq 1/\\delta $.\nSuppose that\n\\begin{equation}\\label{eq for weak}\ne(G)\\geq f(n_1,\\ldots,n_t,1,t)-\\delta n^2_{t-1}.\n\\end{equation}\nWe shall show that $G$ has an $(X,\\epsilon )$-stable $(t-1)$-partition $\\mathcal{P}$ after removing $ \\epsilon n_{t-1}^2$ edges.\n\nLet $\\tau=\\tau(n_1,\\ldots,n_r,t,tr)$.\nWe first discuss the case $\\tau=0$.\nSince $\\tau=0$, we have $n_{t-1}\\geq (1/tr)^{4t}n$.\nFor $\\alpha=(1/tr)^{8t}\\delta_1$, by Lemma \\ref{removal lemma}, $G$ contains a $K_t$-free graph $\\widetilde{G}$ with $e(\\widetilde{G})\\geq e(G)-(1/tr)^{8t}\\delta_1 n^2 \\geq e(G)-\\delta_1 n_{t-1}^2$ edges.\nHence by (\\ref{eq for weak}), the resulting graph $\\widetilde{G}$ is $K_t$-free with $e(\\widetilde{G})\\geq f(n_1,\\ldots,n_t,1,t)- 2\\delta_1 n^2_{t-1}$.\nTherefore, by Theorem \\ref{weak stability n case}, $\\widetilde{G}$ has an $(X,\\epsilon)$-stable $(t-1)$-partition $\\mathcal{P}$ and each class of $\\mathcal{P}$ is larger than $(1-\\epsilon) n_{t-1}$.\nThus we finish the proof of the theorem in this case.\n\n\n\nThen we discuss the case when $\\tau\\geq 1$.\nLet $U_L$ be the union of large sets $V_1,\\ldots,V_{\\tau}$ and $U_S=V(K)-V_L$ to be the union of rest small sets.\n\nIf $\\tau\\geq 1$, then $\\tau\\leq t-2$ (recall the definition of $\\tau$).\nClearly, we have\n\\begin{equation}\\label{1.4 Eq 1}\nf(n_1,\\ldots,n_r,1,t)\\geq e(K_{n_1,\\ldots,n_\\tau})\\geq e(K_{n_1,\\ldots,n_r})-|U_S|^2\\geq e(K_{n_1,\\ldots,n_r})-(r 
n_{\\tau+1})^2.\n\\end{equation}\nLet $Z_i^1$ be the vertices of large set $V_i$ with degree less than $n-|V_i|-2rn_{\\tau+1}$.\nThus $e(G)\\leq e(K_{n_1,\\ldots,n_r})-2rn_{\\tau+1}|Z_i^1|$.\nHence by (\\ref{eq for weak}) and (\\ref{1.4 Eq 1}), we have $2rn_{\\tau+1}|Z_i^1|\\leq 2(rn_{\\tau+1})^2$ and thus $|Z_i^1|\\leq rn_{\\tau+1}\\leq r(1/tr)^4n_\\tau\\leq (1/tr)^3n_\\tau$.\nNote that every vertex of $V_i$ except $Z_i^1$ is adjacent to all but at most $2rn_{\\tau+1}\\leq 2rn_\\tau /(tr)^4$ vertices in other classes of $K_{n_1,\\ldots,n_\\tau}$.\nTherefore, we can easily find a copy of $K_\\tau^s$ in the subgraph of $G$ induced by the union of any $n_\\tau/2$ vertices from each set of $V_1,\\ldots,V_\\tau$.\n\nNow we prove $G[U_S]$ can be $K^s_{t-\\tau}$-free by only removing $\\delta_1 n_{\\tau+1}^2$ edges.\nSuppose $G[U_S]$ contains a copy of $K^s_{t-\\tau}$.\nThen there is at least one vertex of $K^s_{t-\\tau}$ having neighbours less than $|U_L|-2tr n_{\\tau+1}$ neighbours in $U_L$, as otherwise all vertices of $K^s_{t-\\tau}$ will have more than $n_i-2(tr)^2 n_{\\tau+1} \\geq n_\\tau/2$ common neighbours in each set of $V_1,\\ldots,V_{\\tau}$.\nThus we can extend $K^s_{t-\\tau}$ to $K_t^s$ with some $s$ vertices from each $V_i$, a contradiction to that $G$ is $K_{t}^s$-free.\nTherefore, at least one vertex $v_1$ of $K^s_{t-\\tau}$ has degree at most $|U_L|-2tr n_{\\tau+1}$ in $U_L$, implying $d_G(v_1)\\leq |U_L|-tr n_{\\tau+1}$.\n\nLet $H^1=G_{\\{v_1\\}\\to U_L}$.\nThus $e(H^1)\\geq e(G)+|U_L|-d_G(v_1)\\geq e(G)+trn_{\\tau+1}$.\nSince $G$ is $K_t^s$-free and $v_1$ is not adjacent any copy of $K_{t-1}$ in $H^1$, $H^1$ is still $K_t^s$-free.\n\nWe repeat this process recursively and obtain graph $H^{C}$ such that $H^C[U_S]$ is $K_{t-\\tau}^s$-free.\nBy Lemma \\ref{removal lemma} $H^C[U_S]$ contains a $K_{t-\\tau}$-free subgraph $H_S$ with $e(H_S)\\geq e(H^C[U_S])-(1/tr)^{8t}\\delta_1 n_{\\tau+1}^2$.\nThus by Theorem~\\ref{extremal number 1}, we 
have\n\\begin{equation}\\label{eq 7}\ne(H^C[U_S])\\leq f(n_{\\tau+1},\\ldots,n_r,1,t-\\tau)+(1/tr)^{8t}\\delta_1 n_{\\tau+1}^2\n\\end{equation}\nIt follows from Lemma \\ref{remove big set} and \\eqref{eq for weak} that\n\\begin{equation}\\label{eq 8}\n\\begin{aligned}\ne(H^C[U_S])&\\geq e(H^C)-\\sum\\limits_{1\\leq i0$ and $n_{t-1}\\geq 1 / \\delta_1$.\n\nBy Theorem~\\ref{extremal number 1}, there is an extremal graph for $K_{t}$ which is $(t-1)$-partite, implying $e(G)\\geq f(n_1,\\ldots,n_r,1,t)$ (since every $(t-1)$-partite graph is $F$-free).\nSince $\\delta_2 \\gg \\delta_1$, by Theorem \\ref{weak stability}, after removing at most $\\delta_2n_{t-1}^2$ edges, the resulting graph $\\widetilde{G}$ has an $(X,\\delta_2)$-stable $(t-1)$-partition $\\mathcal{P}=(P_1,\\ldots,P_{t-1})$.\nMoreover, we partition $X$ into $X_i\\subseteq P_i$ such that each vertex of $X$ has smallest number of neighbours in its own class of $\\mathcal{P}$.\n\n\n\nLet $\\epsilon \\gg \\delta^\\prime_3 \\gg \\delta_2$.\nBy Proposition \\ref{size gape} there exists an $\\epsilon_1$ with $\\epsilon_1\\in [\\delta^\\prime_3,\\sqrt[10^{(t-1)r}]{\\delta^\\prime_3}]$, such that every set of $(\\mathcal{P}\\wedge \\mathcal{V})_{X}$ with size either less than $\\epsilon_1 n_{t-1}$ or more than $\\sqrt[10]{\\epsilon_1} n_{t-1}$.\nIt is clear that $\\epsilon_1 \\ll \\epsilon$.\nLet $Z_i$ be subset of $P_i$ in $(\\mathcal{P}\\wedge \\mathcal{V})_X$ with size less than $\\epsilon_1 n_{t-1}$ and $Z=(\\bigcup^{t-1}_{i=1}Z_i) \\cup X$.\nThus $|Z|\\leq 2tr\\epsilon_1 n_{t-1}$.\n\n\n\nLet $\\widetilde{V}_i=V_i\\setminus Z$, $\\widetilde{P}_i=P_i\\setminus Z$, $X_{i,j}=\\widetilde{V}_i\\cap \\widetilde{P}_j$ and $\\widetilde{\\mathcal{V}}=(\\widetilde{V}_1,\\ldots,\\widetilde{V}_r)$.\n\nLet $v\\in V_k$ and $P_{i_k}\\setminus V_k$ be any smallest part among $\\mathcal{P}_{V_k}$.\nLet $\\widetilde{V}_\\iota$ be integral in $\\widetilde{P}_{i_k}$.\nChoose $B \\subseteq\\widetilde{V}_\\iota$ with size 
$|F|$.\nBy Proposition \\ref{neighbour copy keeps free}, $G_{\\{v\\} \\to N_G[B]}$ is also $F$-free.\nSince $G$ is the extremal graph, we have\n\\begin{equation}\\label{minimum degree in F free extremal}\nd_G(v)\\geq |N_{\\widetilde{G}}[B]| \\geq \\sum_{i\\neq i_k}|P_i\\setminus V_k|-\\sqrt{\\epsilon_1} n_{t-1}.\n\\end{equation}\nIf $V_k$ does not belong to $Z$, then since $\\mathcal{P}_Z$ is $\\sqrt{\\epsilon_1}n_{t-1}$-stable to $\\widetilde{\\mathcal{V}}$ by \\eqref{minimum degree in F free extremal} and Proposition~\\ref{lemma for Q}, there exists an integer $m(v)\\in [1,t-1]$ with $v$ adjacent to $\\sqrt[6]{\\epsilon_1}n_{t-1}$ vertices from the integral part of each $\\widetilde{P}_{i\\neq m(v)}$.\n\nSuppose that $V_k \\subseteq Z$.\nLet $m$ be the minimum size of integral parts of classes of $\\mathcal{P}_Z$ and $m^\\prime$ be the minimum size of partial parts of classes of $\\mathcal{P}_Z$ ($\\mathcal{P}_Z$ is 1-partial).\nBy (\\ref{minimum degree in F free extremal}), we have $d_G(v)\\geq n-m-m^\\prime-3|Z|$.\nSuppose that $v\\in V_k$ is adjacent to less than $\\sqrt[6]{\\epsilon_1}n_{t-1}$ vertices from the integral part of $\\widetilde{P}_{i_1}$ and $\\widetilde{P}_{i_2}$.\nLet $X_{a_1,i_1}$ and $X_{a_2,i_2}$ be the possible partial parts of $\\widetilde{P}_{i_1}$ and $\\widetilde{P}_{i_2}$ (if $\\widetilde{P}_{i}$ is integral, we set the partial part empty).\nSince $\\mathcal{P}=(P_1,\\ldots,P_{t-1})$ is an $(X,\\delta_2)$-stable $(t-1)$-partition, the integral part of $\\widetilde{P}_{i}$ is at least $m-|Z|$ and $m^\\prime\\leq m/2+|Z|$.\nThus\n$$d_G(v)\\leq \\sum_{i\\neq i_1,i_2}|P_i|+|X_{a_1,i_1}|+|X_{a_2,i_2}|+3\\sqrt[6]{\\epsilon_1}n_{t-1}< n-m-m^\\prime-3|Z|, $$\na contradiction.\nThus there exists an integer $m(v)\\in [1,t-1]$ such that $v$ is adjacent to $\\sqrt[6]{\\epsilon_1}n_{t-1}$ vertices from the integral part of each $\\widetilde{P}_{i\\neq m(v)}$.\n\n\n\nLet $Y_i$ be the vertices of $P_i$ with more than $\\epsilon n_{t-1}$ neighbours in 
their own part.\nLet $c_{\\epsilon,F}=tN(\\sqrt[3]{\\epsilon_1},\\ldots,\\sqrt[3]{\\epsilon_1};1/2;|F|)$ and $Y=\\bigcup_{i=1}^{t-1}Y_i$.\nThus every vertex $v$ of $Y$ is adjacent to $\\sqrt[6]{\\epsilon_1}n_{t-1}$ vertices from the integral part of each $\\widetilde{P}_{i\\neq m(v)}$ and $\\sqrt[6]{\\epsilon_1}n_{t-1}$ vertices from part $\\widetilde{P}_{m(v)}$.\nSince $\\mathcal{P}$ is an $(X,\\zeta )$-stable partition of $\\widetilde{G}$, we can pick $\\sqrt[5]{\\epsilon_1}n_{t-1}$ vertices $Q_i(v)$ from each $\\widetilde{P}_{i}$ to form a copy of $K_{t-1}^{\\sqrt[5]{\\epsilon_1}n_{t-1}}$ whose vertices are all adjacent to $v$.\n\n\nLet $\\mathcal{Q}(v)=(Q_1(v),\\ldots,Q_{t-1}(v))$.\nFor a set of vertices $X$, let $$\\bigcap_{v\\in X} \\mathcal{Q}(v)=\\left(\\bigcap_{v\\in X}Q_1(v),\\ldots,\\bigcap_{v\\in X}Q_{t-1}(v)\\right).$$\n\nSuppose that some $|Y_i|$ is larger than $c_{\\epsilon,F}/t$.\nLet $\\tau=\\tau(n_1,\\ldots,n_r,t,r)$.\nWe divide the proof into the following two cases.\n\nIf $\\tau=0$, then $n_{t-1}\\geq (1/r)^{4t}n$.\nNote that each part size of $\\mathcal{Q}(a)$ is larger than $\\sqrt[5]{\\epsilon_1}n_{t-1}\\geq \\sqrt[4]{\\epsilon_1}n$ for each vertex $a\\in Y_i$.\nThus by Lemma \\ref{selection lemma}, there exist $|F|$ vertices, denote it $C_F$, of $Y_i$ such that every part size of $\\bigcap_{v\\in C_F} \\mathcal{Q}(v)$ is larger than $\\epsilon_1 n_{t-1}/2$.\nHence $C_F$ and $\\bigcap_{v\\in C_F} \\mathcal{Q}(v)$ can form a copy of $K_t^{\\epsilon_1 n_{t-1}/4}$, implying $F\\subset G$, a contradiction.\nHence $|Y_i|\\leq c_{\\epsilon,F}/t$, implying $|Y|\\leq c_{\\epsilon,F}$.\n\n\n\nLet $\\tau\\geq 1$.\nWe will show that $V_1,\\ldots,V_\\tau$ are parts of $\\mathcal{P}$.\nWe will first show $\\widetilde{V}_1,\\ldots,\\widetilde{V}_\\tau$ are parts of $\\mathcal{P}_Z$.\nSuppose that part $\\widetilde{P}_i$ of $\\mathcal{P}_Z$ contains vertices of set in $\\widetilde{V}_1,\\ldots,\\widetilde{V}_\\tau$, let it be $\\widetilde{V}_{i_1}$, and 
also contains vertices from other set of $\\widetilde{\\mathcal{V}}$, let it be $\\widetilde{V}_{i_2}$.\nIf $\\widetilde{V}_{i_1}$ is integral in $\\widetilde{P}_i$ then $|\\widetilde{P}_i\\setminus \\widetilde{V}_{i_2}|\\geq |\\widetilde{V}_{i_1}|$.\nIf $\\widetilde{V}_{i_1}$ is partial in $\\widetilde{P}_i$ then due to $\\mathcal{P}$ is $Y$-stable of $G$ then $|\\widetilde{P}_i\\setminus \\widetilde{V}_{i_1}|\\geq |V_{i_1}|-|Y|$.\nThus by letting $k=i_1,i_2$ and $m=1,\\ldots,t-1$ in (\\ref{minimum degree in F free extremal}) we obtain each part of $\\mathcal{P}_Z$ is larger than $ |\\widetilde{V}_{i_1}|/2$, implying every part of $\\mathcal{P}_Z$ contains at least one of $\\widetilde{V}_1,\\ldots,\\widetilde{V}_\\tau$, a contradiction due to $\\tau\\leq t-2$.\nThus let $\\widetilde{P}_1=\\widetilde{V}_1,\\ldots,\\widetilde{P}_\\tau=\\widetilde{V}_\\tau$.\nDue to each $Y_i$ has minimal neighbours in $\\widetilde{P}_i$ and by (\\ref{minimum degree in F free extremal}) trivially, we obtain $P_1=V_1,\\ldots,P_\\tau=V_\\tau$.\n\n\n\nSuppose that $|Y_i|\\geq c_{\\epsilon,F}/t$ for some $Y_i$ .\nSince vertex in every $P_{i\\leq \\tau}$ has no neighbour in its own part, we only consider $Y_{i>\\tau}$.\nNote $n_{t-1}\\geq (1/r)^{4t}n_{\\tau+1}$.\nThus every $P_{i>\\tau}$ has $|P_i|\\leq rn_{\\tau+1}\\leq r^{5t}n_{t-1}$.\nFor each vertex $a\\in Y_i$, the size of each part of $\\mathcal{Q}(a)$ in $P_{i>\\tau}$ is larger than $\\sqrt[5]{\\epsilon_1}n_{t-1}\\geq \\sqrt[4]{\\epsilon_1}|P_i|$ .\nAs the proof in the case $\\tau=0$, there is a copy of $K_{t-\\tau}^{\\epsilon_1/4 n_{t-1}}$ in $P_{i>\\tau}$'s.\nNote that each vertex of $K_{t-\\tau-1}^{|F|}$ have at least $|V_{i\\leq \\tau}|-\\delta_2 n_{t-1}$ neighbours in each $V_{i\\leq \\tau}$.\nThere is a copy of $K_t^{\\epsilon_1 n_{t-1}/4}$ in $G$, implying $F\\subset G$, a contradiction.\nThus we have $|Y_i|\\leq c_{\\epsilon,F}/t$, implying $|Y|\\leq c_{\\epsilon,F}$.\nNote that the degree of vertex of $V_k\\setminus Y$ in 
every $P_j$ is less than $\\sum_{i\\neq j} |P_i\\setminus V_k|+\\sqrt{\\epsilon_1}n_{t-1}\\leq \\sum_{i\\neq i_k} |P_i\\setminus V_k|+2\\sqrt{\\epsilon_1}n_{t-1}$.\nThus by \\eqref{minimum degree in F free extremal}, the difference of the degrees of vertices of $G-Y$ in same class of $\\mathcal{V}$ is at most $\\epsilon n_{t-1}$.\n\nLet $\\delta_3=\\sqrt[5]{\\epsilon_1}\\gg \\delta_2$.\nIn the proof we show every vertex of $Y$ is adjacent to a copy of $K_t^{\\delta_3 n_{t-1}}$.\nFor vertex $v\\in V(G)\\setminus Y$, due to (\\ref{minimum degree in F free extremal}) and the neighbours size of $a$ in its own part is less than $\\epsilon n_{t-1}$, we have $|N_{K[\\mathcal{P}]}(v)\\bigtriangleup N_G(v)|\\leq \\epsilon n_{t-1}$.\nWe finish the proof the strong stability theorem.\n\\hfill$\\square$ \\medskip\n\n\n\n\n\\section{Applications}\\label{prove conjecture}\\label{section 6}\n\\noindent{\\bf Proofs of Theorems~\\ref{strong bollobas} and~\\ref{conjecture}.}\nGiven $r \\geq t \\geq 3$ and $k \\geq 2$, let $n_1 \\geq \\ldots \\geq n_r$ and $n_{t-1}$ be sufficiently large.\nLet $G \\subseteq K_{n_1,\\ldots,n_r}$ be an extremal graph for $kK_t$.\nWe will show that $e(G)=f(n_1,\\ldots,n_r,k,t).$\nMoreover, the extremal graph is obtained from a complete $(t-1)$-partite graph in $K_{n_1,\\ldots,n_r}$ by joining all possible edges incident with $k-1$ fixed vertices.\n\n\nLet $\\epsilon>0$ be a small constant.\nSince $e(G)\\geq f(n_1,\\ldots,n_r,1,t)$, by Theorem \\ref{strong stability} there exist constants $\\delta_1 \\ll \\delta_2 \\ll \\delta_3 \\ll \\epsilon$ and $c_{\\epsilon,F}$ depending on and $r$, $\\epsilon$ and $F$ such that if $n_{t-1}\\geq 1 / \\delta_1$ then\n\\begin{itemize}\n\\item [$(a)$] there exists an $(X,\\delta_2)$-stable $(t-1)$-partition $\\mathcal{P}$,\n\\item [$(b)$] there exists a set $Y$ with $|Y|\\leq c_{\\epsilon,F}$ such that every vertex $v$ of $G-Y$ satisfies $|N_G(v)\\bigtriangleup N_{K_{n_1,\\ldots,n_r}[\\mathcal{P}]}(v)|\\leq \\epsilon 
n_{t-1}$,\n \\item [$(c)$] the difference of the degrees of vertices of $G-Y$ in the same class of $\\mathcal{V}$ is at most $\\epsilon n_{t-1}$ and\n\\item [$(d)$] every vertex of $Y$ is adjacent to at least $\\delta_3 n_{t-1}$ vertices in each class of $\\mathcal{P}$ such that they induce a complete $(t-1)$-partite graph.\n\\end{itemize}\n\n\n\n\nFor an edge $xy$ inside $P_s \\setminus Y$, we say $xy$ is \\textcolor{blue}{{\\it good}} if $xy$ is adjacent to a copy of $K_{t-2}^{kt}$ consisting of edges between different $P_i$'s, and we say $xy$ is \\textcolor{blue}{{\\it bad}} otherwise.\nLet $E$ be the set of good edges.\n\n\\begin{center}\n\\begin{tikzpicture}[scale = 1]\n\\tikzstyle{every node}=[scale=0.7]\n\n\\draw (0,0) ellipse (0.4 and 1.2);\n\\draw node at (-1,0) {$\\widetilde{P}_1$};\n\n\\draw [purple] (0,2) circle (2pt);\n\\draw [purple] node at (-0.25,2.1) {$x$};\n\n\\draw [blue](0,1.5) circle (2pt);\n\\draw [blue] node at (-0.25,1.4) {$y$};\n\n\n\\draw [purple] (3,1.4) ellipse (0.4 and 1.2);\n\\draw [purple] node at (3.8,1.4) {$\\widetilde{V}_{i_1}$};\n\n\\draw [blue] (3,-1.4) ellipse (0.4 and 1.2);\n\\draw [blue] node at (3.8,-1.4) {$\\widetilde{V}_{i_2}$};\n\n\\draw [line width=0.1cm, red ,opacity=1] (0,2)--(0,1.5);\n\\draw [line width=0.1cm, dotted,black ,opacity=0.5] (0,1.5)--(2.6,1.4);\n\\draw [line width=0.1cm, dotted,black ,opacity=0.5] (0,2)--(2.6,-1.4);\n\\draw [line width=0.1cm, dotted,black ,opacity=0.5] (0.4,0)--(2.6,-1.4);\n\\draw [line width=0.1cm, dotted,black ,opacity=0.5] (0.4,0)--(2.6,1.4);\n\n\\draw node at (1.5,-3) {Figure 3. 
A bad edge $xy$ with $x\\in V_{i_1},y\\in V_{i_2}$};\n\\end{tikzpicture}\n\\end{center}\n\n\\medskip\n\n\\noindent{\\bf Claim 1.} $|E|\\leq k\\epsilon n_{t-1}$.\n\n\\medskip\n\n\\begin{proof}\nClearly, there is no matching on $k$ edges in $E$, as otherwise we can easily find a copy of $kK_t$.\nSince each vertex is adjacent to at most $ \\epsilon n_{t-1}$ vertices in their own class of $\\mathcal{P}$, we have $|E|\\leq k\\epsilon n_{t-1}$.\n\\end{proof}\n\n\\noindent{\\bf Claim 2.} If $Y=\\emptyset$, then $e(G)-|E| \\leq f(n_1,\\ldots,n_r,1,t)$. Moreover, the equality holds when $G-E$ is a $(t-1)$-partite graph.\n\n\\medskip\n\n\\begin{proof}\nLet $\\widetilde{V}_i=V_i \\setminus X $, $\\widetilde{P}_j= P_j \\setminus X$, $X_{i,j}=\\widetilde{V}_i\\cap \\widetilde{P}_j$ and $X_i=X \\cap P_i$.\nLet $m$ be the minimum size of the integral part of all classes of $\\mathcal{P}_X$.\n\n\nLet $xy$ be a bad edge in $P_s$ with $x \\in V_{i_1}$ and $y\\in V_{i_2}$.\nSuppose that the size of $X_{i_1,j}$ is less than $m- 6\\epsilon n_{t-1}$ for each $j\\in[t-1]$ distinct from $s$.\nBy $(b)$, $x$ is adjacent to $|\\widetilde{P}_{j}\\setminus \\widetilde{V}_{i_1}|- \\epsilon n_{t-1}$ vertices of $\\widetilde{P}_{j\\neq s}$.\nIn particular, $x$ is adjacent to at least $5\\epsilon n_{t-1}$ vertices of each integral part of $\\widetilde{P}_{j\\neq s}$.\nIf $V_{i_2}\\subseteq X$, $\\widetilde{V}_{i_2}$ is partial in $\\mathcal{P}_X$ or $\\widetilde{V}_{i_2}$ is integral in $\\widetilde{P}_s$, then $y$ is adjacent to all but at most $\\epsilon n_{t-1}$ vertices of $\\widetilde{P}_{j\\neq s}$ and hence $xy$ is adjacent to a copy of $K_{t-2}^{kt}$ consisting of edges between different $P_{j\\neq s}$'s, a contradiction.\n\nNow we may assume that $\\widetilde{V}_{i_2}$ is integral in $\\widetilde{P}_{q}$ with $q\\neq s$.\nNote that $|\\widetilde{P}_{s}|\\geq m$ and $|\\widetilde{P}_{q} \\setminus \\widetilde{V}_{i_2}| \\leq m$ (by (a), $\\mathcal{P}_X$ is stable to 
$\\mathcal{V}_X$).\nConsider vertices in $V_{i_2}$, it follows from (b) and (c) that $ |P_q|-2 \\epsilon n_{t-1} \\leq |\\widetilde{P}_s| +|\\widetilde{V}_{i_2}| \\leq|P_q|+2 \\epsilon n_{t-1}$, implying $m \\leq |\\widetilde{P}_s |\\leq m+3\\epsilon n_{t-1}$ and $m-3\\epsilon n_{t-1} \\leq |\\widetilde{P}_{q} \\setminus \\widetilde{V}_{i_2}| \\leq m$.\nRecall that $|X_{i_1,q}|\\leq m- 6\\epsilon n_{t-1}$.\nWe can easily see that $xy$ is adjacent to a copy of $K_{t-2}^{kt}$ consisting of edges between different $P_{j\\neq s}$'s, a contradiction.\nTherefore, there exists a constant $\\ell$ distinct from $s$ such that $|X_{i_1,\\ell}| \\geq m- 6\\epsilon n_{t-1}$.\nRemind that $m\\leq |\\widetilde{P}_s|\\leq m+3\\epsilon n_{t-1}$, thus the partial part size of $\\widetilde{P}_s$ is less than $3\\epsilon n_{t-1}$, implying $|X_{i_1,s}|\\leq 3\\epsilon n_{t-1}$.\n\nNote $|\\widetilde{P}_\\ell\\setminus (\\widetilde{V}_{i_1}\\cup \\widetilde{V}_{i_2})|\\leq \\epsilon n_{t-1}$.\nConsider vertices in $V_{i_2}$, applying (c) again, we have $|X_{i_1,\\ell}|=|\\widetilde{P}_s\\setminus \\widetilde{V}_{i_2}|\\leq |\\widetilde{P}_s|+\\Theta( \\epsilon n_{t-1})=m+\\Theta( \\epsilon n_{t-1})$.\nSimilarly, consider vertices in $V_{i_1}$, we have $|X_{i_2,\\ell}|=|\\widetilde{P}_s\\setminus \\widetilde{V}_{i_1}|=m+\\Theta( \\epsilon n_{t-1})$.\nTherefore, there exists an integer $\\ell(i_1,i_2)=\\ell$ such that\n$$|X_{i_1,\\ell(i_1,i_2)}|=|X_{i_2,\\ell(i_1,i_2)}|=m+\\Theta( \\epsilon n_{t-1}).$$\nOtherwise, $xy$ is adjacent to a copy of $K_{t-2}^{kt}$ consisting of edges between different $P_{j\\neq s}$'s, a contradiction.\n\n\n\nNow we show that $P_{\\ell(i_1,i_2)}\\setminus (V_{i_1}\\cup V_{i_2})=\\emptyset$.\nSuppose that exists a vertex $z\\in P_{\\ell(i_1,i_2)}\\setminus (V_{i_1}\\cup V_{i_2})$ and let $z\\in V_{i_3}$.\nThus by $(b)$ we have $d_G(v)=\\sum_{i\\neq \\ell}|P_i\\setminus V_{i_3}|+\\Theta(\\epsilon n_{t-1})$.\nIf $V_{i_3}\\subset X$ then $d_G(v)\\geq \\sum_{i\\neq 
s}|P_i|+\\Theta(\\epsilon n_{t-1})$ implying $m\\geq 2m+\\Theta(\\epsilon n_{t-1})$, a contradiction.\nLet $\\widetilde{V}_{i_3}$ has vertices in $\\widetilde{P}_k$.\nBy $(b),(c)$ we have $|\\widetilde{P}_k\\setminus \\widetilde{V}_{i_3}|=2m+\\Theta(\\epsilon n_{t-1})$.\nBy $(a)$ we have $|\\widetilde{P}_k\\setminus \\widetilde{V}_{i_3}|\\leq m$, a contradiction.\nThus $P_{\\ell(i_1,i_2)}\\setminus (V_{i_1}\\cup V_{i_2})=\\emptyset$.\nWe call such $P_{\\ell(i_1,i_2)}$ a \\textcolor{blue}{{\\it bad class}} of the bad edge $xy$ and call such $(i_1,i_2)$ \\textcolor{blue}{{\\it bad pair}}.\n\nWe can conclude there is no $i$ which appears in two bad pairs.\nOtherwise, suppose we have bad pairs $(i_1,i_2),(i_2,i_3)$.\nDue to $(a)$ we have $|X_{i_1,\\ell(i_1,i_2)}|\\geq |X_{i_2,\\ell(i_1,i_2)}|+|X_{i_2,\\ell(i_1,i_2)}|=2m+\\Theta(\\epsilon n_{t-1})$, a contradiction.\n\nLet $G^\\prime=G-E$.\nSuppose that there is a bad edge $v_iv_j$ in $P_1$, otherwise we are done.\nLet $v_i\\in V_i$ and $v_j\\in V_j$.\nClearly, $V_i \\cap P_1$ and $ V_j\\cap P_1$ forms a bipartite graph in $P_1$ and other vertex in $P_1$ is not adjacent to $(V_i \\cup V_j) \\cap P_1$.\nLet $G^\\ast$ be the graph obtained from $G^\\prime$ by adding all possible edges between $P_i$'s and all edges between $V_i$ and $V_j$ (in the same class of $\\mathcal{P}$) incident with a bad edge.\n\nWe can conclude $G^\\ast$ is $K_t$-free.\nOtherwise we have $K_t\\subset G^\\ast$.\nLet $V(K_t)=\\{a_1,\\ldots,a_t\\}$ with $a_i\\in V_i$ for $i\\in [t]$.\nBy Pigeonhole Principle, we can suppose $a_1,a_2\\in P_1$ thus $a_1a_2$ form a bad edge.\nNote $1$ cannot appear in other bad pair except $(1,2)$ thus $a_{i\\geq 3}\\notin P_1$.\nObviously, we have $a_{i\\geq 3} \\notin P_{\\ell(1,2)}$.\nThus use the Pigeonhole Principle recursively and obtain vertex $a_t$ not belongs to any of $P_j$, a contradiction.\nThus $G^\\ast$ is $K_t$-free.\n\nIf $|P_1 \\setminus (V_i \\cup V_j)|> |V_j\\cap P_{\\ell(i,j)}|$, then similarly, we 
can show that $H_1=G^\\ast_{V_i\\cap P_1 \\rightarrow S}$ with $S=\\bigcup_{\\iota\\neq \\ell(i,j)}P_\\iota\\setminus V_i $ remains $K_t$-free and has more edges than $G^\\ast$.\nHence, by Theorem~\\ref{extremal number 1}, $e(G^\\prime) \\leq e(G^\\ast) < e(H_1)\\leq f(n_1,\\ldots,n_r,1,t)$.\nThus we may assume that $|P_1 \\setminus (V_i \\cup V_j)| \\leq \\min \\{|V_i\\cap P_{\\ell(i,j)}|,|V_j\\cap P_{\\ell(i,j)}|\\}$.\nWithout loss of generality, let $| V_i\\cap (P_{\\ell(i,j)}\\cup P_1) | \\leq |V_j\\cap (P_{\\ell(i,j)}\\cup P_1)|$.\nNow, let $P^\\ast_1= (P_1 \\setminus V_j) \\cup (V_i\\cap P_{\\ell(i,j)}) $ and $P^\\ast_{\\ell(i,j)}=V_j\\cap (P_{\\ell(i,j)}\\cup P_1)$.\nLet $H_2$ be the graph obtained from $G^\\ast$ by changing $P_1$ and $P_{\\ell(i,j)}$ to $P^\\ast_1$ and $P^\\ast_{\\ell(i,j)}$ and replacing all edges between $P_1$ and $P_{\\ell(i,j)}$ by all edges between $P^\\ast_1$ and $P^\\ast_{\\ell(i,j)}$ (keeping the other edges incident with $P_1$ and $P_{\\ell(i,j)}$).\nAgain, we can see that $H_2$ remains $K_t$-free and has more edges than $G^\\ast$.\nBy Theorem~\\ref{extremal number 1}, $e(G^\\prime) \\leq e(G^\\ast) < e(H_2)\\leq f(n_1,\\ldots,n_r,1,t)$.\nWe finish the proof of Claim 2.\n\\end{proof}\n\n\n\n\n\n\nFirst, we prove the $k=1$ case, i.e., we first prove Theorem~\\ref{strong bollobas}.\nClearly, $(d)$ implies that $Y$ is empty.\nMoreover, we have $|E|=0$.\nIf there is an edge in $P_i$, then by Claim 2 we get a contradiction.\nThus each $P_i$ is an independent set, and hence the result follows by Theorem~\\ref{extremal number 1}.\n\n\nWe prove Theorem~\\ref{conjecture} by induction on $k$.\nSuppose the theorem holds for $k-1$.\nSuppose that there is a vertex $a \\in Y$.\nBy $(d)$, $a$ is adjacent to a copy of $K_{t-1}^{kt}$, thus $G-a$ must be $(k-1)K_t$-free.\nLet $a \\in V_{\\iota}$.\nBy the induction hypothesis, we have $$e(G)\\leq e(G-a)+n-n_\\iota\\leq g(n_1,\\ldots,n_\\iota-1,\\ldots,n_r,k-1,t)+n-n_\\iota\\leq g(n_1,\\ldots,n_r,k,t),$$\nthe inequality holds only 
if $d_G(a)=n-n_\\iota$ and the last inequality holds by the construction of joining a vertex of $V_\\iota$ to the possible vertices in extremal graph reaches $g(n_1,\\ldots,n_\\iota-1,\\ldots,n_r,k-1,t)$.\nThus by induction hypothesis, the extremal graph is a complete $(t-1)$-partite graph with $k$ vertices adjacent to all other vertices.\n\nTherefore, we may assume that $Y$ is empty.\nCombining with Claims 1 and 2, $e(G)\\leq g(n_1,\\ldots,1,t)+k\\epsilon n_{t-1} 1/2$ gain a $\\mathcal{P}$,$\\mathcal{T}$-odd-induced MQM that similarly to {\\em e}EDM\\ interacts with an unpaired electron spin.\nMeasuring MQMs with the use of molecules may provide improved limits on the strength of $\\mathcal{P}$,$\\mathcal{T}$-odd nuclear forces, on the proton, neutron, and quark EDMs, on the quark chromo-EDMs, and on the QCD $\\theta$ term and CP-violating quark interactions \\cite{flambaum2014time, PhysRevLett.113.263006}. \n\nThe scheme of measuring MQMs with the use of molecules is essentially the same as for the {\\em e}EDM. The energy shift in molecular spectra is equal to \n\\begin{equation}\n\\delta E_M = MW_M P_{ M},\n\\label{Msplit}\n\\end{equation}\nwhere $M$ is the value of MQM, $W_M$ (similarly to $E_{\\rm eff}$) is determined by the electronic structure of the molecule, $P_{ M}$ (similarly to $P$) is the dimensionless constant (polarization). $W_M$ constant for $^{173}$YbOH were calculated in \\cite{maison2019theoretical}.\nTo extract $M = \\delta E_M / W_M P_{ M}) $ from the measured\nshift $\\delta E_M $, one needs to know both $W_M$ and $P_{ M}$ values.\n\n\n\nYbOH molecule is a promising system for experiments looking for nonconservation effects such as {\\em e}EDM\\ and nuclear MQM \\cite{Kozyryev:17, PhysRevA.105.L050801}. \nFor {\\em e}EDM\\ search experiments, the spinless common $^{174}$Yb isotope would be ideal, while for MQM searches, the $^{173}$Yb isotope with nuclear spin $I=5/2$ has to be used. In Ref. 
\\cite{PhysRevA.105.L050801} we developed a method for calculating the polarization $P$. The aim of the present work is to extend the method to $P_{ M}$ and apply it to the ground rotational level of the first excited $v = 1$ bending vibrational mode of the $^{173}$YbOH molecule.\n\nSince both the {\\em e}EDM\\ and the MQM contribute to the measured energy splitting in the $^{173}$YbOH spectra, one needs a way to distinguish these two contributions. This is possible because $P$ and $P_{ M}$ depend differently on the hyperfine level of the molecule, whereas $E_{\\rm eff}$ and $W_M$ are the same for all hyperfine levels. Measurements on at least two different hyperfine levels (provided $P$ and $P_{ M}$ are known) therefore allow one to distinguish between the {\\em e}EDM\\ and MQM contributions. For this reason the calculation of $P$ for $^{173}$YbOH is also performed in this paper.\n\n\n\\section{Method}\nFor the purposes of the present paper we write the Hamiltonian as\n\\begin{equation}\n{\\rm \\bf\\hat{H}} = {\\rm \\bf\\hat{H}}_{\\rm mol} + {\\rm \\bf\\hat{H}}_{\\rm hfs} + {\\rm \\bf\\hat{H}}_{\\rm ext},\n\\label{Hamtot}\n\\end{equation} \nwhere\n${\\rm \\bf\\hat{H}}_{\\rm mol}$ is the molecular Hamiltonian described in Ref. 
\\cite{PhysRevA.105.L050801},\n\\begin{equation}\n\\begin{aligned}\n {\\rm \\bf\\hat{H}}_{\\rm hfs} = { g}_{\\rm H} {\\bf \\rm I^H} \\cdot \\sum_a\\left(\\frac{\\bm{\\alpha}_{2a}\\times \\bm{r}_{2a}}{r_{2a}^3 }\\right) + \\\\\n{ g}_{\\rm Yb}{\\mu_{N}} {\\bf \\rm I^{Yb}} \\cdot \\sum_a\\left(\\frac{\\bm{\\alpha}_a\\times \\bm{r}_{1a}}{{r_{1a}}^3}\\right) \\\\\n-e^2 \\sum_q (-1)^q \\hat{Q}^2_q({\\bf \\rm I^{\\rm Yb}}) \\sum_a \\sqrt{\\frac{2\\pi}{5}}\\frac {Y_{2q}(\\theta_{1a}, \\phi_{1a})}{{r_{1a}}^3}\n\\end{aligned}\n\\end{equation}\nis the hyperfine interaction electrons with Yb and H nuclei,\n${ g}_{\\rm Yb}$ and ${ g}_{\\rm H}$ are the\n g-factors of the ytterbium and hydrogen nuclei, $\\bm{\\alpha}_a$\n are the Dirac matrices for the $a$-th electron, $\\bm{r}_{1a}$ and $\\bm{r}_{2a}$ are their\n radius-vectors in the coordinate system centered on the Yb and H nuclei,\n $\\hat{Q}^2_q({\\bf \\rm I^{\\rm Yb}})$ is the quadrupole moment operator for $^{173}$Yb nucleus,\nindex $a$ enumerates (as in all equations below) electrons of YbOH.\n\n\\begin{equation}\n {\\rm \\bf\\hat{H}}_{\\rm ext} = -{ {\\bf D}} \\cdot {\\bf E}\n\\end{equation}\ndescribes the interaction of the molecule with the external electric field, and\n{\\bf D} is the dipole moment operator.\n\nWavefunctions were obtained by numerical diagonalization of the Hamiltonian (\\ref{Hamtot})\nover the basis set of the electronic-rotational-vibrational wavefunctions\n\\begin{equation}\n \\Psi_{\\Omega m\\omega}P_{lm}(\\theta)\\Theta^{J}_{M_J,\\omega}(\\alpha,\\beta)U^{\\rm H}_{M^{\\rm H}_I}U^{\\rm Yb}_{M^{\\rm Yb}_I}.\n\\label{basis}\n\\end{equation}\nHere \n $\\Theta^{J}_{M_J,\\omega}(\\alpha,\\beta)=\\sqrt{(2J+1)/{4\\pi}}D^{J}_{M_J,\\omega}(\\alpha,\\beta,\\gamma=0)$ is the rotational wavefunction, $U^{\\rm H}_{M^{\\rm H}_I}$ and $U^{\\rm Yb}_{M^{\\rm Yb}_I}$ are the hydrogen and ytterbium nuclear spin wavefunctions, $M_J$ is the projection of the molecular (electronic-rotational-vibrational) angular 
momentum $\\hat{\\bf J}$ on the lab axis, \n $\\omega$ is the projection of the same momentum on $z$ axis of the molecular frame,\n $M^{\\rm H}_I$ and $M^{\\rm Yb}_I$ are the projections of the nuclear angular \nmomenta of hydrogen and ytterbium on the lab axis, $P_{lm}(\\theta)$ is the associated Legendre polynomial, $\\Psi_{\\Omega m\\omega}$ is electronic wavefunction (see Ref. \\cite{PhysRevA.105.L050801} for details).\n\n\n\nIn this calculation functions with $\\omega - m = \\Omega = \\pm 1/2$, $l=0-30$ and $m=0,\\pm 1, \\pm 2$, $J=1/2,3/2,5/2$ (as in Ref. \\cite{PhysRevA.105.L050801}) were included to the basis set (\\ref{basis}).\nNote, that the ground mode $v=0$ corresponds to $m=0$,\nthe first excited bending mode $v=1$ to $m=\\pm 1$, the second excited bending mode has states with $m=0, \\pm2$ etc.\n\nProvided that the {\\it electronic-vibrational} matrix elements are known, the matrix elements of ${\\rm \\bf\\hat{H}}$ between states in the basis set (\\ref{basis}) can be calculated with help of the angular momentum algebra \\cite{LL77, PhysRevA.105.L050801} in the same way as for the diatomic molecules \\cite{Petrov:11}.\nMatrix elements required to calculate ${\\rm \\bf\\hat{H}}_{\\rm mol}$, ${\\rm \\bf\\hat{H}}_{\\rm ext}$ and hyperfine interaction associated with hydrogen nucleus were taken from Ref. \\cite{PhysRevA.105.L050801}.\n\n\\begin{figure}[th]\n \\includegraphics[width=0.5\\textwidth]{P_Pm_mF_150.pdf}\n \\caption{(Color online) The calculated polarizations $P$, $P_M$ and ratio $P/P_M$. The abscissa numbering the states by increasing energy.\n Calculations have performed for electric field $E=150$ V/cm. For levels suggested for experiment the $P/P_M$ values are marked by red (bold style). 
}\n \\label{EDMMQMshift}\n\\end{figure}\n\nThe matrix elements required to calculate the hyperfine interaction with the ytterbium nucleus are\n\n\n\\begin{multline}\nA_{ \\parallel} = \\frac{g_{\\rm Yb}}{\\Omega} \\times\\\\\n \\langle\n \\Psi_{\\Omega m\\omega}P_{lm} |\\sum_a\\left(\\frac{\\bm{\\alpha}_a\\times\n\\bm{r}_{1a}}{r_{1a}^3}\\right)\n_z|\\Psi_{\\Omega m \\omega}P_{l'm}\\rangle \\\\\n= -1929~\\delta_{ll'} {~ \\rm MHz},\n\\end{multline}\n\n\\begin{multline}\nA_{ \\perp} = {g_{\\rm Yb}} \\times\\\\\n \\langle\n \\Psi_{\\Omega=1/2m\\omega}P_{lm} |\\sum_a\\left(\\frac{\\bm{\\alpha}_a\\times\n\\bm{r}_{1a}}{r_{1a}^3}\\right)\n_+|\\Psi_{\\Omega=-1/2 m \\omega-1}P_{l'm}\\rangle \\\\\n= -1856~ \\delta_{ll'} {~ \\rm MHz},\n\\end{multline}\n\n\\begin{multline}\ne^2Qq_0 = \\langle\n \\Psi_{\\Omega m\\omega}P_{lm} | \\\\\n e^2 \\sum_q (-1)^q \\hat{Q}^2_q({\\bf \\rm I^{\\rm Yb}}) \\sum_a \\sqrt{\\frac{2\\pi}{5}}\\frac {Y_{2q}(\\theta_{1a}, \\phi_{1a})}{{r_{1a}}^3} \\\\\n |\\Psi_{\\Omega m \\omega}P_{l'm}\\rangle = 3319~ \\delta_{ll'} {~ \\rm MHz}\n\\end{multline}\nfrom Ref. \\cite{Pilgram:21}.\nFollowing Ref. \\cite{PhysRevA.105.L050801} we neglect\n the $\\theta$ dependence of the above matrix elements.\n\n\\begin{figure}\n\\includegraphics[width=0.49\\linewidth]{P_1.pdf}\n\\includegraphics[width=0.49\\linewidth]{PM_1.pdf}\n\\caption{\\label{EDMMQMshift2} \n Calculated polarizations $P$ and $P_M$ for the $M_F=1/2$ levels of the lowest $N=1$ rotational level\nof the first excited $v=1$ bending vibrational mode of $^{173}$YbOH as functions of the external electric field.}\n\\end{figure}\n\n\\section{Results and discussions}\n\n\nIn Fig. \\ref{EDMMQMshift} the calculated polarizations $P$ and $P_M$, as well as the ratio $P/P_M$, for the lowest $N=1$ rotational level\nof the first excited $v=1$ bending vibrational mode of $^{173}$YbOH in the external electric field $E=150$ V/cm are presented as a bar chart for clarity. 
Within the $M_F$ manifolds the levels are ordered\nby the energy value.\nHere $M_F=M_J+{M^{\\rm H}_I}+{M^{\\rm Yb}_I}$ is the projection of the total molecular (electronic-rotational-vibrational-nuclear spins) angular momentum ${\\bf F}$ on the lab axis.\nThere are 24 levels for $M_F=1/2$, 22 levels for $M_F=3/2$, 16 levels for $M_F=5/2$, 8 levels for $M_F=7/2$ and 2 levels for $M_F=9/2$.\nAs an example, in Fig. \\ref{EDMMQMshift2} the calculated $P$ and $P_M$\nfor $M_F=1/2$ as functions of the external electric field are presented.\nAn electric field of $E=150$ V/cm ensures almost saturated values for $P$ and $P_M$.\nCalculations showed that all levels have polarizations $P<0.65$, $P_M<0.15$.\nEnergy levels for all $M_F$ values as functions of the external electric field are presented\nin Fig. \\ref{Energy}.\n\nFor MQM searches the levels with large $P_M$ values are preferred. Beyond this, \n to distinguish {\\em e}EDM\\ and MQM contributions, levels with different $P/P_M$ ratios have to be used. For the levels satisfying these conditions the $P/P_M$ values are marked in red (bold style) in Fig. \\ref{EDMMQMshift}. In Fig. \\ref{Energy} the corresponding levels are marked in bold style. The selection of the levels is not unique. On the basis of Fig. \\ref{EDMMQMshift} one can select other appropriate levels for the MQM search. \n \nFinally, we calculated the polarizations $P$ and $P_M$ associated with the {\\em e}EDM\\ and MQM energy shifts for\n the $^{173}$YbOH molecule in the first excited bending mode. The levels most suitable for the MQM search were determined.\n\n\n\\section{acknowledgement}\nThe work is supported by the Russian Science Foundation grant No. 
18-12-00227.\n\n\n\n\n", "meta": {"timestamp": "2022-08-31T02:03:26", "yymm": "2208", "arxiv_id": "2208.13881", "language": "en", "url": "https://arxiv.org/abs/2208.13881"}} {"text": "\n\n\n\\subsection*{#1}}\n {\\end{marginfigure}}\n\n\n\\let\\quote\\quoting\n\\let\\endquote\\endquoting\n\\renewenvironment{quotation}\n {\\ClassError{Please use the `quote` environment instead of `quotation`}}\n\n\n\n\\newcolumntype{L}{>{$}l<{$}}\n\\newcolumntype{C}{>{$}c<{$}}\n\\newcolumntype{R}{>{$}r<{$}}\n\\newcolumntype{T}{>{\\ttfamily}l}\n\\newcolumntype{S}{>{\\sffamily}l}\n\n\n\\newenvironment{block}\n {\\begin{center}}\n {\\end{center}}\n\n\n\n\\newmacro{newlogo}[2][]\n {\\ifthenelse{\\isempty{#1}}\n {\\newlogoaux{#2}{\\smallcaps{\\lowercase{#2}}}}\n {\\newlogoaux{#1}{#2}}}\n\\newmacro{newlogoaux}[2]\n {\\newmacro{#1}{#2}}\n\n\n\n\\newmacro{newoperator}[1]\n {\\newmathcommand[op]{#1}}\n\\newmacro{newkeyword}[2][]\n \n {\\ifthenelse{\\isempty{#1}}\n {\\newoperator{#2}{\\text{\\normalfont\\sffamily\\bfseries #2}}}\n {\\newoperator{#1}{\\text{\\normalfont\\sffamily\\bfseries #2}}}}\n\\newmacro{newvalue}[2][]\n {\\ifthenelse{\\isempty{#1}}\n {\\newoperator{#2}{\\text{\\normalfont\\sffamily #2}}}\n {\\newoperator{#1}{\\text{\\normalfont\\sffamily #2}}}}\n\\newmacro{newtype}[2][]\n {\\ifthenelse{\\isempty{#1}}\n {\\newoperator{#2}{\\text{\\normalfont\\sffamily\\scshape #2}}}\n {\\newoperator{#1}{\\text{\\normalfont\\sffamily\\scshape #2}}}}\n\n\n\n\n\\newmacro{obox}[2]\n {\\makebox[0pt][l]{\\ensuremath{#2}}\\phantom{\\ensuremath{#1}}}\n\n\\newmacro{highlight}[1]\n {\\colorbox{lightgray}{\\ensuremath{#1}}}\n\n\n\\newmacro{Quad}\n {\\hspace{1.5em}}\n\\newmacro{Break}\n {\\\\[\\smallskipamount]}\n\n\n\n\n\\newmacro{upon}\n {\\genfrac{}{}{0pt}{0}}\n\n\n\n\n\\let\\nothing\\varnothing\n\n\n\n\n\\let\\<\\langle\n\\let\\>\\rangle\n\n\\newmathcommand[open]{llbrace}{\\{\\!|}\n\\newmathcommand[close]{rrbrace}{|\\!\\}}\n\n\\newmacro{set}[1]\n {\\ensuremath{\\{#1\\}}}\n\\newmacro{tuple}[1]\n 
{\\ensuremath{\\<#1\\>}}\n\n\n\n\\let\\lt<\n\\let\\gt>\n\\let\\To\\Rightarrow\n\n\\newmathcommand[bin]{pp}\n {+\\!\\!+}\n\\newmathcommand{Mid}\n {\\;\\mid\\;}\n\n\n\n\\newmacro{powerset}[1]\n \n {\\mathcal{P}(#1)}\n\n\\newmathcommand{n}{\\underline{n}}\n\n\\newmathcommand[bb] {NN}{N}\n\\newmathcommand[bb] {ZZ}{Z}\n\\newmathcommand[bb] {EE}{E}\n\\newmathcommand[bb] {OO}{O}\n\\newmathcommand[bb] {QQ}{A}\n\\newmathcommand[bb] {RR}{R}\n\\newmathcommand[bb] {CC}{C}\n\\newmathcommand[bb] {HH}{H}\n\n\\newmathcommand[bb] {LL}{L}\n\\newmathcommand[bb] {UU}{U}\n\\newmathcommand[bb] {BB}{B}\n\\renewmathcommand[bb]{SS}{S}\n\n\\let\\to\\rightarrow\n\\let\\implies\\supset\n\\let\\infers\\vdash\n\n\n\n\\newmacro{hint}[1]\n {\\quad\\text{\\{ #1 \\}}}\n\n\\newmathcommand[op]{when}\n {\\mathbf{when}}\n\\newmathcommand[op]{where}\n {\\mathbf{where}}\n\\renewmathcommand[op]{and}\n {\\mathbf{and}}\n\\newmathcommand[op]{otherwise}\n {\\mathbf{otherwise}}\n\\newmathcommand[op]{impossible}\n {\\mathrm{impossible}}\n\n\n\n\\let\\group\\begingroup\n\n\\newenvironment*{marginequation}[1]\n {\\begin{marginfigure}[#1]\\equation}\n {\\endequation\\end{marginfigure}}\n\n\\newenvironment*{marginequation*}[1]\n {\\begin{marginfigure}[#1]\\equation\\nonumber}\n {\\endequation\\end{marginfigure}}\n\n\n\\newenvironment*{function}\n {\\begin{tabular}{@{}L@{\\ \\ }C@{\\ \\ }L@{}}}\n {\\end{tabular}}\n\\newmacro{signature}[1]\n {\\multicolumn{3}{@{}L@{}}{#1}}\n\\newmacro{inset}[1]\n {\\multicolumn{3}{L}{\\quad #1}}\n\n\n\\newenvironment*{grammar}\n \n {\\begin{block}\\begin{tabular}{@{}rRCLl}}\n {\\end{tabular}\\end{block}}\n\\newenvironment*{grammar*}\n \n {\\begin{block}\\begin{tabular}{@{}RLl}}\n {\\end{tabular}\\end{block}}\n\n\n\n\n\n\n\n\n\\newmacro{placerule}[4]\n {\\ensuremath{\n \\upon\n {\\text{\\smallcaps{#1}}\\hfill}\n {\\dfrac{#2}{#3}\\ #4}\n }}\n\n\\newmacro{newrule}[4]\n {\\newmacro{#1}{\\placerule{#1}{#2}{#3}{#4}}}\n\\newmacro{userule}\n {\\usemacro}\n\\newmacro{refrule}[1]\n 
{\\ifthenelse{\\isundefined{#1}}\n {\\GenericError{}{Rule `#1` is not defined}{}{}}\n {\\textsc{#1}}}\n\n\n\\section{Examples}\n\\label{sec:examples}\n\nWe present a few examples to demonstrate how our framework handles TopHat programs.\nThe candy vending machine combines the Select and View editors, the\nStep Task, and the Pair Task to construct a candy machine (Section~\\ref{sec:example-candymachine}).\nThe calorie calculator demonstrates a real-world\napplication of our framework (Section~\\ref{sec:example-calorie-calculator}). The chat session demonstrates the use of shared data stores (Section~\\ref{sec:example-chat}), and finally Section~\\ref{sec:tax} describes UI generation for the tax example from Section~\\ref{sec:top:formal}.\n\n\n\n\n\n\\subsection{Candy vending machine}\n\\label{sec:example-candymachine}\n\nThe candy machine allows a user to choose a chocolate bar and, after the bill is\npaid, the candy machine returns the bar. The candy machine combines the Edit,\nPair, and Step tasks. We have defined different Edit tasks with View and Select\neditors. The implementation of the initial task is given in Listing\n\\ref{sec:artefact:lst:candymachine}. The Pair combinator is denoted by the\noperator \\lstinline{><}.\n\n\\begin{enumerate}\n \\item After the candy machine is started, the machine displays some\n introductory text and a selection of chocolate bars (see Figure\n \\ref{fig:candymachine-step1}). This is done using a Pair Task that consists\n of two Edit tasks: an Edit task with a View editor and an Edit Task with a\n Select editor.\n\n \\item Select a chocolate bar. After choosing a bar, the candy machine\n displays the price of the bar (see Figure \\ref{fig:candymachine-step2}).\n This is done using another Pair Task that consists of an Edit task with a\n View editor (\\textit{``you need to pay:''}) and a Step Task. 
The Step task consists of\n two tasks: first a view editor is shown (with the price) and after the step,\n a select editor is rendered (see Figure \\ref{fig:candymachine-step3}).\n\n \\item Press the continue button.\n\n \\item Insert coins until you have paid the bill (see Figure \\ref{fig:candymachine-step3}). The application alternates a view and a select editor.\n\n \\item The application shows a view editor to indicate to the user that the\n bill is paid (see Figure \\ref{fig:candymachine-step4}).\n\\end{enumerate}\n\n\n \\begin{lstlisting}[caption={Initial Task of the candy vending machine (Haskell)},label={sec:artefact:lst:candymachine},language=Haskell]\ndata CandyMachineMood = Fair | Evil\n\nstartCandyMachine :: (Task h (Text, (Text, Text)))\nstartCandyMachine = view \"We offer you three chocolate\n bars. Pure Chocolate: It's all in the name. IO\n Chocolate: Chocolate with unpredictable side effects.\n Sem Chocolate: don't try to understand, just eat\n it!\" >< select candyOptions\n\ncandyOptions :: HashMap Label (Task h (Text, Text))\ncandyOptions =\n [ entry \"Pure Chocolate\" 8,\n entry \"IO Chocolate\" 7,\n entry \"Sem Chocolate\" 9\n ]\n where\n entry :: Text -> Int -> (Label, Task h (Text, Text))\n entry name price =\n (name, view \"You need to pay:\" >< (view price >>? payCandy))\n\npayCandy :: Int -> Task h Text\npayCandy bill =\n select (payCoin bill) >>? \\billLeft ->\n case compare billLeft 0 of\n EQ -> dispenseCandy Fair\n LT -> dispenseCandy Evil\n GT -> payCandy billLeft\n\npayCoin :: Int -> HashMap Label (Task h Int)\npayCoin bill =\n [ coinSize 5,\n coinSize 2,\n coinSize 1\n ]\n where\n coinSize :: Int -> (Label, Task h Int)\n coinSize size = (display size, view (bill - size))\n\ndispenseCandy :: CandyMachineMood -> Task h Text\ndispenseCandy Fair =\n view \"You have paid. Here is your candy. Enjoy it!\"\ndispenseCandy Evil =\n view \"You have paid too much! 
Sorry, no change, but here is your candy.\"\n\\end{lstlisting}\n\n\n\\begin{figure}[t]\n \\subfloat[Step 1: Select a chocolate bar]{\n \\includegraphics[width=\\linewidth]{pics/candymachine_step1-small.png}\n \\label{fig:candymachine-step1}\n }\n\n \\subfloat[Step 2: Price of the selected candy is shown to the user]{\n \\includegraphics[width=\\linewidth]{pics/candymachine_step2-small.png}\n \\label{fig:candymachine-step2}\n }\n\n \\subfloat[Step 3: Insert a coin]{\n \\includegraphics[width=\\linewidth]{pics/candymachine_step3-small.png}\n \\label{fig:candymachine-step3}\n }\n\n \\subfloat[Step 4: You have paid the bill]{\n \\includegraphics[width=\\linewidth]{pics/candymachine_step4-small.png}\n \\label{fig:candymachine-step4}\n }\n \\caption{Different stages of the candy vending machine}\n\\end{figure}\n\n\n\n\\subsection{Calorie calculator}\n\\label{sec:example-calorie-calculator}\n\nTo demonstrate a more real-world application that incorporates most task types,\nwe created a calorie calculator. This application calculates how many calories a\nperson should eat per day in order to maintain their weight. The calculation\ndepends on several factors, such as age, weight, and activity level. The\napplication can be broken down in several steps to prompt the user for input,\nand finally calculating the result. The implementation of the task is\ngiven in Listing \\ref{sec:artefact:lst:calories}.\n\n\\begin{enumerate}\n \\item When started, the application presents the user with some information\n about the calculation using a View editor.\n \\item After pressing continue, the user is prompted to enter the required\n data in different steps: height, weight, and age using Enter editors,\n and gender and activity level using Select editors. Each prompt is\n wrapped in a Pair task with a View editor on the left side to act as\n the label. 
Such a prompt is shown in\n Figure~\\ref{fig:calorie-calulator}.\n \\item In the last step the result is displayed using a View editor.\n\\end{enumerate}\n\n\\begin{figure}\n\\begin{lstlisting}[caption={Task of the calorie calculator (Haskell)},label={sec:artefact:lst:calories},language=Haskell]\ndata Gender = Male | Female\n\ndata ActivityLevel = Sedentary | Low | Active | VeryActive\n\ntype Height = Int\n\ntype Weight = Int\n\ntype Age = Int\n\ncalculateCaloriesTask :: Task h Text\ncalculateCaloriesTask =\n introduction >>? \\_ -> do\n (_, height) <- promptHeight\n (_, weight) <- promptWeight\n (_, age) <- promptAge\n (_, gender) <- promptGender\n (_, activityLevel) <- promptActivityLevel\n let calories = calculateCalories gender activityLevel height weight age\n view\n ( \"Your resting metabolic rate is: \"\n <> display calories\n <> \" calories per day.\"\n )\n\nintroduction :: Task h Text\nintroduction = view <| unlines\n [ \"This tool estimates your resting metabolic rate,\",\n \"i.e. 
the number of calories you have to consume\",\n \"per day to maintain your weight.\",\n \"Press \\\"Continue\\\" to start\"\n ]\n\npromptGender :: Task h (Text, Gender)\npromptGender =\n view \"Select your gender:\"\n >< select\n [ \"Male\" ~> Done Male,\n \"Female\" ~> Done Female\n ]\n\npromptHeight :: Task h (Text, Height)\npromptHeight = view \"Enter your height in cm:\" >< enter\n\npromptWeight :: Task h (Text, Weight)\npromptWeight = view \"Enter your weight in kg:\" >< enter\n\npromptAge :: Task h (Text, Age)\npromptAge = view \"Enter your age:\" >< enter\n\npromptActivityLevel :: Task h (Text, ActivityLevel)\npromptActivityLevel =\n view \"What is your activity level?\"\n >< select\n [ \"Sedentary\" ~> Done Sedentary,\n \"Low active\" ~> Done Low,\n \"Active\" ~> Done Active,\n \"Very Active\" ~> Done VeryActive\n ]\n\n-- We omit the actual calculation here since it is a bit lengthy.\ncalculateCalories :: Gender -> ActivityLevel -> Height -> Weight -> Age -> Int\ncalculateCalories gender al h w age = ...\n\\end{lstlisting}\n\\end{figure}\n\n\\begin{figure}\n \\begin{center}\n \\includegraphics[width=.7\\linewidth]{pics/calorie_calculator_height_step_smaller.png}\n \\end{center}\n \\caption{Prompting the user to enter their height}\n \\label{fig:calorie-calulator}\n\\end{figure}\n\\subsection{Chat session}\n\\label{sec:example-chat}\n\nThis example uses shared data stores to model a chat session between two\nusers, as displayed in Figure~\\ref{fig:tophat-chat-session}. Each user can write\nmessages to the chat history on the left hand side using their respective inputs\non the right hand side.\n\nThe implementation for this example is given in Listing~\\ref{lst:chat-session}.\nThe function \\lstinline{share} creates a data store that can be accessed by\nmultiple tasks, in this case the two \\lstinline{chat} tasks. 
The \\lstinline{<<=}\noperator is used to transform the contents of the shared data store.\n\\begin{figure}\n \\centering\n \\includegraphics[width=\\linewidth]{pics/tophat_ui_chatsession_smaller.png}\n \\captionsetup{type=figure}\n \\captionof{figure}{A chat session using shared data stores.}\n \\label{fig:tophat-chat-session}\n\\end{figure}\n\n\\begin{minipage}{\\linewidth}\n\\begin{lstlisting}[\n caption={A chat Session using shared data stores (Haskell)},\n label={lst:chat-session},\n language=Haskell]\nchatSession :: Reflect h => Task h (Text, ((), ()))\nchatSession = do\n history <- share \"\"\n watch history ><\n (chat \"Tim\" history >< chat \"Nico\" history)\n where\n chat :: Text -> Store h Text -> Task h ()\n chat name history = repeat <|\n enter >>* [\"Send\" ~> append history name]\n\n append :: Store h Text -> Text -> Text -> Task h ()\n append history name msg = do\n history <<= \\h ->\n (if h == \"\" then h else h ++ \"\\n\")\n ++ name ++ \": '\"\n ++ msg ++ \"'\"\n\\end{lstlisting}\n\\end{minipage}\n\n\\subsection{Tax example}\n\\label{sec:tax}\nFor our final example, we revisit the tax program from Section~\\ref{sec:top:formal}.\n\n\\begin{lstlisting}[caption={Tax example in Haskell},label={sec:tax:code},language=Haskell]\ntax :: Task h ((((Amount, Bool), Bool), Date), Date)\ntax =\n let today :: Date\n today = 100\n\n provideDocuments :: Task h (Amount, Date)\n provideDocuments = enter >< enter\n\n companyConfirm :: Task h Bool\n companyConfirm = enter\n\n officerApprove :: Date -> Date -> Bool -> Task h Bool\n officerApprove invoiceDate date confirmed =\n view (date - invoiceDate < 365 && confirmed)\n in (provideDocuments >< companyConfirm)\n >>? \\((invoiceAmount, invoiceDate), confirmed) ->\n officerApprove invoiceDate today confirmed\n >>? 
\\approved ->\n let subsidyAmount =\n if approved\n then min 600 (invoiceAmount `div` 10)\n else 0\n in view\n <| unlines\n [ \"Subsidy amount: \" ++ display subsidyAmount,\n \"Approved: \" ++ display approved,\n \"Confirmed: \" ++ display confirmed,\n \"Invoice date: \" ++ display invoiceDate,\n \"Today: \" ++ display today\n ]\n\\end{lstlisting}\n\nListing~\\ref{sec:tax:code} gives the Haskell code that implements the task.\nCompared to the original definition as given in Listing~\\ref{lst:tax}, the task is nearly identical.\nThe only change made is to the final line, where we have opted for a different presentation of the final result, for simplicity's sake.\n\n\\begin{figure}[t]\n \\subfloat[Step 1: The citizen enters the request info on the left, the installation company confirms on the right]{\n \\includegraphics[width=\\linewidth]{pics/tophat-ui-tax-2-step-2.png}\n \\label{}\n }\n\n \\subfloat[Step 2: The tax office confirms or denies the request]{\n \\hspace{.5cm}\n \\includegraphics[width=.30\\linewidth]{pics/tophat-ui-tax-2-step-3.png}\n \\label{}\n }\\hspace{1cm}\n \\subfloat[Step 3: The final outcome of the request is displayed]{\n \\includegraphics[width=.30\\linewidth]{pics/tophat-ui-tax-2-step-4-alt.png}\n }\n \\caption{Different stages of the tax subsidy application}\n \\label{fig:tax}\n\\end{figure}\n\nFigure~\\ref{fig:tax} lists the different stages of the UI for the tax subsidy task.\nFirst, the user requesting the subsidy can enter in information (first two tasks), while the company can confirm or deny.\nThen, the tax officer can verify if the conditions are met, and approve the request.\nFinally, the outcome is shown.\n\nSince we did not have to modify the task at all, besides a minor presentation detail, this task can still be proven correct using symbolic execution.\nThis example clearly illustrates the advantage of TopHat with a UI over the current state-of-the-art in the form of iTasks.\n\n\\section{TopHat UI 
Framework}\n\\label{sec:tophat-user-interface}\n\nIn this section we describe our prototype TOP UI framework, which is a\nproof-of-concept and not a fully fledged TOP framework. Our application supports\nTopHat tasks as mentioned in Section~\\ref{sec:tophat}. We limit\nourselves to a small set of datatypes: only integers, booleans, and strings\nare supported. Advanced framework features such as multi-user support are out of\nscope as well.\nWe will reflect on this in Section~\\ref{sec:conclusion}.\nThe framework is written in Haskell, and we use the following language extensions.\n\\begin{description}\n \\item [OverloadedLists] to allow for a more convenient HashMap notation.\n \\item [OverloadedStrings] to allow for a more convenient way of using Text.\n \\item [PackageImports and NoImplicitPrelude] to deal with the fact that TopHat defines its own Prelude.\n\\end{description}\nAll source code is published on GitHub\\footnote{\\url{https://github.com/mark-gerarts/ou-afstuderen-artefact}},\nalong with the examples described below.\n\n\n\nKey to our approach is that we leave the task specification of TopHat untouched.\nThis preserves the nice formal properties for which TopHat has been developed in the first place.\nThe prototype UI framework completely relies on the TopHat semantics for handling input and rewriting tasks.\nThe responsibility of the UI framework is to render the task in a web browser, and hand off input that comes in from the user to the TopHat semantics.\n\n\\begin{figure}[t]\n \\centering\n \\includegraphics[width=\\linewidth]{pics/architecture.png}\n \\captionsetup{type=figure}\n \\captionof{figure}{Architecture. Each box represents a main module.}\n \\label{fig:architecture}\n\\end{figure}\n\nThe prototype framework is architecturally separated into two parts: the backend\nand the frontend. Figure~\\ref{fig:architecture} shows the main modules of each\npart. The backend is responsible for initializing tasks and handling\ncommunication with TopHat. 
The frontend renders tasks and allows the user to\ninteract with them. After a comparative study of existing web server and UI\nframeworks~\\cite{markmarc2021}, we selected Servant~\\cite{servant} as our web server and\nHalogen~\\cite{purescripthalogen} for the UI. Other options are discussed in\nSection~\\ref{sec:related-work}.\nSection~\\ref{sec:artefact:communication} illustrates the communication\nbetween frontend and backend. Section~\\ref{sec:artefact:backend} explains\nhow the backend works, and the frontend is discussed in Section~\\ref{sec:artefact:frontend}.\n\n\\begin{figure}[t]\n \\centering\n \\includegraphics[width=0.75\\linewidth]{pics/frontendBackendSSD.png}\n \\captionsetup{type=figure}\n \\captionof{figure}{Communication between frontend and backend. Sequence diagram that displays requests (solid arrows) and responses (dashed arrows). \\texttt{update value} and \\texttt{reset} are user actions. Task and Input are JSON objects.}\n \\label{fig:frontendBackendSSD}\n\\end{figure}\n\n\\subsection{Communication between backend and frontend}\n\\label{sec:artefact:communication}\n\nFigure \\ref{fig:frontendBackendSSD} shows the communication between frontend and\nbackend. The frontend first requests the initial task, which the backend returns\nas a JSON representation of this task. A user can now interact with the\nsystem. In this example, the user updates a value. The frontend sends the input\nas JSON to the backend, and the backend responds with the updated task. This\nstep can be repeated as necessary. Finally, in this example the user resets the\napplication, which causes the backend to return to the initial task.\n\nThe frontend is written in PureScript and the backend in Haskell. 
We chose JSON\nas the data interchange format because it allows custom data structures, is\neasy to use, and is supported out of the box by both backend and frontend.\n\n\\subsection{Backend}\n\\label{sec:artefact:backend}\n\nThe backend is written in Haskell, using Servant~\\cite{servant} as the web\nserver.\nIt has three main responsibilities, which are reflected in its module structure, shown in Figure~\\ref{fig:architecture}:\n\n\\begin{enumerate}\n \\item The Application module loads the application, defines the web\n server and configures the handlers.\n \\item The Communication module handles JSON conversion, both encoding tasks\n to their JSON representation and decoding user input.\n \\item The Visualize module is intended for the end user. It exposes\n functions to start the framework, which is demonstrated in\n Listing~\\ref{sec:artefact:usage}.\n\\end{enumerate}\n\n\\begin{lstlisting}[caption={Starting the framework (Haskell)},label={sec:artefact:usage},language=Haskell]\nimport Task (Task, enter, view, (>>?))\nimport Visualize (visualizeTask)\n\nmain :: IO ()\nmain = visualizeTask greet\n\ngreet :: Task h String\ngreet = enter >>? \\result -> view (\"Hello \" ++ result)\n\\end{lstlisting}\n\n\\paragraph{Application module}\n\nWe create an abstract web application (WAI-application) in the Application\nmodule (see the \\texttt{application} function in Listing\n\\ref{sec:artefact:lst:application}). We define the endpoints and the request and\nresponse formats; see, for example, the \\texttt{TaskAPI} in Listing\n\\ref{sec:artefact:lst:application}. The \\texttt{server} function provides\nhandlers to serve the initial task, to handle interaction with the frontend and\nto perform a reset. The remainder of the module consists of functions that\nexpose functionality of TopHat: initializing tasks, deconstructing tasks into a\nrepresentation that can be sent to the frontend, and interacting with tasks. 
We\nhave only added key signatures to Listing \\ref{sec:artefact:lst:application}.\n\n\\begin{minipage}{\\linewidth}\n\\begin{lstlisting}[caption={Application module (Haskell)},label={sec:artefact:lst:application},language=Haskell]\nmodule Application (application, State (..)) where\n\ndata State h t = State\n { currentTask :: TVar (Task RealWorld t),\n initialised :: Bool,\n originalTask :: Task RealWorld t\n }\n\ntype TaskAPI =\n \"initial-task\" :> Get '[JSON] TaskDescription\n :<|> \"interact\"\n :> ReqBody '[JSON] JsonInput :> Post '[JSON] TaskDescription\n :<|> \"reset\" :> Get '[JSON] TaskDescription\n\ntype StaticAPI = Get '[HTML] RawHtml :<|> Raw\ntype API = TaskAPI :<|> StaticAPI\n\ninteractIO :: Input Concrete -> Task RealWorld a -> IO (Task RealWorld a)\ninitialiseIO :: Task RealWorld a -> IO (Task RealWorld a)\ndescribeIO :: Task RealWorld a -> IO TaskDescription\n\nserver :: ToJSON t => State h t -> ServerT API (AppM h t)\n\napplication :: ToJSON t => State h t -> Application\\end{lstlisting}\n\\end{minipage}\n\n\\paragraph{Communication module}\n\nIn Listing \\ref{sec:artefact:lst:communication} we show the core of the\ncommunication module. 
We introduce a new datatype, \\texttt{TaskDescription},\nthat holds all data we need to render a task: the task itself\n(\\texttt{JsonTask}) and its possible inputs (\\texttt{InputDescription}), along\nwith the \\texttt{describe} function that extracts this data from a TopHat task.\nUser input, which is sent back and forth from the client to the server, is\ndefined in \\texttt{JsonInput}.\n\n\n\\begin{lstlisting}[caption={Communication module (Haskell)},label={sec:artefact:lst:communication},language=Haskell]\nmodule Communication (JsonTask (..), TaskDescription (..), describe) where\n\ntype JsonTask = Value\n\ntype InputDescriptions = List (Input Abstract)\n\ndata TaskDescription where\n TaskDescription :: JsonTask -> InputDescriptions -> TaskDescription\n\ninstance ToJSON JsonTask\n\ndescribe :: Members '[Alloc h, Read h] r => Task h t -> Sem r TaskDescription\n\ndata JsonInput where\n JsonInput :: Input Concrete -> JsonInput\n\ninstance FromJSON JsonInput\n\\end{lstlisting}\n\n\\paragraph{Visualize module}\n\nIn Listing \\ref{sec:artefact:lst:visualize} we show the signatures of the\nvisualize module. We use this module to run the web server in production\n(\\texttt{visualizeTask}) or development (\\texttt{visualizeTaskDevel}) mode. We\ndifferentiate between these modes because we implemented live code reloading for\ndevelopment, which requires a bit of additional setup. Both\n\\texttt{visualizeTask} and \\texttt{visualizeTaskDevel} use the \\texttt{initApp}\nfunction. 
\\texttt{initApp} in turn invokes the \\texttt{application} function of the\nApplication module.\n\n\\begin{minipage}{\\linewidth}\n\\begin{lstlisting}[caption={Visualize module (Haskell)},label={sec:artefact:lst:visualize},language=Haskell]\nmodule Visualize (visualizeTask, visualizeTaskDevel) where\n\ninitApp :: ToJSON t => Task RealWorld t -> IO Application\n\nvisualizeTaskDevel :: ToJSON t => Task RealWorld t -> IO ()\n\nvisualizeTask :: ToJSON t => Task RealWorld t -> IO ()\n\\end{lstlisting}\n\\end{minipage}\n\\subsection{Frontend}\\label{sec:artefact:frontend}\n\nThe frontend renders the UI and provides a way for the user to interact with it. The code is written in PureScript using the Halogen framework.\nThe frontend consists of three main modules and some auxiliary modules. We\nexplain the main modules:\n\n\\begin{enumerate}\n \\item The Client module is the communication layer with the backend. It\n defines functions that send requests to the backend and handle the\n responses.\n \\item The Task module handles JSON encoding and decoding of our domain's\n datatypes (tasks and user input).\n \\item The TaskLoader module is the starting point of Halogen and is\n responsible for rendering the UI.\n\\end{enumerate}\n\n\\paragraph{Client module}\n\nThe Client module is responsible for the communication between frontend and\nbackend. The backend sends a response in JSON that consists of two parts: a\n\\texttt{Task} and a description of possible inputs. We decode this JSON object\ninto a \\texttt{TaskResponse}. 
See Listing \\ref{sec:artefact:lst:client}.\n\n\n\\begin{lstlisting}[caption={Client module (PureScript)},label={sec:artefact:lst:client},language=Haskell]\nmodule App.Client (ApiError, TaskResponse(..), getInitialTask, interact, reset) where\n\ndata TaskResponse\n = TaskResponse Task (Array InputDescription)\n\ninstance decodeJsonTaskResponse :: DecodeJson TaskResponse\n\ngetInitialTask :: Aff (Either ApiError TaskResponse)\n\ninteract :: Input -> Aff (Either ApiError TaskResponse)\n\nreset :: Aff (Either ApiError TaskResponse)\\end{lstlisting}\n\n\n\\paragraph{Task module}\nIn the Client module we defined a \\texttt{TaskResponse}. This\n\\texttt{TaskResponse} consists of two parts: a \\texttt{Task} and an array of\n\\texttt{InputDescription}. In the Task module we define the decoding process of\n\\texttt{Task} and \\texttt{InputDescription}. See Listing~\\ref{sec:artefact:lst:task}.\n\n\\begin{minipage}{\\linewidth}\n\\begin{lstlisting}[caption={Task module (PureScript)},label={sec:artefact:lst:task},language=Haskell]\nmodule App.Task where\n\ndata Task\n = Edit Name Editor\n | Select Name Task Labels\n | Pair Task Task\n | Choose Task Task\n | Step Task\n | Trans Task\n | Done\n | Fail\n\ninstance showTask :: Show Task\n\ninstance decodeJsonTask :: DecodeJson Task\n\ndata Input\n = Insert Int Value\n | Decide Int String\n\ninstance showInput :: Show Input\n\ninstance encodeInput :: EncodeJson Input\n\ndata InputDescription\n = InsertDescription Int String\n | OptionDescription Int String\n\ninstance showInputDescription :: Show InputDescription\n\ninstance decodeJsonInputDescription :: DecodeJson InputDescription\n \\end{lstlisting}\n\\end{minipage}\n\n\\paragraph{TaskLoader module}\n\nThe TaskLoader module renders the user interface (the \\texttt{render} function\nin Listing \\ref{sec:artefact:lst:taskloader}). 
The module also contains logic to\nhandle events (\\texttt{handleAction}), for example when a user modifies a value.\nFinally, the \\texttt{taskLoader} function (see Listing\n\\ref{sec:artefact:lst:taskloader}) initializes the component.\n\n\n\\begin{lstlisting}[caption={TaskLoader module (PureScript)},label={sec:artefact:lst:taskloader},language=Haskell]\nmodule Component.TaskLoader (taskLoader) where\n\ntaskLoader :: forall query input output m. MonadAff m => H.Component query input output m\n\nhandleAction :: forall output m. MonadAff m => Action -> H.HalogenM State Action Slots output m Unit\n\nrender :: forall m. MonadAff m => State -> HH.ComponentHTML Action Slots m\n\\end{lstlisting}\n\n\\section{Conclusion}\n\\label{sec:conclusion}\n\nWe have demonstrated TopHat UI, a proof-of-concept framework that implements a GUI for TopHat programs.\nAs expected, none of the advanced Clean features used by iTasks were required to do so.\nOn top of that, we were able to leave the TopHat language untouched, preserving its formal properties.\n\nOur framework implements all basic requirements for a TOP framework by supporting tasks, shared data stores, combinators and generics.\nThe source code for our framework is available online, and can thus be leveraged by developers and researchers to advance the field of Task-Oriented Programming.\n\n\\subsection{Future work}\n\nAs mentioned in Section~\\ref{sec:tophat-user-interface}, TOP features such as multi-user support and richer datatypes are considered future work.\nWe see no technical or formal reason prohibiting them from being included in future versions of the UI framework.\nAs with iTasks, the rendering of values, and editors of values, is generic in the type of the value.\nAdding support for more complex datatypes would only require defining viewing and editing instances for them, similar to how this is done in iTasks.\nAs for multi-user support, this is a limitation of the current version of TopHat.\nIts 
developers are already working on adding multi-user support.\nOnce this feature is released, we see no fundamental limitations in supporting this in the UI.\nThe server framework used in the current implementation, Servant, already has extensive support for user authentication, which could be leveraged~\\footnote{\\url{https://docs.servant.dev/en/stable/tutorial/Authentication.html}}.\n\n\n\\section{Introduction}\n\nWorkflow software is present in most businesses and institutions\nnowadays. From health care and first responders, to commerce and industrial\nprocesses. Businesses use workflow software to streamline their processes,\nincrease efficiency and reduce costs. In these sectors, reliability of software\nis crucial.\n\nPrevious research into workflow software in the functional\nprogramming community aimed to improve reliability, while at the same time\nreducing the effort of development. This led to the development of\nTask-Oriented Programming (TOP), a programming paradigm that aims to facilitate\nworking with multiple people towards a shared goal over the internet. TOP\nseparates the \\emph{what} from the \\emph{how}. This separation allows\nprogrammers to focus on the work that has to be done (\\emph{what}) instead of\npaying attention to design issues, implementation details, operating system\nlimitations, and environment requirements\n(\\emph{how})~\\cite{achten2015top,plasmeijer2012taskoriented}.\n\n\\textit{iTasks}~\\cite{achten2015top}, implemented in the functional programming language\nClean~\\cite{clean}, is the main TOP framework and has been around for a long time.\niTasks has been used to create real-world applications, such as an incident\ncoordination tool for the Dutch coast guard~\\cite{lijnse2012incidone}. 
While\nthis proves its practical usability, iTasks lacks formalization.\nThe iTasks semantics are given by its implementation, making it much harder to formally reason about iTasks programs.\nPrevious attempts to mitigate this issue by some of iTasks' creators involved developing a separate iTasks semantics, which allowed them to perform model-based testing, but no formal verification~\\cite{DBLP:conf/ifl/KoopmanPA08}.\nFormal program verification is a very powerful tool to ensure the correctness of critical\nsoftware, like the incident coordination tool.\nTopHat is a Domain-Specific Language (DSL) that paves the way to formally reason about task-oriented\nprograms~\\cite{steenvoorden2019tophat}, by defining a formal TOP semantics.\nThese semantics have been implemented in Haskell and Idris~\\footnote{\\url{https://github.com/timjs/tophat-proofs}}.\nIdris is a programming language that features dependent types and a totality checker, which is used to prove properties of TopHat programs.\n\n\\subsection*{Motivation}\nIn this paper, we develop an interactive UI for TopHat.\nBefore the development of TopHat, iTasks, TOP and Clean were tightly coupled.\nPrevious research even suggests that certain specific Clean features are essential to the implementation of TOP~\\cite{plasmeijer2012taskoriented}:\nuniqueness typing, data generic programming, dynamics~\\cite{DBLP:conf/ifl/VervoortP02} and a sophisticated backend using interpreted ABC bytecode on clients~\\cite{Oortgiese2017distributed}, to name a few.\nWe want to show that none of those features are essential in implementing a TOP framework with a GUI.\nOn top of that, we want to demonstrate that this can be achieved without making any changes to the TopHat language and its implementation in Haskell.\n\nWe expect this work to bring TOP to a bigger audience.\nThe current Clean user base is quite small.\nHaskell is used in production code, has a huge number of packages available 
online and an active community.\nTask-oriented programming could benefit from being ported to Haskell, making it available to a large community of both developers and researchers.\nDeveloping an interactive UI for TopHat brings this one step closer.\n\nMotivated by the above, this paper presents a prototype framework written on top of TopHat's\nHaskell implementation that is able to create interactive graphical user interfaces of\nTopHat programs.\n\n\n\n\\subsection*{Structure}\n\nThe remainder of this paper is structured as follows: we first provide some\nbackground about TOP, including iTasks and TopHat in Section~\\ref{sec:top}.\nSection~\\ref{sec:tophat-user-interface} introduces our TopHat UI prototype.\nSection~\\ref{sec:examples} demonstrates the capabilities of our framework, including formal reasoning, using several example TopHat programs.\nWe highlight related work in Section~\\ref{sec:related-work}.\nSection~\\ref{sec:discussion} reflects on the goals and research questions outlined above.\nSection~\\ref{sec:conclusion} concludes.\n\n\\section{Task-Oriented Programming}\n\\label{sec:top}\n\nThis paper builds upon previous TOP research~\\cite{plasmeijer2012taskoriented,achten2015top,steenvoorden2019tophat}.\nIn this section we describe the basic idea of TOP and two TOP\nimplementations: iTasks and TopHat.\n\n\\subsection{Task-Oriented Programming}\n\nTOP is centered around the concept of \\textit{tasks}, which specify the work a\nuser or system has to perform with a high level of abstraction.\nThe smallest possible task represents the smallest amount of\nwork a user or system can perform~\\cite{plasmeijer2012taskoriented}.\n\nCombining small tasks allows creating large and complex applications\nusing simple building blocks. 
Tasks can be combined using combinators:\nthey can be executed sequentially, in parallel, or conditionally.\nThese combinators closely resemble how collaboration happens in real life.\n\nTOP aims to facilitate collaborating with multiple people towards a shared goal, over the internet.\nCreating complex applications is further facilitated because\ntasks are first-class citizens: they can be used as input of functions, they can\nbe returned from them, and tasks can contain other tasks as value.\n\nTasks are interactive and input-driven. When a task receives input it is\nreevaluated and results in a new task. A task's value can be observed at all\ntimes. Tasks can share information with each other, either directly through\nshared data stores, or by passing task values to continuations.\n\nTOP itself focuses on the domain logic, with tasks providing merely a description of\nthe work that has to be performed. It is left up to a TOP framework to do the\nheavy lifting, such as generating the user interface, storing and handling data,\nsetting up a web server, and authenticating users.\n\\textit{iTasks}~\\cite{achten2015top} is such a\nframework, implemented in the functional programming language\nClean~\\cite{clean}. An example of a basic task in iTasks is presented in\nListing~\\ref{lst:task-itasks}. Developers only have to specify that they\nwant the user to enter some information. Passing this task description to iTasks generates\nan application that prompts the user for their name.\n\n\\begin{lstlisting}[language=Haskell,caption={A simple task prompting the user for their name (Clean)},label={lst:task-itasks}]\nenterName :: Task String\nenterName = Hint \"What is your name?\" @>> enterInformation []\\end{lstlisting}\n\nThe TOP paradigm provides an abstraction over workflow software. Instead of\nhaving to write a server, database, user interfaces, etc, programmers just\ndefine what needs to be done. 
The complete application is then derived from this\nspecification by the TOP framework. TOP is usually embedded in pure functional programming.\n\nTo summarize, TOP is made up of the following three core concepts:\n\n\\begin{description}\n \\item[Tasks] that describe the work that has to be performed, providing\n an abstraction that separates the \\textit{what} from the\n \\textit{how}~\\cite{achten2015top}.\n \\item[Composition] of tasks through combinators, allowing the\n creation of arbitrarily large tasks.\n \\item[Data] that is being passed between tasks sequentially and globally.\n\\end{description}\n\n\n\n\n\\subsection{iTasks}\n\\label{sec:context:itasks}\n\n\niTasks~\\cite{plasmeijer2007itasks} is a TOP framework that uses\nClean~\\cite{brus1987clean} as its host language. It supplements Clean with a set\nof combinators, model types, and algorithms that allow the construction of\ntask-oriented programs.\n\nAn example of a basic task was given in Listing~\\ref{lst:task-itasks}. iTasks\nwill automatically generate an entire application for this task. It uses\ngenerics to deduce that a task of type \\texttt{String} requires a text input\nfield. In Listing~\\ref{lst:itasks-greet} we\ncombine the task with a view task using a sequential step combinator. A\nuser has to enter their name and is greeted by the program after stepping to\nthe next task. 
Figure~\\ref{fig:itasks-greet} shows how these steps would look in\niTasks.\n\n\n\\begin{lstlisting}[language=Haskell,caption={Combining two tasks with a step combinator (Clean)},label={lst:itasks-greet}]\ngreet :: Task String\ngreet = enterName >>!\n \\result -> viewInformation [] (\"Hello \" +++ result)\\end{lstlisting}\n\n\\begin{figure}[t]\n \\centering\n \\includegraphics[width=0.47\\linewidth]{pics/itask_greeting_1_frame.png}\n \\includegraphics[width=0.47\\linewidth]{pics/itask_greeting_2_frame.png}\n \\captionsetup{type=figure}\n \\captionof{figure}{Entering your name (left) and the result after pressing continue (right)}\n \\label{fig:itasks-greet}\n\\end{figure}\n\niTasks is a work in progress, receiving constant updates and improvements. For\nexample, a recent addition is the usage of a distributed, dynamic\ninfrastructure~\\cite{Oortgiese2017distributed}. iTasks has formed the basis of\nfurther research as well. Tonic~\\cite{stutterheim2014tonic} facilitates the\nsubject for non-technical people by providing graphical blueprints of iTasks\nspecifications. It also provides a way to monitor the process while end users\nare interacting with the application~\\cite{stutterheim2019static}. iTasks acted\nas the starting point for research into declarative user interfaces, first for\nSVG images~\\cite{achten2014itasks} and later as a generalized\nsolution~\\cite{achten2016layout}.\n\n\\subsection{TopHat}\n\\label{sec:tophat}\n\nWhen software is used in critical applications, it is important that its\nbehavior can be verified and formally reasoned about. iTasks is primarily\nfocused on practical applicability, and therefore lacks this formalization.\nTesting an iTasks application is time consuming and often incomplete because of\nthe many different execution paths.\n\nTopHat~\\cite{steenvoorden2019tophat} distills TOP's core features\nto provide a way to reason about task-oriented programs. 
By employing\nsymbolic execution it is possible to formally verify TopHat\nprograms~\\cite{naus2019symbolic}. Symbolic execution has also been used to\nprovide end-users of tasks with additional feedback~\\cite{naus2020generating}.\n\nOur work is based on TopHat's Haskell implementation.\nListing~\\ref{lst:tophat-greet} gives the TopHat implementation of the example introduced in Section~\\ref{sec:context:itasks}.\nSimilar to the iTasks code, this task uses a step combinator to ask a user their name and subsequently greet them.\n\n\n\\begin{lstlisting}[language=Haskell,caption={A TopHat task that greets the user (Haskell)},label={lst:tophat-greet}]\ngreet :: Task h String\ngreet = enter >>? \\result -> view (\"Hello \" ++ result)\\end{lstlisting}\n\n\nTopHat contains the following set of tasks and combinators:\n\n\\begin{description}\n \\item[Editors] model user interaction.\n They are typed containers\n that are either empty or hold a value.\n TopHat contains different kinds of editors:\n \\begin{description}\n \\item[Update] contains a predefined value.\n \\item[View] is an editor with a view-only value.\n \\item[Enter] is an editor that is initially empty. 
Filling it transforms it into an Update editor.\n \\item[Watch] displays the value of a shared data store.\n \\item[Change] is an editor that allows changing the value of a shared data store.\n \\end{description}\n \\item[Done and Fail] are success and failure end tasks.\n \\item[Pair] combines two tasks (parallel-and).\n \\item[Choose] makes a choice between two tasks (parallel-or).\n \\item[Step] sequentially moves from one task to another.\n \\item[Share] creates a shared data store.\n \\item[Assign] assigns a value to a reference in a shared data store.\n\\end{description}\n\n\n\\subsection{Formal reasoning}\n\\label{sec:top:formal}\n\niTasks defines tasks as a ``state transforming function that reacts to an event, rewrites itself to a reduct and accumulates responses to users''~\\cite{plasmeijer2012taskoriented}.\nFor combinators, iTasks takes a Swiss-army-knife approach.\nIt defines two combinators that perform a multitude of actions.\nFrom these combinators, simpler ones can be constructed.\nFor example, the \\texttt{>>*} combinator performs sequential composition, allows the user to choose from a list of tasks, and supports automatically progressing tasks, guarded tasks, and stepping on exception.\nIts definition in the latest version of iTasks is about 100 lines of Clean code, relying on many custom functions~\\footnote{\\url{https://gitlab.com/clean-and-itasks/itasks-sdk/-/blob/master/Libraries/iTasks/WF/Combinators/Core.icl}}.\nWhile iTasks is certainly an impressive engineering accomplishment, it is unfit for formal reasoning.\n\nTopHat, on the other hand, defines tasks as a simple datatype, with three base cases and a small number of simple combinators~\\cite{steenvoorden2019tophat}.\nThe TopHat framework takes care of handling events, rewriting and task rendering.\nThe formal TopHat semantics fits on a single page, and is largely straightforward.\n\nTo demonstrate the formal reasoning capabilities of TopHat, a symbolic execution semantics has been 
developed~\\cite{naus2019symbolic}.\nFor space reasons, we will refrain from repeating syntax and semantics here, but will revisit an example that we use throughout this paper.\n\n\\lstset{emph={invoiceDate,date,confirmed,invoiceAmount,approved}}\n\\begin{TASK}[float=ht\n ,numbers=right\n ,caption=Subsidy request and approval workflow at the Dutch tax office.\n ,label=lst:tax\n ]\n let today = $\\text{25 Sept 2020}$ in\n let provideDocuments = enter Amount <&> enter Date in\n let companyConfirm = edit True edit False in\n let officerApprove = \\ invoiceDate. \\ date. \\ confirmed.\n edit False if (date - invoiceDate < 365 /\\ confirmed) |\\label{lst:tax:officer-approve-def}|\n then edit True\n else fail in\n provideDocuments <&> companyConfirm >>= |\\label{lst:tax:documents-and-company-confirm}|\n \\ <<<>, confirmed>>.\n officerApprove invoiceDate today confirmed >>= \\ approved.|\\label{lst:tax:officer-approve}|\n let subsidyAmount = if approved\n then min 600 (invoiceAmount / 10) else 0 in\n edit <>|\\label{lst:tax:result}|\n\\end{TASK}\n\nListing~\\ref{lst:tax} provides the code for a small example task, implementing the process of applying for a tax subsidy.\nThis example was inspired by a collaboration with the Dutch tax office.\nThe user is asked to provide documents to back up their tax subsidy request for solar panel installation (line 2).\nThe installation company has to confirm that they installed the panels (line 3), which can be done in parallel (line 8).\nFinally, a tax officer can either approve or deny the request (line 4), depending on certain conditions (line 5).\nAfter the task has been completed, the subsidy amount is calculated (line 12), and the details are returned in a view (line 13).\n\nFor this task, symbolic execution allowed the authors to prove correctness properties over the code, such as functional correctness.\nIn Section~\\ref{sec:tax} we will take a look at generating a UI using the framework presented in the coming 
section.\n\n\\section{Related work}\n\\label{sec:related-work}\n\nSection~\\ref{sec:top} presented related work on TOP and iTasks.\nIn this section, we will briefly discuss Functional Reactive Programming as an alternative to TOP, as well as alternatives for the UI framework and web server we have used during the development of the UI for TopHat.\n\n\n\\subsection{Functional Reactive Programming}\n\nFunctional Reactive Programming (FRP) is another approach to UI development\nusing functional programming. FRP is a programming paradigm centered around\ninteractive event-based applications. It has implementations in multiple\nprogramming languages, such as Haskell and\nJavaScript~\\cite{Bainomugisha2013reactive}.\n\nFRP consists of two main concepts: \\textit{behaviors} and \\textit{events}. A\nbehavior consists of a value and can be mapped to output, for example a label.\nBehaviors can depend on other behaviors, so a change in a behavior can propagate\nthrough a network of dependent behaviors. An event only occurs at a certain\npoint in time and contains a value. Input is mapped to events, for example the\npressing of a key or the position of the mouse cursor. Events can trigger\nchanges in behaviors.\n\nIt is worth noting that, while they share some similarities, FRP and TOP are\nconceptually different. 
FRP is a paradigm for reactive programming, whereas TOP\nis a way to model collaboration between users.\n\n\\subsection{User Interface frameworks}\n\nWe build upon the Halogen framework to create our prototype, but many other UI\nframeworks exist in the domain of functional programming.\nWe discuss three of these briefly below.\n\nElm~\\cite{czaplicki2012elm} refers to both Elm, a functional programming\nlanguage that compiles to JavaScript~\\cite{elmlang}, and\nThe Elm Architecture (TEA)~\\cite{elmarchitecture}, a programming pattern that emerged from it.\nElm's ecosystem consists of a large number of available libraries that help in\ncreating web applications.\n\n\n\nMiso~\\cite{haskellmiso} is a Haskell front-end framework inspired by Elm and\nRedux. It relies on GHCJS~\\cite{ghcjs}, a Haskell-to-JavaScript compiler based\non GHC.\n\n\n\nReflex~\\cite{reflexfrp} is an FRP framework written in Haskell with support for\na variety of platforms, including the web, desktop, and mobile. Reflex\napplications are modular, which makes growing and refactoring an application\nefficient and swift.\n\n\n\n\nWe have selected PureScript and Halogen because PureScript is a powerful functional programming language\nthat fits our problem domain. Halogen provides an excellent developer\nexperience, has a component-based architecture and builds upon PureScript's\npower and expressiveness.\n\n\n\\subsection{Web servers}\n\nWe have opted for Servant as our web server.\nServant provides combinators to implement our features, which makes\ncoding less error-prone and time-consuming. 
Servant is up-to-date,\nwell-maintained and well documented, and it makes it easy to get a working prototype.\nBelow we discuss Yesod and Warp as possible alternatives for the server used in our implementation.\n\nThe Yesod Web Framework~\\cite{snoyman} is a Haskell web framework that allows for rapid development of type-safe, RESTful and high performance web applications~\\cite{yesod}.\nThe Yesod Web Framework adds the strengths of Haskell (like type safety) to the web.\nEspecially at the boundaries between Yesod and the outside world, for example when a user enters input or persistent data is loaded, Yesod adds mechanisms to define the expected types~\\cite{yesodBook}. We found that developing a prototype based on Yesod is more difficult than developing a prototype based on Servant. We also found that the Yesod Web Framework is too extensive for our purposes~\\cite{markmarc2021}.\n\n\n\nThe Warp web server is a light-weight web server that supports the Web Application Interface (WAI)~\\cite{snoyman2011warp}.\nIt is meant to be easy to use and provide easy composition of web services.\nBecause of the design choices to achieve this, the code of a Warp prototype is low-level.\nThis means that implementing all features in this way will be error-prone and time-consuming.\nTherefore, we have chosen Servant. Note, however, that Servant itself uses Warp as its web server~\\cite{markmarc2021}.\n\n", "meta": {"timestamp": "2022-08-31T02:03:20", "yymm": "2208", "arxiv_id": "2208.13870", "language": "en", "url": "https://arxiv.org/abs/2208.13870"}} {"text": "\\section{Introduction}\n\n\nIt is natural to expect that some states of a quantum system are more ``quantum'' than others. To transform this intuitive thought into a qualitative concept, we use the conventional statistical interpretation of quantum mechanics. The quasiprobability distribution functions will be regarded as a source of information about the classicality/quantumness of a state. 
Our consideration is based on ideas borrowed from geometric probability theory \\cite{KlainRota1997} and on the commonly accepted opinion that negative values of a quasiprobability function are a certain sign of quantum nature (see \\cite{Wigner1932}-\\cite{Feynman1987} and \\cite{FerrieMorrisEmerson2010} with references therein). This observation allows one to specify the notion of ``classical states \u00e0 la Wigner'' as the states whose Wigner function is non-negative everywhere in the phase space. Based on this definition, several measures of classicality/quantumness have been constructed \\cite{Hillery1987}-\\cite{AKA2021}. When dealing with an ensemble of random states, the probability of finding a ``classical state'' among the members of an ensemble is an example of this kind of measure \\cite{AKhT2020}-\\cite{AKR2021}. \n\nIn the present article, after introducing the classicality indicator $\\mathcal{Q}$ as a geometric probability, we will compute it for a 3-level quantum system, a qutrit. We will compare the characteristics of classicality of qutrits from three random ensembles: the Hilbert-Schmidt and two other ensembles, associated with the monotone Riemannian metrics \\--- the Bures and the Bogoliubov-Kubo-Mori metrics (cf. \\cite{MorozovaChensov1990}-\\cite{HiaiKosakiPetzRuskai2013}). To make the presentation self-contained, in the next sections the necessary notions and definitions related to these random ensembles and the Wigner function of a finite-dimensional quantum system will precede the calculations of the corresponding probabilities. Calculating the probabilities for different varieties of states, we analyze the dependence of the classicality measure on the moduli parameter of a qutrit Wigner function.\n\n\n\n\\section{Unitary invariant ensembles of qudits}\n\n\\label{sec:Random}\n\n\nLet us consider a qudit \\--- a quantum system associated with an $N\\--$dimensional Hilbert space. 
The quantum state space $\\mathfrak{P}_N$ of an $N\\--$level qudit is defined as: \n\\begin{equation}\n\\label{eq:StateSpace}\n\\mathfrak{P}_N =\\{\\, \\varrho \\in M_N(\\mathbb{C}) \\ |\\ \\varrho=\\varrho^\\dagger\\,,\\quad \\varrho \\geq 0\\,, \\quad \\mbox{tr}\\left( \\varrho \\right) = 1 \\, \\}\\,.\n\\end{equation}\nThe unitary $U(N)$ automorphism of the Hilbert space of an $N\\--$level quantum system induces the adjoint $SU(N)$ transformations of density matrices $\\varrho \\in \\mathfrak{P}_N$\\,: \n\\begin{equation}\n\\label{eq:UP}\n g\\cdot \\varrho = g \\varrho g^\\dagger\\,, \\qquad g\\in SU(N)\\,.\n\\end{equation}\nFor a closed system it is assumed that the probability density function of the corresponding ensemble of $N\\--$ dimensional qudits is invariant under (\\ref{eq:UP}):\n\\begin{equation}\n\\label{eq:InvPDF}\n P(\\varrho)= P(g\\varrho g^\\dagger)\\,, \\qquad \\forall\\ g \\in SU(N)\\,.\n\\end{equation}\nFurther in the report three ensembles of random states respecting this unitary symmetry will be used for evaluation of the measure of classicality. Namely, we will consider the unitary invariant ensembles associated with the following Riemannian metrics on state space:\n\\begin{itemize}\n\\item[\\--] the Hilbert-Schmidt metric $\\mathrm{g_{{}_\\mathrm{HS}}}$\\,; \n\\item[\\--] the Bures metric $\\mathrm{g_{{}_\\mathrm{B}}}$\\,;\n\\item[\\--] the Bogoliubov-Kubo-Mori metric $\\mathrm{g_{{}_\\mathrm{BKM}}}$\\,.\n\\end{itemize}\nBefore dealing with a specific ensemble, it is worth drawing attention to a common property of each of these ensembles emerging due to $SU(N)$ invariance (\\ref{eq:InvPDF}). 
\n\n\n\\paragraph{Stratification and factorization of probability distribution on $\\mathfrak{P}_N$\\,.}\nThe invariance property (\\ref{eq:InvPDF}) leads to a certain factorization of the probability distribution functions $ P(\\varrho)$ into two factors, one depending on $SU(N)\\--$invariants solely, and the other being a universal function of the ``angular variables''.\nMoreover, the structure of this factorization is universal for all states whose unitary orbits are characterised by the same isotropy group, $H_\\alpha \\subset SU(N)$\\,, i.e., belong to a class with the same ``orbit type'' \n\\footnote{\nSubgroup $H_x \\subset SU(N)$ is the \\textit{isotropy group (stabilizer)} of point $x\\in \\mathfrak{P}_N$ and is defined as \n\\[\nH_x =\\{\\, g\\in SU(N)\\ | \\ g\\cdot x =x\\, \\}\\,.\n\\]\nIf the conjugacy class of $H$ is denoted by $[H]$\\,, then we say that \\textit{the type of the orbit is $[H]$}, if the stabilizer $H_x$ of some/any point $x$ in the orbit belongs to $[H]\\,.$ \n}\\,.\nIsotropy groups $H_\\varrho$ of any point $\\varrho \\in \\mathfrak{P}_N$ are determined by the algebraic degeneracy of the spectrum of $\\varrho$ and are in one-to-one correspondence with the Young diagrams of all possible decompositions of $N$ into non-negative integers. Hence, we associate the given partition of $N$ with the \\textit{stratum} $\\mathfrak{P}_{[H_\\alpha]}\\,,$ defined as the set of all points of $\\mathfrak{P}_N\\,,$ whose stabilizer is conjugate to subgroup $H_\\alpha$\\,:\n\\begin{equation}\n \\mathfrak{P}_{[H_\\alpha]}: =\\big\\{\\, x \\in \\ \\mathfrak{P}_N|\\ H_x \\mbox{~is~conjugate~to}\\ H_\\alpha \\, \\big\\} \\,,\n\\end{equation}\nwhere $ \\ \\alpha = 1, 2, \\dots, p(N)$\\,. 
\\footnote{The partition function $p(N)$ gives a number of possible partitions of a non-negative integer ${N}$ into natural numbers.\n}\nThe union of $\\mathfrak{P}_{[H_\\alpha]}$ results in the state space $\\mathfrak{P}_{N}$\\,:\n\\begin{equation}\n\\label{eq:OrbitDec}\n\\mathfrak{P}_N=\\bigcup_{\\mbox{orbit types}}{\\mathfrak{P}}_{[H_\\alpha]}\\,,\n\\end{equation}\nwith each component of the decomposition \n(\\ref{eq:OrbitDec}) consisting of density matrices with a fixed algebraic degeneracy,\n\\begin{equation}\n\\label{eq:stratumDeg}\n\\mathfrak{P}_{[H_\\alpha]}=\n \\bigcup_{\\omega \\in S_s }{\\mathfrak{P}_{k_{\\omega(1)},\n k_{\\omega(2)},\n \\dots, k_{\\omega(s)}}}\\,. \\end{equation}\nIn (\\ref{eq:stratumDeg}) $S_s$ is a symmetric group acting on a given partition of $N$ into $s$ natural numbers $k_1, k_2, \\dots, k_s\\,.$ Algebraically, $\\mathfrak{P}_{k_1, k_2, \\dots, k_s}$ being a set of states with a fixed degeneracy is defined via the characteristic polynomial of a density matrix:\n\\footnote{Note that in (\\ref{eq:DegSet}) the condition of summing up the degrees of degeneracy to $N$ means that only the maximal rank states are considered.}\n\\begin{equation}\n\\label{eq:DegSet}\n \\mathfrak{P}_{k_1, k_2, \\dots, k_s} = \\{\\, \\varrho\\in\\mathfrak{P}_N\\,, k_i \\in \\mathbb{Z}_+\\, |\\, \n\\det(\\varrho-\\lambda)=\\prod_{i=1}^s (r_i-\\lambda)^{k_i}\\,, \\quad \\sum_{i=1}^s k_i= N \\, \\}\\,.\n\\end{equation}\nGeometrically, the set $\\mathfrak{P}_{k_1, k_2, \\dots, k_s}$ with $k_1=k_2=\\cdots=k_N=1$ represents the interior of an $(N-1)\\--$dimensional simplex $C_{N-1}$ of eigenvalues:\n\\begin{equation}\n\\label{eq:NorderedSim}\n C_{N-1} := \\{\\, \\boldsymbol{r} \\in \\mathbb{R}^N \\, \\biggl| \\, \n\\sum_{i=1}^{N} r_i = 1\\,, \\quad 1\\geq r_1\\geq r_2 \\geq \\dots \\geq r_{N-1}\\geq r_N \\geq 0 \\, \\}\\,, \n\\end{equation}\nwhile for all other admissible tuples $\\boldsymbol{k}=(k_1, k_2, \\dots, k_s )$ each $\\mathfrak{P}_{k_1, k_2, \\dots, 
k_s}$ represents the union of the faces and edges of the $(N-1)\\--$simplex parameterized by the barycentric coordinates of the following kind: \n\\begin{equation}\n\\label{eq:spec}\n\\boldsymbol{r}^{\\downarrow}(\\varrho)=\\{r_1 \\overbrace{(1, \\dots, 1)}^{k_1}\\,;\\, r_2\\overbrace{(1, \\dots, 1)}^{k_2}\\,;\\, \\dots \\,;\\, r_s\\overbrace{(1, \\dots, 1)}^{k_s}\\}\\,.\n\\end{equation}\n\n\nNow, bearing in mind the above described stratification of $\\mathfrak{P}_N\\,,$ it is easy to show the factorization of $SU(N)\\--$invariant measures. Indeed, one can be convinced that the Singular Value Decomposition (SVD) of the density matrix from a stratum $\\mathfrak{P}_{[H_\\alpha]}$ with spectrum of the form (\\ref{eq:spec}): \n\\begin{equation}\n\\label{eq:SVD}\n\\varrho=U\\mathrm{diag}\\left(r_1, r_2 , \\dots r_s\\right) U^\\dagger\\,, \\qquad U\\in SU(N)/H_\\alpha\\,,\n\\end{equation} \nreveals the following factorization of the invariant probability distribution (\\ref{eq:InvPDF}):\n\\begin{equation}\n \\mathrm{P}(\\varrho)=P(r_1,\\dots, r_s)\\, \\mathrm{d}r_1\\wedge \\cdots \\wedge \\mathrm{d}r_N\\wedge \\mathrm{d}\\mu_{U(N)/H}\\,,\n\\end{equation}\nwhere the first factor $P(r_1,\\dots, r_s)$ represents a measure on\nsubset $\\mathfrak{P}_{k_1,k_2,\\dots,k_s}$ of the simplex $\\mathcal{C}_{N-1}$, while the second factor is the measure on coset \n$U(N)/H$\\,. 
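\n\nAs an illustration of this factorization, consider the simplest case $N=2$ (a sketch under the assumption that the ensemble is generated by the flat Hilbert-Schmidt metric; the Bloch radius $b$ and the solid angle $\\Omega$ are auxiliary notations introduced only for this example). On the regular stratum with $r_1>r_2$ the isotropy group is $U(1)$\\,, the coset $SU(2)/U(1)$ is the two-sphere, and writing $r_{1,2}=(1\\pm b)/2$ the flat volume element of the Bloch ball takes the form\n\\[\nb^2\\,\\mathrm{d}b\\wedge\\mathrm{d}\\Omega = (r_1-r_2)^2\\,\\mathrm{d}(r_1-r_2)\\wedge\\mathrm{d}\\Omega\\,,\n\\]\ni.e., it splits into a factor depending on the eigenvalues only and the invariant measure on the orbit.\n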
\n\nAfter a preliminary exposition of this generic property of unitary invariant ensembles, we will now specify the form of the distribution $P(r_1,\\dots, r_N)$ for the Hilbert-Schmidt metric and for an important class of the monotone metrics.\n\n\n\\paragraph{The Hilbert-Schmidt ensemble of qudits.}\nLet us consider the metric corresponding to the distance between two infinitesimally close matrices $\\varrho-\\mathrm{d}\\varrho$ and \n $\\varrho+\\mathrm{d}\\varrho$ calculated with respect to the Frobenius norm, \n\\begin{equation}\n \\label{eq:HSGen}\n \\mathrm{g_{{}_\\mathrm{HS}}} \\propto \\mathrm{Tr} \\left(\\mathrm{d}\\varrho\\otimes\\mathrm{d}\\varrho\\right)\\,.\n\\end{equation}\nIf a density matrix belongs to the interior of the simplex $C_{N-1}$\\,, i.e., the matrix has $N$ distinct non-zero eigenvalues $(k_1=k_2=\\cdots=k_N=1)$\\,, then the metric (\\ref{eq:HSGen}) defines the standard \\textit{Hilbert-Schmidt ensemble} of random full rank $N\\--$qudits. A straightforward computation shows that the joint probability distribution of eigenvalues reads\n\\begin{equation}\nP^{\\rm HS}(r_1,\\dots,r_N) \\propto \\,\n \\delta(1-\\sum_{j=1}^N r_j) \\prod_{j<k} (r_j-r_k)^2\\,.\n\\end{equation}\n\n\\dots\\ $1 > r_1 > r_2 > r_3 > 0$\\,, the components of decomposition (\\ref{eq:OTD}) are described as follows (see geometrical illustration in Fig. \\ref{fig:2S+CL}): \n\\begin{enumerate}\n\\item the regular stratum $\\mathfrak{P}_{[T^3]}$ of maximal dimension 6 consists of matrices with a simple spectrum, $1> r_1\\neq r_2\\neq r_3 > 0$\\,. The corresponding orbit space is the face $F_{123}$ of the ordered 2-simplex, the interior of $\\triangle AOB$\\,, \n\\item the degenerate 4-dimensional stratum $\\mathfrak{P}_{[S(U(2)\\times U(1))]}$ with density matrices whose degeneracy is $\\boldsymbol{k}=(2,1)$ and $\\boldsymbol{k}=(1,2)$\\,, i.e., $1>r_1\\neq r_2=r_3> 0$ and $1>r_1=r_2\\neq r_3>0$\\,. 
The corresponding orbit space represents the union of edges $F_{1|23}$ and $F_{12|3}$ of the 2-simplex, two sides of $\\triangle AOB$\\,, \n\\item the 0-dimensional stratum $\\mathfrak{P}_{[SU(3)]}$ of the maximally mixed state with the triple degeneracy $\\boldsymbol{k}=(3)$\\,, $r_1=r_2=r_3=1/3$\\,.\n\\end{enumerate}\n\n\\begin{figure}[h!]\n\\center{\n\\includegraphics[width=0.4\\textwidth]{2S+CL}\n}\n\\caption{The ordered 2-simplex of qutrit eigenvalues is represented by $\\triangle AOB$\\,, and the hatched region, $\\triangle COD\\,,$ corresponds to the classical states. Edges $AO/\\{A\\}$ and $BO/\\{B\\}$ are locus of degenerate states $F_{12|3}$ and $F_{1|23}$\\,, while their parts $CO/\\{O\\}$ and $DO/\\{O\\}$ represent the degenerate classical states $F^+_{12|3}$ and $F^+_{1|23}$\\,.\n}\n\\label{fig:2S+CL}\n\\end{figure} \n\n\nTaking into account the decreasing order of the eigenvalues, $1 \\geq r_1 \\geq r_2 \\geq r_3 \\geq 0$\\,, the spectrum of qutrit admits the following parameterization:\n\\begin{eqnarray}\n\\label{eq:specrho1}\nr_1&=&\n\\frac{1}{3}-\\frac{2r}{\\sqrt{3}}\\,\n\\cos\\left(\\frac{\\varphi+2\\pi}{3}\\right),\\\\\n\\label{eq:specrho2}\nr_2 &=&\n\\frac{1}{3}-\\frac{2r}{\\sqrt{3}}\\,\n\\cos\\left(\\frac{\\varphi+\n4\\pi}{3}\\right),\\\\\n\\label{eq:specrho3}\nr_3&=&\\frac{1}{3}-\n\\frac{2r}{\\sqrt{3}}\\,\n\\cos\\left(\\frac{\\varphi}{3}\\right),\n\\end{eqnarray}\nwith $r \\in [0, 1/\\sqrt{3}]$ and the angle $\\varphi \\in [0, \\pi]$\\,. 
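\n\nAs a quick consistency check of this parameterization (a sketch that uses only the formulae above together with the elementary identities $\\sum_{k=0}^{2}\\cos\\left(\\frac{\\varphi+2\\pi k}{3}\\right)=0$ and $\\sum_{k=0}^{2}\\cos^2\\left(\\frac{\\varphi+2\\pi k}{3}\\right)=\\frac{3}{2}$), the trace and the purity of the state follow directly:\n\\begin{eqnarray}\nr_1+r_2+r_3 &=& 1-\\frac{2r}{\\sqrt{3}}\\sum_{k=0}^{2}\\cos\\left(\\frac{\\varphi+2\\pi k}{3}\\right) = 1\\,,\\nonumber\\\\\nr_1^2+r_2^2+r_3^2 &=& \\frac{1}{3}+\\frac{4r^2}{3}\\sum_{k=0}^{2}\\cos^2\\left(\\frac{\\varphi+2\\pi k}{3}\\right) = \\frac{1}{3}+2r^2\\,.\\nonumber\n\\end{eqnarray}\nHence $r=0$ corresponds to the maximally mixed state with purity $1/3$\\,, while the upper bound $r=1/\\sqrt{3}$ gives purity one, which is attained only by pure states.\n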
\nIf $r$ and $\\varphi$ are treated as the polar coordinates on a plane, $\\left(r\\cos\\varphi\\,, r\\sin\\varphi\\right)\\,,$ then geometrically the formulae (\\ref{eq:specrho1})-(\\ref{eq:specrho3}) can be interpreted as a map between the ordered simplex $\\mathcal{C}_{2}$ and the domain of the upper half-plane outlined by the Maclaurin trisectrix:\n\\begin{equation}\n r(\\varphi, 1/\\sqrt{3})=\\frac{1}{2\\sqrt{3} \\cos({\\varphi}/{3})}\\,.\n\\end{equation}\n\nMore precisely, under transformations (\\ref{eq:specrho1})-(\\ref{eq:specrho3}), the ordered simplex of eigenvalues $\\mathcal{C}_2$ \nmaps to the domain (see Fig. \\ref{fig:QutritMacTris}) \n\\begin{equation}\n\\label{eq:OrbitQutrit}\n F_{123}=\\,: \\ \n \\biggl\\{\\, r \\geq 0\\,, \\varphi \\in [0, \\pi]\\,\\ \\biggl|\\, \\ \\cos\\left(\\frac{\\varphi}{3}\\right) \\leq \\frac{1}{2\\sqrt{3}r} \\, \\biggl\\}\\,.\n\\end{equation} \n\n\\begin{figure}[h!]\n\\center{\n\\includegraphics[width=0.4\\textwidth]{QutritMacTris.png} \n}\n\\caption{The image of the ordered simplex $\\mathcal{C}_2$ on the plane $x=r\\cos\\varphi\\,,\\, y=r\\sin\\varphi\\,$ under the mapping (\\ref{eq:specrho1})-(\\ref{eq:specrho3}).}\n\\label{fig:QutritMacTris}\n\\end{figure}\n\n\n\\paragraph{Wigner function of a qutrit}\nThe master equations (\\ref{eq:SWspace}) for eigenvalues of the Stratonovich-Weyl kernel of a qutrit,\n\\begin{equation}\n\\label{eq:mod3}\n\\pi_1+\\pi_2+\\pi_3=1\\,, \\qquad \\pi^2_1+\\pi^2_2+\\pi^2_3=3 \\,, \n\\end{equation}\ndefine a one-parametric family of the Wigner functions. 
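\n\nTo see why the family is one-parametric, the following short geometric sketch may help (it is an illustration only, and the radius $R$ is a notation introduced just for it). In the space $\\mathbb{R}^3$ of triples $(\\pi_1, \\pi_2, \\pi_3)$ the first equation in (\\ref{eq:mod3}) defines a plane at the distance $1/\\sqrt{3}$ from the origin, while the second one defines a sphere of radius $\\sqrt{3}$ centred at the origin. The two surfaces therefore intersect along a circle centred at $(1/3, 1/3, 1/3)$ whose radius $R$ satisfies\n\\[\nR^2 = 3-\\frac{1}{3}=\\frac{8}{3}\\,,\n\\]\nso that a point of the family is fixed by a single angle on this circle.\n\n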
Due to the permutation symmetry of (\\ref{eq:mod3}),\nthe corresponding moduli space is a unit circle factorised by the symmetric group $S_3\\,.$ Let $\\mu_3$ and $\\mu_8$ be Cartesian coordinates of this arc with a polar angle from the interval $\\zeta \\in[0, \\frac{\\pi}{3}]\\,,$\n \\begin{equation}\n\\label{eq:muzeta}\n\\mu_3=\\sin\\zeta\\,, \\qquad \\mu_8=\\cos\\zeta\\,, \n\\qquad \n\\end{equation}\nthen, providing the decreasing order of the SW kernel eigenvalues, $\\pi_1 \\geq \\pi_2 \\geq \\pi_3\\,,$ one can represent the whole class of solutions to (\\ref{eq:mod3}) as: \n\\begin{eqnarray}\n\\label{eq:piparam}\n\\pi_1= \\frac{1}{3}+\\frac{2}{\\sqrt{3}}\\,\n\\mu_3+\\frac{2}{3}\\,\\mu_8\\,,\\quad\n\\pi_2=\\frac{1}{3}-\\frac{2}{\\sqrt{3}}\\,\\mu_3+\\frac{2}{3}\\,\\mu_8\\,, \\quad \n\\pi_3=\\frac{1}{3}-\\frac{4}{3}\\,\\mu_8\\,.\n\\end{eqnarray}\n\n\n\\paragraph{Classical states of qutrit.} \nThe image of classical states from the regular stratum $\\mathfrak{P}_{[T^3]}$ to the unitary orbit space is the interior $F^+_{123}$ of a cone which is cut out from the simplex $\\mathcal{C}_2$ by the line (see Fig. 
\\ref{fig:2S+CL}) \n\\begin{equation}\nL_{\\boldsymbol{\\pi}}(\\boldsymbol{r}) \\, : \\qquad r_1\\pi_3+ r_2\\pi_2+ r_3\\pi_1=0\\,,\n\\end{equation}\nwhile the orbit space of classical states from the stratum $\\mathfrak{P}_{S(U(2)\\times U(1))}$ consists of two pieces: $F^+_{1|23}$ and $F^+_{12|3}$\\,, corresponding to the matrices of degeneracy types $(2,1)$ and $(1,2)$ respectively.\nUsing the polar form of parameterization of the spectrum of a density matrix \n(\\ref{eq:specrho1})-(\\ref{eq:specrho3})\nand expressions (\\ref{eq:piparam}) \nfor the SW kernel eigenvalues, the cone of classical states on a regular stratum reads:\n\\begin{eqnarray}\n\\label{eq:coneclass1}\n&& F^+_{123}\\, : \\biggl\\{\\, r > 0\\,, \\varphi \\in (0, \\pi)\\,\\ \\biggl|\\, \\ \\cos\\left(\\frac{\\varphi}{3} +\\zeta -\\frac{\\pi}{3}\\right) \\leq \\frac{1}{4\\sqrt{3}r}\\,\\biggl\\}\\,,\n\\end{eqnarray}\nwhile the cone of classical states on the degenerate stratum $\\mathfrak{P}^{(+)}_{[S(U(2)\\times U(1))]}$ is:\n\\begin{eqnarray}\n \\label{eq:coneclass2} \n F^+_{1|23}&=&\\left\\{\\,\n \\varphi= 0, \\ r \\in (0, \\frac{1}{2\\sqrt{3}})\\ \\biggl| \\ \n \\cos\\left(\\zeta-\\frac{\\pi}{3}\\right)\n < \\frac{1}{4\\sqrt{3}r}\n \\,\\right\\}\\,,\\\\\n F^+_{12|3}&=& \n \\left\\{\\,\n \\ \\varphi= \\pi, \\ r \\in (0, \\frac{1}{\\sqrt{3}})\\ \\biggl| \\ \n \\cos\\left( \\zeta\\right)\n < \\frac{1}{4\\sqrt{3}r}\n 
\\,\\right\\}\\,.\n\\end{eqnarray}\n\n\n\\begin{figure}[h!]\n\\begin{minipage}[h]{0.32\\linewidth}\n\\center{\\includegraphics[width=1\\linewidth]{QutritLowerBound-zeta0}}\n\\end{minipage}\n\\hfill\n\\begin{minipage}[h]{0.32\\linewidth}\n\\center{\\includegraphics[width=1\\linewidth]{QutritLowerBound-zetapi6}}\n\\end{minipage}\n\\hfill\n\\begin{minipage}[h]{0.32\\linewidth}\n\\center{\\includegraphics[width=1\\linewidth]{QutritLowerBound-zetapi3}}\n\\end{minipage}\n\\begin{minipage}[h]{0.96\\linewidth}\n\\begin{tabular}{p{0.32\\linewidth}p{0.32\\linewidth}p{0.32\\linewidth}}\n\\centering \n\\footnotesize $\\zeta=0$ & \\centering \n\\footnotesize $\\zeta=\\pi/6$ & \\centering \n\\footnotesize $\\zeta=\\pi/3$ \\\\\n\\end{tabular}\n\\end{minipage}\n\\caption{The orbit space $F_{123}$ in blue and its subspace $F^+_{123}$ in red for different values of the moduli parameter: $\\zeta = 0, \\pi/6, \\pi/3$\\,.}\n\\label{fig:QutritLowerBound}\n\\end{figure} \n\n\n\\paragraph{$\\mathcal{Q}_3\\--$indicator for Hilbert-Schmidt ensemble of qutrits from regular stratum.} \nThe regular stratum $\\mathfrak{P}_{[T^3]}$ consists of density matrices with a simple spectrum. The expression $\\mathcal{Q}_{[T^3]}$ comprises the integrals over the face $F_{123}$ and its subset $F^{+}_{123}$\\,:\n\\begin{equation}\n\\label{eq:Q3T31}\n\\mathcal{Q}^{\\mathrm{ HS}}_{[T^3]}=\\frac{\\mathrm{vol_{HS}}(F^+_{123})}{\\mathrm{vol_{HS}}(F_{123})}\\,. \n\\end{equation}\nIn (\\ref{eq:Q3T31}) the expression $\\mathrm{vol_{HS}}(X)$ denotes the Riemannian integral over a region $X$ taken with the measure induced on $X \\subset \\mathcal{C}_2$ from the Hilbert-Schmidt measure on $\\mathfrak{P}_3$\\,,\n\\begin{eqnarray}\n\\label{eq:volx}\n\\mathrm{vol}_{\\mathrm{HS}}(X)=\n\\int_{X}\\,\nP_{1,1,1}^{\\mathrm{HS}}(r_1,r_2,r_3)\\,\\mathrm{d}r_1\\wedge\\mathrm{d}r_2\\wedge\\mathrm{d}r_3\\,. 
\n\\end{eqnarray}\nTaking into account the expression (\\ref{eq:HSGen}) for the Hilbert-Schmidt measure and the polar form of the parameterization of the qutrit orbit space (\\ref{eq:OrbitQutrit}) and of (\\ref{eq:coneclass1}), we obtain the indicator of classicality as a function of the moduli parameter $\\zeta$\\,: \n\\begin{equation}\n\\label{eq:QR}\n\\mathcal{Q}^{\\mathrm{HS}}_{[T^3]}(\\zeta) = \\frac{20 \\cos^2{\\left(\\zeta -{\\pi }/{6}\\right)}+1}{128 \\left(4\\cos^2{\\left(\\zeta -{\\pi }/{6}\\right)} -1\\right)^5}\\,.\n\\end{equation}\n\n\n\\paragraph{$\\mathcal{Q}_3\\--$indicator for Hilbert-Schmidt ensemble of qutrits from degenerate stratum.} \nThe stratum \n$\\mathfrak{P}_{[S(U(2)\\times U(1))]}$ has two pieces, $F_{1|23}$ and $F_{12|3}$, associated with density matrices with degenerate eigenvalues $r_1=r_2\\neq r_3$ and $r_1\\neq r_2= r_3$\\,, respectively.\nHence, the $\\mathcal{Q}_3\\--$indicator for the degenerate stratum of a qutrit reads:\n\\begin{equation}\n\\label{eq:Q3U2U1}\n\\mathcal{Q}^{\\mathrm{ HS}}_{[S(U(2)\\times U(1))]}=\\frac{\\mathrm{vol_{HS}}(F^+_{1|23})+\\mathrm{vol_{HS}}(F^+_{12|3})}{\\mathrm{vol_{HS}}(F_{1|23})+\\mathrm{vol_{HS}}(F_{12|3})}\\,, \n\\end{equation}\nwhere we keep the notation previously used for the regular stratum (\\ref{eq:volx}), noticing only that the dimension of integration over the degenerate orbit state strata has decreased by one: \n\\begin{eqnarray}\n\\mathrm{vol}_{\\mathrm{HS}}(F_{1|23})=\n\\int_{F_{1|23}}\\,\nP_{2,1}^{\\mathrm{HS}}(r_1,r_2)\\,\\mathrm{d}r_1\\wedge\\mathrm{d}r_2\\,. 
\n\\end{eqnarray}\nThe evaluation of all integrals in (\\ref{eq:Q3U2U1}) gives:\n\\begin{equation}\n\\label{eq:QD}\n \\mathcal{Q}^{\\mathrm{ HS}}_{[S(U(2)\\times U(1))]}(\\zeta) = \\frac{1}{1056}\\left({\\csc ^5\\left(\\zeta +\\frac{\\pi }{6}\\right)+\\sec ^5(\\zeta )}\\right)\\,.\n\\end{equation}\nThe functional dependence of the indicator $\\mathcal{Q}_3^{\\mathrm{HS}}$ for the regular (\\ref{eq:QR}) and degenerate (\\ref{eq:QD}) strata is depicted in Fig. \\ref{fig:Qutrit-Q-Ln}\\,. Apart from this, in Fig. \\ref{fig:RatioQregdegenHS} we present the ratio\n\\begin{equation}\n \\mathrm{R^{HS}}(\\zeta)=\\frac{\\mathcal{Q}^{\\mathrm{ HS}}_{[S(U(2)\\times U(1))]}(\\zeta)}{\\mathcal{Q}^{\\mathrm{ HS}}_{[T^3]}(\\zeta)}\n\\end{equation}\nas a certain measure of the relation between the symmetry of a state and its classicality. \n\n\\begin{figure}[h!]\n \\centering\n \\begin{subfigure}[b]{0.4\\textwidth}\n \\centering\n \\includegraphics[width=\\textwidth]{Qutrit-Q-Ln}\n \\caption{ }\n \\label{fig:Qutrit-Q-Ln}\n \\end{subfigure}\n \\hfill\n \\begin{subfigure}[b]{0.4\\textwidth}\n \\centering\n \\includegraphics[width=\\textwidth]{RatioQregdegenHS}\n \\caption{ }\n \\label{fig:RatioQregdegenHS}\n \\end{subfigure}\n\\caption{\n(a) $\\mathcal{Q}_3\\--$indicators of a Hilbert-Schmidt qutrit as functions of $\\zeta $ for the regular (gray curve) and degenerate (blue curve) strata. The absolute minimum of both indicators is attained at $\\zeta=\\pi/6$\\,. \n(b) The ratio of degenerate to regular $\\mathcal{Q}_3\\--$indicators.}\n \\label{fig:RHS}\n\\end{figure} \n\n\n\\paragraph{$\\mathcal{Q}_3\\--$indicator for Bures ensemble of qutrits from regular stratum.}\nUsing the generic expressions for the joint probability distributions of eigenvalues for monotone metrics (\\ref{eq:JPDMono}) and the technique developed above, we compute the $\\mathcal{Q}_3\\--$indicators for the Bures and Bogoliubov-Kubo-Mori ensembles of qutrits. The results of our calculations are presented in Fig. 
\\ref{fig:Qutrit-Q-B-BKM-Ln}.\n\n\\begin{figure}[h!]\n \\centering\n \\begin{subfigure}[b]{0.4\\textwidth}\n \\centering\n\\includegraphics[width=\\textwidth]{Qutrit-Q-B-BKM-Ln}\n \\caption{ }\n\\label{fig:Qutrit-Q-B-BKM-Ln}\n \\end{subfigure}\n \\hfill\n\\begin{subfigure}[b]{0.4\\textwidth}\n\\centering\n \\includegraphics[width=\\textwidth]{RatioQregdegenB-BKM}\n \\caption{ }\n \\label{fig:RatioQregdegenB-BKM}\n \\end{subfigure}\n\\caption{(a) The plot of $\\mathcal{Q}_3$ for the Bures (solid curves) and BKM (dashed curves) ensembles of qutrits from the regular (gray curves) and degenerate (blue curves) strata. \n(b) The ratio $R$ of degenerate to regular $\\mathcal{Q}_3\\--$indicators for the Bures (solid blue) and the BKM (dashed blue) ensembles.}\n\\label{fig:2}\n\\end{figure}\n\n\\section{Summary}\n\nBearing in mind the results of the calculations of $\\mathcal{Q}_3$\\,, we will summarize with a few comments. The indicator of classicality $\\mathcal{Q}_3$\\,, being a functional of the ensemble probability distribution function, at the same time depends on two characteristics of the SW kernel: its isotropy group $H_\\alpha$ and the moduli parameter $\\zeta$\\,. Our studies of the $\\mathcal{Q}_3\\--$indicator reveal several interesting peculiarities concerning their interrelations: \n\\begin{itemize}\n\\item There is a certain coherence between the classification of states according to their classicality and their symmetry properties. In particular, it turns out that the states with a ``larger'' symmetry are more classical, cf. Fig. \\ref{fig:RHS} and Fig. \\ref{fig:2}. This observation demands further study and we plan to formalize it in forthcoming publications;\n\\item The character of the dependence of $\\mathcal{Q}_3$ on the type of the ensemble is monotone, i.e., the values of $\\mathcal{Q}_3$ for all strata are ordered in correspondence with the order of the ensembles, see \nFig. 
\\ref{fig:PairwiseRat}; \n\n\\item The $\\mathcal{Q}_3(\\zeta)\\--$indicator of the Hilbert-Schmidt ensemble is a symmetric function with respect to the global minimum point, $\\zeta=\\pi/6$\\,, see Fig. \\ref{fig:Qutrit-Q-Ln}\\,;\n\\item For monotone metrics the symmetry possessed by the Hilbert-Schmidt ensemble is broken. Data specifying the range of violation is given in Table\\,\\ref{table:1}.\n\\end{itemize} \n\n\\begin{figure}[h!]\n \\centering\n \\begin{subfigure}[b]{0.44\\textwidth}\n \\centering\n \\includegraphics[width=\\textwidth]{QtrRegRatioslnQlnQ}\n \\caption{ }\n\\label{fig:QtrRegRatioslnQlnQ}\n \\end{subfigure}\n \\hfill\n \\begin{subfigure}[b]{0.44\\textwidth}\n \\centering\n \\includegraphics[width=\\textwidth]{QtrDegenRatioslnQlnQ}\n \\caption{ }\n \\label{fig:QtrDegenRatioslnQlnQ}\n \\end{subfigure}\n\\caption{Pairwise ratios of $\\mathcal{Q}_3\\--$indicators of different ensembles for the regular (a) and for the degenerate (b) stratum.}\n\\label{fig:PairwiseRat}\n\\end{figure}\n\n\\begin{table}[h!]\n\\begin{center}\n\\begin{tabular}{|p{3cm}|p{3cm}|p{3cm}|p{3cm}|}\n\\hline\\hline\n\\multicolumn{4}{|c|}{\\sc Global $\\mathcal{Q}_3\\--$indicator vs. moduli parameter} \\\\ [0.2ex]\n\\hline\n\\hline\n{\\bf Ensemble }& $\\min{\\mathcal{Q}_3(\\zeta)}$ &$\\zeta_{\\min}$ & $\\mathcal{Q}_3(0)-\n\\mathcal{Q}_3\n({\\pi}/{3})$ \\\\\n\\hline\nHilbert-Schmidt & 0.0006751 & $\\pi/6 \\approx 0.523599$ & 0 \\\\\n\\hline\nBKM & 0.0000121609 & 0.527798 & 0.0000216102\\\\\n\\hline\nBures &\n0.0000891011 & 0.525096 & 0.0000472609 \\\\\n\\hline \n\\end{tabular}\n\\caption{Data on symmetry properties of $\\mathcal{Q}_3\\--$indicators.}\n\\label{table:1}\n\\end{center}\n\\end{table}\n\n\n\\section*{Acknowledgments}\nThe work of A.K. was supported in part by the Shota Rustaveli National Science Foundation of Georgia, Grant FR-19-034. 
\n\n", "meta": {"timestamp": "2022-08-31T02:04:11", "yymm": "2208", "arxiv_id": "2208.13908", "language": "en", "url": "https://arxiv.org/abs/2208.13908"}} {"text": "\\section{Introduction}\n\\label{intro}\n\nSobolev spaces $M^{1,p}$ on metric-measure spaces were introduced in \\cite{Haj2}, and soon after, many other definitions followed.\nIndependently, Cheeger \\cite{Ch} and\nShanmugalingam \\cite{Sha} introduced notions of Sobolev spaces on metric-measure spaces based on the upper gradient of Heinonen and Koskela \\cite{HK}. \nTheir spaces are denoted by $H_{1,p}$ and $N^{1,p}$, respectively.\nWhile their definitions are different, it was observed by Shanmugalingam \\cite[Theorem~4.10]{Sha} that the spaces $H_{1,p}$ and $N^{1,p}$ are isometrically isomorphic when $p>1$.\n\nThroughout the paper we assume that $(X,d,\\mu)$ is a metric-measure space with a Borel regular doubling measure. In this setting,\nwe define $N^{1,p}(X)$, $p\\in [1,\\infty)$, as the space of functions $u\\in L^p(X)$ that have an upper gradient in $L^p(X)$. $N^{1,p}(X)$ is a Banach space with respect to the norm\n$$\n\\|u\\|_{N^{1,p}(X)}:=\\Big(\\| u \\|^p_{L^p(X)} + \\inf_g \\|g\\|^p_{L^p(X)}\\Big)^{1/p}.\n$$\nHere, the infimum is taken over all upper gradients $g$ of $u$. \nSee Section~\\ref{prelims} for additional details regarding our setting and the space $N^{1,p}(X)$.\n\nIf there are no rectifiable curves in $X$, then $g=0$ is an upper gradient of any function, and hence, $N^{1,p}(X)=L^p(X)$ isometrically. Therefore, in order to have a rich theory, we need a large family of rectifiable curves in $X$, which is guaranteed when the space supports a $p$-Poincar\\'e inequality. 
Recall that the space $(X,d,\\mu)$ supports a $p$-Poincar\\'e inequality, $p\\in [1,\\infty)$, if the measure $\\mu$ is doubling and there are constants \n$c_{{\\mathrm{PI}}} > 0$ and $\\lambda \\geq 1$ such that\n$$\n\\mvint_B |u-u_B|\\,d\\mu \\le c_{{\\mathrm{PI}}}\\operatorname{diam} (B) \\Biggl(\\,\\,\\mvint_{\\lambda B} g^p\\,d\\mu\\Biggr)^{1/p}\n$$\nfor all balls $B\\subseteq X$, for all Borel functions $u\\in L^1_{\\mathrm{loc}}(X)$, and all upper gradients $g$ of $u$. Here, and in what follows, the barred integral stands for the integral average and $u_B:=\\mvint_B u\\, d\\mu$ is the integral average of $u$ over the ball $B$. Also, $\\operatorname{diam} (B)$ denotes the diameter of $B$, and $\\lambda B$ stands for a ball concentric with $B$ and radius $\\lambda$ times that of $B$.\n\nCheeger \\cite{Ch} proved that if the space $(X,d,\\mu)$ supports a $p$-Poincar\\'e inequality for some $p\\in (1,\\infty)$, then the space $N^{1,p}(X)$ is reflexive. In fact, he proved in this setting that the space $N^{1,p}(X)$ can be equipped with an equivalent uniformly convex norm, from which reflexivity follows. \nHis proof of reflexivity is, however, very difficult and based on the celebrated construction of a measurable differentiable structure. \nLater Keith \\cite{Keith} proved the existence of a measurable differentiable structure and hence, reflexivity of $N^{1,p}(X)$, $p\\in (1,\\infty)$, under the so-called Lip-lip condition.\nAs demonstrated by Heinonen \\cite[Section~12.5]{Hei07}, for general metric-measure spaces, $N^{1,p}(X)$, $p\\in (1,\\infty)$, need not be reflexive.\n\nA different approach to reflexivity was provided by Ambrosio, Colombo, and Di Marino \\cite{Ambrosio}. They proved reflexivity of $N^{1,p}(X)$, $p\\in(1,\\infty)$, under the assumptions that the metric space $X$ is metric-doubling, complete, and the measure $\\mu$ is finite on balls. They did not, however, assume that the space supports a $p$-Poincar\\'e inequality. 
\nIn fact, they proved reflexivity of a Sobolev type space $W^{1,p}(X)$ whose definition is based on a notion of $p$-relaxed slope, and they proved that the space is equivalent to $N^{1,p}(X)$ under the given assumptions.\nTheir proof is actually quite difficult since it involves methods of mass-transportation, gradient flows, $\\Gamma$-convergence, and Christ dyadic cubes, just to name a few. \nA simplification of this proof of reflexivity in the case when the space supports a $p$-Poincar\\'e inequality was obtained by Durand-Cartagena and Shanmugalingam \\cite{DurSha}; their proof follows arguments from \\cite{Ambrosio} and, in particular, they use $\\Gamma$-convergence and Christ dyadic cubes to construct an equivalent norm on $N^{1,p}(X)$ that is uniformly convex.\n\nThe purpose of this paper is to provide a further simplification of the proof of reflexivity of $N^{1,p}(X)$ when $p\\in (1,\\infty)$ and the space supports a $p$-Poincar\\'e inequality. In fact, we provide an explicit construction of an equivalent norm on $N^{1,p}(X)$, $p\\in [1,\\infty)$, which is uniformly convex when $p\\in (1,\\infty)$. Our construction of this norm is direct and it does not require $\\Gamma$-convergence nor Christ cubes.\n\nA brief outline of our construction is below. All details can be found in Section~\\ref{uc}.\n\nFor each $k\\in\\mathbb Z$, we select a covering of $X$ by balls $\\{B_i^k\\}_i$, of radii $2^{-k}$, such that the balls in the family $\\big\\{\\frac{1}{5}B_i^k\\big\\}_i$ are pairwise disjoint. We say that balls $B_i^k$ and $B_j^k$ are neighbors if $\\operatorname{dist}(B_i^k, B_j^k)<2^{-k}$, and we denote neighbors by $B_i^k\\sim B_j^k$.\nIt follows from the doubling condition that the number of neighbors \nof a given ball $B_i^k$\nis bounded by some constant $N\\in\\mathbb N$ that is independent of $k$.\n\nFor each $x\\in X$, there is a smallest index $i$ such that $x\\in B_i^k$, and we write $B^k[x]:=B^k_i$. 
Then for $p\\in [1,\\infty)$ and $u\\in L^1_{\\rm loc}(X)$ we define\n$$\n|T_k u(x)|_p:=\n2^k\\Bigg(\\sum_{j:B_j^k\\sim B^k[x]} |u_{B^k[x]}-u_{B_j^k}|^p\\Bigg)^{1/p},\n$$\nwhere the sum is taken over all neighbors of $B^k[x]=B_i^k$, and we set $|T_k u(x)|:=|T_k u(x)|_1$.\nFinally, we equip $N^{1,p}(X)$, $p\\in [1,\\infty)$ with a new norm,\n$$\n\\Vert u\\Vert_{1,p}^* :=\\Big(\\Vert u\\Vert_{L^p(X)}^p+\\limsup_{k\\to\\infty} \\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)}^p\\Big)^{1/p}.\n$$\nThe main result of the paper reads as follows.\n\\begin{theorem}\n\\label{main}\nSuppose that the space $(X,d,\\mu)$ supports a $p$-Poincar\\'e inequality for some $p\\in [1,\\infty)$. Then $\\Vert\\cdot\\Vert_{1,p}^*$ is an equivalent norm on $N^{1,p}(X)$. \nMoreover, if $p\\in(1,\\infty)$, then the space $N^{1,p}(X)$ with the equivalent norm $\\Vert\\cdot\\Vert_{1,p}^*$ is uniformly convex and hence, the space $N^{1,p}(X)$ is reflexive.\n\\end{theorem}\n\\noindent The notion of uniform convexity is recalled in Section~\\ref{Clar}.\n\nThe construction of the norm $\\Vert\\cdot\\Vert_{1,p}^*$ is different from, but related to, the constructions given in \\cite{Ambrosio,DurSha}. Recall that their constructions were less direct, as they required $\\Gamma$-convergence and Christ cubes. The equivalence of the norms when $p=1$ is, however, new.\nAs a corollary, we also prove\n\\begin{theorem}\n\\label{apes}\nSuppose that the space $(X,d,\\mu)$ supports a $p$-Poincar\\'e inequality for some $p\\in [1,\\infty)$. 
Then the space $N^{1,p}(X)$ is separable.\n\\end{theorem}\nIt is well known that separability can be deduced from reflexivity when $p\\in(1,\\infty)$, see \\cite{Ch}, but separability in the case $p=1$ seems to be new.\n\nIt follows from the proof of Theorem~\\ref{main} (more specifically, Proposition~\\ref{gradcomparable}) that if $p\\in [1,\\infty)$, then there is $C\\geq 1$ such that\n$$\nC^{-1}\\Vert g_u\\Vert_{L^p(X)}\\leq \\limsup_{k\\to\\infty} \\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)}\\leq C\\Vert g_u\\Vert_{L^p(X)},\n$$\nwhere $g_u$ is the minimal $p$-weak upper gradient of $u$. The next result gives not only a comparison of norms but also a pointwise comparison, under the additional assumptions that $X$ is complete and $p>1$.\n\n\n\\begin{theorem}\n\\label{main2}\nSuppose that the space $(X,d,\\mu)$ is complete and supports a $p$-Poincar\\'e inequality for some $p\\in(1,\\infty)$. Then there exists a constant $C\\geq 1$ such that for every $u\\in N^{1,p}(X)$,\n$$\nC^{-1} g_u(x)\\leq \\limsup_{k\\to\\infty} |T_{k} u(x)|\\leq C g_u(x)\\quad\n\\text{for $\\mu$-a.e. $x\\in X$,}\n$$\nwhere $g_u\\in L^p(X)$ denotes the minimal $p$-weak upper gradient of $u$.\n\\end{theorem}\n\\begin{remark}\nNote that we could replace $|T_ku(x)|$ in Theorem~\\ref{main2} by $|T_ku(x)|_p$, because the number of neighbors is bounded by $N$ and all norms in $\\mathbb R^N$ are equivalent. However, in Theorem~\\ref{main} we have to work with $|T_ku|_p$ in order to guarantee uniform convexity of the norm.\n\\end{remark}\n\n\nThe paper is structured as follows. In Section~\\ref{prelims} we fix notation used in the paper, recall basic definitions, and state known results that will be used in the subsequent sections. In Section~\\ref{uc} we carefully explain the statement of the main result, Theorem~\\ref{main}, and we prove it. 
In Section~\\ref{sepa} we prove Theorem~\\ref{apes} and finally, in Section~\\ref{ptwise} we prove Theorem~\\ref{main2}.\n\n\\subsection*{Acknowledgements}\nWe would like to thank Nicola Gigli from whom we learned Proposition~\\ref{reflextosep}, and Giorgio Metafune for helpful comments.\n\n\\section{Preliminaries}\n\\label{prelims}\n\n\\subsection{Notational conventions}\nLet $\\mathbb Z$ denote all integers and $\\mathbb N$ all (strictly) positive integers. By $C$ we denote a generic constant whose actual value may change from line to line.\nFor nonnegative quantities, $L,R\\geq 0$, the notation $L \\lesssim R$ will be used to express that there exists a constant $C>0$, perhaps dependent on other constants within the context, such that $ L \\le CR$. If $L \\lesssim R$ and simultaneously $R \\lesssim L$, then we will simply write $L \\approx R$ and say that the quantities $L$ and $R$ are \\emph{equivalent} (or \\emph{comparable}).\n\nThe characteristic function of a set $E$ will be denoted by $\\chi_E$.\n\nWe assume that all function spaces are linear spaces over the field of real numbers.\n\nWe use a convention that the names ``Theorem'' and ``Proposition'' are reserved for new results, while well-known results and results of technical character are called ``Lemma'' or ``Corollary''. \n\n\\subsection{Metric-measure spaces}\nA metric-measure space is a triplet $(X,d,\\mu)$ where $(X,d)$ is a metric space and $\\mu$ is a Borel measure such that $0<\\mu(B)<\\infty$ for every ball $B\\subseteq X$. We will assume that $\\mu$ is \\textit{Borel regular}, in the sense that every $\\mu$-measurable set is contained in a Borel set of equal measure. We will also assume that $\\mu$ is \\emph{doubling}, i.e., there is a constant $C_d\\geq 1$ such that $\\mu(2B) \\le C_d \\mu(B)$ for every ball $B \\subseteq X$. 
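\nAs a simple illustration of the doubling condition (a standard remark, included here only for orientation), iterating the inequality $m$ times gives\n$$\n\\mu(2^m B)\\le C_d^{\\,m}\\,\\mu(B)\\quad\\text{for every ball $B\\subseteq X$ and every $m\\in\\mathbb N$,}\n$$\nand for the Lebesgue measure on $\\mathbb R^n$ with the Euclidean metric the doubling inequality holds, with equality, for $C_d=2^n$. 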
\n\nWe will need the following version of the Lebesgue differentiation theorem.\n\\begin{lemma}\n\\label{T5}\nAssume that $\\mu$ is a Borel regular doubling measure on $X$ and $u\\in L^1_{\\rm loc}(X)$. Then for $\\mu$-a.e. $x\\in X$ the following is true. If $\\{B_i\\}_i$ is a sequence of balls such that $x\\in B_i$ for all $i$ and $\\operatorname{diam}(B_i)\\to 0$ as $i\\to\\infty$, then\n\\begin{equation}\n\\label{eq3}\n\\lim_{i\\to\\infty}\\, \\mvint_{B_i} u\\, d\\mu=u(x).\n\\end{equation}\n\\end{lemma}\nEquality \\eqref{eq3} is satisfied whenever $x$ is a Lebesgue point of $u$.\nThis result is well known if $\\mu$ is the Lebesgue measure in $\\mathbb R^n$, but the standard proofs easily generalize to the case of metric-measure spaces equipped with a Borel regular doubling measure.\n\n\n\\subsection{Integrating along curves in metric spaces and modulus of the path family}\n\nBy a \\emph{curve} in $X$, we mean a continuous mapping $\\gamma\\colon[a,b]\\to X$. Given a curve $\\gamma$, the \\textit{image of} $\\gamma$ is denoted by $|\\gamma|:=\\gamma([a,b])$ and $\\ell(\\gamma)$ stands for the \\textit{length of} $\\gamma$. We will say that $\\gamma$ is \\textit{rectifiable} if $\\ell(\\gamma)<\\infty$ and the family of all non-constant rectifiable curves in $X$ will be denoted by $\\Gamma(X)$. Every $\\gamma\\in\\Gamma(X)$ admits a unique (orientation preserving) \\textit{arc-length parameterization} $\\widetilde{\\gamma}\\colon[0,\\ell(\\gamma)]\\to X$, and the arc-length parameterization is 1-Lipschitz; see, e.g., \\cite[Theorem~3.2]{Haj}. 
Given a curve $\\gamma\\in\\Gamma(X)$ and a Borel measurable function $\\varrho\\colon|\\gamma|\\to[0,\\infty]$, we define\n$$\n\\int_\\gamma\\varrho\\,ds:=\\int_0^{\\ell(\\gamma)}\\varrho(\\widetilde{\\gamma}(t))\\,dt.\n$$\nWe can naturally define the integral over a curve for a general function by considering the positive and negative parts of the function.\n\nLet $\\Gamma\\subseteq\\Gamma(X)$ and consider the collection $F(\\Gamma)$ of all Borel functions $\\varrho\\colon X\\to[0,\\infty]$ satisfying\n$$\n\\int_\\gamma\\varrho\\,ds\\geq1\\quad\\mbox{for all $\\gamma\\in\\Gamma$.}\n$$\nThen, for each $p\\in[1,\\infty)$, the $p$-\\textit{modulus of the family} $\\Gamma$ is defined as\n$$\n{\\rm Mod}_p(\\Gamma):=\\inf_{\\varrho\\in F(\\Gamma)}\\int_X\\varrho^p\\,d\\mu.\n$$\nNote that ${\\rm Mod}_p$ is an outer-measure on $\\Gamma(X)$ and, in particular, it is countably subadditive; see, e.g., \\cite[Theorem~5.2]{Haj}. A family of curves $\\Gamma\\subseteq\\Gamma(X)$ is called \\emph{$p$-exceptional} if ${\\rm Mod}_p(\\Gamma)=0$ and a statement is said to hold for \\emph{${\\rm Mod}_p$-a.e.} curve $\\gamma\\in\\Gamma(X)$ if the family of curves in $\\Gamma(X)$ for which this statement does not hold is $p$-exceptional.\n\nFor the next result, see \\cite[Proposition~2.45]{BjoBjo}.\nIt follows from H\\\"older's inequality and \\cite[Proposition~1.37(c)]{BjoBjo}.\n\\begin{lemma}\n\\label{except}\nIf a family of curves is $p$-exceptional for some $p\\in(1,\\infty)$, then it is\n$q$-exceptional for every $q\\in[1,p]$.\n\\end{lemma}\n\n\nWe will also need the following important result; see, e.g., \\cite[Theorem~5.7]{Haj} and \\cite[Lemma~2.1]{BjoBjo}.\n\\begin{lemma}[Fuglede's lemma]\n\\label{fuglede}\nLet $p \\in [1, \\infty)$ and assume that $\\{g_k\\}_{k=1}^\\infty$ is a sequence of Borel functions that converges in $L^p(X)$ to a Borel function $g\\in L^p(X)$. 
Then, there is a subsequence $\\{g_{k_i}\\}_{i=1}^\\infty$, such that for ${\\rm Mod}_p$-a.e.\\@ curve $\\gamma \\in \\Gamma(X)$, one has\n\\[\n\\int_\\gamma g_{k_i}\\,ds \\to \\int_\\gamma g\\,ds\\quad \\text{and} \\quad \\int_\\gamma |g_{k_i} - g|\\,ds \\to 0\\quad\\mbox{as $i\\to\\infty$,}\n\\]\nwhere all of the integrals are well defined and finite.\n\\end{lemma}\n\n\n\\subsection{Sobolev spaces in metric-measure spaces}\n\\label{subsect:sobolev}\nA Borel measurable function $g\\colon X \\to [0, \\infty]$ is called an \\emph{upper gradient} of a Borel measurable function $u\\colon X \\to [-\\infty,\\infty]$ if\n %\n\\begin{equation}\n\\label{eq:ug_def}\n|u(\\gamma(a)) - u(\\gamma(b))| \\le \\int_\\gamma g\\,ds,\n\\end{equation}\nfor every rectifiable curve $\\gamma\\colon [a,b]\\to X$, with the convention that $|(\\pm\\infty)-(\\pm\\infty)|=\\infty$. The function $g$ shall be referred to as a \\emph{$p$-weak upper gradient} of $u$, $p\\in[1,\\infty)$, if \\eqref{eq:ug_def} holds true for ${\\rm Mod}_p$-a.e.\\@ curve $\\gamma\\in\\Gamma(X)$.\n\n\nThe next result shows that $p$-weak upper gradients can be approximated by upper gradients in the $L^p$ norm; see, e.g., \\cite[Lemma~6.3]{Haj}.\n\\begin{lemma}\n\\label{T6}\nIf $g$ is a $p$-weak upper gradient of $u$ which is finite $\\mu$-a.e., then for every $\\varepsilon\\in (0,\\infty)$ there is an upper gradient $g_\\varepsilon$ of $u$ such that\n$$\ng_\\varepsilon\\geq g\\, \\text{ pointwise everywhere in $X$}\n\\quad\n\\text{and}\n\\quad\n\\Vert g_\\varepsilon-g\\Vert_{L^p(X)}<\\varepsilon.\n$$\n\\end{lemma}\n\n\n\nFor $p\\in[1,\\infty)$ we define $\\widetilde{N}^{1,p}(X)$ to be the space of all Borel measurable functions $u\\colon X \\to [-\\infty,\\infty]$ for which\n\\begin{equation}\n \\label{eq:def-N1p-norm}\n\\|u\\|_{N^{1,p}(X)}:=\\Big(\\| u \\|^p_{L^p(X)} + \\inf_g \\|g\\|^p_{L^p(X)}\\Big)^{1/p} < \\infty,\n\\end{equation}\nwhere the infimum is taken over all upper gradients $g$ of $u$. 
\nEquivalently,\nwe can take the infimum over all $p$-weak upper gradients in \\eqref{eq:def-N1p-norm} since every $p$-weak upper gradient can be approximated in $L^p$ by upper gradients (Lemma~\\ref{T6}).\n\n\n\nThe functional $\\|\\cdot\\|_{N^{1,p}(X)}$ is a seminorm on $\\widetilde{N}^{1,p}$ and a norm on ${N}^{1,p}(X) := \\widetilde{N}^{1,p}(X)/\\mathord\\sim$, where the equivalence relation $u\\sim v$ is given by $\\|u-v\\|_{N^{1,p}(X)} = 0$. Furthermore, the space ${N}^{1,p}(X)$ is complete and thus a Banach space, see \\cite[Theorem~3.7]{Sha}.\n\nFor the next result see, e.g., \\cite[Corollary~7.7]{Haj}.\n\\begin{lemma}\n\\label{T3}\nIf $u,v\\in\\widetilde{N}^{1,p}(X)$ and $u=v$ pointwise $\\mu$-a.e. in $X$, then $u\\sim v$ i.e., the two functions define the same element in $N^{1,p}(X)$.\n\\end{lemma}\n\nFor $p\\in[1,\\infty)$, every $u\\in N^{1,p}(X)$ has a \\textit{minimal $p$-weak upper gradient} $g_u\\in L^p(X)$ in the sense that if $g\\in L^p(X)$ is another $p$-weak upper gradient of $u$, then $g\\geq g_u$ pointwise $\\mu$-a.e. in $X$, see, e.g., \\cite[Theorem~7.16]{Haj}. Hence, the infimum in \\eqref{eq:def-N1p-norm} is attained with $g_u$, which is given uniquely up to pointwise a.e. equality.\n\n\n\n\n\nRecall that the \\emph{pointwise lower Lipschitz-constant} of a function $\\eta\\colon X\\to\\mathbb{R}$ is given by\n\\begin{equation}\n\\label{lillip}\n{\\rm lip}\\,\\eta(x):=\\liminf_{r\\to 0^+}\\sup_{y\\in B(x,r)}\\frac{|\\eta(x)-\\eta(y)|}{r},\\quad x\\in X. \n\\end{equation}\nFor the next lemma, see, e.g., \\cite[Lemma~6.7]{Haj} or \\cite[Lemma~6.2.6]{HKST}.\n\\begin{lemma}\n\\label{T1}\n${\\rm lip}\\,\\eta$ is an upper gradient of any Lipschitz continuous function $\\eta$ on a metric space.\n\\end{lemma}\n\n\\begin{lemma}\n\\label{leibniz}\nFix $p\\in[1,\\infty)$ and suppose that $u\\in N^{1,p}(X)$ and $\\eta\\colon X\\to\\mathbb{R}$ is a bounded Lipschitz function. 
Then $\\eta u\\in N^{1,p}(X)$ and the function $h:=|\\eta|g_u+|u|\\,{\\rm lip}\\,\\eta$ is a $p$-weak upper gradient for $\\eta u$, where $g_u\\in L^p(X)$ is the minimal $p$-weak upper gradient of $u$.\n\\end{lemma} \n\n\\begin{proof}\nThe Leibniz rule for $p$-weak upper gradients, \\cite[Lemma~6.3.28]{HKST}, the fact that functions in $N^{1,p}(X)$ are absolutely continuous on $\\operatorname{Mod}_p$-a.e. curve, \\cite[Lemma~7.6]{Haj}, and Lemma~\\ref{T1}\nimply that the function $h:=|\\eta| g_u+|u|\\operatorname{lip} \\eta$\nis a $p$-weak upper gradient for $\\eta u$. Since $\\eta u\\in L^p(X)$ and $h\\in L^p(X)$, it follows that $\\eta u\\in N^{1,p}(X)$.\n\\end{proof}\n\n\nWe say that $(X,d,\\mu)$ supports a \\emph{$p$-Poincar\\'e inequality}, $p\\in[1,\\infty)$, if there exist constants $c_{{\\mathrm{PI}}} > 0$ and $\\lambda \\geq 1$ such that\n\\begin{equation}\n\\label{poincare}\n\\mvint_B |u-u_B|\\,d\\mu \\le c_{{\\mathrm{PI}}}\\operatorname{diam} (B) \\Biggl(\\,\\,\\mvint_{\\lambda B} g^p\\,d\\mu\\Biggr)^{1/p}\n\\end{equation}\nfor all balls $B\\subseteq X$, all Borel functions $u\\in L^1_{\\mathrm{loc}}(X)$, and all upper gradients $g$ of $u$. Recall that we always assume that $\\mu$ is a Borel regular doubling measure in this setting.\nIn this situation we say that the space supports a $p$-Poincar\\'e inequality with constants $c_{{\\mathrm{PI}}}$ and $\\lambda$.\n\n\n\nThe following lemma is an immediate consequence of \nLemma~\\ref{T6}.\n\\begin{lemma}\n\\label{pwkpoin}\nSuppose that $X$ supports a $p$-Poincar\\'e inequality for some $p\\in[1,\\infty)$. 
\nIf $u\\in L^1_{\\rm loc}(X)$ is Borel, then\n\\begin{equation}\n\\label{poin-p-weak}\n\\mvint_B |u-u_B|\\,d\\mu \\le c_{{\\mathrm{PI}}}\\operatorname{diam} (B) \\Biggl(\\,\\,\\mvint_{\\lambda B} g^p\\,d\\mu\\Biggr)^{1/p},\n\\end{equation}\nfor all balls $B\\subseteq X$ and all $p$-weak upper gradients $g$ of $u$ that are finite $\\mu$-a.e.\n\\end{lemma}\n\n\n\\subsection{Uniformly convex spaces} \n\\label{Clar}\nWe begin with a definition due to Clarkson \\cite{Clark}.\n\nWe say that a normed space $(Z,\\|\\cdot\\|)$ is \\emph{uniformly convex} if for every $\\varepsilon\\in(0,\\infty)$, there exists $\\delta\\in(0,\\infty)$ with the property that $\\|x + y\\| \\le 2(1-\\delta)$ whenever $x,y\\in Z$ satisfy $\\|x\\|=\\|y\\| =1$, and $\\|x - y\\| > \\varepsilon$. \n\nFrom a geometric point of view, uniform convexity implies that the boundary of the unit ball does not contain any segments and that the unit ball is, in a sense, uniformly ``round''. \n\nThe next result is well known, but it is not easy to find a proof in the literature.\n\\begin{lemma}\n\\label{unifconvex-clsd}\nA normed space $(Z,\\|\\cdot\\|)$ is uniformly convex if and only if for every $\\varepsilon\\in(0,\\infty)$, there exists $\\delta\\in(0,\\infty)$ with the property that $\\|x + y\\| \\le 2(1-\\delta)$ whenever $x,y\\in Z$ satisfy $\\|x\\|\\leq1$, $\\|y\\|\\leq 1$, and $\\|x - y\\| > \\varepsilon$. \n\\end{lemma}\n\n\\begin{proof}\nOne direction is clear. To see the other, fix $\\varepsilon\\in(0,\\infty)$ and suppose that $x,y\\in Z$ satisfy $\\|x\\|,\\|y\\|\\leq1$ and $\\|x-y\\|>\\varepsilon$. Since $Z$ is uniformly convex, there is $\\tilde{\\delta}\\in(0,\\infty)$ associated to the choice of $\\varepsilon/3$. Let $\\delta:=\\min\\{\\varepsilon/6,\\tilde{\\delta}/3\\}$. If either $\\|x\\|\\leq1-2\\delta$ or $\\|y\\|\\leq1-2\\delta$, then $\\|x+y\\|\\leq2(1-\\delta)$. 
If $\\|x\\|,\\|y\\|>1-2\\delta$, then $\\tilde{x}:={x}/{\\|x\\|}$ and $\\tilde{y}:={y}/{\\|y\\|}$ satisfy $\\Vert x-\\tilde{x}\\Vert,\\Vert y-\\tilde{y}\\Vert<2\\delta$, and hence, $\\|\\tilde{x}-\\tilde{y}\\|>\\varepsilon-4\\delta\\geq \\varepsilon/3$. \nSince $\\Vert\\tilde{x}\\Vert=\\Vert\\tilde{y}\\Vert=1$, uniform convexity yields\n$\\|\\tilde{x}+\\tilde{y}\\|\\leq2(1-\\tilde{\\delta})$ and hence, $\\|x+y\\|\\leq\\|x-\\tilde{x}\\|+\\|\\tilde{x}+\\tilde{y}\\|+\\|\ny-\\tilde{y}\\|\\leq2(1-\\delta)$. \n\\end{proof}\n\nA clever proof of the next result that avoids the use of Clarkson's inequalities can be found in\n\\cite[Proposition~2.4.19]{HKST}.\n\\begin{lemma}[Clarkson]\n\\label{Lp}\n$L^p(X)$ is uniformly convex for $p\\in(1,\\infty)$.\n\\end{lemma}\nFor a proof of the following theorem, see, e.g., \\cite[Theorem~2.4.9]{HKST}.\n\\begin{lemma}[Milman--Pettis' theorem]\n\\label{milmanpettis}\nEvery uniformly convex Banach space is reflexive.\n\\end{lemma}\n\nBy $\\ell^p_M$ we will denote $\\mathbb R^M$ with the norm $|x|_p:=\\big(\\sum_{j=1}^M|x_j|^p\\big)^{1/p}$, where $x=(x_1,\\ldots,x_M)$, and so $L^p(X,\\ell^p_M)$ is a Banach space equipped with the norm\n\\begin{equation}\n\\label{reoi}\n\\Phi(f):=\\Bigg(\\sum_{j=1}^M\\|f_j\\|_{L^p(X)}^p\\Bigg)^{1/p}, \n\\quad\n\\text{where}\n\\quad\nf=(f_1,\\ldots,f_M).\n\\end{equation}\n\\begin{corollary}\n\\label{clarkson}\nIf $p\\in(1,\\infty)$ and $M\\in\\mathbb N$, then $L^p(X,\\ell^p_M)$ is uniformly convex.\n\\end{corollary}\n\\begin{proof}\nSince $L^p(X,\\ell^p_M)$ is isometric to $L^p(X_M)$, where \n$$\nX_M=X\\sqcup X\\sqcup\\ldots\\sqcup X=X\\times\\{1,2,\\ldots,M\\}\n$$ \nis the disjoint union of $M$ copies of the measure space $(X,\\mu)$, the result follows from Lemma~\\ref{Lp}.\n\\end{proof}\n\n\n\\subsection{Dunford--Pettis theorem}\n\nRecall that a family of $\\mu$-measurable functions $\\mathcal{F}$ is said to be \\emph{equi-integrable} if for every $\\varepsilon\\in(0,\\infty)$ there exists 
$\\delta\\in(0,\\infty)$ such that for every $\\mu$-measurable set $S \\subseteq X$ with $\\mu(S) < \\delta$ we have\n\\begin{equation}\n\\label{it:equiint2}\n\\sup_{f\\in\\mathcal{F}}\\int_S |f|\\,d\\mu < \\varepsilon.\n\\end{equation}\n\n\nA proof of the following version of the Dunford--Pettis theorem can be found in, e.g., \\cite[Theorem~2.54]{FL07}.\n\n\\begin{lemma}[Dunford--Pettis' theorem]\n\\label{thm:DunfordPettis}\nLet $\\mathcal{F}\\subseteq L^1(X)$. \nThen every sequence in $\\mathcal{F}$ has a subsequence that is weakly convergent in $L^1(X)$ if and only if\nthe following two conditions are satisfied:\n\\begin{enumerate}\n\\item[{\\rm(a)}] $\\mathcal{F}$ is bounded in $L^1(X)$ and equi-integrable;\n\\vskip.08in\n\\item[{\\rm (b)}] for every $\\varepsilon\\in(0,\\infty)$ one can find a $\\mu$-measurable set $E\\subseteq X$ such that $\\mu(E)<\\infty$ and\n\\begin{equation}\n\\label{it:equiint1}\n\\sup_{f\\in\\mathcal{F}}\\,\\int_{X\\setminus E} |f|\\,d\\mu < \\varepsilon.\n\\end{equation}\n\\end{enumerate}\n\\end{lemma}\n\\begin{remark}\nObserve that whenever $\\mu(X) < \\infty$, condition (b) is trivially satisfied by setting $E = X$, which is why it is omitted in the literature that discusses the Dunford--Pettis theorem over spaces of finite measure.\n\\end{remark}\n\n\\section{The main result}\n\\label{uc}\n\nIn this section we will carefully record all of the notation and technical lemmata used in the proof of the main result, Theorem~\\ref{main}, and then we will prove it. The reader is reminded that we are always assuming $(X,d,\\mu)$ is a metric-measure space, where $\\mu$ is a Borel regular doubling measure. 
However, unless explicitly stated, we do not assume that the space supports a $p$-Poincar\\'e inequality.\n\n\\subsection{Notation} \nFor each $k\\in\\mathbb Z$, let $\\{B_i\\}_{i=1}^{M(k)}$, $M(k)\\in\\mathbb{N}\\cup\\{\\infty\\}$, be a covering of $X$ by balls of radius $2^{-k}$ such that the balls in the family $\\{\\frac{1}{5}B_i\\}_{i=1}^{M(k)}$ are pairwise disjoint. The existence of such coverings follows from the familiar $5r$-covering lemma.\n\nThe doubling property of the measure $\\mu$ implies that for each fixed $\\theta\\in[1,\\infty)$, the family $\\{\\theta B_i\\}_{i=1}^{M(k)}$ of enlarged balls has bounded overlapping, in the sense that there exists a constant $C_0\\in[1,\\infty)$ such that $\\sum_i\\chi_{\\theta B_i}(x)\\leq C_0$ for every $x\\in X$. \nNote that $C_0$ depends only on $\\theta$ and $C_d$ (the doubling constant of $\\mu$). In particular, $C_0$ is independent of $k$. \n\nFor each $k\\in\\mathbb Z$, we have a different family of balls (referred to as \\textit{balls of generation $k$}) and we will write $B^k_i:=B_i$ if we wish to stress for which $k\\in\\mathbb Z$ the family was constructed.\n\nWe say that balls $B^k_i$ and $B^k_j$ are \\textit{neighbors} if $\\operatorname{dist}(B^k_i,B^k_j)<2^{-k}$, and we will write $B^k_i\\sim B^k_j$ in this case. Note that there exists $N\\in\\mathbb N$, such that each ball has at most $N$ neighbors, where $N$ depends only on the doubling constant of $\\mu$ and, in particular, is independent of $k$. \n\nIf $B_{i,1},\\dots,B_{i,{n_i}}$, $n_i< N$ are \\textit{all} of the neighbors of $B_i$ then we set\n$$\nB_{i,{n_i+1}},\\dots,B_{i,{N}}:=B_i.\n$$\nThat is, we set the last $N-n_i$ balls in the sequence $\\{B_{i,j}\\}_{j=1}^N$ to be identical copies of $B_i$. While this construction is somewhat formal, for reasons that will be clear later, we need to have the same number of balls ``around'' each of the $B_i$'s. 
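\n\nTo illustrate the construction, the following sketch works out one concrete case. The specific space, the choice of centers, the value $N=4$, and the convention that a ball is not listed among its own neighbors are assumptions made only for this illustration. Take $X=[0,\\infty)$ with the Euclidean distance and the Lebesgue measure, fix $k\\in\\mathbb Z$, write $r:=2^{-k}$, and let the balls of generation $k$ be $B_{m+1}:=B(mr,r)$ for $m=0,1,2,\\ldots$. These balls cover $X$, and the balls $\\frac{1}{5}B_{m+1}$ are pairwise disjoint because distinct centers are at distance at least $r$. For $m\\neq m'$ we have\n$$\n\\operatorname{dist}\\big(B(mr,r),B(m'r,r)\\big)=\\max\\{|m-m'|-2,0\\}\\,r,\n$$\nwhich is smaller than $r$ exactly when $|m-m'|\\in\\{1,2\\}$. Hence $B_1$ has $n_1=2$ neighbors (namely $B_2$ and $B_3$), $B_2$ has $n_2=3$ neighbors (namely $B_1$, $B_3$, and $B_4$), and every other ball has exactly $4$ neighbors. With the admissible choice $N=4$, the padding described above amounts to setting $B_{1,3}=B_{1,4}:=B_1$ and $B_{2,4}:=B_2$, while no padding is needed for $B_i$ with $i\\geq3$.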
\n\nLet $A_1:=B_1$ and $A_i:=B_i\\setminus(B_1\\cup\\cdots\\cup B_{i-1})$ for each $i\\geq2$. Then $X=\\bigcup_{i=1}^{M(k)}A_i$, and so, in particular, $\\{A_i\\}_{i=1}^{M(k)}$ is a partition of $X$ into pairwise disjoint sets. For each $x\\in X$, there is a unique $i$ such that $x\\in A_i$. In other words, $i$ is the smallest index such that $x\\in B_i$. As such, we define $B[x]:=B_i$ and set $B[x,j]:=B_{i,j}$ for $j\\in[1,N]$. In particular, $B[x,j]=B[x]$ if $j\\in(n_i,N]$.\n\n\nFor $u\\in L^1_{\\mathrm{loc}}(X)$ and $k\\in\\mathbb Z$, we define\n\\begin{equation}\n\\label{ukdef}\nS_ku:=\n\\sum_{i=1}^{M(k)}u_{B_i}\\,\\chi_{A_i},\n\\end{equation}\nand note that $S_ku(x)=u_{B[x]}$ for each $x\\in X$. According to Lebesgue's differentiation theorem (Lemma~\\ref{T5}), $S_ku\\to u$ pointwise $\\mu$-a.e. in $X$ as $k\\to\\infty$. \n\nFor $u\\in L^1_{\\mathrm{loc}}(X)$ and $k\\in\\mathbb Z$, we also define\n\\begin{equation*}\n\\label{Tkdef}\nT_ku(x):=2^k\\big[u_{B[x]}-u_{B[x,1]},\\dots,u_{B[x]}-u_{B[x,N]}\\big]\\in\\mathbb R^N,\n\\end{equation*}\nfor $x\\in X$, or equivalently,\n$$\nT_ku:=2^k\\sum_{i=1}^{M(k)}\\big[u_{B_i}-u_{B_{i,1}},\\dots,u_{B_i}-u_{B_{i,N}}\\big]\\chi_{A_i}.\n$$\nObserve that if $x\\in A_i$ and $n_i0$ such that \nif $u\\in L^1_{\\rm loc}(X)$ is Borel measurable and $g$ is a $p$-weak upper gradient of $u$ that is finite $\\mu$-a.e., then for each $k\\in\\mathbb Z$, we have\n\\begin{equation}\n\\label{tkpwest}\n|T_ku(x)|_p\\leq \nC\\Bigg(\\,\\mvint_{5\\lambda B[x]} g^p\\,d\\mu\\Bigg)^{1/p}\n\\quad\\mbox{for all $x\\in X$.}\n\\end{equation}\nTherefore, there is a constant $C'=C'(p,C_d,c_{{\\mathrm{PI}}},\\lambda)>0$ such that\n\\begin{equation}\n\\label{tkpwest-2}\n\\big\\Vert|T_ku|_p\\big\\Vert_{L^p(X)}\\leq C'\\Vert g\\Vert_{L^p(X)}.\n\\end{equation}\nConsequently, if $u\\in N^{1,p}(X)$, then the sequence $\\{T_ku\\}_{k\\in\\mathbb Z}$ is bounded in $L^p(X,\\ell^p_N)$.\n\\end{lemma}\n\n\n\\begin{proof}\nFix $k\\in\\mathbb Z$ and $x\\in X$, along 
with $j\\in[1,N]$. Since $B[x,j]$ is either a neighbor of $B[x]$ or equal to $B[x]$ (by definition), we have that $B[x,j]\\subseteq 5 B[x]\\subseteq 10 B[x,j]$. Therefore, the doubling condition of $\\mu$ implies that $\\mu(B[x,j]) \\approx \\mu(5B[x])$.\nApplying Lemma~\\ref{pwkpoin} to the pair\n$(u,g)$ we can estimate\n\\begin{equation*}\n\\begin{split}\n|u_{B[x]} - u_{B[x,j]}| \n&\\leq\n|u_{B[x]} - u_{5B[x]}|+|u_{5B[x]} - u_{B[x,j]}|\\\\\n&\\lesssim \n\\mvint_{5B[x]} |u-u_{5B[x]}|\\,d\\mu\n\\leq C 2^{-k} \\Bigg(\\,\\,\\mvint_{5\\lambda B[x]} g^p\\,d\\mu \\Bigg)^{1/p},\n\\end{split}\n\\end{equation*}\nwhere $C\\in(0,\\infty)$ depends only on $C_d$ and $c_{{\\mathrm{PI}}}$. From the formula for $|T_ku|_p$, we have\n\\begin{equation}\n\\label{eq5}\n|T_ku(x)|_p\n\\leq 2^{k}\\Bigg(\\sum_{j=1}^NC^p2^{-kp}\\mvint_{5\\lambda B[x]} g^p\\,d\\mu\\Bigg)^{1/p}\n=C \\cdot N^{1/p}\\Bigg(\\,\\mvint_{5\\lambda B[x]} g^p\\,d\\mu\\Bigg)^{1/p},\n\\end{equation}\nwhere $C$ and $N$ only depend on $C_d$ and $c_{{\\mathrm{PI}}}$.\nThis proves \\eqref{tkpwest}.\n\nTurning our attention to proving \\eqref{tkpwest-2}, observe that estimate \\eqref{eq5} is equivalent to\n\\begin{equation}\n\\label{eq6}\n|T_ku(x)|_p\\leq \nC\\cdot N^{1/p}\\Bigg(\\sum_{i=1}^{M(k)}\\chi_{A_i}(x)\\mvint_{5\\lambda B_i} g^p\\,d\\mu\\Bigg)^{1/p},\n\\end{equation}\nbecause the right-hand sides of \\eqref{eq5} and \\eqref{eq6} are equal.\nThe bounded overlapping of the family of enlarged balls $\\{5\\lambda B_i\\}_{i=1}^{M(k)}$ and the fact that $A_i\\subseteq B_i$ together yield\n\\[\n\\big\\Vert|T_ku|_p\\big\\Vert_{L^p(X)}^p\\leq \nC^pN \\sum_{i=1}^{M(k)}\\mu(A_i) \\Bigg(\\,\\,\\mvint_{5\\lambda B_i} g^p\\,d\\mu\\Bigg)\\leq\nC^pN \\sum_{i=1}^{M(k)} \\,\\,\\int_{5\\lambda B_i} g^p\\,d\\mu \\leq\nC'\\Vert g\\Vert_{L^p(X)}^p,\n\\]\nwhere $C'=C'(p,C_d,c_{{\\mathrm{PI}}},\\lambda)$.\n\nFinally, if $u\\in N^{1,p}(X)$ then \\eqref{tkpwest-2} applied with $g=g_u\\in L^p(X)$ proves boundedness of \n$\\{T_ku\\}_{k\\in\\mathbb Z}$ in 
$L^p(X,\\ell^p_N)$.\n\\end{proof}\n\n\nThe next result follows immediately from the definition of $\\Vert\\cdot\\Vert_{1,p}^\\ast$ in \\eqref{newnorm}, Lemma~\\ref{T3}, and Lemma~\\ref{TkBDD}.\n\n\\begin{corollary}\n\\label{objectprops}\nSuppose that the space supports a $p$-Poincar\\'e inequality for some $p\\in[1,\\infty)$. Then $\\Vert\\cdot\\Vert_{1,p}^*\\colon N^{1,p}(X)\\to [0, \\infty)$ as in \\eqref{newnorm} is a well-defined norm \non $N^{1,p}(X)$ and there exists $C\\in(0,\\infty)$ satisfying\n$$\n\\Vert u\\Vert_{1,p}^*\\leq C\\|u\\|_{N^{1,p}(X)}\\quad\\mbox{for all $u\\in N^{1,p}(X)$.}\n$$\n\\end{corollary}\n\n\n\nThe reader is reminded of the definition of $S_ku$ in \\eqref{ukdef}.\n\\begin{lemma}\n\\label{pro:T_ku-converges-balls}\nLet $u \\in N^{1,p}(X)$ with $p \\in [1, \\infty)$ and assume that $\\{h_k\\}_{k=1}^\\infty$ is a sequence of nonnegative Borel functions in $L^p(X)$ such that \n\\begin{equation}\n\\label{bwr-249}\n|S_ku(x)-S_ku(y)|\\leq \\int_\\gamma h_k\\,ds,\n\\end{equation}\nwhenever $k\\in\\mathbb N$, $x,y\\in X$ satisfy $d(x,y)\\geq 2^{-k}$, and $\\gamma$ is a rectifiable curve connecting $x$ and $y$. If $\\{h_k\\}_{k=1}^\\infty$ contains a subsequence that converges weakly in $L^p(X)$ to some nonnegative Borel function $h \\in L^p(X)$, then $h$ is a $p$-weak upper gradient of $u$ and hence, $h\\ge g_u$ pointwise $\\mu$-a.e.\\@ in $X$, where $g_u \\in L^p(X)$ is the minimal $p$-weak upper gradient of $u$. \n\\end{lemma}\n\n\\begin{proof}\nWithout loss of generality, we can assume that $\\{h_k\\}_{k=1}^\\infty$ converges weakly in $L^p(X)$ to some Borel function $h \\in L^p(X)$. Then by Mazur's lemma (see, e.g., \\cite[p.~19]{HKST}), there exists a sequence\n\\[\ng_k := \\sum_{\\ell=k}^{L(k)} \\alpha_{k,\\ell} h_\\ell \\to h\\,\\,\\text{ in $L^p(X)$ as }k\\to\\infty,\n\\]\nwhere $\\alpha_{k,\\ell}\\geq0$ and $\\sum_{\\ell=k}^{L(k)} \\alpha_{k,\\ell} = 1$ for each $k\\in\\mathbb N$ (with $L(k)\\in\\mathbb N$). 
By further passing to a subsequence, if necessary, we can assume that $g_k\\to h$ pointwise $\\mu$-a.e. in $X$ as $k\\to\\infty$. Consider the corresponding family of convex combinations of $S_ku$, $v_k := \\sum_{\\ell=k}^{L(k)} \\alpha_{k,\n\\ell}S_\\ell u$. Since $S_ku \\to u$ pointwise $\\mu$-a.e. in $X$ by Lebesgue's differentiation theorem (Lemma~\\ref{T5}), we have that $v_k \\to u$ pointwise $\\mu$-a.e.\\@ in $X$, as well. \n\nClearly, if $d(x,y)\\geq 2^{-k}$ for some $k\\in\\mathbb N$ and $\\gamma\\in\\Gamma(X)$ connects $x$ and $y$, then by \\eqref{bwr-249} we have\n\\begin{equation}\n\\label{eq14}\n|v_k(x)-v_k(y)|\\leq \\int_\\gamma g_k\\, ds.\n\\end{equation}\nDefine $\\tilde{u}\\colon X\\to[-\\infty,\\infty]$ by setting $\\tilde{u}(x) := \\limsup_{k\\to\\infty} v_k(x)$ for every $x\\in X$ and note that $\\tilde{u}=u$ pointwise $\\mu$-a.e. in $X$. We will prove that $\\tilde{u}$ is finite everywhere on the image $|\\gamma|$ for ${\\rm Mod}_p$-a.e. curve $\\gamma\\in\\Gamma(X)$. To this end, by Fuglede's lemma (Lemma~\\ref{fuglede}), there is a set $\\Gamma_1\\subseteq\\Gamma(X)$ with ${\\rm Mod}_p(\\Gamma_1)=0$ and a subsequence of $\\{g_k\\}_{k=1}^\\infty$ (also denoted by $\\{g_k\\}_{k=1}^\\infty$) such that\n\\begin{equation}\n\\label{rwq43}\n\\int_\\gamma g_k\\,ds \\to \\int_\\gamma h\\,ds \\,\\in\\mathbb R\n\\quad\n\\mbox{as $k\\to\\infty$,}\n\\end{equation}\nfor every curve $\\gamma\\in\\Gamma(X)\\setminus\\Gamma_1$. Next, let $E$ be the set of all $x\\in X$ for which the convergence $v_k(x)\\to u(x)\\in\\mathbb R$ does not hold, and set\n$$\n\\Gamma_2:=\\big\\{\\gamma\\in\\Gamma(X):\\ |\\gamma|\\subseteq E\\big\\}.\n$$\nNote that $E\\subseteq X$ is $\\mu$-measurable and $\\mu(E)=0$, which implies that $\\|\\infty\\cdot\\chi_E\\|_{L^p(X)}=0$. This, together with the observation that $\\infty\\cdot\\chi_E\\in F(\\Gamma_2)$, immediately gives ${\\rm Mod}_p(\\Gamma_2)=0$ and hence, ${\\rm Mod}_p(\\Gamma_1\\cup\\Gamma_2)=0$. 
\n\nNow fix a curve $\\gamma\\in\\Gamma(X)\\setminus(\\Gamma_1\\cup\\Gamma_2)$\nand let $\\widetilde{\\gamma}$ be the arc-length parameterization of $\\gamma$. We claim that the sequence $\\{v_k(\\widetilde{\\gamma}(s))\\}_{k=1}^\\infty$ of real numbers is bounded for every $s\\in[0,\\ell(\\gamma)]$. Let $s\\in[0,\\ell(\\gamma)]$ and note that since $\\gamma\\not\\in\\Gamma_2$, there is a point $t\\in[0,\\ell(\\gamma)]$ such that $\\widetilde{\\gamma}(t)\\not\\in E$. By definition of the set $E$, we have that $v_k(\\widetilde{\\gamma}(t))\\to u(\\widetilde{\\gamma}(t))\\in\\mathbb R$. In particular, $\\{v_k(\\widetilde{\\gamma}(t))\\}_{k=1}^\\infty$ is a bounded sequence. To proceed, it is enough to consider the scenario when $s\\leq t$ as the other case is handled similarly. If $\\widetilde{\\gamma}(s)=\\widetilde{\\gamma}(t)$ then $\\{v_k(\\widetilde{\\gamma}(s))\\}_{k=1}^\\infty$ is bounded by the choice of $t$. If, on the other hand, $\\widetilde{\\gamma}(s)\\neq\\widetilde{\\gamma}(t)$ then we have $d(\\widetilde{\\gamma}(t),\\widetilde{\\gamma}(s))\\geq2^{-k}$ for all sufficiently large $k\\in\\mathbb{N}$ and so, by appealing to \\eqref{eq14} we can write\n\\begin{equation}\n\\label{rqi-47}\n\\begin{split}\n\\big|v_k(\\widetilde{\\gamma}(s))\\big|\n&\\leq\n\\big|v_k(\\widetilde{\\gamma}(s))-v_k(\\widetilde{\\gamma}(t))\\big|+\\big|v_k(\\widetilde{\\gamma}(t))\\big|\n\\\\\n&\\leq\\int_s^tg_k(\\widetilde{\\gamma}(\\tau))d\\tau+\\big|v_k(\\widetilde{\\gamma}(t))\\big|\n\\leq\\int_\\gamma g_k\\,ds+\\big|v_k(\\widetilde{\\gamma}(t))\\big|.\n\\end{split}\n\\end{equation}\nSince $\\gamma\\not\\in\\Gamma_1$, we have that $\\int_\\gamma g_k\\,ds$ converges to the finite number \\eqref{rwq43}, as $k\\to\\infty$ and hence, is bounded. 
Therefore, the right-hand side of \\eqref{rqi-47} is bounded by a finite constant that is independent of $k$, and it follows that $\\{v_k(\\widetilde{\\gamma}(s))\\}_{k=1}^\\infty$ is a bounded sequence for each fixed $s\\in[0,\\ell(\\gamma)]$. Consequently, $\\tilde{u}$ is finite on the image $|\\gamma|$ whenever $\\gamma\\in\\Gamma(X)\\setminus(\\Gamma_1\\cup\\Gamma_2)$.\n\n\nMoving on, we claim next that $h$ is a $p$-weak upper gradient of $\\tilde{u}$. Fix $\\gamma\\in\\Gamma(X)\\setminus(\\Gamma_1\\cup\\Gamma_2)$ and let $x,y \\in X$ be the end-points of $\\gamma$. If $x=y$, then the inequality $|\\tilde{u}(x) - \\tilde{u}(y)| \\leq \\int_\\gamma h\\,ds$ is trivially satisfied, since $\\tilde{u}(x)=\\tilde{u}(y)\\in\\mathbb{R}$. If $x \\neq y$, then $d(x,y)\\geq 2^{-k}$ for all $k\\in\\mathbb{N}$, large enough, and so \\eqref{eq14} is satisfied. Since $\\gamma\\not\\in\\Gamma_1\\cup\\Gamma_2$, we have that \\eqref{rwq43} holds and $\\tilde{u}(x),\\tilde{u}(y)\\in\\mathbb{R}$. As such, we can estimate\n\\[\n |\\tilde{u}(x) - \\tilde{u}(y)|\\leq\\limsup_{k\\to\\infty} |v_k(x) - v_k(y)| \\leq \\limsup_{k\\to\\infty} \\int_\\gamma g_k\\,ds = \\int_\\gamma h\\,ds\\,.\n\\]\nTherefore, $h \\in L^p(X)$ is a $p$-weak upper gradient of $\\tilde{u}$, and hence $\\tilde{u} \\in N^{1,p}(X)$. Since $u=\\tilde{u}$ pointwise $\\mu$-a.e. in $X$ and both $u$ and $\\tilde{u}$ belong to $N^{1,p}(X)$, the function $h$ is also a $p$-weak upper gradient of $u$ by Lemma~\\ref{T3}. Therefore, $h \\ge g_u$ pointwise $\\mu$-a.e.\\@ in $X$ by the definition of a minimal $p$-weak upper gradient. This completes the proof of Lemma~\\ref{pro:T_ku-converges-balls}.\n\\end{proof}\n\n\nWe will show that the sequence $h_k:=4|T_k u|_p$ satisfies the hypotheses of Lemma~\\ref{pro:T_ku-converges-balls}. 
We first verify estimate \\eqref{bwr-249}.\n\n\\begin{lemma}\n\\label{almostug}\nLet $u\\in L^{1}_{\\mathrm{loc}}(X)$ and suppose that $\\gamma$ is a rectifiable curve in $X$ with endpoints $x$ and $y$. If $d(x,y)\\geq 2^{-k}$ for some $k\\in\\mathbb Z$, then\n$$\n|S_ku(x)-S_ku(y)|\\leq 4\\int_\\gamma |T_ku|_p\\,ds,\n$$\nwhere $S_ku$ is as in \\eqref{ukdef}.\n\\end{lemma}\n\\begin{proof}\nWe can assume that $\\gamma:[0,L]\\to X$ is parametrized by arc-length and $x=\\gamma(0)$, $y=\\gamma(L)$.\nConsider a partition\n$$\n0=t_01$, we can rely on the reflexivity of $L^p$ and Lemma~\\ref{TkBDD}. However, the case of $p=1$ is more delicate; it relies on the\nDunford--Pettis theorem (Lemma~\\ref{thm:DunfordPettis}) and some ideas from \\cite{FraHajKos}.\n\n\nFor the next result, see also \\cite[Lemma~6]{FraHajKos}. We will only need it for $p=1$.\n\\begin{lemma}\n\\label{lem:Tku-equiint-balls}\nSuppose that the space supports a $p$-Poincar\\'e inequality for some $p\\in [1,\\infty)$. If $u\\in N^{1,p}(X)$, then every subsequence of $\\{|T_ku|_p^p \\}_{k=1}^\\infty$ has a further subsequence that is weakly convergent in $L^1(X)$.\n\\end{lemma}\n\\begin{proof}\nFix $u\\in N^{1,p}(X)$.\nWe will prove that $\\{|T_ku|_p^p \\}_{k=1}^\\infty$ satisfies (a) and (b) in Lemma~\\ref{thm:DunfordPettis}.\n\nTo verify (b), fix $\\varepsilon\\in(0,\\infty)$ and $k\\in\\mathbb N$. 
Since inequality \\eqref{tkpwest} is satisfied with $g=g_u\\in L^p(X)$, we have\n$$\n|T_k u(x)|_p^p \\lesssim \n\\mvint_{5\\lambda B[x]} g_u^p\\,d\\mu=\n\\sum_{i=1}^{M(k)}\\chi_{A_i}(x)\\,\\mvint_{5\\lambda B_i} g_u^p\\,d\\mu\n\\quad\\mbox{for every $x \\in X$.}\n$$\nConsequently, since $A_i \\subseteq B_i$, for every measurable set $S\\subseteq X$, we have\n\\begin{equation}\n\\label{eq8}\n\\int_S |T_k u|_p^p\\,d\\mu \\lesssim \\sum_{i=1}^{M(k)} \\frac{\\mu(S \\cap B_i)}{\\mu(5\\lambda B_i)} \\int_{5\\lambda B_i} g_u^p\\,d\\mu\n\\leq\\sum_{i:\\, S\\cap B_i\\neq\\varnothing}\\ \\int_{5\\lambda B_i} g_u^p\\, d\\mu.\n\\end{equation}\n\nFix $x_o\\in X$, $R>6\\lambda$, and let $S_R:=X\\setminus B(x_o,2R)$.\nEach of the balls $B_i$ has radius $2^{-k}<1\\leq\\lambda$. Thus, if $S_R\\cap B_i\\neq\\varnothing$, then \n$5\\lambda B_i\\cap B(x_o,R)=\\varnothing$, by a simple application of the triangle inequality, and hence \\eqref{eq8} and the bounded overlapping of the balls $\\{ 5\\lambda B_i\\}_i$ yield\n$$\n\\int_{X\\setminus B(x_o,2R)} |T_ku|_p^p\\, d\\mu\\lesssim \\int_{X\\setminus B(x_o,R)} g_u^p\\, d\\mu<\\varepsilon,\n$$\nprovided $R$ is sufficiently large. This proves condition (b) in Lemma~\\ref{thm:DunfordPettis} with $E:=B(x_o,2R)$.\n\nNext, we prove that condition (a) holds.\nNote that Lemma~\\ref{TkBDD} implies that $\\{|T_ku|_p^p \\}_{k=1}^\\infty$ is bounded in $L^1(X)$.\nThus it remains to prove that the family is equi-integrable. \n\nFix $\\varepsilon\\in(0,\\infty)$ and $k\\in\\mathbb N$, and let $\\sigma\\in(0,\\infty)$ be any number. The value of $\\sigma$ will be fixed later.\n\nGiven a $\\mu$-measurable set $S \\subseteq X$, we define $\\mathcal{G}$ to be the collection of all integers $i\\in[1,M(k)]$ satisfying $\\mu(S \\cap B_i) \\le \\sigma\\mu(5\\lambda B_i)$, and we let $\\mathcal{B}$ consist of all integers $i\\in[1,M(k)]\\setminus\\mathcal{G}$. 
\nNote that $\\mu(5\\lambda B_i)<\\mu(S\\cap B_i)/\\sigma$ for all $i\\in \\mathcal{B}$.\nThus $\\mathcal{G}$ and $\\mathcal{B}$ partition the set of integers in $[1,M(k)]$ and \\eqref{eq8} yields\n\\begin{equation}\n\\label{eq9}\n\\int_S |T_k u|_p^p\\,d\\mu \\leq C_1\\Bigg(\n\\sigma\\sum_{i\\in\\mathcal{G}}\\,\\int_{5\\lambda B_i} g_u^p\\, d\\mu +\n\\sum_{i\\in\\mathcal{B}}\\, \\int_{5\\lambda B_i} g_u^p\\, d\\mu \\Bigg)\\, ,\n\\end{equation}\nwhere the constant $C_1$ does not depend on $\\sigma$ or $k$.\n\nAssume that the overlapping constant of the balls $\\{5\\lambda B_i\\}_i$ is bounded by $C_2$. Now we fix $\\sigma\\in (0,\\infty)$ such that\n$$\nC_2\\sigma\\Vert g_u\\Vert_p^p<\\frac{\\varepsilon}{2C_1}\\, .\n$$\nThen the first sum in \\eqref{eq9} can be estimated by\n\\begin{equation}\n\\label{eq10}\n\\sigma\\sum_{i\\in\\mathcal{G}}\\,\\int_{5\\lambda B_i} g_u^p\\, d\\mu\\leq C_2\\sigma\\int_X g_u^p\\, d\\mu<\\frac{\\varepsilon}{2C_1}\\, .\n\\end{equation}\nRegarding the second sum in \\eqref{eq9}, we have\n\\begin{equation}\n\\label{eq11}\n\\sum_{i\\in\\mathcal{B}}\\, \\int_{5\\lambda B_i} g_u^p\\, d\\mu\\leq C_2 \\int_G g_u^p\\, d\\mu\n\\quad\n\\text{where}\n\\quad\nG:=\\bigcup_{i\\in\\mathcal{B}} 5\\lambda B_i.\n\\end{equation}\nNote that\n\\begin{equation}\n\\label{eq12}\n\\mu(G)\\le\\sum_{i\\in\\mathcal{B}}\\mu(5\\lambda B_i)\\leq \\sum_{i\\in\\mathcal{B}} \\frac{\\mu(S \\cap B_i)}{\\sigma} \\le \\frac{C_2\\,\\mu(S)}{\\sigma}\\,.\n\\end{equation}\nAbsolute continuity of the integral yields $\\tilde{\\delta}\\in (0,\\infty)$ such that\n\\begin{equation}\n\\label{eq13}\n\\int_G g_u^p\\, d\\mu < \\frac{\\varepsilon}{2C_1C_2}\\, ,\n\\end{equation}\nprovided $\\mu(G)<\\tilde{\\delta}$.\n\nLet $\\delta:=\\sigma\\tilde{\\delta}/C_2$. If $\\mu(S)<\\delta$, then $\\mu(G)<\\tilde{\\delta}$ by \\eqref{eq12} and hence \\eqref{eq13} is satisfied. 
This, in concert with \\eqref{eq9}, \\eqref{eq10}, \\eqref{eq11}, and \\eqref{eq13}, yields\n$$\n\\int_S |T_ku|_p^p\\, d\\mu<\nC_1\\Big(\\frac{\\varepsilon}{2C_1}+C_2\\, \\frac{\\varepsilon}{2 C_1C_2}\\Big)=\\varepsilon,\n$$\nand that completes the proof of the equi-integrability and the proof of Lemma~\\ref{lem:Tku-equiint-balls}.\n\\end{proof}\n\n\n\n\n\\begin{corollary}\n\\label{cor:subseq-T_k-wkconv}\nSuppose the space supports a $p$-Poincar\\'e inequality for some $p\\in [1,\\infty)$. If $u \\in N^{1,p}(X)$, then every subsequence of $\\{|T_k u|_p\\}_{k=1}^\\infty$ has a further subsequence that converges weakly in $L^p(X)$.\n\\end{corollary}\n\n\n\\begin{proof}\nIf $p>1$, then the sequence $\\{|T_k u|_p\\}_{k=1}^\\infty$ is bounded in $L^p(X)$ by Lemma~\\ref{TkBDD}, and\nthe result follows from the reflexivity of $L^p(X)$.\nIf $p=1$, then the existence of a weakly convergent subsequence is guaranteed by Lemma~\\ref{lem:Tku-equiint-balls}.\n\\end{proof}\n\n\\subsection{Proof of the main result}\n\\label{ssec:proof}\n\\begin{proof}[Proof of Theorem~\\ref{main}]\nWe need to prove that:\n\\begin{itemize}[label={\\footnotesize\\textbullet}]\n\\item $\\Vert\\cdot\\Vert_{1,p}^*\\approx\\Vert\\cdot\\Vert_{N^{1,p}(X)}$ on $N^{1,p}(X)$ when $p\\in [1,\\infty)$;\n\\vskip.08in\n\n\\item the norm $\\Vert\\cdot\\Vert_{1,p}^*$ is uniformly convex on $N^{1,p}(X)$ when $p\\in (1,\\infty)$.\n\\end{itemize}\nThen reflexivity of $N^{1,p}(X)$, $p\\in (1,\\infty)$, will follow directly from the Milman--Pettis theorem, Lemma~\\ref{milmanpettis}. 
\n\nTherefore, the proof of Theorem~\\ref{main} is contained in Proposition~\\ref{gradcomparable} and Proposition~\\ref{F-UniCon} below.\n\n\\begin{proposition}\n\\label{gradcomparable}\nSuppose the space supports a $p$-Poincar\\'e inequality for some $p\\in [1,\\infty)$.\nThen there exists $C=C(p,C_d,c_{{\\mathrm{PI}}},\\lambda)\\in(0,\\infty)$ such that\n$$\n4^{-1}\\|g_u\\|_{L^p(X)}\\leq \\limsup_{k\\to\\infty} \\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)}\\leq C\\|g_u\\|_{L^p(X)},\n$$\nfor all $u\\in N^{1,p}(X)$. Consequently, $\\Vert u\\Vert_{1,p}^*\\approx\\Vert u\\Vert_{N^{1,p}(X)}$ for all $u\\in N^{1,p}(X)$.\n\\end{proposition}\n\n\\begin{proof}\nFix $u \\in N^{1,p}(X)$ and let $g_u \\in L^p(X)$ denote the minimal $p$-weak upper gradient of $u$. In view of Lemma~\\ref{TkBDD}, we immediately have that\n\\begin{equation}\n\\label{eq4}\n\\limsup_{k\\to\\infty} \\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)}\\leq C\\|g_u\\|_{L^p(X)}\n\\end{equation}\nfor some $C=C(p,C_d,c_{{\\mathrm{PI}}},\\lambda)\\in(0,\\infty)$. \n\nTo see the opposite inequality, take a subsequence $\\{|T_{k_j} u|_p\\}_{j=1}^\\infty$ of $\\{|T_{k} u|_p\\}_{k=1}^\\infty$ such that\n\\begin{equation}\n\\label{liminf-est}\n\\lim_{j\\to\\infty} \\big\\Vert|T_{k_j} u|_p\\big\\Vert_{L^p(X)}\n=\\liminf_{k\\to\\infty} \\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)}.\n\\end{equation}\nIn light of Corollary~\\ref{cor:subseq-T_k-wkconv}, by passing to a further subsequence, we can assume $\\{|T_{k_j} u|_p\\}_{j=1}^\\infty$ converges weakly in $L^p(X)$.\nLet $|T|(u) \\in L^p(X)$ be a Borel representative of the weak limit of $\\{|T_{k_j} u|_p\\}_{j=1}^\\infty$ and set $h_k:=4|T_{k} u|_p$ and $h:=4|T|(u)$. Note that $h$ and each $h_k$ are nonnegative Borel functions. Since $\\{h_{k_j}\\}_{j=1}^\\infty$ converges weakly to $h$ in $L^p(X)$, by appealing to Lemma~\\ref{almostug}, we can conclude that the pair $\\big(\\{h_k\\}_{k=1}^\\infty, h\\big)$ satisfies the hypotheses of Lemma~\\ref{pro:T_ku-converges-balls}. 
Therefore, we have that $h$ is a $p$-weak upper gradient of $u$ and hence, $h\\geq g_u$ pointwise $\\mu$-a.e. in $X$. Combining this fact with \n\\eqref{liminf-est} and the lower semicontinuity of the $L^p$-norm (with respect to the weak convergence), we can estimate\n\\begin{equation}\n\\label{svv-2}\n\\begin{split}\n\\limsup_{k\\to\\infty} \\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)} \n&\\ge \n\\liminf_{k\\to\\infty} \\big\\Vert |T_k u|_p\\big\\Vert_{L^p(X)} \n= 4^{-1}\\lim_{j\\to\\infty}\\Vert h_{k_j}\\Vert_{L^p(X)}\\\\\n&\\ge \n4^{-1}\\|h\\|_{L^p(X)} \\ge 4^{-1}\\|g_u\\|_{L^p(X)}.\n\\end{split}\n\\end{equation}\nThe proof of Proposition~\\ref{gradcomparable} is now complete.\n\\end{proof}\n\n\n\\begin{remark}\n\\label{liminf-limsup-equiv}\nCombining \\eqref{eq4} and \\eqref{svv-2} we can conclude that for $p\\in[1,\\infty)$, there is a finite constant $\\xi=\\xi(p,C_d,c_{{\\mathrm{PI}}},\\lambda)\\ge1$ satisfying\n\\begin{equation*}\n\\label{eq:liminf-limsup-equiv}\n\\liminf_{k\\to\\infty}\\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)}\\le \\limsup_{k\\to\\infty} \\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)} \\le \\xi\\liminf_{k\\to\\infty} \\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)},\n\\end{equation*}\nfor every $u\\in N^{1,p}(X)$. \n\\end{remark}\n\nWe now proceed to show that $\\Vert\\cdot\\Vert_{1,p}^*$ is uniformly convex on $N^{1,p}(X)$ when $p\\in(1,\\infty)$.\n\n\\begin{proposition}\n\\label{F-UniCon}\nSuppose the space supports a $p$-Poincar\\'e inequality for some $p\\in (1,\\infty)$. Then the\nnorm $\\Vert\\cdot\\Vert_{1,p}^*$ is uniformly convex on $N^{1,p}(X)$. In particular, the Banach space $(N^{1,p}(X),\\|\\cdot\\|_{N^{1,p}(X)})$ is reflexive. \n\\end{proposition}\n\n\n\\begin{proof}\nFix $\\varepsilon\\in(0,\\infty)$. 
We will first prove that there exists $\\delta\\in(0,\\infty)$ such that $\\Vert u + v\\Vert_{1,p}^*\\leq 2(1-\\delta)$ whenever $u, v \\in N^{1,p}(X)$ satisfy $\\Vert u\\Vert_{1,p}^*<1$, $\\Vert v\\Vert_{1,p}^*<1$, and $\\Vert u - v\\Vert_{1,p}^*>\\varepsilon$. Fix $u, v \\in N^{1,p}(X)$ as above. Then by definition of $\\Vert\\cdot\\Vert_{1,p}^*$, we have that\n\\begin{equation}\n\\label{xwui-29}\n\\left(\\|u\\|_{L^p(X)}^p + \\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)}^p\\right)^{1/p} <1\n\\qquad\n\\text{and}\n\\qquad\\left(\\|v\\|_{L^p(X)}^p + \\big\\Vert|T_{k} v|_p\\big\\Vert_{L^p(X)}^p\\right)^{1/p} < 1\n\\end{equation}\nfor all sufficiently large $k\\in\\mathbb N$. In light of Remark~\\ref{liminf-limsup-equiv}, we can estimate \n\\begin{align*}\n\\varepsilon<\\Vert u - v\\Vert_{1,p}^* & \\le \\left(\\|u-v\\|_{L^p(X)}^p + \\xi^p \\liminf_{k\\to\\infty} \\big\\Vert|T_{k}(u-v)|_p\\big\\Vert_{L^p(X)}^p\\right)^{1/p} \\\\\n & \\le \\xi \\left(\\|u-v\\|_{L^p(X)}^p + \\liminf_{k\\to\\infty} \\big\\Vert|T_{k}u-T_{k}v|_p\\big\\Vert_{L^p(X)}^p\\right)^{1/p},\n\\end{align*}\nwhere we have used the fact that $\\xi\\ge1$ and $T_k$ is linear in obtaining the last inequality. Consequently, \\eqref{xwui-29} and\n\\begin{equation}\n\\label{xwui-30}\n\\big(\\|u-v\\|_{L^p(X)}^p + \\big\\Vert|T_{k}u -T_{k}v|_p\\big\\Vert_{L^p(X)}^p\\big)^{1/p} > \\varepsilon/\\xi,\n\\end{equation}\nhold true for all sufficiently large $k\\in\\mathbb N$. Fix such a $k$. Since $T_{k}u$ and $T_{k}v$ are vectors in $\\mathbb{R}^N$, we can write\n$$\n\\big\\Vert|T_{k}u|_p\\big\\Vert_{L^p(X)}^p=\\sum_{\\ell=1}^N\\big\\Vert T^\\ell_{k}u\\big\\Vert_{L^p(X)}^p\n\\quad\n\\mbox{and}\n\\quad\n\\big\\Vert|T_{k}v|_p\\big\\Vert_{L^p(X)}^p=\\sum_{\\ell=1}^N\\big\\Vert T^\\ell_{k}v\\big\\Vert_{L^p(X)}^p,\n$$\nwhere $T_{k}u=(T^1_{k}u,\\dots,T^N_{k}u)$ and $T_{k}v=(T^1_{k}v,\\dots,T^N_{k}v)$. 
\nTherefore, if we let\n$$\nf:=(u,T^1_{k}u,\\dots,T^N_{k}u)\\qquad\\mbox{and}\\qquad\ng:=(v,T^1_{k}v,\\dots,T^N_{k}v),\n$$\nthen $f,g\\in L^p(X,\\ell^p_{N+1})$ and, with $\\Phi$ defined as in \\eqref{reoi}, a rewriting of \\eqref{xwui-29} and \\eqref{xwui-30} yields\n\\begin{equation*}\n\\begin{gathered}\n\\Phi(f)=\\left(\\|u\\|_{L^p(X)}^p + \\big\\Vert|T_{k} u|_p\\big\\Vert_{L^p(X)}^p\\right)^{1/p}<1,\n\\quad\n\\Phi(g)=\\left(\\|v\\|_{L^p(X)}^p + \\big\\Vert|T_{k} v|_p\\big\\Vert_{L^p(X)}^p\\right)^{1/p} < 1\\\\\n\\mbox{and}\n\\quad\n\\Phi(f-g)=\\left(\\|u-v\\|_{L^p(X)}^p + \\big\\Vert|T_{k}u - T_{k}v|_p\\big\\Vert_{L^p(X)}^p\\right)^{1/p}>\\varepsilon/\\xi.\n\\end{gathered}\n\\end{equation*}\nBy Corollary~\\ref{clarkson}, $L^p(X,\\ell^p_{N+1})$ is uniformly convex and so (keeping in mind Lemma~\\ref{unifconvex-clsd}) there exists $\\delta\\in(0,\\infty)$, which depends on $\\varepsilon$ and $\\xi$, but is independent of $f$ and $g$ (in particular, $\\delta$ is independent of $u$, $v$, and $k$), such that \n\\begin{equation}\n\\label{gamq-58}\n\\left(\\|u+v\\|_{L^p(X)}^p + \\big\\Vert|T_{k} (u+v)|_p\\big\\Vert_{L^p(X)}^p\\right)^{1/p} = \\Phi(f+g) \\le 2(1-\\delta).\n\\end{equation}\nNote that we have used the linearity of $T_{k}$ in obtaining the equality in \\eqref{gamq-58}. Given that \\eqref{gamq-58} holds for all sufficiently large $k\\in\\mathbb N$, it follows that $\\Vert u+v\\Vert_{1,p}^* \\le 2(1-\\delta)$. \n\nTo complete the proof of the proposition, suppose that $u, v \\in N^{1,p}(X)$ are such that $\\Vert u\\Vert_{1,p}^*=\\Vert v\\Vert_{1,p}^*=1$, and $\\Vert u - v\\Vert_{1,p}^*>\\varepsilon$. Then, for all $\\theta\\in(0,1)$ sufficiently close to 1, we have that $\\theta u, \\theta v \\in N^{1,p}(X)$ satisfy $\\Vert\\theta u\\Vert_{1,p}^*,\\Vert\\theta v\\Vert_{1,p}^*<1$ and $\\Vert\\theta u - \\theta v\\Vert_{1,p}^*>\\varepsilon$. As such, we have $\\Vert \\theta u + \\theta v\\Vert_{1,p}^*\\leq 2(1-\\delta)$ by what has been established above. 
Since $\\delta$ is independent of $\\theta$, passing to the limit as $\\theta\\to1^-$ yields $\\Vert u + v\\Vert_{1,p}^*\\leq 2(1-\\delta)$. Given that $\\varepsilon\\in(0,\\infty)$ was arbitrary, it follows that $\\Vert\\cdot\\Vert_{1,p}^*$ is a uniformly convex norm on $N^{1,p}(X)$.\n\nFinally, the assertion that $(N^{1,p}(X),\\|\\cdot\\|_{N^{1,p}(X)})$ is reflexive follows as an immediate consequence of the Milman--Pettis theorem (see Lemma~\\ref{milmanpettis}) and the fact that a reflexive space remains reflexive for an equivalent norm. The proof of Proposition~\\ref{F-UniCon} is now complete.\n\\end{proof} \n\\noindent This completes the proof of Theorem~\\ref{main}.\n\\end{proof}\n\n\n\n\\section{Separability from reflexivity}\n\\label{sepa}\n\nIn this section we will prove separability of $N^{1,p}(X)$ for $p\\in[1,\\infty)$ (Theorem~\\ref{apes}). In its proof we will employ a general result that provides a mechanism for using reflexivity to establish separability, see Proposition~\\ref{reflextosep}. Recall that we always assume that the measure on $X$ is doubling and Borel regular.\n\nThroughout this section, all vector spaces are over the field of real numbers. Also, as a notational convention, if $S$ is a set of vectors then we let $\\operatorname{span}_{\\mathbb{Q}}S$ and $\\operatorname{span} S$ denote the set of all finite linear combinations of vectors in $S$ with coefficients in $\\mathbb Q$ and $\\mathbb R$, respectively.\n\n\\begin{proposition}\n\\label{reflextosep}\nIf $T\\colon V\\to W$ is a linear and bounded injective map of a reflexive Banach space $V$ into a separable normed space $W$, then $V$ is separable.\n\\end{proposition}\n\n\\begin{proof}\nIt suffices to prove that the unit ball $B\\subseteq V$ is separable. Given that $W$ is separable, there is a set $\\{v_k: k\\in\\mathbb{N}\\}\\subseteq B$ such that the set $\\{T(v_k): k\\in\\mathbb{N}\\}$ is dense in $T(B)$. 
Since $\\operatorname{span}_{\\mathbb{Q}}\\{v_k: k\\in\\mathbb{N}\\}\\subseteq\\operatorname{span}\\{v_k: k\\in\\mathbb{N}\\}$ is countable and dense, we conclude that $\\operatorname{span}\\{v_k: k\\in\\mathbb{N}\\}$ is separable and hence, it suffices to prove that\n\\begin{equation}\n\\label{tqi-348}\n\\operatorname{span}\\{v_k: k\\in\\mathbb{N}\\}\\cap B\\,\\,\\mbox{ is dense in $B$.}\n\\end{equation}\nLet $v\\in B$. Then, there exists a sequence $\\{v_{k_i}\\}_i\\subseteq B$ such that $T(v_{k_i})\\to T(v)$ in $W$ as $i\\to\\infty$. Since $\\{v_{k_i}\\}_i$ is bounded in $V$ and $V$ is reflexive, by passing to a subsequence, we can assume that $\\{v_{k_i}\\}_i$ converges weakly in $V$ to some $\\tilde{v}\\in V$. Then Mazur's lemma yields a sequence of convex combinations that converge to $\\tilde{v}$ in the norm on $V$:\n\\begin{equation}\n\\label{rhq-26}\n\\operatorname{span}\\{v_k: k\\in\\mathbb{N}\\}\\cap B\\ni\\sum_{j=i}^{L(i)}\\alpha_{i,j}v_{k_j}\\to \\tilde{v}\\,\\,\\,\\mbox{ in $V$ as $i\\to\\infty$,}\n\\end{equation}\nwhere $\\alpha_{i,j}\\geq 0$ and $\\sum_{j=i}^{L(i)}\\alpha_{i,j}=1$ with $L(i)\\in\\mathbb N$. Appealing to the boundedness and linearity of $T$, we have\n$$\nT(\\tilde{v})=\\lim_{i\\to\\infty}T\\Bigg(\\sum_{j=i}^{L(i)}\\alpha_{i,j}v_{k_j}\\Bigg)\n=\\lim_{i\\to\\infty}\\sum_{j=i}^{L(i)}\\alpha_{i,j}T(v_{k_j})=T(v).\n$$\nSince $T$ is injective, we conclude that $\\tilde{v}=v$ and \\eqref{tqi-348} now follows from \\eqref{rhq-26} because $v\\in B$ was chosen arbitrarily.\n\\end{proof}\n\n\n\n\\begin{lemma}\n\\label{lipdensity}\nSuppose that the space $(X,d,\\mu)$ supports a $p$-Poincar\\'e inequality for some $p\\in [1,\\infty)$. Then\nthe space $\\operatorname{Lip}_b(X)$ of Lipschitz functions with bounded support\nis a dense subset of $N^{1,p}(X)$.\n\\end{lemma}\n\nFor a proof see \\cite[Corollary~5.15]{BjoBjo}. 
In fact, they proved density of compactly supported Lipschitz functions under the additional assumption that the space $X$ is complete. Without assuming completeness of $X$, the same proof gives density of Lipschitz functions with bounded support. Completeness of $X$, since the space is equipped with a doubling measure, implies that bounded and closed sets are compact and hence Lipschitz functions with bounded support have compact support.\n\nLemma~\\ref{lipdensity} follows also from Theorem~8.2.1 and the proof of Proposition~7.1.35 in \\cite{HKST}.\n\nWe are now ready to present the\n\n\n\\begin{proof}[Proof of Theorem~\\ref{apes}]\nSuppose first that $p>1$. Note that $(X,d)$ is a separable metric space since $\\mu$ is a doubling measure on $X$ and so, $L^p(X)$ is separable by \\cite[Proposition~3.3.55]{HKST}. Clearly, the identity mapping $\\iota\\colon N^{1,p}(X)\\to L^p(X)$ is a linear and bounded injective map. Now, since the space $N^{1,p}(X)$ is reflexive by Proposition~\\ref{F-UniCon}, Proposition~\\ref{reflextosep} immediately implies that $N^{1,p}(X)$ is separable.\n\nSuppose next that $p=1$ and fix $q\\in(1,\\infty)$. \nIt follows from H\\\"older's inequality that $X$ supports a $q$-Poincar\\'e inequality and by what we have already shown, $N^{1,q}(X)$ is separable so, there is a dense subset $\\{\\psi_i:\\, i\\in\\mathbb N\\}$ of $N^{1,q}(X)$.\n\nFix $x_o\\in X$ and for each $k\\in\\mathbb N$ choose a Lipschitz function with bounded support $\\eta_k\\in\\operatorname{Lip}_b(X)$ such that $\\eta_k\\equiv 1$ on $B(x_o,k)$. We will prove that $\\mathcal{F}:=\\{\\eta_k\\psi_i:\\, k,i\\in\\mathbb N\\}$ is a dense subset of $N^{1,1}(X)$.\n\n\n\nWe first need to show that $\\mathcal{F}\\subseteq N^{1,1}(X)$. Fix $k,i\\in\\mathbb{N}$. 
It follows from Lemma~\\ref{leibniz} that $\\eta_k\\psi_i\\in N^{1,q}(X)$ and $h_{k,i}:=|\n\\eta_k|g_{\\psi_i}+|\\psi_i|\\,{\\rm lip}\\,\\eta_k$ is a $q$-weak (hence, also $1$-weak by Lemma~\\ref{except}) upper gradient for $\\eta_k\\psi_i$, where $g_{\\psi_i}\\in L^q(X)$ is the minimal $q$-weak upper gradient for $\\psi_i\\in L^q(X)$. Since $\\eta_k\\in{\\rm Lip}_b(X)$, it follows from \\eqref{lillip} that ${\\rm lip}\\,\\eta_k$ is a bounded function with bounded support. Therefore, by H\\\"older's inequality we can conclude that $\\eta_k\\psi_i, h_{k,i}\\in L^{1}(X)$ and hence, $\\eta_k\\psi_i\\in N^{1,1}(X)$, as wanted.\n\n\nIn light of Lemma~\\ref{lipdensity} it suffices to prove that any Lipschitz function with bounded support can be approximated in the $N^{1,1}$ norm by functions in $\\mathcal{F}$.\nFix $u\\in\\operatorname{Lip}_b(X)$ and let $k_o\\in\\mathbb N$ be such that ${\\rm supp}\\, u\\subseteq B(x_o,k_o)$, so $\\eta_{k_o}u=u$ pointwise in $X$. Since $u\\in N^{1,q}(X)$, there is a sequence $\\{\\psi_{i_j}\\}_j$ such that $\\psi_{i_j}\\to u$ in $N^{1,q}(X)$ as $j\\to\\infty$. Then it easily follows from Lemma~\\ref{leibniz} that\n$$\nu-\\eta_{k_o}\\psi_{i_j}=\\eta_{k_o}(u-\\psi_{i_j})\\to 0\n\\quad\n\\text{in $N^{1,1}(X)$ as $j\\to\\infty$.}\n$$\nThis completes the proof of Theorem~\\ref{apes}.\n\\end{proof}\n\n\n\n\\section{Pointwise estimates}\n\\label{ptwise}\n\nThe purpose of this section is to prove Theorem~\\ref{main2}. In order to do so, it suffices to prove the following theorem.\n\\begin{theorem}\n\\label{ptwisethm}\nFix $p\\in(1,\\infty)$ and suppose that $X$ supports a $q$-Poincar\\'e inequality for some $q\\in[1,p)$. Then there exists a constant $C\\geq 1$ such that for all $u\\in N^{1,p}(X)$,\n\\begin{equation}\n\\label{rqi-34}\nC^{-1}g_u(x)\\leq\\limsup_{k\\to\\infty} |T_{k} u(x)|\\leq Cg_u(x)\\quad\n\\mbox{for $\\mu$-a.e. 
$x\\in X$,}\n\\end{equation}\nwhere $g_u\\in L^p(X)$ denotes the minimal $p$-weak upper gradient of $u$.\n\\end{theorem}\n\nIndeed, Theorem~\\ref{ptwisethm} and the following deep result due to Keith and Zhong \\cite{KeithZhong} \n(see also \\cite{Er}, \\cite[Theorem~12.3.9]{HKST})\nimmediately yield Theorem~\\ref{main2}.\n\n\n\\begin{lemma}[Keith and Zhong]\n\\label{KZ}\nLet $(X,d,\\mu)$ be a complete metric-measure space that supports a $p$-Poincar\\'e inequality for some $p\\in(1,\\infty)$. Then there exists $q\\in[1,p)$ such that $X$ supports a $q$-Poincar\\'e inequality.\n\\end{lemma}\n\n\n\n\nIn the proof of Theorem~\\ref{ptwisethm} we will make use of the \\emph{Hardy-Littlewood maximal operator} of a function $g\\in L^1_{{\\rm loc}}(X)$ which is defined by\n$$\n\\big(\\mathscr{M}g\\big)(x):=\\sup_{r>0}\\mvint_{B(x,r)}|g|\\,d\\mu\\quad\\mbox{for all $x\\in X$.}\n$$\nWe will use the boundedness of the maximal function in $L^p(X)$, \\cite[Theorem~3.5.6]{HKST}:\n\\begin{lemma}\n\\label{T4}\nIf $\\mu$ is a doubling measure on a metric space $X$ and $p\\in (1,\\infty]$, then\nthere is a constant $C$ depending on $p$ and the doubling constant of the measure only, such that\n$\\Vert \\mathscr{M}g\\Vert_{L^p(X)}\\leq C\\Vert g\\Vert_{L^p(X)}$ for all $g\\in L^p(X)$.\n\\end{lemma}\n\n\n\\begin{proof}[Proof of Theorem~\\ref{ptwisethm}]\nAssume that the $q$-Poincar\\'e inequality holds with constants $c_{{\\mathrm{PI}}}'$ and $\\lambda'$.\nSince all norms in $\\mathbb R^N$ are equivalent, it suffices to prove\nthat there exists a constant $C\\geq 1$ such that for all $u\\in N^{1,p}(X)$,\n\\begin{equation}\n\\label{eq7}\nC^{-1}g_u(x)\\leq\\limsup_{k\\to\\infty} |T_{k} u(x)|_p\\leq Cg_u(x)\\quad\n\\mbox{for $\\mu$-a.e. $x\\in X$.}\n\\end{equation}\nFix $u\\in N^{1,p}(X)$. The second inequality in \\eqref{eq7} follows from \\eqref{tkpwest} and the Lebesgue differentiation theorem (Lemma~\\ref{T5}) whenever $x$ is a Lebesgue point of $g_u^p$. 
Indeed, it is immediate from H\\\"older's inequality that $X$ supports a $p$-Poincar\\'e inequality and so, Lemma~\\ref{pwkpoin} implies that the pair $(u,g_u)$ satisfies the $p$-Poincar\\'e inequality \\eqref{poincare}. \n\n\nIt remains to prove the first inequality in \\eqref{eq7}. Our plan in this regard is to apply Lemma~\\ref{pro:T_ku-converges-balls} with $h_k:=4\\sup_{j\\geq k}|T_ju|_p$ and $h:=4\\limsup_{k\\to\\infty} |T_{k} u|_p$ in order to conclude that $h$ is a $p$-weak upper gradient for $u$. To this end, first observe that $h$ and each $h_k$ are clearly nonnegative Borel functions. Moreover, Lemma~\\ref{almostug} implies that if $d(x,y)\\geq 2^{-k}$ for some $x,y\\in X$ and $k\\in\\mathbb{N}$, and $\\gamma$ is a rectifiable curve connecting $x$ and $y$, then\n$$\n|S_ku(x)-S_ku(y)|\\leq \\int_\\gamma 4|T_ku|_p\\,ds\n\\leq \\int_\\gamma h_k\\,ds,\n$$\nwhere $S_ku$ is as in \\eqref{ukdef}. Hence, \\eqref{bwr-249} in Lemma~\\ref{pro:T_ku-converges-balls} holds. \n\nNext, we claim that $\\{h_k\\}_{k\\in\\mathbb{N}}$ converges to $h$ in $L^p(X)$. \n\nSince $g_u$ is a $p$-weak upper gradient of $u$, it is also a $q$-weak upper gradient by Lemma~\\ref{except}. Since $g_u$ is finite $\\mu$-a.e., \\eqref{tkpwest} in Lemma~\\ref{TkBDD} (used here with $q$ in place of $p$) yields\n\\begin{equation}\n\\label{ejr-27}\n|T_ku|_p\\lesssim|T_ku|_q\\lesssim\n\\Bigg(\\,\\,\\mvint_{5\\lambda' B[x]}g_u^q\\,d\\mu\\Bigg)^{1/q}\n\\lesssim\\big(\\mathscr{M}g_u^q\\big)^{1/q}\\quad\\mbox{pointwise on $X$,}\n\\end{equation}\nwhere the implicit constant is independent of $k$. Note that in \\eqref{ejr-27}, the first inequality is a consequence of the fact that all norms on $\\mathbb R^N$ are equivalent, and the last inequality follows from the doubling condition and the definition of $\\mathscr{M}$. Therefore, we have that $h_k\\lesssim\\big(\\mathscr{M}g_u^q\\big)^{1/q}$ pointwise on $X$ for every $k\\in\\mathbb{N}$. 
On the other hand, since $g_u^q\\in L^{p/q}(X)$ and $p/q>1$, the boundedness of $\\mathscr{M}$ on $L^{p/q}(X)$ (Lemma~\\ref{T4}) implies that $h_k\\lesssim(\\mathscr{M}g_u^q)^{1/q}\\in L^p(X)$. Clearly, $\\{h_k\\}_{k\\in\\mathbb{N}}$ converges pointwise to $h$ and so, by Lebesgue's dominated convergence theorem, we have that $h_k\\to h$ in $L^p(X)$. In particular, $\\{h_k\\}_{k\\in\\mathbb{N}}$ converges weakly to $h$ in $L^p(X)$. Therefore, $\\{h_k\\}_{k\\in\\mathbb{N}}$ and $h$ satisfy the hypotheses of Lemma~\\ref{pro:T_ku-converges-balls}, and it follows that $h$ is a $p$-weak upper gradient for ${u}$ which, in turn, implies that $h \\ge g_u$ pointwise $\\mu$-a.e.\\@ in $X$ by the definition of a minimal $p$-weak upper gradient. This completes the proof of the first inequality in \\eqref{eq7} and, in turn, the proof of Theorem~\\ref{ptwisethm}.\n\\end{proof}\n\n\n\n\n", "meta": {"timestamp": "2022-08-31T02:05:00", "yymm": "2208", "arxiv_id": "2208.13932", "language": "en", "url": "https://arxiv.org/abs/2208.13932"}} {"text": "\\section{Introduction}\n\\label{SEC:Intro}\nThe next generation of large and deep continuum radio surveys will produce catalogues with multi-million radio sources.\nThis will have both a huge impact on our understanding of the evolution of galaxies and a large potential for new discoveries.\nThe majority of these surveys will use advanced radio interferometers, including the Australian Square Kilometre Array Pathfinder \\citep[ASKAP:][]{johnston07ASKAP,DeBoer09,hotan21}, \nthe Murchison Widefield Array \\citep[MWA:][]{tingay13,wayth18}, MeerKAT \\citep{jonas16}, the Low Frequency\nArray \\citep[LOFAR:][]{vanharleem13} and the Karl G. 
Jansky Very Large Array \\citep[JVLA:][]{perley11}.\nThese instruments have already shown their capability to survey hundreds of square degrees of radio sky at unprecedented depths.\nCapturing the full potential of these surveys requires transforming the data analysis and interpretation techniques.\n\nHistorically, the greatest scientific discoveries with major telescopes have been serendipitous and lie beyond the original goals of the experiment \\citep[][]{norris15}.\n\\cite{ekers09} finds that in the last 60 years, only seven out of 18 major astronomical discoveries were planned.\nExisting methods for making unexpected discoveries are primarily powered by human intelligence and are not expected to scale up to the massive data volumes of this decade.\nWithout redesigning the search efforts, several unknown radio phenomena may take years to be found, or may never be found.\n\nIn recent years, machine learning has emerged as a powerful tool to model highly non-linear data.\nDepending on the availability of labelled data, machine learning can be performed in a supervised or unsupervised manner.\nFor supervised learning, the model is trained on several examples of input-output pairs.\nSuch a model trained with truth labels is then used to estimate the output from a given input.\nRecently, these machine learning models have shown encouraging results when used to classify radio source morphologies \\citep[e.g.][]{lukic18, alger18,wu19,viera21}.\nHowever, without training labels these models cannot be used in their current form.\nWith multi-million radio detections expected in future surveys, labelling a large training dataset is both expensive and time consuming, which makes it more pertinent to invest in unsupervised learning techniques.\n\nIn the present work, we use a self-organizing map \\citep[SOM;][]{kohonen82} that does not require truth labels and focuses on the recognition of structure in a dataset.\nSOMs have previously been used to classify the radio morphologies 
\\citep[e.g.][]{ralph19,galvin19,galvin20} and very recently to find some of the rarest radio morphologies \\citep[][]{mostert21}.\nFollowing these previous studies, we use an implementation of the SOM that is invariant to affine transformations, e.g. rotation, flipping and scaling of a radio galaxy.\nWe train a SOM using a catalogue of ``complex'' (defined here as all multi-component) sources in ASKAP's Evolutionary Map of the Universe pilot survey \\citep[EMU-PS;][]{norris11, norris21}.\nThe trained SOM is then used to find the most unusual radio sources.\nWe derive a similarity metric for complex sources in EMU-PS as well as in the pilot phase of the Deep Investigation of Neutral Gas Origins survey (DINGO\\footnote{https://dingo-survey.org/}) and the Survey With ASKAP of GAMA-09 + X-ray \\citep[SWAG-X;][]{moss22prep}.\nBased on this similarity metric score, we visually inspect sources with the top 0.5\\% most complex radio morphologies. \nWe present the rarest radio morphologies in the top 0.5\\% complex sources.\nAmong these are peculiar morphologies with unusual radio structures and no corresponding diffuse emission at optical wavelengths.\nWe briefly discuss some of these peculiar sources in the present paper and note that future work should study them in more detail to understand the unconventional physical mechanisms behind their formation.\nIn addition, the rest of the top 0.5\\% complex sources have conventional radio morphologies with known mechanisms of formation.\nWe present a few examples of these sources as well.\n\nThe paper is structured as follows.\nIn Section~\\ref{SEC:Observations}, we describe the ASKAP observations and other multiwavelength datasets we used.\nSection~\\ref{SEC:method} is dedicated to the methods that include data pre-processing, description of SOMs, details about the network training, and the procedure to select peculiar sources.\nIn Section~\\ref{SEC:results}, we present a multiwavelength view of peculiar radio sources 
and examples of conventional sources.\nIn Section~\\ref{SEC:Discussion}, we discuss the overdensity of galaxies near the circular radio sources.\nWe summarise our findings in Section~\\ref{SEC:Summary} and provide directions for future work.\nThroughout this paper, we assume a flat $\\Lambda$CDM cosmology based on \\cite[][]{planck18-1} with $H_0=67.5$~km\\,s$^{-1}$\\,Mpc$^{-1}$ and $\\Omega_{m}=0.315$.\n\n\\section{Observations}\n\\label{SEC:Observations}\nIn this section we describe the radio, infrared and optical observations we used.\n\\subsection{ASKAP Observations}\n\\label{SEC:ASKAP}\nASKAP is a radio telescope located at the Murchison Radio-astronomy Observatory (MRO). \nThe telescope is equipped with the phased array feed \\citep[PAF:][]{hay06} technology that enables high survey speed by virtue of a wide instantaneous field of view. \nASKAP has 36 antennas with a range of baselines. \nMost of these are located within a region of 2.3 km diameter, with the outer six extending the baselines up to 6.4 km \\citep{hotan21}.\nASKAP has recently completed the first all-sky Rapid ASKAP Continuum Survey \\citep[RACS:][]{McConnell20} covering the entire sky south of Declination $+41^{\\circ}$ to a median RMS of about 250~$\\mu$Jy/beam.\nThis has paved the way for subsequent deeper surveys using ASKAP.\n\nOne such survey is the Evolutionary Map of the Universe \\citep[EMU;][]{norris11}, which is planned to observe the entire Southern Sky and is expected to produce a catalogue of as many as 40 million sources\\footnote{Forecast based on the allocated time for the EMU 5-year survey program (see https://www.atnf.csiro.au/projects/askap/commissioning\\_update.html).}.\nProceeding in this direction, the EMU Pilot Survey \\citep[EMU-PS:][]{norris21} was completed in late 2019.\nThe EMU-PS covers 270 deg$^2$ of sky with $301^{\\circ}< {\\rm RA} < 336^{\\circ}$ and $-63^{\\circ}< {\\rm Dec} < -48^{\\circ}$.\nIt consists of 10 tiles with total integration time of $\\sim 10$ hours each, reaching an RMS 
sensitivity of $25-35~\\mu$Jy/beam and a beamwidth of $13^{\\prime\\prime} \\times 11^{\\prime\\prime}$ FWHM.\nThe operating frequency of EMU-PS is between 800 and 1088 MHz centred at 944 MHz.\nThe raw data were processed using the ASKAPsoft pipeline \\citep[][]{whiting17,norris21}. \nAs the survey data consists of ten overlapping tiles, value-added processing was performed to produce a unified image and source catalogue.\nThis includes merging the tiles by taking a weighted average of the data in overlapping regions and convolving the unified image to a common restoring beam size of $18^{\\prime\\prime}$ FWHM to overcome the variations in point spread function (PSF) from beam to beam \\citep[][]{norris21}. \nA catalogue of islands and components is then constructed by running the ``\\textit{Selavy}'' source finder \\citep[e.g.][]{whiting12} on the convolved image.\nThis catalogue contains 220,102 components with 81.3\\% simple (or single component) and 18.7\\% complex (multiple components) sources.\nAs the main goal of the present work is to find a way to streamline the search for new peculiar radio sources, we have limited our analysis to the 41,181 components of complex sources in the catalogue.\n\nThe second survey used here is the Deep Investigation of Neutral Gas Origins pilot survey (DINGO\\footnote{https://dingo-survey.org/}). \nDINGO aims to provide a legacy of deep HI observations out to redshift $z\\sim0.4$. \nThe key science goals of DINGO are to study the evolution of the cosmic HI density and the evolution of galaxies \\citep[][]{meyer09}. 
\nThe central frequency of the survey is 1367 MHz.\nIn the present work, we use 11 DINGO tiles publicly available from the CSIRO ASKAP Science Data Archive (CASDA\\footnote{https://research.csiro.au/casda/}).\nEach tile has a total integration time of $\\sim 8$ hours except for two tiles with $\\sim 6$ hours of integration.\nThe average beamwidth of the survey is $10^{\\prime\\prime} \\times 6^{\\prime\\prime}$ FWHM.\nEach tile was processed using ASKAPsoft with standard continuum settings.\nSeven tiles with Scheduling Block IDs (SBIDs): 10991, 10994, 11000, 11003, 11006, 11010 and 11026 cover the same sky region with $338^{\\circ}< {\\rm RA} < 346^{\\circ}$ and $-36^{\\circ}< {\\rm Dec} < -29^{\\circ}$.\nThese tiles have RMS sensitivity between 49 and $64~\\mu$Jy/beam.\nWeighting the individual tiles proportional to $1/\\rm RMS^2$ we generate an averaged map from these tiles with a final RMS sensitivity near $21~\\mu$Jy/beam.\nIn the same way, tiles with SBIDs 14109 and 14136 covering the area of $217^{\\circ}< {\\rm RA} < 223^{\\circ}$ and $-3^{\\circ}< {\\rm Dec} < +4^{\\circ}$ are also combined to get a second averaged map with final RMS noise of $40~\\mu$Jy/beam.\nA third averaged map is generated combining SBIDs 14055 and 14082 covering the area of $211^{\\circ}< {\\rm RA} < 218^{\\circ}$ and $-3^{\\circ}< {\\rm Dec} < +4^{\\circ}$ with resultant RMS noise of $37~\\mu$Jy/beam.\nSource catalogues are publicly available at CASDA for each of the 11 tiles.\nIn this analysis we use three catalogues that correspond to the three tiles with the lowest RMS noise in that sky area.\nWe then combine these three catalogues by removing duplicate sources in the overlapping regions.\nThe final catalogue has a total number of 34,705 components with 3,841 complex source components.\nWe use source positions given in the catalogues to make cutouts from the averaged maps.\n\nAnother ASKAP survey used in the present work is the Survey With ASKAP of GAMA-09 + X-ray (SWAG-X) which 
as the name suggests is designed to cover the GAMA\\footnote{http://www.gama-survey.org/} and eROSITA\\footnote{https://www.mpe.mpg.de/eROSITA} Final Equatorial-Depth Survey \\citep[eFEDS;][]{brunner21} fields.\nThis survey comprises 13 ASKAP tiles (publicly available at CASDA) for complete coverage of the eFEDS region, with $\\sim 8$ hours integration per tile.\nSimilar to EMU-PS and DINGO, each tile is processed using ASKAPsoft.\nThe average beamwidth of the survey is $14^{\\prime\\prime} \\times 12^{\\prime\\prime}$ FWHM.\nThe frequency band of the survey is centred at 888 MHz.\nThe RMS noise of these 13 tiles ranges from 49 to $64~\\mu$Jy/beam.\nThese tiles cover the sky area with $126^{\\circ}< {\\rm RA} < 146^{\\circ}$ and $-5^{\\circ}< {\\rm Dec} < +8^{\\circ}$.\nWe generated six averaged maps by combining 2-3 tiles for each map and using weights proportional to $1/\\rm RMS^2$.\nThe tile SBIDs used for making averaged maps include: 10132 and 20875; 10108 and 20931; 10123 and 10475; 10135 and 20132; 10126 and 21021; 10137, 10129 and 10486.\nThe resultant RMS noise of these six averaged maps is between 32 and $36~\\mu$Jy/beam.\nWe use six catalogues corresponding to the tiles with the lowest RMS noise in the same sky region.\nWe then combine these six catalogues by removing duplicate sources in the overlapping regions.\nThe combined catalogue has 145,011 components with 21,324 complex source components.\n\nAs mentioned before, our analysis in this paper is limited to the complex sources from all three ASKAP surveys.\nWe use EMU-PS for training the ML model, and the other two surveys are used for inference with the trained model.\nNote that the source catalogues used to get the positions of radio sources in the SWAG-X and the DINGO surveys are from the individual tiles.\nHowever, we use the averaged maps instead of the individual tiles to make image cutouts at the positions of these radio sources.\nThese cutouts are then used to find the peculiar sources using the 
trained ML model and for the figures in the present work.\nDue to lower noise in the averaged maps, it is possible that the complex radio sources detected in the individual tiles have higher signal-to-noise in the averaged maps.\nHere we assume that these catalogues have all the top peculiar complex sources that are detected by our ML method.\nFuture work should verify this by creating source catalogues from the averaged images, which is beyond the scope of the\ncurrent work that is focused on the development of ML method from available catalogues.\n\n\\subsection{Infrared and Optical data}\n\\label{SEC:DES_SDSS}\nWe use the photometric data available for the ASKAP survey regions to identify the infrared and optical sources in the region of circular and peculiar radio objects presented here. \nThe Wide-field Infrared Survey Explorer \\citep[WISE;][]{wright10} is an all sky infrared survey observed in the W1, W2, W3 and W4 bands that correspond to 3.4, 4.6, 12 and 22 $\\mu$m wavelength.\nIn this study, we use only the W1 band from AllWISE \\citep[AllWISE;][]{cutri13} that has a 5$\\sigma$ point source sensitivity of 28 $\\mu$Jy.\nThe optical data were taken from the publicly available 9th data release of the Dark Energy Spectroscopic Instrument's Legacy Imaging Surveys \\citep[DESI LS DR9\\footnote{https://www.legacysurvey.org/dr9/};][]{schlegel21}, the Science Archive Server of Sloan Digital Sky Survey \\citep[SDSS;][]{alam15} and Dark Energy Survey \\citep[DES;][]{abbott18}.\nUnless specified otherwise, we report photometric redshifts from the counterparts in DESI LS DR9 throughout this paper.\n\n\\section{Method}\n\\label{SEC:method}\nThe first crucial step while fitting a machine learning model is to pre-process the data and make it suitable for the machine.\nIn this section, we describe the pre-processing procedure as well as the machine learning technique used here.\n\n\\begin{figure*}\n\\centering\n\\includegraphics[width=18cm, 
scale=0.5]{figures/preprocessing.pdf}\n\\caption{Pre-processing procedure for radio images. From left to right, the first panel shows an ASKAP observed radio image. The second panel shows the full (blue-filled histogram) and clipped (orange-dashed line) distributions of image pixels. Noise is estimated as the standard deviation ($\\sigma$) of the clipped distribution. The third panel shows the segmented islands at positions where pixel values are greater than 3$\\sigma$. Here pixel values are converted to logarithmic scale and Min-Max normalisation is applied. The fourth panel shows the final pre-processed image where a threshold limit on the number of pixels that constitute an island is imposed. This removes most of the noise fluctuations in radio maps.} \n\\label{FIG:Preprocessing}\n\\end{figure*}\n\n\\subsection{Data Pre-processing}\n\\label{SEC:preprocessing}\nThe most important aspect of machine learning is the quality of data used to train models.\nThe high sensitivity of ASKAP surveys creates additional challenges for data pre-processing due to the large source density in survey images.\nWe design the following pre-processing scheme to enhance useful features in the radio images:\n\\begin{itemize}\n \\item We create cutouts from the survey images at the positions of all components of complex radio sources.\n We chose a cutout size of $5^{\\prime} \\times 5^{\\prime}$ as only 11 sources in EMU-PS (i.e. 1 in $\\sim 20,000$) have a size greater than $5^{\\prime}$ \\citep[][]{yew22prep}.\n This gives us a $150\\times 150$ pixel image with pixel size of $2^{\\prime \\prime}$.\n One such cutout is shown in the left panel of Figure~\\ref{FIG:Preprocessing}.\n This map has a faint double lobed radio source in the centre which has a low signal-to-noise ratio.\n \\item We estimate the noise in each cutout. \n This is done by first measuring the Median Absolute Deviation (MAD) of pixel values. \n Two rounds of data clipping are then applied to remove outlying pixels. 
\n The outlier threshold is chosen at $3\\times$MAD.\n The noise is then estimated as the standard deviation of the clipped distribution.\n The second panel of Figure~\\ref{FIG:Preprocessing} shows the full and clipped distributions of image pixels in blue (filled) and orange (dashed) colors.\n \\item We perform an island segmentation for each cutout by generating masks of island sources with pixel values greater than 3$\\sigma$.\n Here $\\sigma$ is defined as the standard deviation of the clipped distribution.\n At the positions of these masks, we convert the pixel values to a logarithmic scale and perform Min-Max normalisation that enhances the signal on the scales of islands.\n The pixel values of the rest of the image are set to zero and the Min-Max normalisation of segmented regions changes the image scale in the range of 0 to 1.\n In the resultant image, shown in the third panel of Figure~\\ref{FIG:Preprocessing},\n the source density is moderately high, and some of the islands may just be noise fluctuations or artefacts.\n \\item To overcome this issue we impose a threshold on the number of pixels that constitute an island in the image.\n This means that we keep only those islands for which the signal is distributed over a large number of pixels.\n After some tests and visual inspections we set the minimum size for an island to 60 pixels.\n This threshold removes most of the noise fluctuations from the maps.\n Note that this limit may also remove some point sources. 
\n However, that does not affect our analysis as the purpose of this study is to discover the most peculiar complex sources.\n The final pre-processed radio image is shown in the right panel of Figure~\\ref{FIG:Preprocessing}.\n\\end{itemize}\n\n\n\\subsection{Self Organizing Map}\n\\label{SEC:SOM}\nA self-organizing map \\citep[SOM;][]{kohonen82} is a neural network that provides an efficient way to understand high-dimensional data.\nThe neural network constructs a representative feature map of the training dataset. \nThis can be used for the tasks of dimensionality reduction and to display similarities among data sets.\nA SOM learns in an unsupervised manner and does not require a target vector for the dataset. \nThis is important for our task as the radio sources that we aim to find are unknown objects.\nAn advantage of using a SOM over other unsupervised architectures is its topology-preserving mapping from input to output spaces.\nThis is important to retain the spatial information of astronomical images.\n\nThe basic unit of the SOM is a neuron $n$. \nA number $N$ of neurons are organized in an input layer and are connected to an output feature map.\nThese connections have associated weights $w$ that are randomly initialised.\nWhile training, data is provided to the input layer and the extracted features are propagated to the output map.\nThe output map has the form of a lattice or grid where each neuron is placed at a position $p$. 
\nEach neuron in the lattice competes with the others to win every subject in the dataset.\nFor a training iteration $i$, a subject $d$ from the dataset $D$ is selected to compute a similarity measure $S(d,w_p)$ with respect to a neuron with prototype weights $w_p$.\nThe winning neuron for $d$ is its Best Matching Unit (BMU) whose position is identified as $k$.\nFollowing this, the prototype weights of the BMU and neighbouring neurons are updated as\n\\begin{equation}\nw_p^{\\prime} = w_p + (\\phi(d)-w_p) \\times G(p,k) \\times L(i),\n\\label{EQ:weights}\n\\end{equation}\nwhere $w_p^{\\prime}$ is the updated weight. The term\n$(\\phi(d)-w_p)$ is required to spatially align $d$ onto $w_p$.\n$G(p,k)$ is the neighbourhood function parametrised as a Gaussian that controls the propagation of weight updates to neighbouring neurons. \nIn principle, the neighbouring neurons of a BMU get smaller updates and the amount depends upon the separation between $k$ and $p$ as well as the chosen width $\\sigma_G$ of the Gaussian. \n$L(i)$ is the learning rate that further controls the weighting updates for each iteration.\n\nSOMs have been used previously in astronomy for classification of light curves and clustering of gigahertz-peaked sources \\citep[e.g.][]{brett04,torniainen08}. \nMore recently, SOMs have been used for the estimation of photometric redshifts in large surveys \\citep[e.g.][]{geach12,wright20}.\nThese datasets are in the form of catalogues of sources.\nThe application of neural networks on the astronomical image datasets requires that the method is invariant to affine transformations.\nExamples of such transformations include translation, scaling, flipping and rotation of images.\nThis means that for sources in an image, e.g. 
double lobed Active Galactic Nuclei (AGN), the algorithm should not be sensitive to their orientation in the sky.\nTo approach a solution, \\cite{ralph19} used a convolutional auto-encoder to reduce the impact of affine transformations for the classification of radio galaxies.\nSimilarly, \\cite{segal22} use auto-encoders to measure the complexity of radio galaxies.\nHowever, the training of the SOM using the compressed latent vector space of auto-encoders results in the loss of topological information.\nIn a different approach, \\cite{polsterer15} developed Parallelized rotation and flipping INvariant Kohonen maps (PINK) to incorporate the transformational invariance into the SOMs.\n\\cite{galvin19} showed that PINK can be an ideal solution to break the degeneracy arising due to affine transformations without losing topological information.\n\\cite{galvin20} further exploited PINK to classify different morphologies of radio sources using the Faint Images of the Radio-Sky at Twenty centimetres \\citep[FIRST;][]{becker95}. \nFollowing this, we use PINK in this analysis to find the rare and unusual radio morphologies.\n\nPINK implements a modified Euclidean distance metric for similarity measure that can be written as\n\\begin{equation}\nS(d, w_k) = \\underset{\\forall \\phi \\in \\Phi}{{\\rm minimize}(\\phi)} \\sqrt{\\sum_{c=0}^C \\sum_{x=0}^X \\sum_{y=0}^Y \\left(w_{k(c,x,y)} - \\phi(d_{c,x,y}) \\right)^2},\n\\label{EQ:similarity}\n\\end{equation}\nwhere $\\phi$ is an affine transformation drawn from a set $\\Phi$ and is optimized to align an image to features in the BMU.\nThis is propagated to update the neighbouring units. $C$ is the number of channels of an image.\nHere we use only one channel. 
$X$ and $Y$ define the pixel size of the image.\nThis optimizes the search for transformation parameters to align $d$ to prototype weights $w_k$ of a SOM.\n\n\\begin{table}[!ht]\n\\centering\n\\begin{center}\n\\begin{tabular}{lccccc}\n\\hline\n\\hline\n\\multicolumn{1}{c}{Stage} & \\multicolumn{1}{c}{Iterations} & \\multicolumn{1}{c}{Rotations} & \\multicolumn{1}{c}{Increments} & \\multicolumn{1}{c}{$\\sigma_G$} & \\multicolumn{1}{c}{$L$} \\\\ \n\\hline\n1 & 5 & 90 &$4^{\\circ}$& 1.5 & 0.1 \\\\\n2 & 5 & 180 &$2^{\\circ}$& 1.0 & 0.05 \\\\\n3 & 5 & 360 &$1^{\\circ}$& 0.7 & 0.05 \\\\\n4 & 10 & 360 &$1^{\\circ}$& 0.5 & 0.005\\\\\n\\hline\n\\end{tabular}\n\\end{center}\n\\caption{Parameters for different stages of training. From left to right are the number of iterations, number of rotations, increment with each rotation, width of $G(p,k)$ and learning rate.}\n\\label{TAB:TRAINING}\n\\end{table}\n\n\\begin{figure}[!ht]\n\\centering\n\\includegraphics[width=8.5cm, scale=0.5]{figures/EMU_Complex_EMU_Channel.pdf}\n\\caption{The trained $10\\times10$ SOM using the complex sources in the EMU-PS.\nThe X- and Y-axes show identities of neurons that are representatives of the best matched radio sources.\nAcross the lattice, these neurons represent resolved radio lobes, extended structures bridged by diffuse emission, and more compact sources.\nThis shows that after 4 stages of training, the SOM represents meaningful radio morphologies.} \n\\label{FIG:SOM}\n\\end{figure}\n\n\\begin{figure}[!ht]\n\\centering\n\\includegraphics[width=8cm, scale=0.5]{figures/BMU_Counts.pdf}\n\\caption{Number counts of complex EMU-PS sources on $10\\times10$ SOM lattice. 
\nThe largest number is associated with the neuron (8,5) with resolved double lobed sources.} \n\\label{FIG:BMUcounts}\n\\end{figure}\n\n\\subsection{Training}\n\\label{SEC:Training}\nWe construct a SOM in a Cartesian lattice space with $10\\times 10$ neurons.\nEach neuron has a circular shape initialised with uniform random noise between 0 and 1.\nThe circular shape preserves the entire region of the image against the affine transformations.\nThis is an improvement over the previous versions of PINK with square-shaped neurons, which resulted in the loss of information\nin the outer regions due to image transformations \\citep[e.g.][]{galvin19, galvin20}.\nThe SOM is trained in four stages with user-defined parameters outlined in Table~\\ref{TAB:TRAINING}.\n\nIn each stage, every subject in the dataset is passed through the network to update prototype weights.\nEach full passage of the dataset through the network is called an iteration.\nThe first three stages each include five iterations over the dataset, and the final stage has 10 iterations.\nAcross all stages, a normalised Gaussian neighbourhood function is used to update the weights of neighbouring neurons.\nThe $1\\sigma$ width of the Gaussian is reduced in every stage, with $\\sigma_G = 1.5$ and 0.5 for the first and final stages, respectively.\nThis helps in establishing a broad set of morphologies across the lattice in the first stage, and in fine-tuning the small-scale structure in the later stages.\nFor the same reason, the first stage requires only a minimal set of rotations.\nThus our first training stage has 90 rotations for each subject in the dataset with increments of $4^{\\circ}$. 
\nThis is increased to 360 rotations, with increments of $1^{\\circ}$, in the third and final stages.\nThe large learning rate and the wide neighbourhood function in the first stage allow the modification of many prototypes with each update.\nThese are subsequently reduced to shrink the region of influence of each prototype weight in later stages.\n\nNote that there are no formal convergence criteria for training a SOM as the algorithm works in an unsupervised way.\nThis makes the manual estimation of the training parameters an important aspect of our analysis.\nWith a small learning rate, the SOM will take a long computational time to train. \nOn the other hand, larger values result in unstable prototype updates.\nSimilarly, a narrow neighbourhood function decouples the neurons from each other, whereas a larger width results in the modification of more prototype weights.\nWe converge on the training parameters for the four stages by experimenting with several possibilities and qualitatively examining the morphologies across the SOM lattice.\nWe also train a SOM with $25\\times 25$ neurons and find no difference in the detection of rare radio morphologies when compared with a SOM of $10\\times 10$ neurons.\n\nIn this analysis, the SOM is trained using the 41,181 components of complex radio sources from the EMU-PS catalogue.\nEach image is centred at the component position and has a cutout size of $150\\times150$ pixels, amounting to a $5^{\\prime}\\times 5^{\\prime}$ field of view.\nThe training of the SOM is carried out on a cluster with 8 GPUs and 64 GB of memory for a total of $\\sim 18$ hours.\n\n\\begin{figure}[!ht]\n\\centering\n\\includegraphics[width=8cm, scale=0.5]{figures/BMU_Euclidean_distance.pdf}\n\\caption{The distributions of Euclidean distance for the EMU-PS (solid green), SWAG-X (dashed blue) and DINGO (dot-dashed red) survey datasets. 
\nThe tails of these distributions (towards the right end) have sources among the rarest and peculiar sources (see Section~\\ref{SEC:selection} for details).} \n\\label{FIG:Eucl_histogram}\n\\end{figure}\n\n\\begin{figure*}[!ht]\n\\centering\n\\includegraphics[width=20cm, scale=0.5]{figures/ORCs/Cutout_314.679483_-57.614637_146_80_16499.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/ORCs/Cutout_315.743789_-62.005714_263_142_15561.pdf}\n\\caption{Previously discovered ORCs located in the EMU-PS fields \\citep[see][and Table~\\ref{TAB:all-orcs}]{norris21b}.\nThe left panels show $12^{\\prime} \\times 12^{\\prime}$ radio images from EMU-PS.\nWe show pre-processed radio images with no threshold on the number of pixels for an island.\nThe larger cutout size helps to rule out the possibility of association with other sources on large scales.\nCentral ORC sky positions, ID numbers for visual inspections and Euclidean distances are noted on these images.\nThe middle panels show radio contours on top of the WISE-W1 infrared images to visualize the nearby infrared sources.\nThe right panels show $5^{\\prime} \\times 5^{\\prime}$ cutouts that is the size of the images used to train the SOM.\nThis shows that our method comfortably detects previously known rare morphologies among the top 0.5\\% sources.\n} \n\\label{FIG:ORCCandidates0}\n\\end{figure*}\n\n\\begin{table*}[!ht]\n\\centering\n\\begin{center}\n\\begin{tabular}{ccccccc}\n\\hline\n\\hline\nName & Integrated radio & RA (Deg). 
& Dec (Deg) & Survey & Reference \\\\\n & flux density (mJy) \\\\\n\\hline\n\\\\\nORC J2102--6200 & 6.26 & 315.7429 & $-62.0044$ & ASKAP & \\citet{norris21b} \\\\\n & & & & (EMU-PS) & \\\\\nORC J2058--5736 & 6.97 & 314.6783 & $-57.6161$ & ASKAP & \\citet{norris21b} \\\\\n & & & & (EMU-PS) & \\\\\nORC J2058--5736 & 1.86 & 314.7346 & $-57.6153$ & ASKAP & \\citet{norris21b} \\\\\n & & & & (EMU-PS) & \\\\\nORC J1555+2726 & -- & 238.8527 & $+27.4427$ & GMRT & \\citet{norris21b} \\\\\nORC J0102--2450 & 3.9 & 015.6016 & $-24.8442$ & ASKAP & \\citet{koribalski21} \\\\\n\\hline\n\\\\\nJ084927.5--045721 & 228.5 & 132.3645 & $-4.956 $ & ASKAP & Present work \\\\\n & & & & (SWAG-X) & \\\\\nJ222339.5--483449 & 17.2 & 335.9145 & $-48.5803$ & ASKAP & Present work \\\\\n & & & & (EMU-PS) & \\\\\n\\hline\n\\hline\n\\end{tabular}\n\\end{center}\n\\caption{Previously known ORCs (top 5 rows) and ORC candidates from present work (bottom 2 rows).\nFrom left to right we show: IDs, names using the approximated centre of diffuse emission, integrated radio flux densities, approximate geometrical centres of these systems, their parent surveys and references.}\n\\label{TAB:all-orcs}\n\\end{table*}\n\n\\subsection{Final SOM \\& Selection of Rare Radio Morphologies}\n\\label{SEC:selection}\nThe final trained SOM is shown in Figure~\\ref{FIG:SOM}.\nAfter four stages of training the SOM appears to show meaningful radio morphologies.\nThese morphologies include resolved radio lobes, extended structures bridged by diffuse emission, and more compact sources.\n\nThe information attached to a neuron can be used to identify all subjects that share this neuron as their BMU.\nA properly trained SOM contains a representative neuron for each subject in the training dataset.\nUsing this information, we map the image dataset on the trained SOM to evaluate the similarity statistics.\nFigure~\\ref{FIG:BMUcounts} shows the number counts of EMU-PS components for each of the neurons in the SOM lattice.\nThe 
lowest number of subjects in the lattice is attached to the neuron (6,7). \nThe largest number is associated with the neuron (8,5) with resolved double lobed sources.\nNote that SOM BMUs are representative of the majority of sources in a sample (the typical radio galaxies). Rare and unusual sources will be much more poorly characterised by the BMUs, leading to a much larger Euclidean distance than for the bulk of sources.\n\nFor an adequately trained SOM, all sources in the dataset have a BMU.\nAs can be noted from the prototypes in the trained SOM lattice in Figure~\\ref{FIG:SOM}, all structures in the neurons can be identified as known morphologies of radio sources.\nThese prototypes can be used to classify these radio sources which is beyond the scope of the present work as here we are focused only on finding the rare radio morphologies.\nThe rare and unusual sources are not expected to be clustered in a single neuron.\nTherefore, we use a similarity measure to identify the most peculiar sources in the dataset.\nWe use the modified Euclidean distance metric to identify these objects.\nNote that the SOM is trained with EMU-PS complex sources only but we map the complex sources from all three surveys on the trained lattice.\n\nWe examine the distributions of Euclidean distances.\nFigure~\\ref{FIG:Eucl_histogram} shows the Euclidean distance histograms for EMU-PS (solid green), SWAG-X (dashed blue) and DINGO (dot-dashed red) complex sources.\nThe median (and standard deviation) of these distributions are 2.1 (2.3), 3.1 (2.4) and 3.2 (2.1) for EMU-PS, SWAG-X and DINGO, respectively.\nWe notice that the SWAG-X and DINGO distributions have higher median Euclidean distances as compared to the EMU-PS.\nThis is possibly due to the differences in observing frequencies, map resolutions and RMS sensitivities of these surveys described in Section~\\ref{SEC:ASKAP} and/or a lower number of complex sources in DINGO and SWAG-X surveys.\nFor each of these distributions, we chose 
a lower limit to the Euclidean distances and visually examine the top 0.5\\% of complex sources for peculiarity.\nWe note that this is a simplistic approach to reduce the number of visual inspections.\nThe choice of the top rarest 0.5\\% leaves us with approximately 200, 100 and 20 sources in the EMU-PS, SWAG-X and DINGO surveys, respectively.\nIn the following sections, we discuss some of these rare radio source morphologies.\n\n\\floatsetup[table]{font=tiny}\n\\begin{table*}[!ht]\n\\centering\n\\begin{center}\n\\begin{tabular}{ccccccccccccc}\n\\hline\n\\hline\n\\\\\n\\multicolumn{1}{c}{Name} &\\multicolumn{1}{c}{RA (deg)} & \\multicolumn{1}{c}{Dec (deg)} & \\multicolumn{1}{c}{Flux (mJy)} & \\multicolumn{1}{c}{Counterparts} & \\multicolumn{1}{c}{$g$} & \\multicolumn{1}{c}{$r$} & \\multicolumn{1}{c}{$i$} & \\multicolumn{1}{c}{W1} & \\multicolumn{1}{c}{W2} & \\multicolumn{1}{c}{W1-W2} & \\multicolumn{1}{c}{$z_{\\rm ph}$} & \\multicolumn{1}{c}{$z_{\\rm spec}$} \\\\\n\\hline\n\\\\\nSWAG-X \\\\\nJ084927.5--045721\\\\\n\\\\\nA & 132.3638 & -4.9588 & 3 & WISEA J084927.33-045732.3 & 16.17 & 15.56 & 15.32 & 13.24 & 13.32 & -0.08 & $0.02\\pm0.05$ & --\\\\% & -- \\\\\n & & & & 2MASS J08492733-0457315 & \\\\\nB & 132.3659 & -4.9614 & 9 & WISEA J084927.80-045741.1 & 17.78 & 16.94 & 16.39 & 12.48 & 12.45 & -0.03 & $0.08\\pm0.01$ & --\\\\% & Galaxy\\\\\n & & & & 2MASX J08492779-0457412 & \\\\\nC & 132.3692 & -4.9542 & 6 & WISEA J084928.60-045715.0 & 18.07 & 17.23 & 16.81 & 12.54 & 12.49 & -0.05 & $0.08\\pm0.02$ & 0.07697\\\\% & Galaxy\\\\\n & & & & 2MASX J08492860-0457152 & \\\\\nD & 132.3684 & -4.9505 & 18 & WISEA J084928.42-045702.1 & 18.36 & 17.48 & 17.01 & 12.69 & 12.69 & 0.00 & $0.08\\pm0.01$ & --\\\\% & -- \\\\\n & & & & 2MASS J08492840-0457017 & \\\\\nE & 132.3607 & -4.9544 & 2 & WISEA J084926.56-045715.9 & 18.85 & 18.1 & 17.68 & 14.34 & 14.31 & 0.03 & $0.09\\pm0.01$ & --\\\\% & -- \\\\\n\n\\\\\n\\hline\n\\\\\nEMU-PS \\\\\nJ222339.5--483449\\\\\n\\\\\nA & 335.9158 & 
-48.5827 &0.06& WISEA J222339.73-483457.9 & 20.78 & 19.35 & 18.87 & 15.71 & 15.45 & 0.26 & $0.34\\pm0.04$ & --\\\\% & -- \\\\\nB & 335.9148 & -48.5903 &0.10& WISEA J222339.53-483524.8 & 18.76 & 17.61 & 17.19 & 15.05 & 14.77 & 0.28 & $0.22\\pm0.02$ & --\\\\% & Galaxy\\\\\n & & & & 2MASS J22233951-4835247 & \\\\\nC & 335.9145 & -48.5803 &0.06& WISEA J222343.07-483440.6 & 19.51 & 18.32 & 17.93 & 14.52 & 14.17 & 0.35 & $0.23\\pm0.01$ & --\\\\% & Galaxy\\\\\n & & & & 2MASS J22234313-4834406 & \\\\\nD & 335.9075 & -48.5785 &0.07& WISEA J222337.80-483442.4 & 21.27 & 19.94 & 19.41 & 15.78 & 15.52 & 0.26 & $0.33\\pm0.04$ & --\\\\% & -- \\\\\n\\\\\n\\hline\n\\hline\n\\end{tabular}\n\\end{center}\n\\caption{Properties of optical and infrared sources near the two new ORC candidates presented in the present work.\nFrom left to right, we show ORC names and prominent optical sources.\nRight Ascension (RA) and Declination (Dec) of these sources.\nIntegrated radio flux density estimated at their positions using ASKAP images.\nThe optical ($gri$) and infrared (W1, W2) photometry for each of the nearby sources.\nPhotometric redshifts from DESI LS DR9 and spectroscopic redshifts where available.\nThe $gri$ information for SWAG-X J084927.5--045721 is taken from Pan-STARRS \\citep{flewelling20} and for EMU-PS J2223-4834 from DES surveys.\nW1, W2 band information is from the WISE survey.\nPhotometric redshifts are taken from DESI LS DR9.\n}\n\\label{TAB:ORC-counterparts}\n\\end{table*}\n\n\n\n\\begin{figure*}[!ht]\n\\centering\n\\includegraphics[width=20cm, scale=0.5]{figures/ORCs/Cutout_132.364541_-4.956006_134_64_21128.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/ORCs/Cutout_335.914468_-48.580295_139_76_23824.pdf}\n\\caption{ORC candidates from present work: SWAG-X J084927.5--045721 (top panels) and EMU-PS J222339.5--483449 (bottom panels).\nRadio continuum images (left panels), radio contours overlaid on WISE-W1 infrared images (middle panels), and smaller cutouts (right 
panels).\nLeft and middle panels have a size of $12^{\\prime} \\times 12^{\\prime}$ and right panels show $5^{\\prime} \\times 5^{\\prime}$ cutouts, which is the same size as used to train the SOM.\nThe left panels are annotated with central sky positions, ID numbers for visual inspections and Euclidean distances.} \n\\label{FIG:ORCCandidates1}\n\\end{figure*}\n\n\n\\begin{figure*}\n\\centering\n\\includegraphics[width=8.5cm, scale=0.5, trim = 0cm 5cm 0cm 5.8cm]{figures/ORCs/Cutout_SWAG-X_DESI-LS_132.364541_-4.956006_134_64_21128v2.pdf}\n\\includegraphics[width=8.5cm, scale=0.5, trim = 0cm 5cm 0cm 5.8cm]{figures/ORCs/Cutout_EMU_DES_335.914468_-48.580295_139_76_23824v2.pdf}\n\\caption{Radio continuum contours overlaid on optical 3-color composite images ($5^{\\prime} \\times 5^{\\prime}$ cutouts).\nThe optical image from DESI LS DR9 is used for SWAG-X J084927.5--045721 (left panel) and the DES image for EMU-PS J222339.5--483449 (right panel).\nSeveral optical/infrared sources are identified near each ORC candidate with counterparts in WISE and 2MASS surveys and are labelled in alphabetical order (see also Table~\\ref{TAB:ORC-counterparts}).\n}\n\\label{FIG:ORCs2}\n\\end{figure*}\n\n\\section{Results}\n\\label{SEC:results}\nIn this section, we present peculiar radio source morphologies among the top 0.5\\% of complex sources along with their observations in optical and infrared bands.\nThe peculiar radio sources have unconventional shapes with no corresponding diffuse emission at optical wavelengths.\nNote that the purpose of this study is to streamline the detection of rare radio morphologies using machine learning.\nFuture work should study each of these in more detail to uncover the mechanisms of their formation.\nIn addition to peculiar sources, we discuss examples of other conventional radio morphologies among the top 0.5\\% of complex sources.\n\n\\subsection{Peculiar Radio Morphologies}\n\\label{SEC:ORCs}\nAmong the peculiar radio morphologies, we find sources with 
nearly circular diffuse radio emission.\nSuch circular shapes are well known in radio images, and they either arise due to imaging artefacts or are real physical structures.\nAmong the known circular structures are supernova remnants, planetary nebulae, circumstellar shells, face-on spiral galaxies and protoplanetary discs.\nIn a recent study, \\cite{norris21b} reported the discovery of a new class of circular features in radio images and named them Odd Radio Circles (ORCs).\nThey reported the discovery of three ORCs in EMU-PS and one in archival data from the Giant Metrewave Radio Telescope \\citep[GMRT;][]{ananthakrishnan01}.\nAnother ORC was discovered by \\cite{koribalski21} using a different ASKAP survey.\nAll of these were identified serendipitously by visual inspection of the radio images (see Table~\\ref{TAB:all-orcs} for the complete list).\nThree out of these five previously discovered ORCs have a central galaxy.\n\nFigure~\\ref{FIG:ORCCandidates0} shows two of the previously discovered ORCs in EMU-PS \\citep[ORC J2102--6200 and ORC J2058--5736;][]{norris21b}, \nand our method places them among the top 0.5\\% of complex sources.\nEach row in the figure has three panels.\nThe left panels show radio images of $12^{\\prime} \\times 12^{\\prime}$ size.\nThroughout the present work, we show pre-processed radio images with no threshold on the number of pixels for an island.\nCentral pixel sky positions, ID numbers for visual inspections and Euclidean distances are noted on these images.\nThe value of the ID increases with decreasing Euclidean distance and sets the order of visual inspection, from most to least peculiar. 
\nFor example, a source with ID $=0$ is termed the most peculiar, with the highest Euclidean distance, and has the highest priority for visual inspection.\nThe maximum value of the ID equals the number of sources in the top 0.5\\% of complex sources.\nThe middle panels show radio contours overlaid on same-sized infrared images from the WISE W1 band.\nThe larger images show that there are no prominent structures near the ORCs with which these objects could be associated (see Section~\\ref{SEC:familiar} for other examples).\nThe right panels show smaller cutouts of $5^{\\prime} \\times 5^{\\prime}$, the size used to train the SOM, with radio contours overlaid on the infrared image.\n\nIn this paper, we present two more ORC candidates that are also among the top 0.5\\% of sources and are similar to other previously known ORCs.\nTable~\\ref{TAB:all-orcs} presents positions and integrated flux densities of previously known ORCs and two ORC candidates from this analysis. \nThese positions correspond to their approximate geometrical centres.\nTable~\\ref{TAB:ORC-counterparts} shows the properties of infrared and optical sources within the extent of the continuum emission of these ORC candidates.\nWe present positions, ASKAP flux densities, counterparts in different surveys, and redshifts.\nThe $gri$ colors and WISE (W1, W2) photometry are also shown.\n\n\n\\begin{figure*}\n\\centering\n\\includegraphics[width=20cm, scale=0.5]{figures/UnusualRadioShapes/Cutout_323.538962_-53.60873_0_0_2188.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/ORCs/Cutout_330.109636_-56.175122_10_7_15520.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/UnusualRadioShapes/Cutout_327.610556_-62.168415_17_10_1734.pdf}\n\\caption{Peculiar radio morphologies in EMU-PS: Radio morphologies other than the ORCs and among the rarest 0.5\\% of sources selected for visual inspections.\nFrom top to bottom, we show three radio sources, namely EMU-PS J213409.5--533631, EMU-PS 
J220026.3--561030 and EMU-PS J215026.5--621006.\nThe description of the panels is same as Figure~\\ref{FIG:ORCCandidates0}.\nBoth left and middle panels are $12^{\\prime} \\times 12^{\\prime}$ large and right panels are of the same size that is used to train the SOM ($5^{\\prime} \\times 5^{\\prime}$). \n} \n\\label{FIG:unusual_radio_shapes-1}\n\\end{figure*}\n\n\n\\begin{figure*}[!ht]\n\\centering\n\\includegraphics[width=20cm, scale=0.5]{figures/ORCs/Cutout_144.514318_-1.879923_63_24_12684.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/ORCs/Cutout_133.143462_6.46686_185_94_5604.pdf}\n\\caption{Peculiar radio morphologies in SWAG-X: Radio morphologies other than the ORCs and among the 0.5\\% sources selected for visual inspections.\nFrom top to bottom we show two radio sources namely SWAG-X J093803.4--015247 and SWAG-X J085234.4+062801.\nThe description of the panels is same as Figure~\\ref{FIG:ORCCandidates0}.\nBoth left and middle panels are $12^{\\prime} \\times 12^{\\prime}$ large and right panels are of the same size that is used to train the SOM ($5^{\\prime} \\times 5^{\\prime}$).See Figure~\\ref{FIG:unusual_radio_shapes-1-3big} and Table~\\ref{TAB:PEC-counterparts} as well.} \n\\label{FIG:unusual_radio_shapes-2}\n\\end{figure*}\n\n\n\n\\begin{figure*}\n\\includegraphics[width=8.cm, scale=0.5, trim = 0cm 4.5cm 0cm 7.cm]{figures/UnusualRadioShapes/Cutout_EMU_DES_323.538962_-53.60873_0_0_2188v2.pdf}\n\\includegraphics[width=8.cm, scale=0.5, trim = 0cm 5cm 0cm 9.8cm]{figures/ORCs/Cutout_EMU_DES_330.116369_-56.175463_19_9_17194v2.pdf}\n\\includegraphics[width=8.cm, scale=0.5, trim = 0cm 5cm 0cm 5.8cm]{figures/UnusualRadioShapes/Cutout_EMU_DES_327.610556_-62.168415_84_41_1974v2.pdf}\n\\includegraphics[width=8.cm, scale=0.5, trim = 0cm 5cm 0cm 5.8cm]{figures/ORCs/Cutout_SWAG-X_SDSS_144.5143_-1.8799_63_24_12684v2.pdf}\n\\includegraphics[width=8.cm, scale=0.5, trim = 0cm 5cm 0cm 
5.8cm]{figures/ORCs/Cutout_SWAG-X_SDSS_133.149135_6.472304_63_28_981v2.pdf}\n\\caption{Peculiar radio morphologies in EMU-PS and SWAG-X: Panels show radio continuum contours overlaid on DES and SDSS 3-color ($gri$) composite images.\nThe peculiar sources are EMU-PS J213409.5--533631 (top left), EMU-PS J220026.3--561030 (top right), EMU-PS J215026.5--621006 (middle left), SWAG-X J093803.4--015247 (middle right) and SWAG-X J085234.4+062801 (bottom).\nWe identify optical/infrared sources near the radio emission for each source labelled with capital letters (see details in Table~\\ref{TAB:PEC-counterparts}).} \n\\label{FIG:unusual_radio_shapes-1-3big}\n\\end{figure*}\n\n\n\\begin{table*}[!ht]\n\\centering\n\\begin{center}\n\\begin{tabular}{ccccccccccccc}\n\\hline\n\\hline\n\\\\\n\\multicolumn{1}{c}{Name} &\\multicolumn{1}{c}{RA (deg)} & \\multicolumn{1}{c}{Dec (deg)} & \\multicolumn{1}{c}{Flux (mJy)} & \\multicolumn{1}{c}{Counterparts} & \\multicolumn{1}{c}{$g$} & \\multicolumn{1}{c}{$r$} & \\multicolumn{1}{c}{$i$} & \\multicolumn{1}{c}{W1} & \\multicolumn{1}{c}{W2} & \\multicolumn{1}{c}{W1-W2} & \\multicolumn{1}{c}{$z_{\\rm ph}$} & \\multicolumn{1}{c}{$z_{\\rm spec}$} \\\\\n\\hline\n\\\\\nEMU-PS \\\\\nJ213409.5--533631 \\\\\n\\\\\nA & 323.5738 & -53.6363 &18 & WISEA J213417.69-533811.1 & 15.24 & 14.29 & 13.90 & 11.49 & 11.48 & 0.01 & $0.07\\pm0.03$ & 0.0763\\\\% & Lenticular\\\\\n & & & & 2MASX J21341775-5338101 & \\\\\nB & 323.5367 & -53.5811 &5.4 & WISEA J213408.81-533451.8 & 16.39 & 15.44 & 15.07 & 12.75 & 12.73 & 0.02 & $0.11\\pm0.06$ & --\\\\% & Galaxy\\\\ \n & & & &2MASX J21340880-5334516 & \\\\\nC & 323.5278 & -53.5719 &0.4 & WISEA J213406.70-533418.7 & 15.29 & 14.35 & 13.97 & 11.74 & 11.71 & 0.03 & $0.08\\pm0.01$ & 0.07836\\\\% & Lenticular\\\\\n & & & & 2MASX J21340666-5334186 & \\\\\n\\\\\n\\hline\n\\\\\nEMU-PS \\\\\nJ220026.3--561030 \\\\\n\\\\\nA & 330.1004 & -56.1782 &110 & WISEA J220024.11-561041.7 & 14.93 & 13.99 & 13.59 & 11.71 & 11.75 & -0.04 & 
$0.05\\pm0.01$ & 0.0757\\\\% & Elliptical\\\\\n & & & & 2MASX J22002408-5610413 & \\\\\nB & 330.1346 & -56.1742 &1 & WISEA J220032.19-561026.0 & 17.20 & 16.25 & 15.86 & 13.59 & 13.58 & 0.01 & $0.08\\pm0.01$ & --\\\\% & Galaxy \\\\ \n & & & & 2MASX J22003234-5610273 & \\\\\n\\\\\n\\hline\n\\\\\nEMU-PS \\\\\nJ215026.5--621006 \\\\\n\\\\\nA & 327.6138 & -62.1703 & 36 & WISEA J215027.29-621013.3 & 15.79 & 14.85 & 14.47 & 12.10 & 12.08 & 0.02 & $0.07\\pm0.01$ & -- \\\\% & Galaxy\\\\\n & & & & 2MASX J21502732-6210129 & \\\\\nB & 327.5745 & -62.1852 & 4 & WISEA J215017.86-621106.4 & 15.66 & 14.77 & 14.39 & 12.28 & 12.27 & 0.01 & $0.06\\pm0.01$ & --\\\\% & Lenticular \\\\ \n & & & & 2MASX J21501790-6211070 & \\\\\nC & 327.6038 & -62.1485 & 3 & WISEA J215024.94-620854.5 & 17.22 & 16.28 & 15.90 & 13.37 & 13.33 & 0.04 & $0.08\\pm0.01$ & --\\\\% & Galaxy \\\\\n & & & & 2MASX J21502489-6208550 & \\\\\n\\\\\n\\hline\n\\\\\nSWAG-X\\\\\nJ093803.4--015247\\\\\n\\\\\nA & 144.5139 & -1.88 & 2 & WISEA J093803.35-015247.9 & 18.44 & 17.06 & 16.56 & 13.94 & 13.65 & 0.29 & $0.22\\pm0.01$ & --\\\\% & -- \\\\\n & & & & 2MASS J09380334-0152480 & \\\\\n\\\\\n\\hline\n\\\\\nSWAG-X \\\\\nJ085234.4+062801 \\\\\n\\\\\nA & 133.149 & 6.4725 & 2.1& WISEA J085235.74+062821.1 & 18.54 & 17.35 & 13.59 & 13.99 & 13.51 & 0.48 & $0.19\\pm0.02$ & 0.15958\\\\% & Galaxy\\\\\n & & & & 2MASX J08523573+0628209 & \\\\\nB & 133.1357 & 6.4605 & 1.2& WISEA J085232.90+062731.9 & 21.84 & 20.8 & 20.49 & 15.55 & 15.8 & -0.25 & $0.18\\pm0.06$ & --\\\\% & Galaxy \\\\ \n & & & & SDSS J085232.91+062731.7 & \\\\\nC & 133.1442 & 6.455 & 0.3& WISEA J085235.31+062720.2 & 22.1 & 20.93 & 20.32 & 13.92 & 13.78 & 0.14 & $0.26\\pm0.05$ & --\\\\% & Galaxy \\\\\n & & & & SDSS J085235.32+062720.5 & \\\\\n\\\\\n\\hline\n\\hline\n\\end{tabular}\n\\end{center}\n\\caption{Properties of optical and infrared sources near the peculiar radio sources other than the ORC candidates.\nThe columns are the same as described in 
Table~\\ref{TAB:ORC-counterparts}.\nThe $gri$ information here for EMU-PS J213409.5--533631, EMU-PS J220026.3--561030 and EMU-PS J215026.5--621006 is taken from DES, and for SWAG-X J093803.4--015247 and SWAG-X J085234.4+062801 from SDSS.}\n\\label{TAB:PEC-counterparts}\n\\end{table*}\n\n\\subsubsection{SWAG-X J084927.5--045721}\n\\label{SEC:ORC-1}\nThis ORC candidate is found in the 888~MHz SWAG-X survey.\nThe top left panel of Figure~\\ref{FIG:ORCCandidates1} shows a radio image of $12^{\\prime} \\times 12^{\\prime}$ size, which shows no sign of association with other surrounding sources. \nIn the top middle panel, radio contours are shown overlaid on the infrared image from WISE band W1.\nThe top right panel shows a smaller cutout with the same size that is used to train the SOM ($5^{\\prime} \\times 5^{\\prime}$).\nThe source has a near-circular shape with a diameter of $\\sim 50^{\\prime \\prime}$.\nThe integrated 888~MHz flux density is 228 mJy.\nThis source is also known as PMN J0849-0457 \\citep[Parkes-MIT-NRAO Surveys;][]{wright94}.\n\nWe identify five optical/infrared sources near the ORC candidate.\nThe left panel of Figure~\\ref{FIG:ORCs2} shows the radio contours overlaid on the DESI LS DR9 composite image using $gri$ optical bands.\nNear the geometrical centre of the ORC candidate, we find a bright optical/infrared source labelled as ``A\", which is\nWISEA~J084927.33-045732.3, also catalogued as 2MASS J08492733-0457315 \\citep{skrutskie06}.\nDESI LS DR9 gives a highly uncertain photometric redshift of $z=0.02\\pm0.05$. \nThe Gaia parallax ($1.9\\pm0.3$ mas) and proper motion ($13.31\\pm0.36$ mas/year) measurements suggest that it is a nearby Galactic star \\citep[][]{brott05}.\nA galaxy labelled as ``B\" is located towards the south-east of ``A\".\nThis galaxy is WISEA~J084927.80-045741.1, also catalogued as 2MASX J08492779-0457412 \\citep{jarrett00}, with\na photometric redshift $z_{\\rm ph} = 0.08\\pm0.01$. 
\n\nTwo more galaxies labelled as ``C\" and ``D\" are located at the north-east edge at photometric redshifts of $0.08\\pm0.02$ and $0.08\\pm0.01$, respectively. \nGalaxy ``C\" is WISEA~J084928.60-045715.0, also catalogued as 2MASX~J08492860-0457152, with $z_{\\rm spec} = 0.07697$ \\citep[][]{jones09}.\nGalaxy ``D\" is WISEA~J084928.42-045702.1, also catalogued as 2MASS~J08492840-0457017.\nOne more galaxy labelled as ``E\" (WISEA J084926.56-045715.9) is located at the north-west edge with $z_{\\rm ph}=0.09\\pm0.01$.\nThe redshifts of these four galaxies are consistent with 0.08, which may also be the redshift of the ORC candidate.\nNote that the detected radio emission of this source resembles that of previously known ORCs. \nHowever, two collimated jets from galaxy ``B\", seen in the Very Large Array Sky Survey (VLASS) 2--4 GHz images\\footnote{http://cutouts.cirada.ca/} \\citep[][]{lacy20}, \nsuggest that it may also be a bent-tail radio galaxy with its two outer tails forming a rare ring-like shape.\nA dedicated study of this radio source may help us to understand the physics of the previously known ORC J2058--5736, which \nalso has ring-shaped radio lobes \\citep[][]{norris21b}.\n\nWe find four galaxy clusters within a $10^{\\prime}$ radius (the closest one at a separation of $\\sim 3^{\\prime}$) of this radio source in the Canada France Hawaii Telescope Legacy Survey (CFHTLS) galaxy cluster catalogue \\citep[][]{durret11}.\nHowever, they are all located at much higher redshifts, between 0.75 and 1.\nWe also look for possible associations with galaxy clusters in the DESI survey \\citep[][]{zou21} and do not find any below $z=0.5$.\nHowever, the cluster catalogued as WHY~J084927.8--045741 with $z=0.0935$ \\citep[][]{wen18} lies within the ASKAP detected emission,\nand includes the group of galaxies seen in the left panel of Figure~\\ref{FIG:ORCs2}.\nIn Section~\\ref{SEC:Discussion}, we discuss a galaxy overdensity around this radio source.\n\n\\subsubsection{EMU-PS 
J222339.5--483449}\n\\label{SEC:ORC-2}\nThis ORC candidate is in the EMU-PS survey field and was also discovered serendipitously \\citep[][]{norris22prep}.\nWe independently rediscover this source using our machine learning technique.\nIt has a near-circular morphology with a diameter of $\\sim 80^{\\prime \\prime}$.\nFrom left to right, the bottom panels of Figure~\\ref{FIG:ORCCandidates1} show the radio continuum image, radio contours overlaid on the WISE-W1 infrared image, and a smaller cutout with the same size that is used to train the SOM.\nThe $12^{\\prime} \\times 12^{\\prime}$ radio continuum image shows that it has no association with any of the extended radio structures in its vicinity.\n\nWe identify four optical/infrared sources near this ORC candidate.\nThe right panel of Figure~\\ref{FIG:ORCs2} shows radio continuum contours overlaid on the DES $gri$-color composite image.\nNear its geometrical centre, we find an optical/infrared source labelled ``A\". \nIt is WISEA J222339.73-483457.9, and DESI LS DR9 gives $z_{\\rm ph}=0.34\\pm0.04$ for it. 
\nIts morphological type is not known, but the colors indicate that it is a passive galaxy.\n\nTowards the north-east edge, we find a galaxy (labelled ``B\") which is WISEA J222339.53-483524.8, also catalogued as 2MASS J22233951-4835247, at $z_{\\rm ph}=0.22\\pm0.02$.\nNear the southern edge, we identify another galaxy (labelled ``C\") which is WISEA J222343.07-483440.6, also catalogued as 2MASS J22234313-4834406, at $z_{\\rm ph}=0.23\\pm0.01$.\nAnother optical/infrared source labelled as ``D\" (WISEA J222337.80-483442.4) is seen due west of the radio source centre, with $z_{\\rm ph}=0.33\\pm0.04$.\n\nWe find one galaxy cluster at a separation of $\\sim 8^{\\prime}$ using the galaxy cluster catalogue from the South Pole Telescope \\citep[SPT;][]{bleem15}.\nThis cluster is both far from the ORC candidate and located at a much higher redshift of 0.65.\nWe also look for possible associations with galaxy clusters in the DESI survey \\citep[][]{zou21} and find one galaxy cluster at a separation of $\\sim 4^{\\prime}$ and $z=0.51\\pm0.02$.\nAs the maximum redshift among all optical/infrared sources is much smaller, this galaxy cluster is not likely to be associated with the ORC candidate.\nIn Section~\\ref{SEC:Discussion} we discuss other possibilities of association.\n\nOther than the ORC candidates, we also find several other peculiar radio morphologies among the top 0.5\\% of sources with the highest Euclidean distance.\nTable~\\ref{TAB:PEC-counterparts} shows the properties of infrared and optical sources near them.\nWe briefly describe these radio sources in the following sections.\n\n\\subsubsection{EMU-PS J213409.5--533631}\n\\label{SEC:PEC-1}\nThis peculiar radio source found in the EMU-PS consists of a group of distorted radio components, collectively known as \nPKS~2130--538 \\citep[][]{otrupcek91}, and nicknamed ``the dancing ghosts\" \\citep[see Figure~21 in][]{norris21}.\nThis radio source has the highest Euclidean distance, which means that our algorithm classifies it as the most peculiar 
source in EMU-PS.\nThe top panels of Figure~\\ref{FIG:unusual_radio_shapes-1} show radio and infrared images.\nThe top left panel of Figure~\\ref{FIG:unusual_radio_shapes-1-3big} shows radio continuum contours overlaid on the DES 3-color ($gri$) composite image ($12^{\\prime} \\times 7^{\\prime}$).\nThese ``dancing ghosts\" are in galaxy cluster ABELL 3785 \\citep[][]{abell89}.\nThe twisted shape of this structure is possibly due to an interaction of an intergalactic wind with radio jets from two supermassive black holes in lenticular galaxies ``A\" and ``C\" \\citep[][]{norris21}. \nThe two galaxies ``A\" and ``C\" shown in Figure~\\ref{FIG:unusual_radio_shapes-1-3big} have reported $z_{\\rm spec} =0.0763$ and $0.07836$, respectively \\citep[][]{lauer14}.\nThe galaxy ``B\" has $z_{\\rm ph} = 0.07444$ \\citep[][]{bilicki14}.\n\n\\subsubsection{EMU-PS J220026.3--561030}\n\\label{SEC:PEC-2}\nThis peculiar radio source also has a high Euclidean distance in the EMU-PS.\nThe middle panels of Figure~\\ref{FIG:unusual_radio_shapes-1} show radio and infrared images.\nThese images suggest a circular morphology in which the radio jets emitted from the galaxy nucleus may have been bent into nearly half circles (analogous to a rotating garden sprinkler).\n\nWe identify two galaxies near this radio source.\nNear the geometrical centre of the structure, we find a bright elliptical galaxy 2MASX~J22002408-5610413 (WISEA J220024.11-561041.7) labelled as ``A\" with $z_{\\rm spec}=0.0757$ \\citep[][]{jones09}.\nAnother galaxy located towards the east, labelled ``B\", is 2MASX J22003234--5610273 (WISEA J220032.19-561026.0),\nwith $z_{\\rm ph} = 0.08\\pm 0.01$.\nThe rich galaxy cluster \\citep[ABELL 3826;][]{abell89} at $z=0.075$ is centred $4.2^{\\prime}$ (or 0.36 Mpc) north-west of the elliptical galaxy ``A\".\nThis suggests that the shape of this radio source is induced by the cluster environment. 
\nFuture work should study the environmental effects leading to this shape in more detail.\n\n\\subsubsection{EMU-PS J215026.5--621006}\n\\label{SEC:PEC-3}\nThis peculiar radio source has a radio core and extended emission towards the west and north-east.\nThe bottom panels of Figure~\\ref{FIG:unusual_radio_shapes-1} show radio and infrared images.\nThe middle left panel of Figure~\\ref{FIG:unusual_radio_shapes-1-3big} shows radio continuum contours overlaid on the DES 3-color ($gri$) composite image ($12^{\\prime} \\times 12^{\\prime}$).\n\nWe identify three galaxies near the radio source.\nThe galaxy labelled as ``A\" at the center of the circular structure is 2MASX J21502732-6210129 (WISEA J215027.29-621013.3)\nat $z_{\\rm ph}=0.07\\pm0.01$.\nThere is a lenticular galaxy (``B\") in the south-west direction where the extended emission towards the west starts and whose jet passes over the circular emission towards the north-east. \nThis galaxy is 2MASX J21501790-6211070 (WISEA J215017.86-621106.4) with $z_{\\rm ph}=0.06\\pm0.01$.\nOne more galaxy labelled as ``C\" (2MASX J21502489-6208550, WISEA J215024.94-620854.5) has $z_{\\rm ph}=0.08\\pm0.01$ and is located towards the north edge of the radio source. 
\nHowever, this galaxy is unlikely to act as a host of any part of the radio emission due to its position.\nWe find a previously identified galaxy group $\\sim 2^{\\prime}$ north-east from ``A\" \\citep[DZ2015 028;][]{diaz15}.\nThis suggests that the emission around the central galaxy is possibly due to emission from the group of galaxies.\nFuture work should study the group environmental effects leading to this radio shape in more detail.\n\n\\begin{figure*}[!ht]\n\\centering\n\\includegraphics[width=20cm, scale=0.5]{figures/ORCs/Cutout_145.823393_5.909243_17_9_2075.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/ORCs/Cutout_219.9101_0.5395_2_2_2737.pdf}\n\\caption{Diffuse radio emission from galaxy clusters: The two sources are in the SWAG-X (top panels) and DINGO (bottom panels) surveys (see Section~\\ref{SEC:galaxyclusters}).\nThe description of the panels is the same as for Figure~\\ref{FIG:ORCCandidates0}.\nThe sky blue square and circle in the top right panel show central BCG positions of galaxy clusters MaxBCG J145.82575+05.91142 and WHL J094322.3+055537, respectively.\nIn the bottom right panel, the sky blue square and circle show BCG positions of galaxy clusters HSCS J143930+003220 and WHL J143934.3+003153, respectively.\nBoth left and middle panels are $12^{\\prime} \\times 12^{\\prime}$ large and right panels are $5^{\\prime} \\times 5^{\\prime}$ large which is the same size that is used to train the SOM.} \n\\label{FIG:Gcl}\n\\end{figure*}\n\n\\subsubsection{SWAG-X J093803.4--015247}\n\\label{SEC:PEC-4}\nThis peculiar radio source is in the SWAG-X field.\nThe top panels of Figure~\\ref{FIG:unusual_radio_shapes-2} show the radio continuum image (left), radio contours overlaid on WISE-W1 infrared image (middle), \nand a smaller cutout with the same size that is used to train the SOM (right).\nThe $12^{\\prime} \\times 12^{\\prime}$ radio continuum image shows that it has no association with any of the nearby extended radio sources.\n\nThe middle right panel of 
Figure~\\ref{FIG:unusual_radio_shapes-1-3big} shows radio continuum contours overlaid on an SDSS 3-color ($gri$) composite image.\nWe find an optical/infrared object labelled as ``A\" (2MASS J09380334-0152480, WISEA J093803.35-015247.9) with $z_{\\rm ph}=0.22\\pm0.01$ near the geometrical center of the source.\nThis radio structure with a bright source at its center is possibly an end-on remnant radio galaxy, though it shows indications of a partial outer ring in radio emission similar to ORCs.\nFuture work should study this morphology in more detail.\n\n\\begin{figure*}[!ht]\n\\centering\n\\includegraphics[width=20cm, scale=0.5]{figures/Resolved_galaxies/Cutout_327.326496_-60.709897_8_5_2207.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/Resolved_galaxies/Cutout_145.516506_0.336339_48_16_14159.pdf}\n\\caption{Resolved star forming galaxies in the EMU-PS survey: Top panels show NGC 7125, a spiral galaxy located at $z=0.01$. \nBottom panels show NGC 2967, a face-on star forming spiral galaxy at $z=0.0063$.\nThe description of the panels is the same as for Figure~\\ref{FIG:ORCCandidates0}.\nBoth left and middle panels are $12^{\\prime} \\times 12^{\\prime}$ large and right panels are $5^{\\prime} \\times 5^{\\prime}$ large which is the same size that is used to train the SOM.} \n\\label{FIG:Resolved_galaxies}\n\\end{figure*}\n\n\\begin{figure*}[!ht]\n\\centering\n\\includegraphics[width=20cm, scale=0.5]{figures/WATs/Cutout_322.32228_-50.870756_1_1_11809.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/WATs/Cutout_327.124216_-57.231407_61_37_25904.pdf}\n\\caption{Bent-tail (BT) radio galaxies in the EMU-PS survey: Top and bottom panels show BT galaxies at $z=0.079$ and $z=0.081$, respectively.\nThe description of the panels is the same as for Figure~\\ref{FIG:ORCCandidates0}.\nBoth left and middle panels are $12^{\\prime} \\times 12^{\\prime}$ large and right panels are $5^{\\prime} \\times 5^{\\prime}$ large which is the same size that is used to train the SOM.} 
\n\\label{FIG:WATs}\n\\end{figure*}\n\n\\subsubsection{SWAG-X J085234.4+062801}\n\\label{SEC:PEC-5}\nThis peculiar radio morphology is also in the SWAG-X field.\nFrom left to right, the bottom panels of Figure~\\ref{FIG:unusual_radio_shapes-2} show the radio continuum image, radio contours overlaid on WISE-W1 infrared image, and a smaller cutout with the same size that is used to train the SOM.\n\nWe identify three galaxies near the edges of this structure.\nThe bottom panel of Figure~\\ref{FIG:unusual_radio_shapes-1-3big} shows radio continuum contours overlaid on an SDSS 3-color ($gri$) composite image.\nTowards the north edge, we find a galaxy ``A\" (2MASX J08523573+0628209, WISEA J085235.74+062821.1) at $z_{\\rm spec} = 0.15958$ (from SDSS).\nNear the south-west edge, we identify a galaxy ``B\" (SDSS J085232.56+062737.6, WISEA J085232.90+062731.9) at $z_{\\rm ph}=0.18\\pm0.06$.\nAnother optical/infrared object ``C\" lies due south-east (2MASS J08523531+0627206, WISEA J085235.31+062720.2) and has $z_{\\rm ph}=0.26\\pm0.05$. \nThe Gaia parallax ($3.5\\pm 0.1$ mas) and proper motion ($42.9\\pm 0.1$ mas/year) measurements suggest that it is a star.\n\nThe radio emission appears to be dominated by the two overlapping bright galaxies ``A\" and ``B\".\nIn fact, galaxy ``A\", with mostly compact radio emission, appears to be hosting a bent-tail jet that points toward ``B\", making a half circle.\nThe circular diffuse emission is possibly associated with ``A\" and/or ``B\".\nTwo arcminutes north of galaxy ``A\", there is an extended radio source which appears to be unrelated to the diffuse emission from this source.\n\n\\subsection{Conventional Radio Morphologies}\n\\label{SEC:familiar}\nThe ORC candidates and other peculiar radio sources discussed in the previous section are the most unusual radio morphologies in the three ASKAP pilot surveys. 
\nThe rest of the top 0.5\\% radio sources have standard morphologies with known mechanisms of formation.\nThese conventional sources include the diffuse emission from galaxy clusters, resolved star forming galaxies, bent-tailed galaxies and Fanaroff-Riley sources.\nThese sources generally have more complex shapes and larger extent compared to the typical radio galaxies, and therefore have higher Euclidean distances than the rest of the data.\nThe discussion of all of these sources is out of the scope of the present work.\nHowever, we present some representative examples of these sources in this section.\n\n\\begin{figure*}[!ht]\n\\centering\n\\includegraphics[width=20cm, scale=0.5]{figures/FRI/Cutout_327.8733_-55.3381_2_2_13149.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/FRI/Cutout_311.475816_-51.075544_4_3_12499.pdf}\n\\caption{FR-I radio galaxies in EMU-PS: the top and bottom panels show bright extended radio sources with host galaxies 2MASX J21512991-5520124 at $z=0.0388$ and \n2MASX J20455226-5106267 at $z=0.0485$, respectively.\nThe description of the panels is the same as for Figure~\\ref{FIG:ORCCandidates0}.\nThe top left panel is $12^{\\prime} \\times 12^{\\prime}$, and the bottom left panel is $25^{\\prime} \\times 25^{\\prime}$ large. 
\nThe right panels are $5^{\\prime} \\times 5^{\\prime}$ large which is the same size that is used to train the SOM.} \n\\label{FIG:FRI}\n\\end{figure*}\n\n\\begin{figure*}[!ht]\n\\centering\n\\includegraphics[width=20cm, scale=0.5]{figures/FRII/Cutout_343.443638_-34.91259_1_1_825.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/FRII/Cutout_135.671411_3.545205_164_81_3855.pdf}\n\\includegraphics[width=20cm, scale=0.5]{figures/FRII/Cutout_324.254392_-61.329859_33_21_597.pdf}\n\\caption{FR-II radio galaxies in the DINGO, SWAG-X and EMU-PS surveys: Top, middle and bottom panels show GRGs with host galaxies 2MASS J22533602-3455305 at $z_{\\rm sp}=0.2115$, 2MASS J09022915+0332041 at $z=0.25$ and 2MASX J21365159-6125128 at $z_{\\rm sp}=0.1249$, respectively.\nThe description of the panels is the same as for Figure~\\ref{FIG:ORCCandidates0}.\nBottom left panels are $18^{\\prime} \\times 18^{\\prime}$, and others are $12^{\\prime} \\times 12^{\\prime}$ large. \nRight panels are $5^{\\prime} \\times 5^{\\prime}$ large which is the same size that is used to train the SOM.} \n\\label{FIG:FRII}\n\\end{figure*}\n\n\\subsubsection{Diffuse emission from galaxy clusters}\n\\label{SEC:galaxyclusters}\nGalaxy clusters are usually detected in microwave \\citep[e.g.][]{planck13-29, bleem19, hilton21}, X-ray \\citep[e.g.][]{piffaretti11, liu21} and optical \\citep[e.g.][]{rykoff16} wavelengths.\nGalaxy clusters are known to have an overdensity of radio sources as compared to the field \\citep[e.g.][]{coble07, gupta17a, gupta20b}.\nRecently, a growing number of galaxy clusters have been found to have sources with diffuse radio emission.\nThese sources are classified as radio halos, radio shocks (relics), and revived AGN fossil plasma sources \\citep[e.g.][]{weeren19, giovannini20}. 
\nWith the higher sensitivity of the new generation of radio telescopes like ASKAP, we expect to see diffuse emission from galaxy clusters.\nIn Figure~\\ref{FIG:Gcl}, we show two such systems at very high Euclidean distances from the SWAG-X and DINGO surveys.\n\nThe top panels show diffuse emission from the galaxy cluster MaxBCG J145.82575+05.91142 identified in the SDSS survey using the maxBCG red-sequence method \\citep[][]{koester07a}.\nThe sky-blue square in the right panel of the figure shows its brightest cluster galaxy (BCG).\nThis cluster is located at $z=0.094$ \\citep[][]{rozo15}. \nLess than $2^{\\prime}$ north-east of this system, there is another known galaxy cluster located at $z=0.334$ \\citep[WHL J094322.3+055537;][]{wen12, wen15}.\nThe sky-blue circle in the right panel of the figure shows its brightest cluster galaxy (BCG).\n\nThe bottom panels show a rare diffuse radio emission possibly from two galaxy clusters at different redshifts.\nThe radio emission has the highest Euclidean distance score, which means that it is the most peculiar source in the DINGO survey.\nWe find galaxy clusters HSCS J143936+003231 \\citep[$z=0.108$;][]{oguri18} and WHL J143934.3+003153 \\citep[$z=0.15$;][]{wen15} towards the north-west and west edges of the diffuse emission.\nThe sky-blue cross and circle in the right panel of the figure show BCGs of HSCS J143936+003231 and WHL J143934.3+003153, respectively.\nIt is not clear whether both or only one of these clusters have diffuse emission towards the east of their central BCG positions.\nFuture dedicated work should study the radio emission from these galaxy clusters in more detail.\n\n\\subsubsection{Resolved star forming galaxies}\n\\label{SEC:resolvedGl}\nNearby edge-on and face-on star forming galaxies are usually detected in radio continuum images and H$\\alpha$ emission lines \\citep[e.g.][]{pogge93, colbert96}.\nIn all cases, infrared and radio continuum emissions are known to be correlated 
\\citep[e.g.][]{murphy06, vlahakis07, garn09, lacki10}.\nThe radio emission associated with these resolved galaxies has two well known components that correlate with the star formation rate, i.e., the synchrotron emission from relativistic electrons accelerated by supernova remnants and the free--free emission emerging directly from H-II regions containing massive ionizing stars \\citep[e.g.][]{condon92, murphy11, kennicutt12}.\n\nAmong the 0.5\\% sources at high Euclidean distances, we find many edge-on and face-on star forming galaxies.\nFigure~\\ref{FIG:Resolved_galaxies} shows two such resolved star-forming galaxies in the EMU-PS survey.\nThe top panels show NGC~7125, a spiral galaxy located at $z_{\\rm sp}=0.0105$ \\citep[][]{wong06}.\nThis galaxy is also a part of the galaxy group PGC1 0067418 NED002 \\citep[][]{kourkchi17}.\nThe bottom panels show NGC~2967, a face-on star forming spiral galaxy at $z_{\\rm sp}=0.0063$ \\citep[][]{couto06}.\nThe star formation properties of the inner ring are known to be independent of the ring shape of this source \\citep[][]{grouchy10}.\n\n\\subsubsection{Bent-tailed sources}\n\\label{SEC:bent-tail}\nBent-Tailed (BT) radio sources are those where radio lobes and jets are not aligned linearly with the host galaxy.\nThese sources are broadly classified into Wide-Angle Tail (WAT) and Narrow-Angle Tail (NAT) radio galaxies. 
\nWATs are usually associated with central cluster galaxies and possess a pair of well-collimated jets with small opening angles ($\\leq 60^{\\circ}$).\nNATs have plumes of emission which are bent to such a degree\nthat their whole radio structure lies on one side of the optical host galaxy.\nBT radio galaxies are exclusively found in the most dense environments like galaxy clusters or groups \\citep[e.g.][]{mao09}.\nThe peculiar morphology of BT radio galaxies is typically a result of ram pressure stripping due to the relative movement of the host galaxy through an intra-cluster or intra-group medium \\citep[e.g.][]{gunn72, miley72, eilek84, sakelliou2000}.\n\nSeveral BT galaxies appear at high Euclidean distances among the top 0.5\\% sources.\nFigure~\\ref{FIG:WATs} shows two such galaxies in the EMU-PS survey.\nThe top panels show a BT radio galaxy near the ABELL 3771 cluster at $z=0.075$ \\citep[][]{martinez14}.\nThe bottom panels show another BT galaxy at $z=0.081$.\n\n\\subsubsection{FR-I and FR-II sources}\n\\label{SEC:FRI-II}\nThe morphologies of extended radio emission of radio galaxies are typically classified into two broad categories: \nFanaroff-Riley Class I (FR-I) and Class II (FR-II) sources \\citep[][]{fanaroff74}.\nFR-I radio galaxies generally have lower radio brightness with increasing distance from the host galaxy.\nFR-II radio galaxies often have linear jets that terminate in hotspots of large radio lobes.\nThus, FR-I and FR-II radio galaxies are typically described as edge-darkened and edge-brightened Active Galactic Nuclei (AGN), respectively.\n\nWe find several large scale FR-I sources among the 0.5\\% sources with largest Euclidean distances, and in\nFigure~\\ref{FIG:FRI} we show the two topmost such FR-I sources, both found in the EMU-PS survey.\nThe top panels show a bright FR-I source with a total projected angular size of $\\sim 12^{\\prime}$. 
\nThe host galaxy, 2MASX J21512991-5520124 with $z_{\\rm sp}=0.0388$ \\citep[][]{hernan95}, is located in \nthe galaxy cluster MCXC J2151.3-5521 \\citep[][]{piffaretti11}.\nThe bottom panels show another FR-I radio source with host galaxy 2MASX J20455226-5106267 located at $z_{\\rm sp}=0.0485$ \\citep[][]{jones09} and radio emission extending over $\\sim 12^{\\prime}$.\nNote that the cutouts used to train the SOM are shown in the right panels and are too small to cover the full continuum emission of FR-I sources.\nAlthough the radio emission fills these cutouts to a large extent, we still find these sources at the highest Euclidean distances.\n\nSeveral FR-II galaxies are also found among the top 0.5\\% sources. \nFigure~\\ref{FIG:FRII} shows three giant radio galaxies (GRGs) in the DINGO, SWAG-X, and EMU-PS surveys.\nAll these sources appear at high Euclidean distances, although the information that makes them peculiar to the machine learning algorithm comes from the edge-brightened hotspots, as shown in the right panels.\nThe top panels show an FR-II source with largest angular size (LAS) $= 4.9^{\\prime}$ and projected largest linear size (LLS) of 1~Mpc. \nThe host galaxy 2MASS J22533602-3455305 is located at $z_{\\rm sp}=0.2115$ \\citep[][]{colless03}.\nThis GRG in Abell 3936 has been studied in detail by \\cite{seymour20}.\nIt shows continuous emission towards the east and a detached lobe towards the west.\nThe middle panels show linear radio jets from another FR-II source with host SDSS J090229.15+033204.3 (2MASS J09022915+0332041) at $z_{\\rm ph}=0.25$, LAS~$= 6.8^{\\prime}$ and LLS~$=1.6$~Mpc.\nThis source is a restarted radio galaxy that exhibits, in addition to the outer lobes, more recent double-lobed emission near the central galaxy.\nThe bottom panels show another FR-II source with potential host 2MASX J21365159-6125128 at $z_{\\rm sp}=0.1249$ \\citep[][]{colless03}, LAS~$= 11.1^{\\prime}$ and LLS~$=1.49$~Mpc. 
\nThe other potential host, 2MASS J21370099-6119472 at $z_{\\rm ph}=0.277\\pm0.054$ (DESI DR9), lies close to the north-east lobe and would lead to LLS~$=2.56$~Mpc.\n\n\\section{Environment of ORC Candidates}\n\\label{SEC:Discussion}\nThe three previously known ORCs (ORC J2102--6200, ORC J2058--573 and ORC J0102--2450, see Table~\\ref{TAB:all-orcs}) either lie in a significant overdensity or have a close companion \\citep[][]{norris21c, norris22}.\nThis suggests that the environment may be important in their formation.\nFor the two ORC candidates from the present work, we look for possible associations with low redshift galaxy clusters in the Planck \\citep[][]{planck13-29}, Dark Energy Spectroscopic Instrument \\citep[DESI;][]{zou21} and Meta-catalogue of X-Ray Detected Clusters of Galaxies \\citep[MCXC;][]{piffaretti11} catalogues.\nWe do not find any galaxy cluster candidate within $10^{\\prime}$ of the centres of the ORC candidates in the redshift range of their optical sources (see Table~\\ref{TAB:ORC-counterparts}).\n\n\\begin{figure\n\\centering\n\\includegraphics[width=7.2cm, scale=0.5]{figures/SWAG_X_J084927.5-045721_A.pdf}\n\\includegraphics[width=7.2cm, scale=0.5]{figures/SWAG_X_J084927.5-045721_B.pdf}\n\\includegraphics[width=7.2cm, scale=0.5]{figures/EMU-PS_J222339.5-483449_A.pdf}\n\\includegraphics[width=7.2cm, scale=0.5]{figures/EMU-PS_J222339.5-483449_B.pdf}\n\\caption{Number of DESI DR8 galaxies in a circle of $5^{\\prime}$ radius centered at the sources in ORC candidates (red circles). 
\nThe top two panels show the galaxy number density around SWAG-X J084927.5-045721 ``A\" and ``B\", and the bottom two panels for EMU-PS J222339.5--483449 ``A\" and ``B\" objects.\nFor comparison with field densities around these sources, we show galaxy counts in circles of the same radius sliding over the RA range indicated on the X-axis but keeping the Dec fixed (black dashed lines).\nThe green dot-dashed lines show average number of galaxies in the RA range.\nGiven the redshift uncertainties of ORC candidate sources (Table~\\ref{TAB:ORC-counterparts}), we restrict DESI galaxies within $z<0.07$, $0.07