Commit fa3a8ef by patrickvonplaten
1 Parent(s): e794dfc
Update README.md
README.md CHANGED
@@ -17,3 +17,713 @@ This is the official *led-large-16384* checkpoint that is fine-tuned on the arXi
|
|
17 |
## Evaluation on downstream task
|
18 |
|
19 |
[This notebook](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing) shows how *led-large-16384-arxiv* can be evaluated on the [arxiv dataset](https://huggingface.co/datasets/scientific_papers)
|
20 |
+
|
21 |
+
## Usage
|
22 |
+
|
23 |
+
The model can be used as follows. The input is taken from the test data of the [arxiv dataset](https://huggingface.co/datasets/scientific_papers).
|
24 |
+
|
25 |
+
```python
|
26 |
+
LONG_ARTICLE = """"for about 20 years the problem of properties of
|
27 |
+
short - term changes of solar activity has been
|
28 |
+
considered extensively . many investigators
|
29 |
+
studied the short - term periodicities of the
|
30 |
+
various indices of solar activity . several
|
31 |
+
periodicities were detected , but the
|
32 |
+
periodicities about 155 days and from the interval
|
33 |
+
of @xmath3 $ ] days ( @xmath4 $ ] years ) are
|
34 |
+
mentioned most often . first of them was
|
35 |
+
discovered by @xcite in the occurence rate of
|
36 |
+
gamma - ray flares detected by the gamma - ray
|
37 |
+
spectrometer aboard the _ solar maximum mission (
|
38 |
+
smm ) . this periodicity was confirmed for other
|
39 |
+
solar flares data and for the same time period
|
40 |
+
@xcite . it was also found in proton flares during
|
41 |
+
solar cycles 19 and 20 @xcite , but it was not
|
42 |
+
found in the solar flares data during solar cycles
|
43 |
+
22 @xcite . _ several autors confirmed above
|
44 |
+
results for the daily sunspot area data . @xcite
|
45 |
+
studied the sunspot data from 18741984 . she found
|
46 |
+
the 155-day periodicity in data records from 31
|
47 |
+
years . this periodicity is always characteristic
|
48 |
+
for one of the solar hemispheres ( the southern
|
49 |
+
hemisphere for cycles 1215 and the northern
|
50 |
+
hemisphere for cycles 1621 ) . moreover , it is
|
51 |
+
only present during epochs of maximum activity (
|
52 |
+
in episodes of 13 years ) .
|
53 |
+
similarinvestigationswerecarriedoutby + @xcite .
|
54 |
+
they applied the same power spectrum method as
|
55 |
+
lean , but the daily sunspot area data ( cycles
|
56 |
+
1221 ) were divided into 10 shorter time series .
|
57 |
+
the periodicities were searched for the frequency
|
58 |
+
interval 57115 nhz ( 100200 days ) and for each of
|
59 |
+
10 time series . the authors showed that the
|
60 |
+
periodicity between 150160 days is statistically
|
61 |
+
significant during all cycles from 16 to 21 . the
|
62 |
+
considered peaks were remained unaltered after
|
63 |
+
removing the 11-year cycle and applying the power
|
64 |
+
spectrum analysis . @xcite used the wavelet
|
65 |
+
technique for the daily sunspot areas between 1874
|
66 |
+
and 1993 . they determined the epochs of
|
67 |
+
appearance of this periodicity and concluded that
|
68 |
+
it presents around the maximum activity period in
|
69 |
+
cycles 16 to 21 . moreover , the power of this
|
70 |
+
periodicity started growing at cycle 19 ,
|
71 |
+
decreased in cycles 20 and 21 and disappered after
|
72 |
+
cycle 21 . similaranalyseswerepresentedby + @xcite
|
73 |
+
, but for sunspot number , solar wind plasma ,
|
74 |
+
interplanetary magnetic field and geomagnetic
|
75 |
+
activity index @xmath5 . during 1964 - 2000 the
|
76 |
+
sunspot number wavelet power of periods less than
|
77 |
+
one year shows a cyclic evolution with the phase
|
78 |
+
of the solar cycle.the 154-day period is prominent
|
79 |
+
and its strenth is stronger around the 1982 - 1984
|
80 |
+
interval in almost all solar wind parameters . the
|
81 |
+
existence of the 156-day periodicity in sunspot
|
82 |
+
data were confirmed by @xcite . they considered
|
83 |
+
the possible relation between the 475-day (
|
84 |
+
1.3-year ) and 156-day periodicities . the 475-day
|
85 |
+
( 1.3-year ) periodicity was also detected in
|
86 |
+
variations of the interplanetary magnetic field ,
|
87 |
+
geomagnetic activity helioseismic data and in the
|
88 |
+
solar wind speed @xcite . @xcite concluded that
|
89 |
+
the region of larger wavelet power shifts from
|
90 |
+
475-day ( 1.3-year ) period to 620-day ( 1.7-year
|
91 |
+
) period and then back to 475-day ( 1.3-year ) .
|
92 |
+
the periodicities from the interval @xmath6 $ ]
|
93 |
+
days ( @xmath4 $ ] years ) have been considered
|
94 |
+
from 1968 . @xcite mentioned a 16.3-month (
|
95 |
+
490-day ) periodicity in the sunspot numbers and
|
96 |
+
in the geomagnetic data . @xcite analysed the
|
97 |
+
occurrence rate of major flares during solar
|
98 |
+
cycles 19 . they found a 18-month ( 540-day )
|
99 |
+
periodicity in flare rate of the norhern
|
100 |
+
hemisphere . @xcite confirmed this result for the
|
101 |
+
@xmath7 flare data for solar cycles 20 and 21 and
|
102 |
+
found a peak in the power spectra near 510540 days
|
103 |
+
. @xcite found a 17-month ( 510-day ) periodicity
|
104 |
+
of sunspot groups and their areas from 1969 to
|
105 |
+
1986 . these authors concluded that the length of
|
106 |
+
this period is variable and the reason of this
|
107 |
+
periodicity is still not understood . @xcite and +
|
108 |
+
@xcite obtained statistically significant peaks of
|
109 |
+
power at around 158 days for daily sunspot data
|
110 |
+
from 1923 - 1933 ( cycle 16 ) . in this paper the
|
111 |
+
problem of the existence of this periodicity for
|
112 |
+
sunspot data from cycle 16 is considered . the
|
113 |
+
daily sunspot areas , the mean sunspot areas per
|
114 |
+
carrington rotation , the monthly sunspot numbers
|
115 |
+
and their fluctuations , which are obtained after
|
116 |
+
removing the 11-year cycle are analysed . in
|
117 |
+
section 2 the properties of the power spectrum
|
118 |
+
methods are described . in section 3 a new
|
119 |
+
approach to the problem of aliases in the power
|
120 |
+
spectrum analysis is presented . in section 4
|
121 |
+
numerical results of the new method of the
|
122 |
+
diagnosis of an echo - effect for sunspot area
|
123 |
+
data are discussed . in section 5 the problem of
|
124 |
+
the existence of the periodicity of about 155 days
|
125 |
+
during the maximum activity period for sunspot
|
126 |
+
data from the whole solar disk and from each solar
|
127 |
+
hemisphere separately is considered . to find
|
128 |
+
periodicities in a given time series the power
|
129 |
+
spectrum analysis is applied . in this paper two
|
130 |
+
methods are used : the fast fourier transformation
|
131 |
+
algorithm with the hamming window function ( fft )
|
132 |
+
and the blackman - tukey ( bt ) power spectrum
|
133 |
+
method @xcite . the bt method is used for the
|
134 |
+
diagnosis of the reasons of the existence of peaks
|
135 |
+
, which are obtained by the fft method . the bt
|
136 |
+
method consists in the smoothing of a cosine
|
137 |
+
transform of an autocorrelation function using a
|
138 |
+
3-point weighting average . such an estimator is
|
139 |
+
consistent and unbiased . moreover , the peaks are
|
140 |
+
uncorrelated and their sum is a variance of a
|
141 |
+
considered time series . the main disadvantage of
|
142 |
+
this method is a weak resolution of the
|
143 |
+
periodogram points , particularly for low
|
144 |
+
frequences . for example , if the autocorrelation
|
145 |
+
function is evaluated for @xmath8 , then the
|
146 |
+
distribution points in the time domain are :
|
147 |
+
@xmath9 thus , it is obvious that this method
|
148 |
+
should not be used for detecting low frequency
|
149 |
+
periodicities with a fairly good resolution .
|
150 |
+
however , because of an application of the
|
151 |
+
autocorrelation function , the bt method can be
|
152 |
+
used to verify a reality of peaks which are
|
153 |
+
computed using a method giving the better
|
154 |
+
resolution ( for example the fft method ) . it is
|
155 |
+
valuable to remember that the power spectrum
|
156 |
+
methods should be applied very carefully . the
|
157 |
+
difficulties in the interpretation of significant
|
158 |
+
peaks could be caused by at least four effects : a
|
159 |
+
sampling of a continuos function , an echo -
|
160 |
+
effect , a contribution of long - term
|
161 |
+
periodicities and a random noise . first effect
|
162 |
+
exists because periodicities , which are shorter
|
163 |
+
than the sampling interval , may mix with longer
|
164 |
+
periodicities . in result , this effect can be
|
165 |
+
reduced by an decrease of the sampling interval
|
166 |
+
between observations . the echo - effect occurs
|
167 |
+
when there is a latent harmonic of frequency
|
168 |
+
@xmath10 in the time series , giving a spectral
|
169 |
+
peak at @xmath10 , and also periodic terms of
|
170 |
+
frequency @xmath11 etc . this may be detected by
|
171 |
+
the autocorrelation function for time series with
|
172 |
+
a large variance . time series often contain long
|
173 |
+
- term periodicities , that influence short - term
|
174 |
+
peaks . they could rise periodogram s peaks at
|
175 |
+
lower frequencies . however , it is also easy to
|
176 |
+
notice the influence of the long - term
|
177 |
+
periodicities on short - term peaks in the graphs
|
178 |
+
of the autocorrelation functions . this effect is
|
179 |
+
observed for the time series of solar activity
|
180 |
+
indexes which are limited by the 11-year cycle .
|
181 |
+
to find statistically significant periodicities it
|
182 |
+
is reasonable to use the autocorrelation function
|
183 |
+
and the power spectrum method with a high
|
184 |
+
resolution . in the case of a stationary time
|
185 |
+
series they give similar results . moreover , for
|
186 |
+
a stationary time series with the mean zero the
|
187 |
+
fourier transform is equivalent to the cosine
|
188 |
+
transform of an autocorrelation function @xcite .
|
189 |
+
thus , after a comparison of a periodogram with an
|
190 |
+
appropriate autocorrelation function one can
|
191 |
+
detect peaks which are in the graph of the first
|
192 |
+
function and do not exist in the graph of the
|
193 |
+
second function . the reasons of their existence
|
194 |
+
could be explained by the long - term
|
195 |
+
periodicities and the echo - effect . below method
|
196 |
+
enables one to detect these effects . ( solid line
|
197 |
+
) and the 95% confidence level basing on thered
|
198 |
+
noise ( dotted line ) . the periodogram values are
|
199 |
+
presented on the left axis . the lower curve
|
200 |
+
illustrates the autocorrelation function of the
|
201 |
+
same time series ( solid line ) . the dotted lines
|
202 |
+
represent two standard errors of the
|
203 |
+
autocorrelation function . the dashed horizontal
|
204 |
+
line shows the zero level . the autocorrelation
|
205 |
+
values are shown in the right axis . ] because
|
206 |
+
the statistical tests indicate that the time
|
207 |
+
series is a white noise the confidence level is
|
208 |
+
not marked . ] . ] the method of the diagnosis
|
209 |
+
of an echo - effect in the power spectrum ( de )
|
210 |
+
consists in an analysis of a periodogram of a
|
211 |
+
given time series computed using the bt method .
|
212 |
+
the bt method bases on the cosine transform of the
|
213 |
+
autocorrelation function which creates peaks which
|
214 |
+
are in the periodogram , but not in the
|
215 |
+
autocorrelation function . the de method is used
|
216 |
+
for peaks which are computed by the fft method (
|
217 |
+
with high resolution ) and are statistically
|
218 |
+
significant . the time series of sunspot activity
|
219 |
+
indexes with the spacing interval one rotation or
|
220 |
+
one month contain a markov - type persistence ,
|
221 |
+
which means a tendency for the successive values
|
222 |
+
of the time series to remember their antecendent
|
223 |
+
values . thus , i use a confidence level basing on
|
224 |
+
the red noise of markov @xcite for the choice of
|
225 |
+
the significant peaks of the periodogram computed
|
226 |
+
by the fft method . when a time series does not
|
227 |
+
contain the markov - type persistence i apply the
|
228 |
+
fisher test and the kolmogorov - smirnov test at
|
229 |
+
the significance level @xmath12 @xcite to verify a
|
230 |
+
statistically significance of periodograms peaks .
|
231 |
+
the fisher test checks the null hypothesis that
|
232 |
+
the time series is white noise agains the
|
233 |
+
alternative hypothesis that the time series
|
234 |
+
contains an added deterministic periodic component
|
235 |
+
of unspecified frequency . because the fisher test
|
236 |
+
tends to be severe in rejecting peaks as
|
237 |
+
insignificant the kolmogorov - smirnov test is
|
238 |
+
also used . the de method analyses raw estimators
|
239 |
+
of the power spectrum . they are given as follows
|
240 |
+
@xmath13 for @xmath14 + where @xmath15 for
|
241 |
+
@xmath16 + @xmath17 is the length of the time
|
242 |
+
series @xmath18 and @xmath19 is the mean value .
|
243 |
+
the first term of the estimator @xmath20 is
|
244 |
+
constant . the second term takes two values (
|
245 |
+
depending on odd or even @xmath21 ) which are not
|
246 |
+
significant because @xmath22 for large m. thus ,
|
247 |
+
the third term of ( 1 ) should be analysed .
|
248 |
+
looking for intervals of @xmath23 for which
|
249 |
+
@xmath24 has the same sign and different signs one
|
250 |
+
can find such parts of the function @xmath25 which
|
251 |
+
create the value @xmath20 . let the set of values
|
252 |
+
of the independent variable of the autocorrelation
|
253 |
+
function be called @xmath26 and it can be divided
|
254 |
+
into the sums of disjoint sets : @xmath27 where +
|
255 |
+
@xmath28 + @xmath29 @xmath30 @xmath31 + @xmath32 +
|
256 |
+
@xmath33 @xmath34 @xmath35 @xmath36 @xmath37
|
257 |
+
@xmath38 @xmath39 @xmath40 well , the set
|
258 |
+
@xmath41 contains all integer values of @xmath23
|
259 |
+
from the interval of @xmath42 for which the
|
260 |
+
autocorrelation function and the cosinus function
|
261 |
+
with the period @xmath43 $ ] are positive . the
|
262 |
+
index @xmath44 indicates successive parts of the
|
263 |
+
cosinus function for which the cosinuses of
|
264 |
+
successive values of @xmath23 have the same sign .
|
265 |
+
however , sometimes the set @xmath41 can be empty
|
266 |
+
. for example , for @xmath45 and @xmath46 the set
|
267 |
+
@xmath47 should contain all @xmath48 $ ] for which
|
268 |
+
@xmath49 and @xmath50 , but for such values of
|
269 |
+
@xmath23 the values of @xmath51 are negative .
|
270 |
+
thus , the set @xmath47 is empty . . the
|
271 |
+
periodogram values are presented on the left axis
|
272 |
+
. the lower curve illustrates the autocorrelation
|
273 |
+
function of the same time series . the
|
274 |
+
autocorrelation values are shown in the right axis
|
275 |
+
. ] let us take into consideration all sets
|
276 |
+
\{@xmath52 } , \{@xmath53 } and \{@xmath41 } which
|
277 |
+
are not empty . because numberings and power of
|
278 |
+
these sets depend on the form of the
|
279 |
+
autocorrelation function of the given time series
|
280 |
+
, it is impossible to establish them arbitrary .
|
281 |
+
thus , the sets of appropriate indexes of the sets
|
282 |
+
\{@xmath52 } , \{@xmath53 } and \{@xmath41 } are
|
283 |
+
called @xmath54 , @xmath55 and @xmath56
|
284 |
+
respectively . for example the set @xmath56
|
285 |
+
contains all @xmath44 from the set @xmath57 for
|
286 |
+
which the sets @xmath41 are not empty . to
|
287 |
+
separate quantitatively in the estimator @xmath20
|
288 |
+
the positive contributions which are originated by
|
289 |
+
the cases described by the formula ( 5 ) from the
|
290 |
+
cases which are described by the formula ( 3 ) the
|
291 |
+
following indexes are introduced : @xmath58
|
292 |
+
@xmath59 @xmath60 @xmath61 where @xmath62 @xmath63
|
293 |
+
@xmath64 taking for the empty sets \{@xmath53 }
|
294 |
+
and \{@xmath41 } the indices @xmath65 and @xmath66
|
295 |
+
equal zero . the index @xmath65 describes a
|
296 |
+
percentage of the contribution of the case when
|
297 |
+
@xmath25 and @xmath51 are positive to the positive
|
298 |
+
part of the third term of the sum ( 1 ) . the
|
299 |
+
index @xmath66 describes a similar contribution ,
|
300 |
+
but for the case when the both @xmath25 and
|
301 |
+
@xmath51 are simultaneously negative . thanks to
|
302 |
+
these one can decide which the positive or the
|
303 |
+
negative values of the autocorrelation function
|
304 |
+
have a larger contribution to the positive values
|
305 |
+
of the estimator @xmath20 . when the difference
|
306 |
+
@xmath67 is positive , the statement the
|
307 |
+
@xmath21-th peak really exists can not be rejected
|
308 |
+
. thus , the following formula should be satisfied
|
309 |
+
: @xmath68 because the @xmath21-th peak could
|
310 |
+
exist as a result of the echo - effect , it is
|
311 |
+
necessary to verify the second condition :
|
312 |
+
@xmath69\in c_m.\ ] ] . the periodogram values
|
313 |
+
are presented on the left axis . the lower curve
|
314 |
+
illustrates the autocorrelation function of the
|
315 |
+
same time series ( solid line ) . the dotted lines
|
316 |
+
represent two standard errors of the
|
317 |
+
autocorrelation function . the dashed horizontal
|
318 |
+
line shows the zero level . the autocorrelation
|
319 |
+
values are shown in the right axis . ] to
|
320 |
+
verify the implication ( 8) firstly it is
|
321 |
+
necessary to evaluate the sets @xmath41 for
|
322 |
+
@xmath70 of the values of @xmath23 for which the
|
323 |
+
autocorrelation function and the cosine function
|
324 |
+
with the period @xmath71 $ ] are positive and the
|
325 |
+
sets @xmath72 of values of @xmath23 for which the
|
326 |
+
autocorrelation function and the cosine function
|
327 |
+
with the period @xmath43 $ ] are negative .
|
328 |
+
secondly , a percentage of the contribution of the
|
329 |
+
sum of products of positive values of @xmath25 and
|
330 |
+
@xmath51 to the sum of positive products of the
|
331 |
+
values of @xmath25 and @xmath51 should be
|
332 |
+
evaluated . as a result the indexes @xmath65 for
|
333 |
+
each set @xmath41 where @xmath44 is the index from
|
334 |
+
the set @xmath56 are obtained . thirdly , from all
|
335 |
+
sets @xmath41 such that @xmath70 the set @xmath73
|
336 |
+
for which the index @xmath65 is the greatest
|
337 |
+
should be chosen . the implication ( 8) is true
|
338 |
+
when the set @xmath73 includes the considered
|
339 |
+
period @xmath43 $ ] . this means that the greatest
|
340 |
+
contribution of positive values of the
|
341 |
+
autocorrelation function and positive cosines with
|
342 |
+
the period @xmath43 $ ] to the periodogram value
|
343 |
+
@xmath20 is caused by the sum of positive products
|
344 |
+
of @xmath74 for each @xmath75-\frac{m}{2k},[\frac{
|
345 |
+
2m}{k}]+\frac{m}{2k})$ ] . when the implication
|
346 |
+
( 8) is false , the peak @xmath20 is mainly
|
347 |
+
created by the sum of positive products of
|
348 |
+
@xmath74 for each @xmath76-\frac{m}{2k},\big [
|
349 |
+
\frac{2m}{n}\big ] + \frac{m}{2k } \big ) $ ] ,
|
350 |
+
where @xmath77 is a multiple or a divisor of
|
351 |
+
@xmath21 . it is necessary to add , that the de
|
352 |
+
method should be applied to the periodograms peaks
|
353 |
+
, which probably exist because of the echo -
|
354 |
+
effect . it enables one to find such parts of the
|
355 |
+
autocorrelation function , which have the
|
356 |
+
significant contribution to the considered peak .
|
357 |
+
the fact , that the conditions ( 7 ) and ( 8) are
|
358 |
+
satisfied , can unambiguously decide about the
|
359 |
+
existence of the considered periodicity in the
|
360 |
+
given time series , but if at least one of them is
|
361 |
+
not satisfied , one can doubt about the existence
|
362 |
+
of the considered periodicity . thus , in such
|
363 |
+
cases the sentence the peak can not be treated as
|
364 |
+
true should be used . using the de method it is
|
365 |
+
necessary to remember about the power of the set
|
366 |
+
@xmath78 . if @xmath79 is too large , errors of an
|
367 |
+
autocorrelation function estimation appear . they
|
368 |
+
are caused by the finite length of the given time
|
369 |
+
series and as a result additional peaks of the
|
370 |
+
periodogram occur . if @xmath79 is too small ,
|
371 |
+
there are less peaks because of a low resolution
|
372 |
+
of the periodogram . in applications @xmath80 is
|
373 |
+
used . in order to evaluate the value @xmath79 the
|
374 |
+
fft method is used . the periodograms computed by
|
375 |
+
the bt and the fft method are compared . the
|
376 |
+
conformity of them enables one to obtain the value
|
377 |
+
@xmath79 . . the fft periodogram values are
|
378 |
+
presented on the left axis . the lower curve
|
379 |
+
illustrates the bt periodogram of the same time
|
380 |
+
series ( solid line and large black circles ) .
|
381 |
+
the bt periodogram values are shown in the right
|
382 |
+
axis . ] in this paper the sunspot activity data (
|
383 |
+
august 1923 - october 1933 ) provided by the
|
384 |
+
greenwich photoheliographic results ( gpr ) are
|
385 |
+
analysed . firstly , i consider the monthly
|
386 |
+
sunspot number data . to eliminate the 11-year
|
387 |
+
trend from these data , the consecutively smoothed
|
388 |
+
monthly sunspot number @xmath81 is subtracted from
|
389 |
+
the monthly sunspot number @xmath82 where the
|
390 |
+
consecutive mean @xmath83 is given by @xmath84 the
|
391 |
+
values @xmath83 for @xmath85 and @xmath86 are
|
392 |
+
calculated using additional data from last six
|
393 |
+
months of cycle 15 and first six months of cycle
|
394 |
+
17 . because of the north - south asymmetry of
|
395 |
+
various solar indices @xcite , the sunspot
|
396 |
+
activity is considered for each solar hemisphere
|
397 |
+
separately . analogously to the monthly sunspot
|
398 |
+
numbers , the time series of sunspot areas in the
|
399 |
+
northern and southern hemispheres with the spacing
|
400 |
+
interval @xmath87 rotation are denoted . in order
|
401 |
+
to find periodicities , the following time series
|
402 |
+
are used : + @xmath88 + @xmath89 + @xmath90
|
403 |
+
+ in the lower part of figure [ f1 ] the
|
404 |
+
autocorrelation function of the time series for
|
405 |
+
the northern hemisphere @xmath88 is shown . it is
|
406 |
+
easy to notice that the prominent peak falls at 17
|
407 |
+
rotations interval ( 459 days ) and @xmath25 for
|
408 |
+
@xmath91 $ ] rotations ( [ 81 , 162 ] days ) are
|
409 |
+
significantly negative . the periodogram of the
|
410 |
+
time series @xmath88 ( see the upper curve in
|
411 |
+
figures [ f1 ] ) does not show the significant
|
412 |
+
peaks at @xmath92 rotations ( 135 , 162 days ) ,
|
413 |
+
but there is the significant peak at @xmath93 (
|
414 |
+
243 days ) . the peaks at @xmath94 are close to
|
415 |
+
the peaks of the autocorrelation function . thus ,
|
416 |
+
the result obtained for the periodicity at about
|
417 |
+
@xmath0 days are contradict to the results
|
418 |
+
obtained for the time series of daily sunspot
|
419 |
+
areas @xcite . for the southern hemisphere (
|
420 |
+
the lower curve in figure [ f2 ] ) @xmath25 for
|
421 |
+
@xmath95 $ ] rotations ( [ 54 , 189 ] days ) is
|
422 |
+
not positive except @xmath96 ( 135 days ) for
|
423 |
+
which @xmath97 is not statistically significant .
|
424 |
+
the upper curve in figures [ f2 ] presents the
|
425 |
+
periodogram of the time series @xmath89 . this
|
426 |
+
time series does not contain a markov - type
|
427 |
+
persistence . moreover , the kolmogorov - smirnov
|
428 |
+
test and the fisher test do not reject a null
|
429 |
+
hypothesis that the time series is a white noise
|
430 |
+
only . this means that the time series do not
|
431 |
+
contain an added deterministic periodic component
|
432 |
+
of unspecified frequency . the autocorrelation
|
433 |
+
function of the time series @xmath90 ( the lower
|
434 |
+
curve in figure [ f3 ] ) has only one
|
435 |
+
statistically significant peak for @xmath98 months
|
436 |
+
( 480 days ) and negative values for @xmath99 $ ]
|
437 |
+
months ( [ 90 , 390 ] days ) . however , the
|
438 |
+
periodogram of this time series ( the upper curve
|
439 |
+
in figure [ f3 ] ) has two significant peaks the
|
440 |
+
first at 15.2 and the second at 5.3 months ( 456 ,
|
441 |
+
159 days ) . thus , the periodogram contains the
|
442 |
+
significant peak , although the autocorrelation
|
443 |
+
function has the negative value at @xmath100
|
444 |
+
months . to explain these problems two
|
445 |
+
following time series of daily sunspot areas are
|
446 |
+
considered : + @xmath101 + @xmath102 + where
|
447 |
+
@xmath103 the values @xmath104 for @xmath105
|
448 |
+
and @xmath106 are calculated using additional
|
449 |
+
daily data from the solar cycles 15 and 17 .
|
450 |
+
and the cosine function for @xmath45 ( the period
|
451 |
+
at about 154 days ) . the horizontal line ( dotted
|
452 |
+
line ) shows the zero level . the vertical dotted
|
453 |
+
lines evaluate the intervals where the sets
|
454 |
+
@xmath107 ( for @xmath108 ) are searched . the
|
455 |
+
percentage values show the index @xmath65 for each
|
456 |
+
@xmath41 for the time series @xmath102 ( in
|
457 |
+
parentheses for the time series @xmath101 ) . in
|
458 |
+
the right bottom corner the values of @xmath65 for
|
459 |
+
the time series @xmath102 , for @xmath109 are
|
460 |
+
written . ] ( the 500-day period ) ] the
|
461 |
+
comparison of the functions @xmath25 of the time
|
462 |
+
series @xmath101 ( the lower curve in figure [ f4
|
463 |
+
] ) and @xmath102 ( the lower curve in figure [ f5
|
464 |
+
] ) suggests that the positive values of the
|
465 |
+
function @xmath110 of the time series @xmath101 in
|
466 |
+
the interval of @xmath111 $ ] days could be caused
|
467 |
+
by the 11-year cycle . this effect is not visible
|
468 |
+
in the case of periodograms of the both time
|
469 |
+
series computed using the fft method ( see the
|
470 |
+
upper curves in figures [ f4 ] and [ f5 ] ) or the
|
471 |
+
bt method ( see the lower curve in figure [ f6 ] )
|
472 |
+
. moreover , the periodogram of the time series
|
473 |
+
@xmath102 has the significant values at @xmath112
|
474 |
+
days , but the autocorrelation function is
|
475 |
+
negative at these points . @xcite showed that the
|
476 |
+
lomb - scargle periodograms for the both time
|
477 |
+
series ( see @xcite , figures 7 a - c ) have a
|
478 |
+
peak at 158.8 days which stands over the fap level
|
479 |
+
by a significant amount . using the de method the
|
480 |
+
above discrepancies are obvious . to establish the
|
481 |
+
@xmath79 value the periodograms computed by the
|
482 |
+
fft and the bt methods are shown in figure [ f6 ]
|
483 |
+
( the upper and the lower curve respectively ) .
|
484 |
+
for @xmath46 and for periods less than 166 days
|
485 |
+
there is a good comformity of the both
|
486 |
+
periodograms ( but for periods greater than 166
|
487 |
+
days the points of the bt periodogram are not
|
488 |
+
linked because the bt periodogram has much worse
|
489 |
+
resolution than the fft periodogram ( no one know
|
490 |
+
how to do it ) ) . for @xmath46 and @xmath113 the
|
491 |
+
value of @xmath21 is 13 ( @xmath71=153 $ ] ) . the
|
492 |
+
inequality ( 7 ) is satisfied because @xmath114 .
|
493 |
+
this means that the value of @xmath115 is mainly
|
494 |
+
created by positive values of the autocorrelation
|
495 |
+
function . the implication ( 8) needs an
|
496 |
+
evaluation of the greatest value of the index
|
497 |
+
@xmath65 where @xmath70 , but the solar data
|
498 |
+
contain the most prominent period for @xmath116
|
499 |
+
days because of the solar rotation . thus ,
|
500 |
+
although @xmath117 for each @xmath118 , all sets
|
501 |
+
@xmath41 ( see ( 5 ) and ( 6 ) ) without the set
|
502 |
+
@xmath119 ( see ( 4 ) ) , which contains @xmath120
|
503 |
+
$ ] , are considered . this situation is presented
|
504 |
+
in figure [ f7 ] . in this figure two curves
|
505 |
+
@xmath121 and @xmath122 are plotted . the vertical
|
506 |
+
dotted lines evaluate the intervals where the sets
|
507 |
+
@xmath107 ( for @xmath123 ) are searched . for
|
508 |
+
such @xmath41 two numbers are written : in
|
509 |
+
parentheses the value of @xmath65 for the time
|
510 |
+
series @xmath101 and above it the value of
|
511 |
+
@xmath65 for the time series @xmath102 . to make
|
512 |
+
this figure clear the curves are plotted for the
|
513 |
+
set @xmath124 only . ( in the right bottom corner
|
514 |
+
information about the values of @xmath65 for the
|
515 |
+
time series @xmath102 , for @xmath109 are written
|
516 |
+
. ) the implication ( 8 ) is not true , because
|
517 |
+
@xmath125 for @xmath126 . therefore ,
|
518 |
+
@xmath43=153\notin c_6=[423,500]$ ] . moreover ,
|
519 |
+
the autocorrelation function for @xmath127 $ ] is
|
520 |
+
negative and the set @xmath128 is empty . thus ,
|
521 |
+
@xmath129 . on the basis of this information one
|
522 |
+
can state that the periodogram peak at @xmath130
|
523 |
+
days of the time series @xmath102 exists because
|
524 |
+
of positive @xmath25 , but for @xmath23 from the
|
525 |
+
intervals which do not contain this period .
|
526 |
+
looking at the values of @xmath65 of the time
|
527 |
+
series @xmath101 , one can notice that they
|
528 |
+
decrease when @xmath23 increases until @xmath131 .
|
529 |
+
this indicates that when @xmath23 increases ,
|
530 |
+
the contribution of the 11-year cycle to the peaks
|
531 |
+
of the periodogram decreases . an increase of the
|
532 |
+
value of @xmath65 occurs for @xmath132 for both
|
533 |
+
time series , although the contribution of the
|
534 |
+
11-year cycle for the time series @xmath101 is
|
535 |
+
insignificant . thus , this part of the
|
536 |
+
autocorrelation function ( @xmath133 for the time
|
537 |
+
series @xmath102 ) influences the @xmath21-th peak
|
538 |
+
of the periodogram . this suggests that the
|
539 |
+
periodicity at about 155 days is a harmonic of the
|
540 |
+
periodicity from the interval of @xmath1 $ ] days
|
541 |
+
. ( solid line ) and consecutively smoothed
|
542 |
+
sunspot areas of the one rotation time interval
|
543 |
+
@xmath134 ( dotted line ) . both indexes are
|
544 |
+
presented on the left axis . the lower curve
|
545 |
+
illustrates fluctuations of the sunspot areas
|
546 |
+
@xmath135 . the dotted and dashed horizontal lines
|
547 |
+
represent levels zero and @xmath136 respectively .
|
548 |
+
the fluctuations are shown on the right axis . ]
|
549 |
+
the described reasoning can be carried out for
|
550 |
+
other values of the periodogram . for example ,
|
551 |
+
the condition ( 8 ) is not satisfied for @xmath137
|
552 |
+
( 250 , 222 , 200 days ) . moreover , the
|
553 |
+
autocorrelation function at these points is
|
554 |
+
negative . this suggests that there is no true
|
555 |
+
periodicity in the interval of [ 200 , 250 ] days
|
556 |
+
. it is difficult to decide about the existence of
|
557 |
+
the periodicities for @xmath138 ( 333 days ) and
|
558 |
+
@xmath139 ( 286 days ) on the basis of the above
|
559 |
+
analysis . the implication ( 8 ) is not satisfied
|
560 |
+
for @xmath139 and the condition ( 7 ) is not
|
561 |
+
satisfied for @xmath138 , although the function
|
562 |
+
@xmath25 of the time series @xmath102 is
|
563 |
+
significantly positive for @xmath140 . the
|
564 |
+
conditions ( 7 ) and ( 8 ) are satisfied for
|
565 |
+
@xmath141 ( figure [ f8 ] ) and @xmath142 .
|
566 |
+
therefore , there may exist a
|
567 |
+
periodicity from the interval of @xmath1 $ ] days
|
568 |
+
. similar results were also obtained by @xcite for
|
569 |
+
daily sunspot numbers and daily sunspot areas .
|
570 |
+
she considered the means of three periodograms of
|
571 |
+
these indexes for data from @xmath143 years and
|
572 |
+
found statistically significant peaks from the
|
573 |
+
interval of @xmath1 $ ] ( see @xcite , figure 2 )
|
574 |
+
. @xcite studied sunspot areas from 1876 - 1999
|
575 |
+
and sunspot numbers from 1749 - 2001 with the help
|
576 |
+
of the wavelet transform . they pointed out that
|
577 |
+
the 154 - 158-day period could be the third
|
578 |
+
harmonic of the 1.3-year ( 475-day ) period .
|
579 |
+
moreover , both periods fluctuate considerably
|
580 |
+
with time , being stronger during stronger sunspot
|
581 |
+
cycles . therefore , the wavelet analysis suggests
|
582 |
+
a common origin of both periodicities . this
|
583 |
+
conclusion confirms the de method result which
|
584 |
+
indicates that the periodogram peak at @xmath144
|
585 |
+
days is an alias of the periodicity from the
|
586 |
+
interval of @xmath1 $ ] . in order to verify the
|
587 |
+
existence of the periodicity at about 155 days i
|
588 |
+
consider the following time series : + @xmath145
|
589 |
+
+ @xmath146 + @xmath147 + the value @xmath134
|
590 |
+
is calculated analogously to @xmath83 ( see sect .
|
591 |
+
the values @xmath148 and @xmath149 are evaluated
|
592 |
+
from the formula ( 9 ) . in the upper part of
|
593 |
+
figure [ f9 ] the time series of sunspot areas
|
594 |
+
@xmath150 of the one rotation time interval from
|
595 |
+
the whole solar disk and the time series of
|
596 |
+
consecutively smoothed sunspot areas @xmath151 are
|
597 |
+
shown . in the lower part of figure [ f9 ] the
|
598 |
+
time series of sunspot area fluctuations @xmath145
|
599 |
+
is presented . on the basis of these data the
|
600 |
+
maximum activity period of cycle 16 is evaluated .
|
601 |
+
it is an interval between two strongest
|
602 |
+
fluctuations i.e . @xmath152 $ ] rotations . the
|
603 |
+
length of the time interval @xmath153 is 54
|
604 |
+
rotations . if the about @xmath0-day ( 6 solar
|
605 |
+
rotations ) periodicity existed in this time
|
606 |
+
interval and it was characteristic for strong
|
607 |
+
fluctuations from this time interval , 10 local
|
608 |
+
maxima in the set of @xmath154 would be seen .
|
609 |
+
then it should be necessary to find such a value
|
610 |
+
of p for which @xmath155 for @xmath156 and the
|
611 |
+
number of the local maxima of these values is 10 .
|
612 |
+
as it can be seen in the lower part of figure [ f9
|
613 |
+
] this is for the case of @xmath157 ( in this
|
614 |
+
figure the dashed horizontal line is the level of
|
615 |
+
@xmath158 ) . figure [ f10 ] presents nine time
|
616 |
+
distances among the successive fluctuation local
|
617 |
+
maxima and the horizontal line represents the
|
618 |
+
6-rotation periodicity . it is immediately
|
619 |
+
apparent that the dispersion of these points is 10
|
620 |
+
and it is difficult to find even a few points which
|
621 |
+
oscillate around the value of 6 . such an analysis
|
622 |
+
was carried out for smaller and larger @xmath136
|
623 |
+
and the results were similar . therefore , the
|
624 |
+
fact , that the about @xmath0-day periodicity
|
625 |
+
exists in the time series of sunspot area
|
626 |
+
fluctuations during the maximum activity period is
|
627 |
+
questionable . . the horizontal line represents
|
628 |
+
the 6-rotation ( 162-day ) period . ] ] ]
|
629 |
+
to verify again the existence of the about
|
630 |
+
@xmath0-day periodicity during the maximum
|
631 |
+
activity period in each solar hemisphere
|
632 |
+
separately , the time series @xmath88 and @xmath89
|
633 |
+
were also cut down to the maximum activity period
|
634 |
+
( january 1925 - december 1930 ) . the comparison of
|
635 |
+
the autocorrelation functions of these time series
|
636 |
+
with the appropriate autocorrelation functions of
|
637 |
+
the time series @xmath88 and @xmath89 , which are
|
638 |
+
computed for the whole 11-year cycle ( the lower
|
639 |
+
curves of figures [ f1 ] and [ f2 ] ) , indicates
|
640 |
+
that there are no significant differences between
|
641 |
+
them especially for @xmath23=5 and 6 rotations (
|
642 |
+
135 and 162 days ) . this conclusion is
|
643 |
+
confirmed by the analysis of the time series
|
644 |
+
@xmath146 for the maximum activity period . the
|
645 |
+
autocorrelation function ( the lower curve of
|
646 |
+
figure [ f11 ] ) is negative for the interval of [
|
647 |
+
57 , 173 ] days , but the resolution of the
|
648 |
+
periodogram is too low to find the significant
|
649 |
+
peak at @xmath159 days . the autocorrelation
|
650 |
+
function gives the same result as for daily
|
651 |
+
sunspot area fluctuations from the whole solar
|
652 |
+
disk ( @xmath160 ) ( see also the lower curve of
|
653 |
+
figure [ f5 ] ) . in the case of the time series
|
654 |
+
@xmath89 @xmath161 is zero for the fluctuations
|
655 |
+
from the whole solar cycle and it is almost zero (
|
656 |
+
@xmath162 ) for the fluctuations from the maximum
|
657 |
+
activity period . the value @xmath163 is negative
|
658 |
+
. similarly to the case of the northern hemisphere
|
659 |
+
the autocorrelation function and the periodogram
|
660 |
+
of southern hemisphere daily sunspot area
|
661 |
+
fluctuations from the maximum activity period
|
662 |
+
@xmath147 are computed ( see figure [ f12 ] ) .
|
663 |
+
the autocorrelation function has the statistically
|
664 |
+
significant positive peak in the interval of [ 155
|
665 |
+
, 165 ] days , but the periodogram has too low
|
666 |
+
resolution to decide about the possible
|
667 |
+
periodicities . the correlative analysis indicates
|
668 |
+
that there are positive fluctuations with time
|
669 |
+
distances about @xmath0 days in the maximum
|
670 |
+
activity period . the results of the analyses of
|
671 |
+
the time series of sunspot area fluctuations from
|
672 |
+
the maximum activity period contradict
|
673 |
+
the conclusions of @xcite . she uses the power
|
674 |
+
spectrum analysis only . the periodogram of daily
|
675 |
+
sunspot fluctuations contains peaks , which could
|
676 |
+
be harmonics or subharmonics of the true
|
677 |
+
periodicities . they could be treated as real
|
678 |
+
periodicities . this effect is not visible for
|
679 |
+
sunspot data of the one rotation time interval ,
|
680 |
+
but averaging could lose true periodicities . this
|
681 |
+
is observed for data from the southern hemisphere
|
682 |
+
. there is the about @xmath0-day peak in the
|
683 |
+
autocorrelation function of daily fluctuations ,
|
684 |
+
but the correlation for data of the one rotation
|
685 |
+
interval is almost zero or negative at the points
|
686 |
+
@xmath164 and 6 rotations . thus , it is
|
687 |
+
reasonable to research both time series together
|
688 |
+
using the correlative and the power spectrum
|
689 |
+
analyses . the following results are obtained :
|
690 |
+
1 . a new method of the detection of statistically
|
691 |
+
significant peaks of the periodograms enables one
|
692 |
+
to identify aliases in the periodogram . 2 . two
|
693 |
+
effects cause the existence of the peak of the
|
694 |
+
periodogram of the time series of sunspot area
|
695 |
+
fluctuations at about @xmath0 days : the first is
|
696 |
+
caused by the 27-day periodicity , which probably
|
697 |
+
creates the 162-day periodicity ( it is a
|
698 |
+
subharmonic frequency of the 27-day periodicity )
|
699 |
+
and the second is caused by statistically
|
700 |
+
significant positive values of the autocorrelation
|
701 |
+
function from the intervals of @xmath165 $ ] and
|
702 |
+
@xmath166 $ ] days . the existence of the
|
703 |
+
periodicity of about @xmath0 days of the time
|
704 |
+
series of sunspot area fluctuations and sunspot
|
705 |
+
area fluctuations from the northern hemisphere
|
706 |
+
during the maximum activity period is questionable
|
707 |
+
. the autocorrelation analysis of the time series
|
708 |
+
of sunspot area fluctuations from the southern
|
709 |
+
hemisphere indicates that the periodicity of about
|
710 |
+
155 days exists during the maximum activity period
|
711 |
+
. i appreciate valuable comments from professor j.
|
712 |
+
jakimiec ."""
|
713 |
+
|
714 |
+
from transformers import LEDForConditionalGeneration, LEDTokenizer
|
715 |
+
import torch
|
716 |
+
|
717 |
+
tokenizer = LEDTokenizer.from_pretrained("allenai/led-large-16384-arxiv")
|
718 |
+
|
719 |
+
input_ids = tokenizer(LONG_ARTICLE, return_tensors="pt").input_ids.to("cuda")
|
720 |
+
global_attention_mask = torch.zeros_like(input_ids)
|
721 |
+
# set global_attention_mask on first token
|
722 |
+
global_attention_mask[:, 0] = 1
|
723 |
+
|
724 |
+
model = LEDForConditionalGeneration.from_pretrained("allenai/led-large-16384-arxiv").to("cuda")
|
725 |
+
|
726 |
+
sequences = model.generate(input_ids, global_attention_mask=global_attention_mask, return_dict_in_generate=True).sequences
|
727 |
+
|
728 |
+
summary = tokenizer.batch_decode(sequences, skip_special_tokens=True)
|
729 |
+
```
|
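
The snippet above places global attention on the first token only. As a minimal sketch of that step in isolation, the helper below builds the same kind of 0/1 mask for a batch using plain Python lists, so the logic can be checked without downloading the checkpoint. The name `build_global_attention_mask`, its `global_positions` parameter, and the toy token ids are assumptions made for illustration; they are not part of `transformers`.

```python
def build_global_attention_mask(batch_input_ids, global_positions=(0,)):
    """Return a 0/1 mask shaped like batch_input_ids.

    A 1 marks a token that should receive global attention and a 0 marks
    a token that keeps local attention. By default only position 0
    (the first token) is global, mirroring the snippet above.
    Hypothetical helper for illustration, not a transformers API.
    """
    mask = []
    for row in batch_input_ids:
        row_mask = [0] * len(row)
        for pos in global_positions:
            # Ignore positions that fall outside this sequence.
            if 0 <= pos < len(row):
                row_mask[pos] = 1
        mask.append(row_mask)
    return mask


# Toy batch of made-up token ids: two sequences of length 4.
toy_batch = [[0, 713, 16, 2], [0, 100, 2, 1]]
mask = build_global_attention_mask(toy_batch)
print(mask)
```

Wrapping the result in `torch.tensor(mask)` should give a tensor that can be passed as `global_attention_mask`, in the same way as the `torch.zeros_like(input_ids)` version above.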