#!/usr/bin/perl -w
#
# dbcolmovingstats.pm
# Copyright (C) 1991-2022 by John Heidemann <johnh@isi.edu>
#
# This program is distributed under terms of the GNU general
# public license, version 2. See the file COPYING
# in $dblibdir for details.
#
=head1 NAME
dbcolmovingstats - compute moving statistics over a window of a column of data
=head1 SYNOPSIS
dbcolmovingstats [-am] [-w WINDOW] [-e EmptyValue] [-k KEY] column
=head1 DESCRIPTION
Compute moving statistics over a COLUMN of data.
Records containing non-numeric data are considered null
and do not contribute to the stats (optionally they are treated as zeros
with C<-a>).
Statistics are computed over a WINDOW of samples of data.
[In progress 2020-11-12, but not completed:
Alternatively, if a key column is given with C<-k KEY>,
then we treat the key column as a time value
and compute the time-weighted mean.]
Currently we compute mean and sample standard deviation.
(Note we only compute sample standard deviation,
not full population.)
Optionally, with C<-m> we also compute median.
(Currently there is no support for generalized quantiles.)
Values before a sufficient number have been accumulated are given the
empty value (if specified with C<-e>).
If no empty value is specified, stats are computed on as many values
as are available.
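As an illustration of the windowing and empty-value rules above, here is a minimal Perl sketch. It is not the code in Fsdb::Filter::dbcolmovingstats: the helper name C<moving_stats> and its arguments are hypothetical, it recomputes the statistics from the buffered window on every record for clarity, and it assumes the standard deviation is left undefined when fewer than two values are available.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Fsdb): moving mean and sample standard
# deviation over the last $window values. $empty may be undef.
sub moving_stats {
    my ($values, $window, $empty) = @_;
    my (@buf, @out);
    for my $x (@$values) {
        push @buf, $x;
        shift @buf if @buf > $window;    # keep at most $window values
        my $n = @buf;
        if (defined $empty && $n < $window) {
            # Window not yet full and an empty value was requested.
            push @out, [$empty, $empty];
            next;
        }
        my $sum = 0;
        $sum += $_ for @buf;
        my $mean = $sum / $n;
        my $ss = 0;
        $ss += ($_ - $mean) ** 2 for @buf;
        # Sample standard deviation divides by n-1.
        # Assumption: left undefined when n < 2.
        my $stddev = $n > 1 ? sqrt($ss / ($n - 1)) : undef;
        push @out, [$mean, $stddev];
    }
    return \@out;
}

# Window of 4 with "-" as the empty value, formatted with %.5g.
my $rows = moving_stats([6, 8, 19, 53, 20], 4, '-');
for my $row (@$rows) {
    print join("\t", map { $_ eq '-' ? $_ : sprintf('%.5g', $_) } @$row), "\n";
}
```

With these five values the first three rows print the empty value, and the last two print C<21.5 21.764> and C<25 19.442>.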
Dbcolmovingstats runs in memory that is constant in the size of the input,
but must buffer a full window of data.
Quantiles (currently only the median) repeatedly sort the window
and so may perform poorly with wide windows.
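The sorting cost noted above can be seen in a minimal sketch of a per-window median. C<window_median> is an illustrative name, not an Fsdb function, and the sketch assumes that an even-sized window reports the average of its two middle values; the documentation here does not say which convention the tool uses.

```perl
use strict;
use warnings;

# Hypothetical helper: median of one window, found by sorting a copy.
# Calling this for every record re-sorts the whole window each time,
# which is why wide windows are slow.
sub window_median {
    my @sorted = sort { $a <=> $b } @_;
    my $n = @sorted;
    return undef unless $n;
    my $mid = int($n / 2);
    # Assumption: average the two middle values when $n is even.
    return $n % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

print window_median(6, 8, 19, 53), "\n";
```

Under that convention, the window 6, 8, 19, 53 has median 13.5.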
=head1 OPTIONS
=over 4
=item B<-a> or B<--include-non-numeric>
Compute stats over all records (treat non-numeric records
as zero rather than just ignoring them).
=item B<-w> or B<--window> WINDOW
WINDOW is the number of items to accumulate (defaults to 10).
(For compatibility with fsdb-1.x, B<-n> is also supported.)
=item B<-k> or B<--key> KEY
The KEY specifies a field that is used to evaluate the window: a window
must span at most this range of values of the key field.
(For example, if KEY is the time and window is 60, then enough samples
will be added to make at most 60s of observations.
With a key, sampling can be irregular.)
If key is specified, we also output a moving_n field for how many samples
are in each window.
=item B<-m> or B<--median>
Show median of the window in addition to mean.
=item B<-e E> or B<--empty E>
Give value E as the value for empty (null) records.
This null value is then output before a full window is accumulated.
=item B<-f FORMAT> or B<--format FORMAT>
Specify a L<printf(3)>-style format for output mean and standard deviation.
Defaults to C<%.5g>.
=back
Eventually we expect to support other options of L<dbcolstats>.
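The C<-k KEY> windowing described above can be sketched as follows. This is only an illustration under stated assumptions, not the tool's implementation: C<keyed_window_stats> is a hypothetical name, keys are assumed to be non-decreasing, a sample is assumed to stay in the window while the current key minus its key is strictly less than the window range, and a plain (not time-weighted) mean is computed.

```perl
use strict;
use warnings;

# Hypothetical helper: for each [key, value] sample, report the number of
# samples in the key-based window (moving_n) and their plain mean.
sub keyed_window_stats {
    my ($samples, $range) = @_;
    my (@buf, @out);
    for my $s (@$samples) {
        my ($key, $value) = @$s;
        push @buf, [$key, $value];
        # Assumption: drop samples whose key is $range or more behind
        # the current key; always keep the current sample.
        shift @buf while @buf > 1 && $key - $buf[0][0] >= $range;
        my $sum = 0;
        $sum += $_->[1] for @buf;
        push @out, {
            key         => $key,
            moving_n    => scalar(@buf),
            moving_mean => $sum / @buf,
        };
    }
    return \@out;
}

# Irregularly spaced samples with a 60-unit window.
my $res = keyed_window_stats(
    [[0, 10], [30, 20], [59, 30], [60, 40], [200, 50]], 60);
printf "%d %d %.5g\n", @{$_}{qw(key moving_n moving_mean)} for @$res;
```

Because the window is bounded by key range rather than by count, the number of samples per window varies, which is why a count such as C<moving_n> is useful alongside the statistics.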
=for comment
begin_standard_fsdb_options
This module also supports the standard fsdb options:
=over 4
=item B<-d>
Enable debugging output.
=item B<-i> or B<--input> InputSource
Read from InputSource, typically a file name, or C<-> for standard input,
or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.
=item B<-o> or B<--output> OutputDestination
Write to OutputDestination, typically a file name, or C<-> for standard output,
or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.
=item B<--autorun> or B<--noautorun>
By default, programs process automatically,
but Fsdb::Filter objects in Perl do not run until you invoke
the run() method.
The C<--(no)autorun> option controls that behavior within Perl.
=item B<--help>
Show help.
=item B<--man>
Show full manual.
=back
=for comment
end_standard_fsdb_options
=head1 SAMPLE USAGE
=head2 Input:
#fsdb date epoch count
19980201 886320000 6
19980202 886406400 8
19980203 886492800 19
19980204 886579200 53
19980205 886665600 20
19980206 886752000 18
19980207 886838400 5
19980208 886924800 9
19980209 887011200 22
19980210 887097600 22
19980211 887184000 36
19980212 887270400 26
19980213 887356800 23
19980214 887443200 6
=head2 Command:
cat data.fsdb | dbcolmovingstats -e - -w 4 count
=head2 Output:
#fsdb date epoch count moving_mean moving_stddev
19980201 886320000 6 - -
19980202 886406400 8 - -
19980203 886492800 19 - -
19980204 886579200 53 21.5 21.764
19980205 886665600 20 25 19.442
19980206 886752000 18 27.5 17.02
19980207 886838400 5 24 20.445
19980208 886924800 9 13 7.1647
19980209 887011200 22 13.5 7.8528
19980210 887097600 22 14.5 8.8129
19980211 887184000 36 22.25 11.026
19980212 887270400 26 26.5 6.6081
19980213 887356800 23 26.75 6.3966
19980214 887443200 6 22.75 12.473
# | dbcolmovingstats -e - -n 4 count
=head1 SEE ALSO
L<Fsdb>.
L<dbcolstats>.
L<dbmultistats>.
L<dbrowdiff>.
=head1 BUGS
Currently there is no support for generalized quantiles.
=cut
# WARNING: This code is derived from dbcolmovingstats.pm; that is the master copy.
use Fsdb::Filter::dbcolmovingstats;
my $f = Fsdb::Filter::dbcolmovingstats->new(@ARGV);
$f->setup_run_finish; # or could just --autorun
exit 0;
=head1 AUTHOR and COPYRIGHT
Copyright (C) 1991-2022 by John Heidemann <johnh@isi.edu>
This program is distributed under terms of the GNU general
public license, version 2. See the file COPYING
with the distribution for details.
=cut
1;