#!/usr/bin/perl
#
# dbcolsdecimate.pm
# Copyright (C) 2023 by John Heidemann <johnh@isi.edu>
#
# This program is distributed under terms of the GNU general
# public license, version 2. See the file COPYING
# in $dblibdir for details.
#
=head1 NAME
dbcolsdecimate - drop rows selectively, keeping large changes and periodic samples
=head1 SYNOPSIS
dbcolsdecimate [-p RELATIVE_PREC] [-P ABSOLUTE_PREC] column1 [column2...]
=head1 DESCRIPTION
For each of the given columns, drop rows whose changes are smaller
than RELATIVE_PREC, a fraction of that column's total range (default: 0.01;
alternatively, one can specify an absolute precision).
This tool is designed to reduce the amount of data behind a graph
while keeping the graph visually identical.
Precisions, if specified, apply to any subsequent columns.
(One can therefore have different precisions for different columns.)
With multiple columns, major changes in I<any> column cause
a record to be emitted.
Our goal is to output an identical plot, with fewer points if we can.
This goal differs from, and is easier than, that of
prior published work, which aims to reduce
the number of points by a known factor, or to a constant number,
while preserving as much fidelity as possible.
We usually put out a pair of points at each change,
so that if the data has stairsteps, they don't turn into diagonals.
Be aware that relative precision is computed from the
range of the data, and so it is sensitive to outliers.
Verbose output (B<-v>) shows the absolute threshold that results,
allowing one to adjust it manually if necessary (with B<-P>).
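As a rough illustration of the outlier sensitivity, the sketch below
computes a threshold as the column's range times the relative precision,
which agrees with the verbose comments shown in SAMPLE USAGE
(range 100 and precision 0.1 give threshold 10).
The helper C<relative_threshold> is a hypothetical name used only here;
it is not part of Fsdb.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper (not part of Fsdb): the absolute threshold implied
# by a relative precision, assumed to be (max - min) * precision.
sub relative_threshold {
    my ($precision, @values) = @_;
    return (max(@values) - min(@values)) * $precision;
}

my @clean   = (0 .. 100);        # range is 100
my @outlier = (@clean, 1000);    # a single outlier stretches the range to 1000

printf "clean:   threshold %g\n", relative_threshold(0.1, @clean);
printf "outlier: threshold %g\n", relative_threshold(0.1, @outlier);
```

With the outlier present, the same relative precision yields a threshold
ten times larger, so far more rows would be dropped.
In that case an absolute precision (B<-P>) gives direct control.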
By default,
this program temporarily stores a complete copy of the input data on disk,
since a relative precision depends on the range of the whole input.
However, if all columns are given absolute precisions,
this program runs with constant memory.
=head1 OPTIONS
=over 4
=item B<--precision-relative> P or B<--relative-precision> P or B<-p> P
Set the precision as the fraction of the total range
above which changes should be preserved.
Applies to any subsequent columns.
Default: 0.01.
=item B<--precision-absolute> P or B<--absolute-precision> P or B<-P> P
Set the precision in absolute units.
Applies to any subsequent columns.
=item B<-T TmpDir>
Where to put temporary files.
If B<-T> is not specified,
the environment variable TMPDIR is used.
Default is /tmp.
=back
=for comment
begin_standard_fsdb_options
This module also supports the standard fsdb options:
=over 4
=item B<-d>
Enable debugging output.
=item B<-v>
Enable verbose output.
=item B<-i> or B<--input> InputSource
Read from InputSource, typically a file name, or C<-> for standard input,
or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.
=item B<-o> or B<--output> OutputDestination
Write to OutputDestination, typically a file name, or C<-> for standard output,
or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.
=item B<--autorun> or B<--noautorun>
By default, programs process automatically,
but Fsdb::Filter objects in Perl do not run until you invoke
the run() method.
The C<--(no)autorun> option controls that behavior within Perl.
=item B<--help>
Show help.
=item B<--man>
Show full manual.
=back
=for comment
end_standard_fsdb_options
=head1 SAMPLE USAGE
=head2 Input:
#fsdb x y
0 0
1 50
2 50
3 50
4 50
5 50
6 50
7 50
8 50
9 50
10 50
11 50
12 50
13 50
14 50
15 50
16 50
17 50
18 50
19 50
20 50
21 50
22 50
23 50
24 50
25 50
26 50
27 50
28 50
29 50
30 50
31 50
32 50
33 50
34 50
35 50
36 50
37 50
38 50
39 50
40 50
41 50
42 50
43 50
44 50
45 50
46 50
47 50
48 50
49 50
50 50
50 51
50 52
50 53
50 54
50 55
50 56
50 57
50 58
50 59
50 60
50 61
50 62
50 63
50 64
50 65
50 66
50 67
50 68
50 69
50 70
50 71
50 72
50 73
50 74
50 75
50 76
50 77
50 78
50 79
50 80
50 81
50 82
50 83
50 84
50 85
50 86
50 87
50 88
50 89
50 90
50 91
50 92
50 93
50 94
50 95
50 96
50 97
50 98
50 99
100 100
=head2 Command:
dbcolsdecimate -v -p 0.1 x -p 0.2 y
=head2 Output:
(from F<TEST/dbcolsdecimate_linear_different.out>):
#fsdb x y
# column x with range 100 and relative precision 0.1 gives threshold 10
# column y with range 100 and relative precision 0.2 gives threshold 20
0 0
1 50
11 50
12 50
22 50
23 50
33 50
34 50
44 50
45 50
50 70
50 71
50 91
50 92
50 99
100 100
# output 16 of 101 (0.1584)
# | dbcolsdecimate -v -p 0.1 x -p 0.2 y
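The sample output above is consistent with a simple rule:
keep a row when any column differs from the last kept row by more than
that column's threshold, and also keep the row just before it,
so that stairsteps stay vertical.
The following is a minimal sketch inferred from this one sample,
not the Fsdb implementation.
The name C<decimate> is hypothetical, and the strict greater-than
comparison is an assumption chosen because it matches the sample.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the Fsdb code).
# $thresholds is an arrayref with one absolute threshold per column;
# each row is an arrayref of numeric column values.
sub decimate {
    my ($thresholds, @rows) = @_;
    return () unless @rows;
    my @out  = ($rows[0]);    # the first row is always kept
    my $ref  = $rows[0];      # last row that was kept
    my $prev = $rows[0];      # row immediately before the current one
    for my $row (@rows[1 .. $#rows]) {
        # Count the columns whose change since the last kept row is large.
        my $big = grep { abs($row->[$_] - $ref->[$_]) > $thresholds->[$_] }
                       0 .. $#$thresholds;
        if ($big) {
            # Keep the preceding row too, unless it was already kept.
            push @out, $prev if $prev != $ref;
            push @out, $row;
            $ref = $row;
        }
        $prev = $row;
    }
    return @out;
}

# Rebuild the sample input: a jump, a flat run in y, then a vertical run in x.
my @rows = ([0, 0],
            (map { [$_, 50] } 1 .. 50),
            (map { [50, $_] } 51 .. 99),
            [100, 100]);
my @kept = decimate([10, 20], @rows);    # thresholds from the verbose output
print join(" ", @$_), "\n" for @kept;
printf "# output %d of %d\n", scalar(@kept), scalar(@rows);
```

Run on the sample input with thresholds 10 and 20, this sketch keeps the
same 16 of 101 rows listed above.
Given fixed thresholds it needs only the last kept row and the previous row,
which fits the constant-memory behavior described for absolute precisions.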
=head1 SEE ALSO
L<Fsdb>,
L<dbcolmovingstats>.
=cut
# WARNING: This code is derived from dbcolsdecimate.pm; that is the master copy.
use Fsdb::Filter::dbcolsdecimate;
my $f = Fsdb::Filter::dbcolsdecimate->new(@ARGV);
$f->setup_run_finish; # or could just --autorun
exit 0;
=head1 AUTHOR and COPYRIGHT
Copyright (C) 2023 by John Heidemann <johnh@isi.edu>
This program is distributed under terms of the GNU general
public license, version 2. See the file COPYING
with the distribution for details.
=cut
1;