#!/usr/bin/perl
#
# dbcolscorrelate.pm
# Copyright (C) 1998-2024 by John Heidemann <johnh@isi.edu>
#
# This program is distributed under terms of the GNU general
# public license, version 2. See the file COPYING
# in $dblibdir for details.
#

=head1 NAME

dbcolscorrelate - find the coefficient of correlation over columns

=head1 SYNOPSIS

dbcolscorrelate column1 column2 [column3...]

=head1 DESCRIPTION

Compute the Pearson coefficient of correlation over two (or more) columns.
The output is one line of correlations.

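
As an illustration only, the Pearson coefficient for two columns is the sum of products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The Perl sketch below is a standalone example, not the code in Fsdb::Filter::dbcolscorrelate. The helper C<pearson> is hypothetical, and it ignores B<--weight>. Any I<n> or I<n>-1 divisor would appear in both the covariance and the two standard deviations and would cancel, so the sketch omits it. Applied to the i_x and i_y values from SAMPLE USAGE and printed with the default C<%.5g> format, it prints 0.81642.

```perl

    use strict;
    use warnings;
    use List::Util qw(sum);

    # Pearson correlation of two equal-length numeric arrays (array refs).
    # Hypothetical helper for illustration; not Fsdb's implementation.
    sub pearson {
        my ($x, $y) = @_;
        my $n = scalar @$x;
        die "need two columns of equal length, n >= 2\n"
            unless $n >= 2 && $n == scalar @$y;
        my $mean_x = sum(@$x) / $n;
        my $mean_y = sum(@$y) / $n;
        my ($sxy, $sxx, $syy) = (0, 0, 0);
        for my $i (0 .. $n - 1) {
            my $dx = $x->[$i] - $mean_x;
            my $dy = $y->[$i] - $mean_y;
            $sxy += $dx * $dy;    # sum of co-deviations
            $sxx += $dx * $dx;    # sum of squared deviations of x
            $syy += $dy * $dy;    # sum of squared deviations of y
        }
        die "a column has zero variance\n" if $sxx == 0 || $syy == 0;
        return $sxy / sqrt($sxx * $syy);
    }

    # i_x and i_y from the sample input in SAMPLE USAGE.
    my @i_x = (10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0);
    my @i_y = (8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68);
    printf "%.5g\n", pearson(\@i_x, \@i_y);    # prints 0.81642

```
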
With exactly two columns, a new column I<correlation> is created.
With more than two columns, correlations are computed for each
pairwise combination of columns, and each output column
is named by joining the names of its two source columns
with an underscore.

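
The pairwise naming can be sketched as follows. The helper C<pair_names> is hypothetical, and the order shown (each column paired with every later column, in command-line order) is an assumption rather than something this page guarantees. With I<k> input columns there are I<k>(I<k>-1)/2 pairs, so three columns yield three output columns.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: one output name per unordered pair of columns,
    # formed by joining the two source column names with an underscore.
    # Assumes pairs are listed in command-line order (i < j).
    sub pair_names {
        my @cols = @_;
        my @names;
        for my $i (0 .. $#cols - 1) {
            for my $j ($i + 1 .. $#cols) {
                push @names, join('_', $cols[$i], $cols[$j]);
            }
        }
        return @names;
    }

    print join(' ', pair_names(qw(a b c))), "\n";    # prints "a_b a_c b_c"

```
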
By default, we compute the I<population correlation coefficient>
(usually designated rho, E<0x03c1>)
and assume we see all members of the population.
With the B<--sample> option we instead compute the
I<sample correlation coefficient>, usually designated I<r>.
(Note that this full-population default is the I<opposite>
of the default in L<dbcolstats>.)

This program requires a complete copy of the input data on disk.

=head1 OPTIONS

=over 4

=item B<--sample>

Select the sample Pearson product-moment correlation coefficient
(the "sample correlation coefficient", usually designated I<r>).

=item B<--no-sample>

Select the population Pearson product-moment correlation coefficient
(the "population correlation coefficient", usually designated E<0x03c1>).
This is the default.

=item B<--weight> COL

Weight the correlation by column COL.

=item B<-f FORMAT> or B<--format FORMAT>

Specify a L<printf(3)>-style format for output statistics.
Defaults to C<%.5g>.

=item B<-T TmpDir>

Where to put temporary files.
If B<-T> is not specified, the environment variable TMPDIR is used;
if neither is given, the default is /tmp.

=back

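
The temporary-directory precedence described under B<-T> above can be summarized in a short sketch. C<choose_tmpdir> is a hypothetical helper, not a function provided by Fsdb, and treating an empty string the same as an unset value is an assumption of this sketch.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: the -T value first, then $ENV{TMPDIR}, then /tmp.
    # Treating an empty string like "unset" is an assumption of this sketch.
    sub choose_tmpdir {
        my ($opt_T) = @_;
        return $opt_T if defined $opt_T && length $opt_T;
        return $ENV{TMPDIR} if defined $ENV{TMPDIR} && length $ENV{TMPDIR};
        return '/tmp';
    }

    print choose_tmpdir(undef), "\n";    # TMPDIR if set, else /tmp

```
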
=for comment
begin_standard_fsdb_options

This module also supports the standard fsdb options:

=over 4

=item B<-d>

Enable debugging output.

=item B<-i> or B<--input> InputSource

Read from InputSource, typically a file name, or C<-> for standard input,
or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.

=item B<-o> or B<--output> OutputDestination

Write to OutputDestination, typically a file name, or C<-> for standard output,
or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.

=item B<--autorun> or B<--noautorun>

By default, programs process automatically,
but Fsdb::Filter objects in Perl do not run until you invoke
the run() method.
The C<--(no)autorun> option controls that behavior within Perl.

=item B<--help>

Show help.

=item B<--man>

Show full manual.

=back

=for comment
end_standard_fsdb_options

=head1 SAMPLE USAGE

=head2 Input:

    #fsdb i_x i_y
    10.0 8.04
    8.0 6.95
    13.0 7.58
    9.0 8.81
    11.0 8.33
    14.0 9.96
    6.0 7.24
    4.0 4.26
    12.0 10.84
    7.0 4.82
    5.0 5.68

=head2 Command:

    cat TEST/anscombe_quartet.in | dbcolscorrelate i_x i_y

=head2 Output:

    #fsdb correlation:d
    0.81642
    # | dbcolscorrelate i_x i_y

=head1 SEE ALSO

L<Fsdb>,
L<dbcolstatscores>,
L<dbcolsregression>,
L<dbrvstatdiff>.

=cut
# WARNING: This code is derived from dbcolscorrelate.pm; that is the master copy.
use Fsdb::Filter::dbcolscorrelate;
my $f = Fsdb::Filter::dbcolscorrelate->new(@ARGV);
$f->setup_run_finish; # or could just --autorun
exit 0;

=head1 AUTHOR and COPYRIGHT

Copyright (C) 1998-2024 by John Heidemann <johnh@isi.edu>

This program is distributed under terms of the GNU general
public license, version 2. See the file COPYING
with the distribution for details.

=cut
1;