#!/usr/bin/perl -w
#
# dbmerge.pm
# Copyright (C) 1991-2024 by John Heidemann <johnh@isi.edu>
#
# This program is distributed under terms of the GNU general
# public license, version 2. See the file COPYING
# in $dblibdir for details.
#
=head1 NAME
dbmerge - merge all inputs in sorted order based on the specified columns
=head1 SYNOPSIS
dbmerge --input A.fsdb --input B.fsdb [-T TemporaryDirectory] [-nNrR] column [column...]
or
cat A.fsdb | dbmerge --input - --input B.fsdb [-T TemporaryDirectory] [-nNrR] column [column...]
or
dbmerge [-T TemporaryDirectory] [-nNrR] column [column...] --inputs A.fsdb [B.fsdb ...]
or
{ echo "A.fsdb"; echo "B.fsdb"; } | dbmerge --xargs column [column...]
=head1 DESCRIPTION
Merge all provided, pre-sorted input files, producing one sorted result.
Inputs can both be specified with C<--input>,
or with C<--inputs>,
or one can come from standard input and the other from C<--input>.
With C<--xargs>, each line of standard input is a filename for input.
Inputs must have identical schemas (columns, column order,
and field separators).
Unlike F<dbmerge2>, F<dbmerge> supports an arbitrary number of
input files.
Because this program is intended to merge multiple sources,
it does I<not> default to reading from standard input.
If you wish to read standard input,
use F<-> as the input source.
Also, because we deal with multiple input files,
this module doesn't output anything until it's run.
L<dbmerge> consumes a fixed amount of memory regardless of input size.
It therefore buffers output on disk as necessary.
(Merging is implemented as a series of two-way merges
and possibly an n-way merge at the end,
so disk space is O(number of records).)
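
As a rough illustration of why memory use can stay fixed, the following
self-contained Perl sketch merges two pre-sorted inputs while holding only
the current head row of each. It is not the Fsdb implementation:
C<merge_two> and C<iterator_over> are hypothetical helpers invented for
this example, and the in-memory lists stand in for files read row by row.
The rows are the ones shown under SAMPLE USAGE.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Fsdb: merge two pre-sorted row streams.
# $next_a and $next_b are iterators returning the next row (an array ref)
# or undef at end of input; $cmp compares two rows like sort's comparator;
# $emit receives each row in merged order.
sub merge_two {
    my ($next_a, $next_b, $cmp, $emit) = @_;
    my $a_row = $next_a->();
    my $b_row = $next_b->();
    while (defined $a_row || defined $b_row) {
        # Take from A when B is exhausted, or when A's head sorts first.
        # Ties go to A, which keeps the merge stable.
        if (!defined $b_row
            || (defined $a_row && $cmp->($a_row, $b_row) <= 0)) {
            $emit->($a_row);
            $a_row = $next_a->();
        } else {
            $emit->($b_row);
            $b_row = $next_b->();
        }
    }
}

# Build an iterator over an in-memory list (a stand-in for reading a file).
sub iterator_over {
    my @rows = @_;
    return sub { return shift @rows };
}

# Columns are (cid, cname); both inputs are already sorted by cname.
my @file_a = ([11, 'numanal'], [10, 'pascal']);
my @file_b = ([12, 'os'],      [13, 'statistics']);

my @merged;
merge_two(
    iterator_over(@file_a),
    iterator_over(@file_b),
    sub { $_[0][1] cmp $_[1][1] },    # lexical comparison on cname
    sub { push @merged, $_[0] },
);
print join(' ', @$_), "\n" for @merged;
```

Only two rows are live at any moment, one per input, so memory does not
grow with input size. Merging more than two inputs can then be built by
repeating such two-way merges over intermediate results.
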
L<dbmerge> will merge data in parallel, if possible.
The C<--parallelism> option can control the degree of parallelism,
if desired.
=head1 OPTIONS
General options:
=over 4
=item B<--xargs>
Expect that input filenames are given, one-per-line, on standard input.
(In this case, merging can start incrementally.)
=item B<--removeinputs>
Delete the source files after they have been consumed.
(Defaults off, leaving the inputs in place.)
=item B<-T TmpDir>
Where to put temporary files.
If -T is not specified, the environment variable TMPDIR is used.
If neither is set, the default is /tmp.
=item B<--parallelism N> or B<-j N>
Allow up to N merges to happen in parallel.
Default is the number of CPUs in the machine.
=item B<--endgame> (or B<--noendgame>)
Enable endgame mode, extra parallelism when finishing up.
(On by default.)
=back
Sort specification options (can be interspersed with column names):
=over 4
=item B<-r> or B<--descending>
sort in reverse order (high to low)
=item B<-R> or B<--ascending>
sort in normal order (low to high)
=item B<-n> or B<--numeric>
sort numerically
=item B<-N> or B<--lexical>
sort lexicographically
=back
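
To make the interspersed sort options concrete, here is a small Perl
sketch of how such a specification could be turned into a row comparator.
It is not Fsdb's parser: C<parse_sort_spec> and C<make_comparator> are
hypothetical names, and the sketch I<assumes> that each option applies to
the column names that follow it and that the default is lexical, ascending
order. Bundled flags such as C<-nr> are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper, not Fsdb's parser: turn a spec such as
# qw(-n cid -r cname) into a list of [column, numeric?, descending?].
# ASSUMPTION: each option applies to the column names that follow it.
sub parse_sort_spec {
    my @args = @_;
    my ($numeric, $descending) = (0, 0);   # assumed default: lexical, ascending
    my @keys;
    for my $arg (@args) {
        if    ($arg eq '-n' || $arg eq '--numeric')    { $numeric    = 1 }
        elsif ($arg eq '-N' || $arg eq '--lexical')    { $numeric    = 0 }
        elsif ($arg eq '-r' || $arg eq '--descending') { $descending = 1 }
        elsif ($arg eq '-R' || $arg eq '--ascending')  { $descending = 0 }
        else  { push @keys, [$arg, $numeric, $descending] }
    }
    return @keys;
}

# Build a comparator over rows stored as hash refs keyed by column name.
sub make_comparator {
    my @keys = @_;
    return sub {
        my ($x, $y) = @_;
        for my $key (@keys) {
            my ($col, $numeric, $descending) = @$key;
            my $c = $numeric ? $x->{$col} <=> $y->{$col}
                             : $x->{$col} cmp $y->{$col};
            $c = -$c if $descending;
            return $c if $c != 0;      # first differing key decides
        }
        return 0;                      # rows are equal on every key
    };
}

my @spec = parse_sort_spec(qw(-n -r cid));
my $cmp  = make_comparator(@spec);
my @rows = sort { $cmp->($a, $b) }
    ({ cid => 9, cname => 'x' }, { cid => 10, cname => 'y' });
print join(' ', map { $_->{cid} } @rows), "\n";
```

With C<-n -r cid> the rows come out as 10 then 9. A lexical descending
comparison would instead put 9 first, because the string "9" sorts after
"10".
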
=for comment
begin_standard_fsdb_options
This module also supports the standard fsdb options:
=over 4
=item B<-d>
Enable debugging output.
=item B<-i> or B<--input> InputSource
Read from InputSource, typically a file name, or C<-> for standard input,
or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.
=item B<-o> or B<--output> OutputDestination
Write to OutputDestination, typically a file name, or C<-> for standard output,
or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.
=item B<--autorun> or B<--noautorun>
By default, programs process automatically,
but Fsdb::Filter objects in Perl do not run until you invoke
the run() method.
The C<--(no)autorun> option controls that behavior within Perl.
=item B<--header> H
Use H as the full Fsdb header, rather than reading a header from
the input.
=item B<--help>
Show help.
=item B<--man>
Show full manual.
=back
=for comment
end_standard_fsdb_options
=head1 SAMPLE USAGE
=head2 Input:
File F<a.fsdb>:
#fsdb cid cname
11 numanal
10 pascal
File F<b.fsdb>:
#fsdb cid cname
12 os
13 statistics
These two files are both sorted by C<cname>,
and they have identical schemas.
=head2 Command:
dbmerge --input a.fsdb --input b.fsdb cname
or
cat a.fsdb | dbmerge --input - --input b.fsdb cname
=head2 Output:
#fsdb cid cname
11 numanal
12 os
10 pascal
13 statistics
# | dbmerge --input a.fsdb --input b.fsdb cname
=head1 SEE ALSO
L<dbmerge2(1)>,
L<dbsort(1)>,
L<Fsdb(3)>
=cut
# WARNING: This code is derived from dbmerge.pm; that is the master copy.
use Fsdb::Filter::dbmerge;
my $f = Fsdb::Filter::dbmerge->new(@ARGV);
$f->setup_run_finish; # or could just --autorun
exit 0;
=head1 AUTHOR and COPYRIGHT
Copyright (C) 1991-2024 by John Heidemann <johnh@isi.edu>
This program is distributed under terms of the GNU general
public license, version 2. See the file COPYING
with the distribution for details.
=cut
1;