.\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
. if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "LWP::Parallel 3"
.TH LWP::Parallel 3 "2016-05-29" "perl v5.26.3" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
LWP::Parallel \- Extension for LWP to allow parallel HTTP and FTP access
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 2
\& use LWP::Parallel;
\& print "This is LWP::Parallel_$LWP::Parallel::VERSION\en";
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
.SS "Introduction"
.IX Subsection "Introduction"
LWP::Parallel::UserAgent is an extension to the existing libwww\-perl (\s-1LWP\s0)
library. It lets you take a list of URLs (it currently supports \s-1HTTP, FTP,\s0
and \s-1FILE\s0 URLs; \s-1HTTPS\s0 might work, too), connect to all of them
\fIin parallel\fR, and then wait for the results to come in.
.PP
See LWP::Parallel::UserAgent for how to create an \s-1LWP\s0 UserAgent that
accesses multiple Web resources in parallel. The LWP::Parallel::RobotUA
module additionally offers proper handling of robots.txt files, the
de facto exclusion protocol for Web robots.
.SS "Examples"
.IX Subsection "Examples"
The following examples might help to get you started:
.PP
.Vb 2
\& require LWP::Parallel::UserAgent;
\& use HTTP::Request;
\&
\& # display tons of debugging messages. See \*(Aqperldoc LWP::Debug\*(Aq
\& #use LWP::Debug qw(+);
\&
\& # shortcut for demo URLs
\& my $url = "http://localhost/";
\&
\& my $reqs = [
\& HTTP::Request\->new(\*(AqGET\*(Aq, $url),
\& HTTP::Request\->new(\*(AqGET\*(Aq, $url."homes/marclang/"),
\& ];
\&
\& my $pua = LWP::Parallel::UserAgent\->new();
\& $pua\->in_order (1); # handle requests in order of registration
\& $pua\->duplicates(0); # ignore duplicates
\& $pua\->timeout (2); # in seconds
\& $pua\->redirect (1); # follow redirects
\&
\& foreach my $req (@$reqs) {
\& print "Registering \*(Aq".$req\->url."\*(Aq\en";
\& if ( my $res = $pua\->register ($req) ) {
\& print STDERR $res\->error_as_HTML;
\& }
\& }
\& my $entries = $pua\->wait();
\&
\& foreach (keys %$entries) {
\& my $res = $entries\->{$_}\->response;
\&
\& print "Answer for \*(Aq",$res\->request\->url, "\*(Aq was \et", $res\->code,": ",
\& $res\->message,"\en";
\& }
.Ve
.PP
LWP::Parallel::UserAgent (as well as LWP::Parallel::RobotUA) offers three
default methods that are called at certain points during a
connection: \f(CW\*(C`on_connect\*(C'\fR, \f(CW\*(C`on_return\*(C'\fR, and \f(CW\*(C`on_failure\*(C'\fR.
To act on these events, subclass the user agent and override them:
.PP
.Vb 5
\& #
\& # provide subclassed UserAgent to override on_connect, on_failure and
\& # on_return methods
\& #
\& package myUA;
\&
\& use Exporter();
\& use LWP::Parallel::UserAgent qw(:CALLBACK);
\& @ISA = qw(LWP::Parallel::UserAgent Exporter);
\& @EXPORT = @LWP::Parallel::UserAgent::EXPORT_OK;
\&
\& # redefine methods: on_connect gets called whenever we\*(Aqre about to
\& # make a connection
\& sub on_connect {
\& my ($self, $request, $response, $entry) = @_;
\& print "Connecting to ",$request\->url,"\en";
\& }
\&
\& # on_failure gets called whenever a connection fails right away
\& # (either we timed out, or failed to connect to this address before,
\& # or it\*(Aqs a duplicate). Please note that non\-connection based
\& # errors, for example requests for non\-existent pages, will NOT call
\& # on_failure since the response from the server will be a well
\& # formed HTTP response!
\& sub on_failure {
\& my ($self, $request, $response, $entry) = @_;
\& print "Failed to connect to ",$request\->url,"\en\et",
\& $response\->code, ", ", $response\->message,"\en"
\& if $response;
\& }
\&
\& # on_return gets called whenever a connection (or its callback)
\& # returns EOF (or any other terminating status code available for
\& # callback functions). Please note that on_return gets called for
\& # any successfully terminated HTTP connection! This does not imply
\& # that the response sent from the server is a success!
\& sub on_return {
\& my ($self, $request, $response, $entry) = @_;
\& if ($response\->is_success) {
\& print "\en\enWoa! Request to ",$request\->url," returned code ", $response\->code,
\& ": ", $response\->message, "\en";
\& print $response\->content;
\& } else {
\& print "\en\enBummer! Request to ",$request\->url," returned code ", $response\->code,
\& ": ", $response\->message, "\en";
\& # print $response\->error_as_HTML;
\& }
\& return;
\& }
\&
\& package main;
\& use HTTP::Request;
\&
\& # shortcut for demo URLs
\& my $url = "http://localhost/";
\&
\& my $reqs = [
\& HTTP::Request\->new(\*(AqGET\*(Aq, $url),
\& HTTP::Request\->new(\*(AqGET\*(Aq, $url."homes/marclang/"),
\& ];
\&
\& my $pua = myUA\->new();
\&
\& foreach my $req (@$reqs) {
\& print "Registering \*(Aq".$req\->url."\*(Aq\en";
\& $pua\->register ($req);
\& }
\& my $entries = $pua\->wait(); # responses will be caught by on_return, etc
.Ve
.PP
The final example demonstrates a simple Web robot that keeps a
persistent cache of the \*(L"robots.txt\*(R" permission files it has
encountered so far. This example also uses a callback to handle each
response as it comes in.
.PP
.Vb 2
\& require LWP::Parallel::RobotUA;
\& use HTTP::Request;
\&
\& # persistent robot rules support. See \*(Aqperldoc WWW::RobotRules::AnyDBM_File\*(Aq
\& require WWW::RobotRules::AnyDBM_File;
\&
\& # shortcut for demo URLs
\& my $url = "http://www.cs.washington.edu/";
\&
\& my $reqs = [
\& HTTP::Request\->new(\*(AqGET\*(Aq, $url),
\& # These are all redirects. Depending on how you set
\& # \*(Aqredirect_ok\*(Aq, they either just return the redirect status
\& # code (like 302 Moved) or are followed to their final destination.
\& HTTP::Request\->new(\*(AqGET\*(Aq, $url."research/ahoy/"),
\& HTTP::Request\->new(\*(AqGET\*(Aq, $url."research/ahoy/doc/paper.html"),
\& HTTP::Request\->new(\*(AqGET\*(Aq, "http://metacrawler.cs.washington.edu:6060/"),
\& # These all point to a non\-existent server. The first one should
\& # take some time to fail, but the following ones should be
\& # rejected right away.
\& HTTP::Request\->new(\*(AqGET\*(Aq, "http://www.foobar.foo/research/ahoy/"),
\& HTTP::Request\->new(\*(AqGET\*(Aq, "http://www.foobar.foo/foobar/foo/"),
\& HTTP::Request\->new(\*(AqGET\*(Aq, "http://www.foobar.foo/baz/buzz.html"),
\& # the server exists, but the file doesn\*(Aqt
\& HTTP::Request\->new(\*(AqGET\*(Aq, $url."foobar/bar/baz.html"),
\& ];
\&
\& my ($req,$res);
\&
\& # Establish a persistent robot rules cache. See WWW::RobotRules for
\& # a non\-persistent version. You should probably adjust the agent
\& # name and the cache filename.
\& my $rules = WWW::RobotRules::AnyDBM_File\->new(\*(AqParallelUA\*(Aq, \*(Aqcache\*(Aq);
\&
\& # create new UserAgent (actually, a Robot)
\& my $pua = LWP::Parallel::RobotUA\->new("ParallelUA",
\&     \*(Aqyourname@your.site.com\*(Aq, $rules);
\&
\& $pua\->timeout (2); # in seconds
\& $pua\->delay ( 5); # in minutes (as in LWP::RobotUA)
\& $pua\->max_req ( 2); # max parallel requests per server
\& $pua\->max_hosts(10); # max parallel servers accessed
\&
\& # for our own print statements that follow below:
\& local($\e) = ""; # ensure standard $OUTPUT_RECORD_SEPARATOR
\&
\& # register requests
\& foreach $req (@$reqs) {
\& print "Registering \*(Aq".$req\->url."\*(Aq\en";
\& $pua\->register ($req , \e&handle_answer);
\& # Each request, even if it failed to register properly, will
\& # show up in the final list of requests returned by $pua\->wait,
\& # so you can examine it later.
\& }
\&
\& # $pua\->wait returns a reference to a hash that contains an
\& # \*(Aq$entry\*(Aq for each request made, keyed by its URL (as returned
\& # by $request\->url\->as_string).
\& my $entries = $pua\->wait(); # optionally pass a timeout in seconds, e.g. wait(25)
\&
\& # let\*(Aqs see what we got back (see also callback function!!)
\& foreach (keys %$entries) {
\& $res = $entries\->{$_}\->response;
\&
\& # examine response to find cascaded requests (redirects, etc) and
\& # set current response to point to the very first response of this
\& # sequence. (not very exciting if you set \*(Aq$pua\->redirect(0)\*(Aq)
\& my $r = $res; my @redirects;
\& while ($r) {
\& $res = $r;
\& $r = $r\->previous;
\& push (@redirects, $res) if $r;
\& }
\&
\& # summarize response. see "perldoc HTTP::Response"
\& print "Answer for \*(Aq",$res\->request\->url, "\*(Aq was \et", $res\->code,": ",
\& $res\->message,"\en";
\& # print redirection history, in case we got redirected
\& foreach (@redirects) {
\& print "\et",$_\->request\->url, "\et", $_\->code,": ", $_\->message,"\en";
\& }
\& }
\&
\& # our callback function gets called whenever some data comes in
\& # (same parameter format as standard LWP::UserAgent callbacks!)
\& sub handle_answer {
\& my ($content, $response, $protocol, $entry) = @_;
\&
\& print "Handling answer from \*(Aq",$response\->request\->url,"\*(Aq: ",
\& length($content), " bytes, Code ",
\& $response\->code, ", ", $response\->message,"\en";
\&
\& if (length ($content) ) {
\& # just store content if it comes in
\& $response\->add_content($content);
\& } else {
\& # Having no content doesn\*(Aqt mean the connection is closed!
\& # Sometimes the server might return zero bytes, so unless
\& # you already got the information you need, you should continue
\& # processing here (see below)
\&
\& # Otherwise you can return a special exit code that
\& # determines how ParallelUA continues with this connection.
\&
\& # Note: We have to import those constants via "qw(:CALLBACK)"!
\&
\& # return C_ENDCON; # will end only this connection
\& # (silly, we already have EOF)
\& # return C_LASTCON; # wait for remaining open connections,
\& # but don\*(Aqt issue any new ones!!
\& # return C_ENDALL; # will immediately end all connections
\& # and return from $pua\->wait
\& }
\&
\& # ATTENTION!! If you want to keep reading from your connection,
\& # you should have a final \*(Aqreturn undef\*(Aq statement here. Even if
\& # you think that all data has arrived, it does not hurt to return
\& # undef here. The Parallel UserAgent will figure out by itself
\& # when to close the connection!
\&
\& return undef; # just keep on connecting/reading/waiting
\& # until the server closes the connection.
\& }
.Ve
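.PP
The callback above always returns \f(CW\*(C`undef\*(C'\fR, so each connection is
read until the server closes it. As a rough sketch of the exit codes
mentioned in its comments, the hypothetical callback below (the name
first_kb_only and the 1024\-byte limit are illustrative assumptions, not
part of the library) stores incoming data as before, but returns
\f(CW\*(C`C_ENDCON\*(C'\fR once at least 1024 bytes have arrived, ending only
that connection. See LWP::Parallel::UserAgent for the exact semantics of
each return code.
.PP
.Vb 18
\& use LWP::Parallel::UserAgent qw(:CALLBACK);  # imports C_ENDCON etc.
\&
\& # hypothetical limit, chosen only for illustration
\& my $max_bytes = 1024;
\&
\& # same arguments as handle_answer above
\& sub first_kb_only {
\&     my ($content, $response, $protocol, $entry) = @_;
\&     $response\->add_content($content) if length $content;
\&
\&     # stop reading this connection once enough data has arrived
\&     return C_ENDCON if length($response\->content) >= $max_bytes;
\&
\&     return undef;   # otherwise keep reading until the server closes
\& }
\&
\& # register it like any other callback (hypothetical $pua and $req):
\& # $pua\->register($req, \e&first_kb_only);
.Ve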
.SH "AUTHOR"
.IX Header "AUTHOR"
Marc Langheinrich, marclang@cpan.org
.SH "SEE ALSO"
.IX Header "SEE ALSO"
See \s-1LWP\s0 for an overview of Web communication using Perl. See
LWP::Parallel::UserAgent and LWP::Parallel::RobotUA for details
on how to use this library.
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright 1997\-2004 Marc Langheinrich <marclang@cpan.org>
.PP
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.