Greytree

TamWiki

For a mouse who is a packrat

Using Curl In PHP

The cURL extension to PHP is useful for retrieving web objects that you don't necessarily want to deal with directly, such as files fetched for storage and later use.

The cURL extension is included in many PHP installations, but not all of them. Running phpinfo() will tell you whether you've got it: look for a "curl" section in the output.
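If you would rather check from a script than read through the phpinfo() output, something like the following should work. It is only a sketch: curl_available() is a name made up for this page, while extension_loaded() and function_exists() are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of the example on this page):
// returns true when the cURL extension is loaded in this PHP build.
function curl_available()
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_available()) {
    echo "cURL is available\n";
} else {
    echo "cURL is not loaded in this PHP build\n";
}
```

Checking up front lets a script fail with a clear message instead of a fatal "undefined function curl_init()" error.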

cURL has many options, and selecting the right ones is important for using it effectively.

An example

Retrieving a file while following redirects:

  <?php

  define('TEMPDIR', '/tmp/');   // where downloads are staged
  define('IMAGEDIR', '/tmp/');  // where finished images are kept
  define('DEBUG', true);

  $image = pull_image('Some $misc //image//', time(), "http://ttwiki/pub/skins/tarski/img/hdr/greytree.jpg");
  echo "\$image=$image\n";

  // Download $imguri to a temporary file, then move it into IMAGEDIR under a
  // name built from $name, $date and the extension implied by its Content-Type.
  function pull_image($name, $date, $imguri)
  {
    if (DEBUG) echo "\$name=$name, \$date=$date, \$imguri=$imguri\n";
    $ch = curl_init();
    $fn = tempnam(TEMPDIR, "img");
    if (DEBUG) echo "\$fn=$fn\n";
    $fh = fopen($fn, 'w'); // file to contain the body
    $hn = $fn . '.header';
    if (DEBUG) echo "\$hn=$hn\n";
    $hh = fopen($hn, 'w'); // file to contain the header(s)
    $options = array(
         CURLOPT_URL => $imguri,
         CURLOPT_USERAGENT => "Mozilla/5.0",
         CURLOPT_FILE => $fh,
         CURLOPT_HEADER => false,      // keep the headers out of the body file
         CURLOPT_WRITEHEADER => $hh,
         CURLOPT_FOLLOWLOCATION => true,
         CURLOPT_MAXREDIRS => 10
         );
    curl_setopt_array($ch, $options);
    if (curl_exec($ch) === false) {
      die("Unable to retrieve $imguri: ".curl_error($ch)."\n");
    }
    curl_close($ch);
    fclose($fh);
    fclose($hh);
    if (file_exists($fn)) {
      $ext = determine_extension($fn, $hn);
      if (false === $ext) {
        die("File retrieved $fn is not an image type of file\n");
      }
      // keep only letters and digits from $name for the saved filename
      $savefn = IMAGEDIR.preg_replace('/[^[:alnum:]]/', '', $name)."-".date("Y-m-d", $date).".".$ext;
      if (DEBUG) echo "\$savefn=$savefn\n";
      rename($fn, $savefn);
      return $savefn;
    } else {
      return NULL;
    }
  }

  function determine_extension($fn, $hn)
  {
    // determine the extension based on the Content-Type returned in the header
    $type2ext = array('image/jpeg' => 'jpg',
          'image/jpg' => 'jpg',
          'image/png' => 'png',
          'image/gif' => 'gif');

    $header = file_get_contents($hn);
    $header_lines = explode("\r\n", $header);
    if (DEBUG) echo "\$header_lines=\n".print_r($header_lines, true)."\n";
    $n = count($header_lines);
    $i = 0;
    // test the bound first so $header_lines[$i] is never read past the end
    while ($i < $n && ! preg_match('/^HTTP\S* 200\b/', $header_lines[$i])) {$i++;}
    if ($i >= $n) return false; // no 200 response found
    if (DEBUG) echo "\$i=$i, \$header_lines[$i]=$header_lines[$i]\n";
    while ($i < $n && ! preg_match('/^Content-type:/i', $header_lines[$i])) {$i++;}
    if ($i >= $n) return false; // no Content-Type returned
    if (DEBUG) echo "\$i=$i, \$header_lines[$i]=$header_lines[$i]\n";
    // split on the first colon only; the space after it is optional
    list($name, $value) = explode(":", $header_lines[$i], 2);
    $value = trim($value);
    if (DEBUG) echo "\$name=$name, \$value=$value\n";
    $parts = explode(";", $value); // drop parameters such as charset
    if (DEBUG) echo "\$parts=\n".print_r($parts, true)."\n";
    $type = strtolower(trim($parts[0]));
    if (DEBUG) echo "\$type=$type\n";
    return (isset($type2ext[$type])) ? $type2ext[$type] : false;
  }

Which outputs:

$ php usingcurlex.php 
$name=Some $misc //image//, $date=1334033959, $imguri=http://ttwiki/pub/skins/tarski/img/hdr/greytree.jpg
$fn=/private/tmp/imgRlsf9s
$hn=/private/tmp/imgRlsf9s.header
$header_lines=
Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Set-Cookie: TRACKID=9ac675f419705b6e771624c13eab4e8e; Path=/; Version=1
    [2] => Content-Type: image/jpeg
    [3] => Accept-Ranges: bytes
    [4] => ETag: "1813642202"
    [5] => Last-Modified: Thu, 29 Mar 2012 04:13:45 GMT
    [6] => Content-Length: 23355
    [7] => Date: Tue, 10 Apr 2012 04:59:19 GMT
    [8] => Server: lighttpd
    [9] => 
    [10] => 
)

$i=0, $header_lines[0]=HTTP/1.1 200 OK
$i=2, $header_lines[2]=Content-Type: image/jpeg
$name=Content-Type, $value=image/jpeg
$parts=
Array
(
    [0] => image/jpeg
)

$type=image/jpeg
$savefn=/tmp/Somemiscimage-2012-04-09.jpg
$image=/tmp/Somemiscimage-2012-04-09.jpg

What's important to note here is that the cURL extension writes the retrieved data to a file. (If no file handle is given to cURL, curl_exec() sends the response body to STDOUT, i.e., the browser. To have cURL return the result as a string instead, set CURLOPT_RETURNTRANSFER to true in the cURL options.) It is considered a best practice not to retrieve files directly into where they will reside, but to retrieve them to a temporary location, and then do whatever processing may be needed on them before moving them to the permanent location. This works similarly to how PHP handles uploaded files.
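For small responses that are easier to handle in memory, the CURLOPT_RETURNTRANSFER route might look roughly like this. This is a sketch only: fetch_to_string() is a hypothetical helper that does not appear in the example above, and throwing an exception on failure is just one way to report the error.

```php
<?php
// Hypothetical helper: fetch a URL and return its body as a string
// instead of writing it to a file or to STDOUT.
function fetch_to_string($url)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // curl_exec() returns the body
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 10,
    ));
    $body  = curl_exec($ch);             // string on success, false on failure
    $error = curl_error($ch);            // read the message before closing the handle
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException("Unable to retrieve $url: $error");
    }
    return $body;
}
```

A call such as $page = fetch_to_string($url); then leaves the whole response in $page. For large downloads, writing to a temporary file as in the main example keeps memory use down.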

In the example above, two temporary files were created, one for the file contents and one for the header(s), and both were opened in write mode. The file handles were passed to cURL along with the other options, and the transfer was run. After checking the result to make sure the transfer succeeded, the cURL handle was closed and then both files were closed. From then on, the file is dealt with first in its temporary location, and finally moved to the permanent location. The header file is read to determine the type of file sent (header "Content-Type:").
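When redirects are followed, the header file holds one block of headers per response, so the Content-Type that matters is the one in the last block. The sketch below shows one way to pull it out of the raw header text. The function name final_content_type() is made up for illustration, and it assumes the header blocks are separated by blank lines, as in the output shown earlier.

```php
<?php
// Hypothetical helper: return the lowercased Content-Type of the final
// response found in a raw header dump, or false when there is none.
function final_content_type($rawHeaders)
{
    // Each response's headers end with a blank line; with redirects
    // there are several such blocks, and the last one is the final response.
    $blocks = preg_split('/\r?\n\r?\n/', trim($rawHeaders));
    $last = end($blocks);
    foreach (preg_split('/\r?\n/', $last) as $line) {
        if (stripos($line, 'Content-Type:') === 0) {
            $value = substr($line, strlen('Content-Type:'));
            $parts = explode(';', $value);   // drop parameters such as charset
            return strtolower(trim($parts[0]));
        }
    }
    return false;
}
```

Called as final_content_type(file_get_contents($hn)), it could stand in for the header scan in determine_extension(). Another option, if the cURL handle is still open, is curl_getinfo($ch, CURLINFO_CONTENT_TYPE), which reports the Content-Type of the last response without parsing the header file at all.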



Page last modified on April 10, 2012, at 12:31 AM by tamara