I find curl's cookie jar problematic so I wrote my own routine. There are other times I need to add cookies scraped from the page.
For this, CURLOPT_HEADER must be true.
curl_setopt($ch, CURLOPT_HEADER, true);
$data = curl_exec($ch);
$skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
// Despite its name, $requestHeader holds the response header block.
$requestHeader= substr($data,0,$skip);
$data = substr($data,$skip);
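$cookies = array();
$e = 0;
while (true) {
    // Header names are case-insensitive (HTTP/2 sends them in lowercase).
    $s = stripos($requestHeader, 'Set-Cookie: ', $e);
    if ($s === false) { break; }
    $s += 12;
    // The cookie pair ends at the first ';' or at the end of the header line.
    $e = $s + strcspn($requestHeader, ";\r\n", $s);
    $cookie = substr($requestHeader, $s, $e - $s);
    $s = strpos($cookie, '=');
    if ($s === false) { continue; }
    $key = substr($cookie, 0, $s);
    $value = substr($cookie, $s); // keeps the leading '='
    $cookies[$key] = $value;
}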
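The same idea can be wrapped in a reusable function. This is only a sketch: `parse_set_cookies()` is a hypothetical name, it splits the header block line by line instead of walking it with offsets, and unlike the loop above it stores values without the leading `=`.

```php
<?php
// Hypothetical helper: collect name => value pairs from every Set-Cookie line.
function parse_set_cookies(string $header): array
{
    $cookies = array();
    foreach (preg_split('/\r\n|\n/', $header) as $line) {
        // Header names are case-insensitive, so "set-cookie:" must match too.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only the "name=value" pair; drop attributes such as Path or Expires.
        $pair = trim(explode(';', substr($line, 11), 2)[0]);
        $eq = strpos($pair, '=');
        if ($eq === false || $eq === 0) {
            continue;
        }
        $cookies[substr($pair, 0, $eq)] = substr($pair, $eq + 1);
    }
    return $cookies;
}

// Made-up response header used only for illustration.
$header = "HTTP/1.1 200 OK\r\n"
        . "Set-Cookie: sid=abc123; Path=/; HttpOnly\r\n"
        . "set-cookie: theme=dark\r\n\r\n";
$cookies = parse_set_cookies($header);
```

Because the values are stored without `=`, they would have to be joined as `"$k=$v"` rather than `"$k$v"` when building the Cookie header.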
Then to use the $cookies[]:
$cookie = '';
$delim = '';
foreach ($cookies as $k => $v){
    // $v already starts with '=', so "$k$v" yields "name=value".
    $cookie .= "$delim$k$v";
    $delim = '; ';
}
Then use $cookie:
curl_setopt($ch, CURLOPT_COOKIE, $cookie );
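Adding a cookie scraped from the page is then just another array entry before the string is built. A minimal sketch, assuming the values keep their leading `=` as in the routine above; `build_cookie_header()` and the `token` cookie are hypothetical names for illustration.

```php
<?php
// Hypothetical helper: join the stored cookies into one Cookie header value.
function build_cookie_header(array $cookies): string
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        // Values are stored with their leading '=', so concatenation gives "name=value".
        $pairs[] = $name . $value;
    }
    return implode('; ', $pairs);
}

// Cookie collected from a Set-Cookie header (made-up value).
$cookies = array('sid' => '=abc123');
// Cookie scraped from the page body (hypothetical name and value).
$cookies['token'] = '=xyz';

$cookie = build_cookie_header($cookies);
// $cookie can now be passed to curl_setopt($ch, CURLOPT_COOKIE, $cookie);
```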
When there is trouble, I often set FOLLOWLOCATION to false:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
If there is a redirect and you want to see what is happening, or you need the cookies that are set in the redirect's response header, FOLLOWLOCATION must be set to false.
When the URL returns a redirect, curl_getinfo() will give you the redirect location URL.
$status = intval(curl_getinfo($ch,CURLINFO_HTTP_CODE));
if ($status > 299 && $status < 400){
$url= curl_getinfo($ch,CURLINFO_REDIRECT_URL );
}
// Update $cookies here; do not clear the $cookies array between requests.
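Put together, following redirects by hand might look like the sketch below. The function names, the array keys and the hop limit are all assumptions made for illustration. The fetcher is passed in as a callable so the loop can be tried without a network; `fetch_via_curl()` shows how a curl-based fetcher could be written with the options discussed above.

```php
<?php
// Hypothetical helper: follow redirects manually, carrying cookies from hop to hop.
// $fetch takes ($url, $cookieHeader) and returns an array with the keys
// 'status', 'header', 'redirect_url' and 'body'.
function fetch_with_redirects(callable $fetch, string $url, int $maxHops = 5): array
{
    $cookies = array();
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        $pairs = array();
        foreach ($cookies as $k => $v) { $pairs[] = "$k=$v"; }
        $res = $fetch($url, implode('; ', $pairs));
        // Update the cookies from this hop; never clear the array between hops.
        if (preg_match_all('/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/mi', $res['header'], $m, PREG_SET_ORDER)) {
            foreach ($m as $c) { $cookies[$c[1]] = $c[2]; }
        }
        $isRedirect = $res['status'] > 299 && $res['status'] < 400;
        if (!$isRedirect || empty($res['redirect_url'])) {
            return array('url' => $url, 'cookies' => $cookies, 'body' => $res['body']);
        }
        $url = $res['redirect_url'];
    }
    throw new RuntimeException('Too many redirects');
}

// One possible curl-based fetcher (defined here, not called in this example).
function fetch_via_curl(string $url, string $cookieHeader): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    if ($cookieHeader !== '') { curl_setopt($ch, CURLOPT_COOKIE, $cookieHeader); }
    $data = (string) curl_exec($ch);
    $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    return array(
        'status' => intval(curl_getinfo($ch, CURLINFO_HTTP_CODE)),
        'header' => substr($data, 0, $skip),
        'redirect_url' => (string) curl_getinfo($ch, CURLINFO_REDIRECT_URL),
        'body' => substr($data, $skip),
    );
}

// Fake fetcher with made-up responses: /login redirects to /home and sets a cookie.
$fake = function (string $url, string $cookieHeader): array {
    if ($url === 'http://example.com/login') {
        return array('status' => 302,
                     'header' => "HTTP/1.1 302 Found\r\nSet-Cookie: sid=abc; Path=/\r\n\r\n",
                     'redirect_url' => 'http://example.com/home', 'body' => '');
    }
    return array('status' => 200, 'header' => "HTTP/1.1 200 OK\r\n\r\n",
                 'redirect_url' => '', 'body' => "cookies sent: $cookieHeader");
};
$result = fetch_with_redirects($fake, 'http://example.com/login');
```

With a real site, `fetch_with_redirects('fetch_via_curl', $url)` would use curl instead of the fake fetcher.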
When it gets tough, I use these options to get both the request and response headers. The response header will be returned in the curl_exec() data. The request header will be returned by curl_getinfo().
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_HEADER, true);
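$data = curl_exec($ch);
if (curl_errno($ch)){
    $data .= 'Retrieve Base Page Error: ' . curl_error($ch);
}
else {
    // curl_getinfo() includes the request header because CURLINFO_HEADER_OUT is set.
    $info = rawurldecode(var_export(curl_getinfo($ch),true));
    $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $requestHeader= substr($data,0,$skip);
    $filename = parse_url($url, PHP_URL_HOST);
    // Replace slashes so the URL path does not point into missing directories.
    $filename .= str_replace('/', '_', (string) parse_url($url, PHP_URL_PATH)) . '.txt';
    // Save the curl info plus the raw response (header and body) for inspection.
    $fp = fopen($filename,'w');
    fwrite($fp,"$info\n$data");
    fclose($fp);
    // Strip the response header once, leaving only the body.
    $data = substr($data,$skip);
}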
Both the headers and the HTML are stored in the file. You can then view the HTTP headers, the HTML, and the JavaScript. Sometimes cookies are set by JavaScript through document.cookie, or the page redirects with window.location, or an HTML form's submit button is clicked with JS. In these cases it may be necessary to scrape the cookies and/or the redirect location from the curl data.
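Scraping those values from the page can be sketched with a couple of regular expressions. This is a rough illustration only: `scrape_js()` is a hypothetical name, the HTML is made up, and the patterns only catch plain string literals, not values assembled at runtime.

```php
<?php
// Hypothetical helper: pull cookies and a redirect target out of inline JavaScript.
function scrape_js(string $html): array
{
    $cookies = array();
    // Matches: document.cookie = "name=value; path=/"
    if (preg_match_all('/document\.cookie\s*=\s*(["\'])([^=;"\']+)=([^;"\']*)/', $html, $m, PREG_SET_ORDER)) {
        foreach ($m as $c) { $cookies[trim($c[2])] = $c[3]; }
    }
    // Matches: window.location = "..." or window.location.href = "..."
    $location = null;
    if (preg_match('/window\.location(?:\.href)?\s*=\s*(["\'])(.*?)\1/', $html, $m)) {
        $location = $m[2];
    }
    return array('cookies' => $cookies, 'location' => $location);
}

// Made-up page body used only for illustration.
$html = '<script>document.cookie = "seen=1; path=/"; window.location.href = "/next?step=2";</script>';
$found = scrape_js($html);
```

The scraped cookies can then be added to the cookie array, and the scraped location used as the next URL, resolved against the current URL if it is relative.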
Then I use the Firefox Inspector or Chrome Developer Tools.
I go to the Network tab.
In Firefox I go to Settings and turn on "Enable Persistent logs".
In Chrome I click "Preserve log" on the Network tab.
Then I use the browser to go wherever I want curl to go.
Now I can see every request and response, including redirects, and compare them with the saved headers.
When you need the header to look exactly like the saved browser headers:
Create an array to hold the request header key-value pairs.
Fill in the request array with exactly what is in the browser's request header for your upload.
EXAMPLE:
$request = array();
$request[] = "Host: www.example.com";
$request[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$request[] = "User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0";
$request[] = "Accept-Language: en-US,en;q=0.5";
$request[] = "Connection: keep-alive";
$request[] = "Cache-Control: no-cache";
$request[] = "Pragma: no-cache";
Add to curl:
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
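Instead of typing each line by hand, the header text copied from the browser's Network tab can be split into the array. A minimal sketch, assuming the text is pasted as one string; `headers_from_text()` is a hypothetical helper and the sample header text is made up.

```php
<?php
// Hypothetical helper: turn pasted header text into a CURLOPT_HTTPHEADER array.
function headers_from_text(string $raw): array
{
    $request = array();
    foreach (preg_split('/\r\n|\r|\n/', $raw) as $line) {
        $line = trim($line);
        // Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
        if ($line === '' || $line[0] === ':') {
            continue;
        }
        // Skip the request line, e.g. "GET /index.html HTTP/1.1".
        if (preg_match('/^[A-Z]+ \S+ HTTP\/[\d.]+$/', $line)) {
            continue;
        }
        // Anything without a colon is not a "Name: value" header.
        if (strpos($line, ':') === false) {
            continue;
        }
        $request[] = $line;
    }
    return $request;
}

$raw = "GET /index.html HTTP/1.1\n"
     . "Host: www.example.com\n"
     . "Accept-Language: en-US,en;q=0.5\n"
     . "\n"
     . "Pragma: no-cache\n";
$request = headers_from_text($raw);
// $request can now be passed to curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
```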
Many times it is much easier to use a mobile version. Often the desktop version of a page requires JavaScript and the mobile version does not. I use Firefox with a user agent switcher and an old Motorola user agent to retrieve the headers and HTML. Then I use the same user agent in curl's HTTPHEADER:
$request[] = 'User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0';