curl trackbacks

I figure i’d blog a post on trackback linkbuilding. A trackback is … (post a few and you’ll get it). The trackback protocol isn’t that interesting, but the implementation of it by blog-platforms and cms’es makes it an excellent means for network development, because it uses a simple http-post. cUrl makes that easy).

To post a succesful link proposal I need some basic data :

about my page

  • url (must exist)
  • blog owner (free)
  • blog name (free)

about the other page

  • url (must exist)
  • excerpt (should be proper normal text)

my page : this is preferably a php routine that hacks some text, pictures and video’s, PLR or articles together, with a url rewrite. I prefer using xml textfiles in stead of a database, works faster when you set stuff up.

other page : don’t use “I liked your article so much…”, use text that maches text on target pages, preferably get some proper excerpts from xml-feeds like blogsearch, msn and yahoo (excerpts contain the keywords I searched for, as anchor text it works better for search engine visibility and link value).

Let’s get some stuff from the MSN rss feed :

//a generic query = 5% success
//add "(powered by) wordpress" 
      $query=urlencode('keywords+wordpress+trackback');
      $xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=50&first=1&format=rss");
      $count=0;
      foreach($xml->channel->item as $i) {

           $count++;

//the data from msn
           $target['link'] = (string) $i->link;
           $target['title'] = (string) $i->title;
           $target['excerpt'] = (string) $i->description;

//some variables I'll need later on
           $target[id'] = $count;
           $target['trackback'] = '';
           $target['trackback_success'] = 0;

           $trackbacks[]=$target;
       }

25% of the cms sites in the top of the search engines are WordPress scripts and WordPress always uses /trackback/ in the rdf-url. I get the source of the urls in the search-feed and grab all link-url’s in it, if any contains /trackback/, I post a trackback to that url and see if it sticks.

(I can also spider all links and check if there is an rdf-segment in the target’s source (*1), but that takes a lot of time, I could also program a curl array and use multicurl, for my purposes this works fast enough).

for($t=0;$t]*?href[\s]?=[\s\"\']+".
           "(.*?)[\"\']+.*?>"."([^< ]+|.*?)?<\/a>/",
        $content, &$matches);
 $uri_array = $matches[1];
 foreach($uri_array as $key => $link) { 
             if(strpos($link, 'rackbac')>0) { 
                $trackbacks[$t]['trackback'] = $link;
                break; 
             }
        }
}

When I fire a trackback, the other script will try and assert if my page has a link and matching text. I have to make sure my page shows the excerpts and links, so I stuff all candidates in a cached xml file.

function cache_xml_store($trackbacks, $pagetitle) 
{
 $xml = '< ?xml version="1.0" encoding="UTF-8"?>
 ';
 for($a=0;$a';
  $xml .= ''.$arr['excerpt'].'';
  $xml .= ''.$arr['link'].'';
  $xml .= ''.$arr['title'].'';
  $xml .= '';
 }
 $xml .= '';
 
 $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
 if(file_exists($fname)) unlink('cache/'.$fname);
 $fhandle = fopen($fname, 'w');
 fwrite($fhandle, $xml);
 fclose($fhandle);
 return;
}

I use simplexml to read that cached file and show the excertps and links once the page is requested.

// retrieve the cached xml and return it as array.
function cache_xml_retrieve($pagetitle)
{
 $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
 if(file_exists($fname)) {
  $xml=@simplexml_load_file($fname);
  if(!$xml) return false;
  foreach($xml->entry as $e) {
   $trackback['id'] =(string) $e->id;
   $trackback['link'] =  rid((string) $e->link);
   $trackback['title'] =  (string) $e->title;
   $trackback['description'] =  (string) $e->description;

   $trackbacks[] = $arr;
  }
  return $trackbacks;
 } 
 return false;
}

(this setup requires a subdirectory cache set to read/write with chmod 777)

I use http://www.domain.com/financial+trends.html and extract the pagetitle as “financial trends’, which has an xml-file http://www.domain.com/cache/financial+trends.xml. (In my own script I use sef urls with mod_rewrite, you can also use the $_SERVER array).

$pagetitle=preg_replace('/\+/', ' ', htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8"));

$cached_excerpts = cache_xml_retrieve($pagetitle);

//do some stuff with, make it look nice  :
for($s=0;$s'.$cached_excerpts['title'].'';
}

Now I prepare the data for the trackback post :

for($t=0;$t "url of my page with the link to the target",
  "title" => "title of my page",
 "blog_name" => "name of my blog",
 "excerpt" => '[...]'.trim(substr($trackbacks[$t]['description'], 0, 150).'[...]'
        );
        //...and try the trackback
        $trackbacks[$t]['trackback_success'] = trackback_ping($trackback_url, $mytrackbackdata);
    }
}

This the actual trackback post using cUrl. cUrl has a convenient timeout setting, I use three seconds. If a host does not respond in half a second it’s probably dead. Three seconds is generous.

function trackback_ping($trackback_url, $trackback)
 {

//make a string of the data array to post
 foreach($trackback as $key=>$value) $strout[]=$key."=".rawurlencode($value);
        $postfields= implode('&', $strout);
  
//create a curl instance
 $ch = curl_init();
 curl_setopt($ch, CURLOPT_URL, $trackback_url);
 curl_setopt($ch, CURLOPT_TIMEOUT, 3);
 curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

//set a custom form header
 curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));

 curl_setopt($ch, CURLOPT_NOBODY, true);

        curl_setopt($ch, CURLOPT_POST, true);
 curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields); 
  
 $content = curl_exec($ch);

//if the return has a tag 'error' with as value 0 it went flawless
 $success = 0; 
 if(strpos($content, '>0')>0) $success = 1; 
 curl_close ($ch);
 unset($ch);
 return $success;
 }

Now the last routine : rewrite the cached xml file with only the successful trackbacks (seo stuff) :

for($t=0;$t0) {
        $store_trackbacks[]=$trackbacks[$t];
    }
}
cache_xml_store($store_trackbacks, $pagetitle);

voila : a page with only successful trackbacks.

Google (the backrub engine) don’t like sites that use automated link-building methods, other engines (Baidu, MSN, Yahoo) use a more normal link popularity keyword matching algorithm. Trackback linking helps getting you a clear engine profile at relative low cost.

0) for brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site), it can contain some typo’s.

*1) If you want to spider links for rdf-segments : TYPO3v4 have some code for easy retrieval of trackback-uri’s :

/**
  * Fetches ping url from the given url
  *
  * @param string $url URL to probe for RDF
  * @return string Ping URL
  */
 protected function getPingURL($url) {
  $pingUrl = '';
  // Get URL content
  $urlContent = t3lib_div::getURL($url);
  if ($urlContent && ($rdfPos = strpos($urlContent, '', $rdfPos)) !== false) {
    // We will use quick regular expression to find ping URL
    $rdfContent = substr($urlContent, $rdfPos, $endPos);
    $pingUrl = preg_replace('/trackback:ping="([^"]+)"/', '\1', $rdfContent);
   }
  }
  return $pingUrl;
 }

4 thoughts on “curl trackbacks”

  1. I get about 10% backlinks when I spider, more than I expected as the site doesn’t have any content of itself. Buying links or using tnx is more effective, as background process this works better. I want a stand alone script with an xml cache so I can use it on other sites without being dependant on script capabilities.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top