Monday, February 15, 2010

Moving domains

In an effort to save money and time, I'm in the process of moving some of my domains to Dreamhost. The only catch is that Dreamhost charges $9.95 for a registration transfer. Luckily this is a one-time fee, but it is still unfortunate that, even though I already own the domain, I have to re-register it at Dreamhost.

Also, I will be letting the following domains expire:
  • isocy.com
  • iservere1.com
  • iservere2.com
  • rachelfaciana.com
  • isociale.net
  • isociale.org
I think that is it, but there may be another that I'm letting expire. As for the rest of my domains, almost all of them are already at Dreamhost; the few that are not include this domain (jhartig.com) and fastest963.com. I will be moving fastest963.com closer to when it expires, in a few months, and the same goes for jhartig.com, but not until mid-June or July. I have already sent requests to eNom for fastest963.com and jhartig.com because I don't know how long they take to process requests and I don't want to be waiting until the last minute for the transfer.

I will also be consolidating my Google Apps accounts. I have many, and I would like to keep them to a minimum so I don't have to check a million emails a day (exaggeration). With deVolf, we have a bunch of Domain Aliases set up for all the deVolf domains, and I plan to do the same with my personal domains. groovesharkday.com is being configured soon (Google takes 7 days to process this domain switch) and gs-status.com is already configured.

The other domains that I own are either already configured or secret. Along the way, I will post about any problems that I come across.

Thanks,
James Hartig

Friday, February 12, 2010

Perfect regex for removing links when parsing HTML

After a few long hours:

PHP Version:

/\<a.*?href=('|\")(.*?)(?:(?<!\\\)\\1|\w+(?=\=)|.(?=\s))[^\>]*?>(.*?)(?:\<\/)(?=[a]).*?(?=\>)\>/i


Actual Regex:

/\<a.*?href=('|\")(.*?)(?:(?<!\\)\1|\w+(?=\=)|.(?=\s))[^\>]*?>(.*?)(?:\<\/)(?=[a]).*?(?=\>)\>/i


That regex was designed for deVolf's new RSS Import feature. It takes an <a> link and pulls out the href link and the text inside the <a></a>. It allows for empty links as well as links without href's. The regex return matches are as follows:
  1. match 1 is whether single or double quotes were used; this is required later on in the regex and is not useful after the regex is run
  2. match 2 contains the href link
  3. match 3 contains the text between the <a></a>
Things to consider:
  • The regex matches anything after <a> until it hits </a>
  • Inside the href="" value it looks for a closing quote (one that matches the quote used to start it), a space, or another HTML attribute. Therefore, I recommend checking the end of the URL for a quote or space before working with it.
  • It will NOT match links that contain newlines anywhere. If you want it to, add an s after the i at the end.
  • It works with PHP 5.3. I have not tested other versions.
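To show how the three matches might be used, here is a minimal sketch. It assumes the "PHP Version" above is meant to sit inside a double-quoted PHP string (which is what the extra backslashes suggest). The helper names extract_links and strip_links and the sample HTML are made up for this example; they are not part of deVolf.

```php
<?php
// The "PHP Version" of the regex from this post, as a double-quoted string.
$linkPattern = "/\<a.*?href=('|\")(.*?)(?:(?<!\\\)\\1|\w+(?=\=)|.(?=\s))[^\>]*?>(.*?)(?:\<\/)(?=[a]).*?(?=\>)\>/i";

// Hypothetical helper: return the url and text of every link the regex finds.
function extract_links($pattern, $html)
{
    $links = array();
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // $m[1] is the quote character, $m[2] the href, $m[3] the link text.
        // Trim a stray trailing quote or space from the URL before using it.
        $links[] = array('url' => rtrim($m[2], "\"' "), 'text' => $m[3]);
    }
    return $links;
}

// Hypothetical helper: replace each matched link with just its inner text.
function strip_links($pattern, $html)
{
    return preg_replace($pattern, '$3', $html);
}

// Made-up sample input with one double-quoted and one single-quoted href.
$html = 'Read <a href="http://example.com/post">this post</a> and '
      . "<a href='http://example.org'>that one</a>.";

print_r(extract_links($linkPattern, $html));
echo strip_links($linkPattern, $html), "\n";
```

With this sample input, strip_links leaves only the link text behind, and extract_links returns one url/text pair per link. Real feed HTML is messier, so treat this as a starting point only.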

Thanks,
James Hartig

Sunday, February 7, 2010

PHP and twURLa

A few weeks ago I started work on a project, twURLa. Basically, it is a site that tracks domains on Twitter and ranks them. Over the course of those few weeks, I learned a lot about PHP performance, which has been very beneficial, yet stressful. Here are a few things I learned:

  • Sockets are awesome, streams suck
  • Non-blocking is annoying
  • Debugging is very hard with very unpredictable data
  • JSON is better than serialize
  • Disks are extremely slow
  • A simple VPS can power twURLa
Basically, we started out using streams to connect to all the sites we process, which ended up not being fast enough at all! After switching to sockets, I had a lot more control and I was able to get a single PHP script to process hundreds of URLs per second. Throughout the process, debugging was difficult because our test data was a stream of Tweets from Twitter. What we did was save portions of the feed, and then I would manually process them and compare my results to the script's output. The thing is: it took me an hour to process what the script did in 2 seconds.
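On the "JSON is better than serialize" point from the list, here is a rough sketch that only compares the encoded size of one record in both formats. The record layout is made up for this example (it is not twURLa's real data), and size is just one possible reason to prefer JSON; this is not a speed benchmark.

```php
<?php
// Hypothetical record; the field names are assumptions for illustration.
$record = array('domain' => 'example.com', 'mentions' => 42, 'rank' => 7);

// Encode the same array with both built-in formats.
$asJson = json_encode($record);
$asSerialized = serialize($record);

// Both formats round-trip the array back to the same values.
$fromJson = json_decode($asJson, true);
$fromSerialized = unserialize($asSerialized);

printf("json:      %d bytes  %s\n", strlen($asJson), $asJson);
printf("serialize: %d bytes  %s\n", strlen($asSerialized), $asSerialized);
```

For a small array like this one, the JSON string comes out shorter because serialize also stores type tags and string lengths. JSON is also readable from other languages, which can help when debugging saved data by hand.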

Our VPS is powered by Fivebean. Fivebean has been extremely helpful and without them twURLa would not be where it is now. We had a very low budget and Fivebean allowed us to work around this and get our site up and running without trouble. Their support is very knowledgeable and fast; the average response time was 10-15 minutes.


Thanks,
James Hartig