Curl HTTP Client 2.0

It’s been a while since I last updated my Curl HTTP Client class. That’s the class we’ve been using for years now for all kinds of site scraping, bulk domain registration without an API, … and even today we use it as part of the core of our brand new Payment system.

Since I had some spare time this weekend, I finally managed to merge some of the updates we’ve created over all these years and put the class on GitHub so I can update it regularly. Don’t worry, the class has retained most of its previous functionality explained here, so updating to the latest version shouldn’t cause any problems.

Here’s the GitHub page: https://github.com/dinke/curl_http_client. Feel free to use it in your own projects and send in your comments.

Validating an integer with PHP

When I started writing this post I wasn’t sure whether I should put it into the programming category or fun … I mean, validating whether a passed variable is an integer, aka whole number, how hard can it be? It turns out it’s not that easy in PHP 🙂

First off, some background: I needed to validate an array of db key values (like array(1,2,6,7,…n)), so I thought of using a simple function like array_filter with a callback, something like:

array_filter($foo_array, 'is_int');

where 'is_int' is the callback, calling the built-in is_int function for each array value in order to filter out non-int values.

The problem is (yes, after 10 years of dealing with PHP I am aware of that) that PHP doesn’t treat int and string numbers the same, so the string '42' wouldn’t be recognized as an integer.

is_int('42'); //false
is_int(42); //true

To make things more “interesting” for programmers, if you have integer form data like age, year or whatever, it will be sent through the $_POST/$_GET/$_REQUEST arrays as “string numbers” (so the string '42', not the int 42).

There is a nice function to deal with such things and its name is is_numeric … but it only checks whether the string is actually a number, so float values will evaluate to true as well. ctype_digit, on the other hand, is the opposite of is_int: it will only return true if the tested variable is a string number, so '42' would evaluate to true, but 42 to false.

ctype_digit('42'); //true
ctype_digit(42);//false
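
To put the three checks side by side, here is a minimal sketch that runs them over a few string inputs, the way they would arrive from a form. The sample values and the $rows layout are my own choice for illustration, not part of any real form.

```php
<?php
// Form values always arrive as strings, so only string inputs are tested here.
// The sample values are made up for illustration.
$inputs = array('42', '4.2', '1e3', '-7', 'abc');

$rows = array();
foreach ($inputs as $input) {
    $rows[] = array(
        'input'       => $input,
        'is_int'      => is_int($input),      // false for every string
        'is_numeric'  => is_numeric($input),  // also accepts floats, exponents and signs
        'ctype_digit' => ctype_digit($input), // true only for strings made of digits 0-9
    );
}

// Print one line per input with the result of each check.
foreach ($rows as $row) {
    printf(
        "%-5s is_int=%s is_numeric=%s ctype_digit=%s\n",
        $row['input'],
        var_export($row['is_int'], true),
        var_export($row['is_numeric'], true),
        var_export($row['ctype_digit'], true)
    );
}
```

Of these inputs only '42' passes ctype_digit, while is_numeric lets everything except 'abc' through, and is_int rejects all of them because they are strings.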

And to make things even worse, PHP silently converts really big integers (those bigger than the PHP_INT_MAX constant) to float, so guess what you would get for something like is_int(23234234234235334234234)? Yep, false 🙂

$var = PHP_INT_MAX;
var_dump(is_int($var)); //true
$var++;
var_dump(is_int($var)); //false it's float now!

Yeah I know, you could cast the var to int and do is_int … or cast the var to string and do ctype_digit … and other dirty hacks … but what if someone smart from the PHP team decided to, let’s say, add a 2nd argument to is_int so you could check the type in some kind of “non strict” mode, where the string '42' actually evaluates as an integer? Something like this in C I guess:

function is_int(int var, int strict = 1)
{
   //if strict is false evaluate string integers like '42' to true!
}
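
Until something like that exists, the idea can be approximated in userland. The sketch below is only an illustration: the name is_int_loose and its $strict parameter are made up, and in non strict mode it leans on the built-in filter_var with FILTER_VALIDATE_INT, which also rejects values outside the integer range.

```php
<?php
/**
 * Hypothetical userland stand-in for a "non strict" is_int().
 * Not a PHP built-in; the name and signature are assumptions for illustration.
 *
 * @param mixed $var
 * @param bool  $strict when true, behaves exactly like is_int()
 * @return bool
 */
function is_int_loose($var, $strict = true)
{
    if (is_int($var)) {
        return true;
    }
    if ($strict || !is_string($var)) {
        return false;
    }
    // filter_var returns the parsed int on success and false on failure,
    // so compare against false explicitly ('0' parses to 0).
    return filter_var($var, FILTER_VALIDATE_INT) !== false;
}

var_dump(is_int_loose('42'));        // strict mode: a string is not an int
var_dump(is_int_loose('42', false)); // non strict mode: integer string accepted
var_dump(is_int_loose('4.2', false));
var_dump(is_int_loose('23234234234235334234234', false)); // too big for an int
```

Unlike the regular expression approach, this variant also accepts negative integer strings such as '-7', which may or may not be what you want for db keys.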

All in all (at least to me), the easiest way to validate whether a variable is a whole number (aka integer or integer string) is with a regular expression. Something like this:

/**
 * Test if number is integer, including string integers
 * @param mixed $var
 * @return boolean
 */
function isWholeNumber($var)
{
	// The D modifier makes $ match only at the very end, so "42\n" is rejected
	if(preg_match('/^\d+$/D', $var))
	{
		return true;
	}
	else
	{
		return false;
	}
}
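
Going back to the original use case, filtering an array of db key values, the same regular expression can be plugged into array_filter. This is a minimal sketch: keepWholeNumbers is a hypothetical helper name, and the extra is_int/is_string guard is my own addition so that arrays, floats and booleans are dropped before they reach preg_match.

```php
<?php
/**
 * Hypothetical helper: keeps only ints and digit-only strings.
 *
 * @param array $values
 * @return array reindexed list of whole numbers
 */
function keepWholeNumbers(array $values)
{
    $isWhole = function ($value) {
        // Guard first, then the same digits-only pattern as above.
        return (is_int($value) || is_string($value))
            && preg_match('/^\d+$/D', (string) $value) === 1;
    };

    // array_filter preserves the original keys, so reindex with array_values.
    return array_values(array_filter($values, $isWhole));
}

// Sample input mixing valid keys with junk (values are made up).
$ids = keepWholeNumbers(array(1, '42', '4.2', 'abc', 7, "13\n", -3));
var_dump($ids);
```

With this sample input only 1, '42' and 7 survive; negative numbers are dropped on purpose, since the pattern allows digits only.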

Someone from the PHP dev team should really consider fixing this.

MySQL: Deleting with Left Join

Today I had to deal with one huge table and clean up all rows that have no matching key in the related table, which reminded me how terribly slow sub-queries in MySQL are compared to joins.

I have a script which generates domains from typos, so I have one table with the original domains (master_domains) and another one (result_domains) with the generated typo domains. Basically something like this:

mysql> describe master_domains;
+--------+------------------+------+-----+---------+----------------+
| Field  | Type             | Null | Key | Default | Extra          |
+--------+------------------+------+-----+---------+----------------+
| id     | int(10) unsigned | NO   | PRI | NULL    | auto_increment | 
| domain | varchar(255)     | NO   | UNI | NULL    |                | 
+--------+------------------+------+-----+---------+----------------+
2 rows in set (0.07 sec)

mysql> describe result_domains;
+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(10) unsigned | NO   | PRI | NULL    | auto_increment | 
| domain    | varchar(255)     | NO   | UNI | NULL    |                | 
| master_id | int(10) unsigned | YES  | MUL | NULL    |                | 
+-----------+------------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)

Table result_domains has master_id, which is a foreign key referencing the primary key (id) in the master_domains table. Since I also have other scripts generating domains without typos (which store the result_domains.master_id field as NULL), today I simply wanted to get rid of those masters without a proper master_id reference in the result table, in other words those master domains whose id doesn’t appear as master_id in any result row.

With sub-queries you could write the query easily with something like this (master_id is nullable, so the sub-query has to skip NULLs, otherwise NOT IN never matches any row):

delete from master_domains where id not in 
(select master_id from result_domains_frontend where master_id is not null)

It is a good habit to always run a select query before deleting a large number of rows (just to make sure your query is written correctly), so I tried the select query first:

select * from master_domains where id not in 
(select master_id from result_domains_frontend where master_id is not null) limit 10

However, it took several minutes to run without any output, so eventually I decided to stop it. I know that sub-queries are much slower than joins, so I decided to try the removal operation with a left join.

Left joins are actually the perfect weapon for finding rows that exist in one (left) table and don’t exist in the other (right) table. They also have one big advantage over sub-queries: they perform much faster, plus they work on prehistoric MySQL versions older than 4.1, which have no sub-query support at all. However, the delete syntax is a little bit tricky, so after some trial and error I eventually came up with this query:

delete master_domains.* from master_domains 
left join result_domains_frontend 
on master_domains.id=result_domains_frontend.master_id 
where result_domains_frontend.master_id is null ;

And voila, after a while it came back with the result:

mysql> delete master_domains.* from master_domains 
left join result_domains_frontend 
on master_domains.id=result_domains_frontend.master_id 
where result_domains_frontend.master_id is null ;
Query OK, 270558 rows affected (46.58 sec)
mysql>