
Digitalblend-CL

Baidu: means too many crawls...

At the start of 2012, Tilluminati.com began experiencing slowdowns and heavy bandwidth usage for no apparent reason. I did not know what was causing the excessive usage or who was sending so many web spiders to crawl the site. I knew that it could not be a hacker; all of the hits to the website had to be coming from a specific set of IP addresses. Since this sort of thing is outside the web host's statement of support, I was on my own.

I caught Baidu..

After I upgraded Magento in April, I finally had a means of understanding what was going on. Because the bandwidth usage in early 2012 was not excessive, I did nothing at first. But as the weeks went by things only got worse, and something was causing the site to use an average of 110% to 120% of the server's resources; that was when I knew I had to try to solve the problem. Magento 1.7 has an online user tracking tool, and it recorded a constant stream of hits on all the pages of the site coming from two IP address ranges, 180.76.5.0 - 255 and 180.76.6.0 - 255. Before Magento 1.7 I had no success stopping the traffic from these two ranges; robots.txt did not work, and neither did anything else. I was trying to avoid having to block these two ranges, but that is what I ended up doing today.

So if you have to maintain servers and you have a bandwidth issue, check the server logs and see if the IP addresses from Baidu keep showing up. They start with either 180.76.5 or 180.76.6 and cover the entire last-number range, 0 to 255. The problem is that every hour Baidu sends an average of 20-60 web search spiders to crawl the site, sucking up all the bandwidth on the web hosting account.
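One way to run that log check is a short script. The sketch below is only an illustration, not the method used on this site: it assumes an Apache common or combined log format where the client IP is the first field on each line, and the function name count_baidu_hits is made up for the example.

```python
import ipaddress
from collections import Counter

# The two ranges named in the post: 180.76.5.0-255 and 180.76.6.0-255.
BAIDU_RANGES = [
    ipaddress.ip_network("180.76.5.0/24"),
    ipaddress.ip_network("180.76.6.0/24"),
]


def count_baidu_hits(log_lines):
    """Count requests per client IP that fall inside BAIDU_RANGES.

    Assumes the client IP is the first whitespace-separated field,
    as in Apache's common and combined log formats.
    """
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address (e.g. a hostname)
        if any(ip in net for net in BAIDU_RANGES):
            hits[str(ip)] += 1
    return hits
```

To use it, open your access log and pass the file object to count_baidu_hits; the returned counter's most_common() method lists the busiest addresses first. The location of the access log depends on your web host.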

I searched Google and looked through a number of web hosting forums; a number of webmasters and administrators have complained about the same issue.

A solution found

The solution is simple: at the bottom of the .htaccess file in the root directory that your domain is linked to on the web host, block the two IP address ranges that Baidu uses. This is the only really effective way I have found to solve the problem. So far it has worked for me: Magento is not logging visits from Baidu, and the bandwidth usage has dropped to a normal level.

The issue is that once Baidu has found your site, it keeps sending a steady stream of search spiders to it. Yes, I know that Baidu's search spiders are supposed to obey your robots.txt file, but in my experience this is not happening.

You will need to insert the code below at the bottom of the .htaccess file. To block Baidu effectively, you will have to block the entire range of IP addresses that it is being served from.

############################################
# allow all except those indicated here
# (replace each xxx.xx.x.x with an address or range to block)

order allow,deny
allow from all
deny from xxx.xx.x.x
deny from xxx.xx.x.x
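
As an illustration only, here is how the template might look with the two ranges mentioned earlier filled in. This is a sketch under assumptions, not a copy of the file used on this site: it assumes Apache 2.2-style access control, where a partial address such as 180.76.5 matches every address from 180.76.5.0 to 180.76.5.255.

```apache
# Apache 2.2 syntax (on Apache 2.4 this needs mod_access_compat)
order allow,deny
allow from all
deny from 180.76.5
deny from 180.76.6
```

If your host runs Apache 2.4 without mod_access_compat, the equivalent rule would be written with Require directives instead:

```apache
# Apache 2.4 syntax (mod_authz_core / mod_authz_host)
<RequireAll>
    Require all granted
    Require not ip 180.76.5.0/24
    Require not ip 180.76.6.0/24
</RequireAll>
```

Check which Apache version your host runs before choosing between the two; mixing the old and new directives in one file can give confusing results.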

Once you have inserted the code, check your server logs to see if the IP addresses from Baidu are being blocked. I don't mind search engines crawling a site, since it helps a site get noticed, but 20-60 hits every hour is just too much.


CentOS Package installation problems.

If you have experience working on a server, you will have encountered a failed installation of a server-side tool package at one time or another. In my case it has been various PECL packages and tools. They work great and are updated regularly, and on top of that they are free. But when there are problems, many of these packages are a pain to deal with.

Important: I am assuming that you are using Fedora or CentOS; my web host uses CentOS for the VPS.


Time Machine and spotlight troubles

Time Machine Privileges and Access Problems.


When Time Machine backup is working properly, it's a great tool that Apple provides for free in OS X. When my old iMac died on me it saved my life; I did not have to manually copy over the backup files one by one as in past IT projects. But if you use it long enough you will encounter problems somewhere along the way. For me the most common problem is a corrupt permission or login preference in the Time Machine folder. If you are using a Time Capsule or an external drive as your Time Machine backup, you can reformat the drive and start over, or mount the drive, log in, delete the offending backup, and try again.


The Holidays 2011

Christmas Eve

Well, another holiday season has passed, and I have done lots of traveling back and forth between San Diego and the San Francisco Bay Area. We celebrated Thanksgiving and Christmas in my mom's home this year. My uncles from Toronto and the United Kingdom came to celebrate Christmas and one uncle's birthday, and we had a very nice dinner at the R & G Lounge in San Francisco's Chinatown in mid-December.

I spent a lot of time flying back and forth between San Diego and San Jose this holiday season. Like last year, there were lots of relatives visiting towards the end of the year.

There was lots of food and lots of time with relatives, and I had fun and enjoyed spending time with them during the Christmas holiday season. I don't know who enjoyed the holidays more, me or my sister's two dogs. Now that 2011 is behind us, it's time to ring in the new year, and we still have Chinese New Year on January 23rd.

In case you are wondering, the video below is from my uncle's birthday party.


So Sad...

After having had over a month to digest and contemplate what I had lost, I think it is time for me to post about one of my greatest losses on this page. The events of the past few weeks weigh on me like a cold, wet cloak, though I do try my best to keep things in perspective. I believe that no one saw this coming. I received a package 3 days later, and after reading the letter I believe that not even he expected what happened. He was looking to the future, and the excitement and joy it brought.
