Tel: 01455 447501

NEW MEDIA DEVELOPMENT

The New Media Development Spider

The New Media Development spider is a multi-purpose spider. Its main use is as a search engine crawler that crawls clients' sites for site search purposes.

The spider can also be used to validate a site's external links, using a web crawler similar to those that search engines operate. This means it looks on a web site's domain for any robots.txt files, which tell our spider which files it is allowed to access.
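
As an illustration of the first step in link validation, the sketch below collects the external links found on a page. It is not the NMDSpider implementation, which is not described here. The names LinkCollector and external_links, and the example.com URL, are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def external_links(html, base_url):
    """Return http(s) links that point to a different host than base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    site_host = urlparse(base_url).netloc
    return [
        url
        for url in collector.links
        if urlparse(url).scheme in ("http", "https")
        and urlparse(url).netloc != site_host
    ]


page = '<a href="/about">About</a> <a href="http://www.robotstxt.org/">robots</a>'
print(external_links(page, "http://www.example.com/index.html"))
```

A link checker would then request each returned URL, after consulting that host's robots.txt, and record the response status.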

As a third function, the spider is employed within the New Media Development CMS as an HTTP client, consuming web services and interacting with other websites by sending and retrieving data.

All web sites can define which parts of their domain are off-limits to specific robot user agents. NMDSpider obeys all robots.txt files.

robots.txt

Web administrators should use the following information to update their sites' robots.txt files.

Our current user agent string is:

Mozilla/5.0 (compatible; NMDSpider/1.1; http://www.newmediadev.net/page.cfm/content/NMD_Spider) NMD Spider/1.1
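
Administrators who want to recognise the spider in their server logs could match the NMDSpider product token in the User-Agent header. The following is a minimal sketch. The helper name nmd_spider_version is hypothetical, and the pattern assumes the token keeps the "NMDSpider/<version>" form shown above.

```python
import re
from typing import Optional

# Assumed token format: "NMDSpider/" followed by a dotted version number.
NMD_TOKEN = re.compile(r"\bNMDSpider/(\d+(?:\.\d+)*)")


def nmd_spider_version(user_agent: str) -> Optional[str]:
    """Return the NMDSpider version found in a User-Agent header, or None."""
    match = NMD_TOKEN.search(user_agent)
    return match.group(1) if match else None


ua = (
    "Mozilla/5.0 (compatible; NMDSpider/1.1; "
    "http://www.newmediadev.net/page.cfm/content/NMD_Spider) NMD Spider/1.1"
)
print(nmd_spider_version(ua))
```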

If you would like us to not crawl your site, please add this to your robots.txt:

User-agent: NMDSpider
Disallow: /
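
To check how a rule like this is interpreted before publishing it, Python's standard urllib.robotparser module can parse the text locally. This is a sketch using a generic parser, not NMDSpider's own. The can_fetch wrapper, the example.com URL, and the OtherBot agent name are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = "User-agent: NMDSpider\nDisallow: /\n"

# Pass the short product token, not the full User-Agent string:
# this parser matches on the text before the first "/".
print(can_fetch(rules, "NMDSpider", "http://www.example.com/page.html"))
print(can_fetch(rules, "OtherBot", "http://www.example.com/page.html"))
```

With these rules the first call reports that NMDSpider is blocked, while the second shows that other agents are unaffected.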

If you feel that we are crawling too fast, please add this to your robots.txt:

User-agent: NMDSpider
Crawl-Delay: 5

This will slow our crawl to at most one page every 5 seconds.
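
The effect of the delay can be sketched with Python's standard urllib.robotparser module, which reads the Crawl-Delay value (Python 3.6 or later). The seconds_to_wait helper is hypothetical and only illustrates how a polite crawler might space its requests. It is not NMDSpider's actual scheduling code.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: NMDSpider", "Crawl-Delay: 5"])

# Delay in seconds requested for this agent (None if no delay is set).
delay = parser.crawl_delay("NMDSpider")


def seconds_to_wait(last_fetch: float, now: float, delay: float) -> float:
    """Time left before the next request is allowed, never negative."""
    return max(0.0, last_fetch + delay - now)


# Last page fetched at t=100s; at t=102s the crawler must wait 3 more seconds.
print(delay, seconds_to_wait(100.0, 102.0, delay))
```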

If you would like to explicitly allow New Media Development's spider on your site, please add this to your robots.txt:

User-agent: NMDSpider
Allow: /

For more information on robots.txt files, see robotstxt.org.