SEORoberto – Website Crawl & Monitoring Tool

I’m finally done with the final project for the Full Stack Foundation course. I built a tool to scan, monitor and manage website audits for SEO professionals and digital marketers.

It’s built with AngularJS, Node.js, Express and MongoDB, and took two months to complete whilst I was attending classes full-time. It’s not fully baked, but it works as an MVP.

As part of the assessment I was required to pitch the app to investors; you can view my pitch for the app here. Visit https://seoroberto.herokuapp.com to test the app. Here’s a breakdown of its features.

Static Pages

  • Product, pricing and contact pages with information about the product and its pricing plans. However, payment is not integrated in this MVP.
  • Registration (with a welcome email), login and logout. (A rough sketch of the welcome email is below.)
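
For the welcome email, here’s a minimal sketch of how it could be wired up, assuming nodemailer; the SMTP host, sender address and user fields are illustrative placeholders, not the app’s actual configuration.

```js
// Minimal welcome-email helper, assuming nodemailer.
// The SMTP settings and addresses below are placeholders.
const nodemailer = require('nodemailer');

const transporter = nodemailer.createTransport({
  host: 'smtp.example.com', // placeholder SMTP host
  port: 587,
});

// Called after a new user registers successfully.
async function sendWelcomeEmail(user) {
  await transporter.sendMail({
    from: 'no-reply@seoroberto.example', // placeholder sender
    to: user.email,
    subject: 'Welcome to SEORoberto',
    text: `Hi ${user.name}, thanks for signing up! Log in to run your first scan.`,
  });
}
```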

Crawling

  • Input a domain URL and, with a click of a button, the tool crawls the links on every webpage and saves each page’s title, headings, meta description, etc., along with the user’s ID, into the MongoDB database as JSON documents. Only the first 8 links are crawled for demonstration, to keep the database small. (A rough sketch of this flow is below.)
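
Here is a minimal sketch of how such a crawl could work, assuming axios and cheerio for fetching and parsing, and the official MongoDB driver for storage. The library choices, connection string and collection name are assumptions for illustration, not the app’s actual code.

```js
// Simplified crawl: fetch the start page, store its SEO fields,
// then scan the first few linked pages the same way.
const axios = require('axios');
const cheerio = require('cheerio');
const { MongoClient } = require('mongodb');

const MAX_LINKS = 8; // only the first 8 links are crawled in the demo

// Fetch one page, save its SEO fields for this user, and return its links.
async function scanPage(url, userId, pages) {
  const { data: html } = await axios.get(url);
  const $ = cheerio.load(html);

  await pages.insertOne({
    userId,
    url,
    title: $('title').text(),
    metaDescription: $('meta[name="description"]').attr('content') || '',
    headings: $('h1, h2').map((i, el) => $(el).text().trim()).get(),
    scannedAt: new Date(),
  });

  // Collect absolute http(s) links, ignoring hrefs that don't parse.
  const links = [];
  $('a[href]').each((i, el) => {
    try {
      const href = new URL($(el).attr('href'), url).href;
      if (href.startsWith('http')) links.push(href);
    } catch (err) { /* skip malformed hrefs */ }
  });
  return links;
}

async function crawl(domainUrl, userId) {
  const client = await MongoClient.connect('mongodb://localhost:27017'); // placeholder URI
  const pages = client.db('seoroberto').collection('pages');

  const links = await scanPage(domainUrl, userId, pages);
  for (const link of links.slice(0, MAX_LINKS)) {
    await scanPage(link, userId, pages).catch(() => {}); // skip pages that fail
  }
  await client.close();
}
```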

  • View the scanned data in a table where you can drag and drop to rearrange columns, hide columns, and sort by clicking the headers, which stay sticky as you scroll. There’s also an option to export the table to a CSV file.
  • Search and filter functionality. You can filter the data by date, domain URL, or both, and you can also search for a specific keyword. This makes it easy to compare and analyse the data, e.g. enter a specific page URL and see how its titles and descriptions changed over time and how that affected search rankings.
  • Missing labels flag empty values on a webpage that should be filled in for search engines to pick up.
  • Too-long labels show the character count of your titles and meta descriptions; titles and descriptions that are too long get truncated by Google. (A sketch of these checks follows this list.)
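
The missing and too-long checks boil down to simple length rules. Here’s a minimal sketch; the ~60- and ~155-character thresholds are rough, commonly cited truncation limits, not values taken from the app.

```js
// Rough truncation thresholds; Google's actual cutoffs are pixel-based
// and vary, so these character counts are approximations.
const TITLE_LIMIT = 60;
const DESCRIPTION_LIMIT = 155;

function checkLabels(page) {
  return {
    titleLength: page.title.length,
    titleTooLong: page.title.length > TITLE_LIMIT,
    descriptionLength: page.metaDescription.length,
    descriptionTooLong: page.metaDescription.length > DESCRIPTION_LIMIT,
    // Empty values that search engines expect to find.
    missing: [
      !page.title && 'title',
      !page.metaDescription && 'meta description',
    ].filter(Boolean),
  };
}

// Example: checkLabels({ title: 'Home', metaDescription: '' })
// -> { titleLength: 4, ..., missing: ['meta description'] }
```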

  • Report – this page is hardcoded with sample data. It’s just an example chart; more visualisations of key performance metrics would be added in the full app.
  • Chart – shows the proportion and number of indexed and noindex pages, as well as the number of pages where the meta robots value is not indicated, which should be fixed. This is a performance metric: more indexed pages means more landing pages for search. The chart can be downloaded as a PDF or an image file. (A sketch of how these counts could be derived follows this list.)
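
For reference, here’s one way the chart’s counts could be derived from the crawled pages; the `metaRobots` field name is an assumption, since the app’s actual schema isn’t shown.

```js
// Bucket crawled pages as indexed, noindex, or "not indicated",
// assuming each page document stores its meta robots value as
// `metaRobots` (field name assumed).
function indexabilityCounts(pages) {
  const counts = { indexed: 0, noindex: 0, notIndicated: 0 };
  for (const page of pages) {
    if (!page.metaRobots) {
      counts.notIndicated += 1; // no meta robots value found on the page
    } else if (/noindex/i.test(page.metaRobots)) {
      counts.noindex += 1; // explicitly excluded from search indexes
    } else {
      counts.indexed += 1; // e.g. "index, follow"
    }
  }
  return counts;
}
```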

  • Scheduled scans – currently a user can set only one scheduled scan per account. A scheduler pings the app’s API endpoint weekly at a specific day and time; the scan is performed on the server, and an email is sent to the user when it’s done so they can log in to view the scanned data. There is, however, a bug in the scheduled scan that I haven’t been able to fix yet. (A rough sketch of the scheduling is below.)
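
As a sketch of how the weekly trigger could work, assuming node-cron for the schedule and nodemailer for the notification: the cron expression, endpoint path and addresses are placeholders, and the real app may well use an external scheduler instead.

```js
// Scheduler side: ping the scan endpoint once a week.
// Endpoint path and payload fields are assumed for illustration.
const cron = require('node-cron');
const axios = require('axios');

cron.schedule('0 9 * * 1', () => { // '0 9 * * 1' = Mondays at 09:00
  axios.post('https://seoroberto.herokuapp.com/api/scan', {
    userId: 'the-user-id',        // placeholder values
    domain: 'https://example.com',
  }).catch(console.error);
});

// Server side: after the scan finishes, notify the user by email
// (transporter configured as in the welcome-email sketch above).
const nodemailer = require('nodemailer');
const transporter = nodemailer.createTransport({ host: 'smtp.example.com', port: 587 });

async function notifyScanDone(user) {
  await transporter.sendMail({
    from: 'no-reply@seoroberto.example',
    to: user.email,
    subject: 'Your weekly scan is ready',
    text: 'Log in to SEORoberto to view your latest scan results.',
  });
}
```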

There’s a lot more work to be done on the app, and I’ll continue working on it as a side project 🙂