Nuroa is a vertical search engine for property sales, rentals and vacation homes. The backend is responsible for crawling, parsing, transforming and summarising all the data the site operates on. Because it consumes huge documents (XML feed files larger than 2 GB), it has to be very memory efficient and must keep operating reliably even when the CPU or I/O is overloaded.
There are two types of data acquisition:
- From cooperating sites we get feeds in different file formats, mostly XML, which are parsed and stored in our databases in an internal representation.
- For sites which don’t offer feeds we use our own web crawler, which relies on templates to capture the property listing data.
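Feed files of this size cannot be loaded into memory as a whole, so they are typically read with a streaming parser that only holds the current element. The following is a minimal sketch of that idea using the standard StAX API; the class name `FeedStreamer` and the element names `listing` and `title` are assumptions made for illustration and do not describe the actual Nuroa feed format or code.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class FeedStreamer {

    /**
     * Streams a feed and hands every listing title to the sink.
     * Only the current event is kept in memory, so heap usage does not
     * grow with the size of the file. Returns the number of listings seen.
     * Element names "listing" and "title" are hypothetical.
     */
    public static int streamListings(Reader feed, Consumer<String> titleSink)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Feeds come from third parties, so DTDs and external entities are disabled.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        XMLStreamReader reader = factory.createXMLStreamReader(feed);
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("listing".equals(name)) {
                        count++;
                    } else if ("title".equals(name)) {
                        // Reads the text content and advances to the closing tag.
                        titleSink.accept(reader.getElementText());
                    }
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<feed><listing><title>Flat in Barcelona</title></listing>"
                   + "<listing><title>House in Madrid</title></listing></feed>";
        int n = streamListings(new StringReader(xml), System.out::println);
        System.out.println(n + " listings");
    }
}
```

In a real pipeline the sink would write each listing to the database as soon as it is parsed instead of collecting results, which keeps memory usage flat for multi-gigabyte files.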
Highlights
- Highly parallel feed parsers
- Minimal resource usage
- Monitoring and controlling CPU and I/O usage
- Full text search through Lucene
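One simple way to keep CPU usage under control is to check the system load before starting the next unit of work and to back off while the machine is busy. The sketch below uses the standard `OperatingSystemMXBean` from `java.lang.management`; the class `LoadThrottle`, its method names and the threshold are hypothetical and only illustrate the idea, not the mechanism actually used in the project. It covers CPU load only, not I/O.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadThrottle {

    /**
     * Hypothetical policy: pause when the load average per CPU exceeds the threshold.
     * A negative load average means the value is unavailable on this platform,
     * in which case the work is never paused.
     */
    public static boolean shouldPause(double loadAverage, int processors, double maxLoadPerCpu) {
        if (loadAverage < 0 || processors <= 0) {
            return false;
        }
        return loadAverage / processors > maxLoadPerCpu;
    }

    /** Blocks the calling worker until the machine has spare CPU capacity. */
    public static void awaitCapacity(double maxLoadPerCpu, long pauseMillis)
            throws InterruptedException {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        while (shouldPause(os.getSystemLoadAverage(), os.getAvailableProcessors(), maxLoadPerCpu)) {
            Thread.sleep(pauseMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed threshold: back off when the load exceeds 0.9 per CPU.
        awaitCapacity(0.9, 500L);
        System.out.println("capacity available, parsing next batch");
    }
}
```

A feed parser worker would call `awaitCapacity` between batches, so that parallel parsers slow down on their own when the host is overloaded.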
Technologies used
- Oracle DB for operational data
- MySQL for standing data
- MongoDB for listing content
- Spring
- Lucene
- iBATIS
- JMX
- JMS
- Spring MVC
- Java Cache
Info
Role: Senior Software Developer
Time span: Jan 2011 - Jun 2012