Before content can be indexed and searched, it must be collected and stored on a local server. Gathering content consists of two stages. First, we need to collect, or crawl, the data. Second, the data needs to be pre-processed and transformed into a format suitable for indexing.
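The two stages can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation. It assumes that the caller supplies a `fetch` function (real network access, politeness delays and robots.txt handling are out of scope), that raw pages are stored as HTML files in a local directory, and that a "format suitable for indexing" is a simple plain-text record. The names `crawl`, `preprocess` and the record layout are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable, Iterable


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts: list[str] = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def crawl(urls: Iterable[str], fetch: Callable[[str], str], store: Path) -> list[Path]:
    """Stage 1 (hypothetical): fetch each URL and store the raw content locally."""
    store.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, url in enumerate(urls):
        path = store / f"doc{i:05d}.html"  # assumed file-naming scheme
        path.write_text(fetch(url), encoding="utf-8")
        paths.append(path)
    return paths


def preprocess(path: Path) -> dict:
    """Stage 2 (hypothetical): turn a stored raw page into a plain-text record."""
    parser = _TextExtractor()
    parser.feed(path.read_text(encoding="utf-8"))
    parser.close()
    # Join text fragments and collapse runs of whitespace into single spaces.
    text = " ".join(" ".join(parser.parts).split())
    return {"id": path.stem, "text": text}
```

Keeping the stages separate means the raw crawl output persists on disk, so pre-processing can be rerun or changed without fetching the content again.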