Spanish Web Archive

Web archiving is the main way to fulfil the legal deposit of online publications. It is carried out by crawler robots that visit previously selected URLs and save everything linked from them, at a set frequency, depth and size. The results of this harvesting process are the web archives.
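As a rough illustration of what "depth" and "size" limits mean for a crawl, here is a minimal sketch in Python. It does not describe the software the library actually uses: the function name `crawl`, the parameters `max_depth` and `max_pages`, and the example URLs are all assumptions made for this illustration, and a small in-memory link graph stands in for the live web. Crawl frequency is not modelled.

```python
from collections import deque


def crawl(seeds, links, max_depth, max_pages):
    """Breadth-first traversal from seed URLs, bounded by depth and page count."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    harvested = []
    while queue and len(harvested) < max_pages:
        url, depth = queue.popleft()
        harvested.append(url)  # a real crawler would fetch and store the page here
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return harvested


# Toy link graph (hypothetical URLs): page -> pages it links to.
LINKS = {
    "https://example.es/": ["https://example.es/a", "https://example.es/b"],
    "https://example.es/a": ["https://example.es/a/1"],
    "https://example.es/b": [],
    "https://example.es/a/1": ["https://example.es/a/1/x"],
}

shallow = crawl(["https://example.es/"], LINKS, max_depth=1, max_pages=100)
deep = crawl(["https://example.es/"], LINKS, max_depth=3, max_pages=100)
```

With these toy inputs, `shallow` stops at the pages linked directly from the seed, while `deep` follows the chain of links further down the same site.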

Exhaustive web archiving is currently impossible, so the Spanish National Library has opted for a mixed model that combines bulk and selective collections:

  1. Bulk crawls collect as many domains as possible without going deep into the navigation levels. They are tied to a national top-level domain, such as .es, and are carried out once a year.
  2. Selective crawls complement the bulk collections: they collect, more frequently, a smaller sample of websites selected for their relevance to history, society and culture. They are carried out several times a year in collaboration with the conservation centers of the Autonomous Communities and other specialized institutions. These selective collections can be of three types:
    1. Themed crawls: each department of the Biblioteca Nacional de España and each Autonomous Community maintains its own thematic collections of online resources that it considers necessary to keep as part of the legal deposit. For example: Music and Audiovisuals, Andalusian electronic journals, Institutions of the Valencian Community, etc.
    2. Crawls for events: covering events of special relevance.
    3. Crawls for risk: for websites in danger of disappearing.

Further information: BNE Web archive

DATA AND FORMATS AVAILABLE:

 

Spanish Web Archive
Bulk crawls: OpenWayBack HTML (*Only available at the BNE building)
Themed crawls: CSV, JSON, ODS, TXT, XLS, XML
Autonomous Communities crawls: CSV, JSON, ODS, TXT, XLS, XML
Crawls for events: CSV, JSON, ODS, TXT, XLS, XML
Crawls for elections: CSV, JSON, ODS, TXT, XLS, XML
Crawls for risk: CSV, JSON, ODS, TXT, XLS, XML
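
For the collections offered as CSV, a first look at a downloaded file might go as in the sketch below. The column layout of the real exports is not described here, so the code makes no assumption about it and only reports the columns it finds. The helper name `summarize_csv` and the sample data are made up for this illustration.

```python
import csv
import io


def summarize_csv(handle):
    """Return the column names and the number of data rows of a CSV export."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return reader.fieldnames, len(rows)


# Made-up sample standing in for a downloaded file;
# the real exports will most likely have different columns.
sample = io.StringIO("title,url\nExample site,https://example.es/\n")
columns, count = summarize_csv(sample)
```

For an actual download, the same function could be passed a file opened with `open(path, newline="", encoding="utf-8")`, where the UTF-8 encoding is an assumption that should be checked against the file.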