About The Wayback Machine
The Wayback Machine is a digital archive of the World Wide Web created by the Internet Archive, a nonprofit organization based in San Francisco, California, United States. It was launched on October 24, 2001, and was set up by Bruce Gilliat and Brewster Kahle. It gives users access to archived versions of web pages across time. Its website is web.archive.org.
Since 1996, the Wayback Machine has been archiving cached pages of websites onto its large cluster of Linux nodes. It keeps its archive up to date by revisiting sites periodically (for example, weekly or monthly). Sites may also be captured when visitors enter a site’s URL into the Wayback Machine’s search box. The aim is to preserve content that would otherwise be lost when a site is changed or shut down. The long-term vision of its creators is to archive the entire internet.
The name “Wayback Machine” is a reference to the WABAC Machine (pronounced “wayback”), a time-traveling device used by the characters Mr. Peabody and Sherman in the animated cartoon The Rocky and Bullwinkle Show. In Peabody’s Improbable History, one of the cartoon’s component segments, the characters regularly used the WABAC machine to witness, take part in, and, more often than not, change famous events in history.
The Internet Archive describes itself as built for the long term, and some users see it as working intensely to capture data before it disappears without a trace.
How the Wayback Machine Works
The Wayback Machine is a valuable tool, with over 300 billion pages stored over the past two decades. It works by crawling millions of websites and taking snapshots of them, which are then stored. It allows users to see what a given website looked like at a particular time, and to view old websites that are no longer on the internet.
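As a rough illustration of looking up what a site looked like at a particular time, the sketch below builds the address of a snapshot. It assumes the commonly seen address pattern web.archive.org/web/<timestamp>/<url>, where the timestamp has the form YYYYMMDDhhmmss; the function name snapshot_url is made up for this example, and the code only constructs the address without contacting the archive.

```python
from datetime import datetime

# Assumed base of snapshot addresses on the Wayback Machine.
WAYBACK_BASE = "https://web.archive.org/web"


def snapshot_url(page_url: str, when: datetime) -> str:
    """Build the address of a snapshot of page_url near the moment `when`.

    Hypothetical helper: it only formats a string and does not check
    whether a capture actually exists for that date.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit YYYYMMDDhhmmss
    return f"{WAYBACK_BASE}/{timestamp}/{page_url}"


if __name__ == "__main__":
    print(snapshot_url("example.com", datetime(2006, 1, 1)))
```

If no capture exists for the exact moment requested, the archive generally serves the closest one it has, so the date in the address is best read as a target rather than a guarantee.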
The crawler software was created to download all publicly accessible World Wide Web pages, the Gopher hierarchy, the Netnews bulletin board system, and downloadable software. Because much of the data on the net is restricted by publishers or stored in inaccessible databases, the information gathered by these crawlers does not include everything available online.
Snapshots usually become available more than six months after they are archived and, at times, much later (twenty-four months or even longer). The rate at which snapshots are taken is variable, so not all updates to tracked websites are recorded. Intervals between snapshots can range from several weeks to years.
After August 2008, sites had to be listed on the Open Directory in order to be included.
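To make the point about uneven intervals concrete, the following sketch computes the gaps between successive captures of a page. The three timestamps are invented for illustration (they are not real captures), and capture_gaps is a hypothetical helper; the only assumption about the archive is that captures are labelled with 14-digit YYYYMMDDhhmmss timestamps.

```python
from datetime import datetime


def capture_gaps(timestamps):
    """Return the number of whole days between consecutive captures."""
    times = sorted(datetime.strptime(ts, "%Y%m%d%H%M%S") for ts in timestamps)
    return [(later - earlier).days for earlier, later in zip(times, times[1:])]


# Invented capture timestamps: two taken three weeks apart, then a long gap.
captures = ["20050103120000", "20050124120000", "20070124120000"]
gaps = capture_gaps(captures)
print(gaps)  # gap lengths in days
```

With these made-up values, the first gap is a few weeks and the second is two years, which is the kind of spread the paragraph above describes.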
Storage Capacity of the Machine
As of 2009, the Wayback Machine contained approximately 3 petabytes of data and was growing at about 100 terabytes per month. It stores its data on PetaBox rack systems.
In 2009, the Internet Archive moved its storage architecture to Sun Open Storage. It now hosts a data center in a Sun Modular Datacenter.
A new and improved version of the Wayback Machine was made available for public testing in 2011. The new version had a fresher index of archived content and an updated interface.
In March 2011, a post on the Wayback Machine forum said: “The Beta of the new Wayback Machine has a more complete and up-to-date index of all crawled materials into 2010, and will continue to be updated regularly. The index driving the classic Wayback Machine only has a little bit of material past 2008, and no further index updates are planned, as it will be phased out this year”. (“Beta Wayback Machine, in forum”. archive.org)
The Internet Archive announced a milestone of 240 billion URLs in January 2013.
In October 2013, the Internet Archive also introduced a feature called “save a page”, which lets any internet user archive the contents of a URL.
As of December 2014, the Wayback Machine contained almost 9 petabytes of data and was growing at about 20 terabytes per week.
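The growth figures in this section are quoted in different units (per month for 2009, per week for 2014), so a quick conversion helps when comparing them. The sketch below is only arithmetic on the numbers quoted here; it assumes an average month of 365.25 / 12 days and decimal units (1 petabyte = 1,000 terabytes), and all names are illustrative.

```python
# Figures quoted in this section.
RATE_2009_TB_PER_MONTH = 100
RATE_2014_TB_PER_WEEK = 20

# Assumption: an average month is 365.25 / 12 days long.
WEEKS_PER_MONTH = 365.25 / 12 / 7

# Assumption: decimal units, 1 petabyte = 1,000 terabytes.
TB_PER_PB = 1000

rate_2014_tb_per_month = RATE_2014_TB_PER_WEEK * WEEKS_PER_MONTH
weeks_per_petabyte = TB_PER_PB / RATE_2014_TB_PER_WEEK

print(f"2009: {RATE_2009_TB_PER_MONTH} TB per month")
print(f"2014: about {rate_2014_tb_per_month:.0f} TB per month")
print(f"2014: about {weeks_per_petabyte:.0f} weeks to add one petabyte")
```

Under these assumptions, 20 terabytes per week works out to roughly 87 terabytes per month, which is of the same order as the 2009 figure.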
The Wayback Machine was reported to contain around 15 petabytes of data as of July 2016.
The website’s global rank on Alexa changed from 162 to 279 between October 2013 and August 2017.
The Wayback Machine Usage
The site is often used by citizens and journalists to review dated news reports, changes to web content, and websites that are no longer on the internet. Its content has been used to hold politicians accountable and to expose battlefield lies.
In 2014, an archived copy of the social media page of Igor Girkin, a separatist rebel leader in Ukraine, showed him boasting that his troops had shot down a suspected Ukrainian military airplane, before it became known that the plane was actually a civilian Malaysia Airlines jet. He then deleted the post and blamed Ukraine’s military.
In 2017, the March for Science (formerly known as the Scientists’ March on Washington) grew out of a discussion on Reddit, an American social news aggregation, web content rating, and discussion website. In that discussion, someone reported having visited archive.org and found that all references to climate change had been erased from the White House website. In response, a user commented, “There needs to be a Scientists’ March on Washington”. (Foley, Katherine Ellen. “The March for Science started with a single Reddit thread”).
In Europe, the Wayback Machine could be said to violate copyright laws, because only the creator of content can decide where that content is duplicated or published. Given this exclusive right, the Archive would have to remove pages from its system at the content creator’s request.
The website archive.org is currently blocked in India. It has also been blocked in China, and it was blocked in Russia in 2015 after it enabled the HTTPS protocol.