Implementing an S3 native filesystem in Hadoop

Let's first create the InputStream and OutputStream implementations for the filesystem. In our example, they have to connect to AWS to read files from and write files to S3.

Hadoop provides the FSInputStream class as the base for the input streams of custom filesystems. In the example implementation, we extend this class and override a few of its methods. The class declares several private variables, along with a constructor and helper methods that initialize the client, as illustrated in the following code snippet. These private variables hold the objects used to configure the filesystem and retrieve data from it. In this example, we use objects such as AmazonS3Client to call REST web APIs on AWS, S3Object as a representation of the remote object on S3, ...
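The book's own snippet is not reproduced here, so the following is only a minimal sketch of the shape such a stream might take, not the authors' implementation. To stay runnable without the Hadoop and AWS libraries, it extends java.io.InputStream instead of FSInputStream and mirrors the three methods that FSInputStream declares as abstract (seek, getPos, and seekToNewSource). The RangeReader interface, the InMemoryRangeReader class, and the SeekableRemoteInputStream name are hypothetical stand-ins: in a real implementation, the range read would presumably be served by AmazonS3Client and S3Object.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical stand-in for the remote store. A real implementation would
// presumably fetch byte ranges from S3 through AmazonS3Client instead.
interface RangeReader {
    long length();

    // Returns up to maxBytes bytes starting at offset start (empty at EOF).
    byte[] read(long start, int maxBytes) throws IOException;
}

// Sketch of a seekable input stream. It extends java.io.InputStream so that
// it compiles with the JDK alone; in Hadoop, the class would extend
// org.apache.hadoop.fs.FSInputStream, which declares the same three
// seek-related methods as abstract.
class SeekableRemoteInputStream extends InputStream {
    private final RangeReader reader; // assumed client wrapper
    private long pos = 0;             // current offset in the remote object
    private boolean closed = false;

    SeekableRemoteInputStream(RangeReader reader) {
        this.reader = reader;
    }

    // Moves the read position; later reads start from newPos.
    public void seek(long newPos) throws IOException {
        checkOpen();
        if (newPos < 0 || newPos > reader.length()) {
            throw new EOFException("Cannot seek to " + newPos);
        }
        pos = newPos;
    }

    public long getPos() throws IOException {
        return pos;
    }

    // There is no alternative replica to switch to in this sketch.
    public boolean seekToNewSource(long targetPos) throws IOException {
        return false;
    }

    @Override
    public int read() throws IOException {
        checkOpen();
        byte[] one = reader.read(pos, 1);
        if (one.length == 0) {
            return -1; // end of object
        }
        pos++;
        return one[0] & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        checkOpen();
        if (len == 0) {
            return 0;
        }
        byte[] chunk = reader.read(pos, len);
        if (chunk.length == 0) {
            return -1; // end of object
        }
        System.arraycopy(chunk, 0, buf, off, chunk.length);
        pos += chunk.length;
        return chunk.length;
    }

    @Override
    public void close() {
        closed = true;
    }

    private void checkOpen() throws IOException {
        if (closed) {
            throw new IOException("Stream is closed");
        }
    }
}

// In-memory RangeReader, used only to exercise the sketch locally.
class InMemoryRangeReader implements RangeReader {
    private final byte[] data;

    InMemoryRangeReader(byte[] data) {
        this.data = data;
    }

    public long length() {
        return data.length;
    }

    public byte[] read(long start, int maxBytes) {
        int from = (int) Math.min(start, (long) data.length);
        int to = (int) Math.min((long) from + maxBytes, (long) data.length);
        return Arrays.copyOfRange(data, from, to);
    }
}
```

To try the sketch, wrap a byte array in an InMemoryRangeReader, pass it to SeekableRemoteInputStream, and call seek followed by read. Keeping the remote access behind a small interface is a design choice of this sketch, made so that the seek and position logic can be tested without network access.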


Hadoop: Data Processing and Modelling by Garry Turkington, Tanmay Deshpande, Sandeep Karanth
