Azure Data Lake Overview For Beginners

What Is A Data Lake?

  • A data lake is a central storage repository that holds big data from many sources in its raw, native form until it is needed.
  • It can store structured, semi-structured, and unstructured data, which means data can be kept in a flexible format for future use.
  • A data lake can store and analyze petabyte-size files and trillions of objects.
  • It also makes it easier to develop massively parallel programs.

Why Azure Data Lake?

  • Azure Data Lake includes the capabilities needed to make it easy for data scientists, developers, and analysts to store data of any shape, size, and speed.
  • It supports many types of analytics and processing across platforms and languages.
  • It removes much of the complexity of ingesting and storing your data while making it faster to get up and running with streaming, batch, and interactive analytics.


What Is Azure Data Lake Analytics?

  • Azure Data Lake Analytics is an on-demand analytics job service built on Apache Hadoop YARN that simplifies big data.
  • It can start processing big data jobs in seconds, and there is no infrastructure to worry about because there are no virtual machines, servers, or clusters to wait for, manage, or tune.
  • It is designed to let users perform analytics on data up to petabytes in size.
  • It includes U-SQL, a query language that extends the simple, familiar, declarative nature of SQL with the expressive power of C#.
  • It is a cost-effective solution for big data workloads. You pay on a per-job basis when data is processed.

What Is Azure Data Lake Store?

  • Azure Data Lake Store provides a single repository where small or large organizations can upload data of practically unlimited size.
  • It is designed for high-performance processing and analytics from Hadoop Distributed File System (HDFS) tools and applications, including support for low-latency workloads.
  • It stores structured and unstructured data in their native formats.
  • It allows for huge throughput to boost analytic performance.
  • It offers high availability, durability, and reliability.
  • Compared with Amazon S3, it is positioned as offering an integrated analytics service and placing no fixed limits on file size.
  • ADLS comes in two generations:
    • ADLS Gen1
    • ADLS Gen2

1) What Is Azure Data Lake Storage Gen1?

  • Azure Data Lake Storage Gen1 is an enterprise-wide, hyper-scale repository for big data analytic workloads.
  • It lets us capture data of any type, size, and ingestion speed in a single place for operational and exploratory analytics.
  • It provides enterprise-grade capabilities such as scalability, security, manageability, availability, and reliability.

Key Features of Data Lake Storage Gen1

Some of the key features of Data Lake Storage Gen1 include the following.

  • Made for Hadoop: We can easily analyze data stored in ADLS Gen1 using Hadoop analytic frameworks such as Hive or MapReduce.
  • Unlimited storage: ADLS Gen1 provides unlimited storage and can hold files for analytics that range from kilobytes to petabytes in size.
  • Big data analytics: ADLS Gen1 is built for running large-scale analytic systems that require huge throughput to analyze and query large amounts of data.
  • Highly available and secure: ADLS Gen1 keeps redundant copies of data to guard against unexpected failures.

2) What Is Azure Data Lake Storage Gen2?

  • ADLS Gen2 is a collection of capabilities for big data analytics.
  • It is built on Azure Blob storage and has all the key features of ADLS Gen1.
  • ADLS Gen2 offers capabilities like:
    • file system semantics
    • file-level security
    • directories
    • low cost
    • scalability
    • high availability/disaster recovery
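
To make "file-level security" a little more concrete, here is a minimal sketch of how a POSIX-style permission string can be checked for one file. It is plain Python and does not call any Azure API; the function name `is_allowed` and the sample permission string are hypothetical, chosen only for illustration.

```python
def is_allowed(mode: str, role: str, action: str) -> bool:
    """Check a 9-character POSIX-style mode string such as 'rwxr-x---'.

    role is 'owner', 'group', or 'other'; action is 'r', 'w', or 'x'.
    """
    if len(mode) != 9:
        raise ValueError("mode must have 9 characters, e.g. 'rwxr-x---'")
    offsets = {"owner": 0, "group": 3, "other": 6}   # start of each triplet
    positions = {"r": 0, "w": 1, "x": 2}             # slot inside a triplet
    index = offsets[role] + positions[action]
    # A granted permission shows its letter; a denied one shows '-'.
    return mode[index] == action


# Hypothetical permissions for one file stored in the lake.
sales_file_mode = "rw-r-----"
print(is_allowed(sales_file_mode, "owner", "w"))
print(is_allowed(sales_file_mode, "group", "r"))
print(is_allowed(sales_file_mode, "other", "r"))
```

Because each file and directory carries its own permission set, access can be narrowed to a single file rather than granted for a whole container.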

Key Features of Data Lake Storage Gen2

Some of the key features of Data Lake Storage Gen2 include the following.

  • Hadoop-compatible access: ADLS Gen2 lets you access and manage data just as you would with a Hadoop Distributed File System (HDFS).
  • POSIX permissions: The security design for ADLS Gen2 supports access control lists (ACLs) and POSIX permissions, along with some extra granularity specific to ADLS Gen2.
  • Low cost: ADLS Gen2 offers low-cost transactions and storage capacity.
  • Optimized driver: The ABFS driver is built specifically for big data analytics.
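
Hadoop-style tools that use the ABFS driver usually address Gen2 data with a URI of the form `abfss://<file-system>@<account>.dfs.core.windows.net/<path>`. The sketch below builds and splits such a URI using only the Python standard library. The helper names and the account, file system, and path values are hypothetical placeholders, not real resources.

```python
from urllib.parse import urlparse


def build_abfss_uri(account: str, file_system: str, path: str = "") -> str:
    """Assemble an abfss:// URI for a path inside an ADLS Gen2 file system."""
    return f"abfss://{file_system}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def split_abfss_uri(uri: str) -> dict:
    """Break an abfs:// or abfss:// URI into account, file system, and path."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri}")
    if not parsed.username or not parsed.hostname:
        raise ValueError(f"expected <file-system>@<account> host part: {uri}")
    return {
        "file_system": parsed.username,            # part before the '@'
        "account": parsed.hostname.split(".")[0],  # first label of the host
        "path": parsed.path.lstrip("/"),
    }


# Hypothetical account and file system names, for illustration only.
uri = build_abfss_uri("examplelake", "raw", "sales/2024/orders.csv")
print(uri)
print(split_abfss_uri(uri))
```

A URI like this is what you would typically hand to an engine such as Spark or Hive when reading from the lake, much like an `hdfs://` path on a Hadoop cluster.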

Use Cases Of Azure Data Lake

  • General-purpose object storage managed by Azure.
  • Streaming and batch processing workloads.
  • Selection of data by analysts and data engineers for specific needs without making copies.

Advantages Of Azure Data Lake

  • Highly flexible and scalable because it is hosted in the cloud.
  • Streamlines data storage for a wide range of business needs.
  • A huge amount of data can be processed simultaneously, providing quick access to insights.
  • A data lake can store many kinds of data, such as multimedia, logs, XML, sensor data, social data, binary, chat, and people data.
  • No fixed limit on data storage or file size.
  • Supports massive analytics workloads for in-depth analytics.
  • It supports schema-less storage.

Data Lake Price/Month (Pay-as-you-go)

  • First 100 TB: Rs. 2.58 per GB
  • 100 TB to 1,000 TB: Rs. 2.52 per GB
  • 1,000 TB to 5,000 TB: Rs. 2.45 per GB
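
As a rough worked example, the tiers listed above can be turned into a monthly estimate. This sketch makes two assumptions that the price list does not state: the tiers are graduated (each rate applies only to the data that falls inside its band), and 1 TB is counted as 1,024 GB. The names `TIERS` and `monthly_cost_rs` are made up for this example, and actual bills depend on the current Azure price sheet.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (upper bound of the band in TB, price in Rs. per GB), taken from the list above.
TIERS = [(100, 2.58), (1000, 2.52), (5000, 2.45)]


def monthly_cost_rs(stored_tb: float) -> float:
    """Estimate the monthly storage cost in rupees under graduated tiers."""
    if stored_tb < 0:
        raise ValueError("stored_tb must not be negative")
    if stored_tb > TIERS[-1][0]:
        raise ValueError("no rate is listed above 5,000 TB")
    cost = 0.0
    lower = 0
    for upper, rate in TIERS:
        if stored_tb <= lower:
            break
        band_tb = min(stored_tb, upper) - lower  # TB that fall inside this band
        cost += band_tb * GB_PER_TB * rate
        lower = upper
    return cost


# 150 TB: the first 100 TB at Rs. 2.58 per GB, the remaining 50 TB at Rs. 2.52 per GB.
print(f"Rs. {monthly_cost_rs(150):,.2f}")
```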
