Azure Storage
Introduction
An Azure Storage account provides a unique namespace for your data and gives access to Azure's core storage services: blobs, files, queues, and tables. It is secure, scalable, and accessible over HTTP or HTTPS.
Azure Storage Offerings
Storage Types
- Blob Storage: Stores massive amounts of unstructured data, such as text or binary data. It is ideal for documents, images, and videos.
- File Storage: Provides fully managed file shares that can be mounted over the SMB or NFS protocol, suitable for sharing files between applications and users and for storing application data.
- Queue Storage: Enables reliable messaging between application components through authenticated HTTP or HTTPS calls, supporting asynchronous communication.
- Table Storage: A NoSQL key/attribute store with a schemaless design, offering fast and scalable storage of structured data such as logs or sensor data.
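Each storage type is reached through its own service endpoint in the account. The sketch below assumes the Azure public cloud, where default endpoints follow the pattern https://<account>.<service>.core.windows.net (sovereign clouds use other DNS suffixes, and custom domains or private endpoints change the host). The helper names are hypothetical and not part of any Azure SDK.

```python
# Default service endpoints of a storage account.
# Assumption: Azure public cloud, whose storage DNS suffix is "core.windows.net".
PUBLIC_CLOUD_SUFFIX = "core.windows.net"
STORAGE_SERVICES = ("blob", "file", "queue", "table")


def service_endpoint(account_name: str, service: str) -> str:
    """Return the default HTTPS endpoint of one storage service (hypothetical helper)."""
    if service not in STORAGE_SERVICES:
        raise ValueError(f"unknown storage service: {service!r}")
    return f"https://{account_name}.{service}.{PUBLIC_CLOUD_SUFFIX}"


def all_endpoints(account_name: str) -> dict:
    """Map every storage type to its endpoint for the given account."""
    return {svc: service_endpoint(account_name, svc) for svc in STORAGE_SERVICES}
```

For a hypothetical account named mystorageacct, service_endpoint("mystorageacct", "queue") returns https://mystorageacct.queue.core.windows.net.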
Key Features
- Scalability: Easily scales to accommodate increasing data storage and application demands.
- Security: Supports encryption, access controls, and integration with Microsoft Entra ID (formerly Azure Active Directory) for secure data management.
- High Availability: Offers multiple redundancy options to ensure data availability and disaster recovery.
- Integration: Seamlessly integrates with Azure services like Azure Functions, Data Factory, and Logic Apps for enhanced functionality.
Use Cases
An Azure Storage account is suitable for scenarios such as:
- Storing unstructured data such as media files and documents.
- Creating shared file systems for applications.
- Enabling asynchronous messaging between components.
- Managing NoSQL data for scalable applications.
Data Redundancy
In Azure Storage, data redundancy means keeping copies of data in multiple locations so that it stays safe and available through events such as hardware failures, datacenter outages, or regional disasters. Azure provides several redundancy options.
Types of Data Redundancy
Locally Redundant Storage (LRS):
LRS keeps three copies of data, written synchronously within a single datacenter in the primary region. It is the most cost-effective option and protects against server rack and drive failures, but not against a disaster that affects the whole datacenter.
Zone-Redundant Storage (ZRS):
ZRS keeps three copies of data, written synchronously across three availability zones in the primary region, so the data stays available if one zone fails.
An availability zone is one or more physically separate datacenters with independent power, cooling, and networking; regions that support availability zones have at least three of them.
Geo-Redundant Storage (GRS):
GRS keeps six copies of data: three in the primary region (using LRS) and three in a secondary region, the primary region's paired region, usually hundreds of miles away (also using LRS). Replication to the secondary region is asynchronous, and the secondary copy can only be read after a failover. GRS is suited to disaster recovery and long-term durability.
Read-Access Geo-Redundant Storage (RA-GRS):
RA-GRS provides the same redundancy as GRS and adds read access to the secondary region, so applications can keep reading data during a primary region outage.
Geo-Zone-Redundant Storage (GZRS):
GZRS combines ZRS and GRS: data is copied synchronously across three availability zones in the primary region, then asynchronously to a single datacenter in the secondary region, where it is kept with LRS. It protects against both zone failures and regional disasters, which suits mission-critical applications. Read-Access Geo-Zone-Redundant Storage (RA-GZRS) adds read access to the secondary region.
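The replication layouts of the redundancy options can be summarized in a small data model. This is a minimal sketch, not an Azure API: the Redundancy class and the OPTIONS table are hypothetical names. It counts copies per region as Azure documents them (for GZRS, three zone-redundant copies in the primary region plus three LRS copies in the secondary) and includes RA-GZRS, the read-access variant of GZRS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    """Where one redundancy option keeps its copies (simplified, hypothetical model)."""
    name: str
    primary_copies: int       # copies written synchronously in the primary region
    primary_zones: int        # availability zones those copies span (1 = one datacenter)
    secondary_copies: int     # copies replicated asynchronously to the secondary region
    secondary_readable: bool  # readable in the secondary region without a failover?

    @property
    def total_copies(self) -> int:
        return self.primary_copies + self.secondary_copies

    @property
    def tolerates_zone_failure(self) -> bool:
        # The primary region keeps serving data only if its copies span several zones.
        return self.primary_zones > 1

    @property
    def has_geo_copy(self) -> bool:
        # A copy survives a regional disaster (it may need a failover to be readable).
        return self.secondary_copies > 0


OPTIONS = {
    o.name: o
    for o in (
        Redundancy("LRS", 3, 1, 0, False),
        Redundancy("ZRS", 3, 3, 0, False),
        Redundancy("GRS", 3, 1, 3, False),
        Redundancy("RA-GRS", 3, 1, 3, True),
        Redundancy("GZRS", 3, 3, 3, False),
        Redundancy("RA-GZRS", 3, 3, 3, True),
    )
}
```

For example, OPTIONS["GRS"].total_copies is 6, and only ZRS, GZRS, and RA-GZRS report tolerates_zone_failure as True.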
Configuring Data Redundancy
- Select the appropriate redundancy option when creating a storage account.
- Change the redundancy option of an existing storage account; depending on the source and target options, this is done through a conversion or a manual migration, which can involve copying data.
By choosing the right redundancy option, you can protect your data against hardware failures, datacenter outages, and regional disasters while balancing costs and specific application requirements.
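When creating or updating an account, the redundancy option is selected as a SKU name such as Standard_LRS, Standard_ZRS, Standard_GRS, Standard_RAGRS, Standard_GZRS, or Standard_RAGZRS (for example, through the --sku option of az storage account create). The helper below is a hypothetical sketch that maps three requirements to the least-replicated Standard SKU meeting them. It ignores premium SKUs and regional availability, since not every region offers the zone-redundant options.

```python
def pick_redundancy_sku(zone_resilient: bool, geo_copy: bool,
                        read_secondary: bool = False) -> str:
    """Return the least-replicated Standard SKU that satisfies the requirements.

    zone_resilient: data must stay available if one availability zone fails.
    geo_copy:       a copy must exist in a secondary region.
    read_secondary: the secondary copy must be readable without a failover.
    """
    if read_secondary and not geo_copy:
        raise ValueError("reading from a secondary region requires a geo-redundant option")
    if geo_copy:
        base = "GZRS" if zone_resilient else "GRS"
        prefix = "RA" if read_secondary else ""
        return f"Standard_{prefix}{base}"
    return "Standard_ZRS" if zone_resilient else "Standard_LRS"
```

For instance, pick_redundancy_sku(zone_resilient=True, geo_copy=True) returns Standard_GZRS.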
Access Key vs Shared Access Signature (SAS)
- Access Keys: Access keys act as master passwords for Azure Storage. Each one grants complete access to the entire storage account and stays valid until it is regenerated. Every account has two keys, so one can be rotated while applications use the other. Keep access keys highly secure to prevent unauthorized access.
- Shared Access Signatures (SAS): SAS tokens grant temporary, scoped permissions on specific resources within a storage account. A token states which resources it covers, which operations it allows (for example, read or write), and when it expires, which makes SAS suitable for temporary scenarios such as giving a partner time-limited access or letting a client application upload a file directly.
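A SAS is derived from a key rather than being a key itself: the grant (resource, permissions, expiry, and more) is signed with HMAC-SHA256, and the signature travels in the sig query parameter next to fields such as sp (permissions) and se (expiry). The sketch below illustrates only that principle. Its string-to-sign is a deliberately simplified, hypothetical format, not the versioned format Azure actually signs, so these tokens will not work against a real account; in practice you would call an SDK helper such as generate_blob_sas from the azure-storage-blob Python package.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlencode

TIME_FMT = "%Y-%m-%dT%H:%M:%SZ"


def _sign(string_to_sign: str, account_key_b64: str) -> str:
    """HMAC-SHA256 of the string-to-sign, keyed with the base64-decoded account key."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def make_token(resource: str, permissions: str, valid_for: timedelta,
               account_key_b64: str, now: datetime) -> str:
    """Issue a SAS-like query string. Simplified string-to-sign, NOT Azure's real format."""
    se = (now + valid_for).astimezone(timezone.utc).strftime(TIME_FMT)
    sig = _sign("\n".join([permissions, se, resource]), account_key_b64)
    return urlencode({"sp": permissions, "se": se, "sig": sig})


def check_token(token: str, resource: str, needed: str,
                account_key_b64: str, now: datetime) -> bool:
    """Service-side check: signature valid, not expired, requested permissions granted."""
    q = {k: v[0] for k, v in parse_qs(token).items()}
    expected = _sign("\n".join([q["sp"], q["se"], resource]), account_key_b64)
    if not hmac.compare_digest(expected, q["sig"]):
        return False  # tampered token, different resource, or different key
    expiry = datetime.strptime(q["se"], TIME_FMT).replace(tzinfo=timezone.utc)
    return now < expiry and all(p in q["sp"] for p in needed)
```

Because the signature depends on the account key, editing the token (for example, changing sp=r to sp=rw) invalidates it, and regenerating the key invalidates every SAS that was signed with it.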
Container vs Blob
Container
A container groups blobs logically, much like a top-level folder. A storage account can have an unlimited number of containers, and each container can store an unlimited number of blobs. Container names must be unique within a storage account and may contain only lowercase letters, numbers, and hyphens.
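Container names follow fixed rules: 3 to 63 characters, only lowercase letters, digits, and hyphens, starting with a letter or digit, and with every hyphen placed between letters or digits. Inside a container the namespace is flat (unless the hierarchical namespace is enabled), so the "folders" seen in tools come from "/" characters in blob names. The sketch below checks those naming rules and builds a blob URL, assuming the Azure public cloud; the function names are hypothetical.

```python
import re
from urllib.parse import quote

# 3-63 characters overall; lowercase letters, digits, and hyphens only;
# every hyphen sits between letters or digits, so no leading, trailing,
# or consecutive hyphens are possible.
_CONTAINER_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_container_name(name: str) -> bool:
    return 3 <= len(name) <= 63 and _CONTAINER_NAME.fullmatch(name) is not None


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Public-cloud URL of a blob; '/' in the blob name forms virtual folders."""
    if not is_valid_container_name(container):
        raise ValueError(f"invalid container name: {container!r}")
    # quote() keeps '/' as-is and percent-encodes characters such as spaces.
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"
```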
Blob
A blob is the actual data stored within a container. Blobs can range from small files to terabytes of data. Azure Storage supports three types of blobs:
Block Blobs
Block blobs are used for large, unstructured data like documents and media files.
Page Blobs
Page blobs are collections of 512-byte pages optimized for random read/write operations, which makes them suitable for virtual hard disks (VHDs) used by Azure virtual machines.
Append Blobs
Append blobs are optimized for append-only operations, like logging.
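Two of the blob types impose write rules that are easy to see in code: page blob writes must be aligned to 512-byte page boundaries, and an append blob only accepts new blocks at its end (existing blocks cannot be modified or deleted). The sketch below models those two rules locally; it is not an SDK, and its names are hypothetical. With the azure-storage-blob Python package, the corresponding calls are BlobClient.upload_page for page blobs, BlobClient.append_block for append blobs, and stage_block plus commit_block_list for block blobs.

```python
PAGE_SIZE = 512  # page blobs are collections of 512-byte pages


def is_valid_page_write(offset: int, length: int) -> bool:
    """A page-blob write must start and end on 512-byte page boundaries."""
    return (
        offset >= 0
        and length > 0
        and offset % PAGE_SIZE == 0
        and length % PAGE_SIZE == 0
    )


class AppendOnlyLog:
    """Toy stand-in for an append blob: new blocks go at the end; old ones never change."""

    def __init__(self) -> None:
        self._blocks = []

    def append_block(self, data: bytes) -> int:
        """Add a block at the end and return the new total length in bytes."""
        self._blocks.append(bytes(data))
        return sum(len(b) for b in self._blocks)

    def read_all(self) -> bytes:
        return b"".join(self._blocks)
```

The log class deliberately exposes no update or delete method, mirroring why append blobs fit logging: writers only ever add to the end.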
Azure Lifecycle Management
Azure Storage Lifecycle Management enables customizable, automated rules for managing blob data across access tiers such as Hot, Cool, Cold, and Archive. Each tier offers distinct performance and cost characteristics, making it possible to optimize data storage based on how often the data is needed.
Capabilities of Lifecycle Management
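- Automated Tier Transition: Automatically move blobs between tiers based on conditions such as the number of days since creation, last modification, or last access (the last-access condition requires access-time tracking to be enabled).
- Cost Optimization: Delete blobs, previous versions, or snapshots automatically after a defined retention period to reduce storage costs.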
With these rules, organizations can achieve cost efficiency and ensure seamless, automated data management within Azure Storage.
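Lifecycle rules are defined in a JSON policy attached to the account (for example, with az storage account management-policy create). Below, a policy with that shape (rules, filters, and baseBlob actions keyed on daysAfterModificationGreaterThan) is written as a Python dict, together with a simplified local evaluator. The evaluator is a hypothetical sketch, not Azure's engine: it only handles base blobs, prefixMatch filters (whose values start with the container name), and modification age, and it ignores blobTypes. When several actions match, it applies the cheapest one (delete, then tierToArchive, then tierToCool), which is how the lifecycle management documentation describes overlapping actions.

```python
from typing import Optional

# Hypothetical rule for the "logs" container; the day thresholds are examples.
POLICY = {
    "rules": [
        {
            "enabled": True,
            "name": "age-out-logs",
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["logs/"]},
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                        "delete": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}

# Cheapest action first; only the three actions used above are modeled.
ACTION_PRECEDENCE = ("delete", "tierToArchive", "tierToCool")


def planned_action(blob_path: str, days_since_modified: int,
                   policy: dict = POLICY) -> Optional[str]:
    """Return the action the simplified evaluator would apply to 'container/blob'."""
    matched = set()
    for rule in policy["rules"]:
        if not rule.get("enabled", True):
            continue
        definition = rule["definition"]
        prefixes = definition.get("filters", {}).get("prefixMatch", [""])
        if not any(blob_path.startswith(p) for p in prefixes):
            continue
        for action, condition in definition["actions"].get("baseBlob", {}).items():
            if not isinstance(condition, dict):
                continue  # skip non-condition settings
            threshold = condition.get("daysAfterModificationGreaterThan")
            if threshold is not None and days_since_modified > threshold:
                matched.add(action)
    # Of all matching actions, apply the cheapest one.
    return next((a for a in ACTION_PRECEDENCE if a in matched), None)
```

With this policy, a blob at logs/app.txt last modified 100 days ago would be moved to Archive, while anything outside the logs container is left alone.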
Azure Data Lake
A data lake is a centralized repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data. It stores data in its native format and can hold data of any type and practically any size.
Key Features
- Scalable Storage: Azure Data Lake provides scalable and cost-effective storage for big data, making it suitable for storing vast amounts of data.
- Data Lake Storage: Azure Data Lake Storage Gen2 is built on Azure Blob Storage and combines the capabilities of Blob Storage with those of Azure Data Lake Storage Gen1, such as a hierarchical file system namespace.
- Data Analytics: Supports various analytics and data processing tools like Azure Databricks, HDInsight, and Azure Data Factory for data analysis.
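- Security: Provides robust security and access control features to protect data, including Microsoft Entra ID (formerly Azure Active Directory) integration, role-based access control, and POSIX-like access control lists (ACLs).
- Data Lake Analytics: Azure Data Lake Analytics was an on-demand job service for running big data analytics with the U-SQL language, without managing infrastructure. It did not support Data Lake Storage Gen2 and has since been retired; Spark and Hive workloads run on services such as Azure Databricks, HDInsight, or Azure Synapse Analytics.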
- Data Versioning: Supports versioning of data, making it easy to track changes and revert to previous versions when necessary.
- Geo-Redundancy: Offers options for geo-redundant storage to ensure data availability and durability.
- Data Compression & Encryption: Data is encrypted at rest by default, and data can be stored in compressed file formats to reduce storage costs.
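Analytics engines such as Azure Databricks and HDInsight usually reach Data Lake Storage Gen2 through the ABFS driver, using URIs of the form abfss://<file-system>@<account>.dfs.core.windows.net/<path>, where a file system is a container. The Gen2 REST API likewise uses the dfs endpoint rather than the blob endpoint. The helpers below are a hypothetical sketch that assumes the Azure public cloud and an account with the hierarchical namespace enabled.

```python
from urllib.parse import quote

DFS_SUFFIX = "dfs.core.windows.net"  # assumption: Azure public cloud


def abfss_uri(account: str, file_system: str, path: str = "") -> str:
    """URI used by the Hadoop/Spark ABFS driver (TLS variant) for a Gen2 path."""
    return f"abfss://{file_system}@{account}.{DFS_SUFFIX}/{path.lstrip('/')}"


def dfs_url(account: str, file_system: str, path: str = "") -> str:
    """HTTPS URL of the same path on the Data Lake Storage Gen2 (dfs) endpoint."""
    return f"https://{account}.{DFS_SUFFIX}/{file_system}/{quote(path.lstrip('/'))}"
```

Once authentication is configured, a URI built this way could be passed to a Spark reader, for example spark.read.parquet(abfss_uri("myaccount", "raw", "sales/2024/")), where the account and file system names are hypothetical.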
Azure Data Lake Storage: Gen1 vs Gen2
ADLS Gen1:
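- Hierarchical Namespace: ADLS Gen1 provides a hierarchical, HDFS-compatible file system with directories and subdirectories.
- Storage Format: It stores data in a dedicated store that is separate from Azure Blob Storage, so Blob Storage features such as access tiers and lifecycle management are not available.
- Analytics Compatibility: It's compatible with Azure Data Lake Analytics for big data analytics. ADLS Gen1 has since been retired, and Microsoft recommends migrating to Gen2.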
- Access Control: Supports Azure Data Lake Store ACLs (Access Control Lists) for managing access at the file and folder level.
ADLS Gen2:
- Hierarchical Namespace: ADLS Gen2 adds a hierarchical namespace to Azure Blob Storage, so data is organized in real directories and subdirectories, and renaming or deleting a directory becomes a single atomic metadata operation.
- Storage Format: It is built on top of Azure Blob Storage and supports various open data formats, making it more versatile.
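- Analytics Compatibility: It works with Azure Databricks, HDInsight, Azure Synapse Analytics, Azure Data Factory, and other Azure services (Azure Data Lake Analytics did not support Gen2).
- Access Control: Supports Azure role-based access control (RBAC) through Microsoft Entra ID (formerly Azure Active Directory) for coarse-grained access, plus POSIX-like ACLs on individual directories and files for fine-grained control.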
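The practical difference a hierarchical namespace makes shows up in directory operations. In a flat Blob namespace, "renaming a directory" means copying and deleting every blob under a prefix, while with the hierarchical namespace it is one metadata operation. The toy model below counts operations in both cases; the classes are hypothetical and only illustrate the cost difference (the real Gen2 operation is exposed, for example, as DataLakeDirectoryClient.rename_directory in the azure-storage-file-datalake Python package).

```python
class FlatNamespace:
    """Blob-style store: '/' in a name is just a character, so folders are virtual."""

    def __init__(self) -> None:
        self.objects = {}
        self.operations = 0  # storage operations performed

    def put(self, name: str, data: bytes) -> None:
        self.objects[name] = data

    def rename_dir(self, old: str, new: str) -> None:
        # Every blob under the prefix is copied to its new name and the original deleted.
        for name in [n for n in self.objects if n.startswith(old + "/")]:
            self.objects[new + name[len(old):]] = self.objects.pop(name)
            self.operations += 2  # one copy + one delete per blob


class HierarchicalNamespace:
    """Gen2-style store: directories are real entries, so a rename is one update."""

    def __init__(self) -> None:
        self.root = {}
        self.operations = 0

    def put(self, path: str, data: bytes) -> None:
        *dirs, leaf = path.split("/")
        node = self.root
        for d in dirs:
            node = node.setdefault(d, {})
        node[leaf] = data

    def get(self, path: str) -> bytes:
        node = self.root
        for part in path.split("/"):
            node = node[part]
        return node

    def rename_dir(self, old: str, new: str) -> None:
        # For brevity, assumes old and new share the same parent directory.
        *parents, old_leaf = old.split("/")
        new_leaf = new.split("/")[-1]
        node = self.root
        for d in parents:
            node = node[d]
        node[new_leaf] = node.pop(old_leaf)  # re-link the whole subtree at once
        self.operations += 1
```

Renaming a directory of 100 files costs 200 operations in the flat model but a single operation in the hierarchical one, which is why Gen2 suits analytics jobs that commit output by renaming directories.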