This document explains how to count the objects in a directory and calculate their total size.
Operation Scenarios
To calculate the capacity and object count of a directory within a bucket, use the directory statistics function.
Usage Method
I. Using the ZOS Console for Directory Statistics
1. In the ZOS console, click the bucket name to enter the Overview page.
2. Select the File Management tab. The page displays all directories and files in the bucket in a paginated view.
3. Locate the directory you want to analyze and click Statistics in the Operation column.
4. After the Directory Statistics dialog appears, the statistics task starts. For directories containing a large number of objects, the calculation may take some time. Do not close the dialog before the statistics are complete, or the task will be interrupted.
5. When the Directory statistics in progress prompt disappears, the task is complete, and the results will be displayed in the dialog.
6. After completion, the directory capacity and the data generation time will also be displayed on the file list page. To ensure timeliness, this data will be automatically cleared after one day. To obtain the statistics again, click Statistics once more.
Note
1. Calculating statistics for directories with many objects may take some time.
2. The console supports statistics for directories containing up to 100,000 objects. If the object count exceeds 100,000, the console displays a prompt and aborts the task. For such directories, use Method II below; a quick way to pre-check the count is sketched after this list.
3. Frequent object uploads or deletions during the statistics process, including deletions performed automatically by lifecycle policies in the background, may affect results. In such cases, run the statistics again.
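If you are unsure whether a directory exceeds the 100,000-object console limit, you can pre-check the count with the SDK before choosing a method. The following is a minimal sketch, not part of the official sample: the helper count_objects_up_to and its default limit are illustrative, it uses the standard list_objects_v2 call, and it counts current objects under the prefix (not historical versions). Replace the placeholder credentials and endpoint with your own.

import boto3

# Placeholders: replace with your own values.
access_key = "Your_Access_Key"
secret_key = "Your_Secret_Key"
endpoint = "Your_Endpoint"

def count_objects_up_to(bucket, prefix, limit=100000):
    """Count objects under a prefix, stopping early once `limit` is exceeded."""
    client = boto3.session.Session(access_key, secret_key).client("s3", endpoint_url=endpoint)
    count = 0
    params = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        resp = client.list_objects_v2(**params)
        count += resp.get('KeyCount', 0)
        # Stop once the limit is crossed or the listing is exhausted.
        if count > limit or not resp.get('IsTruncated', False):
            return count
        params['ContinuationToken'] = resp['NextContinuationToken']

# Example: if the returned count exceeds 100,000, use Method II (SDK) instead of the console.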
II. Using the SDK for Directory Statistics
1. First, see the ZOS Python SDK Usage Manual.pdf in the SDK Developer Documentation and install the required SDK environment.
2. Use the following sample code to complete directory statistics. Replace the placeholders in "User Basic Information" accordingly.
import boto3
# User basic information
access_key = "Your_Access_Key"
secret_key = "Your_Secret_Key"
endpoint = "Your_Endpoint"
bucket_name = "Your_Bucket_Name"
folder_name = "Your_Folder_Name"
# Statistical result data structure
# Count: File Count
# Size: Capacity in bytes
# StorageTypeStandard: Standard Storage
# StorageTypeIa: Infrequent Storage
# StorageTypeGlacier: Archive Storage
result = {
    'Count': 0,
    'Size': 0,
    'StorageTypeStandard': {
        'Count': 0,
        'Size': 0,
    },
    'StorageTypeIa': {
        'Count': 0,
        'Size': 0,
    },
    'StorageTypeGlacier': {
        'Count': 0,
        'Size': 0,
    }
}


def get_s3_client():
    """
    Get an S3 client.
    """
    session = boto3.session.Session(access_key, secret_key)
    s3_client = session.client("s3", endpoint_url=endpoint)
    return s3_client


def folder_statistics(bucket_name, folder_name):
    """
    Folder storage type statistics.
    :param bucket_name: Bucket name
    :param folder_name: Folder name, full folder path, e.g.: 'abc/def/ghi/jkl/'
    :return: None
    """
    params = {
        'Bucket': bucket_name,
        'Prefix': folder_name,
    }

    # Helper function: update the statistics result
    def _update_result(result, versions):
        for v in versions:
            # Skip the directory placeholder object itself.
            if v['Key'] == params['Prefix']:
                continue
            size = v['Size']
            storage = v['StorageClass']
            result['Count'] += 1
            result['Size'] += size
            if storage == 'STANDARD':
                storage = 'StorageTypeStandard'
            elif storage == 'STANDARD_IA':
                storage = 'StorageTypeIa'
            elif storage == 'GLACIER':
                storage = 'StorageTypeGlacier'
            else:
                continue
            result[storage]['Count'] += 1
            result[storage]['Size'] += size

    s3_client = get_s3_client()
    resp = s3_client.list_object_versions(**params)
    _update_result(result, resp.get('Versions', []))
    # Loop the query and update the result until the listing is fully traversed.
    while resp.get('IsTruncated', False):
        next_key_marker = resp.get('NextKeyMarker')
        next_version_id_marker = resp.get('NextVersionIdMarker')
        print(f'list_object_versions is truncated, NextKeyMarker = {next_key_marker}, NextVersionIdMarker = {next_version_id_marker}')
        params['KeyMarker'] = next_key_marker
        if next_version_id_marker:
            params['VersionIdMarker'] = next_version_id_marker
        resp = s3_client.list_object_versions(**params)
        _update_result(result, resp.get('Versions', []))
    params.pop('KeyMarker', None)
    params.pop('VersionIdMarker', None)


if __name__ == '__main__':
    # Fill in the bucket name and folder name above, then run the statistics.
    folder_statistics(bucket_name, folder_name)
    print(f"Standard storage capacity (bytes): {result.get('StorageTypeStandard', {}).get('Size', 0)}")
    print(f"Standard storage file count: {result.get('StorageTypeStandard', {}).get('Count', 0)}")
    print(f"Infrequent storage capacity (bytes): {result.get('StorageTypeIa', {}).get('Size', 0)}")
    print(f"Infrequent storage file count: {result.get('StorageTypeIa', {}).get('Count', 0)}")
    print(f"Archive storage capacity (bytes): {result.get('StorageTypeGlacier', {}).get('Size', 0)}")
    print(f"Archive storage file count: {result.get('StorageTypeGlacier', {}).get('Count', 0)}")
    print(f"Total capacity under folder (bytes): {result.get('Size', 0)}")
    print(f"Total file count under folder: {result.get('Count', 0)}")