Closed Bug 1254547 Opened 10 years ago Closed 10 years ago

Parquet datasets are no longer accessible from Spark clusters

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rvitillo, Unassigned)

References

Details

[hadoop@ip-172-31-20-15 ~]$ aws s3 ls s3://telemetry-parquet/longitudinal/
A client error (AccessDenied) occurred when calling the ListObjects operation: Access Denied
Severity: normal → blocker
Flags: needinfo?(whd)
Priority: -- → P1
Blocks: 1251580
This didn't happen as part of the Spark 1.6 deploy. It happened because, during that deploy, I did a manual diff and noticed that somebody had added the parquet IAM permissions to the Spark role manually. I made a note of this in https://bugzilla.mozilla.org/show_bug.cgi?id=1253392#c1 and was going to file a PR for it today (I still am). As a consequence, I performed the CloudFormation portion of the deploy manually, adding the permissions for https://bugzilla.mozilla.org/show_bug.cgi?id=1253392 without losing the parquet permissions.

Later, :rvitillo and :mreid attempted to deploy Spark bootstrap updates with Ansible for https://github.com/mozilla/emr-bootstrap-spark/pull/15. Aside from failing due to other IAM permissions issues, that deploy wiped out the permissions that were not captured in version control. I had kept a copy of the old permissions from my earlier diff, so I've filed a PR to restore them: https://github.com/mozilla/emr-bootstrap-spark/pull/20

I'll close this once I've deployed it.
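The wiped grants were the Spark role's S3 permissions on the telemetry-parquet bucket. A minimal sketch of the kind of IAM policy statement involved, assuming standard S3 read access (the Sid and exact action list here are illustrative; the authoritative version is in the emr-bootstrap-spark PR linked above):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TelemetryParquetReadAccess",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::telemetry-parquet"]
    },
    {
      "Sid": "TelemetryParquetObjectRead",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::telemetry-parquet/*"]
    }
  ]
}
```

Note that `aws s3 ls` calls ListObjects, which is authorized by `s3:ListBucket` on the bucket ARN itself (no `/*` suffix), so omitting the first statement produces exactly the AccessDenied seen in the report. Keeping this policy in version control, as the PR does, prevents out-of-band edits from being silently reverted by the next deploy.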
Flags: needinfo?(whd)
Deployed.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard