Closed Bug 1254547 Opened 10 years ago Closed 10 years ago

Parquet datasets are no longer accessible from Spark clusters

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rvitillo, Unassigned)

References

Details

[hadoop@ip-172-31-20-15 ~]$ aws s3 ls s3://telemetry-parquet/longitudinal/
A client error (AccessDenied) occurred when calling the ListObjects operation: Access Denied
Severity: normal → blocker
Flags: needinfo?(whd)
Priority: -- → P1
Blocks: 1251580
This didn't happen as part of the Spark 1.6 deploy. It happened because, during that deploy, I did a manual diff and noticed that somebody had added the parquet IAM permissions to the Spark role manually. I made a note of this in https://bugzilla.mozilla.org/show_bug.cgi?id=1253392#c1 and was going to file a PR for it today (I still am). As a consequence, I performed the CloudFormation portion of the deploy manually, adding the permissions for https://bugzilla.mozilla.org/show_bug.cgi?id=1253392 without losing the parquet permissions.

Later, :rvitillo and :mreid attempted to deploy Spark bootstrap updates with Ansible for https://github.com/mozilla/emr-bootstrap-spark/pull/15. Aside from failing due to other IAM permissions issues, that deploy wiped out the permissions that were not captured in version control. I had kept a copy of the old permissions from my earlier diff, so I've filed a PR to restore them: https://github.com/mozilla/emr-bootstrap-spark/pull/20

I'll close this once I've deployed it.
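The wiped grants were the Spark role's S3 permissions on the telemetry-parquet bucket. A minimal sketch of the kind of IAM policy statement involved, assuming standard S3 read access (the Sid and exact action list here are illustrative; the authoritative version is in the emr-bootstrap-spark PR linked above):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TelemetryParquetReadAccess",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::telemetry-parquet"]
    },
    {
      "Sid": "TelemetryParquetObjectRead",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::telemetry-parquet/*"]
    }
  ]
}
```

Note that `aws s3 ls` calls ListObjects, which is authorized by `s3:ListBucket` on the bucket ARN itself (no `/*` suffix), so omitting the first statement produces exactly the AccessDenied seen in the report. Keeping this policy in version control, as the PR does, prevents out-of-band edits from being silently reverted by the next deploy.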
Flags: needinfo?(whd)
Deployed.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard