Closed
Bug 1252844
Opened 10 years ago
Closed 9 years ago
Rewrite executive summary job
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rvitillo, Unassigned)
References
Details
User Story
The Scala job that creates the executive summary dataset in telemetry-batch-view was originally written to run in AWS Lambda functions, so there was no need for Spark. As the scope of telemetry-batch-view has pivoted several times over the past months, and the executive summary job is our "example job", it's time to rewrite it and make it simpler. For starters, we should get rid of its base class "SimpleDerivedStream", which is not used by any of the other Spark jobs, and merge its functionality into the executive summary job. Furthermore, we should use a Spark SQL schema instead of an Avro schema to generate the dataset; that lets us remove the code that creates Parquet files manually and store dates "as dates". Ideally, all our future jobs should use a Spark SQL schema. The longitudinal dataset will keep using our custom Avro-based solution, since there we really need the flexibility to decide when and what data is pulled from disk into memory and pushed from memory back to disk.
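A minimal sketch of the proposed approach: declare the dataset with a Spark SQL schema (`StructType`) and let Spark's built-in writer produce the Parquet files, instead of assembling them manually from an Avro schema. The column names and output path here are illustrative, not the real executive-summary fields.

```scala
import java.sql.Date
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SparkSession}

object ExecutiveSummaryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExecutiveSummaryExample")
      .master("local[*]")
      .getOrCreate()

    // Declaring submission_date as DateType means Parquet stores it
    // "as a date", rather than as a string or an epoch integer.
    val schema = StructType(Seq(
      StructField("client_id", StringType, nullable = false),
      StructField("submission_date", DateType, nullable = false),
      StructField("session_count", LongType, nullable = true)
    ))

    val rows = Seq(
      Row("abc", Date.valueOf("2016-03-01"), 4L),
      Row("def", Date.valueOf("2016-03-01"), 1L)
    )

    val df = spark.createDataFrame(
      spark.sparkContext.parallelize(rows), schema)

    // Spark handles the Parquet encoding; no manual file construction.
    df.write.mode("overwrite").parquet("/tmp/executive_summary_example")

    spark.stop()
  }
}
```

The same `StructType` both documents the dataset and drives the on-disk layout, which is the simplification the rewrite is after.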
No description provided.
Reporter
Updated•10 years ago
User Story: (updated)
Updated•10 years ago
Points: --- → 3
Priority: -- → P3
Comment 1•9 years ago
This job is not needed any more, so the code has been removed.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard