Closed
Bug 1252844
Opened 10 years ago
Closed 9 years ago
Rewrite executive summary job
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rvitillo, Unassigned)
References
Details
User Story
The Scala job that creates the executive summary dataset in telemetry-batch-view was originally written to run in AWS Lambda functions, so there was no need for Spark. As the scope of telemetry-batch-view has pivoted several times over the past months, and the executive summary job is our "example job", it's time to rewrite it and make it simpler. For starters, we should get rid of its base class "SimpleDerivedStream", which is not used by any of the other Spark jobs, and merge its functionality into the executive summary job. Furthermore, we should use a Spark SQL schema instead of an Avro schema to generate the dataset; that lets us remove the code that creates Parquet files manually and store dates "as dates". Ideally, all our future jobs should use a Spark SQL schema. The longitudinal dataset will keep using our custom Avro-based solution, since there we really need the flexibility to decide when and what data is pulled from disk into memory and pushed from memory back to disk.
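A minimal sketch of the proposed approach: declare the dataset with a Spark SQL schema (`StructType`) and let Spark's built-in writer produce the Parquet files, instead of assembling them manually from an Avro schema. The column names and output path here are illustrative, not the real executive-summary fields.

```scala
import java.sql.Date
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SparkSession}

object ExecutiveSummaryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExecutiveSummaryExample")
      .master("local[*]")
      .getOrCreate()

    // Declaring submission_date as DateType means Parquet stores it
    // "as a date", rather than as a string or an epoch integer.
    val schema = StructType(Seq(
      StructField("client_id", StringType, nullable = false),
      StructField("submission_date", DateType, nullable = false),
      StructField("session_count", LongType, nullable = true)
    ))

    val rows = Seq(
      Row("abc", Date.valueOf("2016-03-01"), 4L),
      Row("def", Date.valueOf("2016-03-01"), 1L)
    )

    val df = spark.createDataFrame(
      spark.sparkContext.parallelize(rows), schema)

    // Spark handles the Parquet encoding; no manual file construction.
    df.write.mode("overwrite").parquet("/tmp/executive_summary_example")

    spark.stop()
  }
}
```

The same `StructType` both documents the dataset and drives the on-disk layout, which is the simplification the rewrite is after.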
No description provided.
Reporter
Updated•10 years ago
User Story: (updated)
Updated•10 years ago
Points: --- → 3
Priority: -- → P3
Comment 1•9 years ago
This job is not needed any more, so the code has been removed.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard