Closed Bug 1345064 Opened 9 years ago Closed 4 years ago

Prototype Aggregates Dataset in Parquet

Categories

(Data Platform and Tools :: General, enhancement, P3)

Points: 5

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: frank, Assigned: robhudson)

References

Details

User Story

We want to create a proof of concept that accomplishes the following:
- Aggregate pings, similar to now, but write to a Parquet file.
- The Parquet dataset will be partitioned by metric (~1400 metrics)
- The Parquet output will be sorted by date first, then by the other dimensions that are filtered on most often (e.g. version, then os)
- There will be a build_id cutoff for each channel, a point at which we no longer aggregate pings for old build_ids (to be defined)
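
The partitioning and sort order described in this list can be sketched in plain Python. This only illustrates the intended layout, not the aggregation job itself: the row fields (`metric`, `submission_date`, `version`, `os`) and the helper name `partition_and_sort` are assumptions made for the example, and a real implementation would hand each sorted partition to a Parquet writer.

```python
from collections import defaultdict

# Hypothetical sort order: date first, then the dimensions filtered on most often.
SORT_KEYS = ("submission_date", "version", "os")


def partition_and_sort(rows):
    """Group aggregate rows by metric, then sort each group by SORT_KEYS."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["metric"]].append(row)
    return {
        metric: sorted(group, key=lambda r: tuple(r[k] for k in SORT_KEYS))
        for metric, group in partitions.items()
    }
```

Keeping each metric's rows sorted by date should let a reader skip most of a partition when a query filters on a narrow date range, which is the property the POC is meant to measure.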

The Parquet schema would have:
- A column for each dimension
- A column for size
- A column for count
- As well as the current column with the aggregated data
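
As a rough illustration of the row shape this schema implies, here is a sketch using a Python dataclass. The dimension names and all column types are assumptions chosen for the example; only `size`, `count`, and an aggregated-data column are named in the description above. The metric is left out because it is the partition key.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class AggregateRow:
    # One column per dimension (names here are illustrative assumptions).
    submission_date: str
    channel: str
    version: str
    build_id: str
    os: str
    # Columns named in the schema description; types are assumed.
    size: int
    count: int
    # The existing aggregated data; its real type is not specified in this bug.
    aggregate: List[int]


def column_names():
    """Return the column names in declaration order."""
    return [f.name for f in fields(AggregateRow)]
```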

The POC would involve testing the "queryability" of the dataset by running common queries filtered by submission_date and build_id and checking their speed.

There are some unknowns around whether we merge new data as it is aggregated, or do a pseudo-merge and follow up with a real Parquet merge weekly or at some other interval. Perhaps the real merge doesn't take very long, so some testing could be done here.

Attachments

(1 file)

No description provided.
Component: Metrics: Pipeline → Datasets: Telemetry Aggregates
Product: Cloud Services → Data Platform and Tools
No longer blocks: 1255755
Assignee: nobody → robhudson
Points: 3 → 5
Priority: P3 → P1
User Story: (updated)
Depends on: 1475357
Depends on: 1475358
Depends on: 1475359
Priority: P1 → P3
The build ID cutoffs per channel are defined as follows:

    # Cutoff in days.
    BUILD_ID_CUTOFFS = {
        'release': 84,
        'esr': 84,
        'beta': 30,
        'aurora': 30,
        'nightly': 10,
    }
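
A minimal sketch of how such cutoffs might be applied when deciding whether to aggregate a ping. The function name `is_within_cutoff`, the assumption that a build ID begins with a `YYYYMMDD` date, the inclusive boundary, and the choice of reference date are all illustrative; the bug does not say how the cutoff is actually evaluated.

```python
from datetime import date, datetime, timedelta

# Cutoff in days, per channel, as given in this bug.
BUILD_ID_CUTOFFS = {
    'release': 84,
    'esr': 84,
    'beta': 30,
    'aurora': 30,
    'nightly': 10,
}


def is_within_cutoff(channel: str, build_id: str, today: date) -> bool:
    """Return True if the build is recent enough to keep aggregating."""
    # Assumption: build_id starts with YYYYMMDD, e.g. "20180901100136".
    build_date = datetime.strptime(build_id[:8], "%Y%m%d").date()
    cutoff = today - timedelta(days=BUILD_ID_CUTOFFS[channel])
    # Assumption: a build exactly on the cutoff date is still aggregated.
    return build_date >= cutoff
```

A channel missing from the table raises `KeyError` in this sketch; a real job would need to decide whether unknown channels are dropped or aggregated without a cutoff.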
Depends on: 1488465
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Component: Datasets: Telemetry Aggregates → General