Closed
Bug 1345064
Opened 9 years ago
Closed 4 years ago
Prototype Aggregates Dataset in Parquet
Categories
(Data Platform and Tools :: General, enhancement, P3)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: frank, Assigned: robhudson)
References
Details
User Story
We want to create a proof of concept that accomplishes the following:
- Aggregate pings, similar to now, but write to a parquet file.
- The parquet will be partitioned by metric (~1400 metrics).
- The parquet output will be sorted by date first, then by other frequently filtered dimensions (e.g. version, then os).
- There will be a build_id cutoff for each channel, a point at which we no longer aggregate pings for old build_ids (to be defined).

The parquet schema would have:
- A column for each dimension
- A column for size
- A column for count
- The current column with the aggregated data

The POC would involve testing the "queryability" of common queries by submission_date and build_id and checking their speed. There are some unknowns around whether we merge new data as it is aggregated, or do a pseudo-merge and follow up with a real parquet merge weekly or at some other interval. Perhaps the real merge doesn't take very long, so some testing could be done here.
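The layout described in the user story can be sketched in plain Python, without a Parquet library, to make the partitioning and sort order concrete. This is only an illustration under stated assumptions: `AggregateRow`, `sort_key`, and `partition_and_sort` are hypothetical names, the dimension set is reduced to the three named in the story (date, version, os), and versions are compared as plain strings for simplicity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AggregateRow:
    """One row of the proposed dataset (hypothetical field names)."""
    metric: str           # partition key, one partition per metric
    submission_date: str  # YYYYMMDD, first sort key
    version: str          # dimension, second sort key
    os: str               # dimension, third sort key
    size: int             # "size" column from the user story
    count: int            # "count" column from the user story
    aggregate: str        # the existing aggregated data, serialized


def sort_key(row: AggregateRow) -> Tuple[str, str, str]:
    # Date first, then the other frequently filtered dimensions.
    return (row.submission_date, row.version, row.os)


def partition_and_sort(rows: List[AggregateRow]) -> Dict[str, List[AggregateRow]]:
    """Group rows by metric, then sort each partition by the sort key."""
    partitions: Dict[str, List[AggregateRow]] = defaultdict(list)
    for row in rows:
        partitions[row.metric].append(row)
    return {
        metric: sorted(group, key=sort_key)
        for metric, group in partitions.items()
    }
```

Each value in the returned mapping corresponds to what would be written as one metric partition; a real implementation would hand these rows to a Parquet writer rather than keep them in memory.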
Attachments
(1 file)
No description provided.
Updated by reporter • 8 years ago
Component: Metrics: Pipeline → Datasets: Telemetry Aggregates
Product: Cloud Services → Data Platform and Tools
Updated by reporter • 8 years ago
Assignee: nobody → robhudson
Points: 3 → 5
Priority: P3 → P1
Updated by reporter • 8 years ago
User Story: (updated)
Comment 1 • 7 years ago
Updated by reporter • 7 years ago
Priority: P1 → P3
Comment 2 by assignee • 7 years ago
The build ID cutoffs per channel are defined as follows:
# Cutoff in days.
BUILD_ID_CUTOFFS = {
    'release': 84,
    'esr': 84,
    'beta': 30,
    'aurora': 30,
    'nightly': 10,
}
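As a rough illustration of how these cutoffs might be applied, the sketch below checks whether a ping's build is still recent enough to aggregate. It rests on several assumptions not stated in the bug: the helper name `build_id_is_current` is hypothetical, the build ID is taken to start with a YYYYMMDD date, the cutoff is measured against a caller-supplied reference date (for example the submission date), and channels missing from the table are dropped.

```python
from datetime import date, datetime, timedelta

# Cutoff in days (values copied from the comment above).
BUILD_ID_CUTOFFS = {
    'release': 84,
    'esr': 84,
    'beta': 30,
    'aurora': 30,
    'nightly': 10,
}


def build_id_is_current(channel: str, build_id: str, reference_date: date) -> bool:
    """Return True if the build is within the cutoff window for its channel.

    Assumptions: build_id begins with YYYYMMDD, and unknown channels
    are never aggregated.
    """
    cutoff_days = BUILD_ID_CUTOFFS.get(channel)
    if cutoff_days is None:
        return False
    build_date = datetime.strptime(build_id[:8], "%Y%m%d").date()
    return reference_date - build_date <= timedelta(days=cutoff_days)
```

A build exactly `cutoff_days` old is still accepted here; whether the boundary should be inclusive is a choice the bug does not settle.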
Updated • 4 years ago
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Updated • 3 years ago
Component: Datasets: Telemetry Aggregates → General