Closed Bug 1152449 Opened 10 years ago Closed 7 years ago

[meta] Improve pipeline testability

Categories

(Data Platform and Tools :: General, defect, P3)


Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: mreid, Unassigned)

References

Details

In order to avoid problems like bug 1151839, it would be nice to have a way to run automated tests of Sandbox Decoders / Filters / Encoders against a known data set and check the resulting output. It could also be handy for development and debugging.
Note that I did not include Sandbox Inputs or Outputs, since they all tend to have different behaviour and may need external resources for testing.
Priority: -- → P2
Blocks: 1125443
Narrowing the scope a little bit - we don't necessarily need separate code / infrastructure for testing, we just need the following:
- the ability to run Heka in a controlled manner, against a known data set
- the ability to verify / diff the output against the expected values

We could store test configs and sample input/output data in S3, since we cannot check "real" data into public git repos.
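To sketch what that harness could look like - a rough Python outline, assuming hekad is on the PATH and accepts a -config flag; the config paths and the behaviour of the test config (read a fixed input file, write decoded output to a file, then exit) are made up for illustration:

```python
#!/usr/bin/env python
import filecmp
import subprocess
import sys

HEKAD = "hekad"                            # assumed to be on the PATH
TEST_CONFIG = "test/decoder_test.toml"     # hypothetical config: known input file -> output file
ACTUAL = "test/output/actual.log"          # where the test config writes its output
EXPECTED = "test/expected/decoder.log"     # golden file, fetched from S3 ahead of time

def main():
    # Run Heka in a controlled manner against the known data set.
    subprocess.check_call([HEKAD, "-config=" + TEST_CONFIG])

    # Verify / diff the output against the expected values.
    if filecmp.cmp(ACTUAL, EXPECTED, shallow=False):
        print("PASS")
        return 0
    print("FAIL: %s differs from %s" % (ACTUAL, EXPECTED))
    return 1

if __name__ == "__main__":
    sys.exit(main())
```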
Summary: Make a runner/harness for pipeline sandboxes → Improve pipeline testability
Sorry to just dive in, but this got me thinking :) It sounds like there are two main parts that need to be tested, and they don't necessarily need to be done together: 1. Heka collection, and 2. data processing (filters, decoders, etc.). Some ideas:

You could do the whole thing as one e2e test: mock some data and see if it spits out the correct values. These kinds of tests are good for an overall sanity check, but don't necessarily point you in the right direction without digging into logs and stack traces.

Or, break it up into the two phases I mentioned:
1) We have the whole pipeline in staging; we could configure a few Firefox instances to use those staging edge servers for tracking. You then automate a series of interactions (open browser, navigate to a page, whatever) that should provide consistent results in terms of data. Fair bit of work here to do essentially an e2e test, but you could short-circuit it by pointing the edge servers to a simple logger and then script a parser to count each interaction (or do it manually to start).
2) Can you basically mock the Heka client collection phase with a sample log/JSON file? Here's our input file; run it through the pipeline and see if it spits out what's expected.

Or, break it down even further and treat each component as a contract, testing the individual input/output of each for correctness. This is obviously even more work, and it doesn't necessarily catch integration issues, but it does narrow down problem solving significantly. This combined with the e2e test would give a lot of value imo, as the e2e test would give you the whole integrated picture and the component tests give you a nice structure to adhere to.
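To make the per-component "contract" idea concrete, here's a minimal pytest-style sketch. decode_submission is a hypothetical stand-in for a sandbox decoder wrapped as a pure function (not an existing API), and the payload/field names are made up:

```python
import json

def decode_submission(raw_payload):
    # Hypothetical stand-in for a sandbox decoder: parse the JSON ping
    # and pull out the fields that downstream filters rely on.
    ping = json.loads(raw_payload)
    return {
        "docType": ping.get("type"),
        "appVersion": ping.get("version"),
    }

def test_decode_submission():
    # The contract: given this known input, the decoder must emit these fields.
    raw = '{"type": "main", "version": "38.0"}'
    assert decode_submission(raw) == {"docType": "main", "appVersion": "38.0"}
```

Each component gets its own table of (input, expected output) pairs, so a failure points straight at the responsible plugin instead of at the pipeline as a whole.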
Please add specific tests as separate bugs that block this one.
Summary: Improve pipeline testability → [meta] Improve pipeline testability
Let me know if this makes sense and I can file some bugs. I'm thinking of this as two phases, collection and processing:

collection = user actions -> raw s3 logs
processing = raw s3 logs -> dashboard numbers

Potential TODOs/blocking bugs:
1) we need to decide what metric(s) to test with, and whether it's a count/percent/whatever
2) we need a script or command to parse raw s3 logs to get that specific metric/number from #1 (a rough sketch follows below)
3) we need to be able to run a defined set of user actions for the collection phase (likely an automated test or script)
4) we need to be able to supply a pre-defined log for the processing phase. Related: how can we generate this for any given metric over a specific time period?
5) to test the collection phase we compare the input of #3 against the output of #2
6) to test the processing phase we compare the input of #4 against the final dashboard number, as well as contrast the dashboard with the output of #2 for sanity

The two phases don't have to use the same metric/dataset, but they could. They also don't both need to be done right away - we can work on a test for one phase first, whatever is higher priority.
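For #2, a rough sketch of how small that script could be, assuming the raw logs can be exported as newline-delimited JSON with a docType field (both are assumptions about the raw format):

```python
import json
import sys

def count_doc_type(path, doc_type):
    """Count records with the given docType in a newline-delimited JSON file."""
    count = 0
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if record.get("docType") == doc_type:
                count += 1
    return count

if __name__ == "__main__":
    # e.g. python count_metric.py raw_2015-05-01.ndjson main
    print(count_doc_type(sys.argv[1], sys.argv[2]))
```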
Flags: needinfo?(mreid)
Are you thinking of the "collection" phase as actually running Firefox and generating / submitting some Telemetry data? That could be really helpful, though it may be a fair amount of development effort to get there. Let's file a separate bug for exploring that possibility - one thought: maybe we can get the telemetry data from releng's build machines (they don't actually submit telemetry aiui, since they're not supposed to touch the network during build/test).

In the meantime, there are some existing client-side tests that cover at least part of the collection phase, and they run as part of every official build - if we identify things in the data that could be more effectively tested on the client, we should file bugs in Toolkit/Telemetry for now.

The "processing" phase is what I was hoping to cover in this bug, since we only have automation to test the build of Heka itself, nothing to actually exercise our data processing code. So I'd like to focus on #4 (defining test data sets) and #6 (comparing outputs with expected values) for the critical processing plugins as top priority.
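For #6, the comparison could be structure-aware rather than a plain text diff - a sketch assuming the processed output can be dumped as a flat {metric: number} JSON object (that layout is an assumption), with an optional tolerance for metrics that are expected to be slightly fuzzy:

```python
import json
import sys

def compare_metrics(actual_path, expected_path, rel_tol=0.0):
    """Return a list of mismatch descriptions (an empty list means pass)."""
    with open(actual_path) as f:
        actual = json.load(f)
    with open(expected_path) as f:
        expected = json.load(f)
    failures = []
    for metric, want in expected.items():
        got = actual.get(metric)
        if got is None:
            failures.append("%s: missing from actual output" % metric)
        elif abs(got - want) > rel_tol * abs(want):
            failures.append("%s: got %s, expected %s" % (metric, got, want))
    return failures

if __name__ == "__main__":
    problems = compare_metrics(sys.argv[1], sys.argv[2])
    for p in problems:
        print(p)
    sys.exit(1 if problems else 0)
```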
Flags: needinfo?(mreid)
:mreid, yeah exactly, open an fx instance and submit some actual telemetry data. Agreed that's a separate concern from this ticket, and also agreed that doing the processing phase makes sense for now. I will make tickets for all of these and prioritize them accordingly, with 4 and 6 at the top, and then we can figure out the details of each on their own.
Sounds good. Adding Georg in case he has some ideas how we might tackle the generate/submit part.
Depends on: 1165146
Depends on: 1165148
Depends on: 1165149
Depends on: 1165150
Here's the tree to capture this: https://bugzilla.mozilla.org/showdependencytree.cgi?id=1152449&hide_resolved=1 I didn't create tickets yet for #5 and #6 above, as that's the actual test execution; we can create those when ready.
Depends on: 1178396
Iteration: --- → 43.2 - Sep 7
Priority: P2 → P3
No longer blocks: 1125443
Depends on: 1125443
Depends on: 1310324
Component: Metrics: Pipeline → General
Product: Cloud Services → Data Platform and Tools
Heka is no longer in use - the context for the data processing infrastructure has changed completely since this bug was filed. I'm going to close this as no longer directly applicable.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID