Closed Bug 1152449 Opened 10 years ago Closed 7 years ago

[meta] Improve pipeline testability

Categories

(Data Platform and Tools :: General, defect, P3)


Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: mreid, Unassigned)

References

Details

In order to avoid problems like bug 1151839, it would be nice to have a way to run automated tests of Sandbox Decoders / Filters / Encoders against a known data set and check the resulting output. It could also be handy for development and debugging.
Note that I did not include Sandbox Inputs or Outputs, since they all tend to have different behaviour and may need external resources for testing.
Priority: -- → P2
Blocks: 1125443
Narrowing the scope a little bit - we don't necessarily need separate code / infrastructure for testing, we just need the following:
- the ability to run Heka in a controlled manner, against a known data set
- the ability to verify / diff the output against the expected values

We could store test configs and sample input/output data in S3, since we cannot check "real" data into public git repos.
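To sketch what that harness could look like - a rough Python outline, assuming hekad is on the PATH and accepts a -config flag; the config paths and the behaviour of the test config (read a fixed input file, write decoded output to a file, then exit) are made up for illustration:

```python
#!/usr/bin/env python
import filecmp
import subprocess
import sys

HEKAD = "hekad"                            # assumed to be on the PATH
TEST_CONFIG = "test/decoder_test.toml"     # hypothetical config: known input file -> output file
ACTUAL = "test/output/actual.log"          # where the test config writes its output
EXPECTED = "test/expected/decoder.log"     # golden file, fetched from S3 ahead of time

def main():
    # Run Heka in a controlled manner against the known data set.
    subprocess.check_call([HEKAD, "-config=" + TEST_CONFIG])

    # Verify / diff the output against the expected values.
    if filecmp.cmp(ACTUAL, EXPECTED, shallow=False):
        print("PASS")
        return 0
    print("FAIL: %s differs from %s" % (ACTUAL, EXPECTED))
    return 1

if __name__ == "__main__":
    sys.exit(main())
```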
Summary: Make a runner/harness for pipeline sandboxes → Improve pipeline testability
Sorry to just dive in, but this got me thinking :) It sounds like there are two main parts that need to be tested, and they don't necessarily need to be done together: 1. Heka collection, and 2. data processing (filters, decoders, etc.). Some ideas:

You could do the whole thing as one e2e test: mock some data and see if it spits out the correct values. These kinds of tests are good for an overall sanity check, but don't necessarily point you in the right direction without digging into logs and stack traces.

Or, break it up into the two phases I mentioned:
1) We have the whole pipeline in staging; we could configure a few Firefox instances to use those staging edge servers for tracking. You then automate a series of interactions (open browser, navigate to a page, whatever) that should provide consistent results in terms of data. Fair bit of work here to do essentially an e2e test, but you could short-circuit it by pointing the edge servers to a simple logger and then script a parser to count each interaction (or do it manually to start).
2) Can you basically mock the Heka client collection phase with a sample log/JSON file? Here's our input file; run it through the pipeline and see if it spits out what's expected.

Or, break it down even further and treat each component as a contract, testing the individual input/output of each for correctness. This is obviously even more work, and it doesn't necessarily catch integration issues, but it does narrow down problem solving significantly. This combined with the e2e test would give a lot of value imo, as the e2e test would give you the whole integrated picture and the component tests give you a nice structure to adhere to.
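To make the per-component "contract" idea concrete, here's a minimal pytest-style sketch. decode_submission is a hypothetical stand-in for a sandbox decoder wrapped as a pure function (not an existing API), and the payload/field names are made up:

```python
import json

def decode_submission(raw_payload):
    # Hypothetical stand-in for a sandbox decoder: parse the JSON ping
    # and pull out the fields that downstream filters rely on.
    ping = json.loads(raw_payload)
    return {
        "docType": ping.get("type"),
        "appVersion": ping.get("version"),
    }

def test_decode_submission():
    # The contract: given this known input, the decoder must emit these fields.
    raw = '{"type": "main", "version": "38.0"}'
    assert decode_submission(raw) == {"docType": "main", "appVersion": "38.0"}
```

Each component gets its own table of (input, expected output) pairs, so a failure points straight at the responsible plugin instead of at the pipeline as a whole.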
Please add specific tests as separate bugs that block this one.
Summary: Improve pipeline testability → [meta] Improve pipeline testability
Let me know if this makes sense and I can file some bugs. I'm thinking of this as two phases, collection and processing:

collection = user actions -> raw s3 logs
processing = raw s3 logs -> dashboard numbers

Potential TODOs/blocking bugs:
1) we need to decide what metric(s) to test with, and whether it's a count/percent/whatever
2) we need a script or command to parse raw s3 logs to get that specific metric/number from #1 (a rough sketch follows below)
3) we need to be able to run a defined set of user actions for the collection phase (likely an automated test or script)
4) we need to be able to supply a pre-defined log for the processing phase. Related: how can we generate this for any given metric over a specific time period?
5) to test the collection phase we compare the input of #3 against the output of #2
6) to test the processing phase we compare the input of #4 against the final dashboard number, as well as contrast the dashboard with the output of #2 for sanity

The two phases don't have to use the same metric/dataset, but they could. They also don't both need to be done right away - we can work on a test for one phase first, whatever is higher priority.
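For #2, a rough sketch of how small that script could be, assuming the raw logs can be exported as newline-delimited JSON with a docType field (both are assumptions about the raw format):

```python
import json
import sys

def count_doc_type(path, doc_type):
    """Count records with the given docType in a newline-delimited JSON file."""
    count = 0
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if record.get("docType") == doc_type:
                count += 1
    return count

if __name__ == "__main__":
    # e.g. python count_metric.py raw_2015-05-01.ndjson main
    print(count_doc_type(sys.argv[1], sys.argv[2]))
```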
Flags: needinfo?(mreid)
Are you thinking of the "collection" phase as actually running Firefox and generating / submitting some Telemetry data? That could be really helpful, though it may be a fair amount of development effort to get there. Let's file a separate bug for exploring that possibility - one thought: maybe we can get the telemetry data from releng's build machines (they don't actually submit telemetry aiui, since they're not supposed to touch the network during build/test).

In the meantime, there are some existing client-side tests that cover at least part of the collection phase, and they run as part of every official build - if we identify things in the data that could be more effectively tested on the client, we should file bugs in Toolkit/Telemetry for now.

The "processing" phase is what I was hoping to cover in this bug, since we only have automation to test the build of Heka itself, nothing to actually exercise our data processing code. So I'd like to focus on #4 (defining test data sets) and #6 (comparing outputs with expected values) for the critical processing plugins as top priority.
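For #6, the comparison could be structure-aware rather than a plain text diff - a sketch assuming the processed output can be dumped as a flat {metric: number} JSON object (that layout is an assumption), with an optional tolerance for metrics that are expected to be slightly fuzzy:

```python
import json
import sys

def compare_metrics(actual_path, expected_path, rel_tol=0.0):
    """Return a list of mismatch descriptions (an empty list means pass)."""
    with open(actual_path) as f:
        actual = json.load(f)
    with open(expected_path) as f:
        expected = json.load(f)
    failures = []
    for metric, want in expected.items():
        got = actual.get(metric)
        if got is None:
            failures.append("%s: missing from actual output" % metric)
        elif abs(got - want) > rel_tol * abs(want):
            failures.append("%s: got %s, expected %s" % (metric, got, want))
    return failures

if __name__ == "__main__":
    problems = compare_metrics(sys.argv[1], sys.argv[2])
    for p in problems:
        print(p)
    sys.exit(1 if problems else 0)
```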
Flags: needinfo?(mreid)
:mreid, yeah exactly, open an fx instance and submit some actual telemetry data. Agreed that's a separate concern from this ticket, and also agreed that doing the processing phase makes sense for now. I will make tickets for all of these and prioritize them accordingly, with 4 and 6 at the top, and then we can figure out the details of each on their own.
Sounds good. Adding Georg in case he has some ideas how we might tackle the generate/submit part.
Depends on: 1165146
Depends on: 1165148
Depends on: 1165149
Depends on: 1165150
Here's the tree to capture this: https://bugzilla.mozilla.org/showdependencytree.cgi?id=1152449&hide_resolved=1 I didn't create tickets yet for #5 and #6 above, as that's the actual test execution; we can create those when ready.
Depends on: 1178396
Iteration: --- → 43.2 - Sep 7
Priority: P2 → P3
No longer blocks: 1125443
Depends on: 1125443
Depends on: 1310324
Component: Metrics: Pipeline → General
Product: Cloud Services → Data Platform and Tools
Heka is no longer in use - the context for the data processing infrastructure has changed completely since this bug was filed. I'm going to close this as no longer directly applicable.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID