Closed Bug 1355154 Opened 8 years ago Closed 8 years ago

Lazy Json means missing fields for ujson.dumps

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: frank, Unassigned)

References

Details

Frank Bertsch [:frank]

Reporter

Description

•

8 years ago

I ran into a problem with a bunch of records retrieved via the Dataset API. If I run:

>> pings = Dataset.from_source('telemetry').where(submission_date='20170301').records(sc, sample=.0001)
>> pings.map(lambda x: ujson.dumps(x))

the dumped ping ends up missing a bunch of fields (for example, all histograms).

William Lachance (:wlach)

Comment 1

•

8 years ago

OK, I had a lot of fun trying to figure out what was going on, but in the end the problem is kind of obvious.

Python uses an optimized set of functions, written in C, for serializing JSON. They are defined here:

https://github.com/python/cpython/blob/2.7/Modules/_json.c

and called from here:

https://github.com/python/cpython/blob/2.7/Lib/json/encoder.py#L10

The optimized functions cannot use duck typing to handle what's inside these objects and just default to treating them as type 'dict'.

There seems to be a bit of tension here between making things easy to use and making them fast. The simplest solution that comes to mind is adding a class method to dump the contents of a heka message to JSON. Would that be acceptable?
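A minimal sketch of the failure mode described above, assuming a hypothetical `LazyDict` class (not the actual moztelemetry implementation) that subclasses `dict` and only parses its payload when reached through overridden Python methods. Calling `dict.items` directly stands in for a serializer that reads the underlying dict storage and bypasses the overrides; it is an illustration, not a trace of what `_json.c` or ujson actually do.

```python
import json


class LazyDict(dict):
    """Hypothetical lazy record: parses its JSON payload on first access."""

    def __init__(self, raw):
        super().__init__()  # the underlying dict storage starts out empty
        self._raw = raw
        self._loaded = False

    def _load(self):
        # Parse the payload once and copy it into the real dict storage.
        if not self._loaded:
            self._loaded = True
            dict.update(self, json.loads(self._raw))

    def __getitem__(self, key):
        self._load()
        return dict.__getitem__(self, key)

    def items(self):
        self._load()
        return dict.items(self)


record = LazyDict('{"clientId": "abc", "histograms": {"GC_MS": [1, 2]}}')

# What code that bypasses the overrides sees: nothing has been parsed yet.
raw_view = list(dict.items(record))

# What duck-typed Python code sees: the override triggers parsing first.
duck_view = dict(record.items())
```

In this sketch `raw_view` is empty while `duck_view` holds every field, which mirrors the missing histograms in the dumped ping.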

Assignee: nobody → wlachance

Roberto Agostino Vitillo (:rvitillo)

Comment 2

•

8 years ago

(In reply to William Lachance (:wlach) (use needinfo!) from comment #1)
> Ok, I had a lot of fun trying to figure out what was going on, in the end
> the problem is kind of obvious.
>
> Python uses an optimized set of functions for serializing json written in C,
> which are defined here:
>
> https://github.com/python/cpython/blob/2.7/Modules/_json.c
>
> And called from here:
>
> https://github.com/python/cpython/blob/2.7/Lib/json/encoder.py#L10
>
> The optimized functions will not be able to use duck typing to handle what's
> inside and will just default to treating these objects as type 'dict'.

See also https://github.com/mozilla/python_moztelemetry/issues/8

> There seems to be a bit of tension here between making things easy-to-use
> vs. fast. The simplest solution that comes to mind is adding a class method
> to dump the contents of a heka message to JSON. Would that be acceptable?

This is just one of the ways this bug manifests itself (see the issue above). It would be great to figure out a way to make this work seamlessly for the user.

William Lachance (:wlach)

Comment 3

•

8 years ago

(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #2)
> (In reply to William Lachance (:wlach) (use needinfo!) from comment #1)
> > There seems to be a bit of tension here between making things easy-to-use
> > vs. fast. The simplest solution that comes to mind is adding a class method
> > to dump the contents of a heka message to JSON. Would that be acceptable?
>
> This is just one of the ways this bug manifests itself (see issue above). It
> would be great to figure out a way to make this work seamlessly for the user.

I suspect that there isn't really an easy solution here, short of modifying the Python interpreter. Yesterday Frank found that a hack works: running `copy.deepcopy` on the object before passing it to json.dumps. That seems about as good as any other option. Maybe we could just add some kind of shortcut method that does exactly that.
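One possible shape for such a shortcut, sketched under assumptions: `LazyDict` and `to_plain` are hypothetical names, not part of moztelemetry. The sketch swaps in an explicit recursive conversion to built-in `dict` and `list` objects instead of `copy.deepcopy`, since the goal is the same (hand the serializer fully materialized plain containers) and the conversion is easier to reason about.

```python
import json


class LazyDict(dict):
    """Hypothetical lazy record: parses its JSON payload on first access."""

    def __init__(self, raw):
        super().__init__()  # the underlying dict storage starts out empty
        self._raw = raw
        self._loaded = False

    def _load(self):
        if not self._loaded:
            self._loaded = True
            dict.update(self, json.loads(self._raw))

    def items(self):
        self._load()
        return dict.items(self)


def to_plain(obj):
    """Recursively copy mappings and sequences into built-in dicts and lists."""
    if isinstance(obj, dict):
        # obj.items() goes through any Python-level override, forcing a lazy load.
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    return obj


record = LazyDict('{"clientId": "abc", "histograms": {"GC_MS": [1, 2]}}')
plain = to_plain(record)
dumped = json.dumps(plain, sort_keys=True)
```

Because `plain` is an ordinary `dict`, any serializer that only understands built-in types should see every field; the cost is one extra full copy of each record.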

Katie Parlante

Updated

•

8 years ago

Component: Metrics: Pipeline → Telemetry APIs for Analysis

Priority: -- → P2

Product: Cloud Services → Data Platform and Tools

William Lachance (:wlach)

Comment 4

•

8 years ago

I'm not working on this right now.

Assignee: wlachance → nobody

Frank Bertsch [:frank]

Reporter

Updated

•

8 years ago

Updated

•

8 years ago

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Assignee

Updated

•

3 years ago

Component: Telemetry APIs for Analysis → General


Categories

(Data Platform and Tools :: General, enhancement, P2)
