Closed
Bug 1355154
Opened 8 years ago
Closed 8 years ago
Lazy Json means missing fields for ujson.dumps
Categories
(Data Platform and Tools :: General, enhancement, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: frank, Unassigned)
References
Details
I ran into a problem with a bunch of records retrieved via the Dataset API.
If I run:
>> import ujson
>> from moztelemetry.dataset import Dataset
>> pings = Dataset.from_source('telemetry').where(submission_date='20170301').records(sc, sample=0.0001)
>> pings.map(lambda x: ujson.dumps(x))
The dumped ping ends up missing a bunch of fields (for example, all histograms).
Comment 1•8 years ago
Ok, I had a lot of fun trying to figure out what was going on. In the end, the problem is kind of obvious.
Python serializes JSON with an optimized set of functions written in C, which are defined here:
https://github.com/python/cpython/blob/2.7/Modules/_json.c
And called from here:
https://github.com/python/cpython/blob/2.7/Lib/json/encoder.py#L10
The optimized functions do not use duck typing to handle what's inside; they simply treat these objects as plain 'dict' instances, so the lazy-loading overrides are bypassed.
There seems to be a bit of tension here between making things easy-to-use vs. fast. The simplest solution that comes to mind is adding a class method to dump the contents of a heka message to JSON. Would that be acceptable?
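To make the mechanism concrete, here is a minimal sketch of a lazy dict subclass and the kind of dump method suggested above. `LazyDict`, `to_json`, and the sample record are hypothetical stand-ins, not the moztelemetry implementation; the sketch is written for Python 3 and uses the standard `json` module in place of ujson. Calling `dict.__len__` directly mimics C code that reads the underlying dict storage without going through the Python-level overrides.

```python
import json


class LazyDict(dict):
    """Hypothetical lazy record: the raw JSON is parsed on first access."""

    def __init__(self, raw):
        super().__init__()  # the underlying dict storage starts out empty
        self._raw = raw
        self._loaded = False

    def _load(self):
        if not self._loaded:
            self._loaded = True
            dict.update(self, json.loads(self._raw))

    # Only a few accessors are overridden here to keep the sketch short.
    # C code that reads the dict storage directly never calls these.
    def __getitem__(self, key):
        self._load()
        return dict.__getitem__(self, key)

    def __len__(self):
        self._load()
        return dict.__len__(self)

    def items(self):
        self._load()
        return dict.items(self)

    def to_json(self):
        """Force parsing, then serialize an ordinary dict."""
        return json.dumps(dict(self.items()))


raw = '{"clientId": "abc", "histograms": {"GC_MS": [1, 2, 3]}}'
ping = LazyDict(raw)

# Bypass the override, as a C-level serializer might: nothing is loaded yet.
size_seen_by_c = dict.__len__(ping)

# The explicit dump method goes through items(), so every field is present.
dumped = ping.to_json()
```

The trade-off is the one described above: callers have to remember to use the explicit method instead of calling a serializer directly on the record.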
Assignee: nobody → wlachance
Comment 2•8 years ago
(In reply to William Lachance (:wlach) (use needinfo!) from comment #1)
> Ok, I had a lot of fun trying to figure out what was going on, in the end
> the problem is kind of obvious.
>
> Python uses an optimized set of functions for serializing json written in C,
> which are defined here:
>
> https://github.com/python/cpython/blob/2.7/Modules/_json.c
>
> And called from here:
>
> https://github.com/python/cpython/blob/2.7/Lib/json/encoder.py#L10
>
> The optimized functions will not be able to use duck typing to handle what's
> inside and will just default to treating these objects as type 'dict'.
See also https://github.com/mozilla/python_moztelemetry/issues/8
> There seems to be a bit of tension here between making things easy-to-use
> vs. fast. The simplest solution that comes to mind is adding a class method
> to dump the contents of a heka message to JSON. Would that be acceptable?
This is just one of the ways this bug manifests itself (see issue above). It would be great to figure out a way to make this work seamlessly for the user.
Comment 3•8 years ago
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #2)
> (In reply to William Lachance (:wlach) (use needinfo!) from comment #1)
> > There seems to be a bit of tension here between making things easy-to-use
> > vs. fast. The simplest solution that comes to mind is adding a class method
> > to dump the contents of a heka message to JSON. Would that be acceptable?
>
> This is just one of the ways this bug manifests itself (see issue above). It
> would be great to figure out a way to make this work seamlessly for the user.
I suspect that there isn't really any easy solution here, short of modifying the python interpreter.
Yesterday Frank found a hack that works: running `copy.deepcopy` on the object before passing it to json.dumps. That seems about as good as any other option. Maybe we could just add some kind of shortcut method that does exactly that.
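A shortcut along those lines could look roughly like the sketch below. `LazyDict` and `dumps_lazy` are hypothetical names used only for illustration, and the code targets Python 3 with the standard `json` module rather than ujson. In this sketch the copy ends up populated because `copy.deepcopy` reads a dict subclass through its Python-level `items()` method; whether the real lazy objects are materialized by the same path is an assumption.

```python
import copy
import json


class LazyDict(dict):
    """Hypothetical lazy record: the raw JSON is parsed on first access."""

    def __init__(self, raw):
        super().__init__()  # the underlying dict storage starts out empty
        self._raw = raw
        self._loaded = False

    def _load(self):
        if not self._loaded:
            self._loaded = True
            dict.update(self, json.loads(self._raw))

    def __len__(self):
        self._load()
        return dict.__len__(self)

    def items(self):
        self._load()
        return dict.items(self)


def dumps_lazy(record):
    """Deep-copy the record so its data is materialized, then serialize it."""
    return json.dumps(copy.deepcopy(record))


raw = '{"clientId": "abc", "histograms": {"GC_MS": [1, 2, 3]}}'
ping = LazyDict(raw)
dumped = dumps_lazy(ping)
```

The cost is an extra full copy of every record, which may matter when mapping over a large RDD.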
Updated•8 years ago
Component: Metrics: Pipeline → Telemetry APIs for Analysis
Priority: -- → P2
Product: Cloud Services → Data Platform and Tools
Reporter
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Assignee
Updated•3 years ago
Component: Telemetry APIs for Analysis → General