adding the JsonShapeSerializer#3842
Conversation
| return schema.GetMemberName(); | ||
| } | ||
|
|
||
| void BeginStructure(const Schema&) {} |
There was a problem hiding this comment.
whats goin on here, shouldnt we be adding something to the stack when we begin/end a structure?
|
|
||
| void WriteBoolean(const Schema& schema, bool value) { | ||
| if (!m_stack.empty() && m_stack.back().isList) { | ||
| JsonValue v; |
There was a problem hiding this comment.
i dont think we wanna declare a JsonValue value here because thats gunna make a heap allocation in cjson
| Aws::Vector<JsonValue> listItems; | ||
| }; | ||
|
|
||
| Aws::Vector<StackEntry> m_stack; |
There was a problem hiding this comment.
OK i see the grand picture of this and this matches our performance profile issues. This is actually much more simple than you are making it, lets start with what we are trying to do:
- we are prioritizing code gen to avoid a DOM/in memory tree representation of the object like cjson does
- we are allocating all of the memory up front (or at least try to) so that that we're only allocating and deallocating memory once.
So to start JsonValue will NOT be used in this class. JsonValue calls cjson which will have a multiple heap allocations as it traverses the tree. so that is out the window.
We are going to make it simple we have a character buffer(your calling it a stack in this case), and we just write to it, nothing more.
The basic structure of this would look like
#include <vector>
class json_parser_no_dom {
public:
json_parser_no_dom(size_t buffer_size) {
buffer_.reserve(buffer_size);
}
json_parser_no_dom(const json_parser_no_dom &other) = delete;
json_parser_no_dom(json_parser_no_dom &&other) noexcept = default;
json_parser_no_dom & operator=(const json_parser_no_dom &other) = delete;
json_parser_no_dom & operator=(json_parser_no_dom &&other) noexcept = default;
//...
std::string serialize() { return buffer_; }
private:
std::string buffer_{};
};where in this case the parser has a pre-allocated buffer to write stuff into. now we just put stuff into it.
class json_parser_no_dom {
public:
void write_string(const std::string& str, const std::string& value) {
buffer_.append("\"");
buffer_.append(str);
buffer_.append("\":");
buffer_.append("\"");
buffer_.append(value);
buffer_.append("\"");
}
};where you could do something like
auto main() -> int {
json_parser_no_dom json_parser{100};
json_parser.begin_structure();
json_parser.write_string("foo", "bar");
json_parser.end_structure();
std::cout << json_parser.serialize() << std::endl;
return 0;
}and see
{"foo":"bar"}
so at a very high level I think the problem here is that you are relying on JsonValue and cjson to do the hard parts, and that actually means you are performing heap allocations. what you want to do is rely on the schema for tree walking, and then you just dump the tree to a buffer.
| void EndNestedStructure() override; | ||
|
|
||
| Aws::String GetPayload() const; | ||
| const Aws::String& GetPayload() const; |
There was a problem hiding this comment.
while returning by const Aws::String&avoids a copy, this would create a lifetime issues upstream where if JsonShapeSerializer goes out of scope you have a dangling ref. this also correlates to another comment i have which is that i dont like the temporary variables that we use to read into std::vector. so at a high level i think we have a few requirements:
- we want to avoid a copy when we call get payload
- after get payload is called, we can call get payload again, however the buffer at that point is "finalized" and we want to transfer ownership for the buffer to the called.
- we want to avoid temporary variables while writing values to underlying buffer.
So to solve all of these things i suggest that we:
- use std::string as the buffer, pre-allocated to 8KiB on the heap to avoid re-allocation.
- we create a operation that returns the buffer string by move that invalidates the serializer object however avoid a copy.
enum class error {
ALREADY_FINALIZED,
};
class buffer_builder {
public:
buffer_builder() {
buffer_.reserve(8192);
}
//methods to write to buffer
std::expected<std::string, error> serialize() {
if (state_ == state::FINALIZED) {
return std::unexpected(error::ALREADY_FINALIZED);
}
state_ = state::FINALIZED;
return std::move(buffer_);
}
private:
enum class state: uint8_t {
INITIALIZED,
FINALIZED,
} state_{state::INITIALIZED};
std::string buffer_;
};
Adding the implementation for JsonShapeSerializer
Check all that applies:
Check which platforms you have built SDK on to verify the correctness of this PR.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.