Hello. I’m looking for the most efficient way to get a string segment from a larger string, from starting to ending index. I don’t think Text can be treated like an array of chars, so I don’t know how to jump directly to the start index (without looping). Does anyone have a more efficient approach than this? (I will be working with large Text and this function will be called very often.) Thanks!
// mimics JavaScript substring function
public func subText(value : Text, indexStart: Nat, indexEnd : Nat) : Text {
if (indexStart == 0 and indexEnd >= value.size()) {
return value;
};
if (indexStart >= value.size()) {
return "";
};
var result : Text = "";
var i : Nat = 0;
label l for (c in value.chars()) {
if (i >= indexStart and i < indexEnd) {
result := result # Char.toText(c);
};
if (i == indexEnd) {
break l;
};
i += 1;
};
result;
};
Thanks @paulk for sharing a better approach using Iter.range.
public query func subText(value : Text, indexStart : Nat, indexEnd : Nat) : async Text {
if (indexStart == 0 and indexEnd >= value.size()) {
return value;
}
else if (indexStart >= value.size()) {
return "";
};
var indexEndValid = indexEnd;
if (indexEnd > value.size()) {
indexEndValid := value.size();
};
var result : Text = "";
var iter = Iter.toArray<Char>(Text.toIter(value));
for (index in Iter.range(indexStart, indexEndValid - 1)) {
result := result # Char.toText(iter[index]);
};
result;
};
I’m curious to know if there are performance improvements in paulk's version over your first solution? I ask because I think Iter.toArray() also loops through the text to get all the values into an array. It is more intuitive and easier to read but does it make the code run faster or use less memory?
You make an excellent point. It’s tough to benchmark performance of a function within a canister because timestamps (Time.now()) are actually block times. A test with very large Text and a large number of iterations could give us an idea though.
@tomijaga Wow! I did some tests and the performance difference is very surprising and not obvious from looking at the code.
Although, I could not get benchmark times, I was able to find the number of test iterations where one of the functions would exceed the query execution time limit, which I believe is 2,000 ms.
I tested a 10,000 character string and called subText or subText2 1,000 times to get the last 1,000 characters. The first function succeeded and the second timed-out.
Then I played with the test iterations to find the threshold where each function would time-out.
The first function was able to run 1,735 times, while the second could only run 320 times. So the first function runs over 5X faster!!
To be clear, the test functions call the getLongValue function to build a 10,000 character string which contributes to some of the execution time, but that is a constant for both tests.
So, your analysis was correct and it made a much bigger difference that I expected. Thank you for your valuable input!
@tomijaga I may be wrong about the instruction limit referring to a time-out. Now that I think about it, it may be a call-stack error.
If that’s the meaning, then the tests conclude that first function uses a smaller call-stack and can therefore process more operations. However, it may not indicate that one function performs better than the other.