Fixed leading white space stripping and added new options to iOS LlmInference #5635

Open · wants to merge 4 commits into base: master
@@ -159,20 +159,26 @@ extension LlmInference {
     try llmInference.shouldContinueWithResponseGeneration()
 
     /// Used to make a decision about whitespace stripping.
-    var receivedFirstToken = true
+    var receivedFirstNonEmptyToken = false
 
     llmSessionRunner.predictAsync(
       progress: { partialResponseStrings, error in
         guard let responseStrings = partialResponseStrings,
           let humanReadableLlmResponse = Session.humanReadableString(
-            llmResponses: responseStrings, stripLeadingWhitespaces: receivedFirstToken)
+            llmResponses: responseStrings, stripLeadingWhitespaces: !receivedFirstNonEmptyToken)
         else {
Collaborator (Author) commented:
@yishuangP I have renamed the state variable and changed the values it takes in different scenarios so that they map closely to the meaning of the variable name, for better readability. As a result, the condition check for leading-whitespace stripping is flipped, but the logic remains the same as the previous implementation.
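
A minimal standalone sketch (illustrative only, not part of this diff) of the flag behavior described above; `process` is a hypothetical stand-in for the real progress callback:

var receivedFirstNonEmptyToken = false

/// Strips leading whitespace only until the first non-empty token has been seen.
func process(_ token: String) -> String {
  let output =
    receivedFirstNonEmptyToken
    ? token
    : String(token.drop(while: { $0.isWhitespace }))
  // All-whitespace tokens collapse to empty and leave the flag false, so
  // stripping still applies to the first genuinely non-empty token.
  if !output.isEmpty {
    receivedFirstNonEmptyToken = true
  }
  return output
}

print(["", "  Hello", " world"].map(process))  // ["", "Hello", " world"]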

           progress(nil, GenAiInferenceError.invalidResponse)
           return
         }
 
-        /// Reset state after first response is processed.
-        receivedFirstToken = false
+        /// Some models emit a series of empty responses before producing a valid one.
+        /// Set the flag only once the first non-empty response has been processed, so
+        /// that leading whitespace is stripped from exactly that response.
+        if !humanReadableLlmResponse.isEmpty {
+          receivedFirstNonEmptyToken = true
+        }
 
         progress(humanReadableLlmResponse, nil)
       },
@@ -291,7 +297,8 @@ extension String {
       .replacingOccurrences(of: String.newLine, with: "\n")
     humanReadableString =
       stripLeadingWhitespaces
-      ? humanReadableString.trimmingCharacters(in: .whitespaces) : humanReadableString
+      ? String(humanReadableString.drop(while: { $0.isWhitespace }))
+      : humanReadableString
     return humanReadableString.components(separatedBy: String.eod).first
   }
 }
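
For context on the change above: `trimmingCharacters(in: .whitespaces)` trims both ends of the string, whereas `drop(while:)` removes only the leading whitespace run. A small comparison sketch (hypothetical values, assuming Foundation) of why this matters for streamed partial responses:

import Foundation

let partial = "  Hello "

// Trims BOTH ends: the trailing space separating this chunk from the next
// streamed token is lost ("Hello" + "world" becomes "Helloworld").
let trimmed = partial.trimmingCharacters(in: .whitespaces)  // "Hello"

// Drops only the LEADING run: the trailing space survives
// ("Hello " + "world" stays "Hello world").
let leadingStripped = String(partial.drop(while: { $0.isWhitespace }))  // "Hello "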
@@ -66,7 +66,7 @@ import MediaPipeTasksGenAIC
       supported_lora_ranks: supportedLoraRanks.baseAddress,
       max_top_k: options.maxTopk,
       llm_activation_data_type: options.activationDataType.activationDataTypeC,
-      num_draft_tokens: 0)
+      num_draft_tokens: options.draftTokenCount)
     return try LlmTaskRunner(modelSettings: modelSetting)
   }
 }
@@ -224,6 +224,10 @@ extension LlmInference {
   /// The activation data type for the model.
   @objc public var activationDataType: ActivationDataType = .default
 
+  /// The number of draft tokens to generate when using speculative decoding.
+  /// Setting this to 0 disables speculative decoding.
+  @objc public var draftTokenCount: Int = 0
+
   /// Creates a new instance of `Options` with the given `modelPath` and default values of
   /// `maxTokens`, `maxTopk`, `supportedLoraRanks` and `activationDataType`.
   /// This function is only intended to be used from Objective C.
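
A usage sketch for the new option, assuming the public API shape shown in this diff; the model path, prompt, and `generateResponse(inputText:)` call are illustrative assumptions rather than part of this change:

import MediaPipeTasksGenAI

let options = LlmInference.Options(modelPath: "/path/to/model.task")
options.maxTokens = 512
// New in this PR: generate 4 draft tokens for speculative decoding.
// The default of 0 leaves speculative decoding disabled.
options.draftTokenCount = 4

do {
  let llmInference = try LlmInference(options: options)
  let response = try llmInference.generateResponse(inputText: "Hello")
  print(response)
} catch {
  print("LLM inference failed: \(error)")
}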